In the previous note we learned how to find keypoints. Once a keypoint is found, the next step is to select a region around it such that the region can still be matched after scaling, rotation, and similar transformations. So how do we find this region?

Scale invariant region selection

  • Automatic scale selection
  • Difference-of-Gaussian detector

Scale Invariant Detection

What is scale invariance? Once a keypoint is selected, the corresponding parts of the two images should match no matter how the selected region is scaled. The question then becomes: how do we select the region in each image independently? One failure mode is that a small region matches across the two images, but once the region is enlarged the match breaks down; we want to avoid this.

  • Approach
    1. Design a function over the region that is invariant to region size: both images must have such a function, even though their regions have different sizes. For example, average intensity: the values of corresponding regions should be equal even when the region sizes differ.
    2. For a given point, we can treat this as a function of region size.
  • Common approach: find the local maximum of this function
  • Observation: the region size at which the function attains its maximum is the region size we should select
  • A good function has a single stable, sharp peak
  • For typical images: a good function responds to contrast, i.e. sharp local changes in intensity
  • The function that determines the region size is f = Kernel * image; two such kernels follow (a code sketch of both appears after their formulas)
1. Laplacian
$$
L = \sigma^{2}\left(G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma)\right)
$$

2. DoG (Difference of Gaussians)

$$
DoG = G(x, y, k\sigma) - G(x, y, \sigma) \\\\
\text{where} \quad G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}}e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}}
$$

Both kernels are invariant to scale and rotation
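A minimal sketch of both kernels, assuming NumPy and SciPy (`scipy.ndimage.gaussian_laplace` and `gaussian_filter` are the actual library functions; the scale range in the usage example is an illustrative choice, not prescribed by the notes):

```python
import numpy as np
from scipy import ndimage

def laplacian_response(image, sigma):
    """Scale-normalized Laplacian: sigma^2 * (G_xx + G_yy) convolved with the image."""
    return (sigma ** 2) * ndimage.gaussian_laplace(image.astype(float), sigma)

def dog_response(image, sigma, k=np.sqrt(2)):
    """Difference of Gaussians: blur at k*sigma minus blur at sigma."""
    image = image.astype(float)
    return (ndimage.gaussian_filter(image, k * sigma)
            - ndimage.gaussian_filter(image, sigma))

def characteristic_scale(image, x, y, sigmas):
    """Region size selection: the sigma where |f| = |Kernel * image| peaks at (x, y)."""
    responses = [abs(laplacian_response(image, s)[y, x]) for s in sigmas]
    return sigmas[int(np.argmax(responses))]

# Illustrative usage (the scale range is an arbitrary choice):
# sigmas = 1.6 * 2.0 ** (np.arange(12) / 3.0)
# s = characteristic_scale(img, x=100, y=50, sigmas=sigmas)
```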

Scale Invariant Detectors

Here the search takes place in a 3D space: image position (x, y) plus scale.

  1. Harris-Laplacian

    Find local maxima: the x, y coordinates come from the Harris corner detector, and the z coordinate from the Laplacian over scale

    • Harris corner detector in space (image coordinates)
    • Laplacian in scale
  2. SIFT

    Find local maxima: both the x, y coordinates and the z coordinate use the DoG response

    • Difference of Gaussians in space and scale (a detection sketch follows this list)
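A sketch of this 3D search, reusing the `dog_response` helper from the kernel sketch above; the contrast threshold is an illustrative value, not something these notes specify:

```python
import numpy as np
from scipy import ndimage

def detect_dog_extrema(image, sigmas, threshold=0.03):
    """Keypoints = points that are extrema of the DoG stack in x, y AND scale."""
    # Stack DoG responses into a 3D volume of shape (scales, height, width).
    stack = np.stack([dog_response(image, s) for s in sigmas])
    # A point survives if it equals the max (or min) of its 3x3x3 neighborhood.
    is_max = stack == ndimage.maximum_filter(stack, size=3)
    is_min = stack == ndimage.minimum_filter(stack, size=3)
    strong = np.abs(stack) > threshold  # discard weak, low-contrast responses
    s, y, x = np.nonzero((is_max | is_min) & strong)
    return [(int(x[i]), int(y[i]), float(sigmas[s[i]])) for i in range(len(s))]
```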

Summary

  • Input: two images of the same scene with a very large scale difference
  • Goal: independently find the same interest points in both images
  • Solution: search the entire image for maxima of a suitable scale function.

    1. Harris-Laplacian: maximum of the Laplacian over scale, and of Harris’s corner response measure over the image.
    2. SIFT: maximum of the DoG over both scale and the image.

SIFT

Local Descriptors

We now know how to find keypoints; how do we describe them so that they can be matched?

The description needs two key properties:

  1. Invariant
  2. Distinctive

Invariant Local Features

The image content is transformed into local feature coordinates that are invariant to translation, scale, rotation, and other image transformations.

Advantages of invariant local features

  • Locality: features are local, so they are robust to occlusion and clutter in the scene.
  • Distinctiveness: each individual feature can be matched against a very large database.
  • Quantity: even a very small object can generate many local features.
  • Efficiency: close to real-time performance.
  • Extensibility: easily extended to a wide range of feature types, each adding robustness.

Scale Invariant

We need a method that can repeatably select these points over a range of locations and scales.

The only reasonable scale-space kernel is the Gaussian.

An effective choice is to detect peaks in the DoG pyramid

DoG with constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian.
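One way to see why: the Gaussian satisfies the diffusion equation, so a finite difference between two nearby scales is proportional to the scale-normalized Laplacian:

$$
\frac{\partial G}{\partial \sigma} = \sigma \nabla^{2} G
\quad\Rightarrow\quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k\sigma - \sigma)\frac{\partial G}{\partial \sigma} = (k - 1)\,\sigma^{2}\nabla^{2}G
$$

The factor $(k - 1)$ is the same at every scale, so it does not change where the extrema occur.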

Becoming rotation invariant

  • We are given a keypoint and its scale from DoG.
  • We will select a characteristic orientation for the keypoint, based on the most prominent gradient there.
  • We will describe all features relative to this orientation.
  • This causes the features to be rotation invariant:
    • If the keypoint appears rotated in another image, the features will be the same, because they are relative to the characteristic orientation.
  • Choosing the characteristic orientation (a code sketch follows this list):
    1. Use the blurred image associated with the keypoint’s scale. Look at the pixels in a square around it (say, 16x16).
    2. Compute the gradient direction at each pixel; this is easy using vertical and horizontal edge filters.
    3. Create a histogram of these local gradient directions.
    4. Keypoint orientation = the peak of that histogram.
    5. Minor detail: we also weight each pixel’s histogram contribution by the magnitude of its gradient and by how close it is to the keypoint.
  • With this, every keypoint has a stable coordinate frame (x, y, scale, orientation). We must now give each one a “fingerprint”.
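A minimal sketch of steps 1–5 above, assuming NumPy; the 16x16 window follows the text, while the 36-bin histogram and the Gaussian weighting width are assumptions borrowed from common SIFT implementations:

```python
import numpy as np

def keypoint_orientation(blurred, x, y, sigma, half=8, nbins=36):
    """Dominant gradient direction (radians) in a 16x16 window around (x, y)."""
    # Step 1: a square patch from the blurred image at the keypoint's scale
    # (assumes the keypoint is not too close to the image border).
    patch = blurred[y - half:y + half, x - half:x + half].astype(float)
    # Step 2: gradient direction and magnitude at each pixel.
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx) % (2 * np.pi)
    # Step 5: weight by gradient magnitude and by closeness to the keypoint.
    yy, xx = np.mgrid[-half:half, -half:half]
    weight = magnitude * np.exp(-(xx ** 2 + yy ** 2) / (2 * (1.5 * sigma) ** 2))
    # Steps 3-4: histogram of directions; the orientation is the peak bin.
    hist, _ = np.histogram(direction, bins=nbins,
                           range=(0, 2 * np.pi), weights=weight)
    return (np.argmax(hist) + 0.5) * (2 * np.pi / nbins)
```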

SIFT descriptor formation

  • Use the blurred image associated with the keypoint’s scale.
  • Take image gradients over the keypoint neighborhood.
  • To become rotation invariant, rotate the gradient directions AND locations by the keypoint orientation $\theta$
    1. Now we have cancelled out rotation and have gradients expressed at locations relative to keypoint orientation $\theta$
    2. We could also have just rotated the whole image by $-\theta$, but that would be slower.
  • Using precise gradient locations is fragile; we’d like to allow some “slop” in the image and still produce a very similar descriptor.
  • Create array of orientation histograms
  • Put the rotated gradients into their local orientation histograms
    1. A gradient’s contribution is divided among the nearby histograms based on distance. If it’s halfway between histogram locations, it gives half contribution to both.
    2. Also, scale down the contributions of gradients far from the center.
  • The SIFT authors found the best results were with 8 orientation bins per histogram and a 4x4 histogram array.
  • 8 orientation bins per histogram and a 4x4 histogram array yield 8x4x4 = 128 numbers.
  • So a SIFT descriptor is a vector of length 128, which is invariant to rotation (because we rotated the descriptor) and to scale (because we worked with the scaled image from the DoG pyramid).
  • We can compare each vector from image A to each vector from image B to find the matching keypoints!
    • Euclidean “distance” between descriptor vectors gives a good measure of keypoint similarity.
  • Adding robustness to illumination changes (a normalization sketch follows this list):
    1. Remember that the descriptor is made of gradients (differences between pixels), so it is already invariant to changes in brightness (e.g. adding 10 to all image pixels yields the exact same descriptor).
    2. A high-contrast photo will increase the magnitude of gradients linearly. So, to correct for contrast changes, normalize the vector (scale it to length 1.0).
    3. Very large image gradients are usually caused by unreliable 3D illumination effects (glare, etc.). So, to reduce their effect, clamp all values in the vector to be $\leq 0.2$ (an experimentally tuned value), then normalize the vector again.
    4. Result is a vector which is fairly invariant to illumination changes.
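A sketch of that normalization recipe (steps 2–3) together with the brute-force Euclidean matching described above, assuming the descriptors are stored as NumPy arrays of shape (N, 128):

```python
import numpy as np

def normalize_descriptor(v, clamp=0.2):
    """Steps 2-3: normalize, clamp large gradients at 0.2, normalize again."""
    v = v / (np.linalg.norm(v) + 1e-12)   # cancels linear contrast changes
    v = np.minimum(v, clamp)              # suppresses unreliable lighting effects
    return v / (np.linalg.norm(v) + 1e-12)

def match_descriptors(desc_a, desc_b):
    """For each descriptor in A (N x 128), the index of its nearest neighbor in B."""
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    return np.argmin(dists, axis=1)
```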

Applications

  • Wide baseline stereo
  • Motion detection
  • Panoramas
  • Mobile robot navigation
  • 3D reconstruction
  • Recognition