Documentation - C API

sift.h File Reference

Scale Invariant Feature Transform (SIFT)

#include <stdio.h>
#include "generic.h"

Data Structures

struct  VlSiftKeypoint
 SIFT filter keypoint.
struct  VlSiftFilt
 SIFT filter.

Typedefs

typedef float vl_sift_pix
 SIFT filter pixel type.

Functions

Create and destroy
VlSiftFilt * vl_sift_new (int width, int height, int noctaves, int nlevels, int o_min)
 Create a new SIFT filter.
void vl_sift_delete (VlSiftFilt *f)
 Delete SIFT filter.
Process data
int vl_sift_process_first_octave (VlSiftFilt *f, vl_sift_pix const *im)
 Start processing a new image.
int vl_sift_process_next_octave (VlSiftFilt *f)
 Process next octave.
void vl_sift_detect (VlSiftFilt *f)
 Detect keypoints.
int vl_sift_calc_keypoint_orientations (VlSiftFilt *f, double angles[4], VlSiftKeypoint const *k)
 Calculate the keypoint orientation(s)
void vl_sift_calc_keypoint_descriptor (VlSiftFilt *f, vl_sift_pix *descr, VlSiftKeypoint const *k, double angle)
 Compute the descriptor of a keypoint.
void vl_sift_calc_raw_descriptor (VlSiftFilt const *f, vl_sift_pix const *image, vl_sift_pix *descr, int width, int height, double x, double y, double s, double angle0)
 Run the SIFT descriptor on raw data.
void vl_sift_keypoint_init (VlSiftFilt const *f, VlSiftKeypoint *k, double x, double y, double sigma)
 Initialize a keypoint from its position and scale.
Retrieve data and parameters
int vl_sift_get_octave_index (VlSiftFilt const *f)
 Get current octave index.
int vl_sift_get_noctaves (VlSiftFilt const *f)
 Get number of octaves.
int vl_sift_get_octave_first (VlSiftFilt const *f)
 Get first octave.
int vl_sift_get_octave_width (VlSiftFilt const *f)
 Get current octave width.
int vl_sift_get_octave_height (VlSiftFilt const *f)
 Get current octave height.
int vl_sift_get_nlevels (VlSiftFilt const *f)
 Get number of levels per octave.
int vl_sift_get_nkeypoints (VlSiftFilt const *f)
 Get number of keypoints.
double vl_sift_get_peak_thresh (VlSiftFilt const *f)
 Get peaks threshold.
double vl_sift_get_edge_thresh (VlSiftFilt const *f)
 Get edges threshold.
double vl_sift_get_norm_thresh (VlSiftFilt const *f)
 Get norm threshold.
double vl_sift_get_magnif (VlSiftFilt const *f)
 Get the magnification factor.
double vl_sift_get_window_size (VlSiftFilt const *f)
 Get the Gaussian window size.
vl_sift_pix * vl_sift_get_octave (VlSiftFilt const *f, int s)
 Get current octave data.
VlSiftKeypoint const * vl_sift_get_keypoints (VlSiftFilt const *f)
 Get keypoints.
Set parameters
void vl_sift_set_peak_thresh (VlSiftFilt *f, double t)
 Set peaks threshold.
void vl_sift_set_edge_thresh (VlSiftFilt *f, double t)
 Set edges threshold.
void vl_sift_set_norm_thresh (VlSiftFilt *f, double t)
 Set norm threshold.
void vl_sift_set_magnif (VlSiftFilt *f, double m)
 Set the magnification factor.
void vl_sift_set_window_size (VlSiftFilt *f, double x)
 Set the Gaussian window size.

Detailed Description

Author:
Andrea Vedaldi
Credits:
Many people have contributed suggestions and bug reports. Although the following list is certainly incomplete, we would like to thank: Brian Fulkerson, Wei Dong, Loic, Giuseppe, Liu, Erwin, P. Ivanov, and Q. S. Luo.

Scale Invariant Feature Transform

This library module implements a SIFT filter object, a reusable object to extract SIFT features from one or multiple images of the same size.

Overview

A SIFT feature is a selected image region (also called keypoint) with an associated descriptor. Keypoints are extracted by the SIFT detector and their descriptors are computed by the SIFT descriptor. It is also common to use independently the SIFT detector (i.e. computing the keypoints without descriptors) or the SIFT descriptor (i.e. computing descriptors of custom keypoints).

SIFT detector

See also:
Scale space technical details, Detector technical details

A SIFT keypoint is a circular image region with an orientation. It is described by a geometric frame of four parameters: the keypoint center coordinates x and y, its scale (the radius of the region), and its orientation (an angle expressed in radians). The SIFT detector uses as keypoints image structures which resemble “blobs”. By searching for blobs at multiple scales and positions, the SIFT detector is invariant (or, more accurately, covariant) to translation, rotations, and rescaling of the image.

The keypoint orientation is also determined from the local image appearance and is covariant to image rotations. Depending on the symmetry of the keypoint appearance, determining the orientation can be ambiguous. In this case, the SIFT detector returns a list of up to four possible orientations, constructing up to four frames (differing only by their orientation) for each detected image blob.

sift-frame.png

SIFT keypoints are circular image regions with an orientation.

There are several parameters that influence the detection of SIFT keypoints. First, searching keypoints at multiple scales is obtained by constructing a so-called “Gaussian scale space”. The scale space is just a collection of images obtained by progressively smoothing the input image, which is analogous to gradually reducing the image resolution. Conventionally, the smoothing level is called scale of the image. The construction of the scale space is influenced by the following parameters, set when creating the SIFT filter object by vl_sift_new():

  • Number of octaves. Increasing the scale by an octave means doubling the size of the smoothing kernel, whose effect is roughly equivalent to halving the image resolution. By default, the scale space spans as many octaves as possible (i.e. roughly log2(min(width,height))), which has the effect of searching keypoints of all possible sizes.
  • First octave index. By convention, the octave of index 0 starts with the image full resolution. Specifying an index greater than 0 starts the scale space at a lower resolution (e.g. 1 halves the resolution). Similarly, specifying a negative index starts the scale space at a higher resolution image, and can be useful to extract very small features (since this is obtained by interpolating the input image, it does not make much sense to go past -1).
  • Number of levels per octave. Each octave is sampled at this given number of intermediate scales (by default 3). Increasing this number might in principle return more refined keypoints, but in practice can make their selection unstable due to noise (see [1]).

Keypoints are further refined by eliminating those that are likely to be unstable, either because they are selected near an image edge rather than an image blob, or because they are found on image structures with low contrast. Filtering is controlled by the following parameters:

  • Peak threshold. This is the minimum amount of contrast to accept a keypoint. It is set by configuring the SIFT filter object by vl_sift_set_peak_thresh().
  • Edge threshold. This is the edge rejection threshold. It is set by configuring the SIFT filter object by vl_sift_set_edge_thresh().
Summary of the parameters influencing the SIFT detector.
Parameter                          See also       Controlled by            Comment
number of octaves                  SIFT detector  vl_sift_new
first octave index                 SIFT detector  vl_sift_new              set to -1 to extract very small features
number of scale levels per octave  SIFT detector  vl_sift_new              can affect the number of extracted keypoints
edge threshold                     SIFT detector  vl_sift_set_edge_thresh  decrease to eliminate more keypoints
peak threshold                     SIFT detector  vl_sift_set_peak_thresh  increase to eliminate more keypoints

SIFT Descriptor

See also:
Descriptor technical details

A SIFT descriptor is a 3-D spatial histogram of the image gradients characterizing the appearance of a keypoint. The gradient at each pixel is regarded as a sample of a three-dimensional elementary feature vector, formed by the pixel location and the gradient orientation. Samples are weighed by the gradient norm and accumulated in a 3-D histogram h, which (up to normalization and clamping) forms the SIFT descriptor of the region. An additional Gaussian weighting function is applied to give less importance to gradients farther away from the keypoint center. Orientations are quantized into eight bins and the spatial coordinates into four each, as follows:

sift-descr-easy.png

The SIFT descriptor is a spatial histogram of the image gradient.

SIFT descriptors are computed by either calling vl_sift_calc_keypoint_descriptor or vl_sift_calc_raw_descriptor. They accept as input a keypoint frame, which specifies the descriptor center, its size, and its orientation on the image plane. The following parameters influence the descriptor calculation:

  • magnification factor. The descriptor size is determined by multiplying the keypoint scale by this factor. It is set by vl_sift_set_magnif.
  • Gaussian window size. The descriptor support is determined by a Gaussian window, which discounts gradient contributions farther away from the descriptor center. The standard deviation of this window is set by vl_sift_set_window_size and expressed in units of spatial bins.

VLFeat SIFT descriptor uses the following convention. The y axis points downwards and angles are measured clockwise (to be consistent with the standard image convention). The 3-D histogram (consisting of $ 8 \times 4 \times 4 = 128 $ bins) is stacked as a single 128-dimensional vector, where the fastest varying dimension is the orientation and the slowest the y spatial coordinate. This is illustrated by the following figure.

sift-conv-vlfeat.png

VLFeat conventions

Note:
The convention used by D. Lowe's SIFT implementation for keypoints (frames) is slightly different: the y axis points upwards and the angles are measured counter-clockwise.
sift-conv.png

D. Lowe's SIFT implementation conventions

Summary of the parameters influencing the SIFT descriptor.
Parameter             See also         Controlled by            Comment
magnification factor  SIFT Descriptor  vl_sift_set_magnif       increase this value to enlarge the image region described
Gaussian window size  SIFT Descriptor  vl_sift_set_window_size  smaller values let the center of the descriptor count more

Extensions

Eliminating low-contrast descriptors. Near-uniform patches do not yield stable keypoints or descriptors. vl_sift_set_norm_thresh() can be used to set a threshold on the average norm of the local gradient to zero-out descriptors that correspond to very low contrast regions. By default, the threshold is equal to zero, which means that no descriptor is zeroed. Normally this option is useful only with custom keypoints, as detected keypoints are implicitly selected at high contrast image regions.

Using the SIFT filter object

The code provided in this module can be used in different ways. You can instantiate and use a SIFT filter to extract both SIFT keypoints and descriptors from one or multiple images. Alternatively, you can use one of the low level functions to run only a part of the SIFT algorithm (for instance, to compute the SIFT descriptors of custom keypoints).

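To use a SIFT filter object:

  • Create the filter with vl_sift_new(). The same filter can be reused for multiple images of the same size.
  • Compute the first octave of the scale space with vl_sift_process_first_octave(), and each following octave with vl_sift_process_next_octave(), until VL_ERR_EOF is returned.
  • For each octave, run vl_sift_detect() and retrieve the keypoints with vl_sift_get_keypoints() and vl_sift_get_nkeypoints().
  • For each keypoint, obtain its orientations with vl_sift_calc_keypoint_orientations() and, for each orientation, its descriptor with vl_sift_calc_keypoint_descriptor().
  • Delete the filter with vl_sift_delete().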

To compute SIFT descriptors of custom keypoints, use vl_sift_calc_raw_descriptor().

Technical details

Scale space

In order to search for image blobs at multiple scales, the SIFT detector constructs a scale space, defined as follows. Let $I_0(\mathbf{x})$ denote an idealized infinite resolution image. Consider the Gaussian kernel

\[ g_{\sigma}(\mathbf{x}) = \frac{1}{2\pi\sigma^2} \exp \left( -\frac{1}{2} \frac{\mathbf{x}^\top\mathbf{x}}{\sigma^2} \right) \]

The Gaussian scale space is the collection of smoothed images

\[ I_\sigma = g_\sigma * I_0, \quad \sigma \geq 0. \]

The image at infinite resolution $ I_0 $ is useful conceptually, but is not available to us; instead, the input image $ I_{\sigma_n} $ is assumed to be pre-smoothed at a nominal level $ \sigma_n = 0.5 $ to account for the finite resolution of the pixels. Thus in practice the scale space is computed by

\[ I_\sigma = g_{\sqrt{\sigma^2 - \sigma_n^2}} * I_{\sigma_n}, \quad \sigma \geq \sigma_n. \]

Scales are sampled at logarithmic steps given by

\[ \sigma = \sigma_0 2^{o+s/S}, \quad s = 0,\dots,S-1, \quad o = o_{\min}, \dots, o_{\min}+O-1, \]

where $ \sigma_0 = 1.6 $ is the base scale, $ o_{\min} $ is the first octave index, O the number of octaves and S the number of scales per octave.

Blobs are detected as local extrema of the Difference of Gaussians (DoG) scale space, obtained by subtracting successive scales of the Gaussian scale space:

\[ \mathrm{DoG}_{\sigma(o,s)} = I_{\sigma(o,s+1)} - I_{\sigma(o,s)} \]

At each next octave, the resolution of the images is halved to save computations. The images composing the Gaussian and DoG scale space can then be arranged as in the following figure:

sift-ss.png

GSS and DoG scale space structures.

The black vertical segments represent images of the Gaussian Scale Space (GSS), arranged by increasing scale $\sigma$. Notice that the scale level index s varies in a slightly redundant set

\[ s = -1, \dots, S+2 \]

This simplifies glueing together different octaves and extracting DoG maxima (required by the SIFT detector).

Detector

The SIFT frames (keypoints) are extracted based on local extrema (peaks) of the DoG scale space. Numerically, local extrema are elements whose neighbors in a $ 3 \times 3 \times 3 $ neighborhood (in space and scale) all have smaller (or larger) values. Once extracted, local extrema are quadratically interpolated (this is especially important at the lower resolution scales in order to have accurate keypoint localization at the full resolution). Finally, they are filtered to eliminate low-contrast responses or responses close to edges, and the orientation(s) are assigned, as explained next.

Eliminating low contrast responses

Peaks which are too short may have been generated by noise and are discarded. This is done by comparing the absolute value of the DoG scale space at the peak with the peak threshold $t_p$ and discarding the peak if its value is below the threshold.

Eliminating edge responses

Peaks which are too flat are often generated by edges and do not yield stable features. These peaks are detected and removed as follows. Given a peak $x,y,\sigma$, the algorithm evaluates the x,y Hessian of the DoG scale space at the scale $\sigma$. Then the following score (similar to the Harris function) is computed:

\[ \frac{(\mathrm{tr}\,D(x,y,\sigma))^2}{\det D(x,y,\sigma)}, \quad D = \left[ \begin{array}{cc} \frac{\partial^2 \mathrm{DoG}}{\partial x^2} & \frac{\partial^2 \mathrm{DoG}}{\partial x\partial y} \\ \frac{\partial^2 \mathrm{DoG}}{\partial x\partial y} & \frac{\partial^2 \mathrm{DoG}}{\partial y^2} \end{array} \right]. \]

This score has a minimum (equal to 4) when both eigenvalues of the Hessian $D$ are equal (curved peak) and increases as one of the eigenvalues grows and the other stays small. Peaks are retained if the score is below the quantity $(t_e+1)^2/t_e$, where $t_e$ is the edge threshold. Notice that this quantity has a minimum equal to 4 when $t_e=1$ and grows thereafter. Therefore the range of the edge threshold is $[1,\infty)$.
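The edge test can be sketched in C as follows. This is an illustration only and not the library code: the function name passes_edge_test is hypothetical, and rejecting peaks with a non-positive determinant is an added assumption (the score is not meaningful in that case).

```c
#include <assert.h>

/* Hypothetical helper (not a VLFeat function). dxx, dyy and dxy are the
 * entries of the 2x2 Hessian D of the DoG at the peak, te is the edge
 * threshold (te >= 1). Returns 1 if the peak is retained, 0 otherwise. */
int passes_edge_test(double dxx, double dyy, double dxy, double te)
{
    double tr, det, score, bound;

    assert(te >= 1.0);
    tr  = dxx + dyy;
    det = dxx * dyy - dxy * dxy;

    /* Assumption: saddle points and degenerate peaks are rejected. */
    if (det <= 0.0) return 0;

    score = tr * tr / det;                 /* (tr D)^2 / det D  */
    bound = (te + 1.0) * (te + 1.0) / te;  /* (te + 1)^2 / te   */
    return score < bound;
}
```

For example, with te = 10 the bound is 12.1: a peak with equal curvatures (score 4) is retained, while a peak whose curvatures are in a 20:1 ratio (score 22.05) is discarded.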

Orientation assignment

A peak in the DoG scale space fixes two parameters of the keypoint: the position and the scale. It remains to choose an orientation. In order to do this, SIFT computes a histogram of the gradient orientations in a Gaussian window whose standard deviation is 1.5 times the scale $\sigma$ of the keypoint.

sift-orient.png

This histogram is then smoothed and the maximum is selected. In addition to the biggest mode, up to three other modes whose amplitude is within 80% of the biggest mode are retained and returned as additional orientations.
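The mode selection can be sketched in C as follows, assuming the orientation histogram has already been computed and smoothed. This is a simplified illustration and not the library code: the function name select_orientation_bins is hypothetical, the histogram is treated as circular, and the function returns bin indexes rather than refined angles.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORIENTATIONS 4

/* Hypothetical helper (not a VLFeat function). Writes to bins_out the
 * index of the biggest bin followed by up to three other local maxima
 * whose value is at least 80% of the biggest one. Returns the number of
 * bins written (0 if the histogram is empty). */
int select_orientation_bins(const double *hist, int nbins,
                            int bins_out[MAX_ORIENTATIONS])
{
    int i, best = 0, count = 0;

    assert(hist != NULL && bins_out != NULL && nbins >= 3);

    for (i = 1; i < nbins; ++i)
        if (hist[i] > hist[best]) best = i;
    if (hist[best] <= 0.0) return 0;

    bins_out[count++] = best;
    for (i = 0; i < nbins && count < MAX_ORIENTATIONS; ++i) {
        double prev = hist[(i + nbins - 1) % nbins]; /* circular neighbors */
        double next = hist[(i + 1) % nbins];
        if (i == best) continue;
        if (hist[i] > prev && hist[i] > next && hist[i] >= 0.8 * hist[best])
            bins_out[count++] = i;
    }
    return count;
}
```

A bin index i can then be converted to an angle, for instance as $ 2\pi i / n $ for n bins; an actual implementation would typically also interpolate the peak position between bins.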

Descriptor

A SIFT descriptor of a local region (keypoint) is a 3-D spatial histogram of the image gradients. The gradient at each pixel is regarded as a sample of a three-dimensional elementary feature vector, formed by the pixel location and the gradient orientation. Samples are weighed by the gradient norm and accumulated in a 3-D histogram h, which (up to normalization and clamping) forms the SIFT descriptor of the region. An additional Gaussian weighting function is applied to give less importance to gradients farther away from the keypoint center.

Construction in the canonical frame

Denote the gradient vector field computed at the scale $ \sigma $ by

\[ J(x,y) = \nabla I_\sigma(x,y) = \left[\begin{array}{cc} \frac{\partial I_\sigma}{\partial x} & \frac{\partial I_\sigma}{\partial y} \end{array}\right] \]

The descriptor is a 3-D spatial histogram capturing the distribution of $ J(x,y) $. It is convenient to describe its construction in the canonical frame. In this frame, the image and descriptor axes coincide and each spatial bin has side 1. The histogram has $ N_\theta \times N_x \times N_y $ bins (usually $ 8 \times 4 \times 4 $), as in the following figure:

sift-can.png

Canonical SIFT descriptor and spatial binning functions

Bins are indexed by a triplet of indexes t, i, j and their centers are given by

\begin{eqnarray*} \theta_t &=& \frac{2\pi}{N_\theta} t, \quad t = 0,\dots,N_{\theta}-1, \\ x_i &=& i - \frac{N_x -1}{2}, \quad i = 0,\dots,N_x-1, \\ y_j &=& j - \frac{N_y -1}{2}, \quad j = 0,\dots,N_y-1. \end{eqnarray*}

The histogram is computed by using trilinear interpolation, i.e. by weighing contributions by the binning functions

\begin{eqnarray*} \displaystyle w(z) &=& \mathrm{max}(0, 1 - |z|), \\ \displaystyle w_\mathrm{ang}(z) &=& \sum_{k=-\infty}^{+\infty} w\left( \frac{N_\theta}{2\pi} z + N_\theta k \right). \end{eqnarray*}

The gradient vector field is transformed into a three-dimensional density map of weighed contributions

\[ f(\theta, x, y) = |J(x,y)| \delta(\theta - \angle J(x,y)) \]

The histogram is localized in the keypoint support by a Gaussian window of standard deviation $ \sigma_{\mathrm{win}} $. The histogram is then given by

\begin{eqnarray*} h(t,i,j) &=& \int g_{\sigma_\mathrm{win}}(x,y) w_\mathrm{ang}(\theta - \theta_t) w(x-x_i) w(y-y_j) f(\theta,x,y) d\theta\,dx\,dy \\ &=& \int g_{\sigma_\mathrm{win}}(x,y) w_\mathrm{ang}(\angle J(x,y) - \theta_t) w(x-x_i) w(y-y_j) |J(x,y)|\,dx\,dy \end{eqnarray*}

In post processing, the histogram is $ l^2 $ normalized, then clamped at 0.2, and $ l^2 $ normalized again.

Calculation in the image frame

Invariance to similarity transformation is attained by attaching descriptors to SIFT keypoints (or other similarity-covariant frames). Then projecting the image in the canonical descriptor frames has the effect of undoing the image deformation.

In practice, however, it is convenient to compute the descriptor directly in the image frame. To do this, denote with a hat quantities relative to the canonical frame and without a hat quantities relative to the image frame (so for instance $ \hat x $ is the x-coordinate in the canonical frame and $ x $ the x-coordinate in the image frame). Assume that canonical and image frame are related by an affinity:

\[ \mathbf{x} = A \hat\mathbf{x} + T, \qquad \mathbf{x} = \left[\begin{array}{c} x \\ y \end{array}\right], \quad \hat\mathbf{x} = \left[\begin{array}{c} \hat x \\ \hat y \end{array}\right]. \]

sift-image-frame.png

Then all quantities can be computed in the image frame directly. For instance, the images at infinite resolution in the two frames are related by

\[ \hat I_0(\hat\mathbf{x}) = I_0(\mathbf{x}), \qquad \mathbf{x} = A \hat\mathbf{x} + T. \]

The canonized image at scale $ \hat \sigma $ is in relation with the scaled image

\[ \hat I_{\hat{\sigma}}(\hat\mathbf{x}) = I_{A\hat{\sigma}}(\mathbf{x}), \qquad \mathbf{x} = A \hat\mathbf{x} + T \]

where, by generalizing the previous definitions, we have

\[ I_{A\hat \sigma}(\mathbf{x}) = (g_{A\hat\sigma} * I_0)(\mathbf{x}), \quad g_{A\hat\sigma}(\mathbf{x}) = \frac{1}{2\pi|A|\hat \sigma^2} \exp \left( -\frac{1}{2} \frac{\mathbf{x}^\top A^{-\top}A^{-1}\mathbf{x}}{\hat \sigma^2} \right) \]

Differentiating shows that the gradient fields are related by

\[ \hat J(\hat \mathbf{x}) = J(\mathbf{x}) A, \quad J(\mathbf{x}) = (\nabla I_{A\hat\sigma})(\mathbf{x}), \qquad \mathbf{x} = A \hat\mathbf{x} + T. \]

Therefore we can compute the descriptor either in the image or canonical frame as:

\begin{eqnarray*} h(t,i,j) &=& \int g_{\hat \sigma_\mathrm{win}}(\hat \mathbf{x})\, w_\mathrm{ang}(\angle \hat J(\hat\mathbf{x}) - \theta_t)\, w_{ij}(\hat\mathbf{x})\, |\hat J(\hat \mathbf{x})|\, d\hat \mathbf{x} \\ &=& \int g_{A \hat \sigma_\mathrm{win}}(\mathbf{x} - T)\, w_\mathrm{ang}(\angle J(\mathbf{x})A - \theta_t)\, w_{ij}(A^{-1}(\mathbf{x} - T))\, |J(\mathbf{x})A|\, d\mathbf{x}. \end{eqnarray*}

where we defined the product of the two spatial binning functions

\[ w_{ij}(\hat\mathbf{x}) = w(\hat x - \hat x_i) w(\hat y - \hat y_j) \]

In the actual implementation, this integral is computed by visiting a rectangular area of the image that fully contains the keypoint grid (along with half a bin border to fully include the bin windowing function). Since the descriptor can be rotated, this area is a rectangle of sides $m/2\sqrt{2} (N_x+1,N_y+1)$ (see also the illustration).

Standard SIFT descriptor

For a SIFT-detected keypoint of center $ T $, scale $ \sigma $ and orientation $ \theta $, the affine transformation $ (A,T) $ reduces to the similarity transformation

\[ \mathbf{x} = m \sigma R(\theta) \hat \mathbf{x} + T \]

where $ R(\theta) $ is a counter-clockwise rotation of $ \theta $ radians, $ m \sigma $ is the size of a descriptor bin in pixels, and m is the descriptor magnification factor which expresses how much larger a descriptor bin is compared to the scale of the keypoint $ \sigma $ (the default value is m = 3). Moreover, the standard SIFT descriptor computes the image gradient at the scale of the keypoints, which in the canonical frame is equivalent to a smoothing of $ \hat \sigma = 1/m $. Finally, the default Gaussian window size is set to have standard deviation $ \hat \sigma_\mathrm{win} = 2 $. This yields the formula

\begin{eqnarray*} h(t,i,j) &=& m \sigma \int g_{\sigma_\mathrm{win}}(\mathbf{x} - T)\, w_\mathrm{ang}(\angle J(\mathbf{x}) - \theta - \theta_t)\, w_{ij}\left(\frac{R(\theta)^\top (\mathbf{x} - T)}{m\sigma}\right)\, |J(\mathbf{x})|\, d\mathbf{x}, \\ \sigma_{\mathrm{win}} &=& m\sigma\hat \sigma_{\mathrm{win}}, \\ J(\mathbf{x}) &=& \nabla (g_{m \sigma \hat \sigma} * I)(\mathbf{x}) = \nabla (g_{\sigma} * I)(\mathbf{x}) = \nabla I_{\sigma} (\mathbf{x}). \end{eqnarray*}


Function Documentation

void vl_sift_calc_keypoint_descriptor ( VlSiftFilt *  f,
vl_sift_pix *  descr,
VlSiftKeypoint const *  k,
double  angle0 
)
Parameters:
f  SIFT filter.
descr  SIFT descriptor (output).
k  keypoint.
angle0  keypoint direction.

The function computes the SIFT descriptor of the keypoint k of orientation angle0. The function fills the buffer descr which must be large enough to hold the descriptor.

The function assumes that the keypoint is on the current octave. If not, it does not do anything.

int vl_sift_calc_keypoint_orientations ( VlSiftFilt *  f,
double  angles[4],
VlSiftKeypoint const *  k 
)
Parameters:
f  SIFT filter.
angles  orientations (output).
k  keypoint.

The function computes the orientation(s) of the keypoint k. The function returns the number of orientations found (up to four). The orientations themselves are written to the vector angles.

Remarks:
The function requires the keypoint octave k->o to be equal to the filter current octave (see vl_sift_get_octave_index()). If this is not the case, the function returns zero orientations.
The function requires the keypoint scale level k->s to be between s_min+1 and s_max-2 (where usually s_min=-1 and s_max=S+2). If this is not the case, the function returns zero orientations.
Returns:
number of orientations found.
void vl_sift_calc_raw_descriptor ( VlSiftFilt const *  f,
vl_sift_pix const *  grad,
vl_sift_pix *  descr,
int  width,
int  height,
double  x,
double  y,
double  sigma,
double  angle0 
)
Parameters:
f  SIFT filter.
grad  image gradients.
descr  SIFT descriptor (output).
width  image width.
height  image height.
x  keypoint x coordinate.
y  keypoint y coordinate.
sigma  keypoint scale.
angle0  keypoint orientation.

The function runs the SIFT descriptor on raw data. Here grad is a 2 x width x height array (by convention, the memory layout is such that the first index is the fastest varying one). The first width x height layer of the array contains the gradient magnitude and the second the gradient angle (in radians, between 0 and $ 2\pi $). x, y and sigma give the keypoint center and scale respectively.

In order to be equivalent to a standard SIFT descriptor, the image gradient must be computed at a smoothing level equal to the scale of the keypoint. In practice, the actual SIFT algorithm makes the following additional approximations, which influence the result:

  • Scale is discretized in S levels.
  • The image is downsampled once for each octave (if you do this, the parameters x, y and sigma must be scaled too).
void vl_sift_delete ( VlSiftFilt *  f )
Parameters:
f  SIFT filter to delete.

The function frees the resources allocated by vl_sift_new().

void vl_sift_detect ( VlSiftFilt *  f )

The function detects keypoints in the current octave, filling the internal keypoint buffer. Keypoints can be retrieved by vl_sift_get_keypoints().

Parameters:
f  SIFT filter.


double vl_sift_get_edge_thresh ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
threshold.
VlSiftKeypoint const * vl_sift_get_keypoints ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
pointer to the keypoints list.
double vl_sift_get_magnif ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
magnification factor.
int vl_sift_get_nkeypoints ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
number of keypoints.
int vl_sift_get_nlevels ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
number of levels per octave.
int vl_sift_get_noctaves ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
number of octaves.
double vl_sift_get_norm_thresh ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
threshold.
vl_sift_pix * vl_sift_get_octave ( VlSiftFilt const *  f,
int  s 
) [inline]
Parameters:
f  SIFT filter.
s  level index.

The level index s ranges from s_min = -1 to s_max = S + 2, where S is the number of levels per octave.

Returns:
pointer to the octave data for level s.
int vl_sift_get_octave_first ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
index of the first octave.
int vl_sift_get_octave_height ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
current octave height.
int vl_sift_get_octave_index ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
index of the current octave.
int vl_sift_get_octave_width ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
current octave width.
double vl_sift_get_peak_thresh ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
threshold.
double vl_sift_get_window_size ( VlSiftFilt const *  f ) [inline]
Parameters:
f  SIFT filter.
Returns:
standard deviation of the Gaussian window (in spatial bin units).
void vl_sift_keypoint_init ( VlSiftFilt const *  f,
VlSiftKeypoint *  k,
double  x,
double  y,
double  sigma 
)
Parameters:
f  SIFT filter.
k  SIFT keypoint (output).
x  x coordinate of the keypoint center.
y  y coordinate of the keypoint center.
sigma  keypoint scale.

The function initializes a keypoint structure k from the location x and y and the scale sigma of the keypoint. The keypoint structure maps the keypoint to an octave and scale level of the discretized Gaussian scale space, which is required for instance to compute the keypoint SIFT descriptor.

Algorithm

The formula linking the keypoint scale sigma to the octave and scale indexes is

\[ \sigma(o,s) = \sigma_0 2^{o+s/S} \]

In addition to the scale index s (which can be fractional due to scale interpolation), a keypoint also has an integer scale index is (the index of the scale level where it was detected in the DoG scale space). We have the following constraints (see also the Detector section):

  • o is integer in the range $ [o_\mathrm{min}, o_{\mathrm{min}}+O-1] $.
  • is is integer in the range $ [s_\mathrm{min}+1, s_\mathrm{max}-2] $. This depends on how the scale is determined during detection, and must be so here because gradients are computed only for this range of scale levels and are required for the calculation of the SIFT descriptor.
  • $ |s - is| < 0.5 $ for detected keypoints in most cases due to the interpolation technique used during detection. However this is not necessary.

Thus octave o represents scales $ \{ \sigma(o, s) : s \in [s_\mathrm{min}+1-.5, s_\mathrm{max}-2+.5] \} $. Note that some scales may be represented more than once. For each scale, we select the largest possible octave that contains it, i.e.

\[ o(\sigma) = \max \{ o \in \mathbb{Z} : \sigma_0 2^{o + \frac{s_\mathrm{min}+1-.5}{S}} \leq \sigma \} = \mathrm{floor}\,\left[ \log_2(\sigma / \sigma_0) - \frac{s_\mathrm{min}+1-.5}{S}\right] \]

and then

\[ s(\sigma) = S \left[\log_2(\sigma / \sigma_0) - o(\sigma)\right], \quad is(\sigma) = \mathrm{round}\,(s(\sigma)) \]

In practice, both $ o(\sigma) $ and $ is(\sigma) $ are clamped to their feasible range as determined by the SIFT filter parameters.

VlSiftFilt* vl_sift_new ( int  width,
int  height,
int  noctaves,
int  nlevels,
int  o_min 
)
Parameters:
width  image width.
height  image height.
noctaves  number of octaves.
nlevels  number of levels per octave.
o_min  first octave index.

The function allocates and returns a new SIFT filter for the specified image and scale space geometry.

Setting noctaves to a negative value sets the number of octaves to the maximum possible value, which depends on the size of the image.

Returns:
the new SIFT filter.
See also:
vl_sift_delete().
int vl_sift_process_first_octave ( VlSiftFilt *  f,
vl_sift_pix const *  im 
)
Parameters:
f  SIFT filter.
im  image data.

The function starts processing a new image by computing its Gaussian scale space at the lowest octave. It also empties the internal keypoint buffer.

Returns:
error code. The function returns VL_ERR_EOF if there are no more octaves to process.
See also:
vl_sift_process_next_octave().
int vl_sift_process_next_octave ( VlSiftFilt *  f )
Parameters:
f  SIFT filter.

The function computes the next octave of the Gaussian scale space. Notice that this clears the record of any feature detected in the previous octave.

Returns:
error code. The function returns the error VL_ERR_EOF when there are no more octaves to process.
See also:
vl_sift_process_first_octave().
void vl_sift_set_edge_thresh ( VlSiftFilt *  f,
double  t 
) [inline]
Parameters:
f  SIFT filter.
t  threshold.
void vl_sift_set_magnif ( VlSiftFilt *  f,
double  m 
) [inline]
Parameters:
f  SIFT filter.
m  magnification factor.
void vl_sift_set_norm_thresh ( VlSiftFilt *  f,
double  t 
) [inline]
Parameters:
f  SIFT filter.
t  threshold.
void vl_sift_set_peak_thresh ( VlSiftFilt *  f,
double  t 
) [inline]
Parameters:
f  SIFT filter.
t  threshold.
void vl_sift_set_window_size ( VlSiftFilt *  f,
double  x 
) [inline]
Parameters:
f  SIFT filter.
x  Gaussian window size (in units of spatial bin).

This is the parameter $ \hat \sigma_{\text{win}} $ of the standard SIFT descriptor (see Standard SIFT descriptor).