Robust Image Matching Algorithm Using SIFT on Multiple Layered Strategies

As for the unsatisfactory accuracy of SIFT (scale-invariant feature transform) in complicated image matching, a novel matching method based on multiple layered strategies is proposed in this paper. First, the coarse data sets are filtered by Euclidean distance. Next, a geometric feature consistency constraint is adopted to refine the corresponding feature points, discarding the points with uncoordinated slope values. Third, a scale and orientation clustering constraint method is proposed to choose the matching points precisely; the scale and orientation differences are employed as the elements of k-means clustering in this method. Thus, two sets of feature points and the refined data set are obtained. Finally, the 3σ rule of the refined data set is used to search all the remaining points. Our multiple layered strategies make full use of feature constraint rules to improve the matching accuracy of the SIFT algorithm. The proposed matching method is compared to the traditional SIFT descriptor in various tests. The experimental results show that the proposed method outperforms the traditional SIFT algorithm with respect to correct ratio and repeatability.


Introduction
Feature point extraction and registration is an important task in computer vision fields such as target recognition, image stitching, 3D reconstruction, and target tracking. Recently, feature vector descriptors based on local invariant information [1][2][3] have been widely used in computer vision. The main idea of image registration is to extract many feature points and generate feature vectors from local information. First, feature points are extracted by different methods, for instance, the Harris corner operator, the SUSAN detection operator, or the SIFT descriptor. Then, a feature descriptor is generated for each candidate point. Next, constraint rules are used to check whether these feature descriptors are correctly matched pairs. Thus, corresponding pairs of feature points are obtained, achieving the goal of image registration.
To test the performance of feature descriptors widely used in computer vision, many experiments were carried out by Mikolajczyk and Schmid [4]. In these experiments, the SIFT method revealed more satisfactory performance and better robustness than the other descriptors.
Lowe [2] presented the SIFT descriptor, which is invariant to several transformations, including rotation and translation changes. Due to these merits, the SIFT descriptor has been widely utilized in target tracking, recognition, and other computer vision fields. In these fields, the SIFT descriptor is first used to extract stable feature points and to generate feature descriptors from the local context of the image. Then, corresponding pairs of feature points must be found via various matching methods. Corresponding points with high precision are evidently the basis of further applications; therefore, improving the matching performance is clearly important.
In the past, many scholars have presented various types of improved matching algorithms. Wang et al. [5] proposed a new method based on slope value and distance. The distances and slope values of matching points are calculated, the maximum of the statistics is found, and a certain value with regard to this maximum is used to filter out mismatching points. Though this method has achieved satisfactory results in eye fundus image matching, the performance

2. Introduction of SIFT Descriptor
The SIFT descriptor, based on multiple scale spaces, was presented by Lowe in 2004. The approach is divided into four stages, described as follows [2].
2.1. Scale-Space Extreme Detection. First of all, images between two adjacent octaves are downsampled by a factor of 2. Gaussian kernels of multiple widths are adopted to smooth the images belonging to different octaves; thus, the Gaussian pyramid is established. The DoG (difference-of-Gaussians) pyramid is generated by subtracting adjacent scales of the Gaussian pyramid within the same octave:

L(x, y, σ) = G(x, y, σ) * I(x, y),
D(x, y, σ) = L(x, y, kσ) − L(x, y, σ),

where σ represents the scale factor, I(x, y) is the input image, * is the convolution operation in x and y, and G(x, y, σ) is a Gaussian function with scale-space kernel σ.
To detect extreme points in the scale space, each pixel is compared with its 26 neighbors in a 3 × 3 × 3 cube consisting of three adjacent intervals belonging to the same octave. The pixel is chosen as a candidate point on condition that it is a local extremum within this detection cube.
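The two steps above, building one octave of the DoG stack and testing a pixel against its 26 scale-space neighbors, can be sketched as follows. This is a minimal numpy/scipy illustration, not the authors' implementation; the function names and parameter defaults are ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(img, num_layers=5, sigma0=1.6, k=2 ** 0.5):
    """One octave: smooth with Gaussians of increasing width, then
    difference adjacent layers: D_i = L(k^(i+1)*sigma0) - L(k^i*sigma0)."""
    img = np.asarray(img, dtype=np.float64)
    gauss = [gaussian_filter(img, sigma0 * k ** i) for i in range(num_layers)]
    return [gauss[i + 1] - gauss[i] for i in range(num_layers - 1)]

def is_local_extremum(dog, s, y, x):
    """Candidate test: compare the pixel with its 26 neighbors in the
    3 x 3 x 3 cube spanning intervals s-1, s, s+1 of the same octave."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dog[s - 1:s + 2]])
    c = dog[s][y, x]
    return bool(c == cube.max() or c == cube.min())
```

In a full implementation this test would be run for every interior pixel of every middle DoG layer, and each octave would be followed by a downsampling by 2.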

2.2. Keypoint Localization.
The next step is to perform a detailed fit to the nearby data for location, scale, and ratio of principal curvatures. Low-contrast points and unstable points with strong edge responses are discarded to improve the robustness of the keypoints.
First, a Taylor expansion of the scale-space function around each candidate point is adopted, and candidate points with low contrast are discarded:

D(X) = D + (∂D/∂X)ᵀ X + (1/2) Xᵀ (∂²D/∂X²) X,

where X = (x, y, σ)ᵀ is the offset from this point. The accurate position X̂ of the extreme keypoint is found by setting the derivative of D with respect to X to zero:

X̂ = −(∂²D/∂X²)⁻¹ (∂D/∂X).

Substituting X̂ into the expansion gives

D(X̂) = D + (1/2) (∂D/∂X)ᵀ X̂.

Candidate points for which the absolute value of D(X̂) is less than a certain threshold (0.04 in this paper) are abandoned.
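The localization step can be sketched numerically with finite differences over the (σ, y, x) cube around a candidate. This is an illustrative sketch under our own naming, using one Newton step rather than the iterated refinement a production implementation would apply.

```python
import numpy as np

def refine_candidate(dog, s, y, x, contrast_thresh=0.04):
    """One Newton step of the Taylor fit: offset X_hat = -H^-1 g and
    interpolated contrast D(X_hat) = D + 0.5 * g . X_hat.
    Returns (offset, contrast) or None when |D(X_hat)| < contrast_thresh."""
    P = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dog[s - 1:s + 2]]).astype(float)
    c = P[1, 1, 1]
    # gradient over (sigma, y, x) by central differences
    g = 0.5 * np.array([P[2, 1, 1] - P[0, 1, 1],
                        P[1, 2, 1] - P[1, 0, 1],
                        P[1, 1, 2] - P[1, 1, 0]])
    # Hessian by finite differences
    H = np.empty((3, 3))
    H[0, 0] = P[2, 1, 1] - 2 * c + P[0, 1, 1]
    H[1, 1] = P[1, 2, 1] - 2 * c + P[1, 0, 1]
    H[2, 2] = P[1, 1, 2] - 2 * c + P[1, 1, 0]
    H[0, 1] = H[1, 0] = 0.25 * (P[2, 2, 1] - P[2, 0, 1] - P[0, 2, 1] + P[0, 0, 1])
    H[0, 2] = H[2, 0] = 0.25 * (P[2, 1, 2] - P[2, 1, 0] - P[0, 1, 2] + P[0, 1, 0])
    H[1, 2] = H[2, 1] = 0.25 * (P[1, 2, 2] - P[1, 2, 0] - P[1, 0, 2] + P[1, 0, 0])
    offset = -np.linalg.solve(H, g)
    contrast = c + 0.5 * g.dot(offset)
    return None if abs(contrast) < contrast_thresh else (offset, contrast)
```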
To filter out the unstable points with strong responses along edges, the Hessian matrix is used:

H = [D_xx  D_xy; D_xy  D_yy].

If the value of Tr(H)²/Det(H) is less than a certain threshold TH, the point is retained as a candidate; otherwise, the unstable point with a strong response along an edge is discarded.
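The edge-response test can be sketched directly from finite differences of the DoG image. A hedged illustration with our own function name; the threshold is expressed through the curvature ratio r, with TH = (r + 1)²/r.

```python
import numpy as np

def passes_edge_test(dog_img, y, x, r=10.0):
    """Reject edge-like points: keep only if Tr(H)^2 / Det(H) < (r+1)^2 / r."""
    d = dog_img
    dxx = d[y, x + 1] - 2 * d[y, x] + d[y, x - 1]
    dyy = d[y + 1, x] - 2 * d[y, x] + d[y - 1, x]
    dxy = 0.25 * (d[y + 1, x + 1] - d[y + 1, x - 1] - d[y - 1, x + 1] + d[y - 1, x - 1])
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:  # principal curvatures of opposite sign (or degenerate): reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

An isotropic blob passes this test, while a straight ridge (one near-zero principal curvature) is rejected.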

2.3. Feature Descriptor Generation.
In this stage, each keypoint is assigned a principal orientation by computing the orientation histogram of the vicinity of the keypoint. This allows each keypoint to be represented relative to this orientation, achieving invariance to image rotation. The maximum value of the orientation histogram is then obtained; to get a more precise orientation assignment, the peaks of the histogram are interpolated from the adjacent bins. The original coordinate frame is rotated according to the main orientation, making the feature vector invariant to rotation changes. Each keypoint vector is established by calculating the gradient magnitude and orientation at each sample point in a region around the feature point. Finally, a feature descriptor with 128 elements is obtained for each feature point.

2.4. Matching.
In [2], the Euclidean distance criterion is selected as the rule to measure the matching extent between two feature vectors from the reference image and the unregistered one. The nearest neighbor is defined as the keypoint with minimum Euclidean distance to the invariant descriptor vector. To obtain a more accurate matching result, the ratio of the distance to the first-closest point to that of the second-closest point is used: if this ratio is lower than a certain value, the pair is accepted as corresponding points. The ratio of first-closest to second-closest neighbor distance is advised to be in the range [0.4, 0.8] [2]. In our experiments, the ratio is kept unchanged within the same group of tests.
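The ratio test can be sketched with a brute-force nearest-neighbor search (the paper later uses Best-Bin-First for speed; this simpler variant is ours, for illustration only).

```python
import numpy as np

def ratio_match(desc1, desc2, ratio=0.75):
    """Lowe's ratio test: accept a match only when the nearest neighbor is
    sufficiently closer than the second nearest. desc2 needs >= 2 rows."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, int(j1)))
    return matches
```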

3. Multiple Layered Strategies
In this section, a new method based on multiple layered strategies is presented. First, Euclidean distance is used to discard mismatching pairs. Then, a geometric feature consistency constraint is proposed to further filter the points, discarding error points with abnormal slope values. Finally, a new method based on a scale and orientation differences constraint is used to refine the matching points. With these steps, the process of image matching based on multiple layered strategies is accomplished.

3.1. Euclidean Distance Constraint.
After keypoint searching and feature descriptor generation, the Euclidean distance rule is used to filter the original data. Here, the Best-Bin-First (BBF) searching method is adopted, which returns the closest neighbor with high probability.
In [2], the threshold is set in the range [0.4, 0.8]; the threshold 0.75 is used in this paper as the first matching criterion. Consider the following:

d(a, b) = sqrt(Σ_{i=1}^{n} (a_i − b_i)²),  a = (a_1, a_2, . . ., a_n),  b = (b_1, b_2, . . ., b_n),

where n is the dimension of the feature vector used in the performance tests. In (7), vector a is one of the feature descriptors extracted from the reference image and b is one of the feature descriptors from the unregistered image.

3.2. Geometric Feature Consistency Constraint.
The geometric feature consistency constraint describes the relationship between the keypoints from the reference image and the unregistered image, including parallel, perpendicular, and similarity attributes. The purpose of image matching is to figure out the transformation parameters between the reference image and the unregistered image and then correct the coordinates of the input image into the reference coordinate frame. In the image matching stage, a similarity attribute still exists within a certain region of feature points, which is called the geometric feature consistency constraint.
In our past experiments, corresponding feature points matched by Euclidean distance alone still contained many mismatching pairs.
It is evident that the slope values of correct matching points converge into a data set, whereas mismatched feature points have clearly uncoordinated slope values in a certain region, belonging to the distorted statistics. Hence, distinct error points have uncoordinated slope values. Supposing that correct points have similar slope values in the region between the reference image and the unregistered image, the points which do not conform to the constraint rule can be discarded.
Feature points from the reference image after the Euclidean distance rule are stored in the image1 set:

image1 = {(x_1, y_1), (x_2, y_2), . . ., (x_n, y_n)}.

Feature points from the unregistered image after the Euclidean distance rule are stored in the image2 set:

image2 = {(x′_1, y′_1), (x′_2, y′_2), . . ., (x′_n, y′_n)},

where n is the number of feature points. The process of the geometric consistency constraint is as follows.
(1) First, the absolute value of the slope between corresponding feature points is obtained as

k_i = |(y_i − y′_i)/(x_i − x′_i)|,  i = 1, 2, . . ., n.

All these results are stored in another data set: k set = {k_1, k_2, . . ., k_n}.
(2) Then, the maximum of all these slope values, k_max, is obtained. (3) A threshold T with regard to the maximum value k_max is used to delete error points: all points conforming to the rule k_i < T · k_max are discarded from the original set. The results further refined by the geometric feature consistency constraint are stored in the data set k_sieved = {k_1, k_2, . . ., k_m}.
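Steps (1)-(3) can be sketched as follows. This is our own reconstruction of the rule from the text: the slope of the line joining each matched pair is taken, and pairs whose absolute slope falls below T · k_max are dropped; the threshold T is hypothetical.

```python
import numpy as np

def slope_filter(pts1, pts2, T=0.5):
    """Geometric feature consistency constraint (sketch): discard matched
    pairs whose absolute slope k_i satisfies k_i < T * k_max."""
    pts1, pts2 = np.asarray(pts1, float), np.asarray(pts2, float)
    dx = pts2[:, 0] - pts1[:, 0]
    dy = pts2[:, 1] - pts1[:, 1]
    k = np.abs(dy / np.where(dx == 0, 1e-9, dx))  # |slope| per pair
    keep = k >= T * k.max()                       # rule: drop k_i < T * k_max
    return pts1[keep], pts2[keep]
```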

3.3. Feature Points Constraint Based on Scale and Orientation Differences.
As we know, a SIFT feature point is a structure containing the coordinate location, orientation, scale, and descriptor information. Differences between the two principal orientations of corresponding matching points indicate the rotation transformation relationship between the reference image and the unregistered image [11]. Supposing that all matched points are correct, these differences should theoretically remain constant. Hence, this characteristic can be used to filter out mismatching feature points from the original data set, refining the corresponding pairs. In practice, however, affected by several factors, including the accuracy of the transformation model between the two images, calculation errors, and the accuracy of the matching feature points, the differences between corresponding principal orientations converge around a certain value rather than being exactly constant. That is, the differences between corresponding feature points from the reference image and the unregistered image converge at a certain point. The distribution of the orientation differences is similar to a Gaussian distribution, and most of the statistics converge at a certain center point.
Meanwhile, the differences between the scales of corresponding points also converge at a certain center point [12], with a distribution similar to that of the orientation differences. To demonstrate this characteristic, we carried out several experiments in advance; only one of the test templates is specified in detail here. In this test, the images are shown in Figure 1. Figure 2(a) shows the distribution of the scale differences between corresponding feature points, and Figure 2(b) the distribution of the orientation differences between matching pairs. The two-dimensional distribution map of scale and orientation differences is shown in Figure 2(c).
After the Euclidean distance constraint and the geometric feature constraint, the corresponding points show a large extent of convergence. In these figures, there are 367 points. The scale differences converge at a point with mean 0.3284 and variance 0.4383, while the orientation differences converge at a point with mean −0.2288 and variance 0.3965.
Evident error points have already been discarded by the Euclidean distance and geometric feature constraints. From the results above, the statistics reveal a strong convergence attribute. In Figure 2(c), the abscissa is the scale difference and the ordinate is the orientation difference.
Figure 3(a) is the histogram of scale difference information and Figure 3(b) shows the distribution map of orientation differences. The figures of scale and orientation verify our supposition in Section 3.3: the distribution of these test statistics is evidently similar to a Gaussian distribution. That is, a large number of data points converge in the vicinity of a certain point, and the distribution decays away from this center point. All the results were obtained in the Matlab simulation workspace. Based on this idea, we implement the refinement process.
The main idea of the presented approach is as follows. First, the Euclidean distance and geometric feature constraints are used to filter out mismatching pairs of feature points. Then, the orientation and scale differences of the remaining pairs are calculated. Next, the orientation and scale differences are used in data clustering; in our experiments, the k-means algorithm is used to realize the clustering. The clustering yields two classes, namely, the correct set A and the error set B. Suppose that P_A is the clustering center of the correct set A and P_B is the clustering center of the error set B. Next, suppose that point p is one of the points in the error set B; the distance from p to the clustering center P_A is obtained, and the shortest such distance R is found.
In the following step, the distance R is used as the inner radius of the correct set A. Suppose that point q is one of the points of the correct set A; the distance from q to the clustering center P_A is obtained. If this value is less than R, the point q is regarded as an element of the refined set C. After this process, the remaining elements of set A are stored in set D.
To retain enough correct points, we then search among the points abandoned in the previous step. The confidence interval of the orientation differences with respect to the refined set C is used to select correct points from set D. After all these procedures, the process of image matching based on the multiple layered strategies rule is accomplished. The confidence interval rule is as follows.
It is assumed that μ is the average value and σ the standard deviation of the orientation differences with regard to set C, and that d_i is the orientation difference of a point from the other set. If the absolute value of the difference between d_i and μ is less than 3σ, the point is regarded as one of the elements of set C. The specific procedures are as follows.
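Before the step-by-step listing, the 3σ confidence-interval rule itself can be sketched in a few lines. A minimal illustration with our own function name, assuming the refined set of orientation differences is already available.

```python
import numpy as np

def recover_by_3sigma(refined_diffs, candidate_diffs):
    """3-sigma rule: a candidate orientation difference d_i is recovered
    when |d_i - mu| < 3 * sigma of the refined set."""
    refined_diffs = np.asarray(refined_diffs, float)
    mu = refined_diffs.mean()
    sigma = refined_diffs.std()
    return np.abs(np.asarray(candidate_diffs, float) - mu) < 3 * sigma
```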
(1) First, the Euclidean distance constraint is used to filter out mismatching points from the sets of feature points.
Feature points from the reference image are stored in the set image1 points, and feature points from the unregistered image are stored in the set image2 points, where n is the number of feature points.
Each element of these sets contains the position of the point, its orientation, and its scale factor. (2) The geometric consistency constraint rule is used to filter out mismatching points from image1 points and image2 points. The two resulting sets of feature points are then obtained: the points from the reference image are stored in set one = (o_1, o_2, . . ., o_m), while those of the unregistered image are stored in set two = (t_1, t_2, . . ., t_m), where m (m ≤ n) is the number of feature points after the geometric consistency constraint.
(3) The scale and orientation differences of the corresponding pairs are calculated and stored in the data set scl angle sieve delta. (4) In this step, the k-means algorithm is adopted. The data set scl angle sieve delta is divided into two parts, namely, clustering correct and clustering error, and two clustering center points (center point correct and center point error) are obtained:

clustering correct = (scl ori delta_1, scl ori delta_2, . . ., scl ori delta_m),
clustering error = (scl ori delta_1, scl ori delta_2, . . ., scl ori delta_{n−m}),

where m is the number of elements of clustering correct.
(5) The distance from each point of clustering error to center point correct is calculated, and the shortest distance R is obtained. Then, we calculate the distance from each point of clustering correct to center point correct. If the distance is less than R, the corresponding element of clustering correct is stored in data correct scl ori. The corresponding positions of the feature points are stored in two sets, data correct one and data correct two: data correct one contains the positions of feature points from the reference image and data correct two those from the unregistered image.
The other elements of clustering correct are stored in data sieve scl ori, and the corresponding positions of feature points are stored in two sets, data sieve one and data sieve two, where data correct one = {(x_1, y_1), (x_2, y_2), . . ., (x_num, y_num)}. (6) The last step is searching with the confidence interval of the refined data set data correct scl ori. All elements from data sieve scl ori meeting the confidence interval are inserted into data correct scl ori, and the corresponding feature points are pushed into data correct one and data correct two.
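Steps (4)-(5), clustering the (scale difference, orientation difference) pairs and applying the inner-radius rule, can be sketched as follows. A hedged reconstruction: the paper does not specify the k-means initialization or which cluster is "correct", so this sketch uses a deterministic farthest-point initialization and assumes the larger cluster is the correct one.

```python
import numpy as np

def kmeans2(X, iters=50):
    """Minimal 2-class k-means with deterministic farthest-point init."""
    X = np.asarray(X, float)
    c0 = X[0]
    c1 = X[np.argmax(((X - c0) ** 2).sum(axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in (0, 1):
            if np.any(lab == c):
                centers[c] = X[lab == c].mean(axis=0)
    return lab, centers

def refine_by_inner_radius(X):
    """Cluster the difference pairs into a 'correct' set (the larger one,
    by assumption) and an 'error' set; take the distance R from the correct
    center to the nearest error point as the inner radius, and keep the
    correct points lying closer than R to their center."""
    X = np.asarray(X, float)
    lab, centers = kmeans2(X)
    good = int(np.sum(lab == 1) > np.sum(lab == 0))
    d_err = np.linalg.norm(X[lab != good] - centers[good], axis=1)
    R = d_err.min() if d_err.size else np.inf
    d_ok = np.linalg.norm(X[lab == good] - centers[good], axis=1)
    return X[lab == good][d_ok < R]
```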
Flowchart of the proposed method is presented in Figure 4.

Repeatability.
To evaluate the performance of the presented approach, repeatability [13] is adopted as one of the criteria. Repeatability illuminates the stability of interest points detected via different keypoint extraction methods. For a pair of images, repeatability stands for the portion of keypoints present in both the reference and the unregistered image.
Suppose that X is a 3D point and that P_1 and P_2 are the two corresponding projection matrices. It is assumed that feature point x_1 is detected in the reference image I_1. This keypoint is repeated if the corresponding point x_2 can be detected in the unregistered image I_2. The repeatability ratio index is defined as the ratio of the number of keypoints repeated between the two images to the total number of detected feature points.
In the process of repeated point detection, one should take into account that the observed scene parts differ under changed imaging conditions. Consequently, only keypoints lying in the common part of the scene are adopted to calculate the repeatability measure. To find the common part, the homography matrix H_12 is used:

x_2 = H_12 x_1.

Furthermore, the uncertainty of detection should be considered in the repeatability measure: in practice, a repeated keypoint is not detected exactly at the position of point x_1 but within a certain neighborhood of it,

R(ε) = {(x_1, x_2) : |x_2 − H_12 x_1| < ε},

and the repeatability ratio is

r(ε) = |R(ε)| / min(n_1, n_2),

where n_1 is the number of keypoints extracted from the reference image I_1, n_2 is the number of keypoints detected in the unregistered image I_2, and |R(ε)| is the number of point pairs defined above.
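The repeatability measure above can be sketched as follows (our own numpy illustration; the common-part visibility check is omitted and ε is a free parameter).

```python
import numpy as np

def repeatability(kp1, kp2, H, eps=1.5):
    """Fraction of keypoints whose projection under homography H lands
    within eps of some keypoint detected in the other image."""
    kp1 = np.asarray(kp1, float)
    kp2 = np.asarray(kp2, float)
    ones = np.ones((len(kp1), 1))
    proj = np.hstack([kp1, ones]) @ H.T       # homogeneous projection
    proj = proj[:, :2] / proj[:, 2:3]
    # pairwise distances from projected reference points to detected points
    d = np.linalg.norm(proj[:, None, :] - kp2[None, :, :], axis=2)
    repeated = np.sum(d.min(axis=1) < eps)
    return repeated / min(len(kp1), len(kp2))
```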

Accuracy.
To evaluate the accuracy of our method, a transformation model is used to compute the correct ratio. In practical applications, the affine transformation has been widely employed as the judging model; hence, this paper uses the following transformation model:

[û; v̂] = [m_1  m_2; m_3  m_4][x; y] + [t_x; t_y],

where (x, y) is the position of a feature point from the unregistered image and (u, v) is the position of the corresponding point from the reference image. The matched point should lie in the neighborhood of the transformed point:

sqrt((û − u)² + (v̂ − v)²) < ε,

where (û, v̂) is the result obtained via formula (20). Points (u, v) and (x, y) are regarded as correct matching points if they meet the requirement of (21). To obtain the parameters, the Least Square Method is adopted in this paper.
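The accuracy evaluation can be sketched as follows: fit the affine parameters by least squares from the matched pairs, then count the matches whose transformed position falls within ε of the partner point. A hedged illustration; the helper names and the value of ε are ours.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine model [u, v] = [x, y, 1] @ P, with P a 3x2
    parameter matrix (the six affine parameters of formula (20))."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    A = np.hstack([src, np.ones((len(src), 1))])      # rows: [x, y, 1]
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)  # shape (3, 2)
    return params

def correct_ratio(src, dst, params, eps=2.0):
    """A match counts as correct when the transformed point lies within
    eps of its partner, as in the neighborhood test of formula (21)."""
    src = np.asarray(src, float)
    pred = np.hstack([src, np.ones((len(src), 1))]) @ params
    err = np.linalg.norm(pred - np.asarray(dst, float), axis=1)
    return np.mean(err < eps)
```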
To verify the effectiveness and feasibility of our improved method, OpenCV and VC++ are used to implement the presented algorithm. In our tests, we use Rob Hess's code for the traditional SIFT. To evaluate the performance of the different approaches, the correct ratio and repeatability are used. SR-SIFT [12] is also included in our experiments, and we compare our performance with SR-SIFT and SIFT. In the result figures, the upper image is the reference image and the lower one is the unregistered image; the unregistered image is the result of transforming the reference image. In this paper, the matching direction is from the unregistered image to the reference image. In the Euclidean distance stage, the threshold is 0.75. In Section 3.3, the confidence interval is set to the range [−3 × standard deviation, +3 × standard deviation]. All the test pictures can be downloaded at http://www.robots.ox.ac.uk/∼vgg/research/affine.

Test One (Affine Transformation between Two Images Includes Rotation and Scale Changes).
Here, the image size is 425 × 340. The test performance is presented in Figure 5. Figure 5(a) presents the two unprocessed images, Figure 5(b) the performance of our proposed method, Figure 5(c) the result of the original SIFT method, and Figure 5(d) the matching results of SR-SIFT. Table 1 shows that there are 1052 feature points in the reference image, while the feature point set extracted from the unregistered image includes 929 candidate points. The number of points retained by Euclidean distance in the traditional SIFT algorithm is 335, and the number retained by the multiple layered strategies is 363. Moreover, the number of feature points from SR-SIFT is 268.
The correct ratio of the traditional algorithm is 0.8865, while that of our improved method is 1.0000; the correct ratio of SR-SIFT is 0.9701. As for repeatability, the traditional SIFT algorithm achieves 0.3196, the proposed approach 0.3907, and SR-SIFT 0.2852. Compared with SIFT, the correct ratio of our method increases by 0.1135 and the repeatability by 7.11%.
The correct ratio of our method increases by 2.99% compared with SR-SIFT [12], while the repeatability increases by 10.55%.

Test Two (Transformation between Two Images Includes Blurring Changes).
The image size is 500 × 350. The test results are presented in Figure 6. Table 2 shows that there are 785 feature points in the reference image, while the feature point set extracted from the unregistered image includes 612 candidate points. The number of points retained by Euclidean distance in the traditional SIFT algorithm is 204, and the number retained by the multiple layered strategies is 232. Moreover, the number of feature points from SR-SIFT is 134.
The correct ratio of the traditional algorithm is 0.9607, while that of our method is 0.9784; the correct ratio of SR-SIFT is 0.9776. As for repeatability, the traditional SIFT algorithm achieves 0.3202, while the proposed approach achieves 0.3709.
The correct ratio of our method is almost equal to that of SR-SIFT, while the repeatability of our method increases by 22.06% compared with SR-SIFT.

Test Three (Affine Transformation between Two Images Includes Viewpoint and Rotation Changes).
The image size is 400 × 320. The test results are presented in Figure 7. Table 3 shows that there are 923 feature points in the reference image, while the feature point set extracted from the unregistered image includes 1126 candidate points. The number of points retained by Euclidean distance in the traditional SIFT algorithm is 329, while the number retained by the multiple layered strategies is 347. In addition, the number of feature points from SR-SIFT is 278.
The correct ratio of the traditional algorithm is 0.8510, while that of our method is 0.9915; the correct ratio of SR-SIFT is 0.9424. As for repeatability, the traditional SIFT algorithm achieves 0.3033, while the proposed approach achieves 0.3759.
The correct ratio of our method increases by 4.91% when compared with SR-SIFT.Besides, when compared with SR-SIFT, the repeatability of our method increases by 8.13%.

Test Four (Affine Transformation between Two Images Includes Scale and Rotation Changes).
The image size is 425 × 340. The test performance is presented in Figure 8. Table 4 shows that the number of feature points of the reference image is 1052, while that of the unregistered image is 866. The number of points retained by Euclidean distance in the traditional algorithm is 266, while that of our method is 281. Moreover, the number of feature points from SR-SIFT is 209.
As for the correct ratio, that of the traditional algorithm is 0.9060, while that of our approach is 0.9965; the correct ratio of SR-SIFT is 0.9523. As for repeatability, the ratio of SIFT is 0.2667, while that of the proposed algorithm is 0.3233.
When compared with SR-SIFT, the correct ratio of our method increases by 4.42%.Besides, the repeatability of our method increases by 9.59%.

Results Analysis.
Based on the results shown above, curves of the correct ratio and repeatability are presented in Figure 9. Figure 9(a) shows the correct ratio of the traditional algorithm and our method, and Figure 9(b) shows the repeatability of the traditional SIFT, SR-SIFT, and our improved method.
In the tests, we compare the improved approach based on multiple layered strategies with the traditional SIFT method and SR-SIFT. The transformations of the test images include scale, rotation, and viewpoint changes. The SR-SIFT method proposed in [12] is based on scale restriction and shows better performance than the traditional SIFT in image registration experiments. However, the repeatability of SR-SIFT needs to be improved for practical applications, such as object detection based on feature points.
From the performances presented above, we can see that our proposed approach outperforms the traditional SIFT and SR-SIFT under these changes. These test results demonstrate the effectiveness and feasibility of our improved algorithm.

Conclusions
To address the problem of correct ratio and repeatability that occurs in complicated image matching applications using the SIFT descriptor, a new approach based on multiple layered strategies is proposed in this paper. First, the results are filtered by Euclidean distance. Then, the geometric feature consistency constraint is used to discard error pairs with abnormal slopes. Finally, a new method based on the scale and orientation differences constraint is used to refine the matching points. The correct ratio and repeatability of the improved method outperform those of the traditional SIFT and SR-SIFT, and the experimental performance demonstrates the effectiveness and feasibility of the proposed algorithm. In future work, we will focus on improving the efficiency of the proposed approach.
Figure 1: (a) the reference image; (b) the unregistered image.

Figure 2: Distribution characteristics of the matched points. (a) Distribution of scale differences. (b) Distribution of orientation differences. (c) Distribution map of scale and orientation.

Figure 3: (a) Histogram of scale differences. (b) Distribution map of orientation differences.

Figure 4: Flowchart of our proposed method.

Figure 5: Images used in test one and results of the compared methods. (a) Original unmatched images. (b) Results of the proposed method. (c) Results of the traditional SIFT algorithm. (d) Results of SR-SIFT.

Figure 6: Images used in test two and results of the compared methods. (a) Original unmatched images. (b) Results of the proposed method. (c) Results of the traditional SIFT method. (d) Results of SR-SIFT.

Figure 7: Images used in test three and results of the compared methods. (a) Original unmatched images. (b) Results of the proposed method. (c) Results of the traditional SIFT method. (d) Results of SR-SIFT.

Figure 6(a) presents the two unprocessed images, Figure 6(b) is the performance of the proposed method, and Figure 6(c) represents the result of the original SIFT method.

Figure 6(d) shows the matching results of SR-SIFT.

Figure 7(a) presents the two related images, Figure 7(b) shows the performance of the proposed method, Figure 7(c) represents the result of the original SIFT method, and Figure 7(d) shows the matching results of SR-SIFT.

Figure 8: Images used in test four and results of the compared methods. (a) Original unmatched images. (b) Results of the proposed method. (c) Results of the traditional SIFT algorithm. (d) Results of SR-SIFT.
Figure 8(a) presents the two related images, Figure 8(b) is the performance of the proposed method, Figure 8(c) represents the result of the original SIFT method, and Figure 8(d) shows the matching results of SR-SIFT.

Figure 9: Performance of the traditional SIFT algorithm and the proposed method. (a) Correct ratio results of the two approaches. (b) Repeatability ratio of SIFT and the proposed method.

Table 1 :
Statistics of test one.

Table 2 :
Statistics of test two.

Table 3 :
Statistics of test three.

Table 4 :
Statistics of test four.