A Geometrical-Information-Assisted Approach for Local Feature Matching

This paper presents a geometrical-information-assisted approach for matching local features. With the aid of Bayes' theorem, it is found that the posterior confidence of matched features can be improved by introducing global geometrical information given by the distances between feature points. Based on this result, we work out an approach to obtain such geometrical information and apply it to assist in matching features. The pivotal techniques in this paper include (1) exploiting the elliptic parameters of feature descriptors to estimate transformations that map feature points in the images to points in an assumed plane; (2) projecting the feature points to the assumed plane and finding a reliable referential point in it; (3) computing the differences of the distances between the projected points and the referential point. Our new approach employs these differences to assist in matching features, achieving better performance than the nearest neighbor-based approach in terms of precision versus the number of matched features.


Introduction
For matching local features, the threshold-based method and the nearest neighbor-based approach (NNA) are two fundamental strategies. Compared with the threshold-based approach, the NNA matches features more precisely [1]. From earlier works (cf. [2][3][4][5]) to recent ones (cf. [6,7]), nearest neighbor techniques are exploited broadly. Since the positions of matched features can be seen as samples generated from two images related by a certain homography, RANSAC [8] is usually applied to exclude the impact of outliers on the estimation [9][10][11][12][13][14][15][16], thereby improving the matching effect of local features. RANSAC has a defect when inliers are relatively scarce among the putative matches, which causes unpredictable running time and can even cause the estimation of consensus sets to fail. To match features more efficiently, some methods introduce prior information into the algorithm for estimating consensus sets, e.g., Guided-MLESAC [17], PROSAC [18], and SESAC [19], which use distributions constructed from prior information instead of the uniform distribution to generate hypothetical homographies. Besides, some preliminary processes are employed to produce subsets containing more uncontaminated samples before running plain RANSAC. For example, Cov-RANSAC [20] employs SPRT and a covariance test; DT-RANSAC [21] refines data by topological information in the Delaunay triangulation; SVH-RANSAC [22] adopts a local feature scale constraint to group observations; SC-RANSAC [23] utilizes spatial relations between extracted corresponding points; and WD-RANSAC [24] and QML-RANSAC [25] adopt the Wigner distribution and the quasi maximum likelihood algorithm, respectively. Nevertheless, when matching features between nonrigidly transformed images, it is difficult to estimate a nonparametric consensus by a homography-estimation-based approach. Some nonparametric consensus methods have also been developed to match local features. Sparse-VFC [26,27] introduces sparse representations into the estimation of vector field consensus and shows powerful performance in matching features. LLT [28] exploits local geometrical constraints to estimate the consensus set comprising the inliers among matches between two rigidly or nonrigidly transformed images. Besides those consensus-estimation-based methods, approaches that avoid estimating a consensus have also been studied. The method presented in [29] divides the vector of a local descriptor into subvectors and employs binary trees to compare the subvectors, thereby achieving NNA. For matching features in the binocular stereo scene, the approach in [30] uses adjacent pixels near a feature point to form a block and then finds the best match by the block. LPM [31] applies local neighborhood structures to determine true matches; based on it, GLPM [32] introduces a set with higher confidence of including true matches, obtained by the distance ratios of local descriptors, and thereby reaches better performance than LPM. Another local-geometrical-information-based technique is proposed by [33] for describing and matching features, which exploits the topological relationships among local features. Moreover, for matching some deep features, convolutional neural networks are employed [34].
Motivated by the approaches above built on local geometrical information, we discuss a new method assisted by geometrical information to improve the performance of feature matching. An extraction procedure for scale-invariant features provides information about the scale, or even the affine parameters, of features (cf. [10,35]), from which the geometrical relationship between two images can be estimated. We here identify an appropriate geometrical relation and study how to estimate it and how to exploit it to further improve the matching effect. This paper is organized as follows. In Section 2, we discuss factors that influence the matching effect and point out a new way to enhance it. In Section 3, we discuss how Euclidean distance can assist matching features and work out an algorithm for matching features assisted by geometrical information; we abbreviate this Geometrical-Information-Assisted Matching algorithm as GIAM throughout the paper. In Section 4, we test GIAM and compare the results with other methods. Finally, we conclude our work in Section 5.

Factors That Influence Matching Effect
In what follows, $M(x, y)$ denotes the event that a point $x$ matches another point $y$, and ${\sim} M(x, y)$ is its negation. $d(\cdot)$ denotes a feature descriptor, and $s(\cdot, \cdot)$ is a similarity metric on the descriptors of two points. Besides, the symbol $O_c(\epsilon)$ represents a function $f(\epsilon)$ with $f(\epsilon) \to c$ ($c$ a constant satisfying $0 \le c \le 1$) as $\|\epsilon\| \to 0$, where $\epsilon$ is a vector in $\mathbb{R}^n$ for some positive integer $n$.

Geometrical Information for Matching Features
Suppose that two images $I_A$ and $I_B$ are transformed, respectively, from an image $I_O$ by two affine homographies $T_A$ and $T_B$, which are invertible linear transformations on the Cartesian coordinates, yielding rotation, scale change, and tilt of planar images. Let $x_m$ and $y_m$ be matched points and $x$ and $y$ be points (respectively, in $I_A$ and $I_B$) to be matched. Here, by "a point in an image", say $x$ in $I_A$, we mean that $x \in \operatorname{dom}(I_A)$, its domain of definition. Since the feature-extraction procedure hardly provides information about $T_A$ and $T_B$, we erase them by computing Euclidean distances (denoted by $d_E(\cdot, \cdot)$) in the same image $I_O$: for a correct pair, $d_E(T_A^{-1}x, T_A^{-1}x_m) = d_E(T_B^{-1}y, T_B^{-1}y_m)$, since both equal the distance between the corresponding points in $I_O$.

Therefore we figure out the following expression describing the geometrical information (where $i = 1, \ldots, n$ indexes the putative pairs):
$$g(x_i, y_i) = \left| d_E\left(T_A^{-1} x_i, \, T_A^{-1} x_r\right) - d_E\left(T_B^{-1} y_i, \, T_B^{-1} y_r\right) \right|. \qquad (9)$$
Noting that the point $T_A^{-1}(x_m - x_r)$ equals $T_B^{-1}(y_m - y_r)$ in the image $I_O$, we will show that the geometrical information (9) can be employed as a metric satisfying (4) and (5). We call such a $T_A^{-1}(x_m - x_r)$ the referential point and $(x_r, y_r)$ the referential pair.

Proposition 1. Suppose that the image $I_O$ is defined on $\mathbb{R}^2$. Then any point $x$ in $I_A$ and any point $y$ in $I_B$ satisfy
$$\Pr\left(g(x, y) \le \epsilon \mid\, {\sim} M(x, y)\right) = O_0(\epsilon). \qquad (10)$$

Proof. Denote by $f(u, v)$ the p.d.f. of the projected feature points. Since $x$ and $y$ do not match, the event $g(x, y) \le \epsilon$ requires the projection of $y$ to fall into a region of $I_O$, determined by the referential point, whose Lebesgue measure tends to zero as $\epsilon \to 0$; hence the probability in question equals the integral of $f(u, v)$ over that region, which is $O_0(\epsilon)$.

The result of Proposition 1 indicates an optimal case in applying (5). However, we need to formulate the discrete form for pragmatic use, which is built on a probability dominated by the counting measure.
Proposition 2. Suppose that the image $I_O$ is defined on a discrete grid of pixels. Then any point $x$ in $I_A$ and any point $y$ in $I_B$ satisfy $\Pr\left(g(x, y) \le \epsilon \mid\, {\sim} M(x, y)\right) = O_c(\epsilon)$ for some constant $0 \le c \le 1$.

Proof. Denote by $p_{uv}$ the probability of a feature point (from $\operatorname{dom}(I_A)$ or $\operatorname{dom}(I_B)$) being projected to the pixel $(u, v)$, and set $\sigma = d_E\left(T_A^{-1}(x - x_r), \, T_B^{-1}(y - y_r)\right)$. Since $x$ and $y$ do not match, the event $g(x, y) \le \epsilon$ requires the projection of $y$ to fall on one of the finitely many pixels of the corresponding region around the referential point, so the probability in question is the sum of the $p_{uv}$'s over those pixels. We finish the proof by noting that for sufficiently small $\epsilon$'s, the set of such pixels does not change and the $p_{uv}$'s are constant.
Since the number of pixels within a certain radius of the referential point is far smaller than the number of pixels in the whole image, in most cases the constant $c$ should presumably be far less than 1; for instance, a disc of radius 2 pixels in an $800 \times 640$ image covers a fraction of roughly $\pi \cdot 2^2 / (800 \cdot 640) \approx 2.5 \times 10^{-5}$ of the pixels. Therefore Proposition 2 offers a suboptimal result for (5) in matching features. The situations treated in the proofs of Propositions 1 and 2 are illustrated in Figure 1.
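As a rough numerical illustration of Propositions 1 and 2, the following Python snippet estimates by Monte Carlo sampling the probability that a randomly mismatched pair yields geometrical information (9) below a threshold $\epsilon$; the estimate shrinks with $\epsilon$, as the propositions predict. This is our own sketch, not part of the paper's experiments: the transformations, the scene size, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed affine transformations T_A, T_B (invertible 2x2 matrices).
T_A = np.array([[1.2, 0.3], [0.1, 0.9]])
T_B = np.array([[0.8, -0.2], [0.4, 1.1]])
T_A_inv, T_B_inv = np.linalg.inv(T_A), np.linalg.inv(T_B)

# A referential pair: one scene point z_r observed in both images.
z_r = np.array([50.0, 40.0])
x_r, y_r = T_A @ z_r, T_B @ z_r

# Randomly mismatched pairs: x and y come from unrelated scene points
# drawn uniformly from an assumed 100x100 scene.
n = 200_000
xs = rng.uniform(0.0, 100.0, size=(n, 2)) @ T_A.T
ys = rng.uniform(0.0, 100.0, size=(n, 2)) @ T_B.T

# Geometrical information g of Eq. (9): difference of the distances to
# the referential candidates, measured after projecting back into I_O.
g = np.abs(
    np.linalg.norm((xs - x_r) @ T_A_inv.T, axis=1)
    - np.linalg.norm((ys - y_r) @ T_B_inv.T, axis=1)
)

for eps in (4.0, 2.0, 1.0, 0.5):
    print(f"eps={eps}: Pr(g <= eps | mismatch) ~= {np.mean(g <= eps):.4f}")
```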

Matching Features Assisted by Geometrical Information
3.1. Discussion of the Algorithm. We first discuss how to obtain $T_A^{-1}$ and $T_B^{-1}$. Suppose that there is a linear transformation $T$ mapping the unit circle $\{u_1^2 + u_2^2 = 1\}$ to an ellipse $\{a v_1^2 + 2 b v_1 v_2 + c v_2^2 = 1\}$, and denote $E = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$, $u = (u_1, u_2)^\top$, and $v = (v_1, v_2)^\top$. Then for $v = T u$ we have $u^\top u = v^\top E v = u^\top (T^\top E T)\, u$, so that $E = (T^{-1})^\top T^{-1}$, which means that by the Cholesky decomposition of the positive-definite matrix $E$ we can obtain the linear transformation $T^{-1}$. In descriptors of Mikolajczyk's format [10,35], the affine region of a feature point is enclosed by an ellipse with parameters $a$, $b$, $c$ (for the details of these parameters please refer to http://www.robots.ox.ac.uk/~vgg/research/affine/descriptors.html#binaries), by which we can calculate $T_A^{-1}$ and $T_B^{-1}$. Next we need to find the referential pair $(x_r, y_r)$ and to estimate the invertible linear transformations $T_A$ and $T_B$. Suppose that $\{(x_i, y_i)\}_{i=1}^{n}$ is the sequence of putative pairs ordered by the similarity metrics of their descriptors. We assign the most similar pair as the referential pair, i.e., $(x_r, y_r) = (x_1, y_1)$. Since the unit circle in $I_O$ can be transformed by $T_A$ and $T_B$ onto the boundaries (ellipses) of the affine regions of the feature points $x_1$ and $y_1$, respectively, $T_A^{-1}$ and $T_B^{-1}$ should be the respective Cholesky factors of the matrices $\begin{pmatrix} a_{x_1} & b_{x_1} \\ b_{x_1} & c_{x_1} \end{pmatrix}$ and $\begin{pmatrix} a_{y_1} & b_{y_1} \\ b_{y_1} & c_{y_1} \end{pmatrix}$, where $a_{x_1}$, $b_{x_1}$, $c_{x_1}$ are the parameters of the boundary ellipse of the affine region of $x_1$, and the same applies to $a_{y_1}$, $b_{y_1}$, $c_{y_1}$.
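The step from the elliptic parameters to $T^{-1}$ is small enough to sketch directly. The following Python fragment (a minimal sketch of the derivation above; the function name and the sample values are ours, not from the paper's code) builds $E$ from Mikolajczyk-style parameters $(a, b, c)$ and recovers $T^{-1}$ via a Cholesky factorization:

```python
import numpy as np

def inverse_transform_from_ellipse(a: float, b: float, c: float) -> np.ndarray:
    """Given the ellipse a*v1^2 + 2*b*v1*v2 + c*v2^2 = 1 enclosing an
    affine region, return T_inv with E = T_inv.T @ T_inv, i.e. the inverse
    of a linear map T sending the unit circle onto the ellipse."""
    E = np.array([[a, b], [b, c]])
    L = np.linalg.cholesky(E)   # E = L @ L.T with L lower-triangular
    return L.T                  # (L.T).T @ (L.T) = L @ L.T = E

# Quick check: points v = T @ u with u on the unit circle land on the ellipse.
a, b, c = 2.0, 0.5, 1.0
T_inv = inverse_transform_from_ellipse(a, b, c)
T = np.linalg.inv(T_inv)
u = np.array([np.cos(0.7), np.sin(0.7)])  # a point on the unit circle
v = T @ u
E = np.array([[a, b], [b, c]])
print(v @ E @ v)  # ~1.0
```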
The geometrical information for each pair of feature points can henceforth be calculated. For two sets of candidate feature points $\{x_i\}_{i=1}^{N_A}$ and $\{y_j\}_{j=1}^{N_B}$, we compute their geometrical information from the projected distances
$$d^A_{ik} = d_E\left(T_A^{-1} x_i, \, T_A^{-1} x_k\right), \qquad d^B_{jk} = d_E\left(T_B^{-1} y_j, \, T_B^{-1} y_k\right), \qquad (15)$$
combined by (17) below. Finally, new matches are chosen by ordered scores, calculated as the sum of the similarity metric of the features and the geometrical information of the feature points; the score of each putative pair is
$$w_{ij} = s_{ij} + c_{ij}. \qquad (16)$$
In implementing GIAM, we set a
descriptor difference matrix, a distance matrix for the candidates in the image $I_A$, a distance matrix for the candidates in the image $I_B$, a distance correlation matrix, and a score matrix as the basic data objects. The similarity metrics between each pair of candidates constitute the descriptor difference matrix $S = (s_{ij})$, where $s_{ij} = s(d(x_i), d(y_j))$.
The distance matrices are $D_A = (d^A_{ik})$ and $D_B = (d^B_{jk})$, where the entry $d^A_{ik}$, as in (15), represents the distance induced by $T_A^{-1}$ between the corresponding points (in $I_O$) of the $i$-th and the $k$-th feature points in $I_A$, and the same applies to $D_B$ for the feature points in $I_B$.
The distance correlation matrix $C_r = (c_{ij})$ presents the geometrical information of the feature points and is given by
$$c_{ij} = \left(d^A_{i u_1}, \ldots, d^A_{i u_r}\right) \ominus \left(d^B_{j v_1}, \ldots, d^B_{j v_r}\right) \qquad (17)$$
for $i = 1, \ldots, N_A$ and $j = 1, \ldots, N_B$, where the entries are selected from the distance matrices $D_A = (d^A_{ik})$ and $D_B = (d^B_{jk})$ in the following manner: if the $k$-th referential pair is made up of the $u_k$-th candidate in the image $I_A$ and the $v_k$-th candidate in the image $I_B$, then the $k$-th selected column of $D_A$ is its $u_k$-th column and the $k$-th selected column of $D_B$ is its $v_k$-th column. The operator $\ominus$ in (17) is defined by $u \ominus v = \sum_{k=1}^{r} (u_k - v_k)^2$. Here we name $r$ the order of the distance correlation matrix.
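To make the bookkeeping concrete, here is a short Python sketch of the distance correlation matrix for a general order $r$ (our own illustrative reconstruction of (17); the names `D_A`, `D_B`, and `ref_pairs` are assumptions). `ref_pairs` lists the candidate indices $(u_k, v_k)$ of the referential pairs:

```python
import numpy as np

def distance_correlation(D_A: np.ndarray, D_B: np.ndarray,
                         ref_pairs: list[tuple[int, int]]) -> np.ndarray:
    """Distance correlation matrix C_r of order r = len(ref_pairs),
    following Eq. (17): c_ij = sum_k (d^A_{i,u_k} - d^B_{j,v_k})^2."""
    A_ref = D_A[:, [u for u, _ in ref_pairs]]     # shape (N_A, r)
    B_ref = D_B[:, [v for _, v in ref_pairs]]     # shape (N_B, r)
    diff = A_ref[:, None, :] - B_ref[None, :, :]  # shape (N_A, N_B, r)
    return np.sum(diff ** 2, axis=2)
```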
The score matrix $W_r = S + C_r$ contains all the results computed by (16).
We summarize our algorithm in Algorithm 1.

Algorithm 1 (GIAM). Input: the descriptors and elliptic parameters of the candidate feature points in $I_A$ and $I_B$. (a) Calculate the descriptor difference matrix $S$ and order the putative pairs by their similarity metrics; (b) assign the most similar pair as the referential pair $(x_r, y_r)$; (c) calculate $T_A^{-1}$ and $T_B^{-1}$ in Eq. (15); (d) calculate the distance correlation matrix $C_1$ by Eqs. (15) and (17) (since there is only one referential pair, the order of the distance correlation matrix is 1); (e) calculate the score matrix $W_1 = S + C_1$; (f) compute an ordered matched list $\{(x_{m_k}, y_{m_k})\}_{k=1}^{K}$ by using the score matrix $W_1$, where $K = \min\{N_A, N_B\}$ is the number of matched feature pairs. Output: an ordered matched list $\{(x_{m_k}, y_{m_k})\}_{k=1}^{K}$.
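As an end-to-end illustration, the following Python sketch strings the pieces together for the order-1 case of Algorithm 1. It is our own minimal reconstruction, not the authors' released code; it assumes descriptors are compared by Euclidean distance (smaller is more similar), and all names are ours:

```python
import numpy as np

def giam_match(desc_A, desc_B, pts_A, pts_B, T_A_inv, T_B_inv):
    """Order-1 GIAM sketch. desc_A: (N_A, d) descriptors; pts_A: (N_A, 2)
    feature positions; T_A_inv, T_B_inv: the 2x2 inverse transformations
    obtained from the elliptic parameters (cf. Eq. (15))."""
    # (a) Descriptor difference matrix S (Euclidean metric assumed).
    S = np.linalg.norm(desc_A[:, None, :] - desc_B[None, :, :], axis=2)
    # (b) The most similar putative pair serves as the referential pair.
    u, v = np.unravel_index(np.argmin(S), S.shape)
    # (c)-(d) Distances to the referential candidates measured in I_O,
    # combined into the order-1 distance correlation matrix (Eq. (17)).
    d_A = np.linalg.norm((pts_A - pts_A[u]) @ T_A_inv.T, axis=1)
    d_B = np.linalg.norm((pts_B - pts_B[v]) @ T_B_inv.T, axis=1)
    C1 = (d_A[:, None] - d_B[None, :]) ** 2
    # (e) Score matrix: similarity metric plus geometrical information.
    W1 = S + C1
    # (f) Greedily read out an ordered one-to-one matched list, best first.
    matches, used_A, used_B = [], set(), set()
    for idx in np.argsort(W1, axis=None):
        i, j = np.unravel_index(idx, W1.shape)
        if i not in used_A and j not in used_B:
            matches.append((i, j))
            used_A.add(i)
            used_B.add(j)
    return matches  # length min(N_A, N_B)
```

In practice the two terms of the score may live on different scales, so some normalization of $S$ and $C_1$ would likely be needed; we omit such details in this sketch.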

Simulations
4.1. Datasets and Methods for Simulations. To test our new algorithm for local feature matching, we employ four methods: NNA, NNA with plain RANSAC (NNA-RANSAC), NNA with LPM (NNA-LPM), and NNA with GIAM (NNA-GIAM). The RANSAC code adopted in our simulations was developed by Marco Zuliani (obtained from https://github.com/RANSAC/RANSAC-Toolbox) and the LPM code was developed by Dr. Jiayi Ma (obtained from https://github.com/jiayi-ma/VFC). We set the parameters for LPM as shown in Table 1 and the parameters for RANSAC as shown in Table 2.

We utilize an executable file implemented by Dr. Mikolajczyk (the original codes are obtained from http://www.robots.ox.ac.uk/~vgg/research/affine/) to produce the experimental data. First, we use the executable file to process all test images and generate the sets of descriptors for each test image. Second, we match features between those sets by NNA, NNA-RANSAC, NNA-LPM, and NNA-GIAM, respectively. We define that a point in the first image correctly matches a corresponding point in the second image if the distance between the point in the second image and the standard point (to which the point in the first image is mapped via a given homography serving as the ground truth) is less than 2 pixels. The precision of the matches, with regard to a number of matched features, equals the number of correct matches divided by the number of matches. We use curves of precision versus the number of matches to evaluate the performance of the four algorithms.
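The correctness criterion and the precision curve are straightforward to pin down in code. Below is a small Python sketch of the evaluation just described (our own illustration; the homography convention and the names are assumptions):

```python
import numpy as np

def is_correct(p1, p2, H, tol=2.0):
    """p1 in the first image is deemed correctly matched to p2 in the
    second image if p2 lies within tol pixels of H applied to p1
    (H: 3x3 ground-truth homography; points are (x, y) pairs)."""
    q = H @ np.array([p1[0], p1[1], 1.0])
    q = q[:2] / q[2]  # back to Cartesian coordinates
    return np.linalg.norm(q - np.asarray(p2)) < tol

def precision_curve(matches, H):
    """matches: list of (p1, p2) ordered by score. Returns the precision
    over the first k matches, for k = 1, ..., len(matches)."""
    correct = np.array([is_correct(p1, p2, H) for p1, p2 in matches])
    return np.cumsum(correct) / np.arange(1, len(matches) + 1)
```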
For tests under fairly complex scenes, with changes of rotation, scale, viewpoint, blur, light, and JPEG compression ranging from slight to dramatic, Mikolajczyk's test data [36] are exploited, which consist of a group of image sets with homographies as the ground truth (the image sequences are from http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html). There are 8 test sets in the group, and each set contains 6 images. We match features in the first image to each of the remaining images of every test sequence by NNA, NNA-RANSAC, NNA-LPM, and NNA-GIAM, respectively. In these tests we employ the Harris-Affine detector [10,35] to extract features and describe them by the SIFT descriptor [3,4]. In the figures, "1 vs. k" in a caption is interpreted as "the first image versus the k-th image", indicating that the test result is obtained by matching features between the first image and the k-th image of the same test sequence.
4.2. Results of Experiments. We obtain 40 test curves divided into 8 sets, which are shown in Figures 2-9 in the appendix. In the case of rotation and scale change for the textured scene (cf. Figure 2), GIAM outperforms NNA and exceeds LPM and RANSAC when the change is dramatic. Beyond the middle degree of change ((c), (d), (e) in Figure 2), GIAM outperforms LPM on most intervals of the number of matched features. GIAM shows a slight advantage over NNA on the rotation and scale change for the structured scene (cf. Figure 3). In the cases of blurred images in both the structured scene and the textured scene (cf. Figures 4 and 5), GIAM improves the precision of NNA. GIAM reaches higher scores than LPM when the number of matches is relatively small ((a), (b), (c) in Figure 4) in the structured-scene test, and attains precision comparable to RANSAC. For the most severe JPEG compression, GIAM performs better than NNA and LPM (cf. Figure 6). GIAM, NNA, and RANSAC show almost identical performance in the case of slight to mild JPEG compression over more than half of the matches. LPM finds relatively fewer correct matches in this test sequence. In the case of illumination change (cf. Figure 7), GIAM reaches higher precision than NNA as the light turns dim and in general slightly outperforms LPM over the first third of all matches at each degree of illumination. In the case of viewpoint change for the textured scene (cf. Figure 8), GIAM surpasses LPM and exceeds NNA in all tests, and has performance comparable to RANSAC over the first half of all matches. Under viewpoint change for the structured scene (cf. Figure 9), GIAM exceeds NNA in all tests and shows better performance than the others when the number of matches and the change of viewpoint are relatively small ((a), (b) in Figure 9).
Consequently, it can be seen from these results that GIAM improves the precision of NNA in situations including scale change, blur, JPEG compression, illumination change, and slight viewpoint change, and reaches better performance than LPM under some changes of rotation, scale, illumination, signal compression, and viewpoint.

Conclusion
We have studied how geometrical information improves the posterior confidence of matched features and therefore enhances the matching effect. There are two essential merits in our work. The first is that we utilize Bayes' theorem to analyze the factors influencing the matching effect and show that prior information, in the form of a conditional distribution satisfying (4), can help improve the posterior confidence of matches. The second is the exploitation of the geometrical information in the descriptors, which can be used to estimate the distances between projected feature points. Consequently, the technique proposed in this paper shows its capability to improve the performance of feature matching.

Figure 1: An illustration of the two measures in Propositions 1 and 2, where the left side indicates that the probability of two points $x$ and $y$ being projected into the same circle centred at the referential point is zero under the Lebesgue measure on $\mathbb{R}^2$, whereas the right side shows that the counterpart may not be zero under the counting measure, since there are points other than $T_A^{-1}(x_m - x_r)$ (e.g., the 11 pink boxes above) to which $y$ could be projected.


Table 2: Main parameters in RANSAC.