In this paper, a hybrid scheme is proposed to find reliable point correspondences between two images; it combines invariant spatial feature description with frequency-domain alignment in a two-stage coarse-to-fine refinement strategy. First, the source and target images are both downsampled by an image pyramid algorithm in a hierarchical multiscale way. The Fourier-Mellin transform is applied at the coarse level to obtain the transformation parameters between the image pair; these parameters then serve as an initial coarse guess to guide the subsequent feature matching step at the original scale, where correspondences are restricted to a search window determined by the deformation between the reference image and the current image. Finally, a novel matching strategy is developed to reject false matches by validating the geometrical relationships between candidate matching points. In doing so, the alignment parameters are refined, which is more accurate and more flexible than a robust fitting technique alone, and this in turn yields a more accurate result for feature correspondence. Experiments on real and synthetic image pairs show that our approach provides satisfactory feature matching performance.
Given two or more images of a scene, the ability to match reliable corresponding points between them is a fundamental and important problem in the field of computer vision. In fact, many computer vision applications rely on successfully finding corresponding points [
Nowadays, a considerable amount of research has been devoted to efficient feature descriptors, which use spatial-domain representations of various image features [
However, such spatial-domain approaches conduct an exhaustive search over local appearance templates, which is very time consuming and difficult, especially in the presence of occlusion junctions, large viewpoint changes, multiple similar structures, and appearing or disappearing features. Even when the most effective invariant descriptors are applied, the performance of feature correspondence in the spatial representation is often unsatisfactory. These drawbacks are common whenever the entire image is used as the search space for exhaustively finding putative correspondences without guidance. A more difficult problem is that some differences between the images, due to object movements, lighting changes, or the use of different sensor types or parameters, cannot be modeled by a spatial transform alone. They make registration harder, since accurate registration can no longer be achieved between the two images even after the spatial transformation.
Due to the limitations of spatial-domain methods, some researchers take advantage of frequency-domain information to assist image registration or motion analysis [
This suggests a simple but effective approach that we denote a coarse-to-fine hierarchical approach. In fact, coarse-to-fine hierarchical strategies have been applied by various researchers [
In this paper, we propose a novel way to hierarchically integrate estimation in the frequency domain and in the spatial domain. The problems above are alleviated by first obtaining a rough estimate of the transformation parameters between the image pair at the coarsest level using frequency information; this reasonable approximation then guides the matching process in the spatial domain at the original level. In this way, we fuse spatial- and frequency-domain information in a new and efficient manner. The integration not only avoids the drawbacks of spatial-domain methods but also makes use of spatial information for precise feature localization. It should be noted that certain steps of our idea resemble image registration approaches that employ a set of corresponding features to determine the transformation between the image pair. Nevertheless, to the best of our knowledge, this is the first time that the deformation parameters captured by the Fourier-Mellin Transform at the coarse scale have been applied to assist the feature matching procedure in the spatial domain.
The paper is organized as follows. Section
It is assumed that the image pairs contain the same scene but are taken at different times, with different imaging devices, or from different perspectives, owing to changes in camera position and pose.
We present a novel algorithm that takes advantage of both spatial- and frequency-domain information in a hierarchical multiscale decomposition, as described in Figure
Framework of the presented method.
Since the image pairs can be related by camera motion consisting of relative translation, rotation, scale, and other geometric transformations, motion estimation techniques can be introduced into our algorithm. The affine motion model [
Global motion estimation methods can be broadly classified into two categories: spatial domain [
In this paper, we recover the coarse rotation, translation, and scale parameters of the transformation at the top level using the Fourier-Mellin Transform (FMT), which is essentially a phase correlation method based on the Fourier and log-polar transforms. The idea behind the FMT is to make use of the Fourier Shift Theorem and the Fourier Rotation Theorem to provide invariance to rotation, translation, and scale; registration is then performed by phase correlation of the cross-power spectrum.
Equation (
with
We can see that the magnitude of the Fourier Transform (FT) is translation invariant, so the rotation and scale parameters can be determined independently of the translation parameters.
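The shift theorem underlying this step can be sketched in a few lines of numpy. This is an illustrative implementation of standard phase correlation (not the paper's exact code): a pure translation between two images appears as a sharp impulse in the inverse FFT of the normalized cross-power spectrum.

```python
import numpy as np

def phase_correlation(ref, cur):
    """Estimate the translation between two same-size images from the
    normalized cross-power spectrum (Fourier Shift Theorem): a pure shift
    shows up as a sharp impulse in the inverse FFT of the phase difference."""
    F_ref = np.fft.fft2(ref)
    F_cur = np.fft.fft2(cur)
    cross = np.conj(F_ref) * F_cur
    cross /= np.abs(cross) + 1e-12            # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # shifts beyond half the image size wrap around to negative values
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)
```

Because only the phase is kept, the peak stays sharp even under global illumination changes, which is one reason frequency-domain estimation is robust where template search is not.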
Since the dynamic range of the FFT output is very high, interpolation in the frequency domain is difficult; this range is compressed by resampling the Fourier magnitude spectra on a log-polar grid. When the Fourier magnitude spectra are converted from the Cartesian coordinate system to a log-polar representation
It is then converted to log-polar coordinates so that rotation
In other words, it can be written in the following way,
It can be seen that the Fourier-Mellin transform (FMT) yields a spectrum that is invariant to rotation, translation, and scale.
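The log-polar resampling at the heart of this invariance can be sketched as follows. This is a minimal nearest-neighbour version (the grid sizes and sampling details are illustrative, not the paper's): after the mapping, a rotation of the input becomes a cyclic shift along the theta axis and an isotropic scaling a shift along the log-rho axis, so both can be recovered by a second phase correlation.

```python
import numpy as np

def log_polar(mag, n_rho=64, n_theta=64):
    """Resample a centred Fourier magnitude spectrum onto a log-polar grid
    (nearest neighbour). Rotation of the underlying image becomes a cyclic
    shift along the theta axis; isotropic scaling becomes a shift along the
    log-rho axis."""
    h, w = mag.shape
    cy, cx = h // 2, w // 2
    # logarithmically spaced radii from 1 up to the largest inscribed radius
    rho = np.exp(np.linspace(0.0, np.log(min(cy, cx)), n_rho))
    # the magnitude spectrum is symmetric, so half a turn suffices
    theta = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    y = np.clip((cy + rho[:, None] * np.sin(theta)[None, :]).astype(int), 0, h - 1)
    x = np.clip((cx + rho[:, None] * np.cos(theta)[None, :]).astype(int), 0, w - 1)
    return mag[y, x]
```

In a full pipeline, phase-correlating the log-polar magnitudes of the two spectra gives the rotation and scale, after which the translation is found by a second phase correlation on the de-rotated, de-scaled images.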
We can summarize the Fourier-Mellin Transform (FMT) as follows. Firstly, by working in this translation-invariant (Fourier-Mellin) domain, linear component
Coarse estimation by the Fourier-Mellin Transform at the top level of the multiscale image pyramid.
Since the Fourier magnitude spectrum is used as the translation-invariant domain, an FFT of the whole original image would be needed, which is computationally expensive. This problem is alleviated by multiscale decomposition [
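A minimal sketch of such a multiscale decomposition, assuming simple 2x2 mean pooling at each level (the paper's exact pyramid filter may differ): the FMT is then run only on the coarsest level, where each FFT is a factor of 4 cheaper per level, and a translation recovered there is scaled back up by 2**(levels-1) before use at the original resolution.

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Multiscale decomposition by repeated 2x2 mean pooling.  Running the
    FMT only on pyr[-1] cuts the FFT cost by 4**(levels-1); a translation
    estimated there is multiplied by 2**(levels-1) at the original scale."""
    pyr = [img]
    for _ in range(levels - 1):
        a = pyr[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2  # crop to even size
        a = a[:h, :w]
        pyr.append((a[0::2, 0::2] + a[1::2, 0::2]
                    + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0)
    return pyr
```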
Thus, by applying the Fourier-Mellin Transform in the frequency domain of the coarse-scale image, the initial transformation parameters in (
Once the transformation parameters are obtained in the frequency domain, a set of feature correspondences can be established, under this guidance, by searching within a small area around the ideal projected center of each interest feature extracted from the source image. Even so, it is not guaranteed that all of these matches are exact correspondences; sometimes even a small error can have a large influence on the recovered parameters. In particular, in the case of occlusion or removal, the most similar feature point within the window may prove to be a false candidate match.
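The guided search can be sketched as below. This is a simplified illustration: the coarse transform is taken as an affine pair (A, t), and the nearest candidate inside the window stands in for the descriptor comparison the full method would use.

```python
import numpy as np

def windowed_matches(pts_ref, pts_cur, A, t, win=20):
    """Project each reference point with the coarse transform (A, t) from
    the frequency-domain stage and look for its match only inside a
    win x win window centred on the projection.  Here the nearest candidate
    is kept; the full method would rank candidates by descriptor similarity."""
    matches = []
    for i, p in enumerate(pts_ref):
        proj = A @ p + t                      # ideal projected centre
        d = np.abs(pts_cur - proj)
        inside = np.where((d[:, 0] <= win / 2) & (d[:, 1] <= win / 2))[0]
        if inside.size:
            j = inside[np.argmin(np.linalg.norm(pts_cur[inside] - proj, axis=1))]
            matches.append((i, int(j)))
    return matches
```

Restricting candidates to the window both speeds up matching and suppresses false matches from similar structures elsewhere in the image.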
To identify and eliminate outliers, we apply the robust estimation algorithm RANSAC [
Then, we make use of the distribution of collections of nearby interest points to reinforce the correspondence belief of each match. However, how to select such a group of points, and which metrics can enhance performance, is a challenge. Following the works in [
An instance of the distribution of nearby corresponding point sets is designed as follows. For every initially matched feature point pair
The angle from
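One concrete way to realize such a neighbourhood test is sketched below, under the assumption that the check is angle-based: under a similarity transform, the angles of the vectors from a point to its nearest matched neighbours all change by the same global rotation, so the per-neighbour angle differences between the two images must agree. The coherence statistic and 10-degree tolerance are illustrative choices, not the paper's; the k = 3 neighbourhood size follows the three nearest points described above.

```python
import numpy as np

def consistent_matches(pts_ref, pts_cur, k=3, tol_deg=10.0):
    """Validate each putative match via its k nearest matched neighbours:
    the angle differences (current minus reference) of the vectors to those
    neighbours should all equal the one global rotation, so a scattered set
    of differences flags an outlier match."""
    keep = []
    for i in range(len(pts_ref)):
        d = np.linalg.norm(pts_ref - pts_ref[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]              # k nearest, excluding i
        v_ref = pts_ref[nbrs] - pts_ref[i]
        v_cur = pts_cur[nbrs] - pts_cur[i]
        a_ref = np.arctan2(v_ref[:, 1], v_ref[:, 0])
        a_cur = np.arctan2(v_cur[:, 1], v_cur[:, 0])
        # circular coherence of the angle differences: 1.0 means identical
        coherence = np.abs(np.mean(np.exp(1j * (a_cur - a_ref))))
        if coherence > np.cos(np.deg2rad(tol_deg)):
            keep.append(i)
    return keep
```

A mislocated match corrupts not only its own test but also the tests of matches that count it among their neighbours, so false matches tend to be rejected together with their immediate neighbourhood.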
In this section, we conduct point correspondence experiments using both real images and synthetically deformed image pairs. We demonstrate the accuracy and robustness of the algorithm presented in the second section.
Image matching plays a critically important role in remote sensing applications. Due to the large volumes of remote-sensing data available, automated feature correspondence is highly desirable. We consider images that differ by an approximately planar motion, which is suitable for remote sensing image registration. To measure the performance of the proposed method, we apply our algorithm to two sets of images. The first set is the image pair named col90p1 and col91p1 (shown in Figure
As noted above, any geometrical feature can be represented as a point set for finding meaningful correspondences with another point set. However, it is important for feature-based methods to adopt discriminative and robust feature descriptors that are invariant to the differences between the two images. Lowe [
First, the standard SIFT matching procedure is applied to the image pairs supplied by Leica Geosystems Geospatial Imaging. Figure
Currently, our algorithm uses a 3-pixel projection error threshold for RANSAC, and we repeat the RANSAC loop just 10 times, using 3 putative matches to compute the affine warp parameters. The length and width of the constrained window for each mapped point are both 20 pixels, as shown in the right part of Figure
The transformation obtained from frequency information guides the correspondence search within small windows, whose centers are ideally mapped from the interest points detected in the reference image.
Illustration of an instance of our strategy. For each initially matched feature point pair, the geometrical relationships between the three nearest points around
Some examples of feature matches in the image pair named col90p1 and col91p1, supplied by Leica Geosystems Geospatial Imaging. A few of them are false matches due to the similarity of local appearance information.
Two examples of the performance comparison between standard SIFT and the proposed method. First column: original interest point in the reference image. Second column: corresponding interest point using standard SIFT detection in the current image. Third column: the rectified locations, which achieve subpixel accuracy, obtained with our method.
In the first experiment, given two sets of interest points detected and described by the SIFT feature in Figure
The experiment on SPOT5's JiNing coal satellite images also achieves a satisfactory result. The transformation parameters are [1.0007, −0.0226, 48.4592, 0.0323, 0.9969, −806.2157] in the form of
We also performed experiments on ten pairs of images from several distinct domains, including medical scans, natural scenes, and military surveillance. The proposed method is computationally efficient, owing to the shift property of the Fourier Transform (FT) and the use of the Fast Fourier Transform. Another reason is that the transformation obtained in the frequency domain at the coarse layer serves as a good initial guess for the matching process, reducing it to an easy search within a small region.
To further quantitatively evaluate the accuracy of the proposed technique, we adopt a simple evaluation protocol: a known transformation (we take the affine transformation as the concrete example in this case) is applied to the source image, and the estimated transformation is compared with the known transformation parameters. Two examples are shown in Figure
Example of the fixed source image and the synthesized image pairs.
Under various transformation parameters, we compute three metrics: the Root Mean Square (RMS) error between the point sets after alignment, the correlation coefficient between the original image and the rectified image, and the ratio of outliers to putative matched point pairs. The RMS error represents the difference between the original control points and the new control point locations calculated by the transformation process. Note that the optimal RMS error is 0, indicating exact matching between the images before and after rectification, while poor matches result in large RMS errors, small correlation coefficients, and high outlier rates.
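The first two metrics have standard definitions, sketched here for completeness (the outlier ratio is simply a count of rejected over putative matches):

```python
import numpy as np

def rms_error(pts_a, pts_b):
    """Root-mean-square distance between corresponding control points
    after alignment; 0 means exact agreement."""
    return np.sqrt(np.mean(np.sum((pts_a - pts_b) ** 2, axis=1)))

def correlation_coefficient(img_a, img_b):
    """Normalized cross-correlation between the original and rectified
    images; 1 means identical up to a global gain and offset."""
    a = img_a - img_a.mean()
    b = img_b - img_b.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b))
```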
In detail, for the fixed source image, we constructed a target image set containing 100 images synthesized from random affine transformations with rotation
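The synthesis step can be sketched as follows. The parameter ranges and the nearest-neighbour warp are illustrative assumptions, not the paper's exact settings; the coordinate convention is (row, column).

```python
import numpy as np

def random_affine(rng, max_rot_deg=30.0, scale_range=(0.8, 1.2), max_shift=20.0):
    """Draw a random rotation + isotropic scale + translation.  The ranges
    are placeholders for the ranges used in the experiments."""
    th = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    s = rng.uniform(*scale_range)
    M = s * np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    t = rng.uniform(-max_shift, max_shift, 2)
    return M, t

def warp_affine(img, M, t):
    """Nearest-neighbour inverse warp: each output pixel (y, x) looks up
    its source position under the inverse of (M, t), clipped to the image."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    src = (coords - t) @ np.linalg.inv(M).T
    sy = np.clip(np.rint(src[:, 0]), 0, h - 1).astype(int)
    sx = np.clip(np.rint(src[:, 1]), 0, w - 1).astype(int)
    return img[sy, sx].reshape(h, w)
```

Each synthesized target is then registered back to the source, and the recovered parameters are compared against the known (M, t).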
RMS error, correlation coefficient, and outlier rate after the transformation.
Trial number              1       2       3       4       5
RMS error                 0.021   0.033   0.043   0.058   0.038
Correlation coefficient   0.993   0.992   0.989   0.988   0.993
Outlier rate              0.42%   0.46%   0.58%   0.65%   0.43%
Comparison with the Fast Fourier Transform (FFT) [
The average relative error and computation time compared with the FMT and HMIR methods.

                   Average relative error (%)        Time (sec)
FMT method         0.9     1.2     4.3     2.7       18.4
HMIR method        4.3     3.9     3.6     2.2       34.7
Proposed method    0.4     0.6     1.4     1.9       21.8
In these results, SIFT points are used for the comparison but are not required for a final application (the computation time for our proposed method does not include SIFT point detection). In fact, we also tested Harris corners, which are considerably faster without any degradation in accuracy. The type of point detector (SIFT detector or Harris corner detector) does not influence the performance of the proposed method.
In this paper, the combination of methods across the spatial and frequency domains compensates for the deficiencies of each individual method. The integration of the frequency and spatial domains simultaneously finds the correct feature correspondences within small support windows, recovers the mapping between image pairs of the same scene, and yields more accurate locations after the rectification step. It has been shown that our hybrid hierarchical estimation technique achieves efficient and robust performance.
The authors would like to thank the associate editor and the anonymous reviewers for their careful work. This research was supported by the National Natural Science Foundation of China (Grant nos. 60805028, 60903146), the Natural Science Foundation of Shandong Province (ZR2010FM027), the Zhejiang Provincial Natural Science Foundation of China (no. Y1110661), and the China Postdoctoral Science Foundation (2012M521336).