A Subpixel Matching Method for Stereovision of Narrow Baseline Remotely Sensed Imagery

In this paper, an accurate and efficient image matching method based on phase correlation is proposed to estimate disparity with subpixel precision, which is used for the stereovision of narrow baseline remotely sensed images. The multistep strategy is adopted in our technical frame; thus the disparity estimation is divided into two steps: integer-pixel prematching and subpixel matching. Firstly, integer-pixel disparity is estimated by employing a cross-based local matching method. Then the relationship of corresponding points is established under the guidance of integer-pixel disparity.The subimages are extracted through selecting the corresponding points as the center. Finally, the subpixel disparity is obtained by matching the subimages utilizing a novel variant of phase correlation approach.The experiment results show that the proposedmethod canmatch different kinds of large-sized narrow baseline remotely sensed images and estimate disparity with subpixel precision automatically.


Introduction
Stereovision is an advanced task in remote sensing and photogrammetry [1].The aim of stereovision is to estimate the disparity through matching two or more images of same scene in different views and extract digital elevation model (DEM) through the disparity [2].Intuitively, the disparity represents the displacement vectors between corresponding pixels that horizontally shift from the left image to the right image [3].The stereovision of narrow baseline remotely sensed imagery is a new research hotspot for stereovision in recent years [4].Automatic subpixel image matching is the essential technique for narrow baseline stereovision [5].We will brief review the stereovision with different baseline and subpixel image matching method in the remaining part of this section, respectively.

Principle of Stereovision.
The process of DEM extracting can be represented by a stereovision model [6], as shown in Figure 1.B and H express the baseline and the altitude of a satellite.L and R are the left and right images captured by the satellite from different views; h represents the height of ground object to be measured.A and B are the 3D points of the real world scene, which are projected to the three 2D locations a,   , and   in L and R. The 2D points of A and B in L are coincident; thus the disparity d of the two corresponding points a and   equals the distance between   and   in R. D represents the mapping of d in the 3D scene; according to the similar triangles of geometry principle, the equation / = ℎ/( − ℎ) is established.Assume that the ground sample distance (GSD) of the sensed image is G (meters/pixel); the height h of ground object can be generated: ℎ = /(/( − ℎ)).Because the altitude of satellite is much larger than the height of ground object, that is,  ≫ ℎ, the height h can be approximated as ℎ ≈ /(/).It can be known that the precision of h is proportional to the precision of disparity and GSD and is inversely proportional to the B/H ratio.In most real sensors, H and G are limited and fixed; therefore, only two aspects can be used to enhance the precision of h: (1) Improve the precision of disparity to subpixel level [7]; (2) increase the length of baseline, that is, increasing the B/H ratio.In traditional stereovision of remotely sensed imagery, B/H ratio is usually set to about 1, called stereovision of wide baseline remotely sensed imagery.Even if the disparity precision is only up to the integer-pixel level, the DEM can be extracted accurately.

Comparison of Wide Baseline and Narrow Baseline Stereo.
The stereovision of wide baseline remotely sensed imagery can not perform well in urban areas with dense buildings [8].The captured process of the wide baseline remotely sensed image pair is shown in Figure 2(a).Due to the long baseline and large viewing angle, there is a large occlusion area and large geometric distortion between the images.In addition, the interval time of two imaging processes is long; the illumination condition has changed greatly.The above interference factors will lead to a high false matching rate in stereovision of wide baseline remotely sensed imagery.Therefore, it requires a lot of manual and postcorrection to extract DEM [9].In order to solve this problem of DEM extraction in urban areas, stereovision of narrow baseline remotely sensed imagery technique has emerged and gradually become a new research focus in stereovision and remote sensing [10,11].The captured process of the narrow baseline remotely sensed image pair is shown in Figure 2(b).Due to the short baseline and small viewing angle, there is a small occlusion area and little geometric distortion between images.When / → 0 (i.e., a very narrow baseline), then occlusions will be minimized, essentially because each image view becomes more geometrically similar.In addition, the interval time of two imaging processes is short; it can be considered that the illumination condition is similar to the same time.Due to above advantages, the stereovision of narrow baseline remotely sensed imagery is suitable for extracting DEM in the urban areas [8].Two sets of simulated remotely sensed image pairs provided by Beijing Institute of Space Mechanics and Electricity are shown in Figure 3. However the reduction of B/H ratio will inevitably lead to the reduction of precision for DEM extracting.If the stereovision of narrow baseline is going to extract the DEM as precise as the stereovision of wide baseline, the disparity needs to achieve 1/ pixel precision, where  is the ratio of wide baseline and narrow baseline [6].For example, when B/H is 0.05, the precision of disparity needs to achieve 1/20 pixels to satisfy the DEM extraction.Therefore, the key technique for the stereovision of narrow baseline remotely sensed imagery is the high precision subpixel image matching [5,7].

A Survey of Subpixel Image Matching Method.
Automatic subpixel image matching is one of the most essential techniques in stereovision.It can be divided into three major categories: interpolation based method, fitting based method, and phase correlation method [12].Interpolation based method includes the original image interpolation method and the matching cost interpolation method.The matching cost interpolation method is the representative algorithm, which combines the advantages of high efficiency and accuracy [13,14].The initial cost volume is interpolated by various interpolation function, and the subpixel disparity is estimated by searching the extremum of interpolated cost volume.At present, this method is usually used for the auxiliary system of driverless vehicles, robot navigation, and unmanned aerial vehicle.The original image interpolation method significantly increases the computing load, limits the possible precision to the chosen upsampling rate, and also may introduce interpolation artifacts [5].The fitting based method generally postprocess the cost volume or disparity plane by fitting method to estimate the subpixel disparity.The cost volume fitting method fits the peak neighborhood of cost volume into a parabola; then the subpixel disparity is estimated by searching the extreme of parabola [15,16].The disparity plane fitting method models the disparity field by segmentation constraints, and subpixel disparity is obtained by fitting disparity plane [17][18][19][20].The fitting based method is efficient, but the disparity precision is low.Phase correlation method provides high efficiency and accuracy via fast Fourier transform and other supplementary approaches under ideal conditions.In general, the main peak location of the inverse Fourier transform of the normalized cross-power spectrum was interpolated with the parabolic, Gaussian, and sinc functions to get the subpixel disparity [21][22][23][24].
In this paper, a subpixel image matching method based on phase correlation is proposed to estimate the disparity with subpixel precision for stereovision of narrow baseline remotely sensed imagery.The rest of this paper is organized as follows.The principle of phase correlation is introduced briefly in Section 2. A novel improved phase correlation method with subpixel precision is described in detail in Section 3. The complete algorithm is described in Section 4. Experimental evaluation is presented in Section 5. Finally, the paper is concluded in Section 6.

Left image
Right image
Phase correlation is a common method to estimate the translation between images.Only phase information of images is utilized; thus it is almost not affected by noise and radiation difference and more robustness than spatial domain method.But the digital image is a discrete function whose Fourier transform is a discrete transform, so only the integerpixel translation (int(Δ), int(Δ)) can be obtained by searching the peak location.

Improved Phase Correlation with Subpixel Precision
To estimate the subpixel translation, various algorithms have been proposed.In [23], Foroosh et al. achieved the subpixel estimation by approximating the Dirichlet function derived from the normalized cross-power spectrum.Nagashima et al. propose a fitting method to enhance the matching precision from integer-pixel level to subpixel level, which fits a curve surface by using the neighborhood data around phase correlation peak [21].However, these methods only take into account the peak and its neighborhood; the distribution of phase correlation function is neglected; thus, the precision of translation estimation is low in stereovision of remotely sensed imagery.An improved phase correlation method used for subpixel estimation is designed in this section.The flow chart is shown in Figure 4. Due to the periodicity of DFT, an image can be considered to "wrap around" at an edge; therefore, discontinuities, which are not supposed to exist in real world, occur at every border in 2D DFT computation [21].A 2D Hanning window function is applied to reduce the effect of discontinuities at image borders.The 2D Hanning window function is defined as The subpixel translation between two images can be obtained by downsampling high resolution images with integer-pixel translation.Therefore, the phase correlation of two images can be approximated by 2D sinc function: In theory, sinc function can approximate the distribution of phase correlation in ideal condition.However the phase correlation peak is decreased in remotely sensed images matching due to the interference of random noise, image aliasing, edge effects, and other influences.Therefore, the sinc function can not describe the distribution of phase correlation accurately.In our method, the sinc function is improved by introducing phase correlation coefficient: where  is the phase correlation coefficient,  ≤ 1.When the translation is integer-pixel and there is no interference factor,  = 1; when the translation is subpixel or there are interference factors,  < 1. Due to this improved phase correlation method is applied to the stereovision of narrow baseline remotely sensed imagery, and the image pair is treated by epipolar rectification before matching.Thus, the phase correlation PC(, ) can be separated in spatial domain, and a 1D function expression is given which only exists as translation in horizontal direction: The peak position (Δ, 0) is the translation which is expected to be estimated.To locate the high precision subpixel peak position, a peak evaluation method based on uniformly spaced sampling is proposed.Assume that  is the real peak position of sampled-data, as shown at the bottom left corner of Figure 4 sampling where k i = k i−1 + Δk The above equation can be expressed as a linear expression of Δ: Through (12), the horizontal translation Δ with subpixel precision can be estimated:

Mathematical Problems in Engineering
To reduce the influence of noise, the peak evaluation is performed by selecting multiple sets of uniformly spaced sampled-data which are all the data on the  axis, and the least square is applied to solve the optimal translation.Assume that  sets of uniformly spaced sampled-data (  −   ,   ,   −   ) ∈ are selected for peak evaluation;   ,   are determined as follows:  1 =  2 = ⋅ ⋅ ⋅ =   = arg max  PC  (),  1 = 1,  2 =  1 + Δ, . . .,   =  −1 + Δ, (Δ = 1).Equation ( 12) can be rewritten as where

Complete Subpixel Image Matching Algorithm
The input of our algorithm is rectified narrow baseline remotely sensed image pair, and output is the disparity with subpixel precision.Because the stability of phase correlation method is poor when the disparity search range is large [9], therefore, a multistep strategy is adopted in our technical frame, and the disparity estimation is divided into two steps: integer-pixel prematching and subpixel matching.The complete framework of our algorithm is shown in Figure 5. Firstly, integer-pixel disparity is estimated by employing an effective cross-based matching algorithm [25].Then relationship of corresponding points is established under the guidance of integer-pixel disparity.The subimages are extracted through selecting the corresponding points as center.Finally, subpixel disparity is obtained by matching the subimages utilizing the improved phase correlation proposed in Section 3. Assume that a narrow baseline remotely sensed image pair consists of left image  and right image , which can be expressed as a classic stereo model (, ) = ( + (, ), ).The disparity (, ) is used to describe the geometric position variation between  and , which consists of two parts, integer-pixel disparity   (, ) and subpixel disparity   (, ); that is, (, ) =   (, ) +   (, ).The details of our subpixel image matching algorithm are described as follows from step (a) to step (g).
(a) Assume ( 0 ,  0 ) is the pixel which is to be matched.In the first step, the cross-based support window of ( 0 ,  0 ) is defined.Firstly, an upright cross is established for ( 0 ,  0 ).It consists of two orthogonal line segments: horizontal segment   ( 0 ,  0 ) and vertical segment   ( 0 ,  0 ).A quadruple {ℎ − , ℎ + , V − , V + } denotes the left, right, up, and bottom arm length, respectively, and the length of each arm is determined by searching for the extreme point in that direction based on the color similarity.The upright cross skeleton is defined as Then the cross-based support window ( 0 ,  0 ) is defined based on the upright cross skeleton.( 0 ,  0 ) is a combination of each horizontal segment   (, ) where (, ) traverses over the vertical segment   ( 0 ,  0 ).The formula of ( 0 ,  0 ) is defined as (b) The matching cost of ( 0 ,  0 ) is calculated.Firstly, the initial matching cost  ,,  of each pixel in the support window is calculated: where ∇  is horizontal gradient,  balances the color and gradient terms, and  1 ,  2 are truncation values.Then the matching cost    0 , 0 ,  of ( 0 ,  0 ) is aggregated: (c) The integer-pixel disparity   ( 0 ,  0 ) of ( 0 ,  0 ) is estimated by winner-takes-all strategy: where   is the searching range of disparity.
(e) The subimages   ( 0 ,  0 ) and   ( 0 ,  0 ) with same size are extracted through selecting the corresponding points as the center.Because the selection of the two subimages is based on the guidance of the integer-pixel disparity, thus there is only a small range of horizontal translation between   ( 0 ,  0 ) and   ( 0 ,  0 ).

Experiments and Analysis
We test the proposed subpixel matching method for stereovision of narrow baseline remotely sensed imagery on a standard personal computer with Intel(R) Core(TM) i7 CPU, and the algorithm is implemented utilizing VS2010+OpenCV.The results analysis includes two aspects: precision analysis and complexity analysis.Constant parameter settings are used for all experiments.The parameters of this method are set as  = 0.11,  1 = 7/255, and  2 = 2/255.

Test on Simulated Narrow Baseline Remotely sensed Images
Generated by Translation.To evaluate the matching precision of proposed method, six urban scenes of narrow baseline remotely sensed image pairs, including Pleiades, SPOT-5, SPOT-6, WorldView-3, and GF-2, are used to perform image matching.For each image pair, the original image is utilized as the reference image (often called the left image), and the original image is moved 8.738 pixels along the x-axis as the target image (called the right image).Thus the true disparity is known to be 8.738 pixels for all the matching points of each image pair.Table 1 shows the general information of the test image pairs used in these experiments.
Figure 6 shows the distribution of matched points for the test narrow baseline remotely sensed image pairs.Because the image size is different for each image pair, the step size of grid is different too.The step sizes of grid in Tests 1-6 are set to 540 × 540 pixels, 160 × 160 pixels, 180 × 180 pixels, 170 × 170 pixels, 300 × 300 pixels, and 380 × 380 pixels.
In order to evaluate the matching precision of our method quantitatively, interpolation based method [14], fitting based method [15], and traditional phase correlation method [23] are used to compare with our method.Root mean square error (RMSE) and mean error of matched points are used as the assessment indices to evaluate these methods.Figure 7 shows quantitative evaluation results for the four subpixel image matching methods.Here the traditional phase correlation is called traditional PC for short.Figure 8 shows statistical results of the quantitative evaluation.Figure 8(a) shows statistical results where the RMSE is less than or equal to 0.05 pixels.Figure 8(b) shows statistical results where the RMSE is greater than 0.05 pixels and less than 0.1 pixels.Figure 8(c) shows statistical results where the RMSE is greater than or equal to 0.1 pixels.From the experimental results, one can see that the two PC methods have a high matching precision, and the precision is better than that of interpolation method and fitting method.In addition, the matching precision is related to the spatial resolution of the images.High spatial resolution corresponds to a high matching precision.Because the peak evaluation method based on uniformly spaced sampling is designed to fit the phase correlation function, our method outperforms the traditional PC method, about 66% matched points of our method with RMSE less than or equal to 0.05 pixels.

Test on Simulated Narrow Baseline Remotely sensed
Images.To evaluate the precision of DEM extraction by using the disparity estimated by our method, a simulated narrow baseline remotely sensed image pair provided by Beijing Institute of Space Mechanics and Electricity is used to perform image matching.The simulated narrow baseline remotely sensed image pair is generated by SE-WORKBENCH software.The attached parameters file of the image pair not only provides the true disparity, but also provides the true elevation of targets and buildings in the scene.Therefore, the precision of disparity and elevation estimated by our method can be evaluated carefully.Table 2 shows the general information of the test image pair used in these experiments.The B/H ratio of the image pair is 0.05, and spatial resolution is 0.3 meters/pixel.Overlap ratio of left and right images is 60%, and the overlap area includes 22 targets and 40 buildings.Figure 9 shows the simulated narrow baseline remotely sensed image pair and distribution of targets and buildings.Figure 9(a) is the image pair, Figure 9(b) is the distribution of targets, and Figure 9(c) is the distribution of buildings.The targets and buildings are marked with red numbers.Figure 10 shows the matching results.Figure 10(a) is the distribution of matched points for the buildings.Figure 10(b) is the distribution of matched points for the targets.
To evaluate the matching precision of our method quantitatively, interpolation based method [14], fitting based method [15], and traditional phase correlation method [23] are used to compare with our method.RMSE of disparity and mean elevation error are used as the assessment indices to evaluate the precision of four methods, where the elevation Our method Traditional PC method [23] Interpolation method [14] Fitting method [15] (a) Our method Traditional PC method [23] Interpolation method [14] Fitting method [15] (b)  Our method Traditional PC method [23] Interpolation method [14] Fitting method [15] Statistical results with RMSE ≤ 0.05 pixels (%) Our method Traditional PC method [23] Interpolation method [14] Fitting method [15] (b) Our method Traditional PC method [23] Interpolation method [14] Fitting method [15] (c) error is the absolute difference between the true and the extracted elevation.The simulated narrow baseline remotely sensed image pair is matched 300 times by each subpixel image matching method.Figures 11 and 12 show the statistical results of four methods for targets and buildings, respectively.The results show that the RMSE and the mean elevation error obtained by our method are lower than the other three methods for most targets and buildings.Table 3 shows the average values of RMSE and mean elevation error.The true average elevation of 22 targets is 67.818 m, and the true average elevation of 40 buildings is 112.353 m.According to the statistical results in Table 3, the mean elevation error of targets estimated by our method is 0.303 m, and the mean elevation error of buildings estimated by our method is 0.406 m.Such precision can satisfy the requirements for stereovision of narrow baseline remotely sensed imagery; therefore, it is proved that the image matching method based on phase correlation proposed in this paper has practical application value.

Computational Complexity.
Assume that there are  1 points needed to be matched.In integer-pixel prematching step of our algorithm, the complexity of constructing the cross-based support window for each point is (1), and the total complexity for all points of this prematching step is ( 1 ⋅ ), where  is the disparity range.
In the subpixel matching step, the core algorithm is based on the phase correlation; therefore, the complexity is unrelated to the disparity range but related to the size of subimage.Assume that there are  2 pixels in the extracted subimage.The complexity of fast DFT is ( 2 ⋅ log  2 ), the complexity of Hanning window is ( 2 ), the complexity of normalized cross-power spectrum is ( 2 ), and the complexity of IDFT of normalized cross-power spectrum is ( 2 ⋅ log  2 ).The peak evaluation is a least square fitting operation for  sets of uniformly spaced sampled-data essentially.The minimum value of  is 1; that is, the peak position is located at the most left or right of sampled-data.However, the maximum value of  is ((√ 2 − 1)/2); this means the peak position is located at the center of sampled-data; thus  ≤ ((√ 2 − 1)/2).The complexity of subpixel peak position evaluation utilizing the least square method is ((√ 2 −1)/2) 2 , which is equivalent to ( 2 ).The complexity of a single point matching in subpixel matching step is ( 2 ⋅ log  2 ); thus the complexity of total points is ( 1 ⋅ 2 ⋅log  2 ).The complexity of complete image matching algorithm is ( 1 ⋅ ) + ( 1 ⋅ 2 ⋅ log  2 ).Because the complexity of subpixel matching is higher than that of integer-pixel prematching, the complexity of our algorithm is ( 1 ⋅  2 ⋅ log  2 ).
In order to clearly illustrate the complexity of our algorithm, we compare with the classical normalized cross correlation (NCC) algorithm [2].Assume that the support window of NCC is ; in the practical image matching,  ≈  2 ≪  1 .When log  2 ≈ , the complexity of our method is similar to that of NCC.Generally, the size of subimage is 30 × 30∼40 × 40 pixels; the number of pixels in subimage is 900∼1600; thus the value range of log  2 is 9.814∼10.644.The disparity range of narrow baseline stereo image pair is usually within 20 pixels.In our experiment, the disparity range is 17 pixels; log  2 ≈  is established; therefore, the complexity of our method is similar to that of the classical NCC method.

Conclusion
Through the analysis of narrow baseline remotely sensed imagery stereovision, a subpixel image matching approach based on improved phase correlation is proposed.The crossbased local matching method is employed for prematching the image pair, and the obtained integer-pixel disparity reduces the search range of subpixel matching.A high precision subpixel matching step is implemented under the guidance of integer-pixel disparity.A peak evaluation method based on uniformly spaced sampling is designed to improve the precision of disparity estimation.The experimental results show that our method has superior performance on precision and efficiency, and it is beneficial to the stereovision of narrow baseline remotely sensed imagery.Traditional PC method [23] Interpolation method [14] Fitting method [15] (a) Our method

Figure 3 (
a) is simulated wide baseline remotely sensed image pair with a B/H ratio of 1, and Figure 3(b) is simulated narrow baseline remotely sensed image pair with a B/H ratio of 0.05.From the comparison of Figures 3(a) and 3(b), we can see that the occlusion range and the geometric distortion of narrow baseline image pair are less, and there is a higher degree of similarity between left and right image.It is beneficial to improve the accuracy of image matching; thus fully automatic DEM extraction can be achieved through stereovision of narrow baseline remotely sensed imagery.

Figure 2 :Figure 3 :
Figure 2: The captured process of the remotely sensed image pair.(a) Traditional wide baseline stereo configuration and (b) narrow baseline stereo configuration.

Figure 4 :
Figure 4: Flow chart of improved phase correlation image matching method.

Figure 5 :
Figure 5: The complete framework of our subpixel image matching algorithm.

Figure 7 :
Figure 7: The quantitative evaluation results for the four subpixel image matching methods.(a) RMSE and (b) mean error.

Figure 8 :
Figure 8: The statistical results of the quantitative evaluation.(a) Statistical results of RMSE less than or equal to 0.05 pixels; (b) statistical results of RMSE greater than 0.05 pixels and less than 0.1 pixels; and (c) statistical results of RMSE greater than or equal to 0.1 pixels.

Figure 9 :
Figure 9: The simulated narrow baseline remotely sensed image pair and distribution of targets and buildings.(a) The image pair with a B/H ratio of 0.05.(b) The distribution of targets.(c) The distribution of buildings.

Figure 10 :
Figure 10: The matching results.(a) The distribution of matched points for the buildings.(b) The distribution of matched points for the targets.

Figure 11 :
Figure 11: The statistical results of targets.(a) Comparison of RMSE.(b) Comparison of mean elevation error. 0

Figure 12 :
Figure 12: The statistical results of buildings.(a) Comparison of RMSE.(b) Comparison of mean elevation error.

Table 1 :
General information of the test image pairs.

Table 2 :
The general information of the test image pair.

Table 3 :
The RMSE and mean elevation error of four methods.