Feature Based Stereo Matching Using Two-Step Expansion

This paper proposes a novel feature-based stereo matching method that produces a dense disparity map through two different expansion phases. It finds denser point correspondences than existing seed-growing algorithms and performs well in both short-baseline and wide-baseline situations. The method assumes that the pixel coordinates in each image segment corresponding to a 3D surface satisfy 1D projective geometry along the horizontal axis. First, a state-of-the-art feature matching method is used to obtain sparse support points, and an image segmentation-based prior is employed to assist the first-step expansion. Second, the first-step expansion finds more feature correspondences in each uniform region from the initial support points, based on the invariance of the cross ratio under 1D projective transformation. To find enough point correspondences, a regular seed-growing algorithm is then used as the second-step expansion to produce a quasi-dense disparity map. Finally, two different methods are used to obtain a dense disparity map from the quasi-dense pixel correspondences. Experimental results show the effectiveness of our method.


Introduction
Stereo matching is an international research focus in computer vision [1]. It produces a disparity map from stereo images captured by cameras at different viewpoints. This technology is important in 3D reconstruction, virtual view rendering, and automatic navigation. A key question is how to compute a precise disparity map in a complex environment. Much excellent research addresses this problem. However, inherent challenges remain, such as unavoidable light variations, textureless regions, occluded areas, and nonplanar surfaces, which make disparity estimation difficult [2-4].
To solve these inherent problems, numerous methods have been proposed in the past two decades. They fall into local and global methods [5,6]. Local methods generally compute the correlation between a point and its candidates over an adequate window and then use a winner-takes-all (WTA) algorithm to find the best candidate for the point [7,8]. They compute disparity quickly and can flexibly model parametric surfaces within the neighborhood, but they have difficulty handling poorly textured and ambiguous surfaces. Global methods, in contrast, commonly integrate prior constraints into the optimization of the point correspondences to handle poorly textured areas and lessen matching ambiguities. They produce the disparity map with an energy minimization algorithm and perform better in poorly textured and textureless regions, but they are limited to modeling piecewise planar scenes [9]. Global methods perform reasonably well when the viewpoints are close [10] but do not cope well when the distance between viewpoints becomes large [11,12].
In large-scale stereo images, ambiguous areas are more common than in their short-baseline counterparts. Whether the viewpoints are close or far apart, there are always some significant features, such as points of interest, that are invariant. An alternative approach uses reliable feature correspondences as seeds and expands these points through a growing-like process to obtain more point correspondences [13-18]. These methods, called seed-growing or region-growing, can yield much better results than traditional ones under large perspective distortions and increased occlusion. Seed-growing methods have low computational complexity because they do not use global optimization, but they are sensitive to mismatches. To lessen the influence of wrong points, Cech and Sara [19] employed an optimal solution and introduced an improved growing method that can handle many difficult instances, such as repetitive or complex textures. The method does not need each seed to be accurate in the disparity map. However, seed-growing algorithms generate only a semidense disparity map because the feature points are sparse.
To overcome the drawbacks of traditional matching methods and seed-growing algorithms, matched features are naturally integrated into state-of-the-art stereo methods as soft constraints [3,20]. In these methods, a primary task is to find accurate point correspondences as GCPs (ground control points) [21]. GCP-based approaches improve stereo matching accuracy and correctness. However, they need much time to obtain an accurate disparity map.
In this paper, a robust dense matching algorithm based on two-step expansion is proposed, building on previous works [19, 22-24]. Sparse support points are obtained by state-of-the-art feature matching methods [22,23]. Before the two-step expansion, a segmentation-based prior [24] is used to encode the assumption that a region of uniform color corresponds to a single 3D surface. The first step is a feature expansion based on the invariant cross ratio of projective transformation. The basic idea is to match more features from the initial support points in a uniform region via the cross ratio constraint. However, this step alone cannot find enough matched pixels to obtain a dense disparity map. To obtain more point correspondences, in the second step the matched features from the first step are used as seeds to grow and build a quasi-dense disparity map, which is denser than the feature correspondences of the first step but still not completely dense. For the stage from quasi-dense to dense disparity, the paper introduces two methods: (i) a fitting process, in which planar surface fitting removes mismatches and fills blank occluded areas in a uniform region, and (ii) a synthesized method, in which an optimal solution incorporates the quasi-dense pixels into global energy methods to reduce matching ambiguities.
This work focuses mainly on the first step, which uses a feature expansion algorithm for stereo matching. In the first step, we suppose that a set of sparse points lies on the same 3D surface and that the coordinates of the corresponding image pixels satisfy 1D projective geometry along the horizontal axis. Our motivation comes from the theory that points on a line undergo a 1D projective transformation, under which the cross ratio is invariant. Using the invariance of the cross ratio, the inhomogeneous coordinates of each corresponding pixel can be approximated. The accurate coordinates of the corresponding pixel are then found by a search model that computes a correlation statistic over neighboring pixels. In addition, to handle poorly textured regions, we employ a propagation algorithm to expand to low-feature pixels. Occluded areas can be filled by the fitting process or the synthesized method, and the fitting process does not use cross-checking (checking and optimizing the disparity by computing the differences between the left-to-right and right-to-left disparities). Experimental results demonstrate that the two-step expansion method performs well relative to existing ones. It produces denser disparity than existing seed-growing algorithms, and it gives good results in both short-baseline and wide-baseline stereo matching.
The paper is structured as follows. Related work is discussed in Section 2. In Section 3, we introduce a support-point-based expansion algorithm with the cross ratio constraint. Section 4 describes the two-step expansion method, focusing mainly on the first step and its application of feature expansion. In Section 5, we describe two different methods to produce a dense disparity map. Section 6 gives the experimental validation supporting the feasibility of the method. Section 7 concludes and suggests some future work.

Related Work
There is a large body of literature related to this work. Scharstein and Szeliski [1] summarized dense stereo methods and established an early test bed for stereo matching algorithms. Geiger et al. then provided a new outdoor challenge [25] for the quantitative evaluation of large-scale stereo matching. Seitz et al. [26] presented a comprehensive study and comparison of stereo techniques, which identified two main strategies for obtaining stereo correspondence: local approaches based on feature correspondences and global methods based on energy minimization. In our method, the two-step expansion algorithm and the subsequent fitting process belong to the first strategy, and the synthesized method falls into the second.
Global methods based on dense energy minimization have performed well in the past decade. Local stereo algorithms based on feature correspondences estimate disparity quickly [1,27] but cannot effectively handle blurry borders and mismatches [7]. Hence, most high-performing stereo matching algorithms first use local approaches to find the pixel correspondences and then incorporate them into global constraints by dynamic programming (DP) [28-31], level sets [32], space carving [33], PDE [12,34], EM [35], and voxel coloring [36]. More recently, two global methods based on Markov random fields (MRFs) have served as base algorithms for further improvement: Graph Cuts [37] and Belief Propagation [38]. Many studies of both have achieved desirable results [4,39,40]. Both methods are often used as reference data for the top contenders in dense stereo matching and are powerful tools for producing disparity maps, but they become intractable in wide-baseline stereo. In contrast, our method can lessen the matching ambiguities and is efficient for large-scale stereo matching.
Approaches based on sparse local features are robust to large-scale images. Image features play an important role in computer vision and have already been used in wide-baseline stereo matching [41-43]. In a wide-baseline setup, the inherent problems are perspective distortion and occlusion. Feature-based matching methods are particularly effective because features are robust, distinctive, and invariant to various image and scene transformations [22,23,44-47]. However, traditional methods based on feature matching produce only sparse pixel correspondences. To find more matched points than features, a propagation algorithm from the matched points to their neighbors is introduced.
The principle of growing a region from primary seeds was first used to segment images [48]. The seed-growing principle was originally introduced into stereo matching by Otto and Chau [49], O'Neill and Denos [50], and Kim and Muller [51] and was used in the photogrammetric community. Lhuillier and Quan [15,52] then employed the epipolar constraint and the uniqueness constraint to greedily propagate adjacent components into empty areas of the disparity map from corresponding seeds. This growth algorithm does not perform well in areas with repetitive patterns. Zeng et al. [17,18] used a best-first strategy as an optimal solution to replace the pixel-wise growth increments. However, this optimization cannot remove earlier match errors, especially in complex scenes. Kannala and Brandt [53] and Megyesi et al. [54] introduced a propagation algorithm based on affine deformation of image similarity patches, but wrong initial seeds led to inaccurate affine parameters and poor propagation. Cech and Sara [19] introduced an optimal solution and presented a seed-growing method that could recover from errors in initial seeds. However, the method produced only a semidense disparity map. In contrast, our method not only handles the difficult instances (e.g., repetitive texture, complex scenes, and wrong initial seeds) but also produces denser point correspondences than the existing methods.
To compute an accurate dense disparity map, we incorporate quasi-dense pixel correspondences as GCPs into a state-of-the-art global matching framework. In the stereo matching literature, GCP-based methods can achieve precise results. Bobick and Intille [2] used GCPs to optimize the DP solution and reduce large occlusions. Kim [3] and Wang et al. [20] used GCPs in a preprocessing stage to guide the matching process and reduce falsely matched points. In [21], a GCP-based regularization was incorporated into a global method by using the Bayes optimization rule. In contrast, our method does not need specially provided GCPs and can supply quasi-dense pixel correspondences as GCPs.
Geiger et al. proposed a generative probabilistic model, ELAS [7], for wide-baseline stereo matching and offered the challenging KITTI dataset [25]. On the KITTI dataset, the methods of [55-57], which were used to compute optical flow, obtained better results. In contrast, our method is just a stereo matching strategy, and its results are compared with those of ELAS.

Efficient Expansion with Cross Ratio Constraint
We assume a stereovision system as shown in Figure 1. There are three sets of four collinear points in the epipolar plane. Each set is related to the others by a line-to-line projective transformation. Since the cross ratio is invariant under 1D projective transformation, it has the same value for each of the three sets.
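
The invariance can be checked numerically. The following Python sketch is an illustration only: it assumes the common definition of the cross ratio of four collinear points with 1D coordinates x1, ..., x4, and the function names and the sample map coefficients are hypothetical rather than taken from the paper.

```python
from fractions import Fraction


def cross_ratio(x1, x2, x3, x4):
    """Cross ratio of four collinear points given by their 1D coordinates."""
    return ((x1 - x2) * (x3 - x4)) / ((x1 - x3) * (x2 - x4))


def project_1d(x, a, b, c, d):
    """1D projective map x -> (a*x + b) / (c*x + d), assuming a*d - b*c != 0."""
    return (a * x + b) / (c * x + d)


# Four sample points on a line and their images under an arbitrary 1D map.
points = [Fraction(v) for v in (0, 1, 3, 7)]
mapped = [project_1d(x, 2, 1, 1, 5) for x in points]

before = cross_ratio(*points)   # Fraction(2, 9)
after = cross_ratio(*mapped)    # Fraction(2, 9) as well
```

Exact rational arithmetic is used so that the two values can be compared for equality rather than within a tolerance.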

Estimation Model via Cross Ratio.
The cross ratio constraint based on 1D projective geometry needs three or more known 3D points, which must therefore be obtained first. We employ a feature matching algorithm as a prior to produce reliable point correspondences, which can be used to calculate the fundamental matrix. The proportional coordinates of the known 3D points can then be estimated from the point correspondences and the fundamental matrix. This estimate produces larger errors when the region containing the known 3D points is not a planar surface.
To lessen the error, we use the fact that points on the same epipolar line satisfy 1D projective geometry whether or not the surface is planar, and we introduce a search strategy that uses image points near the epipolar line instead of the 3D points, as shown in Figure 2. Suppose the images have been rectified, so that corresponding points lie on the same line in both images. The question is how to find the corresponding point 4 in the right image. First, we find the corresponding epipolar lines 1 and 2. Then we find a point V4 that satisfies the cross ratio constraint. If the points 1, 2, and 3 in each image are the homologous points onto which the 3D points 1, 2, and 3 of Figure 1 project, the points V1, V2, and V3 in each image are not the projections of those 3D points, because of the projective transformation. Hence, the point V4 is not the corresponding point 4 itself but lies adjacent to it. The shorter the distances from the points to the epipolar line, the nearer V4 is to the point 4. We employ a probabilistic search strategy to locate the point 4 among the pixels contiguous to V4 along line 2.
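
One way to read this estimation step: given three matched points on a rectified row and a fourth point in the left image, equating the two cross ratios yields a linear equation in the unknown right-image coordinate. The sketch below assumes this reading and the common cross ratio definition; `predict_fourth` and its arguments are illustrative names, not the paper's notation.

```python
from fractions import Fraction


def cross_ratio(x1, x2, x3, x4):
    """Cross ratio of four collinear points given by their 1D coordinates."""
    return ((x1 - x2) * (x3 - x4)) / ((x1 - x3) * (x2 - x4))


def predict_fourth(left, right3):
    """Predict the right-image coordinate of the fourth point.

    left   : four horizontal coordinates in the left image
    right3 : matches of the first three of them in the right image
    Returns None when the prediction is degenerate (point at infinity).
    """
    k = cross_ratio(*left)
    r1, r2, r3 = right3
    a = k * (r1 - r3)
    b = r1 - r2
    if a == b:
        return None
    # From k * (r1 - r3) * (r2 - r4) = (r1 - r2) * (r3 - r4), solved for r4.
    return (b * r3 - a * r2) / (b - a)


left = [Fraction(v) for v in (0, 1, 3, 7)]
right3 = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 8)]
predicted = predict_fourth(left, right3)  # Fraction(5, 4)
```

In practice the prediction is only approximate, which is why a local search around it follows.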

Search
where  is a proportional constant and  is the threshold for a correct correspondence; a value of −1 for the point 4 means that there is no corresponding point.
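
The search can be sketched as follows, under assumptions that the source does not fix: normalized cross-correlation (NCC) over a 1D window stands in for the correlation statistic, `tau` plays the role of the acceptance threshold, and -1 signals that no corresponding point was found. All names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # flat window: no usable texture
    return sum(p * q for p, q in zip(da, db)) / denom


def search_match(left_row, right_row, x_left, x_pred, radius, half, tau):
    """Search the right row within `radius` of the predicted position.

    Returns the best-scoring column, or -1 if the best score is below tau.
    """
    if x_left - half < 0 or x_left + half >= len(left_row):
        return -1
    ref = left_row[x_left - half:x_left + half + 1]
    best_x, best_score = -1, float("-inf")
    lo = max(half, x_pred - radius)
    hi = min(len(right_row) - half - 1, x_pred + radius)
    for x in range(lo, hi + 1):
        score = ncc(ref, right_row[x - half:x + half + 1])
        if score > best_score:
            best_x, best_score = x, score
    return best_x if best_score >= tau else -1


left_row = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
right_row = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10, 10, 10]
found = search_match(left_row, right_row, 4, 3, 2, 1, 0.9)     # 2
missing = search_match(left_row, [10] * 12, 4, 3, 2, 1, 0.9)   # -1
```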

A Two-Step Expansion Method
In this section, we describe a two-step expansion algorithm based on image features to compute quasi-dense point correspondences between two views. Our method is inspired by the observation that all points on a uniform surface satisfy 1D projective geometry along the horizontal axis, and that in a 1D projective transformation the cross ratio of the projected points is invariant. Our algorithm is arranged as follows. First, a sparse set of initial support points is found by a high-quality feature matching method. Then, in the first-step expansion, we use a segmentation-based prior to partition the image into different regions and employ the invariance of the cross ratio as a constraint to find more corresponding feature points from the support points in the same region. Finally, a regular seed-growing approach is used as the second-step expansion to obtain more pixel correspondences.
Suppose we are given a pair of left and right images. This section aims at finding the quasi-dense disparity map with respect to the reference image. For convenience, we suppose that the input images are rectified, so that corresponding points lie on the same epipolar lines in the two images.

Initial Support Points.
Before expansion, we describe how to establish a sparse set of feature correspondences as initial support points. Most algorithms used to extract image features can be categorized as either corner detectors (such as Harris and Stephens [22] and SUSAN [46]) or descriptor extractors (such as SIFT [23], SURF [44], and DAISY [47]). Recently, a regional feature detector [59] based on the descriptor of [23] performed well on large-scale instances. In our method, we employ the regular Harris method [22] to obtain initial support points. When, in the presence of large disparity ranges, the number of successfully matched Harris points falls below a threshold (which is determined by the number of segmented regions; see Section 4.2), the scale invariant feature transform (SIFT) algorithm is used to extract features instead, and a KD-tree with the best bin first (BBF) algorithm [60] is employed to index and match them. The output of feature matching is a set of matched point pairs, with one point of each pair taken from each image.

The First-Step Expansion.
At this stage, our objective is to compute all the possible feature point correspondences through the initial support points in the uniform region.
The first-step expansion is based on segmented regions; thus, we employ the mean-shift method to segment the reference image before expanding feature points. The mean-shift algorithm, which was successfully used to partition images by Comaniciu and Meer [24], helps our method estimate regions correctly and localize depth boundaries precisely. The segmentation assigns a different label to each segmented region. The threshold in Section 4.1 is set to three times the number of segmented regions.
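
The detector fallback described in Section 4.1 combined with this threshold can be written as a small rule. The function name and the string labels below are hypothetical; only the factor of 3 and the Harris-to-SIFT fallback come from the text.

```python
def choose_detector(num_harris_matches, num_regions):
    """Keep Harris matches unless they fall below 3 x the number of regions."""
    threshold = 3 * num_regions
    return "harris" if num_harris_matches >= threshold else "sift"


enough = choose_detector(40, 12)      # threshold is 36, so Harris is kept
too_few = choose_detector(30, 12)     # below 36, so SIFT is used instead
```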
In Section 3, we introduced an expansion model based on 1D projective geometry within the same planar surface. We now use this model as the first-step expansion algorithm. More formally, each segmented region of the left image receives a distinct label, and each pixel is assigned the label of the region that contains it.
The initial support points that carry a given label form the sample set of that region. In this step, we spread feature correspondences outward from the initial support points within the same region. In our method, a feature is a point whose absolute gradient value is at least 1. Hence, our prior computes the gradient of each pixel in the image and selects the pixels whose absolute gradient is at least 1 as candidate feature points. Suppose we have found all the feature points and assigned each one the label of its region. We now describe how to find, for each feature point, its corresponding point in the other image.
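
A minimal sketch of the candidate selection follows. The gradient operator is not specified in the text, so central differences and the gradient magnitude are assumptions here, as are the function name and the nested-list image representation.

```python
import math


def candidate_features(image, threshold=1.0):
    """Return (x, y) of interior pixels whose gradient magnitude >= threshold.

    image is a list of rows of intensities; border pixels are skipped
    because central differences are undefined there.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if math.hypot(gx, gy) >= threshold:
                out.append((x, y))
    return out


# A vertical intensity edge between columns 2 and 3.
img = [[5, 5, 5, 9, 9] for _ in range(4)]
features = candidate_features(img)  # [(2, 1), (3, 1), (2, 2), (3, 2)]
```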
The expansion algorithm is based mainly on epipolar geometry and 1D projective transformation. The epipolar constraint has been used to rectify the images and restrict corresponding points to the same rows in both images. We need only three support points to estimate the probable position of a corresponding point. The number of initial support points in each region is not fixed and falls into two cases: more than three points, and three points or fewer. This step mainly handles the first case.
When a region contains more than three support points, as shown in Figure 4, each horizontal row on which pixels lie can be treated as its own epipolar line because the images are rectified. Figure 4 shows eight points, indexed 1 to 8, each given by its image coordinates. The three support points chosen for a feature point must satisfy two conditions: (i) the three points together have the minimum sum of distances to the epipolar line and (ii) no two of the points involved share the same horizontal coordinate. For example, in Figure 4, the support points of point 7 are points 1, 2, and 5, and the support points of point 8 are points 1, 3, and 5, which satisfy both conditions. The corresponding search radii are, respectively, 7 = min(ratio×7, ) and 8 = min(ratio×8, ). The corresponding point can then be found by the method of Section 3.
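
The two selection conditions can be sketched as an exhaustive search over triples. This assumes rectified images, so that the epipolar line of a feature point is its own row and the distance of a support point to it is the row difference; it also assumes that the distinct-coordinate condition refers to horizontal coordinates and includes the feature point itself. The function name is hypothetical.

```python
from itertools import combinations


def pick_support(target, supports):
    """Pick three support points for `target` = (x, y).

    Minimizes the summed row distance to the target's epipolar line (its row)
    among triples whose x-coordinates, together with the target's, are all
    distinct. Returns None if no such triple exists.
    """
    tx, ty = target
    best, best_cost = None, None
    for trio in combinations(supports, 3):
        xs = {p[0] for p in trio} | {tx}
        if len(xs) < 4:
            continue  # two points share a horizontal coordinate
        cost = sum(abs(p[1] - ty) for p in trio)
        if best_cost is None or cost < best_cost:
            best, best_cost = trio, cost
    return best


supports = [(2, 5), (8, 6), (8, 9), (12, 4), (15, 12)]
chosen = pick_support((10, 5), supports)  # ((2, 5), (8, 6), (12, 4))
```

The exhaustive search is cubic in the number of support points per region, which is acceptable for the small per-region counts assumed here.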

The Second-Step Expansion.
The second step employs a regular seed-growing method to obtain stable correspondences in poorly textured regions. Suppose the first step produces a list of point correspondences. We regard these point correspondences as seeds from which to grow corresponding patches. Although the first step can find more effective point correspondences, it inevitably introduces errors in complex areas, and traditional seed-growing algorithms do not handle wrong initial seeds well.
To overcome this drawback, Cech and Sara [19] temporarily gave up the uniqueness constraint, propagated as many disparity components as possible, and then optimized them to remove the false disparity components. Hence, the second step employs Cech's method to obtain the quasi-dense disparity map. Cech's method includes two phases: (i) growing and propagating as many seeds as possible regardless of their overlaps and (ii) optimizing the seeds of the first phase and removing the false ones. This seed-growing method can keep accurate point correspondences and recover most disparities from false seeds. For a detailed description of the seed-growing method, see [19].

Obtaining Dense Disparity Map
Because of occlusion, the two-step expansion method cannot find all pixel correspondences in some regions and therefore cannot produce a completely dense disparity map. We introduce two different processes to compute a dense disparity map from the quasi-dense point correspondences. One is a filling process based on regional 3D surface fitting; the other is a synthesized method that integrates the quasi-dense pixel correspondences as GCPs into global optimization frameworks in a principled way.

Fitting Process.
In Section 4.2, we obtained different regions from the reference image by the segmentation-based prior. The segmented regions may correspond to different 3D surfaces, and we now assume that each 3D surface is planar. In some regions of the quasi-dense disparity map, there may be only a few corresponding points, together with patches of unmatched points caused by occlusion. A 3D planar surface fitting can be applied to fill the unmatched patches within the same region.
Assume that an arbitrary region contains a set of matched pixels, and we use this regional data to compute a 3D plane. We describe each pixel of the quasi-dense disparity map by a triple consisting of its two image coordinates and its disparity. We then fit a 3D planar surface to the triples of the region; the plane is described by four parameters. All pixels belonging to the same region are assumed to satisfy the plane equation, from which their disparities can be computed.
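
As an illustration of the fitting step, the sketch below fits a plane to (x, y, disparity) triples by ordinary least squares and evaluates it at an unmatched pixel. It is written under stated assumptions: the explicit form d = a*x + b*y + c is used instead of the four-parameter implicit form of the text, no mismatch rejection is included, and all names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of d = a*x + b*y + c to (x, y, d) triples.

    Solves the 3x3 normal equations by Cramer's rule; returns None when the
    points are degenerate (for example, all collinear in the image).
    """
    n = float(len(points))
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sd = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxd = sum(p[0] * p[2] for p in points)
    syd = sum(p[1] * p[2] for p in points)
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxd, syd, sd]
    det = det3(m)
    if abs(det) < 1e-12:
        return None
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / det)
    return tuple(coeffs)


def fill_disparity(plane, x, y):
    """Disparity predicted by the fitted plane at pixel (x, y)."""
    a, b, c = plane
    return a * x + b * y + c


# Matched pixels of one region, sampled from d = 0.5*x + 0.25*y + 3.
region = [(0, 0, 3.0), (4, 0, 5.0), (0, 4, 4.0), (4, 4, 6.0), (2, 2, 4.5)]
plane = fit_plane(region)
filled = fill_disparity(plane, 2, 0)  # about 4.0
```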

Synthesized Method.
Recently, mixed stereo models that use known point correspondences as GCPs to improve the result of global matching have performed well in textureless areas and under occlusion. The synthesized method is inspired by the method of Wang [21] and formulates the stereo model as a MAP-MRF problem. Assume the quasi-dense disparity $D_q$ is produced from a pair of images $I$ by the two-step expansion. Based on Bayes' rule, the posterior probability of the disparity map $D$ is expressed as $P(D \mid I, D_q) \propto P(I \mid D)\,P(D_q \mid D)\,P(D)$. Finding the maximum a posteriori estimate means minimizing the corresponding negative log likelihood. Thus, computing a disparity map $D$ becomes the problem of minimizing the energy function

$$E(D) = E_{\text{data}}(D) + E_{\text{smoothness}}(D) + E_{\text{quasi-dense}}(D), \tag{5}$$

where $E_{\text{data}}$ is the data term that estimates the likelihood of the disparity map, $E_{\text{smoothness}}$ is a smoothness term that encourages similar disparities at neighboring points in locally smooth regions, and $E_{\text{quasi-dense}}$ is the energy of the quasi-dense disparity, which constrains the accuracy of the disparity map. For details, see [21].
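
To make the three terms of (5) concrete, the sketch below evaluates such an energy on a single rectified scanline. The individual cost functions are assumptions chosen for brevity (absolute intensity difference, absolute disparity difference between neighbors, and absolute deviation from the quasi-dense seeds); the weights, the out-of-range cost, and all names are hypothetical, and the sketch only evaluates the energy rather than minimizing it.

```python
def energy(disp, left, right, seeds, lam_s=1.0, lam_q=2.0, oob=20.0):
    """Three-term energy of a disparity assignment on one scanline.

    disp  : integer disparity per left pixel
    left  : left-row intensities; right : right-row intensities
    seeds : dict mapping x to the quasi-dense disparity at x
    oob   : assumed cost when x - d falls outside the right row
    """
    e_data = 0.0
    for x, d in enumerate(disp):
        xr = x - d
        if 0 <= xr < len(right):
            e_data += abs(left[x] - right[xr])
        else:
            e_data += oob
    e_smooth = lam_s * sum(abs(disp[i] - disp[i + 1])
                           for i in range(len(disp) - 1))
    e_quasi = lam_q * sum(abs(disp[x] - d) for x, d in seeds.items())
    return e_data + e_smooth + e_quasi


left = [10, 20, 30, 40, 50, 60]
right = [20, 30, 40, 50, 60, 60]   # left row shifted by one pixel
seeds = {2: 1, 4: 1}               # quasi-dense disparities at x = 2 and x = 4

e_good = energy([1] * 6, left, right, seeds)  # 20.0
e_bad = energy([0] * 6, left, right, seeds)   # 54.0
```

The assignment that agrees with both the images and the seeds has the lower energy, which is the behavior a minimizer such as Graph Cuts would exploit.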

The Overall Algorithm.
The two-step expansion algorithm is summarized in Algorithm 1.

Experiments
We conducted several experiments to demonstrate the validity of our method. In Section 6.1, we compare our approach to the seed-growing method of Cech [19] on real complex scenes. Section 6.2 tests the running time for different image resolutions. In Section 6.3, we separately evaluate the fitting process and the synthesized method on short-baseline stereo images from the Middlebury benchmark with known ground truth data. In Section 6.4, we test our algorithm on large-scale stereo image pairs. Throughout all experiments we set ratio = 0.3,  = 7,  = 5,  = 1.2, and  = 4 × 10^−4, which were empirically determined. All experiments were run on a computer with an Intel Core 2 Duo CPU at 2.93 GHz. Unless stated otherwise, we employed the regular Harris method to obtain initial matched points and performed mean-shift image segmentation using the EDISON code [61], an implementation of Comaniciu's paper [24].

Computing Quasi-Dense Disparities.
First, we obtain a quasi-dense disparity map by the two-step expansion. We demonstrate the difference between the seed-growing method and our algorithm by comparing their performance on real data. Among known seed-growing algorithms, the method proposed by Cech and Sara [19] performs comparatively well, even in the presence of repetitive patterns. Hence, we compared our approach to Cech's method. We tested different stereo images from the Cech dataset [62], that is, St. Martin, Head, and Larch. The quasi-dense disparities of these images are shown in Figure 5. Our algorithm produces a denser disparity map than the algorithm proposed by Cech. The numbers of corresponding points found in the different images are compared in Table 1.
This experiment demonstrates that our method can produce a quasi-dense disparity map from a sparse set of initial feature correspondences. Our method does not need highly accurate matched features as seeds. In repeated experiments, our method always found more point correspondences than Cech's method.

Input: A pair of rectified images from different viewpoints of one scene; the values of ratio and of the other four parameters.
Output: The disparity map with respect to the reference image.
Begin:
Step 1. Find initial point correspondences by using a state-of-the-art matching method.
Step 2. Use mean-shift to partition the reference image into different areas, each with its own label.
Step 3. Assign the matched points to their corresponding areas.
Step 4. Remove the coarse mismatches in each area by using a regional affine transformation.
Step 5. Compute the gradient of each pixel in the image and select the pixels whose absolute gradient is at least 1 as candidate feature points.
Step 6. Assign each feature to its corresponding label.
Step 7. The first-step expansion from the initial support points in each area (Section 4.2).
Step 8. The second-step expansion by using a regular seed-growing method.
Step 9. Obtain the dense disparity map by using the fitting process or the synthesized method.
Algorithm 1: Two-step expansion algorithm.

Running Time.
The running time depends on three elements: image resolution, the number of segmented regions, and the number of initial support points. We varied the image resolutions of Tsukuba, Teddy, Cones, and Venus from the Middlebury benchmark [63] and measured the running time with respect to each of the three elements. We downscaled the images bicubically by 10%∼90%, measured the running time at each resolution, and recorded the corresponding numbers of regions and points. Figures 6(a), 6(b), and 6(c) illustrate the running time for different resolutions, regions, and points. At a given resolution, the more segmented regions and initial points there are, the shorter the corresponding running time. Figure 6(d) shows the relation between segmented regions and matched points at different image resolutions.

Short-Baseline Stereo Matching.
We tested the fitting process and the synthesized method on several image pairs, that is, Tsukuba, Venus, Teddy, and Cones from the Middlebury benchmark [63]. The maximum disparity of these images is less than 100. First, we used the two-step method to produce the quasi-dense disparities of the images. Then, we computed the corresponding disparity maps by the fitting process and the synthesized method. In the fitting process, we restricted the maximal disparity difference within a region to less than 10. In the synthesized method, we employed Graph Cuts [37] as the underlying global method; the goal is to compute a disparity map by minimizing the energy function (5). In terms of time, the fitting process takes about 1.3 minutes to estimate a disparity map and the synthesized method takes about 1.8 minutes. Figure 7 shows the results for Tsukuba, Venus, Teddy, and Cones. As can be seen, the disparities produced by the synthesized method have a clear structure and few blurry areas.
To evaluate the performance of our method, we used the quality measure proposed in [1] with known ground truth data to evaluate the synthesized results. The matching results rank 87 and 62 with respect to error thresholds of 1 and 0.5 pixels on the Middlebury website. The competing algorithms there are improved versions of classical original methods; they commonly integrate many techniques and therefore perform better. Our method is newly proposed and does not yet include such refinements. To verify the validity of the method, we compared it with the classical original methods, that is, GC (graph cuts) [37], CSBP (constant-space belief propagation) [64], DP [29], and SO (scanline optimization) [1], as shown in Table 2, where a pixel is counted as bad if its absolute error is more than 2 pixels. The quality evaluation uses three performance measures: nonocc (bad pixels in nonoccluded regions), all (bad pixels in the entire image), and disc (bad pixels near discontinuities).
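
The bad-pixel measure can be sketched as follows. This is an illustrative implementation, not the benchmark's evaluation code: the function name and the optional mask argument (used to restrict the count to, for example, nonoccluded pixels) are assumptions.

```python
def bad_pixel_rate(disp, truth, threshold=1.0, mask=None):
    """Percentage of evaluated pixels with |disp - truth| > threshold.

    disp, truth : disparity maps as lists of rows
    mask        : optional map of truthy values marking pixels to evaluate
    """
    total = bad = 0
    for y, row in enumerate(truth):
        for x, t in enumerate(row):
            if mask is not None and not mask[y][x]:
                continue
            total += 1
            if abs(disp[y][x] - t) > threshold:
                bad += 1
    return 100.0 * bad / total if total else 0.0


truth = [[10, 10, 12, 12], [10, 10, 12, 12]]
disp = [[10, 11, 12, 15], [10, 10, 9, 12]]
rate_all = bad_pixel_rate(disp, truth)  # 25.0 (2 of 8 pixels exceed 1 pixel)

# Restricting the evaluation to the first three columns.
mask = [[1, 1, 1, 0], [1, 1, 1, 0]]
rate_masked = bad_pixel_rate(disp, truth, mask=mask)
```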
Table 2 shows that our method gives better results than the traditional methods. The wider the baseline, the more evident the accuracy advantage of our disparities, for example, in the Teddy and Cones scenes, because our method is based on feature matching, which is efficient for large-scale images. On the Venus scene, the error of our method is not the lowest among these methods. Global methods employ an energy minimization function to optimize the disparity map and perform better than feature-based local methods in the short-baseline case, and the Venus images are short-baseline. The global methods GC and CSBP therefore have more accurate results than our method on the Venus scene.

Large-Scale Stereo Matching.
Though short-baseline stereo matching can yield accurate dense disparity, large-scale stereo images are much more challenging because of heavy occlusion. For large-scale stereo images, we computed the disparity via the fitting process only, without the synthesized method. First, we evaluated the fitting process on wide-baseline high-resolution images, namely Aloe and Reindeer from the Middlebury benchmark [63], whose maximum disparity is greater than 200. In particular, we compared our method against the ELAS method proposed by Geiger et al. [7], as shown in Figure 8. We counted all erroneous pixels of the entire image whose absolute error is more than 3 pixels. The error rates of our method are 13.03% and 20.36% for Aloe and Reindeer, respectively, and those of ELAS are 14.14% and 22.28%.
Then, we tested our method on the KITTI dataset [25], which consists of 194 training and 195 test pairs of urban images. The training images, which come with semidense ground truth disparities, are intended for tuning the parameters of stereo matching methods; our method has no parameters to be trained or modified. The test images, whose ground truth is withheld, are used to evaluate the participants in the challenge. On this dataset, the main problem is how to handle textureless areas. We computed the disparity maps of the test images via the fitting process, and some results are shown in Figure 9. The average run time for computing a disparity map is about 4.7 minutes. The matching results rank 38 and 35 with respect to the 3-pixel and 5-pixel error thresholds on the KITTI website. We compared our method with similar methods, that is, ELAS [7], GCSF (growing correspondence seeds flow) [55], and GCS (growing correspondence seeds) [19], as shown in Table 3, where Out-Noc is the percentage of erroneous pixels in nonoccluded areas and Out-All is the percentage of erroneous pixels in total. Avg-Noc is the average disparity (end-point) error in nonoccluded areas, and Avg-All is the average disparity (end-point) error in total. The qualitative results for this dataset are similar to those of the previous evaluation. We are able to robustly reconstruct large-scale images, which leads to low error rates on the street and on other slanted surfaces.
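The four columns of Table 3 can be read as two statistics evaluated over two pixel sets. The following is a minimal sketch under stated assumptions: the function name `kitti_style_errors` is hypothetical, the toy values are made up, and marking pixels without ground truth by a zero disparity is a convention chosen for this example rather than a description of the official evaluation code.

```python
import numpy as np

def kitti_style_errors(disp, gt, valid, tau=3.0):
    """Return (outlier percentage, average absolute error) over `valid` pixels.

    Hypothetical helper: `tau` is the error threshold in pixels."""
    err = np.abs(disp.astype(float) - gt.astype(float))[valid]
    if err.size == 0:
        return 0.0, 0.0
    out = 100.0 * float((err > tau).mean())  # share of erroneous pixels
    avg = float(err.mean())                  # mean end-point error in pixels
    return out, avg

# Toy row of four pixels; gt == 0 marks "no ground truth" (assumption).
gt = np.array([[20., 0., 35., 50.]])
disp = np.array([[21., 7., 40., 50.]])
has_gt = gt > 0
noc = np.array([[True, True, False, True]])  # nonoccluded mask (made up)

out_all, avg_all = kitti_style_errors(disp, gt, has_gt)
out_noc, avg_noc = kitti_style_errors(disp, gt, has_gt & noc)
```

Restricting the mask to nonoccluded pixels gives the "Noc" variants, and using every pixel with ground truth gives the "All" variants.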

Conclusion
In this paper, we introduce a two-step expansion to produce precise disparity maps from stereo images, whether the stereo baseline is short or wide. Our method is based on feature matching and can cope with difficult cases such as large perspective distortions, increased occluded areas, and complex scenes. Our experiments on Cech's dataset, the Middlebury benchmark, and the KITTI dataset demonstrate that our method achieves good results on real complex scenes and on both short- and wide-baseline image pairs. Importantly, we introduce a cross ratio constraint model that expands the set of feature correspondences obtained by state-of-the-art feature matching. Our method primarily involves point computations in a large number of segmented regions, which is well suited to a GPU implementation and could allow the disparity map of stereo images to be computed in real time.

Figure 1: Two cameras are indicated by their centres   and   and their image planes   and   . There are 4 3D points 1, 2, 3, and 4 in uniform 3D planar surface , and the point 4 projects to 4  and 4  in the images   and   , respectively. Line 2 in the right image and line 1 in the left image are epipolar lines separately with respect to points 4  and 4  . Two camera centres, 3-space point 4, and its images 4  and 4  lie in an epipolar plane . The intersection of the planes  and  determines the line  in 3D. The 3D points 1, 2, and 3 are the closest points on the line  to the 3D points 1, 2, and 3. Points 1  , 2  , and 3  in the left image and points 1  , 2  , and 3  in the right image are projected by 3D points 1, 2, and 3, and points 1  , 2  , and 3  lie on the line 1, and points 1  , 2  , and 3  lie on the line 2.

Figure 5: Quasi-dense disparity results on the Cech dataset: (a) St. Martin, (b) Head, and (c) Larch. Disparity maps are color coded: colder colors mean smaller disparities, warmer colors mean larger disparities, and deep blue areas have no assigned disparity.

Figure 6: The relations of running time to (a) image resolution, (b) number of regions, (c) number of support points on the Tsukuba, Teddy, Cones, and Venus image pairs, and (d) the relevant segmented regions and corresponding points in different resolutions of the images.

Figure 7: Results of our two different methods on the short-baseline dataset. (a) Left images. (b) Results of fitting process. (c) Results of synthesized method. (d) Ground truth disparities.
Figure 2:   and   are the images observed from the camera centres   and   separately. Given three sets of matched point pairs (1  , 1  ), (2  , 2  ), and (3  , 3  ), an unmatched point 4  in the left image and the unknown corresponding point 4  in the right image, 1 and 2 are the lines on which the corresponding points 4  and 4  lie. Points V1  , V2  , and V3  are the closest points on the line 1 to the points 1  , 2  , and 3  in the left image, and points V1  , V2  , and V3  are the closest points on the line 2 to the points 1  , 2  , and 3  in the right image.
Strategy. The search strategy computes the correlations between the unmatched point 4 in the left image and the neighbors of the point V4 in the right image and thereby decides the position of the corresponding point of 4 in the right image. As shown in Figure 3, a set of neighborhoods {p(1), ..., p(N), V4, p(N + 1), ..., p(2N)} of size 2 × N + 1, centred on the point V4, is built as the set of candidate matches. The value N is the search radius and is decided by the maximum D of the Euclidean distances from the reference points 1, 2, and 3 in the right image to the line 2. If the distance from reference point i (i = 1, 2, 3) to the line 2 is d(i), then D = max(d(i)) and N = min(ratio × D, R), where ratio and R are nonzero constants for the proportional and the fixed radius, respectively. We use the sum of absolute differences (SAD) [58] on a w × w window as the image similarity statistic between the point 4 and all candidate points in the right image, where w is a positive constant. Let the SAD value between the point 4 and the ith candidate point be S(i), where i = 1, 2, 3, ..., 2 × N + 1. If the SAD value between 4 and V4 is V = S(N + 1), and the kth candidate other than V4 has the minimum SAD, S(k) = min{S(i)}, where i = 1, 2, ..., N, N + 2, ..., 2 × N + 1, then the corresponding point of 4 in the right image is defined from V and S(k).

In the region, a set of matched points {1, 2, 3, 4, 5, 6} is already known, and 7 and 8 are the points whose corresponding points need to be searched for. The horizontal axes 7 and 8 are the epipolar lines of the points 7 and 8, respectively.
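The search strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes rectified images, so that the epipolar line is an image row, and all function names, parameter names, and the toy images are hypothetical. Because the exact rule that combines the SAD value at V4 with the minimum over the other candidates is not recoverable here, the sketch simply returns the candidate with the smallest SAD.

```python
import numpy as np

def sad(left, right, pl, pr, half):
    """Sum of absolute differences between two (2*half+1)^2 windows."""
    (yl, xl), (yr, xr) = pl, pr
    a = left[yl - half:yl + half + 1, xl - half:xl + half + 1].astype(float)
    b = right[yr - half:yr + half + 1, xr - half:xr + half + 1].astype(float)
    return float(np.abs(a - b).sum())

def search_radius(distances, ratio, fixed_radius):
    """Radius = min(ratio * largest reference distance, fixed radius)."""
    return int(min(ratio * max(distances), fixed_radius))

def search_match(left, right, p_left, v_right, radius, half=1):
    """Scan the 2*radius+1 candidates centred on v_right along its row.

    Assumption: the best match is the candidate with the smallest SAD."""
    y, x0 = v_right
    best_x, best_s = None, np.inf
    for x in range(x0 - radius, x0 + radius + 1):
        if x - half < 0 or x + half >= right.shape[1]:
            continue  # window would leave the image
        s = sad(left, right, p_left, (y, x), half)
        if s < best_s:
            best_x, best_s = x, s
    return (y, best_x), best_s

# Toy pair: left[y, x] = 3*y + x**2; the right view is shifted 3 px left.
left = np.add.outer(3 * np.arange(12), np.arange(24) ** 2).astype(float)
right = np.roll(left, -3, axis=1)

radius = search_radius([4.0, 2.5, 1.0], ratio=0.5, fixed_radius=5)
match, score = search_match(left, right, (5, 10), (5, 8), radius)
```

In the toy pair the true correspondence of column 10 is column 7, which lies inside the search range centred on column 8, so the scan recovers it with a SAD of zero.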

Table 1: Comparison of the results on number of corresponding points.
7. The first step for the matched feature expansion
Repeat:
  for:  = 1: size({  })
    ensure   via to:   ∈   ;
    find a set of samples   = {(  ,   )}  , where   ∈   ;
    if: size({  }) > 3
      compute the point   by estimation model via cross ratio;
    end if
  end for
Until {  } is empty
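The cross ratio estimation step of this expansion can be illustrated with a minimal sketch. It assumes that each point is represented by a single coordinate along its epipolar line and that three matched pairs are available, as in Figure 2; the function names are hypothetical, and the specific ordering of the four points in the cross ratio is one common convention rather than necessarily the one used in the paper.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given by 1D coordinates."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def transfer_by_cross_ratio(left_pts, right_pts, x4_left):
    """Predict the right-image coordinate of x4_left from three matches.

    The prediction d satisfies
        cross_ratio(a, b, c, x4_left) == cross_ratio(a2, b2, c2, d),
    which holds under any 1D projective transformation.
    Degenerate inputs (coincident points) raise ZeroDivisionError."""
    a, b, c = left_pts
    a2, b2, c2 = right_pts
    cr = cross_ratio(a, b, c, x4_left)
    # From cr = (a2 - c2)(b2 - d) / ((a2 - d)(b2 - c2)):
    #   (b2 - d) / (a2 - d) = k, with k = cr * (b2 - c2) / (a2 - c2)
    k = cr * (b2 - c2) / (a2 - c2)
    return (k * a2 - b2) / (k - 1.0)

# Check against a made-up 1D projective map h(x) = (2x + 1) / (0.1x + 1).
def h(x):
    return (2.0 * x + 1.0) / (0.1 * x + 1.0)

left_pts = (0.0, 1.0, 2.0)
right_pts = tuple(h(x) for x in left_pts)
predicted = transfer_by_cross_ratio(left_pts, right_pts, 3.0)
```

Three correspondences are enough because a 1D projective transformation has three degrees of freedom; the predicted coordinate can then serve as the centre of a local search for the final match.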

Table 2: Comparative performance of stereo algorithms according to Middlebury methodology.

Table 3: Comparative evaluation results on KITTI test dataset.