
This paper proposes a novel feature-based stereo matching method that produces a dense disparity map through two expansion phases. It finds denser point correspondences than existing seed-growing algorithms and performs well in both short- and wide-baseline settings. The method assumes that, within each image segment corresponding to a 3D surface, pixel coordinates satisfy 1D projective geometry along the horizontal axis. First, a state-of-the-art feature matching method provides sparse support points, and an image-segmentation-based prior assists the first expansion. Second, the first-step expansion uses the initial support points and the invariance of the cross ratio under 1D projective transformation to find more feature correspondences within each uniform region. To obtain enough point correspondences, a regular seed-growing algorithm then serves as the second-step expansion and produces a quasi-dense disparity map. Finally, two different methods are used to obtain a dense disparity map from the quasi-dense pixel correspondences. Experimental results show the effectiveness of our method.

Stereo matching is an active research topic in computer vision [

To solve the inherent problems, numerous methods have been proposed in the past two decades. They consist of local and global methods [

Large-scale stereo images contain more ambiguous areas than their short-baseline counterparts. Whether the viewpoints are close together or far apart, some salient features, such as interest points, remain invariant. An alternative method uses reliable feature correspondences as seeds and expands these points through a growing-like process to obtain more point correspondences [

To overcome drawbacks of traditional matching methods and seed-growing algorithms, the matched features are naturally integrated into state-of-the-art stereo methods as soft constraints [

In this paper, we propose a robust dense matching algorithm based on two-step expansion, building on previous work [

This work focuses mainly on the first step, which uses a feature-expansion algorithm for stereo matching. In the first step, we suppose that a set of sparse points with known coordinates lies on the same 3D surface and that the coordinates of the corresponding image pixels satisfy 1D projective geometry along the horizontal axis. Our motivation comes from the fact that points on a line satisfy a 1D projective transformation, under which the cross ratio is invariant. Using this invariance, the inhomogeneous coordinates of each corresponding pixel can be approximated. The accurate coordinates of the corresponding pixel are then found by a search model that computes a correlation statistic over neighboring pixels. In addition, to handle poorly textured regions, we employ a propagation algorithm to expand pixels with weak features. Occluded areas can be filled by a fitting process or by a synthesized method; the fitting process does not use cross-checking (checking and optimizing the disparity by computing the differences between the left-to-right and right-to-left disparities). Experimental results demonstrate that the two-step expansion method compares favorably with existing methods. It produces denser disparity than existing seed-growing algorithms and gives good results in both short-baseline and wide-baseline stereo matching.

The paper is structured as follows: firstly, related work is discussed in Section

There is a large body of literature related to this work. Firstly, Scharstein and Szeliski [

Dense global methods based on energy minimization have performed well over the past decade. Local stereo algorithms based on feature correspondences estimate disparity quickly [

Approaches based on sparse local features are robust on large-scale images. Image features play an important role in computer vision. They have already been used in wide-baseline stereo matching [

The rule of growing a region from primary seeds was used to segment image [

To compute an accurate dense disparity map, we incorporate quasi-dense pixel correspondences as ground control points (GCPs) into a state-of-the-art global matching framework. In the stereo matching literature, GCP-based methods achieve precise results. Bobick and Intille [

Geiger et al. proposed a generative probabilistic model ELAS [

In the epipolar geometry of two views, the epipolar constraint restricts the corresponding point to the epipolar line. To find the precise position of the corresponding point, traditional algorithms search exhaustively along that line and compute a correlation statistic for every candidate. To speed up the position estimation of the corresponding point on the line, we introduce a new constraint based on 1D projective geometry.
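As an illustration of how such a constraint can narrow the search, the following minimal sketch predicts the position of a fourth point on the right scanline from three known correspondences by equating cross ratios. The function names `cross_ratio` and `predict_fourth` and the particular ordering of the cross ratio are assumptions made for this example; they are not taken from the paper.

```python
def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four collinear points given as 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def predict_fourth(left, right3):
    """Predict the right-image coordinate matching the fourth left point.

    left   : (x1, x2, x3, x4) coordinates on the left scanline
    right3 : (y1, y2, y3) known matches of x1, x2, x3 on the right scanline
    The prediction assumes the two scanlines are related by a 1D projective
    transformation, so the cross ratio of the four points is preserved.
    """
    x1, x2, x3, x4 = left
    y1, y2, y3 = right3
    k = cross_ratio(x1, x2, x3, x4)
    # Solve (y3 - y1)(y4 - y2) = k (y3 - y2)(y4 - y1) for y4.
    p = y3 - y1
    q = k * (y3 - y2)
    return (p * y2 - q * y1) / (p - q)
```

The prediction is only an estimate of where to look; a local correlation search around it is still needed to fix the exact pixel.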

We assume there is a stereovision system as shown in Figure

Two cameras are indicated by their centres

The cross ratio constraint based on 1D projective geometry requires three or more known 3D points, which must therefore be obtained first. We use a feature matching algorithm as a prior to produce reliable point correspondences, from which the fundamental matrix can be calculated. The proportional coordinates of the known 3D points can then be estimated from the point correspondences and the fundamental matrix. This estimate becomes less accurate when the region containing the known 3D points is not a planar surface.

To reduce this error, we exploit the fact that points on the same epipolar line satisfy 1D projective geometry whether or not the surface is planar, and we introduce a search strategy that uses image points near the epipolar line instead of the 3D points, as shown in Figure

The search strategy computes all correlations with the neighbors of the point

In this section, we describe a two-step expansion algorithm based on image features that computes quasi-dense point correspondences between two views. Our method is motivated by the observation that all points on a uniform surface satisfy 1D projective geometry along the horizontal axis, and that the cross ratio of the projected points is invariant under a 1D projective transformation. The algorithm proceeds as follows. Firstly, a sparse set of initial support points is found by a reliable feature matching method. Then, in the first-step expansion, we use a segmentation-based prior to partition the image into regions and use the invariance of the cross ratio as a constraint to find more corresponding feature points from the support points in the same region. Finally, a regular seed-growing approach is used as the second-step expansion to obtain more pixel correspondences.
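The refinement of a predicted match in the first-step expansion can be sketched as a short search along the same row of the rectified right image. The sketch below uses normalized cross-correlation (NCC) as the correlation statistic; that choice, the names `ncc` and `refine_match`, and the default `radius` and `half` window sizes are assumptions for illustration only, since the text states only that a correlation statistic over neighboring pixels is used.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def refine_match(left, right, row, col, predicted_col, radius=3, half=2):
    """Search columns within `radius` of the prediction on the same row.

    Returns (best_col, best_score); best_col is None if nothing was searched.
    Images are assumed rectified, so matches share the same row.
    """
    h, w = left.shape
    if not (half <= row < h - half and half <= col < w - half):
        return None, 0.0
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_col, best_score = None, -1.0
    center = int(round(predicted_col))
    lo = max(half, center - radius)
    hi = min(w - half, center + radius + 1)
    for c in range(lo, hi):
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(ref, cand)
        if score > best_score:
            best_col, best_score = c, score
    return best_col, best_score
```

Restricting the search to a few columns around the prediction is what replaces the exhaustive scan of the whole epipolar line.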

Suppose there exists a pair of images

Before expansion, we introduce how to establish a sparse set of feature correspondences as initial support points. Most algorithms which are used to extract image features can be categorized as either corner detectors (such as Harris and Stephens [

At this stage, our objective is to compute all the possible feature point correspondences through the initial support points in the uniform region. The first-step expansion is based on segmented regions; thus, we employ the mean-shift method to segment the reference image

In Section

We assume the initial support points belonging to a label

The expansion algorithm is based mainly on epipolar geometry and 1D projective transformation. The epipolar constraint has been used to rectify the images and restrict corresponding points to the same lines in both images. We need only three support points to estimate the probable position of the corresponding point. The number of the initial support points

When the number of

In the region

The second-step employs a regular seed-growing method to obtain stable correspondences in poorly textured regions. Suppose the first-step produces a list of point correspondences

Cech's method includes two phases: (i) growing and propagating as many seeds as possible regardless of their overlaps and (ii) optimizing the seeds from the first phase and removing the false ones. The seed-growing method of Cech preserves accurate point correspondences and can recover most disparities even from false seeds. A detailed description of the seed-growing method is given in [

Because of occlusion, the two-step expansion method cannot find all pixel correspondences in some regions and therefore cannot produce a completely dense disparity map. We introduce two different processes to compute a dense disparity map from the quasi-dense point correspondences. One is a filling process based on regional 3D surface fitting; the other is a synthesized method that integrates the quasi-dense pixel correspondences as GCPs into global optimization frameworks in a principled way.
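A minimal sketch of the filling idea follows, under the simplifying assumption that the surface inside one segment is planar, so that its disparity can be modeled as d = a*x + b*y + c and fitted by least squares. The planar model, the function name `fill_region_by_plane`, and its arguments are assumptions for this example; the surface model actually used in the fitting process may differ.

```python
import numpy as np


def fill_region_by_plane(disparity, labels, label, valid):
    """Fill missing disparities in one segment with a least-squares plane.

    disparity : 2D array of disparities (values at invalid pixels are ignored)
    labels    : 2D integer array of segment labels
    label     : the segment to process
    valid     : 2D boolean array, True where a correspondence was found
    """
    out = disparity.astype(np.float64).copy()
    region = labels == label
    known = region & valid
    if known.sum() < 3:
        return out  # too few support pixels to fit a plane
    ys, xs = np.nonzero(known)
    A = np.column_stack([xs, ys, np.ones_like(xs)]).astype(np.float64)
    coeffs, *_ = np.linalg.lstsq(A, out[known], rcond=None)
    fy, fx = np.nonzero(region & ~valid)
    out[fy, fx] = coeffs[0] * fx + coeffs[1] * fy + coeffs[2]
    return out
```

Calling the function once per segment label fills every region that has at least three matched pixels and leaves the others unchanged.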

In Section

Assume there exists a set of pixel correspondences

Recently, mixed stereo models that use known point correspondences as GCPs to improve global matching have performed well in textureless and occluded areas.

The synthesized method is inspired by the method of Wang [

The process of two-step expansion algorithm is summarized as in Algorithm

Input: A pair of rectified images

Set the values of

Output: The disparity map with respect to

Begin:

ensure

find a set of samples

if: size

compute the point

end if

We conducted several experiments to validate our method. In Section

Throughout all experiments we set

Firstly, the two-step expansion yields a quasi-dense disparity map. We compared the seed-growing method with our algorithm on real data. Among known seed-growing algorithms, the method proposed by Cech and Sara [

Comparison of the results on number of corresponding points.

Method | St. Martin | Head | Larch |
---|---|---|---|
Cech 07 | 481733 | 293004 | 165218 |
Our method | 749585 | 379153 | 195531 |

Quasi-dense disparity results on the Cech dataset: (a) St. Martin, (b) Head, and (c) Larch. Disparity maps are color coded: colder colors indicate smaller disparities, warmer colors indicate larger disparities, and dark blue areas have no assigned disparity.

This experiment demonstrates that our method can produce a quasi-dense disparity map from a sparse set of initial feature correspondences. Our method does not require highly accurate matched features as seeds. In repeated experiments, our method consistently found more point correspondences than Cech’s method.

The running time depends on three factors: image resolution, the number of segmented regions, and the number of initial support points. We changed the image resolutions for Tsukuba, Teddy, Cones, and Venus from the Middlebury benchmark [

The relations of running time to (a) image resolution, (b) number of regions, (c) number of support points on the Tsukuba, Teddy, Cones, and Venus image pairs, and (d) the relevant segmented regions and corresponding points in different resolutions of the images.

We tested the fitting process and the synthesized method on several image pairs, that is, Tsukuba, Venus, Teddy, and Cones from the Middlebury benchmark [

Results of our two different methods on short-baseline dataset. (a) Left images. (b) Results of fitting process. (c) Results of synthesized method. (d) Ground truth disparities.

To evaluate the performance of our method, we used the quality measure method proposed in [

Comparative performance of stereo algorithms according to Middlebury methodology.

Method | Tsukuba Nonocc | Tsukuba All | Tsukuba Disc | Venus Nonocc | Venus All | Venus Disc | Teddy Nonocc | Teddy All | Teddy Disc | Cones Nonocc | Cones All | Cones Disc |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Ours | 2.57 | 3.13 | 10.8 | 3.34 | 3.49 | 13.1 | 5.44 | 7.84 | 12.1 | 3.41 | 8.30 | 9.21 |
GC | 1.67 | 3.75 | 8.20 | 0.83 | 2.37 | 8.12 | 9.72 | 18.8 | 17.2 | 4.51 | 15.0 | 11.2 |
CSBP | 1.74 | 3.84 | 9.10 | 1.09 | 2.52 | 12.9 | 8.18 | 17.3 | 20.6 | 4.07 | 14.2 | 11.3 |
DP | 3.43 | 4.23 | 9.85 | 6.50 | 7.43 | 17.4 | 7.11 | 14.9 | 13.4 | 6.52 | 15.1 | 15.1 |
SO | 4.23 | 6.21 | 10.7 | 5.14 | 6.58 | 17.0 | 11.3 | 20.2 | 18.6 | 8.76 | 18.8 | 16.1 |

It can be seen from Table

Although short-baseline stereo matching can yield accurate dense disparity, large-scale stereo images are much more challenging because of extensive occlusion. For large-scale stereo images, we computed the disparity using only the fitting process, without the synthesized method. Firstly, we evaluated the fitting process on high-resolution image pairs with a wide range of baselines, namely Aloe and Reindeer from the Middlebury benchmark [

Comparison with Geiger’s method on the Aloe and Reindeer image pairs. (a) Left images. (b) ELAS results. (c) Fitting process results. (d) Ground truth disparities.

Then, we tested our method on the KITTI dataset [

Comparative evaluation results on KITTI test dataset.

Method | >2 px Out-Noc | >2 px Out-All | >3 px Out-Noc | >3 px Out-All | >4 px Out-Noc | >4 px Out-All | >5 px Out-Noc | >5 px Out-All | End-point Avg-Noc | End-point Avg-All |
---|---|---|---|---|---|---|---|---|---|---|
ELAS | 10.96% | 12.83% | 8.24% | 9.96% | 6.73% | 8.24% | 5.67% | 6.97% | 1.4 px | 1.6 px |
Ours | 14.59% | 16.08% | 9.91% | 11.30% | 7.32% | 8.57% | 5.74% | 6.87% | 1.7 px | 1.9 px |
GCSF | 17.41% | 18.73% | 12.05% | 13.24% | 9.22% | 10.28% | 7.54% | 8.49% | 1.9 px | 2.1 px |
GCS | 19.03% | 20.32% | 13.38% | 14.54% | 10.41% | 11.43% | 8.64% | 9.55% | 2.1 px | 2.3 px |
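For readers who want to reproduce figures of this kind, the sketch below shows one way to compute the percentage of pixels whose disparity error exceeds each threshold, together with the average end-point error. The function name `disparity_errors` and the `mask` argument are assumptions for this example: passing a mask of non-occluded pixels would correspond to the "Noc" columns and a mask of all pixels with ground truth to the "All" columns. It is not the official KITTI evaluation code.

```python
import numpy as np


def disparity_errors(estimate, ground_truth, mask, thresholds=(2, 3, 4, 5)):
    """Return ({threshold: outlier percentage}, average end-point error).

    estimate, ground_truth : 2D disparity arrays of the same shape
    mask                   : 2D boolean array selecting the evaluated pixels
    A pixel counts as an outlier at threshold t if its absolute disparity
    error is strictly greater than t pixels.
    """
    diff = estimate.astype(np.float64) - ground_truth.astype(np.float64)
    err = np.abs(diff)[mask]
    outliers = {t: 100.0 * float((err > t).mean()) for t in thresholds}
    return outliers, float(err.mean())
```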

Results on urban scenes. (a) Left images. (b) Our method results. Best viewed in color.

In this paper, we introduced a two-step expansion method that produces precise disparity maps from stereo images, whether the stereo baseline is short or wide. Our method is based on feature matching and can cope with difficult cases such as large perspective distortions, increased occluded areas, and complex scenes. Our experiments on Cech’s dataset, the Middlebury benchmark, and the KITTI dataset demonstrate that our method achieves good results on real complex scenes with short- or wide-baseline image pairs. Importantly, we introduced a cross ratio constraint model that expands the set of feature correspondences obtained from state-of-the-art feature matching.

Our method consists mainly of point computations performed independently in a large number of segmented regions, which makes it well suited to GPU implementation and could allow disparity maps of stereo images to be computed in real time.

The authors declare that there is no conflict of interests regarding the publication of this paper.