To be useful in robot vision applications, a stereo matching method for depth estimation must be efficient in terms of both running speed and disparity accuracy. Motivated by this requirement, a Delaunay-based stereo matching method that meets both standards is proposed in this paper. First, a Canny edge operator is used to detect the edge points of an image as support points. These points are then processed by a Delaunay triangulation algorithm, which divides the whole image into a series of linked triangular facets. A proposed module composed of these facets performs a coarse estimation of the image disparity. Exploiting the fact that neighboring triangles share vertices, the estimated disparity is then refined to generate the disparity map. The method is tested on Middlebury stereo pairs; its running time is about 1 s and its matching accuracy is 93%. Experimental results show that the proposed method improves both running speed and disparity accuracy, forming a solid foundation and a good application prospect for the path-planning system of a robot equipped with stereo cameras.
1. Introduction
In recent years, mobile robot vision navigation research has mainly focused on obtaining three-dimensional information about the robot's surroundings accurately and in real time. The stereo matching algorithm is a key component of three-dimensional scene reconstruction, and its running time and matching accuracy play a vital role in mobile robot visual navigation and autonomous positioning.
Scharstein and Szeliski [1] presented a taxonomy of dense, two-frame stereo methods to assess the different components and design decisions made in individual stereo algorithms. Depending on whether the algorithm includes a global optimization function, stereo matching algorithms are divided into local and global methods. Typically, local stereo matching algorithms [2] use a support window of fixed shape and size to calculate the matching cost. The disparity is then obtained after aggregating the matching cost by summing or averaging over a support region. Stereo algorithms based on local correspondence are typically fast. Nevertheless, an adequate choice of window shape and size is necessary, as it leads to a trade-off between low matching ratios for small window sizes and border-bleeding artifacts for larger ones. As a consequence, poorly textured or ambiguous surfaces cannot be matched consistently, which makes it difficult to meet the needs of practical applications. Algorithms based on global correspondences overcome some of the aforementioned problems by imposing smoothness constraints on the disparities in the form of regularized energy functions. Given that optimization of MRF-based energy functions is generally NP-hard, various approximation algorithms have been proposed, for example, graph cuts [3], belief propagation [4], and max-flow or simulated annealing. Although global optimization methods can produce a high-accuracy disparity space image (DSI), their modeling process is so complex that they generally require large computational effort and high memory capacity even on low-resolution imagery; the running time of the graph cut method, for example, is nearly 20 s or longer. Therefore, further study of local methods tailored to the needs of practical applications is valuable.
To resolve the mutual constraints between running time and matching accuracy, much research effort has been expended and numerous optimized methods have been presented. Zhou et al. [5] presented a fast stereo matching algorithm based on support-point expansion and used a Gibbs random field to describe the energy function; compared with traditional belief propagation or graph cut algorithms, it offers good matching accuracy and running speed. Mei et al. [6] used the AD-census method to initialize the matching cost and improved the matching accuracy by adding smoothness constraints along the scan line after aggregation. The efficient large-scale stereo matching algorithm (ELAS) [7] builds a prior on the disparities by forming a triangulation on a set of support points that can be robustly matched; computing the left and right disparity maps for a one-megapixel image pair takes about one second on a single CPU core. However, the accuracy of its disparity map must be improved before it can be applied to three-dimensional reconstruction or target recognition. The basic idea of [8, 9] is to apply the triangulation method to disparity estimation: the triangle area is used as the matching unit, and all pixels within a triangle are assumed to have the same disparity. Even though this can produce a dense disparity map [10], the running time is longer and excessive smoothing may occur in some triangle areas.
In this paper, we propose an effective local stereo matching method based on Delaunay triangulation, which allows dense matching with small aggregation windows by reducing ambiguities in the correspondences. Our approach builds a disparity model over the disparity space by forming a triangulation on a set of robustly matched correspondences, named support points, which are detected by a Canny edge operator. The algorithm is efficient, reduces the search space, and can be easily parallelized. Because neighboring triangles share joint vertices, the initial disparity of each vertex is then refined. As demonstrated in our experiments, our method achieves good performance compared with prevalent approaches.
2. Support Points Generation
As support points, we denote pixels that can be robustly matched owing to their texture and uniqueness. We are inspired by the observation that image edges contain most of the image information and exist widely within objects and backgrounds as well as between objects.
The common edge detection operators are the Prewitt, Sobel, Canny, LoG, and Roberts operators. The Prewitt and Sobel operators, as first-order differential operators, are averaging and weighted-averaging filters, respectively. The LoG operator first uses a Gaussian function to smooth the image and then applies the Laplace operator. The Sobel, Roberts, and Prewitt methods are sensitive to noise and easily produce nonclosed edge areas [11], so their edge detection results are often unsatisfactory. The Canny method can be applied in different situations owing to its low missed-detection rate, low false-detection rate, and high edge-positioning accuracy.
However, the parameters (high and low thresholds) of the Canny operator are conventionally set by hand and are not adaptive to different images; a good edge detection result is difficult to obtain with manually set thresholds. Thus, selecting an appropriate threshold for image edge detection is very important. We adopt the method based on the gradient-magnitude histogram and intraclass-variance minimization [12] to determine the threshold adaptively. This method does not require manually set thresholds and automatically derives a threshold for each image, excluding the influence of human factors.
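The threshold-selection step can be sketched as follows. This is a minimal illustration of the intraclass-variance idea (an Otsu-style search over the gradient-magnitude histogram), not the exact procedure of [12]; the bin count, the fixed low-to-high ratio, and all names are assumptions.

```python
def adaptive_canny_thresholds(grad_mags, bins=256, low_ratio=0.4):
    """Pick Canny thresholds from the gradient-magnitude histogram by
    minimizing the intraclass (within-class) variance, Otsu-style."""
    g_max = max(grad_mags)
    if g_max == 0:
        return 0.0, 0.0
    hist = [0] * bins
    for g in grad_mags:  # histogram of gradient magnitudes
        hist[min(int(g / g_max * (bins - 1)), bins - 1)] += 1
    best_t, best_var = 1, float("inf")
    for t in range(1, bins):  # split minimizing the summed class variances
        var = 0.0
        for lo_bin, hi_bin in ((0, t), (t, bins)):
            idx = range(lo_bin, hi_bin)
            n = sum(hist[i] for i in idx)
            if n == 0:
                continue
            mean = sum(i * hist[i] for i in idx) / n
            var += sum(hist[i] * (i - mean) ** 2 for i in idx)
        if var < best_var:
            best_var, best_t = var, t
    high = best_t / (bins - 1) * g_max  # histogram split -> high threshold
    return low_ratio * high, high      # low threshold as a fixed fraction
```

On a bimodal gradient histogram (weak texture gradients versus strong edge gradients), the minimizing split falls between the two modes, so the high threshold separates edge pixels from the rest without manual tuning.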
A large number of support points increases the running time of the subsequent triangulation algorithm, as well as the entire running time of stereo matching. Therefore, under the premise of preserving the edge details of the reference image, we select support points on every other line after edge detection. The support-point images shown in Figure 1 confirm this choice experimentally: the support points are distributed uniformly along the image edges.
Support points images detected by the Canny operator.
Cones left image
Cones support points image
Aloe left image
Aloe support points image
3. Support Point Triangulation
The 2D triangulation of the reference image aims to represent the entire image with a set of triangular meshes. The disparity map is then described as a set of triangular areas with the same or similar disparity. The triangular meshes reflect the topological relation between a pixel and its neighboring pixels. Under the premise of preserving disparity discontinuities and edge details, the triangles in homogeneous areas should be large enough to reduce matching ambiguity: where the depth is homogeneous, the density of points should be small, while a higher number of points must exist near depth discontinuities to correctly preserve object details.
Many 2D triangulation methods exist, the most representative being Delaunay triangulation, which produces the most regular triangular mesh [13]. The most commonly used Delaunay triangulation algorithms are the insertion, incremental, and divide-and-conquer methods. The insertion method is simple, efficient, and memory-light, but its time complexity is poor. The incremental method is not commonly used because of its low efficiency. Meanwhile, the divide-and-conquer method has been shown to be the fastest Delaunay triangulation technique. Considering running time, we use the divide-and-conquer method to triangulate the initial set of support points. Triangulation results for higher-resolution images from the Middlebury website (Cones, Teddy, Aloe, and Venus) are shown in Figure 2.
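In practice the triangulation step reduces to a single library call. The sketch below uses SciPy's Qhull-backed `Delaunay` as a stand-in for the divide-and-conquer implementation described above (Qhull is incremental, but it produces the same Delaunay triangulation); the function name and example points are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate_support_points(points):
    """Triangulate 2D support points into an (M, 3) array of vertex indices."""
    tri = Delaunay(np.asarray(points, dtype=float))
    return tri.simplices  # each row lists the three vertices of one triangle

# Example: four support points in convex position yield two linked triangles.
support_points = [(0, 0), (4, 0), (5, 4), (0, 3)]
triangles = triangulate_support_points(support_points)
```

The returned index triples also give the adjacency needed later: two triangles that share two vertex indices share an edge, which is exactly the neighborhood used in the aggregation step.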
4. Disparity Estimation
4.1. Initial Disparity Estimation
After the 2D triangulation of the reference image, each triangle is initially assumed to have uniform depth. The initial estimation step assigns a unique disparity value to each triangle.
For each triangle t_j in the reference image, we assume that M_j^r(d) is the matching function of t_j with respect to image r at disparity d. The matching function is chosen based on the histogram of pixel gains and its ability to deal with illumination variations in an image. Considering a triangle t_j in the reference image, for each pixel x in t_j, the ratio can be calculated as

r_c(d, x) = I_c(x) / I′_c(x + d), (1)

where I′ is the image to be matched in the stereo pair, x denotes pixel coordinates, and d is the disparity value. The ratios between corresponding pixels at each color channel c ∈ {R, G, B} are computed, and a histogram of the ratios over all color channels is obtained. An ideal match at the correct disparity value should lead to similar ratios at all pixels and color channels.
If a match is good, the histogram has few bins with large values while the rest are small, whereas a poor match has a more even distribution. To locate the large values of the histogram, we evaluated several methods, such as the image mean-square error estimate and entropy, and found the following to be efficient and to yield good matching results. The matching function M_j^r(d) for t_j with respect to image r at disparity d is given by

M_j^r(d) = (1 / (3·A_j)) · max_l Σ_{c∈{R,G,B}} [β^c_{l−1}(d) + β^c_l(d) + β^c_{l+1}(d)], (2)

where A_j is the area of triangle t_j and l is the index of the histogram bin. We compute the histogram for each color channel using 20 bins β^c_l(d) ranging from 0.7 to 1.1 and ignore values outside this interval. We take the maximum sum of three adjacent bins as the indicator of a good match because it approaches the total number of pixels. For any disparity value d, M_j^r(d) lies between 0 and 1, and a value closer to 1 indicates a better match.
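A minimal sketch of (1) and (2): for a candidate disparity, the per-channel ratio histograms are accumulated over the triangle's pixels, and the best run of three adjacent bins is normalized by 3·A_j. The pixel-dictionary representation and all names are assumptions for illustration.

```python
BINS, LO, HI = 20, 0.7, 1.1  # 20 ratio bins over [0.7, 1.1], as in the text

def matching_score(ref_px, other_px, tri_pixels, d):
    """M_j(d) of Eq. (2): ratio-histogram match for one triangle.

    ref_px / other_px map (x, y) -> (R, G, B); tri_pixels lists the pixel
    coordinates inside triangle t_j. Returns a score in [0, 1].
    """
    hist = [[0] * BINS for _ in range(3)]  # one histogram per color channel
    for (x, y) in tri_pixels:
        match = other_px.get((x + d, y))
        if match is None:
            continue
        for c in range(3):
            if match[c] == 0:
                continue
            r = ref_px[(x, y)][c] / match[c]  # Eq. (1): r_c(d, x)
            if LO <= r <= HI:                 # ratios outside are ignored
                b = min(int((r - LO) / (HI - LO) * BINS), BINS - 1)
                hist[c][b] += 1
    area = len(tri_pixels)  # A_j, the triangle area in pixels
    # Eq. (2): best sum of three adjacent bins, pooled over the channels.
    best = max(sum(hist[c][l - 1] + hist[c][l] + hist[c][l + 1]
                   for c in range(3))
               for l in range(1, BINS - 1))
    return best / (3 * area)
```

At the correct disparity, corresponding pixels have near-identical intensities, so the ratios cluster in one bin and the score approaches 1; at a wrong disparity the ratios spread out or leave the [0.7, 1.1] interval and the score drops.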
The advantage of using a triangle as a matching unit is that each triangle shares edges with exactly three other triangles. This property allows a very straightforward implementation of the aggregation step, popular in pixel-based approaches but not common in region-based methods. In the aggregation step, the cost of adjacent triangle regions is also considered before selecting the best disparity.
Denote by Δ_j the set of triangles adjacent to t_j; the aggregated matching function M′_j is given by

M′_j(d) = M_j(d) + Σ_{i∈Δ_j} ω_ij · M_i(d),  ω_ij = exp(−B²_ij / (2σ²)), (3)

where ω_ij measures the color similarity between triangles t_i and t_j based on the Bhattacharyya distance B_ij, computed from the RGB values of both triangles. The parameter σ controls the attenuation of the exponential function ω_ij: the support weight of an adjacent triangle grows with σ, but a large σ may blur the image edges. The parameter was set experimentally to σ = 0.16 as a compromise between image smoothing and edge blurring. The initial disparity value of each triangle t_j is given by D_j = argmax_d M′_j(d). At the end of this step we obtain a piecewise-constant disparity map; next, we smooth and refine the disparity in discontinuous areas.
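The aggregation of (3) can be sketched as a sweep over disparities that mixes each triangle's score with those of its (at most three) edge neighbors, weighted by ω_ij, and keeps the argmax. The histogram-based Bhattacharyya coefficient used for B_ij is a common formulation assumed here, not taken from the paper.

```python
import math

SIGMA = 0.16  # attenuation parameter sigma from the text

def bhattacharyya_distance(h1, h2):
    """Bhattacharyya distance between two normalized color histograms."""
    bc = sum(math.sqrt(p * q) for p, q in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - bc))

def initial_disparities(scores, neighbors, hists, d_range):
    """Eq. (3): aggregate neighbor scores, then take the argmax disparity.

    scores[j][d] is M_j(d); neighbors[j] lists the (at most three) triangles
    sharing an edge with t_j; hists[j] is t_j's normalized RGB histogram.
    """
    result = []
    for j in range(len(scores)):
        best_d, best_m = None, -math.inf
        for d in d_range:
            m = scores[j][d]
            for i in neighbors[j]:
                b = bhattacharyya_distance(hists[i], hists[j])
                w = math.exp(-b * b / (2 * SIGMA ** 2))  # omega_ij of Eq. (3)
                m += w * scores[i][d]
            if m > best_m:
                best_m, best_d = m, d
        result.append(best_d)  # D_j = argmax_d M'_j(d)
    return result
```

Similar-colored neighbors (B_ij near 0, ω_ij near 1) vote strongly for a shared disparity, while dissimilar neighbors contribute little, which is what keeps the aggregation from smoothing across object boundaries.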
4.2. Disparity Refinement
In the disparity refinement stage, the disparity value of each vertex is refined according to its similarity to the neighboring triangles, so that vertices related to similar triangles have similar depth values. Given that all vertices of each triangle are potentially refined, the final result is a piecewise-linear representation of the depth map.
Consider a vertex V shared by N triangles, and let D_j (j = 1, …, N) be the disparity value of this vertex in each triangle. We aim to find a refined disparity value for vertex V such that the difference between disparity values is reduced when the triangles are similar and left unchanged when they are dissimilar. The refinement step is formulated as a minimization problem with the objective function

E = Σ_{i=1}^{N} Σ_{j=1}^{N} w_ij (d_i − d_j)² + Σ_{j=1}^{N} a_j (d_j − D_j)², (4)

where w_ij is the similarity weight between adjacent triangles i and j, and a_j is the confidence value obtained with the initial disparity D_j. The first term of (4) is a regularization term, minimized when all disparity values d_j are equal; the second term is minimized when d_j = D_j.
To balance the accuracy and smoothness of the disparity values, the key issue is the selection of the weights w_ij and a_j. If a_j is large, disparity accuracy increases, whereas the disparity map becomes smoother when w_ij is large. Therefore, a_j should be derived from the initial matching quality:

a_j = Σ M_j^r / (C · P_j), (5)

where M_j^r is the matching function of (2), C = 3 is the number of color channels, and P_j is the number of pixels inside triangle t_j. The value of a_j represents the fraction of pixels inside the triangle that fall in the three largest contiguous bins over the three color channels; it is close to one for good matches and decreases as the match quality worsens. When triangles i and j are similar, w_ij should be large so that the corresponding disparities d_i and d_j become similar.
The remaining issue is the selection of w_ij. As in [14], we note that color similarity and proximity are the two main concepts of classic Gestalt theory for visual grouping: the more similar the colors, the larger the support weight should be. Assuming that similarity and proximity can be regarded as independent, w_ij is given by

w_ij = (1 / A_i) · exp(−(δ_ij / γ_c + d_ij / γ_d)), (6)

where δ_ij is the similarity distance, defined as the Euclidean distance between the mean RGB values of t_i and t_j:

δ_ij = ‖I_i − I_j‖₂, (7)

where ‖·‖₂ denotes the L2 norm and I(·) the mean intensity of a triangle.
d_ij is the spatial distance, defined as the Euclidean distance between the centroids of t_i and t_j:

d_ij = ‖(x, y) − (x′, y′)‖₂, (8)

where (x, y) and (x′, y′) are the centroid coordinates of triangles t_i and t_j, respectively.
Parameters γc and γd are thresholds that control the decay of the support weight. We have fixed γc=10, γd=12 based on previous experiments and obtained good matching results.
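Because (4) is quadratic in the unknowns d_j, the refinement has a closed-form solution: setting the gradient to zero yields a linear system whose smoothness part is a graph Laplacian built from the symmetrized weights w_ij + w_ji. The sketch below solves that system for one shared vertex; names are illustrative.

```python
import numpy as np

def refine_vertex_disparities(W, a, D):
    """Closed-form minimizer of Eq. (4) for one shared vertex.

    W[i][j] is the weight w_ij of Eq. (6) (0 for non-adjacent triangles),
    a[j] the confidence of Eq. (5), D[j] the initial disparity of the vertex
    in triangle t_j. Setting dE/dd_k = 0 gives the linear system solved here.
    """
    W = np.asarray(W, dtype=float)
    a = np.asarray(a, dtype=float)
    D = np.asarray(D, dtype=float)
    S = W + W.T                      # symmetrized pairwise weights
    L = np.diag(S.sum(axis=1)) - S   # graph Laplacian of the smoothness term
    return np.linalg.solve(L + np.diag(a), a * D)

# One smoothing step pulls the two copies of the vertex toward each other:
refined = refine_vertex_disparities([[0, 1], [0, 0]], [1, 1], [0, 4])
```

When all pairwise weights are zero (dissimilar triangles), the system degenerates to d_j = D_j, so dissimilar triangles keep their initial disparities exactly as the text requires.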
5. Experiment Results and Analysis
An overview of the proposed method is shown in Figure 3. First, we obtain the image edge information using the adaptive Canny edge operator, and the Delaunay triangulation method divides the reference image into a series of two-dimensional triangles according to the edge points. Second, we formulate the matching model to accomplish the initial disparity estimation, exploiting the fact that each triangle shares edges with its neighbors to achieve cost aggregation. Finally, we refine the initial disparity using the shared-vertex property of the triangles and obtain the final disparity map.
Flowchart of the proposed algorithm.
To verify the effectiveness of the proposed approach, we tested the method on higher-resolution images from the Middlebury website. We present four images here, namely, Cones, Teddy, Aloe, and Venus, with different resolutions. The method was implemented in C++ on a PC with a single 2.79 GHz CPU and 3 GB of memory. The calculated disparity maps were evaluated by measuring the percentage of bad matching pixels. A comparison of the resulting disparity maps is shown in Figure 4, where the black areas are occluded regions, and a comparison of matching accuracy and running time in nonoccluded regions is shown in Table 1. As Figure 4 shows, the proposed method obtains a disparity map with clear outlines; in occluded areas (the left side of Aloe) and disparity-discontinuous areas (the newspaper edge in Venus), it also obtains good matching results. We further compared our approach with the ELAS method [7] in terms of matching accuracy and running time. As shown in Table 1, the average running times of the proposed algorithm and ELAS were 1.043 s and 1.045 s, respectively, and the average error matching ratios in nonoccluded areas were 6.75% and 7.83%, respectively. The running time of the proposed algorithm was thus close to that of ELAS, while its error matching ratio was lower. Therefore, the running time and matching accuracy of the proposed method meet the needs of practical applications.
Comparison of results of different algorithms.

                                        Cones    Teddy    Aloe       Venus
Image resolution                        900×750  900×750  1282×1110  950×750
Error matching ratio of ELAS (%)        5.7      15.5     6.6        3.52
Running time of ELAS (ms)               1021     1280     996        876
Error matching ratio of our method (%)  5.4      13.2     5.2        3.2
Running time of our method (ms)         1078     1224     1025       854
Compare results of disparity maps.
Reference image
Ground truth
DSI of ELAS
DSI of our method
We also used a stereo vision system (shown in Figure 5(a)) based on D-H coordinates to capture real-world images and further verify the effectiveness of the proposed algorithm. The transmission agents of the stereo vision system consist of a rotary joint and a pitch joint. The angle ranges of the rotary and pitch joints jointly determine the scope of the scene, and the rotation velocity and acceleration of the joints determine the responsiveness of the system, while the rotary accuracy of the joints determines its positioning accuracy. Considering these features, the angle ranges of the rotary and pitch joints are set within −60° to 60°, and the angular velocity and positioning accuracy are 90°/s and 0.8°, respectively. A pair of fixed-focus WA-922H cameras was used to capture the visual information. As shown in Figures 5(d) and 5(e), the proposed method also obtains a good disparity map in a real-world scene, which further verifies its effectiveness.
Experiment results of real environment.
Stereo vision system
Left image of stairs
Right image of stairs
DSI of ELAS
DSI of our method
6. Conclusions
This paper presented a stereo matching algorithm based on Delaunay triangulation. Considering that edge detection has an important influence on image recognition, an adaptive Canny operator was applied to detect the image edges; its high edge-positioning accuracy effectively reduces the error matching ratio. The running time of stereo matching is shortened by using a triangular mesh as the matching unit and the gray-level information of the image to accomplish the initial disparity estimation. The method was tested on Middlebury stereo pairs; its running time is about 1 s and its matching accuracy is 93% with respect to the ground-truth map. Experimental results showed that the proposed method improved both running time and matching accuracy. In future research, we will design an integrated vision system for parallel image processing and apply it to binocular vision navigation and path planning for a six-legged robot.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (NSFC) “Environment Modeling and Autonomous Motion Planning of Six-Legged Robot” (no. 61473104) and National Magnetic Confinement Fusion Science Program “Multi-Purpose Remote Handling System with Large-Scale Heavy Load Arm” (2012GB102004).
References
1. D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms."
2. R. Brockers, "Cooperative stereo matching with color-based adaptive local support."
3. C. Rother, V. Kolmogorov, and A. Blake, "GrabCut: interactive foreground extraction using iterated graph cuts."
4. K. P. Murphy, Y. Weiss, and M. I. Jordan, "Loopy belief propagation for approximate inference: an empirical study," in Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence (UAI '99), Stockholm, Sweden, July–August 1999, Morgan Kaufmann, pp. 467–475.
5. Z. W. Zhou, J. Z. Fan, and G. Li, "Design and application of fast matching method based on support point expansion."
6. X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, and X. Zhang, "On building an accurate stereo matching system on graphics hardware," in Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV '11), November 2011, pp. 467–474.
7. A. Geiger, M. Roser, and R. Urtasun, "Efficient large-scale stereo matching."
8. C. Xia, Y. Yang, R. Ju, and G. Wu, "Effective local stereo matching by extended triangular interpolation," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '13), July 2013, pp. 1–6.
9. G. P. Fickel, C. R. Jung, T. Malzbender, R. Samadani, and B. Culbertson, "Stereo matching and view interpolation based on image domain triangulation."
10. G. P. Fickel, C. R. Jung, R. Samadani, and T. Malzbender, "Stereo matching based on image triangulation for view synthesis," in Proceedings of the 19th IEEE International Conference on Image Processing (ICIP '12), Orlando, Fla, USA, September–October 2012, pp. 2733–2736.
11. R. Maini and H. Aggarwal, "Study and comparison of various image edge detection techniques."
12. M. Li, J. H. Yan, G. Li, and J. Zhao, "Self-adaptive Canny operator edge detection technique."
13. P. Maur, "Delaunay triangulation in 3D," Department of Computer Science and Engineering, University of West Bohemia, Pilsen, Czech Republic, 2002.
14. K.-J. Yoon and I. S. Kweon, "Adaptive support-weight approach for correspondence search."