Matching Cost Filtering for Dense Stereo Correspondence

Dense stereo correspondence, which enables reconstruction of depth information in a scene, is of great importance in the field of computer vision. Recently, some local solutions based on matching cost filtering with an edge-preserving filter have been shown to achieve higher accuracy than global approaches. Unfortunately, the computational complexity of these algorithms is quadratically related to the window size used to aggregate the matching costs. The recent trend has been to pursue higher accuracy with greater efficiency in execution. Therefore, this paper proposes a new cost-aggregation module to compute the matching responses for all the image pixels at a set of sampling points generated by a hierarchical clustering algorithm. The complexity of this implementation is linear both in the number of image pixels and the number of clusters. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art local methods in terms of both accuracy and speed. Moreover, performance tests indicate that parameters such as the height of the hierarchical binary tree and the spatial and range standard deviations have a significant influence on time consumption and the accuracy of disparity maps.


Introduction
Stereo correspondence between stereo images results in a depth image, also called a disparity map, which can be categorized as sparse or dense. Sparse disparity maps are obtained mainly using feature-based methods derived from human vision research [1]. As a result, high processing speeds and accurate disparity maps are achieved but without high density, which has limited their use for many purposes. Dense stereo correspondence, which aims to determine which parts of one image correspond to which parts of another image, is a challenging issue in the field of computer vision. The requirement for dense disparity maps is motivated by many contemporary applications such as virtual reality, view synthesis, and robot vision navigation [2].
Dense stereo correspondence algorithms can be classified as global or local according to whether they obtain disparities from global or local information. The goal of global methods (energy based) is to minimize a global cost function which combines matching costs and smoothness terms, depending on information derived from the whole image. These methods are time consuming but very accurate [3]. On the other hand, local methods (area based) offer high speed at the expense of matching accuracy and determine the disparity of each pixel according to information provided by the pixel itself and its neighboring pixels. These methods are also referred to as window-based methods because the disparity computation between two matching pairs depends only on the intensity values within a fixed-size and fixed-shape matching window [4]. However, recent studies have shown that, by carefully selecting and aggregating the matching costs of neighboring pixels, the disparity maps produced by a local approach can be more accurate than those generated by global methods [5]. The most noteworthy technique is local filtering, which is an effective way to reduce matching noise and is able to generate high-quality disparity maps.
This paper proposes a dense stereo correspondence approach very similar to the original adaptive support weight (ASW) method [6] to obtain accurate disparity maps both at depth discontinuities and in smooth regions. The basic idea is to accept similar pixels within a matching window by assigning them relatively large support weights and to reject dissimilar pixels by giving them very small support weights. Therefore, it is necessary to divide the neighboring pixels into similar and dissimilar groups. In the present case, adaptive support weights are computed from the color image using a hierarchical clustering algorithm inspired by Gastal's work [7] on high-dimensional filtering of images and videos in real time; the disparity maps after filtering are less noisy, and the depth discontinuity boundaries are preserved fairly well. In addition, the proposed algorithm improves on the efficiency and accuracy of the guided-image filter (GIF) [8] algorithm used for stereo correspondence, which is one of the best existing local algorithms.
The main contributions of this paper include the following.
(1) A novel matching-cost filtering model is proposed based on an edge-preserving filter for which the adaptive support weights are computed using a hierarchical clustering algorithm (as shown in Section 3.2). This solution can reduce mismatching, especially around regions of depth discontinuities, and can reconstruct dense high-accuracy disparity maps.
(2) The computational complexity of the proposed method is essentially linear both in the number of image pixels and the number of clusters, regardless of the matching window size and the intensity range (as described in Section 3.3). Therefore, the method can be easily adjusted to meet real-time requirements with the help of contemporary graphics hardware (a graphics processing unit (GPU)).
(3) A new disparity refinement method is presented, which has been shown to be robust and effective for improving the accuracy of coarse disparity maps (as presented in Section 3.5). This method can be applied to other coarse-to-fine frameworks, which are among the classic, simplest, and most popular stereo matching algorithms.
(4) The influence of algorithm parameters on accuracy and efficiency is discussed, especially regarding the weight coefficient, the height of the hierarchical binary tree, and the size of the spatial and range standard deviations (as discussed in Section 4.2). This study offers recommendations which can be used as a basis for future practical applications.
The rest of this paper is organized as follows: Section 2 gives an overview of state-of-the-art local filtering methods, and the proposed method is described in Section 3. Section 4 presents experimental results which compare the proposed method with other state-of-the-art approaches and discusses the influences of parameter settings. Finally, conclusions and suggestions for future work are discussed in Section 5.

Related Work
A disparity map is obtained by determining the disparity which has the lowest matching cost in each local matching window, a method which is widely used in local algorithms. Many local methods have recently been proposed to obtain a dense disparity map. For instance, adaptive-window methods [9,10] try to find an optimal matching window for each pixel, and multiple-window methods [11] select an optimal matching window among predefined multiple windows located at different positions with the same shape. However, these methods have one limitation in common: the shape of the matching window is constrained to be a rectangle, which is not appropriate for pixels near depth discontinuities. Therefore, it is difficult to find an optimal matching window with an appropriate size and shape for all cases.
Instead of searching for an optimal matching window of arbitrary size and shape, it is possible to aggregate costs after local smoothing within a matching window to reduce matching noise. Most noise can be reduced effectively by a linear filter, such as a Gaussian filter, but the resulting disparity map exhibits the well-known "edge-fattening" phenomenon. Therefore, the local filtering results will not be a good neighborhood representative close to an edge region. To address this problem, the ASW algorithm [6] smooths the matching costs with an adaptive weighted filter in which the support weights are chosen according to both the color similarity and the Euclidean distance to the center pixel. These methods imitate the way that humans assign different weights to a pixel according to color or brightness in the process of finding the correspondences between their two eyes. Such a filter is also referred to as an edge-preserving filter in computer vision and is widely used for image denoising; examples include the SUSAN filter [12], the bilateral filter [13], and the nonlocal means filter [14,15]. Experimental results show that this approach can produce disparity maps better than those generated using global optimization techniques without needing many user-specified parameters. Although this method leads to high-quality results, its run time is a problem because the computational cost grows with the size of the matching window. Therefore, many improved and real-time solutions have been presented, such as the O(1) bilateral filter [16][17][18], the dual-cross-bilateral grid (DCBG) [19,20], the GIF [21,22], and the nonlocal filter [23].

Cost Aggregation with Local Filtering
A literature review has provided a taxonomy and an evaluation of typical matching algorithms and has emphasized that such a coarse-to-fine algorithm generally performs the following four steps [24]: (1) cost initialization, in which the matching costs for assigning different disparity hypotheses to different pixels are calculated; (2) cost aggregation, in which the initial matching costs are aggregated spatially over matching windows; (3) disparity optimization, in which a cost function is minimized to obtain the best disparity hypothesis for each pixel; (4) disparity refinement, in which the coarse disparity maps are postprocessed to remove mismatches or to generate fine disparity maps.
According to these four steps, in this paper, the cost aggregation with local filtering consists of five parts: matching cost initialization, cost aggregation with filtering, clustering range values for the sampling points, disparity selection, and refinement. In addition, the computational complexity is discussed.

Cost Initialization.
Generally, it is possible to identify matching pairs in stereo images by measuring their similarity. The most common matching cost functions used to establish a correspondence between two points are the sum of absolute intensity differences (SAD), the sum of squared intensity differences (SSD), and the normalized cross-correlation (NCC) [25].
The cost initialization module computes the initial matching cost $C(u, v, d)$ for assigning disparity hypothesis $d$ to image pixel $(u, v)$, where $u$ and $v$ define the displacements in the $u$- and $v$-directions, respectively. Generally, after rectifying a stereo image, there is no shift in the $v$-direction but only a displacement in the $u$-direction, in which case the cost can be represented as $C(u, d)$ according to the disparity $d$. The costs are calculated using the truncated absolute differences in range (intensity or color) and the gradient between corresponding pixels. In other words,

$$C(u,d)=\alpha\cdot\min\left(\left|I_l(u)-I_r(u-d)\right|,\,T_1\right)+(1-\alpha)\cdot\min\left(\left|\nabla_u I_l(u)-\nabla_u I_r(u-d)\right|,\,T_2\right), \tag{1}$$

where $\alpha$ is the weight coefficient, $I_l(u)$ is the left image, and the corresponding right image which has disparity $d$ is $I_r(u-d)$. $\nabla_u$ is the gray-scale gradient in the $u$-direction, and $T_1$, $T_2$ are truncation values for balancing the range and gradient terms. Such a matching cost model has been shown to be robust to illumination changes and is commonly used in stereo correspondence [26].
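As a rough illustration of the truncated color-plus-gradient cost, the following minimal sketch evaluates it along a single rectified scanline. The function names `gradient` and `initial_cost`, the central-difference gradient, and the choice to charge the maximum cost where the right image has no counterpart are assumptions made for this example; the default parameter values are the ones reported in the experiments.

```python
def gradient(row):
    """Central-difference gradient along a scanline (one-sided at the borders)."""
    n = len(row)
    grad = []
    for u in range(n):
        lo, hi = max(u - 1, 0), min(u + 1, n - 1)
        grad.append((row[hi] - row[lo]) / (hi - lo))
    return grad


def initial_cost(left, right, d, alpha=0.1, t1=0.028, t2=0.08):
    """Truncated range + gradient cost for every pixel u at disparity d.

    left, right: gray-scale scanlines with values in [0, 1] (at least 2 pixels).
    """
    grad_l, grad_r = gradient(left), gradient(right)
    costs = []
    for u in range(len(left)):
        v = u - d  # matching position in the right scanline
        if v < 0:
            # Assumption: no counterpart exists, so charge the maximum cost.
            costs.append(alpha * t1 + (1 - alpha) * t2)
            continue
        range_term = min(abs(left[u] - right[v]), t1)
        grad_term = min(abs(grad_l[u] - grad_r[v]), t2)
        costs.append(alpha * range_term + (1 - alpha) * grad_term)
    return costs


# The left scanline is the right one shifted by two pixels.
right = [0.1, 0.2, 0.5, 0.9, 0.3, 0.3]
left = [0.0, 0.0, 0.1, 0.2, 0.5, 0.9]
cost_true = initial_cost(left, right, d=2)
cost_wrong = initial_cost(left, right, d=0)
```

At the correct disparity the interior pixels receive zero cost, while a wrong hypothesis is capped by the two truncation values, which keeps outliers from dominating the later aggregation.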

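Cost Aggregation with Filtering.
The original local filtering approach computes each aggregated cost as a weighted average of the adjacent matching costs. The costs aggregated over the weights can therefore be expressed as

$$C'(i,d)=\frac{\sum_{j\in\Omega_i}W(i,j)\,C(j,d)}{\sum_{j\in\Omega_i}W(i,j)}, \tag{2}$$

where $i$ and $j$ are pixel indices in the $u$-direction and $\Omega_i$ is the region around the $i$th coordinate. The weights $W(i,j)$ of this linear combination are given by two Gaussian filter kernels which combine the spatial weights based on the distance between two pixels and the range weights based on the intensity difference. Therefore, the filter weights $W(i,j)$ can be represented by spatial and range terms as

$$W(i,j)=\exp\left(-\frac{|i-j|^2}{2\sigma_s^2}\right)\exp\left(-\frac{|I_i-I_j|^2}{2\sigma_r^2}\right), \tag{3}$$

where $\sigma_s$ and $\sigma_r$ are two constants used to adjust the spatial and range similarities.
The Gaussian over the range similarity can be rewritten as a convolution of two Gaussian kernels:

$$\exp\left(-\frac{|I_i-I_j|^2}{2\sigma_r^2}\right)=\kappa\int\exp\left(-\frac{|I_i-\eta|^2}{\sigma_r^2}\right)\exp\left(-\frac{|\eta-I_j|^2}{\sigma_r^2}\right)d\eta, \tag{4}$$

where $\kappa$ is a normalization factor and $\eta$ is a sampling range value. Finally, the Gaussian integral over the range $\eta$ can be evaluated numerically using an approximation according to the Gauss-Hermite quadrature rule as

$$\exp\left(-\frac{|I_i-I_j|^2}{2\sigma_r^2}\right)\approx\kappa\sum_{k=1}^{K}\exp\left(-\frac{|I_i-\eta_k|^2}{\sigma_r^2}\right)\exp\left(-\frac{|\eta_k-I_j|^2}{\sigma_r^2}\right), \tag{5}$$

where $K$ is the number of sampling range values. Increasing the number of sampling points gives a better approximation of the integral in (4). Assuming that pixel $i$ has a sampling set $\{\eta_{1,i}, \eta_{2,i}, \ldots, \eta_{K,i}\}$, the filter weights in (3) can be rewritten as

$$W(i,j)=\exp\left(-\frac{|i-j|^2}{2\sigma_s^2}\right)\sum_{k=1}^{K}\exp\left(-\frac{|I_i-\eta_{k,i}|^2}{\sigma_r^2}\right)\exp\left(-\frac{|\eta_{k,i}-I_j|^2}{\sigma_r^2}\right). \tag{6}$$

The normalization factor $\kappa$ is not included because both the numerator and the denominator in (2) contain this factor, and it cancels out after the division.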
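The direct weighted average over spatial and range Gaussians can be sketched for one scanline and one fixed disparity as follows. This is only a brute-force illustration: the name `aggregate_direct` is hypothetical, and the window is taken to be the whole scanline for simplicity.

```python
import math


def aggregate_direct(costs, guide, sigma_s=11.0, sigma_r=0.08):
    """Bilateral-style weighted average of the costs along one scanline.

    costs[j]: initial cost of pixel j for one fixed disparity.
    guide[j]: intensity of pixel j in the reference image, in [0, 1].
    """
    n = len(costs)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):  # assumption: the window covers the whole scanline
            spatial = math.exp(-(i - j) ** 2 / (2 * sigma_s ** 2))
            rng = math.exp(-(guide[i] - guide[j]) ** 2 / (2 * sigma_r ** 2))
            num += spatial * rng * costs[j]
            den += spatial * rng
        out.append(num / den)  # den >= 1 because the j == i term equals 1
    return out


# A step edge in the guide image: costs must not leak across it.
guide = [0.1] * 4 + [0.9] * 4
costs = [1.0] * 4 + [0.0] * 4
smoothed = aggregate_direct(costs, guide)
flat = aggregate_direct([0.5] * 8, guide)
```

The double loop makes the cost of this direct form grow with the window size, which is the bottleneck the sampled formulation is meant to remove.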
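The claim that a range Gaussian can be replaced by a sum over sampling values can be checked numerically. The sketch below uses a plain Riemann sum over equally spaced samples rather than Gauss-Hermite nodes, which is a simplification chosen for this example; the names `range_kernel` and `sampled_kernel` and the closed-form normalization factor are likewise assumptions of the sketch.

```python
import math


def range_kernel(a, b, sigma_r):
    """Gaussian range weight between two intensities a and b."""
    return math.exp(-(a - b) ** 2 / (2 * sigma_r ** 2))


def sampled_kernel(a, b, sigma_r, samples):
    """Approximate the range weight by summing over sampling values.

    samples: equally spaced sampling range values (a Riemann sum is used
    here in place of the Gauss-Hermite rule).
    """
    step = samples[1] - samples[0]
    kappa = math.sqrt(2 / math.pi) / sigma_r  # normalization of the integral
    total = sum(
        math.exp(-(a - eta) ** 2 / sigma_r ** 2)
        * math.exp(-(eta - b) ** 2 / sigma_r ** 2)
        for eta in samples
    )
    return kappa * step * total


a, b, sigma_r = 0.4, 0.5, 0.08
exact = range_kernel(a, b, sigma_r)
fine = sampled_kernel(a, b, sigma_r, [k / 1000 for k in range(-500, 1501)])
medium = sampled_kernel(a, b, sigma_r, [k / 100 for k in range(0, 101)])
coarse = sampled_kernel(a, b, sigma_r, [0.0, 0.25, 0.5, 0.75, 1.0])
```

With only five samples the approximation is visibly off, while denser sampling converges to the exact kernel, in line with the remark that more sampling points give a better approximation.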

Clustering Range Value for Sampling Points.
As mentioned before, the key point of Yang's algorithm [23] is that it accepts similar pixels within a matching window by assigning them relatively large support weights and rejects dissimilar pixels by giving them very small support weights. Clearly, it is necessary to divide neighboring pixels into similar and dissimilar groups. Inspired by this idea, the authors propose a hierarchical clustering algorithm similar to that developed by Gastal and Oliveira [7] to separate iteratively the whole set of image pixels with different range values into different clusters and to perform cost aggregation with local filtering within these clusters. This is an extension of adaptive manifold filtering to stereo correspondence and results in a modified clustering algorithm. Assume that pixel $i$ and its neighboring pixel $j$ within a cluster, where their $k$th sampling points have similar range values, satisfy

$$\eta_{k,i}\approx\eta_{k,j}. \tag{7}$$

Averaging values only from pixels belonging to the same cluster generates better estimates for the local filtering output. Therefore, after clustering range values for the sampling points, the cost aggregation in (2) can be rewritten using the filter weights in (6) and the cluster constraints in (7) as

$$C'(i,d)=\frac{\sum_{k=1}^{K}\exp\left(-\frac{|I_i-\eta_{k,i}|^2}{\sigma_r^2}\right)\sum_{j\in\Omega_i}\exp\left(-\frac{|i-j|^2}{2\sigma_s^2}\right)\exp\left(-\frac{|\eta_{k,j}-I_j|^2}{\sigma_r^2}\right)C(j,d)}{\sum_{k=1}^{K}\exp\left(-\frac{|I_i-\eta_{k,i}|^2}{\sigma_r^2}\right)\sum_{j\in\Omega_i}\exp\left(-\frac{|i-j|^2}{2\sigma_s^2}\right)\exp\left(-\frac{|\eta_{k,j}-I_j|^2}{\sigma_r^2}\right)}. \tag{8}$$

Compared with the complexity of the original bilateral filter in (3), the proposed filter in (8) reduces the complexity from $O(N^2)$ to $O(KN)$, where $K \ll N$ and $N$ is the number of pixels within the whole image.
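One way to read the sampled aggregation is as three passes per sampling layer: weight each pixel by its similarity to the layer, blur the weighted costs spatially, and read the result back with the same similarity weight. The sketch below follows that reading on a single scanline. The name `aggregate_sampled` and the layout of `etas` are hypothetical, and the spatial blur is written as a brute-force sum for clarity, so this sketch itself is not linear in time; a recursive or separable Gaussian blur would be needed to reach the linear cost discussed in the text.

```python
import math


def aggregate_sampled(costs, guide, etas, sigma_s=11.0, sigma_r=0.08):
    """Aggregate costs at K sampling layers instead of per pixel pair.

    etas[k][i]: value of the k-th sampling layer at pixel i (assumed layout).
    """
    n = len(costs)
    spatial = [
        [math.exp(-(i - j) ** 2 / (2 * sigma_s ** 2)) for j in range(n)]
        for i in range(n)
    ]
    num = [0.0] * n
    den = [0.0] * n
    for eta in etas:
        # Similarity of every pixel to this sampling layer.
        w = [math.exp(-(guide[i] - eta[i]) ** 2 / sigma_r ** 2) for i in range(n)]
        for i in range(n):
            # Brute-force spatial blur of the weighted costs and of the weights.
            blur_cost = sum(spatial[i][j] * w[j] * costs[j] for j in range(n))
            blur_one = sum(spatial[i][j] * w[j] for j in range(n))
            num[i] += w[i] * blur_cost
            den[i] += w[i] * blur_one
    return [a / b for a, b in zip(num, den)]


guide = [0.1] * 4 + [0.9] * 4
costs = [1.0] * 4 + [0.0] * 4
layers = [[0.1] * 8, [0.9] * 8]  # two constant sampling layers, one per side
result = aggregate_sampled(costs, guide, layers)
```

With one layer per side of the step edge, each pixel effectively averages only over pixels that are similar to its own layer, so the edge in the cost slice survives the smoothing.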
After introducing the improved cost aggregation and complexity analysis, an algorithm for clustering the range values can be summarized as follows.
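Step 1. Generate the first sampling point $\eta_1$ at pixel $i$ by low-pass filtering the input signal $I$ within neighborhood $\Omega_i$:

$$\eta_{1,i}=\frac{\sum_{j\in\Omega_i}\exp\left(-\frac{j^2}{2\sigma_s^2}\right)I_{i,j}}{\sum_{j\in\Omega_i}\exp\left(-\frac{j^2}{2\sigma_s^2}\right)}, \tag{9}$$

where $I_{i,j}$ represents the range value at distance $j$ around pixel $i$.
Step 2. Generate the $k$th ($1 < k \le K$) sampling point $\eta_k$. The first step is to compute an optimal hyperplane $\mathbf{v}_k$, which corresponds to the eigenvector associated with the largest eigenvalue of the covariance matrix:

$$\mathbf{M}=\frac{1}{N}\sum_{i=1}^{N}\left(\Delta_i-\mu\right)\left(\Delta_i-\mu\right)^{T}, \tag{10}$$

where $\Delta_i$ is the difference between the range value $I_i$ and the previous sampling point $\eta_{k-1,i}$ associated with each pixel $i$:

$$\Delta_i=I_i-\eta_{k-1,i}, \tag{11}$$

and $\mu$ is equal to the sum of the values $\Delta_i$ divided by the number of pixels $N$:

$$\mu=\frac{1}{N}\sum_{i=1}^{N}\Delta_i. \tag{12}$$

Step 3. Segment the pixels into two clusters $C_+$ and $C_-$ using the sign of the projection:

$$C_+=\left\{i:\mathbf{v}_k^{T}\Delta_i\ge 0\right\}, \tag{13}$$

$$C_-=\left\{i:\mathbf{v}_k^{T}\Delta_i<0\right\}. \tag{14}$$

Step 4. Compute a new sampling point $\eta_k^{+}$ also by low-pass filtering the input signal, but giving weight zero to pixels not in $C_+$, as

$$\eta^{+}_{k,i}=\frac{\sum_{j\in\Omega_i}\exp\left(-\frac{j^2}{2\sigma_s^2}\right)\theta_{i,j}\,I_{i,j}}{\sum_{j\in\Omega_i}\exp\left(-\frac{j^2}{2\sigma_s^2}\right)\theta_{i,j}}, \tag{15}$$

where $\theta_{i,j}$ is the weight at distance $j$ around pixel $i$ and is set to zero outside $C_+$. The values $\theta_i$ are the weights calculated using the range value and the previous sampling points:

$$\theta_i=\exp\left(-\frac{|I_i-\eta_{k-1,i}|^2}{\sigma_r^2}\right). \tag{16}$$

Perform the same processing for $\eta_k^{-}$ using pixels belonging to $C_-$; then the combination of $\eta_k^{+}$ and $\eta_k^{-}$ is the whole set of sampling points $\eta_k$.
Step 5. The number of sampling range values $K$ determines whether more clusters are needed for sampling points. Therefore, the next step is to repeat Step 2 onwards recursively until $k = K$.
Note that Steps 2 and 3 can be directly rewritten using the sign (positive or negative) of the differences when the range value is a gray-scale one:

$$C_+=\left\{i:I_i-\eta_{k-1,i}\ge 0\right\},\qquad C_-=\left\{i:I_i-\eta_{k-1,i}<0\right\}. \tag{17}$$

At the top of the tree, the sampling points are better adapted to smooth regions. Points further down the tree become gradually better adapted to edge regions.
Figure 2 shows the first three levels of sampling points $\eta_k$ of the Tsukuba image, which was downloaded from the Middlebury benchmark database [27]. Because the range values are clustered for more than one sampling point, the filtering results of (8) can be guaranteed to be an edge-preserving smoothing.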
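The five clustering steps can be sketched for a gray-scale scanline, where the split reduces to the sign of the difference between the signal and the parent sampling layer. Everything named here (`lowpass`, `build_sampling_tree`, the brute-force Gaussian average, the Gaussian similarity used as the per-pixel weight, and the fallback for empty clusters) is an assumption of this sketch rather than the authors' implementation.

```python
import math


def lowpass(signal, weights, sigma_s):
    """Normalized Gaussian average of `signal`, with per-pixel `weights`."""
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            g = math.exp(-(i - j) ** 2 / (2 * sigma_s ** 2)) * weights[j]
            num += g * signal[j]
            den += g
        # Assumption: fall back to the signal itself if no pixel contributes.
        out.append(num / den if den > 0 else signal[i])
    return out


def build_sampling_tree(signal, height, sigma_s=2.0, sigma_r=0.08):
    """Return the 2**height - 1 sampling layers of a binary tree."""
    n = len(signal)
    root = lowpass(signal, [1.0] * n, sigma_s)  # Step 1
    layers = [root]
    frontier = [(root, list(range(n)))]
    for _ in range(height - 1):  # Step 5: one tree level per iteration
        next_frontier = []
        for eta, members in frontier:
            # Steps 2-3, gray-scale case: split by the sign of the difference.
            plus = [i for i in members if signal[i] - eta[i] >= 0]
            minus = [i for i in members if signal[i] - eta[i] < 0]
            for cluster in (plus, minus):
                # Step 4: weight zero outside the cluster, Gaussian inside.
                theta = [0.0] * n
                for i in cluster:
                    theta[i] = math.exp(-(signal[i] - eta[i]) ** 2 / sigma_r ** 2)
                child = lowpass(signal, theta, sigma_s)
                layers.append(child)
                next_frontier.append((child, cluster))
        frontier = next_frontier
    return layers


signal = [0.1] * 6 + [0.9] * 6
tree2 = build_sampling_tree(signal, height=2)
tree3 = build_sampling_tree(signal, height=3)
```

On this step signal the root layer is a blurred ramp, while its two children settle on the bright and dark plateaus, which mirrors the observation that deeper levels adapt better to edge regions.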

Disparity Optimization.
Once the matching costs have been filtered using the clustering method, the disparity optimization step computes an optimal disparity map $D(u, v)$ using the local winner-takes-all (WTA) approach, which selects the coarse disparity associated with the minimum cost value at each pixel. In other words,

$$D(u,v)=\arg\min_{d\in\{0,1,\ldots,L-1\}}C'(u,v,d), \tag{18}$$

where $C'(u, v, d)$ represents the matching cost obtained after cost aggregation for assigning a disparity hypothesis $d$ to pixel $(u, v)$ and $L$ is the number of disparity levels.
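The winner-takes-all selection amounts to an arg-min over the disparity axis of the aggregated cost volume. A minimal sketch, assuming a hypothetical layout `cost_volume[d][i]` for one scanline:

```python
def winner_takes_all(cost_volume):
    """Pick, for each pixel, the disparity with the lowest aggregated cost.

    cost_volume[d][i]: aggregated cost of disparity d at pixel i.
    Ties are resolved in favor of the smallest disparity.
    """
    num_levels = len(cost_volume)
    num_pixels = len(cost_volume[0])
    return [
        min(range(num_levels), key=lambda d: cost_volume[d][i])
        for i in range(num_pixels)
    ]


volume = [
    [0.3, 0.1, 0.5],  # costs for d = 0
    [0.2, 0.4, 0.5],  # costs for d = 1
    [0.9, 0.0, 0.1],  # costs for d = 2
]
disparities = winner_takes_all(volume)
```

Because each pixel is decided independently of its neighbors, the quality of the result depends entirely on how well the preceding aggregation step has denoised the costs.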

Disparity Refinement.
The coarse disparity maps generated by WTA may contain some mismatches because local optimization does not enforce the smoothness constraint. Therefore, a two-step postprocessing method for fine disparity maps is proposed. The first step is a left and right cross-checking procedure for mismatches. Two corresponding disparity maps with the left and the right images as reference images are obtained. Then the left and right consistency check divides all the pixels into stable or unstable pixels. Note that all stable pixels in the left and right disparity maps have the same disparity value and that the rest of the pixels are labeled as unstable, represented by a value of zero for all disparity levels.
Secondly, let $D(u, v)$ represent the left disparity map; a new disparity space volume (DSV) [28] is then computed for each stable or unstable pixel $(u, v)$ at each disparity level $d$ as

$$\mathrm{DSV}(u,v,d)=\begin{cases}\left|d-D(u,v)\right|, & (u,v)\ \text{is stable},\\ 0, & (u,v)\ \text{is unstable}.\end{cases} \tag{19}$$

Then an edge-preserving filter such as GIF is applied to smooth the DSV at each disparity level, and the unstable pixels are assigned a new disparity value which depends on the lowest value of the DSV.
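The two refinement steps can be sketched on one scanline as a consistency test followed by the construction of a new cost volume. The function names are hypothetical, and the specific volume used here (absolute distance to the current disparity for stable pixels, zero for unstable ones) is an assumption consistent with the description above; the subsequent edge-preserving smoothing is omitted.

```python
def cross_check(disp_left, disp_right):
    """Mark a left pixel as stable if the right map agrees with it."""
    stable = []
    for u, d in enumerate(disp_left):
        v = u - d  # corresponding position in the right disparity map
        stable.append(0 <= v < len(disp_right) and disp_right[v] == d)
    return stable


def refinement_volume(disp_left, stable, num_levels):
    """Build a new disparity space volume dsv[d][u] for the refinement step.

    Stable pixels are penalized by their distance to the current disparity;
    unstable pixels get zero at every level so neighbors can decide for them.
    """
    return [
        [abs(d - disp_left[u]) if stable[u] else 0 for u in range(len(disp_left))]
        for d in range(num_levels)
    ]


disp_left = [0, 1, 1, 2]
disp_right = [0, 1, 2, 9]
stable = cross_check(disp_left, disp_right)
dsv = refinement_volume(disp_left, stable, num_levels=3)
```

After smoothing such a volume with an edge-preserving filter, an unstable pixel inherits the disparity favored by the stable pixels that resemble it.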

Experimental Results
In this section, the performance of the proposed method is evaluated using the Middlebury stereo benchmark, which provides stereo images with known ground truth [27]. The experimental results are then compared with other local filtering methods which have recently been shown to be the best edge-preserving local stereo methods in terms of both speed and accuracy on the Middlebury benchmark website. Therefore, the comparison results will serve to demonstrate that the proposed method performs well among all local stereo correspondence algorithms. Moreover, this section analyzes the impacts of different parameter settings on the computational complexity and accuracy of the dense disparity maps.
The proposed method was run with constant parameter settings for all four testing images: $\{\alpha, T_1, T_2, H, \sigma_r, \sigma_s\} = \{0.1, 0.028, 0.08, 4, 0.08, 11\}$. To analyze and compare the quality of the stereo matching algorithms, a widely accepted quantitative performance evaluation criterion, the percentage of bad pixels (PBP), was introduced:

$$\mathrm{PBP}=\frac{1}{N}\sum_{(u,v)}\left(\left|d_C(u,v)-d_T(u,v)\right|>\delta\right)\times 100\%, \tag{20}$$

where $N$ is the total number of pixels, $d_C$ and $d_T$ are the computed depth mapping and the ground truth mapping, and $\delta$ is an absolute disparity error threshold. A value of $\delta = 1$ was chosen in these experiments because this setting is the same as in some previously published studies. Hence, a smaller PBP number means a better-performing algorithm.
PBP, which is considered the most representative measure of the quality of the results, is the preferred metric in this paper and is used throughout to make comparison easier.
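The bad-pixel percentage is straightforward to compute once the computed and ground-truth disparities are available as flat sequences. A minimal sketch, with the hypothetical name `percentage_bad_pixels` and the threshold of one pixel used in the experiments:

```python
def percentage_bad_pixels(computed, truth, delta=1.0):
    """Percentage of pixels whose disparity error exceeds `delta`."""
    bad = sum(1 for c, t in zip(computed, truth) if abs(c - t) > delta)
    return 100.0 * bad / len(truth)


computed = [1, 2, 3, 10]
truth = [1, 3, 5, 10]
pbp = percentage_bad_pixels(computed, truth)
```

Here the absolute errors are 0, 1, 2, and 0, so only the third pixel exceeds the threshold. Restricting the two sequences to nonoccluded pixels or to pixels near depth discontinuities would give the per-region variants of the measure.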

Accuracy of the Dense Disparity Maps.
The GIF-based cost-aggregation method and the proposed hierarchical clustering method were first used to aggregate matching costs. Then winner-takes-all and refinement operations were used to obtain the dense disparity maps. As shown in Figure 3, both methods yielded accurate results at the depth discontinuities as well as in the smooth regions of the test images.
The corresponding quantitative results are presented in Table 1, which records PBP in the nonoccluded, depth-discontinuous, and overall regions of the "Tsukuba," "Venus," "Cones," and "Teddy" images. The rightmost column of the table contains the average errors (AE), which were calculated using the average PBP over all twelve columns. As can be seen from the fourth and fifth rows of Table 1, the AE values obtained using the GIF (GIN) and the proposed method (HCN) without the refinement procedure were 8.78% and 7.77%, respectively. The first two rows show the errors obtained using the GIF (GIR) and the proposed method (HCR) with the refinement procedure; the AE values were 5.85% and 5.67%, respectively. This shows that the proposed method outperformed the GIF for filtering matching costs during cost aggregation. As expected, the proposed refinement method is suitable for removing mismatches, and the improvement is evident. In particular, as can be seen in Table 1, HCR can also outperform the original ASW algorithm [6] and the fast DCBG technique [20]. In the authors' opinion, the method proposed in this research may well achieve the topmost position among local stereo correspondence algorithms.
To verify algorithm stability, the performance of the GIF and the proposed methods was compared on an additional 27 Middlebury stereo images [27]. As described above, the PBP values with a disparity error larger than one pixel in all the regions were used to build the average of this measure over all 27 test images. The corresponding quantitative evaluation is summarized in Table 2. Note that both methods may be less accurate in large untextured regions such as those in the Midd1 and Monopoly pairs. Errors in untextured regions are due mostly to mismatches and will cause inconsistencies between the left and right disparity maps. However, the proposed HC method is still the winner and slightly outperforms the GIF technique.
In a comparison of HCN and HCR, the proposed refinement method is expected to perform well.

Computational Complexity.
We have implemented two versions of the local matching filter described in this paper and tested them on the four benchmark images. These implementations are a CPU version written in MATLAB and a GPU version written in CUDA. The performance numbers reported in this paper were measured on a 2.99-GHz Intel Core 2 Duo processor with 3.25 GB of memory and on a GPU (GeForce 9500GT) with 512 MB of memory. Note that all of the algorithms were run on the same testing platform to achieve a fair comparison.
As demonstrated by the results shown in Table 3, the proposed method is slightly faster than GIF for the testing images on both the CPU and GPU platforms. The reason for this is that the total complexity of GIF on three-dimensional color images for disparity maps is O(17N) [21], while that of the proposed method is O(15N), with a tree height $H = 4$ and therefore a constant $K = 2^H - 1 = 15$. Moreover, the proposed method has the same linear time requirement as the GIF, regardless of the filter kernel size and the intensity range.
All the run times increase with the dimensional size of the disparity maps, where the "Tsukuba," "Venus," "Cones," and "Teddy" disparity maps are 384 × 288 × 15, 434 × 383 × 19, 450 × 375 × 59, and 450 × 375 × 59, respectively. Our CPU implementation processes a 1-megapixel image in about 16 to 20 seconds, which is time consuming. Due to the simple and parallel operations used by our approach, our filter achieves significant performance gains on the GPU platform. The total time required for filtering a 1-megapixel image ranges from 0.1 to 0.2 seconds. This represents a speedup of 80 to 200 times compared to our CPU implementation.
Consequently, the proposed approach seems to perform slightly better than others in terms of accuracy and computational efficiency.

Robust Illumination-Independent Behavior.
All of the stereo benchmark images used in Section 4.1 were acquired under normal lighting conditions, and there are no significant variations of luminosity between the two images of a stereo pair. However, this condition is often not valid for a real environment [29,30]. Due to illumination effects, the color value is not always reliable for stereo matching. Therefore, it has been suggested to supplement the color term in (1) with a constraint on the gradient, which is invariant to additive illumination changes.
In order to confirm that the proposed method is robust when applied to illumination-variant stereo pairs, PBP results for the altered Tsukuba images with different weight coefficients $\alpha$ are presented in Table 4. Following Nalpantidis [29], each stereo pair consisted of the left image of the Tsukuba image set and one of a number of different versions of the right image whose luminosity alteration ranged from −25% to +25% in 5% increments.
It can be seen from Table 4 that the algorithm based only on color value ($\alpha = 1$) leads to many false matches under nonuniform lighting, while the quality of the algorithm that relies only on gradient ($\alpha = 0$) remains almost the same for every tested lighting condition. Moreover, the algorithm combining color with gradient value produces the best results for ideal lighting conditions (a luminosity alteration of 0%). As a result, the quality of our proposed method (with $\alpha = 0.1$) is less affected by differences in the lighting conditions while still providing suitable accuracy.
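The different behavior of the color and gradient terms under a lighting change can be illustrated with a purely additive brightness offset, which is a simplification of the luminosity alterations used in the experiment. The function names and the forward-difference gradient below are assumptions made for this sketch.

```python
def forward_gradient(row):
    """Forward-difference gradient along a scanline."""
    return [row[u + 1] - row[u] for u in range(len(row) - 1)]


def range_and_gradient_terms(left, right, t1=0.028, t2=0.08):
    """Truncated range and gradient differences between aligned scanlines."""
    range_term = [min(abs(a - b), t1) for a, b in zip(left, right)]
    grad_l, grad_r = forward_gradient(left), forward_gradient(right)
    grad_term = [min(abs(a - b), t2) for a, b in zip(grad_l, grad_r)]
    return range_term, grad_term


left = [0.2, 0.4, 0.3, 0.7]
right = [value + 0.125 for value in left]  # same scene, brighter by a constant
range_term, grad_term = range_and_gradient_terms(left, right)
```

Although the two scanlines show the same structure, the range term saturates at its truncation value everywhere, whereas the gradient term stays at zero up to rounding. This is why a small weight on the color term keeps the combined cost usable when the two cameras see different brightness levels.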

Selection of the Tree Height.
The first step is to discuss how tree height affects the performance of the proposed method. "Tsukuba" was chosen as the test image, and the GPU run time and PBP of the disparity maps were recorded with increasing tree height, as shown in Table 5. Note that the spatial $\sigma_s = 11$ and the range $\sigma_r = 0.08$ are constants. It is clear from the second column that the computation time of the proposed algorithm increases greatly with increasing tree height. Because the number of sampling points $K = 2^H - 1$ increases with tree height $H$, the greater number of summation operations in (8) for the sampling points is time consuming. In contrast, the accuracy of the disparity maps for nonoccluded, depth-discontinuous, and overall regions, which is reported in the last three columns, is dramatically improved with increasing height. The reason for this, as mentioned before, is that increasing the number of sampling points reduces the error between the continuous integration (4) and the discrete summation (5).
Figure 4 shows the first three levels of weights (16) for the test image corresponding to the sampling tree (Figure 1). Similar pixels with relatively large weights are shown in white, while black denotes dissimilar pixel areas with very small weights. Moving down this tree, the large weights are gradually assigned to edge regions, and image integrity is guaranteed. For example, the missing information from the edge regions of the lamp in $\theta_{2-}$, which is black with very small weights, can be compensated by more detail from the white regions with large weights in $\theta_{3-+}$ and $\theta_{3--}$. However, accuracy improves only slightly or even becomes worse between $H = 5$ and $H = 6$ because the spatial and range parameters are constant and unsuitable for the tree height. To confirm this cause, $H = 6$ is kept constant, and the PBP for "non" improves, with values of 1.98 and 1.95 when $\sigma_s = 11$, $\sigma_r = 0.09$ and $\sigma_s = 12$, $\sigma_r = 0.08$, respectively. Therefore, the spatial and range parameters also affect performance, which will be further discussed below.

Influences of $\sigma_s$ and $\sigma_r$.
$\sigma_s$ and $\sigma_r$ are two standard deviations used to adjust the spatial similarity and the range similarity, respectively. The spatial spread $\sigma_s$ is chosen based on the desired amount of low-pass filtering. A large $\sigma_s$ creates more blurring, meaning that more high-frequency components are removed and the image becomes obviously blurred. Similarly, the range spread $\sigma_r$ is set to achieve the desired amount of combination of pixel range values. Generally speaking, pixels with range differences less than $\sigma_r$ are mixed together, and those with differences greater than $\sigma_r$ are removed [13].
The results obtained from varying $\sigma_s$, $\sigma_r$ in (8) are equivalent to adjusting the spatial and range spread for a bilateral filter. However, the influence of changes in $\sigma_s$, $\sigma_r$ on the clustering weights (16) is also significant. To analyze the error source qualitatively, the following two propositions are defined.
Proposition 1. More sampling points will be needed for good accuracy when the range spread $\sigma_r$ is small or the spatial spread $\sigma_s$ is large. If the height is constant, the matching error would suffer from the edge-losing effect (ELE).
Proof. Using (16), the weight of each pixel is reduced when the value of $\sigma_r$ is small with respect to the overall range of values in the image or when the set of sampling points $\eta$ is dissimilar to the image value $I$ because $\eta$ appears to be hazy due to a larger $\sigma_s$ in (9) or (15). Moreover, each $\eta$ covers a limited sampling region, which means that, in turn, more $\eta$ values are needed to adapt to the signal [7].
Proposition 2. The filter weights (6) in the proposed method behave more like a low-pass filter when the range spread $\sigma_r$ is large or the spatial spread $\sigma_s$ is small. The matching error would be caused by the edge-smoothing effect (ESE).
Proof. Using (16), the weights of all pixels are increased when the value of $\sigma_r$ is large with respect to the overall range of values in the image or when the set of sampling points $\eta$ is similar to the image value $I$ because $\eta$ appears to be less hazy due to a smaller $\sigma_s$ in (9) or (15). Therefore, all pixel values in any given neighborhood have approximately the same weight from range filtering for (6), and the resulting filter approximates a standard Gaussian filter [13].
"Tsukuba" was chosen as the test image. A fast way to determine the best choice of $\sigma_s$ and $\sigma_r$ is to use the peak signal-to-noise ratio (PSNR) [31] of the filtering results:

$$\mathrm{PSNR}=10\log_{10}\frac{255^2}{\frac{1}{M\times N}\sum_{u=1}^{M}\sum_{v=1}^{N}\left(f(u,v)-g(u,v)\right)^2}, \tag{21}$$

where $M \times N$ is the image size, $f(\cdot)$ is the local filtering result (as in Figures 3(a)-3(d)), and $g(\cdot)$ is the ground truth (as in Figure 3(e)). Table 6 shows the PSNR distributions with $\sigma_s \in (1, 90)$, $\sigma_r \in (0.01, 0.9)$. The following can be determined.
(1) The PSNR decreases as $\sigma_s$ or $\sigma_r$ becomes smaller when $\sigma_s \in (1, 10)$ and $\sigma_r \in (0.01, 0.1)$. The reason for the former is the ESE of Proposition 2: the proposed method behaves more like a low-pass filter when $\sigma_s$ decreases. The reason for the latter is the ELE of Proposition 1: accuracy is reduced because a limited number of sampling points leaves too little information in the filtering results around the edge regions.
(2) The PSNR decreases with increasing $\sigma_s$ or $\sigma_r$ when $\sigma_s \in (10, 90)$ and $\sigma_r \in (0.1, 0.9)$. This follows from Propositions 1 and 2: the accuracy is reduced due to ELE with large $\sigma_s$ in a constant-height tree and due to ESE with large $\sigma_r$ for each sampling point.
From the two findings, it can be confirmed that the optimal values for $\sigma_s$ and $\sigma_r$ are approximately 10 and 0.1, respectively, which are shown using bold italic font in Table 6.
The PBP distributions for the "non," "all," and "disc" disparity maps were then recorded with $H = 4$, but with $\sigma_s$ and $\sigma_r$ varying according to $\sigma_s \in (1, 20)$, $\sigma_r \in (0.01, 0.2)$, as shown in Figure 5. Results derived from Figure 5 can be summarized as follows.
(1) All the PBP values behave like the PSNR results; the PBP values increase as $\sigma_s$ or $\sigma_r$ becomes smaller. They decrease with increasing $\sigma_s$ or $\sigma_r$, but only up to a certain point, which constitutes the best parameter setting. After that point, the PBP values gradually increase.
(2) Figure 5(c) differs more obviously from the first two PBP plots because it was calculated only from the edge regions. The accuracy reductions caused by ESE or ELE in the nonoccluded and overall regions are smaller than those in the depth-discontinuous regions.
Consequently, accuracy was reduced when $\sigma_s$ and $\sigma_r$ became too small or too large within a constant-height tree. In terms of computational cost, the range component depends linearly ($O(N)$) on the image, regardless of the filter kernel for each sampling layer. To this end, the authors suggest that the tree height be determined first according to the time consumption and then that the PSNR of the filtering results be used to determine the general choice of $\sigma_s$ and $\sigma_r$.
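The PSNR criterion used for this parameter search can be sketched as follows. The name `psnr` and the default peak value of 255 are assumptions; the peak should be adapted to however the disparity maps are scaled.

```python
import math


def psnr(result, truth, peak=255.0):
    """Peak signal-to-noise ratio between two equally sized 2D maps."""
    count = sum(len(row) for row in truth)
    squared_error = sum(
        (a - b) ** 2
        for row_r, row_t in zip(result, truth)
        for a, b in zip(row_r, row_t)
    )
    mse = squared_error / count
    # Identical maps have zero error; report that as infinite PSNR.
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


truth = [[0, 0], [0, 0]]
result = [[255, 0], [0, 0]]
score = psnr(result, truth)
```

In this toy case one of four pixels is off by the full peak value, so the mean squared error is a quarter of the squared peak. Evaluating such a score over a grid of the two standard deviations and keeping the maximum reproduces the kind of search summarized in Table 6.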

Conclusions and Future Work
In this paper, a new local solution for fast, high-quality dense stereo correspondence has been proposed; it focuses on a matching cost filtering method based on a high-performance hierarchical clustering algorithm. Instead of filtering the matching costs using an edge-preserving smoothing operator as in the popular bilateral filter, the cost aggregation model was adjusted to compute the matching responses for all image pixels at a set of sampling points generated using a clustering method. The computational complexity of this filtering is linear both in the number of image pixels and the number of clustering classes. The experimental comparison has demonstrated that the proposed method outperforms the GIF-based matching algorithm, which is one of the best local methods on the Middlebury benchmark in terms of both speed and accuracy. Moreover, the results of performance tests, which provide effective guidelines for parameter selection, indicate that good accuracy is highly dependent on the weight coefficient, the height of the hierarchical binary tree, and the spatial and range standard deviations. As a result, the proposed approach is capable of high-speed processing and offers high-quality disparity maps for dense stereo correspondence.
The experimental results show that both the GI and HC filtering methods produce some erroneous disparity values due to the lack of texture, which is a traditional challenge for stereo algorithms. The reason is that a pixel's disparity value is obtained by selecting the point with the best matching score, independently of the disparity assignments of neighboring pixels. Hence, many of the disparity values in low-texture areas may be incorrect when a local matching method is used. To overcome this bottleneck, the authors plan to make the algorithm capable of handling large untextured regions, which remains an active area for future research [32].

Figure 1: Hierarchical binary tree generated by the clustering algorithm.

Figure 2: The first three levels of sampling points $\eta_k$ after clustering the range values.

Figure 3: Experimental results on the Middlebury benchmark. Dense disparity maps from the first to the last row are the "Tsukuba," "Venus," "Cones," and "Teddy" images. ((a) and (b)) The results of GIF and the proposed method without refinement procedure. ((c) and (d)) The disparity maps obtained using the GIF and the proposed method with refinement procedure. (e) Ground truth.

Figure 4: The first three levels of weight (16) distribution.

Table 1: Quantitative evaluation for the Middlebury image pairs.

Table 2: Evaluation for stereo methods on all 27 Middlebury stereo pairs.

Table 3: Run time comparison of the GIF and the proposed method in seconds.

Table 4: Evaluation on illumination-variant stereo pairs with different weight coefficients.

Table 5: Run time and PBP of the disparity maps with increasing tree height.

Table 6: PSNR with different parameters $\sigma_s$ and $\sigma_r$.