Local Stereo Matching Using Adaptive Local Segmentation

We propose a new dense local stereo matching framework for gray-level images based on an adaptive local segmentation using a dynamic threshold. We define a new validity domain of the fronto-parallel assumption based on the local intensity variations in the 4-neighborhood of the matching pixel. The preprocessing step smoothes low-textured areas and sharpens texture edges, whereas the postprocessing step detects and recovers occluded and unreliable disparities. The algorithm achieves high stereo reconstruction quality in regions with uniform intensities as well as in textured regions. The algorithm is robust against local radiometric differences and successfully recovers disparities around object edges, disparities of thin objects, and disparities in occluded regions. Moreover, our algorithm intrinsically prevents errors caused by occlusion from propagating into nonoccluded regions. It has only a small number of parameters. The performance of our algorithm is evaluated on the Middlebury test bed stereo images. It ranks highly on the evaluation list, outperforming many local and global stereo algorithms that use color images. Among the local algorithms relying on the fronto-parallel assumption, our algorithm is the best ranked. We also demonstrate that our algorithm works well on practical examples, such as disparity estimation for a tomato seedling and a 3D reconstruction of a face.


Introduction
Stereo matching has been a popular topic in computer vision for more than three decades, ever since one of the first papers appeared in 1979 [1]. Stereo images are two images of the same scene taken from different viewpoints. Dense stereo matching is a correspondence problem with the aim of finding, for each pixel in one image, the corresponding pixel in the other image. A map of all pixel displacements in an image is a disparity map. To solve the stereo correspondence problem, it is common to introduce constraints and assumptions which regularize the stereo correspondence problem.
The most common constraints and assumptions for stereo matching are the epipolar constraint, the constant brightness or Lambertian assumption, the uniqueness constraint, the smoothness constraint, the visibility constraint, and the ordering constraint [2][3][4]. Stereo correspondence algorithms belong to one of two major groups, local or global, depending on whether the constraints are applied to a small local region or propagated throughout the whole image. Local stereo methods estimate the correspondence using a local support region or window [5, 6]. Local algorithms generally rely on an approximation of the smoothness constraint, assuming that all pixels within the matching region have the same disparity. This approximation of the smoothness constraint is known as the frontoparallel assumption. However, the frontoparallel assumption is not valid for highly curved surfaces or around disparity discontinuities. Global stereo methods consider stereo matching as a labeling problem where the pixels of the reference image are nodes and the estimated disparities are labels. An energy functional embeds the matching assumptions through its data, smoothness, and occlusion terms and propagates them along the scan line or through the whole image. The labeling problem is solved by minimization of the energy functional, using dynamic programming, graph cuts, or belief propagation [7][8][9]. A recent review of both local and global stereo vision algorithms can be found in [10].
Algorithms based on rectangular window matching give an accurate disparity estimation provided the majority of the window pixels belong to the same smooth object surface with only a slight curvature or inclination relative to the image plane. In all other cases, window-based matching produces an incorrect disparity map: the discontinuities are smoothed, and the disparities of high-textured surfaces are propagated into low-textured areas [11]. Another restriction of window-based matching is the size of the objects whose disparity is to be determined. Whether the disparity of a narrow object can be correctly estimated depends mostly on the similarity between the occluded background, the visible background, and the object [12]. Algorithms which use suitably shaped matching areas for cost aggregation achieve a more accurate disparity estimation [13][14][15][16][17][18]. The matching region is selected using pixels within certain fixed distances in RGB or CIELab color space and/or Euclidean space.
To alleviate the frontoparallel assumption, some approaches allow the matching area to lie on an inclined plane, as in [19, 20]. An alternative to the idea that properly shaped areas for cost aggregation yield more accurate matching results is to allocate different weights to pixels in the cost aggregation step. In [21], pixels closer in the color space and spatially closer to the central pixel are given proportionally more significance, whereas in [22], an additional assumption of connectivity plays a role during weight assignment.
Our stereo algorithm belongs to the group of local stereo algorithms. Within the stereo framework, we rely on some standard and some modified matching constraints and assumptions. We use the epipolar constraint to convert stereo correspondence into a one-dimensional problem. However, we modify the interpretation of the frontoparallel assumption and the Lambertian constraint. A novel interpretation of the frontoparallel assumption is based on local intensity variations. By adaptive local segmentation in both matching windows, we constrain the frontoparallel assumption only to the intersection of the central matching segments of the initial rectangular window. This mechanism prevents the propagation of matching errors caused by occlusion and enables an accurate disparity estimation for narrow objects. The algorithm correctly estimates the disparities of textured as well as textureless surfaces, disparities around depth discontinuities, and disparities of small as well as large objects, independently of the initial window size. We apply the Lambertian constraint to local intensity differences and not to the original gray values of the pixels in the segment. In the postprocessing step, we apply the occlusion constraint without imposing the ordering constraint, which enables successful disparity estimation for narrow objects. Also, our stereo algorithm is suitable for a fast real-time implementation, because it is a local algorithm for gray-valued images which uses a local segmentation and only a small subset of window pixels for the cost calculation.
Our main contribution is the introduction of the relationship between the frontoparallel assumption and the local intensity variation, and its application to stereo matching. In addition, we introduce a preprocessing step that smoothes low-textured areas and sharpens texture edges, making the image more favorable for a proper local adaptive segmentation.
The paper is organized as follows: in Section 2, we explain our stereo matching framework: the preprocessing step, the adaptive local segmentation, the matching region selection, the stereo matching, and the postprocessing step; in Section 3, we show and discuss the results of our algorithm on different stereo images; in Section 4, we draw conclusions.

Stereo Algorithm
Our algorithm consists of three steps: a preprocessing step, a matching step, and a postprocessing step. The flow chart of the algorithm is shown in Figure 1. Input to the algorithm is a pair of rectified stereo images, I_l and I_r, where one of them, for instance I_l, is considered the reference image. For each pixel in the reference image, we perform matching along the epipolar line for each integer-valued disparity within the disparity range. First, the input images are preprocessed, as explained in Section 2.1; the preprocessing step is applied to each image individually. Next, we calculate the local intensity variation maps for the preprocessed images and use them to determine the dynamic threshold for adaptive local segmentation, as elaborated in Section 2.2. Further, the stereo matching comprises the final region selection from the segments, the matching cost calculation for all disparities from the disparity range, and the disparity estimation by a modification of the winner-take-all method; see Section 2.3. The result of the matching is two disparity maps, D_LR and D_RL, corresponding to the left and right images of the stereo pair. Finally, the postprocessing step calculates the final disparity map corresponding to the reference image, as described in Section 2.4.

2.1. Preprocessing. We apply a nonlinear intensity transformation to the input images in order to make them more suitable for adaptive local segmentation. The presence of Gaussian noise and sampling errors in an image can produce erroneous segments for matching. The noise is dominant in low-textured and uniform regions, while the sampling errors are pronounced in high-textured image regions. The sampling effects can be tackled by choosing a cost measure insensitive to sampling, as in [23], or by interpolating the cost function, as in [24]. We handle these problems differently, within the preprocessing step. The applied transformation suppresses the noise in low-textured regions while simultaneously suppressing the sampling effects in high-textured regions.
The transformation is based on subpixel samples interpolated by the bicubic transform in the 4-neighborhood, consistently replacing the central pixel value by the maximum or the minimum value of the set, depending on the relation between the mean and the median of the set. For the observed pixel at position (x, y), we form a set v from its intensity and the intensities of the horizontally and vertically interpolated images at the subpixel offsets δ_i. The intensity transformation is performed by replacing the intensity I(x, y) with the new value; all intensity values are corrected in the same manner. If the pixel intensity differs significantly from its four neighbors, as in high-textured regions, it is replaced by the maximum value of the interpolated subpixel set v, resulting in a sharpening effect. On the other hand, in low-textured regions the intensity change is small, and systematically replacing the initial intensity with the minimum value of the interpolated subpixel set v produces a favorable denoising effect. These effects originate from the image resampling by bicubic interpolation, because bicubic interpolation exhibits overshoots at locations with large differences between adjacent pixels; see Chapter 4.4 in [25] and Chapter 6.6 in [26]. These favorable effects are lacking if the interpolation method is linear.
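As an illustration, the transform can be sketched on a single image row. The Catmull-Rom half-pixel kernel, the use of only two interpolated samples per pixel, and the mean-versus-median rule used below are assumptions standing in for the paper's exact set v and subpixel offsets δ_i:

```python
from statistics import mean, median

def halfpixel(p0, p1, p2, p3):
    # Catmull-Rom cubic interpolation at the midpoint between p1 and p2;
    # it can overshoot the range of its inputs, which drives the sharpening.
    return (-p0 + 9 * p1 + 9 * p2 - p3) / 16.0

def preprocess_row(row):
    """Sketch of the nonlinear transform on one image row.

    For each interior pixel, collect the pixel and its two interpolated
    half-pixel neighbours, then replace it with the max or min of the set
    depending on the mean/median relation (the exact rule is an assumption).
    """
    out = list(row)
    for x in range(2, len(row) - 2):
        left = halfpixel(row[x - 2], row[x - 1], row[x], row[x + 1])
        right = halfpixel(row[x - 1], row[x], row[x + 1], row[x + 2])
        v = [row[x], left, right]
        # Skewed-up set (mean above median) suggests a sharp local edge:
        # take the maximum; otherwise denoise with the minimum.
        out[x] = max(v) if mean(v) > median(v) else min(v)
    return out
```

In a flat region all samples coincide, so the pixel is left unchanged; near a step edge the overshooting half-pixel samples widen the set v and the max/min replacement steepens the edge.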
We illustrate the effect of the preprocessing step for an image from a stereo pair from the Middlebury evaluation database in Figure 2. The preprocessing step modifies regions with high intensity variations and results in a sharper image. Further, in Section 3, we show the influence of this step on the overall algorithm score.

Adaptive Local Segmentation. Adaptive local segmentation establishes a new relationship between the local intensity variation and the frontoparallel assumption applied to stereo matching. It selects a central subset of pixels from a large rectangular window for which we assume that the frontoparallel assumption holds. The segment contains the central window pixel and the pixels, spatially connected to it, whose intensities lie within the dynamic threshold of the intensity of the central pixel. Starting from the segment, we form the final region selection for matching; see Section 2.3.
The idea behind the adaptive local segmentation is to prevent the matching region from containing pixels with significantly different disparities prior to actually estimating the disparity. We accomplish this by conveniently choosing the segmentation threshold based on the local texture. If the local texture is uniform, with local intensity variations caused only by Gaussian noise, we opt for a small threshold value. In this way, because the intensity variations are small, the segment will comprise the whole uniform region. We assume that these pixels originate from the smooth surface of one object and therefore that the frontoparallel assumption holds for the segment. On the other hand, if the window is textured, that is, the intensity variations are significantly larger than the noise level, it is not possible to distinguish, based only on the pixel intensities and prior to matching, whether the pixels originate from one textured object or from several different objects at different distances from the camera. In this case, relying on the high texture for an accurate matching result, it is good to select a small segment in order to ensure that the segment contains pixels from only one object and does not contain a depth discontinuity. Due to the high local intensity variations, this is achieved with a large threshold.
We introduce a local intensity variation measure in order to determine the level of local texture and, subsequently, the dynamic threshold. We define the local intensity variation measure as the sharpness of local edges in the 4-neighborhood of the central window pixel: the sharper the local edges, the larger the local intensity variation. We calculate the local intensity variation using the maximum of the first derivatives in the horizontal and vertical directions on the half-pixel interpolated image, benefiting again from the overshooting effect of the bicubic interpolation. The horizontal central difference for a pixel at position (x, y) in image I is Δh(x, y) = I(x + 1/2, y) − I(x − 1/2, y), where I(x − 1/2, y) and I(x + 1/2, y) are horizontal half-pixel shifts of image I to the left and to the right. The vertical central difference is Δv(x, y) = I(x, y + 1/2) − I(x, y − 1/2), where I(x, y − 1/2) and I(x, y + 1/2) are vertical half-pixel shifts of image I. We define the intensity variation measure as m(x, y) = max(|Δh(x, y)|, |Δv(x, y)|). We divide the local intensity variations into four ranges based on the preselected constant T and define a dynamic threshold T_d(x, y) for each range by a look-up table (6). Figure 3 shows a color-coded dynamic threshold map, or equivalently the local intensity variation ranges, for the left image of the Tsukuba stereo pair from the Middlebury stereo evaluation set [27]. The dynamic threshold T_d(x, y) defined by (6) for the reference pixel in the reference image is also used for the adaptive local segmentation in the nonreference image for all potentially corresponding pixels from the disparity range.
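A minimal sketch of the variation measure and the threshold selection follows. The Catmull-Rom half-pixel kernel mirrors the bicubic overshooting idea, but the four range boundaries and the threshold values returned by `dynamic_threshold` are illustrative assumptions, not the paper's actual look-up table (6):

```python
def halfpixel(p0, p1, p2, p3):
    # Catmull-Rom cubic interpolation at the midpoint between p1 and p2.
    return (-p0 + 9 * p1 + 9 * p2 - p3) / 16.0

def variation_measure(img, x, y):
    """m(x, y): the larger of the horizontal and vertical central
    differences taken on half-pixel samples (x, y must be at least
    2 pixels away from the image border)."""
    row = img[y]
    col = [img[j][x] for j in range(len(img))]
    dh = abs(halfpixel(row[x - 1], row[x], row[x + 1], row[x + 2])
             - halfpixel(row[x - 2], row[x - 1], row[x], row[x + 1]))
    dv = abs(halfpixel(col[y - 1], col[y], col[y + 1], col[y + 2])
             - halfpixel(col[y - 2], col[y - 1], col[y], col[y + 1]))
    return max(dh, dv)

def dynamic_threshold(m, T):
    """Map m into one of four ranges set by the constant T; the
    boundaries and returned values here are assumptions."""
    if m < T / 2:
        return T / 4      # near-uniform texture: small threshold
    if m < T:
        return T / 2
    if m < 2 * T:
        return T
    return 2 * T          # strong texture: large threshold
```

Note the inverted logic the paper describes: uniform regions get a small threshold (so the segment grows over the whole region), while strongly textured regions get a large threshold (so the segment stays small and inside one object).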
The adaptive local segmentation pseudocode for the reference pixel I_l(x, y) in the left image is given by Algorithm 1. The segmentation is performed for the reference and nonreference windows independently, using the same threshold T_d(x, y). Thus, in the W × W window, where W = 2w + 1, around the pixel at position (x, y) in the reference image, we declare that the pixel at position (i, j), where i, j = 1, ..., W, belongs to the segment if its gray value p_l(i, j) differs from the central pixel's gray value c_l = p_l(w + 1, w + 1) by less than the dynamic threshold T_d(x, y). The segment pixels in the nonreference window are chosen in a similar way using the same threshold T_d(x, y). The masks are then dilated, and the central 8-connected components in the dilated masks are selected. The final segments are defined by the binary W × W maps, B_l and B_r, with ones where the pixels belong to the segment. Dilation is performed with a 3 × 3 square structuring element to include additional neighboring pixels in the segments and to merge isolated but close selected pixels.
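Algorithm 1 can be sketched for a single W × W window as follows; the BFS stands in for the 8-connected component selection, and the step order (thresholding, dilation, connectivity) follows the algorithm box:

```python
from collections import deque

def adaptive_segment(window, T_d):
    """Sketch of Algorithm 1 for one W x W window (W odd): dynamic
    thresholding against the central gray value, 3 x 3 dilation, then
    keeping only the 8-connected component containing the centre."""
    W = len(window)
    w = W // 2
    c = window[w][w]
    # Step 1: dynamic thresholding against the central pixel value.
    mask = [[1 if abs(window[j][i] - c) < T_d else 0 for i in range(W)]
            for j in range(W)]
    # Step 2: dilation with a 3 x 3 square structuring element.
    dil = [[0] * W for _ in range(W)]
    for j in range(W):
        for i in range(W):
            if any(0 <= j + dj < W and 0 <= i + di < W and mask[j + dj][i + di]
                   for dj in (-1, 0, 1) for di in (-1, 0, 1)):
                dil[j][i] = 1
    # Step 3: impose 8-connectivity with the central pixel via BFS.
    seg = [[0] * W for _ in range(W)]
    seg[w][w] = 1
    q = deque([(w, w)])
    while q:
        j, i = q.popleft()
        for dj in (-1, 0, 1):
            for di in (-1, 0, 1):
                jj, ii = j + dj, i + di
                if 0 <= jj < W and 0 <= ii < W and dil[jj][ii] and not seg[jj][ii]:
                    seg[jj][ii] = 1
                    q.append((jj, ii))
    return seg
```

A flat window yields a full segment, while a window whose centre differs from all neighbours yields only the dilated 3 × 3 core around the centre.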

Stereo Correspondence.
The matching region is defined by the overlap of the adaptive local segments in the reference and nonreference windows. Thus, the matching region is defined by the binary map B, which has ones if and only if both binary maps, B_l and B_r, have ones at the same positions, as given in Algorithm 2.
We assume that the corresponding pixels have similar intensities and that differences exist only due to Gaussian noise with variance σ_n². One-dimensional vectors, z_l and z_r, are formed from the pixels of the left and right matching windows at the positions of ones within the binary map B. Besides the noise, differences between the vectors can occur due to different offsets and due to occlusion. To make the matching vectors insensitive to locally different offsets, we subtract the central pixel values c_l and c_r from the vectors z_l and z_r, as given by Algorithm 3. In this way, the intensity information is transformed from absolute intensities to differences of intensities with respect to the central window pixels. Further, we impose the Lambertian assumption on the pixels after the central pixel subtraction and not on the original pixel intensities. To prevent the influence of occlusion on the matching, we eliminate occlusion outliers by keeping only the vector coordinates which differ by less than the threshold T, as given by Algorithm 4.
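Algorithms 2-4 can be sketched together: intersect the two segment maps, subtract the central intensities, and drop outlier pairs. The function name and the list-based representation are illustrative:

```python
def matching_vectors(win_l, win_r, B_l, B_r, T):
    """Sketch of Algorithms 2-4: overlap the two segment maps, subtract
    the central intensities (offset neutralisation), and drop pixel pairs
    whose difference reaches the threshold T (occlusion outliers)."""
    W = len(B_l)
    w = W // 2
    c_l, c_r = win_l[w][w], win_r[w][w]
    z_l, z_r = [], []
    for j in range(W):
        for i in range(W):
            if B_l[j][i] and B_r[j][i]:           # Algorithm 2: overlap map B
                z_l.append(win_l[j][i] - c_l)      # Algorithm 3: offset
                z_r.append(win_r[j][i] - c_r)      # neutralisation
    # Algorithm 4: eliminate occlusion outliers.
    kept = [(a, b) for a, b in zip(z_l, z_r) if abs(a - b) < T]
    return [a for a, _ in kept], [b for _, b in kept]
```

Subtracting c_l and c_r makes a constant radiometric offset between the two windows vanish, which is exactly the insensitivity the offset neutralization targets.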
We calculate the matching cost using the sum of squared differences (SSDs) [7, 28]. To compare costs for vectors z_l and z_r of different lengths at different disparities, we introduce the normalized SSD

C_nSSD(d) = (1 / (N_p σ_n²)) Σ_{i=1}^{N_p} (z_l(i) − z_r(i))²,   (7)

where N_p is the length of the vectors z_l and z_r for disparity d.
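A direct sketch of the normalized SSD cost, with the constant noise-variance scaling mentioned for (7) in Section 3:

```python
def normalized_ssd(z_l, z_r, sigma2=1.0):
    """Normalised SSD: dividing by the number of participating pixels N_p
    makes costs comparable across disparities whose matching regions
    differ in size; sigma2 is the constant noise-variance scaling."""
    n = len(z_l)
    if n == 0:
        return float("inf")   # no overlap: the disparity cannot be scored
    return sum((a - b) ** 2 for a, b in zip(z_l, z_r)) / (n * sigma2)
```

Without the division by N_p, a disparity with a larger overlap region would accumulate a larger raw SSD and be unfairly penalized.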
The winner-take-all (WTA) method selects the disparity with the minimal cost for the observed reference pixel. In our algorithm, besides the cost, the number of pixels participating in the cost calculation is also an indication of a correspondence. This ordinal measure cannot be used directly in the disparity estimation, because it is not always a reliable indication of the correspondence, as in the case of occlusion. If the number of pixels used in the cost calculation is very low, it may be due to occlusion. However, a reliable match has a substantial ordinal support.
We combine the cost and the number of participating pixels in the disparity estimation and introduce a hybrid WTA: we consider only disparities supported by a sufficient number of pixels as potential candidates for a disparity estimate. Thus, the final disparity estimate is chosen from a subset of all possible disparities from the disparity range. We term these candidates the reliable disparity candidates [13, 29].
The reliable disparity candidates are supported by at least N_s = K_p · max{N_p^{x,y}} pixels, as defined together with the disparity estimate in (8). The final result of the hybrid WTA is the disparity map. We calculate two disparity maps: D_LR, with the left image I_l as the reference, and D_RL, with the right image I_r as the reference.
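The hybrid WTA can be sketched over precomputed per-disparity costs and pixel counts; the strict inequality on the count follows the candidate condition N_p(d_i) > N_s of (8):

```python
def hybrid_wta(costs, counts, K_p=0.5):
    """Hybrid winner-take-all: only disparities supported by more than
    N_s = K_p * max(counts) pixels are reliable candidates; among those,
    the minimum-cost disparity index is returned."""
    N_s = K_p * max(counts)
    best_d, best_c = None, float("inf")
    for d, (c, n) in enumerate(zip(costs, counts)):
        if n > N_s and c < best_c:   # reliable candidate with lower cost
            best_d, best_c = d, c
    return best_d
```

A disparity with a tiny cost but almost no participating pixels (a typical occlusion artifact) is thereby excluded from the minimization.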

Postprocessing.
In the postprocessing, we detect disparity errors and correct them. There are areas of incorrect disparity values caused by low-textured areas larger than the initial window. There are isolated disparity errors with disparities significantly different from the neighboring disparities, so-called outliers, caused by isolated pixels or groups of several pixels when the adaptive local segmentation did not result in a sufficiently large segment due to high local intensity variation. Also, there are disparity errors caused by occlusion. Although the matching procedure is the same for occluded and nonoccluded pixels, our stereo matching algorithm does not propagate errors caused by occlusion, because the boundaries of objects are taken into account by both the adaptive local segmentation and the final matching region selection. However, occluded pixels do not have corresponding pixels, and the estimated disparities for the occluded pixels are incorrect.
The postprocessing consists of several steps including median filtering of the initial disparity maps, disparity refinement of the individual disparity maps, consistency check, and propagation of the reliable disparities.
First, we apply an L × L median filter to both disparity maps, D_LR and D_RL, to eliminate disparity outliers. Second, we refine the filtered disparity maps individually, in an iterative procedure, to correct low-textured areas with erroneous disparities. The refinement step propagates disparities by histogram voting to regions with close intensities, defined by the look-up table given in (10), across the whole image, as given in (11) and illustrated in the propagation scheme in Figure 4.

Table 1: The 8 radial propagation directions, with columns Direction, x_tmp, y_tmp, and Condition.

Algorithm 4 (elimination of the outliers):
N_p is the length of the initial vectors z_l and z_r
for i = 1 to N_p do
  if |z_l(i) − z_r(i)| ≥ T then
    remove z_l(i) and z_r(i)
  end if
end for
N_p is the length of the final vectors z_l and z_r

Some notions similar to this approach appear separately in the literature [18, 30], and we were inspired by them. In [30], the cost aggregation is done along 16 radial directions in disparity space, while in [18], histogram voting is used within the segment for disparity refinement. We refine our disparity maps by histogram voting, accumulating disparities along 8 radial directions across the whole disparity map, with a constraint on the maximum allowed intensity difference with the pixel being refined. The maximum intensity difference is defined by a dynamic threshold T_p, with the same logic as in the local intensity variation measure of Section 2.2, with the difference that here we distinguish three ranges of intensity differences. Thus, the histogram is formed using the disparities of pixels with close intensities along the 8 radial directions; see Figure 4 and Table 1. A pixel is close in intensity, and its disparity is taken into account in forming the histogram, if its intensity lies within the threshold T_p of the intensity of the pixel at the observed position (x, y). The threshold T_p(x, y) is selected based on a look-up table (10). The histogram H, with the number of bins equal to the number of disparities in the disparity range, is formed by counting the disparities along the 8 radial directions for pixels whose intensity is within the threshold T_p(x, y):

H(d(x_tmp, y_tmp)) = H(d(x_tmp, y_tmp)) + 1  if  |I(x_tmp, y_tmp) − I(x, y)| < T_p(x, y),   (11)

where x_tmp and y_tmp are given by Table 1.
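The histogram voting along the 8 radial directions can be sketched as follows. Whether a ray stops at the first pixel violating the intensity constraint is not fully specified above, so in this sketch every qualifying pixel on a ray votes (an assumption):

```python
def refine_by_voting(D, I, x, y, T_p, d_range):
    """Simplified sketch of the histogram voting: walk along the 8 radial
    directions from (x, y), let every pixel whose intensity lies within
    T_p of I[y][x] vote with its disparity, and return the disparity with
    the largest support."""
    hist = [0] * d_range
    H, W = len(D), len(D[0])
    for dy, dx in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        j, i = y + dy, x + dx
        while 0 <= j < H and 0 <= i < W:
            if abs(I[j][i] - I[y][x]) < T_p:   # intensity-closeness vote
                hist[D[j][i]] += 1
            j, i = j + dy, i + dx
    return max(range(d_range), key=hist.__getitem__)
```

The intensity constraint keeps the votes inside what is likely the same surface, so an isolated wrong disparity is outvoted by its similar-intensity neighbours.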
We calculate the disparity d_h as the disparity of the normalized histogram maximum (12) and accept it only if its normalized support exceeds the significance threshold α, 0 ≤ α < 1 (13)-(14). The steps given by (11), (12), (13), and (14) are repeated iteratively until there are no more updates to the disparities in the map. Next, we detect occluded disparities by a consistency check between the two disparity maps: the disparity D_LR(x, y) is consistent if it agrees with D_RL at the corresponding position in the right disparity map, D_LR(x, y) = D_RL(x − D_LR(x, y), y) (15). If the condition in (15) is not satisfied for the disparity D_LR(x, y), we declare it inconsistent and eliminate it from the disparity map. The missing disparities are filled in by an iterative refinement procedure similar to the previously applied disparity propagation by histogram voting. In the iterative step to fill in the inconsistent disparities, we use the threshold look-up table (10), as in the disparity refinement step. We calculate the histogram of the consistent disparities with close intensities along the 8 radial directions, as given by (11) and (12). The missing disparity is filled in with the disparity d_h with the largest support in the histogram, provided that the histogram is not empty. The remaining unfilled inconsistent disparities we fill in with the disparity of the nearest neighbor with a known disparity and the smallest intensity difference. As a last step in the postprocessing, we apply an L × L median filter to obtain the final disparity map.
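The left-right consistency check of (15) can be sketched as follows; the sign convention x_r = x − d for the corresponding right-image column and the tolerance of one disparity level are assumptions:

```python
def consistency_check(D_LR, D_RL, tol=1):
    """Left-right consistency: a disparity in D_LR survives only if the
    corresponding pixel in D_RL maps back to (nearly) the same disparity.
    Inconsistent entries, typically occlusions, are marked None for the
    later fill-in step."""
    H, W = len(D_LR), len(D_LR[0])
    out = [[None] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            d = D_LR[y][x]
            xr = x - d          # corresponding column in the right image
            if 0 <= xr < W and abs(D_RL[y][xr] - d) <= tol:
                out[y][x] = d
    return out
```

Pixels whose correspondence falls outside the right image, as happens along the left border for a left-reference map, are also marked inconsistent.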

Experiments and Discussion
We have used the Middlebury stereo benchmark [4] to evaluate the performance of our stereo matching algorithm. The parameters of the algorithm are fixed for all four stereo pairs, as required by the benchmark. There are five free parameters in our algorithm. The threshold value is set to T = 12. The half-window size is w = 15, and the window size is W × W, where W = 31. The noise variance σ_n² is a small and constant scaling factor in (7). The ratio coefficient in the hybrid WTA is K_p = 0.5. In the postprocessing step, the median filter parameter is L = 5, and the significance threshold in the histogram voting is α = 0.45.
Figure 5 shows the results for all four stereo pairs from the Middlebury stereo evaluation database: Tsukuba, Venus, Teddy, and Cones. The leftmost column contains the left images of the four stereo pairs. The ground truth (GT) disparity maps are shown in the second column, the estimated disparity maps in the third column, and the error maps in the fourth column. In the error maps, the white regions denote correctly calculated disparity values, which do not differ by more than 1 from the ground truth. If the estimated disparity differs by more than 1 from the ground truth value, it is marked as an error. The errors are shown in black and gray, where black represents errors in the nonoccluded regions and gray represents errors in the occluded regions. The quantitative results in the Middlebury stereo evaluation framework are presented in Table 2.
The results show that our stereo algorithm preserves disparity edges. It successfully estimates the disparities of thin objects and successfully deals with subtle radiometric differences between images of the same stereo pair. Occlusion errors are not propagated, and occluded disparities are successfully filled in during the postprocessing step. A narrow object is best visible in the Tsukuba disparity map (the lamp construction) and in the Cones disparity map (the pens in a cup in the lower right corner). Our algorithm correctly estimates the disparities of both textureless and textured surfaces; for example, the large uniform surfaces in the stereo pairs Venus and Teddy are successfully recovered.
The images in the Middlebury database have different sizes, different disparity ranges, and different radiometric properties. The stereo pairs Tsukuba, 384 × 288 pixels, and Venus, 434 × 383 pixels, have disparity ranges from 0 to 15 and from 0 to 19, respectively. The radiometric properties of the images in these stereo pairs are almost identical, and the offset compensation given by Algorithm 3 is not significant for these two example pairs, as we demonstrated in [13]. As required by the Middlebury evaluation framework, we apply the offset compensation to all four stereo pairs. The stereo pairs Teddy, 450 × 375 pixels, and Cones, 450 × 375 pixels, have disparity ranges from 0 to 59. The images of these stereo pairs are not radiometrically identical, and the offset compensation successfully deals with these radiometric differences [13]. In Figure 5, the columns show, from left to right, the left image, the ground truth, the result computed by our stereo algorithm, and the disparity error map for errors larger than 1 pixel; the nonoccluded-region errors with ranking are, respectively, Tsukuba 1.33% (37), Venus 0.32% (39), Teddy 5.32% (17), and Cones 2.73% (14).
The error percentages together with the ranking in the Middlebury online evaluation list are given in Table 2. The numbers show error percentages for the nonoccluded regions (NONOCC), the discontinuity regions (DISC), and the whole disparity map (ALL). The overall ranking of our algorithm in the Middlebury evaluation table of stereo algorithms is 28th place out of 123 evaluated algorithms. Thus, our stereo algorithm outperforms many local as well as global algorithms. Among the algorithms ranked in the Middlebury stereo evaluation, only two local algorithms rank higher than ours, and neither imposes the frontoparallel assumption strictly: a local matching method using image geodesic support weights, GeoSup [5], and a matching approach with slanted support windows, PatchMatch [31]. Both of these algorithms use color images, while our algorithm works with intensity images and achieves comparable results. Although these approaches have a better general ranking in the Middlebury stereo evaluation list, our approach with matching based on frontoparallel regions outperforms the PatchMatch algorithm for the Tsukuba stereo pair, and the GeoSup algorithm for the Tsukuba, Teddy, and Cones stereo pairs. Thus, our approach with region selection by threshold produces more accurate disparity maps for cluttered scenes than the GeoSup algorithm with region selection using geodesic support weights.
To investigate the contribution of the preprocessing and postprocessing steps to the overall result, we show in Table 3 the results obtained on the benchmark stereo pairs with and without the preprocessing and postprocessing steps: neither step, only one step, and both steps applied. If our postprocessing step was omitted, the L × L median filter was still applied. From the results in Table 3, we conclude that both steps, when individually applied, improve the quality of the final disparity maps. If we apply both steps, the accuracy of the disparity maps is highest. Furthermore, the improvement contributed by the preprocessing step exceeds that of the postprocessing step only for the Venus stereo pair. This is because the sampling effects are most pronounced in the Venus scene. In addition, we show in Figure 6 the disparity maps for the Tsukuba stereo pair for all four combinations of including or excluding the preprocessing and postprocessing steps. We conclude that the preprocessing step plays a significant role in the accurate disparity estimation of textureless areas, while the postprocessing step especially helps in the accurate estimation of disparity discontinuities.
To illustrate subtle features of our algorithm not captured in the standard test bed images, we apply our stereo algorithm, retaining the parameter values, to some other images from the Middlebury site in Figure 7. The algorithm successfully estimates the disparity of smooth low-textured objects and is also suitable for application to 3D face reconstruction, Figure 8(d). Our algorithm also successfully estimated the disparity map of the tomato seedling. The tomato seedling stereo images represent a challenging task for a stereo matching algorithm in general, because the viewpoints differ significantly and the structure of the plant is narrow, that is, much smaller than the window dimension.
As far as the initial window size is concerned, our algorithm is not influenced by the window size above a certain size. In principle, we could apply our algorithm using the whole image as the initial window around the reference pixel. This would result in a sufficiently large region selection for uniform regions in the image and make the ordinal measure within the hybrid WTA more reliable. On the other hand, in matching windows with high local intensity variations, the selected region is always significantly smaller than the window and does not change if the window is enlarged, because of the connectivity constraint with the reference central pixel.

Conclusion
In our local stereo algorithm, we have introduced a new approach for stereo correspondence based on adaptive local segmentation with a dynamic threshold, chosen so that the frontoparallel assumption holds for a segment. Further, we have established a relationship between the local intensity variation in an image and the dynamic threshold. We have applied the novel preprocessing procedure to both stereo images to eliminate the influence of noise and sampling artifacts. The mechanism for the final matching region selection prevents error propagation due to disparity discontinuities and occlusion. In the postprocessing step, we introduced a new histogram voting procedure for disparity refinement and for filling in the eliminated inconsistent disparities. Although the starting point for matching is a large rectangular window, the disparity of narrow structures is accurately estimated.
We evaluated our algorithm on the stereo pairs from the Middlebury database. It ranks highly on the list, outperforming many local and global algorithms that use color information, while we use only intensity images. Our algorithm is the best performing algorithm in the class of local algorithms which use intensity images and the frontoparallel assumption without weighting the intensities of the matching region. Furthermore, our algorithm matches textureless as well as textured surfaces equally well, handles local radiometric differences, preserves edges in disparity maps, and successfully recovers the disparities of thin objects and of occluded regions. We demonstrated the performance of our algorithm on two additional examples from the Middlebury database and on two practical examples. The results on these additional examples show that the disparity estimation remains accurate beyond the benchmark images.

Figure 2: Illustration of the preprocessing step for one image from the Tsukuba stereo pair: (a) original image, (b) detail of the original image, and (c) detail of the original image after the preprocessing step is applied.

Figure 3: Left image from the Tsukuba stereo pair with color-coded local intensity variation levels: the lowest local intensity variation is shown in red, followed in ascending order by orange and green; the highest local intensity variations are in blue.

Algorithm 1: Adaptive local segmentation for reference pixel I_l(x, y).
Step 1: Dynamic thresholding
    for i = 1 to W do
        for j = 1 to W do
            if |p_{i,j}^{l/r} - c^{l/r}| < T_d(x, y) then set B_{i,j}^{l/r} = 1
Step 2: Dilation
    Dilate B^{l/r} with a 3 × 3 square structuring element
Step 3: Imposing connectivity
    for i = 1 to W do
        for j = 1 to W do
            if B_{i,j}^{l/r} = 1 and not connected to B_{w+1,w+1} then set B_{i,j}^{l/r} = 0
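The three steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name and the use of scipy's morphology routines are our own choices, and the window is assumed to be a square array with the reference pixel at its centre.

```python
import numpy as np
from scipy import ndimage

def adaptive_local_segmentation(window, T_d):
    """Sketch of Algorithm 1: keep pixels whose intensity is within T_d of
    the central (reference) pixel, dilate the binary mask with a 3x3 square
    element, and retain only the component connected to the centre."""
    w = window.shape[0] // 2
    c = float(window[w, w])                 # reference (central) intensity
    # Step 1: dynamic thresholding
    B = np.abs(window.astype(float) - c) < T_d
    # Step 2: dilation with a 3x3 square structuring element
    B = ndimage.binary_dilation(B, structure=np.ones((3, 3), bool))
    # Step 3: impose connectivity with the central pixel
    labels, _ = ndimage.label(B)
    return labels == labels[w, w]
```

A region of matching intensity that is separated from the centre by a wide intensity barrier is discarded in Step 3, which is what keeps the selected region from growing when the window is enlarged.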
pixels, where N_p^{x,y} is a set containing the number of pixels participating in the cost aggregation step for each possible disparity value from the disparity range [D_min, D_max]. K_p is the ratio coefficient, 0 < K_p ≤ 1. The estimated disparity d(x, y) is

d(x, y) = argmin_{d_i ∈ {D_min, ..., D_max}} C_nSSD^{x,y}(d_i) | N_p^{x,y}(d_i) > N_s,    (8)

where x = 1, ..., R and y = 1, ..., C, for an image of dimension R × C pixels, and d_i belongs to the set of all possible disparities from the disparity range [D_min, D_max].
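The constrained winner-takes-all selection of Equation (8) can be sketched as follows; the function name and `None` return for the no-candidate case are our assumptions, not part of the paper.

```python
import numpy as np

def wta_disparity(costs, n_pixels, N_s):
    """Sketch of Eq. (8): winner-takes-all over the normalized SSD costs,
    restricted to disparities whose matching region contains more than N_s
    pixels. costs[i] is C_nSSD(d_i); n_pixels[i] is N_p(d_i). Returns the
    index of the winning disparity, or None if no candidate has support."""
    costs = np.asarray(costs, float)
    valid = np.asarray(n_pixels) > N_s
    if not valid.any():
        return None
    # Exclude unsupported disparities before taking the minimum
    masked = np.where(valid, costs, np.inf)
    return int(np.argmin(masked))
```

Note that a disparity with the globally lowest cost is rejected if too few pixels contributed to its aggregated cost, which is how unreliable matches in small regions are filtered out.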

Algorithm 3: Offset neutralization.
N_p is the length of the vectors z_l and z_r; c_l and c_r are the central intensities in the left and right windows.
    for i = 1 to N_p do
        z_l(i) = z_l(i) - c_l
        z_r(i) = z_r(i) - c_r
    end for
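Algorithm 3 amounts to a vectorized subtraction; a minimal sketch (function name is ours):

```python
import numpy as np

def offset_neutralization(z_l, z_r, c_l, c_r):
    """Sketch of Algorithm 3: subtract the central intensity c_l / c_r from
    every element of the left/right intensity vector, cancelling a constant
    radiometric offset between the two windows."""
    return np.asarray(z_l, float) - c_l, np.asarray(z_r, float) - c_r
```

After neutralization, two windows that differ only by a constant intensity offset produce identical vectors, so the offset no longer contributes to the matching cost.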

Figure 6: Disparity results for the stereo pair Tsukuba: (a) without preprocessing and without postprocessing, (b) without preprocessing and with postprocessing, (c) with preprocessing and without postprocessing, and (d) with preprocessing and with postprocessing.

Figure 7: Disparity results for the stereo pairs (1st row: Art, 2nd row: Dolls) from the Middlebury database of stereo images. Size of each image is 370 × 463 pixels. Disparity range in both stereo pairs is 0 to 75. The columns show, from left to right, the left image, the ground truth, the result computed by our stereo algorithm, and the map of disparity errors larger than 1 pixel.

Figure 8: Disparity results for the stereo pair Sanja, taken at the vision laboratory of Signals and Systems Group, University of Twente.Size of each image is 781 × 641 pixels.Disparity range is 0 to 40.(a) Left stereo image, (b) right stereo image, (c) disparity map corresponding to the right image, and (d) depth map with texture overlay.

Figure 9: Disparity results for the stereo pair Tomato seedling, taken within MARVIN project at the vision laboratory of Intelligent System Group, Wageningen UR-Food and Biobased Research.Size of the region of interest in each image is 300 × 310 pixels.Disparity range is 0 to 90.(a) Left stereo image, (b) right stereo image, (c) region of interest in the left image, and (d) disparity map corresponding to the left image.

Table 1: x_tmp and y_tmp values for the histogram calculation.
The initial disparity d(x, y) is replaced by the new value d_h if it is significantly supported, that is, if the normalized histogram value h(d_h) is greater than α; otherwise, it is left unchanged:
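The histogram voting rule can be sketched as below. This is an illustration under stated assumptions: the paper draws the votes from the sample positions (x_tmp, y_tmp) given in Table 1, whereas here a square neighbourhood of a hypothetical `radius` is used, and the function name is ours.

```python
import numpy as np

def histogram_voting(d, x, y, radius, alpha):
    """Sketch of the histogram-voting refinement: collect disparities around
    (x, y), find the most frequent value d_h, and replace d[y, x] with d_h
    only if its normalized frequency h(d_h) exceeds alpha."""
    ys = slice(max(y - radius, 0), y + radius + 1)
    xs = slice(max(x - radius, 0), x + radius + 1)
    votes = d[ys, xs].ravel()
    vals, counts = np.unique(votes, return_counts=True)
    k = np.argmax(counts)
    if counts[k] / votes.size > alpha:
        return int(vals[k])      # significantly supported: replace
    return int(d[y, x])          # otherwise leave unchanged
```

The threshold α prevents a weakly supported mode from overwriting a disparity that may already be correct.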

Table 2: Evaluation results based on the online Middlebury stereo benchmark [4]: the errors are given in percentages for the nonoccluded (NONOCC) regions, the whole image (ALL), and the discontinuity (DISC) areas. The numbers within brackets indicate the ranking in the Middlebury table.