Binocular Images Dense Matching considering Image Adaptive Color Weights and Feature Points

When the matching cost function in Semiglobal Matching (SGM) is unstable, the inaccurate matching cost values are propagated during cost aggregation, which leads to serious mismatching. To address the problem, a binocular images dense matching method considering image adaptive color weights and feature points was proposed. Firstly, the Color Birchfield Tomasi (CBT) matching cost calculation method, which combines image adaptive color weights and gradient information, was proposed to obtain a stable initial cost volume. Secondly, the Scale-invariant Feature Transform (SIFT) matching algorithm was used to extract a priori feature points from the binocular images. Then, the feature points were filtered, and the cost volume was optimized by using their coordinate information and disparity information. Finally, an aggregation path segmentation rectification method was adopted to optimize the SGM aggregation paths and reduce the propagation of incorrect paths. Experimental results demonstrate that the proposed method can effectively improve the stability and accuracy of dense matching, reduce the mismatching phenomenon, and finally produce high-quality disparity maps.


Introduction
In recent years, stereo matching has played an essential role in the field of photogrammetry and computer vision [1]. Stereo matching mainly refers to finding pixels in stereo images corresponding to the same scene point. The corresponding points obtained from the binocular imaging model can be used to recover the depth of the scene or its 3D coordinate information. Therefore, stereo matching has a wide application space in 3D reconstruction [2], medicine [3], face recognition [4], automatic driving [5], and many other fields. As for stereo matching, it can be divided into two main categories. One is the feature matching method based on image feature point information, and the other is the dense matching method based on image pixel information.
Image feature matching is actually a process of feature extraction, description, and matching [6]. Firstly, the interest features and attributes of multiple images are extracted. Then, the parametric description is performed. Finally, similarity matching is performed on the extracted features. At present, there are many commonly used feature point matching algorithms, such as Harris [7], Forster [8], SIFT (Scale-invariant Feature Transform) [9], and deep learning methods [10]. Owing to its rotation invariance, scale invariance, and affine invariance [11], SIFT has become a commonly used algorithm in feature matching. Although the extracted coordinates of the corresponding points are accurate, feature matching yields only a small number of corresponding points, which results in insufficient detail in 3D reconstruction. Hence, methods that densify sparse feature points have attracted wide attention. Aurenhammer constructed Voronoi polygons based on sparse feature points to divide the image and used the SSD (Sum of Squared Differences) method to process each pixel in the polygon area to obtain dense points [12]. An adaptive propagation matching method was proposed in [13]: on the basis of feature matching, corresponding triangulations were built for the two images, and dense feature matching was then carried out inside the triangulations. In [14], Delaunay triangulations were established after SIFT feature point extraction, and the dense point set was obtained by iterative processing based on the triangle centers of gravity. Although these algorithms can obtain a dense set of corresponding points, they depend on high-precision initial feature points. When the features in a region are not obvious, incorrect point matches occur easily. The methods above are also complicated.
Dense matching is a simple and direct method to obtain corresponding points in stereo vision. It mainly depends on the gray level of image pixels. The disparity per pixel is calculated by establishing the matching relationship between pixels in the two images. Owing to its low cost and high density of matching points [15], dense matching is widely used in change detection [16], mapping [17,18], smart cities [19], and other fields. Compared with feature matching, dense matching can obtain more matching points. On the other hand, the accuracy of the matching points is inadequate, and the mismatching phenomenon is serious. At present, dense matching methods can be divided into the global method [20], SGM (Semiglobal Matching) [21], the local method [22], and the deep learning method [23]. SGM has many advantages, such as good universality, satisfactory efficiency, and high matching accuracy, and it does not rely on data sets [24]. So SGM is the most commonly utilized mainstream method.
The SGM dense matching method mainly involves four steps: matching cost calculation, cost aggregation, disparity calculation, and disparity refinement. The stability of the matching cost calculation method and the accuracy of the cost aggregation paths directly affect the accuracy of dense matching. In matching cost calculation, local windows were often used [25][26][27][28]. The matching cost values were computed by setting regular local windows on the binocular images, respectively, and evaluating the correlation of the pixels in the windows. Although these methods are simple, the matching accuracy depends heavily on the size of the window. The MI (Mutual Information) method [20] is not sensitive to illumination, but it is complicated and needs iteration, so it is not commonly used. Combining the AD (Absolute Difference) method with the Census method, Mei proposed the AD-Census joint cost calculation method [29], which not only preserves image edges but also improves the accuracy of the matching cost results. Although a joint cost helps the cost volume carry multiple image features [30], it also weakens some original features and makes the calculation more complicated. Although the BT (Birchfield Tomasi) method [31] can maintain the continuity of disparities, it ignores the color information of the image itself and is not conducive to preserving edges in the disparity map. Therefore, the BT method still has a stability issue. In cost aggregation, Gehrig improved SGM and proposed a real-time dense matching method [32]. Rothermel proposed T-SGM to accelerate SGM [33]. In T-SGM, the original image is stratified and downsampled before the cost aggregation process. However, the image downsampling process affects the quality of the final result.
Meanwhile, the above methods ignore the path propagation effect in cost aggregation, which will cause the incorrect matching cost values to be continuously propagated and affect the accuracy of the final disparity map.
In summary, in order to improve the accuracy of dense matching, we analyzed the unique advantages of the two matching methods and proposed a binocular images dense matching method combining image adaptive color weights and feature points. The main contributions of the method are as follows:
(1) A more stable CBT matching cost calculation method was proposed. In the calculation process, the color information of each pixel is adaptively weighted, which better reflects the color information of the image itself. Adding the constraint of image gradient information also helps preserve the edges of the disparity map.
(2) A segmentation correction cost aggregation method was adopted to reduce the mismatching phenomenon. The initial cost volume was optimized according to the a priori feature point information of the image. During the SGM cost aggregation, the aggregation paths were corrected in segments to reduce erroneous paths, avoid the propagation of incorrect matching cost values, and improve the accuracy of the whole dense matching.

BT Cost Calculation Method Combined with Image Adaptive Color Weight
In the real world, the depth of the scene is continuous. Discretization errors will be generated during image sampling. As a result, the image depth is discontinuous when the camera is used to obtain the real scene image. For binocular dense matching, the image depth discontinuity is actually the image disparity discontinuity. The discontinuity of image depth and disparity will directly lead to the mismatching phenomenon in the process of stereo matching, which will have a negative impact on the matching accuracy and the 3D reconstruction effect. The linear interpolation calculation method was adopted in the BT method [31], and gray images were used to calculate directly. The subpixel disparity values were obtained by calculating the subpixel matching costs of the image. The linear interpolation method is simpler than the subpixel matching method, and it can effectively avoid the image depth discontinuity and reduce the image sampling errors. So compared with other matching cost calculation methods, the advantage of the BT method [30] is that it can maintain the continuity of disparities. A sketch map of the linear interpolation method is shown in Figure 1.
In Figure 1, $I_L$ and $I_R$ are the left image and the right image, respectively, and $I_L$ is the reference image. $x_i$ and $x_i'$ are two points to be matched in $I_L$ and $I_R$, respectively. $I_R^-$ is the linear interpolation result of $x_i'$ with its left neighbor pixel, and $I_R^+$ is the linear interpolation result of $x_i'$ with its right neighbor pixel.

In the BT method, the image is sampled by linear interpolation. Then the similarity of two points is assessed by the pixel dissimilarity measurement method. The measurement result of the BT method is the BT value. The disparity level is denoted by $d$. Within the range of $d$, a larger BT value for two candidate matching points in the binocular images means a smaller similarity between them. When the BT value is the smallest, the similarity between the two points is the greatest, and the two points are taken as corresponding points.

When the BT method is used to calculate the similarity of two points in binocular images, the left image is first taken as the reference image. At the disparity level $d$, $x_i' = x_i + d$. The differences of pixel gray values between $I_L$ and each of $I_R^-$, $I_R$, and $I_R^+$ are computed, and the minimum is taken as $d_L$. Then the right image is taken as the reference image. The differences of pixel gray values between $I_R$ and each of $I_L^-$, $I_L$, and $I_L^+$ are computed, and the minimum is taken as $d_R$. The minimum of $d_L$ and $d_R$ is the result of the matching cost calculation. The calculation methods of the BT matching cost are shown as follows:

Although the BT method can keep the disparities continuous, it still has two defects.
(1) The color information of the image is ignored. The lack of image color information leads to unstable calculation results. For example, consider two points to be matched, where R, G, and B are the color channels. When channel B is selected for calculation, the two points are corresponding points. When channel R or G is selected for calculation, the two points are not corresponding points.
(2) The gradient changes of pixels in the image are not considered. In the calculation process, as the disparity $d$ changes, the two images maintain relative motion (as shown in Figure 2). The relative motion leads to gradient changes in the image, and the lack of a gradient constraint results in the loss of image edge information.

So, the stability and edge constraint ability of the BT method should be improved. In Figure 2, $p$ is a point in the cost volume after matching cost calculation.
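The symmetric BT calculation described above can be sketched for a single scanline as follows. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, "difference" is assumed to mean the absolute gray-value difference, the half-pixel samples are assumed to be the mean of a pixel and its neighbor, and border pixels are assumed to fall back to their own value.

```python
def half_sample(row, x, step):
    """Linear interpolation halfway between row[x] and its neighbor row[x + step].

    Assumption: at the image border the pixel value itself is returned.
    """
    n = x + step
    if n < 0 or n >= len(row):
        return float(row[x])
    return 0.5 * (row[x] + row[n])


def one_sided_cost(ref_row, x_ref, other_row, x_other):
    """Minimum absolute difference between ref_row[x_ref] and the three
    samples I^-, I, I^+ of the other image around x_other."""
    v = ref_row[x_ref]
    candidates = (
        half_sample(other_row, x_other, -1),  # I^- (left neighbor)
        float(other_row[x_other]),            # I
        half_sample(other_row, x_other, +1),  # I^+ (right neighbor)
    )
    return min(abs(v - c) for c in candidates)


def bt_cost(left_row, right_row, x, d):
    """BT-style cost of left pixel x at disparity d, using x' = x + d as in the text.

    Returns None when x' falls outside the right scanline.
    """
    xp = x + d
    if xp < 0 or xp >= len(right_row):
        return None
    d_l = one_sided_cost(left_row, x, right_row, xp)  # left image as reference
    d_r = one_sided_cost(right_row, xp, left_row, x)  # right image as reference
    return min(d_l, d_r)
```

For a color image, the same function would be called once per channel, which gives the per-channel BT values that the CBT method combines.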
To address the problems in the BT method, the CBT method combining image adaptive color weights and gradient information was proposed. Firstly, the matching results for each color channel of the images were calculated by the BT method. The adaptive weight for each color component per pixel was calculated according to the image color information. The matching result $d_c$ of the adaptive color weights for the two images was obtained by a weighted sum. Then the horizontal and vertical gradient information of the two images was obtained at each disparity $d$, and the BT values of the horizontal gradient and vertical gradient for the two images were calculated separately. Finally, the weighted results of color and gradient information were computed. Equations (6)-(9) describe the computation process of the CBT method.

Mathematical Problems in Engineering
where $d_c$, $d_{\nabla x}$, and $d_{\nabla y}$ are the results of color information, horizontal gradient information, and vertical gradient information, respectively. $c$ indicates the color channel (R, G, or B). $\lambda_c$ represents the adaptive color weight for each pixel. The gradient terms are computed from the gradient information of the binocular images. $\omega_1$ is the color weight, and $\omega_2$ and $\omega_3$ are the gradient weights. The sum of $\omega_1$, $\omega_2$, and $\omega_3$ is 1. $D_{cost}$ is the final result of the cost volume. In this paper, $\omega_1$, $\omega_2$, and $\omega_3$ are 0.6, 0.2, and 0.2, respectively. $\tau$ is the truncation error.
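As a rough illustration of how the color and gradient terms might be combined, the sketch below weights three per-channel BT values by per-pixel color weights and then mixes the color result with the two gradient results using the weights 0.6, 0.2, and 0.2 given above. Several parts are assumptions rather than the authors' formulas: the text does not state how $\lambda_c$ is computed, so each channel's share of the pixel intensity is used as a stand-in, and $\tau$ is assumed to cap each term before weighting. All function names are hypothetical.

```python
def adaptive_color_weights(pixel_rgb):
    """Hypothetical lambda_c: each channel's share of the total pixel intensity.

    The source does not give this formula; it is a placeholder that sums to 1.
    """
    total = float(sum(pixel_rgb))
    if total == 0.0:
        return [1.0 / 3.0] * 3
    return [c / total for c in pixel_rgb]


def cbt_cost(bt_rgb, bt_grad_x, bt_grad_y, pixel_rgb,
             weights=(0.6, 0.2, 0.2), tau=None):
    """Combine per-channel BT values and gradient BT values into one cost.

    bt_rgb     -- BT values of the R, G, B channels at one pixel and disparity
    bt_grad_x  -- BT value of the horizontal gradient images
    bt_grad_y  -- BT value of the vertical gradient images
    weights    -- (omega_1, omega_2, omega_3) from the paper
    tau        -- optional truncation; assumed here to cap each term
    """
    lam = adaptive_color_weights(pixel_rgb)
    d_c = sum(l * b for l, b in zip(lam, bt_rgb))  # weighted sum over channels
    terms = [d_c, bt_grad_x, bt_grad_y]
    if tau is not None:
        terms = [min(t, tau) for t in terms]
    return sum(w * t for w, t in zip(weights, terms))
```

With equal channel intensities the color weights reduce to 1/3 each, so the color term becomes the plain mean of the three channel costs.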

SGM Method Based on Disparity Optimization of Feature Points
In the SGM method, cost aggregation is performed after matching cost calculation in order to acquire a high-accuracy disparity map. The core of SGM is to create a global energy function and optimize it. Firstly, a global energy function $E(D)$ is established. Then, based on the idea of pixel matching, 2D constraints are generated by merging several 1D constraints to accomplish global optimization. Finally, the optimal disparity for each pixel is determined by the Winner-Take-All (WTA) method. The optimal disparity is the one that corresponds to the minimum of the energy function. The energy function $E(D)$ is shown in the following:

$$E(D) = \sum_{p} \Big( C(p, d_p) + \sum_{q \in N_p} P_1 \, T\big[\,|d_p - d_q| = 1\,\big] + \sum_{q \in N_p} P_2 \, T\big[\,|d_p - d_q| > 1\,\big] \Big) \tag{10}$$

where $p$ denotes a pixel, $d_p$ refers to the disparity value of $p$, and $C(p, d_p)$ represents the matching cost value of $p$ at the disparity $d_p$. $P_1$ and $P_2$ are two externally supplied penalty parameters, where $P_2 > P_1$. $q$ is a neighbor pixel of $p$, and $N_p$ is the neighborhood of $p$. $T[\cdot]$ is a discriminant function, which is used to judge the relationship between $d_p$ and $d_q$. The $P_1$ penalty is imposed when $|d_p - d_q| = 1$, and the $P_2$ penalty is imposed when $|d_p - d_q| > 1$. The last two terms in (10) are smoothing terms. The energy function is used to evaluate the whole image from a global perspective [35]. Since the energy function $E(D)$ is not differentiable, its global minimization actually becomes an NP problem. In order to simplify the calculation of $E(D)$, SGM optimizes each point $p$ from multiple directions. Then the results of optimization in each direction are added to get the final result (as shown in equations (11) and (12)), which is the cost aggregation of the SGM method.

$$L_r(p, d) = C(p, d) + \min\Big( L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_{i} L_r(p - r, i) + P_2 \Big) - \min_{k} L_r(p - r, k) \tag{11}$$

$$S(p, d) = \sum_{r} L_r(p, d) \tag{12}$$

where $r$ refers to the direction. $L_r(p, d)$ is the cost value of $p$ in the direction $r$. The aggregation cost along a direction can be regarded as the information transmitted by each pixel at every disparity $d$ in that direction. $L_r(p - r, d)$ denotes the cost value of the previous point of $p$ on the current path. $\min_k L_r(p - r, k)$ is a constraint term that prevents the calculation result from growing too large. $S(p, d)$ represents the final cost aggregation result.
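The path-wise aggregation can be illustrated with a minimal sketch that runs the standard SGM recursion along one path and then sums two opposite directions of a scanline. It is a simplified illustration under stated assumptions: only two directions are used instead of the usual 8 or 16, the first pixel on a path takes its raw matching cost, and the function names are hypothetical.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate matching costs along one path.

    costs -- list over the pixels of the path; each entry lists C(p, d) for all d
    Returns L_r(p, d) for every pixel of the path, in the same order.
    """
    n_disp = len(costs[0])
    prev = [float(c) for c in costs[0]]  # first pixel on the path: L_r = C
    out = [prev]
    for c in costs[1:]:
        prev_min = min(prev)             # min_k L_r(p - r, k)
        cur = []
        for d in range(n_disp):
            best = prev[d]                           # same disparity, no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)   # disparity change of 1
            if d + 1 < n_disp:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)          # larger disparity change
            cur.append(c[d] + best - prev_min)       # subtract to bound growth
        out.append(cur)
        prev = cur
    return out


def aggregate_scanline(costs, p1, p2):
    """S(p, d) for one scanline, summing left-to-right and right-to-left paths."""
    forward = aggregate_path(costs, p1, p2)
    backward = aggregate_path(costs[::-1], p1, p2)[::-1]
    return [[f + b for f, b in zip(fr, br)] for fr, br in zip(forward, backward)]
```

Because each pixel's aggregated cost depends on the previous pixel's values, one wrong cost near the start of a path affects every later pixel on it, which is the propagation effect discussed in the text.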
SGM implements 2D constraints using a large number of 1D constraints, which can reduce the amount of computation, improve the efficiency of operation, and achieve an approximate semiglobal effect. However, every disparity d at every pixel is actually considered in SGM. In fact, the cost aggregation process is a direct correlation process of neighbor pixels and optimizes the current pixel cost value based on the neighbor pixel cost value. If the matching cost calculation yields an incorrect cost value, the incorrect cost value will be continuously transmitted along the current aggregation path in cost aggregation. The aggregation results of the adjacent pixels will be affected. The transmission of the incorrect matching cost will directly result in the mismatching phenomenon of the image. Therefore, we proposed using image feature points to rectify the aggregation path and improve the matching accuracy.
The SIFT method was used to extract the image feature points in this paper. Since the coordinates of the feature points obtained by SIFT are at the subpixel level, they should be converted to the integer-pixel level. At the same time, mismatched points and repeated points should be eliminated to obtain accurate feature points. From the coordinates of the feature points, accurate disparity information between the two corresponding points can be acquired. The information of the a priori feature points was used for accurate positioning in the 3D cost volume, and these positions were taken as guide points. The matching cost values corresponding to the other candidate disparities of each guide point were set to invalid values to optimize the cost volume. In the subsequent cost aggregation process, the path through the point is accurate and unique. This aggregation path constraint (as shown in Figure 3) helps improve the accuracy of dense matching and reduce the propagation of incorrect cost values. So the dynamic programming path of SGM can be rewritten into the form of the following equations:

where $P$ is the set of corresponding points, $p$ refers to a point in the left image, and $q'$ represents the corresponding point of $p$ in the right image. The larger the number of corresponding points in $P$, the more obvious the path constraint effect and the higher the accuracy of SGM.
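The guide-point step can be sketched as follows, assuming the feature matches are already available as pairs of subpixel coordinates. This is an illustrative sketch only: the row-agreement filter, the handling of repeated points, the use of infinity as the invalid value, and both function names are assumptions, and the disparity is taken as the absolute column difference so that the sign convention is left to the caller.

```python
INVALID = float("inf")  # assumed marker for an invalid matching cost


def guide_points_from_matches(matches):
    """Turn subpixel matches ((xl, yl), (xr, yr)) into integer guide points.

    Pairs whose rounded rows differ are dropped (rectified images are assumed),
    and pixels that receive conflicting disparities are discarded.
    Returns a sorted list of (x, y, d) for the left image.
    """
    found = {}
    for (xl, yl), (xr, yr) in matches:
        x, y = round(xl), round(yl)
        if round(yr) != y:
            continue
        found.setdefault((x, y), set()).add(abs(round(xr) - x))
    return sorted((x, y, next(iter(ds)))
                  for (x, y), ds in found.items() if len(ds) == 1)


def apply_guide_points(cost, guide_points):
    """Return a copy of cost[y][x][d] in which, at every guide point,
    all candidate disparities except the known one are set to INVALID."""
    out = [[list(px) for px in row] for row in cost]
    for x, y, d in guide_points:
        out[y][x] = [c if k == d else INVALID for k, c in enumerate(out[y][x])]
    return out
```

After this step, any aggregation path that crosses a guide pixel can only continue from the known disparity at a finite cost, which is the path rectification effect described above.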
The experimental process of the proposed method is shown in Figure 4. The specific experimental steps are as follows:
(1) Binocular rectified images are input;
(2) The proposed CBT method is used to obtain the initial matching cost volume;
(3) SIFT feature matching is performed on the binocular images to acquire feature points;
(4) The feature points are optimized to obtain feature point coordinate information and accurate disparity information;
(5) According to the information of the a priori feature points, the cost volume is optimized. Except for the accurate disparity of the feature points, the corresponding cost values of the other candidate disparities are set to be invalid;
(6) Cost aggregation of path constraints is performed;
(7) The WTA method is used to obtain the disparity map based on the left image;
(8) Left and right consistency detection is carried out;
(9) The final disparity map is output;
(10) The quality of the disparity map is evaluated.

Experimental Data. Cone and Teddy image pairs (as shown in Figures 5 and 6) in the Middlebury public data set were selected for experiments in the paper. To guarantee that disparity values in the experimental results are accurate and reliable, the left and right consistency detection was conducted.
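Steps (7) and (8) can be sketched for a single scanline as below. This is a minimal sketch under stated assumptions: the right pixel matching left pixel x at disparity d is taken to be x - d (the usual convention for rectified pairs; pass `sign=+1` for data that follows x' = x + d), the tolerance of one disparity level is an assumed default, and rejected pixels are set to 0 so that values larger than 0 remain the effective disparities. The function names are hypothetical.

```python
def winner_take_all(agg_row):
    """Pick, for each pixel, the disparity with the minimum aggregated cost.

    agg_row -- list over pixels; each entry lists S(p, d) for all d
    """
    return [min(range(len(c)), key=c.__getitem__) for c in agg_row]


def lr_check_row(disp_left, disp_right, max_diff=1, sign=-1, invalid=0):
    """Left-right consistency check on one scanline.

    A left disparity is kept only if the right disparity at the matched
    position agrees within max_diff; otherwise it is replaced by `invalid`.
    """
    out = []
    for x, d in enumerate(disp_left):
        xr = x + sign * d  # matched column in the right disparity map
        if 0 <= xr < len(disp_right) and abs(d - disp_right[xr]) <= max_diff:
            out.append(d)
        else:
            out.append(invalid)
    return out
```

Pixels whose match falls outside the right image are also rejected, which typically removes part of the left image border.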
To analyze the characteristics of the proposed CBT method, it was compared with the AD, BT, Census [36], and AD-Census methods. Meanwhile, the proposed dense matching method was compared with the traditional SGM method. Since the Census method compares relative gray values within the window, two neighborhood windows with completely different gray levels can easily produce the same result in low-texture and repeated-texture regions, which causes mismatching. In addition, Census relies too much on the stability of the center point in the window, and the correlation between pixels in the window is weak. To increase the stability of the Census method, the window size for the Census calculation was set to 3 × 3, and the center pixel in the window was replaced with the mean value of the window. In this way, the correlation between pixels in the window is increased, and the distortion of the center pixel is avoided. Experimental results of the Cone images are shown in Figures 7-10. Figure 9 is the local detail of the corresponding results. Figure 10 is the error map of the corresponding results. Experimental results of the Teddy images are shown in Figures 11-14. Figure 13 is the local detail of the corresponding results. Figure 14 is the error map of the corresponding results.
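The modified Census variant used for comparison (a 3 × 3 window whose center value is replaced by the window mean) can be sketched as follows. The bit ordering, the choice to encode only the eight neighbors, and the function names are assumptions made for illustration; only interior pixels are handled.

```python
def census_mean_3x3(img, y, x):
    """8-bit Census code of the 3x3 window centered at (y, x).

    Each neighbor is compared with the window mean instead of the raw
    center pixel; a bit is 1 when the neighbor is below the mean.
    """
    window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    mean = sum(window) / 9.0
    bits = 0
    for i, v in enumerate(window):
        if i == 4:  # skip the center position itself
            continue
        bits = (bits << 1) | (1 if v < mean else 0)
    return bits


def hamming(a, b):
    """Census matching cost: number of differing bits between two codes."""
    return bin(a ^ b).count("1")
```

Using the mean makes the code less sensitive to a single noisy center pixel, at the price of one extra average per window.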
As for the quality evaluation of disparity maps, after consistency detection, the places in the disparity map with a value larger than 0 are effective disparities. For binocular images without ground truth, the number of effective disparities is an important indicator of the quality of the disparity maps [1]. So the percentage of effective disparities was recorded for every result, and histograms were generated for comparative analysis (as shown in Figures 15 and 16) to illustrate the reliability of the proposed method. The elapsed times of the different matching cost calculation methods are shown in Table 1. According to the ground truth, RMSE (Root Mean Square Error) and PBM (Percentage of Bad Matching Pixels) were used to assess the quality of the experimental results. The quality evaluation results are shown in Tables 2-5. In Tables 3 and 5, "noncc" stands for the PBM of the nonoccluded area. Smaller RMSE and PBM values indicate a better result.

Figure 3: Sketch map of SGM aggregation path correction. The red line represents the incorrect matching path and the green line indicates the correct matching path. According to the coordinates of the feature points obtained by the SIFT method, the accurate disparity information of the corresponding two points can be obtained. The green checkmarks are the exact matching point positions, which are obtained from the coordinates and the disparity information of the feature points. The yellow crosses are invalid cost values, through which the matching paths cannot pass. When an incorrect matching path passes through a matching point, the path is forcibly corrected to the matching point, which plays the role of path rectification.

Experimental Results and Analysis. According to Figures 7-11, the AD results contain the largest number of mismatching pixels. Due to the interference of many factors in the calculation process, the stability of the AD method is not good enough. The BT method can improve the stability of the cost volume by calculating the subpixel disparities. However, there are still obvious mismatching regions in the BT results, and the results have insufficient edge information and poor quality. The Census method detects the edge and corner features of images well, making image edges more obvious, but its matching accuracy is still not ideal. The AD-Census method combines the advantages of the AD and Census methods: it not only preserves the color information but also has a certain ability to protect image edges. As shown in Figures 7(e) and 7(f) and Figures 11(e) and 11(f), the CBT method produces a superior result. Considering the adaptive color weight information of the image makes the results reflect the real color information of the image, which clearly suppresses the mismatching phenomenon. Combining gradient information helps improve the matching stability and the image edge constraint ability. Compared with the Census method, the CBT method can effectively avoid poor performance in low-texture and repeated-texture regions. Therefore, the edge preservation effect of the CBT method is better than that of the Census method. According to Table 1, the computational efficiency of the CBT method is better than that of the joint cost methods, which also indicates that the CBT method is simpler than the joint cost methods.

Comparing Figures 7 to 16, the results of the proposed dense matching method are better than those of the traditional SGM method. By using a priori image feature points to optimize the cost volume, the proposed method can rectify the paths of cost aggregation, reduce the propagation of incorrect costs, and increase the accuracy of dense matching. It also increases the detail in the disparity maps and makes their information richer and more complete. As for the error maps, the error map of the proposed method is the best, which indicates that the corresponding disparity result has a better matching accuracy. The histograms of the percentage of effective disparities show that the proposed method increases the number of effective disparities, which indicates that the integrity of the disparity maps is improved and the mismatching phenomenon is suppressed. Therefore, the accuracy of dense matching can be effectively improved by using feature points to rectify the aggregation paths. Overall, the experimental results of the proposed method are better, which demonstrates the effectiveness of the method.

[Figure caption fragment: ... Figure 11(e), (c) Figure 11(f), (d) Figure 12(d), (e) Figure 12(e), and (f) Figure 12(f).]
From the evaluation results of RMSE and PBM in Tables 2-5, the quality of the AD disparity results is the lowest and their mismatching rate is the highest. The quality of the CBT results is the best, which indicates that combining the color information and gradient information of the image can effectively improve the stability of the method. The CBT method brings the disparity results close to the ground truth. Compared with the SGM method, the indicators of the proposed method are better, and the mismatching rate of the corresponding disparity results is lower. This shows that using feature point constraints also helps enhance the quality of disparity maps and improve the accuracy of dense matching. The objective quality evaluation results are consistent with the subjective quality evaluation results. Therefore, from the experimental results and indicator evaluation results, the proposed dense matching method shows advantages in both visual quality and quantitative indicators, which demonstrates that the proposed method is feasible.
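The three indicators used in the evaluation can be sketched on flattened disparity lists as below. This is an illustrative sketch with assumptions labeled in the code: ground-truth pixels equal to 0 are treated as unknown and skipped, the bad-pixel threshold of 1.0 disparity level is an assumed default because the text does not state the value used, and the function names are hypothetical.

```python
import math


def _known_pairs(disp, truth):
    """Pairs (estimate, ground truth) where the ground truth is known.

    Assumption: a ground-truth value of 0 marks an unknown pixel.
    """
    pairs = [(d, t) for d, t in zip(disp, truth) if t > 0]
    if not pairs:
        raise ValueError("no pixels with known ground truth")
    return pairs


def rmse(disp, truth):
    """Root mean square error over pixels with known ground truth."""
    pairs = _known_pairs(disp, truth)
    return math.sqrt(sum((d - t) ** 2 for d, t in pairs) / len(pairs))


def pbm(disp, truth, threshold=1.0):
    """Percentage of bad matching pixels: |d - t| above the threshold."""
    pairs = _known_pairs(disp, truth)
    bad = sum(1 for d, t in pairs if abs(d - t) > threshold)
    return 100.0 * bad / len(pairs)


def effective_percentage(disp):
    """Percentage of pixels with a disparity larger than 0 after the
    left and right consistency detection."""
    return 100.0 * sum(1 for d in disp if d > 0) / len(disp)
```

Restricting `disp` and `truth` to the nonoccluded pixels before calling `pbm` would give the nonoccluded-area variant reported in Tables 3 and 5.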

Conclusions
The mismatching phenomenon in traditional SGM was studied, and the reasons for mismatching in the SGM method were analyzed. Combining the properties of feature matching and dense matching, a binocular images dense matching method considering image adaptive color weights and feature points was proposed. The experimental results indicate that the proposed dense matching method has good stability. The method can reflect the real color information of images and helps improve the overall quality and detail of the disparity map. Meanwhile, the method can significantly reduce the mismatching phenomenon and enhance the accuracy of dense matching. Compared with the other methods in this paper, the error matching rates of the proposed method on the Cone and Teddy images are the lowest, at 3.05 and 4.17, respectively. Hence, the proposed method has certain advantages in improving the accuracy of 3D image reconstruction.

Data Availability
The dataset used to support the findings of this study is included in the article, which is cited at relevant places within the text as [37].

Conflicts of Interest
The authors declare no conflicts of interest.