Local Stereo Matching Using Adaptive Cross-Region-Based Guided Image Filtering with Orthogonal Weights

Adaptive cross-region-based guided image filtering (ACR-GIF) is a commonly used cost aggregation method. However, the weights of points in the adaptive cross-region (ACR) are generally not considered, which affects the accuracy of disparity results. In this study, we propose an improved cost aggregation method to address this issue. First, the orthogonal weight is proposed according to the structural feature of the ACR, and the orthogonal weight of each point in the ACR is computed. Second, the matching cost volume is filtered using ACR-GIF with orthogonal weights (ACR-GIF-OW). In order to reduce the computing time of the proposed method, an efficient weighted aggregation computing method based on orthogonal weights is proposed. Additionally, by combining ACR-GIF-OW with our recently proposed matching cost computation method and disparity refinement method, a local stereo matching algorithm is proposed as well. The results on the Middlebury evaluation platform show that, compared with ACR-GIF, the proposed cost aggregation method significantly improves the disparity accuracy with little additional time overhead, and the proposed stereo matching algorithm outperforms other state-of-the-art local and nonlocal algorithms.


Introduction
Binocular stereo vision can acquire disparity information with the required accuracy only by using image pairs of the same scene obtained from different angles. It is widely applied in three-dimensional reconstruction [1], three-dimensional measurement [2], robot vision [3], unmanned driving [4], and so on. The purpose of stereo matching is to find corresponding points in a pair of images, and its accuracy directly affects the precision of the disparity results. It is therefore a critical procedure in binocular stereo vision systems and a topic of significant research interest in the field of computer vision.
Currently, stereo matching algorithms are mainly divided into two categories. The first category is based on deep learning. In particular, algorithms based on convolutional neural networks (CNNs) have developed rapidly in recent years. Žbontar and LeCun [5] proposed an algorithm that utilizes a Siamese network to compute the matching cost; pairs of small image patches were used to train the network to determine the similarities among the patches. Pang et al. [6] proposed a cascade residual learning network divided into two stages, where each stage independently calculates disparity maps and multiscale residual signals. Chang et al. [7] proposed a pyramid stereo matching network comprising a spatial pyramid pooling module and a 3D CNN module. Swami et al. [8] proposed an end-to-end network model that exploits rich multiscale context information, which most existing methods cannot achieve; a large effective receptive field extracts multiscale context information while retaining the required spatial information throughout the network. Kim et al. [9] proposed a network architecture that uses both the matching cost volume and the disparity map as inputs; their architecture contains two subnetworks, namely, a matching probability construction network and a confidence estimation network. Such methods achieve high matching accuracy, but ground truth disparity maps are required in advance, especially for end-to-end network models. The second category is conventional algorithms, which can be classified as global, semiglobal, and local algorithms. Global algorithms usually obtain disparities by minimizing global energy functions, using techniques such as graph cuts [10] and belief propagation [11]. They are characterized by high matching accuracy and low computing efficiency. The first semiglobal algorithm was proposed by Hirschmüller [12] and mainly used the idea of dynamic programming.
The matching accuracy and computing efficiency of semiglobal algorithms lie between those of global and local algorithms. Local algorithms are based on cost aggregation within a specified support region; their matching accuracy is usually lower than that of the first two types, but their computing efficiency is higher. Such algorithms generally employ the following four steps [13]: matching cost computation, cost aggregation, disparity computation, and disparity refinement.
Cost aggregation refers to summing or averaging the matching cost in the support region of each pixel, which directly influences the computing efficiency and accuracy of the algorithm. It is one of the most important steps and also the primary focus of many studies. Filtering-based cost aggregation methods are currently adopted by most local algorithms, in which cost aggregation is interpreted as the filtering of the matching cost volume. Compared with other filtering methods, guided image filtering (GIF) [14] is usually the preferred approach, owing to its superior filtering effect and computational efficiency.
Hosni et al. [15] were the first to apply GIF to cost aggregation, achieving good results, and various cost aggregation methods have since been proposed on the basis of this approach. Based on weighted GIF [16], Hong et al. [17] proposed a cost volume filtering method in which an adaptive weight based on the local variance controls the linear coefficients according to local texture features, yielding better performance. Kordelas et al. [18] proposed a content-based GIF that employs two rectangular support regions of different sizes; the support region is selected according to the texture homogeneity of the local area around each pixel. In order to improve the accuracy at object edges and in areas with discontinuous disparities, and to reduce noise in the matching targets, Hamzah et al. [19] proposed an adaptive support weight based on iterative GIF. Moreover, Wu et al. [20] proposed a strategy to fuse GIF with an MST (minimum spanning tree) filter, embedding the local support-region-based GIF into the whole-image MST filter, which improved robustness in textureless and highly textured regions. Zhu et al. [21] introduced a weight into the energy function of GIF, thus treating the entire image as the support region; this resulted in better matching accuracy.
Adaptive cross-region-based guided image filtering (ACR-GIF) is a cost aggregation method currently adopted by many local stereo matching algorithms [22][23][24][25][26]. However, the weights of points in the adaptive cross-region [27] (ACR) are generally not taken into account, which affects the accuracy of the results. The main contributions of this paper, which address this issue, are summarized as follows: (1) An improved cost aggregation method is proposed.
Firstly, according to the structural characteristic of the ACR, the orthogonal weight is proposed, and then the matching cost volume is filtered using ACR-GIF with orthogonal weights (ACR-GIF-OW). (2) In order to improve the computational efficiency of the proposed method, an efficient weighted aggregation computing method based on orthogonal weights is proposed. (3) Combining ACR-GIF-OW with our recently proposed matching cost computation and disparity refinement methods, a local stereo matching algorithm is proposed.
The main contributions of this paper differ from those of our previous paper [28]. In the previous paper, our work focused on matching cost computation and disparity refinement, so a gradient calculation method and a multistep refinement method based on the ACR were proposed. The main contributions of this paper, by contrast, relate to cost aggregation, as mentioned above. The remainder of this paper is structured as follows. Related work using ACR-GIF is discussed in Section 2. The proposed stereo matching algorithm using ACR-GIF-OW is described in Section 3. Experimental results and discussions are presented in Section 4, and finally, Section 5 concludes this paper.

Related Work
In this section, we discuss cost aggregation methods using ACR-GIF as they are more relevant to our proposed approach. GIF is analyzed first since ACR-GIF is an improved version of GIF.
GIF utilizes support regions with fixed shapes and sizes. As a result, the matching accuracy in textureless areas or regions with discontinuous disparities is affected. Therefore, it has become imperative to acquire adaptive support regions according to the different regions and structures in images.
Due to its simple implementation and high computing efficiency, many local stereo matching algorithms use the ACR as the support region. Yang et al. [22] established a rectangular ACR whose boundary is determined by the endpoints of the support arms; an endpoint is the spot closest to the center pixel in the given direction where the color difference exceeds the threshold. This approach ensures that most pixels in the support region are similar to the center pixel. Zhu et al. [23] adopted both the color difference and the distance as conditions in constructing the rectangular ACR; the support arm extends only while both conditions are met. This method can effectively reduce outliers from occluded regions or areas with discontinuous depths in the support region.
In addition to the rectangular ACR, the arbitrary-shaped ACR has also been commonly employed. Xu et al. [24] used the arbitrary-shaped ACR-based GIF, where the length of the support arm is determined by the color similarity of RGB channels. Zhu et al. [25] added an exponential threshold to the decision rule for arm length in order to process textureless regions; subsequently, they proposed an adaptive threshold using image variance to address the issue of the support region not being constructed in the same image structure when the brightness changes. Furthermore, in order to improve the accuracy of textureless regions, Yan et al. [26] proposed a decision rule for arm length by using dual constraint linear variable thresholds to construct the arbitrary-shaped ACR.
The above works all put emphasis on how to construct the ACR; the weight of each point in the ACR is not considered. Different from them, our method takes the orthogonal weight of each point in the ACR into consideration.

The Proposed Algorithm
The stereo matching algorithm proposed in this paper mainly includes five steps: (1) input image preprocessing, (2) matching cost computation, (3) cost aggregation using ACR-GIF-OW, (4) disparity computation, and (5) disparity refinement. A flowchart of the proposed algorithm is shown in Figure 1. Each step is detailed in the following sections.

Input Image Preprocessing.
Since guided images are required in matching cost computation, it is necessary to preprocess the rectified input images. GIF has an edge-preserving property and a good smoothing effect, and its computation time is independent of the size of the support region. Hence, the guided images are obtained using GIF.

Matching Cost Computation.
To render the model more robust and achieve more accurate results, we adopt a matching cost computation method that combines the absolute differences (AD), census transformation [29], and the gradient.
AD is computed using the information on the RGB channels according to the following equation:

C_AD(m, d) = (1/3) Σ_{c∈{R,G,B}} |I_c^L(m) − I_c^R(m − d)|, (1)

where I_c^L and I_c^R denote channel c of the left and right images, respectively, and d is the disparity. The census transform encodes each point as a binary bit string by comparing it with its neighbors:

S_cen(k) = ⊗_{i∈W_k} ξ(I(k), I(i)), with ξ(I(k), I(i)) = 0 if I(i) ≤ I(k) and 1 otherwise. (2)

Here, ⊗ represents a bitwise concatenation operation, W_k is the window centered on point k, i is an arbitrary point in W_k, and I(i) and I(k) are the gray values of points i and k, respectively.
Subsequently, the Hamming distance of binary bit strings between corresponding points is computed to measure the similarity between them. We assume that S cen L (m) is the binary bit string of point m in the left image and S cen R (n) is the binary bit string of the corresponding point n in the right image when the disparity is d.
The Hamming distance between m and n can be expressed as

C_cen(m, d) = Hamming(S_cen^L(m), S_cen^R(n)). (3)

The gradient is calculated by our recently proposed method [28], which combines the RGB gradient of the input image with that of the guided image to compute the gradients in the x and y directions, respectively (the expressions, equation (4), are given in [28]); g_x and g_y represent the gradients in the x and y directions, and the superscripts I and G represent the input image and the guided image, respectively. By weighted fusion of the above-mentioned approaches, the matching cost computation function is acquired as follows:

C(m, d) = λ_AD · C_AD(m, d) + λ_cen · C_cen(m, d) + λ_gx · C_gx(m, d) + λ_gy · C_gy(m, d). (5)
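The census/Hamming part of this cost can be sketched in a few lines of Python. This is an illustrative implementation with our own function names; image borders wrap via np.roll for brevity, which a production implementation would treat explicitly.

```python
import numpy as np

def census_transform(img, win=3):
    """Census transform of a grayscale image: encode each pixel as a bit
    string of comparisons against its neighbors in a win x win window.
    Note: np.roll wraps at the borders (a simplification)."""
    r = win // 2
    out = np.zeros(img.shape, dtype=np.uint64)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)
            out = (out << np.uint64(1)) | (shifted > img).astype(np.uint64)
    return out

def hamming_cost(census_l, census_r, d):
    """Hamming distance between left-image bit strings and the right-image
    bit strings of the corresponding points n = m - d."""
    shifted = np.roll(census_r, d, axis=1)  # align census_r[m - d] at m
    xor = census_l ^ shifted
    # popcount each 64-bit code (slow but clear)
    return np.array([[bin(int(v)).count("1") for v in row] for row in xor])
```

Identical images at disparity zero give a zero cost everywhere, as expected for a similarity measure.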

Mathematical Problems in Engineering
Here, λ_AD, λ_cen, λ_gx, and λ_gy are the weights of AD, the census transformation, the gradient in the x direction, and the gradient in the y direction, respectively; C_AD, C_cen, C_gx, and C_gy are the matching costs of the corresponding approaches.

ACR Construction.
Points with similar color in the support region may arise from the same structure of the image; thus, these points tend to have similar disparities [27]. In order to ensure that only points with similar color are included in the support region, the ACR is adopted. Double thresholds on the distance and the color difference are used to restrict the arm length [30]. The decision rules are as follows:

(1) D_c(p, p_e) < C_1 and D_c(p_e, p_n) < C_1;
(2) D_d(p, p_e) < L_1;
(3) D_c(p, p_e) < C_2 if L_2 < D_d(p, p_e) < L_1.

Here, p_e is an arbitrary point on the support arm with center point p, p_n is the point preceding p_e in the direction of the support arm, D_c(p, p_e) = max_{c∈{R,G,B}} |I_c(p) − I_c(p_e)|, and D_d(p, p_e) = |p − p_e|. C_1, C_2, L_1, and L_2 are thresholds, with C_2 < C_1 and L_2 < L_1.
When the above rules are fulfilled, the support arm expands from the center point in each of the four directions, and the expansion stops when one of the decision rules is no longer satisfied. The ACR R(p) can be expressed as the union of all horizontal support arms H(q) whose center point q lies on the vertical support arm of point p, as shown in Figure 2.
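A minimal sketch of the arm extension, under the assumption that the double-threshold test takes the form described above (the function name, the (row, column, channel) image layout, and the exact rule ordering are our own simplifications):

```python
import numpy as np

def arm_length(I, y, x, dy, dx, C1, C2, L1, L2):
    """Length of the support arm of pixel (y, x) in direction (dy, dx).
    I is an H x W x 3 float image in [0, 1].  The arm extends while the
    color difference to the center stays below C1 within distance L1,
    and below the stricter C2 once the distance exceeds L2."""
    h, w, _ = I.shape
    n = 0
    while True:
        yy, xx = y + (n + 1) * dy, x + (n + 1) * dx
        if not (0 <= yy < h and 0 <= xx < w):
            break
        dc = np.abs(I[yy, xx] - I[y, x]).max()  # max over RGB channels
        dist = n + 1
        if dist >= L1 or dc >= C1:
            break
        if dist > L2 and dc >= C2:
            break
        n += 1
    return n
```

On a uniform image the arm grows until the distance threshold L1 stops it; a strong color edge stops it immediately.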

The Orthogonal Weight Calculation.
According to the construction process and structural feature of the ACR, any point reaches the center point first along a horizontal path and then along a vertical one. Therefore, the weight of each point relative to the center point can be computed by multiplying the weights between adjacent points on this path [31]. Since the weight can be decomposed into two parts (a horizontal weight and a vertical weight), we name it the orthogonal weight, as shown in Figure 3. Here, q(i, j) is an arbitrary point in R(p), p(x, y) is the center point, and HW_{j,y} and VW_{i,x} represent the horizontal and vertical weights of point q relative to point p, respectively. The orthogonal weight of q can thus be expressed as

W(q, p) = HW_{j,y} · VW_{i,x}. (6)

To obtain HW_{j,y} and VW_{i,x} conveniently, we construct the weight matrix of horizontally adjacent points, WH, and the weight matrix of vertically adjacent points, WV. The information on the RGB channels is used to compute the weights of adjacent points:

WH(h, w) = f(max_{c∈{R,G,B}} |I_c(h, w) − I_c(h, w + 1)|), (7)
WV(h, w) = f(max_{c∈{R,G,B}} |I_c(h, w) − I_c(h + 1, w)|), (8)

where h and w, respectively, represent the row and column indices of image I. The function f(t) in equations (7) and (8), given in equation (9), is a decreasing function of the color difference t in which the parameters δ and c are constants. The purpose of introducing f(t) is to avoid the problem that, when there is a significant local difference in intensity between adjacent pixels on the path, information tends to get lost [21].
After calculating the matrices WH and WV, the horizontal and vertical weights of q can be computed as

HW_{j,y} = Π_{k=min(j,y)}^{max(j,y)−1} WH(i, k), (10)
VW_{i,x} = Π_{k=min(i,x)}^{max(i,x)−1} WV(k, y), (11)

with HW_{y,y} = VW_{x,x} = 1. According to equation (10), a recursive form of the horizontal weight is

HW_{j,y} = HW_{j−1,y} · WH(i, j − 1) for j > y; HW_{j,y} = HW_{j+1,y} · WH(i, j) for j < y. (12)

Similarly, according to equation (11), a recursive form of the vertical weight is

VW_{i,x} = VW_{i−1,x} · WV(i − 1, y) for i > x; VW_{i,x} = VW_{i+1,x} · WV(i, y) for i < x. (13)

According to equations (12) and (13), HW_{j,y} and VW_{i,x} can be computed outward from the center point in both directions, such that the weight of the previous point is reused and only one multiplication is required per computation.
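The adjacent-weight matrices and the outward recursion for the horizontal weights can be sketched as follows. The decreasing function f is passed in as a parameter; the simple f(t) = exp(c·t) used in the test is a hypothetical stand-in, not the paper's exact f(t):

```python
import numpy as np

def adjacent_weight_matrices(I, f):
    """WH(h, w): weight between horizontally adjacent pixels (h, w), (h, w+1);
    WV(h, w): weight between vertically adjacent pixels (h, w), (h+1, w).
    I is H x W x 3; f maps a color difference to a weight."""
    WH = f(np.abs(I[:, 1:] - I[:, :-1]).max(axis=2))  # shape H x (W-1)
    WV = f(np.abs(I[1:] - I[:-1]).max(axis=2))        # shape (H-1) x W
    return WH, WV

def horizontal_weights(WH, i, y, ncols):
    """HW[j] = horizontal weight of point (i, j) relative to column y,
    computed recursively outward from the center so that each step
    reuses the previous weight and needs only one multiplication."""
    HW = np.ones(ncols)
    for j in range(y + 1, ncols):      # rightward: HW[j] = HW[j-1] * WH(i, j-1)
        HW[j] = HW[j - 1] * WH[i, j - 1]
    for j in range(y - 1, -1, -1):     # leftward:  HW[j] = HW[j+1] * WH(i, j)
        HW[j] = HW[j + 1] * WH[i, j]
    return HW
```

The recursion reproduces the products of equation (10): the weight at distance two from the center equals the product of the two adjacent-pair weights on the path.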

The Weighted Aggregation Computing Method.
On the basis of the discussion in Section 3.3.2 and inspired by the orthogonal integral image [27], we propose a weighted aggregation computing method based on orthogonal weights. Because the orthogonal weight can be decomposed, the process of weighted aggregation is decomposed into one-dimensional weighted aggregations along two orthogonal directions, as follows: (1) The weighted sum of the horizontal support arm of each point is computed. To improve the computing efficiency, for any point r(i, j) in the image, the weighted sums of the left and right support arms are computed separately:

WSL(i, j) = Σ_{k=beg_h}^{j} HW_{k,j} · img(i, k),
WSR(i, j) = Σ_{k=j}^{end_h} HW_{k,j} · img(i, k),

where WSL(i, j) and WSR(i, j) are the weighted sums of the left and right support arms of point r, respectively; beg_h and end_h are the beginning and ending positions of the horizontal support arm of point r, respectively; and img is a single-channel image.
For RGB images, one of the channels is selected according to the need for calculation.
(2) The weighted sum of the whole horizontal support arm of point r is computed and stored in WSH:

WSH(i, j) = WSL(i, j) + WSR(i, j) − img(i, j),

where the center value img(i, j) is subtracted because it is counted in both arms. (3) The weighted sums of the up and bottom vertical support arms of each point p(x, y) are computed from WSH:

WSU(x, y) = Σ_{k=beg_v}^{x} VW_{k,x} · WSH(k, y),
WSD(x, y) = Σ_{k=x}^{end_v} VW_{k,x} · WSH(k, y),

where WSU(x, y) and WSD(x, y) are the weighted sums of the up and bottom support arms of point p, respectively; beg_v and end_v are the beginning and ending positions of the vertical support arm of point p, respectively. (4) The weighted aggregation result of the ACR centered at point p is obtained as

WA(p) = WSU(x, y) + WSD(x, y) − WSH(x, y).
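The four-step decomposition can be sketched for a single center pixel as follows (the array layout, the arm representation, and all function names are our own simplifications; a full implementation would precompute the horizontal pass for every pixel):

```python
import numpy as np

def weighted_sum_1d(vals, w_adj, lo, hi, c):
    """One-dimensional weighted sum of vals[lo..hi] around center index c.
    w_adj[k] is the weight between positions k and k+1; the weight of a
    position relative to c is the product of adjacent weights on the path."""
    wsl = vals[c]          # weighted sum of the left (or up) arm
    acc = 1.0
    for k in range(c - 1, lo - 1, -1):
        acc *= w_adj[k]
        wsl += acc * vals[k]
    wsr = vals[c]          # weighted sum of the right (or bottom) arm
    acc = 1.0
    for k in range(c + 1, hi + 1):
        acc *= w_adj[k - 1]
        wsr += acc * vals[k]
    return wsl + wsr - vals[c]   # center is counted in both arms

def aggregate_acr(img, WH, WV, h_arms, v_arms, y, x):
    """Weighted aggregation over the ACR of center (y, x): horizontal
    weighted sums (WSH) for each point on the vertical arm, then one
    vertical weighted sum of those.  h_arms[i] = (left, right) arm
    lengths of point (i, x); v_arms = (up, down) arms of the center."""
    up, down = v_arms
    wsh = np.zeros(img.shape[0])
    for i in range(y - up, y + down + 1):
        l, r = h_arms[i]
        wsh[i] = weighted_sum_1d(img[i], WH[i], x - l, x + r, x)
    return weighted_sum_1d(wsh, WV[:, x], y - up, y + down, y)
```

With all adjacent weights equal to one, the method degenerates to a plain sum over the ACR, which gives a quick sanity check.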

ACR-GIF-OW.
On the basis of the computing method described in Section 3.3.3, we adopt ACR-GIF-OW as the cost aggregation method. Since a color guidance image yields a better edge-preserving effect [14], we select the color image as the guidance image. We denote the guidance image as I and the filtering input as the matching cost volume C. The linear model coefficients a_p and b_p can be acquired by minimizing the weighted local energy function

E(a_p, b_p) = Σ_{q∈R_p} W(q, p) · ((a_p^T I(q) + b_p − C(q, d))² + ε a_p^T a_p),

where R_p is the ACR centered at pixel p, ε is a regularization parameter, and W(q, p) is the orthogonal weight of point q defined in equation (6). The solution is

a_p = (Σ_p + εU)^{−1} ((Σ_{q∈R_p} W(p, q) · I(q) C(q, d)) / (Σ_{q∈R_p} W(p, q)) − μ_p C̄_p),
b_p = C̄_p − a_p^T μ_p.

Here, μ_p^c = (Σ_{q∈R_p} W(p, q) · I_c(q)) / (Σ_{q∈R_p} W(p, q)) for c ∈ {R, G, B}, C̄_p = (Σ_{q∈R_p} W(p, q) · C(q, d)) / (Σ_{q∈R_p} W(p, q)), Σ_p is the 3 × 3 weighted covariance matrix of I in R_p, and U is the 3 × 3 identity matrix. The linear model is then used to compute the filtered result, which is also the result of cost aggregation:

CA(q, d) = ā_q^T I(q) + b̄_q,

where ā_q = (1/Σ_{p∈R_q} W(p, q)) Σ_{p∈R_q} a_p, b̄_q = (1/Σ_{p∈R_q} W(p, q)) Σ_{p∈R_q} b_p, and CA is the matching cost volume after cost aggregation. A comparison of results after cost aggregation using ACR-GIF [28] without/with orthogonal weights is shown in Figure 4.
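For reference, a deliberately simplified guided filter with grayscale guidance and uniform box windows (i.e., without the ACR and without the orthogonal weights) illustrates how the linear-model coefficients a_p and b_p arise; this is a sketch of the He et al. [14] formulation, not of ACR-GIF-OW itself:

```python
import numpy as np

def guided_filter_gray(I, C, radius, eps):
    """Simplified guided filter: grayscale guidance I, cost slice C,
    uniform square windows with unit weights."""
    def box(x):
        # mean over a (2*radius+1)^2 window, edge-padded (slow but clear)
        k = 2 * radius + 1
        xp = np.pad(x, radius, mode='edge')
        out = np.empty_like(x, dtype=np.float64)
        H, W = x.shape
        for i in range(H):
            for j in range(W):
                out[i, j] = xp[i:i + k, j:j + k].mean()
        return out

    mean_I, mean_C = box(I), box(C)
    cov_IC = box(I * C) - mean_I * mean_C   # covariance of I and C per window
    var_I = box(I * I) - mean_I * mean_I    # variance of the guidance
    a = cov_IC / (var_I + eps)              # linear coefficients a_p
    b = mean_C - a * mean_I                 # offsets b_p
    return box(a) * I + box(b)              # averaged model applied per pixel
```

A constant input cost slice passes through unchanged (a = 0, b = the constant), which is the expected degenerate behavior of the linear model.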

Disparity Computation.
We use the winner-take-all strategy [13] for the disparity computation, in which the disparity corresponding to the minimum matching cost of each point in CA is selected as the initial disparity:

d_ini(m) = arg min_{d∈D} CA(m, d),

where d_ini(m) is the initial disparity of point m and D is the set of candidate disparities.
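Winner-take-all reduces to an argmin over the disparity axis of the aggregated cost volume (a sketch assuming a dense (H, W, D) array; the function name is ours):

```python
import numpy as np

def winner_take_all(CA):
    """CA: aggregated cost volume of shape (H, W, D).  The initial
    disparity of each pixel is the index of its minimum cost."""
    return np.argmin(CA, axis=2)
```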

Disparity Refinement.
There are many outliers in the initial disparity map that need to be detected and corrected by disparity refinement. In this study, our recently proposed multistep refinement method [28] is adopted; each step is elucidated in the following sections.

Left-Right Consistency Check and Outlier Classification.
The left-right consistency check judges whether the disparities of a pair of corresponding points are consistent:

d_ini^L(x_0, y_0) = d_ini^R(x_0, y_0 − d_ini^L(x_0, y_0)),

where d_ini^L and d_ini^R represent the initial disparity maps of the left and right images, respectively, and x_0 and y_0 are the point indices. Points violating this condition are marked as outliers.
Subsequently, the detected outliers are divided into two classes: those that have a corresponding point in the right image and those that do not. The first class is called corresponding outliers, and the second is called no-corresponding outliers. The steps described below correct the two classes separately.
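A direct sketch of the check (integer disparities; each left pixel is shifted by its own disparity to index the right map; the function name is ours):

```python
import numpy as np

def lr_check(dL, dR):
    """Left-right consistency check on integer disparity maps.
    Returns a boolean mask: True where the left disparity agrees with
    the right disparity of the corresponding point."""
    H, W = dL.shape
    ok = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            xr = x - dL[y, x]          # corresponding column in the right image
            if 0 <= xr < W and dL[y, x] == dR[y, xr]:
                ok[y, x] = True
    return ok
```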

ACR Voting.
To replace the disparities of outliers with those of reliable points, we first use ACR voting, in which the total number of votes of the reliable points in an outlier's ACR and the highest number of votes among the different disparities are counted. We then consider the following conditions:

N_T > N, (23)
N_max / N_T > P, (24)

where N_T is the total number of votes, N_max is the highest number of votes among the different disparities, and N and P are thresholds. If both equations (23) and (24) are satisfied, the disparity corresponding to the highest number of votes replaces the outlier's disparity, and the outlier is marked as reliable. In order to deal with as many outliers as possible, this step is iterated five times.
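The voting step can be sketched as follows (the ACR is passed as an explicit point list for clarity, and the function name is ours):

```python
def acr_vote(disp, reliable, region, N, P):
    """Histogram vote over the reliable points of one outlier's ACR.
    region is a list of (y, x) points.  Returns the winning disparity,
    or None when the totals fail N_T > N or N_max / N_T > P."""
    votes = {}
    for (y, x) in region:
        if reliable[y, x]:
            d = disp[y, x]
            votes[d] = votes.get(d, 0) + 1
    if not votes:
        return None
    n_total = sum(votes.values())                       # N_T
    d_best, n_max = max(votes.items(), key=lambda kv: kv[1])  # N_max
    if n_total > N and n_max / n_total > P:
        return d_best
    return None
```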

ACR Four-Direction Propagation Interpolation.
For corresponding outliers, the nearest reliable points are found in their own ACR along the directions of the four support arms. The corresponding disparities are marked as d_RL, d_RR, d_RU, and d_RD, respectively.
Then, the disparities of these outliers are replaced according to equation (25), and they are marked as reliable points. To deal with as many outliers as possible, this step is iterated three times.

Two-Direction Propagation Interpolation.
For the remaining corresponding outliers, the nearest reliable points are found along the left and right directions, and the corresponding disparities are recorded as d_l and d_r, respectively. Then, the disparities of these outliers are replaced according to equation (26), and they are marked as reliable points.

No-Corresponding Outliers Interpolation.
After the above-mentioned steps, the remaining outliers are mainly no-corresponding outliers. Since such outliers usually appear in the leftmost area of the image, we use one-direction propagation interpolation; that is, the nearest reliable point is found to the right of the outlier. The disparity of the outlier is then replaced, and the outlier is marked as reliable.

Subpixel Refinement.
To reduce the error caused by the discrete disparity levels, an approach based on quadratic polynomial interpolation is used for subpixel refinement:

d_sub(m) = d_ini(m) − (CA(m, d_ini(m) + 1) − CA(m, d_ini(m) − 1)) / (2 · (CA(m, d_ini(m) + 1) + CA(m, d_ini(m) − 1) − 2 · CA(m, d_ini(m)))).
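A sketch of quadratic-polynomial subpixel refinement for a single pixel's cost slice, assuming the standard parabola-vertex formula (the function name and boundary handling are ours):

```python
def subpixel_refine(cost, d):
    """Fit a parabola through the costs at d-1, d, d+1 (d is the integer
    minimum) and return the disparity of the parabola's vertex."""
    if d <= 0 or d >= len(cost) - 1:
        return float(d)                  # no neighbors on one side
    c_m, c_0, c_p = cost[d - 1], cost[d], cost[d + 1]
    denom = c_p + c_m - 2 * c_0
    if denom == 0:
        return float(d)                  # degenerate (flat) cost
    return d - (c_p - c_m) / (2 * denom)
```

On an exactly parabolic cost slice, the refinement recovers the true continuous minimum.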

Experimental Results and Discussions
We carried out our experiments on the Middlebury evaluation platform [32], whose dataset includes two parts: training sets and test sets. Each part has fifteen image pairs with different resolutions of at least 1300 × 1100 pixels. Owing to the high resolution, complicated scene structures, and different lighting or exposure conditions, the results on this dataset reflect the robustness and accuracy of an algorithm well. The parameters and thresholds in the proposed stereo matching algorithm are set as follows: δ = 1/510, c = −3, λ_AD = 30/255, λ_cen = 45/255, λ_gx = 5/255, λ_gy = 15/255, ε = 0.01², C_1 = 15/255, C_2 = 12/255, L_1 = max(H, W)/30, L_2 = max(H, W)/60, N = 40, and P = 0.5, where H and W represent the height and width of the input image, respectively. Among them, the values of λ_AD, λ_cen, λ_gx, and λ_gy are taken from [23]; the values of C_1, C_2, L_1, L_2, N, and P are taken from [28]; and the value of ε is the same as in [15].

Efficiency of the Proposed Weighted Aggregation Computing Method.
In order to verify the effectiveness of the proposed weighted aggregation computing method, the computation time of straightforward computing (the weight of each point in the ACR is computed according to equations (12) and (13) and then summed by traversal) is compared with that of the computing method described in Section 3.3.3. The experimental environment is Matlab 2018b on a computer with an Intel Core i7-8750H CPU and 16 GB of memory. The results on the training sets are shown in Figure 5. The chart illustrates that the computation time of the proposed computing method is clearly lower than that of straightforward computing. Among the images, the computation time for Shelves is reduced the most, by 80.2%; the computation times for Recycle, Vintage, Jadeplant, and Adirondack are reduced by 79.9%, 79.1%, 78.9%, and 78%, respectively. Owing to their relatively low resolution and disparity levels, the percentage reductions for Teddy and ArtL are comparatively low, at 64.6% and 55.6%, respectively. On average, the computation time is reduced by 72.7%. These data indicate that, compared with straightforward computing, the proposed computing method can effectively reduce computing time and improve computing efficiency.

Comparison of ACR-GIF and ACR-GIF-OW.
To verify the effect of the proposed cost aggregation method, two stereo matching algorithms that adopt, respectively, the ACR-GIF used in [28] and ACR-GIF-OW for cost aggregation are compared in terms of their disparity results and time overhead. Except for cost aggregation, all other steps of the two algorithms are identical.

Comparison of Disparity Results.
The metric bad 2.0 is used to quantitatively evaluate the accuracy of the disparity results. It is the default metric of the Middlebury evaluation platform and represents the percentage of bad pixels whose disparity errors are greater than 2.0 pixels. The results on the training sets are shown in Figure 6.
As observed in Figure 6(a), in nonoccluded regions, the values of bad 2.0 for all images except Shelves and Teddy are reduced to varying degrees. Among them, the values for Motorcycle and MotorcycleE are reduced by more than 40%; the values for Adirondack, PlaytableP, and Recycle by more than 30%; and the values for Piano, Pipes, and Vintage by more than 20%. From Figure 6(b), we can see that, in all regions, the values of bad 2.0 for all images except ArtL, Shelves, and Teddy are also reduced to varying degrees. Among them, the values for Motorcycle and MotorcycleE are reduced by more than 30%; the values for Adirondack, PlaytableP, and Recycle by more than 20%; and the values for Piano, Playtable, and Vintage by more than 15%. Figure 7 shows the comparison of the bad 2.0 weighted averages on the training sets, obtained from the Middlebury evaluation platform (the weight of each image is given by the platform). It can be seen from Figure 7 that the value for ACR-GIF-OW is evidently lower than that for ACR-GIF: the weighted average is reduced by 24.1% in nonoccluded regions and 16.3% in all regions.
According to the above results, we conclude that, compared with ACR-GIF, ACR-GIF-OW obtains significantly more accurate disparity results in both nonoccluded regions and all regions.
Next, to compare the results of the two algorithms more intuitively, we select three images from the training sets and compare the disparity maps and corresponding error maps acquired by the two algorithms, as shown in Figure 8.
We find that, in the error maps of ACR-GIF-OW, the black regions inside the red boxes are clearly smaller in area (black indicates disparity errors greater than 2.0 pixels), and these red boxes mainly correspond to weakly textured and textureless regions of the image. This indicates that ACR-GIF-OW improves the disparity accuracy of these regions, thereby improving the overall disparity accuracy.

Furthermore, the performance on low-texture, repetitive-pattern, plain-color, and discontinuous regions is compared, as shown in Figure 9.
It can be seen in Figure 9 that the performance of ACR-GIF-OW is clearly better than that of ACR-GIF (compare the areas of the black regions in the red boxes), especially in low-texture, plain-color, and discontinuous regions.

Comparison of Time Overhead.
Using the same experimental environment and computer as described in Section 4.1, the time overheads of ACR-GIF and ACR-GIF-OW are compared. The result is shown in Figure 10. The chart illustrates that the time overhead of ACR-GIF-OW is higher than that of ACR-GIF, owing to the weight computation. Among the images, the time overhead of Teddy has the lowest growth rate, 4.5%, and that of Adirondack the highest, 17.6%. The average growth rate of the time overhead on the training sets is 12.7%.
Combining the results in Sections 4.2.1 and 4.2.2, we can conclude that, relative to its modest increase in time overhead, ACR-GIF-OW achieves an obvious improvement in disparity accuracy. Thus, considering both the accuracy and the time overhead, the proposed method is advantageous over ACR-GIF.

Analysis of Parameter Setting.
Parameters δ and c are the two key parameters in the orthogonal weight calculation. Figure 11 shows their effect under different settings. Figure 11(a) indicates that when δ > 1/510, the disparity accuracy in both nonoccluded and all regions becomes worse, whereas for smaller values the accuracy remains unchanged. Figure 11(b) reveals that the disparity accuracy is best in both nonoccluded and all regions when c = −3. According to these observations, the best settings of δ and c are 1/510 and −3, respectively.

Effect of Each Step in the Proposed Algorithm.
The proposed algorithm is composed of several steps. To analyze how each step affects the final result, the weighted average of avgerr on the training sets is used in addition to bad 2.0. Avgerr is another metric that measures the average absolute disparity error in pixels. The results after performing each step are shown in Figure 12. Figure 12 presents the contribution of each step to the reduction of disparity errors in both nonoccluded and all regions. After performing cost aggregation (CA), the value of bad 2.0 in nonoccluded and all regions is decreased by 39.4% and 31.1%, respectively, and the value of avgerr by 56.4% and 34.4%, respectively. After performing disparity refinement (DR), the value of bad 2.0 is reduced by 27.7% and 22.9% in nonoccluded and all regions, respectively, and the value of avgerr by 29.5% and 47.6%, respectively.
Moreover, Figure 13 shows the effect of each step in DR. The charts indicate that the contribution of each step differs across metrics and regions. For bad 2.0, Step 5 is the most effective, while for avgerr the errors are most significantly reduced by Step 1. Steps 2, 3, and 4 are more effective in all regions than in nonoccluded regions. Thus, the combination of these steps guarantees a better result.

Comparison with Other Local Stereo Matching Algorithms.
To verify the performance of the local stereo matching algorithm proposed in this paper, we select seven state-of-the-art local algorithms for comparison, namely, DAWA-F [33], FASW [20], IEBIMst [34], ADSM [35], DoGGuided [36], IGF [37], and ISM [38]. The disparity map comparison for five stereo images from the Middlebury datasets is shown in Figure 14.
To make a quantitative comparison of disparity results, the metric bad 2.0 is employed again. The comparison results on the whole datasets are shown in Tables 1 and 2, where bold font indicates the best results. The weighted average shown in the last row is the weighted average over the training sets and test sets; the weights are given by the Middlebury evaluation platform.
The results in Tables 1 and 2 indicate that, in both nonoccluded regions and all regions, the proposed method obtains the best result more often than the other local algorithms, and its remaining results are also relatively good. Both of its weighted average values are the best as well. Furthermore, for image pairs with different illuminations, such as ArtL, PianoL, and DjembeL, and with different exposures, such as MotorcycleE and Classroom2E, better results are acquired by the proposed algorithm. This demonstrates that the proposed algorithm is more robust when the illumination or exposure changes within a pair of images. To summarize, the performance of the proposed algorithm is evidently better than that of the other seven state-of-the-art local algorithms.

Comparison with Other Nonlocal Stereo Matching Algorithms.
In order to make a more comprehensive comparison, we also select six state-of-the-art nonlocal algorithms, namely, DDL [39], LS_ELAS [40], TSGO [41], DSGCA [42], SIGMRF [43], and SPPSMNet [44]. Among them, DSGCA, SIGMRF, and SPPSMNet are based on deep learning. The disparity map comparison for the same five stereo images is shown in Figure 15. The comparison results of bad 2.0 are shown in Tables 3 and 4, where bold font again indicates the best results. Similar to Tables 1 and 2, the results in Tables 3 and 4 indicate that the robustness and performance of the proposed algorithm are clearly better than those of the other six state-of-the-art nonlocal algorithms.

Conclusions
In this study, an improved cost aggregation method is proposed in which the matching cost volume is filtered by ACR-GIF-OW. Different from other methods that adopt ACR-GIF, the proposed method takes the orthogonal weight of each point in the ACR into consideration. To improve the computational efficiency of the proposed method, a weighted aggregation computing method based on orthogonal weights is proposed. Moreover, a local stereo matching algorithm using ACR-GIF-OW is proposed as well. Experimental results demonstrate that, compared with ACR-GIF, the disparity accuracy of ACR-GIF-OW is significantly improved at the cost of a small increase in time overhead, and the stereo matching algorithm proposed in this paper outperforms other state-of-the-art local and nonlocal algorithms. In future work, we will introduce the orthogonal weight into the disparity refinement to further improve the disparity accuracy.

Data Availability
The dataset used to support the findings of this study is included in the article and cited at the relevant places within the text as [32].

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.