Generalized belief propagation (GBP) is a region-based belief propagation algorithm that achieves good convergence in Markov random fields. However, its computational cost is too high for practical engineering applications. This paper proposes a method to improve the efficiency of GBP. A caching technique and a chessboard passing strategy are used to speed up the algorithm. In addition, the direction set method is used to reduce the complexity of computing clique messages from quartic to cubic. With this strategy the processing speed is greatly increased. Moreover, this is the first attempt to apply GBP to the stereo matching problem. Experiments show that the proposed algorithm achieves a speedup of more than 15 times on a typical stereo matching problem and infers a more plausible result.
1. Introduction
Many engineering problems related to computer vision, statistical physics, signal processing, and artificial intelligence can be formulated as an inference problem in probabilistic graphical models such as Bayesian networks or Markov random fields (MRF). The goal is to find the maximum a posteriori (MAP) configuration [1]. However, finding the exact solution is NP-hard, so approximate inference methods such as graph cuts or message passing algorithms are used instead. The most popular message passing algorithm is belief propagation (BP). Owing to their flexibility and efficiency, BP and its variants have recently become widely used, especially in image restoration, optical flow, and stereo.
BP is an optimization tool first proposed by Pearl for singly connected Bayesian networks [2] and extended to loopy graphs such as MRFs in the last decade. Its virtue is that it computes marginal probabilities for graphical models, at least approximately, in a time that grows only linearly with the number of nodes in the system [3]. In the BP algorithm, every variable starts with the same initial messages, iteratively collects the messages passed from its neighboring variables, computes a new message for each neighbor, and passes the new messages back until convergence. On a factor graph or Bayesian network without loops, BP performs exact inference for every variable. However, on highly connected graphs with many conflicting interactions, such as the MRF of stereo matching, convergence becomes a tricky issue, and the precision of the resulting configuration varies with the cyclicity of the graph. Much work has been done to encourage and accelerate convergence, with some progress. However, because BP has no convergence guarantee on graphical models with loops, its development has been slow. On the other hand, generalized belief propagation (GBP), proposed by Yedidia et al. [4], has better convergence properties than BP and has recently received more attention.
GBP can be considered a variant of standard BP and is also an instance of the cluster variation method. According to the literature, BP can only converge to a stationary point of the Bethe free energy, while GBP can converge to a more accurate stationary point of the Kikuchi free energy [5]. This gives GBP better convergence than BP. The price of good convergence, however, is a high computational cost. In terms of time complexity, BP in the optimized version of [6] reaches linear complexity, while canonical GBP has quartic complexity. This has limited the applicability of GBP to small-scale problems such as image denoising and image restoration [1], and has kept it away from more complicated problems such as stereo matching, even for a small image pair.
To accelerate the GBP algorithm, several optimization methods have been proposed recently. Petersen et al. proposed two strategies for fast GBP for MAP estimation on 2D and 3D grid-like MRFs [7]. The first is a caching method that significantly reduces the number of multiplications during GBP inference. The second speeds up the computation of the MAP estimate of GBP cluster messages by presorting their factors and limiting the number of possible combinations. Pawan and Torr also provide a fast, memory-efficient GBP method [8].
However, these methods are still far from sufficient for the stereo matching problem. This paper introduces the direction set method into the pairwise message computation stage to make GBP more efficient. With the proposed method, the time complexity decreases from quartic to cubic. Furthermore, this is the first attempt to apply GBP to the stereo matching problem. For completeness, MRFs and BP are briefly introduced in the next section.
The remainder of this paper is organized as follows. Section 2 sketches the basic theory. Section 3 defines GBP with min-sum messaging and its caching structure. Section 4 presents a detailed description of the proposed strategies for GBP optimization. Section 5 gives the experiments and results for stereo matching, and Section 6 summarizes the findings.
2. The Basic Principle
Humans understand a scene mainly through the spatial and visual information received through the eyes. Such information, for example about regions or objects, relies mainly on contextual constraints and is essential for interpretation. Context-dependent objects such as images can be modeled in a convenient and consistent way through MRF theory, which characterizes the mutual influences among such entities using conditional MRF distributions [9].
MRFs were first introduced into computer vision in [10] and have dominated the fields of image processing and computer vision since the early 1980s. As the most popular type of prior model for gridded image-like data, which includes not only regular natural images but also two-dimensional fields such as motion or depth maps, as well as binary fields such as those in image restoration and segmentation, MRFs provide a mathematical foundation for the characterization of contextual constraints and the derivation of the probability distribution of interacting features [9].
Without loss of generality, let $M = \{1, \ldots, m\}$ be a set of indexes, $P = \{p_1, \ldots, p_m\}$ be a set of observed nodes, and $L = \{l_1, \ldots, l_n \mid n \le m\}$ be a set of labels. All labels are assumed to be discrete.
$N = \{N_i \mid i \in M\}$ represents the neighborhood system, which indicates the interrelationship between nodes, that is, the order of the MRF. Recently, a learned high-order MRF model named Fields of Experts has been proposed, which can capture richer priors. However, for simplicity we use a first-order MRF (also called a pairwise MRF). Figure 1 shows a sample of the MRF used in this paper.
Figure 1: The model of a Markov random field with a four-connected (N4) neighborhood (also called a pairwise MRF). The dashed circles are the observed nodes, while the white circles are the unobserved labels.
Many computer vision problems can be formulated as a labeling problem in which the solution assigns a label from the set $L$ to each of the nodes in $P$. This assignment can be represented by a mapping function $F: P \to L$ with $F = \{f_1, \ldots, f_m\}$. It has been proved that the joint probability of an MRF is a Gibbs distribution (the Hammersley-Clifford theorem). Moreover, by the Markov property, the conditional probability $\Pr(f_{p_m})$ depends only on its neighborhood $N_{p_m}$, which means that

$$\Pr(f_{p_m} \mid P - p_m) = \Pr(f_{p_m} \mid N_{p_m}), \quad \forall p_m \in P. \tag{2.1}$$
According to Bayes’ rule, the posterior distribution over the unknowns $x$ given the observations $y$, obtained by combining the likelihood $p(y \mid x)$ with a prior $p(x)$, is

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}. \tag{2.2}$$

Taking the negative logarithm of both sides gives

$$-\log p(x \mid y) = -\log p(y \mid x) - \log p(x) + \log p(y). \tag{2.3}$$

Here $\log p(y)$ is a constant that makes $p(x \mid y)$ integrate to 1. To find the MAP solution, we simply minimize (2.3), which can also be treated as an energy function:

$$E(x, y) = E_d(x, y) + E_s(x), \tag{2.4}$$

where $E_d(x, y)$ is the data penalty and $E_s(x)$ is the smoothness penalty.
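As a small numerical check of the step from Bayes' rule to the energy function, the following sketch uses a hypothetical two-state unknown with made-up likelihood and prior values (they are illustrative assumptions, not values from this paper). It confirms that maximizing the posterior and minimizing the energy select the same state.

```python
import math

# Hypothetical toy values (illustrative only): two candidate states of x.
likelihood = {"a": 0.2, "b": 0.6}   # p(y | x) for the observed y
prior = {"a": 0.7, "b": 0.3}        # p(x)

# Evidence p(y) and posterior p(x | y) from Bayes' rule.
evidence = sum(likelihood[x] * prior[x] for x in prior)
posterior = {x: likelihood[x] * prior[x] / evidence for x in prior}

# Energy = data penalty + smoothness (prior) penalty. The term log p(y)
# is dropped because it does not depend on x.
energy = {x: -math.log(likelihood[x]) - math.log(prior[x]) for x in prior}

map_state = max(posterior, key=posterior.get)
min_energy_state = min(energy, key=energy.get)
print(map_state, min_energy_state)
```

Because the logarithm is monotonic and the evidence is constant in $x$, the two selections agree for any choice of positive likelihood and prior values.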
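Recalling (2.1), (2.4) can be rewritten as

$$E(f_p, N_p) = E_d(f_p, N_p) + E_s(f_p), \tag{2.5}$$

where

$$E_s(f_p) = \sum_{q \in N_p} E_s(f_p, f_q). \tag{2.6}$$

Therefore, the energy function is

$$E(F) = \sum_{p \in P} \sum_{q \in N_p} E_s(f_p, f_q) + \sum_{p \in P} E_d(f_p, N_p). \tag{2.7}$$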
In a large label space, the large number of variables and the various uncertainties make it a nontrivial task [11] to perform global inference using local information. For this reason, many approximate inference algorithms have been proposed to find an approximate MAP estimate instead of the exact answer. The inference problem can usually be mapped to an energy minimization problem, which has a solid mathematical foundation in the literature. In the last few years, two approximate algorithms have become prominent in MRF inference owing to their efficiency and comparatively high accuracy, namely, graph cuts (GC) [12] and BP [6, 13].
In standard belief propagation on a pairwise MRF, a variable $m_{ij}(x_j)$ can be thought of as a “message” from a node $i$ to its neighbor node $j$ that contains information about what state node $j$ should be in. The message is a vector whose dimensionality equals the number of possible labels. The value of each dimension indicates how likely the corresponding label is for the node.
Recalling (2.1) and writing $\Pr(f_{p_m})$ as $\Pr(p_i)$, we have

$$\Pr(p_i) = \prod_{(i,j) \in N} \phi_{ij}(p_i, p_j) \prod_i \phi_i(x_i, y_i), \tag{2.8}$$

where $\phi_{ij}(p_i, p_j)$ is the pairwise interaction potential and $\phi_i(x_i, y_i)$ is the “local evidence”.
Usually, the message must be nonnegative. A high message value shows that the node “believes” the posterior probability of $x_j$ is very high. The message update rule is

$$m_{ij}^{t}(x_j) = \sum_{x_i} \phi_i(x_i)\, \varphi_{ij}(x_i, x_j) \prod_{k \in N_i \setminus j} m_{ki}^{t-1}(x_i), \tag{2.9}$$

where $t$ is the iteration number, as shown in Figure 2.
Figure 2: Illustration of message passing in BP. $m_{ij}^{t}(x_j)$ is a message from node $i$ to its neighboring node $j$ that indicates what state node $j$ should be in.
The belief is the product of the “local evidence” of the node and all messages sent to this node:

$$b_i(x_i) = k\, \phi_i(x_i) \prod_{j \in N_i} m_{ji}(x_i), \tag{2.10}$$

where $k$ is a normalization constant.
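The message update and belief computation described above can be sketched on a tiny example. The following is a minimal illustration, not the implementation used in this paper: it assumes a hypothetical 3-node chain with two labels and made-up potentials, runs a few synchronous sum-product sweeps, and compares the beliefs with marginals obtained by brute-force enumeration (on a loop-free graph the two should agree).

```python
from itertools import product

# Hypothetical 3-node chain 0 - 1 - 2 with two labels per node.
n_labels = 2
phi = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]   # local evidence phi_i(x_i)
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def psi(xi, xj):
    """Pairwise potential that favours equal labels (assumed for illustration)."""
    return 2.0 if xi == xj else 1.0

# messages[(i, j)][xj] is the message from node i to node j.
messages = {(i, j): [1.0] * n_labels for i in neighbors for j in neighbors[i]}

for _ in range(5):  # a few synchronous sweeps are enough on this chain
    new_messages = {}
    for (i, j) in messages:
        out = []
        for xj in range(n_labels):
            total = 0.0
            for xi in range(n_labels):
                incoming = 1.0
                for k in neighbors[i]:
                    if k != j:  # product over neighbours of i except j
                        incoming *= messages[(k, i)][xi]
                total += phi[i][xi] * psi(xi, xj) * incoming
            out.append(total)
        scale = sum(out)
        new_messages[(i, j)] = [v / scale for v in out]  # normalise for stability
    messages = new_messages

def belief(i):
    """Normalised product of local evidence and all incoming messages."""
    b = [phi[i][x] for x in range(n_labels)]
    for j in neighbors[i]:
        b = [b[x] * messages[(j, i)][x] for x in range(n_labels)]
    z = sum(b)
    return [v / z for v in b]

def exact_marginal(i):
    """Marginal of node i by enumerating all joint configurations."""
    m = [0.0] * n_labels
    for xs in product(range(n_labels), repeat=3):
        w = phi[0][xs[0]] * phi[1][xs[1]] * phi[2][xs[2]]
        w *= psi(xs[0], xs[1]) * psi(xs[1], xs[2])
        m[xs[i]] += w
    z = sum(m)
    return [v / z for v in m]
```

Normalizing each message only rescales it, so the normalized beliefs are unaffected; on graphs with loops the same update is applied, but the beliefs are then only approximations.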
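The standard BP described above is also called sum-product BP. Another variant that is simpler and easier to use is max-product BP (or max-sum in the log domain). In max-product BP, written here in the log domain with the potentials and messages replaced by their logarithms, (2.9) and the belief are rewritten as

$$m_{ij}^{t}(x_j) = \max_{x_i}\Bigl(\phi_i(x_i) + \varphi_{ij}(x_i, x_j) + \sum_{k \in N_i \setminus j} m_{ki}^{t-1}(x_i)\Bigr), \qquad b_i(x_i) = k\Bigl(\sum_{j \in N_i} m_{ji}(x_i) + \phi_i(x_i)\Bigr). \tag{2.11}$$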
The belief indicates which state the node is most likely to be in. Although BP is an efficient inference algorithm for MRFs with loops, it can only converge to the stationary points of the Bethe approximation of the free energy, in which each region contains at most two nodes. As discussed above, GBP can give more accurate inference than BP. In the next section, we extend BP to GBP.
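The max-sum variant can be sketched in the same way. The example below is an illustrative assumption rather than the paper's code: it uses a hypothetical 3-node chain with three labels and made-up log-domain potentials, runs max-sum message passing, decodes each node from its belief, and compares the result with the best configuration found by exhaustive enumeration.

```python
from itertools import product

n_labels = 3
# Hypothetical log-domain local evidence for a 3-node chain 0 - 1 - 2.
log_phi = [[0.0, -1.0, -2.0], [-1.5, 0.0, -0.5], [-2.0, -0.3, 0.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

def log_psi(xi, xj):
    """Log-domain pairwise term penalising label differences (assumed)."""
    return -0.8 * abs(xi - xj)

# messages[(i, j)][xj] is the log-domain message from node i to node j.
messages = {(i, j): [0.0] * n_labels for i in neighbors for j in neighbors[i]}

for _ in range(5):  # synchronous sweeps; the old messages are read throughout
    messages = {
        (i, j): [
            max(
                log_phi[i][xi] + log_psi(xi, xj)
                + sum(messages[(k, i)][xi] for k in neighbors[i] if k != j)
                for xi in range(n_labels)
            )
            for xj in range(n_labels)
        ]
        for (i, j) in messages
    }

def decode():
    """Pick, for every node, the label with the largest max-sum belief."""
    labels = []
    for i in neighbors:
        b = [
            log_phi[i][x] + sum(messages[(j, i)][x] for j in neighbors[i])
            for x in range(n_labels)
        ]
        labels.append(max(range(n_labels), key=lambda x: b[x]))
    return labels

def score(xs):
    """Total log-domain score of a joint labelling."""
    unary = sum(log_phi[i][xs[i]] for i in range(3))
    return unary + log_psi(xs[0], xs[1]) + log_psi(xs[1], xs[2])

best = max(product(range(n_labels), repeat=3), key=score)
print(decode(), list(best))
```

Node-wise decoding matches the exhaustive result here because the graph has no loops and the best configuration is unique; negating all terms and replacing max with min gives the equivalent min-sum form.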
3. Message Passing
GBP, first proposed by Yedidia et al., can be considered a region-based BP method [4]. The basic idea behind GBP is to pass more informative messages between regions rather than between nodes. As a Kikuchi free energy approximation method, GBP in general allows an arbitrary number of nodes to be gathered into a clique and brings the clique information into the whole passing process, which yields a better approximation to the posterior probability, whereas BP only passes messages from node to node.
Because an additional source of information, the clique information, is involved in the passing process, the capability to search for the minimum of an energy function is greatly improved. The update rules of canonical GBP are defined as follows:
$$\begin{aligned} m_{rs} &\longleftarrow k\, \frac{\sum_{x_{r \setminus s}} \varphi_{r \setminus s}(x_{r \setminus s}) \prod_{m_{r''s''} \in M(r) \setminus M(s)} m_{r''s''}}{\prod_{m_{r's'} \in M(r,s)} m_{r's'}}, \\ b_r &\longleftarrow k\, \varphi_r(x_r) \prod_{m_{r's'} \in M(r)} m_{r's'}, \end{aligned} \tag{3.1}$$

where $r$ is a region and $s$ is one of its subregions, $m_{rs}$ is the message sent from region $r$ to its subregion $s$, $\varphi(x)$ is the local “evidence” of node $x$, $M(r)$ is the set of messages sent from outside region $r$ to nodes inside region $r$, $M(s)$ is defined similarly, $M(r,s)$ is the set of messages sent from nodes that are in region $r$ but not in region $s$ to nodes in region $s$, and $b_r$ is the belief of region $r$.
The definitions of the regions, for example, $r$ and $s$ in (3.1), directly determine the performance of GBP. It is very hard to choose a reasonable region size. Although the basic clusters should encompass as many cycles as possible, the complexity grows exponentially with the cluster size. The two goals are contradictory: the larger the clusters, the less efficient the algorithm. In practice, it is infeasible to use cluster sizes larger than four.
In this paper, we consider the implementation instance introduced in [14]. This instance of GBP comprises two types of regions, single-node regions and double-node regions, and the corresponding messages are named edge messages and cluster messages, respectively. The message update rules are defined in (3.2) and (3.3), and the message passing process is sketched in Figure 3:

$$m_{s \to u}(x_u) = \max_{x_s}\bigl(\phi_s\, \varphi_{su}\, m_{a \to s}\, m_{b \to s}\, m_{c \to s}\, m_{bd \to su}\, m_{ce \to su}\bigr), \tag{3.2}$$

$$m_{st \to uv}(x_u, x_v) = \frac{\max_{x_s, x_t}\bigl(\phi_s\, \phi_t\, \varphi_{st}\, \varphi_{su}\, \varphi_{tv}\, m_{a \to s}\, m_{c \to s}\, m_{b \to t}\, m_{d \to t}\, m_{ab \to st}\, m_{ce \to su}\, m_{df \to tv}\bigr)}{m_{s \to u}\, m_{t \to v}}. \tag{3.3}$$

Equation (3.2) describes the edge message sent from a node $s$ to node $u$. Equation (3.3) describes the cluster message sent from the node pair $s$ and $t$ to the node pair $u$ and $v$.
Figure 3: (a) Edge messages passing from node $s$ to its neighboring node $u$. (b) Cluster messages passing from region $\{st\}$ to region $\{uv\}$. The dashed (black) lines stand for the edge messages, the dashed (red) lines denote the cluster messages, and the solid line represents the resulting message.
4. Efficiency Improvement
To improve the efficiency of GBP, the direction set method is proposed to reduce the computational complexity of the cluster message. Consider (3.3): when $x_u$ and $x_v$ are given, the time complexity of computing one entry of the cluster message is $O(n^2)$. Almost all of this cost comes from the first term of the equation, which, in the min-sum formulation, amounts to finding the minimum value in a grid-like dataset whose two axes are indexed by the states of $s$ and $t$. Usually all the elements in the lattice are traversed to find the minimum, which takes $O(n^2)$ time. Petersen et al. proposed a method to reduce the search space [7, 15], but it depends heavily on the traversal order. The method suggested in this paper is very straightforward. With the direction set method, the time complexity of this search becomes $O(n)$, and thus the total complexity of computing the cluster message becomes $O(n^3)$.
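For reference, the exhaustive baseline described in this paragraph can be sketched as follows. This is a simplified min-sum illustration under stated assumptions: the terms of the cluster message are grouped into three hypothetical cost tables `A`, `B`, and `C` filled with random values, and the subtraction of the two edge messages (the log-domain counterpart of the division) is omitted. The counter shows that every one of the $n^2$ output entries traverses the full $n \times n$ lattice.

```python
import random

def cluster_message_brute_force(A, B, C):
    """Min-sum cluster message by exhaustive search.

    A[xs][xt]: cost terms that depend on both x_s and x_t
    B[xs][xu]: cost terms that couple x_s with x_u
    C[xt][xv]: cost terms that couple x_t with x_v
    (This three-table grouping is an illustrative assumption.)
    """
    n = len(A)
    message = [[0.0] * n for _ in range(n)]
    evaluations = 0
    for xu in range(n):
        for xv in range(n):
            best = float("inf")
            for xs in range(n):          # traverse the whole s-t lattice
                for xt in range(n):
                    evaluations += 1
                    cost = A[xs][xt] + B[xs][xu] + C[xt][xv]
                    if cost < best:
                        best = cost
            message[xu][xv] = best
    return message, evaluations

random.seed(0)
n = 4
# Three n-by-n tables of made-up costs.
A, B, C = ([[random.random() for _ in range(n)] for _ in range(n)] for _ in range(3))
message, evaluations = cluster_message_brute_force(A, B, C)
print(evaluations)  # n**4 cost evaluations in total
```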
The direction set method adopted here is also called Powell’s method [16]. It is a classical numerical algorithm for function minimization or maximization that decomposes an N-dimensional (N-D) search problem into several one-dimensional (1D) searches. Consider a 2D lattice in which a point P has a random initial position and two orthogonal directions are given. First, P moves to the extremum found by searching along the first of the two initial directions. Second, P moves to another extremum by searching along the second initial direction. Third, the first direction is replaced by the second, and the second direction is set to a new direction determined by the initial position and the final position after the two rounds of searching. Meanwhile, the final position becomes the new initial position. The three steps are repeated until P no longer moves, in other words, until searching along the two directions finds no position whose value is less than that at P. The final position is where P stops.
The general direction set (Powell’s) method has the problem that the two directions may “fold up on each other” in some cases. Once this happens, the search capability in that iteration is weakened, and the process is at high risk of finding the minimum of a subspace instead of the full N-D space. Moreover, in practice it is hard for a computer to search along an arbitrary direction on a lattice, because extra computation is needed to determine which nodes the direction passes through. This paper adopts the method suggested in [16]: the two directions are kept static and parallel to the axes. This setting not only preserves orthogonality from beginning to end, but also makes the implementation easier because every search runs along one of the axes.
Although there is no special requirement for the start position, it helps to place the initial position close to the extremum. When $x_u$ and $x_v$ are given, choosing the initial position on the s-t lattice is a tricky issue. To place it near the minimum, we assume that the combination of the independent minimum positions of $s$ and $t$ is close to the actual minimum position.
Through this optimization, the number of accessed positions decreases from $n^2$ to $2kn$, where $k$ is the number of iterations, which in our practice is about 2 to 3 on average, and $n$ is the search range, for example, the disparity range in stereo matching. Since the comparison operations dominate the computation time, the overall complexity becomes $2kn^3$, while the complexity of brute-force search is $n^4$. The speedup ratio is therefore $n/(2k)$, which grows as $n$ becomes larger.
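The axis-aligned search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `direction_set_min`, the `max_sweeps` safeguard, and the smooth toy cost function are assumptions made for the example. The sketch alternates 1D searches along the two axes until the point stops moving and counts the accessed positions, $2n$ per sweep.

```python
def direction_set_min(cost, n, start, max_sweeps=20):
    """Alternate 1D searches along the two lattice axes until the point stops moving.

    cost(xs, xt) returns the value at a lattice position; n is the search range.
    Returns the final position, its value, the number of accessed positions,
    and the number of sweeps (k in the text).
    """
    xs, xt = start
    visited = 0
    for sweep in range(1, max_sweeps + 1):
        # 1D search along the first axis with xt fixed.
        new_xs = min(range(n), key=lambda a: cost(a, xt))
        # 1D search along the second axis with the updated xs fixed.
        new_xt = min(range(n), key=lambda b: cost(new_xs, b))
        visited += 2 * n
        if (new_xs, new_xt) == (xs, xt):
            break  # the point no longer moves
        xs, xt = new_xs, new_xt
    return (xs, xt), cost(xs, xt), visited, sweep

# Hypothetical toy cost on a 16 x 16 lattice (illustrative only).
n = 16

def cost(a, b):
    return (a - 5) ** 2 + (b - 9) ** 2 + 0.5 * abs(a - b)

position, value, visited, sweeps = direction_set_min(cost, n, start=(0, 0))
brute_force = min(cost(a, b) for a in range(n) for b in range(n))
print(position, value, visited, sweeps, brute_force)
```

On this toy cost the search reaches the same minimum as the exhaustive traversal while accessing far fewer than the $n^2 = 256$ lattice positions. On a less regular cost surface, an axis-aligned search can stop at a position that is only minimal along the two axes, which is why a good starting position matters.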
5. Experiments
Stereo matching is one of the most challenging and fundamental problems in computer vision. Extensive research has been carried out in the last decade [17–22]. A recent evaluation of these methods can be found in [23]. In the last few years, as shown in [24], global methods based on MRFs have become the top performers.
In this section, stereo matching is formulated as an MRF inference problem. To obtain the MAP estimate, which can be cast as an energy minimization problem, let $P$ be the set of pixels in the image pair and $L$ be the set of disparities. The initial data cost, calculated by the truncated linear transform, which is robust to noise and outliers, is defined as

$$D(f_p) = \lambda \cdot \min\Bigl(\sum_{c \in \{L, a, b\}} \bigl(I_c^L(p) - I_c^R(p - f_p)\bigr)^2,\; T\Bigr), \tag{5.1}$$

where $\lambda$ is the cost weight, which determines the share of the data cost in the whole energy, and $T$ is the truncation value. Both are set experimentally. $I_c^L(p)$ is the intensity of pixel $p$ in channel $c$ of the left image, and $I_c^R(p)$ is defined similarly for the right image. Birchfield and Tomasi’s pixel dissimilarity is used to improve robustness against image sampling noise. Note that the data cost is calculated in the CIELAB color space (the L*a*b* standard of the Commission Internationale de L’Eclairage), with the Euclidean distance as the measure. Practical experiments show that this improves the final results to some degree.
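The data cost can be sketched for a single pixel as follows. This is a simplified illustration under stated assumptions: images are nested lists of hypothetical (L, a, b) tuples, the disparity shifts the column index only, matches that fall outside the right image are given the truncated cost, and the Birchfield and Tomasi dissimilarity is left out. The default values of `lam` and `trunc` are the parameter values reported in the experiments.

```python
def data_cost(left, right, row, col, disparity, lam=0.87, trunc=30.0):
    """Truncated squared colour difference between a left pixel and its
    candidate match in the right image (sampling-insensitive measure omitted)."""
    xr = col - disparity
    if xr < 0:
        # Assumption: candidates outside the right image get the truncated cost.
        return lam * trunc
    diff = sum((l - r) ** 2 for l, r in zip(left[row][col], right[row][xr]))
    return lam * min(diff, trunc)

# Hypothetical one-row images in (L, a, b); the right view is shifted by one pixel.
left = [[(50.0, 0.0, 0.0), (60.0, 0.0, 0.0), (70.0, 0.0, 0.0)]]
right = [[(60.0, 0.0, 0.0), (70.0, 0.0, 0.0), (80.0, 0.0, 0.0)]]

cost_correct = data_cost(left, right, row=0, col=1, disparity=1)
cost_wrong = data_cost(left, right, row=0, col=1, disparity=0)
print(cost_correct, cost_wrong)
```

The truncation keeps a single badly mismatched pixel from dominating the energy, which is what makes the cost robust to outliers.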
The smoothness cost, which expresses the compatibility between neighboring variables, follows the truncated linear model and is defined as

$$V(f_p, f_q) = \min(|f_p - f_q|, K), \tag{5.2}$$

where $K$ is the truncation value. The smoothness cost based on the truncated linear model is also referred to as a discontinuity-preserving cost, since it prevents object edges from being oversmoothed.
The corresponding energy function used here is the most common one, defined as

$$E(f) = \sum_{p \in P} D(f_p) + \sum_{(p,q) \in N} V(f_p, f_q), \tag{5.3}$$

where $N$ is the set of edges in the four-connected neighborhood system.
The energy function defined in (5.3) can be considered a description of the scene. The objective is to find the solution that minimizes (5.3), which corresponds to the correct depth information of the scene. Generally, a more complex energy function can yield a more accurate solution. However, to simplify the presentation and to stay consistent and comparable with other methods, the two-term energy function (5.3) is used in this paper.
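Evaluating this energy for a given disparity map can be sketched as follows. The data layout is an assumption made for the example: `data[r][c][d]` holds a precomputed data cost of label `d` at pixel `(r, c)`, and each four-connected edge is counted once by visiting only the right and lower neighbors.

```python
def smooth_cost(fp, fq, K=10.0):
    """Truncated linear smoothness cost between two neighbouring labels."""
    return min(abs(fp - fq), K)

def total_energy(data, labels, K=10.0):
    """Sum of data costs plus smoothness costs over four-connected edges."""
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += data[r][c][labels[r][c]]
            if c + 1 < cols:   # edge to the right neighbour
                energy += smooth_cost(labels[r][c], labels[r][c + 1], K)
            if r + 1 < rows:   # edge to the lower neighbour
                energy += smooth_cost(labels[r][c], labels[r + 1][c], K)
    return energy

# Hypothetical 2 x 2 example with three labels; label d has data cost d everywhere.
data = [[[0.0, 1.0, 2.0] for _ in range(2)] for _ in range(2)]
labels = [[0, 0], [0, 2]]
print(total_energy(data, labels), total_energy(data, labels, K=1.0))
```

Lowering `K` reduces the penalty paid at the label jump, which illustrates how the truncation tolerates disparity discontinuities.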
The proposed method is evaluated on the Middlebury test. We compared our results with efficient BP [6] and canonical GBP to show the improved efficiency as well as the accuracy of the proposed method. The same set of typical parameters was used throughout: T = 30.0 and λ = 0.87 in the data cost term, K = 10.0 in the smoothness cost term, and d = 16. All experiments were run on a personal computer with a 1.6 GHz CPU and 2 GB of DRAM.
As expected, efficient BP is much faster than the other methods. However, it is less accurate. The ultimate purpose of the proposed method is to improve the efficiency of canonical GBP while keeping good accuracy. As shown in Figure 4, the execution time is greatly reduced when the two strategies are combined, while the convergence energy rises slightly because the direction set method may cause a loss of accuracy. Canonical GBP with caching and the direction set method achieves a speedup of more than 15 times. The experiments were run on the image pair “Tsukuba” (384 × 288).
Figure 4: Evaluation of efficiency and convergence energy.
The error of each result is calculated against the ground truth. According to the accuracy listed in Table 1, the proposed method is clearly more accurate than canonical GBP and reaches a level similar to that of efficient BP. On the other hand, comparing the proposed method with efficient BP shows that efficient BP tends to produce a frontoparallel result, which oversmooths the surface and causes a layered effect. In contrast, the proposed method does not suffer from the layered effect, but its 3D map becomes blurred at the boundaries and some noise cannot be eliminated. In fact, although a layered result can reach a lower energy, it is not always a better description of the real scene (Figure 5).
Table 1: Evaluation of errors.

| Error (pixel) | 0≤error≤1 | 1<error≤2 | 2<error≤3 | error>3 |
| --- | --- | --- | --- | --- |
| Efficient BP | 89.2% | 7.0% | 0.3% | 3.5% |
| Canonical GBP | 79.1% | 13.2% | 2.8% | 4.9% |
| Proposed method | 86.5% | 8.1% | 1.8% | 3.6% |
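An error breakdown of this kind can be computed along the following lines. This is a sketch under stated assumptions, not the evaluation code used for the table: every pixel is counted (the treatment of occluded or unlabeled pixels is not specified here), and the disparity maps are plain nested lists with made-up values.

```python
def error_histogram(result, truth):
    """Percentage of pixels whose absolute disparity error falls in
    [0, 1], (1, 2], (2, 3], and (3, inf)."""
    bins = [0, 0, 0, 0]
    total = 0
    for result_row, truth_row in zip(result, truth):
        for d, g in zip(result_row, truth_row):
            err = abs(d - g)
            if err <= 1:
                bins[0] += 1
            elif err <= 2:
                bins[1] += 1
            elif err <= 3:
                bins[2] += 1
            else:
                bins[3] += 1
            total += 1
    return [100.0 * b / total for b in bins]

# Hypothetical one-row disparity map and ground truth.
result = [[5, 5, 5, 5]]
truth = [[5, 7, 8, 10]]
percentages = error_histogram(result, truth)
print(percentages)
```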
Figure 5: Quality evaluation. From (a) to (d): the test image, ground truth, result of canonical GBP, and result of the proposed method.
6. Conclusion
This paper studied challenging issues in both physics and computer vision, namely, the efficient optimization of GBP and stereo correspondence for 3D vision. A min-sum scheme is devised for the message computing process in GBP, and the resulting method is applied to the stereo matching problem. The direction set method is proposed to improve the efficiency. For a typical image pair, it speeds up the matching process by more than 15 times. Besides this improved single-thread speed, with a parallel computing architecture the method could further catch up with or overtake most contemporary global algorithms, owing to its message-based passing process. Furthermore, the proposed method yields visually more plausible results, because its better convergence can outperform most other global algorithms. The practical experiments also support these conclusions in comparison with both efficient BP and canonical GBP.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (NSFC-60870002, 60802087, 60873264, 61070214), the 973 Plan [2011CB302800], NCET, and the Science and Technology Department of Zhejiang Province (2009C21008, 2010R10006, 2010C33095, Y1090592).
References
[1] Wainwright M. J., Jaakkola T. S., Willsky A. S. MAP estimation via agreement on trees: message-passing and linear programming. 2005; 51(11): 3697–3717. doi:10.1109/TIT.2005.856938
[2] Pearl J. Morgan Kaufmann; 1988.
[3] Yedidia J. S., Freeman W. T., Weiss Y. Understanding belief propagation and its generalizations. 2003; chapter 8: 236–249.
[4] Yedidia J. S., Freeman W. T., Weiss Y. Generalized belief propagation. 2000; 13(7): 689–695.
[5] Yedidia J. S., Freeman W. T., Weiss Y. Constructing free-energy approximations and generalized belief propagation algorithms. 2005; 51(7): 2282–2312. doi:10.1109/TIT.2005.850085
[6] Felzenszwalb P. F., Huttenlocher D. P. Efficient belief propagation for early vision. 2006; 70(1): 41–54. doi:10.1007/s11263-006-7899-4
[7] Petersen K., Fehr J., Burkhardt H. Fast generalized belief propagation for MAP estimation on 2D and 3D grid-like Markov random fields. In Proceedings of the 30th Deutsche-Arbeitsgemeinschaft-fur-Mustererkennung (DAGM) Symposium on Pattern Recognition; June 2008; Munich, Germany; 10–13.
[8] Pawan K. M., Torr P. H. S. Fast memory-efficient generalized belief propagation. Proceedings of the European Conference on Computer Vision, Part IV; 451–463.
[9] Li S. Z. 3rd ed. Springer; 2009.
[10] Geman S., Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. 1984; 6(6): 721–741. doi:10.1109/TPAMI.1984.4767596
[11] Cattani C. Fractals and hidden symmetries in DNA. 2010; 2010: 31 pages, Article ID 507056. doi:10.1155/2010/507056
[12] Boykov Y., Veksler O., Zabih R. Fast approximate energy minimization via graph cuts. 2001; 23(11): 1222–1239. doi:10.1109/34.969114
[13] Sun J., Li Y., Kang S. B., Shum H.-Y. Symmetric stereo matching for occlusion handling. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05); June 2005; 399–406.
[14] Yedidia J. S., Freeman W. T., Weiss Y. Bethe free energy, Kikuchi approximations and belief propagation algorithms. TR-2001-16. Mitsubishi Electric Research Laboratories; 2001.
[15] Bakhoum E. G., Toma C. Dynamical aspects of macroscopic and quantum transitions due to coherence function and time series events. 2010; 2010: 13 pages, Article ID 428903. doi:10.1155/2010/428903
[16] Press W. H., Teukolsky S. A., Vetterling W. T., Flannery B. P. 2nd ed. Cambridge, UK: Cambridge University Press; 1992.
[17] Balasubramanian R., Das S., Swaminathan K. Reconstruction of quadratic curves in 3-D from two or more perspective views. 2002; 8(3): 207–219. doi:10.1080/10241230215283
[18] Zitnick C. L., Kang S. B. Stereo for image-based rendering using image over-segmentation. 2007; 75(1): 49–65. doi:10.1007/s11263-006-0018-8
[19] Szeliski R., Zabih R., Scharstein D., Veksler O., Kolmogorov V., Agarwala A., Tappen M., Rother C. A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. 2008; 30(6): 1068–1080. doi:10.1109/TPAMI.2007.70844
[20] Woodford O. J., Torr P. H. S., Reid I. D., Fitzgibbon A. W. Multiview stereo via volumetric graph-cuts and occlusion robust photo-consistency. 2007; 29(12): 2241–2246. doi:10.1109/TPAMI.2007.70712
[21] Bleyer M., Rother C., Kohli P. Surface stereo with soft segmentation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; June 2010; 1570–1577.
[22] Yang Q., Wang L., Yang R., Stewénius H., Nistér D. Stereo matching with color-weighted correlation, hierarchical belief propagation, and occlusion handling. 2009; 31(3): 492–504. doi:10.1109/TPAMI.2008.99
[23] Scharstein D., Szeliski R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. 2002; 47(1–3): 7–42. doi:10.1023/A:1014573219977
[24] Scharstein D., Szeliski R. Middlebury Stereo Vision Research. 2008. http://vision.middlebury.edu/stereo/eval/