Upper-Lower Bounds Candidate Sets Searching Algorithm for Bayesian Network Structure Learning

The Bayesian network is an important theoretical model in artificial intelligence and a powerful tool for handling uncertainty. Because current Bayesian network structure learning algorithms converge slowly, this paper proposes a fast hybrid learning method. We first analyze further the information provided by low-order conditional independence tests and then give two methods for constructing graph models of the network, which are theoretically proved to be upper and lower bounds of the structure space of the target network; these bounds yield the candidate sets. A search-and-score algorithm is then run on the candidate sets to find the final structure of the network. Simulation results show that the proposed algorithm is more efficient than similar algorithms at the same learning precision.


Introduction
The Bayesian network (BN), a graphical model for handling uncertainty, has been studied by many researchers in recent years. It has been applied successfully in many areas such as fault detection, medical diagnosis, and traffic management [1][2][3]. For years, research focused on finding a data structure that compresses the storage of the joint probability density and on developing inference algorithms based on that data structure, which led to the BN. Once the BN had become a successful tool in this area, researchers turned their interest to algorithms that learn the BN structure from sample data. Essentially, BN structure learning is a combinatorial optimization problem, and learning the structure from data has been proved to be NP-hard [4]. Nonetheless, some heuristic methods have been proposed and have performed well in several areas [5,6].
Currently, there are two approaches to BN structure learning. One is the CI-test method [7,8] and the other is the scored-searching method [9,10]. The first uses conditional independence tests (CI tests) to determine the conditional independence relationships among all the variables and builds networks based on these relationships. The scored-searching methods attempt to find the network by maximizing a scoring function that indicates how well the network fits the data.
Both approaches have advantages and disadvantages. CI-test algorithms are simple and easy to run. Because low-order CI tests are computationally cheap and highly precise, they are very helpful for building a hyper graph of the target (discussed in the following sections). The main drawback of these methods is the need for high-order CI tests, which require large sample sizes and become less accurate as the order increases [11,12]. The scored-searching methods may achieve higher precision in structure learning than the CI-test methods, but they are relatively slow, especially for large networks, because the structure space grows super-exponentially with the number of nodes.
Clearly, combining the learning efficiency of CI tests with the prediction accuracy of scored-searching algorithms would yield a better algorithm for BN learning. For this reason, some hybrid methods have been proposed [13][14][15][16][17]. These methods typically use a CI-test algorithm to learn a network structure pattern first and then use a scored-searching algorithm to find the final BN structure based on that pattern. These hybrid methods may perform better in some applications, but some problems remain unsolved, because fusing algorithms does not always improve performance. Take MMHC (max-min hill climbing) as an example. It consists of two steps. The first, called MMPC (max-min parents and children), constructs the parents-and-children set of each node via CI tests to provide a partial skeleton. In the second step, a hill-climbing algorithm refines every edge in the network. To ensure the precision of the partial skeleton given by MMPC, high-order CI tests must be involved, which are unfortunately unstable [11,12]. As a result, the searching phase is not strictly based on the prior structure given by MMPC but operates in a relatively open space, which seems somewhat wasteful of computational resources.
The upper-lower bounds candidate sets searching algorithm (UBCS) proposed in this paper provides a more instructive set of candidate networks by constructing the upper and lower bounds of the structure space with low-order CI tests. Within this framework, the final network structure is obtained by a greedy search algorithm. Simulations show that it can maintain precision while reducing the time complexity.
Because nodes in a BN correspond to random variables, the two are not distinguished in this paper, and both are called nodes. In addition, let $\vec{e}_{ij}$ denote the directed edge $V_i \to V_j$, and let $e_{ij}$ denote the undirected edge

Definition 2 (V-structure). Let $BN = (G, P)$, where $G = (V, E)$, $\forall V$

where $MI(X, Y \mid Z) = 0$ means that the random variable sets $X$ and $Y$ are conditionally independent given $Z$, which can also be expressed as $\mathrm{Ind}(X, Y \mid Z)$. Therefore, $MI(X, Y \mid Z)$ is usually used as the CI test among random variables, and the cardinality of $Z$ is called the order of the CI test. In particular, it is a zero-order CI test if $Z = \Phi$.

Definition 4 (Markov equivalence). Two DAGs are Markov equivalent if and only if (1) they have the same skeleton and (2) they have the same V-structures.
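As an illustration of the mutual-information CI test defined above, the following minimal sketch estimates $MI(X, Y \mid Z)$ from discrete samples with a plug-in (frequency) estimator. The function names, the sample format (a list of dictionaries), and the fixed decision threshold are assumptions made for this example and are not taken from the paper; in practice the threshold would normally be replaced by a statistical test.

```python
from collections import Counter
from math import log


def conditional_mutual_information(samples, x, y, z=()):
    """Plug-in estimate of MI(X, Y | Z) in nats from discrete samples.

    samples: list of dicts mapping a variable name to its observed value.
    x, y: variable names; z: tuple of conditioning variable names.
    The order of the CI test is len(z); z == () gives the 0-order test.
    """
    n = len(samples)
    c_xyz, c_xz, c_yz, c_z = Counter(), Counter(), Counter(), Counter()
    for row in samples:
        zv = tuple(row[name] for name in z)
        c_xyz[(row[x], row[y], zv)] += 1
        c_xz[(row[x], zv)] += 1
        c_yz[(row[y], zv)] += 1
        c_z[zv] += 1
    mi = 0.0
    for (xv, yv, zv), count in c_xyz.items():
        # p(x,y,z) * log( p(x,y,z) p(z) / (p(x,z) p(y,z)) ), written with counts
        ratio = count * c_z[zv] / (c_xz[(xv, zv)] * c_yz[(yv, zv)])
        mi += (count / n) * log(ratio)
    return mi


def is_independent(samples, x, y, z=(), threshold=1e-3):
    """Hypothetical decision rule: treat MI below the threshold as independence."""
    return conditional_mutual_information(samples, x, y, z) < threshold
```

With `z=()` the function performs the 0-order test, and with a single conditioning variable it performs the 1-order test; these are the only two orders the method relies on.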
The characterization of Markov equivalence was given by Frydenberg [19], and Verma and Pearl extended it to DAGs [20]. Based on Markov equivalence, all the DAGs over the same node set can be divided into equivalence classes, called Markov equivalence classes. Each equivalence class corresponds to a unique statistical model and can be represented by a PDAG (partially directed acyclic graph), which is called the complete PDAG.
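The two conditions of Definition 4 can be checked mechanically. The sketch below is only an illustration: it assumes that a DAG is given as a list of directed edges `(parent, child)` over a shared node set, and the helper names are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG: each edge becomes an unordered pair."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Set of (parent_a, child, parent_b) triples whose parents are non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """True if both DAGs share the same skeleton and the same V-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))
```

For example, the chains A → B → C and A ← B ← C are equivalent, whereas the collider A → B ← C is not equivalent to either of them.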

Method
Given a data set $D$, BN structure learning methods aim to find the best network structure of $BN = (G, P)$. Reference [21] proved that the number of structures $f(n)$ of a BN with $n$ nodes is

$$f(n) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} 2^{i(n-i)} f(n-i), \qquad f(0) = 1.$$

From this formula it can be seen that the space of potential network structures grows super-exponentially with the number of nodes. Searching within candidate sets of network structures is therefore a good way to reduce the dimension effectively. On this basis, we provide the upper-lower bounds candidate sets searching algorithm (UBCS), which obtains the final network model by constructing the upper and lower bounds of the target network pattern to find candidate sets and then applying a search-and-score method. In the following section we give the first part of UBCS, the upper-bound graph learning algorithm (UGLA), and prove that its output $G^{+}$ is an upper bound of the moral graph of the target network; we then introduce the principle of nonincreasing 0-order mutual information to arrive at the second part of UBCS, the lower-bound graph learning algorithm (LGLA). After that, the searching algorithm is discussed.
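To give a sense of how quickly the structure space grows, the following short sketch evaluates the standard recurrence for the number of labelled DAGs on $n$ nodes (Robinson's formula). The function name is chosen for this example only.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled DAGs on n nodes.

    f(n) = sum_{i=1..n} (-1)^(i+1) * C(n, i) * 2^(i*(n-i)) * f(n-i), f(0) = 1.
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * count_dags(n - i)
        for i in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_dags(n))
```

Already for five nodes there are 29281 possible structures, which is why restricting the search to candidate sets matters.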

UGLA and LGLA.
We first give the algorithm description in Algorithm 1.
The UGLA procedure ensures that $G^{+}$ is a triangulated graph, and for triangulated graphs the following theorem holds.

Theorem 5. Any undirected graph $G$ is a complete PDAG if and only if $G$ is a triangulated graph.

Algorithm 1:
(1) Input: data set $D$; variable set $V = \{V_1, V_2, \ldots, V_n\}$;
(2) Initialization: undirected graph $G^{+} = (V, E)$, where $E = \Phi$;
(3) Order-0 CI test: for each pair of variables
For each pair of variables

The theorem, which was proved in [22], shows that $G^{+}$ is a complete PDAG; that is, in the best case, the $G^{+}$ obtained by UGLA is the PDAG of the target BN. This condition is of course too strong, and we give a more general theorem below.

Theorem 6. Given a sample data set $D$, let $G^{*}$ be the optimal structure of $BN = (G, P)$ to be learned and let $G^{m}$ be the moral graph of the BN. Then the undirected graph $G^{+}$ obtained by UGLA is an upper bound of the partially ordered set $(G^{m}, \subseteq)$.

Proof. It suffices to prove that $G^{m} \subseteq G^{+}$ holds for each $G^{m}$ in the partially ordered set, where $G^{m} = (V, E^{m})$ and $G^{+} = (V, E^{+})$. Theorem 5 shows that if the complete PDAG of $G^{*}$ is a triangulated graph, then $G^{m} = G^{+}$. The remaining task is therefore to prove $G^{m} \subset G^{+}$. Because all the graphs share the same node set, it suffices to prove that $e_{ij} \in E^{+}$ whenever the undirected edge $e_{ij} \in E^{m}$. The undirected edges in $E^{m}$ fall into two classes. The first consists of the undirected edges obtained from the directed edges in $G^{*}$. The second, denoted $\tilde{E}^{m}$, consists of the moral edges added between nodes that share a common child. For every $e_{ij}$ in the first class, the 0-order CI test ensures that $e_{ij} \in E^{+}$; for every $e_{ij} \in \tilde{E}^{m}$, the fifth step of UGLA guarantees $e_{ij} \in E^{+}$. The proof is completed.

Theorem 7. Any V-structure in $G = (V, E)$ exists in a subgraph decomposed from $G^{+}$ by the MPD (maximal prime subgraph decomposition) method [23].
Theorem 7 was proved in [17]. This theorem guarantees that the subgraphs $G^{1}_{sub}, G^{2}_{sub}, \ldots, G^{k}_{sub}$ obtained by UGLA cover all the V-structures in the target graph.
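The containment stated in Theorem 6 can be illustrated on a small example. The sketch below builds the moral graph of a DAG (its skeleton plus an edge between every pair of parents sharing a child) and checks whether a candidate undirected graph contains it. The helper names and the edge-list representation are assumptions for this example; the code does not implement UGLA itself.

```python
from itertools import combinations


def moral_graph(dag_edges):
    """Undirected edge set of the moral graph of a DAG given as (parent, child) pairs."""
    parents = {}
    moral = set()
    for u, v in dag_edges:
        moral.add(frozenset((u, v)))          # edge obtained from a directed edge
        parents.setdefault(v, set()).add(u)
    for pa in parents.values():
        for a, b in combinations(sorted(pa), 2):
            moral.add(frozenset((a, b)))      # moral edge between co-parents
    return moral


def is_upper_bound(candidate_edges, dag_edges):
    """True if the candidate undirected graph contains every moral-graph edge."""
    candidate = {frozenset(e) for e in candidate_edges}
    return moral_graph(dag_edges) <= candidate
```

For the collider A → C ← B, the moral graph contains the edge A - B in addition to the skeleton, so a candidate graph that lacks A - B is not an upper bound in this sense.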
The above section discussed the upper bound of the partially ordered set $(G^{m}, \subseteq)$, from which we obtain the candidate sets for searching the structure. In the following part, the lower bound of the BN structure space is discussed in order to choose a relatively precise initial value.
We start with the lemma below.
Lemma 8. For any two random variables $V_i, V_j \in V$ and subset

and the equality holds if and only if
The proof is omitted.
Proof. As $\vec{e}_{ij} \in E$, without loss of generality let $MI(V_i, V_k) = \min\{MI(V_i, V_k), MI(V_j, V_k)\}$; it then suffices to prove that $MI(V_i, V_j) > MI(V_i, V_k)$. According to the definition of mutual information, MI($V_i$, It can be seen from the relationship among $V_i$, $V_j$, and $V_k$ that $V_j \in Z$, where $Z$ satisfies $\mathrm{Ind}(V_i, V_k \mid Z)$, so the equation above can be expressed as ( Lemma 8 shows that $MI(V_i, V_k) \le MI(V_i, V_j)$, while the equality holds for $Z / V_j = \Phi$.
Proof is completed.
We call Theorem 9 the principle of nonincreasing 0-order mutual information (principle NZMI). The condition of the theorem indicates that it does not apply when a V-structure is present. For the BN structure shown in Figure 1, Theorem 9 alone cannot tell which of the two 0-order mutual information values in question is larger. However, if all the V-structures are eliminated first and the connections of the remaining node are considered afterwards, the ordering of the 0-order mutual information values is determined once the largest value is known. A 1-order CI test then suffices to rule out the alternative connection, which establishes the remaining edge. In effect, principle NZMI provides a new way to ascertain whether there is an undirected edge between two nodes without relying on V-structure methods.
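The nonincreasing behaviour of 0-order mutual information along a path without V-structures can be checked numerically. The sketch below is an illustration under our own assumptions, not the construction used in the proof: it builds the exact joint distribution of a hypothetical binary chain A → B → C with noisy links and compares the pairwise mutual information values. The node names, flip probabilities, and function names are chosen for this example only.

```python
from itertools import product
from math import log


def chain_joint(p_a=0.5, flip_ab=0.1, flip_bc=0.2):
    """Exact joint table {(a, b, c): prob} of a binary chain A -> B -> C."""
    joint = {}
    for a, b, c in product((0, 1), repeat=3):
        pa = p_a if a == 1 else 1 - p_a
        pb = flip_ab if b != a else 1 - flip_ab   # B copies A, flipped with prob flip_ab
        pc = flip_bc if c != b else 1 - flip_bc   # C copies B, flipped with prob flip_bc
        joint[(a, b, c)] = pa * pb * pc
    return joint


def mutual_information(joint, i, j):
    """0-order MI (in nats) between the variables at positions i and j."""
    pij, pi, pj = {}, {}, {}
    for state, p in joint.items():
        key = (state[i], state[j])
        pij[key] = pij.get(key, 0.0) + p
        pi[state[i]] = pi.get(state[i], 0.0) + p
        pj[state[j]] = pj.get(state[j], 0.0) + p
    return sum(p * log(p / (pi[x] * pj[y])) for (x, y), p in pij.items() if p > 0)


if __name__ == "__main__":
    joint = chain_joint()
    print("MI(A,B) =", mutual_information(joint, 0, 1))
    print("MI(B,C) =", mutual_information(joint, 1, 2))
    print("MI(A,C) =", mutual_information(joint, 0, 2))
```

In this chain the non-adjacent pair (A, C) has a smaller mutual information than either adjacent pair, which is the ordering that the lower-bound construction exploits.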
Algorithm 2 gives the $G^{-}$ learning algorithm (LGLA) based on the discussion above.
The VSTA (V-structure test algorithm) used in LGLA is listed in Algorithm 3. VSTA is a testing method that provides only "best effort" service. It involves only 0-order and 1-order CI tests, whose high accuracy guarantees that the detected V-structures exist. When there is more than one edge between the two parent nodes of a V-structure, the detection is not performed. This approach avoids heavy computation and additional interference edges.
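Algorithm 3 is not reproduced here, so the following is only a sketch of one common low-order criterion and should not be read as the VSTA itself: a triple is flagged as a V-structure when the two candidate parents are independent at order 0 but become dependent at order 1 given the middle node. The joint-table representation, the tolerance `eps`, and the function names are assumptions for this example.

```python
from math import log


def cond_mi(joint, i, j, given=None):
    """MI(X_i, X_j | X_given) in nats from an exact joint table {state_tuple: prob}.

    given=None yields the 0-order quantity MI(X_i, X_j).
    """
    def marginal(idx):
        out = {}
        for state, p in joint.items():
            key = tuple(state[k] for k in idx)
            out[key] = out.get(key, 0.0) + p
        return out

    z = () if given is None else (given,)
    p_xyz = marginal((i, j) + z)
    p_xz = marginal((i,) + z)
    p_yz = marginal((j,) + z)
    p_z = marginal(z)
    total = 0.0
    for key, p in p_xyz.items():
        if p > 0:
            x, y, zv = key[0], key[1], key[2:]
            total += p * log(p * p_z[zv] / (p_xz[(x,) + zv] * p_yz[(y,) + zv]))
    return total


def looks_like_v_structure(joint, a, b, c, eps=1e-9):
    """True if a and b are independent at order 0 but dependent given c (order 1)."""
    return cond_mi(joint, a, b) < eps and cond_mi(joint, a, b, given=c) > eps
```

With sampled data the exact table would be replaced by estimated frequencies and `eps` by a proper test threshold; the criterion uses nothing beyond 0-order and 1-order quantities.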
There is a theorem that holds for the output of LGLA.

that a V-structure must exist in $G$ if it is contained in $G^{-}$, although the converse does not necessarily hold. Clearly, an acyclic graph remains acyclic no matter which directed edges are deleted. So $G^{-}$ is a PDAG. The proof is completed.
Theorem 11. $G^{-}$ is a lower bound of the PDAG of the target network if the output of VSTA is entirely accurate.
The proof is omitted. The condition of Theorem 11 is relatively strong. In practice, $G^{-}$ can be regarded as such a lower bound in many cases. Take the Asia network as an example: Figure 2 shows that all the edges in $G^{-}$ exist in the PDAG of the original network.

Searching Method.
The hill-climbing algorithm, which is based on the search-and-score approach, is one of the greedy search algorithms for BN structure learning. It uses three search operators: adding an edge, deleting an edge, and reversing an edge. Hill climbing is also used in the UBCS algorithm, but the search is restricted by the upper and lower bounds given by UGLA and LGLA; that is, a new structure produced by a search operator is discarded if it falls outside those bounds. To avoid being trapped in a local optimum too quickly, a suboptimal competitive mechanism is introduced: the top $k$ structures with the highest scores in each round are retained for the next iteration. In principle, $k$ is determined by the scale of the network: the larger the network, the larger $k$. Note, however, that oversized candidate sets increase the time complexity of the algorithm. For a BN as large as the Alarm network, the recommended empirical value is $k = 10$. To allow a comparison with the BNEA algorithm of [17], the BDeu score function is used as the objective function of the search.
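The bounded search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: both bounds are treated as sets of undirected node pairs (the directed parts of the lower bound are ignored), the starting structure is assumed to lie within the bounds, `score` is any user-supplied function (BDeu in the paper, a toy function in a quick test), and all names are hypothetical.

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm on a set of directed edges (u, v)."""
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:
        u = ready.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return seen == len(nodes)


def neighbours(nodes, edges, upper, lower):
    """Structures one operator away from `edges` that respect the bounds.

    upper: frozenset pairs allowed as edges; lower: pairs that must stay connected.
    """
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            pair = frozenset((u, v))
            if (u, v) in edges:
                if pair not in lower:
                    yield edges - {(u, v)}                # delete an edge
                yield (edges - {(u, v)}) | {(v, u)}      # reverse an edge
            elif (v, u) not in edges and pair in upper:
                yield edges | {(u, v)}                    # add an edge


def bounded_search(nodes, start, upper, lower, score, k=10, max_rounds=100):
    """Greedy search that keeps the top-k structures of each round."""
    beam = [frozenset(start)]
    best = beam[0]
    for _ in range(max_rounds):
        candidates = {
            cand
            for g in beam
            for cand in neighbours(nodes, g, upper, lower)
            if is_acyclic(nodes, cand)
        }
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:k]
        if score(beam[0]) <= score(best):
            break                                         # no improvement this round
        best = beam[0]
    return best
```

Keeping `k` structures per round instead of one is what allows a temporarily suboptimal structure to survive to the next iteration; the cost is roughly `k` times more score evaluations per round.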

Experiment
We test the performance of UBCS together with BNEA and MMHC on the Alarm network. The comparison of scores is shown in Table 1. For ease of observation, the results are normalized. Table 1 shows the results averaged over 10 runs, where SS denotes the sample size. As can be seen from Table 1, UBCS performs best among the three methods when SS is small, and the scores of all three methods become very close as SS increases. Although the VSTA used by LGLA cannot fully guarantee the correctness of the detection, it has little impact on the learning performance according to the simulation results. There are two reasons for this: on the one hand, the upper bound given by UGLA is very stable; on the other hand, the effect is reduced by the search-and-score process. It should be noted that BNEA shows "over learning" when SS becomes larger (SS > 5000). "Over learning" is typically considered a phenomenon that occurs only with small sample sizes; however, for high-dimensional combinatorial optimization problems such as BN learning, it is hard to obtain plenty of samples, and the time cost is also unacceptable when current algorithms operate on extremely large data sets. It is therefore reasonable to seek algorithms that balance precision and generalization on data sets of appropriate size, which is the intention behind the upper-lower bound restriction in UBCS.
Figure 3 shows that UBCS has an obvious advantage over the other two algorithms in time complexity. The experiment was run on a typical desktop computer. Compared with MMHC, both BNEA and UBCS perform better in time complexity because they use MPD to reduce the dimension of the search space. On the other hand, because the MMPC procedure of MMHC is also used in BNEA, MMHC should have the same time complexity as BNEA in the worst case. UBCS, in contrast, involves only 0-order and 1-order CI tests and therefore performs better in time complexity.

Conclusion
We propose a hybrid method for Bayesian network structure learning (UBCS). In this method, two constructive algorithms are given to build the upper and lower bounds of the BN structure, together with theoretical proofs. UGLA, the first part of UBCS, outputs an upper bound of the moral graph of the target structure, while the second part, LGLA, provides a lower bound of the target structure's PDAG. Principle NZMI is also proved in this paper; it reveals information hidden in 0-order CI tests that can be used to reduce the search space. Because it involves only low-order CI tests, UBCS has an advantage in time complexity over other hybrid learning methods, which is also supported by the simulation results.

Figure 2: From left to right, the three subfigures show the structure of the Asia network, the PDAG of the Asia network, and $G^{-}$ of the Asia network.

Figure 3: Comparison of time complexity in different algorithms.

Table 1: Comparison of scores based on the data set from UBCS, BNEA, and MMHC.