Bayesian networks are graphical probabilistic models through
which we can acquire, capitalize on, and exploit knowledge. Over the last decade they have become
an important tool for research and applications in artificial intelligence
and many other fields. This paper presents
Bayesian networks and discusses the inference problem in such models. It
states the problem and proposes a method for computing
probability distributions. It also uses D-separation to simplify
the computation of probabilities in Bayesian networks. Given a Bayesian
network over a family
Bayesian networks are graphical models for probabilistic relationships among a set of variables. They have been used in many fields due to their simplicity and soundness. They are used to model, represent, and explain a domain, and they allow us to update our beliefs about some variables when some other variables are observed, which is known as inference in Bayesian networks.
Given a Bayesian network [
The computation of the probability distribution of
In large Bayesian networks, the computation of probability distributions and conditional probability distributions may require summations relative to very large subsets of
This paper describes the computation of
We consider discrete random variables, but the results presented here can be generalized to continuous random variables with the density of
The paper is organized as follows. Section
A Bayesian network (BN) is a family of random variables the set for each
We know that this is equivalent to the equality
The joint probability distribution corresponding to the BN in Figure
An example of a Bayesian network.
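As a minimal illustration of how a joint distribution factorizes over a BN, the sketch below uses a hypothetical three-node binary chain A → B → C with made-up CPTs (not the network of the figure): the joint probability is the product of the local conditional tables, and the product is itself a valid distribution.

```python
from itertools import product

# Hypothetical binary chain A -> B -> C (illustrative CPTs only,
# not the network shown in the paper's figure).
p_a = {0: 0.6, 1: 0.4}                                        # P(A)
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3,
               (0, 1): 0.2, (1, 1): 0.8}                      # key (b, a): P(B=b | A=a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1,
               (0, 1): 0.5, (1, 1): 0.5}                      # key (c, b): P(C=c | B=b)

def joint(a, b, c):
    """Joint probability via the BN factorization P(A) P(B|A) P(C|B)."""
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

# The factorized joint must sum to 1 over all assignments.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
```

Because each conditional table sums to 1 over its child variable, `total` comes out to 1 regardless of the CPT entries chosen.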
We consider the probability distribution
We say that there is a link from
The probability
The probability distribution
Level two Bayesian network.
We define the set of close descendants of a node
In the example below (Figure
For each subset
We can identify this subset with the union of all
For each
In the example above (Figure
Consider the BN in Figure
Suppose we are interested in computing the distribution of
By marginalizing out the variables
By doing this we lose the structure of the BN.
If we do the marginalization as follows, we obtain (by Bayes' theorem):
In other words
which provides
Level two Bayesian network on
The set of variables used in the marginalization above to preserve the structure of a BN2 is the set of close descendants defined above.
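To make the two marginalization routes concrete, the sketch below (again on a hypothetical binary chain A → B → C with made-up CPTs, not the paper's example) computes the marginal of the last variable both by enumerating the full joint and by summing the variables out one at a time; the second route keeps a small network structure at every intermediate step.

```python
from itertools import product

# Hypothetical binary chain A -> B -> C (illustrative CPTs only).
p_a = {0: 0.6, 1: 0.4}
p_b_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # (b, a)
p_c_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # (c, b)

# Naive marginalization: enumerate the full joint, losing the BN structure.
naive = {c: sum(p_a[a] * p_b_a[(b, a)] * p_c_b[(c, b)]
                for a, b in product([0, 1], repeat=2))
         for c in [0, 1]}

# Structured marginalization: sum A out first (giving P(B)), then B.
# The intermediate factor P(B) is itself part of a smaller BN over {B, C}.
p_b = {b: sum(p_a[a] * p_b_a[(b, a)] for a in [0, 1]) for b in [0, 1]}
structured = {c: sum(p_b[b] * p_c_b[(c, b)] for b in [0, 1]) for c in [0, 1]}
```

Both routes give the same marginal; the difference is in the size of the intermediate tables, which is what makes the ordering of the summations matter.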
More generally, if we have to sum out more than one variable, the variables must be ordered first. The aim of the inference is to find a marginalization, or elimination, ordering for the arbitrary set of variables not in the target. This aim is shared by other node elimination algorithms such as “variable elimination” [
The main idea of all these algorithms is to find the best way to sum a set of variables out of a list of factors, one by one. An ordering of these variables is required as input. The computation depends on the elimination ordering; different elimination orderings produce different factors.
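A minimal sketch of this one-variable-at-a-time scheme over a list of factors (generic binary variables; the factor names and tables below are illustrative, not taken from the paper):

```python
from itertools import product

# A factor is (scope, table): scope is a tuple of variable names and
# table maps value tuples (ordered as in scope) to numbers.
def sum_out(var, factors):
    """Multiply all factors mentioning `var`, then sum `var` out."""
    touching = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        assign = dict(zip(scope, vals))
        total = 0.0
        for x in [0, 1]:                # sum over the eliminated variable
            assign[var] = x
            prod = 1.0
            for s, t in touching:
                prod *= t[tuple(assign[v] for v in s)]
            total += prod
        table[vals] = total
    return rest + [(scope, table)]

# Eliminate A then B from a hypothetical chain A -> B -> C.
factors = [(("A",), {(0,): 0.6, (1,): 0.4}),
           (("A", "B"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}),
           (("B", "C"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})]
for v in ["A", "B"]:                    # the elimination ordering is an input
    factors = sum_out(v, factors)
```

After the loop a single factor over C remains; choosing a different ordering would generate intermediate factors with different scopes and sizes.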
The algorithm we propose to solve this problem is called the “Successive Restrictions Algorithm” (SRA) [
The general idea of the successive restrictions algorithm is to manage the succession of summations over all random variables outside the target
The principle of the algorithm was presented in detail in [
We have introduced an algorithm that enables the computation of the probability distribution of a subset of random variables
This algorithm reaches the target distribution by finding a marginalization ordering that takes the computational constraints of the application into account. In certain simple cases, the SRA may be less powerful than the traditional methods [
In addition to the SRA, we propose, especially for large Bayesian networks, to segment the computation into several lighter computations that can be carried out independently. Such segmentations are made possible by D-separation.
Consider a DAG
On an intermediate node
(a) A converging connection. (b) A diverging connection. (c) A serial connection.
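The standard blocking rules behind these three connection types can be sketched as follows (an illustrative helper, not part of the paper's algorithm): a serial or diverging connection is blocked when its middle node is observed, while a converging connection is blocked only while neither the middle node nor any of its descendants is observed.

```python
def blocks(connection, node, observed, descendants):
    """Does `node` block a chain, given the evidence set `observed`?

    connection:  "serial", "diverging", or "converging"
    observed:    set of observed (evidence) nodes
    descendants: set of descendants of `node` in the DAG
    """
    if connection in ("serial", "diverging"):
        return node in observed                      # blocked when observed
    # converging: blocked unless the node or one of its descendants is observed
    return node not in observed and not (descendants & observed)

# A -> B -> C: observing B blocks the serial connection.
serial_blocked = blocks("serial", "B", {"B"}, set())
# A -> B <- C: observing B (or a descendant of B) opens the converging one.
collider_open = not blocks("converging", "B", {"B"}, set())
```

A chain is d-separated by an evidence set as soon as one of its intermediate nodes blocks it under these rules.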
Let
In other words, a chain is not d-separated by
If
Given a subset
As we can see in the following example:
Another classic graphical property used by some inference algorithms in the literature is the notion of the moral graph.
Given a DAG
its associated moral graph is the undirected graph
obtained by dropping edge directions and linking every pair of parents that share a common child.
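This moralization construction can be sketched in a few lines (the DAG below is an illustrative example, not the paper's figure):

```python
from itertools import combinations

def moral_graph(parents):
    """Undirected moral graph of a DAG given as {node: set of parents}:
    drop edge directions and 'marry' every pair of parents sharing a child."""
    edges = set()
    for child, pars in parents.items():
        for p in pars:
            edges.add(frozenset((p, child)))        # undirected parent-child edge
        for p, q in combinations(sorted(pars), 2):  # marry co-parents
            edges.add(frozenset((p, q)))
    return edges

# Hypothetical DAG with a converging connection A -> C <- B:
# moralization adds the edge A-B between the two parents of C.
g = moral_graph({"A": set(), "B": set(), "C": {"A", "B"}})
```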
In a similar way, we define what we call the hypermoral graph, as follows:
Given a DAG
its associated hypermoral graph is the undirected graph
of pairs of pairs
In Figure
Hypermoral and Moral Graphs.
The moral graph helps define the moral partition, as follows.
(i) We call a
In an equivalent way “there exists a chain, in the moral graph
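Assuming, as stated above, that two nodes belong to the same block exactly when a chain links them in the (restricted) moral graph, the blocks are the connected components of that undirected graph and can be computed with a simple traversal (illustrative sketch):

```python
def components(nodes, edges):
    """Connected components of an undirected graph; each component is
    one block of the (moral) partition described above."""
    adj = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    remaining, comps = set(nodes), []
    while remaining:
        start = remaining.pop()
        comp, stack = {start}, [start]
        while stack:                      # depth-first flood fill
            for m in adj[stack.pop()] - comp:
                comp.add(m)
                stack.append(m)
        comps.append(comp)
        remaining -= comp
    return comps

# Two disconnected pieces of a (moral) graph give two blocks.
blocks_ = components({"A", "B", "C", "D"},
                     {frozenset(("A", "B")), frozenset(("C", "D"))})
```

Each block can then be handled by an independent computation, which is the segmentation the D-separation results above make possible.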
In a similar way we define the hypermoral partition.
(ii) We call an
(a)
The following results show the possibility of segmenting the computation of the probability distribution
Let
The proof of the theorem can be found in [
The set
(a)
We have seen in the last two sections that the application of the SRA for the computation of
The following two sets constitute a unique partition defining a BN2 on
This theorem indicates that the level two Bayesian network, characterizing the probability distribution
Let us show that the partition of the target
As
The application of the SRA for the computation of
Let us prove this result by induction on the cardinality of
Let us assume that Card
On the one hand, marginalizing out the variable (
The BN2 resulting from this marginalization is formed of the new node
On the other hand, since Card
all other nodes, in other words the nodes
This shows the result in this first case.
Let us suppose now that Card
We sum out the variables in the reverse of the given hierarchical order.
Let us assume that the result holds up to step
What justifies the proof by induction is that once we marginalize out
The result holds up to step
Showing the result for
On the one hand, let us first find the partition obtained by applying the SRA. We know that the marginalization of
So if we write
In this case, the partition of
If we write
all other isolated nodes, that is, the nodes of
On the other hand, let us now determine the partition of
We have on
Since
all other isolated nodes
So we have
This shows that this partition is the same as the partition obtained by applying the SRA.
(a): fraction of a BN before marginalizing out
(a): fraction of a BN2 on