Local Structure Recovery of Chain Graphs after Marginalization

Graphical models, including independence graphs, directed acyclic graphs (DAGs), and chain graphs (CGs), have been applied widely in fields such as stochastic systems, data mining, pattern recognition, artificial intelligence, and causal discovery. In this paper, we discuss local structure recovery for a CG when there exist unobserved or latent variables, or after marginalization over observed variables. Suppose that there is an unknown true CG with a large number of variables, but we are interested only in constructing a local structure of the CG from a marginal distribution of a subset of the variables. We present a condition for this localized recovery and explain which edges and directions of edges in the local structure can be recovered validly from the marginal distribution and which edges may be spurious. We say that an edge or a direction is recovered validly from the marginal distribution if it is the same as that recovered from the joint distribution. The condition is useful for both sampling design and data analysis. This localization of structure recovery is related to identification and collapsibility.


Introduction
Graphical models, also known as Markov networks and Bayesian networks, including independence graphs, directed acyclic graphs (DAGs), and chain graphs (CGs), have been applied widely in many fields, such as stochastic systems, data mining, pattern recognition, artificial intelligence, and causal discovery. Chain graphs are widely used to represent independence, conditional independence, and causal relationships among random variables [1-5]. Structure recovery of CGs has been discussed by many authors [4-6]. A statistical conclusion may be reversed after marginalization over some variables, which is called the Yule-Simpson paradox [7, 8]. For sampling design, prior knowledge or assumptions on models, such as the faithfulness assumption [4, 5] and collapsibility [9, 10], are necessary for valid structural learning of CGs and for valid parameter estimates, since some variables may be unobserved. On the other hand, for data analysis, conditions are necessary for marginalizing over some observed variables. Various conditions have been presented for avoiding a reversal of statistical conclusions about association and about parameters of linear models [10-14]. Collapsibility of parameter estimates for undirected graphical models over some variables has been discussed in [9, 15-17], and that for DAGs has been discussed in [18, 19].
In this paper, we discuss local structure recovery for a CG when there exist unobserved or latent variables, or after marginalization over observed variables. Suppose that there is an unknown true CG with a large number of variables, but we are interested only in constructing a local structure of the CG from a marginal distribution of a subset of the variables. We present a condition for this localized recovery and explain which edges and directions of edges in the local structure can be recovered validly from the marginal distribution and which edges may be spurious. We say that an edge or a direction is recovered validly from the marginal distribution if it is the same as that recovered from the joint distribution. The condition is useful for both sampling design and data analysis. This localization of structure recovery is related to identification and collapsibility.
Section 2 gives notation and definitions. In Section 3, we present theoretical results on local structure recovery. Finally, a discussion is given in Section 4.

Notation and Definitions
In this section, we briefly introduce terminology and notation from graph theory. Readers can refer to [3, 20] for more details.
Let G = (V, E) be a graph with vertex set V and edge set E. We say that there is an undirected edge, or a line, between vertices u and v (denoted u − v) if ⟨u, v⟩ ∈ E and ⟨v, u⟩ ∈ E, and that there is a directed edge, or an arrow, from u to v (denoted u → v) if ⟨u, v⟩ ∈ E and ⟨v, u⟩ ∉ E. The chain components of G are obtained by removing all arrows in G and taking the connectivity components of the remaining undirected graph. A chain graph with only undirected edges is known as an undirected graph (UG). A chain graph with only directed edges and without any directed cycles is known as a directed acyclic graph (DAG).
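These definitions can be made concrete with a small sketch, in which a CG is stored as a set of ordered vertex pairs exactly as above: a line u − v appears as both ⟨u, v⟩ and ⟨v, u⟩, an arrow u → v only as ⟨u, v⟩. The function and variable names here are our own illustration, not from the paper.

```python
# Sketch: a chain graph stored as a set of ordered vertex pairs E.
# A line u - v appears as both (u, v) and (v, u); an arrow u -> v only as (u, v).

def lines(E):
    """Return the set of undirected edges as frozensets {u, v}."""
    return {frozenset((u, v)) for (u, v) in E if (v, u) in E}

def chain_components(V, E):
    """Drop all arrows and return the connectivity components
    of the remaining undirected graph."""
    undirected = lines(E)
    comps, seen = [], set()
    for s in V:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(y for y in V if y != x and frozenset((x, y)) in undirected)
        seen |= comp
        comps.append(comp)
    return comps

# Example CG: a -> b, b - c, c -> d; its chain components are {a}, {b, c}, {d}.
V = {"a", "b", "c", "d"}
E = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}
print(sorted(map(sorted, chain_components(V, E))))
```

The pair-based encoding mirrors the edge-set definition directly, which keeps the later parent/neighbour operations one-line set comprehensions.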
If u → v, then we call u a parent of v and v a child of u. We use pa_G(v) and ch_G(v) to denote the sets of parents and children of v, respectively. If u − v, then we call u a neighbour of v, and we use ne_G(v) to denote the set of neighbours of v in G. If there is an edge between vertices u and v, we say that u and v are joined or adjacent. The family of v consists of v and its parents and is denoted fa_G(v). For a set A ⊆ V, we define similarly pa_G(A) = ∪_{v∈A} pa(v) \ A, ch_G(A) = ∪_{v∈A} ch(v) \ A, and fa_G(A) = A ∪ pa_G(A). The boundary of v is the set of neighbours and parents of v, denoted bd_G(v). Moreover, the boundary of a set A ⊆ V is defined as bd_G(A) = (pa_G(A) ∪ ne_G(A)) \ A. When the underlying CG is clear from context, the subscript G is omitted to simplify notation.
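With the pair encoding, the parent, child, neighbour, and boundary operations above become short set comprehensions. This is an illustrative sketch with our own names:

```python
# Sketch of pa / ch / ne / bd for a chain graph given as ordered pairs.

def pa(E, v):
    # parents: arrows u -> v, i.e. (u, v) in E but (v, u) not in E
    return {u for (u, w) in E if w == v and (v, u) not in E}

def ch(E, v):
    # children: arrows v -> w
    return {w for (u, w) in E if u == v and (w, v) not in E}

def ne(E, v):
    # neighbours: lines v - u
    return {u for (w, u) in E if w == v and (u, v) in E}

def bd(E, A):
    # boundary of a set A: (pa(A) ∪ ne(A)) \ A
    out = set()
    for v in A:
        out |= pa(E, v) | ne(E, v)
    return out - set(A)

# Example: a -> b, b - c, b - d
E = {("a", "b"), ("b", "c"), ("c", "b"), ("d", "b"), ("b", "d")}
print(pa(E, "b"), ne(E, "b"), bd(E, {"b"}))
```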
A sequence of vertices v_0, v_1, ..., v_{n−1}, v_n, n ≥ 0, with ⟨v_{i−1}, v_i⟩ ∈ E or ⟨v_i, v_{i−1}⟩ ∈ E for all 1 ≤ i ≤ n, is called a route, and v_0 and v_n are the ends of the route. Furthermore, if ⟨v_{i−1}, v_i⟩ ∈ E for all 1 ≤ i ≤ n, we say that the route is descending, and we write u ↦ v to denote a descending route from vertex u to v. If the vertices of a route are all distinct, then we call the route a path. If there is a descending path from vertex u to v, then we call u an ancestor of v and v a descendant of u, and we write an(v) and de(u) for the sets of ancestors of v and descendants of u, respectively. Similarly, the ancestral set of a set A ⊆ V is defined as an(A) = ∪_{v∈A} an(v) \ A, and the descendant set is defined as de(A) = ∪_{v∈A} de(v) \ A. Furthermore, we define An(A) = an(A) ∪ A. A vertex v without any children in G is called a terminal. Besides, if de_G(A) = ∅, we call A a terminal set. We call a route a pseudocycle if it satisfies v_0 = v_n; it is called a cycle if it also satisfies n ≥ 3 and v_0, v_1, ..., v_{n−1} are distinct vertices. A cycle or pseudocycle is directed if it is descending and ⟨v_i, v_{i−1}⟩ ∉ E for some i ∈ {1, ..., n}. Finally, if a graph G does not contain any directed cycles or directed pseudocycles, we call it a chain graph (CG).
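Since a descending route traverses lines in either direction but arrows only forwards, an(v) can be computed by walking backwards along the pairs ⟨u, w⟩ ∈ E. A minimal sketch (our own names):

```python
# Sketch: ancestors via descending routes. Every pair (u, w) in E can be the
# step u -> w or the line u - w, so walking pairs backwards from v collects
# exactly the vertices with a descending route into v.

def an(E, v):
    """Ancestors of v: vertices u != v with a descending route u |-> v."""
    anc, stack = set(), [v]
    while stack:
        x = stack.pop()
        for (u, w) in E:
            if w == x and u != v and u not in anc:
                anc.add(u)
                stack.append(u)
    return anc

def An(E, A):
    """An(A) = an(A) ∪ A."""
    out = set(A)
    for v in A:
        out |= an(E, v)
    return out

# Example CG: a -> b - c, c -> d. Since b - c is a line, a, b, and c all have
# descending routes into d.
E = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}
print(an(E, "d"))
```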
A section σ of a route ρ = (v_0, v_1, ..., v_{n−1}, v_n) in a CG G is any maximal undirected subroute of ρ, for example, v_j − ⋅⋅⋅ − v_k with 0 ≤ j ≤ k ≤ n. We call v_j and v_k the two ends of the section. Besides, if there is v_{j−1} → v_j on ρ for some j > 0 (or v_k ← v_{k+1} on ρ for some k < n), then we call v_j (or v_k) a head-terminal; otherwise we call it a tail-terminal. A head-to-head section with respect to route ρ is a section with two head-terminals, and a non-head-to-head section is one with at most one head-terminal. We use {v_j, ..., v_k} to denote the set of vertices of a section σ. If {v_j, ..., v_k} ∩ S = ∅ for S ⊆ V, then we say that section σ is outside of the set S. If {v_j, ..., v_k} ∩ S ≠ ∅, then we say that σ is hit by the set S.
Another important concept for CGs is the complex. A complex κ in G is a path (v_0, v_1, ..., v_k), k ≥ 2, of the form v_0 → v_1 − ⋅⋅⋅ − v_{k−1} ← v_k such that there is no extra edge among the vertices on the path. The vertices v_0 and v_k are called the parents of the complex κ, denoted par(κ), and {v_1, v_2, ..., v_{k−1}} is called the region of κ. Moreover, a complex with vertex v as one of its parents is denoted κ_par(v). Finally, the Markov blanket of a vertex v is defined as MB(v) = {v} ∪ ch(v) ∪ ne(v) ∪ pa(v) ∪ par(κ_par(v)).
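The "no extra edge" requirement makes a complex an induced path, so whether a given vertex sequence forms a complex is directly checkable. A sketch under that reading (names ours):

```python
# Sketch: test whether a vertex sequence (v_0, ..., v_k) forms a complex
# v_0 -> v_1 - ... - v_{k-1} <- v_k with no extra edge among its vertices.

def arrow(E, u, v):
    return (u, v) in E and (v, u) not in E

def line(E, u, v):
    return (u, v) in E and (v, u) in E

def is_complex(E, path):
    k = len(path) - 1
    if k < 2:
        return False
    # arrows into the two ends of the region
    if not (arrow(E, path[0], path[1]) and arrow(E, path[k], path[k - 1])):
        return False
    # the region v_1 - ... - v_{k-1} must be an undirected chain
    if not all(line(E, path[i], path[i + 1]) for i in range(1, k - 1)):
        return False
    # no extra edge among the path's vertices (including between the parents)
    for i in range(k + 1):
        for j in range(i + 2, k + 1):
            if (path[i], path[j]) in E or (path[j], path[i]) in E:
                return False
    return True

# Example: a -> b - c <- d is a complex; adding the line a - d destroys it.
E1 = {("a", "b"), ("b", "c"), ("c", "b"), ("d", "c")}
E2 = E1 | {("a", "d"), ("d", "a")}
print(is_complex(E1, ["a", "b", "c", "d"]), is_complex(E2, ["a", "b", "c", "d"]))
```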
Next, we give the definition of c-separation in a CG G. A route ρ is said to be c-separated by a set S ⊆ V in G if some section σ of ρ satisfies one of the following two conditions: (1) σ is a head-to-head section with respect to ρ and σ is outside of the set S, or (2) σ is not a head-to-head section with respect to ρ and σ is hit by the set S. Moreover, we say that two disjoint sets A, B ⊆ V are c-separated by a disjoint set S ⊆ V in G if every route ρ between A and B is c-separated by S, which we denote by A ⫫_G B | S. If two CGs share the same c-separation patterns over the same set of vertices, then we say that they are Markov equivalent. All chain graphs that are Markov equivalent to each other form an equivalence class, which is known as the Markov equivalence class. It is well known that two CGs are Markov equivalent if and only if they share the identical global skeleton and the same complexes [20].
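An equivalent, and easier to implement, formulation of c-separation is the classical moralization criterion: A ⫫_G B | S holds if and only if S separates A and B in the moral graph of the subgraph induced by An(A ∪ B ∪ S), where moralization drops directions and completes the parent set of every chain component. A sketch assuming that equivalence (names ours):

```python
# Sketch: checking A ⫫_G B | S in a CG via the moralization criterion,
# equivalent to the route-based c-separation defined above.

def ancestral(E, W):
    """An(W): W plus all vertices with a descending route into W."""
    out, stack = set(W), list(W)
    while stack:
        x = stack.pop()
        for (u, w) in E:
            if w == x and u not in out:
                out.add(u)
                stack.append(u)
    return out

def chain_components(V, E):
    comps, seen = [], set()
    for s in V:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(w for (u, w) in E if u == x and (w, x) in E)
        seen |= comp
        comps.append(comp)
    return comps

def c_separated(V, E, A, B, S):
    # 1. restrict to the ancestral set of A ∪ B ∪ S
    W = ancestral(E, set(A) | set(B) | set(S))
    Ew = {(u, v) for (u, v) in E if u in W and v in W}
    # 2. moralize: drop directions, complete the parents of each chain component
    adj = {frozenset((u, v)) for (u, v) in Ew}
    for tau in chain_components(W, Ew):
        parents = {u for (u, v) in Ew if v in tau and (v, u) not in Ew}
        for p in parents:
            for q in parents:
                if p != q:
                    adj.add(frozenset((p, q)))
    # 3. S separates A and B iff B is unreachable from A avoiding S
    reach, stack = set(A), list(A)
    while stack:
        x = stack.pop()
        for y in W - set(S) - reach:
            if frozenset((x, y)) in adj:
                reach.add(y)
                stack.append(y)
    return not (reach & set(B))

# Collider a -> c <- b: a and b are c-separated marginally but not given c.
V = {"a", "b", "c"}
E = {("a", "c"), ("b", "c")}
print(c_separated(V, E, {"a"}, {"b"}, set()), c_separated(V, E, {"a"}, {"b"}, {"c"}))
```

Conditioning on the collider c completes its parents {a, b} in the moral graph, which is exactly how condition (1) of the route-based definition fails.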
If a probability distribution P over the variables in V permits the factorization [3]

P(x) = ∏_{τ∈T} P(x_τ | x_{pa(τ)}),

where T is the collection of chain components of the CG G and P(x_τ | x_{pa(τ)}) denotes the conditional probability distribution of X_τ given X_{pa(τ)}, then we say that P is a compatible probability distribution with respect to G. It is easy to check that if P is compatible with the CG G, then

A ⫫_G B | S implies A ⫫_P B | S,

where A ⫫_P B | S represents the conditional independence of X_A and X_B given X_S in P. If the condition is strengthened to the equivalence

A ⫫_G B | S if and only if A ⫫_P B | S,

then P is said to be faithful to G.
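As a toy illustration of the factorization, the CG a → b − c has chain components {a} and {b, c}, so a compatible distribution factors as P(a, b, c) = P(a) P(b, c | a). The numeric conditionals below are made up for the example:

```python
# Sketch: a compatible distribution for the CG a -> b - c, built as
# P(a, b, c) = P(a) * P(b, c | a), with made-up binary conditionals.

from itertools import product

P_a = {0: 0.4, 1: 0.6}                                   # P(a)
P_bc_given_a = {                                          # P(b, c | a), keys (b, c, a)
    (0, 0, 0): 0.5, (0, 1, 0): 0.2, (1, 0, 0): 0.2, (1, 1, 0): 0.1,
    (0, 0, 1): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 1): 0.25,
}

# the product over chain components gives the joint
P = {(a, b, c): P_a[a] * P_bc_given_a[(b, c, a)]
     for a, b, c in product((0, 1), repeat=3)}
print(round(sum(P.values()), 10))  # normalizes to 1.0
```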

Local Structure Recovery after Marginalization
Let A, B, and C be a partition of all the variables in V. In this section, we assume that A ⫫ B | C and suppose that the variables in B are omitted or unobservable. This assumption of conditional independence has been discussed as one of the conditions for collapsibility of parameter estimates over unobserved variables [9-12, 14, 18, 19]. Collapsibility of parameter estimates further requires another condition, namely that the separator C induces a complete subgraph of the moral graph. A decomposition approach to structural learning proposed in [21] also requires the condition of a complete separator. In this section, we show, under the assumption of conditional independence but without the condition of a complete separator, that the local structure of the CG G over A ∪ C can be partially recovered from the marginal distribution of the observed variables in A ∪ C.
In many practical applications, the conditional independence A ⫫ B | C can be judged with domain or prior knowledge, for example, in Markov chains, chain graphical models, and dynamic or temporal models [1]. When all variables in the full set V are observed, we can first construct an undirected independence graph over V from the observed data and then find a set C that separates A and B in this undirected graph, so that A ⫫ B | C holds [2, 3]. Two CGs have the same Markov property if and only if they have the same skeleton (i.e., the undirected version of G) and the same complexes [20]. Thus, for recovering the structure of a CG, we need only learn the skeleton and the complexes from a distribution of observed variables. A marginal distribution obtained from a distribution P with the Markov property of G may not obey the Markov property of any CG. Although the class of CGs is not closed under marginalization in this sense, we show that the local structure of a CG may be partially recovered under some conditions. First we discuss which edges of the local structure over A ∪ C can be recovered validly from a marginal distribution of A ∪ C and which cannot.

Theorem 2. Two vertices in A are c-separated by a subset of V if and only if they are c-separated by a subset of A ∪ C.
Proof. The sufficiency is obvious since V ⊇ A ∪ C. For the necessity, let u and v be two vertices in A that are c-separated by some S ⊆ V. Thus there is no edge connecting u and v in the CG G. Since u and v are contained in A, bd(u) and bd(v) are contained in A ∪ C; otherwise A ⫫ B | C would fail. Without loss of generality, suppose that u is not an ancestor of v. Thus we have that bd(u) (⊆ A ∪ C) c-separates u and v.
From Theorem 2, we can see that the existence of edges falling into A can be determined validly from the marginal distribution of A ∪ C.

Theorem 3. Let u and c be two vertices in A and C, respectively. Then u and c are c-separated by a subset of V if and only if they are c-separated by a subset of A ∪ C.
Proof. The sufficiency is obvious. For the necessity, suppose that u ∈ A and c ∈ C are c-separated by some subset of V, and let S = [(An(u) ∪ An(c)) ∩ (A ∪ C)] \ {u, c}. We show that every route ρ between u and c is c-separated by S, distinguishing two cases: (1) ρ is completely contained in An(u) ∪ An(c); (2) ρ is not completely contained in An(u) ∪ An(c).

We first discuss case (1). Let ρ = (u, w, v, ...) ⊆ An(u) ∪ An(c). Note that w cannot be the vertex c, since there is no edge between u and c, but v may be c. Further we consider two subcases: (1.1) w is not contained in any head-to-head section of ρ; (1.2) w is contained in some head-to-head section of ρ. For subcase (1.1), since w is adjacent to u ∈ A and A ⫫ B | C, we have w ∈ A ∪ C; together with w ∈ An(u) ∪ An(c) and w ∉ {u, c}, this gives w ∈ S, so the non-head-to-head section containing w is hit by S and ρ is c-separated by S.
For subcase (1.2), suppose that the head-to-head section of ρ containing w is u → w − v − ⋅⋅⋅ − v_1 ← z, where z is the other parent of this head-to-head section. Since the section is contained in An(u) ∪ An(c), we have v_1 ∈ An(u) ∪ An(c); thus the vertex z cannot be c. Otherwise v_1 ∈ ch(c), and hence v_1 ∈ de(c), which contradicts v_1 ∈ An(u) ∪ An(c). Because A ⫫ B | C and u ∈ A, we have z ∉ B. Thus z ∈ A ∪ C, and then z ∈ S = [(An(u) ∪ An(c)) ∩ (A ∪ C)] \ {u, c}. Hence the route ρ is c-separated by S. Now we consider case (2) and show first that such a route contains a blocking head-to-head section. Let ρ = (u, ..., s, t, ..., v, r, ..., c) be such that s and r are contained in An(u) ∪ An(c) but no vertex from t to v is contained in An(u) ∪ An(c); such vertices exist, since ρ is not completely contained in An(u) ∪ An(c). The arrows must be oriented as s → t and r → v, since s, r ∈ An(u) ∪ An(c) and t, v ∉ An(u) ∪ An(c). Then there must be a head-to-head section between t and v on ρ, and none of its descendants is in An(u) ∪ An(c). Thus the vertices in the region of this head-to-head section and its descendants are not in S = [(An(u) ∪ An(c)) ∩ (A ∪ C)] \ {u, c}, so this section is outside of S, and we obtain that ρ is c-separated by S. This proves the theorem.
According to Theorem 3, the existence of edges crossing A and C can also be determined validly from the marginal distribution of A ∪ C.
Example 1 (continued). Consider again the CG in Figure 1, and partition its vertices into a set A of six vertices, a single unobserved vertex constituting B, and a three-vertex separator C such that A ⫫ B | C. According to Theorems 2 and 3, we can then obtain a local skeleton over A ∪ C from the marginal distribution of A ∪ C; this skeleton may contain spurious edges, as shown in Figure 2. Similarly, suppose that only four of the variables are observed and the other six are not. Then we obtain the local skeleton from the marginal distribution of these four variables, as shown in Figure 3. Note that the three edges falling into C are spurious in Figure 2, but they are absent in Figure 3.
Next we discuss the recovery of complexes from a marginal distribution of A ∪ C. We say that a complex (u, w_1, ..., w_r, v), r ≥ 1, can be determined validly from the marginal distribution of A ∪ C if the marginal distribution satisfies the following two conditions: (1) u ⫫ v | S holds for some S ⊆ A ∪ C, and (2) neither u ⫫ v | S ∪ {w_1} nor u ⫫ v | S ∪ {w_r} holds. Below we discuss which complexes of the local structure can be determined validly from the marginal distribution and which cannot. The following two theorems give conditions for determining complexes and for the validity of the determined complexes, respectively.

Theorem 4. If at most one vertex of a complex is not contained in A, then the complex can be determined from the marginal distribution of A ∪ C.
Proof. Let (u, w_1, ..., w_r, v), r ≥ 1, be a complex in which u and v are the parents and at most one of the vertices is not contained in A. It follows from A ⫫ B | C that {u, w_1, ..., w_r, v} ⊆ A ∪ C. From Theorems 2 and 3, we can validly determine the presence of the edges (u, w_1), (w_1, w_2), ..., (w_r, v), and we can find a subset S of A ∪ C that c-separates u and v and does not contain the vertices w_1, ..., w_r. Thus this complex (u, w_1, ..., w_r, v) can be determined from the marginal distribution of A ∪ C.

Theorem 5. If the two parents of a complex are contained in A, then the directions of the complex can be determined from the marginal distribution of A ∪ C.
According to Theorem 4, from the marginal distribution of A ∪ C we can determine all complexes of the local structure that have at most one vertex outside A. When the two parents of a complex are contained in C, the complex may not be determined, since the two parents may not be c-separated by any subset of A ∪ C. From Theorem 5, however, if a complex whose two parents are contained in A is determined from the marginal distribution of A ∪ C, then the directions of the complex must be valid. A complex with more than one vertex falling into C may be a spurious complex, since the edges falling into C may be spurious.
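Conditions (1) and (2) for determining a complex can be checked against any independence oracle for the marginal distribution. A sketch with a hand-coded toy oracle standing in for a statistical test (names ours):

```python
# Sketch: checking conditions (1) and (2) above for a complex
# (u, w_1, ..., w_r, v), with the independence test supplied as a function.

def complex_determined(u, v, region, S, indep):
    """(1) u ⫫ v | S, and (2) u and v are dependent given S plus either
    end vertex of the region."""
    w1, wr = region[0], region[-1]
    return (indep(u, v, S)
            and not indep(u, v, S | {w1})
            and not indep(u, v, S | {wr}))

# Toy oracle for the complex a -> b <- c: a and c are independent exactly
# when b is NOT in the conditioning set.
def indep(x, y, S):
    return {x, y} == {"a", "c"} and "b" not in S

print(complex_determined("a", "c", ["b"], set(), indep))
```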
From Theorems 2 to 5, we obtain an approach for local structure recovery: edges are recovered according to Theorems 2 and 3, and then directions are determined according to Theorems 4 and 5. Suppose that we are interested in the local structure recovery of a CG over a set A of variables. Then we must find a set C to be observed, based on domain or prior knowledge, such that C is large enough to c-separate A from the set B of unobserved variables.
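The edge-recovery step of this approach can be sketched as a PC-style skeleton search restricted to the observed set A ∪ C: an edge between two observed vertices is kept exactly when no subset of the remaining observed vertices separates them. The oracle below is a hand-coded toy; in practice it would be a statistical independence test on the marginal distribution (names ours):

```python
# Sketch of the local recovery loop: for each pair of observed vertices,
# search for a separating subset within the observed set; keep an edge only
# if no such subset exists.

from itertools import combinations

def local_skeleton(observed, indep):
    observed = sorted(observed)
    edges = {frozenset((u, v)) for u, v in combinations(observed, 2)}
    sepset = {}
    for u, v in combinations(observed, 2):
        rest = [w for w in observed if w not in (u, v)]
        for k in range(len(rest) + 1):
            found = next((set(S) for S in combinations(rest, k)
                          if indep(u, v, set(S))), None)
            if found is not None:
                edges.discard(frozenset((u, v)))
                sepset[frozenset((u, v))] = found
                break
    return edges, sepset

# Toy oracle for the chain a -> b -> c with a, b, c all observed:
# a and c are independent exactly when b is in the conditioning set.
def indep(u, v, S):
    return {u, v} == {"a", "c"} and "b" in S

edges, sepset = local_skeleton({"a", "b", "c"}, indep)
print(sorted(map(sorted, edges)), sepset)
```

By Theorems 2 and 3, the edges this loop keeps inside A and across A and C are valid, while edges it keeps inside C may still be spurious.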
Example 1 (continued). Now we search for complexes in Figures 2 and 3. For Figure 2, suppose that the single variable in B is not observed. According to Theorem 4, three complexes can be found from the marginal distribution of A ∪ C, as shown in Figure 4. Two further complexes cannot be found, since there exist spurious edges between their parents, although the directions of their arrows are oriented by the other complexes. For Figure 3, there is no complex in the local structure, because the three remaining observed variables are conditionally independent given the fourth.
If the variable in B is omitted or unobservable, we only obtain the local structure in Figure 4, in which all directions and all edges not falling into C are valid, but the edges falling into C may be spurious.

Discussion
We showed that the conditional independence A ⫫ B | C is a sufficient condition for local structure recovery when there exist unobserved or latent variables, or after marginalization over observed variables. This conditional independence is an important piece of prior knowledge for local structure recovery, and it may hold in many cases, such as Markov chains, chain graphical models, and dynamic or temporal models [1]. Suppose that we are interested in the local structure of a CG over a set A of variables. We must find a set C that is large enough to separate A from the set B of unobserved variables. Based on the conditional independence A ⫫ B | C, we explained which edges and directions of edges in the local structure can be recovered validly after marginalization and which cannot.
Based on the theoretical results presented in this paper, we can efficiently recover local structures of a CG. Domain or prior knowledge of conditional independencies can be utilized to facilitate the structural recovery. The theoretical results can also be used for observational study design and for split questionnaire survey sampling [22, 23].