Statistical Identification of Markov Chains on Trees

The theoretical study of continuous-time homogeneous Markov chains usually rests on the natural assumption of a known transition rate matrix (TRM). In realistic systems, however, the TRM of a Markov chain may be unknown and may even have to be identified from partially observable data. Thus, the problem of identifying the TRM of the underlying Markov chain from partially observable information is of great significance in applications. This is what we call the statistical identification of Markov chains. A Markov chain inversion approach has previously been derived for basic Markov chains under partial observation at a few states. In the current letter, the more extensive class of Markov chains on trees is investigated. First, a more tractable type of derivative constraint is developed. Then, it is shown that all Markov chains on trees can be identified using only such derivative constraints on the univariate distributions of sojourn time and/or hitting time at a few states. A numerical example is included to demonstrate the correctness of the proposed algorithms.


Introduction
The classic study of continuous-time homogeneous Markov chains usually rests on the natural assumption of a known transition probability matrix or transition rate matrix (TRM). However, the TRM of a Markov chain in realistic systems may be unknown even though an underlying one exists (named the underlying Markov chain), and it is this TRM that needs to be identified from observable data. A Markov chain with a finite state space is completely determined once its TRM is identified. In some realistic systems, moreover, one can observe only part of the motion of the underlying Markov chain. For example, one may only observe the sojourn sequences and hitting sequences at one of the states. One can then fit the PDF of the sojourn (resp., hitting) time by statistical techniques. It is easy, indeed trivial, to identify the TRM of a Markov chain if we know exactly the PDFs of the sojourn time and hitting time of every state.
A natural question arises as to whether it is possible to identify the TRM from only those few states. Due to the transition relations among intercommunicating states in the underlying Markov chain, it is indeed possible. Thus, the problem of identifying the TRM of the underlying Markov chain from partially observable information is of great significance in applications.
In fact, the continuous-time homogeneous Markov model that describes the gating kinetics of single ion channels is a good example of a real application in biophysics and neuroscience; for example, see [1,2] and further publications. In experiments, the transitions between the various states cannot all be directly observed, and only transitions among a small number of open states are observable. It is straightforward to derive the sojourn time and hitting time distributions for a single state or a small number of states. The PDFs of sojourn or hitting time generally take the form of a sum of exponentials (e.g., [2][3][4]), and the best-fitting approach and function can be found with a number of readily available computer programs [5][6][7].
Unfortunately, once the fitting is completed, it is very difficult to use these PDFs to identify the TRM of the underlying Markov chain. The most difficult challenge lies in finding the hidden solutions and algorithms to perform the reverse operation. Thus, the identification has usually been addressed directly via the maximum likelihood estimate (MLE) (e.g., see [8][9][10][11] and further references) rather than by performing this inversion. Another series of publications uses canonical forms of reduced dimensions to analyze finite two-state trajectories, in which the system is aggregated into only two states, defined as the on state and the off state (irreversible transitions are also allowed); see [12,13] and the references therein for a review. All of these methods are powerful. However, each approach has pros and cons, and none of them can perfectly identify all kinds of Markov chains. In general, for example, the univariate sojourn time distributions of the open and closed states may not suffice to parameterize a given Markov model. This is shown by [14], where, in general, the bivariate distributions of subsequent open and closed times are also necessary.
Note that those approaches discard the prior information carried by the underlying topology. It is indeed possible to perform the complicated inversion, since the prior information about the underlying architecture of the Markov chain is very helpful in accomplishing it. This is what we call statistical identification of Markov chains (SIMC). We developed a Markov chain inversion approach (MCIA) to open up the possibility of solving the SIMC. In applications the problem reduces to three steps. The first is to obtain the necessary data at the fewest possible states by preprocessing the observed data. The second is to accurately fit the corresponding PDFs of the states required by the next step. The final one is to find the algorithm that identifies the TRM of the underlying Markov chain from those PDFs. Throughout this paper, we focus our attention on the final step, since there are a number of readily available approaches for fitting those PDFs.
We explore a class of Markov models in which reversibility is maintained. Some realistic systems obey the principle of microscopic reversibility or detailed balance, which implies the reversibility of the Markov chain. At least in applications to ion channels, an assumption of reversibility is plausible in many circumstances, corresponding to channels that are not coupled to a free energy source such as an ion concentration gradient; for a discussion of this assumption, see [15]. As an example, [16,17] derived a criterion for deciding whether a three-state Markov system is reversible or not, based on the coarse-grained information of two-state trajectories. Throughout the rest of this paper, the unqualified term "Markov chain" usually refers to a reversible Markov chain.
It is well known that the classic architectures include star-graph, linear, and cycle chains. Some general constraints and the corresponding algorithms for them were explored in former publications [18][19][20]. There is no obvious strategy for selecting the constraints to use for a general model; a key task is to find feasible solutions and corresponding algorithms for different classes. A subsequent question is whether more complicated architectures can be identified by such an approach. On this point, any architecture can be viewed as a tree provided that it contains no cycles. This implies that the tree is a very important extension of the star graph and line graph and represents one of the two classes of topologies (those containing cycles and those that do not). It is known that all Markov chains on trees are reversible [21]. Thus, SIMC on trees will be explored in this letter, although some such models have not yet been applied in real systems. It is shown that all Markov chains on trees can be identified by our approach. For SIMC on trees, the greatest contribution is to discover the fundamental method and the corresponding algorithm to identify the TRM, given the challenge of performing the reverse operation as stated earlier. Hence, this work further opens up the possibility of carrying out statistical identification for all reversible Markov chains.
Such an approach has obvious advantages over MLE: it can identify most reversible Markov chains without requiring bivariate distributions of subsequent sojourn and hitting times, and the computation is accurate, being based on accurate sojourn time PDFs and the prior information about the underlying topological structure of the Markov chain.
This paper is organized as follows. After recalling the sojourn time distribution, a more tractable type of derivative constraint is developed in Section 2. In Section 3, the solution to SIMC on trees is provided via such derivative constraints. In addition, we address the solutions for three particular and regular representatives of trees, including theoretical conclusions, proofs, and algorithms. Finally, a numerical example is included to demonstrate the correctness of the proposed algorithms.
Assume that the state space S may be partitioned as S = O ∪ C, where O is the set of observable states and C is its complement. The corresponding partition is used for the equilibrium distribution π; that is, π⊤ = [π_O⊤, π_C⊤].
2.1. Distributions of Hitting Time. The sojourn time and hitting time of O can be defined as σ = inf{t > 0 : X_t ∉ O} and τ = inf{t > 0 : X_t ∈ O}, respectively, with the usual convention that the infimum over an empty set is infinity. They are also the hitting time and sojourn time of C, respectively. For some basic results on aggregated continuous-time Markov chains, refer to [1,2,20,22]; only those results required in the sequel are recalled here.
For ease of exposition, let m be the number of states in C, with C ≡ {1, 2, . . ., m} and Π_C ≡ diag(π_1, π_2, . . ., π_m). Note that Q_CC is nonsingular since {X_t} is irreducible, and hence all the eigenvalues of Q_CC have strictly negative real parts. Assume that −λ_1, −λ_2, . . ., −λ_m are the (all real) eigenvalues of Q_CC, with each λ_k > 0. By symmetric reparameterization with respect to the inner product ⟨⋅, ⋅⟩_{Π_C}, Q_CC can be diagonalized with an orthogonal matrix U; setting P = Π_C^{−1/2} U, B = P^{−1}, and Λ = diag(λ_1, λ_2, . . ., λ_m), one can get

Q_CC = −PΛB. (3)

Theorem 1. The PDF of the hitting time of O (resp., the sojourn time of C) takes the form of a sum of m exponentials, Σ_{k=1}^{m} a_k exp(−λ_k t), where −λ_k (λ_k > 0) are the real eigenvalues of Q_CC and the weights a_k are determined by the decomposition (3). In particular, if O contains only a single state i, that is, O = {i}, then the PDF of the sojourn time degenerates to an exponential form.
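The spectral form in Theorem 1 can be checked numerically. The sketch below is a minimal illustration, not the paper's example: the reversible 4-state path chain, its rates, and the entry distribution into C are all assumed for demonstration. It symmetrizes Q_CC with Π_C^{1/2} as in the text and confirms that the resulting sum-of-exponentials form of the sojourn-time PDF of C agrees with the matrix-exponential (phase-type) form and integrates to one.

```python
import numpy as np
from scipy.linalg import expm, eigh

# Hypothetical reversible 4-state chain on the path 1-2-3-4 (a tree);
# the rates are illustrative only.
Q = np.array([[-1.0,  1.0,  0.0,  0.0],
              [ 2.0, -5.0,  3.0,  0.0],
              [ 0.0,  1.5, -4.5,  3.0],
              [ 0.0,  0.0,  2.0, -2.0]])

# Equilibrium distribution: solve pi Q = 0 together with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(4)])
rhs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(A, rhs, rcond=None)[0]

C = [0, 1, 2]                       # observe O = {state 4}; C is the rest
Qcc = Q[np.ix_(C, C)]
phi = pi[C] / pi[C].sum()           # an assumed entry distribution into C
one = np.ones(len(C))

def f_direct(t):
    # Matrix-exponential (phase-type) form of the sojourn-time PDF of C.
    return phi @ expm(Qcc * t) @ (-Qcc) @ one

# Symmetrization: S = Pi_C^{1/2} Qcc Pi_C^{-1/2} is symmetric for a
# reversible chain, so all eigenvalues -lambda_k of Qcc are real and < 0.
d = np.sqrt(pi[C])
S = (d[:, None] * Qcc) / d[None, :]
lam, U = eigh(S)

# Sum-of-exponentials form: f(t) = sum_k a_k * b_k * exp(lam_k * t).
a = (phi / d) @ U                   # phi^T Pi_C^{-1/2} U
b = U.T @ (d * ((-Qcc) @ one))      # U^T Pi_C^{1/2} (-Qcc) 1

def f_spectral(t):
    return (a * b) @ np.exp(lam * t)

assert np.all(lam < 0)              # strictly negative real eigenvalues
for t in (0.0, 0.3, 1.0):
    assert abs(f_direct(t) - f_spectral(t)) < 1e-10
# The PDF integrates to 1: sum_k a_k * b_k / lambda_k with lambda_k = -lam_k.
assert abs(np.sum(a * b / (-lam)) - 1.0) < 1e-10
print("sum-of-exponentials form verified")
```

Because the path chain is reversible with respect to π, the symmetrized matrix S has a real spectrum, which is exactly what makes the PDF a sum of real exponentials rather than damped oscillations.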

Corollary 2. The PDF of the sojourn time σ at a single state i takes the exponential form

f_σ(t) = q_i exp(−q_i t), (5)

where q_i = −q_ii is the total transition rate out of state i.

2.2. The Derivative Constraints.
Let Γ = (γ_1, . . ., γ_m), where the γ_n are defined through (6). Then (−1)^n γ_n is the nth derivative of the hitting time PDF of O at t = 0. By (6) and the decomposition (3), Q_CC = −PΛB, and applying this identity repeatedly, the derivatives can be expressed through the powers of Q_CC weighted by Π_C. A class of commonly used constraints can thus be derived from the derivatives of the hitting time distribution at t = 0.

Theorem 3. The derivatives of the hitting time distribution of O at t = 0 are related to the transition rates through the powers of Q_CC; with entry distribution φ on C, f^{(n)}(0) = φ⊤ Q_CC^n (−Q_CC) 1.

It is noted that many other constraints are required to identify general Markov chains [20]. However, derivative constraints are the most common ones, and we therefore investigate the intrinsic laws of derivative constraints more deeply. Further, we will show that applying only derivative constraints suffices to solve most classes of Markov chains, such as trees; see the next section.
In particular, the most useful and commonly used constraints are the first three.

Corollary 4. The following equations hold:
In particular, a single state i admits a simpler and more transparent form, and by far the most common and convenient constraints are those for a single state.
Corollary 5. Observing a single state i, one can obtain the corresponding derivative equations for a tree.

For observations at O, one can obtain the corresponding derivatives. Thus, (6) shows that each γ_n (n ≥ 1) is known, being equal to a quantity obtained from the observation at O times a known constant π_C⊤ 1. The reason is as follows. Lemma 6 shows that, based on statistics for a single state, one can obtain π_j from its PDF for any j ∈ O, and it follows that π_C⊤ 1 = 1 − Σ_{j∈O} π_j (n ≥ 1).

Solutions to Trees
As stated in the Introduction, the tree is a very important extension of the star graph and line graph and represents one of the two classes of topologies. It is known that all Markov chains on trees are reversible [21]. Thus, SIMC on trees will be explored in this section, although some such models have not yet been applied in real systems. In this letter, the term "tree" refers to a connected tree. For a Markov chain on a tree with vertices v_1, v_2, . . ., v_n and with a TRM Q = (q_ij), its state space is composed of the vertices {v_1, . . ., v_n}; an edge between vertex v_i and vertex v_j implies nonzero transition rates q_ij and q_ji, and q_ij/q_i is the probability of a transition from v_i to v_j. It is concluded that all Markov chains on trees can be identified by our approach. There are various kinds of trees, so we cannot provide individual proofs and algorithms for each; without loss of generality, the complete binary tree is used to show a general proof and solution. According to the classification of Wolfram MathWorld, there are many valid and specific trees: Banana Tree, Cross Graph, E Graph, Fork Graph, Firecracker Graph, H Graph, Spider Graph, and so forth. The Cross Graph, E Graph, Fork Graph, H Graph, and Spider Graph can be structurally viewed as Star-Branch Graphs (a special class of trees), which can be identified by the algorithms in earlier publications [19,20]. The Double-Star Graph, Banana Tree, and Firecracker Graph are three particular and regular models. For demonstration of solutions for specific trees, those three will also be addressed as representatives of trees.
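The reversibility of every Markov chain on a tree, the fact from [21] underpinning this section, follows from Kolmogorov's cycle criterion (a tree has no cycles) and is easy to confirm numerically. The sketch below draws illustrative random rates on a seven-state complete binary tree (states 1-7 of Figure 3 are indices 0-6 here) and verifies detailed balance π_i q_ij = π_j q_ji on every edge.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical complete binary tree with 7 states (as in Figure 3):
# indices 0-3 are leaves, 4-5 interior nodes, 6 the root.
edges = [(4, 0), (4, 1), (5, 2), (5, 3), (6, 4), (6, 5)]
n = 7

Q = np.zeros((n, n))
for i, j in edges:                   # arbitrary positive rates both ways
    Q[i, j] = rng.uniform(0.5, 3.0)
    Q[j, i] = rng.uniform(0.5, 3.0)
np.fill_diagonal(Q, -Q.sum(axis=1))  # rows of a TRM sum to zero

# Equilibrium distribution: pi Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(n)])
rhs = np.zeros(n + 1)
rhs[-1] = 1.0
pi = np.linalg.lstsq(A, rhs, rcond=None)[0]

# Detailed balance holds edgewise on any tree (no cycles, so the
# stationary flux across each edge must balance).
for i, j in edges:
    assert abs(pi[i] * Q[i, j] - pi[j] * Q[j, i]) < 1e-9
print("detailed balance verified on all", len(edges), "edges")
```

This edgewise balance is what the identification algorithms exploit repeatedly: once π_i, π_j, and one of q_ij, q_ji are known, the remaining rate follows immediately.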
For SIMC on trees, as stated earlier, the greatest contribution is to discover the fundamental method and the corresponding algorithm for identifying the TRM, given the challenge of performing the reverse operation.

General Conclusions on the Tree.
At the start of this subsection, we present several useful conclusions derived from the derivative constraints, which are convenient for determining the transition rates of a Markov chain on a tree as the first step.
Lemma 7. For a subchain of a tree such as in Figure 1, the following conclusions hold.

(i) One can obtain the quantities q_i, q_ij, q_ji (via π_i q_ij = π_j q_ji), and the associated equilibrium probabilities at least from the sojourn and hitting time PDFs at state i; in other words, these quantities can be expressed as real functions of q_i, λ_1, λ_2, and λ_3.

(ii) Further, one can obtain q_k, q_jk, q_kj, and π_k, as well as the transition rates between j and its parent, at least from the sojourn and hitting time PDFs at state k, provided that the transitions between j and its other children, that is, q_jl and q_lj for every other child l, are all known or have been identified. That is to say, these quantities can be expressed as real functions of the observed eigenvalues λ_1, . . ., λ_5 and the known rates q_jl, q_lj over the other children l.
Proof. (i) Firstly, q_i and the balanced product π_i q_ij = π_j q_ji are easy to get by Lemma 6. Secondly, applying Corollary 5 at state i yields equations whose solution gives the remaining quantities, which implies the first assertion.
(ii) On the basis of (i), one only needs to find the algorithm to obtain the remaining four quantities. For ease of exposition, collect the known rates into constants given as sums over q_jl and q_lj, noting that these are all known for every other child l.
Firstly, one has the first required relation from the observation at state k. Secondly, by Corollary 5, further relations follow, from which the rates can be solved. Finally, by reversibility, the last assertion easily follows, which completes the proof.
Remark 9. The above conclusion remains true by observation at any one of the children, provided that the transitions between the parent and its other descendants are all known or have been identified.
Lemmas 7 and 8 show that if a state j in a tree has n children, and the transitions between state j and n − 1 of those children, as well as the descendants of those n − 1 children, are all known or have been identified, then the transitions between state j, the nth child (say k), and the parent of j can be identified by the PDFs of the sojourn and hitting time at the nth child (i.e., state k).

Main Results on Trees.
Recall the statement in the Introduction: it is trivial to identify the TRM of an underlying Markov chain provided that every state of the chain is observed. One hopes that the number of observed states is as small as possible. There is no strict criterion for determining how many states should be observed for a general model other than simplicity and ease of solution. However, the identification is of questionable practical value if the number of observed states exceeds half the number of states of the whole chain. Thus, we impose the mild restriction that the number of observed states is at most half the number of states of the whole chain.
First, a solution is provided for the complete binary tree, in which all interior nodes have two children and all leaves have the same depth (level).

Theorem 10. If the initial distribution of a Markov chain on a complete binary tree is the equilibrium distribution, then this tree can be identified by the PDFs of the sojourn time and hitting time for all leaves.
Proof. Consider a complete binary tree of depth h; it has 2^h leaves at the last level h and 2^h − 1 nonleaf nodes (including the root). We now prove the claim by mathematical induction.
Step 1. We prove the case with 2 levels. For convenience of expression, label 1-4 the leaves from left to right at the last level, label 5-6 the nodes from left to right at the level above, and let 7 be the root (see Figure 3). The base case then follows by applying the constraints of Corollary 5 at the observed leaves.

Step 2. We suppose that the claim is true for any complete binary tree with h (≥ 2) levels. For a complete binary tree with h + 1 levels, we then show it is correct as follows. At this point, each complete binary tree with h levels can be viewed as a left or right subtree. This means that all transition rates in the two complete binary trees with h levels can be identified by observation at all leaves 1 to 2^{h+1}.
For conciseness of notation, let T_h (resp., T_{h−1}) denote the subtree rooted at the left child in the complete binary tree with h + 1 (resp., h) levels, and label 0 to (h + 1) the leftmost nodes at each level, from the bottom to the top, in the complete binary tree with h + 1 levels.
Firstly, by observation at state 0, according to Corollary 5, one has (19). By the inductive assumption, there is only one unknown in (19), which can thus be determined. Again by the inductive assumption, one can obtain q_{k,k+1} = q_k − q_{k,k−1} − q_{k,r} (where r denotes the right child of node k). Analogous to the above, one can get q_{k+1,k} by (20). Thus, q_{k+1,k} can be determined by (20).
These imply that all transition rates in the left subtree (of the tree with h + 1 levels) can be identified by observations at the leaves of the left subtree.
Secondly, one can identify the rest (i.e., the right part) of the tree with h + 1 levels by duplicating the method used for the left part.

This proves that the conclusion of the induction holds for a complete binary tree with h + 1 levels.
As a matter of fact, this theorem can also be proved simply by Lemma 8.
Remark 11. This proof shows the corresponding algorithm to determine all transition rates. The solution by observation at all leaves satisfies the mild restriction on the number of observed states, although the number of leaves exceeds the number of nonleaf nodes by one, because the tree can also be identified by observation at any 2^h − 1 of those 2^h leaves. For example, without loss of generality, assume that leaves 1-3 are observable for the tree in Figure 3. One can obtain q_1 = q_15, q_51, q_2 = q_25, q_52, q_5, q_57, q_7, q_75, π_1, π_2, π_5, π_7 by observations at 1 and 2, obtain q_3 = q_36, q_63, q_6, π_3, π_6 by observations at 3, and then obtain q_76, q_67, q_6 by observation at 1 or 2; thus, q_64 = q_6 − q_67 − q_63, π_4 = 1 − π_1 − π_2 − π_3 − π_5 − π_6 − π_7, and q_46 = π_6 q_64/π_4. If the number of leaves is less than that of nonleaf nodes for a tree, such as a tree in which many nodes have only one child, then this solution is particularly effective. These assertions also hold for the k-tree discussed later.
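The closing computation of Remark 11, filling in the last unobserved state of Figure 3 from already-identified quantities, amounts to three one-line formulas. The numbers below are hypothetical stand-ins, not the paper's example; they only illustrate the order of the computation.

```python
# Hypothetical values for Figure 3's seven-state tree; suppose observations
# at leaves 1-3 have already produced these (made up for illustration).
pi = {1: 0.10, 2: 0.15, 3: 0.12, 5: 0.20, 6: 0.18, 7: 0.15}
q6, q67, q63 = 5.0, 2.0, 1.5   # exit rate of state 6 and two identified rates

# Closing step of Remark 11: the remaining quantities follow from
#   q_64 = q_6 - q_67 - q_63                       (exit rates sum)
#   pi_4 = 1 - (pi_1 + pi_2 + pi_3 + pi_5 + pi_6 + pi_7)  (normalization)
#   q_46 = pi_6 * q_64 / pi_4                      (detailed balance)
q64 = q6 - q67 - q63
pi4 = 1.0 - sum(pi.values())
q46 = pi[6] * q64 / pi4
print(f"q_64 = {q64}, pi_4 = {pi4:.2f}, q_46 = {q46:.2f}")
```

With these numbers the computation gives q_64 = 1.5, π_4 = 0.10, and q_46 = 2.7; the pattern (subtract known rates, normalize, apply detailed balance) recurs in all the tree algorithms below.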
One can carry this line of argument a bit further to obtain the following result.
Theorem 12. If the initial distribution of a Markov chain on a k-tree is the equilibrium distribution, then this tree can be identified by the PDFs of the sojourn time and hitting time for all leaves.
One may proceed as previously explained to obtain a more general conclusion for arbitrary trees.

Theorem 13. If the initial distribution of a Markov chain on a tree is the equilibrium distribution, then this tree can be identified by the PDFs of the sojourn time and hitting time for all leaves.
Theoretically, the transition rates involving a state i in a tree can be calculated from the PDFs at any leaf among its descendants. However, the leaf along the shortest path should be preferred, to minimize error propagation.
Furthermore, identification by observation at the leaves is just one of the solutions for Markov chains on trees. There are also other solutions for particular trees, for example, when the number of leaves exceeds the number of nonleaf nodes.

Solutions to Special Trees.
In this subsection, the Double-Star Graph, the Banana Tree, and the Firecracker Graph, as representatives of trees, will be addressed to demonstrate solutions for specific trees.
From now on, we investigate the algorithm for identifying such a chain.

Theorem 15. If the initial distribution of the Markov chain with the Double-Star Graph in Figure 4 is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time at the two central states.

Proof. Firstly, the exit rates and equilibrium probabilities of the two central states can be obtained by (23) and (24). Secondly, according to Lemma 14, the rates from the two central states to their leaves can be obtained by the first two assertions of (26), because the required π_i and q_i are known from the PDF in Lemma 14 by observation of the two central states. Finally, the reverse rates follow from the final two assertions of (26).
Note that this proof shows the corresponding algorithm to determine all transition rates of such a Double-Star Graph chain. In the course of calculation, one can easily derive real roots from comparatively accurate PDFs.

Banana Tree Graph. An (n, k)-banana tree, say B_{n,k}, as defined by Chen et al. [23], is a graph obtained by connecting one leaf of each of n copies of a k-star graph with a single root vertex that is distinct from all the stars. Thus, there are n(k − 2) leaves, 2n nodes, and 1 root. This is one of the regular trees.
For k = 2 or k = 3, it degenerates into the special case of the Star-Branch Graph.
For k = 4, there are two leaves in each k-star, so the number 2n of all leaves is less than the number 2n + 1 of all nonleaf nodes. Thus, observation at all leaves is valid, and according to Theorem 13 a normal solution is as follows. For k ≥ 5, the number of all leaves is bigger than that of nonleaf nodes, and an optional solution is as follows.

Theorem 17. If the initial distribution of a Markov chain with the Banana Tree Graph B_{n,k} (k ≥ 5) is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for the root state and all node states, that is, all nonleaf states.

Proof. For ease of exposition, suppose that r is the root of the tree (see the introduction of the banana tree in this subsection) and that, within each k-star graph, c is the central state, u is the connecting node, and the remaining states are leaves. Firstly, the corresponding exit rates, balanced rate products, and equilibrium probabilities can be obtained at each nonleaf state. Secondly, according to the proof for the Double-Star Graph, the rates along the nonleaf skeleton can be obtained by observation at all nonleaf states, because their situation is similar to that of observing the two central states of the Double-Star Graph. Finally, for each k-star graph, the remaining rates follow by subtracting the identified rates from the total exit rates and applying reversibility, for example q_lc = π_c q_cl/π_l for a leaf l of the center c.

Firecracker Graph. An (n, k)-firecracker, say F_{n,k}, is a graph obtained by the concatenation of n k-stars by linking one leaf from each [23].

For k = 2 or k = 3, it degenerates into a special tree that is similar, but not completely equivalent, to the Star-Branch Graph. For k ≤ 4, there are at most two leaves in each k-star, so the number of all leaves does not exceed that of all nonleaf nodes. Thus, observation at all leaves is valid, and according to Theorem 13 a normal solution for k ≤ 4 is as follows.

Theorem 18. If the initial distribution of a Markov chain with the Firecracker Graph F_{n,k} (k ≤ 4) is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for all leaves.
For k ≥ 5, the number of all leaves is bigger than that of nonleaf nodes. Following the technique of the proof of Theorem 17, it is easy to verify the following optional solution.

Theorem 19. If the initial distribution of a Markov chain with the Firecracker Graph F_{n,k} (k ≥ 5) is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for all nonleaf states.

Numerical Example.
To demonstrate how to apply our algorithms to real data, we present a numerical example here. As mentioned before, we focus on the final step and therefore omit the preprocessing of the data. Thus, the data we have are the observed PDFs of the sojourn time and hitting time at each leaf. Based upon these data, we can recover all the transition rates of the tree.
Example 1. The model is a Markov chain on the complete binary tree of Figure 3, with the true transition rate matrix Q = (q_ij)_{7×7} given in (30). We divide the calculation into two steps: fitting the sojourn time and hitting time histograms, and estimating the transition rates.
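The two-step computation can be sketched for its simplest ingredient. By Corollary 2, the sojourn time at a leaf is exponential, so its rate is recovered from simulated sojourn data by the MLE 1/mean (the fitting step); a leaf has a single neighbor, its parent, so the whole exit rate belongs to that one edge (the estimation step). The true rate below is an assumed value for illustration, not an entry of the matrix in (30).

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1 (fitting): the sojourn time at a single leaf is exponential
# (Corollary 2), so its rate is consistently estimated by 1 / sample mean.
true_rate = 3.0                          # hypothetical stand-in for q_1
sojourns = rng.exponential(1.0 / true_rate, size=200_000)
rate_hat = 1.0 / sojourns.mean()

# Step 2 (rate estimation): leaf 1's only neighbor is its parent 5, so
# the entire exit rate is attributed to that edge: q_15 = q_1.
q15_hat = rate_hat
print(f"estimated q_15 = {q15_hat:.3f} (true {true_rate})")
```

The hitting-time PDFs at the leaves, which are sums of several exponentials, are fitted analogously, and their eigenvalues and weights feed the derivative constraints of Section 2 to recover the interior rates.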

Figure 3: Schematic plot of a complete binary tree with two levels. States 1 and 2 are leaves and the two children of state 5; states 3 and 4 are leaves and the two children of state 6; state 7 is the root.

Figure 4: Schematic plot of a Markov chain on a Double-Star Graph.

Theorem 16. If the initial distribution of a Markov chain with the Banana Tree Graph B_{n,k} (k ≤ 4) is the equilibrium distribution, then it can be identified by the PDFs of the sojourn time and hitting time for all leaves.
The right-hand side of the first equation gives a sum over all exit rate flows for state i, because π_i q_ij = π_j q_ji. Conditional upon the first equation, the second and the third are obvious. The expression within the bracket of γ_{2n+1} (resp., γ_{2n+2}) is the sum over rates of the possible (2n − 1)-step (resp., 2n-step) transitions departing from a state, making a direct transition to i, and returning to itself. Those possible transitions reveal the intrinsic relationships among the q_ij, so that most transitions can be identified from those hitting time PDFs.