An Expectation Maximization Algorithm to Model Failure Times by Continuous-Time Markov Chains

In many applications, the failure rate function may present a bathtub shape curve. In this paper, an expectation maximization algorithm is proposed to construct a suitable continuous-time Markov chain which models the failure time data by the first time reaching the absorbing state. Assume that a system is described by methods of supplementary variables, the device of stage, and so on. Given a data set, the maximum likelihood estimators of the initial distribution and the infinitesimal transition rates of the Markov chain can be obtained by our novel algorithm. Suppose that there are m transient states in the system and that there are n failure time data. The devised algorithm only needs to compute the exponential of m × m upper triangular matrices for O nm2 times in each iteration. Finally, the algorithm is applied to two real data sets, which indicates the practicality and efficiency of our algorithm.


Introduction
Among many quantitative analysis methods of system reliability, the state space representation has often been employed by reliability engineers, for example, 1-5 .The state space representation has been recognized by industrial IEC61511 standards 6 .This method assumes that the structure of the system is modelled with a continuous-time Markov chain CTMC .In the state space representation, nodes represent states of the system and arcs represent the transitions between nodes.This method is well adapted to study the reliability of various systems and allows an exact analysis of their probability of failure.
In the state space representation method, the time interval between two consecutive transitions is a random variable of an exponential distribution.However, many studies have presented counterexamples.For some mechanical and electronic components, the failure rate function of time to failure TTF has a bathtub curve: a monotonically decreasing function initially, eventually becoming a constant, and finally changing to a monotonically increasing function after sufficient time elapses 7 .Because existing approaches do not provide a perfect solution to deal with this class of systems, several researchers have explored models with nonexponential distributions see 8-10 and references therein .Among many approaches treating TTF with nonexponential distributions, the extended Markov-model 11 is recommendable.In the extended Markov-model, an operation state is divided into substates with different levels of failure rates, which result in a nonconstant failure rate of the operation state.That is, in these models, the bug of failure rate functions is fixed within the framework of the state space representation see 12, 13 and references therein for some applications .
Methods of supplementary variables 14 and the device of stages 15 are two classical approaches of extended Markov-models.These two approaches allow one to model a wide range of experimental probability density functions through a set of additional stages in series or in parallel.For supplementary variables with given parameters, a formula for evaluating the occurrence of failures is derived in 16 .In fact, with suitable parameters, additional stages can approximate the behavior of many types of distributions in practice.Reference 17 is a recent application of these approaches.
Assume that the structure of a system is established.That is, the nodes and arcs of the system are given.For a experimental data set on the TTFs of the system, the bottleneck of the above two approaches is the estimation of parameters for the nodes and arcs.Since the system is modelled with a CTMC, the TTFs are phase-type distributed random variables.The problem is similar to phase-type fitting methods 18 .However, traditional phase-type fitting methods give the maximum likelihood estimators of the minimal representation 18 , which may not correspond to the system with the expected structure.Moreover, from the standpoint of computational complexity, traditional methods are not suitable for the system with many paths.
Inspired by the theory of continuous-time Markov chains observed in continuous-time channels see 19, 20 , we propose a new expectation maximization EM algorithm for data of TTFs in this paper.In Section 2 we specify the framework and hidden variables for the system.In Section 3 we derive several conditional probabilities and conditional expectations, which will be used in the EM algorithm.In Section 4 we present the new EM algorithm.In Section 5, we provide numerical results.Section 6 concludes the paper.

System Definition and Hidden Variables
Assume that a system is modelled by a CTMC X : {X t : t ≥ 0} with transient states 1, 2, . . ., m, and one absorbing state m 1.The infinitesimal transition rates Q q ij of X satisfy q m 1,m 1 0, q ii < 0, for i / m.

2.1
The quantities q ij , i / m 1, are unknown.However, the characteristic matrix C c ij , The state transition diagram of a device of stages.

State 1 State i State m
State m 1 which represents the structure of the system, is the known.Assume that the characteristic C is an upper triangular matrix, that is, c ij 0 for i < j.We suppose that the initial probability function π π 1 , π 2 , . . ., π m 1 satisfies π m 1 0 and π 1 , π 2 , . . ., π m are unknown.

Example 2.1
The device of stages .The system has a state transition diagram shown in Figure 1.The characteristic matrix C c ij of this system is Hence C is an upper triangular matrix.
Example 2.2 The parallel system .The system has a state transition diagram shown in Figure 2. The characteristic matrix C c ij of this system is c i,j 1, for j m 1, c ij 0, for j / m 1.

2.4
Hence C is also an upper triangular matrix.
Like the above two examples, the structures of many practical systems can be represented by upper triangular characteristic matrixes.
We define a path, the sequence of nodes visited from an initial node to the absorbing node m 1.The kth path, r k n 1 , n 2 , . . ., n s , is characterized by a set of nodes n 1 , n 2 , . . ., n s , representing the nodes visited before absorption, where c n i n i 1 1 for i 1, 2, . . ., s − 1 and c n s ,m 1 1.Since the characteristic matrix C and hence the infinitesimal transition rates Q, are upper triangular matrixes, any node will never appear again in a path.So the length of a path is no more than m 1, and hence the number of paths is finite.
According to the above discussion, similarly to part b of Theorem 5.4 in 21 , we can provide the following noncanonical construction of a Markov chain with the transition rates Q and the initial distribution π, respectively.The hidden variables, which comprise the noncanonical construction, will serve the EM algorithm in Section 4. Lemma 2.3.For k 1, 2, . . ., n, let T k 1 , T k 2 , . . ., T k m be a sequence of exponentially distributed random variables with parameters −q 11 , −q 22 , . . ., −q mm , respectively.For k 1, 2, . . ., n, let J k 0 , J k 1 , J k 2 , . . ., J k m be a sequence of random variables with discrete distributions described by probability functions

2.6
Then the constructed process X k : {X k t : t ≥ 0} is Markovian with the initial distribution π and the infinitesimal transition rates Q.Moreover, X k , k 1, 2, . . ., n are independent.
For a sample of J :

indicates that there is a unique path corresponding to the discrete-time embedded Markov chain Y k
0 , Y k 1 , . . . .We denote this path as R J .Write j ∈ R J if the node j appears in the path R J .Given a path R J , define the first node F J 0 J k 0 for this path, and the following node F J j J k j for any node j ∈ R J .A sample τ 1 , τ 2 , . . ., τ n of the TTFs for a system is a family of independent identically distributed random variables.Precisely, the stochastic processes X k , k 1, 2, . . ., n, are independent, which have the same infinitesimal transition rates Q and the same initial probability function π.Then define

2.7
Using the above notations, we have

2.8
Based on the observations τ 1 , τ 2 , . . ., τ n and the known characteristic matrix C, our problem is to give estimators π and Q for the unknown parameters.

Conditional Probabilities and Conditional Expectations
To establish an EM algorithm for our problem, some conditional probabilities and conditional expectations will be calculated in this section.From now on, we will omit the superscript k of the stochastic processes when the clarity is being held.Firstly, we recall the cumulative distribution function of the TTF τ, where the vector 1 T D of dimensions m 1 is the transpose of 1 D 0, 0, . . ., 0, 1 .So the probability density function of the TTF τ is

3.3
The above methodology can be used to calculate other probabilities and expectations in this section.
Lemma 3.1.For the above noncanonical construction of X in Lemma 2.3, we have for Here Q i q rs is a matrix of dimensions m 2 × m 2 determined as follows: 3.5 1 P 0, 0, . . ., 0, 1, 0 and

3.6
Proof.Define the events Considering all possible realizations of J J 0 , J 1 , . . ., J m , it follows from 2.8 that Since the characteristic matrix C is an upper triangular matrix, the numbers of nodes along a path are monotonic increasing.Then if J j 0 , j 1 , . . ., j m satisfies R J i, we have j 0 ≤ i and Considering another stochastic process X with the infinitesimal transition rates Q i and the initial probability function π i , we can deduce a noncanonical construction consisting of J 0 , J 1 , . . ., J m , T 1 , T 2 , . . ., T m .It follows from 3.9 that For a path J of the stochastic process X , the property R J i means that the end of the path J is the absorbing state m 1. Similarly to 3.2 , we have P A t P X t m π i exp tQ i 1 T P .

3.11
The result follows from the above equation and Similarly to the discussion in Lemma 3.1, we have the following Lemma.
Lemma 3.2.For the noncanonical construction of X in Lemma 2.3, where J J 0 , J 1 , . . ., J m and p i P i ∈ R J .
p i in Lemma 3.2 can be expressed through the 1-step transition probability matrix of the embedded Markov chain of the continuous-time Markov chain with the infinitesimal transition rates Q i .However, since p i will be eliminated in the further propositions, it is not necessary to present the expression.Theorem 3.3.For the noncanonical construction of X in Lemma 2.3, 1 ≤ i ≤ m, 1 ≤ j ≤ m 1, and t ≥ 0, we have

3.14
Here 1 P and π i are the same as those in Lemma 3.1.Q ij q rs is a matrix of dimensions m 2 × m 2 determined as follows: q it − q ij , r i, s m 2, 0, otherwise.

3.15
Proof.If c i,j 0, it is obvious that P i ∈ R J , j F J, i , τ ≤ t 0. Otherwise, by constructing an auxiliary stochastic process with the infinitesimal transition rates Q ij and the initial probability function π i , we can obtain, similarly to the proof of Lemma 3.1, that where J J 0 , J 1 , . . ., J m .The required result can then be derived in the similar way as that in Lemma 3.1.Theorem 3.4.For the noncanonical construction of X in Lemma 2.3 and i ≤ m, t ≥ 0, we have 3.17 where J J 0 , J 1 , . . ., J m .I is an identity matrix and M i M rs is a matrix of dimensions m 2 × m 2 determined as follows: q rs , i<r≤ m 1, i < s ≤ m 1, 0, otherwise.

3.18
Mathematical Problems in Engineering

3.19
Proof.It follows from 2.8 that given i ∈ R J , where T i and T i j∈R J ,j / i T j , are independent random variables.Given i ∈ R J , the conditional density function of T i is f i t −q ii exp q ii t , for t ≥ 0.

3.21
Given i ∈ R J , suppose that the conditional density function of T i is g i t , where g i t 0 for t < 0. Since the Jacobian of the transformation x, y → x, x y is 1, the joint density of T i , τ is given by f t 1 , t 2 f i t 1 g i t 2 − t 1 .Then for t ≥ 0, we have

3.22
Here the conditional density function As to the cumulative distribution function of T i , given i ∈ R J , we consider another stochastic process X with the infinitesimal transition rates M i and the initial probability function π i,T .It follows from Lemma 2.3 that X has a noncanonical construction consisting of J 0 , J 1 , . . ., J m , T 1 , T 2 , . . ., T m .Similarly to the discussion in Lemma 3.1, we have

3.23
Mathematical Problems in Engineering 9 where J J 0 , J 1 , . . ., J m .Hence P T i ≤ t, i ∈ R J π i,T exp tM i 1 T P .

3.24
Then it follows from 3.13 and 3.24 that

3.25
Since the probability in 3.25 may jump at t 0, the conditional density function corresponding to 3.24 is where δ t is the Dirac function.p i P i ∈ R J is the same as that in Lemma 3.2.Then the integral in 3.22 becomes

3.27
The desired result follows from the above equation and Lemma 3.2.
Remark 3.5.The integral in 3.17 can be efficiently computed by using an approach developed by Van Loan 22 .To apply that approach here, we first compute the exponential of the following 3 m 2 × 3 m 2 block triangular matrix: exp

3.28
Then the upper right block H t of this matrix gives the integral t 0 t − u exp u M i − q ii I du.Moreover, the matrix exponentials can be efficiently performed by using the diagonal Padé approximation with repeated squaring, as recommended in 22, 23 .
Remark 3.6.Sometimes, the denominator of 3.17 is 0. For example, under some characteristic matrices, a node i may never be reached.Since the estimation of T i is nonsense under this case, we can assign any value to E T i | i ∈ R J , τ t .We suggest −1/q ii .
The following corollary can be deduced from the above theorems and the tower property of conditional expectations.

Mathematical Problems in Engineering
Corollary 3.7.For the noncanonical construction of X in Lemma 2.3 and π i e i exp tQ Q1 T D .

3.29
Here e i is the unit basis vectors 0, 0, . . ., 1, 0, . . ., 0, 0 with "1" in the ith position.J J 0 , J 1 , . . ., J m and Proof.The first equation follows from Theorem 3.4 and the tower property of conditional expectations.The second equation follows from

3.30
As to the third equation, we have

3.31
From the solution to the Kolmogorov's forward equation in the exponential form, we obtain P τ ≤ t | J 0 j π i e i exp tQ Q1 T D .

3.32
Similarly to 3.12 , we can see that the third equation follows from the above two equations and 3.2 .

EM Algorithm
The EM algorithm is a two-step iterative process for computing the maximum likelihood estimate.This process is terminated when some imposed measure of convergence for the sequence of estimators is satisfied.A filter-based form of the EM algorithm for a Markov chain was presented in 24 and was developed in 25 .Here we review firstly the filter-based EM framework.Let θ index a given family of probability density functions f θ for some hidden random variable X i , where θ is an element of a parameter space Θ.Then based on a hidden complete data X 1 , . . ., X n , a likelihood L can be written as Let a σ-field F σ{X 1 , . . ., X n }.For a random variable Y ∈ F, E θ * Y denotes the expectation of Y with assumption that θ * is the parameter of X i s.Suppose another σ-field Y ⊂ F.

Mathematical Problems in Engineering 11
Two iterative steps in the EM algorithm are as follows.
1 Expectation step: fix θ * θ t , then compute Q •, θ * , where .and J . . .However, our choice of the hidden random variables results in a simple E-step, and it leads to an explicit M-step for estimating the parameters.
In this case, the complete data likelihood L can be written as

4.4
The factors q i J k i in 4.4 represent the probability functions of the hidden random variables J k i .The factors −q ii exp q ii T k i represent the density functions of T k i .Taking the logarithm of 4.4 yields where I J k i j 1 if J k i j and otherwise I J k i j 0. A new parameter estimate, say θ π, Q , is obtained as the parameter which maximizes the expected value of 4.5 given τ 1 , τ 2 , . . ., τ n and a current parameter estimate θ * π, Q .The constrained maximization gives us the following new estimate: where the conditional probabilities and the conditional expectations are given in Corollary 3.7.Then it follows from 2.5 that π j q 0 j , q ij − q ii q i j , for i / j, i / 0. 4.7 These expressions constitute the M-step of our EM algorithm.

Two Applications
In this section, we present two examples to illustrate the results derived in the previous sections.
The first data set is a widely used sample of 50 components taken from 26 , which possess a bath-shaped failure rate property.These data consist of times to failure of 50 devices put on life test at time 0. The data is shown in Table 1.
Using our EM algorithm with m 3, which means that there are transient states 1, 2, 3, and one absorbing state 4, we obtain  Another real set of lifetime data set, taken from 27 , is the failure data collected during unit testing.During a unit testing phase, the number of failures in a certain time interval of equal length is recorded.The total number of units tested is equal to 311.The data is given in Table 2.
Using our EM algorithm with m 3, we obtain

5.3
The TTT plots for this data set and the Monte Carlo simulated TTFs are given in Figure 4.

Conclusions
One great advantage with the extended Markov-models is that these models can tackle operation states of nonexponential distributions under the framework of the state space representation.Based on the theories of continuous-time Markov chains and EM algorithms, this paper deals with the problem of parameters estimation for extended Markov-models.Given a data set with n failure time data, for a system with m transient states, the proposed EM algorithm only needs to compute the exponential of m × m upper triangular matrices for O nm 2 times in each iteration.The effectiveness of the proposed EM algorithm is well demonstrated through two real world examples.Two real applications show the practicality and efficiency of our new EM algorithm.Nevertheless, one theoretical limitation about EM algorithms is that its iterations sequence converges to a stationary point of the likelihood.Only in the case that the likelihood function is convex, the global convergence from any initial value can be ensured.However, our experience with many examples is against this theoretical drawback.In few practical cases, the stationary points are saddle points.As a simple perturbation can lead the algorithm to diverge from the saddle point, this phenomenon is thus not a problem in practice.However, to avoid that the limit point is only a local maximum of the likelihood, the initial value must be carefully selected in some cases.We suggest to use moments of related random variables to construct the initial estimates.
Of course, it will be interesting to investigate conditions ensuring the convergence to the global maximum of the likelihood.This is left for future research.

Figure 2 :
Figure 2: The state transition diagram of a parallel system.
of the failure rate function, the total time on test TTT plot 26 was used to show the efficiency of the selected model.The TTT plot is obtained by plotting n.Here r 1, . . ., n and T i:n , i 1, . . ., n, are the order statistics of the sample.To check the efficiency of the result, Monte Carlo simulations have also been carried out.We generated 500 independent TTFs, with the estimated parameters π, Q. Due to the sensitivity of TTT plot to T n:n , the simulated TTFs are truncated by the maximum of the data set.The TTT plots for the first data set and the truncated simulated TTFs are shown in Figure3.

Figure 3 :
Figure 3: The TTT plots for the first data set and the simulated TTFs.The circles represent the real data and the solid points represent the simulated results with the estimated parameters.

Figure 4 :
Figure 4: The TTT plots for the second data set and the simulated TTFs.The dashed boxes represent the real data and the solid points represent the simulated results with the estimated parameters.
Under this framework, we propose an EM algorithm which attempts to find a maximum likelihood estimate MLE of the parameter θ π, Q .Here Y is a σ-field generated from a sequence τ 1 , τ 2 , . . ., τ n of TTFs.As for the hidden complete data, we

Table 1 :
Failure time data from 26 .

Table 2 :
Failure time data from 27 .