Parameter Estimation of a Multistate Model for an Aging Piece of Equipment under Condition-Based Maintenance

We study a multistate model for an aging piece of equipment under condition-based maintenance and apply an expectation maximization algorithm to obtain maximum likelihood estimates of the model parameters. Because of the monitoring discontinuity, we cannot observe any state’s duration. The observation consists of the equipment’s state at an inspection or right after a repair. Based on a proper construction of stochastic processes involved in the model, calculation of some probabilities and expectations becomes tractable. Using these probabilities and expectations, we can apply an expectation maximization algorithm to estimate the parameters in the model. We carry out simulation studies to test the accuracy and the efficiency of the algorithm.


Introduction
A multistate model may provide more flexibility than a traditional binary state model for modeling equipment conditions.In a multistate model, a piece of equipment is allowed to experience more than two possible states, for example, completely working, partially working, partially failed, and completely failed.Even if every state has an exponential duration distribution, the equipment has a nonexponentially lifetime distribution.Therefore, many authors utilize multistate models for equipment in many disciplines.Examples arise from electric power systems 1 , the nuclear science 2 , electron devices 3 , and so on.Furthermore, although the deterioration process of a system is usually continuous, it is convenient to model it by several discrete states.For example, in 4 , a deterioration process is classified into four states: initial, minor deterioration, major deterioration, and failure.Besides, 5, 6 define several discrete states for a system based on different ranges of the hazard rate of the system; that is, a deterioration process is often modeled by a multistate model.
On the other hand, maintenance is an indispensable action to keep the reliability level of industrial equipment.To reduce the total maintenance costs, condition-based maintenance is usually implemented.References 7, 8 are good overviews of condition-based maintenance.In this paper, condition-based maintenance involves three key ideas: a routine inspection, a condition indicator, and a repair.Routine inspections are executed with a fixed benchmark interval.In an inspection, we can acquire the current condition indicator for the equipment.An appropriate repair is carried out when the indicator is not less than a warning value.Moreover, we also record the condition indicator right after the equipment is repaired.This kind of condition-based maintenance is recommended by an IEEE standard 9 , and there are many types of equipment operating in such mode 10, 11 .We assume that every state of the equipment has an exponential duration distribution specified by its parameter.In the paper, we investigate the parameter estimation problem, based on the data obtained from routine inspections and repairs.
Such a parameter estimation problem for a multistate model plays an important role in many applications 12, 13 .The problem attracts much attention, and many approaches are proposed.Reference 5 reviews several quite sophisticated empirical approaches.A method of curve fitting is presented in 12 for the probability density function pdf of the equipment lifetime.In 14-16 , based on proportional hazards models, different approximate methods are proposed to obtain a maximum likelihood estimate MLE .In these methods, it is assumed that the condition indicators are time independent or the state of the equipment is fixed within two consecutive inspections.Those assumptions are reasonable only for a small benchmark interval or frequent inspections.However, frequent inspections may increase the total costs and may induce a high chance of accidents.
There is an alternative method to obtain an MLE.To better illustrate the idea, we start from a complete data set of every state's durations through continuous monitoring.As it has an exponential duration distribution, the reciprocal of the average of durations of a state is the MLE of its parameter.This simple example gives a clue to the case of conditionbased maintenance.Then, given the incomplete data from routine inspections and repairs, we can calculate conditional expectations of every state's duration, and we may deduce an MLE.The theoretical foundation of the above discussion is the expectation maximization EM algorithms.The EM algorithms are first introduced in 17 to calculate an MLE for incomplete data.Due to its theoretical property, see, 18, 19 e.g., , the EM algorithms have good performance in many applications.
Recently, a few researchers have tried to use EM algorithms for problems similar to ours.For example, based on a complicated discussion, 20 applied an EM algorithm to a specific 3-state model under devised condition-based maintenance.However, there is no hint in 20 to extend the result to a general multistate model.In general, although an EM algorithm may have good performance, it is difficult to apply an EM algorithm to a specific model.Besides the choice of hidden random variables or stochastic processes for a model, the calculation of involved conditional expectations is often difficult.
In this paper, we first propose a proper mathematical framework and a special technique of distributions to simplify the computation of conditional expectations.Then we apply an EM algorithm to obtain MLEs of the model parameters, with the observation of routine inspections and repairs.The paper is organized into seven sections.Section 2 establishes a model for our problem.In Section 3, we apply an EM algorithm, which is based on several conditional expectations, to the model.Section 4 establishes formulas of these conditional expectations.Section 5 presents simulation results, and Section 6 presents an application to a real-world dataset.The paper is concluded in Section 7.

Models
In this section, we propose the mathematical framework for our problem.In the following Model 1, we present basic concepts of a new multistate model for an aging piece of equipment under condition-based maintenance.Besides, specific hidden random variables are proposed in Model 1.
Model 1.In the multistate model, a piece of equipment with m discrete states under conditionbased maintenance can be described by a predefined warning value W ≤ m and two families of random variables T i T In Model 1, for both T j i and R j i , the subscript i indicates a time index, and the superscript j presents the state of the equipment at the moment.Concretely, we assume that the state of the equipment is j at a time instance i and a repair takes place, and R j i is the state of the equipment right after the repair.And T j i denotes the residual duration of state j after a time instance i, given that there is no maintenance again.As introduced in Section 1, the duration of state j has an exponential distribution, and θ j is its parameter.Due to the memoryless property of an exponential distribution, the distribution of the residual duration is also exponential.Here and in the following, we assume that the benchmark interval is 1 for simplicity of argument.
To introduce some derivational concepts to describe the behavior of the equipment clearly, we introduce several families of auxiliary random variables.These random variables, Y i , M i , and I i , are recursively defined.First, M 0 1, I 0 0.
For i ≥ 1, where Y i that indicates the state of the equipment at a time instance i.In fact, assume that the state of the equipment is M i−1 at a time instance i − 1, Y i is the maximal number s such that the total duration of states M i−1 , . .., and s − 1 is less than 1.In other words, between two consecutive inspections at time indexes i − 1 and i, the equipment goes through states where I k indicates the time index when the kth repair takes place.Moreover, Here M i denotes the state of the equipment right after a time instance i, whether there is maintenance or not.In fact, if Y i < W and no repair takes place, we have there is a proper repair, and M i denotes the state of the equipment right after the repair, as showed in 2.3 .

EM Algorithm
We now apply the EM algorithm to estimate the parameter θ j , j 1, . . ., m − 1, given n pairs of observations y i and m i−1 , i 1, 2, . . ., n.
The EM algorithm, introduced in 17 , is an iterative solution to MLE.Each iteration consists of two steps.In the Expectation step, the following functions Q θ | θ are sought: where parameters θ, θ ∈ R m−1 .θ stands for parameters to be estimated in the current iteration, and θ denotes the parameters estimated in the previous iteration.As indicated in 17 , the expectation in 3.1 is taken over the hidden random variable T i and R i and L θ is the joint pdf of T i and R i , with the parameter θ .Estimates of the parameters θ are obtained in the maximization step by solving the following optimization: The estimation formulas are established by setting As the pdf of the repairs R i has nothing to do with θ and the joint pdf of T i , i 1, 2, . . ., n has the following form:

3.5
Based on the theory of EM algorithms, see, 17-19 e.g. , we propose the following algorithm to calculate the MLE of θ j , j 1, . . ., m − 1.
Step 3. Stop and output θ t if the stop criterion is satisfied.
Step 4. Set t 1 → t and go to Step 2.

Conditional Expectations
To calculate the conditional expectations involved in 3.5 , we propose some lemmas and theorems in the following.Lemma 4.1.Assume that random variables U 1 , . . ., U r are independent, U i has an exponential distribution specified by a parameter λ i , and, for i / j, we have λ i / λ j .Then, the summation S r i 1 U i has a pdf g x; λ 1 , . . ., λ r .Here, for x ≥ 0. c i is a function for arguments λ 1 , . . ., λ r , defined by

4.2
The above denominator is λ i when r 1.
The proof of Lemma 4.1 is based on the properties of polynomials and is presented in Appendix A. Further, we can calculate a useful conditional expectation and some basic probabilities based on Lemma 4.1.

Lemma 4.2. Under the assumption and notations in Lemma 4.1, we have the conditional expectation
Proof.Let S i S − U i , then S i and U i are independent.Based on Lemma 4.1, the pdf f S i x of S i has a form for x ≥ 0, where 4.5 Let f u, x be the joint pdf of U i and S. Since S i and U i are independent and S U i S i , we have the following: Hence, the conditional pdf of U i with respect to S is In the sequel, it follows from the expression of f S i x that

4.8
The result follows from the above equation. 4.9 Proof.By computing the definite integral of the pdf of S on the interval 0, 1 , we can obtain the expression of P S ≤ 1 .
It follows from the independence among U 1 , . . ., U r and V that S r i 1 U i and V are independent.Therefore,

4.10
Mathematical Problems in Engineering 7 Here and in the following, D { x, v |x ≤ 1, x v > 1}.Besides, f V x is the pdf of V .In the sequel, follows from the above integral.
Then we can calculate several auxiliary important conditional expectations.
Theorem 4.4.Under the assumption and notations of Lemma 4.3, we have

Mathematical Problems in Engineering
Proof.In the following, we calculate three expectations respectively.
We have the following: 4.17 Here f S x and f V v are pdfs of random variables S and V , respectively.We have f S x g x; λ 1 , . . ., λ r .The denominator is presented in Lemma 4.3.Similarly to the proof of Lemma 4.3, the above numerator is
Case 2. We have When r 1, it follows from S U 1 that we have When r > 1, according to the total expectation formula, we have

4.20
Then it follows from Lemma 4.2 and 4.20 that . ., λ r is the above definite integral.
Case 3. We have and the result follows.
When r > 1, similarly to the discussion of Case 2, we have

4.23
As U 1 , . . ., U r and V are independent, and S is a function of U 1 , . . ., U r , we have Hence, it follows from Lemma 4.2 that

4.24
The result follows from the above equation.
Now we turn to the following Lemma 4.5 based on the theories of measure and stochastic processes.Please see Appendix B for a proof of Lemma 4.5.Lemma 4.5.For random variables involved in Model 1, we have the following:

4.25
Here, 0 Finally, we arrive at conditional expectations in 3.5 , which is necessary to Algorithm 3.1.Theorem 4.6.We assume that there is a guess θ θ 1 , . . ., θ m−1 for the parameter vector of Model 1 for which θ i / θ j if i / j.Let ex When y k 1 m k , we have the following:

4.26
When y k 1 m, we have

4.27
Finally, when y k 1 / m and y k 1 / m k , we have Proof.Except some trivial cases, this result follows from Theorem 4.4 and Lemma 4.5.

Simulation Results
In this section, we test the efficiency and accuracy of the EM algorithm, based on simulation data.
At first, we investigate the necessity of the condition "θ i / θ j for i / j" in Theorem 4.6.We can find similar conditions for many theorems and lemmas in Section 4, and there is the root in Lemmas 4.1 and 4.2.Although these conditions make our analysis clear and convenient, they appear to be obstacles for an application.Now we turn to the assumption of Lemma 4.2.Assume that random variables U 1 , U 2 , and U 3 are independent, and U i has an exponential distribution specified by a parameter θ i .Based on Lemma 4.2, we can compute the conditional expectation of U 1 conditional on the summation S 3 i 1 U i to be any value, for example, 1.Let θ 3 4 and We draw the surface of B θ 1 , θ 2 in Figure 1. Figure 1 shows that although B θ 1 , θ 2 does not exist if θ 1 θ 2 , θ 1 θ 3 , or θ 2 θ 3 , but by filling proper values, B θ 1 , θ 2 is continuous.That is, B θ 1 , θ 2 has some removable discontinuities.The result is obvious from the viewpoint of engineering; why expectations in Section 4 have "jumps" for any special cases?Similar conclusion can be obtained for the assumption of Lemma 4.1 from the viewpoint of mathematics.In fact, the pdf of U i is an analytical function on θ i > 0. As a convolution, the pdf of S U 1 U 2 U 3 is also an analytical function on θ i > 0, i 1, 2, 3.And any discontinuity is removable for an expression of such a function.
Similar discussions can be applied to other theorems and lemmas in Section 4. In a word, these conditions may not trouble us for any application.For a numerical implement  of the EM algorithm, if parameters become the same by accident, a small perturbation can fix the problem.
Secondly, we present a method to simulate the behavior of a 4-state model for an aging piece of equipment under condition-based maintenance, based on some preset parameters.The simulation can provide us a test data set.
Without maintenance, the equipment would run through states 1, 2, 3 orderly to a failure state 4. With maintenance, the straight path to a failure is regularly deflected by inspections and maintenance.In an inspection, if we find that the current state is 1 or 2, no repair is applied.If the state is 3, an appropriate repair is carried out.In this case, the state of the equipment right after the repair is a random variable, which is 1 with probability 0.1, is 2 with probability 0.3, and is 3 with probability 0.6.If the state of the equipment is 4, we must replace it with a new piece of equipment, and the state right after this replacement is 1.The duration of states 1, 2, and 3 have exponential distributions specified by parameters 0.3, 0.29, and 0.5.These parameters are suggested by 21 .Now we turn to five experiments in which we test the efficiency and accuracy of the EM algorithm, based on simulation data provided by the above method.
In the first experiment, we study the relationship between convergence error and the number of iterations.As EM algorithms are recursive, this experiment will provide us with an intuitive stop criterion for Algorithm 3.1.For a data set from 100 inspections, Figure 2 shows that all curves of the estimated parameters θ 1 , θ 2 , and θ 3 converge while the number of iterations increases.We can conclude from Figure 2 that for 100 or more iterations, we can obtain reasonable results.
In the second experiment, we run the EM algorithm for different initial values.Figure 3 draws the estimated parameter pairs θ 1 , θ 2 for 100 different initial values which are chosen randomly from 0.001 to 15.We can see that the EM algorithm converges to the same result for a great range of initial values.
In the third experiment, we run the EM algorithm for different sample sizes.Given the number of inspections I 20, 40, 100, 200, 400, and 1000, we run the simulating equipment and obtain 6 data sets.Then based on these data sets, we estimate parameters θ 1 , θ 2 , and θ 3 , respectively.For the EM algorithm, the initial values of estimated parameters are θ 1 θ 2 θ 3 0.1, and the number of iterations is 100.Table 1 shows the estimated parameters for different data sets.Of course, the estimated parameters are nonmonotonic for different numbers of inspections, which is inherent to such a random problem.
In the fourth experiment, we repeat the third experiment 100 times to obtain the standard errors of each estimated parameter.Table 2 shows the standard errors of the estimated parameters.Again, all cases demonstrate a decrease in the standard errors, when the number of inspections increases.We can conclude from Tables 1 and 2 that, for 100 or more inspections, we can obtain reasonable estimated parameters with an acceptable standard error.
Finally, in the fifth experiment, we compare our method with another method.
As indicated in Section 1, 14-16 used an assumption that the state of the equipment is fixed during two consecutive inspections for their problems.Under this assumption, we can establish an MLE for every parameter.For n pairs of observations Y i and M i−1 , i 1, 2, . . ., n, we write V 1 0, S 1 M 0 , and for k ≥ 1, Here V k records the moment when the state of the equipment is changed and S k records the corresponding new state.Let N j k be the number of the set {l ≤ k| S l j} .Then, under the assumption of 14-16 , records the N j k th duration of the state j of the equipment.Finally, let the total number of changing K be the maximal subscript of V k .Then, we have the following MLE:

5.4
On the other hand, although we assume that, in our method, the benchmark interval is 1 for simplicity of argument, it is not too difficult to implement our method for other benchmark intervals.We run the simulating equipment for 400 time units and obtain inspection data sets for different benchmark intervals, respectively.
Figure 4 shows the estimated parameter θ 1 of two methods.We can conclude that for frequent inspections, the corresponding benchmark interval is less than 0.2 two methods have similar behaviors.However, we can find that two methods show a striking contrast when the benchmark interval is not less than 0.2.As indicated in 21 , the latter is more important for practical situations.As our method is stable for all cases, it is more practical.

Application
In this section, we apply the EM algorithm to a real-world dataset.The dataset is a 25-year record of inspections and repairs of a power transformer substation under condition-based maintenance.The power transformer substation consists of two power transformers, and it is a cool backup system.Once the primary transformer incurs faults, the secondary transformer replaces it automatically, and it can generally serve the entire load.The benchmark interval of inspections is three months.During an inspection, a transformer is repaired if it incurs faults, and the repaired primary transformer is put into operation.The repair is perfect because we can replace a transformer if necessary.
The behavior of the power transformer substation can be described by a 3-state model with a warning value W 2 and a three-month benchmark interval.Here state 1 implies that the primary transformer works, 2 implies that the secondary transformer works, and 3 implies that two transformers incur faults.In Table 3, we present the record of inspections.
Here Y and M are defined by 2.1 , 2.3 , and 2.4 Based on the dataset given in Table 3, we apply Algorithm 3.1 and obtain the following MLE θ 1 and θ 2 for the primary and the secondary transformers, respectively θ 1 0.0243, θ 2 0.164.

6.1
That is, the mean duration of the primary transformer is 41.2 months, and it is 6.10 months for the secondary transformer.
We use the total time on test TTT plot to check the efficiency of the result.The TTT plot is obtained by plotting against r/n.Here Z i:n is the order statistics of the sample.See 22 for a detailed introduction to TTT plot.From the dataset of Table t010207, the total lifetime of two transformers are 22, 81, 54, 54, 21, 9, and 45 months.We apply Monte Carlo simulations, with the MLE parameters θ 1 and θ 2 , to generate 100 independent total lifetime of two transformers.The TTT plots for the real-world, dataset and the simulated dataset is shown in Figure 5.In Figure 5, the circles represent the real data, and the solid points represent the simulated data with the estimated parameters. Figure 5 shows the efficiency of the result.

Conclusion
In this paper, we have addressed the parameter estimation problem for the multistate model for aging equipment under condition-based maintenance.Based on the memoryless property of exponential distributions, we proposed a convenient mathematical framework for the problem.In this framework, the calculation of involved conditional expectations became tractable.Then we applied an EM algorithm to obtain MLE of parameters.A sequence of simulation experiments shows that the estimated parameters converge to the preset value of the parameters, even for a moderate number of inspections.Moreover, the EM algorithm is stable for different length of benchmark intervals.Hence, the algorithm can be recommended for practical applications.It is convenient to extend the algorithm to the situation with random benchmark intervals.
In this paper, we assume that the state of the equipment can be directly observed at an inspection.However, it is a difficult task for many types of industrial equipment.In these cases, some other variables may be used to obtain estimates of the states.To establish a full model consisting of equipment, inspections, indirect variables, and investigating parameter estimations, it is challenging and valuable for practical situations.

A. Proof of Lemma 4.1
We proceed by an induction on r to show the result in Lemma 4.1.
For r 1, the result clearly holds.And for the convenience of the following discussion, we analyze the case of r 2. In this case, the pdf of S is the convolution of the pdf f λ 1 x of U 1 and the pdf f λ 2 x of U 2 .Thus, for x ≥ 0, Mathematical Problems in Engineering A.1 Assume that for r k, 4.1 and 4.2 hold.That is, for x ≥ 0, where Let r k 1, we have the following: Due to the linearity of convolution, we can derive from A.1 that, for x ≥ 0, Here, for i ≤ k, A.6 Therefore, A.1 holds for i ≤ k.As to i k 1, we have

A.7
It is obvious that is a polynomial in a variable x, and the degree of f x is less than k.Moreover, we have f λ i − 1 0, i 1, . . ., k.Since a polynomial with a degree less than k must be a zero polynomial if it has k different roots.Hence, f x − 1 ≡ 0, or f x ≡ 1.Hence, A.9 So the result is true for r k 1.This completes the proof.

B. Proof of Lemma 4.5
For a time instance k, let τ 1 max{i|I i ≤ k − 2} and τ 2 min{i|I i ≥ k − 1}.We will consider the following filtrations:

B.1
Let A 1 {M k−1 r} and B 1 {Y k s} for given 1 ≤ r, s ≤ m.We will prove that for every events A ∈ F 1 , B ∈ F 2 , and C ∈ F 3 , That is, F 1 ∨ F 3 and F 2 are independent, conditional on {M k−1 r} and {Y k s}.It is obvious that A 1 ∈ F 1 .Moreover, as As for every i ≤ n, taking the {Y i y i , M i−1 m i−1 , i 1, . . ., n} projection of both sides of B.8 , we have the result

Figure 2 :
Figure 2: Relationship between convergence error and the number of iterations.

Figure 4 :
Figure 4: Estimated parameters of two methods for different benchmark intervals.

Figure 5 :
Figure 5: The TTT plots for real-world dataset and the simulated dataset.

θ 2 Figure 3 :
Estimated parameters for different initial values.

Table 1 :
Estimated parameters with different numbers of inspections.

Table 2 :
Standard errors of estimated parameters.

Table 3 :
Inspections of a power transformer substation.
3therwise, B.3we have B 1 ∈ F 2 .Hence, due to the independence among F 1 , F 2 , and F 3 , we have the following:P A 1 B 1 P A 1 P B 1 , P ACA 1 B 1 P ACA 1 P B 1 .B.4Therefore, we have the following:P AC | A 1 B 1 P ACA 1 B 1 P A 1 B 1 P AC | A 1 , P B | A 1 B 1 P B | B 1 .Further, as F 1 ∨ F 3 and F 2 are independent, we obtain thatP ABCA 1 B 1 P ACA 1 P BB 1 .B.6It follows from B.5 , and the above equation thatP ABC | A 1 B 1 P AC| A 1 B 1 P B | A 1 B 1 .