A Hidden Semi-Markov Model with Duration-Dependent State Transition Probabilities for Prognostics

Realistic prognostic tools are essential for effective condition-based maintenance systems. In this paper, a Duration-Dependent Hidden Semi-Markov Model (DD-HSMM) is proposed, which overcomes the shortcomings of traditional Hidden Markov Models (HMM), including the Hidden Semi-Markov Model (HSMM): (1) it allows explicit modeling of state transition probabilities between the states; (2) it relaxes the observations' independence assumption by accommodating a connection between consecutive observations; and (3) it does not follow the unrealistic memoryless assumption of the Markov chain and therefore provides a more powerful modeling and analysis capability for real-world problems. To facilitate the computation of the proposed DD-HSMM methodology, a new forward-backward algorithm is developed. The proposed methodology is demonstrated and evaluated through a case study. The experimental results show that the DD-HSMM methodology is effective for equipment health monitoring and management.


Introduction
A fault is a change from the normal operating condition of a system to an abnormal condition, which occurs as a result of system performance degradation over time [1]. Diagnostics indicates the occurrence of a fault and its root cause. Prognostics is a fault prediction method; it involves detecting a pending fault before it occurs, identifying its root cause, and estimating the remaining useful life (RUL), which is also known as time-to-failure [2]. Condition-based Maintenance (CBM) is a maintenance program that recommends maintenance actions based on the information collected through condition monitoring.
A CBM program can be used to do diagnostics or prognostics; however, regardless of the application, it follows three steps [3][4][5]. First, data relevant to events and system health are collected through data acquisition techniques. Data acquisition in CBM includes event-type data (i.e., information on what happened) and condition monitoring data, which are the measurements related to system health.
Second, event and condition monitoring data are interpreted in the data processing step. Finally, maintenance decisions are made based on the interpretation and analysis of the data. In particular, to identify the weakest components and states and improve the efficiency of CBM, the integrated importance measure of multistate systems was introduced by Si et al. [6,7]. An extensive survey on machine diagnostics and prognostics implementing condition-based monitoring can be found in Jardine et al. [8] and Heng et al. [9].
Analysis of event data alone is reliability analysis, which maps the event data over a time axis to determine the probability of events and uses the probability distribution to predict failures. On the other hand, data acquisition in CBM provides both event and condition monitoring data. Therefore, it is more effective to combine events and conditions in one model in order to do diagnostics or prognostics. The Hidden Markov Model (HMM) is a technique for modeling and analyzing event and condition monitoring data together. It consists of two stochastic processes: (1) a Markov chain with a finite number of states describing an underlying mechanism and (2) an observation process depending on the hidden state [10][11][12]. An HMM contains a finite number of states connected by transitions. Each state is characterized by a transition probability and an observation probability [13].
Researchers have proposed a number of techniques to address the limitations of the standard HMM, notably its implicit state-duration model. A continuous variable-duration HMM has been adopted in speech recognition. Compared to the standard HMM, results show that the absence of a correct duration model increases the error rate by 50% [14][15][16]. Another example is in the handwritten word recognition area; due to the inherent ambiguity related to the segmentation process in handwritten words, it is a practical idea to use the variable-duration model for the states in an HMM-based handwritten word recognition system [17,18]. Recently, some researchers have applied HMM in the area of diagnostics and prognostics in machining processes [19,20]. However, these studies use only the ordinary HMM technique. The inherent limitation of HMM mentioned above still exists in these models.
Prognostic methods used in CBM are often a combination of statistical inference and machine learning methods [4, 21-24]. Model-based methods assume that measured information is stochastically correlated with the actual machine condition. HMM identifies the actual machine conditions from observable monitored data through a statistical approach. HMM has been very effective in various applications ranging from speech recognition [10, 14, 25-27] to tool wear monitoring and machining [3, 20, 28, 29].
The primary advantage of HMM is its robust mathematical foundation, which allows for many practical applications in different areas of use. An added benefit of employing HMMs is the ease of model interpretation in comparison with pure "black-box" modeling methods such as artificial neural networks that are often employed in advanced diagnostic models [28]. However, an inherent limitation of the HMM approach is that its state duration follows an exponential distribution (geometric in discrete time). In other words, HMM does not provide an adequate representation of temporal structure.
To overcome the limitations of HMM in prognosis, Dong and He [30,31] propose a Hidden Semi-Markov Model-based (HSMM) methodology that adds an explicit temporal structure to HMM in order to predict the RUL of equipment. In this model, the states of the HSMM are used to represent the health status of equipment. The trained HSMM can be used to diagnose the health state of equipment. Through parameter estimation of the health-state duration probability distribution and the proposed backward recursive equations, the RUL of the equipment can be predicted [32]. Although the results from HSMM are promising, the deterioration within the same state of the system is not taken into consideration in this model. It assumes that the state transition probabilities stay the same within a state and that all observations are independent, which typically does not hold in real-world applications.
This paper presents a new approach that expands the HSMM methodology [30,31,33] with duration-dependent state transition probabilities. Different from HSMM, the proposed Duration-Dependent Hidden Semi-Markov Model (DD-HSMM) does not follow the unrealistic Markov chain's memoryless assumption and therefore provides a more realistic and powerful modeling and analysis capability. The major contribution of the DD-HSMM methodology is that it allows explicit modeling of the transition probabilities, which (1) do not only depend on the state but also (2) vary with the duration of each state, and (3) it provides the capability to relax the observations' independence assumption by accommodating a link between consecutive observations, which makes it more realistic in real-world applications.

Description of General HSMM.
A Hidden Semi-Markov Model (HSMM) is an extension of HMM that allows the underlying process to be a semi-Markov chain with a variable duration, or sojourn time, for each state. The HSMM is well suited to estimating unobservable health states from observable sensor signals. For example, a small change in a bearing alignment could cause a small nick in the bearing, which could cause scratches in the bearing race and additional nicks, leading to complete bearing failure. This process can be well described by the HSMM. Let $s_t$ be the hidden state at time $t$ and $O$ the observation sequence; a HSMM is characterized by its parameters. The parameters of a HSMM are as follows: the initial state distribution (denoted by $\pi$), the transition model (denoted by $A$), the observation matrix (denoted by $B$), and the state duration distribution (denoted by $D$). Thus, a HSMM can be written as $\lambda = (\pi, A, B, D)$. For a given state $s_t$, $B$ is the probability matrix of the observation being $o_t$ at time $t$ and $o_{t+1}$ at time $t + 1$.
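As an illustration of how the four parameter groups interact, the following minimal Python sketch generates a state and observation sequence from an explicit-duration HSMM. The function name `sample_hsmm`, the callables `duration_sampler` and `emit`, and all numerical values are assumptions made for illustration; they are not taken from the model in this paper.

```python
import random


def sample_hsmm(pi, A, duration_sampler, emit, T, rng=None):
    """Generate T (state, observation) pairs from an explicit-duration HSMM.

    pi: initial state distribution; A: transition matrix between states;
    duration_sampler(state, rng): draws a sojourn time (plays the role of D);
    emit(state, rng): draws one observation (plays the role of B).
    """
    rng = rng or random.Random(0)
    states = list(range(len(pi)))
    state = rng.choices(states, weights=pi)[0]
    path, obs = [], []
    while len(path) < T:
        d = duration_sampler(state, rng)      # how long to stay in this state
        for _ in range(d):
            if len(path) == T:
                break
            path.append(state)
            obs.append(emit(state, rng))
        state = rng.choices(states, weights=A[state])[0]  # move on
    return path, obs


# Hypothetical three-state left-to-right chain; the last state re-enters itself.
pi = [1.0, 0.0, 0.0]
A = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
path, obs = sample_hsmm(
    pi, A,
    duration_sampler=lambda s, rng: rng.randint(2, 5),
    emit=lambda s, rng: rng.gauss(float(s), 0.1),
    T=20,
)
```

The sojourn time is drawn once on entering a state, which is what distinguishes the semi-Markov chain from an ordinary Markov chain with self-transitions.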

State Transitions.
In HSMM, there are $N$ states, and the transitions between the states follow the transition matrix $A$; that is, $P(i \to j) = a_{ij}$. Similar to the standard HMM, the state $s_0$ at time $t = 0$ is a special state "START." Although the distinct health-state transition is Markov, the state transition $s_{t-1} \to s_t$ is usually not Markov. This is the reason why the model is called "semi-Markov" [32]: in the HSMM case, the conditional independence between the past and the future is only ensured when the process moves from one health state to another distinct health state.

Inference Procedures.
Similar to HMM, HSMM also has three basic problems to deal with, that is, the evaluation, recognition, and training problems: (1) evaluation (also called classification): given the observation sequence $O = o_1 o_2 \cdots o_T$ and a HSMM $\lambda$, what is the probability of the observation sequence given the model, that is, $P(O \mid \lambda)$; (2) decoding (also called recognition): given the observation sequence $O = o_1 o_2 \cdots o_T$ and a HSMM $\lambda$, what sequence of hidden states $S = s_1 s_2 \cdots s_T$ most probably generates the given sequence of observations; (3) learning (also called training): how do we adjust the model parameters $\lambda = (\pi, A, B, D)$ to maximize $P(O \mid \lambda)$.
Different algorithms have been developed for the above three problems. The most straightforward way of solving the evaluation problem is to enumerate every possible state sequence of length $T$ (the number of observations). However, the computational burden of this exhaustive enumeration is prohibitively high. Fortunately, there is a more efficient algorithm based on dynamic programming, called the forward-backward procedure. The goal of the decoding problem is to find the optimal state sequence associated with the given observation sequence. The most widely used optimality criterion is to find the single best state sequence (path), that is, to maximize $P(S \mid O, \lambda)$, which is equivalent to maximizing $P(S, O \mid \lambda)$. The Viterbi algorithm, which is also based on dynamic programming, is used to find this single best state sequence. For the learning problem, there is no known way to obtain an analytical solution. However, the model parameters $\lambda = (\pi, A, B, D)$ can be adjusted such that $P(O \mid \lambda)$ is locally maximized using an iterative procedure, such as the Baum-Welch method (or, equivalently, the Expectation-Maximization algorithm).
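The gap between exhaustive enumeration and the dynamic programming solution of the evaluation problem can be seen in a small Python sketch. For brevity it uses an ordinary discrete HMM rather than an HSMM, so no duration bookkeeping is involved; the function names and the numerical values are illustrative assumptions.

```python
from itertools import product


def forward_likelihood(pi, A, B, obs):
    """P(O | model) by the forward recursion: about N^2 * T operations."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def brute_force_likelihood(pi, A, B, obs):
    """The same quantity obtained by summing over all N^T state sequences."""
    n, total = len(pi), 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state model with two observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 1, 0]
fast = forward_likelihood(pi, A, B, obs)
slow = brute_force_likelihood(pi, A, B, obs)
```

Both functions return the same likelihood up to rounding, but the enumeration visits $N^T$ paths, which is why the forward recursion is preferred for sequences of realistic length.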

Duration-Dependent State Transition Probability.
In DD-HSMM, the state transition probability distribution is $A = \{a_{ij}(d)\}$, $1 \le i, j \le N$, $1 \le d \le D$. We define the duration-dependent state transition probabilities as follows:
$$a_{ij}(d) = P(s_{t+1} = j \mid s_t = i,\ d_t(i) = d), \qquad (2)$$
where $N$ and $D$ are the number of states and the maximum duration in any state, respectively. Equation (2) represents the transition from state $i$ to state $j$, given that the duration in state $i$ at time $t$ is $d_t(i) = d$. It indicates that, in the DD-HSMM case, the state transition probability is not only state dependent but also duration dependent.
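A duration-dependent transition model can be stored as one transition matrix per duration value. The Python sketch below builds such an array for a left-to-right degradation chain. The helper `make_dd_transitions`, the linear `leave_prob` rule, the presence of self-transitions, and the absorbing failure state are all assumptions chosen for illustration and are not the trained values of this paper.

```python
def make_dd_transitions(n_states, max_dur, leave_prob):
    """Return a[d-1][i][j] = P(next state j | current state i, duration d)
    for a left-to-right chain whose last state (failure) is absorbing."""
    a = []
    for d in range(1, max_dur + 1):
        layer = [[0.0] * n_states for _ in range(n_states)]
        for i in range(n_states - 1):
            # Force a move once the maximum duration is reached.
            p = 1.0 if d == max_dur else min(1.0, leave_prob(i, d))
            layer[i][i + 1] = p        # degrade to the next health state
            layer[i][i] = 1.0 - p      # remain in the same state
        layer[-1][-1] = 1.0            # failure state is absorbing
        a.append(layer)
    return a


# Hypothetical rule: the longer the stay, the more likely the state is left.
a = make_dd_transitions(n_states=4, max_dur=5, leave_prob=lambda i, d: 0.1 * d)
```

Each row of each layer sums to one, so every layer is a valid transition matrix, while the probability of leaving a state grows with the time already spent in it.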

Inference Procedures.
Similar to HSMM, DD-HSMM also has three basic problems to deal with, that is, the evaluation, recognition, and training problems. To facilitate the computation in the proposed DD-HSMM-based health prediction model, new forward-backward variables are defined and a modified forward-backward algorithm is developed in the following.
A dynamic programming scheme is employed for the efficient computation of the inference procedures. To implement the inference procedures, a forward variable $\alpha_t(i, d)$ is defined as the probability of generating $o_1, o_2, \ldots, o_t$ and ending in state $i$ with duration $d_t(i) = d$:
$$\alpha_t(i, d) = P(o_1, o_2, \ldots, o_t,\ s_t = i,\ d_t(i) = d \mid \lambda).$$
The initial conditions are established at time $t = 1$ as follows:
All unspecified $\alpha$ values are zero. For time $t = 2, \ldots, T$,
where
For the backward probability, the initial conditions are set at time $t = T$ as follows:
Then the total probability can be computed by

Modified Forward-Backward Algorithm for DD-HSMM.
In order to give reestimation formulas for all variables of the DD-HSMM, one DD-HSMM-featured forward-backward variable is defined:
Then, we have
and the probability of being in state $i$ at time $t$ with a duration of $d$ is defined as $\gamma_t(i, d)$. From the definition of the forward-backward variables, $\gamma_t(i, d)$ can be derived as follows:
The forward-backward algorithm computes the following probabilities.
Forward Pass. The forward pass of the algorithm computes $\alpha_t(i, d)$.
Step 1 (initialization ($t = 1$)). The forward variable is shown as follows:
Step 2 (forward recursion ($t > 1$)). For $t = 2, \ldots, T$,
Backward Pass. The backward pass computes $\beta_t(i, d)$.
Step 1 (initialization ($t = T$)). The backward variable is shown as follows:
Step 2 (backward recursion ($t < T$)). For $t = T - 1, \ldots, 1$,

Parameter Reestimation for DD-HSMM.
The reestimation formula for the initial state distribution is the probability that state $i$ was the first state, given $O$:
The reestimation formula for the state transition probabilities is the ratio of the expected number of transitions from state $i$ to state $j$ to the expected number of transitions from state $i$:
In these equations, $\gamma_t(i, d)$ is the probability of state $i$ at time $t$ with the duration $d_t(i) = d$, and $\gamma_t(i, d)$ can be expressed as
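The structure of a forward pass over the pair (state, duration) can be sketched in Python under simplifying assumptions. This is not the recursion of this paper: it assumes discrete observations, omits the explicit duration density and the link between consecutive observations, lets a self-transition increase the duration by one (capped at the maximum duration), and resets the duration to one on entering another state. All names and numbers are hypothetical.

```python
from itertools import product


def dd_forward(pi, a, B, obs):
    """alpha[i][d] after the last observation, for durations d = 1..D.
    a[d-1][i][j] is the transition probability given duration d in state i."""
    n, D = len(pi), len(a)
    alpha = [[0.0] * (D + 1) for _ in range(n)]
    for i in range(n):
        alpha[i][1] = pi[i] * B[i][obs[0]]   # unspecified entries stay zero
    for o in obs[1:]:
        new = [[0.0] * (D + 1) for _ in range(n)]
        for i in range(n):
            for d in range(1, D + 1):
                if alpha[i][d] == 0.0:
                    continue
                for j in range(n):
                    nd = min(d + 1, D) if j == i else 1
                    new[j][nd] += alpha[i][d] * a[d - 1][i][j] * B[j][o]
        alpha = new
    return alpha


def dd_likelihood(pi, a, B, obs):
    """Total probability of the observations: sum over states and durations."""
    return sum(sum(row) for row in dd_forward(pi, a, B, obs))


def dd_brute_force(pi, a, B, obs):
    """Reference value obtained by enumerating every state sequence."""
    n, D, total = len(pi), len(a), 0.0
    for path in product(range(n), repeat=len(obs)):
        p, d = pi[path[0]] * B[path[0]][obs[0]], 1
        for t in range(1, len(obs)):
            p *= a[d - 1][path[t - 1]][path[t]] * B[path[t]][obs[t]]
            d = min(d + 1, D) if path[t] == path[t - 1] else 1
        total += p
    return total


# Hypothetical two-state model with maximum duration 3.
pi = [0.6, 0.4]
a = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.6, 0.4], [0.3, 0.7]],
     [[0.2, 0.8], [0.5, 0.5]]]
B = [[0.7, 0.3], [0.1, 0.9]]
obs = [0, 0, 1, 1, 0]
likelihood = dd_likelihood(pi, a, B, obs)
```

Carrying the duration as a second index is what lets the transition probability change with the time already spent in a state, at the cost of a factor of $D$ in memory and computation.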

DD-HSMM Based Health Prognostic
Many applications in the actuarial, econometric, engineering, and medical literature involve the use of the hazard rate (HR) function [33]. The mathematical properties of the HR function can reveal a variety of features in the data.
Let $T$ denote the time to failure of an item under consideration, with lifetime distribution function $F(t)$ and reliability function $R(t)$, where $F(t) + R(t) = 1$ and $F(0) = 0$. Assume that the density function $f(t) = F'(t)$ exists; then the HR function can be defined as
$$\lambda(t) = \frac{f(t)}{R(t)} \approx \frac{\Delta n(t)}{[N - n(t)]\,\Delta t},$$
in which $N$ is the total number of sample items, $n(t)$ is the number of items that fail before time $t$, and $\Delta n(t)$ is the number of items that fail during the time interval $(t, t + \Delta t)$.
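The expected residual life (ERL) function $\mu(t)$ is the expected time remaining to failure, given that the system has survived to time $t$; then
$$\mu(t) = E[T - t \mid T > t] = \frac{1}{R(t)} \int_t^{\infty} R(x)\,dx$$
for $t$ such that $R(t) > 0$. Therefore, $\lambda(t)\Delta t$ can be approximated as the conditional probability of failure during the time interval $(t, t + \Delta t)$ given survival to time $t$.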
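The empirical reading of the hazard rate, namely the number of items failing in a short interval divided by the number still surviving at its start and by the interval length, can be sketched in Python as follows. The function name `empirical_hazard`, the half-open interval convention, and the sample failure times are assumptions for illustration.

```python
def empirical_hazard(failure_times, t, dt):
    """Estimate the hazard rate at time t from a sample of failure times:
    (failures in [t, t + dt)) / (items surviving to t * dt)."""
    total = len(failure_times)
    failed_before = sum(1 for x in failure_times if x < t)
    failing_now = sum(1 for x in failure_times if t <= x < t + dt)
    at_risk = total - failed_before
    if at_risk == 0:
        raise ValueError("no items survive to time t")
    return failing_now / (at_risk * dt)


# Hypothetical failure times of six items.
times = [1, 2, 2, 3, 5, 8]
rate = empirical_hazard(times, t=2, dt=1)
```

Multiplying the returned rate by the interval length gives the estimated conditional probability of failure in that interval given survival to its start.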
Suppose that a machine will go through health states $i$ ($i = 1, 2, \ldots, N - 1$) before entering failure state $N$. Let $D(i)$ denote the expected duration of the machine staying at health state $i$; based on the parameters estimated above, we can get $D(i)$ as follows:
And $\rho$ can be denoted by
Then, once the machine has entered health state $i$, its expected residual life equals the sum of the expected residual duration of the machine in health state $i$ and the total expected duration in the future health states before failure. Denote $\hat{D}(i_d)$ as the expected residual duration of the machine in health state $i$ after it has stayed there for a duration of $d$. When the equipment entered state $i$ at time $t_i$, the conditional probability of failure during $(t_i + d, t_i + (d + \Delta d))$ can be defined as the probability that the machine will transit to any other state during the coming $\Delta d$ and the probability that the machine still stays at state $i$. It can be seen from (9) and (10) that $\lambda(t_i + d)\Delta d$ can be denoted as follows:
Then
The DD-HSMM equipment health prediction procedure is given as follows.
Step 1. From the DD-HSMM training procedure (i.e., parameter estimation), the state transition probability for the DD-HSMM can be obtained.
Step 2. Through the DD-HSMM parameter estimation, the duration probability density function for each health state can be obtained. Therefore, the duration mean and variance can be calculated.
Step 3. By classification, identify the current health status of the equipment.
Step 4. The remaining useful life (RUL) of equipment can be predicted by the following formula (suppose that the equipment currently stays at health state  with duration of ):
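The summation described earlier in this section, that is, the expected residual duration in the current health state plus the expected durations of the health states still to come before failure, can be sketched in Python as follows. The function name and the duration values are hypothetical and are not results of this paper.

```python
def remaining_useful_life(residual_current, expected_duration, state):
    """RUL for equipment currently in health state `state` (1-based).

    residual_current: expected residual duration in the current state.
    expected_duration: expected_duration[k] is the expected duration of
        health state k + 1, for health states 1..N-1 (failure state excluded).
    """
    future = expected_duration[state:]   # health states state+1 .. N-1
    return residual_current + sum(future)


# Hypothetical expected durations of three health states before failure.
durations = [8.0, 6.0, 4.0]
rul = remaining_useful_life(residual_current=5.5,
                            expected_duration=durations, state=1)
```

When the equipment is already in the last health state, the sum over future states is empty and the RUL reduces to the residual duration in that state.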

Case Study
In this case study, long-term wear experiments on rolling element bearings were conducted [1]. In order to collect an adequate number of data sets for the validation of the proposed scheme, three experiments under normal operating conditions, three experiments with a cage defect fault, and three experiments each with inner and outer race defect faults were performed until the bearing reached a complete failure state and stopped operating. Bearing characteristic frequencies in the frequency domain are extracted from the vibration signals corresponding to different degrees of the health states of the bearing acquired during the experiments.
During the test runs, vibration signals were collected under each condition. These signals were extracted using a Mahalanobis-Taguchi System (MTS) based model in the original paper [1] and used for the proposed DD-HSMM methodology in this paper. The expert judgment is expressed as one of four integers ranging from 0 to 3, representing 4 system states, as follows: 0 → the bearing is operating normally; 1 → the bearing is operating and shows signs of deterioration; it is advisable to take some preventive action at the next planned maintenance; 2 → the bearing is operating but requires immediate attention; 3 → the bearing has failed.

Operation State Identification.
In order to evaluate the accuracy of the operation state identification method proposed in this paper, experimental data with normal operating condition were obtained. The experimental data set included 50 samples for each state (denoted by 0, 1, 2, and 3). Of these, 20 samples were used to train the model, and the remaining 30 samples were used to validate the model. In the DD-HSMM, a Gaussian mixture distribution and a single Gaussian distribution were used to model the output probability distribution and the state duration densities, respectively, and the number of states is 4. The maximum number of iterations in the training process is set to 100 and the convergence error to 0.000001. The DD-HSMM training curve is shown in Figure 1. The x-axis shows the training steps and the y-axis represents the likelihood of the different states. As can be seen from Figure 1, the training for all four states reaches the set error in fewer than 40 steps. This suggests that the model has the potential for real-time signal processing.
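The stopping rule used in training, with a maximum of 100 iterations and a convergence error of 0.000001, can be sketched generically in Python. The sketch assumes that the convergence error is measured as the change in log-likelihood between successive iterations, which is a common choice but is not stated explicitly here; `em_step` is a hypothetical stand-in for one reestimation pass.

```python
def train_until_converged(em_step, params, max_iter=100, tol=1e-6):
    """Repeat em_step until the log-likelihood changes by less than tol
    or max_iter iterations have been run.

    em_step(params) must return (new_params, log_likelihood).
    Returns (params, log_likelihood, iterations_used).
    """
    prev = float("-inf")
    loglik = prev
    for iteration in range(1, max_iter + 1):
        params, loglik = em_step(params)
        if abs(loglik - prev) < tol:
            return params, loglik, iteration
        prev = loglik
    return params, loglik, max_iter


# Toy stand-in for a reestimation pass: the "log-likelihood" -x rises toward 0.
toy_step = lambda x: (x / 2, -x)
final_params, final_loglik, steps = train_until_converged(toy_step, 1.0)
```

With a real model, `em_step` would run the forward-backward pass and the reestimation formulas once and return the updated parameters together with the log-likelihood of the training data.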
The classification results obtained on the remaining 30 data samples are shown in Table 1.As indicated in the results, the accuracy of the DD-HSMM method is 94.2%.

Health Prediction for RUL.
As described before, a four-state DD-HSMM prediction model is constructed. In the training process, even if the device is in the same running condition, a different dwell time leads to different transition probabilities between states and to a different mean and variance of the duration in each state. Tables 2 and 3 show the state transition probabilities and the mean and variance of the duration in each state when $d(1) = 1$, representing the bearing in state 1 with a duration of 1. Tables 4 and 5 show the state transition probabilities and the mean and variance of the duration in each state when $d(1) = 4$, representing the bearing in state 1 with a duration of 4.
First, the current operating state $i$ is determined from the recognition results; then the residence time $\sum_{j=i+1}^{N-1} D(j)$ is calculated according to the duration parameters of the operating states obtained in the training process. Then, the remaining effective life in the current operational state is calculated using (25). Finally, the RUL of the bearing can be calculated using (26). Suppose that the bearing is now at state 1 with a duration of 1; then the following can be obtained: $D(2) + D(3) = 10.9426$, $\hat{D}(1_1) = 6.0875$ by (25), and $\mathrm{RUL}_1^{(1)} = 17.0211$ by (26).

Prediction Comparison.
In order to compare the prognostic method based on the DD-HSMM with the prognostic method based on the HSMM, (29) is used to evaluate the life error. Table 6 shows the prediction comparison of DD-HSMM versus HSMM. Failure prediction with the HSMM method is only state dependent, while the DD-HSMM method uses both state dependency and duration dependency. The DD-HSMM method has a self-updating capability, in which the historical data on states are used in the calculation of the state transition probability matrix. As indicated in the results, the DD-HSMM method is more accurate than the HSMM method.

Conclusion
This paper presents a Duration-Dependent Hidden Semi-Markov Model (DD-HSMM) for prognostics. As opposed to the Hidden Semi-Markov Model (HSMM), the failure prediction of the DD-HSMM method uses both state dependency and duration dependency. The two important aspects of equipment health monitoring, which are the stages and the rate of aging, are taken into consideration in an integrated manner in the proposed DD-HSMM model. The duration-dependent state transition probability in the Hidden Semi-Markov Model makes the decision-making more relevant to real-world applications.
In order to facilitate the computational procedure, a new forward-backward algorithm and reestimation approach are developed.By using autoregression, the interdependency between observations is established in the model.By incorporating an explicitly defined temporal structure into the model, the DD-HSMM is capable of predicting the remaining useful life of equipment more accurately.
The demonstration of the proposed model is carried out using experimental data on rolling element bearings.The proposed model provides a powerful state recognition capability and very accurate results in terms of remaining useful life prediction.In order to draw general conclusion on the capabilities of the proposed DD-HSMM, more experimental data in various prognostics areas are needed.

Figure 1: Training curve of the DD-HSMM model.
$a_{ij}(d)$ is the state transition probability from state $i$ to state $j$, given that the duration in state $i$ at time $t$ is $d_t(i) = d$. $b_i(o_t)$ is the output probability of observation vector $o_t$ from state $i$ and $p_i(d)$ is the state duration probability of state $i$. $N$ is the number of states in DD-HSMM and $D_i$ is the maximum duration in state $i$. Similar to the forward variable, the backward variable can be written as $\beta_t(i, d) = P(o_t, \ldots, o_T \mid s_t = i,\ d_t(i) = d,\ \lambda)$.
In (29), $\mathrm{RUL}_{\mathrm{actual}}$ represents the actual life of the component, and $\mathrm{RUL}_{\mathrm{forecasted}}$ represents the expected life predicted by DD-HSMM or HSMM.