Generalized Accelerated Failure Time Frailty Model for Systems Subject to Imperfect Preventive Maintenance

Imperfect preventive maintenance (PM) activities are very common in industrial systems. For condition-based maintenance (CBM), it is necessary to model the failure likelihood of systems subject to imperfect PM activities. In this paper, the models in the field of survival analysis are introduced into CBM. Namely, the generalized accelerated failure time (AFT) frailty model is investigated to model the failure likelihood of industrial systems. Further, on the basis of the traditional maximum likelihood (ML) estimation and expectation maximization (EM) algorithm, the hybrid ML-EM algorithm is investigated for the estimation of parameters. The hybrid iterative estimation procedure is analyzed in detail. In the evaluation experiment, the generated data of a typical degradation model are verified to be appropriate for the real industrial processes with imperfect PM activities. The estimates of the model parameters are calculated using the training data. Then, the performance of the model is analyzed through the prediction of remaining useful life (RUL) using the testing data. Finally, comparison between the results of the proposed model and the existing model verifies the effectiveness of the generalized AFT frailty model.


Introduction
Condition-based maintenance (CBM) has long been an important issue in the area of maintenance strategy. Because the degradation processes that precede failure can be measured in many systems, CBM is more effective than corrective maintenance and time-based preventive maintenance in some respects, such as catastrophic failure reduction and availability maximization [1,2]. A CBM program includes three steps: data acquisition, signal processing, and maintenance decision support [1]. Maintenance decision support is categorized into diagnostics and prognostics. Prognostics is often characterized by estimating the remaining useful life (RUL) of systems using the available condition monitoring information. The RUL estimate ensures enough time to perform the necessary maintenance actions prior to failure [3,4].
The proportional hazards model (PHM) [5,6], which is a popular model in survival analysis, can be applied in reliability evaluation and maintenance optimization. It has shown its effectiveness for RUL prediction and CBM scheduling of industrial systems [7-9]. The PHM is an effective CBM method because of its strength in handling the influence of variable operational conditions. The accelerated failure time (AFT) model [10,11] is an important alternative to the PHM in survival analysis and has the advantage of being more intuitively interpretable than the PHM. However, the AFT method has rarely been applied in reliability-related fields. A review and comparison of the major mathematical models of survival analysis and reliability theory shows that both fields address the same mathematical problems [6]. Because the mathematical models of both fields are unified, the methods of survival analysis might be used for reliability analysis and related fields such as CBM. In this paper, we introduce the AFT model into the reliability field and investigate an AFT-based model for CBM.
Imperfect preventive maintenance (PM) restores a system to a better state but not "as good as new." Imperfect PM activities are very common in industrial systems, and there are many studies on models of imperfect PM [12-17]. The extended PHM (EPHM) [16,17] has been shown to be an effective method for predicting the RUL of systems subject to imperfect PM. Inspired by the EPHM, this paper investigates a new alternative model, the generalized AFT frailty model.
The rest of the paper is organized as follows. Section 2 introduces the generalized AFT frailty model and compares it with the existing model. Section 3 proposes the hybrid maximum likelihood-(ML-) expectation maximization (EM) algorithm for the calculation of the parameters in the model. Section 4 demonstrates the effectiveness of the proposed model through a simulation experiment of RUL prediction for systems subject to imperfect PM. Finally, Section 5 concludes the paper.

Generalized AFT Frailty Model
The Cox PHM [5] is a popular model in survival analysis, and it is an effective method for RUL prediction and CBM scheduling of industrial systems. Let $\lambda(t \mid X)$ be the hazard/failure function at time $t$ given the covariates $X$; the PHM is expressed as

$$\lambda(t \mid X) = \lambda_0(t) \exp(X\beta), \tag{1}$$

where $\lambda_0(t)$ is the baseline hazard/failure function, which accounts for the age of the system at the time of inspection, $X$ is the vector of the covariates, and $\beta$ is the regression coefficient vector. In a CBM program, the systems' degradation indicators are usually chosen to be the covariates.
The AFT model [10,11,18] is an important alternative to the PHM. Suppose $T_{ij}$ is the failure time of system $j$ in cluster (or group) $i$. There is a censoring random variable, which we denote as $C_{ij}$. We only observe $T^{*}_{ij} = \min(T_{ij}, C_{ij})$, and the linear regression (i.e., AFT) model is

$$\log T_{ij} = X_{ij}\beta + \varepsilon_{ij}, \tag{2}$$

where $X_{ij}$ is the vector of observed covariates, $\beta$ is the regression coefficient vector, and $\varepsilon_{ij}$ is the random error. This linear form of the AFT model relates the logarithm of the failure time to the covariates and has the same form as the generic linear regression model. In the PHM, the baseline failure function does not depend on the covariates, which limits the modeling of some types of survival data in medical research. Collett [19] has introduced the influence of the covariates into the baseline failure function and proposed the following generalized AFT model:

$$\lambda(t \mid X_{ij}) = \lambda_0\left(t e^{-X_{ij}\beta}\right) e^{-X_{ij}\beta}. \tag{3}$$

In this model, the whole covariate term $\exp(-X_{ij}\beta)$ has been introduced into the baseline failure function.
The effectiveness of this model has been verified by the modeling of survival data in the medical field [19]. The PHM and the AFT model are suitable for mutually independent failure time data. In reality, correlated or clustered failure time data are very common in the fields of survival analysis and reliability. For example, systems may operate in the same environment with the same temperature and humidity. The shared environment of the subjects leads to dependence among the observed failure times. Frailty is a good tool to represent the random effect shared by subjects in the same cluster (or group), and it induces dependence among the correlated or clustered failure time data. The PHM frailty model [20] is

$$\lambda(t \mid X_{ij}, \omega_i) = \lambda_0(t)\, \omega_i \exp(X_{ij}\beta), \tag{4}$$

where $\omega_i$ is the frailty term, which models the random effect shared by the systems in the $i$th cluster (group). Frailty has also been introduced into the AFT model to represent the possible correlation among failure times [12]. Considering both the generalized AFT model (3) and the PHM frailty model (4), the following generalized AFT frailty model [21] is proposed:

$$\lambda(t \mid X_{ij}, \omega_i) = \lambda_0\left(t e^{-X_{ij}\beta}\right) \omega_i\, e^{-X_{ij}\beta}. \tag{5}$$
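As a minimal sketch of how the hazard of the generalized AFT frailty model can be evaluated, the following Python code assumes a Weibull baseline (the choice made later in the evaluation experiment). The function names, the `shape`/`scale` parameterization of the Weibull hazard, and the numerical values are illustrative assumptions, not part of the original model description.

```python
import math


def weibull_hazard(t, shape, scale):
    """Weibull baseline hazard: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def gaft_frailty_hazard(t, x, beta, omega, shape, scale):
    """Hazard of the generalized AFT frailty form with a Weibull baseline.

    x and beta are equal-length sequences; omega is the cluster frailty.
    """
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta))
    accel = math.exp(-linear_predictor)  # the factor exp(-X beta)
    # Baseline evaluated at the rescaled time, then scaled by frailty and accel.
    return weibull_hazard(t * accel, shape, scale) * omega * accel


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the function.
    print(gaft_frailty_hazard(50.0, [1.2], [0.3], omega=1.5, shape=2.0, scale=100.0))
```

With a zero covariate and unit frailty the function reduces to the baseline hazard, and doubling the frailty doubles the hazard, which is the multiplicative role the frailty term plays in the model.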

Comparison between the Generalized AFT Frailty Model and the Existing Model. The EPHM has been proposed by You et al. [16,17] to model the failure likelihood of systems subject to imperfect PM activities. In the EPHM, referred to as model (6), $h_k$ is the hazard/failure function of the system after the $k$th PM activity and before the $(k+1)$th PM activity, $T_k$ is the $k$th PM interval, and $t_k$ is the random local time between the $k$th and the $(k+1)$th PM activity. The factor $A_k = a_0 \cdot a_1 \cdots a_k$ collects the hazard rate increase factors (HRIFs), where $a_k$ is the HRIF due to the $k$th PM activity, and $b_k$ is the age reduction factor (ARF) due to the $k$th PM activity. The HRIF modifies the rate of increase of the hazard/failure rate of the system after an imperfect PM activity, and the ARF measures the extent to which the PM activity brings the system to a younger age.
Comparing (5) and (6), we observe that the HRIF $A_k$ and the frailty term $\omega_i$ both act multiplicatively on the hazard/failure rate function and play the same role in the two models. When the system resumes operation after imperfect PM activities, it has already aged somewhat. The ARF measures the extent to which the imperfect PM affects the effective age. The generalized AFT frailty model describes the hybrid influence of imperfect PM on both the hazard/failure rate and the effective age, which is shown in Figure 1. To some extent, the generalized AFT frailty model is therefore unified with the EPHM, and we introduce the generalized AFT frailty model (5) into the CBM field to model the failure likelihood of systems subject to imperfect PM.

Parameter Estimation Algorithm
Having introduced the generalized AFT frailty model, we investigate its parameter estimation algorithm. The EM algorithm is a popular method for computing ML estimates from incomplete data. Because the frailty terms are unknown, the EM algorithm has been used to estimate the parameters in the PHM frailty model [22] and the AFT frailty model [12]. Vu and Knuiman [23] have proposed a hybrid ML-EM method for the calculation of ML estimates in PHM frailty models and have verified that the hybrid method is more computationally efficient than the standard EM method and faster than the standard direct ML method. In this paper, we estimate the parameters in the generalized AFT frailty model with a hybrid ML-EM algorithm.
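If the failure time is smaller than the censoring random variable, the indicator function has a value of one; otherwise, it has a value of zero:

$$\delta_{ij} = I(T_{ij} < C_{ij}). \tag{7}$$

With $t_{ij}$ denoting the observed time of system $j$ in cluster $i$, the likelihood function of right-censored data is

$$L = \prod_i \prod_j \lambda(t_{ij} \mid X_{ij})^{\delta_{ij}}\, S(t_{ij} \mid X_{ij}), \tag{8}$$

where $S(\cdot)$ is the survival/reliability function.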
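The right-censored likelihood can be illustrated with a short Python sketch. It assumes a Weibull failure time without covariates or frailty, so that the log-likelihood is the sum of $\delta \log \lambda(t) + \log S(t)$ over the observations; the function names and the `shape`/`scale` parameterization are assumptions made for illustration.

```python
import math


def censoring_indicator(failure_time, censor_time):
    """Return 1 if the failure is observed before censoring, otherwise 0."""
    return 1 if failure_time < censor_time else 0


def weibull_censored_loglik(times, deltas, shape, scale):
    """Right-censored Weibull log-likelihood.

    Each observation contributes delta * log(hazard) + log(survival), where
    log(survival) = -(t / scale) ** shape.
    """
    total = 0.0
    for t, delta in zip(times, deltas):
        log_hazard = math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)
        log_survival = -((t / scale) ** shape)
        total += delta * log_hazard + log_survival
    return total
```

A censored observation contributes only its survival term, whereas an observed failure also contributes the log-hazard at the failure time.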
The frailty terms are random variables, and we suppose $\omega_i$ obeys the distribution $f(\omega_i, \theta)$ with the parameter $\theta$, which can be a vector. Thus there are two unknown vectors, $\beta$ and $\theta$, in model (5).
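The parameter $\theta$ is computed with the ML estimation method. The ML estimation is based on the marginal likelihood function, which integrates the frailty terms out of the likelihood of model (5) and is written as

$$L_{\mathrm{marg}}(\theta) = \prod_i \int_0^{\infty} \prod_j \left[\omega_i \lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right) e^{-X_{ij}\beta}\right]^{\delta_{ij}} \exp\left(-\omega_i \Lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right)\right) f(\omega_i, \theta)\, d\omega_i, \tag{9}$$

where $t_{ij}$ is the observed time and $\Lambda_0$ is the cumulative failure function. For the sake of convenient calculation, we take the logarithm of the above equation so that the multiplications are converted into additions. The logarithmic form is

$$l_{\mathrm{marg}}(\theta) = \sum_i \log \int_0^{\infty} \omega_i^{D_i} \exp\left(-\omega_i \sum_j \Lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right)\right) f(\omega_i, \theta)\, d\omega_i + \sum_i \sum_j \delta_{ij} \left[\log \lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right) - X_{ij}\beta\right], \tag{10}$$

where $D_i = \sum_j \delta_{ij}$ is the number of observed failures in cluster $i$. Given the initial values of $\beta$ and $\Lambda_0$, we can calculate the ML estimate of $\theta$. Given the estimate of $\theta$, we can estimate $\beta$ by an EM algorithm. The estimates of $\theta$ and $\beta$ thus depend on each other, each serving as both an input to and a result of the other. We therefore update both parameters with an iterative process until they converge.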
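For the gamma frailty with mean 1 and variance $\theta$ that is used later in the evaluation experiment, the integral over the frailty in the marginal likelihood has a closed form. The following Python sketch evaluates the resulting marginal log-likelihood for fixed $\beta$ and $\Lambda_0$. The data layout (a list of clusters, each a list of `(delta, log_hazard, cum_hazard)` tuples already evaluated at the current $\beta$ and $\Lambda_0$) and the function name are assumptions made for illustration.

```python
import math


def gamma_frailty_marginal_loglik(theta, clusters):
    """Marginal log-likelihood with gamma frailty (mean 1, variance theta).

    clusters: list of clusters; each cluster is a list of tuples
    (delta, log_hazard, cum_hazard), where log_hazard is the log of the
    frailty-free hazard and cum_hazard the frailty-free cumulative hazard,
    both evaluated at the observed time.
    """
    k = 1.0 / theta  # gamma shape and rate, so that the mean is 1
    total = 0.0
    for cluster in clusters:
        n_failures = sum(delta for delta, _, _ in cluster)
        cum_sum = sum(cum for _, _, cum in cluster)
        # Terms that do not involve the frailty.
        total += sum(delta * log_h for delta, log_h, _ in cluster)
        # Closed form of the integral over the gamma-distributed frailty.
        total += (k * math.log(k) - math.lgamma(k)
                  + math.lgamma(k + n_failures)
                  - (k + n_failures) * math.log(k + cum_sum))
    return total
```

For a single censored observation with cumulative hazard $A$, the expression reduces to $-(1/\theta)\log(1 + \theta A)$, the logarithm of the familiar gamma-frailty marginal survival function. Maximizing this function over $\theta$ (for example with a one-dimensional search) corresponds to the ML step of the hybrid procedure.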
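The parameter $\beta$ is estimated with the EM algorithm. We start the EM algorithm with the full likelihood function, which treats the frailty terms as if they were observed and is written as

$$L_{\mathrm{full}} = \prod_i \prod_j \left[\omega_i \lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right) e^{-X_{ij}\beta}\right]^{\delta_{ij}} \exp\left(-\omega_i \Lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right)\right) \prod_i f(\omega_i, \theta). \tag{11}$$

The logarithmic form of $L_{\mathrm{full}}$ is denoted as $l_{\mathrm{full}}$, and the corresponding full log-likelihood is

$$l_{\mathrm{full}} = l_1 + l_2, \tag{12}$$

where, with $D_i = \sum_j \delta_{ij}$,

$$l_1 = \sum_i \left[D_i \log \omega_i + \log f(\omega_i, \theta)\right], \tag{13}$$

$$l_2 = \sum_i \sum_j \left\{\delta_{ij} \left[\log \lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right) - X_{ij}\beta\right] - \omega_i \Lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right)\right\}. \tag{14}$$

The E-step calculates the expectation of the full log-likelihood function with respect to $\omega_i$, in order to eliminate its effect on the likelihood function. The corresponding equations for the expectation calculation are

$$E(l_1) = \sum_i \left[D_i E(\log \omega_i) + E(\log f(\omega_i, \theta))\right], \tag{15}$$

$$E(l_2) = \sum_i \sum_j \left\{\delta_{ij} \left[\log \lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right) - X_{ij}\beta\right] - E(\omega_i) \Lambda_0\left(t_{ij} e^{-X_{ij}\beta}\right)\right\}. \tag{16}$$

From (15) and (16), we observe that $E(l_1)$ involves the parameter $\theta$ but not $\beta$, whereas $E(l_2)$ contains $\beta$. So we estimate $\beta$ only by analyzing (16).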
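For a gamma frailty with mean 1 and variance $\theta$, the expectation of the frailty needed in the E-step has a simple closed form, because the posterior distribution of the frailty given the data of its cluster is again a gamma distribution. The sketch below assumes this gamma choice; the function name and argument names are illustrative.

```python
def posterior_frailty_mean(theta, n_failures, cum_hazard_sum):
    """Posterior mean of a gamma frailty (prior mean 1, prior variance theta).

    n_failures: number of observed failures in the cluster.
    cum_hazard_sum: sum over the cluster of the frailty-free cumulative
    hazards evaluated at the observed times.
    The posterior is gamma with shape 1/theta + n_failures and
    rate 1/theta + cum_hazard_sum, so its mean is shape / rate.
    """
    k = 1.0 / theta
    return (k + n_failures) / (k + cum_hazard_sum)
```

A cluster with more failures than its accumulated hazard predicts receives a posterior mean above 1, and a cluster with fewer failures receives a value below 1.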
The M-step maximizes $E(l_2)$ and calculates the ML estimate of $\beta$. Because both $\lambda_0(t_{ij} e^{-X_{ij}\beta})$ and $\Lambda_0(t_{ij} e^{-X_{ij}\beta})$ in $E(l_2)$ are unknown, we cannot handle $E(l_2)$ directly. With kernel smoothing processes [24], we convert the semiparametric form of (16) into a parametric form. The resulting smoothed likelihood function (17) is written in terms of the residuals $\varepsilon_{ij} = \log t_{ij} - X_{ij}\beta$, the kernel function $K(\cdot)$, and the window width of the kernel function. By maximizing this smoothed likelihood function, we obtain the ML estimate of $\beta$.
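To illustrate the kernel smoothing idea in isolation, the following Python sketch computes the residuals $\log t - X\beta$ and a plain Gaussian-kernel density estimate of them. It is a generic kernel density estimator under stated assumptions (Gaussian kernel, no censoring, no frailty weights) and is not the smoothed likelihood (17) itself; the function names and the `width` argument are illustrative.

```python
import math


def residuals(times, covariates, beta):
    """Residuals log(t) - x.beta for each observation."""
    return [math.log(t) - sum(x * b for x, b in zip(xs, beta))
            for t, xs in zip(times, covariates)]


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(point, sample, width):
    """Kernel density estimate at `point` with window width `width`."""
    return sum(gaussian_kernel((point - s) / width) for s in sample) / (len(sample) * width)
```

The window width controls the trade-off between smoothness and fidelity: a larger width gives a smoother estimate, and a smaller width follows the residuals more closely.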
Based on the estimate $\hat{\beta}$, we can calculate the failure density function, the reliability function, and the corresponding cumulative failure function with (18), (19), and (20), respectively. With the calculated estimates $\hat{\beta}$ and $\hat{\Lambda}_0$, we calculate the ML estimate $\hat{\theta}$ by maximizing the marginal log-likelihood in (10). Then, with the estimated value $\hat{\theta}$, we follow the EM process to calculate $\hat{\beta}$ and $\hat{\Lambda}_0$ again. We repeat this iterative process until the parameters converge. The process is summarized as the following estimation procedure:
Step 1: obtain the initial estimate $\hat{\beta}$ from the independent AFT model and estimate the cumulative failure function $\hat{\Lambda}_0(t_{ij} e^{-X_{ij}\hat{\beta}})$;
Step 2 (ML estimation): given the current estimates $\hat{\beta}$ and $\hat{\Lambda}_0$, update the estimate $\hat{\theta}$ by maximizing the marginal log-likelihood in (10);
Step 3: prepare the full log-likelihood function for the EM algorithm;
Step 4 (E-step): calculate the expectation of the full log-likelihood function with respect to $\omega_i$ in order to eliminate its effect on the likelihood function;
Step 5 (M-step): given the current estimate $\hat{\theta}$ (from Step 2), update the estimate $\hat{\beta}$ by maximizing the partial log-likelihood function in (17) and calculate the corresponding cumulative failure function $\hat{\Lambda}_0$ with (20);
Step 6: repeat Steps 2 to 5 until both parameter estimates $\hat{\beta}$ and $\hat{\theta}$ converge.
This iterative ML-EM procedure is illustrated in Figure 2.
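The control flow of the iterative procedure can be sketched in Python as follows. Only the alternation and the convergence check are shown; the two update functions stand for the ML step and the combined E- and M-steps, and they are hypothetical placeholders that a user would have to supply. The tolerance and iteration limit are likewise assumed values.

```python
def hybrid_ml_em(beta0, theta0, update_theta, update_beta, tol=1e-6, max_iter=200):
    """Alternate the ML update of theta and the EM update of beta.

    update_theta(beta, theta) -> new theta (ML estimation step).
    update_beta(beta, theta)  -> new beta as a sequence (E-step and M-step).
    Returns (beta, theta, iterations) once the largest change is below tol.
    """
    beta, theta = list(beta0), theta0
    for iteration in range(1, max_iter + 1):
        new_theta = update_theta(beta, theta)          # ML estimation
        new_beta = list(update_beta(beta, new_theta))  # E-step and M-step
        change = max([abs(new_theta - theta)]
                     + [abs(a - b) for a, b in zip(new_beta, beta)])
        beta, theta = new_beta, new_theta
        if change < tol:
            return beta, theta, iteration
    raise RuntimeError("hybrid ML-EM iteration did not converge")


if __name__ == "__main__":
    # Toy contraction mappings standing in for the real update steps.
    result = hybrid_ml_em([0.0], 0.0,
                          update_theta=lambda b, th: 0.5 * th + 0.5,
                          update_beta=lambda b, th: [0.5 * b[0] + th])
    print(result)
```

Note that the update of $\beta$ uses the freshly updated $\theta$, mirroring the order of the ML and EM steps in the procedure.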

Evaluation Experiment
We generated the data on the basis of a typical degradation model to simulate the industrial processes of systems subject to imperfect PM activities. With the simulated data for training, we calculated the parameter estimates of the generalized AFT frailty model. In order to test the effectiveness of the model, we calculated the RUL prediction with the testing data and compared the results with those of the EPHM. The comparison indicates that the generalized AFT frailty model is effective for systems subject to imperfect PM activities.

Figure 2 (flowchart of the iterative ML-EM procedure). Boxes: marginal log-likelihood function, equation (10); full likelihood function, equation (11); full log-likelihood function, equations (12), (13), and (14); expectation of partial full log-likelihood function, equations (15) and (16); expectation handled with kernel smoothing process, equation (17); equations (18), (19), and (20).

Process-Data Generation.
The functional form of the typical degradation process is given by (21), in which the degradation indicator is a function of time $t$ and the two stochastic parameters are assumed to follow normal distributions, namely, $N(\mu_0, \sigma_0^2)$ and $N(\mu_1, \sigma_1^2)$. The error term $\varepsilon(t)$ is included in the model to capture system and environmental noise, signal transients, measurement errors, and variations due to the monitoring system. $\varepsilon(t)$ is assumed to follow a Brownian motion process, $\varepsilon(t) \sim N(0, \sigma^2 t)$. Equation (21) is used to generate the degradation indicator over time, and a higher magnitude of the degradation indicator corresponds to a worse system state.
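A degradation path with normally distributed parameters and Brownian-motion noise can be simulated as in the following Python sketch. The linear form (random intercept plus random slope times $t$ plus noise) is an assumption made purely for illustration and may differ from the functional form used in (21); the function names, the time step, and all numerical values are hypothetical.

```python
import random


def simulate_degradation(n_steps, dt, mu0, sigma0, mu1, sigma1, sigma, seed=None):
    """Simulate one path: intercept + slope * t + Brownian noise.

    intercept ~ N(mu0, sigma0^2) and slope ~ N(mu1, sigma1^2); the noise
    starts at zero and has independent increments with variance sigma^2 * dt.
    Returns a list of (time, indicator) pairs of length n_steps + 1.
    """
    rng = random.Random(seed)
    intercept = rng.gauss(mu0, sigma0)
    slope = rng.gauss(mu1, sigma1)
    noise = 0.0
    path = []
    for step in range(n_steps + 1):
        t = step * dt
        path.append((t, intercept + slope * t + noise))
        noise += rng.gauss(0.0, sigma * dt ** 0.5)
    return path


def first_passage_time(path, threshold):
    """First recorded time at which the indicator reaches the threshold, or None."""
    for t, value in path:
        if value >= threshold:
            return t
    return None


if __name__ == "__main__":
    path = simulate_degradation(400, 1.0, 0.0, 5.0, 2.0, 0.5, 3.0, seed=1)
    print(first_passage_time(path, 500.0))
```

The first passage of the indicator over the failure threshold gives the simulated failure time of the path; a path that never reaches the threshold within the simulated horizon corresponds to a censored observation.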
On the basis of (21), the degradation process with imperfect PM activities was simulated. The simulation mechanism and experiment design are introduced by You and Meng [17]. It was demonstrated that the simulated degradation processes were close to real data in practice. In our simulation experiment, we also considered the single degradation indicator as a covariate for consistency with the simulation work of You and Meng [17]. The PM duration was assumed to be negligible (zero), and each PM activity occurs at a given time. After PM, the degradation process was simulated as a new process plus a residual degradation due to the PM activity with a random imperfect effect. The degradation process of a sample after the first PM and, more generally, the degradation signal after the $k$th PM are expressed in terms of a coefficient that quantifies the amount of residual degradation and the model parameters of the sample after it receives its $k$th PM activity. The typical method for estimating the degradation model parameters after each PM is a Bayesian update, which combines the prior distribution parameters and the real-time information to obtain the posterior distribution parameters [25].
We simulated the degradation processes of 400 systems, which are subject to a maximum of three imperfect PM activities. Some systems fail before they receive any PM. Some systems fail after the first, second, or third PM. Some systems operate in a normal state up to the end of the simulation time. The updated degradation model parameters of one sample are shown in Table 1.
Twenty degradation processes are illustrated in Figure 3. The horizontal axis represents time. The vertical axis represents the magnitude of the degradation indicator.
The three PM activities take place at times 100, 190, and 270, respectively. Each section represents the degradation process before the corresponding PM. When the degradation indicator exceeds the threshold (set to 500), the process is considered to have failed. Among the twenty illustrated degradation processes, four processes fail before receiving any PM activity, three processes fail after receiving the first PM activity, and eleven processes fail after receiving the second PM activity.
Next, we demonstrate that the simulated data are appropriate for imperfect PM. We take the local lifetime as the time interval between the most recent PM moment and the time of failure in each section. Figure 4 marks the times of failure in the three sections. The corresponding local lifetimes are the failure time itself in the first section (which starts at time 0), the failure time minus the first PM moment in the second section, and the failure time minus the second PM moment in the third section. For all of the 400 simulated processes, the number of failed processes and the average local lifetime in each section are shown in Table 2.
From Table 2, we observe that the average local lifetime decreases as the number of imperfect PM activities increases. These simulated data are therefore appropriate for processes with imperfect PM activities, which restore the system to better but not "as good as new" states.
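The local lifetime and its per-section average can be computed as in the following Python sketch. The PM moments 100, 190, and 270 are those of the experiment; the function names and the failure times in the usage example are hypothetical and are not the simulated data of the paper.

```python
def local_lifetime(failure_time, pm_times):
    """Return (section, local lifetime) for one failed process.

    section is the number of PM activities received before the failure;
    the local lifetime is measured from the most recent PM (or from 0).
    """
    last_pm = 0.0
    section = 0
    for k, pm_time in enumerate(pm_times, start=1):
        if failure_time > pm_time:
            last_pm, section = pm_time, k
    return section, failure_time - last_pm


def average_local_lifetimes(failure_times, pm_times):
    """Map each section to (number of failures, average local lifetime)."""
    sums, counts = {}, {}
    for failure_time in failure_times:
        section, lifetime = local_lifetime(failure_time, pm_times)
        sums[section] = sums.get(section, 0.0) + lifetime
        counts[section] = counts.get(section, 0) + 1
    return {s: (counts[s], sums[s] / counts[s]) for s in sorted(counts)}


if __name__ == "__main__":
    pm_moments = [100, 190, 270]
    print(average_local_lifetimes([80, 60, 150, 230], pm_moments))
```

A table of the same shape as Table 2 is obtained by applying `average_local_lifetimes` to the failure times of all failed processes.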

RUL Prediction and Results Comparison. We used RUL prediction as an example to test the effectiveness of the generalized AFT frailty model. Among the 400 simulated degradation processes, we used 300 samples for training and 100 samples for testing. The parameters of the model were estimated with the training samples, and the RUL predictions were calculated with the testing samples. During the parameter estimation procedure, we used the typical Weibull distribution as the baseline failure function. We chose the gamma distribution with mean 1 and variance $\theta$ as the frailty distribution for the following two reasons. Firstly, the gamma distribution is the most popular choice for the frailty distribution [12,18,23], and the posterior distribution of the gamma frailty is still a gamma distribution. Secondly, the gamma distribution with mean 1 and variance $\theta$ is appropriate for the characterization of the frailty terms in our model. Frailty acts multiplicatively on the failure rate function and represents the change in the failure rate after PM relative to that before PM. This multiplication factor is theoretically greater than or equal to 1 but is sometimes smaller than 1 in reality [17]. With the estimated parameters $\hat{\theta}$ and $\hat{\beta}$, we calculated the RUL prediction of the 100 testing samples following the typical procedure [17]. The minimum actual local lifetimes in the three sections were 43.5, 39.6, and 37.3, respectively. Theoretically, all the samples whose RUL is less than the minimum actual local lifetime could be used for further analysis. However, because the prediction of being close to failure is the most essential information in a CBM program, we collected only the samples whose actual RUL is less than 10. Part of the statistical analysis of the predicted RUL based on the generalized AFT frailty model (G-AFT-F for short) and the EPHM with respect to the actual RUL is summarized in Table 3.
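One common way to turn a fitted hazard model into an RUL prediction is the mean residual life, that is, the integral of the conditional survival function beyond the current age. The Python sketch below computes it numerically for a Weibull baseline with a frailty and a constant acceleration factor $e^{-X\beta}$. It is an illustration under these assumptions and is not necessarily the procedure of [17], where the covariate is a time-varying degradation indicator; the function names, step size, and tail cutoff are assumed values.

```python
import math


def conditional_survival(u, t, accel, omega, shape, scale):
    """P(T > u | T > t) for cumulative hazard omega * ((s * accel) / scale) ** shape."""
    def cum_hazard(s):
        return omega * ((s * accel) / scale) ** shape
    return math.exp(-(cum_hazard(u) - cum_hazard(t)))


def mean_residual_life(t, accel, omega, shape, scale,
                       step=0.1, tail=1e-10, max_steps=1_000_000):
    """Trapezoidal integral of the conditional survival function from t onward.

    Integration stops once the conditional survival drops below `tail`.
    """
    total = 0.0
    previous = 1.0
    u = t
    for _ in range(max_steps):
        u += step
        current = conditional_survival(u, t, accel, omega, shape, scale)
        total += 0.5 * (previous + current) * step
        previous = current
        if current < tail:
            break
    return total
```

With shape 1 the baseline is exponential and the mean residual life does not depend on the current age, which gives a simple sanity check; with shape greater than 1 the hazard increases with age and the mean residual life decreases accordingly.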
The statistical analysis for the samples that fail after receiving the first PM activity and the second PM activity is shown in Figure 5. Mean values of the statistical analysis results are regarded as predicted RUL values, so only mean values are illustrated.The horizontal axis represents the actual RUL, and the vertical axis represents the mean values of the predicted RUL.
From Table 3 and Figure 5, we can see that there is no obvious performance difference between the results of the two models. The superior performance of the EPHM over other classical models has been shown in [17]. Like the EPHM, the generalized AFT frailty model can achieve reasonably accurate and reliable RUL prediction. Thus, the generalized AFT frailty model is appropriate for modeling the failure likelihood of systems subject to imperfect PM activities.

Conclusions and Future Work
Imperfect PM activities are very common in industrial systems. This paper has introduced the methods of survival analysis into the CBM field and used these statistical tools to solve practical industrial problems. The generalized AFT frailty model has been investigated to model the failure likelihood of systems subject to imperfect PM activities. On the basis of the ML estimation and the EM method, the hybrid ML-EM algorithm for parameter estimation has been analyzed. In the evaluation experiment, data were generated from a typical degradation model. RUL prediction has been taken as a calculation case, and the results of the proposed model have been compared with those of the existing model. The comparison demonstrates the effectiveness of the generalized AFT frailty model.
The data in the experiment are generated from typical degradation processes and are consistent with practical industrial cases, which is why we performed the evaluation only with simulation data. However, in future research, it will be necessary to evaluate the performance of the proposed model using real data. In addition, it would be interesting to introduce more methods of survival analysis into reliability-related fields and investigate their applicability.

Figure 1: Hybrid influence of imperfect PM on both failure rate and effective age.

Figure 3: Degradation processes with a maximum of three imperfect PM activities.

Figure 5: Statistical analysis of the predicted RUL using G-AFT-F and EPHM for the testing samples that fail after the first and second PM.
First, we introduce the Cox PHM, the AFT model, the generalized AFT model, and the PHM frailty model in sequence. Then, we propose the generalized AFT frailty model.

Table 1: Updated degradation model parameters.

Table 2: Average local lifetime in each section.

Table 3: Statistical analysis of the predicted RUL using G-AFT-F and EPHM for the testing samples that fail before receiving any PM activity.