Applying Hierarchical Bayesian Neural Network in Failure Time Prediction

With rapid technological development and improvement, predicting product failure time has become an even harder task because only a few failures are recorded in product life tests. Classical statistical models rely on asymptotic theory and cannot guarantee the finite-sample properties of the estimator. To solve this problem, we apply the hierarchical Bayesian neural network (HBNN) approach to predict the failure time and use the Gibbs sampler of Markov chain Monte Carlo (MCMC) to estimate the model parameters. In the proposed method, the hierarchical structure is specified to study the heterogeneity among products. Engineers can use the heterogeneity estimates to identify the causes of quality differences and further enhance product quality. To demonstrate the effectiveness of the proposed model, its prediction performance is evaluated using multiple performance measurement criteria. A sensitivity analysis is also conducted using different numbers of hidden nodes and training sample sizes. The results show that HBNN can provide not only the predictive distribution but also the heterogeneous parameter estimates for each path.


Introduction
In this high-technology era, the operation of society depends heavily on various machinery and equipment. Once machinery or equipment breaks down, enormous trouble and economic cost are imposed on the entire society. To enhance product reliability, methodologies for assessing product reliability have received much discussion in both academia and industry. Among several mature techniques, degradation testing provides an efficient way to assess reliability when product quality is associated with a time-varying degradation process. Typically, degradation measures can provide more reliability information than time-to-failure data, particularly in modeling the failure-causing mechanism, when few or no failures occur.
[...] depends on parameters whose values must be determined with a training set of inputs and outputs. Network architecture is the organization of nodes and the types of connections permitted. The nodes are arranged in a series of layers with connections between nodes in different layers, but not between nodes in the same layer.
Several researchers have also integrated neural network algorithms with Bayesian theory, an approach known as the Bayesian neural network, for prediction. For example, Neal [12] applied hybrid Markov chain Monte Carlo (MCMC) numerical integration techniques to implement Bayesian procedures. Müller and Rios Insua [13] proposed an MCMC scheme based on a static or dynamic number of hidden nodes. In their subsequent paper, they extended their results by relaxing the constraint on the number of hidden nodes [13]. Also, Holmes and Mallick [14] used Bayesian neural network modeling in the regression context.
In this paper, we conduct a hierarchical Bayesian neural network analysis with an MCMC estimation procedure for the failure time prediction problem. Here, hierarchy means that the coefficients in our HBNN model are specified by random-effect distributions. We use this hierarchical structure to determine whether heterogeneity exists among paths. The proposed HBNN model not only provides better failure time prediction by incorporating the heterogeneity of components and the autocorrelated structure of the error term but also provides a predictive distribution for the target value. Unlike previous research, the proposed HBNN model offers full information on the parameter estimates and the covariance structure. Engineers can use the heterogeneity estimates to identify the causes of quality differences and further enhance product quality.
The fatigue crack growth data from Bogdanoff and Kozin [15] are used to illustrate the proposed model. To demonstrate its effectiveness, the prediction performance of the proposed model is evaluated using multiple performance measurement criteria. A sensitivity evaluation is also conducted using different numbers of hidden nodes and training sample sizes. The results show that HBNN can provide not only the predictive distribution but also the heterogeneous parameter estimates for each path.
The rest of this paper is organized as follows. Section 2 introduces the proposed HBNN model for failure time prediction. In Section 3, the fatigue crack growth data from Bogdanoff and Kozin [15] are described, and the model estimation procedure is provided. Failure time prediction and sensitivity analysis are demonstrated in Section 4. Concluding remarks are offered in Section 5.

HBNN Model for Failure Time Prediction
To model failure time, we adapted the growth-curve equation used by Liski and Nummi [16] as follows:
$$t_{i,j} = \beta_{i,0} + \sum_{m=1}^{M} \beta_{i,m}\,\Gamma_{i,m,j} + \varepsilon_{i,j},$$
where $y_{i,j}$ is the $j$th crack length of the $i$th path and $t_{i,j}$ is the observed cycle time of the $i$th path, with $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, n_i$. In addition, $\beta_{i,m}$ are the weights of the $i$th path attached to the hidden nodes $m$ ($m = 1, 2, \ldots, M$), $M$ is the number of hidden nodes, $\Gamma_{i,m,j}$ is the output of the $m$th hidden node when the $j$th crack length of the $i$th path is presented, $\gamma_{i,k=1,m}$ are the weights from the first input, $y_{i,j}$, to the hidden node $m$, $\psi$ is the activation function, and $\varepsilon_{i,j}$ is the error term. Typically, the choice of $M$ depends on the problem under consideration. We tested neural networks with different numbers of hidden nodes. In the present case, we set the number of hidden nodes $M$ equal to 3 because it gives the best prediction result.
According to the above equation, there are $N$ paths in total from a given population, and $n_i$ observations are available for path $i$ at fixed crack lengths $y_{i,1}, y_{i,2}, \ldots, y_{i,n_i}$ (i.e., the observations at lengths $y_{i,1}, y_{i,2}, \ldots, y_{i,n_i}$ are $t_{i,1}, t_{i,2}, \ldots, t_{i,n_i}$, respectively). Herein, we assume that the conditional distribution of $t_{i,j}$ given $y_{i,j}$ is normal,
$$t_{i,j} \mid y_{i,j} \sim N\Bigl(\beta_{i,0} + \sum_{m=1}^{M} \beta_{i,m}\,\Gamma_{i,m,j},\; \sigma^2\Bigr).$$
This means that each value of $y_{i,j}$ produces a random value of $t_{i,j}$ from a normal distribution with mean $\beta_{i,0} + \sum_{m=1}^{M} \beta_{i,m}\Gamma_{i,m,j}$ and variance $\sigma^2$.
Moreover, from the literature [17, 18], we understand that degradation signals are usually autocorrelated in nature. We also noticed that the values of the first-order autocorrelation $r_1$ of the residuals in Lu and Meeker [1] are not exactly equal to 2.0. Therefore, we suspected that the error term might be characterized as a first-order autoregressive process. Based on this finding, we propose a new parametric crack growth model with autocorrelated errors:
$$t_{i,j} = \beta_{i,0} + \sum_{m=1}^{M} \beta_{i,m}\,\Gamma_{i,m,j} + \varepsilon_{i,j}, \qquad \varepsilon_{i,j} = \rho_i\,\varepsilon_{i,j-1} + Z_{i,j},$$
where $\varepsilon_{i,j}$ is the error term, $\rho_i$ is the autocorrelation coefficient, and $Z_{i,j}$ is a normally distributed error, $Z_{i,j} \sim N(0, \sigma^2)$. Note that the elements $t_{i,1}, t_{i,2}, \ldots, t_{i,n_i}$ in (2.2) are independent given $\beta_{i,m}$, $\gamma_{i,k,m}$, $\sigma^2$, $\rho_i$, and $y_{i,j}$, where $k$ indexes the inputs. The function $\psi(\cdot)$ is referred to as an activation function in a neural network. Typically, the activation function is nonlinear. Some of the most common choices of $\psi(\cdot)$ are the logistic and the hyperbolic tangent functions.
To describe the heterogeneity from path to path, we characterize $\beta_i$ by a 4-variate normal distribution with mean vector $\beta$ and covariance matrix $V_\beta$ for $i = 1, 2, \ldots, N$, and $\gamma_{i,m}$ by a 3-variate normal distribution with mean vector $\gamma_m$ and covariance matrix $V_m$ for $m = 1, 2, 3$.
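As a minimal sketch of the model structure described above, the following Python code computes the hidden-node outputs and the mean cycle time for one path and then adds first-order autoregressive errors. Several choices are assumptions made for illustration only: the logistic activation, a bias term plus the single input ln(y) for each hidden node (simpler than the 3-variate weight vectors in the text), the name `gamma` for the hidden-node weights, the starting value of the error process, and all numeric values, which are not the estimates reported in this paper.

```python
import numpy as np

def logistic(x):
    """Logistic activation, one common choice for psi."""
    return 1.0 / (1.0 + np.exp(-x))

def mean_cycle_time(y, beta, gamma):
    """Mean of t_{i,j} for one path.

    y     : crack lengths, shape (n,)
    beta  : output weights (beta_0, ..., beta_M), shape (M + 1,)
    gamma : hidden weights, shape (M, 2); column 0 is an assumed bias,
            column 1 the weight on the assumed input ln(y)
    """
    x = np.log(y)
    # hidden has shape (M, n): one row of node outputs per hidden node
    hidden = logistic(gamma[:, 0][:, None] + gamma[:, 1][:, None] * x[None, :])
    return beta[0] + beta[1:] @ hidden

def simulate_path(y, beta, gamma, rho, sigma, rng):
    """Draw cycle times with AR(1) errors: e_j = rho * e_{j-1} + z_j."""
    mu = mean_cycle_time(y, beta, gamma)
    z = rng.normal(0.0, sigma, size=y.shape[0])
    e = np.zeros_like(z)
    e[0] = z[0]  # assumed start of the error process
    for j in range(1, len(z)):
        e[j] = rho * e[j - 1] + z[j]
    return mu + e

rng = np.random.default_rng(0)
y = np.linspace(9.0, 40.0, 50)          # hypothetical grid of crack lengths (mm)
beta = np.array([0.1, 0.5, 0.8, 1.2])   # hypothetical output weights, M = 3
gamma = np.array([[-8.0, 3.0], [-9.0, 3.0], [-10.0, 3.0]])  # hypothetical
t = simulate_path(y, beta, gamma, rho=0.5, sigma=0.01, rng=rng)
```

In the hierarchical setting, each path would receive its own `beta` and `gamma` drawn from the population-level normal distributions, which is how path-to-path heterogeneity enters the simulated curves.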
Equations (2.2)-(2.4) specify a general model for studying when the sensitivity of observed cycle time to crack length may increase. The heterogeneity among paths is captured by the parameters $\beta_i$ and $\gamma_{i,k,m}$ and by the specification of the covariance matrices $V_\beta$ and $V_m$.
The likelihood function for the data follows from the above setting. To reduce the computational burden of posterior calculation and exploration, we use conjugate priors on the parameters $\beta_i$, $\beta$, $V_\beta$, $\gamma_{i,m}$, $\gamma_m$, $V_m$, $\sigma^2$, and $\rho_i$. Typically, the selection of priors is problem-specific. Some have even criticized the Bayesian approach as relying on "subjective" prior information. However, the basis of prior information can also be "objective" or data-based; the power prior developed by Ibrahim et al. [19] is one example. In most empirical cases, a diffuse prior for the parameters is a reasonable default choice.
By applying Bayes' theorem to the sample information and the prior distribution of each parameter, the posterior distribution of each parameter can be derived. The posterior distributions and the details of the estimation procedure are described in Carlin and Louis [20]. The posterior distributions of the estimated parameters are summarized as the full conditional probability formulas shown in the Appendix.
In addition to the posterior distribution of the estimated parameters, the predictive distribution of the unobserved cycle time, $t_{\mathrm{pred}}$, given the observed cycle time, $t$, is one of our main objectives. The predictive distribution is analytically intractable because it requires high-dimensional numerical integration. However, the Markov chain Monte Carlo (MCMC) method provides an alternative, whereby we sample from the posterior directly and obtain sample estimates of the quantities of interest, thereby performing the integration implicitly [21]. In other words, Bayesian analysis of hierarchical models has been made feasible by the development of MCMC methods for generating samples from the full conditionals of the parameters given the remaining parameters and the data.
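The implicit integration can be written out as follows. Here $\theta$ is shorthand, introduced only for this illustration, for all model parameters, and $\theta^{(s)}$, $s = 1, \ldots, S$, denote the posterior draws retained after burn-in:

```latex
p(t_{\mathrm{pred}} \mid t)
  = \int p(t_{\mathrm{pred}} \mid \theta, t)\, p(\theta \mid t)\, d\theta
  \approx \frac{1}{S} \sum_{s=1}^{S} p\bigl(t_{\mathrm{pred}} \mid \theta^{(s)}, t\bigr)
```

In practice, one draw of $t_{\mathrm{pred}}$ can be simulated for each $\theta^{(s)}$, and the collection of simulated values approximates the predictive distribution.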
Among these MCMC methods, the Gibbs sampling algorithm is one of the best-known simulation-based estimation procedures [22] and is used herein to estimate our parameters. It has been shown that, under mild conditions, the Markov chain converges to a stationary distribution [23]. Beginning with the conditional probability distributions in (5), the Gibbs and Metropolis-Hastings sampling procedure uses recursive simulation to generate random draws. Details of the conditional distributions for the full information model are available upon request. The values of these random draws are then used as the conditioning values in each conditional probability distribution, and new random draws are generated in the next iteration. After many iterations, the chain converges, and the resulting random draws are used to estimate the parameters.
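To make the recursive simulation concrete, the following Python sketch runs a Gibbs sampler on a deliberately simplified stand-in model, not on the HBNN itself: observations y[i, j] ~ N(theta_i, sigma2) with path-level effects theta_i ~ N(mu, tau2). The known variances, the flat prior on mu, the function name, and the simulated data are all assumptions made for illustration; the full conditionals of the actual model are those given in the Appendix.

```python
import numpy as np

def gibbs_hierarchical_normal(y, sigma2, tau2, n_iter=2000, seed=0):
    """Gibbs sampler for y[i, j] ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2).

    sigma2 and tau2 are treated as known and mu has a flat prior; these
    simplifications are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    n_paths, n_obs = y.shape
    ybar = y.mean(axis=1)
    mu = ybar.mean()                        # starting value
    theta_draws = np.empty((n_iter, n_paths))
    mu_draws = np.empty(n_iter)
    prec = n_obs / sigma2 + 1.0 / tau2      # conditional precision of theta_i
    for s in range(n_iter):
        # theta_i | mu, y: precision-weighted mix of path mean and population mean
        cond_mean = (n_obs * ybar / sigma2 + mu / tau2) / prec
        theta = rng.normal(cond_mean, np.sqrt(1.0 / prec))
        # mu | theta: normal centered at the average of the path-level effects
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / n_paths))
        theta_draws[s], mu_draws[s] = theta, mu
    return theta_draws, mu_draws

rng = np.random.default_rng(1)
true_theta = rng.normal(5.0, 1.0, size=30)            # 30 hypothetical paths
y = true_theta[:, None] + rng.normal(0.0, 0.5, size=(30, 20))
theta_draws, mu_draws = gibbs_hierarchical_normal(y, sigma2=0.25, tau2=1.0)
theta_hat = theta_draws[1000:].mean(axis=0)           # discard first half as burn-in
```

Each iteration conditions on the most recent draw of the other block, which is the same alternating pattern the text describes; parameters without a closed-form full conditional would instead need a Metropolis-Hastings step.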

Illustrative Example
We use the fatigue crack growth data from Bogdanoff and Kozin [15] as an illustrative example to demonstrate the modeling procedure and the effectiveness of the proposed hierarchical Bayesian neural network approach. Figure 1 is a plot of all 30 sample degradation paths. It is obvious that variability among paths exists. Several factors, such as different operating conditions and different material properties, could cause this variability. Therefore, it is a considerable challenge to construct a model that captures the statistical properties of the degradation paths and predicts failure time.
In this data set, there are 30 sample paths in total, and each sample path has 164 paired observations of cycle time and crack length. The cycle time is observed at fixed crack lengths. We define a path as "failed" as soon as its crack length reaches a particular critical level of degradation (i.e., $D_f = 49$ mm) and assume that the experiment was terminated at 40 mm. In other words, based on the degradation measurements from 9 mm to 40 mm, we model the degradation process and use the proposed model to predict the failure time $t_{i,j}(D_f)$ at the assumed critical level for each degradation path (i.e., crack length 49 mm). As mentioned, because the fatigue experiment was conducted on paths with fixed crack lengths, we are interested in the predicted failure time for a path when a specific crack length (i.e., 49 mm) is reached.

Model Estimation
Because the coefficients $\beta_{i,m}$ and $\gamma_{i,k,m}$ used to describe the degradation process are high-dimensional, it is difficult to integrate out these parameters to obtain the distribution of failure time, especially when complex interactions among random parameters are present. To solve this problem, estimation was carried out with Markov chain Monte Carlo methods implemented in R. The chain was run for 20,000 iterations, and the last 10,000 iterations were used to obtain parameter estimates. Convergence was assessed by starting the chain from multiple points and inspecting time-series plots of the model parameters. Posterior draws from the full conditionals are used to compute the means and standard deviations of the parameter estimates.
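The post-processing step can be sketched as follows, assuming the sampler output is stored as an array with one row per iteration. The function name `summarize_chain` and the stand-in draws are hypothetical; the estimation in this paper was carried out in R, so this Python fragment only illustrates the arithmetic of discarding the first 10,000 of 20,000 iterations.

```python
import numpy as np

def summarize_chain(draws, burn_in):
    """Posterior mean and standard deviation after discarding burn-in draws.

    draws : array of shape (n_iter, n_params), one row per MCMC iteration
    """
    kept = np.asarray(draws)[burn_in:]
    return kept.mean(axis=0), kept.std(axis=0, ddof=1)

# Stand-in draws: independent normals in place of real sampler output.
rng = np.random.default_rng(0)
draws = rng.normal(loc=[1.0, -2.0], scale=[0.1, 0.5], size=(20_000, 2))
post_mean, post_sd = summarize_chain(draws, burn_in=10_000)
```

Applying the same function to chains started from different points and comparing the summaries gives a simple numerical companion to the visual inspection of time-series plots.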
Table 1 reports the posterior means and standard deviations of the parameters of the proposed model. It shows that the values of $\Gamma_{m=1,j}$, $\Gamma_{m=2,j}$, and $\Gamma_{m=3,j}$ become steady when $\ln y_{i,j}$ becomes large. The covariance matrix of the heterogeneity distribution is reported in Table 2. It shows that the posterior means of the diagonal elements of the matrix $V_\beta$ range from 0.0002 to 0.0004. Compared with the outputs of the hidden nodes $\Gamma_{m,j}$ (which range from 0 to 1), these diagonal elements are not negligible, which indicates that heterogeneity across paths does exist. From these findings, we conclude that the proposed HBNN model can detect heterogeneity across paths, even though, for this particular data set, we cannot explain the cause of the heterogeneity because of the limited information in the data.

Failure Time Prediction
The model estimation shown in Section 3 allows us to predict the failure time $t_{i,j}$ at the assumed critical level of degradation (i.e., $D_f = 49$ mm) based on the degradation measurements from 9 mm to 40 mm. To demonstrate the effectiveness of the proposed hierarchical Bayesian neural network model, the prediction performance is evaluated using the following performance measures: the root mean square error (RMSE), mean absolute difference (MAD), mean absolute percentage error (MAPE), and root mean square percentage error (RMSPE). The definitions of these criteria are summarized in Table 3. RMSE, MAD, MAPE, and RMSPE measure the deviation between actual and predicted failure times; the smaller the deviation, the better the accuracy. The failure time prediction results of the proposed model are summarized in Figure 2 and Table 4. Table 4 shows that the RMSE, MAD, MAPE, and RMSPE of the HBNN model are 0.37340, 0.27121, 1.058%, and 1.440%, respectively. These values are very small, which indicates that the deviation between the actual failure times and those predicted by the proposed model is small. Moreover, the proposed HBNN can provide not only posterior estimates of the spatial covariance but also a natural way to incorporate model uncertainty in statistical inference.
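For reference, the four criteria can be computed as in the following Python sketch. It assumes the standard definitions of these measures, with the percentage measures taken relative to the actual values and multiplied by 100; the function name `prediction_errors` and the example numbers are hypothetical and are not data from this study.

```python
import numpy as np

def prediction_errors(actual, predicted):
    """RMSE, MAD, MAPE (%), and RMSPE (%) under their standard definitions."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    err = a - p          # absolute-scale deviations
    rel = err / a        # deviations relative to the actual values
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAD": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(rel))),
        "RMSPE": float(100.0 * np.sqrt(np.mean(rel ** 2))),
    }

actual = [20.0, 25.0, 40.0]      # hypothetical failure times
predicted = [21.0, 24.0, 38.0]   # hypothetical predictions
scores = prediction_errors(actual, predicted)
```

RMSE and RMSPE penalize large individual errors more heavily than MAD and MAPE, so reporting all four gives a view of both typical and worst-case deviations.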

Sensitivity Analysis
To evaluate the sensitivity of the proposed method, the performance of the HBNN model was tested using different numbers of hidden nodes and training sample sizes. In this section, we set the number of hidden nodes to 3, 4, 5, and 6, and we consider three sizes of training data set (observations collected from 9 mm to 30 mm, 9 mm to 35 mm, and 9 mm to 40 mm, respectively). The prediction results of the HBNN model are summarized in Table 5 in terms of RMSE, MAD, MAPE, and RMSPE. According to the table, the HBNN model has lower RMSE, MAD, MAPE, and RMSPE with observations collected from 9 mm to 40 mm than with observations collected from 9 mm to 30 mm. This is because the 9 mm to 30 mm data set is smaller than the 9 mm to 40 mm data set. However, the RMSE, MAD, MAPE, and RMSPE are almost the same with 3, 4, 5, or 6 hidden nodes. This result suggests that the predictions differ little when the number of hidden nodes varies.

Conclusion
In this paper, we applied the HBNN approach to model the degradation process and to predict failure time. In developing the HBNN model, MCMC was used to estimate the parameters. Since the failure times predicted by the HBNN model represent the actual data well, the time-to-failure distribution can also be obtained. To demonstrate the effectiveness of the proposed model, its prediction performance was evaluated using multiple performance measurement criteria. A sensitivity evaluation was also conducted using different numbers of hidden nodes and training sample sizes. The results reveal that HBNN can provide not only the predictive distribution but also accurate parameter estimates. By specifying random effects on the coefficients $\beta_i$ and $\gamma_{i,m}$ in the HBNN model, the heterogeneity across individual products can be studied. Based on this heterogeneity, engineers can further investigate the manufacturing process to find the causes of the differences.
In future research, statistical inferences about failure time based on degradation measurements, such as failure rates and tolerance limits, can be further evaluated given the predicted failure time. In addition, for some highly reliable products, it is not easy to obtain failure data even under elevated stresses. In such cases, accelerated degradation testing (ADT) can be an alternative that provides an efficient channel for failure time prediction. The proposed HBNN approach can also be applied to describe a stress-related degradation process by including the stress factors as covariates in the model.

Figure 1: Thirty paths of fatigue crack growth data from Bogdanoff and Kozin [15].

Figure 2: Prediction of failure time at 49 mm when data collection is stopped at 40 mm.

Table 1: Estimated mean and STD for posterior parameters.

Table 2: Covariance matrix of the distribution of heterogeneity.

Table 3: Performance measures and their definitions.

Table 4: Summary of failure time prediction results by HBNN model.