MINIMIZING THE DISCREPANCY BETWEEN SIMULATED AND HISTORICAL FAILURES IN TURBINE ENGINES: A SIMULATION-BASED OPTIMIZATION METHOD (POSTPRINT)

The reliability modeling of a module in a turbine engine requires knowledge of its failure rate, which can be estimated by identifying statistical distributions describing the percentage of failure per component within the turbine module. The correct definition of the failure statistical behavior per component is highly dependent on the engineer skills and may present significant discrepancies with respect to the historical data. There is no formal methodology to approach this problem and a large number of labor hours are spent trying to reduce the discrepancy by manually adjusting the distribution’s parameters. This paper addresses this problem and provides a simulation-based optimization method for the minimization of the discrepancy between the simulated and the historical percentage of failures for turbine engine components. The proposed methodology optimizes the parameter values of the component’s failure statistical distributions within the component’s likelihood confidence bounds. A complete testing of the proposed method is performed on a turbine engine case study. The method can be considered as a decision-making tool for maintenance, repair, and overhaul companies and will potentially reduce the cost of labor associated to finding the appropriate value of the distribution parameters for each component/failure mode in the model and increase the accuracy in the prediction of the mean time to failures (MTTF).


Introduction
There are several failure modes in gas turbine engines.Typically, the failure rates of different components are recorded throughout the life of the engine.Upon obtaining the failure rates of such components, the first step is to conduct a statistical distribution fitting to the initial data per component or failure mode.The mean percentage of failures computed while considering the fitted statistical distribution may not be similar to the historical data due to the lack of quantity or quality of the data.Therefore, a common practice is to perform some adjustments in the distribution parameters based on human intelligence and the experience of the engineers.Currently, there is no formal methodology to approach this complex problem.
The reliability modeling of turbine engines is a complex stochastic system.The complexities that arise when randomness is embedded within a system make the analysis of these systems a difficult task.Unfortunately, epistemic uncertainty is a common and unavoidable characteristic among realworld systems.The development of simulation as means to evaluate stochastic systems enhances the ability to obtain performance measure estimates under several given conditions.Furthermore, these performance measures are much more accurate compared to the estimations of analytical techniques, which often make crude assumptions about the system's condition and operation.
Computer simulations are highly effective in answering evaluative questions concerning a stochastic system.However, it is often necessary to determine the values of the model's decision parameters such that a performance measure is maximized or minimized.This type of problems can be tackled by simulation-based optimization methods [1].
This paper aims to develop a simulation-based optimization method to reduce the discrepancy between simulated and historical failures, which will not only significantly reduce the cost of labor hours associated to performing this activity manually but will also find an appropriate value of the distribution parameters for each turbine component/failure mode so that the simulation's outcome better agrees with the real-life data.
This research was done in collaboration with an aerospace industry's maintenance, repair, and overhaul company referred to as ABC Company.A 54-component turbine engine is considered as a case study.For confidentiality reasons, the names of the components, distributions, and distributions' parameters are not mentioned.
The structure of the paper is organized as follows.Section 2 presents the proposed model and its notation.Section 3 discusses the details of the proposed methodology.In Section 4, the case study is presented along with a design of experiments (DOE) to tune the parameters of the optimization algorithm.Section 5 shows the results of the case study.Finally, concluding remarks and future work are presented in Section 6.

Turbine Engine Model
The engine model is developed using the following assumptions: (1) the component events are independent of each other, (2) the component failure rates interact in a series fashion (see Figure 1), and (3) some of the components may have accumulative time (e.g., an item that has been repaired will recover its previous cumulative time).The model considers the lowest TTF associated to any component as the system's time to failure (i.e., whichever component will have the lowest TTF at each run of the simulation will cause the system to fail) Equation (1) computes the system's TTF where (  ) is the system's TTF and   is the TTF of th component.

Methodology
The proposed simulation-based optimization method searches for a high quality solution (i.e., the value of the distribution's parameters of the components within their feasible regions) with the use of the simulated annealing (SA) metaheuristic.The proposed method is broken down into two phases.
Phase I consists of a Monte Carlo simulation to obtain the simulated percentage of failure per component, given an initial set of distribution parameters.The simulated percentages of failures from Phase I are used to compute the initial discrepancy between the simulated and the historical percentages of failures, which is expressed as the sum of squares of errors (SSE).Nevertheless, other combinations of feasible distribution's parameters values may lead to lower SSE values.(i.e., other high quality solutions could exist in nearby neighborhoods).These neighborhoods are explored in the next phase of the method.
In Phase II, a SA-based search procedure is used to explore the neighborhood within the feasible region of the distributions' parameters in order to locate solutions of potentially higher quality.The objective is to minimize the SSE obtained from Phase I.The SA-based procedure works well for Phase II because it has the ability to move away from local minima in contrast with gradient methods that cannot overcome local minima.The SA neighborhood function is defined based on the bounds of the distribution parameters of each component.The following section discusses the implementation in further detail.

Monte Carlo Simulation to Obtain the Failure Time of Each Component.
A Monte Carlo simulation is often used to obtain not only the simulated failure times per component, but also the distribution of the statistical estimates of the system's TTFs: MTTF and STTF.
At this stage of the proposed methodology, the distribution and the distributions' parameters values per component (obtained from initial data) are considered as primary inputs.The data set for each component contains suspensions (censored data) which represent the units that have not failed by the failure mode in question.They may have failed by different failure modes or not failed at all [2].
In the case of life data, these data sets are composed of units that did not fail.For example, if five units were tested and only three had failed by the end of the test, it would have right censored data (or suspension data) for the two units that did not failed.The term right censored implies that the event of interest (i.e., the time to failure) is to the right of our data point.In other words, if the units were to keep on operating, the failure would occur to the right on the time scale [3].
A random number, drawn from a uniform distribution over the interval [0, 1] is used to generate a failure time from the distribution's inverse cumulative distribution function (CDF).For each component of the model, the formula for the CDF corresponding to the distribution of each component is used and then solved for  (the failure time).Table 1 displays the equations to generate failure times ().
For the normal or lognormal distributions, there are no closed-form expressions but there are good mathematical approximations for that purpose.The approximation utilized in this paper is the one developed by Weisstein [4], which considers the error function (erfc).The definition for the erfc can be found in the Appendix.The inverse CDF equations, shown in Table 1, include cumulated time (); that is, as different individual components are replaced, the different operating times can be adjusted for the components whose reliability has been restored (through either refurbishment or replacement).Components that are not replaced are returned to service (i.e., their operating times are not different than when they were removed from service).This reliability model will be used to compute the system's conditional probability of failure once it has been repaired.The inverse CDF equations are derived from the conditional reliability equations.If there is no cumulated time (e.g.,  = 0), then the inverse CDF equations would be transformed into the unconditional form.The system's TTF is computed considering the component with the lowest TTF as the system failure rate (1).Thus, a single iteration of the loop computes a single system TTF for the model.Statistical estimates (mean and standard deviation of the system TTFs MTTF and STTF, resp.) as well as the simulated percentages of failures for each component are obtained after performing all the iterations of the Monte Carlo simulation.

Simulated Annealing-Based Optimization
Procedure.The simulated annealing (SA) algorithm is a well-known local search metaheuristic used to address discrete, continuous, and multiobjective optimization problems.Its ease of implementation, convergence properties, and ability to escape local optima have made it a popular technique during the past two decades.The application of the SA algorithm to optimization problems emerges from the work of Kirkpatrick et al. [5] and Černý [6].In these pioneering works, the SA was applied to graph partitioning and very-large-scale integration design.In the 1980s, the SA algorithm had a major impact on the field of heuristic search for its simplicity and efficiency in solving combinatorial optimization problems.Moreover, the SA algorithm has been extended to deal with continuous optimization problems [7][8][9].
The SA algorithm is based on the principles of statistical mechanics whereby the annealing process requires heating and then a slow cooling of a substance to obtain a strong crystalline structure.The strength of the structure depends on the rate of cooling of the substance.If the initial temperature is not sufficiently high or a fast cooling is applied, imperfections (metastable states) are obtained.In this case, the cooling solid will not attain thermal equilibrium at each temperature.Strong crystals are grown from careful and slow cooling.The SA algorithm simulates the energy changes in a system subjected to a cooling process until it converges to an equilibrium state (steady frozen state).This scheme was developed in 1953 by Metropolis et al. [10].
The SA-based procedure has been successfully implemented to address problems in manufacturing enterprises such as [11,12].The interested reader is referred to the works of Eglese [13], Aarts and Lenstra [14] and Suman and Kumar [15], for a comprehensive description of the simulated annealing metaheuristic.

Optimization Problem Definition, Objective, and Constraints.
After all the iterations of the Monte Carlo simulation, the percentage of failures is obtained for each component and it is compared to the historical data to obtain the squared error.The discrepancy or the squared error between the simulated and historical percentage of failure is obtained for each component to conform the sum of squared errors (SSE).
The objective function for the optimization process is shown in (2) and aims to minimize the sum of squared errors (SSE) of the system's percentage of failures subject to constraints which are the confidence bounds (contour plots) on the distribution's parameters: where x is the simulated percentage of failures,   is the historical percentage of failures, and  is the number of components.
The simulation-based optimization method will propose new values of the distribution parameters (  ), where   is the th distribution parameter of th component within their contour plots (see Figure 2).According to Wasserman [16], the likelihood contour plot can be drawn satisfying the following relationship: where ln  * is the log-likelihood of the th component at the estimated distribution parameters using MLE, ln (  ) is the log-likelihood of the th component for the perturbed th distribution parameter, and  = 1 − confidence level.In case of unfeasibility, that is, the distribution's parameters at the new state not satisfying (3), one of the distribution parameters is set to its nearest bound (upper or lower bound, whichever is closer) and solved for the other distribution parameter so that they satisfy (3).Equation ( 4) shows the equation of likelihood contour plot for the Weibull twoparameter distribution as an example: ln  ( (lower bound or upper bound) , ) The shaded region in Figure 2 shows an example of the parameters' feasible region for the simulation-based optimization method.
A schematic diagram of the SA-based procedure, used to determine the best value of the distributions' parameters, is presented in Figure 3.
Obtaining the confidence bounds of the distribution's parameters is an integral part of the proposed methodology.There are multiple methods for obtaining the bounds of these parameters.The following section provides the necessary background behind the selection of the likelihood ratio based on the confidence bounds as the preferred method as well as the steps for obtaining the bounds of the distributions' parameters.

Confidence Bounds on the Distribution Parameters.
Our data set contains suspensions (censored data); therefore, the procedure for computing confidence bounds includes censored data in the parameter estimation.In this section, we present the methodology to obtain the likelihood ratio-based confidence bounds of the distribution's parameters.
In the general case, the distribution parameters are estimated using the maximum likelihood estimation (MLE) method with a modified likelihood function for censored data.Using these estimated parameters, confidence bounds can be calculated.The parameters are estimated using "mlecustom" function of MATLAB.The general form of the likelihood function considering censored samples is [ where  is the number of units run to failure,  is the number of unfailed units,  1 ,  2 ,  3 , . . .,   are the known failure times, and  1 ,  2 ,  3 , . . .,   are the operating times on each unfailed unit.For example, if the time to failure distribution is Weibulldistributed, the modified likelihood function for censored samples would be The "mlecustom" function of MATLAB was used to compute the bounds of the distribution's parameters using the normal approximation method.The normal approximation method for confidence interval estimation is used in most commercial statistical packages because of the relative easiness for the bound's computation.However, the performance of such procedures could be poor when the sample size is not large, or when heavy censoring is considered [17].
On one hand, since the input data set contains heavy censoring, the computation of confidence bounds on parameters using the normal approximation is not recommended.On the other hand, Fisher matrix bounds tend to be more optimistic than the nonparametric rank based bounds.This may be a concern, particularly when dealing with small sample sizes.Fisher matrix bounds are too optimistic when dealing with small sample sizes and, usually, it is preferred to use other techniques for calculating confidence bounds, such as the likelihood ratio bounds [2].Hence, the likelihood ratio-based confidence bounds estimation method is preferred.
Without loss of generality, the use of the likelihood ratio statistic for developing confidence intervals about a parameter of interest () can be described.The likelihood ratio (LR) test statistic provides a formal framework for testing the hypothesis:  0 :  =  0 and  1 :  ̸ =  0 .As its name implies, the LR statistic test is a ratio of likelihood functions.However, it is more convenient to work with the log form, which is computed as the difference between two log-likelihood expressions.Specifically,  0 is rejected at some level of confidence where ln  * is the log-likelihood function evaluated at the maximum likelihood (ML) point ln  * ( β, γ) and ln  * ( 0 ) = max  ln ( 0 , ).If β( 0 ) is denoted by the solution to max  ln ( 0 , ), then ln  * ( 0 ) = ln  ( 0 , β ( 0 )) .
An asymptotically correct confidence interval or region on  consists of all the values of  for which the null The LR confidence intervals can be graphically identified with the use of the likelihood contours, which consist of all the values of  and  for which ln (, ) is a constant.Specifically, the contours constructed are solutions to Solutions to ln  * ( 0 ) = ln  * − (1/2) 2 1, will lie on these contours.For the Weibull, lognormal, and normal distributions, the confidence intervals can be graphically estimated by drawing lines that are both perpendicular to the coordinate axis of interest and tangent to the likelihood contour [18].For one-parameter distributions (e.g., exponential), no contour plot is generated and the confidence bound on the parameter will be considered.The summary of steps to obtain the LR based bounds on parameters follows.
Step 1. Load the experimental data and define separate sets of information containing the failure and the censored data points from the input data set.
Step 2. Define the custom probability density function (PDF) and cumulative density function (CDF) equations based on the distribution of the component.
Step 3. Use the "mlecustom()" function in MATLAB to estimate the distribution's parameters values.
Step 4. Compute the output value of the likelihood function (i.e., the modified likelihood function for a component with censored data points) for the parameter estimates obtained in Step 3 using (5).
Step 5. Obtain the graphical estimate of confidence intervals of the parameters satisfying (10) from the likelihood contour plot.
Step 6. Obtain the chi square statistic value and substitute that value in (11) [19] for the specified confidence level: where () is the likelihood function for the unknown parameter vector , ( θ) is the likelihood function calculated at the estimated vector , and  2 ; is the chi-squared statistic with probability 1 −  and  degrees of freedom, where  is the number of quantities jointly estimated.
Step 7. Using the graphical estimates of confidence intervals on the parameters as initial guess, two nonlinear optimization problems are solved to obtain accurate LR limits.Since there are two unknowns (e.g., Weibull's shape and scale parameters) and only one equation, use an iterative method to obtain the values of the parameters (i.e., for given values of scale, obtain the maximum and minimum value of shape parameter that satisfy (11)) and vice versa.The "fsolve()" function in MATLAB was used to compute solution iteratively.The default algorithm for "fsolve()" is the "Trust-Region Dogleg, " which was developed by Powell [20,21].

Summary of Proposed Solution
Procedure.The following steps summarize the proposed methodology.
Step 1.The initial data, consisting of failure times and operating times of unfailed units, is provided.The input data is used to determine the distribution behavior of the components' failure rate and to determine the initial distribution parameters values for each component using Weibull++.
Step 2. The components' distributions and their parameters are used to simulate the expected failure time ().For each component, a random number, that is, () in the range 0 ≤ () ≤ 1, is generated using uniform distribution (0, 1).A failure time is generated corresponding to () from the inverse cumulative distribution function (CDF) equation for each component.Finally, the system's time to failure (TTF) is computed using the system's equation (1).
Step 3. The simulated percentages of failure are obtained from the Monte Carlo simulation for all the components in the system.The discrepancies (e.g., SSE) between the simulated and historical percentages of failures are computed.
Step 4. The simulated annealing algorithm minimizes the sum of square errors (SSE) subject to constraints (e.g., the confidence bounds on distribution parameters).
A schematic diagram of the proposed method is shown in Figure 4.   2 summarizes the parameters corresponding to the distribution of each component used in the analysis.To select the appropriate distribution and parameter values, a weighted goodness of fit measure using Weibull++ software along with engineering experience was utilized.A fixed cumulative time of 1850 hours is used for the 18th component of the model.

The SA's Parameters Tuning through Statistical Design of Experiments (DOE).
A statistical design of experiments can be defined as a series of tests in which purposeful changes are made to the input variables of a process or system so that changes in the output response can be observed and quantified [22].In order to tune the parameters of the optimization algorithm, a design of experiments is performed to understand the cause and effect relationship among the SA's parameters and the output response.
The factors that were selected during the DOE development are (1) the number of accepted trials before the system's temperature is decreased (Trials), (2) the number of Markov chains (Chains), and (3) the rate of cooling (Alpha).Table 3 shows the selected factors and their levels.
The responses are (1) the sum of squared errors (SSE), (2) the mean of squared errors (MSE), and (3) the average CPU time.

DOE Results.
The DOE analysis was performed using Minitab.A total number of 3 3 = 27 computational experiments were performed while considering a 95% significance level.The ANOVA model adequacy verification did not exhibit issues with regard to (1) the residual's normality behavior, (2) the residual's independency, and (3) the residual's constant variance assumptions.
From the ANOVA results, it was observed that the cooling rate (Alpha) was not a significant factor and it was fixed to 0.98.The significant factors were number of accepted trials (Trials) and the number of Markov chains (Chains).
For the sake of brevity, only selected plots are presented.Figure 5 shows the main effects plot considering SSE as a response.
From Figure 5, it can be observed that both factors (e.g., Trials and Chains) have great impact on the response (i.e., The initial systems' temperature was fixed to 10.In order to get this value, the initial temperature is initialized at a very low value and an experimental run is performed where the temperature is raised by a ratio until a minimum acceptance ratio defined by the practitioner (0.8) is met; the acceptance ratio is computed as the number of accepted trials (Trials) divided by the trials.
A sensitivity analysis was conducted to determine the best value for the Monte Carlo sample (MC is not an SA algorithmic parameter but it is a parameter in the overall solution procedure).It was observed that the SSE becomes stable for Monte Carlo sample size of 10,000 and larger values; that is, the SSE is not sensible to changes in the Monte Carlo sample size after a value of 10,000 samples.On the other hand, if the sample size for the Monte Carlo loop is below 10,000, the SSE considerably increases for this particular problem.Hence, the selected number of Monte Carlo samples was fixed at 10,000.

Test Case Results
. The evaluation of the test case was performed based on the outcomes of the previously presented DOE analysis.After conducting a computational evaluation of the test case, the value of the SSE was found to be 35.1834having an average CPU time of 55,239 seconds.Table 4 compares the values of the SSE before and after the optimization.
The SA-based procedure was executed five times considering the best combination of algorithmic parameters.Figure 6 shows the average performance of the solution procedure.
Table 5 shows the discrepancies between the simulated and historical percentages of failures for the components before and after the optimization.
Figure 7 shows the histogram of percentage of failures for the historical data, before and after the optimization.
In order to observe the shifting of the distribution's parameters at the different evaluations of the DOE, contour plots were constructed for each component.Figure 8 shows the shifting of the distribution's parameters at the different evaluations of the DOE for the 45th component and Figure 9 shows both the original (fitted) and best found combination of distribution parameters values for 45th component.

Conclusions and Future Work
This paper presented a simulated annealing-based optimization method to minimize the discrepancy of historical and simulated percentages of failures in a turbine engine model.A DOE was performed for the tuning of the algorithmic parameters.The results showed a 30% reduction of the SSE (e.g., from 50.6541 to 35.1834) for the engine model.The average CPU time was approximately 15 hours mainly due to the calculations involved in the likelihood function.Alternative neighborhood and feasibility functions can be investigated by studying the trends in the shifting parameters' values per component.For instance, it was observed that the distribution's parameters shifted around the edge of the contour plot for some components.
The proposed simulation-based optimization method can serve as a decision-making tool for maintenance, repair, and overhaul companies and will potentially reduce the cost of labor associated to finding the appropriate value of the distribution's parameters for each component/failure mode in the model and increase the accuracy in the prediction of the mean time to failures (MTTF).
Future research lines involve parallelization of the algorithm to solve larger models (e.g., thousands of components) and comparing the performance of the simulated annealing with other metaheuristics such as Evolutionary Algorithms, Tabu Search, and Particle Swarm Optimization, among others.

Figure 1 :
Figure 1: Series system for the mock engine model.

Figure 2 :
Figure 2: Feasible region (contour plot) for the distribution parameters of one component.

Figure 4 :
Figure 4: Schematic flowchart of the proposed method.

4. 1 .
Distribution and Parameter Fittings.The turbine engine model consists of 54 components/failure modes.Table

Figure 5 :
Figure 5: Main effects plot for SSE.The gray background represents terms that were not significant in the ANOVA.

Figure 6 :Figure 7 :
Figure 6: The average performance of the SA-based procedure.

Figure 8 :
Figure 8: Distribution parameters for the 45th component for different evaluation cases.

Figure 9 :
Figure 9: Original distribution's parameters values and best found combination of distribution's parameters values for the 45th component.

Table 1 :
Inverse CDF equations for different distributions in the case study.

Table 3 :
Factors and levels during the DOE development.

Table 4 :
Comparison of the responses (before and after the optimization).

Table 5 :
Comparison of the responses (before and after the optimization).