The Chaotic Prediction for Aero-Engine Performance Parameters Based on Nonlinear PLS Regression

The prediction of aero-engine performance parameters is very important for aero-engine condition monitoring and fault diagnosis. In this paper, the chaotic phase space of the exhaust gas temperature (EGT) time series, taken from actual airborne ACARS data, is reconstructed by selecting suitable nearby points. Partial least squares (PLS) regression based on a cubic spline function or kernel function transformation is adopted to obtain the chaotic predictive function of the EGT series. The experimental results indicate that the proposed PLS chaotic prediction algorithm based on the biweight kernel function transformation has a significant advantage in overcoming multicollinearity among the independent variables and in improving the stability of the regression model. Its predictive NMSE is 16.5 percent less than that of the traditional linear ordinary least squares (OLS) method and 10.38 percent less than that of the linear PLS approach. At the same time, its forecast error is less than that of the nonlinear PLS algorithm with variable screening by the bootstrap test.


Introduction
Aeroengine condition monitoring is applied by most commercial airlines as a better way to estimate aeroengine condition and reliability [1, 2]. Predicting and detecting abnormal behavior of the main aeroengine performance parameters is of great importance in aeroengine condition monitoring. With the development of modern industry, a typical aeroengine has become an increasingly integrated and complex system whose working condition and faults are very difficult to predict and judge, and the influence and harm caused by system faults are more serious than ever. Therefore, there will be a growing demand for effective prediction methods for the long-term trend of the aeroengine system to predict

Data Pretreatment
Typical aeroengine performance parameters are the exhaust gas temperature (EGT), low-spool rotor speed (N1), high-spool rotor speed (N2), and fuel flow (WF) [13]. These four measurements are often called the four basic parameters, and they can be recorded by the airborne equipment on most new and old aeroengines. In this paper we take the most important performance parameter, EGT, as the example for analysis; the other parameters can be fitted by the same method.

Similarly Correcting EGT Sequence
According to the operating principle of the aeroengine, the main performance parameters of the same aeroengine differ considerably under different external flight conditions, such as the external atmospheric total temperature, total pressure, and Mach number, so the original EGT data usually cannot be used directly for comparative analysis. To solve this problem, similarity theory is adopted to eliminate the influence of external conditions on the aeroengine EGT sequence [14].

Eliminating Outliers According to Layida Criterion
If outliers exist in the data, they will seriously affect the prediction accuracy of an algorithm. The Layida criterion is the most commonly used criterion for identifying outliers; its precondition is that there is sufficient data [15]. Consider a time series $\{X_i\}$, $i = 1, \ldots, n$, where $n$ is large enough (normally $n > 80$) and only random disturbance is present in the data. Assume that the random time series follows a normal distribution with mean $\mu$ and variance $\sigma^2$. The probability that the deviation $X_i - \mu$ falls outside the $\pm 3\sigma$ range is then only about 0.3%. According to the Layida criterion, if the absolute deviation $|X_i - \mu|$ of a data point is larger than $3\sigma$, then $X_i$ is judged to be an outlier and should be removed.
Let the number of samples be $N$, the observed values of the time series be $\{x_i\}$, $i = 1, \ldots, N$, and the sample mean and standard deviation be $\bar{x}$ and $S$, respectively. If a sample point $x_i$ satisfies $|x_i - \bar{x}| > 3S$, then $x_i$ is an outlier and is eliminated. $\bar{x}$ and $S$ should be computed again after all the outliers are removed.
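The screening rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `remove_outliers_3sigma` is hypothetical, a single screening pass is assumed, and the sample standard deviation is computed with `ddof=1`.

```python
import numpy as np

def remove_outliers_3sigma(x):
    """Apply one pass of the 3-sigma (Layida) rule.

    Returns the cleaned series together with the mean and sample
    standard deviation recomputed on the cleaned data.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std(ddof=1)                      # sample standard deviation S
    keep = np.abs(x - mean) <= 3.0 * std     # points inside the 3S band
    cleaned = x[keep]
    return cleaned, cleaned.mean(), cleaned.std(ddof=1)
```

Because a large outlier inflates $S$ itself, a single pass can miss smaller outliers; repeating the pass until no point is rejected is a common variant, although the text does not state whether the authors iterate.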

Chaos Prediction Model Based on Nonlinear Partial Least Squares (PLS) Regression
Suppose the observed time series is $\{x_n\}$, $n = 1, 2, \ldots, N$; the chaotic phase space $s_n$ is reconstructed following [16],
where τ is the delay time, m is the embedding dimension and N is the length of the time series.
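A minimal sketch of delay-coordinate reconstruction is given here for orientation. It assumes the common backward convention $s_n = (x_n, x_{n-\tau}, \ldots, x_{n-(m-1)\tau})$; the convention actually used in the paper may differ, and the function name `delay_embed` is hypothetical.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Build the matrix of delay vectors from a scalar series.

    Row i is (x[n], x[n - tau], ..., x[n - (m - 1) * tau]) with
    n = (m - 1) * tau + i. This backward convention is an assumption.
    """
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    if start >= len(x):
        raise ValueError("series too short for this (m, tau)")
    cols = [x[start - k * tau: len(x) - k * tau] for k in range(m)]
    return np.column_stack(cols)
```

For a series of length $N$ this yields $N - (m-1)\tau$ state vectors, so large values of $m$ and $\tau$ shorten the usable sample.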

Characterizing Chaos through Lyapunov Exponents
Through the years, the existence of chaos in time series has been characterized by several methods, among them the analysis of Fourier spectra, fractal power spectra, entropy, fractal dimension, and the calculation of Lyapunov exponents. However, several of these methods have proven not to be very efficient [16]. The Lyapunov exponents provide all the information about the local and global complexity of chaos; therefore the measurement of Lyapunov exponents has been an effective method to judge whether a time series is generated by a chaotic dynamical system. The Lyapunov exponents $\lambda_1, \lambda_2, \ldots, \lambda_d$ are the average rates of expansion ($\lambda_i > 0$) or contraction ($\lambda_i < 0$) near a limit set of a dynamical system [16]. In fact, the Lyapunov exponent (LE) is a quantitative measure of the divergence of nearby trajectories in the system. There is one LE for each direction in the $d$-dimensional phase space in which the system is embedded. The variable $d$ represents the Lyapunov dimension or phase space dimension. The LE does not change when the initial conditions of a trajectory are modified or when some perturbation occurs in the system. If at least one LE is positive, the system presents chaotic motion [17]; hence, if chaos exists in a dynamical system, the largest Lyapunov exponent must be positive.
We calculate the largest Lyapunov exponents according to the algorithm of Wolf et al. (1985) [17]; the numerical results are shown in Table 1.
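To illustrate the sign criterion, the sketch below estimates the Lyapunov exponent of the logistic map $x_{n+1} = r x_n (1 - x_n)$ by averaging $\ln |f'(x_n)|$ along an orbit. This is a textbook toy system, not the Wolf et al. algorithm and not the EGT data; the function name and default settings are assumptions made for the example.

```python
import math

def logistic_lyapunov(r, x0=0.3, n_iter=20000, n_discard=1000):
    """Average log-derivative of the logistic map along one orbit."""
    x = x0
    total = 0.0
    for i in range(n_iter + n_discard):
        if i >= n_discard:                     # skip the transient
            deriv = abs(r * (1.0 - 2.0 * x))   # |f'(x)| for f(x) = r x (1 - x)
            total += math.log(max(deriv, 1e-300))
        x = r * x * (1.0 - x)
    return total / n_iter
```

For $r = 2.5$ the orbit settles on a stable fixed point and the estimate is negative, whereas for $r = 4$ the estimate is positive (the known value is $\ln 2$), which is the signature of chaos used in the text.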

Chaos Prediction Model
Let the state vector at time $T$ be $s_T$. We obtain the nearest neighbor points $s_{\alpha_1}, s_{\alpha_2}, \ldots, s_{\alpha_K}$ by calculating the Euclidean distance between the target point $s_T$ and each of the reconstructed vectors [18]. Let $X$ be the sample matrix of the reconstructed variables and $y$ the prediction variable; the nonlinear chaos prediction model is then described by (3.4). In practice, the observed data sets of the independent variables and the dependent variable can usually be obtained. However, the functional form of their relationship is unknown, especially in higher dimensions, and the problem becomes more complicated when the relationship between the dependent variable and the independent variables is nonlinear. Usually (3.4) is fitted by an additive model or a multiplicative model. For computational convenience, we consider the additive model in each independent variable.
At present, the nonlinear relationship between the original variables is usually transformed into a quasi-linear relationship, after which the prediction model is established by an effective linear method. Spline functions and kernel functions are commonly used as the basis functions of this transformation.
In the cubic spline transformation, $\beta_0$ and $\beta_{j,l}$ are the model parameters to be determined, and $\xi_{j,l-1}$, $h_j$, and $M_j$ are, respectively, the knots, the segment length, and the number of segments into which the range of the variable $x_j$ is divided. The minimum observed value of the variable is denoted $\min(x_j)$ and the maximum $\max(x_j)$. The nonlinear prediction relationship between the independent variables and the dependent variable can then be transformed into a quasi-linear form. Equations (3.6)-(3.8) show that the relationship between $y$ and $z_{j,l}$ is linear, whereas the relationship between $x_j$ and $z_{j,l}$ is nonlinear, so the chaos prediction model with cubic spline function transformation is a quasi-linear regression model; see (3.10).
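The cubic spline transformation can be sketched as follows. The knot placement $\xi_{j,l-1} = \min(x_j) + (l-1)h_j$ with $h_j = (\max(x_j) - \min(x_j))/M_j$ and the standard cardinal cubic B-spline for $\Omega_3$ are assumptions, since the defining equations are not reproduced in this text; the function names are hypothetical.

```python
import numpy as np
from math import comb

def cubic_bspline(u):
    """Cardinal cubic B-spline centred at 0 with support [-2, 2]."""
    u = np.asarray(u, dtype=float)
    total = np.zeros_like(u)
    for k in range(5):
        # truncated power (u + 2 - k)_+^3 with alternating binomial weights
        total = total + (-1) ** k * comb(4, k) * np.clip(u + 2 - k, 0.0, None) ** 3
    return total / 6.0

def spline_transform(x, M):
    """Map one variable x (length n) to n x (M + 3) spline features z_{j,l}."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    h = (hi - lo) / M                          # segment length h_j (x must not be constant)
    knots = lo + (np.arange(M + 3) - 1) * h    # assumed knots for l = 0, ..., M + 2
    return cubic_bspline((x[:, None] - knots[None, :]) / h)
```

With this knot layout each variable contributes $M + 3$ columns, so $m$ embedded variables yield $m(M + 3)$ regressors; this growth in dimension is what motivates replacing OLS with PLS.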
The principle of the kernel function transformation is the same as that of the spline function transformation; that is, the one-dimensional nonlinear function $f_j(x_j)$ is expanded in kernel basis functions, and the cubic spline basis $\{\Omega_3((x_j - \xi_{j,l-1})/h_j),\ l = 0, 1, \ldots, M_j + 2\}$ is replaced by the basis $\{K((x_j - \xi_{j,l-1})/h_j),\ l = 0, 1, \ldots, M_j + 2\}$. The chaos prediction model with kernel function transformation then follows [19]. The commonly used kernel functions include the uniform, Epanechnikov, biweight, triweight, and Gaussian kernel functions.
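For reference, the five kernels named above can be written down in their standard normalized forms, together with a transformation that mirrors the spline case. The knot layout is the same assumption as for the spline basis, and the names `KERNELS` and `kernel_transform` are hypothetical.

```python
import numpy as np

def _inside(u):
    """Indicator of the compact support |u| <= 1."""
    return (np.abs(u) <= 1.0).astype(float)

KERNELS = {
    "uniform":      lambda u: 0.5 * _inside(u),
    "epanechnikov": lambda u: 0.75 * (1.0 - u ** 2) * _inside(u),
    "biweight":     lambda u: (15.0 / 16.0) * (1.0 - u ** 2) ** 2 * _inside(u),
    "triweight":    lambda u: (35.0 / 32.0) * (1.0 - u ** 2) ** 3 * _inside(u),
    "gaussian":     lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi),
}

def kernel_transform(x, M, kernel="biweight"):
    """Map one variable x (length n) to n x (M + 3) kernel features."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    h = (hi - lo) / M                          # segment length (x must not be constant)
    knots = lo + (np.arange(M + 3) - 1) * h    # assumed knots, as in the spline case
    return KERNELS[kernel]((x[:, None] - knots[None, :]) / h)
```

Each kernel integrates to one; apart from the Gaussian, they vanish for $|u| > 1$, so every observation activates only the few basis functions whose knots lie within one segment length.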

Nonlinear Partial Least Squares Regression (NPLS)
However, some problems arise when the above quasi-linear regression model is estimated by the ordinary least squares method. First, the dimension of the regression model increases significantly after the basis function transformation, which may leave fewer sample points than transformed variables. Second, the reconstructed variables $x_1, x_2, \ldots, x_m$ all come from the same series, so nonlinear correlation exists among them. It is therefore not appropriate to employ the ordinary least squares method to estimate the model parameters when multicollinearity is found among the transformed independent variables. One effective way to solve this problem is to use the partial least squares (PLS) method to estimate the parameters of the predictive model.
In this paper, a modified chaos prediction approach based on nonlinear PLS regression with basis function transformation is developed. The cubic spline function and various kernel function transformations are used in our model, and a comparative analysis of them is presented in Section 5.
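A minimal sketch of single-response PLS regression, written with the usual score-and-deflation recursion, shows how the estimation can proceed when the regressors are collinear. It is an illustration under stated assumptions rather than the authors' implementation: the data are centred but not scaled, and the names `pls1_fit` and `pls1_predict` are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS with one response; return (coefficients, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean            # centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                          # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                             # latent score
        tt = t @ t
        p = Xk.T @ t / tt                      # X loading
        c = (yk @ t) / tt                      # y loading
        Xk = Xk - np.outer(t, p)               # deflate X
        yk = yk - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # coefficients in the original variables
    return coef, y_mean - x_mean @ coef

def pls1_predict(X, coef, intercept):
    """Predict the response for new rows of X."""
    return np.asarray(X, dtype=float) @ coef + intercept
```

When the number of components equals the rank of the centred regressor matrix, the fit coincides with OLS; using fewer components is what stabilizes the estimate under multicollinearity or when there are more transformed variables than sample points.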

Prediction Evaluation
Normally, prediction accuracy is evaluated by the mean squared error (MSE), where $\hat{x}_{T+p}$ is the predicted value and $x_{T+p}$ is the observed value. However, the MSE depends on the order of magnitude of the observed data. In this paper, the normalized mean square error (NMSE) is therefore used as the evaluation criterion for prediction,
where $\sigma^2$ is the sample variance of the prediction sequence. To reduce the effect of chance, our investigation uses the mean NMSE over 20 predicted values.
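A minimal sketch of the criterion, assuming NMSE is the MSE divided by the variance of the observed values over the prediction window (the exact normalization used in the paper is not reproduced in this text, so the population variance below is an assumption):

```python
import numpy as np

def nmse(observed, predicted):
    """Mean squared error normalized by the variance of the observations."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((observed - predicted) ** 2)
    return mse / np.var(observed)              # population variance (assumption)
```

Under this definition a predictor that always returns the mean of the observations scores exactly 1, so values well below 1 indicate genuine predictive skill regardless of the scale of the data.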

Algorithm and Algorithm Flowchart
Step 1. The observational data of the aeroengine performance parameter EGT are collected and corrected by similarity theory, and then the outliers are removed according to the Layida criterion.
Step 2. For the selected embedding dimension $m$ ($1 \le m \le m_{\max}$) and delay time $\tau$ ($1 \le \tau \le \tau_{\max}$), the chaotic phase space of the pretreated EGT time series is reconstructed, and the appropriate nearby points are selected to construct the independent variable matrix $X$ and the dependent variable vector $y$.
Step 3. Choose the section number $M$ ($1 \le M \le M_{\max}$); each row vector $x_j$ ($j = 1, 2, \ldots, m$) of the independent variable matrix $X$ is transformed to $Z_j$ by a basis function transformation such as (3.8), so that $Z_j = (z_{j,0}, z_{j,1}, \ldots, z_{j,M+2})$. (4.1)
Step 4. The new independent variables $Z_j$ and the dependent variable $y$ are standardized and denoted $(Z, y) = (Z_1, Z_2, \ldots, Z_m, y)$, so the prediction model is described by (4.2), where $\alpha_{j,l}$ ($j = 1, 2, \ldots, m$; $l = 0, 1, \ldots, M + 2$) are the parameters of regression model (4.2) and $\varepsilon$ is the stochastic error, which follows a normal distribution with zero mean and constant variance.
Step 6. If the prediction model has been estimated for all values of $M$, go to Step 7; otherwise, go to Step 3.
Step 7. Calculate the predicted value according to prediction function (3.6) and calculate the corresponding predicted NMSE.
Step 8. If the calculation has been completed for all combinations of the embedding dimension $m$ and delay time $\tau$, go to Step 9; otherwise, go to Step 2.
Step 9. Find the optimal prediction model with the smallest NMSE, and obtain the corresponding dimension $m$, delay time $\tau$, and section number $M$.
The algorithm flow chart is shown in Figure 1.
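The search loop over $(m, \tau)$ in the steps above can be sketched as follows. To stay short, this simplified illustration fits a local linear model by ordinary least squares on the nearest neighbors and omits both the basis function transformation and the PLS estimation, so it is not the proposed NPLS algorithm. All function names, the backward delay convention, and the default settings are assumptions.

```python
import numpy as np

def embed(x, m, tau):
    """Delay vectors (x[n], x[n - tau], ...) and the time index n of each row."""
    idx = np.arange((m - 1) * tau, len(x))
    return np.column_stack([x[idx - k * tau] for k in range(m)]), idx

def predict_next(x, m, tau, k_neighbors):
    """One-step prediction of the value that follows x[-1]."""
    S, idx = embed(x, m, tau)
    target = S[-1]
    cand, cand_idx = S[:-1], idx[:-1]          # states whose successor is known
    dist = np.linalg.norm(cand - target, axis=1)
    near = np.argsort(dist)[:k_neighbors]      # nearest neighbors of the target
    A = np.column_stack([np.ones(len(near)), cand[near]])
    y = x[cand_idx[near] + 1]                  # successors of the neighbors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.concatenate([[1.0], target]) @ coef)

def grid_search(x, m_values, tau_values, k_neighbors=20, n_test=20):
    """Return (best NMSE, m, tau) over the grid, scored on the last n_test points."""
    x = np.asarray(x, dtype=float)
    best = None
    for m in m_values:
        for tau in tau_values:
            errors = []
            for i in range(n_test):
                cut = len(x) - n_test + i      # predict x[cut] from x[:cut]
                pred = predict_next(x[:cut], m, tau, k_neighbors)
                errors.append((pred - x[cut]) ** 2)
            score = float(np.mean(errors) / np.var(x[-n_test:]))
            if best is None or score < best[0]:
                best = (score, m, tau)
    return best
```

In the full algorithm an inner loop over the section number $M$ would transform each column of the neighbor matrix before a PLS fit; the outer structure of the search stays the same.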

Results and Analysis
The on-wing ACARS data of four PW4077D aeroengines were collected for one year, and about 1400 EGT data points from each engine were selected to form the time series used in the numerical example. The maximum values of the embedding dimension $m_{\max}$, the delay time $\tau_{\max}$, and the segment number $M_{\max}$ in the above algorithm were estimated after a large number of experiments. In our research, the range of $m$ is 1-30, that of $\tau$ is 1-20, and that of $M$ is 1-8. The cubic spline function and the five kernel functions introduced in Section 3 are employed for the transformation, and we minimize the chaotic prediction NMSE based on the PLS algorithm. The numerical results for the first engine are shown in Table 1, in which rows 1 to 6 show the prediction results based on the various NPLS methods and rows 7 and 8 show the prediction results based on ordinary PLS and OLS.
As can be seen in Table 1, all the largest Lyapunov exponents are positive, which confirms that the EGT sequence is nonlinear and chaotic. The first seven NMSE values, calculated based on PLS, are all less than that based on OLS, which indicates that PLS has an obvious advantage over OLS in fitting chaotic prediction models. The NMSE of PLS based on the various nonlinear transformations is even less than that of PLS without transformation, which indicates that chaotic prediction based on nonlinear PLS has an even greater advantage in forecasting aeroengine EGT. The PLS prediction algorithm based on the biweight kernel transformation is the optimal chaotic prediction algorithm; the estimated optimal embedding dimension is 8, the delay time is 19, and the number of nonlinear transformation sections is 2.
The PLS regression model includes all of the originally selected variables, and there is multicollinearity among the independent variables, which can affect the prediction accuracy. The model parameters estimated by PLS have a rather complex nonlinear character, so the independent variables cannot be selected by the parameter tests of linear methods. In this paper, the nonparametric statistical Bootstrap method [20] is adopted to select the independent variables transformed with the kernel functions and the cubic spline function, and to find the minimum NMSE; the optimal results for the first engine are shown in Table 2.
Table 2 shows the predictive results based on the various NPLS algorithms after selecting independent variables by the Bootstrap method. Only the optimal predictive NMSE based on the triweight kernel function transformation decreases.
parameters EGT. Numerical results show that the proposed model is able to predict the performance parameter EGT effectively and with a high degree of accuracy. The optimal embedding dimension is 8, the delay time is 19, and the number of segments is 2. The NPLS chaotic prediction algorithm based on the biweight kernel function transformation is the optimal EGT prediction algorithm because it has the smallest predictive NMSE. Our prediction NMSE is 16.5 percent less than that of the traditional linear ordinary least squares (OLS) method and 10.38 percent less than that of the linear PLS approach. The NPLS chaotic prediction algorithm with variable selection through the Bootstrap test fails to reduce the NMSE further. We also find that reducing the computational time by narrowing the search range of the embedding dimension does not affect the prediction accuracy of EGT. The predictive results from four aeroengines indicate that the proposed algorithm is robust.
In particular, conservative baselines of the main aeroengine performance parameters are given in the ECM program provided by jet engine manufacturers [14]. Let the baseline value of EGT be $\mathrm{EGT}_{\mathrm{base}}$; the deviation between $\mathrm{EGT}_{\mathrm{cor}}$ and $\mathrm{EGT}_{\mathrm{base}}$ can be defined as $\Delta\mathrm{EGT} = \mathrm{EGT}_{\mathrm{cor}} - \mathrm{EGT}_{\mathrm{base}}$. (6.1) The trend plot of the deviation $\Delta\mathrm{EGT}$ over continuous flights is routinely used by airlines for performance monitoring and diagnostics [13]. On the basis of our prediction algorithm, the $\mathrm{EGT}_{\mathrm{cor}}$ of upcoming flights can be predicted, and a trend layout of the deviation of the main performance parameters for upcoming flights can then be provided. By examining a given trend shift in the measurement deviations, an experienced power maintainer or engineer can identify significant behavior disparities, evaluate the aeroengine condition, and isolate the faulty module or component using the ECM trend plot reports and the related PW4000 fingerprint.
In summary, accurately forecasting the aeroengine performance parameters is of great importance in aeroengine condition monitoring. Anomalies in the aeroengine performance parameters can be detected in time from the predictive values calculated by our algorithm, so that maintenance measures can be taken early to prevent aeroengine failure effectively, which helps to reduce aircraft downtime and improve the reliability of the aeroengine.

Figure 2: Comparison of original values and predictive values.

Figure 4: Embedding dimension analysis of some forecasting methods.

Figure 5: Delay time analysis of some forecasting methods.

Figure 6: Subset numbers analysis of some forecasting methods.

Figure 2 shows the comparison of the prediction values based on the OLS, PLS, and biweight kernel transformation chaos algorithms with the original values. The PLS algorithm based on the biweight kernel function transformation has the optimal NMSE in Figure 2. However, not every one of its predicted values has the smallest forecast error. The deviations between the predictive values and the original values for the above three algorithms are shown in Figure 3. The deviation curve of the biweight kernel method, which has the optimal NMSE, fluctuates the least, and no large prediction errors appear in its deviation sequence.
where $\mathrm{EGT}_{\mathrm{obs}}$ is the original EGT data, $\mathrm{EGT}_{\mathrm{cor}}$ is the corrected value of $\mathrm{EGT}_{\mathrm{obs}}$, $\theta_{T2} = (\mathrm{TAT} + 273.15)/288.15$, and TAT is the total air temperature.

Table 1: Comparative analysis of the optimal results based on various forecasting methods.

Table 2: The optimal results after selecting independent variables by the Bootstrap method based on NPLS.

The minimum NMSE in Table 2 is larger than that in Table 1, which indicates that a better NMSE cannot be achieved by using the Bootstrap test to select independent variables.