Multisensor Prognostic of RUL Based on EMD-ESN

This paper presents a prognostic method for RUL (remaining useful life) prediction based on EMD (empirical mode decomposition)-ESN (echo state network). The combination method adopts EMD to decompose the multisensor time series into a set of IMFs (intrinsic mode functions), which are then predicted by ESNs, and the outputs of the ESNs are summed to obtain the final prediction value. EMD decomposes the original data into simpler portions, and during the decomposition much noise is filtered out, which makes the subsequent prediction much easier. The ESN is a relatively new type of RNN (recurrent neural network), which substitutes the hidden layers with a reservoir that remains unchanged during the training phase. This characteristic makes the training time of ESN much shorter than that of traditional RNNs. The proposed method is applied to the turbofan engine datasets and is compared with LSTM (Long Short-Term Memory) and ESN. Extensive experimental results show that the prediction performance and efficiency are much improved by the proposed method.


Introduction
Prognostics is an engineering discipline concerned with predictive diagnostics, including calculation of the remaining useful life based on the observed system condition [1]. As machines are getting increasingly complicated, prognostics for complex systems has attracted significant interest [2]. RUL prediction enables identifying problems at an early stage, which helps industries make effective maintenance plans and reduce maintenance costs [1,3]. RUL prediction techniques have been adopted for various industrial objects, including lithium-ion batteries [4] and turbofan engines [5].
In general, RUL prediction methods are typically classified as physics-model based approaches, data-driven approaches, and hybrid approaches, and data-driven methods can be further divided into statistical model based approaches and AI approaches [6,7]. Among them, approaches based on a physical model [8,9] generate an explicit mathematical model of the degradation processes of machinery. The modeling process tends to be difficult because in-depth knowledge and extensive experimentation are required [7]. Thus, these methods are less used [6]. Compared with physics-model based approaches, data-driven approaches are easier to implement. They often extract features reflecting the failure by mining historical data. Statistical model based approaches such as AR models [10] and random coefficient models [11] describe the degradation process by statistical or stochastic models. These methods can depict the uncertainty of the degradation process and its influence on RUL prediction [6]. However, they are not good at modeling nonlinear systems, whereas AI approaches such as ANNs [12,13], NF systems [14], and SVM [15] can deal with this issue. They perform RUL prediction through feature extraction and data training without building a specific model. In recent years, research on AI approaches for RUL prediction has gained more and more attention. Hybrid approaches combine physics-model based approaches and data-driven approaches to take advantage of both.
RNN (recurrent neural network) is a kind of AI approach for RUL prediction, and it performs well on forecasting tasks for time series data [16]. However, the use of RNNs is limited by the "fading memory" problem caused by exploding or vanishing gradients. A variant of RNN called LSTM (long short-term memory) was proposed to address this [17,18]. Much research based on LSTM has been carried out, with good results [19][20][21][22].
Nevertheless, traditional RNNs suffer from heavy computational burdens and long training times because numerous parameters need to be trained. ESN (echo state network) provides a solution to this problem [16]. ESN was first proposed in 2001 [23] and is a relatively new type of RNN. ESN has a randomly generated reservoir, which remains unchanged during the training phase; only a readout is trained [24]. This characteristic makes ESN consume much less training time than traditional RNNs such as LSTM. In-depth research on RUL prediction based on ESN still needs to be carried out [16]. Most existing research concentrates on optimizing the architecture and parameters of ESNs to obtain better prediction results [25,26].
However, existing research pays little attention to the analysis and mining of the original data. To deal with this, decomposing the raw time series into subsequences is an effective solution [27][28][29][30]. Reference [28] proposed an SDA (secondary decomposition algorithm) to decompose the original wind speed data into detailed components twice. The decomposition algorithms include WPD (wavelet packet decomposition) and FEEMD (fast ensemble EMD).
Then an Elman neural network performed the prediction. In [29], to mitigate the problem that the model cannot handle environmental factors, a hybrid EMD-based prediction model was proposed for wind speed and solar irradiation forecasts. In [30], for chaotic time series prediction, PE (permutation entropy) was adopted to analyze the complexity of the IMFs decomposed by EEMD (ensemble EMD). The decomposed IMFs with similar complexities were combined, so fewer inputs for prediction were needed. Then, the ESNs performed the prediction and the ultimate results were assembled.
Although the combination of signal decomposition and ESN has performed well for forecasting problems, few related studies have been conducted in recent years. As far as the authors know, it has not yet been applied to RUL prediction. In most cases, RUL prediction is based on multisensor data, and a large amount of noise needs to be filtered out, which signal decomposition methods can handle well. Inspired by this, this paper proposes a novel combination method for RUL prediction based on EMD and ESN. EMD smooths the noisy data and decomposes the time series into a set of IMFs, which are then predicted by ESNs. The outputs of the ESNs are summed to form the final RUL prediction results. Besides, data processing methods including GBDT feature selection and Kalman filtering are also adopted. The proposed method is verified through a case study on aircraft turbofan engines, whose data is collected from multiple sensors under variable operating conditions. The proposed method is compared with ESN and LSTM [19], and the results demonstrate its superiority.

The contributions of this paper are as follows: (1) We introduce the idea of signal decomposition to multisensor data processing for RUL prediction. The adoption of EMD has a good effect on noise disposal, which provides a reference solution for reducing noise in the RUL prediction problem. (2) We propose a combination method of EMD and ESN for RUL prediction, which provides a new idea for RUL prediction methods based on ESN. (3) We conduct extensive experiments on the C-MAPSS datasets, adopting data processing techniques such as GBDT feature selection and Kalman filtering. The results show that our proposed method performs well and is effective for RUL prediction.

The rest of this paper is organized as follows. Section 2 elaborates the methodology and presents our method. Section 3 demonstrates our experimental process.
Meanwhile, a comparison of models and a results analysis are also given. In Section 4, conclusions and directions for future work are drawn.

Empirical Mode Decomposition.
The EMD [31] is the core component of the Hilbert-Huang transform. It is an adaptive time-frequency analysis method for processing nonlinear and nonstationary time series. EMD can decompose a time series into a group of IMFs without prior analysis. An IMF must satisfy two conditions: (1) Throughout the data segment, the number of extreme points and the number of zero crossings must be equal or differ by no more than 1; (2) the mean of the upper envelope formed by the local maxima and the lower envelope formed by the local minima is zero. The detailed steps of the EMD decomposition process are described below.
(1) Assume the time series to be analyzed is x(t). Extract all the local maxima and minima. (2) Fit the maxima and the minima by cubic spline interpolation, respectively, to form the upper envelope e_max(t) and the lower envelope e_min(t). The mean of the two envelopes is

m(t) = (e_max(t) + e_min(t)) / 2.

(3) Subtract the envelope mean m(t) from the original time series x(t), and record the result as y1(t) = x(t) − m(t).
(4) Determine whether y1(t) is an IMF component according to the judgement criteria of the two conditions above. If y1(t) does not meet the conditions, treat y1(t) as the original data and repeat the above steps until it does. At that point, take c1(t) = y1(t) as the first IMF component of the time series x(t).
(5) Take the remaining data r1(t) = x(t) − c1(t) as the original time series and decompose it k times by the same procedure to obtain the kth-order IMF and a residual. The stopping criteria include the following: (i) the residual r(t) is a monotone function; (ii) the kth IMF or the residual r(t) is smaller than a specified threshold; (iii) the numbers of zero crossings and extrema differ by at most 1. (6) Finally, x(t) can be described as

x(t) = Σ_{k=1}^{n} c_k(t) + r(t),

where c_k(t), r(t), and n represent the kth IMF, the residual, and the number of IMFs, respectively.
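The sifting procedure above can be sketched in Python. This is a minimal illustration only, with simplifying assumptions of ours: a fixed number of sifting iterations instead of a formal stopping criterion, envelopes anchored at the series endpoints, and `max_imfs`/`sift_iters` as hypothetical parameters — not the implementation used in the paper.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _envelope_mean(y):
    """Mean of the cubic-spline upper/lower envelopes; None if too few extrema."""
    n = len(y)
    maxima = [i for i in range(1, n - 1) if y[i - 1] < y[i] >= y[i + 1]]
    minima = [i for i in range(1, n - 1) if y[i - 1] > y[i] <= y[i + 1]]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(n)
    # Simple boundary treatment: anchor both envelopes at the endpoints.
    upper = CubicSpline([0] + maxima + [n - 1],
                        [y[0]] + [y[i] for i in maxima] + [y[-1]])(t)
    lower = CubicSpline([0] + minima + [n - 1],
                        [y[0]] + [y[i] for i in minima] + [y[-1]])(t)
    return (upper + lower) / 2.0

def emd(x, max_imfs=8, sift_iters=10):
    """Decompose x so that x == sum(imfs) + residual (steps (1)-(6) above)."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        h = residual.copy()
        for _ in range(sift_iters):        # sifting: repeatedly remove envelope mean
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residual = residual - h
        if _envelope_mean(residual) is None:  # residual is (near-)monotone: stop
            break
    return imfs, residual
```

By construction the components sum back to the original series, which is the property the combination model relies on when the per-IMF predictions are summed.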

Basic Structure of ESN.
ESN is a relatively new type of RNN, which replaces the hidden layers of an RNN with a reservoir [23]. It is able to capture a system's dynamic behaviour and has intrinsic memory properties [16]. ESN uses the reservoir to map the input space to a feature space, in which neurons are connected with each other randomly and sparsely. The input, feedback, and internal reservoir connection matrices are generated at random and remain unchanged during the training process, so the only variable that needs to be trained is the output matrix, which realizes computational savings. As shown in Figure 1, an ESN consists of an input layer of K units, a reservoir of N internal units, and an output layer of L units. The directed arrows represent the weight connections between neurons. The solid arrows indicate required connections, while the dashed arrows denote connections that are possible but not required. At time step t, the input units are u(t) = (u_1(t), u_2(t), ..., u_K(t))^T, the internal units are x(t) = (x_1(t), x_2(t), ..., x_N(t))^T, and the output units are y(t) = (y_1(t), y_2(t), ..., y_L(t))^T. The input weight matrix W_in ∈ R^{N×K} holds the weights from the input units to the internal units, the internal weight matrix W ∈ R^{N×N} holds the weights between the internal units, and the feedback weight matrix W_back ∈ R^{N×L} holds the weights from the output units to the internal units. Activation functions such as the sigmoid and the hyperbolic tangent are adopted in the reservoir and the output layer. At time step t, the reservoir state is updated as

x(t) = f(W_in u(t) + W x(t − 1) + W_back y(t − 1)),

where f = (f_1, f_2, ..., f_N) are the activation functions of the internal units. The ESN's output equation is

y(t) = f_out(W_inout u(t) + W_out x(t) + W_outout y(t − 1)),

and if we use W_out_gen = [W_inout, W_out, W_outout] to denote all the connections to the output neurons, the formula can be written as

y(t) = f_out(W_out_gen [u(t); x(t); y(t − 1)]),

where f_out = (f_out_1, ..., f_out_L) are the activation functions of the output layer, often chosen to be linear. Moreover, [u(t); x(t); y(t − 1)] is the concatenation of the input state, the internal state, and the output state of the previous time step.
The dimension of the combined output weight matrix W_out_gen is L × (K + N + L). Thus, we can use ESN = {W_in, W_inout, W_r, W_out, W_outout, W_back} to describe the ESN, where the weights W_inout, W_outout, and W_back are optional. All the weights except the output weights W_out_gen are initialized randomly and remain unchanged during the training process. Only the weights W_out_gen need to be trained.
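As a concrete illustration of the state-update equation, the sketch below builds a small reservoir and iterates x(t) = tanh(W_in u(t) + W_r x(t−1)). The sizes, scaling constants, and variable names here are illustrative choices of ours, not the paper's settings, and the optional feedback term W_back y(t−1) is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 10, 100                            # input and reservoir sizes (illustrative)

W_in = rng.uniform(-0.5, 0.5, (N, K))     # input weights: random, never trained
W_r = rng.uniform(-0.5, 0.5, (N, N))      # internal weights: random, never trained
W_r[rng.random((N, N)) > 0.1] = 0.0       # keep ~10% of connections (sparsity D ≈ 0.1)
W_r *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_r)))   # rescale spectral radius to 0.9 < 1

def update_state(x_prev, u):
    """One reservoir step: x(t) = tanh(W_in u(t) + W_r x(t-1))."""
    return np.tanh(W_in @ u + W_r @ x_prev)
```

Because tanh saturates, the reservoir states stay bounded in (−1, 1) regardless of the input scale, and the spectral-radius rescaling is the usual way to aim for the echo state property.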

Key Parameters of ESN
(1) Reservoir size (N): The reservoir size N is the number of neuron units in the reservoir, which is very important and has a great influence on the performance of the ESN [32]. If N is too small, the network may not fit the expected output fully, while a too-large reservoir may cause overfitting and computational cost. The choice of N should take the complexity and effectiveness of the network into account to meet the specific task requirements. (2) Spectral radius (ρ(W_r) or SR): The spectral radius is the maximum absolute eigenvalue of the internal weight matrix W_r of the reservoir. It determines the memory capacity of the ESN. If it is too small, previous inputs have little effect on the present output and the reservoir has poor memory. If it is too large, the reservoir state may become unstable during iteration. The condition ρ(W_r) < 1 is necessary but not sufficient for the ESP (echo state property); nevertheless, it guarantees the ESP in most practical cases [33]. The ESP ensures that the network state is uniquely determined by the input history [26].

Mathematical Problems in Engineering
(3) Sparsity (D): Unlike most neural networks, the neurons in the reservoir are not fully connected, and the sparsity denotes the degree of sparseness of the connections in the reservoir. In general, D is selected around 0.1 to guarantee that the reservoir has enough dynamic properties [34]. When D is 1, the neurons in the reservoir are fully connected and the ESN degenerates into a traditional RNN. (4) Input scaling (α_W_in): The input scaling factor α_W_in applies a scaling transform to the input weight matrix W_in and influences the degree of linearity of the responses of the reservoir units [35]. The larger α_W_in is, the more nonlinear the response. It is usually chosen within the interval [0, 1].

Training Steps of ESN.
Assume that a training sample is (u(t), y(t)), which contains t time steps. The general training steps of the ESN are described below.
(1) Initialize the parameters of the ESN. The weights W_in, W_r, and W_out are randomly initialized first. To satisfy the requirement of the ESP (echo state property), W_r is scaled by the scaling factor α_W_r so that ρ(W_r) < 1. The state variables x(0) also need to be initialized.
(2) Update and collect the internal states of the reservoir. The internal states are updated according to equation (4), driven by the input signal u(t) and the internal state of the previous time step x(t − 1). The state variables before time step k_min − 1 are discarded to eliminate the influence of the initial state on network performance, which is called the washout phase. The state variables from k_min onward are collected.
(3) Compute the output weights. The output weights W_out are computed according to equation (5) to minimize the target function.

Combination Model Based on EMD-ESN.
A combination model based on EMD and ESN is adopted in this paper, comprising EMD decomposition, prediction by the ESNs, and summation of the separate outputs. The process is demonstrated in Figure 2.
First, the raw time series is denoised to obtain cleaner data, which is then decomposed by EMD into IMFs adaptively. The IMFs are arranged from high to low frequency, and the complexity of each IMF is greatly reduced; thus the analysis and modeling of each IMF are much easier.
Second, we use multiple ESNs to train on and predict the decomposed IMFs. The training steps are illustrated in Section 2.2.3. A sampling window is used to construct the samples of the training and testing datasets, which are taken as inputs to train the ESNs. Each ESN gives an output.
Finally, the outputs of the ESNs are assembled to obtain the final prediction result of the RUL.
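Structurally, the combination model is "decompose, predict each component, sum." The sketch below shows only that wiring; `decompose` and the entries of `predictors` are stand-ins for EMD and the trained ESNs, which are not implemented here.

```python
import numpy as np

def emd_esn_predict(series, decompose, predictors):
    """EMD-ESN skeleton: split the series into components, run one
    predictor per component, and sum the per-component outputs."""
    imfs, residual = decompose(series)
    components = imfs + [residual]
    assert len(predictors) == len(components)   # one ESN per IMF plus one for the RES
    outputs = [predict(c) for predict, c in zip(predictors, components)]
    return np.sum(outputs, axis=0)
```

The design choice worth noting is that the final prediction is a plain sum, mirroring the fact that EMD's components sum back to the original series.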

Experimental Setup
The dataset used to support the findings of this paper has been deposited in the "Turbofan Engine Degradation Simulation Data Set," NASA Ames Prognostics Data Repository [36], which contains simulated degradation data for turbofan engines [37]. There are 4 subdatasets in the C-MAPSS dataset, denoted FD001, FD002, FD003, and FD004. Each contains multivariate time series data under different operational conditions and fault modes and is separated into a training set and a testing set, as shown in Table 1. Each row is a snapshot of the run-to-failure records taken within a single time cycle. It contains 26 columns, which represent an engine ID, the current operational cycle number, 3 operational settings, and 21 sensor values, respectively. Each engine unit's initial state is unknown but is considered healthy. As the operational cycle number increases, the engine units begin to degrade at some point, which is also unknown. For the training set, the engines run until a failure occurs, so the full degradation history is available, while for the test set, the records end some time prior to failure. The goal is to predict the number of remaining operational cycles (RUL) for the test set to verify our proposed method.

Data Preprocessing
(1) Sensor Data Selection and Normalization. There are 21 sensor measurements in the C-MAPSS dataset. However, not all variables are closely related to RUL prediction. We analyze the correlation between the sensor data and the RUL values to select important features using GBDT (gradient boosted decision trees). GBDT is an ensemble learning algorithm with the decision tree as its basic estimator. It performs well for feature selection and can output the relative importance of the features, which is shown in Figure 3. The results are ranked by relevance from high to low, and the top 10 sensor measurements are selected as the data to be analyzed next.
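With scikit-learn, this selection step looks roughly as follows. The synthetic data here only stands in for the 21 sensor channels (columns 2 and 7 are made relevant by construction); the estimator settings are illustrative, not the paper's.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))                    # stand-in for 21 sensor channels
rul = 3.0 * X[:, 2] - 2.0 * X[:, 7] + rng.normal(scale=0.1, size=500)

gbdt = GradientBoostingRegressor(n_estimators=100, random_state=0)
gbdt.fit(X, rul)
ranking = np.argsort(gbdt.feature_importances_)[::-1]   # most relevant sensor first
top10 = ranking[:10]                                    # keep the 10 best sensors
```

On real C-MAPSS data the same `feature_importances_` ranking would be used to pick the 10 sensors carried forward to decomposition and prediction.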
As different sensors generate values in different ranges, min-max normalization is used to normalize the original data and limit it to the interval [0, 1]:

x'_i,j(t) = (x_i,j(t) − min(x_i,j)) / (max(x_i,j) − min(x_i,j)),

where x_i,j(t) is the original measurement of the jth sensor of unit i, min(x_i,j) and max(x_i,j) are the minimum and maximum values of that sensor for unit i, and x'_i,j(t) is the normalized value.
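The per-unit normalization formula translates directly to NumPy; a small sketch (the function name is ours):

```python
import numpy as np

def minmax_normalize(unit_data):
    """Scale each sensor column of one engine unit into [0, 1]
    using that unit's own per-sensor min and max."""
    lo = unit_data.min(axis=0)
    hi = unit_data.max(axis=0)
    return (unit_data - lo) / (hi - lo)
```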
(2) Samples Constructed through a Time Window. To construct training samples for the ESNs, a time window is slid along the processed sensor data, and the sensor values inside each window form one sample.

(3) Evaluation Metrics. The scoring function is defined as

Score = Σ_{i=1}^{n} s_i, with s_i = e^{−h_i/13} − 1 for h_i < 0 and s_i = e^{h_i/10} − 1 for h_i ≥ 0,

where n is the total number of testing data samples and h_i = RUL'_i − RUL_i (for the ith data point, estimated RUL − true RUL). The function gives more penalty to overestimation than to early predictions, since in a real industrial scene late maintenance may result in more severe consequences than untimely maintenance. Root Mean Square Error (RMSE), defined as

RMSE = sqrt((1/n) Σ_{i=1}^{n} h_i^2),

is a widely used evaluation metric for RUL estimation and gives equal penalty to early and late predictions. The difference between the scoring function and RMSE can be seen in Figure 5.
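The window construction and the two metrics can be sketched as follows. The 13 and 10 time constants are the standard C-MAPSS scoring values; the function names and the window-labeling convention (label = RUL at the window's last cycle) are our own illustrative choices.

```python
import numpy as np

def sliding_windows(sensors, rul, length=30):
    """Slide a fixed-length window over one unit's sensor data; each
    window's label is the RUL at its last cycle."""
    X = np.array([sensors[i:i + length] for i in range(len(sensors) - length + 1)])
    y = np.array(rul[length - 1:])
    return X, y

def score(h):
    """Asymmetric scoring of errors h_i = estimated RUL - true RUL:
    late predictions (h >= 0) are penalized more heavily than early ones."""
    h = np.asarray(h, dtype=float)
    return float(np.sum(np.where(h < 0, np.exp(-h / 13.0), np.exp(h / 10.0)) - 1.0))

def rmse(h):
    """Root mean square error: equal penalty for early and late predictions."""
    h = np.asarray(h, dtype=float)
    return float(np.sqrt(np.mean(h ** 2)))
```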

RUL Target Function.
In practical prognostic applications, the accurate RUL values of turbofan engines are unknown, unlike our simulation data, which contains the specific RUL value at each cycle [22]. In a system, the degradation of the equipment is negligible for an initial period of time, until it has run for a certain duration. So the initial state of the engines is often taken as healthy, and a piecewise linear RUL target function is adopted for prediction. As shown in Figure 6, the maximum RUL value is capped at a constant 130 cycles [19,38]: degradation in the initial cycles is neglected, and the RUL target declines linearly once the true remaining life falls below 130 cycles.
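The piecewise linear target is easy to generate per run-to-failure training unit; a minimal sketch (the function name is ours):

```python
import numpy as np

def piecewise_rul(total_cycles, max_rul=130):
    """RUL target for one run-to-failure unit: the true remaining life,
    capped at max_rul during the (assumed healthy) early cycles."""
    true_remaining = np.arange(total_cycles - 1, -1, -1)  # cycles left before failure
    return np.minimum(true_remaining, max_rul)
```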

Prognostic Procedure.
The prognostic procedure is depicted in Figure 7. First, the raw data is preprocessed, including Kalman filtering, data normalization, and sensor data selection, to obtain cleaner data. Then, training samples are constructed by sliding a time window over the processed data. Next, the time series is decomposed by EMD to obtain multiple IMFs and a RES, which are then trained by the same number of ESNs. Finally, the output of each ESN is added up to get the final prediction result.
The experimental procedure follows. The proposed method is employed on the C-MAPSS data set and is compared with LSTM [17] and the basic ESN. Finally, a results analysis is given.

(1) The Time Window Length Selection Experiment. An experiment is carried out to select the most appropriate time window length for constructing training samples. Figure 8 shows the relation between the time step and the RMSE. As the time step increases, the RMSE trends generally downward. This indicates that, to a certain degree, the more historical data the training samples contain, the more accurate the prediction is. The minimum number of cycles in the test dataset is only 31, so we choose 30 as the time window length in this paper. Besides, when the length grows beyond 30, the decline in RMSE is marginal. In other research based on the C-MAPSS data set, the time window length is also chosen to be no more than 30 [19,21,22,25,26].
(2) The EMD Decomposition. After feature extraction by GBDT and sampling by the time window, the 10 selected sensor measurements are decomposed by EMD. The decomposition result for the training set of FD001 is shown in Figure 9. 4 IMFs and a RES are generated, each carrying simpler information, and together they can be recombined into the original time series. The decomposition results are then taken as the inputs of the ESNs. Much noise is filtered out during the decomposition process, which is beneficial for prediction improvement.
(3) The ESN Training Experiment. As depicted in Sections 2.2.2 and 2.2.3, to train the ESN we need to set some parameters: the reservoir size N, which influences the dynamics of the system; the spectral radius SR, which decides the ESP (echo state property) of the reservoir; the sparsity D, which represents the connection degree of the neurons; and the input scaling α_W_in, which scales the input signal. Besides, in this paper, a regularization coefficient λ is used to prevent overfitting. Using a grid search strategy, the parameters of the ESN are set to the values in Table 2.
The training process of the ESN can be divided into two subprocesses: preliminary training and weight calculation. The preliminary training process eliminates the effects of the initial network state on training. The weight calculation process computes W_out as illustrated in Section 2.2.3 and is treated as a linear regression problem. The learning algorithm adopted in this paper is RR (ridge regression). The hyperbolic tangent function is chosen as the activation function, expressed as

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}).

Its response curve is demonstrated in Figure 10. This activation gives the network good nonlinearity.
The overall prediction results are shown in Figures 11 and 12, which demonstrate the predictions of ESN and EMD-ESN, respectively. As we can see in the figures, both models acquire good results. However, the prediction accuracy is enhanced by EMD-ESN. The occurrences of overprediction are also reduced, which is desirable in PHM (prognostics and health management) applications because delayed maintenance may cause major damage and cost. Besides, it can be observed that when the actual RUL values are relatively small, overprediction is more likely to happen.
One of the reasons for the problem may be that the amount of historical data is small.

Results Comparison and Analysis.
In the experiment, the prediction performance and efficiency of EMD-ESN, LSTM, and ESN are compared. The experimental results and analysis are discussed below.
(1) Results Comparison of Prediction Performance. The prediction performance of the different models is given in Table 3. The performance metrics are Score and RMSE, which are described in Section 3. As we can see in Table 3, the EMD-ESN method proposed in this paper performs best on all four C-MAPSS datasets, for both Score and RMSE. For Score, the improvement (IMP) of our method over LSTM is up to 93.41% on FD002, and the IMP over ESN is up to 93.99% on FD003. For RMSE, the IMP over LSTM is up to 64.96% on FD004, and the IMP over ESN is up to 51.35% on FD003. We conclude that the proposed method improves the prediction performance to a large extent; when Score is taken as the metric, the improvement is particularly obvious.
For RMSE, the ESN also performs better than the LSTM on all the datasets. However, for Score the LSTM performs better than the ESN on FD001 and FD003. This indicates that the ESN tends to overpredict, especially on FD001 and FD003, which contain just 1 operational condition. Unlike the ESN, our proposed method improves the overprediction situation, which can also be seen by comparing Figure 12 with Figure 11: the overpredictions in Figure 12 are obviously fewer than those in Figure 11. Besides, there is a substantial improvement for the EMD-ESN on FD004, the most complicated dataset, containing 6 operating conditions and 2 fault modes. All of this shows that the EMD-ESN can filter out redundant signals and noise, in which EMD plays a crucial role. It suggests that signal decomposition technology can be adopted in RUL prediction to deal with noise.
(2) Results Comparison of Prediction Efficiency. In addition to prediction performance, training time is also compared, and the results are shown in Table 4. As we can see in Table 4, the training times of the ESN and the EMD-ESN are significantly shorter than that of the LSTM, which means fewer computing resources are required. Although the EMD-ESN takes more training time than the ESN, because the EMD decomposition consumes a certain amount of time, the prediction improvement justifies it, especially for data with much noise. This shows the necessity of adopting EMD to decompose the raw sensor data.
(3) Sample Examples of RUL Prediction. Figure 13 illustrates the RUL prediction of one sample engine unit from each dataset. We can see that the predictions for the four examples all perform well, with no obvious trend toward overprediction. It can also be seen that, for each unit, the earlier predictions are generally not as good as the later ones. This can be explained by the fact that as the training data accumulates, the accuracy becomes higher.
In this section, the prediction performance and efficiency of LSTM, ESN, and the proposed EMD-ESN are compared, and a results analysis is performed. Through the experiments, the superiority of our proposed method is verified.

Conclusions
Prognostics of RUL contributes to timely maintenance and lower costs. In this paper, we propose a combination method of EMD and ESN to predict the RUL. The proposed EMD-ESN method is applied to the turbofan engine multisensor time series, which demonstrates the validity, accuracy, and effectiveness of our method. The EMD decomposes the raw time series to obtain portions with simpler information, which can then be easily trained by the ESNs. The output of each ESN is summed to get the final prediction value. Compared with the LSTM and the basic ESN method, the prediction performance of the EMD-ESN method is greatly improved, especially for data with much noise. Besides, the computing time is largely reduced.
However, there is still much research to be conducted, and some directions are as follows: (1) Although the adoption of EMD improves the prediction performance, it also consumes more time. It is necessary to study how to improve prediction accuracy and save computing resources at the same time.
(2) EMD is one of the time-frequency signal processing methods, and there are many related methods such as wavelet packet decomposition. The adoption of related methods for noisy data processing is also worthy of further study.

Data Availability
The dataset used to support the findings of this paper has been deposited in the "Turbofan Engine Degradation Simulation Data Set," NASA Ames Prognostics Data Repository, which contains simulated degradation data for turbofan engines developed by NASA. The dataset can be found at https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.