Accurate prediction of the remaining useful life (RUL) of important components plays a crucial role in system reliability, which is the basis of prognostics and health management (PHM). This paper proposes an integrated deep learning approach for RUL prediction of a turbofan engine by integrating an autoencoder (AE) with a deep convolutional generative adversarial network (DCGAN). In the pretraining stage, the reconstructed data of the AE not only contribute to its own reconstruction error but also serve as the generated data in the DCGAN parameter training. Through double-error reconstruction, the feature extraction capability is enhanced, and high-level abstract information is obtained. In the fine-tuning stage, a long short-term memory (LSTM) network is used to extract the sequential information from the features to predict the RUL. The effectiveness of the proposed scheme is verified on the NASA commercial modular aero-propulsion system simulation (C-MAPSS) dataset. The superiority of the proposed method is demonstrated via excellent prediction performance and comparisons with existing state-of-the-art prognostic methods. The results of this study suggest that the proposed data-driven prognostic method offers a new and promising prediction approach and an efficient feature extraction scheme.
As the demand for reliability and efficiency in maintenance grows, prognostics and health management (PHM) has received significant attention. PHM not only decreases the accident rate and prolongs the lives of devices by replacing old or broken components earlier but also avoids wasting resources by canceling unnecessary maintenance activities [
Generally, the methods of dealing with RUL prediction problems can be categorized into model-based approaches, sensor-based data-driven approaches, and hybrid approaches. For model-based methods, it is difficult to model extremely complicated systems, such as aircraft systems. Moreover, model-based methods require a large amount of prior knowledge and expertise, which limits their effectiveness. For sensor-based data-driven approaches, the availability of sufficient information is the necessary condition to maximize their powerful processing capability. Fortunately, it is quite common in the current era for a large number of sensors to be installed to monitor the operational behaviors of a system. These data records are historical observations that can be exploited as useful information. Hybrid methods usually combine the two aforementioned approaches. However, it remains very challenging to utilize the advantages and avoid the disadvantages of both. Therefore, the method adopted in this paper is a data-driven approach.
Currently, most data collected in real-life PHM applications are high-dimensional. Due to their complicated environment, monitoring data are subjected to several operating conditions and fault modes, increasing the inherent degradation complexity and the difficulty of directly discovering clear trends in the input data for the prognostic algorithm. To cope with this issue, feature extraction is a necessary procedure to capture useful information from high-dimensional data efficiently [
The CNN was first proposed by LeCun et al. for image processing [
An autoencoder neural network [
To extract more degradation-related features, a deep CNN is embedded in the AE as the basic neural network architecture. It is difficult for the AE to extract deeper abstract information in a single-error reconstruction process. To strengthen the feature extraction capability of the AE, it is used as a generator to participate in the training process of the DCGAN, and its parameters are trained and optimized again. Through double-error reconstruction, the high-level abstract representation is captured, revealing the underlying correlations and causalities in the collected sensor data. Generative adversarial networks (GANs) were proposed by Goodfellow et al. [
As one of the most complex systems, aircraft systems have always been a focus of health monitoring. The engine is one of the most important components for determining the health and life of an aircraft. Hence, there is always a pressing need to develop new approaches to better evaluate engine performance degradation and estimate its RUL. Our work meets this need by proposing a new deep learning model.
In this paper, the time window approach is employed to prepare samples to conduct better feature extraction via the DCGAN based on an AE pretraining model. Raw sensor measurements with normalization are directly used as inputs to the proposed model, and no prior expertise on prognostics or signal processing is required, which facilitates the industrial application of the proposed method. After high-level abstract features are extracted by the pretraining model, the associated RUL is estimated based on the learned representations via an LSTM. Through a double-nested error regression, more degradation-related features can be exploited in the pretraining stage, which is helpful for the whole algorithm to better understand the underlying degradation phenomena.
In view of the effectiveness of our proposed pretraining model, the proposed method is expected to obtain higher prognostic accuracy than other deep learning methods. A comprehensive analysis of the proposed approach and comparisons with existing methods are presented in this study. The results are verified on four different simulated turbofan engine degradation datasets from the publicly available commercial modular aero-propulsion system simulation (C-MAPSS) dataset produced and provided by NASA [

This paper innovatively integrates an AE and a DCGAN as a pretraining model, which greatly enhances the feature extraction ability. Through double-error reconstruction, the generated data are closer to the original data, so the intermediate features extracted by the encoder contain more useful information about the original data. Although a simple LSTM and fully connected neural networks (FNNs) are chosen for the fine-tuning stage, better prediction performance is still achieved, which proves the effectiveness of the proposed pretraining model. As a feature extraction framework, it is suitable not only for engine datasets but also for other datasets, and this work could provide a new perspective for studying unsupervised feature representation methods. The proposed algorithm achieves better RUL prediction performance than the comparative algorithms under several operating conditions and fault modes and is thus appropriate for RUL prediction. Higher prediction accuracy allows an enterprise to arrange maintenance activities in advance, which improves the reliability of the system and the economy of the enterprise.
The remainder of this paper is structured as follows. Related work on RUL prediction is introduced in Section
The C-MAPSS dataset has been extensively used to evaluate the effectiveness of deep learning algorithms for RUL estimation. This section reviews the most recent studies conducted using the C-MAPSS dataset. Then, the proposed method is briefly introduced, which is compared with these studies in a later section.
In most PHM applications, sensor data are easy to obtain for intelligent machine health monitoring. Sequential data are the common format of input data. In deep learning, the recurrent neural network (RNN) [
To deal with sequential information more effectively, a CNN can be used to extract abstract features before LSTM layers. Although CNNs have performed excellently on computer vision tasks, such as object recognition [
It is easier to capture the latent representation of the original data in the form of a pretrained model. Thus, a semisupervised learning method is suitable for RUL estimation. Ellefsen and colleagues [
To improve the efficiency of solving problems, different deep learning tools are combined. Yu et al. obtained one-dimensional HI values from sensor data via the bidirectional RNN-based autoencoder, which represents the degradation patterns of the units of the system. Then, they used the similarity-based curve matching technique to estimate the RUL [
Recent deep learning (DL) approaches proposed for RUL prediction on the C-MAPSS dataset (2016–2019).
Authors and references | Year | Approach |
---|---|---|
Babu et al. [ | 2016 | CNN + FNN |
Zhang et al. [ | 2016 | MODBNE |
Zheng et al. [ | 2017 | LSTM + FNN |
Li et al. [ | 2018 | CNN + FNN |
Yu et al. [ | 2019 | BiLSTM-ED |
Ellefsen et al. [ | 2019 | RBM + LSTM |
As stated previously, there is great potential to improve the RUL estimation accuracy by extracting intermediate representations. This paper proposes a new double-nested pretraining model that enhances the quality of the extracted intermediate representation. As shown in Figure
The architecture of semisupervised learning.
The specific flowchart of the proposed method is illustrated in Figure
Flowchart of the proposed architecture for prognostics.
The monitoring data of complex systems, such as engine data, are high-dimensional. Due to the influence of various operational conditions and fault modes, it is difficult for a model to directly capture the hidden degradation trend in the data, which decreases its prediction accuracy. Therefore, it is necessary to perform high-level feature extraction on the data. The pretraining model proposed in this paper combines a DCGAN and an AE to form a double-nested feature extraction structure, which greatly improves the quality of the extracted features and thus the prediction accuracy of the model.
Aircraft engines are typical complex systems. C-MAPSS is the benchmark dataset for evaluating RUL prediction performance for aircraft engines. It includes software-simulated data of the failures and degradation of large commercial turbofan engines under different operating conditions. Our comprehensive experiments and comparisons with recently proposed RUL estimation algorithms developed on this dataset show the superiority of the proposed method. The details of the comparison are shown in a later section.
This section introduces the necessary components of the proposed model architecture. First, the main deep learning tools are introduced, including the CNN, autoencoder, LSTM, and DCGAN. Next, the architecture of the proposed model is elaborated.
The CNN was first proposed by LeCun for image processing, and the network has three characteristics, i.e., local receptive fields, tied weights, and spatial subsampling [
In this study, the input data are prepared in a 2D format, where one dimension is the number of sensors and the other is the time sequence of each feature. Although the collected features come from different sensors, the relationship between spatially neighboring features in a data sample is not remarkable. Thus, the convolution filters in the first four layers of the proposed model are 1-dimensional (1D). In the following, the 1D CNN is briefly introduced.
First, the input sequential data are assumed to be
An autoencoder (AE) is a typical unsupervised learning method aimed to extract abstract representations from raw data, and it includes three essential parts: an encoder, representations, and a decoder. The input data
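The encoder–decoder mapping and the reconstruction objective described above can be written compactly as follows (a standard AE formulation; the symbols f, g, θ, and φ are generic and not taken from the original):

```latex
z = f_{\theta}(x), \qquad \hat{x} = g_{\phi}(z), \qquad
\mathcal{L}_{\mathrm{AE}}(\theta, \phi) = \lVert x - g_{\phi}(f_{\theta}(x)) \rVert_2^2
```

Minimizing this reconstruction loss forces the intermediate representation z to retain the information in x that matters most for reconstruction.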
An LSTM is a variant of an RNN that aims to address sequential data. LSTMs have achieved great success on speech recognition and machine translation [
The core structures of an LSTM cell are three nonlinear gating units. Forget gates control the forget rate of the last cell information and are denoted as
Input gates
Output gates
The candidate state values
The output gate,
Through these steps, the LSTM cell is updated in every step.
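The gate computations referenced above follow the standard LSTM formulation; for completeness, one common statement of the update equations is given below (W and b denote weight matrices and bias vectors, σ the logistic sigmoid, and ⊙ the elementwise product; the exact parameterization used in this paper may differ slightly):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\!\left(W_C [h_{t-1}, x_t] + b_C\right) && \text{(candidate state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state)} \\
o_t &= \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{aligned}
```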
The core idea of the GAN is the adversarial loss formulated by the generator model
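This adversarial loss takes the standard minimax form introduced by Goodfellow et al., in which the discriminator D and the generator G play a two-player game over the data distribution p_data and a noise prior p_z:

```latex
\min_{G} \max_{D} V(D, G) =
\mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] +
\mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```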
DCGANs have a more stable architecture than GANs via five improvements. First, the pooling layers are replaced with strided convolutions (discriminator) and fractional-strided convolutions (generator). Second, the batch normalization method is used in both the generator and the discriminator [
Temporal sequence data provide more information in comparison to a multivariate data point sampled at a single time step. In the proposed architecture, therefore, a sliding window strategy is adopted to use multivariate temporal information efficiently. The input of the proposed model is a 2D matrix
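As a minimal sketch of this sliding window preparation (the window size and stride below are illustrative, not the paper's settings), each engine's multivariate series can be cut into overlapping 2D samples:

```python
def sliding_windows(series, window_size, stride=1):
    """Cut a multivariate time series (one feature list per time step)
    into overlapping 2D windows of shape (window_size, n_features)."""
    if len(series) < window_size:
        return []
    return [series[i:i + window_size]
            for i in range(0, len(series) - window_size + 1, stride)]

# Toy example: 5 time steps, 2 sensors, window of 3 -> 3 overlapping samples
series = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [0.4, 0.7], [0.5, 0.6]]
windows = sliding_windows(series, window_size=3)
```

Each window then forms one 2D input sample for the model, with its RUL label taken from the window's last time step.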
The proposed architecture structure is shown in Figure
Considering that the input data are collected from different sensors, four convolutional layers with 1-dimensional convolution filters and zero paddings are stacked to extract the degradation information inside every sensor observation in the generator. The first four CNN layers consist of 10 filters (16 × 1). The relationship between the spatially neighboring features in the data sample is captured by three stride-2 convolutions with 18 × 4 filters. The ReLU function is used for the convolution layers. Seven total convolutional layers constitute the encoder of the AE model. The structure is shown in Figure
The architecture of the encoder.
To maintain the same size of the raw data
Note that the dropout technique is used in the first convolutional layer in the decoder and the first FNN to relieve overfitting [
In the supervised stage, two LSTM layers are used to reveal the hidden sequential features of the representation
Phase 1: DCGAN based on AE modeling
  Input: sliding-window training data
  Initialize: CNN layer parameters, batch size, learning rate
  repeat
    Generation loss = generator loss + reconstruction loss of the AE
    Discrimination loss = discriminator error on real data + discriminator error on generated data
    Update generator and discriminator parameters separately using the RMSprop optimizer
  until maximum iterations
  return trained DCGAN-AE model
end

Phase 2: Supervised learning stage
  Input: training data and RUL labels
  Initialize: LSTM layer parameters, FNN layer parameters, dropout rate
  repeat
    Extract representations with the trained encoder
    Conduct LSTM operations on the representations (dropout is employed to avoid overfitting)
    Use the FNN for RUL estimation
    Compute the loss between the predicted RUL and the label RUL
    Update parameters using Adam
  until maximum iterations
  return trained RUL prediction model
end
In the following experimental study, the performance of the proposed framework is evaluated. First, we introduce the C-MAPSS dataset, which has been adopted by many studies. Then, the details of the experimental setup are elaborated. Finally, comparison results are shown and discussed. All the experiments are run on an Intel(R) Core(TM) i7-8550U with 8 GB of RAM under the Microsoft Windows 10 operating system. The models are implemented in Python 3.5 with TensorFlow version 1.13.1 [
C-MAPSS is a dataset that simulates the effects of faults and deterioration under different operating conditions in the five main rotating components (fan, low-pressure compressor, high-pressure compressor, high-pressure turbine, and low-pressure turbine) found in a large commercial turbofan engine. The C-MAPSS dataset consists of 4 subsets, which are divided into training datasets and test datasets. Each subset includes 26 columns: the engine number, the operational cycle, three operational settings, and 21 sensor measurements. A description of the sensed engine variables can be found in Table
Variables of the C-MAPSS dataset.
Sensor data number | Description | Units |
---|---|---|
1 | Total temperature at the fan inlet | °R |
2 | Total temperature at the low-pressure compressor outlet | °R |
3 | Total temperature at the high-pressure compressor outlet | °R |
4 | Total temperature at the low-pressure turbine outlet | °R |
5 | Pressure at the fan inlet | psia |
6 | Total pressure in bypass-duct | psia |
7 | Total pressure at the high-pressure compressor outlet | psia |
8 | Physical fan speed | rpm |
9 | Physical core speed | rpm |
10 | Engine pressure ratio | — |
11 | Static pressure at the high-pressure compressor outlet (Ps30) | psia |
12 | Ratio of fuel flow to Ps30 | pps/psi |
13 | Corrected fan speed | rpm |
14 | Corrected core speed | rpm |
15 | Bypass ratio | — |
16 | Burner fuel-air ratio | — |
17 | Bleed enthalpy | — |
18 | Demanded fan speed | rpm |
19 | Demanded corrected fan speed | rpm |
20 | High-pressure turbine coolant bleed | lbm/s |
21 | Low-pressure turbine coolant bleed | lbm/s |
In addition, different subsets have different numbers of engines whose operational cycles vary. Each engine starts with different degrees of initial wear and manufacturing variation that are unknown and considered to be healthy. As the operating time increases, the engines start to degrade at some point. The degradation in the training datasets grows in magnitude until a failure occurs, while the degradation in the test datasets ends sometime prior to the occurrence of a failure; the remaining time is the RUL. The purpose of the proposed algorithm is to predict the RULs of the test datasets. To assess the prediction accuracy, the true RUL targets of the test datasets are provided.
The basic information of the datasets is given in Table
Details of the C-MAPSS dataset.
Dataset | FD001 | FD002 | FD003 | FD004 |
---|---|---|---|---|
Engines of the training set | 100 | 260 | 100 | 249 |
Engines of the test set | 100 | 259 | 100 | 248 |
Fault modes | 1 | 1 | 2 | 2 |
Operational modes | 1 | 6 | 1 | 6 |
Training samples (default) | 17,731 | 48,819 | 21,820 | 57,522 |
Testing samples | 100 | 259 | 100 | 248 |
In the training process, all the available engine measurements are used as the training samples, and the corresponding RUL labels obtained from a piecewise linear degradation model are regarded as the targets [
Sensors 1, 5, 6, 10, 16, 18, and 19 in subsets FD001 and FD003 exhibit constant measurements throughout the engine's lifetime and are therefore not important for RUL estimation. In addition, subsets FD001 and FD003 are subject to a single operating condition, so the three operational settings are excluded. Accordingly, sensors 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21 are used as the input features for subsets FD001 and FD003. However, to keep the output data generated by the decoder the same size as the input data, one of the constant sensor measurements must be retained. Any one of the constant sensors is sufficient; thus, sensor 19 is chosen in this paper.
The three operational settings cannot be excluded due to the six operating conditions in subsets FD002 and FD004. Only sensor 1 is dropped from the constant measurements to recover the input data in the generator. In fact, any one of the constant sensor measurements is sufficient.
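The constant channels described above can also be identified programmatically rather than by inspection; a minimal sketch (the toy readings and 0-based column indices are purely illustrative):

```python
def constant_columns(rows, tol=1e-12):
    """Return 0-based indices of columns whose values never vary across rows."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if all(abs(row[j] - rows[0][j]) <= tol for row in rows)]

# Toy example: columns 0 and 2 never change, so they carry no degradation signal
readings = [[1.0, 2.0, 3.0],
            [1.0, 5.0, 3.0],
            [1.0, 9.0, 3.0]]
flat = constant_columns(readings)
```

Columns flagged this way would be dropped from the inputs, except for the single constant channel the decoder needs to match the input size.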
It is obvious that FD001, FD002, and FD003 are particular cases of FD004 [
No datasets are merged with dataset FD001 since it is useful to test the generalization capacity of the model in the simplest case. Moreover, the algorithm needs to generalize to more complex cases, such as FD002 and FD004, in which all the subdatasets are used for unsupervised training. For dataset FD003, FD001 is merged with it for the sake of measuring the capability of dealing with multiple fault modes [
For each of the 4 subdatasets in C-MAPSS, the collected measurement data from each sensor are normalized to be within the range of [−1, 1] using the min-max normalization method:
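The standard min-max mapping to [−1, 1] is x' = 2(x − x_min)/(x_max − x_min) − 1, applied per sensor channel; a sketch (how test-set extrema are handled is an assumption here, as this excerpt does not specify it):

```python
def minmax_scale(values):
    """Scale one sensor channel linearly into [-1, 1] using its own extrema."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

# One channel of raw readings mapped into [-1, 1]
scaled = minmax_scale([0.0, 2.5, 5.0, 10.0])
```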
True RUL labels are not provided in the training sets; they are only provided at the last time step for each engine in the test sets. To construct labels for every time step for each engine in the training sets, a piecewise linear degradation model has been validated to be suitable and effective to label training datasets [
Illustration of the piecewise linear degradation function.
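A sketch of this piecewise linear labeling (the cap of 125 cycles is only one of the candidate values considered in this paper, used here for illustration):

```python
def piecewise_rul_labels(total_cycles, max_rul=125):
    """RUL label for cycles 1..total_cycles: held constant at max_rul while
    the engine is healthy, then decreasing linearly to 0 at failure."""
    return [min(max_rul, total_cycles - c) for c in range(1, total_cycles + 1)]

# An engine observed for 200 cycles: labels stay at 125, then decline linearly
labels = piecewise_rul_labels(200, max_rul=125)
```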
This piecewise linear RUL target function is the most common approach in previous studies. Based on previous studies, the choice of the initial constant RUL is mainly divided into four types, namely, 115, 120, 125, and 130. The experimental results of the four parameter settings are shown in Figure
Prediction performance of different RUL values on FD001.
For the sake of comparability with other algorithms, the same performance metrics are used to evaluate the prediction accuracy. The formulas for the scoring function (
Difference between RMSE and
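The two metrics can be sketched as follows; the asymmetric exponential scoring function shown here is the one commonly used for C-MAPSS (late predictions, d ≥ 0, are penalized more heavily than early ones) and should be checked against the paper's exact formula:

```python
import math

def score_and_rmse(y_true, y_pred):
    """C-MAPSS-style scoring function and RMSE over the test engines."""
    score = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t  # d > 0: late (dangerous) prediction, penalized more heavily
        score += math.exp(d / 10.0) - 1.0 if d >= 0 else math.exp(-d / 13.0) - 1.0
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    return score, rmse
```

Unlike RMSE, which is symmetric, the score grows faster when the model overestimates the RUL, reflecting the higher cost of missing an impending failure.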
First, the C-MAPSS subdatasets are preprocessed as mentioned above. The normalized data are sent to the initial DCGAN based on the AE pretrained model to extract high-level abstract representations, which are used to predict the RUL. Due to the large amount of data, minibatches are used to train the model. The value of the hyperparameter batch size has an impact on the prediction performance. We choose common values for the experiment (values: 128, 256, 512, and 1024). The results are shown in Figure
Prediction performances of different hyperparameters in FD001. (a) Prediction accuracy of different batch sizes in FD001. (b) Prediction accuracy of different hidden sizes in FD001.
Then, the extracted features are utilized by two LSTM layers and two FNN layers to predict the RUL. To achieve better prediction performance, four commonly used hidden node numbers are evaluated ((32, 16), (64, 32), (128, 64), and (256, 128)). When the hidden nodes of the LSTM and FNN are 64 and 32, respectively, the highest prediction accuracy is achieved (see Figure
The hyperparameter
Default parameters of the supervised architecture.
Architecture | Hidden size | Dropout | Activation function |
---|---|---|---|
First LSTM layer | 64 | 0.5 | tanh |
Second LSTM layer | 64 | 0.5 | tanh |
First FNN layer | 32 | 0.5 | ReLU |
Second FNN layer | 1 | 1.0 | Abs |
In the training procedure, each complete training subset is split into a training set and a cross-validation set. Fifteen percent of the total time windows in the training subsets are randomly selected for cross-validation. The remaining 85% of the total data are designated as the training sets. After the pretraining stage, the testing data samples are fed into the trained network for the RUL prognostics. Finally, the target RUL and prediction accuracy can be obtained.
In this section, the performance of the proposed deep learning approach is evaluated. First, the prediction results of four subsets are analyzed. Then, a comparison is conducted with other state-of-the-art methods to show the superiority of the proposed approach.
The RUL prediction results over the four datasets (i.e., FD001–FD004) are presented in Figures
Prediction for the last recorded data point of different testing engine units in FD001–FD004: (a) prediction for the 100 testing engine units in FD001; (b) prediction for the 259 testing engine units in FD002; (c) prediction for the 100 testing engine units in FD003; (d) prediction for the 248 testing engine units in FD004.
Three key points can be highlighted. First, the accuracy for engines with smaller RULs is noticeably higher, which is particularly important since a smaller RUL means a higher probability of a potential failure; maintenance activities can then be carried out in advance to avoid catastrophic failures. Second, the prediction error is greater in the early stage than in the late stage, especially when the machine is in a fresh, healthy state. This is because each engine starts with unknown degrees of initial wear and manufacturing variation, which increases the prediction error early in life. Third, the prediction accuracies shown in Figures
Studies that have reported results on all four subsets in the C-MAPSS dataset have been selected for comparison. Although the initial RUL values are somewhat different, the results are still comparable. As shown in Tables
RMSE comparison with the literature on the C-MAPSS dataset.
DL approach and references | FD001 | FD002 | FD003 | FD004 |
---|---|---|---|---|
CNN + FNN [ | 18.45 | 30.29 | 19.82 | 29.16 |
MODBNE [ | 15.04 | 25.05 | 12.51 | 28.66 |
LSTM + FNN [ | 16.14 | 24.49 | 16.18 | 28.17 |
CNN + FNN [ | 12.61 | 22.36 | 12.64 | 23.31 |
BiLSTM-ED [ | 14.74 | 22.07 | 17.48 | 23.49 |
RBM + LSTM [ | 12.56 | 22.73 | 12.10 | 22.66 |
Proposed architecture | 10.71 | 19.49 | 11.48 | 19.71 |
Score function comparison with the literature on the C-MAPSS dataset.
DL approach and references | FD001 | FD002 | FD003 | FD004 |
---|---|---|---|---|
CNN + FNN [ | 1287 | 13,570 | 1596 | 7886 |
MODBNE [ | 334 | 5585 | 422 | 6558 |
LSTM + FNN [ | 338 | 4450 | 852 | 5550 |
CNN + FNN [ | 274 | 10,412 | 284 | 12,466 |
BiLSTM-ED [ | 273 | 3099 | 574 | 3202 |
RBM + LSTM [ | 231 | 3366 | 251 | 2840 |
Proposed architecture | 174 | 2982 | 273 | 3874 |
As seen from Table
Unlike these algorithms, our model was first assessed through a parameter study only on subset FD001 to find suitable key parameters and was then applied directly to the other datasets without further tuning. More importantly, the proposed scheme shows good generalization capability on the other datasets when tuned only on subset FD001. Based on the comparison of the above two evaluation criteria, the proposed model greatly improves the prediction accuracy of the RUL of aircraft engines. However, for subsets FD002 and FD004, the inherent complexity of the data increases the difficulty of extracting high-level abstract features, so the prediction stability remains to be improved.
In this paper, we proposed and demonstrated a new deep learning approach, referred to as the DCGAN-based AE scheme, for RUL estimation from multivariate time-series sensor signals. The DCGAN and an AE are integrated in a pretraining stage to extract high-level abstract representations from the initial data, and then a fine-tuning stage that includes an LSTM and an FNN is used to predict the RUL. The improved results show that the pretraining model can capture the degradation trend of a fault, which means the proposed method can also be used as an efficient feature extraction scheme for other problems. Experiments on the popular C-MAPSS dataset show the superiority of the proposed model. Comparisons with several state-of-the-art approaches demonstrate its better prediction performance, which proves that the proposed data-driven prognostic method is effective and suitable for prediction problems.
While good experimental results were obtained by the proposed method, further optimization is still necessary. Improving the stability of the method for complex conditions is a further direction for future research. Moreover, efforts should be made to decrease the average training time for each subset in the future.
The dataset was provided by the Prognostics CoE at NASA Ames. The dataset is public and can be accessed at
The authors declare that there are no conflicts of interest regarding the publication of this paper.