Rotating Machinery Remaining Useful Life Prediction Scheme Using Deep-Learning-Based Health Indicator and a New RVM

Remaining useful life (RUL) prediction plays a significant role in condition-based maintenance and in improving the reliability and safety of machines. This paper proposes an RUL prediction scheme that combines a deep-learning-based health indicator with a new relevance vector machine (RVM). First, both one-dimensional time-series information and two-dimensional time-frequency maps are input into a hybrid deep-learning network consisting of a convolutional neural network (CNN) and a long short-term memory network (LSTM) to construct a health indicator (HI). Then, the prediction results and confidence interval are calculated by a new RVM enhanced by a polynomial regression model. The proposed method is verified on the public PRONOSTIA bearing datasets. Experimental results demonstrate the effectiveness of the proposed method in improving prediction accuracy and analyzing prediction uncertainty.


Introduction
Rotating machinery plays an essential role in industrial applications. However, most rotating machinery operates under severe working conditions that may cause different types of faults. Therefore, timely maintenance is vital for the reliability of rotating machinery [1][2][3][4]. The Industrial Internet of Things (IoT) and data-driven techniques have been transforming scheduled maintenance into predictive maintenance. Remaining useful life (RUL) prediction is a critical component of a predictive maintenance scheme; it reduces the cost of unplanned maintenance and enhances the reliability, safety, and availability of rotating machinery [5]. Data-driven RUL prediction for machinery mainly consists of two steps: health indicator (HI) construction and RUL prediction based on the constructed HI [6][7][8][9][10]. An HI is a quantitative value that represents the degradation process of the monitored machinery; common examples include root mean square (RMS) [11], kurtosis [12], and entropy [13]. However, many traditional HIs have a poor monotonic trend, which lowers prediction accuracy. An HI curve with a strong monotonic trend correlates well with the degradation process, so the RUL can be predicted by extrapolating the historical data. However, most HI curves do not show an evident trend until severe degradation starts, which hampers maintenance planning and reduces prediction accuracy. Moreover, many HI construction methods ignore the historical data of similar machinery, which contain abundant degradation information.
Recently, deep-learning networks have shown great potential in dealing with big data [14][15][16]. Motivated by the strong capability of deep learning, researchers have carried out much related work on deep-learning-based RUL prediction. Zhu et al. [17] presented a deep-learning method for RUL prediction based on a multiscale convolutional neural network whose input was time-frequency maps. Liao et al. [18] proposed an enhanced restricted Boltzmann machine with a novel regularization term to construct a new HI suitable for RUL prediction; the input data of the network were time-series features. Xia et al. [19] presented a two-stage automated approach to estimate the RUL, in which autoencoder-based deep neural networks were used to classify different degradation stages and a shallow neural-network-based regression model was used to predict the RUL. Zhang et al. [20] constructed a new HI called "waveform entropy"; then, the new HI and some traditional HIs were input into a long short-term memory network to identify the bearing RUL. Al-Dulaimi et al. [21] proposed a hybrid deep neural network framework for RUL estimation.
This framework used an end-to-end RUL prediction scheme whose output was the RUL value. Although these deep-learning-based RUL prediction methods have shown good performance, they may still face two problems: (1) deep-learning-based RUL prediction schemes do not provide confidence limits, which makes maintenance planning difficult; (2) most deep-learning networks process only one type of input data and therefore miss some important degradation information.
The relevance vector machine (RVM) is a machine-learning method that learns machinery degradation patterns from available data instead of building statistical models. It can handle the prognostics of sophisticated machinery whose degradation process is difficult to describe with a statistical model [22]. Moreover, RVM-based RUL prediction also gives a confidence interval that provides an uncertainty estimate with probabilistic meaning. Therefore, the RVM has been attracting increasing attention in machinery RUL prediction [23][24][25]. The RVM solution is sparse (it relies on only a few relevance vectors) and has few hyperparameters, which is beneficial for online RUL prediction [26]. However, the long-term prediction accuracy of the RVM is poor. Therefore, this paper proposes a new RVM prediction method that keeps the sparsity of the RVM while providing accurate long-term prediction.
There are many sources of uncertainty in RUL prediction, such as measurement error, load randomness, degradation feature extraction error, and modeling error. These need to be quantified and managed during prediction, and a confidence interval for the forecast should be given to facilitate maintenance planning. At present, research on RUL uncertainty mainly focuses on statistical data-driven methods, which are based on probability theory and statistics. Through a statistical or stochastic model, the probability distribution of the remaining life can be obtained naturally, which makes it easy to quantify the uncertainty of RUL predictions. Liao et al. [27] formulated a multiphase degradation model with jumps based on the Wiener process to describe multiphase degradation patterns. All the model parameters were assumed to be random variables, which led to uncertainty in the final RUL. Gao et al. [28] proposed a right-time prediction method to reduce the prognostic uncertainty of mechanical systems under unobservable degradation. Wang et al. [29] presented a probabilistic framework for bearing RUL prediction, in which the Markov chain Monte Carlo method is used for posterior sampling to predict the RUL and output its uncertainty. Most existing deep-learning methods can only achieve point prediction and cannot quantify the uncertainty of their results, which greatly limits the practical application of deep learning in RUL prediction [30]. Some researchers have tried to establish RUL prediction models based on Bayesian neural networks to address this problem [30,31]. Although Bayesian neural networks can quantify the uncertainty of RUL prediction, their high training cost limits their practical application.
Although deep-learning-based HI construction methods and RVM-based RUL prediction methods have been widely studied, methods combining them are relatively scarce. To fill this research gap, a new RUL prediction scheme is proposed that combines a new deep-learning-structure-based HI construction method and a new RVM-based RUL prediction method. The new scheme can not only learn degradation features from different types of data and obtain RUL predictions automatically but also provide a confidence interval (CI). The contributions of this paper can be summarized as follows: (1) A new deep-learning structure that can process one-dimensional time-series data and two-dimensional image data simultaneously is proposed to construct the HI. The constructed HI performs better than those of other deep-learning-based HI construction methods.
(2) The proposed systematic approach integrates the deep-learning-based HI and a new RVM-based prediction method into one framework that estimates the RUL automatically and provides a confidence interval. (3) A new RVM model is proposed by combining the traditional RVM with a polynomial regression model, which improves long-term prediction accuracy. The paper is organized as follows: Section 2 provides the theoretical background. Section 3 introduces the new RUL prediction scheme combining the deep-learning-based HI and the new RVM. Section 4 demonstrates the effectiveness of the presented RUL estimation scheme on an experimental bearing dataset. The conclusions are presented in Section 5.

Theoretical Background
The proposed RUL prediction scheme mainly consists of four functional layers: a time-series information learning layer, a time-frequency map information learning layer, a fully connected layer, and an RUL prediction layer. To make full use of the information contained in the original data, the hybrid deep-learning structure consists of two parallel paths followed by a fully connected multilayer neural network. The two parallel paths are the time-series information learning layer, built from a long short-term memory (LSTM) neural network, and the time-frequency map information learning layer, built from a convolutional neural network (CNN). The LSTM extracts temporal features and the CNN extracts spatial features; these are then fused by the fully connected layer to construct the HI. Finally, the HI is fed into the RUL prediction layer to obtain the remaining useful life and its confidence interval. The theoretical background of each layer is introduced as follows.

Time-Series Information Learning Layer.
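The time-series information learning layer mainly consists of a long short-term memory network. The LSTM network is a state-of-the-art method for processing sequence data. It extends the recurrent neural network (RNN) with a memory cell, which mitigates the vanishing-gradient problem of conventional RNNs. Figure 1 shows the memory cells that replace the hidden layer in an LSTM network. The memory cell of the LSTM mainly consists of an input gate, an output gate, and a forget gate. Equations (1)-(6) represent the network update process at time t [32]:
$$ i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{1} $$
$$ f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{2} $$
$$ o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{3} $$
$$ a_t = \tanh\left(W_a x_t + U_a h_{t-1} + b_a\right) \tag{4} $$
$$ c_t = f_t \odot c_{t-1} + i_t \odot a_t \tag{5} $$
$$ h_t = o_t \odot \tanh\left(c_t\right) \tag{6} $$
In the above equations, i, o, f, a, c, and h represent the input gate, the output gate, the forget gate, the output value of the input gate (the candidate value written into the cell), the state value of the memory cell, and the output value of the hidden layer, respectively; x_t is the input at time t; W, U, and b are the corresponding input weights, recurrent weights, and biases; σ is the sigmoid function; and ⊙ denotes element-wise multiplication.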
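As an illustration of equations (1)-(6), the following minimal NumPy sketch performs one LSTM update step. The parameter layout (one W, U, b triple per gate), the layer sizes, and the random weights are assumptions made for the example only; the network in this paper stacks several LSTM layers in a deep-learning framework, which this sketch does not reproduce.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update in the standard forget-gate form.

    params maps each gate name ('i', 'f', 'o', 'a') to a tuple (W, U, b):
    W multiplies the input x_t, U multiplies the previous hidden state h_prev.
    (This parameter layout is an assumption for illustration.)
    """
    def affine(gate):
        W, U, b = params[gate]
        return W @ x_t + U @ h_prev + b

    i_t = sigmoid(affine("i"))      # input gate
    f_t = sigmoid(affine("f"))      # forget gate
    o_t = sigmoid(affine("o"))      # output gate
    a_t = np.tanh(affine("a"))      # candidate value passed through the input gate
    c_t = f_t * c_prev + i_t * a_t  # new memory-cell state
    h_t = o_t * np.tanh(c_t)        # new hidden-layer output
    return h_t, c_t

# Toy usage with hypothetical sizes: 4 input features, 3 hidden units, random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {g: (rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid)), np.zeros(n_hid))
          for g in ("i", "f", "o", "a")}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # a sequence of 5 feature vectors
    h, c = lstm_step(x, h, c, params)
```

Because h_t is the product of a sigmoid gate and a tanh, every hidden output stays strictly inside (-1, 1), whatever the weights.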

Time-Frequency Map Information Learning Layer.
The time-frequency map information learning layer is a deep convolutional neural network consisting of convolutional layers and pooling layers.
In a convolutional layer, local features are generated from the input feature maps by convolutional kernels. Then, the convolution results are passed through the activation function to construct the feature maps of the current layer, as follows [33]:
$$ x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \tag{7} $$
In the above equation, x_j^l is the jth feature map of the lth layer, x_i^{l−1} is the ith feature map of the (l − 1)-th layer, * denotes convolution, k_ij^l is the convolutional kernel with size S × S, b_j^l is the bias of the jth feature map in the lth layer, M_j is the set of input feature maps connected to the jth output map, and f is the activation function.
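The convolutional-layer computation just described can be sketched for single-channel 2-D maps as below. The sketch assumes "valid" padding, ReLU as the activation f, and that every input map feeds every output map (M_j is the full set of inputs); these are illustrative choices, not settings taken from the paper's network.

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation, which is what CNN layers usually compute."""
    S = k.shape[0]
    H, W = x.shape
    out = np.empty((H - S + 1, W - S + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + S, c:c + S] * k)
    return out

def conv_layer(inputs, kernels, biases, f=lambda z: np.maximum(z, 0.0)):
    """Compute x_j^l = f(sum_i x_i^(l-1) * k_ij^l + b_j^l) for every output map j.

    inputs:  list of 2-D input feature maps x_i^(l-1)
    kernels: kernels[i][j] is the S x S kernel k_ij^l
    biases:  biases[j] is b_j^l
    f:       activation function (ReLU by default, an assumption)
    """
    outputs = []
    for j, b in enumerate(biases):
        z = sum(conv2d_valid(x_i, kernels[i][j]) for i, x_i in enumerate(inputs))
        outputs.append(f(z + b))
    return outputs
```

For example, two 4 × 4 maps of ones, 2 × 2 kernels of ones, and a bias of -1 give a single 3 × 3 output map filled with 7 (each window sums to 4 per input map).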
In the pooling layer, features are extracted from the feature maps by subsampling, which improves computational efficiency. The max-pooling operation is given as
$$ x_j^{l+1}(c, d) = \max_{0 \le u < m,\; 0 \le v < m} x_j^{l}\left(c\,p + u,\; d\,q + v\right) \tag{8} $$
In the above equation, x_j^l and x_j^{l+1} are the jth input feature map of layer l and the jth output feature map of layer l + 1, m is the pooling filter size, (c, d) is the position in the output map, and p and q are the moving step lengths (strides) in the two directions.
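A direct implementation of the max-pooling rule described above is shown below, assuming equal strides in both directions (p = q) and no padding; the function name is hypothetical.

```python
import numpy as np

def max_pool(x, m, stride):
    """Max pooling of one 2-D feature map with an m x m window and stride p = q = stride."""
    H, W = x.shape
    rows = (H - m) // stride + 1
    cols = (W - m) // stride + 1
    out = np.empty((rows, cols))
    for c in range(rows):
        for d in range(cols):
            # Output position (c, d) takes the largest value in its m x m window
            out[c, d] = x[c * stride:c * stride + m, d * stride:d * stride + m].max()
    return out
```

With a 2 × 2 window and stride 2, the 4 × 4 map holding 0 to 15 row by row pools to [[5, 7], [13, 15]], halving each spatial dimension.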

Fully Connected Layer.
The fully connected layer is added after the time-series information learning layer and the time-frequency map information learning layer. The features learned by these two layers are flattened to construct the fully connected layer, which can be represented by the following equation:
$$ O = f\left(\sum_{j} \omega_j x_j^{F} + b\right) \tag{9} $$
In the above equation, O is the final output value, x_j^F is the jth neuron, ω_j is the weight between the jth neuron and the output node, b is the bias, and f is the activation function.
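To make the fusion step concrete, the sketch below flattens the outputs of the two branches, concatenates them into one vector x^F, and applies a single output node O = f(Σ_j ω_j x_j^F + b). Choosing a sigmoid for f (so that O falls in [0, 1], like the training labels) and the feature sizes used here are assumptions; the paper's actual HI head is a three-layer network whose parameters are given in Table 4.

```python
import numpy as np

def fuse_and_score(lstm_features, cnn_maps, weights, bias):
    """Flatten and concatenate both branches, then apply one sigmoid output node.

    lstm_features: output vector of the time-series (LSTM) branch
    cnn_maps:      list of feature maps from the time-frequency (CNN) branch
    weights:       one weight per flattened neuron (hypothetical shapes)
    """
    x_F = np.concatenate([np.ravel(lstm_features)] + [np.ravel(m) for m in cnn_maps])
    z = weights @ x_F + bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation (assumed)
```

For instance, a 3-element LSTM vector and two 2 × 2 CNN maps flatten into 11 neurons, so the weight vector must have length 11.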

RUL Prediction Layer.
This layer can filter out unwanted measurement noise and manage uncertainty in prognostics. The RUL prediction layer is constructed with a new relevance vector machine (RVM) that combines the traditional RVM with polynomial models.
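Figure 1: Structure of LSTM.
RVM is a kernel-based algorithm built on the Bayesian inference framework [29]. For a given dataset {x_i, t_i}_{i=1}^{N}, the RVM model is
$$ t_i = y(x_i; \omega) + \varepsilon_i = \omega_0 + \sum_{j=1}^{N} \omega_j K(x_i, x_j) + \varepsilon_i \tag{10} $$
where ω = (ω_0, ..., ω_N)^T is the weight vector of the RVM model, K(·, ·) is the kernel function, and ε_i is Gaussian distributed random error with a mean of 0 and a variance of σ². According to Bayesian inference, each target satisfies p(t_i | x_i) = N(t_i | y(x_i), σ²), so the likelihood of the dataset can be written as
$$ p(\mathbf{t} \mid \omega, \sigma^2) = \left(2\pi\sigma^2\right)^{-N/2} \exp\left(-\frac{\lVert \mathbf{t} - \Phi\omega \rVert^2}{2\sigma^2}\right) \tag{11} $$
where t = (t_1, ..., t_N)^T and Φ is the N × (N + 1) design matrix whose ith row is φ(x_i)^T = [1, K(x_i, x_1), ..., K(x_i, x_N)]. Maximum-likelihood estimation of ω and σ² from (11) generally leads to severe overfitting, so a zero-mean Gaussian prior over the weights is defined to smooth the function:
$$ p(\omega \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}\left(\omega_i \mid 0, \alpha_i^{-1}\right) \tag{12} $$
In the above equation, p(ω | α) is the Gaussian prior probability over the weights, ω is the weight vector of the RVM model, and α = (α_0, α_1, ..., α_N)^T is a vector of N + 1 hyperparameters corresponding to the weights ω. These hyperparameters are the critical features of the model and are ultimately responsible for its sparsity. Given the defined noninformative prior distribution, the posterior over the unknowns can be computed with Bayes' rule:
$$ p(\omega, \alpha, \sigma^2 \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \omega, \alpha, \sigma^2)\, p(\omega, \alpha, \sigma^2)}{p(\mathbf{t})} \tag{13} $$
Equation (13) cannot be computed directly, but it can be decomposed as
$$ p(\omega, \alpha, \sigma^2 \mid \mathbf{t}) = p(\omega \mid \mathbf{t}, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid \mathbf{t}) \tag{14} $$
The posterior distribution of the weights is
$$ p(\omega \mid \mathbf{t}, \alpha, \sigma^2) = \mathcal{N}(\omega \mid \mu, \Sigma) \tag{15} $$
The posterior covariance and mean in (15) are
$$ \Sigma = \left(\sigma^{-2}\Phi^{T}\Phi + A\right)^{-1}, \qquad A = \mathrm{diag}(\alpha_0, \ldots, \alpha_N) \tag{16} $$
$$ \mu = \sigma^{-2}\,\Sigma\,\Phi^{T}\mathbf{t} \tag{17} $$
As can be seen from (16) and (17), the hyperparameters α and the noise variance σ² must be obtained before μ and Σ can be computed. They are estimated by maximum likelihood, that is, by maximizing the marginal likelihood p(t | α, σ²), which gives the most probable values α_MP and σ²_MP. For a new input x_*, the probability distribution of the output t_* is
$$ p\left(t_* \mid \mathbf{t}, \alpha_{MP}, \sigma^2_{MP}\right) = \mathcal{N}\left(t_* \mid y_*, \sigma_*^2\right) \tag{18} $$
where
$$ y_* = \mu^{T}\phi(x_*) \tag{19} $$
$$ \sigma_*^2 = \sigma^2_{MP} + \phi(x_*)^{T}\,\Sigma\,\phi(x_*) \tag{20} $$
In the training process, most α_i values tend to infinity, so the corresponding weights have posterior distributions whose mean and variance are both zero. This means that these weights and their kernel functions play no role in the regression, which is the source of the sparsity of the RVM. The input data corresponding to the nonzero weights are called relevance vectors (RVs). Because only the relevance vectors {x_i}_{i=1}^{m} keep nonzero weights, y_* reduces to the kernel sum y_* = Σ_{i=1}^{m} μ_i K(x_*, x_i) (plus the bias weight μ_0 if it is retained), and the 95% upper and lower confidence bounds can be calculated as
$$ t_*^{upper} = y_* + 1.96\,\sigma_* \tag{21} $$
$$ t_*^{lower} = y_* - 1.96\,\sigma_* \tag{22} $$
where t_*^upper and t_*^lower are the upper and lower bounds of the predicted value t_*, respectively, and x_* is the input data.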
Polynomial models are suitable for long-term RUL prediction. Polynomial regression belongs to the least-squares curve-fitting family: it estimates the coefficients of a polynomial function so that the polynomial closely approximates the data. The polynomial regression model is
$$ y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \tag{23} $$
where y is the response variable, x is the predictor variable, and a_0, a_1, a_2, ..., a_n are model coefficients that can be estimated by curve-fitting methods.
In this paper, we take advantage of both the RVM and the polynomial model: the response variable y is the HI value at each relevance vector, and x is the corresponding running time. The coefficients a_0, a_1, a_2, ..., a_n of the polynomial model are determined from x and the relevance vectors y of the test bearing. Then, the predicted value is calculated by equation (23), the variance σ_*² is calculated by equation (20), and the 95% upper and lower confidence bounds are calculated by equations (21) and (22).
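The RVM-plus-polynomial step can be sketched as below, assuming that the relevance-vector running times, their HI values, and the RVM predictive standard deviation σ_* (the square root of equation (20)) are already available from some RVM fit. The polynomial degree is an assumption, since this section does not fix it, and the function name is hypothetical.

```python
import numpy as np

def polynomial_rvm_forecast(rv_times, rv_hi, t_future, sigma_star, degree=2):
    """Fit eq. (23) to the relevance vectors by least squares and extrapolate with 95% bounds.

    rv_times:   running times x of the relevance vectors
    rv_hi:      HI values y at those relevance vectors
    t_future:   times at which to predict the HI
    sigma_star: RVM predictive standard deviation(s) from eq. (20)
    degree:     polynomial order n (assumed)
    """
    coeffs = np.polyfit(rv_times, rv_hi, degree)   # least-squares coefficients, highest power first
    y_pred = np.polyval(coeffs, t_future)          # eq. (23) evaluated at future times
    half = 1.96 * np.asarray(sigma_star)
    return y_pred, y_pred - half, y_pred + half    # prediction, lower bound, upper bound
```

If the relevance vectors lie exactly on y = 0.1 + 0.05x² at x = 0, ..., 4, the fit recovers that curve, and the forecast at x = 5 is 1.35 with bounds 1.35 ± 1.96σ_*.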

Time-Frequency Analysis.
During the performance degradation of rolling bearings, the vibration acceleration signals are nonstationary. Time-frequency analysis contains both time-domain and frequency-domain information and can therefore effectively characterize nonstationary signals. The continuous wavelet transform (CWT) is a time-frequency analysis method commonly used in the condition monitoring of rotating machinery. It is calculated as
$$ W_x(\alpha, \beta) = \frac{1}{\sqrt{\alpha}} \int_{-\infty}^{+\infty} x(t)\, \overline{\psi}\left(\frac{t - \beta}{\alpha}\right) dt \tag{24} $$
where α is the scale parameter, β is the translation parameter, x(t) is the original vibration acceleration signal, ψ(t) ∈ L²(R) is the mother wavelet function, and ψ̄(t) is the complex conjugate of ψ(t). There is no standard or universal method for selecting the mother wavelet function. In this paper, the Morlet wavelet, whose waveform is similar to the impact signal of a rolling bearing, is chosen as the mother wavelet.
After the continuous wavelet transform, the one-dimensional vibration acceleration signal is mapped to a two-dimensional coefficient matrix, which gives the time-frequency map of the vibration signal.
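The mapping from a 1-D vibration signal to a 2-D coefficient matrix can be sketched with a direct-convolution CWT in NumPy. The complex Morlet form with center parameter w0 = 6 and the ±4-scale truncation are assumptions; in practice a tested library routine such as PyWavelets' `pywt.cwt` would normally be preferred.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet wavelet (assumed form): pi^(-1/4) * exp(i w0 t) * exp(-t^2 / 2)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def cwt_morlet(x, scales, fs, w0=6.0):
    """Direct-convolution CWT of a real signal x sampled at fs Hz.

    For each scale a (in seconds) and shift b, approximates
    W(a, b) = a^(-1/2) * integral of x(t) * conj(psi((t - b) / a)) dt.
    Returns a complex array of shape (len(scales), len(x)).
    """
    x = np.asarray(x, dtype=float)
    dt = 1.0 / fs
    out = np.empty((len(scales), x.size), dtype=complex)
    for k, a in enumerate(scales):
        # Truncate the Gaussian envelope at +/- 4a, but never longer than the signal
        half = min(int(np.ceil(4.0 * a / dt)), (x.size - 1) // 2)
        tau = np.arange(-half, half + 1) * dt
        kernel = np.conj(morlet(tau / a, w0)) / np.sqrt(a)
        # Correlating x with the kernel equals convolving x with the time-reversed kernel
        out[k] = np.convolve(x, kernel[::-1], mode="same") * dt
    return out
```

With w0 = 6, scale a responds most strongly near the frequency w0 / (2πa), so a 50 Hz sine sampled at 1 kHz peaks at the scale chosen for 50 Hz. `np.abs` of the returned matrix is the time-frequency map.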

RUL Scheme Combining Deep-Learning-Based HI and a New RVM
A hybrid deep-learning structure that can learn temporal and spatial features simultaneously is proposed to exploit the mutual information in multidimensional features for degradation assessment and RUL prediction. In addition, the training set is constructed from historical whole-lifetime monitoring data, and this training set, consisting of different HI curves, is used to train the RVM. The sparsity of RVM regression depends strongly on the choice of kernel function. Common kernel functions are classified into local kernels and global kernels. With a local kernel, only data points that are close to each other affect the kernel value.
In contrast, a global kernel allows data points that are far away from each other to affect the kernel value as well. Common global kernels include polynomial and spline functions [34]. Different types of kernels differ in their interpolation and extrapolation ability. A multikernel RVM-based prediction method is proposed that combines different kernels with the particle swarm optimization (PSO) algorithm to make full use of their respective strengths. Figure 2 shows the proposed RUL scheme.
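The difference between local and global kernels, and one way to combine them, can be sketched as follows. The Gaussian kernel as the local example, the polynomial kernel as the global example, and the convex weight w are assumptions; the section does not state which kernels or combination rule the multikernel RVM uses.

```python
import numpy as np

def rbf(a, b, width=1.0):
    """Local kernel: its value decays towards zero as |a - b| grows."""
    return np.exp(-(a - b) ** 2 / (2.0 * width ** 2))

def poly(a, b, degree=2, c=1.0):
    """Global kernel: points far apart can still yield large kernel values."""
    return (a * b + c) ** degree

def mixed_kernel(a, b, w=0.5, width=1.0, degree=2):
    """Hypothetical convex combination of a local and a global kernel.

    The weight w (and width, degree) are free parameters, for example ones
    that a PSO search could tune.
    """
    return w * rbf(a, b, width) + (1.0 - w) * poly(a, b, degree)
```

For example, rbf(0, 10) is about 2e-22, so distant points are effectively decoupled, whereas mixed_kernel(0, 10) still equals about 0.5 through its polynomial part, which is what gives global kernels their extrapolation ability.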
First, the time-series information (time-domain and frequency-domain features) and the time-frequency map information over the whole lifetime are extracted from the original vibration signal. Each type of information is processed by its own information learning layer. Then, a fully connected layer combines the features learned by the time-series information learning layer and the time-frequency map information learning layer. The HI is constructed by a three-layer neural network from the combined information. Finally, the constructed HI curve is used to predict the RUL with the RUL prediction layer, which is constructed from the RVM and the polynomial model. At inspection time T_k, the future HI can be predicted with the fitted polynomial curve. When the polynomial curve reaches the failure threshold, the bearing is considered to have failed. According to the concept of the first hitting time [31], the RUL of the bearing can be defined as
$$ \mathrm{RUL}(T_k) = \inf\left\{ t \ge 0 : f(t + T_k) \ge \theta \right\} \tag{25} $$
In the above equation, RUL(T_k) is the remaining useful life at inspection time T_k, f(t + T_k) is the predicted HI at time t + T_k, and θ is the failure threshold.
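The first-hitting-time definition can be evaluated numerically by scanning forward from the inspection time until the predicted HI reaches the threshold. The discrete time grid, the search horizon, and the function name are assumptions of this sketch.

```python
import numpy as np

def first_hitting_rul(coeffs, T_k, theta, horizon, step=1.0):
    """Smallest t >= 0 on a grid with f(t + T_k) >= theta, where f is a fitted polynomial.

    coeffs:  polynomial coefficients of the HI curve f, highest power first (np.polyval order)
    T_k:     inspection time
    theta:   failure threshold
    horizon: longest look-ahead searched; returns None if theta is not reached within it
    """
    t = np.arange(0.0, horizon + step, step)
    hi = np.polyval(coeffs, t + T_k)        # predicted HI at future times
    hit = np.nonzero(hi >= theta)[0]
    return float(t[hit[0]]) if hit.size else None
```

For the HI curve f(x) = 0.5x + 10 inspected at T_k = 100 with θ = 100, the threshold is first reached at x = 180, so the estimated RUL is 80 time units.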

Experimental Results and Analysis
In this section, run-to-failure data acquired from accelerated degradation tests of rolling element bearings are used to verify the effectiveness and superiority of the proposed RUL scheme in practical applications. The experimental data come from the PRONOSTIA platform used in the IEEE PHM 2012 Data Challenge [35]. The experimental platform mainly consists of three parts, a rotating part, a degradation generation part, and a signal acquisition part, as shown in Figure 3.

Data Description.
In this experiment, 17 rolling element bearings working under three different conditions are tested. The experimental conditions are listed in Table 1. Under each condition, the data of two bearings are used as the training dataset and the remaining bearings are used as testing datasets, as listed in Table 2. The acceleration sensors are mounted on the outer race of the rolling element bearing. The sampling frequency is 25.6 kHz, and each acquisition contains 2560 points (0.1 s of signal). An acquisition is taken every 10 s.

HI Construction.
The whole-lifetime data of the first bearing are analyzed first. The horizontal acceleration signal shown in Figure 4 indicates that the vibration amplitude increases as the experiment proceeds, but it is hard to determine accurately when the incipient fault occurred. Therefore, different features are extracted from the original signal, including 10 time-domain features, 12 frequency-domain features, and 1 time-frequency-domain feature, which are listed in Table 3.
The training data can be represented by I_train = {x_t, y_t}, in which x_t ∈ R^{N×N} is the time-frequency map of size N × N at time t, and y_t ∈ [0, 1] represents the performance degradation degree of the bearing at time period t, with y_t = t/T, where T is the life cycle period of the training bearing. It was verified that the form of the relationship between the training data label and the running cycle does not affect the final result of health indicator construction.
Therefore, a linear model is selected here to construct the training dataset labels. Taking bearing 1_1 as an example, the bearing fails completely at period 2800, so the degradation degree corresponding to period 1400 is y_t = 1400/2800 = 0.5.
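The acquisition arithmetic and the linear labelling rule y_t = t/T can be checked with a few lines of Python. The constant and function names are hypothetical; the numbers are those given for the PRONOSTIA data and for bearing 1_1.

```python
import numpy as np

FS_HZ = 25_600      # sampling frequency of the accelerometers
POINTS = 2_560      # samples per acquisition
INTERVAL_S = 10     # time between consecutive acquisitions, in seconds

snapshot_s = POINTS / FS_HZ   # duration of one acquisition: 0.1 s

def linear_labels(T):
    """Return y_t = t / T for acquisition periods t = 1, ..., T of one training bearing."""
    t = np.arange(1, T + 1)
    return t / T

labels = linear_labels(2800)          # bearing 1_1: complete failure at period 2800
y_1400 = labels[1400 - 1]             # period 1400 (arrays are 0-based)
run_time_s = 2800 * INTERVAL_S        # about 28,000 s of recorded operation
```

The label rises linearly from 1/2800 at the first period to 1 at failure, and period 1400 maps to 0.5, as stated above.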
In the proposed deep-learning network, the convolutional structure mainly follows the classical AlexNet network, and the time-series information learning layer is constructed by stacking three LSTM layers. The literature shows that this structure can effectively extract the characteristics of time-series data. The CNN and LSTM connection layer joins the information extracted by the time-series information learning layer and the time-frequency map information learning layer, so that the degradation information is captured comprehensively. Finally, a fully connected layer is constructed to output the final result. Detailed network parameters are given in Table 4.

RUL Prediction.
The HI constructed in Section 3 is used to predict the remaining useful life with the RUL prediction layer. To describe the prognostic procedure in detail, Figure 5 shows the complete degradation prediction process of bearing 1_5 at inspection time T = 2302 running periods. First, all the HIs constructed with the hybrid deep-learning network, shown as hollow blue dots in Figure 5, are input into the RVM to perform regression analyses with different kernel parameter values. Next, the kernel parameter value is selected by the PSO algorithm (Figure 5).
Figure 6(a) shows that the RMS increases significantly only at the end of the experiment. The kurtosis and crest factor are sensitive to incipient degradation but contain more background noise, so they cannot present the degradation process of the whole lifetime clearly, as shown in Figures 6(b) and 6(c). In Figure 6(d), the peak-to-peak-value-based HI has a trend similar to that of the RMS and is likewise insensitive to incipient degradation. RUL prediction based on these HIs cannot provide timely maintenance suggestions. Figures 6(e)-6(g) show the deep-learning-based HIs, which exhibit better monotonicity and trendability than the traditional HIs. However, a single deep-learning structure cannot fully utilize the information contained in the original acceleration signal. Figure 6(e) shows that the HI constructed by the deep CNN can present the degradation process but contains too much background noise at the early stage, since the single CNN structure cannot learn and distinguish the time-frequency maps effectively in the early degradation stage. Figure 6(f) shows that the HI constructed with the LSTM is almost constant in the early degradation stage, because the single LSTM structure is unable to learn the differences in early degradation features between the training set and the testing set.
Figure 6(g) shows the HI curve constructed with the proposed method. The HI constructed with the deep hybrid structure has better linearity and less background noise, which helps improve prognostic accuracy. To further illustrate the influence of different deep-learning structures on the predicted results, Figure 7 shows the RUL prediction results of bearing 1_3. Figure 7(a) shows the RUL prediction results of the LSTM-based HI, where the predicted RUL is 4800 s and the 95% confidence interval is [2510, 7090] s. As Figure 7(a) shows, the relevance vectors in the early stage distort the shape of the fitted polynomial, which lowers the prediction accuracy. Figure 7(b) shows the RUL prediction results of the CNN-based HI, where the predicted RUL is 5190 s and the 95% confidence interval is [0, 10830] s. Such a wide confidence interval brings too much uncertainty to the RUL result, which makes maintenance planning difficult; the RVs in the middle period affect the confidence interval. Figure 7(c) shows the RUL prediction results of the hybrid-structure-based HI, where the predicted RUL is 5600 s and the 95% confidence interval is [5398, 5803] s. The actual RUL is 5730 s, which lies within the confidence interval. Moreover, the confidence interval is narrow, indicating low uncertainty in the prediction results.
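The PSO step used above to select the RVM kernel parameter can be illustrated with a minimal global-best particle swarm optimizer. The inertia weight, acceleration coefficients, swarm size, and iteration count are assumed values, and any objective passed in (for example, an RVM validation error as a function of kernel width) is hypothetical; the paper does not state its PSO settings in this section.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over the box [lower, upper]; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # best position seen by each particle
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()           # best position seen by the swarm
    for _ in range(n_iter):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Inertia + pull towards the personal best + pull towards the global best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)           # keep particles inside the box
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())
```

In an RVM setting, `objective` would map a candidate kernel parameter vector to a validation error, and the returned position would be the selected kernel parameters.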

Results and Analysis
The RUL prediction results of all the test bearings are shown in Figure 8. Figure 8 shows that the proposed method can effectively predict the performance degradation trend and capture relatively linear (e.g., bearing 1_3, bearing 1_4, bearing 1_7, bearing 2_4, and bearing 2_5), exponential (e.g., bearing 2_2), and S-shaped (e.g., bearing 1_5, bearing 1_6, bearing 2_1, bearing 2_3, and bearing 2_6) degradation curves. The polynomial model can effectively fit these different types of degradation curve.
To illustrate the superiority of the proposed scheme, it is compared with six other studies on the same dataset; the results are listed in Table 5. Column 1 shows the testing bearings, and column 2 shows the prediction starting time. For each testing bearing, the actual and predicted RUL are displayed in columns 3 and 4, respectively. The six comparative studies are shown in columns 5 to 10, and the prediction errors of the proposed method are shown in the final column. The mean and standard deviation (SD) of the percent errors and the scoring metrics are shown in the last three rows. A scoring function to evaluate the final prediction results is defined as follows:
$$ \mathrm{Score} = \frac{1}{N}\sum_{i=1}^{N} A_i, \qquad A_i = \begin{cases} \exp\left(-\ln(0.5)\cdot \dfrac{Er_i}{5}\right), & Er_i \le 0 \\[2mm] \exp\left(\ln(0.5)\cdot \dfrac{Er_i}{20}\right), & Er_i > 0 \end{cases} \tag{26} $$
where N is the number of test bearings, A_i is the prediction score of the ith test bearing, and Er_i is the percent error of the RUL prediction for the ith testing dataset, which can be calculated as
$$ Er_i = 100 \times \frac{ActRUL_i - RUL_i}{ActRUL_i} \tag{27} $$
where ActRUL_i is the actual RUL of test bearing i and RUL_i is the predicted RUL of test bearing i. Sutrisno et al. [36] predicted the RUL based on vibration frequency signature anomaly detection and survival time ratio. However, the anomaly detection time point is decided by subjective criteria, and the prediction errors in the table are large. Hong et al. [37] constructed a packet-EMD- and SOM-based HI to predict the RUL and improved the RUL accuracy compared with the previous work. However, their method requires extracting more than 100 features to construct the HI, which is time-consuming. Lei et al. [38] proposed a new HI construction method based on the weighted minimum quantization error (WMQE) to predict the RUL of bearings. These three methods rely on feature extraction, selection, and fusion, which depend on manual experience and are time-consuming.
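The percent error and scoring rule can be sketched in plain Python. This sketch assumes the piecewise scoring form of the IEEE PHM 2012 Data Challenge, in which overestimated RULs (Er_i ≤ 0) are penalized more sharply than underestimated ones, and that the final score is the mean of the per-bearing scores; the function names are hypothetical.

```python
import math

def percent_error(act_rul, pred_rul):
    """Er_i = 100 * (ActRUL_i - RUL_i) / ActRUL_i; negative means the RUL was overestimated."""
    return 100.0 * (act_rul - pred_rul) / act_rul

def accuracy_score(er):
    """Per-bearing score A_i (PHM 2012 form assumed); equals 1 for a perfect prediction."""
    if er <= 0:
        return math.exp(-math.log(0.5) * (er / 5.0))   # late predictions: halves every 5 %
    return math.exp(math.log(0.5) * (er / 20.0))       # early predictions: halves every 20 %

def final_score(actual, predicted):
    """Mean of A_i over all test bearings."""
    scores = [accuracy_score(percent_error(a, p)) for a, p in zip(actual, predicted)]
    return sum(scores) / len(scores)
```

For example, the bearing 1_3 result in Figure 7(c) (actual 5730 s, predicted 5600 s) gives Er ≈ 2.27% and A ≈ 0.92.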
Guo et al. [40] developed a deep-learning-based method that constructs an HI from multiple features in the time domain, frequency domain, and time-frequency domain.
This method showed superiority over the SOM-based HI construction method, but its accuracy is lower than that of Lei et al.'s method. Yoo [41] proposed a new method that constructs the HI with a CNN and uses Gaussian process regression for RUL prediction, which improves prediction accuracy and efficiency. Si et al. [42] constructed the HI with wavelet packet decomposition, empirical mode decomposition, and a self-organizing map and used an RVM combined with an exponential degradation model to predict the RUL, which effectively improves RUL accuracy. However, the HI construction process is complicated and time-consuming.
In Table 5, the proposed method shows the lowest mean percent error and a low deviation, indicating that the model is accurate and reliable across the tested bearings. Moreover, this method does not require a sophisticated feature extraction algorithm designed from human experience, and it realizes intelligent RUL prediction. Compared with the other methods, the proposed method does not achieve the highest prediction accuracy on every test bearing, but the average error and score of its prediction results are the best. In future work, the performance degradation processes of different test bearings will be studied in depth to further improve the RUL prediction accuracy for each type of test bearing.

Conclusions
This paper proposes a new RUL prediction scheme combining deep learning and a new RVM method. First, different types of degradation data are input into a deep-learning network with a hybrid structure to construct the health indicator. Then, the new RVM model, consisting of an RVM and a polynomial model, is used to predict the RUL and calculate its confidence interval. Finally, the proposed method is compared with different RUL prediction methods to verify its effectiveness. The proposed deep-learning network with a hybrid structure can learn from different types of degradation data.
The constructed health indicator curve has better monotonicity and trendability than those of single-structure deep-learning networks such as CNN and LSTM. The RVM is widely used in RUL prediction. On the one hand, the RVM can reduce the redundancy of the degradation curve to enhance prediction accuracy. On the other hand, RVM prediction results are strongly affected by the kernel function, and its long-term prediction ability is limited. The proposed method retains the advantage of the RVM and overcomes this disadvantage by combining a polynomial model with the RVM. The final RUL prediction results show that the proposed method can enhance prediction accuracy and narrow the confidence interval.
Although the proposed RUL scheme improves the prediction results, it is time-consuming. In future work, we expect to improve its computational efficiency by investigating a better deep-learning structure.

Conflicts of Interest
The authors declare that there are no conflicts of interest.