A Novel Method for Remaining Useful Life Prediction of Roller Bearings Involving the Discrepancy and Similarity of Degradation Trajectories

Accurate remaining useful life (RUL) prediction of bearings is the key to effective decision-making for predictive maintenance (PdM) of rotating machinery. However, the individual heterogeneity and different working conditions of bearings make the degradation trajectories of bearings different, resulting in the mismatch between the RUL prediction model established by the full-life training bearing and the testing bearings. To address this challenge, this paper proposes a novel RUL prediction method for roller bearings that considers the difference and similarity of degradation trajectories. In this method, a feature extraction method based on continuous wavelet transform (CWT) and convolutional autoencoder (CAE) is proposed to extract the deep features associated with bearing performance degradation before the degradation indicator (DI) is obtained by applying the self-organizing maps (SOM) method. Next, a dynamic time warping (DTW) based method is applied to perform the similarity matching of degradation trajectories of the training and testing bearings. Driven by the historical DIs of the given bearing, the grey forecasting model with full-order time power terms (FOTP-GM) is applied to model the degradation trajectory using a parameter optimization method. Then, the failure threshold of the given testing bearing can be determined using a data-driven method without manual intervention. Finally, the RUL of the given testing bearing can be estimated using the preset failure threshold and the optimized degradation trajectory model of the given testing bearing. The experimental results show that the proposed method retains the individual differences of bearing degradation trend, realizes the independent and reasonable bearing failure threshold setting, and improves the prediction accuracy of RUL.


Introduction
Accurate remaining useful life (RUL) estimation of bearings is a significant challenge in prognostics and health management (PHM) systems for rotating machinery, where it serves to improve equipment reliability and reduce equipment failures and maintenance costs. In the literature, prognostic approaches in a PHM framework fall into three categories: physics-based approaches, data-driven approaches, and hybrid approaches (a combination of data-driven and physics-based approaches) [1]. By solving a set of equations based on physical laws and engineering knowledge, physics-based prognostic approaches assess component health and predict when the damage crosses a predefined failure threshold, using a mathematical model of the degradation process for a particular failure mode [2]. However, given the high accuracy and efficiency required of component RUL prediction, physical model-based life prediction methods struggle to meet modern needs because they are complex, time-consuming to build, and not universally applicable.
Because they do not need to model the complex degradation mechanism of the system, data-driven prognostic approaches reduce the dependence on prior knowledge and offer high prediction accuracy and broad applicability. From the perspective of mathematical modeling, data-driven prognostic methods can be further divided into statistical methods and artificial intelligence (AI) based methods [3]. Statistics-based prognostic approaches, also known as empirical model-based prognostic approaches, estimate the remaining useful life of a mechanical component by building a statistical model based on empirical knowledge. The statistical methods include Gaussian model methods, autoregressive model methods, hidden Markov model (HMM) methods, Wiener process model methods, various distance-based statistical clustering methods, and so on. Medjaher et al. [4] proposed a mixture bearing RUL prediction model combining the Gaussian model and HMM, and verified its performance on real bearing degradation data sets. C. Kwan et al. [5] proposed a bearing fault diagnostics and prognostics method that uses HMM to characterize the failure mechanism of bearings, and applied it to a rotating shaft system to verify its performance on actual data. X. Zhang et al. [6] proposed an integrated method for bearing fault diagnostics and prognostics based on PCA and HMM, and verified its effectiveness on experimental bearing vibration data sets. P. Ding et al. [7] proposed a degradation trend estimation method based on an interpretable and lightweight vector autoregression algorithm, and analyzed the run-to-failure data sets of rolling and slewing bearings to demonstrate its effectiveness. B. Ayhan et al. [8] proposed an adaptive bearing RUL prediction method based on the damage accumulation description, in which the recursive least squares (RLS) algorithm is applied to adaptively estimate the RUL prediction model based on the damage curve approach (DCA).
The main way to build an AI-based framework for bearing RUL prediction is machine learning (ML) based prognostics, including artificial neural networks (ANN) [9], support vector machines (SVM) [10], random forests (RF) [11], and deep learning (DL) [12]. Conventional machine learning-based prognostic methods usually extract a single statistical feature in the time or frequency domain from the original signal as a health index (HI), such as root mean square [13], kurtosis [14], energy entropy [15], and so on. However, the characterization capability of such single statistical features is significantly limited. To obtain a more reasonable degradation indicator (DI), different statistical features are extracted in the time, frequency, and time-frequency domains, and DI construction methods such as principal component analysis (PCA) [16], support vector data description (SVDD) [17], self-organizing map (SOM) [18], and orthogonal sparse algorithm (OSA) [19] are applied to fuse an effective degradation indicator by reducing the dimension of the extracted feature sets. Deep-learning-based prognostic methods can learn the deep information of the original data and accurately evaluate the degradation status of bearings. The deep-learning-based methods applied to predict bearing RUL mainly include deep belief networks (DBN) [20], an integrated deep learning method based on time domain and frequency domain features [21], convolutional neural networks (CNN) [22,23], autoencoders (AE) [24], recurrent neural networks (RNN) [25], and long short-term memory (LSTM) [26]. It should be noted that these deep learning methods usually use the full-life-cycle data of the training bearing to build a model that establishes a nonlinear mapping between the monitoring data of the testing bearing and its RUL, under the assumption that the degradation patterns of the training bearing and the test bearing are the same or similar.
In engineering practice, due to the individual heterogeneity of bearings and different environmental conditions, bearings under the same working condition may not necessarily have similar degradation trajectories, while bearings under different working conditions may have similar degradation trends. In addition, a sufficient volume of bearings with full-life data is a prerequisite for training a practicable prediction model using deep learning methods, but it is not easy to collect large amounts of bearing data with health information marked, since the collection of bearing data is complex and expensive.
To address the mismatch between the pretrained RUL prediction model and the test bearing, one approach is to map the pretrained model or the testing data by employing transfer learning (TL) approaches to adapt the prediction model to the test bearing domain [27]. To this end, many researchers have extensively explored transfer learning methods, such as transfer component analysis (TCA) [28], joint distribution adaptation (JDA) [29], and correlation alignment (CORAL) [30]. Aiming at bearing RUL prediction based on transfer learning, Mao et al. [31] adopted TCA to bridge the RUL discrepancy between test bearings and training bearings. P. Ding et al. [32] proposed a novel bearing RUL assessment method based on unsupervised meta-learning to deal with the challenge of poor generalization and low prediction accuracy resulting from unlabelled and limited samples. Y. Ding et al. [33] proposed a dynamic domain adaptation-based RUL prediction method for the machinery with multiple working conditions using a deep subdomain adaptive regression network. X. Li et al. [34] proposed a deep learning-based prognostic method in which the generative adversarial network is used to learn the distribution of the healthy state data. However, it is worth noting that it is a time-consuming task with a tremendous computational burden to perform the transfer learning for high RUL predicting accuracy. Besides, the nonlinear relationship between the training data and the RUL prediction model after domain adaptation is not necessarily suitable for all the target test bearing data sets due to the individual heterogeneity of the bearings.
Another approach for cross-domain RUL prediction problems is the degradation indicator extrapolation method. The DI extrapolation method sets a reasonable failure threshold according to the data characteristics of the given test bearing, then establishes a fitted degradation model of the test bearing according to its historical DI curve, and finally realizes the RUL prediction of the given testing bearing based on the preset failure threshold and the fitted DI curve. Compared to the black-box transfer learning-based methods, the DI extrapolation method has good physical interpretability. The challenges in this method are how to fit the bearing historical DI curve quickly and accurately and how to determine the failure threshold automatically with no human intervention, in which three issues should be addressed carefully:
(1) Building a degradation model for the given bearing using practical features that can characterize the performance degradation of the bearing
(2) Setting practicable failure thresholds for the testing bearings with different performance degradation patterns
(3) Establishing a prediction framework with good generalization ability that can update the model parameters dynamically according to bearing test information
Computational Intelligence and Neuroscience
As an approximate exponential model, the grey forecasting model (GM) can analyze the internal law of a grey system with fuzzy structure and incomplete or uncertain exponential data, showing high forecasting precision and robust performance. K. Peng et al. [35] proposed an aircraft engine RUL prediction method based on the GM (1, 1) model, which improved prediction performance by applying logarithmic operations and sliding-window prediction. Z. Meng et al. [36] combined the Markov and GM (1, 1) models to realize bearing RUL prediction, showing a lower root mean square percentage error.
Because the traditional grey models cannot accurately simulate a nonhomogeneous exponential sequence with velocity and acceleration terms, S. Li et al. [37] proposed a grey forecasting model with full-order time power terms (FOTP-GM (1, 1)) to solve this problem. The FOTP-GM (1, 1) model can simulate a more complex approximate exponential sequence containing constant, velocity, and acceleration terms by changing its structure automatically to adapt to the evolution trend of the parameters to be predicted. The FOTP-GM (1, 1) model can fit a homogeneous exponential sequence exactly and simulate nonhomogeneous exponential sequences, which provides a new approach for applying grey theory to bearing RUL prediction.
The accuracy of bearing RUL prediction is affected not only by the bearing degradation model but also by the failure threshold. A reasonable failure threshold lets facility managers carry out more effective maintenance depending on the bearing health condition. In the existing literature on data-driven bearing RUL prediction, there are few discussions on the setting of the failure threshold, and most thresholds are set manually, mainly by the manual empirical method [38], a vibration acceleration amplitude threshold of 20 g [31], or a uniform life percentage threshold [39]. These failure threshold setting methods are highly subjective and do not consider the variability of the bearing performance degradation process, so their engineering utility is limited. In engineering practice, the degradation trends of bearings under the same working conditions may be different, while the degradation trends of bearings under different working conditions may be similar to some extent, due to the individual heterogeneity and different environmental conditions of bearings. Therefore, the difference and similarity of the degradation trends between the training and test bearings must be taken into account when performing bearing RUL prediction. Some research has addressed this gap. Wang T et al. [40] proposed an RUL prediction method based on degradation trajectory similarity using the Euclidean distance as the similarity criterion: the smaller the distance, the greater the similarity. Li et al. [41] proposed a similarity-based approach for RUL estimation of industrial components by calculating the fuzzy similarity between test trajectory patterns and reference training trajectory patterns. However, these similarity measures need to unify the sequence length using an averaging or interpolation method, which loses the original time information of the degradation trajectories. Dynamic time warping [42], which can effectively cluster time series with noise and time distortion, is an effective pattern dissimilarity measurement technique that can align two sequences of different lengths representing the same type of process in the time domain and calculate the distance between the two time series by stretching and compressing them.
Given the challenges and discussion above, this work proposes a bearing RUL prediction approach that considers the difference and similarity of bearing degradation trajectories. In the proposed method, DTW is used to measure the similarities between degradation trajectories of the training and testing bearings, and the FOTP-GM (1, 1) model is utilized to fit the degradation trajectory of the given bearing with high simulative precision. Firstly, the time-frequency diagrams of the training and testing bearings are generated by CWT, and the deep features associated with the performance degradation are extracted by inputting these time-frequency representations into the CAE network. Secondly, the degradation trajectories of the training and testing bearings are obtained by inputting these hidden features into the pretrained SOM networks. Then, the fitted degradation trajectory models of the training and testing bearings are obtained by the FOTP-GM (1, 1) model, and the failure thresholds of the testing bearings are determined using the fitted degradation curves of the training bearings and the degradation trend distances between the training and testing bearings measured by DTW. Finally, the RUL prediction of the testing bearings is realized using the preset failure thresholds and the fitted degradation curves of the testing bearings. The proposed framework is evaluated on the IEEE PHM 2012 Challenge data sets [43] and the XJTU-SY data sets [44]. The case study on experimental bearing data sets shows that the proposed method can accurately predict the RUL of bearings under different working conditions. The comparison with other state-of-the-art methods also verifies the feasibility of the proposed method. The main contributions of this paper are summarized as follows:
(1) A CWT-CAE-SOM-based bearing degradation indicator construction method is proposed without manual steps of signal feature extraction, selection, and fusion. This method extracts the hidden deep representations driven by the monitoring data of the given bearing to meet the consideration of the [...]
In this method, the fitting error between the fitted degradation curve and the original degradation trajectory is selected as the evaluation metric for the training phase, and the distance between the fitted degradation curves of the given testing bearing and the corresponding reference training bearing is used as the selection criterion in the testing phase, which fully considers the difference and similarity of the degradation trajectories of the training bearing and the test bearing
(4) A data-driven bearing failure threshold setting method is proposed. This method adaptively determines the failure threshold of the given testing bearing using the DTW distance between the given testing bearing and the reference training bearing and the life endpoint value of the fitted degradation curve of the reference training bearing, which avoids the subjectivity of manual failure threshold setting.
The remainder of this paper is organized as follows. Section 2 introduces the theoretical background. In Section 3, the failure threshold setting and RUL prediction methods are proposed and discussed. In Section 4, the experimental procedure and results are presented and discussed. Finally, some conclusions are drawn in Section 5.

Bearing Performance Degradation Characteristics Extraction Based on CWT and CAE.
The time-frequency representation provides the joint distribution information of the time domain and frequency domain, and the continuous wavelet transform is applied to extract the time-frequency distribution of the monitored bearing signal. The CWT decomposes the given bearing signal into a time-scale plane representation by scaling and shifting the mother wavelet. A mother wavelet $\psi \in L^2(\mathbb{R})$ is usually a function with zero average and finite length, where $L^2(\mathbb{R})$ is the space of square-integrable complex functions [45]. The family of time-scale waveforms is obtained by scaling and shifting the mother wavelet as follows:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad (1)$$

where $a > 0$ is the scale factor for dilating or contracting the wavelet and $b$ is the shifting factor for translating the wavelet along the time axis. For the given bearing signal $x(t)$, the CWT decomposes $x(t)$ into wavelet coefficients according to the following integral:

$$W(a,b) = \int_{-\infty}^{+\infty} x(t)\,\psi^{*}_{a,b}(t)\,dt, \qquad (2)$$

where $\psi^{*}$ is the complex conjugate of the mother wavelet $\psi$. The CWT is useful for obtaining the frequency components at different time scales and resolutions. For small scales ($a < 1$), $\psi_{a,b}(t)$ is contracted, that is, short and of high frequency, while for large scales ($a > 1$), $\psi_{a,b}(t)$ is dilated, that is, long and of low frequency.
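As an illustration of the scaling and shifting described above, the following is a minimal sketch of a discretized CWT. It assumes a complex Morlet mother wavelet with center parameter `w0 = 6`, for which scale and frequency are related approximately by f ≈ w0 / (2πa); the source does not state which mother wavelet is used, and the function names `morlet` and `cwt` are hypothetical.

```python
import cmath
import math

def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (assumed choice, not taken from the paper)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)

def cwt(signal, scales, dt, w0=6.0):
    """Return |W(a, b)| for each scale a (rows) and each sample shift b (columns).

    The integral over t is approximated by a Riemann sum with step dt.
    """
    n = len(signal)
    scalogram = []
    for a in scales:
        row = []
        for ib in range(n):
            acc = 0j
            for it in range(n):
                u = (it - ib) * dt / a       # (t - b) / a
                if abs(u) > 5.0:             # the Gaussian envelope is negligible here
                    continue
                acc += signal[it] * morlet(u, w0).conjugate()
            row.append(abs(acc) * dt / math.sqrt(a))
        scalogram.append(row)
    return scalogram

# Usage: a 50 Hz sine sampled at 1 kHz responds most strongly at the 50 Hz scale.
fs = 1000.0
sig = [math.sin(2 * math.pi * 50.0 * k / fs) for k in range(200)]
scales = [6.0 / (2 * math.pi * f) for f in (25.0, 50.0, 100.0)]
S = cwt(sig, scales, 1.0 / fs)
```

The direct double loop keeps the sketch dependency-free; for full-length vibration records an FFT-based implementation would be the practical choice.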
In view of the individual variability of bearing degradation trends, the CAE is used to automatically extract degradation characteristics from bearing monitoring data sets, realizing the effective extraction of bearing performance degradation features without prior knowledge of bearing RUL. Compared with the traditional autoencoder, the CAE, as an unsupervised deep learning method, uses convolutional operations for the encoding and decoding parts instead of slicing and stacking the data, which significantly improves the performance of training parameter optimization and feature extraction [46]. The convolutional network in the CAE encoder encodes the input data into a set of hidden space representations, and then the decoder reconstructs the input data using a deconvolution operation. As shown in Figure 1, let $TF_i$ denote the time-frequency map of the given bearing data $X_i$ at time $i$, where $TF_i \in \mathbb{R}^{L_1 \times L_1 \times D}$, and let $H$ represent the latent space representation of the given bearing time-frequency data, where $H \in \mathbb{R}^{L_2 \times L_2 \times K}$. The $k$th feature map in the encoder output $H$ can be expressed as follows:

$$h^{k} = \sigma\left(TF_i * \omega_k + b_k\right), \qquad (3)$$

where $\sigma(\cdot)$ is the nonlinear activation function, $*$ represents 2-D convolution, and $\omega_k$ and $b_k$ denote the weights and bias value of the $k$th convolution kernel of the encoder, respectively. Then the encoded hidden representations can be decoded to reconstruct the input data $TF_i$ using a deconvolution operation as follows:

$$\widehat{TF}_i = \sigma\left(\sum_{k=1}^{K} h^{k} * \tilde{\omega}_k + \tilde{b}_k\right), \qquad (4)$$

where $\widehat{TF}_i \in \mathbb{R}^{L_1 \times L_1 \times D}$ is the reconstruction of the input bearing time-frequency data, $\tilde{\omega}_k$ is the 2-D deconvolutional filter in the decoder, and $\tilde{b}_k$ is the bias value of the decoder. During unsupervised pretraining, the loss function of the CAE is defined as follows:

$$E = \frac{1}{L_1^{2} D}\left\lVert TF_i - \widehat{TF}_i \right\rVert_2^{2}, \qquad (5)$$

where $E$ denotes the mean-square-error (MSE) distortion between the original input image $TF_i$ and the reconstructed image $\widehat{TF}_i$. Minimizing the loss function $E$ yields an optimal hidden space representation of the input bearing time-frequency data $TF_i$, which can be used as a deep feature of bearing performance degradation at time $i$.
To improve the training efficiency and generalization capability of the CAE network, multiple convolutional layers are usually adopted, and each convolutional layer is followed by a batch normalization layer so that the inputs and outputs of each layer keep the same amplitude distribution as the input data. This allows the CAE to be trained with a larger learning rate, accelerates training, and mitigates the influence of covariate shift.

Bearing Performance Degradation Trajectory Construction Based on SOM.
Because they differ in their sensitivity to the degradation trend of bearing performance, the multiscale deep representations extracted by the CAE cannot reflect the hidden information of bearings in the degradation process in a unified manner, and it is necessary to map these deep features into a unified performance degradation indicator. To obtain an accurate DI curve of the given bearing, the SOM is used to fuse the multiscale deep representations extracted by the CAE into a nondimensional DI.
Let $H_i$ denote the $n$-dimensional hidden representation of the given bearing time-frequency data $TF_i$ output by the encoder, and let $W_j$ be the weight vector of the $j$th neuron in the SOM network, where $j = 1, 2, \ldots, M$ and $M$ is the number of neurons in the SOM network. The construction process of the DI is as follows:
(1) The normalized $H_1$ is first input into the SOM model, and the winning neuron $c$ is selected according to the minimum Euclidean distance criterion:

$$c = \arg\min_{j} \left\lVert H_1 - W_j \right\rVert. \qquad (6)$$

(3) All the historical time-frequency diagrams of the given bearing are input into the trained SOM model, and the DI trajectory is obtained as the minimum Euclidean distance between the hidden feature $H_t$ at time $t$ and the weight vectors $W$:

$$DI(t) = \min_{j} \left\lVert H_t - W_j \right\rVert. \qquad (8)$$

The $DI(t)$ measures the degree of deviation of the bearing hidden representations under degradation conditions from those under the normal condition in the kernel space performed by the SOM. A higher DI value represents a more severe degradation in bearing performance.
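The DI defined above is the distance from each hidden feature vector to its nearest SOM neuron. A minimal sketch of that final step is given below; it assumes the SOM weight vectors have already been trained, and the names `euclidean` and `degradation_indicator` as well as the toy numbers are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def degradation_indicator(features, som_weights):
    """DI(t) = min_j ||H_t - W_j|| for every feature vector H_t.

    features:    list of hidden feature vectors, one per time step
    som_weights: list of trained SOM weight vectors W_j (assumed given)
    """
    return [min(euclidean(h, w) for w in som_weights) for h in features]

# Usage with toy values: the DI grows as features drift away from the trained neurons.
weights = [[0.0, 0.0], [1.0, 0.0]]
feats = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
di = degradation_indicator(feats, weights)
```

With these toy values the indicator is 0 for a feature that coincides with a neuron and increases with the distance to the nearest neuron.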

Similarity Measure of Bearing Degradation Trajectories Based on DTW.
Dynamic time warping, which can effectively cluster time series with noise and time distortion, is an effective pattern dissimilarity measurement technique. DTW-based bearing degradation trend matching is a nonlinear regularization method that combines bearing operation time regularization with degradation trajectory distance calculation. It takes the historical degradation curve of the given testing bearing as the reference template, compares the full-life degradation trajectories of the training bearings with the reference template one by one, and selects the training bearing with the minimum trajectory distance as the trend-matched training bearing of the given test bearing.
The notation $DI_{train}$ denotes the full-life degradation trajectory of a training bearing, and $DI_{test}$ represents the degradation trajectory of a testing bearing, where $DI_{train} \in \mathbb{R}^{1 \times N_{train}}$ and $DI_{test} \in \mathbb{R}^{1 \times N_{test}}$. Define a matrix $d \in \mathbb{R}^{N_{train} \times N_{test}}$, where $d(i, j)$ represents the distance between point $DI_{train}(i)$ and point $DI_{test}(j)$. The goal is to find the best-fitting path through the grid points of this matrix such that the total distance between the two degradation curves is the shortest. Such a path can be expressed as follows:

$$p = \{p_1, p_2, \ldots, p_k, \ldots, p_K\}, \qquad (9)$$

where $p$ represents the regularity degree of the two trajectories and $p_k = (i, j)$ is the $k$th element of the path. The grid points through which the path passes are the points at which the two sequences are aligned for calculation. To ensure that each point of $DI_{train}$ and $DI_{test}$ appears in the path $p$, the $i$ and $j$ in $p_k = (i, j)$ must be monotonically increasing.
The obtained regularized path is required to satisfy the shortest distance path rule:

$$D(i, j) = d(i, j) + \min\{D(i-1, j),\; D(i, j-1),\; D(i-1, j-1)\}, \qquad (10)$$

where $D(i, j)$ is the cumulative distance of the two sequences from $p_1 = (1, 1)$ to $p_k = (i, j)$, that is, the Euclidean distance $d(i, j)$ between point $DI_{train}(i)$ and point $DI_{test}(j)$ plus the smallest cumulative distance among the neighboring elements from which the point $(i, j)$ can be reached. The value of the last point in matrix $D$ is the total cumulative distance of the two sequences and is used as the measure of their similarity: the smaller the distance, the more similar the two sequences.
Bearings with similar degradation trajectories should have a similar degradation pattern. The failure threshold of the reference training bearing whose degradation trend matches that of the given testing bearing can therefore be used to determine the failure threshold of the given testing bearing.
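The cumulative-distance recursion and the trend-matching step described in this subsection can be sketched as follows. This is a minimal illustration, assuming scalar DI values with the absolute difference as the point distance d(i, j); the function names `dtw_distance` and `match_reference` are hypothetical.

```python
def dtw_distance(x, y):
    """Total cumulative DTW distance between two sequences of possibly different length."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D has one extra row and column so the recursion needs no special-casing at the border.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # point distance d(i, j)
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def match_reference(test_di, train_dis):
    """Return (index, distance) of the training trajectory closest to the test trajectory."""
    distances = [dtw_distance(train, test_di) for train in train_dis]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

# Usage: the second training trajectory is a time-stretched copy of the test trajectory.
best_index, best_distance = match_reference([1, 2, 3], [[5, 6, 7], [1, 2, 2, 3]])
```

Because the warping path may repeat points, the stretched trajectory matches with zero distance even though the two sequences have different lengths, which is the property that motivates DTW over length-unifying similarity measures.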

The Basic Principle of the FOTP-GM (1, 1) Model.
The traditional grey model is only suitable for fitting time series that follow a pure exponential law, while the performance degradation of rolling bearings is affected by many complex factors, such as steady disturbance, constant speed disturbance, and acceleration disturbance, so its performance degradation trajectory will not strictly follow the pure exponential law [21]. FOTP-GM (1, 1), as a new model derived from the grey prediction model, can adaptively change the model structure and parameters according to the dynamic changes of the measured sequence to maximize the fitting and prediction accuracy. The discrete form of the FOTP-GM (1, 1) model is defined as follows: where $a$ is the development index, $b_i$ ($i = 1, 2, \ldots, h$) is the grey actuating quantity, and $h$ is termed the order of the time power terms $b_i t^{h-i}$. Based on the given degradation curve $X^{(0)}$, a 1-AGO sequence $X^{(1)}$ can be generated by the following:

$$x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i). \qquad (12)$$

Then the parameter sequence $a$ and $b_i$ ($i = 1, 2, \ldots, h$) of the FOTP-GM (1, 1) model can be estimated by the least-squares method, and the corresponding time response function can be obtained as follows: where $c$ is a constant that can be optimized to acquire the minimum error of simulation. The reconstructed sequence can be obtained by the inverse accumulation as follows:

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1). \qquad (14)$$

The structural parameters of the FOTP-GM (1, 1) model can be adjusted adaptively with the dynamic changes of the actual time sequence, so the model can fit a homogeneous exponential sequence without error and approximate a nonhomogeneous exponential sequence accurately.
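To make the accumulate, fit, and restore workflow concrete, the sketch below implements the classical GM (1, 1) model, not the FOTP-GM (1, 1) model of this paper: it uses a single constant grey actuating quantity b and omits the time power terms. The function names `ago`, `fit_gm11`, and `predict_gm11` are hypothetical, and the mean of adjacent accumulated values is assumed as the background value.

```python
import math

def ago(x0):
    """1-AGO: first-order accumulated generating operation (running sum)."""
    out, s = [], 0.0
    for v in x0:
        s += v
        out.append(s)
    return out

def fit_gm11(x0):
    """Least-squares estimate of (a, b) in x0(k) + a * z1(k) = b."""
    x1 = ago(x0)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]  # background values
    y = x0[1:]
    n = len(z)
    szz = sum(v * v for v in z)
    sz = sum(z)
    szy = sum(p * q for p, q in zip(z, y))
    sy = sum(y)
    det = n * szz - sz * sz          # determinant of the 2x2 normal equations
    a = (sz * sy - n * szy) / det
    b = (szz * sy - sz * szy) / det
    return a, b

def predict_gm11(x0_first, a, b, steps):
    """Time response of the accumulated series, then inverse accumulation."""
    x1_hat = [(x0_first - b / a) * math.exp(-a * k) + b / a for k in range(steps)]
    return [x1_hat[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, steps)]

# Usage: fit a near-exponential toy DI series and extrapolate two steps ahead.
series = [2.0 * 1.1 ** k for k in range(8)]
a, b = fit_gm11(series)
fitted = predict_gm11(series[0], a, b, 10)
```

For a growing series the development index a is negative. The FOTP-GM (1, 1) model described in this section replaces the constant b with a sum of time power terms, which this sketch does not attempt to reproduce.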

Methodology
Due to the nonlinearity and uncertainty of the bearing degradation process, it is difficult to predict the RUL of the given bearing accurately. The essential step in solving this problem is to establish a credible and reasonable mathematical model to characterize such a nonlinear degradation process. Based on the theories mentioned in Section 2, the proposed method is shown in Figure 2 and mainly contains four steps: DI trajectory construction, degradation trend matching, optimal order setting of the FOTP-GM (1, 1) model, and RUL prediction with preset failure thresholds.
Firstly, the time-frequency diagrams of the vibration data of the training and testing bearings are input into the CAE network to extract the deep hidden representations, and the degradation trajectories of the training and testing bearings are generated from these representations using the SOM network. Secondly, the trend similarity between the degradation trajectories of the training and testing bearings is evaluated using DTW, and the degradation trend matching results are determined by the minimum DTW distance. Then, the degradation trajectory models of the training and testing bearings are obtained by applying the FOTP-GM (1, 1) model with an order optimization method. Finally, the RUL prediction of the testing bearing is realized using the preset failure thresholds and the fitted degradation curves of the testing bearings.

Data-Driven Based Degradation Trajectory Model Construction.
An accurate bearing degradation model can reduce the nonlinearity and uncertainty of the bearing RUL prediction. Based on the theories mentioned in Section 2, the proposed data-driven method for bearing degradation trajectory construction and modeling is shown in Figure 3, which can be summed up in four steps as follows.
(1) Generating time-frequency diagrams of the given bearing data using equation (2) given in Section 2.1.
(2) Extracting the deep hidden features of obtained time-frequency diagrams using the CAE method mentioned in Section 2.1.
[...] terms orders, then using the method given in Section 3.2 to evaluate the fitting performance and select the optimal time power terms order h.

The Order Determining of the Time Power Terms for the Training and Testing Phase.
Since the curve reconstructed by the FOTP-GM (1, 1) model is sensitive to the order of the time power terms $b_i t^{h-i}$, an appropriate order must be selected: a fitted degradation trajectory model with a suitable $h$ can accurately reflect the process of bearing performance degradation. The FOTP-GM (1, 1) model is used to fit the data points in the historical degradation trajectory of the given bearing, and the corresponding fitted time series, $DI_{fitted}$, can be described as follows: where $u$, $p$, $q$, $r$, and $s$ are the fitting parameters.
For the training phase, the fitting performance for different orders of time power terms can be expressed as follows:

$$e_{train}(h) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(DI_{train}(i) - DI_{train\_fitted}(i)\right)^{2}},$$

where $e_{train}$ represents the root-mean-square value of the fitting error between the original degradation trajectory $DI_{train}$ and the fitted degradation curve $DI_{train\_fitted}$ obtained with order $h$, $h \in \mathbb{N}^{*}$ is the order of time power terms, and $N$ is the length of the original degradation trajectory $DI_{train}$. The smaller the value of $e_{train}$, the smaller the fitting error between the fitted curve and the original curve. To prevent overfitting, the $h$ corresponding to the trend transition point in the $e_{train}$ curve is selected as the optimal order of time power terms for the given training bearing.
For the testing phase, the optimal order of time power terms is determined using the DTW distance between the fitted degradation curves of the given testing bearing and the corresponding trend-matched reference training bearing:

$$e_{test}(h) = \mathrm{DTW}\left(DI_{train\_fitted},\; DI_{test\_fitted}\right),$$

where $DI_{train\_fitted}$ and $DI_{test\_fitted}$ are the fitted degradation curves of the reference training bearing and of the given testing bearing fitted with order $h$, respectively. The $h$ that corresponds to the minimum point in the $e_{test}$ curve and makes the fitted curve monotonically increasing is selected as the optimal time power terms order for the given testing bearing.

e Bearing Failure reshold Setting and RUL Prediction.
e discrepancy and similarity of degradation trajectories are considered to determine the failure threshold of the given testing bearing. e degradation trend of the training bearing, which has a similar degradation trend to the given testing bearing, is selected as the reference performance degradation pattern, and the DI value at the life endpoint is considered to set the testing bearing failure threshold. However, the degradation trajectories of training and testing bearings cannot be the same, so the DTW distance between these two curves, which indicates the similarity of the degradation trend, is considered to reduce the difference between these two degradation curves.
Let $DI_{\mathrm{train}}$ and $DI_{\mathrm{train\_fitted}}$ represent, respectively, the original DI trajectory and the fitted degradation curve reconstructed by the FOTP-GM (1, 1) model of the given reference training bearing, as shown in Figure 4(a). $DI_{\mathrm{test}}$ and $DI_{\mathrm{test\_fitted}}$ denote, respectively, the original DI trajectory and the fitted degradation curve reconstructed by the FOTP-GM (1, 1) model of the given testing bearing, as shown in Figure 4(b). The given testing bearing and the reference training bearing have the minimal DTW distance, which indicates a similar degradation trend. As shown in Figure 4(e), the value $DI_{\mathrm{train\_fitted}}(N)$ at the life endpoint is taken as the failure threshold of the given training bearing. The failure threshold of the given testing bearing is then determined from this value and the DTW distance between the two fitted curves (equation (16)).

The RUL prediction method of rolling bearings based on FOTP-GM (1, 1) can dynamically optimize the model structure according to the performance degradation trend of the bearing so as to minimize the simulation error of the original time sequence, which gives higher prediction accuracy and stronger generalization ability.
As shown in Figure 4(e), the RUL prediction is implemented as follows. First, the historical DI curve of the given testing bearing, $DI_{\mathrm{test}}(m)$ $(m = 1, 2, \ldots, M)$, is used to build the model and solve its parameters with equation (11). Then, the time response function of $DI_{\mathrm{test\_fitted}}$ is solved using equation (13), and several future points of $DI_{\mathrm{test\_fitted}}$ are predicted. When the $DI_{\mathrm{test\_fitted}}$ value reaches the preset failure threshold, the whole life is considered to be reached. Finally, the RUL is obtained by subtracting the current operation time $T_C$ from the predicted whole life $T_p$, that is,

$$RUL = T_p - T_C$$

This data set [43] was used in the IEEE PHM 2012 Data Challenge for predicting the RUL of bearings. As shown in Figure 5, the experimental platform conducted accelerated degradation tests to collect degradation data of ball bearings until their total failure, in which 17 bearings were tested under 3 different operating conditions, as shown in Table 1. The run-to-failure data of the first 2 bearings of each group were used as the training set to build prognostics models, and the data of the remaining 11 bearings were truncated, with their RULs to be predicted. The sampling frequency of the acceleration signal collected by the test bench is 25.6 kHz, and the data acquisition card (NIDAQCard-9174) records a 0.1 s sample (2,560 data points) every 10 s.

The Result of DI Curves Construction.
The vibration signals collected from faulty bearings usually contain periodic pulses with shapes similar to the Morlet wavelets. Based on the principle that the shape of the selected wavelet should be similar to the mechanical fault signal, the Morlet-based CWT is applied to extract time-frequency features from the raw vibration signal of bearings. Figure 6 shows the time-frequency diagrams of the bearing training data sets during the run-to-failure experiment, in which the degradation progress is calculated as a percentage of the operation time over the whole lifetime of the bearing. Generally, the lifecycle of a bearing can be divided into three stages: normal, degradation, and failure. The normal stage has two phases: run-in state and steady state. The frequency response of a running-in bearing is concentrated in the rotating zone, and the time-frequency diagram becomes clean with time.
The time-frequency diagram of a normal bearing is clean, with occasional random shocks. In the degradation stage, the time-frequency diagram starts to become cluttered, and the frequency response is concentrated in the middle frequency band. When the degradation progress approaches 100%, the time-frequency diagram becomes very cluttered, and the frequency response has amplitude in all frequency bands. The above analysis shows that the time-frequency diagram reveals the frequency energy distribution of bearing vibration signals and hints at the development trend of defects in the time-frequency domain, and that the time-frequency characteristics of the bearing vibration signals are sensitive to the bearing degradation.
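To make the time-frequency step concrete, the following is a minimal pure-Python sketch of a Morlet-based CWT. It is written from the textbook definition of the transform and is not the authors' code. The centre frequency `omega0 = 6`, the truncation of the wavelet at four scales, and the test signal (a 200 Hz sine sampled at 2,560 Hz) are all assumptions chosen for illustration. A numerical library would normally be used instead of explicit loops.

```python
import cmath
import math

def morlet(t, omega0=6.0):
    """Complex Morlet mother wavelet (omega0 = 6 is a conventional choice)."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * t) * math.exp(-0.5 * t * t)

def cwt_morlet(signal, fs, freqs, omega0=6.0):
    """Return |W|^2 as a list of rows, one row per analysis frequency."""
    dt = 1.0 / fs
    n = len(signal)
    rows = []
    for f in freqs:
        scale = omega0 / (2.0 * math.pi * f)      # scale whose centre frequency is f
        half = int(math.ceil(4.0 * scale / dt))   # truncate the wavelet at +/- 4 scales
        kernel = [morlet(k * dt / scale, omega0).conjugate()
                  for k in range(-half, half + 1)]
        row = []
        for tau in range(n):
            acc = 0j
            for k in range(-half, half + 1):
                idx = tau + k
                if 0 <= idx < n:                  # samples outside the record count as zero
                    acc += signal[idx] * kernel[k + half]
            row.append(abs(acc * dt / math.sqrt(scale)) ** 2)
        rows.append(row)
    return rows

fs = 2560.0
signal = [math.sin(2 * math.pi * 200.0 * i / fs) for i in range(256)]
freqs = [50.0, 100.0, 200.0, 400.0]
scalogram = cwt_morlet(signal, fs, freqs)
mean_energy = [sum(row) / len(row) for row in scalogram]
dominant = freqs[mean_energy.index(max(mean_energy))]
```

Each row of `scalogram` is one horizontal line of a time-frequency diagram. A cluttered diagram in the sense used above corresponds to energy spread over many rows rather than concentrated in one.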
The time-frequency maps of all the tested bearings are input into the CAE model to perform encoding and decoding operations, and the depth hidden characteristics related to bearing performance degradation can be obtained from the output of the convolutional encoder. The specific parameter settings of the three CNN models are shown in Table 2.
The size of the input time-frequency maps is 128 × 128 × 3. The convolutional encoder has three convolutional layers and three pooling layers. The size of the convolutional kernels in the three convolutional layers is 3 × 3, and the stride is 1 × 1. The tiling sizes for the three pooling layers are 4 × 4, 4 × 4, and 2 × 2, respectively. The batch normalization operation is applied to the output of each convolutional layer so that the inputs and outputs of each convolutional layer follow the same distribution as the input data. The ReLU function is used as the activation function, and maximum pooling is used for the pooling layers. The padding parameter in both the convolutional and pooling layers is "SAME" to retain the most significant features of the input maps. The convolutional decoder includes six deconvolutional layers, whose parameters correspond to those of the convolutional encoder. Except for the last deconvolutional layer, the outputs of all the deconvolutional layers are batch normalized, and the activation function is the ReLU function with "SAME" padding.
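The encoder dimensions can be checked with a short shape calculation. The sketch below assumes that each pooling layer uses a stride equal to its tiling size, which the text does not state explicitly, and takes the 15 output channels from the encoder output size reported with Table 2. The function names are hypothetical.

```python
import math

def same_pool_out(size, pool):
    """Output length of a pooling layer with 'SAME' padding and stride == pool size."""
    return math.ceil(size / pool)

def encoder_output_shape(input_hw, pool_sizes, out_channels):
    """Spatial size after the pooling stack, plus the final channel count."""
    h, w = input_hw
    for p in pool_sizes:
        # 3x3 convolutions with stride 1 and 'SAME' padding keep h and w unchanged,
        # so only the pooling layers shrink the feature maps.
        h, w = same_pool_out(h, p), same_pool_out(w, p)
    return h, w, out_channels

shape = encoder_output_shape((128, 128), pool_sizes=[4, 4, 2], out_channels=15)
flat_dim = shape[0] * shape[1] * shape[2]
```

Under these assumptions the spatial size goes 128 → 32 → 8 → 4, so `shape` is `(4, 4, 15)` and the flattened feature vector has 240 entries.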
As illustrated in Table 2, the output of the convolutional encoder is taken from pooling layer 3, and its size is 4 × 4 × 15, which means that the hidden layer contains 15 feature maps with a size of 4 × 4. These feature maps can be expanded into a fully connected layer with a size of 1 × 240, and the values in this fully connected layer are the depth features of the bearing time-frequency map extracted by the convolutional autoencoder. Taking bearing 1_1 as an example, the dimension of the depth features extracted by CAE is 240. Four representative features, Nos. 59, 74, 113, and 125, are selected, and their variation with the operation time is shown in Figure 7. Figure 7 shows that the depth features extracted by CAE have a certain trend with time, which is suitable for the prediction analysis of bearing RUL. However, different depth features track the degradation trend of bearing performance in different ways: some features increase or decrease with time, and some change suddenly in amplitude at a certain point in operation time.

Because the depth features extracted by CAE cannot uniformly reflect the degradation process, all the depth features are mapped into a unified DI by SOM. The first 5% of the normalized feature sets of each bearing are selected as the training data of the normal bearing to train the SOM model corresponding to that bearing, and the DI curves are obtained by inputting the normalized feature sets of all the tested bearings into the well-trained SOM model. Figure 8 shows that the lifecycle degradation trajectories of the bearings under the same working condition have a similar degradation trend with some visible differences, in which the DI values corresponding to the final failure time points are not the same. It can be seen from Figure 9 that the DI curves of the test bearings are clearly heterogeneous and have different lengths.
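The SOM-based DI construction can be sketched as follows. The way the SOM output is converted into a DI value is not spelled out in this section, so the sketch assumes a common construction: the DI of a sample is its minimum quantization error (MQE), that is, its distance to the best-matching unit of a SOM trained on the healthy baseline. The grid size, learning-rate schedule, and the synthetic two-dimensional features are illustrative assumptions, and feature normalization is omitted.

```python
import math
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; all hyperparameters are illustrative."""
    rng = random.Random(seed)
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in coords]  # initialise from samples
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.3)
        for x in rng.sample(data, len(data)):
            bmu = min(range(len(weights)), key=lambda u: sq_dist(weights[u], x))
            for u, w in enumerate(weights):
                grid_d2 = ((coords[u][0] - coords[bmu][0]) ** 2
                           + (coords[u][1] - coords[bmu][1]) ** 2)
                g = lr * math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for k in range(len(w)):
                    w[k] += g * (x[k] - w[k])  # pull the unit toward the sample
    return weights

def degradation_indicator(weights, x):
    """MQE: Euclidean distance from x to its best-matching unit."""
    return math.sqrt(min(sq_dist(w, x) for w in weights))

# Synthetic features: one drifts upward, one downward, plus small noise.
rng = random.Random(1)
T = 200
features = [[t / T + rng.gauss(0, 0.02), -t / T + rng.gauss(0, 0.02)] for t in range(T)]
healthy = features[: max(1, int(0.05 * T))]  # first 5% assumed healthy
som = train_som(healthy)
di = [degradation_indicator(som, x) for x in features]
```

Because the map only covers the healthy region of feature space, features that rise and features that fall both move samples away from it, so the distance grows in either case. This is one way to obtain a single increasing indicator from features with mixed trends.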
Due to the difference in data distribution between full-life bearings and test bearings, the degradation trends of bearings under the same working conditions may be different, while the degradation trends of bearings under different working conditions are similar to some extent, so it is difficult to visually determine the full-life bearings with degradation trajectories similar to those of the test bearings. Therefore, it is necessary to match the trend of the degradation trajectory of each bearing and set a reasonable failure threshold for the test bearing by considering the discrepancy and similarity of degradation trajectories of training and testing bearings.

Bearing Degradation Model Construction Using an Optimized Order of Time Power Terms.
The fitted degradation curves of FOTP-GM (1, 1) with different orders of time power terms differ in their ability to predict the bearing RUL. Using the method proposed in Section 3.2, the fitting performance is evaluated by calculating the fitting error between the original degradation trajectory $DI_{\mathrm{life}}$ and the fitted degradation curve $DI_{\mathrm{fit}}$. Figure 10 shows the degradation model constructed by FOTP-GM (1, 1) with optimal orders of time power terms.
In the fitting error curves of the FOTP-GM (1, 1) model for training bearings, there exists a trend transition point after which the fitting error fluctuates only slightly, as shown in Figure 10. The order of time power terms corresponding to this trend transition point can be selected as the optimal order for the given training bearing. The training bearings have different fitted degradation curves, which illustrates the heterogeneity of the bearing performance decay. The optimal orders and the values of the corresponding fitted degradation curves of the training bearings at the life endpoint are listed in Table 3.

Failure Threshold Setting of Testing Bearings.
To predict the bearing RUL accurately, a reasonable failure threshold for each testing bearing is needed. The similarity matching analysis is performed between the DI curves of the testing bearings and the lifecycle degradation trajectories of the training bearings to classify the degradation trends of the testing bearings based on those of the training bearings, and the corresponding training bearing degradation model is selected as the reference degradation model of the given test bearing. Using the method mentioned in Section 2.3, the similarities between the lifecycle degradation trajectories of the training bearings and the DI curves of the testing bearings are illustrated in Table 4.
Based on Tables 3 and 4, the failure thresholds of the testing bearings can be obtained using equation (16) given in Section 3.3; they are listed in Table 5.

Bearing Remaining Useful Life Prediction.
Based on the fitted DI curves and the preset failure thresholds of the test bearings, the FOTP-GM (1, 1) model is adopted to predict the RUL of the test bearings using the optimal time power terms order selection method introduced in Section 3.2. The performance of each order of time power terms of the trained model is evaluated quantitatively according to equation (15). Then the RUL of the given testing bearing can be estimated using the optimal fitted degradation curve and the preset failure threshold. For instance, the $e_{\mathrm{test}}$ curves, the optimal fitted degradation curves, and the RUL prediction results of bearings 1_3, 2_7, and 3_3 are shown in Figure 11.
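The final step, extrapolating the fitted curve to the failure threshold, can be sketched as below. The `fitted_curve` callable is only a stand-in for the FOTP-GM (1, 1) time response function, which is not reproduced here. The function name, the quadratic trend, and the threshold value are hypothetical. The 10 s step matches the acquisition interval of the Case 1 data set.

```python
def predict_rul(fitted_curve, threshold, current_index, step_seconds, max_steps=100000):
    """Extrapolate fitted_curve(k) until it reaches threshold.

    Returns (T_p, RUL) in seconds, where T_p is the predicted whole life and
    RUL = T_p - T_C, or None if the threshold is not reached within max_steps.
    """
    for k in range(current_index, current_index + max_steps + 1):
        if fitted_curve(k) >= threshold:
            t_p = k * step_seconds              # predicted whole life T_p
            t_c = current_index * step_seconds  # current operation time T_C
            return t_p, t_p - t_c
    return None

def example_curve(k):
    """Illustrative increasing trend, not a fitted bearing model."""
    return 0.01 * k + 0.0001 * k * k

whole_life, rul = predict_rul(example_curve, threshold=3.0,
                              current_index=100, step_seconds=10)
```

Returning `None` when the threshold is never crossed makes the failure case explicit, which matters for fitted curves that flatten out. This is one practical reason for requiring the fitted curve to be monotonically increasing.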
As shown in Table 6, the optimal order of the FOTP-GM (1, 1) is different for different testing bearings. Table 6 also lists the predicted RULs and the error rates of the testing bearings, in which the RUL prediction results of bearings 1_5 and 1_6 are 5,340 s and 4,380 s, whereas the given actual RULs of these two bearings are 1,610 s and 1,460 s. The RUL prediction errors of bearings 1_5 and 1_6 are therefore large, which lowers the overall prediction performance of the proposed method and suggests that it fails to predict the RULs of these two bearings accurately. However, reference [30] pointed out that bearings 1_5 and 1_6 have their own specificities in the IEEE PHM 2012 prognostic challenge data sets. The data set description document states that "For security reasons, tests were stopped when the amplitude of the vibration signal overpassed 20 g." However, bearings 1_5 and 1_6 did not meet this requirement. Figure 12 displays the waveforms of the last samples of bearings 1_3, 1_5, 1_6, and 1_7 with the 20 g amplitude threshold marked. It illustrates that, at the end of the test, the vibration peak amplitudes of bearings 1_5 and 1_6 are around 10 g and did not exceed the preset failure threshold of 20 g.
To further verify whether the vibration amplitudes of bearings 1_5 and 1_6 reached the preset failure threshold, the whole-life vibration peak amplitude curves of these two bearings are presented in Figure 13. Figure 13 indicates that the vibration amplitudes of bearings 1_5 and 1_6 did not exceed 20 g during the whole testing process, which means that the given reference RULs of these two bearings are shorter than their real RULs when the failure threshold is set to 20 g. It is therefore reasonable that the RUL prediction results of the proposed method for bearings 1_5 and 1_6 are longer than the given reference RULs. The error rates of the RUL prediction results of bearings 1_5 and 1_6 are not representative because the given reference RULs are smaller than their actual RULs.

The performance of the RUL prediction results of the testing bearings is evaluated using three metrics: root-mean-square error (RMSE), symmetric mean absolute percentage error (SMAPE), and a scoring function. The RMSE and SMAPE are computed from the actual and predicted RULs of the testing bearings. The scoring function has been adopted by many researchers and by the IEEE PHM 2012 Prognostic Challenge [46]. By assigning different weights to early and late prediction results, the scoring function is defined as follows:

$$E_i = 100 \times \frac{actRUL_i - RUL_i}{actRUL_i}$$

$$A_i = \begin{cases} \exp\left(-\ln(0.5)\cdot\dfrac{E_i}{5}\right), & E_i \le 0 \\ \exp\left(+\ln(0.5)\cdot\dfrac{E_i}{20}\right), & E_i > 0 \end{cases}$$

$$Score = \frac{1}{11}\sum_{i=1}^{11} A_i$$

where $i \in [1, 11]$ indexes the test bearings defined in Table 1, and $actRUL_i$ and $RUL_i$ denote the actual RUL to be predicted and the RUL of the bearing estimated by the experimental participants, respectively. $E_i$ is the percent prediction error on testing bearing $i$, and $A_i$ is the accuracy score of the RUL estimate for testing bearing $i$. The Score represents the overall accuracy of the testing bearing RUL prediction: the higher the Score, the higher the overall RUL prediction accuracy.
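For reference, the scoring function can be written out in code. The sketch below follows the IEEE PHM 2012 challenge definition of the percent error and the accuracy score; the function names are hypothetical. The RUL values used in the example are the ones quoted above for bearing 1_5.

```python
import math

def percent_error(actual_rul, predicted_rul):
    """E_i = 100 * (actRUL_i - RUL_i) / actRUL_i."""
    return 100.0 * (actual_rul - predicted_rul) / actual_rul

def accuracy_score(er):
    """A_i: late predictions (E_i <= 0) are penalised more than early ones."""
    if er <= 0:
        return math.exp(-math.log(0.5) * (er / 5.0))
    return math.exp(math.log(0.5) * (er / 20.0))

def challenge_score(actual, predicted):
    """Mean accuracy score over all test bearings."""
    scores = [accuracy_score(percent_error(a, p)) for a, p in zip(actual, predicted)]
    return sum(scores) / len(scores)

# Bearing 1_5: reference RUL 1,610 s, predicted RUL 5,340 s (a late prediction).
a_15 = accuracy_score(percent_error(1610, 5340))
```

The asymmetry is deliberate: an overestimate of 5% halves the score, whereas an underestimate must reach 20% to do the same. A strongly late prediction such as the one for bearing 1_5 therefore contributes almost nothing to the Score.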
To verify the performance of the proposed method, several state-of-the-art RUL prediction methods are compared, including Sutrisno's vibration frequency signature anomaly detection and survival time ratio [13], Singleton's extended Kalman filter-based method [14], Hong's combinatorial feature extraction and self-organization mapping [17], Zhu's multiscale convolutional neural network-based method [22], Guo's recurrent neural-network-based health indicator [24], Cheng's transferable convolutional neural network-based method [29], Mao's deep feature representation and transfer learning [30], and Li's deep adversarial neural networks-based method [33]. Table 7 shows that the proposed method has an outstanding prediction performance. Compared with Sutrisno's, Singleton's, and Cheng's methods, the prediction RMSE of the proposed method is much smaller. Besides, compared with Mao's method, although the RMSE and score of the proposed method are slightly smaller, the proposed method has the lowest SMAPE value, showing that the proposed prediction model still has better precision. Compared with the listed methods, the adaptive failure threshold setting enables the proposed method to predict the bearing RUL accurately under multiple operating conditions.

Experimental System and Data Description.
The experimental lifecycle data sets of rolling bearings provided by the Institute of Design Science and Basic Component at Xi'an Jiaotong University [44] are analyzed to further prove the effectiveness of the proposed method. As shown in Figure 14, the experimental platform consists of an AC motor, a motor speed controller, a rotating shaft, two supporting bearings, a hydraulic loading system, a test bearing, and so on. Two accelerometers (PCB 352C33) are positioned at 90° on the housing of the tested bearing to measure its horizontal and vertical vibrations. Fifteen rolling element bearings (LDK UER204) were tested under three different operating conditions to collect degradation data of ball bearings until their total failure. As shown in Table 8, the run-to-failure data of the first two bearings of each group were used as the training set to build prognostics models, and the data of the remaining nine bearings were truncated, with their RULs to be predicted. The sampling frequency of the acceleration signal collected by the test bench is 25.6 kHz, and the data acquisition card (LE DT9837) records a 1.28 s sample (32,768 data points) once every minute.

The Result of DI Curves Construction.
The Morlet-based CWT method is applied to obtain the time-frequency diagrams from the vibration data sets of the six training bearings, as shown in Figure 15. Similar to Case 1, the time-frequency map of a normal bearing is clean, with occasional random shocks. In the degradation stage, the time-frequency diagram starts to become cluttered, and the frequency response is concentrated in the middle frequency band. When the degradation progress approaches 100%, the time-frequency diagram becomes very cluttered, and the frequency response has amplitude in all frequency bands. The above analysis shows that the time-frequency diagram can hint at the development trend of defects in the time-frequency domain, and that the time-frequency characteristics of the bearing vibration signals are sensitive to the bearing degradation.
Taking bearing 2_1 as an example, the deep hidden features of its time-frequency diagrams are extracted by the CAE model proposed in Case 1, and four typical feature curves (Nos. 57, 94, 168, and 205) changing with the operation time are shown in Figure 16. Similar to Case 1, Figure 16 indicates that the depth features extracted by CAE have a certain trend with time. However, different depth features track the degradation trend of bearing performance in different ways: some features increase or decrease with time, and some change suddenly in amplitude at a certain point in operation time. The SOM method is applied to fuse these depth features into a unified DI to reflect the degradation process uniformly. The first 5% of the normalized feature sets of each bearing are selected as the training data of the normal bearing to train the SOM model.

Table 7: The comparison of RUL prediction results of testing bearings.

Methods                   RMSE     SMAPE    Score
Sutrisno's method [13]    0.3187   0.3583   0.3066
Singleton's method [14]   0.1161   0.3768   0.2645
Hong's method [17]        0.0907   0.2258   0.3614
Zhu's method [22]         0.0691   0.1549   0.3624
Guo's method [24]         0.0860   0.1910   0.2631
Cheng's method [29]       0.0971   0.2769   0.3035
Mao's method [30]         0.0558   0.2399   0.4285
Li's method [33]          0

Figure 14: Overview of XJTU's experimental platform [44].

The degradation trends of bearings under the same working conditions may be different, while the degradation trends of bearings under different working conditions are similar to some extent, so it is difficult to visually determine the full-life bearings with degradation trajectories similar to those of the test bearings. Therefore, it is necessary to match the trend of the degradation trajectory of each bearing and set a reasonable failure threshold for the given test bearings by considering the discrepancy and similarity of degradation trajectories of training and testing bearings.

Bearing Degradation Model Construction Using an Optimized Order of Time Power Terms.
The fitted degradation curves of FOTP-GM (1, 1) with different orders of time power terms differ in their ability to predict the bearing RUL. Using the method proposed in Section 3.2, the fitting performance is evaluated by calculating the fitting error between the original degradation trajectory $DI_{\mathrm{life}}$ and the fitted degradation curve $DI_{\mathrm{fit}}$. Figure 19 shows the degradation model of the training bearings constructed by FOTP-GM (1, 1) with optimal orders of time power terms. Similar to Case 1, the fitting error curves of the FOTP-GM (1, 1) model for training bearings contain a trend transition point after which the fitting error fluctuates only slightly, as shown in Figure 19. The order of time power terms corresponding to this trend transition point can be selected as the optimal order for the given training bearing. The optimal orders and the values of the corresponding fitted degradation curves of the training bearings at the life endpoint are listed in Table 9.
Taking subtask #6 with the division ratio of 75% as an example, the similarity matching analysis is performed between the historical DI trajectories of the testing bearings and the lifecycle degradation trajectories of the training bearings to match the degradation trends of the testing bearings with those of the training bearings, and the results are given in Table 11. Based on Tables 9 and 11, the failure thresholds of the testing bearings can be obtained using equation (16) given in Section 3.3; they are listed in Table 12.

Bearing Remaining Useful Life Prediction.
The RUL of the given testing bearing can be estimated using the optimal order of time power terms and the preset failure threshold. The RUL prediction results of subtask #6 are listed in Table 13; the optimal order of the FOTP-GM (1, 1) differs between testing bearings. The predicted whole lives of the testing bearings are shown in Table 14. Because different division ratios are used to generate the testing samples, the predicted lives of the testing bearings are different. The proposed method can adaptively update the degradation curve fitting model and the failure threshold of the given test bearing based on the collected bearing information. As shown in Table 14, the life prediction results fluctuate less and become more accurate as the volume of testing bearing data increases.
The results of RUL predictions are shown in Figure 20. As can be seen from the figure, the predicted results fluctuate around the actual RUL label values. This preliminarily ...

Table 14: The predicted lives of the testing bearings for 10 subtasks.

Conclusions
This paper proposes a new method that uses the difference and the similarity between the degradation trajectories of training and testing bearings to predict the RUL of roller bearings. This method mainly focuses on two aspects of bearing RUL prediction: data-driven failure threshold setting and optimal mathematical degradation model construction using deep features. From the experimental results, we draw the following conclusions:
(1) The CAE-based feature extraction method can adaptively extract the deep features associated with the bearing performance degradation. The SOM-based DI construction method can effectively fuse the deep features extracted by CAE into a practical degradation indicator.
(2) The FOTP-GM (1, 1) model parameter optimization method can determine the optimal fitting order based on the fitting errors. The optimized FOTP-GM (1, 1) method can construct the fitted degradation model of the given bearing based on the degradation information.
(3) The failure threshold setting method based on DTW and the optimized FOTP-GM (1, 1) can adaptively adjust the failure threshold of the given bearing as test bearing information accumulates, without human intervention.
The proposed bearing RUL prediction model can adaptively update its parameters when new monitoring data are available. The comparison between the experimental results of this method and those of existing methods shows that this method can effectively improve the prediction accuracy, reduce the uncertainty of prediction, and offer better engineering practicability.

Data Availability
The data used to support the findings of this study are included within the article.