A Novel Multiscale Deep Health Indicator with Bidirectional LSTM Network for Bearing Performance Degradation Trend Prognosis

As rolling bearings are key components in rotating machinery, bearing performance degradation directly affects machine running status. A trend prognosis for bearing performance degradation is thus required to ensure stable operation. This paper proposes a novel strategy for bearing performance degradation trend prognosis, including health indicator construction techniques and a performance degradation trend prediction method. To represent the degradation trend more accurately, the multiscale deep bottleneck health indicator is proposed as a new synthesized health indicator that removes high-frequency detail signals from features, which reduces the fluctuations found in conventional synthesized health indicators. A method for selecting the statistical characteristics to be fused is also presented to solve the problem of information redundancy that affects trend representation. In addition, a stacked autoencoder network is used for deep feature extraction from the selected statistical features. A bidirectional long short-term memory network prediction model is also proposed for the prediction of the degradation trend, which can make full use of historical and future information to improve prediction accuracy. Finally, experiments are carried out to verify the effectiveness of the proposed method.


Introduction
Rolling bearings are one of the most critical components in rotating machinery and their operating state affects the overall performance of mechanical equipment [1][2][3]. Bearing performance degradation trend prognosis can ensure the stability of mechanical equipment, avoid catastrophic events, and extend the life cycle of the machinery [4][5][6]. The prediction of bearing performance degradation is based on operating rules derived from the analysis of historical monitoring data [7,8]. In general, the prediction process can be divided into three stages: (1) data acquisition; (2) health indicator (HI) construction; and (3) performance degradation trend prognosis [4,9]. Therefore, constructing HI from the monitoring data and using an effective prediction algorithm are both important factors affecting the final prediction result.
Statistical features of the monitoring data are most commonly used as HI to determine the true health of the bearing. The selection or construction of a suitable HI is highly important to correctly determine the occurrence and development of bearing degradation. Conventional HI constructions are mainly based on statistical characteristics. Specifically, the traditional method often uses the statistical features extracted from the time domain [10], frequency domain [11], or time-frequency domain [12] of the vibration signal to represent the degradation trend. For example, the root mean square (RMS) value of the time-domain vibration signal is a commonly used HI [7,13]. However, a single statistical feature is neither sensitive enough to early bearing defects nor stable enough in later stages. Therefore, a multidimensional HI must be established to fully represent the trend of degradation.
In recent years, synthesized HI has attracted much interest for its ability to represent the tendency of bearing degradation more accurately; it is usually constructed using a data fusion technology. Liao [14] used a genetic algorithm to perform deep feature fusion and extraction on different domain features to construct a new HI. Li et al. [15] employed cointegration theory to fuse the two degradation characteristics of RMS and SampEn and formed a new HI to improve the ability of fault prediction. Wang et al. [16] fused multiple statistical features through support vector data description and constructed an effective fusion HI. However, these studies rarely screen the statistical characteristics to be fused, which may lead to redundant information and affect the description of the degradation process. In addition, the synthesized health indicators often contain many detailed fluctuations, which is not conducive to the description of the degradation trend. Therefore, the construction of a nonlinear multifeature model that can adapt to bearing trend prediction is urgently required.
After extracting appropriate feature parameters, a suitable prediction model is needed to achieve an accurate prediction of bearing degradation trend. Machine learning models have been widely used in machinery performance assessment and remaining life prediction. For instance, Gebraeel et al. [17] used neural networks for bearing degradation states and life predictions. Tang et al. [8] extracted a variety of characteristic parameters from the bearing vibration data and used the least-squares support-vector machine (LS-SVM) to predict the degradation trend of the bearing. Aye and Heyns [18] proposed an improved Gaussian process regression (GPR) for the prediction of remaining useful life (RUL). However, these traditional machine learning techniques cannot automatically learn multiple representations from the original input data and require hand-coded rules or domain knowledge.
Deep learning has been introduced and widely studied in recent years to solve such issues. Belonging to the field of artificial intelligence, deep learning can automatically and accurately learn deeper features from low-level feature composition [19][20][21]. Malhi et al. [22] used a continuous wavelet transform to preprocess the bearing vibration signals and then employed recurrent neural networks (RNNs) to further predict the trend of degradation. Deutsch and He [23] proposed a deep belief network-feedforward neural network (DBN-FNN) algorithm for RUL prediction, while Zhu et al. [24] extracted time-frequency features by wavelet transform and then used convolutional neural networks (CNNs) to estimate RUL. These deep learning models can achieve certain effects for the fault diagnosis and prediction of mechanical equipment but do not fully consider the correlation between monitoring data or the correlation between monitoring data and time information. This characteristic means that such models can easily experience overfitting problems, which limits their application in mechanical equipment fault diagnosis and prognosis.
The long short-term memory network (LSTM) is a type of deep learning model that is suitable for processing and predicting important events with relatively long intervals and delays in time series. A memory controller is included in the LSTM structure to determine when to forget, remember, and output data. For instance, Zhang et al. [25] proposed a method for evaluating bearing performance degradation using an LSTM recurrent network. Wang et al. [26] solved the gradient explosion problem by using LSTM and realized the time series prediction of rolling bearing vibration signals. Hinchi and Tkiouat [27] used LSTM to capture the degradation process and predict the time value to estimate the remaining useful life. However, LSTM only utilizes historical information. To improve the prediction ability of the model in this work, a bidirectional LSTM network (BiLSTM) is employed to facilitate the use of both historical information and future information in the prediction of degradation trends.
In this paper, a novel multiscale deep bottleneck health indicator (MDBHI) with BiLSTM for performance degradation trend prognosis of rolling bearings is proposed. A more reasonable synthesized HI is developed to represent and predict the bearing degradation trend. The major contributions of this work are summarized as follows:
(i) A method for multiscale decomposition of the selected statistical features is proposed. Multiscale decomposition is performed on the selected statistical features to be fused and high-frequency noise components are removed to solve the problem of large feature fluctuations in bearing performance prediction.
(ii) A multiscale deep bottleneck health indicator (MDBHI) is proposed. A stacked autoencoder (SAE) network is used for deeper feature extraction from the multiple statistical features after multiscale decomposition in order to obtain an HI that better reflects the tendency of bearing degradation.
(iii) A prediction model of the bearing degradation trend based on BiLSTM is established, which can make full use of both historical and future information to obtain more accurate forecasting trends.
The remainder of the paper is organized as follows: in Section 2, the BiLSTM network, SAE, and the principle of multiscale decomposition are briefly introduced. In Section 3, MDBHI extraction and the framework of the proposed method are introduced. Experiments are presented to verify the effectiveness of the proposed method in Section 4, and, finally, conclusions are drawn in Section 5.

Overview of Bidirectional LSTM Networks.
The concept of the bidirectional recurrent neural network was first put forward by Mike Schuster [28]. In the network, a backward hidden layer is added to the traditional RNN network structure to enable training on input information within a specific time range in the past and future. The BiLSTM network is an improved network structure based on the bidirectional RNN. Hidden layers in the RNN are replaced by memory modules from the LSTM network, and such structures can make full use of temporal sequence information for training. A typical BiLSTM structure is shown in Figure 1.
The LSTM memory module for each time step is marked by green or orange circles in the figure. Compared with the conventional LSTM network, BiLSTM contains an additional hidden layer that passes the input information from back to front through each hidden layer node until the information is finally gathered in the output layer. There is no connection between the two hidden layers, so the structure can be considered as two LSTM networks stacked to work together on one output.
In BiLSTM training, forward propagation is basically consistent with that of the conventional LSTM. Taking the output as an example, the basic principle is as follows. The state of the forward hidden layer at time t is obtained by applying the activation function $f_1$ to the weighted input and the weighted forward hidden state at time t − 1. In the same way, the state of the backward hidden layer at time t is obtained by applying the activation function $f_2$ to the weighted input and the weighted backward hidden state at time t + 1. The output value $y_t$ is obtained by applying the activation function $f_3$ of the output layer to the two hidden states, each multiplied by its output weight.
The backpropagation through time (BPTT) algorithm is still used to update weight parameters in BiLSTM backpropagation. By calculating the derivative of the loss function with respect to the weights of each layer, using the chain rule and gradient descent, the network parameters are updated repeatedly until the network converges. The only difference is that the backward propagation updates the forward hidden layer from t = T to t = 1 and the backward hidden layer from t = 1 to t = T. By adding a time delay, BiLSTM ensures that the current output depends on future information in addition to the historical information, which can improve the prediction ability of the model.
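For reference, a bidirectional recurrent forward pass of the kind described above is commonly written in the following form. The symbol names are assumptions chosen for illustration, since the text states only their roles: $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ denote the forward and backward hidden states, $w_1$ and $w_3$ the input weights, $w_2$ and $w_5$ the hidden-layer weights, and $w_4$ and $w_6$ the output weights.

```latex
\begin{aligned}
\overrightarrow{h}_t &= f_1\left(w_1 x_t + w_2 \overrightarrow{h}_{t-1}\right),\\
\overleftarrow{h}_t  &= f_2\left(w_3 x_t + w_5 \overleftarrow{h}_{t+1}\right),\\
y_t &= f_3\left(w_4 \overrightarrow{h}_t + w_6 \overleftarrow{h}_t\right).
\end{aligned}
```

The forward state is evaluated for t = 1, ..., T and the backward state for t = T, ..., 1, which is why the two hidden layers can be computed independently before they are combined in the output layer.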

Stacked Autoencoder Networks.
The autoencoder network (AE) is a type of unsupervised neural network which employs an encoding and decoding process. The first part is the encoding process, which transforms the input into another dimensional space. The second part is the decoding process, which restores the encoded information to the input. For AE networks, the final output of the network is usually not used; instead, the encoded low-dimensional features are extracted to achieve the effect of feature reduction. The forward information propagation process of the AE network can be expressed as follows: the input signal $X = \{x_1, x_2, \ldots, x_n\}$ is mapped to a new dimensional space by an encoder and a hidden layer $h$ is obtained. The hidden layer is then transformed into an output $\hat{X} = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n\}$ using a decoder, and the process can be represented as

$$h = f_1(X), \qquad \hat{X} = g_1(h),$$

where $f_1$ is the encoder and $g_1$ represents the decoder. Encoders and decoders typically perform nonlinear mapping of their inputs. The basic principles are as follows:

$$f_1(X) = s_{f_1}(W_1 X + b_1), \qquad g_1(h) = s_{g_1}(W_1' h + b_1'),$$

where $s_{f_1}$ and $s_{g_1}$ represent the nonlinear activation functions of the hidden layer and the output layer, respectively; $W_1$ and $W_1'$ represent the weight parameters of the input and hidden layer; and $b_1$ and $b_1'$ represent the bias parameters of the hidden layer and the output layer, respectively.
The backward information propagation process of the AE network can be expressed as follows: the error between the input value and the model output value is backpropagated through the preset loss function, so that the network weight parameters are continuously updated until the loss function of the network model reaches the expected threshold. The specific mathematical process is to find the network parameters that minimize the loss function $L$:

$$\theta^{*}, \theta'^{*} = \arg\min_{\theta, \theta'} L\left(X, g_1(f_1(X))\right),$$

where $\theta = \{W_1, b_1\}$, $\theta' = \{W_1', b_1'\}$, and $\theta^{*}$ and $\theta'^{*}$ indicate the updated weight parameters. The schematic diagram of the three-layer AE network is shown in Figure 2.
On the basis of AE, SAE takes the output of the hidden layer of one AE as the input of the next AE to continue the dimensionality-reduction learning of features. The SAE is able to learn a variety of representations of the raw data layer by layer, with each layer building on the representation of the layer below, so it is well suited to complex tasks such as classification. The specific process is shown in Figure 3.
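The stacking described above can be illustrated with a minimal sketch of the encoding path only. Everything specific here is an assumption made for illustration: the sigmoid activation, the layer sizes 5 → 3 → 1, the random untrained weights, and the made-up feature vector. Training (layer-wise minimization of the reconstruction loss) is deliberately omitted.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, weights, bias):
    """One encoder layer: s(W x + b) with a sigmoid activation (assumed)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random small weights; a trained SAE would learn these instead."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def sae_encode(x, encoders):
    """Feed the hidden output of each AE into the next AE's encoder."""
    h = x
    for weights, bias in encoders:
        h = dense(h, weights, bias)
    return h


rng = random.Random(0)
layer_sizes = [5, 3, 1]  # hypothetical: 5 features -> 3 -> 1 bottleneck value
encoders = [init_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

features = [0.12, 0.03, 0.15, 0.40, 0.08]  # made-up feature vector
bottleneck = sae_encode(features, encoders)
print(len(bottleneck), bottleneck)
```

The last hidden layer plays the role of the bottleneck: its output, rather than the decoder's reconstruction, is what is kept as the extracted feature.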

The Multiscale Decomposition.
As a representative multiscale decomposition method, the wavelet transform can obtain the time-frequency components of nonstationary signals. The basic idea is to approximate a target signal by using a family of wavelet functions, which are obtained by different scaling and shifting transformations of a basic wavelet function.
For the continuous wavelet transform, the scale factor a, translation factor b, and time t are continuous. Such parameters are inconvenient to process by computer, and they also increase the redundancy of the wavelet coefficients. Therefore, it is necessary to discretize the continuous wavelet.
The discretization of the scale factor and the translation factor is denoted as follows:

$$a = a_0^{j}, \qquad b = k a_0^{j} b_0,$$

where $a_0$ and $b_0$ are the discrete initial values of the scale factor a and the translation factor b, and $j = 0, \pm 1, \pm 2, \ldots$, $k = 0, \pm 1, \pm 2, \ldots$. The discrete wavelet function $\psi_{j,k}(t)$ and the discrete wavelet transform DWT are defined as follows:

$$\psi_{j,k}(t) = a_0^{-j/2}\, \psi\left(a_0^{-j} t - k b_0\right), \qquad \mathrm{DWT}(j,k) = \int_{-\infty}^{+\infty} f(t)\, \psi_{j,k}^{*}(t)\, dt.$$

Multiscale wavelet analysis is essentially a process of continuous approximation. A function f(t) in any one-dimensional space is approximated at different scales, and each step yields an approximation function that imitates f(t). As the approximation process continues, this function gradually becomes more precise.
Multiscale decomposition can obtain low-frequency approximation signals and high-frequency detail signals by approximating nonstationary signals at different time resolutions. The low-frequency approximation signal characterizes the approximate contour and trend of the original signal, while the high-frequency detail signal contains the detailed information in the signal. For an n-layer decomposition, this can be expressed as

$$f(t) = A_n + \sum_{i=1}^{n} D_i,$$

where A is the low-frequency approximation signal and D is the high-frequency detail signal.
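The split into a low-frequency approximation and high-frequency details can be sketched with the Haar wavelet, which is simple enough to write out in full. This is only an illustration of the principle: the Haar wavelet, the two-layer depth, and the synthetic signal (a linear trend plus an alternating fluctuation) are assumptions, and the signal length must be divisible by 2 raised to the number of layers.

```python
import math


def haar_step(signal):
    """One Haar analysis step: scaled pairwise sums and differences."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar step, restoring twice as many samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def decompose(signal, levels):
    """Return the final approximation and the details of layers 1..levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def reconstruct(approx, details, keep_details=True):
    """Rebuild the signal; without details only the trend A_n remains."""
    out = list(approx)
    for d in reversed(details):
        used = d if keep_details else [0.0] * len(d)
        out = haar_inverse_step(out, used)
    return out


# Synthetic feature: linear trend plus an alternating fluctuation.
signal = [0.1 * k + (0.5 if k % 2 else -0.5) for k in range(16)]
approx, details = decompose(signal, levels=2)
full = reconstruct(approx, details)                       # A_2 + D_2 + D_1
trend = reconstruct(approx, details, keep_details=False)  # A_2 only
print([round(v, 3) for v in trend])
```

Keeping all details reproduces the original samples, while zeroing them leaves a piecewise-constant trend in which the alternating fluctuation has been removed.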

Extracting Deep Bottleneck Health Indicator.
The feature extraction of vibration signals is a necessary premise for the degradation trend prognosis of bearing performance. Extracting rich feature samples by monitoring vibration signals can improve the accuracy of trend prediction. Most commonly used statistical feature extraction methods explore feature parameters in the time domain, frequency domain, or time-frequency domain separately. However, a single statistical feature parameter of this kind cannot fully reflect the running status of bearings.
To solve this problem, it is necessary to select various sensitive characteristic parameters to measure the health of the bearing from various aspects. However, if the features to be fused are not properly selected, information redundancy may diminish the extraction effect. Therefore, deep features must be extracted from the various sensitive feature parameters that have been selected. This process ensures that the features are representative during bearing degradation.
Based on the monotonicity feature evaluation indicator, this paper utilizes the correlation between feature parameters to further select sensitive feature parameters. This correlation indicator based on feature parameters is referred to as a consistency feature evaluation indicator in the literature [29]. The consistency evaluation indicator Con is defined for two characteristic parameters $F_1$ and $F_2$ as the correlation between them. It is computed from $f_{1,k}$ and $f_{2,k}$, the values of $F_1$ and $F_2$ at time $t_k$, and from $\bar{F}_1$ and $\bar{F}_2$, the mean values of $F_1$ and $F_2$.
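A correlation indicator built from these quantities is often taken to be the absolute Pearson correlation, $|\sum_k (f_{1,k}-\bar{F}_1)(f_{2,k}-\bar{F}_2)| / \sqrt{\sum_k (f_{1,k}-\bar{F}_1)^2 \sum_k (f_{2,k}-\bar{F}_2)^2}$. The sketch below assumes that form; the exact expression used in [29] may differ, and the function name and sample values are hypothetical.

```python
import math


def consistency(f1, f2):
    """Absolute Pearson correlation of two equally long feature sequences.

    Assumed form of the Con indicator, for illustration only.
    """
    if len(f1) != len(f2) or len(f1) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    m1 = sum(f1) / len(f1)
    m2 = sum(f2) / len(f2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(f1, f2))
    var1 = sum((a - m1) ** 2 for a in f1)
    var2 = sum((b - m2) ** 2 for b in f2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a constant feature has no trend to compare
    return abs(cov) / math.sqrt(var1 * var2)


# Made-up feature histories: peak follows rms exactly, noisy does not.
rms = [1.0, 1.1, 1.3, 1.8, 2.6]
peak = [2.0, 2.2, 2.6, 3.6, 5.2]
noisy = [1.0, -1.0, 1.0, -1.0, 1.0]

print(consistency(rms, peak), consistency(rms, noisy))
```

Features whose pairwise values are close to 1 carry essentially the same degradation law, while a feature with low values against all others is a candidate for exclusion.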
Six characteristic parameters can be obtained through monotonicity evaluation indicators: absolute mean value (AMV), variance, RMS, peak value, energy spectrum (ES), and wavelet decomposition of second layer energy (WDSLE). The consistency evaluation indicator between each pair of characteristic parameters is then calculated, and the characteristic parameters whose degradation laws are highly consistent are selected.
The steps for extracting deep bottleneck health indicators are as follows: first, vibration signals measured by sensors are collected; different signal characteristics are then classified and processed; and, finally, deep learning algorithms are used for deep feature extraction from the various features. Traditional multifeature extraction algorithms such as principal component analysis (PCA) can reduce the feature dimension and accelerate operation. However, the transmission path of a bearing vibration signal is very complicated, and the signal obtained by sensors has strong nonlinear characteristics. Therefore, although PCA achieves dimensionality reduction by learning a linear transformation of the data, it cannot learn deeper nonlinear features. The SAE can capture nonlinear characteristics of a bearing vibration signal by using a nonlinear activation function. Therefore, in this paper, deep features are obtained from multiple features through SAE, and the output of the hidden layer provides the required deep feature parameters. The extraction process of a deep bottleneck health indicator (DBHI) is shown in Figure 4.
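Four of the six characteristic parameters listed above are time-domain statistics that can be computed directly from one vibration record. The following sketch uses common textbook definitions, which are assumptions here because the paper does not state them: AMV as the mean of absolute values, the population variance, and the peak as the largest absolute amplitude. ES and WDSLE are omitted because they require a spectrum and a wavelet decomposition.

```python
import math


def time_domain_features(x):
    """Return AMV, variance, RMS and peak of one vibration record."""
    n = len(x)
    if n == 0:
        raise ValueError("empty record")
    mean = sum(x) / n
    return {
        "amv": sum(abs(v) for v in x) / n,
        "variance": sum((v - mean) ** 2 for v in x) / n,
        "rms": math.sqrt(sum(v * v for v in x) / n),
        "peak": max(abs(v) for v in x),
    }


# Made-up record; a real record would hold the samples of one acquisition.
record = [1.0, -1.0, 1.0, -1.0]
feats = time_domain_features(record)
print(feats)
```

Evaluating this function on every record of a run-to-failure test gives one time series per feature, and those series are what the consistency indicator compares.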

Multiscale Decomposition Parameter Selection.
The DBHI obtained using the process above often contains many fluctuating details. This has an impact on bearing performance predictions, especially in the later stages of bearing degradation. Therefore, multiscale decomposition is applied to the multifeature sensitive parameters in this work. Through multiscale wavelet decomposition of the multifeature sensitive set, the low-frequency approximation component of each feature is extracted and the high-frequency detail components with large fluctuations are eliminated. Multiscale decomposition can thus be used to optimize the sensitive feature sample set, which is more conducive to trend prediction in the later stage of bearing degradation. The multiscale decomposition method contains two parameters. The first is the wavelet function and the second is the decomposition scale, also called the number of decomposition layers. Different choices of these two parameters affect the final extracted trend term, so it is necessary to select an appropriate wavelet function and decomposition scale.

Selection of Optimal Decomposition Function.
The wavelet transform was developed as an improvement on the Fourier transform. Compared with the trigonometric functions used in the Fourier transform, wavelet functions are diverse. Moreover, different wavelet functions produce different effects on the analysis of the same signal, so it is necessary to select an appropriate wavelet function to analyze the corresponding signal. The generally accepted selection criteria for the optimal wavelet function can be roughly divided into orthogonality, compact support, regularity, symmetry, and vanishing moment. However, the common wavelet functions each have individual advantages and disadvantages and should be selected according to the actual needs. In this section, multiscale wavelet decomposition is used to optimize the DBHI of the bearing degradation trend, and the trend term (low-frequency approximation component) of the sensitive feature parameters is extracted. Therefore, the selected wavelet basis function should be able to separate the low-frequency approximation component from the high-frequency detail components in the sensitive feature parameters. For this purpose, a trending optimal wavelet function selection criterion Tre is defined [9]. It is computed from the pairs $(t_k, f_k)$, where $t_k$ represents the k-th moment and $f_k$ is the value of the characteristic parameter at time $t_k$. The closer the trending criterion Tre is to 1, the better the trend of the characteristic parameter is.
For the selection of an optimal wavelet function in the MDBHI extraction proposed in this paper, five of the most common wavelet functions are selected as candidate wavelet functions. Considering that some wavelet functions consist of a family of functions, such as dbN and coifN, the vanishing moment N should be determined. The larger the vanishing moment, the higher the smoothness of the wavelet, and the better the band division effect of the wavelet decomposition. However, an excessive vanishing moment weakens the compact support in the time domain, increases the calculation requirements, and degrades real-time performance. Two-layer wavelet decomposition is then performed on the multifeature sensitive parameters selected in the previous subsection, and the Tre value of the trend extracted by each wavelet function is calculated. Finally, the optimal wavelet function can be selected.
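A trend criterion computed from $(t_k, f_k)$ that approaches 1 for a well-trended feature is commonly written as the absolute correlation between the feature and time, $|K\sum_k f_k t_k - \sum_k f_k \sum_k t_k| / \sqrt{[K\sum_k f_k^2 - (\sum_k f_k)^2][K\sum_k t_k^2 - (\sum_k t_k)^2]}$ with K samples. The sketch below assumes this form; the expression in [9] may differ in detail, and the function name and sample data are hypothetical.

```python
import math


def trendability(times, values):
    """Absolute linear correlation between a feature and time.

    Assumed form of the Tre criterion, for illustration only.
    """
    k = len(values)
    if k != len(times) or k < 2:
        raise ValueError("need two sequences of equal length >= 2")
    sum_t = sum(times)
    sum_f = sum(values)
    sum_tf = sum(t * f for t, f in zip(times, values))
    sum_tt = sum(t * t for t in times)
    sum_ff = sum(f * f for f in values)
    den = (k * sum_ff - sum_f ** 2) * (k * sum_tt - sum_t ** 2)
    if den <= 0:
        return 0.0  # constant feature or constant time axis
    return abs(k * sum_tf - sum_f * sum_t) / math.sqrt(den)


times = list(range(8))
linear = [2.0 * t + 1.0 for t in times]                   # clean trend
jitter = [1.0 if t % 2 == 0 else -1.0 for t in times]     # pure fluctuation

print(trendability(times, linear), trendability(times, jitter))
```

Under this assumed form, each candidate wavelet would be scored by applying the function to the low-frequency component it extracts, and the wavelet with the highest scores would be preferred.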

Selection of the Optimal Number of Decomposition Layers.
Multiscale wavelet decomposition is a successive approximation of the target signal in which some high-frequency components are discarded. As the number of decomposition layers increases, the high-frequency details of the signal decline progressively, and the low-frequency trend information becomes increasingly prominent. This is conducive to the prediction of the bearing degradation trend, but too many decomposition layers will deform the trend term obtained by the final decomposition. Therefore, it is important to select an appropriate number of wavelet decomposition layers.
A fusion indicator is proposed in this paper as the selection criterion for the number of wavelet decomposition layers. Through a weighted summation of the consistency indicator and the trend indicator, a fusion indicator Fus is obtained, which is defined as follows:

$$\mathrm{Fus} = w_1 \mathrm{Con} + w_2 \mathrm{Tre},$$

where $w_1$ and $w_2$, respectively, represent the weights of the consistency indicator Con and the trend indicator Tre. As in the wavelet function selection above, the trend indicator represents the trend of the sensitive feature parameters after multiscale decomposition. In theory, the higher the number of decomposition layers, the smoother the trend curve. However, too many decomposition layers will deform the feature, so the consistency indicator is introduced to assist in the adjustment. The consistency indicator calculates the correlation between the sensitive feature parameters after multiscale decomposition and the original feature parameters, and, in theory, the more decomposition layers there are, the smaller the consistency indicator. By combining these two indicators, the optimal number of wavelet decomposition layers can be appropriately selected.
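The selection rule implied by the weighted sum can be sketched as follows. The equal weights of 0.5 match the setting used later in the experiments; the Con and Tre values in the table are made up for illustration and are not results from the paper, and the function names are hypothetical.

```python
def fusion(con, tre, w1=0.5, w2=0.5):
    """Weighted sum of the consistency and trend indicators."""
    return w1 * con + w2 * tre


def best_layer(scores, w1=0.5, w2=0.5):
    """Return the layer count whose (Con, Tre) pair gives the largest Fus."""
    return max(scores, key=lambda n: fusion(scores[n][0], scores[n][1], w1, w2))


# layer count -> (Con, Tre); illustrative numbers only.
scores = {
    1: (0.99, 0.80),
    2: (0.97, 0.86),
    3: (0.95, 0.92),
    4: (0.85, 0.94),
}

chosen = best_layer(scores)
print(chosen, fusion(*scores[chosen]))
```

The example shows the intended trade-off: Tre keeps rising with more layers while Con falls, so the sum peaks at an intermediate depth.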

Summary of the Proposed Method.
A trend prediction method based on MDBHI and BiLSTM is proposed in this paper to solve the problem of value fluctuations in bearing degradation feature parameters. First, the statistical features from the time domain, frequency domain, and time-frequency domain are extracted. The appropriate statistical characteristics are selected based on the consistency evaluation indicator, and a multifeature sensitive parameter set is obtained. Multiscale decomposition is then carried out on the multifeature sensitive parameter set: the high-frequency detail components with large fluctuations are removed and the low-frequency approximation component of each feature is retained. The decomposed multifeature set is input into the SAE to extract MDBHI. Finally, the BiLSTM model is used to predict the trend of bearing degradation. By adding time delay information, the current output depends not only on the historical information but also on the future information, which improves the prediction ability of the model. The whole framework of the proposed method is shown in Figure 5.

Experiment Verification
Rolling bearing degradation trend prediction requires full-life test data of the bearing. Because the normal degradation process of a bearing usually takes too much time, an accelerated bearing degradation test is often used. The experimental data in this section come from the degradation database of four roller bearings provided by the University of Cincinnati and shared on the website of the Prognostic Data Repository of NASA [30]. The test rig for accelerated bearing degradation is shown in Figure 6. The test rig comprised an alternating current (AC) motor and a shaft with four bearings mounted on it. The motor transmitted power through a belt to make the shaft rotate at a speed of 2000 r/min. The bearings on the test rig were subjected to a constant load of 6000 lbs along the radial direction of the shaft. The database contains a total of three sets of data, each with complete degradation process data from the start of bearing testing to failure; the second data set is used in this work.
The sampling frequency was set to 20 kHz and the sampling time was 1 s. A total of 20,480 acceleration data points were collected during each sampling time and the sampling interval duration was 10 min. The bearing accelerated degradation experiment was run for eight days and a total of 984 sets of data were collected until a serious outer ring failure occurred in bearing 1. The full-life cycle time-domain vibration signal of the rolling bearing is shown in Figure 7.
In general, the life cycle of bearing degradation can be divided into five stages, which are running-in stage, steady state stage, defect initiation, crack initiation to crack opening state, and defect propagation [31].
This is because the bearing wear evolution process produces surface topological changes which have a significant influence on bearing state monitoring. The test data used in this work incorporate this type of bearing degradation growth process; thus, the objective is to obtain a more accurate prognosis of the degradation trend, especially near the change points between stages.

Establishment of Feature Fusion Health Indicators.
The six statistical characteristics mentioned in the third section are AMV, variance, RMS, peak, ES, and WDSLE, and their characteristic parameter diagrams are shown in Figure 8. As shown in the figure, the degradation trend, the starting point of the fault, and the degree of degradation represented by different features are all different. In other words, a single feature cannot fully reflect the bearing running status. Intuitively, extracting multiple characteristic parameters is one way to solve this problem. In this work, the consistency of different characteristics is calculated according to the consistency evaluation indicator. The purpose is to determine the most suitable features for the establishment of subsequent health indicators. The results are provided in Table 1 and Figure 9.
It can be clearly observed from Table 1 that, among the six characteristic parameters selected by monotonicity evaluation indicators, the pairwise consistency evaluation indicators of five characteristic parameters, that is, AMV, variance, RMS, peak value, and ES, are very large, indicating that the degradation laws contained in these five characteristic parameters are basically consistent. However, the consistency evaluation indicator between WDSLE and the other five characteristic parameters is relatively small, indicating that WDSLE cannot accurately represent the bearing degradation information. Therefore, in this paper, the five characteristic parameters of AMV, variance, RMS, peak value, and ES are selected as the original characteristic sample set of DBHI.

Extracting Deep Bottleneck Health Indicator.
In order to select the best wavelet function required in multiscale decomposition, the five common wavelet functions mentioned in the third section are selected as candidate wavelet functions and the vanishing moment is set to 4 to make the comparison more complete. Two-layer wavelet decomposition is then performed on the multifeature sensitive parameters selected above. Finally, the Tre value of the trend term extracted by each wavelet function from each feature parameter is calculated, and the optimal wavelet function is selected. The results are shown in Table 2 and Figure 10. According to Table 2, among these five wavelet functions, the coif4 wavelet function has the largest mean Tre value over the sensitive characteristic parameters after two-layer wavelet decomposition. Moreover, the individual Tre value of each sensitive feature obtained by coif4 wavelet decomposition is higher than that obtained by the other wavelet functions. Therefore, the coif4 wavelet is selected as the optimal wavelet function to extract MDBHI features of bearings.
Next, the number of wavelet decomposition layers is selected according to the fusion indicator proposed above. In this paper, the two weight values in the fusion indicator are both set to 0.5. Table 3 and Figure 11 show the calculated fusion indicator for one to six layers of wavelet decomposition. It can be clearly seen from the histogram that when the number of decomposition layers N = 3, the fusion indicator value of each sensitive feature is larger than that for the other numbers of decomposition layers. The average value of the fusion indicator is also the largest when N = 3, so N = 3 is selected as the optimal number of decomposition layers. With the coif4 wavelet and three decomposition layers, the sensitive feature samples are extracted, as shown in Figure 12. Comparing Figures 12 and 8, it can be observed that the trend of the feature parameters after multiscale decomposition is clearer. This is helpful for the extraction of bottleneck health indicators in the next step.
With the optimal wavelet function (coif4) and the optimal number of decomposition layers (three) determined through the trend indicator and the fusion indicator, the sensitive feature samples after multiscale decomposition are calculated. Then, the five selected features after multiscale decomposition are input to SAE for deep feature extraction. Finally, the MDBHI proposed in this paper is obtained, and the results are shown in Figure 13. A comparison between MDBHI and DBHI without multiscale decomposition is also provided in the same figure.
Compared with the DBHI obtained without multiscale decomposition, MDBHI clearly removes some detail signals with large fluctuations. Removing these details can improve the accuracy of bearing degradation trend prediction. In the late stage of degradation, MDBHI also corresponds more accurately to the degradation state of the bearing. Therefore, MDBHI is more conducive to predicting the overall degradation trend of the bearing.
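To illustrate why discarding detail signals reduces fluctuation, the sketch below keeps only the coarse approximation of a synthetic series. It deliberately swaps in the Haar wavelet rather than coif4 so that it needs no external library: for Haar, dropping all detail coefficients and reconstructing is equivalent to replacing each block of 2^N samples by its mean. The function names and the synthetic series are assumptions for the example.

```python
def haar_trend(signal, levels):
    """Keep only the level-`levels` Haar approximation of `signal`.

    For the Haar wavelet this equals replacing each block of 2**levels
    samples by the block mean. The length must be a multiple of 2**levels.
    """
    block = 2 ** levels
    if len(signal) % block != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    trend = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        trend.extend([sum(chunk) / block] * block)
    return trend


def total_variation(values):
    """Sum of absolute differences between neighbours, a simple fluctuation measure."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Synthetic series: a slow upward ramp plus an alternating ripple.
noisy = [0.1 * i + (0.5 if i % 2 else -0.5) for i in range(16)]
smooth = haar_trend(noisy, levels=3)

print(total_variation(noisy), total_variation(smooth))
```

The smoothed series keeps the upward ramp while the alternating ripple is averaged out, which is the qualitative effect attributed to multiscale decomposition above.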

Bidirectional LSTM Networks Trend Prediction Results.
Finally, BiLSTM is used to predict the bearing degradation trend. The specific implementation steps are as follows.
Step 1: The predictive model is constructed first. To verify the validity of the model, the MDBHI series is divided into two parts (training and test), and appropriate sample pairs are then built based on embedding theory.
Step 2: The BiLSTM model is trained on the training samples until convergence. The test samples are then input into the trained BiLSTM model to obtain predicted values.
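The sample construction in Step 1 can be sketched as a sliding window over the health indicator series, where each input is a run of consecutive values and the target is the value that follows. The window length, the split point, and the placeholder series below are assumptions chosen for illustration; the source does not state them.

```python
def split_in_time(series, n_train):
    """Split a series chronologically: first n_train points for training, rest for testing."""
    return series[:n_train], series[n_train:]


def make_sample_pairs(series, window):
    """Build (inputs, target) pairs: `window` consecutive values predict the next one."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        pairs.append((inputs, target))
    return pairs


# Placeholder health indicator values, not data from the experiment.
health_indicator = [i / 100 for i in range(20)]

train_series, test_series = split_in_time(health_indicator, n_train=14)
train_pairs = make_sample_pairs(train_series, window=4)
test_pairs = make_sample_pairs(test_series, window=4)

print(len(train_pairs), len(test_pairs))
print(train_pairs[0])
```

The split is made in time order rather than at random so that the test samples come strictly after the training samples, as they would when predicting a degradation trend.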
Following the method proposed in this paper, the model was trained for 100 epochs to reach convergence. The prediction results and prediction error rates on the test samples are shown in Figure 14. The results show that the proposed prediction method can accurately predict the development of bearing degradation.
The prediction results and prediction error rates of the LSTM based on DBHI are shown in Figure 15. Comparing the two methods, the trend predicted using DBHI and LSTM is less accurate, and its prediction error is larger than that of the proposed method.
To further demonstrate the effectiveness of the presented method, confidence intervals (CI) for its prediction results are presented in Figure 16. It can be seen from the figure that the predicted values lie almost entirely within the 95% CI, demonstrating that the proposed method is effective for predicting the bearing degradation tendency.
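The source does not state how the confidence intervals in Figure 16 were computed. As one possible reading, the sketch below builds a symmetric band around each prediction from the sample standard deviation of the residuals, assuming roughly Gaussian errors (z = 1.96 for 95%), and reports the share of points inside the band. The function names and all numbers are placeholders for illustration.

```python
from statistics import stdev


def residual_band(actual, predicted, z=1.96):
    """Return (lower, upper) bounds around each prediction.

    Assumption: errors are roughly Gaussian, so the half-width is
    z times the sample standard deviation of the residuals.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    half_width = z * stdev(residuals)
    return [(p - half_width, p + half_width) for p in predicted]


def coverage(actual, band):
    """Fraction of actual values that fall inside their band."""
    inside = sum(1 for a, (low, high) in zip(actual, band) if low <= a <= high)
    return inside / len(actual)


# Placeholder values, not results from the experiment.
actual = [1.0, 1.1, 1.3, 1.2, 1.5, 1.7, 1.6, 1.9]
predicted = [1.05, 1.05, 1.25, 1.3, 1.45, 1.6, 1.7, 1.85]

band = residual_band(actual, predicted)
share_inside = coverage(actual, band)
print(share_inside)
```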
Root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²) are used to evaluate the accuracy of the prediction and to compare it with traditional bearing trend prediction methods: an LSTM network based on SAE (SAE-LSTM), an LSTM network based on PCA (PCA-LSTM), a recurrent neural network based on the statistical root mean square feature (RMS-RNN), and a BP neural network based on the statistical root mean square feature (RMS-BP). The results are shown in Table 4 and Figure 17. In terms of prediction error, the method based on MDBHI and BiLSTM performs better than the traditional methods. This result indicates that the health indicator (BHI), once the detailed fluctuation components are removed, successfully reflects the tendency of bearing degradation and improves the final prediction. In addition, the introduction of BiLSTM makes effective use of the characteristic information of the bearing degradation trend, thereby reducing prediction errors and improving the prediction.
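For reference, the three evaluation metrics named above can be computed as in the following minimal sketch, which uses their standard definitions. The function names and the small example series are illustrative and are not taken from Table 4.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative values only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]

print(rmse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
```

Lower RMSE and MAE indicate smaller errors, while R² closer to 1 indicates that the prediction explains more of the variation in the actual series.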

Conclusions
A method of bearing degradation trend prediction based on MDBHI and BiLSTM was proposed in this paper. The method addresses the problem that a single simple feature cannot accurately describe the bearing degradation process. In terms of HI extraction, the MDBHI extracted by the SAE from the selected multiple features has a superior ability to represent the bearing degradation process. Moreover, through multiscale decomposition, MDBHI removes the high-frequency components in DBHI that impair the trend representation. In terms of prediction, the BiLSTM network makes full use of past and future information to improve prediction accuracy. According to the experimental results, the proposed method based on MDBHI and BiLSTM obtains higher prediction accuracy. Therefore, this method has broad application prospects in bearing condition assessment and trend prediction.
In future work, once the HI has been extracted, appropriate methods should be used to identify the starting point of the fault. This will help determine a more accurate time at which to start prediction. In addition, end-to-end forecasting methods should also be explored.

Data Availability
The manuscript uses public data from the degradation database of four roller bearings provided by the University of Cincinnati.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.