Remaining Useful Life Prediction of Rolling Bearings Using PSR, JADE, and Extreme Learning Machine

Rolling bearings play a pivotal role in rotating machinery. The degradation assessment and remaining useful life (RUL) prediction of bearings are critical to condition-based maintenance. However, sensitive feature extraction remains a formidable challenge. In this paper, a novel feature extraction method is introduced to obtain sensitive features through phase space reconstitution (PSR) and joint approximate diagonalization of eigenmatrices (JADE). Firstly, the original features are extracted from bearing vibration signals in the time and frequency domains. Secondly, PSR is applied to embed the original features into a high dimensional phase space. The between-class and within-class scatter (SS) is calculated to evaluate feature sensitivity from the phase point distribution of the different degradation stages, and different weights are then assigned to the corresponding features based on the calculated SS. Thirdly, JADE is employed to fuse the weighted features into advanced features that better reflect the bearing degradation process. Finally, the advanced features are input into an extreme learning machine (ELM) to train the RUL prediction model. A set of experimental case studies is carried out to verify the effectiveness of the proposed method. The results show that the extracted advanced features reflect the degradation process better than traditional features and can effectively predict the RUL of bearings.


Introduction
Currently, prognostics and health management (PHM) is an important research area in industry, which aims at increasing availability and decreasing downtime by predicting the residual life of machines. The rolling bearing is a fundamental component widely used in rotary machines, and bearing failure is one of the principal causes of machine damage [1]. Condition monitoring of bearings is therefore becoming more and more significant, especially RUL prediction, which can reduce maintenance costs. There are three main approaches, namely, model based, data driven, and hybrid [2][3][4]. Among them, the data driven methods, which are based on the relation between the degradation state and the measurement data, are easier to deploy than the others [5].
One of the challenges for data driven RUL prediction is extracting sensitive features from vibration signals that are always mixed with noise [6,7]. Many modern mathematical methods have been employed to overcome this challenge. These methods offer systematic, scientific, and effective tactics for extracting features that can reflect bearing degradation. Representative feature extraction methods rely on statistical calculation, such as time domain analysis, frequency domain analysis, and time-frequency domain analysis [8][9][10]. However, these original features are often redundant and of high dimensionality. Therefore, dimensionality reduction methods such as principal component analysis (PCA) and independent component analysis (ICA) have been employed to reduce the redundancy for machine fault diagnosis [11][12][13]. On the other hand, nonlinear dimensionality reduction methods have been developed in recent years, such as locally linear embedding (LLE) [14], local tangent space alignment (LTSA) [15], and Laplacian eigenmaps (LE) [16]. However, these algorithms are often complex and require a large amount of calculation [17].
In this paper, a new dimensionality reduction technique is introduced based on the joint approximate diagonalization of eigenmatrices (JADE) algorithm, which was originally used in blind identification. Generally, a mixed model without environmental noise can be effectively identified by JADE. However, it performs poorly when the signals are mixed with background noise [18]. To overcome this problem, a feature weighted fusion method combining phase space reconstitution (PSR) [19] with the between-class and within-class scatter evaluation [20] is proposed in this paper. As different kinds of signals demonstrate different structures in the phase trajectory, PSR can better grasp the nature and regularity of the time series.
After extracting features from vibration signals, machine learning algorithms can be used to establish the bearing RUL prediction model, such as the support vector machine (SVM) [21][22][23] and artificial neural networks (ANN) [24]. In recent years, a new machine learning algorithm called the extreme learning machine (ELM) has been introduced by Huang et al. It is a novel algorithm for single hidden layer feed-forward neural networks (SLFNs), which achieves an extremely fast learning speed and advantageous generalization capability by randomly choosing the hidden nodes and analytically determining the output weights [25]. Moreover, ELM performs commendably in both classification and regression with a simple structure. Compared with the back-propagation (BP) feed-forward network learning algorithm, the training speed of ELM is much faster while it obtains better generalization [26,27]. In view of these advantages, the ELM is selected in this paper to establish the bearing RUL prediction model.
The rest of the paper is organized as follows. Section 2 gives a brief introduction of PSR, JADE, and ELM. The proposed RUL prediction procedure for bearings is illustrated in Section 3. Experimental validations and comparisons with other linear dimensionality reduction methods are presented in Section 4. Finally, the conclusions are drawn in Section 5.

Phase Space Reconstitution (PSR).
Chaotic time series analysis is a method for nonlinear signal processing. By extending the time series to a high dimensional phase space, this method can grasp the nature of the time series much better. Research on PSR for chaotic time series started with Packard et al. [28]. Suppose that the time series is $X = (x_1, x_2, \ldots, x_N)$, the embedding dimension is chosen as $m$, and the time delay is selected as $\tau$. Then a phase point is constructed as

$$Y_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right), \quad i = 1, 2, \ldots, N-(m-1)\tau,$$

where $N$ is the length of the time series and $i$ is the index of the phase point. However, the acquired signals are blended with noise, so the parameters should be determined under a reasonable rule [29]. In this paper, the autocorrelation function is employed to identify the time delay, which is defined as

$$R_x(t) = \frac{\sum_{i=1}^{N-t} (x_i - \bar{x})(x_{i+t} - \bar{x})}{\sum_{i=1}^{N} (x_i - \bar{x})^2},$$

where $\bar{x} = (1/N)\sum_{i=1}^{N} x_i$. According to Rosenstein's theory [30], the minimum $t$ that makes $R_x(t)$ fall below $1 - 1/e$ can be considered the most suitable time delay. Cao's algorithm [31] is employed to identify the embedding dimension. Suppose that the nearest phase point to $Y_i(m)$ is $Y_{n(i,m)}(m)$. In accordance with the embedding theorem, if $m$ is a suitable embedding dimension, then any two points that are close in the $m$ dimensional reconstructed phase space will remain close in the $m+1$ dimensional space. Define

$$a(i,m) = \frac{\left\| Y_i(m+1) - Y_{n(i,m)}(m+1) \right\|}{\left\| Y_i(m) - Y_{n(i,m)}(m) \right\|}, \qquad E(m) = \frac{1}{N-m\tau} \sum_{i=1}^{N-m\tau} a(i,m), \qquad E_1(m) = \frac{E(m+1)}{E(m)}.$$

With the increase of $m$, $E_1(m)$ gradually stabilizes; therefore, the minimum $m$ that stabilizes $E_1(m)$ is set as the embedding dimension. However, it is difficult to determine whether the curve of $E_1(m)$ is stable due to the limited length of the samples. Therefore, a supplementary criterion is used:

$$E^*(m) = \frac{1}{N-m\tau} \sum_{i=1}^{N-m\tau} \left| x_{i+m\tau} - x_{n(i,m)+m\tau} \right|, \qquad E_2(m) = \frac{E^*(m+1)}{E^*(m)},$$

where $E_2(m)$ changes with increasing $m$ if the sample is deterministic. More details are presented in [32].
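The two parameter selection rules above are easy to sketch in code. The following minimal Python/numpy illustration (my own sketch, not the authors' implementation) embeds a series in phase space and picks the delay by the $1-1/e$ autocorrelation criterion:

```python
import numpy as np

def autocorr_delay(x, max_lag=50):
    """Pick the time delay tau as the first lag where the normalized
    autocorrelation drops below 1 - 1/e (Rosenstein's criterion)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    for lag in range(1, max_lag + 1):
        r = np.dot(x[:-lag], x[lag:]) / denom
        if r < 1 - 1 / np.e:
            return lag
    return max_lag

def phase_space_reconstruct(x, m, tau):
    """Embed a 1-D series into m-dimensional phase space with delay tau.
    Row i is the phase point (x_i, x_{i+tau}, ..., x_{i+(m-1)tau})."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n_points] for i in range(m)])
```

Applied to each of the 16 feature series in turn, `phase_space_reconstruct` yields the phase point clouds whose class separation is later scored by the SS.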

Joint Approximate Diagonalization of Eigen-Matrices (JADE).
The JADE algorithm is widely used in blind source separation (BSS) [33]. It is based on a standard linear data model that can be expressed as

$$x(t) = A s(t) + n(t),$$

where $x(t)$ is the observed signal mixed with noise, $n(t)$ is additive noise, and $s(t)$ is the source signal, which can be recovered through the mixing matrix $A$. The JADE algorithm can be summarized as follows:

(a) Signal whitening:

$$z(t) = W x(t), \qquad W = \left[ (\lambda_1 - \sigma^2)^{-1/2} h_1, (\lambda_2 - \sigma^2)^{-1/2} h_2, \ldots, (\lambda_n - \sigma^2)^{-1/2} h_n \right]^T,$$

where $z(t)$ is the whitened signal, $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of the sample covariance matrix, $h_1, h_2, \ldots, h_n$ are the corresponding eigenvectors, and $\sigma^2$ is the noise variance.
(b) Calculate the fourth-order cumulants [34]:

$$\operatorname{cum}(z_i, z_j, z_k, z_l) = E[z_i z_j z_k z_l] - E[z_i z_j]E[z_k z_l] - E[z_i z_k]E[z_j z_l] - E[z_i z_l]E[z_j z_k],$$

where $i, j, k, l \in \{1, \ldots, n\}$ and $n$ is the dimensionality of the vector $z$.
(c) Calculate the unitary matrix $\hat{U}$ by jointly diagonalizing the set of cumulant matrices $\{\hat{Q}_i\}$:

$$\hat{U} = \arg\min_U \sum_i \operatorname{off}\left(U^H \hat{Q}_i U\right),$$

where $\operatorname{off}(\cdot)$ is the sum of the squared nondiagonal elements, $\arg\min$ denotes the minimizing argument, and $U$ is a rotation (unitary) matrix.
(d) The source signals can then be separated as

$$\hat{s}(t) = \hat{U}^H z(t) = \hat{A}^\# x(t),$$

where $\hat{A} = W^\# \hat{U}$ and $\#$ denotes the pseudo-inverse.
In this paper, JADE is applied, for the first time, to dimensionality reduction and feature extraction. The single model can be obtained through the eigendecomposition of the fourth-order cumulant matrices of the multivariate data. This is not only simpler than the PCA and fast-ICA algorithms but also improves the robustness of the results. However, the original features have different units and scales, so it is difficult to distinguish the contribution of each feature to the overall feature.
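The whitening step (a) can be sketched concretely. The Python/numpy snippet below is an illustrative sketch under the noise-free assumption $\sigma^2 = 0$ (set `noise_var` otherwise), not the authors' code; it builds the whitening matrix from the eigendecomposition of the sample covariance:

```python
import numpy as np

def whiten(X, n_sources, noise_var=0.0):
    """JADE step (a): whiten zero-mean observations X (channels x samples)
    using the eigenvalues/eigenvectors of the sample covariance matrix."""
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_sources]  # keep the n_sources largest
    scale = 1.0 / np.sqrt(eigvals[idx] - noise_var)
    W = (eigvecs[:, idx] * scale).T              # whitening matrix
    return W @ X, W
```

By construction, the whitened signals have identity covariance, which is the precondition for the joint diagonalization in steps (b) and (c).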

Extreme Learning Machine (ELM).
ELM is an efficient learning algorithm for classification and prediction with SLFNs; the structure of an ELM is shown in Figure 1. It randomly generates the biases and input weights of the hidden nodes. Moreover, it obtains the result immediately, without the puzzle of a local minimum [35].
Suppose there are $N$ samples $(x_j, t_j)$, where $x_j$ denotes the feature vector and $t_j$ denotes the target value. Then the model of an SLFN with $\tilde{N}$ hidden nodes can be described as

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N,$$

where $w_i = [w_{i,1}, w_{i,2}, \ldots, w_{i,n}]^T$ is the weight vector that connects the $i$th hidden node and the input nodes and $b_i$ is the bias of the $i$th hidden node; both are randomly assigned. $\beta_i = [\beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,m}]^T$ is the weight vector connecting the $i$th hidden node and the output nodes, and $o_j$ is the output value. To minimize the error between the outputs $o_j$ and the targets $t_j$, represented as $\sum_{j=1}^{N} \|o_j - t_j\| = 0$, there exists an optimal value of $\beta_i$ such that

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \ldots, N.$$

The above $N$ equations can be written compactly as

$$H\beta = T,$$

where $H$ is the hidden layer output matrix with entries $H_{ji} = g(w_i \cdot x_j + b_i)$, $\beta = [\beta_1, \ldots, \beta_{\tilde{N}}]^T$, and $T = [t_1, \ldots, t_N]^T$. Thus the output weight can be obtained as

$$\hat{\beta} = H^{\dagger} T,$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of $H$. Although the ELM is faster than other machine learning algorithms, its accuracy still depends on feature extraction.
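The closed-form training described above fits in a few lines. The following Python/numpy sketch (illustrative only, with a sigmoid activation; the function names are my own, not from the paper) trains and evaluates a minimal ELM regressor:

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Train a single-hidden-layer ELM: random input weights and biases,
    output weights solved in closed form via the Moore-Penrose
    pseudo-inverse.  X: (N, n_in), T: (N, n_out)."""
    rng = np.random.default_rng(rng)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))  # input weights
    b = rng.uniform(-1, 1, size=n_hidden)                # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))               # sigmoid activations
    beta = np.linalg.pinv(H) @ T                         # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Forward pass of the trained ELM."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

There is no iterative weight tuning at all; the single pseudo-inverse solve is what gives ELM its speed advantage over BP training.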

The Proposed Method for Bearing RUL Prediction
Among the abovementioned methods, the PSR has superior merits for characterizing nonlinear time series, the JADE focuses on feature fusion and dimensionality reduction, and the ELM is an effective learning algorithm with a fast training speed. A bearing RUL prediction scheme combining the three methods is introduced in this paper.
In the frequency domain features of Table 1, $s(k)$ denotes the spectrum for $k = 1, 2, \ldots, K$, $K$ is the number of spectrum lines, and $f_k$ is the frequency value of the $k$th spectrum line. The scheme is exhibited in Figure 2 and can be described as follows.

Step 1 (original feature extraction). A total of 16 statistical features are listed in Table 1, including 8 time domain features and 8 frequency domain features. $F_1$ to $F_8$ are the mean, root mean square, RMS amplitude, absolute average, skewness, waveform index, impulsion index, and kurtosis index, respectively. $F_9$ to $F_{16}$ are the frequency domain features, which indicate the degree of dispersion or concentration of the spectrum and the change of the dominant frequency band.
Step 2 (PSR for each feature). The autocorrelation function is employed to select the time delay, and Cao's algorithm is used to determine the embedding dimension. All the features are reconstructed by PSR with the same parameters.
Step 3 (between-class and within-class scatter (SS) for feature weighting). Based on the feature phase space distribution, the SS is used to quantitatively describe the ability of each feature to separate the different degradation states. Suppose that there are $c$ classes established by the degradation states and that the $i$th class vector is $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,N_i})$, where $N_i$ is the number of samples in this class. Then the SS is defined as

$$S_b = \frac{1}{c} \sum_{i=1}^{c} \left\| m_i - m \right\|^2, \qquad S_w = \frac{1}{c} \sum_{i=1}^{c} \frac{1}{N_i} \sum_{j=1}^{N_i} \left\| x_{i,j} - m_i \right\|^2, \qquad SS = \frac{S_b}{S_w},$$

where $m_i$ is the mean vector of the samples in the $i$th class and $m$ is the mean vector over all classes. The between-class scatter $S_b$ denotes the degree of scatter between different classes, while the within-class scatter $S_w$ indicates the concentration within the same class. Therefore, the classification ability can be evaluated through the ratio $SS$. The weight of the $k$th original feature is calculated by normalizing the SS values:

$$w_k = \frac{SS_k}{\sum_{j=1}^{16} SS_j}.$$

Step 4 (JADE for feature fusion). In this step, every feature extracted in Step 1 is regarded as a "source signal," and JADE is introduced to separate the single model from the 16 source signals after they have been normalized and weighted. However, the separated unidimensional vector is still mixed with noise. Therefore, the exponentially weighted moving average (EWMA) is used for denoising, which is defined as

$$z_t = \lambda x_t + (1 - \lambda) z_{t-1},$$

where $\lambda \in [0, 1]$ is the smoothing factor, generally taken between 0.05 and 0.25 [36].
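The scoring and smoothing in Steps 3 and 4 can be sketched as follows (an illustrative Python sketch for scalar features per class; the sum-to-one weight normalization is my reading of the scheme, not verbatim from the paper):

```python
import numpy as np

def ss_ratio(classes):
    """Between-class / within-class scatter ratio (SS) for one feature.
    `classes` is a list of 1-D arrays, one per degradation stage."""
    overall_mean = np.mean(np.concatenate(classes))
    s_b = np.mean([(c.mean() - overall_mean) ** 2 for c in classes])
    s_w = np.mean([np.mean((c - c.mean()) ** 2) for c in classes])
    return s_b / s_w

def feature_weights(ss_values):
    """Normalize the per-feature SS values into weights that sum to 1."""
    ss_values = np.asarray(ss_values, dtype=float)
    return ss_values / ss_values.sum()

def ewma(x, lam=0.2):
    """Exponentially weighted moving average: z_t = lam*x_t + (1-lam)*z_{t-1}."""
    z = np.empty(len(x))
    z[0] = x[0]
    for t in range(1, len(x)):
        z[t] = lam * x[t] + (1 - lam) * z[t - 1]
    return z
```

Features whose degradation stages are well separated in phase space get a large SS, and hence a large weight, before the JADE fusion; the EWMA then smooths the single fused trajectory.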
Step 5 (prediction model training). Based on Steps 1-4, the fused feature vectors with their real RUL values $(x_j, t_j)$ are input into the ELM, where $j$ is the index of the samples. The biases and input weights of the hidden nodes in the ELM are randomly generated. Then the model with $\tilde{N}$ hidden nodes is constructed as described in Section 2, where $w_i$ is the weight vector connecting the $i$th hidden node and the input nodes, $b_i$ is the bias of the hidden node (both randomly assigned), $o_j$ is the output value, and $\beta_i$ is the weight vector connecting the $i$th hidden node and the output nodes, whose optimal value minimizes the error between the output $o_j$ and the real RUL $t_j$. The fused feature vectors of the testing samples are then input into the trained prediction model to perform the bearing RUL prediction.

Experimental Data Set Description.
In order to evaluate the effectiveness of the proposed feature fusion scheme for RUL prediction, the entire life cycle bearing data from the Center for Intelligent Maintenance Systems [37] are analyzed. The experimental data sets were collected from run-to-failure bearing tests on the designed test platform (Figure 3). Four bearings were installed on a shaft. The rotation speed was kept at 2000 r/min by an AC motor coupled to the shaft via rub belts. A 6000 lb radial load was applied to the bearing and shaft by a spring mechanism. A PCB 353B33 High Sensitivity Quartz ICP accelerometer was installed on the bearing housing. A vibration data sample of 20480 points was collected every 10 minutes by a National Instruments DAQCard-6062E data acquisition card at a sampling rate of 20 kHz. More details can be found in [38].

Original Feature Extraction.
In this paper, 100 groups of training samples are selected from the run-to-failure bearing data, and the data samples adjacent to the training samples are regarded as testing samples. The length of each group's data is set to 1680 points. Take the outer-race-fault case as an example; the acquired run-to-failure bearing data are $(D_1, D_2, \ldots, D_{984})$, where $D_i$ is the $i$th acquired data sample and $i$ ranges from 1 to 984. The training samples are selected as $(D_{90}, D_{99}, \ldots, D_{981})$ and the testing samples as $(D_{91}, D_{100}, \ldots, D_{982})$, where each sample $D_j = (d_1, d_2, \ldots, d_{1680})$ and $j$, the index of the samples, ranges from 1 to 100.
The 16 original features (as listed in Table 1) are extracted from the training samples and illustrated in Figures 4-6. However, these features are not robust in characterizing the degradation process. Take $F_{12}$ as an example; it performs well in the inner-race-fault and roller-fault cases but badly in the outer-race-fault case. Therefore, it is very important to extract a robust and favorable feature from the original features that better expresses the bearing degradation. In addition, from these extracted features it can be estimated that the time ratios of the operation stages (normal, slight degradation, and serious degradation) in the outer-race-fault, inner-race-fault, and roller-fault cases are about 0.5 : 0.4 : 0.1, 0.6 : 0.3 : 0.1, and 0.5 : 0.2 : 0.3, respectively.

Phase Space Reconstitution.
As the high dimensional phase space can grasp the nature of a time series much better, the PSR is applied to reconstruct the original feature vectors in phase space before the feature contribution evaluation using the SS.
In this paper, the autocorrelation function and Cao's algorithm are employed to identify the time delay and the embedding dimension. Take $F_1$ of the outer-race-fault case as an example; the results are shown in Figure 7. Figure 7(a) shows how $R_x(t)$ changes with increasing delay time; the minimum $t$ that makes $R_x(t)$ fall below $1 - 1/e$ is considered the optimal time delay. From Figure 7(b), it can be seen that $E_1(m)$ stays stable when $m$ is larger than 10, while $E_2(m)$ fluctuates randomly with $m$. Consequently, the time delay is set to 1 and the embedding dimension is set to 10. The other features are processed by the same procedure, and the phase point distributions of the different operational stages are shown in Figures 8-10 (two dimensions are chosen to display the phase point distribution for better observation).
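Cao's $E_1$/$E_2$ curves used in Figure 7(b) can be computed as below. This is an illustrative Python/numpy sketch of Cao's method (Chebyshev distances, a small epsilon guard against coincident phase points; not the authors' code):

```python
import numpy as np

def cao_e1_e2(x, tau, m_max=10):
    """Cao's method: compute E1(m) and E2(m) for m = 1..m_max.  The
    smallest m beyond which E1 stays roughly constant is taken as the
    embedding dimension; E2 far from 1 signals a deterministic series."""
    x = np.asarray(x, dtype=float)
    E = np.empty(m_max + 1)
    E_star = np.empty(m_max + 1)
    for m in range(1, m_max + 2):
        n = len(x) - m * tau          # phase points that also exist in dim m+1
        Y = np.column_stack([x[i * tau : i * tau + n] for i in range(m)])
        a = np.empty(n)
        b = np.empty(n)
        for i in range(n):
            d = np.max(np.abs(Y - Y[i]), axis=1)  # Chebyshev distances, dim m
            d[i] = np.inf
            j = int(np.argmin(d))                 # nearest neighbour of point i
            extra = abs(x[i + m * tau] - x[j + m * tau])
            a[i] = max(d[j], extra) / max(d[j], 1e-12)  # ratio dim m+1 / dim m
            b[i] = extra
        E[m - 1] = a.mean()
        E_star[m - 1] = b.mean()
    return E[1:] / E[:-1], E_star[1:] / E_star[:-1]  # E1(m), E2(m)
```

Plotting the returned `E1` against `m` and reading off where it flattens reproduces the kind of decision made in Figure 7(b).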

Feature Fusion Based on JADE and SS.
The SS is then employed to evaluate the contribution of each feature. As shown in Figure 11, the abscissa is the feature number described in Table 1, and the ordinate is the value of the SS, the evaluation index of the feature. The greater the SS, the better the capacity to identify the bearing degradation states. Take the inner-race-fault case as an example; it can be seen that $F_1$, $F_5$, $F_7$, and $F_8$ cannot effectively characterize or distinguish the different degradation stages, while $F_3$ provides excellent performance for distinguishing the degradation states. The weights of the features are then calculated from the SS values as defined in Step 3. Besides, $F_{12}$ performs well in the roller-fault case but badly in the outer-race-fault case. Therefore, it is very important to extract a robust feature that can be adapted to different cases.
The JADE is then applied to fuse the weighted features. In order to improve the sensitivity and reliability of the fused feature during slight bearing degradation, the EWMA is employed for denoising as defined in Step 4, with the smoothing factor selected as 0.2. After all of these steps, the new feature vector is obtained, as shown in Figure 12.

Prediction Performance.
In this step, the fused feature vectors and the corresponding real RUL values are input into the ELM to construct the RUL prediction model. The data samples adjacent to the training samples are used to test the model. The results are shown in Figure 13. It can be seen that the predicted RUL coincides fairly well with the real RUL, which indicates that the feature fusion method and the constructed ELM model achieve an ideal result. The RUL prediction accuracy is relatively high in the first stage, as the vibration there is relatively stable. The slight fault stage is more complex, and sharp fluctuations also occur at this stage, which lowers the accuracy. The vibration at the serious fault stage is more regular again, which leads to higher prediction accuracy.
The PCA and ICA, as representative linear dimensionality reduction methods, are applied to the same data for comparison with the proposed method. The root mean square error (RMSE) is selected to evaluate the RUL prediction accuracy, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (t_i - o_i)^2},$$

where $t_i$ is the real RUL and $o_i$ is the predicted RUL.
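For reference, the RMSE metric above amounts to a one-line Python helper:

```python
import numpy as np

def rmse(t_real, t_pred):
    """Root mean square error between real and predicted RUL sequences."""
    t_real = np.asarray(t_real, dtype=float)
    t_pred = np.asarray(t_pred, dtype=float)
    return float(np.sqrt(np.mean((t_real - t_pred) ** 2)))
```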
To account for the randomness of the ELM initialization, the ELM is applied to the same training data 10 times, and the average RMSE is used to evaluate the prediction accuracy. The comparison results are shown in Tables 2-4. It can be seen that the proposed feature extraction outperforms the two traditional methods, which proves the merit of the proposed method for fusing the original features.

Conclusions
A novel feature extraction method that integrates the PSR, SS, and JADE is introduced in this paper. The contribution of each original feature is assessed according to the SS derived from the phase point distribution after PSR, and the JADE is employed to fuse the weighted features. Based on the extracted advanced features, the ELM is employed to construct the RUL prediction model. A set of experimental case studies is presented to verify the effectiveness of the proposed method. The results indicate that the proposed method is appropriate for RUL prediction and even performs better than the traditional methods. The introduced feature extraction method, combining JADE with PSR and the SS, is a multifeature fusion method that can inhibit nonsensitive features while enhancing superior ones. It is also suitable for the RUL prediction of other machine components, such as cutting tools and gears.

Figure 1: The structure of ELM.

Figure 2: Principle of the proposed method based on PSR and JADE.

Figure 8: The phase points distribution of outer-race-fault case in two dimensions.

Figure 9: The phase points distribution of inner-race-fault case in two dimensions.

Figure 10: The phase points distribution of roller-fault case in two dimensions.
Figure 11: SS values of the 16 original features.
Figure 12: The fused feature vector.
Figure 13: RUL prediction results.

Table 1: List of the original extracted features.

Table 2: Predicted accuracy of outer-race-fault case.

Table 3: Predicted accuracy of inner-race-fault case.

Table 4: Predicted accuracy of roller-fault case.