Rolling Bearing Degradation State Identification Based on LPP Optimized by GA

To address the poor distinguishability and strong fuzziness of the actual degradation states of rolling bearings, a degradation state identification method is proposed that combines multidomain feature fusion, dimension reduction by manifold learning, and GG clustering. Firstly, the rolling bearing full-life data are preprocessed by local characteristic-scale decomposition (LCD), and six typical features, namely, relative energy spectrum entropy (LREE), relative singular spectrum entropy (LRSE), two-element multiscale entropy (TMSE), standard deviation (STD), RMS, and root-square amplitude (XR), are extracted to compose the original multidomain feature set. Then, locality preserving projection (LPP) is used to reduce the dimension of the original fusion feature set, and a genetic algorithm (GA) is applied to optimize the feature fusion process. Finally, fuzzy recognition of the rolling bearing degradation state is carried out by GG clustering and the principle of maximum membership degree, and the performance of the proposed method is validated by comparing the recognition accuracy of LPP and GA-LPP.


Introduction
Rolling element bearings are among the most important components for carrying heavy loads and maintaining constant rotational speed in rotating machines [1]. During long-term continuous operation, the performance condition of rolling bearings changes continually, which directly affects the performance stability of the whole machine. Therefore, identifying the rolling bearing degradation state in real time is of practical significance for extending the service life of rotating machines.
The fault features extracted from vibration signals are analyzed to determine the bearing state [2], and fault feature extraction is the basis of rolling bearing degradation state recognition. Well-chosen degradation features can characterize the degradation degree of rolling elements accurately and stably. Degradation features are mainly selected from the time domain, the frequency domain, time-frequency analysis, and signal complexity measures. Considering that the actual vibration signal of rolling bearings is nonlinear and nonstationary, time-domain and frequency-domain statistics are relatively poor at characterizing different degradation states of the same bearing. For instance, kurtosis is insensitive to initial damage [3] and can hardly characterize the slight degradation state exactly. In recent years, information entropy theory has been widely used in signal processing and fault diagnosis, and it has developed into different forms of entropy with different properties, such as approximate entropy (ApEn), sample entropy (SampEn), multiscale entropy (MSE), spatial information entropy (SIE), and fuzzy entropy (FuzzyEn) [4,5]. These entropy features apply nonlinear dynamics theory, which differs from traditional time-domain indexes, and have achieved good results in health monitoring and fault identification. Compared with the recognition of preset fault patterns, rolling bearing degradation state recognition over the whole life is more ambiguous and complex. Moreover, a single feature of vibration signals can only reflect the fault characteristics of rotating machines at a certain fault degree, which can result in problems such as recognition inaccuracy, system instability, and ambiguous recognition results [6]. To address these problems, multidomain feature fusion is widely used in degradation state recognition and fault prediction of rotating machines [7,8].
However, a high-dimensional feature vector composed of multidomain features inevitably suffers from information redundancy and characteristic conflict, and the effective information is easily submerged by the high-dimensional data [9]. Moreover, the use of high-dimensional data sharply increases the amount of calculation, which is not conducive to real-time identification of the rolling bearing degradation state. Manifold learning theory can identify the low-dimensional nonlinear structure hidden in high-dimensional data, and thus, in recent years, manifold learning algorithms including locally linear embedding (LLE), locality preserving projection (LPP), isometric feature mapping (IsoMap), and Laplacian eigenmaps (LE) have come into use [10]. From the neighbor graph built on the high-dimensional features, the LPP algorithm obtains their projection in the low-dimensional space. In this way, fusion and reduction of high-dimensional data are achieved. Compared with IsoMap and LLE, LPP has the advantages of simple calculation and fast processing speed. The results of LPP are closely related to the nearest neighbor parameters, for which there is no definite criterion. Therefore, the optimized parameters are obtained by repeated experiments. Reference [11] proposed a modified kernel distance measure sensitivity factor to measure the ability of fault features to characterize different fault patterns. In view of this, the LPP algorithm can be optimized by taking the sensitivity factor as the objective function. When the factor reaches its maximum, the effect of LPP feature fusion is best.
Considering that the actual rolling bearing degradation states exhibit strong fuzziness and that the boundaries between different degradation states are difficult to determine, Fuzzy C-means (FCM) clustering [12] and Gustafson-Kessel (GK) clustering [13] are widely used in fault diagnosis. Gath-Geva (GG) clustering improves on the FCM and GK algorithms by using a distance norm based on fuzzy maximum likelihood estimation, and its clustering effect is better [14].
Based on the above analysis, a rolling bearing degradation state identification method based on fusion and dimension reduction of multidomain features and GG clustering is put forward in this paper. Six features computed from information entropy and the time domain are fused by LPP optimized by genetic algorithm (GA-LPP) in order to separate the training points of different degradation degrees as clearly as possible. Finally, degradation state recognition is realized by GG clustering and the principle of maximum membership.

Time-Domain Features.
In order to fully characterize the different degradation states of rolling bearings, multiple time-domain features need to be analyzed. Common time-domain indexes include mean, standard deviation (STD), root mean square (RMS), root-square amplitude, skewness, peak to peak, waveform index, pulse index, margin index, partial slope index, and kurtosis. These features are examined in terms of three aspects: the ability to follow the degradation trend, monotonicity, and data smoothness. Three features, STD $X_{\mathrm{std}}$, RMS $X_{\mathrm{rms}}$, and root-square amplitude $X_r$, are selected to compose a three-dimensional time-domain feature matrix.
(2) As the rolling bearing degradation develops, the energy at the characteristic frequency and its multiples becomes larger in the frequency spectrum obtained by LCD and the Hilbert transform. For rolling bearings, different fault modes have different vibration characteristics. For a given rolling bearing fault mode such as inner ring pitting, the vibration characteristic frequency and its multiples can be calculated by the following formula:

$$f_k = k \cdot \frac{Z}{2}\left(1 + \frac{d}{D}\cos\alpha\right) f_r, \quad k = 1, 2, \ldots, K,$$

where $Z$ is the number of rollers of the bearing, $d$ is the roller diameter, $D$ represents the pitch diameter, $\alpha$ denotes the contact angle, and $f_r$ is the rotor frequency.
(3) The sum energy of all samples at the characteristic frequency $f_k$ ($k = 1, 2, \ldots, K$) is computed as follows:
(6) The singular value spectrum of normal samples and degradation samples can be obtained by singular value decomposition (SVD):
(8) The LRSE between normal state and degradation state is defined as follows:
2.2.2. TMSE. After LCD, the two signal components whose cross-correlation coefficients are higher than the others contain sufficient degradation state information. For the two components of sequence length $N$, after coarse graining, two-element embedding reconstruction, construction of composite delay vectors, and threshold setting, assume that the embedding dimensions of the two composite delay vectors are $m$ and $m + 1$ and that the corresponding conditional probabilities are $B^{m}(r)$ and $B^{m+1}(r)$ when the similarity tolerance is $r$. TMSE can be expressed as the natural logarithm of the ratio of the conditional probabilities:

$$\mathrm{TMSE}(M, \tau, r, N) = -\ln\left[\frac{B^{m+1}(r)}{B^{m}(r)}\right],$$

where $M$ is the embedding vector and $\tau$ is the delay vector. The above three kinds of entropy features constitute another three-dimensional feature matrix. Together, the entropy features and the time-domain features constitute a six-dimensional multidomain feature matrix.
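The three time-domain indexes and the inner-ring characteristic frequency used in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the standard deviation is taken in its population form, and the root-square amplitude is taken as the squared mean of the square-rooted absolute amplitudes, which is the usual definition but is not spelled out in the text.

```python
import numpy as np

def time_domain_features(x):
    """Return (STD, RMS, root-square amplitude) of a 1-D vibration record."""
    x = np.asarray(x, dtype=float)
    std = np.std(x)                          # population form (assumption)
    rms = np.sqrt(np.mean(x ** 2))           # root mean square
    xr = np.mean(np.sqrt(np.abs(x))) ** 2    # root-square amplitude
    return std, rms, xr

def inner_ring_frequency(n_rollers, roller_d, pitch_d, contact_angle, f_rotor, k=1):
    """k-th multiple of the inner-ring characteristic frequency.

    contact_angle is in radians; the result has the same unit as f_rotor.
    """
    base = 0.5 * n_rollers * (1.0 + roller_d / pitch_d * np.cos(contact_angle))
    return k * base * f_rotor
```

Stacking the three returned values for every data group gives the rows of the three-dimensional time-domain feature matrix.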

Optimized LPP Based on GA
3.1. The Principle of LPP. The LPP algorithm can retain the nonlinear structure and local characteristics of the data when it is applied to high-dimensional data reduction. The principle of the algorithm is as follows [16].
For $N$ data samples in an $h$-dimensional space, $X = \{x_1, x_2, \ldots, x_N\}$, the matrix $Y = \{y_1, y_2, \ldots, y_N\}$ holds the corresponding low-dimensional samples, where $y_i$ ($i = 1, 2, \ldots, N$) is a $d$-dimensional vector ($d \ll h$). The similarity matrix can be defined by the following formula:

$$W_{ij} = \begin{cases} \exp\left(-\|x_i - x_j\|^2 / t\right), & x_i \text{ and } x_j \text{ are nearest neighbor points}, \\ 0, & \text{otherwise}, \end{cases}$$

where $t$ is a constant. The LPP algorithm can be achieved by solving the following optimization problem:

$$\min_{a} \; a^{T} X L X^{T} a,$$

which needs to satisfy $a^{T} X D X^{T} a = 1$, where $L = D - W$ is the Laplace operator. The diagonal matrix $D_{ii} = \sum_{j} W_{ij}$ reflects the density of the data distribution. Then, the transform matrix can be calculated by solving the generalized eigenvalue decomposition problem:

$$X L X^{T} a = \lambda X D X^{T} a.$$

In the above formula, the matrix $X D X^{T}$ is sometimes singular. To handle this, the feature set is usually projected onto a PCA subspace first, which eliminates the singularity. The following linear mapping can then be obtained:

$$y_i = A^{T} x_i, \quad A = (a_1, a_2, \ldots, a_d),$$

where the columns of $A$ are the eigenvectors corresponding to the $d$ smallest eigenvalues.
3.2. Kernel Space Measure Sensitivity Factor. In order to evaluate how well the training samples distinguish different degradation states after fusion and dimension reduction, Zheyuan et al. [17] propose taking the distance between different types of samples in kernel space as the basis of feature evaluation. However, in clustering analysis, the selection of the clustering center depends not only on the distance but also on the degree of aggregation of points of the same type. Therefore, reference [11] takes the ratio of the distance between different types to the divergence within the same type as the measure factor in kernel space, and this factor is regarded as the distinguishing criterion for high accuracy. The Gaussian radial basis kernel function is selected to calculate the distance between $x_1$ and $x_2$. The form is as follows:

$$K(x_1, x_2) = \exp\left(-\frac{\|x_1 - x_2\|^2}{2\sigma^2}\right).$$

Then, the distance between two points can be expressed as

$$d(x_1, x_2) = \sqrt{K(x_1, x_1) - 2K(x_1, x_2) + K(x_2, x_2)} = \sqrt{2 - 2K(x_1, x_2)}.$$

On this basis, the average distance between training samples of type $i$ and type $j$ can be calculated as follows:

$$d_{ij} = \frac{1}{n_i n_j} \sum_{p=1}^{n_i} \sum_{q=1}^{n_j} d\left(x_p^{(i)}, x_q^{(j)}\right),$$

where $i = 1, 2, \ldots, c$; $j = 1, 2, \ldots, c$; $c$ is the number of sample categories; and $n_i$ and $n_j$ are the numbers of samples of type $i$ and type $j$.

International Journal of Rotating Machinery
The average distance between different sample categories is then obtained by averaging the pairwise category distances, and the divergence of the same sample category is computed with respect to $m_i$, the average of the training samples of category $i$.
According to the definition, the kernel space measure sensitivity factor is the ratio of the average distance between different sample categories to the divergence of the same sample category.
3.3. Optimization Based on GA. In order to make the fusion features obtained from LPP dimension reduction distinguish different degradation states better, the genetic algorithm (GA) is applied to optimize the kernel space in which the various kinds of training samples lie. GA is a heuristic search algorithm for finding an optimal solution. The process of GA mainly includes population initialization, crossover, mutation, fitness calculation (individual evaluation), and selection (population replacement). The kernel space measure sensitivity factor is taken as the fitness function for optimization, and the optimal individual corresponds to the case in which the discrimination between different degradation states is highest.
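The kernel space measure sensitivity factor that serves as the fitness function can be sketched as follows. This is only an illustration under assumptions: the kernel width `sigma`, the use of plain arithmetic means for both the between-category distance and the within-category divergence, and the function names are choices made here, not details confirmed by the text.

```python
import numpy as np

def kernel_distance(a, b, sigma=1.0):
    """Pairwise distance induced by a Gaussian RBF kernel: sqrt(2 - 2*K)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    k = np.exp(-sq / (2.0 * sigma ** 2))
    return np.sqrt(np.maximum(2.0 - 2.0 * k, 0.0))

def sensitivity_factor(features, labels, sigma=1.0):
    """Ratio of between-category distance to within-category divergence."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    groups = [features[labels == c] for c in np.unique(labels)]

    # Average kernel distance over every pair of different categories.
    between = [
        kernel_distance(groups[i], groups[j], sigma).mean()
        for i in range(len(groups))
        for j in range(i + 1, len(groups))
    ]

    # Divergence: average kernel distance of each sample to its category mean.
    within = [
        kernel_distance(g, g.mean(axis=0, keepdims=True), sigma).mean()
        for g in groups
    ]
    return float(np.mean(between) / np.mean(within))
```

A larger value indicates categories that are far apart relative to their own spread, which is the property the optimization seeks to maximize.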
Studies have shown that the clustering effect of LPP fusion features changes with the kernel space. To find the optimal kernel space, an affine transformation is applied to all training samples. Taking 3D fusion features as an example, one training point is denoted $P_1(x_0, y_0, z_0)$ and the affine transform angles are set as $\theta_1 \in [0, 2\pi]$ and $\theta_2 \in [0, 2\pi]$. The affine transformation matrix is determined by these two angles, and the new sample feature points after kernel space transformation are computed by applying this matrix to the original feature points. The two affine transform angles are used as the training entity, and the individuals are randomly generated to complete initialization. Through the optimization process of GA, the angles that give the best training sample clustering effect are found.

GG Clustering Algorithm
For the training sample set $X = \{x_1, x_2, \ldots, x_N\}$, it is assumed that each sample is made up of $n$ characteristics: $x_k = \{x_{k1}, x_{k2}, \ldots, x_{kn}\}$. After initialization, all samples are divided into $c$ categories; namely, the number of clustering classifications is $c$ ($2 \le c \le N$). The clustering centers of all categories are $V = \{v_1, v_2, \ldots, v_c\}$ and the membership matrix is $U = \{u_{ik}\}_{c \times N}$. The element $u_{ik} \in [0, 1]$ represents the membership degree of the $k$th training sample to the $i$th degradation state ($1 \le i \le c$). In the GG algorithm, the following objective function reaches its minimum value through the iterative adjustment of $U$ and $V$:

$$J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (u_{ik})^{m} D_{ik}^{2},$$

where $m$ is the weighted index, generally taken as 2. Different from FCM clustering, $D_{ik}$ in GG clustering is the distance measure calculated from the covariance matrix. In that way, data samples of different directions and shapes can be reflected effectively.
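The alternating update of $U$ and $V$ can be sketched as below. This follows a commonly used formulation of GG clustering (fuzzy covariance, prior probability, and an exponential distance built from them) and is not necessarily the exact variant used in this paper. The farthest-point initialization, the regularization term `reg`, and the log-domain normalization are assumptions added here for numerical stability.

```python
import numpy as np

def _normalize_log(log_w):
    """Turn log-weights of shape (c, N) into memberships whose columns sum to 1."""
    log_w = log_w - log_w.max(axis=0, keepdims=True)   # keeps exp() finite
    w = np.exp(log_w)
    return w / w.sum(axis=0, keepdims=True)

def _spread_init(x, c):
    """Pick c mutually distant samples as starting centers (assumption)."""
    idx = [int(np.argmax(((x - x.mean(axis=0)) ** 2).sum(axis=1)))]
    for _ in range(c - 1):
        d2 = ((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(d2)))
    return x[idx]

def gath_geva(x, c, m=2.0, n_iter=50, reg=1e-6):
    """Return (centers, membership matrix U of shape (c, N))."""
    x = np.asarray(x, dtype=float)
    n_samples, n_feat = x.shape
    centers = _spread_init(x, c)
    # Initial memberships from Euclidean distances, as in FCM.
    d2 = ((x[None, :, :] - centers[:, None, :]) ** 2).sum(-1) + 1e-12
    u = _normalize_log(-np.log(d2) / (m - 1.0))
    for _ in range(n_iter):
        w = u ** m
        centers = (w @ x) / w.sum(axis=1, keepdims=True)
        log_d2 = np.empty((c, n_samples))
        for i in range(c):
            diff = x - centers[i]
            cov = (w[i][:, None] * diff).T @ diff / w[i].sum()
            cov += reg * np.eye(n_feat)                # guard against singularity
            prior = u[i].mean()                        # prior probability of cluster i
            maha = np.einsum("kj,jl,kl->k", diff, np.linalg.inv(cov), diff)
            _, logdet = np.linalg.slogdet(cov)
            # log of D_ik^2 = sqrt(det F_i) / P_i * exp(maha / 2)
            log_d2[i] = 0.5 * logdet - np.log(prior) + 0.5 * maha
        u = _normalize_log(-log_d2 / (m - 1.0))
    return centers, u
```

Because the distance contains the cluster covariance, elongated or differently oriented clusters receive their own metric, which is the property the paragraph above attributes to GG clustering.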

The Process of Degradation State Identification
The original vibration signal is preprocessed by LCD. The time-domain features of STD, RMS, and root-square amplitude and the entropy features of LREE, LRSE, and TMSE are extracted from the selected signal components to compose the original characteristic set. The degradation state recognition process is shown in Figure 1.
The degradation state recognition algorithm mainly contains the following key steps:
(i) LCD Pretreatment. According to the cross-correlation coefficient between the LCD components and the original signal, the useful components can be chosen. Considering the amount of information contained in the components and the computation time, the two components whose coefficients are higher than the others are selected for further analysis after many tests.
(ii) Feature Extraction and Fusion. The six-dimensional multidomain features are fused by the LPP algorithm; the intrinsic dimension is three according to maximum likelihood estimation, so the target dimension of feature fusion is set as three. On the basis of the maximum sensitivity factor principle, the fusion features are optimized by GA to find the best kernel space for clustering analysis.
(iii) The clustering centers are determined by the GG algorithm, and the rolling bearing degradation identification is achieved by the principle of maximum membership degree.
3(a). The specific parameters are set as shown in Table 1.

Instance Verification
When the test bench running time reaches 9600 minutes, the machine is shut down. Inner ring pitting occurs in the bearing at station number 4, which results in bearing failure (as shown in Figure 3(b)).
The 960 groups of collected vibration data record the whole process of the rolling bearing from normal state to failure state. Figure 4 shows the real-time monitoring curve of average amplitude versus time, which clearly reflects the different degradation states of the rolling bearing. According to the change of curve amplitude and curvature, the rolling bearing performance variation can be initially divided into four states: normal state, slight degradation, severe degradation, and failure state. The details are presented in Table 2.
The original signal is preprocessed by LCD to get 10 intrinsic scale components (ISCs), and the first 5 ISCs are shown in Figure 5. The cross-correlation coefficient between each component and the original signal is then calculated. Only the first and the third ISCs have coefficients greater than 0.5, namely, 0.6487 and 0.5395, respectively. Therefore, these two components are taken as the signal source for degradation feature extraction.
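The component selection described above can be sketched as follows, assuming the ISCs have already been produced by some LCD implementation (not shown here). The function name is hypothetical, the 0.5 threshold is taken from the text, and using the absolute value of the Pearson coefficient is an assumption.

```python
import numpy as np

def select_components(signal, components, threshold=0.5):
    """Indices of components whose correlation with the signal exceeds threshold."""
    signal = np.asarray(signal, dtype=float)
    coeffs = np.array([
        abs(np.corrcoef(signal, np.asarray(comp, dtype=float))[0, 1])
        for comp in components
    ])
    keep = np.flatnonzero(coeffs > threshold)
    return keep, coeffs

# Toy check with three synthetic components standing in for ISCs.
t = np.arange(1000) / 1000.0
parts = [
    np.sin(2 * np.pi * 5 * t),
    0.1 * np.sin(2 * np.pi * 50 * t),
    0.8 * np.sin(2 * np.pi * 12 * t),
]
keep, coeffs = select_components(sum(parts), parts)
```

In this toy case the first and third components are retained, mirroring the situation reported for the first and third ISCs.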

Degradation Feature Fusion and Optimization.
According to the degradation state division in Table 2, 100 groups of normal data, 100 groups of slight degradation data, 60 groups of severe degradation data, and 30 groups of failure data are selected as training samples. The characteristic indexes of the different degradation states are extracted and normalized, respectively. The 3D time-domain feature points are shown in Figure 6. As the bearing degrades from normal state to failure state, these three features increase monotonically and the failure state is clearly distinguished. However, the points of the other three degradation states are severely mixed and cannot be distinguished clearly.
Although time-domain features such as RMS are easy to obtain and characterize degradation states stably, literature [19] indicates that these time-domain features are not sensitive to early bearing faults, including slight degradation and severe degradation, until bearing failure occurs. Moreover, reference [20] points out that the vibration signals of rolling bearings present nonlinear characteristics, and these three traditional time-domain features are similar to one another. The entropy features of formula (15) are shown in Figure 7. The entropy vector can distinguish normal state, slight degradation, and severe degradation on the whole. Nevertheless, in the failure state, the clustering effect of the training samples is unsatisfactory. Reference [21] demonstrates that entropy indexes depend solely on the probability distribution of event occurrence in bearing fault signals. They are sensitive to changes of the degradation state but are more susceptible to spurious vibrations. When the bearing reaches the failure state, the violent change of condition mixes many spurious components into the vibration signals, and the entropy features cannot stably characterize the failure state of the bearing. Therefore, the 3D entropy features at the failure state are strongly scattered in Figure 7.
In order to improve the discrimination between different degradation states, the above time-domain features and entropy features need to be fused. Therefore, the six-dimensional multidomain feature vectors are input to the LPP for feature fusion and dimension reduction. To ensure information exchange among the neighborhoods, the neighborhood number $k$ should not be too small; yet if $k$ is too large, the local features can be incomplete. In general, $k$ should lie between $d$ and $n$, where $d$ is the intrinsic dimension and $n$ is the number of training samples in each category. In this paper, $d = 3$ and $n = 30$. Thus, $3 < k < 30$.
The clustering effect is better when $k = 7$, as presented in Figure 8. Compared with the time-domain features and the entropy features, the LPP fusion features distinguish the degradation states better, and the clustering effects of normal state, slight degradation, and failure state are satisfactory. However, the robustness of the fusion features in the severe degradation state is relatively poor, with the result that the same severe degradation state is divided into two groups of samples. Meanwhile, the spacing between sample classes is relatively small and the clustering effect is not good. So the process of feature fusion needs to be optimized.
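The LPP fusion step with $k = 7$ neighbors and a three-dimensional target can be sketched as follows. This is a minimal NumPy illustration of standard LPP, not the authors' implementation: samples are stored as rows, the heat-kernel constant `t` and the small regularization `reg` (used here in place of the PCA preprojection) are assumptions, and the function name is hypothetical.

```python
import numpy as np

def lpp(x, n_components=3, k=7, t=1.0, reg=1e-9):
    """Project rows of x (N samples by h features) onto n_components LPP axes."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)

    # k nearest neighbors of each sample; column 0 of the sort is the sample
    # itself, assuming there are no duplicate samples.
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nn.ravel()] = True
    adj |= adj.T                                  # symmetric neighbor graph

    w = np.where(adj, np.exp(-d2 / t), 0.0)       # heat-kernel similarity
    deg = np.diag(w.sum(axis=1))
    lap = deg - w                                 # graph Laplacian L = D - W

    a = x.T @ lap @ x
    b = x.T @ deg @ x + reg * np.eye(x.shape[1])

    # Generalized problem a v = lambda b v, solved through a Cholesky factor of b.
    chol_inv = np.linalg.inv(np.linalg.cholesky(b))
    vals, vecs = np.linalg.eigh(chol_inv @ a @ chol_inv.T)
    proj = chol_inv.T @ vecs[:, :n_components]    # smallest eigenvalues first
    return x @ proj, proj
```

For the six-dimensional multidomain features, `lpp(features, n_components=3, k=7)` would return the fused three-dimensional points together with the projection matrix, which can then be reused on testing samples.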
The kernel space measure sensitivity factor is taken as the objective function. According to formula (22) and formula (23), the kernel space is optimized by GA so that the factor reaches its maximum value. To improve the convergence speed and ensure the search quality, the population size is generally set as $N = 20 \sim 200$; after several experiments, $N = 30$ is chosen. The larger the crossover probability is, the higher the loss rate of excellent results is, but when the probability is too small, the search stalls. In general, the crossover probability is $P_c = 0.6 \sim 1.0$, and here it is 0.8. The mutation probability generally should not be too large; otherwise GA becomes a random search method and the precision and speed of convergence suffer. Therefore, the mutation probability is $P_m = 0.03$.
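A GA search over the two affine transform angles with the settings above (population 30, crossover probability 0.8, mutation probability 0.03) can be sketched as follows. Since the affine transformation matrix is not reproduced here, the fitness is passed in as a callable; in the paper's setting it would return the sensitivity factor of the transformed training samples. The real-coded operators (tournament selection, arithmetic crossover, uniform mutation, elitism) and the number of generations are assumptions for illustration.

```python
import numpy as np

def genetic_maximize(fitness, pop_size=30, p_cross=0.8, p_mut=0.03,
                     generations=60, seed=0):
    """Maximize fitness(theta1, theta2) over [0, 2*pi] x [0, 2*pi]."""
    rng = np.random.default_rng(seed)
    low, high = 0.0, 2.0 * np.pi
    pop = rng.uniform(low, high, size=(pop_size, 2))      # initialization
    best, best_fit = None, -np.inf
    for _ in range(generations):
        fit = np.array([fitness(t1, t2) for t1, t2 in pop])
        top = int(np.argmax(fit))
        if fit[top] > best_fit:
            best, best_fit = pop[top].copy(), float(fit[top])

        # Selection: binary tournament between random pairs.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = np.where((fit[a] >= fit[b])[:, None], pop[a], pop[b])

        # Crossover: arithmetic blend of consecutive parents.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                w = rng.random()
                children[i] = w * parents[i] + (1 - w) * parents[i + 1]
                children[i + 1] = w * parents[i + 1] + (1 - w) * parents[i]

        # Mutation: redraw a gene uniformly with probability p_mut.
        mask = rng.random(children.shape) < p_mut
        children[mask] = rng.uniform(low, high, size=int(mask.sum()))

        children[0] = best                                # elitism
        pop = children
    return best, best_fit

# Toy fitness with a single maximum of 2 at (pi/2, pi).
angles, value = genetic_maximize(lambda t1, t2: np.sin(t1) + np.cos(t2 - np.pi))
```

Keeping the best individual in every generation makes the recorded fitness non-decreasing, which matches the shape of a convergence curve that rises and then levels off.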
As shown in Figure 9, after 26 iterations, the kernel space measure sensitivity factor tends to be stable and the maximum value is achieved. The optimized affine transformation angles are $\theta_1 = 1.4910$ and $\theta_2 = 3.8532$. The optimized fusion feature points are shown in Figure 10. In comparison with Figure 8, the optimized fusion features distinguish different degradation states better than the original features; in particular, the clustering effect of the training samples in the severe degradation state improves considerably. Moreover, the separation between different classes widens further. Thus, the optimization effect is obvious.
In order to further illustrate the performance of the proposed method, the sensitivity factors of the time-domain features, entropy features, LPP fusion features, and GA-LPP fusion features are calculated, respectively, and the result is shown in Figure 11. The kernel space measure sensitivity factor of the GA-LPP fusion features is the largest, which indicates that the fusion features have a strong ability to characterize different bearing degradation states after GA optimization.

Degradation State Recognition Based on GG Method.
According to the number of bearing degradation states, the number of clustering centers is determined as $c = 4$. In accordance with Table 1, 5 groups of data are chosen randomly as testing samples from each degradation state. The multidomain features of the selected 20 groups of data are optimized by GA-LPP at the same affine transformation angles. The fusion feature space distribution is shown in Figure 12, where the testing feature points are well distributed around the clustering centers and the spacing between testing samples is large enough. This method can effectively avoid identification misjudgment and improve the recognition accuracy.
The membership matrix $U$ is established based on grey correlation analysis. On this basis, bearing degradation state recognition is realized under the principle of maximum membership value. Table 3 is the membership matrix between the testing samples and each standard degradation state. By comparing the membership values of the same sample point with respect to the different degradation states, the recognition result is taken as the degradation state whose membership value is maximum. The LPP results before and after GA optimization are compared. Without GA optimization, the LPP fusion features misjudge the slight degradation state as the normal state and the severe degradation state as the failure state, and the accuracy of degradation state recognition is only 85%. In comparison, the GA-LPP fusion features have a better distinguishing ability. The identification results of all 20 groups are in complete agreement with the real degradation states, which verifies the performance of the proposed method.
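The principle of maximum membership value reduces to a row-wise argmax over the membership matrix. The sketch below uses a small made-up membership matrix purely for illustration; the values are not taken from Table 3, and the function names are hypothetical.

```python
import numpy as np

STATES = ("normal", "slight degradation", "severe degradation", "failure")

def recognize(membership):
    """Index of the state with maximum membership for each testing sample (row)."""
    return np.asarray(membership, dtype=float).argmax(axis=1)

def accuracy(predicted, actual):
    """Fraction of samples whose recognized state matches the real state."""
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))

# Illustrative values only: three samples, four candidate states.
example = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.15, 0.30, 0.50],
]
predicted = recognize(example)
names = [STATES[i] for i in predicted]
```

With 20 testing samples, three misjudged samples would correspond to the 85% accuracy reported for LPP without GA optimization.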

(7) Combined with the relative entropy theory, the related probabilities are defined as

Figure 9: The curve of objective function with iterations.

Figure 11: Clustering effects of different combinations of features.

Table 1: The experimental parameters.
6.1. Experimental Platform and Data Preprocessing. The bearing full-life data used in this paper comes from Hangzhou

Table 2: The division of degradation state of rolling bearing performance.