Fault Diagnosis for Reducer via Improved LMD and SVM-RFE-MRMR

Vibration signals are usually characterized by nonstationarity, nonlinearity, and high-frequency shocks, and redundant features degrade the performance of fault diagnosis methods. To address these problems, a novel fault diagnosis approach for rotating machinery is presented by combining improved local mean decomposition (LMD) with support vector machine-recursive feature elimination with minimum redundancy maximum relevance (SVM-RFE-MRMR). Firstly, an improved LMD method is developed to decompose vibration signals into a set of amplitude-modulated/frequency-modulated (AM-FM) product functions (PFs). Then, time and frequency domain features are extracted from the selected PFs, so that complicated faults can be identified efficiently. Because redundant features degrade fault diagnosis methods, a novel feature selection method combining SVM-RFE with MRMR is proposed to select salient features, improving the performance of the fault diagnosis approach. Experimental results on a reducer platform demonstrate that the proposed method is capable of revealing the relations between features and faults and providing insight into fault mechanisms.


Introduction
Rotating machines are at the heart of modern equipment such as aircraft engines, gas turbines, and reducers, and they are often a common source of machine faults. However, most vibration signals are nonstationary, nonlinear, and disturbed by heavy background noise, and machinery components are strongly coupled, making it challenging to detect and identify specific rotating machine faults accurately [1,2]. For accurate diagnosis of rotating machine faults, a number of techniques have been developed in recent years [3-5].
Generally, vibration signals from mechanical equipment are the most effective and common source of information for identifying the states and faults of rotating machines [3]. According to the domains to which the extracted features belong, machinery fault diagnosis methods can be categorized into time domain-based, frequency domain-based, and time-frequency domain-based methods [4]. Time domain features have intuitive physical meaning and low computational complexity, but the class, location, and severity of a fault cannot be identified through these features alone. Frequency domain-based methods, such as the Fourier transform and its variants, detect faults from the spectrum of the vibration signal. However, the Fourier transform cannot deal with nonstationary and nonlinear signals. Time-frequency domain methods decompose a signal into multiple-scale components, so the local characteristics of the signal can be revealed. Moreover, the components are approximately stationary and linear, and time or frequency features can then be extracted to detect machinery faults.
Due to poor working environments, fault features are usually immersed in heavy background noise. Especially for incipient faults, the fault features are too weak to be extracted [5]. Time-frequency analysis techniques, such as the Wigner-Ville distribution [6], the wavelet transform [7], and EMD [8], are effective in suppressing signal noise. However, the Wigner-Ville distribution and the wavelet transform degrade the resolution of signals in the time and frequency domains, and empirical mode decomposition (EMD) suffers from mode mixing and the end effect. Smith [9] proposed LMD in 2005 to compute the instantaneous frequency and the time-frequency energy distribution of electroencephalogram signals. The method suppresses the end effect, eliminates over- and under-envelopes, and reduces computational complexity. Compared with EMD, LMD obtains instantaneous amplitude and frequency without a Hilbert transform and without fitting upper and lower envelopes. Therefore, LMD not only overcomes the problems of EMD but also concentrates more useful information in a few decomposition components, which makes it suitable for analyzing nonlinear and nonstationary signals [10]. At present, LMD has been integrated with machine learning methods to enhance fault features and identify fault patterns [11-14]. Fault feature extraction from vibration signals is a critical step of fault diagnosis. However, in many real-world applications, high-dimensional data contain redundant features, reducing the generalization performance of the fault detection model. Therefore, removing redundant features from high-dimensional data has become an effective way to enhance the performance of fault diagnosis models [15,16]. To address the insufficient stability of variable predictive model class discrimination (VPMCD) on small samples or in multi-correlative feature spaces, Tang et al.
[15] proposed the ARSFS method, based on affinity propagation (AP) clustering, RReliefF, and sequential forward search, for feature selection. The experimental results showed that ARSFS effectively identified rolling bearing faults. Intelligent optimization methods have also been widely applied to feature selection and achieve better identification accuracy than most existing feature selection approaches [16]. Support vector machine-recursive feature elimination (SVM-RFE), based on a one-versus-one or one-versus-rest strategy, provides a feasible way to identify multiple machinery faults [17-19]. Tang et al. [18] proposed a two-stage SVM-RFE in which feature extraction and feature subset selection are conducted sequentially. However, SVM-RFE barely accounts for the degree to which features affect the classifier, and it ignores redundant features. In some applications, the classification accuracy on a subset of M features drawn from N (M<N) salient features may be higher than that on the full subset of N salient features.
Taking the above problem into account, the minimum redundancy maximum relevance (MRMR) algorithm is introduced to evaluate the feature subset selected by SVM-RFE, and the optimal feature subset is thus obtained. In MRMR, mutual information is used to measure the relevance and redundancy of features, and the feature subset is selected through two cost functions called information difference and information quotient [20]. The resulting algorithm is called support vector machine-recursive feature elimination-minimum redundancy maximum relevance (SVM-RFE-MRMR). As shown later, the new method effectively deals with the problems of traditional feature selection algorithms.
The remainder of this paper is organized as follows. Section 2 briefly introduces the improved LMD method and illustrates the elimination of the end effect. The SVM-RFE-MRMR algorithm is proposed and detailed in Section 3. Experiments on a reducer test system verify the effectiveness of the proposed method in Section 4. Finally, conclusions are drawn in Section 5.

Improved Local Mean Decomposition
LMD is an adaptive time-frequency analysis method similar to empirical mode decomposition (EMD). Both methods can decompose a complex signal into several single-component signals with physical meaning. Compared with EMD, LMD can diminish the end effect and eliminate over- and under-envelopes, with low computational complexity and less information loss [21].
LMD decomposes a complicated signal into a set of product functions (PFs). Each PF is the product of an envelope signal and a frequency-modulated signal. The time-frequency distribution of the original signal can be obtained by combining the instantaneous amplitudes and instantaneous frequencies of all PF components. The schematic of LMD is shown in Figure 1.
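The envelope/FM product structure of a PF can be illustrated with a short synthetic example; all signal parameters here are made up for illustration, and the instantaneous frequency is obtained from the phase derivative rather than a Hilbert transform:

```python
import numpy as np

# A PF is the product of an envelope (instantaneous amplitude) and a
# purely frequency-modulated signal with unit amplitude.
t = np.linspace(0.0, 1.0, 500, endpoint=False)       # 500 Hz sampling, 1 s
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * 2 * t)     # slow amplitude modulation
phase = 2 * np.pi * 30 * t + 2 * np.sin(2 * np.pi * 3 * t)  # FM phase
fm = np.cos(phase)                                   # |fm| <= 1 everywhere
pf = envelope * fm                                   # the product function

# Instantaneous frequency (Hz) follows directly from the phase derivative.
inst_freq = np.gradient(phase, t) / (2 * np.pi)
```

Combining the instantaneous amplitude `envelope` and `inst_freq` of every PF yields the time-frequency distribution of the original signal.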
Local extreme values are the premise for evaluating product functions. Due to the limited length of a signal, the endpoints of the signal may not be extreme points. False components then gradually "pollute" the whole signal sequence inward from the two endpoints, causing the envelope function to diverge and the decomposition results to degrade; this is the end effect. To deal with the problem, an improved LMD method is proposed: a support vector regression (SVR) model is first used to extend both ends of the original signal, and the extended signal is then decomposed into a set of PFs via LMD.
Given a dataset $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^{n}$, $y_i \in \mathbb{R}$, and $l$ is the number of samples, the SVR model is formulated as

$$f(x) = w^{T}\varphi(x) + b,$$

where $\varphi(\cdot)$ is a nonlinear mapping that maps the sample data into a high-dimensional feature space, $w$ is the weight vector, and $b$ is the offset.
Introducing the penalty parameter $C$ and the $\varepsilon$-insensitive loss function, the SVR model is obtained by solving the following convex optimization problem:

$$\min_{w,b,\xi,\xi^{*}} \; \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{l}\left(\xi_i + \xi_i^{*}\right)$$
$$\text{s.t. } \; y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0,$$

where $\xi_i$ and $\xi_i^{*}$ are slack variables. This optimization problem can be transformed into its dual by introducing the kernel trick:

$$\min_{\alpha,\alpha^{*}} \; \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}(\alpha_i + \alpha_i^{*}) - \sum_{i=1}^{l}y_i(\alpha_i - \alpha_i^{*})$$
$$\text{s.t. } \; \sum_{i=1}^{l}(\alpha_i - \alpha_i^{*}) = 0, \quad 0 \le \alpha_i, \alpha_i^{*} \le C,$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers. Finally, the SVR model is obtained as $f(x) = \sum_{i=1}^{l}(\alpha_i - \alpha_i^{*})K(x_i, x) + b$.
Here, radial basis function (RBF) is selected as the kernel function.

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right),$$

where the kernel parameter $\sigma$ is required to be prespecified. The penalty parameter $C$ and the kernel parameter $\sigma$ play a significant role in model generalization. The particle swarm optimization (PSO) algorithm is introduced to find the optimal parameters. The hybrid PSO-SVR method treats the parameters $C$ and $\sigma$ as particles whose velocities and positions are constantly updated.
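The SVR-based endpoint extension can be sketched with scikit-learn's `SVR` as a stand-in; the window length, extension length, toy signal, and all hyperparameter values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * 5 * t) + 0.05 * rng.standard_normal(200)

# Fit an RBF-kernel SVR on a window near the right end, using time as input.
n_win, n_ext = 60, 20
model = SVR(kernel="rbf", C=100.0, gamma=10.0, epsilon=0.01)
model.fit(t[-n_win:].reshape(-1, 1), x[-n_win:])

# Predict beyond the endpoint to extend the signal before running LMD;
# the left end would be handled symmetrically.
dt = t[1] - t[0]
t_ext = t[-1] + dt * np.arange(1, n_ext + 1)
x_ext = model.predict(t_ext.reshape(-1, 1))
extended = np.concatenate([x, x_ext])
```

After decomposition, the extended portions are discarded so only the original time span is analyzed.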
Let $v_i = (v_{i1}, v_{i2})^{T}$ be the velocity of the $i$th particle, and denote the individual and global best positions by $p_i = (p_{i1}, p_{i2})^{T}$ and $p_g = (p_{g1}, p_{g2})^{T}$, respectively. In each iteration, the velocity and position of each particle are updated according to

$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1 \left(p_{id} - x_{id}^{k}\right) + c_2 r_2 \left(p_{gd} - x_{id}^{k}\right),$$
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1},$$

where $\omega$ is the inertia factor, $d$ indexes the dimensions of the search space, $i = 1, 2, \ldots, m$, $k$ is the current iteration number, $v_{id}$ is the velocity of the $i$th particle, $c_1 = c_2 = 2$ are acceleration factors, and $r_1$ and $r_2$ are random numbers in $[0, 1]$.
The process of optimized PSO-SVR is illustrated in Figure 2.
The fitness function is the mean square error (MSE):

$$F_i = \frac{1}{N}\sum_{j=1}^{N}\left(f(x_j) - y_j\right)^{2},$$

where $F_i$ is the fitness of the $i$th particle, $f(x_j)$ is the output of the SVR with input $x_j$, $y_j$ is the expected output, and $N$ is the number of samples.
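The PSO update equations and fitness-driven search can be sketched as follows. The quadratic surrogate fitness surface, the parameter bounds, and the swarm settings are illustrative assumptions; in practice the fitness would train an SVR on the candidate $(C, \sigma)$ and return the validation MSE:

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(p):
    # Surrogate MSE surface with minimum at (C, sigma) = (80.0, 1.2);
    # a real run would fit an SVR with these parameters and return MSE.
    return (p[0] - 80.0) ** 2 + 100.0 * (p[1] - 1.2) ** 2

n_particles, n_iter, w, c1, c2 = 20, 100, 0.7, 2.0, 2.0
lo, hi = np.array([1.0, 0.01]), np.array([200.0, 10.0])   # (C, sigma) bounds
pos = lo + (hi - lo) * rng.random((n_particles, 2))
vel = np.zeros((n_particles, 2))
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, 2)), rng.random((n_particles, 2))
    # Velocity and position updates follow the standard PSO formulas.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    vals = np.array([fitness(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
```

After the loop, `gbest` holds the swarm's best $(C, \sigma)$ estimate.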
To demonstrate the efficiency of the improved LMD method, the end effect evaluation index $\theta$ is introduced:

$$\theta = \frac{\left| X_0 - \left( \sum_{p} X_p + X_u \right) \right|}{X_0},$$

where $X_0$ is the effective value of the original signal, $X_p$ is the effective value of the $p$th PF, and $X_u$ is the effective value of the residual component, with the effective value defined as

$$X = \sqrt{\frac{1}{N}\sum_{t=1}^{N} x^{2}(t)},$$

where $x(t)$ is the signal to be evaluated. The larger $\theta$ ($\theta > 0$) is, the greater the influence of the end effect. Based on the above specifications, a synthesis signal is generated; the sampling frequency is 500 Hz, the time interval is [0.056, 0.654], and the signal contains 300 points in total. The PSO parameter search over generations is shown in Figure 3. The optimized penalty parameter is $C = 82.6$, and the kernel parameter is $\sigma = 1.181$.
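A minimal sketch of the effective value and the end effect index; the helper names and the toy decomposition are assumptions, and the index uses one common reading of this definition (original RMS compared against the summed RMS of all components):

```python
import numpy as np

def rms(x):
    # Effective (root-mean-square) value of a signal.
    return np.sqrt(np.mean(np.square(x)))

def end_effect_index(signal, pfs, residual):
    # theta compares the effective value of the original signal with the
    # summed effective values of the decomposition components; a larger
    # theta indicates a stronger end effect.
    total = sum(rms(pf) for pf in pfs) + rms(residual)
    return abs(rms(signal) - total) / rms(signal)

# Toy "decomposition": two synthetic PFs plus a trend residual.
t = np.linspace(0.0, 1.0, 500)
pf1 = np.sin(2 * np.pi * 10 * t)
pf2 = 0.5 * np.sin(2 * np.pi * 3 * t)
res = 0.1 * t
theta = end_effect_index(pf1 + pf2 + res, [pf1, pf2], res)
```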
The extremum mirror extension method and the SVR-based extension method are both applied to the simulated signal. The LMD results of the two methods are shown in Figures 4 and 5, with corresponding end effect evaluation indexes $\theta = 0.0348$ and $\theta = 0.0144$, respectively. As seen in Figure 4, PF2 is visibly distorted at its left and right endpoints, so the mirror extension method cannot characterize the trend of the original signal. The comparison indicates that the SVR-based extension approach outperforms the mirror extension method.

Support Vector Machine-Recursive Feature Elimination (SVM-RFE)
SVM-RFE was first introduced to rank genes from gene expression data for cancer classification [22]. SVM-RFE performs backward feature elimination. An SVM is first trained on all features, the features are sorted by their ranking scores, and the feature with the smallest ranking score is removed from the candidate feature subset. The SVM is then iteratively retrained on the new candidate subset and the remaining features are re-sorted, until only one feature is left.
More specifically, given a training dataset $\{(x_i, y_i)\}_{i=1}^{l}$, where $x_i \in \mathbb{R}^{n}$ is a training sample, $y_i \in \{-1, +1\}$ is its class label, $l$ is the number of training samples, and $n$ is the feature dimension, the decision function of the SVM is $f(x) = w^{T}x + b$, where $w = [w_1, w_2, \ldots, w_n]^{T}$ is the weight vector and $b$ is a scalar. By introducing the kernel trick, the dual optimization problem of the SVM can be written as

$$\max_{\alpha} \; \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
$$\text{s.t. } \; \sum_{i=1}^{l}\alpha_i y_i = 0, \quad 0 \le \alpha_i \le C,$$

where $C$ is a trade-off between training accuracy and model complexity.
In the SVM, the importance of features is measured by their weight coefficients:

$$w = \sum_{i=1}^{l}\alpha_i y_i x_i,$$

where $\alpha_i$ is obtained by solving the optimization problem in (12).
In SVM-RFE, the ranking score of the $i$th feature is defined as

$$c_i = w_i^{2}.$$

The detailed SVM-RFE algorithm is shown in Algorithm 1.
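The backward elimination loop with the $c_i = w_i^{2}$ ranking score can be sketched as follows, using scikit-learn's linear-kernel `SVC` as a stand-in; the toy dataset is an assumption for illustration:

```python
import numpy as np
from sklearn.svm import SVC

def svm_rfe(X, y, C=1.0):
    # Recursive feature elimination with a linear SVM: repeatedly train,
    # score each remaining feature by w_i**2, and drop the weakest one.
    remaining = list(range(X.shape[1]))
    eliminated = []  # features in elimination order (weakest first)
    while len(remaining) > 1:
        clf = SVC(kernel="linear", C=C).fit(X[:, remaining], y)
        scores = clf.coef_[0] ** 2          # ranking score c_i = w_i^2
        weakest = int(np.argmin(scores))
        eliminated.append(remaining.pop(weakest))
    eliminated.append(remaining[0])
    return eliminated[::-1]                 # strongest feature first

# Toy data: feature 0 is informative, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.column_stack([y + 0.1 * rng.standard_normal(100),
                     rng.standard_normal(100)])
order = svm_rfe(X, y)
```

The returned ranking lists features from most to least important.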
From Algorithm 1, the ranking criterion measures the correlation between features and decisions through the feature weights. Although SVM-RFE can eliminate unimportant features one by one, whether the retained features are redundant with each other is a question that requires attention. To address this issue, MRMR is introduced in this paper.

Minimum Redundancy Maximum Relevance (MRMR)
Method. Ding and Peng first introduced a criterion, called minimum redundancy maximum relevance (MRMR), that measures the relevance and redundancy of features using mutual information [23]. MRMR captures the maximal sample information and the minimal relevance among features by defining a maximum relevance criterion and a minimum redundancy criterion, respectively, and yields a ranking score for each feature. Specifically, given two feature vectors $x$ and $y$ with probability densities $p(x)$ and $p(y)$, let $p(x, y)$ be their joint probability density. Their mutual information (MI), which measures the redundancy among features, is defined as

$$I(x, y) = \iint p(x, y)\log\frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy.$$
The minimum redundancy and maximum relevance criteria are calculated as

$$\max D, \quad D = \frac{1}{|S|}\sum_{x_i \in S} I(x_i, c),$$
$$\min R, \quad R = \frac{1}{|S|^{2}}\sum_{x_i, x_j \in S} I(x_i, x_j),$$

where $S$ and $|S|$ represent the feature set and the number of features in $S$, respectively, $c$ is the class label, $I(x_i, c)$ is the MI between feature $x_i$ and the class label, $I(x_i, x_j)$ is the MI between features $x_i$ and $x_j$, $D$ is the mean MI between features and the class label, and $R$ is the mean MI among the features. MRMR selects features with minimum redundancy and maximum relevance through the two combined criteria $\max(D - R)$ and $\max(D / R)$.

Algorithm 2 (SVM-RFE-MRMR, fragment):
(1) while $S \neq []$ do
(2)   Train an SVM with the features in $S$
(3)   for each feature $x_i$ in $S$ do
(4)     Compute $c_i$ using equation (14)
(5)     Compute $m_i$ using equation (19)
(6)   end for
(7)   Find $c_i^{*}$ and $m_i^{*}$ using equations (20) and (21), respectively
(8)   for each feature $x_i$ in $S$ do
(9)     Compute $R(i) = \bar{c}_i + \bar{m}_i$
(10)  ...
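A minimal sketch of the MRMR quantities, using a simple histogram estimate of mutual information; the binning, the toy data, and the use of the information-difference criterion $D - R$ as the per-feature score are assumptions:

```python
import numpy as np

def mutual_info(x, y, bins=8):
    # Histogram estimate of mutual information I(x; y) in nats.
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def mrmr_scores(X, y):
    # Score each feature by relevance to the label minus its mean
    # redundancy with the other features (information difference, D - R).
    n = X.shape[1]
    rel = np.array([mutual_info(X[:, i], y) for i in range(n)])
    red = np.array([np.mean([mutual_info(X[:, i], X[:, j])
                             for j in range(n) if j != i]) for i in range(n)])
    return rel - red

# Toy data: an informative feature, a near-duplicate of it, and noise.
rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 100)
informative = y + 0.2 * rng.standard_normal(200)
duplicate = informative + 0.05 * rng.standard_normal(200)
noise = rng.standard_normal(200)
X = np.column_stack([informative, duplicate, noise])
scores = mrmr_scores(X, y)
```

The redundant duplicate is penalized through the $R$ term even though its relevance to the label is high.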

Support Vector Machine-Recursive Feature Elimination-Minimum Redundancy Maximum Relevance (SVM-RFE-MRMR). According to the above introduction, SVM-RFE takes the correlation between features and decisions into account but ignores the redundancy among features. Therefore, the SVM-RFE ranking score and the MRMR score are combined into a single normalized ranking score

$$R(i) = \bar{c}_i + \bar{m}_i = \frac{c_i}{c_i^{*}} + \frac{m_i}{m_i^{*}},$$

where $c_i^{*} = \max_i c_i$ and $m_i^{*} = \max_i m_i$. The overall SVM-RFE-MRMR algorithm is described in Algorithm 2.
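The normalized combination of the two ranking scores can be sketched as follows, assuming both score vectors are nonnegative (as $c_i = w_i^2$ is by construction); the numeric values are illustrative:

```python
import numpy as np

def combined_ranking(svm_scores, mrmr_scores):
    # R(i) = c_i / max(c) + m_i / max(m): each criterion is normalized by
    # its maximum so the two scores contribute on a comparable scale.
    c = np.asarray(svm_scores, dtype=float)
    m = np.asarray(mrmr_scores, dtype=float)
    return c / c.max() + m / m.max()

# Three features: SVM-RFE favors feature 0, MRMR favors feature 1.
r = combined_ranking([4.0, 1.0, 2.0], [0.5, 1.0, 0.25])
```

Features are then eliminated in order of increasing combined score $R(i)$.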

Fault Identification and Analysis for Reducer

Experimental Device. Fault diagnosis for a reducer is conducted on vibration data, while it is difficult to acquire signals of multiple faults in practice. Therefore, a reducer simulation platform is constructed to acquire fault signals. The mechanical system of the whole device is shown in Figure 6(a), and the numbering of each component is shown in Figure 6(b). The system mainly consists of a motor, a reducer, a magnetic powder brake, a vibration sensor, etc. The vibration sensor is installed at the 4# bearing. The magnetic powder brake simulates the mechanical load by adjusting its voltage. In the experiment, a tooth of gear b, a part of the inner race of the 4# bearing, and a part of the outer race of the 4# bearing were cut off by wire cutting. Figure 7 shows the broken tooth fault, the inner race fault, and the outer race fault, respectively. The sampling frequency was 4 kHz, and the motor rotating speed was 1420 r/min. Original waveforms of the various states are shown in Figure 8. The normalization equation can be expressed as

$$h = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$

where $x, h \in \mathbb{R}^{N}$, $x_{\min}$ is the minimum value of $x$, and $x_{\max}$ is the maximum value of $x$.
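A minimal sketch of this column-wise min-max normalization applied to a feature matrix (the sample matrix is illustrative):

```python
import numpy as np

def minmax_columns(F):
    # Scale each feature column to [0, 1]: h = (x - x_min) / (x_max - x_min).
    fmin, fmax = F.min(axis=0), F.max(axis=0)
    return (F - fmin) / (fmax - fmin)

# Toy 3x2 feature matrix (rows: samples, columns: features).
F = np.array([[2.0, 10.0],
              [4.0, 30.0],
              [6.0, 20.0]])
H = minmax_columns(F)
```

In the experiment, the same operation is applied to each of the 8 columns of the 120-row feature matrix.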

Multifault Classification Model
Training. There are 4 operating states in this experiment, so fault identification is a multiclass classification problem. The multifault classification model is trained as follows. Firstly, the different operating states are labeled as in Table 1: the normal state is labeled 1, the broken tooth fault 2, the inner race fault 3, and the outer race fault 4.
Secondly, 6 binary classifiers are constructed: (1v2), (1v3), (1v4), (2v3), (2v4), and (3v4). The corresponding vectors are selected as the training set when training each binary classifier, yielding 6 trained models, each of which is then evaluated on its corresponding test set. The entire feature set contains 120 samples, and each sample contains 8 features. Rows 1-30 represent the normal state (label 1), rows 31-60 the broken tooth state (label 2), rows 61-90 the inner race fault (label 3), and rows 91-120 the outer race fault (label 4). Two-thirds of the samples of each label are randomly selected as training samples, and the rest are used as test samples.
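The one-versus-one scheme with six pairwise classifiers can be sketched as follows; the synthetic data, kernel choice, and majority-vote prediction rule are illustrative assumptions:

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

# Synthetic stand-in for the real feature matrix: 120 samples, 8 features,
# 30 samples per state, class means shifted by the label value.
rng = np.random.default_rng(0)
labels = np.repeat([1, 2, 3, 4], 30)
X = rng.standard_normal((120, 8)) + labels[:, None]

# Build the six pairwise classifiers (1v2), (1v3), ..., (3v4).
pairs = list(combinations([1, 2, 3, 4], 2))
models = {}
for a, b in pairs:
    mask = np.isin(labels, [a, b])
    models[(a, b)] = SVC(kernel="rbf").fit(X[mask], labels[mask])

def predict_voting(x):
    # Each binary classifier votes for one of its two classes;
    # the class with the most votes wins.
    votes = [int(m.predict(x.reshape(1, -1))[0]) for m in models.values()]
    return np.bincount(votes).argmax()
```

With 4 classes, $\binom{4}{2} = 6$ binary problems are trained, matching the six pairs listed above.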
Finally, the test samples are used to assess the accuracy of the classification model.
Optimal parameters need to be found during classification training. Common parameter optimization methods include the grid method, the genetic algorithm (GA) method, and the PSO method. The parameter searches of the three methods and the corresponding test results are shown in Figures 14, 15, and 16, respectively, and the classification accuracy of each method is listed in Table 2. As the table shows, the model parameters obtained by the three methods are very close, and their classification accuracies are identical. The advantage of the grid method is that it can search multiple parameters at the same time: independent parameter pairs are easy to search in parallel, and the method takes little time when few parameters are optimized. Moreover, it finds the global optimum when the optimization interval is large enough and the step size is small enough [24]. The GA method has good global search performance, but its search speed is slow and its solution efficiency is low. The PSO method has strong local search performance and fast convergence, but it is prone to premature convergence and can fall into local optima [25,26]. Therefore, the SVM model parameters are selected via the grid method in this paper.

Searching the Optimal Feature Subset. SVM-RFE-MRMR produces nested feature subsets $F_1 \subset F_2 \subset \cdots \subset F_n$. The prediction accuracy of the SVM is used to evaluate these subsets so as to obtain the optimal feature subset. Two criteria, the leave-one-out cross-validation error rate on the training set (Loo Error Rate) and the error rate on an independent test set (Test Error Rate), are used to determine the optimal feature subset.
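The grid search over the two SVM parameters can be sketched with scikit-learn's `GridSearchCV` as a stand-in; the grid values, fold count, and synthetic data are assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 120x8 feature matrix with 4 labeled states.
rng = np.random.default_rng(0)
y = np.repeat([1, 2, 3, 4], 30)
X = rng.standard_normal((120, 8)) + y[:, None]

# Exhaustive grid over (C, gamma); every pair is evaluated independently,
# so the search parallelizes trivially.
grid = {"C": [2.0 ** k for k in range(-5, 6)],
        "gamma": [2.0 ** k for k in range(-5, 6)]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=3)
search.fit(X, y)
best_C = search.best_params_["C"]
best_gamma = search.best_params_["gamma"]
```

The grid-optimal pair is whichever combination maximizes cross-validated accuracy, which is the behavior described for the grid method above.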
Following the steps above, the training set designed in Section 4.3 is used for SVC training, yielding optimized parameters $C_1 = 0.7579$ and $g_1 = 1.3195$. These parameters are then used to rank the influence of each feature; the result is shown in Table 3. The nested feature subsets are listed in Table 4, where $F_1 \subset F_2 \subset \cdots \subset F_8$. Next, the classification accuracy is calculated from the optimized parameters and the nested feature subsets obtained in the first step, and the Loo Error Rate criterion is used to determine the optimal feature subset; the result is shown in Table 5. Finally, the selected nested feature subset is used to train the SVC, giving optimized parameters $C_2 = 9.7656 \times 10^{-4}$ and $g_2 = 4$. These two parameters and the test set are used to calculate the classification accuracy and the Test Error Rate as before, which evaluate the performance of the predictive model. As Table 6 shows, the Test Error Rate of feature subset $F_8$ is the lowest.
Comparing and analyzing Tables 5 and 6 yields the following observations: (a) different combinations of features can achieve the same effect; (b) some fault features carry similar information; (c) the optimal feature subset contains the least number of features.

Conclusions
This paper presents a fault diagnosis method that combines improved LMD with support vector machine-recursive feature elimination-minimum redundancy maximum relevance (SVM-RFE-MRMR) to handle the feature redundancy of reducer vibration signals. The approach reduces the dimensionality of the original feature set by discarding features that contribute little to classification or are insensitive to faults, preserving the best feature subset. The experimental results verify that the proposed approach is reliable and achieves high classification accuracy, providing a useful reference for condition monitoring of rotating machinery. In addition, the method can also be applied to multichannel fault signal processing and can cope with high-dimensional data and small samples.

Figure 4: LMD results of the extremal mirror extension method.
Figure 5: LMD results of the SVR-based extension method.

Table 1: Feature parameters list.

Figure 15: Classification training and test results based on GA.
Figure 16: Classification training and test results based on PSO.

Figure 17: Diagram of searching the optimal feature subset.

Feature Generation. The fault diagnosis process for the reducer under different states is shown in Figure 9. Only part of the data is listed due to limited space. The feature parameters extracted from the vibration signals under the 4 states are arranged in sequence into a matrix of 120 rows and 8 columns, and each column of the matrix is normalized to [0, 1].

Table 2: Optimized parameters $C$ and $g$ and the corresponding classification accuracy.

Table 6: Test Error Rate.