Bearing Fault Diagnosis with Kernel Sparse Representation Classification Based on Adaptive Local Iterative Filtering-Enhanced Multiscale Entropy Features

To improve bearing diagnosis accuracy when multiple fault types must be identified from small samples, this paper puts forward a new approach that combines adaptive local iterative filtering (ALIF), multiscale entropy features, and kernel sparse representation classification (KSRC). ALIF is used to adaptively decompose the nonlinear, nonstationary vibration signals into a sum of intrinsic mode functions (IMFs). Multiscale versions of sample entropy, fuzzy entropy, and permutation entropy are computed from the first three IMFs, giving a total of one hundred and eighty features. After normalization, the features are used to train and test the KSRC classifier. Finally, the proposed approach is evaluated on two experimental data sets: one concerns different types of bearing faults in a centrifugal pump, and the other, from Case Western Reserve University (CWRU), covers 12 bearing fault states. The experimental results show that the proposed approach is effective for bearing fault diagnosis and that high accuracy can be obtained with high dimensional features from small samples.


Introduction
Rolling bearings are widely used in rotating machinery, and their working condition bears directly on machine maintenance and worker safety. Since bearing faults are always accompanied by vibration, which is easy to measure, many works focus on fault diagnosis based on vibration analysis. Generally, the procedure for bearing fault diagnosis is composed of four steps: (1) preprocessing of the nonlinear and nonstationary vibration signals by adaptive mode decomposition, (2) extraction of features that are relatively insensitive to the data length and immune to noise, (3) dimension reduction of the feature matrix based on principal component analysis (PCA), Laplacian scores (LS), etc., and (4) fault pattern identification with a classifier. For example, Zhao et al. [1] computed the multiscale permutation entropy of subbands obtained by wavelet packet decomposition (WPD) and employed a hidden Markov model (HMM) to identify the fault pattern of the rolling bearing. Yang et al. [2] extracted the energy entropy of the intrinsic mode functions (IMFs) obtained by empirical mode decomposition (EMD) [3] as features and employed an artificial neural network (ANN) to identify the fault types. Li et al. [4] utilized local mean decomposition (LMD) [5] for preprocessing, improved multiscale fuzzy entropy as features, Laplacian scores for feature selection, and an improved support vector machine based binary tree for bearing fault diagnosis. Yang et al. [6] combined variational mode decomposition (VMD) [7] and local linear embedding (LLE) with a support vector machine (SVM) to diagnose mechanical faults of the rotor-bearing-casing system. These methods have achieved good results in bearing fault diagnosis to some extent; however, some problems still exist and need to be investigated further.
The first problem concerns adaptive mode decomposition [8]. WPD needs a prespecified basis function and cannot decompose signals adaptively. The representative approach for adaptive decomposition is EMD, which can decompose a complicated signal into a sum of IMFs, yet it suffers from the end effect and mode mixing. Some modifications have been proposed following EMD, such as LMD and VMD. Recently, a new approach called adaptive local iterative filtering (ALIF) was proposed by Cicone in 2016 [9,10]. It follows the structure of EMD and has the advantages of an adaptive filter adjusted with the Fokker-Planck (FP) equation and an adaptive filter length. The authors of [11] have successfully applied ALIF and approximate entropy to wind turbine bearings. Consequently, ALIF is expected to be more suitable for processing the faulty vibration signals of rolling bearings.
The next problem is how to extract efficient features for classification with high accuracy [12]. The traditional features come from the time domain, the frequency domain, and the time-frequency domain. To deal with the nonlinear dynamic characteristics of bearing fault signals, entropy has been introduced [13], such as approximate entropy (ApEn) [11,14], sample entropy (SaEn) [15,16], fuzzy entropy (FuEn) [17,18], and permutation entropy (PE) [19,20]. However, they all estimate the complexity of signals at a single scale, which may not be conducive to the extraction of signal features. To overcome this drawback, multiscale entropy (MSE) was proposed by Costa et al. to measure the complexity of signals over a range of scales [21,22]. Based on MSE, multiscale sample entropy (MSaE), multiscale fuzzy entropy (MFE), and multiscale permutation entropy (MPE) have been proposed, and they have been shown to perform better than SaEn, FuEn, and PE in the diagnosis of rolling bearing faults [23][24][25]. However, it is not ideal to use the entropy features directly for classification because of the influence of noise and interference harmonics in the vibration signals. Hence, ALIF is utilized to decompose the original signals into a sum of IMFs, which reduces the interference of noise and harmonics and highlights the fault information. The three multiscale entropy features are then computed from the IMFs containing the most fault information. Considering the advantages of the three multiscale entropy features in feature extraction and the characteristics of the subsequent classifier, all of them are employed in this paper.
Before classification, feature selection such as LS or dimensionality reduction such as PCA and LLE is usually performed. Afterwards, classifiers such as HMM [1], ANN [2], SVM [6], and the multiclass relevance vector machine [20] are used to identify the fault type. Though their theories are well established, their inherent limitations have confined them to some extent. For example, ANN and VPMCD [26] need large training samples to obtain high classification accuracy; also, the SVM is a binary classifier, which requires a classification strategy such as one against one, one versus all, or a binary tree. Moreover, such schemes are two-stage procedures that combine feature reduction with a separate classifier. In addition, the training samples in practical applications are small but carry multiple features. Hence, a sparse representation classifier [27] is introduced to achieve the two stages at one time and to realize feature selection through regularization. The classifier was first proposed to recognize frontal views of human faces under varying expression and illumination. Its advantage is that high classification accuracy depends on a sufficiently large number of features rather than on the number of samples. To improve the classification accuracy with high dimensional features, the kernel approach was introduced, and KSRC was proposed and applied in face recognition [28,29]. Hence, KSRC is employed in this paper to identify bearing fault states in combination with ALIF-enhanced multiple entropy features.
The organization of this paper is as follows. The theoretical backgrounds, including ALIF, multiscale entropy features, and KSRC, are briefly introduced in Sections 2, 3, and 4. The proposed method based on these theoretical backgrounds is presented in Section 5. Experimental datasets are employed to verify the proposed method in Section 6, and the conclusions are drawn in Section 7.

Adaptive Local Iterative Filtering
Given a nonstationary, nonlinear signal $x(t)$, it can be reconstructed as the sum of several IMFs and the residue:
$$x(t) = \sum_{i=1}^{K} c_i(t) + r(t), \quad (1)$$
where $c_i(t)$ represents the $i$-th IMF, $K$ is the number of IMFs, and $r(t)$ is the residue. An IMF should satisfy two conditions [3]: (1) the number of extrema and the number of zero crossings in the whole data set must be equal or differ by one at most; and (2) at any point, the mean value of the upper envelope connecting all the local maxima and the lower envelope connecting all the local minima is zero. Generally, the decomposition process consists of two loops, the inner loop and the outer loop, where the former is used for IMF extraction, while the latter is used to determine the number of IMFs and the residual. In the EMD algorithm, cubic spline interpolation is employed for the upper and lower envelope functions, which is susceptible to singularities. Consequently, iterative filtering computes the moving average $\Theta(x(t))$ of the signal $x(t)$ by convolution in lieu of the envelope functions:
$$\Theta(x(t)) = x(t) * w(t) = \int_{-l}^{l} x(t+\tau)\, w(\tau)\, d\tau. \quad (2)$$
In (2), $*$ represents the convolution operator, $w(t)$, constrained by $\int_{-l}^{l} w(t)\, dt = 1$, is a low-pass filter, and $l$ is the mask length. Afterwards, the sifting process generates the first IMF $c_1(t)$ according to (3), where $n$ is the iteration number, $x_1(t) = x(t)$, and $x_n(t) = \Theta_{1,n-1}(x_{n-1}(t))$. Since $n$ cannot reach infinity in (3), (4) is adopted as the stopping criterion for the iterations, where $\Theta_{i,n}$ represents the moving average of the $n$-th iteration of the $i$-th IMF and $\delta$ is a prespecified parameter. If $\delta$ is large, only rough decomposition results may be obtained. However, if $\delta$ is too small, the computation will be expensive and noise will be introduced. After trials, $\delta$ is set to 0.001. In the next step, the second IMF is obtained by repeating the previous iterative process on the residual signal $r(t) = x(t) - c_1(t)$. In the same manner, all the subsequent IMFs are produced by (5). Finally, if $r(t)$ does not satisfy the two conditions of an IMF, it is treated as the residual and the iteration stops.
The ALIF method improves on the iterative filtering technique by adaptively adjusting the filter with the FP equation and adaptively computing the filter length. Consequently, (2) can be rewritten as
$$\Theta(x(t)) = \int_{-l(t)}^{l(t)} x(t+\tau)\, w(t,\tau)\, d\tau, \quad (6)$$
which is subject to
$$\int_{-l(t)}^{l(t)} w(t,\tau)\, d\tau = 1, \quad (7)$$
where $w(t,\tau)$, $\tau \in [-l(t), l(t)]$, is the filter at time $t$, and $l(t)$ is the mask length varying with $t$.
The decomposition results of ALIF and EMD are shown in Figures 2 and 3. In Figure 2, IMF1, IMF2, and IMF3 correspond, in order, to the three components of the analyzed signal, and the decomposed components from IMF4 to IMF8 are residuals. In Figure 3, however, the IMF1 obtained by EMD, which corresponds to the first component, is distorted, and IMF2 and IMF4 correspond to the second and third components because IMF3 is a false component. The absolute error is employed to compare the decomposition results of ALIF with those of EMD. As the comparison in Figure 4 shows, ALIF outperforms EMD.
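The inner sifting loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain iterative filtering with a fixed triangular mask instead of the FP-derived adaptive filter of ALIF, the relative-change stopping rule is a common stand-in for (4), and the function names (`triangular_filter`, `moving_average`, `extract_imf`), the mask half-length, and the toy signal are all hypothetical.

```python
import math

def triangular_filter(half_len):
    """Triangular low-pass weights on [-half_len, half_len] that sum to one."""
    raw = [half_len + 1 - abs(k) for k in range(-half_len, half_len + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def moving_average(x, weights):
    """Moving average of x by convolution with `weights` (ends extended by reflection)."""
    n, half_len = len(x), len(weights) // 2
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in zip(range(-half_len, half_len + 1), weights):
            j = t + k
            if j < 0:
                j = -j                   # reflect about the first sample
            elif j >= n:
                j = 2 * (n - 1) - j      # reflect about the last sample
            acc += w * x[j]
        out.append(acc)
    return out

def extract_imf(x, half_len, delta=0.001, max_iter=100):
    """Sift x: repeatedly subtract the moving average until the update is small."""
    weights = triangular_filter(half_len)
    current = list(x)
    for _ in range(max_iter):
        avg = moving_average(current, weights)
        energy = sum(c * c for c in current)
        change = sum(a * a for a in avg)  # squared norm of the update x_{n+1} - x_n
        current = [c - a for c, a in zip(current, avg)]
        # Assumed stopping rule: relative squared change below delta.
        if energy == 0.0 or change / energy < delta:
            break
    return current

# Toy signal (hypothetical): a fast oscillation (period 10) on a slow one (period 200).
N = 1000
fast = [math.sin(2 * math.pi * t / 10) for t in range(N)]
slow = [math.sin(2 * math.pi * t / 200) for t in range(N)]
signal = [f + s for f, s in zip(fast, slow)]

imf1 = extract_imf(signal, half_len=9)               # first IMF: the fast component
residual = [s - c for s, c in zip(signal, imf1)]     # what the outer loop would sift next
```

A triangular mask is chosen here because its frequency response is nonnegative, which keeps the repeated subtraction from amplifying any component; the outer loop would apply `extract_imf` again to `residual` to obtain further IMFs.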

Multiscale Entropy Features
Following the preprocessing of the signals with ALIF, entropy features are extracted from the IMFs in preparation for fault diagnosis. A fault type is better represented with more features, but computational efficiency must also be considered; hence only sample entropy, fuzzy entropy, and permutation entropy are introduced.

Sample Entropy.
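Consider a time series $\{x(i), i = 1, 2, \ldots, N\}$; the $m$-dimensional vector at time $i$ can be constructed as
$$X_i^{(m)} = \{x(i), x(i+\tau), \ldots, x(i+(m-1)\tau)\},$$
where $\tau$ is the time delay. The distance between $X_i^{(m)}$ and $X_j^{(m)}$ is defined as
$$d\left[X_i^{(m)}, X_j^{(m)}\right] = \max_{k=0,\ldots,m-1} \left| x(i+k\tau) - x(j+k\tau) \right|. \quad (10)$$
Set the threshold $r$; the ratio of distances less than $r$ is denoted by $B_i^{(m)}(r)$, and its mean over $i$ by $B^{(m)}(r)$. Repeat the above steps for $B_i^{(m+1)}(r)$ and obtain the mean $B^{(m+1)}(r)$; then the sample entropy is
$$\mathrm{SaEn}(m, r) = \lim_{N \to \infty} \left\{ -\ln \frac{B^{(m+1)}(r)}{B^{(m)}(r)} \right\}. \quad (13)$$
Since $N$ is a finite value, (13) can be rewritten as
$$\mathrm{SaEn}(m, r, N) = -\ln \frac{B^{(m+1)}(r)}{B^{(m)}(r)}. \quad (14)$$

Fuzzy Entropy.
The distance in (10) is used to measure the fuzzy similarity between vectors. Define the function $\varphi^{(m)}(n, r)$ at dimension $m$ and the function $\varphi^{(m+1)}(n, r)$ at dimension $m + 1$; then the fuzzy entropy is
$$\mathrm{FuEn}(m, n, r) = \lim_{N \to \infty} \left[ \ln \varphi^{(m)}(n, r) - \ln \varphi^{(m+1)}(n, r) \right]. \quad (18)$$
When $N$ is a finite number, (18) is rewritten as
$$\mathrm{FuEn}(m, n, r, N) = \ln \varphi^{(m)}(n, r) - \ln \varphi^{(m+1)}(n, r). \quad (19)$$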
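The sample entropy of (14) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the time delay is fixed to one, the tolerance `r` is an absolute value (the settings of Table 1 are not reproduced), the same $N - m$ templates are used for both dimensions, and the function name `sample_entropy` is hypothetical. Fuzzy entropy follows the same pattern with the hard threshold replaced by a fuzzy membership function.

```python
import math

def sample_entropy(x, m=2, r=0.2):
    """SaEn(m, r, N) = -ln(B^(m+1)(r) / B^(m)(r)) with the maximum-norm distance.

    Assumptions: time delay of one and an absolute tolerance r.
    """
    n = len(x)

    def count_matches(dim):
        # Use the same N - m templates for both dimensions so the counts are comparable.
        templates = [x[i:i + dim] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist < r:
                    count += 1
        return count

    b = count_matches(m)        # matching pairs of length m
    a = count_matches(m + 1)    # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")     # undefined when no matches are found
    return -math.log(a / b)
```

A perfectly periodic series gives a value of zero, because every match of length $m$ is also a match of length $m + 1$; irregular series give larger values.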

Permutation Entropy.
Let $\pi = (k_0, k_1, \ldots, k_{m-1})$; then $X_i^{(m)}$ has the permutation $\pi$ if it satisfies
$$x(i + k_0\tau) \le x(i + k_1\tau) \le \cdots \le x(i + k_{m-1}\tau),$$
where $0 \le k_j \le m - 1$, and $k_j \ne k_l$ for $j \ne l$. For each permutation $\pi_j$, $1 \le j \le m!$, the relative frequency can be defined as
$$p(\pi_j) = \frac{\#\left\{ X_i^{(m)} \text{ has type } \pi_j \right\}}{N - (m-1)\tau},$$
where $\#$ represents the number of $X_i^{(m)}$ belonging to the type $\pi_j$. Then the PE of dimension $m$ can be written as
$$\mathrm{PE}(m) = -\sum_{j=1}^{m!} p(\pi_j) \ln p(\pi_j). \quad (22)$$

Coarse Grained Process.
The multiple scales are realized through the coarse-graining process for better feature extraction. The length of each coarse-grained time series equals the length of the original time series divided by the corresponding scale factor, as illustrated in Figure 5. Hence, the coarse-grained time series $y^{(s)}$ at a scale factor of $s$ can be constructed according to
$$y_j^{(s)} = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x(i), \quad 1 \le j \le N/s.$$
Then the SaEn, FuEn, and PE of each coarse-grained time series are calculated based on (14), (19), and (22), respectively, and plotted as functions of the scale factor $s$:
$$\mathrm{MSaE}(x, s, m, \tau, r) = \mathrm{SaEn}(y^{(s)}, m, \tau, r),$$
$$\mathrm{MFE}(x, s, m, \tau, n, r) = \mathrm{FuEn}(y^{(s)}, m, \tau, n, r),$$
$$\mathrm{MPE}(x, s, m, \tau) = \mathrm{PE}(y^{(s)}, m, \tau).$$
In this paper, the prespecified parameters are set as listed in Table 1. In particular, SD represents the standard deviation (std.) of the original signals.
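The coarse-graining step and the permutation entropy combine into a multiscale feature as sketched below. The code is a minimal illustration, assuming the unnormalized Shannon form with the natural logarithm and non-overlapping averaging windows; the function names and default parameters are hypothetical and do not reproduce the settings of Table 1.

```python
import math
from collections import Counter

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n_windows = len(x) // scale
    return [sum(x[j * scale:(j + 1) * scale]) / scale for j in range(n_windows)]

def permutation_entropy(x, m=3, delay=1):
    """Shannon entropy (natural log) of the ordinal patterns of order m."""
    n_vectors = len(x) - (m - 1) * delay
    patterns = Counter()
    for i in range(n_vectors):
        window = [x[i + k * delay] for k in range(m)]
        # The ordinal pattern is the index order that sorts the window;
        # sorted() is stable, so ties are broken by position.
        pattern = tuple(sorted(range(m), key=lambda k: window[k]))
        patterns[pattern] += 1
    return -sum((c / n_vectors) * math.log(c / n_vectors) for c in patterns.values())

def multiscale_permutation_entropy(x, m=3, delay=1, max_scale=20):
    """PE of the coarse-grained series for scale factors 1..max_scale."""
    return [permutation_entropy(coarse_grain(x, s), m, delay)
            for s in range(1, max_scale + 1)]
```

For example, `coarse_grain([1, 2, 3, 4, 5, 6], 2)` yields `[1.5, 3.5, 5.5]`, and a monotonic series has a permutation entropy of zero because only one ordinal pattern occurs. The input must be long enough that the coarsest series still contains at least `m` points.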

Kernel Sparse Representation Classifier
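Consequently, the linear representation of $y$ in terms of all auxiliary training samples is expressed as
$$y = A\alpha_0, \quad (29)$$
where $\alpha_0 = [0, \ldots, 0, \alpha_{i,1}, \alpha_{i,2}, \ldots, \alpha_{i,n_i}, 0, \ldots, 0]^T \in \mathbb{R}^n$ is a coefficient vector whose entries are zero unless they belong to the $i$-th class.
The sparse solution to $y = A\alpha$ can be achieved by solving the following $\ell_1$-minimization problem:
$$\hat{\alpha}_1 = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad y = A\alpha. \quad (30)$$
When small noise is considered, a noise term $e \in \mathbb{R}^m$ with $\|e\|_2 < \varepsilon$ is introduced into (29), and the formula is modified as
$$y = A\alpha_0 + e. \quad (31)$$
The flexible $\ell_1$-minimization problem for a sparse solution $\alpha$ is
$$\hat{\alpha}_1 = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \|y - A\alpha\|_2 \le \varepsilon. \quad (32)$$
A new testing sample can be expressed as $\hat{y}_i = A\delta_i(\hat{\alpha}_1)$, where $\delta_i(\hat{\alpha}_1) \in \mathbb{R}^n$ is a vector whose entries associated with class $i$ are nonzero while the rest are zeros, and $\delta_i(\cdot): \mathbb{R}^n \to \mathbb{R}^n$ is a function that selects the coefficients related to the $i$-th class. Finally, the class of the new testing sample is identified from the residual between $y$ and $\hat{y}_i$:
$$\mathrm{identity}(y) = \arg\min_i r_i(y), \quad r_i(y) = \|y - A\delta_i(\hat{\alpha}_1)\|_2.$$
The algorithm for SRC is summarized as follows.
(2) Normalize the columns of $A$ to have unit $\ell_2$-norm.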
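The residual-based decision rule of SRC can be sketched in pure Python as follows. Several simplifications are assumed and should not be read as the authors' method: the classifier is the linear SRC rather than the kernel version, the constrained problem is replaced by its Lagrangian (lasso) form solved with the iterative soft-thresholding algorithm (ISTA), and the names `sparse_code` and `src_classify` as well as the regularization weight `lam` are hypothetical.

```python
import math

def unit_norm(v):
    """Scale v to unit l2-norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def combine(atoms, coeffs):
    """Return sum_j coeffs[j] * atoms[j], i.e. the product A @ alpha."""
    m = len(atoms[0])
    return [sum(c * atom[k] for c, atom in zip(coeffs, atoms)) for k in range(m)]

def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)

def sparse_code(atoms, y, lam=0.01, n_iter=500):
    """ISTA for min_a 0.5 * ||y - A a||_2^2 + lam * ||a||_1 (assumed lasso form)."""
    # Step 1/L, with the squared Frobenius norm of A as a safe Lipschitz bound L.
    step = 1.0 / sum(dot(a, a) for a in atoms)
    alpha = [0.0] * len(atoms)
    for _ in range(n_iter):
        residual = [yk - rk for yk, rk in zip(y, combine(atoms, alpha))]
        grad = [-dot(a, residual) for a in atoms]
        alpha = [soft_threshold(al - step * g, step * lam) for al, g in zip(alpha, grad)]
    return alpha

def src_classify(train_samples, train_labels, y, lam=0.01):
    """Assign y to the class whose coefficients reconstruct it with the smallest residual."""
    atoms = [unit_norm(v) for v in train_samples]   # normalize the columns of A
    y = unit_norm(y)
    alpha = sparse_code(atoms, y, lam)
    residuals = {}
    for label in set(train_labels):
        # delta_i: keep only the coefficients that belong to this class
        kept = [a if lab == label else 0.0 for a, lab in zip(alpha, train_labels)]
        recon = combine(atoms, kept)
        residuals[label] = math.sqrt(sum((yk - rk) ** 2 for yk, rk in zip(y, recon)))
    return min(residuals, key=residuals.get), residuals
```

With a handful of toy training vectors per class, `src_classify` returns the predicted label together with the per-class residuals, which makes the decision easy to inspect.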

Illustration of the Proposed Method
Since the proposed method can simultaneously perform feature selection and multiclass classification, the corresponding procedure based on ALIF-enhanced multiscale entropy features and KSRC is set up, and the steps are as follows.
(1) Collect vibration signals of bearings in the healthy state and with different defect types, as well as with different defect sizes for each defect type.
(2) Decompose the vibration signals into a sum of IMFs with ALIF. The first three IMFs, which contain prominent fault information, are selected to extract multiscale entropy features, and these are used to construct the feature vectors after normalization over the features of each sample, indexed by the sequence of samples and the sequence of features in each sample.
The illustration of the proposed approach is shown in Figure 6.
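As a quick consistency check on the feature dimension, the count of one hundred and eighty features stated in the abstract follows from the three IMFs, the three multiscale entropy types, and the 20 scales used in the experiments (the symbol $d$ for the feature dimension is introduced here only for illustration):

$$d = \underbrace{3}_{\text{IMFs}} \times \underbrace{3}_{\text{MSaE, MFE, MPE}} \times \underbrace{20}_{\text{scales}} = 180.$$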

Experimental Verification
To validate the capability of the proposed approach, two cases concerning bearing faults are investigated. One concerns the bearing in a centrifugal pump with different fault types [16]. The other concerns rolling bearings in the CWRU test rig with different fault categories and severity levels [36].

Bearing Fault of the Centrifugal Pump.
The centrifugal pump test system is shown in Figure 7, and the experimental details can be found in [16].
for test. The accuracy of the testing samples is defined as
$$\mathrm{Acc}_{\mathrm{te}} = \frac{N_{\mathrm{te}}^{\mathrm{c}}}{N_{\mathrm{te}}} \times 100\%, \quad (40)$$
and the accuracy of the training samples is
$$\mathrm{Acc}_{\mathrm{tr}} = \frac{N_{\mathrm{tr}}^{\mathrm{c}}}{N_{\mathrm{tr}}} \times 100\%, \quad (41)$$
where $N_{\mathrm{te}}^{\mathrm{c}}$ is the number of correctly classified testing samples; $N_{\mathrm{te}}$ is the number of testing samples; $N_{\mathrm{tr}}^{\mathrm{c}}$ is the number of
(41) is 100% with std. 0. In Table 4, the accuracies of the ten tests are listed, and the maximum accuracy reaches 98%. The corresponding classification result of the proposed method at an accuracy of 98% is shown in Figure 11. In comparison, the mean accuracy reported in [16] varies from 94.58% to 97.08% depending on the ratio of the std. of the added noise in ensemble empirical mode decomposition (EEMD); moreover, the ratio of training samples in [16] is 40%, whereas it is 20% in this paper. To show the advantage of high dimensional features in KSRC for improving the accuracy of bearing fault diagnosis, a comparison is performed as listed in Table 5. Ranked by their effect on accuracy from small to large, the features are MSaE, MFE, and MPE, followed by the pairwise combinations.
Considering the defect size, labels of fault types corresponding to 12 bearing fault states are specified for classification, and the description of the bearing fault states can be found in Table 6. Each state has 50 samples, and the MSaE, MFE, and MPE over 20 scales of the first three components by ALIF, averaged over the fifty samples, are shown in Figure 15. The three entropy features do not separate the different fault types distinctly; hence a combination is considered. To prove the accuracy of the proposed approach, ten repetitive tests are performed with randomly selected   16.

Artificially
In addition, another condition at 2 HP is considered to test the flexibility of the proposed approach under different loads. Ten random samples at load 0 HP are used to train KSRC, and all samples at load 2 HP are used for testing. Ten tests are performed as above, and the results are listed in Table 9. The mean diagnostic accuracy at 2 HP is 89.73% with std. 2.02%, and the maximum accuracy is 92.83%, with the corresponding illustration in Figure 17. The high diagnostic accuracy demonstrates the usefulness of the proposed approach under different loads.
Since the multiscale entropy features in Figure 15 are not easy to distinguish from one another because of the multiple faults, a comparison is performed considering different combinations of features. Ten tests with different random testing samples are carried out as above. The mean and the std. corresponding to the different features are listed in Table 10. The results verify the advantage of the combination of MSaE, MFE, and MPE.
Besides, a list of studies using the CWRU bearing data is collected in

entropy-based features, especially MPE, are mostly employed for bearing fault diagnosis, and good results can be obtained. When more classes must be distinguished, combinations of multiscale entropy features offer a solution.
Though the result of the proposed approach does not reach 100% as in [34], which also considers 12 classified states, the proposed approach avoids the problems of feature selection and parameter optimization of the SVM. Compared with the remaining studies, the proposed approach can deal with more classified states with high accuracy.

Conclusion
To

Figure 5: Illustration of the coarse-grained process.

(4) Set the number of training samples and testing samples. The training samples are randomly selected for KSRC. Note that the number of training samples includes both the number of auxiliary training samples and the number of auxiliary testing samples. After successful training, KSRC is used to test samples and identify the fault patterns with different severity levels.
Five commonly occurring faults in the centrifugal pump were set, including normal, bearing roller wearing (BRW), bearing inner race wearing (BIRW), bearing outer race wearing (BORW), and centrifugal pump impeller wearing (PIW).Vibration signals at the five fault states are shown in Figure 8, and the corresponding first five IMFs by ALIF are shown in Figure 9. From the comparison,

Figure 10: Multiscale entropy features over 20 scales of the first three IMFs by ALIF with the average of fifty samples.

Figure 11: Classification result of the proposed method at accuracy 98%.

Figure 13: Bearing vibration signals with different fault types at defect size 0.007 inch and load 0 HP.

Figure 15: Multiscale entropy features over 20 scales of the first three IMFs by ALIF with the average of fifty samples.

Figure 17: Classification result of the proposed method at classification accuracy 92.83% and load 2 HP.

Table 1: Prespecified parameters in the entropy computation.
Sparse Representation Classification.
Let a matrix $A_i$ represent the features of the $i$-th class for the auxiliary training samples, namely, $A_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,n_i}] \in \mathbb{R}^{m \times n_i}$, where $m$ is the feature dimension and $n_i$ is the number of auxiliary training samples of the $i$-th class. An auxiliary testing sample $y_i \in \mathbb{R}^m$ from the same class can be approximately expressed as
$$y_i = \alpha_{i,1} v_{i,1} + \alpha_{i,2} v_{i,2} + \cdots + \alpha_{i,n_i} v_{i,n_i}. \quad (27)$$

Kernel Sparse Representation Classification.
By means of the kernel trick, SRC is extended to KSRC to handle nonlinearity. Suppose a nonlinear mapping $\phi: \mathbb{R}^m \to \mathcal{F}$, $v \mapsto \phi(v)$, which transforms the auxiliary training samples from the original feature space $\mathbb{R}^m$ into the kernel feature space $\mathcal{F}$. Similar to SRC, the $\ell_1$-norm minimization problem of (32) can be reformulated in $\mathcal{F}$ as the minimization of $\|\alpha\|_1$ subject to an $\ell_2$-norm reconstruction constraint bounded by $\varepsilon$.

Table 2: Description of the experimental data.

Table 3: Result of bearing fault diagnostic accuracy.
correctly classified training samples; $N_{\mathrm{tr}}$ is the number of training samples. As listed in Table 3, the mean testing classification accuracy by (40) over ten repetitions is 96.95% with std. 0.98%, and the mean training classification accuracy

Table 4: Diagnostic accuracy of ten times.

Table 5: A comparative study of different features on the effect of diagnostic accuracy.

Table 6: Description of the experimental data.
Seeded Damage Bearing. The bearing data are obtained from the Bearing Data Centre of CWRU, and the bearing test system is shown in Figure 12. The drive end bearing 6205-2RS JEM SKF is investigated, which is seeded with single point faults using electrodischarge machining. There are four states, including normal, ball fault (BF), inner

Table 7: Result of bearing fault diagnostic accuracy.

Table 11, and they are arranged according to the classified states. Based on the comparisons in Table 11 and our work, respectively, it is shown that the

Table 8: Diagnostic accuracy of ten times at 0 HP.

Table 9: Diagnostic accuracy of ten times at 2 HP.

Table 10: A comparative study of different features on the effect of diagnostic accuracy.
improve the accuracy of bearing fault diagnosis considering multiple fault states with small samples, a novel bearing fault diagnosis method based on ALIF-enhanced multiscale entropy features and KSRC is proposed in this paper. Adaptive local iterative filtering can adaptively decompose the nonlinear and nonstationary vibration signals into a sum of IMFs with different scales. The MSaE, MFE, and MPE values of the first three IMFs by ALIF are computed and normalized. Further, KSRC can accurately identify multiple fault types of roller bearings with the normalized entropy features, and it realizes feature selection through regularization. Finally, the proposed method is evaluated with experimental data concerning bearing faults in the centrifugal pump and multiple bearing faults from CWRU. The comparison shows that high dimensional features from small samples can achieve high accuracy of bearing fault diagnosis at 0 HP as well as under the varied working condition of 2 HP. The results demonstrate that the proposed method is feasible and effective in bearing fault diagnosis.

Table 11: Previous work concerning bearing fault diagnosis published in the literature.