Multiple-Fault Diagnosis Method Based on Multiscale Feature Extraction and MSVM_PPA

Identification of rolling bearing fault patterns, especially for compound faults, has attracted notable attention and remains a challenge in fault diagnosis. In this paper, a novel method combining multiscale feature extraction (MFE) and a multiclass support vector machine (MSVM) with particle parameter adaptive (PPA) optimization is proposed. MFE is used to preprocess the process signals: the data are decomposed into intrinsic mode functions (IMFs) by empirical mode decomposition (EMD), and the instantaneous frequency of the decomposed components is obtained by the Hilbert transform. Then, statistical features and principal component analysis (PCA) are used to extract significant information from the features and to obtain effective data for multiple faults. The MSVM with PPA parameter optimization then classifies the fault patterns. The results of a case study on the rolling bearing fault data from Case Western Reserve University show that (1) the proposed intelligent method (MFE PPA MSVM) improves the classification recognition rate; (2) the accuracy declines when the number of fault patterns increases; (3) prediction accuracy is best when the training set size is increased to 70% of the total sample set. These results verify that the method is feasible and efficient for fault diagnosis.


Introduction
With the increasing complexity of modern industry, accurate and timely fault diagnosis plays an important role in industrial applications. Many fault diagnosis methods have been developed over the past two decades to identify faults accurately and automatically. They usually rely on basic measurements, such as vibration, acoustic, temperature, and wear debris analysis [1,2]. Although these methods have been beneficial, the tests are still quite expensive and time-consuming. Fault diagnosis using big data is a goal that has not yet been fully realized. When a machine develops faults, the dynamic signals of the machine structure can be monitored in real time. The most effective fault diagnosis is based on feature learning from the monitoring information. Numerous researchers focus on fault diagnosis by fault pattern recognition, but most studies are concerned with single-fault pattern recognition [3][4][5]. In practice, however, a machinery fault may be compound, that is, influenced by two or three causes.
Compared with a single fault, compound faults lead to serious performance degradation and are more difficult to recognize. This leaves the challenging task of identifying multiple faults effectively. There are a few studies on multiple-fault pattern recognition [6][7][8]. Chen et al. integrate extreme-point symmetric mode decomposition with an extreme learning machine to recognize typical multiple-fault patterns [6]. Ranaee and Ebrahimzadeh use a back-propagation (BP) neural network to recognize multiple-fault patterns [7]. Lu et al. propose a hybrid system that uses independent component analysis (ICA) and a support vector machine (SVM) for recognizing mixture patterns. That method first applies ICA to obtain the independent components (ICs), and the ICs are then used as the inputs of the SVM classifier [8].
Feature extraction has become a major technique for multiple-fault pattern recognition. Numerous previous studies have addressed signal processing [9][10][11][12][13], such as the Fourier transform [9] and the wavelet transform [10]. However, these methods require appropriate basis functions to be selected in advance. It is difficult to obtain effective analysis results, because the data from real-world machines are nonstationary and nonlinear. Empirical mode decomposition (EMD), a powerful and effective time-frequency analysis method, is designed to analyze nonstationary signals and adaptively decomposes a complicated signal into intrinsic mode functions (IMFs) according to the inherent characteristics of the signal [11][12][13]. Feature extraction by EMD is appropriate for distinguishing different mechanical signals [14][15][16]. Wang et al. [14] propose a novel feature extraction method based on a nonnegative EMD manifold for machinery fault diagnosis. Saidi et al. [15] propose an EMD-based fault diagnosis module to detect incipient bearing faults from the raw vibration signals. Ali et al. [16] use EMD as the feature extraction method, then select the most important intrinsic mode functions and classify bearing defects with an artificial neural network (ANN). Given the uncertainty and nonlinearity of the production process, EMD-based analysis may overcome these limitations and support multiple-fault diagnosis.
Traditionally, faults were examined and analyzed manually from measurement data. With the development of machine learning techniques, expert systems were employed for fault recognition in automatic process monitoring [17][18][19][20]. The support vector machine (SVM) has been widely used for recognizing multiple faults because of its excellent performance in practical applications. Unlike neural network methods, SVM generalizes well from small samples, since its model complexity does not depend on the number of features, and it is therefore suitable for high-dimensional data [13,21,22].
The above discussion shows that although some methods have shown promising results in improving multiple-fault diagnosis performance, none has been widely adopted, and there is still room for improvement. In this paper, a multiple-fault classification method is proposed that combines empirical mode decomposition, PCA, and MSVM theory with PPA parameter optimization. Process signals are decomposed into IMFs by the EMD method, and the Hilbert transform is used to obtain the instantaneous frequency of the decomposed components. Then, the statistical features of the intrinsic mode functions and instantaneous frequencies are calculated. Principal component analysis is used to extract significant information from the statistical features and to obtain effective data for multiple faults. Finally, the MSVM with PPA parameter optimization classifies the fault modes.

Processing Model
The proposed fault pattern recognition model consists of three modules for effectively monitoring multiple faults (Figure 1).
(i) Feature extraction as the first stage: according to the characteristics of the machinery and equipment operating process, process signals are chosen and decomposed into IMFs by the EMD method, and the instantaneous frequency of the decomposed components is obtained by the Hilbert transform. Statistical and shape features are extracted from the EMD data. Then PCA is applied to reduce the feature dimension and the computational complexity.
(ii) Fault pattern classification by MSVM as the second stage: the selected features are used as the inputs, and the MSVM classifier must be designed properly to obtain satisfactory recognition performance.
(iii) Optimization module as the third stage: K-fold cross-validation and adaptive mutation particle swarm optimization (AMPSO) are combined as the PPA parameter optimization method to select the parameters of the MSVM.

Features Extraction
(1) EMD Method Principle. Empirical mode decomposition (EMD) is an adaptive time-frequency analysis method. Its main principle is to decompose a complex signal self-adaptively into several IMFs according to the inherent characteristics of the signal, so that every IMF carries specific frequency information of the signal. EMD smooths the signal, gradually separates its fluctuations or trends at different scales, and avoids the problem of preselecting the filter center frequency and bandwidth in traditional envelope analysis. EMD is suitable for nonstationary and nonlinear signal analysis and has been widely used in many fields [11][12][13][14]. The steps of EMD are as follows.
First, the raw vibration signal is decomposed as

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t), \tag{1}$$

where $x(t)$ is the original vibration signal, $c_i(t)$ is the $i$th intrinsic mode function, and $r_n(t)$ is the residual function, which represents the overall trend of the signal. Here $i$ is the decomposition index and $n$ is the total number of decompositions.
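The mode components are separated by instantaneous frequency from high to low, so from the filtering point of view the EMD method can be regarded as a bank of high-pass filters. The first few high-frequency IMF components obtained by EMD effectively represent the signal characteristics; the remaining IMFs belong to the residual component, which is mainly low-frequency noise. The selected high-frequency IMF components are transformed by the Hilbert transform (see (2)), and their analytic signals are constructed as in (3):

$$H[c_i(t)] = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{c_i(\tau)}{t - \tau}\, d\tau, \tag{2}$$

$$z_i(t) = c_i(t) + jH[c_i(t)] = a_i(t)\, e^{j\phi_i(t)}. \tag{3}$$

The amplitude function and phase function are obtained by

$$a_i(t) = \sqrt{c_i^2(t) + H^2[c_i(t)]}, \qquad \phi_i(t) = \arctan\frac{H[c_i(t)]}{c_i(t)}. \tag{4}$$

Equation (5) gives the derivative of the phase function, namely, the instantaneous frequency:

$$\omega_i(t) = \frac{d\phi_i(t)}{dt}. \tag{5}$$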
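The Hilbert step can be illustrated with a short sketch. It assumes NumPy is available, uses an FFT-based discrete Hilbert transform, and substitutes a synthetic 600 Hz cosine for a real IMF, since EMD itself is not implemented here. The function names `analytic_signal` and `instantaneous_frequency` are illustrative and are not taken from the paper.

```python
import numpy as np

def analytic_signal(c):
    """Return c(t) + j*H[c(t)] using an FFT-based discrete Hilbert transform."""
    c = np.asarray(c, dtype=float)
    n = c.size
    spectrum = np.fft.fft(c)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_frequency(c, fs):
    """Amplitude a(t) and instantaneous frequency (Hz) of one component."""
    z = analytic_signal(c)
    amplitude = np.abs(z)                    # a(t)
    phase = np.unwrap(np.angle(z))           # phi(t), made continuous
    freq_hz = np.diff(phase) * fs / (2.0 * np.pi)  # discrete d(phi)/dt
    return amplitude, freq_hz

fs = 12_000                                  # sampling frequency in Hz
t = np.arange(1000) / fs                     # 1000 points per sample
component = np.cos(2 * np.pi * 600 * t)      # synthetic stand-in for an IMF
amp, freq = instantaneous_frequency(component, fs)
trimmed = freq[100:-100]                     # discard both ends (end effects)
```

For a real IMF the instantaneous frequency would vary over time; the constant-frequency cosine is used only because its expected amplitude and frequency are known.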
IMF components largely reflect the true characteristic information of the original signal. However, because EMD uses cubic spline interpolation, both ends of the signal tend to diverge. As the decomposition deepens, this divergence spreads over the entire signal and produces modal aliasing.
Recent studies show that a longer signal can be selected to reduce endpoint divergence, and that both ends of the subsequent IMF components can be truncated to reduce the impact of endpoint modal aliasing. In this paper, the IMFs and their corresponding instantaneous frequencies are selected as the characteristic variables, and eight statistical feature values are calculated from each of them to form the feature data set: mean, max, range, standard deviation, skewness, kurtosis, coefficient of variation, and sum of squares (see Table 1).
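As a rough illustration, the eight statistical features can be computed for one component as follows. This is a sketch assuming NumPy; the normalizations (population standard deviation, non-excess kurtosis) are assumptions, because the formulas of Table 1 are not reproduced here, and the name `eight_features` is hypothetical.

```python
import numpy as np

def eight_features(x):
    """Eight statistical features of one IMF or instantaneous-frequency series.

    Assumes a non-constant series with nonzero mean; otherwise skewness,
    kurtosis, or the coefficient of variation is undefined.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()                 # population standard deviation (assumption)
    centered = x - mean
    return {
        "mean": mean,
        "max": x.max(),
        "range": x.max() - x.min(),
        "std": std,
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,  # not excess kurtosis
        "coef_variation": std / mean,
        "sum_of_squares": np.sum(x ** 2),
    }

features = eight_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Applying the function to four IMFs and their four instantaneous-frequency series would give 8 x 8 = 64 feature values per sample under these assumptions.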
(2) Principal Component Analysis (PCA). Multivariate statistical analysis is the most commonly used data-driven approach to fault diagnosis. Principal component analysis (PCA), an efficient representative of this family, uses a linear transformation to obtain as few features as possible. The new features are uncorrelated with each other while retaining as much of the original information as possible. In this study, PCA is chosen as the second feature extraction step to eliminate duplicated information.
Table 1: Eight statistical features (mean, max, range, standard deviation, skewness, kurtosis, coefficient of variation, and sum of squares).

For the PCA algorithm, assume that we have a collection of $m$ unlabeled training data samples organized into the matrix $X = [x^{(1)}, x^{(2)}, \ldots, x^{(m)}]$. Each data sample is a (column) vector of dimension $n$. For example, sample $i$ can be denoted as $x^{(i)} = [x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_n]^T$, where, for a matrix (or a vector) $A$, $A^T$ indicates its transpose. For simplicity of description, we assume each data sample has already been appropriately scaled and demeaned.

First, calculate the covariance matrix

$$\Sigma = \frac{1}{m} X X^T.$$

The matrix $\Sigma$ describes the overall scattering of the process data. The matrix $\Sigma$ is symmetric and can be orthogonally diagonalized as

$$\Sigma = U \Lambda U^T,$$

where $U = [u_1, u_2, \ldots, u_n]$ is an orthonormal eigenvector matrix, and

$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n), \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0.$$

The eigenvectors $u_i$, $1 \le i \le n$, are the principal components of the $i$th dimension. Let

$$\eta_i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$$

be the contribution rate of the $i$th principal component.

Choose the smallest number $k$ of principal components such that

$$\sum_{i=1}^{k} \eta_i \ge \text{threshold},$$

where the threshold is the desired fraction of variance retained; for instance, a threshold can be chosen to be 0.99. The data set is approximated using the first $k$ principal components organized into the matrix

$$U_k = [u_1, u_2, \ldots, u_k].$$

Specifically, for each data sample $x^{(i)}$, $1 \le i \le m$, the extracted new feature is a $k$-dimensional vector given by

$$z^{(i)} = U_k^T x^{(i)}.$$

The new data set is subsequently given by $Z = [z^{(1)}, z^{(2)}, \ldots, z^{(m)}]$.

Therefore, the original data set is

$$X = U_k Z + E,$$

where $E$ is the residual matrix, mainly caused by noise. Removing the residuals will not significantly affect the useful information.
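The PCA steps above can be sketched in a few lines. This is a minimal illustration assuming NumPy, with samples stored as columns and the covariance scaled by $1/m$ as in the reconstruction above; the function name `pca_by_contribution` and the toy data are hypothetical.

```python
import numpy as np

def pca_by_contribution(X, threshold=0.90):
    """PCA keeping the fewest components whose cumulative contribution
    rate reaches `threshold`. X has shape (n_features, m_samples)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)        # demean every feature
    m = Xc.shape[1]
    cov = Xc @ Xc.T / m                           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # reorder to descending
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negatives
    eigvecs = eigvecs[:, order]
    contribution = eigvals / eigvals.sum()        # contribution rates
    # Smallest k with cumulative contribution >= threshold.
    k = int(np.searchsorted(np.cumsum(contribution), threshold)) + 1
    k = min(k, eigvals.size)
    U_k = eigvecs[:, :k]
    Z = U_k.T @ Xc                                # reduced features
    E = Xc - U_k @ Z                              # residual matrix
    return U_k, Z, E, contribution

# Toy data: three features, five samples, all variance along one direction.
X_toy = np.array([[1, 2, 3, 4, 5],
                  [2, 4, 6, 8, 10],
                  [0, 0, 0, 0, 0]], dtype=float)
U_k, Z, E, contribution = pca_by_contribution(X_toy, threshold=0.90)
```

In the toy data the second row is a multiple of the first, so a single principal component carries all the variance and the residual is numerically zero.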

Support Vector Machine.
The basic binary SVM was initially designed to deal with two-class problems based on structural risk minimization theory. It seeks the best compromise between model complexity and learning ability, and it has since been extended to multiclass problems. The binary SVM classification method constructs an optimal separating hyperplane (OSH) in order to maximize the margin between two classes of data points (see Figure 2). Suppose that the training set is $(x^{(i)}, y^{(i)})$, $i = 1, 2, \ldots, m$, with $x^{(i)} \in \mathbb{R}^n$ and $y^{(i)} \in \{-1, 1\}$. A binary SVM model with a nonlinear kernel $\kappa$ finds the best classifier $f$ with parameters $\theta_i$, $i = 0, 1, \ldots, m$, in the form of

$$f(x) = \theta_0 + \sum_{i=1}^{m} \theta_i\, \kappa(x, x^{(i)}),$$

$$\min \; \frac{1}{2} \sum_{i=1}^{m} \theta_i^2 + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y^{(i)} f(x^{(i)}) \ge 1 - \xi_i, \; \xi_i \ge 0, \; i = 1, \ldots, m,$$

where the minimization is over all decision variables $\theta_0$, $\theta_i$, $\xi_i$, $i = 1, \ldots, m$, and $C > 0$ is a penalty constant. The constant $C$ controls the trade-off between minimizing the classification error and maximizing the margin.
For a nonlinear decision boundary, a kernel function is applied to map the input from a low-dimensional space into a higher-dimensional feature space, so that an optimal linear separating hyperplane can be found. Although researchers have proposed several types of kernel functions, the radial basis function (RBF) is the most widely used for solving nonlinear problems with SVM. It is defined, for $x, x^{(i)} \in \mathbb{R}^n$, $i = 1, 2, \ldots, m$, as

$$\kappa(x, x^{(i)}) = \exp\left(-\gamma \left\| x - x^{(i)} \right\|^2\right),$$

where $\gamma \ge 0$ controls the width of the RBF, and $\|x\|$ indicates the Euclidean norm of $x$.
To solve multiclass problems, an MSVM method is applied in the second (classifier) stage. Two kinds of MSVM methods are widely used: one-against-all (OAA) and one-against-one (OAO). In this paper, OAO is adopted for multiple-fault recognition. For $N$ classes, this method constructs $N(N - 1)/2$ binary SVM classifiers, each trained to separate one class from another. The class of a testing sample is decided by voting among the binary classifiers.
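The one-against-one bookkeeping can be sketched with the standard library alone. The helper names `oao_pairs` and `oao_vote` are hypothetical, the binary classifiers themselves are not implemented, and the tie-breaking rule (first class to reach the top count) is an assumption.

```python
from collections import Counter
from itertools import combinations

def oao_pairs(classes):
    """All unordered class pairs; one binary SVM would be trained per pair."""
    return list(combinations(classes, 2))

def oao_vote(pairwise_winners):
    """Majority vote over the winners reported by the pairwise classifiers.
    Ties go to the class that was counted first (an assumption)."""
    return Counter(pairwise_winners).most_common(1)[0][0]

pairs_10 = oao_pairs(range(10))   # ten fault patterns -> 45 binary classifiers
pairs_4 = oao_pairs(range(4))     # four fault patterns -> 6 binary classifiers

# Hypothetical winners of the 6 pairwise classifiers for one testing sample.
predicted = oao_vote([0, 0, 2, 1, 0, 2])
```

With ten fault patterns the number of pairwise classifiers grows to 45, which is one reason parameter selection for the MSVM becomes costly.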
The largest problem encountered in the MSVM is selecting the best penalty parameter ($C$) and kernel function parameter ($\gamma$). To alleviate this difficulty, the PPA parameter optimization method is used next to obtain the best values of $C$ and $\gamma$ for the MSVM classifier.

Parameters Optimization.
MSVM is applied to classify the multiple-fault patterns in this paper, but the largest problem encountered is how to select the penalty parameter ($C$) and kernel function parameter ($\gamma$) of the MSVM. In much of the literature, these two parameters are obtained by K-fold cross-validation.
The principle of K-fold cross-validation is to randomly separate the original sample into K subsamples, choose one subsample as the testing data for validation, and use the remaining (K − 1) subsamples as the training data. This is repeated K times, so that every subsample is used exactly once as the validation data. An averaged testing result is obtained over the K folds. The advantage of this method is that all data are used for both training and testing. The penalty parameter ($C$) and kernel function parameter ($\gamma$) with the highest K-fold cross-validation accuracy are selected as the final MSVM parameters; the model is then trained with these two parameters on the entire training set, and finally the unknown testing samples are classified by the trained model.
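The splitting logic of K-fold cross-validation can be sketched as follows, using only the standard library. The names `k_fold_indices` and `cross_val_score` are illustrative, and `evaluate` is a hypothetical callable that would train a classifier on the training indices and return its accuracy on the validation indices.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random separation
    folds = [indices[i::k] for i in range(k)]     # K nearly equal subsamples
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, validation

def cross_val_score(evaluate, n_samples, k=3):
    """Average of evaluate(training, validation) over the K folds."""
    scores = [evaluate(train, val)
              for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / k

splits = list(k_fold_indices(10, 3))
```

Every sample index appears in exactly one validation fold, which is the property the text relies on when it says all data are used for training and testing.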
The weakness of K-fold cross-validation is that the best parameters selected from the training folds may not represent the entire training data. The result is affected when only a small amount of training data is available. Therefore, adaptive mutation particle swarm optimization (AMPSO) and K-fold cross-validation are combined to obtain the best parameters of the MSVM. First, the MSVM regularization parameter ($C$) and kernel parameter ($\gamma$) are defined as the coordinates of a particle, and the training accuracy obtained by K-fold cross-validation is used as the fitness function; 3-fold cross-validation is applied. The steps of the proposed parameter optimization method are as follows.
Step 1. Set the PPA parameters, such as the population number, swarm size, maximum velocity, probability of adaptive mutation, and parameter ranges (see Table 2).
Step 2. Randomly generate the initial particles and set their velocities. Each particle is evaluated with the MSVM, using the training accuracy from 3-fold cross-validation as its fitness value. The 3-fold setting is chosen according to the proportion of training samples to testing samples [23]. The best known position of each particle is initialized to its initial position.
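Step 3. Update the individual position and velocity of every particle. Subsequently, renew the best known position ($p_i$) of each particle $i$ and the best group position $p_g$. Specifically, with $f$ denoting the fitness function,

$$p_i(t) = \begin{cases} x_i(t), & f(x_i(t)) > f(p_i(t-1)), \\ p_i(t-1), & \text{otherwise}. \end{cases}$$

And similarly

$$p_g(t) = \arg\max_{p_i(t)} f(p_i(t)).$$

The velocity and position of particle $i$ at iteration $t + 1$, $t > 0$, are updated using the current velocity and the distances from $p_i$ and $p_g$ as follows. Let $v_{\max}$ be the maximum velocity. Calculate

$$\tilde{v}_i(t+1) = w\, v_i(t) + c_1 r_1 \left( p_i(t) - x_i(t) \right) + c_2 r_2 \left( p_g(t) - x_i(t) \right), \tag{20}$$

where $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, and $r_1$, $r_2$ are uniformly distributed random numbers in $(0, 1)$. Then calculate each component of $v_i(t+1)$ as

$$v_{id}(t+1) = \operatorname{sgn}\left( \tilde{v}_{id}(t+1) \right) \min\left( \left| \tilde{v}_{id}(t+1) \right|, v_{\max} \right)$$

for $d = 1, 2, \ldots, D$, where the notation sgn indicates the usual signum function. Then the $i$th particle position is updated with

$$x_i(t+1) = x_i(t) + v_i(t+1). \tag{22}$$

Step 4. Apply adaptive mutation to address the "premature" convergence problem of PSO, in which the swarm easily falls into a local extremum and the other particles quickly move to this local position during the optimization. AMPSO is used to solve this problem; it lets the algorithm escape from local optima and search for the best solution in other regions of the space.
As can be seen in formula (22), the next position of a particle is determined by both its current position and its new velocity. The new velocity is determined by the immediately previous velocity, the individual best $p_i$, and the group best $p_g$, as shown in formula (20). If the algorithm has converged prematurely, then the group best $p_g$ is a local optimal solution. If $p_g$ is changed, the search direction of the particles is redirected. Thus, the main idea of AMPSO is to mutate $p_g$ in the hope that the search will leave the local optimum and explore new individual and group optima.
The mutation of the PSO is designed as a random operator applied with a certain probability $p_m$. Specifically, for a uniformly distributed random number rand ∈ (0, 1), a mutated new group optimum is obtained from $p_g$ by a random perturbation whenever rand falls below $p_m$.
Step 5. Repeat from Step 2 until a termination criterion is met, which can be the number of iterations performed or the accuracy requirement.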
Step 6. Find the global best position ($p_g$).
Through the improved adaptive mutation particle swarm optimization algorithm and K-fold cross-validation, the optimum parameters of the MSVM are obtained and used to train on the entire training set, yielding the PPA-MSVM classification model. The trained PPA-MSVM classifier is then used to classify the unknown testing set, so that diagnosis of the rolling bearing faults is done intelligently.
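The optimization loop of Steps 1 to 6 can be sketched as below. This is a minimal illustration under several assumptions that are not taken from the paper: the Gaussian form of the mutation, all default parameter values, and a toy fitness function that stands in for the 3-fold cross-validation accuracy of the MSVM. The function name `ampso` is hypothetical.

```python
import math
import random

def ampso(fitness, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5,
          c2=1.7, v_max=1.0, p_mut=0.1, seed=0):
    """Maximize `fitness` with PSO plus a random mutation of the group best."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(value, lo, hi):
        return max(lo, min(hi, value))

    # Step 2: random initial positions and velocities.
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
         for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]
    p_fit = [fitness(xi) for xi in x]
    g = max(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]

    for _ in range(n_iter):
        # Step 4 (assumed form): with probability p_mut, guide the swarm with
        # a Gaussian-perturbed copy of the group best to leave local optima.
        guide = g_best
        if rng.random() < p_mut:
            guide = [clip(gd * (1.0 + 0.5 * rng.gauss(0.0, 1.0)), lo, hi)
                     for gd, (lo, hi) in zip(g_best, bounds)]
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel = (w * v[i][d]
                       + c1 * r1 * (p_best[i][d] - x[i][d])
                       + c2 * r2 * (guide[d] - x[i][d]))
                # sgn(vel) * min(|vel|, v_max): clamp the velocity.
                v[i][d] = math.copysign(min(abs(vel), v_max), vel)
                x[i][d] = clip(x[i][d] + v[i][d], lo, hi)
            # Step 3: renew individual and group best positions.
            fit = fitness(x[i])
            if fit > p_fit[i]:
                p_best[i], p_fit[i] = x[i][:], fit
                if fit > g_fit:
                    g_best, g_fit = x[i][:], fit
    return g_best, g_fit

# Toy fitness with its maximum (0) at (1, -2); it only stands in for the
# cross-validation accuracy used in the paper.
def toy_fitness(p):
    return -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)

search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_fit = ampso(toy_fitness, search_bounds)
```

The sketch keeps the unmutated best position as the returned result and uses the mutated copy only to steer the particles, so the reported fitness never decreases; whether the paper does the same is not stated.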

Case Analyses
To verify the feasibility and effectiveness of the method, the bearing dataset of the Case Western Reserve University Bearing Data Center is adopted in this paper. The experimental apparatus is shown in Figure 3; it consists of a 2 hp three-phase induction motor (left), a torque transducer (middle), and a dynamometer load (right) [24,25]. Four different bearing conditions are simulated under four different loads: healthy, inner-race defect, rolling element defect, and outer-race defect. Typical waveforms of the four conditions are illustrated in Figure 4.

Data Descriptions.
In this study, the bearing data recorded at a rotating speed of 1797 r/min and a sampling frequency of 12 kHz for the four bearing conditions were selected to evaluate the proposed method (see Table 3). Each sample contains 1000 vibration data points, and 120 samples are taken for each fault pattern.

Features Extraction
(1) Empirical Mode Decomposition. First, the process vibration signals are decomposed into intrinsic mode functions by the EMD method. IMF1 to IMF4, with both ends truncated, are selected in this paper; the instantaneous frequency of the decomposed components is then obtained by the Hilbert transform (see Figure 5). This gives a total of eight components: four IMF components and the four corresponding instantaneous frequency sequences. To suppress the end effect of EMD, both ends of each component signal are cut off, leaving 800 data points of each IMF and instantaneous frequency sequence.
(3) Principal Component Analysis. PCA yields fewer variables while retaining as much of the original information as possible. In this study, after choosing the statistical and shape features, we therefore use PCA as the second feature extraction step to eliminate duplicated information. 70% of the samples are used for training and the remaining 30% for testing. The recognition accuracy is estimated from the testing samples. In PCA, the cumulative contribution threshold is set to 90%, meaning that the selected principal components retain 90% of the data information; the results are shown in Figure 7 and Table 4.
With the RBF kernel function, the cross-validation (CV) method is first used to obtain the regularization parameter ($C$) and the kernel parameter ($\gamma$), both searched over the range $[10^{-2}, 10^{2}]$ with an exponent step of 0.5; the relationship between the trained MSVM accuracy and $C$, $\gamma$ is shown in Figures 8 and 9. Figures 8 and 9 show that the MSVM recognition rate changes greatly when $C$ and $\gamma$ vary over a considerable range. The lowest classification rate is only around 18%. Accordingly, the particle parameter adaptive method is used to optimize the MSVM kernel parameter $\gamma$ and regularization parameter $C$. The parameters of PPA are listed in Table 5, and the fold number K of K-fold cross-validation is set to 3 (best cross-validation accuracy = 95.24%, best $C$ = 100, and best $\gamma$ = 0.01).
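The cross-validation grid described above (range $10^{-2}$ to $10^{2}$, exponent step 0.5) can be enumerated as in the following sketch. The scoring function is a toy placeholder with an arbitrary peak and does not reproduce the paper's results; in practice `score` would return the 3-fold cross-validation accuracy of the MSVM. The names `grid_search` and `toy_score` are hypothetical.

```python
import math

# Exponents -2.0, -1.5, ..., 2.0, giving values from 10**-2 up to 10**2.
exponents = [e / 2 for e in range(-4, 5)]
grid_values = [10.0 ** e for e in exponents]

def grid_search(score, c_values=grid_values, g_values=grid_values):
    """Return (best_score, C, gamma) over the Cartesian grid.

    `score(C, gamma)` is a hypothetical callable supplied by the user.
    """
    return max((score(c, g), c, g) for c in c_values for g in g_values)

# Toy score peaking at C = 10, gamma = 0.1 (arbitrary, illustration only).
def toy_score(c, g):
    return -abs(math.log10(c) - 1.0) - abs(math.log10(g) + 1.0)

best_score, best_c, best_g = grid_search(toy_score)
```

The grid has 9 x 9 = 81 parameter pairs, each of which requires K model fits under K-fold cross-validation.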

Result Analysis
The average recognition accuracies of CV MSVM and PPA MSVM show that the proposed PPA MSVM method plays a significant role in increasing the recognition accuracy. CV MSVM depends heavily on the K-fold results of the training sample data rather than on the entire training set, and the PPA algorithm can solve this problem.
The simulations show that recognition with the PCA features is slightly less accurate than with the full set of EMD statistical features. However, the dimension of the original feature set is effectively reduced, which improves the efficiency of the identification operation. Only 15 principal components are needed to identify the type of fault effectively while still maintaining a high accuracy rate. Compared with the original feature set, PCA loses some information and thus reduces the recognition performance, but this negative effect on fault identification is small and the result is acceptable.
(2) Performance of Recognizer in Different Fault Patterns. To verify the proposed method and analyze the classification accuracy for different fault patterns, the effects of the number of simulated fault patterns, PCA, and the parameter optimization method were examined through a full factorial experiment. It mainly involved the following three factors: (1) the parameter optimization method with two levels (CV MSVM and PPA MSVM), (2) the fault pattern number with three levels of four patterns (see Table 3, fault numbers 1-4), seven patterns (see Table 3, fault numbers 1-7), and ten patterns (see Table 3, fault numbers 1-10), and (3) the use of PCA.
Table 6 shows the results of the full factorial experiment. The prediction accuracy reaches 100% when there are only 4 fault patterns, which indicates that the proposed method is effective in classifying the fault patterns. However, the accuracy declines when the number of fault patterns increases. The reason is that the information of multiple-fault patterns may be confused and more difficult to recognize, which results in serious performance degradation. In addition, with 7 fault patterns, PCA plays an effective role under both parameter optimization methods. Finally, the results show that PPA parameter optimization is more effective than CV when coupled with PCA and different numbers of fault patterns.
As shown by the experimental results in Table 7, prediction accuracy is best when the training set size is increased to 70% of the total sample set. The reason is that the prediction accuracy is higher when the training model obtains the best parameters. However, overfitting can also occur: the prediction accuracy drops when the percentage of training samples is 80% or 90%.
(4) Performance of Recognizer in Different Feature Extraction Methods. Feature extraction can lead to faster training and greater efficiency in the multiple-fault diagnosis method. Thirteen statistical and shape features are used as the inputs in this paper. To demonstrate its effectiveness, MSVM classifiers using EMD, PCA, and MFE (combining EMD and PCA) as the feature extraction method are constructed. Table 8 shows the recognition accuracy of the three feature extraction methods.
The average prediction accuracies of EMD MSVM (8.33%), PCA MSVM (72.5%), and MFE MSVM (94.50%) show that the feature extraction method plays an important role in improving the recognition accuracy. The results show that multiple faults are difficult to recognize because of their complex relations, but the result is much better with the multiscale feature extraction (MFE) method, which decomposes the data into intrinsic mode functions by empirical mode decomposition, obtains the instantaneous frequency of the decomposed components by the Hilbert transform, and then uses statistical features and principal component analysis to extract significant information from the features.

Conclusion
The objective of this study is to propose a fusion approach for multiple-fault diagnosis with single and coupled faults, using multiscale feature extraction that integrates three signal processing methods (empirical mode decomposition, statistical feature extraction, and principal component analysis) covering the time domain, frequency domain, and time-frequency domain. The MSVM with particle parameter adaptive (PPA) parameter optimization classifies the fault patterns. In the experiments, the proposed MFE MSVM PPA method produces the highest average classification accuracy compared with the other methods. In addition, we analyze how the prediction accuracy is influenced by different factors, such as the parameter optimization method, fault pattern number, PCA, and training sample size. The proposed classification method achieves high precision in multiple-fault fusion diagnosis and is a promising diagnosis approach for handling the increasing number of characteristic parameters and the growing amount of feature information.
This multiple-fault diagnosis approach is feasible and, as the computational results show, quite effective in improving the compound fault diagnosis of rolling bearing fault patterns. The approach is still immature, however: the data come from simulation and do not rely on significant real-time testing. Because field data for validating the approach are very difficult to obtain, we generated the simulated data from the original rolling bearing fault data of Case Western Reserve University and then analyzed the multiple-fault pattern recognition problem.

Figure 2: The OSH of a binary SVM.

Figure 3: Case Western Reserve University rolling bearing experiment platform.

Figure 4: Typical waveforms from the four conditions.

Figure 5: IMFs of four kinds of conditions in rolling bearing case.

Figure 6: Box plots of the eight features for different classes.

(3) Performance of Recognizer in Different Training Samples. The performance of the optimization methods has been compared with CV to investigate the capability of the proposed multiple-fault pattern recognition method for rolling element bearings (REBs). To show the influence of the training sample size, we test the accuracy of the PCA PPA MSVM model, based on the proposed EMD and statistical feature extraction, for training sample percentages of 40%, 50%, 60%, 70%, 80%, and 90%. The testing results are presented in Table 7.

Table 2: The parameters of AMPSO.

Table 5: Comparison of the performance of PPA MSVM with CV MSVM classifiers.

Table 6: Comparison of the performance of recognizer in different fault patterns.