A Fault Diagnosis Method for Rotating Machinery Based on PCA and Morlet Kernel SVM

1 School of Mechatronics and Automotive Engineering, Chongqing Jiaotong University, Chongqing 400074, China
2 School of Automation, Chongqing University, Chongqing 400044, China
3 The State Key Laboratory of Mechanical Transmission, Chongqing University, Chongqing 400030, China
4 Chongqing Academy of Metrology and Quality Inspection, Chongqing 401121, China
5 School of Mechanical and Electronic Engineering, Zhongyuan University of Technology, Zhengzhou 450007, China
6 Mechanical and Electrical Engineering Department, Chongqing Vocational Institute of Safety & Technology, Wanzhou, Chongqing 404020, China


Introduction
Rotating machinery is widely used in modern factories. Unexpected mechanical faults can cause unscheduled downtime and economic loss, so diagnosing faults in rotating machinery is very important. To achieve effective fault diagnosis, features should first be extracted from the collected vibration data; then, based on the extracted features, an effective diagnosis model should be selected [1]. Feature extraction is the process of transforming the raw vibration data collected from running equipment into information relevant to its health condition. There are three types of methods for processing raw vibration data: time domain analysis, frequency domain analysis, and time-frequency domain analysis. All three are commonly used to extract features. For example, Yan et al. [2] note that the wavelet transform, a time-frequency domain method, is often used to describe the characteristics of vibration signals. Gebraeel et al. [3] chose the average of the amplitudes of the defective frequency and its first six harmonics over time as the features. Yan et al. [4] chose the short-time Fourier transform to extract the features. Ocak et al. [5] chose the wavelet packet transform to extract features describing bearing wear. Because time domain features and frequency features from FFT analysis tend to average out transient vibrations and thus do not provide a complete measure of the bearing health status, in this paper the time-frequency method empirical mode decomposition (EMD) is used to decompose the vibration signal, and the EMD Shannon entropy is used to extract the original features from the signal.
Although the original features can be extracted, they are still high-dimensional and include superfluous information. A feature fusion and dimension reduction method should therefore be applied to the original features to select the typical features. The most commonly used method for this purpose is principal component analysis (PCA). Sun et al. [6] used PCA to extract features from the vibration signals of bearings in run-to-failure tests. Dong and Luo [7] proposed a PCA-based multivariate analysis method for predicting the bearing degradation process. In this paper, PCA is used to extract the most sensitive features.
After selecting the typical features, another challenge is how to achieve effective fault diagnosis of the rotating machinery based on them. Existing machinery fault diagnosis methods can be roughly classified into model-based (or physics-based) methods and data-driven methods. Model-based methods diagnose equipment faults using two kinds of models: physical models of the components and damage propagation models based on damage mechanics. However, equipment dynamic response and damage propagation processes are typically very complex, and authentic physics-based models are very difficult to build [8]. Data-driven methods, also known as artificial intelligence approaches, are derived directly from routine condition monitoring data of the monitored system and achieve fault diagnosis through a learning or training process. The more prior data used in training, the more accurate the model obtained. Artificial intelligence techniques have been increasingly applied to rotating machinery fault diagnosis in recent years. Methods commonly used for machinery fault diagnosis include neural networks and the support vector machine (SVM) [9, 10]. However, neural networks have the drawbacks of slow convergence, difficulty in escaping from local minima, and uncertain network structure; these problems become more troublesome when diagnosing bearing faults with large data sets. The SVM does not have those problems; however, the traditional SVM is not sensitive to nonlinear feature classification. In recent years, the combination of wavelet theory and SVM has drawn considerable attention owing to its high classification ability in a wide range of applications and its better performance than other traditional learning machines. In this paper, the Morlet kernel is used to construct a new SVM model, and the particle swarm optimization (PSO) method is used to select its parameters [11].
The paper is organized as follows. In Section 2, the concept of EMD energy entropy is introduced and the EMD energy entropies of different vibration signals are calculated; PCA is then used to extract the most sensitive features. In Section 3, the Morlet wavelet kernel SVM model is presented. In Section 4, the running state identification model for rotating machinery fault diagnosis is applied to roller bearings. The conclusion of this paper is given in Section 5.
The flowchart of the proposed method is shown in Figure 1.

Methods of Signal Processing for Feature Extraction
This section presents a brief discussion of feature extraction based on EMD. EMD decomposes a signal into intrinsic mode function (IMF) components, and every IMF has a unique local frequency.
An IMF should satisfy two conditions: (1) over the whole data set, the number of extrema and the number of zero crossings must either be equal or differ by at most one, and (2) at any point, the mean value of the upper envelope and the lower envelope is zero [12].
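The first IMF condition can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names and the test signal are illustrative assumptions, and extrema are detected simply as slope sign changes between neighbouring samples.

```python
import math


def count_extrema_and_zero_crossings(signal):
    """Return (number of strict local extrema, number of zero crossings)."""
    extrema = 0
    for prev, cur, nxt in zip(signal, signal[1:], signal[2:]):
        # A strict local maximum or minimum flips the sign of the slope.
        if (cur - prev) * (nxt - cur) < 0:
            extrema += 1
    crossings = 0
    for cur, nxt in zip(signal, signal[1:]):
        # A sign change between neighbouring samples marks a zero crossing.
        if cur * nxt < 0:
            crossings += 1
    return extrema, crossings


def satisfies_first_imf_condition(signal):
    """Condition (1): the two counts are equal or differ by at most one."""
    extrema, crossings = count_extrema_and_zero_crossings(signal)
    return abs(extrema - crossings) <= 1


# Hypothetical test signal: three periods of a sine wave, 300 samples.
# The 0.25-sample offset keeps samples away from exact zeros and peaks.
sine = [math.sin(2 * math.pi * 3 * (k + 0.25) / 300) for k in range(300)]
print(satisfies_first_imf_condition(sine))
```

A pure sine wave passes the check, whereas the same wave with a constant offset added has extrema but no zero crossings and fails it. The second condition (zero envelope mean) needs the spline envelopes and is not covered by this sketch.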
Once the extrema are identified, the maxima are connected by a cubic spline to form the upper envelope. The minima are interpolated in the same way to form the lower envelope. The upper and lower envelopes should cover all the data in the time series. The mean of the upper and lower envelopes, $m_1(t)$, is subtracted from the original signal $x(t)$ to obtain the first component $h_1(t)$ of the sifting process:

$$h_1(t) = x(t) - m_1(t). \tag{1}$$

Ideally, if $h_1(t)$ is an intrinsic mode function, the sifting process stops. Otherwise, $h_1(t)$ is sifted again in the same way to get another component $h_2(t)$:

$$h_2(t) = h_1(t) - m_2(t), \tag{2}$$

where $m_2(t)$ is the mean of the upper and lower envelopes of $h_1(t)$.
These steps are repeated until the residue satisfies some stopping criterion. The signal can then be expressed as

$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t), \tag{3}$$

where $n$ is the number of IMFs, $r_n(t)$ is the residue, which is a constant, a monotonic function, or a function with only one maximum and one minimum from which no more IMFs can be derived, and $c_i(t)$ denotes the $i$th IMF.
Once the $n$ IMFs and the residue $r_n(t)$ are obtained, the energies $E_1, E_2, \ldots, E_n$ of the $n$ IMFs can be calculated. Owing to the orthogonality of the EMD decomposition, the sum of the energies of the $n$ IMFs should equal the total energy of the original signal when the residue $r_n(t)$ is ignored. Because the IMFs $c_1(t), c_2(t), \ldots, c_n(t)$ contain different frequency components, $\mathbf{E} = \{E_1, E_2, \ldots, E_n\}$ forms an energy distribution of the roller bearing vibration signal in the frequency domain, and the corresponding EMD energy entropy is defined as

$$H_{EN} = -\sum_{i=1}^{n} p_i \log p_i, \tag{4}$$

where $p_i = E_i / E$ is the fraction of the whole signal energy carried by $c_i(t)$ ($E = \sum_{i=1}^{n} E_i$). After the EMD energy entropy of the rotating machinery is calculated, PCA is used to fuse the relevant features and extract the most sensitive ones as the input of the proposed prediction model.
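As an illustration of the energy entropy definition, the following Python sketch computes the entropy from a list of IMFs. It assumes that the IMFs have already been produced by some EMD routine, that the energy of an IMF is the sum of its squared samples, and that the logarithm is natural (the base is not stated in the text); the function name is hypothetical.

```python
import math


def emd_energy_entropy(imfs):
    """EMD energy entropy of a list of IMFs, each a sequence of samples."""
    # Energy of each IMF: sum of squared samples (assumed definition).
    energies = [sum(v * v for v in imf) for imf in imfs]
    total = sum(energies)
    # Fraction of the total energy carried by each IMF.
    shares = [e / total for e in energies]
    # Shannon entropy with natural log; zero-energy IMFs contribute nothing.
    return -sum(p * math.log(p) for p in shares if p > 0)


# Two toy "IMFs" with equal energy give the maximum entropy for n = 2.
print(emd_energy_entropy([[1.0, -1.0], [1.0, 1.0]]))
```

The entropy is largest when the energy is spread evenly over the IMFs and falls to zero when a single IMF carries all of it, which is why it can distinguish running states whose energy distributions differ.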
The procedure of feature extraction can be described as follows.
(1) Use the energy of the first five IMF components to get the features of the rotating machinery at each time.
(2) Use PCA to reduce the dimension of the original features and get one set of typical features as follows.
(a) Compute the covariance matrix from the data as

$$C = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)(x_i - \mu)^{T}, \tag{5}$$

where $x_i$ is the $i$th pattern (column) of $X$, the data matrix of EMD IMFs, $N$ is the total number of patterns, and $\mu$ represents the mean vector of $X$.
(b) Compute the matrix of eigenvectors $V$ and the diagonal matrix of eigenvalues $D$ as

$$V^{-1} C V = D. \tag{6}$$

(c) Sort the eigenvectors in $V$ in descending order of the eigenvalues in $D$, and project the data onto these eigenvector directions by taking the inner product of the data matrix with the sorted eigenvector matrix:

$$Y = W X, \tag{7}$$

where $W$ is of $k \times d$ dimension and each row of it is an eigenvector ($d$ is the dimension of the original features and $k$ is the number of retained eigenvectors). The features can then be obtained from $Y$.
(3) Use the features as the input of the Morlet wavelet kernel SVM (MWSVM) for rotating machinery fault state identification.
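Steps (a) to (c) of the PCA procedure can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: patterns are stored as rows (the transpose of the column convention used in the text), the covariance uses a $1/N$ factor, the data are centred before projection, and `pca_project` is a hypothetical name.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X (N patterns x d features) onto the top-k axes."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                    # mean vector of the patterns
    Xc = X - mu                            # centred data (assumption)
    C = Xc.T @ Xc / X.shape[0]             # (a) covariance matrix, 1/N convention
    eigvals, eigvecs = np.linalg.eigh(C)   # (b) eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1][:k]  # (c) indices of the k largest eigenvalues
    W = eigvecs[:, order].T                # k x d, each row an eigenvector
    return Xc @ W.T, eigvals[order]        # N x k features, retained eigenvalues


# Hypothetical toy data: four 2-D patterns lying on a straight line.
features, kept = pca_project([[0, 0], [1, 2], [2, 4], [3, 6]], k=1)
print(features.shape, kept)
```

`numpy.linalg.eigh` returns eigenvalues in ascending order, so the explicit sort is what implements the descending ordering of step (c). In the experiments described later, `k` would be 3.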

The Morlet Wavelet Kernel SVM Model
The kernel function of a support vector machine can be a dot-product function, such as $K(x, x') = K(\langle x \cdot x' \rangle)$, or a horizontal floating (translation-invariant) function, such as $K(x, x') = K(x - x')$.
In fact, any function that satisfies the condition of Mercer's theorem is an admissible support vector kernel function.
A specific description of Mercer's theorem can be found in [13].
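According to Mercer's theorem, only a few wavelet kernel functions can be expressed in terms of existing functions. One such kernel, the Morlet wavelet kernel, is given here. It can be proved that this function satisfies the condition for an admissible support vector kernel function. The Morlet wavelet function is defined as follows:

$$\psi(x) = \cos(\omega_0 x) \exp\left(-\frac{x^2}{2}\right). \tag{8}$$

The Morlet wavelet kernel function is defined as follows:

$$K(x, x') = \prod_{i=1}^{d} \cos\left(\omega_0 \frac{x_i - x'_i}{a}\right) \exp\left(-\frac{(x_i - x'_i)^2}{2a^2}\right), \tag{9}$$

where $d$ is the dimension of the input vectors and $a$ is the scale parameter. The Morlet wavelet kernel function is then used as the support vector kernel function, and the SVM is defined as

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{l} \alpha_i y_i \prod_{j=1}^{d} \cos\left(\omega_0 \frac{x^{j} - x_i^{j}}{a}\right) \exp\left(-\frac{(x^{j} - x_i^{j})^2}{2a^2}\right) + b\right), \tag{10}$$

where $x_i$ are the $l$ training samples with labels $y_i$, $x^{j}$ is the $j$th component of $x$, $\alpha_i$ are the Lagrange multipliers, and $b$ is the bias. Through (9) and (10), the Morlet wavelet kernel SVM is constructed, and this newly constructed SVM, which is effective in classification, is used to achieve bearing running state recognition.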
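To make the kernel concrete, the sketch below evaluates a Morlet wavelet kernel in plain Python. It assumes the product form $K(x, x') = \prod_i \cos(\omega_0 (x_i - x'_i)/a) \exp(-(x_i - x'_i)^2 / (2a^2))$; the symbol `a` for the scale parameter, the function names, and the default values (taken from the optimized parameters reported later in the paper) are illustrative assumptions.

```python
import math


def morlet_kernel(x, z, omega0=5.0, a=0.3):
    """Morlet wavelet kernel between two equal-length feature vectors."""
    value = 1.0
    for xi, zi in zip(x, z):
        u = (xi - zi) / a  # scaled difference in one dimension
        value *= math.cos(omega0 * u) * math.exp(-0.5 * u * u)
    return value


def gram_matrix(samples, omega0=5.0, a=0.3):
    """Kernel (Gram) matrix of a list of feature vectors."""
    return [[morlet_kernel(p, q, omega0, a) for q in samples] for p in samples]


# Hypothetical 2-D feature vectors.
G = gram_matrix([[0.0, 0.1], [0.2, 0.3], [0.5, 0.4]])
print(G[0][0], G[0][1])
```

A Gram matrix built this way could be passed to an SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; which solver the authors used is not stated in the text.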

The Morlet Kernel SVM Parameters Selection.
The particle swarm optimization (PSO) algorithm, first proposed in 1995, is used to select the SVM parameters. It is an optimization method based on a set of particles whose coordinates are potential solutions in the search space. Particles in PSO change their coordinates (their solutions) by migration. During migration, each particle adjusts its own coordinates based on its own past experience and the past experiences of other particles.
PSO is used to optimize the Morlet kernel SVM parameters through the following formulas:

$$\begin{aligned} v_{id}(t+1) &= w\, v_{id}(t) + c_1 r_1 \left(p_{id}(t) - x_{id}(t)\right) + c_2 r_2 \left(p_{gd}(t) - x_{id}(t)\right), \\ x_{id}(t+1) &= x_{id}(t) + v_{id}(t+1), \end{aligned} \tag{11}$$

where the subscript $i$ denotes the $i$th particle, $d$ denotes the $d$th dimension, and $t$ denotes the $t$th generation. $v_{id}(t)$ is the velocity of the $i$th particle in the $t$th iteration; $x_{id}(t)$ is the position of the $i$th particle; $p_{id}(t)$ is the pbest position of the $i$th particle; $p_{gd}(t)$ is the gbest position (pbest is the best position found so far by an individual particle; gbest is the best position found so far by the whole swarm). $w$ is the inertia weight, $c_1$ and $c_2$ are learning factors, and $r_1 \sim U(0, 1)$ and $r_2 \sim U(0, 1)$ are two independent random numbers.
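The velocity and position update can be sketched as a small, self-contained PSO loop in Python. This is a hedged illustration, not the authors' implementation: the function name, the zero initial velocities, the clipping of positions to the search bounds, and the toy fitness function are all assumptions; in the paper the fitness would instead be the prediction error of the Morlet kernel SVM for a candidate parameter pair.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 w=0.5, c1=1.0, c2=1.0, seed=0):
    """Minimise `fitness` over the box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]  # assumption: start at rest
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to pbest + pull to gbest.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Position update, clipped to the bounds (assumption).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Toy fitness with a known minimum at (5, 0.3), standing in for the SVM error.
toy = lambda p: (p[0] - 5.0) ** 2 + (p[1] - 0.3) ** 2
print(pso_minimize(toy, [(0.0, 10.0), (0.0, 1.0)]))
```

The defaults `w=0.5`, `c1=1.0`, and `c2=1.0` mirror the settings reported in the validation section; the particle and iteration counts are placeholders.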
The process of optimizing the parameters $\omega_0$ and $a$ based on PSO is given as follows.
(1) At the beginning of the optimization process, initialize the population size, $c_1$, $c_2$, $w$, rand(1), and rand(2); determine the termination condition and the positions and velocities of the particles; map the Morlet kernel SVM parameters $\omega_0$ and $a$ into a group of particles; and initialize the initial position of each particle, pbest, and gbest.
(2) When training the Morlet kernel SVM, use (11) as the PSO fitness function.
(3) Use the target parameters $\omega_0$ and $a$ as the particles, use their initial values as the LS-SVM parameters in step (2), and use the corresponding value of (11) as the optimal solution of $\omega_0$ and $a$.
(4) Use the initial error value of step (2) as each particle's initial fitness value, take the best of the initial fitness values as the global fitness value, and take the corresponding particle as the current global optimal solution.
(5) Update the velocity and position vectors.
(6) Resubstitute the updated parameters $\omega_0$ and $a$ into the Morlet kernel SVM model, retrain the model according to step (2), save the output value, and calculate the fitness value of the particles again.
(7) Compare the saved global fitness value obtained in step (6) with the current particle's fitness value; if the global fitness value is superior to the current particle's fitness value, update the current particle's fitness value according to step (5) and set the current particle's optimal value equal to the corresponding particle's optimal value obtained in step (6).
(8) While the termination conditions are not met, return to step (5).
(9) End the loop.
Then, the 20 groups of entropy values are normalized and input into the PCA to reduce the dimension. To evaluate the dimension reduction and redundancy removal effect of PCA, two other methods are also used to reduce the dimension for comparison: the manifold learning method local tangent space alignment (LTSA) [15] and locality preserving projections (LPP) [16]. The results are shown in Figures 3, 4, and 5. To be comparable, the target dimension of PCA, LTSA, and LPP is set to 3, so the input dimension of the MWSVM is 3; the neighborhood number is set to 10.

Validation
Comparing Figures 3, 4, and 5 shows that the LTSA-based dimension reduction method cannot effectively separate the high-dimensional features; serious aliasing remains, which will degrade the accuracy of SVM state recognition. The LPP-based method works better than LTSA; however, some features are still mixed together. The PCA-based method can effectively separate the features of different running states, with high calculation accuracy and higher computational efficiency than the LPP and LTSA methods, which better meets actual engineering requirements. Thus, in this study, the PCA method is selected. After dimension reduction with PCA, the extracted features are input into the SVM to train the model so as to recognize the states. PSO is used to obtain the main parameters of the model: the particle swarm population size is set to 100, and the number of particles is set to 20. The fitness function is set to minimize the prediction error with the optimized parameters, and the target prediction error is set to 0.0001. The PSO particle dimension is set to 2, $w$ is set to 0.5, $c_1$ is set to 1, and $c_2$ is set to 1. The optimized parameters are $\omega_0 = 5$ and $a = 0.3$. These two parameters are then used to build the Morlet kernel SVM model for training and prediction. To compare the identification effect with and without the manifold learning methods, the following comparisons are made.
(1) Use EMD Shannon entropy to extract the features and directly input the extracted features into the MWSVM, without the PCA dimension reduction process.
(2) Use EMD Shannon entropy to extract the features and process the extracted features by LTSA to reduce the dimension and then input the features into the MWSVM.
(3) Use EMD Shannon entropy to extract the features and process the extracted features by LPP to reduce the dimension and then input the features into the MWSVM.
(4) Use the method proposed in this paper.
The comparison results are shown in Table 2.
Table 2 shows that, with PCA-based dimension reduction and feature extraction, the accuracy of state recognition improves significantly and is much higher than that of the other algorithms. Therefore, the use of PCA for dimension reduction in this research is necessary and valuable.
To further verify the identification accuracy of the proposed method, the features extracted by PCA are input into a neural network, the traditional RBF SVM (with the penalty factor set to 100 and the kernel parameter set to 0.1), the Symlet wavelet kernel SVM (with the penalty factor set to 100 and the kernel parameter set to 0.1), the db wavelet kernel SVM (with the penalty factor set to 100 and the kernel parameter set to 0.1), the Gaussian kernel SVM (with a further parameter set to 23.7, the penalty factor set to 100, and the kernel parameter set to 0.1), and the MWSVM (with $\omega_0$ set to 5 and $a$ set to 0.3). The comparison results are shown in Table 3. Table 3 shows that the MWSVM can better identify and approach the sensitive features because of the Morlet wavelet kernel. Thus, choosing the MWSVM to determine the bearing running states can effectively improve recognition accuracy.
Next, the training and test times of different methods are compared.
(1) The vibration data are processed by EMD Shannon entropy, and the extracted features are directly input into the MWSVM, without PCA dimension reduction.
(2) The vibration data are processed by EMD Shannon entropy, and the features are processed by PCA to reduce the dimension. Then, the extracted features are input into the RBF kernel SVM.
(3) The method proposed in this research is used.
The comparison results are shown in Table 4.
Table 4 shows that, after dimension reduction, the recognition speed of the SVM improves significantly. The time cost of the proposed method is the lowest. The reason is that the Morlet kernel is more sensitive to feature classification and identification than the RBF kernel. The result validates that the proposed method can effectively recognize the bearing running state.

Case 2.
After the efficacy of the proposed method has been validated, the method is applied to an actual case. The test rig is shown in Figure 6.
The bearings are mounted on the shaft, which is driven by an AC motor with a power of 0.55 kW; the rotation speed is kept at 1000 rpm by a speed controller and an AC inverter. The maximum brake torque is 5 N⋅m; a radial booster is fitted, and a magnetic clutch and brake are used. A rolling bearing is tested, and a radial load of 29.4 N is applied to it. The data sampling rate is 25600 Hz, and each record contains 102400 points, as shown in Figure 7. The vibration data are collected once every 2 hours. The bearing is run for one year. Then, one set of data from every 2 months is selected; these data sets are used to test whether the proposed method can identify the bearing running state. 4096 data points are selected for analysis, and 60 groups of collected data of different faults are obtained, with 30 groups for training and the other 30 groups for testing.
Next, each group of signals is decomposed by the EMD method, and the Shannon entropy is calculated. A group of features for different fault conditions is obtained, as shown in Table 5 (not normalized). Then, the entropy values of the 30 groups are normalized and input into the PCA to reduce the dimension and extract the typical features; the extracted features are input into the Morlet kernel SVM. The recognition results are shown in Table 6.
Table 6 shows that, although the actual bearing running state is very complex, the proposed method yields a high recognition accuracy. The results confirm that the proposed method can recognize the bearing running states effectively.

Conclusion
Firstly, this research used the EMD Shannon entropy method to extract the original features from the rotating machinery

Figure 1: The flowchart of the proposed method.


Figure 2: The collected vibration signals of the normal state and of four different inner-race fault depths.

Figure 3: The dimension reduction and redundant treatment effect of LTSA.
Figure 4: The dimension reduction and redundant treatment effect of LPP.
Legend: inner-race faults of 0.18 mm, 0.36 mm, 0.53 mm, and 0.71 mm.

Figure 5: The dimension reduction and redundant treatment effect of PCA.

Table 1: A group of inner-race EMD energy entropy of different running states.

Table 2: The states recognition rate of three different methods (recognition rate %).

Table 3: The recognition rate of traditional RBF SVM and the MWSVM.

Table 4: The time loss of three different methods.

Table 5: A group of EMD energy entropy of different running states of the actual signal.

Table 6: The states recognition rate of different states based on the proposed method (recognition rate %).