Identification of rolling bearing fault patterns, especially compound faults, has attracted notable attention and remains a challenge in fault diagnosis. In this paper, a novel method combining multiscale feature extraction (MFE) and a multiclass support vector machine (MSVM) with particle parameter adaptive (PPA) optimization is proposed. MFE is used to preprocess the process signals: the data are decomposed into intrinsic mode functions by the empirical mode decomposition (EMD) method, and the instantaneous frequency of the decomposed components is obtained by Hilbert transformation. Then, statistical features and principal component analysis are utilized to extract significant information from the features and obtain effective data for multiple faults. The MSVM method with PPA parameter optimization classifies the fault patterns. The results of a case study on the rolling bearing fault data from Case Western Reserve University show that the proposed method achieves the highest average classification accuracy among the compared methods.
With the increasing complexity of modern industry, accurate and timely fault diagnosis plays an important role in industrial applications. Many fault diagnosis methods have been developed over the past two decades to identify faults accurately and automatically. They usually rely on basic measurements, such as vibration, acoustic, temperature, and wear debris analysis [
Compared with single faults, compound faults lead to serious performance degradation and are more difficult to recognize, which makes it challenging to identify multiple faults effectively. There are only a few studies on multiple-fault pattern recognition [
Feature extraction has become a major technique for multiple-fault pattern recognition. Numerous previous studies have reported on signal processing [
Traditionally, fault diagnosis was performed manually by examining measurement data. With the development of machine learning techniques, expert systems were employed for fault recognition in automatic process monitoring [
The above discussion shows that although some methods have shown promising results in improving multiple-fault diagnosis performance, none has been widely adopted, and there is still room for improvement. A multiple-fault classification method is therefore proposed by combining empirical mode decomposition (EMD), PCA, and MSVM with PPA parameter optimization. Process signals are decomposed into IMFs by the EMD method, and Hilbert transformation is utilized to obtain the instantaneous frequency of the decomposed components. Then, the statistical features of the intrinsic mode functions and instantaneous frequencies are calculated. Principal component analysis is utilized to extract significant information from the statistical features and obtain effective data for multiple faults. Finally, MSVM with PPA parameter optimization classifies the fault modes.
The proposed fault pattern recognition model performs three modules to effectively monitor multiple faults (Figure Feature extraction as the first stage: according to the characteristics of the machinery operating process, process signals are chosen and decomposed into IMFs by the EMD method, and the instantaneous frequency of the decomposed components is obtained by Hilbert transformation. Statistical and shape features are extracted from the EMD data, and PCA is then applied to reduce the feature dimension and the computational complexity. Fault pattern classification by MSVM as the second stage: the selected features are used as inputs, and the MSVM classifier should be designed properly to obtain satisfactory recognition performance. Optimization module as the third stage: PPA is used to optimize the MSVM parameters.
EMD_MSVM_PPA classification flowchart.
Firstly, the raw vibration signal was decomposed; see
The mode components are separated by instantaneous frequency from high to low; in terms of its filter characteristics, the EMD method can be viewed as a set of high-pass filters. The first few high-frequency IMF components obtained by EMD can effectively represent the signal characteristics; the remaining IMFs belong to the residual component, which is mainly low-frequency noise. The selected high-frequency IMF components are transformed by the Hilbert method (see (
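As a minimal sketch of the Hilbert step, assuming the IMFs have already been obtained (e.g., with an EMD library such as PyEMD), the instantaneous frequency of one component can be computed from the analytic signal with SciPy. The single 50 Hz sine below is a stand-in for a high-frequency IMF:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(imf, fs):
    """Instantaneous frequency (Hz) of one IMF via the Hilbert transform."""
    analytic = hilbert(imf)                      # analytic signal x + j*H{x}
    phase = np.unwrap(np.angle(analytic))        # continuous instantaneous phase
    return np.diff(phase) * fs / (2.0 * np.pi)   # d(phase)/dt -> frequency in Hz

# A pure tone should recover roughly its own frequency (edge samples diverge,
# which is why the text recommends intercepting both ends of the signal).
fs = 12_000                        # 12 kHz, as in the CWRU data
t = np.arange(0, 1.0, 1 / fs)
imf = np.sin(2 * np.pi * 50 * t)   # illustrative stand-in for one IMF
f_inst = instantaneous_frequency(imf, fs)
print(round(float(np.median(f_inst)), 1))  # close to 50.0 for this clean tone
```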
The IMF components largely reflect the true characteristic information of the original signal. However, both ends of the signal exhibit divergence because EMD uses the cubic spline interpolation method, and as the decomposition deepens, this divergence extends to the entire signal and produces modal aliasing. Recent studies show that a longer signal can be selected to reduce endpoint divergence; the subsequent IMF components, with both ends of the signal intercepted, can then be selected to reduce the impact of endpoint modal aliasing.
In this paper, the IMFs and their corresponding instantaneous frequencies are selected as characteristic variable data, and eight statistical feature values are calculated from them as the feature data set: mean, max, range, standard deviation, skewness, kurtosis, coefficient of variation, and sum of squares (see Table
Eight statistical features.
Statistical feature name | Formula |
---|---|
Mean | |
Max | |
Range | |
Standard deviation | |
Skewness | |
Kurtosis | |
Coefficient of variation | |
Sum of squares | |
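The formula column of the table is not reproduced here, so the exact conventions are assumptions (e.g., the sample standard deviation with `ddof=1` and SciPy's default bias settings for skewness and kurtosis); with that caveat, the eight features can be sketched as:

```python
import numpy as np
from scipy import stats

def eight_features(x):
    """The eight statistical features, in the order listed in the text."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": x.mean(),
        "max": x.max(),
        "range": x.max() - x.min(),
        "std": x.std(ddof=1),                        # sample standard deviation
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x),
        "coef_variation": x.std(ddof=1) / x.mean(),  # std relative to the mean
        "sum_of_squares": np.sum(x ** 2),
    }

f = eight_features([1.0, 2.0, 3.0, 4.0])
print(f["mean"], f["range"], f["sum_of_squares"])  # 2.5 3.0 30.0
```

In the paper these eight values are computed for every selected IMF and for its instantaneous frequency series, and the results are concatenated into the feature vector.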
The PCA algorithm assumes that we have a collection of
First, calculate the covariance matrix
The matrix
Let
where the threshold is the desired percentage of variance retained; for instance, the threshold can be chosen as 0.99. The data set can then be approximated using the set of the first
Specifically, for each data sample
The new data set is subsequently given by
Therefore, the original data set is
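The dimensionality reduction described above — covariance matrix, eigendecomposition, and keeping the smallest number of components whose cumulative variance ratio reaches the threshold — can be sketched in NumPy as follows (the 0.99 threshold matches the text; the toy data are illustrative):

```python
import numpy as np

def pca_reduce(X, threshold=0.99):
    """Project X (n_samples x n_features) onto the first k principal
    components, where k is the smallest count of eigenvalues whose
    cumulative variance ratio reaches `threshold`."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratio, threshold)) + 1
    return Xc @ eigvecs[:, :k], k

# Toy data: one direction carries ~99.9% of the variance, so k should be 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [0.0, 0.1]])
Z, k = pca_reduce(X, threshold=0.99)
print(k, Z.shape)
```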
The basic binary SVM was initially designed to deal with two-class problems based on structural risk minimization theory. It is set up to find the best trade-off between model complexity and learning ability. However, it has been extended to multiclass problems.
The binary SVM classification method is established by constructing an optimal separating hyperplane (OSH), in order to maximize the margin between two classes of data points (see Figure
The OSH of a binary SVM.
For a nonlinear decision boundary, a kernel function is applied to transform the input from a low-dimensional space into a higher dimensional feature space, so that an optimal linear separating hyperplane can be found. Although researchers have proposed several types of kernel functions, the radial basis function (RBF) is the most widely used for solving nonlinear problems in SVM. Its definition can be described as for
To solve multiclass problems, an MSVM method is applied in the second classifier stage. Two kinds of MSVM methods are widely used: one-against-all (OAA) and one-against-one (OAO). In this paper, OAO is adopted for multiple-fault recognition. This method constructs k(k − 1)/2 binary classifiers, one for each pair of classes.
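A minimal NumPy sketch of the RBF kernel and of the OAO classifier count follows (function names are illustrative; a practical implementation would typically use an SVM library such as LIBSVM or scikit-learn, whose `SVC` uses the one-against-one scheme internally):

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """K[i, j] = exp(-gamma * ||x1_i - x2_j||^2), the RBF kernel."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negatives

def n_oao_classifiers(k):
    """One-against-one trains one binary SVM per pair of classes."""
    return k * (k - 1) // 2

X = np.array([[0.0, 0.0], [1.0, 0.0]])
print(rbf_kernel(X, X, gamma=0.5))   # diagonal entries are exactly 1
print(n_oao_classifiers(10))         # 45 classifiers for 10 fault classes
```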
The largest problem encountered in MSVM is selecting the best penalty parameter
MSVM is applied to classify the multiple-fault patterns in this paper, but the largest problem encountered is how to select the penalty parameter
The principle of
Set the PPA parameters, such as the population number, swarm size, maximum velocity, probability of adaptive mutation rate, and parameter ranges (see Table
The parameters of AMPSO.
Name | Value |
---|---|
Maximum number of iterations | 100 |
Swarm size | 20 |
| |
| |
Velocity parameters | 0.6 |
Constants | 1.5 |
Probability of adaptive mutation rate | 50% |
Randomly generate the initial particles and set their velocities; each particle's fitness value is the MSVM training accuracy obtained by the 3-fold cross-validation method,
Update the individual position and velocity of every particle. Subsequently, renew the best known position
And similarly
Then calculate each component of
PSO suffers from a "premature convergence" problem: it easily falls into a local extremum, and the other particles quickly move to this local position during the optimization process. AMPSO is used to solve this problem; it helps the algorithm escape from local optima to find better solutions elsewhere in the search space.
As can be seen in formula (
The mutation of the PSO is designed as a random operator with a certain probability
Until a termination criterion is met, which can be the number of iterations performed or the accuracy requirements being satisfied, repeat from Step
Find the global best position
Through the improved adaptive mutation particle swarm optimization algorithm and
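The AMPSO steps above can be sketched as follows, using the parameter values from the table (swarm size 20, 100 iterations, velocity parameter 0.6, constants 1.5, mutation probability 50%). Two simplifications are assumptions: the real fitness is the 3-fold cross-validation accuracy of the MSVM over (C, g), replaced here by a smooth toy function with a known peak, and the mutation operator is reduced to re-seeding a particle uniformly at random; the (C, g) search ranges are also illustrative, since the table's range rows are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(pos):
    """Stand-in for the MSVM 3-fold CV accuracy at pos = (C, g);
    a smooth toy function peaking at C = e^2, g = e^-4."""
    C, g = pos
    return -((np.log(C) - 2.0) ** 2 + (np.log(g) + 4.0) ** 2)

lo, hi = np.array([0.01, 1e-4]), np.array([100.0, 10.0])  # assumed (C, g) ranges
n_particles, n_iter, w, c1, c2, p_m = 20, 100, 0.6, 1.5, 1.5, 0.5

pos = rng.uniform(lo, hi, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_fit)].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    for i in range(n_particles):
        # adaptive mutation, simplified: with probability p_m, re-seed the
        # particle at random to escape a local optimum ("premature" fix)
        if rng.random() < p_m:
            pos[i] = rng.uniform(lo, hi)
        f = fitness(pos[i])
        if f > pbest_fit[i]:
            pbest_fit[i], pbest[i] = f, pos[i].copy()
    gbest = pbest[np.argmax(pbest_fit)].copy()

print(gbest)  # best (C, g) found by the swarm
```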
For verifying the feasibility and effectiveness of this method, the bearing dataset of Case Western Reserve University Bearing Data Center is adopted in this paper. The detailed description of the experimental apparatus is presented in Figure
Case Western Reserve University rolling bearing experiment platform.
Typical waveforms from the four conditions.
In this study, bearings operating at a rotating speed of 1797 r/min with a sampling frequency of 12 kHz under four bearing conditions were selected to evaluate the proposed method. Single-point faults with fault diameters of 0.178 mm, 0.356 mm, and 0.533 mm were tested. This gives ten fault classes; the specific data are shown in Table
Bearing fault data.
Number | Bearing condition | Diameter/mm |
---|---|---|
1 | Healthy | — |
2 | Inner-race | 0.178 |
3 | Rolling-element | 0.178 |
4 | Outer-race | 0.178 |
5 | Inner-race | 0.356 |
6 | Rolling-element | 0.356 |
7 | Outer-race | 0.356 |
8 | Inner-race | 0.533 |
9 | Rolling-element | 0.533 |
10 | Outer-race | 0.533 |
IMFs of four kinds of conditions in rolling bearing case.
Health conditions
Inner-race conditions
Rolling element conditions
Outer-race conditions
The feature space box plots of the eight features in the 10 different classes are generated from the data after EMD (see Figure
Box plots of the eight features for different classes.
Principal component analysis.
Component | Initial Eigenvalues | Percentage (%) | Cumulative (%) |
---|---|---|---|
1 | 0.5227 | 26.42 | 26.42 |
2 | 0.3128 | 15.81 | 42.23 |
3 | 0.2932 | 14.82 | 57.05 |
4 | 0.1143 | 5.78 | 62.82 |
5 | 0.1044 | 5.28 | 68.10 |
6 | 0.0857 | 4.33 | 72.44 |
7 | 0.0751 | 3.80 | 76.23 |
8 | 0.0583 | 2.94 | 79.18 |
9 | 0.0419 | 2.12 | 81.29 |
10 | 0.0389 | 1.97 | 83.26 |
11 | 0.0372 | 1.88 | 85.14 |
12 | 0.0349 | 1.76 | 86.90 |
13 | 0.0290 | 1.46 | 88.37 |
14 | 0.0280 | 1.41 | 89.78 |
15 | 0.0247 | 1.25 | 91.03 |
Principal components of PCA (Pareto).
Accuracy with
Accuracy with
From Figures
Comparison of the performance of PPA_MSVM with GV_MSVM classifiers.
Method | PCA | Best | Best | Training | Prediction |
---|---|---|---|---|---|
GV_MSVM | / | 16 | 0.25 | 98.69 | |
PCA | 8 | 0.25 | 98.21 | | |
PPA_MSVM | / | 69.89 | 0.01 | 97.62 | |
PCA | 100 | 0.01 | 96.42 | |
The average recognition accuracies of GV_MSVM and PPA_MSVM show that the proposed PPA_MSVM method plays a significant role in increasing recognition accuracy, because GV_MSVM depends strongly on the
The simulations show that the PCA-based features are somewhat less effective than the full set of EMD statistical features. Nevertheless, PCA effectively reduces the dimensionality of the original feature set and improves the efficiency of the identification operation: relying on only 15 principal component features can still effectively identify the fault type while maintaining a high accuracy rate. On the other hand, compared with the original feature set, PCA loses some information and thus reduces the recognition performance; however, this negative impact on fault identification is small and acceptable.
Table
Comparison of the performance of recognizer in different fault patterns.
Method | Fault | PCA | Training | Prediction |
---|---|---|---|---|
GV_MSVM | 4 | / | 100 | 100 |
4 | PCA | | | |
7 | / | 99.32 | 97.62 | |
7 | PCA | | | |
10 | / | 98.69 | 93.33 | |
10 | PCA | | | |
| ||||
PPA_MSVM | 4 | / | 100 | 100 |
4 | PCA | | | |
7 | / | 99.49 | 98.41 | |
7 | PCA | | | |
10 | / | 97.62 | 96.67 | |
10 | PCA | | |
The optimal values of the MSVM classifier parameters for different training samples.
Training set proportion | Training set number | Testing set number | Training | Prediction |
---|---|---|---|---|
40% | 48 | 72 | 93.33 | 86.39 |
50% | 60 | 60 | 94.5 | 93.00 |
60% | 72 | 48 | 94.72 | 90.63 |
70% | 84 | 36 | 96.42 | |
80% | 96 | 24 | 96.20 | 88.33 |
90% | 108 | 12 | 97.59 | 89.17 |
As shown by the experimental results, prediction accuracy is best when the training set size is increased to 70% of the total sample set, because the prediction accuracy is higher when the training model obtains the best parameters. However, overfitting also occurs: the prediction accuracy is not as good when the percentage of training samples is 80% or 90%.
Prediction accuracy of three different feature extraction methods.
Method | Prediction |
---|---|
EMD_MSVM | 8.33 |
PCA_MSVM | 72.50 |
MFE_MSVM | 94.50 |
The average prediction accuracies of EMD_MSVM (8.33%), PCA_MSVM (72.50%), and MFE_MSVM (94.50%) show that the feature extraction method plays an important role in improving recognition accuracy. From the results, we find that multiple faults are difficult to recognize due to their complex relations, but the result is much better after using the multiscale feature extraction (MFE) method, which decomposes the data into intrinsic mode functions by the empirical mode decomposition method, obtains the instantaneous frequency of the decomposed components by Hilbert transformation, and then utilizes statistical features and principal component analysis to extract significant information from the features.
The objective of this study is to propose a fusion approach for multiple-fault diagnosis with single and coupled faults, using multiscale feature extraction that integrates three signal processing methods (empirical mode decomposition, statistical feature extraction, and principal component analysis) in the time, frequency, and time–frequency domains, respectively. The MSVM method with particle parameter adaptive (PPA) optimization classifies the fault patterns. From this discussion, the proposed MFE_MSVM_PPA method produces the highest average classification accuracy compared with the other methods in the experiments. In addition, we analyze the influence of different factors on prediction accuracy, such as the parameter optimization method, the number of fault patterns, PCA, and the training sample size. The proposed classification method achieves high precision in multiple-fault fusion diagnosis and is proved to be a promising diagnosis approach for handling the increasing number of characteristic parameters and feature information.
This multiple-fault diagnosis approach is feasible and, as the computational results show, quite effective in improving the diagnosis of compound rolling bearing fault patterns. The approach is still immature, however: the data come from simulation rather than significant real-time testing. Because obtaining field data to validate the approach is very difficult, we generate simulated data from the original rolling bearing fault data of Case Western Reserve University and then analyze the multiple-fault pattern recognition problem.
Future work will be focused on the following aspects: (1) employing multiscale feature extraction (MFE) as the feature extraction method and comparing it with other excellent feature extraction methods; (2) comparing particle parameter adaptive (PPA) with other intelligent algorithms; (3) researching the fundamental principles of multiple-fault diagnosis; and (4) studying with real data as our next research task.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This work is financially supported by the Fundamental Research Funds for the Central Universities under Grant no. 2682016CX031 and National Natural Science Foundation of China (NSFC) under Grant no. 51175442.