The Fault Diagnosis of Rolling Bearing Based on Ensemble Empirical Mode Decomposition and Random Forest


Introduction
As common components in rotating machines, rolling bearings play an increasingly important role in industry. Faults occurring in the bearings may lead to fatal breakdowns of machines and inestimable economic losses [1]. Therefore, it is important to detect the existence and severity of a bearing fault quickly, accurately, and easily [2]. Several methods for bearing fault diagnosis have been established and developed in recent years [3].
The common methods used in fault diagnosis research fall into two types. One type identifies the status of the bearing, distinguishes between good and faulty bearings, and indicates the defective components. The other separates the bearing related signal from other components and minimizes the noise that may mask the bearing signal, especially in the early stage of a rolling element bearing fault. The first type is the widest research area and is the focus of this paper. The methods used in the past include fast Fourier transform (FFT) analysis, cepstrum analysis, the envelope spectra technique [4], the high frequency resonance technique (HFRT), time-frequency analysis, and higher order statistics. Since the signals of faulty rolling bearings are nonlinear and nonstationary, the above methods are not well suited to them. Wavelet analysis is suitable for nonlinear, nonstationary signals but has its own theoretical flaws. It lacks adaptability, since the sampling frequency and the decomposition scale of the wavelet are determined in advance instead of according to the characteristics of the signal itself. The essence of wavelet analysis is to represent the original signal using a set of wavelet basis functions. However, the wavelet basis function is also selected in advance and therefore may not represent each local feature well.
Recently, a method called Empirical Mode Decomposition (EMD), proposed by Huang et al., has offered a different approach to signal processing. Since it is based on the local characteristic time scales of a signal and can self-adaptively decompose a complicated signal into intrinsic mode functions (IMFs) [5], EMD can overcome the deficiencies of some of these methods. Since its introduction, EMD has become a popular method in signal processing, particularly in fault diagnosis of rolling element bearings. However, EMD has an open problem, mode mixing, which results from signal intermittency. Mode mixing is defined as either a single IMF containing oscillations of widely disparate scales or components of similar scale residing in different IMFs. When mode mixing occurs, an IMF can cease to have physical meaning by itself, suggesting falsely that there may be different physical processes represented in a mode [6].
Wu and Huang proposed a new method called Ensemble Empirical Mode Decomposition (EEMD) in 2009. It is an improvement of EMD that aims to eliminate the mode mixing problem automatically by applying noise-assisted analysis to EMD and to promote antialiasing decomposition. EEMD has since attracted widespread attention and has been proven to be better than EMD in decomposing vibration signals of rotating machinery [7,8]. EEMD is a tool for decomposing nonstationary signals; it can also be used to extract features and to reveal the signal characteristic information accurately based on IMFs. When different types of faults occur in the same part of the rolling bearing, the signal energy in different frequency bands changes, so the fault degree can be judged by calculating the energy entropy of different IMFs. Ozgonenel et al. [9] investigated the performance of EEMD and compared it with classical EMD for feature vector extraction. Zhang et al. [10] extracted two types of features, singular values and AR model parameters, based on EEMD and then fed these features to a particle swarm optimization support vector machine to diagnose faults of rolling element bearings. An et al. [11] used EEMD and the Hilbert transform to extract the fault features of bearing pedestal looseness of a wind turbine effectively.
Commonly, a classification method is used for fault identification after feature extraction. The classification methods used in previous studies include decision trees, neural networks, the nearest neighbor algorithm, naive Bayes, and support vector machines. However, a single classifier often suffers from low accuracy or overfitting. Random forest (RF), introduced by Breiman, is an integrated tree classifier composed of several decision trees. It is an ensemble method which uses recursive partitioning to generate many trees and then aggregates the results [12]. For continuous targets the average value over all individual trees is output; for classification, as in this paper, the class with the most votes is output.
This paper proposes a new method combining EEMD and RF for the diagnosis of rolling bearings. First, the original signal is decomposed into a number of IMFs with EMD and, separately, with EEMD. The IMFs produced by the two methods are compared to show the superior performance of EEMD with respect to mode mixing; a subset of the IMFs produced by EEMD is then selected and their energy is calculated. For comparison, the signal is also decomposed by wavelet into the same number of levels as the number of components produced by EEMD. Finally, the energy entropy is used as the feature of the signals to identify the fault state by RF.
The paper is organized as follows. Section 2 introduces the EEMD method and energy entropy. Section 3 explains RF. Section 4 presents the results of the empirical analysis. Finally, the conclusion is drawn in Section 5.

EEMD and Energy Entropy
2.1. EEMD. EEMD is an improvement of EMD, a method that smooths the signal and decomposes fluctuations and trends of different scales into a series of intrinsic mode functions (IMFs). An IMF should satisfy the following two conditions. (1) The number of extrema and the number of zero-crossings must either be equal or differ by at most one over the whole data set. (2) At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
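Condition (1) can be checked directly on a sampled signal. The following is a minimal Python sketch, not part of the original method description; the function names are hypothetical, and the envelope condition (2) is not checked here.

```python
import math

def count_extrema(x):
    """Count strict interior local maxima and minima of a sampled signal."""
    return sum(
        1
        for k in range(1, len(x) - 1)
        if (x[k] - x[k - 1]) * (x[k + 1] - x[k]) < 0
    )

def count_zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)

def meets_count_condition(x):
    """Condition (1): extrema and zero-crossing counts differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

# A zero-mean sine behaves like an IMF; the same sine riding on an offset does not,
# because it keeps its extrema but loses all of its zero-crossings.
sine = [math.sin(2 * math.pi * (k + 0.3) / 100) for k in range(1000)]
shifted = [v + 2.0 for v in sine]
print(meets_count_condition(sine), meets_count_condition(shifted))
```

The shifted sine illustrates why a trend must be removed by sifting before a component can qualify as an IMF.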
The EMD algorithm is described as follows:
(1) Define $i = 0$ and $r_0(t) = x(t)$.
(2) Find all the maxima and minima of $x(t)$.
(3) Connect all the local maxima (resp., minima) with a cubic spline line to form the upper envelope $x_{\max}(t)$ (resp., the lower envelope $x_{\min}(t)$).
(4) Calculate the mean value of the upper and lower envelopes point by point; that is, $$m(t) = \frac{x_{\max}(t) + x_{\min}(t)}{2}.$$
(5) Let $$h_1(t) = x(t) - m(t)$$ and replace $x(t)$ by $h_1(t)$.
(6) Repeat steps (2)-(5) until $h_1(t)$ satisfies the two IMF conditions; then set $i = i + 1$ and take $c_i(t) = h_1(t)$ as the $i$th IMF.
(7) Calculate the residual $r_i(t) = r_{i-1}(t) - c_i(t)$ and replace $x(t)$ by $r_i(t)$.
(8) Repeat steps (2)-(7) until the residual $r_n(t)$ is a monotonic function or a constant; then the decomposition ends, and we get $n$ IMFs and one trend term, so that the original signal can be written as $$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t).$$
As for EEMD, its essential characteristic is the addition of finite-amplitude white noise to the investigated signal. EEMD is a more mature tool for nonlinear and nonstationary signal processing. The principle of EEMD is simple: the added white noise populates the whole time-frequency space uniformly, facilitating a natural separation of the frequency scales, which reduces the occurrence of mode mixing. When uniformly distributed white noise is added, the signal regions of different scales are automatically mapped onto the appropriate scales established by the white noise background. Naturally, each individual trial may produce noisy results, because each decomposition contains both the signal itself and the added noise. Since the noise is different in each trial, it cancels out in the ensemble mean when enough trials are used. Ultimately, the only persistent part is the signal itself. The specific process of EEMD is as follows.
(1) Add a white noise series $w_i(t)$, whose mean is 0 and standard deviation is 1, to the target signal $x(t)$ to construct a hybrid sequence $x_i(t)$: $$x_i(t) = x(t) + w_i(t),$$ where $w_i(t)$ is the $i$th added white noise series, $i = 1, 2, \ldots, N$, and $N$ is the initialized number of ensembles.
(2) Decompose $x_i(t)$ into IMFs with EMD: $$x_i(t) = \sum_{j=1}^{n} c_{ij}(t) + r_i(t),$$ where $c_{ij}(t)$ is the $j$th IMF of the $i$th trial and $r_i(t)$ is the corresponding trend term.
(3) Repeat step (1) and step (2) $N$ times with a different white noise series each time to obtain an ensemble of IMFs: $$\{c_{1j}(t)\}, \{c_{2j}(t)\}, \ldots, \{c_{Nj}(t)\}, \quad j = 1, 2, \ldots, n.$$
(4) Calculate the ensemble means of the corresponding IMFs of the decompositions as the final result: $$c_j(t) = \frac{1}{N} \sum_{i=1}^{N} c_{ij}(t).$$
Furthermore, some studies concluded that added noise may help extract more proper components from the original data. Wu and Huang (2009) gave a well-established statistical rule for controlling the effect of the added noise: $$\varepsilon_n = \frac{\varepsilon}{\sqrt{N}},$$ where $N$ is the ensemble number, $\varepsilon$ is the amplitude of the added white noise, and $\varepsilon_n$ is the standard deviation of the error, which is defined as the difference between the input signal and the corresponding IMFs. Previous work has suggested that an ensemble with a few hundred members and added white noise with a fixed amplitude of 0.2 times the standard deviation of the original signal leads to an accurate result. However, no previous study has presented a method for determining the best amplitude of the added white noise. We set the number of ensemble members to 100 and select the optimal standard deviation of the white noise series from the range 0.1 to 0.2 using the $k$-fold cross-validation method [13].
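The sifting loop of EMD and the ensemble averaging of EEMD can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in this paper: envelopes are piecewise linear instead of cubic splines, a fixed number of sifting passes replaces a stopping criterion, the noise amplitude is expressed relative to the standard deviation of the signal, and all function and parameter names are hypothetical.

```python
import math
import random

def envelope(x, knots):
    """Piecewise-linear envelope through x[knots], held constant at both ends."""
    env = [x[knots[0]]] * len(x)
    for a, b in zip(knots, knots[1:]):
        for k in range(a, b + 1):
            w = (k - a) / (b - a)
            env[k] = (1 - w) * x[a] + w * x[b]
    for k in range(knots[-1], len(x)):
        env[k] = x[knots[-1]]
    return env

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    inner = range(1, len(x) - 1)
    maxima = [k for k in inner if x[k - 1] < x[k] > x[k + 1]]
    minima = [k for k in inner if x[k - 1] > x[k] < x[k + 1]]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: x is treated as the trend
    upper, lower = envelope(x, maxima), envelope(x, minima)
    return [v - 0.5 * (u + l) for v, u, l in zip(x, upper, lower)]

def emd(x, max_imfs=4, n_sifts=8):
    """Extract up to max_imfs components; whatever is left is the residual."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        h, sifted = residual, False
        for _ in range(n_sifts):
            nxt = sift_once(h)
            if nxt is None:
                break
            h, sifted = nxt, True
        if not sifted:
            break
        imfs.append(h)
        residual = [r - c for r, c in zip(residual, h)]
    return imfs, residual

def eemd(x, n_trials=20, noise_amp=0.2, max_imfs=4, seed=0):
    """Average the EMD components of n_trials noisy copies of x."""
    rng = random.Random(seed)
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    acc = [[0.0] * n for _ in range(max_imfs + 1)]  # last row holds the residual
    for _ in range(n_trials):
        noisy = [v + noise_amp * sd * rng.gauss(0.0, 1.0) for v in x]
        imfs, residual = emd(noisy, max_imfs)
        for row, comp in zip(acc, imfs):
            for k in range(n):
                row[k] += comp[k]
        for k in range(n):
            acc[-1][k] += residual[k]
    avg = [[v / n_trials for v in row] for row in acc]
    return avg[:-1], avg[-1]

# Hypothetical two-tone test signal.
signal = [math.sin(2 * math.pi * k / 50) + 0.5 * math.sin(2 * math.pi * k / 8)
          for k in range(400)]
imfs, trend = eemd(signal)
```

Because every trial reconstructs its own noisy copy exactly, the sum of the averaged components differs from the original signal only by the mean of the added noise, which shrinks with the number of trials. Under the rule of Wu and Huang, the residual noise level is the noise amplitude divided by the square root of the ensemble number; an amplitude of 0.2 with 100 ensemble members gives 0.2/10 = 0.02.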

Energy Entropy.
When different parts of the rolling bearing fail, the frequency distribution of the vibration signal changes, and the energy of the fault vibration signal changes as well [14]. Generally, the energy of a normal bearing signal is distributed more evenly and with more uncertainty, so its energy entropy is larger than that of a faulty bearing. Hence, the EEMD energy entropy can be used to specify whether the bearing has faults or not. Since only some IMFs are beneficial for fault diagnosis, we select effective IMFs before calculating the energy entropy. The correlation coefficient in statistics can be used for this analysis. It represents the degree of correlation between sequences and hence can be used to select the proper components. Assume the number of effective IMFs is $m$. The energy entropy of the IMFs is calculated by the following steps:
(1) Calculate the energy of the $i$th IMF: $$E_i = \sum_{k=1}^{K} |c_i(k)|^2,$$ where $K$ is the length of an IMF.
(2) Calculate the total energy of these $m$ effective IMFs: $$E = \sum_{i=1}^{m} E_i.$$
(3) Calculate the energy entropy of the IMFs: $$H_{\mathrm{en}} = -\sum_{i=1}^{m} p_i \log p_i,$$ where $H_{\mathrm{en}}$ is the energy entropy of the whole original signal and $p_i = E_i / E$ is the percentage of the energy of IMF number $i$ relative to the total energy.
(4) Thus, the feature vector can be constructed: $$[H_{\mathrm{enIMF}_1}, H_{\mathrm{enIMF}_2}, \ldots, H_{\mathrm{enIMF}_m}],$$ where $H_{\mathrm{enIMF}_i}$ is the energy entropy of IMF number $i$.
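The steps above can be sketched in Python as follows. This is an illustrative sketch only: it assumes that the entry of the feature vector for each IMF is that IMF's term of the entropy sum (minus the energy share times its logarithm), that the natural logarithm is used, and the function names are hypothetical.

```python
import math

def imf_energies(imfs):
    """Energy of each IMF: the sum of its squared samples."""
    return [sum(v * v for v in imf) for imf in imfs]

def energy_entropy_terms(imfs):
    """Per-IMF entropy terms -p * log(p), with p the IMF's share of total energy."""
    energies = imf_energies(imfs)
    total = sum(energies)
    shares = [e / total for e in energies]
    return [-p * math.log(p) if p > 0 else 0.0 for p in shares]

def energy_entropy(imfs):
    """Energy entropy of the whole signal: the sum of the per-IMF terms."""
    return sum(energy_entropy_terms(imfs))

# Energy spread evenly over two components versus energy concentrated in one.
even = [[1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0]]
concentrated = [[3.0, -3.0, 3.0, -3.0], [0.1, 0.1, -0.1, -0.1]]
print(energy_entropy(even), energy_entropy(concentrated))
```

The evenly distributed case gives the larger entropy, which mirrors the argument that a normal bearing has a larger energy entropy than a faulty one.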

Random Forest
Random Forest is a kind of integrated tree classifier [15,16].
The flow chart of Random Forest is shown in Figure 1; the detailed algorithm is as follows.
(1) Using the bootstrap resampling technique, take a random sample of $n$ observations with replacement from the complete set of $n$ observations. This sample is used to build a new tree. About 2/3 of the observations will be selected at least once (some of them repeatedly), and the rest will not be chosen. The remaining cases, about 1/3, are called "out of bag" (OOB). A new random selection of cases is performed for each constructed tree.
(2) Construct a tree (to the maximum size and without pruning) using the cases selected in the previous step. During this process, consider only a subset of the total set of predictor variables every time a node needs to be split. This subset is selected at random from the total set of available predictor variables, and a new random selection is performed for each split. For each split, some predictor variables (possibly including the best one) are therefore not considered, but a predictor variable excluded in one split can be used by other splits in the same tree.
(3) Repeat steps (1) and (2) to construct a forest, that is, a collection of trees.
(4) To score a case, run it through each tree in the forest and record the predicted value. Use the predicted category of each tree as a "vote" for the best class, and use the class with the most votes as the predicted class.
When the above algorithm is used to construct an RF, about 1/3 of the cases are excluded from each tree in the forest. These cases are called "out of bag" (OOB); each tree has a different set of OOB cases. The OOB cases are not used to construct the tree and therefore constitute a separate test sample for it. The OOB cases for each tree are run through the tree and the error rate of the prediction is computed, to measure the generalization error of the forest. The error rates for the trees in the forest are then averaged to give the overall generalization error rate of the decision tree forest model.
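The "about 1/3" figure can be checked numerically. For a bootstrap sample of size $n$, the probability that a given case is never drawn is $(1 - 1/n)^n$, which approaches $e^{-1} \approx 0.368$. The following Python sketch simulates this for 168 cases, the number of samples used later in this paper; the function names are hypothetical.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n case indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def oob_indices(n, sample):
    """Cases that were never drawn into the bootstrap sample."""
    chosen = set(sample)
    return [i for i in range(n) if i not in chosen]

rng = random.Random(0)
n_cases = 168
fractions = [
    len(oob_indices(n_cases, bootstrap_indices(n_cases, rng))) / n_cases
    for _ in range(1000)
]
mean_oob = sum(fractions) / len(fractions)
print(round(mean_oob, 3))
```

Each simulated tree gets its own OOB set, so every case is out of bag for roughly a third of the trees and can be scored by those trees alone.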
RF has two random elements: (1) the selection of the data set used as input for each tree; (2) the set of predictor variables considered as candidates for each node split. These randomisations, along with combining the predictions from the trees, significantly improve the overall predictive accuracy. This is a significant advantage in engineering practice [17]. Another point of interest is the randomly selected subset of features. It makes the structure of the tree less exhaustive and less greedy and increases the chance that weak features enter the tree and combine with other features. Thus, the local characteristics of each sample can be magnified and the probability of a wrong judgement caused by information loss can be reduced. Together, the votes of all trees give a comprehensive assessment of a sample [18].
RF uses the Gini index as the criterion for selecting the optimal split; it measures the impurity of a given element with respect to the remaining classes (Breiman et al., 1984). Each decision tree then grows to its maximum depth (without pruning) using a given combination of features [19].
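The pieces described so far (bootstrap sampling, a random subset of predictors at each split, Gini-based split selection, unpruned trees, and majority voting) can be combined into a compact Python sketch. It is a toy illustration under stated assumptions, not the classifier used in this paper: all names and the example data are hypothetical, class labels must not be tuples, and a node with no usable split among the sampled predictors becomes a majority-vote leaf.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity: one minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return (weighted Gini, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = 0.5 * (lo + hi)
            left = [y[i] for i, row in enumerate(X) if row[f] <= thr]
            right = [y[i] for i, row in enumerate(X) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

def grow_tree(X, y, n_try, rng):
    """Grow an unpruned tree; internal nodes are tuples, leaves are labels."""
    if len(set(y)) == 1:
        return y[0]
    features = rng.sample(range(len(X[0])), n_try)  # random predictor subset
    split = best_split(X, y, features)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    _, f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            grow_tree([X[i] for i in left], [y[i] for i in left], n_try, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], n_try, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def grow_forest(X, y, n_trees=25, n_try=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_try, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature data with two well separated states.
X = [[1.8 + 0.04 * i, 2.2 - 0.04 * i] for i in range(10)] + \
    [[0.3 + 0.04 * i, 0.7 - 0.04 * i] for i in range(10)]
y = ["normal"] * 10 + ["fault"] * 10
forest = grow_forest(X, y)
print(predict_forest(forest, [2.0, 2.0]), predict_forest(forest, [0.5, 0.5]))
```

The exhaustive threshold search is quadratic in the node size, which is acceptable for a toy example but not for large data sets.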
RF can handle data with many features and identify which features are more important in the classification. In addition, it is resistant to overfitting. Its generalization ability is strong, and an unbiased estimate of the generalization error is obtained while the forest is being built. For unbalanced data sets, RF can balance the error. Even if a large part of the feature values is missing, it can still maintain high accuracy [20][21][22].

Application
The flow chart of the fault diagnosis in this paper is shown in Figure 2.

Empirical Data.
In this paper, all the roller bearing vibration data are from the website of the Case Western Reserve University bearing data center [23]. The website provides access to ball bearing test data for normal and faulty bearings. Experiments were conducted using a 2 hp Reliance Electric motor, and acceleration data were measured at locations near to and remote from the motor bearings. Motor bearings were seeded with faults using electro-discharge machining (EDM). Faults ranging from 0.007 inches to 0.040 inches in diameter were introduced separately at the inner raceway, rolling element (i.e., ball), and outer raceway. Faulted bearings were reinstalled into the test motor, and vibration data were recorded for motor loads of 0 to 3 horsepower (motor speeds of 1797 to 1720 RPM). Data were collected for normal bearings, single-point drive end (DE) defects, and fan end (FE) defects.
The data we used were generated with a motor speed of about 1750 RPM and a fault diameter of 0.014 inches. Each working state contains 24 data samples, and each sample has 5000 data points.
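Cutting one vibration record into 24 samples of 5000 points can be sketched as below. The paper does not state how the samples were cut from the records; consecutive, non-overlapping segments are an assumption made here for illustration, and the function name and the stand-in record are hypothetical.

```python
def split_into_samples(record, n_samples=24, sample_len=5000):
    """Cut one vibration record into n_samples non-overlapping segments."""
    needed = n_samples * sample_len
    if len(record) < needed:
        raise ValueError(f"record has {len(record)} points, {needed} are needed")
    return [record[i * sample_len:(i + 1) * sample_len] for i in range(n_samples)]

# Stand-in for one acceleration record of a single working state.
record = [float(k) for k in range(120000)]
samples = split_into_samples(record)
print(len(samples), len(samples[0]))
```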
Figure 3 shows the time-domain waveforms of the vibration signals for the first sample of each of the seven working states. Each waveform has its own characteristics, but we need to distinguish them quantitatively. The next step is feature extraction.
The idea of this paper is to decompose the signal first and then extract the characteristics of each component. The commonly used decomposition methods are wavelet and EMD, but they suffer from unclear physical meaning and from mode mixing, which would also blur the characteristics of the components. EEMD performs better than EMD with respect to mode mixing.

Comparison Analysis.
Considering that the number of samples is large, we only use the first sample of DEBall to demonstrate the process of comparison and feature extraction. Figures 4, 5, and 6 show partial details or IMFs decomposed by wavelet, EMD, and EEMD, respectively. The subgraphs in Figure 4 do not show significant physical meaning because the wavelet basis function is predetermined and not suited to the signal. Each subgraph contains more than one frequency component, showing obvious mode mixing. The components in Figure 5 are better than those in Figure 4, but IMF 6 and IMF 8 still contain more than one regular frequency and show the mode mixing phenomenon of EMD. Finally, the corresponding IMFs in Figure 6 clearly eliminate the mixed frequencies. This shows that EEMD can avoid mode mixing, because it separates high frequency and low frequency components clearly and extracts the meaningful signal sufficiently. It also shows that EEMD maintains adaptability in signal decomposition.
The correlation coefficients between each IMF and the original signal are listed in Table 1, which shows that the first six IMFs are the most strongly correlated with the original signal: these six correlation coefficients are greater than 0.01, so the corresponding IMFs can be chosen as effective IMFs. All 168 samples are processed in the same way, 6 IMFs are chosen for each sample, and their energy entropy is then calculated.
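Selecting effective IMFs by correlation coefficient can be sketched in Python as follows. The 0.01 threshold is the value quoted above; the use of the signed Pearson coefficient, the function names, and the synthetic components are assumptions made for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / math.sqrt(var_a * var_b)

def select_effective_imfs(signal, imfs, threshold=0.01):
    """Return (index, coefficient) for IMFs whose correlation exceeds the threshold."""
    scores = [(j, pearson(signal, imf)) for j, imf in enumerate(imfs)]
    return [(j, r) for j, r in scores if r > threshold]

# Synthetic check: two components that make up the signal and one unrelated one.
fast = [math.sin(2 * math.pi * k / 10) for k in range(1000)]
slow = [0.5 * math.sin(2 * math.pi * k / 100) for k in range(1000)]
unrelated = [math.cos(2 * math.pi * k / 25) for k in range(1000)]
signal = [f + s for f, s in zip(fast, slow)]
selected = select_effective_imfs(signal, [fast, slow, unrelated])
print([j for j, _ in selected])
```

Only the two components that actually contribute to the signal pass the threshold; the unrelated one has a coefficient close to zero.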
According to the calculation method in Section 2.2, the results are shown in Table 2, which lists the energy entropy of the first sample of each working state. It shows that the energy entropy of the normal state is indeed larger than that of the fault states. When the bearing fails, a corresponding resonance frequency appears in a specific frequency band. Energy is then concentrated in this band, so the uncertainty of the energy distribution is reduced and, finally, the entropy is reduced.
After the calculation of energy entropy is completed, a 168 × 6 matrix of energy entropy is obtained. Before the classification, we want to explore an interesting question: can the proposed EEMD method extract the characteristics of the signal more accurately than a classical method such as wavelet? To answer it, we also decompose each sample with wavelet. To keep the procedure consistent with that of EEMD and feature extraction, we use the wavelet to decompose the signal in each sample into 12 levels and calculate the energy entropy using the first six levels. Finally, another 168 × 6 matrix based on wavelet decomposition is obtained. Random Forest classification is then carried out, with the number of trees set to 10000.
First, we put the wavelet-based feature matrix into the RF classifier. The result shows that the OOB estimate of the error rate is 1.2%, which means that two observations are misclassified. Detailed classification results are shown in Table 3. Rows and columns in the table are the true results and the predicted results, respectively. The first misclassified observation is in the second row; it originally belongs to DEIR but is identified as FEBall, which means that different components and different types of faults are confused. The second misclassified observation is in the third row; it originally belongs to DEBall and is wrongly identified as DEIR. The last column is the error rate, which indicates the proportion of each true class that is falsely classified; the error rates of DEIR and DEBall are both 0.042.
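The reported rates follow directly from the counts. The short Python sketch below reproduces the arithmetic for the two rows described above, with 24 samples per state and 7 × 24 = 168 samples in total; the dictionary layout and the function name are hypothetical, and the remaining rows of the table are not reproduced.

```python
def class_error_rate(row, true_label):
    """Share of a true class's observations that were assigned another label."""
    total = sum(row.values())
    return (total - row.get(true_label, 0)) / total

# Two rows consistent with the description in the text (24 samples per state).
confusion = {
    "DEIR": {"DEIR": 23, "FEBall": 1},
    "DEBall": {"DEBall": 23, "DEIR": 1},
}
rates = {label: class_error_rate(row, label) for label, row in confusion.items()}
overall = 2 / (7 * 24)  # two misclassified observations out of 168
print({k: round(v, 3) for k, v in rates.items()}, round(100 * overall, 1))
```

One misclassified sample out of 24 gives 1/24 ≈ 0.042 per class, and two out of 168 give an overall rate of about 1.2%.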
Secondly, we put the EEMD-based feature matrix into the classifier. The result shows that the OOB estimate of the error rate is 0.60%, which means that only one observation is misclassified. The confusion matrix of this classification is shown in Table 4. The misclassified observation originally belongs to DEBall but is identified as DEOUTR. The false classification ratio of DEBall is 0.042. Table 5 shows the effect of each energy entropy feature on the accuracy and the Gini index for the DE defects. The greater the value, the greater the effect. We can conclude that $H_{\mathrm{enIMF}_4}$ has the greatest effect on both the accuracy and the Gini index, meaning that it is the most important feature for the classifier. This indicates that IMF 4 contains the features that best represent the original signal.
Since the samples of the seven working states are collected in chronological order, their characteristics may be affected by their own trend. Therefore, 5-fold cross-validation is performed on the energy entropy matrix. We randomly divide the samples of each working state into five parts and then reassemble them into five subsets. In the 5-fold cross-validation, one subset is used as the test set and the rest are used as the training set. The procedure is repeated 5 times until all subsets have been tested, and the average recognition rate and error rate are then calculated as the result. The errors of the training set and the test set are both 0. Therefore, the classification error rate has reached a quite low level; that is, the classification result is accurate.
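The splitting scheme described above, in which each working state is divided into five random parts that are then reassembled into five subsets, corresponds to a stratified 5-fold split. A minimal Python sketch follows; the function names and the integer state labels are hypothetical.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, shuffling within each class first."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Seven working states with 24 samples each, labelled 0..6 for illustration.
labels = [state for state in range(7) for _ in range(24)]
folds = stratified_folds(labels)
print([len(fold) for fold in folds])
```

Since 24 is not divisible by 5, one fold receives four samples per state and the others five, so every fold still contains all seven working states.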

Conclusions and Discussion
A fault diagnosis method based on EEMD and RF is put forward in this paper. The EEMD method is suitable for analyzing complex multicomponent signals. Because the vibration signal is nonlinear and nonstationary, EEMD is chosen to preprocess the vibration signal of the roller bearing and to produce a set of IMF components. EMD and wavelet are also applied, to highlight the advantages of EEMD in dealing with nonlinear signals and mode mixing. In this paper, the energy entropy is introduced and the EEMD method is combined with RF. We calculate the energy entropy of the IMF components and take them as the inputs of an RF classifier. As a comparison, we replaced EEMD with wavelet and then repeated the process of feature extraction and classification. The results on the real data show that the classification accuracy based on wavelet is high, but that of the EEMD based method is higher; that is, the EEMD method can extract the signal features effectively. In addition, 5-fold cross-validation is performed for the EEMD based method, and the errors of the training set and the test set are both small. This also shows that the RF classification is accurate.
EEMD is a good choice when the signal needs time-frequency analysis, especially when the signal is nonlinear and nonstationary. Since it keeps the advantages of EMD and avoids mode mixing, it can accurately capture the features of the signal. Its applications can be seen in different areas, such as gears, electricity, weather, and medicine. Beyond fault diagnosis, RF can also be used in corporate credit assessment, document retrieval, medical diagnostics, image recognition, and so on.

Figure 1: The flow chart of RF.

Figure 2: Fault diagnosis based on EEMD and RF and comparison with wavelet.

4.3. Feature Extraction and Classification. The result of EEMD decomposition has 11 IMFs and one trend term.

Table 1: The correlation coefficient between each IMF and unresolved signals.

Table 2: Energy entropy of each bearing state for the first sample.

Table 3: Random Forest classification results of wavelet matrix.

Table 4: Random Forest classification results of EEMD matrix.

Table 5: The effect of energy entropy on accuracy and Gini index.