Bearing Fault Diagnosis Using a Novel Classifier Ensemble Based on Lifting Wavelet Packet Transforms and Sample Entropy

To improve fault detection accuracy for rolling bearings, an automated fault diagnosis system is presented based on the lifting wavelet packet transform (LWPT), sample entropy (SampEn), and a classifier ensemble. Bearing vibration signals are first decomposed into different frequency subbands through a three-level LWPT, resulting in a total of 8 frequency-band signals at the third layer of the LWPT decomposition tree. The SampEns of all 8 components are then calculated to form the feature vector. Such a feature extraction paradigm is expected to depict the complexity, irregularity, and nonstationarity of bearing vibrations. Moreover, a novel classifier ensemble is proposed to alleviate the effect of initial parameters on the performance of member classifiers and to improve classification effectiveness. Experiments were conducted on electric motor bearings considering various sets of fault categories and fault severity levels. Experimental results demonstrate that the proposed diagnosis system can effectively improve bearing fault recognition accuracy and stability in comparison with diagnosis methods based on a single classifier.


Introduction
Rolling element bearings are among the most critical components in various machines, and their faults are the main causes of breakdowns in rotating machinery. It was reported that rolling bearing faults account for 45-55% of asynchronous motor failures. A variety of fault diagnosis methods have been developed and exploited effectively to detect bearing faults at an early stage, for the purpose of keeping machinery performing at its best and avoiding unplanned downtime and economic loss. In order for the large machines used in current industry to operate in a safe and efficient mode, a large number of sensors, possibly up to several thousand, are employed to collect dynamical signals [1,2]. The amount of signals to be processed is so vast that automated fault diagnosis systems must be used instead of manual analysis. Vibrations emitted from industrial machinery such as asynchronous motors usually contain signatures of multiple sources and are affected by operation parameters including speed and load. Accordingly, bearing fault diagnosis is not a trivial task in terms of signal processing and fault identification. As an antecedent step of machine prognostics and health management (PHM), it needs not only to find the faulty bearings but also to locate the faulty components, as different fault locations follow different fault development modes. As such, the objective of the present work is to identify bearing health condition and locate faulty bearing components, with emphasis on feature extraction and faulty component recognition.
When local faults such as cracks, pitting, and indentations occur in bearings, the fault signature is represented by repeating impulses in the vibrations. The interval and intensity of the impulses vary with speed or load fluctuation and with slipping between bearing parts. As such, bearing vibrations can be considered nonstationary. Various methods have been employed to deal with the nonstationary characteristics of vibration signals for fault diagnosis of rolling bearings [3,4]. For nonstationary signals, it is desirable to examine how their energies vary with time and frequency. Such a demand impels the development of time-frequency or time-scale signal processing methods, among which the wavelet packet transform (WPT) has proven effective in feature extraction and has been exploited for fault diagnosis of rolling bearings [5]. Selection of the wavelet basis has a significant effect on the results of the wavelet transform. During the past decades, many methods for constructing wavelet bases have been proposed, providing a rich variety of wavelet functions for fault diagnosis in practice. These traditional wavelet functions were normally constructed by Fourier transform in the frequency domain, and hence the traditional wavelet transform is also known as the first-generation wavelet transform [6]. The WPT of a vibration signal results in a set of frequency-band signals located in independent frequency bands by means of an orthogonal or biorthogonal wavelet packet basis. The frequency-band signals are a multiscale representation of the original signal and are able to highlight the information related to the health condition of machinery. As the frequency-band signals have the same length as the original signal, it is necessary to extract some features to represent each frequency-band signal, eventually yielding a feature vector of the original signal. In [7], the energies of the frequency-band signals at the bottom layer of the WPT decomposition tree are extracted as a feature vector to depict bearing vibrations. Reference [8] extracts the standard deviations (STD) of the WPT coefficients as features for gear vibrations, where the experimental results indicate that the STD features lead the neural networks to converge more rapidly than the aforementioned energy features.
Although feature extraction methods based on WPT preprocessing have reported considerable success in those works, more attention should be paid to the nonlinear information arising from factors such as discontinuous stiffness, damping, surface friction, and impacts in defective bearings. The nonlinearity enriched by the presence of faults renders traditional extraction methods, which assume a linear system, less effective [9]. As such, it is important to extract nonlinear features for bearing fault diagnosis. With the development of nonlinear theory, many nonlinear dynamic parameters have found applications in fault diagnosis. Those nonlinear dynamic parameters, fractal dimension (FrD) for example, effectively describe the irregularity and complexity of vibration signals and reflect changes in the health condition of mechanical systems. Reference [10] decomposes vibration signals by WPT and utilizes FrD as a parameter to depict the irregularity and complexity of each frequency-band signal. The combined use of WPT and FrD can characterize not only the nonstationarity but also the irregularity and complexity of vibration signals.
In order to further improve the application of the WPT and nonlinear dynamic parameters, two items deserve more attention. On the one hand, the aforementioned feature extraction methods are all based on first-generation wavelet packet transform preprocessing. However, the WPT has a limited number of wavelet functions, and the adaptive construction of a wavelet basis is difficult in practice. The lifting wavelet packet transform (LWPT), also known as the second-generation wavelet packet transform, is an alternative in which the wavelet function is constructed by means of the lifting scheme proposed by Sweldens. The wavelet function construction is no longer based on the Fourier transform but is carried out completely in the time domain [11]. The LWPT therefore has several merits over the traditional WPT, including flexibility of wavelet function construction and lower computational effort and memory requirements. As such, the LWPT is well regarded in mechanical fault diagnosis. On the other hand, the calculation of some nonlinear dynamic parameters such as FrD requires a long noise-free data set, which is not suitable for online diagnosis and is difficult to obtain, particularly under nonstationary conditions. Owing to such shortcomings, approximate entropy (ApEn) was proposed to assess the regularity of time series by statistical means and has been applied to physiological signals and vibration signals [12]. Sample entropy (SampEn), proposed by Richman and Moorman, is a modified version of ApEn [13,14]. In comparison with ApEn, SampEn is less dependent on data length and is to a certain extent robust to noise. Therefore, SampEn can reliably reflect the complexity and irregularity of signals and has found wide application in biomedical signal processing [15,16]. Motivated by the similarities between mechanical vibration signals and biomedical signals, SampEn is expected to effectively describe the complexity and irregularity of bearing vibrations. For these reasons, the present study investigates the joint use of LWPT and SampEn for feature extraction in bearing fault diagnosis.
In order to reduce manual intervention and human subjectivity, signals are analyzed automatically in the form of intelligent diagnosis [17]. In recent years, artificial neural networks (ANNs) have been widely used in intelligent fault diagnosis to conduct pattern classification. The performance of a single neural network is usually affected by initial parameters such as the weights and the number of nodes in the hidden layer, and thus its recognition accuracy is unstable [18]. To address this problem, many multiple classifier fusion methods have been applied in the field of pattern recognition. Multiple classifier fusion harnesses the advantages of different neural networks and avoids the shortcomings of a single neural network. Reference [19] utilizes a multiclassifier fusion consisting of seven different classifiers combined by the majority voting scheme (MVS) to classify four different patterns, which achieves significantly higher classification accuracy than a single neural network. When more than one class won the highest number of votes, the classification decision was made by comparing the sums of the posterior probabilities of each class. Reference [20] adopts the result of the member classifier with the highest recognition rate to handle the case in which more than one class gets the highest number of votes. All of the above multiclassifier fusion methods can effectively improve the recognition accuracy in comparison with diagnosis based on a single classifier. However, these methods not only employ a large number of classifiers, which increases the computational burden and decreases recognition accuracy, but also fail to fully resolve the problem that the MVS becomes invalid when more than one class wins the highest number of votes. For these reasons, the present study investigates a multiclassifier fusion algorithm in the form of a binary tree for fault classification, where a multiclass problem is turned into a series of binary classification problems.
Based on the aforementioned discussions, a new approach is proposed for efficient bearing fault diagnosis by combining the LWPT, SampEn, and a binary tree structure based classifier ensemble. The paper is organized as follows. Section 2 introduces the theoretical background of the LWPT and SampEn and presents the proposed binary tree structure based classifier ensemble. Section 3 shows the architecture of the proposed fault diagnosis system. The experimental setup is described in Section 4. In Section 5, the experimental results and discussions are given, followed by the conclusions in Section 6.

Theoretical Background
2.1. Lifting Wavelet Packet Transform. The wavelet packet transform can be implemented using the lifting scheme in an easily understood and efficient way [21]. The wavelet basis is determined by the prediction operator and the update operator. Selecting a different prediction operator $P = [p_1, p_2, \ldots, p_N]$ and update operator $U = [u_1, u_2, \ldots, u_M]$ is equivalent to choosing a different wavelet function, which yields a different signal decomposition. The decomposition process of the LWPT consists principally of three steps: split, predict, and update, as shown in Figure 1.
In the update step, a designed update operator is applied to the detail coefficients obtained in the previous step to update the even samples, which enables them to maintain global properties of the original signal $x(k)$, such as the energy, the mean, or the vanishing moments. The update operator $U = [u_1, u_2, \ldots, u_M]$ is applied to the detail coefficients $d$ resulting from the prediction step, and the result is added to the even samples $x_e(k)$ as expressed in (3), where $M$, an even number, is the length of the update operator. That is, $M$ detail coefficients are used to update each even sample, and the resulting $a = \{a(k),\ k = 0, 1, \ldots, \lceil L/2 \rceil\}$, with $L$ the length of $x(k)$, is defined as the approximation coefficients of the original signal $x(k)$.
The above three steps complete the first-level decomposition of the lifting wavelet packet transform. To obtain the approximation and detail coefficients of $x(k)$ at different scales, these three steps are applied repeatedly to the approximation and detail coefficients calculated at each scale. The LWPT reconstruction can be derived by simple algebraic transformation of (2) and (3); it consists of three steps: undo update, undo prediction, and merge, as shown in Figure 2.
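The split, predict, and update steps and their inverses can be illustrated with a minimal sketch. The operator coefficients and the index alignment used here are assumptions made for illustration only: the two-tap operators `[0.5, 0.5]` and `[0.25, 0.25]` stand in for the longer operators used in this work, indices are clamped at the signal edges, and the function names are hypothetical.

```python
def split(x):
    """Split a signal into its even-indexed and odd-indexed samples."""
    return x[0::2], x[1::2]


def predict_term(xe, k, p):
    """Weighted sum of the even samples around position k (edges clamped)."""
    n, last = len(p), len(xe) - 1
    return sum(p[i] * xe[min(max(k + i - n // 2 + 1, 0), last)] for i in range(n))


def update_term(d, k, u):
    """Weighted sum of the detail coefficients around position k (edges clamped)."""
    m, last = len(u), len(d) - 1
    return sum(u[j] * d[min(max(k + j - m // 2, 0), last)] for j in range(m))


def lifting_forward(x, p, u):
    """One decomposition level: split, predict, update."""
    xe, xo = split(x)
    d = [xo[k] - predict_term(xe, k, p) for k in range(len(xo))]  # detail
    a = [xe[k] + update_term(d, k, u) for k in range(len(xe))]    # approximation
    return a, d


def lifting_inverse(a, d, p, u):
    """One reconstruction level: undo update, undo prediction, merge."""
    xe = [a[k] - update_term(d, k, u) for k in range(len(a))]
    xo = [d[k] + predict_term(xe, k, p) for k in range(len(d))]
    x = [0.0] * (len(xe) + len(xo))
    x[0::2] = xe
    x[1::2] = xo
    return x


signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
P, U = [0.5, 0.5], [0.25, 0.25]  # assumed short operators, not the ones in the paper
approx, detail = lifting_forward(signal, P, U)
restored = lifting_inverse(approx, detail, P, U)
```

Because the inverse applies the same operators with opposite signs and in reverse order, the reconstruction is exact up to rounding regardless of the operator coefficients chosen.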
In the undo update step, the even samples $x_e(k)$ are recovered from the approximation coefficients $a$ and the detail coefficients $d$ by subtracting the update term that was added in (3).
In the undo prediction step, the odd samples $x_o(k)$ are recovered from the even samples $x_e(k)$ obtained above and the detail coefficients $d$ by inverting (2).
In the merge step, the original signal $x(k)$ is recovered by interleaving the even samples $x_e(k)$ and the odd samples $x_o(k)$.
2.2. Sample Entropy. Given a time series containing $N$ points $\{x(1), x(2), \ldots, x(N)\}$, its sample entropy is calculated through the following steps [13,14].
(III) Given the threshold $r$, the number of $j$ satisfying the inequality $d(i, j) < r$ is counted for each value of $i$, and the ratio of this number to the total number of distances $N - m + 1$ is denoted by $B_i^m(r)$:
$$B_i^m(r) = \frac{1}{N - m + 1}\,\mathrm{num}\{d(i, j) < r\},$$
where $1 \le j \le N - m$, $j \ne i$, and the average of $B_i^m(r)$ over all $i$ is denoted by $B^m(r)$.
(IV) The above three steps are repeated for $m + 1$, and then $B^{m+1}(r)$ is obtained.
(V) Theoretically, $\mathrm{SampEn}(m, r)$ is defined as
$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty}\left\{-\ln\left[\frac{B^{m+1}(r)}{B^m(r)}\right]\right\},$$
and when $N$ is a finite value, the SampEn of a time series containing $N$ points is estimated as
$$\mathrm{SampEn}(m, r, N) = -\ln\left[\frac{B^{m+1}(r)}{B^m(r)}\right].$$
2.3. Binary Tree Structure Based Classifier Ensemble. In pattern recognition, a single classifier can hardly achieve good recognition results for all samples, and different classifiers may lead to different results. As the performance of a single neural network is susceptible to its initial parameters, its recognition accuracy is unstable and volatile. Therefore, the results obtained by using a single neural network are limited. For these reasons, multiple classifier fusion has the potential to improve the results, because a classifier ensemble combines the advantages and overcomes the shortcomings of its member classifiers. When the majority voting scheme (MVS) is employed to build a classifier ensemble system, the number of required member classifiers needs to be larger than the number of patterns to be recognized. In order to tackle this issue, a multiclassifier fusion system is proposed that divides the classification of multiple classes into a series of binary recognition problems. The member classifiers constructing the classifier ensemble system are the BP neural network, the Elman neural network, and the RBF neural network.
The BP neural network, a feed-forward artificial neural network trained by supervised learning and composed of nonlinear transformation units, was proposed by Rumelhart, Hinton, and Williams in 1986 [22]. A BP neural network has three or more layers and offers strong nonlinear mapping, self-learning, self-organization, and adaptive abilities; it is currently the most widely used network in many fields.
The Elman neural network, a well-known recurrent topology, was proposed by Jeffrey Elman in 1990. This network is more sensitive to historical data, which enables it to handle dynamic information. Furthermore, the network does not utilize state variables as input or training signals, because its internal connections capture its dynamic characteristics, which makes it more suitable for modeling time-varying systems [23].
The RBF neural network, a feed-forward neural network with three layers, was proposed by Broomhead and Lowe in 1988. It is composed of an input layer, a hidden layer, and an output layer, where the input and output layers consist of linear neurons and each hidden layer node is a Gaussian kernel. The most important characteristic of the RBF network is that its hidden layer neurons respond only locally to the input, in the neighborhood of the center of the basis function. The RBF neural network is characterized by simple structure, concise training, and fast learning convergence, with the ability to approximate any nonlinear function [24].
A classifier ensemble gives a final result by combining the outputs of the member classifiers through a certain fusion algorithm. Many fusion algorithms are available, such as voting schemes and DS evidence theory. The present study exploits the MVS to build a multiclassifier fusion system. The MVS is a simple and effective decision-level method, in which the final decision is the one supported by the majority of the member classifiers. Although the final decision is not necessarily the best one, it is the decision with the highest relative reliability. Nevertheless, for effective utilization of the MVS, the number of member classifiers usually has to be larger than the number of patterns to be recognized. Otherwise, it is difficult to achieve decision fusion in certain cases. For example, if 3 classifiers are applied to classify 10 patterns, the 3 classifiers may each give a different result. In this case, the MVS is unable to give a reasonable result. To address this problem, the present study puts forward a classifier ensemble algorithm in the form of a binary tree, where the multiclass problem is divided into a sequence of binary classifications. Figure 3 shows the procedure of the proposed classifier ensemble. Patterns are grouped into two categories at each node: the first category contains only one fault type, while the rest of the fault types are considered as the other category. Taking a total of 10 fault types as an example, at the first node of the binary tree, the bearing condition C1 is treated as the first category, while the remaining 9 bearing health conditions are grouped together as the other category. This process is repeated until the last node contains only two bearing conditions, that is, C9 and C10. Such a strategy transforms the multiclass problem into a series of binary classification problems, so that multiclassifier fusion based on the MVS can be performed with only three member classifiers at each node, where a tie cannot occur because three voters choose between two categories.
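The decision path through the binary tree can be sketched as follows. This is only an illustration of the fusion logic under stated assumptions: the member classifiers are hypothetical threshold rules on a one-dimensional toy feature, standing in for the trained BP, Elman, and RBF networks, and the function names are not taken from the original work.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label with the most votes, or None if first place is tied."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def binary_tree_predict(sample, nodes, last_class):
    """Walk the tree: node k decides 'class k' versus 'all remaining classes'.

    nodes is a list of (class_label, members); each member maps a sample to
    True (the sample belongs to class_label) or False (it does not).
    """
    for label, members in nodes:
        votes = [bool(member(sample)) for member in members]
        if majority_vote(votes):  # three binary votes can never tie
            return label
    return last_class


def make_member(threshold):
    """Hypothetical stand-in for a trained network: a simple threshold rule."""
    return lambda s: s < threshold


# Toy problem: class Ck covers feature values in [k - 1, k); four classes.
nodes = [
    (f"C{k}", [make_member(k + offset) for offset in (-0.1, 0.0, 0.1)])
    for k in (1, 2, 3)
]

plain_mvs = majority_vote(["C1", "C2", "C3"])         # three-way disagreement
tree_result = binary_tree_predict(1.95, nodes, "C4")  # members disagree at node C2
```

In the toy run, plain majority voting over three different class labels returns no decision, whereas the tree resolves the disagreement at the second node by a two-to-one vote.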

Architecture of the Proposed Fault Diagnosis System
Figure 4 depicts the procedure of the proposed fault diagnosis system. First, the signals are decomposed into different frequency subbands through a three-level LWPT, resulting in a total of 8 node signal components. Next, the SampEns of all 8 components are calculated as the feature vector input to the binary tree structure based classifier ensemble. Finally, the trained binary tree based classifier ensemble is used to recognize the testing set.
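How a three-level packet tree yields 8 node components and an 8-element feature vector can be sketched as below. Two simplifications are assumed and are not the method of this work: a Haar-type lifting step (difference and pair mean) replaces the longer prediction and update operators, and `mean_abs` is a placeholder feature standing in for SampEn. The nodes are kept as subsampled coefficients, so each one is an eighth of the input length.

```python
def haar_lifting_step(x):
    """Assumed simplest lifting step: predict odd from even, update with half the detail."""
    xe, xo = x[0::2], x[1::2]
    d = [o - e for e, o in zip(xe, xo)]          # predict: detail coefficients
    a = [e + 0.5 * dk for e, dk in zip(xe, d)]   # update: pairwise mean
    return a, d


def lifting_packet_nodes(x, levels=3):
    """Split every node at every level, giving 2**levels nodes at the bottom layer."""
    nodes = [list(x)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            a, d = haar_lifting_step(node)
            next_nodes.extend([a, d])
        nodes = next_nodes
    return nodes


def mean_abs(node):
    """Placeholder feature; the paper computes SampEn on each node instead."""
    return sum(abs(v) for v in node) / len(node)


def extract_features(x, feature, levels=3):
    """One feature value per bottom-layer node."""
    return [feature(node) for node in lifting_packet_nodes(x, levels)]


sample = [float(i % 5) for i in range(2000)]  # synthetic 2000-point record
features = extract_features(sample, mean_abs)
```

The input length should be divisible by 2 to the power of the number of levels; a 2000-point record splits into 1000, 500, and finally 250 coefficients per node.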

Experimental Setup
Experiments were conducted on rolling bearings to verify the proposed fault diagnosis method. The test rig is a motor-driven mechanical system, as shown in Figure 5, composed of a three-phase induction motor on the left, a torque sensor in the middle, and a dynamometer on the right [25]. The vibration signals were acquired by means of a 16-channel DAT recorder at a sampling frequency of 12 000 Hz, and the shaft rotating speed was approximately 1797 rpm. In order to acquire vibration signals of various bearing health conditions, an accelerometer was attached to the motor housing at the drive end and fixed at the 12 o'clock position. The experimental data set is described in detail in Table 1, including a total of 10 bearing conditions covering various fault types and different severity levels. For each bearing condition, 60 samples were collected, and each data sample contains 2000 data points. Examples of the time waveforms of the 10 bearing conditions are shown in Figure 6, where conditions C3, C5, and C7 are characterized by obvious impulses.

SampEn is an improved version of approximate entropy with the ability to reflect the complexity and irregularity of a time series. To calculate SampEn, the embedding dimension $m$ and the tolerance level $r$ must be determined a priori. Herein, the tolerance level $r$ is selected as 0.2 times the standard deviation of the inspected data, and the dimension $m$ is chosen as 2.
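With these parameter choices, the SampEn calculation can be sketched as below. The sketch follows the commonly used Richman and Moorman formulation and rests on several assumptions: templates are compared with the maximum absolute difference, the same number of templates is used for lengths $m$ and $m + 1$ so that the normalizing constants cancel in the ratio, the population standard deviation is used for $r$, and the function name is hypothetical.

```python
import math
import random


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with r = r_factor * std(x); returns inf if no matches are found."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)  # population std (assumed)
    r = r_factor * std

    def count_matches(length):
        # Count template pairs (i < j) whose maximum absolute difference is below r.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(x[i + k] - x[j + k]) for k in range(length)) < r:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -math.log(a / b)


periodic = [0.0, 1.0] * 50            # perfectly regular signal
rng = random.Random(0)
noisy = [rng.random() for _ in range(200)]  # irregular signal

se_periodic = sample_entropy(periodic)
se_noisy = sample_entropy(noisy)
```

A perfectly regular signal gives a SampEn of zero, because every match of length $m$ remains a match at length $m + 1$, whereas an irregular signal gives a larger value. The double loop costs on the order of $N^2$ comparisons, which is acceptable for records of 2000 points or fewer.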
With the above selected parameters, the bearing signals are first decomposed by the three-level LWPT (with $N = 12$, $M = 12$). The SampEns of all 8 components are then calculated as a feature vector to depict the irregularity and complexity of the bearing vibration signals. By the joint use of the LWPT and SampEn, the features of the signals shown in Figure 6 are obtained. It is seen from Figure 8 that the extracted features can effectively distinguish among different bearing fault types and severity levels.
As listed in Table 1, the 10 types of bearing conditions have 60 × 10 = 600 data samples in total. The data set was divided into a training set and a testing set, where the training set consisted of 40 randomly selected samples from each health condition and the testing set consisted of the remaining data samples. That is, the training set comprised 40 × 10 = 400 data samples and the testing set contained 20 × 10 = 200 data samples. The testing set serves to measure the performance of the trained binary tree structure based classifier ensemble. Each sample is represented by a feature vector consisting of the SampEns of all 8 node signal components at the third layer of the LWPT decomposition tree. In the present experiment, the bearing signals are first decomposed by a (12,12) three-level LWPT. The SampEns of all 8 components are then calculated as a feature vector to characterize the complexity of the bearing vibration signals and input to the binary tree structure based classifier ensemble to train each member classifier, namely, the BP, Elman, and RBF neural networks. The transfer functions of the hidden layer and output layer neurons of the BP neural network are Logsig and Purelin, respectively. The maximum number of training epochs and the target mean square error are set to 1000 and $10^{-8}$, respectively. The number of hidden layer nodes is 10 for the BP neural network. The transfer functions of the hidden layer and output layer neurons of the Elman neural network are Tansig and Purelin, respectively. The maximum number of training epochs is 1000, the target mean square error is $10^{-8}$, and the number of hidden layer nodes is 10 for the Elman neural network. For the RBF neural network, the Newrbe design function is adopted and the spread of the radial basis function is set to 1.
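The per-condition random split described above (40 training and 20 testing samples from each of the 10 conditions) can be sketched as follows. The function name, the fixed seed, and the placeholder feature vectors are assumptions made for illustration.

```python
import random


def split_per_condition(samples_by_condition, n_train=40, seed=0):
    """Randomly pick n_train samples per condition for training; the rest go to testing."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_condition.items():
        indices = list(range(len(samples)))
        rng.shuffle(indices)
        train += [(samples[i], label) for i in indices[:n_train]]
        test += [(samples[i], label) for i in indices[n_train:]]
    return train, test


# Placeholder data: 10 conditions, 60 samples each, 8 features per sample.
data = {
    f"C{k}": [[float(k)] * 8 for _ in range(60)]
    for k in range(1, 11)
}
train_set, test_set = split_per_condition(data)
```

Splitting inside each condition, rather than over the pooled 600 samples, keeps every condition equally represented in both sets.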

Diagnostic Results from Binary Tree Structure Based Classifier Ensemble.
Owing to the advantages of the binary tree structured classifier ensemble, the fusion algorithm shown in Figure 3 is adopted. In this context, the bearing condition C1 is treated as one category, while the remaining 9 types of bearing health condition are grouped into the other category at the first node of the binary tree. The tree node employs three different neural networks and the MVS fusion strategy to separate condition C1 from the other 9 conditions. At the second node, the bearing condition C2 is distinguished from the other 8 conditions, excluding condition C1, by means of a classifier ensemble based on three classifiers and the MVS. The above steps are repeated until all 10 bearing conditions are differentiated.
Given that the initial connection weights and thresholds affect the performance of each member classifier, 100 runs of the binary tree structure based classifier ensemble were conducted using the same training set and testing set. The recognition accuracy of the 100 runs is shown in Figure 9, where the highest, average, and lowest accuracies are 100%, 99.53%, and 99.00%, respectively. The recognition accuracy is considerably stable, which implies that the classifier ensemble has good adaptability and high stability. This demonstrates that the effect of the initial connection weights and thresholds on the final recognition accuracy is small. The proposed binary tree structure based classifier ensemble can effectively identify bearing fault type and severity.
In order to further examine the details of the classification results, the confusion matrix averaged over the 100 tests is shown in Table 2. The cells along the diagonal of the 10 × 10 matrix indicate the percentage of correctly classified samples, while the off-diagonal cells reveal the misclassified samples. Taking the second row from the bottom of the matrix in Table 2 as an example, the samples belonging to bearing condition C9 are misclassified into conditions C7 and C8 in proportions of 3.5% and 0.2%, respectively, while the value 96.3% indicates the proportion of correctly classified samples. Therefore, the values of the cells along the diagonal are expected to be as large as possible. A small value of the diagonal cell in the first row indicates a higher risk of a healthy condition being misdiagnosed as faulty, which would lead to unnecessary production downtime. It is observed in Table 2 that false identifications occurred only with conditions C9 and C10, and the remaining conditions were identified correctly.
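A row-normalized confusion matrix in percent, averaged over repeated runs, can be computed as in the following sketch. The function names and the two-class toy labels are illustrative assumptions; rows correspond to true conditions and columns to predicted conditions.

```python
def confusion_matrix_percent(true_labels, predicted_labels, classes):
    """Row i, column j: percentage of class-i samples that were assigned to class j."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


def average_matrices(matrices):
    """Element-wise mean of several equally sized confusion matrices."""
    n, size = len(matrices), len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / n for j in range(size)]
            for i in range(size)]


classes = ["A", "B"]
truth = ["A"] * 4 + ["B"] * 4
run1 = confusion_matrix_percent(truth, ["A", "A", "A", "B", "B", "B", "B", "B"], classes)
run2 = confusion_matrix_percent(truth, ["A", "A", "A", "A", "B", "B", "B", "B"], classes)
averaged = average_matrices([run1, run2])
```

Each row sums to 100%, so the diagonal entry of a row is the recognition rate of that condition and the off-diagonal entries show where its misclassified samples went.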

Comparison with a Single Neural Network.
In order to verify the advantages of the binary tree structured classifier ensemble in fault classification, single classifiers, namely, the BP network, the Elman network, and the RBF network, were used for comparison. The training set, the testing set, and the initial settings of each member classifier, except the weights and thresholds, are identical to those used in the above classifier ensemble. Considering that the initial connection weights and thresholds affect the performance of the BP and Elman neural networks, the test was repeated 100 times for each classifier using the same training set and testing set, with the weights and thresholds randomly initialized in each run. Figures 10 and 11 show the 100 test results for the BP neural network and the Elman neural network, respectively. The results of the RBF neural network vary with its "Spread" value. With the "Spread" varying from 1 to 100 in steps of 1, the test using the RBF network was repeated 100 times, with the results shown in Figure 12.
Figure 10 shows that the maximum, average, and minimum accuracies of the BP neural network are 98.00%, 87.14%, and 79.50%, respectively. The test results show that the initial connection weights and thresholds have a substantial effect on the performance of the BP neural network and cause the accuracy to fluctuate significantly. It is seen from Figure 11 that the maximum, average, and minimum recognition accuracies are 93.50%, 80.46%, and 70.00% for the Elman neural network. The test results demonstrate that randomly selected initial connection weights and thresholds may result in an unfavorable accuracy. Figure 12 shows that the maximum accuracy reaches 92.00% when the "Spread" of the radial basis function is 15, while the average accuracy is 86.07%. The recognition rate takes its minimum value of 84.00% when the "Spread" is 53, 63, or within 94 to 100. The test results illustrate that the recognition accuracy is highly sensitive to the "Spread" of the radial basis function.
The results of the classifier ensemble and the member classifiers are summarized in Table 3 in the form of minimum, average, and maximum diagnosis accuracy. It is seen that the performance of the classifier ensemble is superior to that of any member classifier, in the sense that the classifier fusion has a high average accuracy and the difference between its maximum and minimum accuracy is small. The small variation of diagnosis accuracy means that the classifier ensemble paradigm is robust to the initial parameter selection of the member classifiers, which is important for intelligent diagnosis to be used in the field. In the framework of multiple classifier fusion, each member classifier needs to be more accurate than random assignment. The fusion can then give a reasonable result by synthesizing the results of the member classifiers. The excellent performance of the classifier ensemble can also be ascribed to the joint use of the LWPT and SampEn to characterize not only the nonstationarity but also the irregularity and complexity of bearing vibration signals.
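The stability comparison can be made concrete by computing the spread between maximum and minimum accuracy from the figures quoted in the text. The snippet below is a small worked calculation; only the dictionary layout and names are introduced here.

```python
# (minimum, average, maximum) accuracy in percent, as reported in the text
results = {
    "Classifier ensemble": (99.00, 99.53, 100.00),
    "BP": (79.50, 87.14, 98.00),
    "Elman": (70.00, 80.46, 93.50),
    "RBF": (84.00, 86.07, 92.00),
}


def accuracy_range(stats):
    """Difference between maximum and minimum accuracy, in percentage points."""
    minimum, _, maximum = stats
    return maximum - minimum


ranges = {name: accuracy_range(stats) for name, stats in results.items()}
most_stable = min(ranges, key=ranges.get)
best_average = max(results, key=lambda name: results[name][1])

for name, (minimum, average, maximum) in results.items():
    print(f"{name:20s} average {average:6.2f}%  range {ranges[name]:5.2f} points")
```

The ensemble varies by 1.00 percentage point across the 100 runs, compared with 18.50, 23.50, and 8.00 points for the BP, Elman, and RBF networks, respectively.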
The confusion matrices averaged over the 100 tests are shown in Tables 4, 5, and 6 for the BP neural network, the Elman neural network, and the RBF neural network, respectively. Table 4 shows that conditions C8, C9, and C10 have a high probability of misidentification and poor diagnostic reliability with the BP neural network. Table 5 indicates that, with the Elman neural network, high misclassification rates also occur in conditions C8, C9, and C10, whose average diagnostic accuracies are 10.35%, 67.75%, and 31.10%, respectively. Such results imply that the BP and Elman neural networks have trouble distinguishing between the three levels of fault severity on the rolling element. The false identifications of the RBF network mostly appear in conditions C8 and C10, as shown in Table 6, where the samples belonging to condition C8 are misclassified into conditions C4 and C9 in proportions of 74.4% and 0.15%, and the samples belonging to condition C10 are misclassified into conditions C4, C7, and C9 in proportions of 62.05%, 0.05%, and 0.2%, respectively. The above comparison indicates that the classifier ensemble is robust to the initial parameters of the networks and can recognize both fault type and fault severity level with satisfactory accuracy, supported by the effective feature extraction using the LWPT and SampEn.

Conclusions
The current paper presents an intelligent diagnosis method for rolling bearings that integrates the LWPT, SampEn, and a binary tree structure based classifier ensemble. The distinct merits of the diagnosis method lie in the feature extraction, which combines the LWPT with SampEn, and in the recognition by a binary tree structure based classifier ensemble. Given that bearing vibrations, especially in fault conditions, exhibit not only nonstationarity but also irregularity and complexity, vibration signals are decomposed by a three-level LWPT, and the SampEns of all 8 components are then calculated as the feature vector representing the bearing vibration signals. A multiclassifier fusion algorithm in the form of a binary tree is presented for two reasons. First, the initial connection weights and thresholds have a significant effect on the performance of a single neural network classifier. Second, traditional fusion algorithms for multiple classifiers not only require a large number of member classifiers, which increases the computational effort and decreases recognition accuracy, but also fail to resolve the problem of more than one class winning the highest number of votes. The experimental data are composed of 10 kinds of bearing health conditions including various fault types and severity levels. The results demonstrate that the proposed method can effectively improve the recognition accuracy and performance stability of rolling bearing fault diagnosis in comparison with diagnosis methods based on a single classifier.

Figure 2: Reconstruction steps of the second-generation wavelet transform.

Figure 4: The structure of the proposed fault diagnosis system.

Figure 8: Features of 10 bearing conditions extracted by LWPT and SampEn.

Figure 9: Testing results of 100 runs for binary tree system.

Figure 10: Testing results of 100 runs using BP neural network.

Table 5: Averaged confusion matrix of 100 tests for Elman neural network (%).

Table 6: Averaged confusion matrix of 100 tests for RBF neural network (%).

Table 1: Experimental data condition.

Table 4: Averaged confusion matrix of 100 tests for BP neural network (%).