A Novel Method of Fault Diagnosis for Rolling Bearing Based on Dual Tree Complex Wavelet Packet Transform and Improved Multiscale Permutation Entropy

This paper puts forward a novel fault diagnosis method for rolling bearings that combines the dual tree complex wavelet packet transform (DTCWPT), the improved multiscale permutation entropy (IMPE), and the linear local tangent space alignment (LLTSA) with the extreme learning machine (ELM). To discover the underlying feature information effectively, DTCWPT, which has attractive properties such as near shift invariance and reduced aliasing, is first used to decompose the original signal into a set of subband signals. Then IMPE, which is designed to reduce the variability of entropy measures, is applied to characterize each subband signal at different scales, and the feature vectors are constructed by combining the IMPE values of all subband signals. Next, LLTSA is employed to compress the high dimensional feature vectors of the training and the testing samples into low dimensional vectors with better distinguishability. Finally, the ELM classifier uses the low dimensional feature vectors to identify the bearing condition automatically. The analysis of experimental data validates the effectiveness of the presented method and demonstrates that it can distinguish different fault types and fault degrees of rolling bearings.


Introduction
Rolling bearings are among the most widely used parts in rotating machinery, and they affect the operational reliability, the precision, and the service life of the entire equipment. Failures of rolling bearings may cause catastrophic accidents and great losses. Therefore, condition monitoring and fault diagnosis for rolling bearings are of great significance in engineering applications [1,2].
Owing to factors such as friction, impact, and structural change, the vibration signals of bearings are often nonlinear and nonstationary, and the major challenge for bearing condition monitoring and fault diagnosis is to acquire reliable and sensitive features from the vibration signals [3]. In recent years, with the development of nonlinear dynamic theories, a series of nonlinear parameter estimation techniques have been investigated and introduced to the field of bearing condition monitoring and fault diagnosis. For example, Kang et al. [4] chose the correlation dimension as a tool for discovering the fault features of bearings. Unfortunately, estimating the correlation dimension usually requires a large amount of data, which prevents this technique from being widely used. Yan and Gao [5] applied approximate entropy (AE) to monitor the bearing condition. However, AE depends heavily on the signal length, and the calculated value is uniformly smaller than the expected one when short signals are processed [6]. Later, the sample entropy (SE) was proposed by Richman and Moorman [7] to overcome this drawback of AE. The signals collected from bearing systems usually contain structures on multiple temporal scales, but AE and SE both evaluate the complexity of a signal at a single scale. Hence, these two approaches have limited performance in analyzing bearing signals. To address the disadvantage of single scale analysis, the multiscale entropy (MSE) was developed by Costa et al. [8] to estimate the complexity of time series over a range of scales, and this technique was used by Zhang et al. [9] to extract the features of bearing signals. However, the estimation of MSE is easily affected by outliers in the signal, and the computational efficiency of MSE is very low for long signals.
In [10], a new kind of entropy named permutation entropy (PE) was proposed to measure the complexity and detect the dynamic changes of a signal. Compared with AE and SE, PE is simple to calculate and robust to noise. However, like AE and SE, PE measures entropy at a single scale. The multiscale permutation entropy (MPE) method based on PE was therefore proposed by Aziz and Arif [11] to depict the multiple temporal scale structures of a signal, and this method was applied to bearing fault diagnosis by Li [12] and Zheng [13]. Nevertheless, the analysis results of MPE are usually unstable for short signals. Recently, a novel method called improved multiscale permutation entropy (IMPE) was proposed by Azami and Escudero [14] to remedy this weakness of MPE, and its effectiveness has been verified on simulated signals and real biomedical signals. In view of the advantage of IMPE in revealing the inherent features of a signal, this method is introduced to the field of fault diagnosis and used to identify the condition of rolling bearings in this paper.
Usually, the collected bearing signals are more or less contaminated by external environmental noise, and interference between the components of a complicated signal is inevitable. These factors make it difficult to extract feature information by applying IMPE directly. The subsequent analysis therefore benefits if the original signals are processed in advance.
Up to now, many signal processing techniques such as empirical mode decomposition (EMD), local mean decomposition (LMD), and discrete wavelet transform (DWT) have been developed and applied in different research fields. As an adaptive signal processing method, EMD can decompose a signal into a set of intrinsic mode functions. However, EMD lacks a rigorous mathematical framework and suffers from mode mixing and end effects. Similar to EMD, the LMD algorithm also has these drawbacks, which have not been fundamentally addressed [15]. DWT is a classic signal processing tool and has been widely used to analyze mechanical fault signals, but its shift variance and frequency aliasing may cause the loss of useful information. Kingsbury [16] then proposed the dual tree complex wavelet transform (DTCWT), which possesses many excellent properties such as near shift invariance, good directional selectivity, and reduced aliasing in comparison with DWT. However, DTCWT cannot achieve multiresolution analysis in the high frequency region, where the useful feature information usually exists. As an extension of DTCWT, the dual tree complex wavelet packet transform (DTCWPT) was developed to offset this shortcoming [17]. After DTCWPT is performed on the collected signal, precise frequency band partitions over the whole analyzed frequency domain can be achieved and the corresponding subband signals can be obtained. The feature information of the original signal can then be discovered more effectively by analyzing the subband signals using IMPE. Based on the above analysis, DTCWPT is combined with IMPE for the first time in this study for bearing fault diagnosis.
After feature extraction using DTCWPT and IMPE, the acquired feature vectors need to be fed into a classifier to achieve condition recognition. However, the acquired high dimensional vectors inevitably contain redundant information. Using the entire vectors as the inputs of the classifier is time-consuming and may lower the diagnostic accuracy. Therefore, the manifold learning algorithm named linear local tangent space alignment (LLTSA) [18] is employed in this paper to reduce the dimension of the vectors. With LLTSA, the high dimensional feature vectors are automatically compressed into sensitive feature vectors of lower dimension, which not only reduces the computational burden but also improves the diagnostic precision.
Naturally, an intelligent classifier is needed to distinguish the bearing condition automatically based on the obtained sensitive feature vectors. The extreme learning machine (ELM) [19] is a powerful machine learning approach based on single hidden layer feedforward networks. Compared with classic machine learning methods such as the support vector machine (SVM), the artificial neural network (ANN), and the k-nearest neighbor classifier (KNNC), the main advantages of ELM are better generalization ability on small samples, faster learning speed, and less human intervention. Thus, in this paper, ELM is used to distinguish the bearing condition.
The rest of this paper is organized as follows. Section 2 proposes the feature extraction method based on DTCWPT and IMPE. Section 3 presents the feature dimension reduction method based on LLTSA. Section 4 briefly introduces the ELM classifier. Section 5 illustrates the detailed procedures of the proposed diagnosis method. In Section 6, the proposed method is applied to rolling bearing experimental data and some comparisons are made. Finally, conclusions are drawn in Section 7.

Feature Extraction Based on DTCWPT and IMPE
Fault diagnosis for rolling bearings consists of feature extraction and pattern recognition. Feature extraction is the most important part of fault diagnosis, because the bearing conditions are identified according to the extracted features. To exploit both the advantages of DTCWPT in processing nonstationary and nonlinear signals and the capability of IMPE in characterizing the properties of a signal, these two methods are combined to extract the feature information from the bearing signal.

A Brief View of DTCWPT.
DTCWPT is an enhancement to the traditional discrete wavelet packet transform (WPT).
In the decomposition and the reconstruction process of DTCWPT, two parallel WPTs with different low pass and high pass filters at each level are used. They can be regarded as the real tree and the imaginary tree of the DTCWPT algorithm, respectively, and the information in the two trees is complementary during signal processing [20]. The decomposition process of DTCWPT is implemented recursively through a set of low pass and high pass filters as follows.
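Real tree decomposition is as follows:

$$d^{\mathrm{Re}}_{l+1,2n}(k) = \sum_{m} h_0(m-2k)\, d^{\mathrm{Re}}_{l,n}(m), \qquad d^{\mathrm{Re}}_{l+1,2n+1}(k) = \sum_{m} h_1(m-2k)\, d^{\mathrm{Re}}_{l,n}(m). \tag{1}$$

Imaginary tree decomposition is as follows:

$$d^{\mathrm{Im}}_{l+1,2n}(k) = \sum_{m} g_0(m-2k)\, d^{\mathrm{Im}}_{l,n}(m), \qquad d^{\mathrm{Im}}_{l+1,2n+1}(k) = \sum_{m} g_1(m-2k)\, d^{\mathrm{Im}}_{l,n}(m), \tag{2}$$

where $h_0$ and $h_1$, respectively, represent the low pass and the high pass filters used by the WPT of the real tree, while $g_0$ and $g_1$ are the low pass and the high pass filters used by the WPT of the imaginary tree. $d^{\mathrm{Re}}_{l,n}$ and $d^{\mathrm{Im}}_{l,n}$, respectively, denote the coefficients in the real tree and the imaginary tree at the $l$th level, $n$th node. When level $l = 0$, the coefficients $d^{\mathrm{Re}}_{l,n}$ and $d^{\mathrm{Im}}_{l,n}$ are the original signal $x(t)$; namely, $d^{\mathrm{Re}}_{0,0} = d^{\mathrm{Im}}_{0,0} = x(t)$. The decomposition process of DTCWPT is illustrated in Figure 1.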
The corresponding reconstruction operation of DTCWPT is as follows.
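Real tree reconstruction is as follows:

$$d^{\mathrm{Re}}_{l,n}(k) = \sum_{m} \tilde{h}_0(k-2m)\, d^{\mathrm{Re}}_{l+1,2n}(m) + \sum_{m} \tilde{h}_1(k-2m)\, d^{\mathrm{Re}}_{l+1,2n+1}(m). \tag{3}$$

Imaginary tree reconstruction is as follows:

$$d^{\mathrm{Im}}_{l,n}(k) = \sum_{m} \tilde{g}_0(k-2m)\, d^{\mathrm{Im}}_{l+1,2n}(m) + \sum_{m} \tilde{g}_1(k-2m)\, d^{\mathrm{Im}}_{l+1,2n+1}(m), \tag{4}$$

where $\tilde{h}_0$ and $\tilde{h}_1$, respectively, represent the low pass and the high pass reconstruction filters used by the WPT of the real tree, while $\tilde{g}_0$ and $\tilde{g}_1$ denote the low pass and the high pass reconstruction filters used by the WPT of the imaginary tree.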
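The filter-and-downsample recursion of a single wavelet packet tree can be illustrated with a short sketch. This is only an illustration under stated assumptions: the orthonormal Haar pair below is a placeholder and is not the filter set of the dual tree transform, periodic signal extension is assumed, and the function names are hypothetical. In DTCWPT the same recursion would be run twice, once with the real tree filters and once with the imaginary tree filters.

```python
import numpy as np

# Placeholder orthonormal Haar pair (an assumption, NOT the dual tree filters).
H0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # low pass
H1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high pass

def analyze(d, f):
    """Filter and downsample: out(k) = sum_m f(m - 2k) d(m), periodic ends."""
    n = len(d)
    return np.array([sum(f[j] * d[(2 * k + j) % n] for j in range(len(f)))
                     for k in range(n // 2)])

def synthesize(lo, hi, f0, f1):
    """Upsample and filter: d(k) = sum_m f0(k - 2m) lo(m) + f1(k - 2m) hi(m)."""
    n = 2 * len(lo)
    d = np.zeros(n)
    for m in range(len(lo)):
        for j in range(len(f0)):
            d[(2 * m + j) % n] += f0[j] * lo[m] + f1[j] * hi[m]
    return d

def packet_decompose(x, levels):
    """Return the 2**levels terminal nodes; node n has children 2n and 2n+1."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        nodes = [child for d in nodes
                 for child in (analyze(d, H0), analyze(d, H1))]
    return nodes

def packet_reconstruct(nodes):
    """Invert packet_decompose by merging sibling nodes level by level."""
    nodes = list(nodes)
    while len(nodes) > 1:
        nodes = [synthesize(nodes[i], nodes[i + 1], H0, H1)
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def subband_signals(nodes):
    """Reconstruct each terminal node alone, with all other nodes set to zero."""
    out = []
    for i in range(len(nodes)):
        kept = [d if j == i else np.zeros_like(d) for j, d in enumerate(nodes)]
        out.append(packet_reconstruct(kept))
    return out

x = np.sin(2 * np.pi * 5 * np.arange(64) / 64)
nodes = packet_decompose(x, levels=2)      # four nodes of 16 coefficients each
x_rec = packet_reconstruct(nodes)          # recovers x for this filter pair
subbands = subband_signals(nodes)          # four full-length subband signals
```

With two decomposition levels the sketch yields four terminal nodes, and reconstructing each node separately gives four subband signals of the original length whose sum is the original signal.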
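Multiscale Permutation Entropy.
For a time series $x = \{x(i),\ i = 1, 2, \ldots, N\}$, the phase space is reconstructed with the embedding dimension $h$ and the time delay $\tau$, which gives the vectors $X(i) = [x(i), x(i+\tau), \ldots, x(i+(h-1)\tau)]$, $i = 1, 2, \ldots, N-(h-1)\tau$. The elements of each $X(i)$ are arranged in increasing order:

$$x(i + j_1\tau) \le x(i + j_2\tau) \le \cdots \le x(i + j_h\tau), \tag{5}$$

where $0 \le j_k \le h-1$ and $j_k \ne j_l$ for $k \ne l$. There are $h!$ different permutation types for an $h$-dimensional vector. For each permutation type $\pi \in \Pi$ ($\Pi$ denotes the set of all permutation types), $p(\pi)$ denotes the relative frequency:

$$p(\pi) = \frac{\#\{\, i \mid 1 \le i \le N-(h-1)\tau,\ X(i) \text{ has type } \pi \,\}}{N-(h-1)\tau}. \tag{6}$$

Then the PE of the time series $x$ is calculated as follows:

$$\mathrm{PE}(x, h, \tau) = -\sum_{\pi \in \Pi} p(\pi) \ln p(\pi). \tag{7}$$

The MPE is obtained in two steps.
(1) The original time series is first divided into coarse-grained series $y^{(s)}_j$ according to (8), and the schematic of this procedure is shown in Figure 2:

$$y^{(s)}_j = \frac{1}{s} \sum_{i=(j-1)s+1}^{js} x(i), \qquad 1 \le j \le \lfloor N/s \rfloor, \tag{8}$$

where $s$ denotes the scale factor.
(2) The PE of each coarse-grained series is calculated based on (6) and (7) and then plotted as a function of the scale factor $s$, which can be expressed as follows:

$$\mathrm{MPE}(x, s, h, \tau) = \mathrm{PE}(y^{(s)}, h, \tau). \tag{9}$$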
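The PE and MPE calculations described above can be sketched in a few lines. The function names are hypothetical, the natural logarithm is used, and no normalization by ln(h!) is applied; these are assumptions of the sketch rather than choices confirmed by the text.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(x, h=4, tau=1):
    """Shannon entropy of the ordinal patterns of x (dimension h, delay tau)."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (h - 1) * tau
    if n_vec <= 0:
        raise ValueError("time series too short for this h and tau")
    # The ordinal pattern of each embedded vector is the index order that sorts it.
    patterns = Counter(
        tuple(np.argsort(x[i:i + (h - 1) * tau + 1:tau], kind="stable").tolist())
        for i in range(n_vec)
    )
    p = np.array(list(patterns.values()), dtype=float) / n_vec
    return float(-(p * np.log(p)).sum())

def coarse_grain(x, s):
    """Average x over consecutive non-overlapping windows of length s."""
    x = np.asarray(x, dtype=float)
    n = len(x) // s
    return x[:n * s].reshape(n, s).mean(axis=1)

def mpe(x, max_scale, h=4, tau=1):
    """PE of the coarse-grained series for scale factors 1..max_scale."""
    return [permutation_entropy(coarse_grain(x, s), h, tau)
            for s in range(1, max_scale + 1)]

# Small example with h = 3: three distinct patterns occur with counts 2, 2, 1.
example = [4, 7, 9, 10, 6, 11, 3]
pe_example = permutation_entropy(example, h=3, tau=1)
```

For a strictly increasing series only one ordinal pattern occurs, so the sketch returns zero, which is the lowest possible complexity.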

Improved Multiscale Permutation Entropy.
From Figure 2, it can be seen that the coarse-grained procedure in the MPE method amounts to averaging the original time series within a window of length $s$ and then downsampling by the scale factor $s$. However, imprecise and unreliable results may occur in the downsampling process at a certain scale [14]. To overcome this drawback of MPE, the IMPE algorithm was proposed, and its calculation steps are as follows.
(1) For a defined scale factor $s$, the original time series is divided into $s$ different coarse-grained series $z^{(s)}_i = \{y^{(s)}_{i,1}, y^{(s)}_{i,2}, \ldots\}$ $(i = 1, \ldots, s)$:

$$y^{(s)}_{i,j} = \frac{1}{s} \sum_{f=0}^{s-1} x(f + i + s(j-1)). \tag{10}$$

(2) Calculate the PE of each coarse-grained series $z^{(s)}_i$ $(i = 1, \ldots, s)$ corresponding to the scale factor $s$ separately. Then, IMPE is obtained as the average value of PE:

$$\mathrm{IMPE}(x, s, h, \tau) = \frac{1}{s} \sum_{i=1}^{s} \mathrm{PE}(z^{(s)}_i, h, \tau). \tag{11}$$

Figure 2: The schematic of the coarse-grained procedure for scale factors $s = 2$ and $s = 3$.

2.3. Feature Extraction.
Usually, the collected signals of bearings with local defects are complicated, and interference between the components of the signal is inevitable. Besides, the differences among the original signals of bearings in various operating conditions may be subtle. These factors make feature information difficult to extract. The signal processing procedure combining DTCWPT with IMPE is presented to address this issue. As a useful tool for signal processing, DTCWPT is suitable for analyzing complicated bearing signals. The original signal can be decomposed into several subband signals using DTCWPT, and each subband signal is simpler than the original signal. The interference between the components in each subband signal is therefore weaker than that in the original signal, and the hidden features of the original signal are easier to discover by analyzing the subband signals. Therefore, DTCWPT is used as a preprocessing technique for the original signal, and the IMPE algorithm, which can effectively evaluate the complexity and detect the dynamic changes of a signal, is used in the subsequent analysis.
After DTCWPT is performed on the original signal, each node of the wavelet packet coefficients is reconstructed at a single level and the corresponding subband signals are obtained. Then IMPE is used to calculate the PE values of each subband signal at different scales. If the decomposition level of DTCWPT is $l$ and the scale factor of IMPE is $s$, then the number of obtained subband signals is $2^l$ and the number of calculated PE values of each subband signal is $s$. Therefore, $2^l \times s$ PE values are obtained for every original signal, and the feature vectors constructed from these PE values can comprehensively reflect the differences between the signals under different bearing conditions.
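The IMPE calculation and the construction of the feature vector can be sketched as follows. The sketch assumes that the subband signals are already available as arrays (random placeholders stand in for the DTCWPT outputs here), that the natural logarithm is used without normalization, and that incomplete averaging windows at the end of a series are discarded; all function names are hypothetical.

```python
from collections import Counter

import numpy as np

def permutation_entropy(x, h=4, tau=1):
    """Shannon entropy of the ordinal patterns of x (dimension h, delay tau)."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (h - 1) * tau
    patterns = Counter(
        tuple(np.argsort(x[i:i + (h - 1) * tau + 1:tau], kind="stable").tolist())
        for i in range(n_vec)
    )
    p = np.array(list(patterns.values()), dtype=float) / n_vec
    return float(-(p * np.log(p)).sum())

def impe(x, s, h=4, tau=1):
    """Average PE over the s coarse-grained series that start at offsets 0..s-1."""
    x = np.asarray(x, dtype=float)
    values = []
    for offset in range(s):
        n = (len(x) - offset) // s           # only complete windows are used
        z = x[offset:offset + n * s].reshape(n, s).mean(axis=1)
        values.append(permutation_entropy(z, h, tau))
    return float(np.mean(values))

def feature_vector(subbands, max_scale=8, h=4, tau=1):
    """Concatenate the IMPE values of every subband signal over scales 1..max_scale."""
    return np.array([impe(sb, s, h, tau)
                     for sb in subbands
                     for s in range(1, max_scale + 1)])

# Placeholder subband signals: 2**2 = 4 random series of 1024 points each.
rng = np.random.default_rng(0)
subbands = [rng.normal(size=1024) for _ in range(4)]
features = feature_vector(subbands)          # 4 subbands x 8 scales = 32 values
```

With two decomposition levels and eight scales, the vector has 32 entries, which matches the count of PE values per original signal given above.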

LLTSA for Dimension Reduction
For the classifier, a large number of features not only increases the computational complexity but may also lower the classification accuracy. Therefore, the dimension of the obtained feature vectors needs to be reduced. The dimension reduction in fault diagnosis has two main objectives: (1) removing the disturbing and redundant information within the high dimensional feature vectors; (2) increasing the separability of the samples, namely, placing samples of different classes far from each other and samples of the same class close to each other.
Based on the previous analysis, the LLTSA algorithm is used in this paper to compress the original vectors into new vectors with a lower dimension. The basic idea of LLTSA is to use the tangent space in the neighborhood of a data point to represent the local geometry of the features. The local manifold structures are then aligned to construct the global coordinates [21].
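Given a dataset $X_{\mathrm{ORG}} = [x_{\mathrm{org}1}, x_{\mathrm{org}2}, \ldots, x_{\mathrm{org}N}]$ from the Euclidean space $R^m$, it is generally assumed that the points of $X_{\mathrm{ORG}}$ lie on an underlying $d$-dimensional nonlinear manifold $M^d$ ($M^d \subset R^m$) embedded in $R^m$ ($d < m$). The target problem for LLTSA is then to find the transformation matrix $A$ which maps the original set $X_{\mathrm{ORG}}$ to the set $Y = [y_1, y_2, \ldots, y_N]$ in $R^d$:

$$Y = A^{T} X_{\mathrm{ORG}} H_N, \tag{12}$$

where $H_N = I - ee^{T}/N$ represents the centering matrix, $I$ is the identity matrix, $e$ is the $N$-dimensional column vector of all ones, and $N$ denotes the number of data points.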
The LLTSA algorithm procedures are described as follows.
(1) PCA Projection. Project the raw dataset $X_{\mathrm{ORG}}$ into the PCA subspace by discarding the minor components. For clarity, $X$ is used to represent the dataset in the PCA subspace in the following steps, and $A_{\mathrm{pca}}$ denotes the transformation matrix of PCA.
Owing to the good clustering performance of LLTSA, the $d$-dimensional feature set $Y$ output by LLTSA can serve as the input vectors of the classifier for pattern recognition.
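A compact sketch of the whole LLTSA procedure may help to make the idea concrete. It follows the usual formulation of the algorithm (PCA projection, neighborhood search, local tangent space estimation, alignment, and a generalized eigenvalue problem) and is written under several assumptions: samples are stored as columns, the neighborhood of a point includes the point itself, the generalized eigenvalue problem is solved through a Cholesky factorization, and the function name and the test data are hypothetical.

```python
import numpy as np

def lltsa(X_org, d, k, var_tol=1e-10):
    """Sketch of LLTSA. X_org is (m, N) with one sample per column.
    Returns the (m, d) transformation matrix A and the (d, N) embedding Y."""
    m, N = X_org.shape
    Xc = X_org - X_org.mean(axis=1, keepdims=True)       # equals X_org @ H_N
    # PCA projection: discard directions with negligible variance.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    A_pca = U[:, S > var_tol * S[0]]
    X = A_pca.T @ Xc                                      # (r, N), already centered
    # Neighborhoods, local tangent spaces, and the alignment matrix B.
    sq = (X ** 2).sum(axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    H_k = np.eye(k) - np.ones((k, k)) / k
    B = np.zeros((N, N))
    for i in range(N):
        idx = np.argsort(dist2[i])[:k]                    # k nearest points
        _, _, Vt = np.linalg.svd(X[:, idx] @ H_k, full_matrices=False)
        V = Vt[:d].T                                      # (k, d) local coordinates
        W = H_k @ (np.eye(k) - V @ V.T)
        B[np.ix_(idx, idx)] += W @ W.T
    # Generalized eigenproblem (X B X^T) a = lambda (X X^T) a, smallest d values.
    M = X @ B @ X.T
    C = X @ X.T
    L = np.linalg.cholesky(C)
    P = np.linalg.solve(L, M)
    S_mat = np.linalg.solve(L, P.T).T                     # L^{-1} M L^{-T}
    S_mat = (S_mat + S_mat.T) / 2.0
    _, vecs = np.linalg.eigh(S_mat)                       # ascending eigenvalues
    alpha = np.linalg.solve(L.T, vecs[:, :d])
    A = A_pca @ alpha
    return A, A.T @ Xc

# Hypothetical data: three noisy clusters of 20 points each in 8 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(8, 3))
X_org = np.repeat(centers, 20, axis=1) + rng.normal(size=(8, 60))
A, Y = lltsa(X_org, d=2, k=10)
```

Because the mapping is linear, a new sample is embedded by subtracting the training mean and multiplying by the transpose of `A`, which is what allows testing samples to be projected with the matrix learned from the training samples.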

ELM Classifier
The ELM proposed by Huang et al. [22] is a fast machine learning technique based on single hidden layer feedforward networks. A brief description of ELM is as follows.
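Given a training dataset with $N$ samples $\{x_j, t_j\}_{j=1}^{N}$, where $x_j \in R^n$ is the input vector and $t_j \in R^m$ stands for the target vector, the output of ELM with $L$ hidden neurons can be represented as

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \qquad j = 1, 2, \ldots, N, \tag{15}$$

where $g(\cdot)$ is the activation function, $w_i$ is the vector of the link weights between the $i$th hidden neuron and the input layer, $\beta_i$ is the vector of the link weights between the $i$th hidden neuron and the output layer, $b_i$ indicates the bias of the $i$th hidden neuron, and $o_j$ is the output vector of the $j$th input sample. If ELM can approximate these samples without error, then

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \qquad j = 1, 2, \ldots, N. \tag{16}$$

And (16) can be rewritten as

$$H\beta = T, \tag{17}$$

where $H$ denotes the output matrix of the hidden layer and can be expressed as

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \tag{18}$$

and $\beta = [\beta_1, \beta_2, \ldots, \beta_L]^{T}$ is the matrix of the link weights from the hidden layer to the output layer, while $T = [t_1, t_2, \ldots, t_N]^{T}$ is the matrix of the target vectors. Typically, $\beta$ can be determined by the Moore-Penrose (MP) generalized inverse $H^{\dagger}$ of $H$:

$$\beta = H^{\dagger} T. \tag{19}$$

Solving for the output weights with the MP inverse in this way gives ELM its generalization performance. The structure of ELM, with its input layer, hidden layer, and output layer, is displayed in Figure 3.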
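The training rule above can be illustrated with a minimal sketch. It assumes a sigmoid activation, input weights and biases drawn uniformly from [-1, 1], and one-hot target vectors; these choices, the function names, and the toy data are assumptions of the sketch and are not taken from the experiments of this paper.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden layer output matrix H (N x L) with a sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, n_hidden=20, seed=0):
    """X is (N, n) inputs, T is (N, m) targets. Returns (W, b, beta)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                 # random hidden biases
    H = hidden_output(X, W, b)
    beta = np.linalg.pinv(H) @ T        # output weights from the MP inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Class index with the largest network output for each sample."""
    return np.argmax(hidden_output(X, W, b) @ beta, axis=1)

# Toy data: three well separated clusters in two dimensions, 30 points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 3.0]])
labels = np.repeat(np.arange(3), 30)
X = centers[labels] + 0.3 * rng.normal(size=(90, 2))
T = np.eye(3)[labels]                   # one-hot target vectors

W, b, beta = elm_train(X, T, n_hidden=20)
train_accuracy = float(np.mean(elm_predict(X, W, b, beta) == labels))
```

Only the output weights are fitted, and they are obtained in a single linear solve, which is why the number of hidden neurons is the only parameter that has to be chosen.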

The Proposed Fault Diagnosis Method
Based on the advantages of DTCWPT, IMPE, LLTSA, and ELM, a novel bearing fault diagnosis method is proposed in this paper, and the flow chart of this method is shown in Figure 4. The detailed procedures are described as follows.
(1) Process the collected samples using DTCWPT and acquire the corresponding subband signals. Considering the tradeoff between the classification accuracy and the computational burden, the decomposition level of DTCWPT is set to 2 in this study. Each sample is therefore decomposed into four subband signals by DTCWPT.
(2) Apply the IMPE algorithm to calculate the PE values of the obtained subband signals at different scales. Before using IMPE, four parameters need to be set: the embedding dimension $h$, the length of the signal $N$, the time delay $\tau$, and the scale factor $s$. Since $h$ determines the number of accessible states $h!$, the estimation of PE relies heavily on the selected embedding dimension. If the dimension is too small, the scheme will not work because there are too few distinct states. If the dimension is too large, the calculation becomes time-consuming. The embedding dimension $h$ is therefore chosen as a tradeoff between the information loss and the computational burden. In this paper, $h$ is set to 4. The signal length $N$ also influences the estimation of PE. $N$ should satisfy the criterion $N \ge 5h!$, which is recommended in [23] to obtain reliable statistics. However, too large a value of $N$ decreases the computational efficiency. A signal with 1024 points is enough to obtain a reliable result. Therefore, we set $N = 1024$ in this study. The time delay $\tau$ has little effect on the calculated result; here we set $\tau = 1$. As for the scale factor $s$, when $s$ is too small, the feature information acquired from the signal is insufficient. On the other hand, if $s$ is too large, the PE values obtained at large scales are unstable. Taking these constraints into consideration, based on the criterion $s \le N/(h+1)!$ proposed in [14] and the selected $N$ and $h$, the scale factor $s$ is set to 8 in this paper.
(3) Combine the calculated IMPE of each subband signal and construct the feature vector for each sample. Since each sample is decomposed into four subband signals, and the number of calculated PE values is 8 for each subband signal, the dimension of the constructed feature vector is 32; namely, 32 features are extracted for each sample.
(4) Use the LLTSA algorithm to compress the constructed feature vectors and acquire new feature vectors with lower dimension. In the LLTSA algorithm, two parameters need to be adjusted: the neighborhood size $k$ and the intrinsic dimension $d$. If $k$ is too small, LLTSA cannot discover the intrinsic structure information of the high dimensional feature vectors well. Conversely, if $k$ is too large, LLTSA loses the ability of nonlinear dimension reduction. As for the intrinsic dimension $d$, if this parameter is chosen larger than the true value, much redundant information will be preserved. If it is chosen smaller, useful information of the feature vectors will be discarded during the dimension reduction. For LLTSA, there is an approximately linear relation between the optimal neighborhood size $k$ and the intrinsic dimension $d$ [24]. Hence, we choose $k = d$ according to this linear relationship in this paper and use the cross validation method to determine the intrinsic dimension $d$.
(5) Feed the acquired new feature vectors into the ELM classifier for training and testing and distinguish the bearing condition automatically. Compared with some classic classifiers, ELM requires less human intervention. Only the number of hidden neurons needs to be selected. Generally, as long as the number of hidden neurons is larger than 20, the classification accuracy of ELM will remain stable [25]. Therefore, the number of hidden neurons is set to 20 in this paper.
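The parameter choices in steps (1) to (3) can be checked with a few lines of arithmetic. The variable names are hypothetical; the values are the ones stated above.

```python
import math

h = 4          # embedding dimension
N = 1024       # signal length
level = 2      # DTCWPT decomposition level

min_length = 5 * math.factorial(h)          # criterion N >= 5 h!  -> 120
max_scale = N // math.factorial(h + 1)      # criterion s <= N / (h + 1)!  -> 8
n_subbands = 2 ** level                     # 4 subband signals per sample
feature_dim = n_subbands * max_scale        # 32 features per sample
shortest_series = N // max_scale            # 128 points remain at the largest scale

print(min_length, max_scale, n_subbands, feature_dim, shortest_series)
```

Since 1024 is well above 120 and 1024/120 is about 8.5, the selected values of 1024 points and a scale factor of 8 satisfy both criteria.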

Analysis on Experimental Data
6.1. Experimental Data Description.
The experimental data from Case Western Reserve University are used to verify the proposed method [26]. Figure 5 displays the experimental system, which consists of an electric motor, a torque transducer/encoder, and a dynamometer. The SKF6205-2RS deep groove ball bearing supporting the shaft at the drive end was used in the test. The rolling bearings were seeded with single point defects with diameters of 0.1778 mm, 0.3556 mm, and 0.5334 mm, respectively, using electric discharge machining. The defects were set on the inner race, the outer race, and the rolling element, respectively. An accelerometer was mounted on the motor housing to collect the vibration signals of the bearings under three different fault types as well as the normal condition. The rotating speed of the motor was 1797 r/min and the sampling frequency was 12000 Hz. Every fault type contains three fault degrees, one for each defect diameter, so that 10 bearing conditions are considered together with the normal condition. The time domain waveforms of the vibration signals are shown in Figure 6.

Results and Discussions.
Since the measured vibration signals of the bearings under different conditions exhibit nonlinear and nonstationary characteristics, it is difficult to distinguish the different fault types and fault degrees using only the time domain waveforms in Figure 6. Therefore, an effective method is needed to identify the different operating conditions accurately, and the proposed diagnosis method is applied. Firstly, in order to reduce the interference among the components of the original sample and discover the hidden feature information more effectively, each sample is decomposed to 2 levels using DTCWPT. Four subband signals containing different frequency band information are thus obtained. For the sake of space, only the decomposition results of the samples with the slight inner race fault (Slight-IRF) are shown in Figure 7 as a representative.
According to the flow chart of the proposed diagnosis method in Figure 4, after the signal decomposition and reconstruction are completed, the IMPE algorithm is used to extract the features at different scales from each subband signal of each sample. Figure 8 illustrates the IMPE values of subband signals 1, 2, 3, and 4 over 8 scales under 10 conditions. As shown in Figure 8, for each subband signal under different conditions, the PE values are well separated at some scales, while their differences are not obvious at other scales. The different fault types with various fault degrees still cannot be distinguished from the IMPE curves in Figure 8 alone. The feature vector of each sample is therefore constructed from the PE values of its four subband signals, and a multifault classifier is applied to recognize the different bearing conditions. In this paper, the ELM classifier is used for this purpose.
If the constructed feature vectors containing 32 PE values are directly taken as the inputs of the classifier, the classification will be time-consuming. Even worse, ELM cannot effectively distinguish the conditions of the samples, since the feature vectors inevitably contain interference and redundant information. LLTSA is therefore employed to compress the high dimensional feature vectors.
Before LLTSA is used, the intrinsic dimension $d$ of the original feature vectors needs to be selected. In this paper, this parameter is determined using a fivefold cross validation method [27,28]. That is, the 100 training samples are randomly divided into five equal-sized subsets. Each subset is validated on the ELM classifier trained using the other four subsets. The process is repeated 5 times, and the accuracy rate of the classifier is obtained by averaging the accuracy rates recorded in the testing folds. Finally, the value of $d$ that provides the best classification accuracy is chosen. In this paper, the intrinsic dimension $d$ in the fivefold validation varies in the interval $[3, m/2]$ with an incremental step size of 1, where $m = 32$ denotes the dimension of the original feature vectors. Figure 9 shows the curve of the classification accuracy versus the intrinsic dimension. As indicated in Figure 9, the accuracy reaches 100% when the intrinsic dimension is larger than 10. In order to avoid information redundancy as far as possible, we select the intrinsic dimension $d = 10$; the neighborhood size $k$ is then set to 10 according to the approximately linear relationship between the intrinsic dimension and the neighborhood size. After the parameter selection, LLTSA is performed on the constructed feature vectors. The original high dimensional feature vectors are projected into a low dimensional space, which yields the new 10-dimensional feature vectors. These vectors are then fed into ELM for training and testing. After the classifier is trained with the 100 feature vectors of the training samples, the remaining 400 feature vectors of the testing samples are used to test the ELM classifier. The classification results are shown in Figure 10, where the red asterisks denote the actual ELM output classifications of the samples, while the blue squares represent the desired output classifications. The 100 samples on the left side and the 400 samples on the right side of the dotted line are, respectively, the training samples and the testing samples. It can be seen that, for each sample, the actual ELM output classification is consistent with the desired one. No sample is misclassified, and the recognition accuracy reaches 100%. These results indicate that the proposed method is suitable and effective for bearing fault diagnosis.
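The fivefold selection of the intrinsic dimension described above can be sketched as follows. To keep the sketch short and self-contained, PCA followed by a nearest centroid rule is used as a plainly labeled stand-in for LLTSA followed by ELM, and the 100 feature vectors are synthetic; the function names, the stand-in methods, and the data are all assumptions and do not reproduce the experiment.

```python
import numpy as np

def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Randomly split the sample indices into equal-sized folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

def pca_centroid(train_x, train_y, test_x, d):
    """Stand-in for LLTSA + ELM: PCA to d dimensions, then nearest class centroid."""
    mean = train_x.mean(axis=0)
    _, _, Vt = np.linalg.svd(train_x - mean, full_matrices=False)
    P = Vt[:d].T
    a, b = (train_x - mean) @ P, (test_x - mean) @ P
    classes = np.unique(train_y)
    cents = np.stack([a[train_y == c].mean(axis=0) for c in classes])
    dists = ((b[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]

def cv_accuracy(features, labels, d, folds, fit_predict):
    """Average validation accuracy over the folds for one candidate d."""
    accs = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        pred = fit_predict(features[train], labels[train], features[val], d)
        accs.append(np.mean(pred == labels[val]))
    return float(np.mean(accs))

# Synthetic stand-in for the 100 training feature vectors (10 conditions, 32 features).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 10)
features = rng.normal(scale=3.0, size=(10, 32))[labels] + rng.normal(size=(100, 32))

folds = five_fold_indices(len(labels))
candidates = range(3, 32 // 2 + 1)                    # d in [3, m/2] with step 1
scores = {d: cv_accuracy(features, labels, d, folds, pca_centroid)
          for d in candidates}
best = max(scores.values())
d_selected = min(d for d in candidates if scores[d] == best)  # smallest best d
```

Choosing the smallest dimension among those with the best accuracy mirrors the stated aim of avoiding information redundancy as far as possible.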

Mathematical Problems in Engineering
In order to verify the advantage of IMPE, IMPE is compared with MPE by analyzing, as a representative case, the 50 independent subband signals of the bearing with the slight inner race fault. The parameters of the MPE algorithm are the same as those of the IMPE algorithm. The comparison results displayed in Figures 11 and 12 show that IMPE provides more accurate estimates of the entropy values with higher distinguishability than MPE. These results can be explained by the fact that, when the MPE algorithm is used to analyze short series, the number of points in the coarse-grained series decreases in inverse proportion to the scale factor. This not only gives rise to questionable and uncertain estimates of the entropy values but also increases the standard deviations of the features. The IMPE algorithm, however, avoids these drawbacks of MPE effectively and results in better classification accuracy.
To validate the necessity of the dimension reduction using LLTSA, the constructed original feature vectors without dimension reduction are used as the inputs of the ELM classifier for comparison. The classification results are displayed in Figure 13, from which it can be seen that three testing samples with the rolling element fault are assigned to the wrong fault degrees. The accuracy is 99.25%, which is lower than that of the method with dimension reduction. During the dimension reduction, LLTSA extracts low dimensional sensitive feature vectors from the high dimensional feature vectors, which contain interference and redundancies. Therefore, the recognition precision of ELM is improved. This indicates that the dimension reduction using LLTSA benefits the bearing condition classification and demonstrates the necessity of this procedure.
In addition, in order to verify the superiority of the proposed feature extraction method based on DTCWPT, IMPE, and LLTSA, the IMPE values calculated from the original samples are directly taken as the input feature vectors of the ELM classifier. The 100 training samples and the 400 testing samples, as well as the selected parameters, remain the same as before. The actual ELM output classification and the desired output classification of all the samples are shown in Figure 14, where 28 testing samples on the right side of the dotted line are misclassified. The recognition accuracy is 93%. This shows that the features extracted from the samples directly using IMPE cannot completely reflect the distinctions between different bearing conditions, so the classification results of the ELM classifier are unsatisfactory. This comparison demonstrates the superiority of the proposed feature extraction method, which combines IMPE with DTCWPT and LLTSA, owing to the abilities of DTCWPT and LLTSA to suppress the interference among the components and highlight the feature information of the samples.
Finally, the recognition accuracies of ELM, SVM, ANN, and KNNC using different feature extraction methods are compared. The training and the testing samples are the same for each comparison, and the feature vectors taken as the inputs of these classifiers are extracted by four different methods. The first method is the proposed method, that is, the combination of DTCWPT, IMPE, and LLTSA (DTCWPT + IMPE + LLTSA). The second method uses MPE instead of IMPE and obtains the feature vectors based on DTCWPT, MPE, and LLTSA (DTCWPT + MPE + LLTSA). The third method extracts the feature vectors using DTCWPT and IMPE without LLTSA (DTCWPT + IMPE). The last method treats the IMPE values calculated from the samples as the feature vectors (IMPE). The parameters of SVM are chosen as follows: the penalty factor is set to 100 and the RBF kernel parameter is set to 0.01 [29]. The parameters of ANN are selected as follows: the number of hidden neurons is 20, the maximum number of iterations is 500, the learning rate is 0.1, and the training error is 0.001 [30]. The number of neighbors of KNNC is set to 7 [31]. The classification results of ELM, SVM, ANN, and KNNC using different feature extraction methods are shown in Table 2 and Figure 15. Whichever feature extraction method is used, the classification accuracy of ELM is higher than that of the other three classifiers. This verifies the advantage of ELM in classification performance.
Table 2 and Figure 15 show that, with the feature vectors extracted through the first method, the testing accuracies of all the classifiers are higher than those obtained with the feature vectors extracted by the other three methods. In terms of the average testing accuracy of the four classifiers, the first feature extraction method is 2.19%, 0.62%, and 7.93% better than the second, the third, and the last method, respectively, which in turn verifies the advantage of the presented feature extraction method based on DTCWPT, IMPE, and LLTSA.

Conclusions
IMPE is a recently proposed technique for evaluating the complexity and detecting the dynamic changes of time series. Its application in bearing fault diagnosis is investigated for the first time in this work, and a novel fault diagnosis method for rolling bearings combining IMPE with DTCWPT, LLTSA, and ELM is proposed. To address the nonlinear and nonstationary characteristics of the bearing vibration signals, DTCWPT is employed to preprocess the signal and obtain the corresponding subband signals. IMPE is then taken as the feature extractor to calculate the PE values of each subband signal at different scales. To reduce the dimension of the constructed feature vectors, LLTSA is applied to compress the high dimensional vectors and sift out the principal sensitive features, which form the new low dimensional vectors. The ELM classifier is then adopted to implement the condition identification. The presented feature extraction method is compared with other methods, and the comparison results indicate that it obtains feature vectors with better separability. The classification performance of the ELM classifier is also compared with that of other widely used classifiers, and the advantage of ELM is verified by the comparison results. The experimental data analysis results demonstrate that the proposed fault diagnosis method is suitable and effective in recognizing the different fault types and fault degrees of rolling bearings.
Because the proposed diagnosis method is data-driven and does not rely on operators' experience, it is well suited to wide use in highly automated industry. The proposed method is a promising approach that is not limited to rolling bearing fault diagnosis and could also be applied to the fault diagnosis of other mechanical equipment.
Because of its consumption of computer resources, the proposed diagnosis method may not yet be fast enough for real-time use. In addition, only a constant working load is discussed in this paper. If the working load changes dramatically, the accuracy and the efficiency of the proposed method may be affected. Further studies will therefore focus on solving these problems.

Figure 3: The structure of ELM.

Figure 4: Flow chart of the proposed method.

Figure 5: The rolling bearing experimental system.

Figure 6: The time domain waveforms of the samples of the bearings under ten kinds of different conditions.

Figure 11. Firstly, the mean curves of the PE values derived from IMPE are very close to those derived from MPE. Secondly, compared with the MPE algorithm, the IMPE algorithm yields smaller standard deviations of the PE values. The same conclusions can be drawn by analyzing the subband signals of the bearings under the other conditions. This indicates that the IMPE algorithm is more stable than the traditional MPE algorithm, which means that IMPE can provide a more accurate PE estimation on nonlinear and nonstationary signals. To further illustrate the advantage of IMPE, the feature vectors extracted by the processing method based on DTCWPT, MPE, and LLTSA are also fed into the ELM classifier to distinguish the bearing conditions. The actual and the desired ELM output classifications of the training and the testing samples are shown in Figure 12. On the right side of the dotted line, the locations of eight red asterisks are inconsistent with those of the blue squares, which indicates that eight testing samples are misclassified and that the classification accuracy is 98%. It can be observed from Figure 12 that two testing samples with Slight-IRF are misclassified as Medium-IRF, two testing samples with Slight-ORF are misclassified as Medium-IRF and Medium-ORF, one testing sample with Severe-ORF is misclassified as Medium-IRF, and three testing samples with Medium-REF are misclassified as Slight-REF and Medium-IRF.
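The difference between MPE and IMPE discussed above can be sketched in a few lines. The code below is a minimal illustration assuming the commonly used formulation, in which MPE computes the PE of a single coarse-grained series per scale, whereas IMPE averages the PE over all coarse-grained series obtained by shifting the starting point within one scale window. The function names, the embedding dimension m = 3, the time delay of 1, and the synthetic Gaussian test signal are assumptions chosen for illustration and are not the settings used in the experiments.

```python
import math
import random
from collections import Counter


def permutation_entropy(series, m=3, delay=1):
    """Normalized permutation entropy (0 to 1) of a sequence of numbers."""
    n_vectors = len(series) - (m - 1) * delay
    if n_vectors < 1:
        raise ValueError("series too short for the chosen m and delay")
    patterns = Counter()
    for i in range(n_vectors):
        window = [series[i + j * delay] for j in range(m)]
        # Ordinal pattern: indices that sort the window (ties keep their order).
        patterns[tuple(sorted(range(m), key=window.__getitem__))] += 1
    entropy = -sum((c / n_vectors) * math.log(c / n_vectors) for c in patterns.values())
    return entropy / math.log(math.factorial(m))


def coarse_grain(series, scale, offset=0):
    """Average consecutive non-overlapping windows of length `scale`, starting at `offset`."""
    n_points = (len(series) - offset) // scale
    return [sum(series[offset + j * scale: offset + (j + 1) * scale]) / scale
            for j in range(n_points)]


def mpe(series, scale, m=3, delay=1):
    """MPE at one scale: PE of the single coarse-grained series with offset 0."""
    return permutation_entropy(coarse_grain(series, scale), m, delay)


def impe(series, scale, m=3, delay=1):
    """IMPE at one scale: mean PE over the `scale` shifted coarse-grained series."""
    values = [permutation_entropy(coarse_grain(series, scale, k), m, delay)
              for k in range(scale)]
    return sum(values) / scale


# Synthetic stand-in for one subband signal (not experimental data).
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(2048)]

# One IMPE value per scale, over 8 scales as in Figure 8.
features = [impe(signal, s) for s in range(1, 9)]
print(features)
```

At scale 1 the two measures coincide, since there is only one possible starting point. At larger scales IMPE averages over several coarse-grained series, which is the mechanism expected to reduce the spread of the estimated PE values.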

Figure 8: IMPE of each subband signal over 8 scales under 10 different conditions.

Figure 9: Curve of classification accuracy versus intrinsic dimension.

Figure 10: Classification results of the proposed method.

Figure 11: MPE and IMPE comparison results of the samples of the bearing with slight inner race fault.

Figure 15: Classification results of different classifiers with feature vectors extracted by different methods.

Table 1: The detailed description of the experimental datasets.

Table 2: Classification results of different classifiers with feature vectors extracted by different methods.