Mathematical Problems in Engineering (MPE), ISSN 1563-5147, 1024-123X

Hindawi Publishing Corporation

DOI: 10.1155/2016/5432648 (Article ID 5432648)

Research Article

A Novel Method of Fault Diagnosis for Rolling Bearing Based on Dual Tree Complex Wavelet Packet Transform and Improved Multiscale Permutation Entropy

Tang Guiji (1), Wang Xiaolong (1), He Yuling (1), Chen Wen

(1) School of Energy, Power and Mechanical Engineering, North China Electric Power University, Baoding 071000, China (ncepu.edu.cn)

2016

952016

2016 08 02 2016 07 04 2016

2016

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A novel method of fault diagnosis for rolling bearings is put forward in this paper, which combines the dual tree complex wavelet packet transform (DTCWPT), the improved multiscale permutation entropy (IMPE), and the linear local tangent space alignment (LLTSA) with the extreme learning machine (ELM). To discover the underlying feature information effectively, DTCWPT, which has the attractive properties of near shift invariance and reduced aliasing, is first used to decompose the original signal into a set of subband signals. Then IMPE, which is designed to reduce the variability of entropy measures, is applied to characterize each subband signal at different scales, and the feature vectors are constructed by combining the IMPE values of all subband signals. Next, LLTSA is employed to compress the high dimensional feature vectors of the training and testing samples into low dimensional vectors with better distinguishability. Finally, the ELM classifier is used to identify the bearing condition automatically from the low dimensional feature vectors. The analysis of experimental data validates the effectiveness of the presented method and demonstrates that it can distinguish different fault types and fault degrees of rolling bearings.

1. Introduction

Rolling bearings are among the most widely used parts in rotating machinery, and they directly affect the operational reliability, the performance precision, and the service life of the entire equipment. Failures of rolling bearings may cause catastrophic accidents and result in great loss. Therefore, condition monitoring and fault diagnosis for rolling bearings are of great significance in engineering applications [1, 2].

Owing to factors such as friction, impact, and structural changes, the vibration signals of bearings are often nonlinear and nonstationary, and the major challenge for bearing condition monitoring and fault diagnosis is to acquire reliable and sensitive features from these signals [3]. In recent years, with the development of nonlinear dynamic theories, a series of nonlinear parameter estimation techniques have been investigated and introduced to the field of bearing condition monitoring and fault diagnosis. For example, the correlation dimension was chosen as a tool for discovering the fault features of bearings by Kang et al. [4]. Unfortunately, the estimation of the correlation dimension usually requires a large amount of data, which prevents this technique from being widely used. Yan and Gao [5] applied approximate entropy (AE) to monitor the bearing condition. However, AE depends heavily on the signal length, and the calculated value is uniformly smaller than the expected one when processing short signals [6]. Later, the sample entropy (SE) was proposed by Richman and Moorman [7] to overcome this drawback of AE. The signals collected from bearing systems usually contain structures on multiple temporal scales, but AE and SE both evaluate the complexity of a signal at a single scale. Hence, these two approaches have limited performance in analyzing bearing signals. To address the disadvantage of single scale analysis, the multiscale entropy (MSE) was developed by Costa et al. [8] to estimate the complexity of time series over a range of scales, and this technique was used by Zhang et al. [9] to extract the features of bearing signals. However, the estimation of MSE is easily affected by outliers in the signal, and the computational efficiency of MSE is very low for long signals.

In literature [10], a new kind of entropy named permutation entropy (PE) was proposed to measure the complexity and detect the dynamic changes of signal. Compared with AE and SE, the calculation of PE is simple and immune to noises. But similar to AE and SE, PE also conducts the entropy measure in a single scale. Then the multiscale permutation entropy (MPE) method based on PE was further proposed by Aziz and Arif [11] to depict the multiple temporal scale structures of the signal. And this method was, respectively, applied by Li and Zheng to bearing fault diagnosis [12, 13]. Nevertheless, the analysis results of MPE are usually unstable for short term signals. Recently, a novel method called improved multiscale permutation entropy (IMPE) was proposed by Azami and Escudero [14] to remedy the weakness of MPE and the effectiveness of this method has been verified by the simulated signal and the real biomedical signal. In view of the advantage of IMPE in digging the inherent features of signal, this method is introduced to the field of fault diagnosis and utilized to identify the condition of rolling bearing in this paper.

In practice, the collected bearing signals are contaminated to some degree by environmental noise, and interference between the components of a complicated signal is inevitable. These factors make it difficult to extract feature information by applying IMPE to the raw signal directly, so the subsequent analysis benefits if the original signals are processed in advance.

Up to now, many signal processing techniques such as empirical mode decomposition (EMD), local mean decomposition (LMD), and discrete wavelet transform (DWT) have been developed and applied in different research fields. As an adaptive signal processing method, EMD can decompose a signal into a set of intrinsic mode functions. However, EMD lacks a rigorous mathematical framework and suffers from mode mixing and end effects. LMD shares these drawbacks, which have not been fundamentally addressed [15]. DWT is a classic signal processing tool and has been widely used to analyze mechanical fault signals, but its shift variance and frequency aliasing may cause the loss of useful information. Kingsbury [16] then proposed the dual tree complex wavelet transform (DTCWT), which possesses many excellent properties in comparison with DWT, such as near shift invariance, good directional selectivity, and reduced aliasing. However, DTCWT cannot achieve multiresolution analysis in the high frequency region, where the useful feature information usually exists. As an extension of DTCWT, the dual tree complex wavelet packet transform (DTCWPT) was developed to offset this shortcoming [17]. After performing DTCWPT on the collected signal, a precise partition of the whole analyzed frequency range into bands is achieved and the corresponding subband signals are obtained. Analyzing these subband signals with IMPE is expected to reveal the feature information of the original signal more effectively. Based on the above analysis, DTCWPT is combined with IMPE for the first time in this study for bearing fault diagnosis.

After feature extraction using DTCWPT and IMPE, the acquired feature vectors need to be fed into a classifier to achieve condition recognition. However, these high dimensional vectors inevitably contain redundant information. Using the entire vectors as the classifier inputs is time-consuming and may reduce the diagnostic accuracy. Therefore, the manifold learning algorithm named linear local tangent space alignment (LLTSA) [18] is employed in this paper to reduce the dimension of the vectors. With LLTSA, the high dimensional feature vectors are automatically compressed into sensitive feature vectors of lower dimension, which not only reduces the computational burden but also improves the diagnostic precision.

Naturally, an intelligent classifier is needed to automatically distinguish the bearing condition based on the obtained sensitive feature vectors. Extreme learning machine (ELM) [19] is a novel powerful intelligent machine learning approach based on single hidden layer feed-forward networks. Compared with some classic machine learning methods such as support vector machines (SVM), artificial neural network (ANN), and K nearest neighbor classifier (KNNC), the main advantages of ELM lie in better generalization ability on the small samples, faster learning speed, and less human intervention. Thus, in this paper, ELM is utilized to distinguish the bearing condition.

The rest of this paper is organized as follows. Section 2 proposes the feature extraction method based on DTCWPT and IMPE. Section 3 presents the feature dimension reduction method based on LLTSA. Section 4 briefly introduces the ELM classifier. Section 5 illustrates the detailed procedures of the proposed diagnosis method. In Section 6, the proposed method is applied to rolling bearing experimental data and some comparisons are made. Finally, conclusions are drawn in Section 7.

2. Feature Extraction Based on DTCWPT and IMPE

Fault diagnosis for rolling bearings comprises feature extraction and pattern recognition. Feature extraction is the most important part of fault diagnosis, because the bearing conditions are identified according to the extracted features. To take advantage of DTCWPT in processing nonstationary and nonlinear signals and of IMPE in characterizing the properties of a signal, these two methods are combined to extract the feature information from the bearing signal.

2.1. A Brief View of DTCWPT

DTCWPT is an enhancement to the traditional discrete wavelet packet transform (WPT). In the decomposition and the reconstruction process of DTCWPT, two parallel WPTs with different low pass and high pass filters in each level are utilized. These can be, respectively, regarded as the real tree and the imaginary tree in DTCWPT algorithm. And information complementation can be achieved in the process of signal processing [20]. The decomposition process of DTCWPT is implemented through a set of low pass and high pass filters recursively as follows.

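Real tree decomposition is as follows:
$$c^{\mathrm{Re}}_{l+1,2N}(k)=\sum_{m}h_{0}(m-2k)\,c^{\mathrm{Re}}_{l,N}(m),\qquad c^{\mathrm{Re}}_{l+1,2N+1}(k)=\sum_{m}h_{1}(m-2k)\,c^{\mathrm{Re}}_{l,N}(m). \tag{1}$$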

Imaginary tree decomposition is as follows:
$$c^{\mathrm{Im}}_{l+1,2N}(k)=\sum_{n}g_{0}(n-2k)\,c^{\mathrm{Im}}_{l,N}(n),\qquad c^{\mathrm{Im}}_{l+1,2N+1}(k)=\sum_{n}g_{1}(n-2k)\,c^{\mathrm{Im}}_{l,N}(n), \tag{2}$$
where $h_0$ and $h_1$, respectively, represent the low pass and the high pass filters used by the WPT of the real tree, while $g_0$ and $g_1$ are the low pass and the high pass filters used by the WPT of the imaginary tree. $c^{\mathrm{Re}}_{l,N}$ and $c^{\mathrm{Im}}_{l,N}$, respectively, denote the coefficients in the real tree and the imaginary tree at the $l$th level, $N$th node. At level $l=0$, the coefficients of both trees are the original signal $x(t)$; namely, $c^{\mathrm{Re}}_{0,0}=c^{\mathrm{Im}}_{0,0}=x(t)$. The decomposition process of DTCWPT is illustrated in Figure 1.

Figure 1

Decomposition process of DTCWPT.
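As an illustration of the recursion in (1) and (2), the following minimal Python sketch performs a single split of one node into its low pass and high pass children. It is not the implementation used in this paper: the function name `analysis_step`, the periodic signal extension, and the Haar filter pair are assumptions made only to keep the example short, whereas an actual DTCWPT requires specially designed filter pairs for the two trees. Calling the same function with $(h_0,h_1)$ and with $(g_0,g_1)$ would give the real tree and the imaginary tree children, respectively.

```python
import math

def analysis_step(c, low, high):
    """Split one node: out(k) = sum_m filt(m - 2k) * c(m), as in (1) and (2)."""
    n_out = len(c) // 2

    def filter_and_downsample(filt):
        out = []
        for k in range(n_out):
            acc = 0.0
            for idx, coeff in enumerate(filt):  # idx plays the role of m - 2k
                m = (idx + 2 * k) % len(c)      # periodic extension (assumption)
                acc += coeff * c[m]
            out.append(acc)
        return out

    return filter_and_downsample(low), filter_and_downsample(high)

# Placeholder Haar filters, NOT the dual tree filters of the paper.
s = 1.0 / math.sqrt(2.0)
h0, h1 = [s, s], [s, -s]

x = [4.0, 2.0, 5.0, 5.0]
approx, detail = analysis_step(x, h0, h1)
print(approx, detail)
```

Applying the step recursively to both children at every level yields the $2^l$ nodes of a level $l$ packet decomposition.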

The corresponding reconstruction operation of DTCWPT is as follows.

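Real tree reconstruction is as follows:
$$c^{\mathrm{Re}}_{l,N}(k)=\sum_{m}\tilde{h}_{0}(k-2m)\,c^{\mathrm{Re}}_{l+1,2N}(m)+\sum_{m}\tilde{h}_{1}(k-2m)\,c^{\mathrm{Re}}_{l+1,2N+1}(m). \tag{3}$$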

Imaginary tree reconstruction is as follows:
$$c^{\mathrm{Im}}_{l,N}(k)=\sum_{n}\tilde{g}_{0}(k-2n)\,c^{\mathrm{Im}}_{l+1,2N}(n)+\sum_{n}\tilde{g}_{1}(k-2n)\,c^{\mathrm{Im}}_{l+1,2N+1}(n), \tag{4}$$
where $\tilde{h}_0$ and $\tilde{h}_1$, respectively, represent the low pass and the high pass reconstruction filters used by the WPT of the real tree, while $\tilde{g}_0$ and $\tilde{g}_1$ denote the low pass and the high pass reconstruction filters used by the WPT of the imaginary tree.

2.2. Background of PE, MPE, and IMPE

2.2.1. Permutation Entropy

The permutation entropy (PE) was proposed to detect the dynamic changes of time series based on comparison of neighboring values of time series [13]. The calculation steps of PE are described as follows.

Given a time series $Y=\{y_1,y_2,\ldots,y_N\}$ of length $N$, the $h$ dimensional delay embedding vector at time $t$ can be constructed as $Y_t^{h,q}=\{y_t,y_{t+q},\ldots,y_{t+(h-1)q}\}$ ($t=1,2,\ldots,N-(h-1)q$), where $h$ represents the embedding dimension and $q$ is the time delay. The vector $Y_t^{h,q}$ is said to have permutation type $\pi_{m_0m_1\cdots m_{h-1}}$ if it satisfies
$$y_{t+m_0q}\le y_{t+m_1q}\le\cdots\le y_{t+m_{h-1}q}, \tag{5}$$
where $0\le m_i\le h-1$ and $m_i\ne m_j$ for $i\ne j$.

There are $h!$ different permutation types for an $h$ dimensional vector. For each permutation type $\pi\in T$ ($T$ denotes the set of all permutation types), $p(\pi)$ denotes the relative frequency
$$p(\pi)=\frac{\mathrm{Number}\{t\mid 1\le t\le N-(h-1)q,\ Y_t^{h,q}\ \text{has type}\ \pi\}}{N-(h-1)q}. \tag{6}$$

Then the PE of the time series $Y$ is calculated as follows:
$$\mathrm{PE}(Y,h,q)=-\sum_{\pi\in T}p(\pi)\ln p(\pi). \tag{7}$$
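The calculation in (5)-(7) can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in this study; the function name `permutation_entropy` and the handling of tied values (ties keep their original order) are assumptions.

```python
import math
from collections import Counter

def permutation_entropy(y, h=4, q=1):
    """PE of series y with embedding dimension h and time delay q, per (5)-(7)."""
    n_vectors = len(y) - (h - 1) * q
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen h and q")
    counts = Counter()
    for t in range(n_vectors):
        window = [y[t + i * q] for i in range(h)]
        # Permutation type: the index order that sorts the window ascending.
        # sorted() is stable, so equal values keep their order (assumption).
        pattern = tuple(sorted(range(h), key=window.__getitem__))
        counts[pattern] += 1
    # Natural logarithm, as in (7); types that never occur contribute nothing.
    return -sum((c / n_vectors) * math.log(c / n_vectors)
                for c in counts.values())

pe_ramp = permutation_entropy(list(range(10)), h=3, q=1)
pe_alternating = permutation_entropy([1, 2] * 4 + [1], h=2, q=1)
print(pe_ramp, pe_alternating)
```

A monotonic ramp contains a single permutation type and therefore has zero PE, while the alternating series contains the two types of dimension 2 equally often and reaches the maximum value ln 2 for h=2.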

2.2.2. Multiscale Permutation Entropy

The multiscale permutation entropy (MPE) is defined as the PE set of time series at different scales. Considering the time series X={x1,x2,…,xN} with the length of N, the computational procedures of MPE are as follows.

(1) The original time series is first divided into coarse-grained series $y_j^{(\tau)}$ according to (8), and the schematic of this procedure is shown in Figure 2:
$$y_j^{(\tau)}=\frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau}x_i,\qquad 1\le j\le\frac{N}{\tau}, \tag{8}$$
where $\tau$ denotes the scale factor.

Figure 2

The schematic of the coarse-grained procedure for scale factors τ=2 and τ=3.

(2) The PE of each coarse-grained series is calculated based on (6) and (7) and then plotted as a function of the scale factor $\tau$, which can be expressed as follows:
$$\mathrm{MPE}(X,h,q,\tau)=\mathrm{PE}\bigl(y_j^{(\tau)},h,q\bigr). \tag{9}$$
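The two MPE steps can be sketched as follows. The sketch is illustrative only: the names `coarse_grain` and `mpe` are hypothetical, samples that do not fill a complete window of length τ are discarded (an assumption), and a compact PE routine following (5)-(7) is repeated so that the example is self-contained.

```python
import math
from collections import Counter

def permutation_entropy(y, h, q):
    """Compact PE following (5)-(7)."""
    n = len(y) - (h - 1) * q
    counts = Counter(
        tuple(sorted(range(h), key=lambda i: y[t + i * q])) for t in range(n)
    )
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def coarse_grain(x, tau):
    """Means over non-overlapping windows of length tau, equation (8)."""
    n_windows = len(x) // tau  # incomplete trailing window dropped (assumption)
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_windows)]

def mpe(x, h, q, max_tau):
    """PE of the coarse-grained series for tau = 1, ..., max_tau, equation (9)."""
    return [permutation_entropy(coarse_grain(x, tau), h, q)
            for tau in range(1, max_tau + 1)]

grained = coarse_grain([1, 2, 3, 4, 5, 6], 2)
curve = mpe(list(range(64)), h=3, q=1, max_tau=4)
print(grained, curve)
```

Note that the coarse-grained series at scale τ has only about N/τ points, which is why the PE estimate becomes less reliable at large scales for short signals.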

2.2.3. Improved Multiscale Permutation Entropy

From Figure 2, it can be found that the coarse-grained procedure in MPE method can be considered as the procedure of averaging the original time series within a τ-length window and then downsampling by a scale factor of τ. However, the imprecise and unreliable results may occur in the process of downsampling at a certain scale [14]. To overcome the drawback of MPE, IMPE algorithm is proposed, and the calculation steps are as follows.

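(1) For a given scale factor $\tau$, the original time series is divided into $\tau$ different coarse-grained series $z_i^{(\tau)}=\{y_{i,1}^{(\tau)},y_{i,2}^{(\tau)},\ldots\}$ ($i=1,2,\ldots,\tau$) based on the following equation:
$$y_{i,j}^{(\tau)}=\frac{1}{\tau}\sum_{f=0}^{\tau-1}x_{f+i+\tau(j-1)}. \tag{10}$$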

Then the τ different coarse-grained series zi(τ) (i=1,…,τ) corresponding to the scale factor τ are considered in IMPE algorithm, while, in MPE algorithm, only z1(τ) is taken into account.

(2) Calculate the PE of each coarse-grained series $z_i^{(\tau)}$ ($i=1,\ldots,\tau$) corresponding to the scale factor $\tau$ separately. Then IMPE is obtained as the average of these PE values:
$$\mathrm{IMPE}(X,h,q,\tau)=\frac{1}{\tau}\sum_{i=1}^{\tau}\mathrm{PE}\bigl(z_i^{(\tau)},h,q\bigr). \tag{11}$$
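A minimal Python sketch of (10) and (11) is given below. It is not the authors' code: the names `shifted_coarse_grain` and `impe` are hypothetical, and each shifted series keeps only the complete windows that fit into the signal (an assumption). A compact PE routine is repeated so that the block runs on its own.

```python
import math
from collections import Counter

def permutation_entropy(y, h, q):
    """Compact PE following (5)-(7)."""
    n = len(y) - (h - 1) * q
    counts = Counter(
        tuple(sorted(range(h), key=lambda i: y[t + i * q])) for t in range(n)
    )
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def shifted_coarse_grain(x, tau, i):
    """Series z_i^(tau) of equation (10); the start offset i runs from 1 to tau."""
    start = i - 1                          # convert to 0-based indexing
    n_windows = (len(x) - start) // tau    # complete windows only (assumption)
    return [sum(x[start + j * tau:start + (j + 1) * tau]) / tau
            for j in range(n_windows)]

def impe(x, h, q, tau):
    """Average PE over the tau shifted coarse-grained series, equation (11)."""
    values = [permutation_entropy(shifted_coarse_grain(x, tau, i), h, q)
              for i in range(1, tau + 1)]
    return sum(values) / tau

z1 = shifted_coarse_grain([1, 2, 3, 4, 5, 6, 7], 2, 1)
z2 = shifted_coarse_grain([1, 2, 3, 4, 5, 6, 7], 2, 2)
print(z1, z2, impe(list(range(64)), h=3, q=1, tau=4))
```

The series with i=1 coincides with the single coarse-grained series used by MPE, so IMPE reduces to MPE when only that term is kept; averaging over all τ offsets is what reduces the variability of the estimate.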

2.3. Feature Extraction

Usually, the collected signals of the bearings with local defect are complicated and the interference between the components in the signal is inevitable. Besides, the differences among the original signals of the bearings in various operating conditions may be subtle. These factors will result in the difficulties of feature information extraction. Then the signal processing procedure combining DTCWPT with IMPE is presented to address this issue.

As a useful tool for signal processing, DTCWPT is suitable for analyzing the complicated bearing signals. The original signal could be decomposed into several subband signals using DTCWPT, and the subband signal will be simpler than the original signal. Then the interference between the components in each subband signal will be slighter than that in the original signal. And the hidden features in the original signal will be easier to be discovered by analyzing the subband signals. Therefore, DTCWPT is regarded as a preprocessing technology to analyze the original signal. And the IMPE algorithm, which can effectively evaluate the complexity and detect the dynamic changes of the signal, is used in the subsequent analysis process.

After performing DTCWPT on the original signal, each node of the wavelet packet coefficients is reconstructed at a single level and the corresponding subband signals are obtained. Then IMPE is used to calculate the PE values of each subband signal at different scales. If the decomposition level of DTCWPT is $l$ and the maximum scale factor of IMPE is $\tau$, then the number of obtained subband signals is $2^l$ and the number of PE values calculated for each subband signal is $\tau$. Therefore, $2^l\times\tau$ PE values are obtained for every original signal, and the feature vectors constructed from these PE values can comprehensively reflect the differences between the signals under different bearing conditions.

3. LLTSA for Dimension Reduction

For the classifier, a large number of features will not only increase the computational complexity but may also reduce the classification accuracy. Therefore, the dimension of the obtained feature vectors needs to be reduced. The objective of dimension reduction in fault diagnosis has two main aspects: (1) removing the interfering and redundant information within the high dimensional feature vectors; (2) increasing the separability of the samples, namely, making different-class samples far from each other while making same-class samples close to each other.

Based on the previous analysis, in this paper, the LLTSA algorithm is utilized to compress the original vectors into new vectors with a lower dimension. The basic idea of LLTSA is to use the tangent space in the neighborhood of a data point to represent the local geometry of the feature space. Then the local manifold structures are aligned to construct the global coordinates [21].

Given a dataset $X_{\mathrm{ORG}}=[x_{\mathrm{org}1},x_{\mathrm{org}2},\ldots,x_{\mathrm{org}N}]$ in the Euclidean space $\mathbb{R}^m$, it is generally assumed that an underlying $d$ dimensional nonlinear manifold $M^d$ embedded in $\mathbb{R}^m$ ($d<m$) exists for $X_{\mathrm{ORG}}$. The target problem for LLTSA is then to find a transformation matrix $A$ which maps the original set $X_{\mathrm{ORG}}$ in $\mathbb{R}^m$ to the set $Y=[y_1,y_2,\ldots,y_N]$ in $\mathbb{R}^d$; that is,
$$Y=A^{T}X_{\mathrm{ORG}}H_N, \tag{12}$$
where $H_N=I-ee^{T}/N$ represents the centering matrix, $I$ is the identity matrix, $e$ is the $N$ dimensional column vector of all ones, and $N$ denotes the number of data points.

The LLTSA algorithm procedures are described as follows.

(1) PCA Projection. Project the raw dataset $X_{\mathrm{ORG}}$ into the PCA subspace by discarding the minor components. For clarity, $X$ is used to represent the dataset in the PCA subspace in the following steps, and $A_{\mathrm{pca}}$ denotes the transformation matrix of PCA.

(2) Determining the Neighborhood. The Euclidean distance matrix for all data points is constructed, and the $k$ nearest neighbors $x_{ij}$ ($j=1,2,\ldots,k$) belonging to the same class as point $x_i$ ($i=1,2,\ldots,N$) are obtained by analyzing the distance matrix.

(3) Extracting Local Information. Compute the tangent space matrix $V_i$ composed of the $d$ eigenvectors of $X_iH_k$ corresponding to its $d$ largest eigenvalues, where $X_i=[x_{i1},x_{i2},\ldots,x_{ik}]$ and $H_k=I-ee^{T}/k$.

(4) Constructing Alignment Matrix. Form the matrix $B$ by local summation as follows:
$$B(I_i,I_i)\leftarrow B(I_i,I_i)+W_iW_i^{T},\qquad i=1,2,\ldots,N, \tag{13}$$
where $B$ is initialized as $B=0$, $I_i=\{i_1,i_2,\ldots,i_k\}$ denotes the set of indices of the $k$ nearest neighbors of $x_i$, and $W_i=H_k(I-V_iV_i^{T})$ ($i=1,2,\ldots,N$).

(5) Computing the Maps. Compute the eigenvectors and the eigenvalues of the generalized eigenvalue problem
$$XH_NBH_NX^{T}\alpha=\lambda XH_NX^{T}\alpha. \tag{14}$$

Then the eigenvectors $\alpha_1,\alpha_2,\ldots,\alpha_d$, ordered according to the eigenvalues $\lambda_1<\lambda_2<\cdots<\lambda_d$, are obtained, and $A_{\mathrm{LLTSA}}=(\alpha_1,\alpha_2,\ldots,\alpha_d)$. Thus, the final transformation matrix is $A=A_{\mathrm{pca}}A_{\mathrm{LLTSA}}$, and $X\to Y=A^{T}X_{\mathrm{ORG}}H_N$.

Owing to the good clustering performance of LLTSA, the $d$ dimensional vector set $Y$ output by LLTSA can serve as the input vectors of the classifier for pattern recognition.
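Steps (3)-(5) of the procedure above can be sketched with NumPy as follows. This is a simplified illustration under several assumptions, not the implementation used in this paper: the PCA projection of step (1) is omitted, the neighbors are found without using class labels, each neighborhood includes the point itself, the tangent basis $V_i$ is taken from the leading right singular vectors of $X_iH_k$, and the generalized eigenvalue problem (14) is solved by Cholesky whitening. The function name `lltsa` and the random test data are hypothetical.

```python
import numpy as np

def lltsa(X, d, k):
    """X: (m, N) matrix whose columns are samples. Returns A (m, d), Y (d, N)."""
    m, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)            # X H_N (centered data)
    sq = np.sum(Xc ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * Xc.T @ Xc  # squared distances
    Hk = np.eye(k) - np.ones((k, k)) / k
    B = np.zeros((N, N))
    for i in range(N):
        idx = np.argsort(dist[i])[:k]                 # k nearest, self included
        Xi = Xc[:, idx] @ Hk                          # centered neighborhood
        _, _, Vt = np.linalg.svd(Xi, full_matrices=False)
        V = Vt[:d].T                                  # (k, d) local tangent basis
        W = Hk @ (np.eye(k) - V @ V.T)
        B[np.ix_(idx, idx)] += W @ W.T                # alignment update (13)
    M = Xc @ B @ Xc.T                                 # X H_N B H_N X^T
    R = Xc @ Xc.T                                     # X H_N X^T
    # Solve M a = lambda R a: with R = L L^T, substitute a = L^{-T} u.
    L_inv = np.linalg.inv(np.linalg.cholesky(R))
    S = L_inv @ M @ L_inv.T
    vals, vecs = np.linalg.eigh((S + S.T) / 2.0)      # ascending eigenvalues
    A = L_inv.T @ vecs[:, :d]                         # d smallest eigenvalues
    return A, A.T @ Xc

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 60))                     # hypothetical data
A_demo, Y_demo = lltsa(X_demo, d=2, k=8)
print(A_demo.shape, Y_demo.shape)
```

Because the mapping is linear, new samples can be projected with the same matrix after subtracting the training mean, which is what allows the testing samples to be reduced with the transformation learned from the training samples.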

4. ELM Classifier

The ELM proposed by Huang et al. [22] is a fast machine learning technique based on single hidden layer feed-forward networks. A brief description of ELM is as follows.

Given a training dataset with $N$ samples $\{x_i,y_i\}_{i=1}^{N}$, where $x_i\in\mathbb{R}^d$ is the input vector and $y_i\in\mathbb{R}^s$ stands for the target vector, the output of ELM with $L$ hidden neurons can be represented as
$$\sum_{i=1}^{L}b_i\,g(\omega_i\cdot x_j+\beta_i)=o_j,\qquad j=1,2,\ldots,N, \tag{15}$$
where $g(\cdot)$ is the activation function, $\omega_i$ is the vector of the link weights between the $i$th hidden neuron and the input layer, $b_i$ is the vector of the link weights between the $i$th hidden neuron and the output layer, $\beta_i$ indicates the bias of the $i$th hidden neuron, and $o_j$ is the output vector of the $j$th input sample. If ELM can approximate these samples without error, then
$$\sum_{i=1}^{L}b_i\,g(\omega_i\cdot x_j+\beta_i)=y_j,\qquad j=1,2,\ldots,N. \tag{16}$$

Equation (16) can be rewritten as
$$HB=Y, \tag{17}$$
where $H$ denotes the output matrix of the hidden layer and can be expressed as
$$H=\begin{bmatrix} g(\omega_1\cdot x_1+\beta_1) & \cdots & g(\omega_L\cdot x_1+\beta_L)\\ \vdots & \ddots & \vdots\\ g(\omega_1\cdot x_N+\beta_1) & \cdots & g(\omega_L\cdot x_N+\beta_L)\end{bmatrix}_{N\times L}, \tag{18}$$
$B=[b_1,b_2,\ldots,b_L]^{T}$ is the matrix of the link weights from the hidden layer to the output layer, and $Y=[y_1,y_2,\ldots,y_N]^{T}$ is the matrix of the target vectors. Typically, $B$ is determined with the Moore-Penrose (MP) generalized inverse $H^{*}$ of $H$:
$$B=H^{*}Y. \tag{19}$$

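Solving (17) through the MP generalized inverse gives the minimum norm least squares solution for $B$ without any iterative tuning, which underlies the fast learning speed and the good generalization performance reported for ELM [22]. The structure of ELM is displayed in Figure 3.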

Figure 3

The structure of ELM.
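The training rule in (15)-(19) can be illustrated with a short NumPy sketch. It is a minimal example, not the classifier configuration used in this paper: the sigmoid activation, the uniform random initialization in [-1, 1], the function names `elm_train` and `elm_predict`, and the toy XOR data are all assumptions.

```python
import numpy as np

def hidden_output(X, omega, beta):
    """Hidden layer output matrix H of (18) with a sigmoid activation g."""
    return 1.0 / (1.0 + np.exp(-(X @ omega + beta)))

def elm_train(X, Y, n_hidden, rng):
    """X: (N, d) inputs, Y: (N, s) targets. Returns (omega, beta, B)."""
    d = X.shape[1]
    omega = rng.uniform(-1.0, 1.0, size=(d, n_hidden))  # random input weights
    beta = rng.uniform(-1.0, 1.0, size=n_hidden)         # random hidden biases
    H = hidden_output(X, omega, beta)
    B = np.linalg.pinv(H) @ Y                            # equation (19)
    return omega, beta, B

def elm_predict(X, omega, beta, B):
    """Network outputs o_j of (15) for every row of X."""
    return hidden_output(X, omega, beta) @ B

# Toy XOR problem with one-hot targets (hypothetical data).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1, 1, 0])
Y_toy = np.eye(2)[labels]

rng = np.random.default_rng(0)
omega, beta, B = elm_train(X_toy, Y_toy, n_hidden=20, rng=rng)
predicted = elm_predict(X_toy, omega, beta, B).argmax(axis=1)
print(predicted)
```

Only the output weights B are learned; the input weights and biases stay at their random values, so training reduces to a single linear least squares problem.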

5. The Proposed Fault Diagnosis Method

Based on the advantages of DTCWPT, IMPE, LLTSA, and ELM, a novel bearing fault diagnosis method is proposed in this paper, and the flow chart of this method is shown in Figure 4. The detailed procedures are described as follows.

Figure 4

Flow chart of the proposed method.

(1) Process the collected samples using DTCWPT and acquire the corresponding subband signals. Considering the tradeoff between the classification accuracy and the computational burden, without loss of generality, the decomposition level of DTCWPT is set to 2 in this study. Then each sample is decomposed into four subband signals after performing DTCWPT.

(2) Apply the IMPE algorithm to calculate the PE values of the obtained subband signals at different scales. Before using IMPE, four parameters need to be set: the embedding dimension h, the signal length N, the time delay q, and the scale factor τ. Since h determines the number of accessible states h!, the estimation of PE relies heavily on the selected embedding dimension. If h is too small, the scheme will not work because there are too few distinct states; if h is too large, the computation becomes time-consuming. The embedding dimension is therefore chosen as a tradeoff between information loss and computational burden, and h is set to 4 in this paper. The signal length N also influences the estimation of PE. N should satisfy the criterion N≥5h!, recommended in literature [23], to obtain reliable statistics; for h=4 this requires N≥120. However, too large a value of N decreases the computational efficiency. A signal with 1024 points is enough to obtain a reliable result, so we set N=1024 in this study. The time delay q has little effect on the calculated result; here we set q=1. As for the scale factor τ, when τ is too small, the feature information acquired from the signal will be insufficient, whereas if τ is too large, the PE values obtained at large scales will be unstable. Taking these constraints into consideration, and based on the criterion τ≤N/(h+1)! proposed in literature [14], which gives τ≤1024/120≈8.5 for the selected N and h, the scale factor τ is set to 8 in this paper.

(3) Combine the calculated IMPE of each subband signal and construct the feature vector for each sample. Since each sample is decomposed into four subband signals, and the number of calculated PE is 8 for each subband signal, the dimension of the constructed feature vector is 32; namely, 32 features are extracted for each sample.

(4) Utilize the LLTSA algorithm to compress the dimension of the constructed feature vectors and acquire new feature vectors with lower dimension. In the LLTSA algorithm, two parameters need to be adjusted: the neighborhood size k and the intrinsic dimension d. If k is too small, LLTSA cannot discover the intrinsic structure information of the high dimensional feature vectors well. Conversely, if k is too large, LLTSA will lose its ability of nonlinear dimension reduction. As for the intrinsic dimension d, if it is chosen larger than the true value, much redundant information will be preserved; if it is chosen smaller, useful information of the feature vectors will be discarded during the dimension reduction. For LLTSA, there is an approximately linear relation between the optimal neighborhood size k and the intrinsic dimension d [24]. Hence, we choose k=d according to this relationship in this paper and use the cross validation method to determine the intrinsic dimension d.

(5) Feed the acquired new feature vectors into the ELM classifier for training and testing and distinguish the bearing condition automatically. Compared with some classic classifiers, ELM requires less human intervention: only the number of hidden neurons needs to be selected. Generally, as long as the number of hidden neurons is larger than 20, the classification accuracy of ELM will remain stable [25]. Therefore, the number of hidden neurons is set to 20 in this paper.
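The parameter choices in steps (1)-(3) can be checked with a few lines of arithmetic. The sketch below only evaluates the two criteria quoted above and the resulting feature count; the variable names are illustrative.

```python
import math

h, N, q = 4, 1024, 1   # embedding dimension, signal length, time delay
level = 2              # DTCWPT decomposition level
tau = 8                # largest scale factor used by IMPE

min_length = 5 * math.factorial(h)      # criterion N >= 5 h!
max_tau = N // math.factorial(h + 1)    # largest integer tau <= N / (h+1)!
n_subbands = 2 ** level                 # subband signals per sample
n_features = n_subbands * tau           # PE values per sample

print(min_length, max_tau, n_subbands, n_features)
```

With h=4 and N=1024 the length criterion requires at least 120 points, the scale criterion allows scale factors up to 8, and four subband signals with eight scales each give the 32 dimensional feature vector described in step (3).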

6. Analysis on Experimental Data

6.1. Experimental Data Description

The experimental data from Case Western Reserve University are applied to verify the proposed method [26]. Figure 5 displays the experimental system, which consists of an electric motor, a torque transducer/encoder, and a dynamometer. The SKF6205-2RS deep groove ball bearing supporting the shaft at the drive end was used in the test. The rolling bearings were seeded with single point defects with diameters of 0.1778 mm, 0.3556 mm, and 0.5334 mm using electric discharge machining. The defects were set on the inner race, the outer race, and the rolling element, respectively. An accelerometer was mounted on the motor housing to collect the vibration signals of the bearings under the three fault types as well as the normal condition. The rotating speed of the motor was 1797 r/min and the sampling frequency was 12000 Hz. Every fault type contains three fault degrees corresponding to the three defect diameters. Together with the normal condition, this gives a ten-condition classification problem, which is investigated in this paper. The collected vibration signals are divided into nonoverlapping segments of 1024 points, and each segment is a sample. Each bearing condition includes 50 samples, from which 10 samples are randomly selected to train the classifier, while the remaining 40 samples are used for testing. The detailed description of the experimental datasets is given in Table 1. Figure 6 shows the time domain waveforms of samples of the bearings with the slight inner race fault (Slight-IRF), the medium inner race fault (Medium-IRF), the severe inner race fault (Severe-IRF), the slight outer race fault (Slight-ORF), the medium outer race fault (Medium-ORF), the severe outer race fault (Severe-ORF), the slight rolling element fault (Slight-REF), the medium rolling element fault (Medium-REF), and the severe rolling element fault (Severe-REF), as well as the normal condition.

Table 1

The detailed description of the experimental datasets.

Fault type	Fault diameter (mm)	Fault degree	Number of training samples	Number of testing samples	Class label
Normal	0	-	10	40	1
Inner race	0.1778	Slight	10	40	2
Inner race	0.3556	Medium	10	40	3
Inner race	0.5334	Severe	10	40	4
Outer race	0.1778	Slight	10	40	5
Outer race	0.3556	Medium	10	40	6
Outer race	0.5334	Severe	10	40	7
Rolling element	0.1778	Slight	10	40	8
Rolling element	0.3556	Medium	10	40	9
Rolling element	0.5334	Severe	10	40	10

Figure 5

The rolling bearing experimental system.

Figure 6

The time domain waveforms of the samples of the bearings under ten kinds of different conditions.

6.2. Results and Discussions

Since the measured vibration signals of the bearings under different conditions exhibit nonlinear and nonstationary characteristics, it is difficult to distinguish the different fault types and fault degrees using only the time domain waveforms in Figure 6. An effective method is therefore needed to identify the different operating conditions accurately, and the proposed diagnosis method is applied for this purpose.

Firstly, in order to reduce the interference among the components in the original sample and discover the hidden feature information more effectively, each sample is decomposed to 2 levels using DTCWPT. Then four subband signals containing different frequency band information could be obtained. For the sake of space, only the decomposition results of the samples with the slight inner race fault (Slight-IRF) are shown in Figure 7 as a representative.

Figure 7

DTCWPT results of the sample with the slight inner race fault.

According to the flow chart of the proposed diagnosis method shown in Figure 4, after completing the signal decomposition and reconstruction, the IMPE algorithm is utilized to extract the features at different scales from each subband signal for each sample. Figure 8 illustrates the IMPE values of subband signals 1 to 4 over 8 scales under the 10 conditions. As shown in Figure 8, for each subband signal, the PE values of the different conditions are well separated at some scales, while their differences are not obvious at other scales. It is therefore still not possible to distinguish the different fault types and fault degrees directly from the IMPE curves in Figure 8. The feature vector is then constructed from the PE values of the four subband signals for each sample, and a multifault classifier is applied to recognize the different bearing conditions. In this paper, the ELM classifier is used for this purpose.

Figure 8

IMPE of each subband signal over 8 scales under 10 different conditions.

If the constructed feature vectors containing 32 PE values are directly taken as the inputs of the classifier, it will be time-consuming. Even worse, ELM cannot effectively distinguish the conditions of samples since feature vectors inevitably contain certain interference and redundancy information. Then LLTSA is further employed to compress the high dimensional feature vectors.

Before using LLTSA, the intrinsic dimension d of the original feature vectors needs to be selected. In this paper, this parameter is determined using a fivefold cross validation method [27, 28]. That is, the 100 training samples are randomly divided into five equal-sized subsets. Each subset is validated on the ELM classifier trained using the other four subsets. The process is repeated 5 times, and the accuracy rate of the classifier is obtained by averaging the accuracy rates recorded for the five validation folds. Finally, the value of d which provides the best classification accuracy is chosen. The intrinsic dimension d in the fivefold validation varies in the interval [3,m/2] with an incremental step size of 1, where m=32 denotes the dimension of the original feature vectors. Figure 9 shows the curve of the classification accuracy versus the intrinsic dimension. As indicated in Figure 9, the accuracy reaches 100% when the intrinsic dimension is larger than 10. In order to avoid information redundancy as far as possible, we select the intrinsic dimension d=10; then the neighborhood size k is set to 10 according to the approximately linear relationship between the intrinsic dimension and the neighborhood size.

Figure 9

Curve of classification accuracy versus intrinsic dimension.
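The fivefold selection of d described above can be sketched as follows. The sketch shows only the fold bookkeeping and the selection rule. The callback that would run LLTSA and ELM on the training folds and return the validation accuracy is left as a hypothetical argument (`make_score_fn`) and is replaced here by a dummy function for demonstration. The names `make_folds`, `cross_validate`, and `select_dimension` are assumptions, as is the tie-breaking rule that prefers the smallest d among equally accurate candidates.

```python
import random

def make_folds(n_samples, n_folds=5, seed=0):
    """Randomly split sample indices into equal-sized folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    size = n_samples // n_folds  # assumes n_samples is divisible by n_folds
    return [idx[f * size:(f + 1) * size] for f in range(n_folds)]

def cross_validate(n_samples, score_fn, n_folds=5, seed=0):
    """Average of score_fn(train_idx, val_idx) over the folds."""
    folds = make_folds(n_samples, n_folds, seed)
    scores = []
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        scores.append(score_fn(train_idx, val_idx))
    return sum(scores) / n_folds

def select_dimension(candidates, make_score_fn, n_samples):
    """Return the candidate d with the best mean accuracy (first one on ties)."""
    return max(candidates,
               key=lambda d: cross_validate(n_samples, make_score_fn(d)))

# Dummy stand-in for "reduce with LLTSA, train ELM, return accuracy".
def dummy_score_factory(d):
    return lambda train_idx, val_idx: 1.0 if d >= 10 else 0.8

m = 32
best_d = select_dimension(range(3, m // 2 + 1), dummy_score_factory, 100)
print(best_d)
```

Since `max` returns the first maximal candidate and the candidates are scanned in increasing order, the smallest dimension reaching the best accuracy is selected, in line with the aim of avoiding redundant information.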

After parameter selection, LLTSA is performed on the constructed feature vectors. The original high dimensional feature vectors are projected into a low dimensional space, giving new 10 dimensional feature vectors, which are then fed into ELM for training and testing. After training the classifier with the 100 feature vectors of the training samples, the remaining 400 feature vectors of the testing samples are used to test the ELM classifier. The classification results are shown in Figure 10, where the red asterisks denote the actual ELM output classifications of the samples, while the blue squares represent the desired output classifications. The 100 samples on the left side and the 400 samples on the right side of the dotted line are, respectively, the training samples and the testing samples. Figure 10 shows that, for each sample, the actual ELM output classification is consistent with the desired one. No sample is misclassified, and the recognition accuracy reaches 100%. This result indicates that the proposed method is suitable and effective for bearing fault diagnosis on this dataset.

Figure 10

Classification results of the proposed method.
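For readers unfamiliar with ELM, the training and testing step can be sketched as below. This is a generic single-hidden-layer ELM (random, untrained hidden layer; output weights from a pseudoinverse), written under assumptions: the number of hidden neurons, the sigmoid activation, the weight range, and the synthetic 10-dimensional data are illustrative choices, not settings reported in this paper.

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer extreme learning machine for classification."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden          # assumption: not the paper's setting
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Input weights and biases are drawn once at random and never trained.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

# Synthetic, well-separated stand-in data (3 hypothetical classes, 10 features).
rng = np.random.default_rng(42)
centers = rng.normal(scale=2.0, size=(3, 10))
y_train = np.repeat(np.arange(3), 30)
y_test = np.repeat(np.arange(3), 100)
X_train = centers[y_train] + rng.normal(scale=0.3, size=(90, 10))
X_test = centers[y_test] + rng.normal(scale=0.3, size=(300, 10))

model = ELM(n_hidden=20, seed=0).fit(X_train, y_train)
accuracy = float(np.mean(model.predict(X_test) == y_test))
print(f"test accuracy: {accuracy:.3f}")
```

Because only the output weights are solved for, training reduces to one pseudoinverse, which is the main reason ELM is fast to train.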

To verify the advantage of IMPE, IMPE and MPE are compared, as a representative case, by analyzing the 50 independent subband signals of the bearing with the slight inner race fault. The parameters selected for the MPE algorithm are the same as those for the IMPE algorithm. Figure 11 presents the mean values and the standard deviations of the PE values obtained with the IMPE and the MPE algorithms. Two conclusions can be drawn from Figure 11. Firstly, the mean curves of the PE values derived from IMPE are very close to those derived from MPE. Secondly, compared with the MPE algorithm, the IMPE algorithm yields smaller standard deviations of the PE values. The same conclusions are obtained when analyzing the subband signals of the bearings under the other conditions. This indicates that the IMPE algorithm is more stable than the traditional MPE algorithm and can therefore provide a more reliable PE estimation for nonlinear and nonstationary signals.

Figure 11

MPE and IMPE comparison results of the samples of the bearing with slight inner race fault.

To further illustrate the advantage of IMPE, the feature vectors extracted by the processing method based on DTCWPT, MPE, and LLTSA are also fed into the ELM classifier to distinguish the bearing conditions. The actual ELM output classifications and the desired output classifications of the training and the testing samples are shown in Figure 12. On the right side of the dotted line, the locations of eight red asterisks are inconsistent with those of the blue squares; that is, eight of the 400 testing samples are misclassified, and the classification accuracy is 98%. Figure 12 shows that two testing samples with Slight-IRF are misclassified as Medium-IRF, two testing samples with Slight-ORF are misclassified as Medium-IRF and Medium-ORF, one testing sample with Severe-ORF is misclassified as Medium-IRF, and three testing samples with Medium-REF are misclassified as Slight-REF or Medium-IRF.

Figure 12

Classification results of the method based on DTCWPT, MPE, and LLTSA.

The comparison between Figures 10 and 12 indicates that IMPE provides a more accurate estimation of entropy values, with higher distinguishability, than MPE. This can be explained by the fact that, when the MPE algorithm is applied to a short time series, the number of points available for the calculation shrinks in inverse proportion to the scale factor. The shortage of points not only makes the entropy estimates questionable and uncertain but also increases the standard deviations of the features. The IMPE algorithm effectively avoids these drawbacks of MPE and therefore results in better classification accuracy.
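The difference between the two estimators can be made concrete with a short sketch. It assumes the standard Bandt-Pompe permutation entropy, the usual non-overlapping coarse-graining for MPE, and the usual definition of IMPE as the average PE over the coarse-grained series obtained from every possible starting offset at a given scale. The embedding dimension m=3 and the white-noise test signal are illustrative assumptions, not the parameters used in this paper.

```python
import math
from collections import Counter
import numpy as np

def permutation_entropy(x, m=3, delay=1):
    """Normalised permutation entropy of a 1-D series (value in [0, 1])."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * delay
    if n_vectors <= 0:
        raise ValueError("series too short for this embedding")
    # Count the ordinal pattern of every embedded vector.
    patterns = Counter(
        tuple(np.argsort(x[i:i + m * delay:delay], kind="stable").tolist())
        for i in range(n_vectors)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_vectors
    return float(-(probs * np.log(probs)).sum() / math.log(math.factorial(m)))

def coarse_grain(x, scale, offset=0):
    """Means of consecutive non-overlapping windows of length `scale`."""
    x = np.asarray(x, dtype=float)[offset:]
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def mpe(x, scale, m=3):
    """MPE: PE of the single coarse-grained series that starts at offset 0."""
    return permutation_entropy(coarse_grain(x, scale), m)

def impe(x, scale, m=3):
    """IMPE: average PE over the `scale` shifted coarse-grained series."""
    return float(np.mean([permutation_entropy(coarse_grain(x, scale, k), m)
                          for k in range(scale)]))

rng = np.random.default_rng(0)
signal = rng.normal(size=512)        # synthetic stand-in for a subband signal
for scale in (1, 4, 8):
    n_points = len(coarse_grain(signal, scale))
    print(scale, n_points, mpe(signal, scale), impe(signal, scale))
```

At scale 8 a 512-point signal leaves only 64 coarse-grained points for MPE, whereas IMPE averages eight such estimates, which is the mechanism behind the smaller standard deviations discussed above.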

To validate the necessity of the dimension reduction using LLTSA, the constructed original feature vectors without dimension reduction are adopted as the inputs of the ELM classifier for comparison. The classification results are displayed in Figure 13, from which it can be seen that three testing samples with the rolling element fault are assigned to the wrong fault degrees. The accuracy is 99.25%, which is lower than that of the method with dimension reduction. In the process of dimension reduction, LLTSA extracts low dimensional sensitive feature vectors from high dimensional feature vectors that contain interference and redundancy, so the recognition precision of ELM is improved. This indicates that the dimension reduction using LLTSA benefits the bearing condition classification and demonstrates the necessity of this procedure.

Figure 13

Classification results of the method based on DTCWPT and IMPE.
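The LLTSA projection used for the dimension reduction can be sketched as follows. This is a compact reading of the published LLTSA procedure (PCA preprocessing, local tangent coordinates from each neighborhood, an alignment matrix, and a generalized eigenproblem), written under assumptions: samples are stored as rows, the sample itself counts as one of its k neighbors, and the demonstration uses small random data with d=2 and k=8 rather than the settings of this paper. The function name and return convention are hypothetical.

```python
import numpy as np

def lltsa(X, d, k):
    """Sketch of linear local tangent space alignment.

    X has shape (n_samples, n_features). Returns (mean, A), where A has shape
    (n_features, d); a sample x is embedded as (x - mean) @ A.
    """
    n = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean
    # Step 1: PCA projection so that the scatter matrix below is non-singular.
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s[0]
    s, P = s[keep], vt[keep].T               # P: (n_features, r)
    Z = Xc @ P                               # centred PCA scores, (n, r)
    # Step 2: k nearest neighbours of every sample (the sample itself included).
    sq = (Z ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    nbrs = np.argsort(dist2, axis=1)[:, :k]
    # Step 3: accumulate the alignment matrix from local tangent coordinates.
    B = np.zeros((n, n))
    for idx in nbrs:
        local = Z[idx] - Z[idx].mean(axis=0)
        u, _, _ = np.linalg.svd(local, full_matrices=False)
        G = u[:, :d]                         # d leading local directions
        W = np.eye(k) - 1.0 / k - G @ G.T    # centring minus tangent projection
        B[np.ix_(idx, idx)] += W
    # Step 4: generalised eigenproblem (Z^T B Z) a = lambda (Z^T Z) a.
    # Z^T Z equals diag(s^2), so scaling by 1/s gives an ordinary symmetric problem.
    M = (Z.T @ B @ Z) / np.outer(s, s)
    M = (M + M.T) / 2.0
    _, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    A = P @ (vecs[:, :d] / s[:, None])       # eigenvectors of the d smallest
    return mean, A

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 6))            # synthetic stand-in feature vectors
mean_demo, A_demo = lltsa(X_demo, d=2, k=8)
Y_demo = (X_demo - mean_demo) @ A_demo       # low dimensional embedding
print(Y_demo.shape)
```

Because the mapping is linear, the matrix A learned on the training samples can be applied directly to the testing samples, which is what makes LLTSA usable in a train/test classification setting.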

In addition, to verify the superiority of the proposed feature extraction method based on DTCWPT, IMPE, and LLTSA, the IMPE values calculated from the original samples are taken directly as the input feature vectors of the ELM classifier. The 100 training samples and the 400 testing samples, as well as the selected parameters, remain the same as before. The actual ELM output classifications and the desired output classifications of all the samples are shown in Figure 14, where 28 testing samples on the right side of the dotted line are misclassified. The recognition accuracy is 93%. This shows that the features extracted from the samples using IMPE alone cannot completely reflect the distinctions between different bearing conditions, so the classification results of the ELM classifier are unsatisfactory. This comparison demonstrates the superiority of the proposed feature extraction method, which combines IMPE with DTCWPT and LLTSA, owing to the abilities of DTCWPT and LLTSA to restrain the interference among the components and to highlight the feature information of the samples.

Figure 14

Classification results of the method based on IMPE.

Finally, the recognition accuracies of ELM, SVM, ANN, and KNNC using different feature extraction methods are compared. The training and the testing samples are the same for each comparison, and the feature vectors taken as the inputs of these classifiers are extracted by four different methods. The first is the method proposed in this paper, that is, the combination of DTCWPT, IMPE, and LLTSA (DTCWPT + IMPE + LLTSA). The second uses MPE instead of IMPE and obtains the feature vectors based on DTCWPT, MPE, and LLTSA (DTCWPT + MPE + LLTSA). The third extracts the feature vectors using DTCWPT and IMPE without LLTSA (DTCWPT + IMPE). The last treats the IMPE values calculated directly from the samples as the feature vectors (IMPE). The parameters of SVM are chosen as follows: the penalty factor C is set to 100 and the RBF kernel parameter γ is set to 0.01 [29]. The parameters of ANN are selected as follows: the number of hidden neurons N=20, the maximum number of iterations I=500, the learning rate α=0.1, and the training error e=0.001 [30]. The neighborhood number k of KNNC is set to 7 [31]. The classification results of ELM, SVM, ANN, and KNNC using different feature extraction methods are shown in Table 2 and Figure 15. Whichever feature extraction method is used, the classification accuracy of ELM is no lower than that of the other three classifiers; it is matched only by SVM, at 99.25%, for DTCWPT + IMPE. This verifies the advantage of ELM in classification performance.

Table 2

Classification results of different classifiers with feature vectors extracted by different methods.

Testing accuracies with feature vectors extracted by different methods
Classifier	DTCWPT + IMPE + LLTSA	DTCWPT + MPE + LLTSA	DTCWPT + IMPE	IMPE
ELM	100%	98%	99.25%	93%
SVM	99.75%	97.75%	99.25%	91.5%
ANN	99%	96%	98.25%	90.5%
KNNC	99.5%	97.75%	99%	91.5%
Average of four classifiers	99.56%	97.37%	98.94%	91.63%

Figure 15

Classification results of different classifiers with feature vectors extracted by different methods.

Table 2 and Figure 15 show that, with the feature vectors extracted by the first method, the testing accuracy of every classifier is higher than with the feature vectors extracted by any of the other three methods. In terms of the average testing accuracy of the four classifiers, the first feature extraction method outperforms the second, the third, and the last method by 2.19, 0.62, and 7.93 percentage points, respectively, which in turn verifies the advantage of the presented feature extraction method based on DTCWPT, IMPE, and LLTSA.
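The figures quoted from Table 2 can be reproduced with a few lines of arithmetic. The sketch below only re-derives numbers already reported: the ELM accuracies from the misclassification counts on the 400 testing samples (0, 8, 3, and 28), the column averages, and the gains of the first method. The exact averages are 99.5625%, 97.375%, 98.9375%, and 91.625%, so the reported values agree to within rounding; the gains are differences in percentage points, and the reported 7.93 corresponds to subtracting the rounded averages, while the exact difference is 7.9375.

```python
# Testing accuracies (%) from Table 2, one list per classifier, in column order.
methods = ["DTCWPT+IMPE+LLTSA", "DTCWPT+MPE+LLTSA", "DTCWPT+IMPE", "IMPE"]
acc = {
    "ELM":  [100.0, 98.0, 99.25, 93.0],
    "SVM":  [99.75, 97.75, 99.25, 91.5],
    "ANN":  [99.0, 96.0, 98.25, 90.5],
    "KNNC": [99.5, 97.75, 99.0, 91.5],
}

# ELM accuracies recomputed from the misclassified counts out of 400 test samples.
n_test = 400
elm_from_errors = [100 * (n_test - e) / n_test for e in (0, 8, 3, 28)]

# Column-wise averages over the four classifiers.
averages = [sum(col) / len(col) for col in zip(*acc.values())]

# Gain of the first method over each other method, in percentage points.
gains = [averages[0] - a for a in averages[1:]]

for name, avg in zip(methods, averages):
    print(f"{name}: {avg:.4f}%")
print("gains (percentage points):", gains)
```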

7. Conclusions

IMPE is a recently proposed technique for evaluating the complexity and detecting the dynamic changes of time series. Its application to bearing fault diagnosis is investigated for the first time in this work, and a novel fault diagnosis method for rolling bearings that combines IMPE with DTCWPT, LLTSA, and ELM is proposed. To address the nonlinear and nonstationary characteristics of bearing vibration signals, DTCWPT is employed to preprocess the signal and obtain the corresponding subband signals. IMPE is then used as the feature extractor to calculate the PE values of each subband signal at different scales. To reduce the dimension of the constructed feature vectors, LLTSA is applied to compress the high dimensional vectors and sift out the principal sensitive features, which form the new low dimensional vectors. The ELM classifier is then adopted to perform the condition identification. The presented feature extraction method is compared with other methods, and the results indicate that it yields feature vectors with higher separability. The classification performance of the ELM classifier is also compared with that of other widely used classifiers, and the advantage of ELM is verified by the comparison results. The experimental results demonstrate that the proposed fault diagnosis method is suitable and effective for recognizing the different fault types and fault degrees of rolling bearings.

Because the proposed diagnosis method is data-driven and does not rely on operators’ experience, it is well suited to wide use in highly automated industry. The method is a promising approach that is not limited to rolling bearing fault diagnosis and could also be applied to the fault diagnosis of other mechanical equipment.

Owing to its consumption of computing resources, the proposed diagnosis method may not yet be fast enough for real-time use. In addition, only a constant working load is considered in this paper. If the working load changes dramatically, the accuracy and the efficiency of the proposed method may be affected. Further studies will therefore focus on these problems.

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (no. 51307058 and no. 51475164), the Natural Science Foundation of Hebei Province, China (no. 2014502052 and no. 2015502013), and the Fundamental Research Funds for the Central Universities (no. 2015ZD27 and no. 2015XS120).

References

[1] Tang G. J., Wang X. L., Y. L., "Diagnosis of compound faults of rolling bearings through adaptive maximum correlated kurtosis deconvolution," Journal of Mechanical Science and Technology, vol. 30, no. 1, pp. 43–54, 2016. doi:10.1007/s12206-015-1206-7

[2] Ding, Luo, "A fusion feature and its improvement based on locality preserving projections for rolling element bearing fault classification," Journal of Sound and Vibration, vol. 335, pp. 367–383, 2015. doi:10.1016/j.jsv.2014.09.026

[3] Zhou, Xiao, Zhang, Zhu, "Multifault diagnosis for rolling element bearings based on intrinsic mode permutation entropy and ensemble optimal extreme learning machine," Advances in Mechanical Engineering, vol. 6, Article ID 803919, 10 pages, 2014. doi:10.1155/2014/803919

[4] Kang J. Q., Sun, Liu Y. B., "Fault diagnosis of the rolling bearing based on correlation dimension improved algorithm," Chinese Journal of Scientific Instrument, vol. 27, no. 6, pp. 404–406, 2006.

[5] Yan, Gao R. X., "Approximate entropy as a diagnostic tool for machine health monitoring," Mechanical Systems and Signal Processing, vol. 21, no. 2, pp. 824–839, 2007. doi:10.1016/j.ymssp.2006.02.009

[6] Pan Y.-H., Wang Y.-H., Liang S.-F., Lee K.-T., "Fast computation of sample entropy and approximate entropy in biomedicine," Computer Methods and Programs in Biomedicine, vol. 104, no. 3, pp. 382–396, 2011. doi:10.1016/j.cmpb.2010.12.003

[7] Richman J. S., Moorman J. R., "Physiological time-series analysis using approximate and sample entropy," American Journal of Physiology-Heart and Circulatory Physiology, vol. 278, no. 6, pp. H2039–H2049, 2000.

[8] Costa, Goldberger A. L., Peng C.-K., "Multiscale entropy analysis of complex physiologic time series," Physical Review Letters, vol. 89, no. 6, 2002.

[9] Zhang, Xiong G. L., Liu H. S., Zou H. J., Guo W. Z., "Bearing fault diagnosis using multi-scale entropy and adaptive neuro-fuzzy inference," Expert Systems with Applications, vol. 37, no. 8, pp. 6077–6085, 2010. doi:10.1016/j.eswa.2010.02.118

[10] Bandt, Pompe, "Permutation entropy: a natural complexity measure for time series," Physical Review Letters, vol. 88, no. 17, pp. 1741021–1741024, 2002.

[11] Aziz, Arif, "Multiscale permutation entropy of physiological time series," in Proceedings of the 9th International Multitopic Conference (INMIC '05), Karachi, Pakistan, IEEE, pp. 1–6, December 2005. doi:10.1109/inmic.2005.334494

[12] Y. B., M. Q., Wei, Huang W. H., "A new rolling bearing fault diagnosis method based on multiscale permutation entropy and improved support vector machine based binary tree," Measurement, vol. 77, pp. 80–94, 2016. doi:10.1016/j.measurement.2015.08.034

[13] Zheng, Cheng, Yang, "Multiscale permutation entropy based rolling bearing fault diagnosis," Shock and Vibration, vol. 2014, Article ID 154291, 8 pages, 2014. doi:10.1155/2014/154291

[14] Azami, Escudero, "Improved multiscale permutation entropy for biomedical signal analysis: interpretation and application to electroencephalogram recordings," Biomedical Signal Processing and Control, vol. 23, pp. 28–41, 2016. doi:10.1016/j.bspc.2015.08.004

[15] Zheng J. D., Cheng, Yang, "A rolling bearing fault diagnosis approach based on LCD and fuzzy entropy," Mechanism and Machine Theory, vol. 70, pp. 441–453, 2013. doi:10.1016/j.mechmachtheory.2013.08.014

[16] Kingsbury N. G., "The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters," in Proceedings of the 8th IEEE Digital Signal Processing Workshop, 1998.

[17] Bayram, Selesnick I. W., "On the dual-tree complex wavelet packet and M-band transforms," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2298–2310, 2008. doi:10.1109/tsp.2007.916129

[18] Tang B. P., Yang R. S., "Rotating machine fault diagnosis using dimension reduction with linear local tangent space alignment," Measurement, vol. 46, no. 8, pp. 2525–2539, 2013. doi:10.1016/j.measurement.2013.04.061

[19] Chen H. L., Yang, Liu D. Y., Liu W. B., Liu Y. L., Zhang X. H., L. F., Zhu, "Using blood indexes to predict overweight statuses: an extreme learning machine based approach," PLOS ONE, vol. 10, no. 11, e0143003, 2015. doi:10.1371/journal.pone.0143003

[20] J. X., Zhang Z. S., Gong, "A novel intelligent method for mechanical fault diagnosis based on dual-tree complex wavelet packet transform and multiple classifier fusion," Neurocomputing, vol. 171, pp. 837–853, 2016. doi:10.1016/j.neucom.2015.07.020

[21] Zhang T. H., Yang, Zhao D. L., X. L., "Linear local tangent space alignment and application to face recognition," Neurocomputing, vol. 70, no. 7–9, pp. 1547–1553, 2007. doi:10.1016/j.neucom.2006.11.007

[22] Huang G.-B., Zhu Q.-Y., Siew C.-K., "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006. doi:10.1016/j.neucom.2005.12.126

[23] Matilla-García, "A non-parametric test for independence based on symbolic dynamics," Journal of Economic Dynamics & Control, vol. 31, no. 12, pp. 3889–3903, 2007. doi:10.1016/j.jedc.2007.01.018

[24] X. Q., Yuan, "Hybrid structure for robust dimensionality reduction," Neurocomputing, vol. 124, pp. 131–138, 2014. doi:10.1016/j.neucom.2013.07.019

[25] Wang H. L., J. H., Jiang, "Analog circuit fault diagnosis method based on preferred wavelet packet and ELM," Chinese Journal of Scientific Instrument, vol. 34, no. 11, pp. 2614–2619, 2013.

[26] Bearing Data Center, Case Western Reserve University, 2006, http://csegroups.case.edu/bearingdatacenter/pages/download-data-file

[27] Y. B., M. Q., Wang R. X., Huang W. H., "A fault diagnosis scheme for rolling bearing based on local mean decomposition and improved multiscale fuzzy entropy," Journal of Sound and Vibration, vol. 360, pp. 277–299, 2016, Article 12638. doi:10.1016/j.jsv.2015.09.016

[28] Zhang, Wang, Sun, Yang, Wang, "Supervised locally tangent space alignment for machine fault diagnosis," Journal of Mechanical Science and Technology, vol. 28, no. 8, pp. 2971–2977, 2014. doi:10.1007/s12206-014-0704-3

[29] Dong S. J., Chen L. L., Tang B. P., X. Y., Gao Z. Y., Liu, "Rotating machine fault diagnosis based on optimal morphological filter and local tangent space alignment," Shock and Vibration, vol. 2015, Article ID 893504, 9 pages, 2015. doi:10.1155/2015/893504

[30] Zhao, Wang C. Y., Yang X. D., Wang F. Y., Zhang X. D., Zhang, Wang X. Y., "Prostate cancer identification: quantitative analysis of T2-weighted MR images based on a back propagation artificial neural network model," Science China Life Sciences, vol. 58, no. 7, pp. 666–673, 2015. doi:10.1007/s11427-015-4876-6

[31] Wang J. X., Tang B. P., Tian D. Q., "Life grade recognition method based on supervised uncorrelated orthogonal locality preserving projection and K-nearest neighbor classifier," Neurocomputing, vol. 138, pp. 271–282, 2014. doi:10.1016/j.neucom.2014.01.037