Multifault Diagnosis of Rolling Element Bearings Using a Wavelet Kurtogram and Vector Median-Based Feature Analysis

This paper presents a comprehensive multifault diagnosis methodology for incipient rolling element bearing failures. This is done by combining a wavelet packet transform(WPT-) based kurtogram and a new vector median-based feature analysis technique. The proposed approach first extracts useful features that are characteristic of the bearing health condition from the time domain, frequency domain, and envelope power spectrum of incoming acoustic emission (AE) signals by using a WPT-based kurtogram. Then, an enhanced feature analysis approach based on the linear discriminant analysis (LDA) technique is used to select the most discriminant bearing fault features from the original feature set.These selected fault features are used by aNäıve Bayes (NB) classifier to classify the bearing fault conditions.The performance of the proposedmethodology is tested and validated under various bearing fault conditions on an experimental test rig and compared with conventional state-of-the-art approaches. The proposed bearing fault diagnosis methodology yields average classification accuracies of 91.11%, 96.67%, 98.89%, 99.44%, and 98.61% at rotational speeds of 300, 350, 400, 450, and 500 rpm, respectively.


Introduction
For the past several decades, the development of reliable fault diagnosis systems to accurately detect and classify various bearing faults has been at the heart of research in the field of machine condition monitoring for preventive and predictive maintenance.These fault diagnosis systems aim to accurately detect bearing faults at the early stages of their development in order to prevent potential breakdown of industrial machines, thereby improving their reliability and reducing maintenance costs.
Several signal processing-based methods have been developed for fault diagnosis of rolling element bearings.These methods extract characteristic fault features from the inherently nonstationary signals obtained from faulty bearings.Antoni [1] presented the kurtogram, which has proven to be a powerful method for characterizing and extracting hidden nonstationary features in bearing fault signals [2][3][4].The kurtogram method relies on spectral kurtosis, which was originally proposed by Dwyer [5], to detect the presence of nonstationary transients and accurately indicate their location in the frequency bands of bearing signals.Recently, in attempts to enhance the performance of the kurtogram method for fault diagnosis, many researchers have integrated the kurtogram either with the short-time Fourier transform (STFT) [6] or with multirate filter banks (MRFB) [1].These approaches, however, have yielded little improvement in extracting transient characteristics and have rendered kurtogram analysis more sensitive to irrelevant impulsive components [7].These shortcomings have been addressed by introducing enhanced variations of the kurtogram, such as the one proposed by Shi et al. [8], which uses the complex Morlet wavelet transform.The complex Morlet wavelet transform, however, has its own limitations.These include its high computational complexity and its inability to effectively separate the high frequency components, which usually carry the characteristic fault information.In this paper, we propose a way to improve the kurtogram-based method by using the envelope power spectrum and the wavelet packet transform (WPT).The wavelet packet transform (WPT) is computationally more efficient and highly effective in frequency decomposition.The proposed kurtogram is performed by computing 2 Shock and Vibration kurtosis values from the envelope power spectrum of each subband signal at different levels of decomposition.The subband signal that yields the highest spectral kurtosis value is selected for extracting the features that are characteristic of the bearing fault conditions.
In addition to feature extraction techniques, feature analysis methods have also been widely studied in machine fault diagnosis in order to prevent the degradation of classification performance that is caused by redundant information in the feature space [9][10][11][12][13].Dimensionality reduction techniques have been used to identify the features that effectively represent the high-dimensional data in a lower-dimensional space while retaining the intrinsic information of the bearing defects; these eventually lead to improvements in the classification performance of the fault diagnosis system.Among various dimensionality reduction methods, principle component analysis (PCA) [14], independent component analysis (ICA) [15], and linear discriminant analysis (LDA) [16] are the most popular methods that have been effectively utilized in machinery fault diagnosis.For instance, Widodo and Yang [11] applied the PCA technique to machine health prognostics to extract population characteristics from the available condition monitoring data.Guo et al. [12] investigated the performance of an ICA algorithm in separating the envelopes of vibration signals to extract the weak impulsive features of incipient faults in rolling element bearings.In their study looking into fault diagnosis in ball bearings, Harmouche et al. [13] demonstrated the use of an LDA technique to effectively discriminate the rather weak and otherwise difficult to detect/identify spectral features extracted from the envelope spectra of vibration signals.The LDA feature space yields better separation between different classes of bearing faults than conventional feature analysis methods, thereby resulting in improved classification performance.PCA and ICA are unsupervised methods that ignore the label information during dimensionality reduction, whereas LDA is a supervised method that utilizes the class label information to find an optimal low-dimensional representation of the original feature set while also preserving discriminant information between classes.As a result, LDA yields better classification results as compared to PCA and ICA [9,13,14].This paper presents a new LDA-based feature analysis method that uses a vector median-based discriminant criterion (VMDC) to characterize the intraclass compactness and interclass separability of the feature space in order to select the most discriminative subset of bearing fault features.
This paper proposes a reliable multifault diagnosis scheme for rolling element bearings that combines an improved WPT-based kurtogram and the VMDC-based feature analysis methods.The contributions discussed in this paper can be summarized as follows: spectral kurtosis value is selected for extracting the fault features.
(2) An efficient VMDC-based discriminative fault feature analysis approach is proposed for selecting the most discriminative subset of fault features.A vector median-based discriminant criterion is presented to minimize the intraclass compactness and maximize the interclass separability of the feature space.
(3) A comprehensive multiclass fault diagnosis methodology for rolling element bearings is presented, and its performance is validated using AE data for various single and compound bearing defects acquired under different simulated crack sizes and bearing rotational speeds.
The remainder of the paper is structured as follows.Section 2 describes the overall bearing fault diagnosis methodology, Section 3 introduces the self-designed fault simulator (including the data acquisition system and seeded bearing defects for collecting fault signals), and Section 4 shows the experimental results.Finally, Section 5 concludes this paper.

Proposed Bearing Fault Diagnosis Methodology
The proposed comprehensive bearing fault diagnosis scheme consists of four main processes including signal preprocessing, feature extraction, feature selection, and fault classification.Figure 1 illustrates the overall flowchart of the proposed methodology.
2.1.Preprocessing.The performance of wavelet kurtogram analysis can be improved if the variations between the peaks and flanks of the spectral energy are minimized as shown in Table 6; this can be achieved by preprocessing the AE signal before feature extraction [17].In this study, the prewhitening technique is used to reduce the spectral energy variations in the incoming AE fault signal.To obtain the prewhitened signal, an autoregressive (AR) model () is utilized, which is defined as follows: where () is the incoming bearing signal,   are the AR coefficients,  denotes the order of the AR model, and () is the residual signal representing a spectrum close to the white noise spectrum.

Feature Extraction.
For the purpose of extracting the characteristic fault signatures from the AE signals, this paper presents a two-step feature extraction process that works as follows.
Step 1.In this step, the proposed WPT-based kurtogram method is performed on the prewhitened AE signal to determine its most informative subband.The flowchart of the proposed WPT-based kurtogram is illustrated in Figure 2, and its details are provided below.
First, the prewhitened bearing signal is decomposed into a series of subbands by performing the wavelet packet transform (WPT) with five decomposition levels ( = 5) using a Daubechies 2 (or db2) filter.The five-level WPT decomposition results in a total of 63 subband signals.The envelope spectrum of each subband signal is then calculated using the fast Fourier transform (FFT) to identify the characteristic bearing fault frequencies.The envelope spectrum of an AE signal obtained from a defective bearing is usually flooded with the characteristic defect frequency and its harmonics.We utilize kurtosis to measure the degree of protrusion of the envelope spectrum and then characterize the hidden bearing defect signatures.In order to do this effectively, we first determine the bearing defect frequencies, which include the ball pass frequency of the bearing outer raceway (BPFO), the ball pass frequency of the bearing inner raceway (BPFI), the ball spin frequency of the bearing roller (BSF), and the first  harmonics ( = 3 in this paper) of each of these frequencies.These defect frequencies are defined as follows: where   denotes the number of cylindrical rollers,   is the shaft speed,  is the contact angle, and   and   are the roller and pitch diameters, respectively.Each defectrelated bandwidth BW  ranges from   to   ( = 1, 2, . . ., ), which can be calculated as follows: Here, ER is the error rate of the bearing test rig and   is the cage frequency.Once the bearing fault bandwidths are identified, three different kurtosis values are calculated corresponding to the three bearing defect frequencies (i.e., BPFO, BPFI, and BSF).After obtaining the kurtosis values for the subband signals, the wavelet kurtogram is generated (Figure 3).Finally, for each kurtogram, the subband signal with the highest kurtosis value (i.e., the signal that has the most discriminative fault features) is selected for extracting the bearing fault signatures.In Figure 3(b), the selected subband Shock and Vibration with the highest kurtosis value is highlighted using a dashed rectangle.
Step 2. In this step, 10 time domain ( 1 ,  2 , . . .,  10 ) and three frequency domain statistical features ( 11 ,  12 ,  13 ) are extracted from each of the selected subband signals, as shown in Tables 1 and 2, respectively.Furthermore, we calculate three root mean square (RMS) features ( 14 - 16 ) for the first three harmonics of the corresponding defect frequencies in the envelope spectrum of each of the selected subband signals, as described in Table 3.This results in a feature pool that consists of a total of   ×   ×   features, where   is the number of bearing fault classes,   is the number of AE signals obtained for each fault type, and   is the total number of features.In this case,   is 10 + 3 + 3 = 16 and includes 10 time domain statistical features, three frequency domain statistical features, and three envelope spectrum RMS features (calculated using the extracted wavelet kurtogram) for each of the three bearing defect frequencies (i.e., BPFO, BPFI, and BSF).

Feature Selection.
In this paper, we propose a feature analysis method based on the linear discriminant analysis (LDA) technique plus a vector median-based discriminant criterion (VMDC) to find the most discriminative subset of the extracted fault features for accurate fault diagnosis.LDA is a supervised dimensionality reduction algorithm that utilizes the class label information to find an optimal linear transformation matrix that projects the original high-dimensional feature space onto the lower-dimensional representation while also preserving the information that can help discriminate among different classes.Let  = { 1 ,  2 , . . .,   } ∈  × be a dataset, where  is the original dimensionality and  is the number of samples in .Each sample   belongs to a class   = {1, 2, . . ., }.Let   be the number of samples in class   and let  be the total number of samples in all of the classes.Then, the intraclass scatter matrix   and the interclass scatter matrix   can be evaluated with (4) and ( 5), respectively: Here,   = (1/  ) ∑   ∈    is the mean of the samples labelled as class   and  = (1/) ∑  =1   is the mean of all of the samples.The LDA projects the space of the original variables onto a ( − 1)-dimensional space by maximizing the Fisher discriminant rule [18], which is defined as follows: This discriminant rule maximizes the ratio of interclass separability to intraclass compactness by selecting a reduced number of the most discriminative components only.However, LDA-based feature analysis approaches are limited in their ability to efficiently select the optimal number of discriminative components to gain the highest classification performance.In the existing literature, there is no clear consensus on the optimal number of LDA components that yield the highest classification performance.To address this problem, we present a robust method based on the vector median approach to obtain the optimal number of LDA components that can maximize the classification performance.In order to identify the vector median of each class, this study utilizes a cumulative distance criterion using the Euclidean distance.The vector median VM  of class   is defined as the data point that yields the minimum accumulated distance and is determined as follows: Once the vector medians of all classes are determined, the modified intraclass compactness and interclass separability values are calculated using ( 8) and ( 9), respectively: In (8), the intraclass compactness of a class is reflected by the distance from its vector median to other samples in that class.Alternatively, in (9), the interclass separability of a class from other classes is represented by the minimum distance of the vector median of that class to the vector medians of other classes.The effect of outliers in the feature space is minimized by averaging these distance measurements, which improves the discriminatory power of the proposed feature analysis method.The vector median-based discriminant criterion that is used to select the optimal number of useful LDA components is defined as follows: 2.4.Fault Classification.In this study, a Naïve Bayes classifier [19] is used to classify various single and compound bearing faults.This classifier is selected due to its simplicity and high classification performance at a relatively low computational cost [19,20].The Naïve Bayes classifier has complexity (), where  is the total number of training samples and  is the number of conditional features.Other classification algorithms, such as KNN, ANN, and SVM, have complexities ( 2 ), ( 2 ), and ( 3 ), respectively [20].Furthermore, the Naïve Bayes classifier is adaptive in nature and does not require any fixed parameters (unlike other classification algorithms).
In this study, in order to evaluate the generalized classification performance of the proposed comprehensive bearing fault diagnosis method, -fold cross validation (-cv) [21] is employed.In -cv, the feature pool is randomly partitioned into  mutual subsets ( = 3 in this study).Then, at the th iteration of -cv, one of the subsets is used as the testing dataset and the other ( − 1) subsets are employed to train the Naïve Bayes classifier.The classification performance is averaged over  trials of -cv.The advantage of this technique is that no matter how we divide the data every data sample is in the testing set once and is included in the training set ( − 1) times.Thus, variance in the classification results is minimized.To validate the classification performance of the proposed method, this paper utilizes the average classification accuracy (ACA) and the true positive rate (TPR) as performance measures.These are defined in ( 11) and ( 12), respectively: where  is the number of cross validation folds and

𝑖,𝑗
FN are the number of true positives and false negatives of class   resulting in the th iteration of -cv, respectively.In addition, the number of true positives is defined as the total number of samples in class   that are accurately classified as class   .Alternatively, the number of false negatives indicates the number of samples in class   that are not classified as   .

Experimental Setup
In this study, we used an experimental test rig developed by Intelligence Dynamics Lab (Gyeongsang National University, Korea) to validate the performance of the proposed fault diagnosis methodology, as shown in Figure 4.An acoustic emission (AE) sensor (WS type, from Physical Acoustic Cooperation) was employed to capture continuous AE signals.Table 3 describes the specifications of the data acquisition system used in this study.

Data Preprocessing.
In this subsection, we investigate and justify the preprocessing of the AE fault signals to  improve the classification performance of our methodology.As mentioned earlier, a total of eight single and compound bearing defects were investigated in this study.Figures 6  and 7 show the feature distribution of the optimal LDA discriminant components for four datasets.These datasets were acquired at two different rotational speeds (i.e., 500 rpm and 300 rpm) and for two different bearing crack sizes (i.e., 3 mm and 12 mm).From Figure 6, it is clear that the fault features are not well separated and discrimination between different classes is poor without preprocessing.For instance, the class clusters of the 3 mm crack length and 300 rpm rotational speed datasets are heavily overlapped, which leads to higher error rate in classification.Conversely, preprocessing the original signals to obtain the prewhitened signals reduces overlapping between clusters and improves separation between their boundaries.
Likewise, Figure 7 shows that the boundaries between some class clusters of the original signals are blurred and unclear, whereas the prewhitened signals have clearer and more distinct boundaries.These results strongly suggest that prewhitened signals are able to strongly project the discriminative information that is hidden in the bearing signals.The variation between the peaks and flanks of the spectral energy in spectral kurtosis analysis is one reason why each class of the original signals cannot be clustered accurately.Therefore, a prewhitening process is required to improve separation between different clusters, which makes classification easier and more accurate, even with a Naïve Bayes classifier.

Proposed VMDC-Based Feature Analysis Method.
In order to validate the effectiveness of the proposed feature analysis method, this study compares the feature selection performance (in terms of the classification accuracy) of the proposed method and the original LDA for 10 different datasets, as shown in Figures 8 and 9.The vertical axis represents the average classification accuracy calculated for the classification results of different fault types through  trials of -cv, and the horizontal axis represents the number of LDA discriminant components that are selected.There is no general consensus about the appropriate number of LDA components that always guarantees the highest classification accuracy.As evident in Figures 8 and 9, the LDA method achieves its highest classification accuracy when the number of LDA discriminant components reaches seven.Nevertheless, there are some exceptions.For example, in Figure 8 (for the dataset of a 3 mm length crack and a rotational speed of    450 rpm), the highest accuracy that can be achieved is 91.67% (with six LDA components); the highest accuracy obtained with seven LDA components is only 90.83%.Thus, the number of LDA components that can achieve the maximum classification accuracy must be determined empirically for each study.On the contrary, the proposed feature analysis approach always shows better average classification accuracies when compared with the original LDA algorithm.For instance, in Figure 9 (for the dataset of a 12 mm length crack and a rotational speed of 300 rpm), the proposed method yields a classification accuracy of up to 92.8%, clearly outperforming the LDA method with any number of components.This can be explained by the fact that our proposed method utilizes the vector-median discriminant criterion, which can adaptively select the optimal number of LDA components to maximize the ratio of interclass separability and intraclass compactness, thereby achieving the maximum discriminatory power in a low-dimensional feature space.This enables the proposed approach to always achieve the best classification performance.The analysis of these results highlights the main contributions of this study in analyzing and selecting the optimal features from the original feature pool to achieve superior classification performance as compared to conventional feature analysis algorithms.Using the LDA method, the number of LDA components that yield the best classification results must be determined empirically on a dataset to dataset basis, whereas the proposed VMDCbased approach does this adaptively.

Classification Performance.
In this section, we compare the classification performance of the proposed methodology with other state-of-the-art feature analysis approaches (i.e., PCA, ICA, and LDA).The optimal number of components for PCA, ICA, and LDA was determined by utilizing the training set of -cv fold at each iteration and finding the number of components that yielded the highest accuracy for each of these methods.The final classification results, which have been calculated through  folds of -cv, are summarized in Tables 4 and 5.We used two evaluation indexes: the true positive rate (TPR) and the average classification accuracy (ACA).From the figures in Tables 4 and 5, it is clear that the proposed feature analysis scheme consistently outperforms the conventional approaches of PCA, ICA, and LDA for all of the datasets that were analyzed in this study.For instance, Table 5 reveals that the TPR of the proposed method is around 100% for almost all bearing faults (except for the BCR fault condition, where it ranges between 93 and 97% but is still higher than that of PCA and ICA).
In Table 4, it can be observed that the proposed method achieves good classification performance even though the feature clustering of the 3 mm length crack datasets was poor and not well separated as compared to the 12 mm length crack datasets.This comparatively high classification performance demonstrates that the proposed approach is more effective in feature analysis as compared to other conventional methods.
Our approach is also able to yield maximum discrimination in a lower-dimensional feature space.

Conclusion
This paper proposed a comprehensive multifault diagnosis methodology based on AE analysis to detect multiple localized bearing faults of a rolling element bearing.The method comprises a prewhitening step, a feature extraction based on wavelet kurtogram, a useful feature selection with the proposed VMDC-based feature analysis, and a fault classification by employing the NB classifier.One of the main advantages of the proposed methodology is that the subband of the nonstationary AE signal, which carries the most sensitive information related to fault characteristic, is adaptively identified without any prior knowledge of the bearing fault type by using an enhanced WPT-based kurtogram algorithm.In addition, the proposed VMDCbased discriminative feature analysis approach efficiently identifies optimal features where their discriminatory power is maximized.The proposed methodology was evaluated by an experimental setup using a healthy bearing and seven damaged ones under different rotational speeds and different simulated crack sizes.Experimental results indicated that the proposed multifault diagnosis methodology achieves the highest classification accuracy up to 91.11%, 96.67%, 98.89%, 99.44%, and 98.61% under bearing rotational speeds of 300 rpm, 350 rpm, 400 rpm, 450 rpm, and 500 rpm, respectively.

Figure 3 (
a) shows the frequency-scale paving of the proposed wavelet kurtogram, while Figure 3(b) illustrates an example color kurtogram of the three bearing fault frequencies (i.e., BPFO, BPFI, and BSF), which represents the kurtosis values of all of the subbands.

Figure 3 :
Figure 3: The proposed wavelet kurtogram.(a) The frequency-scale paving of the proposed wavelet kurtogram.(b) An example of a proposed wavelet kurtogram.

Figure 4 :
Figure 4: Experimental test rig for fault diagnosis of rolling element bearings [10].

Figure 6 :
Figure 6: Feature distribution spaces with a bearing crack size of 3 mm.(a) Before prewhitening and (b) after prewhitening.

Figure 7 :
Figure 7: Feature distribution spaces with a bearing crack size of 12 mm.(a) Before prewhitening and (b) after prewhitening.

Figure 8 :
Figure 8: Performance of the proposed feature analysis at a bearing crack size of 3 mm.

Figure 9 :
Figure 9: Performance of the proposed feature analysis at a bearing crack size of 12 mm.

Table 1 :
Time domain statistical feature parameters.

Table 2 :
Frequency domain statistical feature parameters.

Table 3 :
Envelope spectrum statistical feature parameters.

Table 5 :
Performance comparison between conventional feature analysis approaches and the proposed feature analysis method in terms of the average classification accuracy and the true positive rate at a bearing crack size of 3 mm (unit: %).

Table 6 :
Performance comparison between conventional feature analysis approaches and the proposed feature analysis method in terms of the average classification accuracy and the true positive rate at a bearing crack size of 12 mm (unit: %).