Mutual Information-Assisted Wavelet Function Selection for Enhanced Rolling Bearing Fault Diagnosis

This paper presents an enhanced rolling bearing fault diagnosis approach based on an optimized wavelet packet transform (WPT) assisted by quantitative wavelet function selection. Mutual information is used as a quantitative measure to select the most suitable wavelet function for WPT-based vibration analysis. Energy features are computed from the coefficients of an optimal set of orthogonal wavelet subspaces obtained from the WPT-based vibration analysis. These features are input to different classifiers to identify the fault states of the rolling bearings. Experimental studies conducted on a rolling bearing test system verify the effectiveness of the proposed approach for rolling bearing fault diagnosis.


Introduction
In modern industry, rotating machines are among the most important types of mechanical equipment and are widely used across all fields of industrial production. The performance of key rotating components, such as bearings and gears, strongly influences the working status of the whole machine [1]. Even a small defect in these components may eventually lead to machine failure or a breakdown in production. Therefore, fault diagnosis of these key rotating components has become an active research topic.
An important prerequisite for accurate diagnosis is the effective extraction of characteristic features that indicate faults from signals measured on the rotating machine. As effective tools for nonstationary signal analysis, the wavelet transform and its extension, the wavelet packet transform, have been widely used for feature extraction in the fault diagnosis of rotating machines [2, 3]. For example, statistical parameters were extracted from the signals at different WPT decomposition depths, and a support vector regression (SVR)-based generic multiclass solver was proposed to identify different fault patterns of rotating machines [4]. The WPT was used to decompose multiclass signals into a library of time-frequency subspaces, and the wavelet packet energy in each subspace was calculated to produce a feature vector for classifying each signal [5]. Renyi entropy values of the subband coefficients of vibration signals were used to detect mechanical faults in rotational drives [6]. These features were found to be sensitive to fault occurrence and robust to varying operating conditions. A sensitive subband feature set from WPT-based signal decomposition was also extracted to classify bearing faults [7]. In addition, the WPT was used to denoise signals before ensemble empirical mode decomposition (EEMD) was applied to extract informative feature vectors for early damage detection of rolling bearings [8]. An empirical mode decomposition (EMD) method improved by the WPT was developed to process fault signals [9]. A WPT-EMD preprocessing model of the bearing was constructed for feature extraction, and a self-organizing map (SOM) was then used to assess performance degradation [10]. Combined with manifold learning, the WPT was used to extract weak transient signal features for rolling bearing fault diagnosis [11].
Shock and Vibration
Recently, a comprehensive review of the wavelet transform for fault diagnosis of rotating machines was conducted [3]. This review summarized applications in four categories: continuous wavelet transform-based, discrete wavelet transform-based, wavelet packet transform-based, and second generation wavelet transform-based fault diagnosis. The review also discussed new research trends, and it identified wavelet function selection as one of the important factors affecting the results of wavelet applications. This issue has attracted increasing attention from the research community. For example, Rafiee et al. [12] used a genetic algorithm to select a proper Daubechies (Db) wavelet for gear fault diagnosis. They also compared 324 wavelet functions and found that the Daubechies 44 wavelet showed the most similar shape across both gear and bearing vibration signals [13]. Jiang and Liu [14] applied the -test to validate the correlation between features extracted from discrete wavelet transform (DWT) coefficients and the original signal. They then used the probability estimated from the -test to guide the selection of wavelet functions. Yang and Ren [15] and Chen [16] used waveform matching and a similarity coefficient to select the optimal wavelet function for impact signal analysis. Schukin et al. [17] used the time-frequency window resolution and the estimation error of impact signal parameters to choose suitable wavelet functions. Li et al. [18-20] used Shannon entropy as a measure to determine the optimal wavelet function for Lamb wave analysis and damage detection. Furthermore, Yan and Gao [21] investigated an energy measure for wavelet function selection in bearing vibration analysis. They later developed an energy-to-Shannon entropy ratio as a wavelet selection criterion for rotating machine fault diagnosis [22]. Kankar et al. combined this criterion with the maximum relative wavelet energy to select an appropriate wavelet function from six candidates for bearing defect-related feature extraction [23]. Wu et al. also used the energy-to-Shannon entropy ratio to guide wavelet function selection for fault diagnosis of rolling bearings [24].
These efforts show that some researchers used qualitative methods, such as waveform similarity, to select wavelet functions, and these methods rely largely on the researchers' subjective assessment. Other researchers designed quantitative measures, calculated directly from wavelet coefficients, to evaluate the similarity between signals and wavelet functions. The relationship between the wavelet coefficients and the raw signals offers another promising perspective on wavelet function selection, one that previous fault diagnosis studies have rarely considered. Taking this relationship into account, this paper investigates a quantitative wavelet function selection method based on mutual information, with specific application to enhanced rolling bearing fault diagnosis.
This paper is organized as follows. Section 2 introduces the basics of wavelet packet decomposition, which is the tool used to analyze bearing vibration signals. It then presents the mutual information-based wavelet function selection with a numerical verification. Section 3 presents the fault diagnosis method based on an optimized wavelet packet transform. Section 4 reports experimental studies performed on a bearing test system to verify the effectiveness of the proposed method. Finally, Section 5 draws conclusions.

Wavelet Function Selection
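The essence of the wavelet transform is to compare the signal $x(t)$ with a set of known functions $\{\psi_\lambda(t)\}_{\lambda \in \Lambda}$ and to quantify the similarity between the signal and each function. Such an operation can be expressed mathematically as

$$\langle x, \psi_\lambda \rangle = \int x(t)\, \psi_\lambda^*(t)\, dt, \qquad (1)$$

where $\psi^*(\cdot)$ denotes the complex conjugate of $\psi(\cdot)$. If $\psi_\lambda(t)$ is the scaled and shifted wavelet $\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right)$, with scale $s > 0$ and time shift $\tau$, then (1) can be represented as

$$wt(s,\tau) = \frac{1}{\sqrt{s}} \int x(t)\, \psi^*\!\left(\frac{t-\tau}{s}\right) dt. \qquad (2)$$

By varying the scale and time shift of the wavelet function, the wavelet transform can extract features over the entire signal. Small scales capture high-frequency components, and large scales capture low-frequency components. In general, the wavelet transform can be expressed in continuous form (the continuous wavelet transform) or in discrete form (the discrete wavelet transform). Dyadic discretization of the scale parameter $s$ and the shift parameter $\tau$, that is, $s = 2^j$ and $\tau = k\,2^j$, yields the family $\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k)$. For suitably chosen wavelets, this family forms an orthogonal basis, and the discrete wavelet transform is realized on this basis.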
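As a rough numerical illustration of the scaled, shifted inner product $wt(s,\tau) = s^{-1/2}\int x(t)\,\psi^*((t-\tau)/s)\,dt$, the sketch below approximates it with a Riemann sum on a uniformly sampled signal. The real-valued Mexican-hat wavelet, the function names and the grid sizes are assumptions chosen only for illustration. They are not the wavelets used later in this paper.

```python
import numpy as np


def mexican_hat(t):
    # Real-valued Mexican-hat (Ricker) mother wavelet, unnormalized; an illustrative choice.
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)


def cwt_coefficient(x, t, s, tau, psi=mexican_hat):
    """Riemann-sum approximation of wt(s, tau) = s**-0.5 * integral x(t) conj(psi((t - tau) / s)) dt."""
    dt = t[1] - t[0]
    kernel = np.conj(psi((t - tau) / s)) / np.sqrt(s)
    return np.sum(x * kernel) * dt


def cwt_row(x, t, s, shifts, psi=mexican_hat):
    """Coefficients at a single scale s for every time shift tau in `shifts`."""
    return np.array([cwt_coefficient(x, t, s, tau, psi) for tau in shifts])


# Usage: a transient shaped like the wavelet itself, centered at t = 0.3 s.
t = np.linspace(0.0, 1.0, 2001)
x = mexican_hat((t - 0.3) / 0.01)
shifts = np.linspace(0.0, 1.0, 101)
row = cwt_row(x, t, 0.01, shifts)
print(shifts[np.argmax(np.abs(row))])  # shift with the largest similarity
```

The coefficient magnitude peaks at the shift where the wavelet lines up with the transient. This is the "similarity" reading of the transform that motivates the wavelet selection criterion later in this section.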

Wavelet Packet Decomposition.
Wavelet packet decomposition is an extension of the discrete wavelet transform and can be obtained by generalizing the fast pyramidal algorithm [25]. Mathematically, a wavelet packet consists of a set of linearly combined wavelet functions, which are generated by the recursive relationships

$$u_{2n}(t) = \sqrt{2} \sum_{k} h(k)\, u_n(2t - k), \qquad u_{2n+1}(t) = \sqrt{2} \sum_{k} g(k)\, u_n(2t - k),$$

where $u_0(t) = \varphi(t)$ is the scaling function and $u_1(t) = \psi(t)$ is the wavelet function. The symbols $h(k)$ and $g(k)$ are the coefficients of a pair of quadrature mirror filters (QMF) associated with the scaling function and the wavelet function. They are related by $g(k) = (-1)^k h(1-k)$. Using the QMF, a time-domain signal $x(t)$ can be decomposed recursively as

$$d_{j+1}^{2n}(k) = \sum_{m} h(m - 2k)\, d_j^n(m), \qquad d_{j+1}^{2n+1}(k) = \sum_{m} g(m - 2k)\, d_j^n(m),$$

where $d_j^n(k)$ denotes the wavelet packet coefficients at the $j$th level and the $n$th sub-frequency band, with $d_0^0(k) = x(k)$. The index $k$ runs over the wavelet coefficients in the $n$th sub-frequency band at level $j$. Figure 1 illustrates a 3-level decomposition of the signal $x(t)$.
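The recursion $d_{j+1}^{2n}(k) = \sum_m h(m-2k)\,d_j^n(m)$, $d_{j+1}^{2n+1}(k) = \sum_m g(m-2k)\,d_j^n(m)$ with $g(k) = (-1)^k h(1-k)$ can be sketched in plain NumPy. This sketch rests on the following assumptions:

- It uses periodic extension at the signal boundaries.
- It takes the orthonormal Daubechies-2 low-pass filter only as an example.
- It indexes nodes $(j, n)$ in the order produced by the recursion, which is not necessarily increasing frequency.

The function names are hypothetical.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# Orthonormal Daubechies-2 low-pass filter h(k), k = 0..3 (example choice).
H_DB2 = np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4.0 * np.sqrt(2.0))


def highpass_from_lowpass(h):
    """QMF relation g(k) = (-1)^k h(1 - k); returns {k: g(k)} for the nonzero taps."""
    return {k: (-1) ** k * h[1 - k] for k in range(2 - len(h), 2)}


def split(d, h):
    """One decomposition step with periodic extension:
    d_{j+1}^{2n}(k) = sum_m h(m - 2k) d_j^n(m) and d_{j+1}^{2n+1}(k) = sum_m g(m - 2k) d_j^n(m)."""
    N = len(d)
    g = highpass_from_lowpass(h)
    low, high = np.zeros(N // 2), np.zeros(N // 2)
    for k in range(N // 2):
        for i, hi in enumerate(h):   # i = m - 2k
            low[k] += hi * d[(2 * k + i) % N]
        for i, gi in g.items():      # i = m - 2k, may be negative
            high[k] += gi * d[(2 * k + i) % N]
    return low, high


def wavelet_packet(x, h, levels):
    """All nodes {(j, n): d_j^n} down to `levels`, starting from d_0^0 = x."""
    nodes = {(0, 0): np.asarray(x, dtype=float)}
    for j in range(levels):
        for n in range(2 ** j):
            nodes[(j + 1, 2 * n)], nodes[(j + 1, 2 * n + 1)] = split(nodes[(j, n)], h)
    return nodes


# Usage: a 3-level decomposition as in Figure 1 (signal length divisible by 2**levels).
x = np.random.default_rng(0).standard_normal(64)
tree = wavelet_packet(x, H_DB2, 3)
level3_energy = sum(np.sum(tree[(3, n)] ** 2) for n in range(8))
print(np.isclose(level3_energy, np.sum(x ** 2)))  # orthonormal filters preserve energy
```

Because the filters are orthonormal, the eight level-3 subspaces together carry exactly the signal's energy. This property is what makes subspace energies usable as features later on.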

Wavelet Selection Criterion.
When the WPT is used to analyze vibration signals for rolling bearing fault diagnosis, a critical issue is choosing the most suitable wavelet function for signal decomposition and feature extraction [22]. The wavelet transform, or wavelet packet transform, can be regarded as a correlation between the signal and the wavelet function at different scales. The more similar the signal and the wavelet function are, the more accurately the wavelet analysis extracts the features. In information theory, the mutual information of two variables $X$ and $Y$ measures the information they share, that is, how much knowing one of them reduces the uncertainty about the other. It therefore quantifies the dependence, and hence the similarity, between two data sequences, and this study uses it to guide the selection of the wavelet function. Mathematically, the mutual information is defined as

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}, \qquad (6)$$

where $p(x,y)$ is the joint probability distribution of data sequences $X$ and $Y$, and $p(x)$ and $p(y)$ are the probability distributions of $X$ and $Y$, respectively. The mutual information thus measures how close the joint distribution $p(x,y)$ is to the product of the marginal distributions $p(x)\,p(y)$. It equals zero when $X$ and $Y$ are independent. When the logarithm in (6) is expanded, $H(X,Y) = -\sum_{x \in X}\sum_{y \in Y} p(x,y)\log p(x,y)$ is the joint entropy of $X$ and $Y$. The terms $H(X) = -\sum_{x \in X} p(x)\log p(x)$ and $H(Y) = -\sum_{y \in Y} p(y)\log p(y)$ are the Shannon entropies of $X$ and $Y$, respectively. Therefore, (6) can be expressed in another form:

$$I(X;Y) = H(X) + H(Y) - H(X,Y). \qquad (7)$$

This equation indicates that the mutual information is the sum of the entropies $H(X)$ and $H(Y)$ minus the joint entropy $H(X,Y)$. Figure 2 illustrates the relationships among the entropies and the mutual information as a Venn diagram.
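The entropy identity $I(X;Y) = H(X) + H(Y) - H(X,Y)$ can be evaluated numerically by estimating the probabilities with a joint histogram. Here is a minimal sketch that assumes a plug-in histogram estimator with a fixed number of bins. The paper does not state which estimator it uses, so this choice is illustrative. The result is in nats and is biased upward for small samples.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (nats) of a discrete distribution; zero-probability cells are skipped."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def mutual_information(x, y, bins=32):
    """Plug-in estimate of I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1)  # marginal distribution of X
    p_y = p_xy.sum(axis=0)  # marginal distribution of Y
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)


rng = np.random.default_rng(0)
a = rng.standard_normal(5000)
b = rng.standard_normal(5000)
m_same = mutual_information(a, a)            # identical sequences
m_noisy = mutual_information(a, a + 0.5 * b)  # noisy copy
m_indep = mutual_information(a, b)            # independent sequences
print(m_same, m_noisy, m_indep)
```

The estimate drops from an identical copy, to a noisy copy, to an independent sequence. This ordering is the behaviour the selection criterion relies on. The small positive value for independent data reflects the estimator's bias.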
Figure 2 shows that the mutual information $I(X;Y)$ corresponds to the intersection of the two data sequences. The greater the mutual information, the more similar the two data sequences. A simulation study evaluates whether mutual information is suitable for wavelet function selection. In the simulation, the base waveform is the Daubechies 10 wavelet (denoted as Db 10 wavelet). The sample signal $x(t)$ is obtained by adding Gaussian white noise to this waveform at a signal-to-noise ratio (SNR) of 20 dB. The wavelet packet transform decomposes $x(t)$, which serves as data sequence $X$, and the reconstructed signal serves as data sequence $Y$. Figure 3 shows the waveforms of the Db 10 wavelet, data sequence $X$, and data sequence $Y$. The mutual information for each candidate wavelet function is then calculated, and the results are listed in Table 1. Because the sample signal is the Db 10 waveform plus Gaussian white noise, the Db 10 wavelet itself should in principle characterize it best. Table 1 confirms this: the Db 10 wavelet maximizes the mutual information between the sample signal and the reconstructed signal, which validates the proposed quantitative wavelet selection method.
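The simulation described above can be mimicked in a small sketch. The sketch makes several assumptions that the paper does not state:

- It uses only two candidate wavelets, Haar and Daubechies-2, whose filters have closed forms, instead of a full wavelet library.
- It uses a Daubechies-2 wavelet-packet atom instead of the Db 10 waveform.
- It uses periodic boundary handling.
- It treats the highest-energy level-3 node as the subband to reconstruct.
- It uses a 32-bin histogram estimator of mutual information.

All function names are hypothetical.

```python
import numpy as np

SQ3 = np.sqrt(3.0)
# Candidate orthonormal low-pass filters h(k): a small, illustrative subset.
CANDIDATES = {
    "haar": np.array([1.0, 1.0]) / np.sqrt(2.0),
    "db2": np.array([1 + SQ3, 3 + SQ3, 3 - SQ3, 1 - SQ3]) / (4.0 * np.sqrt(2.0)),
}


def _g(h):
    return {k: (-1) ** k * h[1 - k] for k in range(2 - len(h), 2)}  # g(k) = (-1)^k h(1-k)


def split(d, h):
    """Periodized analysis step: returns the low-pass and high-pass child coefficients."""
    N, g = len(d), _g(h)
    low, high = np.zeros(N // 2), np.zeros(N // 2)
    for k in range(N // 2):
        for i, hi in enumerate(h):
            low[k] += hi * d[(2 * k + i) % N]
        for i, gi in g.items():
            high[k] += gi * d[(2 * k + i) % N]
    return low, high


def merge(low, high, h):
    """Inverse of `split` (transpose of the orthogonal periodized filter bank)."""
    N, g = 2 * len(low), _g(h)
    x = np.zeros(N)
    for k in range(len(low)):
        for i, hi in enumerate(h):
            x[(2 * k + i) % N] += hi * low[k]
        for i, gi in g.items():
            x[(2 * k + i) % N] += gi * high[k]
    return x


def reconstruct_node(coeffs, n, level, h):
    """Signal-domain reconstruction from node (level, n) alone, with all other nodes set to zero."""
    y = np.asarray(coeffs, dtype=float)
    for _ in range(level):
        zeros = np.zeros_like(y)
        y = merge(y, zeros, h) if n % 2 == 0 else merge(zeros, y, h)
        n //= 2
    return y


def mutual_information(x, y, bins=32):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()

    def ent(q):
        q = q[q > 0]
        return float(-np.sum(q * np.log(q)))

    return ent(p.sum(axis=1)) + ent(p.sum(axis=0)) - ent(p)


def select_wavelet(x, candidates=CANDIDATES, level=3):
    """Score each wavelet by I(X;Y), where Y reconstructs the highest-energy node at `level`."""
    scores = {}
    for name, h in candidates.items():
        nodes = {0: np.asarray(x, dtype=float)}
        for _ in range(level):
            nodes = {2 * n + b: c for n, d in nodes.items() for b, c in enumerate(split(d, h))}
        n_best = max(nodes, key=lambda n: np.sum(nodes[n] ** 2))  # assumed "transient" band
        scores[name] = mutual_information(x, reconstruct_node(nodes[n_best], n_best, level, h))
    return max(scores, key=scores.get), scores


# Usage: one db2 wavelet-packet atom plus white Gaussian noise at 20 dB SNR (hypothetical signal).
rng = np.random.default_rng(0)
atom_coeffs = np.zeros(128)
atom_coeffs[60] = 1.0
atom = reconstruct_node(atom_coeffs, 5, 3, CANDIDATES["db2"])  # length 1024
noise = rng.standard_normal(atom.size) * np.sqrt(np.mean(atom ** 2) / 10 ** (20 / 10))
best, scores = select_wavelet(atom + noise)
print(best, scores)
```

The ranking depends on the noise realization and the histogram estimator. The sketch therefore shows the selection loop and does not attempt to reproduce Table 1.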

Fault Diagnosis Based on Optimized Wavelet Packet Transform
Once the wavelet function has been chosen, the WPT can decompose the vibration signal and extract features that characterize rolling bearing faults. As Figure 1 shows, the WPT offers multiple ways of decomposing a signal. For rolling bearing fault diagnosis, the challenge is to find a representation of the signals that yields high discriminatory information among different bearing conditions. This study therefore applies an optimized wavelet packet transform (termed OWPT), based on the local discriminant bases (LDB) algorithm [26], to identify the set of best subspaces that provide maximum dissimilarity information between different classes of signals.
The LDB algorithm is a pruning algorithm that selects an optimal set of complete orthogonal subspaces derived from the WPT. It has been used successfully in various engineering domains, such as audio signal analysis [27], physiological signal classification [28], and vibration data analysis. In the algorithm, $D(\cdot)$ represents the dissimilarity measure that characterizes the signal classification capability. A bottom-up search then yields the best subspaces $A_{j,k}$ for $j = J-1, \ldots, 0$ and $k = 0, \ldots, 2^j - 1$.
(c) After a complete set of orthogonal subspaces is found in the decomposition results, the subspaces are ranked in descending order of their discrimination power.
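The text leaves the bottom-up comparison implicit. The sketch below therefore assumes the standard LDB pruning rule of Saito and Coifman. Under that rule, a parent subspace is kept when its discriminant value is at least the sum of its two children's best values. Otherwise, the children's selections replace it, and step (c) then ranks the surviving subspaces. The node values $\Delta$ are supplied as a dictionary, and the function name is hypothetical.

```python
def ldb_prune(delta, J):
    """Bottom-up LDB search over a WPT tree of depth J.

    delta[(j, k)] is the discriminant value D of subspace B_{j,k}. Returns the subspaces
    selected at the root, ranked by discriminant power (step (c)).
    """
    best = {(J, k): [(J, k)] for k in range(2 ** J)}  # A_{J,k} = B_{J,k}
    score = {(J, k): delta[(J, k)] for k in range(2 ** J)}
    for j in range(J - 1, -1, -1):
        for k in range(2 ** j):
            children = score[(j + 1, 2 * k)] + score[(j + 1, 2 * k + 1)]
            if delta[(j, k)] >= children:  # parent is at least as discriminant: keep it
                best[(j, k)] = [(j, k)]
                score[(j, k)] = delta[(j, k)]
            else:  # otherwise keep the union of the children's selections
                best[(j, k)] = best[(j + 1, 2 * k)] + best[(j + 1, 2 * k + 1)]
                score[(j, k)] = children
    return sorted(best[(0, 0)], key=lambda node: delta[node], reverse=True)


# Usage with hypothetical discriminant values for a 2-level tree.
delta = {(0, 0): 1.0, (1, 0): 0.2, (1, 1): 0.5,
         (2, 0): 0.4, (2, 1): 0.3, (2, 2): 0.1, (2, 3): 0.1}
selected = ldb_prune(delta, 2)
print(selected)  # [(1, 1), (2, 0), (2, 1)]
```

The result is always a complete, non-overlapping cover of the frequency axis, so the selected subspaces remain an orthogonal basis.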
The optimal choice of complete subspaces for a given dataset depends strongly on the dissimilarity measure used to distinguish between classes, because that measure indirectly controls the classification accuracy. A good dissimilarity measure should therefore differentiate the classes as much as possible. A single dissimilarity measure may not capture all the characteristic information of the signals, so this study uses two measures. The first is the relative entropy, which describes the difference in energy distribution between classes of signals. The second is the normalized energy difference between classes of signals. Both have been identified as good measures for classification, and [30] gives their details. Features extracted from the selected subspaces can then be used to distinguish the classes in a dataset.
Figure 4 shows the resulting enhanced rolling bearing fault diagnosis scheme. The WPT, with the selected wavelet function, decomposes the vibration signals from each class. The LDB algorithm then identifies the set of best subspaces that provide maximum dissimilarity information between the classes. Once these optimal discriminatory subspaces are identified, the energy of the wavelet packet coefficients in each subspace is calculated. These energy features are expected to have better discriminatory capability, and they serve as inputs to a diagnostic classifier for characterizing rolling bearing faults.
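The last two ingredients of the scheme can be sketched as follows: the energy of the coefficients in each selected subspace, and a dissimilarity between class energy distributions. The relative entropy is written here in its symmetric form (J-divergence), which is a common choice in LDB and is an assumption in this sketch. The exact definitions in [30], including the normalized energy difference, are not reproduced here. All names are hypothetical.

```python
import numpy as np


def energy_features(nodes, selected):
    """Energy sum_k d(k)^2 of the coefficients in each selected subspace, in the given (ranked) order."""
    return np.array([np.sum(np.asarray(nodes[node], dtype=float) ** 2) for node in selected])


def symmetric_relative_entropy(p, q, eps=1e-12):
    """J-divergence sum p log(p/q) + sum q log(q/p) between two normalized energy distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


# Usage with hypothetical coefficients for two selected subspaces.
nodes = {(2, 1): [1.0, -2.0], (3, 4): [3.0]}
feats = energy_features(nodes, [(3, 4), (2, 1)])
print(feats)  # [9. 5.]
print(symmetric_relative_entropy([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]))
```

In the scheme of Figure 4, one such feature vector per signal would be passed to the classifier.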

Experimental Study
To verify the effectiveness of the proposed rolling bearing fault diagnosis approach, an experimental study was conducted. Its data were collected on a bearing test system (Figure 5) from the Case Western Reserve University Bearing Data Center [31]. The system consists of a 2-horsepower (hp) motor, a torque transducer, a dynamometer, and control electronics. Single-point faults with diameters of 0.18 mm, 0.36 mm, and 0.53 mm were introduced at the outer raceway, the inner raceway, and the rolling element (ball) of the drive-end bearings (type 6205-2RS JEM SKF) by electrodischarge machining. The vibration data were measured with an accelerometer attached to the motor housing at a sampling frequency of 12 kHz.

Diagnosis of Rolling Bearing Faults under Different Severity Levels.
The first study diagnoses rolling bearing faults at different severity levels. Figure 6 illustrates the vibration signals of rolling bearings with inner raceway defects of different sizes. The mutual information for each candidate wavelet function is calculated, and the results are listed in Table 2. The reverse Biorthogonal wavelet 1.3 (denoted as rBio1.3) has the highest mutual information. It is therefore identified as the most appropriate wavelet for analyzing the rolling bearing signals through the optimized wavelet packet transform. The energy values calculated from the optimal set of orthogonal wavelet subspaces then serve as inputs to a support vector machine (SVM) classifier that characterizes the rolling bearing defect severity levels. These energy values are computed for 40 groups of training signals and 20 groups of testing signals, each containing 1,024 data points. The classification results are listed in Table 3. The features achieve a classification accuracy of 100%, which indicates that the developed fault diagnosis method is effective for classifying rolling bearing defect severity.
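One plausible way to form the 1,024-point groups is to cut each vibration record into non-overlapping segments. The paper does not say how the groups were drawn. The non-overlapping split, the optional shuffling and the function name are therefore assumptions. The 40/20 counts come from the text, which does not state whether they apply per condition. At the 12 kHz sampling rate, 60 groups of 1,024 points require 61,440 samples, or about 5.12 s of data per record.

```python
import numpy as np

FS = 12_000      # sampling frequency in Hz (from the test-rig description)
SEG_LEN = 1024   # data points per group
N_TRAIN, N_TEST = 40, 20


def make_groups(record, seg_len=SEG_LEN, n_train=N_TRAIN, n_test=N_TEST, rng=None):
    """Cut one record into non-overlapping segments and split them into training and testing groups.

    If `rng` is given, segments are shuffled before the split; otherwise the first
    n_train segments are used for training and the next n_test for testing.
    """
    n_needed = n_train + n_test
    n_avail = len(record) // seg_len
    if n_avail < n_needed:
        raise ValueError(f"record holds {n_avail} segments, {n_needed} needed")
    segments = np.asarray(record[: n_needed * seg_len], dtype=float).reshape(n_needed, seg_len)
    order = np.arange(n_needed) if rng is None else rng.permutation(n_needed)
    return segments[order[:n_train]], segments[order[n_train:]]


# Usage with a hypothetical 70,000-sample record.
record = np.sin(2 * np.pi * 50 * np.arange(70_000) / FS)
train, test = make_groups(record, rng=np.random.default_rng(0))
print(train.shape, test.shape)                  # (40, 1024) (20, 1024)
print((N_TRAIN + N_TEST) * SEG_LEN / FS, "s")   # seconds of data needed per record
```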
For comparison, the energy features are also input to different classifiers, and Table 4 presents the results achieved by four classifiers. Except for the hidden Markov model (HMM) classifier, the classifiers perform well with the selected features in characterizing the rolling bearing faults. Furthermore, control experiments are conducted with three different wavelet functions. As Table 5 shows, the test using the reverse Biorthogonal wavelet 1.3 achieves higher classification accuracy than the tests using the other wavelet functions. This wavelet was identified as the most suitable for analyzing the rolling bearing signals. This result verifies the effectiveness of the mutual information measure for wavelet selection.

Diagnosis of Rolling Bearing Faults with Different Locations.
The second study is to diagnose rolling bearing faults at different locations.Figure 7 illustrates the waveform of the vibration signals of rolling bearings with different fault locations under varying load and rotating speed.
Following the same procedure, the energy features extracted from the selected subspaces serve as inputs to the SVM classifier for distinguishing different rolling bearing faults. As before, there are 40 groups of training signals and 20 groups of testing signals, each containing 1,024 data points. Table 6 shows the classification results for rolling bearings with different fault locations. The selected features achieve a high classification accuracy of 98.75%, and only one inner raceway fault is misclassified. This indicates that the developed fault diagnosis approach is also effective for classifying bearing fault locations.
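As a quick consistency check on the reported rate, consider two assumptions: the 20 testing groups are counted per condition, and the four conditions of Figure 7 are used. Under these assumptions there are 80 test signals, and a single misclassification gives 79/80 = 98.75%.

```python
def accuracy(n_total, n_wrong):
    """Fraction of correctly classified test signals."""
    return (n_total - n_wrong) / n_total


N_CONDITIONS = 4           # assumed: normal, inner raceway, ball, outer raceway (Figure 7)
N_TEST_PER_CONDITION = 20  # assumed: testing groups counted per condition
n_total = N_CONDITIONS * N_TEST_PER_CONDITION
print(n_total, f"{accuracy(n_total, 1):.2%}")  # 80 98.75%
```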
The experiment also studies how different classifiers affect the identification of fault locations. The results in Table 7 indicate that the SVM classifier again performs well for rolling bearing fault location classification.
Similarly, control experiments are conducted with three different wavelets. As Table 8 shows, the optimized wavelet function, reverse Biorthogonal wavelet 1.3, produces a higher classification rate for the rolling bearing vibration signals than either of the other two wavelet functions. This verifies the effectiveness of the quantitative wavelet selection criterion proposed in this paper.

Conclusions
This paper presents a quantitative wavelet selection approach for rolling bearing vibration signal analysis. A new and effective approach for rolling bearing fault diagnosis has been developed. It combines this quantitative wavelet selection with an optimized wavelet packet transform based on the LDB algorithm. The experimental results indicate that the proposed approach diagnoses rolling bearing faults well. Furthermore, the comparison experiments show that the proposed quantitative wavelet selection method improves classification accuracy. This confirms the performance of the enhanced rolling bearing fault diagnosis approach.
This relationship can be applied to wavelet selection for rolling bearing fault diagnosis by taking the vibration signal and the wavelet packet coefficients as data sequences $X$ and $Y$, respectively. The wavelet packet coefficients are expected to fully represent the defect-induced transient vibration. In this study, the raw vibration signals of rolling bearings are treated as data sequence $X$. The reconstructed signals of the subband that contains the defect-induced transient components are treated as data sequence $Y$. The mutual information can then evaluate the similarity between the original vibration signals and the signals reconstructed with different wavelet functions. The wavelet function that maximizes the mutual information between the vibration signal and the reconstructed signal is the most appropriate wavelet for extracting the defect-induced transient vibration.

Figure 4: Flowchart of the rolling bearing fault diagnosis scheme.

Reverse Biorthogonal wavelet 1.3 (denoted as rBio1.3) has the highest mutual information value. It is therefore identified as the most appropriate wavelet for analyzing the rolling bearing signals through an optimized wavelet packet transform.

Figure 7: Vibration signals of bearings with different fault locations under varying load and rotating speed: (a) normal, (b) inner raceway fault, (c) ball fault, and (d) outer raceway fault (fault diameter: 0.53 mm).

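Table 1: Mutual information obtained from the simulated signal for different wavelets.

Consider a training dataset $\{x_i^{(c)}\}_{i=1}^{N_c}$, $c = 1, \ldots, C$, where $N_c$ is the total number of training signals in class $c$. The LDB algorithm can be summarized as follows.
(a) The wavelet packet transform decomposes the signals in the training dataset. The time-frequency energy maps $\Gamma_c$, $c = 1, \ldots, C$, of the wavelet packet coefficients are then constructed as
$$\Gamma_c(j,k,l) \equiv \frac{\sum_{i=1}^{N_c} \left(\mathbf{w}_{j,k,l}^{T}\, x_i^{(c)}\right)^2}{\sum_{i=1}^{N_c} \lVert x_i^{(c)} \rVert^2},$$
where $\mathbf{w}_{j,k,l}$ is the $l$th basis vector of the subspace $B_{j,k}$.
(b) Suppose that $A_{J,k} = B_{J,k}$, and set $\Delta_{J,k} = D\left(\{\Gamma_c(J,k,\cdot)\}_{c=1}^{C}\right)$.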

Table 2: Mutual information of the extracted bearing vibration signal using different wavelets.

Table 3: Classification results of rolling bearing faults under different severity levels.

Table 4: Comparison of classification results using different classifiers.

Table 5: Classification results under different wavelet functions.

Table 6: Classification results of rolling bearing faults at different locations.

Table 7: Comparison of classification results using different classifiers.

Table 8: Classification results under different wavelet functions.