Evaluation of Effectiveness of Wavelet Based Denoising Schemes Using ANN and SVM for Bearing Condition Classification

Wavelet based denoising has proven its ability to denoise bearing vibration signals by improving the signal-to-noise ratio (SNR) and reducing the root-mean-square error (RMSE). In this paper, seven wavelet based denoising schemes have been evaluated based on the performance of the Artificial Neural Network (ANN) and the Support Vector Machine (SVM) for bearing condition classification. The work consists of two parts. In the first part, a synthetic signal simulating the defective bearing vibration signal with Gaussian noise was subjected to these denoising schemes, and the best scheme based on the SNR and the RMSE was identified. In the second part, the vibration signals collected from a customized Rolling Element Bearing (REB) test rig for four bearing conditions were subjected to these denoising schemes. Several time and frequency domain features were extracted from the denoised signals, out of which a few sensitive features were selected using the Fisher's Criterion (FC). The extracted features were used to train and test the ANN and the SVM. The best denoising scheme identified, based on the classification performances of the ANN and the SVM, was found to be the same as the one obtained using the synthetic signal.


Introduction
The detection of fault in the machinery, in its incipient stage itself, has gained prime importance as it avoids machine down time, catastrophic failure of the machinery, threat to human life, high maintenance costs, and so forth. The fault diagnostic techniques based on the vibration signal analysis have become popular in recent times [1,2]. The problem of the strong noise components masking the weak characteristic signals has always posed challenges to the condition monitoring expert. Several wavelet based signal processing techniques aiming at denoising the measured signal so as to increase the Signal-to-Noise Ratio (SNR) and reduce the Root-Mean-Square Error (RMSE) have been proposed and tried by several researchers [3][4][5][6][7]. The details of the techniques used by some of the researchers have been explained in Section 2.2. The wavelet based denoising technique has gained popularity due to its effectiveness and ease of application [8]. It overcomes the difficulty of determining the resonant frequency of the system. Therefore, the wavelet technique has been adopted in this work for denoising the bearing vibration signals. The detail coefficients, obtained from the Discrete Wavelet Transform (DWT), generally include a large proportion of the high-frequency noise components along with some of the characteristic information of the machine fault. Suitable compression or suppression of these components would remove the noise. The suppressed detail coefficients can then be used along with the original approximation coefficients in reconstructing the decomposed signal, by using the Inverse Wavelet Transform (IWT), which would now be fairly free of the noise [9,10].
The Artificial Neural Networks (ANNs) and the Support Vector Machines (SVMs) have been used to a large extent in the fault diagnosis problems with high success rates. The bearing vibration signals are nonstationary signals and 2 Computational Intelligence and Neuroscience hence a nonlinear mapping from the input space to the output space is required, which is successfully fulfilled by the classifiers like the ANN and the SVM. Several researchers have applied the ANN and the SVM to the bearing fault identification problem. Wang et al. [12] have used the ANN, with difference values of the autoregressive coefficients as inputs, in a rotating machinery fault identification problem. Zarei [13] has proposed to improve the diagnostic abilities of an ANN applied to a four-condition bearing classification problem by using the time domain features alone as the ANN inputs. Kankar et al. [14] have applied the ANN and the SVM to a five-condition ball bearing defect classification problem and obtained high classification accuracies. The SVM is a soft computing tool which performs the tasks executed by an ANN, but with a different approach. The SVM positions a hyper plane between the two classes of data, thus separating the data belonging to the two classes. The ANN's approach is to minimize the error on the training data set which is known as the empirical risk minimization, whereas the SVM's approach is based on the structural risk minimization, in which the upper bound of the generalization error is minimized [15]. Yang et al. [16] have used the energy features extracted from a number of Intrinsic Mode Functions as input vectors to the SVM classifier to diagnose the REB condition. Sugumaran et al. [17] have illustrated the use of a decision tree to identify the best features, extracted from bearing vibration signal, which were given as inputs to the Proximal Support Vector Machine (PSVM) and the SVM. They reported that the PSVM performed better than the SVM. 
The popularity of the ANN and the SVM classifiers in the REB diagnostics has motivated the authors of this paper to use them in this work.
The objective of any classifier like the ANN or the SVM is to attain a good generalization ability, that is, to exhibit high accuracies on the training and the test data. This calls for the optimal design of the ANN/SVM architecture. One of the requirements of designing an optimal ANN/SVM architecture is to reduce the input dimensionality, that is, to select a few predominantly sensitive features as inputs. This is known as the Dimensionality Reduction Technique (DRT). Researchers have proposed and tried several DRTs. Some of the popular DRTs are Principal Component Analysis (PCA), Fisher's Criterion (FC), Singular Value Decomposition (SVD), Genetic Algorithm (GA), and so forth. Yen and Lin [20] have investigated the effectiveness of the DRTs, namely, Linear Discriminant Analysis and FC for reducing the number of wavelet packet features extracted for analyzing a bearing classification problem. Fuente et al. [21] have used the Fisher's Discriminant Analysis (FDA) for identifying the faults in a real plant in terms of maximizing the scatter between the classes and minimizing the scatter within each class. Chiang et al. [22] and Tang and Li [23] explain the fault diagnosis based on the FDA. Jack and Nandi [24], Samanta et al. [25], and Saxena and Saad [26] have shown in their works that the GA can be effectively used as a DRT along with the optimization of the topology parameters of the ANN/SVM. From the preliminary work carried out by the authors of this paper, it was found that the GA effectively selected the sensitive features, but took a longer time as the GA depended on the performance of the ANN or the SVM to compute the fitness value, that is, for every computation of the fitness value, the ANN or SVM had to be run, making the process time consuming. 
However, the effectiveness of the FC in selecting the sensitive features was found to be comparable with that of the GA based feature selection, and, more importantly, unlike the GA, FC was independent of the performance of the ANN or SVM. Therefore, in this work, FC has been used as a DRT.
In this paper, the effectiveness of seven different wavelet based denoising schemes has been evaluated in terms of the classification accuracies of the ANN and the SVM on the denoised training and test data, extracted from the REB vibration signals. Firstly, a synthetic signal (representing the vibration signal of a defective bearing) has been corrupted by Gaussian white noise and subjected to the seven denoising schemes. Secondly, the real-time bearing vibration signals, measured from a customized bearing test rig under one load and two speed conditions, for four conditions of the bearings, have been subjected to the same denoising schemes. The denoising scheme which provided high SNR and low RMSE in the first part of the work also provided high classification accuracies (on the training and the test data) in the second part of the work. The focus of this work was to identify the best wavelet based denoising scheme based on the performance of the ANN and the SVM. Figure 1 shows the denoising schemes and the bearing diagnostic procedure employed in this study.

Wavelet Based Denoising
The characteristic vibration signals of defective bearings are generally not readily available when collected by means of a Data Acquisition (DAQ) system. This is mostly because the noise, influenced by the resonant frequency of the rotating system, masks the characteristic vibration signals. The noise is often a stochastic signal whose frequency band overlaps with that of the signals of interest. Therefore, it is difficult to eliminate the noise from the signals effectively by using the traditional filtering methods. Also, the traditional methods of denoising need the knowledge of parameters which are difficult to determine [8]. The wavelet based denoising has gained popularity due to its effectiveness and because it overcomes the difficulties of the traditional denoising methods. The SNR must appreciably increase and the RMSE must become small on a successful application of a denoising method. Suppose that a signal of interest f(n) has been corrupted by the noise z(n), so that we get a signal g(n) as in (1), which resembles the raw signal collected by means of a DAQ system:

$$g(n) = f(n) + \sigma z(n), \tag{1}$$

where z(n) is a unit-variance, zero-mean Gaussian white noise and σ² is the variance of the noise. The denoising is a way to recover f(n) from the samples of g(n) as properly as possible. The three-step procedure adopted in the wavelet based denoising is (i) decomposition of the raw signal using the wavelet transform to get the approximation and the detail coefficients, (ii) suppressing the detail coefficients by selecting a suitable threshold value and by applying a suitable thresholding rule, and (iii) reconstructing the signal by applying the IWT to the original approximation coefficients and the suppressed detail coefficients to get the denoised signal [9,10]. Several denoising schemes (step ii) have been proposed by researchers [3][4][5][6][7]. In this work, the denoising effectiveness of seven different denoising schemes has been compared.
Table 1 gives the list of seven denoising schemes and the researchers who have proposed them.
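The three-step procedure (decompose, suppress the detail coefficients, reconstruct) can be illustrated with a minimal sketch. It is not the customized MATLAB program used in this work: for the sake of a self-contained example it assumes the Haar wavelet instead of Daubechies 8, a single threshold for all levels, and hypothetical function names (`haar_dwt`, `haar_idwt`, `wavelet_denoise`, `hard_rule`).

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT (signal length must be even)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out

def hard_rule(x, lam):
    """Example thresholding rule: keep coefficients larger than lam in magnitude."""
    return x if abs(x) > lam else 0.0

def wavelet_denoise(signal, levels, threshold, rule):
    """Assumes len(signal) is divisible by 2**levels."""
    # Step (i): decompose into approximation and detail coefficients.
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Step (ii): suppress the detail coefficients; the approximation is untouched.
    details = [[rule(c, threshold) for c in d] for d in details]
    # Step (iii): reconstruct from the coarsest level back to the finest.
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

With a threshold of zero the signal is reconstructed unchanged, and with a very large threshold only the mean level carried by the approximation coefficient survives, which is a quick sanity check on any thresholding rule passed in as `rule`.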

Conventional Denoising Schemes.
The wavelet denoising method focuses on the selection of the thresholding rule and the determination of the threshold value. Donoho [11] gave two thresholding rules, namely, hard-thresholding (s1) and soft-thresholding (s2), which are considered to be the conventional wavelet based denoising schemes and are readily available as functions in the Wavelet Toolbox of MATLAB. The hard-thresholding scheme is expressed as

$$y_1(x) = \begin{cases} x, & |x| > \lambda, \\ 0, & |x| \le \lambda, \end{cases} \tag{2}$$

where x is the wavelet coefficient, $y_1(x)$ is the corresponding suppressed wavelet coefficient by hard-thresholding, and λ is the threshold value. The soft-thresholding scheme is expressed as

$$y_2(x) = \begin{cases} \operatorname{sgn}(x)\,(|x| - \lambda), & |x| > \lambda, \\ 0, & |x| \le \lambda, \end{cases} \tag{3}$$

where $y_2(x)$ is the suppressed wavelet coefficient obtained by soft-thresholding and the other terms have the same meaning as in (2).
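The two conventional rules can be sketched in a few lines. The function names are illustrative and are not those of the MATLAB toolbox; the handling of the boundary case |x| = λ (set to zero here) is an assumption.

```python
import math

def hard_threshold(x, lam):
    """Scheme s1: keep the coefficient unchanged if it exceeds the threshold."""
    return x if abs(x) > lam else 0.0

def soft_threshold(x, lam):
    """Scheme s2: zero small coefficients and shrink the rest towards zero by lam."""
    if abs(x) <= lam:
        return 0.0
    return math.copysign(abs(x) - lam, x)
```

For a coefficient x = 1.5 and λ = 1.0, hard-thresholding returns 1.5 while soft-thresholding returns 0.5. The soft rule is continuous at |x| = λ but biases every retained coefficient by λ, which is the drawback the modified schemes try to reduce.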

Modified Soft-Thresholding Schemes Proposed by Different Researchers.
A list of the denoising schemes s3 to s7 proposed by various researchers [3][4][5][6][7] is provided in Table 1. Huaigang et al. [3] have proposed an improved soft-thresholding function as given in (4). According to them, the conventional thresholding functions set the coefficients below the threshold value to zero, but, in their proposed method, these coefficients were tuned by a polynomial function. The coefficients that were below the threshold value and close to it were attenuated to a value less than the far coefficients. For important coefficients, the function was garrote-like, resulting in a more powerful function, where m and k are tuning parameters and the other terms in (4) have the same meaning as in (2). By tuning the parameter k, the thresholding function can be placed between the hard- and the soft-thresholding functions. By tuning the parameter m, the near-optimum thresholding function is adjusted to the optimum one by applying small changes. As per [3], optimization of the parameter k works similarly to a global search and optimization of the parameter m works like a local search in finding the best thresholding function. The authors in [3] have selected m = 2 and 4 and k ∈ [0, 1]. In this work, m = 4 and k = 0.8 have been selected. Fang and Huang [4] have proposed a wavelet trimmed thresholding scheme as given in (5), which was an improved version of the hard- and the soft-thresholding schemes, where α is a parameter and the other terms in (5) have the same meaning as in (2). They suggested that, with careful tuning of the parameter α for a particular signal, the best denoising effect could be achieved. When α = 1, it was equivalent to the soft-thresholding and when α → ∞, it was equivalent to the hard-thresholding. Accordingly, in this work, α = 5 has been chosen.
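The limiting behaviour described for the trimmed thresholding scheme of [4] can be checked numerically. The functional form used below, y = x(1 − (λ/|x|)^α) for |x| > λ and zero otherwise, is an assumption chosen only because it reproduces both stated limits; the exact expression in (5) may differ, and the name `trimmed_threshold` is hypothetical.

```python
def trimmed_threshold(x, lam, alpha):
    """Assumed trimmed rule: soft-thresholding at alpha = 1, hard-like as alpha grows."""
    if abs(x) <= lam:
        return 0.0
    return x * (1.0 - (lam / abs(x)) ** alpha)

# Behaviour for one coefficient x = 2 with threshold lam = 1:
soft_like = trimmed_threshold(2.0, 1.0, 1)    # equals the soft-thresholded value
chosen = trimmed_threshold(2.0, 1.0, 5)       # alpha = 5, the value used in this work
hard_like = trimmed_threshold(2.0, 1.0, 200)  # approaches the hard-thresholded value
```

Under this assumed form, α = 5 leaves a coefficient of twice the threshold almost untouched (1.9375 instead of 2), so the scheme behaves much closer to hard-thresholding than to soft-thresholding for large coefficients while remaining continuous at the threshold.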
Lin and Cai [5] proposed a new threshold function given in (6), which had the advantage of the nonnegative dead zone thresholding, where k is a positive number and the other terms in (6) have the same meaning as in (2). When x → λ, β → 1 and $y_6(x) = x$, which overcomes the disadvantage of the soft-thresholding, and when x → ∞, β → 0 and $y_6(x)$ → 0, which makes the signal smoother than the hard-thresholding function. A value of k = 0.5 has been chosen in this work for using this scheme. Zhang et al. [6] have proposed an improved thresholding function given in (7), where u and α are parameters whose proper tuning can provide an effective denoising, while the other terms in (7) have the same meaning as in (2). The power u is used in order to enlarge the difference between the signal and the noise, u > 0 (u = 2, 3, 4, ...). It can be observed that when α = 0, (7) becomes the hard-thresholding function. When the value of α is appropriately chosen between 0 and 1, the effectiveness of the denoising could be optimized. For using this scheme in this work, u = 10 and α = 0.6 have been chosen. In a new thresholding function proposed by Cai-lian et al. [7], suppression of the detail coefficients was done according to (8). This thresholding function depends on the proper selection of the constant λ. It is continuous, unlike the conventional hard-thresholding function, and is easily differentiable, statistically very reliable, and robust, making it suitable for discrete signal denoising. The optimum value of λ can be determined as proposed in [7], but, in the current work, λ = 0.8 has been chosen by trial and error so that the ANN's and the SVM's training and test accuracies were maximum.

Wavelet Based Denoising of a Synthetic Signal
The focus of the first part of this work was to apply the seven wavelet based denoising schemes listed in Table 1 to a synthetic signal that represented a defective bearing vibration signal. In order to simulate the vibration signal of a defective bearing, a weak synthetic signal $0.5e^{-500t}\sin(10000t)$ with a defect frequency of 50 Hz was considered. A sampling frequency of 48 kHz was used as the real-time bearing vibration signals were acquired at the same rate. A plot of the synthetic signal is shown in Figure 2(a). It was corrupted with a strong zero-mean Gaussian white noise. A plot of the corrupted signal is shown in Figure 2(b). The energy of the synthetic signal was 41.89 and that of the corrupted signal was 510.77. The expression for computing the signal energy is given in (9), whereas the expressions for computing the SNR and the RMSE for a denoised signal are given in (10) and (11), respectively. In (9), E is the energy of the signal x and N is the length of the signal; in (10) and (11), x is the corrupted signal, d is the denoised signal, and N is the length of the signal. The corrupted signal was subjected to the seven wavelet based denoising schemes listed in Table 1. For each scheme, the corrupted signal was first decomposed into four levels by the DWT, using the Daubechies 8 mother wavelet, through a customized MATLAB program. According to Nyquist's rule, the maximum frequency of the vibration signal was taken as 24 kHz because the sampling frequency was 48 kHz. The frequency bandwidths of the approximation and the detail coefficients of the wavelet decompositions are shown in Figure 3. For each level of the wavelet decomposition, the threshold value λ was determined as per Stein's Unbiased Risk Estimate (SURE), because the SURE threshold selection rule is more conservative, as noted in [9]. This threshold value was used for all denoising schemes, except for s7, where it was selected by trial and error.
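The frequency bands of a four-level decomposition at a 48 kHz sampling rate follow from halving the band at each level. The sketch below computes the nominal bands under the assumption of ideal half-band filters (real Daubechies filters overlap somewhat at the band edges); `dwt_bands` is a hypothetical helper name.

```python
def dwt_bands(fs_hz, levels):
    """Nominal frequency band (low, high) in Hz of each DWT coefficient set."""
    nyquist = fs_hz / 2.0
    bands = {}
    upper = nyquist
    for j in range(1, levels + 1):
        lower = nyquist / 2 ** j
        bands[f"D{j}"] = (lower, upper)   # detail coefficients at level j
        upper = lower
    bands[f"A{levels}"] = (0.0, upper)    # final approximation coefficients
    return bands

bands = dwt_bands(48000, 4)
# D1: 12-24 kHz, D2: 6-12 kHz, D3: 3-6 kHz, D4: 1.5-3 kHz, A4: 0-1.5 kHz
```

The carrier of the synthetic signal, sin(10000t), has an angular frequency of 10000 rad/s, that is, about 1592 Hz, which falls in the nominal D4 band under this idealization.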
Figure 4 shows the plots of the noise free synthetic signal and the denoised signals by different schemes. The values of E, SNR, and RMSE for the denoised signals are given in Table 2. The objective of signal processing was to increase the SNR and lower the RMSE of a corrupted signal. From Table 2, it can be observed that s7 gave high SNR and low RMSE. The peaks in the original synthetic signal (representing the bearing fault impulses) were identifiable more clearly in s7 denoising scheme when compared to the other schemes (refer to Figure 4).
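The three figures of merit reported in Table 2 can be computed as sketched below. The definitions are assumptions based on the commonly used forms (energy as the sum of squared samples, SNR in decibels as the ratio of signal energy to residual energy, RMSE as the root of the mean squared residual); the expressions in (9) to (11) may differ in detail.

```python
import math

def energy(x):
    """Assumed energy: sum of squared samples."""
    return sum(v * v for v in x)

def snr_db(x, d):
    """Assumed SNR in dB between a reference signal x and a denoised signal d.

    Undefined (division by zero) if d reproduces x exactly.
    """
    residual = sum((a - b) ** 2 for a, b in zip(x, d))
    return 10.0 * math.log10(energy(x) / residual)

def rmse(x, d):
    """Assumed RMSE between a reference signal x and a denoised signal d."""
    residual = sum((a - b) ** 2 for a, b in zip(x, d))
    return math.sqrt(residual / len(x))

x = [1.0, 2.0, 3.0, 4.0]
d = [1.0, 2.0, 3.0, 2.0]
e, s, r = energy(x), snr_db(x, d), rmse(x, d)
```

Both SNR and RMSE are driven by the same residual energy, so a scheme that raises one necessarily lowers the other when the reference signal is fixed; they differ only in normalization.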

Wavelet Based Denoising of Real-Time Vibration Signal
In the second part of the work in this paper, the objective was to apply the seven schemes of denoising discussed in the previous section to the vibration signals collected from a customized bearing test rig for four bearing conditions (N: normal bearing, IR: bearing with a defect on the inner race, B: bearing with a defect on the ball, and OR: bearing with a defect on the outer race). A schematic diagram of the customized test rig used for extracting the bearing vibration signals is shown in Figure 5. The acceleration signals were collected for 5.08 seconds from a 6205 deep groove ball bearing under a radial load of 1.7 kN and shaft speeds of 356 and 622 rpm. The signals collected from accelerometer-X were considered for analysis, as the signals acquired in the Y-direction were not very sensitive to the bearing condition. Each trial of the experiment resulted in a data vector of size 250000 × 1. Figure 6(a) shows the raw vibration signal collected from a bearing with the OR defect for a load of 1.7 kN and a speed of 622 rpm and Figure 6(b) shows the plot of the denoised signal using scheme s7. It is clear from the figures that the selected denoising scheme has been effective in representing the original signal with reduced noise (SNR-1.9279, RMSE 0.0261).

Feature Extraction
Each denoised vibration signal (250000 × 1) was divided into 50 nonoverlapping bins, each with 5000 data points. From each bin, 30 features were extracted, out of which features 1 to 17 ($T_1$ to $T_{17}$) were the statistical time domain features and features 18 to 30 ($F_1$ to $F_{13}$) were the statistical frequency domain features. The 30 features of one bin formed a single pattern. Hence, for four conditions of the bearing, two speed conditions, and one load condition, a total of 400 patterns (50 × 8) were extracted. The feature set matrix consisted of 30 features × 400 patterns. Each feature was normalized by dividing each of its elements by the maximum value of that feature, so as to attain values between 0 and 1. The patterns of the matrix were thoroughly mixed, out of which 300 patterns (75%) were used in the training data set and the remaining 100 patterns (25%) in the test data set. A list of the features extracted from the denoised vibration signal is given in Table 3.
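The binning and normalization steps can be sketched as follows. Only two illustrative time domain features (RMS and kurtosis) are computed, since the full list belongs to Table 3; the helper names are hypothetical, and the normalization assumes non-negative feature values so that the result lies between 0 and 1.

```python
import math

def split_into_bins(signal, bin_size):
    """Split a signal into consecutive nonoverlapping bins of bin_size samples."""
    n_bins = len(signal) // bin_size
    return [signal[i * bin_size:(i + 1) * bin_size] for i in range(n_bins)]

def rms(x):
    """Root-mean-square value of one bin."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def kurtosis(x):
    """Fourth central moment over squared variance (undefined for a constant bin)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2)

def normalize_by_max(column):
    """Divide every value of one feature by that feature's maximum."""
    peak = max(column)
    return [v / peak for v in column]

# Pattern count for 4 bearing conditions x 2 speeds x 1 load, 50 bins per record.
patterns = 50 * 4 * 2 * 1
train, test = int(0.75 * patterns), int(0.25 * patterns)
```

Normalizing each feature separately keeps features with large numerical ranges from dominating the classifier inputs, which matters because time and frequency domain features can differ by orders of magnitude.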

Feature Dimensionality Reduction
The FC has been used as a DRT in this work. The criterion for the FC is based on computing the "separation distance" between the two classes of interest and it depends upon the mean and the standard deviation of the two classes. The separation distance between the two classes as per the FC is given in (12), as suggested by Yen and Lin [20], where $J_k^{P,Q}$ is a measure of the Fisher's Separation Distance between the two classes of the bearing P and Q for the kth feature (P and Q may each be N, IR, B, or OR), and Mean() and Std() are the mean and the standard deviation. Following [20], the FDPs of the features are collected in a vector F. The features with higher values of the FDPs form sensitive inputs to the ANN/SVM classifiers. The FDPs computed for all the 30 features have been arranged in descending order, resulting in a vector $F^*$. In this paper, a new method of selecting the number of sensitive features based on a threshold value θ has been proposed. The expression to compute θ is given in (14):

$$\theta = \frac{\sum_{i=1}^{s} F^*_i}{\sum_{i=1}^{f} F^*_i}, \tag{14}$$

where f is the total number of features extracted (30 in this work) and s is the number of selected features such that the sum of the s largest FDPs divided by the sum of all the FDPs is approximately equal to θ. In this work, a threshold value of θ = 0.85 was chosen. Table 4 shows the FDPs for signals denoised by the seven schemes. It can be seen that different numbers of features (s) were selected for the different denoising schemes based on the threshold value of θ = 0.85.
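The feature selection step can be sketched as below. Several details are assumptions rather than the procedure of this work: the separation distance is taken in the common Fisher form (squared difference of the class means over the sum of the class variances), the discriminant power of a feature is taken as the sum of the distances over all class pairs, and s is taken as the smallest number of top-ranked features whose cumulative share reaches θ. All function names are hypothetical.

```python
from itertools import combinations

def mean(x):
    return sum(x) / len(x)

def variance(x):
    """Population variance (square of the standard deviation)."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def fisher_distance(p, q):
    """Assumed Fisher separation distance between two classes for one feature.

    Undefined (division by zero) if the feature is constant within both classes.
    """
    return (mean(p) - mean(q)) ** 2 / (variance(p) + variance(q))

def discriminant_power(values_by_class):
    """Assumed aggregation: sum of pairwise distances over all class pairs."""
    names = sorted(values_by_class)
    return sum(fisher_distance(values_by_class[a], values_by_class[b])
               for a, b in combinations(names, 2))

def select_features(fdps, theta):
    """Indices of the top-ranked features whose cumulative FDP share reaches theta."""
    order = sorted(range(len(fdps)), key=lambda i: fdps[i], reverse=True)
    total = sum(fdps)
    running, selected = 0.0, []
    for i in order:
        selected.append(i)
        running += fdps[i]
        if running / total >= theta:
            break
    return selected
```

For example, with FDPs of 1, 5, 3, and 1 and θ = 0.85, the cumulative shares of the ranked features are 0.5, 0.8, and 0.9, so three features are retained. Because the ranking never calls the classifier, the selection costs a single pass over the feature matrix.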

Performance of the Denoising Schemes Based on the ANN/SVM
In order to evaluate the performance of the different denoising schemes, the ANN/SVM classifiers were trained and tested using two types of inputs based on the features extracted from the denoised signals, namely, (i) all the 30 features and (ii) the features selected by the FC. A binary scheme of classification was used to define the bearing condition at the output of the classifiers. Figure 7 shows the structure of the multilayer perceptron neural network (MLPNN) used, where $x_1, x_2, \ldots, x_n$ are the inputs (features), $n_h$ is the number of nodes in the hidden layer, and $w_{ji}$ and $w^{o}_{j}$ are the connection weights between the input-hidden layers and the hidden-output layers, respectively. The performance of the MLPNN classifier for the two types of inputs (all the features and the features selected by the FC) extracted from the denoised signals is shown in Table 5. It is clear from the table that, for signals denoised using scheme s7, the accuracies on the training and the test data were higher compared to the other schemes. The accuracies for the other numbers of neurons tried in the hidden layer, $n_h$ = 15, 20, 25, and 30, were comparatively lower and therefore have not been reported in the table.

SVM Classifier.
The SVM classifier used in this work was based on the customized MATLAB toolbox provided in [27]. In this work, the SVM was trained and tested for different values of the regularization parameter γ and the kernel width σ². Parameter γ was varied from 6 to 10 in steps of 1 and σ² was varied from 3 to 4 in steps of 0.25. The performance of the SVM classifier for the two types of inputs (all the features and the features selected by the FC) extracted from the denoised signals is shown in Table 6. It is clear from the table that, for signals denoised using scheme s7, the accuracies on the training and the test data were higher compared to the other schemes. The accuracies for the other values of γ and σ² were comparatively lower and are therefore not reported in the table.
Table note: β is the shape factor and η is the scale factor [18,19].
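The parameter sweep described for the SVM amounts to a 5 × 5 grid. The sketch below only enumerates the grid and evaluates a Gaussian (RBF) kernel; the kernel form exp(−‖x − y‖²/σ²) is an assumption about the toolbox in [27], and `frange` and `rbf_kernel` are hypothetical helper names. Training the classifier itself is outside the scope of the sketch.

```python
import math

def frange(start, stop, step):
    """Inclusive range of evenly spaced floats from start to stop."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]

def rbf_kernel(x, y, sigma2):
    """Assumed Gaussian kernel with width parameter sigma2."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma2)

gammas = frange(6.0, 10.0, 1.0)    # regularization parameter values
sigma2s = frange(3.0, 4.0, 0.25)   # kernel width values
grid = [(g, s2) for g in gammas for s2 in sigma2s]
```

Each of the 25 pairs would be used to train and test one classifier, with the pair giving the highest accuracies being the one reported. A larger σ² makes the kernel decay more slowly with distance, giving a smoother decision boundary.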

Discussion
The focus of this work is to evaluate seven different wavelet based denoising schemes. In order to ascertain the effectiveness of the schemes, a corrupted synthetic signal simulating the real-time bearing vibration signal was used. By observation and also based on the SNR and the RMSE, scheme s7 was found to be the most effective in denoising the corrupted signal. In order to evaluate its performance on a real-time bearing vibration signal, signals collected from bearings under four conditions, a single load, and two speeds were subjected to the same denoising schemes. The denoised signals were used for extracting the features in the time domain and the frequency domain. The features extracted were subjected to dimensionality reduction using the FC. Both the reduced feature set selected by the FC and the complete feature set were used as inputs to the ANN and the SVM classifiers for comparing the performance of the different denoising schemes. Table 7 shows the performance of the ANN and the SVM classifiers for the vibration signal denoised using the s7 scheme. It is clear from the table that the denoising scheme s7 resulted in more than 95% accuracy on the test data using the ANN and more than 80% accuracy on the test data using the SVM. Also, for the s7 scheme, the reduced feature set obtained using the FC resulted in a higher performance in terms of the test data accuracy and the number of epochs when compared to the use of all the features. The proposed method of selecting the number of sensitive features, based on a threshold θ, from the vibration data obtained using the different denoising schemes has been successful. Therefore, a DRT like the FC can be effectively used in improving the performance of the ANN and the SVM classifiers. Hence, the s7 scheme is found to be effective in denoising the real-time vibration signals when compared to the other denoising schemes.

Conclusions
This paper presents the evaluation of the effectiveness of the wavelet based denoising schemes using the ANN and the SVM classifiers applied to the bearing condition classification problem. Seven different denoising schemes selected from an extensive literature survey were used for denoising a synthetic corrupted signal resembling a bearing vibration signal. Based on the SNR and the RMSE, the best denoising scheme was selected. This scheme has been applied, along with the other schemes for denoising the real-time vibration signals collected from an REB test rig for four different bearing conditions, one load and two speeds. The features extracted from the denoised signal in the time domain and the frequency domain have been used as inputs to the ANN and the SVM classifiers. In order to reduce the dimension of the feature set, FC is used and the reduced feature set is also used as inputs to the ANN and the SVM. The proposed method of selecting the reduced number of features based on the threshold θ is found to be effective. It is found that the best denoising scheme selected based on the synthetic signal performed better (in terms of the classification accuracies and the number of epochs) with the feature set extracted from bearing vibration signals, when compared to the other denoising schemes. Hence, it can be concluded that s7 scheme can be effectively used to denoise the bearing vibration signals for an efficient classification of its condition.