Bearing Intelligent Fault Diagnosis Based on Wavelet Transform and Convolutional Neural Network



Introduction
With the rapid development of science and the manufacturing industry, rotating machinery is continually evolving toward larger scale, continuous operation, higher precision, and greater intelligence. The rolling bearing is a key and widely used component in rotating machinery. During operation, improper assembly, intrusion of foreign matter, insufficient lubrication, pitting, and overload may lead to premature failure of the bearing, which can seriously affect the mechanical equipment. Therefore, accurate real-time fault diagnosis of rolling bearings is very important. In recent years, with the continuous improvement of monitoring methods, more vibration signal data can be obtained, and fault diagnosis of rolling bearings has entered the big data era [1].
Bearing fault diagnosis technology is mainly divided into two categories: methods based on signal analysis and methods based on intelligent algorithms. The former rely on manual analysis of the vibration signal to realize fault diagnosis. Reza et al. [2] proposed removing background noise using recursive autocorrelation and autoregressive analyses, which makes the fault signal more clearly expressed and thus enables fault diagnosis. Zhang et al. [3] combined Ensemble Empirical Mode Decomposition (EEMD) with spectral kurtosis analysis and successfully applied it to the fault diagnosis of rolling bearings. Georgoulas et al. [4] used Empirical Mode Decomposition (EMD) and the Hilbert-Huang transform (HHT) to extract a feature set from the bearing fault signal, which makes it possible to monitor the fault signal. Li et al. [5] extracted 12 sensitive features of rolling bearing fault signals for fault detection. Lei et al. [6] combined EMD and wavelet packet analysis to extract features and then used them as the input of Radial Basis Function Networks (RBFNs) to classify faults. Batista et al. [7] extracted 10 features that reflect the fault state of the bearing and classified them with a Support Vector Machine (SVM). Saidi et al. [8] calculated 8 higher-order spectral features to represent the bearing states and fed the dimension-reduced features into an SVM for fault classification. Chen et al. [9] proposed a dependent feature vector to represent the rolling bearing characteristics of six failure modes and used a Probabilistic Neural Network (PNN) to identify them. Vakharia et al. [10] calculated bearing signal features through the weighted gain method and used a Random Forest (RF) to classify the faults. Ferrenc and Lutovac [11] used wavelet decomposition to obtain 18-dimensional features of fault signals and used RF to classify them.
Though these methods work for fault diagnosis of rolling bearings, they still have the following deficiencies. (1) Constructing the feature extraction method is difficult, and the manual features depend heavily on prior knowledge of signal processing techniques and on diagnostic expertise. (2) These manual features are extracted for a specific diagnosis problem and are probably unsuitable for other problems. (3) The fault diagnosis process is divided into several steps, which causes information loss. In view of these problems, scholars have put forward fault diagnosis methods based on intelligent algorithms, which mainly consist of Deep Belief Networks (DBNs), Convolutional Neural Networks (CNNs), and Stacked Autoencoders. CNN is a supervised deep learning method. Compared with the traditional fault diagnosis methods based on signal analysis, it does not depend on manual feature extraction and can automatically extract the deep features of signals. LeCun et al. [12] constructed the early CNN structure by alternately combining convolution layers and pooling layers. In recent years, CNN has achieved great success in image recognition, semantic segmentation, target positioning, and other fields.
Zhu et al. [13] used the short-time Fourier transform (STFT) to convert a one-dimensional signal into a two-dimensional image and then used a capsule CNN for fault diagnosis. Janssens et al. [14] used the discrete Fourier transform to transform time-domain information to the frequency domain and used a CNN for fault diagnosis. Huang et al. [15] established a multiscale cascade convolutional neural network structure, which can adaptively extract the fault characteristics from the raw signal of a rolling bearing and automatically classify the bearing health conditions into different groups. Lu et al. [16] directly input the original signal into an improved hierarchical CNN for bearing fault diagnosis and achieved good results. Zhang et al. [17] also took the original time-domain signal directly as the input of a CNN and analysed in depth the reasons for the high performance of the model. Abdeljaber et al. [18] input the original vibration signal into a one-dimensional CNN for fault diagnosis. Eren et al. [19] used the same CWRU rolling bearing data as this paper and input the original signal directly into a compact adaptive 1D-CNN for fault diagnosis. According to their experimental results, the fault recognition rate of their method is 92.33%, which is lower than the recognition rate of the algorithm in this paper. Practice shows that fault diagnosis methods based on intelligent algorithms can effectively overcome the shortcomings of the signal analysis methods and achieve a better recognition rate than the traditional methods. However, most intelligent-algorithm methods for rolling bearing fault diagnosis still have a shortcoming: the network model is not deep enough to extract features comprehensively, so the diagnosis accuracy falls short of the ideal rate and some errors remain.
This paper presents an intelligent fault diagnosis method based on the wavelet transform (WT) and a deformable convolutional neural network (D-CNN). Firstly, the harmonic components, which affect the identification of the impact components, are removed. Then, a time-frequency map containing both time-domain and frequency-domain information is obtained by WT to enrich the fault feature information carried by the signal. Finally, the time-frequency map is used directly as the input of the CNN. In the CNN, a deformable convolution kernel, which can adapt to complex images, is used to enhance the deep feature extraction ability and complete the fault diagnosis of the rolling bearing. The experimental results show that the recognition rate can reach 99.9% under various fault modes and that the proposed method is feasible.

Signal Composition.
In the manufacturing and installation process, most bearings have some defects, such as low manufacturing accuracy, mass eccentricity, rotor unbalance, and improper bearing installation, which lead to periodic harmonic vibration during operation. These vibrations are composed of the vibration at the fundamental rotating frequency and the vibrations at its integer-multiple harmonic frequencies, which can be expressed as follows:

$$S_h(t) = \sum_{n} A_n \cos(n\omega t + \varphi_n), \tag{1}$$

where $\omega = 2\pi f$, $A_n$ and $\varphi_n$ represent the amplitude and phase of the $n$th harmonic vibration, respectively, and $f$ represents the fundamental frequency of the bearing. When a fault occurs, it generates an impact signal, which is recorded as $S_i(t)$. In actual working conditions, noise is often present, which is recorded as $S_n(t)$. Therefore, the vibration signal $S$ of a faulty rolling bearing can be described as three parts, namely, harmonic vibration, impact vibration, and noise:

$$S(t) = S_h(t) + S_i(t) + S_n(t). \tag{2}$$

The harmonic signal that accompanies the vibration signal submerges the impact components, produces a modulation effect, and hinders the extraction of fault feature information [20]. The presence of noise, in contrast, can enhance the generalization ability and stability of the model. Therefore, the harmonic components are first removed from the vibration signal, and only the impact components and noise are retained.
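The three-part signal model can be illustrated with a short synthetic example. This is only a sketch: the function name `synth_bearing_signal`, the decaying-oscillation form of the impact term, and every numeric default (fundamental frequency, fault repetition rate, resonance frequency, decay rate, noise level) are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def synth_bearing_signal(fs=12000, duration=1.0, f=30.0,
                         amps=(2.0, 1.0), phases=(np.pi / 4, 0.0),
                         fault_rate=100.0, f_res=3000.0, decay=800.0,
                         noise_std=0.2, seed=0):
    """Return t, S_h, S_i, S_n and their sum S (all parameters are hypothetical)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    w = 2 * np.pi * f
    # Harmonic part: sum over n of A_n * cos(n * w * t + phi_n), n = 1, 2, ...
    s_h = sum(a * np.cos((n + 1) * w * t + p)
              for n, (a, p) in enumerate(zip(amps, phases)))
    # Impact part (assumed form): a decaying oscillation restarted every fault period.
    tau = np.mod(t, 1.0 / fault_rate)
    s_i = np.exp(-decay * tau) * np.sin(2 * np.pi * f_res * tau)
    # Noise part: white Gaussian noise.
    s_n = noise_std * rng.standard_normal(t.size)
    return t, s_h, s_i, s_n, s_h + s_i + s_n

t, s_h, s_i, s_n, s = synth_bearing_signal()
```

Keeping the three components separate makes it easy to check later whether a harmonic removal step leaves the impact and noise parts intact.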

Harmonic Removal. Because the harmonic signal and the impact signal differ in frequency, a dictionary with frequency as its variable is needed to distinguish the harmonic components from the impact components. The waveforms of harmonic signals are mostly sine or cosine waves. The Fourier dictionary is a frequency dictionary composed of sine and cosine functions, so it can effectively match the harmonic components in frequency without affecting the impact components. The orthogonal matching pursuit (OMP) algorithm can match the complete harmonic components with the Fourier dictionary. Through the OMP algorithm, the sparse representation of the signal can be obtained as follows:

$$S = \sum_{n=1}^{k} a_n^{k} g_n + R^{k}s, \tag{3}$$

where $g_n$ are the selected dictionary atoms and $a_n^{k}$ their coefficients after $k$ iterations. Combining equations (2) and (3),

$$S_h(t) + S_i(t) + S_n(t) = \sum_{n=1}^{k} a_n^{k} g_n + R^{k}s, \tag{4}$$

where $R^{k}s = S_i(t) + S_n(t)$. It represents the residual after matching the harmonic components, which contains only the impact signal and noise carrying the fault information. Record the Fourier dictionary as dictionary $D$; the flowchart of the algorithm, whose inputs are the signal $S$ and a threshold value $\varepsilon$, is shown in Figure 1. Thus, the residual $R^{k}s$ can be obtained. In order to show the effect of harmonic removal, a simulation experiment is carried out with the simulated harmonic signal $S_h(t) = 2\cos(2\pi f_h t + \pi/4)$ and a simulated impact signal, with frequencies specified in Hz, $\xi = 0.005$, and $t \in (0, 1)$ s. Noise is set to 0.2 dB. Based on the Fourier dictionary, the effect of harmonic component removal using the OMP algorithm is shown in Figure 2.
It can be seen clearly from Figure 2 that the OMP algorithm with the Fourier function as the dictionary can effectively eliminate the harmonic signal components and only retain the impact signal components and noise.
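The harmonic removal step can be sketched in a few lines of NumPy. The sketch below uses the common least-squares form of OMP rather than the recursive coefficient update of the flowchart, and it is not the authors' implementation. The names `fourier_dictionary` and `omp_remove_harmonics`, the stopping threshold `eps`, and the test signal (a 50 Hz harmonic, a 200 Hz decaying impact repeated every 0.1 s, Gaussian noise) are all illustrative assumptions.

```python
import numpy as np

def fourier_dictionary(t, freqs):
    """Unit-norm cosine and sine atoms, two columns per frequency in `freqs`."""
    atoms = []
    for f in freqs:
        atoms.append(np.cos(2 * np.pi * f * t))
        atoms.append(np.sin(2 * np.pi * f * t))
    D = np.stack(atoms, axis=1)
    return D / np.linalg.norm(D, axis=0)

def omp_remove_harmonics(s, D, eps, max_atoms=20):
    """Greedy matching; stop when the best correlation drops below eps."""
    residual = s.copy()
    selected = []
    for _ in range(max_atoms):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))
        if abs(corr[k]) < eps:
            break
        selected.append(k)
        # Re-fit all selected atoms so the residual stays orthogonal to them.
        coeffs, *_ = np.linalg.lstsq(D[:, selected], s, rcond=None)
        residual = s - D[:, selected] @ coeffs
    return s - residual, residual, selected

fs = 1000
t = np.arange(fs) / fs
s_h = 2 * np.cos(2 * np.pi * 50 * t + np.pi / 4)            # harmonic component
tau = (np.arange(fs) % 100) / fs                            # time since last impact
s_i = np.exp(-200 * tau) * np.sin(2 * np.pi * 200 * tau)    # assumed impact shape
s_n = 0.2 * np.random.default_rng(0).standard_normal(fs)    # noise
s = s_h + s_i + s_n

D = fourier_dictionary(t, np.arange(1, 400))
harmonic, residual, selected = omp_remove_harmonics(s, D, eps=5.0)
```

In this toy case the threshold separates the strong harmonic atoms from the much weaker correlations of the impact and noise, so `residual` approximates $S_i(t) + S_n(t)$. On real data the threshold would have to be tuned.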

Time-Frequency Analysis
Time-frequency analysis is an important technique in modern signal processing. In the early days, the simple and intuitive STFT was widely used. According to the Heisenberg uncertainty principle, the time width and the frequency width of the window function cannot be minimized at the same time, so the time resolution and the frequency resolution cannot both be optimal. The window function of the STFT is fixed, and in practice it can only be chosen according to experience. Therefore, it is difficult to obtain satisfactory results when processing nonstationary signals. WT solves this problem with its multiresolution property, which makes it well suited to nonstationary signal analysis. A function $\varphi(t)$ is called a mother wavelet if it satisfies the admissibility condition

$$\int_{-\infty}^{+\infty} \frac{|\hat{\varphi}(\omega)|^{2}}{|\omega|}\, d\omega < \infty,$$

where $\hat{\varphi}(\omega)$ is the Fourier transform of $\varphi(t)$. The analysing wavelet is obtained from $\varphi(t)$ through a series of scaling and shifting transforms:

$$\varphi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \varphi\!\left(\frac{t - b}{a}\right), \quad a > 0,$$

where $a$ is the scale factor, which controls the width of the function, and $b$ is the translation factor, which controls the position of the function on the time axis. The scale factor $a$ corresponds to the frequency $f$ of the signal, and the translation factor $b$ corresponds to the time $t$, so the WT represents a signal as a joint function of time and scale. When $a$ increases, the frequency resolution increases and the time resolution decreases. When $a$ decreases, the frequency resolution decreases and the time resolution increases. Therefore, the WT, with its "zoom" ability, is very suitable for processing abrupt signals.
In WT, the choice of wavelet basis is very important, since different wavelet bases strongly influence the time-frequency map. Among the commonly used wavelets, the Meyer wavelet and the Morlet wavelet [21] are more suitable for engineering signals. Of the two, the waveform of the Morlet wavelet is more consistent with the characteristics of the impact signal produced when a bearing fails. In [22], a time-frequency analysis simulation of a linear frequency modulation (LFM) signal was carried out, and the result showed that the signal energy is more concentrated with the Morlet wavelet than with the Meyer wavelet. The time-frequency image is a two-dimensional image that reflects energy intensity, and the impact signal has high instantaneous energy. Therefore, the Morlet wavelet is chosen as the wavelet basis in this paper.
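A Morlet time-frequency map can be sketched directly with NumPy by correlating the signal with scaled copies of the wavelet. This is a minimal illustration, not the implementation used in the paper: the function name `morlet_cwt`, the complex Morlet form with centre frequency parameter `w0 = 6`, the truncation of the envelope, and the frequency grid are all assumptions.

```python
import numpy as np

def morlet_cwt(signal, fs, freqs, w0=6.0):
    """Magnitude of the Morlet wavelet transform, one row per analysis frequency."""
    n = signal.size
    tfr = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        a = w0 / (2 * np.pi * f)                    # scale (s) whose centre frequency is f
        half = int(np.ceil(4 * a * fs))             # truncate the Gaussian envelope at +/- 4a
        u = np.arange(-half, half + 1) / (fs * a)   # (t - b) / a on the sample grid
        psi = np.pi ** -0.25 * np.exp(1j * w0 * u - 0.5 * u ** 2) / np.sqrt(a)
        # Correlating with conj(psi) equals convolving with the time-reversed conjugate.
        full = np.convolve(signal, np.conj(psi[::-1]), mode="full") / fs
        tfr[i] = np.abs(full[half:half + n])        # keep the part aligned with the signal
    return tfr

fs = 12000
t = np.arange(1024) / fs
tone = np.sin(2 * np.pi * 1000 * t)                 # test tone at 1000 Hz
freqs = np.arange(500, 3001, 100)                   # hypothetical analysis frequencies (Hz)
tfr = morlet_cwt(tone, fs, freqs)
```

Small scales (high frequencies) use short wavelets and therefore localize impacts sharply in time, which is the "zoom" behaviour described above. The resulting array could then be rendered and resized to the network input size.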

Convolution Neural Network
CNN is a network model with a multilayer structure [23]. As shown in Figure 3, it includes convolution layers, pooling layers, a full connection (FC) layer, and a classifier. The image enters the network through the input layer, and the convolution layer convolves the image with the convolution kernel to obtain its important local features. The pooling layer reduces the feature dimension, preserves feature invariance, and prevents overfitting to some extent. Before the FC layer, all two-dimensional feature maps are flattened and concatenated into a one-dimensional feature vector, which serves as the input of the FC layer. Finally, the classifier (output layer) produces the classification results. Softmax is used as the classifier in this paper.

Training Process of CNN.
There are two stages in CNN training: forward propagation of the image information and backpropagation of the error. In forward propagation, information is propagated layer by layer, and the classification results are finally obtained at the output layer. Backpropagation based on stochastic gradient descent is one of the common methods in supervised learning. It updates the parameters according to the training samples and their expected outputs, including the convolution layer parameter $K$, the pooling layer weight $\beta$, the full connection layer weight $\omega$, and the bias $B$ of each layer [24]. Before training starts, all weights and biases in the network need to be initialized. If all parameters are initialized to the same value, the network will not be able to learn [25]. Using small random numbers lets the network learn normally and avoids training failure caused by excessively large initial weights.
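The effect of initialization can be seen on a toy network. In the sketch below, which assumes a hypothetical two-layer network with a tanh hidden layer, a scalar output, and a squared-error loss (none of which are taken from the paper), identical initial weights give every hidden unit exactly the same gradient, so the units can never become different from one another. Small random weights break this symmetry.

```python
import numpy as np

def hidden_grad(W1, W2, x, y):
    """Gradient of 0.5 * (W2 . tanh(W1 x) - y)^2 with respect to W1."""
    h = np.tanh(W1 @ x)                      # hidden activations
    err = W2 @ h - y                         # scalar output error
    dz = err * W2 * (1.0 - h ** 2)           # backpropagate through tanh
    return np.outer(dz, x)                   # one gradient row per hidden unit

x = np.array([0.5, -1.0, 2.0])               # arbitrary input
y = 1.0                                      # arbitrary target

# Identical initialization: every row of the gradient is the same.
g_same = hidden_grad(np.full((4, 3), 0.5), np.full(4, 0.5), x, y)

# Small random initialization: the rows differ, so units learn different features.
rng = np.random.default_rng(0)
g_rand = hidden_grad(0.01 * rng.standard_normal((4, 3)),
                     0.01 * rng.standard_normal(4), x, y)
```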

Deformable Convolutional Neural Network.
The traditional convolution kernel has a fixed block shape. The deformable convolution kernel is capable of geometric transformation and can capture image features better. Deformable convolution is realized by adding offset vectors to the convolution kernel, so that the original sampling points are replaced by offset sampling points. Through offset learning, the shape of the convolution kernel can adjust itself according to the specific image characteristics [26]. The effect of deformable convolution is shown in Figure 4, which shows that the deformable convolution can sample under various scale transformations.

Figure 1: Flowchart of the OMP algorithm (input: signal $S$ and threshold value $\varepsilon$; dictionary $D$).

Shock and Vibration

Deformable Convolution Kernel.
The two-dimensional convolution generally consists of two steps [26]. (1) A regular grid $R$ (which determines the size of the receptive field) is used to sample the input feature map $x$. (2) The sampled values are multiplied by the weights $\omega$ and summed.
For any position $p_0$ on the output feature map $y$, we obtain

$$y(p_0) = \sum_{p_n \in R} \omega(p_n)\, x(p_0 + p_n), \tag{6}$$

where $p_n$ enumerates the points in grid $R$.
In the deformable convolution, the regular grid $R$ is augmented with offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$, where $N = |R|$. Substituting $\Delta p_n$ into (6) gives

$$y(p_0) = \sum_{p_n \in R} \omega(p_n)\, x(p_0 + p_n + \Delta p_n). \tag{7}$$

Sampling is now carried out at irregular positions, and the offset $\Delta p_n$ is usually fractional, so the value of $x$ at the offset position is computed by bilinear interpolation:

$$x(p) = \sum_{q} G(q, p)\, x(q), \tag{8}$$

where $p$ is an arbitrary fractional position ($p = p_0 + p_n + \Delta p_n$), $q$ enumerates all integer spatial positions on the feature map $x$, and $G(q, p)$ is the bilinear interpolation kernel. $G$ is two-dimensional and separates into two one-dimensional kernels:

$$G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y), \tag{9}$$

with $g(a, b) = \max(0, 1 - |a - b|)$ as in [26]. The offsets are produced by an additional convolution layer whose kernel parameters are consistent with those of the current convolution layer, and the output offset field has the same spatial resolution as the input feature map. During training, deformable convolution is realized by learning the output features and the offsets at the same time. The gradients with respect to the offsets are obtained by backpropagation through equations (8) and (9) [26]. The flow of deformable convolution is shown in Figure 5.
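The sampling rule of deformable convolution, with bilinear interpolation at fractional positions, can be written out naively for a single channel. This sketch is for illustration only: the function names, the restriction to stride 1 without padding, zero values outside the map, and the use of the top-left corner of the window as $p_0$ (instead of its centre) are assumptions, and the offsets are supplied by hand rather than learned by an extra convolution layer.

```python
import numpy as np

def bilinear_sample(x, py, px):
    """x(p) = sum_q G(q, p) x(q) with g(a, b) = max(0, 1 - |a - b|); zero outside x."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):                  # only the 4 neighbours have non-zero G
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                g = max(0.0, 1 - abs(qy - py)) * max(0.0, 1 - abs(qx - px))
                val += g * x[qy, qx]
    return val

def deformable_conv2d(x, weight, offsets):
    """offsets has shape (H_out, W_out, K*K, 2), holding (dy, dx) per grid point."""
    k = weight.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    grid = [(i, j) for i in range(k) for j in range(k)]   # regular grid R
    y = np.zeros((h_out, w_out))
    for r in range(h_out):
        for c in range(w_out):
            for n, (i, j) in enumerate(grid):
                dy, dx = offsets[r, c, n]
                # Sample at p_0 + p_n + delta p_n and weight by w(p_n).
                y[r, c] += weight[i, j] * bilinear_sample(x, r + i + dy, c + j + dx)
    return y

x = np.tile(np.arange(6.0), (6, 1))          # each pixel holds its column index
weight = np.full((3, 3), 1.0 / 9.0)          # averaging kernel
zero = np.zeros((4, 4, 9, 2))
y_regular = deformable_conv2d(x, weight, zero)       # zero offsets: ordinary convolution
shift = zero.copy()
shift[..., 1] = 0.5                                  # move every sample half a pixel right
y_shifted = deformable_conv2d(x, weight, shift)
```

With zero offsets the result matches an ordinary convolution, and a half-pixel horizontal offset raises the output by 0.5 on this ramp image wherever all samples stay inside the map.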

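Pooling Layer in Deformable Convolution. The deformation of the pooling layer is similar to that of the convolution layer. During the operation, all values obtained from the previous layer that fall within the coverage of each pooling kernel are averaged to give the output. In the notation of [26], the average pooling formula is as follows:

$$y(i, j) = \frac{1}{n_{i,j}} \sum_{p \in \mathrm{bin}(i, j)} x(p_0 + p), \tag{10}$$

where $\mathrm{bin}(i, j)$ is the region covered by the pooling kernel for output position $(i, j)$ and $n_{i,j}$ is the total number of pixels in the region.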
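For reference, plain average pooling without offsets can be written compactly in NumPy. The function name `average_pool2d` and the choice of non-overlapping square windows are assumptions made for this sketch.

```python
import numpy as np

def average_pool2d(x, k):
    """Average over non-overlapping k x k regions; here n_ij = k * k for every region."""
    h, w = x.shape[0] // k, x.shape[1] // k
    blocks = x[:h * k, :w * k].reshape(h, k, w, k)   # split rows and columns into blocks
    return blocks.mean(axis=(1, 3))

pooled = average_pool2d(np.arange(16.0).reshape(4, 4), 2)
# pooled is [[2.5, 4.5], [10.5, 12.5]]
```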

General Procedure of the Proposed Method
In this paper, we develop a novel intelligent fault diagnosis method for rolling bearings based on the wavelet transform and the deformable convolutional neural network. The flowchart of the proposed method is shown in Figure 6, and the general procedure is summarized as follows.
Step 1: the vibration signals of rolling bearing are measured by sensors and collected by the data acquisition system.
Step 2: harmonic removal is adopted to reduce the influence of harmonic components on fault diagnosis, and then the harmonic removal signal can be obtained.
Step 3: WT is used to get the time-frequency diagram of the harmonic removal signal, which can provide clear information about the health conditions of rolling bearing.
Step 4: without any manual feature extraction, the time-frequency diagrams are divided into training samples and testing samples.
Step 5: deformable convolution kernels are introduced into the traditional CNN, and D-CNN can be built.
Step 6: the D-CNN is trained with the training samples and learns features from them automatically, without manual feature design. The learned features are fed into a Softmax classifier for fault pattern recognition of the rolling bearing.
Step 7: the performance of the proposed method is verified by using the test samples, and the diagnostic results are reported.

Experimental Data.
In order to verify the effectiveness of the proposed method, the standard database of the bearing data centre of Case Western Reserve University (CWRU) is used in this experiment. As shown in Figure 7, the experimental platform consists of a 2 hp drive motor (left), a torque transducer/encoder (centre), a dynamometer (right), and control electronics (not shown). The test bearings support the motor shaft. Single point faults were introduced to the bearings using electro-discharge machining. The drive end bearing is a 6205-2RS JEM SKF deep groove ball bearing (SKF, Sweden). In the experiment, accelerometers attached to the motor housing with magnetic bases are used to collect the vibration signal. These sensors are placed at the 12 o'clock position at both the drive end and the fan end of the motor housing. Table 1 lists the specific parameters of the bearings.
There are three kinds of bearing faults: inner ring fault, outer ring fault, and rolling element fault. Two fault diameters are considered for each fault, i.e., 0.1778 mm and 0.3556 mm. Therefore, there are 6 failure modes in total, or 7 modes when the normal state is taken into account. The vibration signal of the bearing is periodic. The number of sampling points in one cycle is $N = f_s \times 60 / n_s$ [27], where $f_s$ is the sampling frequency (Hz) and $n_s$ is the bearing speed (r/min). With a drive end sampling frequency of 12 kHz and a bearing speed of 1772 r/min, the number $N$ of sampling points in one cycle is 406. In order to ensure the integrity of the fault information in the sampled data, each sample should cover more than two cycles; therefore, the number of sampling points per sample in this paper is 1024. The signal of each state is divided into 120 samples, giving 840 samples in total. Considering the small number of samples, we use data augmentation methods such as cutting and rotation to double the data, giving 1680 samples in total. 75% of all samples are used as the training set and 25% as the test set, that is, there are 1260 training samples and 420 test samples.
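The sample counts above can be reproduced with a few lines of Python. The sketch assumes a hypothetical record length of 121000 points and evenly spaced window starts (so neighbouring windows may overlap slightly). The paper does not state how the 120 windows are positioned, and the function names are illustrative.

```python
import numpy as np

def points_per_revolution(fs_hz, speed_rpm):
    """N = f_s * 60 / n_s, the number of samples in one shaft revolution."""
    return fs_hz * 60.0 / speed_rpm

def segment(signal, length=1024, count=120):
    """Cut `count` windows of `length` points with evenly spaced start indices.
    The window placement is an assumption; windows overlap if the record is short."""
    if signal.size < length:
        raise ValueError("record shorter than one window")
    starts = np.linspace(0, signal.size - length, count).astype(int)
    return np.stack([signal[s:s + length] for s in starts])

n_cycle = points_per_revolution(12000, 1772)                 # about 406 points
record = np.random.default_rng(0).standard_normal(121000)    # stand-in for one record
windows = segment(record)                                    # shape (120, 1024)

n_total = 7 * windows.shape[0] * 2     # 7 states, doubled by augmentation
n_train = int(0.75 * n_total)          # 75% for training
n_test = n_total - n_train             # 25% for testing
```

A window of 1024 points covers about 2.5 revolutions at this speed, which satisfies the requirement of more than two cycles per sample.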
All experiments in this paper are run on a PC with an Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz, 8.00 GB of memory, and the Windows 10 64-bit operating system. The software is implemented in Python 3.7.

Convolution Neural Network Model.
Through many experiments, the D-CNN model designed in this paper is shown in Figure 8. The model consists of 10 layers, including 1 convolution layer (C), 3 deformable convolution layers (D-C), 4 pooling layers (S), 1 FC layer, and a Softmax classifier. The characteristic parameters of each layer are shown in Table 2. The input layer size is 150 × 150 (not listed in Table 2). The learning rate is set to 0.001, the parameters are initialized randomly, the Adam optimization algorithm is used to train the network model, ReLU is used as the activation function, dropout with a rate of 0.5 is used for regularization, and the batch size is 32.

Experimental Results and Analysis.
Firstly, the harmonic components are removed from the original signals of all sample sets. The original signals and the harmonic removal signals of the 0.1778 mm rolling element and inner ring fault samples are shown in Figure 9. Secondly, wavelet analysis is performed on the original signal and the harmonic removal signal to obtain the time-frequency diagrams, as shown in Figure 10.
The time-domain signals of 1024 sampling points for the 0.1778 mm rolling element and inner ring faults in Figure 9 show that the harmonic removal is obvious. The time-frequency diagrams of the same faults in Figure 10 show that the energy of the harmonic removal signal is more concentrated and that the impact characteristics of the signal are obvious. Compared with the original signal, the features are located accurately and clearly in time and the time-frequency focusing is better, which helps the CNN to classify the features. Figure 11 shows the time-frequency diagrams of the 0.1778 mm outer ring fault and of the normal signal. It can be clearly seen from Figures 10(a), 10(b), and 11 that the shapes of the time-frequency diagrams of the 0.1778 mm inner ring fault, rolling element fault, and outer ring fault and of the normal signal differ markedly, which provides enough information for bearing fault diagnosis by the CNN method.
In order to verify the effectiveness of the proposed method, a comparative experiment is carried out under the same conditions between rolling bearing fault diagnosis with the proposed method and with the traditional CNN method. The traditional CNN method adopts the same network structure and parameters as the proposed method, as shown in Table 2. During the experiment, each method is trained for 300 iterations. In each iteration, 20 batches, namely, 640 samples, are randomly selected from the 1260 training samples to train the network. After each iteration, 32 samples are randomly selected from the 420 test samples to test the network, and the recognition rate is output. The experimental results are shown in Figure 12. The above process is repeated 5 times, and the highest recognition rates of the 5 runs are averaged to give the average recognition rate shown in Table 3. The training time of one iteration and the test time of a single sample are also recorded in Table 3.
It can be seen from Figure 12 that the recognition rate of the traditional CNN method is higher than that of the D-CNN method in the first 100 test results. This is because the training mechanism of the D-CNN method is more complex and it has more trainable parameters, so more training samples are needed to determine the model completely. After 100 iterations, the recognition rates of the D-CNN method are higher and more stable than those of the traditional CNN method. This is because the traditional CNN is fundamentally limited by its fixed convolution kernel shape and lacks a mechanism for adapting to geometric transformations of complex images during training. Deformable convolution improves the feature extraction ability of the network for complex images, enriches the feature expression of the network, and gives the receptive field the ability to learn adaptively. It can be seen from Table 3 that, in the stable state, the recognition rate of the D-CNN method reaches 99.9% on average, while that of the traditional CNN method reaches only 96.8% on average.
In terms of training time, D-CNN takes longer than CNN. The traditional CNN method needs only 51.53 seconds, while the D-CNN method needs 216.32 seconds. This is because D-CNN has more convolution layers and offsets than the traditional CNN. First, it takes some time to obtain the offset positions by bilinear interpolation. Then, it takes more time to learn the offset positions in backpropagation. Therefore, the additional parameters and computation of offset learning make the training time of D-CNN longer than that of the traditional CNN, but this problem can be solved by offline training. In addition, the test time of D-CNN is longer than that of the traditional CNN. The traditional CNN needs only 7.9 milliseconds, while the D-CNN method needs 33.8 milliseconds. This is also due to the larger amount of computation in the D-CNN method, so its real-time performance is slightly weaker. Because the test time is on the order of milliseconds, this has little influence in engineering applications.
In order to further verify the superiority of the proposed method, it is compared with the traditional fault diagnosis technology based on signal analysis. For the traditional technology, we extract 16 time-domain features as feature vectors: mean, variance, maximum value, minimum value, peak value, RMS amplitude, standard deviation, absolute mean value, kurtosis, skewness, waveform index, pulse index, margin, peak index, kurtosis index, and skewness index. We use three classifiers for fault recognition, namely, BP Neural Network (BP), Random Forest, and SVM. The detailed parameters of the three classifiers are shown in Table 4. During the test, 50 samples are taken, and the resulting confusion matrices are shown in Figure 13. The confusion matrix clearly reflects the degree of matching between the real labels and the predicted labels. It can be seen from Figure 13 that, in a single test, the classification performance of the D-CNN is the best of all the methods, and the SVM classification result is the worst.
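The 16 time-domain features can be computed as in the following sketch. The paper lists the feature names but not their formulas, so the definitions used here (for example, peak as the maximum absolute value, and the dimensionless indices as ratios to the absolute mean, RMS, or powers of the standard deviation) are commonly used conventions adopted as assumptions, and the function name is hypothetical.

```python
import numpy as np

def time_domain_features(x):
    """Return 16 time-domain features of one sample (definitions are assumptions)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var)
    abs_mean = np.abs(x).mean()
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.abs(x).max()
    m3 = np.mean((x - mean) ** 3)                  # third central moment
    m4 = np.mean((x - mean) ** 4)                  # fourth central moment
    root_amp = np.mean(np.sqrt(np.abs(x))) ** 2    # square-root amplitude
    return {
        "mean": mean, "variance": var, "maximum": x.max(), "minimum": x.min(),
        "peak": peak, "rms": rms, "std": std, "abs_mean": abs_mean,
        "kurtosis": m4, "skewness": m3,
        "waveform_index": rms / abs_mean,
        "pulse_index": peak / abs_mean,
        "margin_index": peak / root_amp,
        "peak_index": peak / rms,
        "kurtosis_index": m4 / std ** 4,
        "skewness_index": m3 / std ** 3,
    }

sine = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000)   # 5 full cycles of a unit sine
features = time_domain_features(sine)
```

For a pure sine the peak index is close to $\sqrt{2}$ and the kurtosis index is close to 1.5; impulsive fault signals typically give larger values of both, which is why such indices are used as fault indicators.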
In order to further verify the stability of the proposed method, five experiments are carried out for all five methods, and the recognition results are shown in Figure 14. The average recognition rate and standard deviation over the five experiments are shown in Table 5.
From Figure 14 and Table 5, the following conclusions can be drawn. (1) The average recognition rate of SVM is the lowest, only 80.2%. Its recognition rate over the 5 tests lies between 68% and 84%, and its standard deviation is the largest, 0.069. Therefore, the SVM method not only has low diagnosis accuracy but also has unstable performance. (2) The average recognition rates of CNN and random forest are the same, both 96.8%. However, the standard deviation of CNN is 0.008, while that of random forest is 0.029, which is higher than that of CNN. This shows that CNN is more stable than random forest. (3) The average recognition rate of BP is 97.2%, second only to that of D-CNN, and its standard deviation is relatively low, only 0.016. This means that BP not only has a high recognition rate but also has stable classification performance. (4) The average recognition rate of the proposed D-CNN method is 99.9%, the highest among the five methods, and its standard deviation, 0.004, is also the lowest among the five methods. These experimental results show that the proposed D-CNN method is superior to the other four algorithms in both recognition rate and stability.

Figure 13: Confusion matrices of the three classifiers and the two CNNs.

Conclusions
This paper presents a bearing fault diagnosis method based on WT and D-CNN. Firstly, the OMP algorithm based on the Fourier basis is used to remove the harmonic components from the signal and retain the impact components and noise. Then, the Morlet wavelet is used to analyse the signal containing the impact components and noise, and the time-frequency map is obtained. Finally, the deformable convolution kernel is introduced into the 10-layer CNN to diagnose the rolling bearing fault.
The experimental results show that the characteristics of the vibration signal are more obvious in the time-frequency diagram after harmonic removal and that a satisfactory classification result can be obtained by using these diagrams as the input of D-CNN. A comparison of CNN, SVM, random forest, BP, and D-CNN on the same dataset shows that D-CNN can effectively improve the recognition rate and stability of rolling bearing fault diagnosis. However, the recognition speed of the D-CNN is not ideal and needs further study.

Data Availability
The data used in this paper are from Case Western Reserve University. The dataset has been published on the Internet and can be downloaded from the following website: https://csegroups.case.edu/bearingdatacenter/pages/download-data-file.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.