Development of Deep Convolutional Neural Network with Adaptive Batch Normalization Algorithm for Bearing Fault Diagnosis

It is crucial to carry out fault diagnosis of rotating machinery by extracting features that contain fault information. Many previous works using deep convolutional neural networks (CNNs) have achieved excellent performance in extracting fault information from detected signals. They may, however, suffer from time-consuming training and low versatility. In this paper, a CNN integrated with the adaptive batch normalization (ABN) algorithm (ABN-CNN) is developed to avoid the high computing resource requirements of such complex networks. It uses a large-scale convolution kernel in the first layer followed by multiple 3 × 1 small convolution kernels. Therefore, fast convergence and high recognition accuracy under noise and load-variation environments can be achieved for bearing fault diagnosis. The performance results verify that the proposed model is superior to the Support Vector Machine with Fast Fourier Transform (FFT-SVM), Multilayer Perceptron with Fast Fourier Transform (FFT-MLP), and Deep Neural Network with Fast Fourier Transform (FFT-DNN) models.


Introduction
The characteristics of vibration are a crucial issue in machine design. A lack of balance in rotating parts and elements can result in noncoaxiality of shafts [1]. Noise and vibration can even occur in complex structures of ships and become a noticeable obstacle in ship construction [2]. In such cases, vibration measurement can help identify dynamic state changes in a rotational system [3]. Accordingly, fault diagnosis of bearings is commonly based on vibration signals, from which a set of features is extracted and classified for rotating-shaft fault prediction. The key to successful fault diagnosis is the choice of feature extractor and classifier [4]. Research on intelligent algorithms for diagnosis and prediction has attracted growing attention in recent years. However, the feature extractor and classifier of a neural network may not accommodate load changes well enough to maintain a highly accurate outcome [5][6][7][8]. For this reason, combinations of different neural network methods have been attempted in a variety of disciplines. For example, trademark image retrieval was implemented using a combination of deep CNNs: a pretrained network was fine-tuned for the trademark image retrieval task using two different databases [9]. In another case, a recurrent neural network (RNN) was used to learn inner spectral correlations, while a CNN focused on saliency features and spatial relevance [10]. Recently, a hybrid pattern recognition method was proposed for short-term wind power forecasting, in which the time series of produced power is estimated through a combination of preprocessing, feature selection, and regression steps [11].
To solve the problems of bearing fault diagnosis, the most ideal way may be to combine the two aspects of feature extraction and classification into one model without loss of information. The convolutional neural network (CNN) algorithm has an "end-to-end" characteristic: it can complete the whole process of feature extraction, feature dimension reduction, and classification within a single network [12][13][14]. The CNN is a multilayer neural network comprising a filtering stage and a classification stage [15][16][17][18]. The filtering stage extracts the characteristics of the input signal, and the classification stage classifies the learned features, where the parameters of both stages are obtained through joint training.
In recent years, two-dimensional convolutional neural networks containing stacked 3 × 3 convolution kernels, e.g., VGGNet, ResNet, and Google's Inception V4, have been reported [19][20][21]. This type of CNN model can deepen network learning [22]. At the same time, it can achieve a larger receptive field with fewer parameters, thereby suppressing overfitting. However, for a one-dimensional vibration signal, a stack of two 3 × 1 convolutions spends six weights to acquire a receptive field of only 5 × 1, so its application is limited.

WFK-CNN Algorithm.
The Deep Convolutional Neural Network with Wide First-layer Kernel (WFK-CNN) model is characterized by a large-scale convolution kernel in the first layer, followed by 3 × 1 small convolution kernels in the remaining convolution layers. The convolution kernel size of all convolution layers except the first was set to 3 × 1. After each convolution operation, batch normalization (BN) is performed, and then 2 × 1 maximum pooling is carried out. The structure of WFK-CNN is shown in Figure 1.

Batch Normalization (BN) Algorithm.
The BN algorithm can reduce internal covariate shift, improve the training efficiency of the network, and enhance its generalization ability [19][20][21]. In this paper, a batch normalization layer is added between each convolutional layer and its activation layer and between each fully connected layer and its activation layer. Its main operation is to subtract the mini-batch mean from the input of the convolutional or fully connected layer and then divide by the mini-batch standard deviation, which accelerates the training process. However, this may limit the input values to a narrower range and reduce the expressive ability of the network. Therefore, the normalized value is multiplied by a scaling factor $\gamma$ and shifted by an offset $\beta$ for enhanced expression. When BN acts on the fully connected layer, let the input of the $i$th neuron of the $l$th BN layer for the $j$th sample of a mini-batch of $m$ samples be $y^{l(i,j)}$. The batch normalization operation is expressed in equations (1)-(4):

$$\mu^{l(i)} = \frac{1}{m} \sum_{j=1}^{m} y^{l(i,j)}, \quad (1)$$

$$\left(\sigma^{l(i)}\right)^2 = \frac{1}{m} \sum_{j=1}^{m} \left(y^{l(i,j)} - \mu^{l(i)}\right)^2, \quad (2)$$

$$\hat{y}^{l(i,j)} = \frac{y^{l(i,j)} - \mu^{l(i)}}{\sqrt{\left(\sigma^{l(i)}\right)^2 + \epsilon}}, \quad (3)$$

$$z^{l(i,j)} = \gamma^{l(i)} \hat{y}^{l(i,j)} + \beta^{l(i)}, \quad (4)$$

where $y^{l(i,j)}$ is the input of the $l$th BN layer, $\gamma^{l(i)}$ and $\beta^{l(i)}$ are the BN layer scaling and offset, $z^{l(i,j)}$ is the BN layer output, and $\epsilon$ is a small constant added for numerical stability. When BN acts on the convolutional layer, the same operations are applied per feature map, with the mean and variance computed over both the mini-batch and the signal dimension, as represented in equations (5)-(8). In that case, the derivatives of the loss function $L$ with respect to the BN-layer inputs and the learnable parameters, $\partial L / \partial y^{l(i,j)}$, $\partial L / \partial \gamma^{l(i)}$, and $\partial L / \partial \beta^{l(i)}$, follow from the chain rule.
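As an illustration, the forward pass of equations (1)-(4) can be sketched in a few lines of NumPy; the function and variable names here (`batch_norm_forward`, `gamma`, `beta`, `eps`) are our own, not from the paper:

```python
# Minimal sketch of the batch-normalization forward pass (equations (1)-(4)).
import numpy as np

def batch_norm_forward(y, gamma, beta, eps=1e-5):
    """Normalize a mini-batch y of shape (m, features), then scale and shift."""
    mu = y.mean(axis=0)                    # mini-batch mean, eq. (1)
    var = y.var(axis=0)                    # mini-batch variance, eq. (2)
    y_hat = (y - mu) / np.sqrt(var + eps)  # normalized input, eq. (3)
    return gamma * y_hat + beta            # scaled and shifted output, eq. (4)

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=(256, 100))  # one 256-sample mini-batch
z = batch_norm_forward(y, gamma=np.ones(100), beta=np.zeros(100))
print(z.mean(), z.std())  # approximately 0 and 1
```

With $\gamma = 1$ and $\beta = 0$, each feature of the output is re-centred to zero mean and rescaled to unit standard deviation, which is what accelerates training.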

WFK-CNN Parameter
Standard. The size, step size, and number of the convolutional layers in the WFK-CNN model are chosen around the first-layer convolution kernel. The core concept of the convolutional neural network is the receptive field, which is the range of the underlying network perceived by a neuron [23][24][25][26][27]. The profile of a neuronal receptive field is shown in Figure 2.
For the WFK-CNN filtering stage to learn displacement-independent features, the receptive field of the last pooling-layer neuron in the input signal should be greater than one signal period. Suppose that the size of the receptive field of the last pooling-layer neuron in the input signal is R^(0), T is the number of data points recorded by the accelerometer per rotation of the bearing, and L is the length of the input signal. Then, T ≤ R^(0) ≤ L can be used as a criterion for WFK-CNN parameter selection.
First, the relationship between the receptive field at the $(l-1)$th pooling layer and that at the $l$th pooling layer is

$$R^{(l-1)} = S^{(l)} \left( P^{(l)} R^{(l)} - 1 \right) + W^{(l)}, \quad (11)$$

where $S^{(l)}$ and $W^{(l)}$ are the step size and convolution kernel scale of the $l$th convolutional layer and $P^{(l)}$ is the number of downsampling points of the $l$th pooling layer. When $l > 1$, $S^{(l)} = 1$, $W^{(l)} = 3$, and $P^{(l)} = 2$, so equation (11) simplifies to

$$R^{(l-1)} = 2 R^{(l)} + 2. \quad (12)$$

When $l = n$ and $R^{(n)} = 1$, the receptive field of a neuron in the first pooling layer is

$$R^{(1)} = 3 \times 2^{n-1} - 2, \quad (13)$$

where $n$ is the number of convolutional layers. Substituting equation (13) into equation (11), the receptive field $R^{(0)}$ of the last pooling-layer neuron in the input signal is obtained as

$$R^{(0)} = S^{(1)} \left( 3 \times 2^{n} - 5 \right) + W^{(1)}. \quad (14)$$

According to the design rule $T \leq R^{(0)} \leq L$, the final design criterion can be determined as

$$T \leq S^{(1)} \left( 3 \times 2^{n} - 5 \right) + W^{(1)} \leq L. \quad (15)$$

The input signal length is L = 2048, and the signal period is T ≈ 400. If there are 5 convolutional layers, the first convolutional layer can only have a convolution step of 8 or 16. The number of neurons in the network will increase when the number of convolution kernels increases or when the kernel scale, the number of network layers, or the convolution step size is reduced, which can increase the expressiveness of the network but at the risk of overfitting. Other hyperparameters of the network need to be further adjusted in the experiment according to the amount of training data [28][29][30][31][32][33][34].
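The recursion and design criterion can be checked numerically; the following minimal Python sketch (our own helper, not part of the paper) evaluates R^(0) for a 64 × 1 first-layer kernel and the two candidate strides:

```python
# Numerical check of the receptive-field rule: start from R(n) = 1 at the last
# pooling layer, apply R(l-1) = 2*R(l) + 2 for the 3x1/stride-1 conv + 2x1
# pooling stages, then step through the first conv layer (kernel W1, stride S1)
# and its 2x1 pooling: R(0) = S1*(2*R(1) - 1) + W1.
def input_receptive_field(n_layers, W1, S1):
    R = 1                          # receptive field of the last pooling neuron
    for _ in range(n_layers - 1):  # layers n, n-1, ..., 2
        R = 2 * R + 2
    return S1 * (2 * R - 1) + W1   # through the first conv + pooling stage

L, T, n = 2048, 400, 5
for S1 in (8, 16):
    R0 = input_receptive_field(n, W1=64, S1=S1)
    print(S1, R0, T <= R0 <= L)    # both strides satisfy T <= R(0) <= L
```

For n = 5 and W^(1) = 64, this gives R^(0) = 792 at stride 8 and R^(0) = 1520 at stride 16, both within [T, L].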
The parameters of the WFK-CNN model used in this study are shown in Table 1. It has five convolution and pooling layers. The size of the convolution kernel in the first layer is 64 × 1, and the other convolution kernels are 3 × 1. The number of neurons in the hidden layer is 100, and there are 10 outputs in the softmax layer, corresponding to 10 bearing states.
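As a sanity check on this architecture, the following sketch traces the feature-map length through the five convolution and pooling stages. The 'same' padding and the first-layer stride of 16 are our assumptions (the text only narrows the stride to 8 or 16):

```python
# Hedged sketch: trace the feature-map length through five conv + pool stages,
# assuming 'same' padding and a first-layer stride of 16.
import math

def conv_out(length, stride):
    return math.ceil(length / stride)   # 'same' padding: only the stride shrinks it

def pool_out(length, size=2):
    return length // size               # 2x1 max pooling

length = 2048                           # input signal length L
layers = [(64, 16)] + [(3, 1)] * 4      # (kernel, stride) per conv layer
for kernel, stride in layers:
    length = pool_out(conv_out(length, stride))
    print(kernel, stride, length)      # lengths: 64, 32, 16, 8, 4
```

Under these assumptions, the 2048-point signal is reduced to a length of 4 per channel before the fully connected stage, consistent with a compact hidden layer of 100 neurons.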

ABN-CNN Model.
The ABN-CNN model is based on the integration of the WFK-CNN algorithm and the Adaptive Batch Normalization (Ad-BN) algorithm. Therefore, the domain adaptation ability can be effectively improved when the distribution of the test samples differs from that of the training samples.
The ABN-CNN model is briefly described in Table 2.
The flowchart of the ABN-CNN algorithm is shown in Figure 3. The training samples are used to train the WFK-CNN model until the training process is completed. If the distributions of the training and test signals are not consistent, the test set is input to the WFK-CNN model, where the data are only forward-propagated, and the mean and variance of all BN layers are replaced by the mean and variance computed on the test set, while the other network parameters remain unchanged.
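A minimal NumPy sketch of this adaptive step is given below; the class and method names (`BNLayer`, `adapt`) are our own, and a single BN layer stands in for the full network:

```python
# Sketch of the adaptive step: after training, re-estimate each BN layer's
# mean/variance from the (unlabeled) test-set activations while keeping all
# learned weights, gamma, and beta fixed.
import numpy as np

class BNLayer:
    def __init__(self, features):
        self.gamma = np.ones(features)
        self.beta = np.zeros(features)
        self.mean = np.zeros(features)   # running statistics from training
        self.var = np.ones(features)

    def forward(self, x):
        x_hat = (x - self.mean) / np.sqrt(self.var + 1e-5)
        return self.gamma * x_hat + self.beta

    def adapt(self, x_test):
        # Ad-BN: replace the stored statistics with test-domain statistics;
        # gamma, beta, and all other network weights are left untouched.
        self.mean = x_test.mean(axis=0)
        self.var = x_test.var(axis=0)

rng = np.random.default_rng(1)
bn = BNLayer(8)                                         # trained on source domain
x_test = rng.normal(loc=3.0, scale=2.0, size=(512, 8))  # shifted target domain
bn.adapt(x_test)                                        # forward pass only
z = bn.forward(x_test)
print(z.mean(), z.std())  # re-centred near 0, rescaled near 1
```

The point of the swap is visible in the output: after `adapt`, the shifted test-domain activations are normalized to roughly the same distribution the later layers saw during training.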

Performance Results
In this study, the data from the CWRU Rolling Bearing Data Center, a world-recognized standard data set, were used to evaluate the proposed model for bearing fault diagnosis and to compare it with existing algorithms [35].
This ball bearing test data came from experiments conducted using a 2 hp Reliance Electric motor. Accordingly, the actual test conditions of the motor can sufficiently support the proposed model for industrial applications.

Performance Analysis of Bearing Fault Diagnosis in Noise
Environment. In this section, the anti-noise capability of the ABN-CNN algorithm is analyzed. First, a noise signal is added to the test samples to simulate noise pollution in an industrial environment. The noise in a diagnostic signal is generally additive white Gaussian noise. Therefore, the test signals contain different levels of additive white Gaussian noise.
The signal-to-noise ratio (SNR) is defined as the ratio of $P_{\text{signal}}$ to $P_{\text{noise}}$, which represent the energy of the signal and the noise, respectively:

$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} \frac{P_{\text{signal}}}{P_{\text{noise}}}.$$
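The noise-injection step can be sketched directly from this definition; the helper name `add_awgn` and the sine stand-in for a vibration signal are our own illustration:

```python
# Sketch: add white Gaussian noise to a signal at a prescribed SNR (dB),
# following SNR_dB = 10*log10(P_signal / P_noise).
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    p_signal = np.mean(signal ** 2)               # average signal power
    p_noise = p_signal / (10 ** (snr_db / 10))    # noise power for the target SNR
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

t = np.linspace(0, 1, 2048, endpoint=False)
clean = np.sin(2 * np.pi * 50 * t)                # stand-in vibration signal
noisy = add_awgn(clean, snr_db=0)                 # 0 dB: equal signal/noise power
measured = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))
print(round(measured, 1))                         # close to 0 dB
```

At 0 dB the noise power equals the signal power, which is why the fault signature in Figure 4 is visually buried.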
As shown in Figure 4, from top to bottom, the fault signal of the bearing inner ring without added noise, the additive white Gaussian noise signal, and the inner-ring fault signal with noise are presented, respectively, where SNR = 0 dB. It can be seen that it is difficult to directly extract valid fault information from the noise-containing signal.

Model Training and Testing.
Based on the recognition rate, the training convergence curves with/without BN in the WFK-CNN model are shown in Figure 5. The curves of the target function value with/without BN, plotted on a logarithmic scale, are shown in Figure 6. The size of each mini-batch is set to 256, and the learning rate of the Adam algorithm is set to 0.001.

The Influence of the Convolution Kernel Size on the Noise.
The influence of the convolution kernel size on the recognition rate under noise is studied for the WFK-CNN and ABN-CNN models. In the test, the SNR is set from −4 dB to 10 dB, and the convolution kernel size ranges from 16 to 128.

As shown in Table 3, the WFK-CNN model can generally achieve a higher recognition rate in a noisy environment when the convolution kernel is large. For example, when the convolution kernel size is 112 × 1, the recognition rate reaches more than 90% at an SNR of 0 dB, whereas when the kernel size is 16 × 1, the recognition rate is as low as 55.46%. However, when the convolution kernel size increases to 128 × 1, the recognition rate may decrease slightly. For example, the recognition rate with the 128 × 1 kernel is 89.65%, slightly lower than the 89.93% of the 96 × 1 kernel. In Table 4, when the SNR is above −4 dB, the recognition rate of the ABN-CNN model is generally above 90%. Compared with the WFK-CNN model, the recognition rate is significantly improved, especially at lower SNR and smaller convolution kernel sizes. For example, when SNR = −4 dB, the WFK-CNN model with a 64 × 1 convolution kernel reaches only 51.89%, but the ABN-CNN model reaches 92.56%. As noted above, the convolution kernel size influences the recognition rate considerably, particularly at small SNR. Accordingly, a large-scale convolution kernel should be selected when the noise is large, especially in the WFK-CNN model.

Anti-Noise Comparison.
The anti-noise comparison among the ABN-CNN, WFK-CNN, FFT-SVM, FFT-MLP, and FFT-DNN models is shown in Figure 7. The SVM uses a radial basis kernel function, and the numbers of neurons in the layers of the FFT-DNN model are 1025, 500, 200, 100, and 10, respectively. The convolution kernel size of the ABN-CNN and WFK-CNN models is set to 112 × 1.
It can be seen that the recognition rate of FFT-DNN is higher than that of FFT-MLP, but the noise immunity of both is relatively weak. For example, at SNR = 4 dB, the recognition rate of both models is less than 90%. In addition, the WFK-CNN model has higher noise and interference immunity than the FFT-DNN and FFT-MLP models, although it is slightly lower than the FFT-SVM model. On the other hand, in various noise environments, the recognition rate of the ABN-CNN model reaches more than 90%, exhibiting the strongest anti-noise ability of all the models.

Fault Diagnosis Performance Analysis under Variable
Load Condition. Figure 8 shows diagnostic signals with a normalized 0.014-inch inner-ring defect under different loads. As can be seen, the features of the vibration signal vary with load. In addition to inconsistent amplitudes, the periods and phases of the fluctuations are also very different.
This phenomenon makes it difficult to correctly classify the extracted features for fault recognition.
The proposed WFK-CNN (Ad-BN) model, trained with different load data from 1 hp, 2 hp, and 3 hp motors, has great practical significance for diagnosing the vibration signal, particularly when the load changes. The detailed description of the variable loads for training and testing is shown in Table 5. One data set, e.g., set A, was used for training, and the other two loads, e.g., sets B and C, were used for testing, and so on.

Performance Analysis under Various Scenarios.
As shown in Figure 9, the average recognition rate of the FFT-SVM algorithm is less than 70%, which is lower than that of the CNN models. When set C (load of 3 hp) is used for training and set A or B for testing, the recognition rate of the ABN-CNN model is more than 10% higher than that of the WFK-CNN model.

Conclusions
In this study, the proposed ABN-CNN model presents a relatively simple network using a large-scale first-layer convolution kernel followed by small convolution layers. Unlike traditional Fourier-transform-based methods, it is trained as a deep CNN with the adaptive batch normalization algorithm for bearing fault diagnosis. It can effectively reduce the difficulty of adjusting the parameters of the WFK-CNN model. When the SNR exceeds −4 dB, the proposed model can reach an average recognition rate of more than 90% using the one-dimensional bearing vibration signal. However, if the distribution of the test samples is significantly different from that of the training samples, the diagnostic performance may decrease. Noise interference and load changes may also affect the recognition rate. Under better conditions, higher SNR can achieve recognition rates as high as 99%. In general, the experimental results confirm that the ABN-CNN algorithm can considerably improve the recognition rate of the WFK-CNN model and that it is superior to the FFT-SVM, FFT-MLP, and FFT-DNN algorithms. Future research can focus on increasing the recognition rate under low SNR and load variation in noisy circumstances. In addition, the axial piston pump, which is widely used in various hydraulic systems, is one of the critical noise sources in industry. Accordingly, fault diagnosis of axial piston pumps using convolutional neural networks or deep belief networks via iterative learning processes can be further studied.
Data Availability

The data that support the findings of this study are openly available in the CWRU Rolling Bearing Data Center at http://csegroups.case.edu/bearingdatacenter/home.

Conflicts of Interest
The authors declare no conflicts of interest.

Authors' Contributions
Chao Fu developed the model. Qing Lv collected the data and carried out the performance analysis. Hsiung-Cheng Lin helped edit the manuscript. All the authors contributed to the writing of the final research paper.