Intelligent Fault Diagnosis of Bearing Based on Convolutional Neural Network and Bidirectional Long Short-Term Memory

Traditional bearing fault diagnosis methods have complex operation processes and poor generalization ability, while the diagnosis accuracy of existing intelligent diagnosis methods needs further improvement. Therefore, a novel bearing fault diagnosis approach named CNN-BLSTM, based on a convolutional neural network (CNN) and bidirectional long short-term memory (BLSTM), is presented in this paper. This method directly takes the collected one-dimensional raw vibration signal as input and adaptively extracts the feature information through the CNN. Then, the BLSTM is used to fuse the extracted features to acquire the failure information sufficiently and prevent the model from overfitting. Finally, two different experimental datasets are used to verify the effectiveness of the method. The experimental results show that the proposed CNN-BLSTM model can accurately diagnose the fault category of bearings. It has the advantages of rapidity, stability, noise resistance, and strong generalization.


Introduction
As a commonly used power transmission component in mechanical equipment, bearings are widely used in industrial production [1][2][3]. Their operating status directly determines the performance of the equipment [4][5][6][7]. Once a failure occurs, it causes unstable operation of the equipment and can even cause major losses [4]. Therefore, it is of great significance to carry out effective condition monitoring and diagnosis of bearings [8][9][10].
At present, fault diagnosis methods are mainly divided into model-based methods, experience-based methods, and intelligence-based methods [11][12][13][14][15][16]. Among them, the intelligence-based methods are also called data-driven methods. This kind of method uses a large amount of historical data to establish a fault diagnosis model and removes the dependence on physical parameter models [17]. Compared with the model-based and experience-based diagnosis methods, the data-driven diagnosis method has stronger operability [4]. With the continuous improvement of monitoring methods, enterprise data can be collected much faster and more widely than ever before, which has laid a good foundation for data-driven fault diagnosis methods [17,18]. In recent years, with the rapid development of deep learning theory, remarkable achievements have been made in the fields of image processing, computer vision, speech recognition, and so on [19]. As a representative deep learning technology, the convolutional neural network (CNN) has been widely used in data-driven fault diagnosis methods. Many scholars have proposed data-driven fault prediction methods [20][21][22][23][24]. Gao et al. [25] proposed a new method of fault diagnosis using a bidirectional long short-term memory (BLSTM) network based on the Hilbert-Huang transform (HHT) and CNN. Liang et al. [26] proposed a fault diagnosis method based on the Wavelet Transform (WT), Generative Adversarial Network (GAN), and CNN. Abdeljaber et al. [27] used a one-dimensional CNN to automatically extract the optimal damage-sensitive features directly from the original acceleration signals. Zhou et al. [28] proposed a bearing remaining life prediction and fault diagnosis method based on the short-time Fourier transform (STFT) and CNN. Levent et al. [29] directly input the raw signal into a one-dimensional CNN for fault classification and verified it on the Case Western Reserve University (CWRU) bearing dataset.
Although these methods can extract fault features and perform fault diagnosis, their diagnostic accuracy is limited, their operation process is complicated, and their running time is long. In addition, what a CNN learns is the spatial information in its receptive field. As the network deepens, the features learned by the CNN become more and more abstract, which makes the model prone to overfitting and further affects the judgment of test results. Therefore, the key is to improve the accuracy of fault diagnosis and prevent overfitting of the model without destroying the relationships within the information. In this paper, a new fault diagnosis method that can effectively extract fault feature information and prevent overfitting is proposed. Bearing faults at the same position are grouped into one class, and a fault diagnosis model based on CNN and BLSTM is established.
The effectiveness of the diagnosis method is verified on the benchmark CWRU bearing dataset [30] and on the bearing dataset measured under time-varying speed conditions at the University of Ottawa (uOttawa) in Canada [31]. The contributions of this paper can be summarized as follows:
(1) A new feature extraction method, called CNN-BLSTM, is proposed to reduce the complexity of the diagnosis process and avoid overfitting of the model during training.
(2) The interference caused by high correlations between fault classes of different sizes at the same position is avoided.
(3) The method has good robustness and achieves high accuracy even under strong noise interference.
The rest of this paper is organized as follows. In Section 2, the basic principles and network structure of CNN and BLSTM are introduced. In Section 3, we describe the proposed CNN-BLSTM intelligent fault detection method in detail. In Section 4, the effectiveness of the proposed method is verified through two different experiments, and the results are compared with other intelligent fault diagnosis methods in the literature. Section 5 concludes the paper.

Theoretical Background
2.1. Convolutional Neural Network. As one of the representative algorithms of deep learning, CNN has powerful feature extraction ability and has been widely applied in fields such as image processing and time-series analysis [32]. In mechanical equipment monitoring, the data collected by sensors is usually a one-dimensional time series. Therefore, a one-dimensional CNN is more suitable for feature extraction. A typical CNN consists of an input layer, convolutional layers, activation layers, pooling layers, fully connected layers, and an output layer. The main structural difference between two-dimensional and one-dimensional CNNs is the use of one-dimensional arrays instead of two-dimensional matrices for the kernels and feature maps [27].
The main principle of the one-dimensional CNN is to apply convolution kernels to the one-dimensional signal fed into the convolutional layer and then to extract and sparsify features layer by layer through the activation function and the pooling layer. The convolution can be defined as follows:

$$x_j^{t} = f\left(\sum_{i=1}^{M} x_i^{t-1} * k_{ij}^{t} + b_j^{t}\right),$$

where $k$ is the convolution kernel; $j$ is the index of the kernel; $M$ is the number of input channels $x_i^{t-1}$; $b_j^{t}$ is the bias parameter; $*$ is the convolution operator; and $f$ is the activation function. Sigmoid, ReLU, and other functions are commonly used as activation functions. The Sigmoid activation function has some disadvantages; for example, its gradient vanishes and its output is not zero-mean. The ReLU activation function overcomes these disadvantages and converges faster, but it suffers from dead neurons. Therefore, this paper adopts the ELU activation function, whose formula can be expressed as follows:

$$f(x) = \begin{cases} x, & x > 0, \\ \alpha\left(e^{x} - 1\right), & x \le 0, \end{cases}$$

where $\alpha > 0$ is a constant and the resulting $y_j^{t} = f(\cdot)$ is the activation value of the $t$-th layer. The pooling layer usually follows the convolutional layer. The main purpose of pooling is to downsample: it exploits the local characteristics of the data, optimizes the network structure, and improves the robustness of the features while reducing the data dimension. Commonly used downsampling methods include stochastic pooling, mean pooling, and max pooling, among which max pooling is the most widely used. It is expressed as follows:

$$z_j^{t(l)} = \max_{(l-1)w + 1 \le n \le lw} y_j^{t(n)},$$

where $w$ is the width of the pooling region; $y_j^{t(n)}$ is the $n$-th activation value of the $j$-th feature map in the $t$-th layer; and $z_j^{t(l)}$ is the $l$-th value of the $j$-th feature map in the $t$-th layer after the pooling operation.
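The three layer operations described above (convolution, ELU activation, and max pooling) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names, the toy signal, and the averaging kernels are assumptions, and the sliding window is written in the cross-correlation form that most deep learning libraries use for "convolution".

```python
import numpy as np

def conv1d(x, kernels, bias, stride=1):
    """Valid 1-D convolution (cross-correlation form).

    x:       (M, L) input with M channels
    kernels: (J, M, K) one length-K kernel per (output, input) channel pair
    bias:    (J,)
    returns: (J, (L - K) // stride + 1)
    """
    J, M, K = kernels.shape
    out_len = (x.shape[1] - K) // stride + 1
    out = np.empty((J, out_len))
    for j in range(J):
        for n in range(out_len):
            window = x[:, n * stride : n * stride + K]
            # Sum over all input channels and kernel taps, then add the bias.
            out[j, n] = np.sum(window * kernels[j]) + bias[j]
    return out

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def max_pool1d(y, w):
    """Non-overlapping max pooling of width w along the last axis."""
    J, L = y.shape
    n = L // w
    return y[:, : n * w].reshape(J, n, w).max(axis=2)

# Toy input: one channel, 16 points, values -8 ... 7 (illustrative only).
signal = np.arange(16, dtype=float).reshape(1, 16) - 8.0
kernels = np.ones((2, 1, 4)) / 4.0   # two hypothetical averaging kernels
features = max_pool1d(elu(conv1d(signal, kernels, np.zeros(2), stride=2)), w=2)
```

With a kernel of length 4 and a stride of 2, the 16-point input yields 7 convolution outputs per kernel, and pooling with width 2 keeps 3 values per feature map.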

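Bidirectional Long Short-Term Memory Neural Network. LSTM, as a variant of the RNN, avoids the exploding and vanishing gradient problems that arise when a traditional RNN backpropagates through long sequences, and it realizes long-term memory of useful information in condition monitoring data. The LSTM unit consists of three components: the forget gate, the update gate, and the output gate. The unit structure is shown in Figure 1. The forget gate is responsible for discarding unimportant information from the cell state received from the previous unit. The update gate is responsible for adding information to the cell state. The output gate is responsible for selecting useful information from the current cell state and presenting it as the output [33]. The formulas are as follows:

$$\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right), \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), \\
h_t &= o_t \odot \tanh\left(C_t\right),
\end{aligned}$$

where $f_t$, $i_t$, and $o_t$ are the forget gate, update gate, and output gate, respectively; $b_f$, $b_i$, and $b_o$ are the bias vectors of the forget gate, update gate, and output gate, respectively; $W_f$, $W_i$, $W_C$, and $W_o$ are the corresponding weight matrices; $x_t$ is the current input; $\tilde{C}_t$ is the candidate cell state with bias $b_C$; $C_{t-1}$ is the previous cell state; $h_{t-1}$ is the previous hidden state; $C_t$ is the current cell state; $h_t$ is the current hidden state; $\odot$ is the element-wise product; $\sigma$ is the sigmoid activation function; and $\tanh$ is the hyperbolic tangent function.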
Since the LSTM network can only process data in one direction when handling time series such as condition monitoring data, it cannot make full use of the data [34]. The BLSTM layer consists of two LSTM layers that run over the sequence in opposite directions, which enables the model to make full use of the main feature information and improves performance. The structure of the BLSTM unit is shown in Figure 2.
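To make the gate mechanics and the two-direction pass concrete, the following NumPy sketch runs a standard LSTM over a short sequence forward and backward and concatenates the hidden states. It assumes the textbook LSTM formulation; the stacked weight layout, the function names, and the random toy data are illustrative choices, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, b, hidden):
    """Run one LSTM over xs of shape (T, D); return hidden states (T, hidden).

    W: (4 * hidden, hidden + D) stacked weights for the forget, update,
       candidate, and output transforms; b: (4 * hidden,) stacked biases.
    """
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in xs:
        z = W @ np.concatenate([h, x]) + b
        f = sigmoid(z[:hidden])                  # forget gate
        i = sigmoid(z[hidden:2 * hidden])        # update (input) gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell state
        o = sigmoid(z[3 * hidden:])              # output gate
        c = f * c + i * g                        # new cell state
        h = o * np.tanh(c)                       # new hidden state
        outputs.append(h)
    return np.stack(outputs)

def blstm_forward(xs, fwd_params, bwd_params, hidden):
    """Concatenate a forward pass and a time-reversed backward pass."""
    h_fwd = lstm_forward(xs, *fwd_params, hidden)
    # Feed the sequence reversed, then flip the outputs back into time order.
    h_bwd = lstm_forward(xs[::-1], *bwd_params, hidden)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
T, D, H = 5, 3, 4                                # toy sizes (assumed)

def make_params():
    return rng.normal(scale=0.1, size=(4 * H, H + D)), np.zeros(4 * H)

xs = rng.normal(size=(T, D))
fwd, bwd = make_params(), make_params()
out = blstm_forward(xs, fwd, bwd, H)             # shape (T, 2 * H)
```

Each time step of `out` therefore combines a summary of everything before it (first `H` columns) with a summary of everything after it (last `H` columns), which is the sense in which the bidirectional layer uses the whole sequence.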
Generally, there are one or two fully connected layers after the BLSTM layer. All neurons in a fully connected layer are connected to the feature vector nodes output by the previous layer. The output layer after the fully connected layers often uses Softmax as a classifier to classify multiple targets. Assuming the task is a $k$-category problem, the output of Softmax can be expressed as follows:

$$O_j = \frac{\exp\left(\theta^{(j)\top} x\right)}{\sum_{i=1}^{k} \exp\left(\theta^{(i)\top} x\right)}, \quad j = 1, 2, \ldots, k,$$

where $x$ is the input feature vector, $\theta^{(1)}, \theta^{(2)}, \cdots, \theta^{(k)}$ are the parameters of the model, and $O_j$ is the final output result.
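A small numerical example may help show how the Softmax output turns class scores into probabilities. The sketch below assumes a linear scoring layer with made-up parameters for four classes and a two-dimensional feature vector; none of these values come from the paper.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical parameters: one row theta^(j) per class (k = 4, feature size 2).
theta = np.array([[0.2, -0.1],
                  [0.0,  0.3],
                  [-0.4, 0.1],
                  [0.5,  0.5]])
x = np.array([1.0, 2.0])            # hypothetical feature vector

probs = softmax(theta @ x)          # class probabilities O_1 ... O_k
predicted = int(np.argmax(probs))   # index of the most probable class
```

Subtracting the maximum score before exponentiating leaves the result unchanged but avoids overflow for large scores.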

Proposed Method
In this paper, a fault diagnosis method named CNN-BLSTM is proposed. It makes the most of the advantages of CNN in feature extraction and of BLSTM in dealing with vanishing and exploding gradients. As shown in Figure 3, the proposed CNN-BLSTM model consists of three alternating convolutional and pooling layers, a BLSTM layer, and two fully connected layers. First, the main features of the input signals are extracted through the three alternating convolutional and pooling layers. Then, the BLSTM is used to further extract features, which serve as the input of the fully connected layers. Finally, the Softmax classifier is used for fault identification and classification.

Experimental Validation
To verify the effectiveness of the proposed method, two experiments on different bearing datasets were conducted. The data of Case 1 was the public bearing fault dataset obtained from CWRU (https://csegroups.case.edu/bearingdatacenter/home). The data of Case 2 came from the bearing dataset obtained by uOttawa under time-varying speed conditions (https://data.mendeley.com/datasets/v43hmbwxpm/2).

Experiment Setup and Data Description.
As a benchmark dataset for fault detection, the CWRU rolling bearing data has been widely used in the field of fault monitoring [30]. The CWRU experimental platform, shown in Figure 4, consisted of a 2-horsepower (about 1.5 kW) motor, a torque sensor/decoder, a power meter, and so on. The bearing under test (6205-2RS JEM SKF) was installed at the drive end of the motor to support the motor shaft, and an acceleration sensor was placed above the bearing seat at the motor drive end to collect the vibration acceleration signal of the faulty bearing. The load was about 1 hp (1772 r/min). The sampling frequency was 12 kHz, and the sampling duration was 10 s. Three bearing failure types were considered: ball (rolling element), inner race, and outer race faults. Each fault type was induced in the rolling bearings by electrodischarge machining with different fault diameters (0.007, 0.014, and 0.021 inches, where 1 inch = 25.4 mm) [32]. The selected fault states are listed in Table 1. There are four bearing states: normal, ball fault, inner race fault, and outer race fault. Together with the three fault diameters, this gives 10 conditions in total, and the outer race fault point was in the 6 o'clock direction. Because faults at the same location are usually treated as one class in practice, the number of fault types in Table 1 was 4. This paper used one-hot encoding to label the fault type; for example, the bearing outer race fault was "label = 3" and its code was [0, 0, 0, 1]. In this experiment, to preserve the integrity of the fault information in the sampled data, 1024 data points, covering about two cycles of the experimental platform, were selected as one sample.
This paper downsampled each fault signal in Table 1 to 120,832 data points. Therefore, each fault signal could be divided into 118 samples of 1024 points, giving 1180 samples in total. To ensure sufficient training volume, 90% of the samples were randomly selected as the training set and 10% as the test set. In addition, 10% of the training set was randomly held out as a validation set for adjusting model parameters.
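The sample counts above follow from simple arithmetic, which the sketch below reproduces: 120,832 points cut into 1024-point windows give 118 samples per condition and 1180 in total. The random arrays stand in for the CWRU signals, and the ordering of the 10 conditions and their mapping onto the 4 location classes are assumptions made only for illustration; the rounding used for the split sizes is likewise an assumption.

```python
import numpy as np

def segment(signal, length):
    """Cut a 1-D signal into non-overlapping windows of `length` points."""
    n = len(signal) // length
    return signal[: n * length].reshape(n, length)

def one_hot(labels, num_classes):
    """Map integer labels to one-hot rows, e.g. 3 -> [0, 0, 0, 1]."""
    return np.eye(num_classes)[labels]

rng = np.random.default_rng(0)
num_conditions, points, length = 10, 120_832, 1024

# Stand-in signals; real data would be loaded from the CWRU files.
signals = rng.normal(size=(num_conditions, points))
samples = np.concatenate([segment(s, length) for s in signals])

# Hypothetical condition order: 0 = normal, 1 = ball, 2 = inner race, 3 = outer race.
condition_to_class = np.array([0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
labels = np.repeat(condition_to_class, points // length)

# 10% test, then 10% of the remainder for validation.
order = rng.permutation(len(samples))
n_test = round(0.10 * len(samples))
n_val = round(0.10 * (len(samples) - n_test))
test_idx = order[:n_test]
val_idx = order[n_test:n_test + n_val]
train_idx = order[n_test + n_val:]
y_train = one_hot(labels[train_idx], 4)
```

Under these assumptions the split contains 956 training, 106 validation, and 118 test samples.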

CNN-BLSTM Model.
The parameters of the CNN-BLSTM model, selected through tuning, are shown in Table 2.
A wider kernel in the first convolutional layer can effectively suppress high-frequency noise and improve the performance of the network [35]. Therefore, the kernel size of the first convolutional layer in this paper was set to 8, and the stride was set to 2 to speed up convergence. To prevent overfitting, dropout with a rate of 0.2 was introduced in the second and third convolutional layers. Softmax was used as the classifier, and Adam was selected as the optimizer. The loss function was categorical cross-entropy, and the batch size was 64. To avoid chance results, this paper used the same parameters to carry out 10 independent experiments. To detect the fault state within a relatively short time, each experiment ran for 50 iterations. All experiments were run on the same PC platform, using TensorFlow to build the CNN-BLSTM model; the processor was an Intel(R) Core(TM) i5-6500 CPU @ 3.22 GHz, 3.32 GHz.
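The effect of the first-layer kernel size and stride on the sequence length can be traced with the standard output-length rules. In the sketch below only the first-layer kernel size (8), its stride (2), and the 1024-point input come from the text; the remaining kernel sizes, the pooling widths, and the "same" padding mode are assumed values, since Table 2 is not reproduced here.

```python
import math

def conv_len(n, kernel, stride, padding="same"):
    """Output length of a 1-D convolution or pooling window.

    "same" pads the input so the length is ceil(n / stride);
    "valid" uses no padding.
    """
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

# (conv kernel, conv stride, pool width); only the first (8, 2) is from the text.
stages = [(8, 2, 2), (3, 1, 2), (3, 1, 2)]

length = 1024
trace = [length]
for kernel, stride, pool in stages:
    length = conv_len(length, kernel, stride)        # convolution
    length = conv_len(length, pool, pool, "valid")   # non-overlapping max pooling
    trace.append(length)
```

With these assumed settings the sequence shrinks from 1024 to 256, 128, and 64 points across the three stages, so the BLSTM layer would see a much shorter sequence than the raw signal. Without padding, the first convolution alone would give 509 points instead of 512.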

Experimental Results and Analysis.
Most existing intelligent fault diagnosis methods either use too little data or divide faults of different sizes at the same location into different categories. Since equipment may suffer a sudden destructive failure in operation, a small amount of fault data may not fully reflect the fault states encountered at industrial scale. To explore the influence of too many classes on the diagnosis results, faults of different sizes at the same location were also regarded as different classes. The data in Table 1 were therefore divided into 4 and into 10 categories for experimental comparison. The compared models were the method proposed in this paper, a CNN-LSTM framework in which the BLSTM layer is replaced by an LSTM layer, and a CNN framework without the BLSTM layer. Figure 5 shows the test results of 10 repeated experiments. Tables 3 and 4 show the average accuracy and standard deviation of the test results.
Combining Figure 5 and Tables 3 and 4, we can see the following: (1) Whether the data was classified into 4 or 10 categories, the CNN model had the lowest accuracy and the highest standard deviation, which shows that its stability was poor. (2) When the number of categories was reduced from 10 to 4, both the test accuracy and the stability improved. (3) For both 4 and 10 classes, the CNN-BLSTM model proposed in this paper had the highest average test accuracy and the lowest standard deviation. This shows that adding the BLSTM layer to the CNN structure makes full use of the extracted features and improves the test accuracy, although the calculation time increases accordingly.
As can be seen from Figure 5(b), the mode of the test results was 98.31%. Figure 6 shows the confusion matrices of the three models at an accuracy of 98.31%. The confusion matrix clearly reflects the degree of matching between the true labels and the predicted labels. As shown in Figure 6, none of the three models classified the ball fault entirely correctly, which indicates that the high correlation between different state categories at the same position interferes with fault recognition and thus affects the accuracy of diagnosis.
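For readers who want to see how such a confusion matrix and accuracy value are obtained, the sketch below builds one from label arrays. The labels are invented for illustration and are not the paper's predictions; the class sizes and the two misclassified ball-fault samples are assumptions. They are chosen only because, with a 118-sample test set, 116 correct predictions correspond to 98.31%.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Illustrative labels only: 0 = normal, 1 = ball, 2 = inner race, 3 = outer race.
y_true = np.array([0] * 12 + [1] * 35 + [2] * 35 + [3] * 36)
y_pred = y_true.copy()
y_pred[12] = 2   # one ball-fault sample predicted as inner race
y_pred[13] = 3   # one ball-fault sample predicted as outer race

cm = confusion_matrix(y_true, y_pred, 4)
accuracy = np.trace(cm) / cm.sum()   # correct predictions lie on the diagonal
```

Off-diagonal entries in the ball-fault row show which classes absorb the misclassified samples, which is the kind of reading the text applies to Figure 6.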
In traditional intelligent bearing diagnosis, most methods preprocess the original signal, for example with high-order cumulants (HOCs), WT, the continuous wavelet transform (CWT), or EMD, or apply dimensionality reduction techniques to extract the main features before classification. Although selected data or manually extracted features can indeed improve the accuracy of the test results, this approach requires prior knowledge and destroys the correlation within the data. Therefore, these methods might not be able to achieve real-time condition monitoring of mechanical equipment. Besides, the datasets used in most of these studies were too small to fully reflect the fault states encountered at industrial scale. Compared with these studies, the CNN-BLSTM method proposed in this paper can effectively extract the fault features and identify the fault state in a shorter time. The methods compared in Table 5 use the same baseline dataset. We used a relatively large amount of data for feature learning and achieved higher accuracy without any data preprocessing or feature manipulation. Although the WT-GAN-CNN method proposed in [26] also had a high test accuracy, it used only 6 data types and required WT to preprocess the data. The CWT-CNN-RF method proposed in [40] performs well on 10 data types, but its processing is more complicated. Compared with the methods in [29,32,38,39] that directly use the original signal as input, the CNN-BLSTM method proposed in this paper can effectively extract the fault features and identify the fault state in a shorter time.

Performance under Noise Environment.
When vibration signals of rotating machinery are collected, the complex working environment of the equipment means that some external interference noise is inevitably collected as well. To better simulate unknown real noise and verify the fault detection accuracy of the proposed CNN-BLSTM model on noisy signals, additive white Gaussian noise (AWGN) with different standard deviations was added to the original signal data to form different signal-to-noise ratios (SNRs) [26]. The SNR is defined as

$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10}\left(\frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}\right),$$

where $P_{\mathrm{signal}}$ is the power of the signal and $P_{\mathrm{noise}}$ is the power of the noise. The experiment was implemented as follows. Gaussian white noise was added to the original vibration signals of the selected data in Table 1 to form noisy signals. Noisy signals with SNRs ranging from −4 to 6 dB were then directly input into the model to test the CNN-BLSTM model proposed in this paper. The test results were compared with several other existing methods in the literature [41], and the results are shown in Figure 7. It is clear that the test accuracy of fault detection increased as the SNR increased. When the noise condition was SNR = −4 dB, the other three methods performed poorly, with very low test accuracy, while the proposed CNN-BLSTM model could still achieve an average accuracy of 97.35%. When SNR = 0 dB, the CNN-BLSTM model could reach 100% test accuracy. Across the different SNRs, the CNN-LSTM model also had high test accuracy, but its stability was not as good as that of CNN-BLSTM. In summary, the CNN-BLSTM model performed well against noise in different environments without any preprocessing.
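The noise injection step can be sketched directly from the SNR definition: the noise power needed for a target SNR is the signal power divided by 10^(SNR/10). The code below is a minimal illustration under that definition; the sine wave stands in for a vibration signal, and the function names and the 50 Hz test frequency are assumptions.

```python
import numpy as np

def add_awgn(signal, target_snr_db, rng):
    """Add white Gaussian noise so that the result has the requested SNR."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (target_snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise, noise

def measure_snr_db(signal, noise):
    """SNR in dB from the empirical signal and noise powers."""
    return 10.0 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
t = np.arange(120_000) / 12_000.0          # 10 s at 12 kHz
clean = np.sin(2 * np.pi * 50.0 * t)       # stand-in for a vibration signal
noisy, noise = add_awgn(clean, target_snr_db=-4.0, rng=rng)
measured = measure_snr_db(clean, noise)    # close to -4 dB
```

At −4 dB the noise carries roughly 2.5 times the power of the signal, which gives a sense of how demanding the lowest SNR condition in the experiment is.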

Experiment Setup and Data Description.
To further validate the effectiveness and generalization of the proposed method for bearing fault diagnosis under unknown time-varying speed conditions [42], this paper selected the uOttawa bearing data for further experiments. Each dataset had two experimental settings: bearing condition and varying speed condition.
There were 5 types of bearing conditions: healthy, inner race fault, outer race fault, ball fault, and comprehensive fault, where the comprehensive fault combined ball, inner race, and outer race faults.
There were 4 types of varying speed conditions: (i) increasing speed, (ii) decreasing speed, (iii) increasing then decreasing speed, and (iv) decreasing then increasing speed. The data was the vibration data measured by the acceleration sensor; the sampling frequency was 200,000 Hz, the sampling duration was 10 seconds, and the sampling points in a period were 1024 [31]. This paper selected 2000 data points as a sample, so each fault signal could be divided into 1000 samples, totaling 20,000 samples. The specific composition of the selected fault states is shown in Table 6, with 5 bearing statuses and 4 varying speed conditions.
Table 5: Comparison between other methods in the literature and our proposed model.

Experimental Results and Analysis.
As in Case 1, 90% of the samples were randomly selected as the training set and 10% as the test set. In addition, 10% of the training set was randomly held out as a validation set. Figure 8 shows the CNN-BLSTM results for a run with an accuracy of 99.4%. Figure 8(a) is the ROC curve used as a classification evaluation index; Figure 8(b) is the P-R (Precision-Recall) curve, which uses Recall as the abscissa and Precision as the ordinate.
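Both curves are built from the same ingredients: sweeping a decision threshold over the classifier scores and counting true and false positives at each step. The sketch below shows this for a single binary (one-vs-rest) problem. The labels and scores are made up, the function name is hypothetical, and tied scores are not handled; how the paper aggregates the curves over its five classes is not stated, so the one-vs-rest framing is an assumption.

```python
import numpy as np

def curve_points(y_true, scores):
    """Precision, recall (TPR), and FPR as the threshold sweeps down the scores."""
    order = np.argsort(-scores)          # highest score first
    y = y_true[order]
    tp = np.cumsum(y)                    # true positives accepted so far
    fp = np.cumsum(1 - y)                # false positives accepted so far
    precision = tp / (tp + fp)
    recall = tp / y.sum()
    fpr = fp / (len(y) - y.sum())
    return precision, recall, fpr

# Made-up binary labels (1 = class of interest) and classifier scores.
y_true = np.array([1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.2])
precision, recall, fpr = curve_points(y_true, scores)
```

Plotting `recall` against `fpr` gives the ROC curve, and plotting `precision` against `recall` gives the P-R curve with the axes described in the text.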
In this paper, the CNN-BLSTM model, CNN-LSTM model, CNN model, DNN model, RF, and SVM were compared experimentally. Among them, the numbers of neurons in the layers of the DNN model were 128/256/512/256/32/5, and its other parameters were set the same as in the model proposed in this paper. The kernel function used by the SVM was the Radial Basis Function (RBF). For RF, the parameter n_estimators was set to 275 through GridSearchCV, and the other parameters were left at their defaults. The results of the 10 experiments are shown in Figure 9. The average accuracy and standard deviation are shown in Table 7. It can be seen from Figure 10 that, during the training of the CNN-BLSTM and CNN-LSTM models, the diagnostic accuracy rises as the number of iteration steps increases, and the models show a trend of convergence. Although the CNN and DNN models take a short time to process large-scale data, they are prone to overfitting during training, which affects the diagnosis results.
Combining Figures 9 and 10 and Table 7, we can see that, among all the methods, the CNN-BLSTM method used in this paper had the highest diagnostic accuracy and the best robustness, with an average accuracy of 99.19%. The CNN-LSTM model had the second-highest accuracy, partly because LSTM could prevent overfitting. The CNN model used in this paper also gave good results, but its accuracy was not as high as that of CNN-BLSTM and CNN-LSTM. It is worth noting that SVM had the lowest accuracy among all methods. This was because a small number of support vectors determined the final result. When processing large-scale data samples, SVM was vulnerable to interference from the strong correlation between signal data. As can be seen from the experimental results, RF is a simple and fast method, but its classification results are not as good as those of the proposed method.

Conclusion
To improve the accuracy and stability of fault diagnosis, a feature extraction method that prevents overfitting, namely, CNN-BLSTM, is proposed in this paper. CNN-BLSTM adaptively extracts the fault features from the raw signal data in an end-to-end manner and then uses the BLSTM layer to fuse the feature information, which enhances stability and avoids falling into a local optimum. Taking two sets of experimental bearing vibration data as samples, the validity and feasibility of the proposed method are verified. The results show that this method can accurately extract highly discriminative features directly from the raw input signal. High diagnostic accuracy can still be achieved even when the working condition changes or the ambient noise is strong. The highest fault diagnostic accuracy on the CWRU and uOttawa datasets reaches 100% and 99.55%, respectively, higher than that of other existing fault diagnosis approaches. Therefore, the proposed CNN-BLSTM is an effective feature extraction method for fault recognition, and BLSTM is well suited to preventing model overfitting in classification problems. The proposed fault diagnosis method has higher diagnostic accuracy and meets the application requirements of robustness and generalization.
Data Availability
The data of Case 1 was the public bearing fault dataset obtained from CWRU (https://csegroups.case.edu/bearingdatacenter/home). The data of Case 2 came from the bearing dataset obtained by uOttawa under time-varying speed conditions (https://doi.org/10.17632/v43hmbwxpm.2).

Conflicts of Interest
The authors declare that they have no conflicts of interest.