A Novel Method for Diagnosis of Bearing Fault Using Hierarchical Multitasks Convolutional Neural Networks

Intelligent mechanical fault diagnosis has developed rapidly in recent years owing to the advancement and application of deep learning technologies, and many deep learning network models have been explored for fault classification and diagnosis. However, research on the relationship between fault location, fault type, and fault severity remains limited. In this paper, a novel method for diagnosis of bearing faults using hierarchical multitask convolutional neural networks (HMCNNs) is proposed, taking these relationships into account. The HMCNN model includes a main task and multiple subtasks. In the HMCNN model, a weighted probability is used to reduce the propagation of classification errors among tasks and thereby improve the fault diagnosis accuracy. The validity of the proposed method is verified on bearing datasets. Experimental results show that the proposed method is effective and outperforms the existing methods it is compared with.


Introduction
Rolling bearings, as the key parts of mechanical equipment, are widely used in rail transit equipment, construction machinery, precision machine tools, instrumentation, and other fields. According to statistics, about 40% of rotating machinery faults are caused by bearing faults. Once bearing faults occur, they seriously affect the normal operation of equipment, and they may even cause accidents and economic losses. Therefore, it is necessary to diagnose and monitor bearing faults before anything goes wrong [1,2]. At present, bearing fault diagnosis is usually based on data-driven methods. By collecting motor current signals or bearing vibration signals, fault diagnosis methods are applied to complete fault identification [3,4].
Data-driven fault diagnosis generally includes two steps: fault feature extraction and fault classification. The common methods of feature extraction include Fast Fourier Transformation (FFT) [5], Wavelet Transform (WT) [6], Empirical Mode Decomposition (EMD) [7], Local Mean Decomposition (LMD) [8], and Variational Mode Decomposition (VMD) [9]. The common fault classification algorithms include support vector machine (SVM) [10], BP neural networks [11], Bayesian classifier [12], K-Nearest Neighbor (KNN) [13], Random Forest (RF) [14], and Classification and Regression Tree (CART) [15]. Seryasat et al. [16] presented a diagnosis method based on wavelet transform and FFT to extract the energy and root mean square of different frequency bands, which could accurately and effectively identify bearing faults. Yan and Jia [17] proposed a multidomain feature classification algorithm based on optimized SVM, which included three stages: multidomain feature extraction, feature selection, and fault recognition. The algorithm has high diagnostic accuracy for rolling bearings under different working conditions. Zhang et al. [18] proposed a new method of rolling bearing fault diagnosis based on Variational Mode Decomposition and compared the performance of VMD and EMD in extracting bearing defect features from rolling bearing simulation signals. The VMD method can accurately extract the main mode of the bearing fault signal and is superior to EMD in bearing defect feature extraction. Liu et al. [19] presented a fault diagnosis method for wind turbine bearings based on integral extension local mean decomposition (IELMD), which could effectively process nonstationary signals. Kankar et al. [20] extracted the statistical characteristics of wavelet coefficients and completed the classification of bearing faults in combination with an artificial neural network. Jiang et al. [21] proposed a fault diagnosis method for rolling bearings based on high-order cumulants and a BP neural network, motivated by the fact that the vibration signals of rolling bearings are susceptible to Gaussian noise.
The basic steps of these traditional fault diagnosis methods can be summarized as follows: acquiring fault signals, analyzing the characteristics of fault signals, extracting appropriate features, and selecting appropriate classifiers according to the specific diagnosis problems. This process requires fault diagnosis personnel to have considerable professional knowledge and experience in signal processing. With the development of modern industry, fault monitoring equipment obtains a large amount of data, and the data types are diverse, which brings great challenges to traditional fault diagnosis methods.
Deep learning has been widely used in recent years. A characteristic of deep learning methods is that they can complete feature extraction and classification automatically [22]. Deep learning has recently been introduced into the field of mechanical fault diagnosis to overcome the shortcomings of traditional methods. Zhang et al. [23][24][25] proposed a fault diagnosis method for rolling bearings based on deep convolutional neural networks (CNNs), which avoided manual feature extraction and realized automatic feature learning. Shao et al. [26] proposed an enhanced deep feature fusion method for fault diagnosis of rotating machinery, in which a new deep autoencoder was constructed by combining a Denoising Autoencoder (DAE) with a Contractive Autoencoder (CAE) to improve the feature learning ability. Jia et al. [27] proposed deep normalized convolutional neural networks (DNCNN), which could effectively deal with unbalanced classification problems. Liu et al. [28] proposed an unsupervised fault diagnosis method for rolling bearings based on generative adversarial networks. This method has higher generalization accuracy under noisy and varied workload situations. These deep learning methods have been successfully applied to bearing fault diagnosis from different perspectives and in different application scenarios, and they achieve higher diagnostic accuracy than traditional methods. However, the problem of the low generalization ability of deep neural network models remains unsolved.
The fault diagnosis of a bearing covers fault location, fault type, and fault degree. In existing fault diagnosis methods, samples of all kinds are generally used together to train a single model, and the relationship among these attributes and its impact on the final diagnosis results are rarely considered. For hierarchical classification with deep learning, Yan et al. [29] first proposed the hierarchical deep convolutional neural network model to classify images. It first classifies the easily separated classes coarsely and then classifies them at a fine level [28]. Based on this idea, Guo et al. [30] and Qu et al. [31] proposed a hierarchical intelligent fault diagnosis algorithm based on an adaptive deep convolutional neural network model (ADCNN), which classified bearing fault location first and then classified fault severity. The design of this hierarchical classification model requires training multiple CNN recognition models with pretraining and fine-tuning. It can thus lead to a cumbersome training process, a need for more training samples, interlayer error propagation, and difficulty in extending the model to more levels. Therefore, a hierarchical multitask bearing fault diagnosis method based on deep convolutional neural networks is proposed in this paper. By adding multiple learning tasks to the convolutional neural network, multitask learning of bearing fault diagnosis is realized, and the generalization performance of the proposed model is improved. The main contributions of this paper are as follows:
(1) Based on the CNN fault classification model, the HMCNN model is formed by adding several related classification tasks, representing information of different dimensions, for parallel auxiliary diagnosis. In the proposed model, the final classification results are obtained by fusing the classification results of the main task and of the subtasks of different dimensions according to weights obtained by training, which can reduce the interlayer error propagation between tasks.
(2) The proposed model can extract more valuable features from fewer training samples for fault classification and improve the classification accuracy of the model. In the proposed model, multiple tasks share network parameters and information, and only one network structure needs to be trained, which reduces computational consumption and training complexity. The parallel structure of multiple tasks also has good scalability.
The remainder of the paper is structured as follows. The structure of the hierarchical multitask convolutional neural network (HMCNN) and the main techniques used in it are introduced in Section 2. In Section 3, experiments are carried out to show that HMCNN performs better than traditional intelligent methods and some typical deep learning models. After that, the structure of the HMCNN model is extended to demonstrate its scalability. Lastly, the diagnostic results of the HMCNN model are compared with those of the CNN model and analyzed visually to explore its mechanism. The conclusion of this paper is presented in Section 4.

Proposed Method
In this paper, a novel method called the hierarchical multitask convolutional neural network (HMCNN) is proposed for the intelligent fault diagnosis of bearings. The description of the proposed model is organized in four parts: the CNN model, hierarchical classifiers, multitask learning, and the hierarchical multitask convolutional neural network itself. The HMCNN model needs only one network to be trained to realize multitask classification, and its sharing layer reduces the number of network parameters, thereby reducing the computational load. The multitask learning, hierarchical classification, and joint classification layer design in the HMCNN model can improve the generalization ability of the network. More details are described in the following sections.

Introduction to CNN Model.
The convolutional neural network (CNN) is a kind of feed-forward multistage neural network and one of the most common deep learning models. It mainly contains three kinds of layers: convolutional layers, pooling layers, and fully connected layers. The convolutional layer is designed to extract different features of the input data. The pooling layer, which follows the convolutional layer, reduces the parameters of the network by extracting the local mean or maximum value of its input. A fully connected layer is usually built in the last part of the hidden layers of the convolutional neural network. Its main function is to connect all features and send the output value to the classifier. In bearing fault diagnosis, the CNN is used to extract features from vibration signals and classify them.

Hierarchical Classification.
The main idea of hierarchical fault diagnosis based on a convolutional neural network, as proposed in this paper, is shown in Figure 1. The main structure of hierarchical classification includes the sharing layer, coarse classification layer, fine classification layer, and joint classification layer. The sharing layer reduces the number of network parameters and thus the computation. The coarse classification layer is mainly used for coarse classification of bearing fault location, for example recognizing a bearing as healthy, inner ring fault, or outer ring fault. The fine classification is achieved through the fine classification layer. The joint classification layer receives the fine classification results as well as the coarse classification results and produces a weighted probability as the final classification result. In this weighted probability, $p_c(x_j)$ is the probability of coarse classification made by the coarse classification layer, $p_F(x_j)$ is the probability of fine classification made by the fine classification layer, and $N$ is the number of hierarchical tasks. The coarse classification layer, fine classification layer, and joint classification layer are all based on the softmax classification function for the final classification tasks. In this way, the output of the network is transformed into a probability distribution. The softmax function is described as follows:
$$\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{n} e^{z_k}}$$
where $z_j$ is the logit of the $j$th output and $n$ is the number of categories.
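The softmax function and the idea of a weighted joint probability can be illustrated with a short NumPy sketch. The softmax part follows the standard definition. The fusion rule, however, is only an assumption made for illustration: the names `joint_probability`, `fine_to_coarse`, `w_coarse`, and `w_fine` are hypothetical, the weights are fixed here rather than obtained by training, and the weighted sum followed by renormalization is one plausible way to combine coarse and fine probabilities, not necessarily the rule used in the HMCNN model.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)  # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def joint_probability(p_coarse, p_fine, fine_to_coarse, w_coarse=0.5, w_fine=0.5):
    """Hypothetical fusion of coarse and fine class probabilities.

    fine_to_coarse[i] is the index of the coarse class that fine class i
    belongs to. Each fine probability is blended with the probability of
    its parent coarse class, and the result is renormalized.
    """
    p_coarse = np.asarray(p_coarse, dtype=float)
    p_fine = np.asarray(p_fine, dtype=float)
    idx = np.asarray(fine_to_coarse, dtype=int)
    parent = p_coarse[..., idx]                  # coarse probability per fine class
    fused = w_coarse * parent + w_fine * p_fine  # assumed weighted combination
    return fused / fused.sum(axis=-1, keepdims=True)

# Illustrative values only: two coarse classes, three fine classes.
p_c = softmax([2.0, 0.1])
p_f = softmax([0.5, 0.4, 0.1])
print(joint_probability(p_c, p_f, fine_to_coarse=[0, 1, 1]))
```

With such a rule, a confident coarse decision raises the fused probability of the fine classes under that coarse class, while an uncertain coarse decision does not override the fine classifier outright.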

Multitask Learning.
In this paper, bearing fault diagnosis has multiple learning tasks, such as fault location, fault type, and fault severity. From the perspective of machine learning, multitask learning can be regarded as inductive transfer learning, which can improve the learning performance of a model by using multiple related tasks, including its generalization accuracy, learning speed, and comprehensibility. In this paper, the related learning tasks are fault type and fault location, which assist the final classification of fault severity. In the training process of the model, the joint training method is adopted, which combines the loss functions of multiple tasks and optimizes them together. The joint loss function combines the per-task losses through their coefficients: $k_i$ is the coefficient of task $i$, $\mathrm{loss}_i$ is the loss function of task $i$, $N$ is the number of hierarchical tasks, $m$ is the size of the training minibatch, $p_j^k$ is the predicted output value, and $q_j^k$ is the one-hot vector representing the target distribution.
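A joint loss of this kind can be sketched as follows. The sketch assumes that the total loss is the coefficient-weighted sum of the per-task losses and that each task loss is the minibatch-averaged cross-entropy between the predicted probabilities and the one-hot targets; both forms are inferred from the symbol definitions above rather than taken from a stated equation. The function names and the example coefficients are hypothetical.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Mean cross-entropy over a minibatch.

    p: (m, n) predicted class probabilities for m samples.
    q: (m, n) one-hot target vectors.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)  # guard against log(0)
    q = np.asarray(q, dtype=float)
    return float(-(q * np.log(p)).sum(axis=1).mean())

def joint_loss(task_probs, task_targets, coeffs):
    """Assumed joint loss: sum over tasks of k_i * loss_i."""
    losses = [cross_entropy(p, q) for p, q in zip(task_probs, task_targets)]
    total = sum(k * l for k, l in zip(coeffs, losses))
    return total, losses

# Illustrative minibatch of two samples for a coarse and a fine task.
p_coarse = np.array([[0.8, 0.2], [0.3, 0.7]])
q_coarse = np.array([[1, 0], [0, 1]])
p_fine = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
q_fine = np.array([[1, 0, 0], [0, 0, 1]])
total, per_task = joint_loss([p_coarse, p_fine], [q_coarse, q_fine], coeffs=[0.5, 1.0])
print(total, per_task)
```

Because all tasks contribute to a single scalar, one backward pass updates the shared layers with gradients from every task, which is what allows a single network to be trained for all tasks.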

Hierarchical Multitask Convolutional Neural Network (HMCNN) Model.
The HMCNN model is shown in Figure 1 and consists of the following four parts. The first part is the sharing layer, which consists of two modules. Each module is composed of 2 convolution layers and 1 pooling layer. A 3 × 1 convolution kernel with a stride of 2 is used in all convolutional layers. For the pooling layers, 8 × 1 max-pooling with a stride of 8 is used. The second part is the coarse classification layer, which is connected to the shared layer. It consists of 2 fully connected layers and 1 softmax layer, which are used to complete the classification of bearing fault location, fault type, and fault severity. The third part is the fine classification layer, which is connected to the shared layer and consists of 3 convolution layers, 1 pooling layer, 2 fully connected layers, and 1 softmax layer. It is used to complete the fine classification of bearing faults. The fourth part is the joint classification layer, which receives the fine classification results as well as the coarse classification results and produces a weighted probability as the final classification result. The HMCNN model training parameters are shown in Table 1.

Data Description.
Experimental data were collected from the bearing test rig of Paderborn University in Germany [32]. The experimental data were obtained by the test rig for condition monitoring of rolling bearings. The test rig consisted of several modules: an electric motor (1), a torque-measurement shaft (2), a rolling bearing test module (3), a flywheel (4), and a load motor (5), as shown in Figure 2. The test bearings were ball bearings of type 6203. The bearings were run at a rotational speed of 900 rpm with a load torque of 0.7 Nm and a radial force on the bearing of 1000 N. The sampling frequency of the data acquisition system is 64 kHz. The bearing temperature was kept roughly at 45-50°C. Three kinds of bearing states are used in this experiment: inner ring damage, outer ring damage, and healthy. The details of the data are shown in Table 2, which lists the bearing fault location, damage method, and fault severity. For fault location, H is a bearing with no fault, IR is a bearing with an inner race fault, and OR is a bearing with an outer race fault. The damage methods of the bearings are shown in Figure 3. The bearing damage used in this paper was caused by three different methods: electric discharge machining (trench of 0.25 mm length in rolling direction and depth of 1-2 mm), drilling (diameter: 0.9 mm, 2 mm, and 3 mm), and manual electric engraving (damage length from 1 to 4 mm).
As described in the documentation of the dataset, 20 original vibration time-series signals were acquired for each bearing, each of which recorded about 256,000 data points. In this experiment, 2048 data points were used to construct a sample. For each health condition of the bearings, 2000 samples were used in the training set and 500 samples were used in the test set. The vibration signals of each health state are shown in Figure 4. The experimental data are normalized by min-max normalization, and the normalization formula is as follows:
$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}$$
where $x_i$ is the value of the $i$-th point of the sample data, $x_{\max}$ is the maximum value of the sample data, $x_{\min}$ is the minimum value of the sample data, and $x_i'$ is the normalized value. In this paper, the proposed model is implemented with the TensorFlow deep learning framework. The experiment was completed on a computer with an i7 8700 CPU, 16 GB of memory, and an NVIDIA GTX 1070 GPU.
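The sample construction and normalization described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the non-overlapping segmentation, the per-sample scope of the minimum and maximum, the handling of constant segments, and the function names `segment` and `min_max_normalize` are all assumptions. Under the non-overlapping assumption, a signal of 256,000 points yields 125 samples of 2048 points, so 20 signals would give 2500 samples, which would be consistent with the 2000 training and 500 test samples reported per condition.

```python
import numpy as np

def segment(signal, length=2048):
    """Cut a 1-D signal into non-overlapping samples of `length` points.

    Trailing points that do not fill a complete sample are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size // length
    return signal[: n * length].reshape(n, length)

def min_max_normalize(samples):
    """Scale each sample (row) to the range [0, 1]."""
    samples = np.asarray(samples, dtype=float)
    lo = samples.min(axis=-1, keepdims=True)
    hi = samples.max(axis=-1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant rows map to 0 instead of NaN
    return (samples - lo) / span

# Synthetic stand-in for one recorded vibration signal.
rng = np.random.default_rng(0)
raw = rng.normal(size=256_000)
samples = min_max_normalize(segment(raw))
print(samples.shape)
```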

Diagnosis Results of HMCNN.
The experiments are divided into three parts. The first part is a comparison among the proposed model, traditional methods, and intelligent algorithms based on deep learning to demonstrate the superiority of the proposed model in terms of generalization performance. The second part extends the HMCNN model to different numbers of tasks and compares the results. The third part compares the HMCNN model with the ADCNN model. From Figure 5, it can be seen that the classification accuracy of traditional models such as SVM and DNN is less than 80%, while the classification accuracy of deep learning models such as CNN and LSTM is about 90%. The bearing fault diagnosis accuracy of the HMCNN model reaches 99.7%. Compared with traditional bearing fault diagnosis models, it does not need manual feature extraction and has higher diagnosis accuracy; compared with the current deep learning models, it has better generalization ability.
The HMCNN model and the CNN model are also compared and analyzed. The CNN and HMCNN models use the same optimization method and training parameters in this paper. The accuracy of the HMCNN and CNN models, recorded every 50 steps, is shown in Figure 6. Figure 6 shows that the bearing fault diagnosis accuracy of the HMCNN model is 99.7% and that of the CNN model is 92.1%. Compared with the CNN model, HMCNN not only has higher bearing fault diagnosis accuracy but also needs fewer training steps to reach its highest diagnosis accuracy. The confusion matrices of the experimental results are compared in Figure 7. Figure 7 shows that, compared with the CNN model, the HMCNN model mainly reduces the confusion among outer ring bearing faults, which improves the diagnosis accuracy of bearing faults.
In the training process, the learning speed of the HMCNN model and the CNN model is compared, as shown in Figure 8. The convergence rate (learning speed) of the HMCNN model is about twice that of the CNN model.
In order to further verify the generalization performance of the HMCNN model, the diagnosis accuracy of HMCNN and of other models under training sets of different sizes is also compared here. The comparison results are shown in Table 3 and Figure 9. As the number of training samples decreases, the recognition accuracy of all methods decreases to varying degrees. From Table 4, it can be seen that the HMCNN model has better recognition accuracy with smaller training sets, and the recognition accuracy can reach 96.7% in the case of only 500 training samples.
In addition, the performance differences between the HMCNN model and the SVM, BPNN, CNN, and LSTM models under noise are compared. Comparison results are shown in Table 3 and Figure 10. It can be seen that the diagnosis accuracy of HMCNN is 99.1% in a noise environment (SNR = 10 dB), while the diagnosis accuracy of the other models is not more than 90%. At the same time, the diagnosis accuracy of the HMCNN model in a noise environment (SNR = −2 dB) is more than 90%. HMCNN therefore has good antinoise performance.
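A noisy test signal at a prescribed signal-to-noise ratio can be generated as in the sketch below. The text does not state how the noise was produced, so additive white Gaussian noise scaled from the measured signal power is an assumption here, as are the function name `add_noise` and the synthetic sine signal. The sketch uses the usual definition SNR (dB) = 10 log10(P_signal / P_noise).

```python
import numpy as np

def add_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the result has roughly the given SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))  # from the SNR definition
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

# Synthetic stand-in for one 2048-point vibration sample.
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 40.0 * np.pi, 2048))
noisy_10db = add_noise(clean, 10.0, rng)       # mild noise
noisy_minus2db = add_noise(clean, -2.0, rng)   # noise power exceeds signal power
```

At a negative SNR the noise power is larger than the signal power, which is why accuracy at −2 dB is a demanding test of robustness.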

The Comparative Analysis of HMCNN with Different Task Numbers.
In the second part, we study and analyze the relationship between the number of hierarchical tasks in the HMCNN model and its diagnosis accuracy. According to the dataset, the HMCNN model is used to learn one, two, and three classification tasks (bearing fault location, fault type, and fault severity), and the resulting models are named HMCNN1, HMCNN2, and HMCNN3, respectively. The comparison results are shown in Figures 11 and 12. The results show that both the HMCNN3 and HMCNN2 models can achieve a high diagnosis accuracy. The accuracy of the HMCNN2 model is 0.8% higher than that of the HMCNN1 model. Compared with CNN, HMCNN contributes more to the accuracy improvement by adding the task of bearing fault location. In the HMCNN3 model, the diagnosis accuracy of fault type reaches 99.7%, which shows that the joint diagnosis of bearing fault location, fault type, and fault severity is effective and feasible, and the final diagnosis accuracy is improved to a certain extent. The comparison between the HMCNN1, HMCNN2, and HMCNN3 models shows that the proposed model can be extended to diagnose multiple tasks. In practical applications, the location, type, and severity of a bearing fault can all be output by the HMCNN model, which provides more detailed guidance for fault maintenance.

The Comparative Analysis of HMCNN Model with ADCNN Model.
In the third part, the ADCNN model proposed by Guo et al. [30] is compared with the HMCNN model. The idea of the ADCNN model for bearing fault diagnosis is to identify the location of the bearing fault first and then, on this basis, identify the fault severity at each location. Accuracy results of the ADCNN and HMCNN models for diagnosing bearing fault locations and final fault severity are shown in Tables 5 and 6, respectively. We can see that the hierarchical diagnosis of the ADCNN model lets errors in bearing fault location diagnosis spread to the result of bearing fault severity diagnosis, and the more tasks there are, the more serious the error propagation. The HMCNN model has a shared layer and a weighted joint classification layer, which can solve the problem of error propagation and make the model more scalable.

Visualization Analysis.
The principle of the HMCNN model is further analyzed by t-SNE visualization. In this paper, the test set data are used as input, and the output data of the pooling layers in the HMCNN model are extracted. These output data are reduced to two-dimensional feature vectors by t-SNE and then plotted as scatter plots, with classes represented by different colors, as shown in Figure 13. The visualization results show that, as the number of network layers of the HMCNN model increases, the separation of the features extracted from the original signals becomes more and more obvious. At last, the output features of the softmax classifier form seven distinct distributions (the final classification of bearing faults).
By comparing the outputs of the corresponding pooling layers of the HMCNN and CNN models, as shown in Figure 14, it can be seen that HMCNN has a better classification effect than CNN. The visual output of the third pooling layer of HMCNN and CNN shows that the CNN model does not clearly separate the KA07 and KA08 bearings, while the HMCNN model separates them clearly. This shows that, in the third pooling layer of the HMCNN model, the bearing fault severity is clearly classified. In the HMCNN network, the second pooling layer is essentially the last layer of the shared layer. In the shared layer, the proposed model learns the features of bearing fault location, fault type, and fault severity. The HMCNN model can share more fault information than the CNN model through multitask learning. In particular, the task of fault location focuses the attention of the network on fault location information that might otherwise be neglected, which enhances the ability of the HMCNN model to classify bearing fault location, so that the KI07 bearing is clearly separated by the HMCNN model in the second pooling layer. After the shared layer, the HMCNN model only needs to distinguish the KA07 and KA08 bearings, whereas the CNN model still needs to distinguish the KI07, KA07, and KA08 bearings, which may be why the final recognition accuracy of the CNN model is lower than that of the HMCNN model.
From the analysis of the learning process of the HMCNN model, multitask learning may improve the generalization accuracy of the model by using the information hidden in the training signals of multiple tasks as an inductive bias. Multitask learning plays a role similar to regularization and reduces the risk of model overfitting. At the same time, it reduces the ability of the model to fit random noise and gives the model better generalization performance.

Conclusions
The hierarchical multitask learning CNN model (HMCNN) is proposed, which incorporates hierarchical classification. Only one model needs to be trained to achieve multitask classification. Compared with the experimental results of other models, the HMCNN model improves the accuracy of the final fault diagnosis, and the diagnosis accuracy reaches 99.7%. Compared with the CNN model, the HMCNN model has a faster learning speed. We compare the diagnosis accuracy of HMCNN with that of other models for different numbers of training samples and in noise environments, and the HMCNN model has better diagnosis accuracy than the other models. It is shown that the HMCNN model can be extended to diagnose multiple tasks. The fault location, fault type, and fault severity of the bearing are all given, which can provide more detailed guidance for fault maintenance. Compared with the ADCNN model, the HMCNN model solves the problem of error propagation and makes the model scalable.
Through the visual analysis of the learning process of the HMCNN and CNN models, the reason why HMCNN has higher generalization accuracy is further explored. The HMCNN model shares more fault information than the CNN model. In

Data Availability
The data that support the findings of this study are available at https://mb.uni-paderborn.de/en/kat/mainresearch/datacenter/bearing-datacenter/data-sets-anddownload/?tdsourcetag=s_pcqq_aiomsg. At the same time, the data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.