A Novel Multimode Fault Classification Method Based on Deep Learning

Because of varying loads and changing environments, machinery often operates in multiple modes. The features of the observed data vary as the mode changes, so mode partition is a fundamental step before fault classification. This paper proposes a multimode fault classification method based on deep learning. It constructs a hierarchical DNN model whose first hierarchy is devised specifically for mode partition. In the second hierarchy, a separate DNN classification model is constructed for each mode to obtain more accurate fault classification results. To provide helpful information for predictive maintenance, an additional DNN is constructed in the third hierarchy to further classify a given fault in a given mode into several classes of different fault severity. An application to multimode fault classification of rolling bearings shows the effectiveness of the proposed method.


Introduction
Rolling bearings are pivotal components of rotating machines, which are widely used in large-scale automated industrial equipment. Mechanical failures caused by rolling bearings may lead to abnormal behavior of the rotating machinery system, resulting in huge economic losses and even casualties [1][2][3][4][5]. Therefore, timely and precise fault classification is critical for bearing monitoring.
Methods for mechanical equipment fault classification can be divided into qualitative model based methods, quantitative model based methods, and data-driven methods [6,7]. Qualitative and quantitative model based methods require a precise mathematical model or a large amount of expert knowledge of the system, which inevitably limits their application to fault classification. Over the past two decades, data-driven methods have been widely used for fault detection in complex systems. Instead of requiring extensive prior knowledge, a data-driven approach detects faults using only the measured data of the complex system [8][9][10][11]. The most commonly used data-driven fault classification methods are based on statistical feature extraction or on machine learning. However, methods based on statistical feature extraction can only detect faults; they cannot classify them. For fault classification, machine learning methods such as the Support Vector Machine (SVM) and the artificial neural network (ANN) are preferable.
In mechanical system fault classification, vibration signals are usually used as the data source because the vibration spectrum is sensitive to equipment failure. Because signals from mechanical equipment are nonstationary, nonlinear, large-scale, high-dimensional, and polluted by noise, precise fault feature extraction, which is the most critical factor for the accuracy of mechanical equipment monitoring, is usually very difficult [12][13][14]. Several feature extraction methods that combine signal processing with machine learning have been put forward for fault classification of mechanical equipment. Widodo and Yang extracted frequency-domain features as the input of an SVM to detect machinery faults [13]. For the case where the number of samples is small and the signals are nonstationary, Yu et al. proposed a bearing fault classification method combining SVM and Empirical Mode Decomposition (EMD) [10]. Hu et al. extracted the energy of each wavelet packet transform (WPT) node as the preextracted feature to develop a combined WPT-SVM method for more accurate bearing fault classification [15]. Wang et al. also used WPT to extract nonstationary characteristics of the bearing's vibration signal as the preextracted feature of an ANN [16]; the method uses the nonlinear classification ability and self-organizing ability of the ANN to classify and diagnose bearing faults. Yang and Tang proposed a method combining an expert system with a back propagation neural network (BPNN) [17], which exploits the advantages of both to detect bearing failure. Since bearing vibration signals are susceptible to Gaussian noise, Jiang et al. used higher-order statistics as the feature vector of a BPNN to improve its performance in bearing fault classification [18].
However, SVM and BPNN share the shortcomings of shallow learning methods. SVM is inherently a binary classifier and is inefficient for multiclass classification, especially when the number of observed samples is very large; selecting an appropriate kernel function and scale parameter also usually requires considerable experience. ANN suffers from its own defects: (1) it converges slowly and can easily converge to a local optimum, and (2) it is ineffective at learning features from complex nonlinear data, which usually results in poor classification accuracy. In summary, shallow learning methods such as SVM and BPNN cannot adequately extract the features contained in high-dimensional nonstationary data [19]. Moreover, as the load varies, a bearing can work in different steady states, which is called the "multimode" phenomenon. Current research on machine learning based classification has not taken the multimode problem into account.
For a multimode process, the data features of each mode are different [20], but current research on bearing fault classification usually treats the process as a single mode to simplify data processing. This leads to inaccurate classification results because the extracted features are inaccurate [21][22][23]. Therefore, mode partition should be carried out first, so that fault features can be extracted accurately within each separate mode. Zhang et al. proposed an improved k-means clustering algorithm based on existing mode partition methods [20]. Song et al. studied how to distinguish stable modes from transition modes without knowing the number of modes in advance [24]. Zhao et al. separated multiple modes according to a diversity analysis of the operational phases and established an online monitoring method along multiple batch directions [25]. Zhang et al. used a modal subspace separation method to deal with multimode monitoring problems [26]. By using the characteristics of the subspaces, different modes can be separated well, which makes more accurate multimode fault classification possible.
Unfortunately, existing mode partition methods and the corresponding fault monitoring methods were each developed for a specific industrial process [20][24][25][26][27], so a more universal method is required. Deep learning is a promising general-purpose feature extraction tool that has attracted wide attention from scholars in various fields [21][28][29][30]. Compared with shallow learning, deep learning handles feature extraction and nonlinear big data well by constructing a deep network [31,32]. Through the unsupervised greedy layer-by-layer training algorithm and BP-based global parameter fine-tuning, a deep neural network (DNN) can avoid the local optimum problem and also ease the limitations caused by a small number of labeled samples and by poor generalization ability. Deep learning was first proposed by Hinton and Salakhutdinov in 2006 [22]. Its excellent feature extraction capability has also attracted the attention of fault classification experts. Lu et al. successfully used the feature extraction ability of a deep neural network to diagnose bearing faults [33]; the method overcomes the shortcoming that traditional feature extraction methods cannot discover unknown fault types timely and effectively. Jia et al. used a deep neural network to monitor bearing failures [34]. Gan et al. proposed a fault classification method based on a hierarchical neural network [11]. By constructing a two-layer neural network, the method not only locates the position of the bearing fault but also effectively identifies the fault size at that position.
Deep learning, as one of the most popular machine learning methods, has brought fundamental change to the field of artificial intelligence. However, its application is still in its infancy, and many issues demand improvement. For example, the data in [11] are derived from a single mode, without considering the multimode observations caused by load variation. Therefore, that method cannot fully extract the fault features contained in the observations of different modes, which is essential for accurate multimode fault classification.
To solve the above-mentioned problems, this paper presents a multimode fault classification method based on deep learning. First, a DNN model is constructed, and the trained network is used for mode partition. Then, a new set of DNNs is constructed for the observation data of each mode, and the trained networks are used to determine which component has failed, that is, to recognize the fault location. Finally, for a given fault in a given mode, another DNN is constructed to classify the observation data by fault size.
The remainder of this paper is organized as follows. Section 2 reviews the theory of deep learning. Section 3 develops a multimode fault classification method based on DNN by hierarchically constructing DNN models with different purposes. Section 4 demonstrates the effectiveness of the proposed method through experimental analysis. Section 5 concludes the paper.

Theory of Deep Learning
Deep learning is a method based on unsupervised feature learning. We use deep learning theory to construct the DNN. The DNN training process consists of two steps: (1) pretraining the network layer by layer with an unsupervised learning algorithm, which helps the DNN mine features efficiently from raw data; (2) fine-tuning the parameters of the whole network with the back propagation algorithm, which optimizes the ability of the DNN to extract features from raw data. In this paper, the DNN is pretrained by stacking multiple AutoEncoders (AEs).

AutoEncoder.
An AutoEncoder is an unsupervised machine learning structure that can be viewed as a three-layer feedforward artificial neural network, as shown in Figure 1. It consists of an input layer, a hidden layer, and an output layer. It is a special neural network with a single hidden layer whose target output is equal to its input. The network parameters are adjusted by repeated training such that the reconstructed output approximates the input with high accuracy. An AutoEncoder is composed of two parts: an encoder and a decoder. The encoder network maps the input data from the high-dimensional space into a low-dimensional space; the decoder network then maps the low-dimensional data back into the high-dimensional space, which reconstructs the input at the output. Therefore, the low-dimensional data can be used as a feature representation of the input data.
Given an unlabeled dataset $\{x_i^m\}$ ($i = 1, 2, \ldots, n$; $m = 1, 2, \ldots, M$) consisting of $n$ observation features or variables, each observation variable has $M$ samples. The encoder network encodes the sample $x^m = [x_1, x_2, \ldots, x_n]^T$ to the hidden activation value $h^m$ with the encoder function $f_\theta$. The encoder process is described as follows:
$$h^m = f_\theta(x^m) = \sigma(W x^m + b), \qquad (1)$$
where $f_\theta$ is the encoder function, the Sigmoid function $\sigma$ is usually taken as the activation function in the encoder process, $W$ is the weight matrix between the input layer and the hidden layer, $b$ is the bias vector of the encoder network, and $\theta = \{W, b\}$ is the set of connection parameters between the input layer and the hidden layer. The Sigmoid function is
$$\sigma(z) = \frac{1}{1 + e^{-z}}. \qquad (2)$$
Similarly, the decoder network uses the feature $h^m$ obtained from the encoder network to produce a reconstruction $\hat{x}^m$ that should equal the input $x^m$. The decoder process is described as follows:
$$\hat{x}^m = g_{\theta'}(h^m) = \sigma(W' h^m + d), \qquad (3)$$
where $g_{\theta'}$ is the decoder function, $\sigma$ is the activation function of the decoder process, $W'$ is the weight matrix between the hidden layer and the output layer, $d$ is the bias vector of the decoder network, and $\theta' = \{W', d\}$.
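The encoder and decoder mappings above can be illustrated with a minimal NumPy sketch. The layer sizes, the random initialization, and the names `encode`, `decode`, and `W_dec` are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    """Encoder: map an input vector x to the hidden code h."""
    return sigmoid(W @ x + b)

def decode(h, W_dec, d):
    """Decoder: map the hidden code h back to a reconstruction of x."""
    return sigmoid(W_dec @ h + d)

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 3                      # hypothetical layer sizes
W = rng.normal(scale=0.1, size=(n_hidden, n_features))
b = np.zeros(n_hidden)
W_dec = rng.normal(scale=0.1, size=(n_features, n_hidden))
d = np.zeros(n_features)

x = rng.random(n_features)                       # one sample scaled to [0, 1]
h = encode(x, W, b)                              # low-dimensional feature
x_hat = decode(h, W_dec, d)                      # reconstruction of x
print(h.shape, x_hat.shape)
```

Before training, `x_hat` is not yet close to `x`; the training procedure described next is what makes the hidden code a useful feature.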
The essence of the AE training process is to optimize the network parameters $\theta$ and $\theta'$. In order to make the output $\hat{x}^m$ as close as possible to the input $x^m$, we characterize the degree of approximation between input and output by the reconstruction error $L(x, \hat{x}; \theta, \theta')$ and minimize it. The optimization process is described below:
$$(\theta, \theta') = \arg\min_{\theta, \theta'} L(x, \hat{x}; \theta, \theta'). \qquad (4)$$
In each training iteration, the gradient descent method is used to update the training parameters $\theta$ and $\theta'$ of the AE network. The parameter updates are as follows:
$$\theta \leftarrow \theta - \eta \frac{\partial}{\partial \theta} L(x, \hat{x}; \theta, \theta'), \qquad \theta' \leftarrow \theta' - \eta \frac{\partial}{\partial \theta'} L(x, \hat{x}; \theta, \theta'), \qquad (5)$$
where $\eta$ is the learning rate and the partial derivatives $(\partial/\partial\theta) L(x, \hat{x}; \theta, \theta')$ and $(\partial/\partial\theta') L(x, \hat{x}; \theta, \theta')$ can be calculated with the back propagation algorithm.
A DNN can be viewed simply as a neural network with multiple hidden layers formed by stacking many AutoEncoders. The model first uses bottom-up unsupervised learning to extract features layer by layer. Supervised learning is then applied to fine-tune the parameters of the whole network, so that the most essential characteristics can be extracted from the original signals. The structure of the DNN is shown in Figure 2.
First, the DNN is pretrained with the unsupervised greedy layer-by-layer training algorithm. The first AutoEncoder, AE$_1$, is trained by giving an unlabeled dataset $X$ as the input of its encoder network. The encoded feature $h_1$ is the hidden layer of AE$_1$. The training parameter $\theta_1$ is obtained by taking the same $X$ as the target output of AE$_1$. Then $h_1$ is used as the input of the second AutoEncoder (AE$_2$), and AE$_2$ is trained to acquire the network training parameter $\theta_2$; $h_2$ is the hidden layer of AE$_2$ and can be viewed as the feature extracted by AE$_2$. After that, $h_2$ is chosen as the input of the third AutoEncoder (AE$_3$). The process is repeated to get the hidden layer feature $h_N$ of the $N$th AutoEncoder (AE$_N$) and the corresponding network training parameter $\theta_N$.
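The gradient descent training of a single AE and the greedy layer-by-layer stacking described above can be sketched as follows. This is a minimal illustration under stated assumptions: the reconstruction error is taken to be the mean squared error, full-batch gradient descent is used, and the data, layer sizes, learning rate, and function names (`train_autoencoder`, `pretrain_stack`) are hypothetical rather than the paper's settings. Samples are stored as rows here, so the encoder reads `X @ W + b`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=300, seed=0):
    """Train one AE on X (M samples x n features) by batch gradient descent.

    Assumption: mean squared reconstruction error. Returns the encoder
    parameters (W, b) and the list of per-epoch losses.
    """
    rng = np.random.default_rng(seed)
    M, n = X.shape
    W = rng.normal(scale=0.1, size=(n, n_hidden))
    b = np.zeros(n_hidden)
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n))
    d = np.zeros(n)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W + b)                  # encoder output
        X_hat = sigmoid(H @ W_dec + d)          # decoder output
        err = X_hat - X
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Back propagation: output layer first, then hidden layer.
        delta_out = err * X_hat * (1.0 - X_hat) / M
        delta_hid = (delta_out @ W_dec.T) * H * (1.0 - H)
        # Gradient descent update of decoder and encoder parameters.
        W_dec -= lr * (H.T @ delta_out)
        d -= lr * delta_out.sum(axis=0)
        W -= lr * (X.T @ delta_hid)
        b -= lr * delta_hid.sum(axis=0)
    return (W, b), losses

def pretrain_stack(X, hidden_sizes):
    """Greedy pretraining: each AE is trained on the codes of the previous one."""
    params, history, layer_input = [], [], X
    for j, n_hidden in enumerate(hidden_sizes):
        (W, b), losses = train_autoencoder(layer_input, n_hidden, seed=j)
        params.append((W, b))
        history.append(losses)
        layer_input = sigmoid(layer_input @ W + b)   # hidden code feeds the next AE
    return params, history, layer_input

rng = np.random.default_rng(1)
X = 0.2 + 0.3 * rng.random((60, 12))            # hypothetical data in [0.2, 0.5]
params, history, top_features = pretrain_stack(X, hidden_sizes=[8, 4])
print([W.shape for W, _ in params], top_features.shape)
```

Only the encoder parameters are kept after each stage, since the decoder serves only to define the reconstruction target during pretraining.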
Second, a classifier is added on top of the DNN. In the pretraining process, feature information is extracted by unsupervised learning, but the pretrained DNN has no classification ability, so a classifier must be added as its top layer. In this paper, a Softmax classifier is used as the output layer of the DNN. Suppose the training dataset is $\{x^m\}$ ($m = 1, 2, \ldots, M$) and the label is $y^m \in \{1, 2, \ldots, K\}$. The probability $P(y = k \mid x)$ for each category $k$ ($k = 1, 2, \ldots, K$) can be calculated via the following hypothesis function:
$$P(y^m = k \mid x^m; \theta) = \frac{\exp(\theta_k^T x^m)}{\sum_{l=1}^{K} \exp(\theta_l^T x^m)}, \qquad (6)$$
where $\theta$ is the model parameter of Softmax. As with the AE model, in order to guarantee the performance of the classifier, the classifier model parameter is trained by minimizing the cost function $J_\theta$. The cost function of the Softmax training process is shown in (7), where the top network parameter $\theta_{N+1}$ is obtained by minimizing $J_\theta$:
$$J_\theta = -\frac{1}{M} \sum_{m=1}^{M} \sum_{k=1}^{K} 1\{y^m = k\} \log P(y^m = k \mid x^m; \theta). \qquad (7)$$
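As a small numerical illustration of the Softmax output layer, the sketch below evaluates the class probabilities and the average negative log-likelihood cost. The toy inputs, the zero-initialized parameter matrix `Theta`, and the 0-based class labels are assumptions for illustration only.

```python
import numpy as np

def softmax_probs(X, Theta):
    """Return P(y = k | x) for every row of X; Theta has one column per class."""
    scores = X @ Theta
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability only
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def softmax_cost(X, y, Theta):
    """Average negative log-likelihood of the integer labels y (0-based here)."""
    P = softmax_probs(X, Theta)
    return -np.mean(np.log(P[np.arange(len(y)), y]))

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # three toy feature vectors
y = np.array([0, 1, 1])                               # toy labels
Theta = np.zeros((2, 3))                              # 2 features, K = 3 classes

P = softmax_probs(X, Theta)
cost = softmax_cost(X, y, Theta)
print(P[0], cost)
```

With all parameters equal to zero every class receives probability 1/3, so the cost equals log 3; training lowers this value by moving probability mass toward the labeled class.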
Finally, the network is fine-tuned. In order to guarantee the accuracy of feature extraction and the classification effectiveness of the output layer, the training parameters of the whole DNN are fine-tuned with the supervised back propagation algorithm, using a limited number of labeled samples. Fine-tuning is completed by minimizing the error $E(\Theta)$. The procedure for the parameter update is as follows:
$$E(\Theta) = \frac{1}{M} \sum_{m=1}^{M} L(y^m, \hat{y}^m), \qquad \Theta \leftarrow \Theta - \alpha \frac{\partial E(\Theta)}{\partial \Theta}, \qquad (8)$$
where $\hat{y}^m$ represents the actual output value of the network, $y^m$ is the corresponding label, $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N, \theta_{N+1}\}$ is the parameter set of the whole network, the back propagation algorithm is used to update $\Theta$, and $\alpha$ is the learning rate of the fine-tuning process. The fine-tuning process uses the labeled data to improve the performance of the DNN.

Multimode Fault Classification Model Based on Deep Learning
Many practical systems are multimode processes. For a multimode process, the latent features extracted from the observations of each steady mode also vary, so the observations must be separated into operation modes before features can be extracted accurately. Mode partition is therefore a fundamental step before fault classification. In this paper, this problem is solved by constructing a hierarchical DNN model whose first hierarchy is devised specifically for mode partition. This yields an effective mode partition for the multimode process, which in turn increases the accuracy of DNN-based fault classification. The framework of the three-layer DNN is shown in Figure 4.
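The hierarchical idea can be pictured as a routing of each sample through three levels of classifiers. The sketch below is only a schematic under stated assumptions: the classifiers are stand-in functions rather than trained DNNs, the name `classify_hierarchically` is hypothetical, and returning `None` when no severity model exists for a state (for example a normal state) is an illustrative choice, not something stated in the paper.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]

def classify_hierarchically(
    x: Sequence[float],
    mode_clf: Classifier,
    location_clfs: Dict[int, Classifier],
    severity_clfs: Dict[Tuple[int, int], Classifier],
) -> Tuple[int, int, Optional[int]]:
    """Route one sample through the three hierarchies.

    Returns (mode, fault_location, severity).
    """
    mode = mode_clf(x)                                   # hierarchy 1: mode partition
    location = location_clfs[mode](x)                    # hierarchy 2: one model per mode
    severity_clf = severity_clfs.get((mode, location))   # hierarchy 3: per mode and fault
    severity = severity_clf(x) if severity_clf is not None else None
    return mode, location, severity

# Toy stand-ins for trained networks (hypothetical decision rules).
mode_clf = lambda x: int(x[0] > 0.5)
location_clfs = {0: lambda x: int(x[1] > 0.5), 1: lambda x: int(x[1] > 0.2)}
severity_clfs = {(0, 1): lambda x: 2, (1, 1): lambda x: 0}

result_a = classify_hierarchically([0.9, 0.3], mode_clf, location_clfs, severity_clfs)
result_b = classify_hierarchically([0.1, 0.2], mode_clf, location_clfs, severity_clfs)
print(result_a, result_b)
```

The point of the structure is that each lower-level model only ever sees samples from one mode, so it can specialize on the features of that mode.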
The detailed steps for multimode fault classification are as follows.
Step 1 (mode partition). In this step, we build a DNN model to determine the mode label of each sample. The whole dataset is used as the input of the multimode classification model. The mode partition process is described in detail as follows.
(1) Construct a new DNN$_1$ with $N$ hidden AE layers, as described in (9), and initialize the training parameters of DNN$_1$. In (9), $\theta_{11} = \{W_1, b_1\}$, where $W_1$ is the weight matrix and $b_1$ is the bias vector; $N_{11}, N_{12}, \ldots, N_{1N}$ are the numbers of hidden layer neurons in DNN$_1$; the network configuration is represented by Tr$_1$; and $X_1$ denotes the training dataset. The number of neurons in the input layer of DNN$_1$ is given in (10).
(2) Train DNN$_1$ to obtain the network parameter $\theta_1$. Unsupervised layer-by-layer feature extraction based on the training dataset $X_1$ is applied to the $N$-level AE defined in (9).
(3) Partition the modes with the trained DNN$_1$. Once the test samples are obtained, compute the class probabilities of each test sample via the trained Net$_1$. Then use (14) to assign each test sample to a mode:
$$\mathrm{Mode}(i) = \arg\max_{k} P(y_i = k \mid x_i), \qquad (14)$$
where $i = 1, 2, \ldots, M$ and $k$ ($k = 1, 2, \ldots, K$) is the mode type of the sample. $\mathrm{Mode}(i)$ denotes the mode label of the $i$th test sample.
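The mode assignment in step (3) amounts to taking, for each test sample, the mode with the largest output probability. A minimal sketch with made-up probabilities (the array `probs` is hypothetical, not experimental data):

```python
import numpy as np

# Hypothetical Softmax outputs of the trained first-hierarchy network for
# three test samples and four modes; each row sums to 1.
probs = np.array([
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.20, 0.60, 0.10],
    [0.25, 0.40, 0.20, 0.15],
])

# The assigned mode is the index of the largest probability in each row;
# adding 1 converts NumPy's 0-based index to a 1-based mode label.
mode = probs.argmax(axis=1) + 1
print(mode)   # [1 3 2]
```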
Compare the mode partition label $\mathrm{Mode}(i)$ with the actual mode label $\mathrm{Label}(i)$ to determine the misclassification number as
$$E_1 = \mathrm{size}(S_{\mathrm{miss}}), \qquad (15)$$
where size is the operation that returns the size of a set and $S_{\mathrm{miss}}$ is the misclassification set defined by
$$S_{\mathrm{miss}} = \{\, i : \mathrm{Mode}(i) \neq \mathrm{Label}(i) \,\}. \qquad (16)$$
Step 2 (fault source location). For a given mode partitioned in Step 1, we can further locate the fault source. The procedure in Step 2 is analogous to Step 1 and is described below.
(1) According to the mode partition result, build the second hierarchy of the model, which comprises a set of $K$ DNNs; $X_{2;k}$ ($k = 1, 2, \ldots, K$) denotes the training dataset of the $k$th DNN$_2$.
The parameter initialization mechanism of DNN$_2$ is the same as in Step 1.
Compute the misclassification number $E_{2;k}$ ($k = 1, 2, \ldots, K$) of the $k$th mode. The misclassification number of this classification step can then be computed via
$$E_2 = \sum_{k=1}^{K} E_{2;k}.$$
Step 3 (fault severity recognition). The third hierarchy is devised to distinguish the fault severity. Construct the third deep network Net$_3$. The misclassification number for a given fault in a given mode can be computed via
$$E_{3;k} = \sum_{l=1}^{F} E_{3;k;l}, \qquad E_3 = \sum_{k=1}^{K} E_{3;k},$$
where $E_{3;k;l}$ is the misclassification number of the $l$th fault location in the $k$th mode, $F$ is the number of fault locations, $E_{3;k}$ is the misclassification number over all fault locations in the $k$th mode, and $E_3$ is the misclassification number of this step over all $K$ modes.
Step 4 (accuracy computation of the whole multimode classification network). In this paper, the classification accuracy of the hierarchical DNN is measured by the number of misclassifications. The final accuracy is calculated from the ratio of the total number of misclassifications to the total number of samples, as given in (21) and (22). Combining (21) with (22) yields the final accuracy of the proposed DNN-based multimode fault classification, where $M$ is the total number of samples. The flow chart of the proposed multimode fault classification method based on the three-layer DNN is depicted in Figure 5.
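The counting in Steps 1 to 4 can be sketched as below. Two points are assumptions of this sketch rather than statements from the paper: a sample is charged to the first hierarchy at which its prediction is wrong (so no sample is counted twice), and the overall accuracy is taken as one minus the ratio of the summed misclassification counts to the number of samples. The function name and the toy labels are hypothetical.

```python
import numpy as np

def hierarchical_accuracy(true, pred):
    """Count misclassifications per hierarchy and return the overall accuracy.

    true, pred: integer arrays of shape (M, 3) with columns
    (mode, fault location, fault severity).
    Assumption: a sample is charged to the first hierarchy at which it is
    wrong, so the three counts never double-count a sample.
    """
    true, pred = np.asarray(true), np.asarray(pred)
    wrong = true != pred                                   # (M, 3) booleans
    e1 = int(wrong[:, 0].sum())                            # wrong mode
    e2 = int((~wrong[:, 0] & wrong[:, 1]).sum())           # right mode, wrong location
    e3 = int((~wrong[:, 0] & ~wrong[:, 1] & wrong[:, 2]).sum())
    accuracy = 1.0 - (e1 + e2 + e3) / len(true)
    return (e1, e2, e3), accuracy

# Toy labels for five samples: (mode, location, severity).
true = [[1, 2, 1], [1, 3, 2], [2, 1, 0], [2, 2, 3], [3, 1, 0]]
pred = [[1, 2, 1], [1, 3, 1], [2, 2, 0], [1, 2, 3], [3, 1, 0]]
errors, acc = hierarchical_accuracy(true, pred)
print(errors, acc)
```

In this toy case one sample fails at each hierarchy, so three of the five samples are misclassified overall.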

Application to Rolling Bearing Fault Classification
Rolling bearings play an important role in rotating machinery. The health condition of the bearing directly affects the reliability and stability of the whole system. A rolling bearing experimental platform is used to verify the effectiveness of the hierarchical DNN multimode fault classification method, and the performance of the proposed method is compared with that of traditional methods such as DNN, BPNN, SVM, hierarchical BPNN, and hierarchical SVM, as detailed in Section 4.3.

Data Description.
In this case, we collect the vibration signals of the bearing drive end under different loads. The collected dataset covers 4 modes, corresponding to motor loads of 0 hp, 1 hp, 2 hp, and 3 hp, as shown in Table 1. In each mode there are four states (inner race fault, outer race fault, roller fault, and normal), with 3 different fault sizes for each fault state, giving 10 different fault types in a single mode. This paper selects 200 samples for each fault type; each sample contains 2048 observation points. 100 samples are randomly selected as the training data, and the other 100 samples are used as the testing data. We apply the Fast Fourier Transform (FFT) to each sample to get 2048 Fourier coefficients. Because of the symmetry of the Fourier coefficients, we take the first 1024 coefficients as the new sample. In total, the dataset contains 8000 samples (4 modes, 10 fault types per mode, and 200 samples per fault type). In order to compare the proposed hierarchical network with a single-layer network and to explore the effect of different sample numbers on the network, the sample number of each DNN for a given mode is listed in Table 2. In addition, the original time-domain waveforms of the 10 fault types in mode 1 of dataset A are shown in Figure 7. To reduce the effect of randomness, the experiment was repeated 20 times. The initialized parameters of the DNN pretraining process are shown in Table 3.
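The FFT preprocessing of one sample described above can be sketched as follows. The synthetic signal is a stand-in for a real vibration record, and using the magnitude of the coefficients as the feature is an assumption of this sketch; the text only states that the first 1024 coefficients are kept.

```python
import numpy as np

fs = 48_000                      # sampling frequency in Hz, as stated for the platform
n_points = 2048                  # observation points per sample
t = np.arange(n_points) / fs

# Hypothetical stand-in for one vibration sample: a 1.5 kHz tone plus noise.
rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 1500 * t) + 0.1 * rng.standard_normal(n_points)

coeffs = np.fft.fft(signal)      # 2048 complex Fourier coefficients
# For a real-valued signal the spectrum is conjugate-symmetric, so the first
# half of the coefficients carries all of the magnitude information.
features = np.abs(coeffs[: n_points // 2])
freqs = np.fft.fftfreq(n_points, d=1 / fs)[: n_points // 2]

print(features.shape)
print(freqs[features.argmax()])  # frequency of the dominant spectral line
```

With a 48 kHz sampling rate and 2048 points the frequency resolution is 23.4375 Hz, so the 1.5 kHz test tone falls exactly on bin 64.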

Results of Fault Classification.
The networks are trained with the stochastic gradient descent method; the maximum number of DNN iterations in the three hierarchies is 500, 300, and 300, respectively. Three traditional methods, BPNN, SVM, and DNN, are simulated and compared with the proposed multimode fault classification approach to verify its effectiveness. In addition, hierarchical BPNN (HBPNN) and hierarchical SVM (HSVM) are compared with hierarchical DNN (HDNN). BPNN uses the gradient descent method to update the network weights and bias parameters; the SVM uses a radial basis function kernel and a one-versus-one training mechanism. The training mechanism of HBPNN and HSVM is the same as that of HDNN.
Table 4 compares the fault classification accuracies in the time domain and the frequency domain. It can be seen from Table 4 that rotating machinery faults are more readily distinguished in the frequency domain, so we use the FFT to preprocess the original data.
Table 5 compares the fault classification results after mode partition. It can be seen from line 2 and line 3 that HDNN obtains more accurate classification both for fault source location and for fault severity recognition, which indicates that mode partition is a critical step in multimode fault classification.
The hierarchical models built on BPNN and SVM confirm this conclusion. Comparing line 2 with line 4 and line 6, we can see that HDNN is significantly superior to the other hierarchical machine learning models because it achieves better mode partition accuracy, as shown in Table 6. We can also conclude that the traditional BPNN method outperforms the traditional SVM method in the large sample case, whereas the accuracy of HSVM is higher than that of HBPNN because SVM does well in small sample learning.
To demonstrate the performance of the proposed multimode classification method, the hierarchical machine learning methods are also evaluated. As can be seen from Table 6, the mode partition accuracy of the proposed HDNN method reaches 99.96%, which is superior to that of HBPNN and HSVM.
In view of the excellent performance of the proposed multimode classification method, we found that the performance ... As listed in Table 3, in each training process the number of neurons in the last hidden layer is 100; that is to say, the feature dimension is 100, which is too large to be visualized. Therefore, PCA is used as a data compression tool to reduce the feature dimension. In this paper, we use the first three principal components to plot the scatter chart of the fault source location features extracted by HDNN, as shown in Figure 8. Figure 8 shows the scatter plots of the fault features extracted by HDNN after mode partition, while Figure 9 shows the scatter plots of the fault features extracted by DNN without mode partition. From Figures 9 and 10, we can see that some fault features overlap, which results in an unsatisfactory fault classification result.
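The reduction of 100-dimensional features to three principal components for plotting can be sketched with a singular value decomposition. The random feature matrix and the function name `pca_project` are hypothetical; in the experiments the rows would be the last-hidden-layer features of the trained network.

```python
import numpy as np

def pca_project(F, n_components=3):
    """Project the rows of F onto the leading principal components."""
    centered = F - F.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:n_components].T
    explained = (s[:n_components] ** 2) / np.sum(s ** 2)   # variance ratio per component
    return scores, explained

rng = np.random.default_rng(0)
F = rng.standard_normal((200, 100))   # hypothetical 100-dimensional features
scores, explained = pca_project(F)
print(scores.shape)                   # (200, 3)
```

Each row of `scores` gives the three coordinates of one sample in a 3-D scatter plot; `explained` indicates how much of the total variance those three axes retain.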
Figure 10 is the scatter plot of the feature extracted for different modes.We can see from Figure 10 that HDNN does well in multimode fault feature extraction which will greatly affect the accuracy of the successive fault classification.
In summary, the proposed multimode classification method can accurately extract the different fault features based on its strong nonlinear characterization ability.
In general, the effectiveness of a fault classification method is affected by the number of training samples. Figure 11 displays the fault classification accuracy of DNN and HDNN in two cases. The red line denotes the classification accuracy when more samples are used as training data. The black line denotes the classification accuracy when fewer samples (only 1/2 of the first case) are used as training data. In addition, the line with " * " is the simulation result of HDNN and the line with "◻" is the simulation result of the traditional DNN.
From Figure 11, it can be clearly seen that (1) the fault classification accuracy of HDNN does not vary much between the two cases, while that of DNN is greatly affected by the number of training samples, and (2) in both cases the fault classification accuracy of HDNN is much better than that of DNN. We therefore conclude that HDNN is the more robust method for multimode bearing fault classification when fewer training samples are available.

Conclusions
In this paper, a novel multimode fault classification method based on DNN is developed. The main idea is to construct a hierarchical DNN model whose first hierarchy is devised specifically for mode partition. The second hierarchy, comprising a set of DNNs, extracts the features of the different modes separately and diagnoses the fault source precisely. Another set of DNNs distinguishes the severity of a given fault in a given mode, which is helpful for predictive maintenance of the machinery equipment. A rolling bearing experimental platform is used to verify the effectiveness of the proposed method.

Figure 2: The structure of DNN.

Figure 3: Framework of fault classification based on DNN.
Figure 5: Flow chart of multimode classification based on DNN.

Figure 6: Experimental platform for acquiring the vibration signals of rolling bearing.

Experimental Platform.
The experimental datasets are obtained from the Case Western Reserve University Bearing Data Center in the United States [35]. The experimental platform is shown in Figure 6. It consists of a 2 hp motor, a power meter, an electronic controller, a torque sensor, and a load motor. The vibration signals of the drive end of the motor are collected by an acceleration sensor and serve as the experimental datasets for bearing fault classification. In this experiment, the acceleration sensor collects the vibration signals under loads of 0 hp, 1 hp, 2 hp, and 3 hp, respectively, at a sampling frequency of 48 kHz. There are four types of bearing health condition: (1) normal condition; (2) inner race fault; (3) outer race fault; (4) roller fault. The sizes of the bearing faults were 0.007 in, 0.014 in, and 0.021 in, respectively.

Figure 7: Observation of original signals corresponding to 10 fault types.

Figure 8: Scatter plots of principal components for the feature of fault classification; (a)-(d) represent four modes: corresponding to Mode 1, Mode 2, Mode 3, and Mode 4, respectively.

Figure 9: Scatter plots of principal components for fault features with traditional DNN method.

Figure 11: Robustness of the fault classification method to the sample number of training data with 20 trials.

Table 1: Four modes of rolling bearing.

Table 2: Data description of the dataset for a given mode.
The proposed hierarchical DNN structure is applied to bearing fault classification. Dataset A contains 8000 samples covering 4 different modes, 4 fault positions in each mode, and 40 health conditions in total. The health conditions of a rotating machinery system under multiple modes, multiple conditions, multiple fault types, and large sample data are thus simulated to demonstrate the performance of the proposed method.

Table 4: Accuracy of classification in time domain and frequency domain.

Table 5: Fault severity classification result comparison after mode partition.

Table 6: The classification results of the second hierarchy of the proposed model.