Bearing Fault Diagnosis Based on Deep Belief Network and Multisensor Information Fusion

1 School of Mechanical and Electrical Engineering, Central South University, Changsha 410083, China
2 Key Laboratory of Knowledge Processing and Networked Manufacturing, Hunan University of Science and Technology, Xiangtan 411201, China
3 Light Alloy Research Institute, Central South University, Changsha 410083, China
4 Hunan Provincial Key Laboratory of Health Maintenance for Mechanical Equipment, Hunan University of Science and Technology, Xiangtan 411201, China


Introduction
Bearings are critical components with a broad range of applications in mechanical equipment. Because of overload, fatigue, wear, corrosion, and other causes, bearings are easily damaged during machine operation. In fact, more than 50% of rotating machine malfunctions are related to bearing faults [1,2]. A rolling bearing fault may lead to severe equipment vibration, equipment shutdown, production stoppage, and even casualties. In general, the early weak fault of a bearing is complicated and hard to detect [3,4]. Therefore, bearing condition monitoring and analysis are very important, because they can reveal early weak faults of the bearing and limit the fault damage in time.
Recently, fault detection and diagnosis of bearings have attracted considerable attention. Among the various bearing fault diagnosis methods, vibration signal analysis is one of the principal and most useful tools [2]. In vibration-based bearing fault diagnosis, two kinds of approaches have proven effective: signal processing and pattern recognition [1,3]. Conventional signal processing techniques such as the fast Fourier transform (FFT), wavelet transform (WT), and empirical mode decomposition (EMD) have been applied to bearing fault diagnosis with some success [5,6]. For pattern recognition, artificial intelligence and machine learning methods such as fuzzy logic, support vector machines (SVM), and artificial neural networks (NN) are extensively used and studied [7,8]. However, most research has focused on the analysis of a single vibration signal. When a single vibration sensor is used, the fault characteristics are very weak and the useful information is limited, so intricate signal processing and feature extraction are required, and the diagnosis accuracy is sometimes unstable.
To improve bearing diagnosis accuracy, some researchers have proposed using multiple signals. At present, the fusion of multisource signals is mainly studied at three levels: data level, feature level, and decision level. Data-level fusion primarily combines different measurements of the diagnosed object, such as temperature, pressure, and vibration signals [9,10]. This requires various kinds of sensors and instruments for data acquisition, so the monitoring cost is high and the operation is complicated. Feature-level fusion of signals of the same kind requires complex signal analysis and weighted calculation [11,12]. These methods have shortcomings such as poor real-time performance and weak generalization ability. At the decision level, the intelligent approaches reported in the literature include expert systems, decision trees, and SVM [13,14]. However, these are all shallow learning methods, and their learning ability is limited.
Recently, deep learning has become popular in artificial intelligence and machine learning [15]. As a key framework of deep learning, the deep belief network (DBN) is primarily composed of stacked restricted Boltzmann machines (RBM); an RBM is a generative stochastic neural network that can learn a probability distribution over abundant data [16]. In 2006, Hinton and colleagues used contrastive divergence to speed up the RBM training process, which greatly improved the learning efficiency of the DBN. The essence of the DBN is its capability to extract features automatically through a successive learning process: the features mined from different aspects of the data in the lower levels serve as input for the next layer [17,18]. In addition, the DBN completes the learning process with unsupervised pretraining followed by supervised fine-tuning. Its hierarchical structure therefore gives the DBN greater mapping capability and broad adaptability. Owing to these advantages, the DBN has achieved good results in areas such as natural language understanding, image processing, speech recognition, and document recognition [18-20].
Lately, the DBN has found preliminary application in the field of fault diagnosis. Shao et al. [21] used particle swarm optimization to tune the structure of the DBN and applied it to simulated and experimental signals of a rolling bearing, obtaining more accurate and robust results than other intelligent methods. Tamilselvan et al. [22] first presented a multisensor diagnosis methodology that used the DBN for system health diagnosis of, for example, aircraft engines and electric power transformers. Gan et al. [23] constructed a two-layer DBN for rolling-element bearing fault diagnosis, and experiments showed that the DBN gave highly reliable results compared with those obtained by SVM and BPNN. Lei et al. [24] proposed a deep learning method for multistage gear fault diagnosis, which can adaptively extract useful fault characteristics from the original data and achieve higher diagnostic accuracy than existing methods. Tran et al. [25] presented an approach that combines the DBN with multiple information sources for fault diagnosis of reciprocating compressors.
This paper focuses on the early weak fault of rolling bearings and applies the DBN to integrate the time-domain features of multiple vibration signals. The remainder of this paper is organized as follows. In Section 2, the methodology of the deep belief network is introduced. In Section 3, the process of multivibration signal fusion is described. In Section 4, the bearing test rig is explained and experiments are conducted for the proposed method. In Section 5, the implementation of a classifier based on the DBN model is presented, and the obtained results and their evaluation are described. Finally, conclusions and future work are given in Section 6.

Deep Belief Network
2.1. Deep Belief Network Architecture. The DBN is an energy-based probabilistic generative model that comprises multiple layers of restricted Boltzmann machines (RBM) and a backpropagation neural network (BPNN) [16]. Figure 1 shows the fundamental structure of the DBN; the multilayered architecture ensures that the DBN can be trained through bottom-up learning in a sequence of RBMs and top-down fine-tuning by the BPNN [17].
The restricted Boltzmann machine, the building block of the DBN, consists of a layer of visible (or input) units and a layer of hidden (or output) units. As every unit is binary, it is trained through the activation probabilities. The units in the same layer are not connected to each other but have symmetric connections to the units in the other layer. In the DBN, the hidden layer of one RBM becomes the visible layer of the next RBM, so the stacked RBMs form a successive hierarchy.
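In an RBM, a visible node is denoted by $v_i$ and a hidden node by $h_j$. The weight between $v_i$ and $h_j$ is denoted by $w_{ij}$. The visible and hidden nodes have biases represented by the vectors $c$ and $b$, respectively. The $b_j$, $c_i$, and $w_{ij}$ of all RBMs make up the parameter set $\theta$ of the DBN. The values of $\theta$ define a probability distribution over the joint states of the visible and hidden nodes through an energy function, which in the standard form for a binary RBM is

$$E(v, h; \theta) = -\sum_{i}\sum_{j} w_{ij} v_i h_j - \sum_{i} c_i v_i - \sum_{j} b_j h_j. \tag{1}$$

The ultimate purpose of DBN training is to find the best $\theta$, which minimizes the model energy error and brings the model to an equilibrium state. The energy function is used to define the joint probability distribution of $v$ and $h$ as follows:

$$P(v, h; \theta) = \frac{\exp(-E(v, h; \theta))}{Z(\theta)}, \qquad Z(\theta) = \sum_{v}\sum_{h} \exp(-E(v, h; \theta)). \tag{2}$$

Since an RBM has no intralayer connections, the conditional probability distributions of the visible and hidden nodes can be calculated by

$$P(v_i = 1 \mid h; \theta) = \sigma\Big(c_i + \sum_{j} w_{ij} h_j\Big), \tag{3}$$

$$P(h_j = 1 \mid v; \theta) = \sigma\Big(b_j + \sum_{i} w_{ij} v_i\Big), \tag{4}$$

where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic sigmoid function.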
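As an illustration of the quantities defined above, the following minimal sketch evaluates the energy and the two conditional probabilities for a tiny binary RBM. It assumes the standard binary RBM forms, with `c` as the visible biases and `b` as the hidden biases as in the text; the function names and the toy weights are hypothetical and chosen only for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def energy(v, h, w, c, b):
    """E(v, h) = -sum_ij w[i][j] v[i] h[j] - sum_i c[i] v[i] - sum_j b[j] h[j]."""
    interaction = sum(w[i][j] * v[i] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_term = sum(ci * vi for ci, vi in zip(c, v))
    hidden_term = sum(bj * hj for bj, hj in zip(b, h))
    return -interaction - visible_term - hidden_term

def p_visible_given_hidden(h, w, c):
    """P(v_i = 1 | h) for every visible node i."""
    return [sigmoid(c[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(c))]

def p_hidden_given_visible(v, w, b):
    """P(h_j = 1 | v) for every hidden node j."""
    return [sigmoid(b[j] + sum(w[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]

# Toy RBM with 2 visible and 2 hidden nodes (values are arbitrary).
w = [[0.5, -0.2], [0.1, 0.3]]
c = [0.0, 0.0]
b = [0.0, 0.0]
print(energy([1, 0], [1, 1], w, c, b))
print(p_hidden_given_visible([1, 0], w, b))
```

Because there are no intralayer connections, each conditional factorizes over nodes, which is why every list entry can be computed independently.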

The DBN Training Process.
Generally, the DBN training procedure includes two parts: pretraining and fine-tuning. Pretraining is an unsupervised learning procedure that uses unlabeled data to train each individual RBM. Fine-tuning is a supervised learning process that uses the backpropagation algorithm to further adjust the parameters.
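In pretraining, each layer is trained by the RBM rules. Since the RBM has binary units, it can be trained by stochastic gradient descent on the negative log-likelihood of the training data. The gradients of the log-likelihood are as follows:

$$\frac{\partial \log P(v;\theta)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}, \quad \frac{\partial \log P(v;\theta)}{\partial c_i} = \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}}, \quad \frac{\partial \log P(v;\theta)}{\partial b_j} = \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}}, \tag{5}$$

where $\langle\cdot\rangle_{\text{data}}$ denotes an expectation over the data distribution and $\langle\cdot\rangle_{\text{model}}$ is an expectation over the distribution defined by the model. With the RBM property, it is easy to compute an unbiased sample of $\langle\cdot\rangle_{\text{data}}$. However, obtaining an unbiased sample of $\langle\cdot\rangle_{\text{model}}$ is quite difficult [23]. In practice, RBM learning follows an approximation to this gradient called contrastive divergence (CD) [17], in which $\langle\cdot\rangle_{\text{model}}$ is replaced by $k$ iterations of Gibbs sampling:

$$\frac{\partial \log P(v;\theta)}{\partial w_{ij}} \approx \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{k}, \tag{6}$$

where one iteration of alternating Gibbs sampling consists of updating all visible nodes in parallel by using (3) and subsequently updating all hidden nodes in parallel by (4).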
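In fact, one-step Gibbs sampling has been shown to perform surprisingly well [17]. Based on (6), the update rules for all parameters are given by the following equation, where $\eta$ represents the learning rate, whose value is between 0 and 1:

$$\Delta w_{ij} = \eta\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{k}\big), \quad \Delta c_i = \eta\big(\langle v_i\rangle_{\text{data}} - \langle v_i\rangle_{k}\big), \quad \Delta b_j = \eta\big(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{k}\big). \tag{7}$$

In the training process, the dataset is usually divided into minibatches with a small number of data vectors, and the values of $\theta$ are updated after each minibatch is processed. To stabilize the RBM learning procedure, a momentum ($m$) is often used in updating the synaptic weights and biases. With momentum, the update of $\theta$ at the current epoch $t$ is tied to the update at the preceding epoch and is calculated as

$$\Delta\theta^{(t)} = m\,\Delta\theta^{(t-1)} + \eta\,\frac{\partial \log P(v;\theta)}{\partial \theta}, \tag{8}$$

where the gradient is approximated as in (6).

After the bottom-up successive learning, the next step of DBN training is top-down fine-tuning.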
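The contrastive divergence update with momentum described above can be sketched for a single training vector as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses one Gibbs step (CD-1), uses activation probabilities rather than sampled states in the negative phase, updates only the weights (the biases would follow the same pattern), and takes the default learning rate 0.01 and momentum 0.02 from the experimental settings reported later in the paper. All function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(probs, rng):
    """Draw binary states from Bernoulli probabilities."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]

def hidden_probs(v, w, b):
    return [sigmoid(b[j] + sum(w[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, w, c):
    return [sigmoid(c[i] + sum(w[i][j] * h[j] for j in range(len(h))))
            for i in range(len(c))]

def cd1_step(v0, w, c, b, dw_prev, lr=0.01, momentum=0.02, rng=random):
    """Apply one CD-1 weight update in place and return the increment.

    The returned increment is passed back as dw_prev on the next call,
    which is how the momentum term links consecutive updates.
    """
    ph0 = hidden_probs(v0, w, b)      # positive phase (data-driven)
    h0 = sample(ph0, rng)
    v1 = visible_probs(h0, w, c)      # one Gibbs step: reconstruction
    ph1 = hidden_probs(v1, w, b)      # negative phase (model-driven)
    dw = [[momentum * dw_prev[i][j]
           + lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
           for j in range(len(b))] for i in range(len(c))]
    for i in range(len(c)):
        for j in range(len(b)):
            w[i][j] += dw[i][j]
    return dw

# Toy usage: 3 visible nodes, 2 hidden nodes, all parameters start at zero.
w = [[0.0, 0.0] for _ in range(3)]
c = [0.0, 0.0, 0.0]
b = [0.0, 0.0]
dw = [[0.0, 0.0] for _ in range(3)]
dw = cd1_step([1.0, 0.0, 1.0], w, c, b, dw, rng=random.Random(0))
```

In minibatch training, the bracketed difference would be averaged over the vectors of the minibatch before the update is applied.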

Multisignal Fusion with DBN
Multisensor information fusion technology can obtain more accurate and richer fault features from vibration signals [12]. However, in conventional information fusion, the signal processing stage requires mastery of many signal processing techniques, combined with rich engineering experience, to extract fault features. Meanwhile, in pattern recognition, traditional machine learning contains only a single nonlinear transformation structure and cannot adaptively integrate multiple sources of information [20,26].
In this paper, we apply the deep belief network (DBN) to adaptively fuse multiple vibration signals. The proposed bearing fault diagnosis method has four main stages: multichannel signal acquisition, feature extraction, information fusion, and fault recognition.
As shown in Figure 2, first, the vibration signals are acquired by each sensor. Second, time-domain characteristics are extracted from the original signal of every individual sensor. Third, without any manual selection, the feature data of all sensors are fed into the DBN to generate an appropriate DBN classifier. Finally, the integrated information is used to train or test the classifier, which then outputs the diagnosis results and completes the fault diagnosis.
Since the DBN has a hierarchical structure that can extract features from various aspects of the data through a layer-by-layer learning procedure [17], multi-information fusion based on the deep belief network can dispense with complex signal processing and extensive expert experience [24]. It uses unsupervised learning with RBMs to extract features directly from the multiple vibration signals, then uses the best parameters to build the DBN and complete the multi-information integration.
However, the structure of the DBN is determined by the number of hidden nodes and hidden layers. If the DBN structure is too simple, its learning ability is so poor that it cannot effectively integrate the multi-information. If the structure is too complicated, it not only wastes running time but also causes problems such as overfitting, local extrema, and training failure [26]. Therefore, a method based on the data reconstruction error is used to determine the DBN structure for information fusion.
Figure 3 shows the optimization process for the DBN structure. The reconstruction error is computed from the model outputs and the target label data. At the beginning of the procedure, the multichannel signal information is fed into the DBN and three limits are initialized: the maximum number of hidden nodes, the maximum number of hidden layers, and the maximum allowed reconstruction error. Then, the DBN calculates the reconstruction error of the training dataset by the RBM learning rules. If the reconstruction error is less than the allowed maximum, the optimization finishes and outputs the parameters ($\theta$) of the DBN. Otherwise, the number of hidden nodes or hidden layers is increased. If these numbers exceed their maximum values, the procedure selects the structure with the best reconstruction error in the history and builds the DBN for multi-information fusion.
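The structure search described above can be sketched as a simple loop. The sketch below is illustrative only: the source does not specify the order in which nodes and layers are increased, so growing the node count first and then the layer count is an assumption, and `recon_error` is a hypothetical caller-supplied function that would pretrain a DBN of the given shape and return its reconstruction error.

```python
def search_structure(recon_error, max_nodes, max_layers, tol, node_step=1):
    """Grow the hidden structure until recon_error(layers, nodes) < tol.

    Returns (error, layers, nodes). If the threshold is never met before
    the limits are reached, the best structure seen so far is returned.
    """
    best = None  # lowest-error (error, layers, nodes) in the search history
    for layers in range(1, max_layers + 1):
        for nodes in range(node_step, max_nodes + 1, node_step):
            err = recon_error(layers, nodes)
            if best is None or err < best[0]:
                best = (err, layers, nodes)
            if err < tol:
                return best  # threshold met: stop the search early
    return best  # limits exceeded: fall back to the best in history

# Toy error model (an assumption for demonstration): error shrinks as the
# network grows. A real run would pretrain the RBMs instead.
def toy_error(layers, nodes):
    return 1.0 / (layers * nodes)

found = search_structure(toy_error, max_nodes=12, max_layers=2, tol=0.05)
fallback = search_structure(toy_error, max_nodes=12, max_layers=2, tol=0.0)
```

With the toy error model, the first call stops as soon as the error drops below 0.05, while the second never meets its threshold and returns the best structure found within the limits.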
Table 1 summarizes the procedure of bearing fault diagnosis using multi-information fusion with the DBN. As shown in the table, the first step is acquiring the vibration signals from the multiple channels and collecting vibration data from each sensor. As the raw samples are nonlinear and nonstationary, it is necessary to extract some features from each sample. Then, the preprocessed vibration data are divided into training and testing datasets. The DBN structure is optimized by the reconstruction error of the training dataset, which yields a suitable DBN to accomplish the multi-information fusion.

Experimental Setup
To assess the validity of the proposed method, a bearing experimental platform is set up as shown in Figure 4. The bearing fault simulation platform (model QPZ-II) was produced by Qian Peng Company in China. As shown in Figure 4, the test rig mainly consists of a motor, a belt coupling, a bearing pedestal, and so on. The bearing is installed in the pedestal, and three magnetic acceleration sensors, labeled sensor 1, sensor 2, and sensor 3, are mounted on the pedestal. Sensor 1 is located on the vertical side of the bearing pedestal; sensors 2 and 3 are located on the lateral side and the front of the bearing pedestal, respectively. In the experiments, various typical faulty bearings can be installed and removed for multivibration collection.
The test bearings are produced by Harbin Bearing Manufacturing Company, China. The bearing designation is NU205, which has 13 cylindrical rollers. The inner diameter is 25 mm, the outer diameter is 52 mm, and the thickness of the bearing is 15 mm. As shown in Figure 5, four experiments are carried out, one under each of the following bearing health conditions: inner race fault, outer race fault, ball fault, and normal. All faults are produced by linear cutting with electrical discharge machining; the cutting diameter is 0.5 mm and the cutting depth is 0.3 mm.
During testing, a variable-speed motor directly drives a shaft. The belt on the right of the shaft drives the coupling, which runs at the same speed as the motor. In the experiment, the sampling frequency is 10,000 Hz, the bearing speed is 1200 rpm, and the sampling time is 5 seconds.
Following the steps shown in Table 1, each experiment continuously acquired 50,000 signal points, during which the bearing rotated 100 cycles. The signal points of one rotation cycle are taken as one sample, so 500 signal points constitute a data sample. Four conditions are defined for classification, giving 400 (100 × 4) training samples in the dataset. Then, we randomly selected 200 samples to constitute the test dataset. The dataset description is shown in Table 2. When the rolling bearing has local damage, it causes abrupt changes in the vibration signal. Different damage positions usually produce different changes in the vibration signal. Figure 6 shows the amplitude waveforms of the rolling bearing in different conditions.
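The sample counts above follow directly from the acquisition settings (10,000 Hz, 1200 rpm, 5 s). The short sketch below reproduces the arithmetic and splits one recording into per-revolution samples; the function names are hypothetical and the zero-filled signal is only a placeholder for real measurements.

```python
def points_per_revolution(sampling_hz, speed_rpm):
    """Number of signal points recorded during one shaft revolution."""
    return int(sampling_hz * 60 / speed_rpm)

def split_into_revolutions(signal, sampling_hz, speed_rpm):
    """Cut a recording into consecutive, non-overlapping one-revolution samples."""
    n = points_per_revolution(sampling_hz, speed_rpm)
    return [signal[k:k + n] for k in range(0, len(signal) - n + 1, n)]

signal = [0.0] * (10000 * 5)  # placeholder for 5 s of data at 10,000 Hz
samples = split_into_revolutions(signal, 10000, 1200)
print(len(samples), len(samples[0]))  # 100 500
```

Four bearing conditions with 100 samples each give the 400 training samples stated in the text.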
It is seen from Figure 6 that the vibration signal waveforms are similar, and it is difficult to distinguish the various fault types of rolling bearings. So, some time-domain features are extracted from the original signals as follows:
(1) $x_i$ ($i = 1, 2, \ldots, m$) is the discrete-time series of the $i$th sensor, and the vibration signal of one bearing rotation cycle is $x_i = [x_1, x_2, \ldots, x_N]$, $N = (f \times 60) \div h$, where $f$ is the sampling frequency (Hz) and $h$ is the rotational velocity (rpm).
(2) Compute the 14 time-domain features of Table 3 from each $x_i$.
(3) Combine the features of all $m$ sensors into a single feature vector.
(4) Normalize the feature vector $T = [t_1, t_2, \ldots, t_K]$, $K = m \times 14$.
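The feature extraction and normalization steps can be illustrated with the sketch below. It is a hedged example, not the paper's feature set: the 14 features of Table 3 are not reproduced here, so six commonly used time-domain statistics stand in for them, and min-max scaling over the dataset is an assumed normalization because the text does not name the method. All function names are hypothetical.

```python
import math

def time_domain_features(x):
    """Six common time-domain statistics of one sample (illustrative subset).

    Assumes the sample is not constant, otherwise std and rms may be zero.
    """
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    peak = max(abs(v) for v in x)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)
    crest = peak / rms
    return [mean, rms, std, peak, kurtosis, crest]

def fuse(sensor_samples):
    """Concatenate the features of all sensors into one input vector."""
    fused = []
    for x in sensor_samples:
        fused.extend(time_domain_features(x))
    return fused

def min_max_normalize(rows):
    """Scale every feature column of a dataset to the range [0, 1]."""
    cols = list(zip(*rows))
    lo = [min(col) for col in cols]
    hi = [max(col) for col in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

# Toy usage with three sensors and a 4-point sample per sensor.
x = [1.0, -1.0, 1.0, -1.0]
vector = fuse([x, x, x])
print(len(vector))  # 18 here; 3 x 14 = 42 with the paper's full feature set
```

No feature is selected or weighted by hand in this scheme: every statistic of every sensor enters the fused vector, and the DBN is left to learn which ones matter.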

Multisignals and Individual Signals.
As explained in the signal preprocessing of Section 4, 14 classical time-domain features are computed from the raw signals.
To illustrate the benefit of multivibration fusion, the method is also evaluated on each single sensor under the same conditions. These cases are denoted sensors 1, 2, and 3, corresponding to the three individual sensors. The DBN input vector in a single-sensor experiment has only the 14 features extracted from that sensor's vibration signal, whereas the DBN input vector in the multisensor experiment has 42 features. The DBN structures are shown in Table 4.
The DBN structure for the multisensor case is 42-12-12-4. That is, the input layer contains 42 nodes and the output layer contains 4 nodes, as determined by the dimensions of the input and output data. There are two hidden layers in the architecture, each containing 12 hidden neurons. For sensors 1, 2, and 3, there are 14 input nodes and 8 hidden nodes in every hidden layer. The learning rate and momentum are used to adjust the model error and training efficiency. The learning rate in the experiment is set to 0.01, and the momentum is 0.02 [20]. In the sequential training of the individual RBMs, the pretraining of each RBM is completed in 20 iterations.
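To give a sense of the size of the two architectures, the sketch below counts the feed-forward weights and biases implied by the layer sizes. It is a rough illustration only: it counts the parameters used in the fine-tuned feed-forward network and ignores the visible-layer biases that each RBM also has during pretraining; the function name is hypothetical.

```python
def count_parameters(layers):
    """Return (weights, biases) of a fully connected network with these layer sizes."""
    weights = sum(a * b for a, b in zip(layers, layers[1:]))
    biases = sum(layers[1:])
    return weights, biases

multisensor = count_parameters([42, 12, 12, 4])
single_sensor = count_parameters([14, 8, 8, 4])
print(multisensor, single_sensor)  # (696, 28) (208, 20)
```

Both networks are small relative to the 400 training samples, which is consistent with the concern about overly complicated structures raised in Section 3.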

In the fine-tuning of the model parameters, stochastic gradient descent (SGD) is used to further reduce the training error and improve the information fusion. In this research, SGD operates on minibatches to globally adjust the parameters of the DBN. Since there are 400 samples in the training dataset, the number of minibatches is set to 10 in the experiments. We use the training dataset to train the DBN model and the testing dataset to test the identification accuracy of the model. The classification process is repeated 25 times and the classification results are averaged, as shown in Figure 7.
The average accuracy on the training samples with multisensor fusion is 97.5%, with 390 correctly classified samples. This is much higher than the accuracies of the single-sensor methods, which are 91.5%, 85%, and 87.5%, with 366, 340, and 350 correctly classified samples, respectively. The average accuracy on the testing samples with multisensor fusion is 95.5%, with 191 correctly classified samples. This is again much higher than the accuracies of the single-sensor methods, which are 89%, 78.5%, and 75%, with 178, 157, and 150 correctly classified samples, respectively.
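The reported percentages can be checked against the sample counts, given 400 training samples and 200 testing samples. The snippet below is a simple arithmetic check; the variable names are hypothetical, and the single-sensor counts are kept in the order in which the text lists them.

```python
def accuracy(correct, total):
    """Classification accuracy in percent."""
    return 100.0 * correct / total

train_total, test_total = 400, 200
train_correct = {"multisensor": 390, "single": [366, 340, 350]}
test_correct = {"multisensor": 191, "single": [178, 157, 150]}

train_acc = [accuracy(n, train_total) for n in train_correct["single"]]
test_acc = [accuracy(n, test_total) for n in test_correct["single"]]
print(accuracy(train_correct["multisensor"], train_total), train_acc)
print(accuracy(test_correct["multisensor"], test_total), test_acc)
```

The computed values agree with the percentages quoted in the text, and the gap between fused and single-sensor results is wider on the testing set than on the training set.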
The training and testing accuracies of multisensor information fusion are clearly higher than those of the individual sensors. Among the three individual sensors, the classification accuracy of sensor 1 is better than that of sensors 2 and 3. This indicates that the sensor placed on the vertical side of the bearing pedestal is the most sensitive to the bearing faults. The results show that integrating the signals from multiple sensors is more effective than using the vibration from an individual one.

In the training experiments, the DBN achieved 95.72% identification accuracy for multivibration signals, which is better than that of SVM (92.28%), KNN (90.06%), and BPNN (83.63%). In the testing experiments, the average accuracy of the DBN is 93.17%, whereas SVM reaches 90.13%, KNN 85.23%, and BPNN 78.13%. In brief, the experimental results show that the proposed method has higher reliability and better accuracy than SVM, KNN, and BPNN in rolling bearing fault diagnosis.
The DBN, SVM, KNN, and BPNN algorithms were all trained on the same dataset to generate classifiers for bearing fault diagnosis. However, their stability and generalization ability differ. As shown in Tables 5 and 6, the training results of the DBN approximately agree with its testing results in the 15 experiments. The testing accuracies of SVM and KNN are lower than their training accuracies by 3-5%. The classification accuracy of BPNN decreases markedly in the test experiments.
In the experiments, the training datasets are taken from the sequential data samples of every fault condition and are assembled in order. The testing samples are randomly selected from the datasets of the various states, so both the sample category and the sample order are random. The classification performance of SVM depends mainly on the kernel function fitted to the training set, which is closely tied to the quantity and quality of the data. When KNN is used for classification, the results are mainly determined by the distance function, which cannot adapt once it has been selected. Consequently, the testing accuracies of SVM and KNN are lower than their training accuracies. BPNN is a typical shallow learning model that involves no more than one nonlinear feature transformation; it has difficulty representing complex functions and shows poor performance and generalization ability.
Compared with traditional machine learning and signal processing technology, the DBN has the merit of removing the dependence on signal processing techniques. In addition, the DBN can adaptively extract fault features without restrictive assumptions or complex parameter adjustment. Consequently, it is not surprising that the DBN, as a promising method, has been effectively applied to multivibration fusion.

Conclusions
Multiple sensors installed at various locations on the bearing pedestal can supply abundant information for fault diagnosis and detection. Based on this observation, a novel technique that uses the deep belief network for multivibration fusion is put forward in this paper. Conventional time-domain features are extracted from three accelerometer vibration sensors. Without manual feature selection, the features are used directly as the input vectors of the DBN. The training accuracy obtained with multisensor fusion is 97.5%, about 10 percentage points higher than with a single sensor. In addition, the mean testing accuracies of DBN, SVM, KNN, and BPNN are 93.17%, 90.13%, 85.23%, and 78.13%, respectively. This suggests that the DBN is more effective and stable than the other methods for rolling bearing fault diagnosis. The results show that the DBN is able to adaptively integrate the available fault features from multiple sensors and obtains higher identification accuracy than traditional methods.
Fine-tuning is a supervised learning process that uses backpropagation (BPNN) to further decrease the training error and improve the classification accuracy of the DBN. As the BPNN is supervised, fine-tuning uses labeled data for the DBN training. Unlike the unsupervised pretraining of the DBN, which handles only one RBM at a time, the BPNN trains all layers of the DBN simultaneously. The training error of the BPNN is calculated from the model outputs and the target label data. Backpropagation learning continues until the maximum number of epochs is reached.

Figure 2: The flow diagram of multisensor information fusion.

Figure 3: The flow chart of DBN structure optimization in signal fusion.

Figure 7: Classification rates of the individual sensors and multisensor fusion.

Table 1: Procedure for bearing fault diagnosis using multi-information with DBN.

Table 2: Sample distribution of normal and different faults.

Table 3: Statistics features in time domain.

Table 4: The parameters of DBN in sensors 1-3 and multisensors.

Table 5: Results of training datasets.

Table 6: Results of testing datasets.