Combining DBN and FCM for Fault Diagnosis of Roller Element Bearings without Using Data Labels

Because deep belief networks (DBNs) in deep learning have a powerful ability to extract useful information from raw data without prior knowledge, DBNs are used to extract useful features from roller bearing vibration signals. Unlike classification methods, clustering methods can separate the different fault types without data labels. Therefore, a method based on DBNs in deep learning (DL) and the fuzzy C-means (FCM) clustering algorithm for roller bearing fault diagnosis without data labels is presented in this paper. Firstly, the features of the roller bearing vibration signals are extracted by using DBN, and then principal component analysis (PCA) is used to reduce the dimension of the extracted features. Secondly, the first two principal components (PCs) are selected as the input of FCM for roller bearing fault identification. Finally, the experimental results show that the fault diagnosis performance of the method presented is better than that of other combination models, such as variational mode decomposition (VMD)-singular value decomposition (SVD)-FCM and ensemble empirical mode decomposition (EEMD)-fuzzy entropy (FE)-PCA-FCM.


Introduction
With the development of science and technology, mechanical and electrical equipment in aerospace, industry, and other fields has become increasingly complex, intelligent, and integrated, so that operating conditions and working environments are becoming more complex and changeable. Therefore, accurate and effective fault diagnosis in complex equipment systems is an effective way to improve the reliability and safety of the systems and to reduce the maintenance cost [1]. Roller bearings are among the most common components in mechanical systems, and their operating conditions directly affect the performance of the entire mechanical equipment [2][3][4][5]. Using vibration signals for roller bearing fault diagnosis has become one of the commonly used approaches in recent years. Analyzing the roller bearing vibration signals and extracting their characteristics effectively are of practical significance, because the vibration signals reflect the state of the roller bearings and the quality of the feature extraction determines the accuracy of the fault diagnosis.
For signal feature extraction, many traditional methods have been presented, such as statistical analysis, wavelet transform (WT), and various mode decomposition models. In reference [6], different statistical indexes, such as mean value, kurtosis, and clearance factor, are calculated from the vibration signals and regarded as the eigenvectors for assessing the degradation of slurry pumps. Wang et al. proposed a method based on WT for gear fault diagnosis [7]. The vibration signals are decomposed into continuous statistical features on different scales by using WT. Because the dimension of the features is high, principal component analysis (PCA) is used to reduce the dimension of the eigenvectors. However, WT requires the wavelet function and the number of decomposition layers to be selected in advance. As the vibration signals have nonlinear and nonstationary features, WT is not a self-adaptive method. To overcome this drawback, empirical mode decomposition (EMD) [8] can self-adaptively decompose the vibration signals into a series of intrinsic mode functions (IMFs) and a residual. However, EMD has a mode-mixing problem. Ensemble empirical mode decomposition (EEMD) [9] can solve the mode-mixing problem self-adaptively by introducing Gaussian white noise and decomposing a complicated signal into IMFs. Many scholars use combined EEMD and entropy models to extract the vibration signal features. Zhang and Zhou employed EEMD to decompose the roller bearing vibration signals into some IMFs, and then fuzzy entropy (FE) is used to calculate the IMF entropy values; the extracted features are selected as the input of the support vector machine (SVM) for roller bearing fault diagnosis [10]. In reference [11], a method based on EEMD, sample entropy (SE), and SVM for fault diagnosis is developed; its main purpose is similar to that of reference [10], and the only difference is that SE replaces FE.
EEMD cannot correctly separate vibration signal components with closely located frequencies, but variational mode decomposition (VMD) [12] decomposes the signal into variational, nonrecursive modes. Because VMD is in essence a group of adaptive Wiener filters, it can separate two pure harmonic signals with similar frequencies. In [13], the roller bearing vibration signals are decomposed into some band-limited intrinsic mode functions (BLIMFs), and then singular value decomposition (SVD) is used to compute the singular values of each BLIMF.
However, for some complex systems, the traditional feature extraction methods, whether self-adaptive or not, are not enough to extract the sensitive features of all fault types because of the interaction of the external environment and the internal structure. Sometimes, several fault feature extraction methods need to be combined to achieve a satisfactory diagnosis effect.
In Section 2, the DBN, PCA, and FCM models are reviewed. Section 3 describes the experimental data sources, the evaluation of the clustering effect, and the fault diagnosis methodology. The validation of the experiments is given in Section 4. Finally, the conclusion is given in Section 5.

The Theoretical Framework of DBN, PCA, and FCM Models
2.1. Theoretical Framework of DBN. DBN was proposed by Hinton and Salakhutdinov [14]. It is widely applied in object and speech recognition and image classification. DBN contains an input layer X, hidden layers (multiple stacked unsupervised restricted Boltzmann machines (RBMs)), and an output layer.
The network structure of DBN is shown in Figure 1. The RBM is a classic energy-based model, which includes a visible layer and a hidden layer. The structure of RBM is shown in Figure 2, where the vectors $v$ and $h$ denote the visible and hidden layers, respectively, and $W$ denotes the connection weights between the visible and hidden layers. Neurons in different layers are fully connected, and there are no connections within a layer. The neuron values of the visible and hidden layers are binary variables, and the numbers of neurons in the visible and hidden layers are $I$ and $J$, respectively. $v_i$ and $h_j$ represent the states of the $i$th visible neuron and the $j$th hidden neuron. For a specific configuration $(v, h)$, the energy of the RBM is

$$E(v, h \mid \theta) = -\sum_{i=1}^{I} a_i v_i - \sum_{j=1}^{J} b_j h_j - \sum_{i=1}^{I}\sum_{j=1}^{J} v_i w_{ij} h_j, \tag{1}$$

where $\theta = (w_{ij}, a_i, b_j)$ is the parameter set, $w_{ij}$ denotes the connection weight between the visible neuron $v_i$ and the hidden neuron $h_j$, and $a_i$ and $b_j$ denote the bias values of $v_i$ and $h_j$. The joint probability distribution based on the energy function is

$$p(v, h \mid \theta) = \frac{e^{-E(v, h \mid \theta)}}{Z(\theta)}, \tag{2}$$

where $Z(\theta) = \sum_{v}\sum_{h} e^{-E(v, h \mid \theta)}$ is called the partition function, and the likelihood function $p(v \mid \theta)$ is the marginal distribution of the joint probability $p(v, h \mid \theta)$. For a given visible layer, the neurons in the hidden layer are independent of one another. Therefore, the activation probability of the $j$th hidden neuron is

$$p(h_j = 1 \mid v, \theta) = \delta\Big(b_j + \sum_{i=1}^{I} v_i w_{ij}\Big), \tag{3}$$

where $\delta(x) = 1/(1 + e^{-x})$ is the sigmoid activation function. For a given hidden layer, the activation probability of the $i$th neuron in the visible layer is

$$p(v_i = 1 \mid h, \theta) = \delta\Big(a_i + \sum_{j=1}^{J} w_{ij} h_j\Big). \tag{4}$$

The RBM is trained iteratively, and the purpose of the training is to learn the value of the parameter $\theta = (w_{ij}, a_i, b_j)$ that fits the training data. The parameter $\theta$ is obtained by maximizing the log-likelihood function on the training set (where $N$ is the number of samples):

$$\theta^{*} = \arg\max_{\theta} \sum_{n=1}^{N} \ln p\big(v^{(n)} \mid \theta\big). \tag{5}$$

The parameters $w_{ij}$, $a_i$, and $b_j$ are updated according to the following equations using the contrastive divergence (CD) algorithm:

$$\Delta w_{ij} = \varepsilon\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}}\big), \quad \Delta a_i = \varepsilon\big(\langle v_i\rangle_{\text{data}} - \langle v_i\rangle_{\text{recon}}\big), \quad \Delta b_j = \varepsilon\big(\langle h_j\rangle_{\text{data}} - \langle h_j\rangle_{\text{recon}}\big), \tag{6}$$

where $\varepsilon$ is the learning rate in the pretraining phase, and $\langle\cdot\rangle_{\text{data}}$ and $\langle\cdot\rangle_{\text{recon}}$ denote the mathematical expectations over the training data and over the reconstruction, respectively.
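To make the training rule concrete, the following is a minimal NumPy sketch of a single contrastive divergence step (CD-1) for a binary RBM. It is an illustration under stated assumptions, not the authors' implementation: the function names `sigmoid` and `cd1_step`, the toy layer sizes, the random data, and the use of hidden probabilities (rather than samples) in the statistics are all choices made here. Momentum is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    # Logistic function, written delta(x) in the text.
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr, rng):
    """One CD-1 update on a batch v0 of shape (batch, I) with values in [0, 1].

    W has shape (I, J); a and b are the visible and hidden biases.
    Returns updated copies of (W, a, b).
    """
    # Positive phase: hidden activation probabilities given the data,
    # followed by a binary sample of the hidden states.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visible layer from the sampled
    # hidden states, then recompute the hidden probabilities.
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    n = v0.shape[0]
    # CD update: data expectation minus reconstruction expectation.
    dW = (v0.T @ ph0 - pv1.T @ ph1) / n
    da = (v0 - pv1).mean(axis=0)
    db = (ph0 - ph1).mean(axis=0)
    return W + lr * dW, a + lr * da, b + lr * db

rng = np.random.default_rng(0)
I, J = 6, 3                                   # toy sizes (assumption)
v = (rng.random((20, I)) < 0.5).astype(float)  # synthetic binary data
W = 0.01 * rng.standard_normal((I, J))
a = np.zeros(I)
b = np.zeros(J)
for _ in range(50):
    W, a, b = cd1_step(v, W, a, b, lr=0.15, rng=rng)
```

In a DBN, one such RBM would be trained per hidden layer, with the hidden probabilities of one layer serving as the visible input of the next.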

2.2. Theoretical Framework of PCA.
The essence of PCA is to retain the coordinates of the main components as the new data space directions to achieve the goal of dimension reduction. Assume that $X \in \mathbb{R}^{N \times K}$ is a $K$-dimensional data matrix with $N$ samples, where $x_i$ $(i = 1, 2, \ldots, N)$ is the $i$th sample (taken as mean-centered), and $R$ is the covariance matrix. $R$ is defined by

$$R = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^{T}, \tag{7}$$

and each sample is decomposed as

$$x_i = C C^{T} x_i + e_i, \quad C = [p_1, p_2, \ldots, p_k], \tag{8}$$

where $[p_1, p_2, \ldots, p_k]$ denotes the first to $k$th PCs, $[p_{k+1}, p_{k+2}, \ldots, p_K]$ spans the residual space (RS), and hence $e_i$ is the projection of $x_i$ onto RS. $C$ is the projection matrix. The process of obtaining the projection matrix $C$ by means of the covariance matrix $R$ is called the modeling process. The number of PCs directly affects the merits of the model and the final fault detection and diagnosis.
This paper uses the principal component contribution rate method to select the number of PCs as follows:

$$\eta = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{K} \lambda_i} \times 100\%, \tag{9}$$

where $\lambda_i$ is the $i$th largest eigenvalue of $R$ and $\eta$ is the percentage of the total variance explained by the first $k \le K$ PCs.
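The contribution rate can be computed directly from the eigenvalues of the covariance matrix. The sketch below is a hedged illustration with NumPy: the function name `pca_contribution`, the 1/N covariance convention, and the synthetic data are assumptions made here. The value of η does not depend on whether 1/N or 1/(N-1) is used, because the scale factor cancels in the ratio.

```python
import numpy as np

def pca_contribution(X, k):
    """Project X (N samples x K features) onto its first k PCs.

    Returns the scores (N x k), all eigenvalues in descending order,
    and the cumulative contribution rate eta of the first k PCs in percent.
    """
    Xc = X - X.mean(axis=0)                  # centre each feature
    R = (Xc.T @ Xc) / Xc.shape[0]            # covariance matrix (1/N convention, an assumption)
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    C = eigvecs[:, :k]                       # projection matrix built from the first k PCs
    eta = 100.0 * eigvals[:k].sum() / eigvals.sum()
    return Xc @ C, eigvals, eta

# Synthetic data whose variance is dominated by two directions.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5)) * np.array([5.0, 3.0, 0.5, 0.3, 0.1])
scores, eigvals, eta = pca_contribution(X, k=2)
```

Here `scores` plays the role of the two-dimensional features that would be passed on to the clustering stage.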

2.3. Theoretical Framework of FCM.
FCM is one of the most common clustering algorithms. It is based on an objective function that minimizes the membership-weighted Euclidean distance between each sample and all clustering centers. The cluster centers and the classification matrix are updated repeatedly until the termination criterion is met, and hence the data samples with similar characteristics are clustered into one class. For a given data set $X = [x_1, x_2, \ldots, x_n]$, the corresponding fuzzy classification matrix is $A = [u_{ij}]_{c \times n}$, where $c$ is the number of clusters.
The clustering centers are $C = [c_1, c_2, c_3, \ldots, c_c]$. With the notation above, FCM is described as follows:

$$\min J(A, C) = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2}, \quad \text{s.t.}\ \sum_{i=1}^{c} u_{ij} = 1,\ u_{ij} \in [0, 1], \tag{10}$$

where $n$, $c$, and $m$ represent the number of samples, the number of clusters, and the weighted index, respectively. $u_{ij}$ is the degree of membership of the $j$th sample in the $i$th cluster. The greater the value of $u_{ij}^{m}$ is, the more likely the $j$th sample belongs to the $i$th cluster. $d_{ij}^{2} = \|x_j - c_i\|^{2}$ is the squared Euclidean distance between the $j$th point and the $i$th clustering center.
FCM converts the constrained extreme value problem into an unconstrained one by introducing the Lagrange multipliers $\lambda_j$:

$$\bar{J}(A, C, \lambda) = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} + \sum_{j=1}^{n} \lambda_j\Big(\sum_{i=1}^{c} u_{ij} - 1\Big). \tag{11}$$

Equation (11) is the objective function, and the necessary conditions for it to reach the minimum value are as follows:

$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}}, \tag{12}$$

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left(d_{ij}/d_{kj}\right)^{2/(m-1)}}. \tag{13}$$

The purpose of FCM is to find the classification matrix and the clustering centers that minimize the value of the objective function. The procedure of FCM is as follows:
Step 1. Initialize the number of clusters c, the weighted index m, the classification matrix $A = [u_{ij}]_{c \times n}$, and the iteration number l = 0.
Step 2. Calculate the cluster centers C according to Equation (12).
Step 3. Update the classification matrix A according to Equation (13).
Step 4. If $\|A^{(l+1)} - A^{(l)}\| < \varepsilon$, stop the loop; otherwise, set l = l + 1 and return to Step 2.
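Steps 1 to 4 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function name `fcm`, the weighted index m = 2, the random initialization, the Frobenius norm in the stopping test, and the synthetic two-cluster data are all assumptions made here.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Fuzzy C-means on X of shape (n, d).

    Returns the cluster centers (c, d) and the classification matrix A (c, n).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: random classification matrix whose columns sum to 1.
    A = rng.random((c, n))
    A /= A.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means.
        Um = A ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step 3: update memberships from squared Euclidean distances.
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)            # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        A_new = inv / inv.sum(axis=0, keepdims=True)
        # Step 4: stop when the classification matrix no longer changes.
        converged = np.linalg.norm(A_new - A) < tol
        A = A_new
        if converged:
            break
    return centers, A

# Two well-separated synthetic clusters in two dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(3.0, 0.1, (30, 2))])
centers, A = fcm(X, c=2)
labels = A.argmax(axis=0)                     # hard label = cluster with largest membership
```

Taking the cluster with the largest membership gives a hard assignment for each sample, which is one simple way to turn the fuzzy partition into a diagnosis result.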

3.1. Data Source.
The experimental data came from the Case Western Reserve University Bearing Data [15]. Three faults (inner race fault (IRF), outer race fault (ORF), and ball fault (BF)) with fault diameters of 0.18 mm (1 hp), 0.36 mm (2 hp), and 0.54 mm (3 hp) were employed in this paper. The sampling frequency was 12000 Hz.
Table 1 shows the working conditions under consideration in this study. In Table 1, "NR" represents the bearings with no faults. The fault diameters are 0.18 mm (1 hp), 0.36 mm (2 hp), and 0.54 mm (3 hp). A, B, and C represent the three datasets. Each dataset contains ten types of roller bearing faults under different conditions. Each fault type has 30 samples with 2048 points, and hence each of the datasets A, B, and C has a total of 300 samples.

3.2. Clustering Effect Evaluation.
The two indicators partition coefficient (PC) and classification entropy (CE) are applied to evaluate the quality of the clustering results [16]. The partition coefficient is defined as

$$PC = \frac{1}{n}\sum_{i=1}^{c}\sum_{q=1}^{n} \mu_{iq}^{2}, \tag{14}$$

where $\mu_{iq}$ denotes the membership value of the $q$th point in the $i$th cluster, $n$ is the number of samples, and $c$ is the number of clusters. The disadvantage of PC is the lack of a direct connection to the properties of the data itself.
Classification entropy measures only the fuzziness of the cluster partition:

$$CE = -\frac{1}{n}\sum_{i=1}^{c}\sum_{q=1}^{n} \mu_{iq}\ln\mu_{iq}. \tag{15}$$

The closer the PC value is to 1 and the closer the CE value is to 0, the better the clustering effect [16].
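Both indicators are simple functions of the membership matrix. The sketch below assumes the standard definitions of the partition coefficient and classification entropy with the natural logarithm; the function names and the two toy membership matrices are illustrative assumptions. A crisp partition gives PC = 1 and CE = 0, while a maximally fuzzy two-cluster partition gives PC = 0.5 and CE = ln 2.

```python
import numpy as np

def partition_coefficient(U):
    """U has shape (c, n); each column holds one sample's memberships."""
    return float((U ** 2).sum() / U.shape[1])

def classification_entropy(U, eps=1e-12):
    # Natural logarithm (an assumption); clipping avoids log(0) for crisp memberships.
    Uc = np.clip(U, eps, 1.0)
    return float(-(U * np.log(Uc)).sum() / U.shape[1])

crisp = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])           # every sample fully in one cluster
fuzzy = np.full((2, 3), 0.5)                  # every sample split evenly

pc_crisp, ce_crisp = partition_coefficient(crisp), classification_entropy(crisp)
pc_fuzzy, ce_fuzzy = partition_coefficient(fuzzy), classification_entropy(fuzzy)
```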

3.3. The Procedures of the Method Presented. The roller bearing vibration signal features were extracted by DBN, and then PCA was used to reduce the dimension of the eigenvectors. The first two PCs were regarded as the input of FCM for fault diagnosis.
The procedures of the method presented are listed below: (1) Because the frequency spectra of rotating machinery reflect how the important components are distributed over discrete frequencies, they can provide useful information about the health and working conditions of the machine [17]. Therefore, the fast Fourier transform (FFT) is used to resolve the original vibration signal into a symmetric coefficient matrix. As a result, half of the symmetric coefficient matrix is selected as the input vector for training the DBN. The detailed flow chart is shown in Figure 3.
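The FFT preprocessing in step (1) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the use of the magnitude spectrum, the per-sample min-max scaling to [0, 1], the function name `fft_half_spectrum`, and the synthetic sine segment are choices made here. The segment length of 2048 points and the 12000 Hz sampling frequency follow the data description.

```python
import numpy as np

def fft_half_spectrum(segment):
    """Return the first half of the FFT magnitude spectrum, scaled to [0, 1]."""
    spectrum = np.abs(np.fft.fft(segment))
    half = spectrum[: len(segment) // 2]      # the spectrum of a real signal is symmetric
    lo, hi = half.min(), half.max()
    # Min-max scaling per sample (an assumption about how [0, 1] is reached).
    return (half - lo) / (hi - lo) if hi > lo else np.zeros_like(half)

fs = 12000                                    # sampling frequency in Hz
t = np.arange(2048) / fs
segment = np.sin(2 * np.pi * 162.0 * t)       # synthetic tone, not CWRU data
features = fft_half_spectrum(segment)         # 1024 values per 2048-point sample
```

Each 2048-point sample therefore yields a 1024-dimensional vector, which matches the number of input nodes of the DBN.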

VMD Decomposition.
To decompose vibration signals effectively, the number of modes m in VMD should first be predetermined. When the value of m is too small, the decomposed modes cannot fully reflect the time-frequency information of the original signal, and therefore the decomposition is incomplete. A larger m produces similar center frequencies for neighboring BLIMF components, which may result in overdecomposition. Therefore, in order to select the appropriate m, we observe the center frequencies of the signal according to reference [13]. The results of the center frequency under different numbers of modes m are shown in Table 2.
It can be seen that, for the IRF2 signal in dataset A, the center frequencies range from 0.0507 kHz to 0.3469 kHz when m = 5, for example, 0.2279 kHz in BLIMF3, 0.2980 kHz in BLIMF4, and 0.3469 kHz in BLIMF5. The center frequencies of these three modes are very close to one another. This indicates that the vibration signals are not decomposed effectively. The same also happens when m = 3 (0.0535 kHz in BLIMF1, 0.2145 kHz in BLIMF2, and 0.2979 kHz in BLIMF3). However, the decomposition results for m = 4 contain four frequency components which are separated well. Hence, the parameter m in VMD is selected as 4. The penalty factor z is often set at 2000 [13]. The VMD decomposition results and the envelope spectrum of each BLIMF are shown in Figure 5.
As shown in Figure 5(a), the IRF2 vibration signals are decomposed into four BLIMF components. The amplitude range of each BLIMF gradually increases, and the frequency of each BLIMF also increases. To further verify the


Feature Extraction Using DBN.
Firstly, the FFT is used to transform the time-domain vibration signal to the frequency domain; here, we take an IRF2 signal as an example. The result of the spectrum envelope analysis in the frequency domain is shown in Figure 6.
As shown in Figure 6(b), the IRF2 signal working frequencies mainly focus on 0-1000 Hz. Because the working frequency for the IRF2 signal is 58 Hz, the frequency is mainly focused on 58 Hz and the double frequency (164 Hz).
This indicates that the frequency-domain signal contains useful feature information. Therefore, we use the FFT to preprocess the different vibration signals in the first step.
Then, we use the DBN in this section to extract the features. The number of input layer nodes is set at 1024 because each sample contains 2048 points and only half of the coefficient matrix is used after FFT decomposition before the DBN training procedure. The numbers of neural nodes for the second to fourth layers are set at 512, 256, and 128, respectively. The learning rate is 0.15, the momentum value is 0.65, and the number of epochs is 1200. After the vibration signal features have been extracted by each layer in DBN, PCA is used to reduce the dimensions of the feature vectors. The results of the first two PCs for each hidden layer are shown in Figure 7.
In all the datasets, data of the same fault type are scattered in the first three layers, while data of different fault types may overlap. As the number of hidden layers increases, these scattered data points become more concentrated at one point, and the data of different fault types become more separated from one another. As can be seen in Figure 7, for all datasets, the data points of the same shape are more concentrated at one point and there is a clearer separation between different fault data types in the final hidden layer than in the first hidden layer. For example, in dataset B, all NR signal data, which have a triangular shape, are concentrated (overlapping with each other) in the final hidden layer, whereas they are scattered in the first hidden layer.
The results of PC1 and PC2 through the final hidden layer when PCA is used are shown in Table 3, where "total" denotes the sum of all eigenvalues λ in Equation (9) and η is the cumulative contribution rate calculated by the first two PCs.
The two largest eigenvalues (λ1-λ2) in Equation (9), when PCA is used, correspond to the first two PCs; the greater the λ is, the more useful the information contained in the corresponding PC is. The number of PCs is often selected as 2 when the cumulative contribution rate η is more than 80% [16]. In Table 3, it is more than 85% for the different datasets, for example, 87.81% in dataset A. Moreover, as the number of PCs increases, the eigenvalues decrease, and the first two PCs are often selected as the input of FCM for roller bearing fault diagnosis (as space is limited, only the first six PCs are shown in Table 3).

Fault Diagnosis and a Comparison Analysis.
Before roller bearing fault diagnosis, the EEMD is also used to decompose the vibration signal into some IMFs. Some parameters should be preset before calculation: the Gaussian white noise amplitude mm and the number of inserted white noise realizations nn in EEMD, and the embedded dimension m and similarity tolerance r in FE. For the parameter nn in EEMD, if the standard deviation of the added noise is only a small part of the standard deviation of the input signal, then the remaining noise will result in less than 1% error. It is suggested that the amplitude of the added white noise mm is usually fixed at about 20% of the standard deviation of the input signal [18][19][20][21][22][23][24]. The parameter mm is set at 100.
In FE, a greater embedded dimension m allows more detailed reconstruction of the dynamic process. But too great an m value is unsuitable because it requires a very large data length N = 10^m to 30^m, which is difficult to meet in general and will bring about loss of information. m is often fixed at 2 [16]. Here, the similarity tolerance r is often fixed at 0.1-0.25 * SD, where SD denotes the standard deviation of the original vibration signals [16].
For the FCM model, the number of clusters is set at c = 4. Meanwhile, the value of the termination tolerance is ε = 1e-6. FCM is used to identify the different roller bearing faults, and the results of two-dimensional
This demonstrates that the DBN has a good feature extraction ability.
To verify the clustering effect, the three indicators PC, CE, and classification accuracy are used to evaluate the method presented and to compare it with the EEMD-FE-PCA-FCM and VMD-SVD-FCM models. The results of α (PC) and β (CE) are shown in Table 4. The values of PC and CE are calculated from the memberships u_ij. According to Equation (14), the closer the PC value is to 1, the better the effect of FCM; likewise, the closer the CE value is to 0, the better the effect of FCM.
To demonstrate that DBN can extract the signal features effectively, classification accuracy is used to compare the DBN-PCA-FCM, VMD-SVD-FCM, and EEMD-FE-PCA-FCM models.
The corresponding clustering accuracy is shown in Table 5: (1) The greatest classification accuracy is up to 100% with DBN, and the lowest classification accuracy is 76.037%. (2) The overall classification accuracy of DBN is greater than that of the VMD and EEMD models by about 10%-20%. (3) For different vibration signals, particularly in dataset C, the accuracy is up to 100% in DBN, but it is much lower in VMD for some signals, for example, 23.3% and 60%.
The method presented can extract the vibration signal features and diagnose faults effectively, and its clustering is superior to that of the EEMD-FE-PCA-FCM and VMD-SVD-FCM models.
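The text does not state how cluster indices are matched to fault types when the classification accuracy is computed. One common choice, shown here purely as an assumption, is a majority vote: each cluster is assigned the fault type that occurs most often among its members, and the labels are used only for evaluation, never for training. The function name `majority_vote_accuracy` and the toy label arrays are hypothetical.

```python
import numpy as np

def majority_vote_accuracy(cluster_ids, true_ids):
    """Map each cluster to its most frequent true class, then score the hard labels."""
    cluster_ids = np.asarray(cluster_ids)
    true_ids = np.asarray(true_ids)
    correct = 0
    for k in np.unique(cluster_ids):
        members = true_ids[cluster_ids == k]
        correct += np.bincount(members).max()  # size of the dominant class in cluster k
    return correct / len(true_ids)

true_ids = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
cluster_ids = np.array([1, 1, 1, 0, 0, 2, 2, 2, 2])  # e.g. argmax over FCM memberships
acc = majority_vote_accuracy(cluster_ids, true_ids)   # 8 of 9 samples agree
```

A one-to-one matching between clusters and fault types (for example, an optimal assignment) would be stricter; the majority vote is used here only because it needs no extra dependencies.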

Conclusions
A method based on DBN and FCM for roller bearing fault diagnosis is presented in this paper. Unlike many traditional feature extraction methods, the features of the different roller bearing vibration signals are extracted by using DBN. To visualize the data, PCA is used to reduce the dimension of the eigenvectors. Then, the first two PCs are selected as the input of FCM for fault diagnosis, and the experimental results show that the feature extraction is better than that of the other models, such as the VMD-SVD and EEMD-FE-PCA combination models.
The classification accuracy shows that the FCM clustering model can identify the roller bearing faults well under various conditions without data labels.
Before DBN training, the input vector (half of the symmetric FFT coefficient matrix) is normalized to [0, 1]. (2) Several hidden layers are used to extract the features of the vibration signals. (3) The dimensions of the features extracted in step 2 are reduced by using PCA, and the first two PCs are regarded as the input of FCM for fault diagnosis. (4) PC, CE, and classification accuracy are employed to evaluate the clustering performance of the different combination models, such as EEMD-FE-PCA-FCM, VMD-SVD-FCM, and DBN-PCA-FCM.
As shown in Figure 4, the ten kinds of vibration signals are difficult to distinguish. There are no obvious vibration patterns in the NR and IRF signals. Unlike NR and IRF signals, the BF and ORF signals have obvious vibration patterns because the bearing and outer race components experience a certain impact when the roller bearings are working. Compared with NR and BF signals, IRF and ORF vibration signals have fixed vibration periods in some unique frequency bands, and the self-similarity is high. Especially when the inner ring is fixed, the outer ring rotates with the roller bearings; hence, the vibration regularity in ORF signals is more obvious. IRF and ORF vibration signals have strong periodic regularity, but it is difficult to distinguish these vibration signals under different conditions. To mine the signal features, the DBN, VMD, and EEMD models are used to decompose the vibration signals, and PCA is used to reduce the dimension of the extracted features.

Figure 3: The flow chart of the method presented.

Figure 4: The time-domain waveforms for each working condition.

Figure 6: The time-domain waveform and the envelope spectrum of the IRF2 signal: (a) time domain; (b) envelope spectrum.

Figure 7: The PC1 and PC2 obtained by each hidden layer in DBN: (a, d, g) dataset A; (b, e, h) dataset B; (c, f, i) dataset C.

Table 1
In this section, the various vibration signals in Table 1 are first preprocessed by FFT, and then DBN is used to extract the useful feature information through several hidden layers. The time-domain waveforms of the various original vibration signals are shown in Figure 4.

Table 2: Center frequency corresponding to different m in VMD.

Table 3: The results of PC1-PC2 and the value of η.
As shown in Table 4, the results of PC in DBN are overall greater than those of VMD and EEMD, and the results of CE in DBN are smaller than those of

Table 4: The results of α (PC) and β (CE).
PCA-FCM and VMD-SVD-FCM models are scattered randomly, but in Figures 8(f)-8(h), these scattered data points are more concentrated at one point, and the data of different fault types are more separated.

Table 5: The results of classification accuracy.