This paper presents a model based on stacked denoising autoencoders (SDAEs) from deep learning and adaptive affinity propagation (adAP) for automatic bearing fault diagnosis. First, SDAEs are used to extract potential fault features and reduce their dimension directly to 3. To show that the feature extraction capability of SDAEs exceeds that of stacked autoencoders (SAEs), principal component analysis (PCA) is employed to reduce the features at every hidden layer except the final one to 3 dimensions for comparison. The extracted 3-dimensional features are then chosen as the input of the adAP clustering model. Unlike traditional clustering methods, such as Fuzzy C-means (FCM), Gustafson–Kessel (GK), and Gath–Geva (GG), the affinity propagation (AP) clustering algorithm can identify fault samples without the number of cluster centers being selected in advance. However, AP requires two key parameters, the damping factor and the bias parameter, to be set from manual experience before calculation. To overcome this drawback, adAP is introduced in this paper: the adAP clustering algorithm finds suitable parameters automatically according to a fitness function. Finally, the experimental results show that SDAEs with adAP outperform other models, including SDAE-FCM/GK/GG, according to a cluster assessment index (Silhouette) and the classification error rate.
Bearings are a key component of the mechanical systems in the electric devices of microgrid networks, and their operational health affects the operation of the entire device [
For nonlinear and nonstationary signals, various feature extraction and diagnosis methods have been continuously developed; time- and frequency-domain indicators, wavelet transformation (WT), and empirical mode decomposition (EMD) are commonly used for fault feature extraction and have achieved significant results. However, time-frequency indicators and WT cannot adaptively decompose vibration signals because different vibration signals occupy different working frequency bands. Thus, EMD was proposed to adaptively decompose a signal into intrinsic mode functions (IMFs) based on the current envelope mean of the signal [
An increasing number of scholars have focused on deep learning for fault diagnosis because of its powerful automatic feature extraction. For example, many studies have successfully employed stacked autoencoders (SAEs) to extract features and perform fault diagnosis automatically [
In addition, marking data labels requires a great deal of labor and rich engineering experience when working with large amounts of data. With SDAEs used without an output layer, no manual experience or prior knowledge is required to mark the fault type and fault label.
To identify the different fault types automatically, a clustering model is used in this paper to complete the fault diagnosis without data labels. Fuzzy C-means (FCM) is a commonly used model in fault diagnosis [
However, all of the clustering models mentioned above require the number of clusters to be preset from manual experience before calculation. The affinity propagation (AP) clustering algorithm can find the appropriate number of clusters automatically. AP continuously performs message passing and iterative looping to generate
Therefore, a method based on SDAEs and adAP for bearing fault diagnosis is presented in this study. Its main attributes are as follows. Different from traditional multistep fusion fault diagnosis methods and the basic SDAE model, which require data labels for fault classification, SDAEs without an output layer are used to extract fault features directly from the frequency domain, weakening the dependence on manual experience for marking data labels. There are few reports in the literature in which the adAP model is applied to bearing fault diagnosis. To assess the feature extraction performance of the proposed model (SDAE-adAP), classification accuracy and the Silhouette index are used to demonstrate that adAP surpasses other models, such as FCM/GK/GG.
The rest of this paper is organized as follows. Section
Autoencoders (AEs) include encoders and decoders [
The network structure of AE.
Encoders are used to map the input to the following hidden layer and obtain a new nonlinear extracted hidden feature
The decoder is utilized to map and reconstruct the extracted hidden feature
Denoising autoencoders (DAEs) mix the training data into the noise (the data are randomly set to zero) and remove the noise to obtain the reconstructed output data. In the case of destroyed data, DAEs achieve a better description of the input data and enhance the robustness of the entire model. The structure of a DAE is shown in Figure
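For concreteness, the masking corruption and one encode-decode reconstruction pass can be sketched in NumPy. The layer sizes, sigmoid activations, and tied decoder weights here are illustrative assumptions for the sketch, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, denoising_rate, rng):
    """Masking noise: randomly set a fraction of the inputs to zero."""
    mask = rng.random(x.shape) >= denoising_rate
    return x * mask

def dae_forward(x_noisy, W, b, b_prime):
    """Encode the corrupted input, then decode it (tied weights W, W.T)."""
    h = sigmoid(x_noisy @ W + b)          # hidden feature
    x_hat = sigmoid(h @ W.T + b_prime)    # reconstruction
    return h, x_hat

# Toy example: 5 samples, 8 inputs, 4 hidden units
x = rng.random((5, 8))
W = rng.normal(scale=0.1, size=(8, 4))
b, b_prime = np.zeros(4), np.zeros(8)

x_noisy = corrupt(x, denoising_rate=0.3, rng=rng)
h, x_hat = dae_forward(x_noisy, W, b, b_prime)

# Squared reconstruction error is measured against the CLEAN input,
# which is what forces the DAE to remove the injected noise
loss = 0.5 * np.mean(np.sum((x_hat - x) ** 2, axis=1))
```

Training then minimizes this loss with respect to the weights, so the network learns to reconstruct clean data from destroyed data.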
DAE structure.
In Figure
Hence, the cost function in equations (
The SDAE concept was presented by Vincent [
SDAE structure.
The first is the greedy layer-by-layer learning of SDAEs using unmarked samples. The specific process is as follows: assuming that the total number of hidden layers is
In the backpropagation error calculation process, it is necessary to calculate the residual
Use equations (
To adjust the parameters of each hidden layer, use the following equation:
It should be mentioned that the input
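The greedy layer-by-layer learning can be sketched as follows: each autoencoder layer is trained on the hidden output of the previous one, with no labels involved. The squared-error loss, tied weights, plain gradient descent, and fixed epoch count are simplifying assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(x, n_hidden, lr=0.5, epochs=200):
    """Train one autoencoder layer by gradient descent on the
    squared reconstruction error (tied weights)."""
    n_in = x.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(x @ W + b)
        x_hat = sigmoid(h @ W.T + c)
        # Backpropagate the reconstruction residual through decoder and encoder
        d_out = (x_hat - x) * x_hat * (1 - x_hat)
        d_hid = (d_out @ W) * h * (1 - h)
        gW = x.T @ d_hid + d_out.T @ h          # encoder + decoder gradients
        W -= lr * gW / len(x)
        b -= lr * d_hid.mean(axis=0)
        c -= lr * d_out.mean(axis=0)
    return W, b

def greedy_pretrain(x, layer_sizes):
    """Stack layers: each new layer is trained on the previous
    layer's hidden output (no labels required)."""
    params, h = [], x
    for n_hidden in layer_sizes:
        W, b = pretrain_layer(h, n_hidden)
        params.append((W, b))
        h = sigmoid(h @ W + b)   # feed forward to the next layer
    return params, h

x = rng.random((20, 16))
params, features = greedy_pretrain(x, [8, 4, 3])
```

After this unsupervised pretraining, backpropagation fine-tunes all layers jointly, as described above.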
The AP algorithm works on the
The AP algorithm continuously collects relevant evidence from the data to select the available class representation: AP uses
The AP algorithm generates
There are two key parameters (i.e., the bias parameter
As mentioned above,
From equations (
To overcome the drawbacks mentioned above, adAP searches the cluster number space by scanning the bias parameter space to find the optimal clustering result (called adaptive scanning) and adjusts the damping factor
The goal of adAP is to eliminate oscillation while keeping the algorithm fast when oscillation occurs. Although it is more likely to increase
The adaptive adjustment of the damping factor is designed as follows. The AP algorithm performs a loop and detects whether oscillation is occurring; if there is oscillation, the damping factor is increased, and the iteration continues. These steps are repeated until the algorithm reaches the stopping condition.
If increasing
To keep the algorithm fast, the design of the bias parameter
The algorithm starts from the initial given
The acceleration technique is designed as follows. The AP algorithm performs an iteration and checks whether the number of cluster classes has converged. Otherwise, go to step 1.
If (
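A rough sketch of the adaptive idea, using scikit-learn's AP as the inner solver: non-convergence is taken here as a proxy for oscillation (the paper's actual oscillation test and fitness function may differ), the bias parameter space is scanned from the median similarity downward, and the Silhouette index selects the final result. Function and variable names are ours.

```python
import warnings

import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import silhouette_score

warnings.simplefilter("ignore", ConvergenceWarning)

def adaptive_ap(x, p_steps=8, damping0=0.5, damping_max=0.85):
    """Scan the bias parameter (preference) space; if AP fails to
    converge (our stand-in for oscillation), raise the damping factor
    and retry. The Silhouette index plays the role of the fitness
    function that picks the final clustering."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    p_med = np.median(-d2)            # median similarity, AP's usual default
    best = (-1.0, None, 0)            # (silhouette, labels, n_clusters)
    for p in np.linspace(p_med, 2 * p_med, p_steps):
        damping = damping0
        while damping <= damping_max:
            ap = AffinityPropagation(damping=damping, preference=p,
                                     random_state=0).fit(x)
            k = len(ap.cluster_centers_indices_)
            if k == 0:                # no exemplars found: treat as oscillation
                damping += 0.05       # raise the damping factor and retry
                continue
            if 2 <= k < len(x):       # usable clustering: score it
                score = silhouette_score(x, ap.labels_)
                if score > best[0]:
                    best = (score, ap.labels_, k)
            break
    return best

rng = np.random.default_rng(5)
x = np.vstack([rng.normal(c, 0.1, (20, 3)) for c in (0.0, 2.0, 4.0)])
score, labels, k = adaptive_ap(x)
```

The scan yields a family of clustering results with different cluster numbers, from which the best one is kept, mirroring the adaptive scanning described above.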
The pseudocodes of adaptive
The four basic faults—that is, normal (NR), ball fault (BF), inner race fault (IRF), and outer race fault (ORF)—were collected from a motor driving a power device [
The experimental data under various working conditions. Here, “1,” “2,” and “3” denote fault diameters of 0.18 mm, 0.36 mm, and 0.54 mm, respectively.
Datasets | Fault diameters (mm) | Fault types
---|---|---
A/B | 0.18/0.36/0.54 | NR, BF1, IRF1, ORF1, BF2, IRF2, ORF2, BF3, IRF3
By searching the class number space, adAP can output clustering results with various cluster numbers. Therefore, a clustering validity method can be used to assess the performance of these results. Among the many effectiveness indicators, the Silhouette index is widely used because of its ability to evaluate obvious cluster structures. The Silhouette index reflects the intraclass tightness of the cluster structure and the separability between classes [
A dataset is divided into
It is easy to calculate the average
For a series of Silhouette index values of clustering results, the larger the value is, the better the clustering quality becomes. The cluster number corresponding to the largest value is the optimal cluster number, and the corresponding clustering result is also optimal [
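Computed per sample, the index can be checked directly with scikit-learn; the two-cluster toy data below are only for illustration.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

rng = np.random.default_rng(3)
# Two well-separated toy clusters in a 3-D feature space
x = np.vstack([rng.normal(0.0, 0.2, (25, 3)),
               rng.normal(3.0, 0.2, (25, 3))])
labels = np.repeat([0, 1], 25)

# Per-sample index s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is
# the mean distance to the other samples of i's own cluster and b(i) the
# mean distance to the samples of the nearest other cluster
s = silhouette_samples(x, labels)
overall = silhouette_score(x, labels)    # mean of s(i) over all samples
```

Evaluating `overall` for each candidate cluster number and keeping the maximum is exactly the selection rule used above.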
The detailed procedure of the proposed method contains three steps: (1) data preprocessing, (2) feature extraction, and (3) fault diagnosis.
(1) Data preprocessing: the fast Fourier transformation (FFT) is used to transform the raw signal from the time domain to the frequency domain. Because the coefficient matrix is symmetric after the FFT, only half of it is used for SAE and SDAE training and testing. All input data are normalized into [0, 1].
(2) Feature extraction: since the original data are high dimensional and cannot be visualized, PCA is used to reduce the feature dimension at each hidden layer so that the feature extraction performance of the SAE and SDAE can be compared. Note that the extracted feature vector is already 3-dimensional at the final hidden layer, so no PCA operation is needed there.
(3) Fault diagnosis: after the SAE and SDAE are trained, the 3-dimensional feature vectors are taken as the inputs of FCM, GK, GG, and adAP for fault identification. To verify that the clustering performance of the proposed SDAE-adAP exceeds that of other models, such as SAE/SDAE-FCM/GK/GG and SAE-adAP, a clustering evaluation index (Silhouette) is used to assess the clustering results. In addition, the accuracy is used to compare the identification performance of the different models. The detailed procedures are shown in Figure
The procedure of the proposed model.
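The preprocessing step can be sketched in a few lines; the 2,048-point segment length comes from the paper, while the random segment here merely stands in for a real vibration sample.

```python
import numpy as np

rng = np.random.default_rng(4)
segment = rng.standard_normal(2048)   # stand-in for one raw vibration sample

# The magnitude spectrum of a real-valued signal is symmetric, so only
# the first half (1,024 coefficients) is kept as the network input
spectrum = np.abs(np.fft.fft(segment))[: len(segment) // 2]

# Normalize into [0, 1] before feeding the SAE/SDAE
spectrum = (spectrum - spectrum.min()) / (spectrum.max() - spectrum.min())
```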
First, the original vibration signal is shown in Figure
The original signal under various working conditions.
BF2 FFT results by using different datasets; here, the unit of the
The coefficient matrices are used for feature extraction through eight hidden layers. Some parameters of the SAE and SDAE should be set before training, such as the input size, the learning rate, the denoising rate, and the number of neural nodes in each hidden layer.
The length of each original sample is 2,048 points. The frequency-domain coefficients of each sample after the FFT are symmetric; hence, the length of each input sample to the SDAE is 1,024. In addition, the hidden layers adopt a triangular structure; that is, the number of nodes in each hidden layer is half that of the previous one. Therefore, the number of nodes in the first hidden layer is 512, and the node numbers of the eight hidden layers are selected as 512, 256, 128, 64, 32, 16, 8, and 3. Then, the first three principal components (PCs) from PCA are chosen as the fault features for data visualization to compare the feature extraction abilities of the SAE and SDAE.
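The triangular halving rule can be written down directly (a trivial sketch; the variable name is ours):

```python
hidden = [512]
while hidden[-1] > 8:              # halve the node count layer by layer
    hidden.append(hidden[-1] // 2)
hidden.append(3)                   # final 3-dimensional feature layer
# hidden is now [512, 256, 128, 64, 32, 16, 8, 3]
```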
Since much information is missing when the denoising probability
If the learning rate is too high, the reconstruction error converges quickly, but the model easily becomes trapped in a local optimum. However, if the learning rate is too small, the SAE and SDAE models converge slowly [
The 3-dimensional results of different datasets for the training dataset through eight hidden layers by using SDAE/SAE with PCA dimension reduction under different conditions are shown in Figures
The 3-dimensional results of different datasets for the training dataset through eight hidden layers by using an SAE with PCA dimension reduction; 1–8 denote the hidden layer number. (a) SAE-A-512-training data. (b) SAE-B-512-training data. (c) SAE-A-256-training data. (d) SAE-B-256-training data. (e) SAE-A-128-training data. (f) SAE-B-128-training data. (g) SAE-A-64-training data. (h) SAE-B-64-training data. (i) SAE-A-32-training data. (j) SAE-B-32-training data. (k) SAE-A-16-training data. (l) SAE-B-16-training data. (m) SAE-A-8-training data. (n) SAE-B-8-training data. (o) SAE-A-3-training data. (p) SAE-B-3-training data.
The 3-dimensional results of different datasets for the training dataset through eight hidden layers by using an SDAE with PCA dimension reduction; 1–8 denote the hidden layer number. (a) SDAE-A-512-training data. (b) SDAE-B-512-training data. (c) SDAE-A-256-training data. (d) SDAE-B-256-training data. (e) SDAE-A-128-training data. (f) SDAE-B-128-training data. (g) SDAE-A-64-training data. (h) SDAE-B-64-training data. (i) SDAE-A-32-training data. (j) SDAE-B-32-training data. (k) SDAE-A-16-training data. (l) SDAE-B-16-training data. (m) SDAE-A-8-training data. (n) SDAE-B-8-training data. (o) SDAE-A-3-training data. (p) SDAE-B-3-training data.
In Figure
The results of adAP clustering are shown in Figure
The 3-dimensional clustering results for the training dataset by using an SAE/SDAE with adAP. (a) SAE-A-training data-adaptive AP. (b) SAE-B-training data-adaptive AP. (c) SDAE-A-training data-adaptive AP. (d) SDAE-B-training data-adaptive AP.
After the parameters mentioned above are preconfigured, the 3-dimensional features are chosen as the input of adAP for fault diagnosis. The 3-dimensional clustering results for training datasets A and B by using an SAE/SDAE with adAP are shown in Figure
These results demonstrate that the robustness and the feature extraction ability of SDAEs are better than those of SAEs. Moreover, adAP can find the cluster center point automatically.
The result of the energy function
The results of the energy function (similarity) for all samples.
Hence, the best cluster number is 9. The Silhouette index values of the clustering results are listed in Table. For dataset B, the best cluster number is 9 for both the SAE and the SDAE, and the largest Silhouette index value, 0.889, is obtained by the SDAE. Hence, the feature extraction ability of the SDAE exceeds that of the SAE, and adAP can find the available parameters automatically. The classification accuracy at the best cluster number is shown in Table
The results of the Silhouette index with different cluster numbers by using an SAE/SDAE with adAP (the training dataset). Here, “—” means there is no cluster number equal to 10, 11, or 12.
Model | Dataset | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
---|---|---|---|---|---|---|---|---|---|---|---|---
SAE-adAP | A | 0.330 | 0.391 | 0.484 | 0.538 | 0.592 | 0.635 | 0.664 | 0.683 | 0.694 | 0.672 | 0.662
 | B | 0.483 | 0.510 | 0.636 | 0.608 | 0.630 | 0.574 | 0.641 | 0.678 | — | — | —
SDAE-adAP | A | 0.413 | 0.549 | 0.695 | 0.738 | 0.808 | 0.839 | 0.935 | 0.954 | — | — | —
 | B | 0.431 | 0.512 | 0.602 | 0.713 | 0.812 | 0.875 | 0.864 | 0.889 | — | — | —
The best cluster number and classification error rate at the maximum Silhouette index value by using different models (the training dataset).

Dataset | Model | Best cluster no. | Silhouette index | Error rate (%)
---|---|---|---|---
A | SAE-adAP | 10 | 0.6945 | —
 | SDAE-adAP | 9 | 0.9545 | 0
B | SAE-adAP | 9 | 0.6785 | 4.07
 | SDAE-adAP | 9 | 0.8898 | 2.22
To further demonstrate that the proposed model (SDAE-adAP) surpasses SAE/SDAE-FCM/GK/GG, the 3-dimensional clustering results for training datasets A and B by using the SAE/SDAE with FCM/GK/GG are shown in Figures
The 3-dimensional clustering results for training dataset A by using an SAE/SDAE with FCM/GK/GG. (a) SAE-FCM-A. (b) SAE-GK-A. (c) SAE-GG-A. (d) SDAE-FCM-A. (e) SDAE-GK-A. (f) SDAE-GG-A.
The 3-dimensional clustering results for training dataset B by using an SAE/SDAE with FCM/GK/GG. (a) SAE-FCM-B. (b) SAE-GK-B. (c) SAE-GG-B. (d) SDAE-FCM-B. (e) SDAE-GK-B. (f) SDAE-GG-B.
Compared with the SAE, most of the samples in the SDAE are well separated and close to their center points. However, some samples exhibit overlap, especially IRF1 and BF1 in Figure
The classification error rates obtained through various models (the training dataset).

Model | Dataset | FCM (%) | GK (%) | GG (%) | adAP (%)
---|---|---|---|---|---
SAE | A | 8.15 | 16.3 | 6.03 | —
 | B | 2.59 | 3.33 | 2.59 | 4.07
SDAE | A | 11.1 | 21.1 | 11.1 | 0
 | B | 12.9 | 4.81 | 11.1 | 2.22
The testing datasets are used to test the performance of the model. As with the training dataset, the feature extraction procedure through the hidden layers of the SAE and SDAE is shown in Figures
The 3-dimensional results of different datasets for the testing dataset through eight hidden layers by using an SAE with PCA dimension reduction; 1–8 denote the hidden layer number. (a) SAE-A-512-testing data. (b) SAE-B-512-testing data. (c) SAE-A-256-testing data. (d) SAE-B-256-testing data. (e) SAE-A-128-testing data. (f) SAE-B-128-testing data. (g) SAE-A-64-testing data. (h) SAE-B-64-testing data. (i) SAE-A-32-testing data. (j) SAE-B-32-testing data. (k) SAE-A-16-testing data. (l) SAE-B-16-testing data. (m) SAE-A-8-testing data. (n) SAE-B-8-testing data. (o) SAE-A-3-testing data. (p) SAE-B-3-testing data.
The 3-dimensional results of different datasets for the testing dataset through eight hidden layers by using an SDAE with PCA dimension reduction; 1–8 denote the hidden layer number. (a) SDAE-A-512-testing data. (b) SDAE-B-512-testing data. (c) SDAE-A-256-testing data. (d) SDAE-B-256-testing data. (e) SDAE-A-128-testing data. (f) SDAE-B-128-testing data. (g) SDAE-A-64-testing data. (h) SDAE-B-64-testing data. (i) SDAE-A-32-testing data. (j) SDAE-B-32-testing data. (k) SDAE-A-16-testing data. (l) SDAE-B-16-testing data. (m) SDAE-A-8-testing data. (n) SDAE-B-8-testing data. (o) SDAE-A-3-testing data. (p) SDAE-B-3-testing data.
The 3-dimensional clustering results for the testing dataset by using an SAE/SDAE with adAP. (a) SAE-A-testing data-adaptive AP. (b) SAE-B-testing data-adaptive AP. (c) SDAE-A-testing data-adaptive AP. (d) SDAE-B-testing data-adaptive AP.
As seen in Figure
The results of the Silhouette index with different cluster numbers by using an SAE/SDAE with adAP (the testing dataset).
Model | Dataset | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
---|---|---|---|---|---|---|---|---|---|---|---|---
SAE | A | 0.3302 | 0.4377 | 0.5614 | 0.5892 | 0.6126 | 0.6416 | 0.6647 | 0.6791 | 0.6815 | — | —
 | B | 0.3629 | 0.4259 | 0.4327 | 0.5173 | 0.6053 | 0.6367 | 0.7151 | 0.7424 | 0.7415 | — | —
SDAE | A | 0.4920 | 0.5050 | 0.5487 | 0.6660 | 0.7780 | 0.8214 | 0.8936 | 0.9014 | — | — | —
 | B | 0.4370 | 0.5156 | 0.6237 | 0.6880 | 0.7510 | 0.8061 | 0.8969 | 0.9167 | — | — | —
The best cluster number and classification error rate at the maximum Silhouette index value by using different models (the testing dataset) are shown in Table
The best cluster number and classification error rate at the maximum Silhouette index value by using different models (the testing dataset).

Dataset | Model | Best cluster no. | Silhouette index | Error rate (%)
---|---|---|---|---
A | SAE-adAP | 10 | 0.6815 | —
 | SDAE-adAP | 9 | 0.9014 | 4.44
B | SAE-adAP | 9 | 0.7425 | 5
 | SDAE-adAP | 9 | 0.9167 | 2.78
The 3-dimensional clustering results for the testing dataset A by using an SAE/SDAE with FCM/GK/GG. (a) SAE-FCM-A. (b) SAE-GK-A. (c) SAE-GG-A. (d) SDAE-FCM-A. (e) SDAE-GK-A. (f) SDAE-GG-A.
The 3-dimensional clustering for the testing dataset B by using an SAE/SDAE with FCM/GK/GG. (a) SAE-FCM-B. (b) SAE-GK-B. (c) SAE-GG-B. (d) SDAE-FCM-B. (e) SDAE-GK-B. (f) SDAE-GG-B.
The classification error rates obtained through various models (the testing dataset).

Model | Dataset | FCM (%) | GK (%) | GG (%) | adAP (%)
---|---|---|---|---
SAE | A | 3.89 | 6.67 | 3.89 | —
 | B | 1.67 | 11.1 | 0 | 5
SDAE | A | 4.44 | 15 | 3.89 | 4.44
 | B | 2.78 | 13.3 | 2.22 | 2.78
A method based on an SDAE and adAP for bearing fault diagnosis was presented in this study. To reduce the dependence on manual experience for labeling data, we used an SDAE without an output layer to extract useful fault features directly from the frequency domain obtained by FFT decomposition. Additionally, to find the available parameters of AP automatically, we introduced adAP for bearing fault diagnosis. The results show that the proposed model surpasses other models, such as SAE/SDAE-FCM/GK/GG and SAE-adAP.
The advantages and limitations of this work are as follows. The proposed model can be used to mark different bearing fault signals: the clustering result labels the different fault signals, after which an SAE with an output layer can realize online automatic fault diagnosis. However, the data collected in actual projects contain noise, which causes misclassification and mislabeling in the clustering results, so the classification performance of a subsequently used SAE with an output layer degrades. To solve this problem, future research will consider an improved SAE model, for example, one that adds a data-smoothing model at each hidden layer to eliminate noise layer by layer and thus improve the accuracy of clustering and classification.
WT: wavelet transformation
EMD: empirical mode decomposition
IMF: intrinsic mode function
EEMD: ensemble empirical mode decomposition
SAE: stacked autoencoder
SDAE: stacked denoising autoencoder
FCM: Fuzzy C-means
GK: Gustafson–Kessel
GG: Gath–Geva
AP: affinity propagation
adAP: adaptive affinity propagation
PCA: principal component analysis
NR: normal
BF: ball fault
IRF: inner race fault
ORF: outer race fault
cluster center
AE: autoencoder
FFT: fast Fourier transformation.
Previously reported bearing data were used to support this study and are available at (data link:
The authors declare that there are no conflicts of interest regarding the publication of this paper.