A Denoising Autoencoder-Based Bearing Fault Diagnosis System for Time-Domain Vibration Signals

Condition monitoring of rotating machinery has long been a focus of intelligent fault diagnosis. Traditional methods depend heavily on prior knowledge for manual feature extraction, have limited capacity to learn the complex nonlinear relations in fault signals, and must cope with collected signals that are mixed with environmental noise during machine operation. To address these problems, this article proposes a deep learning-based approach for detecting bearing faults. To effectively detect, locate, and identify faults in rolling bearings, a stacked denoising autoencoder is used to extract features from the original vibration signals, and these features are then provided as input to a backpropagation (BP) network classifier. The outputs of this classifier represent the different fault categories. Experimental results on rolling bearing datasets show that this method can effectively diagnose bearing faults from the original time-domain signals.


Introduction
Rolling bearings are among the most frequently used components in mechanical equipment and also among the most vulnerable. A bearing fault often results in enormous losses; consequently, bearing fault diagnosis technology is receiving increasing attention. Statistics show that around 30% of machinery failures arise from bearings [1,2]. To comprehensively monitor the health status of rotating machinery, a condition monitoring system is used for data collection. After a long period of operation, such a machine will have generated a large amount of real-time data. Features can be extracted from the monitored data through feature engineering and then used to build a classifier that identifies different bearing faults [3,4]. Detecting bearing faults is therefore essentially a classification task [5][6][7], and the health status of bearings can be identified with artificial intelligence techniques such as deep learning models [8][9][10], support vector machines (SVMs) [11], neural networks (NNs) [12,13], extreme learning machines (ELMs) [14], and autoencoders (AEs) [15][16][17]. In recent years in particular, deep learning has been widely used in this field and has achieved good results [18,19].
Most of the methods currently used to detect bearing faults are supervised algorithms, for which the cost of labeling is high. Because of the representational power of hidden layers [20][21][22], NNs are commonly used to train bearing fault classifiers. However, high-performance neural network classifiers rely on large amounts of good-quality labeled data, and NNs may overfit when the training samples are noisy, too few, or do not cover the test distribution. The generalization ability of NNs then becomes poor, particularly for complicated classification problems. Diagnosing bearing faults is challenging because the object under study is a sophisticated mechanical system [23,24]. Many approaches have therefore been proposed, such as advanced signal processing techniques that analyze the vibration signal to extract useful bearing fault features. These techniques, however, require practitioners to have deep knowledge of vibration signals and of the whole mechanical system. Moreover, because such expertise is difficult to obtain, these methods are not universal and are less adaptive than machine learning methods. As an alternative, deep neural networks (DNNs) have been proposed to learn features in an unsupervised manner. In recent years, following its impressive achievements in image recognition, deep learning has become one of the most popular and promising research directions. Among deep learning methods, deep autoencoder (AE) network structures, as a representative of unsupervised learning, have become a common solution for detecting bearing faults.
In addition, the denoising autoencoder, a well-known unsupervised feature learning method, is widely used for various tasks because it can learn more robust feature representations of input signals; that is, it has a strong generalization ability [25][26][27].

Brief Introduction to DNNs
A DNN is usually a nonlinear transformation with a multilayer structure. In general, it contains an input layer, an output layer, and several hidden layers in between. The parameter values of each layer in a DNN can be learned by the training mechanism of the neural network, as described by Hinton et al. [10]. Compared with smaller NNs, DNNs can represent more information. Exploiting the power of deep networks, we propose an SAE-based (stacked autoencoder) method that automatically classifies bearing faults through feature selection without any manual intervention. This method consists of three steps.
Step one: use unsupervised learning to pretrain the entire model layer by layer.
Step two: use supervised learning to fine-tune the network with training data that contain true labels.
Step three: after training is complete, use the network to perform feature extraction and classification of the bearing data.
A DAE (denoising autoencoder) [25] is a variant of an AE obtained by adding denoising to an AE. An AE comprises an encoder and a decoder and is one of the representative methods of unsupervised learning. It achieves unsupervised feature extraction by continuously minimizing the deviation between the input and the reconstructed data. To enhance the generalization performance of the model, noise is added at the input layer, which also helps to avoid overfitting during training. Usually, a DAE is a symmetrical neural network with an odd number of layers, and its goal is to make the output as consistent with the input as possible. The purpose of the DAE encoder is to condense the input data into a concise representation while losing as little important information as possible. A diagram of a DAE is shown in Figure 1.
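The principle described above can be illustrated with a minimal NumPy sketch. It assumes a single hidden layer, masking noise that zeroes a fraction of the inputs, sigmoid activations, and a mean squared reconstruction error; the function names, the toy sizes, and the loss choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, drop_rate, rng):
    """Masking noise: zero each input value with probability drop_rate."""
    keep = rng.random(x.shape) >= drop_rate
    return x * keep

def dae_forward(x, W_enc, b_enc, W_dec, b_dec, drop_rate, rng):
    """Encode a corrupted copy of x, decode it, and score against the clean x."""
    x_noisy = corrupt(x, drop_rate, rng)
    code = sigmoid(x_noisy @ W_enc + b_enc)   # concise representation
    x_rec = sigmoid(code @ W_dec + b_dec)     # reconstruction
    loss = np.mean((x_rec - x) ** 2)          # compared with the clean input
    return code, x_rec, loss

# Toy data (assumed sizes): 4 samples with 16 values each, in [0, 1).
rng = np.random.default_rng(0)
x = rng.random((4, 16))
W_enc, b_enc = rng.normal(0.0, 0.01, (16, 8)), np.zeros(8)
W_dec, b_dec = rng.normal(0.0, 0.01, (8, 16)), np.zeros(16)
code, x_rec, loss = dae_forward(x, W_enc, b_enc, W_dec, b_dec, 0.2, rng)
print(code.shape, x_rec.shape)
```

Note that the loss compares the reconstruction with the clean input rather than the corrupted one, which is what forces the network to undo the noise.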
In Figure 1, the encoder is made up of the input layer and the layers near the input, and the decoder is composed of the output layer and the layers near the output. The processing performed by the encoder is expressed by Equation (1), and Equation (2) represents the processing performed by the decoder, where $s_f$ and $s_g$ denote the activation functions of the encoding and decoding processes, respectively. The nonlinear sigmoid function is commonly used for these activation functions. Suppose a dataset is represented as $X = \{x_i\}_{i=1}^{M}$, where $M$ represents the size of $X$. The representation of the sample $x_i$ in $X$ obtained by the encoder is denoted $a_i$; it is a concise expression of the input.
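For reference, a denoising autoencoder of this kind is commonly written in the following form. Here $\tilde{x}_i$ denotes a noisy copy of $x_i$ and $\hat{x}_i$ its reconstruction; the parameter names $W$, $b$, $W'$, $b'$, the parameter set $\theta$, and the generic loss $L$ are assumptions for illustration and may differ from the authors' notation.

```latex
\begin{align*}
a_i &= f(\tilde{x}_i) = s_f\left(W \tilde{x}_i + b\right), \\
\hat{x}_i &= g(a_i) = s_g\left(W' a_i + b'\right), \\
\theta^{*} &= \arg\min_{\theta} \frac{1}{M} \sum_{i=1}^{M} L\left(x_i, \hat{x}_i\right),
\qquad \theta = \{W, b, W', b'\}.
\end{align*}
```

A typical choice for $L$ is the squared error $\lVert x_i - \hat{x}_i \rVert^2$, although other reconstruction losses are also used.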
Suppose the reconstruction of $x_i$ is represented as $\hat{x}_i$. The connection between $\hat{x}_i$ and $a_i$ is expressed as $\hat{x}_i = g(a_i)$. The parameters of the DAE are updated iteratively during the training phase so as to minimize the objective function, Equation (5), which measures the reconstruction error. In Equation (5), $\tilde{x}$ represents the input $x$ after adding some noise; in the DAE, this noise is implemented through dropout. The DAE must therefore recover $x$ from the corrupted input rather than simply copy the noisy data. In this way, the encoding function $f(\cdot)$ and the decoding function $g(\cdot)$ of the DAE jointly and implicitly capture the inner structure of the data-generating distribution, thus enabling the extraction of robust features.

DAE-Based Automatic Bearing Fault Detection Method
We propose a novel DAE-based method for automatic bearing fault detection in this article. The method mines fault characteristics from the raw vibration signals of rotating equipment such as bearings. By adaptively selecting representative features, it can automatically classify the condition category, that is, the health level, of the machinery on the basis of these fault features. We measure the raw vibration signals in the time domain. Time-domain analysis contrasts with frequency-domain analysis, in which the signals are first converted into spectra by the fast Fourier transform (FFT). Time-domain analysis works on the temporal waveform, and such a time-domain signal provides relatively vague information about the health of the rotating machinery. Therefore, it is very challenging to classify these original time-domain signals directly. The proposed method contains four steps, as shown in Figure 2.
Step one: acquire time-domain data from rotating equipment in various states of health. Assume that the training dataset composed of these time-domain data is denoted as $\{(x_i, d_i)\}_{i=1}^{M}$, where $x_i$ represents the $i$-th training sample, $d_i$ represents the health state label of $x_i$, and $M$ represents the number of time-domain samples.
Step two: construct a DNN that consists of several hidden layers. The number of neurons in the first layer is identical to the dimensionality of the time-domain data $x_i$. Next, pretrain the DNN layer by layer through an SAE with the unlabeled dataset $\{x_i\}_{i=1}^{M}$. In the pretraining, the required number of AEs equals the number of hidden layers in the DNN. The layer-by-layer pretraining first takes the first three layers of the DNN, an input layer and two hidden layers, as the three-layer structure of AE1: the input layer of the DNN is the input layer of AE1, hidden layer 1 of the DNN is the hidden layer of AE1, and hidden layer 2 of the DNN is the output layer of AE1. AE1 is trained on the unlabeled dataset $x$, which is also regarded as the target output of AE1. After AE1 has been trained, its parameters $\{W_1, b_1\}$ are used to initialize the parameters of hidden layer 1 of the DNN. Let $a_1$ be the encoding vector that AE1 produces from the time-domain data of the rotating machinery. Then, take $a_1$ as both the input and the target output of AE2 for training, initialize the parameters of hidden layer 2 of the DNN with $\{W_2, b_2\}$, and obtain $a_2$ accordingly. Finally, repeat these steps sequentially until AE$N$ is obtained and the time-domain data are encoded as $a_N$. In this way, the pretraining of all hidden layers of the DNN is completed.
Step three: set the size of the last layer of the DNN, which performs classification, to the number of different health states that the machine may have. Then, fine-tune the parameters of the DNN by making the output computed from the original time-domain data as close as possible to the given health status label. The entire network is optimized with the backpropagation (BP) algorithm.
Step four: apply the trained DNN to the fault diagnosis of rotating equipment.
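The layer-by-layer pretraining in step two can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: each autoencoder reconstructs its own input (the usual convention), activations are sigmoid, training uses plain batch gradient descent, and the function names, learning rate, and toy sizes are hypothetical rather than taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(data, n_hidden, rng, epochs=200, lr=0.5, drop_rate=0.2):
    """Train one denoising autoencoder on `data`; return its encoder parameters."""
    n_in = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_in, n_hidden))
    b = np.zeros(n_hidden)
    W_out = rng.normal(0.0, 0.01, (n_hidden, n_in))
    b_out = np.zeros(n_in)
    for _ in range(epochs):
        noisy = data * (rng.random(data.shape) >= drop_rate)  # masking noise
        h = sigmoid(noisy @ W + b)
        rec = sigmoid(h @ W_out + b_out)
        # Gradients of 0.5 * squared reconstruction error, averaged over samples.
        d_rec = (rec - data) * rec * (1.0 - rec) / len(data)
        d_h = (d_rec @ W_out.T) * h * (1.0 - h)
        W_out -= lr * h.T @ d_rec
        b_out -= lr * d_rec.sum(axis=0)
        W -= lr * noisy.T @ d_h
        b -= lr * d_h.sum(axis=0)
    return W, b

def pretrain_stack(x, hidden_sizes, rng):
    """Greedy layer-wise pretraining: AE k is trained on the codes of AE k-1."""
    params, a = [], x
    for n_hidden in hidden_sizes:
        W, b = train_autoencoder(a, n_hidden, rng)
        params.append((W, b))      # would initialize hidden layer k of the DNN
        a = sigmoid(a @ W + b)     # a_k becomes the input of the next AE
    return params, a

# Toy usage with assumed sizes: 20 samples of length 32, two hidden layers.
rng = np.random.default_rng(0)
x = rng.random((20, 32))
params, a_N = pretrain_stack(x, [16, 8], rng)
print([W.shape for W, _ in params], a_N.shape)
```

Only the encoder parameters of each autoencoder are kept, since the decoders are discarded once the next layer has been initialized.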
In the proposed approach, the DNN learns multiple nonlinear transformations through the pretraining process to discover the main variations in the time-domain signal characteristics. In this way, the DAE can extract more robust features. The fine-tuning process then helps the DNN mine discriminative information from the vibration signal data. As a result, the well-trained DNN can automatically capture the fundamental features of the time-domain data and establish a complex nonlinear relationship between the time-domain data and the bearing status labels. This method can thus be used for effective feature selection and rapid intelligent detection of rolling bearing fault features.
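The supervised stage can be illustrated with a small sketch of a soft-max output layer trained with cross-entropy. For brevity it updates only the output layer on fixed features, whereas full fine-tuning would backpropagate through every layer; the function names, the learning rate, and the toy data are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the integer class labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def softmax_step(features, labels, W, b, lr=0.1):
    """One gradient step on the soft-max layer; returns the loss before the step."""
    probs = softmax(features @ W + b)
    loss = cross_entropy(probs, labels)
    grad = probs.copy()
    grad[np.arange(len(labels)), labels] -= 1.0   # d(loss)/d(logits)
    grad /= len(labels)
    W -= lr * features.T @ grad                   # in-place parameter update
    b -= lr * grad.sum(axis=0)
    return loss

# Toy usage: random stand-ins for encoder outputs and three classes.
rng = np.random.default_rng(0)
features = rng.random((30, 8))
labels = rng.integers(0, 3, size=30)
W, b = np.zeros((8, 3)), np.zeros(3)
losses = [softmax_step(features, labels, W, b) for _ in range(50)]
print(losses[0] > losses[-1])
```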

Fault Diagnosis Experiment
Rolling bearings are core parts of rotating machinery. The performance, service life, and reliability of such machines usually depend on the health of these components. However, because of their harsh working environment, these components are susceptible to various kinds of damage, which results in malfunctions and serious economic losses. In this section, several rolling bearing diagnosis cases are considered to verify the proposed method.
3.1. Introduction to the Training Data. The raw bearing data were gathered from a motor-driven mechanical system, with vibration signals sampled at 12 kHz. The data were collected under four loads and cover the following four health conditions: (a) normal state (N), (b) outer race fault (OF), (c) inner race fault (IF), and (d) roller fault (RF). The fault diameter of the bearing at the drive end of the motor is also taken into account when constructing the raw data; there are three fault sizes: 0.007 inches, 0.014 inches, and 0.021 inches. In this article, five datasets, named A, B, C, D, and E, are used to check the classification capability of the method. A detailed description of these five datasets is given in Table 1. The health states of the bearing data are divided into ten types, labeled 1 to 10. Datasets A, B, C, and D each cover these ten condition labels under loads of 0, 1, 2, and 3 hp, respectively. Each state of health involves 600 signals, and every signal contains 2048 data points.
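One plausible reading of the ten labels is the normal state plus three fault types at three fault sizes each (1 + 3 × 3 = 10). The short sketch below enumerates the labels under that assumption and computes the resulting size of a single-load dataset; the label ordering and the names used are hypothetical.

```python
from itertools import product

FAULT_TYPES = ["OF", "IF", "RF"]          # outer race, inner race, roller
FAULT_SIZES_IN = [0.007, 0.014, 0.021]    # fault diameters in inches
SIGNALS_PER_STATE = 600
POINTS_PER_SIGNAL = 2048

def build_label_map():
    """Assign labels 1..10; the ordering used here is an assumption."""
    states = [("N", None)] + list(product(FAULT_TYPES, FAULT_SIZES_IN))
    return {state: label for label, state in enumerate(states, start=1)}

label_map = build_label_map()
n_samples = len(label_map) * SIGNALS_PER_STATE
print(len(label_map), n_samples, (n_samples, POINTS_PER_SIGNAL))
# prints: 10 6000 (6000, 2048)
```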

Experimental Setup.
The encoder side of the designed DNN consists of four layers. The size of the first layer is identical to the feature dimension of the training data, which is 2048. The hidden layers in the middle of the encoder are uniformly set to 500 units. The size of the last layer of the encoder depends on the number of possible bearing health states. Specifically, the shape of the entire DNN is 2048-500-500-10-500-500-2048, with a symmetric encoder and decoder. We use rectified linear units (ReLUs) in all encoder/decoder pairs except the first (which needs to reconstruct the input data, which may have both positive and negative values) and the last (to ensure that the final encoding retains complete information). During greedy layerwise pretraining, the weights are initialized to random numbers drawn from a zero-mean Gaussian distribution with a standard deviation of 0.01. Each layer was pretrained for 500 iterations with a dropout rate of 20% and a learning rate of 0.001. We select 50% of the data at random to pretrain the network, and the same samples are used to fine-tune the parameters of the entire network. The remaining 50% of the data are used to test the classification capability. Fine-tuning was performed by removing the decoder from the pretrained DAE and adding a soft-max layer to form the BP network, with the maximum number of fine-tuning epochs set to 500. Table 2 shows the diagnostic results of ten tests performed on each of the datasets A to E using the proposed method. The results on these test sets show that the diagnostic accuracy for bearing fault data is greater than 92%, and in some cases reaches 99%, which indicates that our method can distinguish the ten bearing health states well.
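The settings above can be collected in a short configuration sketch. The layer sizes, initialization standard deviation, and split fraction come from the text; the zero-initialized biases, the sample count of 6000, and all names are assumptions made for illustration.

```python
import numpy as np

ENCODER_SIZES = [2048, 500, 500, 10]   # encoder half of 2048-500-500-10-500-500-2048
INIT_STD = 0.01                        # zero-mean Gaussian weight initialization
TRAIN_FRACTION = 0.5                   # 50% for training, 50% for testing

def init_encoder(sizes, rng):
    """Create (W, b) per layer with W ~ N(0, INIT_STD^2); zero biases are assumed."""
    return [(rng.normal(0.0, INIT_STD, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def split_indices(n_samples, rng):
    """Randomly split sample indices into disjoint train and test sets."""
    order = rng.permutation(n_samples)
    n_train = int(n_samples * TRAIN_FRACTION)
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(0)
layers = init_encoder(ENCODER_SIZES, rng)
train_idx, test_idx = split_indices(6000, rng)   # 6000 samples is an assumed size
print([W.shape for W, _ in layers], len(train_idx), len(test_idx))
```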

Comparison with Traditional Methods.
Backpropagation neural networks (BPNNs) and SVMs have been widely used in fault detection for rotating equipment. In this part, we compare our method with a multilayer BPNN whose structure is 2048-500-500-10, another multi-hidden-layer BPNN used in a previous study, whose network structure is 2048-600-200-100-10, a neural network with only one hidden layer, whose network structure is 2048-500-10, an SVM-based model, an ELM, and a traditional SAE-based model with the same architecture as the proposed method. A comparison of the diagnostic accuracy of the different methods across 10 experiments is shown in Figure 3.
The test results demonstrate that our proposed method achieves the most accurate detection and the lowest volatility, and thus has a clear advantage in stability. The SVM also has relatively stable classification performance compared with the other methods, although its volatility is slightly greater than that of the proposed method. The BPNN with only one hidden layer shows worse recognition accuracy and greater volatility than our method or the SVM. In addition, the multilayer BPNN2 performs better than the smaller BPNN1, but the multi-hidden-layer BPNN3, which has the deepest structure, gives poorer diagnostic results. This shows that the classification accuracy of a BPNN cannot always be improved simply by increasing the number of network layers; instead, a reasonable network structure must be selected. In addition, the results obtained from the traditional SAE-based method are similar to those of BPNN1, which has the same hidden layer structure. This indicates that even if a DNN is pretrained, it does not necessarily extract the useful features of the original time-domain data effectively. The ELM achieves the worst results in all experiments, indicating that this classification method cannot process the original time-domain data effectively. For quantitative comparison, we calculated the mean diagnostic accuracies and standard deviations over the 10 tests. Table 2 details the experimental results.
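The quantitative comparison rests on a mean and a standard deviation per method over repeated trials, which can be computed as in the sketch below. The accuracy values are placeholders for illustration only and are not results from this study; the use of the sample standard deviation (`ddof=1`) is likewise an assumption.

```python
import numpy as np

def summarize_trials(accuracies):
    """Return the mean and sample standard deviation of per-trial accuracies."""
    acc = np.asarray(accuracies, dtype=float)
    return acc.mean(), acc.std(ddof=1)

# Placeholder values for illustration only; these are NOT results from the paper.
example_trials = [0.95, 0.96, 0.94, 0.97, 0.95, 0.96, 0.95, 0.94, 0.96, 0.97]
mean_acc, std_acc = summarize_trials(example_trials)
print(f"{mean_acc:.4f} +/- {std_acc:.4f}")
```

A lower standard deviation across trials corresponds to what the text calls lower volatility.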
It is worth noting that, in each trial, the same training dataset is used to evaluate every method in the comparison. In terms of average classification accuracy, our proposed method achieves the best diagnostic accuracy, while the ELM has the worst. Our method also performs best in terms of classification stability. The results indicate that the method can effectively and stably identify both the type of bearing failure and the severity of the damage. In comparison, the results obtained by the BPNNs in the table are relatively poor in both average accuracy and standard deviation, which shows that our method is better than the BPNNs. In particular, dataset E contains a large number of samples representing the ten bearing health states under different loadings, and the detection accuracy of the proposed method on this dataset indicates that the approach can detect bearing faults independently of load fluctuations.
The above results show that, in comparison with conventional diagnosis approaches, our proposed approach of combining unsupervised DAE pretraining with the supervised BP algorithm has significant advantages. In this method, the DAE is trained with the reconstruction errors generated from multidimensional temporal data to explore the relationships within the time-domain data. Then, the decoder is replaced with a soft-max layer for health condition monitoring and fault diagnosis. Therefore, compared with the traditional recognition methods based on time-domain signal analysis, the proposed method can obtain higher recognition accuracy by means of a reasonable network structure and the introduction of dropout in the pretraining process. The characteristic structure of a time-domain vibration signal is fuzzy, and its regularity is poor. Therefore, the method of combining traditional time-domain features with an SVM classifier results in a low average diagnostic accuracy. However, a DAE is capable of deep learning from the original time-domain data, mining hidden information and identifying differences. Therefore, the DAE-based classification method proposed in this paper can obtain high recognition accuracy in repeated trials and maintain stable performance. Because of the challenges presented by the noise in the original time-domain signals, the traditional SAE-based method cannot obtain good classification performance even though it also relies on a deep structure for network pretraining. In addition, because the structure of a BPNN is relatively shallow and its feature learning ability is limited, its diagnostic accuracy is not high. The BP algorithm applied to a multi-hidden-layer BPNN results in unstable network performance and poor generalization ability.

Conclusions
In this article, we propose an intelligent DAE-based method for diagnosing faults in rotating machinery. The feasibility of this approach for rolling bearing fault detection is verified on five test datasets, which contain a large number of samples representing the health of different bearings under various operating conditions. The diagnosis results on these datasets demonstrate that the method can adaptively mine fault features from time-domain data, and can thereby solve a variety of detection problems and efficiently classify the condition of the machine. Through the introduction of dropout on the input data, the generalization ability and anti-interference ability of the model are improved. Because the fault features are extracted automatically, our proposed method, compared with traditional BPNN-based methods, avoids the dependence on manual labels and does not require an expert-level understanding of mechanical signals and diagnosis, so it can be easily used in practical applications.

Data Availability
The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.

Wireless Communications and Mobile Computing