A Stacked Autoencoder-Based Deep Neural Network for Achieving Gearbox Fault Diagnosis

Machinery fault diagnosis is vital in modern manufacturing, since early detection of faults can prevent dangerous situations. Among the various diagnosis methods, data-driven approaches are gaining popularity with the widespread development of data analysis techniques. In this research, a deep learning method based on stacked autoencoders (SAEs) is proposed for gearbox fault diagnosis. The proposed method extracts salient features directly from frequency-domain signals and eliminates the exhaustive use of handcrafted features. Furthermore, to reduce overfitting during training and improve performance on small training sets, the dropout technique and the ReLU activation function are introduced into the SAEs. Two gearbox datasets are employed to confirm the effectiveness of the proposed method; the results indicate that the proposed method achieves significant improvement and is superior to the raw SAEs and some other traditional methods.


Introduction
As one of the most vital components of rotating machinery, gearboxes are widely used in various industrial fields, such as vehicles and machine tools [1]. Because of the complex structure of a gearbox and the interference of varying working conditions, gears are prone to faults. If a fault is not detected in time, it can cause the entire system to fail and lead to serious loss of property. It is therefore challenging to build an effective fault diagnosis system. In modern industries, data-driven systems have revolutionized manufacturing by enabling computers to collect massive amounts of data from monitored machines [2]. At the same time, machines have become more precise than ever before, and machinery fault diagnosis has increasingly embraced multifault diagnosis in condition monitoring systems. In contrast to the top-down modeling of physics-based fault diagnosis systems, data-driven systems provide a bottom-up model to detect the occurrence of machinery faults [3]. Physics-based methods cannot be updated online with measured data and do not cope well with large-scale data. With fast-developing computer systems and sensors, data-driven fault diagnosis systems have attracted increasing attention.
The basic framework of a data-driven system usually consists of four consecutive stages: data acquisition, feature extraction, model training, and model testing [4]. Conventional data-driven methods usually design a suitable set of features and then feed them into shallow machine learning models such as Naive Bayes (NB) [5], Support Vector Machines (SVM) [6], and logistic regression [7]. These works rely on manual feature extraction (statistical, frequency, and time-frequency features), which requires considerable human labor and cannot be updated online [8]. Moreover, the selection of these features depends on prior knowledge and feature engineering. It is therefore difficult for these methods to extract the intrinsic features behind the raw time-series data. As an active subfield of machine learning, deep learning has been regarded as a powerful solution for intelligent fault diagnosis systems, because it extracts salient features through multilayer architectures such as artificial neural networks (ANN) [9,10], autoencoders [11,12], restricted Boltzmann machines (RBM) [13,14], and convolutional neural networks (CNN) [15,16]. Compared with traditional methods, deep learning methods do not need human labor and expert knowledge for feature extraction. All the parameters in the model training and pattern classification modules can be trained jointly. Therefore, deep learning can be employed to address machinery fault diagnosis in a very general way.
As one of the widely used deep learning techniques, stacked autoencoders (SAEs) have attracted considerable attention in fault diagnosis. They were investigated as a common component of DNNs by Bengio et al. [17]. Jia et al. [18] proposed SAE-based DNNs for roller bearing and planetary gearbox fault diagnosis, taking frequency spectra obtained by the Fourier transform as input. Guo et al. [19] employed multidomain statistical features of the raw vibration signals as the input of SAEs, which can be viewed as a kind of feature fusion. Liu et al. [20] fed normalized spectrograms created by the STFT into SAEs for rolling bearing fault diagnosis. In the work presented in [21], a nonlinear soft threshold approach and a digital wavelet frame were used to process the measured signal, which was then fed into SAEs for rotating machinery diagnosis. Jia et al. [22] constructed a local connection network based on a normalized sparse autoencoder, and the L1 norm was employed to find sparse features.
Inspired by this prior research, a new framework based on SAEs is proposed for gearbox fault diagnosis. Furthermore, to overcome overfitting in the training process and improve performance on small training sets, the dropout technique [23] and the ReLU activation function are introduced into the SAEs. The rest of the paper is organized as follows. Section 2 briefly introduces SAEs, dropout, and the ReLU activation function. Section 3 details the proposed method. In Section 4, the multifault gearbox dataset is adopted to validate the effectiveness of the proposed method, and its superiority is shown by comparison with other traditional methods. Finally, some conclusions are drawn in Section 5.

Stacked Autoencoders.
An autoencoder is an unsupervised learning structure with three layers: an input layer, a hidden layer, and an output layer, as shown in Figure 1. Training an autoencoder involves two parts: an encoder and a decoder. The encoder maps the input data into a hidden representation, and the decoder reconstructs the input data from that hidden representation. Given the unlabeled input dataset $\{x_n\}_{n=1}^{N}$, where $x_n \in \mathbb{R}^{m \times 1}$, $h_n$ denotes the hidden encoder vector calculated from $x_n$, and $\hat{x}_n$ is the decoder vector of the output layer. The encoding process is as follows:
$$h_n = f(W_1 x_n + b_1), \tag{1}$$
where $f$ is the encoding function, $W_1$ is the weight matrix of the encoder, and $b_1$ is the bias vector.
The decoding process is defined as follows:
$$\hat{x}_n = g(W_2 h_n + b_2), \tag{2}$$
where $g$ is the decoding function, $W_2$ is the weight matrix of the decoder, and $b_2$ is the bias vector. The parameter sets of the autoencoder are optimized to minimize the reconstruction error:
$$\min_{W_1, b_1, W_2, b_2} \frac{1}{N} \sum_{n=1}^{N} L(x_n, \hat{x}_n), \tag{3}$$
where $L$ represents the loss function $L(x, \hat{x}) = \|x - \hat{x}\|^2$.
As shown in Figure 2, SAEs are built by stacking n autoencoders into n hidden layers with an unsupervised layer-wise learning algorithm, followed by supervised fine-tuning. The SAE-based method can therefore be divided into three steps:
(1) Train the first autoencoder on the input data and obtain the learned feature vector.
(2) Use the feature vector of the former layer as the input for the next layer, and repeat this procedure until all layers have been trained.
(3) After all the hidden layers are trained, use the backpropagation (BP) algorithm to minimize the cost function and update the weights with the labeled training set to achieve fine-tuning.
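Steps (1) and (2) above can be illustrated with a minimal NumPy sketch of greedy layer-wise pretraining. It is not the authors' implementation: the sigmoid encoder, the linear decoder, plain batch gradient descent, and the names `train_autoencoder` and `pretrain_stack` are all assumptions made for illustration, and the learning rate would need tuning for layers as wide as those used later in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=100, lr=0.1, seed=0):
    """Train one autoencoder on X (rows are samples) by batch gradient descent.

    Assumptions: sigmoid encoder, linear decoder, squared-error loss.
    Returns the encoder weights, encoder bias, and the loss per epoch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)      # encoder: h = f(W1 x + b1)
        X_hat = H @ W2 + b2           # decoder: x_hat = W2 h + b2
        err = X_hat - X
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Gradients of the mean squared reconstruction error.
        d_out = 2.0 * err / n_samples
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, losses


def pretrain_stack(X, hidden_sizes, **kwargs):
    """Greedy layer-wise pretraining: each layer is trained on the
    features produced by the previously trained layer."""
    params, features = [], X
    for n_hidden in hidden_sizes:
        W, b, _ = train_autoencoder(features, n_hidden, **kwargs)
        params.append((W, b))
        features = sigmoid(features @ W + b)
    return params, features
```

The returned `params` would serve as the initial weights of the hidden layers before the supervised fine-tuning of step (3).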

Dropout.
Dropout is an effective strategy that has been shown to reduce overfitting in the training of neural networks. Overfitting tends to occur when the training set is small, and it results in low accuracy on the test set. During training, dropout randomly deactivates neurons of the hidden layer, as shown in Figure 3, while the weights of those neurons are preserved. The deactivated neurons become active again when the next sample is input. Technically, dropout is implemented by setting the output of some hidden neurons to 0, so that these neurons do not take part in forward propagation. Many studies have examined the effect of dropout on reducing overfitting for small training sets [28], and this paper also employs it to enhance the feature extraction ability and classification accuracy of SAEs for multifault gearbox fault diagnosis.
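The masking described above can be sketched in a few lines of NumPy. The rescaling by 1/(1 - rate), often called inverted dropout, is an assumption added here so that the expected activation is unchanged at test time; the paper only states that the outputs of the dropped neurons are set to 0. The function name `dropout` and its parameters are illustrative.

```python
import numpy as np


def dropout(h, rate, rng, training=True):
    """Apply (inverted) dropout to hidden activations h.

    rate is the probability of dropping a neuron. A fresh random mask is
    drawn on every call, so a neuron dropped for one sample can be active
    again for the next. The weights themselves are never modified.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return h
    keep = rng.random(h.shape) >= rate   # True where the neuron stays active
    return h * keep / (1.0 - rate)       # rescaling is an assumption, see above
```

At test time the function is called with `training=False`, which returns the activations unchanged.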

ReLU.
For traditional activation functions (the sigmoid and hyperbolic tangent functions), the gradients shrink quickly as the training error is propagated back through the layers. The rectified linear unit (ReLU) activation function has received extensive attention in recent years, since its gradient does not decrease as the input increases. A network with ReLU therefore does not suffer from gradient diffusion or vanishing in the same way. The ReLU function is given in (4), and its shape is displayed in Figure 4:
$$f(x) = \max(0, x). \tag{4}$$
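The difference in gradient behavior can be checked numerically. The sketch below compares the derivative of ReLU with that of the sigmoid; the function names are illustrative. The sigmoid derivative never exceeds 0.25, so a product of such factors across several layers shrinks quickly, whereas the ReLU derivative is exactly 1 for any positive input.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (np.asarray(x) > 0).astype(float)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25.
    s = sigmoid(x)
    return s * (1.0 - s)
```

For example, `sigmoid_grad(10.0)` is below 1e-4 while `relu_grad(10.0)` is 1.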

Proposed Framework
This section details the proposed intelligent fault diagnosis method. In the method, SAEs are combined with dropout to achieve multifault gearbox fault diagnosis. The framework and illustration of the proposed method are displayed in the corresponding figure; the network learns its weight matrices from the frequency spectra of vibration signals. Specifically, the procedure can be described as follows:
(1) The spectra of vibration signals compose the training set $\{X_i, l_i\}_{i=1}^{M}$, where $M$ is the number of samples, $X_i \in \mathbb{R}^{N \times 1}$ is the ith sample containing $N$ Fourier coefficients, and $l_i$ is the health label of $X_i$.
(2) Build the DNNs from SAEs, and then employ the unlabeled training set $\{X_i\}_{i=1}^{M}$ to pretrain the DNNs layer by layer.
(3) Utilize the BP algorithm to update the weights and fine-tune the parameters of the SAEs with the labeled training set $\{X_i, l_i\}_{i=1}^{M}$.
(4) The testing set is adopted to validate the effectiveness of the proposed method.
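The supervised fine-tuning in step (3) can be sketched as follows for a single pretrained hidden layer. The softmax output layer with cross-entropy loss is an assumption, since the text does not name the classifier or cost function; dropout is omitted for brevity, and the names `fine_tune` and `predict` are illustrative.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fine_tune(X, labels, W1, b1, n_classes, epochs=200, lr=0.1, seed=0):
    """Fine-tune a pretrained ReLU layer (W1, b1) together with a newly
    initialized softmax layer, using backpropagation on labeled data."""
    rng = np.random.default_rng(seed)
    W1, b1 = W1.copy(), b1.copy()          # leave the pretrained arrays untouched
    W2 = rng.normal(0.0, 0.1, (W1.shape[1], n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]          # one-hot targets
    n = len(labels)
    losses = []
    for _ in range(epochs):
        Z1 = X @ W1 + b1
        H = np.maximum(0.0, Z1)            # ReLU hidden layer
        P = softmax(H @ W2 + b2)
        losses.append(float(-np.mean(np.log(P[np.arange(n), labels] + 1e-12))))
        d_logits = (P - Y) / n             # gradient of the mean cross-entropy
        d_hidden = (d_logits @ W2.T) * (Z1 > 0)
        W2 -= lr * (H.T @ d_logits)
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * (X.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)
    return (W1, b1, W2, b2), losses


def predict(X, params):
    W1, b1, W2, b2 = params
    return np.argmax(np.maximum(0.0, X @ W1 + b1) @ W2 + b2, axis=1)
```

In step (4), `predict` would be applied to the held-out testing spectra and compared with their health labels.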

Case 1: Fault Diagnosis of a Multifault Gearbox
Gear faults, including distributed faults (worn), localized faults (broken, pit), and coupled faults in the power train, may cause catastrophic accidents. Early recognition of gear faults is therefore critical for the normal operation of a gearbox. This paper focuses on the multifault gearbox. In this section, a multifault gearbox experimental dataset is employed to validate the effectiveness of the proposed method [29]. The vibration signals were collected on a specially designed bench consisting of a motor with single-phase input and three-phase output (nominal power 0.75 kW, nominal rotation frequency 880 rpm), a gearbox, the shaft supporting seats, a flexible coupling, and a magnetic powder brake, as shown in Figure 6. The sensor is a piezoelectric accelerometer (DH131E) mounted on the flat surface of the gearbox, and the sampling frequency is 5120 Hz. The gearbox includes two gears (pinion and wheel gear), and the gear parameters are displayed in Table 1. There are six health conditions under three loads: normal, a single worn pinion, a single pit of the wheel, a single broken tooth of the wheel, a coupled fault of broken wheel and worn pinion, and a coupled fault of wheel pit and worn pinion. For brevity, the six health conditions are named Type-1, Type-2, Type-3, Type-4, Type-5, and Type-6, respectively. 100 data samples are collected from each type under each load in an overlapping manner, so a total of 1800 samples are obtained from the designed bench, and each sample contains 1000 data points. Since the rotation frequency of the shaft is 880 rpm, each revolution spans about 350 data points at the sampling frequency of 5120 Hz (5120 / (880/60) ≈ 349). To reduce the influence of speed fluctuation, each sample covers almost three revolutions (1000 data points). The frequency spectra are adopted as input data, and each sample contains 500 Fourier coefficients.
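The sample preparation described above can be sketched as follows. The constants come from the text; the stride `step` of the overlapping windows, the choice of keeping the first 500 magnitude bins of the FFT, and the function names are assumptions, since the paper does not state the overlap or the normalization of the spectra.

```python
import numpy as np

FS_HZ = 5120.0      # sampling frequency from the text
SHAFT_RPM = 880.0   # nominal rotation frequency from the text
SAMPLE_LEN = 1000   # data points per sample from the text


def points_per_revolution(fs_hz=FS_HZ, rpm=SHAFT_RPM):
    """Number of data points recorded during one shaft revolution."""
    return fs_hz / (rpm / 60.0)


def overlapping_segments(signal, length=SAMPLE_LEN, step=500):
    """Cut a long record into overlapping samples.

    step is a hypothetical stride; the paper does not give the overlap.
    """
    starts = range(0, len(signal) - length + 1, step)
    return np.stack([signal[s:s + length] for s in starts])


def amplitude_spectrum(sample):
    """Magnitudes of the first len(sample) // 2 Fourier coefficients."""
    coeffs = np.fft.rfft(sample)           # len(sample) // 2 + 1 bins
    return np.abs(coeffs[: len(sample) // 2])
```

With these values, `points_per_revolution()` gives roughly 349 points, so a 1000-point sample covers a little under three revolutions, and `amplitude_spectrum` maps each sample to a 500-dimensional input vector.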
The major reason for using frequency spectra is that they show the distribution of the constitutive components at discrete frequencies and give clearer information about the state of rotating machines [18]. Here we randomly select 4 samples from the normal type of gear and obtain their Fourier coefficients by FFT, as shown in Figure 7. The time-domain waveforms of the samples differ from each other, but their frequency spectra are consistent with one another. The designed DNN has layer sizes of 500, 200, 100, and 6 (the input layer, two hidden layers, and the output layer, respectively).
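The layer sizes above fix the shapes of the weight matrices. The sketch below builds such a network and counts its parameters; the random initialization, the softmax output, and the function names are assumptions for illustration, since in the proposed method the hidden layers are initialized by SAE pretraining.

```python
import numpy as np

LAYER_SIZES = [500, 200, 100, 6]   # input, two hidden layers, output


def init_params(layer_sizes, seed=0):
    """One (weights, bias) pair per pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def count_params(params):
    return sum(W.size + b.size for W, b in params)


def forward(X, params):
    """ReLU hidden layers followed by a softmax output (an assumption)."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = params[-1]
    z = h @ W + b
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)
```

For these sizes the network has 500·200 + 200 + 200·100 + 100 + 100·6 + 6 = 120906 trainable parameters, which is large relative to a training set of a few hundred samples and is one reason regularization such as dropout matters here.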

Diagnosis Results.
The dropout rate is varied from 0 to 0.7 with a step size of 0.1, and 15 trials are carried out for each setting in order to reduce the effect of randomness. 10% of the samples are randomly selected to train the model, and the rest are used for testing. The diagnosis accuracies are shown in Figure 8. When the dropout rate is 0.3, the diagnosis accuracy is the highest and the standard deviation is the lowest, so 0.3 is chosen as the dropout rate in this experiment.
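The selection protocol described above (repeated random 10% splits for each dropout rate, then a comparison of mean and standard deviation) can be sketched as follows. The callback `train_and_score` is hypothetical: it stands for training the model at a given dropout rate on the training indices and returning the testing accuracy. The helper `best_rate` picks the rate by mean accuracy only, which is a simplification of the criterion in the text.

```python
import numpy as np

RATES = [round(0.1 * k, 1) for k in range(8)]   # 0.0, 0.1, ..., 0.7


def split_indices(n_samples, train_fraction, rng):
    """Random split into training and testing indices."""
    perm = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return perm[:n_train], perm[n_train:]


def sweep_dropout_rates(train_and_score, n_samples, rates=RATES,
                        n_trials=15, train_fraction=0.1, seed=0):
    """Return {rate: (mean accuracy, standard deviation)} over the trials.

    train_and_score(rate, train_idx, test_idx) is a hypothetical callback
    supplied by the caller.
    """
    rng = np.random.default_rng(seed)
    summary = {}
    for rate in rates:
        scores = []
        for _ in range(n_trials):
            train_idx, test_idx = split_indices(n_samples, train_fraction, rng)
            scores.append(train_and_score(rate, train_idx, test_idx))
        summary[rate] = (float(np.mean(scores)), float(np.std(scores)))
    return summary


def best_rate(summary):
    """Rate with the highest mean accuracy."""
    return max(summary, key=lambda rate: summary[rate][0])
```

With 1800 samples and a training fraction of 0.1, each trial trains on 180 samples and tests on the remaining 1620.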
To classify the six health conditions of the gears, 10% of the samples are employed to train the proposed model and the rest are used for testing. The learning rate is 0.01 and the number of iterations is 100. The training and testing accuracies of 15 trials are displayed in Figure 9, and the average training and testing accuracies are 100% and 99.34% ± 0.25%, respectively, which indicates that the proposed model can distinguish the six health conditions of the gear with high accuracy. To illustrate the process concretely, the classification results of the 14th testing trial are drawn in Figure 10. It can be seen that 3 of the testing samples are misclassified, yielding a success rate of 99.44%. Among them, 1 sample of type-3 is misclassified as type-4, 2 samples of type-4 are misclassified as type-3, and 1 sample of type-5 is misclassified as type-6. To further display the ability of the proposed method, t-distributed stochastic neighbor embedding (t-SNE) [30] is employed to visualize the learned features: the 100-dimensional feature vector is embedded into a 3-dimensional feature vector. The result is shown in Figure 11. Samples of the same type are gathered together, and different types are well separated.
For comparison, several diagnosis methods are evaluated, and the diagnosis results are displayed in Table 2. Li et al. [24] proposed a method combining 19 time-domain and frequency-domain features with a self-organizing map; when applied to classify the six types of the gearbox dataset, it achieved 92.51% ± 4.23% testing accuracy. In [25], wavelet multifractal features and an SVM model were used to represent the six gear fault types, obtaining 87.65% ± 5.37% classification accuracy. Lin et al. [26] proposed a fault diagnosis method using multifractal detrended fluctuation analysis (MFDFA), which here achieved 96.23% ± 1.69% accuracy. Lou et al. [27] applied multiple domain features and ensemble fuzzy ARTMAP neural networks to distinguish the health conditions, and 98.83% ± 0.42% accuracy was obtained. Furthermore, the raw stacked autoencoders without dropout (Raw SAEs) are also adopted for comparison; their testing accuracy is just 93.16% ± 3.78%, which demonstrates the effectiveness of dropout in feature extraction. Compared with the methods above, the proposed method not only distinguishes the six health conditions of the gearbox automatically but also achieves a higher accuracy with a lower percentage of training samples.
To further investigate the features learned by the proposed model, another experiment is conducted, as shown in Figures 12 and 13. Two levels of learned features can be obtained from the two hidden layers of the DNNs, so 200- and 100-dimensional learned feature vectors are obtained for each sample, respectively. For a clear visualization, all the learned feature vectors of test samples of the same type are gathered together [31]. Figure 12 displays the learned 100-dimensional features of the gearbox dataset using the raw SAEs without dropout, and Figure 13 shows the learned 100-dimensional features using the proposed method. It can be clearly seen that the feature vectors learned by the proposed method for one health condition follow almost the same trend. By contrast, the feature vectors learned by the raw SAEs for each health condition are mixed with each other and show no unified trend. In addition, the amplitudes of the feature vectors differ between the two methods: the proposed method learns more distinctive feature vectors with larger amplitudes than the raw SAEs method. Therefore, the proposed method mines the main variations of different fault signals in a high-order space more effectively than the raw SAEs method without dropout.

Case 2: Fault Diagnosis of a Motorcycle Gearbox.
To further validate the proposed method, a motorcycle gearbox dataset [32] is taken as the second case. Figure 14 displays the gearbox in question; besides the gearbox, the setup includes an electrical motor with a rotation speed of 1420 rpm, a data acquisition system, a tachometer, a triaxial accelerometer, and a load mechanism. The sampling frequency was 16384 Hz. There are four health types of gear, as shown in Figures 14(b)-14(c): normal condition (NC), slightly worn (SW), medium worn (MW), and broken tooth (BT). In Figure 14(e), the gears, which have 24 teeth and 29 teeth (tested gear), are a pair of driven and driving gears. The vibration signals of the four gear health types are depicted in Figure 15. NC and BT are easily distinguished, but SW and MW are hard to tell apart. 50 samples of NC and 100 samples of SW, MW, and BT are collected, and each sample contains 1000 data points.
Similarly, 10% of the samples are employed to train the proposed model and the rest are used for testing, with the same parameter settings as in Case 1. The training and testing accuracies of 15 trials are displayed in Figure 16, and the average training and testing accuracies are 100% and 99.26% ± 0.41%, respectively, which again indicates that the proposed model can distinguish the four health types of the motorcycle gearbox with high accuracy. The classification results of the 3rd testing trial are displayed in Figure 17. Only 2 samples are misclassified: 1 sample of SW is misclassified as MW, and 1 sample of MW is misclassified as SW. The visualization of the 2-dimensional feature vectors mapped by t-SNE is shown in Figure 18. The classes are again well separated, which illustrates the robustness of the proposed method.

Conclusions
An intelligent fault diagnosis method based on SAEs is presented for gearbox fault diagnosis. To reduce overfitting and improve the performance of traditional SAEs on small training sets, the dropout technique and the ReLU activation function are both adopted. As illustrated in the experimental study, the proposed method can extract useful features from different gear fault types and achieve high diagnosis accuracy. Comparison studies show that the proposed method outperforms the raw SAEs method and some other traditional methods. In addition, the visualization of the learned features illustrates that, with the help of the dropout technique and the ReLU activation function, the proposed method captures salient features and obtains a higher diagnosis accuracy than the raw SAEs method. It also helps to show how DNNs process mechanical signals, which is worth further study in fault diagnosis. In future work, a wider range of experiments will be conducted to evaluate the robustness of the proposed method.

Data Availability
The data used to support the findings of this study can be found in the online versions at https://doi.org/10.1006/mssp.2000.1338 and http://dx.doi.org/10.1016/j.ymssp.2016.06.012.