Reliable Fault Diagnosis of Rotary Machine Bearings Using a Stacked Sparse Autoencoder-Based Deep Neural Network

Due to enhanced safety, cost-effectiveness, and reliability requirements, fault diagnosis of bearings using vibration acceleration signals has been a key area of research over the past several decades. Many fault diagnosis algorithms have been developed that can efficiently classify faults under constant speed conditions. However, the performance of these traditional algorithms deteriorates with fluctuations of the shaft speed. In the past couple of years, deep learning algorithms have not only improved the classification performance in various disciplines (e.g., in image processing and natural language processing), but also reduced the complexity of feature extraction and selection processes. In this study, using complex envelope spectra and stacked sparse autoencoder (SSAE) based deep neural networks (DNNs), a fault diagnosis scheme is developed that can overcome fluctuations of the shaft speed. The complex envelope spectrum makes the frequency components associated with each fault type more prominent, hence helping the autoencoders to learn the characteristic features from the given input signals more readily. Moreover, the use of an SSAE-based DNN for bearing fault diagnosis avoids the need for the handcrafted features that are used in traditional fault diagnosis schemes. The experimental results demonstrate that the proposed scheme outperforms conventional fault diagnosis algorithms in terms of fault classification accuracy when tested with variable shaft speed data.


Introduction
Reliable fault diagnosis of industrial machinery is an essential task, as it not only contributes to the safety and reliability of the machinery, but also decreases the associated maintenance and operational costs [1][2][3][4][5][6][7]. Vibration acceleration signals collected from complex industrial machines provide useful information about their health status, and therefore, vibration condition monitoring is considered a standard approach that allows for corroboration as a part of reliable fault diagnosis schemes [8][9][10][11][12]. Bearings are the most frequently used components in rotating machinery and account for approximately 40-51% of the failure occurrences [13][14][15]. As a result, bearing fault diagnosis has been extensively investigated.
Traditional fault diagnosis methods use feature extraction followed by a machine learning algorithm, such as k-nearest neighbors (k-NN), support vector machines (SVMs), or artificial neural networks (ANNs), to perform fault diagnosis [16][17][18][19][20]. Feature extraction is a cumbersome process that requires expert knowledge and also adds to the complexity of the fault diagnosis scheme [21]. In one previous study, ten statistical parameters reflecting the bearing health conditions were first calculated, and the calculated features were then provided as input to an ANN for fault classification [22]. In another study, nineteen statistical parameters were extracted from the vibration signals, and fault classification was performed using SVMs [23]. A combination of the coefficients of a linear time-invariant autoregressive model and a nearest neighbor classifier was also utilized for fault diagnosis [24]. These methods can efficiently perform fault classification under constant shaft speeds. However, their efficiency decreases when they are tested with data for variable shaft speeds. A mechanism is therefore required that allows the underlying network to extract useful information from nonstationary shaft speed data, making it suitable for fault diagnosis under variable shaft speed conditions. In recent years, deep learning has emerged as a useful tool for solving pattern recognition, image processing, computer vision, and natural language processing problems and is capable of attaining informative features from minimally processed data via nonlinear transformations [25][26][27]. Deep learning algorithms can replace the need for handcrafted features, as the algorithms are capable of unsupervised feature learning and hierarchical feature extraction [28].
In this study, a three-step mechanism was developed that can diagnose bearing faults under shaft speed fluctuations using vibration acceleration signals. First, the energy-frequency distribution of the input vibration acceleration signal is estimated by calculating its complex envelope spectrum. Then, the calculated complex envelope spectrum, which captures essential properties of the signal, is provided as input to the stacked sparse autoencoder (SSAE) based deep neural network. Finally, fault classification is performed using a softmax classifier. The efficiency of the proposed scheme was evaluated by testing it with vibration data obtained for four different shaft speeds. The results demonstrate a noticeable improvement of the diagnostic performance compared to existing techniques. The rest of this paper is organized as follows. Section 2 describes the dataset used for this study. Section 3 presents the details of the proposed scheme, and Section 4 discusses the experimental results. Finally, Section 5 concludes the paper.

Bearing Dataset
The effectiveness of the proposed scheme was tested using bearing fault data provided by Case Western Reserve University [29]. The faults were seeded in the test bearings on their outer raceway, inner raceway, and rolling element by using electro-discharge machining (EDM). A variable length vibration signal was collected each time from a 2-horsepower (hp) Reliance Electric motor for a normal condition and three faulty conditions. The drive end vibration data obtained from an accelerometer, placed at the 12 o'clock position on the bearing housing, was utilized in the experiments. The sampling rate, fault diameter, and crack depth were 12,000 Hz, 7 mils, and 0.11 inches, respectively. The data was collected for four shaft speeds of 1,722, 1,748, 1,772, and 1,796 revolutions per minute (rpm). The four types of signals, specifying the three faulty states and the one normal state, are shown in Table 1.

Complex Envelope Spectrum and Stacked Autoencoders for Fault Diagnosis
The proposed speed invariant fault diagnosis scheme is illustrated in Figure 1. First, segmentation of the vibration acceleration signal is carried out using a fixed size window, resulting in 117 segments of 1,024 data points each for every fault type and speed condition. The complex envelope spectrum of each segment is computed to reveal the instantaneous features hidden in the time domain signal. This spectrum contains a series of impulses at certain defect frequencies associated with each fault type [30]. The defect frequencies are functions of the shaft speed, and therefore, a slight variation of the shaft speed shifts the positions of these defect frequencies in the envelope power spectrum. Consequently, under variable speed conditions, traditional envelope analysis yields poor results, as it relies on detecting bearing faults through the exact locations of the corresponding defect frequencies in the envelope power spectrum. The proposed method does not rely on the exact locations of the defect frequencies; instead, it mines features from the entire spectrum. A variation of the shaft speed changes the exact locations of the peaks at the defect frequencies. However, the relative positions of the peaks at the defect frequencies and the peaks at the principal harmonics of these defect frequencies remain unaltered. Thus, changes of the shaft speed skew the envelope power spectrum but do not drastically change its overall shape and structure. Therefore, due to their unsupervised learning and hierarchical feature extraction mechanism, stacked autoencoders can overcome the speed variations and automatically mine meaningful information from the complex envelope spectrum. In this study, the focus was to compute the complex envelope spectra of the three fault conditions (i.e., inner raceway, outer raceway, and roller faults). As a result, stacked autoencoders may take advantage of variations of the amplitude levels, as well as the relative positions of the defect frequencies and their principal harmonics in the spectrum, resulting in the extraction of informative features.
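The segmentation step and the speed dependence of the spectrum can be illustrated with a minimal sketch. It assumes non-overlapping windows (the text only states a fixed window size), and the helper names `segment` and `shaft_frequency_hz` are hypothetical; the sampling rate, window length, and shaft speeds are taken from the text.

```python
# Sketch only: fixed-window segmentation and the frequency grid it implies.
# Assumption: windows do not overlap; the source does not state this.

FS = 12_000                               # sampling rate in Hz (from the text)
WINDOW = 1_024                            # samples per segment (from the text)
SPEEDS_RPM = (1722, 1748, 1772, 1796)     # shaft speeds (from the text)


def segment(signal, window=WINDOW):
    """Split a 1-D sequence into non-overlapping windows, dropping the remainder."""
    n_segments = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(n_segments)]


def shaft_frequency_hz(rpm):
    """Rotation frequency of the shaft; defect frequencies scale with this value."""
    return rpm / 60.0


if __name__ == "__main__":
    placeholder = [0.0] * (117 * WINDOW + 300)   # dummy record, not real data
    print(len(segment(placeholder)))             # number of full windows
    print(FS / WINDOW)                           # spacing of the spectral bins in Hz
    for rpm in SPEEDS_RPM:
        print(rpm, shaft_frequency_hz(rpm))
```

Between the slowest and fastest speed the shaft frequency changes by roughly 4%, so every defect frequency and its harmonics are stretched by the same factor while their ratios stay fixed, which is the property the scheme relies on.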

Complex Envelope Spectrum.
The following steps are involved in computing the complex envelope spectrum.
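(1) Computation of the analytic signal. The analytic signal is composed of a real signal and its Hilbert transform. Let $x(t)$ be a time domain signal; then its Hilbert transform $\hat{x}(t)$ and a new time domain signal $z(t)$, which is also known as the analytic signal, can be mathematically represented as shown below.
$$\hat{x}(t) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{x(\tau)}{t-\tau}\,d\tau, \tag{1}$$
$$z(t) = x(t) + j\,\hat{x}(t). \tag{2}$$
(2) Computation of the spectrum. The Fourier transform of the analytic signal, $\mathcal{F}\{z(t)\}$, is calculated.
(3) The absolute value of this spectrum, $|\mathcal{F}\{z(t)\}|$, is used for further processing with stacked autoencoders.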
Typically, a high pass filter is applied to the raw signal as a preprocessing step to eliminate the effects caused by slow vibrations. Given that the deep network can extract meaningful information automatically from the input data, high pass filtering of the raw signal is skipped while computing the complex envelope spectrum. Thus, the proposed method reduces the complexity, as well as the number of steps, of calculating the complex envelope spectrum that is given as input to the deep network.
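The computation can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes that the spectrum is the magnitude of the Fourier transform of the analytic signal, builds the analytic signal with the standard FFT-based construction, and applies no high pass filter, in line with the text. The function names are hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Return x + j*H{x}, built by zeroing the negative-frequency FFT bins."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                          # keep the DC bin unchanged
    if n % 2 == 0:
        gain[n // 2] = 1.0                 # keep the Nyquist bin for even lengths
        gain[1:n // 2] = 2.0               # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def complex_envelope_spectrum(segment):
    """Magnitude of the Fourier transform of the analytic signal (assumed form)."""
    z = analytic_signal(segment)
    return np.abs(np.fft.fft(z))


if __name__ == "__main__":
    n = 1024
    t = np.arange(n)
    test_tone = np.cos(2 * np.pi * 50 * t / n)   # synthetic tone, not bearing data
    spec = complex_envelope_spectrum(test_tone)
    print(spec.shape, int(np.argmax(spec)))
```

The output has the same length as the 1,024-sample input window. A more conventional envelope analysis would instead transform the envelope magnitude $|z(t)|$; which variant is intended is not fully recoverable from the text, so the choice above is an assumption.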

[Figure: autoencoder architecture showing the input layer, hidden layer, and output layer, with the encode and decode stages.]

Stacked Autoencoders.
A stacked autoencoder is a deep artificial neural network having more than one hidden layer, and it is formed by stacking simple autoencoders for feature extraction and classification. The functionality of stacked autoencoders can be understood by first considering a single autoencoder. Figure 2 shows the architecture of a typical stacked autoencoder. An autoencoder is a three-layered artificial neural network (ANN) that operates in an unsupervised manner. There are two main parts of an autoencoder, the encoder and the decoder. The encoder part takes an input $x \in \mathbb{R}^{K}$ and provides an output $h \in \mathbb{R}^{M}$ by transforming the input from a higher-dimensional space (i.e., $K$ dimensions) to a lower-dimensional space (i.e., $M$ dimensions). The produced output vector is known as the codes or latent variables, and it can be mathematically represented as follows:
$$h = f_{1}(Wx + b), \tag{3}$$
where $f_{1}$, $W$, and $b$ are the encoding activation function, weights, and biases, respectively, which are used in the hidden layer. On the other hand, the decoder part tries to reconstruct the input $x$ from the generated codes. The reconstruction process can be represented as follows:
$$\hat{x} = f_{2}(W'h + b'). \tag{4}$$
Here, $f_{2}$, $W'$, and $b'$ are the decoding transfer function, weights, and biases, respectively, which are used in the reconstruction process. During the reconstruction process, autoencoders try to minimize the reconstruction error by using the following loss function:
$$E = \frac{1}{N}\sum_{n=1}^{N}\left\lVert x_{n} - \hat{x}_{n}\right\rVert^{2}, \tag{5}$$
where $N$ is the number of observations. In the current work, sparse autoencoders were used to create stacked sparse autoencoders (SSAEs). The concept of sparsity in an autoencoder is explained in the following section.
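The encode, decode, and reconstruction-error steps described above can be sketched in a few lines. The sigmoid activation for both transfer functions, the random initialization, and the 1024-to-100 layer size are assumptions made for illustration; the function and variable names are hypothetical.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def encode(X, W, b):
    """Map each row of X (K inputs) to a code of M latent variables."""
    return sigmoid(X @ W.T + b)


def decode(H, W_dec, b_dec):
    """Reconstruct K-dimensional inputs from the M-dimensional codes."""
    return sigmoid(H @ W_dec.T + b_dec)


def reconstruction_error(X, X_hat):
    """Mean over observations of the squared reconstruction distance."""
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))


# Illustrative sizes: 8 observations, 1024 inputs, 100 hidden nodes.
rng = np.random.default_rng(0)
K, M, N = 1024, 100, 8
X = rng.random((N, K))                                  # placeholder inputs
W, b = rng.normal(0.0, 0.01, (M, K)), np.zeros(M)       # encoder parameters
W_dec, b_dec = rng.normal(0.0, 0.01, (K, M)), np.zeros(K)  # decoder parameters

H = encode(X, W, b)
X_hat = decode(H, W_dec, b_dec)
print(H.shape, X_hat.shape, reconstruction_error(X, X_hat))
```

With untrained weights the error is large; training adjusts the encoder and decoder parameters to reduce it.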

Sparse Autoencoder.
The sparsity constraint can be introduced to the cost function of an autoencoder with the help of a regularization term.This regularization term is a function which measures the average output activation value of a neuron and is helpful in avoiding the overfitting problem.
The regularized cost function after introducing sparsity and weight regularization can be represented as follows:
$$E = \frac{1}{N}\sum_{n=1}^{N}\left\lVert x_{n} - \hat{x}_{n}\right\rVert^{2} + \lambda\,\Omega_{\text{weights}} + \beta\,\Omega_{\text{sparsity}}. \tag{6}$$
Here, $\lambda$ indicates the $L_{2}$ regularization coefficient and $\beta$ is the sparsity regularization coefficient. During the training of an autoencoder, it is possible that the value of the sparsity regularization term decreases by increasing the values of the weights $W$ and decreasing the values of the latent codes $h$. This issue can be resolved by introducing $L_{2}$ regularization to the cost function, which can be formulated as
$$\Omega_{\text{weights}} = \frac{1}{2}\sum_{l=1}^{L}\sum_{j=1}^{N}\sum_{i=1}^{K}\left(w_{ji}^{(l)}\right)^{2}, \tag{7}$$
where $L$, $N$, and $K$ denote the number of hidden layers, number of observations, and number of variables in the input data, respectively.
The sparsity regularization term controls the sparsity constraint on the output from the hidden layer neurons. It takes a higher value when the $i$th neuron provides an average activation value $\tilde{V}_{i}$ that deviates strongly from the desired value $V$. It can be defined by the Kullback-Leibler divergence as follows:
$$\Omega_{\text{sparsity}} = \sum_{i=1}^{M}\mathrm{KL}\left(V \,\middle\|\, \tilde{V}_{i}\right) = \sum_{i=1}^{M}\left[V\log\frac{V}{\tilde{V}_{i}} + (1 - V)\log\frac{1 - V}{1 - \tilde{V}_{i}}\right]. \tag{8}$$
The function given in (8) measures the difference between the two distributions; if the two distributions are equal, it takes a zero value and increases as the distributions diverge from each other. When minimizing the cost function, this term is forced to be as small as possible; as a result, the two values $V$ and $\tilde{V}_{i}$ come closer to one another. The activation $\tilde{V}_{i}$ can be defined as
$$\tilde{V}_{i} = \frac{1}{N}\sum_{j=1}^{N} f_{1}\left(w_{i}^{\top}x_{j} + b_{i}\right). \tag{9}$$
Once all the sparse autoencoders are trained individually, they are stacked to form the deep neural network (DNN). A typical three-layered DNN based on SSAEs is given in Figure 3. In each hidden layer, sparsity is introduced into the network by a sparsity regularization term. In such a DNN, the autoencoders extract useful features through an unsupervised learning process. The DNN is then fine-tuned in a supervised manner using backpropagation in combination with the standard gradient descent algorithm. After fine-tuning, the network is tested using unseen data. The steps that are carried out during the fine-tuning of a deep neural network using the standard gradient descent based backpropagation algorithm are given as follows:
(1) The weights and biases are initialized with small random nonzero values.
(2) A set of input observations $x$ is provided to the DNN, and the corresponding activation $a^{x,1}$ is calculated.
(3) The activations are propagated forward through the layers of the network to obtain the predicted output.
(4) The predicted output is compared with the actual value to calculate the error between the two values. The computed error is denoted by $\delta^{x,L} = \nabla_{a}C_{x} \odot \sigma'(z^{x,L})$, where $\nabla_{a}C_{x}$ is the change in the cost function and $\sigma'$ is the derivative of the activation function used in the neurons of a layer.
(5) Backpropagation of the error is performed to update the weights in order to minimize the error.
(6) The gradient of the cost function is calculated as $\partial C/\partial w_{jk}^{l} = a_{k}^{l-1}\delta_{j}^{l}$ and $\partial C/\partial b_{j}^{l} = \delta_{j}^{l}$.
(7) Steps (1)-(5) are repeated until the overall error is reduced to the smallest possible value.
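The fine-tuning procedure listed above can be sketched as a small gradient descent loop. This is an illustration under stated assumptions, not the authors' implementation: it uses sigmoid units in every layer and a quadratic cost instead of a softmax output, updates the weights after each sample, and uses hypothetical function names and toy data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(sizes, rng):
    """Small random nonzero weights and biases for every layer."""
    return [(rng.normal(0.0, 0.1, (n_out, n_in)), rng.normal(0.0, 0.1, n_out))
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(params, x):
    """Forward pass; returns the activations of every layer, input included."""
    activations = [x]
    for W, b in params:
        activations.append(sigmoid(W @ activations[-1] + b))
    return activations


def backprop(params, x, y):
    """Output error, backpropagated errors, and the resulting gradients."""
    acts = forward(params, x)
    # Quadratic cost: delta = (a - y) * sigma'(z), with sigma'(z) = a * (1 - a).
    delta = (acts[-1] - y) * acts[-1] * (1.0 - acts[-1])
    grads = [None] * len(params)
    for l in range(len(params) - 1, -1, -1):
        # dC/dW = delta * (input activation)^T and dC/db = delta.
        grads[l] = (np.outer(delta, acts[l]), delta)
        if l > 0:
            W = params[l][0]
            delta = (W.T @ delta) * acts[l] * (1.0 - acts[l])
    return grads


def cost(params, X, Y):
    """Average quadratic cost over all observations."""
    return 0.5 * float(np.mean([np.sum((forward(params, x)[-1] - y) ** 2)
                                for x, y in zip(X, Y)]))


def fine_tune(params, X, Y, lr=0.5, epochs=200):
    """Repeat the per-sample gradient descent update for a number of epochs."""
    for _ in range(epochs):
        for x, y in zip(X, Y):
            grads = backprop(params, x, y)
            params = [(W - lr * gW, b - lr * gb)
                      for (W, b), (gW, gb) in zip(params, grads)]
    return params
```

In the scheme described in the text, the initial weights would come from the pretrained sparse autoencoders rather than from random values, so fine-tuning starts from features that were already learned without labels.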

Experimental Setup
In the current work, four experiments were carried out to validate the effectiveness of the proposed scheme when dealing with shaft speed fluctuations. The set of experiments is listed in Table 2, and each experiment was conducted multiple times with different numbers of epochs to train the network. The bearing fault data was divided into four separate datasets based on the shaft speed, with each dataset containing samples from the normal state, as well as from the inner raceway, outer raceway, and roller faults (468 samples).
In each experiment, the network was trained using samples from one shaft speed dataset and validated with the samples of the other shaft speed datasets.
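The train-on-one-speed, test-on-the-rest protocol can be sketched as below. The assignment of a particular speed to a particular experiment number is an assumption, since Table 2 is not reproduced here; the function name is hypothetical, and the sample counts follow from the 117 segments per class stated earlier.

```python
# Sketch of the cross-speed evaluation protocol (experiment order is assumed).

SPEEDS_RPM = (1722, 1748, 1772, 1796)
CLASSES = ("normal", "inner raceway", "outer raceway", "roller")
SEGMENTS_PER_CLASS = 117


def cross_speed_experiments(speeds=SPEEDS_RPM):
    """Yield (train_speed, test_speeds): train on one speed, test on the others."""
    for train_speed in speeds:
        yield train_speed, tuple(s for s in speeds if s != train_speed)


if __name__ == "__main__":
    per_speed = len(CLASSES) * SEGMENTS_PER_CLASS      # samples in one dataset
    for number, (train, test) in enumerate(cross_speed_experiments(), start=1):
        print(number, train, test, per_speed, per_speed * len(test))
```

Each dataset holds 4 × 117 = 468 samples, which matches the figure quoted in the text.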

Parameter Selection for Stacked Sparse Autoencoders (SSAEs).
According to [31], the selection of parameters in deep learning affects the performance of the model. In this work, while developing an SSAE-based DNN for bearing fault diagnosis, the model was repeatedly tested with different values for parameters such as the receptive input size, the sparsity constraint, and the number of hidden nodes, and their effects on the reconstruction error of the model were observed. The reconstruction error is the difference between the reconstructed input and the original input and thus can help to improve the developed fault diagnosis model. In the subsequent sections, the details regarding the parameters that are used for developing an autoencoder are given. These details are provided by considering the first autoencoder used in the SSAEs.

Receptive Input Size.
The length of a single sample that is provided as input to an autoencoder is called the receptive input size. It is observed that the quality of the higher-level representative features, which are extracted from the input, improves when a larger input sample size is provided to the autoencoders. However, a larger input also increases the computational overhead, and hence a smaller receptive input size is used to achieve better computational performance. In this work, as a window size of 1024 is used to calculate the complex envelope spectrum, the receptive input size is 1024. Using an even larger input size would significantly increase the training time for the DNN, but may not yield proportional improvements in diagnostic performance. Therefore, an input size of 1024 data points is used to achieve a reasonable trade-off between diagnostic performance and computational costs.

Number of Hidden Neurons.
The number of hidden neurons that appear in the hidden layer of an autoencoder plays a crucial role in the extraction of higher-level representative features.There is no defined rule for selecting the number of nodes in the hidden layer of an autoencoder.
According to the available literature [32], the number of nodes in the hidden layer must be less than the receptive input size to learn the compressed representation of the input data.
Figure 4 shows the effect of the number of hidden layer neurons in the first autoencoder on the reconstruction error. It is evident that the reconstruction error of the autoencoder is lower when the number of nodes is equal to half of the receptive input size or fewer. This criterion of half or fewer is also valid for all the subsequent hidden layers of the SSAEs.

Sparsity Constraint.
The primary objective of an autoencoder is to extract higher-level representative features through an unsupervised learning process. In unsupervised learning, an appropriate sparsity constraint can improve the forward learning of an autoencoder. The effect of the sparsity constraint on the reconstruction error of the first autoencoder is shown in Figure 5. It is apparent that the reconstruction error is almost invariant while keeping the sparsity proportion in the range between 0.15 and 0.2. Therefore, to construct a deep neural network, the sparsity proportion in all the hidden layers is kept at a value of 0.15.
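How the sparsity proportion enters the cost can be illustrated with a small sketch of the Kullback-Leibler sparsity penalty. The function name and the example activation values are hypothetical; `rho` plays the role of the desired activation value (0.15 in this work), and the average activations are assumed to lie strictly between 0 and 1.

```python
import math


def kl_sparsity_penalty(mean_activations, rho=0.15):
    """Sum of KL(rho || rho_hat_i) over hidden neurons.

    mean_activations holds the average activation of each hidden neuron over
    the training observations; every value must lie strictly between 0 and 1.
    """
    total = 0.0
    for rho_hat in mean_activations:
        total += (rho * math.log(rho / rho_hat)
                  + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))
    return total


if __name__ == "__main__":
    print(kl_sparsity_penalty([0.15, 0.15, 0.15]))   # activations on target
    print(kl_sparsity_penalty([0.20, 0.20, 0.20]))   # slightly too active
    print(kl_sparsity_penalty([0.50, 0.50, 0.50]))   # far from the target
```

The penalty is zero when every neuron's average activation equals the target and grows as the activations drift away from it, which is what pushes the hidden layer toward sparse codes during training.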

Number of Hidden Layers.
The number of hidden layers influences the learning process of the SSAE-based DNN. Table 3 shows the influence of the number of hidden layers used to develop the DNN. It can be observed that the smallest reconstruction error is obtained when 4 hidden layers are used in the DNN. However, the reconstruction error does not decline noticeably as the number of hidden layers increases; it remains almost the same when a greater number of hidden layers is used, and the performance of the DNN is almost unchanged.

Average Execution Cost.
In addition to the reconstruction error, another metric that is worth considering is the computational cost of the training process, that is, the average amount of time required to train the DNN. The number of hidden layers and the number of nodes in each hidden layer affect the average execution cost, or the time required to train the network. The DNN with the highest number of hidden layers and nodes will have the highest average execution cost, as it has more network parameters to tune. Figure 6 shows the average execution cost for the different DNN structures considered in this study. It can be observed that the execution cost is high for networks with a complex architecture. It can be noted from Figure 6 that the DNN with four hidden layers of 500, 250, 125, and 62 neurons, respectively, has the highest execution cost, whereas the execution cost is reduced when the network architecture is simple, that is, with fewer hidden layers and fewer nodes in each hidden layer (i.e., 100/50). From these observations, it can be concluded that the addition of more nodes in the hidden layers adds complexity to the DNN structure, thereby requiring more execution time.
By observing the effect of the different parameters on the two metrics used to evaluate the performance of the DNN, it can be noted that although the reconstruction error is the lowest for the most complex network, that is, the DNN with four hidden layers, the average execution cost is at its maximum in this case. Moreover, it can also be observed in Table 3 that if more nodes are added to a given hidden layer, the network reconstruction error does not decrease substantially. So, to achieve the best trade-off between reconstruction error and execution cost, a three-layered network structure has been adopted throughout the experiments. The reconstruction error and the execution cost for this network structure differ little from those of the two-layered network structure, where these values are at their minimum. Network structures with multiple hidden layers efficiently perform dimensionality reduction, thereby improving the final classification results. Each hidden layer performs a dimensionality reduction, akin to principal component analysis, on its input data and outputs a reduced set of representative features, hence reducing the dimensions of the feature vector. The adopted three-layered network thus provides a reduced set of features, that is, 100, 50, and 25 features per feature set from its first, second, and third layer, respectively. Based on the above discussion and after observing the effect of the different parameters on the performance of the stacked sparse autoencoders (SSAEs), the optimal parameters selected for the SSAE-based DNNs are listed in Table 4.
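The link between network size and execution cost can be made concrete by counting trainable parameters. The sketch below assumes fully connected layers, counts only the encoder weights and biases kept after stacking, and assumes a final 4-class output layer (one normal and three faulty states); the function names are hypothetical and the layer sizes are those quoted in the text.

```python
# Sketch: trainable parameters of the stacked network (encoders plus output layer).

def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def satisfies_half_rule(layer_sizes):
    """True if every hidden layer has at most half the nodes of the layer before it."""
    hidden = layer_sizes[:-1]              # drop the output layer
    return all(2 * n_out <= n_in for n_in, n_out in zip(hidden, hidden[1:]))


ADOPTED = (1024, 100, 50, 25, 4)           # three hidden layers, 4 output classes
LARGEST = (1024, 500, 250, 125, 62, 4)     # four hidden layers

if __name__ == "__main__":
    print(count_parameters(ADOPTED), satisfies_half_rule(ADOPTED))
    print(count_parameters(LARGEST), satisfies_half_rule(LARGEST))
```

Under these assumptions the four-layer structure has about six times as many parameters as the adopted one, almost all of them in the first hidden layer, which is consistent with the higher execution cost reported for it.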

Results
Figure 7 shows the extracted complex envelope spectra for the inner raceway, outer raceway, and roller element faults for different shaft speeds. The complex envelope spectrum is used to calculate the energy-frequency distribution of the given vibration signal. In the complex envelope spectrum, defect frequencies exist for a given fault. A noticeable variation of the energy levels, as well as of the energy distribution pattern, can be observed among the spectra of the different fault types. However, for a given fault type, the variation across the different speed conditions is indistinct. By taking advantage of the variations of the energy levels and the defect frequencies present in the complex envelope spectrum for a given fault type, the stacked autoencoders can learn distinct features.
In Figure 8, the scatter plots of the features extracted from the complex envelope spectrum for different shaft speeds are given. It is worth noticing that the features extracted by the autoencoders from the complex envelope spectrum for a given health condition under different shaft speeds are clearly distinguished from one another and clustered separately. These discriminant features enhance the performance of the DNN, enabling effective fault classification when fluctuations of the shaft speed occur. A comparison of the results of the proposed scheme, stacked denoising autoencoders (SDA) [33], and vibration spectral imaging (VSI) [34] for four different experiments is presented in Table 5. In the SDA-based scheme, a two-layer deep neural network was developed using denoising autoencoders. The raw vibration signals describing four health conditions of the bearings were used as inputs, which were then contaminated with noise and segmented into samples with a window size of 200. The resulting samples were provided as inputs to the DNN for bearing fault diagnosis, which provided satisfactory diagnostic results for the bearing when using a noisy signal under constant speed conditions. In the VSI-based scheme, the authors presented a bearing fault diagnosis method based on vibration spectrum imaging and artificial neural networks (ANNs). In this scheme, a 513-point fast Fourier transform (FFT) spectrum was first calculated from vibration signal windows of 1,024 samples.
Later, the calculated 513-point spectral data was stacked to create a 513 × 8 grayscale image. The resulting images were subjected to an 8 × 4 smoothing filter and later converted to binary images by using an optimum threshold value (0.7). The resulting binary images, having a total of 4104 frequency components, were fed to an ANN with three hidden nodes. Both schemes (SDA and VSI) were evaluated by using the Case Western Reserve University seeded fault bearing dataset. In addition, the results of a backpropagation neural network (BPNN), trained on the same data used in the proposed method, are also included for comparison. It can be observed that the minimum average fault classification accuracy of the proposed method is 90%. On the other hand, SDA, VSI, and BPNN, despite having superior performance in constant speed scenarios, fail to provide better results when speed fluctuations are experienced. Based on the results of this study, the proposed method outperformed the existing methods. In the proposed method, the variations of the energy levels and the presence of defect frequencies in the complex envelope spectrum of a given fault made the anomalous pattern more prominent, helping the autoencoders to efficiently mine informative features that can be easily distinguished among the machine health conditions under shaft speed fluctuations.
In addition to the steady-state regime, results are also presented for an experiment in which subsamples from each fault category and operating speed are taken for training the SSAE-based DNN. The results obtained using the proposed model in this configuration are compared with those of ANN and SDA. These experimental results are shown in Figure 9, and they clearly reveal that the proposed method yields the best results, as compared to the other two algorithms, when subsamples from each fault category and operating speed are used for training the network.

Conclusions
This work presents a stacked sparse autoencoder-based deep neural network (SSAE-DNN), which, in combination with a complex envelope spectrum as its input, performs fault diagnosis of rotary machines when there are fluctuations of the shaft speed. In the proposed scheme, vibration signals related to different health conditions of a motor bearing are preprocessed by computing their complex envelope spectra. The information obtained by the stacked autoencoders from the defect frequency, as well as from its principal harmonics present in the complex envelope spectrum for a given fault, makes it possible to classify faults under varying speeds. The efficiency of the proposed scheme was validated using rotating machine bearing data for four different shaft speeds. A series of experiments was performed, in which the fault data related to the four different shaft speeds was divided into separate datasets and each dataset was processed separately for fault diagnosis, in order to assess the efficiency of the scheme. In each experiment, the complex envelope spectrum of one operating speed was used to train the network before testing with datasets comprised of the remaining three shaft speeds.

Figure 1: Illustration of the proposed fault diagnosis scheme.

Figure 4: The effect of the number of hidden nodes in the first autoencoder on the reconstruction error.

Figure 5: Effect of sparsity proportion on the reconstruction error.

Figure 7: The complex envelope spectra of (a) inner raceway fault, (b) outer raceway fault, and (c) roller fault signals for various shaft speeds.

Figure 8: Scatter plots of the features extracted by stacked autoencoders for different shaft speeds.

Figure 9: The accuracy of the proposed model, SDA, and ANN when subsamples from every fault condition and RPM were used for training the network.

Table 1: Description of the datasets used in the proposed scheme.

Table 2: Description of the experiments performed using data obtained for the different RPM values.

Table 3: Reconstruction error of SSAEs with different numbers of hidden layers and hidden nodes.

Table 4: Specifications for the training and design parameters of the stacked autoencoders.

Table 5: The average classification accuracy of the proposed method, BPNN, VSI, and SDA for variable shaft speed conditions.