Intelligent Diagnosis of Subway Traction Motor Bearing Fault Based on Improved Stacked Denoising Autoencoder

To address the problem that complex working conditions degrade manual feature extraction in bearing fault diagnosis of metro traction motors, a fault diagnosis method for metro traction motor bearings based on an improved stacked denoising autoencoder (SDAE) is proposed. This method extracts fault features directly from the original vibration signal through deep learning, reduces the dependence on signal processing technology and diagnosis experience, and addresses the unsatisfactory quality of feature extraction under complex working conditions. The effect of the improved SDAE network structure on the accuracy of bearing fault diagnosis is studied through experiments, and the best network parameters are selected. The test results show that the proposed method can extract the deep features of the fault well under variable speed and variable load; on data sets with complex working conditions, the classification accuracy of the proposed method is better than that of several traditional fault diagnosis methods.


Introduction
The frequent starting and stopping of the subway subjects the traction motor bearing to continual changes of speed and load, and the inner ring, outer ring, and rolling elements of the bearing are prone to cracks and pitting corrosion, which pose safety risks to the smooth operation of the subway [1,2]. Therefore, intelligent diagnosis of subway traction motor bearings, timely identification of early bearing failures [3], and early maintenance and analysis of the causes of failures are of great significance to the normal operation of the subway [4]. Bearing fault diagnosis has always relied on the signal processing skills of diagnostic experts, and the signal processing method also affects the quality of feature extraction [5][6][7]. For example, Chen et al. [8] used the empirical wavelet transform (EWT) to decompose the signal into single components under an orthogonal basis, extract the inherent modulation information of the signal, and diagnose bearing faults. Li et al. [9] used a bearing vibration feature extraction method based on multiscale permutation entropy (MPE) and an improved support vector machine binary tree (ISVM-BT) to diagnose bearing faults. Zvokelj and Zupan [10] conducted bearing fault diagnosis based on EEMD multiscale independent component analysis. The above authors extract features manually, based on signal processing technology and a large amount of prior experience [11]. However, manually extracted features are designed for specific diagnostic problems and may not suit other problems, and their performance is mediocre under variable working conditions. In addition, a diagnosis expert who lacks experience may omit important information in the original signal when extracting features [12], which affects the result of the fault diagnosis [13].
With the development of computer technology and the advent of the era of electromechanical big data [14], deep learning has begun to be applied to fault diagnosis [15], solving many problems in the field [16]. Wen et al. [17] proposed a LeNet-5-based CNN for fault diagnosis, which adaptively extracts fault features by converting the signal into a two-dimensional image. Guo et al. [18] proposed a hierarchical learning rate-adaptive deep convolutional neural network based on an improved algorithm and studied its application in bearing fault diagnosis and severity determination. Shao et al. [19] proposed an adaptive deep belief network (DBN) with the dual-tree complex wavelet packet transform (DTCWPT), which is applied to the fault diagnosis of rolling bearings. These methods all use deep learning for adaptive extraction of fault features and still extract features well under complex working conditions. Deep learning networks are mainly divided into four types: deep autoencoder networks, convolutional neural networks, recurrent neural networks, and deep belief networks. Among them, the autoencoder network is an unsupervised learning algorithm that compresses the input information and extracts the most representative information in the data, which gives it a clear advantage in adaptive feature extraction [20].
Based on this, an intelligent diagnosis network for subway traction motor faults based on improved SDAE is proposed. BN layer normalization and L2 regularization are added to the network, and the Leaky ReLU activation function is introduced. The improved SDAE is used for adaptive extraction of fault features, and data sets of subway traction motor bearings under variable speed and variable load are tested to verify the effectiveness of the proposed method.

Algorithm Description
2.1. Autoencoder. The autoencoder (AE) is a typical unsupervised algorithm, which is often used for data compression, feature extraction, and similar tasks. It is mainly composed of an input layer, a hidden layer, and an output layer. The input layer and output layer are symmetrical in structure and have the same number of nodes. The encoding process takes place between the input layer and the hidden layer, and the decoding process takes place between the hidden layer and the output layer, as shown in Figure 1.
$x_i$ ($i \in \{1, 2, 3, \ldots, n\}$) is the input data, $h_j$ ($j \in \{1, 2, 3, \ldots, m\}$) is the feature data of the middle hidden layer, and $\hat{x}_i$ ($i \in \{1, 2, 3, \ldots, n\}$) is the output data. In the encoding process, the input vector $x$ is mapped to the feature vector $h$ of the hidden layer through the activation function $f_q$. The formula is as follows:

$$H = f_q(W_q X + b_q).$$

In the decoding process, the feature vector $h$ is mapped to the feature vector of the output layer through the activation function $f_g$. The formula is as follows:

$$\hat{X} = f_g(W_g H + b_g).$$

Here, $X$, $H$, and $\hat{X}$ are the data of the input layer, the hidden layer, and the output layer, respectively; $W_q$ and $W_g$ are the weight matrices of the encoding process and the decoding process, respectively; $b_q$ and $b_g$ are the offset parameters of the encoding process and the decoding process; $f_q$ and $f_g$ are the activation functions of the encoding process and the decoding process. Commonly used activation functions are sigmoid, tanh, ReLU, and so on. Sigmoid and tanh are prone to gradient dispersion (vanishing gradients) during training, whereas the ReLU activation function is fast to compute and better avoids the gradient dispersion problem. The function image of ReLU is shown in Figure 2, and the formula is as follows:

$$f(x) = \max(0, x).$$

As can be seen from the figure, when the input value is negative, the output value of the activation function is 0, so the parameters of the affected neurons are no longer updated. To avoid this problem, the Leaky ReLU activation function is introduced. Its function image is shown in Figure 3, and the formula is as follows:

$$f(x) = \begin{cases} x, & x > 0, \\ \alpha x, & x \le 0, \end{cases}$$

where $\alpha$ is a small positive slope coefficient.
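The encoding and decoding steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the layer sizes, the random weights, and the slope `alpha=0.01` are hypothetical choices for the example and are not taken from the paper.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: identity for positive inputs, slope alpha otherwise (alpha is assumed)."""
    return np.where(z > 0, z, alpha * z)

def encode(x, W_q, b_q):
    """Encoding step: h = f_q(W_q x + b_q)."""
    return leaky_relu(W_q @ x + b_q)

def decode(h, W_g, b_g):
    """Decoding step: x_hat = f_g(W_g h + b_g)."""
    return leaky_relu(W_g @ h + b_g)

rng = np.random.default_rng(0)
n, m = 8, 3  # toy sizes: n input/output nodes, m hidden nodes
W_q, b_q = rng.normal(scale=0.1, size=(m, n)), np.zeros(m)
W_g, b_g = rng.normal(scale=0.1, size=(n, m)), np.zeros(n)

x = rng.normal(size=n)
h = encode(x, W_q, b_q)      # compressed feature vector of length m
x_hat = decode(h, W_g, b_g)  # reconstruction with the same length as x
print(h.shape, x_hat.shape)
```

Unlike ReLU, the negative branch keeps a nonzero slope, so a neuron with a negative pre-activation still receives a gradient during training.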

Build Stacked Denoising Autoencoder Network.
The stacked autoencoder (SAE) is a typical unsupervised network model in deep learning. It is composed of multiple autoencoders, and the hidden layer of each autoencoder is used as the input layer of the next one. Compared with a single AE, the stacked network is deeper and represents a more complex nonlinear mapping, so it can extract more abstract features from the original data. Because the original signal is unstable, noise is randomly added to the original data $x$ to improve the robustness of the AE. The resulting contaminated data $\tilde{x}$ are used as the input of the autoencoder, while the output of the autoencoder is made as close as possible to the uncontaminated original signal $x$. The network structure is shown in Figure 4. The structure of the improved SDAE is shown in Figure 5. As a basic unit, the denoising AE is placed in the first layer of the SDAE. After that, multiple AEs are stacked [21], and a Softmax classifier is added in the last layer to classify the deeply extracted features. The training flowchart of each AE is shown in Figure 6. Here, the normalization of the BN layer accelerates training and reduces gradient dispersion without changing the distribution of the original data. L2 regularization reduces overfitting and yields a model with better generalization ability, which is suitable for fault diagnosis under variable operating conditions.
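The three ingredients named above (input corruption, BN normalization, and an L2 penalty) can be illustrated with a minimal NumPy sketch. The text does not specify the noise type, the loss function, or the regularization weight, so additive Gaussian noise, a mean squared reconstruction error, a BN step without learnable scale and shift, and `l2_lambda=1e-3` are all assumptions made for this example.

```python
import numpy as np

def corrupt(x, noise_std, rng):
    """Return a noisy copy of x; additive Gaussian noise is an assumption."""
    return x + rng.normal(scale=noise_std, size=x.shape)

def batch_norm(h, eps=1e-5):
    """Standardize each feature over the batch axis (no learnable scale or shift)."""
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def denoising_loss(x_clean, x_hat, weights, l2_lambda):
    """Reconstruction error against the clean signal plus an L2 weight penalty."""
    mse = np.mean((x_clean - x_hat) ** 2)
    penalty = l2_lambda * sum(np.sum(w ** 2) for w in weights)
    return mse + penalty

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))              # toy batch: 32 samples, 16 points each
x_noisy = corrupt(x, noise_std=0.1, rng=rng)
w = rng.normal(scale=0.1, size=(16, 4))    # toy encoder weights
h = batch_norm(x_noisy @ w)                # normalized hidden pre-activations
# With a perfect reconstruction only the penalty term remains.
loss_if_perfect = denoising_loss(x, x, [w], l2_lambda=1e-3)
```

The key point is that the loss compares the reconstruction with the clean signal rather than with the corrupted input, which is what pushes the encoder toward noise-robust features.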

Construction of Fault Diagnosis Network for Metro Traction Motor Bearing
A three-layer SDAE network structure (3750-128-64-6) is designed for fault diagnosis of subway traction motors, comprising two hidden layers and a Softmax classifier. Here, 3750 is the number of input layer nodes, 128 is the number of nodes in the first hidden layer, 64 is the number of nodes in the second hidden layer, and 6 is the number of nodes in the final output layer. The samples input to the network undergo dimensionality reduction, fusion, and nonlinear transformation and are finally mapped by the Softmax classifier to a 6-dimensional output. The flowchart of the metro traction motor fault diagnosis network based on the improved SDAE is shown in Figure 7.
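As a small illustration of the final classification step, the following sketch shows how a Softmax layer turns six class scores into probabilities and a predicted fault label. The score values are arbitrary numbers chosen for the example, not outputs of the trained network.

```python
import math

def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Six arbitrary scores, one per output node of the 3750-128-64-6 structure.
scores = [2.0, 0.5, 0.1, -1.0, 0.0, 1.2]
probs = softmax(scores)
predicted_label = probs.index(max(probs))  # index of the most probable fault class
print(predicted_label, round(sum(probs), 6))
```

The probabilities sum to one, and the predicted fault type is simply the index of the largest one.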

Information Collection System.
The subway traction motor bearing information collection system is a rolling bearing experimental platform jointly developed by Henan University of Science and Technology, Luoyang Bearing Research Institute, and Henan Engineering Laboratory of Intelligent CNC Equipment, as shown in Figure 8. The information acquisition system consists of a test bench, subway traction motor bearings, a hydraulic loading system, a sensor information acquisition module, and a computer.
The sensor information acquisition module is mainly composed of a vibration sensor and a data acquisition card. The sensor is an LC0151T high-precision vibration acceleration sensor, which has a range of 33 g and a sensitivity of 150 mV/g. A PCI8510 data acquisition card, which can collect 8 channels of data simultaneously, meets the needs of experimental data collection. During data collection, the signal produced by the vibration sensor is a current signal; a signal conditioner converts it into a voltage signal, and the computer collects and stores the signal through the data acquisition card. The acquisition system is shown in Figure 9.
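The sensor figures quoted above imply a simple conversion between recorded voltage and acceleration. The sketch below assumes a linear sensor response and a signal conditioner that outputs voltage at the stated sensitivity; the function name and the example voltage are hypothetical.

```python
SENSITIVITY_V_PER_G = 0.150  # 150 mV/g, from the sensor description
RANGE_G = 33.0               # measuring range of the sensor in g

def volts_to_g(voltage):
    """Convert a conditioned sensor voltage (V) to acceleration (g), assuming linearity."""
    return voltage / SENSITIVITY_V_PER_G

# Voltage produced at the edge of the measuring range.
full_scale_volts = RANGE_G * SENSITIVITY_V_PER_G
print(round(full_scale_volts, 3), round(volts_to_g(1.5), 3))
```

Under these assumptions the full measuring range corresponds to roughly 4.95 V.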

Test Parameters and Scheme.
An NU216 metro traction motor bearing is used as the test bearing. For testing and verification, defects are prefabricated on the inner ring, outer ring, and rolling element of the bearing. The simulated faults are machined with a laser marking machine and comprise inner ring pitting, inner ring cracking, outer ring pitting, outer ring cracking, rolling element pitting, and rolling element cracking. The crack width is 30 μm, the pitting diameter is 40 μm, and the depth is that produced at 30% of the laser energy.
Due to the frequent start and stop of the subway, the bearing of the traction motor is constantly subjected to changes in speed and load.
The variables of the test are speed, load, and bearing fault type. The experiment collects data under 54 (3 × 3 × 6) working conditions and collects 4 signals for each working condition. The test arrangement is shown in Table 1.
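The 3 × 3 × 6 test grid can be enumerated directly. The sketch below uses the speed and load levels given in the analysis of the test data (800, 1600, and 2400 rpm; 5, 7, and 9 kN) and represents the six fault types by the labels 0 to 5; the variable names are illustrative.

```python
from itertools import product

SPEEDS_RPM = (800, 1600, 2400)
LOADS_KN = (5, 7, 9)
FAULT_LABELS = range(6)      # six prefabricated fault types, labeled 0-5
SIGNALS_PER_CONDITION = 4

# Every combination of speed, load, and fault type is one working condition.
conditions = list(product(SPEEDS_RPM, LOADS_KN, FAULT_LABELS))
total_signals = len(conditions) * SIGNALS_PER_CONDITION
print(len(conditions), total_signals)
```

With four signals per condition, the full test campaign yields 216 recorded signals.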

Analysis of Test Data.
The sampling frequency of the test is 50 kHz, the bearing speeds are 800 rpm, 1600 rpm, and 2400 rpm, and the bearing loads are 5 kN, 7 kN, and 9 kN. The vibration data of the bearing under the working condition of 2400 rpm and 7 kN are selected to construct the data set s, which is used to preliminarily analyze the deep learning network and carry out intelligent fault diagnosis of the bearing. The fault details are shown in Table 2.

Sample Preparation and Labeling.
In the experiment, the sampling frequency is 50 kHz, and the sampling time of each sample is 10 s. Therefore, each sample has 500000 data points, which need to be segmented to obtain a large number of training samples. Following Shannon's sampling theorem, each small sample should span at least two revolutions of the bearing; considering the influence of rotation speed, every 7500 data points are finally selected as one small sample. Based on this, the original data are processed as follows:
(1) The 500000 data points are divided into two parts: 425000 data points and 75000 data points.
(2) The first 425000 data points are used as the training set. They are segmented with overlap to achieve data augmentation: the 425000 data points are divided into 100 small samples, each containing 7500 data points, as shown in Figure 10.
(3) The last 75000 data points are used as the test set. They are truncated directly, without overlapping sampling, which gives ten test samples with a data length of 7500.
Taking data set s as an example, the numbers of training set samples and test set samples obtained after the original data are processed as described above are shown in Table 3. Each type of fault is labeled according to the number in Table 3, giving 6 fault types.
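The segmentation steps above can be sketched as follows. The text does not state the overlap between training segments, so the stride used here, the largest integer step that fits 100 windows into 425000 points, is an assumption, as are the function names. The first helper also checks that 7500 points correspond to two revolutions at the slowest test speed of 800 rpm, which is consistent with the chosen segment length.

```python
def samples_for_revolutions(fs_hz, speed_rpm, revolutions):
    """Number of samples spanned by a given number of shaft revolutions."""
    return round(fs_hz * revolutions * 60 / speed_rpm)

def overlapping_segments(signal, length, count):
    """Cut `count` equally spaced windows of `length` points (stride is an assumption)."""
    stride = (len(signal) - length) // (count - 1)
    return [signal[i * stride : i * stride + length] for i in range(count)]

def consecutive_segments(signal, length):
    """Cut non-overlapping windows of `length` points, dropping any remainder."""
    return [signal[i : i + length] for i in range(0, len(signal) - length + 1, length)]

FS_HZ, DURATION_S, SEGMENT = 50_000, 10, 7_500
record = list(range(FS_HZ * DURATION_S))  # stand-in for one 10 s vibration record
train_part, test_part = record[:425_000], record[425_000:]

train_segments = overlapping_segments(train_part, SEGMENT, 100)
test_segments = consecutive_segments(test_part, SEGMENT)
print(len(train_segments), len(test_segments))
```

Keeping the training and test portions disjoint before segmenting ensures that no overlapping window leaks test data into the training set.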
Before the data are input to the SDAE, they must be processed to meet the input format of the deep neural network. After experimentation, the FFT is applied to all the vibration data, and the frequency-domain data of the signal are input into the deep neural network for training and testing.
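One plausible reading of this preprocessing step is that each 7500-point segment is reduced to a one-sided amplitude spectrum of 3750 points, which would match the 3750 input nodes of the network. The sketch below follows that reading; the amplitude scaling, the choice to drop the Nyquist bin, and the synthetic 1 kHz test tone are assumptions and not details given in the paper.

```python
import numpy as np

def spectrum_features(segment):
    """One-sided FFT amplitude spectrum with the Nyquist bin dropped."""
    amplitudes = np.abs(np.fft.rfft(segment)) / len(segment)
    return amplitudes[: len(segment) // 2]

fs_hz, n = 50_000, 7_500
t = np.arange(n) / fs_hz
segment = np.sin(2 * np.pi * 1_000 * t)   # synthetic 1 kHz tone, not test-bench data
features = spectrum_features(segment)

# Each bin is fs_hz / n wide, so the bin index converts directly to a frequency.
peak_hz = np.argmax(features) * fs_hz / n
print(features.shape, peak_hz)
```

For a real-valued signal the negative-frequency half of the spectrum carries no extra information, which is why only half of the bins are kept.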

Influence of SDAE Network Structure on Bearing Fault Diagnosis Results.
An SDAE is formed by stacking multiple autoencoders. The number of nodes, the number of iterations, the learning rate, and even the batch size of each autoencoder all affect the results of fault diagnosis. In particular, the number of layers and the number of nodes strongly influence the accuracy of the diagnosis. To find an appropriate network structure, the number of layers and nodes of the network is studied with the other parameters held fixed (learning rate 0.001, batch size 256, and 20 iterations). In this paper, the influence of the following network structures on the diagnosis results is studied on data set s. The average of 10 diagnostic results is taken as the accuracy rate, as shown in Table 4.
It can be concluded from Table 4 that the number of network layers and the number of nodes per layer affect the diagnosis results, but more layers and nodes do not necessarily give a higher accuracy rate. Three network structures reach a test set accuracy of 100%: 3750-128-64-6, 3750-256-128-6, and 3750-256-128-64-6. Because a network with more layers and nodes requires more computation and a longer training time, the 3750-128-64-6 structure is selected.
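The selection argument can be made concrete by comparing the sizes of the three structures that reach 100% accuracy. The sketch below counts only the weights and biases of the fully connected layers, which is an assumption used as a rough proxy for computation and training time; BN parameters and decoder weights are ignored.

```python
def parse_structure(text):
    """Turn a structure string such as '3750-128-64-6' into a list of layer sizes."""
    return [int(part) for part in text.split("-")]

def count_parameters(layers):
    """Weights plus biases of the fully connected layers only."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))

candidates = ["3750-128-64-6", "3750-256-128-6", "3750-256-128-64-6"]
sizes = {name: count_parameters(parse_structure(name)) for name in candidates}

# Among equally accurate structures, prefer the one with the fewest parameters.
smallest = min(candidates, key=sizes.get)
print(smallest, sizes[smallest])
```

By this measure the selected structure has roughly half as many parameters as either alternative, since the first layer dominates the count.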

Visualization of Fault Characteristics.
To show the advantages of SDAE in feature extraction, this paper uses the t-SNE algorithm to visualize the original data and the features extracted layer by layer and to perform cluster analysis on them, as shown in Figure 11. The fault types are marked according to the numbers in Table 3, and the numbers 0-5 and different colors are used to distinguish them. It can be clearly seen that the original data are irregular and the classes are mixed with each other. After the first autoencoder extracts features, the data of the same category move close to each other, and the spacing within each category is further reduced after features are extracted by the subsequent autoencoder, which shows that the proposed SDAE is effective at feature extraction.
Figure 9: Schematic diagram of the acquisition system.

Analysis and Comparison of Test Results
The above tests all tune the SDAE network on the data set collected under stable working conditions of the subway traction motor bearing. However, in the actual operation of the subway, frequent starting and stopping and changes in passenger capacity also change the load on the bearings. Fault diagnosis under a single working condition is not enough to demonstrate the stability and effectiveness of the proposed method. It is still necessary to verify the effectiveness of the proposed method under variable load and variable speed.

Intelligent Diagnosis of Bearing Failure in Single Working Condition Change.
The change of the signal during bearing operation is mainly affected by the speed and the load. Here, the speed and the load are each held constant in turn, and the proposed SDAE is used to diagnose the bearing fault. The collected bearing vibration signal data are processed according to the method proposed in Section 4.4. The data are divided into 6 fault types according to the fault number, the label is consistent with the fault number, and the correctness of the fault diagnosis of the SDAE network is tested through experiments. The results are shown in Tables 5 and 6.
From Tables 5 and 6, it can be concluded that the accuracy of fault diagnosis under constant speed and variable load basically reaches 100%, while under constant load and variable speed the accuracy decreases slightly, especially when the training set speed is higher than the test set speed. The reason is that at the high rotation speed of the training set, pronounced fault characteristics are extracted, whereas at the low rotation speed of the test set, the fault characteristics have a lower amplitude than those of the training set, so the diagnostic accuracy drops slightly.

Intelligent Diagnosis and Comparative Verification of Bearing Faults with Changing Operating Conditions.
To further verify the effectiveness of SDAE, a more complex data set is selected as the training set and test set (as shown in Table 7), and SDAE is compared with other methods on this data set to show its advantages under variable speed and variable load. The following common fault diagnosis methods are selected for comparison and verification. The diagnostic accuracy of the different methods is shown in Table 8.
(1) Method 1: the vibration signal is decomposed by an 8-layer wavelet packet, and the root mean square value, kurtosis value, and margin value are extracted in each of the 256 frequency bands after decomposition and then input into a convolutional neural network for fault diagnosis. Given the 768 input layer elements, two convolution layers, (28 × 28) and (12 × 12), two pooling layers, (14 × 14) and (6 × 6), and one dense layer are used, and the output layer uses a Softmax classifier. The convolution kernel is set to 3 × 3, the sliding step size is set to 1, the learning rate is 0.001, the activation function is ReLU, and the number of training iterations is 25.
(2) Method 2: the vibration signal is decomposed by an 8-layer wavelet packet, and the root mean square value, kurtosis value, and margin value are extracted in each of the 256 frequency bands after decomposition and input into SDAE for fault diagnosis. The network structure of SDAE is 768-128-64-6, the batch size is 25, and the number of iterations is 20.
(3) Method 3: the vibration signal is decomposed by EMD, and 12 characteristic indexes, such as mean, standard deviation, variance, skewness index, kurtosis value, peak-peak value, peak mean square amplitude, average amplitude, root mean square value, waveform index, and peak value index, are extracted from the first five IMF components and input into a support vector machine (SVM) for fault diagnosis.
(4) Method 4: the frequency domain signal of the vibration signal is directly input into a convolutional neural network for fault diagnosis. Two convolution layers, (60 × 60) and (28 × 28), two pooling layers, (30 × 30) and (14 × 14), and one dense layer are used, and the output layer uses a Softmax classifier. The convolution kernel is set to 3 × 3, the sliding step is set to 1, the learning rate is 0.001, the activation function is ReLU, and the number of training iterations is 25.
(5) Method 5: the frequency domain signal of the vibration signal is directly input into the traditional SDAE network. The traditional SDAE network has no BN layer for normalization, does not use L2 regularization, and uses the ReLU activation function. The network structure is 3750-128-64-6, the learning rate is 0.001, the batch size is 256, and the number of iterations is 20.
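The manual features used by Methods 1 and 2 can be sketched as follows. The wavelet packet decomposition itself is not shown: each band signal is assumed to be already available as a NumPy array. The definitions are the common textbook ones and are assumptions about what the compared methods compute; in particular, "margin value" is taken to mean the margin (clearance) factor, and the kurtosis is the non-excess form.

```python
import numpy as np

def rms(x):
    """Root mean square value."""
    return np.sqrt(np.mean(x ** 2))

def kurtosis(x):
    """Fourth central moment divided by the squared variance (non-excess form)."""
    centered = x - x.mean()
    return np.mean(centered ** 4) / np.mean(centered ** 2) ** 2

def margin_factor(x):
    """Peak amplitude divided by the squared mean of the square-rooted amplitudes."""
    return np.max(np.abs(x)) / np.mean(np.sqrt(np.abs(x))) ** 2

def band_features(band):
    """Three features for one frequency band signal."""
    return [rms(band), kurtosis(band), margin_factor(band)]

# An 8-layer wavelet packet gives 2**8 bands; three features per band gives the input size.
n_bands = 2 ** 8
n_inputs = n_bands * 3

square_wave = np.array([1.0, -1.0, 1.0, -1.0])  # toy band signal
print(n_inputs, band_features(square_wave))
```

The feature count of 768 matches the input size quoted for both the CNN in Method 1 and the 768-128-64-6 SDAE in Method 2.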

Conclusion
In bearing fault diagnosis of metro traction motors, extracting fault characteristics from nonstationary signals under complex working conditions has been one of the research hotspots. Fault features are commonly extracted manually, but under complex working conditions manual feature extraction does not perform well. The improved SDAE network proposed in this paper can adaptively extract fault information under complex working conditions, and the fault diagnosis accuracy rate is above 96.18%. The following conclusions can be drawn:
(1) The comparison between the proposed method and method 5 shows that adding a BN layer to the traditional SDAE network for normalization, using L2 regularization to prevent overfitting, and introducing the Leaky ReLU activation function effectively alleviate the problems of gradient dispersion and overfitting. The extracted features express the original signal well and improve the accuracy of fault diagnosis.
(2) Under constant speed and constant load, the diagnostic accuracy obtained with the fault features extracted by this method is equivalent to that of the manual extraction methods and can reach 100%.
(3) Under variable speed and variable load, the accuracy obtained with the fault features extracted by this method is far higher than that of the traditional manual feature extraction methods in method 1, method 2, and method 3. This shows that the improved SDAE retains strong feature extraction ability under variable working conditions, can meet engineering requirements, and can be used in the feature extraction stage of mechanical fault diagnosis.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.