Deep Sparse Autoencoder for Feature Extraction and Diagnosis of Locomotive Adhesion Status

An analytical model is difficult to establish because the principle of the locomotive adhesion process is complex. This paper presents a data-driven adhesion status fault diagnosis method based on deep learning theory. The adhesion coefficient and creep speed of a locomotive constitute the characteristic vector. A sparse autoencoder, an unsupervised learning network, learns from the input vector, and the single-layer networks are stacked to form a deep neural network. Finally, a small amount of labeled data is used to fine-tune the entire deep neural network, and the locomotive adhesion state fault diagnosis model is established. Experimental results show that the proposed method can achieve a 99.3% locomotive adhesion state diagnosis accuracy and satisfy actual engineering monitoring requirements.


Introduction
Precise diagnosis of the wheel-rail adhesion state is an important prerequisite for adhesion control. Currently, the wheel-rail adhesion state of a locomotive is mostly diagnosed by detecting and analyzing relevant parameters to determine the type of adhesion state and the degree of adhesion [1]. To diagnose adhesion states, a sampling eigenvector should be generated from the creep speed of the driving wheel and the wheel-rail adhesion coefficient, a sample feature should be extracted, and the feature should be coded; then, various intelligent algorithms should be used to classify the eigenvector [2]. Many studies on the use of neural networks in the adhesion field have been reported. For example, Castillo et al. [3] used a neural network to estimate the adhesion state in an ABS system. Castillo [4] trained an artificial neural network to calculate the best creep operating point for each road on the basis of traffic information collected by a vehicle sensor. Li Ningzhou [5] studied the adhesion feature of the air brake of a locomotive and used an optimized recursive neural network to tune the parameters of the adhesion controller and improve the utilization rate of locomotive adhesion, thereby obtaining a good experimental result.
These methods are more convenient and intelligent than the general mechanism analysis method. However, they still belong to the supervised learning area [6]. Thus, they require sufficient data for feature extraction. Meanwhile, extracting the right features is often relatively complex and difficult. Obtaining labels requires experiments and rich professional knowledge. This human involvement greatly increases the uncertainty of feature extraction and optimization, thereby making correct diagnosis of the adhesion state difficult. Furthermore, a traditional neural network essentially uses hidden-layer neurons for nonlinear transformation [7]. It can learn potential features from a given sample and fit an approximation function [8]. Taking the classical BP neural network as an example, obtaining high-precision features is difficult when the layers are few. If the number of layers is excessive, then the gradient may vanish, and convergence to a local optimum is another defect that is difficult to overcome [9].
Sample feature extraction is a key step in determining the accuracy of fault diagnosis [10]. The change in adhesion state is a complex process that is affected by multiple factors, producing a complex nonlinear relation between factors and outcomes. Fault prediction and analysis are therefore particularly challenging. The introduction of deep learning [11] has brought a breakthrough in research on high-precision feature extraction. When trained with unsupervised learning, a deep neural network not only has an excellent feature extraction ability but can also overcome the common problem of obtaining sample labels [12]. Thus, the deep neural network has become a popular research area in the field of fault diagnosis [13][14][15]. This paper proposes a sparse autoencoder deep neural network with dropout to diagnose the wheel-rail adhesion state of a locomotive. This deep neural network can significantly reduce the adverse effect of overfitting, making the learned features more conducive to classification and identification.
The rest of this paper is organized as follows: Section 2 describes the adhesion principle and characteristics. Section 3 describes the principle and process of the deep neural network algorithm. Section 4 discusses the comparative experimental research and result analysis. Section 5 presents the conclusions.

Description of Adhesion Status
Adhesion is the ultimate manifestation of locomotive driving force in the wheel-rail relationship and the fundamental motive force for locomotives [16]. The wheel pair rolls forward when subjected to a tangential traction, and the rolling pressure causes deformation between the wheel and the rail. Simultaneously, the gravity of the car body imposed on the rail keeps the contact surface between the wheel and the rail relatively stable. This phenomenon is called adhesion. As shown in Figure 1, the contact point between the wheel and the rail is elastically deformed under the action of the wheel load (P). The wheel rolls forward under the action of the driving torque (T), the original contact surface deformation develops into a new elliptical deformation, and the tractive effort at the wheel rim (F) is generated.
The adhesion coefficient $\mu$ is typically defined as the ratio of traction to axle weight (see (1)), where $F$ is the tractive effort at the wheel rim, $W$ is the axle weight (kg), and $g$ is the gravitational acceleration (m/s$^2$).

$$\mu = \frac{F}{W \cdot g} \tag{1}$$

In the process of normal movement, the train body speed ($v_t$) is always less than the wheel speed ($v_w$) due to the wheel-rail microsliding generated by the deformation. This phenomenon is called creep, and the speed difference between them is defined as the creep speed $v_s$:

$$v_s = v_w - v_t \tag{2}$$
Creep is a slight wheel-spin phenomenon produced by the locomotive drive system.The adhesion coefficient of the rail contact surface rises constantly with the creep speed within a certain range [17], and the locomotive has a great available traction.Once the range is exceeded, the wheel-rail adhesion coefficient drops sharply with the increase in the creep speed.
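As a minimal sketch of how the two features could be computed from raw measurements, the following Python snippet evaluates the adhesion coefficient as traction divided by axle weight times gravitational acceleration, and the creep speed as wheel speed minus train body speed. The function names, the value used for g, and the sample readings are illustrative assumptions and do not come from the experiments in this paper.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed standard value)


def adhesion_coefficient(traction_n: float, axle_weight_kg: float, g: float = G) -> float:
    """Ratio of tractive effort F (N) to axle load W * g (N)."""
    if axle_weight_kg <= 0:
        raise ValueError("axle weight must be positive")
    return traction_n / (axle_weight_kg * g)


def creep_speed(wheel_speed: float, body_speed: float) -> float:
    """Speed difference between the wheel and the train body."""
    return wheel_speed - body_speed


# Hypothetical readings, chosen only for illustration.
mu = adhesion_coefficient(traction_n=45_000.0, axle_weight_kg=23_000.0)
v_s = creep_speed(wheel_speed=20.5, body_speed=20.0)
feature_vector = (mu, v_s)  # the two-dimensional characteristic vector
```

Each monitored sample then reduces to one such pair, which is the input used by the networks in the following sections.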
Figure 2 shows the adhesion characteristic curve of the locomotive. The adhesion peak point is taken as the boundary: the left side is called the creep region and the right side is called the slip region [4]. However, when the adhesion state is divided into only two categories, abnormal adhesion can be identified, but no basis is provided for predicting a potential creep failure. To this end, this paper further refines the adhesion features into normal (N0), fault symptom (N1), small fault (F1), and large fault (F2).
The adhesion state is thus divided into four categories. When a minor fault is encountered, fault-tolerant control methods [18] can be adopted to prevent serious deterioration of system performance [1].

Deep Neural Network
Unsupervised learning can be used to automatically learn potential features from the samples without labels [19,20]. This method has a significant advantage when addressing complex problems, such as adhesion state recognition. The sparse autoencoder is an unsupervised algorithm, and this deep neural network can effectively extract the characteristics that reflect the adhesion state [21,22].

Sparse Autoencoder.
From the structural point of view, the autoencoder is an axisymmetric single hidden-layer neural network [23]. The autoencoder encodes the input sensor data by using the hidden layer, minimizes the reconstruction error, and obtains the best-feature hidden-layer expression [24]. The concept of the autoencoder comes from the unsupervised computational simulation of human perceptual learning [25], and it has some functional flaws. For example, an autoencoder that merely copies the input into its hidden layer does not learn any practical feature, although it can reconstruct the input data with high precision. The sparse autoencoder inherits the idea of the autoencoder and introduces a sparse penalty term, adding constraints to feature learning for a concise expression of the input data [26,27].
For the adhesion state identification of a locomotive, k sets of monitoring data $\{x_1, x_2, x_3, \ldots, x_n\}$ exist, which are reconstructed into an $N \times M$ data set $\{x(1), x(2), x(3), \ldots, x(N)\}$, $x(i) \in \mathbb{R}^M$. These data are used as the input matrix X. The input data encoded by the autoencoder are used to construct a mapping relationship. In this paper, the activation function of the autoencoder is the sigmoid function $\sigma$, which is designed to obtain a better representation of the input data: $h(x, W, b) = \sigma(Wx + b)$. A sparse penalty term is added to the sparse autoencoder cost function to limit the average activation value of the hidden-layer neurons. Normally, a neuron is active when its output value is close to 1 and inactive when its output value is close to 0. The purpose of enforcing sparsity is to limit undesired activation. $a_j(x)$ is set as the activation value of the jth hidden-layer neuron. In the process of feature learning, the activation value of the hidden-layer neurons is usually expressed as $a = \sigma(Wx + b)$, where W is the weight matrix and b is the bias vector. The mean activation value of the jth neuron in the hidden layer is defined as

$$\hat{\rho}_j = \frac{1}{N} \sum_{i=1}^{N} a_j\big(x(i)\big) \tag{3}$$

To keep the hidden-layer activations low, the target average activation is defined by the sparsity parameter $\rho$, and the penalty term is used to prevent $\hat{\rho}_j$ from deviating from $\rho$. The Kullback-Leibler (KL) divergence [28] is used in this study as the basis of the penalty. The mathematical expression of the KL divergence is as follows:

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \tag{4}$$

When $\hat{\rho}_j$ does not deviate from $\rho$, the KL divergence value is 0; otherwise, the KL divergence value increases with the deviation. The cost function of the neural network is set as $C(W, b)$. Then, the cost function with the sparse penalty term added is

$$C_{\mathrm{sparse}}(W, b) = C(W, b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \tag{5}$$

where $s_2$ is the number of neurons in the hidden layer and $\beta$ is the weight of the sparse penalty term. The essence of training a neural network is to find the appropriate weight and threshold parameters (W, b). After the sparse penalty term is defined, the sparse expression can be obtained by minimizing the sparse cost function.
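The sparsity penalty described above can be illustrated with a short sketch in plain Python. It computes sigmoid hidden activations, their mean over a set of samples, the KL divergence between a target sparsity and each mean activation, and the penalized cost. The function names and the symbols `rho` (target sparsity) and `beta` (penalty weight) are naming assumptions made for this illustration; the values in the usage lines are arbitrary.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activations(x, W, b):
    """Activation of each hidden neuron for one sample: sigmoid(W x + b)."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def mean_activations(samples, W, b):
    """Average activation of every hidden neuron over all samples."""
    totals = [0.0] * len(W)
    for x in samples:
        for j, a in enumerate(hidden_activations(x, W, b)):
            totals[j] += a
    return [t / len(samples) for t in totals]


def kl_divergence(rho: float, rho_hat: float) -> float:
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))


def sparse_cost(base_cost: float, rho: float, rho_hats, beta: float) -> float:
    """Base cost plus the weighted sum of KL penalties over hidden neurons."""
    return base_cost + beta * sum(kl_divergence(rho, r) for r in rho_hats)


# Arbitrary illustrative values (two inputs, three hidden neurons).
W = [[0.2, -0.4], [0.1, 0.3], [-0.5, 0.6]]
b = [0.0, 0.0, 0.0]
samples = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
rho_hats = mean_activations(samples, W, b)
cost = sparse_cost(base_cost=0.25, rho=0.05, rho_hats=rho_hats, beta=3.0)
```

The penalty vanishes only when every mean activation equals the target, so minimizing the penalized cost pushes most hidden neurons toward low average activity.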

Softmax Regression.
The sparse autoencoder can form a deep network structure through multilayer stacking, which can be used for feature learning and clustering of the adhesion data collected by the sensor. However, this autoencoder has no ability to classify. Therefore, this paper presents a deep neural network architecture that combines the stacked sparse autoencoder and softmax regression. The schematic of the network structure is shown in Figure 3. Softmax regression is an extension of the logistic regression model to multiclass classification [29]. The category label of logistic regression can only take two values, whereas the softmax label can take on multiple values [30]. Suppose that there are m training samples of adhesion state $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$, $y^{(i)} \in \{1, 2, \ldots, n\}$. The hypothesis function is used to estimate the probability value $p(y = j \mid x)$ for each category $j$. The softmax output is defined as follows:

$$h_\theta\big(x^{(i)}\big) = \begin{bmatrix} p\big(y^{(i)} = 1 \mid x^{(i)}; \theta\big) \\ \vdots \\ p\big(y^{(i)} = n \mid x^{(i)}; \theta\big) \end{bmatrix} = \frac{1}{\sum_{j=1}^{n} e^{\theta_j^{T} x^{(i)}}} \begin{bmatrix} e^{\theta_1^{T} x^{(i)}} \\ \vdots \\ e^{\theta_n^{T} x^{(i)}} \end{bmatrix} \tag{6}$$

where $\theta$ denotes the model parameters and the factor $1 / \sum_{j=1}^{n} e^{\theta_j^{T} x^{(i)}}$ normalizes the probability distribution such that the sum of all probabilities is 1.
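A minimal sketch of the softmax output follows. It turns a list of per-class scores into probabilities that sum to 1 and returns the most probable class. Subtracting the maximum score before exponentiating is a common numerical-stability choice and is an addition of this sketch; the scores in the usage line are arbitrary.

```python
import math


def softmax(scores):
    """Convert per-class scores into a probability distribution."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)    # normalizing term: makes the outputs sum to 1
    return [e / total for e in exps]


def predict_class(probabilities):
    """Return the 1-based index of the most probable category."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best + 1


# Arbitrary scores for four adhesion categories.
probs = softmax([2.0, 1.0, 0.1, -1.0])
label = predict_class(probs)
```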

Overfitting and L2 Regularization.
L2 regularization is an effective way of reducing neural network overfitting [31]. In this study, this method is used to avoid overlearning of features caused by synergies. The basic principle of L2 regularization is as follows:

$$C = C_0 + \lambda \|W\|^2 \tag{7}$$

where $C_0$ is the cost function of the neural network, $\lambda$ is the regularization coefficient, and $\|W\|^2$ is the penalty term. The greater the coefficient, the stronger the weight attenuation.
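The effect of the coefficient on weight attenuation can be seen in a small sketch. It assumes the penalty is the coefficient times the sum of squared weights; other formulations scale the penalty by 1/2 or by the number of samples, which changes only the constant factor. The names `lam`, `lr`, and the helper functions are illustrative assumptions.

```python
def l2_penalty(weights, lam: float) -> float:
    """Coefficient times the sum of squared weights."""
    return lam * sum(w * w for row in weights for w in row)


def regularized_cost(base_cost: float, weights, lam: float) -> float:
    """Network cost plus the L2 penalty."""
    return base_cost + l2_penalty(weights, lam)


def decay_step(weights, grads, lr: float, lam: float):
    """One gradient step; the penalty adds 2 * lam * w to each weight gradient."""
    return [[w - lr * (g + 2.0 * lam * w) for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


# Arbitrary illustrative values.
weights = [[1.0, 2.0], [0.0, -1.0]]
zero_grads = [[0.0, 0.0], [0.0, 0.0]]
cost = regularized_cost(base_cost=1.0, weights=weights, lam=0.5)
shrunk = decay_step(weights, zero_grads, lr=0.1, lam=0.5)
```

Even with a zero data gradient, every nonzero weight moves toward zero, and a larger coefficient shrinks it faster.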

Framework of the Algorithm.
The feature learning ability of a single sparse autoencoder is limited. To construct a model with improved feature extraction capacity, we stacked the sparse autoencoders into a deep structure (SAE). In this process, the output of each encoder layer is taken as the input of the next layer to achieve multilevel learning of sample features. The flowchart of the deep neural network algorithm is shown in Figure 4 and subsequently described.

(1) The Creep Speed ($v_s$) and Adhesion Coefficient ($\mu$) Monitored by the Sensor Are Used to Train the Sparse Autoencoder
(a) The parameters, such as the network learning rate and the dropout parameter, are set; weights W and thresholds b are initialized.
(b) The number of iterations is set, and the mean activation value $\hat{\rho}_j$ and the sparse cost function are calculated according to (3)-(5); the network parameters are updated based on the backpropagation (BP) algorithm.

(2) The Deep Neural Network Is Fine-Tuned with a Small Number of Labeled Samples
(a) The threshold b and weight W parameters learned by the network in the preceding step are saved.
(b) The L2 regularization coefficient and learning rate are set, and the mean square error is calculated.
(c) The BP algorithm is used to update the weights of the network and fine-tune the entire network.

(3) The Performance of the Identification Model Is Tested
(a) The test sample size is usually 30% of the total number of samples.
(b) According to the L2 coefficient, the weights of the neural network are attenuated while performing the forward propagation algorithm.
(c) The output of the DNN is compared with the sample labels, and comparative statistics are made.
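The first stage of the procedure above, greedy layer-wise pretraining, can be sketched in plain Python as follows. Each sparse autoencoder is trained by batch gradient descent on the reconstruction error plus the KL sparsity penalty, and its hidden output becomes the input of the next layer. This is a sketch under stated assumptions, not the implementation used in the experiments: dropout, the supervised fine-tuning stage, and the test stage are omitted, inputs are assumed to be scaled to [0, 1], and the layer sizes, hyperparameters, and random data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_output(x, W, b):
    """Sigmoid layer: one output per row of W."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def train_sparse_autoencoder(data, n_hidden, rho=0.1, beta=0.5,
                             lr=0.5, epochs=50, seed=0):
    """Minimize reconstruction error plus KL sparsity penalty; return encoder."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(d)]
    b1, b2 = [0.0] * n_hidden, [0.0] * d
    for _ in range(epochs):
        hidden = [layer_output(x, W1, b1) for x in data]
        rho_hat = [sum(a[j] for a in hidden) / n for j in range(n_hidden)]
        # Derivative of the KL penalty with respect to each mean activation.
        kl_grad = [beta * (-rho / r + (1 - rho) / (1 - r)) for r in rho_hat]
        gW1 = [[0.0] * d for _ in range(n_hidden)]
        gW2 = [[0.0] * n_hidden for _ in range(d)]
        gb1, gb2 = [0.0] * n_hidden, [0.0] * d
        for x, a in zip(data, hidden):
            y = layer_output(a, W2, b2)  # reconstruction of x
            d_out = [(yk - xk) * yk * (1 - yk) for yk, xk in zip(y, x)]
            d_hid = [(sum(W2[k][j] * d_out[k] for k in range(d)) + kl_grad[j])
                     * a[j] * (1 - a[j]) for j in range(n_hidden)]
            for k in range(d):
                gb2[k] += d_out[k]
                for j in range(n_hidden):
                    gW2[k][j] += d_out[k] * a[j]
            for j in range(n_hidden):
                gb1[j] += d_hid[j]
                for i in range(d):
                    gW1[j][i] += d_hid[j] * x[i]
        # Averaged gradient step on decoder, then encoder parameters.
        for k in range(d):
            b2[k] -= lr * gb2[k] / n
            for j in range(n_hidden):
                W2[k][j] -= lr * gW2[k][j] / n
        for j in range(n_hidden):
            b1[j] -= lr * gb1[j] / n
            for i in range(d):
                W1[j][i] -= lr * gW1[j][i] / n
    return W1, b1


def pretrain_stack(data, layer_sizes):
    """Train one autoencoder per layer; each output feeds the next layer."""
    layers, inputs = [], data
    for size in layer_sizes:
        W, b = train_sparse_autoencoder(inputs, size)
        layers.append((W, b))
        inputs = [layer_output(x, W, b) for x in inputs]
    return layers, inputs


# Hypothetical normalized (creep speed, adhesion coefficient) pairs.
rng = random.Random(42)
samples = [[rng.random(), rng.random()] for _ in range(20)]
layers, features = pretrain_stack(samples, layer_sizes=[4, 3])
```

In a full pipeline, the saved encoder weights would initialize the deep network, a softmax layer would be placed on top of `features`, and the whole network would then be fine-tuned with the labeled samples.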

Experimental Research and Analysis
For performance comparison with this dropout-based deep neural network, a BP neural network and an optimized BP neural network are also simulated. The changes in the state of adhesion directly affect the running safety of a locomotive. These changes are reflected in the sensor monitoring data. In this section, the adhesion state of the locomotive is identified and diagnosed from the inherent characteristics of the sample data by using the deep neural network based on the sparse autoencoder. In general, the original sensor data samples are divided into training and test sets according to a 7:3 ratio. A total of 700 training samples and 300 test samples are used in this experiment.
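A 7:3 split of 1,000 samples can be sketched as follows. Whether the samples were shuffled before splitting is not stated in this paper; the seeded shuffle, the function name, and the integer stand-ins for samples are assumptions of this illustration.

```python
import random


def split_samples(samples, train_ratio=0.7, seed=0):
    """Shuffle a copy of the samples and cut it at the training ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(1000))  # stand-ins for 1,000 monitored feature vectors
train_set, test_set = split_samples(samples)
```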

Simulation of the BP Neural Network.
The parameters must be set before the experiment. The number of hidden-layer nodes is set according to the empirical formula $l = 2n + 1$, where $n$ is the input dimension and $l$ is the number of hidden-layer nodes. In this study, the creep speed ($v_s$) and the adhesion coefficient ($\mu$) are used as inputs, so that $l$ should be 5, and the adhesion state is expressed in binary form, as shown in Table 2. This BP neural network should therefore have a 2 × 5 × 3 structure (Figure 5), where W is the weight and b is the bias of the BP neural network. The mean square error curve is shown in Figure 6.
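The sizing rule can be checked with a few lines of Python. The sketch derives the hidden-layer width from the input dimension and counts the trainable parameters of the resulting 2 × 5 × 3 network; the parameter count is a derived illustration, not a figure reported in this paper, and the binary output coding of Table 2 is not reproduced here.

```python
def hidden_nodes(n_inputs: int) -> int:
    """Empirical rule: hidden-layer width is twice the input dimension plus one."""
    return 2 * n_inputs + 1


n_inputs = 2                        # creep speed and adhesion coefficient
n_hidden = hidden_nodes(n_inputs)   # width of the single hidden layer
n_outputs = 3                       # bits of the binary state code

# (rows, columns) of each weight matrix; every row also has one bias.
layer_shapes = [(n_hidden, n_inputs), (n_outputs, n_hidden)]
n_parameters = sum(rows * cols + rows for rows, cols in layer_shapes)
```

Three binary outputs can represent up to eight codes, which is enough to distinguish the four adhesion states.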
To demonstrate the advantages of the proposed algorithm, a genetic algorithm (GA)-optimized BP neural network is used as a comparison experiment. The crossover operator is a single-point crossover, the crossover probability is 0.7, and the mutation probability is 0.01. The evolutionary process of the GA is shown in Figure 7, and the error descent curve of the GA-optimized BP neural network is shown in Figure 8.
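The two GA operators can be sketched as follows, using the crossover probability of 0.7 and the mutation probability of 0.01 given above. The real-valued chromosome (a flat list of network weights and biases), the Gaussian mutation, and its scale are assumptions of this sketch, since the encoding is not described here.

```python
import random

CROSSOVER_PROB = 0.7
MUTATION_PROB = 0.01


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at one random cut point."""
    if len(parent_a) < 2 or rng.random() >= CROSSOVER_PROB:
        return parent_a[:], parent_b[:]  # no crossover: copy the parents
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rng, scale=0.1):
    """Perturb each gene with probability MUTATION_PROB (assumed Gaussian noise)."""
    return [g + rng.gauss(0.0, scale) if rng.random() < MUTATION_PROB else g
            for g in chromosome]


rng = random.Random(1)
parent_a = [0.0] * 33  # hypothetical chromosome: weights and biases of a small network
parent_b = [1.0] * 33
child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, rng)
```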

Simulation of Sparse Autoencoder Deep Neural Network.
The visualization of the target classification is shown in Figure 9 to provide a clear analysis of the classification ability of the proposed algorithm. The plane between the yellow and blue regions is the desired classification plane. In this experiment, 1,000 sets of monitoring data are selected as experimental samples, and the 7:3 ratio is used to divide the samples into training and test samples. The actual results are shown in Figure 10, which shows that the actual classification plane is basically consistent with the expected one. The error histogram in Figure 11 shows that the error distribution of the deep neural network in this section is basically in line with the normal distribution, which meets the needs of practical application. The accuracy of adhesion state recognition is shown in Figures 12 and 13. The horizontal axis represents the desired target category, the vertical axis represents the experimentally predicted adhesion state category, and the gray block shows the percentage of predictions that agree with the expected category.
Figure 12 shows that the accuracy of the deep neural network adhesion state recognition is 96.1%. Overfitting generally appears as a trained neural network that does not accurately identify the test samples. The adhesion state is a continuously changing process; to ensure driving safety, the identification of the adhesion state must be as accurate as possible. Since the recognition accuracy does not reach the ideal level and there are significantly more training samples than test samples, there is ample reason to speculate that the deep neural network used in this section has been overfitted. To mitigate this phenomenon, the L2 regularization method was used to attenuate the weights of the deep neural network. Figure 13 shows the results of the deep neural network adhesion state recognition after L2 regularization. The accuracy reaches 99.6%. The accuracy of the neural network on the adhesion state test set is improved, and the proposed L2 regularization can mitigate the overfitting that may occur in adhesion state recognition based on a deep neural network.
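The recognition accuracy reported in such confusion plots can be computed as in the following sketch, which places predicted categories on the rows and expected categories on the columns, matching the axis convention described above. The label list follows the four adhesion states; the example predictions are made up for illustration and are not experimental results.

```python
LABELS = ["N0", "N1", "F1", "F2"]


def confusion_matrix(expected, predicted, labels=LABELS):
    """Rows: predicted category; columns: expected (target) category."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for e, p in zip(expected, predicted):
        matrix[index[p]][index[e]] += 1
    return matrix


def accuracy(matrix) -> float:
    """Share of samples on the diagonal, i.e. predicted equals expected."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Made-up labels for illustration only.
expected = ["N0", "N1", "F1", "F2", "F2"]
predicted = ["N0", "N1", "F1", "F2", "F1"]
matrix = confusion_matrix(expected, predicted)
acc = accuracy(matrix)
```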
The experimental results show that the SAE-based locomotive adhesion diagnosis can meet the requirements of high-accuracy recognition under a reasonable error distribution. Table 3 compares the performance of the three algorithms in this paper with that of the traditional BP neural network and the GA-optimized neural network.

Conclusions
In this paper, an adhesion state fault diagnosis method based on SAE is proposed. The effectiveness of the proposed method is validated by computer simulation. The conclusions are as follows:
(1) The adhesion state is divided into four categories, which provides a strong basis for wheel skid warning.
(2) The sparse autoencoder can effectively extract robust data features, which makes classification easier.
(3) Compared with the traditional BP neural network, the deep neural network built from sparse autoencoders can ensure effective fault diagnosis of the locomotive adhesion condition.

Figure 3: Schematic of the network structure.

Figure 9 shows the visualization of the classification target of adhesion status. In general, it is necessary to divide the adhesion status into four different statuses. Three classification planes are needed to achieve this (yellow and blue junction in the figure). The requirement of training

Table 1: Description of the adhesion state of the locomotive wheel-rail.

Table 3: Test accuracy of the four methods.