Estimation of Vehicle Dynamic Response from Track Irregularity Using Deep Learning Techniques

To improve the quality of track maintenance work, it is desirable to estimate vehicle dynamic behavior from track geometry irregularities. This paper proposes a deep learning model, CA-CNN-MUSE, to predict vehicle responses (e.g., vertical wheel-rail forces, wheel unloading rate, and car body vertical acceleration). In the proposed model, convolutional neural networks (CNNs) learn features of track irregularities, and multiscale self-attention mechanisms (MUSE) capture the long-term and short-term trends of sequences. Coordinate attention (CA) is introduced into the CNN to focus on important interchannel relationships and important spatial mileage points. The experiments were performed on a multibody simulation model of the vehicle system and on measured data from an actual high-speed line. The results show that CA-CNN-MUSE has high prediction accuracy for vertical vehicle responses and fast computation speed. The predicted time-domain waveforms and power spectral densities (PSDs) agree well with the actual vehicle responses. The main features of the lateral vehicle responses can also be captured by the proposed method, yet the results are not as good as the vertical ones.


Introduction
Assessment of track quality for track maintenance work is critical to ensure train running safety and passenger ride comfort in high-speed railways. The current evaluation method of track quality is based on the amplitudes of track geometry irregularities. However, a single track irregularity index is not enough to evaluate track quality if the vehicle dynamic response is not considered. For example, in some track sections the amplitude of each track irregularity does not exceed the limit, yet a large vehicle vibration response appears. Conversely, there are also track sections where the amplitudes of some track irregularities exceed the limits but do not cause a poor vehicle response. These problems show that the vehicle vibration response is the result of the nonlinear coupling of multiple track irregularities.
To improve the track quality assessment standards and the track maintenance work, research has emerged to relate track geometry irregularities to vehicle response. The key to these works is to find a model that accurately estimates the vehicle response from track irregularities. Then, the track geometry assessment can be performed by combining the indices of actual track irregularities and the predicted vehicle responses. Such models not only help the track maintenance department find the track locations that cause undesirable vehicle responses and estimate the degradation trend of the track but also facilitate upgrades of vehicle design technology.
There are several ways to build a model that predicts vehicle response from track geometry irregularities. Some researchers focus on building a mechanical model to simulate the nonlinear dynamic behavior of the vehicle [1][2][3]. Using commercial software such as SIMPACK, researchers can build a 3D vehicle-track dynamic model to investigate the relationship between track irregularities and the dynamic performance of railway vehicles [4][5][6]. However, the performance of a mechanical model depends on the reliability of the model parameters and is easily affected by real-world variations. Therefore, putting a theoretical model into track maintenance practice is difficult because the actual parameters of a vehicle-track system are difficult to obtain and change with time. Besides, the numerical iteration method used to solve a mechanical model is time-consuming.
Some studies characterize the behavior of the vehicle-track system as linear transfer functions and estimate the parameters based on system identification theory [7][8][9]. However, system identification theory can be applied only to linear systems and only under constant speed conditions. The alternative approach is to make purely data-driven predictions using machine learning techniques. Machine learning methods have high computational efficiency, perform well in modeling nonlinear mapping relationships, and can be designed for varying speeds. They can be further categorized into traditional methods and deep learning methods. Traditional machine learning approaches used to predict vehicle responses include the multilayer perceptron (MLP [10]), backpropagation (BP [11]) neural networks, decision trees, support vector machines, and other regression algorithms, which are compared in [12], and NARX neural networks [13].
In recent decades, deep learning has achieved great success in sequence modeling. The main advantages of deep learning are strong nonlinear modeling ability, end-to-end training, and, as a result, a great improvement in model accuracy. The classical sequence modeling methods in deep learning are recurrent neural networks (RNN [14]), long short-term memory (LSTM [15]), and gated recurrent units (GRU [16]), as well as their combinations with convolutional neural networks (CNN). Li et al. used LSTM to estimate car body vertical and lateral acceleration based on a simulation model [17]. Ma et al. proposed a CNN-LSTM model to predict car body vibration acceleration from track irregularities [18]. The model uses two CNN layers to learn features of different frequency bands of the sequences and feeds the extracted features into two LSTM layers to learn the mapping between the input and the output. The accuracy of the model is superior to that of BP and LSTM.
Recently, with the emergence of the attention mechanism (AM [19]), deep learning has stepped into a new stage. The main idea of attention is to tell a model "what" and "where" to attend to, so that it focuses on the most relevant information instead of the entire sequence. In machine translation tasks, replacing the traditional RNN and LSTM models with a multihead self-attention mechanism achieves better accuracy and allows more parallel computation [20]. With its success in sequence-to-sequence tasks, the self-attention mechanism has become a standard component for capturing long-term dependence. However, the self-attention mechanism also has shortcomings. As the self-attention layers deepen, a certain input vector receives too much attention, resulting in insufficient use of local information. Therefore, Zhao et al. replaced multihead self-attention with multiscale attention (MUSE) to encode global and local relations in parallel.
Attention mechanisms are widely deployed for boosting the performance of modern deep neural networks [21][22][23][24][25][26]. Channel attention, however, considers only interchannel information and neglects the importance of positional information. Hou et al. proposed a novel lightweight coordinate attention for mobile networks [27], which captures important interchannel relationships and precise positional information at the same time.
In this paper, we establish a CA-CNN-MUSE model, which combines CNN with MUSE and introduces coordinate attention into the CNN to estimate vehicle response. The experiments show that the model improves estimation accuracy compared with the LSTM and CNN-LSTM models and has good computation speed.

Dataset.
In this paper, we use two datasets to investigate the proposed model. The first dataset is the "Inspection-Simulation" dataset, which combines track irregularities measured by a high-speed comprehensive inspection train (see Figure 1) in China with vehicle responses simulated by a multibody model of the vehicle. The multibody dynamics (MBD) software SIMPACK is employed for the simulation of vehicle-track interaction. The CRH380B EMU trailer is modeled; the car body, bogie frames, and wheel sets are simplified as rigid bodies and are connected by the primary and secondary suspensions. The second dataset is the "Inspection" dataset, in which both the track irregularities and the vehicle responses are measured by track comprehensive inspection trains on actual high-speed railway lines. The track irregularity and vehicle response items of the two datasets are different.
The inspection-simulation dataset includes 4 track irregularities, i.e., left and right longitudinal levels and left and right alignment, coming from 3 Chinese high-speed railways, of which the mileage of line 1, line 2, and line 3 is 320 km, 120 km, and 80 km, respectively. The vehicle response data include 14 quantities: wheel-rail forces (left and right vertical forces of axle 1, axle 2, axle 3, and axle 4), wheel unloading rates (axle 1, axle 2, axle 3, and axle 4), and vertical car body accelerations (front and rear).
The inspection dataset was extracted from the database of track comprehensive inspection trains on a high-speed line in China. The total mileage is 17.5 km, the spatial sampling interval is 0.25 m, and the vehicle speed ranges from 215 km/h to 245 km/h. The track irregularity data include 11 quantities: left and right longitudinal levels, left and right alignment, left and right long-wave longitudinal levels, left and right long-wave alignment, gauge, cross-level, and twist. The vehicle response data include 6 quantities: left and right vertical forces, left and right lateral forces, and vertical and lateral car body accelerations.

Model Structure.
The prediction framework of vehicle dynamic response is shown in Figure 2. The network consists of three main modules: CNN, coordinate attention (CA), and multiscale attention (MUSE). The CNN is composed of alternately stacked convolution layers and pooling layers, which are used to extract different features of track irregularities. The outputs of the CNN are fed into two stacked multiscale attention layers, each of which is composed of a multihead self-attention mechanism and a depth-wise separable convolution to encode global and local relations in parallel. Coordinate attention [27] is added to the CNN to focus on important feature channels and important mileage points. Fully connected layers are placed before and after the multiscale self-attention module to perform nonlinear mapping and change the dimensions, and the last fully connected layer outputs the estimated vehicle responses.
Shock and Vibration

CNN Module.
Since the maximum management wavelength of track irregularity is 120 m, we take all track irregularity data within 120 m as input to fully capture long-wavelength information. We use the CNN to learn the features of the input vectors and feed these features into the multiscale attention layer to estimate the vehicle response at the current mileage point. Assuming the current mileage point is $t$, we estimate the vehicle responses at $T$ mileage points at a time:
$$Y = \{ y^{(t)}, y^{(t+1)}, \ldots, y^{(t+T-1)} \}. \tag{1}$$
The input should be
$$X = \{ x^{(t)}, x^{(t+1)}, \ldots, x^{(t+T-1)} \}, \tag{2}$$
where $y^{(t)}$ is the $K$-dimensional vector of vehicle responses, $x^{(t)}$ is the $L \times C$ block of track irregularities, $L$ is the number of mileage points within 120 m, and $C$ is the number of track irregularity channels. The size of the input sequence is $T \times L \times C$, and the size of the output sequence is $T \times 1 \times K$.
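As an illustration of the input layout described above, the following minimal sketch builds the T × L × C input from a track irregularity series. The function name `make_windows`, the choice of a trailing window (the L points ending at each mileage point), and the value of T are assumptions made for illustration; the paper does not state how the 120 m window is aligned. The 0.25 m sampling interval is the one reported for the inspection dataset.

```python
def make_windows(series, window_len, num_points):
    """Build T input windows, one per mileage point.

    series: list of per-point feature lists, shape (N, C).
    window_len: L, the number of mileage points per window.
    num_points: T, the number of consecutive mileage points per sample.
    Assumption: the window for point t covers the L points ending at t.
    """
    first = window_len - 1  # earliest point that has a full trailing window
    if first + num_points > len(series):
        raise ValueError("series too short for the requested T and L")
    return [series[t - window_len + 1 : t + 1]
            for t in range(first, first + num_points)]


SPACING_M = 0.25   # spatial sampling interval reported for the inspection dataset
WINDOW_M = 120.0   # maximum management wavelength of track irregularity
L = int(WINDOW_M / SPACING_M)  # number of mileage points within 120 m
C, T = 4, 8        # C as in the inspection-simulation dataset; T is hypothetical

# Synthetic stand-in for measured track irregularities, shape (1000, C).
series = [[float(i + c) for c in range(C)] for i in range(1000)]
X = make_windows(series, L, T)
shape = (len(X), len(X[0]), len(X[0][0]))
print(shape)  # (8, 480, 4)
```

With these values, each sample holds T windows of 480 points and C channels, matching the T × L × C input size given in the text.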
For the inspection-simulation dataset, C is 4 and K is 14. For the inspection dataset, C is 8 and K is 6. The structure and parameters of the CNN are the same as those in reference [27]. It includes two convolution layers (Conv1D), where the numbers of convolution kernels are 4 and 8, respectively, the kernel size is 1 × 5, and the stride is 1. Two max-pooling layers are used with a pool size of 1 × 2 and a stride of 2. After two stacked operations of convolution and max-pooling, the multidimensional features of track irregularities are extracted. The flatten layer compresses the multidimensional feature vectors into one-dimensional feature vectors to obtain global features.
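To make the effect of these layer settings concrete, the sketch below traces the feature length through the two convolution and max-pooling stages. It assumes an input length of L = 480 (120 m at a 0.25 m interval) and no padding; neither is stated in the text, so the resulting numbers are illustrative only. The helper names are hypothetical.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1D convolution (standard formula, no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


def pool1d_out_len(length, kernel, stride):
    """Output length of a 1D max-pooling layer without padding."""
    return (length - kernel) // stride + 1


length = 480      # assumed L: 120 m window at 0.25 m spacing
filters = [4, 8]  # numbers of convolution kernels in the two Conv1D layers
trace = [length]
for _ in filters:
    length = conv1d_out_len(length, kernel=5, stride=1)  # 1 x 5 kernel, stride 1
    trace.append(length)
    length = pool1d_out_len(length, kernel=2, stride=2)  # 1 x 2 pool, stride 2
    trace.append(length)

flattened = length * filters[-1]  # size of the flattened feature vector
print(trace, flattened)  # [480, 476, 238, 234, 117] 936
```

Under these assumptions, each window is compressed from 480 points to a 936-element feature vector before it reaches the fully connected and attention layers.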

Multiscale Attention.
The diagram of multiscale attention is shown in Figure 2. This module consists of two parallel parts: a multihead self-attention mechanism for capturing global features and a depth-wise convolution mechanism for capturing local features. For an input sequence X, the output Y of the multiscale attention layer is obtained by combining the outputs of these two parts.
The multihead self-attention mechanism is responsible for learning representations of long-term dependencies. In this module, the input sequence X is projected into three representations, query Q, key K, and value V [20]:
$$Q, K, V = \mathrm{Linear}_1(X), \mathrm{Linear}_2(X), \mathrm{Linear}_3(X). \tag{4}$$
Then, the output representation is calculated from Q, K, and V, where $W^Q$, $W^K$, $W^V$, and $W^O$ are projection parameters and $V = XW^V$.
The convolution module captures the local contextual sequence representations in the same mapping space. Based on the depth-wise convolution operation, the module uses three convolution submodules, which contain multiple cells with different kernel sizes of 1, 3, and 5, to capture features over different ranges. A gating mechanism automatically selects the weights of the different convolution cells to fuse the information from the different convolution submodules, with α denoting the weight coefficients. The depth-wise convolution first performs an independent convolution on each channel and then performs an ordinary convolution.
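For reference, the two branches of the multiscale attention layer can be sketched in the notation of the standard scaled dot-product attention [20] and the published MUSE formulation. These expressions are an assumption about the form the authors use and may differ from the paper's own equations; $d_k$ (the key dimension), $n$ (the number of convolution cells), and $k_i$ (the kernel size of cell $i$) are symbols introduced here for illustration.

```latex
\begin{aligned}
\sigma(Q, K, V) &= \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V, \\
\operatorname{Attention}(X) &= \sigma\!\left(Q W^{Q},\; K W^{K},\; V W^{V}\right) W^{O}, \\
\operatorname{Conv}(X) &= \sum_{i=1}^{n} \frac{\exp(\alpha_i)}{\sum_{j=1}^{n} \exp(\alpha_j)}\,
  \operatorname{Conv}_{k_i}(X), \qquad k_i \in \{1, 3, 5\}.
\end{aligned}
```

In this form, the softmax over the coefficients $\alpha_i$ plays the role of the gating mechanism: it yields positive weights that sum to one, so the layer learns how much each kernel size contributes to the local representation.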

CA Module.
Coordinate attention is added to the CNN to focus on the important positions in the spatial dimension L' and the important channels in the channel dimension C', which have important impacts on the outputs. The input dimension of the coordinate attention is T × L' × C'. The size after 1D average pooling in the T dimension is 1 × L' × C'. A 1 × 1 convolution is used to reduce and then restore the channel dimension, where r is the reduction ratio. Finally, the coordinate attention weights are generated by the sigmoid function.
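The steps described for the CA module (average pooling over T, channel reduction and restoration by 1 × 1 convolutions, then a sigmoid) can be sketched in plain Python as follows. This is a simplified illustration and not the authors' implementation: the function name, the random weights, the ReLU between the two 1 × 1 convolutions, and the elementwise reweighting of the input are all assumptions.

```python
import math
import random


def coordinate_attention(x, w_down, w_up):
    """Reweight x of shape (T, L', C') with weights of shape (L', C').

    w_down has shape (C', C'/r) and w_up has shape (C'/r, C'); they stand in
    for the two 1 x 1 convolutions, since a 1 x 1 convolution is a linear map
    applied independently at every position.
    """
    T, L, C = len(x), len(x[0]), len(x[0][0])
    R = len(w_up)  # reduced channel count C'/r

    # 1D average pooling over the T dimension: (T, L', C') -> (L', C').
    pooled = [[sum(x[t][l][c] for t in range(T)) / T for c in range(C)]
              for l in range(L)]

    weights = []
    for row in pooled:
        # Reduce channels C' -> C'/r; the ReLU is an assumption.
        hidden = [max(0.0, sum(row[c] * w_down[c][h] for c in range(C)))
                  for h in range(R)]
        # Restore channels C'/r -> C', then squash to (0, 1) with a sigmoid.
        logits = [sum(hidden[h] * w_up[h][c] for h in range(R))
                  for c in range(C)]
        weights.append([1.0 / (1.0 + math.exp(-z)) for z in logits])

    # Apply the same (L', C') weights to every one of the T slices.
    out = [[[x[t][l][c] * weights[l][c] for c in range(C)]
            for l in range(L)] for t in range(T)]
    return out, weights


random.seed(0)
T, L, C, r = 3, 5, 4, 2  # hypothetical sizes and reduction ratio
x = [[[random.uniform(-1, 1) for _ in range(C)] for _ in range(L)]
     for _ in range(T)]
w_down = [[random.uniform(-1, 1) for _ in range(C // r)] for _ in range(C)]
w_up = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C // r)]
out, weights = coordinate_attention(x, w_down, w_up)
```

Each weight lies between 0 and 1 and is indexed by both position and channel, which is how the module can emphasize particular mileage points and particular feature channels at the same time.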

Experimental Setup.
For each dataset, the training and test data are split at a ratio of 7 : 3, which is a common setup in deep learning. In the training process, the loss function is the mean square error between the actual and the predicted vehicle responses, plus the L1-norm and L2-norm regularization of the model parameters. In the loss function, T is the sequence length, W denotes all trainable model parameters, and λ1 and λ2 are the regularization coefficients. The learning rate is 0.001, and the optimizer is the Adam algorithm.
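A minimal sketch of a loss with this structure is given below. The exact normalization and the use of the squared L2 norm are assumptions, since only the verbal description of the loss is available here; the function name and the numeric values are hypothetical.

```python
def regularized_mse(y_true, y_pred, params, lam1, lam2):
    """Mean square error over T points plus L1 and squared-L2 penalties.

    y_true, y_pred: lists of T response vectors (each of length K).
    params: flat list of trainable parameters W.
    Assumption: the squared error is summed over K and averaged over T.
    """
    T = len(y_true)
    mse = sum((a - b) ** 2
              for yt, yp in zip(y_true, y_pred)
              for a, b in zip(yt, yp)) / T
    l1 = sum(abs(w) for w in params)  # L1 norm of W
    l2 = sum(w * w for w in params)   # squared L2 norm of W
    return mse + lam1 * l1 + lam2 * l2


y_true = [[1.0, 2.0], [3.0, 4.0]]
y_pred = [[1.0, 1.0], [2.0, 4.0]]
loss = regularized_mse(y_true, y_pred, params=[0.5, -1.0], lam1=0.1, lam2=0.01)
print(loss)
```

The L1 term pushes individual parameters toward zero, while the L2 term limits their overall magnitude; λ1 and λ2 set the balance between fitting the data and keeping the model small.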

Evaluation Indices.
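Four evaluation indices for model performance are employed: the mean absolute error (MAE), the root mean square error (RMSE), the Theil inequality coefficient (TIC), and the correlation coefficient (ρ). The indices are defined as follows:
$$\mathrm{MAE} = \frac{1}{M} \sum_{k=1}^{M} \left| y_k - \hat{y}_k \right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{k=1}^{M} \left( y_k - \hat{y}_k \right)^2},$$
$$\mathrm{TIC} = \frac{\sqrt{\frac{1}{M} \sum_{k=1}^{M} \left( y_k - \hat{y}_k \right)^2}}{\sqrt{\frac{1}{M} \sum_{k=1}^{M} y_k^2} + \sqrt{\frac{1}{M} \sum_{k=1}^{M} \hat{y}_k^2}},$$
$$\rho = \frac{\sum_{k=1}^{M} \left( y_k - \bar{y} \right) \left( \hat{y}_k - \bar{\hat{y}} \right)}{\sqrt{\sum_{k=1}^{M} \left( y_k - \bar{y} \right)^2 \sum_{k=1}^{M} \left( \hat{y}_k - \bar{\hat{y}} \right)^2}},$$
where $M$ is the length of the test data; $y_k$ and $\hat{y}_k$ are the $k$-th actual and predicted values; and $\bar{y}$ and $\bar{\hat{y}}$ are the means of the actual and the predicted values, respectively. MAE and RMSE reflect the absolute accuracy of the prediction; the smaller their values, the better the performance of the model. TIC and ρ are relative accuracy indices. A smaller TIC (ranging from 0 to 1) means higher accuracy. ρ ranges from −1 to 1, and the greater the absolute value, the higher the accuracy.
To sum up, the evaluation indices of CA-CNN-MUSE are superior to those of the other models.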
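The four indices named in this section can be computed as in the following sketch, which uses the standard definitions of MAE, RMSE, the Theil inequality coefficient, and the Pearson correlation coefficient. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Return (MAE, RMSE, TIC, rho) for two equal-length sequences."""
    M = len(y_true)
    err = [a - p for a, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in err) / M
    rmse = math.sqrt(sum(e * e for e in err) / M)
    # Theil inequality coefficient: RMSE normalized by the RMS of both signals.
    tic = rmse / (math.sqrt(sum(a * a for a in y_true) / M)
                  + math.sqrt(sum(p * p for p in y_pred) / M))
    mean_t = sum(y_true) / M
    mean_p = sum(y_pred) / M
    cov = sum((a - mean_t) * (p - mean_p) for a, p in zip(y_true, y_pred))
    var_t = sum((a - mean_t) ** 2 for a in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    rho = cov / math.sqrt(var_t * var_p)
    return mae, rmse, tic, rho


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]  # constant offset of 0.5
mae, rmse, tic, rho = evaluate(actual, predicted)
print(mae, rmse, tic, rho)
```

The example also shows why the absolute and relative indices are reported together: a constant offset leaves ρ at 1 even though MAE and RMSE are nonzero.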

Results on Different Lines.
Based on the inspection-simulation dataset, we investigate the estimation performance of the model on different lines, as shown in Table 2. As can be seen, CA-CNN-MUSE also performs better than CNN-LSTM on line 2 and line 3.

Results of Specific
Table 3. The following can be seen:
(1) The CA-CNN-MUSE model can effectively estimate vertical wheel-rail forces, wheel unloading rates, and vertical car body accelerations; the corresponding ρ values are 0.87, 0.73, and 0.96, respectively. Among these, the estimation accuracy of vertical car body acceleration is the highest.
(2) Among the vertical wheel-rail forces, the right vertical force of axle 1 has the highest estimation accuracy. The estimation accuracy of the wheel unloading rate is the same for the four wheel sets.

Results on the Inspection Dataset.
We also train and test the proposed CA-CNN-MUSE on the track inspection dataset. The 11 track irregularity items and the vehicle running speed are used as the input of the network. We first remove the wavelength components below 2 m from the signals using wavelet decomposition and reconstruction and then feed the signals into the network to estimate the vehicle responses. Table 4 summarizes the accuracy indices of the CA-CNN-MUSE model on the test set. As can be seen, the CA-CNN-MUSE model can effectively estimate the vertical wheel-rail force and acceleration, and the estimation accuracy of the vertical vehicle responses is good. The estimation of the lateral vehicle responses is less accurate than that of the vertical responses.
To further analyze the model performance at different wavelengths, the waveforms of the right vertical force and vertical car body acceleration are illustrated in Figure 3. The main features of the lateral vehicle responses can also be captured by the proposed method, yet the results are not as good as the vertical ones.
The power spectral densities (PSDs) of the right vertical force and the vertical car body acceleration are illustrated in Figures 4(a)-4(b), and the PSDs of the right lateral force and the lateral car body acceleration are illustrated in Figures 4(c)-4(d). As can be seen, the predicted PSDs of the right vertical force and the vertical car body acceleration fit the actual PSDs well at wavelengths above 3 m.

Conclusion
To relate track geometry irregularities to vehicle responses and thereby improve track quality assessment standards and track maintenance work, this paper proposed a CA-CNN-MUSE model to predict vehicle response. We use a multiscale self-attention mechanism to replace the LSTM structure, which is dominant in this kind of task. Besides, we introduced the lightweight coordinate attention mechanism into the CNN to focus on important interchannel relationships and important mileage information. The results of this paper show the following: (1) CA-CNN-MUSE has higher prediction accuracy than LSTM and CNN-LSTM and a faster inference speed than CNN-LSTM. The waveforms and PSDs of vertical wheel-rail forces and car body acceleration estimated by CA-CNN-MUSE agree well with the actual ones; (2) CA-CNN-MUSE is applicable both to the multibody simulation model of a vehicle system and to the measured data of actual high-speed lines.
The proposed model succeeds in representing the vertical wheel-rail force and car body acceleration, yet the estimates of the lateral wheel-rail force and car body acceleration are not as good as the vertical results. The estimation of lateral wheel-rail force and car body acceleration is a difficult problem, and we will continue to investigate it.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.