Neural Network-Based Train Identification in Railway Switches and Crossings Using Accelerometer Data

This paper analyses the possibilities of train type identification in railway switches and crossings (S&C) based on accelerometer data using contemporary machine learning methods such as neural networks. This is a unique approach, since trains have so far been identified only on straight track. Accelerometer sensors placed around the S&C structure were the source of input data for the subsequent models. Data from four S&C at different locations were considered and various neural network architectures were evaluated. The research indicated that it is feasible to identify trains in S&C from accelerometer data using neural networks. Models trained at one location are generally transferable to another location despite differences in geometrical parameters, substructure, and direction of passing trains. Other challenges include the small dataset and the speed variation of the trains, which must be considered for accurate identification. Results are obtained using statistical bootstrapping and are presented in the form of confusion matrices.


Introduction
Railway switches and crossings (S&C) are important components of railway infrastructure. The dynamic effects of passing trains are higher than on straight track and are affected by factors such as train speed, S&C geometry, fastening stiffness, and substructure material [1]. With increasing traffic and growing demands on the infrastructure, the reliability and safety of S&C must be ensured. Large demands on maintenance occur especially on high-speed tracks [2]. Generally, three different maintenance approaches can be applied: corrective, preventive, and predictive [3].
Modern predictive approaches require real-time monitoring and data collection to evaluate the S&C condition and apply appropriate countermeasures when needed [4]. Accelerometer or deflection sensors are simple and reliable devices that can be mounted directly in the S&C structure to monitor the dynamic response. Gradual changes over time for the same train type and speed may indicate an emerging defect in the S&C structure [5] and provide an early warning to the infrastructure operators. Therefore, the train type must be recognized from the data to evaluate changes in S&C.
The S-CODE project (Switch and Crossing Optimal Design and Evaluation [6]) presented requirements for the next generation of S&C [7] and also introduced the next generation of control, monitoring, and sensing systems that, among other functions, will be able to determine the type of passing train based on accelerometer data. This system is referred to as the Train Identification System (TIS). Recent studies have also proposed utilizing machine learning techniques for predictive maintenance [8]. Train type has already been successfully identified on straight track [9]. Identification of trains in S&C is a more challenging task, as more factors affecting the sensor measurements must be considered.
Data can be obtained from sensors mounted either on trains or on the track. For successful train identification, it is important to recognize defects on trains, such as flat wheels, and to exclude data from defective trains from the S&C evaluation. Train defects such as wheel flats have already been detected by sensors mounted on trains [10] or on the track [11]. Critical samples containing defective wheels can be identified from the accelerometer signal by state-of-the-art pattern recognition techniques [12].
Machine learning methods are well suited to processing large amounts of data. Methods such as support vector machines (SVMs) have already been incorporated into condition monitoring of railway infrastructure [13]. In this paper, neural networks are used for train identification as they are suitable for time series classification problems [14,15]. Once trained, neural networks are also advantageous in terms of performance, which may be useful for a future in situ TIS. The aim of this paper is to introduce the possibilities of train type identification directly in S&C using neural networks and accelerometer data. This approach is unique and has not been attempted to date. Two locations and four S&C are considered, and several use case scenarios are presented in order to evaluate the transferability of machine learning models between different locations. Results for multiple train classes as well as various neural network architectures are discussed.

Data Acquisition.
Data used for train identification were obtained by in situ measurements from multiple accelerometer sensors placed in different positions around the common crossing of the S&C. The common crossing contained no movable parts. Therefore, passing trains caused increased acceleration impulses due to the interruption of rail continuity as the wheels of the train hit the crossing nose. In the case of a movable crossing, which is used in some S&C designs especially for high-speed tracks, these impulses would be lower but still detectable [16]. The full dataset contains signals from 6 single-axis accelerometers in the Z-direction, 2 three-axis accelerometers in the X-, Y-, and Z-directions, and 8 displacement sensors in the Z-direction, as shown in Figure 1. The sampling frequency of the sensors was 10 kHz. Sensors were placed either on the ballast bed, on a sleeper, or directly on a rail near the crossing nose. Table 1.

Characteristics of Locomotive
Data were obtained from two nearby locations on the same railway corridor in the Czech Republic: Choceň (referred to as Location 1) and Ústí nad Orlicí (referred to as Location 2). Two S&C were present in each location and their parameters differed between locations. Both S&C in Location 1 had different geometry (suitable for higher speeds), different substructure parameters, and also an opposite direction of train passages compared to the S&C in Location 2. Another difference was that trains with locomotive class 363 had a lower mean speed in Location 2, as they stopped in a nearby station. The speed of the trains was measured by a radar velocity gun with ±2 km/h accuracy. Measurements for each locomotive class and their speeds are listed in Table 2 for Location 1 and Table 3 for Location 2.

Localization of the Locomotive Part.
The locomotive part of the accelerometer signal was used for the identification, since locomotives are usually better maintained than regular carriages. The variance of locomotive weights is also lower. The approaches used to localize the locomotive in the whole signal were based on peak detection. For peak localization, the root mean square (RMS) value was calculated by equation (1) using a sliding window of size $d = n_1 - n_2$. Nearby peaks were grouped by mean shift clustering with bandwidth parameter $\alpha$:

$$x_{\mathrm{RMS}} = \sqrt{\frac{1}{d} \sum_{i=n_2+1}^{n_1} x_i^2} \qquad (1)$$

The size of the sliding window for the RMS was chosen as $d = 0.02$ s. Peaks were then limited by a minimal amplitude value that was calculated dynamically as a quantile of the whole signal, with $q_{\mathrm{lim}}$ between 0.85 and 0.95. Mean shift clustering with a bandwidth parameter $\alpha$ between 0.03 and 0.033 s was applied in order to group nearby peaks. All parameters were chosen empirically based on mean train speed. These methods served only for preprocessing of the given dataset and are not the aim of this research.
Each peak in an accelerometer signal represents an axle of a train, and a two-peak group represents a bogie. Therefore, the signal can be divided into four-peak groups, where the first group represents a locomotive and the subsequent groups represent carriages. This algorithm proved useful for data preprocessing and automatic extraction of the locomotive part of the signal, as it was applied to a dataset that contained mostly signals with low levels of noise.
An example of an accelerometer signal generated by a train with a class 380 locomotive passing through an S&C at 162 km/h is shown in Figure 2. All axles of the train can be easily recognized as peaks in the signal. A detail of the locomotive part of this signal is shown in Figure 3.
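The localization steps described above (sliding-window RMS, a quantile-based amplitude threshold, and grouping of nearby peaks) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the synthetic test signal, and the choice q_lim = 0.9 are assumptions, and the mean shift clustering is replaced here by a simpler gap-based grouping that merges candidates closer than the bandwidth α. A true mean shift step (for example scikit-learn's MeanShift) could be substituted for that part.

```python
import numpy as np

def sliding_rms(x, window):
    """RMS of x over a sliding window of `window` samples."""
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(x ** 2, kernel, mode="same"))

def find_axle_peaks(x, fs=10_000, d=0.02, q_lim=0.9, alpha=0.03):
    """Return one sample index per detected axle (hypothetical helper).

    fs: sampling frequency in Hz, d: RMS window in seconds,
    q_lim: quantile used as the amplitude limit, alpha: grouping distance in seconds.
    """
    rms = sliding_rms(np.asarray(x, dtype=float), max(1, int(round(d * fs))))
    threshold = np.quantile(rms, q_lim)
    # Candidate peaks: local maxima of the RMS curve above the threshold.
    inner = rms[1:-1]
    is_peak = (inner > rms[:-2]) & (inner >= rms[2:]) & (inner > threshold)
    candidates = np.flatnonzero(is_peak) + 1
    if candidates.size == 0:
        return np.array([], dtype=int)
    # Assumption: gap-based grouping stands in for mean shift clustering.
    new_group = np.diff(candidates) > alpha * fs
    groups = np.split(candidates, np.flatnonzero(new_group) + 1)
    # Keep the strongest candidate of each group.
    return np.array([g[np.argmax(rms[g])] for g in groups])

# Synthetic example: four axle-like bursts in a 1 s signal sampled at 10 kHz.
fs = 10_000
t = np.arange(fs) / fs
centers = [0.2, 0.3, 0.6, 0.7]
signal = sum(np.exp(-((t - c) / 0.01) ** 2) for c in centers)
peaks = find_axle_peaks(signal, fs=fs)

# The first four-peak group is taken as the locomotive part of the signal.
locomotive = signal[peaks[0]:peaks[3] + 1]
```

In this sketch the locomotive segment is cut exactly at the first and fourth peaks; a practical version would likely keep a margin on both sides.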

Methodology for Classification.
The high cost of corrective maintenance and the risk of accidents require a robust solution for train type identification, as it will be part of the S&C real-time monitoring system. The S-CODE project proposed incorporating accelerometer signals to determine the type of passing train [6]. Accelerometer sensors will be mounted in situ in the S&C structure and it is expected that a large amount of data will be collected over time, so appropriate methods must be chosen for further data processing.
As stated in [13], machine learning methods such as support vector machines (SVMs) are often used for monitoring and evaluating the condition of railway infrastructure components [17] or for train defect detection from sensor data [11]. Time series classification with neural network-based models is a common task [18], and recent research mostly focuses on developing novel network architectures such as modifications of the residual neural network (ResNet) [19]. Convolutional neural networks are often used for the classification of time series data with outstanding performance [20,21] and are also widely used for the classification of accelerometer data and human activity recognition [14,22]. In railway engineering, deep neural networks have been successfully applied in areas such as fault diagnosis on trains [23] and rail degradation prediction [24].
Given the high complexity of the train type identification problem in S&C, multiple neural network architectures are examined in this paper in order to find an optimal design. Four multilayer perceptron (MLP) architectures were considered, with one (MLP1), two (MLP2), or three (MLP3a, MLP3b) fully connected hidden layers. The hidden layer size was set to 100 neurons in all cases except MLP3b, where 500 neurons were used. All perceptron models employed the rectified linear activation function (ReLU) between layers, except for the output layer, where softmax activation was applied. Using the softmax activation function in the output layer is a common practice [25] and has the advantage that the vector of output probabilities sums to 1. The convolutional neural network (CNN) consisted of a convolutional layer with 64 filters of length 5 followed by a max-pooling layer of length 5 and a hidden layer of size 100. ReLU was used as the activation function between layers and softmax activation at the output. The sixth and final architecture was a long short-term memory recurrent neural network (LSTM) with one LSTM layer with 50 hidden states followed by a fully connected layer with an output softmax activation function. The input size for all models was set to 1000, and the output size was the number of classified train types (i.e., 5 or 7). Training was done in 12 epochs and data were forwarded through the model in batches of size 4. The learning rate of the models was fixed at 0.001. The Adam optimizer was selected for training, with gradients obtained by automatic differentiation [26], and mean squared error was used as the loss function. The number of trainable parameters and the number of layers for the different neural network architectures are presented in Table 4.
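To make the perceptron architectures concrete, the following NumPy sketch builds fully connected networks with the layer sizes given above, runs a forward pass with ReLU between layers and softmax at the output, and counts trainable parameters. It is only an illustration: the source does not show its implementation, the helper names are hypothetical, the weights are random and untrained, and the parameter counts assume one bias term per neuron, so they are not taken from Table 4.

```python
import numpy as np

def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [(rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU between layers, softmax at the output."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    logits = x @ w + b
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

def count_parameters(params):
    """Number of trainable parameters (weights plus biases, an assumption)."""
    return sum(w.size + b.size for w, b in params)

# Hidden layer configurations as described in the text.
HIDDEN = {"MLP1": [100], "MLP2": [100, 100],
          "MLP3a": [100, 100, 100], "MLP3b": [500, 500, 500]}

rng = np.random.default_rng(0)
n_inputs, n_classes = 1000, 5
models = {name: init_mlp([n_inputs, *hidden, n_classes], rng)
          for name, hidden in HIDDEN.items()}

# One batch of 4 normalized signals (random placeholders, not real data).
batch = rng.uniform(-1.0, 1.0, size=(4, n_inputs))
probs = forward(models["MLP1"], batch)
param_counts = {name: count_parameters(p) for name, p in models.items()}
```

Each row of `probs` sums to 1, which is the property of the softmax output mentioned in the text.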

Normalization of Input Accelerometer Signals.
Specific features from the data can be selected as input to the neural networks to decrease complexity and improve training times. However, the whole accelerometer signal may be used without the need for extensive, domain-specific preprocessing. This approach also removes bias due to manually selected features [18] and improves performance, especially for an in situ device.
In the first step, signals were normalized in both the X- and Y-axes to prevent locomotive misclassification at different train speeds. The number of samples in the available locomotive signals ranged between $1 \cdot 10^3$ and $1 \cdot 10^4$ depending on the sampling frequency of the sensors, train speed, and locomotive geometry. In the X-axis, signals were resampled to the input size, which was chosen to be 1000. This number of samples is sufficient, as it preserves enough information with fewer samples than in the original signal (see Figure 4). In the Y-axis, signals were normalized to values between −1 and 1.
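The two normalization steps can be sketched as follows. The source does not state the resampling method or the exact scaling formula, so linear interpolation and min-max scaling are assumptions here, and `normalize_signal` is a hypothetical helper name.

```python
import numpy as np

def normalize_signal(signal, n_samples=1000):
    """Resample a signal to n_samples points and scale it to [-1, 1].

    Assumptions: linear interpolation on a common 0..1 axis (X-axis step)
    and min-max scaling (Y-axis step).
    """
    signal = np.asarray(signal, dtype=float)
    old_axis = np.linspace(0.0, 1.0, signal.size)
    new_axis = np.linspace(0.0, 1.0, n_samples)
    resampled = np.interp(new_axis, old_axis, signal)
    lo, hi = resampled.min(), resampled.max()
    if hi == lo:
        # A constant signal carries no amplitude information.
        return np.zeros(n_samples)
    return 2.0 * (resampled - lo) / (hi - lo) - 1.0

# Example: a locomotive segment of 3500 samples (random placeholder data).
rng = np.random.default_rng(1)
raw = rng.normal(size=3500)
normalized = normalize_signal(raw)
```

Because every locomotive segment is mapped onto the same number of samples, a fast and a slow passage of the same locomotive class produce inputs of equal length.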

Use Case Scenarios.
Four accelerometer channels, A0Z, A2Z, A3Z, and A7Z, were selected for train identification as they were similar in terms of phase shift and noise. Sensors A2Z, A3Z, and A7Z were placed on a sleeper under the crossing nose and sensor A0Z was placed in the ballast bed nearby, as shown in Figure 1. These four channels were used separately in order to augment the data and increase its variability, as the sensors can generally be placed in an arbitrary position around the crossing nose. The full dataset contained 108 train measurements from Location 1 and Location 2, giving a total of 432 samples. To evaluate the classification models on different varieties of data, these two locations were considered both independently and together, using only the locomotive classes present in both locations (5 classes).
Four use case scenarios were considered, as shown in Table 5. In scenarios A and B, the models were trained on all the samples from Location 1 and Location 2, respectively. In scenario C, the data from these two locations were combined. The size of the dataset remained relatively small despite using four accelerometer channels independently. Therefore, the bootstrapping technique [27] was utilized for scenarios A, B, and C in order to produce statistically relevant results. Ten models were trained and tested for each neural network architecture and each scenario, and the results were averaged to evaluate the overall performance of the given architecture [28]. For every model, the scenario dataset was shuffled and split in such a way that at least two locomotive passages (i.e., 8 samples) for each class were available for testing.
Finally, use case scenario D used data from Location 2 for training and data from Location 1 for testing. This scenario aimed to evaluate a situation in which the model for train identification is trained on the currently available data and then applied to another S&C.
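The repeated shuffle-and-split evaluation used for scenarios A to C can be sketched as below. This is a simplified illustration with hypothetical names: it splits at the level of train passages (each passage contributes four channel samples), reserves exactly two passages per class for testing although the text only requires at least two, and takes the training and scoring routine as a user-supplied function.

```python
import numpy as np

def split_by_passage(passage_class, rng, n_test_passages=2):
    """Randomly reserve n_test_passages passages of every class for testing.

    passage_class: class label of each train passage.
    Returns (train_idx, test_idx) as arrays of passage indices.
    """
    passage_class = np.asarray(passage_class)
    test = []
    for cls in np.unique(passage_class):
        members = np.flatnonzero(passage_class == cls)
        test.extend(rng.permutation(members)[:n_test_passages])
    test = np.sort(np.array(test, dtype=int))
    train = np.setdiff1d(np.arange(passage_class.size), test)
    return train, test

def mean_accuracy(passage_class, train_and_score, n_models=10, seed=0):
    """Average the score of n_models models trained on independent splits.

    train_and_score(train_idx, test_idx) is a hypothetical callback that
    trains one model and returns its test accuracy.
    """
    rng = np.random.default_rng(seed)
    scores = [train_and_score(*split_by_passage(passage_class, rng))
              for _ in range(n_models)]
    return float(np.mean(scores))

# Example with 5 classes and 6 passages per class (placeholder labels).
labels = np.repeat(np.arange(5), 6)
train_idx, test_idx = split_by_passage(labels, np.random.default_rng(0))

# Dummy scorer that only reports the test fraction, to show the call shape.
dummy = mean_accuracy(labels, lambda tr, te: len(te) / (len(tr) + len(te)))
```

Splitting by passage rather than by individual sample keeps the four channels of one passage on the same side of the split, which is one way to satisfy the "two passages, 8 samples" condition.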

Results
Substantial differences in classification accuracy between the use case scenarios, locomotive classes, and neural network architectures were observed due to factors such as the variance of train speeds, undercarriage geometry, and the dynamic response of the S&C structure. Despite these factors, the accuracy of the presented models is still relatively high compared to random classification.
Baseline accuracy (random classification) is given by the inverse of the number of classes and is 14.3% for scenario A and 20.0% for scenarios B to D. Mean model accuracy for the different scenarios ranged between 52.3% and 80.6% and is presented in Table 6 and Figure 5. The difference in mean accuracy between the two locations (scenarios A and B) was 28.3 percentage points and can be attributed to the higher data variability in Location 1, where more locomotive classes were classified and the train speeds were more variable. Training models on data from one location and testing on the other (scenario D) resulted in a mean accuracy of 55.0%. Combining data from both locations (scenario C) yielded a mean accuracy of 72.9%. Confusion matrices were used for the evaluation of results.
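The baseline figures and the confusion-matrix evaluation can be reproduced with a few lines of NumPy. The helper names and the toy labels below are illustrative only, and the convention of true classes in rows and predicted classes in columns is an assumption.

```python
import numpy as np

def baseline_accuracy(n_classes):
    """Expected accuracy of uniform random classification."""
    return 1.0 / n_classes

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes in rows and predictions in columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def accuracy(cm):
    """Share of correctly classified samples (diagonal of the matrix)."""
    return np.trace(cm) / cm.sum()

# Baselines for 7 classes (scenario A) and 5 classes (scenarios B to D).
baseline_a = round(100 * baseline_accuracy(7), 1)
baseline_bcd = round(100 * baseline_accuracy(5), 1)

# Toy example with three classes (made-up labels, not results from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
```

Off-diagonal entries show which classes are confused with each other, which is how mutual misclassification of geometrically similar locomotives would appear.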
Differences can also be observed between the neural network architectures (Table 6 and Figure 5). The flexibility of the models varies as the number of trainable parameters differs (see Table 4). The CNN shows the best accuracy in all scenarios compared to the other models, since the convolutional layer enhances the ability to recognize features in time series data. This architecture also contains the largest number of trainable parameters. All multilayer perceptrons (models MLP1, MLP2, MLP3a, and MLP3b) show only low variance in accuracy, with a slightly decreasing trend for deeper architectures. Relatively poor mean accuracy was observed in LSTM due to difficulties in
Locomotive classes were also classified with varying accuracy. Pendolino 680, Stadler 480, and class 380 were identified with the highest mean accuracy due to their specific undercarriage geometry. In contrast, the mean accuracies in scenario A for the three mutually geometrically similar classes 151, 163, and 363 were lower. Differences in the classification accuracy for the same locomotive classes in different scenarios can be attributed to the variance of speed. An overview of the mean classification accuracy for each locomotive class is shown in Table 7.

Discussion
The results showed differences in accuracy for different scenarios, locomotives, and machine learning models, which can be attributed to factors such as the complex dynamic interaction of the train and the S&C structure, multiple locomotive classes, similarities in locomotive undercarriage geometries, speed variance, and a relatively small amount of training data. Test scenario D, which used data from one location for training and the other location for testing, showed that neural network-based classifiers are generally transferable to S&C in different locations. Nevertheless, the model performance has to be improved by using a larger training dataset and more advanced neural network architectures. Additionally, high uncertainty in the case of trains with high speed variance requires partitioning trains with different speeds into separate classes. The highest classification accuracy of the CNN was expected, since it is the most commonly used architecture for this type of problem [18]. On the other hand, the lowest accuracy of LSTM compared to the other evaluated models may be attributed to the long input sequence, although this architecture is generally suitable for time series classification [29]. Adding a convolutional layer to LSTM may also increase its accuracy, as this architecture has been successfully applied to a number of time series classification or prediction problems [30,31].
Trains with distinct undercarriage geometry were identified with the highest accuracy, in contrast to trains with similar geometry, which were often mutually misclassified. The poor classification accuracy for class 363 can also be attributed to its large speed variability.
It is expected that more accelerometer data from train passages through S&C will be available in the future. Advanced network architectures such as LSTM with convolutional layers [30] or ResNet [19] will be examined, as well as more refined optimization of hyperparameters. Also, data augmentation techniques can be employed to increase dataset size and variability [32]. Another possible solution is to use transfer learning [22] and utilize the large amount of data available in other industries. Here, machine learning models can be trained on similar time series data and then fine-tuned for the locomotive classification problem. The ultimate goal is to develop a full-featured solution for locomotive identification in order to evaluate changes in the dynamic response of S&C for the same train types and speeds, as well as to detect defective trains and exclude them from the dataset to improve classification accuracy.

Conclusions
Train identification based on accelerometer data in S&C using different neural network architectures was presented in this paper. The most important findings can be summarized as follows:
(i) Train type identification in S&C is feasible despite the increased complexity of the problem compared to a straight track.
(ii) Transferability of machine learning models between different locations is also possible. Models can be trained on data from one location and then applied to another, previously unseen location, with relatively high classification accuracy in spite of differences in S&C parameters. However, both locations evaluated in this paper are positioned on one railway corridor. It is therefore desirable to further verify the transferability of models between unrelated locations.
(iii) Accelerometer signals can be classified without a need for manual feature selection with respect to the limited computational capacity of the in situ device.
To enhance the robustness of the evaluated models, only the locomotive part of the signal was used, as locomotives are less variable in terms of weight and wheel geometry. However, locomotives with largely different speeds were incorrectly classified despite normalization of the input data. Grouping of locomotives into speed categories is required in order to improve classification accuracy. Additionally, defective trains must be identified in advance and excluded from the dataset for successful train identification and subsequent evaluation of the dynamic response of S&C.
Comparison of four use case scenarios and six neural network architectures showed higher model performance for data with lower variability and vice versa. The best performing convolutional neural network proved to be a suitable baseline architecture for the locomotive classification problem. In further research, more advanced neural network architectures, as well as hyperparameter optimization, will be investigated.

Data Availability
The data used in this work were provided exclusively by Správa železnic, the national railway infrastructure manager of the Czech Republic. Data can be provided on demand at the e-mail address info@spravazeleznic.cz.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments
Financial support provided by the Technology Agency of the Czech Republic under the projects Turnout 4.0 (CK01000091) and Efficient spacetime predictions using machine learning methods (TJ04000232), as well as the support of the project Smart sensoric system for railways (FAST/FSI-J-20-6265), is gratefully acknowledged.

Journal of Advanced Transportation