
In this study, a hybrid method combining an extreme learning machine (ELM) and particle swarm optimization (PSO) is proposed to forecast train arrival delays, which can support later delay management and timetable optimization. First, nine characteristics (e.g., buffer time, train number, and station code) associated with train arrival delays are selected and analyzed using an extra trees classifier. Next, an ELM with one hidden layer is developed to predict train arrival delays, taking these characteristics as input features. Furthermore, the PSO algorithm is chosen over Bayesian optimization and the genetic algorithm to optimize the hyperparameter of the ELM, relieving the arduous task of manual tuning. Finally, a case study confirms the advantage of the proposed model. Compared with four baseline models (k-nearest neighbor, categorical boosting, Lasso, and gradient boosting decision tree) across different metrics, the proposed model proves proficient and achieves the highest prediction accuracy. In addition, a detailed analysis of the prediction error shows that our model possesses good robustness and correctness.

With the rapid development of society and the continuous improvement of people's quality of life, higher requirements have been placed on the reliability and punctuality of high-speed railway (HSR) transportation [

The traditional models are a classical approach for train delay prediction, such as probability distribution models [

Recently, the application of machine learning methods to predict train delays has been widely concerned by researchers, which makes up for the shortcomings of traditional methods [

To improve on the backpropagation algorithm and simplify the setting of learning parameters in general machine learning models, the ELM algorithm was proposed by Huang et al. [

Parameter adjustment is another critical factor to guarantee the good performance of machine learning models [

Therefore, to the best of the authors' knowledge, we are the first to propose using PSO to optimize the hyperparameter of the ELM to forecast train arrival delays.

The contributions this paper makes are as follows:

The main features affecting the train delay prediction are evaluated by the extra trees classifier. Then, the proposed model is constructed based on these features which possess spatiotemporal characteristics (train delays at each station). In this way, the interpretability of the proposed model is improved.

The proposed model is applied to arrival delay prediction of trains on an HSR line, which offers a brand-new perspective on the train delay prediction problem. Besides overcoming the drawbacks of the backpropagation algorithm, ELM-PSO also relieves the arduous task of manually tuning the number of hidden neurons of the ELM, outperforming Bayesian optimization and the genetic algorithm in both accuracy and efficiency.

We perform experiments on a section of the Wuhan-Guangzhou (W-G) HSR line. The proposed model is compared not only with two other hyperparameter tuning models but also with four prediction models from different perspectives. Our model turns out to have an extraordinary ability to handle large-scale data accurately.

The remainder of this paper is organized as follows: in Section

The train delay problem is visualized in Figure

Conversion from the train itinerary to mathematical notation.

This paper focuses only on train arrival delay prediction. We suppose that there is a target train

The station code (

The train number (

The length between the present station and the next station (

The scheduled running times between the present station and the previous station (

The actual running times between the present station and the previous station (

The scheduled running times between the present station and the next station (

The actual running times between the present station and the next station (

Buffer time, which indicates the difference between

The arrival delay time at the present station (

There are multiple potentially interdependent features (e.g., the train number and the length between two adjacent stations) that are closely related to train delay prediction. Based on the collected data and the experience of dispatchers, we ultimately select nine features that may influence train delays.

We apply extra trees classifier to analyze the correlation between all features and train delays. The results are exhibited in Figure

Bar chart of the correlations between the nine input features and output
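The feature ranking step can be illustrated with scikit-learn's `ExtraTreesClassifier` (a minimal sketch on synthetic data, not the authors' pipeline; the feature names and the delayed/on-time label construction are placeholders for the real dispatching records):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset: nine candidate features and a
# binary label (delayed vs. on time); names are illustrative only.
feature_names = [
    "station_code", "train_number", "section_length",
    "sched_run_prev", "actual_run_prev", "sched_run_next",
    "actual_run_next", "buffer_time", "arrival_delay_present",
]
X = rng.normal(size=(500, 9))
y = (X[:, 7] + 0.5 * X[:, 8] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Fit the extra trees classifier and rank features by impurity-based importance.
model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = sorted(zip(feature_names, model.feature_importances_),
                 key=lambda t: t[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The importances sum to one, so the bar chart above can be read as each feature's relative share of explanatory power.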

The train arrival delay prediction problem in this paper is transformed into the following expression:

This paper proposes a hybrid model of ELM and PSO for train delay prediction. ELM is widely used in regression problems because of its low computational cost, good generalization performance, and fast convergence speed [

Step 1: data preprocessing. First, 9 features mentioned in Section

Step 2: initializing the parameters and population. Parameters such as maximal iteration number, population size, and speed and position of the first particle are initialized. Each particle

where

Step 3: ELM (hidden layer activation function: sigmoid function) is used. The processed feature set

where

Step 4: calculate the fitness of each particle, and compare to update the current best fitness and its particle location.

Step 5: start the iteration. PSO updates the positions and velocities of all particles and then repeats step 4. Once the maximum number of iterations is reached, the process ends.

Step 6: output the results. We can obtain the output value on test set as well as the optimal number of hidden layer neurons.

The specific flowchart is shown in Figure

The flowchart of ELM-PSO method.
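The steps above can be sketched in compact form (a minimal numpy illustration on synthetic data, not the authors' implementation; the search range and swarm size are shrunk for the demo, and the inertia and acceleration coefficients `w`, `c1`, `c2` are assumed typical values):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit_predict(X_tr, y_tr, X_te, n_hidden, seed=0):
    """Single-hidden-layer ELM: random input weights, sigmoid activation,
    output weights solved in closed form via the pseudoinverse."""
    r = np.random.default_rng(seed)
    W = r.normal(size=(X_tr.shape[1], n_hidden))
    b = r.normal(size=n_hidden)
    H_tr = 1.0 / (1.0 + np.exp(-(X_tr @ W + b)))
    beta = np.linalg.pinv(H_tr) @ y_tr          # no backpropagation needed
    H_te = 1.0 / (1.0 + np.exp(-(X_te @ W + b)))
    return H_te @ beta

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

# Toy regression data standing in for the processed nine-feature set.
X = rng.normal(size=(300, 9))
y = X @ rng.normal(size=9) + 0.1 * rng.normal(size=300)
X_tr, X_te, y_tr, y_te = X[:240], X[240:], y[:240], y[240:]

def fitness(p):
    # Fitness = RMSE on the test set, as in step 4.
    return rmse(y_te, elm_fit_predict(X_tr, y_tr, X_te, int(round(p))))

# 1-D PSO over the number of hidden neurons (range shrunk for the demo).
lo, hi = 1, 100
n_particles, n_iter = 10, 15
pos = rng.uniform(lo, hi, n_particles)
vel = np.zeros(n_particles)
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[np.argmin(pbest_fit)]
gbest_fit = float(pbest_fit.min())

w, c1, c2 = 0.7, 1.5, 1.5  # assumed typical PSO coefficients
for _ in range(n_iter):
    r1, r2 = rng.random(n_particles), rng.random(n_particles)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    fit = np.array([fitness(p) for p in pos])
    improved = fit < pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    if fit.min() < gbest_fit:
        gbest, gbest_fit = float(pos[np.argmin(fit)]), float(fit.min())

print(f"best hidden neurons: {int(round(gbest))}, test RMSE: {gbest_fit:.4f}")
```

The closed-form solve for the output weights is what lets each fitness evaluation stay cheap enough for a population-based search.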

The data employed to verify ELM-PSO are obtained from the dispatching office of a railway bureau. The study covers a 1096 km section of the double-track W-G HSR line with 15 stations, from CBN to GZS. More than 400,000 data points are used, spanning October 2018 to April 2019. The original train operation data and the route map of the targeted 15 stations on the W-G HSR line are shown in Table

Train operation data format in the database.

Station | Station code | Date | Actual arrival | Actual departure | Train | Scheduled arrival | Scheduled departure
---|---|---|---|---|---|---|---
GZS | 369 | 2018/7/27 | 12:04 | 12:04 | G100 | 12:05 | 12:05
GZN | 368 | 2018/7/27 | 12:19 | 12:19 | G100 | 12:19 | 12:19
QY | 367 | 2018/7/27 | 12:26 | 12:26 | G100 | 12:26 | 12:26
YDW | 366 | 2018/7/27 | 12:37 | 12:37 | G100 | 12:38 | 12:38

Map of the W-G HSR line.

Analysis of the delay ratio at each station reveals the condition of each station and underscores the necessity of train arrival delay prediction, which helps each station cope with, and even inhibit, the growth of train arrival delays. Trains with an arrival delay greater than 4 minutes are considered delayed. What is intuitively presented in Figure

Arrival delay ratio for each station.
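The per-station delay ratio behind the figure can be computed as follows (a minimal numpy sketch on made-up records, not the real dispatching data; the 240 s threshold follows the 4-minute rule above):

```python
import numpy as np

# Illustrative records: station of each arrival and its delay in seconds.
stations = np.array(["GZS", "GZS", "GZN", "GZN", "QY", "QY"])
delay_s = np.array([300, 60, 0, 500, 120, 200])

THRESHOLD = 240  # 4 minutes: arrivals later than this count as delayed
ratios = {}
for st in np.unique(stations):
    mask = stations == st
    ratios[st] = float(np.mean(delay_s[mask] > THRESHOLD))
    print(f"{st}: delay ratio = {ratios[st]:.2f}")
```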

To evaluate the performance of our proposed method, k-nearest neighbor (KNN), categorical boosting (CB), gradient boosting decision tree (GBDT), and Lasso are used as baseline models. We take 20% of the dataset as the test set and the rest as the training set. The experiment runs in Python in an environment with an Intel® Core i5-6200U processor 2.13 GHz and 8 GB RAM. Briefly, an overview and the hyperparameter settings of each model are as follows:

KNN: the KNN algorithm is extensively applied across differing applications owing to its simplicity, comprehensibility, and relatively promising performance [

N_neighbors = 15

Weights = uniform

Leaf_size = 30

CB: CB is a machine learning model based on gradient boosting decision tree (GBDT) [

Depth = 3

Learning_rate = 0.1

Loss_function = RMSE

GBDT: GBDT has been applied to numerous problems [

N_estimators = 30

Loss = ls

Learning_rate = 0.1

Lasso: Lasso is a prevailing technique capable of simultaneously performing regularization and feature selection. Furthermore, data can be analyzed from multiple dimensions with Lasso [

Alpha = 3.0

Max_iter = 1000

Selection = cyclic

Root mean squared error (RMSE), mean absolute error (MAE), and
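These evaluation metrics can be computed directly (a small self-contained sketch; the sample values are illustrative only):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    d = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.sqrt(np.mean(d ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    d = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(np.abs(d)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = [0.0, 2.0, 4.0, 6.0]
y_pred = [0.5, 2.0, 3.5, 6.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```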

We compare PSO with the other two hyperparameter tuning models to ascertain the most satisfying one. The overview and hyperparameter settings of each model are as follows:

PSO: to locate the optimal hyperparameter of the ELM, the parameter settings of the PSO algorithm are as follows. PSO uses 20 particles per iteration and runs for 20 iterations in total, which is equivalent to the 400 iterations of Bayesian optimization.

Number of particles = 20

Fitness function: RMSE on test set

Search dimension = 1

Particle search range = [1, 2000]

Maximum number of iterations = 20

BO (Bayesian optimization): BO calculates the posterior probability distribution of the first

Objective function: RMSE on test set

Substitution function: Gaussian process regression

Acquisition function = UCB (upper confidence bound)

Hyperparameter search range = [1, 2000]

Maximum number of iterations = 400

GA (genetic algorithm): traditional iterative models easily fall into local minima, stalling the iteration. GA overcomes this "dead loop" phenomenon and is a global optimization algorithm [

Objective function: RMSE on test set

Hyperparameter search range = [1, 2000]

Generations = 20

Population size = 20

Maximum number of iterations = 400

The process of PSO tuning the hyperparameter is shown in Figure

Convergence discriminant graph of PSO optimization.

The search range [1, 2000] for the hyperparameter is determined by manually trying several values in the range [1, 10000]. When the hyperparameter value is greater than 2000, the fitness tends to be stable while the time consumption multiplies acutely. Weighing time consumption against precision, we ultimately limit the search range to [1, 2000].

The computational cost is shown in Table

The performance of each hyperparameter setting model.

Model | Maximum number of iterations | The number of iterations at best RMSE | RMSE | Neurons | Time (s)
---|---|---|---|---|---
ELM-PSO | 20 | 5 | 1.0387 | 1462 | 90000
ELM-BO | 400 | 120 | 1.0708 | 1279 | 129600
ELM-GA | 20 | 6 | 1.0668 | 1494 | 108000

In this section, the performance comparison between ELM-PSO and baseline models is performed.

First, we compare the overall performance of the five models. The evaluation metrics are

Prediction errors on each model’s test set for the W-G HSR line.

Model | RMSE | MAE | R² | Time (s)
---|---|---|---|---
ELM-PSO | 1.0387 | 0.3490 | 0.9955 | 856
CB | 1.6808 | 1.0464 | 0.9883 | 9
GBDT | 1.9976 | 1.2031 | 0.9835 | 18
Lasso | 1.9852 | 0.9240 | 0.9847 | 9
KNN | 1.6488 | 0.5025 | 0.9887 | 27

Prediction errors on each model’s training set for the W-G HSR line.

Model | RMSE | MAE | R²
---|---|---|---
ELM-PSO | 0.8247 | 0.3377 | 0.9973
CB | 1.6368 | 1.0407 | 0.9896
GBDT | 1.9495 | 1.2107 | 0.9852
Lasso | 1.6046 | 0.9199 | 0.9893
KNN | 1.4681 | 0.4558 | 0.9916

Then, by separating the delay duration into three bins (i.e., [0, 1200 s], >1200 s, and all delayed trains (arrival delay greater than 240 s)), we attempt to measure how well the benchmark models and our model capture the features of train delays of varying degrees on the test set. As is distinctly shown in Table

Model performance comparison on test set for the five models for different delay bins.

Delay bin (seconds) | Model | RMSE | MAE | R²
---|---|---|---|---
[0, 1200] | ELM-PSO | 0.5201 | 0.3006 | 0.9655
 | CB | 1.2628 | 0.9223 | 0.7967
 | GBDT | 1.3968 | 0.9886 | 0.7513
 | Lasso | 1.0957 | 0.7903 | 0.8469
 | KNN | 1.0195 | 0.5278 | 0.8675
>1200 | ELM-PSO | 5.6009 | 2.8249 | 0.9924
 | CB | 6.2335 | 3.1986 | 0.9906
 | GBDT | 8.0733 | 4.8589 | 0.9843
 | Lasso | 6.5023 | 3.1169 | 0.9898
 | KNN | 9.0247 | 4.8611 | 0.9804
All delayed trains (>240) | ELM-PSO | 3.3116 | 1.3457 | 0.9951
 | CB | 4.0469 | 2.2680 | 0.9927
 | GBDT | 5.1561 | 3.1498 | 0.9881
 | Lasso | 3.9251 | 1.7418 | 0.9931
 | KNN | 5.5878 | 2.8828 | 0.9860

On the basis of the previous section, we will evaluate the performance of the ELM-PSO model from other angles, including the prediction errors for each station precisely, the prediction correctness, and the robustness.

First and foremost, the errors of the ELM-PSO model for the predicted arrival delays are calculated at the station level on test set. Viewing the overall situation in Figure

Prediction errors in terms of the RMSE, MAE, and

In addition, to put forward more detailed and embedded results, we describe the correctness of the absolute residual between the predicted values and the actual values for each station from three intervals (i.e., <30 s, 30 s–60 s, and 60 s–90 s) (Figure

Prediction correctness for each station on test set.

At last, we investigate the robustness of our model to data size. In detail, we further train and test our model using 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%, respectively, of the total data as test set, and compare the results with the baseline models. The data sizes used in the experiments are shown in Figure

MAE, RMSE, and

In this section, the Friedman test (FT) and Wilcoxon signed rank test (WSRT) are used to verify the advantages of our proposed method compared with other methods [

Friedman ranking test and WSRT results.

Models | Mean rank | FT p-value | Model 1 vs. model 2 | WSRT statistic | WSRT p-value
---|---|---|---|---|---
ELM-PSO (M1) | 1.50 | ≤0.001 | — | — | —
CB (M2) | 3.55 | | M1-M2 | −259.200 | ≤0.001
GBDT (M3) | 4.08 | | M1-M3 | −261.312 | ≤0.001
Lasso (M4) | 3.53 | | M1-M4 | −257.456 | ≤0.001
KNN (M5) | 2.34 | | M1-M5 | −174.373 | ≤0.001
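Assuming SciPy, the two statistical tests can be reproduced on per-sample model errors roughly as follows (the error arrays here are synthetic, for illustration only; the real errors come from the experiments above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative per-sample absolute errors for five models on the same 200
# test cases; the first "model" is constructed to be slightly more accurate.
n = 200
errors = [np.abs(rng.normal(scale=s, size=n)) for s in (0.8, 1.0, 1.1, 1.0, 0.9)]

# Friedman test: do the five related samples differ overall?
stat, p_friedman = stats.friedmanchisquare(*errors)

# Wilcoxon signed rank test: pairwise, model 1 vs. each competitor.
p_wilcoxon = [stats.wilcoxon(errors[0], e).pvalue for e in errors[1:]]
print(f"Friedman p = {p_friedman:.4g}")
```

Small p-values in both tests indicate that the accuracy differences between the best model and its competitors are unlikely to be due to chance.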

In this paper, a hybrid ELM-PSO method is proposed to predict train delays. ELM overcomes the shortcomings of the backpropagation training algorithm, and the advantage of PSO is its excellent ability to search for the best hyperparameter. Four benchmark models, CB, KNN, GBDT, and Lasso, are selected for comparison with the proposed model. These models were run on the same data collected from China Railways. ELM-PSO tends to have better performance and generalization ability (

The dataset used in this paper contains train delays under all types of scenarios. Therefore, in the future, we will consider dividing the data into certain types of delay scenarios according to particular rules and applying currently prevalent models to train on and predict each scenario, aiming for higher accuracy. Finally, in terms of the input features, all the feature information in this paper can be obtained from train timetables. In the future, other types of features, such as infrastructure, weather features, and obstructions on other HSR lines, will be taken into account.

The data used to support the findings of this study were supplied by China Railway Guangzhou Bureau Group Co. Ltd. under license and so cannot be made freely available. Requests for access to these data should be addressed to the corresponding author and are subject to the permission of China Railway Guangzhou Bureau Group Co. Ltd.

The authors declare that they have no commercial or associative interests that represent a conflict of interest in connection with the submitted work.

Xu Bao contributed to conceptualization, prepared the original draft, was responsible for software, and visualized the study. Yanqiu Li prepared the original draft, was responsible for software, and visualized the study. Jianmin Li contributed to methodology and reviewed and edited the manuscript. Rui Shi contributed to supervision and data curation. Xin Ding contributed to data curation.

This work was financially supported by the Fundamental Research Funds for the Central Universities of China (2019JBM077) and the Open Fund for Jiangsu Key Laboratory of Traffic and Transportation Security (Huaiyin Institute of Technology).