Prediction of Train Station Delay Based on Multiattention Graph Convolution Network

Train station delay prediction is one of the core research issues in high-speed railway dispatching. Reliable prediction of station delay helps dispatchers accurately estimate train operation status and make reasonable dispatching decisions that improve the operation and service quality of rail transit. The delay at a station is affected by many factors, such as spatiotemporal factors, speed limits or suspensions caused by strong wind or bad weather, and high passenger flow during major holidays. However, previous studies have not fully combined the spatiotemporal characteristics of station delay with the impact of external factors. This paper uses train operation data, proposes a multiattention mechanism that captures the spatiotemporal characteristics of the data and processes the external factors, and establishes a Multiattention Train Station Delay Graph Convolution Network (MATGCN) model to predict train delays at high-speed railway stations, so as to provide a reference for train dispatching and emergency planning. Experiments on real train operation data from the China high-speed railway network show that our model outperforms the ANN, SVR, LSTM, RF, and TSTGCN models in terms of MAE, RMSE, and MAPE.


Introduction
High-speed rail transit is affected by many factors, such as stations, lines, and equipment [1]. Train delays detain passengers for long periods and cause inconvenience. In addition, as the number of lines increases and the train tracking interval decreases, the delay of one train may affect other trains and produce a knock-on effect. Train delay has always been one of the core research problems in high-speed railway dispatching [2]. Reliable prediction of station delay helps dispatchers accurately estimate train operation status and make reasonable dispatching decisions that improve the operation and service quality of rail transit.
With these considerations in mind, and building on previous research, this paper aims to uncover the train operation laws hidden in actual operation data. That is, on the basis of the actual operation data, it comprehensively considers the dual propagation of train operation delays in time and space, together with external factors such as weather, wind level, and major holidays, to predict train station delays. This paper uses statistical analysis to examine whether weather, wind speed, and major holidays affect train delay, and it comprehensively considers the impact of spatiotemporal characteristics and external factors to predict the delay of stations in a given period of time.
The train delay prediction of high-speed railway stations is a typical spatiotemporal network prediction problem [3,4]. Analyzing train delay requires comprehensive consideration of the spatiotemporal dependence between multiple trains and multiple lines [5]. Adjacent stations are related in space, and timestamps are related in time [6]. The train delay data therefore exhibits spatial dependence, temporal relevance, and spatiotemporal correlation.
In addition to spatiotemporal factors, the operation of a train is affected by many external factors [7]. For example, in rainy, snowy, and foggy weather, the operating speed of a train is limited, which may lead to delay, and in extremely bad weather trains may even be suspended. Passenger flow is also a major influencing factor: during major holidays, a substantial increase in passenger flow affects the trains' stop time.
Through the above analysis, we find that train operation analysis needs to consider not only spatiotemporal factors but also relevant external factors. In this paper, the factors we choose are wind level, temperature, weather conditions, and whether the day is a major holiday. The single-train delay refers to the delay of a specific train at each station. This paper does not predict the delays of a specific train, because once a train is delayed, the specific dispatching decision is issued by the railway dispatching department and depends on the experience and knowledge of the dispatchers. Instead, we predict, at a coarser granularity, the number of train delays in each time period for each station. The main difference between single-train delay and station delay is whether attention is paid to the delay of one train or to the total number of delayed trains at a station over a period of time.
At present, there are many state-of-the-art (SOTA) models in the field of traffic prediction, but most of them predict flow and speed on the highway network, such as DCRNN [8] and its derived models.
It is difficult to directly apply these models to the prediction of train station delay, for the following reasons: (1) At present, we cannot obtain train operation data that is as dense in time and space as highway network data. (2) Vehicles on the highway have no fixed speed or direction, whereas a train must travel in strict accordance with the minimum and maximum speed limits and the line specified in the train diagram. Many state-of-the-art traffic prediction models are based on random walks, so they cannot be directly applied to train delay prediction. (3) Traffic predictions are often concentrated on several roads or within one city, whereas this paper uses a large dataset that includes most high-speed rail stations and lines in China. Its research scope spans China, and almost no highway prediction work covers such a large range. This scale brings additional difficulties, such as the extraction range of node features, the capture of spatiotemporal characteristics, the different train operation laws of different regions and lines, and the test of the robustness of the model.
There are many works on the analysis and prediction of train delay in high-speed railways. For example, Liu et al. [9] used statistical methods to study the actual operation data of two stations on the Beijing-Shanghai railway line and calculated the delay rate of each station; Milinković and Marković [10] proposed a fuzzy Petri net (FPN) model to simulate the traffic process and train operation in the railway system to estimate train delays; Marković and Milinković [11] analyzed the relationship between passengers and various characteristics of the railway system in train arrival delays and applied a support vector machine model to analyze train delay; Lessan et al. [12] built a train delay prediction model based on a Bayesian network. Our work is an improvement on the paper of Zhang et al. [13]; compared with that paper, we propose the multiattention mechanism to achieve more accurate prediction, and we introduce the differences in Section 3.3. Most of these works share some characteristics: (1) The research on train operation data mostly stays at the stage of statistical analysis and fails to uncover the train operation laws hidden in the data. (2) The spatiotemporal attributes of trains are rarely considered. The temporal impact of delay is obvious, but the spatial impact of different lines at hub stations is often ignored. (3) Almost no research considers the combined impact of spatiotemporal characteristics and external factors.
Compared with existing works, the contributions of this paper can be summarized as follows: (1) We define the train operation network as a graph, with the stations on the network as nodes, and add node features. We define the lines connecting stations as edges and use the reciprocal of the distance between adjacent stations as the edge weight, indicating the mutual influence between adjacent stations. (2) We propose the MATGCN model, based on a multiattention mechanism, to predict the total number of train delays at a station in a given period of time; this mechanism lets MATGCN adjust its parameters during training according to the importance of different attributes, which improves robustness. (3) We built a high-speed rail delay dataset and published it on Figshare [14]; the dataset contains train operation data from October 8, 2019, to January 27, 2020, and the train delay data of the railway stations that these trains pass. Weather, temperature, wind power, and major holidays are included as factors affecting train operation. To our knowledge, this is the first public large-scale high-speed rail delay dataset. (4) In the comparison experiments, we use real-world data and make predictions for 1 to 6 hours ahead. The results show that MATGCN captures the periodic law of train operation well and maintains good accuracy in long-term prediction.
The remainder of this paper is organized as follows: Section 2 systematically reviews existing train delay prediction and spatiotemporal data mining methods. Section 3 presents the materials and methods. Section 4 reports the results of the experiments, and Section 5 summarizes the work of this paper.

Journal of Advanced Transportation

Literature Review
Previous work has made some progress in the prediction of train delay. It can generally be divided into the following categories: (1) works based on scenario calculation and simulation data; (2) works based on actual data that do not consider the spatiotemporal characteristics of train operation; (3) works based on actual data that consider external factors but ignore the spatiotemporal characteristics of train operation; (4) works based on actual performance data that consider the spatiotemporal characteristics of train operation but ignore external factors.
Some studies are not based on actual train operation data. For example, Wang et al. [15] analyzed four aspects (people, equipment, environment, and management), selected 14 main influencing factors of train delay, and used an interpretive structure model to analyze train delay. Based on scenario calculation, Ma [16] analyzed the factors influencing the degree of train delay and calculated the corresponding weights through the expert scoring method and the analytic hierarchy process, solved the models of different scenarios by introducing a genetic factor and information entropy, and solved the train operation adjustment model by example simulation, so as to adjust and optimize the train delay model.
Some studies are based on actual performance data but do not consider the spatiotemporal characteristics of the train. For example, Huang et al. [17] took as independent variables the delay time of the train at the station where the delay first occurred, the total delay time of the train passing through each station, the total interval buffer time for each stop, and a 0-1 variable indicating whether the train is delayed in the Zhuzhou West-Changsha South interval, and used random forest regression to predict train delays. Oneto et al. [18] proposed a fast learning algorithm for shallow and deep extreme learning machines based on the useful and actionable information in a large amount of historical train operation data of the Italian railway network, and made full use of recent in-memory large-scale data processing technologies to predict train delays.
Some studies consider external factors but not the spatiotemporal characteristics. For example, the research of Oneto et al. [19] does not use the historical data of train operation but uses static rules established by railway infrastructure experts based on classical univariate statistics, together with weather information provided by the national meteorological service, to further improve the model. Train operation data changes with time and space. A model that depends only on rules defined by experts has poor flexibility and portability, and it can hardly capture the train operation laws in the data.
More studies consider the spatiotemporal characteristics on the basis of actual operation data but ignore the impact of external factors. For example, Huang et al. [5] used the dynamic system of moving objects to generate multiattribute data, including static, time series, and spatiotemporal formats, and used a three-dimensional convolutional neural network, a long short-term memory (LSTM) recurrent neural network, and a fully connected neural network to predict train delay. Zhang et al. [20] comprehensively considered the relationship between the delay propagation of the current train and its adjacent trains, constructed a hierarchical prediction model of associated train delay based on a wavelet neural network, and divided delays into four categories: serious delay, dissipated delay, potential delay, and general delay. Lessan et al. [12] proposed a train delay prediction model based on a Bayesian network, which used real train operation data from a high-speed railway line and adopted three different Bayesian network schemes to capture the superposition and interaction of train delays. Zeng et al. [21] designed a classification method for initial delay and associated delay on the basis of delay propagation analysis and performance data statistics. Based on the data provided by the classification method, they proposed a delay prediction model and used a back-propagation neural network to predict the delay time. Hu et al. [22] established a prediction model of train delay recovery time by using a multilayer perceptron and a recurrent neural network with initial delay time, station stop redundancy time, and interval redundancy time as inputs. Corman and Kecman [23] used a Bayesian network to predict train delay propagation based on a set of historical traffic data from busy sections in Sweden and fully considered the dynamic changes of train delay with time and space. Hou et al. [24] used the train operation records from the scheduled and actual train schedules to prepare the modeling data, used stepwise regression to determine the importance of the factors influencing the train delay time, and applied a gradient boosting regression tree to construct the delay recovery model.
The above research methods mainly have one or more of the following problems: (1) The spatiotemporal correlation of train delay is not comprehensively considered. (2) The impact of external factors such as weather and major holidays on train operation is not considered. (3) There is too much focus on the delay prediction of one specific train, while the role of dispatchers is ignored. (4) Some works do not use actual train operation data, which causes problems in practical application.
Changes in weather play an important role in train operation. Ludvigsen and Klaeboe [7] evaluated how the 2010 winter weather affected rail freight operations in Norway, Sweden, Switzerland, and Poland, as well as the responses mobilized by railway managers to reduce adverse consequences. The results show that railway operators were not prepared to deal with three kinds of bad conditions: low temperature, heavy snow, and strong wind. Moreover, studies have shown that 60% of the delays of freight trains are related to winter weather. For example, with a snowfall of 5 millimeters and a temperature below −20°C, there will be a 79% change in arrival delay.
In fact, some works do consider external factors, but a common approach, as in Huang et al. [25], is to treat them as nonoperational data and process them with simple fully connected layers. We argue that these data are better treated as features of the graph nodes and included in the convolution, because of the spatiotemporal characteristics mentioned above.
In the graph convolution, we propose a multiattention mechanism; it consists of three parts: a spatial attention mechanism for different nodes in network, a temporal attention mechanism for the correlation of traffic conditions in different time slices, and a multifeature attention mechanism for different external factors fed into MATGCN.
In the experiments, we compare three settings: no spatiotemporal attention mechanism, only the spatiotemporal attention mechanism, and all three attention mechanisms described above. The results show that the three attention mechanisms proposed in this paper improve the performance of the model.

The Method
Before this section, as shown in Table 1, we first give a table of notation definitions to help find the meanings of notations used in the model and method descriptions.

Train Delay Prediction.
Train delay can be roughly divided into station delay, interval delay, line delay, single-train delay, boundary delay, and so on. This paper focuses on the prediction of station delay, which refers to the delay of trains passing through a station in a certain period of time.
The train operation network can be regarded as an undirected graph [16]. The nodes of the graph represent a series of interconnected stations, and the connections between stations are determined by the running lines of one or more trains. Any train running on the network has an itinerary consisting of stations $S = \{S_1, S_2, \ldots, S_N\}$. This itinerary is composed of a departure station, a target station, and one or more intermediate stations. These stations are distributed in different locations. For a station $S$, the scheduled arrival time is $T_{SA}^{S}$ and the scheduled departure time is $T_{SD}^{S}$. According to the railway operating plan, these data should be accurate and strictly implemented. Note that the initial station $S_1$ has no scheduled arrival time, and the target station $S_N$ has no scheduled departure time.
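As a minimal sketch of how itineraries can induce the undirected station graph described above, the following Python snippet connects every pair of consecutive stops. The function name `build_adjacency` and the placeholder station names are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
def build_adjacency(itineraries):
    """Return (stations, adjacency) for the undirected graph induced by itineraries."""
    stations = sorted({name for route in itineraries for name in route})
    index = {name: k for k, name in enumerate(stations)}
    n = len(stations)
    adjacency = [[0] * n for _ in range(n)]
    for route in itineraries:
        # Consecutive stops on an itinerary are joined by a running line.
        for a, b in zip(route, route[1:]):
            i, j = index[a], index[b]
            adjacency[i][j] = 1
            adjacency[j][i] = 1  # undirected graph: keep the matrix symmetric
    return stations, adjacency


# Hypothetical itineraries; the station names are placeholders.
itineraries = [["A", "B", "C"], ["B", "D"]]
stations, adjacency = build_adjacency(itineraries)
```

Here station B becomes a hub with three neighbors, while A and C are not directly connected because no itinerary lists them as consecutive stops.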
In this way, through the analysis of the trains at all stations, we convert the existing train operation data into spatiotemporal data and then add historical weather data from China Weather Network (https://www.tianqi.com), as well as the information of major holiday.

Data Collection.
The train operation data used in this paper comes from the train delay data of the China Railway Ticket System (https://www.12306.cn) and the historical weather data from the China Weather website (https://www.tianqi.com) [14]. The two sources are joined by date and station ID and include the train operation records of 727 stations from October 8, 2019, to January 27, 2020. The attributes include arrival delay, departure delay, wind level, weather condition, temperature, and major holiday. The train operation data is recorded in whole minutes. The running data of some trains is shown in Table 2. Table 3 shows the historical weather data published by China Weather Network together with major holidays, including Spring Festival and Public Sacrifice Day.

Data Analysis.
Train operation data is typical spatiotemporal network data [5]. In the real high-speed railway network, the operation of trains has strong spatial dependence, temporal relevance, and spatiotemporal correlation. Spatial dependence is the direct influence between adjacent stations: the number of train delays at the next station is affected by the delays at the previous station. Temporal relevance refers to the fact that the delay in a given time period at a given station follows the same trend as in the past few days and weeks. Spatiotemporal correlation refers to the fact that, in the spatial dimension, the mutual influence between different stations differs, and even the same station affects its adjacent stations differently over time; in the time dimension, the historical observations of different stations have different effects on the delay status of the station and its adjacent stations at different times in the future. Therefore, the train operation data of high-speed railways shows strong dynamic correlation in the spatiotemporal dimension.
This paper samples data in three ways: the most recent time series (by hour), the daily time series, and the weekly time series. Weather conditions and major holidays also have attributes in both time and space. In the temporal dimension, for a given station, the weather changes more over a week than over a day, and more over a day than over an hour. In the spatial dimension, different stations have different weather in the same time period: the weather at stations that are close together is more similar, while the weather at stations that are far apart differs more. Therefore, we treat weather factors as having spatiotemporal characteristics and major holiday factors as having temporal characteristics.
This paper computes statistics on the external data. Among the 1,954,176 records, about 89.59% correspond to weak wind, about 10.02% to middle wind, and 0.37% to strong wind; 96.63% of the trains ran in good weather, 2.11% in normal weather, and 1.24% in bad weather. About 7.14% of the days are major holidays and 92.85% are not. Table 4 shows the departure delay rate and arrival delay rate of train operation under the various external factors. For example, the departure delay rate is 16.38% in good weather, 17.78% in normal weather, and 19.56% in bad weather.
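The per-factor delay rates reported above can be illustrated with a short grouping routine. This is only a sketch under an assumption: the delay rate is taken to be the share of records with a positive departure delay within each group. The field names `weather` and `departure_delay_min` and the sample records are hypothetical and do not come from the published dataset.

```python
from collections import defaultdict


def delay_rate_by_factor(records, factor):
    """Share of records with a positive departure delay, grouped by one factor."""
    totals = defaultdict(int)
    delayed = defaultdict(int)
    for rec in records:
        key = rec[factor]
        totals[key] += 1
        if rec["departure_delay_min"] > 0:
            delayed[key] += 1
    return {key: delayed[key] / totals[key] for key in totals}


# Hypothetical records for illustration only.
records = [
    {"weather": "good", "departure_delay_min": 0},
    {"weather": "good", "departure_delay_min": 3},
    {"weather": "good", "departure_delay_min": 0},
    {"weather": "good", "departure_delay_min": 0},
    {"weather": "bad", "departure_delay_min": 5},
    {"weather": "bad", "departure_delay_min": 0},
]
rates = delay_rate_by_factor(records, "weather")
```

The same function can be called with a wind-level or holiday field to produce the other groupings.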
To observe more directly how different external factors change the departure and arrival delay rates, this paper uses a heat map. As shown in Table 5, the departure and arrival delay rates vary with weather condition, wind level, and whether the day is a major holiday. The external-factor entries give the proportion of the total data accounted for by each factor; for example, 7.14% of the days are major holidays. The color deepens gradually from left to right: as the wind level increases, the weather worsens, and major holidays occur, the departure and arrival delay rates increase. That is, the external factors used in this paper affect the departure and arrival delay rates.

Notation table (excerpt):
$\chi$: all the features of all stations in $t$ time periods
$y_i^{\tau} \in \mathbb{R}$: the number of arrival delays of station $i$ in the future time period $\tau$
$Y = (y_1, y_2, \ldots, y_N)^T \in \mathbb{R}^{N \times T_P}$: the arrival delay sequence of all stations
$y_i$: the arrival delay sequence of station $i$ in the future $T_P$ time period

Data Processing.
However, the raw data contain nearly 80 distinct categories of weather, wind direction, wind level, and holiday. Although many of them differ, their impact on train operation is roughly the same; for example, southwest wind at levels 1-2 and northeast wind at levels 1-2 are both relatively low wind levels and have roughly the same impact on train operation. Therefore, these two combinations of wind direction and wind level can both be classified as weak wind. The wind levels are classified accordingly in this paper: wind below level 4 is weak, wind from level 4 to level 6 is middle, and wind above level 6 is strong. The weather conditions are classified as well: nine kinds of weather such as sunny and cloudy are classified as good weather, six kinds such as moderate snow and moderate rain are classified as normal weather, and nine kinds such as sleet and blizzard are classified as bad weather, as shown in Table 6. The weather condition, wind level, and holiday data are not numerical and cannot be fed into the MATGCN model for calculation and training. Therefore, we use one-hot encoding to convert these data. This process is implemented with scikit-learn, a third-party Python machine learning library.
As shown in Algorithm 1, the inputs are the spatiotemporal and external-factor data and the columns that need to be encoded. The program reads the original data, uses the OneHotEncoder class provided by scikit-learn to convert the nonnumerical columns into one-hot encodings, and concatenates the converted data with the original data to obtain numerical data that can be used in model calculations.
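The classification and encoding steps can be sketched in a few lines. The paper uses scikit-learn's OneHotEncoder; the snippet below is a dependency-free stand-in written for illustration, not a reproduction of Algorithm 1. The helper names `classify_wind` and `one_hot` and the sample wind levels are assumptions; the thresholds follow the wind-level classes stated in the text (below 4 weak, 4 to 6 middle, above 6 strong).

```python
def classify_wind(level):
    """Map a numeric wind level to the three classes used in the text."""
    if level < 4:
        return "weak"
    if level <= 6:
        return "middle"
    return "strong"


def one_hot(values, categories):
    """Encode each value as a 0/1 vector with a single 1 at its category index."""
    position = {cat: k for k, cat in enumerate(categories)}
    return [
        [1 if position[v] == k else 0 for k in range(len(categories))]
        for v in values
    ]


WIND_CLASSES = ["weak", "middle", "strong"]  # explicit, fixed column order
wind_levels = [2, 5, 7]                      # hypothetical raw wind levels
wind_labels = [classify_wind(x) for x in wind_levels]
encoded = one_hot(wind_labels, WIND_CLASSES)
```

One design difference is worth noting: OneHotEncoder infers the categories from the data and orders them by sorted value by default, whereas this sketch fixes the column order explicitly so that the encoding stays stable across data splits.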
The conversion result is shown in Table 7. Take the data in the first row as an example: during the period from 2:00 to 3:00 on October 8, 2019 (not a major holiday), at WanZhou Station, the temperature is 22°C, the wind level is weak, the weather is good, and there are no delayed trains.
We need to reprocess and modify the original train data as in Table 8. Let the actual arrival time of a train at station $S$ be $T_{AA}^{S}$ and the actual departure time at station $S$ be $T_{AD}^{S}$. Then $T_{AA}^{S} - T_{SA}^{S}$ is defined as the arrival delay, and, similarly, $T_{AD}^{S} - T_{SD}^{S}$ is defined as the departure delay. If $T_{AA}^{S} - T_{SA}^{S} > 0$, the record is counted as an arrival delay; if $T_{AD}^{S} - T_{SD}^{S} > 0$, it is counted as a departure delay.
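The delay definition above translates directly into a counting routine. The sketch below computes the arrival delay as actual minus scheduled time and counts positive delays per station and hour. The timestamp format, the choice to bucket by the hour of the scheduled arrival, the station labels, and the sample records are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format (whole minutes)


def arrival_delay_minutes(scheduled, actual):
    """Actual minus scheduled arrival time, in whole minutes."""
    delta = datetime.strptime(actual, FMT) - datetime.strptime(scheduled, FMT)
    return int(delta.total_seconds() // 60)


def count_arrival_delays(records):
    """Count delayed arrivals per (station, hour of scheduled arrival)."""
    counts = Counter()
    for station, scheduled, actual in records:
        if arrival_delay_minutes(scheduled, actual) > 0:
            hour = datetime.strptime(scheduled, FMT).replace(minute=0)
            counts[(station, hour)] += 1
    return counts


# Hypothetical records: (station, scheduled arrival, actual arrival).
records = [
    ("X", "2019-10-08 02:10", "2019-10-08 02:15"),  # 5 minutes late
    ("X", "2019-10-08 02:40", "2019-10-08 02:40"),  # on time
    ("X", "2019-10-08 02:50", "2019-10-08 02:48"),  # early, not a delay
    ("Y", "2019-10-08 03:05", "2019-10-08 03:20"),  # 15 minutes late
]
counts = count_arrival_delays(records)
```

Departure delays can be counted the same way by passing scheduled and actual departure times instead.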

MATGCN.
The train network is defined as an undirected graph $G = (S, E, A, M)$, where $S$ is the set of all stations, with $|S| = N$ the number of stations; $E$ is the set of all edges, which represent the train lines between stations; $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of $G$, representing the connectivity between stations; and $M$ is the distance weight matrix of $G$, representing the distance between stations. Because the influence between two stations decreases as the distance between them grows, the weight is correspondingly smaller. In $G$, each station has a number of statistical values in the time period $\tau$, including the total number of departure delays and arrival delays. We use $F$ to denote the number of station features, and $X_i^{\tau} \in \mathbb{R}^{F}$ denotes all features of station $i$ in $\tau$. $X^{\tau} = (X_1^{\tau}, X_2^{\tau}, \ldots, X_N^{\tau})^T \in \mathbb{R}^{N \times F}$ denotes all features of all stations in $\tau$. $\chi = (X^1, X^2, \ldots, X^t)^T \in \mathbb{R}^{N \times F \times t}$ denotes all the features of all stations in $t$ time periods; that is, $\chi \in \mathbb{R}^{N \times F \times T}$. In addition, $y_i^{\tau} \in \mathbb{R}$ denotes the number of arrival delays of station $i$ in the future time period $\tau$. Given a fixed time period $\tau$, the feature matrices of all stations on the train network generated from the train dataset in the past $\tau$ time periods are used to predict the arrival delay sequence of all stations on the entire train network, $Y = (y_1, y_2, \ldots, y_N)^T \in \mathbb{R}^{N \times T_P}$, in the future $T_P$ time periods. Here $y_i = (y_i^{\tau+1}, y_i^{\tau+2}, \ldots, y_i^{\tau+T_P})$ is the arrival delay sequence of station $i$ in the future $T_P$ time periods. The MATGCN model (shown in Figure 1) is a significant improvement of TSTGCN [13]. TSTGCN is a deep learning model for train station delay prediction that we proposed previously; it uses train operation data on the original high-speed railway network and effectively captures dynamic spatiotemporal characteristics to predict the delay of high-speed train stations. MATGCN makes several significant changes to TSTGCN.
Like TSTGCN, we divide the input data into three categories: recent, daily period, and weekly period. However, we add more external features to the graph nodes and redivide the input data into recent-external, daily-period-external, and weekly-period-external. Furthermore, the multiattention mechanism we propose combines a spatial attention module, a temporal attention module, and a multifeature attention module; it can handle spatiotemporal data and process the input data in every layer according to its importance to the model. These changes are what allow MATGCN to improve on TSTGCN. We combine the results of the three components in a similar way to obtain the final result. We now introduce MATGCN in detail. As shown in Figure 1, the input data is the integration of three time series $(X_h, X_d, X_w)$ with the external factors data $X_E$. When these data pass through TAtt (the temporal attention block) and SAtt (the spatial attention block), the MATGCN model captures the spatiotemporal correlation; when they pass through MAtt (the multifeature attention block), MATGCN applies an attention matrix to the external factors. The model then uses the GCN to model the spatial characteristics of the nodes on the train operation network and makes full use of the correlation of the graph node signals. Finally, the result is obtained by fusing the outputs of the three components through a fully connected layer according to their influence weights.
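The division into recent, daily-period, and weekly-period inputs can be sketched as index arithmetic on an hourly series. The snippet below follows a convention commonly used in periodic spatiotemporal models: periodic segments are aligned with the forecast window one or more periods earlier. It is an assumption made for illustration and does not reproduce the paper's own equations. The function names, the forecast start `t0`, and the segment counts are hypothetical.

```python
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def recent_window(t0, length):
    """Indices of the `length` hourly slices immediately before the forecast start t0."""
    return list(range(t0 - length, t0))


def periodic_windows(t0, pred_len, num_periods, period):
    """Indices of past slices aligned with the forecast window [t0, t0 + pred_len).

    For k = num_periods, ..., 1 the window starts exactly k periods before t0.
    """
    indices = []
    for k in range(num_periods, 0, -1):
        start = t0 - k * period
        indices.extend(range(start, start + pred_len))
    return indices


t0, pred_len = 400, 2                                      # hypothetical values
recent = recent_window(t0, 3)
daily = periodic_windows(t0, pred_len, 2, HOURS_PER_DAY)
weekly = periodic_windows(t0, pred_len, 1, HOURS_PER_WEEK)
```

With these indices, the three input tensors are obtained by slicing the full feature tensor along its time axis; the external-factor features travel with each slice because they are stored as node features.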

Input Row Data.
The input data are divided into three categories:
(1) Recent time series data with external factors. The arrival delay at one or more preceding stations in the recent past affects the arrival delay of multiple stations in the future, and external factors influence this effect. The mathematical representation is as follows: (1)
(2) Daily-period series data with external factors. People's daily travel is regular; station delays may occur in a relatively fixed time period, such as five to six o'clock in the afternoon every day, and external factors influence this pattern. The purpose of the daily-period component is to model the daily periodicity of the train arrival delay data. The mathematical representation is as follows:
(3) Weekly-period series data with external factors. The weekly attributes and time intervals of these segments are the same as those of the predicted period. Normally, the traffic pattern on a Wednesday is similar to the pattern on previous Wednesdays, but it may be very different from that on Thursday and Friday, and external factors influence this pattern. For example, even if similar train delay patterns occur every week, they change under continuous blizzards. Therefore, external factors also play a key role in exploring the rules of train delays. The mathematical representation is as follows:

GCN.
In this paper, GCN is used to model the spatial characteristics of nodes on the train operation network. In the spatial dimension, train operation data is graph-structured data. Unlike grid data, it lies in a non-Euclidean space, which makes it difficult for traditional neural networks to process. A graph convolution neural network, however, can directly model the original graph-structured data and obtain representations of its nodes. In this paper, the spectral method is used to define the graph convolution. The spectral method uses the convolution theorem and the Fourier transform to transfer the graph signal from the node domain to the spectral domain and then defines the convolution kernel in the spectral domain.
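The spectral definition can be made concrete with a small NumPy sketch: the eigenvectors of the normalized graph Laplacian act as the graph Fourier basis, and convolution becomes an elementwise product in that basis. This illustrates the general spectral method only; it is not the specific kernel parameterization used in MATGCN, which this section does not state. The function names and the three-node path graph are assumptions made for illustration.

```python
import numpy as np


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    d_inv_sqrt[mask] = deg[mask] ** -0.5  # isolated nodes keep a zero scale
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def spectral_graph_conv(adj, signal, kernel):
    """Filter a node signal in the spectral domain: U diag(kernel) U^T x."""
    lap = normalized_laplacian(adj)
    _, eigvecs = np.linalg.eigh(lap)   # the Laplacian is symmetric
    spectrum = eigvecs.T @ signal      # graph Fourier transform
    filtered = kernel * spectrum       # convolution is a product in the spectral domain
    return eigvecs @ filtered          # inverse graph Fourier transform


# Hypothetical path graph with three stations and one value per station.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = np.array([1.0, 2.0, 3.0])
identity_out = spectral_graph_conv(adj, x, np.ones(3))
```

An all-ones kernel leaves the signal unchanged because the eigenvector matrix is orthogonal. In a trained model the kernel entries would be learnable, and practical implementations often approximate the filter with polynomials of the Laplacian to avoid the eigendecomposition.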

2D-CNN.
CNN is a type of feedforward neural network that contains convolution calculations and has a deep structure. It is designed to process data with a grid-like structure. This paper uses 2D-CNN to model the temporal correlation of nodes on the train operation network. After the graph convolution has collected the adjacency information of each node in the spatial dimension, the node signal is updated by merging the information of adjacent time slices along the temporal dimension, which captures the dependence between adjacent time slices. Taking the r-th layer in the daily-period component as an example, its convolution operation is shown as follows: where ReLU is the activation function and $\phi$ is the convolution kernel parameter in the temporal dimension.
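A temporal convolution of this kind can be sketched with NumPy. The snippet assumes an input of shape (nodes, features, time) and a kernel shared across all nodes, which corresponds to a 2D convolution whose kernel has size 1 along the node axis; the exact layer configuration of MATGCN is not given in this section, so the shapes and the function name are assumptions made for illustration.

```python
import numpy as np


def temporal_conv_relu(x, kernel):
    """Convolve along the time axis, then apply ReLU.

    x:      (N, F_in, T) node features over time
    kernel: (F_out, F_in, K) temporal kernel shared by all nodes
    returns (N, F_out, T - K + 1)
    """
    n, _, t = x.shape
    f_out, _, k = kernel.shape
    out = np.zeros((n, f_out, t - k + 1))
    for step in range(t - k + 1):
        window = x[:, :, step:step + k]  # (N, F_in, K) slice of adjacent time steps
        # Sum over input features and kernel taps for every node and output channel.
        out[:, :, step] = np.einsum("nfk,ofk->no", window, kernel)
    return np.maximum(out, 0.0)  # ReLU


# One node, one feature, four time slices; kernel sums two adjacent slices.
x = np.array([[[1.0, -2.0, 3.0, 4.0]]])
kernel = np.ones((1, 1, 2))
y = temporal_conv_relu(x, kernel)
```

No padding is applied here, so the time axis shrinks by K - 1; an implementation that needs to keep the sequence length would pad the input first.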

Attention Mechanism.
The MATGCN model uses a multiattention mechanism that includes a spatial attention mechanism, a temporal attention mechanism, and a multifeature attention mechanism. This multiattention model captures the spatiotemporal correlation well and processes the input data in every layer according to its importance to the model. In the temporal dimension, the arrival delays of stations in different periods are correlated, and the correlation for each station changes over time. The arrival delays in previous periods affect the future arrival delays of the stations on the line.
We calculate the time weight matrix $Z$ of the input data. The element $Z_{ij}$ of $Z$ represents the degree of dependence between times $i$ and $j$. The calculation formula is as follows: where $\cdot$ denotes the inner product, $\odot$ denotes the Hadamard product, $X = (X_1, X_2, \ldots, X_{T_{r-1}}) \in \mathbb{R}^{N \times F_{r-1} \times T_{r-1}}$ represents the input data of the $r$-th layer of the multiattention module, $F_{r-1}$ represents the number of features of the $r$-th layer, $T_{r-1}$ represents the length of the time series of the $r$-th layer, the activation function is sigmoid, and the characteristic transformation matrices are learnable parameters. After that, we use the softmax function to normalize $Z$ so that the attention weights sum to 1 and obtain the final time attention matrix $Z'$: The obtained time attention matrix is applied directly to the input of the $r$-th layer of the spatiotemporal module to obtain the input data integrating temporal attention, $X_{Z'} = X \odot Z'$; $X_{Z'}$ is then used as input to the spatial attention module.
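The softmax normalization step can be illustrated in isolation. The sketch below assumes that normalization runs over the second index, so that each row of the score matrix sums to 1; the text does not state the axis, and the function name `softmax_rows` and the toy scores are hypothetical.

```python
import math

def softmax_rows(z):
    """Normalize each row of a score matrix so that its entries sum to 1."""
    normalized = []
    for row in z:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        normalized.append([e / total for e in exps])
    return normalized

# Toy 3 x 3 score matrix standing in for the unnormalized time weights.
z = [[0.2, 1.5, -0.3],
     [0.0, 0.0, 0.0],
     [2.0, 1.0, 0.5]]
z_norm = softmax_rows(z)
for row in z_norm:
    print([round(v, 4) for v in row])
```

Softmax preserves the ordering of the scores within a row, so the time slice with the largest raw dependence still receives the largest attention weight.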
Different features have different effects on train delay, so in this paper we propose a multifeature attention mechanism to capture this difference: In the above equation, $X_h^{(r-1)} = (X_1, X_2, \ldots, X_{T_{r-1}}) \in \mathbb{R}^{N \times F_{r-1} \times T_{r-1}}$ represents the input data of the $r$-th layer of the multifeature module; $U \in \mathbb{R}^{F_{r-1} \times F_{r-1}}$, $b_p \in \mathbb{R}^{N \times F_{r-1} \times T_{r-1}}$, and $V_p \in \mathbb{R}^{T_{r-1} \times N \times N}$ are learnable parameters; $*$ denotes the batched matrix dot product; and the activation function is sigmoid. The attention matrix $P$ is computed dynamically from the current input of this layer, and the element $P_{i,j}$ of $P$ semantically represents the importance of different features of different nodes to the model. After that, we use the softmax function to normalize $P$ so that the attention weights sum to 1 and obtain the final multifeature attention matrix:

In the spatial dimension, the arrival delays of trains at different stations are correlated. In particular, adjacent stations influence each other strongly, and this interaction varies with the distance between them. The greater the distance between two stations, the more likely a train is to recover from the delayed state to normal, and hence the smaller the delay impact of the current station on the next. Assuming that the distance between station $i$ and station $j$ is $d_{S_i S_j}$, the weight of the corresponding position of the distance matrix is: Consider the static characteristics of the high-speed railway network. We calculate the correlation weight matrix $C$ of the input data. The element $C_{ij}$ of $C$ represents the correlation between stations $i$ and $j$. The calculation formula is as follows: In the above equation, $X_{Z'} \in \mathbb{R}^{N \times F_{r-1} \times T_{r-1}}$ represents the input data processed by the multifeature attention module of the $r$-th layer; $W_1 \in \mathbb{R}^{T_{r-1}}$, $W_2 \in \mathbb{R}^{F_{r-1} \times T_{r-1}}$, $W_3 \in \mathbb{R}^{F_{r-1}}$, and $V_S, b_S \in \mathbb{R}^{N \times N}$ are the feature conversion matrices, which are learnable parameters.
By fusing the correlation weight matrix $C$ and the distance weight matrix $M'$, we obtain the spatial attention matrix $Q$. Similarly, we use the softmax function to normalize $Q$ to obtain the final spatial attention matrix $Q'$. The calculation formula is as follows: The spatial attention matrix captures the correlation and the distance influence between nodes on the train operation network. When performing graph convolution, we dynamically adjust the influence weights between nodes using the adjacency matrix together with the spatial attention matrix.

Multicomponent Fusion.
In central cities such as Beijing, the passenger flow has obvious peak periods in the morning or evening, and trains may also be delayed. Therefore, the output of the daily-period and weekly-period components is more critical. In some remote areas, due to the lack of strong periodic passenger flow, the prediction results of the daily-period and weekly-period components may be less accurate. Therefore, when the outputs of these three components are fused, the weight of the influence of the three components on each node is different and needs to be determined according to the historical data of train operation. So the final fusion result of the three components is
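One common way to realize such a node-wise fusion is an element-wise (Hadamard) weighted sum of the three component outputs, with one learnable weight per node and prediction step. The sketch below assumes that form; the function name `fuse_components`, the weight values, and the toy outputs are hypothetical and only show how a station with strong periodicity can lean on the periodic components while a remote station leans on the recent component.

```python
import numpy as np

def fuse_components(y_h, y_d, y_w, w_h, w_d, w_w):
    """Element-wise weighted sum of recent, daily, and weekly outputs.

    All arguments have shape (N, T_p): N stations, T_p prediction steps.
    In a trained model the weights would be learned from historical data.
    """
    return w_h * y_h + w_d * y_d + w_w * y_w

# Toy outputs for 2 stations and T_p = 1 (values are made up).
y_h = np.array([[1.0], [2.0]])   # recent component
y_d = np.array([[3.0], [2.0]])   # daily-period component
y_w = np.array([[5.0], [2.0]])   # weekly-period component

# Station 0 (strong periodicity) weights the periodic components more;
# station 1 (weak periodicity) relies mostly on the recent component.
w_h = np.array([[0.2], [0.8]])
w_d = np.array([[0.4], [0.1]])
w_w = np.array([[0.4], [0.1]])

fused = fuse_components(y_h, y_d, y_w, w_h, w_d, w_w)
print(fused)
```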

Results and Discussion
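In this paper, we use the following three common evaluation indexes to evaluate the prediction performance of the ANN, SVR, LSTM, RF, TSTGCN, and MATGCN models: mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The calculation formulas are as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \hat{x}_i\right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2},$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{x_i - \hat{x}_i}{x_i}\right|.$$

In the above equations, $x_i$ is the actual value, $\hat{x}_i$ is the predicted value, and $n$ is the number of test samples.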
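The three metrics can be computed directly from paired actual and predicted values. The following is a minimal sketch using the standard definitions, with MAPE expressed in percent; the function names and the toy values are illustrative, and the MAPE function assumes that no actual value is zero.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)

# Toy example with three test samples.
actual = [2.0, 4.0, 5.0]
predicted = [1.0, 4.0, 7.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free but undefined for samples whose actual value is zero, which matters when many stations have no delays in a given hour.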
We implement the MATGCN model on the MXNet framework. In our model, the number of terms of the Chebyshev polynomial is set to 3, and all graph convolution layers use 64 convolution kernels. All temporal convolution layers also use 64 convolution kernels, and the time span of the data is adjusted by controlling the stride of the temporal convolution. We set $T_h = 3$, $T_d = 1$, and $T_w = 1$. The size of the prediction window is $T_p = 1$; that is, our goal is to predict the number of arrival delays at each station in the next hour. In the training phase, the batch size is 4 and the learning rate is 0.00001.
We implement the ANN, SVR, RF, LSTM, TSTGCN, and MATGCN models on a Windows 10 system. Among them, ANN uses a single-hidden-layer network structure with a learning rate of 0.01. SVR uses a polynomial (poly) kernel function with a learning rate of 0.001. RF uses a learning rate of 0.001 and a batch size of 128. LSTM contains two hidden layers; the activation function of the hidden layers is ReLU, the gate activation function is sigmoid, the number of outputs per layer is 100, the activation function of the output layer is softmax, the loss function is L2Loss, and the learning rate is 0.001. TSTGCN is based on MXNet, with a batch size of 4 and a learning rate of 0.00001. The training batch sizes of the remaining baseline models (ANN, SVR, and LSTM) are all 64, and all other parameters keep their default values.
We compare MATGCN with the other five learning models on the processed station delay dataset. Table 9 shows the results of train arrival delay prediction performance in the next hour. Among them, the best two scores are displayed in bold.
It can be observed that, among the four baselines that use train delay data only as time series data (ANN, SVR, RF, and LSTM), the best MAE value is 0.4447 (SVR), the best RMSE value is 0.8299 (SVR), and the best MAPE value is 53.6608 (ANN), whereas TSTGCN scores 0.1600, 0.4500, and 34.3600 on MAE, RMSE, and MAPE, respectively. These four baselines are therefore far inferior to TSTGCN. Although TSTGCN treats train station delay data as spatiotemporal data, it does not consider the external factors of train operation. Compared with TSTGCN, MATGCN without MAtt achieves a 6.66% decrease in MAE, a 6.66% decrease in RMSE, and a 27.73% decrease in MAPE, and MATGCN with MAtt achieves a 33.33% decrease in MAE, a 26.19% decrease in RMSE, and a 35.84% decrease in MAPE, obtaining the best prediction performance.
Figures 2(a)-2(c) show the performance of the various methods in predicting the number of train delays at stations in the next 1 to 6 hours. We can observe how the prediction performance of each method changes as the prediction horizon increases. In general, as the prediction horizon increases, prediction becomes more difficult, so the prediction error also increases. The errors of ANN, SVR, RF, and LSTM always remain at a high level. The prediction ability of RF decreases sharply. In contrast, the performance of LSTM decreases slowly. It can be seen from the figure that the MATGCN proposed in this paper obtains better prediction results than TSTGCN and achieves the best prediction performance at almost every horizon. Even in long-term prediction, its error remains at a low level.
This is because the spatiotemporal correlation and external factors are particularly important in long-term prediction.
Through the above analysis, we find that, compared with other existing methods, MATGCN can more comprehensively consider the spatiotemporal and external factors that affect train operation and shows excellent performance in station delay prediction.

Conclusions
Focusing on the spatiotemporal and dynamic correlation of high-speed railway train operation data, this paper constructs the MATGCN model based on a multiattention mechanism to predict train delay at high-speed railway stations. This model combines the multiattention mechanism with spatiotemporal convolution, including graph convolution in the spatial dimension and standard convolution in the temporal dimension, to capture the spatiotemporal characteristics of train operation data simultaneously, and adds a multifeature attention mechanism to process external factors such as weather conditions, wind level, and major holidays to achieve more accurate prediction. In the experimental stage, we compare the MATGCN model proposed in this paper with the ANN, SVR, LSTM, RF, and TSTGCN models and use MAE, RMSE, and MAPE to evaluate the prediction performance of each model. The result shows that the three attention mechanisms play a positive role in improving the performance of the model.

Data Availability
The train operation and external feature data used to support the findings of this study have been deposited in the Figshare repository: https://figshare.com/articles/dataset/A_high-speed_railway_network_dataset_from_train_operation_records_and_weather_data/15087882.

Additional Points
The focus is to propose a multifeature attention mechanism to capture the different effects of different external factors such as weather and holidays on train operation. The results show that MATGCN is better than TSTGCN.

Conflicts of Interest
The authors declare that they have no conflicts of interest.