Road Travel Time Prediction Based on Improved Graph Convolutional Network

Travel time prediction plays an increasingly important part in advanced traveler information systems (ATIS) and is of great significance for alleviating urban traffic congestion. Although graph convolutional networks have been widely used in road network traffic prediction, spatiotemporal dynamic modeling of urban traffic remains a difficult task. In this study, we propose an improved graph convolutional network (IGC-Net) for travel time prediction. Specifically, we design a modified adjacency matrix by fusing a distance matrix and a correlation matrix with the original adjacency matrix to capture spatial dynamic features. We then establish three components based on temporal properties to capture recent, daily-periodic, and weekly-periodic correlations. Comparison experiments with baseline models and variants are conducted on a real-world dataset from Beijing. The results show that IGC-Net outperforms the baseline models across different prediction horizons and is more robust for dynamic traffic prediction.


Introduction
In recent years, urban traffic congestion has become increasingly serious. The duration of commuting directly determines people's travel experience; it is not only an important index of the efficiency of urban operation but also an important factor affecting residents' quality of life. Road travel time directly reflects the congestion state of a road segment and is an important basis for the development of intelligent transportation systems (ITS) [1,2]. Travel time prediction is therefore of great significance and application value to both traffic users and traffic managers.
The study of traffic forecasting has been developing over the past few decades. Initially, researchers used statistical methods, including historical average (HA) and autoregressive integrated moving average (ARIMA) [3,4], to predict temporal traffic parameters. However, simple statistical models have difficulty capturing the nonlinear relationships in traffic data. Thanks to progress in computer technology and the diversification of data acquisition systems, neural network modeling and multisource information fusion are now widely used in traffic prediction research and have greatly improved prediction accuracy. Wu et al. [5] used the support vector regression (SVR) model for traffic prediction. Tian and Pan [6] utilized a long short-term memory (LSTM) network to predict traffic flow, which demonstrated the effectiveness of recurrent neural networks (RNNs) for forecasting time-series data. However, traditional machine learning models still have deficiencies in capturing periodic features and selecting model parameters. Some researchers therefore proposed hybrid models to predict traffic parameters [7][8][9]. For example, Li et al. [10] established a model based on ensemble empirical mode decomposition and a random vector functional link network for travel time prediction on a highway network. They also proposed another model based on a deep belief network optimized by the multiobjective particle swarm algorithm [11].
Moreover, traffic prediction research has been extended to the spatial scope. Researchers first analyzed the characteristics of spatiotemporal data. For instance, Zhang et al. [12] divided the city into a grid to explore the spatial distribution and correlation of cellular traffic and analyzed the temporal dynamics between different cells using the autocorrelation coefficient. Based on the analysis of temporal and spatial characteristics, the convolutional neural network (CNN) was widely applied [13,14] to capture traffic features as images. However, because real traffic networks are complex, standard convolution on a Euclidean grid is no longer suitable for general graphs.
In recent years, two approaches have been explored to generalize convolutional neural networks to structured data. One extends the definition of convolution in the spatial domain [15], and the other applies the graph Fourier transform in the spectral domain [16]. The former directly defines the convolution operation on the connections of each node, which is closer to the traditional convolutional neural network. The latter realizes the convolution operation on the topological graph with the help of spectral graph theory. The graph neural network (GNN) [17] has been widely used in traffic prediction. In addition, deep learning techniques can automatically extract features from multisource data [18] and model more complex spatial and temporal traffic patterns in various traffic scenarios. The sequence-to-sequence (Seq2Seq) model with an encoder-decoder structure [19,20], combined with the graph convolutional network (GCN), has been widely used to construct spatiotemporal prediction models. For instance, Yu et al. [21] combined spectral graph convolution with a gated recurrent neural network to obtain spatiotemporal features. Guo et al. [22] proposed an attention-based spatial-temporal graph convolutional network (ASTGCN) for traffic flow forecasting. Nevertheless, because of the complexity and dynamics of actual traffic networks, the traditional adjacency matrix cannot effectively capture time-varying spatial characteristics.
To solve the above problems, this paper proposes an improved graph convolutional network to improve the accuracy of travel time prediction. The primary contributions of this paper are as follows:
(i) We propose a modified adjacency matrix that better captures spatial features by integrating dynamic weight information between road segments. The main idea is to construct a distance weight matrix and a correlation weight matrix based on geographical location attributes and dynamic traffic information, respectively.
(ii) According to the temporal properties, we establish recent, daily, and weekly components to model the temporal dependencies. Furthermore, we use the same improved graph convolutional network in all three components to capture spatiotemporal characteristics.
(iii) Using real-world datasets, we conduct baseline comparison and ablation experiments to evaluate the performance of our model. The prediction results demonstrate the superior performance of the proposed model.
The remainder of this paper is organized as follows. In Section 2, we present the basic concepts and problem formulation and describe the concrete modeling process. In Section 3, we introduce the experimental environment and settings. The experimental results are analyzed in detail in Section 4. Finally, Section 5 summarizes the study and discusses prospects for future work.

Preliminaries.
In this section, we first describe the notations of variables and formalize the traffic prediction problem.

Traffic Network Topology.
The road network is defined as a graph $G = (V, E, A)$, where $V$ is the set of $N$ vertices (i.e., road segments) $v_i \in V$ and $E$ is the set of edges between vertices, $(v_i, v_j) \in E$. The adjacency matrix $A$, which reflects the connectivity between road segments, is written as $A = (a_{ij})_{N \times N} \in \mathbb{R}^{N \times N}$, where $a_{ij} = 1$ if nodes $v_i$ and $v_j$ are accessible and $a_{ij} = 0$ otherwise.

Travel Time.
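The travel time of each road segment is normalized to travel time per unit length (s/m) to account for unequal segment lengths. The travel time of the whole traffic network $G$ at the $t$th time slot is defined as $X_t = (x_t^1, x_t^2, \ldots, x_t^N)^T \in \mathbb{R}^N$, where $x_t^i$ is the travel time of road segment $i$, and $\chi = (X_{t-p+1}, \ldots, X_{t-1}, X_t)^T$ denotes all node values over the $p$ time steps of input.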

Historical Average Travel Time.
Historical traffic information reflects the trend of daily traffic conditions. The historical average travel time of road segment $i$ at the $t$th time slot is denoted by $x_{t,h}^i$.
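As a rough illustration of this quantity, the sketch below averages each time-of-day slot over all available days. The text does not say which days enter the average, so averaging over every complete day is an assumption, as are the function name `historical_average` and the array layout; the value of 288 slots per day is the sampling frequency the paper states for its data.

```python
import numpy as np

SLOTS_PER_DAY = 288  # 5-minute sampling, the frequency stated in the paper


def historical_average(travel_time: np.ndarray) -> np.ndarray:
    """Average each time-of-day slot over all complete days.

    travel_time: array of shape (num_days * SLOTS_PER_DAY, N), one row per
                 time slot and one column per road segment.
    Returns an array of shape (SLOTS_PER_DAY, N); row t, column i holds the
    historical average of segment i at slot t of the day.
    Assumption: every complete day contributes equally to the average.
    """
    num_slots, num_segments = travel_time.shape
    if num_slots % SLOTS_PER_DAY != 0:
        raise ValueError("expected a whole number of days")
    per_day = travel_time.reshape(-1, SLOTS_PER_DAY, num_segments)
    return per_day.mean(axis=0)
```

Restricting the average to weekdays, or to a trailing window of days, would only change which rows are passed in.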

Problem Definition.
The task of travel time forecasting is to use past traffic observations to predict the future value of each road segment over a certain period. Given the traffic network graph $G$, the prediction problem is formulated as
$$Y_{t+q} = \arg\max_{Y} \Pr\left(Y \mid X_t, X_{t-1}, \ldots, X_{t-p+1}; G\right), \tag{1}$$
where $Y_{t+q}$ denotes the predicted value of the $q$th time step, $X_t, X_{t-1}, \ldots, X_{t-p+1}$ represent the historical values of the previous $p$ time steps, and $\Pr(\cdot \mid \cdot)$ is the conditional probability function.
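To make the roles of $p$ and $q$ concrete, the following sketch slices a travel-time series into input/target pairs. It is only an illustration of the problem setup, not the authors' data pipeline; the function name `make_samples` and the array layout are assumptions.

```python
import numpy as np


def make_samples(series: np.ndarray, p: int, q: int):
    """Build (input, target) pairs from a travel-time series.

    series: array of shape (T, N); row t holds X_t for all N road segments.
    Each input stacks X_{t-p+1}, ..., X_t (shape (p, N)); the matching
    target is X_{t+q} (shape (N,)).
    """
    inputs, targets = [], []
    # t is the index of the most recent observation in each input window.
    for t in range(p - 1, series.shape[0] - q):
        inputs.append(series[t - p + 1 : t + 1])
        targets.append(series[t + q])
    return np.stack(inputs), np.stack(targets)
```

With 5-minute sampling, `p = 12` and `q = 6` correspond to using the previous 60 minutes to predict 30 minutes ahead, the setting used in the experiments.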

Improved Graph Convolutional Network.
In this study, the urban traffic network is regarded as a graph structure, and the features of each node are signals on the graph [23]. In spectral convolution, the graph is represented by its corresponding Laplacian matrix.

Spectral Graph Convolution.
Because it is difficult to express a meaningful translation operator in the node domain [17], the spectral convolution on graphs is defined as the multiplication of a signal $X \in \mathbb{R}^N$ (a scalar for every node at a time slot) with a filter $g_\theta = \mathrm{diag}(\theta)$ parameterized by $\theta \in \mathbb{R}^N$ in the Fourier domain [24]. According to the convolution theorem and the graph Fourier transform, the spectral graph convolution is defined as
$$g_\theta *_G X = g_\theta(U \Lambda U^T) X = U g_\theta(\Lambda) U^T X, \tag{2}$$
where $*_G$ represents the graph convolution operation, $U \in \mathbb{R}^{N \times N}$ is the Fourier basis composed of the eigenvectors of the graph Laplacian, and $\Lambda \in \mathbb{R}^{N \times N}$ is the diagonal matrix of its eigenvalues. However, when the graph is large, the eigenvalue decomposition of the Laplacian matrix required by equation (2) is computationally expensive. Hammond et al. [25] used Chebyshev polynomials to circumvent this problem. Furthermore, a layerwise linear model can be built by stacking multiple localized graph convolution layers with a first-order approximation of the graph Laplacian [26].
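The spectral definition can be checked numerically on a small graph. The sketch below is a minimal illustration that assumes a symmetric adjacency matrix and the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$; the function names are hypothetical. The explicit eigendecomposition is the step whose cost motivates the polynomial approximations mentioned above.

```python
import numpy as np


def normalized_laplacian(adj) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = degree[nonzero] ** -0.5
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def spectral_conv(adj, signal, theta) -> np.ndarray:
    """Filter a graph signal in the Fourier domain: U diag(theta) U^T x."""
    lap = normalized_laplacian(adj)
    # eigh returns eigenvalues in ascending order; the columns of U are the
    # matching eigenvectors, i.e. the graph Fourier basis.
    _, U = np.linalg.eigh(lap)
    signal = np.asarray(signal, dtype=float)
    theta = np.asarray(theta, dtype=float)
    return U @ (theta * (U.T @ signal))
```

Setting `theta` to all ones leaves the signal unchanged, and setting it to the Laplacian eigenvalues reproduces `L @ x`, which is a quick sanity check that filtering in the spectral domain matches the corresponding operator in the node domain.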

Modified Adjacency Matrix.
The spatial dependence in the actual traffic network is complex and is affected by both geospatial distance and dynamic traffic flow. We therefore construct a distance matrix and a correlation matrix to capture spatial attention.
Distance matrix $W^d$: we first calculate the shortest distance $d_{ij}$ between nodes $v_i$ and $v_j$ with the Dijkstra algorithm and then use the reciprocal of the distance as the weight between the two vertices, with a parameter $\delta$ set to 1000 to control the sparsity of the matrix.
Correlation matrix $W_t^r$: because of daily commuting patterns, urban traffic has significant periodic characteristics. Therefore, we apply the Pearson correlation coefficient to the historical average values $x_{t,h}^i$ to obtain the dynamic correlation between road segments:
$$W_t^r = \begin{pmatrix} 1 & r_{t,12} & \cdots & r_{t,1N} \\ r_{t,21} & 1 & \cdots & r_{t,2N} \\ \vdots & \vdots & \ddots & \vdots \\ r_{t,N1} & r_{t,N2} & \cdots & 1 \end{pmatrix},$$
where a correlation coefficient $r_{t,ij} < 0.2$ means that the two vertices are almost uncorrelated.
Based on the above, the modified adjacency matrix $A'$ is obtained by fusing the original adjacency matrix $A$ with $W^d$ and $W_t^r$ using the Hadamard product operator $\odot$.
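The construction can be sketched as follows. Several details are assumptions made only for illustration, because the text here does not spell them out: reciprocal distances are zeroed beyond $\delta$ and on the diagonal, correlations below 0.2 are set to zero, and the three matrices are fused as a plain element-wise product of $A$, $W^d$, and $W^r$. How the window of historical averages for a given time slot is chosen is also not stated, so the function simply uses whichever rows it is given. All function and parameter names are hypothetical.

```python
import heapq
import numpy as np


def shortest_distances(lengths: np.ndarray) -> np.ndarray:
    """All-pairs shortest path lengths, running Dijkstra from every node.

    lengths[i, j] > 0 is the length of the direct link i -> j; 0 means no link.
    """
    n = lengths.shape[0]
    dist = np.full((n, n), np.inf)
    for src in range(n):
        dist[src, src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[src, u]:
                continue  # stale queue entry
            for v in np.nonzero(lengths[u])[0]:
                nd = d + lengths[u, v]
                if nd < dist[src, v]:
                    dist[src, v] = nd
                    heapq.heappush(heap, (nd, int(v)))
    return dist


def modified_adjacency(adj, lengths, hist_avg, delta=1000.0, r_min=0.2):
    """Fuse A with distance and correlation weights (assumed form).

    hist_avg: array of shape (T, N) with historical average travel times.
    """
    dist = shortest_distances(lengths)
    # Assumption: reciprocal distance, zeroed beyond delta and on the diagonal.
    w_d = np.zeros_like(dist)
    keep = (dist > 0) & (dist <= delta)
    w_d[keep] = 1.0 / dist[keep]
    # Pearson correlation between the historical-average series of segments.
    w_r = np.corrcoef(hist_avg.T)
    # Assumption: correlations below r_min are treated as zero.
    w_r = np.where(w_r < r_min, 0.0, w_r)
    # Assumption: element-wise (Hadamard) fusion of all three matrices.
    return adj * w_d * w_r
```

Under these assumptions, a pair of segments keeps a nonzero weight only if the segments are connected in $A$, lie within $\delta$ of each other, and have sufficiently correlated historical travel times.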

Improved Graph Convolution (IGC).
With the modified adjacency matrix $A'$, the propagation rule between graph convolutional layers is
$$H^{(l+1)} = f\left(\tilde{D}'^{-1/2} \tilde{A}' \tilde{D}'^{-1/2} H^{(l)} \Theta^{(l)}\right),$$
where $\tilde{A}' = A' + I_N$ is the modified adjacency matrix with added self-connections and $\tilde{D}'_{ii} = \sum_j \tilde{A}'_{ij}$. $H^{(l)}$, $H^{(l+1)}$, and $\Theta^{(l)}$ are the input, the output, and the trainable matrix of the $l$th layer, respectively, and $f(\cdot)$ denotes a nonlinear activation function, e.g., ReLU. To capture a larger range of spatial correlations, we apply residual learning to our model, which has been shown to achieve better results in deep network training [21]; in a residual unit, a shortcut connection carries the input of the unit to its output.

Framework of the Prediction Model.
Figure 2 presents an overview of the proposed prediction model, which is mainly composed of four components to model temporal dependencies and spatial correlations. IGC_1, IGC_2, and IGC_3 share the same network structure and capture recent, daily-periodic, and weekly-periodic correlations, respectively. According to the definitions in Section 2.1, the sampling frequency of historical observations is 288 times per day, the current time slot is $t$, and the prediction horizon is $q$. The inputs of the three components are segments of the historical series whose lengths are denoted by $l_r$, $l_d$, and $l_w$, respectively:
(i) Recent: $\chi_r = (X_{t-l_r+1}, X_{t-l_r+2}, \ldots, X_t)^T$
(ii) Daily: a daily-periodic segment of length $l_d$
(iii) Weekly: a weekly-periodic segment of length $l_w$
Then, the result $Y_{IGC}$ is obtained from the outputs of the three components by means of parametric-matrix-based fusion, in which $W_1$, $W_2$, and $W_3$ denote trainable matrices.
In the rightmost component of Figure 2, the metadata are transformed into binary vectors by one-hot encoding, and a fully connected neural network is used to process the binary features. Finally, by integrating $Y_{IGC}$ with the output $Y_{meta}$, we obtain the predicted value by means of the tanh function. The model is trained by minimizing the mean squared error between the predicted values and the observations, with $\delta$ denoting all learnable parameters of the model.
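The layer propagation and the fusion step described above might be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the layer uses the renormalized first-order rule of [26] applied to the modified adjacency matrix, the residual unit is taken to be a plain identity shortcut, and the fusion is taken to be the usual element-wise weighting of the three component outputs. The names `normalize_adjacency`, `igc_layer`, and `fuse` are hypothetical.

```python
import numpy as np


def normalize_adjacency(a_mod: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A' + I) D^{-1/2}, with D the degree matrix of A' + I."""
    a_tilde = a_mod + np.eye(a_mod.shape[0])  # add self-connections
    deg = a_tilde.sum(axis=1)                 # >= 1 for non-negative weights
    inv_sqrt = deg ** -0.5
    return inv_sqrt[:, None] * a_tilde * inv_sqrt[None, :]


def igc_layer(a_hat, h, theta, residual=False):
    """One graph convolution step: ReLU(A_hat @ H @ Theta), optionally plus H.

    h: node features of shape (N, F); theta: weights of shape (F, F_out).
    Assumption: the residual unit is an identity shortcut, which requires
    F_out == F so that the shapes match.
    """
    out = np.maximum(a_hat @ h @ theta, 0.0)
    return h + out if residual else out


def fuse(y_r, y_d, y_w, w1, w2, w3):
    """Parametric-matrix-based fusion, assumed to be element-wise weighting."""
    return w1 * y_r + w2 * y_d + w3 * y_w
```

In a trained model `theta`, `w1`, `w2`, and `w3` would be learned parameters; here they are ordinary arrays so that the arithmetic can be inspected directly.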

Data.
The data used in our experiments include road network geographic information (as shown in Figure 3) and road segment travel time data from October 1 to December 31, 2019, in Beijing, for a total of 13 weeks. We divide the data of the first ten weeks chronologically into nonoverlapping training and verification sets and take the data of the last two weeks as the test set.

Experimental Settings.
The modeling process is carried out in Python under Anaconda3. The experiments are performed on a server with an Intel Core i9-9900KF 3.60 GHz CPU and 32 GB of RAM. A GPU with 8 GB of memory is used to accelerate model training.
In the experiments, the model is trained with the Adam optimizer [27], with the learning rate set to 0.001 and the batch size set to 64. The mean squared error (MSE) is taken as the objective function, and early stopping is applied on the verification set to avoid overfitting. For our proposed model, we use the data of the previous 60 minutes to predict single-step (5 min) and multistep (15 min and 30 min) traffic in the future. Taking 30 min as an example, the prediction horizon is $q = 6$, and the lengths of the three parts are set as $l_r = 12$, $l_d = 18$, and $l_w = 12$. Furthermore, the prediction results of the models are evaluated by the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE).

Comparison Algorithms and
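The three evaluation metrics are calculated as
$$\mathrm{MAE} = \frac{1}{m} \sum_{i=1}^{m} \left| X_i - Y_i \right|,$$
$$\mathrm{MAPE} = \frac{100\%}{m} \sum_{i=1}^{m} \left| \frac{X_i - Y_i}{X_i} \right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( X_i - Y_i \right)^2},$$
where $m$ is the number of test samples and $X_i$ and $Y_i$ are the $i$th real and predicted values, respectively.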
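For reference, the three error measures can be computed as in the sketch below, which uses their standard definitions with MAPE expressed in percent. The function names are illustrative, and the code assumes that no real value is zero, which holds for travel times.

```python
import numpy as np


def mae(actual, predicted) -> float:
    """Mean absolute error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))


def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent (actual must be nonzero)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(100.0 * np.mean(np.abs((actual - predicted) / actual)))


def rmse(actual, predicted) -> float:
    """Root mean squared error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))
```

MAE and RMSE are in the same unit as the travel time itself, whereas MAPE is scale-free, which makes it easier to compare errors across road segments with different typical travel times.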

Model Comparisons.
In this section, we use the data of the previous 60 minutes to predict the travel time of the next 15 minutes and 30 minutes; i.e., the input step is set to 12, and the output step is set to 3 and 6, respectively. Table 1 shows the prediction performance of the different algorithms. Our proposed model outperforms the baseline models at both prediction horizons. The prediction accuracy of the statistical model tends to be worse because it cannot model nonlinear and complex traffic. In addition, LSTM models only the temporal dependence without considering the spatial correlation, so its prediction performance is also considerably lower.

Mobile Information Systems
The graph convolutional network-based baselines all model both spatial and temporal characteristics, yet our IGC-Net performs best among them. The reasons are as follows. On the one hand, we consider the daily-periodic and weekly-periodic correlations of the time-series data; on the other hand, we modify the adjacency matrix of the graph convolution.
As can be seen from Figures 4 and 5, the deep learning models have better prediction performance than the traditional neural network model but require longer computation time. DCRNN has the lowest running efficiency because of the time-consuming sequential computation in its recurrent neural network. STGCN has relatively poor prediction performance despite its short running time. Although it integrates spatial dynamic features, IGC-Net has a running time only slightly higher than that of Graph WaveNet. As the prediction horizon increases, the advantage of our model grows.
In addition, IGC-Net_E, which removes the exogenous variables, reduces prediction performance by 1.13%. The prediction performance of IGC-Net_T decreases more significantly because it does not consider the temporal dependencies. However, IGC-Net_T still outperforms LSTM, which shows the importance of considering spatial dynamic correlation in traffic prediction.

Effect of Input Sequence Length and Prediction Horizon.
For a time-series model, data acquisition and prediction requirements directly affect the final prediction performance. Therefore, we conduct a sensitivity analysis on the input sequence length and prediction horizon of the proposed model.
As shown in Figure 6, for each input sequence length, the prediction performance decreases as the prediction horizon increases. Intuitively, a longer prediction horizon requires a longer input sequence to capture temporal correlation information. However, the results are more complex when we increase the input sequence length in the two datasets. An input of 3 steps cannot meet the requirement of long-term prediction, especially for the 30 min horizon; however, when the input is increased to 12 steps, the prediction result also becomes worse. The reason is that too much historical information causes data redundancy, which weakens the temporal correlation and reduces prediction accuracy. Therefore, appropriately increasing the input sequence length helps the model learn more relevant historical information, but too much or too little input reduces model performance.

Effect of Spatial Dynamic Modeling.
To verify the performance of the improved graph convolutional network, we carry out comparative experiments with a variant, GC-Net, which uses only the original adjacency matrix for spatial modeling and does not consider the spatial dynamic correlation. Table 2 shows the comparison results of the two models for forecasting 5, 15, and 30 minutes ahead, and Figure 7 presents the same comparison more intuitively. IGC-Net achieves better performance at all prediction horizons, which demonstrates the effectiveness of the modified adjacency matrix in spatial dynamic modeling.
Further, to analyze the prediction performance of different road segments more intuitively, the results on December 20-21 are taken as an example. As shown in Figure 8, the GC-Net has a poorer ability to capture the dynamic change of travel time, especially in peak periods.

Conclusions
In this study, we propose an improved graph convolutional network (IGC-Net) for travel time prediction. We construct a distance weight matrix and a correlation weight matrix to modify the adjacency matrix of the traditional GCN. Furthermore, we establish recent, daily, and weekly components to model the temporal dependencies according to the temporal properties. The proposed model not only captures static spatiotemporal characteristics but also models spatial dynamic correlation. Comparison experiments are carried out using Beijing road network traffic data. The results show that the proposed model outperforms the baseline models and that the modified adjacency matrix significantly improves prediction accuracy.
For future work, we will investigate the applicability of our model to other urban traffic forecasting tasks and further explore the method of dynamic spatial modeling in graph convolutional network.

Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.