Traffic forecasting is an important prerequisite for applying intelligent transportation systems in urban traffic networks. Existing works adopt RNNs and CNNs/GCNs, among which GCRN is the state of the art, to characterize the temporal and spatial correlations of traffic flows. However, GCRN is hard to apply to large-scale road networks because of its high computational complexity. To address this problem, we abstract the road network into a geometric graph and build a Fast Graph Convolution Recurrent Neural Network (FastGCRNN) to model the spatial-temporal dependencies of traffic flow. Specifically, we use FastGCN units to efficiently capture the topological relationship between each road and its surrounding roads in the graph, reducing computational complexity through importance sampling; combine GRU units to capture the temporal dependency of traffic flow; and embed the spatiotemporal features into a Seq2Seq model based on the Encoder-Decoder framework. Experiments on large-scale traffic data sets show that the proposed method greatly reduces computational complexity and memory consumption while maintaining relatively high accuracy.

Traffic forecasting using timely information provided by Internet of Things technology (IoT) is an important prerequisite for the application of intelligent transportation system (ITS) [

This paper studies urban traffic forecasting based on Internet of Things (IoT) technology in large urban road traffic networks, that is, how to use historical traffic flow data to predict the traffic flow at future timestamps. In the literature, there have been plenty of studies on traffic forecasting, covering traffic volume, taxi pick-ups, and traffic in/out flow volume. Initially, numerous statistics-based methods, such as Historical Average (HA) [

In order to solve the above problem, we propose forming the road network into a geometric graph and constructing a spatiotemporal graph convolution network based on the abstract graph to capture the spatiotemporal features of traffic flow for prediction. We propose using GCN as the spatial topology extractor of the model and applying the sampling method [

Urban traffic flow prediction is based on historical traffic flow sequences, which are highly time-varying, nonlinear, and uncertain. The traffic flow in the road network usually has the following temporal characteristics [

At any given time, traffic flow also has spatial characteristics, such as the impact of upstream and downstream traffic on the current road and the speed and flow limits shared by roads of the same level.

In view of these two main influence factors, especially considering the large scale of the road network [

It uses a recurrent neural network to capture the long-term temporal dependency of traffic flow and a graph convolutional network (GCN) to capture the spatial correlation among roads in different geographical locations. At the same time, importance sampling is applied to the GCN to reduce the computational complexity on large road networks.

Given an undirected graph

As a semisupervised model, GCN can learn the hidden representation of each node. The hidden vectors of all nodes in layer

The traffic flow of a road is affected by the traffic flow of the surrounding roads and by the historical traffic flow of the road itself, so the prediction model should consider both factors. To model the temporal dependency on historical traffic, a GRU unit is embedded in the Seq2Seq model based on the Encoder-Decoder framework to complete sequence prediction. To model the spatial correlation among neighboring roads, FastGCN is applied to the traffic graph of the road network, which reduces computational complexity and improves efficiency. Integrating these components yields a model that quickly extracts spatiotemporal features, which we call FastGCRNN (Fast Graph Convolution Recurrent Neural Network). The overall architecture of the model is shown in Figure

FastGCRNN model.

This model mainly includes six parts, namely:

The whole FastGCRNN model adopts the Seq2Seq model based on the Encoder-Decoder framework, which can use traffic flow of each road within the road network to predict the future traffic flow. Firstly, the continuous traffic flow data

Each road in the urban road network does not exist in isolation but connects with the surrounding roads to form a whole. The traffic flow between roads is interactive; particularly, on the two-way road, there are vehicles flowing in and out. To model spatial correlation of traffic flows among road networks, we abstract the roads in road networks as nodes and their intersections as edges, as shown in Figure

Construction process of road network graph.
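The graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input format (a hypothetical mapping from each road to the set of intersections it touches), the toy network, and the function names are assumptions. The normalization is the standard symmetric GCN form with self-loops.

```python
import numpy as np

def road_graph_adjacency(road_to_intersections):
    """Roads become nodes; two roads are linked when they share an intersection."""
    roads = sorted(road_to_intersections)
    index = {road: i for i, road in enumerate(roads)}
    adj = np.zeros((len(roads), len(roads)))
    for a in roads:
        for b in roads:
            # A non-empty set intersection means the two roads meet somewhere.
            if a != b and road_to_intersections[a] & road_to_intersections[b]:
                adj[index[a], index[b]] = 1.0
    return roads, adj

def normalize_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Hypothetical toy network: road name -> ids of the intersections it touches.
toy = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {4, 1}, "E": {2, 4}}
roads, adj = road_graph_adjacency(toy)
a_hat = normalize_adjacency(adj)
```

The pairwise loop is quadratic in the number of roads, which is acceptable for a one-off preprocessing step on a toy example; a real road network would more likely index roads by intersection first.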

To account for multihop influence, GCN layers are stacked so that information is exchanged among roads several hops upstream and downstream. However, the recursive neighborhood expansion across layers poses time and memory challenges when training on large, dense graphs. To solve this problem, we use the FastGCN method, which interprets graph convolution as an integral transform of embedding functions under a probability measure. The integral can then be estimated consistently by the Monte Carlo method, and the nodes in the graph can be trained in batches. Because training is batched, the graph structure is not fixed: at test time, the number of nodes and their connections may differ from those of the training graph. This improves the generalization ability and scalability of the model to a certain extent.

In FastGCN, the nodes of the graph are regarded as independent and identically distributed samples from some probability distribution, and the loss and the convolution results are written as integrals of the embedding functions of the nodes. These integrals are estimated by Monte Carlo approximation, which defines a sampled loss and a sampled gradient. To reduce the variance of the estimate, the sampling distribution can be changed. The simplest choice is to sample the convolution uniformly; the improved method uses importance sampling, which brings the sampling distribution closer to the real distribution and reduces the error caused by sampling.
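The Monte Carlo view of graph convolution can be illustrated with a short NumPy sketch. It assumes the importance distribution proposed for FastGCN, in which node u is drawn with probability proportional to the squared norm of column u of the normalized adjacency matrix. The random graph, the sizes, and the function names are hypothetical and are not taken from the experiments in this paper.

```python
import numpy as np

def importance_distribution(a_hat):
    """q(u) proportional to the squared norm of column u of a_hat."""
    col_norm_sq = (a_hat ** 2).sum(axis=0)
    return col_norm_sq / col_norm_sq.sum()

def sampled_propagation(a_hat, h, q, t, rng):
    """Unbiased Monte Carlo estimate of a_hat @ h from t nodes drawn from q."""
    idx = rng.choice(a_hat.shape[0], size=t, replace=True, p=q)
    scale = 1.0 / (t * q[idx])          # importance weights 1 / (t * q(u))
    return (a_hat[:, idx] * scale) @ h[idx]

def fastgcn_layer(a_hat, h, w, q, t, rng):
    """One sampled graph convolution layer with a ReLU activation."""
    return np.maximum(sampled_propagation(a_hat, h, q, t, rng) @ w, 0.0)

# Hypothetical random graph with self-loops, symmetrically normalized.
rng = np.random.default_rng(0)
n = 40
upper = np.triu(rng.random((n, n)) < 0.1, k=1)
adj = (upper | upper.T).astype(float) + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
a_hat = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

h = rng.random((n, 4))                  # node features
w = rng.normal(0.0, 0.1, (4, 8))        # layer weights
q = importance_distribution(a_hat)

# Averaging many sampled estimates should approach the exact product.
exact = a_hat @ h
estimate = np.mean(
    [sampled_propagation(a_hat, h, q, 10, rng) for _ in range(2000)], axis=0
)
rel_err = np.linalg.norm(estimate - exact) / np.linalg.norm(exact)

out = fastgcn_layer(a_hat, h, w, q, 10, rng)
```

Each sampled layer touches only t columns of the adjacency matrix instead of all n, which is where the savings in time and memory come from. The price is a noisier estimate, which the importance weights keep unbiased before the activation is applied.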

If a node

GCN in the form of integration is integrated by Monte Carlo method, and then it is transformed into the discrete form of sampling. At layer

If each layer of convolution uses this method for sampling and information transfer, after layer

In the above integral form of GCN, the embedded information expression of node

Here is an example to illustrate the advantages of FastGCN compared with GCN, if the abstract road network graph has 5 nodes and 6 edges, as shown in Figures

The process of GCN performing a convolution operation. (a) Convolution process of node A. (b) Convolution process of node B. (c) Convolution process of node E.

Convolution operation process in a batch of FastGCN under sampling distribution. (a) Sampling convolution operation of node A. (b) Sampling convolution operation of node B. (c) Sampling convolution operation of node E.

In GCN, each epoch must load the complete graph rather than only a few of its nodes; that is, each node in the graph needs to convolve and exchange information with all other nodes in the graph. In FastGCN, we split the large graph into several small graphs by batching and put them into memory, and we use sampling to drop the information exchange with some low-correlation nodes. Each node interacts only with the sampled nodes in the graph. As shown in Figure

For the sampling method, in order to make the sampling closer to the real connected nodes, FastGCN does not use uniform sampling [

In the experiment, only two FastGCN units were used to extract spatial features. This is because we need to avoid the problem of oversmoothing [

This is a key issue to effectively capture the long-term temporal dependence of traffic flow. The observed value of each timestamp is shown in Figure

Traffic flow data with graph structure at different timestamps.

LSTM and GRU are commonly used in time series prediction. Both models use gating mechanisms to retain as much long-term information as possible and are similarly effective across tasks. For efficiency, we chose GRU, which has a simpler structure, fewer parameters, and faster training. A GRU unit has an update gate, a reset gate, and a memory unit; the gates filter which historical information is kept, so the unit can retain long-term memory. The memory unit stores the sequence information, capturing both long- and short-term dependencies and improving prediction accuracy.
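The gating described above can be made concrete with a minimal NumPy sketch of one GRU step. It uses the standard GRU equations; the dimensions, the initialization, and the parameter names are hypothetical, and the paper does not state which of the two common update-gate conventions its implementation follows.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU update for a batch of rows (here: one row per road)."""
    z = sigmoid(x @ p["Wz"] + h_prev @ p["Uz"] + p["bz"])   # update gate
    r = sigmoid(x @ p["Wr"] + h_prev @ p["Ur"] + p["br"])   # reset gate
    # Candidate state: the reset gate decides how much history is used.
    h_cand = np.tanh(x @ p["Wh"] + (r * h_prev) @ p["Uh"] + p["bh"])
    # The update gate blends the old state with the candidate.
    return (1.0 - z) * h_prev + z * h_cand

def init_params(input_dim, hidden_dim, rng):
    """Small random weights for the three blocks z, r, h (illustrative only)."""
    p = {}
    for gate in "zrh":
        p["W" + gate] = rng.normal(0.0, 0.1, (input_dim, hidden_dim))
        p["U" + gate] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        p["b" + gate] = np.zeros(hidden_dim)
    return p

rng = np.random.default_rng(0)
params = init_params(input_dim=3, hidden_dim=8, rng=rng)
sequence = rng.random((12, 5, 3))   # 12 timestamps, 5 roads, 3 features
h = np.zeros((5, 8))
for x_t in sequence:                # unroll over time, as an encoder would
    h = gru_step(x_t, h, params)
```

After the loop, `h` plays the role of the hidden state vector that an encoder would hand to the decoder. A GRU needs three weight blocks where an LSTM needs four, which is the source of the smaller parameter count mentioned above.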

In order to complete the sequence prediction, the Seq2Seq model based on the Encoder-Decoder structure is used. Seq2Seq puts the input history sequence into GRU, extracts the timing features, and obtains the hidden state vector

In order to illustrate the role of the model in the large graph, 1865 roads in Luohu District of Shenzhen city are selected for the experiment, and the specific roads and areas are shown in Figure

Part of the road network map of Luohu District, Shenzhen.

To calculate the traffic flows in each road, we map the GPS coordinates to the corresponding roads through the Frechet method [

Shenzhen taxi GPS record information example.

Road_id | Car_id | Time |
---|---|---|
92230 | 02341 | 2015-01-01 00:03:46 |
92230 | 03982 | 2015-01-02 06:23:12 |
… | … | … |

In data preprocessing, the taxi data in Shenzhen is transformed into the form of continuous timestamps on the road network, i.e., the traffic data shown in Figure

Remove duplicates

Count
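The two preprocessing steps can be sketched with the standard library. The rule used here (a car counts at most once per road and time slot) and the timestamp format are assumptions made for illustration; the records follow the Road_id, Car_id, Time layout of the GPS table, and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import datetime

def count_flow(records, interval_minutes=5):
    """Return {(road_id, slot_start): number of distinct cars}.

    interval_minutes is assumed to divide 60 (for example 5 or 30).
    """
    seen = set()
    for road_id, car_id, time_str in records:
        ts = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
        # Floor the timestamp to the start of its time slot.
        slot = ts.replace(minute=ts.minute - ts.minute % interval_minutes, second=0)
        # Remove duplicates: the set keeps one entry per (road, car, slot).
        seen.add((road_id, car_id, slot))
    # Count: number of distinct cars per (road, slot).
    return Counter((road_id, slot) for road_id, _, slot in seen)

# Hypothetical sample rows.
records = [
    ("92230", "02341", "2015-01-01 00:03:46"),
    ("92230", "02341", "2015-01-01 00:04:10"),  # same car, same slot
    ("92230", "03982", "2015-01-01 00:04:59"),
    ("92230", "02341", "2015-01-01 00:05:01"),  # falls into the next slot
]
flow_5min = count_flow(records, interval_minutes=5)
flow_30min = count_flow(records, interval_minutes=30)
```

Running the same function with 5-minute and 30-minute slots produces traffic flow series at the two granularities compared in the experiments.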

The biggest advantage of FastGCRNN model is that it can be applied to large graphs, and it can reduce the computational complexity without losing the accuracy of the model. On the road network data of Shenzhen, the experiment is conducted with the traffic flow series of different time intervals to compare with some classic traffic flow prediction models: (1) HA, (2) ARIMA, (3) SVR, (4) LSTM, (5) ConvLSTM, (6) GCRN [

Comparison of results between FastGCRNN model and other traffic flow prediction models.

Model | RMSE (5 min) | RMSE (30 min) |
---|---|---|
HA | 19.502 | 23.158 |
ARIMA | 17.541 | 19.097 |
SVR | 17.895 | 19.005 |
LSTM | 13.102 | 16.930 |
ConvLSTM | 19.481 | 21.038 |
GCRN | 11.892 | 16.265 |
GCRNN-nosample | 9.950 | |
FastGCRNN | 16.2734 | |

The table shows that the FastGCRNN model achieves the best prediction performance in terms of RMSE. Among the comparison models, HA, ARIMA, SVR, and LSTM consider only temporal correlation and ignore spatial correlation, which is one reason for their poorer accuracy. ConvLSTM divides the urban area into a grid, maps the traffic volume in each time period onto the grid, and treats the traffic volume as the pixel value of each cell. Although this method considers the spatial correlation of vehicle flow, it loses the topological structure of the road network graph.
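For reference, the RMSE metric used in the comparison can be written as a short function. This is the standard definition; the assumption is that the error is averaged over all roads and timestamps, which the paper does not spell out.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error over all entries of two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical flows for 2 roads over 3 timestamps.
observed = [[10, 12, 9], [4, 5, 7]]
predicted = [[11, 12, 8], [4, 6, 7]]
score = rmse(observed, predicted)
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones. It is expressed in the same units as the traffic flow itself.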

To verify that the proposed FastGCRNN reduces computational complexity, we compare it with the GCRN model, which also captures the topology information of the road network; the result is shown in Figure

Time consumption of training an epoch with different models.

In Figure

In FastGCRNN, the number of sampled nodes affects both the accuracy and the training time of the model. Using 1685 roads in Shenzhen, we set different sampling sizes and compared the resulting accuracy and training time. The experimental results are shown in Figure

RMSE and training time when using different sampling sizes in two layers of FastGCN.

From the experimental results, it can be seen that choosing different sampling sizes has little effect on accuracy, and it does not necessarily mean that the more the samples, the more the information obtained, and the better the prediction effect. For example, the accuracy of sampling 50 nodes for each layer in the figure is not the best, because there are “bridge” type (other nodes affecting the central node will spread to other unrelated distant areas) and “tree” type (other nodes affecting the central node will be limited to the small area to which the node belongs) of connection relationship between nodes [

Distribution of node degree of road network graph in Shenzhen.

We also compared the time consumption of FastGCN and standard GCN on graphs of different sizes. The experimental results are shown in Figure

Time consumption of FastGCRNN and GCRNN unsampled models at different graph sizes.

The experimental results show that FastGCRNN has a clear advantage on large graphs. In particular, once the graph reaches a certain size, FastGCRNN still runs normally, whereas the GCRNN-nosample model runs out of memory and cannot be trained.

This paper addresses the problem of large graphs with spatiotemporal properties by constructing the FastGCRNN model and applying it to road network traffic graphs. The model predicts traffic flow by extracting the temporal and spatial attributes of the traffic flow on large-scale road networks. FastGCN extracts the spatial topological structure while accelerating training and reducing complexity. GRU extracts time series features, and the Seq2Seq model based on the Encoder-Decoder framework handles sequence prediction tasks with input and output sequences of unequal length. The most prominent advantage of the model is the embedded FastGCN, which uses sampling to accelerate the extraction of spatial features, reduce computational complexity, and improve efficiency. Moreover, the model is not prone to memory overflow when processing large-scale graph-structured data.

It is worth mentioning that this model applies not only to traffic flow data but also to any graph-structured data with spatiotemporal characteristics, especially larger-scale data.

The data used to support the findings of this study are available upon request to Ya Zhang,

The authors declare that they have no conflicts of interest.