TaxiInt: Predicting the Taxi Flow at Urban Traffic Hotspots Using Graph Convolutional Networks and the Trajectory Data

Taxi flow is an important part of the urban intelligent transportation system. The accurate prediction of taxi flow provides an attractive way to find potential traffic hotspots in the city, which helps to avoid serious traffic congestion by taking effective measures in advance. The current prediction of taxi flow and its impact on urban transportation is closely tied to the passenger origin-destination (OD) information. However, high-quality OD information is not always available. To address this problem, a prediction model named TaxiInt is proposed in this study. Unlike other approaches based on density clustering, neural networks, or OD information, TaxiInt predicts the taxi flow using the trajectory data of taxis. The spatial and temporal features of each road were extracted using a graph convolutional network, which was trained with the road network information and the trajectory data. Experiments carried out on a real taxi dataset showed the validity of our model. It can predict the taxi flow at a given urban intersection with high accuracy.


Introduction
Taxi traffic is a comprehensive reflection of urban traffic. It provides information regarding not only the traffic situation but also the trend of crowd activities. The accurate prediction of taxi flow helps to find potential traffic hotspots in the city so that effective measures can be taken to avoid the coming traffic congestion. During the past few decades, research on the prediction of taxi flow has attracted extensive attention [1]. Balan designed a trip information system to predict the fare and trip duration of the taxi ride that passengers were planning to take. The authors claimed that the accuracy and the real-time performance were validated by large-scale evaluation [2]. Li conducted a similar work. The authors proposed a hybrid model coupling a deep learning model and quantile regression for travel time prediction [3]. Kong proposed a framework called TBI2Flow to predict the taxi passenger flow [4]. In addition, the prediction of taxi flow was believed to be beneficial for optimizing public transportation planning [5] and discovering unreasonable urban planning [6].
Currently, the taxi passenger's pick-up and set-down feature, called origin-destination (OD) information, is one of the most frequently used features in taxi flow prediction. Some researchers used machine learning models to analyse taxi data, such as density-clustering models (DBSCAN) [7], support vector machines (SVM) [8][9][10], and k-nearest neighbour (KNN) [11]. Some combined the basic models to create a novel one and then used it to analyse the data. Li [12] proposed a combined model to forecast the potential passenger demand in different regions based on Daubechies wavelet analysis and the least squares support vector machine (LS-SVM). Recently, convolutional neural networks (CNN) were used to discover the spatial characteristics of vehicle travel, and recurrent neural networks were also employed to learn the periodic and trending regularities of travel data. Yao [13] designed a deep multiview space-time network to predict the taxi demand. Liu [14] proposed a contextualized spatial-temporal network for the prediction of taxi demand with the spatial, temporal, and global correlation information fully considered. Xu [15] incorporated graph attention and recurrent architectures to forecast the demand for taxis in a city-wide area. These tentative studies have achieved the desired success. However, their prediction depends heavily on the passenger's OD information, which may be easily affected by many external factors, such as the failure of the recording instrument, an operating mistake by the driver, or an interruption of taxi calls made from mobile apps. What is more, the OD information is a digital state in most cases. Once the data are missing or disturbed, it is very difficult to reconstruct them.
Compared with the OD information, the trajectory data of a vehicle, which are obtained from a positioning system such as GPS, are continuously updated and therefore more reliable. If some of the trajectory data are lost, they can be reconstructed from the nearest valid track points. Owing to this advantage, a variety of trajectory-based models have been proposed for traffic flow prediction. Xu [16] proposed the WTFPredict methodology for short-term traffic flow forecasting based on taxi data and weather data. Zhang [17] used taxi data to predict short-term flow trends in urban areas in order to analyse urban crowd mobility. Li built a feature-level fusion model to fuse the representative features extracted from the temporal and spatial features of traffic data [18]. A deep learning approach was also proposed to extract the complex features of traffic flow and then forecast the short-term traffic flow with high accuracy and stability [19].
Recently, the graph neural network (GNN) has been increasingly used in traffic flow prediction. Compared with the common convolution, the convolution kernel of the GNN accommodates a flexible number of neighbour nodes, which makes the GNN more suitable for complex traffic applications. Lv introduced the GNN for the analysis of traffic network resilience [20]. Cui proposed the HGC-LSTM framework, which was based on the GNN and LSTM, to learn interactions between links in the traffic network [21].
In this work, a novel framework was proposed based on the GNN. Unlike the existing methods, which analysed the taxi flow of large-scale urban areas, the framework proposed in this study focused on the prediction of taxi flow at urban traffic hotspots, which may have a greater impact on the overall traffic condition. First, the trajectory data of taxis were converted into traffic flow data of core intersection nodes in the road network. Then, a GNN model and a time series network were created to capture the spatiotemporal information of intersection traffic flow. Finally, a model named TaxiInt was used to predict the taxi flow at intersections. TaxiInt consists of three separate components that simulate three characteristics of traffic flow, respectively [22], and each component consists of an attention mechanism, a graph convolution network, and a common convolution network. The method proposed in this work has the following advantages: (1) Fewer requirements on the taxi dataset. TaxiInt does not depend on the passenger OD information. It uses the trajectory data to predict the traffic flow at urban intersections. (2) Model reliability. Three characteristics of the traffic flow are extracted by three separate components of TaxiInt. The spatiotemporal information extracted in this way would be more reliable than that of existing baseline models.

TaxiInt Framework
In this section, we introduce our TaxiInt framework, which builds on the network presented in [22]. As shown in Figure 1, the overall framework of TaxiInt consists of three parts: the data sources, the time-based graph of road network traffic information changes, and the neural network structure. Part A displays the data sources used by TaxiInt, which include the taxi trajectory data, the urban road network data, the weather data, and the weekday label information. Part B displays the road traffic information of the selected area over a period of time in the form of a time axis. The redder side in the figure represents denser taxi traffic. The differently coloured patches represent different time points on the time axis. Part C is a schematic diagram that combines the timing diagram and the neural network structure diagram. The meanings of the colour blocks on the top time axis are consistent with the content of part B. The overall framework includes 3 subunits, each of which contains 2 ST blocks to capture the timing information of the traffic flow and the spatial information of the road network.

Preliminaries.
Here, we define a road information network as an undirected graph $G = (V, E, A)$, as shown in Figure 1 part B, where $V$ is the node set with $N = |V|$, $E$ is the edge set, which reflects the links between nodes, and $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of $G$. Next, we define the input data for each time slice, which were already converted from the trajectory data, as $X_t \in \mathbb{R}^{N \times F}$, where $F$ is the number of node features. Since the input data are composed of multiple time slices, we introduce $\mathcal{X} = (X_1, X_2, \ldots, X_T)$ to represent the entire input data stream, where $T$ is the number of time slices. As shown in Figure 1 part C, in the first part, we feed the data to the "spatial-temporal attention" structure of the network model, which is composed of a "spatial attention" component and a "temporal attention" component. By using these components, we can capture dynamic information, such as spatial and temporal correlations, from the road information stream.

Spatial Attention Component (SAtt).
In the spatial dimension, the taxi flow at each intersection may be affected by the flow at adjacent intersections. To increase the sensitivity of the network model to the traffic data in the spatial structure of the road network, we introduce the attention mechanism, which makes the model more responsive to changes in the spatial correlations. In the spatial attention matrix $S$ computed by this component, the weights, among them $W_3 \in \mathbb{R}^{C_{r-1}}$, are the parameters to be learned, and $C$ denotes the number of channels. In the layer, we use $\sigma$ as the activation function. From the attention matrix $S$, we obtain the correlation weights between the graph nodes, which are dynamically adjusted according to the input stream of the layer. In the final part of the component, we apply the softmax function to ensure that the attention weights of each node sum to one.
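The defining equation of $S$ did not survive in the text above. As a hedged reconstruction, assuming the component follows the spatial attention form commonly used in attention-based spatial-temporal graph convolutional networks of the kind this framework builds on, it could read as follows. Only $S$, $\sigma$, $W_3$, and the softmax step are named in the text; $X^{(r-1)}$, $V_s$, $b_s$, $W_1$, $W_2$, and $T_{r-1}$ are assumed symbols.

```latex
% Assumed shapes: X^{(r-1)} \in \mathbb{R}^{N \times C_{r-1} \times T_{r-1}},
% V_s, b_s \in \mathbb{R}^{N \times N}, W_1 \in \mathbb{R}^{T_{r-1}},
% W_2 \in \mathbb{R}^{C_{r-1} \times T_{r-1}}, W_3 \in \mathbb{R}^{C_{r-1}}
\begin{aligned}
S &= V_s \cdot \sigma\!\left( \left( X^{(r-1)} W_1 \right) W_2 \left( W_3 X^{(r-1)} \right)^{\top} + b_s \right), \\
S'_{i,j} &= \frac{\exp\left( S_{i,j} \right)}{\sum_{j=1}^{N} \exp\left( S_{i,j} \right)} .
\end{aligned}
```

Under these assumed shapes, the product inside $\sigma$ is an $N \times N$ matrix, so $S'_{i,j}$ can be read as the normalized strength of the correlation between nodes $i$ and $j$.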

Temporal Attention Component (TAtt).
In the temporal attention matrix $E$ computed by this component, the weights are the parameters to be learned. Like the matrix $S$, the matrix $E$ automatically captures the changes in the input stream, making the entire network sensitive to traffic trends in the time dimension. Then, we apply the softmax function to ensure that the attention weights sum to one and obtain $E'$.
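The defining equation of $E$ is also missing from the text above. A hedged sketch, assuming the temporal attention mirrors the spatial one with the roles of the node and time axes exchanged, is given below. Only $E$, $E'$, and the softmax step are named in the text; $U_1$, $U_2$, $U_3$, $V_e$, $b_e$, and $T_{r-1}$ are assumed symbols.

```latex
% Assumed shapes: X^{(r-1)} \in \mathbb{R}^{N \times C_{r-1} \times T_{r-1}},
% V_e, b_e \in \mathbb{R}^{T_{r-1} \times T_{r-1}}, U_1 \in \mathbb{R}^{N},
% U_2 \in \mathbb{R}^{C_{r-1} \times N}, U_3 \in \mathbb{R}^{C_{r-1}}
\begin{aligned}
E &= V_e \cdot \sigma\!\left( \left( \left( X^{(r-1)} \right)^{\top} U_1 \right) U_2 \left( U_3 X^{(r-1)} \right) + b_e \right), \\
E'_{i,j} &= \frac{\exp\left( E_{i,j} \right)}{\sum_{j=1}^{T_{r-1}} \exp\left( E_{i,j} \right)} .
\end{aligned}
```

Here $E'_{i,j}$ would express how strongly time slice $j$ influences time slice $i$ within the input window.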

Spatial-Temporal Convolution.
The SAtt and TAtt components automatically capture important information in the input stream. The adjusted input stream is then fed into the spatial-temporal convolution component, which consists of a graph convolution in the spatial dimension and a common convolution along the temporal dimension.

Graph Convolution Component.
Under different road and time conditions, the node values can be regarded as signals on the graph that change over time. Therefore, in order to make full use of the spatial nature of the road information network, we use spectral convolution to process the signal of the whole graph in each time slice and capture the spatial dependence of the neighbour nodes through the signal correlation.
In spectral graph analysis, we can obtain the properties of the graph structure by analysing the Laplacian matrix and its eigenvalues. The graph convolution is a convolution operation implemented by using linear operators that diagonalize in the Fourier domain to replace the classical convolution operator [23]. However, the eigenvalue decomposition that this requires is not efficient for large-scale graph networks. Therefore, we adopt Chebyshev polynomials to solve the task approximately but efficiently [24].
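The formula that followed this paragraph is not present in the text. For orientation, the standard Chebyshev approximation of a spectral graph convolution is sketched below. The symbols $g_\theta$, $\theta_k$, $K$, $\tilde{L}$, $\lambda_{\max}$, and $D$ are assumptions about notation, and the normalized Laplacian is assumed as the choice of $L$.

```latex
% x: graph signal in one time slice; L: normalized graph Laplacian;
% D: degree matrix of A; K: order of the polynomial kernel (assumed name)
\begin{aligned}
L &= I_N - D^{-1/2} A D^{-1/2}, \qquad
\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N, \\
g_\theta *_G x &= g_\theta(L)\, x \approx \sum_{k=0}^{K-1} \theta_k \, T_k(\tilde{L})\, x, \\
T_0(\tilde{L}) &= I_N, \quad T_1(\tilde{L}) = \tilde{L}, \quad
T_k(\tilde{L}) = 2 \tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}) .
\end{aligned}
```

With a kernel of order $K$, each node aggregates information from neighbours up to $K-1$ hops away, and no eigendecomposition of $L$ is needed beyond an estimate of $\lambda_{\max}$.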

Common Convolution Component.
After capturing the spatial dependencies from neighbouring nodes, we apply standard convolution layers along the temporal dimension, which update the signal of each node using the neighbouring time slices. In the formula for the $r$-th layer, $*$ denotes a common convolution operation, and $\Phi$ denotes the parameters of the temporal convolution kernel. In the layer, we select ReLU as the activation function.
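The layer formula itself is missing from the text. One plausible form, assuming the temporal convolution is applied to the activated output of the graph convolution, is sketched below. The symbols $X^{(r)}$ and $\hat{X}^{(r-1)}$ (the attention-adjusted input of the layer) and $g_\theta *_G$ (the graph convolution) are assumed notation.

```latex
% * : common convolution along the time axis; \Phi : temporal kernel parameters
X^{(r)} = \mathrm{ReLU}\!\left( \Phi * \mathrm{ReLU}\!\left( g_\theta *_G \hat{X}^{(r-1)} \right) \right)
```

Stacking the graph convolution and the temporal convolution in this order would let each node first collect information from its spatial neighbours and then from adjacent time slices.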

Multicomponent Fusion.
In the final part of the network model, we integrate the outputs of the three components. In general, the temporal and spatial distribution of taxi demand differs between urban areas with different social functions. Even for the same area, the taxi flow differs between time slices. Therefore, the outputs are integrated with learnable weights. In the integration formula, $\odot$ is the Hadamard product, and $W_h$, $W_d$, and $W_w$ are three learnable parameters that reflect the degrees to which $y_h$, $y_d$, and $y_w$ affect the final forecasting target.
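The integration formula is not present in the text. Given the symbols it names, a natural reading is an element-wise weighted sum of the three component outputs, sketched here as an assumption. The symbol $\hat{y}$ for the fused prediction is introduced only for illustration.

```latex
% \odot : Hadamard (element-wise) product; \hat{y} : fused prediction (assumed name)
\hat{y} = W_h \odot y_h + W_d \odot y_d + W_w \odot y_w
```

Because the weights act element-wise, each node and each forecast step could rely on the three components to a different degree.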

Experiments
In this section, we introduce the taxi dataset, baselines, and evaluation metrics of our experiment. The results of TaxiInt and the baselines are then presented.

Dataset and Data Preprocessing.
The data used in this work consisted of three parts: the trajectory data, the weather data, and the road network data. The trajectory data were gathered from GPS recorders from March 1, 2018, to March 31, 2018. The entire dataset includes 12,544 vehicles and 1,087,825,260 records in total. Each record includes five elements: taxi ID, latitude and longitude, taxi speed, passenger status, and time tag. The sampling interval of the GPS data was 22 s. Table 1 presents a typical sample of the trajectory data. The weather data were fetched from the "wunderground" website (https://www.wunderground.com), which collected daily precipitation in Hangzhou in March 2018. The road network data were obtained from the "OpenStreetMap" website. Figure 2 shows the road network map of Hangzhou city, which was used in this work. The areas highlighted in orange on the map are the radiation range of our selected road points, which are the observation nodes for model training. The road network of the selected area includes some trunk roads, scenic areas, commercial areas, and a small residential area. We set the time interval to 5 minutes and, for each interval, count the taxis that cross the intersections, recording the taxi number, the taxi speed, and the traffic flow of the adjacent intersections within 1 km.
There are six traffic hotspots in the area, as shown in Figure 3. Figures 3(a) and 3(e) represent Hangzhou's transportation hub, Figures 3(c) and 3(f) represent Hangzhou's urban core area, Figure 3(b) is the education area, and Figure 3(d) is the residents' living area. This study selects the taxi trajectory data of the above areas to verify the prediction effect of TaxiInt.
Journal of Electrical and Computer Engineering
Since we assumed that the OD information was distorted, the passenger status was removed from our model. In the data cleaning process, the format of the dataset was converted, the passenger status was deleted, and the taxi IDs that were not in full attendance during the month (taxis that appeared on fewer than 31 days in March 2018) were filtered out. Finally, the track records of 12,196 vehicles were left. The model uses 328 road points from the city road map for training, most of which are located near intersections. Figure 4 schematically describes the process of counting the vehicles passing each intersection in each period. The black dots represent intersections. The black dotted circle represents the preset intersection observation range, whose radius is R_range, as shown at point F. The yellow dotted line is the vehicle trajectory, and the solid blue line represents the road. When a vehicle enters the observation range of an intersection, its information is counted. As shown in Figure 4, the vehicle's driving sequence is "A-C-B-E-F-C-D," so the traffic flow statistics of intersections A, B, D, E, and F increase by one, and that of intersection C increases by two. The average speed at each intersection is calculated in a similar way. The trajectory of the vehicle near intersection C jitters into a cluster of points. Such problems are often caused by waiting at traffic lights, so it is necessary to filter out vehicle records that appear repeatedly at an intersection within a short time. After the foregoing processing, we finally obtained two datasets: (1) the road network adjacency matrix A (328 × 328), which records the road distance between nodes, and (2) the node information matrix X (8928 × 328 × 5), which records the taxi flow, average speed, number of taxis around the node, weather precipitation, and weekday/weekend status in each time slice.
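The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the haversine distance, the function names, and the `min_gap_s` threshold used to suppress repeated detections (for example while waiting at traffic lights) are all hypothetical choices.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used by the haversine formula


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_flow(track, intersections, r_range_m, slice_s=300, min_gap_s=120):
    """Count taxi passes per (intersection, time slice).

    track: iterable of (taxi_id, timestamp_s, lat, lon), time-ordered per taxi.
    intersections: dict mapping an intersection name to its (lat, lon).
    r_range_m: observation radius R_range in metres.
    slice_s: length of a time slice in seconds (5 minutes in the text).
    min_gap_s: hypothetical threshold; in-range points of the same taxi that
        are closer in time than this are treated as one visit.
    """
    flow = {}
    last_seen = {}  # (taxi_id, name) -> time of the latest in-range point
    for taxi_id, ts, lat, lon in track:
        for name, (ilat, ilon) in intersections.items():
            if haversine_m(lat, lon, ilat, ilon) > r_range_m:
                continue
            key = (taxi_id, name)
            prev = last_seen.get(key)
            last_seen[key] = ts
            if prev is not None and ts - prev < min_gap_s:
                continue  # still the same visit, do not count again
            cell = (name, ts // slice_s)
            flow[cell] = flow.get(cell, 0) + 1
    return flow
```

A taxi that lingers inside one observation circle is counted once, while a taxi that leaves and returns later, as at intersection C in Figure 4, is counted twice.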
Since the urban traffic flow changes weekly, in order to ensure the training effect of TaxiInt, this study divides the one-month dataset into three parts: a 21-day training set, a 7-day validation set, and a 3-day test set. Splitting the data in this way for model training is a common practice in machine learning [25][26][27].
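As a small worked example of this split, the sketch below computes the slice index ranges for the three subsets. It assumes that the split is chronological and that one day contains 288 five-minute slices; the function name is hypothetical.

```python
SLICES_PER_DAY = 24 * 60 // 5  # 288 five-minute slices per day


def split_by_days(num_slices, train_days=21, val_days=7, test_days=3):
    """Return chronological index ranges for train, validation, and test."""
    expected = (train_days + val_days + test_days) * SLICES_PER_DAY
    if num_slices != expected:
        raise ValueError(f"expected {expected} slices, got {num_slices}")
    train_end = train_days * SLICES_PER_DAY
    val_end = train_end + val_days * SLICES_PER_DAY
    return (
        range(0, train_end),
        range(train_end, val_end),
        range(val_end, num_slices),
    )


train_idx, val_idx, test_idx = split_by_days(8928)
```

With the 8928 time slices of the node information matrix X, this gives 6048 training slices, 2016 validation slices, and 864 test slices.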

Baselines.
We compared TaxiInt with the following three baselines, and the performance of each method was evaluated by the metric of mean absolute error (MAE).
HA (historical average method): this method predicts the value by calculating the average of the last 12 values.
LSTM (long short-term memory): LSTM is a time series recurrent neural network [28].
LCTFP: a model based on CNN and LSTM used to predict freeway traffic flow [29].

Results.
TaxiInt predicts the taxi flow of each intersection in the next hour. Since the model sets the time interval to 5 minutes, 12 values need to be calculated for each prediction. Figure 5 shows the changes in the loss value during the training of the TaxiInt model. The vertical axis represents the loss, and the horizontal axis represents the iteration number. As shown in Figure 5, the output loss tends to be stable once the iteration number reaches 1500. We retain the parameters learned after 2000 iterations and predict the taxi flow of six local areas of Hangzhou on March 30 and March 31. Table 2 provides the MAE results of short-term prediction (forecast results for the first 5-minute interval). It covers four models and 24 predicted results in 6 regions. The six selected areas are hotspots of Hangzhou that include transportation hubs, core urban areas, residential areas, and education areas. The values represent the accuracy of the models for short-term intersection traffic prediction. Figure 6 shows the MAE results of the long-term forecast (forecast results of all 12 time segments) and reflects how the predictions of the different models change as the time interval increases. From Table 2 and Figure 6, we can find that the TaxiInt model is superior to all baselines in all 6 regions, that the HA model is similar to the LSTM model in short-term prediction, and that the prediction from the HA model is more
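For reference, the MAE metric and a historical-average predictor of the kind used as the HA baseline can be sketched as follows. This is an illustrative sketch, not the evaluation code of the study: the function names are hypothetical, and repeating the same average for every step of the forecast horizon is an assumption.

```python
def mae(y_true, y_pred):
    """Mean absolute error between two equally long, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def historical_average(history, window=12, horizon=12):
    """Forecast `horizon` steps as the mean of the last `window` values.

    Assumption: the same average is repeated for every forecast step.
    """
    if len(history) < window:
        raise ValueError("not enough history for the chosen window")
    avg = sum(history[-window:]) / window
    return [avg] * horizon


# Twelve past 5-minute flow values for one intersection (made-up numbers).
past_flow = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
forecast = historical_average(past_flow)
```

A lower MAE means that the predicted flow values are, on average, closer to the observed ones.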

Conclusions
Research on taxi data is a major issue in the field of smart transportation. The forecast of taxi flow is strongly affected by the local road network. This study presented the TaxiInt model, a convolutional neural network that embeds an attention mechanism and the spatial-temporal correlation of taxi flow. TaxiInt focuses on learning the distribution of taxis in different city blocks across different time slices. By converting trajectory data into a graph structure, it can predict the taxi flow of backbone road nodes at urban hotspots in different time slices. It removes the dependence of the prediction on the OD information and reduces the requirements for high-precision datasets. Moreover, compared with the passenger OD information, the trajectory distribution of taxis contains much more information about the traffic conditions. The experimental results demonstrate the effectiveness of the model in predicting the taxi flow at hotspots. In future research, we intend to introduce more information about the activities of urban residents to expand and enrich the functionalities of the model, so that it can support the city's municipal planning.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.