Spatiotemporal Traffic Flow Prediction with KNN and LSTM



Introduction
The accurate prediction of future traffic conditions (e.g., traffic flow, travel speed, and travel time) is a crucial requirement for Intelligent Transportation Systems (ITS): it can help administrators take adequate preventive measures against congestion and help travelers make better-informed decisions. Among the different applications in ITS, traffic flow prediction has attracted significant attention over the past few decades, and it remains a challenging topic for transportation researchers.
Due to the stochastic characteristics of traffic flow, accurate traffic prediction is not a straightforward task. To deal with this issue, many techniques have been deployed for modeling the evolution of traffic circulation. Existing prediction schemes can be classified roughly into three categories: parametric methods, nonparametric methods, and hybrid methods. The parametric methods include the Autoregressive Integrated Moving Average (ARIMA) method [1], the Seasonal Autoregressive Integrated Moving Average (SARIMA) method [2,3], and the Kalman filter [4,5]. Parametric methods are widely used in traffic flow prediction, but they are sensitive to the traffic data of different situations. The nonparametric methods include artificial neural networks (ANNs) [6-9], k-nearest neighbors (KNN) [10-14], support vector regression (SVR) [15,16], and Bayesian models [17,18]. Compared to parametric methods, nonparametric methods are more effective in prediction performance; even so, they require a large amount of historical data and a training process. The hybrid methods mainly combine a parametric approach with a nonparametric approach [19-29]. Although the prediction accuracy of nonparametric and hybrid methods is superior to that of parametric methods, all these methods mainly consider the data close to the prediction station, which cannot fully reveal the spatiotemporal characteristics of traffic flow data. Vlahogianni et al. [30] summarized existing traffic flow prediction algorithms from 2004 to 2013. Suhas et al. [31] conducted a systematic study to aggregate previous works on traffic prediction, highlight marked changes in trends, and provide research directions for future work. Lana et al. [32] summarized the latest technical achievements in the traffic prediction field, along with an insightful update of the main technical challenges that remain unsolved. Readers interested in the details of the models applied in the traffic prediction field can refer to these review papers.
With widespread traditional traffic sensors and newly emerging traffic sensor technologies, a tremendous number of traffic sensors have been deployed on the existing road network, and a large volume of historical traffic data at very high spatial and temporal resolutions has become available. It is a challenge to deal with such big traffic data using conventional parametric methods. As for nonparametric methods, most are shallow in architecture and cannot penetrate the deep correlations and implicit information in traffic data. Recently, deep learning, an emerging machine learning method, has drawn a lot of attention from both academia and industry, and traffic flow prediction based on deep learning methods has become a new trend.
Huang et al. [33] proposed a deep architecture for traffic flow prediction with deep belief networks (DBN) and multitask learning. Lv et al. [34] used a stacked autoencoder (SAE) model to learn generic traffic flow features. Duan et al. [35] evaluated the performance of the SAE model for traffic flow prediction at daytime and nighttime. Soua et al. [36] proposed a DBN-based approach to predict traffic flow with historical traffic flow, weather data, and event-based data; an extension of Dempster-Shafer evidence theory was used to fuse traffic prediction beliefs coming from streams of data and event-based data models. Koesdwiady et al. [37] predicted traffic flow and weather data separately using DBN and merged the results of each prediction using data fusion techniques. Yang et al. [38] proposed a stacked autoencoder Levenberg-Marquardt model to improve prediction accuracy, with the Taguchi method developed to optimize the model structure. Zhou et al. [39] introduced an adaptive boosting scheme for the stacked autoencoder network. Polson and Sokolov [40] developed a deep learning model to predict traffic flows, proposing an architecture that combines a linear model fitted using regularization with a sequence of tanh layers. Zhang and Huang [41] employed a genetic algorithm to find the optimal hyperparameters of DBN models. In recent years, the recurrent neural network (RNN) has proved more practical than other deep learning structures for processing sequential data. Ma et al. [42] utilized a deep Restricted Boltzmann Machine and RNN architecture to model and predict traffic congestion. However, traditional RNNs face the problems of vanishing and exploding gradients. To solve this problem, the long short-term memory network (LSTM) was proposed. Because LSTM can automatically determine the optimal time lags and capture the features of time series with a longer time span, better performance can be achieved with the LSTM model in traffic flow prediction. LSTM was developed to capture the long-term temporal dependency of traffic sequences by Ma et al. [43]. Shao and Soong [44] utilized LSTM to learn more abstract representations of the nonlinear traffic flow data. LSTM has thus been very successful in traffic flow prediction, but the spatiotemporal characteristics of traffic flow were hardly considered. Zhao et al. [45] proposed an origin-destination correlation matrix to represent the correlations of different links within the road network, and a cascade-connected LSTM was used to predict traffic flow. However, the architecture of the proposed LSTM model was overly complicated, making comprehension difficult, and the prediction results were not very stable and reliable at different observation points.
In this paper, inspired by the successful application of LSTM in traffic flow prediction, the high spatiotemporal correlation characteristics of traffic flow data are exploited in order to improve prediction performance. A hybrid traffic flow prediction methodology based on KNN and LSTM is proposed. KNN is used to choose the neighboring stations most related to the test station, and a multilayer LSTM is applied to predict traffic flow at all selected stations. The final prediction results are obtained by weighting the prediction values of all selected stations, with weights assigned by adjusting the weight dispersion measure of the rank-exponent method. The experimental results show that the proposed method outperforms most existing traffic prediction methods in accuracy.
The main contributions of this paper are summarized as follows.
(1) A hybrid traffic flow prediction methodology is proposed that combines KNN with LSTM and utilizes the spatiotemporal characteristics of traffic flow data. Experimental results demonstrate that the proposed approach achieves, on average, a 12.59% accuracy improvement over the ARIMA, SVR, WNN, DBN-SVR, and LSTM models.
(2) The prediction results are obtained by weighting the prediction values of all selected stations, adjusting the weight dispersion measure with the rank-exponent method. Different from the traditional weighting method, the proposed method highlights the importance of the highly relevant stations to the prediction result.
(3) By classical understanding, stations closer to the prediction station are more correlated with it than farther stations. In fact, some farther stations are also correlated with the prediction station. This is consistent with the general fact that the upstream and downstream traffic flow has a great influence on the prediction result in traffic flow prediction.
The rest of this paper is organized as follows. Section 2 gives details of the hybrid traffic prediction method based on KNN and LSTM. In Section 3, the dataset used for the numerical experiments is introduced, and the results and performance evaluation are presented. Finally, conclusions and future research are given in Section 4.

Methodology
2.1. LSTM Network. RNN is a neural network specialized for processing time sequences. Different from conventional networks, RNN allows a "memory" of previous inputs to persist in the network's internal state, which can then be used to influence the network output. Traditional RNN exhibits a superior capability for modeling nonlinear time sequence problems such as speech recognition, language modeling, and image captioning. However, traditional RNN is not able to train on time sequences with long time lags. To overcome these disadvantages, LSTM was proposed. LSTM is a special kind of RNN designed to learn long-term dependencies. The LSTM architecture consists of a set of memory blocks. Each block contains one or more self-connected memory cells and three gates, namely, the input gate, forget gate, and output gate. The typical structure of an LSTM memory block with one cell is shown in Figure 1. The input gate takes new input from outside and processes newly arriving data. The forget gate decides when to forget the previous state and thus selects the optimal time lag for the input sequence. The output gate takes all calculated results and generates the output of the LSTM cell.
Let us denote the input time series as $x = [x_1, x_2, \cdots, x_T]$, where $T$ is the input sequence length. Let $I$ be the number of inputs, $H$ the number of cells in the hidden layer, and $C$ the number of memory cells. The subscripts $\iota$, $\phi$, and $\omega$ refer to the input gate, forget gate, and output gate, respectively. $w_{ij}$ is the weight of the connection from unit $i$ to unit $j$. $a_j^t$ is the network input to some unit $j$ at time $t$, and $b_j^t$ is the value after the activation function in the same unit. $s_c^t$ is the state of cell $c$ at time $t$. $f$ is the activation function of the gates, and $g$ and $h$ are, respectively, the cell input and output activation functions. The LSTM model can be described by the following equations.

Input Gates
$$a_\iota^t = \sum_{i=1}^{I} w_{i\iota} x_i^t + \sum_{h=1}^{H} w_{h\iota} b_h^{t-1} + \sum_{c=1}^{C} w_{c\iota} s_c^{t-1}, \qquad b_\iota^t = f\left(a_\iota^t\right)$$

Forget Gates
$$a_\phi^t = \sum_{i=1}^{I} w_{i\phi} x_i^t + \sum_{h=1}^{H} w_{h\phi} b_h^{t-1} + \sum_{c=1}^{C} w_{c\phi} s_c^{t-1}, \qquad b_\phi^t = f\left(a_\phi^t\right)$$

Cells
$$a_c^t = \sum_{i=1}^{I} w_{ic} x_i^t + \sum_{h=1}^{H} w_{hc} b_h^{t-1}, \qquad s_c^t = b_\phi^t s_c^{t-1} + b_\iota^t g\left(a_c^t\right)$$

Output Gates
$$a_\omega^t = \sum_{i=1}^{I} w_{i\omega} x_i^t + \sum_{h=1}^{H} w_{h\omega} b_h^{t-1} + \sum_{c=1}^{C} w_{c\omega} s_c^t, \qquad b_\omega^t = f\left(a_\omega^t\right)$$

Cell Outputs
$$b_c^t = b_\omega^t \, h\left(s_c^t\right)$$

Through the action of these different gates, the LSTM network can process arbitrary time lags in time sequences with long-term dependencies.
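As an illustrative sketch only (not the authors' implementation, with the peephole connections to the cell state omitted for brevity and all variable names invented here), one forward step of an LSTM memory cell corresponding to the gate equations above can be written in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of an LSTM cell.

    W maps the concatenated [x, h_prev] to the four gate pre-activations
    (input, forget, cell candidate, output); b is the matching bias.
    """
    z = np.concatenate([x, h_prev])
    n = h_prev.shape[0]
    a = W @ z + b                      # all gate pre-activations at once
    i = sigmoid(a[0:n])                # input gate
    f = sigmoid(a[n:2 * n])            # forget gate
    g = np.tanh(a[2 * n:3 * n])        # cell input activation
    o = sigmoid(a[3 * n:4 * n])        # output gate
    c = f * c_prev + i * g             # new cell state
    h = o * np.tanh(c)                 # cell output
    return h, c
```

Unrolling this step over the input sequence and backpropagating through the gates is what allows the forget gate to select the optimal time lag, as described above.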

2.2. KNN Algorithm.
The KNN algorithm is a nonparametric method used for classification and regression. It searches a database for data similar to the current data; the retrieved data are called the nearest neighbors of the current data. In this paper, KNN is used to select the neighboring stations most related to the test station. Suppose there are M stations in the road network. Let $V_0(t) = [v_0(t), v_0(t-1), \cdots, v_0(t-n)]$ denote the historical traffic flow data at the test station, where $n$ is the sample data length, and let $V_m(t) = [v_m(t), v_m(t-1), \cdots, v_m(t-n)]$ denote the historical traffic flow data at the m-th station ($m = 1, \cdots, M-1$), which differs from the test station. The Euclidean distance in (10) is used to measure the correlation between the test station and the others:

$$d_m = \sqrt{\sum_{i=0}^{n} \left(v_0(t-i) - v_m(t-i)\right)^2} \tag{10}$$

According to the calculated distances, the K nearest neighbors are found, and these K stations are selected as the stations most related to the test station.
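A minimal sketch of this selection step (illustrative function names, assuming each station's historical flow series is stored as a row of a NumPy array):

```python
import numpy as np

def select_k_nearest_stations(test_series, station_series, k):
    """Rank candidate stations by the Euclidean distance (Eq. (10))
    between their historical flow series and the test station's series,
    and return the indices of the k closest stations."""
    dists = np.linalg.norm(np.asarray(station_series) - np.asarray(test_series),
                           axis=1)
    return np.argsort(dists)[:k]
```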

2.3. Proposed Method.
Different from the conventional LSTM network, the KNN algorithm is first used to select the stations with spatiotemporal correlation to the test station. A two-layer LSTM network is then applied to predict the traffic flow in each selected station, and the final prediction result at the test station is obtained by weighting with the rank-exponent method. At time $t$, the traffic flow data at the test station is denoted as $V_0(t) = [v_0(t), v_0(t-1), \cdots, v_0(t-n)]$, and the traffic flow data of the $M-1$ stations near the test station is denoted as $V_m(t)$; the stations selected by KNN are indexed by $k$ ($k = 1, 2, \cdots, K$). The predicted traffic flow at each selected station and the test station can be calculated as

$$\hat{y}_k(t+1) = W_{ho} h_k(t) + b_o \tag{11}$$

where $W_{ho}$ is the weight matrix between the hidden layer and the output layer and $b_o$ is the bias term. The final prediction result at the test station is obtained by weighting according to (12):

$$\hat{y}(t+1) = \sum_{k=1}^{K} w_k \, \hat{y}_k(t+1) \tag{12}$$

where $w_k$ is the weight coefficient. The rank-exponent method is used to assign the weights in this paper. It provides some degree of flexibility by adjusting the weight dispersion measure, as shown in (13); the value of $p$ is set to 2 as suggested by the authors of [46]:

$$w_k = \frac{(K - r_k + 1)^p}{\sum_{j=1}^{K} (K - r_j + 1)^p} \tag{13}$$

where $r_k$ is the rank of the k-th selected station, $K$ is the total number of selected stations, and $p$ is the weight dispersion measure.
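The rank-exponent weighting can be sketched as follows (a minimal illustration, assuming rank 1 denotes the station most related to the test station):

```python
import numpy as np

def rank_exponent_weights(K, p=2):
    """Rank-exponent weights for K ranked stations (rank 1 = most related):
    w_k = (K - r_k + 1)**p / sum_j (K - r_j + 1)**p.
    The weights sum to 1 and concentrate on the top-ranked stations
    as the dispersion measure p grows."""
    ranks = np.arange(1, K + 1)
    raw = (K - ranks + 1.0) ** p
    return raw / raw.sum()
```

For example, with K = 3 and p = 2 the raw scores are 9, 4, and 1, so the top-ranked station receives weight 9/14, emphasizing the most relevant stations more strongly than a linear rank weighting would.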
The flowchart of the proposed method is shown in Figure 2, and the detailed calculation process is shown as follows.
Step 1. Calculate the Euclidean distance between each of the adjacent M − 1 stations and the test station according to (10).
Step 2. Select the K stations most related to the test station.
Step 3. Predict the traffic flow with the LSTM network in each selected station according to (11).
Step 4. Weight the prediction values of the selected stations according to (12) and (13).
Step 5. Calculate the RMSE of the predicted traffic flow.
Step 6. Change the value of K and repeat Steps 2-5.
Step 7. Find the smallest RMSE over all the different K.
Step 8. Output the predicted traffic flow at the test station for the K with the smallest RMSE.
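The steps above can be sketched as a single search loop over K. This is an illustrative outline only: `predict_with_lstm` is a placeholder for a trained per-station LSTM predictor, and in practice the RMSE guiding the search would be computed on held-out validation data:

```python
import numpy as np

def knn_lstm_predict(test_series, station_series, y_true,
                     predict_with_lstm, max_k, p=2):
    """For each K: select the K nearest stations (Step 2), predict flow
    at each with the LSTM (Step 3), combine with rank-exponent weights
    (Step 4), and keep the combination with the smallest RMSE
    (Steps 5-8)."""
    dists = np.linalg.norm(station_series - test_series, axis=1)
    order = np.argsort(dists)                 # Step 1: rank by distance
    best_rmse, best_pred = np.inf, None
    for k in range(1, max_k + 1):             # Step 6: vary K
        preds = np.array([predict_with_lstm(s) for s in order[:k]])
        ranks = np.arange(1, k + 1)
        w = (k - ranks + 1.0) ** p            # rank-exponent weights
        w /= w.sum()
        combined = w @ preds                  # weighted prediction
        rmse = np.sqrt(np.mean((combined - y_true) ** 2))
        if rmse < best_rmse:                  # Steps 5 and 7
            best_rmse, best_pred = rmse, combined
    return best_pred, best_rmse               # Step 8
```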

3.1. Data Description.
The data used to evaluate the performance of the proposed model was collected from mainline detectors and provided by the Transportation Research Data Lab (TDRL) at the University of Minnesota Duluth (UMD) Data Center, covering March 1st, 2015, to April 30th, 2015. The sampling period of the testing dataset was 5 min. In our experiment, we selected the road network in Figure 3 as the experimental area. The area mainly contains four expressways, numbered I394, I494, US169, and TH100, with 36 stations. The station locations and IDs used are shown in Figure 3. Stations S339 and S448 are located near a transportation hub in the road network, and were therefore selected as the test stations for traffic flow prediction. Due to the similarity of traffic flow on the same workday in different weeks, we used the data of one workday as training and test data in order to ensure prediction stability. In our experiment, we chose the traffic flow data on Tuesdays, although any workday from Monday to Friday could have been chosen. There was a total of 9 days of Tuesday traffic flow data in our dataset, which was divided into two parts: the data of the first 8 days was used as the training sample, while the remaining day was employed as the testing sample for measuring prediction performance. The most commonly used prediction interval is 5 min, so we also selected a 5-min prediction interval, which was verified to be reasonable by the experimental results.
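The split described above can be sketched as follows (illustrative only; it assumes 288 five-minute samples per day stored as rows of an array, and uses 6 timesteps per input window, the value reported in the experiments):

```python
import numpy as np

def make_train_test(flow, timesteps=6):
    """Split 9 Tuesdays of 5-min flow data (shape: 9 days x 288 samples)
    into training windows from the first 8 days and test windows from
    the last day. Each input window holds `timesteps` past values and
    the target is the next value."""
    def windows(series):
        X = np.array([series[i:i + timesteps]
                      for i in range(len(series) - timesteps)])
        y = series[timesteps:]
        return X, y
    train = flow[:8].reshape(-1)   # first 8 Tuesdays, concatenated
    test = flow[8]                 # the remaining Tuesday
    return windows(train), windows(test)
```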
Traffic flows at station S339 for 5 consecutive Tuesdays are shown in Figure 4, and typical traffic flows at station S339 and four neighboring stations are shown in Figure 5. From Figure 4, we can see that although there are small differences in the rush hours, the profiles of the traffic flows are basically consistent. From Figure 5, it can be seen that there are some differences between stations, but the data distributions are similar to that of station S339. Because traffic flow data has highly correlated spatiotemporal characteristics, exploiting these correlations is an effective way to improve traffic prediction accuracy.

3.2. Performance Indexes.
In order to evaluate the prediction performance, the Root Mean Square Error (RMSE), the most frequently used metric of prediction performance in previous work, and the prediction accuracy (ACC) were chosen to evaluate the difference between the actual and predicted values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{ACC} = \left(1 - \frac{1}{n} \sum_{i=1}^{n} \frac{\left|y_i - \hat{y}_i\right|}{y_i}\right) \times 100\%$$

where $n$ is the length of the prediction data and $y_i$ and $\hat{y}_i$ are the measured and predicted values for the i-th validation sample, respectively.
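A sketch of these two metrics in NumPy (the ACC form here, one minus the mean relative error, is an assumption consistent with the accuracy percentages reported below):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between measured and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def acc(y, y_hat):
    """Prediction accuracy: one minus the mean relative error, in percent.
    Assumed definition; all measured values must be nonzero."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return (1.0 - np.mean(np.abs(y - y_hat) / y)) * 100.0
```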

Results and Discussions
4.1. Results Analysis. In our experiment, stations S339 and S448, which are located in the two directions of the road network, are chosen as the test stations. The number of timesteps is an important hyperparameter: it is the input size of the model and determines the number of LSTM blocks in each level. Through experiments, we found that the prediction performance is optimal when the number of timesteps is set to 6. To validate the efficiency of the proposed method, its performance is compared with some representative approaches, including the ARIMA model, SVR, the wavelet neural network (WNN), DBN, and LSTM. In the SARIMA model, the AR and MA orders are set to 5 and 4, and the normal and seasonal differencing orders are set to 1 and 2. In the SVR model, the kernel function is the Radial Basis Function (RBF), the penalty parameter of the error term is 300, and the number of iterations is 1000. In the WNN model, the number of hidden nodes is set to 6, the learning rate to 0.001, and the number of iterations to 500. For the DBN model, a 3-layer architecture is used, and the number of nodes in each layer is set to 128 for simplicity.
The predicted results of the different models and the real traffic flow within one day are shown in Figures 6 and 7. It is observed that the predicted traffic flow has traffic patterns similar to the real traffic flow, and the prediction of the proposed KNN-LSTM model almost coincides with the measured data, especially in the morning and evening peak hours. The RMSE and ACC of the different models for stations S339 and S448 are shown in Table 1. It can be seen that the proposed method has the minimum RMSE. The average ACC of the proposed method is 95.75%, an improvement of 28.92%, 8.31%, 14.44%, 6.95%, and 4.32% over the other models. The traditional ARIMA model has the worst prediction performance; it assumes that the traffic flow data is a stationary process, which is not always true in reality. The SVR and WNN methods achieve better RMSE and ACC than the ARIMA model but show weakness when compared with the deep learning methods, and the DBN model has no obvious advantage over SVR.

4.2. Discussions.
In this paper, KNN is used to select the K stations most related to the test station. Different values of K yield different prediction performance. We search over all possible values of K; the optimal K is the one whose RMSE is minimal. The optimal K is 10 for station S339 in our experiment, and the IDs of the selected stations are S339, S340, S341, S321, S337, S342, S338, S344, S336, and S293. The optimal K is 6 for station S448, and the IDs of the selected stations are S448, S447, S446, S450, S737, and S452. As shown in Figure 3, almost all of the selected stations are located upstream or downstream of the test stations. By classical understanding, stations closer to the prediction station are more correlated with it than farther stations. In fact, some farther stations are also correlated with the prediction station: for test station S339, the closer station S343 is not selected, and the closer station S451 is not selected for test station S448. This is nevertheless consistent with the general fact that the upstream and downstream traffic flow has a great influence on the prediction result. When K = 1, only the temporal correlation is considered, and the average ACC is 91.43%, a decrease of 4.32% compared with the proposed method. This indicates that spatiotemporal features play an important role in traffic prediction. These results verify the superiority and feasibility of KNN-LSTM, which employs KNN to capture the spatial features and mines the temporal regularity with the LSTM networks.

Conclusions
In this paper, we proposed a spatiotemporal traffic flow prediction method combining KNN and LSTM. KNN is used to select the most related neighboring stations, capturing their spatiotemporal correlation with the test station. An LSTM network is applied to predict the traffic flow in each selected station; LSTM is able to exploit the long-term dependency in the traffic flow data and discover the latent feature representations hidden in it, which yields better prediction performance. The final prediction results at the test station are obtained by weighting with the rank-exponent method. We evaluated the performance of our model on real traffic data provided by TDRL and compared it with the ARIMA, SVR, WNN, DBN, and LSTM models. The results show that the proposed model is superior to the other methods. Since traffic flow data is affected by weather, incidents, and other factors, the impact of these factors on traffic flow data will be studied further so as to improve the prediction accuracy.

Figure 1: LSTM memory block with one cell.

Figure 2: The flowchart of the proposed method.

Figure 3: The ID and locations of stations in our experiment.

Figure 4: Traffic flows at station S339 for 5 consecutive Tuesdays.

Figure 6: The real and predicted traffic flow in S339.

Figure 7: The real and predicted traffic flow in S448.

Table 1: Prediction performances of different models.