Highway Travel Time Prediction of Segments Based on ANPR Data considering Traffic Diversion



Introduction
With the growth of cooperative vehicle infrastructure systems (CVIS), the informatization of highways has become important for the construction of smart highways. Many pilot projects of smart highways have been proposed in China, such as 116 km of Yanchong Highway (Beijing), 248 km of Huhangyong Highway, 161 km of Hangshaoyong Highway (Zhejiang), and 885 km of Hunchun-Wulanhaote Highway. Intelligent transportation systems (ITS), which generally aim at promoting effective urban planning, route decision, and other traffic applications, require accurate travel time prediction.
However, travel time prediction is a well-known challenging task, especially when it involves the trip travel time distribution for an arbitrary origin-destination (OD) pair. Existing travel time estimation approaches do not fully explore the data pattern of each segment, which could help alleviate traffic congestion. For example, many deep-learning-based models have been recently proposed, in which the traffic data are processed in the form of time series. Some simple convolutional neural networks (CNNs) used for feature extraction only generate partial-level features and ignore the relationship of vehicles entering or exiting segments, which may affect prediction performance. In order to learn segment temporal patterns, other methods directly stack a series of images in chronological order [1,2]. For instance, simple long short-term memory (LSTM) learns the regular temporal representation in nonlinear traffic flow data [3] and captures the inherent characteristics of long-term dependencies in sequential data. These characteristics make it a suitable choice in traffic prediction. Liu et al. [4] combined convolution and LSTM to form a Conv-LSTM module, which extracts relatively complex road information for traffic flow prediction.
Both CNNs and LSTM have succeeded in the field of image processing and sequence prediction. In the trend of traffic prediction methods, more and more people have adopted CNNs and LSTM modules. But they often neglect the characteristics of different segments and the effective use of models. In order to solve the aforementioned problems, we propose a hybrid model entitled LSTM-CNN. In our model, we use the LSTM module to capture travel correlations to enhance the relationship and expand feature propagation at the CNN layer. We apply residual network [5,6] to the CNN layer to make it easier to train deep networks. Attention mechanism [7,8] is applied to assign different levels of attention features based on prediction target to discover potential dependencies of highway data. By combining these technologies and mechanisms, this study will explore the potential of travel time features of automatic number plate recognition (ANPR) data with consideration of traffic diversion and other relevant information, which contributes to the management of the smart city.
The main contributions of this paper are summarized as follows. First, a large number of studies on highway problems obtain traffic data only at the locations of detectors. On highways, however, the distance between adjacent detectors is often large because of the high cost, so a more detailed traffic state cannot be obtained directly. It is therefore necessary to introduce a generalized extended-segment data acquisition mode for the highway in order to predict accurately and find the source of congestion, and we divide the highway into different segments based on ANPR. Second, residual network layers (ResNet) are proposed to dig out deep features of spatiotemporal highway segments; they consist of three pairs of 1D convolutional layers and BatchNorm, followed by a rectified linear unit (ReLU) activation layer.
Third, to smooth the noisy data, we apply the method of winsorization. Fourth, a four-step mechanism combined with a hybrid spatiotemporal model achieves the multidimensional final prediction. The remainder of the paper is organized as follows. Section 2 summarizes the present development of travel time prediction. In Section 3, the methodology framework of travel time prediction is proposed and described in detail. In Section 4, the training and test data are presented, together with an analysis of the data. In Section 5, different methods are compared with the proposed LSTM-CNN in terms of accuracy on the case study. Finally, conclusions are given in the last section.

Literature Review
Travel time information is a basic component of advanced traveler information systems, which are currently a main research topic of ITS. This research is driven by the development of information collection and communication technologies that improve the accessibility of various types of traffic data. For example, ANPR data can be used to model the key factors that contribute to travel time variations, such as the entry and exit of vehicles and traffic diversion on different segments of highways. Moreover, in the past few decades, more effort has been invested in developing novel methods and strategies for travel time prediction, with high requirements on the accuracy and reliability of forecasting. Linear time series analysis has been widely accepted and applied in this research area, including linear regression models [9], autoregressive integrated moving average (ARIMA) [10,11] and its extensions (including spatial-temporal ARIMA [12,13] and seasonal ARIMA [14,15]), and the Kohonen self-organizing initial classifier [16]. Other studies predict short-term travel time on highways using Bayesian dynamic linear models (DLM) [17], Kalman filtering [18], nonparametric regression models [19], support vector machine models [20,21], and XGBoost [22], while multiple support vector regression (SVR) models [23] have been used to predict iterative time series. The hidden Markov model (HMM) has been used to forecast short-term traffic during freeway peak periods [24,25]. Research on traffic prediction by deep learning methods is gradually emerging, including a bilinear recurrent neural network (BLRNN) [26], LSTM models [27,28], autoencoders [29], and CNN-based methods [30,31]. These methods are able to extract features effectively from traffic data.
Furthermore, hybrid models of deep learning are popular because of LSTM units that can find temporal relationships from input sequences and have the ability to extract information features through convolution operations [32][33][34]. Traffic flow data, which is similar to frequently studied data in the area of machine learning such as video and audio, have plentiful characteristics in both time and space domains. For example: in the space domain, traffic flow patterns sometimes have strong dependencies on nearby locations (topological locality); in the time domain, the present traffic flow will influence the future one. Motivated by the successes of the CNNs and the LSTM and with consideration of the characteristics of travel time, a CNN and an LSTM are combined as the basic frame of the proposed method in this study. Moreover, the attention mechanism is adopted to improve result quality because it can consider differences between input features, which have been proven to be successful in a wide range of tasks [35,36].
The improved hybrid models that combine CNNs and LSTM can achieve higher prediction accuracy [37,38]. In addition to directly splicing the CNN module and the LSTM module, the two modules can be merged into one module that is named LSTM-CNN.
In summary, due to the growing demand for real-time travel time information in ITS, a large number of traffic prediction algorithms have been developed. The advantages of hybrid models are gradually highlighted. Hence, we propose an LSTM-CNN strategy to predict the travel time of the target highway with consideration of the correlation of spatial, temporal, and depth components, which can achieve a better performance than the present methods.

Methodology
The LSTM-CNN framework for highway travel time forecasting is presented in this section. The method integrates LSTM and CNNs with the attention mechanism and the residual network, so that it can generalize well in prediction accuracy and scale well across highways. As shown in Figure 1, the method mainly includes four steps. In the preliminary prediction step (Step I), the multidimensional inputs of travel time series data are grouped by spatial segment and fed into different LSTMs. In the temporal extension step (Step II), the attention mechanism is applied to combine the latest hidden outputs of the LSTMs to perform preliminary forecasting. Thus, both spatial and temporal features are stored in the attention-combined output. In the depth extension step (Step III), the residual network is designed to balance the ability to learn the complex spatiotemporal regularity of travel time when nonrecurrent incidents occur and to overcome the vanishing gradient problem. In the feature extraction and dimensionality reduction step (Step IV), the complex features from the residual network are extracted, with dimensionality reduction, by a series of Conv1D layers to achieve the multidimensional final prediction.

3.1. Step I: Preliminary Prediction. A series of travel times of each segment is input into its own LSTM; together, these LSTMs form the parallel spatial input of LSTM-CNN. The main objective of the LSTM is to model the long-term dependencies of each segment and to determine the optimal input length. Its gate units act as inputs to three multiplying units, which block or transmit information based on the importance of the data elements. Estimated weights of data are stored or deleted in a cell by a backpropagation learning process. The LSTM formulas for each segment use the following notation. x_t is the input of the LSTM from the traffic time series at time step t. i_t and o_t are the outputs of the input and output gates at time step t, respectively. The outputs of the forget gate, memory cell, and hidden state are represented as f_t, S_t, and h_t, respectively. S̃_t is the candidate cell, which is combined with the memory cell S_{t−1} at time step t − 1 to obtain S_t. W_i, W_f, W_o, and W_S are the weights of x_t in the equations of the input gate, forget gate, output gate, and memory cell, respectively. U_i, U_f, U_o, and U_S are the weights of h_{t−1} in the equations of the input gate, forget gate, output gate, and memory cell, respectively. V_i, V_f, and V_o are the weights of S_{t−1} in the equations of the input gate, forget gate, and output gate, respectively. σ is the sigmoid function, which controls the passage of information through an output between 0 and 1. tanh(·) is the hyperbolic tangent function with the range [−1.0, 1.0], which is the activation function of the LSTM. × is elementwise multiplication, and + is the addition operation. The structure diagram of the LSTM is shown in Figure 2.
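The gate equations themselves are not present in this copy of the text. As a hedged reconstruction, a peephole-style LSTM that is consistent with the symbol definitions above could be written as follows. The bias terms b_i, b_f, b_o, and b_S are an assumption (the text does not name them), and the output gate is written with S_{t−1} because the text states that V_o weights S_{t−1}, whereas many peephole formulations use S_t at that position.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + V_i S_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + V_f S_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + V_o S_{t-1} + b_o\right) \\
\tilde{S}_t &= \tanh\left(W_S x_t + U_S h_{t-1} + b_S\right) \\
S_t &= f_t \times S_{t-1} + i_t \times \tilde{S}_t \\
h_t &= o_t \times \tanh\left(S_t\right)
\end{aligned}
```

In this form the forget gate scales the previous memory cell, the input gate scales the candidate cell, and the output gate decides how much of the squashed cell state is exposed as the hidden state.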

3.2. Step II: Temporal Extension. Although LSTM selects and forgets information from the recurrent time series, a problem remains in its learning process. In LSTM-CNN, spatial information on travel time is extracted from different segments in order to output the travel time of different segments in the next interval, which makes it a seq2seq model. However, the contribution of the selected information from different segments is dynamic. Methods such as the autoencoder structure tend to lose information in the extraction process. The attention mechanism is therefore used to give dynamic weights for mapping the outputs of different segments to the final prediction. To consider the state transmission from downstream to upstream, S_t is defined as the travel time at any segment in interval t. Downstream segments are given smaller serial numbers, and the n segments are numbered consecutively. S_t, S_{t+1}, ..., S_{t+n} are input into their respective LSTMs and combined by the attention mechanism. The softmax activation function normalizes the raw attention scores into a weighted vector α whose entries belong to [0, 1]. Lastly, the weighted vector α is applied to combine the LSTM outputs to acquire the prediction c_j in the next interval. In the weighted vector α, α_ij is the weight of the spatial information from segment i for the predicted segment j. The quantity being weighted is the output of the LSTM hidden layer of segment i.
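The attention step can be illustrated with a small standard-library sketch: raw scores are passed through softmax to give weights in [0, 1], and the weights combine the per-segment hidden outputs into one context vector for a target segment. The function names, the way the raw scores are obtained, and the toy numbers are all assumptions made for illustration; the text does not specify the scoring function.

```python
import math

def softmax(scores):
    """Normalize raw scores into weights in [0, 1] that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_combine(hidden_states, scores):
    """Weighted sum of per-segment hidden vectors for one target segment j.

    hidden_states: list of n vectors, one per source segment i
    scores: list of n raw attention scores (hypothetical input)
    """
    alpha = softmax(scores)
    dim = len(hidden_states[0])
    context = [
        sum(a * h[d] for a, h in zip(alpha, hidden_states))
        for d in range(dim)
    ]
    return alpha, context

# Toy example: three segments with 2-dimensional hidden outputs
# and equal raw scores, so every segment gets the same weight.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha, c_j = attention_combine(hidden, scores=[0.0, 0.0, 0.0])
```

Raising the score of one segment shifts weight toward its hidden output, which is how the contribution of each segment can change dynamically from one prediction to the next.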

3.3. Step III: Depth Extension. The residual network layers (ResNet) are proposed to dig out deep features of spatiotemporal highway segments, which consist of three pairs of 1D convolutional layers and the BatchNorm, followed by a rectified linear unit (ReLU) activation layer. ReLU is used because it can maintain a stable gradient even for larger activations. The next step is to predict future traffic status through the integration of four 1D CNN layers. It combines LSTM with the attention mechanism to grasp more information.
Because the residual network is deep, the BatchNorm layer is introduced to accelerate the convergence of the network. As the computation deepens, the network is prone to overfitting and vanishing gradients. By reducing the internal covariate shift, the network can converge quickly and avoid overfitting. In deep neural networks, after each gradient update on a batch of data, each layer receives feature information that differs from what it received before. Because the parameters of the previous layer are updated during training, the data distribution of the input feature maps also changes greatly. This significantly affects the training speed and requires heuristics to determine the initialization of parameters. The BatchNorm layer is a technique used to solve the internal covariate shift problem. The normalization of a minibatch can be calculated by the following formula:
\hat{X}_k = \frac{X_k - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}}, (3)
where μ_B and σ_B^2 are the mean and variance of the minibatch, respectively; ε is a small constant; and X_k and X̂_k are the k-th input before and after normalization. Formula (3) reduces the internal covariate shift by standardizing the data. Formula (4) introduces learnable parameters that scale and shift the normalized input.
The BatchNorm conversion is expressed as follows:
Y_k = \gamma \hat{X}_k + \beta, (4)
where γ and β are learnable parameters and Y_k is the scaled and shifted value of the k-th input. In a deep network, the parameters are generally initialized close to 0. When the parameters of the shallow layers are updated during training, the gradient easily vanishes as the network is deepened, and the shallow parameters cannot be updated. ResNet assumes that some shallower network depth is already optimal, so that the designed deep network contains many redundant layers. We hope that these redundant layers can perform an identity mapping, so that the input and output of an identity layer are exactly the same. Which layers act as identity layers is not specified in advance but is determined during network training. Figure 3 shows the basic framework of the residual network. Here X is the input to the residual block, and F(X) is the output after the linear transformation and activation of the first layer. In the second layer, the input value X is added after the linear transformation and before the activation, and the sum is then activated to produce the output. This path is called a shortcut connection. ResNet therefore supports the exploration of deeper traffic correlations. Without ResNet, the deeper the network is, the harder it is to train with an optimization algorithm, and the training error increases as the depth grows.
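The two ideas in this step, minibatch normalization followed by a learnable scale and shift, and a shortcut connection that adds the input back before the final activation, can be sketched with scalars in plain Python. This is a simplified illustration and not the network used in the paper: the real layers operate on 1D convolutional feature maps, and the names `batch_norm`, `residual_block`, and the default `eps` are assumptions.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a minibatch of scalars, then scale by gamma and shift by beta."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # biased minibatch variance
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]

def relu(xs):
    """Rectified linear unit applied elementwise."""
    return [max(0.0, x) for x in xs]

def residual_block(xs, f):
    """Shortcut connection: add the input X to F(X) before the final activation."""
    fx = f(xs)
    return relu([a + b for a, b in zip(fx, xs)])

# If the residual branch F outputs zeros, the block reduces to relu(X),
# which is an identity mapping for non-negative inputs.
x = [0.5, 1.5, 2.0]
identity_out = residual_block(x, lambda v: [0.0] * len(v))
normed = batch_norm([1.0, 2.0, 3.0])
```

The identity case shows why redundant layers are harmless with a shortcut: the block only has to learn a zero residual, rather than learn to reproduce its input.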

3.4. Step IV: Feature Extraction and Dimensionality Reduction. As the output of the residual networks with the fully connected layer is rough, a series of one-dimensional convolution layers is connected to extract valuable features from it for the final prediction. Meanwhile, the one-dimensional convolution layers reduce the dimensionality of the output to that of the final prediction.

Data Sources.
Our data were collected in Shaoxing, Zhejiang Province, China, and cover September 1 to November 30, 2019. The total length of the target highway is about 39.25 km, and the speed limit is 100 km/h. Ascending from the southeast to the northwest, we divide the entire upward highway into six segments, as shown in Figure 4. Table 1 provides basic information on each segment. There are three toll stations in total, which are Keqiao Toll Station (black circle 1), Shaoxing Toll Station (black circle 2), and 1039 Shangyu Toll Station (black circle 3). There are also four bayonet camera detectors, which are 20311, 20312, 20301, and 20302 from the northwest to the southeast. ANPR cameras were installed in fixed positions to read vehicle license plates with high accuracy. Travel time on roads can be estimated from the data received from the ANPR. For all 80-day data, we use the first 75-day data as the training set and the last 5-day data as the testing set. LSTM and CNN access the characteristics of traffic status by predicting the statistics of the next 5 min interval based on the past six 5 min intervals.
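The input-output setup described above (six past 5 min intervals predicting the next one, with a chronological train/test split) can be sketched as follows. The helper names, the `horizon` parameter, and the toy travel times are assumptions for illustration only; the actual preprocessing pipeline is not given in the text.

```python
def make_windows(series, n_past=6, horizon=1):
    """Return (inputs, targets): each input holds n_past consecutive values,
    and each target is the value `horizon` steps after the window ends."""
    inputs, targets = [], []
    last_start = len(series) - n_past - horizon
    for start in range(last_start + 1):
        end = start + n_past
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets

def chronological_split(samples, n_test):
    """Keep the last n_test samples for testing, without shuffling."""
    return samples[:-n_test], samples[-n_test:]

# Toy series of 5 min average travel times in seconds (made-up values).
travel_times = [300, 310, 305, 320, 330, 325, 340, 335]
X, y = make_windows(travel_times)
train_X, test_X = chronological_split(X, n_test=1)
```

Splitting by time rather than at random keeps future observations out of the training set, which matches the use of the last 5 days as the test period.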
Anomalies in the data sets may be much higher or lower than the normal true value and thus lead to inaccurate predictions. To suppress the effect of these outliers, we applied the method of winsorization. The purpose of winsorization is to replace the extreme minimum and maximum values in a data set with the closest non-extreme values, which suppresses the effect of extreme values. Assume that the sequence to be processed is w = (w_1, w_2, ..., w_c). The winsorized sequence is defined using the following notation: w_i is the i-th pending value, min(w) is the minimum of the pending values, max(w) is the maximum of the pending values, c is the number of pending values, and w_i^C is the winsorized value of the i-th pending value.
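Since the winsorization formula is not present in this copy, the following is only a generic sketch of the technique: values below a lower percentile or above an upper percentile are replaced by the values at those percentiles. The percentile limits, the nearest-rank indexing, and the sample data are assumptions and may differ from the rule used in the study.

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values outside the chosen percentiles to the percentile values."""
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank positions of the lower and upper limits (an assumed convention).
    lo = ordered[int(round(lower_pct * (n - 1)))]
    hi = ordered[int(round(upper_pct * (n - 1)))]
    return [min(max(v, lo), hi) for v in values]

# Made-up travel times with one very low and one very high outlier.
raw = [1, 50, 51, 52, 53, 54, 55, 56, 57, 58, 999]
clean = winsorize(raw, lower_pct=0.1, upper_pct=0.9)
```

Unlike trimming, winsorization keeps the number of observations unchanged, so the time series stays aligned with its 5 min intervals.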

Data Analysis.
Frequency distributions are plotted against observed travel times for each of the six segments of the target highway over a two-month period, as shown in Figure 5. Each panel shows the frequency distribution of the raw data of one segment over the entire data cycle. It is interesting to note that the frequencies of segments 2, 3, 5, and 6 are close to Gaussian shapes. The traffic flows in these segments are uninterrupted flows. Segments 1 and 4 do not follow a Gaussian distribution because their traffic flows are interrupted by nearby toll gates on the ramps. Hence, because different travelers choose different destinations, the travel times fluctuate greatly. In particular, the Shaoxing service area lies in the middle of segment 4, where some vehicles may stop for repair or service. As a result, the frequency distribution of travel time is spread out further.
Mean absolute deviation (MAD) is a parameter for detecting outliers that is based on the distances between the observations and their average value. The MAD is the average distance between each point and the average point and describes the overall data fluctuation. MAD is defined as follows:
MAD = \frac{1}{T} \sum_{t=1}^{T} \left| tr_t - \overline{tr} \right|, (6)
where, in time domain T, \overline{tr} is the mean of the average travel times over all intervals and tr_t is the average travel time at interval t. The calculation results for the six segments are shown in Table 2. The MAD value of segment 4 is clearly much larger than those of the other segments, which shows that its data are extremely volatile. Segments 2 and 5 are uninterrupted, and their estimated data are very stable. Segment 1 has an exit at the end and no service area in the segment; therefore, its MAD is smaller than that of segment 4, although it is still much larger than those of segments 2 and 5. For segments 3 and 6, the upstream part includes the entrance area of a toll station, and vehicles enter from time to time, which also makes the data fluctuate greatly. For the different traffic diversions in the different segments, we have conducted a study using LSTM-CNN to make predictions. The results demonstrate the accuracy of the proposed model.
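The deviation measure described above (the average distance between each interval's travel time and the overall mean) is short enough to sketch directly. The function name and the two toy series are assumptions chosen to contrast a stable segment with a volatile one; they are not values from Table 2.

```python
def mean_absolute_deviation(interval_means):
    """Average absolute distance between each interval value and the overall mean."""
    n = len(interval_means)
    overall = sum(interval_means) / n
    return sum(abs(v - overall) for v in interval_means) / n

# Made-up average travel times (seconds) per interval for two segments.
stable_segment = [100, 102, 98, 100]
volatile_segment = [100, 160, 40, 100]

mad_stable = mean_absolute_deviation(stable_segment)
mad_volatile = mean_absolute_deviation(volatile_segment)
```

Both toy series have the same mean, so the difference between the two results comes only from how widely the travel times spread around it.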

Baseline Methods.
The proposed method in this study is compared with the following five methods:
(1) XGBoost: XGBoost implements a generic tree boosting algorithm. The loss function is expanded to the second order by Taylor expansion, and the first- and second-order terms are used as improved residuals. Regularization is introduced to limit the complexity of the model.
(2) LR: linear regression is a statistical analysis method that uses regression analysis in mathematical statistics to determine quantitative relationships between two or more interdependent variables.
(3) SARIMA: in this model, we use one or more seasonal differences to eliminate cyclical variation. SARIMA appears to be more robust than ARIMA.
(4) CNNs: deep convolutional neural networks are feedforward neural networks that are very similar to ordinary neural networks. They both consist of neurons with learnable weights and bias constants (biases). Each neuron receives some input and performs dot product calculations. The output is a score for each classification.
(5) LSTM-ST: a series of LSTM that is similar to the recurrent structure of Step I (preliminary prediction). The difference is that connections exist between the different LSTM.

Evaluation Index.
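Before performing error calculations, we first normalize the test set results. For the evaluation of different prediction methods, we employ root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) as the evaluation indexes. In time domain T, given predicted values \hat{tr}_t and ground truth values tr_t, the RMSE, MAE, and MAPE are calculated as follows:
RMSE = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( \hat{tr}_t - tr_t \right)^2},
MAE = \frac{1}{T} \sum_{t=1}^{T} \left| \hat{tr}_t - tr_t \right|,
MAPE = \frac{100\%}{T} \sum_{t=1}^{T} \left| \frac{\hat{tr}_t - tr_t}{tr_t} \right|. (7)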
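The three evaluation indexes follow their standard definitions and can be computed with a few lines of plain Python. The function names and the two-point example are assumptions for illustration; MAPE is returned as a percentage and assumes that no ground truth value is zero.

```python
import math

def rmse(pred, truth):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

def mape(pred, truth):
    """Mean absolute percentage error, in percent (truth values must be nonzero)."""
    return 100.0 * sum(abs((p - t) / t) for p, t in zip(pred, truth)) / len(truth)

# Made-up predicted and observed travel times in seconds.
predicted = [110.0, 190.0]
observed = [100.0, 200.0]
scores = (rmse(predicted, observed), mae(predicted, observed), mape(predicted, observed))
```

RMSE penalizes large errors more heavily than MAE, while MAPE expresses the error relative to the observed travel time, which makes segments of different lengths comparable.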

Results.
The accuracy of the proposed prediction method is examined by traffic level and time of day for each segment. Figure 6 shows the performance of travel time forecasting using the proposed method, that is, the MAE, MAPE, and RMSE by segment and time of day. The box plot shows the distribution of prediction errors, with the solid red line representing the mean of the errors. It is observed that travel time varies from one segment to another over the forecast time, which can be seen from the different fluctuations in MAPE. Segment 4 has the largest error, followed by segments 6 and 1. Similarly, when the errors are examined by time of day, the peak hour is 7:00 to 8:00. The error is higher during the peak hour and relatively low in the off-peak hours. Overall, the distribution in Figure 6 shows that travel times in different segments are affected differently at different times of day. In all, the results show that the LSTM-CNN model provides reliable and accurate predictions of travel time.
As shown in Figure 6, the results of segment 4 fluctuate most significantly due to the various travel behaviors of different drivers. Figure 7 shows the forecast results for segment 4 from October 16 to October 21. The purple shaded area shows that when there is a sharp increase in travel time, the ground truth curve (red) is very similar to the model prediction curve (green). In this case, the approach presented in this paper achieves the desired performance. The predictions of segment 6 fluctuate less, as shown in Figure 8, while the predictions of segment 2 fluctuate least, as shown in Figure 9. It can be seen that for each traffic diversion case, the predictive accuracy of the LSTM-CNN model is excellent. Figure 10 shows the error results of the six segments at different time steps, indicating that the model is better at predicting short travel times. The prediction accuracy of the different methods for the next six steps is given in Tables 3-5. LSTM-CNN, with both spatial and temporal learning ability, shows the best performance in multistep prediction among the benchmarks. Figure 11 shows the prediction errors of LSTM-CNN, LSTM-ST, and CNNs over 5 days, with the mean errors indicated by different heights. The results show that there is a statistically significant difference in the mean prediction error between the other two methods and the LSTM-CNN method, as shown in Table 6.
For the visualization of one-dimensional data, both histograms and kernel density estimates are good ways to represent the probability distribution of individual data values, but neither can represent the cumulative distribution of the data. The cumulative distribution, which is the probability of all data less than or equal to the current data value, is useful for indicating the probability that data points fall within an interval.
Mathematically speaking, the cumulative distribution function (CDF) is the integral of the probability distribution function. When plotting the CDF, the true probability distribution function is unknown. Therefore, it is often defined as the integral of the histogram distribution, as shown in the following formula (8):
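A histogram-based CDF of this kind can be sketched by accumulating the normalized bin counts. The equal-width binning, the number of bins, and the function name are assumptions made for illustration, and the sketch is not taken from the paper's formula.

```python
def histogram_cdf(values, n_bins=10):
    """Return (upper_bin_edges, cumulative_probabilities) from an equal-width histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    total = len(values)
    cdf, running = [], 0
    for c in counts:
        running += c
        cdf.append(running / total)
    edges = [lo + width * (i + 1) for i in range(n_bins)]
    return edges, cdf

# Made-up error values split into two bins.
edges, cdf = histogram_cdf([1, 2, 3, 4], n_bins=2)
```

Each entry of `cdf` is the fraction of observations at or below the corresponding upper bin edge, so the last entry is always 1.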

Comparison.
Figure 12 plots the cumulative distribution function of RMSE for LSTM-CNN, CNNs, and LSTM-ST. The curves show the statistical properties of the three methods separately. The experimental results demonstrate the validity of the proposed method for travel time prediction. Figure 13 shows the mean errors of CNNs, LR, LSTM-CNN, LSTM-ST, SARIMA, and XGBoost, with the left axis showing the three errors of the box and the right axis showing the mean line. Table 7 shows specific error values for the different methods. LSTM-CNN has the best performance for all three types of errors in the six segments.
Short-term predictions are mainly used for short-range trip planning and are desired by travelers. For this purpose, based on historical data, we predict the trip times for the next 5 min, 10 min, and 15 min. Figure 14 and Table 8 list the results of XGBoost, LR, SARIMA, CNNs, LSTM-ST, and LSTM-CNN.
We compare LSTM-CNN with the five other methods. It is clearly observed that LSTM-CNN produces the most accurate short-term traffic travel time prediction considering these three errors. In the mean results of short-term prediction shown in Table 8, the RMSE of LSTM-CNN was better than XGBoost, LR, SARIMA, CNNs, and LSTM-ST method by 20.41%, 18.93%, 13.04%, 22.27%, and 30.72%, respectively. As for the MAE, the improvements were 29.60%, 46.27%, 25.90%, 59.58%, and 49.81%, respectively, whereas for the MAPE, the improvements were 19.41%, 49.40%, 30.40%, 62.49%, and 44.92%, respectively. LSTM-ST method exhibited the worst predictive performance.
Journal of Advanced Transportation
Long-term prediction is used primarily by long-distance travelers who plan their trips in advance, and it is considered more challenging than short-term prediction. The setup predicted the next 20 min, 40 min, and 60 min trip times based on historical data. Figure 15 and Table 9 list results of XGBoost, LR, SARIMA, CNNs, LSTM-ST, and LSTM-CNN.
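Percentage improvements such as those quoted above for the short-term comparison are commonly computed as the reduction in error relative to the baseline. The text does not state the exact formula used, so the definition below is an assumption, and the numbers in the example are hypothetical rather than values from Table 8.

```python
def relative_improvement(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return (baseline_error - model_error) / baseline_error * 100.0

# Hypothetical errors: a baseline RMSE of 10.0 and a model RMSE of 8.0.
improvement = relative_improvement(10.0, 8.0)
```

Under this definition a positive value means the proposed model has the lower error, and a negative value means the baseline performs better.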
Similarly, we compare LSTM-CNN with five other methods. It is clearly observed that LSTM-CNN produces the most accurate long-term traffic trip time predictions for the three errors. According to results in Table 9, the RMSE of LSTM-CNN was lower than XGBoost, LR, SARIMA, CNNs, and LSTM-ST methods by 7.76%, 13.67%, 6.51%, 6.28%, and

Conclusion
The ability to predict highway travel time in a timely and accurate manner is essential for proactive traffic management strategies that help provide reliable services to travelers. In this paper, we build upon the residual network and the attention mechanism and propose a combined LSTM-CNN prediction method. The collected and processed data sets for the upward direction of the highway are predicted segment by segment. We consider traffic diversion to evaluate multistep performance and test the method in a real-world case to predict 5-day travel time. RMSE, MAE, and MAPE are used as evaluation metrics. The results show that the proposed method is superior to the XGBoost, SARIMA, CNNs, LR, and LSTM-ST methods, indicating that it can contribute to the improvement of travel time prediction.
Data Availability
The data that support the findings of this study were provided by Zhejiang Communications Investment Group Co., Ltd. Restrictions apply to the availability of these data, which were used under license for this study.

Disclosure
The views presented in this paper are those of the authors.

Conflicts of Interest
The authors declare that they have no conflicts of interest.