Short-Term Traffic State Prediction Based on the Critical Road Selection Optimization in Transportation Networks

Short-term traffic prediction under corrupted or missing data for large-scale transportation networks has become an important and challenging topic in recent decades. Since the critical roads have predictive power on their adjacent roads, this paper proposes a novel hybrid short-term traffic state prediction method based on critical road selection optimization. First, the utility function of the quality of service (QoS) for the critical roads in a large-scale road network is proposed based on the coverage and the data score. Then, the critical road selection optimization model in the transportation networks is presented by selecting an appropriate set of critical roads with the maximum proportion of the total calculation resources to maximize the utility value of the QoS. Also, an innovative critical road selection method is introduced, which is considering the topological structure and the mobility of the urban road network. Subsequently, the traffic speed of the critical roads is regarded as the input of the convolutional long short-term memory neural network to predict the future traffic states of the entire network. Experiment results on the Beijing traffic network indicate that the proposed method outperforms prevailing DL approaches in the case of considering critical road sections.


Introduction
Real-time traffic state prediction plays a vital role in traffic management and public service. By predicting the evolution of traffic timely and accurately, governments and travelers could react to the traffic congestion ahead of time. For instance, intelligent transportation systems, advanced traffic management systems, and traveler information systems depend on real-time traffic state prediction.
In the past decades, there have been numerous studies on this topic [1]. ese studies could be divided into mathematical models, statistical models, and data-driven methods [2]. Compared with data-driven methods, mathematical or statistical models derived from macroscopic and microscopic theories of traffic flow are difficult to handle unstable traffic conditions and complex road settings due to strong hypotheses and assumptions [3].
Data-driven methods have achieved promising results due to more potential in processing complex nonlinear problems [4]. ese methods include support vector machine (SVM), Bayesian network, and neural network. Among all these datadriven methods, deep learning approaches have proven effective in traffic state prediction. is method could exploit much deeper architectures and process the high-dimensional set of explanatory variables [5].
However, most of the deep learning approaches are constructed based on the entire dataset. e attributes of datasets influence the prediction performance. In other words, the deep learning approaches have highlighted the data quality in short-term traffic state prediction [5]. Due to the corrupted or missing data problem, there is limited highquality real-time or historical data obtained, indicating that the data has low predictive quality [6]. ese setbacks weaken the predictive accuracy and efficiency, limiting the capacity for providing a practical and reliable forecasting result.
Recently, researchers find that some critical roads significantly affect the traffic states of their adjacent roads at a specific road network [7,8]. is conclusion indicates that the traffic state of one road could be predicted based on its neighbors' state [6].
ere have been numerous studies on this topic, but only a tiny percentage of them have paid attention to prediction based on critical road selection optimization. erefore, if we extract the traffic state of the critical road sections with the most dominant predictive power, we could characterize the spatiotemporal features of traffic flow and predict the future traffic state of the overall network.
In this paper, we propose a novel hybrid short-term traffic state prediction method based on critical road selection optimization. Different scores were given to data collected on different road segments. e higher the criticality of the road segment, the higher the score of the data. e critical road selection problem is abstracted as a multiobjective optimization problem, maximizing the sensing coverage and data scores. In other words, our objective is to select the most suitable critical roads to maximize the quality of the service (QoS) with the limited consumption of network resources.
en, a novel critical road selection method is proposed, considering the topological structure and the mobility of the urban road network. Subsequently, the traffic speed of the critical roads is regarded as the input of the convolutional long short-term memory neural network to predict the future traffic states of the entire network. Finally, to demonstrate the effectiveness of the proposed method, the numerical experiments using the traffic states depicted from GPS trajectory data in Beijing. In addition, other traditional machine-learning models are compared to demonstrate the advantage of the proposed method. Practical experiment results showed that the proposed method could precisely predict future traffic network states. e rest of our paper is organized as follows. Section 2 discusses the literature on short-term traffic state prediction and the existing approaches to prediction based on critical roads. In Section 3, the critical road selection optimization model is proposed by selecting an appropriate set of critical roads with the maximum proportion of the total calculation resources to maximize the utility value of the QoS and also an innovative critical road selection method is introduced. Section 4 elaborates on the numerical experiments using the traffic states depicted from GPS trajectory data in Beijing. e last section concludes the study and discusses future work.

Literature Review
Over the past decades, considerable short-term traffic forecast models have emerged to handle the prediction of future traffic states ranging from a few seconds to few hours. ese approaches could be generally categorized into mathematical approaches, statistical approaches, and datadriven approaches.
Mathematical approaches focus on predetermining the model structure by theoretical assumption. e evolution of the traffic state could be simulated by the theoretical mathematical models [9]. Because of the strong dependence on theoretical considerations, these mathematical approaches do not match the actual traffic state condition [10]. Mathematical approaches depend on theoretical mathematical models to simulate the traffic evolution. ere are many famous mathematical models in the past decades, such as the Greenshields model, car-following model, Van Lighthill-Whitham-Richards (LWR) model, and so on. e future traffic state of the road network could be calculated by these models directly [11,12]. Most traffic simulation simulators, such as Q-Paramics, VISSIM, DynaMIT, and DynaSMART-X, predict traffic states based on the traffic flow models, car-following models, and dynamic traffic assignment models [13][14][15][16].
Statistical approaches usually rely on statistical assumptions.
e autoregressive integrated moving average (ARIMA) and its variants are the most common mathematical approaches. Hamed et al. [17] proposed a simple ARIMA to predict the traffic state of the road network. e results showed that the S-ARIMA model could obtain better properties [18,19]. Ding et al. proposed a space-time ARIMA to predict the traffic state of the road network [20].
In addition, the Markov chain, Kalman filter (KF), KFbased approaches, and other approaches also have important applications in short-term traffic prediction [21][22][23][24]. However, these approaches fail to produce favorable results under unstable traffic conditions, such as unexpected events [25].
Different from mathematical approaches and statistical approaches, data-driven approaches rely on a sufficient mass of traffic data. In recent years, due to abundant data attached to extensive traffic sensors and advanced big data technology, data-driven approaches have developed rapidly. Numerous data-driven approaches were put forward for the short time traffic state prediction, such as SVM [26][27][28], neural network [29][30][31][32], and hybrid methods [33,34].
Among these data-driven approaches, deep learning methods have become extremely popular and successful because of their powerful ability to process nonlinear highdimensional problems. Huang et al. employed a deep belief network (DBN) with multitask learning for traffic flow prediction [4]. Lv et al. proposed an SAE model to predict the traffic flow, and the performance is superior to other methods at different prediction horizons [35]. Furthermore, numerous efforts have been devoted to emphasizing the temporal characteristics and spatial dependencies on prediction [2]. Ma et al. predicted the traffic speed of a largescale transportation network using an LSTM neural network and CNN [36,37]. Wu and Tan extracted spatial and temporal features by CNN and LSTM, respectively, and predicted traffic volume combined these two approaches [38]. Wang et al. proposed an eRCNN model then trained the recurrent CNN by reducing predictive feedback errors [39].
In summary, among all these traffic state prediction approaches, deep learning methods stand out as the most potent alternative. However, few prediction methods could achieve satisfactory accuracy because of the missing data problem in reality. For the critical roads that have a 2 Computational Intelligence and Neuroscience significant effect on the traffic states of their respectively adjacent roads at a specific road network, we could extract the traffic state of the critical road sections and predict the future traffic state of the overall network.

Methodology
is research focuses on utilizing data from critical road sections to predict future traffic conditions of the overall urban transportation network. In this section, the utility function of the QoS for the critical roads in a large-scale road network is proposed based on the coverage and the data score. en, the critical road selection optimization in the transportation networks is presented by selecting an appropriate set of critical roads with the maximum proportion of the total calculation resources to maximize the utility value of the QoS. Specifically, the critical road sections are selected by an innovative critical road selection method. Finally, the traffic speed of the critical roads is regarded as the input of the convolutional long short-term memory neural network to predict the future traffic states of the entire network.

Road Network Topology Analysis.
In the practical urban road network, road sections are connected at intersections and eventually form a complex network. When performing sensing tasks in urban road network areas, the topological structure characteristics of the road network need to be considered. e results of numerous studies have shown that the topology of urban road networks is complex and diverse. e failure of connectivity in some sections of the network can impact the traffic operation in the whole network. However, it is inevitable that road connectivity in the network will sometimes be interrupted due to bad weather, technical failures, or serious accidents. is will lead to local traffic congestion and paralysis in the network, which will have harmful economic and social impacts. It means that the higher importance of road sections in the network under different spatial and temporal conditions indicates the more severe damage to the integrity of the road network connectivity brought by their failure. In order to avoid serious impacts on road network connectivity and vehicle movements, priority should be given to the use of high, critically important road sections for future traffic state prediction.
In order to study the structural properties of real road networks, it is necessary to reasonably abstract the road networks into topological structure diagrams, consisting of points and lines by suitable methods. e primal approach and the dual approach are two methods to abstract the road network into a complex network model. e primal approach uses the roads in the road network as edges in the abstract network and the intersections as nodes in the abstract network. e dual approach models the intersections as edges in the network and the roads in the urban road network as nodes, thus showing the interconnections between roads in a highly abstract way. e primal approach is intuitive and simple and can retain the layout characteristics of the road network. e dual approach ignores some geographic significance of network entities, such as the geographic location of roads, length, and width, and therefore, it is more suitable for analyzing abstract network structures. In this paper, we pay more attention to the characteristics of road connectivity, so we use the primal approach to model the road network.

QoS for the Critical Roads.
e QoS is the standard for measuring the performance of the critical roads in a prediction task. e greater the value of QoS, the better the prediction performance of the critical roads. Here, we propose a utility function to calculate the QoS for the critical roads.
e coverage of the selected roads is mainly concerned with people when predicting the traffic state of the whole road network. So, we choose coverage as our first metric. Coverage indicates the coverage of the selected vehicle to the entire road network. For better calculating the coverage C, we divide the road network into small grids, and the grid is used as the basic measuring element for coverage calculation. As shown in Figure 1, the pink curves represent roads.
e grid with roads passing through is painted green. In this paper, the size of the grid is 0.0005°× 0.0005°, based on longitude and latitude, where 0.0005 longitude (latitude) is about 50 m.
Define g j as the single grid in the urban road network. Let f(g j ) represent the coverage state of g j , and f(g j ) � 1 indicates that grid j is selected during the predicting task time; otherwise, f(g j ) � 0. In this paper, we tried to predict the whole network traffic state. If g j belongs to road section l and road section l has φ grids, then the road factors could be calculated as z(g j ) � 1/φ. e following equations could calculate the coverage of the selected grids: (1) On the other hand, we introduce the concept of data score to represent the criticality level of the roads. Specially, we give a higher score to the data collected from the critical roads than the data collected from the ordinary roads. Here, we use the correlations among road sections on both space and time to calculate the data score. e spatial weights matrix represents the spatial dependency among road sections in traffic networks. According to graph theory, the local connectivity of a node can be calculated based on the connection between the node and its adjacent node, which is represented by the degree of node k. Suppose there exists a network, which can be represented as G � (V ′ , E), where V ′ is the set of nodes in the network and E is the set of edges in the network, representing the connections between nodes in the network. e network G � (V ′ , E) has |V ′ | � N nodes and |E| � M edges. en, this network can be represented by the adjacency matrix A M×N . If nodes i and j are connected directly or Computational Intelligence and Neuroscience 3 indirectly, we call these two nodes "kth-order neighbors," and their adjacency relationship could be expressed as equation (2). e spatial weights matrix P could be defined as the sum of the kth spatial weights matrix; element p ij in P is calculated as shown in equation (3).
1, when nodes i and j are directly connected, where K is the highest order of the spatial weights matrix. And, to comprehensively consider the spatial correlation of the road network with the temporal correlation, the spatial weights matrix is introduced as a spatial indicator to improve the initial correlation distance. Let x t i be traffic speed in the road section i at time t, where x t i � (x 1 i , x 2 i , . . . , x T i ). en, the integrated speed values of the adjacent road sections RX t i could be computed by equation (4). en, the data score D i (s) between road section i and its adjacent road sections could be calculated by equation (5): where s � (1, 2, . . . , S) (s > 0) is the time lag between the speeds of road sections i and its adjacent road sections; x i is the mean value of traffic speed x t i in road sections i during the time duration of T. RX i is the mean value of integrated speed RX t i . Based on the previous analysis, mathematical expressions of data score D are defined as follows: e utility function of QoS, U, is defined as where α ∈ [0, 1] is a parameter for tuning the weight of data coverage and data score. To make α more sensitive in equation (7), we apply log 2 C instead of C and ln D instead of D. Since logarithm is a common way to aggregate data, this process can reduce the heteroscedasticity of data in the QoS function.

Definition of the Critical Road Selection
Problem. e goal of the critical road selection model is to maximize the QoS of the urban traffic prediction system with limited critical roads, so the mathematical expression of the critical roads selection problem can be defined as 4 Computational Intelligence and Neuroscience where function U is used to calculate the utility value of the QoS; f(g j ) represents whether grid j is selected, with f(g j ) � 1 indicating that grid j has been selected; otherwise, the value of f(g j ) is 0. F is the set of critical roads, and C is the coverage. D i (s) is the data score between road section i and its adjacent road sections. μ is the threshold for the selection.

e Critical Road Selection
Method. Based on the above analysis, a critical road selection model for traffic state prediction is constructed. In this paper, due to the spatiotemporal state changes of the road network, the selected roads need to be updated at intervals T. An improved greedy algorithm is proposed to solve the critical road selection problems. A greedy algorithm means that we always make the best choice for the current situation when solving an optimization problem. In other words, such an algorithm does not consider the overall situation but rather considers only the local situation. e pseudocode of the proposed algorithm is shown in Algorithm 1.

Traffic State Prediction Using the Deep Learning Approach.
In this paper, a spatiotemporal recurrent convolutional network is proposed for the prediction (STRCN). e proposed STRCN inherits the advantages of deep convolutional neural networks (DCNN) and long short-term memory (LSTM) neural networks. e spatial dependencies of network-wide traffic can be captured by CNN, and the temporal dynamics can be learned by LSTM.

Capturing Spatial Features by CNN. CNN has been
successfully applied to traffic prediction for its great potential in extracting features using multiple layers. A typical CNN mainly comprises multiple convolution layers and pooling layers. e former contributes to mine spatial dependencies of road sections since every layer retrieves a distinct feature using different filters. In comparison, the latter assists in reducing the number of parameters required for training CNN under the premise of ensuring prediction accuracy. Given that the input for CNN could be intuitively regarded as an image with each pixel value associating one kind of traffic state during a certain time, 2D CNN is naturally utilized to abstract spatial features between road sections. Figure 2 illustrates the structure of CNN, including the input layer, convolution layer, pooling layer, fully connected layer, and output layer. Each part plays a unique and vital role for CNN, and the details are briefly explained below.
Suppose that we need to predict the future traffic speed of the network V t+a � v t+a i m i�1 , where a is the prediction horizon and m is the number of road sections. e input of the CNN is the historical traffic speed of critical road sections {U t−n , U t−1 , U t }, where U t−n � v t−n i P α i�1 represents the traffic state of critical road sections at time (t−n), n is the look back step, and P α is the number of critical road sections. en, the spatial features of input are captured by convolutional and pooling layers. Let O l r be the output of lth convolutional and pooling layers with r filters and the weights and bias of lth layers be (W l r , b l r ). en, the O l r could be calculated by the following equation:

Computational Intelligence and Neuroscience
where O l−1 r means the output of the previous layer and O 1 r is exactly the input layer. f is a nonlinear activation function, and pool denotes the pooling procedure.

Capturing Temporal Features by LSTM.
Intuitively, traffic states at each moment have a strict sequential relationship in time dimension rather than isolated from each other, which is especially suitable for RNN to capture the temporal evolution of traffic flow. However, it is difficult for traditional RNN to capture temporal dependency if two time intervals are remote. en, LSTM, one of the specific forms of RNNs, is proposed to tackle these issues by adding memory cells in hidden layers. As shown in Figure 3, four main parts, an input gate, a neuron with a self-recurrent connection, a forget gate, and an output gate, are collaborated to alleviate the problems of traditional RNN caused by the gradient vanish and explosion.
In our model, next to CNN, the LSTM naturally takes the output of CNN V t � x t i R i�1 as its input to predict the future traffic states H t � h t i q i�1 , namely, its output, where q is the number of hidden units of the output layer. For a memory cell, the input states are G t−1 while the output is G t . Meanwhile, the states of input, forget, and output gates are I t , F t , and O t , respectively. e temporal features could be iteratively calculated by the following equations: forget gate:  Input: number of roads: n Number of prediction tasks: m Set of critical roads: F QoS for the critical roads: U Output: set of finally selected roads: if ΔU > 0 then (8) R←R ∩ F i (9) else (10) i � F i ⟶ next (11) end if (12) end for (13) cell input: cell output: hidden layer output: where weights matrices W and bias vectors b are constructed to connect input layer, output layer, and the memory cell, ⊙ denotes the scalar product of two vectors, and σ(·) represents the standard logistics sigmoid function defined as follows:

Training with STRCN.
Integrated with the advantages of CNN and LSTM, the STRCNs is utilized to predict future traffic states by sufficiently exploiting the spatiotemporal characteristics of the data. Eventually, a fully connected layer is employed to predict the future speed by taking the output of LSTM as input. e future speed could be calculated by the following equation: where W y and b y are weight and bias related to the hidden layer. Conclusively, the model is trained from end to end, and the values of Y t+1 are prediction results, which are the output of the entire mode. Several hyperparameters within the model will be set and elaborated in the experiment section. Additionally, it is significant to note that the input size will alter as the number of critical road sections changes due to different extracting rate α, and hence several hyperparameters will change too.

Data Used.
In this section, a case study is conducted to evaluate the performance of the proposed critical road selection optimization model and the traffic prediction method. e main urban road of a subtransportation network of Beijing near West Second Ring Road is selected as the research objective, as shown in Figure 4. e network comprises 278 road sections, including several kinds of hierarchies of roads, such as freeways, arterials, secondary roads, and collectors. e total length of all the roads is approximately 24.53 km, and the network covers around 0.6 km 2 areas.
Data collected by taxis equipped with GPS devices from June 1st, 2015, to August 31st, 2015 (92 days) is utilized for training the proposed model and predicting the future traffic speed of the network. e updating frequency of data is 2 minutes, and a time period ranging from 6 : 00 : 00 to 23 : 00 : 00 is concerned for high travel demand is repeatedly observed. Accounting for the traffic state varies every time interval, we could observe 511 traffic states per day.

Critical Road Selections.
e proposed QoS-based critical road selection method and other two road selection methods (random method and coverage-based method) were used to select the road sections for prediction in the road network. In a random method, the road is selected randomly during the prediction process. In the coveragebased method, the road is selected based on the coverage, which means that roads with higher coverage are preferred for selection. e three critical road selection methods are applied under different roads extracting rate μ (i.e., the proportion of critical road sections to all roads, μ ∈ [10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%]).
Eventually, the correspondence of extracting rate and the number of critical road sections are listed in Table 1. For example, μ � 0.50 means that we will select 139 roads as critical road sections and subsequently use them to predict the traffic states of 278 roads.

Performance between Different Critical Road Selection
Methods.
e root means squared error (RMSE) and root mean squared error proportional (RMSEP) are employed to evaluate the performance of all the models, which could be calculated as in the following equations: where y i is the ith ground-truth value and y i is the ith predicted value. e value of n μ denotes the number of critical road sections at extracting rate μ and N μ is the total number of traffic states. Computational Intelligence and Neuroscience

Performance Using the Different Critical Road Selection
Method. In order to evaluate the performance of the proposed critical road selection optimization model, we train and test the STRCN model using different road selection methods. e RMSEs and RMSEPs in the context of different extracting rates μ are listed in Table 2.
It could be found that in the range of 0.8 to 1.0, the performance of the STRCN model using different road selection methods is almost the same. e reason is that the finally selected road sections by different methods have little difference under the condition of high extracting rate. However, when the extracting rate is between 0.5 and 0.8, the performance of the QoS-based selection method is a little superior to the overall prediction model. In general, the decrease of accuracy is reasonable and within the acceptable limits, which demonstrates the validation and generalization of the approach and the fact that some road sections indeed have less contribution towards prediction.
Additionally, when the extracting rate comes to 0.5, the predictive performance gradually tends to be unstable,    Computational Intelligence and Neuroscience probably because too many roads are omitted. Particularly when the extracting rate is between 0.0 and 0.2, the performance is basically the same. at is because no matter what method is used, there are too many missing road sections to predict. Figure 5 shows that prediction accuracy generally declines as the extracting rate decreases with different critical road selection methods.

Performance between Several DL Algorithms.
As we introduced before, corrupted or missing data generally exist on account of the monitoring equipment failure, extreme weather, data transmission error etc., which weakens the effectiveness of the prediction model or even disables the model. To test the performance of our model under random structural missing data, we stochastically extract a part of   Computational Intelligence and Neuroscience road sections of the rate of μ where the value remains the same as those mentioned above.
Four popular deep learning-based algorithms are selected for comparison, including ANN, CNN, LSTM, and SAE. ANN adopts a plain and shallow structure to process multidimension and nonlinear problems. e parameters of the SAE are set according to [35], which achieves high accuracy in predicting traffic flow. We take 30 min historical traffic speed as the input to predict overall traffic states after 2 min. Table 3 presents the quantitative results of Q-STRCN, ANN, CNN, and LSTM. It could be observed that the result of Q-STRCN outperforms other models, indicating that our model could precisely mine the spatiotemporal features of the data and make a relatively accurate prediction. Among all the rival algorithms, LSTM has the best performance, probably resulting from that the temporal features of the time-series data are essentially prominent. e results of ANN and SAE demonstrate that these two models fail to extract spatiotemporal characteristics that have vital impacts on prediction. Figure 6 shows the quantitative results among different approaches under different extracting rates.

Summary and Conclusions
Structural missing data usually has a massive negative effect on short-term traffic state prediction. In this paper, a novel hybrid short-term traffic state prediction method based on critical road selection optimization is proposed. First, the utility function of the quality of service (QoS) for the critical roads in a large-scale road network is proposed based on the coverage and the data score. en, the critical road selection optimization model in the transportation networks is presented by selecting an appropriate set of critical roads with the maximum proportion of the total calculation resources to maximize the utility value of the QoS. Also, an innovative critical road selection method, which is considering the topological structure and the mobility of the urban road network, is introduced. Subsequently, the traffic speed of the critical roads is regarded as the input of the convolutional long short-term memory neural network to predict the future traffic states of the entire network. Experiment results on the Beijing traffic network indicate that the proposed method outperforms prevailing DL approaches in the cases of considering critical road sections.
However, even the case study showed that the proposed method could significantly improve the QoS of the traffic prediction, there is still a long way to go from practical application for the method. For future studies, the inherent attributes of the road should be taken into account when calculating the QoS for the road network. Besides, the traffic accident, temperature, weather, and other external factors affect the traffic prediction accuracy. All these will be left for our future research.

Data Availability
All data, models, and code generated or used during the study are available in the submitted article.

Conflicts of Interest
e authors declare that they have no conflicts of interest regarding the publication of this study.