Urban Traffic Travel Time Short-Term Prediction Model Based on Spatio-Temporal Feature Extraction



Introduction
The intelligent transportation system (ITS) is currently the most effective technical solution for improving public transportation service and management [1,2]. The successful application of ITS is inseparable from the accurate identification and prediction of urban traffic status, which is usually measured by travel time. The advantage of travel time is that it is easily understood by both road administrators and users [3]. Managers can take precautions and actively intervene in the state of urban traffic at the network level, and users can choose their driving routes according to the released information. Previous studies have shown that using travel time as an indicator can better guide users and bring the peak-period traffic flow closer to a balanced distribution over the road network [4,5]. However, predicting urban road network travel time is complicated and challenging, mainly because of the difficulty of acquiring and analyzing the spatio-temporal characteristics of the road network. Therefore, the prediction of urban traffic conditions has attracted much attention from scholars committed to fully exploring public transportation information, grasping the laws of urban traffic evolution, and providing real-time and accurate traffic information for decision-makers [6]. Initially, traditional statistical models, including the autoregressive integrated moving average (ARIMA) model [7] and the Kalman filter [8], were used to forecast the traffic state. However, these statistical methods rely on linear assumptions about the data, while actual traffic data are usually nonlinear.
Thus, state prediction has gradually shifted from statistical models to data-driven models [9], which mainly include machine learning and deep learning methods. Machine learning algorithms such as artificial neural networks [10], Bayesian networks [11], k-nearest neighbors [12], and support vector machines [13], as well as deep learning models such as deep belief networks [14] and long short-term memory models [15], have achieved excellent results in traffic prediction. Furthermore, prediction approaches based on multi-input multi-output supervised regression (M-SVR and M-KNN, for instance) have emerged in recent years; they strive to capture the spatio-temporal relationship within the regression process [16,17]. Some studies also capture the spatial dependencies of traffic states through graph convolutional networks and learn the dynamic changes of traffic data with recurrent neural networks to capture the temporal dependence of traffic states [18]. Although these models describe the influence of spatio-temporal correlation on traffic state evolution well, detailed explanations of the temporal and spatial feature information are still lacking [19-21].
Because various factors affect the travel time of the road network, urban traffic can be planned and managed well only if these factors are thoroughly understood. To address this point, this study conducts an exploratory analysis of traffic factors to understand the physical mechanisms of the system. It introduces a spatio-temporal approach that relies only on nonlinear methods, namely empirical dynamic modeling (EDM) [22,23] and complex networks (CN) [24]. The method highlights the spatio-temporal features in an interpretable way without restrictive assumptions. EDM has proved promising for testing the nonlinearity of time series and for mining the external driving factors of system evolution [25]. Therefore, EDM can be adopted to analyze the factors influencing travel time and to quantify the causal relationships between these factors. At the same time, the topological characteristics and spatial structure of the urban road network play an irreplaceable role in the analysis of urban traffic characteristics. To reveal this role, the study uses complex network theory to explore the spatial structure characteristics of urban road networks and their impact on urban traffic conditions.
Recently, extreme gradient boosting (XGBoost) [26,27], an efficient implementation of the gradient boosting tree algorithm, has become a promising candidate for short-term forecasting, not only because it can flexibly represent complex nonlinear traffic systems but also because it maps input variables directly to output variables. It also has significant advantages in explaining the features fed into a predictive model. For example, Zhang and Haghani [28] obtained quite good experimental results when applying the gradient boosting tree algorithm to predict the travel time of road sections, and the model was well interpreted. Nevertheless, their paper considered only a few road sections, did not address prediction at the road network level, and did not explain the selection of features in detail. To overcome these shortcomings and make full use of the advantages of XGBoost, this research proposes a hybrid prediction model named EDMCN-XGBoost for spatio-temporal feature extraction and prediction of urban road network travel time. Figure 1 shows the complete pipeline of our proposed approach, consisting of four interacting steps. First, data preprocessing is conducted to transform the raw data into a usable format. Second, the complete EDM mining process is carried out on road segment data to explore the temporal dependence and influencing factors of travel time. Third, complex network theory is used to extract the spatial statistical features of the traffic network. Finally, the linkages between the spatio-temporal features and the corresponding travel time of the road network are built using XGBoost, and key features are identified by importance ranking and recursive elimination. Afterwards, we establish the XGBoost predictive model.

Empirical Dynamic Modeling.
Due to the stochastic and nonlinear nature of actual traffic conditions, linear methods are no longer suitable for the study of urban traffic data, and the application of nonparametric methods to traffic data mining has gained more attention. Empirical dynamic modeling (EDM) is a data-driven nonparametric modeling framework for nonlinear dynamic systems, including simplex projection, S-map, multivariate embedding, convergent cross-mapping (CCM), and multiview embedding [22, 29-33]. In contrast to the parametric equation fitting used in many studies, EDM relies on data to infer the operating mechanism of the dynamic system and reveal the relationships among variables, because the governing equations are unknown. The main idea of EDM is to regard time series as projections of the behavior of a complex dynamic system. The state of the system is described as a point in a high-dimensional space whose axes can be regarded as the fundamental state variables; the set of states traced out over time is known as an attractor. The traffic system offers an analogy. The travel time variable is a projection of the high-dimensional state of the complex traffic system, and the coordinate axes are the primary state variables, such as the lagged variables, the speed variable, and the indicator variables. In general, many state variables in complex systems are unobservable, but the evolution of some systems can be recovered using time series alone. According to Takens' theorem [33], although the behavior of a system is nominally determined by the high-dimensional state space, the unknown variables can be replaced by lags of an observed variable; that is, the attractor can be reconstructed using the lagged variables.
Therefore, this paper studies the dimension and nonlinearity of the time series using EDM quantitative analysis. Based on the dimension of the nonlinear system, the causal relationship between other external primary state variables and the target variable is determined by the optimal lagged variables of the time series and the EDM convergent cross-mapping test. The univariate time series of the target variable (travel time) is analyzed and tested to find its optimal dimension and degree of nonlinearity. The optimal dimension (E) of the target variable, determined by simplex projection, is defined as the number of lags that maximizes the prediction ability. When using the simplex projection method, the time series is divided into two parts, one of which is used as a library to predict the remaining part. It is noteworthy that the simplex projection method does not divide the training set and the prediction set, and the prediction is performed outside the sample. Simplex projection is a nonparametric analysis method in the state space of dynamic systems that obtains its predictions by projecting forward the nearest neighbors of the points to be predicted.
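The nonlinearity of the dynamic system can be quantified by the S-map method (S-map stands for sequentially locally weighted global linear map), which yields the nonlinear parameter (θ) that measures the state dependence of the nonlinear system. When θ = 0, all library points are weighted equally and the S-map reduces to a global linear model; when θ > 0, nearby states receive larger weights, so an improvement in forecast skill for θ > 0 indicates nonlinear dynamics.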
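To make the choice of the embedding dimension E concrete, the following minimal sketch implements one-step-ahead simplex projection in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the function names (`embed`, `simplex_skill`, `logistic_series`), the library/prediction split, and the chaotic logistic map used as toy data are all hypothetical stand-ins for the travel time series.

```python
import math

def logistic_series(n, r=3.8, x0=0.4):
    """Toy chaotic series (assumption: stands in for a normalized travel time series)."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def embed(series, E):
    """Time-delay embedding: the vector for time t is (x_t, x_{t-1}, ..., x_{t-E+1})."""
    return [tuple(series[t - k] for k in range(E)) for t in range(E - 1, len(series))]

def pearson(a, b):
    """Pearson correlation coefficient, used here as the forecast skill rho."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def simplex_skill(series, E, split):
    """Forecast skill of one-step-ahead simplex projection for embedding dimension E.

    Points before `split` form the library; points from `split` onward are predicted.
    """
    vectors = embed(series, E)
    offset = E - 1  # vectors[i] belongs to time index i + offset
    lib = [i for i in range(len(vectors)) if i + offset + 1 < split]
    pred = [i for i in range(len(vectors))
            if i + offset >= split and i + offset + 1 < len(series)]
    observed, predicted = [], []
    for i in pred:
        # E + 1 nearest library neighbours form the simplex around the target state
        nearest = sorted((math.dist(vectors[i], vectors[j]), j) for j in lib)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        # Weighted average of where each neighbour went one step later
        forecast = sum(w * series[j + offset + 1]
                       for w, (_, j) in zip(weights, nearest)) / sum(weights)
        predicted.append(forecast)
        observed.append(series[i + offset + 1])
    return pearson(observed, predicted)

series = logistic_series(300)
for E in range(1, 7):
    print(E, round(simplex_skill(series, E, split=200), 3))
```

The dimension E with the highest printed skill would be taken as the optimal embedding dimension. An S-map analysis would follow the same pattern but fit a locally weighted linear model instead of averaging neighbors.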
(2) Causality Analysis. EDM can be used to reveal the relationships between different variables. According to Takens' theorem, the attractor $X'_Z$ reconstructed from a single variable Z is a shadow version of the original multivariate attractor X that preserves its topology. Therefore, the reconstructed shadow attractor maps one-to-one onto X. Furthermore, when two variables are causally related, their reconstructed attractors (e.g., $X'_Z$ and $X'_P$) also map one-to-one onto each other [23]. Based on this principle and the simplex projection method, Sugihara et al. [30] proposed the convergent cross-mapping (CCM) method for testing the causal relationship between a pair of variables in a dynamic system. If two variables belong to the same dynamic system and are causally linked, the cross-mapping between them must converge. Convergence means that the cross-mapping skill (ρ) increases with the size of the library: the more data in the library, the denser the reconstructed attractor, and the higher-resolution attractor improves the accuracy of neighbor-based prediction (i.e., simplex projection).
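The convergence idea behind CCM can be sketched as follows. This is a simplified illustration, not the procedure used in this study: the coupled logistic maps (in which X drives Y) are toy data with textbook-style parameters, the function names are hypothetical, and the library is simply taken as the first `lib_size` points rather than resampled.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient (cross-mapping skill rho)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def coupled_logistic(n, x0=0.4, y0=0.2):
    """Toy system (assumption): X influences Y more strongly than Y influences X."""
    xs, ys = [x0], [y0]
    for _ in range(n - 1):
        x, y = xs[-1], ys[-1]
        xs.append(x * (3.8 - 3.8 * x - 0.02 * y))
        ys.append(y * (3.5 - 3.5 * y - 0.1 * x))
    return xs, ys

def cross_map_skill(source, target, E, lib_size, n_test=100):
    """Estimate `target` from the shadow attractor reconstructed from `source`.

    Skill that rises with lib_size suggests that `target` causally affects `source`,
    because the driver leaves a recoverable signature in the driven variable.
    """
    start = E - 1
    vec = {t: tuple(source[t - k] for k in range(E)) for t in range(start, len(source))}
    lib = list(range(start, start + lib_size))
    test = list(range(len(source) - n_test, len(source)))
    estimated, observed = [], []
    for t in test:
        nearest = sorted((math.dist(vec[t], vec[j]), j) for j in lib)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        # Contemporaneous target values of the neighbours give the cross-map estimate
        estimated.append(sum(w * target[j]
                             for w, (_, j) in zip(weights, nearest)) / sum(weights))
        observed.append(target[t])
    return pearson(observed, estimated)

xs, ys = coupled_logistic(1200)
xs, ys = xs[200:], ys[200:]  # drop the initial transient
for lib_size in (10, 100, 800):
    print(lib_size, round(cross_map_skill(ys, xs, E=2, lib_size=lib_size), 3))
```

Here X is estimated from the attractor of Y, which tests whether X drives Y. Note that the direction of cross-mapping is opposite to the direction of causation.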

Complex Networks.
Originating from graph theory, complex network theory is a tool for analyzing large-scale complex systems in the real world. The urban traffic state has spatial autocorrelation characteristics, which are affected by the spatial structure and topology features of road networks. Previous studies have identified the prominent impact of network spatial features and hierarchical features [34,35] on network traffic flow. Therefore, this paper mainly applies complex network theory to mine the topological and structural characteristics of urban traffic networks. For details, please refer to Section 4.3.

[Figure 1 (pipeline of the proposed approach). Step 1: data preprocessing. Step 2: temporal dependence and quantitative analysis of influencing factors. Step 3: spatial dependence and quantitative analysis using complex network. Step 4: train model and forecast. Output.]

Extreme Gradient Boosting Prediction Model.
XGBoost [27], a tree-based ensemble learning algorithm with parallel processing, is nonparametric and can deal with complex nonlinear relationships between features. Compared to other algorithms, XGBoost offers high interpretability, predictive accuracy, and computational speed. The idea of the algorithm is to keep adding trees, each grown by repeated feature splitting; every time a tree is added, a new function is learned to fit the residual of the previous prediction. To predict the score of a sample, K trees are first obtained from a training dataset. According to its features, the sample falls into one leaf node in each tree, and each leaf node corresponds to a score. Finally, the scores from all trees are added to obtain the predicted value of the sample. The tree ensemble model of XGBoost is essentially a set of classification and regression trees (CARTs). Each tree (CART) is a decision model f(·). To be precise, the XGBoost algorithm uses K decision tree models to produce outputs from the input $x_i$ and sums the K outputs to obtain the prediction $\hat{y}_i$. The set of decision tree models is called F.
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F,$$

where F is the set of K decision tree models, $F = \{f_1(\cdot), f_2(\cdot), \ldots, f_K(\cdot)\}$; $f_k(\cdot)$ represents the input/output function of the kth regression tree and corresponds to the structure q and leaf node weights w of that tree; and $w_i$ represents the score of the ith leaf node. The first few steps of fitting the predicted value are

$$\hat{y}_i^{(0)} = 0, \quad \hat{y}_i^{(1)} = \hat{y}_i^{(0)} + f_1(x_i), \quad \hat{y}_i^{(2)} = \hat{y}_i^{(1)} + f_2(x_i). \tag{1}$$

Therefore, the tth iteration can be written as

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i). \tag{2}$$

The objective function of the prediction model to be optimized at iteration t is

$$\mathrm{obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \tag{3}$$

where n is the number of samples; $\hat{y}_i^{(t-1)}$ represents the prediction for sample i at iteration t − 1; $l(\cdot, \cdot)$ is the training loss function; and Ω is the regularization term. The regularization term is

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2. \tag{4}$$

Equation (4) is the regularization term for the decision tree function f, where γ and λ are regularization coefficients, corresponding to gamma and lambda in XGBoost's parameters; T represents the number of leaf nodes of the decision tree function f; and $\lVert w \rVert^2 = \sum_{j=1}^{T} w_j^2$ is the sum of the squared outputs of all leaf nodes of f.
Equation (4) controls the variance of the fit so that the learned model generalizes better to unseen data. By penalizing the complexity of the prediction model, it helps avoid overfitting the training data.
As can be seen from the above, each newly generated tree is a new mapping that fits the residual of the previous prediction, so after t trees have been generated, the predicted value can be written as $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, and the objective function takes the form of equation (3).
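To minimize the objective function, XGBoost approximates it with a second-order Taylor expansion, so that the objective function can be approximated as

$$\mathrm{obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t), \tag{5}$$

where $g_i = \partial_{\hat{y}_i^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the first derivative of the loss and $h_i = \partial^2_{\hat{y}_i^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the second derivative. Since the prediction score of the first t − 1 trees is a constant that does not affect the optimization of the objective function, it can be neglected, so that the objective function simplifies to

$$\mathrm{obj}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t). \tag{6}$$

Equation (6) sums the loss function values over the samples. As described above, each sample eventually falls into one leaf node, so all samples in the same leaf node can be grouped. Let $I_j$ denote the set of samples in leaf j, for which $f_t(x_i) = w_j$. Then

$$\mathrm{obj}^{(t)} \approx \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} \left(H_j + \lambda\right) w_j^2 \right] + \gamma T. \tag{7}$$

The objective function is thus a quadratic function of each leaf node score $w_j$, and the optimal $w_j$ and the corresponding objective value can be obtained directly from the vertex formula of the quadratic function:

$$w_j^{*} = -\frac{G_j}{H_j + \lambda}, \quad \mathrm{obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T, \tag{8}$$

where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$. As in earlier boosting tree models, the trees in XGBoost are still added one after another, because each tree fits the residual of the previous ones; the parallelism lies in the construction of each individual tree. When XGBoost fits a model, the cores of the computer CPU are used to build the tree itself in parallel, which increases the speed of calculation. For more details on the XGBoost model, see [27].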
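The closed-form leaf score can be checked with a small numeric example. The sketch below assumes the squared loss $l = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i^{(t-1)} - y_i$ and $h_i = 1$; the function names and the toy numbers are illustrative only and are not taken from the study.

```python
def squared_loss_gradients(y_true, y_prev):
    """First and second derivatives of l = 0.5 * (y - y_hat)^2 at the previous prediction."""
    g = [p - y for y, p in zip(y_true, y_prev)]
    h = [1.0] * len(y_true)
    return g, h

def optimal_leaf_weight(g, h, lam):
    """w* = -G / (H + lambda) for the samples that fall into one leaf."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """obj* = -0.5 * sum_j G_j^2 / (H_j + lambda) + gamma * T over all leaves."""
    total = sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)
    return -0.5 * total + gamma * len(leaves)

# Three samples in a single leaf; the previous trees predicted 11 for each of them.
y_true = [10.0, 12.0, 14.0]
y_prev = [11.0, 11.0, 11.0]
g, h = squared_loss_gradients(y_true, y_prev)

w = optimal_leaf_weight(g, h, lam=1.0)
score = structure_score([(g, h)], lam=1.0, gamma=0.0)
print(w, score)
```

With G = -3, H = 3, and λ = 1, the leaf score is 3/4: the new tree moves the prediction toward the mean residual of 1, shrunk by the regularization term.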

Data Description
The dataset used in this study comes from the Alibaba Cloud Tianchi dataset platform. It consists of anonymized real data from a specific area of Guiyang and provides the attributes of urban road segments and the travel time of each segment during the historical period (April 2017). The research dataset covers 132 roads and includes the average travel time of motor vehicles passing each road section. The periods covered are 6:00-9:00, 14:00-16:00, and 17:00-19:00 in April 2017, and the time interval is 2 minutes. It should be noted that the original data are not aggregated. Some attributes of the road network are shown in Figure 2.

Data Preprocessing.
Statistical analysis of the dataset shows that the target variable follows a long-tailed distribution, indicating that the overall quality of the dataset is good but that a small number of unreasonable extreme values remain. To eliminate the influence of abnormal data, noise reduction is first applied to the data. Next, the target variable is log-transformed; after the transformation, it is approximately normally distributed, which makes it more suitable as input for fitting the prediction model.
Another problem with the dataset is the existence of missing values. For April 2017, 1,069,200 sample records are expected, but only 973,978 were actually recorded, so 95,222 records are missing. Although XGBoost handles missing values well, in time-series data the values at the preceding moments have a significant impact on the values at the following moments. Therefore, to minimize the impact of missing values, this study uses the trend of the data to fill in the missing data.
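The two preprocessing steps can be sketched as follows. The text does not specify how the trend is used to fill gaps, so linear interpolation between the nearest observed neighbors is an assumption made here for illustration, as are the function names and the order of the two steps.

```python
import math

def fill_missing(values):
    """Fill None entries by linear interpolation between the nearest observed values.

    Assumption: linear interpolation stands in for the trend-based completion;
    gaps at either end are filled with the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one observed value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            step = (values[right] - values[left]) * (i - left) / (right - left)
            filled[i] = values[left] + step
    return filled

def log_transform(values):
    """Natural log of strictly positive travel times (seconds)."""
    return [math.log(v) for v in values]

def inverse_log_transform(values):
    """Restore log-transformed values to the original scale."""
    return [math.exp(v) for v in values]

travel_time = [10.0, None, None, 16.0, 15.0, None]  # toy two-minute slices
complete = fill_missing(travel_time)
print(complete)
print(log_transform(complete))
```

Filling a gap from both neighbors uses future observations, which is acceptable for historical training data but would not be available in real-time operation.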

Road Network Topology.
The topological characteristics and spatial structure of the urban road network play an irreplaceable role in the analysis of urban traffic characteristics. In this study, complex network theory is employed to explore the spatial structure characteristics of urban road networks and their impact on urban traffic conditions. The road network studied in this paper includes 132 road sections and their upstream and downstream relationships. The dataset does not contain the geographic coordinates of the real road sections and only provides the adjacency relations of the 132 road sections.
The topology of the road network is expressed according to the adjacency relations of the road sections [36,37], as shown in Figure 3.

Basic Spatio-Temporal Characteristics.
The attributes of the road network dataset include the link-ID of each road section, the travel time in each two-minute slice, and the relationship between the upstream and downstream sections. A large number of basic spatio-temporal features can be extracted from the raw data, as shown in Table 1. Features are values derived from raw data and used as input to a machine learning algorithm. High-quality features (e.g., informative, relevant, interpretable, and nonredundant) are the basis for modeling and problem-solving and for generating reliable and convincing results. Because urban road network travel time has spatio-temporal characteristics, this study focuses on two kinds of features: time-related features and spatial correlation features. First, the EDM method is used to analyze the basic time-series features quantitatively. Then, complex network theory is used to mine the urban road network for its spatial characteristics.

Quantitative Analysis by Empirical Dynamic Modeling.
The results of the simplex projection show that the optimal embedding dimension (E) in the univariate analysis of the road segment travel time series is six (see Figure 4(a)). The S-map nonlinearity test shows that the travel time series exhibits relatively high nonlinearity at θ = 2 (see Figure 4(b)). Forecast skill (ρ) is the Pearson correlation coefficient between the observed values and the values predicted by the EDM method, and it is the quantity maximized when selecting E and θ. These results indicate that it is reasonable to use the EDM method to process and analyze the data. Note that before applying empirical dynamic modeling, the dataset needs to be normalized to eliminate the influence of different data units.
[...] considered as the predictor coordinates for constructing the model. Specifically, the delay effect of the target variable itself is used as a predictor, and the "block_lnlp" function in EDM can be used to show whether the lagged effect of the target variable alone can fully explain the complex system dynamics of the target variable. The result in Figure 5 shows that the lag coordinates of travel time alone cannot fully reveal its complex dynamic evolution. After the external indicator driving factor is added, the prediction skill of the model is not significantly improved. This indicates that the evolution of the travel time series is a highly nonlinear complex phenomenon and that the lagged values of the target variable are crucial driving factors, more important than external driving factors such as vacation. The driving factors affecting travel time evolution are diverse and complex, so restoring the complex dynamic behavior of the travel time series as fully as possible requires finding more external driving factors.

Nonlinear Causality Test.
The above analysis shows that the evolution of urban road network travel time is a highly nonlinear complex phenomenon. According to the results of the EDM multivariate analysis, the five important driving factors of travel time evolution can be preliminarily determined as $TT_{t-1}$, $TT_{t-2}$, $TT_{t-3}$, $TT_{t-4}$, and $TT_{t-5}$. On this basis, the nonlinear causality (CCM) test is applied to identify which of the time series variables, space-time time-shifting features, time-indicating features, and state-indicating features of the target road segments drive travel time. Sugihara et al. [29,30] developed a cross-mapping algorithm to test the causation between a pair of variables in dynamic systems. If there is a causal relationship between two variables, the cross-mapping between them shall be "convergent." Convergence means that the cross-mapping skill (ρ) improves with increasing library size [23]. For a typical road segment, the causal association between the target variable and the following features is tested: Minute, Hour, Dayofweek, Dayofyear, Vacation, Average_speed, out_link_lagging, $T_1$, etc. It should be noted that the black-marked numbers in Table 2 identify the causal variables. The detailed quantitative analysis results are shown in Table 2 and Figure 6.
Note that the direction of cross-mapping is opposite to the direction of cause and effect in Figure 6. According to the cross-convergence test of some external driving factors, the rise in cross-mapping skill for the external factors is not large. Therefore, based on the quantitative analysis results, there is no strong causal relationship between these features and the target variable, only a medium or weak one. The first and last panels of Figure 6 are of particular interest. For some road segments, the CCM test does not identify the space-time time-shifting feature in_link_lagging as having a strong, or even a weak, causal association. By contrast, the other space-time time-shifting feature, out_link_lagging, is identified as causally connected. One possible explanation is that the time step is rather long relative to urban traffic dynamics, so that no relationship between the upstream section and the target section is statistically highlighted. From the perspective of urban traffic congestion, the causal impact of the downstream section on the target section follows the law of traffic jam propagation. Table 2 also shows that a linear relationship cannot be taken as the basis for judging a causal connection.

Complex Traffic Network Spatial Characteristics.
To further reveal the spatial dependence of the road network, it is necessary to extract more valuable information from the perspective of spatial characteristics. From the raw dataset, we can extract some essential spatial characteristics, namely the length, width, area, and link-ID of the road segments, but these four features are only primary attributes of a road segment and do not involve the spatial correlation between road segments. Therefore, this study views the traffic network as a complex network in order to mine useful features and obtain quantitative statistical characteristics of the complex traffic network, such as degree, in-degree, out-degree, closeness centrality, node importance, betweenness centrality, degree centrality, and PageRank. To explore the spatial aggregation behavior of different road segments, community structure is introduced into the complex traffic network. Some of the spatial features are explained in detail below.
Journal of Advanced Transportation

Complex Traffic Network Characteristic Parameters.
Complex networks have attracted wide attention from traffic scholars. A large number of studies have found that transportation networks share the structural characteristics of complex networks with social networks and computer networks. Therefore, the road network of an area in Guiyang city, modeled by the dual method, is taken as the research object [37], and complex network theory is adopted to study the interaction between the structural characteristics and the topology of the traffic network. The characteristic parameters are described below.
(1) Degree of node. The degree of a node is the number of nodes directly connected to it. In a directed graph, the degree is divided into out-degree (out_degree) and in-degree (in_degree), which indicate, to some extent, the connection intensity of the road segment in the road network. Out-degree is the number of road segments connected downstream of the road segment, and in-degree is the number of road segments connected upstream. Thus, the higher the degree (in or out), the more prominent the connecting role of the node in the road network.
(2) Closeness centrality. The closeness centrality of a node measures its proximity to all other nodes in the network. The larger the closeness centrality, the more central the node is in the road network and the faster it can reach other nodes. It is therefore used to emphasize the value of different nodes in a traffic network. In its standard normalized form, it is calculated as

$$C_i = \frac{N - 1}{\sum_{j \neq i} d_{ij}},$$

where $d_{ij}$ is the shortest distance from node i to node j and N is the number of road segments.
(3) Node importance ordering. Typical node importance ranking methods are the PageRank, LeaderRank, and HITS algorithms, among which PageRank [38] is the core ranking algorithm of Google search. The main idea of the PageRank algorithm is that if the quality of webpage A is high and page A points to page B, then the quality of page B is also high. Because actual page links are much more complex, iterations are required to obtain the final page ranking. Mapped to the complex traffic network, each road segment obtains a quantified PageRank value (referred to as PR) after repeated iterations. The specific calculation steps mainly follow [34].
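The three parameters above can be computed on a small directed graph as in the sketch below. The four-edge toy network, the function names, the damping factor of 0.85, and the normalized closeness form (N − 1 divided by the sum of shortest distances) are assumptions for illustration; the road network in the study has 132 nodes and is not reproduced here.

```python
from collections import deque

# Toy dual graph: each key is a road segment, each value lists its downstream segments.
road_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def degrees(graph):
    """Return (in_degree, out_degree) dictionaries for a directed graph."""
    out_deg = {node: len(nbrs) for node, nbrs in graph.items()}
    in_deg = {node: 0 for node in graph}
    for nbrs in graph.values():
        for target in nbrs:
            in_deg[target] += 1
    return in_deg, out_deg

def closeness(graph, source):
    """(N - 1) / sum of shortest hop distances from `source` to every other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    if len(dist) < len(graph) or total == 0:
        return 0.0  # some node is unreachable; treated as zero closeness in this sketch
    return (len(graph) - 1) / total

def pagerank(graph, damping=0.85, iterations=100):
    """Power iteration; rank of nodes without outgoing links is spread uniformly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in graph if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in graph}
        for node, nbrs in graph.items():
            for target in nbrs:
                new_rank[target] += damping * rank[node] / len(nbrs)
        rank = new_rank
    return rank

in_deg, out_deg = degrees(road_graph)
pr = pagerank(road_graph)
for node in road_graph:
    print(node, in_deg[node], out_deg[node],
          round(closeness(road_graph, node), 3), round(pr[node], 3))
```

In this toy network, segment C receives flow from both A and B, so it has the largest in-degree and the highest PageRank, while A reaches every other segment in one hop and has the largest closeness.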

Complex Networks Community Detection.
Community detection, also known as community discovery, is a way to reveal the aggregation behavior of a network. Commonly used community discovery algorithms include Louvain, label propagation, and infomap. After comprehensive comparison, this study adopts the infomap algorithm to identify community clusters in the road network topology; the specific calculation steps mainly follow [31].
Taking the complex traffic network topology (dual-map structure) of a certain place in Guiyang city as input, the community division result of the traffic network is obtained in Figure 7, where nodes of the same color represent the same community. Each node represents a specific road segment, and each arrow indicates the upstream and downstream relationship of adjacent nodes.

Feature Extraction Results.
The quantitative analysis characteristics of each road segment are obtained by the EDM method, and a large number of spatial statistical features of the urban traffic network are obtained through complex network theory. These steps mine and quantitatively describe the spatio-temporal features in depth, which greatly enhances the interpretability and richness of the features. However, the quantitative EDM results are not exactly the same for every road segment. If the analysis result of each road segment were taken as a feature input for the road network, it would cause considerable information redundancy and lead to conflicting results. Therefore, it is necessary to extract the road network characteristics that hold for most road segments. The road network characteristics extracted through the complex network are plentiful and specific, but they are also partly redundant. It is necessary to streamline the complex network spatial features and to seek the causality between those features and the target variable.
The relationship between the spatio-temporal features and the travel time of road segments is established by XGBoost, and redundant variables are removed by its feature importance ranking and by recursive elimination. Table 3 shows the road network spatio-temporal features finally determined for this study.

Analysis of Experimental Results
The travel time prediction of the traffic network is a critical step in constructing an urban intelligent transportation system. According to the above empirical dynamic modeling and quantification results, the evolution of travel time in an urban road network is highly nonlinear and therefore difficult to predict with purely mechanistic equations. Thus, to accurately forecast the evolution of complex time series, it is necessary to extract the key driving factors of system evolution from the data itself. Given the broad prospects of data-driven methods in traffic prediction in recent years and the considerable advantage of parallel tree ensemble models in handling large numbers of features and the relationships between them, this study adopts XGBoost as the predictive model.

Dataset Description and Division.
In the experiment, 80% of the data is used as the training set and 20% as the test set. In other words, the data from April 1 to April 24, 2017, are taken as the training set to predict the travel time in the traffic network from April 25 to 30, 2017. The main target periods in this study are the morning peak (8:00-9:00), the noon stable period (15:00-16:00), and the evening peak (18:00-19:00). The topology of the road network is detailed in Section 3.2. Table 4 shows the model input and output: the first 21 lines ($f_0$ to $f_{21}$) in the table are the feature data used as input, and the last line is the output value of the XGBoost model. It should be pointed out that some of the input data are log-transformed and that the output data are log-transformed as well. Therefore, when the prediction results are verified, the transformed data should be restored to the original scale.
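A minimal sketch of the chronological split and of building the lagged inputs $TT_{t-1}$ to $TT_{t-5}$ is given below. The record layout (a dictionary with a `day` field) and the function names are hypothetical; only the split date and the number of lags come from the text.

```python
def make_lag_rows(series, n_lags=5):
    """Pair each value with its n_lags predecessors: ([TT_{t-1}, ..., TT_{t-n}], TT_t)."""
    rows = []
    for t in range(n_lags, len(series)):
        features = [series[t - k] for k in range(1, n_lags + 1)]
        rows.append((features, series[t]))
    return rows

def chronological_split(records, last_train_day=24):
    """Train on days 1..last_train_day of April, test on the remaining days."""
    train = [r for r in records if r["day"] <= last_train_day]
    test = [r for r in records if r["day"] > last_train_day]
    return train, test

# Toy records: one entry per day of April 2017 (hypothetical layout).
records = [{"day": d, "travel_time": 60.0 + d} for d in range(1, 31)]
train, test = chronological_split(records)
print(len(train), len(test), len(train) / len(records))

print(make_lag_rows([1, 2, 3, 4, 5, 6, 7]))
```

Splitting by date rather than at random keeps future observations out of the training set. Because the data cover only certain hours of each day, lag rows would in practice be built within each continuous period so that a lag never spans a gap between periods.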

Model Optimization.
In a machine learning model, data quality is the primary determinant of prediction accuracy, but different parameter combinations also affect the prediction model. Therefore, the model is run and compared several times with different parameter combinations, and the evaluation index used in the comparison is the root mean square error (RMSE). The optimal parameter combination derived for the research dataset is shown in Table 5.
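Because the model is trained on log-transformed travel times, the RMSE used to compare parameter combinations is most informative when computed after restoring the original scale. The sketch below illustrates this under the assumption of a natural-log transform; the function names and numbers are illustrative and not taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def rmse_original_scale(log_true, log_pred):
    """Undo the log transform first, then compute the RMSE in seconds."""
    return rmse([math.exp(v) for v in log_true], [math.exp(v) for v in log_pred])

# Toy example: true travel times of 100 s and 200 s, predictions of 110 s and 190 s.
log_true = [math.log(100.0), math.log(200.0)]
log_pred = [math.log(110.0), math.log(190.0)]

print(rmse(log_true, log_pred))                 # error on the log scale
print(rmse_original_scale(log_true, log_pred))  # error in seconds
```

Each candidate parameter combination would be scored with the same function on the same held-out data, and the combination with the lowest RMSE would be retained.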

Model Interpretation.
It is well known that the prediction accuracy of the XGBoost model depends heavily on the predictor variables (i.e., the feature vectors). To further analyze the influence of the EDMCN feature variables on the model output (the response variable), the feature importance is obtained from the trained XGBoost model (see Figure 8).
Figure 8 shows that $TT_{t-1}$ is the most critical feature variable, indicating that the value at the moment immediately before the prediction point has the greatest influence on the next point. This is consistent with common sense, and the performance of the other lagged variables demonstrates that the lagged variables selected as model input are also reasonable. The result shows that the evolution of the travel time series is highly time-dependent, and this time dependence deserves close attention in urban traffic management. Average_speed is also a vital feature, so attention should be paid to the driving speed of road sections in urban traffic management. Surprisingly, the $T_1$ feature performs well. This suggests that the first values observed before the start of the forecast period deserve attention before making predictions, as they may be significant.
At the same time, the spatio-temporal time-shifted variables out_link1_lag1 and out_link1_lag2, which are the spatio-temporal variables that deserve the closest attention, also stand out, which is in line with the law of congestion propagation on roads. Among the time indication features, Minute performs best, which indicates that fine time granularity deserves attention in the management and control of urban traffic. Among the spatial indication features, link_ID and length are the most prominent, which means that each road needs careful management and detailed analysis of its characteristics in urban traffic management. Among the complex network features, closeness_centrality, PageRank, and infomap perform better. This means that the evolution of travel time in the urban road network is spatially dependent. Therefore, urban traffic management and control need to consider the specific characteristics of each road and the spatial connection between roads. Unexpectedly, in_degree has the worst performance in this study.
Figure 9 shows the comparison between the actual travel times of the 132 roads in Guiyang and the travel times predicted by the EDMCN-XGBoost model from April 25 to April 30, 2017. The blue line represents the actual travel time, and the crosses represent the travel time predicted by the model. The morning peak, noon stable period, and evening peak marked in the figure denote the time periods of urban commuting and do not indicate the actual traffic state of the road. On the horizontal axis, 0-29 represents the morning peak (8:00-9:00), 30-59 represents the noon stable period (15:00-16:00), and 60-89 represents the evening peak (18:00-19:00). Each unit on the horizontal axis denotes two minutes.
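The mapping from the horizontal axis index of Figure 9 to clock time follows directly from the three one-hour periods and the two-minute step. The sketch below illustrates that arithmetic; the function name is hypothetical, and it assumes that each index marks the start of its two-minute interval.

```python
# Start hour of each 30-step block on the horizontal axis, as described in the text.
PERIODS = [
    ("morning peak", 8),
    ("noon stable period", 15),
    ("evening peak", 18),
]
STEPS_PER_PERIOD = 30  # 60 minutes / 2 minutes per step
MINUTES_PER_STEP = 2


def axis_index_to_time(index):
    """Return (period name, 'HH:MM' start of the 2-minute interval) for an axis index."""
    if not 0 <= index < STEPS_PER_PERIOD * len(PERIODS):
        raise ValueError("index must be between 0 and 89")
    name, start_hour = PERIODS[index // STEPS_PER_PERIOD]
    minute = (index % STEPS_PER_PERIOD) * MINUTES_PER_STEP
    return name, f"{start_hour:02d}:{minute:02d}"


print(axis_index_to_time(0))   # ('morning peak', '08:00')
print(axis_index_to_time(45))  # ('noon stable period', '15:30')
print(axis_index_to_time(89))  # ('evening peak', '18:58')
```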
The vertical axis in Figure 9 is measured in seconds to reflect the real-time variation of the traffic state. It can be seen that the overall performance of XGBoost is quite good, and the predictions are effective for different roads in different periods. It is worth noting that some roads in the network do not strictly follow the morning and evening peak travel patterns, and holiday travel patterns also differ from the usual ones. Although the travel time patterns of different road segments differ across periods, the model can still capture the spatio-temporal dynamics of travel time and predict it accurately.

Model Comparison.
To test the validity of the EDMCN-XGBoost model, the widely accepted traffic prediction baseline models ARIMA [8], ANN [10], and SVR [13] and some hybrid models are comprehensively evaluated, and four classical evaluation indices, MAPE, MAE, RMSE, and R², are used to evaluate the prediction models.

Figure 9: The prediction effect of the EDMCN-XGBoost model.

Journal of Advanced Transportation
The mean absolute percentage error:

$$\mathrm{MAPE} = \frac{1}{Q}\sum_{i=1}^{Q}\left|\frac{t_i - \hat{t}_i}{t_i}\right| \times 100\%$$

The mean absolute error:

$$\mathrm{MAE} = \frac{1}{Q}\sum_{i=1}^{Q}\left|t_i - \hat{t}_i\right|$$

The root mean square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{Q}\sum_{i=1}^{Q}\left(t_i - \hat{t}_i\right)^2}$$

The coefficient of determination:

$$R^2 = 1 - \frac{\sum_{i=1}^{Q}\left(t_i - \hat{t}_i\right)^2}{\sum_{i=1}^{Q}\left(t_i - \bar{t}\right)^2}$$

where $t_i$ represents the actual value; $\hat{t}_i$ is the predicted value; $\bar{t}$ represents the mean of the actual values in the test set; and $Q$ represents the number of samples in the test set.

In this study, data from the first 24 days of April (see Section 5.1) were selected as the training set, and data from the last six days of April were used as the verification set. To thoroughly verify the validity of the EDMCN-XGBoost model, it was compared with other prediction models in both the vertical (longitudinal) and horizontal directions, and the inputs of the different prediction models were chosen to keep the comparison fair. In the vertical comparison, the (p, d, q) parameters obtained by optimizing the ARIMA model on the dataset are (4, 0, 2), and the inputs to the ANN and SVR models are the features screened by EDMCN. In the horizontal comparison, when the study relies solely on XGBoost, the number of lagged input variables is set to 4, the autoregressive order obtained for ARIMA; the spatial features of road segments are length, width, area, and link_ID; and the feature variables are selected by XGBoost's feature importance ranking. The input of CN-XGBoost adds complex network features to the XGBoost input, increasing the spatial connectivity between roads. The input of EDM-XGBoost is obtained by selecting and filtering the feature vector of the roads through EDM quantitative analysis. It should be noted that the dataset is somewhat small for training an effective neural network, but the ANN remains useful for comparison [39].
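The four evaluation indices named above can be computed as in the sketch below. It uses the standard textbook definitions; expressing MAPE in percent is an assumption, and the sample values are hypothetical travel times rather than results from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed scaling)."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical travel times in seconds, already restored to the original scale.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(mape(actual, predicted))
print(mae(actual, predicted))
print(rmse(actual, predicted))
print(r2(actual, predicted))
```

Every prediction here is off by 10 s, so MAE and RMSE coincide. MAPE weights the same absolute error more heavily on short travel times, which is worth keeping in mind when comparing links of very different lengths.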
The experimental results show that EDMCN-XGBoost outperforms the other models in both the vertical and horizontal comparisons. The comparison further demonstrates that the spatio-temporal features obtained from EDMCN are scientifically valid and more interpretable. It should be added that it is useful to compare prediction performance over the peak periods considered rather than over the full day, especially when aggregated indicators are used. The off-peak period is less important for prediction because traffic is then in a stable state. The specific experimental results are shown in Table 6. Figure 10 compares the actual operating conditions of the road network at a specific time interval (2 min) with the forecasts of the different models. The x-axis indicates the link_ID of each road segment of the road network. The comparison shows that the EDMCN-XGBoost model achieves the best prediction accuracy.
From the experimental results in Figures 8-10 and Table 6, the following conclusions can be drawn: (a) The EDMCN-XGBoost forecast model is effective for forecasting the travel time of the urban road network and superior to the baseline models. (b) The evolution of travel time series is dynamic and highly nonlinear, and the influencing factors are relatively diverse. (c) Travel time prediction at the network level must consider both spatial and temporal dependencies. (d) During the peak period, the change of travel time in the road network is more pronounced, and the travel patterns of the traffic network on working days, weekends, and holidays are not exactly the same. (e) It is necessary to construct high-quality feature engineering as much as possible, making it rich, interpretable, and nonredundant. Only in this way can the evolution of the time series be reproduced to the greatest extent.

Conclusion
This study designed a framework for analyzing, mining, quantifying, and predicting the spatio-temporal travel time of the urban road network. The framework is mainly divided into the following steps: (a) First, a large amount of primary feature data is obtained from the original data. (b) The EDM is used to quantify the dynamic and highly nonlinear evolution of the travel time series. Meanwhile, the time series lagging variable (TT_{t−i}) is determined as the crucial variable of travel time evolution because of the strong causal relationship between TT_{t−i} and TT_t. (c) Complex network theory is adopted to explore the topological structure of urban traffic networks in depth, and a large number of spatial statistical features of the road network are obtained. (d) The feature importance ranking and recursive elimination principle of XGBoost are used to remove redundancy from the acquired spatio-temporal features and finally establish the prediction model. The example shows that the framework proposed in this study performs well in obtaining high-quality features that are rich, interpretable, and nonredundant for establishing a predictive model. The established XGBoost prediction model is superior to other comparable models in both interpretability and prediction accuracy.
Overall, the travel patterns of the road network are complex, changeable, and challenging to capture. By extracting as much of the information behind the raw data as possible, we can better grasp the evolution rules of urban traffic travel time. To further improve the prediction accuracy of road network travel time, we will continue, based on the framework of this paper, to refine the feature engineering of road network travel time and try to combine more forecasting methods.

Data Availability
The terms of use of the data used in this study do not allow the authors to distribute or publish the data directly. However, these data can be obtained directly from the Tianchi dataset platform via the following webpage: https://tianchi.aliyun.com/dataset/dataDetail?dataId=1079.

Conflicts of Interest
The authors declare that they have no conflicts of interest.