Road Short-Term Travel Time Prediction Method Based on Flow Spatial Distribution and the Relations

Many studies forecast short-term road travel time from time series alone. In practice, road travel time depends not only on the historical travel time series but also on the historical flow of the road and its adjacent sections, and few studies have considered this. This paper exploits the correlation between the spatial distribution of flow and the road travel time series, applying the nearest neighbor and nonparametric regression methods to build a forecasting model. For the spatial nearest neighbor search, three different spatial distances are defined. In addition, two forecasting functions are introduced: one combines the neighbors' values with equal (mean) weights, and the other uses the reciprocal of each nearest neighbor's distance as its weight. Each of the three distances is applied in the nearest neighbor search and paired with both forecasting functions. The nearest neighbor and nonparametric regression methods are also applied to the travel time series. A combination model is then established with the objective of minimizing the forecast error variance. The empirical results show that the combination model clearly improves forecast performance. In addition, an evaluation of computational complexity shows that the proposed method can satisfy real-time requirements.


Introduction
In transportation management, trip information and route guidance play a vital role. As one of the crucial pieces of information, travel time directly indicates the state of the current traffic environment and guides driver information systems and other related traffic systems, such as personal car navigation and logistics systems. Travel time can be broadly classified into two categories: real travel time and predicted travel time. The former is the current time cost of traversing the road, and the latter is the time cost of traversing the whole road at a certain future time. Many practical applications have shown that real travel time is not well suited to trip information distribution and route guidance, because the traffic environment is dynamic. More specifically, once a trip has started, the traffic state is likely to change significantly, and route guidance optimized for the earlier state may fail to capture these changes. Short-term prediction can therefore improve the performance of trip information and route guidance systems to some extent. The majority of current research on travel time prediction is based on time series theory. These studies treat historical travel time as a time series and apply stochastic process analysis to find the tendency of the series from historical data. Applying this tendency to current or historical data, the next future state can be predicted. Although time series theory has produced many short-term travel time prediction models [1], the results are still far from satisfactory. Time-series-based methods struggle mainly because they consider only a single factor, the time series itself. Other factors, such as changes of traffic state on neighboring roads and previous traffic states, can also affect the final prediction. Developing a prediction model from the time series relation alone may therefore ignore important spatial information among adjacent roads and ultimately degrade prediction performance.
It can be seen from Figure 1 that the next state of a road depends not only on its current traffic state but also on the previous states of the upstream and downstream roads. For example, the upstream flows $v_{ud}$, $v_{ul}$, and $v_{ur}$ head towards R2 at an earlier time step. By time $t + 1$ these flows have arrived at R2, so the state of R2 at $t + 1$ is partly determined by the earlier states of the three upstream flows. Moreover, the previous states of the downstream flows $v_{dd}$, $v_{dl}$, and $v_{dr}$ also affect the outflow of R2, so the state of R2 at $t + 1$ is related to the current and previous downstream states as well. To improve the accuracy and stability of traffic state or travel time prediction, the model should consider both spatial and temporal properties, which is the main purpose of this paper.
Mathematical Problems in Engineering

Review
Many methods have been proposed to predict road travel time or short-term traffic state. From the viewpoint of prediction approach, these models can be classified into three types. The first type is based on mathematical analysis, such as the historical mean [1], regression methods [2], exponential smoothing [3], time series models [4], and the Kalman filter [5]. The second type comprises intelligent or learning-based forecasting methods, including nonparametric regression [6], neural networks [7], chaos theory [8], fractal theory [9], and support vector machines [10]. The third type comprises combination prediction models, which combine two or more subforecasting models and use dynamic combination weights so that the submodel best suited to the current traffic state carries more weight in the final result, thereby improving forecasting accuracy. Mathematical analysis methods rest on the physical characteristics of traffic flow and strict mathematical deduction, and they are accurate under general traffic conditions. However, when the traffic state fluctuates strongly, their performance degrades greatly. Intelligent methods are primarily based on statistics, machine learning, or data mining; they do not depend on the complicated physical relations of traffic flow, are comparatively mature, and are easy to apply in prediction. Their main disadvantage is that they need a large amount of historical data and heavy learning computation before prediction. Moreover, intelligent methods generalize relatively poorly: a model trained on one road may not be suitable for another road. A combination model merges two or more methods into an integral one, exploiting the strengths of each submodel in various traffic conditions to improve overall performance. Many scholars believe that combined models show good accuracy and stability in traffic prediction and have great potential for further study, but the performance of a combined model depends on the predictive ability of each submodel. Improving the submodels and studying the ideal combination method are therefore key to short-term traffic forecasting.
From the viewpoint of traffic properties, traffic flow is a complex stochastic system with strong randomness. It nevertheless shows regularity, especially over the long term, so road traffic follows a regular time series pattern. On the other hand, as described in Section 1, the traffic flow state of a road at the next time step is affected by the flow states of the adjacent upstream and downstream roads at the current or earlier time steps, so traffic flow states also have spatial relations. At present, however, much of the literature takes only temporal relations into account, which severely restricts further improvement of short-term traffic forecasting.
According to the temporal-spatial characteristics of traffic, there are two lines of research: temporal analysis and temporal-spatial analysis.
Based on temporal analysis, Lee and Fambro applied the autoregressive integrated moving average (ARIMA) model to predict freeway traffic volume [11]. Williams and Hoel extended the ARIMA model to take traffic periodicity and seasonality into account, developing a seasonal autoregressive integrated moving average (SARIMA) model [12]. Wang and Jun applied minimum probability machine regression to traffic flow time series forecasting [13]. Xie et al. employed the nearest neighbor algorithm and pattern distance search to predict traffic flow [14]. Okutani and Stephanedes proposed a Kalman filtering model for traffic flow prediction [15]. Wei et al., in view of the nonlinearity and uncertainty of traffic, combined PSO and a neural network to predict traffic flow [16]. Fan et al. developed a combined model based on a BP neural network and the nonparametric regression method [17]. All of the above temporal models perform well in a steady traffic state. However, when traffic is affected by outside incidents and the state fluctuates over a wide range, their performance degrades greatly.
Based on temporal-spatial analysis, Dunguo et al. proposed a temporal-spatial prediction model according to the historical cycles and spatial correlation of traffic flows. They used time series theory to forecast the next-step traffic state and then, based on road spatial relations, used an RBF neural network to predict traffic flow. The two prediction results were combined with appropriate weights to give the final prediction, and the test data show that prediction accuracy and stability improve greatly [18]. Dong et al. applied traffic flow theory to derive a formula relating the flow rate of the current road at step $t + 1$ to the adjacent flow rates at an earlier step. They then applied linear filtering to the observed historical road flow data and finally calculated the prediction. By comparison, the results show that the temporal input factor provides more accurate information for flow rate prediction [19]. Hu et al. adopted a cross-correlation function to describe similarities between different traffic flow series according to the observed flow data. After choosing the most correlated road links and their time delays, instead of simply the upstream or downstream ones, they constructed a Hybrid Process Neural Network to predict short-term traffic flow. The experimental results show that this method outperforms the other compared methods [20].

Although the literature [21] considers the spatial and temporal characteristics of traffic, it is in fact a weighted combination of a time series prediction and a spatial sequence prediction, and it gives no clear account of the temporal and spatial relations of the traffic state. The literature [22] studies spatial-temporal relations based on traffic flow theory, which fits freeway or expressway traffic state prediction well. That type of traffic is a continuous stream with little external influence. Most city road traffic, however, is interrupted flow and is subject to many disturbances, such as pedestrians and bicycles, so the model is weak when applied to city road traffic prediction. The literature [23] describes the space-time relations of road traffic but ultimately employs a neural network to train and predict traffic, so the model inherits neural network deficiencies such as overfitting and poor generalization. Compared with the above methods, nonparametric regression is another good method for traffic state prediction. Because nonparametric regression has a free model form and nonlinear characteristics, it fits traffic variation well. This study takes full advantage of nonparametric regression, combined with the spatial and temporal characteristics of traffic flow, to improve prediction accuracy, forecasting adaptability, and computational efficiency.

The Model Development
The model is developed in three steps. First, the spatial and temporal characteristics of traffic are analyzed, which motivates the nonparametric regression (NPR) and k-nearest neighbor (KNN) methods. Second, the spatial and temporal data structures of the road are defined, and the neighbor distances, the forecasting functions, and the combination of forecasts are introduced. Finally, real traffic data are used to test the performance.

The Spatial-Temporal Relations of Road Traffic. A large road network is composed of segments and intersections. Each segment is connected with other segments by intersections, so the traffic flow of one road has close spatial-temporal relations to that of its adjacent links. In this section, these relations are discussed from the viewpoints of traffic theory and real detected traffic data.
(1) The Relations about Traffic Theory. The flow volume on R2 is the sum of the upstream inflows $v_{ud}$, $v_{ul}$, and $v_{ur}$ and the volume present on the road at the end of time $t$, minus the outflow from R2 to downstream over the same interval; this relation is presented in (1). From traffic flow theory, travel time is related to volume by formula (2), which is known as the BPR function:
$$T = T_0 \left[ 1 + \alpha \left( \frac{v}{c} \right)^{\beta} \right], \tag{2}$$
where $T$ is the travel time on the road, $T_0$ is the free-flow travel time on the road, $v$ is the real flow volume on the road, and $v_{ud}(t)$, $v_{ul}(t)$, $v_{ur}(t)$, $v_{dd}(t)$, $v_{dl}(t)$, and $v_{dr}(t)$ are the upstream inflow and downstream outflow rates in the through, left-turn, and right-turn directions, respectively. $\Delta t$ is the time step length, $c$ is the capacity of the road, and $\alpha$ and $\beta$ are parameters.
Using the detection data, flow curves were drawn for the east, west, and north approaches of the upstream intersection and for Jiang Bing Road over the same period, from 0:00 on Friday to 24:00 on the following Thursday, giving 2016 flow values in total for each direction. The flow curves are shown in Figure 3.
From the flow curves of all directions, we can draw two conclusions. First, the daily fluctuation trends are similar, with the curves moving almost as a group. Second, if the upstream traffic flow in one time section is similar to that in other time sections, the downstream flow is similar too. Although the curves in Figure 3 indicate that upstream and downstream flows have strong spatial correlations, they cannot express the degree of correlation numerically. The correlation coefficient is an important indicator of the relation between two series, so we calculate the correlations of the flow series, as shown in Table 1.
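The correlation coefficient used for Table 1 can be illustrated with a minimal sketch. It assumes the ordinary Pearson correlation between two equally long flow series; the function name and the sample values are hypothetical and are not taken from the detected data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical five-minute flows: an upstream approach and the target road.
upstream = [30.0, 42.0, 55.0, 61.0, 48.0]
target = [28.0, 40.0, 57.0, 60.0, 45.0]
print(round(pearson(upstream, target), 3))
```

A value close to 1 would indicate that the two series rise and fall together, which is the sense in which the text speaks of strong spatial correlation.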
From Table 1, we can see that the correlations between every upstream flow and the Jiang Bin Road flow are greater than 0.8, which clearly shows a strong correlation. Figure 4 shows the travel time series curve of Jiang Bing Road; the curve shows the similarity between days, especially after the series is smoothed.
(3) Conclusion of the Relation and the Way of Applying. In practice, (1) and (2) perform well on freeways and expressways, but on urban roads they produce large estimation errors. This paper therefore sets traffic flow theory aside and exploits the spatial relations of road flow, employing nonparametric regression to build the forecasting model.
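For reference, the BPR relation (2) that is set aside here can be sketched in a few lines. The default values 0.15 and 4 for the two parameters are the commonly quoted BPR defaults and are an assumption of this sketch; the paper does not state which values it uses, and the function name is hypothetical.

```python
def bpr_travel_time(free_flow_time, flow, capacity, alpha=0.15, beta=4.0):
    """Travel time from the BPR volume-delay function.

    alpha and beta default to the commonly quoted values (an assumption here).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)


# With no flow the travel time equals the free-flow time;
# at capacity it is 15 percent longer under the default parameters.
print(bpr_travel_time(60.0, 0.0, 1800.0))
print(bpr_travel_time(60.0, 1800.0, 1800.0))
```

Because the function depends only on the road's own volume, it has no term for signals, pedestrians, or turning conflicts, which is consistent with the large urban estimation errors noted above.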
Urban road traffic flow can be separated into commute traffic and elastic traffic. Commute traffic is stable, with the same characteristics and change pattern on the same day of each week. Elastic traffic is strongly stochastic and is difficult to forecast to some extent. Real survey data show, however, that in peak hours the majority of traffic is commute traffic, while elastic traffic appears in off-peak time sections. The stable component can be captured by the temporal properties of traffic, and the spatial properties of traffic are suited to explaining elastic traffic. In this respect, exploiting the time-space relations of traffic can improve short-term traffic forecasting performance, especially for urban road traffic.
A road usually has a similar traffic state in the same time section on the same workday or weekend day, and the state of its adjacent links is likewise similar to their history; the current traffic state of a road may therefore resemble historical cases. Based on this phenomenon, the current traffic states upstream and downstream can be used to forecast the short-term traffic state of the road, such as travel time. The nonparametric regression (NPR) and k-nearest neighbor (KNN) methods can be applied to this problem.

NPR and KNN Methodologies.
The NPR-based method has been developed and widely used in various fields of prediction problems. In particular, NPR performs well on problems where a parametric model cannot describe the relations between dependent and independent variables well with current mathematical knowledge. The fundamentals of NPR originate from pattern recognition [24]. NPR needs little or no theory, makes no assumptions regarding strict normal distributions of the independent variables, involves no unknown parameters, and does not require knowledge of the complexity of the modeled process or a deep understanding of the object. Instead, the NPR approach assumes that the knowledge about the relationships between the independent and dependent variables is contained in past experience rather than in the modeler's own knowledge of the target. This assumption gives NPR distinct advantages over parametric approaches in estimating or forecasting mixed states [25]. Additionally, NPR is free from the strict assumptions of parametric approaches, that is, the normality of the intrinsic distribution of each independent variable and the independence between independent variables. NPR is based on chaotic system theory, with strong theoretical grounding and successful applications. Traffic state transitions are nonlinear, and they also show strong similarity to historical patterns. These features are well suited to interpretation by NPR.
Applying NPR alone may cause some stochastic errors, so another method, the k-nearest neighbor (KNN) method, is usually combined with NPR to solve forecasting problems. KNN searches the historical data for potential neighbors that are similar to the current case and uses them to estimate the value of the dependent variable, which serves as the prediction. The approach is excellent for time series analysis and, moreover, outperforms linear approaches when the relations between states are nonlinear [26]. The KNN method is a data-driven, heuristic approach to forecasting: it searches a database for data similar to the current data and then makes a forecast based on the data found. These data are called the nearest neighbors of the current data. The algorithm can be described in the following steps.
(1) Create a database containing a large amount of representative historical data, organized in a defined structure.
(2) Calculate the Euclidean distance (ED) between the current data and each historical record, and select the $k$ historical records with the smallest distances as the closest historical data, namely, the $k$ nearest neighbors.
(3) Take the selected historical records as the sample set and use their values at the next time step to forecast the state of the current data at its next time step.
To improve forecasting performance, various methods of calculating weights are applied to the nearest neighbors.
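The three steps above can be sketched as follows. This is a minimal illustration that assumes each historical record is stored as a pair of a state vector and the value observed at the following time step, and that the neighbors are averaged with equal weights; the names and the sample values are hypothetical.

```python
import math


def euclidean(a, b):
    """Classic Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_forecast(history, current, k):
    """Forecast the next-step value of `current` from its k nearest neighbors.

    history: list of (state_vector, next_step_value) pairs (step 1).
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and the number of records")
    # Step 2: rank the records by distance to the current state and keep k.
    ranked = sorted(history, key=lambda record: euclidean(record[0], current))
    neighbors = ranked[:k]
    # Step 3: use the neighbors' next-step values, here with equal weights.
    return sum(next_value for _, next_value in neighbors) / k


history = [([1.0, 1.0], 10.0), ([2.0, 2.0], 20.0), ([10.0, 10.0], 100.0)]
print(knn_forecast(history, [1.4, 1.4], k=2))
```

Sorting the whole database is the simplest way to select the neighbors; for a large history a partial selection such as `heapq.nsmallest` would avoid the full sort.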

Traffic History Data Structure
(1) Spatial Data Structure. Traffic flow volume is the easiest parameter to measure, so traffic flow is taken as the input parameter of the model. According to Figure 1, the upstream through flow into R2 at time section $t$ is defined as $v_{ud}(t)$, the upstream left-turn flow as $v_{ul}(t)$, and the upstream right-turn flow as $v_{ur}(t)$. The downstream through flow is $v_{dd}(t)$, the downstream left-turn flow is $v_{dl}(t)$, and the downstream right-turn flow is $v_{dr}(t)$. The flow of R2 at time section $t$ is $v(t)$, the travel time of R2 is $T(t)$, and $T(r, t+1)$ is the road travel time at $t + 1$.
To organize the history data efficiently, a structure named Link$(r, t)$ is defined in (3) to save the relevant history data of a road at time $t$, where $r$ is the index of a road; the history measurement data of road $r$ are saved in this structure. At time $t$, the component $T(r, t+1)$ is null; when time $t + 1$ arrives and the road travel time at $t + 1$ has been detected, that travel time is saved into the component $T(r, t+1)$.
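One possible in-memory form of such a record is sketched below. The field names, the ordering of the components, and the sample values are assumptions made for illustration; the exact layout of (3) is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkRecord:
    """History record of one road at one time step (illustrative layout)."""
    road: int
    t: int
    v_ud: float  # upstream through flow
    v_ul: float  # upstream left-turn flow
    v_ur: float  # upstream right-turn flow
    v_dd: float  # downstream through flow
    v_dl: float  # downstream left-turn flow
    v_dr: float  # downstream right-turn flow
    v: float     # flow on the road itself
    travel_time: float
    next_travel_time: Optional[float] = None  # null until t + 1 is detected

    def flow_vector(self) -> List[float]:
        """The seven flow components used in the spatial distance."""
        return [self.v_ud, self.v_ul, self.v_ur,
                self.v_dd, self.v_dl, self.v_dr, self.v]


# Made-up values: the next-step travel time is filled in one step later.
record = LinkRecord(road=2, t=100, v_ud=40, v_ul=12, v_ur=9,
                    v_dd=35, v_dl=10, v_dr=8, v=55, travel_time=93.0)
record.next_travel_time = 97.5
```

Keeping the next-step travel time inside the record means that a nearest neighbor search returns the value to be forecast together with the matched state.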
(2) Travel Time Series Data Structure. As shown above, road travel time is strongly cyclical, so forecasting travel time from the properties of its temporal sequence may improve forecasting performance. The history travel time sequence is saved in a simple form, defined as $h(r, i)$, $i = 1, 2, \ldots, n$, where $r$ denotes the $r$th road. A history database is created that saves the road travel time day by day, and one record holds one day with all 288 values. The history data structure is presented in Figure 5. The current detection data form a series of recent travel times; to apply the KNN model, the length of this series must be appropriate: too long a series increases the amount of computation in the nearest neighbor search, and too short a series may not reflect the dynamic change of the travel time series. In practice, the length is set as a variable and determined by tests on real data. The process of applying KNN and NPR to temporal travel time forecasting is presented in Figure 5.
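A minimal sketch of how one day of stored travel times could be cut into candidate neighbors for the temporal search is given below. It assumes that each candidate is a window of consecutive values paired with the value that follows it; the function name and the window length are illustrative, since the text leaves the length to be tuned on real data.

```python
def window_samples(day_series, length):
    """Cut one day of travel times into (window, next_value) pairs."""
    if length < 1:
        raise ValueError("length must be at least 1")
    samples = []
    for start in range(len(day_series) - length):
        window = day_series[start:start + length]
        samples.append((window, day_series[start + length]))
    return samples


# A day sampled every five minutes holds 24 * 60 / 5 = 288 values;
# a short made-up series is used here to keep the output readable.
print(window_samples([90, 92, 95, 99, 97], length=2))
```

Each day of length 288 yields 288 minus the window length candidates, so a longer window reduces the number of candidates slightly but makes every distance calculation more expensive.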

Distance Definition and Calculation.
Nonparametric regression is used in travel time forecasting. Its essence is to search the history database for traffic cases similar to the current one and to use the travel time of those cases at their next time step to estimate the travel time at the next time step of the current case. Similarity calculation is therefore a necessary step, and a distance is usually used to measure how similar two samples are. For example, the Euclidean distance (ED) is the most popular measure of data similarity. However, the spatial properties of traffic flows differ from the characteristics of travel time series, so it is necessary to define the spatial flow series distance and the temporal series distance separately.
(1) Definition of Spatial Flow Distance. Euclidean distance is used to express the similarity between the current and historical traffic states. The Euclidean distance (ED) is calculated by the following formula:
$$d(t, t_h) = \sqrt{\sum_{i=1}^{7} \left[ v_i(t) - v_i(t_h) \right]^2}, \tag{4}$$
where $d(t, t_h)$ is the Euclidean distance between the current data at time $t$ and the history data at time $t_h$, and $v_i$ is the $i$th flow component. This is the classic Euclidean distance equation. In this research, so that components with closer relations to the forecast road carry more weight in the distance calculation, each component is multiplied by a weight $w_i$. The resulting distance formula is (5), and the distance is named the weighted Euclidean distance (WED).
$w_i$ is the weight of each component, determined by the effect of that component on the traffic state of the forecast road. Suppose $w_i$ is directly proportional to the correlation between each upstream inflow or downstream outflow and the flow of the forecast road. Denoting these correlations by $\rho(i)$, $\rho(i)$ is given by (6), where $v_i(t)$ is the flow at time $t$ in the upstream or downstream through, left-turn, or right-turn direction or on the objective road itself, $i = 1, 2, \ldots, 7$, $\bar{v}_i$ is the mean value of $v_i(t)$, $v(t)$ is the flow of the forecast road at time $t$, and $\bar{v}$ is the mean flow of the forecast road over the time interval. The weight $w_i$ is defined in (7) as the normalization of $\rho(i)$.
Using (4) to calculate the distance between two samples does not always fit the spatial distribution of traffic flow data sets. Because the flows in different directions may vary over widely different ranges and follow different distributions, treating them on the same scale may in the end magnify the effect of the smaller fields. For example, suppose that in two samples the major direction flows are $v_{ma}(t)$ and $v_{ma}(t_h)$, the minor direction flows are $v_{mi}(t)$ and $v_{mi}(t_h)$, and the major flows are far greater than the minor flows. If $[v_{ma}(t) - v_{ma}(t_h)]^2 = [v_{mi}(t) - v_{mi}(t_h)]^2$, then the major and minor direction flows have the same effect on the distance between the two samples. In fact, however, the major direction flow field better reflects the true closeness between the two samples. We therefore introduce the relative Euclidean distance (RED), which takes the relative difference of two components as the variable in the Euclidean distance. Including the relevant parameter, (5) becomes (8), and the resulting distance is called the weighted relative Euclidean distance (WRED).
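The weighted distances can be sketched as follows. Several details here are assumptions of the sketch rather than the exact forms of (5), (7), and (8): the weights are the correlations divided by their sum, each weight multiplies the squared difference, and the relative difference is taken with respect to the current value. The sample flows are made up.

```python
import math


def normalize_weights(correlations):
    """Scale the correlations so that the weights sum to 1 (assumed form)."""
    total = sum(correlations)
    if total <= 0:
        raise ValueError("sum of correlations must be positive")
    return [c / total for c in correlations]


def weighted_euclidean(current, past, weights):
    """WED sketch: each squared difference is scaled by its weight."""
    return math.sqrt(sum(w * (c - p) ** 2
                         for c, p, w in zip(current, past, weights)))


def weighted_relative_euclidean(current, past, weights):
    """WRED sketch: differences are taken relative to the current value."""
    total = 0.0
    for c, p, w in zip(current, past, weights):
        if c == 0:
            raise ValueError("relative difference needs a nonzero current flow")
        total += w * ((c - p) / c) ** 2
    return math.sqrt(total)


# A major flow (about 1000) and a minor flow (about 20) that differ by the
# same absolute amount contribute equally to WED but not to WRED.
weights = normalize_weights([0.9, 0.9])
current, past = [1000.0, 20.0], [990.0, 10.0]
print(weighted_euclidean(current, past, weights))
print(weighted_relative_euclidean(current, past, weights))
```

Setting all weights to 1 in `weighted_euclidean` recovers the classic Euclidean distance of (4), so the three distances can share one nearest neighbor search routine.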
(2) Definition of Temporal Series Distance. Compared with the spatial flow series, the distance calculation for the time series is relatively simple, for two reasons. First, the travel time series is detected on a single road section, so all components of the series have the same attribute. Second, travel time series on the same road generally have the same distribution and similar fluctuation trends. For these reasons, there is no need to consider the consistency between the components of the series when calculating the time series distance; the Euclidean distance (ED) formula can be applied directly, and the equation is presented in (9).

Forecasting Method.
Once the $k$ nearest neighbors have been found, the forecast value can be calculated directly by a simple forecasting function. One such function, named the mean weight function (MF), is
$$\tilde{T}(r, t+1) = \frac{1}{k} \sum_{i=1}^{k} T_i(r, t+1), \tag{10}$$
where $\tilde{T}(r, t+1)$ is the mean of the component $T_i(r, t+1)$ over the $k$ selected samples.
This function treats every neighbor as equally important for forecasting and ignores the order of their distances. To take into account how close each neighbor is to the current state, we introduce a distance inverse weight function (DIWF). DIWF uses the inverse of each neighbor's distance to the current state as a weight to adjust $\tilde{T}(r, t+1)$. The DIWF is presented in (11), where $d_i$ is the distance between the $i$th nearest neighbor and the current traffic state, calculated by the distance equations above. Applying a forecasting function to the current spatial neighbors gives the next-step travel time forecast based on the upstream and downstream flows of the road. In the same way, applying the forecasting function to the temporal neighbors gives a second travel time forecast, based on the time series.
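The two forecasting functions can be compared with a short sketch. The DIWF form used here, inverse distances normalized to sum to 1, is the usual inverse-distance weighting and is assumed rather than copied from (11); the small floor on the distance is an added safeguard against division by zero, and all names are hypothetical.

```python
def mean_forecast(next_values):
    """MF: equal-weight mean of the neighbors' next-step travel times."""
    if not next_values:
        raise ValueError("at least one neighbor is required")
    return sum(next_values) / len(next_values)


def inverse_distance_forecast(next_values, distances, floor=1e-9):
    """DIWF sketch: weight each neighbor by the inverse of its distance."""
    if not next_values or len(next_values) != len(distances):
        raise ValueError("need one distance per neighbor")
    # The floor keeps an exact match (distance 0) from dividing by zero.
    inverse = [1.0 / max(d, floor) for d in distances]
    total = sum(inverse)
    return sum(w * v for w, v in zip(inverse, next_values)) / total


# Two neighbors: the closer one (distance 1) pulls the DIWF forecast
# towards its own value, while MF stays at the midpoint.
print(mean_forecast([10.0, 20.0]))
print(inverse_distance_forecast([10.0, 20.0], [1.0, 3.0]))
```

When all distances are equal the two functions coincide, which matches the observation that MF is the special case that ignores the distance order.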
The traffic data analysis in Section 3.1 shows the space-time characteristics of the traffic state. To improve short-term traffic forecasting accuracy, it is necessary to fuse these space-time characteristics. Using the spatial submodel's ability to capture random influences on traffic to complement the stability of the temporal submodel may offset the drawbacks of both models and improve prediction performance.
(1) Spatial and Temporal Model Combination. Assume that the prediction result of each method is $\tilde{T}_j(r, t+1)$, where $j = 1$ or $2$ denotes the spatial or temporal model. The combination model can then be expressed as formula (12), in which $\tilde{T}(r, t+1)$ is the travel time forecast by the integrated model for time $t + 1$.
In (12), $w_j^{t+1}$ is the weight of each subprediction model at time $t + 1$, adjusted with the objective of minimizing the combination forecasting error. Each submodel is stable over a short time interval; that is, over $m$ steps the optimal combination weights of the submodels persist to some degree and can be used as the combination weights for the next step. We use this property to build the following optimization model, expressed as (13), (14), and (15).
In (13), the detected value at time $t$ enters as the target, and $m$ is the number of detected series before the current time. The mean of the optimized submodel weights over the previous $m$ time steps is used as the submodel weight at time $t + 1$. The integration weight at time $t + 1$ is therefore given by (16), where $m$ is determined by the actual fluctuations of the road traffic state. Formula (13) is a nonlinear constrained optimization problem, which can be solved with the particle swarm optimization (PSO) algorithm. PSO is an intelligent search method with good generality. Equation (13) is taken as the fitness function, and the constraints (14) and (15) define the range of the particles. The speed and location of each particle are updated by formula (17). The iteration terminates when the Euclidean distance between the optimal particles of two adjacent iterations falls below a given threshold or when the number of iterations reaches a given limit. When the termination condition is met, the program exits and outputs the location of the final optimal particle.
In (17), $V_t$ is the speed of the particles at time $t$, $P_t$ is the optimal position found by each particle up to time $t$, $G_t$ is the globally optimal particle position at time $t$, $X_t$ is the position of the particles at time $t$, $\omega$ is the inertia weight, which governs how much of its speed a particle retains, $c_1$ is the weight with which a particle tracks its own optimal position, $c_2$ is the weight with which a particle tracks the globally optimal position, $\eta$ is the speed constraint factor, usually set to 1, and $r_1$ and $r_2$ are two random numbers uniformly distributed in [0, 1].
The algorithm steps are as follows.
Step 1. Randomly generate a certain number of individual particles that satisfy the constraint conditions.
Step 2. Based on the objective function, calculate the fitness of each particle, update the location corresponding to each particle's best historical fitness, and at the same time update the location of the globally optimal particle.
Step 3. Use (17) to update the speed and position of each particle.
Step 4. Return to Step 2 and check the termination condition; if it is met, output the final position and stop; otherwise continue with Step 3.
The combination model integrates the submodels and takes the spatial-temporal characteristics of traffic into account; meanwhile, the combination weights change dynamically with the recent performance of the submodels. All of this helps to improve the accuracy and stability of the model.
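The weight search in Steps 1 to 4 can be sketched for the two-submodel case. The sketch rests on several assumptions that are not confirmed by the text: the constraints are taken to be that the two weights are nonnegative and sum to 1, so a single spatial weight in [0, 1] is searched; the objective is the sum of squared errors over the recent steps; the speed constraint factor is 1; the search stops after a fixed number of iterations only; and the parameter values and sample data are made up.

```python
import random


def combination_error(w1, spatial, temporal, observed):
    """Sum of squared errors of the weighted forecast over recent steps."""
    return sum((w1 * s + (1.0 - w1) * p - y) ** 2
               for s, p, y in zip(spatial, temporal, observed))


def pso_spatial_weight(spatial, temporal, observed, n_particles=20,
                       n_iter=200, inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search the spatial weight in [0, 1]; the temporal weight is 1 - w1."""
    rng = random.Random(seed)

    def fitness(w1):
        return combination_error(w1, spatial, temporal, observed)

    # Step 1: random particles inside the feasible range.
    pos = [rng.random() for _ in range(n_particles)]
    vel = [0.0] * n_particles
    # Step 2: initial personal and global bests.
    best_pos = list(pos)
    best_fit = [fitness(p) for p in pos]
    g_index = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g_index], best_fit[g_index]

    for _ in range(n_iter):  # Step 4: fixed iteration budget (assumption)
        for i in range(n_particles):
            # Step 3: standard speed and position update, clamped to [0, 1].
            r1, r2 = rng.random(), rng.random()
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (g_pos - pos[i]))
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))
            # Step 2 again: refresh personal and global bests.
            fit = fitness(pos[i])
            if fit < best_fit[i]:
                best_pos[i], best_fit[i] = pos[i], fit
            if fit < g_fit:
                g_pos, g_fit = pos[i], fit
    return g_pos, 1.0 - g_pos


# Made-up recent forecasts whose best mix is 0.3 spatial and 0.7 temporal.
spatial = [10.0, 12.0, 14.0]
temporal = [20.0, 22.0, 24.0]
observed = [0.3 * s + 0.7 * p for s, p in zip(spatial, temporal)]
w1, w2 = pso_spatial_weight(spatial, temporal, observed)
```

With only two submodels the objective is a quadratic in one variable and could be minimized in closed form; PSO is kept here because it is the solver named in the text and extends directly to more submodels.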

Forecasting Algorithm and Process.
First, create a database with the structure given in formula (3) and save an adequate amount of historical detected traffic flow and travel time data. Apply (6) and (7) to determine the weights $w_i$ from the database. Then load the current detection state data of the road, Link$(r, t)$, together with the current travel time series, apply KNN and NPR to the spatial and temporal series separately, and find their nearest neighbor data sets. Finally, use (10), (11), and (12) to calculate $\tilde{T}(r, t+1)$. The algorithm process is presented in Figure 6.

Case Study
To examine the performance of the method, an arterial road in Nanchang, China, is selected as the test road. Inductive loops are installed in each direction at the intersections. Continuous flow data and the R2 travel time are detected all day at five-minute intervals. For fourteen days, the flow data between 7:00 and 19:00 are selected as test data, so one day has 144 sequences and the fourteen days include 2016 flow sequences in total. The data of the first thirteen days are used as experiment data, and the data of the fourteenth day are used as validation data. The road topology and the detector distribution are shown in Figure 7. R2 is defined as the forecast road. The R1 through flow, A1 left-turn flow, A2 right-turn flow, R2 flow, R2 travel time, and the R2 downstream through, left-turn, and right-turn flows are detected.
In order to test the performance of each submodel and combination model, respectively, the test solution of each model is introduced in the following section.
(1) Spatial Submodel Test (SKNN). For testing the spatial submodel's performance under a variety of similarity distances, the ED, WED, and WRED, formed as ( , ), ( , , ), and ( , , , ), are calculated. Those distances are used to find nearest neighbors of current detection data from history data separately. MF and DIWF functions are employed to forecast R2 road travel time at time ( + 1). The number of may affect forecasting accuracy, so = 1, 2, . . . , 25 are applied in each forecasting function. The forecasting process is programmed with MATLAB. Running the program with the history detected data and corroboration data will get the forecasting performance.
(2) Temporal Submodel Test. Compared with the spatial submodel, temporal model directly uses Euclidean distance to compute the series similarity distance, so there is no need to consider the correlation and relative parameters in the test. But with the same as spatial model, the number of time series nearest neighbor should be considered, so different number neighbors would be tested. In the case, it sets the neighbor number from 1 to 30 to show the performance and determine the appropriate nearest neighbor number.
(3) Combination Model Test. The inputs of the combination model are the outputs of the two submodels, so no other variables need to be considered. The objective of this test is to compare the forecasting accuracy of the combination model with that of each submodel and to analyze the variation of the errors.
(4) Forecasting Evaluation Indicators. To verify the accuracy of the proposed method, the average relative error (ARE) and the mean square error (MSE) are selected as evaluation indicators for studying each method's performance. The ARE presents the extent to which the prediction values deviate from the real detected values. The MSE reflects not only the extent of the deviation but also the dispersion of the errors; the smaller the MSE, the better the prediction accuracy. Each error indicator is calculated from the prediction value and the real detected value at each time t.
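Assuming the conventional definitions, namely the mean of |prediction - detected| / detected expressed in percent for the ARE and the mean of the squared differences for the MSE, the two indicators can be computed as in the following sketch. The function names and the sample values are illustrative only.

```python
def average_relative_error(predicted, detected):
    """ARE in percent: mean of |prediction - detected| / detected (assumed form)."""
    if len(predicted) != len(detected) or not detected:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - d) / d for p, d in zip(predicted, detected))
    return 100.0 * total / len(detected)

def mean_square_error(predicted, detected):
    """MSE: mean of squared differences between prediction and detection."""
    if len(predicted) != len(detected) or not detected:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - d) ** 2 for p, d in zip(predicted, detected)) / len(detected)

detected = [50.0, 100.0, 80.0]    # illustrative detected travel times
predicted = [55.0, 90.0, 80.0]    # illustrative forecasts
are = average_relative_error(predicted, detected)
mse = mean_square_error(predicted, detected)
```

Because the MSE squares each deviation, a few large misses raise it more than they raise the ARE, which is why it also reflects the dispersion of the errors.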

Test Analysis
The ARE and MSE are used to evaluate the submodels and the integrated model. Comparing these indexes across the models then demonstrates the improvement.
(1) Spatial Submodel Evaluation. ARE is taken as the accuracy index to compare the precision of the different distance methods and forecasting functions. Because the number of nearest neighbors K may affect the performance, the ARE is plotted against K; the curves are presented in Figures 8 and 9. The curve labels are defined as follows. M.ED stands for the ARE obtained with the MF function and the ED distance, M.WED denotes the ARE under the WED distance, and M.WRED is the ARE according to WRED. Following the same principle, D.ED, D.WED, and D.WRED are the AREs obtained with the DIWF function under ED, WED, and WRED, respectively.
Figures 8 and 9 show that the ARE decreases as the number of nearest neighbors increases. When K increases from 1 to 4, the ARE falls sharply, but once K rises above 5, the ARE changes slowly and the curve is almost horizontal. This continues until K reaches 15, after which the ARE shows a slight increasing trend. Each method achieves good performance when K lies between 6 and 15, and D.WED is found to obtain the best performance. There is no significant difference in performance among the methods, for MF as well as DIWF, when K is above 15. Comparing the distances at a fixed K, especially for K less than 5, the ED distance performs worst, the WED distance performs slightly better, and the WRED distance performs best of the three, with a clear margin over the others.
The ARE values of the six methods are plotted in one figure for comparison, as shown in Figure 10. The figure shows that D.ED is slightly better than M.ED, D.WED is slightly better than M.WED, and D.WRED is nearly equal to M.WRED up to five nearest neighbors. When K exceeds five, D.WRED performs better than M.WRED. This is because M.WRED gives the farther neighbors the same weight as the nearer ones in the forecast, although the farther neighbors may carry larger errors. D.WRED, in contrast, uses the inverse of the neighbor distance as the weight for combining the nearest-neighbor forecasts, so the farther neighbors have smaller weights and contribute smaller errors to the forecast. The ARE for the test data is presented in Table 2. The MSE index is also plotted for each defined distance and forecasting function; the six plots are shown in Figure 11. The MSE indicates how far the forecast values deviate from the true values.
Figure 11 shows that the MSE plots follow a tendency similar to that of the ARE plots. Specifically, the indicators decrease sharply at first and then remain stable, after which they rise slowly as K increases further.
Each distance and each forecasting function is applied to the test data; the resulting MSE is presented in Table 3.
From the above analysis and Table 3, several conclusions can be drawn. First, when the number of nearest neighbors is less than 5, D.WRED is superior. Second, each method performs best when K is between 6 and 15, and within this range there is no significant difference in accuracy among the methods. Third, there is no further improvement in performance when K exceeds 15.
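The dependence of accuracy on the number of neighbors suggests choosing K by sweeping candidate values against held-out data, as the test does for K = 1 to 25. A minimal sketch of such a sweep follows; it uses an equal-weight forecast with Euclidean distance, and the data values and the names `knn_mean` and `sweep_k` are hypothetical.

```python
import math

def knn_mean(train, query, k):
    """Mean of the targets of the k training records nearest to `query`."""
    def distance(rec):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(rec[0], query)))
    nearest = sorted(train, key=distance)[:k]
    return sum(target for _, target in nearest) / len(nearest)

def sweep_k(train, validation, k_values):
    """Return {k: ARE in percent} on the validation set for each candidate k."""
    scores = {}
    for k in k_values:
        rel_errors = [abs(knn_mean(train, x, k) - y) / y for x, y in validation]
        scores[k] = 100.0 * sum(rel_errors) / len(rel_errors)
    return scores

# Toy records: (flow,) -> travel time, with alternating noise around flow / 2.
train = [((10,), 9.0), ((20,), 6.0), ((30,), 19.0),
         ((40,), 16.0), ((50,), 29.0), ((60,), 26.0)]
validation = [((24,), 12.0), ((44,), 22.0)]
scores = sweep_k(train, validation, range(1, 4))
best_k = min(scores, key=scores.get)
```

In this toy case a single neighbor reproduces the noise of one record, while averaging two neighbors cancels it, which mirrors the sharp early drop in ARE described above.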
Overall, taking the spatial flow relations of the road as the parameters for searching nearest neighbors in the history data can find the most relevant neighbors. Defining the distance measure on the relative distance of each vector allows the flows of different roads to be compared on the same order of magnitude and takes the relativity of the vectors into account, so this distance reflects vector similarity better than the Euclidean distance. Furthermore, using the inverse of the nearest-neighbor distance as the combination weight in place of the mean weight improves the performance of the method. That is to say, selecting DIWF as the forecasting function is a good choice.
To further present the performance of DIWF under ED, WED, and WRED, the three predicted travel time curves are plotted.
Because each method performs well when the number of nearest neighbors K equals 15, the forecasts with 15 nearest neighbors are selected for comparison, for simplicity. The detected test data curve is drawn alongside them. The curves are presented in Figure 12.
In Figure 12, DET.T is the curve of the travel time series on road 2 (R2) detected as test data. The series is detected by two video detectors installed at the upstream and downstream exits of R2 at the intersection. The detection time is from 7:00 to 19:00, and the values are taken as test data.
As can be seen from Figure 12, all the methods based on DIWF perform well. D.WRED performs better than D.WED, and the performance of D.WED is slightly better than that of D.ED.
(2) Temporal Submodel Evaluation. The time series model has two parameters: the length of the base series and the number of nearest neighbors. Both are therefore varied to test the temporal submodel's performance: the series length ranges from 1 to 20 and the neighbor number from 1 to 30. The average relative error (ARE) and mean square error (MSE) illustrate the model's performance for the different series lengths and neighbor numbers; the ARE and MSE curves are shown in Figures 13 and 14, in which the series length is indicated. As can be seen from Figures 13 and 14, whichever indicator is applied, the prediction model obtains its best performance at the same setting: when the length of the time series equals 8 and K equals 15, the ARE and MSE are 11.37 and 11.64, respectively.
To further present the performance of the proposed method, the indicators of the spatial submodel and the temporal submodel are plotted together; see Figure 15.
The test data show that the ARE of the spatial submodel is 8.172 and that of the temporal submodel is 11.37, so the former is better. On MSE, however, the temporal submodel is better: the values are 11.64 for the temporal submodel and 13.4 for the spatial submodel. Thus, although the temporal submodel's ARE is no better than the spatial submodel's, its lower MSE indicates that its errors are less dispersed and that it is more stable than the spatial submodel.
The plots of these indicators also show that the temporal submodel is closer to the ground truth than the spatial submodel over the whole prediction period, whereas the spatial submodel matches the true values better in some local intervals.
(3) Combination Model Evaluation. The combination model is applied to the test case to obtain the combination forecast; the comparison curves are drawn in Figure 16.
The curves demonstrate that the combination forecast inherits the good properties of the two submodels; in each time section, the submodel that is nearer to the real data receives more weight in the combination model. This shows the effectiveness of the combination model. To describe the improvement of the integrated model over each submodel, Table 4 lists the forecasting error indicators. Table 4 shows that, when the combination model is applied to this test, the ARE decreases to 5.76, an improvement of 2.41 percentage points over the spatial model and of 5.59 percentage points over the temporal model. This indicates that combining the spatial and temporal models takes advantage of the underlying traffic characteristics and improves the forecasting accuracy. By comparison, other short-term traffic forecasting methods, for example the reference methods cited in this paper, report an ARE of about 6% to 14%, which is no better than the result of this study. For MSE, the improvements over the two submodels are 7.51 and 5.57, so the combination greatly improves the stability of the model.
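The idea of weighting the submodels to minimize the variance of the combined error can be illustrated with a simplified case. The study solves its combination model with particle swarm optimization; the sketch below instead uses the closed-form weight that exists for two submodels and a single fixed weight, w = (var_b - cov) / (var_a + var_b - 2 cov). It is a swapped-in simplification under those assumptions, not the paper's procedure, and the error values and the name `min_variance_weight` are hypothetical.

```python
def min_variance_weight(err_a, err_b):
    """Weight on model A that minimises the variance of w*e_a + (1 - w)*e_b."""
    n = len(err_a)
    if n == 0 or n != len(err_b):
        raise ValueError("error series must be non-empty and of equal length")
    mean_a = sum(err_a) / n
    mean_b = sum(err_b) / n
    var_a = sum((e - mean_a) ** 2 for e in err_a) / n
    var_b = sum((e - mean_b) ** 2 for e in err_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(err_a, err_b)) / n
    denom = var_a + var_b - 2.0 * cov
    if denom == 0.0:
        return 0.5  # errors are indistinguishable; fall back to equal weights
    w = (var_b - cov) / denom
    return min(1.0, max(0.0, w))  # clip to [0, 1] (a design assumption)

# Hypothetical forecast errors of the spatial and temporal submodels.
err_spatial = [2.0, -2.0, 2.0, -2.0]
err_temporal = [-1.0, 1.0, -1.0, 1.0]
w = min_variance_weight(err_spatial, err_temporal)
combined = [w * a + (1.0 - w) * b for a, b in zip(err_spatial, err_temporal)]
```

In this toy case the two error series are perfectly negatively correlated, so a weight of one third on the spatial errors cancels them against the temporal errors.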
(4) Computation Efficiency Evaluation. Whether a short-term traffic forecasting method can be applied depends on two factors. One is accuracy, which the above results show to be adequate; the other is the timeliness of the algorithm, so an evaluation of computation efficiency is needed. The method consumes computing resources mainly in searching for nearest neighbors. The searching time is related to the size of the history database: as the history data increase, the searching time increases too. To verify timeliness, the method is applied to history databases with various numbers of records and the time consumption is measured. The program is built in MATLAB and runs on Windows 7 on a computer with a 2.0 GHz processor; the time consumption for each database is plotted in Figure 17.
From this test, we observe that the time consumption of prediction on two weeks of traffic data is 0.683 seconds.
The figure also shows that the time consumption increases with the size of the database, and the curve resembles an exponential function; this trend illustrates that the computational efficiency of the method depends heavily on the amount of data in the database. When the traffic database increases to 128 weeks, the time increases to 44.053 seconds, which is close to 1 minute. For short-term traffic forecasting, the interval is usually 2 to 30 minutes, so the described method suits a database that stores no more than 128 weeks of traffic data. In practical applications, large amounts of data are saved in the history database; hence, the organization of the history data and the searching algorithm require further study to adapt the method to practical needs. This work will be undertaken in the future.
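A timing experiment of this kind can be sketched as follows. The sketch is in Python with synthetic data, not the MATLAB program of the study, so its absolute timings are not comparable with those reported; the names `records_for_weeks`, `brute_force_nearest`, and `time_search` are assumptions. A brute-force search examines every record once, so its cost is expected to grow roughly in proportion to the number of records. The two reported timings, 0.683 seconds for 2 weeks and 44.053 seconds for 128 weeks, differ by a factor of about 64.5 for a database 64 times larger, which is consistent with such a scan.

```python
import heapq
import random
import time

INTERVALS_PER_DAY = 144  # 7:00 to 19:00 at five-minute intervals

def records_for_weeks(weeks):
    """Number of five-minute records stored for the given number of weeks."""
    return weeks * 7 * INTERVALS_PER_DAY

def brute_force_nearest(database, query, k):
    """Return the k records closest to `query` by squared Euclidean distance."""
    return heapq.nsmallest(
        k, database, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec, query))
    )

def time_search(weeks, k=15, dims=3, seed=0):
    """Time one k-nearest-neighbour query against a synthetic database."""
    rng = random.Random(seed)
    database = [
        tuple(rng.uniform(0.0, 200.0) for _ in range(dims))
        for _ in range(records_for_weeks(weeks))
    ]
    query = tuple(rng.uniform(0.0, 200.0) for _ in range(dims))
    start = time.perf_counter()
    brute_force_nearest(database, query, k)
    return time.perf_counter() - start

if __name__ == "__main__":
    for weeks in (2, 4, 8, 16):
        print(weeks, records_for_weeks(weeks), f"{time_search(weeks):.4f} s")
```

Index structures that avoid scanning every record are one possible direction for the further study of history data organization mentioned above.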

Conclusions
Based on spatial and temporal relationships, the nearest neighbor nonparametric regression model is used to predict the short-term travel time, and the test data show that the method performs well. To improve the prediction, three distances are defined to represent the spatial nearest neighbor distance: Euclidean distance (ED), correlation-weighted Euclidean distance (WED), and correlation-weighted relative Euclidean distance (WRED). Two forecasting functions are described: the mean weight function (MF) and the neighbor distance inverse weight function (DIWF).
Among the distance definitions, WRED is better than WED, although the difference is not particularly significant; moreover, both WRED and WED are better than the Euclidean distance (ED). That is to say, considering the spatial correlation and the relative flow vectors in the distance definition effectively improves the matching in nearest neighbor searching.
The test results show that the number of neighbors K affects the prediction results for both the spatial sequences and the time series. Good results are obtained when K lies between 6 and 15. The performance may deteriorate when K is less than 6, because the prediction is then affected by the randomness of the historical data. When K is more than 15, the prediction tends toward the mean value of the historical data, which limits the model's ability to represent traffic randomness and therefore introduces more error into the prediction.
With minimization of the combination error variance as the objective, the combination model is established and solved by the particle swarm optimization algorithm. The test result shows that combining the spatial and temporal models improves the forecasting performance considerably.
Timeliness is the key to short-term forecasting. In the KNN algorithm especially, as historical data accumulate, the computation required for searching and matching nearest neighbors grows, so the real-time efficiency of the model may become low. The computation efficiency evaluation shows that the time cost increases with the number of history data sets; when the introduced method is applied to 128 weeks of history data, the time cost is nearly one minute. This time consumption already affects the real-time requirement of the forecast model. Therefore, optimizing the organization of the history data and improving the nearest neighbor matching search algorithm are the next problems to be studied.