Space-Time Hybrid Model for Short-Time Travel Speed Prediction

Short-time traffic speed forecasting is a significant issue in developing Intelligent Transportation Systems (ITS) applications, and accurate speed forecasts are necessary inputs for the Intelligent Traffic Security Information System (ITSIS) and advanced traffic management systems (ATMS). This paper presents a hybrid model for travel speed based on data fusion and on analysis of temporal and spatial characteristics. The proposed methodology predicts speed by dividing the data into three parts: a periodic trend estimated by Fourier series, a residual part modeled by the ARIMA model, and possible events reflected in upstream or downstream traffic conditions. The aim of this study is to improve prediction accuracy by modeling the variation of speed in both time and space, so that the forecast reflects both the periodic variation of traffic speed and emergencies. This information could provide decision-makers with a basis for developing traffic management measures. To achieve the research objective, one year of speed data was collected in the Twin Cities Metro area, Minnesota. The experimental results demonstrate that the proposed method can be used to explore the periodic characteristics of speed data and can increase the accuracy of travel speed prediction.


Introduction
Driven by the need to promote Intelligent Transportation Systems (ITS) and traffic safety management, short-time travel speed prediction is a crucial issue that has attracted a number of studies. Common forecasting methods can be divided into five main categories: methods based on traditional statistical theory, intelligent model methods based on knowledge discovery, methods based on nonlinear system theory, methods based on hybrid models, and other forecasting methods. Concrete methods include regression methods, neural network methods, wavelet network models, time series models, support vector regression methods, Kalman filtering methods, exponential smoothing methods, gray system models, trend extrapolation methods, and artificial intelligence.
Travel speed data often show periodic trends, with one or two speed minima in a weekday; however, the trend may differ from Monday to Sunday because trip characteristics vary by day of the week.
By observing the cyclical changes in speed, one can grasp its daily variation characteristics, which can provide indicators for planning and design as well as basic services. Dendrinos [1] considered traffic as a combination of 17 periodic components and a residual part and fitted the trend with Fourier series. Fei et al. [2] presented a dynamic linear model to predict short-term travel time, in which travel time is the sum of the median of historical travel times, time-varying random variations in travel time, and a model evolution error. Liu [3] showed that speed contains an intrinsic mode function and predicted this part with ARIMA. Zhang et al. [4] analyzed the periodic variation of traffic flow with a statistical volatility model. It is important to note that periodic trend analysis cannot capture small-scale speed changes, which makes the prediction of small fluctuations in speed particularly important.
The residual part is the difference between the real time series data and the periodic trend. Comparison and analysis of a large amount of data show that the residual part fluctuates noticeably, but its volatility stays within a certain range. For such series, the autoregressive integrated moving average (ARIMA) method has a particular advantage in prediction. The ARIMA model is one of the most widely used regression techniques. Its applications in freeway traffic forecasting can be traced back to 1979 [5]. When the ARIMA model is used for traffic flow prediction [6], the data are differenced if they form a nonstationary sequence with a growth or decline trend. If the data show heteroscedasticity, the nonstationary sequence must be smoothed by differencing until the autocorrelation function and the partial autocorrelation function of the processed data are not significantly different from zero.
Lund [7] used the R language to simulate the ARIMA process, with good results. However, Karlaftis and Vlahogianni [8] pointed out that the ARIMA model lacks the ability to capture long memory properties and does not jointly treat the mean and variance.
Current research on speed prediction focuses primarily on how speed varies with time at a single point on the road; it lacks spatial analysis of the target position and ignores the influence of travel speed changes on adjacent links on the predicted results. However, links in the road network are not isolated; the speed at the target position follows its own pattern of change over time and is simultaneously affected by upstream and downstream traffic conditions. The speed of the upstream section can be transmitted to the downstream section along the road with similar distribution characteristics, and the downstream traffic state also feeds back to the upstream. Adding spatial correlation analysis can effectively detect the occurrence and influence of nearby congestion points, can reflect the impact of emergencies on speed, might warn of upstream and downstream traffic problems, and can help management measures reduce, in a timely manner, the congestion and network paralysis caused by emergencies. Van Lint [9] first established a state-space neural network (SSNN) model to predict travel times directly from the data of adjacent sections. Pan et al. [10] developed a spatial and temporal dynamic model with stochastic cell transmission. Zou et al. [11] put forward a hybrid model that combines spatial analysis with several time series analyses and compared the results.
Most methods lack accuracy or reliability when used alone for travel speed prediction, so we propose a hybrid model that combines them. We use real data to test four approaches: the periodic trend combined with ARIMA, the ARIMA method, the spatial analysis regression model, and the proposed hybrid model. Comparison indexes of predictive performance such as RMSE, MAE, and MAPE all indicate that the hybrid model can predict short-time travel speed accurately and reliably.

Methodology
A single forecasting method for short-time travel speed has proved insufficient for providing accurate information. We therefore established a hybrid model that combines the results of temporal prediction and spatial prediction so that the real speed changes on freeways can be simulated more accurately.

Periodic Trend Analysis.
Traffic speed usually changes daily. The cyclical change of travel speed is obvious, especially within a workday. Tang et al. [12] discussed the periodic characteristics of speed in detail. The periodic trend is a significant feature of speed data and should be considered first when analyzing speed change rules. Periodic analysis [7] is adopted to investigate the cyclicality in the daily traffic data and is also effective in analyzing travel speed rules.
From the perspective of regression analysis, travel speed shows a trend of cyclical fluctuation; therefore, wave theory is applicable to the analysis. A Fourier series, which is built from trigonometric functions, expresses the periodic trend in traffic speed. For a fixed period $T$, the speed at a point and the speed at the same point one period later often show some similarity and consistency, which results from the repeatability of driving behavior and the stationarity of road conditions. We assume that the cyclical change of travel speed conforms to the common expression of a Fourier series [1]; a complete cycle in a unit time interval is expressed as
$$f(t) = A_0 + \sum_{n=1}^{\infty} A_n \sin(n\omega t + \varphi_n), \tag{1}$$
in which $A_n \sin(n\omega t + \varphi_n) = a_n \cos n\omega t + b_n \sin n\omega t$ is the harmonic of order $n$. Considering that travel speed prediction covers 0 a.m. to 12 p.m., (1) takes the general form
$$f(t) = a_0 + \sum_{n=1}^{N} \left(a_n \cos n\omega t + b_n \sin n\omega t\right), \tag{2}$$
in which $\omega = 2\pi/T$ is a frequency index defined as cycles per unit time, $T$ is the cycle time, $n$ is the index of the cycle series, and $N$ is the order of periodic elements in the series. $a_n$ and $b_n$ are parameters determined by historical data.
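As an illustration of how the parameters of a truncated Fourier series could be determined from historical data, the following Python sketch fits the constant, cosine, and sine coefficients by ordinary least squares. The function names, the choice of least squares, the order 2, and the synthetic speed profile are all assumptions made for this example; they are not taken from the study.

```python
import numpy as np

def fourier_design(t, period, order):
    """Columns: 1, cos(n*w*t), sin(n*w*t) for n = 1..order, with w = 2*pi/period."""
    w = 2.0 * np.pi / period
    cols = [np.ones_like(t, dtype=float)]
    for n in range(1, order + 1):
        cols.append(np.cos(n * w * t))
        cols.append(np.sin(n * w * t))
    return np.column_stack(cols)

def fit_fourier(t, speed, period, order):
    """Least-squares estimate of [a_0, a_1, b_1, ..., a_N, b_N]."""
    X = fourier_design(t, period, order)
    coef, *_ = np.linalg.lstsq(X, speed, rcond=None)
    return coef

# Synthetic daily profile (hypothetical values, not the measured data):
# 288 five-minute slots make up one period.
t = np.arange(288, dtype=float)
period = 288.0
w = 2.0 * np.pi / period
speed = 60.0 - 8.0 * np.cos(w * t) + 3.0 * np.sin(2 * w * t)

coef = fit_fourier(t, speed, period, order=2)
trend = fourier_design(t, period, 2) @ coef  # fitted periodic trend
```

On this noise-free profile the fit recovers the coefficients used to generate it; with real data the fitted trend would only approximate the observations, and the remainder forms the residual part.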

Autoregressive Integrated Moving Average Model.
The autoregressive integrated moving average (ARIMA) model is commonly used to handle stationarity, randomness, and periodicity in time series analysis and is one of the most general models for predicting spot speed from past speed data. The periodicity of traffic speed has been considered in the above analysis, so we remove the periodic trend from the daily speed data before using the ARIMA model to forecast travel speed. The remaining part represents variations caused by specific real-time traffic conditions. ARIMA combines two algorithms, AR and MA, with I representing the integrated term. A nonseasonal ARIMA model is denoted ARIMA($p$, $d$, $q$), in which AR means autoregressive, $p$ is the number of autoregressive terms, MA means moving average, $q$ is the order of the moving average, and $d$ is the number of differences applied until the time series becomes stationary. The mathematical representation of an ARIMA($p$, $d$, $q$) model is as follows:
$$\varphi(B)(1 - B)^d y_t = \theta_0 + \theta(B)\varepsilon_t, \tag{3}$$
where $y_t$ is the original data series; $\varepsilon_t$ is a white noise sequence, that is, a sequence of random variables with zero mean and variance $\sigma^2$; $B$ is the lag operator; $\theta_0$ is a parameter; $\varphi(B) = 1 - \varphi_1 B - \cdots - \varphi_p B^p$ is the autoregressive operator, and $p$ is the autoregression order of the model; $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$ is the moving average operator, and $q$ is the order of the moving average.
The process of ARIMA modeling and analysis, which is implemented in R, is as follows:
(1) Making the data stationary: if the data sequence is not stationary, we difference the data, $d$ times in total, until we get a stationary time series.
(2) Model identification: the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the processed time series give a preliminary estimate of the autoregression order ($p$) and the moving average order ($q$) of the initial ARIMA($p$, $d$, $q$) model.
(3) Parameter estimation and model diagnostics: once the coefficients of the initial model are obtained, we test the significance of the coefficients and check whether the model residuals are white noise.
(4) Forecast and analyze the data using a model with appropriate parameters.
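Steps (1) and (2) can be sketched in a few lines. The example below is written in Python rather than R and uses only the standard library; the function names and the synthetic series are assumptions for illustration. It differences a trended series once and computes the sample ACF, which is then compared with the approximate 95% band of ±1.96/√n for white noise.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(x * x for x in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Hypothetical series: a linear trend plus an alternating component.
trended = [0.5 * i + (1.0 if i % 2 else -1.0) for i in range(200)]

stationary = difference(trended, d=1)    # one difference removes the trend
acf = sample_acf(stationary, max_lag=3)
bound = 1.96 / len(stationary) ** 0.5    # approximate 95% band for white noise

# Lags whose autocorrelation falls outside the band suggest candidate orders.
significant = [k for k in range(1, 4) if abs(acf[k]) > bound]
```

Parameter estimation and diagnostics (steps (3) and (4)) would normally rely on a statistical package and are not reproduced here.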

The Spatial Correlation Analysis.
The road consists of connected sections; each spot on the road is spatially accessible. Spatial analysis theory tells us that the relationship between the difference of attribute values in space and the distance between two points obeys the first law of geography [13]. This means that, from a statistical point of view, the closer two points are, the more similar they are. The spatial correlation of traffic flow decreases as distance increases within a certain space. For road sections of the same spatial distance, the spatial correlation generally increases as traffic load increases [14].
The correlation index $\rho$ from statistics can be used to analyze the spatial correlation of travel speed (study on remote monitor system based on 3G mobile system). Assume that there are two adjacent points A and B, and $v_A(t)$ and $v_B(t)$ are the spot speeds at A and B at the same time $t$; the correlation index is as follows:
$$\rho_{AB} = \frac{\operatorname{cov}(v_A, v_B)}{\sqrt{D(v_A)\,D(v_B)}}, \tag{4}$$
in which $\operatorname{cov}(v_A, v_B)$ is the covariance of the speeds at A and B at time $t$, $\operatorname{cov}(v_A, v_B) = (1/n)\sum_{i=1}^{n}(v_A^i - \bar{v}_A)(v_B^i - \bar{v}_B)$; $D(v_A)$ and $D(v_B)$ are the time series variances of speed at A and B.
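For concreteness, the correlation index between two speed series can be computed as the covariance divided by the square root of the product of the variances. The sketch below uses only the Python standard library; the function name and the sample speeds are hypothetical.

```python
from math import sqrt

def correlation_index(v_a, v_b):
    """Covariance of the two series divided by the root of the product of variances."""
    n = len(v_a)
    mean_a = sum(v_a) / n
    mean_b = sum(v_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(v_a, v_b)) / n
    var_a = sum((a - mean_a) ** 2 for a in v_a) / n
    var_b = sum((b - mean_b) ** 2 for b in v_b) / n
    return cov / sqrt(var_a * var_b)

# Hypothetical spot speeds at two adjacent points, sampled at the same times.
v_a = [60.0, 58.0, 55.0, 40.0, 35.0, 50.0]
v_b = [0.9 * a + 3.0 for a in v_a]   # moves exactly with v_a
v_opp = [100.0 - a for a in v_a]     # moves exactly against v_a

rho_same = correlation_index(v_a, v_b)
rho_opposite = correlation_index(v_a, v_opp)
```

The two synthetic cases sit at the extremes of the index (close to 1 and close to -1); measured speeds at neighboring stations would fall in between.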
Because vehicles move along the road, a vehicle arrives downstream from upstream after a certain period of time, and the upstream traffic flow is influenced if congestion or a failure occurs downstream. Therefore, we introduce a lagged value when analyzing the spatial correlations. Assuming that a car travels from A to E, with the time-lagged value expressed as $\tau$, the correlation index becomes
$$\rho_{AB}(\tau) = \frac{\operatorname{cov}(v_A(t-\tau), v_B(t))}{\sqrt{D(v_A(t-\tau))\,D(v_B(t))}}. \tag{5}$$
The correlation index measures the degree of linear correlation between different groups of data. After repeated verification, two adjacent points upstream and two downstream of the forecast point are suitable for selection. If the forecast point has only one adjacent point, the sample size is not sufficient to predict accurately. If the forecast point has too many adjacent points, invalid calculations are produced and computational efficiency is reduced. Assuming that there are five adjacent points on one road, as shown in Figure 1, we can build a multiple linear regression (MLR) model to forecast the travel speed of point C as follows:
$$v_C(t) = \beta_0 + \beta_1 v_A(t - \tau_{A\min}) + \cdots + \beta_i v_A(t - \tau_{A\max}) + \beta_{i+1} v_B(t - \tau_{B\min}) + \cdots + \beta_j v_B(t - \tau_{B\max}) + \beta_{j+1} v_D(t - \tau_{D\min}) + \cdots + \beta_m v_D(t - \tau_{D\max}) + \beta_{m+1} v_E(t - \tau_{E\min}) + \cdots + \beta_n v_E(t - \tau_{E\max}) + \varepsilon, \tag{6}$$
in which $\beta_0, \beta_1, \ldots, \beta_i, \beta_{i+1}, \ldots, \beta_j, \beta_{j+1}, \ldots, \beta_m, \beta_{m+1}, \ldots, \beta_n$ are the coefficients of the MLR model. Indices 1 to $i$ count the lagged values between A and C, $i+1$ to $j$ those between B and C, $j+1$ to $m$ those between C and D, and $m+1$ to $n$ those between C and E. $n$ is the total number of time-lagged values between point C and points A, B, D, and E, and is an integer greater than or equal to zero. $i$, $j$, $m$, and $n$ all come from formula (5). The value of $\tau$ is also obtained from a large amount of data by means of formula (5). $\tau_{A\min}$, $\tau_{B\min}$, $\tau_{D\min}$, $\tau_{E\min}$, $\tau_{A\max}$, $\tau_{B\max}$, $\tau_{D\max}$, and $\tau_{E\max}$ are the minimum and maximum time-lagged values between A and C, B and C, C and D, and C and E.
The time-lagged values are calculated from formula (5). After the spatial correlation analysis, the lags between each point and point C can be determined; we define the smallest $\tau$ between a point and point C as the minimum value and the largest as the maximum value. The time-lagged values are consecutive between the minimum and the maximum. $v_A(t - \tau_{A\min}), \ldots, v_E(t - \tau_{E\max})$ denote the speeds of points A, B, D, and E at times $t - \tau_{A\min}, \ldots, t - \tau_{E\max}$, respectively. Note that the number of terms of the multiple linear regression and the value of $\tau$ in each term can be determined only with specific data. $\varepsilon$ is the random error.
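A lagged multiple linear regression of this kind can be sketched as follows. The code builds one regressor per neighboring point and lag, then estimates the coefficients by least squares with NumPy. The helper name, the lag sets, and the synthetic speeds are assumptions for illustration only; in practice the lags would come from the correlation analysis described above.

```python
import numpy as np

def lagged_design(speeds, lags, t_index):
    """Intercept column plus one column speeds[name][t - tau] per (name, tau).

    speeds:  dict mapping a point name to a 1-D array of speeds
    lags:    dict mapping a point name to a list of integer lags tau
    t_index: array of time indices t at which the target is modeled
    """
    cols = [np.ones(len(t_index))]
    for name, taus in lags.items():
        for tau in taus:
            cols.append(speeds[name][t_index - tau])
    return np.column_stack(cols)

# Synthetic neighbor speeds (hypothetical, not measured data).
rng = np.random.default_rng(0)
n = 300
speeds = {"A": 55 + 5 * rng.standard_normal(n),
          "B": 55 + 5 * rng.standard_normal(n)}
lags = {"A": [0], "B": [1]}       # assumed lag sets for this example
t_index = np.arange(1, n)         # skip t = 0, where B(t - 1) is undefined

# Target speed generated from known coefficients so the fit can be checked.
v_c = 5.0 + 0.6 * speeds["A"][t_index] + 0.3 * speeds["B"][t_index - 1]

X = lagged_design(speeds, lags, t_index)
beta, *_ = np.linalg.lstsq(X, v_c, rcond=None)  # [intercept, A(t), B(t-1)]
```

Because the synthetic target contains no random error, the estimated coefficients match the generating values; with measured speeds the fit would be assessed through goodness-of-fit statistics.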
For application in an actual engineering project, we should first check the statistical information, including model verification and a goodness-of-fit test. Then the established time series, or its predicted values, can be used to forecast the travel speed at the target position in the next period. In this study, we consider the temporal and spatial influences on speed at the same time, with the aim of increasing the accuracy of short-time speed prediction. The reliability and accuracy of traffic speed prediction depend on a comprehensive understanding of every part of speed in the above analysis. Therefore, based on the discussion in the introduction, we assume that traffic speed consists of three components: a periodic trend, a residual time series, and the space-time part. The structure of the proposed model is as follows:
$$\hat{y}_t = \sum_{i} w_i f_{i,t} + c, \tag{7}$$
where $\hat{y}_t$ is the predicted travel speed at time $t$, $f_{i,t}$ is the predicted value of method $i$ at time $t$, and $w_i$ are the corresponding parameters of $f_{i,t}$, whose values can be obtained through regression on earlier values of $f_{i,t}$; $c$ is a constant. Calculating this formula usually requires analysis software, after which the most suitable model is selected according to the specific conditions.
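One simple way to read this structure is as a weighted sum of the component predictions plus a constant, with the weights estimated by regression on a recent window. The sketch below assumes that linear form; the function names and the synthetic component predictions are hypothetical, and a nonlinear combination would need a different fitting routine.

```python
import numpy as np

def fit_combination(component_preds, observed):
    """Least-squares constant and weights for combining component predictions.

    component_preds: array of shape (T, K), one column per prediction method
    observed:        array of shape (T,), the measured speeds
    """
    X = np.column_stack([np.ones(len(observed)), component_preds])
    coef, *_ = np.linalg.lstsq(X, observed, rcond=None)
    return coef[0], coef[1:]

def combine(component_preds, constant, weights):
    """Hybrid prediction: constant plus weighted sum of the components."""
    return constant + component_preds @ weights

# Synthetic temporal and spatial predictions (hypothetical values).
rng = np.random.default_rng(1)
temporal = 55 + 6 * rng.standard_normal(120)
spatial = 55 + 6 * rng.standard_normal(120)
preds = np.column_stack([temporal, spatial])
observed = 1.5 + 0.4 * temporal + 0.55 * spatial

constant, weights = fit_combination(preds, observed)
forecast = combine(preds, constant, weights)
```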

Case Study
The speed data used in the study were collected on a northeastward segment of the I-394 freeway between Trunk Highway 100 and Egret Blvd in the Twin Cities Metro area, Minnesota [15]; the segment is a two-way six-lane section. This freeway segment experiences a significant speed decline during afternoon peak hours. Five adjacent stations on the selected segment collect traffic speed data, and point C is the one we try to predict. The distance between adjacent stations is about one mile. To ensure stable traffic on the road, we deliberately chose points far from entrance and exit ramps that carry large amounts of traffic, and we avoided intersections to reduce the impact of traffic lights on speed. The location of the segments and stations is shown in Figure 2.
The travel speed data were collected every 5 minutes, 24 hours a day, from July 14, 2015, to July 13, 2016. Because the data span a long period, we removed abnormal values caused by extreme weather conditions, traffic flow breakdown, and so on. Less than 5% of the data are missing for the 5 stations; to ensure the completeness of the speed data selected for model validation, we replaced missing and bad data with historical averages.
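The replacement of missing readings can be illustrated with a short sketch. It assumes that "historical average" means the mean of the valid readings in the same 5-minute slot on other days, which is one plausible reading rather than a detail stated above; the function name and the sample values are hypothetical.

```python
def fill_with_historical_average(days):
    """Replace None entries with the mean of the same time slot on other days.

    days: list of daily records, each a list of speeds per 5-minute slot,
          with None marking a missing or discarded reading.
    """
    n_slots = len(days[0])
    slot_means = []
    for s in range(n_slots):
        valid = [day[s] for day in days if day[s] is not None]
        slot_means.append(sum(valid) / len(valid) if valid else None)
    return [[slot_means[s] if day[s] is None else day[s]
             for s in range(n_slots)]
            for day in days]

# Three hypothetical days with three slots each.
days = [[60.0, None, 50.0],
        [62.0, 58.0, None],
        [64.0, 56.0, 52.0]]
filled = fill_with_historical_average(days)
```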

Periodic Analysis by Fourier Series.
It is known that the traffic speed pattern on weekdays is quite different from that on weekends; traffic flow during weekends is normally smooth. Thus, this study chose weekday data to express the periodicity of speed change. Taking Monday as an example, Figure 3(a) shows the pattern of speed variation, and Figure 3(b) shows the historical median travel speed on the different weekdays. It is observed that there is one set of peak hours (from 3:00 p.m. to 7:30 p.m.), and the peak values vary across weekdays.
Based on the analysis shown in Figure 3, we establish a two-dimensional Fourier series to match the daily periodicity of speed. For example, we plot the speed on every Thursday over 53 weeks and fit the scattered points, as shown in Figure 4.
The goodness-of-fit test results for each working day (as shown in Table 2) show that the Fourier series alone does not fit well. The values of R-square, one of the main indicators of fitting accuracy, are all less than 0.6, which means the accuracy of this fitting model is limited. Therefore, periodic trend analysis alone is not enough to forecast speed.
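For reference, R-square compares the residual sum of squares of a fit with the total sum of squares around the mean. The minimal sketch below uses hypothetical speeds and shows the two limiting cases: a perfect fit and a fit that is no better than the mean.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

observed = [60.0, 58.0, 42.0, 40.0]         # hypothetical speeds
perfect = r_squared(observed, observed)     # fit reproduces the data
flat = r_squared(observed, [50.0] * 4)      # fit equals the mean everywhere
```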

Process of ARIMA Modeling.
We take the speed data of June 14th as an example. As shown in Figure 5, the residual part of the travel speed is what remains after the trend part is removed from the original data, and it still shows some patterns. The ARIMA model has been widely used for volatile short-time prediction, and its effectiveness and accuracy have been confirmed by a large number of experiments. Because the periodicity of the speed data has already been handled in the analysis above, no differencing is needed to remove the data cycle in the process of ARIMA modeling, which means $d = 0$ in the ARIMA model. Accordingly, the ARIMA model reduces to an ARMA($p$, $q$) model. The autocorrelation function and partial autocorrelation function of the residual part indicate the values of $p$ and $q$.
From Figure 6, we see that several orders fall outside the confidence interval, which leads to a number of alternative models. The specific parameters of the model are usually determined by Akaike's information criterion (AIC). According to this criterion, the model with the smallest AIC value is preferred. The goodness of fit can obviously be improved by increasing the number of free parameters; AIC rewards a good fit to the data while penalizing additional parameters, and thus avoids overfitting:
$$\mathrm{AIC} = 2k - 2\ln(L),$$
where $k$ is the number of estimated parameters and $L$ is the maximized value of the likelihood function. In our case analysis, the most suitable model is ARIMA(1, 0, 2). With this model, we obtain the predicted values of the speed residual part together with the 95% confidence interval. The residual part of the prediction results is shown in Figure 7.
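The selection step can be sketched with the standard definition of AIC, twice the number of estimated parameters minus twice the maximized log-likelihood. In the example below the log-likelihood values are invented placeholders, not results from the data, and counting the parameters as p + q + 1 (coefficients plus noise variance) is one common convention assumed here.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical maximized log-likelihoods for candidate (p, q) orders.
candidates = {(1, 1): -512.4, (1, 2): -505.1, (2, 2): -504.9}

# Assumed parameter count: p + q coefficients plus the noise variance.
scores = {order: aic(ll, sum(order) + 1) for order, ll in candidates.items()}
best = min(scores, key=scores.get)   # order with the smallest AIC
```

With these placeholder values the (2, 2) candidate fits marginally better than (1, 2), but its extra parameter costs more than the gain in likelihood, so the smaller model is chosen.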

Discussion of Spatial Correlation.
The locations of the detection points are shown in Figure 2. Taking point C as the target location, we compute the sample cross correlation function (CCF) between point C and the other points at different lags. In statistical studies, a correlation is generally considered strong when the correlation coefficient is larger than 0.8. As shown in Figure 8, the CCF gradually increases as the absolute value of the lag decreases. The influence of upstream points on speed is larger than that of downstream points.
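A lagged correlation of this kind can be computed by shifting one series against the other before applying the usual correlation formula. The sketch below is a hypothetical illustration: the function name, the lag range, and the sample speeds are assumptions, and the target is constructed to follow the upstream series with a delay of one step.

```python
from math import sqrt

def lagged_correlation(target, neighbor, lag):
    """Correlation between target(t) and neighbor(t - lag)."""
    if lag >= 0:
        x, y = target[lag:], neighbor[:len(neighbor) - lag]
    else:
        x, y = target[:lag], neighbor[-lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical speeds: the target repeats the upstream value one step later.
base = [50.0, 52.0, 47.0, 60.0, 58.0, 41.0, 45.0, 63.0, 55.0, 49.0, 61.0, 44.0]
upstream = base[1:]
target = base[:-1]

# Keep the lags whose absolute correlation exceeds the 0.8 threshold.
strong_lags = [lag for lag in range(-2, 3)
               if abs(lagged_correlation(target, upstream, lag)) > 0.8]
```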
In our case, the CCF of E and C is less than 0.8, so the relationship between these two points is ignored in the calculation. We therefore consider that there are strong correlations between C and A, B, and D, and that the largest lag is 1. The general expression of the MLR model at point C is given by
$$v_C(t) = \beta_0 + \beta_1 v_A(t) + \beta_2 v_A(t-1) + \beta_3 v_B(t+1) + \beta_4 v_B(t) + \beta_5 v_B(t-1) + \beta_6 v_D(t). \tag{10}$$
Parameters in the model can be obtained by linear fitting. The speed of point C can be forecasted from real-time speed data and the model above. The fitted results show that the R-square is 0.8745 and the probability of the corresponding F statistic is close to 0, so this MLR model is considered valid.
This method can achieve higher prediction accuracy but is easily affected by upstream and downstream traffic flow. If the upstream or downstream speed record fails, abnormal predicted values appear.

Hybrid Model.
In the above study, we obtained the temporal and spatial predictions; the final result is shown in Figure 9. To improve the accuracy of the forecast, this case uses the data of every Tuesday in the past year as training data and the data of the hour preceding the forecast target as testing data. All data are at 5 min intervals. In this case, the calculation shows no correlation between the temporal and spatial speed predictions, so we use a nonlinear regression model to estimate the parameters of the hybrid model. The red solid line shows the predicted travel speed, the blue scatter points represent the real speed data, and the green dashed lines mark the upper and lower bounds of the 95% prediction intervals. The proposed model is able to predict travel speed with a small error in most situations. Generally speaking, the wider the confidence interval, the greater the possibility of speed anomalies, so that the interval can cover the abnormal data. The speed prediction shows a large fluctuation between 210 and 250, with a certain error relative to the actual speed. A similar phenomenon appears between 50 and 100. This is because the ARIMA model is still a time series model, and forecasts produced by time series methods are influenced by the speeds of the preceding period. This is also reflected in the ACF and PACF when making ARIMA predictions.

Discussion
To objectively evaluate the effectiveness of the different prediction models, four indexes are introduced for comparison: the root mean squared error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and the maximum absolute percentage error (mAPE), with the results shown in Table 3.
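The four indexes follow their usual definitions, which can be written compactly as below. The function names and the three sample values are hypothetical; the percentage errors assume that no observed speed is zero.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n

def max_ape(observed, predicted):
    """Maximum absolute percentage error, in percent."""
    return 100.0 * max(abs(o - p) / abs(o) for o, p in zip(observed, predicted))

# Hypothetical observed and predicted speeds.
observed = [50.0, 60.0, 40.0]
predicted = [45.0, 63.0, 40.0]
scores = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "MAPE": mape(observed, predicted),
    "mAPE": max_ape(observed, predicted),
}
```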
As seen from Table 3, the single time series method gives the worst predictions of the four methods. Spatial correlation prediction performs better on the experimental data, but the model assumes that the points on the road are adjacent and that the road alignment is stable with no abrupt change. When the cross correlation is less than 0.8, the correlation is treated as nonexistent. The method combining the periodic trend and the residual has higher accuracy but does not consider emergent events occurring upstream and downstream. Compared with the above methods, the hybrid model performs best on all indexes. In general, hybrid model prediction based on spatial-temporal fusion is better than single time series prediction and spatial regression prediction. The applicability of this method is verified by its higher forecasting precision.
The results of the periodic analysis show that, in this case, speeds are scattered between 0 a.m. and 7 a.m., and that between 3 p.m. and 6 p.m. speed decreases and then returns to normal. These patterns occur every Monday. Therefore, decision-makers can set maximum and minimum speeds from 0 a.m. to 7 a.m. to reduce the speed difference between vehicles and improve traffic safety. ARIMA prediction residuals are basically in accordance with actual residuals; however, because ARIMA predictions respond to changes in the data, they lag somewhat behind actual changes, although the results are basically within the 95% confidence interval. The results of the spatial correlation analysis show a strong correlation between C and A, B, and D; that is, if there are emergencies at points A, B, or D, warnings should be issued at point C in time to reduce the spread of congestion.

Conclusion
This paper proposed a hybrid model for multistep short-time travel speed prediction with spatial and temporal fusion. This model divides travel speed into three parts: a periodic trend, a residual part, and the abrupt change caused by upstream and downstream traffic flow. The periodic part generally changes by week, and the expression of the periodic trend is obtained by fitting a Fourier series to historical data. The residual part is the difference between the real speed and the value of the periodic function at the corresponding time. The residual part is then further analyzed by the ARIMA model, the purpose of which is to forecast short-time changes. The aim of the spatial analysis regression model is to capture the influence of traffic incidents. Therefore, the model can improve the accuracy of speed prediction and at the same time provide feedback on the patterns of speed, especially the periodic variation. It provides a theoretical basis for improving the service quality of ITS and traffic information systems. Freeway travel speed data were collected from five sensors located on I-394 in Minnesota. Through multistep calculation, we obtained the prediction results of the proposed model. Comparisons between the proposed hybrid model and the others indicate that treating time and space comprehensively yields better forecasting accuracy and reliability in terms of RMSE, MAE, MAPE, and mAPE.
Based on this study, two conclusions are worth noting. First, there is no significant correlation between the speed predicted by the time series analysis and that predicted by the spatial correlation analysis. Therefore, we can only use a nonlinear function to fit the data; concrete formulas need to be calculated repeatedly in SPSS or MATLAB according to the actual data, which makes the initial setup slow. Furthermore, the function needs to be rederived after a period of time to confirm that the expression has not changed. Second, travel speed has always been considered to exhibit strong day-to-day cyclical patterns. However, by comparison with real data, we found that the speed changes and extrema differ across the days of a week. Therefore, the periodic law can be expressed more accurately on a week-to-week basis.

Figure 1: Location of 5 adjacent points in one road.

Hybrid Model.
The travel speed has strong temporal and spatial correlation in the road network. The speed change in the next period can be regarded as a continuation of the speed variation that occurred in the last period; on the other hand, travel speed can be affected by traffic conditions on nearby links. An alternative approach is to treat the change of travel speed as a combination of cyclic and definitive components determined by real-time transformation, the space-time relationship, specific road conditions, and so on.

Figure 2: Location of 5 adjacent stations on I-394 segment, from station S979 to station S983.

Figure 3: (a) Speed change on 53 Mondays. (b) Historical median travel speed change from Monday to Friday.

Figure 4: Speed distribution scattered points and Fourier fitting curve.

Figure 5: Dividing the speed into a trend and a residual part.

Figure 9: Results of hybrid model prediction.

Table 1: Coefficients of Fourier series.

Table 3: Evaluation indexes of the different prediction methods.