Examining the Impact of Different Periodic Functions on Short-Term Freeway Travel Time Prediction Approaches

Freeway travel time prediction is a key technology of Intelligent Transportation Systems (ITS). Many scholars have found that periodic functions play a positive role in improving the accuracy of travel time prediction models. However, very few studies have comprehensively evaluated the impacts of different periodic functions on statistical and machine learning models. In this paper, our primary objective is to evaluate the performance of six commonly used multistep ahead travel time prediction models (three statistical models and three machine learning models). In addition, we compare the impacts of three periodic functions on multistep ahead travel time prediction for different temporal scales (5-minute, 10-minute, and 15-minute). The results indicate that the periodic functions can improve the prediction performance of machine learning models for predictions more than 60 minutes ahead and improve the accuracy of statistical models for predictions more than 30 minutes ahead. The three periodic functions differ only slightly in how much they improve the prediction accuracy of the six prediction models. For the same prediction step, the effect of the periodic function is more obvious at a higher level of aggregation.


Introduction
Travel time can effectively measure roadway traffic conditions [1]. Thus, accurate prediction of freeway travel time is important for traffic management agencies to provide better traffic guidance. However, it is challenging for researchers to predict travel time accurately due to the complex changes in traffic states [2]. A large number of algorithms have been proposed to improve the prediction accuracy of travel time.
Some researchers compared the performance of statistical models and machine learning models. For example, Stathopoulos et al. [26] found that fuzzy neural network outperformed Autoregressive Integrated Moving Average Model (ARIMA) in prediction performance. Vlahogianni [27] suggested that the advanced Neural Network (NN) structure can perform better than the ARIMA model. Jiang et al. [28] examined the prediction performance of different models under multiple steps ahead, and their results indicated that the machine learning models are superior to the two statistical models (i.e., vector autoregressive models and ARIMA).
Traffic data usually exhibit periodic characteristics during weekdays. Thus, considering the periodicity of the data can improve the prediction performance. To date, three different approaches have been proposed to capture the periodic characteristics. Zou et al. [29] found that a synthetic prediction model consisting of statistical models and a trigonometric polynomial function (TPF) can achieve higher prediction accuracy when the forecasting horizon is greater than half an hour with 5 minutes as the aggregation level. Tang et al. [30] applied a double exponential smoothing (DES) method to describe the weekly similarities of traffic data. Chen et al. [31] compared prediction models built on the original traffic flow series with models built on the residual series obtained by removing the intraday trend with the simple average (SA) approach. They found that the prediction accuracy could be considerably improved by using the residual time series.
Regarding the prediction interval (steps), some existing studies have investigated the impact of data resolution on model prediction performance, but there are no definitive results. For example, Park et al. [32] considered aggregation levels from 2 minutes to 60 minutes for the ARIMA model based on travel time data. They concluded that forecasting route travel time required higher aggregation levels than link travel time prediction. Vlahogianni et al. [33] found that temporal aggregation may distort critical traffic flow information and that further research is needed to determine the optimal aggregation level. Some studies found that higher data resolution usually shows larger noise [34,35].
Based on the previous studies, some studies have compared statistical models and machine learning models, and some scholars have proposed the improvement of periodic functions on travel time prediction. However, few studies have comprehensively evaluated the effects of different periodic functions on the two types of models under different prediction steps.
Thus, this study focuses on multistep ahead travel time prediction by considering different periodic functions. The periodic characteristics of the travel time are captured by the SA, TPF, and DES models. The residual part is modeled by the statistical models (ARIMA, space time (ST) model, vector autoregressive (VAR) model) and the machine learning models (support vector machine (SVM), back propagation neural network (BPNN), multilinear regression (MLR)). In total, 18 hybrid prediction models were established and compared. In addition, the performance of the prediction models was evaluated under different scenarios: multistep ahead prediction (1, 3, 6, and 12 steps ahead) with different aggregation levels (5-minute, 10-minute, and 15-minute). The remainder of the paper is organized as follows. In Section 2, we introduce the travel time data used in the study. We describe the data collection site and analyze the temporal and spatial correlation as well as the diurnal pattern observed in the data. In Section 3, we introduce the periodic functions and the two main methodologies used in this study: statistical models and machine learning approaches. We also discuss the evaluation measures and determine the appropriate training period. In Section 4, we evaluate the prediction performance of the six models and compare the impacts of different periodic functions on the prediction models under different scenarios. In Section 5, we provide the conclusions and some directions for future work.

Travel Time Data
This study analyzed the travel time data of US-290 between IH-610 and FM-1960 in Houston, Texas. The total length is approximately 12 miles. The segment is divided into five links by six automatic vehicle identification (AVI) readers (Figure 1). The ID and timestamp of each vehicle with a toll tag are recorded as it passes an AVI reader. The travel time of the link enclosed by a pair of AVI readers is the difference between the two timestamps. The lengths of links A to E are 0.8, 2.6, 3.0, 1.5, and 4.1 miles, respectively. The data collection period is from January 2008 to August 2008, a total of 174 days. The travel times were initially collected once every 30 seconds, 24 hours per day. We calculated the arithmetic mean of travel time and aggregated the travel time into 5-minute, 10-minute, and 15-minute intervals for each link. Less than 1% of the data are missing on each of the five links, and a data imputation method based on historical averages was applied to ensure that the selected travel time data are appropriate for model validation and evaluation in this study. This study focuses only on weekday (Monday-Friday) travel time prediction.

Temporal and Spatial Correlation of the Travel Time.
We calculated the historical average travel time per mile of the five links (Monday to Friday, January to August 2008) (Figure 2). Traffic peaks in the afternoon on all five links, and there are mainly three types of travel time patterns. For link A, travel time increases after 12:00, peaks at about 16:30, and returns to normal after 20:00. For links B and C, traffic congestion starts at 12:00, peaks around 17:35, and returns to normal after 20:00. For links D and E, traffic congestion often occurs before 16:00, peaks around 17:50, and dissipates after 20:00. In our study, link D is chosen as the target link.
Changes in traffic flow have certain temporal and spatial characteristics. Autocorrelation and cross-correlation functions were calculated to examine the temporal and spatial correlation. The equations adopted here follow those of Zou et al. [23], as shown in equations (1)-(3):

$$ar_{k_a} = \frac{ac_{k_a}}{ac_0}, \qquad ac_{k_a} = \frac{1}{N_a}\sum_{t=1}^{N_a-k_a}(x_t-\bar{x})(x_{t+k_a}-\bar{x}), \tag{1}$$

where $ar_{k_a}$ is the sample autocorrelation function; $k_a$ is the time lag; $ac_0$ is $ac_{k_a}$ evaluated at $k_a = 0$; $N_a$ is the number of observations; $x_t$ is the sample observation; and $\bar{x}$ is the sample mean of the series. The cross-correlation function measures the temporal and spatial correlation between the travel time data pairs recorded on two selected links. For travel time data pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, an estimate of the cross-covariance function $c_{xy}(k_c)$ is

$$c_{xy}(k_c) = \begin{cases} \dfrac{1}{n}\sum\limits_{t=1}^{n-k_c}(x_t-\bar{x})(y_{t+k_c}-\bar{y}), & k_c = 0, 1, 2, \ldots, \\[2ex] \dfrac{1}{n}\sum\limits_{t=1}^{n+k_c}(y_t-\bar{y})(x_{t-k_c}-\bar{x}), & k_c = -1, -2, \ldots, \end{cases} \tag{2}$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the $x_t$ series and the $y_t$ series, respectively, and $n$ is the number of travel time data pairs. An estimate of the lag-$k_c$ cross-correlation function is

$$cr_{xy}(k_c) = \frac{c_{xy}(k_c)}{s_x s_y}, \quad k_c = 0, \pm 1, \pm 2, \ldots, \tag{3}$$

where $s_x = \sqrt{c_{xx}(0)}$, $s_y = \sqrt{c_{yy}(0)}$, and $cr_{xy}(k_c)$ is the sample cross-correlation function.
We found that the autocorrelation function of travel time shows a downward trend with time lag (Figure 3). The cross-correlation functions between link D and links A, B, C, and E peak at lags of −9, −4, 0, and 0, respectively (Figure 4). As can be seen from the previous analysis (Figure 2), the peak times of the five links are not the same. The peaks of links A and B occur earlier than that of link D, so the cross-correlation functions between link D and links A and B reach their peaks at lags of −9 and −4. In other words, the traffic state of links A and B changed 45 minutes and 20 minutes earlier than that of link D. Links C and E are directly connected to link D, so their congestion states and peak times are more similar, and the cross-correlation functions between link D and links C and E peak at a lag of 0. Furthermore, the maximum cross-correlation values between link D and links A, B, C, and E are 0.547, 0.720, 0.822, and 0.904, respectively.
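The lag at which the cross-correlation peaks can be located with a few lines of code. The following is a minimal sketch, assuming the convention that lag k pairs x[t] with y[t + k], so a negative peak lag means the second series changes earlier than the first. The function names and the toy bump-shaped series are illustrative assumptions, not the paper's data or implementation.

```python
import math


def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t] and y[t + lag]."""
    n = len(x)
    if n == 0 or len(y) != n:
        raise ValueError("x and y must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Only use time indices where both x[t] and y[t + lag] exist.
    start, stop = max(0, -lag), min(n, n - lag)
    cov = sum((x[t] - mean_x) * (y[t + lag] - mean_y)
              for t in range(start, stop)) / n
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / n)
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    return cov / (s_x * s_y)


def peak_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the largest cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: cross_correlation(x, y, k))


# Toy series (assumed): the "upstream" congestion bump occurs 9 steps
# earlier than the bump on the target link.
target = [math.exp(-((t - 100) / 10) ** 2) for t in range(200)]
upstream = [math.exp(-((t - 91) / 10) ** 2) for t in range(200)]
print(peak_lag(target, upstream, 12))
```

With 5-minute data, a peak lag of −9 on such a pair would correspond to the second series leading by 45 minutes, which mirrors how the lags for links A and B are read in the text.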

Periodic Pattern of Travel Time.
Previous research showed that travel time exhibits periodic characteristics during the weekdays. Similar periodic characteristics were found by Kamarianakis et al. [36] in occupancy, speed, and flow data. Because the periodic trend may affect the travel time prediction, this study proposed the hybrid prediction models to accommodate the periodic trend components as well as the temporal and spatial correlation observed in the data. Specifically, periodic characteristics are modeled using TPF, SA, and DES methods.

Periodic Functions
The simple average method is one of the commonly used methods to describe periodic characteristics [31]. We assume that the travel time data sampled on $M$ consecutive working days can be written as a series of one-dimensional vectors $Y_d$, as shown in equation (4). The intraday trend $\bar{Y}$ is then calculated by the simple average method as in equation (5):

$$Y_d = [y_d(1), y_d(2), \ldots, y_d(m)], \quad d = 1, 2, \ldots, M, \tag{4}$$

$$\bar{Y} = \frac{1}{M}\sum_{d=1}^{M} Y_d, \tag{5}$$

where $y_d(i)$ stands for the travel time collected at time $i$ on day $d$, $M$ is the number of sampled days, and $m$ is the number of sampled data points per day. In this study, $M = 30$ and $m = 288$.
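The simple average trend is an element-wise mean over days. The sketch below illustrates it on a tiny made-up data set with two days and three samples per day; the function names and numbers are assumptions for illustration only.

```python
def simple_average_trend(days):
    """Element-wise mean of M daily travel time vectors of equal length m."""
    if not days:
        raise ValueError("at least one day of data is required")
    m = len(days[0])
    if any(len(day) != m for day in days):
        raise ValueError("all days must have the same number of samples")
    return [sum(day[i] for day in days) / len(days) for i in range(m)]


def remove_trend(day, trend):
    """Residual series left after subtracting the intraday trend."""
    return [y - s for y, s in zip(day, trend)]


# Toy data (assumed): M = 2 days, m = 3 samples per day, in seconds.
days = [[60.0, 90.0, 120.0], [80.0, 110.0, 100.0]]
trend = simple_average_trend(days)        # [70.0, 100.0, 110.0]
residual = remove_trend(days[0], trend)   # [-10.0, -10.0, 10.0]
```

In the setting described in the text, `days` would hold 30 vectors of 288 five-minute values each.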

Trigonometric Polynomial Function.
The trigonometric polynomial function uses sines and cosines to describe the periodic pattern. Equation (6) was used to describe the average daily travel time at each station:

$$S_t = a_0 + \sum_{r=1}^{n_r}\left[a_{2r-1}\sin\left(\frac{2\pi r t}{m}\right) + a_{2r}\cos\left(\frac{2\pi r t}{m}\right)\right], \tag{6}$$

where $S_t$ is the estimated periodic component at time $t$; $m$ indicates the number of samples per day; $n_r$ is the number of trigonometric polynomials; and $a_0, \ldots, a_{2n_r}$ are the coefficients.
Regarding the selection of the optimal number of trigonometric terms, Zou et al. [29] reported that the number of trigonometric polynomials may affect the prediction accuracy of the hybrid model and found that 15 or more trigonometric polynomials should be included in the periodic component. Therefore, in this study, we set the value of $n_r$ in equation (6) to 15.
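A trigonometric polynomial can be fitted to an average daily profile as sketched below. The sketch assumes the profile covers exactly one day on a uniform grid and that $n_r < m/2$; under those assumptions the sine and cosine terms are orthogonal, so the least-squares coefficients reduce to simple projections. The paper does not state which fitting routine was used, and the function names and toy profile are illustrative.

```python
import math


def fit_tpf(profile, n_r):
    """Fit a_0 plus n_r sine/cosine pairs to one day of m uniform samples."""
    m = len(profile)
    if not 0 < n_r < m / 2:
        raise ValueError("n_r must satisfy 0 < n_r < m / 2")
    a0 = sum(profile) / m
    sin_coefs, cos_coefs = [], []
    for r in range(1, n_r + 1):
        w = 2 * math.pi * r / m
        # Projection onto each harmonic (valid because the basis is orthogonal).
        sin_coefs.append(2 / m * sum(y * math.sin(w * t)
                                     for t, y in enumerate(profile)))
        cos_coefs.append(2 / m * sum(y * math.cos(w * t)
                                     for t, y in enumerate(profile)))
    return a0, sin_coefs, cos_coefs


def eval_tpf(coefs, t, m):
    """Evaluate the fitted periodic component at time index t."""
    a0, sin_coefs, cos_coefs = coefs
    total = a0
    for r, (a_s, a_c) in enumerate(zip(sin_coefs, cos_coefs), start=1):
        w = 2 * math.pi * r / m
        total += a_s * math.sin(w * t) + a_c * math.cos(w * t)
    return total


# Toy daily profile (assumed): 288 five-minute samples.
m = 288
profile = [100 + 20 * math.sin(2 * math.pi * t / m)
           + 5 * math.cos(2 * math.pi * 3 * t / m) for t in range(m)]
coefs = fit_tpf(profile, n_r=15)
fitted = [eval_tpf(coefs, t, m) for t in range(m)]
```

On real data with gaps or partial days, an ordinary least-squares solve over the same basis would be the safer choice, since the orthogonality shortcut no longer holds.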

Double Exponential Smoothing.
Double exponential smoothing is a widely used method for both smoothing and forecasting time series. This approach builds the prediction from the level $M_t$ and the trend $T_t$ of the series. The model can be expressed as

$$M_t = \alpha Y_t + (1-\alpha)(M_{t-1} + T_{t-1}), \tag{7}$$

$$T_t = \beta (M_t - M_{t-1}) + (1-\beta) T_{t-1}, \tag{8}$$

where $Y_t$ is the observed travel time at time $t$; $M_t$ stands for the estimate of the level of the series at time $t$; $T_t$ indicates the estimate of the slope of the series at time $t$; and $\alpha$ and $\beta$ are smoothing parameters, which can be estimated using the Levenberg-Marquardt algorithm.
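The level and trend recursion of double exponential smoothing can be sketched as follows. This is a minimal illustration assuming Holt's linear form, fixed smoothing parameters supplied by the caller, and a simple initialization from the first two observations. The paper estimates the parameters with the Levenberg-Marquardt algorithm, which is not reproduced here.

```python
def double_exponential_smoothing(series, alpha, beta):
    """Return the final (level, trend) after smoothing the series."""
    if len(series) < 2:
        raise ValueError("at least two observations are required")
    # Assumed initialization: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def des_forecast(level, trend, steps):
    """Forecast `steps` intervals ahead from the last level and trend."""
    return level + steps * trend


# Toy series (assumed): a perfectly linear travel time pattern.
level, trend = double_exponential_smoothing([10.0, 12.0, 14.0, 16.0, 18.0],
                                            alpha=0.5, beta=0.5)
three_ahead = des_forecast(level, trend, steps=3)
```

For a perfectly linear series the recursion reproduces the line regardless of the parameters, which makes it a convenient sanity check.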

Statistical Models
(1) Autoregressive Integrated Moving Average (ARIMA) Models. The ARIMA model transforms a nonstationary time series into a stationary one by differencing it $di$ times, after which the stationary sequence can be predicted by the ARMA model. Mathematically, an ARMA($p$, $q$) process is expressed as

$$y(t) = \sum_{i=1}^{p}\varphi_i\, y(t-i) + \varepsilon(t) + \sum_{i=1}^{q}\theta_i\, \varepsilon(t-i), \tag{9}$$

where $y(t)$ stands for the travel time at time $t$; $\varphi_i$ and $\theta_i$ are the model parameters; $\varepsilon(t)$ indicates white Gaussian noise with mean zero and variance $\sigma_w^2$; $p$ is the number of autoregressive terms; and $q$ is the number of lagged forecast errors. Let $B$ denote the backshift operator, $B\,y(t) = y(t-1)$. The ARIMA model is then given by equation (10):

$$\Bigl(1 - \sum_{i=1}^{p}\varphi_i B^i\Bigr)(1-B)^{di}\, y(t) = \Bigl(1 + \sum_{i=1}^{q}\theta_i B^i\Bigr)\varepsilon(t), \tag{10}$$

where $di$ is a nonnegative integer that stands for the number of nonseasonal differences. If $di = 0$, the ARMA model is obtained. When predicting each future travel time value, the best order of the ARIMA model is determined by the Akaike information criterion (AIC).
(2) Space Time Model. As a probabilistic modeling method, the ST model can provide point predictions and the corresponding prediction intervals. The normal distribution is used to describe travel time in this study. The point prediction $\hat{D}_{t+pr}$ is computed from $u_{t+pr}$, $\sigma_{t+pr}$, and $\Phi$, where $pr = 1, 2, 3, \ldots, 12$; $u_{t+pr}$ and $\sigma_{t+pr}$ are the location parameter and scale parameter of $N(u_{t+pr}, \sigma_{t+pr}^2)$; and $\Phi$ is the cumulative distribution function of the standard normal distribution. The location parameter $u_{t+pr}$ is modeled as a linear combination of current and previous values of the travel time series on all links. Different combinations of predictive variables need to be considered when choosing them. Therefore, we began with the most complex model and gradually removed predictive variables until no further improvement was obtained. For instance, for $pr = 1$, nine variables were selected from the current and previous values of $A_t$, $B_t$, $C_t$, $D_t$, and $E_t$, the travel times on links A, B, C, D, and E at time $t$, with model coefficients $\alpha_0, \alpha_1, \ldots, \alpha_9$. To model the predictive spread $\sigma_{t+pr}$, the ST model allows for conditional heteroscedasticity by modeling $\sigma_{t+pr}$ as a linear function of the volatility value $v_t$:

$$\sigma_{t+pr} = b_0 + b_1 v_t,$$

where the coefficients $b_0$ and $b_1$ are nonnegative and $v_t$ is the modeled volatility value.
(3) Vector Autoregressive Models. The VAR model is one of the most widely used statistical methods for time series prediction. The model can include many factors, such as the impact of upstream and downstream links on future travel time. In this research, a 5-equation VAR model is used, which can be expressed as

$$Z_{t+1} = c_1 Z_t + c_2 Z_{t-1} + \cdots + c_k Z_{t-k+1} + f_{t+1},$$

where $Z_t$ = the 5 × 1 vector of travel times on the five links at time $t$; $c_1, \ldots, c_k$ = 5 × 5 coefficient matrices; and $f_{t+1}$ = the corresponding 5 × 1 independently and identically distributed random vector with $E(f_{t+1}) = 0$. The stability of the VAR model can be checked through the characteristic polynomial

$$\det\left(I_5 - c_1 z - c_2 z^2 - \cdots - c_k z^k\right) = 0,$$

where $I_5$ stands for the 5 × 5 identity matrix. The model is stable if and only if all characteristic roots lie outside the unit circle.

Machine Learning Models
(1) Support Vector Machine Model. The SVM approach maps the sample space into a high or even infinite dimensional feature space (Hilbert space) through a nonlinear mapping and constructs a linear regression in the new space. Given a set of data points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$ for regression, where $N$ is the number of training samples, the objective of SVM is to find a function

$$g(x) = w^{T}\Phi(x) + b, \tag{17}$$

where $\Phi(\cdot)$ is the function that maps the input $x$ into the feature space; $w$ is the weight vector; and $b$ is a constant bias. A $\lambda$-insensitive loss function is assumed as

$$L_\lambda(x, y, g) = \max\bigl(0,\ |y - g(x)| - \lambda\bigr). \tag{18}$$

Then $w$ and $b$ can be estimated by solving the optimization problem

$$\min_{w,\,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} L_\lambda(x_i, y_i, g), \tag{19}$$

where $\lambda$ is the maximum deviation permitted; $L_\lambda(x, y, g)$ is the loss function; and $C$ is the penalty for deviations during training, which controls the tradeoff between the empirical risk and the smoothness of the model. Introducing the slack variables $\beta_i$ and $\beta_i^*$, the optimization problem is restated as

$$\min_{w,\,b,\,\beta,\,\beta^*}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}(\beta_i + \beta_i^*) \quad \text{s.t.}\quad y_i - g(x_i) \le \lambda + \beta_i,\ \ g(x_i) - y_i \le \lambda + \beta_i^*,\ \ \beta_i, \beta_i^* \ge 0. \tag{20}$$

This problem is solved by using the Lagrangian. The regression function is

$$g(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^*)\,E(x_i, x) + b, \tag{21}$$

where $E(\cdot, \cdot)$ is the kernel function and $\alpha_i$ and $\alpha_i^*$ are the solutions to the dual problem. In our study, grid search and cross validation are used to optimize the parameters $C$ and $c$.
The cross-validation method divides the data into three groups; each group in turn serves as the validation set while the other two are used as the training set, so three models are obtained. Grid search enumerates parameter combinations programmatically to compare the performance of models with different values of $C$ and $c$. In this paper, all combinations of $\log_2 C$ and $\log_2 c$ between −5 and 5 were traversed, and the parameter combination with the minimum mean square error was selected.
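The grid search with three-fold cross validation can be sketched as follows. The sketch makes several assumptions: exponents are integers from −5 to 5, folds are contiguous thirds of the data, and `fit_predict` is a hypothetical callable standing in for training an SVM and predicting on the validation fold. `toy_fit_predict` is a made-up stand-in used only so the example runs without an SVM library.

```python
import itertools


def contiguous_folds(n, k=3):
    """Split indices 0..n-1 into k contiguous folds (assumed fold layout)."""
    bounds = [n * i // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def grid_search(X, y, fit_predict, exponents=range(-5, 6)):
    """Return (mse, C, c) with the lowest mean validation MSE over 3 folds."""
    folds = contiguous_folds(len(y))
    best = None
    for e_big, e_small in itertools.product(exponents, repeat=2):
        C, c = 2.0 ** e_big, 2.0 ** e_small
        fold_mse = []
        for k, val_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k
                         for i in fold]
            preds = fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in val_idx], C, c)
            fold_mse.append(sum((p - y[i]) ** 2
                                for p, i in zip(preds, val_idx)) / len(val_idx))
        mse = sum(fold_mse) / len(fold_mse)
        if best is None or mse < best[0]:
            best = (mse, C, c)
    return best


def toy_fit_predict(X_train, y_train, X_val, C, c):
    """Hypothetical stand-in model: training mean plus an artificial
    error that vanishes only at C = 2 and c = 0.25."""
    base = sum(y_train) / len(y_train)
    return [base + (C - 2.0) ** 2 + (c - 0.25) ** 2 for _ in X_val]


X = [[float(i)] for i in range(9)]
y = [5.0] * 9
best_mse, best_C, best_c = grid_search(X, y, toy_fit_predict)
```

In practice `fit_predict` would wrap a real support vector regression implementation; the search loop itself does not depend on which one is used.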
(2) Back Propagation Neural Network Model. The BPNN model is a multilayer feed forward neural network that consists of many parallel nonlinear computing elements. The network is composed of an input layer, a hidden layer, and an output layer. The weights connecting the layers, which are the most important parameters of the network, are calculated by the error back propagation algorithm. Once a neural network model has learned the mapping between the input and output variables through continuous training, it can predict the output for given input variables.
First, equation (22) is used to calculate the output of the hidden layer, where $H_j$ stands for the output of the $j$th hidden-layer neuron; $S$ is the activation function of the neurons; $h$ is the number of neurons in the hidden layer; $num$ is the number of neurons in the input layer; $v_{ij}$ is the weight between the input layer and the hidden layer; and $e_j$ is the bias of the hidden layer.
The output of the output layer is then calculated, where $Q$ stands for the actual output of the output layer, $q_\zeta$ refers to the bias of the output layer, and $k_1$ is the number of neurons in the output layer. In our study, an empirical formula combined with trial and error was used to determine the number of nodes in the hidden layer. Four nodes, which gave the best performance, were finally selected.
(3) Multilinear Regression Model. Compared with the above two supervised algorithms, multiple linear regression is simpler to construct and belongs to the regression learning category. In MLR, the prediction values are calculated by the following equation:

$$y(t) = r_0 + \sum_{j=1}^{lr} r_j\, y(t-j), \tag{24}$$

where $y(t)$ represents the prediction value at time $t$; the independent variable $y(t-j)$ is the travel time in period $t-j$; $lr$ is the number of historical travel times considered in the MLR model; and $r_0, \ldots, r_{lr}$ are the regression parameters, which are optimized on the training samples. $lr$ is chosen on the basis of an analysis of the travel time data from January to April 2008, in which different values of $lr$ were considered.

Hybrid Prediction Models.
As mentioned in Section 2, freeway travel time has a daily periodic characteristic. Therefore, it can be assumed that the travel time has two parts: a deterministic component and an irregular component. Under this assumption, the hybrid prediction model describes the freeway travel time as

$$P_t = D_t + y_t^{r}, \tag{25}$$

where $P_t$ is the travel time at time $t$ on link D; $D_t$ is the periodic component; and $y_t^{r}$ represents the residual part after removing the periodic component.
Periodic component can be described by three kinds of functions (TPF, SA, and DES), and the residual part is modeled by six prediction models. We compare the impacts of different periodic functions on multistep ahead freeway travel time prediction models using travel time data with different aggregation levels.
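The decomposition into a periodic component and a residual can be sketched as a small pipeline: subtract the periodic component, forecast the residual, and add the periodic value for the target time back. The sketch assumes that the history starts at time-of-day index 0 and that `periodic` holds one day of the periodic component. The `persistence` residual model is a hypothetical placeholder for any of the six models in the paper.

```python
def hybrid_forecast(history, periodic, residual_model, steps):
    """Forecast `steps` intervals ahead as periodic part + residual forecast."""
    m = len(periodic)
    # Residual series y^r_t = P_t - D_t.
    residuals = [p - periodic[t % m] for t, p in enumerate(history)]
    t_future = len(history) - 1 + steps
    return periodic[t_future % m] + residual_model(residuals, steps)


def persistence(residuals, steps):
    """Placeholder residual model (assumed): repeat the last residual."""
    return residuals[-1]


# Toy data (assumed): a 4-sample "day" and three observations.
periodic = [10.0, 20.0, 30.0, 40.0]
history = [12.0, 22.0, 32.0]
one_ahead = hybrid_forecast(history, periodic, persistence, steps=1)  # 42.0
two_ahead = hybrid_forecast(history, periodic, persistence, steps=2)  # 12.0
```

Swapping `persistence` for another residual model leaves the rest of the pipeline unchanged, which is the structure that allows three periodic functions and six models to be combined into 18 hybrids.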

Measures and Training Period.
To evaluate the multistep prediction performance of all prediction models, three indicators are considered together: mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE). They are calculated as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \tag{26}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i - \hat{y}_i\right|}{y_i}\times 100\%, \tag{27}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \tag{28}$$

where $n$ is the number of observations; $y_i$ represents the actual travel time at time $i$ on link D; and $\hat{y}_i$ refers to the predicted travel time. So far, there is no automatic way to determine the model training period. This study considered training periods of 15, 20, 25, 30, 40, 50, and 60 days. For comparison, the travel time data in August (21 weekdays) were used as the test set. Figure 5 shows the MAE, MAPE, and RMSE values of the six travel time prediction models under different lengths of training period. The prediction performance of the statistical models and the MLR model changed only slightly as the training period increased. The performance of SVM and BPNN improved greatly with the training period as long as the training period was shorter than 30 days. Beyond 30 days, the prediction accuracy of the SVM and BPNN models changed only slightly. A longer training period usually requires more computational time for each model. For example, the computational time of the SVM model is 5 minutes when 10-day travel time data are used for model training, and it can be as high as 68 minutes when 60 days are chosen as the training period. Considering both computational time and prediction accuracy, a 30-day training period (23 days in July and 7 days in June) is chosen for the models.
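The three measures are straightforward to compute. The following is a minimal sketch using the standard definitions, with MAPE expressed in percent and all actual values assumed to be nonzero. The numbers are toy values, not results from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(abs(a - p) / a
                       for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))


# Toy travel times in seconds (assumed).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares the errors, it penalizes occasional large misses more heavily than MAE, which is one reason to report the measures together.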

Results and Discussion
In this part, the multistep ahead prediction performance of SVM, BPNN, MLR, ARIMA, ST, and VAR under different aggregation levels (i.e., 5-minute, 10-minute, and 15-minute) is evaluated using the travel time data observed on link D. In addition, we explore the impacts of different periodic functions on statistical models and machine learning models under different aggregation levels of the input data. The testing period is 15:30 to 19:30 from 1 August to 31 August (21 weekdays).

The Performance of Six Models.
The study provides the MAE, MAPE, and RMSE values of the SVM, BPNN, MLR, ARIMA, ST, and VAR models for different forecasting horizons under different aggregation levels (5-minute, 10-minute, and 15-minute) of the input data (Tables 1-3). Tests on the travel time data indicate the following findings. First, the prediction accuracy deteriorates as the forecasting step increases for all models. Second, the higher the data aggregation level, the higher the accuracy of the short-term travel time prediction results. For example, when we predict the 30-minute ahead travel time of link D, the prediction result is better with 15-minute data than with 10-minute or 5-minute data.
Third, among the machine learning models, the prediction accuracy of MLR is lower than that of BPNN and SVM. Among the statistical models, the prediction accuracy of ARIMA is lower than that of ST and VAR. A possible reason is that the SVM, BPNN, ST, and VAR models use spatial and temporal information from neighboring links to predict the future travel time at time t + p, whereas the MLR and ARIMA models use only the travel time data collected on the target link D. Fourth, the prediction accuracy of the two machine learning models (SVM and BPNN) is better than that of the statistical models. However, the prediction accuracy of the MLR model is significantly lower than that of the statistical models.

Impacts of Periodic Functions.
The prediction results of the six models with and without periodic functions are compared in Figure 6. Several interesting conclusions can be drawn from these results. First, the periodic functions have similar impacts on the SVM and BPNN models: they bring a definite improvement for predictions more than 60 minutes ahead under the three data aggregation levels. Second, the three periodic functions improve the prediction performance of the MLR model in multistep ahead prediction for the three data aggregation levels. Third, the three periodic functions improve the prediction accuracy of travel time more than 30 minutes ahead for all statistical models. Fourth, as the aggregation level increases, the differences among the prediction results of the hybrid models with different periodic functions gradually increase. For example, at the 5-minute aggregation level the prediction results of the three SVM hybrid models differ little, whereas at the 15-minute aggregation level they differ significantly. From the above analysis, we can conclude that the periodic functions clearly improve the prediction accuracy of the six prediction models for multistep ahead prediction.
We then analyze the degree of impact of the different periodic functions on the prediction models based on the mean absolute error difference (MAED). The MAED is defined as

$$\mathrm{MAED} = \frac{1}{n}\sum_{i=1}^{n}\left(\left|y_i - \hat{y}_i\right| - \left|y_i - \hat{y}_i^{P}\right|\right), \tag{29}$$

where $n$ is the number of observations; $y_i$ represents the actual travel time at time $i$ on link D; $\hat{y}_i$ refers to the travel time predicted by the traditional prediction model; and $\hat{y}_i^{P}$ refers to the travel time predicted by the model that considers a periodic function. If the MAED is greater than zero, the periodic function improves the prediction accuracy of the traditional prediction model; otherwise, it reduces the prediction accuracy. Figure 7 shows the MAED values of the 18 hybrid models from 1-step to 12-step ahead forecasting with 5-minute, 10-minute, and 15-minute aggregation levels. First, the periodic functions significantly improve the MLR model in multistep ahead prediction for the three data aggregation levels. Second, for 1-step and 3-step ahead prediction, the three periodic functions reduce the prediction accuracy of the SVM and BPNN models. Third, for 1-step ahead prediction, both TPF and SA improve the performance of the statistical models for the three data aggregation levels. When the aggregation level is 5-minute, TPF clearly improves the prediction accuracy, whereas when the aggregation level increases to 10-minute, the SA periodic function performs better. For 3-step ahead prediction, the three periodic functions clearly improve the forecasting results of the statistical models, with only slight differences among them. Fourth, for 6-step and 12-step ahead prediction, the three periodic functions clearly improve the forecasting results of the six models, again with only slight differences among the three functions. Fifth, for the same prediction step, the improvement from the periodic function becomes more obvious as the data aggregation level increases. For example, for 12-step ahead prediction, the hybrid models perform better with the 15-minute aggregation level than with the 5-minute aggregation level.
In this section, we also discuss suggestions for the aggregation level and the periodic function. According to Table 1 and Figure 7, the higher the data aggregation level, the higher the accuracy of the short-term travel time prediction results. When the prediction time is greater than 15 minutes, the highest accuracy is obtained with the 15-minute aggregation level. According to Figures 6(a) and 6(b), the periodic functions cannot improve the prediction performance of the SVM and BPNN models when the prediction time is less than 60 minutes, so it is recommended that no periodic function be used in that case. When the prediction time is greater than 60 minutes, the TPF periodic function is recommended. As can be seen from Figure 7, both TPF and SA improve the performance of the statistical models and the MLR model in multistep ahead prediction for the three data aggregation levels, and SA performs better. For different prediction times, the suggested aggregation levels and periodic functions are shown in Table 4.

Table 4: Aggregation level and periodic function suggestions.
Models                  Prediction time (min)   Aggregation level (min)   Periodic function
SVM, BPNN               5                       5                         -
SVM, BPNN               10                      10                        -
SVM, BPNN               15                      15                        -
SVM, BPNN               30                      15                        -
SVM, BPNN               45                      15                        -
SVM, BPNN               60                      15                        TPF
SVM, BPNN               90                      15                        TPF
SVM, BPNN               120                     15                        TPF
ARIMA, ST, VAR, MLR     5                       5                         TPF
ARIMA, ST, VAR, MLR     10                      10                        SA
ARIMA, ST, VAR, MLR     15                      15                        SA
ARIMA, ST, VAR, MLR     30                      15                        SA
ARIMA, ST, VAR, MLR     45                      15                        SA
ARIMA, ST, VAR, MLR     60                      15                        SA
ARIMA, ST, VAR, MLR     90                      15                        SA
ARIMA, ST, VAR, MLR     120                     15                        SA
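The MAED comparison used in this section can be computed as sketched below. The sketch assumes that MAED is the mean of the per-observation difference between the absolute error of the traditional model and that of the model with a periodic function, so a positive value indicates that the periodic function helps. The function name and numbers are illustrative only.

```python
def maed(actual, base_pred, hybrid_pred):
    """Mean absolute error difference: traditional error minus hybrid error."""
    n = len(actual)
    return sum(abs(a - b) - abs(a - h)
               for a, b, h in zip(actual, base_pred, hybrid_pred)) / n


# Toy travel times in seconds (assumed).
actual = [100.0, 200.0]
base_pred = [110.0, 180.0]     # traditional model
hybrid_pred = [105.0, 190.0]   # model with a periodic function
gain = maed(actual, base_pred, hybrid_pred)  # 7.5, so the hybrid is better
```

Under this definition MAED equals the MAE of the traditional model minus the MAE of the hybrid model, so it can also be obtained from two separately reported MAE values.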

Conclusions
This paper evaluated the multistep ahead prediction performance of the SVM, BPNN, MLR, ARIMA, ST, and VAR models using freeway travel time data collected from automatic vehicle identification readers along US-290 in Houston, Texas. The performance of the six prediction models under different aggregation levels (5-minute, 10-minute, and 15-minute) was compared. The impacts of different periodic functions on the machine learning and statistical models under the same aggregation levels were also investigated. Several important conclusions can be drawn from the results. First, the periodic functions can improve the prediction performance of the machine learning models for predictions more than 60 minutes ahead and improve the accuracy of all statistical models for predictions more than 30 minutes ahead. Second, the three periodic functions considered differ only slightly in how much they improve the prediction accuracy of the six prediction models in multistep ahead prediction. Third, as the number of prediction steps increases, the impact of the periodic function on the prediction model becomes more obvious. Fourth, for the same prediction step, the effect of the periodic function becomes more obvious as the data aggregation level increases. For future work, since nonrecurrent events (incidents, special events, etc.) may disturb the cyclical pattern of travel time, it will be interesting to analyze and compare the impacts of periodic functions on prediction models under nonrecurrent traffic conditions. In addition, artificial intelligence has greatly promoted the development of traffic science. In particular, deep learning algorithms such as deep residual networks, recurrent neural networks, and convolutional neural networks have developed rapidly in the transportation field. It would also be interesting to examine the impact of different periodic functions on deep learning algorithms.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.