Mathematical Problems in Engineering (MPE), Hindawi Publishing Corporation
ISSN: 1563-5147, 1024-123X
Volume 2015, Article ID 154703
DOI: 10.1155/2015/154703

Research Article

Short-Term Traffic Flow Local Prediction Based on Combined Kernel Function Relevance Vector Machine Model

Qichun Bing (1) (http://orcid.org/0000-0001-8180-2424), Bowen Gong (1, 2, 3), Zhaosheng Yang (1, 2, 3) (http://orcid.org/0000-0002-0664-0426), Qiang Shang (1), Xiyang Zhou (1)

Academic Editor: Michael Small

(1) College of Transportation, Jilin University, Changchun 130025, China (jlu.edu.cn)
(2) State Key Laboratory of Automobile Simulation and Control, Jilin University, Changchun 130025, China
(3) Jilin Province Key Laboratory of Road Traffic, Jilin University, Changchun 130025, China

Received 30 May 2015; Accepted 3 August 2015; Published 5 October 2015

Copyright © 2015 Qichun Bing et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Short-term traffic flow prediction is one of the most important issues in adaptive traffic control systems and dynamic traffic guidance systems. To improve the accuracy of short-term traffic flow prediction, a local prediction method based on the combined kernel function relevance vector machine (CKF-RVM) model is put forward. The C-C method is used to calculate the delay time and the embedding dimension. The number of neighboring points is determined using the Hannan-Quinn criteria, and the parameters of the CKF-RVM model are optimized with a genetic algorithm. Finally, the method is validated using inductive loop data measured on the north–south viaduct in Shanghai. The experimental results show that the CKF-RVM model reduces the MAPE by 31.1% and 52.7% relative to the GKF-RVM model and the GKF-SVM model, respectively. It is also superior to the other two models in terms of EC.

1. Introduction

Short-term traffic flow prediction is an important basis for intelligent transportation systems (ITS). Real-time and accurate prediction information can be directly applied to the advanced traffic management system (ATMS) and the advanced traffic information service system (ATIS). Because of its importance, short-term traffic flow prediction has generated great interest in the scientific community, and a large number of relevant methods exist in the literature. These include the spectral analysis model [1, 2], time series model [3, 4], regression model [5, 6], the Kalman filtering model [7, 8], neural network model [9, 10], support vector machine model [11, 12], and wavelet network model . Readers interested in the details of the models applied to traffic flow prediction can refer to review papers such as . With the development of chaos theory, recent studies such as  have found that short-term traffic flow time series exhibit nonlinear chaotic behavior. Therefore, chaotic prediction of short-term traffic flow has gained special attention. The prediction of chaotic time series can generally be classified into two categories: global prediction and local prediction. Global prediction methods use all phase points to describe the evolution law and then predict the future value. A number of researchers have used global prediction methods for chaotic time series. Karunasinghe and Liong  investigated the performance of an artificial neural network as a global model in chaotic time series prediction and compared it with local prediction models. Dong et al.  adapted the Elman neural network to short-term traffic flow prediction based on chaos analysis. Baydaroglu and Kocak  used a support vector regression (SVR) model to predict evaporation amounts, with phase space reconstruction used to prepare the input data for the SVR. Local prediction methods select K neighboring points to fit the short-term evolution trend of the phase points and then obtain the predicted value.
Local prediction methods mainly include the local average prediction method , the weighted first-order local prediction method , the Lyapunov index prediction method , and the support vector machine model . Because fewer phase points are fitted, local prediction has the advantages of low computational complexity and a high degree of fit. Farmer and Sidorowich  showed that local prediction methods perform better than global prediction methods under the same embedding dimension. Therefore, local prediction is adopted for short-term traffic flow prediction in this paper.

To obtain accurate prediction results, we need to find the nonlinear prediction function. However, this function is hard to determine accurately because of internal and external disturbances. Determining a linear function, by contrast, is not hard: detecting linear relations has been the focus of much research in statistics and machine learning for decades, and the resulting algorithms are well understood, well developed, and efficient. Combining the two ideas solves the problem. Instead of trying to fit a nonlinear model directly, we can map the problem from the input space to a feature space through a nonlinear transformation with suitably chosen basis functions and then use a linear model in the feature space. The basis function is called the kernel function. The linear model in the feature space corresponds to a nonlinear model in the input space. This is the main idea of the relevance vector machine (RVM) model. Owing to its theoretical advantages, the RVM has gained special attention in recent years, such as . This paper builds the short-term traffic flow forecasting model on the RVM because of its ability to deal with dynamic, nonlinear, and complex traffic flow time series, which makes it well suited to short-term traffic flow prediction.

For these reasons, and with the goal of improving the accuracy of short-term traffic flow prediction, we put forward a short-term traffic flow local prediction method based on combined kernel function relevance vector machine model. The remainder of this paper is structured as follows: Section 2 presents the phase space reconstruction theory. Section 3 gives the process of building combined kernel function relevance vector machine model. Section 4 describes the experiment setup and case study. Section 5 draws some conclusions.

2. Phase Space Reconstruction Theory

Phase space reconstruction theory, proposed by Packard et al. , is a powerful tool in the study of complicated systems. According to the theory of chaotic dynamics, a time series contains all the useful information about the system and reflects its long-term evolution. Complex characteristics found in a time series may be the result of temporal evolution on a chaotic attractor, an object of fractal dimension created by stretching and folding of space. If we can capture chaotic behavior from the time series signal of traffic flow, we can enhance our knowledge of the inherent properties of the traffic flow system. Phase space reconstruction theory is used to create attractors that are topologically equivalent to the original dynamical system using the information from a scalar time series only .

Phase space can be reconstructed using the delay coordinate method. The basic idea of the delay coordinate method is that the evolution of any single variable of a system is determined by the other variables with which it interacts. Information about the relevant variables is thus implicitly contained in the history of any single variable. For a time series $\{x(i), i = 1, 2, \ldots, N\}$, the phase space can be reconstructed according to
$$X_i = \left[x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right], \tag{1}$$
where $\tau$ is the delay time and $m$ is the embedding dimension.
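As an illustration of the delay coordinate method, the following minimal Python sketch builds the phase points from a scalar series. The function name `delay_embed` and the toy series are assumptions made for this example and do not come from the original study.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return the M x m matrix whose i-th row is [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}]."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * tau  # number of phase points
    if M <= 0:
        raise ValueError("time series too short for this m and tau")
    # Column j holds the series shifted by j * tau samples.
    return np.column_stack([x[j * tau : j * tau + M] for j in range(m)])

# Toy series 0, 1, ..., 9 embedded with m = 3 and tau = 2 (illustrative values).
series = np.arange(10)
X = delay_embed(series, m=3, tau=2)
print(X.shape)  # (6, 3)
print(X[0])     # [0. 2. 4.]
```

Each row of `X` is one phase point, so a series of length N yields N - (m - 1)τ points.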

The embedding dimension and the delay time are the key parameters of phase space reconstruction. At present, there are two views on how to select them. One view is that the two parameters are independent and can be determined separately. Methods for calculating the delay time include the Average Displacement method , the Mutual Information method , and the Autocorrelation Function method . Methods for calculating the embedding dimension include the False Nearest Neighbors method , the Cao method , and the G-P method . The other view is that the two parameters are interrelated and should be determined simultaneously, as in the C-C method . Compared with other methods, the C-C method has the advantages of low computational cost and strong robustness to interference. Therefore, the C-C method is employed to determine the delay time $\tau$ and the embedding window width $\tau_\omega$, and the embedding dimension $m$ is then calculated according to $\tau_\omega = (m-1)\tau$. The principle of the C-C method is as follows.

Let $\{x(i), i = 1, 2, \ldots, N\}$ denote the time series data; a new set of vectors $X = \{X(i)\}$ can be obtained through phase space reconstruction. The correlation integral for the embedded time series is the following function:
$$C(m, N, r, \tau) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \theta\left(r - \|X_i - X_j\|\right), \tag{2}$$
where $r$ is the neighborhood radius, $X_i$ is a phase point in phase space, $\tau$ is the delay time, $m$ is the embedding dimension, $M = N - (m-1)\tau$ is the number of embedded points in phase space, $N$ is the length of the time series, $\|\cdot\|$ denotes the sup-norm, and $\theta(\cdot)$ is the Heaviside unit function: $\theta(x) = 0$ if $x < 0$ and $\theta(x) = 1$ if $x \ge 0$. The correlation integral is a cumulative distribution function and denotes the probability that the distance between any two points is less than $r$. We define the test statistic
$$S(m, N, r, \tau) = C(m, N, r, \tau) - C^m(1, N, r, \tau). \tag{3}$$
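The correlation integral of (2) can be sketched in a few lines of Python. This is a minimal illustration that assumes the phase points are already stacked as rows of an array; the function name and the three sample points are hypothetical.

```python
import numpy as np

def correlation_integral(X, r):
    """Fraction of phase-point pairs (i < j) whose sup-norm distance is at most r."""
    X = np.asarray(X, dtype=float)
    M = len(X)
    # Pairwise sup-norm (Chebyshev) distances between the rows of X.
    dist = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    iu = np.triu_indices(M, k=1)  # index pairs with i < j
    # theta(r - d) equals 1 exactly when d <= r.
    return 2.0 * np.count_nonzero(dist[iu] <= r) / (M * (M - 1))

# Three illustrative phase points; the pairwise sup-norm distances are 1, 3 and 3.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0]])
c_small = correlation_integral(pts, 1.0)  # one of three pairs is within r = 1
c_large = correlation_integral(pts, 3.0)  # all pairs are within r = 3
```

The full pairwise distance matrix keeps the sketch short; for long series a neighbor-search structure would be a more economical choice.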

The time series $x_i$, $i = 1, 2, \ldots, N$, can be divided into $t$ disjoint time series as follows:
$$\{x_1, x_{t+1}, x_{2t+1}, \ldots\},\quad \{x_2, x_{t+2}, x_{2t+2}, \ldots\},\quad \ldots,\quad \{x_t, x_{t+t}, x_{2t+t}, \ldots\}. \tag{4}$$

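The test statistic is
$$S(m, N, r, t) = \frac{1}{t} \sum_{l=1}^{t} \left[C_l\left(m, \frac{N}{t}, r, t\right) - C_l^m\left(1, \frac{N}{t}, r, t\right)\right], \tag{5}$$
where $C_l$ denotes the correlation integral of the $l$th subsequence.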

As $N \to \infty$, we can write
$$S(m, r, t) = \frac{1}{t} \sum_{l=1}^{t} \left[C_l(m, r, t) - C_l^m(1, r, t)\right]. \tag{6}$$

For fixed embedding dimension $m$ and delay $t$, as $N \to \infty$, $S(m, r, t)$ is identically equal to 0 for all $r$ if the time series data are independent and identically distributed. However, actual time series are finite and correlated, so $S(m, r, t)$ is generally not equal to 0. Thus, the locally optimal times may be either the zero crossings of $S(m, r, t)$ or the times at which $S(m, r, t)$ shows the least variation with $r$, because this indicates that the points are nearly uniformly distributed. Hence, we select the radii at which $S(m, r, t)$ is largest and smallest to define the following quantity.

Consider
$$\Delta S(m, t) = \max\{S(m, r_i, t)\} - \min\{S(m, r_j, t)\}. \tag{7}$$

$\Delta S(m, t)$ measures the maximum deviation of the $S(m, r, t) \sim t$ curves with respect to $r$. Therefore, the optimal delay time is the first zero crossing of $S(m, r, t) \sim t$ or the first local minimum of $\Delta S(m, t) \sim t$.

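According to the BDS statistic result, we select $m = 2, 3, 4, 5$ and $r_j = j\sigma/2$, $j = 1, 2, 3, 4$, where $\sigma$ is the standard deviation of the time series, to calculate the following variables:
$$\bar{S}(t) = \frac{1}{16} \sum_{m=2}^{5} \sum_{j=1}^{4} S(m, r_j, t), \qquad \Delta\bar{S}(t) = \frac{1}{4} \sum_{m=2}^{5} \Delta S(m, t), \qquad S_{cor}(t) = \Delta\bar{S}(t) + \left|\bar{S}(t)\right|, \tag{8}$$
where $\bar{S}(t)$ is the mean of $S(m, r, t)$ over all subsequences. The optimal delay time $\tau$ is the first local minimum of $\Delta\bar{S}(t) \sim t$. The delay time window $\tau_\omega$ is the global minimum of $S_{cor}(t) \sim t$.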
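The statistics in (5), (7), and (8) can be put together as in the sketch below. It is a minimal illustration under stated assumptions: each subsequence is embedded with unit delay in its own index, the helper names are hypothetical, and the test series is synthetic rather than the Shanghai data.

```python
import numpy as np

def corr_integral(x, m, r):
    """Correlation integral of series x embedded with dimension m and unit delay."""
    M = len(x) - (m - 1)
    X = np.column_stack([x[j : j + M] for j in range(m)])
    d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)  # sup-norm distances
    iu = np.triu_indices(M, k=1)
    return 2.0 * np.count_nonzero(d[iu] <= r) / (M * (M - 1))

def cc_statistics(x, t_max, dims=(2, 3, 4, 5)):
    """Return S_bar(t), Delta S_bar(t) and S_cor(t) for t = 1, ..., t_max."""
    x = np.asarray(x, dtype=float)
    radii = [j * x.std() / 2.0 for j in (1, 2, 3, 4)]  # r_j = j * sigma / 2
    s_bar, ds_bar, s_cor = [], [], []
    for t in range(1, t_max + 1):
        S = np.zeros((len(dims), len(radii)))
        for a, m in enumerate(dims):
            for b, r in enumerate(radii):
                # Average over the t disjoint subsequences x[l::t], as in (5).
                S[a, b] = np.mean([corr_integral(x[l::t], m, r)
                                   - corr_integral(x[l::t], 1, r) ** m
                                   for l in range(t)])
        delta = S.max(axis=1) - S.min(axis=1)  # Delta S(m, t) of (7)
        s_bar.append(S.mean())
        ds_bar.append(delta.mean())
        s_cor.append(delta.mean() + abs(S.mean()))
    return np.array(s_bar), np.array(ds_bar), np.array(s_cor)

def first_local_minimum(v):
    """Index of the first interior local minimum, falling back to the global minimum."""
    for k in range(1, len(v) - 1):
        if v[k] < v[k - 1] and v[k] <= v[k + 1]:
            return k
    return int(np.argmin(v))

# Synthetic noisy sine wave used only to exercise the functions.
rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(240)) + 0.05 * rng.standard_normal(240)
s_bar, ds_bar, s_cor = cc_statistics(x, t_max=12)
tau = first_local_minimum(ds_bar) + 1  # t is 1-based
tau_w = int(np.argmin(s_cor)) + 1
```

The embedding dimension then follows from $\tau_\omega = (m-1)\tau$ once `tau` and `tau_w` are known.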

3. Modeling of CKF-RVM Model

3.1. The Principle of RVM Model

The relevance vector machine (RVM) model proposed by Tipping  is a sparse probabilistic model based on Bayesian principles. Compared with other intelligent algorithms, the RVM has several advantages. For example, the kernel function of the RVM model need not satisfy Mercer's condition. Moreover, it introduces a prior distribution over the weights, which greatly reduces the computational complexity. The principle of the RVM model is as follows.

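Consider a data set $\{x_n, t_n\}_{n=1}^{N}$, where $x_n \in R^n$ and $t_n \in R$. The relationship between $x_n$ and $t_n$ is as follows:
$$t_n = y(x_n; w) + \xi_n = \sum_{i=1}^{N} w_i \varphi_i(x_n) + w_0 + \xi_n, \tag{9}$$
where $w = (w_0, w_1, \ldots, w_N)^T$ is the weight vector, $\xi_n$ is the independent additive noise term with $\xi_n \sim N(0, \sigma^2)$, $\varphi_i(x) = K(x, x_i)$ is the nonlinear basis function, and $K(\cdot, \cdot)$ is the kernel function. Therefore, $p(t_n \mid x_n) = N(t_n \mid y(x_n), \sigma^2)$ denotes the normal distribution of $t_n$ with mean $y(x_n)$ and variance $\sigma^2$. Assuming that the $t_n$ are independent of each other, the likelihood of the complete data set can be written as
$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2} \|t - \Phi w\|^2\right\}, \tag{10}$$
where $t = (t_1, t_2, \ldots, t_N)^T$ and $\Phi = [\varphi(x_1), \varphi(x_2), \ldots, \varphi(x_N)]^T$ is the $N \times (N+1)$ kernel function matrix in which $\varphi(x_n) = [1, K(x_n, x_1), K(x_n, x_2), \ldots, K(x_n, x_N)]^T$.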

Because there are many parameters in the model, the maximum likelihood estimates of $w$ and $\sigma^2$ will lead to severe overfitting. Therefore, the sparse Bayesian theory is adopted, and a zero-mean Gaussian prior distribution over $w$ is defined as follows:
$$p(w \mid \alpha) = \prod_{i=0}^{N} N\left(w_i \mid 0, \alpha_i^{-1}\right), \tag{11}$$
where $\alpha = \{\alpha_0, \alpha_1, \ldots, \alpha_N\}$ is a vector of $N+1$ hyperparameters. Each weight is individually associated with one hyperparameter, which controls the influence of the prior distribution over the associated weight.

Because we have defined the prior probability distribution and the likelihood distribution, the posterior probability distribution follows from Bayes' theorem:
$$p(w \mid t, \alpha, \sigma^2) = (2\pi)^{-(N+1)/2} |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2} (w - \mu)^T \Sigma^{-1} (w - \mu)\right\}. \tag{12}$$

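The posterior covariance matrix and mean are, respectively,
$$\Sigma = \left(\sigma^{-2} \Phi^T \Phi + A\right)^{-1}, \tag{13}$$
$$\mu = \sigma^{-2} \Sigma \Phi^T t, \tag{14}$$
where $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.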

The hyperparameters $\alpha$ and $\sigma^2$ are estimated by maximizing the marginal likelihood, which leads to an iterative algorithm. Consider
$$\alpha_i^{new} = \frac{r_i}{\mu_i^2}, \tag{15}$$
where $\mu_i$ is the $i$th posterior mean weight and $r_i = 1 - \alpha_i N_{ii}$, where $N_{ii}$ is the $i$th diagonal element of the posterior covariance matrix $\Sigma$ computed with the current $\alpha$ and $\sigma^2$.

The noise variance $\sigma^2$ is updated in the same iteration by
$$\left(\sigma^2\right)^{new} = \frac{\|t - \Phi\mu\|^2}{N - \sum_i r_i}. \tag{16}$$

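Given a new sample $x_*$, let $t_*$ be the corresponding prediction value. The prediction value follows a normal distribution $N(\mu_*, \sigma_*^2)$ with mean $\mu_*$ and variance $\sigma_*^2$. Consider
$$\mu_* = \mu^T \varphi(x_*), \qquad \sigma_*^2 = \sigma_{MP}^2 + \varphi(x_*)^T \Sigma \varphi(x_*), \tag{17}$$
where $\mu_*$ is the predictive mean at $x_*$, $\sigma_*^2$ is the predictive variance, and $\sigma_{MP}^2$ is the noise variance obtained when the iteration has converged.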
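The re-estimation loop of (13) to (16) and the prediction of (17) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the pruning threshold, the initial values, the numerical safeguards, the Gaussian kernel of unit width, and the synthetic noisy $\sin(x)/x$ data are all assumptions made for the example.

```python
import numpy as np

def rvm_fit(Phi, t, n_iter=300, prune_at=1e6):
    """Iterate (13)-(16); basis functions whose alpha exceeds prune_at are dropped."""
    N = Phi.shape[0]
    keep = np.arange(Phi.shape[1])   # indices of the active basis functions
    alpha = np.ones(Phi.shape[1])
    sigma2 = 0.1 * np.var(t) + 1e-6  # assumed initial noise variance
    for _ in range(n_iter):
        A = Phi[:, keep]
        Sigma = np.linalg.inv(A.T @ A / sigma2 + np.diag(alpha[keep]))  # (13)
        mu = Sigma @ A.T @ t / sigma2                                   # (14)
        # r_i of (15), clipped as a numerical safeguard.
        r = np.clip(1.0 - alpha[keep] * np.diag(Sigma), 1e-12, 1.0)
        alpha[keep] = r / np.maximum(mu ** 2, 1e-12)                    # (15)
        resid = np.sum((t - A @ mu) ** 2)
        sigma2 = max(resid / max(N - r.sum(), 1e-6), 1e-10)             # (16)
        keep = keep[alpha[keep] < prune_at]
    # Posterior for the surviving basis functions (the relevance vectors).
    A = Phi[:, keep]
    Sigma = np.linalg.inv(A.T @ A / sigma2 + np.diag(alpha[keep]))
    mu = Sigma @ A.T @ t / sigma2
    return keep, mu, Sigma, sigma2

def rvm_predict(phi_new, keep, mu, Sigma, sigma2):
    """Predictive mean and variance of (17) for one design row phi_new."""
    p = phi_new[keep]
    return p @ mu, sigma2 + p @ Sigma @ p

# Synthetic example: noisy sin(x)/x with a Gaussian kernel of width 1.
rng = np.random.default_rng(1)
x = np.linspace(-5.0, 5.0, 50)
t = np.sinc(x / np.pi) + 0.05 * rng.standard_normal(x.size)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)
Phi = np.hstack([np.ones((x.size, 1)), K])  # N x (N+1) design matrix
keep, mu, Sigma, sigma2 = rvm_fit(Phi, t)
mean0, var0 = rvm_predict(Phi[25], keep, mu, Sigma, sigma2)
```

Dropping a basis function once its hyperparameter becomes very large is the usual way sparsity is realized in practice; the columns that remain in `keep` play the role of relevance vectors.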

3.2. The Construction of Combined Kernel Function

The traditional relevance vector machine model mostly adopts a single kernel function to complete the feature space mapping, which has achieved good performance in many practical applications. But a single kernel function has great limitations when the sample data contain heterogeneous information. Therefore, this paper integrates the Gaussian kernel function and the polynomial kernel function to construct a new combined kernel function. The form of the combined kernel function is as follows:
$$K(x, x_i) = \lambda \cdot \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) + (1 - \lambda) \cdot \left(x \cdot x_i + 1\right)^d, \tag{18}$$
where $\lambda$ is the weight coefficient, $0 \le \lambda \le 1$, $\sigma$ is the kernel width of the Gaussian kernel function, and $d$ is the order of the polynomial kernel function.
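A vectorized form of (18) can be sketched as below. The function name and the two sample points are assumptions for illustration; the parameter values in the example are the optimized values reported in Section 4.5.

```python
import numpy as np

def combined_kernel(X, Z, lam, sigma, d):
    """Weighted sum of a Gaussian kernel and a polynomial kernel, as in (18)."""
    X, Z = np.atleast_2d(X).astype(float), np.atleast_2d(Z).astype(float)
    inner = X @ Z.T
    # Squared Euclidean distances between every row of X and every row of Z.
    sq = np.sum(X ** 2, axis=1)[:, None] + np.sum(Z ** 2, axis=1)[None, :] - 2.0 * inner
    gauss = np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
    poly = (inner + 1.0) ** d
    return lam * gauss + (1.0 - lam) * poly

# Two orthogonal unit vectors evaluated with lambda = 0.57, sigma = 0.25, d = 3.
pts = np.array([[1.0, 0.0], [0.0, 1.0]])
K = combined_kernel(pts, pts, lam=0.57, sigma=0.25, d=3)
```

Setting `lam=1` recovers the pure Gaussian kernel and `lam=0` the pure polynomial kernel, which is a convenient way to check an implementation against single-kernel baselines.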

Different kernel functions have different advantages; if the weight coefficient of the combined kernel function is inappropriate, its performance may be lower than that of a single kernel function. Therefore, a proper weight coefficient is of great importance for the combined kernel function.

3.3. Parameter Optimization Based on Genetic Algorithm

Three parameters need to be optimized in the combined kernel function. Commonly used parameter optimization methods include the cross validation method  and the grid search method . However, these methods are computationally expensive and are often trapped in local optima. The genetic algorithm (GA)  is a heuristic method based on Darwin's theory of biological evolution, which has been widely applied to high dimensional parameter optimization problems in engineering and science. The genetic algorithm differs from traditional search and optimization methods in four significant points:
(1) Genetic algorithms search in parallel from a population of points. Therefore, they can avoid being trapped in a local optimal solution, unlike traditional methods, which search from a single point.
(2) Genetic algorithms use probabilistic selection rules, not deterministic ones.
(3) Genetic algorithms work on the chromosome, which is an encoded version of the potential solution's parameters, rather than on the parameters themselves.
(4) Genetic algorithms use a fitness score, which is obtained from the objective function, without other derivative or auxiliary information.

Therefore, genetic algorithm is used to obtain the optimal parameters of combination kernel function. The specific steps are as follows.

Step 1 (initialize the parameters).

Set the population size and the maximal generation count: the population size is 20, and the maximal generation count is 100.

Step 2 (representation).

The parameters to be optimized λ, σ, and d are coded in binary to generate the chromosomes.

Step 3 (fitness function definition).

The cross validation method is used to prevent overfitting and underfitting. In K-fold cross validation, the training data set is randomly divided into K subsets. The RVM model is built using K-1 subsets as the training set, and the performance of the parameters is checked on the remaining subset. In this paper, fivefold cross validation is used. The fitness function is defined as the mean absolute percentage error of the fivefold cross validation on the training data set.

Step 4 (creating new population).

Selection, crossover, and mutation are carried out to generate the new population. The chromosomes with better fitness function values are selected using the roulette wheel method. The crossover probability is set to 0.8, and the mutation probability is set to 0.05.

Step 5 (checking the stopping criterion).

If the generation count reaches its maximum value, the iteration is stopped. Otherwise, the process is repeated from Step 3 to Step 4.
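The five steps above can be sketched as a small binary-coded genetic algorithm. The sketch uses the stated population size, generation count, and crossover and mutation probabilities; everything else is an assumption made for illustration: the number of bits per parameter, the search ranges in `BOUNDS`, the single-point crossover, the mapping of the error to roulette wheel weights, and the function `toy_error`, which is a hypothetical stand-in for the fivefold cross-validation MAPE of the CKF-RVM model.

```python
import numpy as np

BITS = 10                                    # assumed bits per parameter
BOUNDS = [(0.0, 1.0), (0.01, 2.0), (1, 5)]   # assumed ranges for lambda, sigma, d

def decode(chrom):
    """Map a binary chromosome to (lambda, sigma, d)."""
    weights = 2 ** np.arange(BITS - 1, -1, -1)
    vals = []
    for k, (lo, hi) in enumerate(BOUNDS):
        frac = chrom[k * BITS:(k + 1) * BITS] @ weights / (2 ** BITS - 1)
        vals.append(lo + frac * (hi - lo))
    return vals[0], vals[1], int(round(vals[2]))

def run_ga(error_fn, pop_size=20, n_gen=100, p_cross=0.8, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n_bits = BITS * len(BOUNDS)
    pop = rng.integers(0, 2, size=(pop_size, n_bits))       # Steps 1 and 2
    best, best_err = None, np.inf
    for _ in range(n_gen):                                  # Step 5: fixed budget
        errs = np.array([error_fn(*decode(c)) for c in pop])  # Step 3
        i = int(np.argmin(errs))
        if errs[i] < best_err:
            best, best_err = pop[i].copy(), errs[i]
        # Step 4: roulette wheel, probability proportional to 1 / (1 + error).
        w = 1.0 / (1.0 + errs)
        parents = pop[rng.choice(pop_size, size=pop_size, p=w / w.sum())]
        children = parents.copy()
        for a in range(0, pop_size - 1, 2):                 # single-point crossover
            if rng.random() < p_cross:
                cut = rng.integers(1, n_bits)
                children[a, cut:] = parents[a + 1, cut:]
                children[a + 1, cut:] = parents[a, cut:]
        flip = rng.random(children.shape) < p_mut           # bit-flip mutation
        pop = np.where(flip, 1 - children, children)
    return decode(best), best_err

def toy_error(lam, sigma, d):
    # Hypothetical objective used only to exercise the search.
    return (lam - 0.6) ** 2 + (sigma - 0.3) ** 2 + 0.01 * abs(d - 3)

best_params, best_err = run_ga(toy_error)
```

In the setting of this paper, `error_fn` would train the RVM on four folds and return the MAPE on the fifth, averaged over the five splits.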

4. Experiment Setup and Case Study

4.1. Data Source

The experimental traffic flow data come from loop detectors located on the north–south viaduct expressway in Shanghai, China. This segment includes 24 mainline detecting sections and 30 ramp detecting sections, equipped with 88 mainline loop detectors and 60 ramp loop detectors, respectively. The experimental data are collected on five consecutive Mondays from September 1, 2008, to September 29, 2008. The original time interval of collected data is 5 min. Figure 1 gives the traffic flow time series data from five consecutive Mondays.

Figure 1: The traffic flow time series data from five consecutive Mondays.

4.2. Phase Space Reconstruction

Phase space reconstruction is the basis of chaotic time series analysis and directly affects the prediction performance. This paper uses the C-C method to complete the phase space reconstruction. Figure 2 plots $\Delta\bar{S}(t)$ against $t$, and Figure 3 plots $S_{cor}(t)$ against $t$.

Figure 2: The curve of $\Delta\bar{S}(t)$ versus $t$.

Figure 3: The curve of $S_{cor}(t)$ versus $t$.

From Figure 2, it can be seen that $\Delta\bar{S}(t)$ reaches its first local minimum at $t = 18$. Therefore, the delay time $\tau$ is determined to be 18. From Figure 3, $S_{cor}(t)$ reaches its global minimum at $t = 113$. Therefore, the value of $\tau_\omega$ is 113, and the embedding dimension $m$ is determined to be 7 according to $\tau_\omega = (m-1)\tau$.
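As a quick check of this step, and assuming that $m$ is taken as the nearest integer (the rounding rule is not stated explicitly), the relation $\tau_\omega = (m-1)\tau$ gives

$$m = 1 + \frac{\tau_\omega}{\tau} = 1 + \frac{113}{18} \approx 7.28 \;\Rightarrow\; m = 7.$$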

Figure 4 displays the 2D attractor of the reconstructed phase space for traffic flow time series.

Figure 4: The 2D attractor of the reconstructed phase space for traffic flow time series.

From Figure 4, we can see clearly that the 2D attractor of the traffic flow time series has a regular structure, which indicates that the C-C method reconstructs the phase space of the traffic flow time series well.

4.3. Identification of Chaos

Among the wide variety of methods available for chaos identification, the most popular one is the largest Lyapunov exponent method. The main methods of calculating largest Lyapunov exponent include Wolf method , Jacobian method , and small data sets method . Due to the smaller amount of calculation and clear principles, the small data sets method is employed to calculate the largest Lyapunov exponent of traffic flow time series. Figure 5 displays the result of small data sets method. The linear range is from 57 to 98, and the largest Lyapunov exponent corresponding to the slope value is obtained after the least-squares fit for the linear range. The largest Lyapunov exponent is found to be 0.0014, and this positive value implies an exponential divergence of the trajectories and hence a strong signature of chaos.

Figure 5: The result of small data sets method.
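The idea behind the small data sets method can be sketched as follows: follow each phase point and its nearest neighbor forward in time, average the logarithm of their separation, and fit a line over the linear range of the resulting curve. The code below is a simplified illustration, not the computation behind Figure 5. The function names, the Euclidean distance, the temporal exclusion window `min_sep`, and the logistic-map test series are assumptions made for this example.

```python
import numpy as np

def divergence_curve(X, n_steps, min_sep):
    """Mean log distance between each phase point and its nearest neighbour over n_steps."""
    M = len(X) - n_steps  # points that can be followed for n_steps
    d = np.linalg.norm(X[:M, None, :] - X[None, :M, :], axis=2)
    idx = np.arange(M)
    # Exclude temporally close points so neighbours lie on different trajectory segments.
    d[np.abs(idx[:, None] - idx[None, :]) <= min_sep] = np.inf
    nn = np.argmin(d, axis=1)
    curve = np.empty(n_steps + 1)
    for k in range(n_steps + 1):
        dist = np.linalg.norm(X[idx + k] - X[nn + k], axis=1)
        curve[k] = np.mean(np.log(dist[dist > 0]))
    return curve

def largest_lyapunov(curve, lo, hi, dt=1.0):
    """Least-squares slope of the divergence curve over the linear range [lo, hi]."""
    k = np.arange(lo, hi + 1)
    slope, _ = np.polyfit(k * dt, curve[lo:hi + 1], 1)
    return slope

# Test series: the logistic map, a standard chaotic example.
x = np.empty(600)
x[0] = 0.3
for n in range(599):
    x[n + 1] = 4.0 * x[n] * (1.0 - x[n])
X = np.column_stack([x[:-1], x[1:]])  # embedding with m = 2, tau = 1
curve = divergence_curve(X, n_steps=8, min_sep=5)
lam = largest_lyapunov(curve, lo=0, hi=4)
```

For the traffic flow series, the phase points would come from the reconstruction with $\tau = 18$ and $m = 7$, and the fit would be restricted to the linear range identified in Figure 5.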

4.4. The Number of Neighboring Points

The number of neighboring points is one of the most important parameters, and it affects both the prediction accuracy and the amount of calculation. If the number of neighboring points is too small, the nonlinear fitting advantage of the relevance vector machine model will not be reflected. However, if the number of neighboring points is too large, the amount of calculation will increase greatly and overfitting will appear. Therefore, the Hannan-Quinn criteria  are used to determine the number of neighboring points. Figure 6 shows the results of the Hannan-Quinn criteria.

Figure 6: The number of neighboring points based on Hannan-Quinn criteria.

According to the Hannan-Quinn criteria, the value of $k$ at which $C_k$ reaches its minimum is the required number of neighboring points. From Figure 6, we can see that the number of neighboring points is 26.
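Once $k$ is fixed, the local training set for a prediction consists of the $k$ phase points closest to the current phase point, together with the values that followed them. The sketch below illustrates this selection under stated assumptions: Euclidean distance is used, the function name is hypothetical, the series is a synthetic sine wave, and the final line uses the local average of the successors as a simple stand-in for fitting the CKF-RVM model on the selected neighbors.

```python
import numpy as np

def local_training_set(x, m, tau, k):
    """Return the k nearest phase points to the latest one and their one-step successors."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * tau
    X = np.column_stack([x[j * tau : j * tau + M] for j in range(m)])
    query = X[-1]                                  # latest phase point
    dist = np.linalg.norm(X[:-1] - query, axis=1)  # candidates with a known successor
    idx = np.argsort(dist)[:k]
    targets = x[idx + (m - 1) * tau + 1]           # value following each neighbour
    return X[idx], targets, query

# Synthetic periodic series with the values reported in this section (m = 7, tau = 18, k = 26).
n = np.arange(1000)
x = np.sin(2 * np.pi * n / 50)
neighbours, targets, query = local_training_set(x, m=7, tau=18, k=26)
prediction = targets.mean()  # local average stand-in for the CKF-RVM fit
```

In the proposed method, the pairs (`neighbours`, `targets`) would be the inputs and outputs used to train the CKF-RVM model, which is then evaluated at `query`.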

4.5. Parameter Optimization

Genetic algorithm is used to optimize λ, σ, and d. The specific parameters of genetic algorithm are as follows: the population size is 20, maximal generation count is 100, the crossover probability is 0.8, and the mutation probability is 0.05. Figure 7 gives the fitness curve.

Figure 7: The fitness curve of GA.

From Figure 7, we could see that the optimal parameters of combined kernel function are λ=0.57, σ=0.25, and d=3.

4.6. Performance Evaluation Index

To evaluate the performance of the proposed method, two different measurements are introduced: the mean absolute percentage error, denoted by MAPE, and the equal coefficient, denoted by EC. The equations for the MAPE and EC are as follows:
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad \mathrm{EC} = 1 - \frac{\sqrt{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}}{\sqrt{\sum_{i=1}^{n} y_i^2} + \sqrt{\sum_{i=1}^{n} \hat{y}_i^2}}, \tag{19}$$
where $y_i$ denotes the actual value for the $i$th time interval, $\hat{y}_i$ denotes the predicted value for the $i$th time interval, and $n$ is the total number of time intervals.
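Both indexes are straightforward to compute. The following sketch assumes that the actual values are nonzero and uses three made-up flow values purely as an illustration.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs((y - y_hat) / y))

def equal_coefficient(y, y_hat):
    """Equal coefficient; values closer to 1 indicate a better fit."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    num = np.sqrt(np.sum((y - y_hat) ** 2))
    den = np.sqrt(np.sum(y ** 2)) + np.sqrt(np.sum(y_hat ** 2))
    return 1.0 - num / den

# Illustrative values only: relative errors of 10%, 5% and 0%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
m_val = mape(actual, predicted)                # 0.05
ec_val = equal_coefficient(actual, predicted)  # about 0.985
```

Because MAPE divides by the actual value, intervals with very low flow can dominate the average, whereas EC is normalized by the overall magnitude of the two series.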

4.7. Model Performance and Analysis

Data collected from September 1 to September 22 are used as training samples, and data collected on September 29 are used as test samples to evaluate the performance of prediction model. In order to illustrate the predictive performance of the proposed method intuitively, Figure 8 presents prediction results based on the proposed method. The black line stands for actual traffic flow data, and the red line stands for the prediction results of CKF-RVM model. Figures 8(a) and 8(b), respectively, show the prediction results and the MAPE for east mainline detector denoted by NBDX16(2). Figures 8(c) and 8(d), respectively, show the prediction results and the MAPE for west mainline detector denoted by NBXX10(1).

Figure 8: The prediction performance based on the proposed method.

As shown in Figure 8, the prediction results are quite close to the actual data, and the MAPE values are mostly within 10%. However, the MAPE from 0:00 to 4:00 is high because the actual traffic flow during that period is low, so small absolute errors produce large percentage errors. Overall, the CKF-RVM model achieves good prediction performance, which can meet the needs of short-term traffic flow prediction.

To describe the superiority of the proposed method in detail, a comparative analysis is carried out. This paper selects the Gaussian kernel function relevance vector machine (GKF-RVM) model and the Gaussian kernel function support vector machine (GKF-SVM) model as comparative approaches. For comparison at both the macroscopic and microscopic levels, Figure 9 gives the microscopic comparative results of the different methods. Figure 9(a) shows the prediction results for the east mainline detector denoted by NBDX11(1), and Figure 9(b) shows the prediction results for the west mainline detector denoted by NBXX15(2). Table 1 gives the macroscopic comparative results of the different methods.

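Table 1: Prediction performance comparison of different methods.

| Model | East mainline MAPE | East mainline EC | West mainline MAPE | West mainline EC |
| --- | --- | --- | --- | --- |
| CKF-RVM | 5.0% | 0.987 | 5.4% | 0.982 |
| GKF-RVM | 7.2% | 0.958 | 7.9% | 0.965 |
| GKF-SVM | 10.6% | 0.940 | 11.4% | 0.935 |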

Figure 9: The microscopic comparative results of different methods.

As shown in Figure 9, the prediction results of the CKF-RVM model fit the actual data best compared with the GKF-RVM model and the GKF-SVM model. Therefore, the CKF-RVM model can further improve the accuracy of short-term traffic flow prediction.

From Table 1, we can see that the overall improvement of the CKF-RVM model over the GKF-RVM model and the GKF-SVM model is obvious. More precisely, in terms of MAPE, the CKF-RVM model achieves a 31.1% improvement over the GKF-RVM model and a 52.7% improvement over the GKF-SVM model. Meanwhile, the CKF-RVM model is also superior to the other two models in terms of EC. Furthermore, the experimental results show that the CKF-RVM model achieves good prediction performance for both the east mainline data and the west mainline data, which indicates that the CKF-RVM model has strong generalization ability. Overall, the CKF-RVM model is an effective and accurate method for short-term traffic flow prediction, which can provide satisfactory prediction results.
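The reported improvement figures can be reproduced from Table 1 if they are read as the relative MAPE reduction averaged over the two directions. That reading is an assumption, since the text does not state how the two mainlines are combined.

```python
# MAPE values (%) from Table 1 as (east mainline, west mainline).
mape_table = {
    "CKF-RVM": (5.0, 5.4),
    "GKF-RVM": (7.2, 7.9),
    "GKF-SVM": (10.6, 11.4),
}

def mean_relative_reduction(baseline, proposed="CKF-RVM"):
    """Average over both directions of (baseline - proposed) / baseline, in percent."""
    pairs = zip(mape_table[baseline], mape_table[proposed])
    reductions = [(b - p) / b for b, p in pairs]
    return 100.0 * sum(reductions) / len(reductions)

over_rvm = round(mean_relative_reduction("GKF-RVM"), 1)  # 31.1
over_svm = round(mean_relative_reduction("GKF-SVM"), 1)  # 52.7
```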

5. Discussion and Conclusions

This paper proposes a new short-term traffic flow local prediction method based on the combined kernel function relevance vector machine model. The proposed method is more consistent with the characteristics of short-term traffic flow, which is nonlinear, chaotic, and nonstationary. The main contribution of this paper is not the specific techniques but rather the demonstration that the forecasting model should take the dynamic characteristics of short-term traffic flow into consideration. The most important contribution is that this paper provides a new idea and methodology for the relevance vector machine model: how to construct the combined kernel function for the short-term traffic flow forecasting model and how to optimize and identify the model structure parameters efficiently and effectively.

Traffic flow data collected from an expressway are employed to evaluate the prediction performance of the proposed method, and the results are encouraging. The theoretical advantages and the better performance observed in our study indicate that the CKF-RVM model has good potential for further development and is feasible for short-term traffic flow prediction. To reach more general and robust conclusions, traffic data from different roadways need to be explored further. Future studies also need to apply the model to other traffic variables (such as traffic speed, travel time, and average occupancy; this study uses traffic flow as the demonstration). Moreover, it will be interesting to test the model on traffic data sets with different time intervals.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors express their sincere appreciation to the Chinese National High Technology Research and Development Program Committee for the financial support provided under Grant no. 2014BAG03B03, National Natural Science Foundation of China (no. 51408257 and no. 51308248), and Jilin Province Science and Technology Development Plan of Youth Research Fund Project no. 20140520134JH.

References

[1] H. Nicholson and C. D. Swann, "The prediction of traffic flow volumes based on spectral analysis," Transportation Research Part C, vol. 8, no. 6, pp. 533–538, 1974. DOI: 10.1016/0041-1647(74)90030-6
[2] Y. R. Zhang, Y. L. Zhang, and A. Haghani, "A hybrid short-term traffic flow forecasting method based on spectral analysis and statistical volatility model," Transportation Research Part C: Emerging Technologies, vol. 43, pp. 65–78, 2014. DOI: 10.1016/j.trc.2013.11.011
[3] S. Ishak and H. Al-Deek, "Performance evaluation of short-term time-series traffic prediction model," Journal of Transportation Engineering, vol. 128, no. 6, pp. 490–498, 2002. DOI: 10.1061/(ASCE)0733-947X(2002)128:6(490)
[4] J. Yeon, L. Elefteriadou, and S. Lawphongpanich, "Travel time estimation on a freeway using Discrete Time Markov Chains," Transportation Research Part B: Methodological, vol. 42, no. 4, pp. 325–338, 2008. DOI: 10.1016/j.trb.2007.08.005
[5] S. Clark, "Traffic prediction using multivariate nonparametric regression," Journal of Transportation Engineering, vol. 129, no. 2, pp. 161–168, 2003. DOI: 10.1061/(ASCE)0733-947X(2003)129:2(161)
[6] Y. Kamarianakis, H. Oliver Gao, and P. Prastacos, "Characterizing regimes in daily cycles of urban traffic using smooth-transition regressions," Transportation Research Part C: Emerging Technologies, vol. 18, no. 5, pp. 821–840, 2010. DOI: 10.1016/j.trc.2009.11.001
[7] I. Okutani and Y. J. Stephanedes, "Dynamic prediction of traffic volume through Kalman filtering theory," Transportation Research Part B, vol. 18, no. 1, pp. 1–11, 1984. DOI: 10.1016/0191-2615(84)90002-x
[8] Y. Wang and M. Papageorgiou, "Real-time freeway traffic state estimation based on extended Kalman filter: a general approach," Transportation Research Part B: Methodological, vol. 39, no. 2, pp. 141–167, 2005. DOI: 10.1016/j.trb.2004.03.003
[9] Z. S. Yang, Q. C. Bing, C. Y. Lin, N. Y. Yang, and D. Mei, "Research on short-term traffic flow prediction method based on similarity search of time series," Mathematical Problems in Engineering, vol. 2014, Article ID 184632, 8 pages, 2014. DOI: 10.1155/2014/184632
[10] J. Z. Zhu, J. X. Cao, and Y. Zhu, "Traffic volume forecasting based on radial basis function neural network with the consideration of traffic flows at the adjacent intersections," Transportation Research Part C: Emerging Technologies, vol. 47, no. 2, pp. 139–154, 2014. DOI: 10.1016/j.trc.2014.06.011
[11] G. Fu, G. Q. Han, and F. Lu, "Short-term traffic flow forecasting model based on support vector machine regression," Journal of South China University of Technology (Natural Science Edition), vol. 41, no. 9, pp. 71–76, 2013.
[12] Y. L. Zhang and Y. C. Xie, "Forecasting of short-term freeway volume with v-support vector machines," Transportation Research Record, vol. 2024, pp. 92–99, 2007. DOI: 10.3141/2024-11
[13] Y. Xie and Y. Zhang, "A wavelet network model for short-term traffic volume forecasting," Journal of Intelligent Transportation Systems: Technology, Planning, and Operations, vol. 10, no. 3, pp. 141–150, 2006. DOI: 10.1080/15472450600798551
[14] H. Adeli, "Neural networks in civil engineering: 1989–2000," Computer-Aided Civil and Infrastructure Engineering, vol. 16, no. 2, pp. 126–142, 2001. DOI: 10.1111/0885-9507.00219
[15] E. I. Vlahogianni, J. C. Golias, and M. G. Karlaftis, "Short-term traffic forecasting: overview of objectives and methods," Transport Reviews, vol. 24, no. 5, pp. 533–557, 2004. DOI: 10.1080/0144164042000196000
[16] J. W. C. Van Lint and C. P. I. J. Van Hinsbergen, "Short-term traffic and travel time prediction models," Artificial Intelligence Applications to Critical Transportation Issues, vol. 22, pp. 22–41, 2012.
[17] D. S. Dendrinos, "Traffic-flow dynamics: a search for chaos," Chaos, Solitons and Fractals, vol. 4, no. 4, pp. 605–617, 1994. DOI: 10.1016/0960-0779(94)90069-8
[18] L. W. Lan, F. Y. Lin, and A. Y. Kuo, "Testing and prediction of traffic flow dynamics with chaos," Journal of Eastern Asia Society for Transportation Studies, vol. 5, pp. 1975–1990, 2003.
[19] C. Frazier and K. M. Kocklman, "Chaos theory and transportation systems: instructive example," Journal of Transportation Research Board, vol. 1897, pp. 9–17, 2007.
[20] D. S. K. Karunasinghe and S.-Y. Liong, "Chaotic time series prediction with a global model: artificial neural network," Journal of Hydrology, vol. 323, no. 1–4, pp. 92–105, 2006. DOI: 10.1016/j.jhydrol.2005.07.048
[21] C. J. Dong, C. F. Shao, and J. Li, "Short-term traffic flow prediction of road network based on chaos theory," Journal of System Engineering, vol. 26, no. 3, pp. 340–345, 2011.
[22] O. Baydaroglu and K. Kocak, "SVR-based prediction of evaporation combined with chaotic approach," Journal of Hydrology, vol. 508, pp. 356–363, 2014. DOI: 10.1016/j.jhydrol.2013.11.008
[23] G. Sugihara and R. M. May, "Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series," Nature, vol. 344, no. 6268, pp. 734–741, 1990. DOI: 10.1038/344734a0
[24] L.-L. Zang, L. Jia, L.-C. Yang, and T. Liu, "Chaotic time series model of real-time prediction of traffic flow," China Journal of Highway and Transport, vol. 20, no. 6, pp. 95–99, 2007.
[25] Y. Zhang and W. Guan, "Prediction of multivariable chaotic time series based on maximal Lyapunov exponent," Acta Physica Sinica, vol. 58, no. 2, pp. 756–763, 2009.
[26] J.-S. Zhang, J.-L. Dang, and H.-C. Li, "Local support vector machine prediction of spatiotemporal chaotic time series," Acta Physica Sinica, vol. 56, no. 1, pp. 67–77, 2007.
[27] J. D. Farmer and J. J. Sidorowich, "Predicting chaotic time series," Physical Review Letters, vol. 59, no. 8, pp. 845–848, 1987. DOI: 10.1103/PhysRevLett.59.845
[28] J. Yan, Y. Liu, S. Han, and M. Qiu, "Wind power grouping forecasts and its uncertainty analysis using optimized relevance vector machine," Renewable and Sustainable Energy Reviews, vol. 27, pp. 613–621, 2013. DOI: 10.1016/j.rser.2013.07.026
[29] Y. Bai, P. Wang, C. Li, J. Xie, and Y. Wang, "A multi-scale relevance vector regression approach for daily urban water demand forecasting," Journal of Hydrology, vol. 517, pp. 236–245, 2014. DOI: 10.1016/j.jhydrol.2014.05.033
[30] S. W. Fei and Y. H. He, "Wind speed prediction using the hybrid model of wavelet decomposition and artificial bee colony algorithm-based relevance vector machine," International Journal of Electrical Power & Energy Systems, vol. 73, pp. 625–631, 2015. DOI: 10.1016/j.ijepes.2015.04.019
[31] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, "Geometry from a time series," Physical Review Letters, vol. 45, no. 9, pp. 712–716, 1980. DOI: 10.1103/PhysRevLett.45.712
[32] F. Takens, "Detecting strange attractors in turbulence," in Dynamical Systems and Turbulence, Warwick 1980, D. A. Rand and L. S. Young, Eds., vol. 898 of Lecture Notes in Mathematics, pp. 366–381, Springer, Berlin, Germany, 1980. DOI: 10.1007/BFb0091924
[33] M. T. Rosenstein, J. J. Collins, and C. J. De Luca, "Reconstruction expansion as a geometry-based framework for choosing proper delay times," Physica D: Nonlinear Phenomena, vol. 73, no. 1-2, pp. 82–98, 1994. DOI: 10.1016/0167-2789(94)90226-7
[34] A. M. Fraser and H. L. Swinney, "Independent coordinates for strange attractors from mutual information," Physical Review A, vol. 33, no. 2, pp. 1134–1140, 1986. DOI: 10.1103/physreva.33.1134
[35] J. Holzfuss and G. Mayer-Kress, "An approach to error-estimation in the application of dimension algorithms," Dimensions and Entropies in Chaotic Systems, vol. 32, pp. 114–122, 1986.
[36] M. B. Kennel, R. Brown, and H. D. I. Abarbanel, "Determining embedding dimension for phase-space reconstruction using a geometrical construction," Physical Review A, vol. 45, no. 6, pp. 3403–3411, 1992. DOI: 10.1103/physreva.45.3403
[37] L. Cao, "Practical method for determining the minimum embedding dimension of a scalar time series," Physica D: Nonlinear Phenomena, vol. 110, no. 1-2, pp. 43–50, 1997. DOI: 10.1016/S0167-2789(97)00118-8
[38] P. Grassberger and I. Procaccia, "Characterization of strange attractors," Physical Review Letters, vol. 50, no. 5, pp. 346–349, 1983. DOI: 10.1103/PhysRevLett.50.346
[39] H. S. Kim, R. Eykholt, and J. D. Salas, "Nonlinear dynamics, delay times, and embedding windows," Physica D: Nonlinear Phenomena, vol. 127, no. 1-2, pp. 48–60, 1999. DOI: 10.1016/s0167-2789(98)00240-1
[40] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," Journal of Machine Learning Research, vol. 1, no. 3, pp. 211–244, 2001. DOI: 10.1162/15324430152748236
[41] Y. L. Zhang and Y. H. Yang, "Cross-validation for selecting a model selection procedure," Journal of Econometrics, vol. 187, no. 1, pp. 95–112, 2015. DOI: 10.1016/j.jeconom.2015.02.006
[42] X. L. Liu, D. X. Jia, and H. Li, "Research on kernel parameter optimization of support machine in speaker recognition," Science Technology and Engineering, vol. 10, no. 7, pp. 1669–1673, 2010.
[43] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, 1976.
[44] A. Wolf, J. B. Swift, H. L. Swinney, and J. A. Vastano, "Determining Lyapunov exponents from a time series," Physica D: Nonlinear Phenomena, vol. 16, no. 3, pp. 285–317, 1985. DOI: 10.1016/0167-2789(85)90011-9
[45] M. Sano and Y. Sawada, "Measurement of the Lyapunov spectrum from a chaotic time series," Physical Review Letters, vol. 55, no. 10, pp. 1082–1085, 1985. DOI: 10.1103/PhysRevLett.55.1082
[46] M. T. Rosenstein, J. J. Collins, and C. J. De Luca, "A practical method for calculating largest Lyapunov exponents from small data sets," Physica D: Nonlinear Phenomena, vol. 65, no. 1-2, pp. 117–134, 1993. DOI: 10.1016/0167-2789(93)90009-p
[47] E. J. Hannan and B. G. Quinn, "The determination of the order of an autoregression," Journal of the Royal Statistical Society, Series B (Methodological), vol. 41, no. 2, pp. 190–195, 1979.