Hybrid Method for Short-Term Traffic Flow Prediction Based on Secondary Decomposition Technique and ELM

Strong nonstationarity and nonlinearity are the main characteristics of short-term traffic flow data, and they prevent traditional methods (e.g., autoregressive integrated moving average and deep belief network) from providing satisfactory predictions. To address this problem, a novel forecasting method composed of a secondary decomposition technique and an extreme learning machine (ELM) is proposed in this study. The developed technique is a hybrid of time-varying filtering-based empirical mode decomposition (TVF-EMD) and local mean decomposition (LMD). It not only handles the above complex data features effectively by decomposing the data into several regular subseries but also produces smoother subseries that benefit prediction. To verify the effectiveness of the proposed method, a case study based on two groups of actual traffic flow data with different characteristics is performed. Several single models and hybrid models based on other decomposition methods (e.g., EMD and variational mode decomposition) are used as benchmark models. The experimental results reveal that the proposed model achieves the best performance. For example, compared with the TVF-EMD-based method, the improvement achieved by the proposed approach reaches 33.3% in terms of the mean absolute percentage error.


Introduction
With the improvement of material living standards, vehicles have become the most popular means of transportation for most urban residents.
This phenomenon inevitably increases the number of cars on urban roads, thereby causing serious traffic jams [1]. Meanwhile, the capacity of the existing road network cannot keep up with the rapid growth in the number of vehicles [2]. Although some attempts (e.g., the expansion and renewal of traffic facilities) have been made, the problem persists and calls for new technical developments. Fortunately, the rapid development of information technology (e.g., the intelligent transportation system, ITS) makes it possible to address these problems. The ITS can realize traffic control and guidance at the same time, which plays an important role in reducing traffic accidents and congestion [3]. In particular, it provides a reference for the management department to make timely strategies, such as realizing effective control by adjusting the timing of intersection signal lights. Within the ITS, short-term traffic flow prediction is one of the key technologies and has attracted increasing attention in recent decades. Prediction accuracy is directly related to the performance of traffic management and control, and higher accuracy means more reliable real-time road information [4]. Therefore, accurate short-term traffic prediction is urgently needed.
Many traffic prediction models have been developed over the past few years, and they can be roughly divided into two categories: single models and hybrid models. In practice, a single model usually captures only one characteristic of the data, whereas real short-term traffic flow contains complex characteristics. For this reason, hybrid prediction models have become mainstream. Commonly used hybrid models are built on parametric optimization, single-model combination, or data decomposition. In contrast with parametric optimization and single-model combination, data decomposition can account for a variety of complex characteristics of short-term traffic flow, which has attracted growing attention from researchers.
However, traditional prediction models based on decomposition methods have some problems in practical applications that may bias the prediction results. For example, the empirical mode decomposition (EMD) method is sensitive to noise and suffers from mode mixing. To enhance the accuracy of short-term traffic prediction, this paper proposes a new hybrid model based on a secondary decomposition technique and ELM. The contributions of this paper can be summarized as follows: (1) time-varying filtering-based EMD (TVF-EMD) is used to decrease the nonstationarity and remove the high-frequency noise of the original traffic flow, yielding a set of intrinsic mode functions (IMFs). (2) LMD further decomposes the IMFs into smoother subseries. To decrease nonstationarity and avoid extreme oscillations caused by LMD, decomposition conditions are proposed to determine whether each IMF should be decomposed. (3) An extreme learning machine (ELM) is employed to forecast each subseries produced by the secondary self-adaptive decomposition method (TVF-EMD-LMD), and the final prediction is obtained from these forecasts. Two groups of actual traffic flow data with different characteristics are then used to verify the effectiveness of the proposed method. The rest of this paper is organized as follows: Section 2 reviews existing models and methods for short-term traffic flow prediction. Section 3 briefly presents the basic theory of the corresponding methods. Section 4 gives the construction steps of the proposed hybrid model. In Section 5, two groups of short-term traffic flow data are used to verify the proposed model, and its superiority is illustrated by comparison with hybrid models based on traditional decomposition methods. The last section summarizes the full article.

Literature Review
In the past decades, various methods have been developed to improve forecasting performance. Single models mainly include the autoregressive integrated moving average (ARIMA) model, the Kalman filter (KF), and so on. These models usually have a simple structure and mainly aim at explaining the linear component hidden in the data [5,6]. However, short-term traffic flow contains not only linear components but also significant nonlinearity, which greatly affects prediction accuracy. Therefore, many scholars have turned their focus to nonlinear prediction models. With the rapid development of computer technology, artificial intelligence models (AI-based models) have become a convenient tool for solving nonlinear problems in short-term traffic flow prediction. Owing to their strong adaptive learning and nonlinear mapping abilities, they can provide more accurate predictions. As a typical intelligent model in this field, the artificial neural network (ANN) usually achieves higher forecasting accuracy than traditional time series models [7]. For example, Huang et al. [8] proposed the ELM model, in which the hidden-layer output matrix is computed from randomly generated parameters and the output connection weights are determined by solving a generalized inverse matrix, so that training is fast. To forecast short-term traffic flow effectively, researchers have also started to adopt deep learning models. For instance, a deep belief network (DBN) has been used to extract traffic flow features, with support vector regression then used for prediction; compared with traditional AI-based models, this DBN-based approach achieves satisfactory prediction performance [9]. In addition, deep learning models have been adopted as predictors that learn the corresponding extracted features [10].
More details about the superiority of the AI-based models can be found in reference [11].
Although single models have achieved good prediction results, they cannot explain multiple characteristics (e.g., nonstationarity and nonlinearity) of short-term traffic flow at the same time [12]. Hybrid models, by contrast, can capture them, and many studies show that hybrid models outperform single models. For instance, a hybrid model based on the long short-term memory (LSTM) network has been proposed to predict traffic flow, and its prediction performance is better than that of the single model [13]. In addition, nonlinearity and nonstationarity are obstacles in time series prediction that increase its difficulty. To separate these important features hidden in short-term traffic flow and decrease nonstationarity, hybrid models based on data decomposition are a viable option. The construction of this kind of hybrid model is roughly divided into three steps: (1) decompose the original short-term traffic flow using a data decomposition method; (2) establish appropriate predictors to forecast each decomposed subseries; (3) obtain the final prediction by superimposing the subseries predictions. Data decomposition has become one of the key technologies of hybrid models in recent years [14][15][16][17]. Huang et al. [18] first proposed the EMD method, which can adaptively process nonstationary and nonlinear signals. It has been widely used in various fields with satisfactory results [19,20]. However, EMD cannot cope effectively with signals affected by noise. Therefore, ensemble empirical mode decomposition (EEMD) was proposed to improve EMD; it adds white noise to the original signal to eliminate the negative influence of noise, and experiments show that it achieves good results within a certain range of applications [21].
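The three-step decomposition-ensemble scheme above can be sketched in Python as follows. This is a minimal illustration, not any specific model from the cited works: `decompose` and `fit_predict` are hypothetical callables, and the moving-average split and last-value forecaster below are placeholder stand-ins for a real decomposition method (e.g., EMD) and a real predictor (e.g., ELM).

```python
import numpy as np


def decompose_predict_superimpose(series, decompose, fit_predict):
    """Decomposition-ensemble forecast: decompose, predict each part, then sum."""
    components = decompose(series)                          # step (1): subseries
    sub_forecasts = [fit_predict(c) for c in components]   # step (2): one predictor per subseries
    return float(np.sum(sub_forecasts)), sub_forecasts      # step (3): superimpose


def moving_average_split(x, window=5):
    """Placeholder decomposition: smooth trend plus residual (they sum back to x)."""
    x = np.asarray(x, dtype=float)
    trend = np.convolve(x, np.ones(window) / window, mode="same")
    return [trend, x - trend]


def last_value(component):
    """Placeholder predictor: persistence forecast of the next value."""
    return float(component[-1])


flow = np.array([12, 15, 14, 18, 21, 19, 23, 25, 24, 28], dtype=float)
forecast, parts = decompose_predict_superimpose(flow, moving_average_split, last_value)
print(forecast, parts)
```

Keeping the decomposition and the predictor as separate callables is a design choice that makes it easy to swap in other decompositions (such as TVF-EMD-LMD) or predictors without touching the superposition step.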
Nevertheless, the application of EEMD is restricted by its unrealistic assumption of white noise in the actual signal. To address the noise problem effectively, the TVF-EMD method was subsequently proposed. This method uses B-splines instead of cubic splines to interpolate the extreme points, which greatly improves the filtering of high-frequency noise. Moreover, TVF-EMD can eliminate pseudo extreme points caused by noise "pollution" and thus realize accurate interpolation [22]. In short-term traffic flow prediction, models based on TVF-EMD are more effective than those based on EMD [23,24]. Although EMD-based models are widely applied in prediction, EMD is a decomposition method based on Hilbert-Huang transform theory [25], and its decomposed subseries cannot adequately explain the negative frequency components in the frequency domain of the signal. To this end, the local mean decomposition (LMD) method [26] was proposed to further improve EMD, and the combination of EMD and LMD shows excellent performance in application [27]. Besides the EMD-based methods, it is worth mentioning a newer signal processing method, variational mode decomposition (VMD), proposed in [28]. This method can not only remove high-frequency noise adaptively but also decompose the original signal into several subseries whose central frequencies are clearly separated. Its good performance has been verified in prediction tasks [29][30][31][32]. However, the choice of the VMD decomposition level in short-term traffic flow prediction lacks theoretical guidance, which may lead to a large deviation in prediction accuracy [33].

TVF-EMD.
TVF-EMD is an improved version of EMD. Its two main processes are the calculation of the local cut-off frequency of the signal and the filtering of the signal approximated by B-spline polynomials. Their theory is briefly introduced below. Given a signal x(t), its analytic form is obtained by the Hilbert transform, as shown in formula (1).

$$z(t) = x(t) + j\hat{x}(t) = A(t)\,e^{j\varphi(t)} \qquad (1)$$
where j is the imaginary unit; A(t) and φ(t) are the instantaneous amplitude and instantaneous phase of the analytic signal, respectively (the instantaneous frequency is φ′(t)); and x̂(t) is the Hilbert transform of the original signal. More details about the Hilbert transform can be found in reference [34]. Then the analytic signal z(t) can be further decomposed into two analytic signals by equation (2).

$$z(t) = z_1(t) + z_2(t) = a_1(t)\,e^{j\varphi_1(t)} + a_2(t)\,e^{j\varphi_2(t)} \qquad (2)$$
Similarly, a_1(t), a_2(t), φ_1(t), and φ_2(t) are the instantaneous amplitudes and phases of the two decomposed signals, respectively. Denote the sets of local minima and maxima of the instantaneous amplitude A(t) as {A(t_min)} and {A(t_max)}, respectively, and the corresponding sets of time instants as {t_min} and {t_max}. Then, based on the theoretical derivation for the local minimum and maximum values of the signal (see [22] for details), formulas (3)-(6) can be obtained.
Equations (7) and (8) assume that a_1(t) is greater than a_2(t). Finally, the expression of the cut-off frequency is obtained as equation (11).
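The analytic signal of equation (1) and the two-component model of equation (2) can be checked numerically with SciPy's `scipy.signal.hilbert`, which returns x(t) + jx̂(t). The sketch below assumes a toy two-tone signal with constant amplitudes a1 > a2 (not traffic data). In that special case the envelope A(t) oscillates between a1 - a2 and a1 + a2, which illustrates why the local minima and maxima of A(t) carry information about the two components. The tone frequencies are chosen so that the window holds whole periods, which avoids edge effects in the FFT-based Hilbert transform.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0                        # sampling rate (Hz), toy value
t = np.arange(0.0, 2.0, 1.0 / fs)  # 2 s window -> whole periods for both tones
a1, a2 = 1.0, 0.4                  # constant component amplitudes, a1 > a2
f1, f2 = 50.0, 60.0                # component frequencies (Hz)
x = a1 * np.cos(2 * np.pi * f1 * t) + a2 * np.cos(2 * np.pi * f2 * t)

z = hilbert(x)                     # analytic signal z(t) = x(t) + j*x_hat(t), eq. (1)
A = np.abs(z)                      # instantaneous amplitude A(t)
phi = np.unwrap(np.angle(z))       # instantaneous phase phi(t)
inst_freq = np.diff(phi) * fs / (2 * np.pi)  # phi'(t) converted to Hz

# With constant amplitudes, max A = a1 + a2 and min A = a1 - a2,
# so both amplitudes can be recovered from the envelope extrema.
a1_est = (A.max() + A.min()) / 2
a2_est = (A.max() - A.min()) / 2
print(round(a1_est, 6), round(a2_est, 6), round(inst_freq.mean(), 2))
```

For real traffic data the amplitudes vary in time, so the general expressions in [22] are needed; this toy case only shows the quantities involved.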
B-splines are commonly applied in interpolation and have good approximation properties. Because they are piecewise polynomial functions, any finite signal can be approximated by B-spline basis functions [35,36], and the approximating signal is expressed as formula (12).

$$g_m^n(t) = \sum_{k} c(k)\,\beta^n\!\left(\frac{t}{m} - k\right) \qquad (12)$$

where β^n(·) is the B-spline basis function (a piecewise polynomial of degree n), m is the node step size, and c is the vector of B-spline coefficients. For a given signal y(t), the coefficients of the approximation signal are determined by minimizing the least-squares error between the real signal and the approximation signal, as shown in equation (13).
Mathematical Problems in Engineering

where [·]↑m denotes m-step upsampling of c and * denotes convolution. The solution of this problem is given by equation (14).
where [·]↓m denotes m-step downsampling and p_m^n is the filter term, whose expression is given by equation (15). In this way, the original signal can be replaced by its approximation, which plays an important role in filtering the noise in the original signal. According to the experiments in [22], within a proper range, the larger the value of m, the better the suppression of high-frequency noise.
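To illustrate how the node step m controls noise suppression, the sketch below fits a cubic least-squares B-spline with knots every m samples. It swaps in SciPy's general-purpose `scipy.interpolate.make_lsq_spline` instead of the filter-based solution of equations (13)-(15), so it is an analogue of the TVF-EMD approximation rather than its exact implementation. The function name `bspline_approx` and the test signal (a slow sine plus Gaussian noise) are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline


def bspline_approx(y, m, k=3):
    """Least-squares B-spline approximation of y with uniform knot step m (m >= 2)."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    interior = np.arange(m, len(y) - 1, m, dtype=float)           # knots every m samples
    knots = np.r_[[x[0]] * (k + 1), interior, [x[-1]] * (k + 1)]  # clamped knot vector
    spline = make_lsq_spline(x, y, knots, k=k)                    # minimizes squared error
    return spline(x)


rng = np.random.default_rng(0)
n = 512
clean = np.sin(2 * np.pi * np.arange(n) / 256)    # slow underlying component
noisy = clean + 0.3 * rng.standard_normal(n)       # add high-frequency noise

for m in (4, 32):
    err = np.sqrt(np.mean((bspline_approx(noisy, m) - clean) ** 2))
    print(f"m = {m:2d}: RMS error vs. clean signal = {err:.3f}")
```

A larger m means fewer basis functions, so less of the noise can be fitted; making m too large would eventually oversmooth the slow component as well, which is consistent with the qualification "within a proper range".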
The detailed steps of TVF-EMD are presented in Appendix A.

Extreme Learning Machine.
The extreme learning machine is an artificial intelligence model that can effectively solve nonlinear classification and prediction problems. Suppose that there is a set of N observation samples {(x_k, p_k) | k = 1, 2, ..., N}, where x_k ∈ ℝ^n is the kth input vector and p_k is the corresponding observed value. Then the ELM can be used to learn the sample space adaptively. Its structure consists of an input layer, a hidden layer, and an output layer, and its learning process can be summarized in the following steps.
Step 1: Set the number of hidden-layer neurons M and the neuron activation function f(·).
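Step 2: Randomly generate the input-layer connection matrix ω and the bias vector b, where ω is an n × M matrix whose element in row i and column j is denoted as ω_ij, and b = [b_1, b_2, ..., b_M]^T.
Step 3: From the randomly generated connection matrix and bias vector, the hidden-layer output matrix ψ is obtained as shown in formula (16).

$$\psi = \begin{bmatrix} f(\omega_1^{T}x_1 + b_1) & \cdots & f(\omega_M^{T}x_1 + b_M) \\ \vdots & \ddots & \vdots \\ f(\omega_1^{T}x_N + b_1) & \cdots & f(\omega_M^{T}x_N + b_M) \end{bmatrix}_{N \times M} \qquad (16)$$

where ω_i and b_i are the ith column of ω and the ith element of b, respectively, and x_k (k = 1, 2, ..., N) is the kth input sample.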
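Step 4: Compute the output weight vector β by solving the linear system

$$\psi\beta = P \qquad (17)$$

where P is the observed value vector, i.e., P = [p_1, p_2, ..., p_N]^T collects the N observed values. According to the definition of the generalized inverse matrix, the solution of equation (17) is given by equation (18).

$$\beta = \psi^{+}P \qquad (18)$$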
where ψ^+ is the generalized (Moore-Penrose) inverse of ψ. The above four steps show that the ELM does not need to learn its parameters iteratively; only a single generalized inverse computation is required.

This paper combines the advantages of TVF-EMD and LMD to establish a new decomposition method. TVF-EMD is an effective method for removing data noise and decreasing nonstationarity, and LMD can overcome the frequency-related problems of EMD-based decomposition methods. The detailed steps of LMD can be found in Appendix B. Hence, this paper adopts the hybrid of TVF-EMD and LMD to decompose the original short-term traffic flow data. A simple flowchart of the proposed model is shown in Figure 1.
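The four ELM steps can be sketched with NumPy as follows. The sigmoid activation, the uniform(-1, 1) initialization of ω and b, the class name `SimpleELM`, and the toy sine-fitting data are assumptions for illustration. The output weights are obtained in one shot with the Moore-Penrose pseudoinverse `np.linalg.pinv`, mirroring equation (18).

```python
import numpy as np


class SimpleELM:
    """Single-hidden-layer ELM: random input weights, closed-form output weights."""

    def __init__(self, n_hidden, seed=0):
        self.M = n_hidden                          # Step 1: number of hidden neurons
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Step 3: hidden-layer output matrix psi (N x M) with sigmoid f(.)
        return 1.0 / (1.0 + np.exp(-(X @ self.omega + self.b)))

    def fit(self, X, P):
        X = np.asarray(X, dtype=float)
        n = X.shape[1]
        # Step 2: random input weights omega (n x M) and biases b (length M)
        self.omega = self.rng.uniform(-1.0, 1.0, size=(n, self.M))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.M)
        psi = self._hidden(X)
        # Step 4: solve psi @ beta = P via the generalized inverse, beta = psi^+ P
        self.beta = np.linalg.pinv(psi) @ np.asarray(P, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


x = np.linspace(0.0, 2 * np.pi, 200).reshape(-1, 1)   # toy inputs, N x n with n = 1
p = np.sin(x).ravel()                                   # toy targets
model = SimpleELM(n_hidden=50).fit(x, p)
rmse = np.sqrt(np.mean((model.predict(x) - p) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Because only β is learned and it has a closed-form solution, fitting costs one pseudoinverse, which is the speed advantage the text attributes to ELM.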

The Hybrid Forecasting Model
The core of the decomposition method is to decrease the nonstationarity of the original time series and obtain smoother subseries. To this end, the following conditions should be satisfied when IMFs are further decomposed by LMD. First, according to the principle of the LMD method, an IMF to be decomposed must have at least three local extreme points. Second, to avoid extreme oscillations in the decomposed sequence, the total energy of the resulting subseries must not exceed that of the IMF being decomposed. The first condition means that not all IMFs need to be decomposed; the second controls the amplitudes of the subseries, which reduces nonstationarity and yields smoother subseries.
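The two decomposition conditions can be expressed as simple checks. In this sketch, local extrema are counted from sign changes of the first difference (flat steps are skipped), and "energy" is assumed to mean the sum of squared values; the function names are hypothetical.

```python
import numpy as np


def count_local_extrema(x):
    """Number of interior local maxima and minima, ignoring flat steps."""
    d = np.diff(np.asarray(x, dtype=float))
    d = d[d != 0]                                  # skip plateaus
    return int(np.sum(d[1:] * d[:-1] < 0))         # slope sign changes


def energy(x):
    return float(np.sum(np.asarray(x, dtype=float) ** 2))


def may_decompose(imf, min_extrema=3):
    """Condition 1: LMD needs at least three local extreme points."""
    return count_local_extrema(imf) >= min_extrema


def accept_subseries(imf, subseries):
    """Condition 2: total energy of the subseries must not exceed that of the IMF."""
    return sum(energy(s) for s in subseries) <= energy(imf)


imf = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0])
print(count_local_extrema(imf), may_decompose(imf))
print(accept_subseries(imf, [0.5 * imf, 0.5 * imf]))   # energies 0.75 + 0.75 <= 3
print(accept_subseries(imf, [2.0 * imf, -1.0 * imf]))  # energies 12 + 3 > 3
```

In the second call the subseries still sum to the IMF, yet their large opposite-sign amplitudes are exactly the kind of extreme oscillation the energy condition rejects.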

Case Study
To verify the performance of the proposed model more thoroughly, this work collected two groups of data with different characteristics at an intersection of main roads in the urban area of Chongqing; its schematic is shown in Figure 2. In terms of magnitude, the maximum traffic flow of data set 1 does not exceed 50 veh/5 min, i.e., it is a low-volume traffic flow; in terms of time, the traffic flow shows a certain periodicity with a cycle of about one day, which is consistent with the regularity of daily social activity.
Because the same experiments are conducted on both data sets, data set 1 is discussed in detail, and the forecasting results of data set 2 are briefly presented in Section 5.4. The statistical results of data set 1 are shown in Figure 3.
To analyze and evaluate the prediction results effectively, this paper divides each data set into a training part and a test part, in which two-thirds of the data set is used as the training part and the remainder as the test part [37].
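The two-thirds split can be sketched as below. Treating the split as chronological (training data first) and building lagged input vectors for a one-step-ahead predictor are assumptions for illustration; the lag order is not stated in this section, so `lags` is a free parameter, and the function names are hypothetical.

```python
import numpy as np


def split_two_thirds(series):
    """Chronological split: first two-thirds for training, the rest for testing."""
    series = np.asarray(series, dtype=float)
    n_train = len(series) * 2 // 3
    return series[:n_train], series[n_train:]


def make_lagged(series, lags):
    """Rows of X hold `lags` consecutive values; y holds the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
    y = series[lags:]
    return X, y


flow = np.arange(30, dtype=float)       # stand-in for 5-min traffic counts
train, test = split_two_thirds(flow)
X_train, y_train = make_lagged(train, lags=3)
print(len(train), len(test), X_train.shape, X_train[0], y_train[0])
```

Splitting in time order rather than at random keeps future observations out of the training part, which matters for forecasting.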

Data Decomposition Results.
According to the flowchart described in Section 4, the original short-term traffic flow is decomposed by TVF-EMD into several IMFs. The results of TVF-EMD decomposition are displayed in Figure 4.
Figure 4 shows that the oscillation of the IMFs decreases gradually; compared with IMFs 1-5, IMFs 6-10 fluctuate more smoothly. Because LMD is capable of handling smooth and symmetric signals [27] and can avoid the negative frequency trap, all IMFs are then tested against the LMD decomposition conditions given in Section 4. Finally, IMF 7 of the training set of data set 1 is further decomposed into 5 different subseries. Their LMD decomposition results are presented in Figure 5.

Prediction Results and Comparative Analysis.
This work adopts the ELM model to forecast each subseries decomposed by the proposed approach and then superimposes the predictions of all subseries to obtain the prediction for the test set. To highlight the advantages of the hybrid model, several commonly used single models (ARIMA, ELM, and DBN) are used for comparison. In addition, the superiority of the proposed decomposition method is demonstrated by comparison with conventional methods (EMD, TVF-EMD, and VMD). Finally, a comparative analysis is made based on the prediction results of all involved models. For convenience of description, the three contrastive hybrid models are renamed in Table 1.
To show the comparison among these models intuitively, the predictions of the single models and the proposed model are shown in Figure 6. Similarly, the forecasting results of the hybrid models based on traditional decomposition methods and of the proposed model are shown in Figure 7.
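Some information can be captured from Figures 6 and 7. First, the predicted trend of the proposed model is closer to the true one. Second, the proposed model achieves the best performance in the local interval [453, 457]. However, these observations only reflect the advantages of the proposed hybrid model locally. Therefore, four popular indicators, namely, the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and root mean square relative error (RMSRE), are used to evaluate the overall prediction performance:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\sigma_i - \hat{\sigma}_i\right|, \qquad \mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\sigma_i - \hat{\sigma}_i}{\sigma_i}\right| \times 100\%,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\sigma_i - \hat{\sigma}_i\right)^2}, \qquad \mathrm{RMSRE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{\sigma_i - \hat{\sigma}_i}{\sigma_i}\right)^2},$$

where σ_i is the ith true value in the test set, σ̂_i is the corresponding predicted value, and n is the number of samples in the test set. The evaluation indicators of these seven models are presented in Table 2 (data set 1).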
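The four indicators can be computed as follows. This sketch uses the standard definitions of MAE, MAPE, RMSE, and RMSRE (MAPE in percent, RMSRE as a fraction) and assumes all true values are nonzero; the function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(true, pred):
    """MAE, MAPE (%), RMSE and RMSRE between true values and predictions."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = true - pred
    rel = err / true                        # requires nonzero true values
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(rel)) * 100.0),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "RMSRE": float(np.sqrt(np.mean(rel ** 2))),
    }


print(evaluate([10, 20], [12, 18]))
```

The example also shows why a small average flow inflates the relative measures: the same absolute error of 2 is a 20% error at 10 veh/5 min but only 10% at 20 veh/5 min.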
On the one hand, a smaller average value of the original traffic flow may lead to a larger MAPE. In addition, the prediction performance of the deep learning model is influenced by its hyperparameters, and improper settings can produce poor predictions. On the other hand, Table 2 shows that the proposed hybrid model achieves the best prediction accuracy. Furthermore, the comparison of models 1-3 with the ELM model indicates that applying a decomposition method helps decrease nonstationarity. Similarly, the prediction accuracy of the proposed model is higher than that of model 3, which reveals the superiority of the LMD method.
To avoid results biased by chance and to verify the robustness of the proposed model [38], this paper adopts a cross-validation strategy to evaluate the involved prediction models. The MAE values of the involved models under the cross-validation strategy are given in Table 3 and Figure 8, which show that the proposed method produces stable results under cross-validation and achieves the best performance among the involved models.
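This section does not specify how the cross-validation folds are formed. One common choice for time series, assumed here purely for illustration, is expanding-window (rolling-origin) validation, in which each fold trains on all data before a cutoff and tests on the next block. The helper names and the persistence forecaster are hypothetical.

```python
import numpy as np


def expanding_window_folds(n, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs; each test block follows its training data."""
    test_size = (n - min_train) // n_folds
    for k in range(n_folds):
        end = min_train + k * test_size
        yield np.arange(end), np.arange(end, end + test_size)


def cv_mae(series, forecast, n_folds=4, min_train=None):
    """MAE per fold; forecast(train, horizon) must return `horizon` predictions."""
    series = np.asarray(series, dtype=float)
    if min_train is None:
        min_train = len(series) // 2
    maes = []
    for tr, te in expanding_window_folds(len(series), n_folds, min_train):
        pred = np.asarray(forecast(series[tr], len(te)), dtype=float)
        maes.append(float(np.mean(np.abs(series[te] - pred))))
    return maes


def persistence(train, horizon):
    return np.repeat(train[-1], horizon)


folds = list(expanding_window_folds(100, n_folds=4, min_train=60))
print([(len(tr), te[0], te[-1]) for tr, te in folds])
print(cv_mae(np.full(100, 7.0), persistence))
```

Unlike shuffled k-fold splits, every test block here lies strictly after its training data, so no future information leaks into training.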

Additional Case.
Another group of data is used to further verify the effectiveness of the proposed method. The corresponding statistical results are displayed in Figure 9. The traffic flow of data set 2 also shows periodicity, while the peak value in the interval [250, 350] is smaller than the other peak values. In addition, the average value of data set 2 is larger than that of data set 1. Similarly, TVF-EMD is employed to decompose the original traffic flow into several IMFs whose fluctuations gradually weaken. Then the IMFs that meet the LMD decomposition conditions are further decomposed. Finally, ELM is used to forecast in the same way.
With the training set updated step by step, all the forecasting tasks are completed. The corresponding evaluation indicators of all involved models are summarized in Table 4.
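One way to read "updating the training set step by step" is a rolling one-step-ahead scheme, sketched below under that assumption: before predicting point t, the model is refitted on all observations up to t - 1, so no test value is used before it is forecast. `fit_predict_next` is a hypothetical callable standing in for the full decomposition-plus-ELM pipeline.

```python
import numpy as np


def rolling_one_step(series, n_train, fit_predict_next):
    """Forecast series[n_train:] one step at a time with a growing training set."""
    series = np.asarray(series, dtype=float)
    preds = []
    for t in range(n_train, len(series)):
        history = series[:t]                    # everything observed before time t
        preds.append(fit_predict_next(history))
    return np.array(preds)


preds = rolling_one_step(np.arange(10.0), n_train=6, fit_predict_next=lambda h: h[-1])
print(preds)
```

With a decomposition-based model, the decomposition should also be recomputed on `history` at each step; decomposing the full series once would leak future information into the subseries.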
From Table 4, the MAE and RMSE values of all involved models are higher than those in Table 2, which shows that data set 1 and data set 2 have different characteristics. Further, the MAPE and RMSRE values of the proposed method are relatively low, perhaps for two reasons: (1) the average traffic flow is larger than that of data set 1; (2) the decomposition method proposed in this paper not only reduces nonstationarity but also makes the decomposed subseries smoother, both of which help handle the strongly nonstationary series (data set 2). Overall, the proposed model achieves the best performance on both data sets.

Conclusions
In view of the shortcomings of traditional decomposition methods, this paper proposes a new decomposition method based on the combination of TVF-EMD and LMD. To the best of our knowledge, this is the first application of this method to short-term traffic flow prediction. Compared with the traditional TVF-EMD method, it not only decreases the nonstationarity of the time series but also further decomposes the IMFs with LMD to avoid the negative frequency trap and obtain smoother series. In terms of prediction accuracy, a case study based on two groups of measured traffic flow data verifies that the TVF-EMD-LMD method achieves the highest prediction accuracy. Therefore, this self-adaptive decomposition method (i.e., TVF-EMD-LMD) may have great potential in short-term traffic flow prediction.

A. The complete steps of TVF-EMD
Step 1: Find the positions of the local maxima of the input signal x(t) and denote them as u_i (i = 1, 2, ..., L);
Step 2: Seek all intermittences among the u_i that satisfy condition (A.1) and mark them as e_j (j = 1, 2, ..., K), where φ′(·) is the bisecting frequency introduced in Subsection 3.1 and ρ is the threshold. If φ′(·) increases at e_j, then e_j is located on a rising edge of φ′(·); otherwise, it is located on a falling edge;
Step 3: Determine the true peak and bottom parts of φ′(·) according to the rising and falling edges;
Step 4: Readjust the local cut-off frequency by interpolating the extreme points in the new peak and bottom parts;
Step 5: Take the time instants of the extrema of h(t) as knots, where h(t) is given by (A.2), and then use the B-spline technique to interpolate between the knots; denote the resulting approximation as r(t);

$$h(t) = \cos\left(\int \varphi'(t)\,dt\right) \qquad \text{(A.2)}$$

Step 6: Use the stop criterion to examine whether x(t) is an IMF. The criterion involves the Loughlin instantaneous bandwidth B_Loughlin(t), the weighted average of the instantaneous frequency φ_avg(t), and a given bandwidth threshold ξ; the calculation of B_Loughlin(t) and φ_avg(t) is detailed in [39,40];
Step 7: If x(t) satisfies the stop criterion, it is regarded as an IMF; otherwise, set x(t) = x(t) − r(t) and repeat Steps 1-6 until all IMFs are obtained.
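Step 5 can be sketched numerically: integrate the local cut-off frequency to obtain the phase in (A.2), take h(t) as the cosine of that phase, and use the sample indices of its interior local extrema as B-spline knots. The sketch assumes the cut-off frequency is already available as a sampled angular frequency in rad/s and uses trapezoidal integration; the name `knots_from_cutoff` and the constant 5 Hz toy input are hypothetical.

```python
import numpy as np


def knots_from_cutoff(omega_cut, fs):
    """Knot indices from h(t) = cos(integral of the cut-off frequency), eq. (A.2)."""
    omega_cut = np.asarray(omega_cut, dtype=float)
    # cumulative trapezoidal integral of the angular cut-off frequency, phase(0) = 0
    phase = np.concatenate(([0.0], np.cumsum((omega_cut[1:] + omega_cut[:-1]) / 2) / fs))
    h = np.cos(phase)
    d = np.diff(h)
    # interior local extrema: the slope of h changes sign
    knots = np.where(d[1:] * d[:-1] < 0)[0] + 1
    return h, knots


fs = 1000.0
omega = np.full(1000, 2 * np.pi * 5.0)   # constant 5 Hz cut-off frequency (toy case)
h, knots = knots_from_cutoff(omega, fs)
print(knots)
```

Because the knots follow the extrema of h(t), a higher local cut-off frequency places knots more densely, so the B-spline approximation r(t) adapts its resolution to the signal over time.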

B. The complete steps of LMD
Step 1: Initialize p = 0 and q = 0, take IMF_i as the input signal and redefine it as u_{p,i}, i.e., u_{p,i} = IMF_i.
Step 2: Check whether u_{p,i} satisfies the first decomposition condition introduced in Section 4, then set p = p + 1;
Step 3: If the condition is satisfied, calculate the local mean and local magnitude of the input signal by moving average and denote them as m_{pq,i}(t) and a_{pq,i}(t), respectively;
Step 4: Calculate b_{pq,i}(t) = u_{p,i}(t) − m_{pq,i}(t) and s_{pq,i}(t) = b_{pq,i}(t)/a_{pq,i}(t), then set q = q + 1;
Step 5: Determine whether s_{pq,i}(t) is a purely frequency-modulated signal. If it is not, go back to Step 3; otherwise, obtain a subseries of IMF_i, denoted PF_{pq,i}(t).
Step 6: Set u_{p,i}(t) = u_{p,i}(t) − PF_{pq,i}(t), then examine whether u_{p,i} satisfies the first decomposition condition. If it does, go back to Step 2 and continue decomposing; otherwise, obtain all subseries PF_{j,i}(t) (j = 1, 2, ..., p) and the residual signal u_{p,i}.
Step 7: Check whether all decomposed subseries satisfy the second decomposition condition. If they do, regard them as the final subseries of IMF_i; otherwise, IMF_i is not decomposed further.
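Steps 3 and 4 of LMD can be sketched as follows. The local mean (n_k + n_{k+1})/2 and local magnitude |n_k - n_{k+1}|/2 between successive extrema n_k are taken from the standard LMD formulation rather than stated in this appendix. The single moving-average pass with window length `window` is a simplification (full LMD repeats the smoothing), and the function name and the toy cosine input are hypothetical.

```python
import numpy as np


def local_mean_and_magnitude(x, window):
    """One LMD sifting step: smoothed local mean m(t), magnitude a(t), and s(t)."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)
    ext = np.where(d[1:] * d[:-1] < 0)[0] + 1           # indices of interior extrema
    if len(ext) < 3:
        raise ValueError("LMD needs at least three local extreme points")
    m = np.empty_like(x)
    a = np.empty_like(x)
    for i0, i1 in zip(ext[:-1], ext[1:]):                # piecewise-constant segments
        m[i0:i1] = (x[i0] + x[i1]) / 2
        a[i0:i1] = abs(x[i0] - x[i1]) / 2
    m[:ext[0]], a[:ext[0]] = m[ext[0]], a[ext[0]]        # hold values at the edges
    m[ext[-1]:], a[ext[-1]:] = m[ext[-1] - 1], a[ext[-1] - 1]
    kernel = np.ones(window) / window
    m_s = np.convolve(m, kernel, mode="same")            # moving-average smoothing
    a_s = np.convolve(a, kernel, mode="same")
    b = x - m_s                                          # Step 4: b(t) = u(t) - m(t)
    s = b / a_s                                          # Step 4: s(t) = b(t) / a(t)
    return m_s, a_s, s


n = np.arange(500)
x = 2.0 * np.cos(2 * np.pi * n / 50)                     # toy IMF: amplitude 2, period 50
m_s, a_s, s = local_mean_and_magnitude(x, window=25)
print(np.round(a_s[100:105], 6), np.round(s[100:105], 6))
```

For this pure cosine, the local mean is zero, the magnitude equals the amplitude 2, and s(t) is the unit-amplitude cosine away from the edges, i.e., already a purely frequency-modulated signal in the sense of Step 5. Near the ends, the zero padding of `mode="same"` distorts the smoothed values, which a practical implementation would handle with a better boundary treatment.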

Data Availability
Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.