A Noise-Immune Boosting Framework for Short-Term Traffic Flow Forecasting

Accurate short-term traffic flow modeling is an essential prerequisite for analyzing and controlling traffic flow. Canonical data-driven methods have a large number of parameters that may be underfitted with limited training samples, and they cannot adaptively boost their understanding of the spatiotemporal dependencies of the traffic flow. The noisy and unstable traffic flow data also prevent the models from effectively learning the underlying patterns for forecasting future traffic flow. To address these issues, we propose an easy-to-implement yet effective boosting model based on extreme gradient boosting, enhanced by wavelet denoising, for short-term traffic flow forecasting. Discrete wavelet denoising is employed to preprocess the noisy traffic flow data. Then, the denoised training datasets are reconstructed to train the extreme gradient boosting model. These two components are integrated seamlessly in a unified framework that retains the features in the data as much as possible. Extensive experiments are conducted on four benchmark datasets in comparison with frequently used models. The results demonstrate that the proposed model can precisely capture the hidden spatial dependency of the traffic flow data and achieve superior performance.


Introduction
The intelligent transportation system (ITS) plays an important role in traffic management and control [1,2], which significantly benefits traffic safety, traffic efficiency, congestion alleviation, and so forth. Accurate traffic flow forecasting in a roadway network provides crucial information for the ITS to implement proactive and efficient traffic management decisions. More specifically, traffic flow forecasting estimates the variation tendency of the traffic state by exploiting the intrinsic patterns of traffic flow in a large amount of historical data [3]. With the fast development of information and electronic technology, traffic flow data collection has changed from a single source to multiple sources [4], for example, inductive loops, remote microwave sensors, Bluetooth, video, and floating cars with GPS navigation. However, as the external environment of the transportation system is complicated, unobservable factors may interfere with the raw traffic data collected from detectors [5]. Such interference degrades the reliability and accuracy of traffic flow forecasting [6].
Traffic flow is a complex dynamic system [7] whose evolution is governed by intrinsic periodicity and correlation. After years of research efforts, traffic flow modeling has achieved considerable results in both theory and practice. Traffic flow forecasting methods are mainly divided into two categories: model-driven methods and data-driven methods. Model-driven methods include Kalman filtering models [8,9], k-nearest neighbours [10], and time series models [11]. These methods are robust and efficient, but they rely on handcrafted expert knowledge. The most representative data-driven methods for traffic flow forecasting are neural networks, such as recurrent neural networks (RNN) [12] and convolutional neural networks (CNN) [13]. However, the performance of these neural networks highly depends on the quality and quantity of the training samples [14]. The large number of parameters in deep networks may be underfitted with limited or noisy training samples, resulting in low training efficiency [15]. In particular, errors propagate gradually through the network and prevent it from achieving high accuracy. It is also difficult for a static learning model to reflect the periodicity, nonlinearity, and randomness of the traffic flow. In recent years, online boosting models have received substantial attention and have been successfully applied in this field. As an important branch of machine learning, boosting models have unique advantages in time series modeling [16].
The boosting models, such as gradient boosting decision tree (GBDT) and adaptive boosting (AdaBoost) [17], exhibit adaptive learning ability for large-scale distributed processing of traffic flow data. They are widely used in complex systems, such as short-term traffic flow forecasting [18], feature recognition of urban road traffic accidents [19], and taxi travel time forecasting [20]. Boosting models such as the gradient boosting machine (GBM) [21] and GBDT [22] can approximate periodic functions well and perform satisfactorily on specific data and applications. Nevertheless, the boosting model has randomness in the selection of weights and thresholds, which affects the convergence speed and the results of the model [23,24]. Moreover, the aforementioned off-the-shelf boosting models are too complicated for traffic engineers to integrate into the existing ITS. Exploring an effective and easy-to-implement model for short-term traffic flow forecasting therefore remains essential.
In this paper, we propose a boosting model based on extreme gradient boosting (XGBoost) enhanced by discrete wavelet denoising, which addresses the two shortcomings we have mentioned above.
This idea was first presented at a conference [25] and has been well received by transportation engineers. XGBoost is a scalable end-to-end tree boosting system [26], improved from GBDT. It learns a set of classification and regression trees (CARTs), with the tree construction parallelized, and obtains the result by summing up the score of each CART [26]. However, the noisy and unstable traffic flow makes it difficult for XGBoost to identify the underlying patterns for predicting future traffic flow [4]. In this regard, we propose to preprocess the traffic flow data by discrete wavelet denoising, which can reduce the impact of noise in the traffic flow. Compared with the original GBDT algorithm, one of the special improvements of XGBoost is the regularized objective of the loss function. We further take the spatiotemporal correlation of the traffic flow into consideration and reconstruct the traffic flow datasets based on the phase space reconstruction theory. In the end, XGBoost is executed to forecast future traffic flow. The performance of XGBoost for traffic flow forecasting is greatly improved in both accuracy and robustness. This work was first accepted as a poster at the 8th International Conference on Digital Home [25]. We have since refined our model, reconducted most of our experiments, and rewritten our paper. The main contributions of this work are listed as follows:
(1) We construct a boosting model for traffic flow forecasting enhanced by discrete wavelet denoising.
(2) We investigate the forecasting performance of different mother wavelets to reveal the best one for traffic flow, and we reconstruct the traffic flow by considering the phase space reconstruction theory.
(3) We evaluate the proposed model on four benchmark datasets. The results demonstrate that the proposed model outperforms frequently used models with lower computation cost.

Wavelet Denoising.
Denoising algorithms have received considerable attention in various fields [8,9]. Most of the conventional filtering techniques, such as the mean filter, the Gaussian filter, and the minimum mean squared error filter, cannot always guarantee an acceptable quality of denoised traffic data [27]. In recent decades, the discrete wavelet transform (DWT) has been applied to the problem of noise reduction, and it has outperformed traditional filters in terms of root mean squared error (RMSE), peak signal-to-noise ratio (PSNR), and other measures [28]. The wavelet denoising algorithm has been well acknowledged as an essential method. Mathematically, wavelet denoising is a function approximation problem: according to a chosen criterion, it finds the best approximation of the original signal in the wavelet space spanned by the scaling and translation of the wavelet generating function, so that the original signal can be separated from the noise. In the wavelet domain, the coefficients that carry important signal characteristics have larger amplitudes, while the noise coefficients have smaller amplitudes. Therefore, by setting an appropriate threshold, the wavelet coefficients with larger absolute values can be retained or shrunk, which yields the estimated wavelet coefficients (EWC).
For traffic flow data, the wavelet denoising algorithm transforms the data into the time-frequency domain by the DWT. Then we keep only the large coefficients and discard the rest using a proper threshold level. As a result, the small number of largest coefficients, which carry the crucial information, are saved, while most of the noise coefficients, which are small, are discarded. If we use the DWT to separate the high-frequency noise from the original traffic flow data, the periodic pattern in the traffic flow can be identified efficiently. An example of the application of wavelet analysis to traffic flow denoising is shown in Figure 1. By comparing the actual data and the denoised data in Figure 1, we can see that the waveform of the denoised data is much smoother than that of the real data. We expect the denoised data to positively affect the subsequent traffic flow analysis and prediction.
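The thresholding idea described above can be illustrated with a minimal sketch: decompose the signal once, shrink the detail coefficients, and reconstruct. The sketch rests on several assumptions that are not taken from this paper: it uses the Haar wavelet and soft thresholding in plain Python for brevity, whereas the experiments here rely on dbN wavelets, and the function names and the threshold argument are hypothetical.

```python
import math


def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation cA, detail cD)."""
    s = math.sqrt(2.0)
    c_a = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    c_d = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return c_a, c_d


def haar_idwt(c_a, c_d):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(c_a, c_d):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def denoise(signal, thr):
    """Haar stand-in for wavelet denoising (assumption: even-length input)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even for one-level Haar DWT")
    c_a, c_d = haar_dwt(signal)
    # Only the high-frequency (detail) coefficients are thresholded.
    return haar_idwt(c_a, soft_threshold(c_d, thr))
```

With a threshold of zero the input is reconstructed unchanged, and with a very large threshold every pair of neighbouring samples collapses to its mean. In practice a wavelet library with dbN filters and a multilevel decomposition would normally replace this hand-written Haar transform.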

XGBoost.
XGBoost is a scalable machine learning system for tree boosting. The system's impact has been widely recognized in several machine learning and data mining challenges. The system is widely applied in domains such as high energy physics event classification, customer behavior prediction, ad click-through rate prediction, and massive online course dropout rate prediction [29]. The most crucial factor behind the success of XGBoost is its scalability. The system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed or memory-limited settings. The scalability of XGBoost is due to several essential systems and algorithmic optimizations [26]. These innovations include a novel tree learning algorithm for handling sparse data and a theoretically justified weighted quantile sketch procedure that enables handling instance weights in approximate tree learning. Parallel and distributed computing make learning faster, which enables quicker model exploration. More importantly, XGBoost exploits out-of-core computation and enables data scientists to process hundreds of millions of examples on a desktop. Finally, these techniques are combined into an end-to-end system that scales to even larger data with the least amount of cluster resources.

Methodology
In this section, we first give the mathematical definition of the short-term traffic flow forecasting task. Then, we propose to preprocess the raw traffic flow data by wavelet denoising. After that, the gradient boosting algorithm can perform more effectively on the denoised traffic flow data; the overall procedure is summarized in Algorithm 1.
Let $v_t \in \mathbb{R}^K$ denote the traffic flow observed at time interval $t$, where $K$ is the number of measurement points on the road network. The traffic flow forecasting task is to train a model $F$ to predict the traffic flow given a dataset $S = \{s^{(1)}, s^{(2)}, \ldots, s^{(T)}\}$, where $T$ is the number of training samples, $s^{(t)} = (v_{t-\tau+1}, \ldots, v_t)$, and $\tau$ is the time lag. In this way, given a query sample $s^{(q)}$, the traffic flow prediction at time interval $t + 1$ can be denoted as $\hat{v}_{t+1} = F(s^{(q)})$.
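The construction of the lagged samples can be sketched as follows. This is a minimal illustration that assumes a single measurement point (K = 1) and a plain Python list as the flow series; the function name and argument names are hypothetical.

```python
def make_lagged_samples(series, lag):
    """Build (sample, target) pairs from a univariate flow series.

    Each sample holds the `lag` most recent observations up to time t,
    and its target is the observation at time t + 1.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    samples, targets = [], []
    for t in range(lag - 1, len(series) - 1):
        samples.append(series[t - lag + 1 : t + 1])
        targets.append(series[t + 1])
    return samples, targets
```

For a series of length N this yields N minus the lag pairs, so the earliest observations serve only as inputs and never as targets.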

Traffic Flow Preprocessing.
As previously mentioned, traffic flow consists of a periodic trend in low frequency and minute-to-minute fluctuations in high frequency [9]. The high-frequency traffic flow fluctuations are often considered as noise on the periodic traffic flow trend [30]. If this high-frequency noise is learned by a statistical learning model, the model will produce unstable predictions for future traffic flow. Hence, it is important to eliminate the noise in the traffic flow so that the learning-based models focus on the evolution trend of the traffic flow. In this regard, we propose adopting a wavelet denoising method to eliminate the high-frequency noise in the traffic flow and keep the learning-based models from learning the minute-to-minute fluctuations. The wavelet decomposition transforms the original traffic data into several oscillatory waveforms at different frequencies, and the structure of each waveform at a specific instant can be determined. In this way, the traffic flow signal can be localized in both the time and frequency domains.
Relying on this property, the wavelet transform is widely adopted for traffic flow denoising.
Given a mother wavelet $\varphi(\omega)$ (e.g., a dbN wavelet), the continuous wavelet transform (CWT) of a signal is parameterized by $a$ and $b$, where $a$ is the scale or dilation parameter and $b$ is the translation parameter that reflects the position of the wavelet in time. Further, since the traffic data are discrete, the discrete wavelet transform (DWT) is used instead. In the DWT, the scale parameter $a$ and the translation parameter $b$ are discretized by a dyadic sequence; for example, $a = a_0^m$, $b = n b_0 a_0^m$, and $m, n \in \mathbb{Z}$. When $a_0 = 2$ and $b_0 = 1$, the DWT reduces to the binary (dyadic) wavelet transform.
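For reference, the textbook form of the CWT and of the discretized wavelet family can be sketched as follows. The notation is an assumption made for illustration: $x(t)$ stands for the signal, $W(a, b)$ for its wavelet coefficient, $\varphi$ for the mother wavelet with complex conjugate $\varphi^{*}$, and $\varphi_{m,n}$ for the wavelet obtained by substituting $a = a_0^m$ and $b = n b_0 a_0^m$.

```latex
W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \varphi^{*}\!\left(\frac{t - b}{a}\right) dt,
\qquad
\varphi_{m,n}(t) = a_0^{-m/2}\, \varphi\!\left(a_0^{-m} t - n b_0\right), \quad m, n \in \mathbb{Z}.
```

With $a_0 = 2$ and $b_0 = 1$, each increment of $m$ doubles the scale, which corresponds to halving the sampling rate at each decomposition level.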
Through the DWT, the traffic flow signal is decomposed into $X_h(t)$ and $X_l(t)$, the high-frequency information and low-frequency information of the traffic flow, respectively. Then, the high-frequency information is handled by a threshold.

Algorithm 1
Input: $V(t)$, data set of the traffic flow
Output: Split and default directions with max gain
(1) Step 1: Decompose the data $v(t)$ by the wavelet transform into high-frequency information cD and low-frequency information cA;
(2) Step 2: Reduce the sampling rate of the high-frequency information cD to half to get the new high-frequency information l;
(3) Step 3: Reconstruct the data from the low-frequency information cA and the new high-frequency information l by the inverse wavelet transform;
(4) Step 4: Import the reconstructed data into XGBoost for training;
(5) gain $\leftarrow 0$
(6) $G \leftarrow \sum_{i \in I} g_i$, $H \leftarrow \sum_{i \in I} h_i$
(7) for $k = 1$ to $\tau$ do
(8)     // enumerate missing value goto right
(9)     $G_l \leftarrow 0$, $H_l \leftarrow 0$
(10)    for $j$ in sorted($I_k$, ascent order by $s_{jk}$) do
(11)    ...
After the wavelet decomposition process, the high-frequency and low-frequency information of the traffic flow are reconstructed. Reconstruction is the inverse process of decomposition. After upsampling the high-frequency and low-frequency information, the new training label $X(t)$ is obtained by convolving the coefficients with the inverse high-pass and low-pass filters.
By this transformation, the proposed technique removes the high-frequency noise in the traffic flow signal while preserving the quality of the original data, which fulfills our purpose.

Model Training.
Extreme gradient boosting (XGBoost) is an efficient tool for large-scale parallel boosted trees, which can be effectively applied to classification and regression tasks [26]. XGBoost improves the gradient boosting decision tree (GBDT) by enhancing parallel computing, approximate tree building, and sparse data processing. It also optimizes the use of computational resources, making it suitable for multidimensional data feature recognition and classification.
In this paper, we first transform the traffic flow forecasting into a supervised learning task. Different from GBDT, XGBoost adds a regularization term to the objective function to reduce the complexity of the model and avoid overfitting.
In the objective function, $\hat{y}_i$ is the prediction, $y_i$ is the ground truth, $\Omega(f_k)$ is the regularization term, $f_k$ is a decision tree, $p$ represents the number of leaf nodes, $\omega$ represents the scores of the leaf nodes, $\gamma$ controls the number of leaf nodes, and $\lambda$ controls the scores of the leaf nodes. XGBoost constructs the objective function iteratively, adding one tree per round. By a second-order Taylor expansion of the loss, the convergence speed of the model is accelerated, and the optimal solution of the approximated objective is obtained.
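In the standard XGBoost formulation [26], the regularized objective and its second-order approximation can be sketched as follows. The notation is an assumption made for illustration: $l$ denotes the loss, $p$ the number of leaf nodes, $\omega$ the vector of leaf scores, $\gamma$ and $\lambda$ the regularization weights, $f$ the tree added in the current round, and $g_i$, $h_i$ the first-order and second-order derivatives of the loss at the previous round's prediction for sample $i$; constant terms are dropped in the last line.

```latex
\begin{aligned}
\mathrm{Obj} &= \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right), \\
\Omega\left(f_k\right) &= \gamma\, p + \tfrac{1}{2}\, \lambda \left\lVert \omega \right\rVert^{2}, \\
\mathrm{Obj}_{\text{round}} &\approx \sum_{i} \left[ g_i\, f\left(x_i\right) + \tfrac{1}{2}\, h_i\, f\left(x_i\right)^{2} \right] + \Omega\left(f\right).
\end{aligned}
```

Because the approximated objective is quadratic in the leaf scores, the best score of each leaf has a closed form for a fixed tree structure, which is what makes the split search inexpensive.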
Here, $g_i = \partial_{\hat{y}^{(p-1)}} l(y_i, \hat{y}^{(p-1)})$ is the first-order derivative and $h_i = \partial^2_{\hat{y}^{(p-1)}} l(y_i, \hat{y}^{(p-1)})$ is the second-order derivative of the loss, where the superscript $(p-1)$ denotes the previous boosting iteration. In each step, the algorithm tries to add a split to an existing leaf node in order to generate the optimal tree structure, and each candidate split is scored by its splitting gain. When the splitting gain is less than a fixed threshold or the tree reaches the specified maximum depth, the splitting stops, and we obtain the final regression model. Traffic flow prediction is essentially a regression task, so we use a regressor as the base learner of XGBoost. We feed the wavelet-transformed traffic flow data into XGBoost for training. In each training round, a new regressor is fitted to the residual error left by the previous round, and the split score determines whether to generate a new leaf. Finally, we obtain the predicted traffic flow by summing the leaf scores of all trees. The proposed framework is summarized in Algorithm 1. The computational complexity of the proposed method is slightly higher because of the added denoising step, but both the wavelet denoising and XGBoost are suitable for parallel computing.
This means that we can preprocess a portion of the data to train the XGBoost model. While the XGBoost model is training or predicting, we simultaneously preprocess the next batch of data to update the model. The core of XGBoost is approximate calculation, the complexity of which is $O(K d \lVert x \rVert_0 \log q)$, where $d$ is the maximum depth of the tree, $K$ is the total number of trees, $\lVert x \rVert_0$ is the number of non-missing entries in the training data, and $q$ is the number of candidate split points [26].
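The split search that accumulates the gradient sums $G$ and $H$ during tree construction can be sketched for a single feature as follows. The sketch assumes the splitting gain of the standard XGBoost formulation [26], $\frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$, and it omits the default direction for missing values. The function names are hypothetical and are not part of the XGBoost library.

```python
def leaf_score(g_sum, h_sum, lam):
    """Structure score G^2 / (H + lambda) of a leaf with the given sums."""
    return g_sum * g_sum / (h_sum + lam)


def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Gain of splitting one leaf into a left and a right child."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma


def best_split(values, grads, hess, lam=1.0, gamma=0.0):
    """Exact greedy search over one feature; returns (gain, threshold).

    Returns (0.0, None) when no candidate split has a positive gain.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best_gain, best_threshold = 0.0, None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        # Move sample i into the left child before evaluating the split.
        g_left += grads[i]
        h_left += hess[i]
        if values[i] == values[nxt]:
            continue  # no threshold separates equal feature values
        gain = split_gain(g_left, h_left, g_total - g_left,
                          h_total - h_left, lam, gamma)
        if gain > best_gain:
            best_gain = gain
            best_threshold = 0.5 * (values[i] + values[nxt])
    return best_gain, best_threshold
```

Raising the penalty `gamma` suppresses splits whose improvement is small, which is how the regularization term limits the number of leaf nodes.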

Data Description.
We employ four benchmark datasets to evaluate the performance of the proposed method. The traffic flow data were collected from four sites on the highways ending on the ring road in Amsterdam, the Netherlands, a short distance before the merging points. The data on the four sites (i.e., A1, A2, A4, and A8) were collected from May 20, 2010, to June 24, 2010. Highway A1 connects Amsterdam and the German border. It is the first high-capacity road, and its flow pattern is difficult to find. The A2 highway is one of the busiest highways in the Netherlands, connecting Amsterdam and the Belgian border. In the experiment, we used the data collected before its widening, which can test whether our model predicts congestion well. The A4 motorway is part of Rijksweg 4, running from Amsterdam to the Belgian border. It is representative of a mature highway, which can demonstrate the universality of the model. The A8 highway starts from the A10 road at the Coenplein interchange and is less than 10 kilometers from Zaandijk. Because the road has many connections with other highways, the vehicles' speed on the road is constantly changing. By predicting the traffic flow on the roads mentioned above, we can study how expressway capacity, travel time changes, and accidents affect model prediction. The 1-minute average traffic data over five weeks were collected by MONICA sensors (velocity-flow measurement points). The datasets are split in chronological order with 70% for training, 10% for validation, and 20% for testing. Missing values are excluded from both training and testing.
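The chronological 70/10/20 split can be sketched as follows. This is a minimal illustration that assumes the samples are already ordered by time and held in a Python list; the function name and the rounding rule at the split boundaries are assumptions.

```python
def chronological_split(samples, train_frac=0.7, val_frac=0.1):
    """Split time-ordered samples into train, validation, and test parts.

    No shuffling is performed, so every validation and test sample is
    later in time than every training sample.
    """
    n = len(samples)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]
```

Keeping the order intact avoids leaking future observations into training, which a random split of overlapping lagged windows would otherwise do.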

Baselines.
We compare XGBoost with the following frequently used models in intelligent transportation systems:
(1) Decision tree (DT) is a decision support tool that uses a tree-like model of decisions and their possible consequences.
(2) Artificial neural networks (ANN) are computer programs inspired by biological design to simulate how the human brain processes information. ANNs gather their knowledge by detecting the patterns and relationships in data and learn (or are trained) through experience, not from programming.
(3) Support vector regression (SVR) is a version of support vector machines (SVM) for regression.
(4) Gradient boosting decision tree (GBDT) is an iterative decision tree algorithm. The algorithm consists of multiple decision trees, and the conclusions of all trees are accumulated to make the final answer.

Experimental Setup.
We use two criteria, root mean squared error (RMSE) and mean absolute percentage error (MAPE), to evaluate the performance of the proposed model.
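The two criteria can be computed as in the following sketch, which assumes their common definitions: RMSE as the square root of the mean squared difference between observed and predicted flow, and MAPE as the mean absolute relative error expressed in percent. The function names are hypothetical, and the MAPE function assumes that no observed value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual must be nonzero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

RMSE is expressed in the unit of the flow and penalizes large errors more strongly, while MAPE is scale-free, which makes it convenient for comparing sites with different traffic volumes.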
Our experiments are conducted on an Intel Core i7 at 3.60 GHz with 8 GB RAM. To determine the optimal number of lags in the model, we test forecasting lags of $n \in \{6, 8, 10, 12, 15, 20\}$ and use the MAPE to evaluate them. Figure 2 shows that the MAPE is lowest when $n = 12$, so we set the number of lags for forecasting to 12. We train our model with an initial learning rate of 0.01, 100 decision trees, and a random sampling ratio of 0.5 for each tree. Table 1 lists the performance of the denoised XGBoost model and the baseline models for 10-minute-ahead prediction on the four benchmark datasets. XGBoost achieves superior results on all datasets. It outperforms the frequently used models, including DT, ANN, SVR, and GBDT, and its margin over DT, ANN, and SVR is significant.

Performance Evaluation.
Regarding the second-best model, GBDT, in Table 1, XGBoost achieves more accurate predictions than GBDT on all the datasets. This is because our model eliminates the noise in the traffic flow, which would otherwise prevent the learning-based models from learning the temporal dependencies. Therefore, XGBoost can discover implicit relationships within the data.
We also compare the proposed framework with the conventional models in Table 2. From Table 2, we can observe that XGBoost outperforms the conventional ones.

Ablation Study.
The quality of the observed traffic flow data is crucial for traffic flow prediction accuracy, and thus data quality control is essential to smooth the noisy traffic flow data. To comprehensively compare the performance of different denoising frameworks, we employ the wavelet denoising model with different wavelet bases to preprocess the raw data. The RMSE and MAPE statistics help us analyze the smoothing methods quantitatively. Overall, there is no significant difference between the smoothing models on data samples of the same time span. Taking the traffic flow denoising results on the 10 min data from sensor A1 as an example, the db4 model obtains the best noise removal performance among the wavelet-based denoising results. To further examine the denoising effects of different models, we looked at how each model addresses the outliers in the original traffic flow data. It is observed that wavelet denoising can successfully smooth the anomalous oscillations without discarding data details. Taking the denoising effect on data samples from sensor A1 at a 10 min scale as an example, the variation tendency is successfully preserved in the denoised traffic flow data in Figure 3. Table 3 shows a similar smoothing result for the traffic flow data at sensor A1 at the 10 min scale. In sum, the various wavelet-based models showed similar results in suppressing the data outliers, and db4 obtained slightly better performance than the other smoothing methods.
We also compare the performance of the model when the window is 20 and 30 in Table 4. The significance test for the experimental results is reported in Table 5. From Table 5, the PR value (i.e., the p value) of the model factor is 0.000001, which is much smaller than 0.01, whereas the PR value of the highway factor is 0.72, which is much larger than 0.01. In this regard, we conclude that the choice of model has a significant influence on the results, while the choice of highway does not. Figure 1 shows the comparison between the observed traffic flow data and the predicted values for the A1 detector in the test sample. The XGBoost model yields a more accurate and smoother short-term traffic flow prediction. The integrated model is more suitable for periodic data than canonical methods.

Conclusion
In this paper, we propose an easy-to-implement and effective boosting model for accurate short-term traffic flow forecasting. The noisy and unstable traffic flow data are first preprocessed by discrete wavelet denoising. From the ablation study, we conclude that the db4 mother wavelet is more suitable for traffic flow data denoising. Then, the extreme gradient boosting model is trained on the denoised dataset, which is reconstructed according to the phase space reconstruction theory. Extensive experiments on four benchmark datasets reveal that the proposed method can better learn the periodicity, nonlinearity, and randomness of the traffic flow. The results demonstrate that the proposed model outperforms the frequently used models. In the future, we plan to extend the method to other forecasting applications, such as load forecasting and taxi demand forecasting.
Data Availability
The data and source code that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.