Short-Term Traffic Flow Prediction with Weather Conditions: Based on Deep Learning Algorithms and Data Fusion



Introduction
In recent years, with the continuous increase in vehicle ownership, the conflict between road resources and travel demand has become increasingly acute, leading to ever more serious traffic congestion and even hindering socioeconomic development. The intelligent transportation system (ITS) is an effective means of alleviating traffic congestion, and short-term traffic flow prediction is key to it. Accurate and timely prediction of traffic flow gives traffic authorities a reliable basis for traffic control and offers travelers appropriate travel advice, helping to optimize the road network and reduce congestion. However, traffic prediction is a complex, nonlinear problem. Real traffic flow has obvious temporal correlation and periodicity, but it may evolve irregularly under the disturbance of weather changes, which makes the problem more challenging. The existing short-term traffic flow prediction models can be mainly divided into three categories: statistical models, traditional machine learning models, and deep learning models.
Statistical models include historical average (HA) and autoregressive integrated moving average (ARIMA). The former takes the statistical average value at a certain time slice in the past as the predicted value, while the latter establishes a mathematical model based on the time series. This kind of method has been widely used for a long time because it can reveal the periodic changes of traffic flow data. In the 1970s, Ahmed et al. [1] first applied ARIMA to the short-term traffic flow prediction problem. After that, several improvements were made to the ARIMA model. Voort et al. [2] combined Kohonen maps with ARIMA and proposed the KARIMA method to forecast traffic flow. Williams et al. [3] proposed seasonal ARIMA for traffic flow prediction on expressways. Min et al. [4] proposed GSTARIMA for short-term traffic flow prediction in urban networks. To further extract spatiotemporal correlation, Duan et al. [5] proposed an extended space-time ARIMA for short-term traffic flow estimation. However, statistical models calculate their parameters from empirical data on the basis of prior knowledge, which makes them poorly suited to revealing the nonlinearity and uncertainty of traffic flow.
Compared with the statistical methods, traditional machine learning methods such as the support vector machine (SVM) and support vector regression (SVR) show stronger function fitting ability in the complex, nonlinear traffic flow prediction problem. The essential idea of this kind of method is to transform low-dimensional, linearly inseparable traffic data into a high-dimensional, linearly separable expression through a kernel function. Hong et al. [6] proposed an SVR traffic flow prediction model that employs a hybrid genetic algorithm to determine a suitable combination of parameters. Lou et al. [7] presented the least squares SVR algorithm for short-term traffic flow forecasting. Hu et al. [8] used particle swarm optimization (PSO) to determine optimal SVR parameters for higher precision in short-term traffic flow forecasting. Ling et al. [9] proposed a multikernel SVM and used adaptive particle swarm optimization (APSO) to improve it. Feng et al. [10] proposed a novel SVM with adaptive multikernel (AMSVM). Although many optimization studies exist for this kind of method, its limitations in regression problems and its lack of knowledge mining ability for large-scale traffic data still constrain prediction performance.
With the emergence of traffic big data [11], short-term traffic prediction has become more challenging and complex, which places higher requirements on data modeling. Deep learning models, which are effective at modeling high-dimensional spaces and can extract features through hierarchical representation, have become the mainstream technology for traffic flow prediction. Deep artificial neural networks (ANN) [12], deep belief networks (DBN) [13,14] based on the restricted Boltzmann machine, and long short-term memory (LSTM) networks [15] for time series problems have been studied and applied to some extent. In addition, Xiao et al. [16] presented a short-term multistep freeway traffic flow prediction model with a radial basis function (RBF) network whose hidden-layer center positions are determined by the fuzzy c-means clustering algorithm. Lv et al. [17] first used a stacked autoencoder (SAE) to learn the representation of traffic flow features for prediction. Abdi et al. [18] proposed a novel temporal difference backpropagation (TDBP) method for training RBF, which improved short-term traffic flow prediction accuracy. Dai et al. [19] combined spatiotemporal analysis with a gated recurrent unit (GRU) to predict short-term traffic flow. However, a single model is still limited in processing complex data. To integrate the advantages of single models and achieve more accurate traffic flow forecasting, a variety of combined models have emerged. Hong et al. [20] proposed the ARIMA-ANN combination model, using ARIMA to deal with the linear part of the historical data and ANN for the nonlinear part. Li et al. [21] combined the ARIMA model with the RBF model to capture different aspects of the underlying patterns of traffic flow. Du et al. [22] proposed a hybrid deep learning framework based on a recurrent neural network (RNN) and a convolutional neural network (CNN), which can capture the spatiotemporal dependencies of traffic flow.
The existing short-term traffic flow prediction methods mainly aim at data modeling for traffic flow, and little research has been done on the effect of external conditions such as weather on traffic flow. Hall et al. [23] discussed how adverse weather affects traffic flow. Holdener et al. [24] considered the effects of weather conditions on rural freeway flow. Jian et al. [25] investigated microscopic traffic flow parameters in rainy environments. These studies did reveal part of the correlation between traffic flow and weather conditions, but their conclusions have not been applied to the prediction problem. Koesdwiady et al. [26] incorporated DBNs for more accurate prediction based on weather data and traffic data, realizing decision-level data fusion of traffic flow and weather data. Zheng et al. [27] proposed a combined architecture of embedding components, LSTM, and CNN to capture the relationship between traffic flow and weather. However, Koesdwiady et al. [26] did not further consider the weather decision (traffic prediction based on weather data only), making it hard to achieve high performance in decision-level data fusion. Zheng et al. [27] used embedding components to extract the weather disturbance but lacked analysis and processing of the weather parameters.
Given the deficiencies of existing methods, research on traffic flow prediction driven by both traffic data and weather data is very important for mining the data characteristics of traffic flow and improving prediction accuracy. This paper proposes a novel combined framework of SAE and RBF based on traffic flow and weather data. The main contributions are as follows:

(1) Corresponding data processing according to the characteristics of the data: for the non-numerical weather type parameter, we first use one-hot coding for the original expression. Then, with an embedding component, an interpretable expression is learned. To deal with the numerous weather parameters, the Pearson correlation coefficient (PCC) is calculated to find the flow-related parameters, and principal component analysis (PCA) then fuses the selected parameters into a new parameter with higher correlation. In addition, to integrate time periodicity into prediction, HA is used to construct a time expression based on historical traffic flow data.

(2) Incorporating SAE and RBF to capture the features of traffic flow and weather conditions: considering the effectiveness of combination modeling based on deep learning, we use SAE to learn the temporal correlation in traffic flow, RBF to learn the periodic evolution under weather disturbance, and another RBF to realize the decision-level data fusion of the former models. This combined framework can effectively learn the periodicity and temporal correlation of traffic flow and the disturbance of weather conditions so as to improve the accuracy and robustness of the prediction model.

Problem Description
Traffic prediction in our research uses the parameters of the preceding 12 consecutive 5-minute intervals to predict the output flow in any subsequent time slice. The output target $y$ of the prediction model can be expressed by the following formula:

Considering that the evolution of traffic flow is not only governed by its own regularity but also disturbed by external weather conditions, the input parameters of the model also need to include external weather factors.
where $y_{flow}$ represents the flow prediction based on the traffic sequence and $y_{weather\&time}$ represents the flow prediction based on weather and time periodicity. From the perspective of decision-level data fusion, the final flow prediction value is the fusion value of the two decisions, so the output $y$ of the combined model can also be expressed as follows:

In multistep prediction, $y$ is represented by $y_i$, where $i$ is the step size. As shown in Figure 1(a), in the sequence-to-sequence framework, the output of each unit is also the input of the next unit, and in this way the prediction results can be extended to any time slice. This framework was originally proposed for machine translation [28]. As shown in Figure 1(b), the model proposed in this paper follows the sequence-to-sequence model for multistep prediction. Because the input of the prediction model includes parameters of different types (weather and traffic flow), some of the modules are adjusted to satisfy the modeling demands of multivariate data.
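The windowing and the recursive multistep scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the flow sequence only (the weather and time inputs are omitted), and `make_windows`, `recursive_forecast`, and `predict_one` are hypothetical names introduced here.

```python
import numpy as np

def make_windows(flow, n_lags=12):
    """Pair each run of n_lags consecutive flows with the flow that follows it."""
    flow = np.asarray(flow, dtype=float)
    n_samples = len(flow) - n_lags
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window")
    X = np.stack([flow[i:i + n_lags] for i in range(n_samples)])
    y = flow[n_lags:]
    return X, y

def recursive_forecast(predict_one, history, n_steps, n_lags=12):
    """Predict n_steps ahead by feeding each prediction back as an input.

    predict_one is any callable (hypothetical here) that maps a window of
    n_lags flows to the next flow value.
    """
    window = list(np.asarray(history, dtype=float)[-n_lags:])
    if len(window) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    preds = []
    for _ in range(n_steps):
        y_next = float(predict_one(np.array(window)))
        preds.append(y_next)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [y_next]
    return np.array(preds)
```

Because each step consumes earlier predictions, any error made at one step is carried into the following steps.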

Data Process
The data from January 12, 2018, to June 11, 2018, are selected as the training set, and the data from June 17, 2018, to January 12, 2019, are selected as the test set. The primary objective of data mining in our research is to study the data patterns from 6:00 to 21:00, which is the peak period of the day.

Traffic Data.
The traffic dataset of metro freeways in the Twin Cities is from the Regional Transportation Management Center (https://www.d.umn.edu/tdrl/traffic/). The original data are collected at a 30-second interval from more than 4,500 loop detectors. The data of detector No. 644 from January 12, 2018, to January 12, 2019, which contain fewer errors and omissions, are selected. In the data preprocessing stage, the data are processed into a table with a 5-minute interval. Meanwhile, omissions and errors are corrected using the principle of time similarity. Part of the processed traffic data is shown in Table 1.
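As a rough illustration of the resampling step, the sketch below groups ten 30-second readings into one 5-minute value. It assumes that the readings are vehicle counts that can simply be summed, which the text does not state; the function name `aggregate_to_5min` is hypothetical, and the correction of omissions and errors is not covered.

```python
import numpy as np

def aggregate_to_5min(volume_30s, samples_per_bin=10):
    """Sum consecutive 30-second counts into 5-minute counts.

    Assumption: the input holds vehicle counts, so summation is the
    appropriate aggregation. Trailing samples that do not fill a whole
    5-minute bin are dropped.
    """
    v = np.asarray(volume_30s, dtype=float)
    n_bins = len(v) // samples_per_bin
    trimmed = v[:n_bins * samples_per_bin]
    return trimmed.reshape(n_bins, samples_per_bin).sum(axis=1)
```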
To reveal the periodicity of traffic data under weather disturbance, we construct a time-flow correlation expression by the HA method. The training set is divided into working days and nonworking days, and the average flow in each time slice is calculated and taken as the representation of that time slice. The time-flow correlation expression of time slice $i$ can be expressed as follows:

$$x^{timecode}_{i} = \frac{1}{N} \sum_{j=1}^{N} x^{flow}_{i,j}$$

where $x^{flow}_{i,j}$ represents the flow of time slice $i$ on day $j$ and $N$ is the number of days of the corresponding type (working or nonworking) in the training set.
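The HA time expression can be sketched in a few lines. The layout of the input (one row per day, one column per time slice) and the names `ha_timecode` and `is_workday` are assumptions made for illustration.

```python
import numpy as np

def ha_timecode(flow, is_workday):
    """Average flow per time slice, computed separately for each day type.

    flow: array of shape (n_days, n_slices), training data only.
    is_workday: boolean array of shape (n_days,).
    Both day types are assumed to be present in the training data.
    """
    flow = np.asarray(flow, dtype=float)
    is_workday = np.asarray(is_workday, dtype=bool)
    return {
        "workday": flow[is_workday].mean(axis=0),
        "nonworkday": flow[~is_workday].mean(axis=0),
    }
```

At prediction time, the value looked up for a given time slice and day type would serve as the time input of the model.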

Weather Data.
The weather dataset is from the National Oceanic and Atmospheric Administration (https://gis.ncdc.noaa.gov/maps/ncei/lcd). On the map, we select the site closest to detector 644; the final selection is No. 72658414927. The collection time of the weather data corresponds to that of the traffic data. After data preprocessing, part of the weather data is shown in Table 2.
Weather type is a non-numerical parameter, so we use one-hot coding for the preliminary treatment. However, the sparse representation of one-hot coding cannot reflect the correlation between weather types, so the model cannot effectively extract its rich features during training. To solve this problem, an embedding component is applied to extract a higher-dimensional expression of weather type. The embedding vector of weather type can be expressed by the following formula:

where $x_{embedding}$ is the trained embedding vector of weather type and $x_{one-hot}$ is the one-hot expression; the relation between them is shown in Figure 2.

Besides weather type, there are 7 other weather parameters. To select the parameters relevant to traffic flow, the Pearson correlation coefficient $\rho$ is calculated as in formula (6):

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \quad (6)$$

where $X$ and $Y$ stand for the two target variables involved in the operation, $\operatorname{cov}(X, Y)$ is their covariance, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. The PCC of traffic flow and weather parameters is shown in Table 3, and the corresponding heat map is shown in Figure 3.
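The two preprocessing ideas in this paragraph, one-hot coding followed by an embedding lookup and PCC-based screening, can be illustrated as follows. This is a sketch under assumptions: in the paper the embedding is learned during training, whereas here `embedding_matrix` is simply a given array, and all function names are hypothetical.

```python
import numpy as np

def one_hot(labels):
    """Encode category labels as one-hot rows; categories are sorted by name."""
    categories = sorted(set(labels))
    index = {c: k for k, c in enumerate(categories)}
    encoded = np.zeros((len(labels), len(categories)))
    for row, label in enumerate(labels):
        encoded[row, index[label]] = 1.0
    return encoded, categories

def embed(one_hot_rows, embedding_matrix):
    """Multiplying a one-hot row by the matrix selects one row of the matrix."""
    return one_hot_rows @ embedding_matrix

def pearson(x, y):
    """Pearson correlation coefficient of two equally long 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))
```

A weather parameter whose coefficient with flow is close to zero would be a candidate for removal, which is the selection rule applied in the next paragraph.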
From Table 3, hourly dew-point temperature and hourly precipitation have the smallest Pearson correlation coefficients, so only the other five parameters are retained. To further extract the weather parameters, PCA is used for feature-level data fusion. Through a linear transformation, the original set of correlated variables is transformed into a new set of linearly uncorrelated variables that replaces the original ones. In this way, the original information is retained and redundant information is removed. Formula (7) is the original matrix A of weather parameters:

Complexity
$$A = [x_{DB}, x_{RH}, x_{Vis}, x_{WB}, x_{WS}] \quad (7)$$

where $x_{DB}$, $x_{RH}$, $x_{Vis}$, $x_{WB}$, and $x_{WS}$ are the selected parameters. After being processed by PCA, the new matrix P is generated as follows:

$$P = [x_{pca}]$$

where $x_{pca}$ is the fusion value of the selected weather parameters processed by PCA.
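One way to obtain a single fused value per sample is to project the selected parameters onto their first principal component. The sketch below does this with a singular value decomposition; standardizing each column first is an assumption made here, and `pca_first_component` is a hypothetical name.

```python
import numpy as np

def pca_first_component(A):
    """Project the rows of A onto the first principal component.

    A: array of shape (n_samples, n_parameters).
    Returns the projected values and the share of variance they explain.
    Columns are standardized first (an assumption, not stated in the text).
    The sign of the component is arbitrary.
    """
    A = np.asarray(A, dtype=float)
    std = A.std(axis=0)
    Z = (A - A.mean(axis=0)) / np.where(std > 0, std, 1.0)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[0]
    explained = float(s[0] ** 2 / np.sum(s ** 2))
    return scores, explained
```

The explained-variance share indicates how much of the information in the five parameters survives the fusion into one value.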

Proposed Methodology
To capture the features of traffic flow and weather conditions, we propose a combined framework of SAE and RBF, as shown in Figure 4. The combined framework can be divided into three modules: the flow prediction module (FPM), the weather and periodicity module (WPM), and the decision-level data fusion module (DDFM).

FPM Using SAE.
FPM uses SAE to extract the temporal correlation of traffic flow, as shown in Figure 5. Stacking multiple hidden layers improves the function fitting ability of the neural network for complicated problems. The weights of the first three hidden layers are trained by three different autoencoders (AEs) using the backpropagation (BP) algorithm. With these autoencoders, the input is reproduced and SAE learns multiple expressions of the original data layer by layer. The input of the SAE is a sequence of flow data in continuous segments, which is denoted as

Ignoring the calculation process of the model and focusing on the input and output, the output target of the model can be expressed as

To train SAE, an autoencoder first encodes the input $X_{flow}$ to a sparse representation, as shown in the following formula:

where $X_{hidden1}$ is the learned sparse representation of the original input data and $W_{hidden1}$ is the weight matrix between the input layer and the first hidden layer. Then, with the other two autoencoders, the input is reconstructed again, as shown in formulas (12) and (13).
where $W_{hidden1}$, $W_{hidden2}$, and $W_{hidden3}$ are transferred to SAE as the weights of the first three layers. After the weight transfer, the weights of the last layer of the SAE are trained separately with the BP algorithm. In this way, after training, SAE can capture the temporal correlation of traffic flow data from the preceding traffic flow values that are close in time to the predicted value.
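The layer-by-layer pretraining described above can be sketched in plain NumPy. This is a simplified illustration, not the Keras model used in the paper: it uses a sigmoid encoder with a linear decoder, omits any sparsity penalty and the final supervised layer, and the layer sizes and names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=200, lr=0.5, seed=0):
    """Train one autoencoder by full-batch gradient descent on the
    reconstruction MSE. Returns encoder weights, encoder bias, loss history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encode
        X_hat = H @ W2 + b2               # decode (reconstruct the input)
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        G = 2.0 * err / err.size          # gradient of the MSE w.r.t. X_hat
        dZ = (G @ W2.T) * H * (1.0 - H)   # backpropagate through the sigmoid
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return W1, b1, losses

def pretrain_stack(X, layer_sizes, **kwargs):
    """Train one autoencoder per hidden layer; each is fed the codes of the
    previous one. The returned encoder weights are the ones to transfer."""
    weights, inputs = [], X
    for size in layer_sizes:
        W, b, _ = train_autoencoder(inputs, size, **kwargs)
        weights.append((W, b))
        inputs = sigmoid(inputs @ W + b)
    return weights, inputs

# Toy data standing in for normalized windows of 12 flow values.
rng = np.random.default_rng(1)
X_flow = rng.random((64, 12))
weights, features = pretrain_stack(X_flow, [16, 8, 4], epochs=50)
```

After pretraining, `weights` would initialize the first three layers of the full network, and an output layer would be trained on `features` to produce the flow prediction.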

WPM Using RBF.
As shown in Figure 6, RBF is a three-layer neural network consisting of an input layer, a hidden layer, and an output layer. Unlike a general ANN, the transformation of RBF from the input space to the hidden layer space is nonlinear, while that from the hidden layer space to the output layer space is linear. The training of the RBF can be summarized in two stages: (a) determine the centers of the basis functions of the hidden layer with unsupervised learning such as a clustering algorithm and (b) train the weights between the hidden layer and the output layer with supervised learning. The purpose of WPM is to generate the flow prediction from three processed parameters, $x_{embedding}$, $x_{timecode}$, and $x_{pca}$, so the function of RBF can be described as follows:

$$y_{weather\&time} = \mathrm{RBF}(x_{embedding}, x_{timecode}, x_{pca})$$

In part (a), the K-means clustering algorithm is applied to find $m$ cluster centers of the input data as the radial basis of the Gaussian kernel function. This procedure is summarized in Algorithm 1. The selected set of cluster centers from Algorithm 1 is also called the set of radial basis, which can be defined as

$$C = [c_1, c_2, \ldots, c_{m-1}, c_m]$$

For an input $x = [x_{embedding}, x_{timecode}, x_{pca}]$, the output of the $j$-th hidden unit is given by the Gaussian kernel function:

$$\varphi_j(x) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{2\sigma^2}\right) \quad (15)$$

where $\sigma$ is a hyperparameter. With formula (15), data that are linearly inseparable in the low-dimensional space become linearly separable in the high-dimensional space, which is the core idea of the kernel function. Thus, the nonlinear mapping from the input layer to the hidden layer is completed. Then, with formula (16), the output target of RBF is determined:

$$y_{weather\&time} = \sum_{j=1}^{m} w_j \varphi_j(x) \quad (16)$$

where $w_j$ represents the trainable weights from the hidden layer to the output layer. In part (b), different parameter updating methods can be used to determine the value of $w_j$, including the BP algorithm and the pseudoinverse matrix method (PIM). PIM plays the same role as BP in network training, but its calculation is simpler. Since the output of the network is linear in the adjustable parameters, using PIM to solve the linear equations directly for the weights is more efficient and accurate than using the BP algorithm.
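The two-stage structure of the RBF network, a fixed Gaussian hidden layer followed by linear output weights, can be illustrated with the sketch below. It assumes the common Gaussian form with $2\sigma^2$ in the denominator, takes the centers as given rather than computing them by K-means, and solves for the weights with NumPy's `np.linalg.pinv`; the function names and the toy data are hypothetical.

```python
import numpy as np

def gaussian_features(X, centers, sigma):
    """Hidden-layer outputs: one Gaussian activation per (sample, center)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbf(X, y, centers, sigma):
    """Solve the linear output layer directly with the pseudoinverse."""
    Phi = gaussian_features(X, centers, sigma)
    return np.linalg.pinv(Phi) @ y

def predict_rbf(X, centers, sigma, w):
    return gaussian_features(X, centers, sigma) @ w

# Toy one-dimensional regression problem (not traffic data).
X_demo = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_demo = np.sin(2.0 * np.pi * X_demo[:, 0])
centers_demo = np.linspace(0.0, 1.0, 10).reshape(-1, 1)  # stand-in for K-means centers
w_demo = fit_rbf(X_demo, y_demo, centers_demo, sigma=0.15)
pred_demo = predict_rbf(X_demo, centers_demo, 0.15, w_demo)
```

Because the hidden layer is fixed once the centers and $\sigma$ are chosen, fitting the output weights is an ordinary linear least squares problem, which is why no iterative training is needed for this stage.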
The pseudoinverse matrix is a generalized form of the inverse matrix, defined for singular or nonsquare matrices that have no ordinary inverse. Let $W$ represent the set of $w_j$, $Y$ the output of the network, and $X$ the output matrix of the hidden layer processed by the Gaussian kernel activation function. In the training stage, we assume that the input and the output of RBF satisfy the following formula:

$$Y = \operatorname{dot}(X, W)$$

where $W$ can be calculated as

$$W = \operatorname{dot}(\operatorname{pinv}(X), Y)$$

where dot is the dot product (matrix multiplication) and pinv(X) is the pseudoinverse matrix of X. Algorithm 2 shows the calculation process of pinv(X).

As shown in Figure 7, DDFM is realized by another RBF with the same configuration as that in WPM, differing only in its input dimension. Unlike WPM, which uses RBF for data fusion at the feature level, DDFM is designed for the decision-level data fusion of the outputs of the former modules. RBF is a neural network with a simple structure, so there is no need to consider the hierarchical structure when modeling, and it can meet the requirements of data fusion at both the feature level and the decision level. The output of DDFM is also the final prediction of the combined model, defined as

$$y = \mathrm{RBF}(y_{flow}, y_{weather\&time}) \quad (20)$$

Configuration and Baselines.
The proposed model and the baseline models are implemented with Python-Keras, the data are normalized by MinMaxScaler, the number of iterations is 600, the batch size is 256, the optimizer is RMSprop, and the loss function is MSE. The relevant parameters are fine-tuned according to the actual performance of each model under the above settings. The used baselines include the following:

Input: the training set of $[x_{embedding}, x_{timecode}, x_{pca}]$ and the number of cluster centers $m$
Procedure:
Step 1: select $m$ points randomly as clustering centers.
Step 2: calculate the distance between every point and the $m$ clustering centers, and assign each point to its nearest clustering center, thus forming $m$ clusters.
Step 3: recalculate the centroid (mean value) of each cluster, forming $m$ new clustering centers.
Step 4: repeat steps 2 and 3 until the cluster centers no longer change or the set number of iterations is reached.
Output: the $m$ cluster centers in the final iteration as $C = [c_1, c_2, \ldots, c_{m-1}, c_m]$.
ALGORITHM 1: K-means clustering.
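The K-means procedure of Algorithm 1 can be written compactly in NumPy. The sketch below follows the four steps; keeping the previous center when a cluster becomes empty is a choice made here, not something the algorithm specifies, and `kmeans` is a hypothetical name.

```python
import numpy as np

def kmeans(X, m, n_iter=100, seed=0):
    """Return m cluster centers and the cluster label of each row of X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)

    def nearest(centers):
        # Squared distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    # Step 1: pick m distinct data points as the initial centers.
    centers = X[rng.choice(len(X), size=m, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest center.
        labels = nearest(centers)
        # Step 3: recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(m)
        ])
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, nearest(centers)
```

The returned centers would then serve as the radial basis set $C$ of the RBF hidden layer.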

Performance Comparison in WPM.
The purpose of WPM is to capture the periodic evolution and weather disturbance. In the data preprocessing stage, efforts were made to meet the requirements of feature extraction from weather data, including PCC for feature selection and PCA for feature-level data fusion of the selected weather parameters. In addition, the scheme of WPM was selected from numerous experimental schemes through controlled trials. The comparison is shown in Table 4. The specific contents of each scheme are as follows: (a) RBF + PIM: use RBF without any processing of the weather parameters, and use PIM to train the network.

From Table 4, our selected scheme (RBF + PCC + PCA + PIM) achieves the best performance in all metrics. Control groups a-d confirm that the model with feature selection by PCC and feature-level data fusion by PCA yields more accurate predictions. That is, our proposed data processing of weather parameters (PCC + PCA) contributes significantly to precise prediction through accurate extraction of periodicity and weather disturbance. Comparing experiment d with experiment e, we can conclude that using the pseudoinverse matrix to solve the linear equations for the weights in RBF achieves higher accuracy in a simpler way, and the BP algorithm is probably not suitable for training RBF.

Performance Comparison between Proposed Model and Baselines in Single Step.
The performance comparison between our proposed model and the baselines is shown in Figure 8 and Table 5. The proposed model has the best performance in all the selected error indexes, with a MAPE of 10.378414%, an SMAPE of 10.059485%, an MAE of 9.494741, an MSE of 151.153208, and an RMSE of 12.294438. In Table 5, models g, h, and i perform better in prediction because they all consider periodicity and weather disturbance in modeling. Experiment groups h and i confirm the effectiveness of decision-level data fusion. That is, with the combination modeling idea and the selected data fusion scheme, the proposed model can effectively learn the temporal correlation in traffic flow, the time periodicity from historical data, and the disturbance of weather conditions so as to improve accuracy and robustness. In Figure 8, we divide these methods into two categories: prediction with traffic data only (a-f) and prediction with multivariate data (g-i). The groups g-i, which consider the weather conditions, show higher accuracy than the groups a-f, which do not.

Input: matrix X
Procedure:
Step 1: perform the singular value decomposition of X: $X = UDV^T$, where U and V are orthogonal matrices.
Step 2: construct the diagonal matrix S by transposing D and replacing each nonzero diagonal element with its reciprocal.
Step 3: calculate pinv(X): $\operatorname{pinv}(X) = VSU^T$
Output: pinv(X)
ALGORITHM 2: Calculation of pinv(X).
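Algorithm 2 can be sketched in NumPy as follows. The tolerance used to decide which singular values count as zero is an assumption introduced here, and `pinv_svd` is a hypothetical name; in practice `np.linalg.pinv` provides the same computation.

```python
import numpy as np

def pinv_svd(X, tol=1e-12):
    """Moore-Penrose pseudoinverse of X via singular value decomposition."""
    X = np.asarray(X, dtype=float)
    # Step 1: X = U diag(d) V^T (reduced form, so diag(d) is square).
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Step 2: reciprocal of each singular value above the cutoff, zero otherwise.
    cutoff = tol * d.max()
    s = np.where(d > cutoff, 1.0 / np.where(d > cutoff, d, 1.0), 0.0)
    # Step 3: pinv(X) = V S U^T.
    return Vt.T @ np.diag(s) @ U.T
```

For a hidden-layer output matrix `X` and targets `Y`, the output weights would then be `pinv_svd(X) @ Y`.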

Performance Comparison between Proposed Model and Baselines in Multiple Steps.
In Figure 9, as the time step size increases, the prediction error of all models trends upward because of the error accumulated from previous steps, but the proposed model remains superior to the baselines in most indicators. Compared with the single-step experiments, the proposed model shows a more obvious advantage in prediction accuracy over multiple steps. As the prediction time span increases, the accuracy gap between models widens further, making the advantage of the proposed model more obvious. Table 6 lists the indicators of the models at different steps. Models f and g achieve more precise predictions than the other models because they combine time periodicity with weather disturbance in modeling. Comparing f with g, g is slightly better than f in most indicators and at most step sizes. In Step 05 and Step 06, f shows slightly better performance in MSE and RMSE than g because time periodicity in modeling is more important for accuracy in longer-term prediction. That is, the proposed model is more stable and precise in both single-step and multistep prediction.
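The error indexes used in these comparisons (MAPE, SMAPE, MAE, MSE, and RMSE) can be computed as in the sketch below. The text does not give their formulas, so the usual definitions are assumed; SMAPE in particular has several variants, and the one with the sum of absolute values in the denominator is used here. The function name `error_metrics` is hypothetical.

```python
import numpy as np

def error_metrics(y_true, y_pred):
    """Common forecast error measures; MAPE and SMAPE are in percent.

    Assumes y_true contains no zeros (MAPE is undefined otherwise).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    smape_terms = 2.0 * np.abs(err) / (np.abs(y_true) + np.abs(y_pred))
    return {
        "MAPE": float(np.mean(np.abs(err / y_true)) * 100.0),
        "SMAPE": float(np.mean(smape_terms) * 100.0),
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
    }
```

RMSE is the square root of MSE, so the two always rank models in the same order; the percentage measures, in contrast, weight errors relative to the flow level.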

Conclusion
A combined framework of SAE and RBF is proposed for short-term traffic flow prediction based on traffic data and weather data. Before modeling for traffic prediction, considerable data processing work is involved in our experiment. For a precise numerical representation of time periodicity, HA is used to create a time expression based on historical flow. For weather data, one-hot coding and an embedding component are used for the numerical expression of weather type, while PCC and PCA are applied to the feature-level data fusion of weather parameters. After data processing, we incorporate SAE and RBF to capture the features of traffic flow and weather conditions. The final prediction considers the temporal correlation, time periodicity, and weather disturbance, and thus achieves higher accuracy and robustness. Numerous experiments have been done to test the performance of the proposed model from different aspects.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no known conflicting financial interests or personal relationships that could have appeared to influence the work reported in this paper.