Short-Term Traffic Flow Prediction: A Method of Combined Deep Learnings

Short-term traffic flow prediction can provide a basis for traffic management and support for travelers to make decisions. Accurate short-term traffic flow prediction also provides necessary conditions for the sustainable development of the traffic environment. Although the application of deep learning methods for traffic flow prediction has achieved good accuracy, the problem of combining multiple deep learning methods to improve the prediction accuracy of a single method still leaves room for in-depth research. In this article, a combined deep learning prediction (CDLP) model including two parallel single deep learning models, a CNN-LSTM-attention model and a CNN-GRU-attention model, is established. In the model, a one-dimensional convolutional neural network (1DCNN) is used to extract local trend features of traffic flow, and RNN variants (LSTM and GRU) with an attention mechanism are used to extract long-term temporal dependency features. Moreover, a dynamic optimal weighted coefficient algorithm (DOWCA) is proposed to calculate the dynamic weights of CNN-LSTM-attention and CNN-GRU-attention with the goal of minimizing the sum of squared errors of the CDLP model. Then, the neuron number, loss function, optimization algorithm, and other parameters of the CDLP model are discussed and set through experiments. Finally, the training set and test set for the CDLP model are established by processing traffic flow data collected from the field. The CDLP model is trained and tested, and the prediction results of traffic flow are obtained and analyzed. The results indicate that the CDLP model fits the change trend of traffic flow very well. Furthermore, on the same dataset, the results from the CDLP model are compared with baseline models, and the CDLP model is found to have higher prediction accuracy than the baseline models.


Introduction
With economic development, the number of motor vehicles in urban areas has increased rapidly, and traffic congestion and traffic accidents have become increasingly serious. In order to mitigate urban traffic problems, intelligent transportation systems have been widely implemented [1][2][3][4]. Short-term traffic flow prediction is one of the core parts of an intelligent transportation system: it provides the basis for traffic management, traffic control, and traffic guidance and also supports travelers' decision-making. Prediction of short-term traffic flow has long been an active research topic in the field of traffic engineering.
For short-term traffic flow forecasting, early research mainly focused on statistical learning methods based on traditional mathematical models. Under the assumption of a certain probability distribution, the parameters of the statistical forecasting model are estimated through theoretical inference, and the model's forecasting results have strong explanatory power. The traditional methods mainly include Kalman filter models, time series models, and nonparametric regression models.
Okutani and Stephanedes [5] proposed two prediction models based on the Kalman filter theory to predict the traffic flow of streets in Nagoya. In the models, the newest prediction error and the traffic data of multiple adjacent road sections are considered to improve the prediction accuracy. Xie et al. [6] used the discrete wavelet decomposition method to denoise the traffic flow data and then established a Kalman filter model to predict the traffic flow, which reduced the interference of local noise on the original data and obtained better prediction results. Guo et al. [7] proposed an adaptive Kalman filter model, which uses an adaptive variance update method to improve the parameters of the model, and verified on a large amount of highway traffic data that its prediction accuracy is better than that of the traditional Kalman filter model. Emami et al. [8] proposed a fade memory Kalman filter model based on real-time data from the Internet of vehicles and Bluetooth detectors. This model considers the influence of weights and reduces the errors caused by the measurement method. Experiments show that the model can improve the accuracy of the forecast data. The autoregressive integrated moving average (ARIMA) model is widely used in traffic flow prediction. Ahmed and Cook [9] investigated the ARIMA model in representing freeway time series data and found that ARIMA was more accurate than moving average and double-exponential smoothing models. Hamed et al. [10] applied the ARIMA model to forecast traffic volume in urban arterials, and it turned out to be the most adequate model in reproducing all original time series while remaining computationally tractable. In addition to the ARIMA model, the autoregressive integrated moving average model with explanatory variables, the seasonal autoregressive moving average model, and other ARIMA variants have also been applied in the field of traffic flow forecasting [11,12].
The K-nearest neighbors (KNN) method does not require complex prior knowledge and precise function expressions. It has the advantages of a simple algorithm and good portability and has been applied in the field of traffic flow prediction. Zhang et al. [13] used the mean KNN and weighted KNN to establish traffic flow prediction models and made a comparative analysis. Cheng et al. [14] proposed an adaptive spatiotemporal KNN model, which comprehensively considers spatiotemporal weights, time windows, and other parameters, and simulation results demonstrated that the prediction of traffic flow was further improved. The core of the KNN method is to design an appropriate search mechanism, and its prediction results rely on historical data. When the historical dataset is large, the search efficiency of this method has a greater impact on the real-time performance of the prediction model. The basic idea of the support vector machine (SVM) method for traffic flow prediction is to map the original traffic flow data to a high-dimensional feature space through the kernel function and to find a linear separating hyperplane in the mapped space to solve nonlinear problems in traffic flow data. Yang et al. [15] proposed a short-term traffic flow prediction model based on spatiotemporal correlation and adaptive multicore SVM for the nonlinearity and randomness of traffic flow. Luo et al. [16] used the least squares SVM method to predict the traffic flow, in which a hybrid optimization algorithm is proposed to select the optimal parameters, and the experimental results show that the model can improve the prediction ability and computational efficiency. Tang et al. [17] proposed a traffic flow prediction model that combines denoising schemes and SVM algorithms to improve the prediction accuracy. Results show that the model outperforms the one without a denoising strategy.
In addition to the traditional SVM model, variant SVM algorithms, such as seasonal SVM [18], which considers traffic data seasonality, and Online-SVR [19], which deals with special events, have also been applied in traffic flow prediction and good results are obtained. The development and wide application of traffic information collection technology, such as inductive detectors, geomagnetic detectors, radio frequency identification technology, radar detection, video detection, and floating car detection [20][21][22][23][24], provide a large amount of data for traffic flow prediction. At the same time, with the rapid development of artificial intelligence technology, deep learning, which has powerful data feature mining and nonlinear data fitting capabilities, has been successfully applied in many fields, such as image processing and speech recognition [25][26][27], and is gradually being used in traffic parameter forecasting [28][29][30][31].
Moreover, the focus of traffic flow forecasting research has also shifted from traditional statistical learning forecasting methods and shallow neural networks [32][33][34] to deep learning forecasting methods. The shallow neural networks, which only have a single hidden layer, cannot learn the deeper features of traffic flow data and their prediction accuracy is often lower than that of the deep learning network. The deep learning methods have been gradually applied to the field of traffic flow prediction.
Deep belief network (DBN) is an earlier deep learning method used for traffic flow prediction. Huang et al. [35] designed a combined prediction model with an unsupervised learning DBN at the bottom layer and a multitask learning layer at the top layer for supervised prediction. The multitask learning layer can make full use of the weight sharing in DBN and improves the prediction results. Koesdwiady et al. [36] incorporated weather conditions and traffic flow data into the feature space at the same time and designed a DBN network for unsupervised pretraining, and relevant data from San Francisco are used to conduct experiments to verify the effectiveness of the proposed method. Xu and Jiang [37] proposed a DBN-support vector regression model for short-term traffic flow, in which DBN is used to learn the internal characteristics of traffic flow and support vector regression is used to predict the traffic flow. Experiments show that the model can effectively predict traffic flow and has fine prediction accuracy. Han and Huang [38] proposed a traffic flow prediction model combining DBN and a kernel extreme learning classifier, in which the internal characteristics of traffic flow data are extracted by DBN and the kernel extreme learner is used to predict traffic flow. Experiments show that the model can improve the accuracy of traffic flow prediction and reduce simulation time.
Convolutional neural network (CNN) is also a typical structure of deep learning. It is a feedforward neural network used to process data with a grid-like structure. It can accurately extract data features while reducing the complexity of the model. This efficient local feature extraction capability helps to find the spatial correlation between traffic flow data, so CNN is widely used in traffic flow prediction [39]. Zhang et al. [40] proposed a short-term traffic flow prediction model based on CNN, in which a spatiotemporal feature selection algorithm determines the optimal input data time lags and amounts of traffic flow data; then, CNN learns these spatiotemporal features.
The effectiveness of the model was verified by comparing the prediction results with actual traffic data. An et al. [41] proposed a fuzzy-based CNN traffic flow prediction model, in which the fuzzy approach is used to represent the features of traffic accidents. The experimental results show that the model has superior performance. Liu et al. [42] proposed a CNN-attention model to predict traffic speed. Experimental results show that the model has a great advantage in traffic flow prediction and that the impact of different temporal and spatial traffic flow data on traffic flow can be found by visualizing the weights generated by the attention model. Peng et al. [43] proposed a spatial-temporal incidence dynamic graph recurrent CNN to predict urban traffic passenger flow, and experiments show that the predictive performance of this network is superior to traditional predictive methods.
LSTM network is a deep learning structure and also a variant of the recurrent neural network (RNN). RNN can be applied to forecasting problems involving time series data [44]. However, RNN suffers from the vanishing gradient problem, which can be overcome by LSTM [45]. LSTM has been applied in the field of traffic flow prediction. Ma et al. [46] applied the LSTM to establish a traffic speed prediction model. The results show that the LSTM network effectively captures the time correlation and nonlinearity of the traffic state, and the prediction accuracy is better than most statistical methods. Zhao et al. [47] proposed a traffic forecast model based on LSTM considering temporal-spatial correlation in traffic systems. The results validate that the model can obtain better prediction performance compared with other representative forecast models. Tian et al. [48] proposed a multiscale smoothing method to fill in the missing values in traffic flow data and established an LSTM model to predict traffic flow. Experiments show that the LSTM model has better prediction performance than other prediction methods. Zhao et al. [49] established the LSTM model to predict traffic flow speed and validated that the prediction accuracy is higher than that of the support vector regression prediction method. Wang et al. [50] constructed an LSTM encoding and decoding model based on the attention mechanism for time series prediction, which includes periodic mode and recent time mode. Experiments show that the model is effective and reliable in long-term prediction of time series.
In addition, combination algorithms for traffic flow prediction, especially deep learning algorithms, have received more attention from scholars and produced a series of achievements. Zhou et al. [51] combined LSTM and SVR to build a model for short-term traffic flow prediction, in which a genetic algorithm is used to optimize the parameters of SVR.
The results indicate that the prediction model has higher accuracy than LSTM and CNN. Zhang et al. [52] proposed a model for short-term traffic forecasting, which integrates a graph convolution operator and a residual LSTM structure. The model is evaluated on a traffic speed dataset and obtains better prediction results than six baselines. Li et al. [53] developed a deep learning-based method, including CNN and LSTM, for real-time movement-based traffic volume prediction at signalized intersections. In the model, CNN is applied to learn the spatial features of traffic volume and LSTM to learn the temporal dependencies. Xia et al. [54] proposed a distributed LSTM weighted model combined with a time window and normal distribution to enhance the prediction capability for traffic flow. The experimental results indicate that the model achieved an accuracy improvement.
In summary, deep learning methods have been widely applied to short-term traffic flow prediction and have achieved a series of results. Moreover, the literature reviewed above shows that the combination of multiple deep learning methods, such as a combination of CNN and LSTM, can improve the performance of the prediction model. LSTM is a variant of RNN, which can capture the time series characteristics of traffic flow. Meanwhile, there is another variant of RNN, namely GRU, which can also capture the time series characteristics of traffic flow and make traffic flow predictions [55,56]. The combined model of LSTM and GRU for predicting traffic flow parameters has been discussed and applied in [57,58], and its outstanding performance in both prediction accuracy and stability has been demonstrated. In these two works, LSTM and GRU form serial structures: either LSTM is first used to learn the spatial-temporal characteristics of the data and then GRU is used to predict traffic parameters, or LSTM is first used to predict a value and then an encoder with GRUs further captures the relationship between the input sequence and the output sequence. However, the sequential combination structure of LSTM and GRU does not simultaneously use the advantages of the two to complement each other, and it also lacks CNN's guidance on the local trend of traffic flow. It is therefore necessary to study traffic flow prediction with a combination of the three deep learning methods. In addition, the attention mechanism [59] improves the feature extraction capabilities of deep learning by imitating human vision to assign weights to data features and has been widely used in image processing and speech recognition [60][61][62][63]. Applying it to CNN, LSTM, and GRU for traffic flow prediction is also worthy of discussion.
In this article, a DOWCA is presented, and a combined prediction model with CNN, LSTM, GRU, and attention mechanism for short-term traffic flow is proposed and discussed.
The main contributions of this study are as follows:
(1) In order to build a combined traffic flow prediction model, a dynamic optimal weighted coefficient algorithm (DOWCA) is proposed, in which the weights of each single prediction method are calculated dynamically as new prediction results are added.
(2) A combined deep learning model for short-term traffic flow prediction, namely CDLP, is established based on the CNN, LSTM, GRU, and attention mechanism, which includes a CNN-LSTM-attention model and a CNN-GRU-attention model arranged in parallel. In CDLP, the dynamic weights for the two single models are calculated by DOWCA.
(3) After parameter setting through experiment comparison and analysis, the CDLP model is trained and tested using traffic flow data from the field. The results indicate that the CDLP model outperforms baseline models.
The rest of the article is organized as follows. In Section 2, the methodologies of CNN, LSTM, GRU, and attention mechanism are introduced. In Section 3, a DOWCA is proposed and the CDLP model is constructed. In Section 4, the experiment results and analysis are presented. Finally, a brief conclusion and recommendations for future work are presented in Section 5.
Journal of Advanced Transportation

CNN.
CNN is a feedforward neural network with a deep structure, mainly composed of convolutional layers, pooling layers, and fully connected layers [64]. Among them, the convolutional layer is the most important part of CNN, which uses the convolution kernel to carry out a convolutional calculation on data from the input layer and outputs the convolutional features of the data. If the CNN model contains multiple convolutional layers, then the number of feature parameters output by the convolutional layers is large. In order to reduce the number of parameters, the pooling layer is often used to carry out subsampling operations on the convolutional features of the data to extract part of the information and prevent the model from overfitting. The fully connected layer is usually used at the end of the CNN model to reduce unnecessary feature loss, in which all features are integrated and calculated as the final output.
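The convolution and pooling operations described above can be illustrated with a minimal NumPy sketch for a one-dimensional sequence. The flow values and the slope-detecting kernel are hypothetical and chosen only for illustration; in a trained CNN the kernel weights are learned, and this sketch is not the configuration used in the article.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Slide `kernel` over the 1-D sequence `x` (no padding, stride 1).

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped.
    """
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n = (len(x) // size) * size
    return x[:n].reshape(-1, size).max(axis=1)

# Hypothetical traffic counts for eight consecutive intervals.
flow = np.array([10.0, 12.0, 15.0, 14.0, 18.0, 25.0, 24.0, 20.0])
# Hypothetical kernel that responds to a rising local trend.
kernel = np.array([-1.0, 0.0, 1.0])

features = np.maximum(conv1d_valid(flow, kernel), 0.0)  # ReLU activation
pooled = max_pool1d(features, size=2)

print(features)  # [ 5.  2.  3. 11.  6.  0.]
print(pooled)    # [ 5. 11.  6.]
```

The pooled output keeps the strongest local trend response in each pair of positions, which is the subsampling effect the pooling layer provides.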

LSTM Network.
LSTM is a variant structure of RNN, which can alleviate the problems of gradient disappearance and gradient explosion in RNN and can better realize the prediction of time series. The LSTM network is composed of a series of basic cells. The basic cell structure is shown in Figure 1, which includes three gate structures: input gate, output gate, and forget gate.

Figure 1: The basic unit structure of LSTM.

The orange lines in Figure 1 represent the input gate. The main function of the input gate is to control the input process of all information at time t. The information input process mainly includes two parts. One part is the process of updating the current time information through the tanh function to obtain a new state vector, and the other part is superimposing the current input and the output information of the hidden layer at the previous time through the sigmoid function. The specific implementation process can be expressed as follows:

$$i_t = \sigma(W_i X_t + U_i h_{t-1} + b_i), \qquad \tilde{C}_t = \tanh(W_c X_t + U_c h_{t-1} + b_c), \tag{1}$$

where $i_t$ is the output of the input gate and $\tilde{C}_t$ is the new candidate state vector; $W_i$, $W_c$, $U_i$, and $U_c$ are the weights of the input gate; $b_i$ and $b_c$ are the biases of the input gate; and $\sigma$ and tanh are activation functions, whose formulas are as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{2}$$

The red lines in Figure 1 represent the forget gate, whose main function is to determine the redundant information to be discarded in the unit. The input of the forget gate includes the input $X_t$ and the output $h_{t-1}$ of the unit at the previous time. The output process is shown in formula (3):

$$f_t = \sigma(W_f X_t + U_f h_{t-1} + b_f), \tag{3}$$

where $W_f$ and $U_f$ are the weights of the forget gate and $b_f$ represents the bias of the forget gate. The forget gate uses the sigmoid function to superimpose the input values $X_t$ and $h_{t-1}$, and the output value is limited to the range [0, 1]; finally, the output value is multiplied by the cell state $C_{t-1}$ at the previous moment. When the output value is 0, the information is completely discarded. When the output value is 1, the information is completely retained. The outputs of the forget gate and the input gate are multiplied by the previous cell state and the candidate state, respectively, and then added together to obtain the current cell state. The specific calculation process is as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \tag{4}$$

where $\odot$ denotes element-wise multiplication. It can be seen from this formula that $C_t$ represents the long-term memory of all historical information at the current moment. The purple lines in Figure 1 represent the output gate. The output gate determines the output result of the entire basic cell, which is related to the cell state $C_t$ at the current moment. First, the sigmoid function is used to process part of the information of the input unit to obtain the output $O_t$ of the output gate, and then the tanh function is used to process the information in $C_t$. After the two sets of processed information are multiplied, the final output $h_t$ is obtained. The specific calculation formula is as follows:

$$O_t = \sigma(W_o X_t + U_o h_{t-1} + b_o), \qquad h_t = O_t \odot \tanh(C_t), \tag{5}$$

where $W_o$ and $U_o$ are the weights of the output gate and $b_o$ is its bias.

GRU Network.
Similar to LSTM, GRU is also a variant structure of the RNN algorithm, and it also deals with the problems of gradient disappearance and ineffective long-term sequence memory in RNN. Compared with LSTM, GRU reduces the complexity of the structure by reducing the number of gates in the architecture. The recurrent structure of GRU consists of two gate structures, an update gate (purple lines) and a reset gate (red lines), and its cell structure is shown in Figure 2.
The update gate $z_t$ determines how much of the memory information from the previous time and how much of the information at the current time is retained and continues to transfer the retained information to future times so as to obtain the long-term dependence in the entire network transmission process. The reset gate $r_t$ is mainly used to obtain short-term time dependence: it controls how the hidden state information $h_{t-1}$ at the previous moment is combined with the current input value $x_t$ and decides how much past information is forgotten.
Formulas (6)-(9) represent the calculation process of each state within each time step in GRU cell.
where $W_z$, $W_h$, and $W_g$ are input-related weight matrices; $U_z$, $U_h$, and $U_g$ are recurrent-connection weight matrices; and $b_z$, $b_r$, and $b_g$ are the related biases.
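The gate computations described in the LSTM and GRU sections can be sketched as single time-step functions in NumPy. This is a sketch under stated assumptions rather than the article's implementation: the parameter dictionary layout, the helper `random_params`, the reset-gate names `W_r`, `U_r`, and the GRU state update $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot g_t$ follow one common textbook convention and are assumptions, since the original formulas are not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; `p` maps parameter names to arrays (assumed layout)."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])    # input gate
    c_hat = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate state
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])    # forget gate
    c_t = f_t * c_prev + i_t * c_hat                                # new cell state
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                        # new hidden state
    return h_t, c_t

def gru_step(x_t, h_prev, p):
    """One GRU time step using a common convention (an assumption here)."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])    # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])    # reset gate
    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ (r_t * h_prev) + p["b_g"])
    return (1.0 - z_t) * h_prev + z_t * g_t

def random_params(gate_names, n_in, n_hidden, rng):
    """Hypothetical helper: small random weights and zero biases per gate."""
    p = {}
    for g in gate_names:
        p[f"W_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p[f"U_{g}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
lstm_p = random_params("icfo", n_in, n_hidden, rng)  # gates i, c, f, o
gru_p = random_params("zrg", n_in, n_hidden, rng)    # gates z, r, g

h_lstm, c = np.zeros(n_hidden), np.zeros(n_hidden)
h_gru = np.zeros(n_hidden)
for value in [0.2, 0.5, 0.4]:  # hypothetical normalized flow values
    x_t = np.array([value])
    h_lstm, c = lstm_step(x_t, h_lstm, c, lstm_p)
    h_gru = gru_step(x_t, h_gru, gru_p)
```

The GRU step carries a single state vector and uses three weight groups instead of four, which is the structural simplification relative to LSTM mentioned above.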

Attention Mechanism.
The attention mechanism focuses on important information by assigning different weights to input features. The process of focusing on important information is realized through the calculation of weights: the more important the information is, the larger the weight allocated to it. When the attention mechanism is applied in a deep learning model, the context vector and the weights are calculated as follows.
The output hidden states of the deep learning model are denoted as $h_1, h_2, \ldots, h_i, \ldots, h_t$, and the context vector $C_t$ can be calculated as follows:

$$C_t = \sum_{i=1}^{t} \alpha_{t,i} h_i. \tag{10}$$

In formula (10), $\alpha_{t,i}$ is the weight for $h_i$, and the sum of the weights is 1. It can be calculated as follows:

$$\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k=1}^{t} \exp(e_{t,k})}, \tag{11}$$

where $e_{t,i}$ is an alignment model given by formula (12), in which $W_a$, $U_a$, and $b_a$ are the network parameters of the deep learning model, and $s_{t-1}$ is given by formula (13), in which $g(\cdot)$ denotes the deep learning network. Based on formula (13), the output of the attention mechanism is given by formula (14), in which softmax is the activation function.
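As a rough illustration of how attention weights and a context vector are obtained from a sequence of hidden states, the following NumPy sketch uses an additive alignment score. The scoring form, including the projection vector `v_a` that turns the tanh output into a scalar, is an assumption made for this sketch and is not taken from the article; only the parameter names `W_a`, `U_a`, and `b_a` come from the text.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - v.max())
    return e / e.sum()

def attention_context(H, s_prev, W_a, U_a, b_a, v_a):
    """Return attention weights (t,) and the context vector (d,).

    H      : (t, d) hidden states h_1 .. h_t
    s_prev : (d,) previous state s_{t-1}
    v_a    : (k,) hypothetical projection vector (assumption)
    """
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_i + b_a)
                       for h_i in H])
    alpha = softmax(scores)   # weights are positive and sum to 1
    context = alpha @ H       # weighted sum of the hidden states
    return alpha, context

rng = np.random.default_rng(0)
t, d, k = 5, 3, 4             # hypothetical sizes
H = rng.normal(size=(t, d))
s_prev = rng.normal(size=d)
W_a = rng.normal(size=(k, d))
U_a = rng.normal(size=(k, d))
b_a = np.zeros(k)
v_a = rng.normal(size=k)

alpha, context = attention_context(H, s_prev, W_a, U_a, b_a, v_a)
```

Because the weights form a convex combination, each component of the context vector lies between the smallest and largest value of that component across the hidden states.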

Dynamic Optimal Weighted Coefficient Algorithm.
Compared with a single prediction model, the combined prediction model can comprehensively utilize the advantages of multiple prediction models, improve the accuracy of prediction results, and achieve better robustness. In the combined prediction model, the calculation of the weighted coefficient of each single prediction model is the key. Generally, the optimal weighted coefficient algorithm (OWCA) is used, in which the weighted coefficient of each single prediction method is calculated with the goal of minimizing the sum of squared errors of the combined prediction [65][66][67][68]. The calculation principle is as follows. Suppose there are m prediction methods; the prediction value of the ith method at time t is $\hat{y}_{it}$, where i = 1, 2, ..., m; t = 1, 2, ..., N.
Then, the prediction error $e_{it}$ of the ith prediction method can be expressed by the following:

$$e_{it} = y_t - \hat{y}_{it}, \tag{15}$$

where $y_t$ is the actual value at time t. Let $l_1, l_2, \ldots, l_m$ be the weighted coefficients of the m prediction methods, respectively, with $l_1 + l_2 + \cdots + l_m = 1$. The prediction result of the combined prediction method, labeled as $\hat{y}_t$, can be calculated as follows:

$$\hat{y}_t = l_1 \hat{y}_{1t} + l_2 \hat{y}_{2t} + \cdots + l_m \hat{y}_{mt}, \tag{16}$$

and the prediction error $e_t$ for the combined prediction method at time t can be obtained:

$$e_t = y_t - \hat{y}_t = \sum_{i=1}^{m} l_i e_{it}. \tag{17}$$

Let J represent the sum of squared errors of the combined prediction method; then the problem of solving the optimal weights can be expressed as the following optimization model:

$$\min J = \sum_{t=1}^{N} e_t^2 = \sum_{t=1}^{N} \Big( \sum_{i=1}^{m} l_i e_{it} \Big)^2, \qquad \text{s.t. } \sum_{i=1}^{m} l_i = 1. \tag{18}$$

Formula (18) can be expressed in matrix form as follows:

$$\min J = L^{T} E L, \qquad \text{s.t. } R^{T} L = 1, \tag{19}$$

where $L = (l_1, l_2, \ldots, l_m)^T$ represents the weighted coefficient column vector; $R = (1, 1, \ldots, 1)^T$ represents the m-dimensional column vector with all elements equal to 1; and E is the combined prediction information error matrix, $E = (E_{ij})_{m \times m}$, where $E_{ij}$ is expressed as follows:

$$E_{ij} = e_i^{T} e_j = \sum_{t=1}^{N} e_{it} e_{jt}, \tag{20}$$

where $e_i$ represents the prediction error column vector of the ith single prediction method, $e_i = (e_{i1}, e_{i2}, \ldots, e_{iN})^T$. If the prediction error vectors of the m prediction methods are linearly independent, then the combined prediction information error matrix E is invertible. According to the Lagrange multiplier method [69], the optimal solution of model (18) can be obtained as follows:

$$L^{*} = \frac{E^{-1} R}{R^{T} E^{-1} R}, \tag{21}$$

where $L^{*}$ is the optimal weight vector, namely, the optimal weighted coefficients of the m prediction methods. According to the OWCA and the historical prediction error of each single prediction method, the optimal weighted coefficient of each single prediction method can be obtained so as to carry out the combined prediction. In the OWCA, the weighted coefficient of each single prediction method is fixed. However, in the prediction of time series data, such as traffic flow, the number of prediction results of each single prediction method increases as time goes on.
More importantly, the prediction errors of each single prediction method also vary. If the weighted coefficient of each single prediction method is invariable, it cannot reflect the influence of the newly increased prediction results of each single prediction method on the combined forecasting, which also affects the accuracy of the combined forecasting results.
Therefore, based on the optimal weighted coefficient algorithm, a dynamic optimal weighted coefficient algorithm, namely, DOWCA, is proposed. In the DOWCA, as time goes on and the amount of historical prediction error data increases continuously, the weighted coefficient of each single prediction method, namely, the dynamic weighted coefficient, labeled as $l_{1t}, l_{2t}, \ldots, l_{mt}$, is recalculated by the OWCA. The dynamic weighted coefficients are applied to each single prediction method and the combined prediction results are obtained. The whole process of the DOWCA is shown in Figure 3, and the pseudocode of DOWCA is shown in Algorithm 1.
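The weighting scheme described above can be sketched in NumPy as follows. The function `owca_weights` computes the minimizer of the sum of squared combined errors subject to the weights summing to 1; `dowca_predict` recomputes those weights as the error history grows. Several details are assumptions of this sketch and not statements about the article's implementation: the weights for time t use only errors observed before t, equal weights are used during a hypothetical `warmup` period or when the error matrix is rank deficient, and the synthetic data are for illustration only.

```python
import numpy as np

def owca_weights(errors):
    """Optimal weights for an (m, N) array of historical errors e_it.

    Minimizes L^T E L subject to sum(L) = 1, with E_ij = e_i^T e_j.
    The weights are not constrained to be non-negative.
    """
    m = errors.shape[0]
    E = errors @ errors.T
    R = np.ones(m)
    x = np.linalg.solve(E, R)  # E^{-1} R without forming the inverse
    return x / (R @ x)

def dowca_predict(preds, actual, warmup=10):
    """Combine (m, T) single-model predictions with time-varying weights.

    Assumption: the weights at step t are computed from errors before t.
    """
    m, T = preds.shape
    combined = np.empty(T)
    weights = np.full(m, 1.0 / m)  # assumption: equal weights until enough history
    for t in range(T):
        if t >= max(warmup, m):
            errors = actual[:t] - preds[:, :t]
            if np.linalg.matrix_rank(errors) == m:  # E is invertible
                weights = owca_weights(errors)
        combined[t] = weights @ preds[:, t]
    return combined

# Small check with orthogonal error vectors, so E = diag(4, 16).
demo_errors = np.array([[1.0, -1.0, 1.0, -1.0],
                        [2.0, 2.0, -2.0, -2.0]])
demo_weights = owca_weights(demo_errors)
print(demo_weights)  # [0.8 0.2]

# Synthetic example: two noisy predictors of a hypothetical flow series.
rng = np.random.default_rng(1)
actual = 100 + 20 * np.sin(np.linspace(0, 6, 60))
preds = np.vstack([actual + rng.normal(0, 2, 60),
                   actual + rng.normal(0, 5, 60)])
combined = dowca_predict(preds, actual)
```

In the small check, the combined sum of squared errors is 0.64 × 4 + 0.04 × 16 = 3.2, which is lower than the value of 4 achieved by the better single method alone.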

Combined Deep Learning Prediction Model.
CNN has the ability to obtain local trend features of data sequences, while LSTM and GRU have the ability to obtain long-term dependent features of data sequences. At the same time, the attention mechanism can make the deep learning model pay attention to important features. Based on this, a combined deep learning prediction model with CNN, LSTM, GRU, and DOWCA is designed for traffic flow prediction, namely, the CDLP model. In the CDLP model, CNN, LSTM, and attention are connected sequentially to form a sequential combination structure, which is named the CNN-LSTM-attention model, i.e., one single traffic flow prediction model in the CDLP model. Moreover, CNN, GRU, and attention are also designed as a sequential combination structure named the CNN-GRU-attention model, i.e., the other single traffic flow prediction model in the CDLP model. Then, the two sequential combination structures are arranged in parallel and combined by DOWCA. From a layer standpoint, the CDLP model has three layers: input layer, hidden layer, and output layer. The hidden layer includes four layers: CNN layer, LSTM and GRU layer, attention layer, and dropout layer. The whole structure of the CDLP model is shown in Figure 4. The input layer of the CDLP model is the processed traffic flow data sequence, including the training set and test set, which is fed simultaneously to the two parallel CNN layers in the hidden layer of the CDLP model. The hidden layer of CDLP includes two CNN layers, LSTM and GRU layers, two attention layers, and two dropout layers in sequence, and the two branches run in parallel. For the CNN layer, due to the periodicity and sequential nature of traffic flow data, 1DCNN is used and the output of 1DCNN is computed by the activation function ReLU. The formula of ReLU is as follows:

$$\mathrm{ReLU}(x) = \max(0, x).$$

For the LSTM and GRU layers, if too many network layers are selected, the computation of the entire network will be large and more training time will be needed.
According to [70], when both the accuracy of the prediction model and the training time are considered, two LSTM network layers are suitable, so two LSTM layers are selected. Similarly, two network layers are selected in GRU. The input of the first LSTM and GRU network layer is the local trend features extracted by 1DCNN, and its output is the state of the neural unit of the current LSTM and GRU layer. The second LSTM and GRU network layer mines the characteristics of the data and outputs the hidden layer state to the attention layer.
For the attention layer, the input states $h_1, h_2, \ldots, h_i, \ldots, h_t$ come from LSTM and GRU. Correspondingly, $g(\cdot)$ in formula (13) denotes LSTM and GRU. The last layer in the hidden layer, the dropout layer, is placed after the attention layer to prevent overfitting, and its output is passed from the hidden layer of CDLP to the output layer. Moreover, the input of the dropout layer is the output $y_1, \ldots, y_{t-1}, y_t$ from the attention layer.
The CDLP model aims to predict the traffic flow at the next moment based on the historical data. Therefore, the output layer includes two parallel neural units, which are actually the outputs of two single models, CNN-LSTM-

Algorithm 1: The pseudocode of DOWCA.
Input: the predicted value $\hat{y}_{it}$ of each single model at time t (i = 1, 2, ..., m; t = 1, 2, ...) and the actual data $y_t$
Output: the combined prediction value $\hat{y}_t$
begin
    calculate the prediction error $e_{it}$ of the ith prediction method by equation (15)
    for t = 1, 2, ... do
        construct the combined prediction information error matrix E by equation (20)
        calculate the optimal weights by equation (21)
        calculate the combined prediction result $\hat{y}_t$ by equation (16)
        output $\hat{y}_t$
    end
end

First, the abnormal and missing data in the original data are processed, in which the abnormal data are regarded as missing data. The Lagrangian interpolation method is used to process the missing data. In the process, four adjacent data before and after the missing datum are selected for interpolation to ensure the reliability of the interpolation data. Then, the Min-Max method is used to normalize the data, and the calculation formula is as follows:

$$y' = \frac{y - y_{\min}}{y_{\max} - y_{\min}},$$

where $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of traffic flow, respectively, and $y$ and $y'$ are the traffic flow data before and after being normalized, respectively. The normalized data are divided into the training set and test set. The data from February 15, 2019, to May 1, 2019, are used as the training set, and the data from May 2, 2019, to May 15, 2019, are used as the test set.
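The preprocessing steps described above (filling a missing value by Lagrange interpolation from neighboring points, then Min-Max normalization) can be sketched as follows. The sketch assumes that "four adjacent data before and after" means up to four valid points on each side of the gap; the function names and the toy series are hypothetical and not from the article.

```python
import numpy as np

def lagrange_fill(series, idx, k=4):
    """Estimate series[idx] from up to k valid neighbours on each side.

    Missing values are marked with NaN. The Lagrange polynomial through
    the selected neighbours is evaluated at position idx.
    """
    n = len(series)
    left = [j for j in range(idx - 1, -1, -1) if not np.isnan(series[j])][:k]
    right = [j for j in range(idx + 1, n) if not np.isnan(series[j])][:k]
    xs = np.array(sorted(left + right), dtype=float)
    ys = series[xs.astype(int)]
    value = 0.0
    for a in range(len(xs)):
        others = np.delete(xs, a)
        basis = np.prod((idx - others) / (xs[a] - others))  # Lagrange basis at idx
        value += ys[a] * basis
    return value

def min_max_normalize(y):
    """Scale y to [0, 1]; also return the bounds needed to invert the scaling."""
    y_min, y_max = y.min(), y.max()
    return (y - y_min) / (y_max - y_min), y_min, y_max

# Toy series following y = x^2 with one missing point at index 5.
series = np.arange(10.0) ** 2
series[5] = np.nan

filled = series.copy()
filled[5] = lagrange_fill(series, 5)   # close to 25, the true value

scaled, y_min, y_max = min_max_normalize(filled)
restored = scaled * (y_max - y_min) + y_min  # inverse transform for predictions
```

Keeping `y_min` and `y_max` from the training data is what allows model outputs to be mapped back to vehicle counts before the evaluation indicators are computed.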

Experimental Environment and Selection of Evaluation Indicators.
The hardware and software conditions in the experimental environment of this article are shown in Table 1.
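In order to evaluate the traffic flow prediction performance of the CDLP model, three evaluation indicators are selected: the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the root mean square error (RMSE). Their calculation formulas are as follows:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|,$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2},$$

where $n$ is the total number of samples in the test set, $y_i$ is the actual value of the $i$th sample, and $\hat{y}_i$ is the predicted value of the $i$th sample.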
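As a minimal sketch, the three indicators can be computed with the Python standard library as follows. The function names are illustrative, and MAPE is undefined when an actual value equals zero, which this sketch does not guard against.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
```

MAPE is scale-free, which makes it convenient as the main indicator, while MAE and RMSE keep the units of the traffic flow data and RMSE penalizes large errors more heavily.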

Loss Function.
The loss function quantifies how close a given neural network is to the ideal state it is trained toward. The mean absolute error function and the mean square error function are commonly used as loss functions. Because the mean square error function is convenient to calculate, it is selected as the loss function of the CDLP model, and the calculation formula is as follows:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,$$

where $y_i$ is the actual value, $n$ is the total number of samples, and $\hat{y}_i$ is the prediction value.

The Neuron Number in the CDLP Model.
The neuron numbers of the input layer and hidden layer should be set before the model is trained (the number of neurons in the output layer has been determined in Section 3.2). The following is the process of setting the number of neurons in the input layer and the hidden layer.
In order to obtain the appropriate neuron number of the input layer, 6, 12, 18, and 24 are selected, respectively, to train the model, and the optimal neuron number is obtained through error analysis of the test set. Similarly, for the setting of the neuron number of LSTM layers and GRU layers, four numbers of 16, 32, 64, and 128 are selected, respectively, to train the model. Moreover, the optimal neuron number is determined through the error analysis of the test set.
Regarding the error analysis of the test set, MAPE is selected as the main evaluation indicator, while MAE and RMSE are used as auxiliary evaluation indicators. The evaluation indicator results of the test set under different neuron numbers in the input and LSTM layers, including the MAPE, MAE, and RMSE, are shown in Table 2.
From Table 2, it can be seen that when the neuron number of the input layer is set to 12 and the neuron numbers of the two LSTM layers are set to 128 and 128, respectively, the MAPE, MAE, and RMSE of the model test set are all the smallest. This indicates that, among the tested settings, these neuron numbers of the input layer and hidden layer give the best training effect. Moreover, the neuron numbers of the two GRU layers are the same as those of LSTM, i.e., 128 and 128, respectively.

Optimization Algorithm.
In the training process of the deep learning model, an optimization algorithm is used to iterate the model parameters to reduce the loss function value so that the training process of the model tends to be stable as the number of iterations increases. The optimization algorithms considered here are RMSProp and Adam. The two algorithms are applied to train the CDLP model, and the better one is selected as the optimization algorithm according to the prediction results. After training the CDLP model under the RMSProp algorithm and the Adam algorithm, respectively, the results of the three evaluation indicators are obtained and shown in Table 3.
It can be found from Table 3 that when the Adam algorithm is used to train the CDLP model, the MAPE, MAE, and RMSE are less than those of the RMSProp algorithm. It indicates that the Adam algorithm is more effective than the RMSProp algorithm and is selected as the optimization algorithm of the CDLP model.

Other Parameters.
In the 1DCNN layer, the convolution operation is implemented by convolution kernels, and 64 convolution kernels with a size of 2 × 1 are used, i.e., filters = 64, size = 2. In the dropout layer, the dropout rate is set to 20%. In addition, the number of training epochs is set to 500, and the batch size is set to 128.
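To make the effect of a size-2 kernel concrete, the following sketch applies a single one-dimensional kernel to a short sequence. It is an illustration only: stride 1, no padding, and no activation are assumptions, since the section does not state the padding or activation used, and `conv1d_valid` is a hypothetical helper rather than a library function. As in common deep learning frameworks, the kernel is applied without flipping (cross-correlation).

```python
def conv1d_valid(sequence, kernel, bias=0.0):
    """Slide one kernel over the sequence (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(w * sequence[i + j] for j, w in enumerate(kernel)) + bias
        for i in range(len(sequence) - k + 1)
    ]


# A kernel of size 2 with weights [-1, 1] yields first differences,
# i.e. the local trend between consecutive observations.
flow = [1, 3, 6, 10]
local_trend = conv1d_valid(flow, [-1, 1])
```

Under these assumptions, an input window of 12 observations produces 11 outputs per kernel; with 64 kernels, each learns its own pair of weights and the layer outputs 64 such feature sequences.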

Results and Analysis.
The CDLP model is trained and tested with the designed training set and test set after the above model parameters are determined. At the same time, in order to verify the advantages of the CDLP model, the prediction results of the single CNN-LSTM-attention model and the single CNN-GRU-attention model are extracted during the training and testing of the CDLP model. Figure 5 shows the loss function curves of the training set and test set of CNN-LSTM-attention and CNN-GRU-attention. Figure 6 shows the prediction results of the CDLP model for the test set.
From Figure 5(a), it can be seen that the loss function of the training set of CNN-LSTM-attention decreases rapidly and steadily as the number of iterations increases and finally tends to a stable state. The loss function of the test set fluctuates initially, then quickly approaches the loss function of the training set and remains stable. It can be seen from Figure 5(b) that, similar to CNN-LSTM-attention, the loss function of the training set of the CNN-GRU-attention network decreases rapidly and steadily and finally tends to a stable state, and the loss function of the test set also gradually approaches that of the training set after initial fluctuations and then remains stable. The loss function curves of the training set and test set of CNN-LSTM-attention and CNN-GRU-attention show that the design of the CNN-LSTM-attention and CNN-GRU-attention networks in the CDLP model is reasonable.
The MAPE curve is shown in Figure 7. From the figure, it can be found that the MAPE curve first rises quickly to its maximum value, then decreases quickly, and gradually becomes stable. Finally, the MAPE curve stabilizes at 5.12%.
This shows that the CDLP model has excellent robustness and obtains a small error, further showing that the CDLP model can better realize the prediction of traffic flow.
Furthermore, in order to further verify the prediction effect of the CDLP model, it is compared with baseline models on the same dataset (Figure 8), and the resulting evaluation indicators are listed in Table 4. Moreover, the training times of the CDLP and baseline models are shown in Table 5. It can be seen from Table 4 that the evaluation indicators of the CDLP model are the smallest among all models. This shows that the prediction accuracy of the CDLP model is the best. Moreover, it can be found from Table 5 that the training time of the CDLP model is as long as that of the CNN-LSTM-attention model, but its prediction accuracy is higher than that of the CNN-LSTM-attention model. The training time of the CNN model is the shortest, but its prediction accuracy is the lowest, so the robustness of the CDLP model is relatively high.
In addition, according to the DOWCA, the weights of the CNN-LSTM-attention model and the CNN-GRU-attention model in the CDLP model are calculated, as shown in Figure 9. Figure 9 shows that the weights of CNN-LSTM-attention and CNN-GRU-attention are dynamic and constantly changing, which indicates that the two methods have different prediction results for the same traffic flow data. Moreover, it can be seen from Figure 9 that the weights of the two models fluctuate strongly at the beginning, that the fluctuation gradually decreases, and that the weights eventually become stable, which reflects the systematic feasibility of the dynamic weighted coefficient algorithm, namely, its convergence. Furthermore, it shows that the weights of the CNN-LSTM-attention model are greater than those of the CNN-GRU-attention model, indicating that the prediction accuracy of the CNN-LSTM-attention model is higher than that of the CNN-GRU-attention model, which is consistent with the results in Table 4.
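The idea of dynamically reweighting two predictors can be illustrated with a short sketch. It is not the authors' DOWCA: equations (20) and (21) are not reproduced in this section, so the sketch assumes the classical combination that minimizes the sum of squared errors of the combined prediction subject to the weights summing to one, and it assumes that the weights at time $t$ are recomputed from all errors observed before $t$. The names `optimal_weights` and `dynamic_combination` and the equal starting weights are hypothetical.

```python
def optimal_weights(errors_1, errors_2):
    """Weights (w1, w2), w1 + w2 = 1, minimizing sum((w1*e1 + w2*e2)**2)."""
    e11 = sum(a * a for a in errors_1)
    e22 = sum(b * b for b in errors_2)
    e12 = sum(a * b for a, b in zip(errors_1, errors_2))
    denom = e11 + e22 - 2.0 * e12
    if denom == 0.0:
        # Identical error histories: fall back to equal weights (assumption).
        return 0.5, 0.5
    w1 = (e22 - e12) / denom
    return w1, 1.0 - w1


def dynamic_combination(pred_1, pred_2, actual):
    """Combine two prediction series with weights refreshed at every step."""
    combined, weights = [], []
    w1, w2 = 0.5, 0.5  # assumed starting weights before any error is observed
    for t in range(len(actual)):
        if t > 0:
            e1 = [actual[s] - pred_1[s] for s in range(t)]
            e2 = [actual[s] - pred_2[s] for s in range(t)]
            w1, w2 = optimal_weights(e1, e2)
        weights.append((w1, w2))
        combined.append(w1 * pred_1[t] + w2 * pred_2[t])
    return combined, weights
```

This unconstrained form can yield weights outside the interval [0, 1] when the two error series are strongly correlated; whether the paper imposes a nonnegativity constraint is not stated in this section. With few observations the weights change sharply, and they settle as the error history grows, which matches the qualitative behaviour described for Figure 9.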

Conclusion and Future Work
Traffic flow prediction is an important part of the intelligent transportation system. In this article, a dynamic optimal weighted coefficient algorithm for combinational prediction models, namely DOWCA, is presented. Furthermore, based on CNN, LSTM, GRU, and DOWCA, a combined deep learning model for short-term traffic flow prediction, namely the CDLP model, is proposed. The structure of the CDLP model, with an input layer, a hidden layer, and an output layer, is designed. As a combined model, the CDLP model includes two paralleled single models, i.e., the CNN-LSTM-attention model and the CNN-GRU-attention model. The parameters of the CDLP model, including the loss function, the neuron number, and the optimization algorithm, are determined by experiment. Data from a field intersection are collected, and the dataset for the CDLP model is obtained through abnormal and missing data processing and normalization and is then divided into the training set and test set. The CDLP model is trained and tested. The results show that the CDLP model is feasible and can predict traffic flow with high accuracy. Moreover, in order to further verify the performance of the established model, the baseline models are used to predict the traffic flow based on the same dataset and the same parameter settings as the CDLP model. The analysis of the prediction results shows that the accuracy of the CDLP model is higher than that of the baseline models. In addition, DOWCA is validated to dynamically obtain the optimal weighted coefficients for CNN-LSTM-attention and CNN-GRU-attention in the CDLP model.
The structure of the CDLP model is designed and its parameters are set in this article. However, some parameters, for example, the numbers of nodes in the input layer and hidden layer of the model, are obtained through experiments based on past selections of short-term traffic flow parameters. How to optimize the parameters in a combined deep learning model needs to be further studied. Furthermore, traffic flow prediction involves several parameters, so the deep learning structures based on the combinatorial algorithm can be expanded to multidimensional input variables, such as traffic speed and occupancy.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.