Interval Short-Term Traffic Flow Prediction Method Based on CEEMDAN-SE Noise Reduction and LSTM Optimized by GWO

With rapid economic growth and urbanization, the accelerating increase in car ownership has put massive pressure on urban traffic, and accurate traffic flow prediction can provide an important basis for dynamic urban traffic planning. Existing methods suffer from problems such as low efficiency, large errors, and an inability to adapt to short-term traffic changes. To solve these problems, this paper proposes the CEEMDAN-SE-GWO-LSTM method. First, the traffic flow data is processed for outliers and missing values. The complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method is used to decompose the traffic flow data, and sample entropy (SE) is used to reconstruct the subsequences, which improves the quality of the input data. Then, the Grey Wolf Optimizer (GWO) is used to optimize the parameters of the long-short-term memory (LSTM) network in order to improve the prediction accuracy and prevent the model from falling into a local optimum. The proposed ensemble model is compared with three models: back propagation neural network (BPNN), LSTM, and LSTM optimized by the Grey Wolf Optimizer (GWO-LSTM). Root mean square error (RMSE) is reduced by 40.9% to 66.7%, and the $R^2$ score is improved by 1.5% to 7.1%. The experimental results show that CEEMDAN-SE-GWO-LSTM has higher prediction accuracy than the existing traffic flow prediction models. Finally, this paper uses the model prediction error to establish an interval prediction model based on kernel density estimation theory, which enhances the generalization and practical application value of the model.

1. Introduction 1.1. Background. Traffic congestion has become increasingly severe, generating social problems such as prolonged travel times and frequent traffic accidents. These issues can be effectively alleviated by collecting and processing traffic flow data and building an intelligent transportation system (ITS) [1]. The recent works on intelligent transportation system are summarized in Table 1.
Traffic flow forecasting, as an important part of the traffic system, can effectively reduce traffic congestion. Furthermore, accurate traffic prediction information can efficiently help local governments allocate traffic resources to reduce traffic congestion. Moreover, the forecast results can then aid travelers in planning their traffic routes and thus reduce travel time.
1.2. Literature Review. In cities, various factors can influence traffic flow, such as weather, geography, and time of day. These factors are predominately highly nonlinear and volatile. Therefore, if the collected data is used directly in the experiment, the model will not accurately discover the changing pattern of traffic flow time series, resulting in low prediction accuracy. To further reduce the effect of noise on model predictions, Chen and Chou [8] proposed an empirical mode decomposition (EMD) method, which decomposes traffic flow signals to intrinsic mode functions (IMFs) by using EMD theory. Ensemble empirical mode decomposition (EEMD) improves the mode mixing of EMD by adding Gaussian white noise to the original sequence. Liu et al. used EEMD to decompose the traffic flow data [9]. CEEMDAN improves the processing of EEMD and achieves better decomposition results with higher computational efficiency. Lu et al. used the CEEMDAN method to decompose the raw traffic flow [10]. This paper uses the SE to reorganize the IMFs obtained from CEEMDAN to complete the dimension reduction processing of traffic flow data.
Traffic flow can be predicted due to its strong regularity and periodicity, but its uncertainty increases the difficulty of prediction [11]. Many efforts have been made to forecast short-term traffic flow (STTF), and these efforts can be broadly classified into parametric and nonparametric methods. Autoregressive integrated moving average (ARIMA) is a standard parametric method for forecasting time series data. Yu and Zhang proposed a switching ARIMA model and applied it to actual data obtained from the UTC/SCOOT system [12]. Kumar and Vanajakshi proposed a seasonal ARIMA (SARIMA) model for STTF prediction [13]. Chen et al. proposed an autoregressive integrated moving average with generalized autoregressive conditional heteroscedasticity (ARIMA-GARCH) model to predict traffic flow [14].
Another standard parametric method for time series prediction is the Kalman filter technique. Kumar proposed a Kalman filter technique (KFT) model for traffic flow prediction [15]. Guo et al. proposed an adaptive Kalman filter approach for STTF prediction [16]. However, parametric methods are limited by assumptions such as smoothness of the time series, which may lead to poor accuracy when the time series varies irregularly. Therefore, the parametric approach has limited applicability in the transportation field. Nonparametric methods, which are not limited by such preconditions, have become the mainstream strategy for STTF prediction. Yang and Lu proposed a combined wavelet-SVM prediction model for STTF prediction [17]. Duan used a particle swarm optimization (PSO) algorithm to select the appropriate learning parameters of a support vector machine (SVM) for STTF prediction [18]. Alam et al. applied five regression models to predict the traffic flow in the city of Porto [19]. However, the predictions calculated by these methods are still unsatisfactory and need to be more precise in application. With the development of deep learning and the growth of data volume, traffic flow prediction methods based on deep learning have gradually been proposed and have achieved good results.
Zhang et al. proposed a convolutional neural network (CNN)-based deep learning framework for STTF prediction [20]. Zheng and Huang used a long-short-term memory (LSTM) network to predict traffic flow data [1]. Qu et al. proposed a new end-to-end improved LSTM model, M-B-LSTM, to predict STTF [21]. Ma et al. used a convolutional neural network (CNN) to extract traffic flow pattern features, and the extracted features were fed into an LSTM unit [22]. Zhao et al. proposed a temporal convolutional network (TCN) model to predict STTF in the city [23]. Among the algorithms related to neural networks, LSTM is widely used to predict time series. However, this algorithm is highly complex and has the disadvantage of not achieving global optimality. This paper uses the Grey Wolf Optimizer (GWO) to optimize the parameters of the LSTM network, which prevents the LSTM algorithm from falling into a local optimum. In this paper, to reduce the effect of variable nonstationarity on the prediction, we decomposed the natural traffic flow data using the CEEMDAN-SE method. Then, we used the LSTM algorithm optimized by the Grey Wolf Optimizer (GWO) for traffic flow prediction.

Table 1: Recent works on intelligent transportation system.

| Number | Solved problem | Model name |
| --- | --- | --- |
| 1 | Sensor data is analyzed using crow search algorithm optimized long-short-term memory to correctly identify drivers [2]. | Crow search algorithm optimized long-short-term memory (CSA-LSTM) |
| 2 | Spatio-temporal individual mobility graph encoding network with group mobility assistance (SIGMA) is proposed to encode individual mobility behavior, which enables recommendation of new locations [3]. | Spatio-temporal individual mobility graph encoding network with group mobility assistance (SIGMA) |
| 3 | This work proposes a deep learning-based traffic safety solution in 5G intelligent transportation systems that can effectively predict drivers' intention to change lanes [4]. | (1) Lane-change intention recognition based on an LSTM and historical driving-track data; (2) lane-change intention recognition based on an LSTM and natural-driving data |
| 4 | This work presents an edge node deep learning-based traffic flow detection scheme that combines vehicle detection and vehicle tracking algorithms and is deployed to the edge device Jetson TX2 platform [5]. | A vehicle detection network based on improved YOLOv3 and a vehicle tracking network based on the improved DeepSORT |
| 5 | This work proposes a dynamic and intelligent traffic light control system (DITLCS) that dynamically adjusts traffic light durations by analyzing real-time traffic information, which improves the efficiency of traffic light control systems [6]. | Deep reinforcement learning and fuzzy inference system |
| 6 | This work presents a radial basis function neural network algorithm based on quantum particle swarm optimization (QPSO) strategy for traffic flow prediction in intelligent transportation system (ITS) [7]. | Quantum particle swarm optimization (QPSO) strategy |

Wireless Communications and Mobile Computing

1.3. Contribution and Paper Framework. The main contributions of the paper are as follows.
(i) Noise reduction of the data is performed by the CEEMDAN-SE method. Due to the volatility and instability of traffic flow, using raw data directly as the input to the model can lead to low prediction accuracy. In this paper, we use CEEMDAN to decompose the original sequences and use SE to measure the complexity of each IMF component. Moreover, we combine the subsequences with similar complexities to reduce the input dimension and improve prediction efficiency
(ii) An optimized GWO-LSTM model is built to predict the traffic flow data. Since the traditional LSTM model is prone to falling into a local optimum, this paper adopts the GWO algorithm to optimize the parameters of the LSTM, which improves the model's optimization-seeking speed and prediction accuracy
(iii) Error metrics, including $R^2$, are calculated separately for the compared models to quantify prediction accuracy
(iv) The kernel density estimation function is used to estimate the model prediction error distribution and establish an interval prediction model

The rest of the paper is organized as follows. Section 2 describes the signal decomposition algorithm, the optimization algorithm, the deep learning model, and the probabilistic interval estimation method. Section 3 presents the experiments, which mainly include the results of the decomposition of the traffic flow, the tuning of the hyperparameters by the optimization algorithm, the comparison of the prediction results of the proposed model with other models, and the results of the interval prediction. Section 4 summarizes the work done in this paper and describes the limitations of this research as well as future directions.

Proposed Method
This section introduces our proposed STTF prediction model. Traffic flow is a nonstationary signal: its frequency content varies over time. Previously, there were no satisfactory theories for processing nonstationary signals. The EMD decomposition [24] proposed by Huang et al. in 1998, part of the Hilbert-Huang transform (HHT), was a breakthrough in this kind of signal analysis. Huang et al.'s method is based on spectral decomposition, which decomposes the signal into components of different frequencies. In our proposal, we adopted CEEMDAN [25] (an improved version of EMD) to decompose the original nonstationary traffic flow signal into several IMF subsignals of different frequencies, which serve as the input of the GWO-LSTM prediction model to improve its prediction accuracy.
Furthermore, this paper utilised SE to measure the nonlinear complexity of the IMF subsequence processed by CEEMDAN [26] to reduce the dimension of the IMF subsequence. Thus, SE was introduced to solve the problem of excessive computational scale. In the prediction model, we utilised the LSTM model based on GWO optimisation to improve the prediction accuracy and reduce the training time by optimising parameters such as the number of nodes, iterations, and learning rate. The superiority of our proposed CEEMDAN-SE-GWO-LSTM is demonstrated by comparison with other benchmark models. We will introduce our model in detail through the following three aspects: decomposition of nonstationary traffic flow signals, SE-based IMF subsignal fusion, and STTF prediction based on GWO-LSTM. This work's general arrangement is shown in Figure 1.

CEEMDAN-SE
2.1.1. CEEMDAN. EMD (empirical mode decomposition) is a classic adaptive method for analyzing nonstationary signals [18]. However, the mode mixing (modal aliasing) problem of EMD causes severe sawtooth lines in the time-frequency distribution. It makes certain intrinsic mode functions lose their physical meaning, which degrades the performance of EMD.
Based on the CEEMD-SE method, Wang et al. [27] presented a wind power short-term prediction model. In their experiment, the RMSE and MAE were 2.16 and 0.39, respectively, better than EMD-SE-HS-KELM, HS-KELM, KELM, and ELM models. In 2020, Tian [28] presented a STTF prediction model. The model was based upon EMD method and combination model; their proposed model demonstrated superior performance in STTF prediction. However, the two models have limitations. Neither model takes noise into account; neither can automatically adjust to follow changes in noise.
Therefore, we propose using the CEEMDAN [25] method to decompose the traffic flow time series and reduce its nonstationarity. CEEMDAN is an EMD-based algorithm that, like EEMD, adds Gaussian white noise to the original signal, which improves the signal decomposition performance. The mode mixing problem is addressed by decomposing multiple noise-added copies of the signal and averaging the results, which cancels the influence of the noise and yields better mode decomposition results. The process of the CEEMDAN algorithm is shown in Figure 2.

2.1.2. Sample Entropy Theory. N subsequences will be generated after the CEEMDAN decomposition of the traffic flow time series data. Using them directly as the input data of the GWO-LSTM model will result in a sizeable computational scale. Therefore, SE, a nonlinear complexity measure, is used to classify and reconstruct the traffic flow time series samples to reduce the complexity of subsequences. SE [29] is a method based on approximate entropy (ApEn) [26], which evaluates time series complexities by measuring the probability of generating new patterns in time series signals.

Figure 1: The framework of this paper.

Figure 2: The process of the CEEMDAN algorithm (flowchart: original signal x; determine the amplitude and the number I of added white-noise realizations; add white noise to each copy; test whether h_ik(t) is an IMF).
SE is defined by Equation (1):

$$\mathrm{SampEn}(m, r) = \lim_{N \to \infty} \left\{ -\ln \frac{A^m(r)}{B^m(r)} \right\}. \tag{1}$$

$B^m(r)$ in Equation (1) is the probability that two sequences match for $m$ points under the similarity tolerance $r$, and $A^m(r)$ is the probability that two sequences match for $m + 1$ points. The calculation formulas are Equation (4) and Equation (5), respectively.
$A_i$ and $B_i$ are the numbers of vectors $X(j)$, $j \neq i$, whose maximum distance from $X(i)$ is not greater than $r$ when the embedding dimension is $m + 1$ and $m$, respectively. Specifically, $X_m(i) = \{x(i), x(i + 1), \dots, x(i + m - 1)\}$, $1 \le i \le N - m + 1$, represents $m$ consecutive values of $x$ starting from the $i$th point.
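The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard finite-length estimate $-\ln(A/B)$ with the Chebyshev (maximum) distance; the default tolerance of 0.2 times the standard deviation is a common heuristic and is an assumption here, not a value taken from this paper.

```python
import math

def sample_entropy(x, m=2, r=None):
    """Finite-length estimate of SampEn(m, r) for a 1-D sequence x."""
    n = len(x)
    if r is None:
        # Assumed heuristic: tolerance r = 0.2 * standard deviation of x.
        mean = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def count_matches(length):
        # The same N - m starting points are used for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Maximum (Chebyshev) distance between the two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)      # template pairs that match for m points
    a = count_matches(m + 1)  # template pairs that match for m + 1 points
    if a == 0 or b == 0:
        return float("inf")   # estimate is undefined when nothing matches
    return -math.log(a / b)
```

A perfectly periodic sequence produces no new patterns and gives an entropy of zero, while white noise gives a clearly positive value, which is the property used to rank the IMF components by complexity.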
The amount of data is usually limited in specific applications. Thus, Equation (1) is estimated in practice from a finite number of data points $N$ as $\mathrm{SampEn}(m, r, N) = -\ln [A^m(r) / B^m(r)]$.

2.2. GWO-LSTM. LSTM was first presented in 1997 [30] as an algorithm that could make machines learn much faster and help solve complex artificial long-time-lag tasks. Because the GWO algorithm [31] imitates the leadership hierarchy and hunting mechanism of grey wolves in nature, it can be used to optimize the LSTM model. Our model has a significant optimization effect compared with LSTM neural networks and BP neural networks. Thus, this paper uses the GWO-LSTM method to process the output of CEEMDAN-SE. The general process of the GWO-LSTM framework is shown in Figure 3.

LSTM.
LSTM is designed to solve the long-term dependence problem. Compared with a standard RNN, LSTM adds three gates (forget gate, input gate, and output gate), which enable it to achieve better results in traffic flow prediction.
Without activation functions, the output would be only a linear combination of the inputs, so activation functions are needed to introduce nonlinearity into the LSTM. Common activation functions are tanh, with range (-1, 1), sigmoid, with range (0, 1), and ReLU, with range [0, +∞). Following experimental verification, tanh gave better results on our problem and was selected as our activation function.
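To make the role of the three gates and the tanh activation concrete, the following is a minimal sketch of one time step of a scalar LSTM cell, assuming the standard LSTM update equations. The parameter dictionary `p` and its keys are hypothetical names chosen for illustration; a real network uses weight matrices and a trained framework implementation.

```python
import math

def sigmoid(z):
    """Logistic function with range (0, 1), used for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps a gate name to (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # share of the old cell state to keep
    i = gate("input", sigmoid)         # share of the new candidate to write
    o = gate("output", sigmoid)        # share of the cell state to expose
    g = gate("candidate", math.tanh)   # candidate content in (-1, 1)
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state
    return h, c
```

Because the hidden state is a gate value in (0, 1) multiplied by a tanh value in (-1, 1), it always stays inside (-1, 1), which is one reason traffic flow inputs are usually scaled before training.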
(1) GWO. We first divided the traffic flow prediction into four layers and entered them into the GWO model to complete the initialization, with the first three layers being of greater significance. We defined α as the optimum solution. During the hunt, the behavior of grey wolves rounding up their prey was defined as Equation (7) and Equation (8), where t is the current iterative generation, A and C are the coefficient vectors, and X p and X are the prey position vector and the grey wolf position vector, respectively.
The calculation equations of A and C are shown in Equation (9) and Equation (10), where a is the convergence factor, which decreases linearly from 2 to 0 as the iterations proceed, and r_1 and r_2 are random vectors whose components lie in [0, 1].
In the GWO model, the upper layer leads the lower layer to the set of update equations shown in Equation (11), and after completing the update, the GWO model outputs Xðt + 1Þ to the LSTM model according to Equation (12). Subsequently, the model calculates the loss function and adjusts the learning rate of the GWO model according to the vector X.
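The update rules above can be sketched as follows, assuming that Equations (7) to (12) follow the standard GWO formulation [31]: $D = |C \cdot X_p(t) - X(t)|$, $X(t+1) = X_p(t) - A \cdot D$, $A = 2a \cdot r_1 - a$, and $C = 2 r_2$, with each wolf moving to the average of the positions suggested by the three leading wolves. The function name, the bounds, and the population settings are illustrative assumptions, not the configuration used in this paper.

```python
import random

def gwo_minimize(objective, dim, lower, upper, n_wolves=20, n_iter=200, seed=0):
    """Minimal Grey Wolf Optimizer sketch (assumes n_wolves >= 3)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    for t in range(n_iter):
        # The three best wolves (alpha, beta, delta) lead; the rest are omega.
        ranked = sorted(wolves, key=objective)
        leaders = ranked[:3]
        a = 2.0 * (1.0 - t / n_iter)  # decreases linearly from 2 toward 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                guided = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a   # A = 2a*r1 - a
                    C = 2.0 * rng.random()           # C = 2*r2
                    D = abs(C * leader[d] - x[d])    # distance to the leader
                    guided.append(leader[d] - A * D)
                # Average of the three suggestions, clipped to the bounds.
                new_x.append(min(max(sum(guided) / 3.0, lower), upper))
            new_wolves.append(new_x)
        wolves = new_wolves
    best = min(wolves, key=objective)
    return best, objective(best)
```

In the setting of this paper, the objective would be the LSTM loss as a function of the tuned parameters; here any callable that maps a position vector to a scalar can be passed, for example `gwo_minimize(lambda v: sum(u * u for u in v), 3, -10.0, 10.0)`.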
(2) GWO-LSTM. We first trained the LSTM model on the data to obtain a baseline prediction. Subsequently, we passed the LSTM prediction results to the GWO model to obtain the new four strata. Once the four strata are obtained, GWO calculates the coefficient vectors A and C according to Equation (9) and Equation (10). Then, it calculates the ratios of each stratum in the four strata using A and C, inputs them into the LSTM model for automated parameter tuning, and continues to train the LSTM model. The above process repeats until a user-specified number of iterations is reached.
Machine learning training aims to update the parameters and optimize the objective function. In this paper, the GWO is set as an optimizer to perform a local estimation ($|A|$, see Equation (9)) based on the results of each LSTM iteration to minimize the loss function. We use 1024 as the initial batch size and 0.01 as the initial learning rate. GWO decides to update or not update (eliminate or not eliminate) the population of grey wolves based on the results of each LSTM iteration and, thus, dynamically adjusts the learning rate of the LSTM each time. In addition, our GWO network is optimized for four layers.
Finally, this paper constructs a CEEMDAN-SE-GWO-LSTM model combining GWO-LSTM with CEEMDAN-SE to obtain a more accurate traffic flow prediction. As the above process repeats, when $|A|$ is less than 1, the wolves converge toward the current prey, which is a local search that can become trapped in a local optimum; when $|A|$ is greater than 1, the wolves diverge from the prey and search globally for a better solution.

Probability Interval Prediction.
We define the STTF prediction error ε as the deviation between the actual observed value $P_{obs}$ of traffic flow and the predicted value $P_{pred}$ of traffic flow at a certain moment, that is, $\varepsilon = P_{obs} - P_{pred}$. Then, we use the historical prediction errors to fit a probability density function with kernel density estimation (KDE), so that the result does not depend on the chance accuracy of any single STTF prediction. Finally, we use the inverse cumulative distribution function (ICDF) to expand the prediction results into intervals to improve the prediction accuracy and generalization.

Due to the randomness of STTF and its dependence on time characteristics, the prediction error of STTF does not conform to the assumption of a normal distribution. As shown by the Q-Q plot in Figure 4, the error distribution of our proposed method is not normal.
Therefore, parameter estimation based on the normal distribution is not suitable for the confidence estimation of STTF problems. In contrast, nonparametric estimation methods do not require prior assumptions about the distribution of prediction errors. Instead, they fit a probability distribution to the input data, which is more adaptable to nonstationary data.
As a nonparametric probability density estimation method, KDE has a stable fitting effect, so this paper uses KDE to predict the confidence interval of traffic flow. Assume that all points of the sample set $X = \{x_1, x_2, \dots, x_n\}$ of traffic sequence data obey the distribution $f(x)$, and let the function $\hat{f}(x)$ be the KDE of $f(x)$. Then $\hat{f}(x)$ can be expressed as

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right).$$

In the above formula, $x_i$ represents the $i$th sample in the dataset, $h$ is the window width, which represents the interval division size of the sample error distribution, and $K(x)$ is the kernel function, which determines the role of each sample point $x_i$ in density estimation. In practical applications, the Gaussian kernel function is the most widely used due to its good mathematical properties. This paper selects the Gaussian kernel function for kernel density estimation, and its expression is

$$K(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right).$$

2.4. Inverse Cumulative Distribution Function. The inverse cumulative distribution function gives the value associated with a specific cumulative probability. Therefore, this paper uses the ICDF to determine the confidence interval. That is, we want to expand the prediction results into an interval and check whether the observations fall within our prediction interval, thereby further improving the prediction accuracy [32].
To get the ICDF, the cumulative distribution function (CDF) of the prediction error needs to be calculated first. The cumulative distribution function of a random variable X is a function on the real numbers that is denoted as F and is given by Equation (16), $F(x) = P(X \le x)$, for any $x \in \mathbb{R}$.
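The chain from prediction errors to a prediction interval can be sketched as follows. This is a minimal illustration that assumes a Gaussian kernel, an error defined as observed minus predicted, and an equal-tailed interval; the bandwidth `h` and all function names are hypothetical, and the bisection bounds assume the requested probability is not extremely close to 0 or 1.

```python
import math

def gaussian_kde(samples, h):
    """Return the Gaussian-kernel density estimate f_hat(x) with bandwidth h."""
    n = len(samples)
    norm = n * h * math.sqrt(2.0 * math.pi)
    def f_hat(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm
    return f_hat

def kde_cdf(samples, h):
    """CDF of the KDE: the average of normal CDFs centred on each sample."""
    n = len(samples)
    def cdf(x):
        return sum(0.5 * (1.0 + math.erf((x - s) / (h * math.sqrt(2.0))))
                   for s in samples) / n
    return cdf

def kde_icdf(samples, h, p, tol=1e-9):
    """Invert the KDE CDF by bisection: find x such that F(x) is close to p."""
    cdf = kde_cdf(samples, h)
    lo, hi = min(samples) - 10.0 * h, max(samples) + 10.0 * h
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def prediction_interval(pred, errors, h, confidence=0.8):
    """Equal-tailed interval; error = observed - predicted, so observed = pred + error."""
    tail = (1.0 - confidence) / 2.0
    return (pred + kde_icdf(errors, h, tail),
            pred + kde_icdf(errors, h, 1.0 - tail))
```

For example, `prediction_interval(100.0, errors, h=1.0, confidence=0.9)` returns the lower and upper bounds around a point forecast of 100 vehicles; raising the confidence level widens the interval.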
The ICDF and CDF are inverse to each other, so we draw the ICDF diagram under different algorithms; Figure 5 shows the ICDF diagram of the error distribution of the LSTM algorithm on the dataset used in this paper. Then, we take the 80%, 90%, and 95% confidence intervals, respectively, and calculate their error intervals under the different algorithms.

Experiment
This section describes the experiments. First, CEEMDAN-SE and GWO-LSTM were tested separately on our dataset. Then, we combined the two into CEEMDAN-SE-GWO-LSTM, which is compared directly with other models to show how it outperforms them.

Sample Selection and Data Sources.
Flying remains one of the most prevalent means of travel. Airports are generally far from where travelers live, so accurate traffic information is essential for planning a trip in advance. To ensure the reliability and authenticity of the predicted results, this paper uses measured traffic flow data from highway stations on the M25 near Heathrow Airport. The dataset covers 30 consecutive days, from September 1, 2019, to September 30, 2019, with a collection interval of 15 minutes. Interpolation was applied to fill in the missing values of the time series and to replace the outliers, after which the data volume was 2880 records. We divided the dataset into a training set and a test set at a ratio of 8 : 2.
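The preprocessing described above can be sketched as follows. This is a minimal illustration that assumes linear interpolation, that outliers have already been marked as missing (`None`), and that the 8 : 2 split is chronological; none of these details is stated explicitly in the paper, and both function names are hypothetical.

```python
def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbours."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]      # leading gap: copy first known value
        elif right is None:
            filled[i] = values[left]       # trailing gap: copy last known value
        else:
            w = (i - left) / (right - left)
            filled[i] = values[left] + w * (values[right] - values[left])
    return filled

def chronological_split(series, train_ratio=0.8):
    """Split a series into leading training part and trailing test part."""
    cut = round(len(series) * train_ratio)
    return series[:cut], series[cut:]
```

With 30 days of 15-minute records, that is 30 × 96 = 2880 points, the split yields 2304 training points and 576 test points.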

CEEMDAN-SE.
We first employed the CEEMDAN method to decompose the traffic flow. After the CEEMDAN decomposition (as shown in Figure 6), we obtained 11 IMF components with different complexities and one residual component with a relatively gentle change.
In order to avoid the input signal data being too large, we introduced the theory of SE to group and reconstruct the IMF subsignals obtained after decomposition. Figure 7 shows the SE of each IMF component. In this paper, the entropy value of each component is used as the judging standard by which the IMF components are reorganized. Specifically, IMF1, IMF2, and IMF3 have similar complexities and thus can be combined into one recombined component. Similarly, the remaining IMF components are merged and recombined; the recombination result is shown in Table 2, and the new subsequences after reconstruction are shown in Figure 8.
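One simple way to turn "similar complexity" into a grouping rule is sketched below. The rule (consecutive components join a group while their SE stays within a tolerance of the group's first member), the tolerance value, and the function names are assumptions made for illustration; the paper's actual grouping is the one reported in Table 2.

```python
def group_by_entropy(entropies, tolerance):
    """Group consecutive component indices with similar sample entropy."""
    groups = [[0]]
    for idx in range(1, len(entropies)):
        first_of_group = groups[-1][0]
        if abs(entropies[idx] - entropies[first_of_group]) <= tolerance:
            groups[-1].append(idx)   # similar complexity: merge into the group
        else:
            groups.append([idx])     # complexity changed: start a new group
    return groups

def recombine(components, groups):
    """Sum the components of each group point by point."""
    length = len(components[0])
    return [[sum(components[i][t] for i in group) for t in range(length)]
            for group in groups]
```

Because recombination only adds components together, the sum of the recombined subsequences still equals the sum of all original components, so no part of the decomposed signal is discarded.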

GWO-LSTM.
Experiments are conducted on the above data to verify the effectiveness of the GWO-LSTM model for traffic flow prediction. The initial batch size is 1024, and the initial learning rate is 0.01. The LSTM network has two hidden layers. The learning rate of the LSTM model is continuously and dynamically adjusted during the iterative process by the GWO algorithm with four layers until the loss falls below a predefined threshold. In the experiments of this paper, we uniformly set the number of iterations to 500 to train GWO-LSTM and CEEMDAN-SE-GWO-LSTM.
In order to obtain better results from the CEEMDAN-SE-GWO-LSTM model, we need to pretrain the GWO-LSTM. Our optimization target is the loss function; the loss values of LSTM and GWO-LSTM with different feature combinations and iteration numbers are shown in Table 3. The numbers in the features column of Table 3 stand for the corresponding layers that the GWO optimizes.

Comparative Analysis of Prediction Models.
To further illustrate the effectiveness of the CEEMDAN-SE-GWO-LSTM model, the BP model, the standard LSTM model, and the improved GWO-LSTM model were selected for comparison.
Many error evaluation indexes are used in traffic flow prediction; six common indexes are used to evaluate the models in this paper. The equations of the six error evaluation indexes are given in Equation (17) to Equation (22), where $n$ is the number of test set data, and $y_i$ and $\hat{y}_i$ are the actual traffic flow value and the predicted traffic flow value at moment $i$, respectively.
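As an illustration, three widely used indexes are sketched below using their standard definitions: RMSE and $R^2$, which are reported in the abstract, and MAE, which is included here as an assumption about the remaining indexes. The full set of six is defined by Equations (17) to (22).

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower RMSE and MAE indicate smaller errors, while an $R^2$ closer to 1 indicates that the prediction explains more of the variance in the observed flow.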
The results of different models are shown in Table 4, and the comparison of prediction results of different models is shown in Figures 9 and 10. For the analysis of different models, Figure 10(a) is the result of the BP neural network. The limitation of the BP network is that it has a significant error in the prediction of extreme values, and traffic congestion often occurs in the period when the extreme values are generated. Figure 10(b) is the result of the LSTM model. The predicted value can better reflect the traffic flow trend, and it performs well at extreme values. However, when the traffic flow has a long-term trend, it has a hysteresis, which cannot perfectly solve the problem of timely forecasting. Figure 10(c) is GWO-LSTM, and it shows that the error performs well in each quantization index. However, the forecast results will have large fluctuations at certain times; this is due to the characteristics of LSTM and nonstationary signals. When LSTM uses the node information of adjacent nonstationary signals, it will be affected by a considerable fluctuation value and generate a significant prediction error. Figures 9 and 10(d) show the prediction results of each subsequence and the sum of each subsequence, respectively. Our proposed modal decomposition can better suppress this noise effect.

The error intervals of the different models were calculated under different confidence levels. From a longitudinal perspective, the error interval of any model widens as the confidence level increases. From a horizontal perspective, as the model is progressively optimized, the error interval narrows, and our proposed model has the narrowest error interval at every confidence level. Figure 11 shows that in the BP and LSTM models, the prediction error is large and the prediction lags, which leads to a traffic flow prediction error of 200-300 when the confidence level is 80%. This is very disadvantageous for dynamic traffic planning. At the 80% confidence level, our proposed model keeps the error for each fifteen-minute interval within 100, and it can be seen from the figure that most of the true observations fall within the prediction interval. Even some extreme fluctuations are included, which shows that the interval prediction proposed in this paper can improve the accuracy of STTF prediction.

Conclusion and Discussion
In this paper, we proposed a method for accurate short-term traffic flow prediction and validated the model on real datasets. The method consists of two parts: CEEMDAN-SE and GWO-LSTM. First, we cleaned the traffic flow dataset by removing outliers and invalid values and then interpolating to ensure data consistency. Second, the CEEMDAN method is used to decompose the time series data of traffic flow. In order to prevent excessive computational scale, we introduced SE to measure the nonlinear complexity of the IMF subsequences produced by CEEMDAN and reduce their dimension. The data is then fed into the optimised GWO-LSTM model. GWO minimises the loss function by continuously and dynamically optimising the hyperparameters during training with different sequences to achieve the desired accuracy. Finally, the component prediction results are integrated to obtain the predicted value, and the evaluation indexes are calculated. The results show that our proposed ensemble model performs best compared with other commonly used models. In order to improve the generalization of our model, we propose an interval prediction model based on KDE theory, which can increase the accuracy and is more practical. Our method can accurately predict extreme values and responds rapidly to sudden changes in short-term trends. After the modal decomposition, the component features can be better utilised, so that the final predicted value and its derivative are smoother and have more practical application significance. The research shows that the ensemble model proposed by us can improve STTF prediction accuracy to a certain extent, which can provide a reference for related research.
Our work also has certain limitations. More methods to evaluate the importance of the variables for feature selection might be required. Some scholars use a random forest (RF) algorithm for this evaluation, which demonstrates the quality of their model more rigorously. Furthermore, both CEEMDAN-SE and GWO-LSTM could be improved if the two parts were combined more closely. We initialize the GWO model with random values; it might be better to use fixed initial values adapted to CEEMDAN-SE.
Thus, our future work is to solve the spiking phenomenon after CEEMDAN decomposition, which arises because the decomposition is not thorough enough and produces pseudomodes. Using the IMF component of the original noise signal as the noise for each calculation of the IMF may better avoid pseudomodal phenomena and reduce unnecessary components, improving the input data quality of subsequent prediction models. For better prediction, other time-series models can be applied to form a combined model. For instance, given a better time-series model X, scholars could consider using the GWO model to optimize it. We also hope to consider features of other dimensions to obtain better prediction performance, find the most suitable combination, and use a machine learning model such as random forest for evaluation.

Data Availability
The data used to support the findings of this study have been deposited in https://github.com/Bobbed1999/Short-Term-Traffic-Flow-Prediction.