Short-Term Traffic Speed Prediction Method for Urban Road Sections Based on Wavelet Transform and Gated Recurrent Unit

As a core component of the urban intelligent transportation system, traffic prediction is significant for urban traffic control and guidance. However, accurate traffic prediction is challenging because of the complex spatiotemporal correlation of traffic data. A road section speed prediction model based on wavelet transform and neural network is therefore proposed in this article to improve traffic prediction methods. Firstly, the wavelet transform is used to decompose the original traffic speed data, and the coefficients obtained after the decomposition are used to reconstruct the high-frequency random sequences and the low-frequency trend sequence. Secondly, a GRU neural network is constructed to learn the trend of the low-frequency sequence. The spatiotemporal correlation between input data is extracted by adjusting the input of the model. Meanwhile, an ARMA model is used to fit the unstable random fluctuations of the high-frequency sequences. Finally, the prediction results of the two models are added together to obtain the final prediction result. The proposed prediction model is validated using road section speed data derived from floating car data collected in Ningbo. The results show that the proposed model has high accuracy and robustness.


Introduction
With the socioeconomic development and the acceleration of urbanization, the demand for transportation continues to grow.
Though the urban transportation system has been developing as well, trying to keep pace with the increasing demand for transportation, and its supply capacity is improving through the construction of transportation infrastructure, it is still one step behind. Congestion has become a universal problem all over the world. The overcrowding of urban roads brings economic, health, and environmental problems, such as stress, fuel consumption, wasted time, and traffic accidents [1]. The intelligent transportation system (ITS) can effectively control and guide urban traffic. It is a key way to solve urban traffic problems, and it therefore places higher requirements on accurate intelligent traffic services [2]. The objects of traffic prediction are traffic parameters such as vehicle flow, speed, and occupancy in a specific area and time period [3]. High-precision traffic prediction can provide accurate travel information for urban residents, and it helps to realize ITS applications [4]. With the development of ITS, real-time traffic data can be obtained effectively [5]. Therefore, in order to improve the accuracy and reliability of traffic prediction, researchers are committed to developing and improving effective traffic prediction models based on fully mined historical traffic data. In this paper, traffic speed prediction on road sections is studied using historical traffic speed data.
Because of the complex spatiotemporal correlation between road sections, accounting for the spatial correlation between road sections in traffic prediction models has become a research focus. The traffic conditions of each section of the road network are often affected by the conditions of its upstream and downstream sections. For example, traffic congestion often starts on one or more sections and spreads to other sections after a period of time, resulting in regional congestion [6]. To account for this characteristic of congestion, some early studies built nonparametric models using the speed data of the studied road section and of its upstream and downstream sections; such models better capture the spatiotemporal correlation between road sections and thus improve prediction accuracy [7].
Due to their high flexibility and good learning and generalization capabilities, algorithms based on neural networks have been widely used in transportation-related tasks [8]. The recurrent neural network (RNN) is applied to traffic prediction because its internal structure can effectively process time series. RNNs used for traffic prediction mainly include Long Short-Term Memory (LSTM) neural networks and the Gated Recurrent Unit (GRU). GRU was proposed by Cho et al. in 2014, and it is reported that GRU achieves performance equal to or better than that of LSTM [9]. In addition, GRU has been found to outperform LSTM on nearly all tasks except language modelling with the naive initialization [10]. Dai et al. used a GRU network to make short-term traffic forecasts; in that study, GRU was used to process the spatiotemporal feature information of the traffic flow matrix to produce the prediction [11]. In another hybrid model, which predicts lane speed, a GRU network layer was used to produce the final speed prediction [12]. These studies have shown that GRU is well suited to traffic prediction and achieves good results.
At the same time, the traffic speed data generated by each road section also have complex attributes. Firstly, the real-time and dynamic nature of urban traffic results in strong random fluctuations in traffic speed data. This inherent characteristic increases the difficulty of traffic prediction. Secondly, urban traffic has certain spatiotemporal characteristics and periodic laws, which give the traffic data of each road section a relatively constant trend. For example, commuting sections have low traffic speed during the morning and evening rush hours and high traffic speed during other hours. This makes the trend of traffic data over time more predictable. In order to effectively learn both the stable periodic characteristics of traffic data and the random fluctuations under real-time dynamic traffic, wavelet analysis theory is applied to traffic prediction. The historical traffic data are decomposed by wavelet transform (WT) into subsequences ordered from high to low frequency. The low-frequency sequence contains characteristics similar to those of the original data. From a long-term perspective, the trend of the low-frequency sequence has a repeating daily periodicity. From a short-term perspective, the data at adjacent moments of the sequence are similar and continuous [13]. In the past, some scholars used different models to predict the sequences after WT, including the ARIMA model and the BP neural network [14][15][16]. However, most of the inputs of these combined prediction models are single time series, which ignores the correlation between the traffic data generated by spatially adjacent road sections. Therefore, a hybrid prediction framework is proposed in this paper to make more accurate predictions. Firstly, the historical traffic speed data are decomposed by WT into subsequences ordered from high to low frequency. Then, a GRU network is constructed to learn the trend of the low-frequency sequence. By controlling the input of this model, the spatiotemporal correlation of traffic speed data can be extracted effectively. At the same time, an autoregressive moving average (ARMA) model is used to fit the random fluctuations of the high-frequency sequences. Finally, the prediction results of the two models are added together to obtain the final prediction result.
The rest of the paper is organized as follows. Previous research studies are discussed in Section 2. The proposed prediction model is explained in Section 3. In Section 4, the validity and robustness of the model are demonstrated using the speed data set from Ningbo, China. The conclusions and future work of this paper are summarized at the end.

Related Work
While long-term prediction forecasts future traffic demand using data such as socioeconomic attributes, the traffic prediction required by ITS is short-term traffic prediction, which mainly focuses on the traffic conditions in the next few minutes to several hours [1]. In the past few decades, researchers have carried out a series of studies on traffic prediction. The existing methods can be divided into three categories: parametric methods, nonparametric methods, and hybrid model prediction methods.
The structure of prediction models based on parametric methods is rather simple. Parametric methods such as the Autoregressive Integrated Moving Average (ARIMA) [17,18] and Kalman filtering [19,20] achieve promising results, but they rely on certain physical or statistical assumptions. However, traffic flow is random and nonlinear, and in practice it is difficult to establish a parametric model that can reproduce the characteristics of traffic flow. The gradual popularization of sensors on urban roads and GPS on vehicles has enabled the acquisition of real-time traffic data. Therefore, nonparametric methods, which model a large amount of historical data, are widely applied. Nonparametric models mainly include traditional statistical machine learning methods and the currently popular artificial intelligence algorithms. Support vector machine (SVM) [21,22], K-nearest neighbor (KNN) [23,24], and artificial neural network (ANN) are the most widely used ones [25].
Intelligent algorithms have attracted widespread attention not only in academia and industry but also in the field of transportation in recent years. RNN is widely recognized as a suitable method to capture the temporal evolution of traffic flow. LSTM, as a typical representative of RNN, has been used by many scholars for short-term traffic prediction. Ma et al. applied LSTM to road traffic speed and volume prediction for the first time. By introducing the forget gate, the network can internally connect time series data and increase prediction accuracy [26]. Experimental results show that this network model is superior to ordinary neural networks. GRU, a variant of LSTM proposed in 2014, has also been used in traffic prediction because of its simpler structure and its performance similar to that of LSTM [11,12]. The convolutional neural network (CNN) is widely used in computer vision and image classification [27]. In 2017, CNN was shown to be suitable for traffic speed prediction of the entire road network by learning traffic as images [28]. However, due to the complexity of the topological structure of urban road networks, it is difficult for traditional CNNs to capture the spatial characteristics of irregular grid structures. The graph convolutional network (GCN) is therefore used to extract the spatial correlation between road sections. For example, a spatiotemporal GCN (STGCN) model was proposed to extract the spatiotemporal dependence of road network traffic speed data and make predictions [29]. In addition, a model called the diffusion convolutional recurrent neural network (DCRNN), which combines GCN and RNN, models the spatial dependence of traffic as a diffusion process on a directed graph and uses RNN to fit the temporal correlation [30]. An unsupervised algorithm named sparse autoencoder (SAE) was first used to identify and predict the traffic state [31]. Furthermore, a deep belief network (DBN) trained with a greedy unsupervised method was used to predict the traffic speed of an arterial in Beijing [32]. Unlike statistical machine learning algorithms, most neural network models do not provide a reasonable explanation of their predictions. However, neural network models have higher prediction accuracy, especially when dealing with large amounts of data.
Short-term traffic prediction can be affected by many factors that differ between prediction scenarios, so a single prediction model may not be suitable for all scenarios. Fusco et al. built a two-layer model combining a Bayesian network and a neural network and verified it with floating vehicle data [33]. Some scholars have also combined unsupervised learning algorithms with supervised learning algorithms. A prediction model combining SAEs and LSTM has been proposed, which uses SAEs and LSTM to extract the spatial and temporal correlations of traffic data, respectively [34]. In the research of Duan, CNN for extracting spatial features and LSTM for capturing temporal information were combined [35]. Methods based on wavelet transform have also been applied to traffic prediction. After the data are decomposed and reconstructed by WT, a neural network optimized by particle swarm optimization is used to predict each sequence separately [36]. A speed prediction study at the lane scale has also been proposed recently. The researchers built a two-layer deep learning framework that combines LSTM and GRU to predict the speed of different lanes [12]. A combined prediction model brings together different prediction algorithms and can exploit the advantages of each to obtain better prediction results.

Framework Overview.
A W-GRU-ARMA model is constructed for short-term traffic speed prediction in this section. As its name implies, the model is composed of three parts: wavelet transform (W), GRU, and ARMA. The model takes into account the spatiotemporal relationship of urban traffic speed data. It focuses on the short-term correlation of the traffic speed data of the predicted road section in the time dimension and on the spatial correlation with the upstream and downstream sections. Figure 1 shows the overall architecture. After the wavelet transform of the traffic speed time series, low-frequency and high-frequency sequences are obtained. Therefore, a GRU model for predicting the low-frequency trend sequence and an ARMA model for predicting the high-frequency random sequences are established. The final prediction result is obtained by summing the prediction results of the models. The input of the GRU prediction model contains the traffic speed data of the predicted road section and of its upstream and downstream sections. For example, if road section A is selected as the research object, the speed of road section A at t + 1 is the prediction target. Then, the input vector X of the GRU model is as follows:

$$ X = \left[ x^{i-1}_{t-m}, \ldots, x^{i-1}_{t},\; x^{i}_{t-m}, \ldots, x^{i}_{t},\; x^{i+1}_{t-m}, \ldots, x^{i+1}_{t} \right] \tag{1} $$

where x denotes the speed of a road section at a given time, i represents the predicted road section, i + 1 and i − 1 represent the downstream and upstream sections of the predicted road section, respectively, t represents the current time, and t − m represents the m-th previous time step. The value of m is determined according to the algorithm performance during the model training phase.
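The construction of the GRU input described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_windows` and the ordering of the values inside each window (upstream, studied, downstream) are assumptions made here for clarity.

```python
from typing import List, Sequence, Tuple


def build_windows(
    up: Sequence[float],
    mid: Sequence[float],
    down: Sequence[float],
    m: int,
) -> Tuple[List[List[float]], List[float]]:
    """Build (input, target) pairs for one studied road section.

    Each input holds the speeds of the upstream, studied, and downstream
    sections at times t-m, ..., t; the target is the studied section's
    speed at t+1. The ordering inside the window is an assumption.
    """
    if not (len(up) == len(mid) == len(down)):
        raise ValueError("the three series must have the same length")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for t in range(m, len(mid) - 1):
        window: List[float] = []
        for series in (up, mid, down):
            window.extend(series[t - m : t + 1])  # m + 1 values per section
        inputs.append(window)
        targets.append(mid[t + 1])
    return inputs, targets
```

With m previous time steps, each input has 3(m + 1) values, so changing m is the only adjustment needed to compare different input lengths.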

Wavelet Transform.
Wavelet transform is a method for processing nonstationary and nonlinear signals with the advantages of multiresolution and multiscale analysis. Wavelets provide an output in terms of the time-frequency scale, which can approach the original signal at any scale and capture the details of the original signal [37]. In nonstationary data analysis, wavelet transform is often used to extract the trend information of sequence changes [38]. The discrete wavelet transform (DWT) can decompose the original traffic speed data into a series of components with different frequencies. The Mallat algorithm efficiently decomposes the nonstationary traffic speed time series into sequences of different frequencies through high-pass and low-pass filters. The outputs of the low-pass filter and the high-pass filter, dA and dD, are defined in equations (2) and (3) and are called approximate coefficients and detail coefficients:

$$ dA[n] = \sum_{k} X[k]\, \varphi_l[2n - k] \tag{2} $$

$$ dD[n] = \sum_{k} X[k]\, \varphi_h[2n - k] \tag{3} $$

where X is the original signal, φ represents the filter, the subscripts l and h represent the low-pass filter and high-pass filter, respectively, and n and k are sample indices. Figure 2(a) demonstrates the process of the Mallat algorithm for two-level decomposition. The original time sequence X is put through both the low-pass filter and the high-pass filter, and the outputs are dA1 and dD1 of the first layer, respectively. Then, the obtained dA1 is passed through the two filters again to obtain the coefficients dA2 and dD2. After decomposition, time series components with different frequencies are obtained, but their lengths are not equal, because the sequence length is halved at each decomposition. Therefore, the inverse discrete wavelet transform (IDWT) is used to reconstruct the data from the approximate coefficients and detail coefficients, which gives sequences that have the same length as the original sequence but different frequencies. As shown in Figure 2(b), the approximate coefficient dA2 is used to reconstruct the low-frequency component to form the sequence A2; the detail coefficients dD1 and dD2 are used to reconstruct the high-frequency components to obtain the sequences D1 and D2.
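The two-level decomposition and reconstruction described above can be sketched in a few lines. To keep the example self-contained, the two-tap Haar wavelet is swapped in here; it is not the DB4 wavelet used in the experiments, and the function names are illustrative assumptions. The point of the sketch is the structure of the Mallat scheme: each level halves the length, and reconstructing one set of coefficients while zeroing the others yields full-length sequences A2, D2, and D1 that sum to the original series.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_dwt(x: List[float]) -> Tuple[List[float], List[float]]:
    """One Mallat step: low-pass/high-pass filtering plus downsampling by 2."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_idwt(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse step: rebuild a sequence of twice the coefficient length."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def decompose_two_levels(
    x: List[float],
) -> Tuple[List[float], List[float], List[float]]:
    """Return full-length sequences (A2, D2, D1) whose sum equals x."""
    if len(x) % 4 != 0:
        raise ValueError("length must be a multiple of 4 for two Haar levels")
    dA1, dD1 = haar_dwt(x)
    dA2, dD2 = haar_dwt(dA1)
    zeros1 = [0.0] * len(dD1)
    zeros2 = [0.0] * len(dD2)
    # Reconstruct each component with all other coefficients set to zero.
    A2 = haar_idwt(haar_idwt(dA2, zeros2), zeros1)
    D2 = haar_idwt(haar_idwt(zeros2, dD2), zeros1)
    D1 = haar_idwt([0.0] * len(dA1), dD1)
    return A2, D2, D1
```

Because the transform is linear, A2 + D2 + D1 reproduces the input exactly (up to rounding), which is why the predictions of the individual sequences can later be summed.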

Gated Recurrent Unit.
RNN has a wide range of applications in the field of time series analysis. It can implement a mechanism similar to the human brain and maintain a certain memory of the processed information. However, traditional RNN models are prone to vanishing and exploding gradients during training [39]. A variation of RNN called LSTM was proposed to solve this problem effectively. The cells of the hidden layers of LSTM have a special structure compared with traditional neuron nodes, which is the key to the ability of LSTM to learn the long-term dependence of time series. Figure 3(a) shows the cell structure of the hidden layer of LSTM. Information inflows, outflows, and updates of the previous state are achieved by adding input gates, output gates, and forget gates to this cell structure. The forget gate determines how much of the previous cell state is retained in the current cell state, the input gate determines how much of the input is retained in the current cell state, and the output gate determines the output of the current cell state.
Gated Recurrent Unit (GRU) is a variation of LSTM networks. It inherits the advantages of the RNN model: it automatically learns features and effectively models long-term-dependent information. It has been applied to short-term traffic prediction successfully [11,12]. Figure 3(b) shows the cell structure of the hidden layer of the GRU network, which is clearly simpler than that of LSTM. Intuitively, the input and forget gates in LSTM are integrated as an update gate in GRU [9], which determines how much of the information from the previous time is carried over to the current time. The other gate in GRU is called the reset gate; it determines how to combine the new input information with the information from the previous time. Therefore, GRU has one gate fewer than LSTM. In addition, the cell state and hidden state in LSTM are integrated as one hidden state in GRU. These changes give the GRU network fewer parameters and a faster training speed, and the network requires less data to generalize effectively [40]. The calculation formulae of GRU are as follows:

$$ z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{4} $$

$$ r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \tag{5} $$

$$ \tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right) \tag{6} $$

$$ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{7} $$

Formulae (4) and (5) show how the update gate $z_t$ and the reset gate $r_t$ are calculated in GRU neurons. $W_z$ denotes the weight of $z_t$, $W_r$ denotes the weight of $r_t$, and σ denotes the sigmoid function. The innermost term $[h_{t-1}, x_t]$ represents the concatenation of the vectors $h_{t-1}$ and $x_t$. A larger value of $z_t$ indicates that more information is taken from the current candidate state and less from the previous cell. When $r_t$ is equal to 0, the information from the previous cell is discarded. Formulae (6) and (7) show the calculation of the pending output value $\tilde{h}_t$ and the final output value $h_t$ of the GRU neural network. $h_{t-1}$ represents the output of the previous cell, W denotes the weight of $\tilde{h}_t$, and tanh denotes the hyperbolic tangent function. $\tilde{h}_t$ is obtained by multiplying $h_{t-1}$ of the previous cell by $r_t$, joining the result with $x_t$, multiplying by W, and applying the hyperbolic tangent function. $h_t$ is the sum of two vectors: one is obtained by multiplying $1 - z_t$ by $h_{t-1}$, and the other is obtained by multiplying $z_t$ by $\tilde{h}_t$.
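The gate computations described above can be traced with a small forward-pass sketch. It is written with plain Python lists so that it runs without any library; bias terms are omitted because the text does not mention them, and the helper names (`matvec`, `gru_step`) are illustrative assumptions rather than part of any framework API.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(
    x_t: Vector, h_prev: Vector, W_z: Matrix, W_r: Matrix, W: Matrix
) -> Vector:
    """One GRU step without bias terms.

    Each weight matrix has shape (hidden, hidden + input) and acts on the
    concatenation of a hidden-sized vector with the input x_t.
    """
    concat = h_prev + x_t                                # [h_{t-1}, x_t]
    z = [sigmoid(v) for v in matvec(W_z, concat)]        # update gate
    r = [sigmoid(v) for v in matvec(W_r, concat)]        # reset gate
    gated = [r_i * h_i for r_i, h_i in zip(r, h_prev)] + x_t
    h_cand = [math.tanh(v) for v in matvec(W, gated)]    # pending output
    # Blend previous state and candidate state with the update gate.
    return [
        (1.0 - z_i) * h_i + z_i * c_i
        for z_i, h_i, c_i in zip(z, h_prev, h_cand)
    ]
```

With all weights set to zero, both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state; this makes a convenient sanity check before plugging in trained weights.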

Autoregressive Moving Average Model.
The autoregressive moving average (ARMA) model is the most common type of time series model used for stationary random process analysis [41]. This method does not require strong similarity between the data at the predicted time and the data at the previous time. The method can smooth the predicted values at excessive fluctuations by averaging several measured data. The high-frequency sequences generated by WT have smooth fluctuations. Therefore, this article uses ARMA to predict the high-frequency sequences and to simulate the random fluctuation of the original data caused by the real-time and dynamic nature of traffic. The basic form of the autoregressive moving average model is ARMA (p, q), which consists of two parts, namely, the autoregressive model (AR) and the moving average model (MA). The basic expression is as follows:

$$ y_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i\, y_{t-i} + \sum_{j=1}^{q} \lambda_j\, \varepsilon_{t-j} $$

where c is a constant, $\varepsilon_t$ denotes the random error, which follows a Gaussian white noise distribution, φ and λ are the parameters of the AR and MA models, and p and q refer to the orders of the AR and MA models. On the left side of the equation, $y_t$ denotes the predicted result of the ARMA model, corresponding to the predicted value of the traffic speed on the high-frequency sequences at time t.
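As a minimal sketch of how an ARMA (p, q) model produces a one-step-ahead value, the function below evaluates the expression for given coefficients. It assumes that the constant and the AR and MA parameters have already been estimated (in practice a statistics library would do this); the function name and argument layout are hypothetical.

```python
from typing import Sequence


def arma_one_step(
    y_hist: Sequence[float],
    eps_hist: Sequence[float],
    c: float,
    phi: Sequence[float],
    lam: Sequence[float],
) -> float:
    """One-step-ahead ARMA(p, q) forecast.

    y_hist and eps_hist are ordered from oldest to newest. The unknown
    future error term is replaced by its expected value, zero.
    """
    p, q = len(phi), len(lam)
    if len(y_hist) < p or len(eps_hist) < q:
        raise ValueError("not enough history for the given orders")
    ar_part = sum(phi[i] * y_hist[-1 - i] for i in range(p))
    ma_part = sum(lam[j] * eps_hist[-1 - j] for j in range(q))
    return c + ar_part + ma_part
```

After the true value for the predicted step becomes available, the new error term is the difference between that value and the forecast, and it is appended to `eps_hist` for the next step.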

Data.
The data used to evaluate the performance of the proposed model were floating car speed data collected in Ningbo. The raw data were uploaded by the GPS equipment on about 4,300 taxis every day from July 1, 2017, to July 30, 2017. The GPS equipment records the running status of the vehicle every fifteen seconds. Each record of floating car speed data includes the record time, vehicle speed (instantaneous speed), location (latitude and longitude), and direction of travel. These high-frequency floating vehicle data can reflect detailed vehicle dynamics [42]. In this study, a representative busy area in Ningbo was selected, and the road network was divided into 283 sections based on the existence of intersections (Figure 4). The vector road network data were obtained from OpenStreetMap.
In order to use the raw data to estimate the average road speed, data cleaning was performed first. Erroneous records with incorrect time or location were deleted, and abnormal speed values were identified and removed by the interquartile range method [43]. Since the speed of urban sections is affected by intersections [42,44], the data uploaded by GPS while a vehicle was temporarily stopped at an intersection were kept, which makes the final estimated results more realistic. Then, the Feature Manipulation Engine (FME) platform was used to estimate the speed of the road sections. FME is a customizable software suite for the analysis, processing, and conversion of spatial and nonspatial data. Through this platform, a geometric map matching algorithm [45] was used to match the cleaned spatial data with direction attributes to the road network. Meanwhile, an algorithm for estimating the average vehicle speed at intervals of ten minutes was designed. In the algorithm, the average value of consecutive track points of the same vehicle on the same road section within the same ten minutes represents the average speed of a single vehicle (SV), and the final average speed of a road section (RV) is the mean of all SV values. After this, the missing values were imputed by linear interpolation [43]. Finally, 144 speed values are obtained every day, and we choose the data from 06:00 to 24:00 (108 time steps per day) as experimental data. A time-space traffic diagram, in which the x-axis is time, the y-axis is space, and the color represents speed [46], is used to demonstrate the processed data, as shown in Figure 5. Next, road section B, which is adjacent to the hospital, and road section A, a main channel of the city, are the main research objects, as shown in Figure 4. The calculated results of the two road sections are shown in Figure 6.
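The cleaning and aggregation steps described above (outlier removal by interquartile range, SV and RV averaging, and linear interpolation of missing values) can be sketched as follows. This is an illustration under stated assumptions, not the FME workflow used in the study: the 1.5 × IQR fence, the handling of gaps at the ends of a series, and all function names are choices made here, and map matching is left out.

```python
from collections import defaultdict
from statistics import mean, quantiles
from typing import Dict, List, Optional, Tuple


def iqr_filter(speeds: List[float], k: float = 1.5) -> List[float]:
    """Keep speeds inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is assumed."""
    q1, _, q3 = quantiles(speeds, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [s for s in speeds if low <= s <= high]


def section_speed(points: List[Tuple[str, float]]) -> float:
    """RV of one road section in one 10 min slot.

    points holds (vehicle_id, speed) pairs already matched to the section.
    SV is the mean speed per vehicle; RV is the mean of all SV values.
    """
    per_vehicle: Dict[str, List[float]] = defaultdict(list)
    for vehicle_id, speed in points:
        per_vehicle[vehicle_id].append(speed)
    return mean(mean(v) for v in per_vehicle.values())


def interpolate(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps linearly between the nearest known neighbours.

    Gaps at the start or end copy the nearest known value (an assumption).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out
```

Averaging per vehicle before averaging per section prevents a single taxi with many track points in a slot from dominating the section speed.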
Road section B has a low overall speed during the daytime, and its pattern is not obvious, whereas road section A has more obvious morning and evening peaks. Following the requirements of the prediction model and the data processing steps above, the average speed of each road section in every driving direction was obtained every 10 min from 06:00 to 24:00. This gives a time series of road section speeds with 108 values per day and a total of 3240 values over 30 days. The 30 days of road section speed data were divided into two parts at a ratio of 8 : 2. The data of the first 24 days were used to train the model, and the data of the following 6 days were used to test the model.

Accuracy Indicators and Experimental
The mean absolute percentage error (MAPE) and the root mean square error (RMSE) are used as accuracy indicators, as defined in equations (8) and (9). MAPE is the relative error of the prediction, and RMSE gives the deviation of the predicted value from the actual value. These measurements help us to better understand the prediction results:

$$ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{x_i - \hat{x}_i}{x_i} \right| \times 100\% \tag{8} $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{x}_i - x_i \right)^2} \tag{9} $$

where $\hat{x}_i$ is the predicted value at time i, $x_i$ is the actual value at time i, and n is the number of predicted values.
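For reference, the two accuracy indicators, the mean absolute percentage error (MAPE) and the root mean square error (RMSE), can be computed as in the sketch below, using their standard definitions. MAPE is returned in percent; the function names are illustrative.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the unit of the data."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)
```

Because MAPE divides by the actual value, it is only meaningful when the actual speeds are nonzero, which is one reason to report RMSE alongside it.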
According to the characteristics of wavelet decomposition, DB4 is used as the mother wavelet of the DWT to decompose and reconstruct the original data at two levels. Two high-frequency sequences and one low-frequency sequence are obtained for training and prediction. For the low-frequency sequence, min-max normalization is used to scale the input data to the range [−1, 1] before training the model, and the values predicted by the model are rescaled to the original range. After several experiments, a GRU network with two hidden layers of 256 units each is used to predict the low-frequency sequence. All neural network approaches were implemented using TensorFlow.
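The normalization step can be sketched as follows. The sketch assumes that the minimum and maximum are taken from the training data only and then reused for the test data and for mapping predictions back; the text does not state this detail, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_minmax(train: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of the training data (assumed fitting range)."""
    lo, hi = min(train), max(train)
    if lo == hi:
        raise ValueError("training data must not be constant")
    return lo, hi


def scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values linearly so that lo -> -1 and hi -> 1."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def unscale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Inverse of scale: map model outputs back to speed units."""
    return [(v + 1.0) * (hi - lo) / 2.0 + lo for v in values]
```

Test values outside the training range map slightly outside [−1, 1], which is expected when the range is fitted on the training days only.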

Results and Analysis.
In this section, a speed data set for urban road sections is used to evaluate the W-GRU-ARMA model. The validity of the model is verified by predicting the speed value of the road section in the next 10 minutes.
First of all, we applied the model to the experimental data of two road sections (A and B) with different traffic patterns. In this prediction experiment, it is important to choose an appropriate number of input time steps. The best input was determined by performing prediction experiments with input time steps from 1 to 8 in the GRU network. The experimental results are shown in Figure 7. Figure 7 shows that when the input time step increases from 1, the prediction errors decrease rapidly, and the best prediction results on both A and B are obtained when the input time step is 2. The RMSE and MAPE values of the prediction for road section A are 1.585 and 6.014, respectively, and those for road section B are 1.361 and 5.459, respectively. On the other hand, as the input time step continues to increase, the prediction errors do not decrease significantly. This means that the traffic status at a certain time has a strong correlation with the historical times closest to it. In addition, the prediction errors of the two road sections change similarly as the input time step increases, which may indicate that the traffic data generated by different road sections have similar short-term dependencies. Therefore, the input time step was set to 2 in subsequent experiments. In this prediction experiment, a total of six days of traffic speeds were predicted, including weekdays and weekends. The actual traffic speed, the predicted traffic speed, and the associated residual for road sections A and B on July 26 (Wednesday) and July 30 (Sunday) are shown in Figure 8 to illustrate the prediction results of the proposed model. Figure 8 shows that the actual values and the predicted values are in good agreement. The traffic pattern of road section A is more complicated on weekends, which makes the prediction more difficult. The proposed model, therefore, has slightly better prediction performance on weekdays than on weekends. On the other hand, the positions marked by the red rectangles also show that the proposed model can capture sudden changes of traffic speed well. In addition, the prediction results drawn in Figures 8(c) and 8(d) show that the model performs better on road section B, possibly because road section B presents a similar traffic pattern every day.
In order to find more details in the prediction results, the hourly prediction errors of the two road sections over the 6 days were calculated and are shown as box plots in Figure 9. As shown in Figure 9, the prediction error of a road section is not uniform across the times of the day. For both road sections A and B, the prediction performance is more stable during off-peak hours, and it fluctuates to some extent during the morning and evening travel peaks, especially on road section A. For road section A, the prediction performance is better at noon and in the evening. Furthermore, the prediction results for road section B show no significant regularity across time periods, because road section B is relatively busy all the time. It is worth mentioning, however, that the prediction error in the different periods always stays within a small range.
To verify the effectiveness of the proposed model, the prediction performance is compared with the GRU model, LSTM model, SAEs model, and ARIMA model. Tables 1 and 2 demonstrate the prediction performance of different models.
From Tables 1 and 2, it can be seen that the prediction performance of the proposed model is better than that of the ARIMA, GRU, SAEs, and LSTM models on the two experimental road sections, especially compared with ARIMA. This is because the ARIMA model assumes that the traffic condition is a stationary process, which is not always true in reality. In general, the prediction error of each model on road section B is lower than that on road section A. In the prediction for road section B, the performance of LSTM is better than that of GRU. Based on this finding, the experiment was extended to 12 road sections. The experimental results of the following four models are compared: W + GRU + ARMA, W + LSTM + ARMA, GRU, and LSTM. The prediction results are presented in Figure 10.
As can be seen in Figure 10, W + GRU + ARMA achieves better results than W + LSTM + ARMA when the prediction results of the ARMA model are the same. This means that GRU networks, which are simpler in structure than LSTM, have good potential for predicting the low-frequency sequence. From another point of view, the prediction performance of the W + LSTM + ARMA model is not as good as that of the pure LSTM on some road sections. This is because pure LSTM can better capture the sudden changes in traffic data [26]. When LSTM is used instead of GRU in our proposed model, it is possible that the smoothness of the data makes LSTM lose its advantage in some cases.
To verify the robustness of the proposed model, W-GRU-ARMA prediction was applied to the experimental data of 81 road sections in the study area. The prediction performance of different models is shown in Figure 11.
The prediction results shown in Figure 11 indicate that the proposed model is robust. The prediction error of the proposed model always fluctuates within a low range, and its performance is better than that of the other models on most road sections. According to Figure 11(c), the RMSE of the prediction is below 2.5 on more than 80% of the road sections. The MAPE, as shown in Figure 11(d), is below 10% on more than 90% of the road sections. These results verify the effectiveness and superiority of our model for short-term traffic prediction on road sections.

Conclusions
This paper proposed a new combination model for short-term vehicle speed prediction based on the spatiotemporal correlation of traffic data and unstable random fluctuations. The model consists of three components, namely, WT, the GRU network, and the ARMA model. Firstly, the historical traffic speed data are decomposed by WT into subsequences ordered from high to low frequency. The low-frequency sequences of adjacent road sections have stable trends and spatiotemporal relationships with one another. A GRU model is constructed that takes in these sequences and predicts the speed value of the next moment. At the same time, the ARMA model is used to predict the high-frequency sequences with unstable randomness. This enables the proposed model to fit the steady trend and the randomness of traffic speed data simultaneously. A prediction experiment using real traffic speed data generated by floating vehicles was carried out. The experimental results showed the advantage of the proposed model over previous models in terms of two performance measures: MAPE and RMSE.
We also obtained the following key findings from the experimental results. When GRU networks and LSTM networks are used to predict low-frequency sequences, GRU networks perform better, which suggests that GRU networks have great potential for smooth sequence prediction. In addition, although the traffic speed data have a strong time dependence, the prediction performance of the model does not keep improving as the input time step increases. It therefore makes sense to choose the best input before conducting a prediction experiment. It is also found that the prediction error of the proposed model differs slightly between periods of the day, implying that analyzing traffic data by time period and establishing corresponding models may improve the accuracy of traffic predictions. This will be the main content of our further research.
Data Availability
The taxi trajectory data used to support this study were provided by the Ningbo public transportation passenger transport administration and are not publicly available.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Supplementary Materials
File "road_B.xlsx" is the training data of road section B, and file "road_B_test.xlsx" is the test data of road section B. Each file contains four valid fields: "time," "up," "mid," and "down," where "up" represents the speed data of the upstream section, "mid" represents the speed data of the studied section, and "down" represents the speed data of the downstream section.