Highway Traffic Speed Prediction in Rainy Environment Based on APSO-GRU

To accurately analyse the impact of a rainy environment on highway traffic flow characteristics, a short-term traffic flow speed prediction model based on the gated recurrent unit (GRU) and adaptive nonlinear inertia weight particle swarm optimization (APSO) is proposed. First, the rainfall and highway traffic flow data were cleaned and then matched according to their spatiotemporal relationship. Second, multivariate analysis of variance was used to explore the significance of the impact of potential factors on traffic flow speed. Then, a GRU-based traffic flow speed prediction model for the rainy environment was proposed and verified on actual road sections under different rainfall scenarios. Because the prediction accuracy of the GRU model was low in the continuous rainfall scenario, the APSO algorithm was used to optimize the parameters of the GRU network, and the resulting APSO-GRU prediction model was verified on the same road sections and rainfall scenarios. The results show that the APSO-GRU model is significantly more stable in prediction than the GRU model and extracts rainfall features better during continuous rainfall, with an average prediction accuracy of 96.74%.


Introduction
Rainfall is the most frequently occurring type of severe weather and has a serious impact on highway traffic safety. It is therefore important to study the traffic flow characteristics of highways in a rainy environment, to grasp how rainfall affects traffic flow, and to predict and analyse traffic flow reliably so that effective traffic control can be implemented [1][2][3][4].
With the improvement of the highway system and the continuous development of information observation and collection technology, scholars at home and abroad have continuously studied the impact of adverse weather on traffic flow characteristics [5][6][7][8]. At present, the data analysis and modelling system for the impact of weather factors on highway traffic flow is well established. In 1994, Ibrahim and Hall [9] studied the impact of adverse weather on the flow-occupancy and speed-flow relationships through regression analysis and showed that heavy rain and snow caused a 10%-20% and a 30%-48% reduction in maximum highway flow, respectively. In 2005, Agarwal et al. [10] used years of traffic flow data and contemporaneous weather data from the Northern United States to quantify the effects of adverse weather and roadway conditions on highway traffic flow. The results showed that heavy rain, snow, and low visibility resulted in a 10%-17%, 19%-27%, and 12% reduction in capacity, respectively, and a reduction in vehicle speed of 4%-7%, 11%-15%, and 10%-12%, respectively. In 2015, Li [11] derived the mean vehicle speeds for different rainfall intensities on highways from a statistical analysis of data, used standard deviations to measure the dispersion of vehicle speeds, and investigated the variability of vehicle speed across lane locations, vehicle types, and time periods during rainfall.
Traffic flow prediction models are mainly divided into models based on statistical theory, nonlinear theory prediction models, machine learning prediction models, and combined prediction models [12]. The theory behind nonlinear models is very complex, especially in its mathematical treatment. Such a model has a high degree of complexity and a large amount of calculation, which makes it suitable for more complicated emergency transportation systems. With the rise of artificial intelligence, machine learning has been applied more often to traffic flow prediction, and related prediction algorithms have emerged.
They are mainly divided into support vector machines, artificial neural networks, and deep learning [13]. In 2009, Castro-Neto et al. [14] proposed a supervised online SVR statistical learning model, which addressed the limited applicability of general models in atypical cases. The developed model outperformed models such as Gaussian maximum likelihood, Holt exponential smoothing, and artificial neural networks in typical and atypical traffic flow prediction. In 2013, Jeong et al. [15] proposed an online learning weighted SVR model (OLWSVR) for short-term traffic flow prediction, which outperformed prediction models such as locally weighted regression, conventional SVR, and online learning SVR. Smith and Demetsky [16] analysed short-term traffic flow prediction models based on neural networks and nonparametric regression. Cai et al. [17] proposed a radial basis function neural network optimized by an improved cuckoo search algorithm (CS-RBF) for highway traffic flow prediction under heavy rainfall, and the study showed that the algorithm has better prediction accuracy and convergence speed. In 2015, Lv et al. [18] proposed learning traffic flow features with a stacked autoencoder model trained in a greedy layer-wise unsupervised manner. Zhang and Wang [19] built an urban trunk road travel time prediction model based on a GRU network and simulated it using real road network data. In 2020, Wang et al. [20] proposed an LSTM travel time prediction model that considers rainfall data, and the results showed that predictions with rainfall features included were more accurate than those without them. Meng [21] proposed a combined LSTM-GRU model to predict the short-term traffic flow speed of highways on rainy days. This model adapts well to the uncertainty and sudden changes of traffic flow speed on rainy days.
Reviewing the above literature, we can identify the following research trends regarding the influence of the rainy environment on traffic flow characteristics and on traffic flow prediction. (1) In research on the influence of the rainy environment on traffic flow characteristics, most domestic and foreign scholars divide the rainfall intensity into levels [22]. Using the traffic flow data and rainfall data of actual road sections, they report how the macroscopic traffic flow parameters of a section change under different levels of adverse weather. However, comprehensive consideration of the impact of multiple factors on traffic flow is lacking. (2) In traffic flow prediction, the various prediction models rest on different principles. Currently, machine learning and deep learning models have become the mainstream of research in the field. Among the many previous studies on traffic flow prediction, few address prediction in a rainy environment, and most of the related studies merely add verification under rainy weather scenarios to a traditional prediction model. Therefore, in this paper, we consider the influence of multiple factors when studying the traffic flow characteristics of highways in a rainy environment. We also add rainfall features to the deep learning model to predict highway traffic flow speed in a rainy environment. Because the PSO algorithm can tune the hyperparameters of a deep learning model and thereby improve prediction performance, this article builds the APSO-GRU model.

Preprocessing of Traffic Flow Data.
The data in this article come from the floating car data of Beijing-Harbin Highway (JingHa Highway), Beijing-Tianjin Highway (JingJin Highway), Beijing-Taipei Highway (JingTai Highway), and Beijing-Kaifeng Highway (JingKai Highway). By fusing multisource floating car data and combining the original data with relevant geographic information through the MapInfo interface, the space-time matching of traffic flow data is completed. Data preprocessing shows that abnormal data account for 5% of the total original data. The raw traffic flow data are recorded in 5-minute intervals and span a total of six months, from June 1 to August 31, 2018, and from June 1 to August 31, 2019. The raw data include information such as highway section ID, section direction, average vehicle speed, and traffic flow. The data format is shown in Table 1.
In the table, the first 13 digits of Section_id indicate the number of a section of the highway, and the last digit indicates the direction of vehicle travel on the section, with 1 representing upward and 0 representing downward; Speed_avg indicates the average speed of all vehicles passing the section during the collection time; volume indicates the traffic volume of the section during the collection time.
The data cleaning is divided into two parts: rejection of erroneous data and repair of missing data. For the rejection of erroneous data, a "rule rejection method" is used, which integrates the threshold method and the basic theory of traffic flow [23]. For the missing original data [24] of traffic flow, a simple nearest neighbor mean fill method is used. It combines the mean filling method, which replaces the missing data with the mean of the existing data, and the nearest neighbor interpolation method, which replaces the missing value with observations near it. The average of the valid data adjacent to the missing data is taken as the fill value, as in the following equation:

$$H_t = \frac{H_{t-1} + H_{t+1}}{2} \qquad (1)$$

where $H_t$ represents the missing traffic flow data of the $t$-th cycle, including V and Q, and $H_{t-1}$, $H_{t+1}$ are the traffic flow data of the two cycles adjacent to the $t$-th cycle.
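The fill rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `fill_missing`, the use of `None` to mark missing records, and the fallback to a single neighbour at the start or end of the series are all assumptions.

```python
from typing import List, Optional


def fill_missing(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing value (None) with the mean of its valid neighbours.

    Implements H_t = (H_{t-1} + H_{t+1}) / 2. Using the single available
    neighbour at a series boundary is an assumption of this sketch.
    """
    filled = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        # Only originally observed neighbours are used, never filled ones.
        neighbours = [
            series[k]
            for k in (t - 1, t + 1)
            if 0 <= k < len(series) and series[k] is not None
        ]
        if neighbours:
            filled[t] = sum(neighbours) / len(neighbours)
    return filled


speeds = [82.0, None, 78.0, 75.0, None]
print(fill_missing(speeds))
```

A value whose two neighbours are both missing is left as `None`, so longer gaps remain visible instead of being filled with low-confidence estimates.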

Preprocessing of Rainfall Data.
The rainfall data were obtained from Beijing Nanjiao Observatory Station (No. 54511), Tongzhou Station (No. 54431), and Daxing Station (No. 54594). In this paper, only hourly rainfall data and their corresponding dates and times are extracted. A total of 4397 meteorological records from Daxing District and 4401 meteorological records from Tongzhou District were extracted, and the format of the rainfall data is shown in Table 2. The time item indicates the end moment of the data collection time, and the rainfall amount is the accumulated rainfall within the data collection time.
Only a small amount of rainfall data was found to be missing through inspection. According to the method of traffic flow data filling, the average value of rainfall in the adjacent hours of the missing data was used to fill in the missing data.

Spatial and Temporal Matching.
The recording period of the traffic flow data is 5 minutes, while the recording period of the rainfall data is 1 hour, so the two data sets must be matched in time granularity. High-precision rainfall data are currently not available, and it is difficult to decompose long-period data into short-period data. In addition, the time accuracy of the floating car data acquisition system needs to be improved. Even if high-precision rainfall data were obtained, the space-time error of matching the two data sets would be difficult to evaluate. It is therefore more reasonable to combine the traffic flow data from the 5-minute recording period into a 1-hour recording period according to equation (2): the hourly flow is the sum of the twelve 5-minute flows, $C = \sum_{i=1}^{12} C_i$, and the hourly speed $V$ is obtained by averaging the 5-minute speeds $V_i$ within the same hour. Here $C$ represents the traffic flow on the section in one hour (veh/h); $C_i$ represents the traffic flow on the section in 5 minutes (veh/5 min); $V$ represents the average speed of vehicles on the section in one hour (km/h); and $V_i$ represents the average vehicle speed on the section in 5 minutes (km/h). Based on the weighted average method, the average vehicle speed time series and the traffic flow time series of each highway are calculated. The weight of a road section is the proportion of its length in the whole highway. The calculation method is shown in the following equation:

$$V = \sum_{n=1}^{K} \frac{L_n}{L} V_n, \qquad C = \sum_{n=1}^{K} \frac{L_n}{L} C_n \qquad (3)$$

where $V$ represents the average speed of vehicles on the highway (km/h); $V_n$ represents the average vehicle speed of the road section (km/h); $L_n$ represents the length of the road section (m); $L$ represents the length of the highway (m); $K$ represents the total number of sections of the highway; $C$ represents the overall traffic volume of the highway (veh/h); and $C_n$ represents the traffic volume of the road section (veh/h). The traffic flow time series and the regional rainfall time series are merged by their corresponding times to complete the spatiotemporal matching, and the format of the matched data is shown in Table 3.
In the table, Date_hour indicates the date and time; precipitation indicates the rainfall amount in mm/h; Volume_sum indicates the traffic flow in veh/h; and Speed_avg indicates the average vehicle speed in km/h.
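The two aggregation steps described above (5-minute records to hourly records, then sections to a whole highway) can be sketched as follows. The function names are hypothetical. The text fixes the hourly flow as a sum and the highway values as length-weighted averages; the choice of a flow-weighted mean for the hourly speed is an assumption of this sketch, and a plain arithmetic mean would be an equally plausible reading.

```python
from typing import Sequence, Tuple


def to_hourly(flows_5min: Sequence[float],
              speeds_5min: Sequence[float]) -> Tuple[float, float]:
    """Combine twelve 5-minute records of one section into an hourly record.

    Hourly flow C is the sum of the 5-minute flows C_i. Hourly speed V is
    taken as the flow-weighted mean of the V_i (assumption of this sketch).
    """
    if len(flows_5min) != 12 or len(speeds_5min) != 12:
        raise ValueError("expected twelve 5-minute records per hour")
    hourly_flow = sum(flows_5min)
    if hourly_flow == 0:
        # No vehicles observed: fall back to the plain mean of the speeds.
        return 0.0, sum(speeds_5min) / 12
    weighted = sum(c * v for c, v in zip(flows_5min, speeds_5min))
    return hourly_flow, weighted / hourly_flow


def highway_average(values: Sequence[float],
                    lengths_m: Sequence[float]) -> float:
    """Length-weighted average over K sections: sum((L_n / L) * value_n)."""
    total_length = sum(lengths_m)
    return sum(l / total_length * x for l, x in zip(lengths_m, values))


flow, speed = to_hourly([10] * 12, [80.0] * 12)
print(flow, speed)
print(highway_average([60.0, 90.0], [1000.0, 3000.0]))
```

The same `highway_average` helper serves both speed and flow, because the text applies the identical length weights $L_n / L$ to each.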

Analysis of the Factors Influencing the Traffic Flow Speed.
The traffic flow speed of a highway is affected by many factors. Four potential factors, namely rainfall intensity, date category, time period, and number of lanes, are selected, and multivariate analysis of variance is used to explore whether they affect the traffic flow speed of the highway. The statistical analysis of highway traffic flow characteristics shows that the "morning peak hour" of a highway lags behind that of urban roads. Before the peak hour arrives, both traffic flow speed and traffic flow clearly go through two phases, first decreasing and then increasing. Through observation, it is found that dividing the day into periods of four hours each is more reasonable, as shown in Table 4. SPSS software was used for the multivariate analysis of variance, and the output of SPSS is shown in Figure 1. The results show that the four factors have a significant influence on the traffic flow speed. In addition, the interaction of date category, time period, and number of lanes has a significant impact on traffic flow speed. The other combinations of factors have no significant effect on traffic flow speed. It can be seen that the standard speed of highway vehicles on rainy days decreases as rainfall intensity increases. In terms of date category, weekends are more vulnerable to rainfall than working days. The slope of the "rainfall intensity versus standard speed" relationship of the four-lane highways in the two areas is greater than that of the three-lane highways, which indicates that the four-lane highways are more vulnerable to rainfall.

Speed Distribution Characteristics of Traffic Flow in Different Periods.
Considering the different levels of rainfall intensity, date category, and number of lanes, the distribution statistics of the standard speed of vehicles in different periods of each highway are carried out, as shown in Figure 3.
It can be seen from Figure 3 that, when other factors are the same, the standard speed of the four-lane highways is generally lower than that of the three-lane highways, which indicates that their traffic flow speed is more easily affected. The same rainfall intensity influences the traffic flow speed differently in different periods of the day: the morning peak and evening peak are more easily affected by rainfall, whereas the traffic flow speed in the first, second, and sixth periods is relatively less susceptible to rainfall. These differences become more pronounced as rainfall intensity increases.

Design of Traffic Flow Speed Prediction Model Based on GRU.
The proposed GRU model is composed of three parts, i.e., an input layer, a hidden layer, and an output layer. The output layer is a fully connected dense layer. The Adam algorithm is selected as the optimizer for the internal weights of the model. The structure of the prediction model is shown in Figure 4 [25].
The input of the model is a time series matrix composed of traffic flow speed, traffic flow, and rainfall, and the output is traffic flow speed. MAE is used as the loss function because MSE is more affected by outliers, while MAE is more stable. After validation on actual data, the parameters of the GRU prediction model are set as follows: the number of hidden layer nodes is 15, the dropout rate is 0.3, the batch size is 200, the number of epochs is 180, and the learning rate is 0.004. The training set and test set are divided in a ratio of 4:1.
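The data side of this setup can be sketched without committing to a particular deep learning library. The snippet records the reported hyperparameters, builds input windows of (speed, flow, rainfall) rows with the next-hour speed as target, splits them 4:1, and defines the MAE loss. The window length `LOOKBACK`, the function names, and the decision to split in chronological order are assumptions; the text does not state them.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, float, float]  # (speed km/h, flow veh/h, rainfall mm/h)

# Values reported in the text for the GRU prediction model.
GRU_CONFIG = {
    "hidden_units": 15,
    "dropout": 0.3,
    "batch_size": 200,
    "epochs": 180,
    "learning_rate": 0.004,
}
LOOKBACK = 6  # assumption: hours of history fed to the model


def make_windows(rows: Sequence[Row],
                 lookback: int) -> Tuple[List[List[Row]], List[float]]:
    """Build (input window, next-hour speed) pairs from an hourly series."""
    inputs: List[List[Row]] = []
    targets: List[float] = []
    for end in range(lookback, len(rows)):
        inputs.append(list(rows[end - lookback:end]))
        targets.append(rows[end][0])  # speed in the hour right after the window
    return inputs, targets


def split_4_to_1(samples: Sequence) -> Tuple[list, list]:
    """Split samples 4:1 in time order (the ordering is an assumption)."""
    cut = len(samples) * 4 // 5
    return list(samples[:cut]), list(samples[cut:])


def mae_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, the training loss named in the text."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


rows = [(80.0 - i, 1200.0 + 10 * i, 0.5 * i) for i in range(30)]
x, y = make_windows(rows, LOOKBACK)
x_train, x_test = split_4_to_1(x)
print(len(x), len(x_train), len(x_test))
```

Splitting in time order keeps every test hour later than every training hour, which avoids leaking future observations into training.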

PSO Algorithm.
Some parameters of the GRU model are adjusted automatically during training, while others must be set manually. The latter are called hyperparameters and include the number of hidden layers, the number of hidden layer nodes, the number of iterations, etc. How reasonably the hyperparameters are set directly affects the convergence speed of the model and the accuracy of prediction. Therefore, this section uses the PSO algorithm to optimize the GRU model.
The particle swarm optimization algorithm is described in a D-dimensional search space, in which N different particles form a search population. The current position of the $i$-th particle is $x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, its velocity is $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, and the best location found so far by the individual is $p_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, which is called the individual extremum [26]. The optimal position of the whole population is called the global extremum, $g = (g_1, g_2, \ldots, g_D)$. The current position of a particle corresponds to a candidate solution of the optimization problem, and the flight process is the search process of the individual. Each particle iterates continuously to update its velocity and position, which are determined by equations (4) and (5), respectively:

$$v_i(t+1) = v_i(t) + c_p r_1 \left(p_i(t) - x_i(t)\right) + c_g r_2 \left(g(t) - x_i(t)\right) \qquad (4)$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \qquad (5)$$

where $v_i(t)$ represents the velocity of the $i$-th particle at time $t$; $x_i(t)$ represents the position of the $i$-th particle at time $t$; $c_p$, $c_g$ represent the acceleration coefficients, where $c_p$ is the cognitive learning factor and $c_g$ is the social learning factor, representing the self-learning ability of a particle and its ability to learn from the best individual of the group, respectively, with $c_p, c_g > 0$; $r_1$, $r_2$ represent random numbers uniformly distributed in the interval (0, 1); $p_i(t)$ represents the historical optimal position of the $i$-th particle at time $t$; and $g(t)$ represents the global optimal position of the particle swarm at time $t$.

Table 4 (time periods):
Period 2 (4:00-8:00): rising period; the speed and flow increase rapidly.
Period 3 (8:00-12:00): morning peak; the speed and flow are high.
Period 4 (12:00-16:00): afternoon flat peak; the speed and flow decrease smoothly and slowly.
Period 5 (16:00-20:00): evening peak; the flow reaches its peak again, and the speed decreases gradually.
Period 6 (20:00-24:00): night low period; the speed and flow decrease rapidly.
To further improve the performance of the PSO algorithm, Shi introduced a new parameter, the inertia weight $w$ [27], into the particle velocity update formula of the original PSO algorithm, and equation (4) becomes

$$v_i(t+1) = w\, v_i(t) + c_p r_1 \left(p_i(t) - x_i(t)\right) + c_g r_2 \left(g(t) - x_i(t)\right) \qquad (6)$$

The inertia weight determines how strongly the particle velocity at the previous time influences the current velocity, which effectively balances global search and local search. Equation (6) consists of three parts. The first part is inertial motion, which indicates the degree to which the particle maintains its velocity from the previous moment. The second part is cognitive learning, which means that particles memorize their own historical optimal position and move towards it. The third part is social learning, which represents the information exchange between particles and moves particles towards the historical optimal position of the population [28]. It can be seen from the velocity update that particles have memory and move towards the optimal particle by combining their own experience with that of the group. Equations (5) and (6) constitute a new PSO algorithm called the standard PSO algorithm (SPSO).
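A compact sketch of the standard PSO loop, with the inertia-weight velocity update followed by the position update, is given below for a generic minimization problem. The function name `pso_minimize`, the default values of `w`, `c_p`, and `c_g`, the zero initial velocities, and the clamping of positions to the search bounds are assumptions of this sketch. In the setting of this paper the objective would be the validation error of a GRU trained with the candidate hyperparameters; a simple sphere function stands in for it here.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iter: int = 100,
    w: float = 0.7,      # inertia weight (assumed constant here)
    c_p: float = 1.5,    # cognitive learning factor (assumed value)
    c_g: float = 1.5,    # social learning factor (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` with standard PSO; returns (best position, value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]                  # individual extrema p_i
    p_val = [objective(xi) for xi in x]
    best = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[best][:], p_val[best]  # global extremum g

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive + social terms.
                v[i][d] = (w * v[i][d]
                           + c_p * r1 * (p_best[i][d] - x[i][d])
                           + c_g * r2 * (g_best[d] - x[i][d]))
                # Position update, clamped to the search bounds.
                lo, hi = bounds[d]
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            val = objective(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = x[i][:], val
                if val < g_val:
                    g_best, g_val = x[i][:], val
    return g_best, g_val


def sphere(z: Sequence[float]) -> float:
    return sum(t * t for t in z)


print(pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)]))
```

Integer hyperparameters such as the number of hidden nodes can be handled by rounding the corresponding coordinate inside the objective.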

Adaptive Nonlinear Inertia Weight PSO.
To mitigate the tendency of the PSO algorithm to fall into a local optimal solution and to reasonably balance its local search and global search abilities, an adaptive nonlinear inertia weight method is used to adjust the inertia weight, as shown in equation (7), where $w_{max}$, $w_{min}$ are the maximum and minimum values of the inertia weight; $f_i$ represents the fitness value of particle $i$; and $f_{min}$ represents the minimum fitness of all current particles.
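To illustrate how an inertia weight can adapt to $w_{max}$, $w_{min}$, $f_i$, and $f_{min}$, the sketch below uses one widely used adaptive rule for minimization problems. It is an assumed stand-in and not necessarily the form of equation (7): the average fitness `f_avg`, the default values 0.4 and 0.9, and the function name are all assumptions. Particles that are better than average receive a smaller weight and search locally, while the remaining particles keep the maximum weight and continue to explore.

```python
def adaptive_inertia(f_i: float, f_min: float, f_avg: float,
                     w_min: float = 0.4, w_max: float = 0.9) -> float:
    """Assumed adaptive inertia weight for a minimization problem.

    f_i   : fitness of particle i
    f_min : minimum fitness among all current particles
    f_avg : average fitness of all current particles (extra symbol, assumed)
    """
    if f_i <= f_avg and f_avg > f_min:
        # Better-than-average particle: scale the weight between w_min and w_max.
        return w_min + (w_max - w_min) * (f_i - f_min) / (f_avg - f_min)
    # Worse-than-average particle, or all fitness values equal: explore.
    return w_max


fitness = [1.0, 2.0, 3.0, 6.0]
f_min, f_avg = min(fitness), sum(fitness) / len(fitness)
print([round(adaptive_inertia(f, f_min, f_avg), 3) for f in fitness])
```

In a PSO loop this value would replace the constant inertia weight on a per-particle basis at every iteration.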

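Experimental Verification Scenario and Evaluation Indexes.
In the field of traffic flow prediction, the most common error measures include the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE) [29]. Their calculations are shown in equations (8)-(10). RMSE and MAPE are used as error functions to evaluate the performance of the prediction model.

$$\mathrm{MAE} = \frac{1}{L} \sum_{t=1}^{L} \left| Y_p(t) - Y(t) \right| \qquad (8)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{L} \sum_{t=1}^{L} \left( Y_p(t) - Y(t) \right)^2} \qquad (9)$$

$$\mathrm{MAPE} = \frac{100\%}{L} \sum_{t=1}^{L} \left| \frac{Y_p(t) - Y(t)}{Y(t)} \right| \qquad (10)$$

where $L$ represents the total length of the time series; $Y_p(t)$ represents the predicted value at time $t$; and $Y(t)$ represents the true value at time $t$.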
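The three error measures follow directly from their standard definitions and can be computed as in the short sketch below. The function names are illustrative, and MAPE is returned in percent; it is undefined when a true value is zero, which does not arise for positive speeds.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(
        sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    return 100.0 * sum(
        abs((p - y) / y) for y, p in zip(y_true, y_pred)
    ) / len(y_true)


true_speed = [100.0, 80.0]   # km/h, made-up values for illustration
pred_speed = [90.0, 88.0]
print(mae(true_speed, pred_speed),
      rmse(true_speed, pred_speed),
      mape(true_speed, pred_speed))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE expresses the error relative to the observed speed, which makes results comparable across highways with different speed levels.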
Using the established GRU and APSO-GRU traffic flow speed prediction models, the traffic flow speeds of Beijing-Harbin Highway, Beijing-Tianjin Highway, Beijing-Taipei Highway, and Beijing-Kaifeng Highway in Beijing are each predicted. In addition, the prediction performance of the support vector regression (SVR) model is compared with that of the APSO-LSTM model.
In order to more comprehensively and accurately evaluate the performance of the model under different rainfall scenarios, the rainy environment is divided into two categories: noncontinuous rainfall and continuous rainfall. Noncontinuous rainfall is the situation that the rainfall process is relatively short and sparse in a specific period of time, while continuous rainfall is the situation that the rainfall process is relatively long and dense in a specific period of time.

Prediction Results of Noncontinuous Rainfall.
The time span of noncontinuous rainfall in Tongzhou District is from 0:00 on 9th August (working day) to 24:00 on 10th August (weekend) in 2019, and the time span of noncontinuous rainfall in Daxing District is from 0:00 on 11th August (weekend) to 24:00 on 12th August (working day) in 2019.
Because of space limitations, only the prediction results for Beijing-Harbin Highway are visualized, as shown in Figure 6. The horizontal axis represents time, and the left vertical axis represents speed, corresponding to the line chart. The right vertical axis represents precipitation, corresponding to the histogram. This layout is also used in the following figures.
Under the noncontinuous rainfall scenario, the traffic flow speed of the highway is obviously disturbed during rainfall (moderate rain and heavy rain). The prediction error is shown in Table 5.

Prediction Results of Continuous Rainfall.
The time span of continuous rainfall in Tongzhou District and Daxing District is from 0:00 on 28th July 2019 (weekend) to 24:00 on 29th July 2019 (working day). The prediction results of the traffic flow speed of Beijing-Harbin Highway during continuous rainfall are shown in Figure 7.
It can be seen from the figure that, under the continuous rainfall scenario, the traffic flow speed of the highway is greatly affected, and the operation state fluctuates continuously. In this unstable situation, the trend of the predicted values of each model is basically consistent with the real values, but the prediction accuracy is significantly lower than under noncontinuous rainfall. The prediction error is shown in Table 6.

Comparative Evaluation and Analysis of Prediction Results.
The average prediction errors of the traffic flow speed for each model, rainfall scenario, and highway are given in Table 7.
For traffic flow speed prediction, the APSO-GRU model is more accurate than the GRU model and the APSO-LSTM model under both rainfall scenarios, and the SVR model is the least accurate, which verifies the performance improvement of the deep learning models that were built. The average prediction accuracy of the GRU model, APSO-GRU model, APSO-LSTM model, and SVR model is 95.96%, 97.16%, 96.42%, and 94.25%, respectively. The average prediction accuracy of the APSO-GRU model is 1.20% higher than that of the GRU model and 0.74% higher than that of the APSO-LSTM model. Under the continuous rainfall scenario, the average prediction accuracy of the APSO-GRU model is 96.75%, which is 2.38% and 2.22% higher than that of the GRU model and the APSO-LSTM model, respectively. In this scenario the prediction accuracy of every model declines, but the decline of the APSO-GRU model is small, followed by the GRU model. The prediction accuracy of the APSO-LSTM model and the SVR model is lower than that of the former two models.

Conclusions
The main conclusions obtained in this paper are as follows:
(1) Based on the results of the multivariate analysis of variance, rainfall intensity, date category, time of day, and number of lanes have significant effects on traffic flow speed. The higher the rainfall intensity is, the more the traffic flow is affected. Traffic flow is more likely to be affected by rainfall on weekends than on weekdays, and more likely to be affected during the daytime (especially the AM peak and PM peak) than at night.
(2) An APSO-GRU traffic flow speed prediction model was built for the rainy environment. Under the noncontinuous rainfall scenario, the average prediction accuracy of the APSO-GRU model reaches 97.33%, which is 1.19% and 0.71% higher than that of the GRU model and the APSO-LSTM model, respectively. Under the continuous rainfall scenario, the average prediction accuracy of the APSO-GRU model reaches 96.74%, which is 2.69% and 2.39% higher than that of the GRU model and the APSO-LSTM model, respectively. The results show that the prediction accuracy and stability of the APSO-GRU model are significantly improved compared with the APSO-LSTM model under different rainfall scenarios.
(3) Comparing the traffic flow speed prediction results of the machine learning model SVR with those of the deep learning model APSO-LSTM shows that the prediction accuracy of APSO-LSTM is higher than that of the SVR model by 2.18% and 5.55% under the noncontinuous and continuous rainfall scenarios, respectively. This indicates that the LSTM-based model predicts more accurately and more stably than the SVR model, which supports the conclusion that the deep learning model outperforms the traditional SVR model in prediction.

Data Availability
The data used to support the findings of this study have not been made available because of data ownership issues.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.