BiLSTM- and GNN-Based Spatiotemporal Traffic Flow Forecasting with Correlated Weather Data



Introduction
Traffic flow forecasting is a crucial task for transportation management and decision-making. Accurate traffic flow predictions can help to improve traffic control, optimize transportation infrastructure, and reduce travel time and fuel consumption [1,2]. However, traditional methods such as time series analysis and regression models often fail to capture the complex spatiotemporal patterns and correlations with weather data that can significantly impact traffic flow [3].
In recent years, there has been a growing interest in using deep learning models for traffic flow forecasting. Bidirectional long short-term memory (BiLSTM) networks have been shown to be effective in capturing the temporal dependencies of traffic flow [4]. Additionally, graph neural networks (GNNs) have been proposed to model the spatial dependencies and interactions between traffic nodes [5]. However, these models are often limited to using historical traffic flow data as input and do not consider the impact of weather conditions on traffic flow.
In this study, we propose a novel approach for traffic flow forecasting that combines the power of BiLSTM and GNN models with correlated weather data. Our approach leverages the ability of BiLSTM to capture the temporal dependencies of traffic flow and the ability of GNN to model the spatial dependencies and interactions between traffic nodes. We also incorporate weather data as an additional input to capture the impact of weather conditions on traffic flow. Through extensive experiments on real-world traffic flow data, we demonstrate the effectiveness and superiority of our proposed model in comparison to state-of-the-art methods.
The main goal of the research is to develop a spatiotemporal traffic flow forecasting model that captures the complex interactions between traffic flow and weather data and to improve the accuracy of traffic flow predictions.
The main contributions of this research are highlighted as follows: (1) We propose a novel hybrid model, named DAGNBL, that combines graph neural networks and bidirectional long short-term memory networks to forecast traffic flow in a particular area. (2) We take into account the impact of nonrecurrent events such as weather changes on traffic flow by incorporating local meteorological information into the model, which leads to a better understanding of spatiotemporal deviations of traffic patterns. (3) We implement a double attention approach to enhance the performance of our model and enable it to learn the dynamic spatial-temporal correlations of traffic data. Specifically, a spatial attention is used to simulate the complex spatial relationships between different locations, and a temporal attention is employed to capture the dynamic temporal links between different times.
The rest of this paper is organised as follows. The state-of-the-art methodologies for traffic flow prediction are reviewed in Section 2. The baseline models and methods are described in Section 3 along with our suggested model and an in-depth description of the datasets. The experimental setup is presented in Section 4. Section 5 presents the findings of the experiment. The study is concluded in Section 6, which also suggests areas for future research.

Literature Review
Over the past few years, many researchers have focused on the issue of traffic flow forecasting, driven mostly by the advantages it can offer in real-time traffic monitoring, including the authors of [6][7][8][9][10][11][12][13]. However, as most of the research focuses on the continuation of the existing situation into the future, the outcomes of these projections are frequently not very accurate. Traffic patterns are influenced by many factors such as construction and maintenance of road or roadside infrastructure, population or employment changes, holidays and other special events, weather, and even accidents caused by human error. In light of these difficulties, we intend to look into the issue of urban traffic flow forecasting using one of the external factors mentioned previously, notably weather conditions. It is also encouraging that, during the past 10 years, IoT has promised to increase the knowledge and productivity of transportation organisations. Sensors and other IoT-enabled equipment are able to gather and communicate data about activity occurring in the road network in real time. Transportation management can then examine the data obtained from these devices to manage the flow of traffic on the roads. The data can range from vehicle detection, vehicle volume, and pressure and speed measurements to road surface conditions and road weather conditions [14,15]. The road traffic data needed for an intelligent transport system are gathered from any of the sources shown in Figure 1. The problem of predicting traffic levels is ideally divided into long-term forecasting as addressed in [16][17][18][19][20] and short-term forecasting as explained by the authors of [21][22][23][24]. Short-term forecasts usually incorporate only a few months' worth of data from a small number of sensors, and they frequently focus on the near future, i.e., predicting 10-15 minutes ahead. Long-term forecasting, on the other hand, necessitates data from numerous sensors
gathered over a comparatively longer period, such as an hour or a day. This assists stakeholders in making long-term decisions, such as allowing passengers to schedule their travel according to peak traffic hours and the government to plan the construction of a flyover or overhead bridge in response to a route that consistently experiences traffic congestion. Time-series modelling is frequently used to tackle both types of forecasts and examine the difficulties in projecting traffic flow at a specific site. Usually, one attempts to forecast the value of a variable using a series of historical samples obtained at predetermined intervals. But predicting traffic is an extremely difficult task. Traffic volume and flow do not depend only on the driver or the vehicle; a number of other outside variables, such as on-road incidents or changes in the environment and weather, also have a significant impact on how traffic patterns change on the roads. When such external factors are present, the simple process of time series forecasting becomes a multivariate time series forecasting task. It is a difficult problem since one must simultaneously take into account intraseries temporal correlations and interseries correlations. However, these long- and short-term temporal patterns can now be analysed, learned, and predicted with accuracy, thanks to the development of machine learning and deep learning algorithms powered by big data from IoT devices. These algorithms have been useful in many fields, including forecasting of traffic [25][26][27], energy use [28], stock market analysis [29], pandemic outbreak [30], sales analysis [31], and price prediction [32]. Thus, it can be stated that, for the problem of forecasting traffic flow, if only a single site's
It is clear from the abovementioned explanation that predicting traffic in isolation without using neighbourhood traffic patterns or external factors, particularly weather conditions, is not very effective. The traffic volume data gathered from all nearby
sensors should be collected along with the external weather conditions since they have a significant impact on traffic flow for real-time and accurate traffic flow forecasting of a specific area. Traffic flow prediction research falls under the categories of parametric, machine learning, deep learning, and hybrid techniques as shown in Figure 3.
In this section, we will go over the research that has already been performed employing techniques for predicting both typical traffic patterns and traffic under difficult conditions.

Forecasting Traffic under Regular Road Conditions.
To forecast traffic flow, Xia et al. in [33] suggested a bidirectional LSTM network with attention and a normal distribution module. The attention mechanism is used to identify the high-impact attention weight values that have an impact on the targeted road segment, and it employs a five-second time window for the road segment. The normal distribution is utilized to identify the influence of spatial correlation. In another study [34], Wei and Sheng suggested a hybrid model consisting of a graph attention network and an LSTM network. In their work, they used a dynamic adjacency matrix to depict the spatial dependencies of the topological road network. The LSTM network was used to extract dynamic temporal features. Guo et al. in [35] use a fusion of spatial and temporal attention modules to forecast traffic flow. They used graph convolutions for spatial dependencies and normal convolutions for temporal dependencies. However, their work also lacks external influencing factors such as accidents or changing weather conditions. Li et al. in [36] have a similar approach where temporal and spatial attention modules are used, and a layer of dynamic graph convolution neural network is used to find the data. Thus, their approach incorporates a multisensor data connection convolution block with a benchmark adaptive mechanism correlation. Lu et al. in [37] suggested an LSTM outfitted with temporally aware convolutional context (TCC) and loss-switch mechanism (LSM) blocks. To suppress the data outliers, Chen et al. in [38] used a variety of denoising techniques, such as empirical and ensemble empirical mode decomposition and wavelet. The LSTM model's training data are used to make the predictions. In some other works, Ali et al. used support vector regression [39] and graph convolution networks with dynamic hash tables [40] to forecast traffic flows. Qiao et al.
in [41] use a 1D CNN with LSTM to predict traffic flows in an urban city. In [42], Bohan proposes a bidirectional recurrent neural network to predict traffic states utilizing both historical and future data in training the network.

Forecasting Traffic Correlated with Weather Information.
In a simple work performed by Jia et al. [43], first, an image matrix is constructed using the urban traffic data inflow and outflow. Then, the self-attention module is used to discover the internal relationships between pixels and record the internal organisation of the image. Finally, a deep Res2Net module is employed to forecast how many people will go through each area of the city based on previous trajectories and vacations. Zhang et al. in [44] and Ye et al. in [45,46] both used a graph convolution neural network with an attention mechanism and considered external factors while making predictions. Cui et al. in [47] use a stacked approach where the first layer is a BLSTM and the last layer is an LSTM. Their model has a sandwiched layer of either LSTM or BLSTM between the first and the last layers for capturing the spatiotemporal dependencies. A very similar work is proposed by Ma et al. in [48]. In conclusion, numerous studies using trustworthy LSTM models and graph neural networks have been explored in the literature in relation to the topic of traffic prediction. However, the promising potential of BiLSTM models for traffic time series prediction, which exploit temporal dependencies in past data, has received very little attention. Furthermore, there has not been much study performed on a system that combines the strength of graph neural networks with bidirectional LSTM networks. Table 1 summarizes the main deep-learning-based research conducted in spatiotemporal traffic flow forecasting.

Materials and Methods
The baseline architectures that we used in our model and the model that we present are described in depth in this section.

Deep Bidirectional Long Short-Term Memory Network.
The deep bidirectional LSTM network, first proposed in [58], is an extension of the standard LSTM network. Unlike its predecessor, it operates with two LSTM cells at each time step. The first is a forward LSTM cell, and the second is a reverse LSTM cell. This should not be confused with the forward pass and the backward pass used to train a neural network. The forward and reverse cells receive only inputs, and the output is collected by sending it through the sigmoid activation function. This allows us to preserve the long-term dependencies between the data features. The overall structure of the bidirectional LSTM cell is depicted in Figure 4.
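To make the forward/reverse distinction concrete, the following is a minimal sketch of the bidirectional wiring only. It assumes a toy exponential-smoothing recurrence (`run_cell`, a hypothetical helper) in place of a real LSTM cell, so it illustrates how the two passes are paired at each time step rather than the gating inside an LSTM.

```python
def run_cell(sequence, alpha=0.5):
    """Toy recurrent cell (NOT an LSTM): h_t = alpha * h_{t-1} + (1 - alpha) * x_t."""
    hidden = 0.0
    states = []
    for x in sequence:
        hidden = alpha * hidden + (1 - alpha) * x
        states.append(hidden)
    return states


def bidirectional(sequence, alpha=0.5):
    """Pair a forward state and a reverse state for every time step."""
    forward = run_cell(sequence, alpha)
    # The reverse cell reads the sequence from the last step to the first;
    # its outputs are flipped back so that index t refers to the same time step.
    reverse = run_cell(sequence[::-1], alpha)[::-1]
    return list(zip(forward, reverse))


states = bidirectional([1.0, 0.0])
print(states)
```

Each time step therefore carries one state that summarises the past and one that summarises the future of the input window.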

Graph Neural Network.
A graph neural network is based on a graph data structure consisting of a group of nodes and edges represented in Euclidean space. Nodes usually represent the feature vectors, and edges maintain the relationship between adjacent nodes. A GNN usually takes node attributes and finds an embedding for each node. The idea of the GNN was proposed in [59][60][61]. A graph is usually represented as a set of nodes and edges,
$$G = (N, E),$$
where $N = \{1, 2, 3, \ldots, n\}$ represents the set of nodes and $E$ represents the set of edges existing between any two nodes ($i$ and $j$) in the graph.
A graph adjacency matrix labels its rows and columns with the vertices of the graph, and each entry is 0 or 1 depending on whether a connection exists between the two corresponding nodes.
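As a small illustration of this definition, the sketch below builds an adjacency matrix from an edge list. It assumes an undirected graph and 0-based node indices; the function name and the example edges are hypothetical.

```python
def adjacency_matrix(num_nodes, edges):
    """Return a symmetric 0/1 adjacency matrix for an undirected graph."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1  # undirected: the connection holds in both directions
    return matrix


# Hypothetical chain of three sensors: 0 - 1 - 2
A = adjacency_matrix(3, [(0, 1), (1, 2)])
print(A)
```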
The objective of the graph neural network is to encode contextual graph information by combining the data from nearby nodes. Each node receives information from its neighbouring nodes at every iteration. After that, the information is merged with the already-existing features to create a usable function.
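One iteration of this neighbourhood aggregation can be sketched as follows. This is a simplified illustration, assuming an unweighted mean over neighbours that is then averaged with the node's own features; a trained GNN would use learned weights and a nonlinearity instead.

```python
def aggregate_step(features, adjacency):
    """One message-passing iteration with mean aggregation (no learned weights)."""
    n = len(features)
    updated = []
    for i in range(n):
        neighbours = [features[j] for j in range(n) if adjacency[i][j] == 1]
        if not neighbours:
            updated.append(list(features[i]))  # isolated node keeps its features
            continue
        dim = len(features[i])
        mean = [sum(v[d] for v in neighbours) / len(neighbours) for d in range(dim)]
        # Merge the received information with the node's existing features.
        updated.append([(own + msg) / 2 for own, msg in zip(features[i], mean)])
    return updated


adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # chain 0 - 1 - 2
features = [[1.0], [3.0], [5.0]]               # one hypothetical feature per node
print(aggregate_step(features, adjacency))
```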

Spatiotemporal Graph Neural Network.
A GNN whose graph changes over time is called a temporal graph neural network and is usually described in terms of $V_n$ and $V_e$, which represent the dynamic features of the nodes and edges, respectively. For a time series forecasting problem, it is important to combine a graph neural network with an RNN, LSTM, or GRU, which lets the overall network capture the spatial and temporal features together. Figure 5 illustrates the connection between the spatial and temporal features.

Description of Datasets.
For addressing the problem of intelligent transportation systems, four main categories of data may be required: emergency information, information about vehicles, information about traffic facilities, and information about traffic flow [86]. CityPulse is a live stream of IoT data from numerous sensors placed throughout the Danish city of Aarhus. Road traffic, pollution, weather, cultural, social, library event, and parking data are among the datasets that are available [39,87,88]. For our study, we use only the road traffic and weather data for the period of eight months from February 2014 to September 2014. Table 2 shows the description of parameters in the CityPulse road traffic data. It contains data collected from two linked sensors connecting two streets in the Danish city of Aarhus.
Other figures and tables display additional information and data about these observation points. LSTW is a national dataset that includes information on weather and traffic conditions in the United States, including traffic incidents (e.g., accidents and construction) and weather events (e.g., rain, snow, and storms). As of 2021, it contains approximately 37 million records of weather and traffic-related occurrences since August 2016. Figure 6 shows a map of Arhus indicating two observation points from street Arhusvej 72 to Arhusvej 0.

Experimental Setup
In the experimental setup, TensorFlow and the necessary packages were installed on Google Colab. The dataset containing both traffic flow and weather data was then uploaded and preprocessed using the Pandas library. The model was built using PyTorch with a combination of BiLSTM and GNN layers.
For the training of the proposed model, the following hyperparameters were used: batch size of 32, L2 regularization of 0.01 for both time series GNN and LSTM layers, Adam optimizer, 200 epochs, and a learning rate of 0.01. Additionally, a search space was defined for hyperparameter tuning, which included varying batch sizes, learning rates, regularization strengths, number of hidden units in LSTM and GNN layers, and number of attention heads in the model. In our experiments, we used a graph convolutional layer with 64 nodes and a 2-layer BiLSTM with 128 hidden units. We found that these hyperparameters resulted in a good trade-off between model complexity and predictive accuracy. In particular, we used multihead attention in our model, which allows for attention to be computed across multiple feature maps and has been shown to be effective in modelling spatial dependencies in graph data. The best-performing hyperparameters were chosen based on validation accuracy, and the chosen values were used for final model training and testing. Furthermore, data normalization was performed on both input features and target variables, which is a common practice for time series data. The training and testing data for this study were chosen based on a train-test split. The dataset was randomly divided into a training set and a testing set, with a ratio of 80:20. The training set was used to train the model, while the testing set was used to evaluate the performance of the model on unseen data. The model was then used to make predictions on the test data, and the performance was evaluated using metrics such as mean squared error, mean absolute error, and R-squared. This approach can be used to improve the accuracy of traffic flow forecasting by incorporating correlated weather data.
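The reported settings and the random 80:20 split can be summarised in a short sketch. The dictionary keys, the function name, and the fixed seed are assumptions made for illustration; only the numeric values come from the text above.

```python
import random

# Values as reported in the text; the key names are hypothetical.
HYPERPARAMS = {
    "batch_size": 32,
    "l2_regularization": 0.01,
    "optimizer": "adam",
    "epochs": 200,
    "learning_rate": 0.01,
    "graph_conv_units": 64,
    "bilstm_layers": 2,
    "bilstm_hidden_units": 128,
}


def train_test_split(rows, train_ratio=0.8, seed=42):
    """Randomly divide rows into training and testing sets (80:20 by default)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # fixed seed (an assumption) for repeatability
    cut = int(len(rows) * train_ratio)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


rows = list(range(10))  # stand-in for rows of the merged dataset
train, test = train_test_split(rows)
print(len(train), len(test))
```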

Proposed Model.
Our proposed model aims to accurately forecast traffic flow by utilizing a combination of BiLSTM, GNN, and attention mechanisms as shown in Figure 7. The model takes into account both the temporal dependencies in the traffic flow and weather data as well as the spatial relationships between the traffic sensors. The use of attention layers allows the model to weigh the importance of each feature in the input data and improve the accuracy of the final prediction. The main steps are defined as follows:
(1) An attention layer weighs the importance of each feature in the input traffic flow and weather data. This attention layer can be implemented using the attention mechanism from TensorFlow.
(2) A BiLSTM layer processes the sequential traffic flow and weather data. The BiLSTM will capture long-term dependencies in the data.
(3) Another attention layer weighs the importance of each feature in the spatial location of the traffic sensors. It can be used to focus on specific parts of
This architecture allows the model to weigh the importance of each feature in the traffic flow and weather data as well as the spatial location of the traffic sensors, which can improve the accuracy of the final prediction. A sample of information stored at nodes and edges can be visualized in Table 3. Table 4 explains the graph represented in Figure 8, where the nodes represent the sensor locations and edges represent the connection between two sensor points.
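The idea of an attention layer weighing each input feature can be illustrated with a softmax over per-feature scores. This is only a sketch: in a trained model the scores are learned, whereas here they are fixed, hypothetical numbers, and the feature values are invented for the example.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Scale each feature by its attention weight."""
    weights = softmax(scores)
    return weights, [w * x for w, x in zip(weights, features)]


# Hypothetical inputs: vehicle count, humidity, wind speed
features = [120.0, 0.8, 5.0]
weights, weighted = attend(features, scores=[2.0, 0.5, 0.5])
print(weights)
```

A higher score gives the corresponding feature a larger share of the total weight.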

Data Preparation for Hybrid Model.
The data required for the model training was prepared by merging the weather data and the road traffic data. For example, the road traffic data for the months of February to June and August to September were copied into a single .csv file. Table 5 shows the first few entries of the processed dataset for our model. The merged file contained more than 9000 k rows of data for the month of February to September for any two observation points at a particular. Figures 9(a)-9(c) show the visualisations generated from our processed dataset for the vehicle count from 14 February to 16 February 2014. Additionally, the flow pattern on weekdays and weekends was compared between various sensors that are situated on various road segments. One example comparison is shown in Figures 10(a 197274) where the last road segments connect two cities, i.e., Aarhus and Tilst. The graph used in this study was constructed using the CityPulse traffic and weather data. Each row of the merged dataset was considered as a node in the graph, and the edges were formed based on the pairwise Euclidean distances between the nodes. The edge weights were calculated using the traffic data (vehicle count and average speed) and traffic metadata (distance in meters and report ID). The node features were obtained from the weather data (humidity, dew point, and wind speed). The process of constructing the graph can be described using the following equations.

Node Feature Matrix.
Let X be the node feature matrix of size $N \times D$, where N is the number of nodes and D is the number of features. Each row of X corresponds to a node, and each column corresponds to a feature. In this study, X was constructed using the weather data (humidity, dew point, and wind speed). The edge weight matrix W was constructed using the following traffic data and metadata: $\text{VehicleCount}_{ij}$ is the number of vehicles between node i and node j, $\text{AvgSpeed}_{ij}$ is the average speed between node i and node j, $\text{Distance}_{ij}$ is the distance in meters between node i and node j, and $\text{ReportID}_{ij}$ is the report ID of the traffic data between node i and node j. By constructing the graph in this manner, we were able to incorporate both the traffic and weather data into our GNN model, which allowed us to predict traffic flow with high accuracy.
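As a rough illustration of how these inputs might be arranged, the sketch below builds the node feature matrix X from weather records and stores the traffic attributes per edge. All field names and numeric values are hypothetical, and the sketch does not reproduce the edge-weight formula itself.

```python
FEATURE_NAMES = ("humidity", "dew_point", "wind_speed")


def node_feature_matrix(records, feature_names=FEATURE_NAMES):
    """Build X with one row per node and one column per weather feature."""
    return [[record[name] for name in feature_names] for record in records]


# Hypothetical weather readings attached to two nodes
weather = [
    {"humidity": 80.0, "dew_point": 3.0, "wind_speed": 12.0},
    {"humidity": 75.0, "dew_point": 2.5, "wind_speed": 10.0},
]
X = node_feature_matrix(weather)

# Hypothetical traffic attributes for the edge between node 0 and node 1
edge_attributes = {
    (0, 1): {
        "vehicle_count": 14,
        "avg_speed": 52.0,
        "distance_m": 1030.0,
        "report_id": 158324,
    },
}
print(X)
```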

Results and Discussion
Two studies, GMAN [90] and STSGCN [91], utilize graph convolutional networks and multihead attention mechanisms to predict traffic flow. GMAN takes traffic sensor data as input and predicts traffic speeds at future time steps, while STSGCN uses spatiotemporal traffic data to make predictions. Both models outperform several baselines on traffic datasets, demonstrating the effectiveness of graph convolutional networks for traffic forecasting. However, neither GMAN nor STSGCN incorporates weather data into their models, making direct comparisons with our model inappropriate. Nevertheless, we compare our suggested model with established techniques and representative techniques
(v) ST-GNN: To more efficiently incorporate information on traffic flows from surrounding roads, a layer of a GNN with a position-specific focus mechanism was used. They combine an RNN with a transformer layer in order to capture the local and
We evaluated the accuracy of our approach using the root mean square error (RMSE), which is provided in equation (1). For each model, we calculated the discrepancy between the predicted and actual traffic counts in order to be straightforward and reasonable. A lower RMSE value indicates better prediction performance. Table 7 contrasts the RMSE values of our model with those of the reference models. Figure 11 shows the ground-truth and predicted daily traffic passenger flows of road segment "158324" (Arhusvej) for one day.
Figure 11 shows the prediction results compared to the ground truth after training for one sensor 158321 from February to June using the CityPulse dataset. According to Table 7, our model with BiLSTM and GNN with attention mechanism had the lowest RMSE, followed by the ST-GNN model and the GNN-DHSTNet model. Therefore, it can be inferred that the addition of the BiLSTM network to extract the temporal dependencies while preserving some external parameters such as dew point and air pressure has enhanced the overall prediction performance. One explanation for the results being different from those of the other models could be that our model used data from the variable road segments as input to obtain spatial dependencies from the graph neural network, as opposed to the GNN-DHSTNet model, which used a fixed 32 × 32 grid for building the graph representation. Another evaluation metric used was the mean absolute percentage error (MAPE). This metric expresses the absolute difference between the ground-truth values and the forecast values as a percentage of the ground-truth values. A forecast is deemed to be of acceptable accuracy when the MAPE value is low, usually less than 5%. The calculation method for MAPE is shown in equation (6).
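Both metrics can be computed in a few lines. The sketch below uses the standard textbook definitions of RMSE and MAPE, with MAPE assuming that no actual value is zero; the example values are hypothetical and are not taken from the experiments.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


actual = [100.0, 200.0]      # hypothetical vehicle counts
predicted = [110.0, 190.0]   # hypothetical forecasts
print(rmse(actual, predicted), mape(actual, predicted))
```

In this example both forecasts miss by 10 vehicles, so the RMSE is 10, while the MAPE averages a 10% and a 5% error to 7.5%.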
Figure 12 compares the evaluation metric results on both datasets. Min-max normalisation, also known as feature scaling, was used on both datasets to conduct a linear transformation on the raw data. Using this method, all scaled data fall within the range [0, 1]. The CityPulse dataset performs somewhat better, as can be seen from the graph, because it contains linking road IDs as segments, which greatly aided in the construction of the graph neural network.
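Min-max normalisation is a simple linear rescaling, sketched below. Mapping a constant column to zeros is an assumption made here to avoid division by zero; the text does not say how that case was handled.

```python
def min_max_scale(values):
    """Linearly rescale values so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # assumption: constant columns map to 0
    return [(v - low) / (high - low) for v in values]


scaled = min_max_scale([10.0, 20.0, 30.0])
print(scaled)
```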

Conclusion
In this article, we have put forth a hybrid model for spatial-temporal traffic flow forecasting on city roads. The suggested model incorporates a graph neural network with mechanisms for extracting spatial characteristics from various road segments while also paying attention to environmental variables such as percentage of dew, air pressure, and wind speed. Using the CityPulse and LSTW road traffic and weather datasets, a BiLSTM network with an attention mechanism has been proposed for the prediction while maintaining the temporal dependencies. The suggested approach is more suited for predicting the monthly traffic patterns in transportation hubs along significant road segments. Results show that our model has an MSE value of 6.309, an MAE value of 2.256, and an RMSE value of 2.511. Dew, humidity, and wind speed are the only three weather factors the model currently takes into account. Nevertheless, the dataset also contains numerous additional meteorological condition data, such as temperature, pressure, and wind direction. Another limitation of our study is that we do not account for the trajectory features during training due to the model's increasing complexity. In future work, we plan to conduct sensitivity and scalability experiments to explore the optimal values of input parameters and investigate the performance of the proposed method as the sizes of training and test sets change. We would also like to broaden the scope of our work to incorporate other factors affecting the traffic flow, such as festivals and accidents, as these and many other factors also affect how much traffic will be present on the roads. Finally, we intend to use ensemble forecasting to study the issue.

Figure 3: Classification of techniques used for solving the problem of traffic flow forecasting.
matrix fed to DBN + KELM for feature extraction and prediction; deep belief network and kernel extreme learning of intra- and interday traffic flow
2020; [57]; ST-GNN; METR-LA, PEMS-BAY; short term; position-wise attention layer of a graph neural network with an additional recurrent network and a transformer layer; GCNN + gated RNN
2022; [40]; DHSTNet, GCN-DHSTNet; NYC bike data and taxi Beijing data; short term; taking into consideration spatiotemporal dependencies as well as additional external considerations like road quality; CNN + LSTM, CNN + LSTM + GCNN
Journal of Advanced Transportation

Figure 4: A simple bidirectional LSTM cell with F-LSTM as forward LSTM cell and R-LSTM as reverse LSTM cell.

Figure 5: From static to dynamic features: exploring the relationship between space and time with a GNN and LSTM.

Pairwise Distance Matrix.
Let D be the pairwise distance matrix of size $N \times N$, where N is the number of nodes. The element $D_{ij}$ represents the Euclidean distance between node i and node j. In this study, D was constructed as follows:
$$D_{ij} = \sqrt{\sum_{k} \left(X_{ik} - X_{jk}\right)^{2}}. \tag{6}$$

Edge Weight Matrix.
Let W be the edge weight matrix of size $N \times N$, where N is the number of nodes. The element $W_{ij}$ represents the weight of the edge between node i and node j.
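The pairwise distance matrix of equation (6) can be sketched with the standard library's `math.dist`, which computes the Euclidean distance between two points. The function name and the two-dimensional example rows are hypothetical.

```python
import math


def pairwise_distance_matrix(X):
    """D[i][j] is the Euclidean distance between rows i and j of X."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


# Hypothetical feature rows for two nodes
X = [[0.0, 0.0], [3.0, 4.0]]
D = pairwise_distance_matrix(X)
print(D)
```

The result is symmetric with zeros on the diagonal, since each node is at distance zero from itself.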

Figure 8: A sample graph visualisation to show the connection between seven different road segments.
based on BiLSTM and GNN to showcase its efficacy. The following is a brief summary of the baselines.
(i) TFFNet: It simply creates a cubic spatiotemporal trajectory by dividing and matching the GPS trajectory data from each day's relevant locations. By integrating the sampling from each cube slice, a path is produced. A spatial grid is made by connecting each of the routes. The graph shows the volume of traffic in each grid cell over a 15-minute period. The model is trained using the Wuhan traffic dataset using a deep convolution neural network based on residual network architecture.
(ii) Dynamic GRCNN: They predict the movement of people in city traffic. They created incidence dynamic graph structures to replicate the traffic linkages from historical passenger movements among stations and used the SubwayBJ, BusBJ, and TaxiBJ datasets for training their model based on the LSTM and graph convolution network.
(iii) trafficBERT: They constructed a model of the transformer by stacking numerous layers of encoders in order to preserve the BERT properties. After that, by combining all the data, they were able to get the model to comprehend the full traffic flow. Their model used the METR-LA, PeMS-L, and PeMS-Bay datasets to anticipate traffic volume using a transformer-based BERT algorithm.
(iv) ST-TrafficNet: They suggested novel multidiffusion convolution blocks made up of attentive and bidirectional convolution for capturing spatial interactions. High-dimensional temporal data are kept in layered long short-term memory (LSTM) blocks. They employed LSTM with multidiffusion convolution blocks to extract and forecast spatiotemporal characteristics.
value, a = actual value, n = number of data points, i = current data point.

Figure 11: The predicted vs. ground-truth daily traffic flow of sensor 158324 from February to June.

Figure 12: The comparison of performance indicators on both datasets.

Table 2: The CityPulse road traffic dataset of nine parameters with descriptions and example values. The last column indicates whether the fields are selected for study or not.

Figure 6: Map of Hinnerup, Arhus indicating the starting and ending points of observations [89].

Table 4: Explanation of the nodes and edge information depicted in Figure 8.

Table 5: The header of the merged dataset for model training.
Table 6 shows the MAPE of our model compared with some of the other baseline models.
i = current data point.

Table 6: The comparison of the MAPE value of our model with the baseline models.

Table 7: The RMSE value of our model with baseline models.