Attention Mechanism with Spatial-Temporal Joint Deep Learning Model for the Forecasting of Short-Term Passenger Flow Distribution at the Railway Station



Introduction
With its speed, exceptional comfort, high safety standards, and environmental friendliness, high-speed rail faces significant travel demand worldwide. Railway stations serve as the central hubs of the high-speed railway network, responsible for the pivotal task of gathering and dispersing passengers. For instance, Berlin Central Station and Shanghai Hongqiao Station experience daily passenger inflows of 250,000 and 126,000, respectively. Given the substantial influx of passengers into railway stations, frequent peaks in passenger activity occur within the station premises, particularly driven by train operation events. These peaks lead to a significant concentration of passengers in specific areas within a short timeframe, generating substantial traffic demand. Such scenarios can exert considerable pressure on public security and may result in various negative incidents, including stampedes.
Predicting the real-time dynamics of in-station passenger flow is essential for ensuring efficient and safe station crowd management. However, railway stations primarily estimate passenger flow distribution through manual and video monitoring, relying on empirical methods to assess crowd tendencies and implementing reactive control measures accordingly. Without the utilization of mathematical programming and scientific methodologies, station management may demonstrate inadequate overall and dynamic performance. Therefore, it is crucial to explore new methods for accurately predicting the short-term spatial and temporal distribution of passenger flow in railway stations. Such advancements will enable proactive management strategies, enhancing efficiency and safety within the station environment.
The current methods for traffic flow forecasting can be broadly categorized into two fundamental classes: parametric methods and nonparametric methods. Parametric models include linear regression models [1], the autoregressive integrated moving average (ARIMA) model [2], historical average (HA) models [3], Kalman filter models [4], etc. These models are based only on historical flow data; therefore, their capacity to handle unstable and nonlinear traffic flow information is restricted. Nonparametric models overcome such drawbacks and are capable of capturing additional transient attributes from verifiable historical data. They include the K-nearest neighbor algorithm [5], support vector machines (SVMs) [6], Bayesian networks [7], neural network models [8,9], and so on. These models achieve more accurate predictions than parametric models, but their relatively simple architecture and their consideration of passenger flow as a single variable limit the accuracy of the results.
The neural network model can represent complex high-dimensional functions by stacking hidden layers, effectively capturing the dynamic characteristics of nonlinear traffic. Therefore, a growing body of research has focused on neural network models and their variants, including Artificial Neural Networks (ANN) [10], Radial Basis Function (RBF) networks [11,12], and Recurrent Neural Networks (RNN) [13], to improve forecasting accuracy. Ma et al. [14], Huang et al. [15], and Li et al. [16] utilized improved Long Short-Term Memory (LSTM) networks to effectively capture nonlinear traffic dynamics. Fu et al. [17] used Gated Recurrent Units (GRUs) to predict short-term flow. Li et al. [18] applied LSTM to the prediction of short-term departing passenger flow in railway stations and demonstrated the model's effectiveness. These studies only took into account the temporal features exhibited by passenger flow, ignoring the effect of spatial correlation on flow fluctuations.
Therefore, building on time series data, certain studies began to include the spatial features exhibited by traffic data as one of the variables influencing the variation in traffic flow. Owing to their effective spatial feature extraction capabilities on regular grids, Convolutional Neural Networks (CNNs) are commonly employed [19,20]. However, because CNNs are difficult to apply to data with a non-Euclidean structure, Graph Convolutional Networks (GCNs) were proposed [21]. For improved performance, numerous studies therefore combine spatial and temporal deep learning models into joint spatial-temporal models. Zhao et al. [22] represented the entire urban route network as an undirected graph, capturing spatial and temporal dependence in the traffic flow data using GCN and GRU, respectively. Zhang et al. [23] proposed a deep learning architecture combining the residual network, GCN, and LSTM for predicting short-term passenger inflow and outflow in urban railway stations on a network scale. Yu et al. [24] offered the Spatial-Temporal Graph Convolutional Network (STGCN) deep learning framework for dealing with the time series prediction problem.
Compared to time series prediction models and graph processing models, attention mechanisms were applied to flow prediction relatively late. Given their advantages, an increasing number of studies have begun to examine attention mechanisms for spatial feature extraction. Wang et al. [25] utilized a unique CNN design for this task, which adopts two attention levels to better recognize spatial-temporal patterns. Du et al. [26] proposed an attention-based LSTM network and convolution network to identify the potential patterns among the time series and achieve classification. Cinar et al. [27] proposed an extended attention model for recurrent neural networks (RNNs) designed to capture periods in time series. Zhou et al. [28] put forward a wide-attention and deep-composite (WADC) model, adopting the self-attention mechanism to extract global pivotal features from data flows. Guo et al. [29] utilized a spatial-temporal attention mechanism to capture the spatial-temporal correlation of traffic flow data. Wang et al. [30] introduced an attention mechanism into the GCN to determine the role played by the variables at each node.
Although many models have considered spatial and temporal information, they ignore the way external factors affect flow data in the prediction process. Zhang et al. [23] considered weather conditions and air quality when predicting passenger inflow and quantified their influence on prediction precision but did not take into account how the train timetable affects passenger inflow. Similarly, Wang et al. [31] proposed a temporal graph attention convolutional neural network model (TGACN) for forecasting the passenger density at significant station regions but did not take into account external factors such as the train timetable, events, or area functional attributes.
Based on the aforementioned summary of existing research, the majority of studies have focused on leveraging temporal-spatial correlation within various networks such as railway networks and urban road networks. However, an important consideration is that although passenger flows enter the station through entrances in different directions, the internal travel process remains consistent due to the influence of the functional area distribution. Consequently, the micronetwork of the station, based on passenger circulation plans, exhibits a consistent local structure. As a result, traditional graph convolutional network methods, like GCN and GAN, which rely on structure awareness, are no longer applicable. To address this limitation, we propose a spatial feature extraction method based on DeepWalk to establish location awareness of network nodes, enabling the quantification of node information with similar attributes but varying location distributions. Additionally, existing studies, which rely on data obtained from automatic fare collection (AFC) systems, primarily consider inbound or outbound passenger flows within railway stations as the prediction target. To date, no relevant research has explored the use of in-station micronetworks for predicting dynamic passenger flow distribution. Therefore, there is a research demand for further investigation into integrating in-station spatial and temporal correlations, as well as assessing the applicability of current methodologies for predicting passenger flow dynamics at various locations within the railway station.
To address these challenges, this research proposes a novel deep learning architecture named ST-Bi-LSTM, which incorporates DeepWalk-based spatial-temporal attention mechanisms. This architecture utilizes in-station video monitoring statistics to predict short-term passenger flow distribution in key areas. In this model, the spatial dependency between nodes in the station network is represented by a directed and unweighted graph. The DeepWalk model effectively captures spatial correlations by mapping semantic correlations between nodes, overcoming the limitations of traditional graph convolutional networks in distinguishing identical local structures within the network. Furthermore, the extracted spatial information is integrated with passenger flow distribution data, train schedule data, and area information. These fused features are then fed into the prediction framework to achieve accurate short-term passenger flow distribution predictions. The proposed architecture makes three primary contributions:
(1) The study introduces a novel DeepWalk-based combined spatial-temporal short-term prediction model for passenger flow. Additionally, it implements comprehensive prediction of passenger flow distribution in railway stations using dynamic video monitoring.
(2) The study enriches the deep learning model by incorporating both spatial and temporal attention mechanisms to improve the accuracy and effectiveness of passenger flow prediction. The model is shown to outperform the baseline approaches in terms of accuracy and performance.
(3) Considering that the behavior of in-station passengers is predominantly influenced by their travel purpose, the train timetable, and in-station activities, these factors are integrated into the framework along with area location information. The effectiveness of incorporating these three factors in improving prediction accuracy is demonstrated through ablation experiments.
The remainder of the paper is organized into four sections. Section 2 introduces the ST-Bi-LSTM architecture. Section 3 outlines the methodologies of the spatial and temporal attention mechanisms, as well as Bi-LSTM. Section 4 presents the analysis of the case study results, along with the main findings. Section 5 summarizes the current study and discusses its implications, limitations, and potential directions for future research.

Forecasting Architecture
In this article, we present the ST-Bi-LSTM model architecture, depicted in Figure 1, which comprises four data branches. These branches are represented by $I_1$ to $I_4$. The input data are collected from time $t - n$ to $t$, and the output data are obtained at time $t + 1$. We denote the datasets for each branch as Branch 2.1 to Branch 2.4. Branch 2.1 constructs the spatial network topology information of the station and extracts the spatial features of different nodes using the DeepWalk algorithm. Branch 2.2 represents passenger flow distribution data. The spatial-temporal passenger flow distribution dataset is obtained by combining these data with the spatial correlation characteristics output by the spatial attention model of Branch 2.1. Branch 2.3 accounts for the impact of train timetables and station operating parameters on prediction accuracy. Branch 2.4 utilizes passenger travel behavior data to classify the functional characteristics exhibited by various areas within the station. After the preprocessing stage of the four data branches is complete, feature fusion is conducted in the prediction architecture. Furthermore, Bi-LSTM, in combination with the temporal attention model, is employed in the trunk to extract data from each branch. Sections 2.1-2.4 provide a comprehensive overview of the model architecture.
2.1. Spatial Network Topology. The spatial network topology has been shown to be important for short-term passenger flow prediction [32]. In railway stations, the spatial structure and the facility layout constrain passenger movement. Therefore, our study treats the station space as a directed spatial network consisting of functional areas (vertices) and passenger circulation lines (edges). Define

$$G = (V_t, E_t) \tag{1}$$

as the time-varying spatial network topology of the station at time $t$. We describe the treatment of the network topology in Section 3, using the network $G = (V_t, E_t)$ as input. Because the network architecture remains constant throughout our investigation, we simply take into account the real-time pattern.

Passenger Flow Distribution Data.
Area flow prediction necessitates historical flow data. In this study, we utilize passing passenger flow data collected by CCTV to generate the experimental dataset based on the functional characteristics exhibited by various regions within the station. To obtain passenger flow data, we employ a pedestrian dynamic tracking method by slicing the real-time monitoring video. Given the crowded nature of the scene, passengers in the station are often occluded from the camera viewpoint. This occlusion may result in a higher rate of missed detections if pedestrians are detected as a whole. Therefore, our study employs a head-tracking method for passenger flow tracking, enabling accurate passenger flow statistics within the designated area. The resulting passenger flow distribution data series are denoted by $X_t$, the set of passenger flows for each partition of the space at historical time step $t$; $x_{n,t}$ represents the passenger flow of an area at the corresponding time step, and $n$ denotes the number of that area. The areas are arranged in columns according to their numbers in the network.

Train Timetable.
To our knowledge, previous research has not explored the impact of train timetables on the distribution of in-station passenger flows. However, the train timetable is a crucial factor influencing the distribution of passenger flow at stations. For example, as the departure time approaches and ticketing information is broadcast in the station, passenger flow tends to gather in the ticketing area. Conversely, passenger flow in the waiting area may decline, as Chinese railway passengers typically wait for trains within the station premises.
The study defines "train departure time proximity" as a metric to measure the impact of the timetable. Since the train departure time is determined by the train timetable, let $T_0$ be the opening time of the ticket gate and $T$ the time at which passenger flow in the local area changes due to passenger behavior. The difference between the two time nodes is then $\Delta T = T_0 - T$, and the resulting time span is used to judge the train departure time proximity $P$ at different times, as shown in Table 1. In the preprocessed input data for the train timetable, $P_{k,t}$ denotes the set of departure time differences, at time $t$, of trains at different distances from each other; $k$ denotes the region number, which is used to represent the effect of the train's departure time on the passenger flow in the region; and $\omega$ denotes the train number at the station, sorted by the train's departure time within a day.
First is the area location information. According to the study by Liu et al. [33], the following 3 metrics should be used to determine the relevance of a target node in a traffic network:
(1) The number of other nodes that are connected with the target node.
(2) The weights of the edges that are connected to the target node.
(3) The significance of the other nodes that are connected with the target node.
Since this study constructs a directed unweighted network, metrics (1) and (3) are considered.
In addition to considering the impact of regional functional attributes on passenger flow fluctuations, we also explore the influence of area function on passengers' in-station behavior. To investigate this further, we conducted a survey on travelers' behavior, defining the concept of "Passenger Aggregation Factor" to quantify the degree of flow aggregation in different areas. The survey was disseminated as a questionnaire via the Internet, yielding 846 valid responses. Participants ranged in age from 20 to 65 years old (mean = 26.51, standard deviation = 6.83); 49.76% were male and 50.24% were female. All participants had prior experience with rail travel and were familiar with the process.
Based on the survey results, the ratio of travelers heading to a single area during the travel process to the total number of respondents was categorized into three classes: 0% to 40%, 40% to 60%, and greater than 60%. These classes corresponded to aggregation values of 1 to 3, respectively, with higher values indicating greater passenger flow fluctuation in the area, as shown in Table 2.
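The class boundaries above can be sketched as a small mapping function. This is only an illustration: the function name `aggregation_factor` is hypothetical, and the text does not say which class the 40% boundary belongs to, so the sketch assumes it falls into the middle class.

```python
def aggregation_factor(share: float) -> int:
    """Map the share of respondents heading to an area (0.0-1.0) to a class 1-3.

    Classes follow the text: 0%-40% -> 1, 40%-60% -> 2, greater than 60% -> 3.
    Assumption: a share of exactly 40% is placed in class 2.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie in [0, 1]")
    if share > 0.60:
        return 3
    if share >= 0.40:
        return 2
    return 1


# Hypothetical survey shares for three areas (not taken from the paper).
example_shares = {"ticket gate": 0.75, "waiting area": 0.50, "shop": 0.25}
example_classes = {area: aggregation_factor(s) for area, s in example_shares.items()}
```

A higher class marks an area where the surveyed travel process concentrates more passengers, which the text associates with larger flow fluctuations.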
Hence, the input is denoted by $A_{k,t}$, the information data of area $k$ at time $t$, where $a_{i,t}$ is the value of type $i$ information for area $k$ at time $t$.
To generate weighted indicators, the network flattens the preprocessed input data and feeds them into the fully connected layer. Subsequently, for the first and second layers, a Bi-LSTM with 128 neurons is introduced. The network then transfers the results to the feature fusion section.

Methodology
The methodology design of the proposed model incorporates attention mechanisms and Bidirectional Long Short-Term Memory (Bi-LSTM). Therefore, the study provides a brief overview of the individual methodologies of each component.

DeepWalk-Based Spatial Attention Mechanism.
Previous studies have indicated that, in contrast to the structure-aware feature of other graph neural networks, the position-aware feature of DeepWalk can effectively distinguish identical local structures within the network. Consequently, it can capture a broader range of graph structures and extract vertex information [34]. DeepWalk was introduced by Perozzi et al. [35] in 2014 and comprises a random walk generator for node sequence sampling and the semantic model SkipGram for the embedding representation of node information.
In this study, DeepWalk takes the node sequences obtained by random walk sampling as its data input. Mapping the semantic correlation between nodes to spatial correlation through SkipGram, DeepWalk uses the probability distribution of co-occurrence between the target node $v_i$ and the other nodes $v_j$, $j \in [i - w, i + w]$, $j \neq i$, in the network as the spatial attention weight. This weight distinguishes the contribution that passenger flow fluctuations in other areas of the network make to the passenger flow of the predicted area.
Algorithms 1 and 2 show the main methods of DeepWalk and SkipGram, where $W_{v_i}$ is a sequence of nodes with $v_i$ as root and $\Psi(v_j)$ denotes a mapping function that maps the vertex $v_j$ to its representation vector. The objective of SkipGram is to maximize the co-occurrence probability of the neighbor nodes and the target node in the sequence. Equation (5) expresses this problem.
Soft attention is then used to calculate the probability distribution. Hierarchical Softmax is chosen as the function for calculating the spatial attention distribution, taking the probability calculation of the input vector of node $v_j$ as an example. Here, $\lambda_j$ denotes the attention distribution and $\Pr(v_j \mid \Phi(v_i))$ is the scoring function of the attention model.
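To make the idea of co-occurrence-based spatial attention concrete, the following is a minimal sketch under stated assumptions. It samples random walks on a small directed graph and normalizes the window co-occurrence counts around a target node into a distribution. It deliberately replaces the trained SkipGram model with Hierarchical Softmax by plain empirical frequencies, so it illustrates the principle rather than the authors' implementation. The station graph, node names, and all parameter values are hypothetical.

```python
import random
from collections import Counter


def random_walk(graph, start, length, rng):
    """Sample one walk of at most `length` nodes along directed edges."""
    walk = [start]
    while len(walk) < length:
        successors = graph.get(walk[-1], [])
        if not successors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(successors))
    return walk


def cooccurrence_attention(graph, target, walks_per_node=50,
                           walk_length=8, window=2, seed=0):
    """Empirical co-occurrence distribution of other nodes around `target`.

    Simplification: frequencies stand in for the SkipGram probabilities.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(walks_per_node):
        for root in graph:
            walk = random_walk(graph, root, walk_length, rng)
            for i, node in enumerate(walk):
                if node != target:
                    continue
                lo, hi = max(0, i - window), min(len(walk), i + window + 1)
                for j in range(lo, hi):
                    # skip the target itself (position i or a revisit)
                    if j != i and walk[j] != target:
                        counts[walk[j]] += 1
    total = sum(counts.values())
    return {node: c / total for node, c in counts.items()} if total else {}


# Hypothetical station micronetwork: area -> areas reachable next.
station = {
    "entrance": ["security"],
    "security": ["waiting"],
    "waiting": ["shop", "gate"],
    "shop": ["waiting"],
    "gate": [],
}
weights = cooccurrence_attention(station, "waiting")
```

The resulting `weights` sum to one and can be read as how strongly each other area co-occurs with the waiting area within the sampling window.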
In the LSTM gate equations, $W_f$, $W_i$, and $W_c$ stand for the weights of the gates $f$, $i$, and $c$; $b_f$, $b_i$, and $b_c$ denote the corresponding bias terms of the forget gate $f_t$, the input gate $i_t$, and the output gate $O_t$; $h_t^{(l)}$ represents the state of the $l$th hidden layer at time $t$; and $\sigma$ is the sigmoid activation function.
Bi-LSTM considers the forward and backward LSTM simultaneously. At each time step of the flow, the input is provided to two LSTMs in opposite directions, one ($A$) participating in the forward computation and the other ($A'$) in the reverse computation. The final network output depends on the summation of the forward and reverse calculations, but the weights are not shared between the two directions. Figure 3 displays the Bi-LSTM structure.
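For reference, the symbols above match the standard LSTM cell, which is commonly written as follows. This is the textbook formulation, given as a hedged reconstruction rather than a transcription of the authors' equations; the output-gate parameters $W_o$ and $b_o$, the cell state $C_t$, the candidate state $\tilde{C}_t$, and the concatenation $[h_{t-1}, x_t]$ are assumptions not named in the text, and the layer superscript $(l)$ is omitted for readability.

```latex
\[
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
O_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= O_t \odot \tanh\left(C_t\right)
\end{aligned}
\]
```

Here $\odot$ denotes element-wise multiplication: the forget gate scales the previous cell state, the input gate scales the candidate state, and the output gate controls how much of the cell state reaches the hidden state.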

Journal of Advanced Transportation

In the Bi-LSTM formulation, $x_t$ refers to the input flow at time $t$, $U$ and $V$ represent different weight matrices in the network, and $W^{(1)}$ and $W^{(2)}$ correspond to the weights of the forward and reverse directions, respectively.

Temporal Attention Mechanism.
Prediction models for passenger flow distribution can be influenced by various factors such as area attributes, spatial network topology, passenger entries, and train timetables, resulting in a high level of complexity. Therefore, assigning weight scores solely based on recentness in the Bi-LSTM network might be inadequate. Given that LSTM combined with attention mechanisms has proven effective in traffic flow prediction, many researchers have incorporated it into short-term passenger flow prediction [36][37][38]. Consequently, we introduce a temporal attention mechanism to capture different feature weights.
To address the limitation of recentness-based weight score assignment in traditional attention mechanisms, we utilize a fully connected network to provide weights that can be graded based on the Bi-LSTM output. This approach builds upon earlier work by Zhang et al. [36]. As a result, the proposed model assigns scores to the output weights. Subsequently, we obtain the attention-based hidden layer output $H^*$, where $q$ is the query vector, $\alpha_n$ denotes the attention distribution, $s(h_n, q)$ denotes the additive scoring model, and $W'$, $U'$, and $v^T$ are learned parameters.
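The additive attention described here can be sketched in a few lines of plain Python. The sketch assumes the common additive form $s(h_n, q) = v^T \tanh(W' h_n + U' q)$, a softmax over the scores to obtain $\alpha_n$, and $H^* = \sum_n \alpha_n h_n$; these forms are standard but are not spelled out in the text, and the toy hidden states and parameter values below are purely illustrative.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def additive_attention(hidden_states, q, W, U, v):
    """Return (alphas, context) for additive attention over hidden states.

    score_n = v^T tanh(W h_n + U q); alphas = softmax(scores);
    context = sum_n alphas[n] * h_n.
    """
    Uq = matvec(U, q)
    scores = []
    for h in hidden_states:
        Wh = matvec(W, h)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, Wh, Uq)))
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    alphas = [e / norm for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
               for d in range(dim)]
    return alphas, context


# Toy example with three time steps and 2-dimensional hidden states.
identity = [[1.0, 0.0], [0.0, 1.0]]
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, context = additive_attention(states, q=[1.0, 0.0],
                                     W=identity, U=identity, v=[1.0, 1.0])
```

In a trained model, `W`, `U`, and `v` would be learned jointly with the Bi-LSTM, so the weights reflect feature relevance rather than recentness alone.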

Case Study
In this section, we provide an overview of the study scenario, detailing the model configuration and comparing it with other prediction models to assess the predictive effectiveness of the developed framework. Passenger flow data are collected and quantified in different areas using multiangle and multidirectional real-time monitoring video within the station. OpenCV and its classifier are employed to achieve dynamic people counting within delineated areas, as illustrated in Figure 6. This figure presents an example of dynamic tracking statistics from the waiting hall monitoring video at Tianjin West Station. Following data collection, one-day passenger flow data from each area within the station are obtained. Specifically, passenger flow and train timetable data from 16 areas within the hub are selected for analysis during the time period of 8:00-19:10 on December 15, 2020. Figure 7 visualizes the distribution of passenger flow data across the 16 regions during the study period. To analyze the model's prediction performance at different time granularities, the passenger flow data are processed into 10-second, 20-second, 30-second, and 60-second time intervals. For model calibration, the validation split rate is set at 0.2. Section 2 provides examples of data preprocessing.
In the error metric formulas, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $n$ is the number of training samples.
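Processing the flow data into coarser time granularities can be sketched as below. The sketch assumes that the finest series consists of passing-passenger counts per 10-second interval and that coarser intervals are obtained by summing consecutive counts; the text does not state the aggregation rule, and the function name and sample values are hypothetical.

```python
def aggregate_counts(counts_10s, factor):
    """Sum consecutive 10-second counts into bins of `factor` intervals.

    factor=2 gives 20-second bins, factor=3 gives 30-second bins, and
    factor=6 gives 60-second bins. An incomplete trailing bin is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(counts_10s) - len(counts_10s) % factor
    return [sum(counts_10s[i:i + factor]) for i in range(0, usable, factor)]


# Hypothetical two minutes of 10-second counts for one area.
counts = [3, 5, 2, 4, 6, 1, 7, 2, 3, 5, 4, 6]
flow_20s = aggregate_counts(counts, 2)
flow_30s = aggregate_counts(counts, 3)
flow_60s = aggregate_counts(counts, 6)
```

Coarser granularities yield fewer samples for the same study period, which is relevant when comparing model performance across granularities.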
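The three error indices reported in this paper (RMSE, MAE, and MAPE) can be computed as in the following sketch, which uses their standard definitions. The sample values are illustrative and not taken from the paper; MAPE is returned as a percentage and is undefined when a true value is zero.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]
errors = (rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

Because RMSE squares the errors before averaging, it penalizes large peak-period misses more heavily than MAE, while MAPE expresses the error relative to the flow level.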
Compared with other deep learning and mathematical statistical models, ST-Bi-LSTM considerably improves passenger flow prediction. The architecture in the study is also highly robust: even if a branch is removed from the framework, the prediction results do not show a large change in error. Among ST-Bi-LSTM and its five variants, the full ST-Bi-LSTM functions best, because spatial-temporal correlation, train timetable, area attributes, and other factors are all considered in the framework. The results indicate that spatial correlation has the greatest impact on prediction outcomes, followed by the train timetable and area information inputs. This suggests that passenger flow is influenced by train departure times, as passengers tend to gather both at waiting areas and ticket gates. Our model helps to obtain the degree of passenger flow fluctuation in the area. After quantifying these two effects, based on passenger flow data at the 30 s time granularity, the RMSE, MAE, and MAPE decreased from 13.855, 8.527, and 20.2% to 13.215, 8.126, and 18.4%, respectively, for the model considering schedule and area attractiveness compared to ST-Bi-LSTM (no T&A).
Figure 11 compares the prediction performance of ST-Bi-LSTM with that of Bi-LSTM and LSTM by displaying their prediction residuals for passenger flow data at the 30 s time granularity. All three models show residual peaks at 16:44 and 18:30, and the fitting effect is weak for large passenger flows. The residual peaks of the three models are the same at 16:44, but those at 18:30 differ considerably: the residuals of the LSTM range from −100 to 150, whereas the residuals of the temporal attention Bi-LSTM and ST-Bi-LSTM are significantly smaller, mostly in the range of −100 to 50, with ST-Bi-LSTM showing the smallest overall fluctuations and the lowest peaks. Therefore, the residuals of the ST-Bi-LSTM fusion model are more stable and lower in absolute value than those of the other two models.

Fusion Model Prediction Performance Evaluation.
To test the prediction performance of the model more clearly, scatter plots and linear fits of the actual and predicted values at time granularities of 10 s, 20 s, 30 s, and 60 s are plotted in Figure 12. The black line is the linear fit target. The calculated Pearson's correlation coefficients at the different time granularities were 0.99482, 0.98614, 0.97466, and 0.97972, and the standard deviations of the residuals were 2.6827, 6.7084, 14.2942, and 25.8434. As the time granularity increases, the prediction effect decreases slightly, especially for some of the peak passenger flow locations. As the time granularity increases for the same study period, the dataset contains fewer data points, and the learning effect decreases slightly. However, the overall model can still fit the passenger flow data accurately at different time granularities. The prediction output of the framework and its residuals for data at different time granularities are analyzed, and Figure 13 shows the passenger flow data and residuals at each time granularity. It can be seen that the residual peaks appear in similar areas for all four time granularities, indicating that the points of passenger flow
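The two diagnostics quoted above, Pearson's correlation coefficient between actual and predicted values and the standard deviation of the residuals, can be reproduced with a short sketch. The sample standard deviation (divisor $n - 1$) is an assumption, since the text does not say which estimator was used, and the numbers below are illustrative only.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def residual_std(y_true, y_pred):
    """Sample standard deviation of the residuals y_true - y_pred."""
    residuals = [y - p for y, p in zip(y_true, y_pred)]
    mean_res = sum(residuals) / len(residuals)
    return math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / (len(residuals) - 1))


# Illustrative values only.
observed = [10.0, 20.0, 30.0, 40.0]
forecast = [12.0, 18.0, 33.0, 41.0]
r_value = pearson_r(observed, forecast)
res_std = residual_std(observed, forecast)
```

A correlation close to 1 with a growing residual standard deviation, as reported for coarser granularities, is consistent with larger absolute flows per interval rather than a collapse of the fit.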

Comparison of Graph Feature Extraction Performance.
To verify the prediction effectiveness of the network features extracted by DeepWalk compared with GCN, the study compares the prediction errors of two-layer and three-layer GCNs with different numbers of neurons, without considering the influence of exogenous factors. Table 4 lists the passenger flow prediction errors of the GCN with different parameters, and Figure 14 plots them as spline curves. The prediction errors of the two-layer GCN (blue line) are overall higher than those of DeepWalk (black dashed line) for the same prediction using Bi-LSTM. In comparison, the prediction errors of the three-layer GCN are only slightly lower than those of DeepWalk when the number of neurons is 16 or 32. However, a comparison of their time complexities (where $D$ is the feature dimension) shows that the DeepWalk-based spatial feature extraction method has 81% lower time complexity than the GCN-based method in the application scenario of this study.

Prediction Performance on Individual Area.
The study selects three typical areas within the Tianjin West Hub to examine whether the forecasting framework can effectively predict the passenger flow in different types of areas with varying spatial fluctuation characteristics. The first is Area 6, a ticket gate in the Tianjin West Hub, which shows significant fluctuations in passenger flow over time. The second is Area 9, a commercial area within the hub, adjacent to the waiting area and ticket gates. The last one is Area 13, a waiting area between the ticket gates of Areas 7 and 16, which is connected to the commercial area, the entry gates, and the ticket gates.

According to Figure 15, for the ticket gate, Area 6, the predicted values at the four time granularities agree closely with the actual values during both peak and off-peak periods (refer to the residual plot in Figure 11), showing the strong robustness of ST-Bi-LSTM. Because the ticket gate is the end of the passenger circulation line, where the passenger flows from different flow lines converge, and is strongly influenced by the train schedule, its regularity is evident, which helps the prediction.
As an essential transition area for the passenger waiting and ticketing process in the hub, the waiting area, Area 13, exhibits multipeak passenger flow characteristics, and the prediction framework maintains good performance at all three time granularities during the passenger flow period, as shown in Figure 16.
Unlike the ticketing and waiting areas, the commercial area within the hub, Area 9, shows low regularity and significant variation in passenger inflow. However, as shown in Figure 17, the prediction framework we designed still captures the regional passenger flow trend well. Moreover, the fitting effect of the framework does not deteriorate as the time granularity increases from 10 to 60 seconds, indicating the reliability of the prediction performance of the framework.
In summary, the feasibility of the passenger flow forecasting framework constructed in this study is further verified by the forecasting results for different areas within the station. However, due to the limited availability of in-station monitoring video data and railway operation data in this research, there is ample room for further improvement and dissemination of the research findings. Future research will focus on investigating the impact of heterogeneous traveler behavior on passenger flow fluctuations resulting from regional functional differences, with the aim of significantly enhancing prediction accuracy and extending forecast duration.

Conclusion
Regarding experimental data, the collection of passenger flow data through dynamic pedestrian monitoring from surveillance videos in the hub poses challenges in processing. Consequently, only a portion of the station area is selected for this study to validate the reliability of the prediction framework. We intend to expand the study area in future research initiatives.
Additionally, from the results of the scatter plot test, it is observed that prediction accuracy slightly decreases as time granularity increases, particularly for peak predictions.Future studies are advised to address these limitations and focus on mitigating the aforementioned challenges.

Figure 3: Bidirectional recurrent neural network structure expanded by time series.

Figure 4: Second floor layout of Tianjin West Passenger Transport Station.

4.3. Baseline Models. This section primarily focuses on evaluating the effectiveness of various time series prediction models. Except for the ARIMA prediction model, all deep learning models used for comparison uniformly adopt the RAdam optimizer. RAdam is an optimizer that combines the advantages of SGD for fast convergence and Adam for fast training [39]. For the three variants of the prediction framework, we use the same parameters as the proposed model to ensure a fair comparison.
ARIMA: a common traditional time series forecasting model. We use the minimum AIC principle to determine the model order for predicting passenger flow data.
BPNN: the BPNN network includes 2 hidden layers, each of which has 64 neurons.

4.5. Results and Discussion
4.5.1. Prediction Error Evaluation Index. According to Table 3 and Figure 10, ST-Bi-LSTM significantly outperforms the basic mathematical statistical model and the other deep learning models. The ARIMA model is the least effective in training due to its inability to capture the full range of nonlinear characteristics of passenger traffic. CNNs have the worst prediction results relative to the other deep learning models, and this gap widens as the time granularity increases. Relative to the traditional RNN, the LSTM improves training accuracy for passenger flow data. BPNN training obtains slightly better results relative to RNN and LSTM.

Figure 16: Predicted and actual value comparison for passenger waiting area.

Figure 17: Predicted and actual value comparison for commercial area.
As previously discussed, the entire graph is sampled into multiple node sequences and fed into DeepWalk. However, this process disrupts many edges connected to nodes and loses some connection information. Such loss could be detrimental to the training and prediction of graph data. Therefore, to mitigate this information loss, data that encapsulate the attributes and characteristics of the area itself are utilized as input for Branch 2.4.

Table 1: Approaching value of departure time.

Due to the spatial-temporal correlation of passenger flow and the large, complex, and variable data involved, the prediction process involves handling high volumes of data. We establish a Bi-LSTM deep learning model to capture the spatial-temporal features of the passenger flow. Figure 2 shows the structure of each neuron in the LSTM network, which consists of three parts: the forget gate $f_t$, the input gate $i_t$, and the output gate $O_t$.

Table 2: Attractiveness of different functional areas.

Vertex representation matrix $\Psi \in \mathbb{R}^{|V| \times d}$
(1) Initialization: Sliding window sampling $\Psi$ from $U^{|V| \times d}$
(2) Construct a binary tree $T$ from $V$

Table 3: Comparison of prediction performances obtained using different TGs in different baseline models.

Table 4: Comparison of GCN and DeepWalk-based method prediction errors for different parameters.