Prediction and Impact Analysis of Passenger Flow in Urban Rail Transit in the Postpandemic Era



Introduction
The COVID-19 pandemic (hereinafter referred to as the pandemic) has had a huge impact on urban rail transit. At the national level, data from the China Urban Rail Transit Operation Development Report (2020-2021) showed that the average passenger intensity of urban rail transit in China was 4,500 person-time/(km-day) in 2020, a decrease of 2,700 person-time/(km-day) or 36.9% compared with the previous year. In Shanghai, according to the Annual Report of Shanghai Comprehensive Transportation Development in 2021, the average daily passenger flow of rail transit was 7.75 million passenger trips in 2020, down 27.2% year-on-year.
Since the outbreak of the pandemic in January 2020, passenger flow in the rail transit of Shanghai can be categorized into three stages: sharp decline, gradual recovery, and normal stabilization [1]. On January 24, the Shanghai Municipal People's Government activated the Level 1 response mechanism for major public health emergencies, which, combined with the Spring Festival effect, led to a sudden drop in passenger flow. From February 9, work and production resumed, and the total number of passengers on working days started to recover gradually. On May 9, the emergency response level for major public health emergencies in Shanghai was lowered to Level 3, and the total passenger flow stabilized at over 9 million person-times, entering the normal stabilization phase. This was followed by the postpandemic era, the period in which COVID-19 cases are expected to be under control but continue to have a lasting and significant impact on the public's daily choice of trip modes [2,3]. In the postpandemic era, the number of new COVID-19 cases tends to stabilize, but minor outbreaks remain likely.
Passenger flow prediction that considers the impact of the pandemic can help accurately assess urban rail transit demand and provide an important reference for predicting the operation state of urban rail transit and for formulating organizational management strategies in the postpandemic era. Existing methods for predicting passenger flow in rail transit can be divided into parametric and nonparametric models. Parametric models are mostly based on autoregressive time-series models, in which model parameters estimated from historical passenger flow are used to forecast future passenger flow. For instance, Wang et al. [4] used the seasonal autoregressive integrated moving average (SARIMA) model to predict the daily inbound passenger flow in Beijing. The SARIMA model can predict periodic time series more accurately than the conventional ARIMA model. Similarly, Kumar and Vanajakshi [5] constructed a SARIMA model for a time-series analysis of short-term traffic flow based on limited input data. Milenković et al. [6] adopted a SARIMA model to predict the monthly passenger flows of Serbian railways. Li et al. [7] used the SARIMA model to predict the hourly passenger flow of the Guangzhou-Zhuhai intercity railroad.
However, the autoregressive model only considers the variation in the passenger flow over time, ignoring the influence of external factors such as holidays and weather. To overcome these shortcomings, Cheng et al. [8] incorporated the holiday effect using the SARIMA with exogenous factors (SARIMAX) method to build a daily passenger flow prediction model for the Hongqiao hub. Xu et al. [9] used the SARIMAX model to explain the effect of different weather factors on subway passenger flow. Bai et al. [10] proposed a combined ARIMA and multiple linear regression model for short-term passenger flow prediction of urban rail transit under nonconventional conditions.
Compared with parametric models, nonparametric models are more flexible and can effectively deal with the nonlinear relationship between passenger flow and multidimensional influencing factors, thus producing a better prediction performance. The main nonparametric methods for predicting the daily passenger flow in rail transit include the hybrid deep neural network model [11], long short-term memory (LSTM) neural network [12], random forest model [13], support vector machine [14], and bilayer parallel wavelet neural networks [15].
Built on recurrent neural networks, LSTM neural networks can learn long-term information by introducing gated units, such as forget gates, memory gates, and output gates, which enables an effective prediction of nonlinear time-series data [16]. Li et al. [12] divided the factors influencing passenger flow into external and internal factors and used the LSTM neural network for a 15-min real-time prediction of the passenger flow in rail transit. The prediction accuracy of the LSTM was higher than that of multiple linear regression and back propagation (BP) neural networks. Teng and Li [17] combined the LSTM neural network with the particle swarm optimization (PSO) algorithm to predict the daily passenger flow in the Shanghai-Nanjing one-way railroad, considering date attributes and weather factors. Liu et al. [18] established an LSTM neural network to forecast hourly metro passenger flows and analyzed the effects of weather variables on the model's performance. Yang et al. [19] built a spatiotemporal LSTM to analyze the time series of outbound passenger volume at urban rail stations using historical passenger volume data, a station origin-destination matrix, and rail transit operation data.
Variants of the LSTM model have also been applied to the prediction of passenger flow in rail transit. Hou et al. [20] used a gated recurrent unit (GRU) neural network to predict the short-term passenger flow in urban rail transit. The results showed that the GRU has faster convergence, lower prediction error, and better stability than the LSTM. Huang et al. [21] used gray relational analysis (GRA) to filter the weather factors highly correlated with the passenger flow and the bidirectional LSTM (BiLSTM) to predict the hourly passenger flow in rail transit on weekdays and nonweekdays, respectively. The BiLSTM outperformed the conventional LSTM in terms of prediction performance.
In summary, in terms of the influencing factors, weather and holiday attributes are external factors that significantly affect the passenger flow; however, there has been no research on the influence of the pandemic on passenger flow. This study incorporated the daily number of local new COVID-19 cases, weather, temperature, and holidays in the daily passenger flow prediction process for an accurate prediction of the passenger flow. In terms of the model performance, the parametric model is superior at handling time series with significant trends and seasonal variations, while the nonparametric model is more effective at handling multidimensional nonlinear inputs. Passenger flow is significantly influenced by nonperiodic factors such as holidays and weather. In the pandemic era, the sudden factor of daily new cases should also be incorporated in the passenger flow prediction, which is a setting in which the nonparametric model can achieve a higher prediction accuracy. However, the "black box" characteristic of the nonparametric model prevents it from assessing the quantitative relationship between the input and output variables. The above studies that used nonparametric models focused on the passenger flow prediction performance without explaining the degree to which each factor influences the passenger flow.
In this study, a partial dependence function was used to compensate for the poor interpretability of the nonparametric model. An effective GRU neural network model was constructed for passenger flow prediction based on the daily passenger flow data of Shanghai urban rail transit and the daily number of local new cases in Shanghai. A partial dependence plot (PDP) was then employed to explore the external factors affecting passenger flow and to investigate the quantitative relationship between the pandemic and daily passenger flow. This study provides a basis for urban rail transit demand prediction, operation organization, and policy implementation.

Data Preparation
Since the outbreak of the pandemic, passenger flow in the rail transit of Shanghai has experienced three stages, as shown in Figure 1 (where the unit of the y-axis is 10,000 person-time). The focus of this study is the daily passenger flow in the rail transit of Shanghai in the postpandemic era, i.e., the third period in Figure 1. The postpandemic era is the period in which COVID-19 cases are under control but continue to have a lasting and significant impact on the public's daily choice of trip modes until the coronavirus becomes less harmful. The reasons for selecting the postpandemic era are as follows: (1) among the three periods shown in Figure 1, the postpandemic era is the longest and has the most lasting impact on the public's daily life and (2) this sustained impact makes the quantitative results calculated by partial dependence methods more practical.
The analysis period was from June 1, 2020, to December 31, 2021 (a total of 579 days). The obtained information included the daily passenger flow in rail transit, the daily number of local new COVID-19 cases in Shanghai, and the corresponding weather and holiday attributes of Shanghai on that day. The data were collected from Weibo. Figure 2 illustrates the time-series curves of the daily passenger flow and the daily number of new cases. As observed, there is a correlation between the daily number of local new cases and the daily passenger flow. The passenger flow on days with new cases tends to be a local minimum relative to the days before and after. Notably, due to the impact of Severe Typhoon In-Fa, on July 26, 2021, Shanghai saw widespread class suspensions, working from home, and the suspension of some rail transit lines; therefore, the passenger flow on that day reached the minimum of 1.814 million person-time.
The relevant data are used as the input and output of the subsequent prediction model. Table 1 shows the variable definitions and descriptive statistics. All the variables are divided into two categories: external and internal factors. As observed, local new cases occurred on 24 of the 579 days analyzed, accounting for 4%. In terms of the weather attribute, there were 249 days with rain, accounting for 43%; the mean minimum temperature and mean maximum temperature were 17°C and 23°C, respectively. In terms of the holiday attribute, holidays, including weekends, Spring Festival, Qingming Festival, Labor Day, Dragon Boat Festival, Mid-Autumn Festival, and National Day vacation, account for 31% of the days (the remaining 399 days, or 69%, are non-holidays). Over the 579 days, the mean daily passenger flow is 9.587 million person-time, and the SD is 2.287 million person-time. Figure 3 shows the data structure on the i-th day. Herein, the input variable dimension is t × n, where t represents the time step and n represents the feature dimension. Considering a cycle of 7 days a week, to predict the daily passenger flow of rail transit on the i-th day, the features of that day and the previous 7 days (8 days in total) are used as input, i.e., the time step is set to 8, t = 1, ..., 8. The features for each day include the number of new cases on that day, weather attribute, minimum temperature, maximum temperature, holiday attribute, and passenger flow of the previous day, giving six feature dimensions, i.e., n = 1, ..., 6. Since the first seven days of data are used only as input, the complete data structure is available from June 8, 2020, with a total of 572 days of valid data, i.e., i = 1, ..., 572. In summary, the input variable on the i-th day has a dimension of 8 × 6 = 48. The output variable on the i-th day is the number of passengers on that day, which corresponds to the variable $X_i^{0\times 6}$ with a time step of 0 and a feature dimension of 6. Figure 4 shows the technical route employed in this study.
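The 8 × 6 input structure described above can be sketched as a sliding window over the daily records. This is a minimal illustration in plain Python, not the authors' preprocessing code: the function name `build_windows`, the feature ordering, and the synthetic values are assumptions made for the example.

```python
def build_windows(daily_features, daily_flow, steps=8):
    """Pair each day with the features of that day and the 7 days before it.

    daily_features[d] is assumed to hold six values for calendar day d:
    [new cases, weather, min temp, max temp, holiday, previous-day flow].
    Row 0 of a window is the current day (t = 1); row 7 is a week ago (t = 8).
    """
    if len(daily_features) != len(daily_flow):
        raise ValueError("features and flow must cover the same days")
    samples = []
    for day in range(steps - 1, len(daily_features)):
        window = [list(daily_features[day - back]) for back in range(steps)]
        samples.append((window, daily_flow[day]))
    return samples


# Synthetic placeholder data (not the Shanghai data): 579 days, six features.
features = [[0, day % 2, 15.0, 22.0, int(day % 7 >= 5), 900.0] for day in range(579)]
flow = [900.0 + day for day in range(579)]

samples = build_windows(features, flow)
print(len(samples), len(samples[0][0]), len(samples[0][0][0]))  # 572 8 6
```

With 579 days of records and 8 time steps, the first 7 days can only serve as history, which leaves the 572 valid samples stated in the text.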
First, the acquired raw data were divided into training and testing sets in the ratio of 3 : 1: the training set covers the first 429 days (June 8, 2020, to August 10, 2021) and the testing set covers the remaining 143 days (August 11, 2021, to December 31, 2021). The neural network model is then trained using the training set, and the prediction performance of the trained model is evaluated on the testing set. During the training process, the model parameters are continuously optimized based on the loss function until the maximum number of iterations is reached. For the impact evaluation, the training and testing sets are combined to train the final model. The partial dependence function is determined from the final model, the PDPs of the input and output variables are plotted, and the quantitative impact of the daily number of new cases on the daily passenger flow is evaluated.
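The chronological 3 : 1 split can be checked with a few lines of Python. This is a sketch under the assumption that the samples are already ordered by date; `chronological_split` is a hypothetical helper name, not a function from the study.

```python
def chronological_split(samples, train_parts=3, test_parts=1):
    """Split date-ordered samples without shuffling, earliest days first."""
    cut = len(samples) * train_parts // (train_parts + test_parts)
    return samples[:cut], samples[cut:]


# Placeholder sample indices standing in for the 572 valid days.
valid_days = list(range(572))
train, test = chronological_split(valid_days)
print(len(train), len(test))  # 429 143
```

Keeping the split chronological means the model is always evaluated on days that come after everything it was trained on, which matches the forecasting setting.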

GRU Neural Network Model.
Owing to the advantage of the nonparametric model in handling multidimensional nonlinear input, the GRU neural network is used to predict the daily passenger flow of rail transit. The GRU neural network is a variant of the LSTM neural network, which is a special type of recurrent neural network (RNN). The RNN uses temporal-dimensional information to process data with temporal characteristics. However, it cannot solve the long-term dependency problem and suffers from vanishing and exploding gradients, which led to the development of the LSTM neural network. The LSTM neural network introduces various gated units (e.g., forget gates, memory gates, and output gates), retains information that requires long-term memory, and forgets information of decaying value for an accurate prediction of time-series data [22].
Compared with the LSTM, the GRU neural network is simpler and more efficient; Figure 5 shows its structure. At time step t, the unit receives the data input $x_t$ and the hidden state $h_{t-1}$ of the previous time step and outputs the hidden state $h_t$ of the current time step. Unlike the LSTM, the GRU neural network has only two gated units: the reset gate $r_t$ and the update gate $z_t$. The reset gate $r_t$ is used to discard irrelevant historical information and controls how much of the previous hidden state $h_{t-1}$ is retained in the candidate hidden state $\tilde{h}_t$. The candidate hidden state $\tilde{h}_t$ is used to assist in the computation of the hidden state $h_t$. The update gate $z_t$ controls how the hidden state $h_t$ is updated by the candidate hidden state $\tilde{h}_t$ [23].
In the expressions for the GRU neural network, $x_t$ is the input of the current time step t, $h_t$ is the hidden state of time step t, and $h_{t-1}$ is the hidden state of the previous time step. $r_t$ and $z_t$ are the reset and update gates, respectively. $\tilde{h}_t$ is the candidate hidden state. $W_{ir}$, $W_{iz}$, and $W_{in}$ are the weights applied to the input $x_t$ for the reset gate, update gate, and candidate hidden state, respectively. $W_{hz}$ and $W_{hn}$ are the weights applied to the hidden state for the update gate and candidate hidden state, respectively. $b_{ir}$, $b_{iz}$, $b_{in}$, $b_{hz}$, and $b_{hn}$ are the bias vectors. $\sigma$ is the sigmoid activation function, tanh is the hyperbolic tangent activation function, and * is the Hadamard product.
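For reference, the symbols listed above match the standard GRU formulation documented for PyTorch's `torch.nn.GRU`, which can be written as follows. The terms $W_{hr}$ and $b_{hr}$ are not among the symbols listed in the text and are included here only as an assumption based on that standard formulation; the paper's own equations may differ in detail.

```latex
\begin{aligned}
r_t &= \sigma\left(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}\right) \\
z_t &= \sigma\left(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}\right) \\
\tilde{h}_t &= \tanh\left(W_{in} x_t + b_{in} + r_t * \left(W_{hn} h_{t-1} + b_{hn}\right)\right) \\
h_t &= (1 - z_t) * \tilde{h}_t + z_t * h_{t-1}
\end{aligned}
```

In this form, a reset gate close to 0 removes the contribution of $h_{t-1}$ from the candidate state, and an update gate close to 1 carries the previous hidden state forward almost unchanged.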
The GRU neural network is trained using PyTorch with the parameter settings shown in Table 2. The optimizer is the method used to update the parameters of a neural network, with the goal of bringing the parameters close to or at their optimal values and thus minimizing the network loss. The Adam optimizer used in this study combines the momentum algorithm with the root mean square propagation (RMSProp) algorithm, using momentum to accumulate gradients, for faster convergence and smaller fluctuations [24]. The loss function used is the mean square error (MSE) loss, which is calculated as $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{X}_i^{0\times 6} - X_i^{0\times 6}\right)^2$, where n is the number of predicted samples and $\hat{X}_i^{0\times 6}$ and $X_i^{0\times 6}$ are the predicted and actual daily passenger flows on the i-th day, respectively.

Figure 1 (stage labels): sharp decline, gradual recovery, stabilization.

Partial Dependence Plot (PDP).
Figure 3: Data structure on the i-th day.
The PDP is a visualization method for "black box" machine learning models and is widely used to improve the interpretability of input variables in machine learning or deep learning models [25,26]. Plotting the PDP relies on the fitted model to describe the average marginal effect of a feature variable on the predicted outcome through a variable intervention approach [27]. To plot the PDP for feature variable x and output variable y, the steps are as follows:

Journal of Advanced Transportation
Here, z is a possible value of the number of new cases $X_i^{8\times 1}$; based on the statistical results listed in Table 1, z = 0, 1, ..., 6. n is the number of samples; according to the technical route shown in Figure 4, the full dataset is used to draw the PDP, so n = 572. $N_{GRU}$ is the GRU neural network trained with the full sample.
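The variable intervention behind a PDP can be sketched generically: fix one input to a value z in every sample, average the model's predictions, and repeat for each z. The code below is an illustration under stated assumptions, not the authors' implementation. `partial_dependence` and `toy_model` are hypothetical names, the toy model is a simple linear stand-in rather than the trained GRU, and indices are 0-based (so `step=1, feature=0` corresponds to t = 2, n = 1).

```python
def partial_dependence(model, windows, step, feature, grid):
    """Average prediction when windows[...][step][feature] is forced to z."""
    curve = {}
    for z in grid:
        total = 0.0
        for window in windows:
            modified = [row[:] for row in window]  # copy so the data stay intact
            modified[step][feature] = z            # the intervention
            total += model(modified)
        curve[z] = total / len(windows)
    return curve


def toy_model(window):
    # Hypothetical stand-in for the trained network, NOT the paper's GRU:
    # flow falls with today's cases, yesterday's cases, and rain today.
    return 950.0 - 5.0 * window[0][0] - 3.0 * window[1][0] - 20.0 * window[0][1]


# Placeholder 8 x 6 windows with a few varying entries.
windows = [[[k % 3, k % 2, 15.0, 22.0, 0, 900.0] for _ in range(8)] for k in range(20)]

curve = partial_dependence(toy_model, windows, step=1, feature=0, grid=range(7))
print(curve[1] - curve[0])  # average change in predicted flow per extra case yesterday
```

For a linear stand-in, the slope of the curve simply recovers the model coefficient; for a nonlinear model such as a GRU, the curve can bend, which is what makes the plot informative.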
The PDP reflects how the predicted mean daily passenger flow changes when a certain input variable changes. The PDP analyzes the quantitative change in the output variable with the input variable and intuitively analyzes the causal relationship between them, increasing the explanatory power of the machine learning model [28]. However, the independence assumption is the main problem with the PDP. According to its principle, the premise of the PDP is that the input variable under analysis is not correlated with the other input variables, while in the real world, there are almost no two variables that are completely independent of each other. In this study, variables that are only weakly correlated with the other input variables are considered usable in the PDP calculation. Considering the strong correlation between the minimum and maximum temperatures, as well as the autocorrelation within the temperature time series itself, the minimum and maximum temperatures are not included in the subsequent analysis.
It should be noted that besides the PDP, there are other methods that can be utilized for model interpretation, such as the attention mechanism and the saliency map. The attention mechanism enables the model to selectively focus on a certain part of the input by adding an attention layer to the network. The attention layer assigns a global alignment weight to the hidden layer of the encoder network, indicating which input component should be allocated more attention. However, although the attention mechanism can indicate which input component is more important, it cannot interpret the quantitative relationship between input and output in realistic terms as the PDP does. The saliency map is a visualization technique that highlights the regions or features of an input that contribute most to the output prediction, and it is widely used in image captioning and object detection. Saliency maps identify the important features based on gradient calculation, the mechanism of which is similar to that of the PDP, that is, adding a disturbance to the input and then examining the resulting changes in the model prediction. Thus, the PDP was finally chosen as the model interpretation method due to its realistic significance.

Model Training and Prediction.
Based on the training set (June 8, 2020, to August 10, 2021), the GRU neural network was trained iteratively, and the training results of the LSTM network were used for comparison. Figure 6 shows the change in the MSE with the iteration number during training. Notably, since standardized variables are used in the network training process, the value range of the MSE is [0, 1]. As shown in Figure 6, both the GRU and LSTM achieve good convergence after 100 iterations. However, the LSTM fluctuates more at the beginning, and its MSE decreases to the same level as that of the GRU only after 80 iterations. In contrast, the GRU produces a smoother curve that converges quickly at the beginning of training and stabilizes after 40 iterations. Therefore, the GRU neural network benefits from its simplicity and effectiveness and outperforms the LSTM in terms of convergence speed and stability in the proposed daily passenger flow prediction problem.
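The stated MSE range of [0, 1] is consistent with min-max scaling, under which every scaled value lies in [0, 1] and no single squared error can exceed 1. The sketch below assumes that "standardized" means min-max scaling, which the text does not confirm, and that predictions also fall in [0, 1]. The function names and sample values are illustrative only.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1] (assumed scaling, see lead-in)."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot scale a constant series")
    return [(v - low) / (high - low) for v in values]


def mean_squared_error(predicted, actual):
    """Mean of squared differences between paired predictions and targets."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Illustrative flows in units of 10,000 person-time (placeholder values).
flows = [181.4, 750.0, 958.7, 1100.0]
scaled = min_max_scale(flows)

# Worst possible prediction inside [0, 1]: the opposite end for every sample.
worst = [0.0 if s >= 0.5 else 1.0 for s in scaled]
print(0.0 <= mean_squared_error(worst, scaled) <= 1.0)
```

If z-score standardization were used instead, the MSE would not be bounded by 1, so the bound itself hints at which scaling was applied.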
Using the trained GRU neural network model, predictions were made on the testing set. Based on the same testing set, the prediction result of the GRU was compared with the results of the LSTM, SARIMA, and SARIMAX (SARIMA with exogenous factors) models. SARIMA is a conventional autoregressive model for time-series prediction, while the SARIMAX model adds exogenous variables to SARIMA [5]. The SARIMA model can be expressed as $(p, d, q) \times (P, D, Q)_s$, where p, d, q, P, D, Q, and s are the orders of the model. The Akaike information criterion (AIC) is used to determine the optimal order set for SARIMA: (0, 1, 6) × (0, 1, 1). Figure 7 shows the actual passenger flows and the results predicted using the aforementioned four models. As observed, both the GRU and LSTM can represent the weekly periodic variation in the daily passenger flow and effectively reflect the sudden changes in the passenger flow caused by holidays or new cases. In comparison, SARIMA and SARIMAX perform relatively poorly in the case of such sudden changes. This suggests that nonparametric models, such as the GRU and LSTM, adapt better to external emergency events such as new COVID-19 cases. In addition, the passenger flows predicted by the GRU are closer to the true values, with a mean square error (MSE) of 40.76 million person-time$^2$ and a prediction accuracy of 95.25%, where, in the accuracy calculation, n is the sample size, $Y_i$ is the actual value of sample i, and $\hat{Y}_i$ is the predicted value of sample i. In contrast, the LSTM has a slightly greater deviation in the prediction of local peaks, with an MSE of 49.21 million person-time$^2$ and a prediction accuracy of 94.40%, while the MSE values of SARIMA and SARIMAX are much greater, reaching 194.64 million person-time$^2$ and 187.24 million person-time$^2$, respectively. The results suggest that the GRU has the best prediction performance.
To explore the effect of new cases on the prediction performance, a new GRU model was trained and tested on the same dataset after removing "new cases" from the independent variables. The MSE of the new model (without new cases as an independent variable) was 41.39 million person-time$^2$, which was higher than that of the original model (40.76 million person-time$^2$, with new cases as an independent variable). This indicates that including new cases as an independent variable improves the prediction performance.

Partial Dependence Plot (PDP).
To obtain the PDPs of the external factors, the training and testing sets are combined to train the GRU model, and the partial dependence of the daily passenger flow on each input variable is plotted. The premise of the PDP calculation is that the variable under analysis is independent of the other input variables, while completely independent variables are practically unavailable. In this study, variables that are not strongly correlated with the other input variables are considered usable in the calculation of the PDP. A lack of strong correlation is defined as an absolute value of Pearson's correlation coefficient of less than 0.5 [29]. After the analysis, both the number of new cases and the weather attribute of each time step satisfy this precondition. The minimum temperature, maximum temperature, and holiday attribute are strongly correlated within their own time series, and the minimum and maximum temperatures of the same time step are highly correlated with each other; therefore, these three variables were excluded from the subsequent analysis. Tables 3 and 4 show the correlations of the daily number of new cases $X_i^{1\times 1}$ and the weather attribute $X_i^{1\times 2}$ with the other input variables on the i-th day, respectively. In Table 3, the table headers represent the time step and feature dimension of the variable whose correlation coefficient with $X_i^{1\times 1}$ is calculated. The table contents represent the absolute value of Pearson's correlation coefficient between $X_i^{1\times 1}$ and that variable. For example, for t = 2 and n = 2, the corresponding value is 0.04, which means that the correlation coefficient between $X_i^{1\times 1}$ and the variable $X_{i-1}^{2\times 2}$ (weather attribute of time step 2, i.e., weather attribute on the (i − 1)-th day) is 0.04. From Tables 3 and 4, the correlation coefficients of the variables $X_i^{1\times 1}$ and $X_i^{1\times 2}$ with the other input variables are all less than 0.5 (apart from the trivial coefficient of 1 with themselves), i.e., the correlations are weak.
The PDP in Figure 8 shows the relationship between the daily passenger flow and the number of new cases at each time step, where t represents the time step. Based on the data structure shown in Figure 3, for the variable "daily number of new cases," t = 1 corresponds to the number of new cases on that day, t = 2 corresponds to the previous day, and t = 8 corresponds to a week ago.
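The screening rule described above, an absolute Pearson coefficient below 0.5 against every other input, can be sketched as follows. The helper names `pearson` and `weakly_correlated` and the sample series are illustrative assumptions, not the study's data or code.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def weakly_correlated(target, others, threshold=0.5):
    """True if |r| between target and every other series is below threshold."""
    return all(abs(pearson(target, other)) < threshold for other in others)


# Placeholder series: the first pair is uncorrelated, the second is not.
cases = [1.0, 2.0, 3.0, 4.0]
rain = [1.0, -1.0, -1.0, 1.0]
min_temp = [10.0, 12.0, 14.0, 16.0]

print(weakly_correlated(cases, [rain]))      # eligible for a PDP
print(weakly_correlated(cases, [min_temp]))  # excluded: perfectly correlated
```

In the study's setting, each candidate variable would be compared against all other entries of the 8 × 6 input, which is what Tables 3 and 4 report.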
As shown in Figure 8, there is a negative correlation between the daily passenger flow and the number of new cases, regardless of the time step. The numbers of new cases on the current day (t = 1) and the previous day (t = 2) have the greatest effect on daily passenger flow. As the number of new cases increases, the daily passenger flow decreases significantly, suggesting that rail transit trips decrease when new cases occur. Of the two, the number of new cases yesterday has a greater impact on passenger flow today, while the passenger flow today is slightly less sensitive to the pandemic situation on that day, suggesting that rail transit trips on a given day are adjusted to a greater extent based on the pandemic situation of the day before.
As time passes, the number of new cases from earlier days (t = 3, 4, ..., 8) has an increasingly small impact on today's passenger flow. Likewise, Figure 9 shows the PDP of the daily passenger flow with respect to the weather attribute at each time step. As shown, the weather attribute of the current day (t = 1) has the greatest effect on the daily passenger flow. With increasing time steps, the effect of the weather attribute on the daily passenger flow decreases. Compared with the case of no rain (weather attribute = 0), the average daily passenger flow decreases by 207,600 person-times when there is rain (weather attribute = 1), suggesting that rainy days significantly reduce the number of rail transit trips.

Conclusions
Based on a GRU neural network model, the daily number of new cases, weather attribute, temperature, holiday attribute, and historical passenger flow were used as input parameters to predict the daily passenger flow in urban rail transit in the postpandemic era. The results showed that the GRU neural network can produce an accurate prediction of the daily passenger flow and exhibits faster convergence and lower MSE than the LSTM neural network, consistent with previous studies (Hou et al. [20]). This further demonstrates that the GRU, as a variant of the LSTM, retains the ability of the LSTM to deal with long-term dependence problems while converging and stabilizing faster owing to its simplified structure.
Based on the trained GRU neural network model, PDPs of the daily passenger flow with respect to the daily number of new cases and the weather attribute were drawn. The results showed that (1) daily passenger flow was negatively correlated with the number of new cases. Among the eight time steps, the number of new cases yesterday had the greatest impact on the daily passenger flow. For each additional case, the daily passenger flow decreased by 54,600 person-time on average. (2) The weather attribute of the day also significantly influenced the daily passenger flow. The daily passenger flow on rainy days decreased by 207,600 person-time on average compared with that on nonrainy days.
In the postpandemic era, previously established daily passenger flow prediction models are no longer applicable; the number of new cases should be incorporated as an influencing factor to effectively predict and mitigate the impact of future pandemic situations on urban rail transit. To the best of our knowledge, this is the first study to use the daily number of local new COVID-19 cases for daily passenger flow prediction. The quantitative relationship between the number of new cases and daily passenger flow was investigated using the PDP while a nonparametric model was used for accurate prediction. Both the research method and the results provide important references for subsequent studies.
The shortcomings of the paper and suggestions for future work are as follows: (1) As time passes, the public will become more tolerant of health-related emergencies, and the coping mechanisms of society will improve. Whether the number of new cases will have a lesser impact on daily passenger flow and whether the proposed model will remain applicable to passenger flow prediction in the long term remain to be verified. Future studies can adopt the methodology of this paper to model and analyze the daily passenger flow in different periods and to mine the patterns of variation in the pandemic's impact on the daily passenger flow. (2) The PDP is limited by the independence assumption, so the quantitative relationship between all the input variables and daily passenger flow cannot be explored accurately. The impact of the remaining input variables on passenger flow prediction performance can be evaluated in combination with variable importance calculation methods such as the attention mechanism and gradient calculation, which can also provide further insight into the model's prediction behavior. (3) The methodology can be further extended to other transportation modes (e.g., intercity rail transit and bus travel) to analyze the impact of the pandemic on urban transportation travel structures.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.