Short-Term Passenger Flow Forecast of Rail Transit Station Based on MIC Feature Selection and ST-LightGBM considering Transfer Passenger Flow

To address the problems of current short-term forecasting methods for metro passenger flow, such as unclear influencing factors, low accuracy, and high time and space complexity, we propose a forecasting method based on ST-LightGBM that takes transfer passenger flow into account. First, the short-term prediction of metro passenger flow is formalized as a data-driven multi-input single-output regression problem that uses historical data as the training set, and the difficulties of the problem are identified. Second, we extract the candidate temporal and spatial features that may affect passenger flow at a metro station from passenger travel data, based on the spatial transfer and spatial similarity of passenger flow. Third, we use a maximal information coefficient (MIC) feature selection algorithm to select the features with significant impact as the input. Finally, a short-term forecasting model for metro passenger flow based on the light gradient boosting machine (LightGBM) is established. By taking transfer passenger flow into account, this method achieves a low space-time cost and high accuracy. The experimental results on the dataset of Lianban metro station in Xiamen city show that the proposed method obtains higher prediction accuracy than SARIMA, SVR, and BP network.


Introduction
In recent years, China's economy has developed rapidly, and the process of urbanization has gradually accelerated. The country has continuously increased its efforts to build public transportation, and urban rail transit stands out as a new direction in this field. Urban rail transit has the advantages of strong carrying capacity, a high punctuality rate, energy conservation, and environmental protection [1]. Its development is considered an effective way to alleviate urban traffic congestion. Hence, the future trend of China's urban transportation development is to establish a comprehensive transportation system with urban rail transit as the backbone, public transport as the main body, and various modes of transportation interconnected. By the end of 2019, rail transit had been built in 40 cities in China, and the total mileage of metro construction had reached 6736.2 km [2]. Passenger flow prediction not only plays a guiding role in the planning and design of rail transit but also plays an irreplaceable role in its operation. The most commonly used passenger flow prediction method is the four-stage method [3,4], which consists of four parts: travel generation, travel distribution, travel mode split, and travel assignment. It is a macrolevel prediction method. Chicago was the first city to use this method for traffic prediction, in 1962. It is well suited to the long-term prediction of passenger flow and is of great significance for the planning of rail transit networks, the construction of engineering projects, and the selection of station equipment. However, long-term passenger flow forecasting cannot solve the problems arising from the daily operation of rail transit. With the development of rail transit, most people choose rail transit as their main travel mode, which has directly led to the rapid growth of rail transit passenger flow.
This has led to problems such as passenger congestion, low operating efficiency, unbalanced capacity and demand, and poor driving safety [5,6]. Therefore, a more accurate short-term passenger flow forecasting method is needed. Through short-term passenger flow forecasting, we can obtain passenger travel data for a short period of time in the future so as to grasp the passenger flow trend accurately and provide a basis for the organization and management of the operation department (e.g., it can help the operation department to dynamically adjust rail transit capacity in peak hours, schedule service personnel reasonably, and handle emergencies in a timely manner). In addition, short-term passenger flow forecasting can improve the operation efficiency of rail transit, reduce the time cost of passengers, and improve passenger satisfaction, thus improving the level of public service of rail transit and increasing its competitiveness. However, the factors that influence short-term passenger flow at a metro station are intricate. Moreover, short-term passenger flow is nonlinear, nonstationary, random, and subject to sudden changes, which makes the prediction more difficult. Data-driven methods have proven to be an effective way to solve short-term forecasting problems [7,8]. LightGBM is a new boosting framework model that was proposed by Microsoft in 2015 [9]. It has a fast training speed and low memory consumption, can process massive data quickly, and has better model accuracy, which makes it suitable for solving the short-term passenger flow forecast problem of rail transit. The purpose of this study is to forecast the short-term passenger flow of a metro station.
The main contributions and novelty of this paper are as follows:
(1) To supplement the lack of scientific analysis of the short-term metro passenger flow prediction problem, we formally describe the problem based on the data-driven model and analyze its difficulties to better describe the complexity of short-term metro passenger flow prediction.
(2) To overcome the problems of feature incompleteness and high cost of feature acquisition in traditional methods, we use temporal features, spatial similarity features, and spatial transfer features extracted from IC card data as the candidate influence features, which are more comprehensive and easy to obtain.
(3) To solve the problem of heavy computational burden caused by excessive input features, the candidate features are further filtered using a maximal information coefficient (MIC) feature selection algorithm to extract the significant features, which reduces the feature dimension and the computational cost.
(4) To address the problems that existing methods cannot reflect the uncertainty of short-term passenger flow and that their prediction accuracy is not high enough, we use the ensemble learning algorithm LightGBM as the prediction model to describe the nonlinear characteristics of short-term passenger flow and improve the prediction accuracy.
(5) The experimental results on the dataset of Lianban metro station in Xiamen city show that the proposed method obtains a higher prediction accuracy than SARIMA, SVR, and BP network.

Related Work
Many scholars have conducted a great deal of research on the prediction of short-term passenger flow. The historical average model was the first method applied to traffic flow prediction [10]. However, it is difficult for the historical average model to reflect the randomness of passenger flow. It requires strong stability and periodicity of data, which makes its application conditions restrictive. Thus, the performance of the historical average model in the research of El Esawey [11] and Yang et al. [12] was not good. The Kalman filtering model [13] is also one of the commonly used passenger flow prediction methods. Jiao et al. [14] proposed three improved Kalman filter models for the short-term prediction of rail transit passenger flow and achieved good prediction results. The time series model is a classic model for passenger flow prediction [15]. Milenković et al. [16] predicted railway passenger flow using the autoregressive integrated moving average (ARIMA) model, which achieved good prediction results. Anvari et al. [17] constructed a time series prediction framework for a public transport system based on the Box-Jenkins method, which included the ARIMA model. Li et al. [18] proposed a hybrid model that combined a symbolic regression model and an ARIMA model to predict the passenger flow of Xian rail line 1. The prediction results showed that the hybrid model has better prediction accuracy than the simple ARIMA model. With the rise of machine learning, data-based nonparametric regression models were applied to the study of short-term passenger flow prediction. Regarding the support vector machine (SVM) model [19,20], Sun et al. [21] forecasted transfer passenger flow for Beijing rail transit with a wavelet SVM model.
For the K-nearest neighbor (K-NN) regression model [22], Habtemichael and Cetin [23] proposed a nonparametric and data-driven methodology for short-term traffic forecasting based on identifying similar traffic patterns using an enhanced K-NN algorithm. Regarding the Bayesian network model, Roos et al. [24] proposed a method based on a dynamic Bayesian network to predict the short-term passenger flow of the Paris Metro, which can work normally even when the data are incomplete. For the neural network model [25,26], Zhu et al. [27] constructed a three-layer neural network to predict the outbound and inbound passenger flow of a metro station by analyzing the main dynamic factors that affect passenger flow in a rail transit station. The prediction accuracy was higher than that of the traditional linear regression method. Liu and Chen [28] used a stacked autoencoder (SAE) to extract the nonlinear characteristics of the input and constructed a hybrid model (stacked autoencoder-deep neural network, SAE-DNN) to predict passenger flow in BRT stations. Chen et al. [29] constructed a long short-term memory network prediction model for rail transit passenger flow based on empirical mode decomposition. Liu et al. [30] used a deep learning architecture to predict the outbound passenger flow of the research station according to the arrival schedule of the rail train and the inbound passenger flow of other stations. Han et al. [31] used graph convolution to mine the temporal and spatial dependence of each station and proposed a short-term passenger flow prediction model for rail transit based on spatial-temporal graph convolutional neural networks. Both methods only take into account the spatial correlation of stations within the rail transit system and ignore the impact of transfer effects between other public transport modes (i.e., conventional bus transit and bus rapid transit (BRT)) and rail transit [32].
The historical average model cannot reflect the uncertainty caused by changes in passenger flow very well, so its prediction error is relatively large. The Kalman filtering model requires many parameter vector calculations, which makes its operation complicated. When passenger flow fluctuates greatly, the time series model ARIMA cannot effectively capture the trend of passenger flow. The SVM and K-NN models have a high time complexity and cannot adapt to large-scale training data. The network construction process of the Bayesian network model is complex. The neural network model converges slowly, easily falls into local optima, and has a high demand for training data.
Recently, ensemble learning algorithms have also been applied to the prediction of rail transit passenger flow and have achieved good results [33]. LightGBM is an open-source, fast, and efficient boosting framework based on a decision tree algorithm and on the idea of gradient boosting. LightGBM supports efficient parallel training and achieves good results in regression and classification problems [34-37], which makes it well suited to this field. In this study, a spatial-temporal feature extraction method that considers transfer passenger flow is proposed, and a metro station passenger flow prediction model based on LightGBM is constructed. The remainder of this article is structured as follows. In Section 3, a formal description of the problem of metro passenger flow prediction is presented. In Section 4, a spatial-temporal feature extraction method and passenger flow prediction model are introduced. In Section 5, experimental research based on Xiamen (a city at the southeast end of Fujian Province, China) public transport data is introduced, and the experimental results and model performance are evaluated.

Related Definitions
(1) Rail Transit. The general term for fast, large-volume public transportation that is powered by electric energy and uses a wheel-rail transportation system (in this study, it refers to the metro).
(2) Metro Station. A place that provides a stop for metro trains to carry goods or passengers.
(3) Yitong Card. A kind of smart card that can be used in the public transportation payment system.
(4) BRT QR Code. A kind of QR code that can be used in the BRT payment system.
(5) Metro QR Code. A kind of QR code that can be used in the metro payment system.
(6) BRT One-Way Ticket. A kind of anonymous BRT ticket sold by an automatic ticket vending machine, which is swiped once before entering the station and needs to be put into the recycling slot before leaving the station.
(7) Metro One-Way Ticket. A kind of anonymous metro ticket sold by an automatic ticket vending machine, which is swiped once before entering the station and needs to be put into the recycling slot before leaving the station.

Introduction to the Composition of the Data Dictionary.
Having sufficient data is the basis for forecasting. With the rapid development of passenger data acquisition technology, sufficient data can be obtained for the short-term prediction of passenger flow. The Xiamen public transport system is considered as an example. During the study period, there were six main types of passenger payment in Xiamen: "Yitong card," "Coin payment," "BRT QR code," "BRT one-way ticket," "Metro QR code," and "Metro one-way ticket." Conventional bus transit supports the two payment methods of "Yitong card" and "Coin payment." BRT supports the three payment methods of "Yitong card," "BRT QR code," and "BRT one-way ticket." Rail transit supports the three payment methods of "Yitong card," "Metro QR code," and "Metro one-way ticket." Hence, we counted the rail transit passenger flow using the data of the "Yitong card," "Metro QR code," and "Metro one-way ticket." From the above description, "Coin payment" can only be used for conventional bus transit; "BRT QR code" and "BRT one-way ticket" can only be used for BRT; "Metro QR code" and "Metro one-way ticket" can only be used for the metro; and the "Yitong card" is the only universal payment method for the three modes of transportation (i.e., conventional bus transit, BRT, and rail transit). Additionally, each "Yitong card" has a unique physical card number that corresponds to a unique passenger. Therefore, only the "Yitong card" can be used to identify transfer passenger flow, which we regard as one of the influencing factors in the subsequent sections. Table 1 introduces the travel data records: ID, otime, ostation, dtime, dstation, date, type, and public transport are the attributes that denote the card identification, origin time, origin station, destination time, destination station, date, payment type, and travel mode (rail transit, BRT, or conventional bus transit), respectively.
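As a rough illustration of how records with the attributes of Table 1 could be turned into per-interval inbound counts, the following sketch aggregates entries at one station into 10-minute periods. The dictionary keys mirror the attribute names above, but the key spelling `public_transport`, the `HH:MM:SS` time format, the mode label strings, and the sample rows are assumptions made for this example only.

```python
from collections import Counter
from datetime import datetime

INTERVAL_MIN = 10  # prediction interval in minutes (one of the example values)


def interval_index(otime: str, interval_min: int = INTERVAL_MIN) -> int:
    """Map an 'HH:MM:SS' origin time to its interval index within the day."""
    t = datetime.strptime(otime, "%H:%M:%S")
    return (t.hour * 60 + t.minute) // interval_min


def inbound_flow(records, station, modes=("rail transit",)):
    """Count entries at `station` per (date, interval index) for the given modes."""
    counts = Counter()
    for r in records:
        if r["ostation"] == station and r["public_transport"] in modes:
            counts[(r["date"], interval_index(r["otime"]))] += 1
    return counts


# Hypothetical sample rows; real records would come from the travel data.
records = [
    {"ID": "A1", "otime": "08:03:10", "ostation": "Lianban", "date": "2018-11-01",
     "type": "Yitong card", "public_transport": "rail transit"},
    {"ID": "A2", "otime": "08:09:59", "ostation": "Lianban", "date": "2018-11-01",
     "type": "Metro QR code", "public_transport": "rail transit"},
    {"ID": "A3", "otime": "08:10:00", "ostation": "Lianban", "date": "2018-11-01",
     "type": "Metro one-way ticket", "public_transport": "rail transit"},
    {"ID": "A4", "otime": "08:05:00", "ostation": "Lianban", "date": "2018-11-01",
     "type": "Coin payment", "public_transport": "conventional bus transit"},
]

flow = inbound_flow(records, "Lianban")
```

With a 10-minute interval, a day is split into 144 periods, so the interval index runs from 0 to 143.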

Formal Description of the Passenger Flow Prediction Problem in a Metro Station Based on the Data-Driven and Multiple Regression Model.
Let $j$ be the target metro station, $\Delta t$ be the prediction time interval (e.g., 10, 20, or 30 minutes), and $x^{in}_{j,t}$ be the inbound passenger flow of station $j$ in target time period $t$. First, the feature set of the spatial-temporal influencing factors is determined and expressed as $Te = \{te_1, te_2, \ldots, te_i, \ldots, te_n\}$, where $te_i$ represents the $i$th spatial-temporal influencing feature. $Te$ is used as the input of the model, and $x^{in}_{j,t}$ is the output of the model. Historical data are used as training data for the multi-input single-output regression model, and the regression prediction model of metro station passenger flow is trained on them. The problem model is shown in Figure 1.

Difficulties of the Problem
(1) There are many factors that influence the short-term passenger flow of a metro station. As public transport becomes integrated, all types of public transport modes are bound together. Passenger flow in a metro is affected not only by its own system but also by other public transport modes. How to use existing data to extract and select the significant influencing factors in the space-time dimension is an important issue.
(2) The relationship between the influencing factors and short-term passenger flow is complex and nonlinear. To improve the prediction accuracy, it is also necessary to select a suitable model to express the nonlinear relationship between the influencing factors and passenger flow.

Short-Term Passenger Flow Forecast of Rail Transit Station Based on MIC Feature Selection and ST-LightGBM considering Transfer Passenger Flow
Metro station passenger flow forecasting is a complex problem in time and space. Thus, this section is divided into four parts: the first part is the extraction of the candidate temporal and spatial features that affect the inbound passenger flow of the metro station, the second part is the selection of candidate spatial-temporal features using the MIC algorithm, the third part is the introduction of the prediction model based on LightGBM, and the final part is the theoretical analysis and comparison of the proposed method and other methods.

Spatial-Temporal Feature Extraction
Let $j$ be the target metro station, $\Delta t$ be the prediction time interval (e.g., 10, 20, or 30 minutes), $x^{in}_{j,t}$ be the inbound passenger flow of station $j$ in target time period $t$, $day_t$ be the "weekly information" (i.e., Monday, Tuesday, ..., Sunday) of target period $t$, and $hour_t$ be the hour of the day that corresponds to target period $t$. Because passenger flow in a metro station varies within a week (e.g., working days and nonworking days) and within a day (e.g., peak hours and off-peak hours), $x^{in}_{j,t}$ also changes with $day_t$ and $hour_t$. Additionally, passenger flow has the property of time delay, so historical inbound passenger flow is correlated with that of the current period. Therefore, the historical passenger flow set $His_{j,t} = \{x^{in}_{j,t-k}, x^{in}_{j,t-k+1}, \ldots, x^{in}_{j,t-1}\}$ is another temporal feature that affects $x^{in}_{j,t}$. Finally, three temporal features are extracted: $day_t$, $hour_t$, and $His_{j,t}$.
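The three temporal features can be sketched in a few lines. The example below assumes the inbound counts of the target station are already available as a list aligned with one timestamp per period; the function name, the ISO weekday encoding (Monday = 1), and the toy data are illustrative choices, not the paper's implementation.

```python
from datetime import datetime, timedelta


def temporal_features(flow, timestamps, t, k):
    """Return day_t, hour_t and the k most recent inbound counts before period t."""
    if t < k:
        raise ValueError("need at least k past periods before t")
    ts = timestamps[t]
    return {
        "day": ts.isoweekday(),      # 1 = Monday ... 7 = Sunday (assumed encoding)
        "hour": ts.hour,             # hour of day of the target period
        "his": list(flow[t - k:t]),  # x_{t-k}, ..., x_{t-1}
    }


# Toy data: one day of 10-minute periods starting on 1 November 2018.
start = datetime(2018, 11, 1)
timestamps = [start + timedelta(minutes=10 * i) for i in range(144)]
flow = list(range(144))  # placeholder counts, not real passenger data

features = temporal_features(flow, timestamps, t=50, k=3)
```

Period 50 starts at 08:20, so the sketch reports hour 8 and the three counts of periods 47 to 49 as the history window.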

Spatial Feature Extraction
(1) Spatial Similarity Feature Extraction. Because the land function of the space in which adjacent stations are located is similar, the travel habits (e.g., departure time) of passengers at these adjacent stations are similar. Hence, there is spatial similarity between the passenger flow of a metro station and that of adjacent stations (i.e., adjacent conventional bus stations, BRT stations, and metro stations). Therefore, the current inbound passenger flow of a metro station is also related to the historical inbound passenger flow of adjacent stations. Suppose that the target metro station $j$ has $n$ adjacent metro stations and $m$ adjacent bus stations (i.e., BRT and conventional bus stations). Then, the spatial similarity features of the passenger flow at the metro station can be represented by the adjacent station history inbound passenger flow matrix (ASHIM). Select the historical inbound passenger flow in the past $k$ periods. Then, the size of the ASHIM is $k \times (n + m)$, and it can be denoted by

$$ASHIM_{j,t} = \begin{bmatrix} x^{in}_{jr(1),t-k} & \cdots & x^{in}_{jr(n),t-k} & x^{in}_{jb(1),t-k} & \cdots & x^{in}_{jb(m),t-k} \\ \vdots & & \vdots & \vdots & & \vdots \\ x^{in}_{jr(1),t-1} & \cdots & x^{in}_{jr(n),t-1} & x^{in}_{jb(1),t-1} & \cdots & x^{in}_{jb(m),t-1} \end{bmatrix},$$

where $x^{in}_{jr(n),t-k}$ is the inbound passenger flow of the $n$th adjacent metro station of target metro station $j$ at time period $t-k$ and $x^{in}_{jb(m),t-k}$ is the inbound passenger flow of the $m$th adjacent bus station of target metro station $j$ at time period $t-k$.
(2) Spatial Transfer Feature Extraction. Passengers exhibit transfer behavior in travel activities. Thus, some passengers may transfer to the rail system from other travel modes (BRT and conventional bus transit). Specifically, some passengers will transfer to an adjacent metro station after leaving the bus or BRT station and continue to travel by rail transit. Therefore, for metro station $j$, a proportion of the outbound passengers of the adjacent conventional bus and BRT stations in the previous several periods will transfer to metro station $j$ at time period $t$ and then complete their travel by rail transit. Hence, the metro station's inbound passenger flow in the current period is also related to the transfer passenger flow from the historical outbound passenger flow of the adjacent BRT and conventional bus stations. According to the historical outbound passenger flow of the $m$ adjacent bus stations (i.e., BRT and conventional bus stations) in the past $k$ periods, we can obtain the outbound passenger flow matrix of the adjacent bus stations, i.e., adjacent bus station history outbound passenger flow . . .
where $x^{out}_{jb(m),t-k}$ is the outbound passenger flow of the $m$th adjacent bus station of target metro station $j$ at time period $t-k$. In the analysis, we obtain the outbound passenger flow of each adjacent bus station in the previous periods. However, as time period $t$ has not yet occurred, we do not know what proportion of the passenger flow $x^{out}_{jb(m),t-k}$ will transfer to metro station $j$ at time period $t$. To solve this problem, we set up the transfer ratio matrix (TRM) according to the historical average transfer ratio. The size of the TRM is $k \times m$, and it can be expressed as where transfer jb ( . . .
By adding all the elements of TPM, we obtain the total number of passengers $All\_transfer_{j,t}$ who transfer from all the adjacent bus stations to metro station $j$ in time period $t$. This total is the spatial transfer feature $All\_transfer_{j,t}$.
Finally, the extracted candidate temporal and spatial features form the candidate feature set $Te$: $day_t$, $hour_t$, $His_{j,t} = \{x^{in}_{j,t-k}, x^{in}_{j,t-k+1}, \ldots, x^{in}_{j,t-1}\}$, $ASHIM_{j,t}$, and $All\_transfer_{j,t}$.
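The two spatial features can be sketched with NumPy as follows. The ASHIM part simply stacks the last k inbound rows of the adjacent metro and bus stations. For the transfer feature, the sketch assumes that the matrix summed to obtain All_transfer is the element-wise product of the outbound flow matrix and the transfer ratio matrix; that product, the function names, and the toy numbers are assumptions of this example rather than details confirmed by the text.

```python
import numpy as np


def build_ashim(metro_in, bus_in, t, k):
    """Stack the inbound flow of adjacent stations for periods t-k ... t-1.

    metro_in has shape (T, n) and bus_in has shape (T, m); the result is k x (n + m).
    """
    if t < k:
        raise ValueError("need at least k past periods before t")
    window = slice(t - k, t)
    return np.hstack([metro_in[window], bus_in[window]])


def all_transfer(outbound, trm):
    """Sum of outbound flow times transfer ratio over all periods and stations.

    Assumes the transfer passenger matrix is the element-wise product of the
    k x m outbound matrix and the k x m transfer ratio matrix.
    """
    outbound = np.asarray(outbound, dtype=float)
    trm = np.asarray(trm, dtype=float)
    if outbound.shape != trm.shape:
        raise ValueError("outbound matrix and TRM must have the same shape")
    return float((outbound * trm).sum())


# Toy data: 10 periods, n = 2 adjacent metro stations, m = 3 adjacent bus stations.
metro_in = np.arange(20).reshape(10, 2)
bus_in = 100 + np.arange(30).reshape(10, 3)
ashim = build_ashim(metro_in, bus_in, t=5, k=3)

# Toy outbound counts and ratios for k = 2 periods and m = 2 bus stations.
total = all_transfer([[10, 20], [30, 40]], [[0.25, 0.0], [0.5, 0.5]])
```

Flattening `ashim` row by row and appending the scalar transfer total would give one possible layout of the spatial part of a feature vector.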

Feature Selection Based on the Maximal Information Coefficient (MIC).
In the previous section, we constructed the candidate spatial-temporal features of passenger flow prediction and obtained a comprehensive set $Te$ of candidate features. Feature selection can solve the problem of heavy computational burden caused by excessive input features [38]. To make passenger flow prediction more effective, we need to select the more important features from set $Te$ and obtain a simplified feature input so that the subsequent learning process only needs to establish a model based on the important features. Embedded and wrapper feature selection algorithms depend closely on the learner; they are prone to overfitting and have high time complexity and poor interpretability. Thus, we choose the filter feature selection algorithm MIC [39]. Compared with other filter feature selection methods, the MIC algorithm can measure a wide range of dependences between variables, including linear and nonlinear relations and even nonfunctional dependence, which cannot be represented by a single function (e.g., dependence composed of multiple functions). Additionally, as a filter feature selection algorithm, it executes efficiently, so we choose MIC as the feature selection method.

The MIC is mainly calculated using mutual information and grid division. Mutual information is an indicator that measures the correlation between variables. Given variables $A = \{a_i, i = 1, 2, \ldots, n\}$ and $B = \{b_i, i = 1, 2, \ldots, n\}$, where $n$ is the number of samples, mutual information is defined as follows:

$$MI(A, B) = \sum_{a \in A} \sum_{b \in B} pro(a, b) \log \frac{pro(a, b)}{pro(a)\, pro(b)},$$

where $pro(a, b)$ is the joint probability density of $A$ and $B$ and $pro(a)$ and $pro(b)$ are the marginal probability densities of $A$ and $B$, respectively. Histogram estimation is used to estimate the above probability densities. Suppose $D = \{(a_i, b_i), i = 1, 2, \ldots, n\}$ is a finite set of ordered pairs. Define division $G$ to divide the range of variable $A$ into $x$ segments and the range of $B$ into $y$ segments. Thus, $G$ is an $x \times y$ grid.

Calculate the mutual information $MI(A, B)$ in each grid division. There are many ways to place an $x \times y$ grid, and the maximum value of $MI(A, B)$ over these placements is taken as the mutual information value of $G$. Define the maximum mutual information formula of $D$ under division $G$ as follows:

$$MI^{*}(D, x, y) = \max_{G} MI(D|G),$$

where $D|G$ indicates that data $D$ is divided by $G$. Use the maximum normalized MI values obtained under different divisions to form the feature matrix, which is defined as

$$M(D)_{x,y} = \frac{MI^{*}(D, x, y)}{\log \min\{x, y\}}.$$

Then, the MIC is defined as

$$MIC(D) = \max_{xy < B(n)} M(D)_{x,y},$$

where $B(n)$ is the upper limit value of grid division $x \times y$. Reshef et al. [39] suggested that $B(n) = n^{0.6}$ is best.
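To make the roles of mutual information, grid division, normalization, and the bound B(n) concrete, the following sketch computes a simplified MIC-style score. It is deliberately not the full algorithm of Reshef et al.: instead of searching over all placements of an x-by-y grid, it evaluates only equal-frequency grids, so the result is at most a lower bound on the true MIC. The function names and the toy data are illustrative assumptions.

```python
import numpy as np


def rank_bins(v, bins):
    """Assign each value to one of `bins` equal-frequency cells via its rank."""
    ranks = np.argsort(np.argsort(v, kind="stable"), kind="stable")
    return (ranks * bins) // len(v)


def mutual_information(a, b, x_bins, y_bins):
    """Histogram estimate of MI(A, B) in nats on an equal-frequency grid."""
    ia, ib = rank_bins(a, x_bins), rank_bins(b, y_bins)
    joint = np.zeros((x_bins, y_bins))
    np.add.at(joint, (ia, ib), 1)          # count samples per grid cell
    joint /= joint.sum()                   # joint probabilities
    pa = joint.sum(axis=1, keepdims=True)  # marginal of A, shape (x, 1)
    pb = joint.sum(axis=0, keepdims=True)  # marginal of B, shape (1, y)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())


def approx_mic(a, b, alpha=0.6):
    """Max of normalized MI over equal-frequency grids with x * y < n ** alpha."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    limit = len(a) ** alpha
    best = 0.0
    x = 2
    while x * 2 < limit:
        y = 2
        while x * y < limit:
            mi = mutual_information(a, b, x, y)
            best = max(best, mi / np.log(min(x, y)))  # normalize to [0, 1]
            y += 1
        x += 1
    return best


# Toy check: a deterministic relation versus independent noise.
a = np.linspace(0.0, 1.0, 200)
score_dependent = approx_mic(a, 2.0 * a + 1.0)
rng = np.random.default_rng(0)
score_noise = approx_mic(rng.random(200), rng.random(200))
```

For a production analysis, a dedicated MIC implementation that performs the full grid search would be preferable to this simplification.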
We use the MIC to define the correlation between the features and the target value. The candidate feature set is $Te = \{te_1, te_2, \ldots, te_i, \ldots, te_n\}$. The correlation between any feature $te_i$ and the target value is defined as $MIC(te_i, \text{target value})$, whose value range is $[0, 1]$. The larger the value of $MIC(te_i, \text{target value})$, the stronger the correlation between $te_i$ and the target value, and $te_i$ is a strongly correlated feature. The smaller the value, the weaker the correlation, and $te_i$ is a weakly correlated feature.
A flowchart for feature selection is shown in Figure 2. Through the MIC feature selection algorithm, we obtain the significant feature set $Te'$.
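The selection step itself reduces to filtering the candidate set by MIC score. The sketch below assumes the scores have already been computed and stored by feature name; the feature names, the scores, and the choice of keeping features whose score is greater than or equal to the threshold are all assumptions for illustration.

```python
def select_features(mic_scores, threshold):
    """Keep the features whose MIC with the target reaches the threshold."""
    return [name for name, score in mic_scores.items() if score >= threshold]


# Hypothetical scores for a few candidate features (not values from the paper).
mic_scores = {"hour": 0.82, "day": 0.41, "his_t-1": 0.93, "all_transfer": 0.75}
selected = select_features(mic_scores, threshold=0.7)
```

Because the filter does not involve the learner, the same selected set can be reused for every model that is compared.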

ST-LightGBM Passenger Flow Prediction Model.
LightGBM is an open-source, fast, and efficient boosting framework based on a decision tree algorithm, which supports efficient parallel training and can greatly shorten the training time. The idea of gradient boosting is to build the model iteratively, adding submodels one at a time and ensuring that the loss function keeps decreasing. Let $f_i(x)$ be the submodel, $F_n(x) = z_0 f_0(x) + z_1 f_1(x) + \cdots + z_n f_n(x)$ be the composite model, and $Loss[F_n(x), Y]$ be the loss function. Every time a new submodel is added, the loss function decreases toward the gradient of the variable with the next highest information content. The gradient boosting decision tree (GBDT) is a classical model. GBDT combines the characteristics of gradient boosting and decision trees, achieves good prediction results, and is not prone to overfitting. However, when calculating the information gain, it needs to scan all samples to determine the best split point, which consumes a great deal of computing time. LightGBM is a type of GBDT that is designed to solve the problems encountered by GBDT in massive data processing. It optimizes GBDT with two algorithms: gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). GOSS [9] builds on the observation that the larger the gradient of a sample, the more important its role in calculating information gain, so a fairly accurate estimate of the information gain can be obtained from a small number of samples. The core idea of the GOSS algorithm is to select the samples with large gradients from the total samples, select some samples randomly from the remaining samples, and combine them into a new sample set to learn a new classifier. This keeps the distribution of the new samples consistent with that of the total samples while still training on small-gradient samples. Therefore, without changing the distribution of samples, the accuracy of classifier learning is preserved and the training time is greatly reduced. EFB [9] is an algorithm that can reduce the number of features of high-dimensional data while minimizing the loss. It bundles mutually exclusive features in a sparse feature space (features that rarely take nonzero values at the same time) into a single feature and then builds the feature histogram from the bundle in the same way as for a single feature. Thus, the training of GBDT can be accelerated without loss of accuracy.
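The sampling idea behind GOSS can be sketched independently of any boosting library. In the example below, the fraction `a` of samples with the largest absolute gradients is always kept, a fraction `b` of all samples is drawn at random from the remainder, and the randomly drawn samples are reweighted by (1 - a) / b so that their contribution to the gain estimate is not understated. The default values of `a` and `b`, the seed, and the function name are illustrative assumptions, and the sketch is not LightGBM's internal code.

```python
import numpy as np


def goss_sample(gradients, a=0.2, b=0.1, seed=0):
    """Return (indices, weights) of a gradient-based one-side sample."""
    g = np.asarray(gradients, dtype=float)
    n = len(g)
    top_n, rand_n = int(a * n), int(b * n)

    order = np.argsort(-np.abs(g))  # indices sorted by decreasing |gradient|
    top_idx = order[:top_n]         # large-gradient samples, always kept
    rest = order[top_n:]            # small-gradient samples

    rng = np.random.default_rng(seed)
    rand_idx = rng.choice(rest, size=rand_n, replace=False)

    idx = np.concatenate([top_idx, rand_idx])
    # Amplify the sampled small-gradient instances to compensate for subsampling.
    weights = np.concatenate([np.ones(top_n), np.full(rand_n, (1 - a) / b)])
    return idx, weights


g = np.linspace(-1.0, 1.0, 100)  # toy gradients
idx, w = goss_sample(g)
```

With 100 samples and the defaults above, 30 samples are used to estimate the split gain instead of 100.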
In addition, LightGBM grows trees leaf-wise, which has a low calculation cost. By controlling the depth of the tree and the minimum amount of data in each leaf node, it avoids overfitting. LightGBM uses a histogram-based decision tree algorithm, which reduces the storage cost and calculation cost. Additionally, its handling of categorical features also improves LightGBM performance on specific data. The framework of the proposed method is shown in Figure 3. As the figure shows, first, we extract temporal features from multisource traffic data. Second, according to the spatial location of the metro station, we extract spatial similarity features and spatial transfer features from the data. Third, we use the MIC algorithm to select the significant features. Finally, we establish an ST-LightGBM passenger flow prediction model to predict the inbound passenger flow of a metro station in a real-world scenario.

Scalability of the Proposed Method
(1) This method can be applied to the inbound passenger flow prediction of any metro station.
(2) This method can also be applied to the prediction of inbound passenger flow of conventional bus stations and BRT stations.
(3) This method is not limited by region and can also be applied to other cities.
(4) This method cannot be applied to the prediction of passenger flow at rail stations under the impact of emergencies, such as sudden bad weather (e.g., rainstorm, flood, or typhoon), terrorist attacks, traffic accidents, and metro accidents.
(5) The application of this method is limited to station-level prediction; it is not applicable to line-level or city-level prediction.
(6) This method is only suitable for short-term prediction; it is not suitable when the environment surrounding the metro station changes.

Limitations of the Proposed Method
(1) The candidate features need to directly or indirectly reflect the factors that affect the passenger flow at rail stations. If important factors are missing, such as transfer passenger flow, the accuracy of model prediction will be reduced.
(2) It is necessary to collect enough historical data as the training dataset to train the short-term prediction model. If the historical data are insufficient, inaccurate, or noisy, the accuracy of the prediction model will be reduced.
(3) The threshold of the MIC algorithm and the hyperparameters of the ST-LightGBM model affect the accuracy of the experimental results. It is necessary to tune these parameters in advance for each prediction target. Improper selection of parameters will lead to low accuracy.
(4) The process of feature extraction is complex, especially the extraction of spatial transfer features.
(5) When predicting passenger flow at rail stations, it is necessary to use the MIC algorithm to further select candidate features to determine the input of the model. This process is complicated.

Theoretical Analysis and Comparison of Methods.
A comparison of various rail passenger flow prediction methods is shown in Table 2. Compared with other methods, the features extracted by the proposed method are more comprehensive. Particularly, it considers the impact of transfer passenger flow, which plays an important role in the prediction of metro station passenger flow. Furthermore, the proposed method has higher prediction accuracy and efficiency than other methods.

Experimental Object and Dataset Description.
Lianban metro station (as shown in Figure 4) is an important passenger flow point of Xiamen rail line 1, with a large and stable passenger flow. erefore, we chose Lianban metro station as the research object. Taking 1,000 meters as the boundary condition, we selected 14 adjacent stations with a stable passenger flow. e adjacent metro stations of Lianban metro station are Hubin East Road metro station and Lianhualukou metro station; the adjacent BRT stations are BRT Lianban station and BRT Huoche station; and the adjacent conventional bus stations are Lianban Book City station, Lianjingerli station, Siming Court station, Lianbanguomao station, Lvjiayuanxiaoqu station, Lianbanbei station, Fengyulu station, Huoche station, Huming station, and Huminglijing station.
We used Xiamen residents' travel data from November 1, 2018, to November 25, 2018, as the experimental data. The prediction time interval was Δt = 10 minutes, which gives 144 samples per day. Hence, the 25 days yield 25 × 144 = 3600 samples in total.
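The sample count follows directly from the interval length. As a minimal sketch (the helper name `interval_index` and the exclusive end date are assumptions made for illustration, not the authors' code), a timestamp can be mapped to its 10-minute slot like this:

```python
from datetime import datetime, timedelta

INTERVAL_MINUTES = 10                         # prediction interval from the text
SLOTS_PER_DAY = 24 * 60 // INTERVAL_MINUTES   # 144 slots per day


def interval_index(ts: datetime, start: datetime) -> int:
    """Return the zero-based 10-minute slot of `ts`, counted from `start`."""
    return (ts - start) // timedelta(minutes=INTERVAL_MINUTES)


start = datetime(2018, 11, 1)
end = datetime(2018, 11, 26)  # exclusive end, so November 25 is fully covered
n_days = (end - start).days
n_samples = n_days * SLOTS_PER_DAY

print(n_days, SLOTS_PER_DAY, n_samples)
```

Counting the records that fall into each slot would then give one passenger flow value per sample.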

Evaluation Methods and Indicators.
To analyze and compare the prediction effect of each experiment, we use 5-fold cross-validation to get the average error. The number of training samples H is 2880, and the number of test samples W is 720.
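The fold sizes can be checked with a small sketch. The text does not say whether the folds are contiguous or shuffled, so contiguous folds are assumed here, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_size = n_samples // k
    for fold in range(k):
        lo = fold * fold_size
        # The last fold absorbs any remainder when n_samples % k != 0.
        hi = n_samples if fold == k - 1 else lo + fold_size
        test_idx = list(range(lo, hi))
        train_idx = list(range(0, lo)) + list(range(hi, n_samples))
        yield train_idx, test_idx


folds = list(k_fold_indices(3600, k=5))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

With 3600 samples, every fold trains on 2880 samples and tests on 720, matching H and W above. The reported error would then be the mean of the five per-fold errors.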
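We used two well-known error evaluation indices: mean absolute error (MAE) and mean square error (MSE). The calculation formulas are

\mathrm{MAE} = \frac{1}{W} \sum_{i=1}^{W} \left| y_i - \hat{y}_i \right|,

\mathrm{MSE} = \frac{1}{W} \sum_{i=1}^{W} \left( y_i - \hat{y}_i \right)^2,

where y_i and \hat{y}_i denote the actual and predicted passenger flow of the i-th test sample. The lower the values of MAE and MSE, the higher the prediction accuracy of the model.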
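As an illustration, the two indices can be computed as follows. This is a sketch using the standard definitions (MAE averages the absolute errors, MSE averages the squared errors); the function names and the toy passenger counts are made up for the example and are not taken from the dataset.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all test samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_square_error(actual, predicted):
    """Average of (actual - predicted)^2 over all test samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy values for illustration only (not real passenger counts).
actual = [100, 120, 90, 110]
predicted = [98, 125, 90, 107]

mae = mean_absolute_error(actual, predicted)  # (2 + 5 + 0 + 3) / 4
mse = mean_square_error(actual, predicted)    # (4 + 25 + 0 + 9) / 4
print(mae, mse)
```

Because MSE squares each error, a few large misses raise it much more than they raise MAE, which is why the two indices are usually reported together.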

Parameter Settings.
There are 48 candidate spatial-temporal features in total. All candidate spatial-temporal features and their corresponding MIC values are shown in Table 3. According to Figure 5, the MIC threshold is 0.7. We selected the candidate spatial-temporal features whose MIC values meet this threshold as the model input. For the ST-LightGBM model, Max_depth is 11 and Num_leaves is 1024. To control overfitting, Min_data is 12.
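The settings above can be sketched as a threshold filter plus a parameter dictionary. Several points are assumptions: the feature names and MIC values below are hypothetical (the real ones are in Table 3), the comparison is taken as "greater than or equal to", "Min_data" is mapped to LightGBM's `min_data_in_leaf` parameter (for which `min_data` is a documented alias), and the regression objective is assumed because the task is a regression problem.

```python
MIC_THRESHOLD = 0.7  # threshold reported in the text


def select_features(mic_by_feature, threshold=MIC_THRESHOLD):
    """Keep the features whose MIC value is at least `threshold`."""
    return [name for name, mic in mic_by_feature.items() if mic >= threshold]


# Hypothetical feature names and MIC values, for illustration only.
toy_mic = {
    "flow_prev_interval": 0.91,
    "flow_same_slot_yesterday": 0.83,
    "bus_transfer_flow": 0.72,
    "day_of_week": 0.35,
}
selected = select_features(toy_mic)

# LightGBM native parameter names; values as reported in the text.
lgbm_params = {
    "objective": "regression",  # assumed, not stated in the text
    "max_depth": 11,
    "num_leaves": 1024,
    "min_data_in_leaf": 12,     # assumed mapping of "Min_data"
}

# A tree of depth d has at most 2**d leaves, so num_leaves should not exceed it.
assert lgbm_params["num_leaves"] <= 2 ** lgbm_params["max_depth"]
print(selected)
```

With a maximum depth of 11, a tree can hold up to 2048 leaves, so the reported 1024 leaves is consistent with the depth limit.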

Experimental Results
Prediction Effect of the SARIMA Model.
The prediction results of the model are shown in Figure 6, with an MAE value of 8.28 and an MSE value of 164.06 (prediction results of a random fold).

Prediction Effect of the SVR Model.
The prediction results of the model are shown in Figure 7. Without feature selection, as shown in Figure 7(a), the MAE value is 9.50 and the MSE value is 170.67. With feature selection, as shown in Figure 7 [...]

The prediction results of the model are shown in Figure 9. Without feature selection, as shown in Figure 9(a), the MAE value is 6.95 and the MSE value is 118.36. With feature selection, as shown in Figure 9(b), the MAE value is 5.77 and the MSE value is 86.10 (prediction results of a random fold).
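To put the effect of feature selection in relative terms, the Figure 9 values can be compared directly. The sketch below uses only the numbers reported above for one random fold; the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# Figure 9 values: without vs. with feature selection (one random fold).
mae_drop = relative_reduction(6.95, 5.77)
mse_drop = relative_reduction(118.36, 86.10)

print(f"MAE reduced by {mae_drop:.1f}%, MSE reduced by {mse_drop:.1f}%")
```

For this fold, feature selection lowers the MAE by about 17% and the MSE by about 27%. Since these are single-fold figures, the averages in Table 4 remain the better basis for comparison.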

Analysis of the Experimental Results.
The experimental results of the algorithms are shown in Table 4. The proposed ST-LightGBM achieved better performance than SARIMA, SVR, and BP network. Moreover, with feature selection, the proposed model achieved higher accuracy than the other models.
(1) As shown in Figure 10, the training time of ST-LightGBM is shorter than that of the BP and SVR models but longer than that of the SARIMA model, and the same holds for the prediction time. This shows that the method has high computational efficiency and can be used in practical applications.

(2) As shown in the second, third, and last rows of Table 4, with feature selection, the prediction accuracy of the models improved.

Conclusion and Future Work
We proposed a spatial-temporal LightGBM metro station passenger flow prediction model considering transfer passenger flow. Compared with previous research methods, this method considers the temporal and spatial features that affect inbound passenger flow in a metro station. Particularly, in terms of spatial features, we introduced the concepts of spatial similarity and spatial transfer and established the ASHIM feature matrix and the TPM feature matrix. Thus, the spatial influence factors were considered more comprehensively. Additionally, we used an MIC feature selection algorithm to obtain the important features; hence, the model input was simplified. Moreover, the prediction accuracy of this method was higher than that of the other methods, so the proposed method has better applicability for the short-term prediction of metro station inbound passenger flow.

In future work, we will use scientific feature extraction methods [43] to further extract effective features from massive data. At present, it is difficult to further improve the prediction accuracy of the existing single model. We can therefore consider combining fast clustering algorithms [44][45][46] with other machine learning or deep learning models to establish a combined prediction model that further improves prediction accuracy. Moreover, we can use distributed algorithms [47,48] to improve the prediction efficiency of the model.

Data Availability
The data used to support the findings of this study are available in [42].

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.