Short-term prediction of passenger flow is very important for the operation and management of a rail transit system. Based on the traditional Kalman filtering method, this paper puts forward three revised models for real-time passenger flow forecasting. First, the paper introduces the historical prediction error into the measurement equation and formulates a revised Kalman filtering model based on an error correction coefficient (KFECC). Second, it employs the deviation between real-time passenger flow and the corresponding historical data as the state variable and presents a revised Kalman filtering model based on Historical Deviation (KFHD). Third, it integrates a nonparametric regression forecast into the traditional Kalman filtering method using a Bayesian combination technique and puts forward a revised Kalman filtering model based on Bayesian combination and nonparametric regression (KFBCNR). A case study is implemented using statistical passenger flow data of rail transit line 13 in Beijing during a one-month period. The reported prediction results show that KFECC improves the applicability to historical trend, KFHD achieves excellent accuracy and stability, and KFBCNR yields the best performance. Comparisons among different periods further indicate that results during peak periods outperform those during nonpeak periods. All three revised models are accurate and stable enough for online predictions, especially during the peak periods.
With the rapid urbanization and motorization of most large Chinese cities, urban transportation systems face increasingly serious problems, such as congestion, crashes, and pollution. As an efficient travel mode, rail transit plays an increasingly important role in solving traffic issues. In Beijing, a total of 21 lines are now in operation, covering 527.2 kilometers (327.6 miles). During the past decade, the average daily passenger flow has increased dramatically to about 10 million riders. Therefore, the operation and management of the rail transit system, especially real-time operation, is very important.
During peak hours, pedestrian congestion happens frequently. For safety and efficiency, real-time passenger flows, especially the flows predicted for the next several time intervals, are key inputs to real-time intelligent operation of the rail transit system. However, while past and current passenger flows are easily detected, future flows are not directly observable. Therefore, a passenger flow forecast method based on statistical data is of considerable value.
Most recently, Sun et al. [
Existing traffic flow forecast models cover a wide range, including Historical Average (HA), Autoregressive Integrated Moving Average (ARIMA), Neural Network (NN), Kalman filtering (KF), nonparametric regression (NR), chaos theory, Support Vector Machine (SVM), and others. The HA model uses a simple time-series method [
Generally, the above methods can be classified into statistical and artificial intelligence models. Smith and Demetsky [
More recently, some studies proposed new models for multistep prediction [
Among the above short-term traffic flow forecast models, the Kalman filtering method is very efficient owing to its recursive nature and is convenient to use in rail transit passenger flow prediction. However, existing research has shown that traditional KF methods are not accurate and stable enough for online applications. Therefore, this paper revises the traditional KF method and proposes three revised models.
To predict passenger flow accurately and efficiently, one key feature of the paper is to introduce some error calibration measures or new state variables into classical models and to construct some revised KF forecast models. The second key feature is to integrate some stable methods and formulate an innovative KF prediction model with good accuracy, stability, and robustness.
This paper consists of six sections. Following the Introduction, the basic KF model is described in the second section, including its state transition and measurement equations. Three revised KF models are formulated in the third section, including the KF model based on the error correction coefficient (KFECC), the KF model based on Historical Deviation (KFHD), and the KF model based on the Bayesian combination and nonparametric regression (KFBCNR). Solution algorithms for the NR model, KF model, and Bayesian combination model are designed in the fourth section, respectively. Prediction results using practical statistical passenger flow data are reported and analyzed in the fifth section. Conclusions and some future research directions are summarized in the last section.
The KF model is a kind of state space method consisting of three important parts: state variable, state transition equation, and measurement equation.
In rail transit passenger flow prediction, the short-term passenger flow to be forecasted is taken directly as the state variable. In this paper, we employ the passenger flow at the station. Using
Equations (
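As an illustration of the state transition and measurement structure described above, the following is a minimal sketch of a scalar Kalman filter in its standard textbook form, with the passenger flow itself as the state. It is not a reproduction of this paper's equations: the coefficients `a` and `h`, the noise variances `q` and `r`, the initial variance `p0`, and the function names are all assumptions chosen for illustration.

```python
def kalman_step(x_est, p_est, z, a=1.0, h=1.0, q=1.0, r=4.0):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est : previous state estimate and its error variance
    z            : new measurement (observed passenger flow)
    a, h         : state transition and measurement coefficients (assumed)
    q, r         : process and measurement noise variances (assumed)
    """
    # Predict: propagate the state and its variance through the transition.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Update: blend the prediction with the measurement via the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_pred, x_new, p_new


def kalman_forecast(flows, x0=None, p0=100.0, **kwargs):
    """Return the one-step-ahead forecast made before each observation."""
    x_est = flows[0] if x0 is None else x0
    p_est = p0
    forecasts = []
    for z in flows:
        x_pred, x_est, p_est = kalman_step(x_est, p_est, z, **kwargs)
        forecasts.append(x_pred)
    return forecasts
```

The recursion only needs the previous estimate and the newest observation, which is the property that makes the method attractive for online use.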
Since the historical passenger flow data could be collected easily, we can conveniently track the trend of the flow changes. The basic KF model in (
The error correction coefficient
On weekdays, rail transit passenger flows usually change from morning peak hours to nonpeak hours and then to evening peak hours. Therefore, similar characteristics are observed in the historical forecasting errors. Statistical analyses show that the error correction coefficient can be fitted by a quadratic (parabolic) function:
Equations (
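The quadratic fit mentioned above can be sketched with an ordinary least-squares fit of a parabola. This is a hedged illustration only: the choice of independent variable `ts` (for example, a time-interval index or a flow level), the function names, and the use of plain least squares are assumptions, since the fitted formula itself is not reproduced here. At least three distinct `ts` values are required.

```python
def fit_quadratic(ts, ys):
    """Least-squares fit of y = a*t**2 + b*t + c; returns (a, b, c)."""
    # Normal equations (X^T X) beta = X^T y for the basis [t^2, t, 1].
    s = [sum(t ** k for t in ts) for k in range(5)]  # power sums s[k] = sum t^k
    m = [
        [s[4], s[3], s[2], sum(y * t * t for t, y in zip(ts, ys))],
        [s[3], s[2], s[1], sum(y * t for t, y in zip(ts, ys))],
        [s[2], s[1], s[0], sum(ys)],
    ]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * w for v, w in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def correction_coefficient(t, coeffs):
    """Evaluate the fitted parabola at t."""
    a, b, c = coeffs
    return a * t * t + b * t + c
```

Once fitted on historical errors, the parabola can be evaluated for each new interval to supply the correction term used in the measurement equation.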
Since the rail transit passenger flow fluctuates dramatically and its magnitude is rather large, the forecasting process of a KF model that uses passenger volume directly as the state variable is not very stable. Further analyses of passenger flows show that the deviation between real-time volume and the corresponding historical data is fairly smooth [
Equations (
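The idea of filtering the deviation rather than the raw volume can be sketched as follows. This is a simplified illustration under stated assumptions, not the KFHD equations themselves: the deviation is modeled as a random walk (transition coefficient 1), and the noise variances `q` and `r`, the initial variance `p0`, and the function name are hypothetical.

```python
def forecast_with_deviation(realtime, historical, q=1.0, r=4.0, p0=100.0):
    """One-step-ahead forecasts using the deviation from history as the state.

    realtime   : flows observed so far today, one value per interval
    historical : flows for the same intervals on a reference day; it must
                 contain at least one more interval than `realtime`
    """
    d_est, p_est = 0.0, p0  # start from "no deviation from history"
    forecasts = []
    for k, z in enumerate(realtime):
        # Measurement: today's deviation from the historical profile.
        d_obs = z - historical[k]
        # Scalar Kalman update with a random-walk state model (assumed).
        p_pred = p_est + q
        gain = p_pred / (p_pred + r)
        d_est = d_est + gain * (d_obs - d_est)
        p_est = (1.0 - gain) * p_pred
        # Forecast for the next interval: historical level plus deviation.
        forecasts.append(historical[k + 1] + d_est)
    return forecasts
```

Because the historical profile already carries the daily peak and nonpeak pattern, the filter only has to track a comparatively smooth residual.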
Existing researches [
As stated before, the NR model is well suited to uncertain and dynamic transportation systems, and many studies have demonstrated its accuracy. Therefore, we introduce the NR method into the Bayesian combined model to further improve the prediction performance. Here, the
From (
We further take the NR prediction as the control variable and introduce it into the KF model. Meanwhile, we combine the NR result in interval
Equations (
Based on the adjusted algorithm of Bayesian weights and the results of the NR model, we can finally obtain the forecasted passenger flows.
The NR algorithm mainly consists of five steps: the preparation of historical data, the generation of sample database, the definition of state vector, the searching of
General flow of the NR algorithm.
The detailed algorithm is described as follows.
All historical detected data are prepared for the NR algorithm in this paper.
The prepared historical data are summarized into the sample database, which keeps updating during the forecast process and integrates both real-time data and historical data. The quality of the sample database greatly influences the performance of the NR model.
Rail transit passenger flows differ from link traffic volumes in that there are no upstream or downstream links. However, when forecasting flows at a station, nearby stations influence the arrival and distribution characteristics of its passenger flow. Therefore, we introduce a correlation analysis between the target station and other stations. The number of correlative stations is determined by the correlation coefficient
Using
For the autocorrelation coefficient, we decompose the time series of passenger volumes of the target station,
Here,
Euclidean distance is employed as the index to determine the
The prediction function is presented as in the following equation:
Using the above five steps, we can implement the NR algorithm and obtain the prediction results of the NR model. The algorithm is implemented in the MATLAB language.
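The nearest-neighbor search and prediction steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: the layout of `samples` as (state vector, next flow) pairs, the function name, and in particular the unweighted mean used as the prediction function are assumptions.

```python
import math


def knn_forecast(samples, current_state, k=2):
    """Nonparametric-regression forecast from the k nearest historical states.

    samples       : list of (state_vector, next_flow) pairs from the database
    current_state : state vector describing the present situation
    """
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Rank historical states by Euclidean distance to the current state.
    ranked = sorted(samples, key=lambda s: math.dist(s[0], current_state))
    nearest = ranked[:k]
    # Assumed prediction function: unweighted mean of the neighbors' outcomes.
    return sum(next_flow for _, next_flow in nearest) / k
```

A distance-weighted mean would be an equally plausible choice of prediction function; the unweighted form is used here only because it is the simplest.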
For accuracy and efficiency, a sequential KF algorithm is employed to solve the three revised KF models, as illustrated in detail in our previous work [
The key issue in Bayesian combination is the weight of each submodel, which is determined by comparing the errors of the two individual forecast methods.
Based on the historical prediction results and corresponding historical detection data, we can obtain the forecast errors of the KF and the NR models, respectively. Here, the mean absolute percentage error (MAPE) is employed to denote forecast errors, as below:
Furthermore, we denote the historical MAPE of KF and NR models by
To further incorporate the influences of current forecasting errors, we denote the current MAPE of the KF and the NR models by
Then, the posterior probabilities [
Based on (
Equations (
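To make the weighting logic concrete, the sketch below combines two submodel forecasts with weights derived from historical and current MAPE values. The specific form is an assumption made for illustration and is not taken from this paper's equations: the prior is taken proportional to the inverse historical MAPE, the likelihood proportional to the inverse current MAPE, and the posterior weight is their normalized product. All function and variable names are hypothetical.

```python
def bayesian_weights(hist_mape, curr_mape):
    """Posterior-style weights for submodels from historical and current MAPE.

    hist_mape, curr_mape : dicts mapping model name -> MAPE (must be > 0)
    """
    # Prior (assumed form): smaller historical error -> larger prior weight.
    prior = {m: 1.0 / e for m, e in hist_mape.items()}
    # Likelihood (assumed form): smaller current error -> better present fit.
    like = {m: 1.0 / curr_mape[m] for m in hist_mape}
    # Posterior is proportional to prior times likelihood, then normalized.
    unnorm = {m: prior[m] * like[m] for m in hist_mape}
    total = sum(unnorm.values())
    return {m: v / total for m, v in unnorm.items()}


def combine(predictions, weights):
    """Weighted combination of the submodel forecasts."""
    return sum(weights[m] * predictions[m] for m in predictions)
```

For example, with historical and current MAPE of 0.2 for KF and 0.1 for NR, this scheme gives weights of 0.2 and 0.8, so the combined forecast leans toward the NR result.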
We collected the bus Smart Card Data (SCD) of line 13 of Beijing for the whole month of November 2013 and extracted the per-minute passenger volumes of 15 stations from the SCD records for a case study. According to the unified numbering rules of the Beijing rail transit system, these 15 stations are numbered 21, 23, 25, 27, 29, 33, 35, 37, 39, 41, 43, 45, 47, 49, and 51, respectively. The operation period of line 13 is from 4:55 to 23:50. For application purposes, the original data were aggregated into five-minute intervals, giving 228 time intervals in total. Passenger flows of station number 25 on November 28 (Thursday) were taken as the prediction target.
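The aggregation step can be sketched as follows. The function name is hypothetical, and the check assumes that the 228 intervals span 4:55 to 23:55, the whole-day window used in the later evaluation, which corresponds to 1140 minutes.

```python
def aggregate(per_minute, width=5):
    """Sum consecutive `width`-minute blocks of per-minute passenger counts."""
    if len(per_minute) % width:
        raise ValueError("series length must be a multiple of the bin width")
    return [sum(per_minute[i:i + width])
            for i in range(0, len(per_minute), width)]


# 04:55 to 23:55 spans 19 hours, that is, 1140 minutes (assumed window).
minutes_in_service = (23 * 60 + 55) - (4 * 60 + 55)
intervals = len(aggregate([1] * minutes_in_service))
```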
Using the above data, we implemented the KF model, the NR model, and the three proposed revised KF models and derived the prediction results of all five models, respectively.
The state vectors are decided based on the correlation coefficient
The
MPE (mean percentage error):
RMSE (root mean square error):
NRMS (normalized root mean square error):
Other symbols in (
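The four evaluation indices can be computed as in the following sketch. The definitions used are common textbook forms and are assumptions here: relative errors are taken as (predicted − actual) / actual, so a negative MPE indicates underprediction, and NRMS is taken as RMSE divided by the mean actual flow. The paper's own formulas may differ in sign convention or normalization.

```python
import math


def error_metrics(actual, predicted):
    """Return MAPE, MPE, RMSE, and NRMS for paired flow series.

    All actual values must be nonzero because they appear in denominators.
    """
    n = len(actual)
    pairs = list(zip(actual, predicted))
    # Signed relative error per interval (assumed sign: predicted - actual).
    rel = [(p - a) / a for a, p in pairs]
    mape = sum(abs(e) for e in rel) / n
    mpe = sum(rel) / n
    rmse = math.sqrt(sum((p - a) ** 2 for a, p in pairs) / n)
    # Assumed normalization: RMSE relative to the mean actual flow.
    nrms = rmse / (sum(actual) / n)
    return {"MAPE": mape, "MPE": mpe, "RMSE": rmse, "NRMS": nrms}
```

MAPE and MPE together separate the size of the error from its direction: a model can have a large MAPE and a near-zero MPE if its over- and underpredictions cancel.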
The error statistics of MAPE, MPE, RMSE, and NRMS in case of different
Prediction error statistics of NR model.
     MAPE    MPE      RMSE   NRMS
1    18.7%   8.2%     16.3   17.2%
2    19.7%   −3.4%    10.2   12.0%
3    21.4%   −8.6%    14.3   16.5%
4    24.0%   10.4%    21.5   25.2%
5    26.9%   −14.9%   27.4   29.1%
From Table
All information needed in the three revised KF models is extracted from the database. As stated before, the error correction coefficient
It is clearly a quadratic (parabolic) function.
In the revised KFBCNR model, historical data are needed to determine the Bayesian weights. Here, data from November 21, the Thursday of the previous week, are used to obtain those weights.
Prediction results of the KF, NR, revised KFECC, revised KFHD, and revised KFBCNR models during the whole day are all reported in Table
Prediction error statistics of five models.
Model    MAPE    MPE      RMSE   NRMS
KF       38.8%   −33.5%   52.0   60.2%
NR       19.7%   −3.4%    10.2   12.0%
KFECC    27.8%   −14.9%   33.0   38.2%
KFHD     20.5%   7.5%     11.5   13.3%
KFBCNR   18.1%   4.1%     10.2   11.9%
From Table
To compare the performances of traditional models and three revised models during different periods, the evaluation indices during morning peak hours (7:00–9:00), nonpeak hours (11:00–13:00), evening peak hours (17:00–19:00), and the whole day (4:55–23:55) are further extracted and summarized in Table
Prediction error statistics of five models during different periods.
Error index        KF      NR      KFECC   KFHD    KFBCNR
MAPE (%)
  Morning peak     35.5    10.4    12.9    10.2    8.2
  Nonpeak          39.7    15.7    27.2    16.4    13.4
  Evening peak     36.9    3.5     23.3    6.0     4.9
  Whole day        38.8    19.7    27.8    20.5    18.1
MPE (%)
  Morning peak     −35.5   −3.3    −8.2    4.5     0.8
  Nonpeak          −39.7   −9.3    −26.6   −0.8    −4.9
  Evening peak     −36.9   −1.6    −23.3   2.6     0.6
  Whole day        −33.5   −3.4    −14.9   7.5     4.1
RMSE
  Morning peak     33.4    11.3    13.6    9.9     7.6
  Nonpeak          16.9    8.2     12.7    7.3     6.9
  Evening peak     134.9   14.0    85.1    22.5    21.1
  Whole day        52.0    10.2    33.0    11.5    10.2
NRMS (%)
  Morning peak     38.4    13.0    15.7    11.4    8.7
  Nonpeak          44.4    21.5    33.3    19.1    18.1
  Evening peak     38.8    4.0     24.5    6.5     6.1
  Whole day        60.2    11.8    38.2    13.3    11.9
Graphical illustrations of these prediction results and errors during different periods are further described in Figures
Prediction results in morning peak hours.
Prediction errors in morning peak hours.
Prediction results in nonpeak hours.
Prediction errors in nonpeak hours.
Prediction results in evening peak hours.
Prediction errors in evening peak hours.
Prediction results in the whole day.
Prediction errors in the whole day.
A further comparison of the prediction errors among all five models is illustrated in Figure
Comparison of prediction errors for different models and periods.
The above predictions lead to the following findings:
All three revised KF models are fairly accurate for short-term rail transit passenger flow prediction. The revised KFECC model gets better results than the traditional KF model, owing to the introduction of the error correction coefficient. The revised KFHD model further outperforms the KFECC model, because employing Historical Deviation as the state variable improves its accuracy. By integrating Bayesian combination and the NR method, the revised KFBCNR model yields the best accuracy among the three revised KF models.
In terms of tracking the dynamic characteristics of real-time passenger flows, the three revised KF models also outperform the original KF method. Again, the revised KFBCNR model improves the stability significantly and yields the best result.
As a nonlinear regression method, the NR model gets much better results than the original KF model. It is even more accurate than the revised KFECC model in some cases. However, the revised KFBCNR model still performs best overall.
The comparisons among different periods show that the prediction performance during peak hours is much better than during nonpeak hours. The intrinsic reason is that the passenger volumes during peak hours are much larger than those during nonpeak hours, and the fluctuations of passenger flows during peak hours are much weaker than those during nonpeak hours. Moreover, the large magnitude of passenger volume during peak hours also reduces some error indices, for instance, MAPE, MPE, and NRMS, because the sum of actual passenger flows appears in the denominator.
Prediction results during evening peak hours are the most accurate in all cases, with the MAPE of the KFBCNR model at just 4.9% and its NRMS at just 6.1%. The direct reason is that the passenger volume during this period is the highest and the most stable among all the time intervals.
Evaluation indices for the whole day are not very satisfying, because the passenger volumes during early morning and evening are very low and unstable, which can be seen from Figure
Generally, all the three revised KF models are rather accurate and stable for online applications, especially during the very important peak hours.
This paper presents three revised Kalman filtering models for short-term rail transit passenger flow prediction: the revised KFECC model, the revised KFHD model, and the revised KFBCNR model. We first present a revised KFECC model by introducing the historical prediction error into the measurement equation through an error correction coefficient. Since the original state variable fluctuates dramatically, we further employ the deviation between real-time passenger volume and the corresponding historical data as a new state variable and derive a revised KFHD model. For more accurate prediction, we integrate both the Bayesian combination technique and the nonparametric regression method into the traditional KF model and formulate a revised KFBCNR model. The bus Smart Card Data of line 13 of Beijing over a one-month period are collected for the case study. The reported prediction results based on the practical data indicate that all three revised models are much more accurate and stable than the traditional KF method. Moreover, the revised KFHD model outperforms the KFECC model, and the revised KFBCNR model yields the best performance. Further comparisons among different periods show that predictions during peak hours are much more accurate than those during nonpeak hours, and forecast results during evening peak hours are the most accurate. Since peak hours are more important for rail transit operation and management, all three revised KF models proposed in this paper are accurate and stable enough for online applications.
Potential future research directions mainly consist of the following aspects. The first is to adapt the three revised KF models to short-term traffic flow forecasting and to test their applicability. The second is to further revise the models and algorithms for application to the whole rail transit system or to large-scale road networks. The third is to explore the inherent interrelations among dynamic passenger volume, real-time urban travel demand, and rail network structure and to propose more logical prediction models based on dynamic travel demand analysis.
The authors declare that there are no competing interests regarding the publication of this paper.
This research is supported by the National Natural Science Foundation of China Project (51578040, 51208024), Beijing Nova Programme (Z151100000315050), Beijing Natural Science Foundation Project (8162013), and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD201404071).