Three Revised Kalman Filtering Models for Short-Term Rail Transit Passenger Flow Prediction

1 Beijing Urban Transportation Infrastructure Engineering Technology Research Center, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
2 Institute of Transportation Engineering, Tsinghua University, Beijing 100084, China
3 Parsons Transportation Group, 100 Broadway, New York, NY 10005, USA
4 New Jersey Department of Transportation (NJDOT), 1035 Parkway Avenue, Trenton, NJ 08625, USA


Introduction
With rapid urbanization and motorization in most large Chinese cities, urban transportation systems face increasingly serious problems such as congestion, crashes, and pollution. As an efficient travel mode, rail transit plays an increasingly important role in addressing these problems. In Beijing, 21 lines are now in operation, covering 527.2 kilometers (327.6 miles). Over the past decade, the average daily passenger flow has increased dramatically to about 10 million riders. Therefore, the operation and management of the rail transit system, especially real-time operation, is very important.
During peak hours, pedestrian congestion happens frequently. For safe and efficient operation, real-time passenger flows, especially the flows predicted for the next several time intervals, are key inputs to real-time intelligent operation of the rail transit system. However, while past and current passenger flows are easy to detect, future flows are not straightforward to obtain. Therefore, a passenger flow forecast method based on statistical data is valuable.
Most recently, Sun et al. [1] proposed a nonparametric regression method to forecast passenger flow at subway transfer stations. Apart from this, the literature review shows that very few studies have focused directly on short-term rail transit passenger flow prediction. However, short-term traffic flow forecasting has been studied extensively within Intelligent Transportation Systems (ITS), and many practical models have been developed from these studies. Given different input data, some of these models can readily be used to forecast rail transit passenger flow.
Existing traffic flow forecast models cover a wide range, including Historical Average (HA), Autoregressive Integrated Moving Average (ARIMA), Neural Network (NN), Kalman filtering (KF), nonparametric regression (NR), chaos theory, Support Vector Machine (SVM), and others. The HA model uses a simple time-series method [2], which is rarely used now. Ahmed and Cook [3] put forward an ARIMA model to forecast freeway traffic flows, and Williams et al. [4] extended it to the seasonal case and compared it with an Exponential Smoothing Method (ESM). Many researchers formulated NN-based prediction models and obtained rather satisfactory results, such as Smith and Demetsky [5], Florio and Mussone [6], Zhang et al. [7], Dougherty et al. [8], Park and Rilett [9], and Vlahogianni et al. [10]. Kalman filtering is an efficient recursive state forecast method that has also been widely used in short-term traffic flow prediction, for example, by Okutani and Stephanedes [11], Cathey and Dailey [12], and Shekhar and Williams [13]. As a nonlinear regression method, the NR model is well suited to uncertain and dynamic systems such as real-time transportation systems. Pioneering work on the NR method can be found in Yakowitz [14] and Karlsson and Yakowitz [15], and some scholars further developed it for traffic flow forecasting, for instance, Davis and Nihan [16], Smith and Demetsky [17], Oswald et al. [18], Smith et al. [19], Qi and Smith [20], and Kindzerske and Ni [21]. Huang et al. [22], Lu and Wang [23], Meng and Peng [24], Xue and Shi [25], and Pang and Zhao [26] applied chaos theory to traffic flow prediction and obtained acceptable results. SVM is a newer statistical machine-learning method [27] that has been shown to have stronger learning and generalization abilities than the NN model. SVM has also been used in traffic flow forecasting, for example, by Ren et al. [28], Wu et al. [29], and Wang et al. [30].
Generally, the above methods can be classified into statistical and artificial intelligence models. Smith and Demetsky [17] and Smith et al. [19] compared some of these models and concluded that no single method was universally accepted as the best. Therefore, combined methods have been developed from existing single models, and one of the most effective is the Bayesian combined model, whose effectiveness has been demonstrated by Zheng et al. [31], Dong et al. [32], Jiao et al. [33], and Jiao et al. [34].
More recently, some studies proposed new models for multistep prediction [35] and large-scale road network forecasting [36]. The latter employed cloud computing techniques for large-scale network applications. Among all the above short-term traffic flow forecast models, the Kalman filtering method is very efficient owing to its recursive nature and is convenient to use for rail transit passenger flow prediction. However, existing research has shown that the traditional KF methods are not accurate and stable enough for on-line applications. Therefore, this paper revises the traditional KF methods and proposes three revised models.
To predict passenger flow accurately and efficiently, the first key feature of the paper is to introduce error calibration measures or new state variables into classical models and thereby construct revised KF forecast models. The second key feature is to integrate several stable methods into an innovative KF prediction model with good accuracy, stability, and robustness.
This paper consists of six sections. Following the Introduction, the basic KF model is described in the second section, including its state transition and measurement equations. Three revised KF models are formulated in the third section: the KF model based on the error correction coefficient (KF-ECC), the KF model based on Historical Deviation (KF-HD), and the KF model based on Bayesian combination and nonparametric regression (KF-BCNR). Solution algorithms for the NR model, the KF model, and the Bayesian combination model are designed in the fourth section. Prediction results using practical statistical passenger flow data are reported and analyzed in the fifth section. Conclusions and some future research directions are summarized in the last section.

Basic Kalman Filtering Model
The KF model is a kind of state space method consisting of three important parts: state variable, state transition equation, and measurement equation.
In rail transit passenger flow prediction, the short-term passenger flow to be forecasted is taken directly as the state variable. In this paper, we employ the passenger flow at the station. Using $q(k)$ to denote the passenger flow during time interval $k$ at a station, the state transition equation and measurement equation are formulated as follows:

$$\mathbf{Q}(k) = \mathbf{Q}(k-1) + \mathbf{W}(k), \tag{1}$$

$$\mathbf{H}(k) = \mathbf{M}(k)\,\mathbf{Q}(k) + \mathbf{e}(k), \tag{2}$$

where $\mathbf{Q}(k)$ is the column vector form of passenger flow $q(k)$ and, accordingly, $\mathbf{Q}(k-1)$ is the column vector of $q(k-1)$; $\mathbf{W}(k)$ is a Gaussian white noise vector with mean 0 and covariance matrix $\mathbf{D}\delta_{kj}$, where $\mathbf{D}$ is a constant positive semidefinite matrix and $\delta_{kj}$ is the Kronecker delta, that is, $\delta_{kj} = 1$ if $k = j$ and 0 otherwise; $\mathbf{H}(k)$ is the column vector form of the measurements, and here the Historical Average passenger flow during the same time interval $k$ is taken as the measurement; $\mathbf{M}(k)$ is the measurement matrix, which equals the identity matrix in passenger flow prediction and can therefore be omitted from the formulation; $\mathbf{e}(k)$ is the column vector form of detection errors with mean 0 and covariance matrix $\mathbf{R}\delta_{kj}$, where $\mathbf{R}$ is a constant positive semidefinite matrix similar to $\mathbf{D}$. Equations (1) and (2) together constitute the basic KF model. Existing research has shown that the basic form of KF is rather efficient owing to its recursive nature. However, its accuracy is not satisfactory. Therefore, we further formulate some revised KF models to improve the prediction accuracy.

Three Revised Kalman Filtering Models

The Revised KF Model Based on Error Correction Coefficient.
Since historical passenger flow data can be collected easily, we can conveniently track the trend of the flow changes. The basic KF model in (1) and (2) has been employed in historical cases, and the errors between historical forecast and historical detection are thus obtained. Based on the characteristics of such errors, we introduce an error correction coefficient into the measurement equation, giving (3). The coefficient is based on historical forecasting deviations. Here, the measurement matrix $\mathbf{M}(k)$ is neglected, because it is an identity matrix.
The error correction coefficient varies under different conditions. It is closely correlated with the historical forecasting errors: it grows as the historical errors increase, and it can be obtained by fitting historical data.
During weekdays, rail transit passenger flows usually change from morning peak hours to nonpeak hours and then to evening peak hours. Therefore, similar patterns are observed in the historical forecasting errors. Statistical analyses show that the coefficient can be fitted by a quadratic (parabolic) function, (4), whose two parameters are estimated by the data fitting procedures. Equations (1), (3), and (4) together constitute the revised KF-ECC model.

The Revised KF Model Based on Historical Deviation.
Since the rail transit passenger flow fluctuates dramatically and its magnitude is rather large, the forecasting process of a KF model that uses passenger volume directly as the state variable is not very stable. Further analyses of passenger flows show that the deviation between real-time volume and the corresponding historical data is fairly smooth [37]. Therefore, this deviation is introduced into the KF model as the revised state variable to improve the accuracy and stability of the prediction. The revised KF-HD model is formulated as (5) and (6), where $\mathbf{Q}_h(k)$ is the column vector form of the historical passenger flow $q_h(k)$ in the same time interval $k$ on the same weekday of the previous week. Importantly, $\mathbf{Q}_h(k)$ is different from $\mathbf{H}(k)$: $\mathbf{Q}_h(k)$ corresponds to the same weekday in the previous week, while $\mathbf{H}(k)$ is the average value of the historical data. Equations (5) and (6) together constitute the revised KF-HD model, which is a basic KF formulation except that the state variable is in deviation form. Since $\mathbf{Q}_h(k)$ and $\mathbf{H}(k)$ are available from statistical data, one can get the real-time passenger flow $\mathbf{Q}(k)$ easily.
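The change of state variable described above can be illustrated with a minimal Python sketch. It only shows the transformation to and from the deviation form, not the filtering itself; the function names and the toy flow values are hypothetical and are not taken from the paper's data.

```python
def to_deviation(today, last_week):
    """Deviation of each interval's flow from the same interval one week earlier."""
    return [t - h for t, h in zip(today, last_week)]


def from_deviation(deviation_forecast, last_week_value):
    """Turn a forecast deviation back into a passenger flow."""
    return last_week_value + deviation_forecast


def value_range(series):
    """Spread of a series, used here only to compare smoothness."""
    return max(series) - min(series)


# Toy flows (passengers per interval); the numbers are illustrative only.
today = [100.0, 300.0, 800.0, 400.0]
last_week = [95.0, 310.0, 790.0, 405.0]
deviation = to_deviation(today, last_week)
print(value_range(today), value_range(deviation))
```

In this toy series the deviation varies over a far smaller range than the raw flow, which is the property the KF-HD model relies on when it filters the deviation and then adds the historical value back.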

The Revised KF Model Based on Bayesian Combination and Nonparametric Regression.
Existing research [31-34] has demonstrated the effectiveness of the Bayesian combined approach in traffic flow forecasting. It is in fact a weighted average method, as shown below:

$$\hat{q}(k) = w_{\mathrm{KF}}\, q_{\mathrm{KF}}(k) + w_{\mathrm{NR}}\, q_{\mathrm{NR}}(k), \tag{7}$$

where $q_{\mathrm{KF}}(k)$ is the result from the KF model, $q_{\mathrm{NR}}(k)$ is the result from the NR model, and $w_i$ is the weight of the KF or the NR model.
As stated before, the NR model is well suited to uncertain and dynamic transportation systems, and many studies have demonstrated its accuracy. Therefore, we introduce the NR method into the Bayesian combined model to further improve the prediction. Here, the K-nearest neighbor nonparametric regression (KNNNR) method is employed.
From (7), we can see that, in the Bayesian combination framework, the KF model or the NR model may be strengthened or weakened by adjusting the weight $w_i$. If we set $w_{\mathrm{KF}}$ to zero, the KF model is removed from the combination; the same holds for the NR model if we set $w_{\mathrm{NR}}$ to zero. In practice, both weights are adjusted dynamically according to the forecasting errors of the two single models. The detailed adjustment mechanism is illustrated in Section 4.
We further take the NR prediction as the control variable and introduce it into the KF model. Meanwhile, we combine the NR result in interval $k$ with the KF result in interval $k-1$ through the Bayesian combination method and integrate them into the state transition equation of the KF model. The revised formulation is (8), where $\mathbf{Q}_{\mathrm{KF}}(k)$ and $\mathbf{Q}_{\mathrm{NR}}(k)$ are the column vector forms of $q_{\mathrm{KF}}(k)$ and $q_{\mathrm{NR}}(k)$, respectively, and other symbols are the same as before. The term $w_{\mathrm{NR}}(k)\cdot\mathbf{Q}_{\mathrm{NR}}(k)$ is the control variable of the state transition equation; that is, it reflects the contribution of the NR model to the final prediction results. Equations (8) and (2) together constitute the revised KF-BCNR model. The main purpose of this revised KF model is to bring more historical information and accurate results into the forecast process and to improve the accuracy and stability of the prediction.
Based on the adjusted algorithm of Bayesian weights and the results of the NR model, we can finally obtain the forecasted passenger flows.

Nonparametric Regression Algorithm.
The NR algorithm mainly consists of five steps: the preparation of historical data, the generation of the sample database, the definition of the state vector, the search for the K nearest neighbors, and the prediction function. The general algorithm flow is shown in Figure 1.

Figure 1 flowchart elements: historical data, sample database, state vector, real-time detected data, K-nearest neighbor search, prediction function, and prediction results.

The detailed algorithm is described as follows.
Step 1 (preparation of historical data). All historical detected data are prepared for the NR algorithm in this paper.
Step 2 (generation of the sample database). The prepared historical data are summarized into the sample database, which keeps updating with the forecast process and integrates both real-time data and historical data. The quality of the sample database greatly influences the performance of the NR model.
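Step 3 (definition of state vector). Rail transit passenger flows differ from link traffic volumes in that there are no upstream or downstream links. However, when forecasting for a station, nearby stations influence the arrival and distribution characteristics of its passenger flow. Therefore, we introduce a correlation analysis between the target station and other stations. The number of correlated stations is determined by the correlation coefficient $\rho_{AB}$. Meanwhile, the state vector should include the passenger volumes of the previous $\tau$ intervals of the target station, where $\tau$ is determined by the autocorrelation coefficient $\rho_\tau$ with rank $\tau$. Using $\{q^A_1, \ldots, q^A_n\}$ to denote the time series of passenger volumes during $n$ consecutive intervals of station A and $\{q^B_1, \ldots, q^B_n\}$ to denote the time series of passenger volumes during $n$ consecutive intervals of station B, the correlation coefficient between stations A and B is formulated as

$$\rho_{AB} = \frac{\sum_{i=1}^{n}\left(q^A_i - \bar{q}^A\right)\left(q^B_i - \bar{q}^B\right)}{\sqrt{\sum_{i=1}^{n}\left(q^A_i - \bar{q}^A\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(q^B_i - \bar{q}^B\right)^2}}, \tag{9}$$

where $\bar{q}^A$ and $\bar{q}^B$ are the means of the two series. The autocorrelation coefficient of the target station is given by (10).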
Step 4 (searching of K-nearest neighbors). The K-nearest neighbor search chooses the $K$ historical data series most similar to the current state vector and predicts the result of the next time interval based on the selected neighbors.
Euclidean distance is employed as the index to determine the K nearest neighbors; that is,

$$d = \sqrt{\sum_{i \in S}\left[q_i(k) - q_{h,i}(k)\right]^2 + \sum_{j}\left[q(k-j) - q_h(k-j)\right]^2}, \tag{11}$$

where $S$ is the set of other stations correlated to the target station; $q_i(k)$ is the passenger volume of station $i$ during interval $k$; $q_{h,i}(k)$ is the historical data corresponding to $q_i(k)$; $q(k-j)$ is the passenger flow of the target station during interval $k-j$, with $j$ running over the previous intervals included in the state vector; $q_h(k-j)$ is the historical data corresponding to $q(k-j)$; and $d$ is the Euclidean distance.
Step 5 (prediction function). The prediction function (12) is an inverse-distance-weighted average of the selected neighbors, where $K$ is the number of most similar data series, that is, the K nearest neighbors, and the weights $1/d_i$ are normalized by $\sum_{i=1}^{K}(1/d_i)$.
Using the above five steps, we can implement the NR algorithm and obtain the prediction results from the NR model. The above algorithm is coded in the M language on the MATLAB platform.
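Steps 4 and 5 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' MATLAB code: the function name `knn_forecast`, the layout of `samples` as (state vector, next value) pairs, and the handling of an exact historical match (zero distance) are assumptions made here.

```python
import math


def knn_forecast(current_state, samples, k):
    """Inverse-distance-weighted K-nearest-neighbor forecast.

    current_state: list of floats describing the present situation
    samples: list of (state_vector, next_value) pairs from history
    k: number of neighbors to use
    """
    scored = []
    for state, next_value in samples:
        # Euclidean distance between the current and the historical state vector.
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(current_state, state)))
        scored.append((dist, next_value))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]

    # An exact historical match has zero distance; average such matches directly
    # instead of dividing by zero (an assumption of this sketch).
    exact = [value for dist, value in nearest if dist == 0.0]
    if exact:
        return sum(exact) / len(exact)

    # Weights 1/d_i, normalized by their sum.
    weight_sum = sum(1.0 / dist for dist, _ in nearest)
    return sum(value / dist for dist, value in nearest) / weight_sum
```

In the setting of Step 3, each state vector would hold the flows of the correlated stations together with the previous intervals of the target station, and the next value would be the target station's flow in the following interval.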

The Sequential Kalman Filtering Algorithm.
For accuracy and efficiency, a sequential KF algorithm is employed to solve the three revised KF models; it is described in detail in our previous work [38]. This algorithm is also coded in the M language of MATLAB.
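For orientation, one predict/update cycle of a scalar Kalman filter can be sketched as follows. This is not the sequential algorithm of [38]; it assumes a random-walk state, an identity measurement matrix, and constant scalar noise variances, and the names `kalman_step`, `run_filter`, `d`, and `r` are hypothetical.

```python
def kalman_step(x_prev, p_prev, z, d, r):
    """One predict/update cycle for a scalar random-walk state.

    x_prev, p_prev: previous state estimate and its error variance
    z: measurement for the current interval (e.g. a historical average)
    d, r: process and measurement noise variances (assumed constant)
    """
    # Predict: the state carries over and the uncertainty grows by d.
    x_pred = x_prev
    p_pred = p_prev + d
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def run_filter(measurements, x0, p0, d, r):
    """Filter a whole sequence and return the list of state estimates."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        x, p = kalman_step(x, p, z, d, r)
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Toy flows (passengers per 5 minutes); values are illustrative only.
    flows = [120.0, 135.0, 150.0, 149.0, 160.0]
    print(run_filter(flows, x0=100.0, p0=50.0, d=25.0, r=100.0))
```

Each step needs only the previous estimate and its variance, which is the recursive property that makes the KF family attractive for on-line use.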

Bayesian Combination Algorithm.
The key issue in Bayesian combination is the weight of each submodel, which is determined by comparing the errors of the two single forecast methods.
Based on the historical prediction results and the corresponding historical detection data, we can obtain the forecast errors of the KF and the NR models, respectively. Here, the mean absolute percentage error (MAPE), defined in (13), is employed to denote forecast errors, where $\hat{Q}(k)$ is the forecasted passenger flow during interval $k$, $q(k)$ is the corresponding actual value, and $n$ is the total number of time intervals. The historical MAPE is computed separately for the KF and the NR models. The prior probabilities of choosing the KF and NR models are then given by (14); for example, the prior probability of the KF model is 0 when its historical MAPE is at least 1. Here Pr(·) denotes a choice probability function. These two prior probabilities reflect the influence of historical forecasting errors.
To further incorporate the influence of current forecasting errors, we also compute the current MAPE of the KF and the NR models. Note that the current MAPEs are obtained from the previous five time intervals; that is, they keep updating with the prediction process. The probabilities of generating the forecast using the KF and the NR models are then given by (15), which, like (14), takes the value 0 when the corresponding MAPE is at least 1.
Then, the posterior probabilities of the KF and the NR models [33,34] are obtained from these prior probabilities and likelihoods by Bayes' rule, as in (16).
Based on (16), we finally obtain the weights of the KF and the NR models, as in (17). Equations (7), (8), and (17) together constitute the revised KF-BCNR model.
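The weight adjustment can be sketched as below. Only the zero branch (probability 0 when MAPE is at least 1) is stated in the text; the `1 - MAPE` branch, the use of the normalized posteriors directly as weights, the equal-weight fallback, and all function names are assumptions of this sketch rather than the paper's equations (14) to (17).

```python
def probability_from_mape(mape):
    """Map a MAPE (as a fraction) to a probability; 0 when MAPE >= 1.

    The 1 - MAPE branch is an assumption made for this sketch.
    """
    return 0.0 if mape >= 1.0 else 1.0 - mape


def bayesian_weights(hist_kf, hist_nr, cur_kf, cur_nr):
    """Return (w_kf, w_nr) as normalized posterior probabilities.

    hist_*: historical MAPE of each model (prior information)
    cur_*: current MAPE of each model over recent intervals (likelihood)
    """
    joint_kf = probability_from_mape(hist_kf) * probability_from_mape(cur_kf)
    joint_nr = probability_from_mape(hist_nr) * probability_from_mape(cur_nr)
    total = joint_kf + joint_nr
    if total == 0.0:
        # Both models look unusable; fall back to equal weights (assumption).
        return 0.5, 0.5
    return joint_kf / total, joint_nr / total


def combine(q_kf, q_nr, w_kf, w_nr):
    """Weighted average of the two single-model forecasts, as in (7)."""
    return w_kf * q_kf + w_nr * q_nr
```

With this scheme the model that has been more accurate, both historically and over the most recent intervals, receives the larger weight, and a model whose MAPE reaches 1 drops out of the combination.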

Case Study
We collected the bus Smart Card Data (SCD) of line 13 of Beijing for the whole month of November 2013 and extracted the passenger volumes of 15 stations at one-minute resolution for a case study. According to the unified numbering rules of the Beijing rail transit system, these 15 stations are numbered 21, 23, 25, 27, 29, 33, 35, 37, 39, 41, 43, 45, 47, 49, and 51, respectively. Line 13 operates from 4:55 to 23:50. For application purposes, the original data were aggregated into five-minute intervals, giving 228 time intervals in total. Passenger flows of station 25 on November 28 (Thursday) were taken as the prediction target.
Using the above data, we implemented the KF model, the NR model, and the three proposed revised KF models and derived the prediction results of all five models, respectively.
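The interval count and the five-minute aggregation can be checked with a short Python sketch. It assumes that the last five-minute interval starts at 23:50, which is what makes the count equal 228; the helper names are hypothetical.

```python
from datetime import datetime, timedelta


def interval_starts(first, last, minutes):
    """List the start times of consecutive intervals from first to last inclusive."""
    step = timedelta(minutes=minutes)
    starts = []
    t = first
    while t <= last:
        starts.append(t)
        t += step
    return starts


def aggregate(per_minute_counts, size):
    """Sum consecutive groups of `size` one-minute counts."""
    return [sum(per_minute_counts[i:i + size])
            for i in range(0, len(per_minute_counts), size)]


day = datetime(2013, 11, 28)
starts = interval_starts(day.replace(hour=4, minute=55),
                         day.replace(hour=23, minute=50), 5)
print(len(starts))
```

Under that assumption the start times run from 4:55 through 23:50 in five-minute steps, which gives 227 steps plus the first interval.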

Analyses of the NR Model.
The state vectors are determined from the correlation coefficient $\rho_{AB}$ and the autocorrelation coefficient $\rho_\tau$, computed from the time series of passenger volumes of the target station and nearby stations, as shown in (9) and (10). Results show that the correlation coefficients between target station 25 and stations 21, 23, 27, and 49 all exceed 0.9; however, station 49 is excluded because of its relatively long distance from the target station. Therefore, the passenger flows of stations 21, 23, and 27 are taken as components of the state vector. Meanwhile, comparisons of the autocorrelation coefficients of the target station show that $\rho_\tau$ is the largest (0.86) when $\tau$ equals 2.
The K nearest neighbors are further determined by several forecasting experiments. Besides MAPE, three other evaluation indices are employed to analyze the prediction errors: (1) MPE (mean percentage error), given by (18); (2) RMSE (root mean square error), given by (19); and (3) NRMS (normalized root mean square error), given by (20). The symbols in (18) to (20) are the same as above.
The error statistics of MAPE, MPE, RMSE, and NRMS for different values of $K$ are summarized in Table 1.
From Table 1, one can see that the general performance is the best when $K$ equals 2. Therefore, $K$ is set to 2 in the K-nearest neighbor nonparametric regression model.
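The four indices can be computed along the following lines. The exact definitions in (13) and (18) to (20) are not reproduced here, so the forms below are assumptions: MAPE, MPE, and NRMS are normalized by the sum (or mean) of the actual flows, chosen to agree with the later remark that the sum of actual passenger flows appears in their denominators, and RMSE takes its usual form.

```python
import math


def error_indices(forecast, actual):
    """Return MAPE, MPE, RMSE, and NRMS for two equal-length series.

    The normalizations are assumptions of this sketch (see the lead-in).
    """
    n = len(actual)
    errors = [f - a for f, a in zip(forecast, actual)]
    total_actual = sum(actual)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {
        "MAPE": sum(abs(e) for e in errors) / total_actual,
        "MPE": sum(errors) / total_actual,
        "RMSE": rmse,
        "NRMS": rmse / (total_actual / n),
    }
```

MPE keeps the sign of the errors, so over- and underestimates can cancel; it is therefore best read together with MAPE rather than on its own.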

Prediction Results of the Three Revised KF Models.
All information needed in the three revised KF models is extracted from the database. As stated before, the error correction coefficient in the revised KF-ECC model is determined by historical data fitting procedures, and the fitted function is a quadratic parabola.
In the revised KF-BCNR model, the historical data is necessary for the Bayesian weights.Here, information of November 21, the same Thursday during the previous week, is employed to get those weights.
Prediction results of the KF, NR, revised KF-ECC, revised KF-HD, and revised KF-BCNR models during the whole day are all reported in Table 2.
From Table 2, one can see that all three revised KF models yield better results than the traditional KF model. In detail, introducing the error correction coefficient makes the KF-ECC model outperform the original KF model. Employing the Historical Deviation as the state variable further improves the forecast accuracy of the KF-HD model. Integrating Bayesian combination and the NR method yields the best performance for the KF-BCNR model. Meanwhile, the NR model is also rather accurate; however, its efficiency is not very satisfactory for on-line applications.
Graphical illustrations of these prediction results and errors during different periods are further given in Figures 2-9.
A further comparison of the prediction errors among all five models is illustrated in Figure 10. Here, the MAPE is employed to denote the forecasting error.
From the above predictions, one can find the following results:
(1) All three revised KF models are fairly accurate for short-term rail transit passenger flow prediction. The revised KF-ECC model gets better results than the traditional KF model, due to the introduction of the error correction coefficient.
(2) Concerning the capability of tracking the dynamic characteristics of real-time passenger flows, the three revised KF models also outperform the original KF method. Again, the revised KF-BCNR model improves the stability significantly and yields the best result.
(3) As a nonlinear regression method, the NR model gets much better results than the original KF model. It is even more accurate than the revised KF-ECC model in some cases. However, the revised KF-BCNR model still performs best.
(4) The comparisons among different periods show that the prediction performance during peak hours is much better than during nonpeak hours. The underlying reason is that passenger volumes during peak hours are much larger than those during nonpeak hours, while the fluctuations of passenger flows during peak hours are much weaker. Moreover, the much larger passenger volume during peak hours also reduces some error indices, for instance, MAPE, MPE, and NRMS, because the sum of actual passenger flows appears in the denominator.
(5) Prediction results during evening peak hours are the most accurate in all cases, with the MAPE at just 4.9% and the NRMS at just 6.1%. The direct reason is that the passenger volume during this period is the highest and the most stable among all the time intervals.
(6) Evaluation indices for the whole day are not very satisfactory, because the passenger volumes during early morning and evening are very low and unstable, as can be seen from Figure 8. The very large errors corresponding to these time intervals in Figure 9 also indicate this phenomenon. These specific passenger flows greatly influence the prediction process and increase the corresponding error indices.
Generally, all three revised KF models are rather accurate and stable for on-line applications, especially during the very important peak hours.

Conclusions
This paper addresses three revised Kalman filtering models for short-term rail transit passenger flow prediction: the revised KF-ECC model, the revised KF-HD model, and the revised KF-BCNR model. We first present a revised KF-ECC model by introducing the historical prediction error into the measurement equation through an error correction coefficient. Since the original state variable fluctuates dramatically, we further employ the deviation between real-time passenger volume and the corresponding historical data as a new state variable and derive a revised KF-HD model. For more accurate prediction, we integrate both the Bayesian combination technique and the nonparametric regression method into the traditional KF model and formulate a revised KF-BCNR model. The bus Smart Card Data of line 13 of Beijing over a one-month period are collected for the case study. The reported prediction results based on the practical data indicate that all three revised models are much more accurate and stable than traditional methods. Moreover, the revised KF-HD model outperforms the KF-ECC method, and the revised KF-BCNR model yields the best performance. Further comparisons among different periods show that predictions during peak hours are much more accurate than those during nonpeak hours, and forecast results during evening peak hours are the most accurate. Since peak hours are more important for rail transit operation and management, all three revised KF models proposed in this paper are accurate and stable enough for on-line applications.
Potential future research directions mainly consist of the following aspects. The first is to apply the three revised KF models to short-term traffic flow forecasting and to test their applicability. The second is to further revise the models and algorithms for applications in the whole rail transit system or large-scale road networks. The third is to explore the inherent interrelations among dynamic passenger volume, real-time urban travel demand, and rail network structure and to propose more logical prediction models based on dynamic travel demand analysis.


Figure 1: General flow of the NR algorithm.

Figure 3: Prediction errors in morning peak hours.

Figure 7: Prediction errors in evening peak hours.

Figure 9: Prediction errors in the whole day.

Figure 10: Comparison of prediction errors for different models and periods.

Table 1: Prediction error statistics of NR model.

Table 2: Prediction error statistics of five models.

Table 3: Prediction error statistics of five models during different periods.

Figure 8: Prediction results in the whole day.