Mathematical Problems in Engineering (MPE), ISSN 1563-5147, 1024-123X, Hindawi Publishing Corporation
Research Article, Article ID 9717582, DOI 10.1155/2016/9717582

Three Revised Kalman Filtering Models for Short-Term Rail Transit Passenger Flow Prediction

Pengpeng Jiao (1; http://orcid.org/0000-0002-2351-4544), Ruimin Li (2; http://orcid.org/0000-0002-3405-1143), Tuo Sun (1), Zenghao Hou (3), and Amir Ibrahim (4)

(1) Beijing Urban Transportation Infrastructure Engineering Technology Research Center, Beijing University of Civil Engineering and Architecture, Beijing 100044, China (bucea.edu.cn)
(2) Institute of Transportation Engineering, Tsinghua University, Beijing 100084, China (tsinghua.edu.cn)
(3) Parsons Transportation Group, 100 Broadway, New York, NY 10005, USA (parsons.com)
(4) New Jersey Department of Transportation (NJDOT), 1035 Parkway Avenue, Trenton, NJ 08625, USA

Academic Editor: Payman Jalali

Received 16 December 2015; Accepted 10 March 2016; Published 30 March 2016

Copyright © 2016 Pengpeng Jiao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Short-term prediction of passenger flow is very important for the operation and management of a rail transit system. Based on the traditional Kalman filtering method, this paper puts forward three revised models for real-time passenger flow forecasting. First, the paper introduces the historical prediction error into the measurement equation and formulates a revised Kalman filtering model based on an error correction coefficient (KF-ECC). Second, the paper employs the deviation between real-time passenger flow and the corresponding historical data as the state variable and presents a revised Kalman filtering model based on Historical Deviation (KF-HD). Third, the paper integrates a nonparametric regression forecast into the traditional Kalman filtering method using a Bayesian combination technique and puts forward a revised Kalman filtering model based on Bayesian combination and nonparametric regression (KF-BCNR). A case study is implemented using passenger flow data of rail transit line 13 in Beijing over a one-month period. The reported prediction results show that KF-ECC improves the applicability to historical trends, KF-HD achieves excellent accuracy and stability, and KF-BCNR yields the best performance. Comparisons among different periods further indicate that results during peak periods outperform those during nonpeak periods. All three revised models are accurate and stable enough for on-line predictions, especially during the peak periods.

1. Introduction

With the rapid urbanization and motorization of most large Chinese cities, urban transportation systems face increasingly serious problems, such as congestion, crashes, and pollution. As an efficient trip mode, the rail transit system plays an increasingly important role in solving traffic issues. In Beijing, there are a total of 21 lines in operation now, covering a distance of 527.2 kilometers (327.6 miles). During the past decade, the average daily passenger flow has increased dramatically to about 10 million riders. Therefore, the operation and management of the rail transit system, especially real-time operation, is very important.

During peak hours, pedestrian congestion happens frequently. For safety and efficiency, real-time passenger flows, especially the flows predicted for the next several time intervals, are key inputs for real-time intelligent operation of the rail transit system. However, while past and current passenger flows are easily detected, future flows are not directly observable. Therefore, a passenger flow forecast method based on statistical data is of considerable value.

Most recently, Sun et al. proposed a nonparametric regression method to forecast passenger flow at subway transfer stations. Apart from this, the literature review shows that very few studies have focused directly on short-term rail transit passenger flow prediction. However, short-term traffic flow forecasting has been studied extensively within Intelligent Transportation Systems (ITS), and many practical models have been developed from these studies. Given different input data, some of these models can readily be used to forecast rail transit passenger flow.

Existing traffic flow forecast models cover a wide range, including Historical Average (HA), Autoregressive Integrated Moving Average (ARIMA), Neural Network (NN), Kalman filtering (KF), nonparametric regression (NR), chaos theory, Support Vector Machine (SVM), and others. The HA model uses a simple time-series method, which is rarely used now. Ahmed and Cook put forward an ARIMA model to forecast freeway traffic flows, and Williams et al. further extended it to the seasonal case and compared it with an Exponential Smoothing Method (ESM). Many researchers formulated NN-based prediction models and obtained rather satisfying results, such as Smith and Demetsky, Florio and Mussone, Zhang et al., Dougherty et al., Park and Rilett, and Vlahogianni et al. Kalman filtering is a highly efficient recursive state forecast method that has also been widely used in short-term traffic flow prediction, for example, by Okutani and Stephanedes, Cathey and Dailey, and Shekhar and Williams. As a nonlinear regression method, the NR model is well suited to uncertain and dynamic systems such as real-time transportation systems. Pioneering work on the NR method can be found in Yakowitz and in Karlsson and Yakowitz, and some scholars further developed it for traffic flow forecasting, for instance, Davis and Nihan, Smith and Demetsky, Oswald et al., Smith et al., Qi and Smith, and Kindzerske and Ni. Huang et al., Lu and Wang, Meng and Peng, Xue and Shi, and Pang and Zhao applied chaos theory to traffic flow prediction and obtained acceptable results. SVM is a statistical machine-learning method that has been shown to have stronger learning and generalization abilities than the NN model. SVM has also been used in traffic flow forecasting, for example, by Ren et al., Wu et al., and Wang et al.

Generally, the above methods can be classified into statistical and artificial intelligence models. Smith and Demetsky and Smith et al. compared some of these models and concluded that no single method was universally accepted as the best one. Therefore, based on existing single models, some combined methods have been developed, and one of the most effective approaches is the Bayesian combined model. Zheng et al., Dong et al., Jiao et al., and Jiao et al. have demonstrated its effectiveness.

More recently, some studies have proposed new models for multistep prediction and large-scale road network forecasting. The latter employed cloud computing techniques for large-scale network applications.

Among all the above short-term traffic flow forecast models, the Kalman filtering method is very efficient because it is recursive, and it is convenient to use for rail transit passenger flow prediction. However, existing studies have shown that the traditional KF methods are not accurate and stable enough for on-line applications. Therefore, this paper revises the traditional KF method and proposes three revised models.

To predict passenger flow accurately and efficiently, one key feature of the paper is to introduce some error calibration measures or new state variables into classical models and to construct some revised KF forecast models. The second key feature is to integrate some stable methods and formulate an innovative KF prediction model with good accuracy, stability, and robustness.

This paper consists of six sections. Following the Introduction, the basic KF model is described in the second section, including its state transition and measurement equations. Three revised KF models are formulated in the third section, including the KF model based on the error correction coefficient (KF-ECC), the KF model based on Historical Deviation (KF-HD), and the KF model based on the Bayesian combination and nonparametric regression (KF-BCNR). Solution algorithms for the NR model, KF model, and Bayesian combination model are designed in the fourth section, respectively. Prediction results using practical statistical passenger flow data are reported and analyzed in the fifth section. Conclusions and some future research directions are summarized in the last section.

2. Basic Kalman Filtering Model

The KF model is a kind of state space method consisting of three important parts: state variable, state transition equation, and measurement equation.

In rail transit passenger flow prediction, the short-term passenger flow to be forecasted is taken directly as the state variable. In this paper, we employ the passenger flow at the station. Using $Q(k)$ to denote the passenger flow during time interval $k$ at a station, the state transition equation and measurement equation are formulated as follows:

$$\mathbf{Q}(k) = \mathbf{Q}(k-1) + \mathbf{W}(k), \tag{1}$$

$$\mathbf{H}(k) = \mathbf{M}(k)\,\mathbf{Q}(k) + \mathbf{e}(k), \tag{2}$$

where $\mathbf{Q}(k)$ is the column vector form of passenger flow $Q(k)$ and, accordingly, $\mathbf{Q}(k-1)$ is the column vector of $Q(k-1)$; $\mathbf{W}(k)$ is a Gaussian white noise vector with mean 0 and covariance matrix $D\delta_{ij}$, where $D$ is a constant positive semidefinite matrix and $\delta_{ij}$ is the Kronecker delta, that is, $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise; $\mathbf{H}(k)$ is the column vector form of the measurements, and here the Historical Average passenger flow during the same time interval $k$ is taken as the measurement; $\mathbf{M}(k)$ is the measurement matrix, which equals the identity matrix in passenger flow prediction and can therefore be omitted from the formulation; $\mathbf{e}(k)$ is the column vector form of detection errors with mean 0 and covariance matrix $R\delta_{ij}$, where $R$ is a constant positive semidefinite matrix similar to $D$.

Equations (1) and (2) together constitute the basic KF model. Existing studies have shown that the basic form of KF is rather efficient because it is recursive. However, its accuracy is not satisfactory. Therefore, we further formulate some revised KF models to improve the prediction accuracy.
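To make the recursion behind (1) and (2) concrete, the following is a minimal Python sketch of a scalar Kalman filter for a random-walk state with an identity measurement. It is the textbook predict/update recursion, not the sequential algorithm used in this paper; the function name `kalman_random_walk` and the noise variances `d`, `r` and initial variance `p0` are illustrative assumptions.

```python
import numpy as np

def kalman_random_walk(measurements, d=1.0, r=4.0, q0=None, p0=1.0):
    """Scalar Kalman filter for Q(k) = Q(k-1) + W(k), H(k) = Q(k) + e(k).

    d and r are assumed variances of W(k) and e(k); p0 is the assumed
    initial error variance. Returns one-step predictions and filtered values.
    """
    z = np.asarray(measurements, dtype=float)
    q = z[0] if q0 is None else float(q0)  # initial state estimate
    p = float(p0)
    predicted, filtered = [], []
    for zk in z:
        # Prediction: the random-walk transition keeps the mean
        # and inflates the error variance by d.
        q_pred = q
        p_pred = p + d
        # Update: blend prediction and measurement with the Kalman gain.
        gain = p_pred / (p_pred + r)
        q = q_pred + gain * (zk - q_pred)
        p = (1.0 - gain) * p_pred
        predicted.append(q_pred)
        filtered.append(q)
    return np.array(predicted), np.array(filtered)
```

With `d` small relative to `r`, the filter trusts its own state and smooths the measurements heavily; with `d` large, it follows the measurements closely.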

3. Three Revised Kalman Filtering Models

3.1. The Revised KF Model Based on Error Correction Coefficient

Since historical passenger flow data can be collected easily, we can conveniently track the trend of the flow changes. The basic KF model in (1) and (2) has been applied to historical cases, and the errors between historical forecasts and historical detections are thus obtained. Based on the characteristics of such errors, we introduce an error correction coefficient into the measurement equation:

$$\mathbf{H}(k) = \lambda\,\mathbf{Q}(k) + \mathbf{e}(k), \tag{3}$$

where $\lambda$ is the error correction coefficient based on historical forecasting deviations. Here, the measurement matrix $\mathbf{M}(k)$ is omitted, because it is an identity matrix.

The error correction coefficient λ varies under different conditions. It is closely correlated to the historical forecasting errors. In detail, it grows with the increase of historical errors, and we can obtain it by the historical data fitting procedures.

During weekdays, rail transit passenger flows usually change from morning peak hours to nonpeak hours and then to evening peak hours. Therefore, some similar characteristics in the historical forecasting errors are observed. Statistical analyses show that $\lambda$ can be fitted by a quadratic (parabolic) function of the interval index $k$:

$$\lambda = bk - ak^{2}, \tag{4}$$

where $a$ and $b$ are parameters to be estimated from the data fitting procedures.

Equations (1), (3), and (4) constitute the revised KF-ECC model together.
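The fitting procedure for (4) is not spelled out in the text. One plausible reading, sketched below in Python, is an ordinary least-squares fit without an intercept; the function names `fit_error_correction` and `error_correction` are hypothetical, and the input values of the coefficient per interval would have to come from the historical forecast errors.

```python
import numpy as np

def fit_error_correction(k, lam):
    """Least-squares fit of lam ~ b*k - a*k**2 (no intercept), as in (4).

    k:   interval indices
    lam: observed correction coefficients (assumed available from
         historical forecasts and detections)
    """
    k = np.asarray(k, dtype=float)
    lam = np.asarray(lam, dtype=float)
    design = np.column_stack([k, -k**2])  # columns multiply b and a
    (b, a), *_ = np.linalg.lstsq(design, lam, rcond=None)
    return a, b

def error_correction(k, a, b):
    """Evaluate the fitted parabola lambda(k) = b*k - a*k**2."""
    k = np.asarray(k, dtype=float)
    return b * k - a * k**2
```

Dropping the intercept forces the fitted curve through the origin, which matches the functional form of (4); a fit with an intercept would be a different model.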

3.2. The Revised KF Model Based on Historical Deviation

Since rail transit passenger flow fluctuates dramatically and its magnitude is rather large, the forecasting process of a KF model that uses passenger volume directly as the state variable is not very stable. Further analyses of passenger flows show that the deviation between real-time volume and the corresponding historical data is fairly smooth. Therefore, this deviation is introduced into the KF model as the revised state variable to improve the accuracy and stability of the prediction. The revised KF-HD model is formulated as follows:

$$\mathbf{Q}(k) - \mathbf{H}(k) = \left[\mathbf{Q}(k-1) - \mathbf{H}(k-1)\right] + \mathbf{W}(k), \tag{5}$$

$$\mathbf{Q}_{H}(k) - \mathbf{H}(k) = \left[\mathbf{Q}(k) - \mathbf{H}(k)\right] + \mathbf{e}(k), \tag{6}$$

where $\mathbf{Q}_{H}(k)$ is the column vector form of the historical passenger flow $Q_{H}(k)$ in the same time interval $k$ on the same weekday of the previous week. Note that $Q_{H}(k)$ is different from $H(k)$: $Q_{H}(k)$ corresponds to the same weekday in the previous week, while $H(k)$ is the average value of the historical data.

Equations (5) and (6) together constitute the revised KF-HD model, which is a basic KF formulation except that the state variable is in deviation form. Since $Q_{H}(k)$ and $H(k)$ are available from statistical data, one can easily recover the real-time passenger flow $Q(k)$.
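As an illustration of the deviation-state idea, the Python sketch below filters the deviation with a scalar random-walk Kalman recursion and then adds the historical average back. It is a simplified scalar reading of (5) and (6); the function name `kf_hd_forecast` and the values of `d`, `r`, and `p0` are assumptions for illustration only.

```python
import numpy as np

def kf_hd_forecast(q_last_week, h_avg, d=1.0, r=4.0, p0=1.0):
    """Filter the deviation x(k) = Q(k) - H(k), then add H(k) back.

    q_last_week: Q_H(k), flows on the same weekday of the previous week
    h_avg:       H(k), historical average flow for interval k
    d, r, p0:    assumed noise variances and initial error variance
    """
    q_h = np.asarray(q_last_week, dtype=float)
    h = np.asarray(h_avg, dtype=float)
    z = q_h - h                 # measurement side of (6): Q_H(k) - H(k)
    x, p = z[0], float(p0)      # deviation estimate and its variance
    flows = np.empty_like(z)
    for k, zk in enumerate(z):
        p_pred = p + d          # (5): the deviation follows a random walk
        gain = p_pred / (p_pred + r)
        x = x + gain * (zk - x)
        p = (1.0 - gain) * p_pred
        flows[k] = x + h[k]     # recover Q(k) = deviation + H(k)
    return flows
```

Because the deviation is much smoother than the raw volume, the same noise settings produce a steadier estimate than filtering the volume directly.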

3.3. The Revised KF Model Based on Bayesian Combination and Nonparametric Regression

Existing studies have demonstrated the effectiveness of the Bayesian combined approach in traffic flow forecasting. It is in fact a weighted average method, as shown below:

$$Q(k) = \sum_{i \in I} \omega_{i}(k) \times Q_{i}(k), \quad I = \{\mathrm{KF}, \mathrm{NR}\}, \tag{7}$$

where KF denotes the result from the KF model, NR denotes the result from the NR model, and $\omega_{i}$ is the weight of the KF or the NR model.

As stated before, the NR model is well suited to uncertain and dynamic transportation systems, and many studies have demonstrated its accuracy. Therefore, we introduce the NR method into the Bayesian combined model to further improve the prediction. Here, the $K$-nearest neighbor nonparametric regression (KNNNR) method is employed.

From (7), we can see that, in the Bayesian combination framework, the KF model or the NR model may be strengthened or weakened by adjusting the weight $\omega_{i}$. If we set $\omega_{KF}$ to zero, the KF model is dropped from the combination. The same holds for the NR model if we set $\omega_{NR}$ to zero. In practice, both weights are adjusted dynamically according to the forecasting errors of the two single models. The detailed adjustment mechanism is described in Section 4.

We further take the NR prediction as the control variable and introduce it into the KF model. Meanwhile, we combine the NR result in interval $k$ with the KF result in interval $k-1$ through the Bayesian combination method and integrate them into the state transition equation of the KF model. The revised formulation is shown below:

$$\mathbf{Q}(k) = \omega_{KF}(k) \cdot \mathbf{Q}_{KF}(k-1) + \omega_{NR}(k) \cdot \mathbf{Q}_{NR}(k) + \mathbf{W}(k), \tag{8}$$

where $\mathbf{Q}_{KF}(k)$ and $\mathbf{Q}_{NR}(k)$ are the column vector forms of $Q_{KF}(k)$ and $Q_{NR}(k)$, respectively, and the other symbols are the same as before. The term $\omega_{NR}(k) \cdot \mathbf{Q}_{NR}(k)$ is the control variable of the state transition equation; that is, it reflects the contribution of the NR model to the final prediction results.

Equations (8) and (2) together constitute the revised KF-BCNR model. The main purpose of this revised KF model is to introduce more historical information and accurate results into the forecast process and to improve the accuracy and stability of the prediction.

Based on the adjusted algorithm of Bayesian weights and the results of the NR model, we can finally obtain the forecasted passenger flows.
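A single time step of the KF-BCNR recursion can be sketched as follows in Python. The prediction follows (8) and the update uses the identity measurement of (2). The variance propagation treats the NR term as a deterministic control input and $\omega_{KF}$ as the transition factor, which is the standard Kalman treatment but an assumption here, since the paper defers the algorithm to earlier work. The function name `kf_bcnr_step` and the values of `d` and `r` are illustrative.

```python
def kf_bcnr_step(q_kf_prev, p_prev, q_nr_now, h_now, w_kf, w_nr,
                 d=1.0, r=4.0):
    """One scalar predict/update step for the KF-BCNR model.

    q_kf_prev: KF estimate for interval k-1
    p_prev:    error variance of that estimate
    q_nr_now:  NR forecast for interval k (control input)
    h_now:     measurement H(k), the historical average for interval k
    w_kf/w_nr: Bayesian weights for interval k (assumed to sum to one)
    d, r:      assumed variances of W(k) and e(k)
    """
    # Prediction from (8): weighted KF state plus weighted NR control term.
    q_pred = w_kf * q_kf_prev + w_nr * q_nr_now
    # Assumption: only the KF part carries state uncertainty forward.
    p_pred = (w_kf ** 2) * p_prev + d
    # Update with the measurement, as in the basic filter.
    gain = p_pred / (p_pred + r)
    q_new = q_pred + gain * (h_now - q_pred)
    p_new = (1.0 - gain) * p_pred
    return q_pred, q_new, p_new
```

Setting `w_kf=1.0` and `w_nr=0.0` reduces the step to the basic random-walk filter of (1) and (2).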

4. Algorithms

4.1. Nonparametric Regression Algorithm

The NR algorithm mainly consists of five steps: the preparation of historical data, the generation of the sample database, the definition of the state vector, the search for the $K$-nearest neighbors, and the prediction function. The general algorithm flow is shown in Figure 1.

Figure 1: General flow of the NR algorithm.

The detailed algorithm is described as follows.

Step 1 (preparation of historical data).

All historical detected data are prepared for the NR algorithm in this paper.

Step 2 (generation of the sample database).

The prepared historical data are summarized into the sample database, which keeps updating with the forecast process and integrates both real-time data and historical data. The quality of the sample database greatly influences the performance of the NR model.

Step 3 (definition of state vector).

Rail transit passenger flows differ from link traffic volumes in that there are no upstream or downstream links. However, when forecasting for a target station, some nearby stations influence the arrival and distribution characteristics of its passenger flow. Therefore, we introduce a correlation analysis between the target station and other stations. The number of correlated stations is determined by the correlation coefficient $\rho_{AB}$. Meanwhile, the state vector should include the passenger volumes of the previous $l$ intervals of the target station, where $l$ is determined by the autocorrelation coefficient $\rho_{l}$ of order $l$.

Using $V_{1}^{A}, \ldots, V_{n}^{A}$ to denote the time-series of passenger volumes during $n$ consecutive intervals at station A and $V_{1}^{B}, \ldots, V_{n}^{B}$ to denote the time-series of passenger volumes during $n$ consecutive intervals at station B, the correlation coefficient between stations A and B is formulated as

$$\rho_{AB} = \frac{\sum_{k=1}^{n} \left(V_{k}^{A} - \overline{V^{A}}\right)\left(V_{k}^{B} - \overline{V^{B}}\right)}{\sqrt{\sum_{k=1}^{n} \left(V_{k}^{A} - \overline{V^{A}}\right)^{2} \sum_{k=1}^{n} \left(V_{k}^{B} - \overline{V^{B}}\right)^{2}}}, \tag{9}$$

where $\overline{V^{A}}$ is the average of time-series $V_{1}^{A}, \ldots, V_{n}^{A}$ and $\overline{V^{B}}$ is the average of time-series $V_{1}^{B}, \ldots, V_{n}^{B}$.

For the autocorrelation coefficient, we decompose the time-series of passenger volumes of the target station, $V_{1}, \ldots, V_{n}$, into $n-l$ subsequences, that is, $(V_{1}, \ldots, V_{l+1}), (V_{2}, \ldots, V_{l+2}), \ldots, (V_{n-l}, \ldots, V_{n})$, and then the autocorrelation coefficient is formulated as

$$\rho_{l} = \frac{\sum_{k=1}^{n-l} \left(V_{k} - \overline{V}_{k}\right)\left(V_{k+l} - \overline{V}_{k+l}\right)}{\sqrt{\sum_{k=1}^{n-l} \left(V_{k} - \overline{V}_{k}\right)^{2} \sum_{k=1}^{n-l} \left(V_{k+l} - \overline{V}_{k+l}\right)^{2}}}. \tag{10}$$

Here, $\overline{V}_{k}$ denotes the average of the time-series $V_{k}, \ldots, V_{k+l}$.
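The two coefficients can be computed in a few lines of Python. The sketch below implements (9) directly. For the lagged case it uses the common sample lag correlation, in which each shifted series is centered on its own overall mean; this is a simplification of (10), which centers on windowed means. The function names are hypothetical.

```python
import numpy as np

def station_correlation(v_a, v_b):
    """Pearson correlation of two equal-length volume series, as in (9)."""
    a = np.asarray(v_a, dtype=float)
    b = np.asarray(v_b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

def lag_correlation(v, lag):
    """Correlation between V(k) and V(k+lag) for lag >= 1.

    Simplification: each shifted series is centered on its own mean,
    not on the windowed means used in (10).
    """
    v = np.asarray(v, dtype=float)
    if lag < 1 or lag >= v.size - 1:
        raise ValueError("lag must satisfy 1 <= lag < len(v) - 1")
    return station_correlation(v[:-lag], v[lag:])
```

Scanning `lag_correlation` over a few candidate lags and keeping the largest value mirrors how the order $l$ is selected for the state vector.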

Step 4 (searching of $K$-nearest neighbors).

The $K$-nearest neighbor search selects the $K$ historical records closest to the current state vector and predicts the result of the next time interval based on the selected neighbors.

Euclidean distance is employed as the index to determine the $K$-nearest neighbors; that is,

$$d = \sqrt{\sum_{i \in I} \left[V_{i}(k) - V_{i}^{H}(k)\right]^{2} + \sum_{j=0}^{l} \left[V(k-j) - V^{H}(k-j)\right]^{2}}, \tag{11}$$

where $I$ is the set of other stations correlated to the target station; $V_{i}(k)$ is the passenger volume of station $i$ during interval $k$; $V_{i}^{H}(k)$ is the historical data corresponding to $V_{i}(k)$; $V(k-j)$ is the passenger flow of the target station during interval $k-j$; $V^{H}(k-j)$ is the historical data corresponding to $V(k-j)$; $d$ is the Euclidean distance.

Step 5 (prediction function).

The prediction function is given by the following equation:

$$V(k+1) = \sum_{i=1}^{K} \frac{1/d_{i}}{d}\, V_{i}(k), \tag{12}$$

where $K$ is the number of the most similar data series, that is, the $K$-nearest neighbors; $d_{i}$ is the distance of the $i$th neighbor; and $d = \sum_{i=1}^{K} 1/d_{i}$.

Using the above five steps, we can implement the NR algorithm and obtain the prediction results from the NR model. The above algorithm is coded using M language of the MATLAB platform.
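Steps 4 and 5 can be sketched as follows in Python (the paper's own implementation is in MATLAB). The sketch assumes the sample database has already been arranged as an array of historical state vectors with the value that followed each of them; the names `knn_forecast`, `sample_states`, and `sample_next`, and the small constant `eps` that guards against a zero distance, are assumptions.

```python
import numpy as np

def knn_forecast(state, sample_states, sample_next, k=2, eps=1e-9):
    """Inverse-distance-weighted K-nearest-neighbor forecast.

    state:         current state vector (correlated stations + lagged flows)
    sample_states: historical state vectors, one per row
    sample_next:   flow observed in the interval after each historical state
    """
    state = np.asarray(state, dtype=float)
    sample_states = np.asarray(sample_states, dtype=float)
    sample_next = np.asarray(sample_next, dtype=float)
    # Euclidean distance to every historical state, as in (11).
    dist = np.sqrt(np.sum((sample_states - state) ** 2, axis=1))
    nearest = np.argsort(dist)[:k]
    # Weights (1/d_i) / d with d = sum_i 1/d_i, as in (12).
    inv = 1.0 / (dist[nearest] + eps)
    weights = inv / inv.sum()
    return float(np.dot(weights, sample_next[nearest]))
```

Inverse-distance weighting lets a near-exact historical match dominate the forecast, while equally distant neighbors are simply averaged.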

4.2. The Sequential Kalman Filtering Algorithm

For accuracy and efficiency, a sequential KF algorithm is employed to solve the three revised KF models; it is described in detail in our previous work. This algorithm is also coded in the M language of the MATLAB software.

4.3. Bayesian Combination Algorithm

The key issue in Bayesian combination is the weight of each submodel, which is determined by comparing the errors of the two single forecast methods.

Based on the historical prediction results and the corresponding historical detection data, we can obtain the forecast errors of the KF and the NR models, respectively. Here, the mean absolute percentage error (MAPE) is employed to denote forecast errors, as below:

$$\mathrm{MAPE} = \frac{\sum_{k=1}^{n} \left|\tilde{Q}(k) - Q(k)\right| / Q(k)}{n} \times 100\%, \tag{13}$$

where $\tilde{Q}(k)$ is the forecasted passenger flow during interval $k$, $Q(k)$ is the corresponding actual value, and $n$ is the total number of time intervals.

Furthermore, we denote the historical MAPE of the KF and NR models by $E^{H}_{KF}$ and $E^{H}_{NR}$, respectively. The prior probabilities of choosing the KF and NR models are then given by

$$\Pr(H_{KF}) = \begin{cases} 1 - E^{H}_{KF}, & E^{H}_{KF} < 1, \\ 0, & E^{H}_{KF} \ge 1, \end{cases} \qquad \Pr(H_{NR}) = \begin{cases} 1 - E^{H}_{NR}, & E^{H}_{NR} < 1, \\ 0, & E^{H}_{NR} \ge 1, \end{cases} \tag{14}$$

where $\Pr(\cdot)$ denotes a choice probability function; $\Pr(H_{KF})$ is the prior probability of choosing the KF model; $\Pr(H_{NR})$ is the prior probability of choosing the NR model. These two prior probabilities reflect the influences of historical forecasting errors.

To further incorporate the influences of current forecasting errors, we denote the current MAPE of the KF and the NR models by $E_{KF}$ and $E_{NR}$, respectively. Note that the current MAPEs are obtained from the previous five time intervals; that is, they keep updating with the prediction process:

$$\Pr(F \mid H_{KF}) = \begin{cases} 1 - E_{KF}, & E_{KF} < 1, \\ 0, & E_{KF} \ge 1, \end{cases} \qquad \Pr(F \mid H_{NR}) = \begin{cases} 1 - E_{NR}, & E_{NR} < 1, \\ 0, & E_{NR} \ge 1, \end{cases} \tag{15}$$

where $\Pr(F \mid H_{KF})$ and $\Pr(F \mid H_{NR})$ are the probabilities of generating forecast $F$ using the KF and the NR models, respectively.

Then, the posterior probabilities [33, 34] are formulated as

$$\Pr(H_{KF} \mid F) = \frac{\Pr(F \mid H_{KF}) \Pr(H_{KF})}{\Pr(F)}, \qquad \Pr(H_{NR} \mid F) = \frac{\Pr(F \mid H_{NR}) \Pr(H_{NR})}{\Pr(F)},$$

$$\Pr(F) = \Pr(F \mid H_{KF}) \Pr(H_{KF}) + \Pr(F \mid H_{NR}) \Pr(H_{NR}), \tag{16}$$

where $\Pr(H_{KF} \mid F)$ and $\Pr(H_{NR} \mid F)$ are the posterior probabilities of the KF and the NR models, respectively.

Based on (16), we finally obtain the weights of the KF and the NR models, as below:

$$\omega_{KF} = \frac{\Pr(F \mid H_{KF}) \Pr(H_{KF})}{\Pr(F \mid H_{KF}) \Pr(H_{KF}) + \Pr(F \mid H_{NR}) \Pr(H_{NR})}, \qquad \omega_{NR} = \frac{\Pr(F \mid H_{NR}) \Pr(H_{NR})}{\Pr(F \mid H_{KF}) \Pr(H_{KF}) + \Pr(F \mid H_{NR}) \Pr(H_{NR})}. \tag{17}$$

Equations (7), (8), and (17) are integrated collectively as the revised KF-BCNR model.
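The weight computation in (14) to (17) reduces to a few arithmetic steps, sketched below in Python. MAPE values are passed as fractions (0.2 for 20%) so that the comparison with 1 in (14) and (15) applies directly. The function name `bayesian_weights` and the equal-weight fallback when both models have zero probability are assumptions; the paper does not discuss that corner case.

```python
def bayesian_weights(e_hist_kf, e_hist_nr, e_cur_kf, e_cur_nr):
    """Posterior weights of the KF and NR models from their MAPEs.

    All errors are fractions, for example 0.388 for a MAPE of 38.8%.
    Returns (w_kf, w_nr), which sum to one.
    """
    def prob(error):
        # (14) and (15): 1 - E when E < 1, otherwise 0.
        return max(0.0, 1.0 - error)

    joint_kf = prob(e_cur_kf) * prob(e_hist_kf)  # Pr(F|H_KF) * Pr(H_KF)
    joint_nr = prob(e_cur_nr) * prob(e_hist_nr)  # Pr(F|H_NR) * Pr(H_NR)
    total = joint_kf + joint_nr                  # Pr(F) in (16)
    if total == 0.0:
        # Assumption: fall back to equal weights if both models fail.
        return 0.5, 0.5
    return joint_kf / total, joint_nr / total    # (17)
```

For example, with historical and current MAPEs of 0.5 for KF and 0.2 for NR, the joint terms are 0.25 and 0.64, so the NR model receives a weight of 0.64/0.89, roughly 0.72.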

5. Case Study

We collected the bus Smart Card Data (SCD) of line 13 of Beijing for the whole month of November 2013 and extracted the per-minute passenger volumes of 15 stations from this SCD information for a case study. According to the unified numbering rules of the Beijing rail transit system, these 15 stations are numbered 21, 23, 25, 27, 29, 33, 35, 37, 39, 41, 43, 45, 47, 49, and 51, respectively. The operation period of line 13 is from 4:55 to 23:50. For application purposes, the original data were aggregated into five-minute intervals, giving 228 time intervals in total. Passenger flows of station number 25 on November 28 (Thursday) were taken as the prediction target.
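The aggregation step is simple to reproduce. The Python sketch below sums per-minute counts into fixed-width bins; 228 five-minute intervals correspond to 1140 minutes (19 hours) of operation. The function name `aggregate_counts` and the input layout (one count per minute for a single station and day) are assumptions.

```python
import numpy as np

def aggregate_counts(per_minute, width=5):
    """Sum per-minute passenger counts into consecutive bins of `width` minutes."""
    counts = np.asarray(per_minute, dtype=float)
    if counts.size % width != 0:
        raise ValueError("series length must be a multiple of the bin width")
    # Each row of the reshaped array holds one bin of `width` minutes.
    return counts.reshape(-1, width).sum(axis=1)
```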

Using the above data, we implemented the KF model, the NR model, and the three proposed revised KF models and derived the prediction results of all five models, respectively.

5.1. Analyses of the NR Model

The state vectors are determined from the correlation coefficient $\rho_{AB}$ and the autocorrelation coefficient $\rho_{l}$, which are computed from the time-series of passenger volumes of the target station and nearby stations, as shown in (9) and (10). Results show that the correlation coefficients between target station 25 and stations 21, 23, 27, and 49 all exceed 0.9; however, station 49 is excluded because of its relatively long distance from the target station. Therefore, the passenger flows of stations 21, 23, and 27 are taken as components of the state vector. Meanwhile, comparisons of the autocorrelation coefficients of the target station show that $\rho_{l}$ is largest (0.86) when $l$ equals 2.

The number of nearest neighbors $K$ is further determined by several forecasting experiments. Besides MAPE, three other evaluation indices are also employed to analyze the prediction errors, as below:

MPE (mean percentage error):

$$\mathrm{MPE} = \frac{\sum_{k=1}^{n} \left[\tilde{Q}(k) - Q(k)\right] / Q(k)}{n} \times 100\%. \tag{18}$$

RMSE (root mean square error):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{k=1}^{n} \left[\tilde{Q}(k) - Q(k)\right]^{2}}{n}}. \tag{19}$$

NRMS (normalized root mean square error):

$$\mathrm{NRMS} = \frac{\sqrt{n \sum_{k=1}^{n} \left[\tilde{Q}(k) - Q(k)\right]^{2}}}{\sum_{k=1}^{n} Q(k)} \times 100\%. \tag{20}$$

Other symbols in (18) to (20) are the same as above.
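For reference, the four indices in (13) and (18) to (20) can be computed together as in the Python sketch below. The function name `error_indices` is hypothetical; the formulas follow the definitions above, with MAPE, MPE, and NRMS returned in percent.

```python
import numpy as np

def error_indices(forecast, actual):
    """Return MAPE, MPE, RMSE, and NRMS for a forecast series.

    forecast: predicted flows; actual: observed flows (all nonzero).
    """
    f = np.asarray(forecast, dtype=float)
    q = np.asarray(actual, dtype=float)
    rel = (f - q) / q                 # signed relative errors
    n = q.size
    return {
        "MAPE": float(np.mean(np.abs(rel)) * 100.0),                      # (13)
        "MPE": float(np.mean(rel) * 100.0),                               # (18)
        "RMSE": float(np.sqrt(np.mean((f - q) ** 2))),                    # (19)
        "NRMS": float(np.sqrt(n * np.sum((f - q) ** 2)) / np.sum(q) * 100.0),  # (20)
    }
```

MPE keeps the sign of each error, so over- and under-predictions cancel; a small MPE alongside a large MAPE indicates errors that are large but unbiased. NRMS equals RMSE divided by the mean actual flow.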

The error statistics of MAPE, MPE, RMSE, and NRMS for different values of $K$ are summarized in Table 1.

Table 1: Prediction error statistics of NR model.

K    MAPE     MPE       RMSE    NRMS
1    18.7%    8.2%      16.3    17.2%
2    19.7%    −3.4%     10.2    12.0%
3    21.4%    −8.6%     14.3    16.5%
4    24.0%    10.4%     21.5    25.2%
5    26.9%    −14.9%    27.4    29.1%

From Table 1, one can see that the overall performance is best when $K$ equals 2. Therefore, $K$ is set to 2 in the $K$-nearest neighbor nonparametric regression model.

5.2. Prediction Results of the Three Revised KF Models

All information needed in the three revised KF models is extracted from the database. As stated before, the error correction coefficient $\lambda$ in the revised KF-ECC model is determined by historical data fitting procedures:

$$\lambda = 0.010742\,k - 0.000045\,k^{2}. \tag{21}$$

Obviously, it is a quadratic parabola formulation.
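As a quick check on the shape of (21), the short derivation below locates the maximum of the fitted parabola and evaluates it at the ends of the day. It uses only the coefficients of (21) and the 228 five-minute intervals of the case study; the symbol $k^{*}$ for the maximizing index is introduced here for illustration.

```latex
\begin{align*}
\frac{d\lambda}{dk} &= 0.010742 - 2(0.000045)\,k = 0
  \;\Longrightarrow\; k^{*} = \frac{0.010742}{0.00009} \approx 119.4, \\
\lambda(k^{*}) &= \frac{(0.010742)^{2}}{4(0.000045)} \approx 0.641, \\
\lambda(1) &= 0.010742 - 0.000045 \approx 0.0107, \\
\lambda(228) &= 0.010742(228) - 0.000045(228)^{2}
  = 2.449176 - 2.339280 \approx 0.110.
\end{align*}
```

The fitted coefficient therefore peaks near the middle of the 228 intervals and is close to zero at both ends of the operating day.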

In the revised KF-BCNR model, the historical data is necessary for the Bayesian weights. Here, information of November 21, the same Thursday during the previous week, is employed to get those weights.

Prediction results of the KF, NR, revised KF-ECC, revised KF-HD, and revised KF-BCNR models during the whole day are all reported in Table 2.

Table 2: Prediction error statistics of five models.

Model      MAPE     MPE       RMSE    NRMS
KF         38.8%    −33.5%    52.0    60.2%
NR         19.7%    −3.4%     10.2    12.0%
KF-ECC     27.8%    −14.9%    33.0    38.2%
KF-HD      20.5%    7.5%      11.5    13.3%
KF-BCNR    18.1%    4.1%      10.2    11.9%

From Table 2, one can see that all three revised KF models yield better results than the traditional KF model. In detail, introducing the error correction coefficient makes the KF-ECC model outperform the original KF model. Employing the Historical Deviation as the state variable further improves the forecast accuracy of the KF-HD model. Integrating Bayesian combination and the NR method yields the best performance for the KF-BCNR model. Meanwhile, the NR model is also rather accurate; however, its efficiency is not very satisfying for on-line applications.

To compare the performances of traditional models and three revised models during different periods, the evaluation indices during morning peak hours (7:00–9:00), nonpeak hours (11:00–13:00), evening peak hours (17:00–19:00), and the whole day (4:55–23:55) are further extracted and summarized in Table 3.

Table 3: Prediction error statistics of five models during different periods.

Error index   Period          KF       NR      KF-ECC   KF-HD   KF-BCNR
MAPE (%)      Morning peak    35.5     10.4    12.9     10.2    8.2
MAPE (%)      Nonpeak         39.7     15.7    27.2     16.4    13.4
MAPE (%)      Evening peak    36.9     3.5     23.3     6.0     4.9
MAPE (%)      Whole day       38.8     19.7    27.8     20.5    18.1
MPE (%)       Morning peak    −35.5    −3.3    −8.2     4.5     0.8
MPE (%)       Nonpeak         −39.7    −9.3    −26.6    −0.8    −4.9
MPE (%)       Evening peak    −36.9    −1.6    −23.3    2.6     0.6
MPE (%)       Whole day       −33.5    −3.4    −14.9    7.5     4.1
RMSE          Morning peak    33.4     11.3    13.6     9.9     7.6
RMSE          Nonpeak         16.9     8.2     12.7     7.3     6.9
RMSE          Evening peak    134.9    14.0    85.1     22.5    21.1
RMSE          Whole day       52.0     10.2    33.0     11.5    10.2
NRMS (%)      Morning peak    38.4     13.0    15.7     11.4    8.7
NRMS (%)      Nonpeak         44.4     21.5    33.3     19.1    18.1
NRMS (%)      Evening peak    38.8     4.0     24.5     6.5     6.1
NRMS (%)      Whole day       60.2     11.8    38.2     13.3    11.9

Graphical illustrations of these prediction results and errors during different periods are further presented in Figures 2–9.

Figure 2: Prediction results in morning peak hours.

Figure 3: Prediction errors in morning peak hours.

Figure 4: Prediction results in nonpeak hours.

Figure 5: Prediction errors in nonpeak hours.

Figure 6: Prediction results in evening peak hours.

Figure 7: Prediction errors in evening peak hours.

Figure 8: Prediction results in the whole day.

Figure 9: Prediction errors in the whole day.

A further comparison of the prediction errors among all five models is illustrated in Figure 10. Here, the MAPE is employed to denote the forecasting error.

Figure 10: Comparison of prediction errors for different models and periods.

From the above predictions, the following results can be observed:

All three revised KF models are fairly accurate for short-term rail transit passenger flow prediction. The revised KF-ECC model obtains better results than the traditional KF model, owing to the introduction of the error correction coefficient. The revised KF-HD model further outperforms the KF-ECC model, because employing Historical Deviation as the state variable improves its accuracy. Integrating Bayesian combination and the NR method, the revised KF-BCNR model yields the best accuracy among all three revised KF models.

Concerning the capability of tracking the dynamic characteristics of real-time passenger flows, the three revised KF models also outperform the original KF method. Again, the revised KF-BCNR model improves the stability significantly and yields the best result.

As a nonlinear regression method, the NR model obtains much better results than the original KF model. It is even more accurate than the revised KF-ECC model in some cases. However, the revised KF-BCNR model remains the best-performing model.

The comparisons among different periods show that the prediction performance during peak hours is much better than during nonpeak hours. The intrinsic reason is that the passenger volumes during peak hours are much larger than those during nonpeak hours, and the fluctuations of passenger flows during peak hours are much weaker than those during nonpeak hours. Moreover, the much larger magnitude of passenger volume during peak hours also reduces some error indices, for instance, MAPE, MPE, and NRMS, because the actual passenger flows appear in the denominator.

Prediction results during evening peak hours are the most accurate in all cases, with the MAPE at just 4.9% and the NRMS at just 6.1% for the KF-BCNR model. The direct reason is that the passenger volume during this period is the highest and the most stable among all the time intervals.

Evaluation indices for the whole day are not very satisfying, because the passenger volumes during the early morning and late evening are very low and unstable, as can be seen from Figure 8. The very large errors corresponding to these time intervals in Figure 9 also reflect this phenomenon. These low and unstable passenger flows strongly influence the prediction process and increase the corresponding error indices.

Generally, all three revised KF models are rather accurate and stable for on-line applications, especially during the very important peak hours.

6. Conclusions

This paper presents three revised Kalman filtering models for short-term rail transit passenger flow prediction: the revised KF-ECC model, the revised KF-HD model, and the revised KF-BCNR model. We first present a revised KF-ECC model by introducing the historical prediction error into the measurement equation through an error correction coefficient. Since the original state variable fluctuates dramatically, we further employ the deviation between real-time passenger volume and the corresponding historical data as a new state variable and derive a revised KF-HD model. For more accurate prediction, we integrate both the Bayesian combination technique and the nonparametric regression method into the traditional KF model and formulate a revised KF-BCNR model. The bus Smart Card Data of line 13 of Beijing over a one-month period are collected for the case study. The reported prediction results based on the practical data indicate that all three revised models are much more accurate and stable than the traditional KF method. Moreover, the revised KF-HD model outperforms the KF-ECC method, and the revised KF-BCNR model yields the best performance. Further comparisons among different periods show that predictions during peak hours are much more accurate than those during nonpeak hours, and forecast results during evening peak hours are the most accurate. Since peak hours are more important for rail transit operation and management, all three revised KF models proposed in this paper are accurate and stable enough for on-line applications.

Potential future research directions mainly consist of the following aspects. The first is to adapt the three revised KF models to short-term traffic flow forecasting and to test their applicability. The second is to further revise the models and algorithms for applications in the whole rail transit system or large-scale road networks. The third is to explore the inherent interrelations among dynamic passenger volume, real-time urban travel demand, and rail network structure and to propose more logical prediction models based on dynamic travel demand analysis.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This research is supported by the National Natural Science Foundation of China Project (51578040, 51208024), Beijing Nova Programme (Z151100000315050), Beijing Natural Science Foundation Project (8162013), and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD201404071).
