Passenger flow is increasing dramatically as subway networks are completed in the big cities of China. As convergence nodes of subway lines, transfer stations must handle more passengers because of the large transfer demand among different lines. Transfer facilities therefore face great pressure, such as pedestrian congestion and other abnormal situations. To avoid pedestrian congestion, or to warn the management before it occurs, it is necessary to predict the transfer passenger flow. To this end, a transfer passenger flow prediction model based on nonparametric regression theory was proposed. To test and illustrate the model, one month of transfer passenger flow data from XIDAN transfer station were used to calibrate and validate it. Comparison with a Kalman filter model and a support vector machine regression model shows that the nonparametric regression model offers high accuracy and strong portability and can predict transfer passenger flow accurately for different intervals.
Most cities in China face serious traffic problems, such as congestion, pollution, and accidents. It is widely agreed that a subway system is one of the efficient countermeasures to these problems. However, passenger flow is increasing dramatically as subway networks are completed in big cities. As convergence nodes of subway lines, transfer stations must handle more passengers because of the large transfer demand among different lines. Transfer facilities face great traffic pressure because passengers arrive within a very short time. Consequently, pedestrian congestion and other abnormal situations occur more easily. To avoid pedestrian congestion, or to warn the management before it occurs, it is therefore necessary to predict the transfer passenger flow.
Nonparametric regression was selected as the prediction method to forecast the passenger flow due to the fact that the authors have demonstrated the advantages of nonparametric regression over other approaches, such as Kalman filtering [
Nonparametric regression is suitable for uncertain, nonlinear dynamic systems. It is founded on chaotic system theory. Earlier work by Smith [
The nonparametric regression model is well suited to deterministic, nonlinear prediction. It can be used without prior knowledge, provided that enough historical data are available. It searches the historical data for the nearest neighbors of the current data and uses those neighbors to predict the flow in the next interval. The algorithm assumes that the intrinsic links among all factors are contained in the historical data, so the information can be obtained directly from the historical data instead of from an approximate model. In other words, nonparametric modeling does not smooth the historical data. Its predictions are therefore more precise than those of parametric modeling, especially during special events. As a parameter-free, portable algorithm with high prediction accuracy, nonparametric regression has a relatively small error. Moreover, the model is easy to program and can be applied in complex environments.
The basic idea of nonparametric regression is to build a representative historical database from a comprehensive analysis of a large amount of historical data. The historical database contains a variety of traffic state trends as well as typical patterns. Each type of data in the sample library represents a traffic evolution trend. The latest traffic data collected in real time are matched with historical data to find the nearest
The schematic illustration of nonparametric regression theory.
Owing to their good predictive ability, various nonparametric regression models have gradually been applied to forecasting traffic states. In 1991, Davis and Nihan [
Tang and Gao [
Liu et al. [
Sun and Zhang [
The literature review above shows that various nonparametric regression models have been widely used to predict motor vehicle traffic conditions. However, few studies have addressed pedestrian traffic. So, in order to test and verify the applicability of nonparametric regression in pedestrian traffic condition prediction, the
The application of nonparametric regression prediction involves five key steps: choosing the clustering method for the historical database, defining the state vector, determining the similarity mechanism, choosing the nearest neighbor mechanism, and choosing the prediction function.
The first and most critical step in nonparametric regression is the preparation of historical data, whose quality directly determines the prediction performance. Prediction performance is also closely related to the choice of clustering method and to the computational time. First, so that enough nearest neighbors can be found, the historical database built by the clustering method must cover all states of the system. Second, the clustering method must be able to classify dynamic data in real time and to support real-time, online programming. Traditional clustering methods, however, take the average state vector or a single historical value as the clustering object, which makes it difficult to reflect the trend characteristics of the data. This paper therefore focuses on improving the clustering method and the computational speed of the model.
The state vector is composed of the minimum number of state variables associated with the predictor variables. Because many state variables may be associated with the predictor variables, the number of state variables must be selected carefully to achieve the best balance between accuracy and computational speed.
The similarity mechanism is an important concept in nonparametric regression: it defines how the similarity between the current point and the points in the historical database is evaluated. The most commonly used metric is the Euclidean distance or the weighted Euclidean distance.
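The two metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the argument names, and the convention that the weights multiply the squared differences are assumptions made here.

```python
import math


def euclidean_distance(current, historical):
    """Plain Euclidean distance between two equal-length state vectors."""
    if len(current) != len(historical):
        raise ValueError("state vectors must have the same length")
    return math.sqrt(sum((c - h) ** 2 for c, h in zip(current, historical)))


def weighted_euclidean_distance(current, historical, weights):
    """Euclidean distance in which each squared difference is scaled by a weight.

    The weighting convention (weights applied to squared differences) is an
    assumption of this sketch; the source does not spell it out.
    """
    if not (len(current) == len(historical) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    return math.sqrt(
        sum(w * (c - h) ** 2 for c, h, w in zip(current, historical, weights))
    )
```

With unit weights the weighted form reduces to the plain Euclidean distance; larger weights on the most recent components would make recent flow values dominate the match.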
As a core concept of nonparametric regression, the nearest neighbor mechanism defines how a point in the historical database qualifies as a near neighbor of the current point. There are two mechanisms: minimum
After the nearest neighbor points are found, a prediction function combines them to predict the value in the next period. Commonly used functions include the average and the weighted average.
The basic procedure of nonparametric regression prediction is to compare the recent data status with the historical data and identify the most similar data series, which are then used to predict the future data status. To supply the most similar data series, the historical database must include enough historical information. To reflect as many data series trends as possible, all the historical data were stored in the database without any processing. The way the data series are organized in the historical database therefore determines the computational efficiency of the prediction model. The historical database is the foundation of transfer passenger flow prediction. The core concept of the nonparametric regression is to match recent data with the historical database. From all the matches, either the
If the length of the data series is
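The matching and prediction steps described above can be put together in a short sketch. It assumes a fixed number of neighbors `k`, an unweighted Euclidean distance, and a simple average as the prediction function; the function name `knn_forecast` and the layout of `database` as a list of `(state_vector, next_value)` pairs are hypothetical choices made for illustration only.

```python
import math


def knn_forecast(current_state, database, k=3):
    """Forecast the next value as the mean over the k nearest historical states.

    database is a list of (state_vector, next_value) pairs, where next_value
    is the flow observed right after state_vector in the historical record.
    """
    if k < 1 or k > len(database):
        raise ValueError("k must be between 1 and the database size")
    # Rank historical states by Euclidean distance to the current state.
    ranked = sorted(database, key=lambda pair: math.dist(current_state, pair[0]))
    neighbors = ranked[:k]
    # Simple average of the values that followed the nearest neighbors.
    return sum(next_value for _, next_value in neighbors) / k
```

For example, with `database = [([10, 12], 14), ([11, 13], 16), ([50, 60], 70)]`, the call `knn_forecast([10, 12], database, k=2)` averages the two closest records and ignores the distant third one.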
If
The number of clustering types of historical database is
For one data series, the clustering label is
Figure
Illustration of trend label of state vector for nonparametric regression.
And the clustering label is
Based on experimental analysis, neighboring data are chosen as the state vector. The vector contains four current transfer passenger flow trend values and five historical transfer passenger flow trend values. Four neighboring values are selected as a data series. The prediction model calculates the clustering label from the trend of these four values and searches the historical database for the most similar data series. The future data status is then predicted according to the next trend of the most similar data series.
The Euclidean distance is used to calculate the similarity between the recent data series and the historical data series. The equation is
In addition to the Euclidean distance, the weights of the most similar historical data series are also used in the prediction model. As shown in (
The weighted average method based on the reciprocal of the matching distance is chosen as the prediction function. A point at a shorter distance is more similar and therefore receives a larger weight. In most nonparametric regression prediction models, the next value of the most similar historical series is used as the predicted value for the recent data series. In this prediction algorithm, the next value and a weighting coefficient based on the historical data are used to predict the transfer passenger flow. In the state vector of the prediction model, the historical data of the current time and the nearest time are used to identify the different prediction coefficients, and the historical data of the next trend are used to calculate the predicted value directly.
However, due to reasons such as the lack of historical data or abnormal flow, the next value of the recent data series may change dramatically, taking Figure
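One way to organize a raw flow record into four-value data series is a sliding window, sketched below. This is only an illustration of the data layout: it does not reproduce the trend labels or the five historical trend values of the state vector, and the function name `build_history` and its `window` parameter are assumptions introduced here.

```python
def build_history(flow_series, window=4):
    """Split a flow series into (state, next_value) pairs with a sliding window.

    Each state holds `window` consecutive flow counts; next_value is the count
    observed in the interval immediately after that window.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for start in range(len(flow_series) - window):
        state = tuple(flow_series[start:start + window])
        next_value = flow_series[start + window]
        pairs.append((state, next_value))
    return pairs
```

A series of six counts with `window=4` yields two pairs, since only two windows have a following value to serve as the prediction target.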
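The reciprocal-distance weighting can be illustrated as follows. This is a generic inverse-distance weighted average written under stated assumptions, not the paper's exact prediction equation: the function name, the argument layout, and the small constant `eps` that guards against a zero distance are all introduced here for illustration.

```python
def inverse_distance_forecast(distances, next_values, eps=1e-9):
    """Weighted average of neighbor next-values, weighted by 1 / distance.

    distances[i] is the matching distance of neighbor i and next_values[i]
    is the value that followed that neighbor in the historical database.
    eps avoids division by zero for an exact match (an assumption of this
    sketch, not something stated in the source).
    """
    if not distances or len(distances) != len(next_values):
        raise ValueError("need equally many distances and next values")
    inverse = [1.0 / (d + eps) for d in distances]
    total = sum(inverse)
    # Normalizing by the total makes the weights sum to one.
    return sum(w * v for w, v in zip(inverse, next_values)) / total
```

With distances `[1.0, 3.0]` and next values `[100, 200]`, the closer neighbor carries three times the weight of the farther one, so the forecast is about 125 rather than the unweighted mean of 150.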
Comparison of state vector of prediction and similar neighborhood.
In order to test the accuracy of the prediction model, the transfer passenger flow of XIDAN station was used to calibrate the model. The historical database was built with the transfer passenger flow from July 26 to August 25, 2011. The prediction data were the passenger flow of August 25, 2011. The prediction results are illustrated in Figure
Forecasting results at 5-minute intervals in the morning peak hour using nonparametric regression.
See Figures
Forecasting results at 3-minute intervals in the morning peak hour using nonparametric regression.
Forecasting results at 1-minute intervals in the morning peak hour using nonparametric regression.
See Figures
Forecasting results at 5-minute intervals in the evening peak hour using nonparametric regression.
Forecasting results at 3-minute intervals in the evening peak hour using nonparametric regression.
Forecasting results at 1-minute intervals in the evening peak hour using nonparametric regression.
The prediction performance for different time periods and intervals is shown in Table
Precision of nonparametric regression forecasting model.
Performance | 7:00–9:00, 1 minute | 7:00–9:00, 3 minutes | 7:00–9:00, 5 minutes | 17:00–19:00, 1 minute | 17:00–19:00, 3 minutes | 17:00–19:00, 5 minutes
---|---|---|---|---|---|---
Average relative error | 12.20% | 8.10% | 6.30% | 11.80% | 6.00% | 4.00%
Maximum relative error | 42.00% | 35.00% | 23.00% | 31.00% | 24.00% | 13.00%
Equalization coefficient | 0.91 | 0.96 | 0.96 | 0.93 | 0.96 | 0.98
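The three indicators in the table can be computed as sketched below. The source does not state their formulas, so the definitions used here are assumptions: relative error is taken as the absolute error divided by the observed value, and the equalization coefficient follows the form commonly used in traffic forecasting studies, one minus the root of the summed squared errors divided by the sum of the roots of the summed squared observations and predictions. The function names are hypothetical.

```python
import math


def average_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed (observed values must be > 0)."""
    errors = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)


def maximum_relative_error(observed, predicted):
    """Largest |predicted - observed| / observed over all intervals."""
    return max(abs(p - o) / o for o, p in zip(observed, predicted))


def equalization_coefficient(observed, predicted):
    """Assumed definition: 1 - ||pred - obs|| / (||obs|| + ||pred||).

    Values close to 1 indicate a close fit; this formula is an assumption
    of this sketch and may differ from the one used in the paper.
    """
    error_norm = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)))
    observed_norm = math.sqrt(sum(o ** 2 for o in observed))
    predicted_norm = math.sqrt(sum(p ** 2 for p in predicted))
    return 1.0 - error_norm / (observed_norm + predicted_norm)
```

Under these definitions a perfect forecast gives relative errors of 0 and an equalization coefficient of 1, which matches the direction of the values reported in the table.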
Comparison of average relative error for different forecasting models.
Comparison of maximum relative error for different forecasting models.
Comparison of equalization coefficient for different forecasting models.
As convergence nodes of subway lines, transfer stations must handle more passengers because of the large transfer demand among different lines. It is therefore necessary to predict the transfer passenger flow in order to avoid pedestrian congestion or to warn the management before it occurs.
A transfer passenger flow prediction model based on nonparametric regression theory was proposed. One month of transfer passenger flow data from XIDAN transfer station were used to calibrate and validate the model. The results show that the model can predict transfer passenger flow accurately for different intervals. Moreover, its prediction accuracy is much better than that of the Kalman filter model and the support vector machine regression model. The longer the interval, the more accurate the prediction. The largest average relative error is 12.20%, which indicates that the prediction model can be used in real applications.
The authors declare that there is no conflict of interests regarding the publication of this paper.