Short-term forecasting of origin-destination (OD) passenger flow on high-speed rail (HSR) is one of the critical tasks in rail traffic management. This paper proposes a hybrid model to explore the impact of the train service frequency (TSF) of the HSR on the passenger flow. The model is composed of two parts. One is the Holt-Winters model, which takes advantage of the time series characteristics of passenger flow. The other considers the changes in the TSF of the OD pair across different times of day. The two models are integrated by the minimum absolute value method to generate the final hybrid model. Operational data of the Beijing-Shanghai high-speed railway from 2012 to 2016 are used to verify the effectiveness of the model. Beyond its forecasting ability, the proposed model has an explicit functional form and can therefore also be used to forecast the effects of changes in the TSF.
As of September 10, 2016, the operating length of China’s HSR had exceeded 20,000 km. (HSR in China refers to train services with an average speed of 200 km/h or higher; the HSR network consists of upgraded conventional railways and newly built HSR lines.) The HSR provides a new choice of transport for people who used to travel by air or by highway. HSR revenue comes mainly from two parts of the passenger flow: newly generated HSR passenger traffic, and passengers who switch from other modes of transport. Therefore, accurate short-term forecasting of OD passenger flow on HSR is important because (i) it provides the basis for the planning and enhancement of the railway network; (ii) it is the foundation for investment in and construction of HSR; (iii) it affects revenue management, technical specifications, operation mode, and facility improvement on HSR lines [
In the past decades, much attention has been paid to short-term forecasting. The resulting models can generally be divided into three categories: time series models, causal models, and hybrid models.
First, time series models are functions in which the traffic flow is modeled from its own observed values. In general, time series models mainly include autoregressive integrated moving average (ARIMA) [
Second, causal models express the traffic flow as a function of exogenous or endogenous factors. A model that considers the effects of multiple influencing factors can greatly enhance the flexibility of forecasting. By describing the relationship between transportation capacity, passenger volume, and quantity demanded, Luo et al. [
Third, hybrid models are a well-established and well-tested approach to improving forecasting accuracy [
In this paper, we construct a hybrid model to forecast the daily passenger flow (DPF) by combining the advantages of a time series model and a causal model that accounts for the impact of the TSF. The remainder of this article is organized as follows. In Section
To provide prior knowledge of the DPF on the HSR, this section introduces the features of the TSF and the temporal features of OD passenger flow. For different OD markets on an HSR line, the patterns of the DPF may differ. However, the DPF of any particular OD market exhibits regularities over time. In this section, we focus on the OD market from Beijing South Station to Shanghai Hongqiao Station on the Beijing-Shanghai HSR line. This line is studied because it is representative of China’s HSR network. In the first subsection, we analyze the time series characteristics of the DPF of this OD, including the periodic changes of the DPF over weeks, months, and years. The second subsection illustrates that the train service frequency (TSF) also has a clear influence on the OD’s DPF.
Figure
OD’s (Beijing South Station to Shanghai Hongqiao Station) daily passenger flow.
Then, we use the moving average method to generate the red dashed line. The whole line can be divided into four similar parts, each corresponding to one year. Hence, the DPF follows a regular yearly pattern.
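The smoothing step can be illustrated with a minimal Python sketch of a centred moving average. The window length is not stated in the text, so the value used in the example and the function name `centred_moving_average` are assumptions made purely for illustration.

```python
def centred_moving_average(series, window):
    """Return the centred moving average of `series`.

    Only positions with a full window are returned, so the output has
    len(series) - window + 1 values. `window` must be odd so that each
    average is centred on an actual observation.
    """
    if window % 2 == 0 or window < 1:
        raise ValueError("window must be a positive odd integer")
    if window > len(series):
        raise ValueError("window is longer than the series")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Toy daily passenger counts (hypothetical numbers, not data from the paper).
smoothed = centred_moving_average([1, 2, 3, 4, 5], window=3)
```

A longer window (for example, several weeks of daily data) suppresses the weekly cycle and leaves the slower yearly pattern visible, which is the purpose the smoothing serves here.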
Moreover, in Figure
The DPF from Beijing South Station to Shanghai Hongqiao Station in different weeks.
From the above analysis, the OD’s DPF exhibits yearly, monthly, and weekly periodic characteristics. To describe these temporal characteristics of passenger flow, the time series data will be modeled using the Holt-Winters model in the next section.
China Railway Corporation (CR) modifies the train timetable at least twice per year, typically in January and July. In each modification, three types of new timetable are generated, namely, the normal timetable, the weekend timetable, and the peak timetable, which are applied from Monday to Thursday, from Friday to Sunday, and during festivals (e.g., the National Day and the Spring Festival), respectively. Generally, the number of trains differs between timetable types. The peak timetable is the most saturated one, followed by the weekend timetable, which is in turn more saturated than the normal one.
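The rule for which timetable applies on a given day can be written as a small lookup. The sketch below is in Python; the function name `timetable_type` and the holiday set passed to it are hypothetical, and the actual festival dates would have to come from the official calendar.

```python
from datetime import date


def timetable_type(day, holidays):
    """Classify a calendar day as 'peak', 'weekend', or 'normal'.

    `holidays` is a set of dates on which the peak timetable applies
    (an assumed input; the text only names the festivals).
    """
    if day in holidays:
        return "peak"
    # date.weekday(): Monday is 0, ..., Friday is 4, Sunday is 6.
    if day.weekday() >= 4:
        return "weekend"
    return "normal"


# Example with a single assumed holiday date.
assumed_holidays = {date(2016, 10, 1)}
kind = timetable_type(date(2016, 11, 4), assumed_holidays)  # a Friday
```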
As an illustration, Figure
Analysis of the correlation between the TSF and DPF in the OD (Beijing South Station to Shanghai Hongqiao Station).
From the above, a change in the TSF may influence the daily passenger flow. As Figure
Service frequency analysis of the OD (Beijing South Station to Shanghai Hongqiao Station) in different periods.
From Figure
In this section, we build the Holt-Winters time series model based on the historical DPF data and propose a linear regression model that considers the TSF. The minimum absolute value method is then used to combine the two models into a hybrid model that draws on the advantages of both.
The Holt-Winters model consists of three smoothing formulas that reflect the long-term level of the data, the incremental trend, and the seasonal variation. A prediction formula then extrapolates these components, which makes the model suitable for demand data that exhibit both trend and seasonal cycles.
Trend feature is represented by a single variable
And the predicted value at time
To initiate the updating procedure, we must choose starting values for the smoothing value, the trend value, and the period coefficient. Many methods exist for selecting initial values. Here, we use a commonly adopted one. The equation is as follows:
We denote the number of cycles
To obtain more accurate smoothing parameters
And
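As a rough illustration of the Holt-Winters procedure described above (three smoothing updates, initial values taken from the first cycles, and a search for the smoothing parameters), the following Python sketch implements the standard additive form. Several points are assumptions, not taken from the paper: the additive rather than multiplicative seasonal form, the textbook initialisation from the first two cycles, the sum of squared one-step errors as the fitting criterion, the coarse parameter grid, and all function and variable names.

```python
from itertools import product


def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing.

    Returns (fitted, forecasts): one-step-ahead fitted values for
    y[period:], and `horizon` forecasts beyond the end of y.
    """
    if len(y) < 2 * period:
        raise ValueError("need at least two full cycles to initialise")
    first, second = y[:period], y[period:2 * period]
    level = sum(first) / period                       # initial smoothing value
    trend = (sum(second) - sum(first)) / period ** 2  # average per-step change
    season = [v - level for v in first]               # initial period coefficients

    fitted = []
    for t in range(period, len(y)):
        s = season[t % period]
        fitted.append(level + trend + s)              # forecast made before seeing y[t]
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s

    n = len(y)
    forecasts = [
        level + h * trend + season[(n + h - 1) % period]
        for h in range(1, horizon + 1)
    ]
    return fitted, forecasts


def fit_by_grid(y, period, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick (alpha, beta, gamma) from a coarse grid by minimising the
    sum of squared one-step-ahead errors. Returns (sse, alpha, beta, gamma)."""
    best = None
    for a, b, g in product(grid, repeat=3):
        fitted, _ = holt_winters_additive(y, period, a, b, g, horizon=0)
        sse = sum((obs - fit) ** 2 for obs, fit in zip(y[period:], fitted))
        if best is None or sse < best[0]:
            best = (sse, a, b, g)
    return best


# Toy series with a period of 3 (hypothetical numbers).
toy = [10.0, 20.0, 30.0] * 4
_, next_cycle = holt_winters_additive(toy, 3, 0.3, 0.1, 0.2, horizon=3)
```

For daily passenger data, a period of 7 would capture the weekly cycle; a production implementation would more likely rely on an established library routine and a numerical optimiser than on a grid.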
From the analysis in Section
The temporal features considered in the regression model include the trend feature, the day-of-week feature, and the month-of-year feature. Trend feature is represented by a single variable
As the size of the time interval changes, both the number of variables used and the final prediction accuracy of the model change. So we assume that the time interval variable is
Next, we construct a linear regression model to describe the relationship between the TSF and the DPF, so the DPF can be expressed as
The variables
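To make the structure of such a regression concrete, the sketch below builds one design-matrix row from a trend index, day-of-week and month-of-year indicators, and the TSF counted per time segment, and then fits the coefficients by ordinary least squares. This is a minimal pure-Python illustration under stated assumptions: the one-hot coding with Monday and January as reference categories, the helper names `design_row`, `ols`, and `predict`, and the use of the normal equations are all choices made here, since the paper's exact variable coding is not reproduced in this section.

```python
def design_row(t, weekday, month, tsf_counts):
    """Feature row: intercept, trend, day-of-week and month dummies, TSF.

    weekday: 0 (Monday) .. 6 (Sunday); month: 1 .. 12.
    tsf_counts: number of departures in each time segment of the day.
    """
    dow = [1.0 if weekday == d else 0.0 for d in range(1, 7)]  # Monday is the reference
    moy = [1.0 if month == m else 0.0 for m in range(2, 13)]   # January is the reference
    return [1.0, float(t)] + dow + moy + [float(c) for c in tsf_counts]


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * p for x, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return _solve(xtx, xty)


def predict(coefs, row):
    """Predicted DPF for one feature row."""
    return sum(c * x for c, x in zip(coefs, row))
```

With real data, the design matrix needs enough days to cover every weekday and month, otherwise the dummy columns become collinear and the solver raises an error.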
Both of the above methods can be used to forecast the DPF, but each has limitations. The Holt-Winters model takes advantage of the time series characteristics of passenger flow but ignores the impact of the departure frequency in different periods. The regression method considers the change of the OD’s TSF at different times, but it captures the temporal characteristics of passenger flow less well. It is natural for a decision maker to consider time-varying forecast combination schemes to avoid the disadvantages of the two methods. This paper uses the minimum absolute method [
In hybrid forecasting, the essential step is to determine the weight coefficients of the combined model, so that the information from the different forecasting methods is synthesized and the prediction accuracy improves. Traditional methods of combination forecasting include the equalization prediction method, the least squares method, and so on. These methods have several disadvantages, which the minimum absolute method is able to overcome.
From the above,
The minimum absolute value method takes the minimization of the sum of absolute errors as its objective function; the mathematical model is as follows:
The constraint satisfies the following formula:
So we can transform (
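The weighting idea can be illustrated for the two-model case with a short Python sketch. It assumes that the weights are non-negative and sum to one, so a single weight `w` on the first model suffices. Under that assumption the sum of absolute errors is a convex piecewise-linear function of `w`, and its minimum lies either at an endpoint or at a point where one of the combined errors is zero. The sketch therefore checks those candidate points directly instead of calling a linear programming solver. The function names and the constraint set are assumptions made for illustration.

```python
def lad_weight(actual, pred_a, pred_b):
    """Weight w in [0, 1] on model A minimising
    sum_t |actual_t - (w * pred_a_t + (1 - w) * pred_b_t)|."""
    e_a = [y - p for y, p in zip(actual, pred_a)]
    e_b = [y - p for y, p in zip(actual, pred_b)]

    def total_abs_error(w):
        # With weights summing to one, the combined error is w*e_a + (1-w)*e_b.
        return sum(abs(w * ea + (1 - w) * eb) for ea, eb in zip(e_a, e_b))

    candidates = [0.0, 1.0]
    for ea, eb in zip(e_a, e_b):
        if ea != eb:
            w = eb / (eb - ea)  # point where this period's combined error is zero
            if 0.0 < w < 1.0:
                candidates.append(w)
    return min(candidates, key=total_abs_error)


def combine(pred_a, pred_b, w):
    """Hybrid forecast as the weighted sum of the two model forecasts."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


# Toy example (hypothetical numbers): one model overshoots, the other undershoots.
w = lad_weight([100.0, 100.0], [110.0, 110.0], [90.0, 90.0])
hybrid = combine([110.0, 110.0], [90.0, 90.0], w)
```

With more than two component models the same objective is usually solved as a linear program by introducing auxiliary variables for the positive and negative parts of each error.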
For data preparation, daily passenger flow and TSF data for the OD from Beijing South Railway Station to Shanghai Hongqiao Railway Station were collected from July 1, 2012, to the end of 2016. The Beijing-Shanghai HSR officially began operation in July 2011. However, since the passenger flow was still in its early growth stage and its characteristics were not yet visible, we discarded the passenger flow data from July 2011 to June 2012. Moreover, because the DPF during holidays follows a significantly different pattern from that on ordinary days, data points in the daily series that fall within China’s legal holidays (e.g., Spring Festival, Labor Day, and National Day) are removed from the regression. That is to say, only ordinary days are considered in the training and testing experiments. For example, when evaluating the forecasting ability of the period from October 16 to November 15, 2013, daily data from July 1, 2012, to October 15, 2016, are used for training.
In China, HSR services use the daily timetable on working days except Friday, the weekend timetable from Friday to Sunday, and the peak timetable during holidays. The different types of timetable are designed to accommodate different levels of passenger demand, so for the same OD the TSF in a given time interval of the day differs between timetable types. Typically, the daily TSF of the three timetables for the same OD has a quantitative relationship,
According to the data used, the departure times of trips from Beijing to Shanghai range from 6:00 to 19:00. This article uses 6:00 as the starting point and 17:00 as the endpoint, and divides this period into time segments of a given interval. To compare the prediction methods, we use each of the three methods to predict the passenger flow over the following 30 days and compare the predictions with the actual flow.
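The segmentation of the day can be sketched as a simple counting routine. In the Python example below, departure times are given as minutes after midnight, and the number of departures falling into each segment between 6:00 and 17:00 is returned. How departures after 17:00 are treated is not stated in the text; this sketch simply ignores them, which is an assumption, as are the function name and the input format.

```python
def tsf_by_segment(departures, interval_min, start_min=6 * 60, end_min=17 * 60):
    """Count departures in consecutive segments of `interval_min` minutes.

    departures: departure times in minutes after midnight.
    Segments cover [start_min, end_min); the last one may be shorter.
    Departures outside that window are ignored (an assumption).
    """
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    span = end_min - start_min
    n_segments = -(-span // interval_min)  # ceiling division
    counts = [0] * n_segments
    for dep in departures:
        if start_min <= dep < end_min:
            counts[(dep - start_min) // interval_min] += 1
    return counts


# Hypothetical departures at 6:00, 6:05, 7:00, 16:59 and 17:00, hourly segments.
counts = tsf_by_segment([360, 365, 420, 1019, 1020], interval_min=60)
```

A shorter interval yields more TSF variables for the regression, which is the trade-off between model size and accuracy that the interval choice controls.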
The performance measures adopted in our research are the widely used mean absolute percent error (MAPE,
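For reference, the two measures can be computed as in the following Python sketch. MAPE follows its usual definition. VAPE is taken here as the population variance of the absolute percentage errors, scaled to percent; this reading of VAPE, the scaling, and the function names are assumptions, since the defining formulas are not reproduced in this section.

```python
from statistics import mean, pvariance


def absolute_percentage_errors(actual, forecast):
    """Per-period |actual - forecast| / |actual|, as fractions."""
    return [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(absolute_percentage_errors(actual, forecast))


def vape(actual, forecast):
    """Variance of the absolute percentage errors, in percent
    (population variance of the fractional errors, times 100)."""
    return 100.0 * pvariance(absolute_percentage_errors(actual, forecast))


# Toy example (hypothetical numbers): both periods are off by 10%.
m = mape([100.0, 200.0], [110.0, 180.0])
v = vape([100.0, 200.0], [110.0, 180.0])
```

MAPE summarises the average accuracy of a forecast, while VAPE indicates how stable that accuracy is from day to day.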
MAPE results of the different models for the 25 testing experiments are shown in Figure
MAPE values (%) of different models for 25 testing experiments.
Testing experiment  Hybrid  Holt-Winters  Linear
1                   18.1    18.1          20.4
2                   17.3    17.0          20.5
3                   12.2    12.5          20.4
4                   15.7    21.9          21.4
5                   12.6    16.8          22.3
6                   12.1    13.6          23.5
7                   12.3    15.9          23.3
8                   20.4    32.9          23.1
9                   15.0    23.0          22.4
10                  14.7    19.4          15.5
11                  19.4    27.2          15.6
12                  16.2    22.7          15.0
13                  11.2    13.6          14.8
14                  11.0    12.6          14.7
15                  11.7    11.9          14.5
16                  10.6    12.2          14.6
17                  10.1    12.7          14.8
18                  15.3    23.8          15.2
19                  22.8    34.6          15.8
20                  12.5    17.1          17.0
21                  10.3    12.7          17.4
22                  9.7     12.1          17.3
23                  16.2    24.3          17.2
24                  21.1    42.3          37.9
25                  12.2    22.4          39.0
Average             14.4    19.7          19.7
MAPE of different models for 25 testing experiments.
VAPE results are demonstrated in Figure
VAPE values (%) of different models for 25 testing experiments.
Testing experiment  Hybrid  Holt-Winters  Linear
1                   1.2     1.1           2.9
2                   1.2     1.2           2.9
3                   1.8     2.0           2.8
4                   1.2     2.2           2.8
5                   0.8     1.2           2.7
6                   0.8     0.9           2.6
7                   0.8     1.4           2.8
8                   1.5     3.1           2.8
9                   1.1     2.5           2.6
10                  1.1     2.0           2.1
11                  1.8     3.2           2.0
12                  1.7     2.9           1.9
13                  0.6     1.0           1.8
14                  0.5     0.7           1.9
15                  0.5     0.6           2.1
16                  0.5     0.6           2.4
17                  0.6     0.7           2.4
18                  1.4     3.0           2.6
19                  2.4     4.3           2.5
20                  0.9     2.3           2.7
21                  0.6     1.0           2.9
22                  0.6     1.6           2.7
23                  1.3     3.2           2.5
24                  2.9     5.4           1.8
25                  0.7     3.5           1.6
Average             1.1     2.1           2.4
VAPE of different models for 25 testing experiments.
From the above comparisons, the hybrid model proposed in this work performs better in both forecasting accuracy and stability. Besides, since our model establishes an explicit relationship between the temporal characteristics of passenger flow and the TSF, it also gives a way to forecast the real DPF. Taking experiment 25 (corresponding to the period from November 5, 2016, to December 4, 2016) as an example, the real DPF, together with the DPF forecasted by the three different methods, is illustrated in Figure
Analysis of daily passenger flow by different methods (the period of November 5, 2016, to December 4, 2016).
From Figure
From the model, we can see that the choice of time interval affects the accuracy of the linear regression model, which in turn affects the accuracy of the hybrid model. We therefore selected 17 different time intervals to explore their influence on the model. To enhance the reliability of the results, we also increased the number of samples at each time interval from 25 to 60.
The MAPE of different time intervals is plotted as a boxplot. On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.
From Figure
MAPE at different time intervals.
Multisample average MAPE at different time intervals.
From Figure
This paper constructs a hybrid forecasting model that not only makes full use of the time series model for more accurate prediction of periodic data but also accounts for the effects of the TSF. The example shows that the hybrid model is more accurate and more stable than either single model. Compared with other models, it requires fewer data sources, and these are easier to obtain. Moreover, the model can be employed to provide basic information for the adjustment of the train timetable.
However, in this paper we consider the TSF at different times of the day but do not take the rated transportation capacity into account. Other factors, such as the seat allocation strategy, are also omitted for simplicity. A refined representation of the transportation capacity of the HSR line is left for future work.
The authors declare that they have no conflicts of interest.
This research was supported by China Railway Corporation Technology Research and Development Plan Project (2016X005E) and National Key R&D Program of China (2016YFB1200600).