Forecasting Method for Urban Rail Transit Ridership at Station Level Using Back Propagation Neural Network

The direct forecasting method for Urban Rail Transit (URT) ridership at the station level cannot reflect the nonlinear relationship between ridership and its predictors. In addition, population is inappropriately expressed in this method because it is not uniformly distributed over the service area. In this paper, a new variable, population per distance band, is considered, and a back propagation neural network (BPNN) model that can reflect the nonlinear relationship between ridership and its predictors is proposed to forecast ridership. Key predictors are obtained through partial correlation analysis. The performance of the proposed model is compared with three benchmark models (a linear model with population per distance band, a BPNN model with total population, and a linear model with total population) using four measures of effectiveness (MOEs): maximum relative error (MRE), smallest relative error (SRE), average relative error (ARE), and mean square root of relative error (MSRRE). In addition, a model for the contribution rate of population per distance band to ridership is formulated based on the BPNN model with nonpopulation variables fixed. Case studies with Japanese data show that the BPNN model with population per distance band outperforms the other three models and that the contribution rate of population within a specific distance band to ridership, calculated through the contribution rate model, is 70%∼92.9% close to the actual statistical value. The results confirm the effectiveness of the models proposed in this paper.


Introduction
In transportation planning, ridership modeling and forecasting is the basis for analyzing project viability and sustainability in the long run. Urban Rail Transit (URT) ridership at the station level is an important element of URT ridership and is critical for determining the scale of stations and access facilities. The four-step model has been the traditional method for transit modeling [1]. It forecasts mainly on a regional scale and integrates four interrelated submodels (trip generation, trip distribution, mode split, and traffic assignment) [2,3], which makes it inappropriate for forecasting detailed transit ridership, such as ridership at the station level. Furthermore, the four-step model needs all basic information on trips at a regional scale, which requires a large amount of resources. Despite its proven effectiveness, its high complexity and cost prevent it from responding quickly to dynamic land-use change within the service area of stations.
The forecasting method for URT ridership at the station level with multivariate regression models, also known as the direct-forecast method, forecasts ridership from changes in the factors affecting ridership throughout the service area of stations [4][5][6][7][8]. This method takes factors affecting ridership, such as the built environment, social and economic attributes within the service area, and ownership of stations, as independent variables and average daily or peak-hour ridership as the dependent variable for regression analysis.
Considerable research has been conducted on finding factors that significantly affect ridership. Cervero and Kockelman analyzed the relationship between travel demand and the 3Ds (density, diversity, and design) [9]. "Density" refers to population and employment density within the service area of the stations, which is considered the most important factor [4][5][6][7][8]. "Diversity" means land-use type and land-use mix within the service area of the stations [10][11][12]. "Design" indicates whether the design of streets and roads makes it convenient for people to access transit services [12,13]. Additionally, station ownership can explain ridership changes well. The type of station (intermediate, terminal, interchange, and intermodal) influences ridership as well [7,8,11,12]. The distance from the station to the central business district (CBD) was also found to be a significant factor [8,12]. Parking and riding facilities and feeder services [7,8,12] also affect ridership.
The tendency to use transit declines as the distance from stations/lines increases [7,8,14,15]. Zhao et al. (2003) drew an exponential curve to depict the effect of walking distance to transit by surveying people living in station service areas [14]. Gutiérrez et al. drew a line to depict the ratio of inhabitants living in service areas of stations to daily boarding per distance band and an exponential curve to depict the ratio of employment population within the service area of stations to daily alighting per distance band [8]. Such a line or curve is called a distance-decay function curve. The literature recommends that the population variable in a forecasting model be weighted by the distance-decay function according to its distance to the station. However, using the function to weight the population living or working in the service area is problematic, because the real-world distance-decay weight may not change continuously within a specific distance band as the function suggests; rather, it stays the same within each distance band.
Forecasting models such as the ordinary least squares (OLS) regression model [12], Poisson regression [16], and other multivariate linear regression models [8] are fed with variables describing the characteristics of stations and their catchment areas. However, linear regression models are not appropriate, because the relationship between some predictors and URT ridership at the station level is not linear. Take population within the service area as an example: if the degree of land-use mix is high, inhabitants and employment are balanced in the area, and it is likely that no extra transit demand is derived. For this reason, a back propagation neural network (BPNN) model, which can reflect a highly nonlinear relationship between URT ridership and its predictors, is put forward. The BPNN model is trained using case study data, with predictors as input and ridership as output, to obtain its parameters (weights and biases), and it is then used to forecast URT ridership at the station level.
The major contributions of this paper are as follows:
(i) Take population or employment per distance band as predictor directly.
(ii) Identify key factors affecting ridership through partial correlation analysis.
(iii) Formulate BPNN model to reflect exact relationship between URT ridership and its predictors.
(iv) Formulate a model of contribution rate of population per distance band to URT ridership using numerical analysis.

The population distance bands are 0-1 km, 1-2 km, 2-3 km, 3-4 km, 4-5 km, 5-6 km, and 6 km and above, measured by road network buffer in GIS (Geographic Information System) [17]. Data from "Total Rail Season Tickets Classified by Departure Stations and Inhabitant Zones of Tokyo, Japan" for 2010 are used [18]. The data show that access ridership from within 6 km of the stations accounts for more than 97% of the total access ridership during the morning peak hour. Therefore, 6 km and above is taken as a single distance band. The population of each distance band is the sum of the people living/working in the Chomes (the smallest unit of street block in Japan) within that distance band.
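The band aggregation described above can be sketched as follows. This is a minimal illustration, assuming each Chome is already reduced to a (network distance in km, population) pair; the function names and the convention that a distance exactly on a boundary falls in the outer band are assumptions, not taken from the paper.

```python
# Band labels follow the text; the last band is open-ended (6 km and above).
BAND_LABELS = ["0-1", "1-2", "2-3", "3-4", "4-5", "5-6", "6+"]


def band_label(distance_km: float) -> str:
    """Return the label of the distance band containing a network distance."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    # Assumption: a distance exactly on a boundary belongs to the outer band.
    index = min(int(distance_km), len(BAND_LABELS) - 1)
    return BAND_LABELS[index]


def population_per_band(chomes):
    """Sum Chome populations per distance band.

    `chomes` is an iterable of (network_distance_km, population) pairs.
    """
    totals = {label: 0 for label in BAND_LABELS}
    for distance_km, population in chomes:
        totals[band_label(distance_km)] += population
    return totals
```

For example, `population_per_band([(0.4, 1200), (1.7, 14775)])` assigns the first Chome to the 0-1 km band and the second to the 1-2 km band; the populations here are illustrative inputs only.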

Methodology
Road density (unit: km/km²) is used to measure the convenience of access to stations. Larger road density favors access to stations by walking and increases transit use [7,8,12]. This variable is calculated as the total length of roads, measured in Google Earth, divided by the area of the service area.
Number of shuttle bus lines is used to measure the convenience of shuttle bus access; it can be obtained from Google Earth within a service area of 200 m radius from the station. Both shuttle bus and URT belong to mass transit, and they are strongly dependent on each other.
Land-use mix can be expressed by the land-use mix ratio. The higher the value, the more diverse the land use throughout the area and the more likely it is that inhabitant and employment populations are balanced there. The formula follows [19] and combines the following quantities: the land-use mix ratio of the service area; the population density of the service area (unit: persons/hectare); the density of the i-th type of employment in the service area (unit: persons/hectare); and the total number of employment types in the service area.
Unidirectional peak-hour train frequency is the sum of all trains in one direction, from any line, stopping at a station. Like the number of lines through the station, station type (terminal or not) and distance from the station to the CBD are used to measure station attraction. Station type is a binary variable: "1" indicates a terminal and "0" otherwise.
The number of parking and riding facilities is also an important variable. More facilities are likely to attract more drivers to choose transit instead. Data for the last 5 variables can be obtained from the official websites of railway operators in Japan.

Partial Correlation Analysis.
Identifying key factors is critical. If all the above variables are used to formulate the forecasting model, the performance of the model will be unsatisfactory because of the correlation between these variables.
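Partial correlation analysis [20,21] is used to find key factors, since it removes the influence of confounding factors when analyzing the relationship between two variables. The formulas are listed as follows:

\[ r_{xy} = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \sum_{i}(y_i - \bar{y})^2}}, \]

\[ r_{xy,z_1} = \frac{r_{xy} - r_{xz_1} r_{yz_1}}{\sqrt{(1 - r_{xz_1}^2)(1 - r_{yz_1}^2)}}, \]

\[ r_{xy,z_1 z_2} = \frac{r_{xy,z_1} - r_{xz_2,z_1} r_{yz_2,z_1}}{\sqrt{(1 - r_{xz_2,z_1}^2)(1 - r_{yz_2,z_1}^2)}}, \]

\[ r_{xy,z_1,\ldots,z_k} = \frac{r_{xy,z_1,\ldots,z_{k-1}} - r_{xz_k,z_1,\ldots,z_{k-1}}\, r_{yz_k,z_1,\ldots,z_{k-1}}}{\sqrt{(1 - r_{xz_k,z_1,\ldots,z_{k-1}}^2)(1 - r_{yz_k,z_1,\ldots,z_{k-1}}^2)}}, \quad (2) \]

where $r_{xy}$ is the simple correlation coefficient of variables $x$ and $y$; $r_{xy,z_1}$ is the correlation coefficient of $x$ and $y$ when controlling $z_1$; $r_{xy,z_1 z_2}$ is the correlation coefficient of $x$ and $y$ when controlling $z_1$ and $z_2$; $r_{xy,z_1,\ldots,z_k}$ is the correlation coefficient of $x$ and $y$ when controlling $z_1, \ldots, z_k$; $x_i$ and $\bar{x}$ are the $i$-th value and the mean value of variable $x$; and $y_i$ and $\bar{y}$ are defined in the same way for $y$.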
The partial correlation coefficient must then be tested for significance. The null hypothesis is that the partial correlation coefficient to be tested is not significantly different from zero. The corresponding t-statistic for the test is shown in (3), where $t$ is the value of the test statistic; $r$ is the partial correlation coefficient to be tested; and $n$ is the sample size. With $n - 1$ degrees of freedom, the probability value $p$ corresponding to $t$ is obtained and compared with 0.01 and 0.05. If $p < 0.01$, the association between the two variables is strongly significant. If $0.01 < p < 0.05$, the association between the two variables is generally significant. If $p > 0.05$, the association between the two variables is not significant. The partial correlation coefficient is used to identify key factors in the case study.
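As an illustration of the procedure, the sketch below computes a partial correlation by correlating regression residuals, which is mathematically equivalent to the recursive formula, and then forms a t-statistic. The function names are hypothetical, and the t-statistic uses the common textbook form with $n - k - 2$ degrees of freedom for $k$ controlled variables; this is an assumption and may differ from the statistic used in the paper.

```python
import numpy as np


def partial_correlation(x, y, controls):
    """Partial correlation of x and y given the columns of `controls`.

    x and y are regressed on the controls (with an intercept) and the
    residuals are correlated.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(controls, dtype=float).reshape(len(x), -1)
    design = np.column_stack([np.ones(len(x)), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


def t_statistic(r, n, k):
    """Textbook t statistic for a partial correlation with k controls.

    Assumption: degrees of freedom are n - k - 2.
    """
    dof = n - k - 2
    return float(r * np.sqrt(dof / (1.0 - r ** 2)))
```

The resulting statistic would then be converted to a probability value with a t distribution (for example via a statistics package) and compared with the 0.01 and 0.05 thresholds.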

Data Source and Key Predictors.
Data from 129 stations in Tokyo, Japan, are used in the case study to obtain the key predictors. Tokyo, the capital of Japan, encompasses 23 special wards, 26 cities, 5 towns, and 8 villages, which together are also called the Tokyo Metropolitan Area. The whole area covers 2188.67 square kilometers, and the population is 13.23 million. URT is the main commuter traffic mode in Tokyo; its total line length is approximately 1000 km and the total number of stations is approximately 800, excluding lines serving only suburban areas. The stations selected for the case study are mainly in the 23 special wards, where economic and trade activity is concentrated. Figure 1 shows the distribution of urban rail transit lines and the selected stations in the Tokyo Metropolitan Area.
(1) Access Ridership (Annual Average Daily) in Morning Peak Hours. The data are obtained from "Total Rail Season Tickets Grouped by Departing Stations and Inhabitant Zones of Tokyo" for 2010 [18].
(2) Land-Use Mix within Service Area of the Station. The data are obtained from "The Number of Enterprises and Employment Grouped by Chomes and Industries" of Tokyo statistical information [22], where the 16 types of employment are agriculture-forestry-fisheries; mining; construction; manufacturing; electrical, gas heating, and water supply industry; information and communications; transport; wholesale and retail trade; finance and insurance; real estate; restaurant accommodation; medical welfare; education and learning supported industry; composite services; services not classified; and unclassified public services. Chomes are the smallest units for population, similar to blocks in America.
(3) Inhabitant Population within Service Area of the Station. The data are obtained from "Day and Night Population Grouped by Chomes" of Tokyo statistical information [23].
(4) Characteristic of Station. Variables such as "unidirectional peak-hour train frequency," "number of lines through station," "number of parking and riding facilities," "station type," and "distance from station to CBD" can be obtained from the official websites of railway operating corporations in Japan [24][25][26][27].
Table 1 shows the correlation analysis of population per distance band and ridership using (2)-(3), with all other variables fixed. "Total" in Row 1 of Table 1 means the population within all distance bands, taken as a single variable. Table 2 shows the correlation analysis of nonpopulation variables and ridership, with all other variables fixed. Table 1 illustrates that the correlation coefficient decreases as distance increases. Population in the 0-1 km, 1-2 km, 2-3 km, and 3-4 km bands is strongly significantly correlated with ridership. Population in the 4-5 km and 5-6 km bands is generally significantly correlated with ridership. Population in the band above 6 km and population within all bands are not correlated with ridership.
Table 2 shows that "road density," "number of shuttle bus lines," "land-use mix," "peak-hour unidirectional train frequency," and "station type" are all generally significantly correlated with ridership. Note that "land-use mix" is a negative predictor of ridership. This is because ridership in the case study is the annual average daily number of passengers entering the station in the morning peak hour, which is mainly produced by inhabitants living in the service area of the station. Imagine two stations with the same number of inhabitants in their service areas. If one station has no employment in its service area, its inhabitants have no choice but to find employment outside the region, so URT is used with high probability. If the service area of the other station has enough employment for its inhabitants, they tend to work near home, and URT is used with low probability. As a result, ridership at a station whose "land-use mix" is high is lower than at a station whose "land-use mix" is low. However, the case study only shows that "land-use mix" is negative for ridership entering the station in the morning peak hour; how it influences all-day ridership entering the station needs further research.
In addition to the results in Table 2, the correlation coefficient of "peak-hour unidirectional train frequency" and "number of lines through station" is 0.924 using (2)-(3), so they are strongly significantly correlated. Table 2 shows that "number of lines through station" is not correlated with "ridership," while "peak-hour unidirectional train frequency" is generally significantly correlated with "ridership," even though the coefficient of the two is only 0.043. Since predictors should be independent of each other [28], the variable "number of lines through station" is abandoned. The correlation coefficient of "road density" and "distance to CBD" is 0.542 using (2)-(3), so they are strongly significantly correlated. Table 2 shows that "distance to CBD" is not correlated with "ridership," while "road density" is generally significantly correlated with "ridership," with a coefficient of 0.07. Thus, "distance to CBD" is abandoned.

BPNN Model.
The BPNN model is selected as the forecasting model for URT ridership at the station level [29][30][31]. BPNNs have a hierarchical feedforward network architecture [32]. Processing requires a minimum of three layers: one layer that receives and distributes the input pattern, a middle or hidden layer that captures the nonlinearities of the input/output relationship, and one layer that produces the output pattern. BPNNs may also contain a bias node in the output and/or hidden layers that produces a constant output of 1 and is fully connected to the upper layer but receives no input. In a BPNN, the error at the output layer is propagated backward through the hidden layer to the input layer to obtain the final expected outputs. BPNNs are trained by repeatedly presenting a series of input/output pattern sets to the network. The trained network is usually examined on a separate set of data, called the test set, to monitor its performance and validity. When the mean squared error (MSE) of the test set reaches a minimum, network training is considered complete and the weights are fixed.
A three-layer BPNN (input-hidden-output) is able to reflect any nonlinearity from input to output and is thus adopted in this paper. The number of key factors affecting ridership is 11, which is also the number of nodes in the input layer. The output layer has only 1 node: ridership. The number of nodes in the hidden layer is set to 4, based on $\sqrt{11 \times 1} \approx 3.3$, the square root of the product of the numbers of nodes in the input and output layers.
The mathematical formulation of BPNN is as follows:

\[ \min E(w, a, v, b) = \frac{1}{N} \sum_{p=1}^{N} \sum_{k=1}^{K} \left( o_k(p) - \hat{o}_k(p) \right)^2 \quad (4) \]

subject to the following. For the output layer,

\[ \hat{o}_k(p) = f\Big( \sum_{j=1}^{m} v_{jk} h_j(p) + b_k \Big), \quad k = 1, 2, \ldots, K;\ p = 1, 2, \ldots, N. \quad (5) \]

For the hidden layer,

\[ h_j(p) = f\Big( \sum_{i=1}^{n} w_{ij} x_i(p) + a_j \Big), \quad j = 1, 2, \ldots, m, \quad (6) \]

in which $f(\cdot)$ is the transform function.

Here $E(w, a, v, b)$ is the objective function, which minimizes the MSE of the actual and predicted ridership; $w$ and $v$ are the weight matrix variables; $a$ and $b$ are the bias vector variables; $o_k(p)$ and $\hat{o}_k(p)$ are the actual and predicted ridership of the $k$-th neuron in the output layer for the $p$-th station sample. In our model, there is only one node in the output layer and thus $K = 1$. $v_{jk}$ is the weight of the $j$-th node in the hidden layer related to the $k$-th node in the output layer; $w_{ij}$ is the weight of the $i$-th node in the input layer related to the $j$-th node in the hidden layer; $h_j$ is the output of the $j$-th node in the hidden layer; $x_i$ is the input of the $i$-th node in the input layer, that is, the value of a key variable; $b_k$ is the $k$-th bias node in the output layer; $a_j$ is the $j$-th bias node in the hidden layer; $m$ is the number of nodes in the hidden layer, obtained by rounding $\sqrt{n}$; $n$ is the number of nodes in the input layer; and $N$ is the total number of station samples. The steepest gradient descent method is adopted to update the weights and biases at each iteration. The detailed solution steps are described as follows.

Step 1 (initialization). Initialize the weights and biases of BPNN. For BPNN training, the prediction accuracy and the maximum number of learning iterations also need to be set.
Step 2. Select the input (predictors)/output (actual ridership) of any station randomly. Use BPNN with the initial weights and biases to forecast and obtain the predicted ridership. Then, the MSE can be obtained by comparing the predicted ridership with the actual ridership of the station. Calculate the output-layer error term $\delta_k$ from the prediction error $\hat{o}_k - o_k$, where $\delta_k$ is the derivative of the MSE with respect to the output of the output layer.
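Step 4. Adjust each weight $v_{jk}$ using $\delta_k$ and $h_j$; that is, $\Delta v_{jk} = \eta \delta_k h_j$. Adjust each weight $w_{ij}$ using $\delta_j^{h}(p)$ and $x_i$; that is, $\Delta w_{ij} = \eta \delta_j^{h}(p) x_i$, where $\eta$ is the learning rate of BPNN; $\Delta v_{jk}$ is the increment of $v_{jk}$; and $\Delta w_{ij}$ is the increment of $w_{ij}$.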
Step 5. Use BPNN with the weights and biases obtained from Step 4 to forecast the ridership of all stations from their inputs (predictors). Then, the objective function $E$ can be obtained by comparing the predicted ridership with the actual ridership of all stations and summing up the differences. If $E$ satisfies the prediction accuracy requirement or the maximum number of learning iterations is reached, the optimal value is obtained and the calculation terminates. Otherwise, select the input (predictors)/output (actual ridership) of the next station, go back to Step 2, and start the next iteration.
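The training loop in the steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the sigmoid transform function on both layers, the uniform initialization range, and the function names are assumptions, while the default hidden-layer size, learning rate, accuracy, and iteration limit mirror the values stated in the text. Inputs and outputs are assumed to be scaled to [0, 1].

```python
import numpy as np


def sigmoid(a):
    """Assumed transform function (the source does not preserve its form)."""
    return 1.0 / (1.0 + np.exp(-a))


def train_bpnn(X, y, n_hidden=4, learning_rate=0.8, tolerance=1e-3,
               max_iterations=30000, seed=0):
    """Train a three-layer network by per-sample gradient descent.

    X has shape (stations, predictors) and y has shape (stations,).
    Returns a prediction function and the final MSE over all stations.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    # Step 1: initialise weights and biases (range is an assumption).
    w = rng.uniform(-0.5, 0.5, size=(n_inputs, n_hidden))
    a = np.zeros(n_hidden)
    v = rng.uniform(-0.5, 0.5, size=n_hidden)
    b = 0.0

    def predict(inputs):
        hidden = sigmoid(inputs @ w + a)
        return sigmoid(hidden @ v + b)

    mse = float("inf")
    for _ in range(max_iterations):
        # Step 2: pick one station at random and run a forward pass.
        p = rng.integers(n_samples)
        hidden = sigmoid(X[p] @ w + a)
        output = sigmoid(hidden @ v + b)
        # Error terms: negative gradient of the squared error for this
        # sample, computed before any weight is changed.
        delta_out = (y[p] - output) * output * (1.0 - output)
        delta_hidden = delta_out * v * hidden * (1.0 - hidden)
        # Step 4: update weights and biases with the learning rate.
        v += learning_rate * delta_out * hidden
        b += learning_rate * delta_out
        w += learning_rate * np.outer(X[p], delta_hidden)
        a += learning_rate * delta_hidden
        # Step 5: stop once the MSE over all stations is small enough.
        mse = float(np.mean((predict(X) - y) ** 2))
        if mse < tolerance:
            break
    return predict, mse
```

The returned `predict` function can be applied to held-out stations once training stops; evaluating the MSE over all stations at every iteration keeps the sketch close to Step 5 at the cost of extra computation.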

Contribution Rate Model.
The contribution rate model, which predicts the contribution of population within a specific distance band to ridership at the station level, is formulated in this section by fixing the values of the other variables. BPNN is able to reflect the interrelationship between key predictors and ridership. To obtain the contribution rate, the populations within the other distance bands are set to zero. By changing the population within the specific distance band, we can observe the corresponding changes in ridership. Thus, the variables other than population (road density, number of shuttle bus lines, land-use mix, peak-hour unidirectional train frequency, and station type (terminal or not)) need to be known first. Figure 2 shows how the BPNN model is formulated to obtain the contribution rate of population within the 0-1 km band to ridership.
The detailed formulas are listed in (8), where $h$ is the contribution rate to ridership of population within the specific distance band; $x$ is the population within the specific distance band, $x > 0$; $y$ is the ridership predicted by BPNN; and $F(\cdot)$ is the BPNN model. The contribution rate model is solved by the following process. Plot the points $(x, y)$ in a two-dimensional coordinate system with population within the specific distance band on the $x$-axis and ridership on the $y$-axis. Fit a curve to all the points such that the variance is smallest. The slope of the fitted curve is the solution, that is, the contribution rate to ridership of population within the specific distance band.
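The solution process can be sketched as below. This is an illustration under stated assumptions: `model` stands for any trained predictor (a hypothetical callable taking the full predictor vector), `background` holds the fixed nonpopulation values with all population bands set to zero, and the fitted curve is taken to be a least-squares line through the origin, which matches the later use of the rate as a multiplier on population but is not confirmed by the source.

```python
import numpy as np


def contribution_rate(model, band_index, populations, background):
    """Least-squares slope through the origin of ridership vs. population.

    model       -- callable mapping a predictor vector to predicted ridership
    band_index  -- position of the distance band being varied
    populations -- population values (> 0) to try for that band
    background  -- predictor vector with other bands zero and nonpopulation
                   variables fixed
    """
    populations = np.asarray(populations, dtype=float)
    ridership = []
    for x in populations:
        inputs = np.array(background, dtype=float)  # copy of the background
        inputs[band_index] = x
        ridership.append(model(inputs))
    ridership = np.asarray(ridership, dtype=float)
    # Slope h minimising sum((y - h * x) ** 2).
    return float(populations @ ridership / (populations @ populations))
```

With a trained network, `model` would wrap the prediction function and any normalization applied to the inputs and outputs.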

Case Studies
3.1. BPNN Model. Data from the 129 stations in Tokyo, Japan, introduced above are used as the case study to implement the BPNN model and forecast ridership at the station level. All variables are normalized because their dimensions differ. The learning rate, prediction accuracy, and maximum number of learning iterations are set to 0.8, 0.001, and 30000, respectively. BPNN is trained with data from 117 stations.
The optimal weights and biases are shown in Tables 3-5. Data from the other 12 stations are used for model testing. The weights and biases from the training process are used to forecast the ridership of these 12 stations. Results are shown in Figure 3.
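The data preparation described above can be sketched as follows. The source states only that the variables are normalized and that 117 stations are used for training and 12 for testing; the min-max scaling and the random choice of test stations below are assumptions, and the function names are hypothetical.

```python
import numpy as np


def min_max_normalize(values):
    """Scale each column to [0, 1]; constant columns map to 0.

    Returns the scaled array plus the per-column minimum and span so the
    same transformation can be applied to new data.
    """
    values = np.asarray(values, dtype=float)
    low = values.min(axis=0)
    span = values.max(axis=0) - low
    span = np.where(span == 0, 1.0, span)  # avoid division by zero
    return (values - low) / span, low, span


def split_stations(n_stations=129, n_test=12, seed=0):
    """Return (train_indices, test_indices) from a random permutation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_stations)
    return order[n_test:], order[:n_test]
```

In practice the scaling parameters would be estimated on the 117 training stations and reused for the 12 test stations.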
To verify the forecasting accuracy of the BPNN model in this paper, its results are compared with those of the linear model with population per distance band, the linear model with total population, and the BPNN model with total population. The other three models are calibrated/trained using the same input/output pattern sets of 117 stations and are applied for prediction to the same data of the other 12 stations. The linear models are calibrated in SPSS. The results are compared using multiple MOEs (unit: %): maximum relative error (MRE), smallest relative error (SRE), average relative error (ARE), and mean square root of relative error (MSRRE). Relative error is the difference between the ridership forecast by a model and the actual ridership of a station, divided by the actual ridership. MRE, SRE, and ARE of a model are the maximum, smallest, and mean values among the relative errors of the 12 stations, respectively. MSRRE of a model is the mean value of the square roots of the relative errors of the 12 stations. Figure 3 and Table 6 show the results of the four models graphically and numerically.
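The four MOEs can be computed as in the sketch below. It follows the definitions in the text literally; taking the absolute value of the relative error and expressing it in percent before the square root are assumptions, since the source does not state how signs or units are handled.

```python
import numpy as np


def measures_of_effectiveness(predicted, actual):
    """Return MRE, SRE, ARE, and MSRRE (relative errors in percent)."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Assumption: relative error is taken in absolute value.
    relative_error = np.abs(predicted - actual) / actual * 100.0
    return {
        "MRE": float(relative_error.max()),
        "SRE": float(relative_error.min()),
        "ARE": float(relative_error.mean()),
        # Literal reading: mean of the square roots of the relative errors.
        "MSRRE": float(np.sqrt(relative_error).mean()),
    }
```

Calling the function once per model on the 12 test stations yields one row per model, which is the layout suggested by the caption of Table 6.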
Figure 3 shows that, compared with the other three models, the BPNN model with population per distance band gives the best result, with the smallest difference between predicted and actual ridership.

The results of the contribution rate model are shown in Figure 4, which plots the curve of the contribution rate of population per distance band to ridership. By (8), we get the contribution rate of population within the 1-2 km, 2-3 km, 3-4 km, 4-5 km, and 5-6 km distance bands to ridership. The results are compared with the actual contribution rate of the 129 stations, calculated as the total ridership from a specific distance band of the 129 stations divided by the total population within the corresponding distance band of the 129 stations. Table 7 shows that the results from the BPNN model and the actual contribution rates are close. The relative error between the BPNN model and the actual data ranges from 7.1% to 30%.

There are two possible explanations. The major one is that the population per distance band of one station may overlap with that of adjacent stations. For example, "14201 inhabitant zone" belongs to the service areas of both Kitami station and Komae station in Tokyo, Japan. It is 1.7 km from Kitami station and 1.8 km from Komae station. There are 14775 inhabitants in the zone. Some of these people may choose Kitami station while others may choose Komae station. However, the population in the 1-2 km band of both stations is set to 14775 in the BPNN model, which decreases the contribution rate of population within the 1-2 km band to ridership. When calculating the actual contribution rate of population within the 1-2 km band, the population within the 1-2 km band of the 129 stations is added up, and the population in the overlapping areas of two adjacent stations is added only once. Thus, the actual rate is greater than the result from the model. Another possible reason is that this paper sets the value of station type to 0 as the background value. In fact, some stations are terminals. Population contributes less to ridership at nonterminal stations than at terminal stations, so the rate from the model is smaller.
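The effect of the overlap can be made explicit with a short worked expression. It considers only the 14201 inhabitant zone, with $P = 14775$ inhabitants; $r_K$ and $r_M$ are hypothetical symbols for the unknown numbers of riders from this zone who use Kitami and Komae stations.

```latex
% The model counts the zone population once per station, the statistics once.
\[
h_{\text{model}} = \frac{r_K + r_M}{P + P} = \frac{r_K + r_M}{29550},
\qquad
h_{\text{actual}} = \frac{r_K + r_M}{P} = \frac{r_K + r_M}{14775}
= 2\,h_{\text{model}} .
\]
```

For a zone shared by two stations, the rate implied by the model inputs is therefore half the statistical rate, which is consistent with the direction of the error reported above.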

Conclusions
On the basis of previous research, factors affecting URT ridership at the station level are summarized and identified. Key factors are then obtained through partial correlation analysis, including population per distance band, road density, number of shuttle bus lines, land-use mix, peak-hour train frequency in one direction, and station type (terminal or not).
A BPNN model is formulated to forecast ridership because of the nonlinear relationship between ridership and its predictors. Input (factors affecting ridership)/output (ridership) pattern sets of 117 stations in Tokyo, Japan, are adopted to train the model, and data from the other 12 stations are used for testing. The result obtained from BPNN with population per distance band is compared with those of BPNN with total population, the linear model with population per distance band, and the linear model with total population. The MOEs (MRE, SRE, ARE, and MSRRE) are used to evaluate the models, and the results show that the BPNN model with population per distance band has the best performance. Since the model can reflect the internal relationship between ridership and the factors affecting it, ridership can be predicted quickly and efficiently when one of the factors varies.
Based on the BPNN model, a contribution rate model of population per distance band to ridership is constructed, with the nonpopulation variables set as background. Results of the case study show the effectiveness of the model. When the population within a specific distance band changes, ridership from this population can be calculated quickly by multiplying the corresponding rate by the population, without running the BPNN model again. Explanations of the relative errors are also presented.

Figure 2: Contribution rate model for ridership of population within 0-1 km band.

Figure 3
In Table 6, except for SRE, the BPNN model with population per distance band outperformed the other three models on all MOEs. The values of each MOE of the linear model with total population are nearly twice those of the linear model with population per distance band. Meanwhile, values of each MOE of BPNN with total population are about
This paper uses data of Tokyo, Japan, to illustrate how values of the above factors are calculated or obtained and the case study is conducted using the same dataset.

Table 1: Correlation analysis of population per distance band and ridership.

Table 2: Correlation analyses of nonpopulation variables and ridership.


Table 3: Weights for nodes in input layer and hidden layer.

Table 4: Weights for nodes in output layer and hidden layer.

Table 6: MOEs of four models.

Table 7: Comparison of results from model and the real contribution rate.