Impact of Transit Network Layout on Resident Mode Choice

This study examines the impact of public transit network layout (TNL) on resident mode choice. To treat TNL as a factor, the candidate variables are divided into three groups: a variable set that does not consider TNL, one that considers TNL at the zone level, and one that considers TNL at the individual level. Using travel survey data from Baoding, a Multinomial Logit (MNL) model is estimated, and the parameter estimates show that TNL has a significant effect on resident mode choice. Based on the parameter estimates, the factors affecting mode choice are further screened. The screened variable sets are then used as input data for training and forecasting with a BP neural network. The forecasting results for both TNL-based variable sets indicate that introducing TNL can improve the performance of mode choice forecasting.


Introduction
With rapid urban development, traffic congestion has become an important topic, and numerous measures have been taken to address it. Public transit is one useful approach to reducing traffic congestion. The rationale for this study is to determine whether the transit network layout (TNL) affects the traveler's mode choice. Public transit would be used in preference to other modes if the TNL is well designed. This paper studies the impact of TNL on mode choice by evaluating different features of TNL as influencing factors. In addition, individual characteristics and travel features are taken into consideration. The results reported in this paper can be applied to public transit travel demand forecasting.
Mode choice is a widely studied topic in transportation planning and is often used for traffic demand forecasting. A number of factors influencing mode choice have been taken into account, including energy costs [1], transit fare price [2], parking fees [3][4][5][6], urban land utilization [7], congestion pricing [8], commuters' travel time [9], and more. Some researchers have also paid attention to the TNL, but most of them folded the TNL into the variables used to analyze transit service in studies of transit's effect on mode choice. Zhu [10] introduced two variables related to TNL into a mode choice model: the distance between the origin/destination and the metro station, and the number of stops surrounding the residence. Pan and Ma [11] utilized GIS Network Analyst functions to reconstruct travel costs for different modes impacted by the facilities or services of a new transit project. A "winner-takes-all" mechanism was applied to assign trips to the optimal mode under different traffic conditions, and mode share was calculated based on the estimated trips in the impacted areas. Racca and Ratledge [12] employed transit level of service and accessibility as variables and used logistic regression to forecast mode choice. Their findings show that both variables have a significant influence on residents' mode choice, but the study did not quantify the degree of influence. Jin et al. [13] examined how transit service factors such as accessibility and connectivity can be incorporated into mode choice models. The results showed that including transit service can improve the model's forecasting performance.
Although a number of mode choice models have been introduced and researchers have paid attention to the effect of transit service on mode choice, little in-depth research has studied the effect of TNL (one aspect of transit service) on mode choice. Much of the previous work has limitations: (1) transit service is treated as a variable group, and few variables related to TNL are included; (2) the model's input data for TNL is mostly at the zone level. This paper addresses these limitations and considers variables related to TNL more fully. The MNL model and the BP neural network are employed to estimate parameters and to forecast mode choice, respectively. Zone-level and individual-level input data are both used, and a comparison is conducted to determine which data format yields better forecasting accuracy.

MNL Model and BP Neural Network
The MNL model and the BP neural network are both widely used to forecast mode choice, but only the BP neural network is used for forecasting here, because other researchers have reported that it has better forecasting performance than the MNL model. The MNL model is therefore used only to estimate the parameters, from which the variables in the three groups are screened. The screened variable sets are used as the input of the BP neural network. A brief introduction to the MNL model and the BP neural network is given below.

MNL Model.
The Logit model is the most widely used method to forecast mode choice. The Logit family includes the binary Logit, MNL, Nested Logit, and Cross-Nested Logit models, of which the MNL model is the most commonly used. The MNL model assumes that residents choose the travel mode with the highest utility under given circumstances (known as utility maximization). This utility correlates with individual, family, and travel characteristics. As a result, the relationship between these characteristics and the utility can be investigated.
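If resident $k$ has $J$ possible mode choices, then the probability of choosing mode $i$ is $P_{ik}$ ($i = 1, 2, \ldots, J$), and $U_{ik}$ represents the utility, which, according to the discrete choice model, is based on random utility theory. $U_{ik}$ is made up of a deterministic term $V_{ik}$ and a stochastic term $\varepsilon_{ik}$:

$$U_{ik} = V_{ik} + \varepsilon_{ik},$$

where $\varepsilon_{ik}$ ($i = 1, 2, \ldots, J$) is assumed to follow an independent Gumbel distribution. The MNL model expression for $k$ to choose $i$ is formulated as

$$P_{ik} = \frac{\exp(V_{ik})}{\sum_{j \in A_k} \exp(V_{jk})},$$

where $P_{ik}$ is the probability for $k$ to choose $i$; $i$ is the travel mode; $A_k$ is the set of all the possible modes. $V_{ik}$ is usually assumed to be a linear function of the influencing factors $x_{ikl}$ ($l = 1, 2, \ldots, L$):

$$V_{ik} = \sum_{l=1}^{L} \theta_l x_{ikl},$$

where $\theta_l$ is the coefficient of the $l$th influencing factor and $L$ is the total number of influencing factors.

BP Neural Network.
The procedure for using the BP neural network to forecast mode choice is as follows.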
Step 1 (determining the input and output layers of the network). Here, the input layer is the set of influencing factors and the output layer is the set of mode choices.
Step 2 (normalizing the input data). In order to accelerate the learning speed of the network, the input data is normalized to a common scale. All the values of the variables (including input and output) are normalized into numbers ranging from 0 to 10 ([0, 10]).
Step 3 (determining the number of hidden nodes). The number of hidden nodes ($h$) is obtained from the traditional empirical formula $h = \sqrt{n + m} + a$, where $n$ is the number of input nodes; $m$ is the number of output nodes; $a$ is an integer between 0 and 10.
Step 4 (training the model and forecasting). A thousand records of data are assigned to the training data set and the remaining 250 records act as the testing data set to verify the model's accuracy.
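Steps 2 and 3 can be illustrated with a minimal Python sketch. It assumes plain min-max scaling for the normalization (the text does not name the scaling method) and rounds the empirical formula to the nearest integer. The function names, the sample distances, and the input-node count of 11 are hypothetical and are not taken from the survey.

```python
import math


def min_max_scale(values, lo=0.0, hi=10.0):
    """Linearly rescale a list of numbers to [lo, hi] (assumed min-max scaling)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant column carries no information; map it to the lower bound.
        return [lo for _ in values]
    return [lo + (v - v_min) * (hi - lo) / (v_max - v_min) for v in values]


def hidden_node_candidates(n_inputs, n_outputs, a_values=range(0, 11)):
    """Candidate hidden-layer sizes from h = sqrt(n + m) + a, a = 0..10."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in a_values]


# Hypothetical travel distances (km) for four trips.
distances_km = [0.5, 2.0, 3.5, 8.0]
scaled = min_max_scale(distances_km)

# Hypothetical network with 11 input factors and the 5 output modes.
sizes = hidden_node_candidates(n_inputs=11, n_outputs=5)
```

Because $a$ is a free integer, the formula yields a range of candidate sizes rather than a single value; in practice the final size would be picked by comparing validation error across the candidates.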

Variables Related to TNL.
There is a positive correlation between TNL density at the origin/destination and the probability that residents choose transit. Here the density is considered at two levels: the zone level and the individual level. The zone level considers the density of the transit network and of stops in each traffic zone to represent the TNL of a city or a region. The individual level considers the number of bus stops around the resident's origin and destination. The details of each level are given next.

Zone Level.
At this level, the TNL variables are the coverage ratio of bus stops and the density of the bus network. The computational formulas [14] are as follows.
(1) Coverage ratio of bus stops ($C$) is the ratio of the service area of bus stops to the whole area of the traffic zone:

$$C = \frac{\text{Service area of bus stops in traffic zone}}{\text{Area of traffic zone}}.$$

Note that some adjacent stops may have overlapping service areas, but the overlapping area is counted only once when the service areas of all bus stops are summed.
Usually, $C$ is divided into $C_{300\,\mathrm{m}}$ (coverage ratio within 300 meters of origin-destination zone) and $C_{500\,\mathrm{m}}$ (within 500 meters).
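The rule that overlapping service areas are counted only once can be sketched numerically. The Python example below is an illustration only: it assumes a rectangular traffic zone, circular service areas, and planar coordinates in meters, and it approximates the covered area by sampling a regular grid. The function name `coverage_ratio` and all coordinates are hypothetical; a GIS polygon union would be the more usual tool.

```python
def coverage_ratio(stops, radius, zone_bounds, step=10.0):
    """Approximate share of a rectangular zone within `radius` of any stop.

    Each grid cell is counted at most once, so the overlapping service
    areas of adjacent stops are not double counted.
    """
    x_min, y_min, x_max, y_max = zone_bounds
    nx = int((x_max - x_min) / step)
    ny = int((y_max - y_min) / step)
    covered = 0
    for ix in range(nx):
        for iy in range(ny):
            # Centre of the current grid cell.
            cx = x_min + (ix + 0.5) * step
            cy = y_min + (iy + 0.5) * step
            if any((cx - sx) ** 2 + (cy - sy) ** 2 <= radius ** 2
                   for sx, sy in stops):
                covered += 1
    return covered / (nx * ny)


# Hypothetical 2 km x 2 km zone, coordinates in meters.
zone = (0.0, 0.0, 2000.0, 2000.0)
one_stop = coverage_ratio([(1000.0, 1000.0)], 300.0, zone)
two_overlapping = coverage_ratio([(1000.0, 1000.0), (1100.0, 1000.0)], 300.0, zone)
```

With two stops only 100 m apart, the combined ratio is larger than for a single stop but well below twice that value, which is the effect of counting the shared area once.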
3.1.2. Individual Level.
At this level, the TNL variable is the number of bus stops ($N$) in the vicinity of an origin-destination area. $N$ is specifically divided into $N_{300\,\mathrm{m}}$ (the number of bus stops within 300 meters) and $N_{500\,\mathrm{m}}$ (within 500 meters).
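As a small illustration of the individual-level variable, the Python sketch below counts the stops within 300 m and 500 m of a single point. It assumes planar coordinates in meters and straight-line distance; the function name and the coordinates are hypothetical.

```python
import math


def count_stops_within(point, stops, radius_m):
    """Number of bus stops whose straight-line distance to `point` is <= radius_m."""
    px, py = point
    return sum(1 for sx, sy in stops
               if math.hypot(sx - px, sy - py) <= radius_m)


# Hypothetical origin and bus stop coordinates (meters).
origin = (0.0, 0.0)
bus_stops = [(100.0, 0.0), (0.0, 250.0), (300.0, 300.0), (600.0, 0.0)]

n_300 = count_stops_within(origin, bus_stops, 300.0)
n_500 = count_stops_within(origin, bus_stops, 500.0)
```

The stop at (300, 300) lies about 424 m away, so it is counted for the 500 m variable but not for the 300 m one.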
In conclusion, the variable names used to characterize the TNL are as follows.
$N^{D}_{300\,\mathrm{m}}$ (the number of bus stops within 300 meters of destination)
$N^{D}_{500\,\mathrm{m}}$ (the number of bus stops within 500 meters of destination)

Other Variables Selection.
A household survey conducted in Baoding, China, in 2007 reveals that five modes together account for 97% of all travel choices: bike, on foot, car, motorcycle (including moped), and transit. Therefore, these 5 modes are chosen as the output choices, and their proportions are shown in Table 1.
Other variables involve city features, individual characteristics, travel features, transportation policies, and so forth. Based on the survey data, the variables in the model are chosen as shown in Table 2. These variables are divided into 3 groups according to whether the TNL is taken into consideration and, if so, at which level it is considered.

Group 1 (Individual Travel). Individual characteristics and travel patterns.
Group 2 (Individual Travel NetZone). Individual characteristics, travel patterns, and TNL in terms of traffic zones.
Group 3 (Individual Travel NetIndividual). Individual characteristics, travel patterns, and TNL in terms of individuals.

Apply the MNL Model and BP Neural Network
Of the 650 sampled records, 500 are used to train the model and the remaining 150 are used to verify the model's accuracy.
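The balanced sampling and the 500/150 split can be sketched in Python as follows. This is an illustration under assumptions: the text does not say how the records were divided between training and testing, so a simple shuffle is used here, and the record layout (a dict with a `mode` key) and the function name are hypothetical.

```python
import random


def balanced_sample_and_split(records, per_mode=130, n_train=500, seed=0):
    """Draw `per_mode` records for each mode, shuffle, and split train/test."""
    rng = random.Random(seed)
    by_mode = {}
    for rec in records:
        by_mode.setdefault(rec["mode"], []).append(rec)
    sample = []
    for mode in sorted(by_mode):
        # rng.sample raises ValueError if a mode has fewer than per_mode records.
        sample.extend(rng.sample(by_mode[mode], per_mode))
    rng.shuffle(sample)
    return sample[:n_train], sample[n_train:]


# Hypothetical survey extract: 200 records for each of the 5 modes.
survey = [{"id": i, "mode": 1 + i % 5} for i in range(1000)]
train, test = balanced_sample_and_split(survey)
```

Drawing the same number of records per mode keeps rare modes from being swamped by common ones during estimation, which appears to be the intent of the equal 130-record quota.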
In particular, the values of $N^{O}_{300\,\mathrm{m}}$, $N^{D}_{300\,\mathrm{m}}$, $N^{O}_{500\,\mathrm{m}}$, and $N^{D}_{500\,\mathrm{m}}$ depend on the exact locations of the origin and destination. However, the survey records each resident's origin and destination only at the zone level. For example, a resident may leave zone 1 for zone 2 to work, but the exact location he left in zone 1 and the exact location he arrived at in zone 2 are not given. Thus, a random assignment method is used to generate the exact location of each activity, as described below.
Step 1. Divide the activities into 5 types: home, work, shop, leisure, and education.
Step 2. Divide the facilities used for these activities into 5 types. These 5 kinds of facility are used to perform the corresponding activities from Step 1. For example, a shop facility allows a resident to perform shop and work activities.
Step 3. Prepare the input data, which mainly includes the residents' travel records and the facilities' exact locations. A resident's travel records contain his activities and transport modes throughout a day, but for this study only the travel records related to transit are extracted.
Step 4. The nearby principle is employed to randomly assign each activity to a corresponding facility; thus, each activity's exact location can be obtained. The nearby principle means that a resident will usually choose a nearby facility to perform his next activity. This is reasonable, since in daily life people are likely to choose a close shop for shopping, a nearby leisure place to play, and so forth. In order to demonstrate the random assignment, two cases are considered.
Take a resident's trip as an example to show how an activity is assigned to a corresponding facility. We assume that he will perform a leisure activity after work; in addition, the zones in which he works and plays are given (these can be found in the survey data).
Case 1. Two adjacent activities (work and leisure) are performed in the same traffic zone (zone 1). After work, the resident will choose a facility to perform the leisure activity. According to the nearby principle, all leisure facilities within a radius of $r$ have equal probability of being chosen for the next activity. The value of $r$ is set to 500 m in this paper. The resident randomly chooses a leisure facility within 500 m of his workplace, and each facility within the circle has the same probability of being chosen. This process is shown in Figure 2(1). In this case, if a leisure facility is within 500 m of the workplace but outside zone 1, it will not be chosen.
Case 2. Two adjacent activities are performed in different traffic zones (zone 1 for work and zone 2 for leisure). The resident simply chooses the leisure facility in zone 2 that is nearest to the workplace. This process is shown in Figure 2(2).
Step 5. After assignment, each activity's location can be obtained from the corresponding facility's location. Thus, using GIS technology, the values of $N^{O}_{300\,\mathrm{m}}$, $N^{D}_{300\,\mathrm{m}}$, $N^{O}_{500\,\mathrm{m}}$, and $N^{D}_{500\,\mathrm{m}}$ can be calculated based on the activities' locations.
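The two assignment cases of Step 4 can be sketched in Python as below. This is a minimal illustration, not the authors' implementation: facilities are assumed to be `(x, y, zone)` tuples in planar meters, the function name is hypothetical, and returning `None` when no candidate exists is an assumption, since the text does not say how that situation is handled.

```python
import math
import random


def assign_facility(prev_location, candidates, prev_zone, next_zone,
                    radius_m=500.0, rng=random):
    """Pick a facility for the next activity using the nearby principle.

    candidates: (x, y, zone) tuples for facilities of the required type.
    """
    def dist(facility):
        return math.hypot(facility[0] - prev_location[0],
                          facility[1] - prev_location[1])

    in_zone = [f for f in candidates if f[2] == next_zone]
    if prev_zone == next_zone:
        # Case 1: same zone, uniform random choice among facilities in range.
        nearby = [f for f in in_zone if dist(f) <= radius_m]
        return rng.choice(nearby) if nearby else None  # None is an assumed fallback
    # Case 2: different zones, nearest facility inside the destination zone.
    return min(in_zone, key=dist) if in_zone else None


# Hypothetical workplace in zone 1 and leisure facilities in zones 1 and 2.
workplace = (0.0, 0.0)
leisure = [(100.0, 0.0, 1), (400.0, 0.0, 1), (900.0, 0.0, 1),
           (200.0, 0.0, 2), (1500.0, 0.0, 2)]

same_zone_pick = assign_facility(workplace, leisure, 1, 1, rng=random.Random(0))
other_zone_pick = assign_facility(workplace, leisure, 1, 2)
```

In the same-zone call, the facility at (200, 0) is within 500 m but belongs to zone 2, so it is excluded, matching the last sentence of Case 1.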

Parameter Estimation in MNL Model.
Parameters in the model are estimated with Stata software and are used to verify the model's accuracy. The three groups of variables are estimated as shown in Table 3 (variables that did not pass the $t$-test have been eliminated), with the walking mode serving as the base group.
Several conclusions can be drawn from Table 3.
(1) TNL has a major impact on mode choice. At the individual level, the parameter estimates show a positive correlation between $N^{O}_{300\,\mathrm{m}}$ and the probability that residents choose public transit. At the traffic zone level, a significant positive correlation is found between $C_{300\,\mathrm{m}}$ and the probability of choosing public transit, while a significant negative correlation is found between $C_{500\,\mathrm{m}}$ and that probability.
(2) For those who own bikes, motorcycles, or cars, the estimation results (see Table 3) show that owners tend to use these vehicles to travel. Taking bicycles as an example, the estimated values of BikeNum in all 3 groups are positive and high, implying that someone who owns a bike has a high probability of choosing to travel by cycling.
(3) Of all the factors influencing mode choice, travel distance has the most significant impact across all the choices.
Based on the above, a number of the original variables are screened as input variables for the BP neural network. Apart from the 300 m variables retained from the estimation above ($N^{O}_{300\,\mathrm{m}}$ at the individual level and $C_{300\,\mathrm{m}}$ at the zone level), all other variables related to TNL are eliminated from Table 2, but three groups of variables still remain.
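To make the role of the estimated coefficients concrete, the following Python sketch evaluates MNL choice probabilities with walking as the base alternative (utility fixed at zero). The coefficient values, the variable names, and the two-mode choice set are hypothetical and are not taken from Table 3. In a full model, each non-base mode would have its own coefficient vector.

```python
import math


def linear_utility(coefficients, factors):
    """Systematic utility V as a weighted sum of the influencing factors."""
    return sum(coefficients[name] * factors[name] for name in coefficients)


def mnl_probabilities(utilities):
    """MNL choice probabilities: exp(V_i) / sum_j exp(V_j)."""
    v_max = max(utilities.values())  # subtract the max for numerical stability
    exp_v = {mode: math.exp(v - v_max) for mode, v in utilities.items()}
    total = sum(exp_v.values())
    return {mode: e / total for mode, e in exp_v.items()}


# Hypothetical transit coefficients relative to the walking base group.
transit_coefficients = {"distance_km": 0.4, "n_stops_300m_origin": 0.3}
trip = {"distance_km": 5.0, "n_stops_300m_origin": 2.0}

utilities = {
    "walk": 0.0,  # base alternative
    "transit": linear_utility(transit_coefficients, trip),
}
probabilities = mnl_probabilities(utilities)
```

A positive coefficient on a stop-count variable raises the transit utility and therefore the transit probability, which is how a positive sign in the estimation output should be read.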

Results Forecasted by BP Neural Network.
After the screening process, the three cases stated in Section 3.1 are trained and forecasted by the BP neural network. Results are shown in Table 4 (1, 2, 3, 4, and 5 represent transit, bike, motorcycle, car, and on foot, respectively). Several conclusions can be drawn from Table 4.
(1) Ranked by forecasting accuracy, the order is Group 2 (Individual Travel NetZone) > Group 3 (Individual Travel NetIndividual) > Group 1 (Individual Travel). The findings show that forecasting accuracy can be improved by adding correlated TNL variables; moreover, better forecast accuracy is obtained using the variables in Group 2 (Individual Travel NetZone) when the exact locations of the origin and destination are not available. The variables in Group 3 (Individual Travel NetIndividual) may give a better result if the locations in the survey are more precise. However, this could not be tested because a high-precision data set was not available.
(2) The forecasting accuracy for transit travel can be increased by adding correlated TNL variables. In doing so, the hit rates of most other modes also tend to go up; only a few go down.
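A per-mode hit rate of the kind discussed above could be computed as in the Python sketch below. The text does not define the hit rate, so this assumes it is the share of test records of a given actual mode that the network forecasts correctly. The function name and the sample labels are hypothetical.

```python
def hit_rates(actual, predicted):
    """Per-mode hit rate: correctly forecast records / records of that actual mode."""
    totals, hits = {}, {}
    for a, p in zip(actual, predicted):
        totals[a] = totals.get(a, 0) + 1
        if a == p:
            hits[a] = hits.get(a, 0) + 1
    return {mode: hits.get(mode, 0) / totals[mode] for mode in totals}


# Hypothetical test labels using the mode codes of Table 4 (1 = transit, ...).
actual_modes = [1, 1, 2, 2, 5]
forecast_modes = [1, 2, 2, 2, 1]

per_mode = hit_rates(actual_modes, forecast_modes)
overall = sum(a == p for a, p in zip(actual_modes, forecast_modes)) / len(actual_modes)
```

Reporting the rate per mode as well as overall makes it visible when a gain for transit comes at the expense of another mode.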

Conclusions
Three different variable sets with TNL characteristics are built, and a comparison among them is conducted to determine which set has the best forecasting accuracy. Two main conclusions can be drawn.
(1) When using the individual-level or the zone-level variable set, the MNL model's parameter estimation shows that TNL has a significant effect on residents' mode choice, thereby affecting the whole mode split.
(2) A better result is obtained using the variables in Group 2 (Individual Travel NetZone) when the exact locations of the origin and destination are not available.
This paper can lay a theoretical foundation for the optimization of public transit networks.

Figure 1: Structure of neural network in forecasting mode choice.

(2) Density of transit line ($D$) is defined as follows:

$$D_1 = \frac{\text{Overall length of road axis that contains transit lines}}{\text{Area of traffic zone}}, \qquad D_2 = \frac{\text{Overall length of running transit lines}}{\text{Area of traffic zone}},$$

where $D_1$ is the density of normal transit line and $D_2$ is the density of running transit lines.

Zone Level
$C^{O}_{300\,\mathrm{m}}$ (coverage ratio within 300 meters of origin)
$C^{O}_{500\,\mathrm{m}}$ (coverage ratio within 500 meters of origin)
$D^{O}_{1}$ (density of normal transit line in origin area)
$D^{O}_{2}$ (density of running transit lines in origin area)
$C^{D}_{300\,\mathrm{m}}$ (coverage ratio within 300 meters of destination)
$C^{D}_{500\,\mathrm{m}}$ (coverage ratio within 500 meters of destination)
$D^{D}_{1}$ (density of normal transit line in destination area)
$D^{D}_{2}$ (density of running transit lines in destination area)
Individual Level
$N^{O}_{300\,\mathrm{m}}$ (the number of bus stops within 300 meters of origin)
$N^{O}_{500\,\mathrm{m}}$ (the number of bus stops within 500 meters of origin)
4.1. Data. To avoid sample error during regression, 650 records of data (i.e., 130 records for each mode) are randomly chosen from the survey data set. 500 of the records are used to train the model.

Table 2: Definition of variables.