A Logistic Regression Model with a Hierarchical Random Error Term for Analyzing the Utilization of Public Transport

Logistic regression models have been widely used in previous studies to analyze public transport utilization. These studies have shown travel time to be an indispensable variable for such analysis and usually consider it to be a deterministic variable. This formulation does not allow us to capture travelers’ perception error regarding travel time, and recent studies have indicated that this error can have a significant effect on modal choice behavior. In this study, we propose a logistic regression model with a hierarchical random error term. The proposed model adds a new random error term for the travel time variable. This term structure enables us to investigate travelers’ perception error regarding travel time from a given choice behavior dataset. We also propose an extended model that allows constraining the sign of this error in the model. We develop two Gibbs samplers to estimate the basic hierarchical model and the extended model. The performance of the proposed models is examined using a well-known dataset.


Introduction
Understanding the utilization of public transport is important for policy design and urban traffic planning. From a behavior analysis perspective, such utilization can be analyzed in terms of a binary choice problem in which the traveler must choose between public transit and a private mode of transport. Previous studies usually employ logistic regression models to discuss this binary choice problem. These models can be used to predict choice probability and to evaluate the effect of various attitudes on the utilization of public transport. The effects of demography, travel cost, travel time, and accessibility on such utilization can be analyzed through estimating the parameters of these models.
McGillivray [1] proposed the binary choice problem in the two-modal (public and private) case and applied a logistic model to investigate the dependence of modal choice on the individual values of time and cost by mode. Gebeyehu and Takano [2] showed that the logistic model is a useful tool for evaluating transit systems; their study developed a logistic model to investigate citizens' perceptions of bus conditions in Addis Ababa. Hess [3] applied a logistic regression model to assist planners and policymakers in expanding the mobility and accessibility of public transport for older adults by analyzing the influence of the accessibility of public transport on ridership for older people in California and New York. Johansson et al. [4] further formulated the binary logit model with latent variables; this structure can examine the effects of attitudes and personality traits on mode choice. Muley et al. [5] and Badland et al. [6] focused on microscopic behavior in the binary choice problem. Muley et al. [5] employed a binary logistic regression model to explore the impact of personal and transit characteristics on the utilization of public transport. Badland et al. [6] adopted a logistic regression model to discuss how parking availability and public transport accessibility influenced the split between uptakes of the two modes. In contrast, Buehler and Pucher [7] studied the binary choice problem from a macroscopic perspective and employed a logistic model to compare utilization of public transport in the US and Germany.
One benefit of using a logistic regression model to analyze the utilization of public transport is that the model can consider the combined effects of attributes through a linear combination and can easily evaluate the contribution of various attributes to such utilization. Previous studies agree that travel time is an important variable for formulating a logistic regression model to analyze this binary choice problem. Although previous studies usually prefer to treat travel time as a deterministic variable in the model, one cannot neglect that travelers' perception error can prevent travelers from accurately evaluating their actual travel time. On the other hand, travelers also cannot say exactly how long a given travel time, for instance, 10 minutes, actually is. Carrion [8] provided a case analysis for this error using self-reported as well as GPS measured travel times and suggested that perception error regarding travel time can influence travel behavior. Cheng and Tsai [9] studied travelers' perceptions of travel time through a questionnaire survey. They found that travel time perceptions can be influenced by personal characteristics such as gender and age.
As mentioned above, researchers have recognized the effect of perception errors regarding travel time on travelers' choice behavior. However, the classic formulation of a random error term in logistic regression models cannot reflect this perception error appropriately (Chen et al. [10]). Analyzing travel behavior using the classic logistic regression model might therefore prevent us from gaining further insights into the utilization of public transport. Hence, this study proposes a hierarchical logistic regression model to fill the gap. Previous studies have successfully applied hierarchical error terms to model heterogeneous unobservable utility in logistic regression models (e.g., Tilahun et al. [11] and Czado and Prokopenko [12]). Inspired by these studies, we adopt hierarchical error terms to solve the problem. However, we add these terms after important attributes such as travel time rather than after the entire utility function in order to capture travelers' perception error regarding the attributes. This study is not merely an effort to improve the performance of the regression model; more importantly, the proposed model offers an alternative way to estimate the statistical characteristics of perception error regarding travel time and allows us to explore the properties of this error. Although this study focuses on modeling perception error regarding travel time, we believe that individuals might have perception errors on other attributes such as travel cost. The proposed model can be easily used to estimate other such errors as well.
We first propose a basic hierarchical model and then develop an extended model in this study. The extended model allows us to constrain the sign of perception error regarding travel time. Correspondingly, this study also develops two Gibbs samplers to estimate the parameters of the proposed models. We evaluate the performance of the proposed models using a well-known dataset provided by Horowitz [13, 14].

The Hierarchical Logistic Regression Model
For simplicity, we describe the proposed model based on a binary choice problem that was provided by Horowitz [13]. In this choice problem, an agent chooses between using a private car and public transport. The attributes considered in this choice problem are CARS, DCOST, DOVTT, and DIVTT, where CARS is "private cars owned by the traveler's household," DCOST is "public transport fare minus private car travel cost," DOVTT is "public transport out-of-vehicle time minus private car out-of-vehicle time," and DIVTT is "public transport in-vehicle time minus private car in-vehicle time." We formulate the choice problem through a logistic regression model. If we present this logistic regression model as a latent-variable model, then the model can be obtained as follows:

$$
\begin{aligned}
y_{ij} &= \begin{cases} 1, & U_{ij} > 0, \\ 0, & \text{otherwise}, \end{cases} \\
U_{ij} &= \beta_0 + \beta_1\,\mathrm{CARS}_{ij} + \beta_2\,\mathrm{DCOST}_{ij} + \beta_3\,\mathrm{DOVTT}_{ij} + \beta_4\,\mathrm{DIVTT}_{ij} + \varepsilon_{ij},
\end{aligned}
\tag{1}
$$

where $i$ and $j$ are the indexes of individual and survey questions, respectively. $y_{ij} = 1$ denotes that the choice result is private car; otherwise it is public transport. The logistic regression model uses a random error term $\varepsilon_{ij}$ to present stochastic behavior; however, this formulation does not allow us to investigate the perception error of attributes. In this study, we introduce a random variable $\Delta t_i$ into the logistic regression model. Here $\Delta t_i$ denotes the random perception error on DIVTT for individual $i$. We consider that the value of $\Delta t_i$ can be different among individuals and that $\Delta t_i$ follows a probability distribution. The logistic regression model with a hierarchical random error term can be obtained as follows:

$$
\begin{aligned}
U_{ij} &= \beta_0 + \beta_1\,\mathrm{CARS}_{ij} + \beta_2\,\mathrm{DCOST}_{ij} + \beta_3\,\mathrm{DOVTT}_{ij} + \beta_4\,(\mathrm{DIVTT}_{ij} + \Delta t_i) + \varepsilon_{ij}, \\
\Delta t_i &\sim N(\mu, 1), \qquad \beta \sim \pi(\beta), \qquad \mu \sim \pi(\mu),
\end{aligned}
\tag{2}
$$

with $y_{ij}$ determined by the sign of $U_{ij}$ as in (1). In the above equation, we use $\beta$ to denote the parameter vector $[\beta_0, \beta_1, \beta_2, \beta_3, \beta_4]$. In the proposed model, $\Delta t_i$ is a latent random variable, which serves as the random error term for DIVTT rather than the random term of the utility function. This structure enables us to investigate the perception error of DIVTT. We assume that $\Delta t_i$ follows the normal distribution with unknown mean $\mu$ and specify the variance of the normal distribution as 1 for identification purposes. $\pi(\beta)$ and $\pi(\mu)$ denote the prior distributions of $\beta$ and $\mu$, respectively. We wish to estimate the value of $\mu$ along with $\beta$. In this hierarchical model, DIVTT is divided between $\mathrm{DIVTT}_{ij}$ and $\Delta t_i$; as a result, DIVTT is no longer a deterministic variable in the model.
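To make the data-generating mechanism described above concrete, the following is a minimal simulation sketch, not the authors' code. It assumes a standard logistic error term (implied by the term "logistic regression" but not stated explicitly in this section), one observation per individual, and hypothetical attribute values and coefficients chosen only for illustration; the function name `simulate_choices` is likewise an assumption.

```python
import math
import random


def simulate_choices(rows, beta, mu, seed=0):
    """Simulate (y, delta_t) pairs from the hierarchical latent-utility model.

    rows: iterable of (cars, dcost, dovtt, divtt) tuples, one per individual.
    beta: [b0, b1, b2, b3, b4], ordered as intercept, CARS, DCOST, DOVTT, DIVTT.
    mu:   mean of the perception error on DIVTT (its variance is fixed at 1).
    """
    rng = random.Random(seed)
    out = []
    for cars, dcost, dovtt, divtt in rows:
        # Individual perception error on in-vehicle time, delta_t ~ N(mu, 1).
        delta_t = rng.gauss(mu, 1.0)
        # Standard logistic noise by inverse-CDF sampling (assumed error law).
        u = rng.random()
        while u == 0.0:
            u = rng.random()
        eps = math.log(u / (1.0 - u))
        utility = (beta[0] + beta[1] * cars + beta[2] * dcost
                   + beta[3] * dovtt + beta[4] * (divtt + delta_t) + eps)
        # y = 1 means private car, y = 0 means public transport.
        out.append((1 if utility > 0 else 0, delta_t))
    return out


if __name__ == "__main__":
    # Hypothetical attribute rows and coefficients, for illustration only.
    demo_rows = [(1, 0.5, 2.0, 10.0), (0, 1.0, 5.0, 15.0), (2, -0.2, 1.0, 5.0)]
    for y, dt in simulate_choices(demo_rows, [0.1, 0.5, 0.02, 0.05, 0.03], mu=-1.0):
        print(y, round(dt, 3))
```

Because the perception error enters the utility multiplied by the DIVTT coefficient, setting that coefficient to zero reduces the sketch to an ordinary logistic model, which is a convenient sanity check.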
The logistic regression model shown by (2) can be further extended. Here, we consider that $\Delta t_i < 0$ indicates that individual $i$ subjectively weakens the difference of in-vehicle travel time when the agent makes the choice decision; on the other hand, $\Delta t_i > 0$ suggests that the individual subjectively enlarges the difference of in-vehicle travel time. In addition, we constrain the value of $\mathrm{DIVTT}_{ij} + \Delta t_i$ to be nonnegative (nonpositive) if $\mathrm{DIVTT}_{ij} > 0$ ($\le 0$). Following these considerations, we rewrite $U_{ij}$ of (2) as (3), where the indicator function $I(\cdot)$ is given by (4). In (4), the value of $I(\mathrm{DIVTT}_{ij})$ is 1 if $\mathrm{DIVTT}_{ij} < 0$ and 0 otherwise. As shown by (2) and (3), the combination of $\Delta t_i$ and $\varepsilon_{ij}$ leads to a hierarchical random error term structure; this structure can be presented as a directed acyclic graph (DAG; see Figure 1).

Estimation Methods
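First, we discuss how to estimate the parameters of the model shown by (2). Stefanski [15] indicated that the logistic distribution can be represented as a normal scale mixture. Accordingly, Holmes and Held [16] suggested an auxiliary variable method to present the logistic regression model.
Along the same lines, the regression model presented by (2) can also be derived in the following form:

$$
\begin{aligned}
y_{ij} &= \begin{cases} 1, & U_{ij} > 0, \\ 0, & \text{otherwise}, \end{cases} \\
U_{ij} &= \beta_0 + \beta_1\,\mathrm{CARS}_{ij} + \beta_2\,\mathrm{DCOST}_{ij} + \beta_3\,\mathrm{DOVTT}_{ij} + \beta_4\,(\mathrm{DIVTT}_{ij} + \Delta t_i) + \varepsilon_{ij}, \\
\varepsilon_{ij} &\sim N(0, \lambda_{ij}), \qquad \lambda_{ij} = (2\psi_{ij})^2, \qquad \psi_{ij} \sim \mathrm{KS}, \\
\Delta t_i &\sim N(\mu, 1), \qquad \beta \sim \pi(\beta), \qquad \mu \sim \pi(\mu).
\end{aligned}
$$

Here $p$ denotes the joint distribution of these parameters, which can be uniquely identified by (2). $\pi(\beta)$ and $\pi(\mu)$ are the prior distributions for $\beta$ and $\mu$, respectively. KS denotes the Kolmogorov-Smirnov distribution.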
We develop a Gibbs sampler to draw the samples of $\beta$, $U$, $\lambda$, $\Delta T$, and $\mu$ from the joint distribution and calculate the estimates of the parameters through aggregating the samples. Let $X_{ij}$, $X$, and $Y$ denote the attribute vector of one observation, the collection $X = [X_{ij}\ \forall i, j]$, and $Y = [y_{ij}\ \forall i, j]$, respectively. We use $k$ to denote the index of the sample and $K$ to denote the number of samples. The outline of the Gibbs sampler can then be given as follows:
Step 0 (initialization). Set initial values: $\beta^0 = 0$, $U^0 = 0$, $\lambda^0 = 1$, $\Delta T^0 = 0$, and $\mu^0 = 0$.
Step 1. Draw the $k$th sample of $\beta$.
Step 2. Draw the $k$th sample of $U$.
Step 3. Draw the $k$th sample of $\lambda$.
Step 4. Draw the $k$th sample of $\Delta T$.
Step 5. Draw the $k$th sample of $\mu$. If $k < K$ then go to Step 1; otherwise, stop the sampling procedure.
The set of conditional distributions of $\beta$, $U$, $\lambda$, $\Delta T$, and $\mu$ is referred to as the full set of conditional distributions of $p(\beta, U, \Delta T, \lambda, \mu \mid X, Y)$. Sampling $\beta$, $U$, $\lambda$, $\Delta T$, and $\mu$ in turn from these conditional distributions yields, once the chain has converged, samples from the joint distribution $p(\beta, U, \Delta T, \lambda, \mu \mid X, Y)$. We then illustrate in detail how the Gibbs sampler draws the $k$th samples for $\beta$, $U$, $\lambda$, $\Delta T$, and $\mu$.
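The idea of sampling each block in turn from its full conditional can be illustrated on a much smaller problem than the model above. The sketch below is a toy example, not the sampler of this paper: it assumes a standard bivariate normal target with correlation `rho`, for which both full conditionals are known normal distributions, and the function name is hypothetical.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_draws, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is normal: x | y ~ N(rho * y, 1 - rho^2), and
    symmetrically for y | x. Drawing from them in turn produces a chain whose
    stationary distribution is the joint target.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0  # initialization, analogous to Step 0
    draws = []
    for _ in range(n_draws):
        x = rng.gauss(rho * y, cond_sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from p(y | x)
        draws.append((x, y))
    return draws


if __name__ == "__main__":
    kept = gibbs_bivariate_normal(0.6, 20000, seed=2)[2000:]  # drop burn-in
    n = len(kept)
    mx = sum(x for x, _ in kept) / n
    my = sum(y for _, y in kept) / n
    cov = sum((x - mx) * (y - my) for x, y in kept) / n
    vx = sum((x - mx) ** 2 for x, _ in kept) / n
    vy = sum((y - my) ** 2 for _, y in kept) / n
    print("sample correlation:", cov / math.sqrt(vx * vy))
```

In this toy case the sample correlation of the retained draws should approach `rho`, which is the practical meaning of the statement that cycling through the full conditionals reproduces the joint distribution.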

Algorithm 2.
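Step 1. Update $X^k$ for drawing the $k$th samples: for $i = 1$ to $N$, replace the in-vehicle time entry of the attribute vectors of individual $i$ by $\mathrm{DIVTT}_{ij} + \Delta t_i^{k-1}$.
Step 2. Draw the $k$th sample for $\beta$.
The conditional distribution $p(\beta \mid U^{k-1}, \lambda^{k-1}, \Delta T^{k-1}, \mu^{k-1}, X, Y)$ is equivalent to $p(\beta \mid U^{k-1}, \lambda^{k-1}, X^k, Y)$. On replacing $X$ in Albert and Chib [17] by $X^k$, the sample of $\beta$ can be generated through the sampling scheme proposed by Albert and Chib [17]. As shown by them, $\pi(\beta)$ can be specified as a diffuse distribution, and $p(\beta \mid U^{k-1}, \lambda^{k-1}, X^k, Y)$ can be derived as a normal distribution.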
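Step 3. Draw the $k$th sample for $U$.
Step 4. Draw the $k$th sample for $\lambda$.
Holmes and Held [16] proposed a rejection sampling scheme that can be used to draw samples from the conditional distribution $p(\lambda \mid \beta^k, U^k, X^k)$.
Step 5. Draw the $k$th sample for $\Delta T$.
Applying Bayes' theorem, we obtain $p(\Delta t_i \mid \beta^k, X, Y, U^k, \lambda^k, \mu^{k-1})$ as a normal distribution. Given $\lambda^k$, the density of $U^k$ is normal with a mean that is linear in $\Delta t_i$, and the density of $\Delta t_i$ given $\mu^{k-1}$ is normal with variance 1; their product is therefore normal in $\Delta t_i$, and its mean and variance follow by completing the square in $\Delta t_i$.
Step 6. Draw the $k$th sample for $\mu$.
We consider the prior distribution of $\mu$ as a normal distribution with mean 0 and variance $\sigma^2$. If we further consider the prior distribution as a diffuse prior, then the probability function of the prior can be given as $\pi(\mu) \propto 1$. Accordingly, the conditional distribution $p(\mu \mid \Delta t_i^k, i = 1, \ldots, N)$ can be obtained as

$$
p(\mu \mid \Delta t_i^k, i = 1, \ldots, N) \propto \prod_{i=1}^{N} \exp\!\left(-\frac{(\Delta t_i^k - \mu)^2}{2}\right).
$$

This conditional distribution is a normal distribution too; the mean of the distribution is $\sum_{i=1}^{N} \Delta t_i^k / N$ and the variance of the distribution is $1/N$.
Now, we discuss how to estimate the parameters of the extended model described by (3). In this case, we can also employ the steps of Algorithm 2 that draw the samples of $\beta$, $U$, $\lambda$, and $\mu$. We need only to use the constrained in-vehicle time term of (3) in place of $\mathrm{DIVTT}_{ij} + \Delta t_i$ in these steps. We derive the formulation of $p(\Delta t_i \mid \beta^k, X^k, U^k, \lambda^k, \mu^{k-1})$ as

$$
p(\Delta t_i \mid \beta^k, X^k, U^k, \lambda^k, \mu^{k-1}) \propto p(\Delta t_i \mid \mu^{k-1}) \prod_{j} p(U_{ij}^k \mid X_{ij}, \beta^k, \lambda_{ij}^k, \Delta t_i),
$$

where $p(\Delta t_i \mid \mu^{k-1})$ is the probability density function of the normal distribution with mean $\mu^{k-1}$ and variance 1, and $p(U_{ij}^k \mid X_{ij}, \beta^k, \lambda_{ij}^k, \Delta t_i)$ follows from the utility $U_{ij}$ defined by (3). The H-M sampling scheme is as follows.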
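As an illustration of such an accept/reject update for a single $\Delta t_i$, the sketch below assumes that "H-M" denotes a Metropolis-Hastings step and uses a random-walk proposal; both the proposal and every name in the code are assumptions, not details taken from the source. The target is the product of the $N(\mu, 1)$ density and a normal density for the utility with variance `lam`. Here `resid` stands for the latent utility minus the intercept, CARS, DCOST, and DOVTT terms, and `perceived` is a pluggable function for the perceived in-vehicle time difference. With `perceived_basic` (the unconstrained form of model (2)) the target is exactly normal, so the closed form in `conjugate_mean_var`, derived here by completing the square rather than copied from the paper, can be used to check the chain.

```python
import math
import random


def perceived_basic(divtt, delta_t):
    """Perceived in-vehicle time difference in the basic model: DIVTT + delta_t."""
    return divtt + delta_t


def log_target(delta_t, resid, divtt, beta4, lam, mu, perceived):
    """Log of N(delta_t; mu, 1) * N(resid; beta4 * perceived(...), lam), up to a constant."""
    log_prior = -0.5 * (delta_t - mu) ** 2
    log_lik = -0.5 * (resid - beta4 * perceived(divtt, delta_t)) ** 2 / lam
    return log_prior + log_lik


def mh_step(current, rng, step, **kw):
    """One random-walk Metropolis-Hastings update for delta_t."""
    candidate = current + rng.gauss(0.0, step)  # generate a candidate value
    log_ratio = log_target(candidate, **kw) - log_target(current, **kw)
    # Accept with probability min(1, target(candidate) / target(current)).
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return candidate
    return current


def run_chain(n_draws, seed=0, step=1.0, **kw):
    """Run the update repeatedly, starting from the prior mean mu."""
    rng = random.Random(seed)
    delta_t = kw["mu"]
    draws = []
    for _ in range(n_draws):
        delta_t = mh_step(delta_t, rng, step, **kw)
        draws.append(delta_t)
    return draws


def conjugate_mean_var(resid, divtt, beta4, lam, mu):
    """Closed-form normal conditional for the unconstrained case (derived, assumed)."""
    precision = 1.0 + beta4 ** 2 / lam
    mean = (mu + beta4 * (resid - beta4 * divtt) / lam) / precision
    return mean, 1.0 / precision


if __name__ == "__main__":
    # Hypothetical values for one individual, for illustration only.
    kw = dict(resid=2.0, divtt=0.5, beta4=0.8, lam=1.5, mu=-1.0,
              perceived=perceived_basic)
    kept = run_chain(40000, seed=3, **kw)[2000:]
    print("MH mean:", sum(kept) / len(kept))
    print("closed form:", conjugate_mean_var(2.0, 0.5, 0.8, 1.5, -1.0))
```

For the extended model, one would pass a different `perceived` function that encodes the sign constraint of (3); the accept/reject logic itself does not change, which is the main reason for keeping that function pluggable.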
The proposed model structure also allows us to further investigate travelers' perception error regarding both DIVTT and DOVTT. To do this, we just need to modify (3) so that DOVTT receives a perception error term in the same way as DIVTT, which gives the model (20). We use $\Delta s_i$ to denote the traveler's perception error on DOVTT and use $\nu$ to denote the mean of $\Delta s_i$. Let $\Delta S$ be $[\Delta s_i\ \forall i]$; the outline of the sampling scheme for the $k$th sample of $\beta$, $U$, $\lambda$, $\Delta T$, $\Delta S$, $\mu$, and $\nu$ can be described as follows.
Algorithm 4.
$\beta$, $U$, and $\lambda$ can be generated along the same lines as in Algorithm 2. Since $\mathrm{DOVTT}^{*}$ and $\mathrm{DIVTT}^{*}$ hold the same formulation, $\Delta S^k$ and $\nu^k$ can be sampled in the same way as $\Delta T^k$ and $\mu^k$ (see Algorithm 2 and (16)).
To estimate the parameters of the proposed model, we can apply the sampling algorithm to draw random samples for $\beta$, $U$, $\lambda$, $\Delta S$, $\nu$, $\Delta T$, and $\mu$. These samples can be denoted as $\beta^k$, $U^k$, $\lambda^k$, $\Delta S^k$, $\nu^k$, $\Delta T^k$, and $\mu^k$ for $k = 1$ to $K$, where $K$ is the number of drawings. The estimates of the parameters can be obtained through averaging the samples. For example, the estimates of $\nu$ and $\mu$ can be obtained as follows:

$$
\hat{\nu} = \frac{1}{K} \sum_{k=1}^{K} \nu^k, \qquad \hat{\mu} = \frac{1}{K} \sum_{k=1}^{K} \mu^k.
$$

$\hat{\nu}$ and $\hat{\mu}$ reflect the perception error estimate regarding DOVTT and DIVTT, respectively.
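The averaging step can be sketched in a few lines. The helper below is illustrative only: the function name and the optional `burn_in` argument are assumptions, and the numbers in the usage example are made up.

```python
from statistics import fmean, stdev


def posterior_summary(draws, burn_in=0):
    """Return (mean, standard deviation) of the draws kept after burn-in.

    draws:   sequence of sampled values of one scalar parameter, in sampling order.
    burn_in: number of initial draws to discard before averaging.
    """
    kept = list(draws[burn_in:])
    if len(kept) < 2:
        raise ValueError("need at least two draws after burn-in")
    return fmean(kept), stdev(kept)


if __name__ == "__main__":
    # Hypothetical chain: five early draws far from the bulk, then four settled draws.
    chain = [100.0] * 5 + [1.0, 2.0, 3.0, 4.0]
    print(posterior_summary(chain, burn_in=5))
```

Reporting the spread of the retained draws alongside the mean is a design choice of this sketch; it gives a quick sense of how tightly a parameter is determined without further computation.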

Numerical Example
In this section, we use a modal choice dataset provided by Horowitz [13] to examine the performance of the proposed models. This dataset was collected in Washington, D.C., and contains 842 persons' modal choice results (private car or public transport) for the daily trip from home to work. Table 1 provides the data structure of the dataset. We first estimate the hierarchical model defined by (2) using the dataset. We conduct the estimation using the Gibbs samplers. The Gibbs samplers generate 20,000 samples for each parameter. The first 5,000 samples are dropped as the burn-in procedure. The remaining samples are used to aggregate the estimates of the parameters. We use Table 2 to show the estimation results. The sign of the estimates of $\beta_2$, $\beta_3$, and $\beta_4$ suggests that DCOST, DOVTT, and DIVTT have a positive effect on the choice probability of car (and have a negative effect on the utilization of public transport). The value of DIC of the model is 465.764.
We estimate the parameters of the extended model through the sampling scheme with the H-M step. We draw 20,000 samples of the parameters and also treat the first 5,000 samples as the burn-in procedure. The DIC of the extended model is 461.687. This result indicates that the performance of the extended model is better than that of the basic model defined by (2). We also estimate the parameters of the logistic regression model defined by (1). The value of DIC of this model is 465.848. This value is slightly higher than the DIC of the basic model defined by (2) and clearly higher than the DIC of the extended model defined by (3).
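For readers unfamiliar with the deviance information criterion (DIC), the sketch below shows the usual computation, DIC = mean deviance + effective number of parameters, for a plain logistic regression. It is an assumption that the reported values were obtained with this standard definition; how the deviance was defined for the hierarchical models is not stated here. All function names and the tiny dataset are hypothetical.

```python
import math


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def deviance(beta, xs, ys):
    """-2 * log-likelihood of a logistic regression with coefficient list beta."""
    loglik = 0.0
    for x, y in zip(xs, ys):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += y * eta - log1pexp(eta)
    return -2.0 * loglik


def dic(beta_draws, xs, ys):
    """Return (DIC, p_D) from posterior draws of beta.

    p_D = mean deviance - deviance at the posterior mean; DIC = mean deviance + p_D.
    """
    mean_dev = sum(deviance(b, xs, ys) for b in beta_draws) / len(beta_draws)
    dim = len(beta_draws[0])
    beta_bar = [sum(b[d] for b in beta_draws) / len(beta_draws) for d in range(dim)]
    p_d = mean_dev - deviance(beta_bar, xs, ys)
    return mean_dev + p_d, p_d


if __name__ == "__main__":
    # Hypothetical data: intercept-only design, four observations.
    xs = [(1.0,)] * 4
    ys = [1, 0, 1, 1]
    print(dic([[-0.5], [0.5], [1.0]], xs, ys))
```

Lower DIC indicates a better trade-off between fit and complexity, which is how the comparison among models (1), (2), and (3) above is read.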
The samples of the parameters are used to investigate the shapes of the distributions of the parameters. Table 3 reports the estimates of the parameters of the extended model; see also Figures 2 and 3.
In addition to the improvement in performance, the main contribution of the proposed model is that it allows us to capture the perception error on travel time and analyze the characteristics of the perception error. Let us look at the extended model. As shown by Table 3, the estimate of $\mu$ is −1.022; this result implies that people tend to weaken the difference of in-vehicle travel time when making the choice decision. The sampling schemes also draw samples of $\Delta t_i$ for $i = 1$ to $N$. The latent parameter $\Delta t_i$ directly reflects the perception error on DIVTT of individual $i$.
To further investigate the properties of $\Delta t_i$ in the extended model, we group these samples according to the value of CARS, which is related to the demography of an individual. We calculate the mean and variance of the samples of $\Delta t_i$ for each group. Figure 4 plots the mean of the samples, and Figure 5 shows the variances of the samples versus the value of CARS. Since no record in the dataset has CARS equal to 6 and only one record has CARS equal to 7, we study the case where the values of CARS range from 1 to 5 to avoid bias.
One can find that the mean of the samples of $\Delta t_i$ tends to be close to 0 as CARS increases. On the other hand, the variance decreases with the value of CARS. This result implies that travelers who hold more cars will be more sensitive to the difference of travel time between public transport and private car. For example, the mean of $\Delta t_i$ for travelers who own 5 cars is −0.357, whereas it is −1.5623 for travelers who do not own any cars. As mentioned above, $\Delta t_i < 0$ indicates that individual $i$ subjectively weakens the difference of travel time when he/she makes the choice decision; therefore, if $\Delta t_i < 0$, a smaller $\Delta t_i$ can reduce the contribution of the difference of travel time to the travel mode choice decision even more.
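The grouping described above amounts to a split-and-summarize operation. A minimal sketch follows, assuming one sampled (or posterior-mean) perception error per individual; the function name and the values in the usage example are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import fmean, variance


def summarize_by_group(groups, values):
    """Return {group: (mean, sample variance)} for values split by group label.

    groups: group label per individual (e.g. number of cars owned).
    values: perception-error value per individual, in the same order.
    Groups with a single member get a variance of NaN.
    """
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    summary = {}
    for g in sorted(buckets):
        vs = buckets[g]
        var = variance(vs) if len(vs) > 1 else float("nan")
        summary[g] = (fmean(vs), var)
    return summary


if __name__ == "__main__":
    # Hypothetical car counts and perception errors for five individuals.
    cars = [0, 0, 1, 1, 1]
    delta_t = [-2.0, -1.0, -0.5, 0.0, 0.5]
    for group, (m, v) in summarize_by_group(cars, delta_t).items():
        print(group, m, v)
```

Returning NaN for single-member groups mirrors the caution in the text about groups with too few records, since a variance cannot be estimated from one observation.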
To investigate travelers' perception error regarding both DIVTT and DOVTT, we also use Algorithm 4 to estimate the model defined by (20). Table 4 shows that the DIC of the model is significantly smaller than that of the standard logistic model defined by (1). In the model defined by (3), travelers' perception error regarding travel time can be characterized by the estimate of $\mu$. On the other hand, in the model defined by (20), travelers' perception error regarding travel time should be characterized by the summation of the estimates of $\mu$ and $\nu$. Looking at Table 4, one can find that the summation of

Conclusions
This study proposes a logistic regression model with a hierarchical random error term to analyze the binary choice problem.The proposed model can account for travelers' perception errors regarding attributes.Since a number of studies have shown perception error regarding travel time to have a significant impact on modal choice, this study focuses in particular on how to capture this error from behavior data.

Figure 1: The DAG of the parameters of the hierarchical logistic regression model.

Figure 4: Mean of the samples of $\Delta t_i$ grouped by the value of CARS.

Figure 5: Variance of the samples of $\Delta t_i$ grouped by the value of CARS.
1. Generate a candidate value for $\Delta t_i^k$.

Table 2: Estimates of the parameters of the basic model defined by (2).

Table 3: Estimates of the parameters of the extended hierarchical model defined by (3).

Table 4: Estimates of the parameters of the extended hierarchical model defined by (20).