Modelling Stochastic Route Choice Behaviours with a Closed-Form Mixed Logit Model

A closed-form mixed Logit approach is proposed to model stochastic route choice behaviours. It combines the advantages of Probit and Logit, providing a flexible form for the correlation among alternatives and a tractable analytical expression; in addition, heterogeneity in the variances of alternatives can be addressed. Paths are compared in pairs, so that the strengths of the binary Probit model can be fully exploited. A Probit-based aggregation is also used within a nested Logit structure. Case studies on both numerical and empirical examples demonstrate that the new method is valid and practical. This paper thus provides an operational solution for incorporating the normal distribution into route choice models with an analytical expression.


Introduction
Route choice is one of the crucial issues in transportation analysis because it models travelling behaviour and thus provides predictions of future demand. Drivers try to maximize their travelling welfare when choosing a path for a given origin-destination (OD) pair. However, not all of them choose the best alternative, because of imperfect knowledge of the network. To model this perception error and the resulting stochastic route choice behaviour, Probit and Logit are two of the most widely used models. The utility of each alternative is decomposed into a deterministic and a random portion. Assume that there are $N$ paths between an OD pair and that the route choice set is $C$; the utility (welfare) of an alternative path $i \in C$ can be represented as

$$U_i = V_i + \varepsilon_i, \quad i \in C, \tag{1}$$

where $U_i$ is the utility of path $i$, $V_i$ is the deterministic part, composed of attributes such as length and cost that can be explicitly captured, and $\varepsilon_i$ is the random term that captures the perception error. A rational traveller selects the path with the maximum utility among the alternatives in $C$.
Probit assumes that the random portion is normally distributed, and it provides a highly flexible structure for correlation. However, it is limited by its computational burden: it has no closed-form formula when there are more than two alternatives. The computation of multinomial Probit generally requires Clark's approximation [1,2], Monte Carlo simulation [3], or numerical integration [4]. Yai et al. [5] used the multinomial Probit model for route choice in the Tokyo rail network, but the number of alternatives was limited to at most four. Logit, on the other hand, is more popular for its analytical tractability. Logit assumes that the error terms are type I extreme value (EV) distributed. Moreover, it assumes that the error terms are independently and identically distributed (IID), which yields a closed-form mathematical structure that simplifies computation in estimation and prediction. As a consequence of the IID assumption, Logit has two main disadvantages: (1) it cannot represent path correlation, which leads to inflated probabilities for overlapping paths (the overlapping problem); and (2) it cannot represent heterogeneity in perception errors, which can produce unreasonable results (the scaling problem).
Several modified methods address the two drawbacks of Logit in the context of route choice. Regarding the first disadvantage, the overlapping problem, the improved models can be classified into two types.
Mathematical Problems in Engineering
(i) Modifications of multinomial Logit (MNL), such as path size Logit (PSL) [6][7][8][9], C-Logit [10], and the implicit availability/perception (IAP) model [11]: in these models, an additional term is introduced into the utility function to capture path correlations, so as to decrease the attractiveness of overlapping paths. This approach maintains the simple form of MNL. Besides, its log-likelihood function is globally concave, which guarantees a global optimum in parameter estimation. However, the additional terms are convenient approximations, and previous research shows that they may be too sensitive to the composition of the choice set [9,12].
(ii) Generalized extreme value (GEV) models, proposed by McFadden [13]: the most widely used models of this type in route choice are the link-based cross-nested Logit (CNL) [14,15] and the paired combinatorial Logit (PCL) [16][17][18][19] models. Both have a tree structure that represents the link-path relation, in which alternatives with shared attributes are placed in the same nest so that their correlation is explicitly captured. The link-based CNL model treats each link as a nest, and each path, which uses several links, belongs to the corresponding nests. The PCL model compares paths in pairwise combinations, and each path pair is a nest. The CNL model has a large set of parameters to be estimated, so some studies provide approximate formulas [14,20]. Other researchers suggest obtaining the parameters by solving a system of equations relating them to the correlations and constraints [21,22]. Likewise, the PCL model requires a parameter to represent the correlation, and its specifications are provided by Gliebe [23] and Prashker and Bekhor [24].
As for the second drawback of Logit, the scaling problem, Pravinvongvuth and Chen [18] propose an origin-destination-specific scaling factor to represent the different scales of diverse networks. Chen et al. [25] examine the scaling effect when applying route choice models in stochastic equilibrium models. Miwa et al. [26] examine how to set the scale (dispersion) parameter and apply a multiclass stochastic user equilibrium (SUE) assignment model that accounts for differences in drivers' perception errors. Some studies combine the advantages of Probit and Logit, the most representative being the mixed Logit model [27], also known as the Logit kernel, error component, or hybrid Logit model. It incorporates distributions other than type I EV to provide a flexible and tractable form for representing the correlation across alternatives, alternative-specific variances, and taste heterogeneity. Frejinger and Bierlaire [8] use error components to model the subnetwork, so as to represent path overlap in an abstract network. Bekhor et al. [28] estimate an error component model based on Boston route choice data. However, the mixed Logit model has no closed-form expression; consequently, both estimation and prediction require simulation-based methods. Studies [20,29] show that simulation-based methods require a large number of draws to achieve stable predictions. Besides, there is currently no efficient path-based SUE traffic assignment method for the mixed Logit route choice model [18].
To fully exploit the advantages of Probit and Logit, this paper proposes a closed-form mixed Logit method to model stochastic route choice behaviours. With a closed-form expression, the computational burden of estimation and prediction is relieved. Moreover, the closed-form formula alleviates the difficulties of path-based SUE assignment with a mixed Logit model. The paper is organized as follows. Section 2 describes the methodology, including the nested model structure and the Probit-based aggregation. Validation on a numerical example is presented in Section 3. The new method is applied to real data in Section 4. Finally, conclusions and directions for future study are given in Section 5.

Methodology
2.1. Model Structure.
The proposed model has a structure similar to that of the PCL model: paths are compared in pairwise combinations. Consider $N$ alternatives in the choice set $C$ between an OD pair; pairwise combination gives a total of $M = N(N-1)/2$ path pairs. The new model has a two-level nested structure. Each path pair is a nest, and within each nest there is a binary choice. The expected maximum utility [7] of each path pair is used as the utility of the nest. The upper level is a multinomial choice model with $M$ nests. Consider the three-alternative case shown in Figure 1, with paths $i$, $j$, and $k$ and path pairs $(i,j)$, $(i,k)$, and $(j,k)$. The probability that path $i$ is chosen among the three paths combines the marginal probability $P(i,j)$ of each nest with the conditional probability $P[i \mid (i,j)]$ within that nest:

$$P(i) = P(i,j)\,P[i \mid (i,j)] + P(i,k)\,P[i \mid (i,k)]. \tag{2}$$

In order to relax the constraints of the Logit model, the error term $\varepsilon_i$ in (1) is decomposed into two parts, $\eta_i$ and $\xi_i$. The first error, $\eta_i$, which is IID EV distributed, captures the differences between nests; the second error, $\xi_i$, which is normally distributed, captures the differences within a nest. For the first nest, which includes paths $i$ and $j$, the utilities are

$$U_i = V_i + \eta_i + \xi_i, \qquad U_j = V_j + \eta_j + \xi_j, \tag{3}$$

where $V_i = \beta X_i$ and $V_j = \beta X_j$; $X_i$ and $X_j$ are the attributes of paths $i$ and $j$, and $\beta$ is a vector of parameters to be estimated. The terms $\eta_i$ and $\eta_j$ are nest-specific: they capture the unobserved attributes shared by the alternatives in the same nest, and consequently they are identical:

$$\eta_i = \eta_j = \eta_{(i,j)} \sim \mathrm{EV}(0, \mu), \tag{4}$$

where $\mu$ is the scale parameter. The terms $\xi_i$ and $\xi_j$ are alternative-specific: they capture the unobserved attributes specific to alternatives $i$ and $j$. They are assumed to be normally distributed with zero mean and variances $\sigma_i^2$ and $\sigma_j^2$, respectively. Sheffi [30] argues that the variance can be assumed proportional to the deterministic utility of the paths, which links the perception error to the path attributes. Hence the variance-covariance matrix of paths $i$ and $j$ is

$$\Sigma_{(i,j)} = \theta \begin{pmatrix} V_i & V_{ij} \\ V_{ij} & V_j \end{pmatrix}, \tag{5}$$

where $V_{ij}$ is the correlated utility of paths $i$ and $j$, such as that of the overlapping links, and $\theta$ is the scale parameter of the lower level, which is to be estimated. Given that nest $(i,j)$ is chosen, the shared error $\eta_{(i,j)}$ cancels, so the probability that a traveller chooses path $i$ is $P[i \mid (i,j)] = \Pr(U_i \ge U_j) = \Pr(\xi_j - \xi_i \le V_i - V_j)$. Because $\xi_i$ and $\xi_j$ are both normally distributed with zero means, $\xi = \xi_j - \xi_i$ is also normally distributed with zero mean and variance $\sigma_{ij}^2 = \sigma_i^2 + \sigma_j^2 - 2\rho_{ij}\sigma_i\sigma_j$, where $\rho_{ij}$ is the correlation between $\xi_i$ and $\xi_j$; under (5), $\sigma_{ij}^2 = \theta(V_i + V_j - 2V_{ij})$. We have

$$P[i \mid (i,j)] = \Phi\!\left(\frac{V_i - V_j}{\sigma_{ij}}\right), \tag{6}$$

where $\Phi(\cdot)$ is the standard cumulative normal distribution function. Since the Probit model is used only for a binary choice, where it has a closed-form expression, the advantages of Probit, such as a flexible correlation structure and alternative-specific variances, can be fully exploited in our model.
The expected maximum utility of each nest aggregates the paths within the nest; denote it by $V_{(i,j)}$ for nest $(i,j)$. The utility of nest $(i,j)$ is

$$U_{(i,j)} = V_{(i,j)} + \varepsilon_{(i,j)}, \tag{7}$$

where

$$V_{(i,j)} = E\left[\max(V_i + \xi_i,\; V_j + \xi_j)\right], \tag{8}$$

and $\varepsilon_{(i,j)}$, $\forall i \neq j$; $i, j \in C$, is IID EV distributed. According to the properties of the EV distribution [7], we have

$$P(i,j) = \frac{\exp\left(\mu V_{(i,j)}\right)}{\sum_{k=1}^{N-1} \sum_{l=k+1}^{N} \exp\left(\mu V_{(k,l)}\right)}. \tag{9}$$

The probability that a traveller chooses path $i$ in the choice set $C$ is

$$P(i) = \sum_{j \in C,\, j \neq i} P(i,j)\,P[i \mid (i,j)]. \tag{10}$$
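The two-level computation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the covariance matrix of the within-nest errors ξ is passed in directly (rather than built from the deterministic utilities and the lower-level scale θ), the within-nest probability is the binary Probit Φ((V_i − V_j)/σ_ij), the nest utility is Clark's expected maximum, and the upper level is a Logit over the M = N(N − 1)/2 path pairs with scale μ.

```python
import itertools
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def nest_utility(v_i, v_j, sigma_ij):
    """Clark's expected maximum of V_i + xi_i and V_j + xi_j."""
    alpha = (v_i - v_j) / sigma_ij
    return v_j + (v_i - v_j) * _STD.cdf(alpha) + sigma_ij * _STD.pdf(alpha)


def choice_probabilities(v, cov, mu=1.0):
    """Return P(i) for every path i (hypothetical helper).

    v   : deterministic utilities V_i
    cov : symmetric covariance matrix of the xi terms (list of lists);
          Var(xi_j - xi_i) must be positive for every pair
    mu  : upper-level EV scale parameter
    """
    n = len(v)
    pairs = list(itertools.combinations(range(n), 2))  # M = n(n-1)/2 nests
    sigma, nest_v = {}, {}
    for i, j in pairs:
        var = cov[i][i] + cov[j][j] - 2.0 * cov[i][j]  # Var(xi_j - xi_i)
        sigma[(i, j)] = math.sqrt(var)
        nest_v[(i, j)] = nest_utility(v[i], v[j], sigma[(i, j)])
    # Upper level: Logit over nests, shifted by the maximum for stability.
    shift = max(nest_v.values())
    weights = {p: math.exp(mu * (u - shift)) for p, u in nest_v.items()}
    total = sum(weights.values())
    probs = [0.0] * n
    for (i, j), w in weights.items():
        p_nest = w / total
        p_i = _STD.cdf((v[i] - v[j]) / sigma[(i, j)])  # binary Probit in nest
        probs[i] += p_nest * p_i
        probs[j] += p_nest * (1.0 - p_i)
    return probs
```

For example, with V = (−10, −10, −10) and a hypothetical covariance matrix in which the second and third paths share part of their variance ([[10, 0, 0], [0, 10, 5], [0, 5, 10]]), the sketch gives roughly 0.39 to the independent path and 0.31 to each overlapping path, whereas a diagonal covariance gives 1/3 to each path.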

Calculation of the Nest Utility.
The deterministic part $V_{(i,j)}$ of the nest utility in (7) is represented by the expected maximum utility of paths $i$ and $j$. To fully exploit the advantages of the Probit model, this paper uses a Probit-based aggregation to represent the utility of the nest.
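Clark [1] provided an approximation method with which the expected maximum utility of paths $i$ and $j$ is

$$E\left[\max(V_i + \xi_i,\; V_j + \xi_j)\right] = V_i\,\Phi(\alpha) + V_j\,\Phi(-\alpha) + \sigma_{ij}\,\phi(\alpha), \tag{11}$$

where $\phi(\cdot)$ is the standard normal density function, $\sigma_{ij}$ is the standard deviation of $\xi_j - \xi_i$, and $\alpha$ is given by

$$\alpha = \frac{V_i - V_j}{\sigma_{ij}}.$$

Noticing that $\Phi(-\alpha) = 1 - \Phi(\alpha)$, the right-hand side of (11) can be simplified as

$$V_j + (V_i - V_j)\,\Phi(\alpha) + \sigma_{ij}\,\phi(\alpha). \tag{12}$$

The Probit-based aggregation used in (9) is therefore

$$V_{(i,j)} = V_j + (V_i - V_j)\,\Phi\!\left(\frac{V_i - V_j}{\sigma_{ij}}\right) + \sigma_{ij}\,\phi\!\left(\frac{V_i - V_j}{\sigma_{ij}}\right). \tag{13}$$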
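Clark's expression can be checked numerically. The sketch below, with hypothetical function names, compares it with a Monte Carlo estimate of E[max(V_i + ξ_i, V_j + ξ_j)] for a bivariate normal pair. For two jointly normal variables, Clark's expression for the mean of the maximum is exact (the approximation in Clark's method arises when maxima are compounded over more than two variables), so the two numbers should agree up to sampling error.

```python
import math
import random
from statistics import NormalDist


def clark_expected_max(v_i, v_j, var_i, var_j, cov_ij):
    """Clark's expected maximum of two jointly normal utilities."""
    sigma = math.sqrt(var_i + var_j - 2.0 * cov_ij)  # sd of xi_j - xi_i
    alpha = (v_i - v_j) / sigma
    std = NormalDist()
    return v_j + (v_i - v_j) * std.cdf(alpha) + sigma * std.pdf(alpha)


def simulated_expected_max(v_i, v_j, var_i, var_j, cov_ij, draws=200_000, seed=1):
    """Monte Carlo estimate of the same expectation, for comparison."""
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 covariance matrix [[var_i, cov], [cov, var_j]].
    a = math.sqrt(var_i)
    b = cov_ij / a
    c = math.sqrt(var_j - b * b)
    total = 0.0
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        total += max(v_i + a * z1, v_j + b * z1 + c * z2)
    return total / draws
```

For two identical, independent standard normal utilities with zero mean, the formula reduces to σ_ij·φ(0) = 1/√π, the familiar mean of the maximum of two standard normals.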

Numerical Example
The new model is tested on a three-path network with one independent path and two correlated paths, as shown in Figure 2. All three paths have the same length, 10. The value of $x$ varies from 0 to 10. When $x = 0$, the middle and lower paths are completely correlated, so the choice probabilities of the upper, middle, and lower paths are 1/2, 1/4, and 1/4, respectively. When $x = 10$, the three paths are all independent, so each has probability 1/3. When $0 < x < 10$, the probabilities of the middle and lower paths should increase monotonically from 1/4 to 1/3, while the probability of the upper path should decrease from 1/2 to 1/3. The purposes of this numerical example are, first, to check whether the models produce reasonable outcomes in the extreme cases $x = 0$ and $x = 10$ and, second, to check whether the curves have the expected shapes when $0 < x < 10$, that is, whether they are concave or convex and whether they increase or decrease monotonically.
The proposed model is compared with the Probit, PCL, CNL, MNL, C-Logit, PSL, and Logit kernel (LK) models. The results of Probit, calculated by Clark's approximation, are taken as the benchmark. The new model is in effect an improved version of the PCL model with a mixed distribution. The LK model is included because it is also a mixed Logit model, but one without a closed form. The CNL model is included because it also has a nested structure, like the proposed one. The PSL and C-Logit models are the two most widely used models owing to their simplicity. The MNL model, which addresses neither the overlapping nor the scaling issue, is expected to perform worst. Assume that length is the only attribute in the deterministic utility and that its parameter is set to one. The probability of choosing the middle path in MNL, $P(2)$, is

$$P(2) = \frac{\exp(-\mu L_2)}{\sum_{k=1}^{3} \exp(-\mu L_k)},$$

where $L_k$ is the length of path $k$ and the scale parameter $\mu$ is normalized to one. The link-based CNL model [14] is chosen because it is the most widely used CNL model in route choice: it systematically captures the route-link relations in the road network, where each link corresponds to a nest and the paths that share a link belong to the same nest. We therefore use the formula

$$P(i) = \frac{y_i\, G_i(y_1, \dots, y_N)}{\mu\, G(y_1, \dots, y_N)}, \qquad G(y_1, \dots, y_N) = \sum_{l} \Big( \sum_{k \in C} \alpha_{lk}^{\mu_l/\mu}\, y_k^{\mu_l} \Big)^{\mu/\mu_l}, \qquad y_k = \exp(-L_k),$$

where $G$ is the GEV generating function, $G_i$ is its partial derivative with respect to $y_i = \exp(-L_i)$, and $\alpha_{li}$ is the allocation (inclusion) parameter, defined as $\alpha_{li} = L_l / L_i$, where $L_l$ is the length of link $l$ and $L_i$ is the length of path $i$ ($\alpha_{li} = 0$ if path $i$ does not use link $l$). Here $\mu$ is the root scale parameter, normalized to 1, and $\mu_l$ is the scale parameter of nest $l$, set to 1.5 in our case. Since this case is merely illustrative, a larger or smaller value of $\mu_l$ would not change the major properties of the model, that is, whether it reproduces the expected values at $x = 0$ and $x = 10$, or its concave, convex, or monotonic properties.
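To reproduce the MNL and link-based CNL curves, a concrete link layout for Figure 2 is needed. The sketch below assumes, as an illustration only, that x is the length of the private link of each of the middle and lower paths, so that they share a link of length 10 − x. It uses the generating function G(y) = Σ_l (Σ_k α_lk^(μ_l/μ) y_k^(μ_l))^(μ/μ_l) with y_k = exp(−L_k), for which P(k) = y_k G_k / (μG) reduces to a sum over nests of a nest probability times a within-nest share. The names network_links, mnl, and link_based_cnl are hypothetical.

```python
import math


def network_links(x):
    """Hypothetical layout of the three-path test network (0 <= x <= 10).

    Upper path: one link of length 10. Middle and lower paths: a shared
    link of length 10 - x plus a private link of length x each.
    """
    links = {"upper": 10.0, "shared": 10.0 - x, "mid_own": x, "low_own": x}
    paths = [("upper",), ("shared", "mid_own"), ("shared", "low_own")]
    return links, paths


def mnl(lengths, mu=1.0):
    """MNL probabilities with V_k = -L_k and scale mu."""
    weights = [math.exp(-mu * length) for length in lengths]
    total = sum(weights)
    return [w / total for w in weights]


def link_based_cnl(x, mu=1.0, mu_nest=1.5):
    """Link-based CNL: one nest per link, alpha_lk = L_l / L_k."""
    links, paths = network_links(x)
    path_len = [sum(links[link] for link in p) for p in paths]
    y = [math.exp(-length) for length in path_len]
    nests = []
    for link, link_len in links.items():
        if link_len <= 0.0:
            continue  # a zero-length link carries no allocation
        members = [k for k, p in enumerate(paths) if link in p]
        alphas = {k: link_len / path_len[k] for k in members}
        s = sum(alphas[k] ** (mu_nest / mu) * y[k] ** mu_nest for k in members)
        nests.append((alphas, s))
    g = sum(s ** (mu / mu_nest) for _, s in nests)  # generating function G
    probs = [0.0] * len(paths)
    for alphas, s in nests:
        p_nest = s ** (mu / mu_nest) / g  # probability of the link nest
        for k, alpha in alphas.items():
            probs[k] += p_nest * alpha ** (mu_nest / mu) * y[k] ** mu_nest / s
    return probs
```

Under these assumptions the MNL returns 1/3 for every path regardless of x, while the CNL returns 1/(1 + 2^(2/3)) ≈ 0.387 for the upper path at x = 0 instead of the expected 1/2, which is consistent with the failure of the CNL model at x = 0.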
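The choice probability of the middle path in the LK model is calculated by

$$P(2) = E_{\boldsymbol{\zeta}}\left[ \frac{\exp\!\big(-L_2 + \sigma_{LK} \sum_{l \in \Gamma_2} \sqrt{L_l}\, \zeta_l\big)}{\sum_{k=1}^{3} \exp\!\big(-L_k + \sigma_{LK} \sum_{l \in \Gamma_k} \sqrt{L_l}\, \zeta_l\big)} \right],$$

where $\Gamma_k$ is the set of links of path $k$, $L_l$ is the length of link $l$, $\sigma_{LK}$ is the scale parameter, set to 1 in this case purely for illustration, and each $\zeta_l$ is an independent random number drawn from the standard normal distribution. A simulation-based method [28] is used with one million draws.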
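The simulated Logit kernel probability can be sketched as below. The error structure is an assumption: each link l carries an independent standard normal draw ζ_l with loading σ_LK·√L_l, so the covariance between two paths is proportional to their shared length, and the link layout (a shared link of length 10 − x) is a hypothetical reading of Figure 2. Far fewer draws than one million are used here to keep the example fast, and the function name is hypothetical.

```python
import math
import random


def lk_middle_probability(x, sigma_lk=1.0, draws=100_000, seed=7):
    """Simulated Logit kernel probability of the middle path (hypothetical setup)."""
    rng = random.Random(seed)
    # Assumed layout: upper = one link of 10; middle/lower share a link of 10 - x.
    links = {"upper": 10.0, "shared": 10.0 - x, "mid_own": x, "low_own": x}
    paths = [("upper",), ("shared", "mid_own"), ("shared", "low_own")]
    lengths = [sum(links[link] for link in p) for p in paths]
    total = 0.0
    for _ in range(draws):
        # One standard normal draw per link, loaded by sigma_lk * sqrt(link length).
        zeta = {link: rng.gauss(0.0, 1.0) for link in links}
        utils = [
            -lengths[k] + sigma_lk * sum(math.sqrt(links[l]) * zeta[l] for l in p)
            for k, p in enumerate(paths)
        ]
        top = max(utils)  # shift before exponentiating, for numerical stability
        expu = [math.exp(u - top) for u in utils]
        total += expu[1] / sum(expu)  # Logit probability conditional on the draw
    return total / draws
```

Averaging the conditional Logit probabilities over draws is what makes the LK model simulation-based; with few draws the result carries visible sampling noise, which is the cost the closed-form model avoids.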
The formulas of PSL, C-Logit, and PCL follow Ben-Akiva and Bierlaire [6], Cascetta et al. [10], and Koppelman and Wen [17], respectively. Probit is parameterized by the variance of its error terms, whereas the MNL, PCL, CNL, PSL, and C-Logit models are parameterized by the scale parameter $\mu$. To ensure consistency among the models, we assume that the perception-error variances are the same, that is, $\pi^2/(6\mu^2) = \sigma^2$, where $\sigma^2$ is the variance of a path's Probit error term. Since all paths have the same length in this case, their variances are also the same.
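The PSL, C-Logit, and PCL probabilities of the middle path can be computed in closed form for this network. The sketch below relies on assumptions chosen for illustration: the same hypothetical link layout (shared length 10 − x), a path size PS_k = Σ_a (l_a/L_k)(1/N_a) with N_a the number of paths using link a, a C-Logit commonality factor CF_k = β0·ln Σ_j (L_kj/√(L_k L_j))^γ with β0 = γ = 1, and a PCL similarity σ_kl = L_kl/√(L_k L_l) with unit scale. The function names are hypothetical.

```python
import math


def psl_middle(x):
    """Path size Logit: utility -L_k + ln(PS_k) for the assumed network."""
    ps_mid = x / 10.0 + (10.0 - x) / 10.0 / 2.0  # shared link used by 2 paths
    w = [math.exp(-10.0), math.exp(-10.0) * ps_mid, math.exp(-10.0) * ps_mid]
    return w[1] / sum(w)


def c_logit_middle(x, beta0=1.0, gamma=1.0):
    """C-Logit: utility -L_k - CF_k; the upper path has CF = 0."""
    overlap = (10.0 - x) / 10.0  # L_ml / sqrt(L_m * L_l), both lengths are 10
    cf_mid = beta0 * math.log(1.0 + overlap ** gamma)
    w = [math.exp(-10.0), math.exp(-10.0 - cf_mid), math.exp(-10.0 - cf_mid)]
    return w[1] / sum(w)


def pcl_middle(x):
    """Paired combinatorial Logit; requires x > 0 so that sigma < 1."""
    v = [-10.0, -10.0, -10.0]
    sim = {(0, 1): 0.0, (0, 2): 0.0, (1, 2): (10.0 - x) / 10.0}
    log_w, cond = {}, {}
    for (k, l), s in sim.items():
        lam = 1.0 - s
        a, b = v[k] / lam, v[l] / lam
        top = max(a, b)
        lse = top + math.log(math.exp(a - top) + math.exp(b - top))
        log_w[(k, l)] = math.log(lam) + lam * lse  # log of the nest weight
        cond[(k, l)] = math.exp(a - lse)  # P(k | nest (k, l))
    top = max(log_w.values())
    denom = sum(math.exp(w - top) for w in log_w.values())
    p_mid = 0.0
    for (k, l), w in log_w.items():
        p_pair = math.exp(w - top) / denom
        if k == 1:
            p_mid += p_pair * cond[(k, l)]
        elif l == 1:
            p_mid += p_pair * (1.0 - cond[(k, l)])
    return p_mid
```

With these choices the PSL curve is (10 + x)/(40 + 2x), which is concave, while the C-Logit curve is 10/(40 − x), which is convex; both reach 1/4 at x = 0 and 1/3 at x = 10. The PCL function needs x > 0 because the (middle, lower) nest degenerates when the two paths coincide, but it approaches 1/4 as x tends to 0.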
The choice probabilities of the middle path calculated by the different models are shown in Figure 3. At $x = 0$ and $x = 10$, the new model gives the expected results, as do the PCL, PSL, and C-Logit models; however, both the CNL and LK models fail to produce reasonable values at $x = 0$. For $0 < x < 10$, the curve of the new model is close to that of Probit, and the two have the same shape. The new model also improves substantially on the PCL model. PSL and C-Logit also perform reasonably. The CNL model, however, behaves oddly: it overpredicts the choice probabilities when the paths are partly correlated. The curves of Probit, PSL, PCL, and the new model are concave (increasing at a decreasing rate), while the curves of C-Logit and LK are convex. The MNL result is 1/3 regardless of $x$; as expected, it fails to provide a logical prediction when paths overlap. Based on the results and comparisons in this case, the new model is valid and capable of producing reasonable outcomes.

Empirical Results
To evaluate the performance of the proposed method, we apply the new model to real data: a case study of taxi drivers choosing routes in a city center. The studied city, Guangzhou, is in southern China and has approximately ten million inhabitants. Only the central business district (CBD), the Tianhe region highlighted in Figure 4, is studied. The estimation data come from GPS-equipped taxis while they were carrying passengers. The data were collected by a management company for monitoring rather than navigation purposes, so the route choices reflect the drivers' own judgement. Vehicles were monitored within a 5 km radius of the CBD, and 5786 trips between 473 ODs were collected for the case study. Information on the studied network is shown in Table 1. Three data sets are used: the first is for estimating the parameters of the new model and the compared models, as presented in Sections 4.1, 4.2, and 4.3; the other two are for validating forecasts, that is, using the estimated models to predict choice probabilities and comparing them with the actual choices, with results in Section 4.4.

Model Specification.
Three attributes, length, artery road ratio, and the number of signal-controlled intersections, are included in the utility function, as shown in Table 2. Length and time are two highly similar and correlated attributes, so including only one of them in the utility function is sufficient. Moreover, a precise travel time is difficult to obtain before departure. When drivers decide which route to choose, they usually rely on information that is more stable in their perception, and here length is more stable than time. Therefore we use length rather than time in the utility.
The unit of length is the kilometer, so its magnitude is similar to that of the other attributes, which is convenient for estimation. The artery road ratio is the length of artery roads (major roads and arterial streets) divided by the total length of the trip. We assume that artery roads have a significant positive impact on utility because, compared with minor streets, artery roads in the studied region have more lanes, higher capacity, and even fewer traffic lights, which means a higher level of service. A larger artery road ratio is therefore expected to be more attractive to drivers, so this attribute is included. The number of signal-controlled intersections is expected to have a significant negative impact when driving in the city center: more traffic lights mean more chances of stopping and delay, so routes with more signalized intersections are expected to have lower utility. The deterministic utility is

$$V_i = \beta_{\text{Length}}\,\text{Length}_i + \beta_{\text{ARR}}\,\text{ARR}_i + \beta_{\text{Signal}}\,\text{Signal}_i.$$

4.2. Route Choice Set.
According to Ramming [20] and Frejinger and Bierlaire [8], it is difficult to generate a choice set that includes all the actually chosen paths: at best, 84% of the observed paths were found by combining all the choice set generation algorithms that Ramming [20] tested. To avoid generating a choice set that misses important chosen paths, we employ a data mining method to build the choice set from the observed trips. Assume that we observe the trips between a given OD pair over a sufficiently long time period $T$; if a total of $N$ paths are actually chosen by the travellers, we take these $N$ paths as the choice set $C$ of this OD pair. Since the GPS data are abundant and continuously provided, the choice sets of all the ODs can be found. The advantage of this method is that it does not miss important, actually chosen paths; its shortcoming is that a very long observation time $T$ may be needed to obtain a stable choice set $C$.
Based on 5786 trips between 473 ODs made by taxi drivers over a period of one week, this paper analyzes the size of the choice set, denoted by $N$. As shown in Table 3, the number of actually chosen paths between any OD pair is at most 12, and the average is only 4. This suggests that a paired combination model is reasonable for route choice, because the number of pairs, $M = N(N-1)/2$, does not lead to a heavy computational burden.
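The data-mining construction of the choice set, together with the pair count M = N(N − 1)/2, can be sketched as follows. The trip record format (origin, destination, tuple of link identifiers) and the function names are hypothetical; the point is that each OD pair's choice set is simply the set of distinct paths observed, and that the largest observed choice set (12 paths) yields only 66 pairs, while the average of 4 paths yields 6.

```python
from collections import defaultdict


def build_choice_sets(trips):
    """Collect, per OD pair, the distinct paths that were actually chosen.

    trips: iterable of (origin, destination, path), where path is a
    sequence of link identifiers (hypothetical record format).
    """
    choice_sets = defaultdict(set)
    for origin, destination, path in trips:
        choice_sets[(origin, destination)].add(tuple(path))
    return dict(choice_sets)


def number_of_nests(n_paths):
    """Number of path pairs M = N(N - 1) / 2 in the paired model."""
    return n_paths * (n_paths - 1) // 2
```

Repeated observations of the same path collapse into one element of the set, so the choice set grows only when a genuinely new path is observed, which is why a longer observation period T stabilizes it.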
The objective of this analysis is purely illustrative. It does not determine how long the observation time $T$ should be or how many observed trips are needed. Further tests on this subject would be desirable.

Model Estimation.
This paper uses maximum likelihood estimation to calibrate the parameters. Five models are estimated and compared in this section: the proposed model; the MNL model, which is expected to perform poorly; the two most widely used models with a tree structure, PCL and CNL; and the LK model with 100 draws, estimated by a simulation-based method [31]. Ramming [20] points out that estimating a Logit-family model either normalizes the scale parameter to $\mu = 1$ or, equivalently, estimates the product $\mu\beta$ jointly. To facilitate comparison among the models, scaled parameter estimates are also provided. The scaling factor $s_m$ of model $m$ is based on the estimated length parameter of the MNL model; consequently, the scaled estimates have comparable magnitudes across the models.
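One way to compute the scaled estimates is sketched below. It assumes, for illustration, that the scaling factor of model m is s_m = β_Length(MNL) / β_Length(m), so that every model's scaled length coefficient equals the MNL one and the other coefficients are rescaled by the same factor. The function name, parameter names, and values are hypothetical.

```python
def scale_estimates(estimates, reference_length):
    """Rescale one model's estimates so its length coefficient matches the reference.

    estimates: dict mapping parameter name -> estimate for model m
               (must contain a "length" entry)
    reference_length: length coefficient estimated by the MNL model
    """
    s_m = reference_length / estimates["length"]  # assumed scaling factor
    return {name: s_m * value for name, value in estimates.items()}
```

Because only ratios of coefficients are identified in Logit-family models, this rescaling leaves each model's implied trade-offs (for example, kilometers per signalized intersection) unchanged while putting all models on a common scale.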
The signs of all the estimated parameters are as expected, as shown in Table 4. The positive sign of $\beta_{\text{ARR}}$ suggests that taxi drivers tend to travel on artery roads. The scaled estimates of $\beta_{\text{ARR}}$ and $\beta_{\text{Signal}}$ in the MNL, PCL, and new models have the same magnitude, and this magnitude is approximately ten times smaller. The scaled and unscaled estimates have the same magnitude for the PCL and CNL models, but not for the new model. In fact, the log-likelihood function of the new model is not globally concave, so the search for the optimum is easily trapped at a local optimum. The choice of initial values and of upper and lower bounds is therefore important in estimation; a reasonable approach is to estimate an MNL model first to obtain adequate information on the parameters. The parameter estimates of all models except LK are significantly different from zero. Two parameters of LK, $\beta_{\text{ARR}}$ and $\beta_{\text{Signal}}$, are not significantly estimated, possibly because the number of draws (only 100) is insufficient. Thanks to their closed-form expressions, the estimation time of the MNL, PCL, CNL, and new models is within five minutes. The LK model, by contrast, takes 19 hours with only 100 draws and still does not attain good estimates.
Table 5 provides the likelihood ratio tests between the models. The test statistic, $-2(\mathcal{L}_R - \mathcal{L}_U)$, where $\mathcal{L}_R$ and $\mathcal{L}_U$ are the final log-likelihoods of the restricted and unrestricted models, is asymptotically distributed as $\chi^2$ with $r$ degrees of freedom, where $r$ is the difference in the number of estimated parameters between the two models. The results show that the new model and the LK and CNL models are significantly better than the PCL and MNL models at the 5% significance level. Since the CNL and new models have the same number of estimated parameters, their goodness of fit can be compared directly by the final log-likelihood. The results show that the new model fits better than LK, but the CNL model fits better than the new model.
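The likelihood ratio test can be reproduced with the Python standard library alone. The sketch below computes the statistic −2(L_R − L_U) and its asymptotic χ² p-value for an integer number of degrees of freedom, using the closed-form chi-square upper tail for even and odd degrees of freedom. The function names are hypothetical, and any log-likelihood values passed to it in a check are illustrative, not those of Table 5.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if stat <= 0.0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        half, term, total = stat / 2.0, 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: 2Q(c) + 2 phi(c) * sum_{r=1}^{(df-1)/2} c^(2r-1) / (1*3*...*(2r-1)),
    # with c = sqrt(x), Q the standard normal upper tail and phi its density.
    c = math.sqrt(stat)
    term, total = c, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= stat / (2 * r - 1)
        total += term
    phi = math.exp(-stat / 2.0) / math.sqrt(2.0 * math.pi)
    return math.erfc(c / math.sqrt(2.0)) + 2.0 * phi * total


def likelihood_ratio_test(ll_restricted, ll_unrestricted, df):
    """Return the LR statistic -2(LL_R - LL_U) and its asymptotic p-value."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    return stat, chi2_sf(stat, df)
```

A p-value below 0.05 corresponds to rejecting the restricted model at the 5% significance level; for example, the familiar critical values 3.84 (one degree of freedom) and 5.99 (two degrees of freedom) give a tail probability of 0.05.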

Forecasting.
Route choice models are important for predicting individual behaviour; therefore, model comparison should focus not only on model fit but also on forecasting results. Two out-of-sample data sets are chosen to validate the forecasting ability of the models. "Out-of-sample" means that these two data sets for validating forecasts have not been used to estimate the models. Each data set includes 200 OD pairs randomly selected in the studied region, and all observations associated with these OD pairs are included in the forecasting data sets; the first and second data sets contain 3345 and 3536 observations, respectively. To compare the forecasting power of the models, we define an out-of-sample error $E_m$ for each model $m$ that measures, over all observations, the discrepancy between $P_{\text{obs},od}(i, \hat{\beta}_m)$ and $P_{\text{obs},od}(i)$; a smaller value suggests a better forecasting ability. Here $P_{\text{obs},od}(i, \hat{\beta}_m)$ is the predicted probability of the actually chosen path $i$ computed by model $m$ with estimated parameters $\hat{\beta}_m$, for OD pair $od$ and observation obs; $P_{\text{obs},od}(i)$ is the probability of the chosen path in reality, which is 1; and $N$ is the number of observations in the data set.
The out-of-sample forecasting errors of the models are shown in Table 6. The MNL model requires the least computation time for each data set; however, its errors are the second largest among the compared models. Although the LK model provides a highly flexible structure by incorporating the multivariate normal distribution, its errors in both data sets are the largest, and its computation time is also the longest. Although the CNL model fits well in estimation, it does not give the best forecasting results in this case. The PCL model and the new model, which is also PCL-based, give the smallest forecasting errors in both data sets. Because it incorporates the Probit model, the new model requires more computation time than PCL, but still less than the traditional mixed Logit (LK) model, because the new model is closed-form and requires no simulation. Summarizing the numerical example, the estimation on real data, and the out-of-sample forecasting results, this paper concludes that the new model is a promising approach to route choice analysis.

Conclusions
This article proposes incorporating the normal distribution to model the perception error of the travellers in route

Figure 1: Model structure of a three-alternative case.

Figure 4: Map of the studied region in Guangzhou.

Table 1: Information on the studied network.

Table 2: Statistics on routes corresponding to observations.

Table 3: Information on the number of actually chosen paths.
*Computed with MATLAB on Intel i5 with 4 GB RAM, one processor.

Table 5: Likelihood ratio test.
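The out-of-sample error used in the forecasting comparison contrasts the predicted probabilities of the chosen paths with their observed value of 1. The sketch below is a minimal illustration that assumes the discrepancy is measured as a mean squared difference; this functional form and the function name are assumptions, and a mean absolute or root-mean-square difference would fit the same description. A smaller value indicates better forecasting ability.

```python
def out_of_sample_error(predicted):
    """Mean squared gap between observed (= 1) and predicted chosen-path probabilities.

    predicted: P_obs,od(i, beta_m) for the chosen path of each observation.
    The squared-error form is an assumption made for illustration.
    """
    if not predicted:
        raise ValueError("need at least one observation")
    return sum((1.0 - p) ** 2 for p in predicted) / len(predicted)
```

Applied to each model's predictions on the same data set, the model with the smaller value forecasts the observed choices more closely.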