Performance Comparison of Mode Choice Optimization Algorithm with Simulated Discrete Choice Modeling

Until recently, the majority of modeling tasks in transportation planning, especially in discrete choice modeling, have been conducted with commercial software, with attention focused only on the parameter estimates needed for a policy-sensitive interpretation. This common practice prevents researchers from gaining systematic knowledge of the estimation mechanism. In this research, to shed light on these limited modeling practices, the parameters of a standard discrete choice model are estimated using two Quasi-Newton methods, DFP and BFGS. Two extended algorithms, called DFP-GSM and BFGS-GSM, are proposed for the first time to overcome a weakness of the Quasi-Newton method: the golden section method (GSM), a nonlinear programming technique, is incorporated to choose an optimal step size automatically. Partial derivatives of the log-likelihood function are derived and coded using Visual Basic for Applications (VBA). Through extensive numerical evaluation, the estimation capability of each proposed algorithm is compared in terms of several performance measures. The proposed algorithms show stable estimation performance, and the reasons are studied and discussed. Furthermore, useful insights gained from custom-built modeling are presented.


Introduction
Discrete choice modeling is widely used across disciplines to predict choice situations through mathematical inference of a choice model's parameters [1]. In transportation demand forecasting, this choice analysis method is generally used to describe a variety of choices. Applications of discrete choice theory in various fields are very useful tools for policy analysis and planning. Applications in the transportation sector are especially active and diverse, covering almost every choice situation where alternatives exist: for example, the choice of travel mode [2][3][4][5][6][7], mode and departure time [8,9], route [10][11][12][13], and airport [14][15][16]. It is common practice to estimate model parameters by relying on commercially available software.
Until recently, the concern of discrete choice modelers has mainly been to find interpretable parameter estimates that are sound in their statistical accuracy and in their policy-sensitive meaning. The estimation algorithm and calculation mechanism are usually not the researcher's concern. However, as model specifications become more complicated in order to describe real situations persuasively, and as more realistic assumptions are adopted during model development, researchers (or modelers) may meet difficulties in their modeling tasks. Employing a new probability distribution function (PDF) to describe the unobserved part of a choice model can be blocked by the limitations of software packages, and choosing and applying a specific algorithm to suit particular modeling needs is not easy with a standard software package. In particular, access to all the calculation details that enable a systematic understanding of the modeling mechanism is hard to obtain unless the modeling is done in a custom-built procedure. A few literature items on this topic can be found in the econometrics field [17][18][19][20][21][22][23], but these studies address a different application environment and are therefore difficult to apply to modeling transportation-related decisions [24,25]. To tackle such challenges in model development, researchers must be capable of modeling with custom-built computer code tailored to each unique case. To reach this stage, researchers should become familiar with the calculation mechanism and the flow of steps undertaken before final parameter estimates are obtained.
In this research, a standard discrete choice model is estimated using self-made computer code developed with Visual Basic for Applications (VBA) in Excel [26][27][28]. To guarantee convergence of the test model, two practically and pedagogically important estimation algorithms, DFP and BFGS, are employed to calculate parameter estimates at every iteration; these methods are excellent and robust in their convergence properties, even for problems that cannot otherwise be solved satisfactorily [29,30]. More importantly, two new estimation algorithms, DFP-GSM and BFGS-GSM, are proposed for the first time; they improve the way an optimal step size is found by incorporating a golden section method (GSM) into the general routine of both DFP and BFGS. The four algorithms are compared in terms of performance measures, showing different operational characteristics depending on several critical factors identified and examined through extensive numerical trials.
The remainder of this paper is organized as follows. Section 2 describes the model specification, assumed within the multinomial logit framework, and the data used in this research. Section 3 is dedicated to the issues researchers should understand when building custom modeling procedures. Section 4 presents the results of extensive numerical experiments that show the relative operational performance of the algorithms as the criteria change. The last section summarizes the results and presents conclusions.

Model and Data
As a standard discrete choice model, the multinomial logit (MNL) model has been widely applied to describe the choice behavior of decision makers because of the simplicity of its probabilistic choice function. This function rests on two major assumptions: utility maximization, which explains the choice mechanism in terms of a utility concept, and a probability distribution function assumed for the unobserved part of total utility. The choice function is

P(i) = exp(V_i) / Σ_{j∈C} exp(V_j),    (1)

where P(i) is the choice probability of mode i, and V_i and V_j are the representative (systematic) utility functions of modes i and j in the choice set C. The final form of the multinomial logit model is thus composed of two parts: a denominator, which is the sum over the systematic utilities of all alternatives in the choice set, and a numerator, which is the systematic utility of the alternative chosen by the decision maker.
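As an illustration of equation (1), a minimal Python sketch (not part of the paper's original VBA implementation) computes MNL choice probabilities from a hypothetical utility vector:

```python
import numpy as np

def mnl_probabilities(V):
    """MNL choice probabilities: P(i) = exp(V_i) / sum_j exp(V_j).

    V is the vector of representative (systematic) utilities, one per mode.
    """
    e = np.exp(V - V.max())   # subtracting the max avoids overflow
    return e / e.sum()

# Hypothetical utilities for five airport-access modes
P = mnl_probabilities(np.array([1.2, 0.8, 0.5, 0.0, -0.3]))
```

The probabilities always sum to one and preserve the ordering of the utilities, which is the behavior the denominator/numerator structure of (1) guarantees.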
As revealed preference (RP) data, 540 on-site interviews of passengers traveling to the airport by the available modes are used to estimate a mode choice model. Five alternatives are considered, and the representative utility function of each alternative has the specification shown in equation (2).

Components in Custom-Built Discrete Choice Modeling

3.1. Formulating the Log-Likelihood Function. To describe the estimation process in computer code using the maximum likelihood estimator (MLE), a high-order nonlinear likelihood function containing all the information in the surveyed data must be built. The likelihood function specific to this research is

L(β) = Π_{n=1}^{540} Π_i P_n(i)^{y_{ni}},    (3)

where y_{ni} = 1 if respondent n chose mode i and 0 otherwise. By a log transformation, equation (3) is changed into a mathematically tractable form,

LL(β) = Σ_{n=1}^{540} Σ_i y_{ni} ln P_n(i).    (4)

By differentiating this equation to first and second order with respect to each parameter, all elements of the gradient vector and the Hessian matrix can be expressed as mathematical formulas that yield numeric values in the iterative estimation process. This process is discussed in detail in the following section.

3.2. Iteration Rule. As the method most frequently adopted in commercial statistical packages, the Quasi-Newton method shows better running time and is considered robust in its convergence properties compared to other algorithms [30]. More importantly, this method can be excellent in convergence even for problems that cannot be solved satisfactorily by other algorithms [29]. As noted in the text by Train [30], this excellence results from a different approach to updating the Hessian matrix than in other estimation algorithms such as Newton-Raphson, BHHH, BHHH-2, and Steepest Ascent.
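The log transformation and the gradient it yields can be sketched as follows; the data layout (per-respondent attribute matrices `Xn` and chosen indices `y`) is a hypothetical illustration, not the paper's actual nine-parameter specification:

```python
import numpy as np

def mnl_loglike_and_grad(beta, X, y):
    """Log-likelihood of the MNL model and its gradient vector.

    For each respondent n, X[n] is a (J alternatives x K parameters) attribute
    matrix and y[n] is the index of the chosen alternative.
      LL(beta)  = sum_n [ V_n[y_n] - log(sum_j exp(V_n[j])) ]
      dLL/dbeta = sum_n [ X[n][y_n] - sum_j P_n[j] * X[n][j] ]
    """
    LL = 0.0
    grad = np.zeros_like(beta)
    for Xn, yn in zip(X, y):
        V = Xn @ beta
        V = V - V.max()                    # numerical stabilization
        expV = np.exp(V)
        P = expV / expV.sum()              # choice probabilities
        LL += V[yn] - np.log(expV.sum())
        grad += Xn[yn] - P @ Xn            # chosen minus expected attributes
    return LL, grad

# At beta = 0, probabilities are uniform, so LL = -N * log(J)
X = [np.array([[1.0, 0.0], [0.0, 1.0]])]   # one respondent, two alternatives
y = [0]
LL0, g0 = mnl_loglike_and_grad(np.zeros(2), X, y)
```

The gradient has the familiar "observed minus expected attributes" form, which is what the first-order partial derivatives of (4) evaluate to.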
The general iteration rule applied in this research for estimating the parameters of the discrete choice model is

β_{t+1} = β_t + λ (−H_t)^{−1} g_t,    (5)

where β_t is the vector of parameter estimates after t iterations, β_{t+1} is the vector after t + 1 iterations, λ is a step size (which can be assumed by the researcher at the start of estimation), H_t is the Hessian matrix (or its approximation) used in iteration t by each algorithm, and g_t is the gradient vector in iteration t. This rule is commonly applied to DFP and BFGS.
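A single application of the iteration rule can be sketched as below, where `A` stands for the approximation to (−H_t)^{−1} maintained by each algorithm; all numeric values are hypothetical:

```python
import numpy as np

def quasi_newton_step(beta, g, A, lam):
    """One iteration: beta_{t+1} = beta_t + lam * A_t @ g_t.

    beta : current parameter vector
    g    : gradient of the log-likelihood at beta
    A    : approximation to (-H_t)^{-1}, so the step ascends the log-likelihood
    lam  : step size chosen by the researcher (or by a line search)
    """
    return beta + lam * (A @ g)

beta_next = quasi_newton_step(np.zeros(2), np.array([1.0, -2.0]), np.eye(2), 0.5)
```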
As compared in Table 1, each of these algorithms might be tried in custom-built modeling, and it is important from a pedagogical point of view to know how they work in the estimation process [30]. In fact, they have been widely adopted as estimation routines in commercial software packages. The first four algorithms (Newton-Raphson, BHHH, BHHH-2, and Steepest Ascent) are taken from the earlier research of Roh and Khan [25] and Train [30]; refer to those two works for details. In this research, the Quasi-Newton methods (the last two algorithms of Table 1) are the main tools for finding the parameters of the test model with computer code developed for this specific purpose. Furthermore, an extension of these methods is proposed that introduces a new technique for searching for a step size automatically.
The second iteration rule, experimented with for the first time in this research, differs in the way it adopts the step size (λ_t), which is adjusted repeatedly during estimation of the test model. More details are discussed in the following section. This iteration rule is commonly applied to the DFP-GSM and BFGS-GSM algorithms.
A slight but notable difference between the two iteration rules described in (5) and (6) lies in how the step size (λ or λ_t) is acquired during iteration. The DFP updating formula for the approximate inverse Hessian is

H_{AP,t+1}^{DFP} = H_{AP,t}^{DFP} + (d_t d_t′)/(d_t′ γ_t) − (H_{AP,t}^{DFP} γ_t γ_t′ H_{AP,t}^{DFP})/(γ_t′ H_{AP,t}^{DFP} γ_t),

where d_t′ is the transpose of d_t = β_{t+1} − β_t, γ_t equals g_{t+1} − g_t (the difference between the gradient vector in iteration t + 1, g_{t+1}, and the gradient vector in iteration t, g_t), and γ_t′ is the transpose of γ_t. The iteration rule can then be written as

β_{t+1} = β_t + λ H_{AP}^{DFP} g_t.

After sufficient iterations before convergence, H_{AP}^{DFP} is assumed to approximate the inverse of the real Hessian used in Newton-Raphson. In this first DFP method, we can try a variety of values of the step size λ to find the one with which the iteration shows the best performance.
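A sketch of the DFP update of the approximate inverse Hessian, using the vectors d_t and γ_t defined above (the numeric values are hypothetical):

```python
import numpy as np

def dfp_update(A, d, gamma):
    """DFP update of the approximate inverse Hessian A.

    d     = beta_{t+1} - beta_t   (step taken in the parameters)
    gamma = g_{t+1} - g_t         (change in the gradient vector)
    A_new = A + d d'/(d'gamma) - (A gamma)(A gamma)'/(gamma' A gamma)
    """
    Ag = A @ gamma
    return A + np.outer(d, d) / (d @ gamma) - np.outer(Ag, Ag) / (gamma @ Ag)

A1 = dfp_update(np.eye(2), np.array([1.0, 0.5]), np.array([0.4, 0.2]))
```

A useful sanity check is the secant condition A_new γ_t = d_t, which any valid quasi-Newton update of the inverse Hessian satisfies.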

Broyden-Fletcher-Goldfarb-Shanno (BFGS) Algorithm.
The BFGS algorithm differs slightly from DFP in its rule for updating the approximate Hessian. The updating formula is presented below; it is almost the same as that of DFP except for the last term:

H_{AP,t+1}^{BFGS} = H_{AP,t} + (d_t d_t′)/(d_t′ γ_t) − (H_{AP,t} γ_t γ_t′ H_{AP,t})/(γ_t′ H_{AP,t} γ_t) + (γ_t′ H_{AP,t} γ_t) v_t v_t′,

where v_t = d_t/(d_t′ γ_t) − H_{AP,t} γ_t/(γ_t′ H_{AP,t} γ_t). The additional term (γ_t′ H_{AP,t} γ_t) v_t v_t′ can be calculated using quantities already provided by the preceding terms. As in the DFP method, H_{AP}^{BFGS} becomes another approximate Hessian after sufficient iterations. Substituting H_{AP}^{BFGS} into the iteration formula gives

β_{t+1} = β_t + λ H_{AP}^{BFGS} g_t.

In this formula, as with the DFP iteration rule, several values of λ can be tried to find the optimal one in terms of measures such as one-iteration time, number of iterations, and convergence time. A strong advantage of both DFP and BFGS is that they use γ_t = g_{t+1} − g_t in their updating procedures, the difference between consecutive gradients in iteration t + 1 (g_{t+1}) and iteration t (g_t). This means γ_t carries more information about the shape of the log-likelihood function, which makes it easier to reach the value that maximizes it [30].
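The BFGS update with its additional term can be sketched the same way (values hypothetical); note how the extra rank-one correction reuses A γ_t and d_t′ γ_t, which are already needed for the DFP part:

```python
import numpy as np

def bfgs_update(A, d, gamma):
    """BFGS update of the approximate inverse Hessian A.

    Same inputs as the DFP update: d = parameter step, gamma = gradient change.
    The additional term relative to DFP is (gamma' A gamma) * v v'
    with v = d/(d'gamma) - A gamma/(gamma' A gamma), expanded below.
    """
    Ag = A @ gamma
    dg = d @ gamma
    return (A
            + (1.0 + (gamma @ Ag) / dg) * np.outer(d, d) / dg
            - (np.outer(d, Ag) + np.outer(Ag, d)) / dg)

A1 = bfgs_update(np.eye(2), np.array([1.0, 0.5]), np.array([0.4, 0.2]))
```

As with DFP, the secant condition A_new γ_t = d_t holds, and the updated matrix stays symmetric.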
The following section suggests a way of finding λ automatically with the help of a line search method. In this research, we adopted the golden section method (GSM) as the line search tool; it proved successful in the estimation of the parameters.

Golden Section Method (GSM).
Especially in custom-built discrete choice modeling, the problem with using an estimation algorithm to estimate a model's parameters is that the researcher has no prior knowledge for choosing a step size (λ) that guarantees the best performance of the estimation procedure. To address this problem, we inserted a new line search module, based on the golden section method (GSM), into the code of both the DFP and BFGS methods.
The golden section method (GSM) is a line search method that finds an optimal value of a target function without using its derivatives. Its basic concept is to reduce the interval of uncertainty during the search procedure until a minimizing point is fixed; in this research, the given function is minimized. Let LL(β_1, ..., β_9) be the function to be minimized and a ≤ λ ≤ b the interval of uncertainty. To find a minimizing point, we can exclude portions of the interval of uncertainty that cannot contain the optimal point. After a number of elimination iterations, only an infinitesimally small portion remains; averaging the lower and upper bounds of this remaining portion gives the optimal point for the given function at the current iteration.
The following summary of the golden section method (GSM) describes the minimization of a strictly quasi-convex function over the interval of uncertainty (a_1, b_1) [31].
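The procedure just summarized can be sketched as a self-contained routine; the test function and tolerance below are illustrative, not the paper's:

```python
import math

def golden_section(f, a, b, tol=0.01):
    """Golden section search: shrink the interval of uncertainty [a, b]
    until its length is below tol, then return the midpoint as the minimizer
    of a strictly quasi-convex f."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # golden ratio conjugate, ~0.618
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a >= tol:
        if f1 > f2:                  # the minimizer cannot lie in [a, x1)
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
        else:                        # the minimizer cannot lie in (x2, b]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
    return (a + b) / 2.0

# Example: minimize (lam - 0.3)^2 over the interval of uncertainty [0, 3]
lam = golden_section(lambda x: (x - 0.3) ** 2, 0.0, 3.0)
```

Because the golden ratio spacing lets one interior evaluation be reused after each elimination, only one new function evaluation is needed per iteration.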

An Application of GSM to This Research. Table 3 shows an example computation of DFP-GSM. In this specific example, we adopt an interval of uncertainty of length 3 and use b_k − a_k < 0.01 as the stopping criterion. In Table 3, the initial interval of uncertainty has length 3, and after 13 iterations involving fourteen observations (see *), the interval of uncertainty is [0.024402711, 0.033719691], so the minimum point can be calculated as the midpoint of the two values. This midpoint is the optimal step size λ_t found by the golden section method in a given iteration before convergence. In particular, the log-likelihood function of the multinomial logit is globally concave [32]; since the golden section method (GSM) is limited to strictly quasi-convex functions [31], we transform the log-likelihood function into its reflection about the horizontal axis by adding a minus sign, −LL(β_1, ..., β_9).

DFP-GSM Algorithm.
The iteration formula is nearly the same as for the DFP method, except that λ_t is adopted instead of λ and takes a different value in every iteration. The line search plays the important role of finding the optimal value λ_t within the interval of uncertainty. In every iteration, the λ_t that minimizes the (negated) log-likelihood function is found and used in the iteration formula

β_{t+1} = β_t + λ_t H_{AP}^{DFP} g_t.

3.3.6. BFGS-GSM Algorithm. In the case of BFGS-GSM, all operations are the same as for DFP-GSM. The approximate Hessian H_{AP}^{BFGS} is obtained in the same way as in the BFGS method in Table 2. The iteration rule is

β_{t+1} = β_t + λ_t H_{AP}^{BFGS} g_t.
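Putting the pieces together, a self-contained sketch of a DFP-GSM-style loop is shown below on a simple quadratic objective, written as minimization of the negated function in the spirit of the −LL transformation; the objective, interval, and tolerances are all illustrative assumptions:

```python
import math
import numpy as np

def golden_section(f, a, b, tol=1e-4):
    """Golden section line search over the interval of uncertainty [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a >= tol:
        if f1 > f2:
            a, x1, f1 = x1, x2, f2
            x2, f2 = a + r * (b - a), f(a + r * (b - a))
        else:
            b, x2, f2 = x2, x1, f1
            x1, f1 = b - r * (b - a), f(b - r * (b - a))
    return (a + b) / 2.0

def dfp_gsm_minimize(f, grad, x, interval=(0.0, 2.0), max_iter=200):
    """DFP iterations with the step size found by golden section search
    in every iteration, instead of being fixed by the researcher."""
    A = np.eye(len(x))                       # initial approximate inverse Hessian
    g = grad(x)
    for _ in range(max_iter):
        direction = -A @ g                   # descent direction on f (= -LL)
        lam = golden_section(lambda l: f(x + l * direction), *interval)
        x_new = x + lam * direction
        g_new = grad(x_new)
        d, gamma = x_new - x, g_new - g
        if abs(d @ gamma) > 1e-12:           # guard against a degenerate update
            Ag = A @ gamma
            A = A + np.outer(d, d) / (d @ gamma) - np.outer(Ag, Ag) / (gamma @ Ag)
        x, g = x_new, g_new
        if np.linalg.norm(g) < 1e-6:
            break
    return x

# Illustrative strictly convex objective with minimum at (1.0, -0.5)
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)])
x_opt = dfp_gsm_minimize(f, grad, np.zeros(2))
```

The BFGS-GSM variant would differ only in the updating formula for A; the line search and the overall loop are identical.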

Experimental Results
The four algorithms discussed in this research are used to estimate the test model. The same data are input in each estimation so that estimation performance can be compared across algorithms. Three performance measures, one-iteration time, number of iterations, and convergence time, are used to measure the performance differences. Various step sizes were selected randomly for the first two algorithms (DFP and BFGS); all runs that converged successfully were chosen and summarized. As shown in Table 4, for the last two algorithms (DFP-GSM and BFGS-GSM), instead of choosing a random step size, we need to specify an interval of uncertainty. Under these conditions, we conducted three categories of experiment. (1) The first category concerns different convergence criteria; it is tested in experiments 1-1 and 1-2. (2) The second category concerns different initial guesses of the parameters; this is tried in experiment 2. (3) The third category concerns adopting a new method for guessing the initial Hessian instead of using the identity matrix at the initial estimation stage; it is implemented using the methods of both BHHH and BHHH-2.
The four estimation algorithms are identified by the unique serial numbers shown in Table 4, and the three categories used to compare estimation performance are presented in Table 5.

Numerical Experiment 1
4.1.1. Experiment 1-1 (CR = 10^-4). As given in Table 5, the vector of the test model's parameters is estimated with the convergence criterion denoted 1 in the table, whose mathematical expression uses CR = 10^-4 as the level of precision of the parameter estimates. This convergence criterion is taken from Ben-Akiva and Lerman [1]. The results of the experimental estimation for the four algorithms are summarized in Table 6, for the experimental trials that converged only.
As shown in Table 6, diverse step sizes were tried for the first two algorithms to find the value that makes the log-likelihood function increase fastest until it reaches its maximum. In these two algorithms, the step size is fixed at the initial stage of estimation and can only be changed to another value by the researcher.
In the last two algorithms, DFP-GSM and BFGS-GSM, the step size is calculated automatically by a function embedded in the general estimation routine that runs the golden section line search method. Instead of choosing the step size manually, the researcher must establish the interval of uncertainty within which the best step size is searched for by the golden section method. In this research, eight different intervals of uncertainty are used and compared. There are typically no strict limitations on choosing the interval of uncertainty, though choosing values above 0 is common [31].
Figure 1 shows the estimation performance of the four algorithms in terms of the three performance measures. The first two algorithms converge for some step sizes and not for others; only converged runs appear in the upper part of Table 6 and the left part of Figure 1. In contrast, the last two algorithms converge for every interval of uncertainty, and their results are presented in the lower part of Table 6 and the right part of Figure 1.
For the first performance measure, one-iteration time, there are somewhat large differences between the first two algorithms and the last two. This reflects the fact that the last two algorithms need more time per iteration because of the added line search procedure: DFP-GSM and BFGS-GSM take about 1-2 minutes to finish one iteration. The one-iteration time can be used to gauge the operational superiority or inferiority of an algorithm because it dominates the overall convergence time for all algorithms, even though it is more reasonable to consider one-iteration time and the number of iterations together when judging operational performance.
For the second performance measure, the number of iterations, there are obviously big differences both between step sizes within the same algorithm and between the four algorithms. For the converged runs of DFP and BFGS, the number of iterations decreases as the step size increases, indicating a close relationship between step size and iteration count. Choosing a step size randomly at the initial stage of estimation can therefore affect estimation performance negatively if the selection is poor, or positively if it is fortunately good. To address this problem and improve estimation performance, a new systematic method, the golden section line search, is incorporated into DFP and BFGS in this research. As a result, the DFP-GSM and BFGS-GSM algorithms show stable estimation performance in terms of the number of iterations. The iteration counts presented in the right part of Figure 1 are nearly identical for all intervals of uncertainty considered, except for one DFP-GSM run using [Lower Limit (0), Upper Limit (1)]. Stated simply, DFP-GSM and BFGS-GSM clearly decrease the number of iterations compared to DFP and BFGS in Figure 1. Moreover, the iteration count settles around a certain value: the new algorithms are stable in terms of the number of iterations.
The third performance measure, convergence time, is also used to compare the four algorithms. It is the most important measure for deciding the excellence of an algorithm, because the operational time before convergence is of great interest to researchers and can be critical in large-scale modeling.
As Quasi-Newton methods, the DFP and BFGS algorithms use similar updating processes for the Hessian matrix, as explained earlier. For the same step size, the BFGS algorithm takes less convergence time than DFP in some cases (e.g., λ = 1/4 or λ = 1/16) and more in others (e.g., λ = 1/8, λ = 1/2). The key point is that, even though BFGS needs far fewer iterations than DFP at the same step sizes, the difference in convergence time between the two algorithms is not significant. This can be attributed to the Hessian updating process: BFGS uses a more complicated update than DFP and therefore takes more time per iteration. The last two algorithms, DFP-GSM and BFGS-GSM, used for the first time in this research, show stable estimation performance; their convergence times are distributed more evenly than those of the other two algorithms. BFGS-GSM performs slightly better than DFP-GSM in terms of convergence time.
In conclusion, in experiment 1, DFP-GSM and BFGS-GSM show stable performance in terms of both the number of iterations and the convergence time, irrespective of the choice of interval of uncertainty, whereas DFP and BFGS show somewhat large differences in estimation performance depending on the step size selected at the initial stage of estimation.
As shown in Figure 2, line graphs of the experimental estimations trace the log-likelihood function values during the iterations and clearly compare the different convergence behaviors across step sizes and intervals of uncertainty in each algorithm.

Table 7 summarizes critical values of the iterations, such as the starting log-likelihood value, the convergence log-likelihood value, and the number of iterations, for the different step sizes and intervals of uncertainty. As shown in Table 7, all starting log-likelihood values are exactly the same for all four algorithms and for any step size or interval of uncertainty within each algorithm, because for experiment 1-1 (CR = 10^-4) the initial guess of the parameters is zero. By contrast, the convergence values differ slightly between algorithms and between step sizes for the first two algorithms. For the last two algorithms, the values are exactly the same for almost all intervals of uncertainty, with only a few exceptions.
In the case of the DFP algorithm, there is a large difference in convergence behavior compared to the four estimation algorithms discussed by Roh and Khan [25], who used exactly the same experimental setting with four different estimation algorithms (Newton-Raphson, BHHH, BHHH-2, and Steepest Ascent). The most obvious difference is that the line graphs for all step sizes except 1/16 show much bigger fluctuations in the initial iterations, without a steady increase, than those four algorithms [25]. This phenomenon can be attributed to the mechanism used for updating the approximate Hessian matrix in the DFP algorithm. Train [30] indicates that this kind of algorithm, including BFGS, uses information obtained at more than one point on the log-likelihood function to calculate the approximate Hessian in every iteration. The step size of 1/2 shows the best performance, with 115 iterations; its critical values are presented in Table 7.

In the case of the BFGS algorithm, as with DFP, there are large fluctuations for all step sizes except 1/32 and 1/16, and a similar explanation applies. Overall, there is only a slight difference between the two algorithms in the method used to calculate the updated approximate Hessian matrix: one more term is added in the BFGS algorithm, as shown earlier. The run with step size 1 outperforms the other runs, as can be seen in Table 6, although it experiences an extreme change in log-likelihood value in the initial iterations.
In the last two algorithms, trying a variety of step sizes is unnecessary, because the golden section line search method (GSM) takes over the role of choosing an optimal step size automatically. Another task arises instead: setting the interval of uncertainty. In this research, eight different intervals are considered. It should be noted that there are no rules, and no guidance is available in the literature, regarding the choice of the interval of uncertainty. Provisionally, eight intervals whose upper limits increase by the same amount, namely 1, were applied simply to examine the convergence behavior that different intervals might cause. Figure 2 shows the convergence behavior of the DFP-GSM and BFGS-GSM algorithms for the various intervals. All line graphs show almost the same convergence profile and are thus drawn on the same line irrespective of the interval used; DFP-GSM and BFGS-GSM give almost the same results, as presented in Figure 2.

4.1.2. Experiment 1-2 (CR = 10^-4). In experiment 1-2, the test model's parameter vector is estimated under a convergence criterion that differs in both its mathematical form and the degree of the stopping criterion. The mathematical form of the convergence criterion used in this experiment is taken from Besley [17]. The degree of the stopping criterion is tightened gradually from CR = 10^-4 to CR = 10^-6 in order to examine the impact of different degrees of convergence precision on the operational behavior of discrete choice model estimation. Table 8 shows the results of the experimental estimation for the four algorithms, for all trials that converged, and Figure 3 shows the results graphically in terms of the three performance measures.
All conditions and environments in this experimental estimation are exactly the same as in experiment 1-1, except that the convergence criterion is written in the form g_t′(−H_t)^{−1} g_t < 10^-4. For all algorithms, estimation performance improves relative to experiment 1-1 in terms of the three performance measures, as can be seen in Table 8. There are significant improvements for the first two algorithms for all step sizes and all performance measures considered. For the last two algorithms, all intervals of uncertainty except [Lower Limit (0), Upper Limit (2)] and [Lower Limit (0), Upper Limit (3)] show improved convergence time. Another important point is that the last two algorithms perform better, in both number of iterations and convergence time, as the interval of uncertainty is widened gradually by one unit at a time.
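The stopping statistic used here can be sketched directly; `A` stands for an approximation to (−H)^{−1}, and the numeric values are hypothetical:

```python
import numpy as np

def convergence_statistic(g, A):
    """Statistic g' (-H)^{-1} g for the stopping rule g' (-H)^{-1} g < 1e-4,
    with A approximating (-H)^{-1}."""
    return g @ A @ g

g = np.array([1e-3, -2e-3])          # hypothetical near-convergence gradient
stat = convergence_statistic(g, np.eye(2))
converged = stat < 1e-4
```

Because the statistic is a quadratic form in the gradient, it shrinks quickly as the gradient approaches zero, which is consistent with the faster convergence observed under this criterion.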
In conclusion, when a new convergence criterion that has a different mathematical form is applied, all algorithms show an improvement in terms of three performance measures compared to the case of experiment 1-1.
As shown in Table 9, for the first two algorithms and for any step size within each algorithm, all calculated starting log-likelihood values are exactly the same because β_0 = [β_1 = 0, β_2 = 0, ..., β_9 = 0]′, the same initial guess of starting points imposed in experiments 1-1, 1-2, and 3. By contrast, the convergence values differ slightly among the step sizes within each algorithm, but the differences are so small that the values can be considered equal. For the last two algorithms in Table 9, all starting log-likelihood values are likewise the same, for the same reason. The convergence log-likelihood values show slight differences among intervals of uncertainty, but these too are so small that the runs can be considered to converge to the same value, even if not exactly.
Comparing Figures 2 and 4, the most important observation is that the last part of the iteration procedure has a different graph profile: the overlapping parts observed in experiment 1-1 are fewer and simpler in experiment 1-2. This means a researcher can control the termination timing by changing the convergence criterion in a principled way, provided the new criterion guarantees the accuracy of the parameter estimates up to the level the researcher expects.
The results of the experimental estimation for the algorithms are summarized in Table 10 in the same pattern and style as used in previous presentations.Detailed results are compared graphically in Figure 5 in terms of three performance measures.
Generally, for all four algorithms, the estimation performance of the current experiment 1-2 (CR = 10^-5) improves on that of experiment 1-1 (CR = 10^-4) but deteriorates slightly compared to experiment 1-2 (CR = 10^-4). The poorer performance in this experiment can be attributed to the stopping criterion having moved in the stringent direction: a more stringent stopping criterion requires more calculation time to find a parameter vector that satisfies it.
In conclusion, when a stricter convergence criterion is applied as the new stopping standard, all algorithms show better estimation performance than in experiment 1-1 (CR = 10^-4) but poorer performance than in experiment 1-2 (CR = 10^-4). If a more stringent stopping criterion is applied, the time to convergence tends to increase.

Moreover, the mathematical form of the stopping criterion used in experiment 1-1 requires more convergence time than that of experiment 1-2.
For all estimation results, graphs are drawn to show the variation of the log-likelihood function values over the iterations and to compare clearly the different convergence behaviors across the step sizes used in each algorithm. Table 11 summarizes critical values, such as the starting log-likelihood value, the convergence log-likelihood value, and the number of iterations, for the different step sizes. As shown in Table 11, all calculated starting log-likelihood values are exactly the same for all four algorithms and for any step size within each algorithm because β_0 = [β_1 = 0, β_2 = 0, ..., β_9 = 0]′, the same initial-guess condition imposed in experiments 1-1, 1-2, and 3. On the other hand, the convergence values differ among step sizes only below the decimal point, so the differences can be ignored and all runs can be considered to converge to the same log-likelihood value.
The convergence behavior is identical to that of experiments 1-1 (CR = 10−4) and 1-2 (CR = 10−4), except that slight differences occur across the three performance measures. These experimental results are clearly presented in Figure 6. Overall, the observed phenomenon is that, as the step size decreases, the slope becomes smaller but the number of iterations rises.
We can clearly identify the best performance among the runs performed with various step sizes for the first two algorithms, but for the last two algorithms almost all of the graphs drawn for the different uncertainty intervals appear to lie on the same line (see Figure 6). Also, the tail of the graph over the last iterations is not as long as that of experiment 1-1 (CR = 10−4), but compared with experiment 1-2 (CR = 10−4) the overlapped part becomes longer. This means that more iterations are conducted in each algorithm to satisfy the new, more stringent criterion for stopping the iteration process. It also means that the number of iterations and the convergence time can be adjusted to suit a specific modeling task or to keep the parameters within the desired accuracy.
The results obtained from the experimental estimation for the four algorithms are summarized in Table 12, and the detailed results are presented graphically in Figure 7 in terms of three performance measures.

The poor performance observed here can be attributed to the most stringent stopping criterion applied in this experiment: the most stringent criterion requires much more calculation time than runs with a less stringent one. Finding a vector of parameter estimates that satisfies this stringent stopping criterion is a time-consuming process and may not be worthwhile. Researchers can adopt moderate settings and control measures according to their expectations and the specific purpose of their modeling work.
In conclusion, when the strictest convergence criterion considered in this research is applied as the stopping measure, all algorithms show worse estimation performance than in all previous experiments. Applying a more stringent stopping criterion increases the convergence time in discrete choice model estimation.
Table 13 summarizes critical values such as the starting log-likelihood value, the convergence log-likelihood value, and the number of iterations for the different step sizes. As shown in Table 13, the calculated starting log-likelihood values are exactly the same for all four algorithms and for any step size used within each algorithm, owing to the condition applied in guessing the starting points, β = [β1 = 0, β2 = 0, β3 = 0, ..., β9 = 0]′. On the other hand, the convergence values differ among the step sizes only below the decimal point, so the difference can be ignored and all runs are considered to converge to the same log-likelihood value.
The convergence behavior is identical to that of the previous experiments, except that slight differences occur across the three performance measures. These experimental results are clearly presented in the graph. Overall, it was observed that as the step size decreased, the slope became gradually gentler and the number of iterations increased. As shown in Figure 8, the tail of the graph over the last iterations is shorter than that of experiment 1-1 (CR = 10−4), but compared with experiments 1-2 (CR = 10−4) and 1-2 (CR = 10−5), the overlapped part becomes much longer. This means that more iterations are conducted in each algorithm to satisfy the new stringent criterion for stopping the iteration process. It also means that the number of iterations and the convergence time can be adjusted if desired.

Numerical Experiment 2.
In experiment 2, the applied convergence criterion is g′(−H)⁻¹g < 10−4, where g is the gradient of the log-likelihood and H the (approximate) Hessian. The main concern of this section is to recognize the importance of the initial guess of starting points and to exhibit the different estimation performance of experimental runs using different starting points. A new method for finding a good initial guess of the starting point is not the subject of this section. The point is that changing the initial starting points can make a difference in estimation performance, and that developing a systematic way of guessing starting points that guarantees better operational performance could be a useful future research topic; see, for example, the research conducted by Liu and Mahmassani [33]. To shed light on this topic, a new assumed parameter vector, β = [β1 = −0.01, β2 = −0.1, β3 = 0, ..., β9 = 0]′, is applied to the four algorithms. This vector was chosen at random among the parameter vectors that work in estimation.
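The stopping statistic above can be computed without explicitly inverting −H. A minimal sketch, in Python rather than the VBA used in this research, with illustrative names:

```python
import numpy as np

def convergence_statistic(g, H):
    """Return g' (-H)^{-1} g, the stopping statistic of experiment 2.

    g : gradient of the log-likelihood at the current parameter vector
    H : Hessian (or approximate Hessian) at the same point
    """
    # Solve (-H) x = g instead of inverting -H explicitly (more stable).
    x = np.linalg.solve(-H, g)
    return float(g @ x)

# Toy check: with H = -I the statistic reduces to g'g.
g = np.array([0.005, -0.002, 0.001])
H = -np.eye(3)
stat = convergence_statistic(g, H)
converged = stat < 1e-4  # CR = 10^-4
```

As the gradient shrinks near the maximum, the statistic falls below the chosen CR and the iteration stops; tightening CR (10−5, 10−6) simply delays that moment, matching the behavior reported in the experiments.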
The results obtained from these experimental runs are shown in Tables 14 and 15. In addition, the detailed results are compared graphically in Figures 9 and 10 in terms of the three performance measures and the variation of log-likelihood values obtained for the four algorithms under the assumed parameter vector.
It is useful to know that applying different vectors of starting points can cause diverse variations in estimation performance, improving or worsening the situation, and that an advanced way of guessing starting points could guarantee improvement in the targeted performance measures. This experiment can therefore be recognized as a potential topic for a researcher to undertake.
Overall, the results obtained from the experimental estimation are as follows. (1) The starting log-likelihood values improve in magnitude owing to the new starting points. (2) The convergence time is greatly improved compared to the results of the other experiments; this improvement means that the researcher's prior knowledge about the parameters to be estimated was correct and the newly guessed starting points improved estimation performance. (3) The convergence log-likelihood values deviate between algorithms, but the reason is not clear here.

Numerical Experiment 3.
In this section, we consider another way of guessing the initial Hessian matrix, which in the DFP, BFGS, DFP-GSM, and BFGS-GSM algorithms is assumed to be an identity matrix before estimation starts and is continuously updated as the estimation proceeds. The alternative tried in this experiment is to incorporate the first-iteration routine used to calculate the Hessian matrix in the BHHH and BHHH-2 algorithms, instead of assuming an identity matrix as the starting Hessian matrix.
By adding the code associated with the calculation of the initial Hessian matrix in the BHHH and BHHH-2 algorithms to the general estimation procedures of the above four algorithms, new estimation codes are completed in this experiment. They are named DFP (BHHH), DFP (BHHH-2), BFGS (BHHH), BFGS (BHHH-2), DFP-GSM (BHHH), DFP-GSM (BHHH-2), BFGS-GSM (BHHH), and BFGS-GSM (BHHH-2). It is important to note that the only difference between the two sets of algorithms (e.g., DFP and DFP (BHHH)) is the way of obtaining the initial Hessian matrix: the first (DFP) uses an identity matrix as its initial Hessian matrix, and the second (DFP (BHHH)) uses the alternative presented in this experiment.
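The BHHH first-iteration routine referred to above approximates the Hessian by the sum of outer products of each observation's score vector. A hypothetical Python sketch of that seeding step (the actual implementation in this research is in VBA, and the array shapes here are illustrative):

```python
import numpy as np

def bhhh_hessian_approx(scores):
    """BHHH approximation of the (negative) Hessian at a parameter vector.

    scores : (N, K) array; row n is the gradient of observation n's
             log-likelihood contribution (its score) at the starting point.
    Returns the K x K matrix sum_n s_n s_n', whose negative approximates
    the Hessian of the log-likelihood.
    """
    return scores.T @ scores

# Hypothetical usage: seed a quasi-Newton run with the inverse of the BHHH
# matrix instead of the identity, as in the DFP (BHHH) variants.
rng = np.random.default_rng(0)
S = rng.normal(size=(100, 9))   # 100 observations, 9 parameters
B = bhhh_hessian_approx(S)
H0 = np.linalg.inv(B)           # initial approximate inverse Hessian
```

Because the seed is already shaped like the true curvature at the starting point, fewer update steps are needed before the approximation converges, which is consistent with the shorter convergence times reported for experiment 3.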
An important lesson learned through this series of experimental estimations is that applying a different way of guessing the initial Hessian matrix can make a difference, for better or worse, in estimation performance as measured by the performance measures. In the context of this research, the experimental estimations tried here generally give better results than those of experiment 1-2 (CR = 10−4) for the given performance measures. This means that a better way of guessing the initial Hessian matrix, unlike the general procedure of using an identity matrix, can improve the targeted performance measures (e.g., convergence time). This issue is another topic with the potential to generate fruitful research results in the field of econometric modeling.
The following results are compared to those of experiment 1-2 (CR = 10−4), which was estimated under the same conditions as experiment 3 except for the initial Hessian matrix. (1) In terms of single-iteration time, there is no big difference between the runs using the general routine shown in Figure 3 and the runs using the new routine proposed in experiment 3, shown in Figures 11 and 12. However, with the new routine, the single-iteration time becomes similar across runs, irrespective of the step size used (for DFP (BHHH), BFGS (BHHH), DFP (BHHH-2), and BFGS (BHHH-2)) or the uncertainty interval used (for DFP-GSM (BHHH), BFGS-GSM (BHHH), DFP-GSM (BHHH-2), and BFGS-GSM (BHHH-2)).
(2) In terms of the number of iterations, the routine proposed in experiment 3 shows better performance: the number of iterations decreases. (3) In terms of convergence time, a big difference is observed between experiment 1-2 (CR = 10−4) and experiment 3: experiment 3 takes much less convergence time. The reason is that using a more accurate initial Hessian matrix at the initial stage of estimation, instead of an identity matrix, decreases the number of iterations needed to reach the converged Hessian matrix.

Summary and Discussion
Three important factors strongly affecting overall estimation performance in discrete choice modeling were detected through extensive experimental estimation, classified into three experimental categories: the convergence criterion, the initial guess of starting points, and the initial Hessian matrix.
(i) The first category considered in this research adopts convergence criteria that differ both in mathematical expression (compare experiments 1-1 and 1-2) and in degree of stopping precision (experiment 1-2), as given in Table 5. The purpose of these experiments was to show evidence of changing estimation performance, to compare the criteria in terms of performance measures, and to recognize the convergence criterion as an important factor dominating the performance of the estimation process.
(a) As the initial guess of the starting value of the parameter vector, β = [β1 = 0, β2 = 0, β3 = 0, ..., β9 = 0]′ is used for DFP, BFGS, DFP-GSM, and BFGS-GSM. The last two algorithms (DFP-GSM and BFGS-GSM) use the golden section line search method (GSM) instead of choosing a step size manually before the estimation procedure begins. The GSM finds an optimal value of the step size automatically during iteration, with which the log-likelihood function is maximized in every iteration of the process.
(f) To use the GSM in the line search, we must assign the interval of uncertainty within which the optimal step size is sought automatically; this interval is used in every iteration of the estimation procedure. In the context of this research, eight different uncertainty intervals are proposed and the experimental estimation is run using each of them. Typically, there are no limitations in choosing the uncertainty interval, so long as the researcher chooses an interval extending from 0 to some value above 0 [31]. (g) From the results of the first experimental category, we identify the following facts. (1) In spite of using the same model specification (i.e., the same multinomial logit choice function and the same utility functions for each alternative) and a common data set, large or small differences exist among the four estimation algorithms in terms of the three performance measures.
(2) Depending on the step size, large variations arise in estimation performance among the four algorithms, or even within the same algorithm. In conclusion, the step size chosen by a researcher at the initial stage of model estimation is an important factor controlling the entire estimation performance. (3) As opposed to choosing a step size manually, as required in the first two estimation algorithms, the new algorithms employ the golden section line search method (GSM) to find the optimal step size automatically in every iteration. The two algorithms tried, DFP-GSM and BFGS-GSM, show stable estimation performance in terms of the measures used, regardless of the uncertainty interval. With this method a researcher can minimize the risk of choosing a bad step size, which can occur with the first two algorithms. In this case, the same mathematical expression of the stopping criterion is used but a different stopping precision is employed. (1) With increasing precision of the stopping criterion, performance degenerates in terms of the performance measures: more time per iteration, more iterations before convergence, and a longer convergence time.
(2) The reason for this phenomenon is that applying a more stringent stopping criterion necessitates more calculation time before the iteration procedure can terminate.
(3) This result implies that a different stopping precision applied within the same mathematical expression of the stopping criterion is another control measure with a big impact on model estimation performance.
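The golden section line search used by DFP-GSM and BFGS-GSM can be sketched as follows; this is a minimal Python illustration (the research itself is coded in VBA), with `f` standing in for the log-likelihood evaluated along the current search direction:

```python
import numpy as np

GOLDEN = (np.sqrt(5) - 1) / 2  # golden ratio conjugate, ~0.618

def golden_section_max(f, a, b, tol=1e-6):
    """Maximize a unimodal function f over the uncertainty interval [a, b].

    Each pass shrinks the interval by the golden ratio and reuses one of
    the two interior evaluations, so no derivative of f is needed to pick
    the step size.
    """
    x1 = b - GOLDEN * (b - a)
    x2 = a + GOLDEN * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:          # maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + GOLDEN * (b - a)
            f2 = f(x2)
        else:                # maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - GOLDEN * (b - a)
            f1 = f(x1)
    return (a + b) / 2

# Toy line search: the "log-likelihood along the direction" peaks at 0.3.
step = golden_section_max(lambda t: -(t - 0.3) ** 2, 0.0, 1.0)
```

Because the interval shrinks geometrically, the result is largely insensitive to the width of the initial uncertainty interval, which is consistent with the stable performance reported across the eight intervals tried.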
(ii) The second category uses a different initial guess of the model's parameter estimates; this is the subject of experiment 2. An important lesson from the whole series of experimental estimations is that applying a different way of guessing the initial Hessian matrix can make a difference in estimation performance as assessed by the specified performance measures. (g) In the context of this research, the results of experiment 3 are in general better than those of experiment 1-2 (CR = 10−4); these runs use the same stopping criterion but differ in the method of acquiring the initial Hessian matrix.

Figure 11 :
Figure 11: The results of four algorithms responding to a new way of calculating the initial Hessian matrix adopted from BHHH algorithm (exp.3) (CR = 10 −4 ).

Figure 12 :
Figure 12: The results of four algorithms responding to a new way of calculating the initial Hessian matrix adopted from BHHH-2 algorithm (exp.3) (CR = 10 −4 ).

Figure 13 :
Figure 13: Variances of log-likelihood values for four algorithms responding to a new way of calculating the initial Hessian matrix adopted from BHHH algorithm (exp.3) (CR = 10 −4 ).

Figure 14 :
Figure 14: Variances of log-likelihood values for four algorithms responding to a new way of calculating the initial Hessian matrix adopted from BHHH-2 algorithm (exp.3) (CR = 10 −4 ).
(h) The facts observed from a comparison of experiment 1-1 (CR = 10−4) with experiment 1-2 (CR = 10−4) are as follows. (1) Experiment 1-2 (CR = 10−4) shows better estimation performance than experiment 1-1 (CR = 10−4) on all three performance measures: less time per iteration, fewer iterations, and faster convergence. (2) This means that different types of stopping criteria can lead to different estimation results and performance, even in the same modeling situation. (i) Observations drawn from the comparison of experiments 1-2 (CR = 10−4), 1-2 (CR = 10−5), and 1-2 (CR = 10−6) are as follows.
(a) Here, the stopping criterion is g′(−H)⁻¹g < 10−4. (b) The main concern of experiment 2 is to recognize the importance of the initial guess of starting points and to show experimental estimation results reflecting the fact that estimation performance can be affected by the use of different starting points. (c) The parameter vector is β = [β1 = −0.01, β2 = −0.1, β3 = 0, ..., β9 = 0]′. (d) It is important to know that changing the initial starting points can result in different estimation performance, and that developing a systematic way of guessing starting points guaranteeing better operational performance would be a useful future research topic. Overall, the results obtained from the experimental estimation are as follows. (1) The starting log-likelihood values differ from those of the other experimental categories owing to the new sets of starting points. (2) A different convergence time compared to experiment 1-2 (CR = 10−4) is observed. This means that if a researcher can properly specify the parameters' characteristics based on good prior knowledge (e.g., the sign or magnitude of the parameter estimates), or can develop a scientific way of guessing starting points, estimation performance may be improved. (iii) The third experimental category employs a different way of calculating the initial Hessian matrix instead of using an identity matrix; this is carried out in experiment 3. (a) The applied convergence criterion is g′(−H)⁻¹g < 10−4. (b) The applied initial guess of the parameter vector is β = [β1 = 0, β2 = 0, β3 = 0, ..., β9 = 0]′. (c) All four algorithms used in experiment 3 ordinarily use an identity matrix as the initial Hessian matrix in the model estimation procedures (i.e., DFP, BFGS, DFP-GSM, and BFGS-GSM). (d) A new initial Hessian matrix is obtained by incorporating the method of the BHHH and BHHH-2 algorithms for computing the Hessian in the first iteration. (e) By adding the computer code associated with this calculation to the general estimation procedures of the above four algorithms, new estimation codes that differ only in the way of obtaining the initial Hessian matrix were written; they are named DFP (BHHH), BFGS (BHHH), DFP (BHHH-2), BFGS (BHHH-2), DFP-GSM (BHHH), BFGS-GSM (BHHH), DFP-GSM (BHHH-2), and BFGS-GSM (BHHH-2). (f) An important lesson from this series of experimental estimations is that a different way of obtaining the initial Hessian matrix can make a difference in estimation performance. The utility functions of the model are of the form V_car = β1·x1 + β2·x2 + β3·Acco + β4·Sex + β5·Age, V_taxi = β1·x1 + β2·x2 + β6, V_subway = β1·x1 + β2·x2 + β7, V_bus = β1·x1 + β2·x2 + β8, and V_limousine = β1·x1 + β2·x2 + β9, where x1 and x2 denote the two explanatory variables shared across alternatives and β6 to β9 act as alternative-specific constants.
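Under a utility specification of this form, the multinomial logit log-likelihood that all of the algorithms maximize can be sketched as below; this is an illustrative Python stand-in for the VBA implementation, and the array names are assumptions:

```python
import numpy as np

def mnl_log_likelihood(beta, X, y):
    """Log-likelihood of a multinomial logit model.

    beta : (K,) parameter vector (here K = 9)
    X    : (N, J, K) attributes of each of J alternatives for N choosers
    y    : (N,) index of the alternative each chooser selected
    """
    V = X @ beta                                # (N, J) systematic utilities
    V = V - V.max(axis=1, keepdims=True)        # guard against overflow
    logP = V - np.log(np.exp(V).sum(axis=1, keepdims=True))
    return float(logP[np.arange(len(y)), y].sum())

# LL(0): with beta = 0 every alternative is equally likely, so the
# log-likelihood equals N * ln(1/J) regardless of the data.
N, J, K = 50, 5, 9
X = np.zeros((N, J, K))
y = np.zeros(N, dtype=int)
ll0 = mnl_log_likelihood(np.zeros(K), X, y)
```

This also explains why all runs started from identical log-likelihood values in experiments 1-1, 1-2, and 3: with β = 0, LL(0) depends only on the sample size and the number of alternatives.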

Table 1 :
Different way of acquiring the Hessian matrix between algorithms.
H_{k+1}^DFP is the approximate inverse Hessian matrix of the DFP algorithm at iteration k+1, replacing the real inverse Hessian (H_Real^NR)^(−1) that appears in the Newton-Raphson algorithm:
H_{k+1}^DFP = H_k^DFP + (d_k d_k′)/(d_k′ y_k) − (H_k^DFP y_k y_k′ H_k^DFP)/(y_k′ H_k^DFP y_k),
where H_1^DFP = I (in the context of this research, I is the 9 × 9 identity matrix), d_k is the step taken at iteration k, equal to λ(−H_k^DFP g_k), and y_k is the change in the gradient between iterations k and k+1. H_{k+1}^BFGS is the approximate inverse Hessian matrix of the BFGS algorithm at iteration k+1, likewise replacing (H_Real^NR)^(−1) in the Newton-Raphson algorithm:
H_{k+1}^BFGS = H_k^BFGS + (d_k d_k′)/(d_k′ y_k) − (H_k^BFGS y_k y_k′ H_k^BFGS)/(y_k′ H_k^BFGS y_k) + v_k u_k u_k′,
where H_1^BFGS = I, u_k = (1/(d_k′ y_k)) d_k − (1/(y_k′ H_k^BFGS y_k)) H_k^BFGS y_k, u_k′ is the transpose of u_k, and v_k = y_k′ H_k^BFGS y_k.
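The two update formulas can be written compactly in code. A sketch under the notation above, assuming d is the step taken, y the change in gradient, and H the current approximate inverse Hessian (a Python stand-in for the VBA used in this research):

```python
import numpy as np

def dfp_update(H, d, y):
    """DFP update of the approximate inverse Hessian."""
    Hy = H @ y
    return H + np.outer(d, d) / (d @ y) - np.outer(Hy, Hy) / (y @ Hy)

def bfgs_update(H, d, y):
    """BFGS update: the DFP formula plus the correction term v * u u'."""
    Hy = H @ y
    v = y @ Hy
    u = d / (d @ y) - Hy / v
    return dfp_update(H, d, y) + v * np.outer(u, u)

# Both start from the identity (H_1 = I, here 9 x 9) and, after an update
# with consistent (d, y), satisfy the secant condition H_{k+1} y = d.
H = np.eye(9)
rng = np.random.default_rng(1)
d = rng.normal(size=9)
y = d + 0.1 * rng.normal(size=9)
H_bfgs = bfgs_update(H, d, y)
```

The secant condition in the closing comment is the defining property of both updates: each new approximation reproduces the most recently observed relationship between a step and its gradient change.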

Table 3 :
Example of computation of GSM in the case of DFP-GSM algorithm.
* Values of function evaluated newly in each iteration.

Table 4 :
Serial number assigned to each of four algorithms.

Table 5 :
Three categories used in experimental runs.

Table 16 :
The experimental results of four algorithms responding to a new way of calculating the initial Hessian matrix adopted from either the BHHH or BHHH-2 algorithm (exp. 3) (CR = 10−4).

Table 17 :
Log-likelihood values obtained from four algorithms responding to a new way of calculating the initial Hessian matrix adopted from the BHHH and BHHH-2 algorithms (exp. 3) (CR = 10−4).