Using Clustering Methods in Multinomial Logit Model for Departure Time Choice

Travellers have to make several decisions for each trip, and one of them is the choice of departure time. Discrete choice models have been employed as an approach to departure time modelling by many researchers. In this method, preparing the choice set is a primary challenge, which involves the definition of the departure periods from which the traveller selects. In this research, choice sets were formed by applying clustering methods to departure times. Afterwards, we developed Multinomial Logit (MNL) models on the different choice sets and compared the models. The data used throughout this research belonged to Mashhad City. Research results indicated that Ward's hierarchical clustering method is improper for time discretization; furthermore, the K-means clustering method is more efficient than the expectation maximization and K-medoids methods in time discretization for MNL modelling. The developed model (based on the K-means clustering method) accurately predicts departure time for 58% of persons within the test group, which reflects the effectiveness of the resulting model compared to the 36% obtained without the model.


Introduction
Travellers' choices play a substantial role in the performance of the transportation system; thus, models that predict traveller behaviour and choices are highly valuable. In other words, travel demand is the result of various decisions made by travellers, and modelling these choices is profoundly influential. One of the choices available in the travelling process is the departure time choice (DTC). It is essential to develop an appropriate DTC model because of the nature of departure time, its applications, and the complexity of time representation; moreover, the accuracy of such a model is closely related to the precise generation of the choice set.
Firstly, based on an investigation of the nature of travellers' decisions [1], choice behaviour can be classified into the following three levels: strategic, tactical, and daily. The DTC can be allocated to the tactical and daily levels; therefore, it is not a long-term choice and may change with travellers' daily conditions. Furthermore, the DTC has unique characteristics because of travel time uncertainty, and departure time can be affected by demand management policies.
Besides, the wide range of applicability of DTC is another issue that emphasizes the importance of modelling DTC. For instance, its application in peak hour estimation, evaluation of travel demand management policies, and dynamic traffic assignment is vital. The peak hour coefficient is usually used in four-step models to convert daily demand to peak hours [2]. In other words, peak hour demand is calculated with a constant coefficient (obtained from surveys), and the performance of the transportation system is assessed for peak hours. Departure time models, however, can give a better estimate of the share of demand in various time periods. Moreover, in assessing the effect of different management policies (including demand management policies), it is necessary to examine how travel demand is distributed over a day. The departure time influences the traffic peak hour and its pattern; therefore, a DTC model that is applicable to demand management policies is required [3][4][5]. Another primary application of these models is in dynamic traffic assignment studies, because accurate departure time models improve the performance of these models [6]. A further application of the DTC is in activity-based models [7,8].
The definition of the choice set for DTC models is a complicated problem because of the following considerations. First, the conversion of continuous time to discrete time and the establishment of a rational relation between temporal resolution and model complexity must be considered. Second, the relationship between choices should be taken into account, especially for short periods. Third, the person's perception of choices depending on travel time must be considered. Many people round off the time, and the rounding depends on travel time and its variability. For instance, 9:48 may be rounded to 9:50 for short trips, while it may be rounded to 10:00 for long trips.
Moreover, the accuracy of DTC modelling is mainly influenced by choice set generation, which refers to the definition of the departure periods that are considered and selected by the person. Thus, an appropriate method for choice set generation is indispensable for an accurate DTC model.
Having stated the general importance of DTC models, we now discuss the research motivation. There were two main motivations for this research: first, to develop a method for estimating the peak hour pattern based on the estimated socioeconomic characteristics of travellers at the planning horizon and, second, to develop an approach for evaluating the effect of travel demand management in four-step models.
In the first place, because the socioeconomic attributes of travellers change over the years of the planning period, the choice of departure time will change and the pattern of peak hours will vary accordingly. Therefore, using the peak hour pattern observed in the base year (based on a survey) for the planning horizon would lead to inaccurate estimation of traffic volumes. Accordingly, we built a DTC model based on observed behaviour so that peak hours can be predicted from the DTC model and the estimated socioeconomic variables of the planning horizon.
In the second place, the analysis and evaluation of demand management policies (for example, network pricing during a specific period such as peak hours, or changes in the starting and ending times of activities in administrative offices and educational centres) and their impact on DTC cannot be performed properly in four-step models. Therefore, we built a DTC model that incorporates sociodemographic variables as well as trip attributes. This model can be a tool to assess the implications of demand management for the departure time choice and consequently for the pattern of peak hours.
In this paper, we review departure time modelling and choice set generation in the literature, examine different clustering methods as tools for choice set generation, and compare the results of departure time modelling based on these techniques. The DTC modelling and choice set generation (as a component of the DTC model) are performed for the city of Mashhad, with a population of about 2.7 million people, as a case study. This paper applies the multinomial logit (MNL) model to estimate the departure time choice for home-based work trips, which constitute a significant proportion of urban trips. The remainder of this paper is organized as follows. First, we review the literature on DTC; then we provide an overview of the problem definition. In the next section we present the methods. Finally, we present the results and findings of this study, and the last section presents the conclusion.

Literature Review
Departure time models reflect the selection of a point or period at which the person begins the trip. The DTC models have been developed with various methods depending on the framework of application. The main applications of these models are using DTC as a subcomponent of dynamic assignment or an element of activity scheduling modelling, using DTC as a tool for predicting the peak hour pattern in long-term planning (as a time of day model), and using DTC as a tool for analysis and evaluation of demand management policies.

The DTC Model as a Subcomponent of Dynamic Traffic Assignment or an Element of Activity Scheduling Modelling. Lim et al. [6] investigated the logit-based combined departure time and dynamic stochastic user equilibrium assignment problem. In this study, for each OD pair, road users choose their departure times and routes according to the logit choice model, considering time-dependent travel cost and schedule delay.
Feil et al. [8] acknowledged that scheduling in activity-based travel demand modelling follows three major lines of research which are econometric models, utility-based microsimulations and computational process models (CPMs). Econometric models have some advantages and they are based upon a well-established statistical methodology. Utility-based microsimulations apply a sequential decision making process. In this approach, rather than a probability distribution, the result is always a precise solution alternative.
Adnan [9] developed an integrated model for scheduling the activities an individual is supposed to do in a given day with the representation of road network congestion effects. In recent works on activity scheduling modelling for a daily activity travel pattern, the utility specification of the model includes a component that measures the utility of activity engagement. This has been calculated through predetermined time-of-day-dependent marginal utility profiles for a particular activity.
Cantelmo et al. [10] considered the problem of jointly modelling activity scheduling and duration within a Dynamic Traffic Assignment (DTA) framework. These researchers introduced the travel choice model as a subcomponent of DTA, consisting of route, departure time, and mode choice. The study assumed that the final daily activity pattern is a function of travel time, activity duration, and the preferred arrival time at the destination. The utility function used for activity scheduling modelling in this study contained clock-based and duration-based utility.

The DTC Model as a Tool for Predicting Peak Hours Pattern in Long-Term Planning (as a Time of Day Model). Small and Biggiero [11,12] employed MNL models to select a departure time. In 1987, Small [13] proposed an ordered generalized extreme value model, in which m departure time periods were considered in a nested design. Hendrickson et al. [14] used the data collected from Pittsburgh (Pennsylvania) to examine flexibility of departure time choices for work trips. These researchers estimated the logit model for the concurrent selection of transportation mode and departure time.
Chin believes that although many studies have been conducted on transportation modes, few studies have discussed DTC and thus the importance of this problem is not understood. This study was an attempt to model DTC using the MNL model and analyse the nested logit model. Research results [15] revealed that delay, travel cost, and travel time are among the significant factors influencing the DTC for commuters. Sumi et al. [16] carried out a study to predict the response of commuters to operational features of the public transportation system for departure time and route choice.
In these applications of the MNL model, a day is divided into some periods, and the MNL model is used to select a choice from the set of alternatives (periods). In the MNL model, systematic utility functions are defined as a function of socioeconomic attributes and variables related to trip purpose. Although the MNL model overlooks the similarities or correlations among adjacent periods and violates the IID assumption, it is still popular due to its closed form and extensive use [17].
In many urban areas, the four-step process will continue to be used for macroscopic modelling of large-scale road networks. It is typical for models to start by estimating daily travel in the trip generation step, but traffic assignment is performed separately for different time periods. Typically, a four-step model uses three to five periods (for example, morning peak, mid-day, afternoon peak, night). The most common method for converting daily demand to periods in four-step models is simple factoring. These factors are typically developed from the temporal patterns of trips obtained from OD surveys. While this method is relatively easy to implement and apply, it is not sensitive to varying transportation conditions, which limits its usefulness in analysing policy changes or congestion management activities, and the time coefficients aggregated from OD surveys are somewhat biased, which reduces accuracy [3].
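The simple factoring described above amounts to multiplying a daily trip total by a fixed share for each period. The following is a minimal sketch of that arithmetic; the function name, the period names, and the factor values are illustrative assumptions and are not taken from any survey.

```python
def split_daily_demand(daily_trips, period_factors):
    """Split a daily trip total into periods using fixed time-of-day factors."""
    total = sum(period_factors.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("time-of-day factors must sum to 1")
    return {period: daily_trips * share for period, share in period_factors.items()}


# Hypothetical factors for four periods (assumed values, for illustration only).
factors = {"morning_peak": 0.25, "mid_day": 0.30, "afternoon_peak": 0.28, "night": 0.17}
demand = split_daily_demand(10000, factors)
```

Because the factors are constants, the split does not react to changes in traveller attributes or policies, which is the limitation noted in the text.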
Fujita et al. have proposed a model for estimation of these time coefficients using link volume counts and Semidynamic Traffic Assignment [2]. In that study, a model has been developed that justifies the 24-hour time coefficients under a given day-long OD demand by minimizing the least square error between hourly observed link flow and estimated link flow. However, this research is useful for operational analysis, not for strategic planning, because the analyst does not have access to link volumes in the horizon year.

The DTC Model as a Tool for Analysis and Evaluation of Demand Management Policies.
Mannering et al. [18] examined the effects of provision of traffic information on commuters' behaviour in Seattle. Ettema et al. [19] developed a behavioural DTC model based on travelling activities plans. They used the utility of participation in activities to model the DTC. Ettema et al. [20] also considered a variation of departure time as a possible response to density in microscopic traffic models. He [5] studied the effect of access to a flexible work plan on commuters' choice of departure time. Results of their investigations revealed that people with flexible work plans leave later. Bhat [21] referred to the importance of the travel mode and departure time choices and stated that although the priority of researchers modelling travel demand was travel mode choice, little attention has been paid to the DTC. He introduced a nested structure by considering the travel mode choice at the higher level and the DTC at the lower level. The MNL model was used for travel mode choice, while the ordered generalized extreme value (OGEV) model was used to consider ranks of choices and selection of departure time. In another article, Bhat [22] introduced the mixed logit model for modelling the travel mode choice and departure time in home-based social-recreational trips.

Various
The relationships between alternatives can be considered in the cross-nested logit model by placing choices in several nests and omitting the independence of subsets [23]. A structure that depends on the cross-nested logit model is the continuous cross-nested logit model, which is employed when the day can be divided into many short periods; all of the possible correlations can be considered in this modelling structure. Lemp et al. [24] used the same approach, which is highly reliable but does not allow the probability function to maintain its closed form. These researchers used the Bayesian estimation technique and reported that the model performs similarly to the continuous logit model. Their explanation of the continuous logit model is based on an application of the MNL method to a large number of discrete time choices.
Steed and Bhat [25] examined DTC for nonwork trips. The discrete choice model was estimated for home-based shopping trips and home-based social/recreational trips. Results indicated that the departure time of social/recreational trips and shopping trips is not highly flexible and is limited to certain times due to the person's or household's time limitations. In another study, Bhat and Steed [4] used the "hazard-based duration model" to develop a continuous model of DTC for urban shopping trips. The application of the continuous time model to predicting time variations in shopping trips is shown with respect to changes in sociodemographic properties as well as trip chain behaviour. Habib et al. [7] used the "hazard-based duration model" to select the departure time. Application of this model for DTC does not necessarily reflect behaviours. In such cases, the departure time is only modelled as a nonlinear regression model, and the behavioural trade-off is not reflected in the selection of departure time.
Jou et al. [26] used the reference point hypothesis of the prospect theory in selecting the departure time. Results indicated that according to the prospect theory, travellers display asymmetrical responses to loss and gain.
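where, in the utility $U_{in} = V_{in} + \varepsilon_{in}$ of Equation (1), $V_{in}$ is the deterministic term of the utility and $\varepsilon_{in}$ is the random term of the utility, which denotes the uncertainty caused by the limited power of the analyst. In the logit model, it is assumed that the random term has a Gumbel distribution [17]. Since the alternative with the higher utility has a higher chance of being selected, the probability that person $n$ selects choice $i$ from the choice set $C_n$ equals:
$$P_n(i) = \Pr\left(U_{in} \ge U_{jn},\ \forall j \in C_n,\ j \ne i\right) = \Pr\left(\varepsilon_{jn} \le \varepsilon_{in} + V_{in} - V_{jn},\ \forall j \in C_n,\ j \ne i\right).$$
If $\varepsilon_{in}$ is known, $F(\varepsilon_{in} + V_{in} - V_{jn})$ is the cumulative distribution function of $\varepsilon_{jn}$ evaluated at this bound, and, considering the independence of the $\varepsilon$ values, this probability equals the product of the cumulative distributions over all choices $j \ne i$:
$$P_n(i \mid \varepsilon_{in}) = \prod_{j \in C_n,\ j \ne i} F\left(\varepsilon_{in} + V_{in} - V_{jn}\right).$$
Since $\varepsilon_{in}$ is unknown, the probability of selecting choice $i$ equals the integral of $P_n(i \mid \varepsilon_{in})$ over all values of $\varepsilon_{in}$, weighted by its density $f$:
$$P_n(i) = \int_{-\infty}^{+\infty} \prod_{j \in C_n,\ j \ne i} F\left(\varepsilon_{in} + V_{in} - V_{jn}\right) f(\varepsilon_{in})\, d\varepsilon_{in}.$$
This integral can be written in the following closed form [33]:
$$P_n(i) = \frac{e^{V_{in}}}{\sum_{j \in C_n} e^{V_{jn}}}.$$
The deterministic term of the utility, $V_{in}$, for each choice is a function of the features of that choice and the characteristics of person $n$:
$$V_{in} = V(x_{in}, s_n; \beta),$$
where $x_{in}$ is the vector of qualities of choice $i$ perceived by person $n$, $s_n$ is the vector of the person's characteristics, and $\beta$ is the vector of the utility function's parameters. The deterministic term of the utility is determined in calibration.
For the estimation of the utility function of the logit model, the probability of the choice made by person $n$ is expressed by $\prod_{i \in C_n} P_n(i)^{y_{in}}$. If person $n$ selects choice $i$, then $y_{in} = 1$; otherwise, $y_{in} = 0$. Assuming that the choice of each person is independent of the others, the probability of the observed choices of all $N$ persons is equal to:
$$L(\beta) = \prod_{n=1}^{N} \prod_{i \in C_n} P_n(i)^{y_{in}}, \tag{8}$$
where $\beta$ is a vector including the model parameters, and Equation (8) in logarithm form is as follows:
$$LL(\beta) = \sum_{n=1}^{N} \sum_{i \in C_n} y_{in} \ln P_n(i). \tag{9}$$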
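The closed-form logit probability and the log-likelihood described above can be illustrated with a short sketch. It assumes a linear-in-parameters utility (the dot product of a parameter vector and an attribute vector), and the function names and toy inputs are hypothetical; it is not the estimation routine used by the authors.

```python
import math


def mnl_probabilities(utilities):
    """Closed-form logit probabilities for the alternatives of one choice set."""
    v_max = max(utilities)  # subtracting the maximum avoids overflow in exp()
    exp_v = [math.exp(v - v_max) for v in utilities]
    denom = sum(exp_v)
    return [e / denom for e in exp_v]


def log_likelihood(beta, observations):
    """Sum of ln P_n(chosen) over persons.

    Each observation is (attributes, chosen): `attributes` holds one attribute
    vector per alternative in that person's choice set, and `chosen` is the
    index of the observed alternative. Utilities are assumed linear in beta.
    """
    total = 0.0
    for attributes, chosen in observations:
        utilities = [sum(b * x for b, x in zip(beta, row)) for row in attributes]
        total += math.log(mnl_probabilities(utilities)[chosen])
    return total


equal_probs = mnl_probabilities([0.0, 0.0, 0.0])
ll_at_zero = log_likelihood([0.0], [([[1.0], [2.0]], 0)])
```

With all parameters at zero every alternative is equally likely, so a person with a two-alternative choice set contributes ln(1/2) to the log-likelihood. Choice sets of different sizes (two or three periods) are handled naturally because each observation carries its own list of alternatives.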
Ben-Elia et al. [27] proposed a DTC model based on the latent preferential entrance time notion. Using this model, they developed a DTC modelling framework by assuming a latent class of the preferred entrance time.
Sasic et al. [28] proposed a discrete choice model with a latent choice set. This model belongs to the family of generalized extreme value (GEV) models. The DTC model was developed for home-based commutes, and separate models have been developed for the departure time of users of private cars and public transportation.

Remarks and Conclusion.
We did not find an appropriate DTC model that is useful for four-step strategic planning. The model proposed by Fujita et al. [2] is useful for operational analysis, not for strategic planning, because the analyst does not have access to link volumes in the horizon year. Ettema [29] suggested that sociodemographic variables be incorporated within the DTC model. We developed a DTC model that incorporates sociodemographic variables as well as trip attributes. Departure time choice and activity scheduling modelling are inextricably bound up, but it should be noted that the utility function in activity scheduling is derived from activity engagement (from clock-based MU and duration-based utility). In these models the alternatives are activities. On the other hand, in many studies with econometric models the alternatives are time intervals. The DTC model proposed in our paper can be used for determining the peak hour pattern at the horizon of long-term planning. However, it is essential to predict the socioeconomic variables incorporated in the model for the planning horizon. Furthermore, this model can be a tool for assessing the implications of demand management for the departure time choice and consequently for the pattern of peak hours.

Problem Definition.
Several approaches to DTC modelling are available in the research literature. This study investigates the problem using discrete choice models. Time discretization (formation of the choice set from which the person picks an alternative) is particularly important in this approach. In this research, discretization was carried out using different clustering methods, and the MNL models were built on choice sets consisting of these clusters. In other words, the MNL modelling procedure is repeated for each clustering type, and the resulting models are compared. Although clustering and data mining have been employed in traffic engineering and transportation planning [30][31][32], no study has investigated the application of clustering in discrete choice. In the following, a brief description of the MNL model is presented, followed by general notions of clustering.

Multinomial Logit Modelling.
In discrete choice models, each choice is favourable to a certain degree for a person, and the alternative with the highest level of utility will be selected. Equation (1) determines the utility function of person $n$ that corresponds to alternative $i$ in the choice set $C_n$:
$$U_{in} = V_{in} + \varepsilon_{in}. \tag{1}$$
The EM algorithm refines its estimates of the probability distribution parameters (the clusters) by going through the following steps iteratively:

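Step 1 (Expectation). Assign each member, $x_i$, to cluster $C_k$ with the following probability:
$$P(x_i \in C_k) = p(C_k \mid x_i) = \frac{p(C_k)\, p(x_i \mid C_k)}{p(x_i)},$$
where $p(x_i \mid C_k) = N(m_k, E_k(x_i))$ has a normal distribution with a mean of $m_k$. In other words, this step calculates the probability of membership of $x_i$ in each cluster.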

Step 2 (Maximization). Estimates of the model parameters are improved using the estimates of probabilities resulting from the previous step. For instance,
$$m_k = \frac{1}{n} \sum_{i=1}^{n} \frac{x_i\, P(x_i \in C_k)}{\sum_{j} P(x_i \in C_j)},$$
where $P(x_i \in C_k)$ is the membership probability obtained in Step 1 and $n$ is the number of members.
Since the present research has focused on DTC modelling, clustering was only carried out on departure time, and from the methods mentioned above, two partitioning methods (K-medoids and K-means), one hierarchical approach, and one model-based method have been used.
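The two EM steps above can be sketched for one-dimensional departure times as follows. This is a minimal illustration that uses the standard Gaussian-mixture updates (weighted mean, standard deviation, and mixing weight), which may differ in detail from the formulation in [35]; the function names, starting values, and toy data are assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def em_1d(data, means, sds, weights, iterations=50):
    """EM for a one-dimensional Gaussian mixture; returns the fitted parameters."""
    means, sds, weights = list(means), list(sds), list(weights)
    k = len(means)
    resp = []
    for _ in range(iterations):
        # Expectation: membership probability of each x_i in each cluster (Bayes' rule).
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # Maximization: re-estimate the parameters from the membership probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor keeps a component from collapsing
            weights[j] = nj / len(data)
    return means, sds, weights, resp


# Toy departure times in decimal hours (assumed values, two well-separated groups).
times = [6.0, 6.2, 6.5, 6.8, 7.0, 12.0, 12.3, 12.5, 12.8, 13.0]
fitted_means, fitted_sds, fitted_weights, memberships = em_1d(
    times, means=[5.0, 11.0], sds=[1.0, 1.0], weights=[0.5, 0.5]
)
```

Each row of `memberships` sums to one, so a departure time can be assigned to the cluster with the largest membership probability when hard period boundaries are needed.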

Case Study
The data used for this research were obtained from Mashhad, which is the second largest city of Iran. The data included travel information and socioeconomic characteristics of travellers and were collected by interviewing 1.56% of the population. The city's first O/D survey was conducted in 1994 and was updated with another O/D survey in 2008, when the city had a population of over 2.7 million. Figures 1 and 2 illustrate some descriptive characteristics of the sample, and Figure 3 shows the temporal distribution of trips in a day, aggregated by gender.
The trips under study included commutes for work or education between 2 A.M. and 1 P.M. (13:00). It was also assumed that travellers below the age of 10 obey their parents' decisions, and thus only the information of persons older than ten was used in this research. More than 12000 trips were recorded in detail; we carried out modelling with 70% of the data, and the remaining 30% was used to validate the model.

Results
In clustering, it is essential to determine the number of clusters, because it influences the results, so we considered the same number of clusters in all clustering methods. The EM (expectation-maximization) method resulted in six clusters, and the scree plot (which is mostly used with the K-means method) also confirmed six clusters. Figure 4 shows the sum of squared errors (SSE) inside clusters versus the number of clusters. Hence, the number of clusters was assumed to be six. We used the R software [36] in this research to cluster the data.
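The SSE-versus-number-of-clusters curve used to choose the number of clusters can be sketched with a small one-dimensional K-means routine. This is an illustration only: the authors worked in R, the starting means here are spread deterministically over the sorted data rather than drawn at random, and the function name and toy data are assumptions.

```python
def kmeans_1d(values, k, iterations=100):
    """K-means on a list of numbers; returns (means, labels, sse).

    Labels refer to the sorted copy of `values`. Requires k <= len(values).
    """
    data = sorted(values)
    # Deterministic start (assumption): initial means spread evenly over the sorted data.
    means = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]

    def assign(current_means):
        return [min(range(k), key=lambda j: abs(x - current_means[j])) for x in data]

    for _ in range(iterations):
        labels = assign(means)
        new_means = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous mean.
            new_means.append(sum(members) / len(members) if members else means[j])
        if new_means == means:
            break
        means = new_means
    labels = assign(means)
    sse = sum((x - means[lab]) ** 2 for x, lab in zip(data, labels))
    return means, labels, sse


# Toy departure times in decimal hours forming three groups (assumed values).
times = [6.0, 6.1, 6.3, 7.4, 7.5, 7.6, 9.9, 10.0, 10.2]
sse_by_k = {k: kmeans_1d(times, k)[2] for k in range(1, 5)}
```

In this toy data the SSE drops sharply up to three clusters and only marginally afterwards, which is the kind of bend that a scree plot is read for.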
The parameter vector $\beta$ is estimated by maximizing the log-likelihood function. In 1974, McFadden indicated that the log-likelihood is a concave function for a utility function with linear parameters, and numerous statistical software products are available for estimating these models [33]. In this research, the MNL models were estimated in Nlogit 5.0 [34].

Clustering.
Clustering refers to the process of grouping data into clusters so that the similarity between the data inside each cluster is maximized and the similarity between different clusters is minimized. Different clustering methods include the following: partitioning, hierarchical, density-based, grid-based, model-based, and constraint-based methods and large data clustering methods [35].
We will briefly describe the four clustering methods employed in this research. In the K-means method, k members are first selected at random and taken as the clusters' means. Then each object is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster's mean. Afterwards, a new mean value is calculated for each cluster, and these two steps are repeated until the clusters stop changing. Frequently, the SSE measure, which is defined as follows, is used to assess similarity [35]:
$$E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2. \tag{10}$$
In Equation (10), E is the sum of squared errors of all the dataset members, $p$ is a point in the space and denotes a member, and $m_i$ stands for the mean of cluster $C_i$. The K-means clustering method is sensitive to outliers because these points can distort the distribution of the data. To solve this problem, the k-medoids method is employed: instead of taking the mean value of the members of a cluster as the reference point, one member is taken as the cluster's representative. Each member is assigned to the cluster whose representative it is most similar to. Members are partitioned by minimizing the sum of dissimilarities between them and their cluster's reference point. Therefore, the absolute error value is defined as follows:
$$E = \sum_{j=1}^{k} \sum_{p \in C_j} |p - o_j|. \tag{11}$$
In Equation (11), E denotes the sum of absolute error values for all dataset members, $p$ is a point in space and indicates a member of cluster $C_j$, and finally $o_j$ represents cluster $C_j$.
The hierarchical clustering methods group data into a tree of clusters, and these methods can be classified into two groups: agglomerative and divisive. The weakness of hierarchical clustering methods is that they are unable to apply modifications after splitting, merging, or forming a new cluster.
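The k-medoids idea described above, with a cluster represented by one of its own members and the fit measured by the sum of absolute errors, can be sketched as follows. The sketch uses a simple alternating scheme (assign to the nearest medoid, then re-pick the medoid of each cluster), which is a simplification and not necessarily the variant used by the authors; the function name, the deterministic start, and the toy data are assumptions.

```python
def kmedoids_1d(values, k, iterations=100):
    """Alternating k-medoids on a list of numbers; returns (medoids, absolute_error)."""
    data = sorted(values)
    # Deterministic start (assumption): medoids spread evenly over the sorted data.
    medoids = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - medoids[j]))
            clusters[nearest].append(x)
        # The new medoid is the member with the smallest total distance to its cluster.
        new_medoids = [
            min(c, key=lambda cand: sum(abs(x - cand) for x in c)) if c else medoids[j]
            for j, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    error = sum(min(abs(x - m) for m in medoids) for x in data)
    return medoids, error


# Toy departure times with one late outlier at 13.0 (assumed values).
times = [6.0, 6.1, 6.3, 7.4, 7.5, 7.6, 13.0]
medoids, abs_error = kmedoids_1d(times, 2)
```

In this toy data the second medoid stays inside the 7.4-7.6 group, whereas the mean of that cluster (8.875) would be pulled towards the outlier, which illustrates the robustness argument made in the text.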
Model-based methods maximize the fit of the existing data to some mathematical model. These methods are mainly based on the assumptions that the data are generated by a combination of several probability distributions and that each cluster can be represented by a parametric probability distribution. According to these assumptions, the goal is to estimate the distribution parameters that give the optimum fit to the data. The EM method is an algorithm that searches for estimates of these distribution parameters.
In the clustering carried out by the hierarchical method, clusters overlap, including clusters 6 and 5 (pink triangle and turquoise rhombus). Independence of choices and no overlap are requirements for MNL models, and they are not met by the hierarchical clustering method. Hence, we did not use the hierarchical clustering method in MNL modelling.
After discretizing time with the clustering methods described earlier, the discrete choice modelling is carried out. First, a choice set should be defined; in this research, the rivals of each choice are its adjacent alternatives. In other words, the decision-maker starts the trip one or two periods earlier or later, because of the sequential nature of the choices. We considered the following two states, and a schematic view of these states is depicted in Figure 7: (1) Defining the choice set as a 6-membered set (see Figure 7(a)), which means that all periods are feasible for travellers to start the trip. (2) Defining the choice set as a 2-membered set (see Figure 7(b)) or a 3-membered set (see Figures 7(c)-7(e)). A choice set with two members is used for the first and last alternatives: a traveller who chooses the first or last period compares it with the next or the previous period, respectively. A choice set with three members is used for the middle alternatives: a traveller who chooses a middle period compares it with the previous period and the next period. The MNL modelling was carried out for both states (the 6-membered and the 2-/3-membered sets). We examined different combinations of independent variables to form the utility functions based on statistical indexes and our intuition, but we present only the best model developed in Tables 2, S1, and S2.
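The two choice set definitions described above can be sketched in a few lines. The function names and the 0-based period indexing are illustrative assumptions; the periods themselves would come from the clustering step.

```python
def full_choice_set(n_periods=6):
    """State (1): every period is available to every traveller."""
    return list(range(n_periods))


def adjacent_choice_set(chosen, n_periods=6):
    """State (2): the chosen period (0-based) competes only with its neighbours.

    The first and last periods get a 2-membered set; middle periods get a
    3-membered set.
    """
    if not 0 <= chosen < n_periods:
        raise ValueError("chosen period is outside the range of periods")
    return [p for p in (chosen - 1, chosen, chosen + 1) if 0 <= p < n_periods]


first_set = adjacent_choice_set(0)
middle_set = adjacent_choice_set(3)
last_set = adjacent_choice_set(5)
```

Note that in this construction the choice set depends on the observed choice, so the sets differ from person to person even though the underlying periods are the same.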
Results of model validation revealed that using the 2-/3-membered sets in the DTC modelling yielded better results than using the 6-membered sets. Therefore, we estimated the coefficients of the utility functions of the MNL models for the 2-/3-membered sets of the different clustering methods. Table 3 presents some criteria of MNL modelling based on the various clustering methods, and Table 2 shows the utility function coefficients for the model on the choice set resulting from the K-means method, with the corresponding t-test values (see Tables S1 and S2 in the Supplementary Material for the models on the choice sets resulting from the K-medoids and EM methods). In MNL modelling, the variables gender, age, profession, vehicle type, vehicle ownership, education level, and travel time are significant (see Table S3 in the Supplementary Material for the description of the variables used in modelling).
As we expected, the results of coefficient (parameter) estimation illustrated in Table 3 indicate that the traveller's job has a significant effect on departure time choice. This is predictable because start times and activity durations differ between jobs. Moreover, the effect of the selected mode on departure time choice is unsurprising. Public transportation users are more likely to depart in the period between 7.28 and 8.30. Pedestrians and bicycle riders are more reluctant to depart before 6.17 than in the period between 6.17 and 10.75, and they are more prone to leave their homes between 10.75 and 13.00.
Clustering of the data with the four methods led to the formation of different clusters. The start and end times of each cluster in the different clustering methods are presented in Table 1. Figure 5 illustrates the data scatter from 2 A.M. to 1 P.M. (13:00); clusters are shown by different colours and symbols in this chart. As can be seen, the K-medoids method divided the peak period into more clusters, and shorter clusters formed in the peak hours. Therefore, the variability of the number of members per cluster is smaller than in the other methods (see Figure 6).
Besides, the first cluster in the EM method is smaller than the first cluster of the others, which implies other characteristics of pre-peak-hour trips. In the K-means method, the durations of the different clusters (except for the first cluster) are close to each other, but this method displays the highest standard deviation and range of the number of members. Figure 5 depicts the overlap of clusters 2 and 1 (the red triangle with a black circle) and of clusters 3 and 2 (green plus and red triangle) in the hierarchical clustering.

Discussion
Long travel times favour the choice of periods before 6.17 for morning work shifts and the period 9.50-10.75 for afternoon shifts, as shown by the positive coefficients in Table 2. The positive value of the "travel time" coefficient also shows that a longer travel time increases the likelihood of an earlier departure, owing to the importance of on-time arrival for work trips.
Since the data in each cluster differ between clustering methods, it is not possible to use the likelihood ratio test to compare the models. Thus, we used the nonnested hypothesis test method, which compares the adjusted likelihood ratio indices of two models: models H and L denote the models with the higher and lower $\bar{\rho}^2$ values, respectively, and $\bar{\rho}_H^2$ and $\bar{\rho}_L^2$ are the adjusted likelihood ratios of the models with the higher and lower values. The test yields a significance level for each pair of models. Table 4 illustrates the results of the different hypothesis tests. According to this test, the model resulting from K-means clustering is significantly better than the models produced by the other methods.
As 70% of the data was used for modelling, the remaining 30% was used to validate the modelling outcomes and examine the performance of the models. In other words, we used 8727 records of the data for modelling and 3740 records for validating the models. To validate the models and determine their predictive power, the observed choice should be compared to the alternative selected by the model, and the degree of compliance should be determined using the following measure [35]:
$$\text{Count } R^2 = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Observations}}. \tag{15}$$
We adopted two approaches to identify the model's selected alternative. The first assumed the choice with the highest probability to be the chosen alternative, whereas the second determined the chosen alternative by using the Monte Carlo simulation technique.
Results of the different clustering methods indicated that the hierarchical method was not suitable for MNL modelling. After building the MNL models based on the results of the clustering methods, some criteria were used to compare the models. Results indicated that the MNL model based on the K-means results displays a better fit and a higher prediction power than the other models. The prediction potential of the MNL model based on the K-means clustering method was 64.9% and 58.6% with the first and second approaches, respectively. The comparison between these values and 36.3% (the expected share of correct predictions without a model) reveals the effectiveness of this model. In using the developed models, it is critical to consider the stability of the clusters over the time horizon, which depends on the durability of activity patterns and the starting times of activities.
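The share of correct predictions and the two ways of picking the model's alternative can be sketched as follows. This is an illustration under assumptions: the Monte Carlo approach is interpreted here as drawing one alternative per person from the predicted probabilities and averaging the hit rate over repeated runs, and the function names, the seed, and the toy probabilities are hypothetical.

```python
import random


def predict_argmax(probabilities):
    """First approach: the alternative with the highest predicted probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def predict_monte_carlo(probabilities, rng):
    """Second approach: draw one alternative according to the predicted probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


def count_r2(predicted, observed):
    """Share of persons whose predicted alternative equals the observed one."""
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(observed)


# Toy predicted probabilities for three persons (two 2-member sets, one 3-member set).
probs = [[0.7, 0.3], [0.2, 0.5, 0.3], [0.1, 0.9]]
observed = [0, 2, 1]

argmax_rate = count_r2([predict_argmax(p) for p in probs], observed)

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
runs = [
    count_r2([predict_monte_carlo(p, rng) for p in probs], observed)
    for _ in range(2000)
]
monte_carlo_rate = sum(runs) / len(runs)
```

With the first approach the hit rate is a single number, whereas with the second it varies from run to run, which is why a mean over iterations is reported.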

Data Availability
The departure time and socioeconomic data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
To calculate the measure mentioned above with the first approach, the probability of each alternative is first determined from the model, and then the choice with the maximum probability is taken as the predicted alternative and compared to the observed choice. In the second approach, we calculated the probabilities and used Monte Carlo simulation to determine the selected alternative; then we compared it to the observed choice. Monte Carlo simulation is an iterative process, and we present the mean over the iterations in Table 5 for the different models. On the other hand, if no model exists and we assume that the probabilities of the choices in each choice set are equal, the expected share of correct predictions will be 36.2% (of the 3740 data records, 700 records belong to the 2-choice set, and 3040 records belong to the 3-choice set).

Conclusions
The present research was carried out with the aid of discrete choice models to develop a DTC model for work trips. To this end, we employed different clustering methods for time discretization, and we compared the results of MNL modelling for them. The clustering methods used in this research include the K-means, K-medoids, EM, and hierarchical methods.
Although time discretization is highly significant in these models and forms the choice set, previous studies were conducted based on expert opinions on discretization, and no systematic method was proposed. The present research is an attempt to examine the performance of different clustering methods in time discretization and their effects on MNL modelling.
In MNL modelling, due to the nature of time, the choice set was designed in a way that each choice was competing with its adjoining alternatives. We compared this approach to choice set generation with an alternative set containing all available members, and the higher effectiveness of the proposed method was demonstrated.