Daily Commute Time Prediction Based on Genetic Algorithm

This paper presents a joint discrete-continuous model for activity-travel time allocation that employs the ordered probit model for departure time choice and the hazard model for travel time prediction. A genetic algorithm (GA) is employed to optimize the parameters of the hazard model. The joint model is estimated using data collected in Beijing in 2005. With the developed model, departure and travel times for daily commute trips are predicted, and the influence of sociodemographic variables on activity-travel timing decisions is analyzed. The whole time allocation for the typical daily commute activities and trips is then derived. The results indicate that the discrete choice model and the continuous model match well in the calculation of the activity-travel schedule. The results also show that the genetic algorithm contributes to the optimization and thus to the high accuracy of the hazard model. The developed joint discrete-continuous model can be used to predict the agenda of a simple daily activity-travel pattern containing only work, and it offers potential for transportation demand management policy analysis.


Introduction
The time allocation of individuals for trip making is an important determinant of the temporal pattern of traffic demand on a transportation network. An analysis of individual time allocation decisions is, therefore, important for the practical work of transportation planning and management as well as for the theoretical work on travel behavior analysis and modeling. Practically, understanding individuals' time allocation decisions is essential for (a) planning the development and construction of new transportation infrastructure by providing predicted temporal travel demands, (b) examining the potential responses to improved operational measures (such as real-time information), and (c) assessing the effectiveness of time-specific transportation demand management policies, such as the compressed working week [1], staggered shifts [2], road tolling [3], and other similar strategies [4, 5]. Theoretically, understanding time allocation decisions will not only facilitate the efforts toward developing a comprehensive full-scale model of daily activity patterns but also provide useful insights into the impact of sociodemographic variables and time-space constraints on individual dimensions of activity behavior. Time allocation has therefore been a focal issue in regional and transportation science since the 1970s [6].
In the time dimension, a daily activity-travel pattern includes the timing and duration of all the activities and trips in a day. The need to analyze the daily activity pattern makes it essential to consider time of day and activity-travel durations together. To date, however, timing and duration have largely been treated separately in the literature [7], and many existing studies considered only part of the daily activity agenda. For example, Vovsha and Bradley [8] focused on the departure-from-home and arrival-back-home time decisions. Fujii and Kitamura [9] and Hamed and Mannering [10] examined the time allocation of postwork activities. Bhat and Singh [11] and Habib et al. [12] modeled the departure time of daily commute trips without involving the commute travel time. This study aims to address this issue by developing a joint daily time allocation model to predict a typical daily activity-travel schedule.
One important objective of transportation planning is to relieve congestion by improving the level of service during the peak hours on the transportation network. Peak-period traffic demand, the source of congestion, is largely generated by commute trips. For example, based on a survey conducted in Beijing, China, in 2005, over 32% of all the trips in both the morning and evening peak periods are commute trips. Commute trips are therefore at the core of many recent studies, such as Habib et al. [12], Zhang et al. [13], and Bhat and Singh [11].
As stressed above, the study of commute trips and daily time allocation is of great importance for learning about travel behavior during the peak hours as well as for obtaining the daily activity pattern, both of which are essential components of transportation planning and management. However, to the authors' knowledge, no study has considered the overall daily time allocation of the commute activity-travel pattern as a whole and developed a model system to predict it. This paper therefore focuses on predicting the timing and duration of daily commute trips, with the expectation that a commuter's typical travel schedule can be obtained from the developed model. Using data from a 2.5% sample household survey in Beijing, China, a joint discrete-continuous model system for the prediction of daily commute time allocation was developed and estimated, including ordered probit models for departure time analysis and hazard models for travel time forecasting.
The choice of parameters is of great importance for the estimation efficiency and prediction accuracy of the models. Because many potential factors affect a traveler's time allocation decisions, genetic algorithms (GAs) are employed for parameter optimization. As a heuristic algorithm, GA has been successfully applied to various optimization problems [14, 15].
The remainder of this paper is organized as follows. Section 2 presents the literature review on activity-travel time allocation in general. In Section 3, the joint discrete-continuous time allocation model is built, in which the ordered probit method, the AFT model, and GA are employed. Section 4 predicts the commute activity-travel agenda using the developed model. The paper closes with some overall conclusions and a discussion of future research directions.

Relevant Literature
With respect to the two dimensions of time allocation, timing and duration, existing studies can be classified into three categories: those that looked only at departure time, those that looked only at travel time, and those that looked at both.
For the first direction, timing, the major method employed is the discrete choice model. For example, Bowman and Ben-Akiva [16] modeled the departure and arrival time of daily trips using the multinomial logit (MNL) model. Small
For the second aspect (duration analysis), the most widely used model is the hazard model. The hazard model recognizes the dynamics of travel or activity durations by considering the conditional probability of event termination, usually as a function of covariates (explanatory variables) [12]. Bhat [20, 21]
Compared with the first two categories of studies, those using the joint analysis of timing and duration are more helpful for modeling the daily activity schedule, because they give insight into the influence and connection among duration and time-of-day choice as well as among activities and trips
There is also a kind of conjunct model used in the analysis of timing and duration, namely, the discrete-continuous choice model. Pendyala and Bhat [27] examined the relationship between time-of-day choice and activity episode duration using discrete-continuous simultaneous equations, but their study addressed neither daily time allocation simulation nor commute trips. Similar discrete-continuous models were employed by Habib et
Moreover, the study of Pendyala and Bhat [27] suggests that time of day and activity duration are only loosely related for the commuter sample. Therefore, travel departure time and travel duration, instead of activity duration, are modeled in this paper. The work duration can then be calculated from the arrival time at the work location and the departure time of the next activity after work.

Analysis of the Commute Activity-Travel Agenda
In this paper, work location refers to the usual work location for a worker and the usual school location for a student. As shown in Figure 1, a typical daily commute activity-travel pattern consists of the home-to-work commute, the work activity, and the work-to-home commute. The key time and duration values in this pattern are, in order, the home-to-work (morning) departure time $Dt_1$, the home-to-work travel time $T_1$, the arrival time at the work location $A_1$, the duration of work $D_1$, the work-to-home (evening) departure time $Dt_2$, the work-to-home travel time $T_2$, and the arrival time at home $A_2$. Within this pattern, the departure times $Dt_1$ and $Dt_2$ and the travel times $T_1$ and $T_2$ are the most important. Once their values are known, the other three quantities $A_1$, $D_1$, and $A_2$ follow directly from the following equations:

$$A_1 = Dt_1 + T_1, \qquad D_1 = Dt_2 - A_1, \qquad A_2 = Dt_2 + T_2. \tag{3.1}$$
This study employs the ordered probit model, which belongs to the discrete choice models, for departure time forecasting and the hazard model, which belongs to the continuous models, for travel time analysis. Four models are developed: a home-to-work departure time choice model, a home-to-work travel time estimation model, a work-to-home departure time choice model, and a work-to-home travel time estimation model. These models predict the values of $Dt_1$, $T_1$, $Dt_2$, and $T_2$. The values of $A_1$, $D_1$, and $A_2$ are then calculated based on (3.1). Figure 2 is a schematic representation of the entire modeling process.
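The derivation of the remaining agenda items from the four predicted quantities can be sketched in a few lines. This is a minimal illustration, assuming that clock times are stored as minutes after midnight; the function names and the sample values are hypothetical and are not taken from the survey.

```python
def commute_agenda(dt1, t1, dt2, t2):
    """Derive A1, D1, and A2 from Dt1, T1, Dt2, and T2.

    Clock times (dt1, dt2) are minutes after midnight (an assumed encoding);
    travel times (t1, t2) are minutes.
    """
    a1 = dt1 + t1   # arrival time at the work location
    d1 = dt2 - a1   # duration of work
    a2 = dt2 + t2   # arrival time at home
    return a1, d1, a2


def hhmm(minutes):
    """Format minutes after midnight as HH:MM."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# Hypothetical commuter: leaves home 07:30, travels 20 min,
# leaves work 17:00, travels 25 min.
a1, d1, a2 = commute_agenda(dt1=7 * 60 + 30, t1=20, dt2=17 * 60, t2=25)
print(hhmm(a1), d1, hhmm(a2))
```

Because the departure time models predict a time segment rather than a point in time, applying the same arithmetic to the segment bounds would give a range for each derived quantity.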

Data
This study uses data from a large-scale daily travel survey conducted in Beijing, China, in 2005. A face-to-face interview was given to a sample of 54,398 households, and activity/travel information of all household members on one particular working day was collected. The study area is 1,368 km² covering all 18 districts of Beijing and had a population of more than 30 million in 2005. In addition to weekday OD information, the survey also collected information regarding the household (size, car ownership, home location, monthly income, and mobility), people (age, gender, driving license, and occupation), and trips (departure time, arrival time, purpose, mode, transit path, etc.). With records containing missing values eliminated, our final sample consists of 37,842 commute trips of 37,842 individual workers/students from 28,382 households.
Based on a preliminary correlation test, 15 sociodemographic and trip characteristics variables were selected from the survey, shown in Table 1.
The statistics of the variables based on the sample data are shown in Table 2.

Departure Time Choice Model
The reported home-to-work morning departure times and work-to-home evening departure times in our sample cover the periods 4:00-12:00 and 15:00-22:00, respectively. Moreover, our sample data show that the home-to-work morning and the work-to-home evening peak hours in Beijing are 6:00-9:00 and 16:00-19:00. In order to reduce the number of alternatives in the models' choice sets, we divided departure times into one-hour segments in peak periods and segments of two or three hours in off-peak periods. Table 3 shows the discrete alternatives for the home-to-work and work-to-home departure time choices. As shown in Table 3, the alternatives of the departure time choice models are naturally ordered time periods. The MNL model, which is commonly used in departure time modeling, would fail to account for the ordinal nature of the dependent variable and would suffer from the independence of irrelevant alternatives (IIA) problem. This study therefore employs the ordered multiple choice model for departure time modeling instead.
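The discretization of departure times into ordered alternatives can be illustrated as follows. This is only a sketch: the segment boundaries below are an assumption that is consistent with the description (one-hour peak segments, longer off-peak segments, five alternatives per model) and do not reproduce Table 3; the names are hypothetical.

```python
from bisect import bisect_right

# Assumed segment boundaries in hours of the day (NOT taken from Table 3).
MORNING_EDGES = [4, 6, 7, 8, 9, 12]       # peak 6:00-9:00 in one-hour steps
EVENING_EDGES = [15, 16, 17, 18, 19, 22]  # peak 16:00-19:00 in one-hour steps


def departure_alternative(hour, edges):
    """Return the 1-based index j of the segment [edges[j-1], edges[j])
    that contains the departure time `hour` (a decimal hour, e.g. 7.5)."""
    if not edges[0] <= hour < edges[-1]:
        raise ValueError("departure time outside the modeled period")
    return bisect_right(edges, hour)


print(departure_alternative(7.5, MORNING_EDGES))   # 07:30 falls in the third segment
print(departure_alternative(19.25, EVENING_EDGES)) # 19:15 falls in the fifth segment
```

The returned index preserves the temporal order of the segments, which is the property that the ordered choice model exploits.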
The ordered multiple choice model assumes the following relationship:

3.2
where $P_n(j)$ is the probability that alternative $j$ is chosen as the departure time of trip $n$ ($n = 1, \ldots, N$), $\alpha_j$ is an alternative-specific constant, $X_n$ is a vector of attributes of trip $n$, $\beta_j$ is a vector of estimable coefficients, and $\theta$ is a parameter that controls the shape of the probability distribution $F$. $F$ can therefore take various shapes depending on the value of $\theta$. The ordered probit model, which assumes a standard normal distribution for $F$, is the most commonly used. The ordered probit model has the following form:

3.3
where $\Phi(\cdot)$ is the cumulative standard normal distribution function. For all the probabilities to be positive, the constants $\alpha_j$ must be strictly increasing in $j$.
The estimation results of the home-to-work and the work-to-home departure time choice models are shown in Table 4.
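For readers who want to see how ordered probit choice probabilities are computed, the following sketch uses the standard textbook form, in which the probability of alternative j is the difference of the standard normal distribution function evaluated at two adjacent thresholds. It is an assumption that this matches the paper's specification exactly; the coefficients and thresholds in the example are hypothetical, not the estimates in Table 4.

```python
import math


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x, beta, thresholds):
    """Return the probabilities of the J = len(thresholds) + 1 ordered
    alternatives, using the textbook form
        P(j) = Phi(a_j - beta'x) - Phi(a_{j-1} - beta'x),
    with a_0 = -infinity and a_J = +infinity."""
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    index = sum(b * v for b, v in zip(beta, x))
    cdf = [0.0] + [std_normal_cdf(a - index) for a in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Hypothetical trip attributes, coefficients, and thresholds (five alternatives).
probs = ordered_probit_probs(x=[1.0, 2.0], beta=[0.1, -0.2],
                             thresholds=[-1.0, 0.0, 0.5, 1.2])
predicted = max(range(len(probs)), key=probs.__getitem__) + 1
print(probs, predicted)
```

The check on the thresholds mirrors the requirement that the constants be strictly increasing, which keeps every probability positive.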

Mathematical Problems in Engineering
The estimation results indicate that high-income travelers are more likely to depart from home or from the work location later than low-income travelers. Older persons tend to have earlier departure times than younger ones. Commuters whose occupations are administration or health care are more likely to depart from home earlier but to return home from the work location later than those with other occupations. Students and teachers have a high probability of early departure times both from home and from school. Workers and servers tend to depart from the workplace late. Concerning gender, men are more likely than women to leave home earlier but to leave work later. Regarding travel modes, commuters who walk or drive have late home-to-work and work-to-home departure times. Bikers tend to leave home earlier, while commuters taking the bus depart from the workplace later. Long-distance trips tend to start later from home but earlier from the work location. Workers are more likely to leave home later than students.

AFT Model and KM Estimator
According to the travel survey data in Beijing, the average travel time for the home-to-work commute is 19.36 minutes, with a maximum of 205 minutes and a minimum of 1 minute; the average duration for the work-to-home commute is 18.36 minutes, with a maximum and a minimum of 168 minutes and 1 minute, respectively.
Treating travel times as naturally continuous variables, one can use the hazard model to predict both the home-to-work and the work-to-home travel times. Hazard-based duration models are ideally suited to modeling duration data [20, 21], such as travel time and activity duration. The hazard (also called the hazard rate) represents the termination rate of the duration.
Let $T$ be a nonnegative random variable representing the travel time. The hazard at time $t$ on the continuous time scale, $h(t)$, is defined as the instantaneous probability that the travel duration under study will end in an infinitesimal time period $\Delta t$ after time $t$, given that the duration has not ended before time $t$. A mathematical definition of the hazard function is as follows:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}. \tag{3.4}$$

Let $f(\cdot)$ and $F(\cdot)$ be the density and the cumulative distribution function of $T$, respectively. Then the probability of ending in an infinitesimal interval of length $\Delta t$ after time $t$ is $f(t)\,\Delta t$, and the probability that the process lasts for at least $t$ is given by the survival function (3.5):

$$S(t) = P(T \ge t) = 1 - F(t). \tag{3.5}$$

Thus, the hazard function can be further expressed as

$$h(t) = \frac{f(t)}{S(t)} = \frac{f(t)}{1 - F(t)}. \tag{3.6}$$

The distribution of the hazard can be assumed to take one of many parametric forms or to be nonparametric. Because the distribution of the travel time is unknown, a nonparametric method, the Kaplan-Meier (KM) product limit estimator, is used to explore the covariate effects and the potential distribution. As a nonparametric method, the KM estimator produces an empirical approximation of survival and hazard but takes hardly any covariate effects into consideration. It is similar to an exploratory data analysis. Denoting the distinct failure times of the individuals as $t_1 < t_2 < \cdots < t_m$, the KM estimator of survival at time $t_i$ is computed as the product of the conditional survival proportions:

$$\hat{S}(t_i) = \prod_{k=1}^{i} \frac{r(t_k) - d(t_k)}{r(t_k)}, \tag{3.7}$$

where $r(t_k)$ is the total number of trips at risk of ending at $t_k$ and $d(t_k)$ is the number of trips ending at $t_k$. Using the KM estimator, the survival function curves of the home-to-work and the work-to-home travel times are estimated; they are shown in Figures 3 and 4, respectively. The results indicate that the survival probability decreases with travel time, which suggests that an accelerated failure time (AFT) model with a Weibull or Exponential distribution should be employed. Therefore, the AFT model is developed to examine the linkages between travel time and the covariates relating to the individual and the household.
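The product-limit computation can be made concrete with a short sketch. It assumes that every trip in the sample is observed to end (no censoring), which seems reasonable for reported travel times but is an assumption here; the sample durations are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations):
    """Return [(t_k, S_hat(t_k))] for the distinct ending times t_k.

    Assumes all durations are fully observed (no censored trips), so the
    number at risk at t_k is the number of trips lasting at least t_k.
    """
    ended = Counter(durations)      # d(t_k): trips ending at each t_k
    at_risk = len(durations)        # r(t_1): all trips are at risk initially
    survival = 1.0
    curve = []
    for t in sorted(ended):
        d = ended[t]
        survival *= (at_risk - d) / at_risk   # conditional survival proportion
        curve.append((t, survival))
        at_risk -= d                # trips that ended leave the risk set
    return curve


# Hypothetical travel times in minutes.
for t, s in kaplan_meier([5, 10, 10, 20]):
    print(t, round(s, 3))
```

Without censoring, the estimate at each time equals the share of trips that last longer than that time, which is a convenient sanity check.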
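The AFT model is one of the popular parametric forms of the hazard model. It permits the covariates to affect the duration dependence. The survival function of the AFT model is given as

$$S(t \mid X) = S_0\bigl(t \exp(-\beta' X)\bigr), \tag{3.8}$$

where $S_0(\cdot)$ is the baseline survival function. The corresponding hazard function is

$$h(t \mid X) = h_0\bigl(t \exp(-\beta' X)\bigr) \exp(-\beta' X), \tag{3.9}$$

where $h_0(\cdot)$ is the baseline hazard function. The AFT model can be expressed as a log-linear model:

$$\ln t = \beta' X + \varepsilon. \tag{3.10}$$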
Assuming the random error ε follows either a Weibull distribution or an Exponential distribution, one can get two kinds of AFT models, and both of them are often used in duration analysis.
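To make the exponential case concrete, the sketch below evaluates an AFT model under the assumption that the log-linear form is $\ln t = \beta' X + \varepsilon$ and that the baseline duration $\exp(\varepsilon)$ follows a unit-rate exponential distribution. Under that assumption the survival function is $\exp(-t \exp(-\beta' X))$, the hazard is constant, and the mean duration is $\exp(\beta' X)$. The coefficient values are hypothetical and are not the estimates reported in the tables.

```python
import math


def linear_index(beta, x):
    """Compute beta'X for coefficient and covariate lists of equal length."""
    return sum(b * v for b, v in zip(beta, x))


def exp_aft_survival(t, beta, x):
    """S(t | X) = exp(-t * exp(-beta'X)), assuming a unit-rate
    exponential baseline duration."""
    return math.exp(-t * math.exp(-linear_index(beta, x)))


def exp_aft_hazard(beta, x):
    """Constant hazard exp(-beta'X) of the exponential AFT model."""
    return math.exp(-linear_index(beta, x))


def exp_aft_mean(beta, x):
    """Expected duration exp(beta'X) under the same assumption."""
    return math.exp(linear_index(beta, x))


# Hypothetical coefficients: an intercept giving a 20-minute baseline and a
# mode dummy whose coefficient ln(0.86) shortens travel time by 14%.
beta = [math.log(20.0), math.log(0.86)]
print(exp_aft_mean(beta, [1, 0]))   # dummy off
print(exp_aft_mean(beta, [1, 1]))   # dummy on
```

In this form, a coefficient b on a dummy variable multiplies the expected travel time by exp(b), which is how percentage effects of covariates on travel time can be read from AFT estimates.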

GA for Parameter Optimization
The parameters in the AFT models greatly influence the estimation efficiency and prediction accuracy of the models, especially in large-scale or real-time practical applications. This paper therefore uses GA to find appropriate parameters for the AFT models. GA is a part of evolutionary computing, which is a rapidly growing area of artificial intelligence. The process of GA is as follows.

Encoding of Chromosome
GA starts with a set of candidate solutions called the population. The individuals comprising the population are known as chromosomes. In most GA applications, the chromosomes are coded as a series of zeroes and ones, that is, a binary bit string. In the travel time forecasting models, some parameters are continuous valued (such as distance and age), while others are discrete valued (such as the variables for mode and occupation). Therefore, real encoding was adopted for continuous-valued parameters, and binary bit strings were adopted for discrete-valued parameters. Each chromosome thus consists of $n$ "genes", $gen_1^t, gen_2^t, \ldots, gen_n^t$, which represent the $n$ parameters, respectively.

Crossover
Crossover is a reproduction technique that takes two parent chromosomes and produces two child chromosomes. A commonly used method called one-point crossover [29, 30] is employed in this study. In this method, both parent chromosomes are split into left and right subchromosomes, such that the left subchromosomes of the two parents have the same length and the right subchromosomes of the two parents have the same length. Each child then gets the left subchromosome of one parent and the right subchromosome of the other parent, as shown in Figure 5. The split position between two successive genes is called the crossover point. For example, if the parent chromosomes are 011 10010 and 100 11110 and the crossover point is between bits 3 and 4 (where bits are numbered from left to right starting at 1), then the children are 011 11110 and 100 10010. We call crossover applied at the bit level to bit strings binary crossover, and crossover applied at the real parameter level real crossover.
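The one-point crossover described above can be sketched as follows. The function name is hypothetical; the chromosomes may be bit strings or lists of real-valued genes, since only slicing and concatenation are used.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Split both parents after `point` genes and swap the right parts.

    Works for any equal-length sequences that support slicing and `+`
    (bit strings for binary crossover, lists for real crossover).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if not 0 < point < len(parent_a):
        raise ValueError("crossover point must lie strictly inside the chromosome")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


# The example from the text: crossover point between bits 3 and 4.
print(one_point_crossover("01110010", "10011110", 3))
# ('01111110', '10010010')
```

In a full GA, the crossover point would typically be drawn at random for each selected pair of parents.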

Mutation
Mutation is a common reproduction operator used for finding new points in the search space to evaluate. A genetic mutation operation [31] is used in this paper. Assume that a chromosome is $G = (gen_1^t, gen_2^t, \ldots, gen_n^t)$, where $n$ is the total number of parameters. If the gene $gen_i^t$ ($i = 1, \ldots, n$) is selected for mutation, it is perturbed by means of a function $\Delta(t, y)$. The function $\Delta(t, y)$ returns a value in $[0, y]$ that depends on $r$, a random number in $[0, 1]$, and on $T_{\max}$, the maximum number of generations. This construction makes the operator search the space almost uniformly when $t$ is small and very locally in later stages.
To deal with mutations that violate the parameter constraints, we assign the offending chromosomes a relatively high weight, which reduces their probability of being selected in the subsequent search [31].
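As an illustration of a mutation operator with this behavior, the sketch below implements the widely used non-uniform mutation for real-valued genes. It is an assumption that the paper's operator has this exact form: the shape parameter `b`, the explicit lower and upper bounds, and the function names are all hypothetical and do not come from the text.

```python
import random


def delta(t, y, t_max, b=2.0, rng=random):
    """Return a value in [0, y] that tends to shrink toward 0 as the
    generation t approaches t_max (assumed non-uniform mutation form)."""
    r = rng.random()                          # random number in [0, 1)
    return y * (1.0 - r ** ((1.0 - t / t_max) ** b))


def non_uniform_mutation(gene, lower, upper, t, t_max, b=2.0, rng=random):
    """Mutate one real-valued gene while keeping it inside [lower, upper]."""
    if rng.random() < 0.5:
        return gene + delta(t, upper - gene, t_max, b, rng)   # move up
    return gene - delta(t, gene - lower, t_max, b, rng)       # move down


rng = random.Random(0)
early = non_uniform_mutation(0.3, 0.0, 1.0, t=1, t_max=5000, rng=rng)
late = non_uniform_mutation(0.3, 0.0, 1.0, t=4999, t_max=5000, rng=rng)
print(early, late)
```

With this form, early generations can move a gene anywhere within its bounds, while late generations change it only slightly, which matches the uniform-then-local search behavior described in the text.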

Termination
Four GA parameters need to be predetermined: the crossover probability $p_c$, the mutation probability $p_m$, the population size $p_{size}$, and the maximum number of generations $T_{\max}$. Considering the features of this problem and our experience with GA, the values of the four GA parameters are set to 0.6, 0.05, 80, and 5000, respectively. Two AFT models are estimated, which assume that the random error in (3.10) follows a Weibull distribution and an Exponential distribution, respectively. The parameters are optimized using GA and then estimated using maximum likelihood estimation (MLE). The estimation results are shown in Table 5.

Home-To-Work Travel Time Estimation Model
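The mean absolute percentage error (MAPE), which measures the average percentage difference between the predicted values and the observed ones, is adopted to examine the accuracy:

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{A_i - P_i}{A_i} \right| \times 100\%, \tag{3.13}$$

where $A_i$ is the observed value and $P_i$ is the predicted value for observation $i$, and $n$ is the number of observations. The MAPE values of the two AFT models are shown in Table 6.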
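The MAPE computation can be sketched as follows, using the usual definition as the mean of the absolute percentage differences between observed and predicted values. The function name and the sample values are hypothetical.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    observed:  sequence of observed values A_i (must be nonzero)
    predicted: sequence of predicted values P_i of the same length
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in observed):
        raise ValueError("observed values must be nonzero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical observed and predicted travel times in minutes.
print(mape([20.0, 10.0], [18.0, 12.0]))
```

Because each error is scaled by the observed value, very short trips (for example, the 1-minute minimum in the sample) can dominate the measure, which is worth keeping in mind when comparing models.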
According to the results shown in Table 6, the MAPE value of the Exponential distribution is lower than that of the Weibull distribution, indicating that the values predicted by the AFT model with the Exponential distribution are closer to the actual travel times. Therefore, the Exponential distribution function is chosen. The hazard function and survival function are shown as follows:
The estimation results indicate that the most essential factor for travel time is distance: the longer the distance from home to work, the more time the trip takes. Compared with other travel modes, the travel times of walking are 14% lower, while those of biking are 15% higher. The reason is that, first, a short distance leads to a short travel time according to the estimation results and, second, distance influences mode decisions; that is, walking is usually used for shorter trips than biking. As for departure time, the results show that the later the departure time, the longer the travel time. Among these three factors, distance and mode are related to the transportation network, while the real traffic condition is captured by using departure time as a factor, because traffic conditions depend regularly on departure time. For instance, if a traveler departs from home in the morning peak period, the probability of encountering traffic congestion is much larger than in the off-peak period. The results also show that the higher a traveler's income, the shorter the travel time. The travel times of students and teachers are about 10% shorter than those of other travelers. The older the traveler, the longer the travel time. Women have longer travel times than men. The travel times of travelers whose occupation is administration are 19% longer than those of travelers in education.

Work-To-Home Travel Time Estimation Model
As with the home-to-work model, the AFT model of work-to-home travel times with the Exponential distribution performs better than that with the Weibull distribution. Therefore, the former is selected. The hazard function and survival function are as follows:

3.15
The estimation results indicate that the travel time of high-income travelers is 1% lower than that of low-income travelers. Moreover, older persons are likely to spend more time on the work-to-home trip. Regarding occupation, blue-collar workers are likely to spend less time on the evening commute trip, while the travel times of teachers and students are longer. Compared with other modes, walking trips take less time, while cycling trips take more time. Long-distance trips take longer. The later commuters depart from work, the longer their travel times are.

Prediction of the Commute Activity-Travel Agenda
As explained in Section 3.1, the key timings and durations $A_1$, $D_1$, and $A_2$ can be calculated once the values of $Dt_1$, $T_1$, $Dt_2$, and $T_2$ are predicted. Here is an example: the first member of the family with ID number 010104065 in our sample, Mr. Chen, is 45 years old, works in services, and earns 0-1500 RMB per month. His commute mode is walking, and the one-way commute distance is 1500 meters. He had a typical commute activity-travel pattern on the survey day, which is shown in Figure 8.
Based on the developed models, his daily commute time allocation is predicted and shown in Figure 9. Compared with the observed values, the errors of the predicted results are as follows.
(i) The errors of home-to-work departure time: the maximum error is 45 minutes, and the minimum error is 15 minutes.
(ii) The error of home-to-work travel time: 5.47 minutes.
(iii) The errors of arrival time at work location: the maximum error is 51 minutes, and the minimum error is 0 minutes.
(iv) The errors of work-to-home departure time: the maximum error is 60 minutes, and the minimum error is 0 minutes.
The results also show that the errors of the departure and arrival times are much larger than those of the travel times. The main reason is that we artificially divided the naturally continuous departure time into discrete time intervals, which reduces the predictive accuracy of the model. Tests show that the smaller the interval, the higher the predictive accuracy. However, as there are already five alternatives in each of the departure time choice models, moving to half-hour intervals would at least double the number of alternatives. The model would then be more complex and less efficient.

Conclusions
In this paper, we have formulated and estimated a joint model of departure time choice and travel duration for commuters' daily activity-travel time allocation. Two ordered probit models have been employed to forecast the home-to-work and work-to-home commute departure times. By doing so, we were able to recognize the natural temporal ordering among the departure time alternatives and to address the IIA limitation of the standard MNL model. Furthermore, two AFT models were built and estimated to predict home-to-work and work-to-home travel times, using GA for parameter optimization. The timing of a simple daily activity-travel pattern was then calculated.
Compared with previous studies, this paper developed a joint discrete-continuous model system to predict all the departure times and the activity-travel durations of a typical daily commute activity pattern. The results of this study not only contribute to developing a full-scale daily activity pattern forecasting model but also provide useful insights into the influence of sociodemographic variables on activity-trip timing decisions as well as into the time constraints between daily activities and trips. Moreover, GA contributes to the optimization and thus to the high accuracy of the travel time prediction model. In addition, this analysis of daily commute time allocation can be applied to a wide range of TDM policies, especially measures aimed at adjusting commute times, such as flexible work hours and the compressed working week. For examining the effects of traffic demand strategies, the developed model can not only describe the overall change in the daily activity schedule caused by the strategies but also explore the time tradeoff between connected trips as well as between trips and activities. Besides evaluating the effects of transportation demand management strategies, this study is also useful for planning the development and construction of new transportation infrastructure as well as for examining the potential responses to improved traffic operational measures.
The results of this paper confirm that the discrete choice models and the continuous models match well in the calculation of a whole-day activity-travel schedule, although the predictive accuracy of the discrete choice models is somewhat lower than that of the continuous models, because they divide naturally continuous time into artificially defined time periods. Similar studies have also employed discrete-continuous methods to model coupled mode and commute timing choice [12, 25]; joint activity-type preference, travel time, and activity duration [10]; and other activity-travel behaviors [27]. It can therefore be expected that the combination of discrete and continuous models can be further employed to predict all the dimensions of entire-day activities and trips. Such a future study could provide more useful insights into the nature of travelers' daily activity-travel decision making.
It should be pointed out that only the typical commute activity-travel pattern, comprising two commute trips and one work activity, has been considered in this paper. In reality, it is also common to observe other commute activity-travel patterns, such as those including work-based subtours or home-based nonwork trips, or those with stops during commute travel. Further study may be done to model such daily activity-travel patterns. It will also be very important to examine activity-travel patterns over multiple days if multiday travel survey data can be obtained.
applied a hazard modeling framework to analyze after-work activity duration. Juan and Xianyu [22] considered daily travel time using a hazard-based duration model. Bhat [20, 21] analyzed the duration of shopping activity by employing a hazard-based duration model. As a matter of fact, a few studies also used the hazard model in timing analysis. Examples include research by Habib et al. [12] on trip timing and by Bhat and Steed [23] on departure time choice for shopping trips.

Figure 2: Modeling process of the commute activity-travel time allocation.

Figure 3: Survival curve of the home-to-work travel time.

Figure 6 illustrates the survival curves for the home-to-work travel time as influenced by several major variables. It shows that departure time, travel mode, income, gender, and going to work all influence the home-to-work travel time.

Figure 6: Survival curves of the home-to-work travel time impacted by several factors.

Figure 7 illustrates the survival curves for the work-to-home travel time as influenced by several variables of interest. AFT models with both the Weibull distribution and the Exponential distribution are employed for the work-to-home travel time modeling. The estimation results and the MAPE values of the two AFT models are shown in Tables 7 and 8, respectively.

Figure 7: Survival curves of the work-to-home travel time impacted by several factors.

Figure 8: Observed commute activity-travel pattern and time allocation.
[17] built an MNL model for home-work morning departure time choice. Vovsha and Bradley [8] and Ettema et al. [7] also employed the MNL model but specified a continuous time variable in its utility function. Bhat [18] and Small [19] derived the ordered generalized extreme value (OGEV) model for departure time choice and time-of-day analysis, respectively. Habib et al. [12] introduced the OGEV model into the analysis of work start time and work duration because it accommodates correlation among time period alternatives. In this way, it resolves the independence of irrelevant alternatives (IIA) problem of the MNL model, which assumes zero correlation among different time periods.

. The examples include the studies done by Janssens et al. [24] on time allocation of daily activity-travel patterns, by Fujii and Kitamura [9] on timing and duration of commuters' daily activity patterns after work hours, by Habib [25] on work start time and work duration, by Raux et al. [6] on daily time allocation of travel and out-of-home activity, by Schwanen and Dijst [26] on the relationship between commuting time and work duration, and by Vovsha and Bradley [8] on departure time and duration of home-based trips. Similar studies also include Ettema et al. [7], Pendyala and Bhat [27], and Habib et al. [28].
al. [12], Habib [25], and Hamed and Mannering [10]. Both of the first two studies modeled the joint

Table 1: Variables in the departure time choice models.

Table 2: Summary statistics of the variables.

Table 3: Alternatives in the departure time choice models.

Table 4: Estimation results of the departure time choice models.

Table 5: Estimation results of the home-to-work travel time model.

Table 6: Goodness-of-fit index and estimated distribution statistics of the home-to-work travel time model.

Table 7: Estimation results of the work-to-home travel time model.

Table 8: Goodness-of-fit index and estimated distribution statistics of the work-to-home travel time model.

(v) The error of work-to-home travel time: 4.29 minutes.
(vi) The errors of arrival time at home: the maximum error is 65 minutes, and the minimum error is 5 minutes.
By comparing the predicted values of the developed daily time allocation model with the observed values of our sample, the maximum errors for all the departure times and the activity-travel durations are as follows: error Dt 1