Integrated Model of Joint Residence-Workplace Location Choice and Commute Behavior Using Latent Class and Mixed Logit Methods

With the rapid development of urbanization and motorization, urban commute problems are becoming increasingly serious due to the unbalanced distribution of residential and workplace land-use types in most Chinese cities. To explore the inherent interrelations among residence location, workplace, and commute trips, an integrated model framework of joint residence-workplace location choice and commute behavior is put forward based on the personal trip survey data of Beijing in 2005. First, to extract households' different choice characteristics, this paper presents a latent class model, clusters all households into several groups, and analyzes the conditional probabilities of each group. Second, the paper treats each pair of residence location and workplace as a joint choice alternative, employs socioeconomic factors, individual attributes, household attributes, and trip characteristics as explanatory variables, and formulates the joint residence-workplace location choice model using the mixed logit method. Estimation of the latent class model shows that four latent groups fit the data best. Further results of the joint residence-workplace location choice model indicate that choice characteristics differ significantly across the latent groups. Generally, the integrated model framework outperforms traditional location choice methods.


Introduction
In most Chinese cities, with the rapid development of urbanization and motorization, the density of urban land use is increasing rapidly, and the spatial distribution of residence locations and workplaces is becoming unbalanced. As a result, urban transportation systems, especially commute trips, are facing increasingly serious problems.
During the past two decades, integrated models of urban land-use and transportation systems have been studied extensively, especially residential location choice models using decision behavior approaches. As a competitive tool, the discrete choice model has been widely used in location choice modeling. Lerman (1976) [1] introduced household car ownership, housing type, and mode to work into the residential location choice and formulated a logit model. Freedman and Kern (1997) [2] studied both workplace and residence locations in two-earner households. To make the models closer to reality, McFadden (1978) [3], Boots and Kanaroglou (1988) [4], Gabriel and Rosenthal (1989) [5], Waddell (1993) [6], Abraham and Hunt (1997) [7], Ben-Akiva and Bowman (1998) [8], Deng et al. (2003) [9], Hunt et al. (2004) [10], Bhat and Guo (2004) [11], Waddell et al. (2007) [12], Jiao and Harata (2007) [13], Vega and Reynolds-Feighan (2009) [14], and Li et al. (2010) [15] presented different models of household residence location or workplace choices, as well as household members' activity and travel schedules. All the above work made use of discrete choice methods. The feature of this paper is to further combine the residence location and workplace together as the choice alternatives and to present joint residence-workplace location choice models for each latent class using mixed logit methods.
The rest of this paper is organized as follows. The general model framework is proposed in Section 2, including both the latent class model and the mixed logit model. The latent class model for household clustering is formulated in Section 3. The integrated model of joint residence-workplace location choice is put forward in Section 4 based on the combined choice alternatives. Both models are estimated using the personal trip survey data of Beijing in 2005, and the estimation results are reported and analyzed in Section 5. Conclusions and potential future research are summarized in Section 6.

General Model Framework
The theory of the latent class model (LCM) is based on probability distribution principles and log-linear models, with the objective of explaining the interrelations among manifest variables using the fewest latent categories while achieving local independence. The LCM is mainly used to analyze categorical data. The main difference between categorical and continuous variables is that the values of categorical variables are discrete, with each value denoting a different attribute or classification, for instance, gender, residence location, trip mode, and so forth.
The mixed logit (ML) model is a kind of discrete choice model. By assuming that some parameters follow random distributions, it is capable of incorporating random taste variations across households, as well as spatial correlations among different locations. Therefore, the ML model is widely used in location choice research.

Structure of Latent Class Model.
The LCM transforms the probabilities of categorical variables into a set of parameters, that is, probabilistic parameterization. There are two kinds of categorical variables in the classical LCM: manifest variables and latent variables. Correspondingly, there are two groups of parameters: latent class probabilities and conditional probabilities.
A manifest variable can be observed directly, for example, time, distance, and so on. It is also called an observable variable or a measurable variable. A latent variable, however, cannot be observed directly, for instance, psychological expectation, individual preference, and so forth.
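A latent class model can be formulated as

$$\pi^{ABC}_{ijk} = \sum_{t=1}^{T} \pi^{X}_{t}\, \pi^{\bar{A}X}_{it}\, \pi^{\bar{B}X}_{jt}\, \pi^{\bar{C}X}_{kt}, \tag{1}$$

where $\pi^{ABC}_{ijk}$ is the joint probability of the LCM; $A$, $B$, and $C$ denote three manifest variables; $\pi^{X}_{t}$ is the latent class probability, which means the probability of latent variable $X$ in class $t$, $t = 1, 2, \ldots, T$, $\sum_{t=1}^{T} \pi^{X}_{t} = 1$; and $\pi^{\bar{A}X}_{it}$, $\pi^{\bar{B}X}_{jt}$, and $\pi^{\bar{C}X}_{kt}$ are conditional probabilities.

Structure of Mixed Logit Model.
Similar to our previous work [18], using $U_{hi}$ to denote the utility function for decision maker $h$ to select alternative $i$, it can be divided into two items: the systematic item $V_{hi}$ and the random item $\varepsilon_{hi}$; that is,

$$U_{hi} = V_{hi} + \varepsilon_{hi}. \tag{2}$$

To incorporate random taste variations in the model, $V_{hi}$ is further formulated as below:

$$V_{hi} = \sum_{k=1}^{K} \beta_{k} x_{hik} + \sum_{l=1}^{L} \gamma_{l} z_{hil}, \tag{3}$$

where $\beta_{k}$ and $\gamma_{l}$ are two kinds of parameters to be estimated; $\beta_{k}$ is the fixed parameter, just as in the multinomial logit (MNL) model; $\gamma_{l}$ is the unfixed parameter following some random distribution to incorporate the random taste variations; $x_{hik}$ and $z_{hil}$ are explanatory variables corresponding to $\beta_{k}$ and $\gamma_{l}$, respectively; $K$ is the total number of variables corresponding to $\beta_{k}$; $L$ is the total number of variables corresponding to $\gamma_{l}$.

Based on the fundamental theory of discrete choice analysis, the mixed logit model can be formulated as

$$P_{hi}(\gamma) = \frac{\exp(V_{hi})}{\sum_{j=1}^{J} \exp(V_{hj})}, \tag{4}$$

where $P_{hi}(\gamma)$ is the probability for decision maker $h$ to select alternative $i$ given $\gamma = (\gamma_{1}, \ldots, \gamma_{L})$, and $J$ is the total number of alternatives. Therefore, the unconditional probability for decision maker $h$ to select alternative $i$ can be further formulated as

$$P_{hi} = \int P_{hi}(\gamma)\, f(\gamma)\, d\gamma, \tag{5}$$

where $P_{hi}$ is the unconditional probability for decision maker $h$ to select alternative $i$ and $f(\cdot)$ is the density function which the unfixed parameter $\gamma$ follows.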
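The unconditional mixed logit probability is an integral of the logit formula over the density of the unfixed parameter, and it generally has no closed form. A minimal Python sketch of the usual simulation approximation follows. It assumes one fixed and one unfixed coefficient, and the function names, attribute values, and distribution parameters are all hypothetical illustrations rather than values from the survey data.

```python
import math
import random


def logit_probs(utilities):
    """Logit probabilities; subtracting the maximum avoids overflow
    without changing the result."""
    top = max(utilities)
    exp_u = [math.exp(u - top) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def simulated_mixed_logit(x, z, beta, draw_gamma, n_draws=500, seed=0):
    """Approximate the unconditional choice probabilities by averaging
    logit probabilities over draws of the unfixed coefficient.

    x[i]: attribute of alternative i with fixed coefficient beta
    z[i]: attribute of alternative i with unfixed coefficient gamma
    draw_gamma: function taking a random.Random and returning one draw
    """
    rng = random.Random(seed)
    n_alt = len(x)
    avg = [0.0] * n_alt
    for _ in range(n_draws):
        gamma = draw_gamma(rng)
        v = [beta * x[i] + gamma * z[i] for i in range(n_alt)]
        for i, p in enumerate(logit_probs(v)):
            avg[i] += p / n_draws
    return avg


# Hypothetical three-alternative example (all numbers are assumptions).
x = [1.0, 0.5, 0.2]     # e.g. a price-like attribute
z = [-2.0, -5.0, -9.0]  # e.g. negative commute distance
probs = simulated_mixed_logit(
    x, z, beta=-0.8,
    draw_gamma=lambda rng: rng.lognormvariate(-1.0, 0.5),
)
```

Entering the distance-like attribute with a negative sign is one common way to combine a strictly positive log-normal coefficient with a disutility; whether the original estimation did so is not stated in the text.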

Latent Class Model for Household Clustering
To explore the inherent characteristics of urban residence location choice and workplace choice, this paper further formulates the latent class model for commute trips based on the personal trip survey data of Beijing in 2005. The study area is divided into eight zones according to the urban districts: Xicheng, Dongcheng, Chongwen, Xuanwu, Haidian, Chaoyang, Fengtai, and Shijingshan. Based on a thorough analysis of the factors influencing residence location and workplace choices, we introduce the following five variables into the LCM: residence location, workplace, commute distance, commute mode, and household monthly income.
Here residence location, workplace, and commute mode are discrete variables, whereas commute distance and household monthly income are continuous in nature. For convenience, these two continuous variables are discretized and transformed into categorical variables. For commute mode, we select five main modes, that is, walk, bicycle, bus, subway, and car.
Variables in the latent class model are summarized in Table 1.
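Based on the variables in Table 1, the latent class model is formulated as

$$\pi^{ABCDE}_{ijklm} = \sum_{t=1}^{T} \pi^{X}_{t}\, \pi^{\bar{A}X}_{it}\, \pi^{\bar{B}X}_{jt}\, \pi^{\bar{C}X}_{kt}\, \pi^{\bar{D}X}_{lt}\, \pi^{\bar{E}X}_{mt}, \tag{6}$$

where $X$ is the latent variable; $T$ is the number of latent classes; $\pi^{X}_{t}$ denotes the latent class probability; $A$, $B$, $C$, $D$, and $E$ are the manifest variables; $i$, $j$, $k$, $l$, and $m$ are the levels of the manifest variables, respectively; and $\pi^{\bar{A}X}_{it}$, $\pi^{\bar{B}X}_{jt}$, $\pi^{\bar{C}X}_{kt}$, $\pi^{\bar{D}X}_{lt}$, and $\pi^{\bar{E}X}_{mt}$ are conditional probabilities. Here $\pi^{\bar{A}X}_{it}$ is the conditional probability that manifest variable $A$ is at level $i$ given latent class $t$, that is, $P(A = i \mid X = t)$; $\pi^{\bar{B}X}_{jt}$, $\pi^{\bar{C}X}_{kt}$, $\pi^{\bar{D}X}_{lt}$, and $\pi^{\bar{E}X}_{mt}$ are defined in the same way for $B$, $C$, $D$, and $E$.

Based on the above symbols and equations, we can further formulate the conditional probability of latent variable $X$ as

$$\pi^{ABCDE\bar{X}}_{ijklmt} = \frac{\pi^{X}_{t}\, \pi^{\bar{A}X}_{it}\, \pi^{\bar{B}X}_{jt}\, \pi^{\bar{C}X}_{kt}\, \pi^{\bar{D}X}_{lt}\, \pi^{\bar{E}X}_{mt}}{\sum_{t'=1}^{T} \pi^{X}_{t'}\, \pi^{\bar{A}X}_{it'}\, \pi^{\bar{B}X}_{jt'}\, \pi^{\bar{C}X}_{kt'}\, \pi^{\bar{D}X}_{lt'}\, \pi^{\bar{E}X}_{mt'}}. \tag{7}$$

Using (7), we can obtain the conditional probability of the latent variable. Every observation is assigned to the latent class with the largest value of $\pi^{ABCDE\bar{X}}_{ijklmt}$. Therefore, for urban residence location and workplace choice problems, households are clustered into several groups using the above LCM method, and each group has different choice characteristics.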
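The assignment rule described above (compute the conditional probability of each latent class for an observation and pick the largest) can be sketched in a few lines of Python. The data layout, function names, and the two-class, two-variable numbers below are assumptions made for illustration only; they are not estimates from the paper.

```python
def posterior_classes(class_probs, cond_probs, observation):
    """Conditional probability of each latent class given one observation.

    class_probs[t]: latent class probability of class t
    cond_probs[v][t][level]: P(manifest variable v = level | class t)
    observation: tuple of observed levels, one per manifest variable
    """
    joint = []
    for t, p_t in enumerate(class_probs):
        p = p_t
        for v, level in enumerate(observation):
            p *= cond_probs[v][t][level]
        joint.append(p)
    total = sum(joint)
    return [p / total for p in joint]


def assign_class(class_probs, cond_probs, observation):
    """Index of the latent class with the largest conditional probability."""
    post = posterior_classes(class_probs, cond_probs, observation)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical model: 2 latent classes, 2 binary manifest variables.
class_probs = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.2, 0.8]],  # variable 0: rows are classes 0 and 1
    [[0.7, 0.3], [0.4, 0.6]],  # variable 1: rows are classes 0 and 1
]
first = assign_class(class_probs, cond_probs, (0, 0))
second = assign_class(class_probs, cond_probs, (1, 1))
```

In this toy setting the observation (0, 0) is assigned to class 0 and (1, 1) to class 1, because those classes give each pattern the larger joint probability.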

Integrated Model of Joint Residence-Workplace Location Choice
Based on the household groups clustered by the LCM, we can further formulate residence location and workplace choice models for each kind of household using the mixed logit method, as in our previous work [18]. Explanatory variables are summarized in Table 2, including house renting price, commute distance, commute time, household monthly income, population density, number of employment opportunities, and GDP of workplace.
In this model, we assume that households make residence location and workplace choices simultaneously, that is, the joint location choice. Since there are eight zones in the study area, there are 64 choice alternatives in total; that is, each residence-workplace location pair denotes one alternative.
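The joint choice set can be enumerated directly from the eight zones. The short Python sketch below builds the 64 residence-workplace pairs; the zone names come from the study area described earlier, while the variable names and the ordering of the pairs are arbitrary choices made for this illustration.

```python
from itertools import product

# The eight zones of the study area.
ZONES = [
    "Xicheng", "Dongcheng", "Chongwen", "Xuanwu",
    "Haidian", "Chaoyang", "Fengtai", "Shijingshan",
]

# Each (residence, workplace) pair is one joint choice alternative.
ALTERNATIVES = list(product(ZONES, ZONES))

# Hypothetical lookup from a pair to its alternative index.
ALT_INDEX = {pair: idx for idx, pair in enumerate(ALTERNATIVES)}
```

Pairs with the same zone in both positions, such as ("Haidian", "Haidian"), represent households that live and work in the same district.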
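Based on many estimation experiments, commute distance and commute time are assigned the unfixed parameters $\gamma_{l}$. Here $\gamma_{l}$ is assumed to follow a log-normal distribution. Therefore, the integrated model of joint residence-workplace location choice is presented as

$$P_{hi} = \int \frac{\exp\left(\sum_{k=1}^{K} \beta_{k} x_{hik} + \sum_{l=1}^{L} \gamma_{l} z_{hil}\right)}{\sum_{j=1}^{J} \exp\left(\sum_{k=1}^{K} \beta_{k} x_{hjk} + \sum_{l=1}^{L} \gamma_{l} z_{hjl}\right)}\, f(\gamma)\, d\gamma, \tag{8}$$

where $J$ is the total number of choice alternatives, $\gamma_{l}$ follows the log-normal distribution in (9), and definitions of other variables are the same as those stated before:

$$f(\gamma_{l}) = \frac{1}{\gamma_{l}\, \sigma_{l} \sqrt{2\pi}} \exp\left(-\frac{(\ln \gamma_{l} - \mu_{l})^{2}}{2\sigma_{l}^{2}}\right), \quad \gamma_{l} > 0, \tag{9}$$

where $\gamma_{l}$ is the random variable and $\mu_{l}$ and $\sigma_{l}^{2}$ are the expectation and variance of $\ln(\gamma_{l})$, respectively. Using (8) and (9), we can formulate every household group's joint residence-workplace location choice behavior.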
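Because the distribution parameters of a log-normal coefficient describe its logarithm, it can be useful to convert them into moments of the coefficient itself. The Python sketch below uses the standard log-normal formulas; the function name and the example values are hypothetical, and the text does not state which parameterization the reported estimates use.

```python
import math


def lognormal_moments(mu, sigma):
    """Mean, standard deviation, and median of a log-normal variable
    whose logarithm has mean mu and standard deviation sigma."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    variance = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    median = math.exp(mu)
    return mean, math.sqrt(variance), median


# Hypothetical parameter values, for illustration only.
mean, std_dev, median = lognormal_moments(mu=-1.0, sigma=0.5)
```

The mean always exceeds the median when sigma is positive, which reflects the right skew of the log-normal distribution.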

Estimation Results
Based on the personal trip survey data of Beijing, we can estimate the integrated model of joint residence-workplace location choice and commute behavior.

Estimation Results of Latent Class Model.
Parameter estimation of the latent class model is usually implemented using two kinds of iterative algorithms based on the maximum likelihood method: the expectation maximization algorithm and the Newton-Raphson algorithm. The iterative process consists of two steps: the first step maximizes the likelihood from a starting value, which serves as the initial estimate in the algorithm, and the second step re-estimates from the result of the first step; the two steps are repeated until the required accuracy is reached.
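To make the iterative estimation concrete, the following is a minimal pure-Python sketch of the expectation maximization algorithm for an unrestricted latent class model. It is an illustration under stated assumptions, not the estimation code used in this study: the function name, data layout, random starting values, and fixed iteration count are all hypothetical choices.

```python
import math
import random


def em_latent_class(data, n_levels, n_classes, n_iter=200, seed=0):
    """Fit a latent class model by EM.

    data: list of tuples of category indices, one per manifest variable
    n_levels[v]: number of categories of manifest variable v
    Returns (class_probs, cond_probs, log_lik), where log_lik is the
    log-likelihood evaluated in the E-step of the final iteration.
    """
    rng = random.Random(seed)
    n_vars = len(n_levels)
    class_probs = [1.0 / n_classes] * n_classes
    # Random, normalized starting values for the conditional probabilities.
    cond_probs = []
    for v in range(n_vars):
        per_class = []
        for _ in range(n_classes):
            w = [rng.random() + 0.5 for _ in range(n_levels[v])]
            s = sum(w)
            per_class.append([x / s for x in w])
        cond_probs.append(per_class)

    log_lik = float("-inf")
    for _ in range(n_iter):
        # E-step: posterior class membership of every observation.
        posts = []
        log_lik = 0.0
        for obs in data:
            joint = []
            for t in range(n_classes):
                p = class_probs[t]
                for v, level in enumerate(obs):
                    p *= cond_probs[v][t][level]
                joint.append(p)
            total = sum(joint)
            log_lik += math.log(total)
            posts.append([p / total for p in joint])
        # M-step: update parameters from posterior-weighted counts.
        weight = [sum(post[t] for post in posts) for t in range(n_classes)]
        class_probs = [w / len(data) for w in weight]
        for v in range(n_vars):
            for t in range(n_classes):
                if weight[t] > 0.0:  # skip a class that has emptied out
                    counts = [0.0] * n_levels[v]
                    for obs, post in zip(data, posts):
                        counts[obs[v]] += post[t]
                    cond_probs[v][t] = [c / weight[t] for c in counts]
    return class_probs, cond_probs, log_lik
```

In practice the algorithm is usually run from several starting values and stopped by a convergence tolerance rather than a fixed number of iterations, since EM may converge to a local maximum.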
To obtain the best-fitting LCM, this paper makes use of the maximum likelihood method for estimation. The estimation process begins by fitting a complete independence model with $T = 1$ and then increases the number of latent classes one by one until an appropriate $T$ is found. The likelihood ratio chi-square value increases as the differences between observed and expected values grow. The Akaike information criterion (AIC) and Bayesian information criterion (BIC) are employed to evaluate the model and to determine the appropriate number of latent classes $T$. For the LCM, the fit improves as AIC and BIC decrease.
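The two information criteria have standard definitions in terms of the maximized log-likelihood and the number of free parameters. The Python sketch below shows one common form together with the parameter count of an unrestricted LCM. The function names are hypothetical, and in the usage example the numbers of levels for commute distance and household income (4 each) are assumptions, since the discretization is not given in the text; only the 8 zones and 5 modes come from the paper.

```python
import math


def lcm_free_parameters(n_levels, n_classes):
    """Free parameters of an unrestricted LCM: (T - 1) latent class
    probabilities plus, for each class, (levels - 1) conditional
    probabilities per manifest variable."""
    return (n_classes - 1) + n_classes * sum(k - 1 for k in n_levels)


def aic(log_lik, n_params):
    """Akaike information criterion; smaller is better."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


# Residence (8), workplace (8), distance (4, assumed), mode (5),
# income (4, assumed), with T = 4 latent classes.
n_params = lcm_free_parameters([8, 8, 4, 5, 4], n_classes=4)
```

Some latent class software instead reports AIC and BIC computed from the likelihood ratio chi-square and its degrees of freedom; for a fixed data set the two forms differ by a constant, so they rank candidate models identically.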
Based on several estimation experiments, the fit criteria of the proposed LCM are summarized in Table 3.
Table 3 shows that the model fits best when $T = 4$; that is, we obtain four latent classes in total.
The detailed parameter estimation results are further summarized in Table 4.
From Table 4, we can observe the following results. From the above analyses, we can further see that the differences among the four latent classes are distinct. Moreover, the clustering results from the LCM method are much more logical than those from traditional simple cluster analysis methods [13].

Estimation Results of Mixed Logit Model.
The joint residence-workplace location choice model based on mixed logit is estimated using the maximum simulated likelihood (MSL) method, which was shown to be effective and efficient by Bhat and Guo (2004) [11]. Furthermore, to implement the MSL algorithm, we integrate randomly scrambled Halton draws [26] into the estimation algorithm. Similar to our previous work, we also code the algorithm on the GAUSS platform. Using the above method, the joint residence-workplace location choice model is estimated under the assumption that all households choose residence location and workplace simultaneously; that is, these two kinds of land use influence each other. The estimated mean values of all parameters are reported in Table 5, with the t-statistics shown in parentheses to indicate the significance of the explanatory variables. As stated before, for the unfixed parameters of commute distance and commute time, estimates of both their mean values and standard deviations are reported. Furthermore, to compare different household groups, the estimates for the four latent classes are summarized in four columns, respectively.
From Table 5 we can see that all estimated parameters have the expected signs and significance, which generally supports the effectiveness of the integrated model of joint residence-workplace location choice and commute behavior.
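Halton draws replace pseudo-random numbers with a low-discrepancy sequence so that fewer draws are needed to approximate the simulated likelihood. The Python sketch below generates a plain (unscrambled) Halton sequence and maps it to log-normal coefficient draws. The random scrambling used in the paper is omitted for brevity, and the function names, the number of skipped initial points, and the parameter values are assumptions for illustration.

```python
import math
from statistics import NormalDist


def halton(index, base):
    """Radical inverse of a positive integer index in a prime base."""
    result = 0.0
    f = 1.0 / base
    i = index
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result


def halton_sequence(n_draws, base, skip=10):
    """n_draws Halton points in (0, 1), discarding the first `skip`."""
    return [halton(i, base) for i in range(skip + 1, skip + n_draws + 1)]


def lognormal_draws(n_draws, mu, sigma, base=2, skip=10):
    """Map Halton points through the inverse normal CDF, then
    exponentiate to obtain log-normal coefficient draws."""
    std_normal = NormalDist()
    return [
        math.exp(mu + sigma * std_normal.inv_cdf(u))
        for u in halton_sequence(n_draws, base, skip)
    ]


# Hypothetical usage: 100 draws for one unfixed coefficient.
draws = lognormal_draws(100, mu=-1.0, sigma=0.5)
```

When several coefficients are random, each one is typically given its own prime base (2, 3, 5, and so on) so that the dimensions of the sequence are not perfectly correlated.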
For all latent classes, we can get the following results from the signs of parameters.
(1) The expected negative sign of house renting price shows that with other conditions fixed, households tend to live in areas with rather low housing price.
(2) Both commute distance and commute time between residence location and workplace have negative signs as expected, which indicates that households tend toward job-housing balance when they consider their residence location and workplace choices; that is, proximity to the workplace is very important when households choose a residence location, and proximity to the residence location is likewise very important when they choose a workplace.
(3) The positive sign of household monthly income is also consistent with expectation, which means that households are more likely to reside or work in places which could bring them higher income.
(4) The sign of the number of employment opportunities is positive, showing that job opportunities are a rather important factor influencing households' residence location and workplace choices. It means that people tend to live and work in locations with more opportunities.
(5) GDP of workplace has the expected positive sign, indicating that households are more inclined to work in places with good economic environment.
(6) Interestingly, population density is an exception: in latent class 1 the parameter is positive, while in latent classes 2, 3, and 4 the parameters are negative. The characteristics of each latent class can explain this exception. In class 1, most households live in Haidian and Chaoyang districts, which are two rather large zones with much residential land use but a residence density that is not very high; therefore, households tend to locate in places with high population density, which also reflects a population clustering effect. Conversely, in classes 2, 3, and 4, most households live in the other six districts of Beijing, which are rather small areas with very high residence density; therefore, households in these three classes tend to reside in areas with low population density, which suggests that low residence density and a comfortable community environment are more important for these people.
Further comparisons of the estimates among the four latent classes reveal the following results.
(1) For all four latent classes, the magnitude of household monthly income is much bigger than that of the other parameters. It indicates that this factor is much more important than other factors for household residence location and workplace choices. Moreover, house renting price also has a rather big magnitude, showing that housing price is also a very important factor in location choices.
(2) For latent class 1, house renting price, population density, and GDP of workplace have much bigger magnitudes, showing that these three factors are more important for households in Haidian and Chaoyang districts when they make their residence location and workplace choices.
(3) For latent class 2, the much bigger magnitudes of population density and GDP of workplace again indicate their importance. As stated before, the sign of population density is negative, because most households in class 2 reside in Xicheng and Dongcheng districts, which are located in the city center of Beijing and have very high residence density. Therefore, unlike class 1, these households tend to live in areas with low population density and a comfortable environment.
(4) For latent class 3, commute distance has a rather big magnitude, which means that people give more weight to commute distance when they make residence location and workplace choices. The latent class analyses give the reason: most households in this group walk or cycle on commute trips, and these two modes are more sensitive to trip distance.
(5) For latent class 4, the magnitude of the number of employment opportunities is clearly bigger than the others. Once again, the reason can be found in the latent class analyses. Most households in this group reside and work in Fengtai and Shijingshan districts, which are relatively underdeveloped economically. There are fewer employment opportunities in these two districts than in the other six zones, and income is also rather low. Therefore, households in this group pay more attention to the number of employment opportunities in their residence location and workplace decisions.
Generally, all the estimation results are consistent with expectations. The detailed analyses based on latent classes reveal many interesting and logical results.

Conclusions
This paper addresses an integrated model of joint residence-workplace location choice and commute behavior using latent class and mixed logit methods. The general model framework consists of two models. We first present a latent class model to extract households' different choice characteristics and cluster households into several groups. Based on the latent class analyses, we further combine the residence location and workplace together as the joint choice alternative and formulate a joint residence-workplace location choice model using the mixed logit method. A large amount of data is extracted from the personal trip survey data of Beijing in 2005 for the case study. Estimation results of the latent class model show that households are properly clustered into four groups, and every kind of household has different characteristics. The mixed logit models for all four latent classes are then estimated, respectively, using the maximum simulated likelihood method. The estimated parameters are all consistent with expectations. For all latent classes, household monthly income and housing price are very important for residence location and workplace choices. Further comparisons of the estimated parameters among the four latent classes show that there are large differences in location choice behaviors, and that the joint residence-workplace location choice model using latent class and mixed logit methods is very effective.
Future research is directed towards the following aspects. The first is to employ more recent socioeconomic data, census data, and trip survey data and to update the case studies of this research. The second is to further explore the differences among different decision makers, for instance, male, female, and children in the same household, and to analyze more detailed choice behaviors. The third is to track the development histories of residential and employment land uses based on panel data.

Table 1 :
Variables in the latent class model. In the third column, the number after the slash "/" is the value of the corresponding categorical variable.

Table 2 :
Variables in the mixed logit model.

Table 3 :
Fit criteria of the LCM.
Most households reside in Dongcheng and Chaoyang districts and work in the same area; that is, these households mainly live and work in the east area of Beijing. Therefore, the commute distance is the shortest, with 87.7% within 8 kilometers. Meanwhile, the frequently used commute modes are walk and bicycle, which are most appropriate for short-distance trips. Almost no household uses subway; however, the percentage of car is about 20%. The possible reasons are that there were few subway lines in operation when the survey was carried out, and some people tend to use car because of rather high income.
modes are bicycle, bus, and car. Household income is rather high. Furthermore, from the latent class probability, the proportion of this latent class is the biggest.
Class 2. Households mainly reside in Xicheng and Dongcheng districts. Some of them work in zones close to home, while some others tend to work in Haidian and Chaoyang districts.

Table 4 :
Estimation results of the LCM.

Table 5 :
Estimation results of four latent classes.