Customers' Mode Choice Behaviors of Express Service Based on Latent Class Analysis and Logit Model

As the parcel delivery service booms in China, competition among express companies is intensifying. This paper employs a multinomial logit model (MNL) and a latent class model (LCM) to investigate customers' express service choice behavior, using data from a stated preference (SP) survey. The attributes and attribute levels that matter most to express customers are identified. The customers are divided into two segments (a penny pincher segment and a high-end segment) characterized by their taste heterogeneity. The results indicate that the LCM performs statistically better than the MNL in our sample. Therefore, more attention should be paid to taste heterogeneity, especially in further academic and policy research on freight choice behavior.


Introduction
The parcel delivery service in the express industry is booming with the rapid economic growth of China, especially in the field of e-commerce. With a year-on-year growth rate of 61.6%, the total volume of express parcels in China increased to 9.2 billion in 2013. On the day of the 2014 "Double 11" online shopping carnival alone, the number of express parcels exceeded 58.6 million, setting a new record.
There are different types of express companies in China's domestic express market, including multinational companies with in-house logistics operations and Chinese-owned express companies [1]. According to the Medium- and Long-Term Development Plan of the Logistics Industry released in June 2014, the Chinese government will further open the domestic express market to foreign companies (e.g., UPS, FedEx, DHL, and TNT). This will, to some extent, challenge the dominant status of the state-owned and private Chinese express companies. Amid intensifying competition, existing and emerging companies must understand customers' preferences in order to develop effective strategies to enhance consumer recognition and eventually gain competitive advantages. The objective of this study is to investigate customers' choice behavior for express delivery service and to identify the taste heterogeneity of the customers. It provides an analytical tool for further academic and policy research and helps the express industry identify market opportunities and form development strategies.

Literature Review
Research on freight mode choice and carrier selection has been plentiful over the last four decades. Excellent reviews are available in Winston [2], Regan and Garrido [3], Meixell and Norbis [4], and Zhang and Tao [5]. However, very little research has focused on the purchase choice decision for logistics services [6], and almost all previous work relates to the B2B context [7][8][9]. To our knowledge, Garver et al. [10] is the first paper to analyze carrier selection from the perspective of customers in a logistics context, specifically in the customer parcel shipping industry. Nevertheless, they studied only one type of commodity, namely textbooks.
The multinomial logit model (MNL) is an effective disaggregate behavioral model for evaluating attribute importance. It has been widely applied to analyze freight mode choice decisions [11][12][13]. The basic MNL model assumes taste homogeneity. A number of studies in the context of freight mode choice and carrier selection have paid attention to heterogeneity in tastes and have attempted market segmentation in order to better match the selection preferences of shippers. However, to date, very few studies have addressed unobserved taste heterogeneity in transport demand analysis, and those have focused almost exclusively on passenger transport [14].
Latent class analysis (LCA) has been widely applied in various areas, such as medicine [15], biology [16], social sciences [17], psychology [18], criminology [19], and marketing [20]. In particular, LCA is the dominant approach in segmentation and can identify different groups of freight agents based on taste heterogeneity regarding service attributes [21].
For segmentation, cluster analysis is one of the most prevalent techniques. Its modeling framework typically uses all variables in the model rather than a selection of the variables. However, this segmentation method tends to produce a large number of segments when all sociodemographic and trip-related segmentation variables are considered [22]. Alternatively, LCA predefines the number of latent classes [23]. In general, selecting variables and removing unnecessary variables and parameters can improve the segmentation performance and the parameter estimation [24]. Moreover, LCA accounts for the heterogeneous preferences of individuals and simultaneously identifies the size and characteristics of segments. The mixed logit model resembles the LCM in that it also accounts for preference heterogeneity. However, the two approaches differ in how the taste parameters vary [25]. The LCM uses a discrete distribution, as opposed to the continuous random variation assumed in the mixed logit model. Furthermore, the LCM does not assume a specific distribution of parameters across individuals.
In the area of freight transportation, Garver et al. [26] is the first paper published in the major logistics journals that uses the LCA approach. Two years later, Taylor et al. [27] applied an LCA method to owner operator retention and found four different need-based driver segments. Arunotayanun and Polak [28] applied LCA to analyze the differences in shippers' preferences between behaviorally homogeneous segments and the observed taste heterogeneity based on conventional commodity types. Anderson et al. [9] used LCA to categorize the customers of logistics service providers into three groups and revealed the preferences of each group with a discrete choice model. As far as we know, Garver et al. [10] is the only article applying LCA to customers' choice behavior for parcel shipping services, but they did not employ MNL to analyze the attribute levels.
This paper investigates the segmentation of express service customers in China and reveals the customer preferences in each unobserved heterogeneous segment. The main contributions are as follows: (1) to our knowledge, it is one of the earliest studies to address individual customers' preferences for express services, which is a novel contribution to the literature on the express industry; (2) the study focuses on the unobserved homogeneous segmentation and finds that the LCM performs statistically better than the MNL in our sample; (3) it provides express companies, both existing companies and new entrants, with a useful method to position their operations and to target customers.
The rest of the paper is organized as follows. Section 3 introduces the survey design and data collection. Section 4 presents the methods applied in this paper, including latent class analysis (LCA) and the multinomial logit model (MNL). Section 5 analyzes the survey results, and conclusions are presented in the last section.

Data Collection
3.1. Recruitment. The data were collected through face-to-face interviews and an online survey from June 20th to July 20th, 2014. Respondents for the face-to-face interviews were randomly recruited at a railway station, a movie theatre, and shopping malls in the city of Dalian, China. The online survey received responses from various regions of China. A total of 474 respondents effectively took part in the survey.

Attribute Selection.
The survey is based on Stated Preference (SP) techniques. Therefore, it is necessary to limit the number of attributes in order to keep the number of choices presented to respondents at an acceptable level.
Cullinane and Toy [29] stated that the five most influential attributes in freight mode choice are freight rate, transit time, transit time reliability, characteristics of the goods, and service. Nevertheless, they do not explicitly indicate what the service attribute is. In our survey, transit time and freight rate are included, but we do not consider transit time reliability because most express companies in China can satisfy it. Existing studies have also focused on the potential of information technologies to reduce costs and improve customer service [30,31]. Hence, we select tracking-and-tracing service as the information technology (IT) attribute in this survey. In addition, the "last mile" to the customer's door in express service has attracted extensive concern. Competition in express service is, to a degree, competition in the "last mile". Pick-up distance and time window service are therefore also taken as attributes. We take pick-up distance into account because most express companies in China have not fully achieved door-to-door service. Time window service means that consumers can choose a time window in which the delivery service is offered. It is a new service that some companies in China are trying to improve, whereas express companies in developed countries already provide it. Our survey thus selects the following five attributes: freight rate, transit time, pick-up distance, tracking-and-tracing, and time window service.
We take two types of existing express companies, which play the major role in the Chinese domestic express market, as examples in our survey. They are Shunfeng (SF) and Sitongyida (Shentong, Yuantong, Zhongtong, Huitong, and Yunda, SYZHY for short). SF is a typical large company offering relatively expensive, convenient, and customized express service, whereas SYZHY represents companies with standard and low-cost service. The relevant attribute levels are defined based on actual data. Table 1 shows the attributes and their corresponding levels in the survey. The levels of the choice attributes are equally spaced to keep orthogonality. For the tracking-and-tracing and time window services, "Y" indicates that the company provides the corresponding service and "N" that it does not.

Survey Design.
The questionnaire consists of three parts. Part 1 includes various background questions, such as the purpose of using the express service, average transit time, the express company used last time, and the type of express items shipped last time. Part 2 is the competitive choice scenario. In this part, respondents are given three competitive alternatives of express services and asked to select the one they most prefer. An example of the choice task is illustrated in Table 2.
Part 3 is about the personal information of respondents (e.g., gender, age, income, education, profession, and delivery frequency).
For Part 2, we initially faced $3^9 \times 2^6$ possible choice situations. We then applied an orthogonal design, one of the most widely used methods in survey design, to reduce the number of choice situations while maintaining all main effects. In this study, the choice situations are reduced to 36 via orthogonal design. Based on previous experience, presenting all 36 situations to one respondent may impose too much burden [32]. Since orthogonal design with blocking performs significantly better than random assignment of choice tasks to respondents [33], the 36 choice situations are randomly divided into 9 blocks, each containing four different choice tasks.
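The design arithmetic in this subsection can be checked with a short sketch. It assumes that each of the three alternatives carries three three-level attributes (freight rate, transit time, pick-up distance) and two two-level Y/N attributes, which is consistent with the stated full factorial size but is not spelled out in the text. The function names and the random seed are hypothetical, and the blocking shown is a plain random split, not the authors' actual assignment.

```python
import random

# Assumed structure: 3 alternatives, each with three 3-level attributes
# and two 2-level (Y/N) attributes.
N_ALTERNATIVES = 3
THREE_LEVEL_ATTRIBUTES = 3
TWO_LEVEL_ATTRIBUTES = 2


def full_factorial_size():
    """Count the choice situations in the full factorial design."""
    three_level_factors = N_ALTERNATIVES * THREE_LEVEL_ATTRIBUTES  # 9 factors
    two_level_factors = N_ALTERNATIVES * TWO_LEVEL_ATTRIBUTES  # 6 factors
    return 3 ** three_level_factors * 2 ** two_level_factors


def assign_blocks(n_situations=36, n_blocks=9, seed=0):
    """Randomly split situation indices into equally sized blocks."""
    if n_situations % n_blocks != 0:
        raise ValueError("situations must divide evenly into blocks")
    indices = list(range(n_situations))
    random.Random(seed).shuffle(indices)
    size = n_situations // n_blocks
    return [indices[k * size:(k + 1) * size] for k in range(n_blocks)]


if __name__ == "__main__":
    print(full_factorial_size())
    for block_id, block in enumerate(assign_blocks(), start=1):
        print(block_id, sorted(block))
```

Under these assumptions the full factorial contains 1,259,712 situations, which illustrates why a 36-run orthogonal design split into blocks of four is used instead.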

Model
The multinomial logit model (MNL) has been applied for the analysis of discrete choice for many years. It is based on utility-maximization theory, where a decision maker, labeled $n$, is assumed to choose the alternative $i$ with the highest utility among $J$ alternatives. The utility cannot be fully observed due to modeling uncertainty. Thus, it is divided into observed and unobserved parts, as shown in
$$U_{ni} = V_{ni} + \varepsilon_{ni} = \beta' x_{ni} + \varepsilon_{ni}, \tag{1}$$
where $V_{ni}$ is the observed part and $\varepsilon_{ni}$ is an IID random variable which is defined as the unobserved part. $\beta$ is the parameter vector of the explanatory variables $x_{ni}$. MNL can capture systematic taste heterogeneity relating to observed characteristics of the decision maker. However, it cannot capture random taste heterogeneity which is not linked to observed characteristics [34]. The latent class model (LCM) is widely applied to identify behaviorally homogeneous segments. It assumes that a discrete number of segments are sufficient to account for taste heterogeneity across segments and that the individuals are implicitly sorted into a set of $S$ classes. The individuals are relatively homogeneous within each segment but heterogeneous across segments.
The LCM attempts to detect the presence of latent classes. The choice probability that individual $n$ in class $s$ chooses alternative $i$ from a particular set $C_n$ is expressed as
$$P_{n|s}(i) = \frac{\exp(\beta_s' x_{ni})}{\sum_{j \in C_n} \exp(\beta_s' x_{nj})}, \tag{2}$$
where $\beta_s$ is the parameter vector of the explanatory variables $x_{ni}$. Note that (2) is a simple MNL specification in class $s$.
The LCM could also estimate (2) for $S$ classes and predict the probability $H_{ns}$ of individual $n$ being in class $s$. The unconditional probability of choosing alternative $i$ is given as
$$P_n(i) = \sum_{s=1}^{S} H_{ns} P_{n|s}(i). \tag{3}$$
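As a minimal sketch of how the class-specific MNL probabilities combine into the unconditional latent class probability, the following Python code computes both for one choice set. All coefficient values, class shares, and attribute values are hypothetical illustrations, not estimates from this study, and the function names are invented for this example.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit probabilities for one choice set."""
    top = max(utilities)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def linear_utility(beta, x):
    """Observed utility as the inner product of coefficients and attributes."""
    return sum(b * xi for b, xi in zip(beta, x))


def lcm_probabilities(class_shares, class_betas, attributes):
    """Unconditional probabilities: class-share-weighted sum of MNL models."""
    mixed = [0.0] * len(attributes)
    for share, beta in zip(class_shares, class_betas):
        utilities = [linear_utility(beta, x) for x in attributes]
        for i, p in enumerate(mnl_probabilities(utilities)):
            mixed[i] += share * p
    return mixed


if __name__ == "__main__":
    # Hypothetical attributes per alternative: (freight rate, transit days).
    attributes = [(20.0, 1.0), (10.0, 3.0), (12.0, 2.0)]
    # Hypothetical coefficients for a price-sensitive and a time-sensitive class.
    class_betas = [(-0.30, -0.10), (-0.02, -1.50)]
    class_shares = [0.64, 0.36]
    print(lcm_probabilities(class_shares, class_betas, attributes))
```

Setting a single class with share 1 reduces the mixture to the ordinary MNL, which is a convenient sanity check on any implementation.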

Estimation Results
The examination of the survey results consists of two submodels: (1) an MNL model on the entire sample, which aggregates all customers; (2) a latent class segmentation model, in which the presence of taste heterogeneity is expected to some extent.

Entire Sample Model.
The results of the MNL model on the entire sample are shown in Table 3. The model is acceptable because the likelihood ratio index $\rho^2$ (= 0.159) is larger than 0.1 at the 0.05 significance level. The alternative-specific constants (ASC) for SF Express and for SYZHY Express are both statistically significant with positive signs. This indicates that part of the utility of these alternatives is captured by the unobserved component and that customers are more likely to choose the first two alternatives (SF and SYZHY Express) than the third one (the new entrant). The model performs quite well and has the expected sign for all the variables of express service. The increases in freight rate and transit time provide a positive impact on the utilities, whereas the increases in pick-up distance, tracking-and-tracing service, and time window service have negative effects on the utilities. Customers may take all the service variables into consideration when they choose express service.
In addition to the five express service variables, two further variables are considered in this model: the item type to be delivered and the delivery purpose. We divide the items into 3 groups: commodity, electronics, and documents. Furthermore, following Train [34], we normalized the parameters of these attributes to zero in two alternatives, and the nonzero parameter in the other alternative (item type in SF and purpose in New Express) is interpreted as the differential effect of the attribute.
We set the value 0 for commodities, 1 for electronic products, and 2 for documents. This coding is similar to that of Feng et al. [35], who specified individual education as (0, 1, 2, 3), individual characteristics as (0, 1, 2, 3), and the traffic congestion level as (0, 1, 2, 3) to analyze the factors influencing aggressive driving behavior.
The results indicate that item type has a positive and statistically significant effect for SF Express. Therefore, for SF Express, customers obtain larger utility from delivering documents than from delivering commodities or electronic products. For delivery purpose, both e-commerce purposes (online shopping and online sales) and offline purposes (personal and business) are included in this paper. However, the coefficient for delivery purpose ($\beta$ = −0.0920, $t$ = −1.4000) lacks statistical significance. This indicates that customers do not take delivery purpose into account when they choose an express service.
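For readers unfamiliar with the likelihood ratio index, the sketch below shows the usual McFadden definition, 1 − LL(β)/LL(0), and its adjusted form, which penalizes the number of estimated parameters. The equal-shares null model and all numbers in the usage example are assumptions chosen for illustration; the text does not state which null model the authors used.

```python
import math


def null_log_likelihood(n_observations, n_alternatives):
    """Log-likelihood when every alternative is equally likely (assumed null)."""
    return n_observations * math.log(1.0 / n_alternatives)


def rho_squared(ll_model, ll_null, n_parameters=0):
    """McFadden likelihood ratio index; pass n_parameters for the adjusted form."""
    return 1.0 - (ll_model - n_parameters) / ll_null


if __name__ == "__main__":
    # Hypothetical numbers: 100 choices among 3 alternatives, 5 parameters.
    ll_null = null_log_likelihood(100, 3)
    ll_model = -90.0
    print(rho_squared(ll_model, ll_null))
    print(rho_squared(ll_model, ll_null, n_parameters=5))
```

A model that does no better than the null gives an index of 0, and the adjusted index is always at most the unadjusted one.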
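The significance judgment for delivery purpose can be reproduced with a small check. This sketch assumes that the second value reported for delivery purpose (−1.4000) is a t-statistic and that a two-sided test with the large-sample normal approximation is appropriate; neither assumption is stated explicitly in the text.

```python
from statistics import NormalDist


def two_sided_p_value(t_statistic):
    """Two-sided p-value under the standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_statistic)))


def is_significant(t_statistic, alpha=0.05):
    """True when the coefficient differs from zero at level alpha."""
    return two_sided_p_value(t_statistic) < alpha


if __name__ == "__main__":
    t_purpose = -1.4  # value reported for delivery purpose
    print(round(two_sided_p_value(t_purpose), 4))
    print(is_significant(t_purpose))
```

Under these assumptions the p-value is about 0.16, well above 0.05, which agrees with the conclusion that delivery purpose is not statistically significant in the pooled model.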

Latent Class Segmentation Model.
The MNL model assumes taste homogeneity across the sample, so the preference heterogeneity across individuals is ignored in the MNL on the entire sample. However, customers with different tastes can have a variety of service requirements and drive different market trends. Therefore, heterogeneity in taste and endogenous market segmentation need attention in order to better understand consumers' preferences. LCM is the dominant approach in endogenous segmentation and can identify different groups based on taste heterogeneity regarding service attributes. It simultaneously determines the number of segments, the assignment of individuals to segments, and the segment-specific choice model parameters.
Hence, further analysis will apply it to identify the segments and accommodate the taste heterogeneity of the segments.
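One common way to assign individuals to segments after estimation is through posterior membership probabilities obtained with Bayes' rule. The sketch below illustrates that idea only; it is not necessarily the procedure used by the authors' software, and the shares and choice probabilities are hypothetical.

```python
def posterior_membership(class_shares, chosen_probs_by_class):
    """Posterior class probabilities for one individual.

    chosen_probs_by_class[s] lists, for class s, the model probability of the
    alternative the individual actually chose in each choice task.
    """
    joint = []
    for share, probs in zip(class_shares, chosen_probs_by_class):
        likelihood = share
        for p in probs:
            likelihood *= p  # tasks are treated as independent within a class
        joint.append(likelihood)
    total = sum(joint)
    return [value / total for value in joint]


if __name__ == "__main__":
    # Hypothetical respondent with four choice tasks and two classes.
    shares = [0.64, 0.36]
    chosen = [[0.7, 0.6, 0.8, 0.5], [0.2, 0.3, 0.1, 0.4]]
    posterior = posterior_membership(shares, chosen)
    print(posterior)
    print("assigned segment:", posterior.index(max(posterior)) + 1)
```

Each individual is then typically assigned to the class with the highest posterior probability, and how sharply those posteriors separate is what the entropy statistic summarizes.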
In this section, we primarily intend to reveal the taste heterogeneity with respect to express service and some shipment characteristics interacting with express service. To do so, we divide the variables mentioned in the previous section into two types. One is the level-of-service variables, including freight rate, transit time, pick-up distance, tracking-and-tracing, and time window service. The other is the segment variables, such as delivery purpose and item type to be delivered.
A series of criteria and their combinations can be applied to identify the number of segments. The Bayesian Information Criterion (BIC) is the most popular criterion for assessing LCM [26]. Generally, a model with a lower BIC value is superior to one with a higher value [36,37]. Some studies indicate that the adjusted BIC (ABIC) is the best indicator among the information criteria [16,38,39]. In addition to the three fit criteria (AIC, BIC, and ABIC), several likelihood-based tests such as Lo-Mendell-Rubin (LMR) and the bootstrap likelihood ratio test (BLRT), as well as entropy, are considered to determine the number of segments in some other studies [39,40]. To date, there is no commonly accepted best criterion for deciding the number of segments. In fact, the meaningfulness and significance of the parameters should be taken into consideration when deciding the number of segments, not merely the formal criteria such as AIC, BIC, and LRT.
Table 4 summarizes the values of the fit criteria for the models with one to five segments. The Akaike information criterion (AIC) values for the MNL and the two-, three-, four-, and five-segment models are 28667.57, 27096.23, 26800.94, 26554.39, and 26285.41, respectively. BIC values for models with one to five segments are 27323.7, 27144.92, 27014.87, and 26862.4, respectively. Both AIC and BIC decrease monotonically as the number of segments increases. Generally, a lower AIC or BIC value indicates a better segmentation, which means that more segments are preferred in our LCM. However, most segmentation studies produce no more than five segments, because it is difficult for express operators to concentrate on more than five segments [10]. Therefore, we display only the results for the one-segment to five-segment models in Table 4, although experiments with ten-segment models were also conducted. Since no level-off point can be identified in either the AIC or the BIC curve, other information criteria are used to decide the number of segments.
In Table 4, ABIC values gradually decrease from the one-segment model to the five-segment model, a trend similar to that of the AIC and BIC values. However, the change in ABIC from two to three classes is much smaller than that between one and two classes, which is evident from the "elbow" in the ABIC curve shown in Figure 1. The LRT is commonly applied to perform significance tests on the difference between two nested models. The LRT values (change from 0 to 0.85) indicate that the MNL (one-class) model should be rejected in favor of a two-segment model. Additionally, entropy summarizes the degree to which latent classes are distinguishable and the precision with which individuals are placed into classes. High values of entropy (>0.8) indicate that the results of the LCM are quite good.
In conclusion, on the basis of the small change in ABIC, the nonsignificant LRT, and the high value of entropy, the two-segment solution is selected as the final model for further study.
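The three information criteria have simple closed forms. The sketch below uses the standard definitions, with the sample-size adjusted BIC replacing n by (n + 2)/24; whether the authors' software uses exactly this adjustment is an assumption, and the log-likelihoods and parameter counts in the usage example are hypothetical.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


def abic(log_likelihood, n_parameters, n_observations):
    """Sample-size adjusted BIC, using (n + 2) / 24 in place of n."""
    adjusted_n = (n_observations + 2.0) / 24.0
    return -2.0 * log_likelihood + n_parameters * math.log(adjusted_n)


if __name__ == "__main__":
    # Hypothetical fits: (number of segments, log-likelihood, parameter count).
    fits = [(1, -2000.0, 9), (2, -1850.0, 19), (3, -1830.0, 29)]
    n_obs = 1000
    for segments, ll, k in fits:
        print(segments, aic(ll, k), bic(ll, k, n_obs), abic(ll, k, n_obs))
```

Because every criterion adds a penalty that grows with the number of parameters, a lower value rewards fit only when the improvement in log-likelihood outweighs the extra complexity.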
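The entropy statistic referred to here is usually computed as a relative entropy over the posterior membership probabilities, scaled so that 1 means perfect classification and 0 means no separation. The following sketch uses that common definition; the text does not give the formula, so treating it as the authors' exact statistic is an assumption, and the membership values are hypothetical.

```python
import math


def relative_entropy(memberships):
    """Relative entropy of a posterior membership matrix.

    memberships[n][s] is the posterior probability that individual n
    belongs to class s; each row should sum to 1.
    """
    n_individuals = len(memberships)
    n_classes = len(memberships[0])
    if n_classes < 2:
        raise ValueError("entropy needs at least two classes")
    uncertainty = 0.0
    for row in memberships:
        for p in row:
            if p > 0.0:  # the limit of p * log(p) as p -> 0 is 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n_individuals * math.log(n_classes))


if __name__ == "__main__":
    # Hypothetical posteriors for four individuals and two classes.
    posteriors = [[0.95, 0.05], [0.10, 0.90], [0.99, 0.01], [0.20, 0.80]]
    print(relative_entropy(posteriors))
```

Values above roughly 0.8, the threshold cited in the text, indicate that most individuals are assigned to one class with high posterior probability.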

Estimation Results.
The estimation results for the 2-class LCM are shown in Table 5. The adjusted $\rho^2$ and log-likelihood (LL) values of the 2-class model are much better than those of the MNL for the entire sample (adjusted $\rho^2$ = 0.1588, LL = −1745.899). That is to say, the 2-segment LCM is more acceptable and reasonable than the MNL model for the analysis of the entire sample. In addition, almost all descriptive attributes for the alternatives have the expected signs. In the 2-class model, we name Segment 1 and Segment 2 penny pinchers and high-end customers, respectively, according to their characteristics. To identify the various delivery tastes of express customers across segments, a detailed segment-by-segment discussion is provided below.
Segment 1 Penny Pinchers. All the service variables are statistically significant and have the expected signs.
For the customers in this segment, the estimated values of the ASC for SF Express and SYZHY Express are statistically significant. Delivery purpose and item type are not significant at the 0.05 level. This empirical evidence suggests that different delivery purposes and item types do not alter these customers' choice behavior. That is to say, they do not consider delivery purpose or item type when they choose an express company.
An interesting characteristic of this segment is that the customers have a distinct preference for delivery service with a lower freight rate, and offering tracking-and-tracing and time window service could greatly promote their service choice. At the same time, transit time and door-to-door service also affect the choice behavior of the customers in Segment 1. However, they do not pursue very short transit times and pick-up distances as strongly as Segment 2 does.
In summary, customers in this segment are price sensitive, and they have fewer requirements on other service variables than those in Segment 2. Therefore, Segment 1 is named penny pinchers.
Segment 2 High-End Customer. Segment 2 is the smaller group, containing 680 respondents (36%). The segment model fits the data quite well (adjusted $\rho^2$ = 0.8587, LL = −745.2648). Almost all parameter estimates are statistically significant and have the expected signs, except for the freight rate.
It is interesting to find that the customers in Segment 2 do not take tracking-and-tracing and time window service into consideration.In addition, different delivery purposes and item types result in different behaviors for individuals in this segment.
Unusually, the customers in this segment have positive utilities for increases in freight rate, which means that they prefer the express service with the higher freight rate. A possible explanation is that the customers in this segment believe that "you get what you pay for." Furthermore, the customers in Segment 2 focus primarily on transit time and pick-up distance, which represent service quality. This is in line with the results for the ASC, which show that, when the three express companies provide the same service, the customers prefer SF Express, SYZHY, and New Express, in that order. Therefore, we can conclude that Segment 2 customers are more likely to choose express companies with a better service-quality reputation.
Above all, Segment 2 consists of customers who place heavy emphasis on service quality, especially transit time and pick-up distance, and who are willing to pay a higher freight rate to obtain higher-quality service. From their perspective, a delivery service with a higher freight rate is capable of providing more reliable service. As a result, we name Segment 2 high-end customers.
Comparing the two segments, one of them (Segment 2) pays special attention to high quality of express services such as transit time and pick-up distance, while the other (Segment 1) is more concerned with the freight rate of the express service. Thus, New Express should provide cost-efficient express service to attract more penny pincher customers. Although it is hard to appeal to high-end clients at the beginning, it may make sense to keep providing high-standard express services.

Model Validation.
Mean values of demographic and level-of-service variables in each segment, calculated following Bhat [22], can explain the characteristics of each segment more intuitively. We evaluated them to further validate the overall segmentation characteristics, as shown in Tables 6 and 7.
From Table 6, we notice that, for sociodemographic attributes such as age, gender, educational level, and frequency of express use, there is not much difference between the two segments. These attributes are probably not significant characteristics for identifying segments. However, customers in Segment 2 have higher incomes than those in Segment 1, which supports our definition of Segment 2 as the high-end consumers. As illustrated in Table 7, the freight rate and pick-up distance show the most substantial differences across the two segments. Segment 2 is associated with a higher freight rate and a shorter pick-up distance compared with Segment 1. This indicates that Segment 2 has stricter requirements for service quality than Segment 1. Meanwhile, these customers do not care about the price. Regarding the delivery purpose, the proportion of e-commerce consumers (online shopping and online sales) in Segment 1 is higher than that in Segment 2, and the percentage of non-e-commerce consumers in Segment 2 is larger than that in Segment 1. This is not surprising because one of the advantages of e-commerce is its low price. In terms of item type, the percentage of document delivery is higher in Segment 2 than in Segment 1. This may be because delivering a document is usually an official activity and consequently needs high-quality service. Furthermore, the mean values of the delivery companies used by the consumers show that Segment 2 prefers SF while Segment 1 tends to choose SYZHY. Here, SF offers relatively expensive, convenient, and high-quality express service and SYZHY provides standard and low-end service. This is consistent with the expected trend of the segmentation. Hence, there is enough evidence to suggest that our segmentation is reasonable.
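Segment-level means of this kind are often computed by weighting each individual's value by the probability of belonging to the segment. The sketch below shows that calculation as an assumption about how the values in Tables 6 and 7 could be obtained; the exact formula is not given in the text, and the membership probabilities and incomes are hypothetical.

```python
def segment_means(memberships, values):
    """Membership-weighted mean of one variable for each segment.

    memberships[n][s] is the probability that individual n is in segment s;
    values[n] is the individual's value of the variable of interest.
    """
    n_segments = len(memberships[0])
    means = []
    for s in range(n_segments):
        weight = sum(row[s] for row in memberships)
        weighted_sum = sum(row[s] * v for row, v in zip(memberships, values))
        means.append(weighted_sum / weight)
    return means


if __name__ == "__main__":
    # Hypothetical memberships and monthly incomes for four respondents.
    memberships = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
    incomes = [3000.0, 4000.0, 8000.0, 9000.0]
    print(segment_means(memberships, incomes))
```

With hard 0/1 memberships the calculation reduces to ordinary within-segment averages, so the weighted version can be read as a soft generalization of the usual segment profile table.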

Conclusion
The paper has applied MNL and LCM to investigate customers' express service choice behavior, using data from an SP survey. The study has identified the attributes and attribute levels that concern express customers most. The results indicate that the LCM performs statistically better than the MNL in our sample. In addition, the customers can be divided into 2 segments characterized by taste heterogeneity. One is the high-end segment, which primarily focuses on service quality and is willing to pay for it, while the other is the penny pincher segment, which is price sensitive. Hence, to obtain the bulk of the market share, express companies can provide low-price express services to attract penny pincher customers. They can also gain market share among high-end clients by offering high-level express services, without regard to freight rate. This study suggests that more attention should be paid to taste heterogeneity, especially in further academic and policy research on freight choice behavior.

Disclaimer
Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding bodies.

Figure 1: ABIC curve for different segments in LCM.

Table 1: Attributes and the corresponding levels in the survey.

Table 2: Example of choice task. Given the goods type you delivered last time, which express service do you prefer to choose?

Table 3: Parameter estimate results for MNL model.

Table 4: Model fit criteria for models with one to five segments.

Table 5: Parameter estimate results for 2-class model using latent class model.

Table 6: Mean value of demographic variables in each segment.

Table 7: Mean value of service and other related variables in each segment.