Multinomial Logit Model of Pedestrian Crossing Behaviors at Signalized Intersections

Pedestrian crashes, which make up a large proportion of road casualties, are more likely to occur at signalized intersections in China. This paper studies the different pedestrian behaviors of regular users, late starters, sneakers, and partial sneakers. Behavior information was observed manually in the field study. The survey team then distributed a questionnaire to each observed participant to acquire detailed demographic and socioeconomic characteristics as well as attitude and preference indicators. In total, 1878 pedestrians were surveyed at 16 signalized intersections in Nanjing. First, correlation analysis is performed to analyze each factor's effect. Then, five latent variables, namely safety, conformity, comfort, flexibility, and fastness, are obtained by structural equation modeling (SEM). Based on the results of SEM, a multinomial logit model with latent variables is developed to describe how the factors influence pedestrians' behavior. Finally, some conclusions are drawn from the model: (1) for the choice of being late starters, arrival time, the presence of oncoming cars, and crosswalk length are the most important factors; (2) gender has the most significant effect on pedestrians' choice to be sneakers; and (3) age is the most important factor when pedestrians choose to be partial sneakers.


Introduction
Due to high population density, rapid urbanization, and lack of adherence to traffic regulations by both drivers and pedestrians, traffic accidents involving pedestrians have become a major safety problem all over the world, particularly in developing countries. For example, in 2009, 16683 pedestrians were killed in traffic crashes in China, representing 24.62% of all traffic fatalities [1]. By comparison, 4092 pedestrians were killed in traffic crashes in the US in 2009, accounting for only 12.10% of the fatalities sustained in police-reported motor vehicle crashes [2].
Crashes involving pedestrians are most likely to occur when pedestrians are crossing roads, especially at signalized intersections. In China, more than 50% of pedestrian crashes occurred at signalized intersections [1,3]. Moreover, illegal pedestrian behavior is common and widespread in China. Yang et al. claimed that, in developing cities like Xi'an, if a pedestrian is waiting at a signalized intersection, in most cases they are waiting for an acceptable gap in traffic and not for the green signal [4]. According to our study on the behaviors of 6628 pedestrians at 102 signalized crosswalks, the average compliance rate is only 62.8%. These statistics underline the importance of understanding the factors that contribute to pedestrian errors or willful violations of traffic regulations in China [5].

Previous Work
Numerous studies have examined the factors that influence pedestrians' road crossing behaviors, such as traffic environment conditions, road user variables, and social factors.

Traffic Environmental Conditions.
To identify the impact of urban planning on pedestrian crossings, a large study was carried out in Canada comparing pedestrian behaviors in Ontario with those in Quebec [6]. Using an attitude survey and a video survey, Keegan and O'Mahony studied the impact of the pedestrian waiting countdown timer on pedestrian behavior [7]. They found that the timer units could reduce the number of individuals who crossed on red. In another more complete study, Chu et al. discussed the impact of the street environment, including traffic conditions, roadway characteristics, and signal-control characteristics, on crossing behavior [8]. More recently, Tiwari et al. concluded that, as the signal waiting time increases, pedestrians are more likely to violate the traffic signal [9].

Road User Variables.
Several studies have examined gender and age differences in pedestrian behaviors [10][11][12]. Male pedestrians tend to violate traffic rules more frequently than females. Generally, young adults and adolescent pedestrians are more likely to commit violations than older pedestrians (e.g., Moyano Díaz and Holland and Hill), and older road users express more appreciation for signalized intersections than younger pedestrians (e.g., Bernhoft and Carstensen) [12][13][14]. Other factors such as marital status, education, income, and personality variables such as attitude towards risk, sensation seeking, and aggression were found to be related to pedestrian behavior [15][16][17].

Social Factors.
The impact of others' behaviors on the individual has been investigated and found to be complicated. Santor et al. found that peer conformity was one of the strongest predictors of risky behaviors in adolescents [18]. Rosenbloom found that the presence of other pedestrians waiting at the crosswalk upon a pedestrian's arrival decreased the likelihood of crossing on a red light [19]. Moreover, Zhou et al. presented a survey of 426 pedestrians that investigated the effect of conformity tendency on Chinese pedestrians. It was found that people who exhibited a greater tendency towards social conformity also had stronger intentions to cross by following other pedestrians than low-conformity people [20].
Although considerable research has been done in recent years to identify the factors influencing pedestrian crossing behavior, few studies have reported on the effect of pedestrians' preferences and attitudes. Building on the studies described above, which offer valuable information about pedestrian crossing behaviors at signalized intersections, this study aims to assess the effect of pedestrian-related factors and traffic factors on pedestrian behaviors.

The Affecting Factors.
In this study, the affecting factors are divided into two categories: pedestrian-related factors and traffic factors. Pedestrian-related factors refer to the features of individual pedestrians, including demographic and socioeconomic characteristics (gender, age, career, education, and income), family characteristics (marital status, with or without children), trip characteristics (trip purpose, holding a driver license or not), and attitude variables (latent variables). Traffic factors include the attributes of facilities (crosswalk length, signal length) and traffic conditions (arrival time, accompanied or alone, number of pedestrians waiting at the curb, and number of pedestrians crossing the street).

Survey Design.
According to their signal use, pedestrians can be classified into four types: pedestrians who cross the street during the green signal (regular users), pedestrians who begin to cross when the signal is green but do not finish on green (late starters), pedestrians who cross during the red signal (sneakers), and pedestrians who cross part of the crosswalk during the red signal and then continue crossing during the green signal (partial sneakers) [21]. This study focuses on the factors relating to these four types of crossing behaviors, which are the alternatives of the logit model. Behavior information was observed manually in the field study. After obtaining the behavior data, the survey team distributed a questionnaire to each observed participant to acquire detailed demographic and socioeconomic characteristics as well as subjective preference indicators. The serial number method (giving the same number to the behavior record and the questionnaire) was applied to ensure that both records describe the same participant.
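The four-way classification above can be sketched as a small lookup. This is only an illustrative sketch: it assumes each observation is reduced to two flags (signal state when the pedestrian starts and when the pedestrian finishes), and the names `CrossingType`, `classify_crossing`, and the sample serial numbers are hypothetical, since the field coding sheet is not described in the text.

```python
from enum import Enum


class CrossingType(Enum):
    REGULAR_USER = "regular user"
    LATE_STARTER = "late starter"
    SNEAKER = "sneaker"
    PARTIAL_SNEAKER = "partial sneaker"


def classify_crossing(started_on_green: bool, finished_on_green: bool) -> CrossingType:
    """Map the signal state at the start and end of a crossing to a type.

    Assumption: one crossing is summarized by two booleans; the actual
    recording scheme used by the survey team is not given in the paper.
    """
    if started_on_green and finished_on_green:
        return CrossingType.REGULAR_USER
    if started_on_green:
        return CrossingType.LATE_STARTER      # began on green, did not finish on green
    if finished_on_green:
        return CrossingType.PARTIAL_SNEAKER   # began on red, continued on green
    return CrossingType.SNEAKER               # crossed entirely on red


# Hypothetical records keyed by serial number, mirroring the serial number
# method that links a behavior record to its questionnaire.
observations = {
    101: (True, True),
    102: (True, False),
    103: (False, False),
    104: (False, True),
}
labels = {serial: classify_crossing(*flags) for serial, flags in observations.items()}
```

Keying both the behavior label and the questionnaire by the same serial number is what would allow the two data sources to be joined later for model estimation.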

Data Collection.
In the survey, a total of 3952 pedestrians on 32 crosswalks in Nanjing were observed, of whom only 1970 (49.85%) agreed to complete the questionnaire. The data required to build a logit model should include behavior information and other contributing factors. In total, 1878 questionnaires remained after we eliminated the incomplete ones. Gender and age distributions of the complete samples are shown in Table 1.
Statistical analysis is applied to these 1878 effective samples, and the results for the crossing behaviors are shown in Table 2. Only 54.53% of the participants are legal crossing pedestrians. The ones that obey the rules spatially, but not temporally, account for 87.42% of the total number (few are partial jaywalkers and jaywalkers). We do not conduct a deeper analysis of this condition.

Methodology
The objectives of this research are to examine the effect of various factors and to find the most significant predictors that influence pedestrians' crossing behaviors. The overall methodology to analyze how the factors affect pedestrian behavior involves three main steps.
Table note: "%" is the ratio of the pedestrian count of a certain type to the total sample (1878).
Step 1: Correlation Analysis to Identify the Factors. Correlation analysis is needed to assess the actual effect of each factor on crossing behaviors and to identify suitable variables for the discrete choice model. To analyze the correlation between the variables and behaviors, two different methods are applied separately for categorical variables and continuous variables. For the categorical variables, the crosstabs procedure is used, where a single-factor Pearson's chi-square test is applied to determine whether a correlation exists between the factor and the crossing behaviors. For the continuous variables, one-way analysis of variance (ANOVA) is used for the same purpose.
In the variance analyses, the F-test is applied.
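As an illustration of the chi-square step for a categorical factor, the following sketch computes Pearson's statistic for a factor-by-behavior contingency table using only the standard library. The counts are invented for illustration and are not the survey data; the function name `chi_square_independence` is likewise hypothetical.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of factor and behavior.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (NOT the survey data): rows = female, male;
# columns = regular user, late starter, sneaker, partial sneaker.
counts = [
    [520, 150, 180, 60],
    [500, 160, 240, 68],
]
stat, dof = chi_square_independence(counts)

# 7.815 is the 5% critical value of the chi-square distribution with 3 df.
CRITICAL_5PCT_DF3 = 7.815
significant = dof == 3 and stat > CRITICAL_5PCT_DF3
```

A two-level factor crossed with the four behavior types gives three degrees of freedom, so a statistic above 7.815 would indicate an association at the 5% level.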
Step 2: Structural Equation Modeling (SEM) to Link Pedestrians' Preferences to Demographic Characteristics. This step employs SEM to estimate simultaneously the relationships between the demographic data and the attitudinal factors and the relationships between the attitudinal statements and the attitudinal factors. SEM enables testing of a set of linear models to identify the structural attitudes behind crossing behaviors and to quantify the causal relationships between pedestrians' demographic profiles and their attitudes [22]. Both manifest and latent variables are used in the SEM. There are two main groups of manifest variables: (1) the attitudinal indicator variables, which are the ratings that characterize pedestrians' attitudes toward various crossing or travel statements, and (2) socioeconomic and demographic variables, such as gender and income.
Step 3: Multinomial Logit (MNL) Model. The discrete choice model is the main model used in the study and is based on the basic random utility theory assumption [23]. The MNL model is developed to conduct the discrete choice analysis for three reasons. First, the study focuses on four types of pedestrian behaviors, and thus a multinomial model has to be used. Second, the model can handle a wide variety of variables (e.g., numerical, categorical). Third, because the MNL model is widely used, the empirical results can be easily interpreted and understood by decision makers who do not have a strong statistical background.
Traditionally, discrete choice models have considered only objective attributes of the alternatives and socioeconomic characteristics of the individuals as explanatory variables. During the last decade, to capture the impact of subjective factors, a new breed of "hybrid choice" models has been developed; these models allow including not only tangible attributes but also more intangible elements associated with users' perceptions and attitudes, through latent variables [24]. Moreover, it has been shown that introducing latent variables (LV) helps to improve the choice model fit [25,26].
Using the logged odds (logit) of being a regular user or not, separate models for the factors with and without latent variables are estimated. The logged odds of pedestrian behaviors are modeled as a function of various personality and traffic variables.

Model and Estimation
5.1. Correlation Analysis. Using Kendall's tau-b method, we found no internal consistency among the factors. The correlation analysis between each contributing factor and pedestrian crossing behavior is therefore presented factor by factor.
(1) Demographic and Socioeconomic Characteristics. For the gender factor, $\chi^2$ is 7.934, and $p = 0.046 < 0.05$. The $p$ values of the age ($F = 6.541$) and income ($\chi^2 = 21.467$) factors are also smaller than 0.05. In contrast, the $p$ values of the career and education factors are larger than 0.05. Thus, gender, age, and income are significant factors.
(2) Family Characteristics. As the $p$ value is lower than 0.05, the marital status factor is related to crossing behavior choices. In terms of compliance rate, there is no significant difference between married and unmarried pedestrians. However, the two groups differ in the types of violating behaviors. Having children is also a significant factor.
(3) Trip Characteristics. With the prevalence of private cars, holding a driver license has become closely associated with travelers' behaviors. Pedestrians who have a driver license have a significantly higher compliance rate (58.48%) than pedestrians without a driver license (53.37%). Trip purpose is also a significant factor.
(4) Attributes of the Facilities. Among all the facility variables, the crosswalk length, green time, and red time influence the crossing behaviors significantly, with $p$ values of 0.024, 0.042, and 0.006, respectively. The crosswalk width ($p = 0.71$), however, does not have a significant effect.
(5) Traffic Condition. Among all the traffic condition variables, being accompanied or alone, the number of pedestrians waiting at the curb, the number of crossing pedestrians, the presence of oncoming cars, and arrival time are significant.
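For the continuous factors, the one-way ANOVA F statistic compares the variation between behavior groups with the variation within them. The sketch below shows the computation in plain Python; the ages are invented for illustration and the function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ages (NOT the survey data), one list per crossing type:
# regular users, late starters, sneakers, partial sneakers.
ages_by_behavior = [
    [45, 52, 38, 60, 41],
    [33, 29, 47, 36, 40],
    [24, 31, 27, 35, 22],
    [26, 30, 34, 28, 25],
]
f_age, df_b, df_w = one_way_anova_f(ages_by_behavior)
```

The resulting F value would then be compared with the F distribution on (`df_b`, `df_w`) degrees of freedom to obtain the p value reported for a factor such as age.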

The Latent Model.
Behavioral science holds that explanatory variables, such as gender, age, and income, shape a person's subjective preferences (latent variables), which are in turn reflected by the individual's behaviors and attitudes (indicator variables). The latent variable model describes the relationships between the latent variables and their indicators and causes. Figure 1 shows the relationships among the three types of variables.
Latent variables are unobserved attributes, which represent the pedestrians' subjective preferences in crossing behavior choice. In this study, the latent variables are safety ($\eta_{\text{safe}}$), conformity ($\eta_{\text{conf}}$), comfort ($\eta_{\text{comf}}$), flexibility ($\eta_{\text{flex}}$), and fastness ($\eta_{\text{fast}}$). Latent variables are modeled in two aspects.
On one hand, the explanatory variables are used to construct the latent variables as in (1). We assume that the differences in pedestrians' attitudes and preferences are caused by demographic characteristics such as gender, age, and income:

$$\eta = \gamma_1 x_1 + \gamma_2 x_2 + \cdots + \gamma_n x_n + \zeta, \tag{1}$$

where $\eta$ is the latent variable and $x_1, x_2, \ldots, x_n$ are explanatory variables. In this study, the explanatory variables include gender ($x_{\text{gend}}$), age ($x_{\text{age}}$), education ($x_{\text{educ}}$), and income ($x_{\text{inco}}$). $\gamma_1, \gamma_2, \ldots, \gamma_n$ ($\Gamma$) are the parameters to be estimated, and $\zeta$ is an error term.

On the other hand, indicator variables can be represented by latent variables as in (2). Indicator variables are exogenous. We apply two different methods when constructing the latent variables. To construct the safety and conformity latent variables, we use behavioral indicator variables, and for the comfort, flexibility, and fastness latent variables, we use attitudinal indicator variables. The questionnaire uses a 5-point Likert scale to describe the feelings of indicator variables:

$$y = \Lambda \eta + \varepsilon, \tag{2}$$

where $y$ is the vector of indicator variables, $\Lambda$ is the matrix of factor loadings to be estimated, and $\varepsilon$ is an error term.

To build the model, the gender variable is valued as 0 or 1, representing females or males. The age, education, and income variables are valued according to their levels. LISREL software is used to perform the latent variable modeling. Estimation results of $\Lambda$ and $\Gamma$ are shown in Tables 3 and 4, respectively.
Two aspects are used to evaluate the merits of the model. First, various fit indices are applied to the overall analysis of the model. The main evaluation indices, namely $\chi^2/\mathrm{df}$, root mean square error of approximation (RMSEA), comparative fit index (CFI), and goodness-of-fit index (GFI), are 2.31, 0.016, 0.94, and 0.91, respectively; all the indices are in the acceptable range. Therefore, this model is viable.
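The check of the reported fit indices can be written out explicitly. The cutoffs in this sketch are common rules of thumb from the SEM literature and are an assumption here, because the text only states that the indices fall in "the acceptable range" without giving thresholds; the dictionary names are hypothetical.

```python
# Rule-of-thumb cutoffs often quoted for SEM fit indices. They are an
# assumption of this sketch; the paper does not state which thresholds it used.
CUTOFFS = {
    "chi2_df": lambda v: v < 3.0,   # ratio of chi-square to degrees of freedom
    "rmsea": lambda v: v < 0.08,    # root mean square error of approximation
    "cfi": lambda v: v > 0.90,      # comparative fit index
    "gfi": lambda v: v > 0.90,      # goodness-of-fit index
}

# Values reported in the text.
reported = {"chi2_df": 2.31, "rmsea": 0.016, "cfi": 0.94, "gfi": 0.91}

checks = {name: CUTOFFS[name](value) for name, value in reported.items()}
model_acceptable = all(checks.values())
```

Under these assumed cutoffs every reported index passes, which is consistent with the conclusion that the model is viable.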
Second, the significance levels of the parameters are examined, and thus their meanings and suitability are evaluated. The $t$-statistics output by LISREL are used to examine the significance level of the parameters. As shown in Table 3, the $t$ values of all the parameters for $\Lambda$ are larger than 1.96, which means that all the parameters are significant at the 95% confidence level and, thus, the assumption is valid. However, only part of the $t$ values of the $\Gamma$ parameters are larger than 1.96; the others are lower than 1.96 (as shown in Table 4) and, therefore, should be eliminated. Some correlations between latent variables and explanatory variables are shown in Table 4. Pedestrians' preference for safety is positively correlated with age and income level and negatively correlated with gender; conformity is negatively correlated with income, gender, and education; comfort is positively correlated with income, age, and education; flexibility is positively correlated with gender and negatively correlated with age; finally, fastness is positively correlated with income and gender but negatively correlated with age.

Model Variables.
After the factors are screened using the correlation analysis, the significant variables included in the model are classified as dichotomous variables, ordered categorical variables, or continuous variables.
When used as independent variables, dichotomous variables are usually valued as 0 or 1. Dichotomous variables in this study include gender, having a driver license or not, trip purpose, marital status, with or without children, accompanied or alone, and the presence of oncoming cars. Ordered categorical variables can be analyzed using dummy variables. The only ordered categorical variable is the level of income. There are four income levels in total, which are represented by three dummy variables. Generally, the highest level is chosen as the reference. Continuous variables are typically put directly into the model; they include age, crosswalk length, the number of pedestrians waiting at the curb, the number of crossing pedestrians, and arrival time.
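The dummy coding of income described above can be sketched as follows. The function name `income_dummies` and the integer coding of levels 1 to 4 are assumptions for illustration; the three returned indicators correspond to variables such as income1, income2, and income3, with the highest level as the reference.

```python
def income_dummies(level: int, n_levels: int = 4) -> list:
    """Return n_levels - 1 indicator values for an ordered income level.

    Assumption: levels are coded 1..n_levels and the highest level is the
    reference, so it maps to all zeros.
    """
    if not 1 <= level <= n_levels:
        raise ValueError("income level out of range")
    return [1 if level == k else 0 for k in range(1, n_levels)]


# Each of the four levels mapped to its (income1, income2, income3) coding.
coding = {level: income_dummies(level) for level in range(1, 5)}
```

With this coding, each estimated income coefficient is read as the difference in log-odds between that income level and the highest level.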

MNL Model Estimation.
Based on the analysis in Section 3.2, the model contains four alternatives (regular users, late starters, sneakers, and partial sneakers). The all-green type (regular users) is chosen as the reference; thus, the logit model of the $i$th behavior can be written as

$$\ln\left(\frac{P_i}{P_1}\right) = \alpha_i + \sum_{k} \beta_{ik} x_{ik}, \quad i = 2, 3, 4,$$

where $P_1$, $P_2$, $P_3$, and $P_4$ are the probabilities of choosing to be a regular user, late starter, sneaker, and partial sneaker, respectively. $\alpha_i$ is the constant, and $\beta_{ik}$ is the parameter of the explanatory variable $x_{ik}$. Also, $x_{ik}$ indicates the $k$th explanatory variable when choosing the $i$th crossing behavior, such as gender, age, income1, and income2 in the model.
Using Biogeme software, models with latent variables are created, and the estimation results are shown in Table 5.
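To show how log-odds against the reference alternative translate into choice probabilities, the sketch below applies the standard MNL transformation with regular users as the reference. It does not use Biogeme, and all coefficients and attribute values are placeholders rather than the estimates in Table 5; the function names are hypothetical.

```python
import math


def linear_predictor(constant, coefficients, attributes):
    """Constant plus the sum of coefficient * attribute over named variables."""
    return constant + sum(coefficients[k] * attributes[k] for k in coefficients)


def mnl_probabilities(log_odds, reference="regular user"):
    """Convert ln(P_i / P_ref) for each non-reference alternative to probabilities."""
    denom = 1.0 + sum(math.exp(v) for v in log_odds.values())
    probs = {reference: 1.0 / denom}
    for name, v in log_odds.items():
        probs[name] = math.exp(v) / denom
    return probs


# Placeholder attributes and coefficients (NOT the Table 5 estimates).
pedestrian = {"oncoming_car": 1.0, "arrival_time": 0.5}
log_odds = {
    "late starter": linear_predictor(-1.0, {"oncoming_car": -0.6, "arrival_time": 1.2}, pedestrian),
    "sneaker": linear_predictor(-0.5, {"oncoming_car": -0.9, "arrival_time": 0.1}, pedestrian),
    "partial sneaker": linear_predictor(-2.0, {"oncoming_car": -0.4, "arrival_time": 0.2}, pedestrian),
}
probs = mnl_probabilities(log_odds)
```

The four probabilities sum to one, and the logarithm of any probability divided by the regular-user probability recovers the corresponding linear predictor.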

Discussion.
From the estimation results of the parameters, for $\ln(P_2/P_1)$ (the log of the probability ratio of being a late starter versus a regular user), the variable "arrival time" has the largest parameter value (2.04). Those who arrive during the last few seconds of the green time tend to be late starters rather than regular users. The variable with the second largest parameter is "oncoming car": when there is no oncoming traffic, pedestrians are more likely to take the risk of crossing the street. Moreover, variables such as "length" and "green time" indicate that pedestrians are more likely to fail to cross the street within the green time when the crosswalk is longer and the green time is shorter. Therefore, this type of crossing behavior is affected primarily by the traffic conditions and the road facilities. Furthermore, as for the latent variables, the stronger the subjective inclination for fastness, safety, and flexibility, the more likely the pedestrian is to be a late starter.
For $\ln(P_3/P_1)$ (the log of the probability ratio of being a sneaker versus a regular user), the variable "gender" has the largest parameter value in magnitude (−3.75), showing that gender has the strongest effect on pedestrians' choice of crossing on red and that the two are negatively correlated. Other significant factors include "oncoming car," "trip purpose," and "countdown." For the latent variables, conformity, fastness, and safety greatly affect the choice of the pedestrian to cross on red. The influence of safety is smaller than that of fastness, showing that pedestrians would choose a greater walking speed and efficiency at the price of safety.
Finally, for $\ln(P_4/P_1)$ (the log of the probability ratio of being a partial sneaker versus a regular user), the variable "age," with a parameter value of −1.78, is the most significant factor. Other factors such as "gender," "oncoming car," and "income" also greatly affect pedestrians' choice to be partial sneakers. Apart from that, pedestrians' subjective preferences for conformity, flexibility, and fastness also affect their behavior choice. Thus, those who place a high value on these three features are more likely to be partial sneakers. In other words, when many pedestrians cross before the green signal and the comfort of the crossing environment is poor, pedestrians are more likely to cross, which is especially true for those with urgent purposes.
It can be seen from the above that the effects of socioeconomic and traffic factors differ considerably across behaviors. However, there are still some similarities. For example, "oncoming car" is among the three most influential factors for all three behaviors. Traffic condition is therefore an important factor when pedestrians decide whether or not to violate the signal.
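One way to read the parameter values quoted in this discussion is through the odds multiplier exp(β): in an MNL model, a one-unit increase in a variable multiplies the odds of the given behavior relative to being a regular user by exp(β), other variables held fixed. The sketch below applies this to the three values reported in the text; how each variable is coded and scaled in Table 5 is not shown here, so the interpretation of "one unit" is an assumption.

```python
import math

# Largest-magnitude parameters quoted in the discussion, keyed by
# (behavior compared with regular users, variable).
reported_betas = {
    ("late starter", "arrival time"): 2.04,
    ("sneaker", "gender"): -3.75,
    ("partial sneaker", "age"): -1.78,
}

# Multiplier applied to the odds P_i / P_1 per one-unit increase in the variable.
odds_multipliers = {key: math.exp(beta) for key, beta in reported_betas.items()}

# A positive parameter gives a multiplier above 1 (odds increase);
# a negative parameter gives a multiplier below 1 (odds decrease).
direction = {key: ("increases" if m > 1 else "decreases") for key, m in odds_multipliers.items()}
```

Multipliers far from 1 in either direction mark the variables with the strongest effect, which matches the ranking by absolute parameter value used in the discussion.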

Final Analysis.
To acquire more detailed information about pedestrian violations, this study examines pedestrians' behavior characteristics and the affecting factors at signalized intersections. Information was obtained by field observation, where 1878 pedestrians were surveyed at 16 intersections in Nanjing, China. Conclusions are drawn based upon the correlation analysis and the MNL model.
The main findings are as follows: (1) the presence of oncoming cars, crosswalk length, and green time are the most significant factors when pedestrians choose to be late starters; (2) gender, oncoming cars, and trip purpose are the most significant factors when pedestrians choose to be sneakers; and (3) age, gender, and oncoming cars are the most significant factors when pedestrians choose to be partial sneakers.

Facility Design.
As the presented analysis shows, the traffic conditions and road facilities significantly affect the crossing behaviors of pedestrians. To strengthen pedestrians' awareness of obeying the rules, the facilities should be carefully designed for pedestrian crossings.
Firstly, pedestrians' needs should be considered in the design of crosswalks. For example, the shortest route for a crosswalk should be chosen to reduce physical effort and any mental obstacles.
Secondly, signal countdown devices should be installed to give pedestrians better information. Pedestrians could then make more effective decisions, which may decrease illegal behaviors.
Finally, the design of green time for pedestrians should also be improved. In China, the crossing pedestrians and the vehicles moving in the same direction are allowed to pass simultaneously in most cases. Although this is simple, the signal facilities for the pedestrians are ineffective. When designing the signal, neighboring land use and the results of traffic surveys should also be combined to identify the main pedestrian groups at the site. Moreover, the design speed for crossing pedestrians should be based on their gender and age and on the attributes of the facilities (e.g., with or without a countdown timer) at the intersection. In this way, the shortest crossing time for the pedestrians can be determined methodically.
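As a rough illustration of determining the shortest crossing time from a design walking speed, the sketch below divides crosswalk length by speed. The group-specific speeds and the crosswalk length are hypothetical placeholders, not values from this study, and the calculation ignores start-up delay and clearance intervals.

```python
def min_crossing_time(crosswalk_length_m: float, walking_speed_mps: float) -> float:
    """Shortest time in seconds needed to walk the full crosswalk."""
    if walking_speed_mps <= 0:
        raise ValueError("walking speed must be positive")
    return crosswalk_length_m / walking_speed_mps


# Hypothetical design speeds in m/s for the main pedestrian groups at a site.
design_speeds = {"general adults": 1.2, "older pedestrians": 1.0}

# Hypothetical crosswalk length in meters.
length_m = 24.0
required_green = {group: min_crossing_time(length_m, v) for group, v in design_speeds.items()}

# Designing for the slowest main group gives the most conservative green time.
design_green = max(required_green.values())
```

Choosing the speed of the slowest main group at a site is one possible design rule; the text only states that the design speed should reflect the pedestrians' characteristics and the facilities.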

Safety Education.
In this study, it was shown that the following groups are more likely to break the rules at a crosswalk: young and middle-aged people, females, and low-income people. Therefore, various methods of safety education should be applied to best suit the pedestrians in these groups. For example, female pedestrians should be told more about the rules, as some of them are not familiar with traffic rules in China. In addition, social psychology should be considered so that publicity and education reach the public effectively. Also, computer and network technology should be harnessed to open up new channels for traffic safety publicity and public education.

Table 1: Descriptive statistics for the samples.

Table 2: Statistical results of the crossing behaviors.

Table 3: The $\Lambda$ matrix of factor loadings.