Examining the Bus Ridership Demand: Application of Spatio-Temporal Panel Models

Marlin Engineering, Marlin, FL, USA
Queensland University of Technology, Centre for Accident Research & Road Safety - Queensland (CARRS-Q), Brisbane, Australia
Department of Civil & Environmental Engineering, University of Central Florida, Orlando, FL, USA
Department of Civil and Environmental Engineering, Faculty of Engineering, Imperial College, London, UK
Department of Civil, Environmental & Construction Engineering, University of Central Florida, Orlando, FL, USA


Introduction
The overreliance on the private automobile in the US over the last few decades has resulted in various negative externalities. These include traffic congestion and crashes, air-pollution-related environmental and health concerns, and dependence on foreign fuel [1]. There is renewed enthusiasm among policy makers and transportation professionals to reduce this reliance on the private automobile. Several urban regions are promoting public transportation and nonmotorized modes of travel through infrastructure investments such as public transit extensions, new commuter rail lines, and bicycle sharing systems (see Jaffe [2] and TPP [3] for public transportation projects under construction or consideration).
Nonmotorized modes of transportation are beneficial in the urban core. Public transit, however, can serve populations residing throughout the urban region and can therefore enhance mobility for a large share of urban residents. Public transit investments are critical in growing urban regions such as Orlando, Florida. In recent years, the Greater Orlando region has experienced rapid growth; according to the US Census Bureau, Orlando is the fastest growing urban region among the country's thirty large urban regions [4]. The majority (about 74%) of the population growth in the region is reportedly driven by domestic and international migration. This rapid increase in population elevates the stress on the existing transportation system. Thus, it is not surprising that several transportation and public transit investments are underway in the region to alleviate traffic congestion and improve mobility for Greater Orlando residents. Statistical models are an important tool for evaluating the influence of these investments on transit ridership. Transit system managers and planners rely mostly on such models to identify the factors that affect ridership and to quantify the magnitude of their impact (e.g., see [5,6]). These models provide feedback to agencies on the benefits of public transit investments while also offering lessons to improve the investment process.
Orlando is a typical American city in the south, characterized by urban sprawl, excessive dependence on the automobile, and a captive ridership. It therefore provides an ideal test bed for identifying factors that influence public transit ridership. Drawing on stop-level public transit boarding and alighting data for six four-month periods from May 2013 to April 2015, the current study estimates stop-level ridership models. Specifically, we apply a spatial panel regression model that accommodates the influence of observed exogenous factors as well as unobserved factors. We consider the following exogenous factors:

- stop-level attributes (such as headway);
- transportation infrastructure variables (such as secondary highway length, including major and minor arterials and major collectors; railroad length; local road length; and sidewalk length);
- transit infrastructure variables (bus route length, presence of a shelter, and distance of the bus stop from the central business district (CBD));
- land use and built environment attributes (land use mix, household density, and employment density);
- demographic and socioeconomic variables in the vicinity of the bus stop (income, vehicle ownership, and age and gender distribution).

The repeated observations at the stop level offer multiple dimensions of unobserved factors, including stop-level, spatial, and temporal factors. For instance, the ridership of one bus stop may be influenced by the ridership of neighbouring bus stops. The ridership of a stop may also be influenced by its own ridership in previous time periods, and it may be interconnected with the ridership of neighbouring stops in earlier time periods. Neglecting such spatial and temporal interconnections (if present) may result in biased estimates of the underlying ridership mechanism.
To that end, the major objective of this study is to accommodate spatial and temporal effects (observed and unobserved) in modeling bus stop-level ridership. In our analysis, we apply a framework proposed by Elhorst [7] to accommodate the aforementioned observed and unobserved factors (spatial and temporal effects). Further, to accommodate the repeated observations of ridership, we employ a spatial panel model. The panel models developed include panel spatial error and panel spatial lag formulations (see Faghih-Imani and Eluru [8] for a similar formulation in another context). A validation exercise is conducted to illustrate the applicability of the model framework.

The remainder of the paper is organized as follows. Section 2 reviews earlier research and positions the current study. Section 3 outlines the methodology. Section 4 presents the empirical analysis, along with a description of the data source and data preparation. Section 5 presents the model estimation results, together with a discussion of the results and the validation. Finally, Section 6 summarizes the findings and concludes the paper.

Literature Review and Current Study
Traditional travel demand modeling research has focused on automobile travel. Only recently have studies begun to undertake a detailed analysis of transit systems and associated ridership. These studies examine transit ridership to identify the impact of socioeconomic characteristics, the built environment, and transit attributes on ridership across different contexts [5]. Broadly, they examine:

- macrolevel ridership [9,10];
- the impact of financial attributes such as fare, fuel price, and parking cost [11-17];
- the effect of transit attributes and transit level of service [18,19];
- the effect of the built environment on transit ridership.

The last group of studies is particularly relevant to the current research effort.
These studies can be classified by the transit mode of interest, such as rail, metro, and bus. As the focus of the current work is bus transit ridership, we limit our review to bus ridership studies; for studies on rail and metro, we refer the reader to [5,20]. Stop-level bus ridership studies commonly use one of three dependent variables: daily boardings and alightings, time-period-specific boardings and alightings, or the sum of boardings and alightings. A brief review of the most relevant literature follows.
Ryan and Frank [21] highlighted the value of an area's walkability in determining transit ridership for San Diego; walkability was computed from land use mix, street patterns, and density. Johnson [22] studied transit boardings in the Twin Cities region using an ordinary least squares approach. The analysis highlighted the value of vertical mixed use and retail establishments close to the stops. The study also found that population density in the larger vicinity of the stop is more critical to ridership than population immediately around the stop. Pulugurtha and Agurla [6] applied spatial proximity and spatial weighting methods to analyze stop-level ridership data from Charlotte. The models were estimated under various buffer sizes, and the authors concluded that a 0.25-mile buffer provided adequate model fit. Dill et al. [23] used data from Portland, Eugene-Springfield, and Jackson County to estimate separate log-linear regression models for each region. They concluded that improving transit level of service and developing a pedestrian-friendly environment near the stops positively influenced ridership.
Estupiñán and Rodríguez [24] employed a simultaneous model that accommodates the interaction between transit supply and demand in Bogotá. They concluded that promoting walking and creating barriers to car use are likely to increase ridership. Banerjee et al. [25] examined two corridors in Los Angeles and concluded that several land use and sociodemographic variables affected ridership on rapid bus transit systems. Tang and Thakuriah [26] found that providing real-time bus information slightly increased bus ridership in Chicago. Chakour and Eluru [5] employed an ordered response model, estimated with a composite maximum likelihood approach, to accommodate common unobserved factors influencing time-period-specific boardings and alightings. The results clearly highlighted the presence of such unobserved dependencies, in addition to the impact of land use and urban form variables.

More recently, Rahman et al. [20] used the same data as the current paper to formulate a grouped ordered response model that allowed for correlation between daily boardings and alightings at the stop level. The study also accommodated the repeated measures available in the data. It found that transit service affected ridership significantly, while the effect of land use and urban form variables differed substantially across buffer sizes. Further, in their analysis, bus route length, sidewalk length, the presence of a low-income population, and the proportion of the population without a vehicle were likely to increase stop-level ridership.

Current Study.
The review of earlier research shows that research on bus transit ridership is burgeoning. However, the literature has two limitations. First, earlier work is usually based on cross-sectional (single-time snapshot) ridership data, with the exception of Rahman et al. [20]. Second, earlier literature on bus transit ridership has not accommodated observed and unobserved spatial effects on ridership. To address these limitations, we formulate and estimate a spatial panel model. The model accommodates repeated ridership data for the same stop as well as the impact of spatial and temporal observed and unobserved factors.
Our data contain average daily weekday boarding and alighting ridership for six four-month periods between May 2013 and April 2015. To accommodate spatial factors, we consider the spatial error and spatial lag variants most commonly employed in cross-sectional data analysis. The models are developed separately for boardings and alightings. We compare the spatial error and lag models with simple linear regression models. This comparison identifies the improvement in model fit from accommodating spatial unobserved effects and panel repeated measures.
The models are estimated with a host of exogenous variables generated for the study region and are validated using a holdout sample.

Methodology
In this paper, we consider boarding and alighting data for each bus stop for six time periods. This section gives a brief overview of the econometric methodology (see Elhorst [7] for complete details of the econometric models).
Let $q = 1, 2, \ldots, Q$ index each station (spatial unit); in our study, $Q = 3{,}495$. Let $t = 1, 2, \ldots, T$ index each time period; in our study, $T = 6$. A pooled linear regression model for panel data with spatial specific effects, but without spatial dependency, can be written as

$$y_{qt} = \mathbf{x}_{qt}'\boldsymbol{\beta} + \mu_q + \epsilon_{qt},$$

where:

- $y_{qt}$ is the logarithm of boardings (or alightings);
- $\mathbf{x}_{qt}$ is a column vector of attributes at station $q$ and time $t$;
- $\boldsymbol{\beta}$ is the corresponding column vector of coefficients to be estimated;
- $\epsilon_{qt}$ is a random error term, assumed to be independently and identically distributed normal across $q$ and $t$ with zero mean and variance $\sigma^2$;
- $\mu_q$ is a spatial specific effect that accounts for all station-specific, time-invariant unobserved attributes.

The spatial specific effect can be treated as a fixed effect or a random effect. In the fixed effects model, a dummy variable is created for every station. In the random effects model, $\mu_q$ is treated as a random term that is independently and identically distributed with zero mean and variance $\sigma_\mu^2$. The spatial random effects and the random error term are assumed to be independent.
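The role of the random stop effect $\mu_q$ can be illustrated with a small simulation. The Python sketch below is not the estimator used in the paper, which relies on Elhorst's maximum likelihood routines. It generates hypothetical panel data from the pooled random effects model above and recovers $\boldsymbol{\beta}$ by GLS quasi-demeaning, assuming the variance components $\sigma_\mu^2$ and $\sigma^2$ are known. The function names and all numerical values are illustrative assumptions.

```python
import numpy as np


def simulate_panel(Q, T, beta, sigma_mu, sigma_eps, rng):
    """Hypothetical data from y_qt = x_qt' beta + mu_q + eps_qt."""
    K = len(beta)
    X = rng.normal(size=(Q, T, K))
    X[:, :, 0] = 1.0                                  # first regressor is the intercept
    mu = rng.normal(scale=sigma_mu, size=(Q, 1))      # stop-specific random effect, constant over t
    eps = rng.normal(scale=sigma_eps, size=(Q, T))    # idiosyncratic error
    y = X @ beta + mu + eps                           # shape (Q, T)
    return X, y


def random_effects_gls(X, y, sigma_mu, sigma_eps):
    """Random effects GLS via quasi-demeaning, assuming known variance components."""
    Q, T, K = X.shape
    theta = 1.0 - np.sqrt(sigma_eps**2 / (T * sigma_mu**2 + sigma_eps**2))
    # Subtract theta times each stop's time average from every observation
    y_star = y - theta * y.mean(axis=1, keepdims=True)
    X_star = X - theta * X.mean(axis=1, keepdims=True)
    beta_hat, *_ = np.linalg.lstsq(X_star.reshape(Q * T, K), y_star.reshape(Q * T), rcond=None)
    return beta_hat


rng = np.random.default_rng(0)
beta_true = np.array([1.5, -0.4, 0.8])
X, y = simulate_panel(Q=3495, T=6, beta=beta_true, sigma_mu=0.7, sigma_eps=0.5, rng=rng)
beta_hat = random_effects_gls(X, y, sigma_mu=0.7, sigma_eps=0.5)
print(beta_hat)  # should lie close to beta_true
```

Setting `theta` to 0 gives pooled OLS. Setting it to 1 gives the fixed effects (within) estimator, which turns any time-invariant regressor, such as distance to the CBD, into a column of zeros. For that reason such regressors cannot be estimated under fixed effects.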
The fixed effects methodology is not appropriate in the presence of time-invariant independent variables. In addition, fixed effects models estimate a large number of parameters (one specific to each station), which makes them computationally cumbersome for large systems such as ours. Therefore, in the current study, we restrict ourselves to a spatial random effects model (see note 1).
In the econometric literature, spatial dependency can be incorporated through different modeling frameworks. These include the spatial lag or spatial autoregressive (SAR) model, the spatial error model (SEM), geographically weighted regression (GWR), and the spatial Durbin model. In the current study, we consider two forms of spatial autocorrelation in examining bus ridership:

1. the SAR model, which accounts for spatial endogenous interactions through a spatially lagged dependent variable, that is, the dependent variable at other stops;
2. the SEM model, which accounts for spatial interactions through a spatially autocorrelated error term.
A spatial lag model can be written as

$$y_{qt} = \delta \sum_{j=1}^{Q} w_{qj}\, y_{jt} + \mathbf{x}_{qt}'\boldsymbol{\beta} + \mu_q + \epsilon_{qt},$$

where $\delta$ is the spatial autoregressive coefficient and $w_{qj}$ is an element of the spatial weight matrix $\mathbf{W}$, which defines the spatial arrangement of the stops. The literature also uses other types of spatial weight matrices. In our study, $\mathbf{W}$ is a $3{,}495 \times 3{,}495$ matrix. Its elements equal 1 for pairs of stations within an 800 m buffer of each other and 0 otherwise. The diagonal elements of $\mathbf{W}$ are set to zero so that $y_{qt}$ is not used to model itself. For stability in estimation, a row-normalized form of $\mathbf{W}$ is employed as our spatial weight matrix (see Elhorst [7] for more details on the $\mathbf{W}$ matrix).
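The row-normalized 800 m distance-band weight matrix can be built as sketched below in Python. This is an illustrative sketch, not the authors' GIS workflow. It assumes that stop coordinates are already projected to a planar system in metres, and the function name `distance_band_weights` is hypothetical. Stops with no neighbour inside the band keep a row of zeros.

```python
import numpy as np


def distance_band_weights(coords, threshold=800.0):
    """Row-normalized binary weights: w_qj = 1 if stops q != j are within `threshold` metres."""
    dx = coords[:, 0][:, None] - coords[:, 0][None, :]
    dy = coords[:, 1][:, None] - coords[:, 1][None, :]
    W = (np.hypot(dx, dy) <= threshold).astype(float)
    np.fill_diagonal(W, 0.0)                          # a stop is never its own neighbour
    row_sums = W.sum(axis=1, keepdims=True)
    # Divide each row by its number of neighbours; isolated stops keep a zero row
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)


# Four hypothetical stops: stops 0-1 are 500 m apart and stops 0-2 are 700 m apart,
# stops 1-2 are about 860 m apart, and stop 3 is far from all the others
coords = np.array([[0.0, 0.0], [500.0, 0.0], [0.0, 700.0], [5000.0, 5000.0]])
W = distance_band_weights(coords)
print(W)
```

For 3,495 stops, this dense approach holds a few 3,495 × 3,495 float64 arrays of roughly 100 MB each, which is manageable. A sparse matrix built from a KD-tree neighbour search would scale better to larger systems.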
A spatial error model may be written as

$$y_{qt} = \mathbf{x}_{qt}'\boldsymbol{\beta} + \mu_q + \varphi_{qt}, \qquad \varphi_{qt} = \rho \sum_{j=1}^{Q} w_{qj}\, \varphi_{jt} + \epsilon_{qt},$$

where $\varphi_{qt}$ is the spatially autocorrelated error term and $\rho$ is the spatial autocorrelation coefficient. Both the spatial lag model and the spatial error model can be estimated using the maximum likelihood approach (see Elhorst [7] for details on the likelihood functions). In this paper, we use the MATLAB routines provided by Elhorst [7,27] to estimate pooled spatial lag and spatial error models with spatial specific random effects.
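The Python sketch below makes the role of the Jacobian term $\ln|\mathbf{I} - \delta\mathbf{W}|$ in the likelihood concrete. It estimates a pooled spatial lag model by concentrated maximum likelihood, using a grid search over $\delta$. It is a simplified stand-in for Elhorst's MATLAB routines, not a reimplementation of them. It omits the random effects $\mu_q$ and uses simulated rather than Lynx data, and all names and values are hypothetical.

```python
import numpy as np


def distance_band_weights(coords, threshold=800.0):
    """Row-normalized 0/1 weights for stops within `threshold` metres (zero diagonal)."""
    d = np.hypot(coords[:, 0][:, None] - coords[:, 0][None, :],
                 coords[:, 1][:, None] - coords[:, 1][None, :])
    W = (d <= threshold).astype(float)
    np.fill_diagonal(W, 0.0)
    rs = W.sum(axis=1, keepdims=True)
    return np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)


def sar_concentrated_loglik(delta, y, X, W):
    """Pooled SAR log-likelihood with beta and sigma^2 concentrated out.
    y: (T, Q) outcomes, X: (T, Q, K) regressors, W: (Q, Q) weights."""
    T, Q = y.shape
    Wy = y @ W.T                                      # row t holds W y_t
    y_filtered = (y - delta * Wy).reshape(T * Q)
    X_flat = X.reshape(T * Q, -1)
    beta, *_ = np.linalg.lstsq(X_flat, y_filtered, rcond=None)  # OLS for beta given delta
    resid = y_filtered - X_flat @ beta
    sigma2 = resid @ resid / (T * Q)                  # ML variance given delta
    _, logdet = np.linalg.slogdet(np.eye(Q) - delta * W)        # Jacobian ln|I - delta W|
    ll = -0.5 * T * Q * (np.log(2 * np.pi * sigma2) + 1.0) + T * logdet
    return ll, beta


def fit_sar_grid(y, X, W, grid=np.linspace(-0.95, 0.95, 191)):
    """Maximize the concentrated log-likelihood over a grid of delta values."""
    lls = [sar_concentrated_loglik(d, y, X, W)[0] for d in grid]
    delta_hat = grid[int(np.argmax(lls))]
    return delta_hat, sar_concentrated_loglik(delta_hat, y, X, W)[1]


# Simulate y_t = (I - delta W)^{-1} (X_t beta + eps_t) for hypothetical stops
rng = np.random.default_rng(1)
Q, T, delta_true = 300, 6, 0.4
beta_true = np.array([1.0, 1.0, -0.8])
W = distance_band_weights(rng.uniform(0, 10_000, size=(Q, 2)))
X = np.concatenate([np.ones((T, Q, 1)), rng.normal(size=(T, Q, 2))], axis=2)
eps = rng.normal(scale=0.5, size=(T, Q))
A_inv = np.linalg.inv(np.eye(Q) - delta_true * W)
y = (X @ beta_true + eps) @ A_inv.T                   # row t equals A_inv @ (X_t beta + eps_t)

delta_hat, beta_hat = fit_sar_grid(y, X, W)
print(delta_hat, beta_hat)
```

In practice, the log-determinant is often computed once from the eigenvalues $\lambda_i$ of $\mathbf{W}$, using $\ln|\mathbf{I} - \delta\mathbf{W}| = \sum_i \ln(1 - \delta\lambda_i)$ when the eigenvalues are real. A one-dimensional optimizer then replaces the grid. The routines used in the paper additionally handle the random effects $\mu_q$.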

Empirical Analysis
The Greater Orlando region, with a population of 2.3 million in 2016, is a typical American city in the south with an automobile-oriented transportation system. Its mode shares are automobile (85.7%), public transit (1.0%), walk (9.2%), and bike (1.2%). The main public transit service in the region is the Lynx system, which serves an area of approximately 2,500 square miles within Orange, Seminole, Osceola, and Polk Counties in Central Florida. The bus system operates 77 daily routes with an average weekday ridership of around 105,000. The analysis considers 3,745 bus stops. Data from 3,495 stops are used for model estimation, and data from the remaining 250 stops are set aside for validation. In addition to Lynx, the transit system includes a newly launched commuter rail system, SunRail. The rail line is 31 miles long with 12 stations and had an average weekday ridership of about 3,800 in 2015. Figure 1 shows the study area along with the Lynx bus routes, bus stops, the SunRail line, and SunRail station locations.
The ridership data were obtained from the Lynx transit authority. For our analysis, weekday boarding and alighting data for the […] Table 1. The standard deviation is large because ridership varies considerably across the bus stops in our analysis.
To identify the universal set of candidate attributes, we conducted an extensive review of the factors considered in the public transit ridership literature. GIS shape files from Lynx were used to generate the number of bus stops and the bus route length. To create the exogenous variables, we considered various buffer distances (800 m, 600 m, 400 m, and 200 m) around each bus stop. The exogenous variables were generated from multiple data sources, including 2010 US Census data, the American Community Survey, the Florida Geographic Data Library, and Florida Department of Transportation databases. The exogenous attributes considered in our study can be divided into five broad categories:

1. stop-level attributes (such as headway);
2. transportation and transit infrastructure variables (secondary highway length, including major and minor arterials and major collectors; railroad length; local road length; sidewalk length; Lynx bus route length; presence of a shelter; and distance of the bus stop from the CBD);
3. built environment and land use attributes (such as land use mix, household density, and employment density);
4. sociodemographic and socioeconomic variables in the vicinity of the stop (income, vehicle ownership, and age and gender distribution);
5. temporal and spatiotemporal lagged variables (such as stop boardings (alightings) in the previous time period).
For each bus stop, the temporal lagged variable is the stop's boardings (alightings) in the previous time period. Spatiotemporal lagged variables were created from the stops within a buffer around each stop, using buffer sizes of 800 m, 600 m, 400 m, and 200 m. They are based on the boardings (alightings) of those neighbouring stops in the previous time period. The descriptive statistics of the exogenous variables are presented in Table 2.
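The two lagged variables can be sketched as simple array operations in Python. The text does not state whether neighbouring stops' ridership is summed or averaged, so the sketch assumes a sum of their previous-period ridership, computed with an unnormalized 0/1 buffer matrix. It also simply drops the first period, which has no predecessor. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def lagged_ridership(ridership, in_buffer):
    """ridership: (T, N) average daily boardings (or alightings) by period and stop.
    in_buffer: (N, N) 0/1 matrix, 1 if stop j lies within the buffer of stop q (zero diagonal).
    Returns temporal and spatiotemporal lags for periods 2..T, each of shape (T-1, N)."""
    previous = ridership[:-1]                        # periods 1..T-1
    temporal_lag = previous.copy()                   # same stop, previous period
    spatiotemporal_lag = previous @ in_buffer.T      # sum over buffer neighbours, previous period
    return temporal_lag, spatiotemporal_lag


ridership = np.array([[10, 20, 5],
                      [12, 18, 6],
                      [11, 25, 4]])
in_buffer = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
temporal, spatiotemporal = lagged_ridership(ridership, in_buffer)
print(spatiotemporal)  # rows: [20, 15, 20] and [18, 18, 18]
```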

Model Specification and Overall Measures of Fit.
The empirical analysis in our study is based on two models for boarding and alighting ridership: (1) the Spatial Error Model (SEM) and (2) the Spatial Lag Model (SAR). Log-linear independent models were estimated to serve as benchmarks for the advanced models. In this section, we compare the SEM and SAR models. For each model type, we calculated the log-likelihood at convergence, the R-squared value, the number of parameters estimated, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC) [28]. The AIC and BIC for a given empirical model are

$$\mathrm{AIC} = 2K - 2LL, \qquad \mathrm{BIC} = -2LL + K \ln(Q),$$

where $LL$ is the log-likelihood value at convergence, $K$ is the number of parameters, and $Q$ is the number of observations. The model with the lower AIC or BIC is preferred. The log-likelihood values at convergence for the estimated models are as follows: (1) […] (25,732.823). Based on the information criteria, the SAR model performs better for both boarding and alighting. However, the SEM model has more explanatory variables. Hence, we consider both frameworks in our discussion. The results from the models for boarding and alighting are presented in Table 3.
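The information criteria are straightforward to compute from a fitted model's log-likelihood. The Python sketch below uses purely hypothetical log-likelihoods and parameter counts, not the values from this study. It shows that AIC and BIC can rank two models differently: the BIC penalty per parameter, $\ln(Q)$, exceeds the AIC penalty of 2 whenever $Q > e^2 \approx 7.4$.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC) with AIC = 2K - 2LL and BIC = -2LL + K ln(Q)."""
    aic = 2 * n_params - 2 * loglik
    bic = -2 * loglik + n_params * math.log(n_obs)
    return aic, bic


n_obs = 3495 * 6  # hypothetical observation count: stops x periods
aic_a, bic_a = information_criteria(loglik=-1000.0, n_params=10, n_obs=n_obs)
aic_b, bic_b = information_criteria(loglik=-995.0, n_params=14, n_obs=n_obs)
print(aic_a, aic_b)   # 2020.0 2018.0 -> AIC prefers model B
print(bic_a < bic_b)  # True -> BIC prefers the smaller model A
```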

Variable Effects.
To reach the final model specification, we removed variables that were statistically insignificant at the 90% confidence level. We considered various buffer sizes (800 m, 600 m, 400 m, and 200 m) and selected the buffer size that offered the best data fit. Columns 2 through 5 present results from the SEM and SAR models for boarding, and columns 6 through 9 present results from the SEM and SAR models for alighting. The model results are described by variable category below.

Stop Level Variables.
The headway between buses at a stop has a significant influence on ridership, as confirmed by all models: an increase in headway is associated with a significant drop in ridership. These findings are in accordance with the previous literature [20, 29-34].

Transportation Infrastructure Variables.
Several transportation infrastructure variables significantly affect boarding and alighting. Bus route length in a 600 m buffer is associated with an increase in boarding and alighting across all models. This result indicates that more bus routes around a stop are associated with increased adoption of public transit in the Greater Orlando region.
This is an important finding, highlighting that when adequate bus transit infrastructure exists, it is likely to be used. Sidewalk length in an 800 m buffer is observed to positively influence boarding and alighting in the SEM model, but the corresponding coefficient is not significant in the SAR models. Sidewalk presence may be serving as a surrogate for walkable neighborhoods in the SEM model. Secondary highway length in a 600 m buffer and local road length in an 800 m buffer are positively associated with boarding in the SEM and SAR models. However, these variables are statistically insignificant in the alighting models. Railroad length in an 800 m buffer is negatively […] proportion of people aged between 0 and 17 years is observed to positively influence boarding in both the SEM and SAR models. The result is intuitive: as the proportion of young individuals increases, the share of the population without access to a car is also likely to increase. For alighting, the variable has a significant influence only in the SEM model. An increase in the proportion of individuals aged 65 and older is associated with a reduction in boarding and alighting, except for alighting in the SAR model. This result, while counterintuitive at first glance, may reflect vehicle access among this age group. As the number of high-income households increases, the model results indicate a possible reduction in boarding and alighting, except in the boarding SAR model. This result is expected in a city like Orlando, where high-income individuals are more likely to use their personal vehicles for travel. Finally, the number of households renting in a census tract is positively associated with boarding and alighting, except in the boarding SAR model. This relationship is along expected lines.

Spatial and Spatiotemporal Effects.
The temporal lagged variables are positively associated with boarding and alighting ridership in both the SEM and SAR models. Spatiotemporal lag variables, on the other hand, show the reverse trend: a stop whose neighbouring stops had larger ridership in the previous time period is likely to have lower ridership. This result is indicative of competition from nearby stops. It reflects a system in which the same ridership in the urban region is split across stops.

Spatial Error and Spatial Lag Effects.
The study estimated SEM and SAR models to account for the presence of spatial effects, and the model fit measures clearly confirmed our hypothesis. In the SEM model, the results indicate the presence of a significant spatially autocorrelated error term. In the SAR model, the spatial autoregressive coefficient indicates a significant impact of unobserved effects. […] 123). The results indicate satisfactory performance of the boarding and alighting models across the two systems. Overall, the SEM models perform slightly better than the SAR models.

Conclusion
To encourage greater adoption of public transport, it is of utmost importance to examine the critical factors that contribute to public transport ridership. Statistical models are an important tool for evaluating the influence of these factors and of future public transit investment opportunities. The current study drew on stop-level boarding and alighting data for the Greater Orlando region for six four-month periods from May 2013 to April 2015. Using these data, it estimated spatial panel models that accommodate the impact of spatial and temporal observed and unobserved factors.
Two spatial models, (1) the Spatial Error Model (SEM) and (2) the Spatial Lag Model (SAR), were estimated separately for boarding and alighting. They used several groups of exogenous variables: stop-level attributes, transportation and transit infrastructure variables, built environment and land use attributes, sociodemographic and socioeconomic variables in the vicinity of the stop, and temporal and spatiotemporal lagged variables. The model fit measures clearly confirmed our hypothesis that spatial unobserved effects influence boarding and alighting. This influence appears as the spatially autocorrelated error term in the SEM model and the spatial autoregressive coefficient in the SAR model. Further, the validation exercise confirmed that both models performed adequately. The estimated models can be employed to evaluate changes in public transport demand due to future changes in supply, such as adding or removing stops in the system. The optimal ridership could be predicted with the estimated models by considering the spatial location of the proposed stops in relation to the existing bus stops (distance matrix).

To be sure, the research is not without limitations. We estimated boarding and alighting models separately, although the observed and unobserved factors affecting boarding and alighting at the same stop may be related. Incorporating such stop-level dependency between boarding and alighting along with spatial unobserved factors is a potential avenue for future research. In the future, it would also be beneficial to examine how individual-level behavioral preferences for using a private vehicle can be incorporated within transit ridership frameworks. There is also a need to accommodate the endogeneity between transit agency decisions (with regard to headway and new bus routes) and ridership. Finally, the current study estimated SAR and SEM models, which account for spatial endogenous interactions and for spatial interactions in the error structure, respectively. Future work might estimate a spatial Durbin model, which combines the advantages of both the SAR and SEM models while also allowing for flexible spillover effects.

Notes
1. We restrict ourselves to a spatial random effects model, as opposed to a spatial fixed effects model, for multiple reasons. First, a spatial fixed effects model estimates several additional parameters to account for bus-stop-specific effects. In a dataset with over 3,000 stops, this would mean estimating a large number of parameters, which might lead to overfitting of the data. Second, in the presence of bus-stop-specific fixed effects, the impact of other variables that are common across the system is unlikely to be meaningful. Therefore, the results from such an exercise would not be transferable to the future or to other locations in any meaningful form. Hence, we have not considered spatial fixed effects models.
2. In the results table, "-" means insignificant at the 90% confidence level.

Data Availability
The dataset used for the study is confidential.

Disclosure
All opinions are only those of the authors. An earlier version of this paper was presented at the 2018 Transportation Research Board (TRB) Annual Meeting.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.