Bus Travel Time Deviation Analysis Using Automatic Vehicle Location Data and Structural Equation Modeling

To investigate how the causes of unreliability and the bus schedule recovery phenomenon influence microscopic segment-level travel time variance, this study adopts Structural Equation Modeling (SEM) to specify, estimate, and evaluate the proposed theoretical models. The SEM model establishes and verifies hypotheses about the interrelationships among travel time deviations, departure delays, segment lengths, dwell times, and the numbers of traffic signals and access connections. The finally accepted model demonstrates excellent fit. Most of the hypotheses are supported by the sample dataset from a bus Automatic Vehicle Location (AVL) system. The SEM model confirms the bus schedule recovery phenomenon. Departure delays at bus terminals and upstream travel time deviations indeed have negative impacts on the travel time fluctuation of buses en route. Meanwhile, segment length directly and negatively impacts travel time variability and, conversely, contributes positively to the schedule recovery process; this exogenous variable also indirectly and positively influences travel times through the presence of signalized intersections and access connections. This study offers a rational approach to analyzing travel time deviation. The SEM model structure and estimation results facilitate the understanding of bus service performance characteristics and provide several implications for bus service planning, management, and operation.


Introduction
Bus service reliability can have significant impacts on service providers and on existing and potential users [1]. From the passenger's perspective, reliable bus services offer predictable travel times and wait times; from the bus agencies' point of view, reliable services bring stable ridership from satisfied passengers. As a result, public transit authorities treat service reliability as one of the vital performance measures [2], and transportation researchers take the randomness of bus travel into account in bus assignment modeling and in network and operation design [3][4][5]. For fixed-route bus services with fixed timetables and trajectories, on-time performance and headway regularity are the most commonly used reliability measures [2], while travel time variability is an important agency-concerned issue related to these two measures. Bus service operation and management focus on travel time reliability, and travel times are core components of travelers' travel cost in transit assignment modeling. Thus, it is of great importance to investigate bus travel time reliability.
Many researchers have explored the definition of indices, the overall features, and the descriptive cause analysis of travel time reliability [1,6]. This study instead adopts the Structural Equation Modeling (SEM) method to specify, estimate, and evaluate a proposed theoretical model for analyzing travel time deviation from schedules at the microscopic bus route segment level. Compared with previous studies on transit reliability analysis that employ regression methods [7,8], the SEM model establishes and verifies hypotheses representing interrelationships among observed variables based on existing theories and empirical results. The relevant variables, denoting departure delays and upstream travel time deviations, are embedded into the SEM models so as to reveal the bus schedule recovery phenomenon first investigated by Kalaputapu and Demetsky [9]. Meanwhile, the availability of bus Automatic Vehicle Location (AVL) systems makes it feasible to conduct microscopic modeling and analysis at the bus segment level.

Mathematical Problems in Engineering
This study begins by establishing hypotheses based on a literature review of bus service reliability analysis, followed by the specification of the Structural Equation Models. It then tests and modifies the SEM model by examining the estimation results in terms of estimate statistics and multiple fit measures. With the respecified SEM model, the fit of the entire model and the estimates of the path coefficients are discussed. Finally, research conclusions and relevant implications for bus service planning and operation are presented.

Literature Review and Research Hypotheses
In Structural Equation Modeling, five basic steps should be followed, namely, model specification, model identification, model estimation, model testing, and model modification. The meaningfulness of the relationships in a specified model depends on the variables employed and on reasonable hypotheses. Hence, the theoretical hypotheses are very important and should be based on previous research. This section reviews the related literature and proposes theoretical assumptions for model specification.
As mentioned above, on-time performance and headway regularity are key measures of bus service reliability, while travel time variability is an important and essential issue related to these two measures. This research examines the internal and external factors influencing bus service reliability, especially travel time deviations at the bus route segment level.

Effects of Internal and External Factors on Bus Service Reliability.
The causes of bus service unreliability have been well documented by Cham [1], TCRP-88 [2], TCQSM [10], Abkowitz and Engelstein [11], and Abkowitz et al. [12]. A number of factors, deriving from the internal bus system or from external traffic conditions, affect bus travel times and result in travel time variability and service unreliability. According to previous research [1,2], travel time delays are affected by major factors including departure delays, number of stops made, dwell times, number of traffic signals, and so forth. Intuitively, signalized intersections lead to travel time variability because buses arrive at traffic signals at random; access connections on the road represent conflict points where buses interact with merging and diverging vehicles. Consequently, the following hypotheses are inferred and presented:
(H1) The dwell time has a direct and positive impact on travel time deviation.
(H2) Number of signalized intersections has a direct and positive impact on travel time deviation.
(H3) Number of access connections has a direct and positive impact on travel time deviation.
Apart from the above major interrelationships, it is likely that departure deviations at bus terminals cause an increase in passenger boarding at bus stops further downstream. Increased boarding at bus stops results in longer dwell times, which increase total travel times [1]. Meanwhile, longer bus stop spacing makes it more likely for buses to traverse more traffic signals and access connections. Thus, the following assumptions are proposed:
(H4) Departure delays directly and positively impact dwell times.
(H5) Segment length directly and positively impacts number of traffic signals.
(H6) Segment length directly and positively impacts number of access connections.

Bus Drivers' Schedule Recovery Behaviors.
When bus travel times deviate from the schedule, bus drivers may be motivated to adjust travel speeds to maintain schedule adherence. This schedule recovery behavior of bus drivers was first investigated by Kalaputapu and Demetsky [9]. Other researchers considered the schedule recovery effort as a control factor in modeling bus arrival time prediction and schedule optimization problems [13][14][15]. Chen et al. [14] correlated the travel time delays on upstream segments with the travel time deviation on the segment under consideration. Similarly, Lin and Bertini [15] deem it reasonable that arrival time delays at two adjacent stops are strongly correlated, whereas delays at two stops far apart are usually weakly correlated. Besides the upstream delays considered above, departure punctuality at terminals is an important measure of bus service performance: it impacts dwell times and travel times on downstream segments and contributes to bus drivers' schedule recovery efforts. Therefore, this study introduces another two variables into the SEM, departure delays at bus terminals and accumulated delays (of travel time on upstream segments), to explore the schedule recovery phenomenon.
(H7) The accumulated delay has a direct and negative influence on travel time deviation.
(H8) The departure delay has a direct and negative influence on travel time deviation.
(H9) The departure delay has a direct and negative influence on the accumulated delay.
According to Lin and Bertini [15], how fast bus drivers can bring the bus back on schedule depends on the magnitude of the deviation and the length of the remaining trip. Based on this inference, the following hypotheses are established:
(H10) The segment length has a direct and negative influence on travel time deviation.
(H11) Percentage of completed trip has a direct and negative influence on travel time deviation.
Based on the above eleven research hypotheses, there are three exogenous variables, which are assumed not to be affected by other variables, and five endogenous variables, which are supposed to have unidirectional causal relationships with exogenous variables or other endogenous variables. The analysis considers an individual bus trip on an individual route segment, where each segment originates at one bus stop and terminates at the next stop downstream. The variables used in the SEM models are denoted and described in Table 1.
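The split into exogenous and endogenous variables follows mechanically from the eleven hypothesized paths: a variable that never appears as the target of a path is exogenous. The following minimal sketch illustrates this; the variable labels are names chosen for this example only and are not the notation of Table 1.

```python
# Hypothesized direct paths (cause, effect), transcribed from H1-H11 in the text.
# The variable labels are illustrative names, not the paper's notation.
PATHS = {
    "H1": ("dwell_time", "travel_time_deviation"),
    "H2": ("traffic_signals", "travel_time_deviation"),
    "H3": ("access_connections", "travel_time_deviation"),
    "H4": ("departure_delay", "dwell_time"),
    "H5": ("segment_length", "traffic_signals"),
    "H6": ("segment_length", "access_connections"),
    "H7": ("accumulated_delay", "travel_time_deviation"),
    "H8": ("departure_delay", "travel_time_deviation"),
    "H9": ("departure_delay", "accumulated_delay"),
    "H10": ("segment_length", "travel_time_deviation"),
    "H11": ("completed_trip", "travel_time_deviation"),
}


def classify(paths):
    """Return (exogenous, endogenous) variable lists for a set of directed paths."""
    causes = {cause for cause, _ in paths.values()}
    effects = {effect for _, effect in paths.values()}
    # Endogenous variables receive at least one path; exogenous ones receive none.
    endogenous = sorted(effects)
    exogenous = sorted((causes | effects) - effects)
    return exogenous, endogenous


exogenous, endogenous = classify(PATHS)
print(exogenous)   # variables with no incoming path
print(endogenous)  # variables with at least one incoming path
```

Running the sketch yields three exogenous and five endogenous variables, in line with the counts stated above.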

SEM Model Specification
Structural Equation Modeling uses various types of models to depict relationships among observed variables, with the same basic goal of providing a quantitative test of the hypothesized theoretical models [16]. In detail, this approach refers to a series of statistical methodologies, including path analysis, confirmatory factor analysis, and structural regression models. In the 1980s, researchers introduced this approach to travel behavior studies [17,18]. To date, SEM methods have been applied to transportation market segmentation [19,20], travel behavior analysis [21][22][23], and service quality and satisfaction studies [24][25][26].
Based on the hypotheses inferred in the previous section and the corresponding variables described in Table 1, this research develops an SEM model as follows:
$$\eta = B\eta + \Gamma\xi + \zeta \qquad (1)$$
where $\eta$ is the column vector of the five endogenous variables, $\xi$ is the column vector of the three exogenous variables, $B$ is the matrix of path coefficients denoting the direct effects of endogenous variables on other endogenous variables, $\Gamma$ is the matrix of path coefficients indicating the direct effects of exogenous variables on endogenous variables, and $\zeta$ is the column vector of estimation errors for the five endogenous variables. Equation (1) can be expressed in vector and matrix form as (2). The relevant path diagram of the proposed theoretical SEM model is shown in Figure 1.
Suppose that $S$ is the sample covariance matrix of the exogenous and endogenous variables and that $\Sigma$ is the covariance matrix implied by the theoretical SEM (see (1) and (2)). The estimation process adopts a fitting function that minimizes the discrepancy between $\Sigma$ and $S$ and yields estimates for each of the parameters specified by the SEM models. In this study, the generalized least squares (GLS) method is employed for estimation, given the specific sample size [27].
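For reference, the model-implied covariance matrix and the GLS discrepancy function behind this estimation step can be written in their standard textbook form for a path model with observed variables. This is a sketch rather than the paper's own derivation: $\Phi$ (covariance matrix of the exogenous variables) and $\Psi$ (covariance matrix of the error terms) are symbols introduced here for illustration, $\theta$ collects the free parameters, and $B$, $\Gamma$, and $S$ are the path-coefficient matrices and sample covariance matrix described above. The blocks of $\Sigma(\theta)$ are ordered as endogenous variables first, exogenous variables second.

```latex
\Sigma(\theta) =
\begin{pmatrix}
(I-B)^{-1}\left(\Gamma\Phi\Gamma^{\top} + \Psi\right)(I-B)^{-\top} & (I-B)^{-1}\Gamma\Phi \\
\Phi\Gamma^{\top}(I-B)^{-\top} & \Phi
\end{pmatrix},
\qquad
F_{\mathrm{GLS}}(\theta) = \tfrac{1}{2}\,\operatorname{tr}\!\left\{\left[\left(S - \Sigma(\theta)\right)S^{-1}\right]^{2}\right\}
```

Minimizing $F_{\mathrm{GLS}}$ over $\theta$ weights the residual $S - \Sigma(\theta)$ by the inverse of the sample covariance matrix, which is what distinguishes GLS from unweighted least squares.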
According to previous research, the sample size needs to be sufficient to achieve the desired precision of the path coefficient estimates and of the model fit. On one hand, the sample size should be greater than 200 for an acceptable model [28]; on the other hand, it should be ten to fifteen times the number of observed variables [29]. This study focuses on peak hour periods, when bus travel time deviations occur frequently. The first sample, used to test the proposed original model, includes 209 observations deriving from eleven bus trips.
The data in the samples derives from the AVL archived records, the scheduled timetable, and the field survey data of bus route 102 in Suzhou City, China. Specifically, the bus trip information (namely, actual departure times ADT and actual arrival times AAT) is directly extracted from the AVL archived records; the bus route timetable provides scheduled departure times SDT and scheduled travel times STT; the data concerning bus route segments (segment lengths, number of traffic signals TS, and number of access connections AC) is collected by field surveys. Data entry and editing are conducted in the statistical software package SPSS. With the sample data and the path diagram of the theoretical SEM model as inputs, model estimation is performed using the SEM software Amos, version 17.0.
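As an illustration of how the delay variables can be derived from these sources, the sketch below computes a departure delay (actual minus scheduled departure time) and a travel time deviation (actual minus scheduled travel time) following the Table 1 definitions. The time format, the use of seconds as the unit, the sign convention (positive means late or slow), and the sample values are all assumptions made for this example; they are not records from the study's dataset.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S"  # assumed timestamp layout for this example


def seconds_between(start, end):
    """Elapsed seconds from `start` to `end`, both given as HH:MM:SS strings."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


def departure_delay(sdt, adt):
    """DD: actual departure time (ADT) minus scheduled departure time (SDT), in seconds."""
    return seconds_between(sdt, adt)


def travel_time_deviation(adt, aat, stt_seconds):
    """Actual travel time (AAT - ADT) minus scheduled travel time (STT), in seconds."""
    actual_travel_time = seconds_between(adt, aat)
    return actual_travel_time - stt_seconds


# Hypothetical record: scheduled 07:00:00, departed 07:01:30,
# arrived 07:04:10, scheduled travel time 150 s.
dd = departure_delay("07:00:00", "07:01:30")
ttd = travel_time_deviation("07:01:30", "07:04:10", 150)
print(dd, ttd)
```

In this hypothetical record the bus leaves 90 s late and takes 10 s longer than scheduled on the segment.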

SEM Model Testing and Modification
To inspect how well the sample data supports the proposed theoretical SEM model, the model testing procedure examines the goodness of fit of the entire model and the statistical significance of the individual parameters. The original model is estimated using the first sample, which includes 209 observations deriving from eleven bus trips in the peak hour period (7:00-8:00) on May 14, 2012. Another sample, with 209 observations from eleven bus trips during the same service period on May 15, is collected to validate the modified models.
For the test of overall model fit, SEM offers a large number of fit measures. Most of these measures are based on the discrepancy between $\Sigma$ and $S$, which is referred to as Chi-square $\chi^2$ [16,21,25]. The model fit measures typically used are listed in Table 2. The third column of Table 2 shows that the five fit measures for the original model defined by Figure 1 do not reach the acceptable levels, illustrating the poor fit of the proposed original model. To improve the goodness of fit, researchers typically add or remove paths in the originally proposed model based on the statistical significance of the path coefficients. Accordingly, the following model modification and testing are conducted.

Model Modification A.
According to the critical ratios (CR) and $p$ values for path coefficients in columns 2 and 3 of Table 3, most estimates of the path coefficients are significantly different from zero. However, the $p$ value (0.688) for the path departure delay → dwell time is far greater than 0.000, so this path is not statistically significant. Correspondingly, the correlation between departure delay and dwell time in the sample correlation matrix of Table 4 is −0.036, indicating a weak correlation. Accordingly, this path (denoting hypothesis (H4)) is removed from the path diagram of the original model, and Modified Model A is proposed.
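The link between a critical ratio and its $p$ value can be sketched as follows, assuming that the critical ratio (estimate divided by its standard error) is referred to a standard normal distribution and that the test is two-sided. The function names and the 0.10 threshold argument are choices made for this example; the threshold mirrors the 0.100 level mentioned later in the model testing.

```python
import math


def two_sided_p(critical_ratio):
    """Two-sided p value of a critical ratio under a standard normal reference."""
    return math.erfc(abs(critical_ratio) / math.sqrt(2.0))


def is_significant(p_value, alpha=0.10):
    """True when the path coefficient differs from zero at the chosen level."""
    return p_value < alpha


print(round(two_sided_p(1.96), 3))  # 0.05
print(is_significant(0.688))        # False: the reported p value for H4
```

Under this rule the path for hypothesis (H4), with a reported $p$ value of 0.688, is a candidate for removal.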

Model Testing A.
Model estimation is performed for this modified SEM model with the new sample as input. For Modified Model A, none of the model fit measures in the fourth column of Table 2 reaches the threshold for good model fit, indicating that Modified Model A needs to be respecified further.

Model Modification B.
By analyzing the CR and $p$ values for path coefficients in columns 4 and 5 of Table 3, it is found that the path completed trip → travel time deviation shows low significance, with a $p$ value of 0.401. Therefore, the corresponding path (denoting hypothesis (H11)) is removed to specify Modified Model B.

Model Testing B.
The fifth column of Table 2 shows that the vital fit measures for Modified Model B reflect good or reasonable model fit. Meanwhile, the path coefficients are statistically significant, with all of the $p$ values less than 0.100, as shown in columns 6 and 7 of Table 3. As a result, the specification of Modified Model B, with better parsimony and model fit, can finally be accepted.
The finally modified and accepted model, denoted as Modified Model B in Tables 2 and 3, can be represented by the path diagram in Figure 2.

Theoretical SEM Model Fitness.
For the finally modified and accepted model shown in Figure 2, the Chi-square value is 21.399 and the degrees of freedom are 12, so the ratio of Chi-square to degrees of freedom, $\chi^2/\mathrm{df}$, is 1.783. Many researchers have suggested using this ratio as a measure of fit but have proposed different acceptable levels for a reasonable fit [25,30]. Nevertheless, a ratio of less than 2.0 generally indicates a reasonable fit between the hypothetical SEM model and the sample data. The goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI) for the modified SEM model are 0.971 and 0.931, respectively, very close to 1.0. The value of RMSEA is 0.061. These measures, given in the fifth column of Table 2, support the SEM model structure as reasonable and suggest that the sample data fits the final model well.
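The two Chi-square-based measures quoted above can be reproduced from the reported values. The sketch below uses the common sample-based RMSEA formula, sqrt(max(χ² − df, 0) / (df · (N − 1))), and assumes N = 209 observations; the function names are illustrative, and the formula is the standard one rather than something stated in the paper.

```python
import math


def chi_square_ratio(chi_square, df):
    """Ratio of the Chi-square statistic to its degrees of freedom."""
    return chi_square / df


def rmsea(chi_square, df, n_obs):
    """Root mean square error of approximation (common N - 1 formulation)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))


ratio = chi_square_ratio(21.399, 12)
error = rmsea(21.399, 12, 209)
print(round(ratio, 3), round(error, 3))  # 1.783 0.061
```

Both values agree with the figures reported for Modified Model B.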

Hypothesis Testing Results.
Figure 2 presents the path diagram with the path coefficients representing standardized estimates of regression weights. The standardized path coefficients are useful in determining the relative importance of each variable to other variables for a given sample. In addition, standardized path coefficients make it feasible to interpret interrelationships on the same scale of measurement [16]. Causal relationships between the physical features of the bus route segment and travel time deviation, and between the departure and arrival delays and travel time deviation, can be illustrated by the magnitude and sign of the standardized coefficients.
The path coefficients from departure delay to accumulated delay, from departure delay to travel time deviation, and from accumulated delay to travel time deviation are −0.666, −0.301, and −0.168, respectively. The negative signs verify hypotheses (H9), (H8), and (H7). They also imply that bus drivers attempt to reduce travel time deviation and pursue schedule adherence through schedule recovery behavior when departure delays and travel time deviations on upstream segments occur.
Dwell time does have a notable impact on travel time variability, as the path coefficient from dwell time to travel time deviation takes a medium value of 0.224. This suggests that passenger boarding and alighting at bus stops deserve attention and should be treated as an independent variable in service reliability analysis and service planning models.
Causal relationships between the physical features of the roadway segment and travel time deviation are implied by the path coefficients for hypotheses (H2), (H3), (H5), (H6), and (H10). The paths for hypotheses (H5) and (H6) take positive values of 0.765 and 0.302, respectively, consistent with the common-sense expectation that buses traverse more intersections and access connections when stop spacing is longer. Compared with the number of access connections, traffic signals cause travel time to fluctuate more intensively. We can infer that the stop delays of buses at traffic signals contribute greatly to the total bus delay on a segment. As hypothesized, the direct effect of segment length on travel time deviation is negative. Its absolute value of 0.381, the greatest among the coefficients for paths from other variables to travel time deviation, indicates that travel distance plays an essential role in bus drivers' schedule recovery behavior. Bus drivers are more likely to bring buses back on schedule over a long travel distance.

Direct and Indirect Effects.
Table 5 shows that the exogenous variables departure delay and segment length both have direct and indirect effects on the endogenous variable travel time deviation.
The correlation (−0.189) between departure delay and travel time deviation is the sum of (i) the direct effect (−0.301) of departure delay on travel time deviation and (ii) the indirect effect (0.112 = −0.666 × (−0.168)) of departure delay on travel time deviation through accumulated delay. The direct and indirect effects take opposite signs, representing negative and positive influences, respectively.
Rather than comparing path coefficients −0.301 and −0.168 directly, we suggest that departure delay and travel time delay of upstream segments have a similar influence on travel time variability as they have the correlations or total effects of −0.189 and −0.168, respectively.
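The decomposition for departure delay can be checked by path tracing: the total effect is the sum, over every directed path from cause to effect, of the product of the coefficients along that path. The sketch below applies this to the three standardized coefficients reported in the text; the variable labels and the helper function are illustrative, and the recursion assumes an acyclic (recursive) model.

```python
# Standardized direct effects reported in the text: (cause, effect) -> coefficient.
DIRECT = {
    ("departure_delay", "accumulated_delay"): -0.666,        # H9
    ("departure_delay", "travel_time_deviation"): -0.301,    # H8
    ("accumulated_delay", "travel_time_deviation"): -0.168,  # H7
}


def total_effect(direct, source, target):
    """Sum of coefficient products over all directed paths from source to target.

    Assumes the path diagram has no cycles, so the recursion terminates.
    """
    if source == target:
        return 1.0
    return sum(
        coefficient * total_effect(direct, mediator, target)
        for (cause, mediator), coefficient in direct.items()
        if cause == source
    )


total = total_effect(DIRECT, "departure_delay", "travel_time_deviation")
direct = DIRECT[("departure_delay", "travel_time_deviation")]
indirect = total - direct
print(round(direct, 3), round(indirect, 3), round(total, 3))  # -0.301 0.112 -0.189
```

The same helper returns −0.168 for accumulated delay, which has no indirect path to travel time deviation in this sub-diagram.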
The correlation (−0.190) between segment length and travel time deviation also consists of direct and indirect effects. On one hand, segment length directly and negatively impacts travel time variability and, conversely, contributes positively to the schedule recovery process; on the other hand, this exogenous variable indirectly and positively influences the variance of segment travel times through the presence of signalized intersections and access connections on the road.
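The size of the indirect effect of segment length is not quoted in the text, but it can be backed out from the reported total effect (−0.190) and the direct path coefficient (−0.381). The sketch below does this and also expresses the indirect effect as a function of the two mediated paths; the coefficients from traffic signals and access connections to travel time deviation are left as parameters because they are not stated in the text, and all names are illustrative.

```python
TOTAL_EFFECT = -0.190       # segment length -> travel time deviation, as reported
DIRECT_EFFECT = -0.381      # direct path for hypothesis (H10), as reported
LENGTH_TO_SIGNALS = 0.765   # path for hypothesis (H5)
LENGTH_TO_ACCESS = 0.302    # path for hypothesis (H6)


def indirect_via_mediators(beta_signals, beta_access):
    """Indirect effect of segment length through traffic signals and access connections.

    beta_signals and beta_access are the paths from each mediator to travel
    time deviation (hypotheses H2 and H3); their values are not given in the text.
    """
    return LENGTH_TO_SIGNALS * beta_signals + LENGTH_TO_ACCESS * beta_access


implied_indirect = TOTAL_EFFECT - DIRECT_EFFECT
print(round(implied_indirect, 3))  # 0.191
```

The implied indirect effect is positive and about half the magnitude of the direct effect, which is why the total effect remains negative.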

Conclusions and Implications
This study investigates the influences of the causes of unreliability and of bus drivers' schedule recovery efforts on travel time variance. The hypothesized SEM representing these interrelationships demonstrates excellent fit on multiple measures. In other words, most of the pre-established hypotheses are adequately supported by the research sample dataset from the bus AVL system. The SEM model structure and estimation results facilitate the understanding of bus service performance characteristics and provide several implications for bus service planning, management, and operation.
The final SEM model confirms the schedule recovery phenomenon, namely, bus drivers' active schedule adherence behavior. Departure delays at bus terminals and upstream travel time deviations indeed have a negative impact on the travel time fluctuation of buses en route. The model also shows that these two types of delay have total effects of similar magnitude on travel time deviations. Given that upstream travel time delays have already been taken into consideration in existing research on bus service planning and arrival time prediction, particular emphasis should be placed on departure delays at bus terminals. Thus, there is a need to embed departure punctuality into bus operation modeling.
It is known that traffic signals on the road cause additional travel time delays for passing buses. In this study, the number of signalized intersections on bus segments is taken as an observed variable in the SEM model and is found to positively affect travel time deviation. It follows that treatments reducing the stop delays of buses at traffic signals will decrease travel time deviations. Such treatments usually refer to active and passive transit signal priority control.
As discussed in the last section, segment length (bus stop spacing) directly and negatively impacts travel time variability and, conversely, contributes positively to the schedule recovery process; on the other hand, it indirectly and positively influences travel time variance through the presence of signalized intersections and access connections. To optimize bus stop spacing, bus service researchers and planners can refine stop spacing models by taking account of its effects on travel time reliability and bus schedule recovery.
The proposed SEM model is parsimonious, and all of the hypothesized paths are based on well-known empirical research and supported by real-world data. In future work, it is advisable to explore the correlation between bus service reliability and additional observed variables, such as those related to transit preferential treatments.

Figure 1: Path diagram of the original SEM model for bus travel time deviation analysis.

Figure 2: Path diagram of the modified SEM model for bus travel time deviation analysis.

Table 1: Exogenous and endogenous variables for SEM models.
Endogenous variables
Travel time deviation: deviation of actual travel time (ATT), measured between actual departure time (ADT) and actual arrival time (AAT) for the bus trip, from scheduled travel time (STT) for the bus trip.
Accumulated delay: travel time deviation on the upstream segments.
Number of traffic signals (TS): number of traffic signals on the bus route segment.
Number of access connections (AC): number of access connections on the bus route segment.
Exogenous variables
Segment length: length of the bus route segment between adjacent bus stops.
Departure delay (DD): deviation of actual departure time (ADT) from scheduled departure time (SDT) at the bus terminal for the bus trip.
Percentage of completed trip (PCR): share of time that buses spent on upstream segments.

Table 2: Model fit measures for the original and modified SEM models.

Table 3: Path coefficients for original and modified SEM models.

Table 5: Standardized direct, indirect, and total effects.