Operating Time Division for a Bus Route Based on the Recovery of GPS Data

Bus travel time is an important source of data for time of day partition of the bus route. However, in practice, a bus driver may deliberately speed up or slow down on route so as to follow the predetermined timetable. The raw GPS data collected by the GPS device equipped on the bus, as a result, cannot reflect its real operating conditions. To address this concern, this study first develops a method to identify whether there is deliberate speed-up or slow-down movement of a bus. Building upon the relationships between the intersection delay, link travel time, and traffic flow, a recovery method is established for calculating the real bus travel time. Using the dwell time at each stop and the recovered travel time between each of them as the division indexes, a sequential clustering-based time of day partition method is proposed. The effectiveness of the developed method is demonstrated using the data of bus route 63 in Harbin, China. Results show that the partition method can help bus enterprises to design reasonable time of day intervals and significantly improve their level of service.


Introduction
A well-designed bus schedule scheme is important for increasing bus transit ridership [1]. Bus passenger demand differs greatly at different time intervals during everyday operation. Before the overall design of a bus schedule scheme, the operating time of a bus route should be divided into multiple time intervals for which different schedule schemes should be made. This greatly helps formulate precise operating and dispatching schemes for buses and reduce the operational costs of a bus transit enterprise.
In recent years, buses in a number of large cities in China have been equipped with GPS devices [2][3][4][5]. Bus enterprises can now directly retrieve the bus travel time between any two stops from the resulting database. However, given the predetermined timetables, the travel time may not reflect the actual performance of a bus. When faced with traffic jams, a bus driver may deliberately accelerate so that the bus arrives at a downstream stop no later than the scheduled time. Although it may manage to arrive on time, the bus typically undergoes frequent acceleration and deceleration en route, which not only reduces the comfort of passengers but also increases the probability of traffic accidents. In contrast, the bus driver may deliberately slow down in smooth traffic so as to avoid early arrival at the downstream stop. Consequently, the lowered travel speed may leave the passengers with the impression that the bus service is inefficient. These two kinds of driver behavior are common in China [6,7]. The root cause is that the initial timetables are usually nonoptimal considering the real-time traffic conditions. Therefore, the retrieved GPS data cannot be used directly. To obtain the actual travel speed and to further divide the operating time, the effect of drivers' behavior should be considered.
Scholars have conducted much research on the optimization of bus schedule schemes but have rarely investigated the division of the operating time [8][9][10][11]. To evaluate the effectiveness of a bus schedule scheme, Patnaik et al. [12] selected as indexes the numbers of passengers boarding and alighting the bus and the number of midway stops. The buses from the starting stop were then divided into several classes. The data used to develop the models were collected by the Automatic Passenger Counters (APC) on buses operated by a transit agency in the northeast region of the United States. Guihaire and Hao [13] presented a global review of the crucial strategic and tactical steps of transit planning: the design and schedule of the network. They pointed out that the bus operating period mainly depended on the passengers' requirements, which were different at different times. However, no analytic method was developed for time of day partition. Using ridership data from a bus smart card system, Yue [14] obtained an ordered sampling of the passengers' arrival ratio curve and divided the operating time into multiple intervals using the Fisher optimal segmentation method. In his model, only the passenger volume was considered and the bus travel speed was neglected. As a result, the bus operating conditions were not fully considered during the partition. Shen et al. [15] proposed an improved K-means clustering algorithm for the division of the bus operating period based on GPS data. However, only the bus travel speed was used and the passenger demand was not considered. Given that a transit agency tends to arrange different bus dispatching frequencies in different time intervals because of the different passenger demand, a method that ignores demand is of limited practical value. Bie et al. [16] selected the dwell time at each stop and the travel time between each pair of them as indexes and developed a rapid division algorithm. This is the first study that considers both the bus travel speed and the passenger demand in time of day partition. However, the GPS data were used directly without considering the deliberate speed-up or slow-down movement.
The existing methods for operating time division exhibit two shortcomings: (i) only the passenger flow volume is taken into account and (ii) data are obtained typically through manual work, which consumes much manpower and many other resources. The method proposed in this study builds the relationship between time division and bus schedule scheme and addresses these shortcomings.
The contributions of this study are twofold. Firstly, we develop a method to identify whether there is deliberate speed-up or slow-down movement of a bus. A recovery method is then established for calculating the real bus travel time based on raw GPS data. To the best of our knowledge, no research so far has investigated this kind of problem. Secondly, a sequential clustering algorithm is developed to partition the operating period into multiple intervals based on the recovered bus travel time and dwell time at stops.
This paper is organized as follows. In Section 2, a recognition method for bus operating state is first developed, followed by a recovery method for the bus travel time. A discussion is provided as to why the recovered travel time and dwell time are selected as division indexes. In Section 3, a sequential sample clustering algorithm is proposed to divide the operating time into multiple time intervals using the recovered travel time and dwell time. Section 4 presents a real case study and Section 5 concludes the paper.

Development of the Operating Time Division Method
When traveling along a route, a bus usually passes through three different kinds of regions, namely, stops, road sections, and intersections. Therefore, the planned travel time of bus $n$ from the $i$th stop to the $(i+1)$th stop can be further divided as follows:

$$T_n(i, i+1) = a_n(i, i+1) + b_n(i, i+1) + c_n(i+1),$$

where $a_n(i, i+1)$ denotes the travel time spent at road sections, $b_n(i, i+1)$ denotes the travel time spent at intersections, and $c_n(i+1)$ denotes the travel time spent at bus stops.
$a_n(i, i+1)$, $b_n(i, i+1)$, and $c_n(i+1)$ can be extracted from GPS data in combination with a geographic information system (GIS) map. The actual travel time of bus $n$ from the $i$th stop to the $(i+1)$th stop, denoted as $\hat{T}_n(i, i+1)$, can be written as follows:

$$\hat{T}_n(i, i+1) = \hat{a}_n(i, i+1) + \hat{b}_n(i, i+1) + \hat{c}_n(i+1).$$

(1) Recognition of a Driver's Deliberate Acceleration. At intersections, bus drivers tend to reduce speed because of the queuing vehicles and the restriction of changing lanes. However, a driver can frequently accelerate and decelerate at road sections to reduce the travel time and to ensure punctual arrivals at the downstream stops.
In Case 1, although a bus may be delayed at intersections, it still arrives at the downstream stops on time due to deliberate acceleration at road sections.
In Case 2, although the driver may deliberately speed up the bus, it does not arrive on time at the downstream stops.
In Case 3, the increase in the bus travel time at road sections exceeds or equals the total increase in the travel time spent at intersections and in the dwell time at stops. The bus may run normally or undergo deliberate acceleration.
(2) Recognition of a Driver's Deliberate Deceleration. When the traffic volume is low, the bus travel times at road sections and intersections may decline.
Theoretically, if the timetable is not optimized in real time, the driver may deliberately slow down the bus to enable punctual arrivals according to the schedule.
In Case 1, although the bus is slightly delayed at intersections, it still arrives at the downstream stops on time, since the driver deliberately slows down the bus.
In Case 2, although the driver deliberately slows down the bus at road sections, the bus still arrives at the downstream stops ahead of the scheduled time.
In Case 3, the decrease in the bus travel time at road sections exceeds or equals the total decrease in the travel time spent at intersections and in the dwell time at stops. The bus may run normally or undergo deliberate deceleration.

Recovery of the Optimal Travel Time on the Road.
When a driver's deliberate acceleration or deceleration is recognized as discussed in Section 2.1, the retrieved GPS data cannot be directly used for the optimization of the schedule scheme. This effect should be considered for recovering the optimal bus travel time on the road.
The delay time of a bus at an intersection can be calculated by subtracting the travel time at a preset speed from the travel time spent at the intersection. During an operating period, a number of buses pass through the intersection, and their average delay can be calculated directly. Let $d_n^{1}$ denote the average delay at the timetable's initial operation stage; after a certain period of time the traffic conditions change, and the average delay becomes $d_n^{2}$. Generally speaking, the traffic flow on a road increases or decreases as a result of an increase or decrease in traffic flow at the adjacent intersection. According to the theory of traffic engineering, the travel time spent at a road section or at an intersection is directly proportional to the traffic flow. At a signalized intersection, the average delay can be calculated following [17] from the green ratio, the degree of saturation, and the traffic capacity $\mathrm{Cap}_n$ of the phase for bus $n$, together with the length of the analysis period $T$, which is generally set at 0.25 h.
The degree of saturation is in turn determined by the arrival flow and the saturation flow of the entrance lane for bus $n$.
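As a rough illustration of how a flow variation ratio could be backed out of two observed average delays, the sketch below assumes a delay function of the HCM 2000 form (uniform delay plus incremental delay). The exact expression taken from [17] is not reproduced in the text, so the function `hcm_delay`, the parameters `cycle_s`, `k`, and `I`, and the bisection search are all illustrative assumptions rather than the authors' procedure. With the capacity held fixed, the degree of saturation is proportional to the arrival flow, so the ratio of the two recovered degrees of saturation gives the flow variation ratio.

```python
import math


def hcm_delay(x, green_ratio, cycle_s, capacity_vph, period_h=0.25, k=0.5, I=1.0):
    """Average delay (s/veh) at degree of saturation x, assuming an HCM 2000-style model."""
    # Uniform delay term.
    d1 = 0.5 * cycle_s * (1.0 - green_ratio) ** 2 / (1.0 - min(1.0, x) * green_ratio)
    # Incremental delay term (random arrivals and oversaturation).
    d2 = 900.0 * period_h * (
        (x - 1.0)
        + math.sqrt((x - 1.0) ** 2 + 8.0 * k * I * x / (capacity_vph * period_h))
    )
    return d1 + d2


def saturation_from_delay(delay_s, green_ratio, cycle_s, capacity_vph,
                          x_max=3.0, tol=1e-10):
    """Invert hcm_delay by bisection (the assumed delay increases with x)."""
    lo, hi = 0.0, x_max
    d_lo = hcm_delay(lo, green_ratio, cycle_s, capacity_vph)
    d_hi = hcm_delay(hi, green_ratio, cycle_s, capacity_vph)
    if not d_lo <= delay_s <= d_hi:
        raise ValueError("delay is outside the range covered by the assumed model")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if hcm_delay(mid, green_ratio, cycle_s, capacity_vph) < delay_s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def flow_variation_ratio(delay_before_s, delay_after_s, green_ratio, cycle_s, capacity_vph):
    """Ratio of later flow to initial flow, with signal timing and capacity unchanged."""
    x_before = saturation_from_delay(delay_before_s, green_ratio, cycle_s, capacity_vph)
    x_after = saturation_from_delay(delay_after_s, green_ratio, cycle_s, capacity_vph)
    if x_before <= 1e-9:
        raise ValueError("initial degree of saturation is too close to zero")
    return x_after / x_before
```

The signal parameters passed in (green ratio, cycle length, capacity) are hypothetical inputs; in practice they would come from the intersection's actual timing plan.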
For bus $n$, when the average delay changes from $d_n^{1}$ to $d_n^{2}$ while the other variables remain unchanged, the variation ratio of the flow at the entrance lane can be derived according to (12)-(13). Since this ratio relates the flow after a certain period of time to the original flow, it also serves as the variation ratio of traffic flow, which will be used for recovering the optimal travel time of the bus on the road. Field observations show that a relationship exists between the average speed of traffic on urban roads and the flow. At low traffic flow, speed is insensitive to an increase in flow and decreases only slightly. When the flow increases and approaches the capacity of the road, the speed decreases significantly. When the flow is lower than the capacity of the road, the average speed varies with the flow in an approximately linear fashion, with the flow of the road section expressed in pcu/h, the average speed of the traffic flow $V_n$ expressed in km/h, and two constants to be determined.
According to the characteristics of traffic flow, the traffic flow equals 0 at the free-flow speed. When the speed equals its optimal value, the traffic flow reaches its maximum and the saturation flow ratio is achieved. Imposing these two conditions on the linear relationship determines the two constants in terms of the free-flow speed, the optimal speed, and the maximum flow.
Assume that, after the bus dispatching scheme has been executed for a certain period of time, the flow has changed by the variation ratio derived above. The average travel speed of the bus under the changed flow, $V_n'$, then follows from the linear relationship, and the speed ratio $V_n'/V_n$ can be expressed through the flow variation ratio. Let $a_n(i, i+1)$ denote the average travel time of the bus from the $i$th to the $(i+1)$th stop within the operating time period at the timetable's initial operation stage. The recovered average travel time from the $i$th to the $(i+1)$th stop after a certain period of time is denoted $\hat{a}_n(i, i+1)$.
The speed ratio is the most important parameter and plays a decisive role in the travel time recovery process. Figure 1 illustrates the overall process for calculating it.
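The recovery step can be sketched numerically as follows. This is a minimal illustration, assuming the linear flow-speed relationship is written as q = alpha - beta * v and that travel time over a fixed distance scales inversely with speed; the function names, the parameterization, and the sample values are hypothetical and are not taken from the paper.

```python
def linear_flow_speed_constants(v_free, v_opt, q_max):
    """Constants of the assumed line q = alpha - beta * v.

    The line passes through (v_free, 0) and (v_opt, q_max).
    """
    beta = q_max / (v_free - v_opt)
    alpha = beta * v_free
    return alpha, beta


def speed_at_flow(q, alpha, beta):
    """Average speed implied by flow q on the assumed linear relationship."""
    return (alpha - q) / beta


def recover_travel_time(t_initial, q_initial, flow_ratio, v_free, v_opt, q_max):
    """Return (recovered travel time, speed ratio) for a road section."""
    q_new = flow_ratio * q_initial
    if not 0.0 <= q_new <= q_max:
        raise ValueError("changed flow is outside the range of the linear model")
    alpha, beta = linear_flow_speed_constants(v_free, v_opt, q_max)
    v_initial = speed_at_flow(q_initial, alpha, beta)
    v_new = speed_at_flow(q_new, alpha, beta)
    speed_ratio = v_new / v_initial
    # Same distance, different speed: time scales with the inverse speed ratio.
    return t_initial / speed_ratio, speed_ratio
```

For example, with a free-flow speed of 60 km/h, an optimal speed of 30 km/h, a maximum flow of 1500 pcu/h, and an initial flow of 600 pcu/h, a flow ratio of 1.5 lowers the speed from 48 to 42 km/h, so a 120 s section time is recovered as roughly 137 s.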

Determination of the Operating Time Division Indexes.
To divide the operating period, we first analyze all the historical data and then group buses with similar operating states and starting times into the same class. The corresponding operating time is referred to as a time interval. In this study, the dwell time at each stop and the interstop travel time (the recovered value as described in Section 2.2) are selected as the division indexes.
(1) Bus Dwell Time at Stops. The dispatching frequency affects the passenger volume of a bus route when the vehicle capacity of a bus is fixed. In each interval, the frequency is kept constant. Only when the passenger volume is also constant or fluctuates slightly will the passenger load factors of the dispatched buses be similar to one another. As a result, uneven bus occupancy within an interval can be avoided. With GPS data only, the number of alighting/boarding passengers at each stop is not available. However, empirical results show that the bus dwell time is positively proportional to the number of alighting/boarding passengers. Namely, a larger number of alighting/boarding passengers will result in a longer dwell time. Though the bus dwell time is also affected by some secondary factors such as the fare structure and the bus vehicle type (whether all doors can be used by the alighting passengers), they are all predetermined and remain unchanged for a given bus line. Hence the fluctuation of the bus dwell time at stops is mainly dependent on the number of alighting/boarding passengers. Therefore, the total dwell time at all stops is used to measure the passenger demand. Buses with similar total dwell time will be classified into the same time interval.
Let $\Delta c_{\max}$ denote the maximum permissible difference in the total dwell time at all stops for buses in the same period. It can be calculated by $\Delta c_{\max} = N \cdot \varphi \cdot t_b$, where $N$ denotes the maximum number of passengers, $\varphi$ denotes the passenger carrying factor, which is used mainly for adjusting the expected degree of crowdedness in the bus, and $t_b$ denotes the average boarding time of each passenger at each stop.
(2) Bus Interstop Travel Time. The bus operating status is affected not only by the arrival passenger volume at each stop, but also by the traffic conditions in real time. The road traffic conditions influence the bus interstop travel time and hence the punctuality of the bus at each stop. For two buses dispatched consecutively from the same depot, if they have identical numbers of alighting/boarding passengers at each stop but different interstop travel times, they will experience different total travel times as well as different levels of punctuality at stops. Hence these two buses should not be classified into the same time of day interval.
Let $\Delta T_{\max}$ denote the maximum permissible difference in the total travel time among all stops for buses in the same period. It can be calculated by $\Delta T_{\max} = \max\{0, H - N \cdot \varphi \cdot t_b\}$, where $H$ denotes the departure time interval for the buses as stated in the timetable.
Study [18] has analyzed the division indexes in different times of day for a bus route. However, the bus travel time obtained from GPS data was used directly. The deliberate acceleration or deceleration was not considered, which renders the division results nonoptimal.

Operating Time Division Algorithm
Some classical clustering algorithms (such as K-means clustering) have achieved favorable results in index-based classification but are not suitable for this study. These classical algorithms do not take the order of data into account but quantify the correlation among data by using one of the distance metrics (such as the Euclidean and Mahalanobis distances). If the sequence of the buses is not taken into consideration, buses with nonadjacent departure times may be included in the same class. For example, when the first, second, third, fourth, tenth, and twentieth buses are included in the same operating period, this period is in fact split into three subperiods: subperiod 1 includes the first, second, third, and fourth buses; subperiod 2 includes the tenth bus; and subperiod 3 includes the twentieth bus. Subperiods 2 and 3 are quite short, leading to frequent transitions between different bus dispatching schemes, which reduces the management efficiency of the bus enterprise [19].
Given that sequential sample clustering requires that the data sequence not be disturbed, the Fisher sequential sample clustering method (also referred to as optimal segmentation) is the most effective method [18]. There are $2^{n-1}$ division methods for $n$ sequential samples. Each division method corresponds to a segmentation. Among these segmentations, there exists an optimal segmentation that minimizes the difference within a segment and maximizes the difference among segments. To help achieve the optimal segmentation, the diameter of a class should be defined. After that, the loss function is defined according to the constraint that neighboring samples should be included in the same class. The optimal classification is found through a step-by-step recursive calculation with the objective of minimizing the loss function. The details of the procedure are described below.
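The count of order-preserving divisions can be checked by brute force for small sample sizes. The snippet below is only an illustrative enumeration (the helper name `contiguous_segmentations` is hypothetical); it chooses the first label of every class after the first, which is exactly what a segmentation of ordered samples amounts to.

```python
from itertools import combinations


def contiguous_segmentations(n):
    """Yield every division of the ordered labels 1..n into contiguous classes."""
    for k in range(1, n + 1):
        # Pick the first label of classes 2..k among the labels 2..n.
        for cuts in combinations(range(2, n + 1), k - 1):
            starts = (1,) + cuts
            ends = tuple(c - 1 for c in cuts) + (n,)
            yield [list(range(s, e + 1)) for s, e in zip(starts, ends)]
```

Restricting the enumeration to exactly two classes leaves one scheme per possible cut position, so n samples admit n - 1 two-class schemes.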
(1) Calculation of the Diameter of a Class. In this study, the ordered variables are denoted as $x_1, x_2, \ldots, x_n$ (each variable $x_i$ denotes an $m$-dimensional column vector, $i = 1, \ldots, n$). Here $m = 2$, given that the dwell time and the travel time are selected as the two division indexes. Assuming that $\{x_i, x_{i+1}, \ldots, x_j\}$ denotes a segment ($1 \le i \le j \le n$), the diameter of a class (also referred to as the sum of the squares of deviation) $D(i, j)$ can be written as follows:

$$D(i, j) = \sum_{t=i}^{j} \left(x_t - \bar{x}_{ij}\right)^{\mathsf{T}} \left(x_t - \bar{x}_{ij}\right), \qquad \bar{x}_{ij} = \frac{1}{j - i + 1} \sum_{t=i}^{j} x_t.$$

(2) Calculation of the Loss Function. For simplicity, the variable $x_i$ ($i = 1, \ldots, n$) is denoted by its subscript $i$.
Assuming that $i_j$ denotes the first sample (vector) in the $j$th segment, the following method can be used for dividing the $n$ ordered variables into $k$ classes:

$$P(n, k):\ \{i_1, i_1 + 1, \ldots, i_2 - 1\},\ \{i_2, i_2 + 1, \ldots, i_3 - 1\},\ \ldots,\ \{i_k, i_k + 1, \ldots, n\}, \qquad 1 = i_1 < i_2 < \cdots < i_k \le n.$$

To use Fisher clustering, we need to define a loss function $e(P(n, k))$ to evaluate the quality of clustering. For a certain division method, the loss function $e(P(n, k))$ is defined as the sum of the squares of the deviations of all classes:

$$e(P(n, k)) = \sum_{j=1}^{k} D(i_j, i_{j+1} - 1), \qquad i_{k+1} = n + 1.$$

Given $n$ and $k$ (the Fisher algorithm is applicable to cases with a known class number $k$), the total sum of the squares of the deviations of all samples is fixed. Hence a smaller intraclass sum of squares and a larger interclass sum of squares give better classification results. In other words, clustering or segmentation aims to find a method which minimizes the loss function $e(P(n, k))$:

$$e(P^{*}(n, k)) = \min_{P(n, k)} e(P(n, k)).$$

To solve the above-described objective function, we use the following recursion:

$$e(P^{*}(n, 2)) = \min_{2 \le j \le n} \left\{ D(1, j - 1) + D(j, n) \right\},$$
$$e(P^{*}(n, k)) = \min_{k \le j \le n} \left\{ e(P^{*}(j - 1, k - 1)) + D(j, n) \right\}.$$

For example, when $k = 2$, $P^{*}(n, 2)$ is the optimal method among all possible division schemes that minimizes the loss function.
Using the method of induction, the recursion described in (23) can be derived, which represents the optimal classification method of dividing $n$ samples into $k$ classes. It can be regarded as a combination of the optimal classification method of dividing $j - 1$ samples into $k - 1$ classes and the $k$th segment, which includes the remaining $n - j + 1$ samples.
There are two unique features of this algorithm. Firstly, it does not disturb the order of the dispatched buses. Hence the numbers of all buses that are classified into the same interval are adjacent. Secondly, the algorithm is simple, so the partition results are obtained quickly and the computation is efficient.
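The two steps above can be sketched as a small dynamic program. This is an illustrative pure-Python implementation of Fisher optimal segmentation under the standard definitions (class diameter as the within-segment sum of squared deviations, loss as the sum of diameters); the function names and the returned format are assumptions made for the example, not the authors' code.

```python
def diameter(samples, i, j):
    """Sum of squared deviations of samples[i..j] (0-based, inclusive) from their mean."""
    seg = samples[i:j + 1]
    dim = len(seg[0])
    mean = [sum(row[d] for row in seg) / len(seg) for d in range(dim)]
    return sum((row[d] - mean[d]) ** 2 for row in seg for d in range(dim))


def fisher_segmentation(samples, k):
    """Return (minimal loss, first 1-based label of each class) for k ordered classes."""
    n = len(samples)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # D[i][j]: diameter of the class holding labels i..j (1-based).
    D = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            D[i][j] = diameter(samples, i - 1, j - 1)
    inf = float("inf")
    # loss[m][c]: minimal loss of dividing the first m samples into c classes.
    loss = [[inf] * (k + 1) for _ in range(n + 1)]
    # start[m][c]: first label of the last class in that optimal division.
    start = [[0] * (k + 1) for _ in range(n + 1)]
    for m in range(1, n + 1):
        loss[m][1] = D[1][m]
        start[m][1] = 1
    for c in range(2, k + 1):
        for m in range(c, n + 1):
            for j in range(c, m + 1):
                candidate = loss[j - 1][c - 1] + D[j][m]
                if candidate < loss[m][c]:
                    loss[m][c] = candidate
                    start[m][c] = j
    # Backtrack from the last class to the first.
    labels = []
    m = n
    for c in range(k, 0, -1):
        j = start[m][c]
        labels.append(j)
        m = j - 1
    labels.reverse()
    return loss[n][k], labels
```

Each sample here would be a pair (total dwell time, recovered travel time) for one dispatched bus, listed in departure order and normalized beforehand so that neither index dominates the diameter.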
(3) Final Division Based on Threshold Values of Two Indexes. By means of the above two steps, the dispatched buses are divided into $k$ groups. However, it has not been determined whether the differences in the total dwell time and in the interstop travel time between two adjacent buses are smaller than the threshold values, which is thus evaluated in this step. In each group, if the differences in the two division indexes of two adjacent buses are larger than the threshold values, the two buses should be classified into different groups.
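A minimal sketch of this final check is given below. It assumes that exceeding either threshold is enough to separate two adjacent buses (the text does not say whether one or both indexes must exceed their thresholds), and the function name and argument layout are hypothetical.

```python
def split_by_thresholds(groups, dwell, travel, dwell_max, travel_max):
    """Split groups of consecutive bus indices wherever adjacent buses differ too much.

    groups: list of lists of 0-based bus indices in departure order.
    dwell, travel: total dwell time and total interstop travel time per bus.
    """
    refined = []
    for group in groups:
        if not group:
            continue
        current = [group[0]]
        for prev, cur in zip(group, group[1:]):
            too_different = (
                abs(dwell[cur] - dwell[prev]) > dwell_max
                or abs(travel[cur] - travel[prev]) > travel_max
            )
            if too_different:
                # Close the current group and start a new one at this bus.
                refined.append(current)
                current = []
            current.append(cur)
        refined.append(current)
    return refined
```

Because only adjacent buses are compared, the refined groups still consist of consecutive departures, which preserves the ordering property of the sequential clustering.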
Figure 2 illustrates the overall process of the division algorithm.

Case Study
In this section, we apply the proposed time of day division method based on GPS data to bus route 63 in Harbin, China, as a case study. The operating distance in one direction is approximately 9.5 km. The bus enterprise has set the sampling interval of the GPS data at 30 seconds, which cannot satisfy the requirement of this study. As a result, we carried out our own investigation of the bus line for two weeks (Monday to Friday each week) during September 2013. A GPS device was placed in each bus and connected to a laptop for real-time storage of the GPS data, which were later matched with a GIS map. Afterwards, the required travel time spent on road sections and intersections and the delay and dwell time at stops were extracted. During the investigation, the bus operating time was from 05:50 to 21:00. The operating period can be divided into 6 time intervals for each day. The specific starting and ending time points as well as the departure headways are listed in Table 1. In December 2014, we performed a second investigation for one week (Monday to Friday) and obtained the latest bus operating data. Compared with the first investigation, the total number of vehicles in Harbin had increased significantly. In addition, due to the winter snow on the road, vehicles moved more slowly and road congestion became even more serious. Acceleration and deceleration of the buses became more frequent. As a result, the original schedule scheme was no longer suitable at the time of the second investigation.

Division Results.
Before the division of the operating time, the values of various parameters should be determined. $N$ and $t_b$ are constants which are set at 60 people per bus and 2.2 seconds per person, respectively. Given that $H = 8$ minutes,   max = 2 and   min =  , the thresholds are $\Delta c_{\max} = 112.2$ s and $\Delta T_{\max} = 367.8$ s.
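The two reported thresholds can be reproduced with a few lines of arithmetic. In the sketch below, the passenger carrying factor of 0.85 is not stated in the text; it is inferred from 112.2 / (60 × 2.2) and should be read as an assumption. The function names are likewise hypothetical.

```python
def dwell_threshold(max_passengers, carrying_factor, boarding_time_s):
    """Maximum permissible difference in total dwell time, in seconds."""
    return max_passengers * carrying_factor * boarding_time_s


def travel_threshold(headway_s, max_passengers, carrying_factor, boarding_time_s):
    """Maximum permissible difference in total travel time, in seconds."""
    dwell = dwell_threshold(max_passengers, carrying_factor, boarding_time_s)
    return max(0.0, headway_s - dwell)


# Case-study values: 60 passengers, 2.2 s per passenger, 8-minute headway.
# The factor 0.85 is inferred from the reported threshold, not quoted from the paper.
dwell_max = dwell_threshold(60, 0.85, 2.2)
travel_max = travel_threshold(8 * 60, 60, 0.85, 2.2)
```

Under these inputs the dwell time threshold comes to about 112.2 s and the travel time threshold to about 367.8 s, which agrees with the values quoted for route 63.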
All the division indexes are normalized before use. Table 2 lists the minimal loss function based on the sequential clustering and the beginning label of the last class. The minimal loss function is calculated from the second column; that is, $k = 2$. The minimal loss functions of all the schemes of dividing the first $n$ buses ($3 \le n \le 97$) into $k$ classes are derived to determine the optimal segmentation. Using $\min e(P(3, 2))$ as an example, there are two division schemes which divide the first three buses into two classes, namely, ({1}, {2, 3}) and ({1, 2}, {3}):

$$e(P^{*}(3, 2)) = \min \left\{ D(1, 1) + D(2, 3),\ D(1, 2) + D(3, 3) \right\}. \tag{24}$$
The optimal segmentation is ({1}, {2, 3}), and the beginning label of the last class (i.e., 2) is recorded. As shown in the second row and the second column in Table 2, [2] on the right of 0.0007 represents the division of the first three buses into 2 classes where the beginning label of the second class is 2 and the corresponding minimal loss function is 0.0007. Moreover, the division indexes (the average dwell time at stops and the average travel time among stops) are different for different buses in a class, which should be taken into account in the classification. Only buses whose differences in the division indexes are smaller than the thresholds are grouped into the same class. For example, there are 92 division schemes when dividing 93 samples into 2 classes. Before the calculation of the loss function, we should first evaluate whether the thresholds $\Delta c_{\max}$ and $\Delta T_{\max}$ are satisfied and delete those division schemes that do not satisfy the requirement. Only after that can the loss functions of the remaining division schemes be calculated so as to determine the optimal division.
As shown in Table 2, the sequential clustering algorithm cannot determine the class number $k$ by itself; the optimal class number can only be judged from the variation in the minimal loss function. It can be observed that, in the last row of Table 2 ($n = 93$), the minimal loss function of the 93 sample data decreases gradually with an increasing $k$. A greater $k$ suggests a finer division and, accordingly, fewer buses are included in a class, within which the difference is smaller. However, a bus enterprise does not necessarily want to increase the number of operating time slots, since doing so will not only require the dispatching schemes to be updated more often but also require more transition schemes between different dispatching schemes. Frequent transitions may reduce the operating efficiency of bus transit [7]. In studies [7,8], the value of $k$ was determined by the manager. For this study, with reference to the previous research, we consulted the administration department of the bus enterprise and finally set the value of $k$ at 8; that is, the operating time of bus route 63 is divided into 8 time intervals as shown in Table 3.

Conclusion
This study first recovers the bus travel time on the road based on the historical GPS data and then divides the bus operating time using a sequential clustering algorithm. The main conclusions are as follows:
(1) The bus travel time data collected from the bus-mounted GPS cannot truly reflect the real operating conditions.
(2) For the division of the operating time, the division algorithm is more sensitive to the threshold value of the dwell time at stops. A smaller threshold value may easily make the division finer.
(3) A sequential clustering method can ensure that the order of the adjacent buses is not disrupted in order to achieve a favorable division of the bus operating time.

Figure 2: Flow chart of the division algorithm.

4.1. Data Acquisition. Bus route 63 in Harbin has 21 stops in total. The line starts from Jiangong Community and goes all the way to Dajiang Community along the westbound direction.
Figure 1: Flow chart of the bus travel time recovery.

Table 1: Time interval partition scheme and bus headway during the investigation period.

Table 2: Minimal loss function and starting codes of the last cluster in different partition methods.

Table 3: Final partition results of operation time intervals for bus route 63.