Least Expected Time Paths in Stochastic Schedule-Based Transit Networks

We consider the problem of determining a least expected time (LET) path that minimizes the number of transfers and the expected total travel time in a stochastic schedule-based transit network. A time-dependent model is proposed to represent the stochastic transit network where vehicle arrival times are fully stochastically correlated. An exact label-correcting algorithm is developed, based on a proposed dominance condition by which Bellman’s principle of optimality is valid. Experimental results, which are conducted on the Ho Chi Minh City bus network, show that the running time of the proposed algorithm is suitable for real-time operation, and the resulting LET paths are robust against uncertainty, such as unknown traffic scenarios.


Introduction
The routing problem in a schedule-based transit network involves scheduling decisions made by a traveler, for example, accessing a stop (station), walking between stops, waiting to board, traveling in-vehicle, alighting, and egressing. These decisions guide the traveler from an origin to a destination with minimum travel costs, such as the number of transfers, total travel time, walking time, and waiting time. The decisions of the traveler are constrained not only by the network configuration, that is, transit routes (lines), but also by the schedules of transit vehicles. However, due to the stochastic and time-varying nature of vehicle travel time, as well as the effect of a transit vehicle's arrival at an upstream stop on its arrivals at downstream stops, the arrival times of transit vehicles usually do not follow their schedules. Therefore, the determination of robust routing decisions can greatly affect the quality of the routing service provided under uncertain conditions.
Along with the stochasticity of vehicle travel times and the relationship between vehicle arrival times on the same transit route, there might also exist overlaps between transit routes in the network. Therefore, the arrival times of transit vehicles would be not only stochastic but also fully stochastically correlated. The routing problem with stochastically correlated link travel times has been investigated intensively in highway networks [1][2][3][4][5][6]. However, its counterpart in transit networks, where vehicle arrival times are considered as stochastically correlated, has not been addressed; existing works in the literature assumed vehicle arrival times to be deterministic [7][8][9][10][11][12][13] or statistically independent [14]. The main issue when designing a routing algorithm in a schedule-based transit network with correlated vehicle arrival times is to model the stochastic correlation of vehicle arrival times. This issue is related to the question of how to incorporate the correlation of vehicle arrival times into the routing process, in which not only constraints on transit routes but also constraints on vehicle arrival times are taken into account. This paper addresses the issue as follows:
(i) A time-dependent model is proposed for stochastic schedule-based transit networks where the correlation of all vehicle arrival times is represented as a scenario. The graph model captures travelers' decisions, namely, boarding, traveling in-vehicle, alighting, and walking, as well as the time constraints of these decisions in each scenario.
(ii) A new dominance condition for paths is established with respect to the number of transfers and the travel times over a set of possible scenarios. A formal proof is then presented that Bellman's principle of optimality is valid with nondominated paths. This theoretical result enables the use of pretrip online information to reduce uncertainties for more robust LET paths.
(iii) An exact link-based routing algorithm is proposed for efficiently determining LET paths, based on Bellman's principle of optimality. The results from experiments, which are conducted using data from a real-size bus network in Ho Chi Minh City, show that the running time of the proposed algorithm is feasible for online applications. Also, LET paths are shown to be robust in the presence of unknown scenarios.
The remainder of the paper is organized as follows. We present related research on the routing problem in transit networks in Section 2. In Section 3, we define the components used to develop the algorithm for our routing problem. We then propose the solution algorithm for determining the LET path in Section 4. Various experiments are conducted, and their results are discussed, in Section 5. Finally, the conclusion is given in Section 6.

Related Work
Transit routing algorithms in the literature have been built on the notion of path [7,8,15] or hyperpath [16][17][18]. A path consists of fixed decisions made by a traveler at stops, which are determined before he/she leaves the origin. In contrast, a hyperpath represents routing strategies in which the traveler is allowed to change his/her decision at each intermediate stop, depending on the previous decisions and what is likely to happen in the future. Routing based on hyperpaths was shown to yield better travel costs under uncertainty but requires the incorporation of online information and incurs high computational complexity [19].
Treatment of the routing problem in a transit network can differ depending on the type of transit services, that is, either headway-based [15][16][17] or schedule-based [7,8,11,12]. In the former, transit services are represented by transit routes, and arrival/departure times of transit vehicles are not explicitly considered. This results in an approximation in calculating boarding times and in-vehicle travel times. In the latter, transit services are explicitly specified in terms of trips (runs), in which arrival/departure times of transit vehicles at stops are considered. While the routing algorithm in a headway-based transit network can employ shortest path algorithms, for example, Dijkstra's algorithm [20], which are the same as those used for highway networks, a schedule-based transit network requires a time-dependent network representation in which routing processes of travelers are constrained not only by the network topology but also by scheduled arrival/departure times of transit vehicles. Therefore, modeling transit services is the first and an important task in solving the routing problem in a schedule-based transit network. As classified by Nuzzolo and Crisalli [21], the representation of a schedule-based transit network can take one of three forms: the diachronic (time-expanded) network [9,10,13], the dual network [22], and the mixed line-based/database supply model (time-dependent model) [11,23,24].
In the context where transit services are insufficiently reliable, headways and arrival/departure times of transit vehicles are commonly modeled as random variables with well-known forms of probability distribution, for example, exponentially distributed headways [25,26], Gaussian distributed headways [27,28], and Gaussian distributed scheduled times [14]. Along with the stochasticity of transit services, the uncertainty in travelers' perceptions of different types of travel costs can also be regarded as a source of stochasticity in a transit routing problem [8,27,28]. In these works, random weights for different travel cost components, such as transfer penalty, walking time, and waiting time, were incorporated into the routing process.
The routing problem in transit networks has been investigated with various assumptions on many aspects, such as capacity limitation, congestion and overcrowding issues, vehicle capacity, and boarding failures [29]. Nuzzolo and Crisalli [21] investigated various routing models for low- and high-frequency schedule-based transit networks. In the former, for example, in regional bus or railway networks, routing processes are based on arrival/departure times of transit vehicles [30,31]. In the latter, typically in urban areas, travelers usually have a large number of options at stops to reach their destination. In this case, arrivals of travelers at stops do not rely heavily on vehicle arrival/departure times but are significantly affected by vehicle congestion, which is defined in the literature as the situation in which a traveler cannot board the first arriving vehicle and has to wait for the next vehicles. Vehicle congestion can be modeled implicitly as increasing discomfort functions [11,12,32] or explicitly with vehicle capacity or seat availability constraints [33][34][35].

Network Modeling
In this section, we define the components used to develop the algorithm for determining LET paths in stochastic schedule-based transit networks.

Stochastic Schedule-Based Transit Network.
We consider a transit network $B = (S, R, T, Q, C)$, where $S$ is the set of stops and $R$ is the set of routes. A route $r \in R$ is a fixed sequence of stops through which transit vehicles run periodically with fixed trips and is defined by the set
$$r = \{s^r_1, s^r_2, \ldots, s^r_{n_r}\},$$
where $s^r_i$ is the $i$th stop and $n_r$ is the number of stops on route $r$. Let $m_r$ be the number of trips of route $r$ over a set of time intervals $T = \{0, \delta, 2\delta, \ldots, M\delta\}$, where $\delta$ is the unit of time and $M$ is the last time interval. The universal stochastic scenario set $Q$ is the set of all known possible scenarios in the network such that
$$\sum_{q \in Q} p_q = 1,$$
where $p_q$ is the occurrence probability of scenario $q \in Q$.
Each scenario $q$ can be defined by a set of stop times
$$C^q = \{c^q_{r,k,i} : r \in R,\ k = 1, \ldots, m_r,\ i = 1, \ldots, n_r\},$$
where $c^q_{r,k,i} \in T$ denotes the stop time (scheduled arrival time of a transit vehicle) at the $i$th stop of the $k$th trip on route $r$ in scenario $q$, and $C$ is the universal set of all stop times in all possible scenarios such that
$$C = \bigcup_{q \in Q} C^q.$$
In the context of transit networks, there might exist overlaps among routes. A scenario represents a stochastic correlation of not only stop times on the same route but also stop times on routes sharing the same physical links. The probability of a scenario happening is the full joint probability of all stop times taking place, and stop times are known for each scenario a priori. This allows us to take into account explicitly the delays resulting from transfer failures due to late arrivals and their effects on the total travel time in each scenario.
For example, consider the transit network shown in Figure 1, which has three stops and three routes, $R = \{1, 2, 3\}$. In this network, routes 1 and 2 provide services between the same pair of stops. With Assumption (4), the choices of trips for the earliest arrival time at the destination stop in different scenarios are shown in Table 2. In particular, if the traveler uses the first choice of routes, his/her expected arrival time at the destination stop equals (11 + 12 + 16)/3 = 13. In this case, the choice of trips can be interpreted as follows: at the transfer stop, the traveler transfers to trip 1 of route 3 successfully in scenarios $q_1$ and $q_2$ but misses this trip in scenario $q_3$. This leads to a later arrival time at the destination stop, that is, 16 instead of 10, which contributes to the expected arrival time of the first choice of routes. Similarly, the expected arrival time at the destination stop for the second choice of routes is approximately 12.67. Note that transfer failures might spread over several later trips, depending on the scenario.
In this paper, the following assumptions are adopted:
(1) Actual travel times of transit vehicles between stops on a given route are nonnegative; that is, $c^q_{r,k,i+1} - c^q_{r,k,i} \ge 0$.
(2) Actual arrival times of transit vehicles for later trips cannot be earlier than those of earlier trips; that is, $c^q_{r,k+1,i} \ge c^q_{r,k,i}$.
(3) Arrival times of transit vehicles in different scenarios are statistically independent.
(4) Vehicle capacity, overcrowding, and fare issues are not considered. In other words, it is assumed that passengers always board any arriving transit vehicle successfully.
(5) Passengers perceive the different time components, such as waiting time, walking time, and in-vehicle time, in a similar way.
Assumptions (1) and (2) are expected to be valid in practice, where transit vehicles serving trips on the same route conventionally keep a certain distance from each other and their travel times are always positive. Assumption (3) is equivalent to the assumption used in the routing problem in highway networks with correlated link travel times [4][5][6]; that is, link travel times in different scenarios are stochastically independent. Assumptions (4) and (5) have been widely adopted in the literature, for example, [14,18,19,25,26].
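The expected arrival time in the example above can be reproduced with a few lines of Python. This is only a sketch: the function name and the dictionary layout are illustrative choices, and the equal scenario probabilities of 1/3 are inferred from the division by 3 in the text.

```python
from fractions import Fraction

def expected_arrival(arrivals, probs):
    """Probability-weighted mean of the per-scenario arrival times."""
    total = sum(probs[q] for q in arrivals)
    return sum(probs[q] * arrivals[q] for q in arrivals) / total

# Arrival times of the first choice of routes in scenarios q1, q2, q3 (from the text).
arrivals = {"q1": 11, "q2": 12, "q3": 16}
# Equal probabilities are an assumption inferred from "(11 + 12 + 16)/3".
probs = {q: Fraction(1, 3) for q in arrivals}

print(expected_arrival(arrivals, probs))  # 13
```

The missed transfer in scenario q3 (arrival at 16 instead of 10) is what pulls the mean up to 13.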

Time-Dependent Model.
A time-dependent graph model (similar to [24]) is used to represent the transit network as a directed graph whose arcs model travelers' decisions, namely, boarding, traveling in vehicles, alighting, and walking.
Let $G = (N, A, T)$ denote the graph modeling the transit network $B$, where $N$ is the set of nodes and $A$ is the set of arcs. Nodes represent stops and the stops served by each route, and the arcs in $A$ are divided into boarding, in-vehicle, alighting, and walking arcs, corresponding to the travelers' decisions above. A well-defined $o$-$v$ path $p$ in graph $G$, as defined in Definition 1, represents a choice of routes made by a traveler when he/she travels from origin stop $s_o \in S$ to destination stop $s_d \in S$ within the transit network $B$.

Arc Time and Transfer Weights.
Note that only travelers' decisions are captured in Section 3.2. To model the time constraints that arise when the schedules of transit vehicles are taken into account, times are assigned to arcs as arc weights.
Figure 2: The graph that models the network shown in Figure 1.

Let $\tau^q_{uv}(t)$ be the time weight on arc $\langle u, v\rangle \in A$ with time $t \in T$ at node $u$ in scenario $q \in Q$. Depending on the type of arc $\langle u, v\rangle$, the time weight is either a boarding penalty, an in-vehicle travel time, an alighting penalty, or a walking time. In particular, $\tau^q_{uv}(t)$ is assigned according to the following four cases.
Case 1. If $\langle u, v\rangle$ is a boarding arc, the traveler stands at the $i$th stop of route $r$ at time $t$ and boards a vehicle of an arriving trip of route $r$. Due to the unlimited vehicle capacity assumption, the boarded trip is commonly the first arriving one [24,36,37]. To board an arriving trip, the traveler must be at the stop at least $b^r_i \delta$ units of time before the bus of that trip leaves the stop (note that herein $\delta$ is set to one and will be omitted for convenience in the rest of the paper). The boarding penalty for the first arriving trip if the traveler stands at the $i$th stop of route $r$ at time $t$ is expressed by
$$\tau^q_{uv}(t) = \min_{1 \le k \le m_r} \{ c^q_{r,k,i} - t : c^q_{r,k,i} - t \ge b^r_i \}. \tag{7}$$
Case 2. If $\langle u, v\rangle$ is an in-vehicle arc, the traveler rides on a vehicle serving a certain trip, for example, the $k$th trip, and travels from the $(i-1)$th stop to the $i$th stop on route $r$. The time weight on arc $\langle u, v\rangle$ is therefore the in-vehicle travel time of the $k$th trip from the $(i-1)$th stop to the $i$th stop on route $r$:
$$\tau^q_{uv}(t) = c^q_{r,k,i} - c^q_{r,k,i-1}, \tag{8}$$
where the traveler's arrival time $t$ at the $(i-1)$th stop is the stop time $c^q_{r,k,i-1}$.
Case 3. If $\langle u, v\rangle$ is an alighting arc, the traveler alights from a vehicle serving a trip, for example, the $k$th trip, at the $i$th stop of route $r$. The arc time weight can be expressed by
$$\tau^q_{uv}(t) = a^r_i, \tag{9}$$
where $a^r_i$ is the alighting time for the $k$th trip at the $i$th stop on route $r$.
Case 4. If $\langle u, v\rangle$ is a walking arc, where $u \equiv s$ and $v \equiv s'$, the traveler walks from stop $s$ to stop $s'$. Let $w_{ss'}$ be the minimum time required for walking between stops $s$ and $s'$; the walking time weight is given by
$$\tau^q_{uv}(t) = w_{ss'}. \tag{10}$$
Let $\kappa_{uv}$ be the weight for the number of transfers on arc $\langle u, v\rangle$. Note that $\kappa_{uv}$ does not depend on time and scenario. The arc weight for the number of transfers equals one if the arc is a boarding arc and zero otherwise. Therefore,
$$\kappa_{uv} = \begin{cases} 1, & \text{if } \langle u, v\rangle \text{ is a boarding arc}, \\ 0, & \text{otherwise}. \end{cases} \tag{11}$$
In summary, Table 3 shows the arc time weights in the example graph model in Figure 2 after applying (7), (8), (9), and (10) with boarding penalty $b^r_i = 1$ and alighting penalty $a^r_i = 0$. An arc marked with the symbol "-" at a given time and in a given scenario means that the traveler's action associated with that arc is restricted at that time and in that scenario. For example, consider the boarding arc of route 2 in scenario $q_1$: from times 0 to 9 the arc represents the traveler's action of boarding route 2 at the stop with different boarding penalties; that is, from times 0 to 6 the traveler boards trip 1 with penalties from 7 to 1, and from times 7 to 9 the traveler boards trip 2 with penalties from 3 to 1. After time 9 the traveler's boarding action is restricted since no trip of route 2 will arrive at the stop in scenario $q_1$ (see the timetable in Table 1). Note that only walking arcs are available at any time since travelers can walk freely. For shortest path problems, restricted actions can be assigned very large integer weights.
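The boarding rule of Case 1 can be sketched in Python as follows. The function name and the use of `None` for a restricted action are illustrative assumptions; the stop times 7 and 10 are not quoted from Table 1 but inferred from the penalties described in the text (7 down to 1 for trip 1, then 3 down to 1 for trip 2).

```python
def boarding_penalty(t, stop_times, min_lead=1):
    """Waiting time until the first trip that can still be boarded at time t.

    A trip with stop time c is boardable only if c - t >= min_lead.
    Returns None when no trip can be boarded (the action is restricted).
    """
    feasible = [c - t for c in stop_times if c - t >= min_lead]
    return min(feasible) if feasible else None

# Stop times of route 2 at the boarding stop in scenario q1 (inferred, see above).
route2_stop_times = [7, 10]

for t in (0, 6, 7, 9, 10):
    print(t, boarding_penalty(t, route2_stop_times))
```

At time 7 the traveler can no longer board trip 1, because the required lead time of one unit is not met, so the penalty jumps to 3 for trip 2; from time 10 onward the action is restricted.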

Least Expected Time (LET) Path Problem
In Section 3, we proposed and explained the graph model of the transit network, which captures travelers' actions, namely, boarding, in-vehicle travel, alighting, and walking, as well as the time constraints associated with travelers' decisions. Below we study the LET path problem in stochastic schedule-based transit networks using the graph model.

Problem Definition.
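The LET path problem in this paper is studied from one origin node $o \in N$ for a fixed departure time $t$ to all destination nodes $v \in N$ over a scenario set $\Omega \subseteq Q$. The criteria used for evaluating a path include the number of transfers and the expected total travel time across the set of scenarios $\Omega$.
Let $e^q_p(u, t)$ be the travel time on $o$-$u$ path $p$, $u \in N$, in scenario $q \in \Omega$. Let us consider $o$-$v$ path $p'$ that is expanded from $o$-$u$ path $p$ via arc $\langle u, v\rangle \in A$, denoted by $p' = p \diamond \langle u, v\rangle$. The relationship between the travel time on path $p'$ and that of its subpath $p$ for departure time $t$ in scenario $q$ is given by
$$e^q_{p'}(v, t) = e^q_p(u, t) + \tau^q_{uv}(t_u), \tag{12}$$
where $\tau^q_{uv}(t_u)$ is the time weight on arc $\langle u, v\rangle$ at time $t_u = t + e^q_p(u, t)$, the arrival time at node $u$. Depending on the type of arc $\langle u, v\rangle$, arc weight $\tau^q_{uv}(t_u)$ is determined by one of (7), (8), (9), and (10). When Assumption (3) holds, the expected (mean) travel time of $o$-$v$ path $p'$ with departure time $t$ over scenario set $\Omega$, denoted by $\bar{e}_{p'}(v, t, \Omega)$, is given by
$$\bar{e}_{p'}(v, t, \Omega) = \sum_{q \in \Omega} p_q \, e^q_{p'}(v, t), \tag{13}$$
where $p_q$ is the occurrence probability of scenario $q$ and $\bar{e}_{p_0}(o, t, \Omega) = 0$ for the initial path $p_0$ that contains only the origin node.
We also have the following relationship between the number of transfers on path $p'$, that is, $f_{p'}(v)$, and the number of transfers on its subpath $p$, that is, $f_p(u)$:
$$f_{p'}(v) = f_p(u) + \kappa_{uv}, \tag{14}$$
where the weight $\kappa_{uv}$ for the number of transfers on arc $\langle u, v\rangle$ is given by (11), and $f_{p_0}(o) = 0$.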
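From the transit travelers' perspective, it is more useful to minimize the number of transfers first and then the expected travel time across the scenario set. The LET path is given by Definition 2.
Definition 2 (LET $o$-$v$ path). The LET $o$-$v$ path $p^*$ with departure time $t \in T$ over scenario set $\Omega \subseteq Q$, $\forall v \in N$, is given by
$$p^* = \arg\min_{p \in P^{\min}_v} \bar{e}_p(v, t, \Omega), \qquad P^{\min}_v = \Big\{ p \in P_v : f_p(v) = \min_{p'' \in P_v} f_{p''}(v) \Big\}, \tag{15}$$
where $P_v$ is the set of all $o$-$v$ paths.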
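The transfers-first, expected-time-second ordering of Definition 2 can be illustrated with a short Python sketch. The path names, the label layout, and all travel times below are hypothetical; normalizing the probabilities over the subset Ω is also an assumption, which does not change which path is selected.

```python
def expected_time(times, probs, omega):
    """Probability-weighted mean travel time over the scenarios in omega."""
    total = sum(probs[q] for q in omega)
    return sum(probs[q] * times[q] for q in omega) / total

def let_path(labels, probs, omega):
    """Pick the path with the fewest transfers, then the least expected time.

    labels maps a path name to (number_of_transfers, {scenario: travel_time}).
    """
    return min(
        labels,
        key=lambda p: (labels[p][0], expected_time(labels[p][1], probs, omega)),
    )

# Hypothetical candidate paths (not taken from the paper's tables).
labels = {
    "A": (1, {"q1": 11, "q2": 12, "q3": 16}),
    "B": (1, {"q1": 13, "q2": 13, "q3": 12}),
    "C": (2, {"q1": 9, "q2": 9, "q3": 9}),  # fastest, but one more transfer
}
probs = {"q1": 1 / 3, "q2": 1 / 3, "q3": 1 / 3}

print(let_path(labels, probs, ["q1", "q2", "q3"]))
print(let_path(labels, probs, ["q1", "q2"]))
```

Path C is never selected despite being fastest, because the number of transfers is compared first; the choice between A and B depends on which scenarios are in Ω.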

Dominance Condition.
A LET $o$-$v$ path problem with departure time $t \in T$ over scenario set $\Omega \subseteq Q$ defined in (15) can be solved by enumerating all possible $o$-$v$ paths $P_v$ and then minimizing the number of transfers and the expected travel time of each $o$-$v$ path in $P_v$ for departure time $t$ over $\Omega$ using (12) and (14). Such a brute force algorithm is inefficient. We therefore propose a dominance condition that the optimal LET path satisfies. First, we define a dominance condition in Definition 3. Then, the LET $o$-$v$ path for departure time $t$ over scenario set $\Omega$ is found in the set of nondominated $o$-$v$ paths at time $t$ over $\Omega$ by Proposition 4.
Note that the dominance condition in Definition 3 is not as strict as the one requiring at least one scenario $q \in Q$ such that $e^q_{p'}(v, t) < e^q_p(v, t)$. This is because, in the constructed graph described in Section 3.2, there might exist many nondominated paths that represent the same choice of routes and differ from each other only in transfer locations.
An $o$-$v$ path $p'$ is then nondominated in $P_v$ for departure time $t$ over scenario set $\Omega$ if $p'$ is not dominated by any $o$-$v$ path at time $t$ over $\Omega$.
Proposition 4. Given a departure time $t \in T$ and a set of scenarios $\Omega \subseteq Q$, the LET $o$-$v$ path at time $t$ over $\Omega$, $\forall v \in N$, belongs to the set of nondominated $o$-$v$ paths at time $t$ over $\Omega$.
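A dominance filter over path labels might look like the sketch below. The formal statement of Definition 3 is not reproduced here, so the code assumes the weak form suggested by the remark above: one path dominates another if it has no more transfers and is no slower in every scenario of Ω, without requiring a strict improvement. The labels are hypothetical.

```python
def dominates(a, b, omega):
    """Assumed weak dominance: a is no worse than b in every criterion.

    A label is (number_of_transfers, {scenario: travel_time}).
    """
    return a[0] <= b[0] and all(a[1][q] <= b[1][q] for q in omega)

def nondominated(labels, omega):
    """Keep one representative of every nondominated label."""
    kept = []
    for cand in labels:
        if any(dominates(k, cand, omega) for k in kept):
            continue  # dominated by (or identical to) a kept label
        kept = [k for k in kept if not dominates(cand, k, omega)]
        kept.append(cand)
    return kept

# Hypothetical labels at one node; the last one duplicates the first.
labels = [
    (1, {"q1": 11, "q2": 12, "q3": 16}),
    (1, {"q1": 13, "q2": 13, "q3": 12}),
    (1, {"q1": 14, "q2": 13, "q3": 16}),
    (1, {"q1": 11, "q2": 12, "q3": 16}),
]

print(len(nondominated(labels, ["q1", "q2", "q3"])))
print(len(nondominated(labels, ["q1"])))
```

Under the weak condition, paths with identical labels (for example, the same choice of routes with different transfer locations) collapse to a single representative. The example also shows that a larger scenario set tends to leave more nondominated labels.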
By Definition 3, the problem of determining nondominated paths can be treated as a multicriteria shortest path problem with $(|Q| + 1)$ independent criteria, namely, the number of transfers and the travel times corresponding to the $|Q|$ scenarios. Theorem 7 below implies that Bellman's principle of optimality is valid when nondominated paths are defined with respect to their nondominated subpaths. We later develop a forward label-correcting algorithm to solve the LET path problem based on Theorem 7. Note that Theorem 7 is established on the grounds of Lemmas 5 and 6 and is only valid when Assumptions (1) and (2) hold.
Lemma 5. For any given arc $\langle u, v\rangle \in A$, $\kappa_{uv} \ge 0$ and $\tau^q_{uv}(t) \ge 0$, $\forall t \in T$, $\forall q \in Q$.
By Propositions 4, 9, and 10, we can establish the relationship between the nondominated, LET, and fastest $o$-$v$ paths for departure time $t$ over the universal scenario set $Q$ and its subset $\Omega$, as shown in Figure 3. By determining the set of nondominated $o$-$v$ paths over the universal scenario set $Q$ at departure time $t$, we can obtain the LET $o$-$v$ path at time $t$ over any subset of $Q$. This relationship is beneficial when prejourney online information is used to determine these subsets.

Label-Correcting Algorithm.
The algorithm for determining the LET paths is based on the link-based approach, using the optimality condition stated in Theorem 7, which is only valid when Assumptions (1) and (2) hold. Since the proposed approach avoids enumerating all possible origin-destination paths, it is feasible in real-time applications. Note that Theorem 7 remains valid with simple modifications of the dominance condition when other criteria, such as fare and walking distance, are taken into account, as long as the arc weights for these criteria are positive and time-independent. Consequently, we can incorporate travelers' weightings on different criteria, such as number of transfers, total travel time, fare, and walking distance, in the routing process.
Nevertheless, one drawback of our approach is that it does not allow taking into account travelers' weightings on different time components, such as boarding time and in-vehicle travel time, since the arc weights for these time components are time-dependent. Several works in the literature solved this issue using the path enumeration method [27,28] or the branch and bound method [8]. However, these solution approaches are infeasible for real-time applications, especially in the stochastic transit networks considered herein, where stop times of transit vehicles are stochastically correlated. Our proposed algorithm is developed as follows.
Given departure time $t$, for each node $u \in N$ and each $o$-$u$ path $p$, the algorithm maintains a vector label
$$\Lambda_p(u) = \big(f_p(u), e^{q_1}_p(u, t), \ldots, e^{q_{|Q|}}_p(u, t)\big),$$
which consists of the number of transfers and the travel times of path $p$ in all scenarios. Let $L(u)$ be the set of nondominated labels corresponding to the set of nondominated $o$-$u$ paths at time $t$ over the set of scenarios $Q$. According to Theorem 7, each label $\Lambda_p(u) \in L(u)$ contains the information of a nondominated $o$-$u$ path $p$ that has the potential to be part of a nondominated origin-destination path at time $t$ over $Q$ when the algorithm terminates, where label $\Lambda_p(u)$ is nondominated in $L(u)$ at time $t$ over $Q$ if $p$ is a nondominated $o$-$u$ path at time $t$ over $Q$ (see Definition 3). At each iteration of the algorithm, a label $\Lambda_p(u)$, which represents a nondominated candidate path $p$, is selected from queue $X$. Path $p$ is expanded via arc $\langle u, v\rangle \in A$. Depending on the type of arc $\langle u, v\rangle$, a temporary label $\Lambda_{p'}(v)$ for path $p' = p \diamond \langle u, v\rangle$ is constructed with weights calculated by (12) and (14). To determine whether the new label $\Lambda_{p'}(v)$ is nondominated, it is compared with the nondominated labels $L(v)$ at node $v$. Details of the algorithm are presented in Algorithm 1.
Algorithm 1 is equivalent to a multicriteria shortest path algorithm for $(|Q| + 1)$ independent criteria and terminates after a finite number of steps with a set of nondominated paths at each node [38]. In the worst case, the algorithm is computationally intractable because the number of nondominated paths it examines grows exponentially [39]. However, the experiments in Section 5 show that the number of examined nondominated paths in a typical transit network is much smaller than that of the worst case.
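The label-correcting idea described above can be sketched in Python. This is not the authors' implementation: the data layout, helper names, the two-stop network, and its stop times are all hypothetical; the dominance test assumes the weak form (no more transfers, no slower in any scenario); and restricted actions are modeled with an infinite weight in place of a very large integer.

```python
from collections import deque

INF = float("inf")  # stands in for the very large weight of a restricted action

def dominates(a, b, scenarios):
    """Assumed weak dominance between labels (transfers, times, path)."""
    return a[0] <= b[0] and all(a[1][q] <= b[1][q] for q in scenarios)

def let_path_search(arcs, origin, t0, scenarios):
    """Return {node: nondominated labels}; a label is (transfers, times, path)."""
    start = (0, {q: 0 for q in scenarios}, (origin,))
    labels = {origin: [start]}
    queue = deque([start])
    while queue:
        label = queue.popleft()
        transfers, times, path = label
        u = path[-1]
        if not any(kept is label for kept in labels[u]):
            continue  # the label was dominated while waiting in the queue
        for v, kappa, weight in arcs.get(u, []):
            # Arc weight is evaluated at the arrival time at u in each scenario.
            new_times = {q: times[q] + weight(q, t0 + times[q]) for q in scenarios}
            if all(x == INF for x in new_times.values()):
                continue  # the action is restricted in every scenario
            cand = (transfers + kappa, new_times, path + (v,))
            current = labels.setdefault(v, [])
            if any(dominates(old, cand, scenarios) for old in current):
                continue
            current[:] = [old for old in current if not dominates(cand, old, scenarios)]
            current.append(cand)
            queue.append(cand)
    return labels

def boarding(stop_times, lead=1):
    def weight(q, t):
        waits = [c - t for c in stop_times[q] if c - t >= lead]
        return min(waits) if waits else INF
    return weight

def riding(departures, arrivals):
    def weight(q, t):
        for dep, arr in zip(departures[q], arrivals[q]):
            if dep == t:
                return arr - dep
        return INF
    return weight

def constant(value):
    return lambda q, t: value

# Hypothetical network: one bus route from stop a to stop b, plus walking arcs.
at_a = {"q1": [5, 15], "q2": [6, 16]}
at_b = {"q1": [9, 19], "q2": [12, 22]}
arcs = {
    "a": [("a@r1", 1, boarding(at_a)), ("b", 0, constant(20))],
    "a@r1": [("b@r1", 0, riding(at_a, at_b))],
    "b@r1": [("b", 0, constant(0))],
    "b": [("a", 0, constant(20))],
}
result = let_path_search(arcs, "a", 0, ["q1", "q2"])
for transfers, times, path in result["b"]:
    print(transfers, times, " -> ".join(path))
```

In this toy network two labels survive at stop b: walking (no transfers, 20 time units in both scenarios) and the bus (one boarding, 9 and 12 time units). Neither dominates the other, and the walking cycle back to stop a is discarded because its label is dominated by the initial one.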

Illustrative Example.
Consider the transit network in Figure 1 and its schedules as shown in Table 1. The time-dependent graph and its arc times are shown in Figure 2 and Table 3, respectively, with boarding penalty $b^r_i = 1$ and alighting penalty $a^r_i = 0$. Figure 4 shows the nondominated vector labels at all nodes in the time-dependent graph for origin node $o$, destination node $d$, and departure time $t = 0$ over the universal scenario set $Q = \{q_1, q_2, q_3\}$, together with the two nondominated $o$-$d$ paths obtained after the termination of Algorithm 1. Each sequence of dashed-line arrows in Figure 4 gives travel times along a nondominated path in the corresponding scenario. Note that each path corresponds to a choice of routes and each sequence of dashed-line arrows corresponds to a choice of trips as shown in Table 2.

Algorithm 1: LET-Path search.
Input: the origin $o$, the destination $d$, the departure time $t$, and the universal set of scenarios $Q$
Output: the LET $o$-$d$ path at time $t$ over $\Omega \subseteq Q$, where $\Omega$ is the set of scenarios realized at time $t$
  Create an initial path $p_0$ with the origin node $o$;
  $L(u) = \emptyset$ for all $u \in N$;
  Create label $\Lambda_{p_0}(o)$ with $e^q_{p_0}(o, t) = 0$, $\forall q \in Q$, and $f_{p_0}(o) = 0$;
  $L(o) = \{\Lambda_{p_0}(o)\}$;
  $X = \{\Lambda_{p_0}(o)\}$;
  while $X \ne \emptyset$ do
    Extract and remove a label $\Lambda_p(u)$ from queue $X$;
    for $v \in \Gamma(u)$, $\Gamma(u) = \{v : \langle u, v\rangle \in A\}$ do
      Create a new path $p' = p \diamond \langle u, v\rangle$;
      Depending on the type of $\langle u, v\rangle$, calculate travel time weight $\tau^q_{uv}(\cdot)$, $\forall q \in Q$, using one of (7)-(10), and calculate transfer weight $\kappa_{uv}$ using (11);
      Calculate $e^q_{p'}(v, t)$, $\forall q \in Q$, using (12), and calculate $f_{p'}(v)$ using (14);
      Create a new label $\Lambda_{p'}(v)$ with $f_{p'}(v)$ and $e^q_{p'}(v, t)$, $\forall q \in Q$;
      if $\Lambda_{p'}(v)$ is not dominated by any label in $L(v)$ then
        Remove from $L(v)$ all labels dominated by $\Lambda_{p'}(v)$;
        $L(v) = L(v) \cup \{\Lambda_{p'}(v)\}$;
        $X = X \cup \{\Lambda_{p'}(v)\}$;
  Let $\Omega \subseteq Q$ be the scenarios realized at time $t$;
  Apply (13) and (15) to the set of nondominated paths $L(d)$ over the set of scenarios $\Omega$ to obtain the LET $o$-$d$ path;

Figure 4: Nondominated vector labels at all nodes of the time-dependent graph; each label lists the number of transfers (tf) and the travel times in scenarios $q_1$, $q_2$, and $q_3$.
Table 4 gives the summary of the obtained LET $o$-$d$ paths with respect to subsets $\Omega$ of $Q$. As the relationship in Figure 3 shows, the set of nondominated $o$-$d$ paths over $Q$ is the superset of all sets of nondominated $o$-$d$ paths over $\Omega \subseteq Q$ and also contains the fastest $o$-$d$ paths when each of scenarios $q_1$, $q_2$, and $q_3$ occurs.
Table 4: LET $o$-$d$ paths with respect to subsets $\Omega$ of $Q$.

Experiments
In this section we conduct large-scale numerical experiments to investigate (1) the average running time of the proposed LET-Path algorithm; (2) the set of nondominated $o$-$d$ paths; and (3) the robustness of LET paths in the presence of unknown scenarios.

Experiment Setups.
The experiments are conducted on the Ho Chi Minh City (HCMC) bus network (Figure 5). The network consists of 1,340 stops, 40 routes, and 1,445 physical links, that is, direct links connecting pairs of consecutive stops on routes. Walking shortcuts are available between stops within a radius of 500 meters, and the average walking speed is approximately 2 km/h. Intervals between consecutive trips are 15 minutes, and scheduled stop times are generated from 7:00 am to 4:00 pm. The graph model has 5,943 nodes and 13,227 arcs with boarding penalty $b^r_i = 1$ minute and alighting penalty $a^r_i = 0$ minutes. The data set for the experiments is made up of 500 random user requests $(o, d, t)$, where $o$-$d$ pairs are generated randomly with the constraint that the distance between origin and destination is at least 5 km, and departure time $t$ is generated from 7:30 am to 1:00 pm to make sure that path times are not later than the ending time at 4:00 pm. The experimental environment is a 2.6 GHz dual-core Intel Xeon E5405 2.00 GHz, 3 GB RAM, on CentOS under Java Runtime Environment 1.6 (JRE 1.6) and MySQL 5.2 database.
To study the robustness of the proposed scenario-based approach, we compare the path found by our approach with that of the certain equivalence (CE) approximation [5], in which stop times of transit vehicles are deterministic.
The CE approximation replaces every stop time random variable by its expected value over the marginal distribution. In particular, the expected stop time for the $k$th trip at the $i$th stop on route $r$ over the scenario set $\Omega \subseteq Q$ is calculated by
$$E\big[c^{\Omega}_{r,k,i}\big] = \sum_{q \in \Omega} p_q \, c^q_{r,k,i},$$
where $c^{\Omega}_{r,k,i}$ is the independent random stop time for the $k$th trip at the $i$th stop on route $r$ over $\Omega$. Thus, the stochastic network with $|Q|$ scenarios is transformed into a deterministic network with only one scenario.
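The CE transformation can be sketched as a per-stop-time average over the scenario set. The function name, the key layout `(route, trip, stop index)`, and the stop times below are hypothetical; dividing by the total probability of Ω is an assumption that coincides with the plain probability-weighted sum when Ω = Q.

```python
from fractions import Fraction

def certainty_equivalent(stop_times, probs, omega):
    """Collapse the scenarios in omega into one deterministic timetable.

    stop_times maps scenario -> {(route, trip, stop_index): stop_time}.
    """
    total = sum(probs[q] for q in omega)
    keys = stop_times[omega[0]]
    return {
        key: sum(probs[q] * stop_times[q][key] for q in omega) / total
        for key in keys
    }

# Hypothetical stop times of one trip at one stop in three scenarios.
stop_times = {
    "q1": {("r3", 1, 1): 10},
    "q2": {("r3", 1, 1): 11},
    "q3": {("r3", 1, 1): 15},
}
probs = {q: Fraction(1, 3) for q in stop_times}

ce = certainty_equivalent(stop_times, probs, ["q1", "q2", "q3"])
print(ce[("r3", 1, 1)])  # 12
```

Averaging each stop time separately discards which scenarios make a given transfer fail, which is the information the scenario-based approach keeps.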
So far, we have assumed that exact information on the probability distribution of stop times is not available, and therefore it is impossible to build a sufficient number of scenarios that can precisely describe the uncertainty of schedules. Suppose that $q \in Q$ is the unknown scenario that will actually happen and $\Omega = Q \setminus \{q\}$ is the set of known scenarios. For a given $o$-$d$ pair and departure time $t$, let $p^*_{\Omega}$ and $p^*_q$ be the LET $o$-$d$ path at time $t$ over scenarios $\Omega$ and the fastest $o$-$d$ path at time $t$ in scenario $q$, given by (15) and (19), respectively. Then, when scenario $q$ actually happens, the desired optimal path will be $p^*_q$. However, since only scenario set $\Omega$ is known, the proposed approach is robust if the expected travel time of path $p^*_{\Omega}$ does not deviate much from that of path $p^*_q$. Hence, the robustness of the proposed approach is evaluated using the deviation of travel times of paths $p^*_{\Omega}$ and $p^*_q$ in all unknown scenarios $q \in Q$ for all triples $(o, d, t)$. The evaluated criteria (or considered performance metrics) are as follows:
(1) Precision, the ratio between the number of cases in which the LET $o$-$d$ path $p^*_{\Omega}$ is also the fastest $o$-$d$ path $p^*_q$ and the total number of cases.
(2) Mean absolute percentage error (MAPE), the average deviation percentage between the actual and expected travel time of the LET $o$-$d$ path $p^*_{\Omega}$:
$$\text{MAPE} = \frac{100\%}{n\,|Q|} \sum_{(o,d,t)} \sum_{q \in Q} \frac{\big| e^q_{p^*_{\Omega}}(d, t) - \bar{e}_{p^*_{\Omega}}(d, t, \Omega) \big|}{e^q_{p^*_{\Omega}}(d, t)},$$
where $n$ is the number of triples $(o, d, t)$ used in the experiments, and $\Omega = Q \setminus \{q\}$.
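The two metrics can be computed as in the sketch below. The record layout and all values are hypothetical, and the MAPE denominator (the actual travel time) is an assumption based on the usual definition of the metric rather than on a formula confirmed by the text.

```python
def precision(cases):
    """Share of cases in which the LET path is also the fastest path."""
    hits = sum(1 for c in cases if c["let_path"] == c["fastest_path"])
    return hits / len(cases)

def mape(cases):
    """Mean absolute percentage error between actual and expected travel time."""
    errors = [abs(c["actual"] - c["expected"]) / c["actual"] for c in cases]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical evaluation records, one per (o, d, t, unknown scenario q).
cases = [
    {"let_path": "p1", "fastest_path": "p1", "actual": 40, "expected": 44},
    {"let_path": "p1", "fastest_path": "p2", "actual": 50, "expected": 45},
]

print(precision(cases))
print(mape(cases))
```

Here `actual` is the travel time of the LET path in the scenario that was held out, and `expected` is its expected travel time over the remaining known scenarios.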

Schedule Generations.
To generate scheduled stop times of transit vehicles (or buses), for each time interval $t \in T$ and each scenario $q \in Q$, a direct link between two stops $s$ and $s'$ is assigned a random integer speed $v(s, s', t, q)$, which follows the normal probability distribution $\mathcal{N}(\mu = 18, \sigma^2 = 5^2)$ km/h. The number of generated scenarios is $|Q| = 400$, which is equivalent to 400 days of tracking trajectories of all buses in the network, and the scenarios are assumed to be uniformly distributed. The stop time for the $k$th trip at the $(i + 1)$th stop on route $r$ in scenario $q$ is calculated by adding, to the stop time at the $i$th stop, the travel time on the link between the two stops at the assigned speed (equations (25) and (26)). Note that no stochastic dependency is applied to stop time generation, and link speeds fluctuate independently within the range [3, 33] km/h under the normal distribution $\mathcal{N}(\mu = 18, \sigma^2 = 5^2)$. However, by (25) and (26), only stop times of trips on the same route or on routes with shared links are correlated, which is also observable in practice.
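The generation procedure might be sketched as follows. Several details are assumptions rather than the paper's method: speeds outside [3, 33] km/h are clipped (rejection sampling would be another option), travel times are rounded up to whole minutes, and the link lengths and stop names are hypothetical.

```python
import math
import random

def sample_speed(rng, mean=18.0, sd=5.0, low=3, high=33):
    """Integer speed in km/h drawn from a normal distribution, clipped to [low, high]."""
    return min(high, max(low, round(rng.gauss(mean, sd))))

def make_speed_lookup(rng):
    """One speed per (link, time) in a scenario, shared by all routes using the link."""
    cache = {}
    def speed(link, t):
        if (link, t) not in cache:
            cache[(link, t)] = sample_speed(rng)
        return cache[(link, t)]
    return speed

def trip_stop_times(start, links, length_km, speed):
    """Stop times (minutes) of one trip that leaves its first stop at `start`."""
    times = [start]
    for link in links:
        t = times[-1]
        minutes = math.ceil(60 * length_km[link] / speed(link, t))
        times.append(t + minutes)
    return times

rng = random.Random(42)  # one generator per scenario
speed = make_speed_lookup(rng)
length_km = {("a", "b"): 1.2, ("b", "c"): 0.8}  # hypothetical link lengths

route1 = trip_stop_times(0, [("a", "b"), ("b", "c")], length_km, speed)
route2 = trip_stop_times(0, [("a", "b")], length_km, speed)
print(route1, route2)
```

Because both routes read the same speed for the shared link at the same time, their stop times downstream of that link move together, which is how correlation between overlapping routes arises even though the link speeds themselves are sampled independently.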

Results.
To give an overview of LET paths found in the experiments, we first examine the impact of different sets of scenarios on LET - paths.We conduct the experiments on random subsets of the universal scenario set Q with different numbers of sampled scenarios being 1, 50, . . ., 400.Table 5 shows that, due to (15), numbers of transfers of LET - paths are always minimized and do not depend on the scenario set.This also implies that the number of transfers of LET - path only depends on the topology of the network.Similarly, different subsets of Q do not cause significant impacts on waiting times, as well as travel and walking distances.For investigating the average running time of LET-Path algorithm, Figures 7 and 8 show the average running time of the LET-Path algorithm and the average number of nondominated paths at all nodes after the termination of algorithm.Note that the algorithm finds nondominated paths from one origin to all nodes.Hence, the running time of the algorithm increases when the number of nondominated paths at each node increases.In particular, as the scenario set increases, by Definition 3 the condition for a path dominated by another path in all scenarios is looser.This results in an increase in the number of nondominated paths.The results also show that the number of nondominated paths generated   is not exponential as proved in the worst case, and running times are feasible for real-time applications.
For studying the impact of scenario sets on the number of nondominated - paths, Figure 9 shows that the number of nondominated - paths is proportional to the size of scenario set.The result also comes from the looser dominance condition when the size of the scenario set increases.
Table 6 compares the robustness of the scenario-based approach and the certain equivalence approach. The results show that the scenario-based approach is superior to the certain equivalence approach in all the evaluated criteria, namely, Precision, MAPE, and FMAPE. In addition, despite a high MAPE (8.52%), which is still lower than that of the certain equivalence approach (11.59%), the difference between the actual travel time of the LET origin-destination path and that of the fastest origin-destination path for the same departure time and scenario set is only 1.82%, and in up to 86.58% of queries the LET origin-destination path is also the fastest origin-destination path. In other words, although the scenario-based approach produces a sizable travel time prediction error when the actual scenario is unknown, the LET path found is nearly as fast as the fastest path in the actual scenario. These results indicate that LET origin-destination paths are robust even when the actual scenario is unknown.
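The three criteria can be sketched in code as follows. The formal definitions are not restated in this section, so the sketch assumes the readings suggested by the discussion above: Precision is the share of queries in which the LET path is also the fastest path, MAPE is the mean absolute percentage error between predicted and actual travel time of the LET path, and FMAPE is the mean percentage gap between the actual travel time of the LET path and that of the fastest path. The `Query` record and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Query:
    predicted: float  # expected travel time of the LET path over the sampled scenarios
    actual: float     # travel time of the LET path in the actual scenario
    fastest: float    # travel time of the fastest path in the actual scenario
    same_path: bool   # True if the LET path is also the fastest path


def precision(queries):
    """Assumed definition: fraction of queries where the LET path is the fastest path."""
    return sum(q.same_path for q in queries) / len(queries)


def mape(queries):
    """Assumed definition: mean of |predicted - actual| / actual."""
    return sum(abs(q.predicted - q.actual) / q.actual for q in queries) / len(queries)


def fmape(queries):
    """Assumed definition: mean of |actual - fastest| / fastest."""
    return sum(abs(q.actual - q.fastest) / q.fastest for q in queries) / len(queries)


# Hypothetical queries (travel times in minutes).
QUERIES = [
    Query(30, 33, 33, True),
    Query(40, 40, 40, True),
    Query(22, 20, 20, True),
    Query(50, 55, 50, False),
]
```

Under these assumed definitions a path can have a noticeable prediction error (MAPE) while still being close to the fastest path in the realized scenario (small FMAPE), which is the pattern reported in Table 6.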

Conclusions
The LET path problem, which minimizes the expected travel time between a given origin-destination pair for a given ... The experimental results show that the running time of the LET-Path algorithm is suitable for real-time applications and that LET paths are robust. However, the scenario-based approach has two major issues:
(1) It is impossible to build a sufficient number of scenarios to describe the uncertainties precisely, owing to the lack of information on the probability distribution of stop times.
(2) Even if all stop time information is available, the number of scenarios grows exponentially. In particular, if stop times are independent and the marginal probability distribution of one stop time has $m$ support points on average, then $m^{\sum_{r \in R} (\text{trips on } r) \times (\text{stops on } r)}$ scenarios are required to represent all possible scenarios.
For the first issue, the computational results in Table 6 demonstrate the robustness of LET paths found by the scenario-based approach when unknown scenarios occur. Regarding the second issue, as shown in Table 5, the number of transfers of an LET path depends not on the scenario set but on the network topology, and the average number of transfers is small, approximately 2. In addition, the average size of the set of nondominated origin-destination paths over 400 scenarios is less than 6 (Figure 9). That means that, on average, at most 2×6 routes make up a nondominated origin-destination path set. Hence, for a given origin-destination pair, we can determine the set of routes that cover the nondominated origin-destination paths. We can then treat these routes as the impact area of the origin-destination pair and generate stop time scenarios for this area instead of for the complete network.
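The benefit of restricting scenario generation to an impact area can be made concrete with a short calculation. The sketch assumes that the number of independent stop times on a route is its number of trips times its number of stops, and that every stop time has the same number of support points. The per-route figures (50 trips, 30 stops) and the 3 support points are hypothetical; only the counts of 40 routes in the experimental network and at most 2×6 = 12 routes in an impact area come from the text.

```python
import math


def stop_time_count(routes):
    """Total number of stop times, given (trips, stops) for each route."""
    return sum(trips * stops for trips, stops in routes)


def log10_scenarios(routes, support_points):
    """Base-10 logarithm of support_points ** stop_time_count(routes)."""
    return stop_time_count(routes) * math.log10(support_points)


# Hypothetical per-route sizes: 50 trips and 30 stops on every route.
full_network = [(50, 30)] * 40   # 40 routes in the experimental area
impact_area = [(50, 30)] * 12    # at most 2 x 6 routes per origin-destination pair

full_exponent = stop_time_count(full_network)
area_exponent = stop_time_count(impact_area)
full_digits = log10_scenarios(full_network, 3)
area_digits = log10_scenarios(impact_area, 3)
```

The number of scenarios is still exponential inside the impact area, but the exponent shrinks in proportion to the number of routes retained, which is the motivation for generating scenarios per origin-destination pair.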

Appendix
Proof of Proposition 4. Suppose $p^*$ is the LET path from origin $o$ to node $v$ (the LET $o$-$v$ path) at time $t$ over $\Omega$. According to (15), the expected travel time of $p^*$ at time $t$ over $\Omega$ is minimum, subject to the number of transfers of $p^*$ being minimum. Since the number of transfers of $p^*$ is minimum, no $o$-$v$ path dominates $p^*$ by the condition in (16). At the same time, since the expected travel time of $p^*$ at time $t$ over $\Omega$ is minimum, for any nondominated $o$-$v$ path $p$ there exists at least one scenario $q \in \Omega$ in which the travel time of $p^*$ is less than that of $p$. So no $o$-$v$ path dominates $p^*$ by condition (17). Satisfying the conditions in (16) and (17), the LET $o$-$v$ path $p^*$ belongs to the set of nondominated $o$-$v$ paths at time $t$ over $\Omega$.
Proof of Lemma 6. We prove the following possible cases.
Case 1. According to (7), if $t_1 \le t_2$, the boarding trip for arrival time $t_1$ is no later than the boarding trip for arrival time $t_2$. By Assumption (2), the proof is complete.
Case 2. According to (8), if $t_1 \le t_2$, the trip for the stop time derived from $t_1$ is no later than the trip for the stop time derived from $t_2$. By Assumption (2), the proof is complete.
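Proof of Theorem 7. Suppose $p^*$ is a nondominated path from origin $o$ to node $v$ (an $o$-$v$ path) at time $t$ over $\Omega$, $\forall v \in N$, and that $p^*$ is extended from a dominated $o$-$u$ path $p_u$ via arc $\langle u, v \rangle \in A$. Then there exists a nondominated $o$-$u$ path $p'_u$ that dominates $p_u$ at time $t$ over $\Omega$. Let $\lambda = p'_u \diamond \langle u, v \rangle$. According to Lemmas 5 and 6, the $o$-$v$ path $\lambda$ dominates the nondominated $o$-$v$ path $p^*$, which is a contradiction. This proves the theorem.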
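Proof of Proposition 9. If $p^*$ is the fastest path from origin $o$ to node $v$ (the fastest $o$-$v$ path) at time $t$ in scenario $q \in \Omega$, then according to (19) there is no other $o$-$v$ path $p$ that has both fewer transfers than $p^*$ and a shorter travel time than $p^*$ in scenario $q$. Hence no other $o$-$v$ path dominates $p^*$ at time $t$ over $\Omega$. Therefore, $p^*$ is a nondominated $o$-$v$ path at time $t$ over $\Omega$.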

Figure 1: A simple transit network.
represents the action of a traveler boarding an arriving vehicle of a route at a stop. The set of in-vehicle arcs of a route in R is a subset of A: each of its arcs $\langle u, v \rangle$, whose endpoints belong to the corresponding node subsets of N, represents the action of a traveler riding a vehicle of that route from one stop to another. The set of walking arcs is a subset of A: each of its arcs $\langle u, v \rangle$ represents the action of a traveler walking from one stop to another. The set of alighting arcs is a subset of A: each of its arcs $\langle u, v \rangle$ represents the action of a traveler alighting from the current vehicle of a route at a stop.

Figure 2 presents the graph model for the transit network shown in Figure 1. Let $P_{ov}$ denote the set of all paths connecting node $o \in N$ and node $v \in N$ in graph G, or all $o$-$v$ paths for short. A well-defined $o$-$v$ path, as defined in Definition 1, represents a traveler's choice of routes when traveling from an origin stop in S to a destination stop in S within the transit network B.

Figure 4: Illustration of vector labels at nodes in the graph shown in Figure 2 after the termination of the LET-Path algorithm. In each vector label, the transfers entry is the number of transfers, and the entry for each of the three scenarios is the travel time of the path from the origin to the current node in that scenario. Each sequence of solid arrows from origin to destination represents an origin-destination path, and each sequence of dashed-line arrows represents the travel times from the origin to intermediate nodes along the path in each scenario.

Figure 5: The experimental area of HCMC bus network that consists of 1,340 stops (red), 40 routes (blue), and 1,445 direct stop links.

Figure 6: Illustration of generated stop times.

Figure 7: Average CPU running time of the LET-Path algorithm with different subsets of Q per origin-destination pair.

Figure 8: Average number of nondominated paths generated by the LET-Path algorithm with different subsets of Q per origin-destination pair.
The stop times of transit vehicles in the network are shown in Table 1 with $T = \{0, \ldots, 16\}$, a scenario set Q of three scenarios, and $C = \{C_1, C_2, C_3\}$, in which each route has two trips and each scenario in Q has an occurrence probability of 1/3.

Table 1: Stop times of trips in the network presented in Figure 1 over three possible scenarios, each of which has an occurrence probability of 1/3.

Table 2: The choices of trips for the earliest arrival time between two stops at departure time 0 in different scenarios, for the network shown in Figure 1 and the schedules shown in Table 1.

Table 3: Arc time weights with a boarding penalty of 1 and an alighting penalty of 0 in the network shown in Figure

Figure 3: The relationship between the nondominated, LET, and fastest paths from the origin to node v for a given departure time over the universal scenario set Q and its subset Ω.
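Theorem 7. Given departure time $t \in T$ and a set of scenarios $\Omega \subseteq Q$, every nondominated path $p^*$ from origin $o$ to node $v$ (an $o$-$v$ path) is made up of nondominated $o$-$u$ subpaths, where $u$ is any intermediate node on path $p^*$.

Definition 8 (the fastest $o$-$v$ path). Given departure time $t \in T$ and scenario $q \in Q$, the fastest $o$-$v$ path $p^*$ at time $t$ in scenario $q$, $\forall v \in N$, is given by

$$p^* = \arg\min_{p \in P_{ov}} \{ c^q_p(v, t) \}, \quad \text{s.t.} \quad n_{p^*}(v) = \min_{p \in P_{ov}} \{ n_p(v) \}, \tag{19}$$

where $P_{ov}$ is the set of all $o$-$v$ paths, $c^q_p(v, t)$ is the travel time of path $p$ to node $v$ for departure time $t$ in scenario $q$, and $n_p(v)$ is the number of transfers of path $p$.

Proposition 9. Given the fastest $o$-$v$ path $p^*$ at time $t$ in scenario $q \in Q$, $\forall v \in N$, and the set of nondominated $o$-$v$ paths at time $t$ over the scenario set $\Omega \subseteq Q$, if $q \in \Omega$, then $p^*$ belongs to the set of nondominated $o$-$v$ paths at time $t$ over $\Omega$.

Proposition 10. Given departure time $t$ and two sets of scenarios $\Omega, \Omega' \subseteq Q$, if $\Omega \subseteq \Omega'$, the set of nondominated $o$-$v$ paths at time $t$ over $\Omega$ is a subset of the set of nondominated $o$-$v$ paths at time $t$ over $\Omega'$, $\forall v \in N$.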
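The selection rules for the fastest path and the LET path can be sketched on toy data. The sketch reads the constraint in the definition of the fastest path as lexicographic: first restrict to the paths with the fewest transfers, then minimize travel time in the chosen scenario. It assumes that the LET path applies the same transfer constraint and minimizes the probability-weighted travel time over the scenarios. The path data, probabilities, and function names are hypothetical.

```python
# Each path: (number of transfers, travel time in each scenario). Hypothetical data.
PATHS = {
    "A": (1, (10, 12, 15)),
    "B": (1, (11, 13, 14)),
    "C": (2, (9, 11, 13)),   # fastest everywhere, but needs an extra transfer
}
PROBABILITIES = (1 / 3, 1 / 3, 1 / 3)


def min_transfer_paths(paths):
    """Keep only the paths whose number of transfers is minimal."""
    fewest = min(transfers for transfers, _ in paths.values())
    return {
        name: times
        for name, (transfers, times) in paths.items()
        if transfers == fewest
    }


def fastest_path(paths, scenario):
    """Fastest path in one scenario among the minimum-transfer paths."""
    candidates = min_transfer_paths(paths)
    return min(candidates, key=lambda name: candidates[name][scenario])


def let_path(paths, probabilities):
    """Least expected time path among the minimum-transfer paths."""
    candidates = min_transfer_paths(paths)

    def expected_time(name):
        return sum(p * t for p, t in zip(probabilities, candidates[name]))

    return min(candidates, key=expected_time)


fastest_first = fastest_path(PATHS, 0)
fastest_third = fastest_path(PATHS, 2)
least_expected = let_path(PATHS, PROBABILITIES)
```

Path C is never selected despite its lower travel times, because the transfer constraint is applied first; A is fastest in the first scenario and has the least expected time, while B is fastest in the third scenario.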

Table 4: Summary of nondominated origin-destination path sets and LET origin-destination paths in the graph shown in Figure

Table 5: Summary of experimental results of LET paths with different subsets of Q.

Table 6: Comparison of the robustness of the scenario-based (SB) and certain equivalence (CE) approaches.

Comprehensive computational studies have been conducted using the real-size bus network in Ho Chi Minh City (HCMC, Vietnam). The experimental results