A Stochastic Programming Approach for Scheduling Extra Metro Trains to Serve Passengers from Uncertain Delayed High-Speed Railway Trains

The metro system is an important component of the urban transportation system due to the large volume of transported passengers. Hub stations connecting metro and high-speed railway (HSR) networks are particularly critical in this system. When HSR trains are delayed due to a disruption on the HSR network, passengers of these trains arriving at the hub station at night may fail to get their last metro connection. The metro operator can thus decide to schedule extra metro trains at night to serve passengers from delayed HSR trains. In this paper, we consider the extra metro train scheduling problem in which the metro operator decides how many extra metro trains to dispatch and their schedules. The problem is complex because (i) the arrival of delayed HSR trains is usually uncertain, and (ii) the operator has to minimize operating costs (i.e., number of additional trains and operation-ending time) but maximize the number of served passengers, which are two conflicting objectives. In other words, the problem we consider is stochastic and biobjective. We formulate this problem as a two-stage stochastic program with recourse and use an epsilon-constraint method to find a set of nondominated solutions. We perform extensive numerical experiments using realistic instances based on the Beijing metro network and two HSR lines connected to this network. We find that, out-of-sample, our stochastic model outperforms a deterministic model that relies on forecasts of the delay by 3–5%. Moreover, we show that our solutions are nearly optimal by computing a perfect information dual bound and obtaining average optimality gaps below 1%.


Introduction
Metro lines are typically connected with high-speed railway (HSR) lines at some transfer stations to provide seamless transfer service for passengers. The metro system is indeed an important component of the urban transport system and is crucial to meet the transportation demand of passengers from HSR trains, especially late at night when fewer buses and taxis are operated. For instance, HSR passengers arriving at Beijing South Railway Station at night prefer using metro trains since it usually takes more than one hour of waiting to get a taxi and it is inconvenient to take buses at night [1].
Inevitably, unplanned events such as adverse weather conditions or infrastructure failures occur in HSR operations, which may cause a major disruption. For example, on April 21st, 2019, a disruption caused by an equipment failure due to heavy rain occurred on the Beijing-Guangzhou HSR in China, resulting in more than 50 delayed trains, with the longest delay being nearly 6 hours. On July 2nd of [...] station very late and risk missing the last metro trains. The metro operator may thus consider running extra metro train services on the connected metro lines to transport passengers at the hub transfer station. We define an extra metro train service as an additional train service which operates later than the last train on the same metro line and is scheduled only in emergency cases, which include disruptions occurring in railway or aviation systems, major events such as large concerts, or particularly bad weather conditions such as blizzards. These situations cause a sudden flow of passengers to one or more metro stations. In this study, we consider the case of a disruption occurring on the HSR network.
In practice, the duration of a disruption is usually uncertain and results in HSR trains arriving at their destination with a delay which is unknown at the time the disruption starts. When the metro operator is informed by the HSR operator about a disruption, it only receives information on which HSR trains are affected and some estimate of their delay in the form of forecasts or probability distributions and must decide how many extra metro train services to schedule. It is important that the metro operator takes this decision as soon as it is informed about the disruption to have enough time to notify the drivers, metro staff, and passengers in the delayed HSR trains. In other words, this decision has to be taken under uncertainty, before knowing the exact arrival time of the HSR trains affected by the disruption. Subsequently, the metro operator is responsible for scheduling these extra metro train services once the HSR delay/arrival times are disclosed. This scheduling task is particularly challenging for the following three reasons:
(1) The extra metro trains need to be synchronized at the transfer station with multiple HSR trains with different arrival times. Moreover, the stochastic arrival time of HSR trains requires making some scheduling decisions under uncertainty, which complicates the scheduling problem formulation and solution.
(2) The extra metro train timetabling problem needs to balance a trade-off between the cost incurred by the metro operator and the "passenger costs," i.e., the number of passengers that miss the extra metro trains. To illustrate, consider the following two solutions: (i) operate one extra metro train service each time an HSR train arrives. This solution is ideal for passengers since they could all leave the hub station quickly. However, the metro company operates many extra train services, which is expensive due to the high number of drivers and staff involved late at night; (ii) operate one extra metro train only when there are enough passengers to fill it, i.e., after multiple HSR trains have arrived. This solution is economical for the metro operator, but passengers on the HSR trains that arrived earlier are subject to long waiting times.
(3) Although several researchers have studied the classical "non-extra train" metro timetabling problem (i.e., not related to the dispatch of additional trains), the problem of extra train timetabling is quite different and exhibits some nontraditional features that should be considered and which we summarize in Table 1.
At present, metro operators mostly make decisions regarding extra trains manually based on their experience and professional judgment, that is, without relying on analytical tools such as forecasting and optimization. Moreover, most related models in the extant literature rely on deterministic assumptions on future train arrival times, whereas these values in reality are not known in advance and should be treated as stochastic [3]. Therefore, there is a need to study and solve the problem of extra metro train scheduling for passengers of HSR trains with uncertain delays, which we consider in this paper.
We tackle the extra metro train scheduling problem by proposing a novel two-stage stochastic mixed-integer linear program to decide the number of extra metro train services to operate and the corresponding timetable under different HSR train delay scenarios. The goal of our model is, on the one hand, to minimize the cost incurred by the metro operator (number of extra train services and operation-ending time) and, on the other hand, to minimize the number of passengers that would fail to get service. Since these two objectives are in conflict, our optimization model is biobjective. We use an epsilon-constraint method to generate a set of Pareto-optimal solutions. We also formulate (i) a deterministic model, used as a benchmark, that relies on a single forecast of the uncertainty instead of multiple scenarios and (ii) a perfect information model that relaxes the nonanticipativity constraints and provides a dual (lower) bound on the optimal cost.
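The relation between the stochastic solution, the deterministic benchmark, and the perfect information bound can be illustrated on a very small example. The sketch below is not the model of this paper: it collapses the two objectives into a single hypothetical cost, and all data (`SCENARIOS`, `CAPACITY`, the two cost factors) as well as the gap formula are assumptions chosen only for illustration.

```python
# Toy comparison of a stochastic solution, a deterministic (mean-forecast)
# benchmark, and the perfect-information bound. All data are hypothetical.

SCENARIOS = [(0.5, 300), (0.5, 1100)]  # (probability, delayed passengers)
CAPACITY = 600          # passengers per extra train (assumed)
COST_PER_TRAIN = 100.0  # operator cost per extra train (assumed)
COST_PER_MISSED = 0.5   # cost per unserved passenger (assumed)
MAX_TRAINS = 4


def cost(x, passengers):
    """Total cost of running x extra trains when `passengers` show up."""
    missed = max(0, passengers - x * CAPACITY)
    return COST_PER_TRAIN * x + COST_PER_MISSED * missed


def expected_cost(x):
    """Expected cost of decision x over all scenarios."""
    return sum(p * cost(x, n) for p, n in SCENARIOS)


def best_x(objective):
    """Enumerate the number of extra trains and keep the cheapest."""
    return min(range(MAX_TRAINS + 1), key=objective)


# Stochastic program: one decision that is good on average over scenarios.
x_sp = best_x(expected_cost)
sp_cost = expected_cost(x_sp)

# Deterministic benchmark: optimise against the mean forecast only, then
# evaluate the resulting decision on all scenarios.
mean_passengers = sum(p * n for p, n in SCENARIOS)
x_det = best_x(lambda x: cost(x, mean_passengers))
det_cost = expected_cost(x_det)

# Perfect information: a separate best decision per scenario, which gives a
# lower bound because nonanticipativity is relaxed.
pi_bound = sum(
    p * min(cost(x, n) for x in range(MAX_TRAINS + 1)) for p, n in SCENARIOS
)

# One possible definition of the optimality gap (an assumption here).
gap = (sp_cost - pi_bound) / pi_bound
```

In this toy instance the ordering perfect-information bound <= stochastic cost <= deterministic cost holds, which is the relation exploited when the bound is used to certify solution quality.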
We performed extensive numerical experiments using realistic data from the Beijing metro system and the Beijing-Tianjin and Beijing-Shanghai HSR lines. We model the stochastic HSR train delays at the hub station using different probability distributions including Gaussian, Uniform, and Weibull. We found that solutions from our stochastic programming approach outperform out-of-sample those from a deterministic method by 3-5% on average, which is substantial and shows the benefit of accounting for uncertainty explicitly via multiple scenarios. Moreover, using the perfect information dual bound, we establish an optimality gap below 1% on average, indicating that our stochastic programming solutions are nearly optimal.
The rest of this paper proceeds as follows. In Section 2, we review the literature on train timetabling problems and state the contribution of this paper. In Section 3, we formalize the extra metro train scheduling problem and its assumptions. In Section 4, we introduce the passenger and metro operator costs and derive the stochastic programming formulation of the problem. In Section 5, we present our numerical study and discuss the results. We conclude the paper in Section 6.

Related Works and Contributions
The train timetabling problem has been studied in the literature both in a deterministic setting and with consideration of uncertain passenger demand. In order to optimize the train timetable adapted to a dynamic passenger demand environment, Barrena et al. [4] used flow variables to construct a linear representation of the objective function and presented a branch-and-cut algorithm to solve this formulation. Wang et al. [5] considered a changing passenger arrival rate and proposed an event-driven model to solve the train scheduling problem for an urban rail transit network. Cadarso and de Celis [6] introduced robust itineraries to reduce the number of passengers who miss their connections and proposed an integrated model to update base schedules in terms of timetable and fleet assignments while considering stochastic demand and uncertain operating conditions. Wang et al. [7] proposed a multiobjective mixed-integer nonlinear programming model to solve the problem of metro train scheduling and rolling stock circulation planning under time-varying passenger demand. In order to improve the reliability, efficiency, and attractiveness of public transport service under fluctuating passenger demand, Cao et al. [8] used holding and speed-changing operational strategies to optimize the real-time schedule and proposed a solution methodology based on time-space graphical techniques to minimize schedule changes. Meng and Zhou [9] designed an integrated demand-service-resource optimization model for managing the limited infrastructure and rolling stock resources to maximize operators' profits and passenger travel demand satisfaction. For more information, we refer to the review papers by Cacchiani and Toth [10]; Harrod [11]; and Cacchiani et al. [12].
Many researchers have applied stochastic programming (SP) approaches to transportation network planning under uncertainty. In order to optimize slack time allocation in the train timetable on high-speed passenger dedicated lines, Niu and Meng [13] used a two-stage SP model with recourse, in which the first-stage decision allocates the slack time in the train timetabling phase and the second stage simulates the execution of the train timetable with consideration of "train dispatching" behaviors. Meng and Zhou [14] proposed a robust single-track train dispatching model under a dynamic and stochastic environment and designed a scenario-based rolling horizon solution approach to systematically generate and select meet-pass plans under different stochastic scenarios. Based on railway optimization by means of alternative graphs (ROMA) [15] and Environment for the desiGn and simulaTion of RAIlway Networks (EGTRAIN) [16], Quaglietta et al. [17] set up an innovative framework to investigate the stability of optimal dispatching plans against the dynamic evolution of randomly disturbed traffic conditions. In order to minimize energy consumption in metro operations, Li and Lo [18] formulated an integrated dynamic train scheduling and control optimization framework to satisfy the changing passenger demands during daily metro operations. Hassannayebi et al. [19] presented a robust train timetable model to adapt to the dwell time variability, travel time, and demand uncertainty of a metro network and improve service. They used a two-stage simulation optimization approach based on a genetic algorithm to minimize the expected passenger waiting times. Shakibayifar et al. [20] proposed a two-stage SP model to cope with stochastic fluctuation of arrival rates in an urban train timetable problem. Considering the uncertainty of a disruption happening in railway operations, Zhu and Goverde [21] formulated and solved a robust timetable rescheduling problem using a rolling horizon two-stage SP method.
We summarize some of the most relevant studies on train scheduling in relation to our paper in Table 2. As shown in the table and discussed above, although several researchers have already considered the classical train scheduling problem, both for railways and metro systems, to the best of our knowledge, none so far has considered the extra metro train scheduling problem. Since these two problems are quite different (see also Table 1), it is not possible to tackle the extra train scheduling problem by adapting existing models from the standard scheduling literature. Therefore, it is necessary to develop a new mathematical model to describe this problem, which is challenging due to its stochastic and biobjective nature as discussed in Section 1. Finally, worth mentioning are also studies that focus on the specific timetabling aspect of synchronizing the last trains in railway or metro systems, recent examples of which include Yang et al. [22]; Chen et al. [23]; and Long et al. [1]. Although the last metro train synchronization problem also considers factors such as the successful transfer of passengers and the running time of the last trains, this problem is conceptually very different from the extra train scheduling tackled in this paper, where the main decision is about how many additional trains to add at night to serve delayed HSR passengers. Moreover, the extra train scheduling problem involves dependencies between two systems (HSR and metro). Based on the achievements and gaps in the literature, the main contributions of this paper are the following:
(1) We study the extra metro train scheduling problem, which is a new application that has not previously been studied in the literature. This application has practical relevance as it allows metro operators to [...]
(2) We provide a formulation of this problem, which is new and captures realistic but complex features such as the trade-off between metro operator and passenger cost (i.e., it is biobjective) and the uncertainty in the arrival time of delayed HSR trains (i.e., it is stochastic). Specifically, our formulation is a two-stage mixed-integer linear SP model, where the number of extra metro trains is determined at the first stage and their schedules at the second stage. By accounting for uncertainty explicitly in the form of scenarios, our model produces first-stage decisions which are reliable in each scenario and hence improve the robustness of the extra metro train timetable.
(3) We introduce two new cost functions to model the costs incurred (i) by the metro operator when scheduling extra metro trains at night and (ii) by delayed HSR passengers that may fail to get the last metro trains. Our optimization approach accounts for both objectives.
(4) We conduct realistic numerical experiments based on the metro system and HSR lines in Beijing and show that our SP approach is very effective at solving the problem. Specifically, in an out-of-sample evaluation, our approach outperforms by 3-5% a deterministic optimization model that replaces the uncertainty with its expected value, which translates to a considerable amount of money in practice. We further demonstrate the quality of our SP solutions by computing a perfect information lower bound and obtaining average optimality gaps below 1%.

Problem Statement and Assumptions
In this section, we provide a statement of the problem and its assumptions. We start below by describing the inputs to the problem, i.e., the information on which the metro operator's decisions are based.
(I1) HSR lines and the connected metro lines. We are given a network that includes a set of HSR lines and a set of metro lines. HSR and metro lines are directly connected at some hub stations. In Figure 1(a), we illustrate an example of a simple network consisting of one HSR line and two metro lines, where for simplicity we only represent the hub station and not the other metro stations. We identify the metro lines by distinguishing between operation directions as shown in the example in Figure 1(b).
(I2) A set of uncertain delayed HSR trains. We are given a set of HSR trains with some uncertain delays, e.g., a set of trains affected by a disruption on the HSR line. For each delayed HSR train, we know the number of passengers that are onboard and that are divided into as many groups as the operation directions of the connected metro lines. For each group, we are given its volume and the transfer walking time between the platform of the HSR line and the corresponding metro platform. Finally, we are given probabilistic information to represent the arrival time of each delayed HSR train at the hub station. This can be a probability distribution or a discrete set of scenarios, each provided with a delay time and occurrence probability.
(I3) A set of candidate extra metro train services. For each metro line and operation direction, we consider a set of candidate extra metro train services that the operator may decide to schedule. For each extra metro train service, we are given its origin and destination stations, the running time between two stations, the dwell time at each station, and the passenger-carrying capacity.
Given inputs (I1)-(I3), the extra metro train scheduling is the problem faced by the metro operator to serve passengers from delayed HSR [...] When the HSR arrival time becomes known, the metro operator further schedules the selected extra metro train services by defining their departure times at the first station and the headway between two successive trains in the same operation direction. We formalize this problem mathematically in Section 4.
In the definition of our problem and related mathematical model, we assume the following:
(A1) Passenger transfer activities between metro trains are not considered.
(A2) The rolling stock rescheduling of the metro system is neglected.
(A3) The passenger-carrying capacity of each extra metro train is fixed and given.
(A4) For each extra metro train service on the same operation direction, the stopping pattern, the running time between two stations, and the dwell time at each station are the same.
(A5) The passenger transfer walking time at the hub station is known and fixed. In the numerical study, we use parameters for the slowest transfer walking time among passengers but other choices are possible.
(A6) A passenger is not willing to wait for a metro train service at night for more than a maximum, fixed time allowance. This allowance could be set, for instance, to the average waiting time for a taxi service at night. If the waiting time is higher, then the passenger will select another transport mode.
(A7) Rescheduling the HSR system is not considered as it is exogenous to the metro operator.

Biobjective Stochastic Programming Model
In this section, we present our stochastic programming model for scheduling extra metro trains to serve uncertain delayed HSR passengers. For convenience, we start in Section 4.1 by summarizing the nomenclature. In Section 4.2, we formally describe the decision-making process underlying our optimization model. In Sections 4.3 and 4.4, respectively, we define the objective functions and constraints of the model. Since our stochastic program is biobjective, we explain in Section 4.5 our approach to solve it.
In Table 3, we introduce the notation that will be used to define our model. This table contains, in order, subscripts and sets, input parameters, and decision variables.

Decision-Making Process.
After a major service disruption on an HSR line at night, the metro operator needs to run extra metro trains to transport the passengers arriving from the delayed HSR trains. The duration of a major disruption is typically uncertain and may last for several hours, resulting in HSR trains reaching the hub metro station very late, after the working shifts of the metro staff have ended. Thus, it is important that the metro operator decides on the number of extra train services as soon as it is informed about the disruptive event to have enough time to notify the metro staff, including drivers, and the passengers on the delayed HSR trains. In other words, this decision is taken under uncertainty. Formally, the decision-making process is defined by two stages as illustrated in Figure 2. At the first stage, the metro operator is informed about the disruption on the HSR network and decides on the number of extra metro train services $x_{f,l}$ to schedule. This decision is called the first-stage decision (or here-and-now decision) and is made knowing the set of delayed HSR trains, the number of passengers on these trains, and delay information for each HSR train in the form of a set of scenarios or a probability distribution. However, this decision has to be taken before knowing the exact arrival times of the delayed HSR trains so that drivers and staff for the extra metro services can be notified with sufficient margin (see also our related discussion in Section 1); in other words, without knowing the scenario $w \in W$ that will be realized. Upon disclosure of the HSR arrival times (e.g., when the disruption is resolved and the HSR trains are rescheduled), the metro operator defines the exact schedule of the metro train services $x_{f,l}$ previously selected. Specifically, the operator chooses the departure time of each extra metro train service $t^{D}_{f,o_f,w}$, the headway between two adjacent train services $h_{f,f',l,w}$, and implicitly also the assignment of passengers $y_{f,f^*,w}$. These decisions are referred to as second-stage decisions (or wait-and-see decisions) as they adapt to the realization of the uncertainty, that is, they can be chosen after the uncertain scenario $w \in W$ is revealed. The power of SP is that [...]
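The difference between the here-and-now and the wait-and-see decisions described above can be sketched with a toy example: the number of extra trains is fixed before the scenario is known, whereas the departure times are chosen separately for each scenario. Everything in this sketch is a hypothetical simplification (the scenario data, the candidate departure grid, the 30-minute waiting allowance, and the omission of train capacity); it is not the optimization model of this paper.

```python
from itertools import combinations

MAX_WAIT = 30                        # waiting allowance in minutes (assumed)
CANDIDATE_TIMES = range(0, 181, 15)  # candidate departures, minutes after
                                     # the last regular train (assumed grid)

# scenario name -> (probability, [(time passengers reach the metro platform,
#                                  number of passengers), ...])
SCENARIOS = {
    "short delay": (0.6, [(10, 200), (30, 300)]),
    "long delay": (0.4, [(60, 200), (120, 300)]),
}


def served(departures, groups):
    """Passengers who find a departure within the waiting allowance."""
    total = 0
    for ready, volume in groups:
        if any(0 <= d - ready <= MAX_WAIT for d in departures):
            total += volume
    return total


def second_stage(x, groups):
    """Wait-and-see decision: best timetable for x trains in a known scenario."""
    return max(combinations(CANDIDATE_TIMES, x),
               key=lambda deps: served(deps, groups))


def expected_served(x):
    """Here-and-now evaluation of x: average over scenarios of the best recourse."""
    return sum(p * served(second_stage(x, groups), groups)
               for p, groups in SCENARIOS.values())
```

For example, with one extra train the best departure in the "short delay" scenario is at minute 30, while in the "long delay" scenario a single train cannot serve both groups; `expected_served(1)` and `expected_served(2)` quantify what the second train buys in expectation.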

Objective Functions: Operational and Passenger Costs.
As discussed in Section 1, the metro operator has to consider and balance the operational cost of scheduling extra metro train services and the number of passengers that can be served by these services. Below, we thus formalize two cost functions to describe operational and passenger costs. We start by identifying the three most relevant performance indicators for our problem: the Operation-Ending Time of the last extra metro train service (OET), the Number of Extra metro Train services (NET), and the Number of Passengers that Fail to get service (NPF). The first and second indicators (OET and NET) will be used to define the operational cost, while the third indicator (NPF) will be used to define the passenger cost. Since these indicators have different units, we convert them all into monetary units using conversion factors as suggested by one of the metro operators in China [24]. The conversion factors could be chosen by the operator depending on the relative importance of each of these three indicators.

Operational Cost
Late at night, the metro operator prefers operating fewer extra metro trains and ending operations early to reduce operating expenses, which corresponds to having low NET and OET indicators. We quantify NET and OET in equations (1a) and (1b), respectively, and illustrate the two functions in Figure 3. As shown in equation (1a) and Figure 3(a), the metro operator pays a NET cost $c^{\mathrm{NET}}$ for each extra metro train service, which is intuitive. Moreover, the operator incurs an OET auxiliary cost that is mainly due to lighting, air conditioning, and overtime pay of staff for the passed stations. The OET cost depends only on the operation-ending time and not, as NET does, on the number of extra metro trains, which enables decoupling these two cost components. In this paper, we assume that the OET cost increases linearly with the operation-ending time with slope $c^{\mathrm{OET}}$, as indicated in equation (1b) and shown in Figure 3(b). Specifically, this equation captures, for each operation direction $l \in L$, the difference between the operation-ending time of the last extra metro train on $l$ and the predetermined operation-ending time $\psi_l$ of the last (nonextra) metro train service. In contrast to $C^{\mathrm{NET}}$, notice that $C^{\mathrm{OET}}(w)$ can only be determined at the second stage and is therefore indexed by the scenario $w$. The total operational cost $C^{M}(w)$ incurred by the metro operator when scheduling extra metro trains is given in equation (1c) by summing up the two components NET and OET.
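As a rough numerical illustration of the operational cost described above, the following sketch computes a NET component proportional to the number of selected services and an OET component that is linear in the overtime beyond $\psi_l$ on each direction. The function names, the dictionary layout, the unit costs, and the clipping of negative overtime to zero are assumptions made for this example; the sketch is not a transcription of equations (1a)-(1c).

```python
def net_cost(x, c_net):
    """NET component: c_net for every selected extra train service.

    x maps (service, direction) -> 1 if the service is selected, else 0.
    """
    return c_net * sum(x.values())


def oet_cost(last_end_time, psi, c_oet):
    """OET component in one scenario: linear in the overtime on each direction.

    last_end_time[l] is the operation-ending time of the last extra train on
    direction l; psi[l] is the ending time of the last regular train.
    Clipping at zero (no extra train, no overtime) is an assumption.
    """
    return c_oet * sum(max(0.0, last_end_time[l] - psi[l]) for l in psi)


def operational_cost(x, last_end_time, psi, c_net, c_oet):
    """Total operator cost in one scenario: NET plus OET."""
    return net_cost(x, c_net) + oet_cost(last_end_time, psi, c_oet)


# Hypothetical data; times are minutes after midnight.
x = {("f1", "l1"): 1, ("f2", "l1"): 1, ("f1", "l2"): 0}
psi = {"l1": 1380, "l2": 1390}
last_end_time = {"l1": 1440, "l2": 1390}
total = operational_cost(x, last_end_time, psi, c_net=1000.0, c_oet=20.0)
```

With these made-up numbers, two selected services and 60 minutes of overtime on one direction give a NET component of 2000 and an OET component of 1200.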

Passenger Cost.
It is well known that in general a timetable that only minimizes operational costs would be disadvantageous to passengers. This issue is even more acute in our extra metro train scheduling problem since considering operational cost alone would result in no extra metro train being scheduled at all. Therefore, it is imperative that passengers are also accounted for. The cost for passengers could be modeled using performance measures such as the number of transported passengers (e.g., [1]), passenger travel time (e.g., [25][26][27]), passenger waiting time (e.g., [4,28,29]), and delay time [30]. Although minimizing total or average waiting time is a common objective in train scheduling, in the case of last/extra train optimization (i.e., at late evening/night), most approaches focus on feasibility (i.e., whether passengers reach their homes or not) rather than optimality in terms of travel time (e.g., [1] and references therein).
Thus, given the peculiarity of our scheduling problem, in this paper, we propose using the NPF as the passenger cost function, since passengers are eager to leave the transfer station at night, and we also include constraints on the maximum time passengers are willing to wait for their metro connection. The NPF is defined as the number of passengers who cannot leave the hub transfer station by any extra metro train service.
Equation (2) quantifies NPF, i.e., the passenger cost. As shown in Figure 4, a unit cost $c^{\mathrm{NPF}}$ is imposed on each passenger that fails to use any extra metro train, giving the total passenger cost $C^{P}(w)$.
Our biobjective formulation below minimizes the expected costs over scenarios of the metro operator and passengers.
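The passenger cost and its expectation over scenarios can be sketched as follows. The data layout (`volume`, `assigned`), the unit cost, and the two-scenario example are hypothetical; the code only mirrors the verbal definition of NPF given above and is not a transcription of equation (2).

```python
def passenger_cost(volume, assigned, c_npf):
    """Passenger cost in one scenario: c_npf per passenger left without service.

    volume[(hsr_train, direction)] is the number of transferring passengers;
    assigned[(hsr_train, direction)] maps each extra metro service to the
    number of those passengers it carries.
    """
    npf = 0
    for key, total in volume.items():
        carried = sum(assigned.get(key, {}).values())
        npf += total - carried
    return c_npf * npf


def expected_value(costs, probabilities):
    """Probability-weighted average of per-scenario costs."""
    return sum(p * c for p, c in zip(probabilities, costs))


# Hypothetical data: two HSR trains feeding one metro direction.
volume = {("G1", "l1"): 300, ("G2", "l1"): 200}
assigned_w1 = {("G1", "l1"): {"f1": 300}, ("G2", "l1"): {"f1": 150}}
assigned_w2 = {("G1", "l1"): {"f1": 300}, ("G2", "l1"): {"f2": 200}}

cost_w1 = passenger_cost(volume, assigned_w1, c_npf=10.0)
cost_w2 = passenger_cost(volume, assigned_w2, c_npf=10.0)
expected_passenger_cost = expected_value([cost_w1, cost_w2], [0.4, 0.6])
```

In the first made-up scenario 50 passengers are left behind; in the second, none are.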

Constraints.
We specify below the different constraints in our problem. [...] being assigned to extra metro train service $f$ (i.e., $\sigma_{f,f^*,w} = 1$), then the transfer between $f^*$ and $f$ should be possible for such passengers, i.e., the transfer waiting time $t^{W}_{f,f^*,w}$ should be nonnegative. Constraints (4c) are similar to the former constraints but put an upper bound on the waiting time equal to $\vartheta$ (minutes), which is the time allowance for passenger transfer waiting time. As discussed in our assumptions, if the waiting time is more than $\vartheta$, then passengers would select other transportation modes rather than waiting for a long time in the metro station at night.
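The two bounds on the transfer waiting time can be checked with a few lines of code. In this sketch the waiting time is taken to be the metro departure time minus the HSR arrival time minus the transfer walking time, which is an assumption consistent with the description above; the function and parameter names are hypothetical.

```python
def transfer_waiting_time(metro_departure, hsr_arrival, walking_time):
    """Time passengers wait on the metro platform (all values in minutes)."""
    return metro_departure - (hsr_arrival + walking_time)


def transfer_is_feasible(metro_departure, hsr_arrival, walking_time, max_wait):
    """True if the waiting time is nonnegative and within the allowance."""
    wait = transfer_waiting_time(metro_departure, hsr_arrival, walking_time)
    return 0 <= wait <= max_wait
```

For instance, with an HSR arrival at minute 1430, a 10-minute walk, and a 30-minute allowance, a metro departure at minute 1445 is feasible, whereas departures at minute 1435 (train missed) or 1480 (wait too long) are not.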

Mapping between Continuous and Binary Passenger Assignment Variables
Constraints (5) are used in the model to map the passenger assignment variables $y_{f,f^*,w}$ to the 0-1 binary passenger assignment variables $\sigma_{f,f^*,w}$. These constraints model the following if-then condition:

Passenger Flow Balance
Constraints (6) ensure that the total number of passengers assigned to the connected extra metro train services on the same operation direction $l$ is less than or equal to the volume of delayed passengers from train $f^*$ that transfer to operation direction $l$.

Capacity of Extra Metro Trains
Constraints (7) enforce the maximum passenger-carrying capacity of each extra metro train.
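The passenger flow balance and capacity requirements can be verified for a given assignment as in the sketch below, which considers a single operation direction. The dictionary layout and all numbers are hypothetical.

```python
def check_assignment(y, volume, capacity):
    """Return (flow_ok, capacity_ok) for one operation direction.

    y[(f, f_star)] is the number of passengers from HSR train f_star assigned
    to extra metro service f; volume[f_star] is the number of passengers of
    f_star transferring to this direction; capacity is the fixed capacity of
    every extra metro train.
    """
    per_hsr_train = {}
    per_metro_train = {}
    for (f, f_star), passengers in y.items():
        per_hsr_train[f_star] = per_hsr_train.get(f_star, 0) + passengers
        per_metro_train[f] = per_metro_train.get(f, 0) + passengers
    flow_ok = all(n <= volume[f_star] for f_star, n in per_hsr_train.items())
    capacity_ok = all(n <= capacity for n in per_metro_train.values())
    return flow_ok, capacity_ok


# Hypothetical assignment: two HSR trains, two extra metro services.
y = {("f1", "G1"): 300, ("f1", "G2"): 250, ("f2", "G2"): 100}
volume = {"G1": 300, "G2": 400}
```

Here `check_assignment(y, volume, 600)` passes both checks, while lowering the capacity to 500 violates the capacity requirement for service `f1`.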

Starting Time of the Extra Metro Train Service at the Origin Station.
Constraints (8) require each extra metro train to leave the hub station after the last scheduled (i.e., nonextra) train on line $l$ has departed.

Headway Time between Consecutive Metro Trains.
Constraints (9a) define the headway time between two consecutive metro train services $f'$ and $f$ on the same line $l$, i.e., the difference between the departure times of two consecutive trains from the origin station. Note that the candidate train services are sequentially numbered and must be selected in sequence. If two consecutive train services $f'$ and $f$ on line $l$ are selected by the metro operator, then constraints (9b) enforce the headway time between them to be no less than a minimum headway time $h^{\min}$, which is needed to ensure safe movements.
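The start-time and minimum-headway requirements on the origin-station departures can be checked as in the following sketch. The function name, the non-strict comparison with the last regular departure, and the example times are assumptions made for illustration.

```python
def check_departures(departures, last_regular_departure, h_min):
    """Return (start_ok, headway_ok) for the selected extra trains of one line.

    departures lists the origin-station departure times (minutes) of the
    selected extra services in service order.
    """
    start_ok = all(t >= last_regular_departure for t in departures)
    headways = [later - earlier
                for earlier, later in zip(departures, departures[1:])]
    headway_ok = all(h >= h_min for h in headways)
    return start_ok, headway_ok
```

With a last regular departure at minute 1380 and a 5-minute minimum headway, the departures `[1385, 1400]` satisfy both checks, whereas `[1385, 1400, 1403]` violates the headway requirement.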

Mapping between First-Stage and Second-Stage Variables
Constraints (10) map the second-stage decision variables $y_{f,f^*,w}$, $t^{D}_{f,i,w}$, and $t^{A}_{f,i,w}$ to the first-stage decision variables $x_{f,l}$ to describe whether train service $f$ on line $l$ is selected by the metro operator for serving delayed passengers.

Departure Time at Intermediate Stations
Constraints (11) ensure that the departure time of an extra metro train service f at an intermediate station i is no smaller than the arrival time of this train at the same station plus the minimum required dwell time.
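Given the departure time at the origin station, arrival and departure times at the downstream stations follow from the running and dwell times. The sketch below propagates them assuming that the dwell-time requirement holds with equality (trains leave as soon as the minimum dwell has elapsed), which is a simplifying assumption; the function name and inputs are hypothetical.

```python
def propagate_times(origin_departure, running_times, dwell_times):
    """Arrival and departure times at each station after the origin.

    running_times[k] is the running time into the k-th downstream station and
    dwell_times[k] the minimum dwell time there (same time unit throughout).
    """
    arrivals, departures = [], []
    clock = origin_departure
    for run, dwell in zip(running_times, dwell_times):
        arrival = clock + run
        clock = arrival + dwell  # earliest departure allowed by the dwell time
        arrivals.append(arrival)
        departures.append(clock)
    return arrivals, departures
```

For example, `propagate_times(0, [3, 4], [1, 1])` returns arrivals `[3, 8]` and departures `[4, 9]`.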

Epsilon-Constraint Formulation.
Recall that we aim to solve a problem which is not only stochastic but also biobjective and is defined by objective functions (3) subject to constraints (4)-(12). Since all objective functions and constraints are linear, this mathematical program is classified as a biobjective mixed-integer linear program.
Several approaches exist in the literature to determine the set of nondominated (i.e., Pareto-optimal) solutions of a multiobjective optimization problem. The most common approaches are known as scalarization techniques. They construct a single-objective problem related to the original multiobjective one and solve it usually multiple times to find some subsets of nondominated solutions [31]. One such scalarization technique is the weighted-sum method, in which the objectives are combined with a convex combination into a single objective. Although the weighted-sum method is guaranteed to produce Pareto-optimal solutions, it also has the well-known drawback that it can only find Pareto-optimal solutions that lie on the convex hull of the nondominated set. In other words, if the nondominated set (i.e., frontier in our case of two objectives) is nonconvex, then not all Pareto-optimal solutions can be found [31]. To overcome this shortcoming, we chose a different scalarization technique known as the epsilon-constraint method, which retains only one objective for minimization and turns the others into constraints.
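Our epsilon-constraint formulation minimizes the expected passenger cost alone whilst imposing a maximum expected cost $\varepsilon$ for the metro operator:
\begin{align}
\min \quad & \mathbb{E}_{w \in W}\left[ C^{P}(w) \right] \tag{13a} \\
\text{s.t.} \quad & \mathbb{E}_{w \in W}\left[ C^{M}(w) \right] \le \varepsilon, \tag{13b} \\
& \text{and constraints (4)-(12)}. \tag{13c}
\end{align}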

Journal of Advanced Transportation
By repeatedly solving model (13) with different values for $\varepsilon$, we can approximate the Pareto front of the proposed biobjective optimization problem.
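The mechanics of sweeping the bound on the operator cost can be illustrated on a tiny discrete problem in which candidate solutions are listed explicitly instead of being produced by a mixed-integer solver. The candidate costs, the function names, and the tie-breaking rule are hypothetical; the example also contrasts the result with a weighted-sum sweep to show a nondominated point that lies off the convex hull.

```python
# Each candidate is (operator_cost, passenger_cost); values are hypothetical.
CANDIDATES = [(0, 100), (10, 70), (20, 62), (30, 20), (40, 18), (25, 80)]


def epsilon_constraint(candidates, eps):
    """Minimise passenger cost subject to operator_cost <= eps."""
    feasible = [c for c in candidates if c[0] <= eps]
    if not feasible:
        return None
    # Ties on passenger cost are broken by operator cost so that weakly
    # dominated points are not returned.
    return min(feasible, key=lambda c: (c[1], c[0]))


def pareto_front(candidates, eps_values):
    """Collect the distinct solutions found while sweeping eps."""
    front = []
    for eps in eps_values:
        solution = epsilon_constraint(candidates, eps)
        if solution is not None and solution not in front:
            front.append(solution)
    return front


def weighted_sum_points(candidates, n_weights=11):
    """Solutions found by minimising a convex combination of both costs."""
    found = set()
    for i in range(n_weights):
        lam = i / (n_weights - 1)
        found.add(min(candidates,
                      key=lambda c: lam * c[0] + (1 - lam) * c[1]))
    return found


front = pareto_front(CANDIDATES, range(0, 41, 5))
supported = weighted_sum_points(CANDIDATES)
```

In this instance the sweep recovers the point `(20, 62)`, which is nondominated but cannot be obtained with any weight in the weighted-sum method because it lies above the line joining `(10, 70)` and `(30, 20)`.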

Numerical Study
In this section, we present our numerical experiments based on a real-world network from Beijing in China. We describe our case study and the scenarios of the uncertainty in Sections 5.1 and 5.2, respectively. We then discuss our results, starting from the trade-off between passenger cost and metro operator cost in Section 5.3, followed by a performance comparison between our SP approach, a deterministic benchmark, and a dual bound in Section 5.4. We conclude in Section 5.5 by evaluating the methods out-of-sample.

Test Case Description.
Our numerical experiments are based on the network and operational data from the Beijing metro system and two HSR lines: Beijing-Tianjin (BT) and Beijing-Shanghai (BS). Specifically, the network we consider consists of 44 stations in total and is illustrated in Figure 5. Regarding the two HSR lines, we consider 20 delayed trains arriving at the hub station, i.e., Beijing South Railway Station (BSRS). Table 4 shows the planned arrival time and passenger-carrying volume of each train. Since in China all HSR passengers need to book their tickets in advance with a specified departure time from the origin station, the precise number of passengers onboard different trains is known from the ticketing system.
Our optimization model is solved using CPLEX 12.3 with default settings as the mixed-integer linear programming solver. The experiments were performed on a computer equipped with an Intel® Core™ i7-8550 CPU @ 1.80 GHz processor with 8 GB RAM.

Scenarios of the Uncertainty and Computational Time.
We model the stochastic arrival time of each delayed HSR train at the hub station using Gaussian, Weibull, and uniform probability distributions, which are commonly used in the literature to model train delays [17, 21, 32–35].
These distributions are shown in Table 5 and are defined so that they all have the same expected value of one hour. In the experiments presented here and in Section 5.3, we focus on the Gaussian distribution alone. We subsequently consider all three distributions in Sections 5.4 and 5.5 to assess the robustness of our findings towards the uncertainty.
We refer to the scenarios used in the SP model to find the decisions as the in-sample scenarios. It is well known in stochastic programming that the quality of the first-stage decisions is affected by the quality and number of in-sample scenarios. Typically, using more scenarios results in a better approximation of the uncertainty distribution, hence a better decision. On the other hand, the number of variables and constraints in the model increases with the number of scenarios. Thus, the number of in-sample scenarios is limited by the available computing power and time. In sum, choosing the number of such scenarios entails balancing a trade-off between solution quality and computation and is usually nontrivial.
We investigate this trade-off by assessing the solvability of our SP model for a number of in-sample scenarios varying between 1 and 10. For each number of scenarios n, we solve the model 5 times, each time sampling a different set of n scenarios from the probability distribution. Figure 6 shows the computational time results, including the minimum, average, and maximum computational time among the 5 runs. As expected, the computational time increases with the number of scenarios. We can see that the model is solvable relatively quickly for 9 scenarios, for which the average and maximum running times are 523 and 960 seconds, respectively. However, the average and maximum running times increase, respectively, by about 140% and 240% when moving from 9 to 10 scenarios. Therefore, we choose a number of in-sample scenarios equal to 9 in our experiments, for which the stochastic program is solved to optimality in modest computation time. The 9 scenarios that we use in Section 5.3 are displayed in Figure 7, showing the arrival time of each HSR train at BSRS. The occurrence probabilities of scenarios 1 to 9 are 0.09, 0.12, 0.11, 0.11, 0.1, 0.12, 0.12, 0.1, and 0.13, respectively.

Trade-Off between Passenger and Operator Cost.
In this section, we investigate the biobjective aspect of our stochastic optimization problem. We consider the 9 in-sample scenarios introduced in Section 5.2 and use the epsilon-constraint method to produce the Pareto frontier illustrated in Figure 8. To obtain this figure, we first solved a model that minimizes passenger cost alone (i.e., without the epsilon-constraint), which provided the extreme of the Pareto frontier in the bottom-right corner of the figure with a passenger cost of zero and an operator cost of approximately 590,000 RMB. Then, we obtained 9 other Pareto solutions by setting epsilon to values lower than 590,000, specifically 550,000, 510,000, 460,000, 420,000, 370,000, 330,000, 280,000, 230,000, and 190,000. For each value of epsilon, we display with small solid circles the metro operator and passenger cost for the timetables obtained under each of the 9 scenarios. The larger circles represent the expected cost of these 9 timetables for each value of epsilon, and we call such solutions the SP solutions.
We summarize in Table 6 the most relevant information from Figure 8, including the metro and passenger cost of the SP solutions. The values in this table and the convex shape of the frontier suggest diminishing returns when lowering passenger cost. To elaborate, consider solution 1 in the table, in which the operator cost is low but the passenger cost is high (i.e., the leftmost point in Figure 8). By moving from solution number 1 to number 2, for an additional cost of about 50,000 RMB the operator can reduce the percentage of failed services by 15%, which is a significant reduction. The operator can further reduce this percentage by 13% for a similar cost increase when moving from solution number 2 to number 3 in the frontier. However, the more we move to the right in the frontier, the more expensive it becomes to reduce the percentage of failed services. For example, moving from solution number 8 to number 9 only reduces this percentage by an additional 3% for a similar increase in operating costs. This finding shows that, in our case study, the operator should strive to reduce the percentage of failed services down to at least 30-40% as it is relatively cheap to do so, but satisfying more than 90-95% of passengers might be too expensive and hence uneconomical.
For the extreme SP solution number 10, in which all passengers are served (i.e., passenger cost equals 0 and operator cost is maximal), we report in Figure 9 the departure time of each extra metro train service from the BSRS station for operation directions 1, 2, and 3 under each scenario. As illustrated in Figure 9, the first-stage decision (i.e., the number of extra metro train services) is the same in each scenario, that is, we need 7 extra metro train services for each operation direction. It is not a general

Deterministic Benchmark and Dual Bound.
In the following, we define a deterministic model that replaces future uncertainties with their expected values and that we use as a benchmark for our stochastic model. We call the solution from this model the expected value solution, henceforth EV solution. We also define a perfect information model that provides a lower bound on the optimal cost. We name the solution from this model the perfect information solution, or PI solution. We discuss both models below.

EV Solution.
A common procedure for solving a stochastic optimization problem is to replace all random variables with their best available estimate, namely, their expected value, and solve a deterministic model. In our case, this means constructing the timetable using the expected values of the delay time of high-speed trains rather than multiple delay scenarios. To formalize, the EV solution is the extra metro train timetable that is obtained as the optimal solution to the following formulation:

$$\min \; C_P(\bar{w}), \tag{14}$$

$$\text{subject to} \quad C_M(\bar{w}) \le \varepsilon, \tag{15}$$

and constraints (4a)-(12), where $\bar{w}$ denotes the single scenario in which the delay of every HSR train takes its expected value. EV and SP solutions are computed based on models that differ in both objective function and constraints. Consequently, the resulting objective value from the EV model is not directly comparable to that of our SP model. To provide a fair comparison, we need to evaluate the EV solution using the same 9 scenarios that we used in SP. Specifically, the EV solution provides a first-stage decision, i.e., the number of extra metro trains, based on deterministic information. We fix this decision, and for each scenario w, we optimize the second-stage decision (i.e., the timetable) and calculate the cost of passengers and metro operator. The sample average across scenarios represents the expected cost of the EV solution.

PI Solution.
The PI solution is obtained by relaxing the nonanticipativity constraints embedded in our SP model and assuming full information about the future. Mathematically, this means making the first-stage decision (i.e., the number of extra metro trains) scenario-dependent, i.e., this decision also adapts to the uncertainty.
The PI solution provides a dual (lower) bound on the optimal cost since it exploits information that in reality is not available to the decision maker. For the same reason, this solution is also not feasible. In practical terms, PI solutions are infeasible as the operator needs time to arrange the unplanned shifts of metro drivers and staff, which cannot be done after the HSR train has already arrived. Formally, for a given scenario of the uncertainty w ∈ W, the PI solution solves the following hindsight model:

$$\min \; C_P(w), \tag{16}$$

$$\text{subject to} \quad C_M(w) \le \varepsilon, \tag{17}$$

and constraints (4a)-(12). PI solutions should also be evaluated on the same 9 scenarios used in the SP and EV solutions.
We now compare the SP, EV, and PI solutions for a specific point in the Pareto frontier corresponding to ε = 550,000. For the three solutions, Figure 10 shows the total expected cost of passengers and metro operator resulting under the three probability distributions in Table 5.
As shown in Figure 10, SP solutions decrease the expected total cost compared to EV solutions by 3.93%, 2.23%, and 3.10%, respectively, for Gaussian, Weibull, and uniform distributions. On average across distributions, our SP method improves on the EV approach by 3.09%, which is a significant improvement. The difference between EV and SP solutions is also known as the value of the stochastic solution (VSS; [36]). The high VSS value indicates that accounting for the uncertainty explicitly through multiple scenarios is valuable in our extra metro train scheduling application and would allow the metro operator to save a considerable amount of money while providing the same service level to passengers.
Compared to SP, using perfect information decreases the expected total cost by 1.31%, 0.23%, and 0.72% in the three distributions. In other words, our SP solutions achieve an average optimality gap of 0.75%, i.e., they are near optimal. The difference between the SP and PI objective values is known as the expected value of perfect information (EVPI; [36]). On average, the EVPI that we obtain is 4146 RMB, which is relatively low and represents the maximum cost the metro system would be willing to pay to access information about the uncertainty in advance.
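The relation between the SP, EV, and PI values can be made concrete with a deliberately simplified example. The sketch below is not the timetabling model of this paper: it assumes a single first-stage decision (the number of extra trains), a trivial recourse (unserved passengers are penalized), one aggregated cost instead of two objectives, and hypothetical values for the train cost, capacity, penalty, and scenarios. It only illustrates how the VSS and the EVPI are computed from the three solutions.

```python
import math

# Hypothetical parameters (assumptions for illustration only).
TRAIN_COST = 100.0   # cost of dispatching one extra train
CAPACITY = 50        # passengers served per train
PENALTY = 5.0        # cost per unserved passenger
MAX_TRAINS = 10

# Hypothetical scenarios: (probability, number of delayed passengers).
SCENARIOS = [(0.3, 100), (0.4, 200), (0.3, 400)]


def total_cost(trains, passengers):
    """First-stage cost plus recourse cost for one scenario."""
    unserved = max(0, passengers - CAPACITY * trains)
    return TRAIN_COST * trains + PENALTY * unserved


def expected_cost(trains):
    """Expected cost of a fixed first-stage decision over all scenarios."""
    return sum(p * total_cost(trains, d) for p, d in SCENARIOS)


decisions = range(MAX_TRAINS + 1)

# SP: one first-stage decision that minimizes the expected cost.
x_sp = min(decisions, key=expected_cost)
sp_value = expected_cost(x_sp)

# EV: optimize against the mean scenario, then evaluate on all scenarios.
mean_passengers = sum(p * d for p, d in SCENARIOS)
x_ev = min(decisions, key=lambda x: total_cost(x, mean_passengers))
ev_value = expected_cost(x_ev)

# PI: the first-stage decision may adapt to each scenario (lower bound).
pi_value = sum(p * min(total_cost(x, d) for x in decisions) for p, d in SCENARIOS)

vss = ev_value - sp_value    # value of the stochastic solution
evpi = sp_value - pi_value   # expected value of perfect information

print(f"SP: x={x_sp}, cost={sp_value:.1f}")
print(f"EV: x={x_ev}, cost={ev_value:.1f}")
print(f"PI bound: {pi_value:.1f}, VSS={vss:.1f}, EVPI={evpi:.1f}")
```

By construction the three values are ordered as PI ≤ SP ≤ EV, which is the same ordering observed in Figure 10.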

Out-of-Sample Valuation.
Recall that we employ a two-stage SP model to solve our extra metro train scheduling problem and that the solution from this model is tied to the scenarios that are chosen, i.e., the in-sample scenarios described in Section 5.2. Due to limitations in the available computing power and the complexity of the SP model, we selected 9 in-sample scenarios. Our SP model implicitly assumes that these 9 scenarios are the only possible realizations of the uncertainty. However, these scenarios only provide a discrete approximation of the entire uncertainty outcome, which is given by the full delay probability distributions of all incoming HSR trains. As a consequence, the SP results based on the in-sample scenarios might be optimistically biased.
To investigate whether this is the case, and to obtain a fair and unbiased method comparison, we evaluate the performance of the three solutions (SP, EV, and PI) out-of-sample. The out-of-sample valuation proceeds as follows. We sample 50 new scenarios for each HSR train from each of the three probability distributions illustrated in Table 5 (i.e., Gaussian, Weibull, and uniform). For each scenario, we fix the SP first-stage decision and obtain the second-stage optimal decision by solving the recourse optimization problem. The sample average over the 50 scenarios provides the out-of-sample expected cost of the SP solution. We proceed analogously using the EV model discussed in Section 5.4 to obtain the out-of-sample valuation of the EV model. Regarding the PI model, we proceed as in SP and EV with the exception that the first-stage decision is not fixed but is free in each out-of-sample scenario, i.e., it can adapt to the scenario of the uncertainty. Figure 11 reports the probability density function of the total expected cost (operator and passengers) and the expected passenger cost resulting from the out-of-sample valuation of the different methods under Gaussian, Weibull, and uniform distributions. As we can see from Figures 11(a), 11(c), and 11(e), the SP distribution is substantially shifted to the left compared to the EV distribution.
The total expected cost in SP is 4.19%, 4.91%, and 3.12% lower than the analogous cost in EV for the three distributions. Moreover, the perfect information lower bound is 0.99% lower than SP on average across distributions. These results are consistent with the in-sample results previously discussed and provide support for our choice of the 9 scenarios. Finally, the relative performance of the methods is similar when considering the probability density function associated with passenger cost displayed in Figures 11(b), 11(d), and 11(f).
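The out-of-sample procedure described above can be sketched in a few lines under strong simplifying assumptions. The example below is illustrative only: the cost function, the Gaussian parameters, the random seed, and the two candidate first-stage decisions (labelled `x_sp` and `x_ev`) are hypothetical and do not come from our case study. It shows the three steps of the valuation: sampling 50 new scenarios, evaluating each fixed first-stage decision on them, and letting the first-stage decision adapt per scenario for the perfect information bound.

```python
import random

# Hypothetical cost parameters (assumptions for illustration only).
TRAIN_COST = 100.0
CAPACITY = 50
PENALTY = 5.0
MAX_TRAINS = 20


def total_cost(trains, passengers):
    """First-stage cost plus a trivial recourse cost for one scenario."""
    unserved = max(0.0, passengers - CAPACITY * trains)
    return TRAIN_COST * trains + PENALTY * unserved


def sample_scenarios(n, seed=0):
    """Draw n out-of-sample scenarios from an assumed Gaussian distribution."""
    rng = random.Random(seed)
    # Negative draws are truncated to zero passengers.
    return [max(0.0, rng.gauss(230.0, 80.0)) for _ in range(n)]


def out_of_sample_cost(trains, scenarios):
    """Sample-average cost of a FIXED first-stage decision."""
    return sum(total_cost(trains, d) for d in scenarios) / len(scenarios)


def perfect_information_cost(scenarios):
    """Sample-average cost when the first stage adapts to each scenario."""
    best_per_scenario = (
        min(total_cost(x, d) for x in range(MAX_TRAINS + 1)) for d in scenarios
    )
    return sum(best_per_scenario) / len(scenarios)


scenarios = sample_scenarios(50)

x_sp, x_ev = 4, 5  # hypothetical first-stage decisions, fixed before sampling
sp_oos = out_of_sample_cost(x_sp, scenarios)
ev_oos = out_of_sample_cost(x_ev, scenarios)
pi_oos = perfect_information_cost(scenarios)

print(f"SP out-of-sample cost: {sp_oos:.1f}")
print(f"EV out-of-sample cost: {ev_oos:.1f}")
print(f"PI lower bound:        {pi_oos:.1f}")
```

Whatever the sampled scenarios, the perfect information value cannot exceed the cost of any fixed decision, since it minimizes the cost scenario by scenario. Which of the two fixed decisions performs better depends on the sample.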

Concluding Remarks
In this paper, we studied the problem of scheduling extra metro trains to serve uncertain delayed high-speed railway passengers, which is a new application in the literature. To solve this problem, we proposed a two-stage stochastic program that we formulated as a mixed-integer linear programming model. Our optimization problem is biobjective and balances the operational cost incurred by the metro operator with the passenger cost, defined as the number of passengers that miss the last metro trains. To illustrate the relevance of our two-stage SP approach, we performed numerical experiments using real-world data from the Beijing metro network and two HSR lines connected to this network. We generated a Pareto frontier and provided insights on how to balance the operator and passenger costs. Additionally, we compared the performance of our SP solution, a deterministic model that uses a forecast of the uncertainty (EV solution), and a hindsight model that relaxes nonanticipativity constraints (PI solution). We found that our SP solution, evaluated out-of-sample, improves on the EV solution by 3–5% and exhibits average optimality gaps below 1%; hence, it is near optimal.
Future research avenues include the following extensions of the problem. First, we could consider detailed passenger origin-destination (OD) demand and passenger behavior to provide better services for passengers [37]. The resulting problem would become more complex as it would require scheduling extra trains in the whole metro network and not only at the connecting metro-HSR station, as we do in this paper. Alternatively, stopping patterns could be included in the model to identify the optimal stop pattern for each extra train service according to the passenger OD demand. Finally, another option would be to consider the number of passengers allocated to metro trains as a first-stage decision so that the metro operator could more conveniently dispatch the rolling stock. This extension would also be challenging because it involves the joint scheduling of extra metro train services and rolling stock.

Data Availability
Previously reported data were used to support this study and are available at the website of Beijing Subway: https://www.bjsubway.com/station/xltcx/line1/.

Conflicts of Interest
The authors declare that they have no conflicts of interest.