Modeling the Peak-Period Bus Commuting Behavior with Staggered Work Hours Using a Regret-Minimizing Learning Method

The staggered work hours (SWH) policy is a practical strategy for managing travel demand, aiming to spread out the temporal distribution of travel volume by adjusting the schedules of travelers’ activities. The influence of the SWH policy on the commuting patterns of passengers using bus transit is not yet clear. We address this issue on a many-to-one bus line, treating commuters as Q-learning agents who learn to minimize regret by selecting appropriate bus runs. The learning outcomes reveal a SWH-induced equilibrium, where commuters departing from the same station with the same work start time experience identical minimal commuting costs, regardless of the chosen bus. Subsequently, we investigate the effectiveness of the SWH policy by manipulating two key control variables: the division of travel demand between two categories of travelers and the staggered time interval. The results confirm that congestion during peak hours can potentially be mitigated by carefully selecting these two parameters. Correspondingly, we provide optimal control boundaries for these two parameters to design an effective SWH policy. Furthermore, we explore the combined impact of physical distancing and the SWH policy on traffic flow patterns during an epidemic outbreak. Concurrently, we assess the infection risk through a surrogate index, revealing that the SWH policy has a positive effect in mitigating the risk of contact exposure.


Introduction
One feasible solution to alleviate urban congestion is the implementation of travel demand management measures. These measures aim to redistribute travel demand in terms of space, mode of travel, or time by modifying the travel behavior of traffic participants. Usually, travel demand management policies include ridesharing, car sharing, on-demand services, tolls, and flexible work arrangements. The staggered work hours (SWH) policy is one type of flexible work arrangement, and it is the primary focus of this paper. Unlike rigid work schedules, the SWH policy permits employees to maintain the same number of daily working hours but with varying work schedules. One significant advantage of the SWH policy is its ability to disperse travel demand during the peak hours, thereby mitigating peak congestion and reducing commuting times. Comprehending how the SWH policy influences commuters' travel behavior is essential for developing an effective SWH policy and optimizing the transport system during its implementation.
The optimal distribution of work start times was first examined by Henderson [1]. He carried out a theoretical study by considering traffic congestion and productivity effects. Arnott [2] generalized Henderson's model by considering firm heterogeneity and analyzed optimal congestion tolls. Rather than relying on the flow congestion model (e.g., Refs. [1, 2]), subsequent studies depict the dynamic congestion pattern during peak periods through the bottleneck model introduced by Vickrey [3]. A considerable number of studies have expanded Vickrey's model in diverse ways. For instance, scholars have taken into account limitations such as vehicle parking [4], broadened the model to networks with multiple bottlenecks [5], and evaluated the impact of pretrip information on the selection of departure times [6, 7]. For a thorough understanding of the extensions and applications of the bottleneck model, interested readers are directed to the reviews by Small [8] and Li et al. [9]. Then, using the bottleneck model, several researchers have studied the influences of the SWH policy on traffic congestion and urban productivity [10-12]. Recently, Yang et al. [13] explored the influence of the SWH policy on the departure time choice behavior of commuters through experimental studies.
Public transit networks handle substantial passenger loads, particularly during morning and evening peak hours. Therefore, it is essential to understand how commuters choose their departure times when utilizing urban mass transit services. Several studies have tackled this problem. For instance, Huang et al. [14] developed a departure time choice model for a one-to-one transit line and assumed that commuters choose their departure time by trading off the costs of in-vehicle crowding against the costs of schedule delays. In a subsequent study, Tian et al. [15] expanded commuting patterns to a many-to-one transit line and provided equilibrium properties. De Palma et al. [16, 17] discussed the formulation of in-vehicle crowding in public transport and obtained the optimal pricing and the optimal scheduling. Other than in-vehicle crowding, some researchers assume that the primary congestion cost of travelling is the waiting time at oversaturated stations. For example, Yang and Tang [18] depicted a rail transit bottleneck model, where commuters select their departure times by trading off schedule delay costs against queuing time costs. Then, they proposed a fare-reward scheme to relieve queuing congestion at transit stations. Tang et al. [19] proposed a hybrid fare scheme by considering the heterogeneity in transit commuters' scheduling flexibility. However, nearly all previous studies have focused on equilibrium departure rates, assuming that commuters share the same work start time. The travel behavior of public transit commuters remains unknown when implementing a SWH policy. Our work aims to fill this gap.
While mathematical equilibrium models are effective in examining equilibrium properties, they encounter analytical challenges when dealing with complex real-world factors, such as user heterogeneity, time-varying demand, and flexible transport services. In addition, these models overlook the crucial aspect that travelers learn and self-adjust their behavior over time. As such, agent-based simulation technology appears to be one feasible way to deal with user equilibrium. Such a method is inherently superior insofar as it can depict individual responses to the work start time and the interaction with other participants much more realistically. Agent-based simulation technology has been shown to be effective, flexible, and extensible in traffic system modelling [20, 21]. Yang et al. [22] utilized a multiagent-based Q-learning algorithm for evaluating the influence of the SWH policy by simulating travelers' time and location choices in their activity patterns. Xie et al. [23] simulated commuter departure time choices based on the BM reinforcement learning model in a many-to-one bus transit scenario. In our approach, passengers select their departure time guided by regret theory. This theory posits that an agent's decision is influenced not only by the associated utility but also by the anticipated disutility (regret) for not making a better decision [24]. The adjustment process is modeled using a multiagent-based Q-learning method, where the regret value serves as the reinforcement learning signal to guide choices.
So far, the SWH policy has been applied to tackle issues like congestion, but it could also bear significance for addressing other societal challenges, such as public health crises during an epidemic outbreak. For instance, in the context of the COVID-19 pandemic, certain measures like lockdowns or travel interventions have been implemented to decrease interactions among travelers on public transport (Thomas et al. [25]). One such intervention is physical distancing, which mandates that the occupancy of vehicles or facilities never exceeds a predetermined threshold (e.g., 50% of the maximum vehicle capacity). Consequently, the initial problem transforms into peak-hour bus commuting with a capped bus capacity. Then, our primary concerns involve assessing the collective impact of capped bus capacity and the SWH policy on traffic flow patterns and understanding the properties of the resulting equilibrium state. In addition, we seek to explore the role of the SWH policy in reducing the probability of infection. To our knowledge, these topics have not been previously discussed in the realm of public transit systems. The insights gained can offer valuable recommendations for adjusting public transportation operations and scheduling residents' work hours during epidemic outbreaks.
In summary, the purpose of this study is to delineate the departure patterns of commuters travelling on a capacity-limited urban bus transit line. Specifically, we aim to understand how the combined effects of SWH policy, line characteristics, and physical distancing influence commuters' choices of departure times. We make the following contributions:
(1) We derive the SWH-induced equilibrium using a multiagent-based Q-learning algorithm, in which the regret value is considered as the reinforcement learning signal guiding departure time choices.
(2) We evaluate the SWH policy by analyzing the properties of the equilibrium state of traffic flow in terms of the commute travel cost and time-space distribution of departure flows.
(3) We examine the combined effect of physical distancing and SWH policy on traffic flow patterns on public transit during an epidemic outbreak.
(4) We provide optimal values for the staggered time interval and the proportion of the staggered population in designing a SWH policy, both with and without the requirement of physical distancing.
We emphasize that we test the SWH policy only from the perspective of commuters, i.e., based on minimizing their travel costs. The benefit analyses of the other two participants (the bus transit operator and the company) are beyond the scope of our discussion.

Problem Definition
We consider a bus line connecting a central business district to several residential areas, as depicted in Figure 1. Commuters adjust their departure times in response to assigned work starting times, taking into account in-vehicle crowding costs and schedule delay costs. This scenario aligns with previous studies [15, 23], offering ideal reference lines for model verification.
More precisely, the bus line includes $S$ board-only stations $H_1, \dots, H_S$ and a destination station $W$. We refer to stations near the start of the line as upstream stations, and stations near the end of the line as downstream stations. During the peak hours, a total of $N = \sum_{s=1}^{S} n_s$ commuters travel through the bus line, where $n_s$ is the number of commuters departing from station $H_s$. The bus company schedules $M$ buses during the peak period, and each bus has a maximum capacity of $C_{bus}$ (passengers). The buses arrive at the destination station $W$ with a fixed headway $H_{bus}$ (h). Let $A = \{1, 2, \dots, M\}$ be the index set of buses, where 1 denotes the first bus reaching the destination. For modelling tractability, the running time between each pair of neighboring stations from $H_1$ to $W$ is assumed to be constant and is denoted by $\tau_s$.
All commuters are considered to be frequent users who are acquainted with the bus timetable through day-to-day learning or have complete information about the schedule provided by the traffic authority. Under this assumption, passengers experience zero waiting time at the station. Therefore, the departure time choice problem transforms into a bus run choice problem, illustrating how commuters select a bus that minimizes their total generalized commuting costs.
Figure 2 illustrates the scheme of the multiagent-based learning process. Commuters respond to the imposed work starting time by selecting the departure time that minimizes their disutility of travelling and arriving. In this dynamic system, each agent must alter his/her departure time in response to other agents' decisions. When an agent takes a particular bus, it increases the degree of in-bus congestion, affects the ride experience of other agents, and consequently influences their decision-making. By considering the mutual interactions among commuters, all participants' schedules can be calculated. As such, the cumulative volume distribution can be determined. For these reasons, such models are suitable for investigating how individual agents interact and learn to maximize their rewards. All agents are expected to converge to the state represented by the equilibrium if they are rational. In other words, each agent aims to choose the strategy that maximizes his/her utility function, creating a steady state (a combination of strategies for all agents) where no agent can benefit by unilaterally changing strategy.
Here, we clarify two key components in Figure 2. The number of commuters within the two groups follows the division $\Phi(\rho, 1 - \rho)$, where $\rho$ is the proportion of Group 1 among all commuters. For instance, $\Phi(0.7, 0.3)$ means that 70% of the commuters belong to Group 1 and the remaining 30% are in Group 2.

Bus Operation Policy.
Here, we refer to a bus operation policy concerning physical distancing during epidemics. We consider two scenarios: one without physical distancing and one with it. In the first scenario, we assume normal conditions where urban buses can be used up to their full physical capacity. In the second scenario, which pertains to epidemic conditions, the occupancy rate of vehicles must not exceed a predefined threshold to ensure safe social distancing, i.e., 50% of total occupancy. Therefore, the control variable relevant to the bus operation policy in this context is the bus occupancy.

Multiagent-Based Q-Learning Model
In our approach, commuters are viewed as Q-learning agents who make departure time decisions. In what follows, we use the words "commuter" and "agent" interchangeably. One agent's decision will influence other agents' decisions when travelling on the same bus line. For example, an agent choosing to take a certain bus will increase the degree of congestion in this bus, thus affecting other agents' ride experience. To avoid congestion or capacity limitation, an agent who initially decided to take the same bus may select a new bus, which will again influence other agents.
The following basic concepts need to be defined in advance when implementing the Q-learning algorithm.
(i) Action Set: This corresponds to the set of bus runs, as we have transformed the problem of choosing departure times into a bus run selection problem.
(ii) Reward: This represents the immediate feedback received upon taking a bus, and in our study, it is the negative of the generalized commuting cost. The value of the commuting cost is bus-dependent, i.e., it depends on the number of agents who take the same bus.
(iii) Q-Table: This is used for calculating the maximum expected future rewards associated with an action. In this paper, the regret value serves as the reinforcement learning signal. Each agent maintains a Q-table that stores regret values for each bus run. A lower regret for a particular action implies a higher reward or, equivalently, a lower cost associated with taking that action.
Algorithm 1 presents the pseudocode of such a learning process in a daily iterative manner. At the beginning of a learning episode, agents receive the average congestion cost of each bus run based on previous days. Afterward, each agent chooses a bus run by using the ε-greedy policy derived from the Q-table. Then, the agent takes the bus and records its commuting cost (Section 3.1). When the travel is finished, the agent estimates his/her regret using the actual commuting cost and the received history information (Section 3.2). As an intermediate step, each agent also estimates the costs of his/her nontaken buses to compute the regret. Eventually, the Q-table is updated and guides the bus run selection on the next day (Section 3.3).

Generalized Commuting Cost.
We use the term "reward" instead of "cost" for consistency in our terminology. The reward is inversely associated with one agent's generalized commuting cost from taking a bus run. Generally, commuting costs encompass ticket fare, crowding costs, in-bus travel costs, and penalties for schedule delays (early or late arrival). For simplicity, we assign a zero value to the ticket fare since commuters leaving from the same station incur identical fares, which do not impact their departure time choices. Besides, the in-bus travel cost is the same for all commuters departing from the same station, and it does not influence the departure time choices of commuters. Therefore, without loss of generality, we set the value of the travel cost from the same station to zero in our discussion. In summary, commuters make their bus run choices merely by trading off their in-bus crowding costs against the schedule delay penalties.
Specifically, let $TC_s^a$ denote the total commuting cost of a commuter who departs from station $H_s$ and takes a bus run $a \in A$. $TC_s^a$ is given in the following equation:

$$TC_s^a = c_{crowdness}(a, s) + c_{delay-penalty}(a). \tag{1}$$

In equation (1), $c_{crowdness}(a, s)$ is the commuter's crowding cost from taking bus run $a$ at station $H_s$, and its value is determined by the degree of crowding effects and the in-bus time. Then, $c_{crowdness}(a, s)$ can be calculated by

$$c_{crowdness}(a, s) = \sum_{k=s}^{S} g\!\left(\sum_{m=1}^{k} n_m^a\right) \tau_k,$$

where $n_m^a$ indicates the number of commuters from station $H_m$ taking bus $a$ and $\tau_k$ is the time spent on the bus between two neighboring stations $H_k$ and $H_{k+1}$. The function $g(\cdot)$ calculates the crowding cost per unit of in-bus travel time, which is assumed to be monotonically increasing with the number of commuters on board.
In equation (1), $c_{delay-penalty}(a)$ indicates the schedule delay penalty, with respect to the scheduled work start time, of taking bus service $a$. We assume that there is a bus arriving at the workplace $W$ punctually, and this bus run is labeled by $a^*$. We also call $a^*$ the work start time. In this way, any bus run with index $a < a^*$ will arrive early with an early arrival time of $a^* - a$, while any bus run with index $a > a^*$ will arrive late with a late arrival time of $a - a^*$. Thus, the schedule delay cost $c_{delay-penalty}(a)$ is given as

$$c_{delay-penalty}(a) = \begin{cases} \beta\,(a^* - a), & a \le a^*, \\ \gamma\,(a - a^*), & a > a^*, \end{cases}$$

where the coefficients $\beta$ and $\gamma$ are the costs of a unit schedule delay that is early and late, respectively. According to Small [8], we set $0 < \beta < \gamma$. All commuters are assumed to be homogeneous regarding the value of time, the schedule delay coefficients, and the perception of congestion. Heterogeneous commuters can be easily distinguished by their differences in the value of travel time and schedule delay costs. To keep the conclusions concise, we do not consider the departure choices of heterogeneous commuters in this work.

Journal of Advanced Transportation
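As a minimal sketch of the cost structure in equation (1), the following Python fragment computes the crowding cost accumulated over the remaining segments of a trip and the schedule delay penalty. The function names, the 0-based station indexing, and the linear crowding function in the example are illustrative assumptions, not part of the original model specification; the delay is measured in bus-run indices, following the description of early and late arrival times above.

```python
from typing import Callable, Sequence


def crowding_cost(boardings: Sequence[int], tau: Sequence[float], s: int,
                  g: Callable[[int], float]) -> float:
    """Crowding cost on one bus for a commuter boarding at station index s (0-based).

    boardings[m] is the number of commuters boarding this bus at station m;
    tau[k] is the running time from station k to the next stop (the last
    segment ends at the destination W).
    """
    on_board = sum(boardings[:s])  # riders already on the bus before station s
    cost = 0.0
    for k in range(s, len(boardings)):
        on_board += boardings[k]      # board-only stations: the load never drops
        cost += g(on_board) * tau[k]  # per-unit-time crowding cost times segment time
    return cost


def delay_penalty(a: int, a_star: int, beta: float, gamma: float) -> float:
    """Schedule delay penalty (in bus-run units) for run a when run a_star is on time."""
    if a < a_star:
        return beta * (a_star - a)  # early arrival
    return gamma * (a - a_star)     # late arrival (zero when a == a_star)


def total_cost(boardings, tau, s, g, a, a_star, beta, gamma) -> float:
    """Total generalized commuting cost: crowding plus schedule delay."""
    return crowding_cost(boardings, tau, s, g) + delay_penalty(a, a_star, beta, gamma)


# Hypothetical numbers, chosen only for illustration.
example = total_cost([10, 20, 30], [0.1, 0.1, 0.2], 1, lambda n: n / 10,
                     a=38, a_star=40, beta=1.0, gamma=2.0)
```

In this example the commuter boards at the second station, so the crowding cost counts only the last two segments, while the two-run early arrival adds a penalty of 2β.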

Regret Estimation.
Within the Q-learning algorithm, regret serves as a reinforcement signal, guiding commuters to minimize their estimated regret. To calculate regret, a commuter must possess comprehensive knowledge of (i) the average cost incurred by the commuter and (ii) the average cost of the best fixed action in hindsight. Unfortunately, determining the latter necessitates advance knowledge of the commuting cost for all bus runs each day, which is typically impossible in reality. To solve this, Romas et al. [24] proposed an alternative definition of regret that describes the estimated regret of each action. A commuter can estimate regret according to this model by combining global and local information.
The global information refers to the mean estimated reward of all bus runs in the system. As suggested by Romas et al. [24], such information can be collected by a central authority at the end of each day and sent to terminal clients through a mobile app. Here, the app recommendations are used only to calculate the agents' regrets in their decision-making process. For a bus run $a \in A$, let $r(a_s^t)$ be the reward for taking bus run $a$ at station $H_s$ on day $t$. The value of $r(a_s^t)$ is inversely associated with the commuting cost $TC_s^a$, i.e., $r(a_s^t) = -TC_s^a$. Using such information, the app can compute the mean reward for all bus runs. At a given station $H_s$, let $\bar{r}(a_s)$ be the mean reward of taking bus $a$ at station $H_s$ up to time $T$, which can be calculated by

$$\bar{r}(a_s) = \frac{1}{T} \sum_{t=1}^{T} r(a_s^t).$$

The local information, on the other hand, is the actual reward an agent gained. The history estimate of an action can be defined as

$$E = \{\tilde{r}(a_s^t) \mid a \in A,\ t \in [1, T]\},$$

where $\tilde{r}(a_s^t)$ represents the most recent reward estimate of one agent for taking bus run $a$ on day $t$ departing from station $H_s$. More specifically, the value of $\tilde{r}(a_s^t)$ is given by equation (5), depending on whether or not the action is executed on the current day. We use $\dot{a}_s^t$ to distinguish the bus run taken by the agent on day $t$ from any of the other bus runs $a_s^t$. If $a_s^t = \dot{a}_s^t$, $\tilde{r}(a_s^t)$ equals the reward experienced by taking bus run $\dot{a}_s^t$. Otherwise, we assume the reward of nontaken actions is the same as the previous day's estimate. That is, $\tilde{r}(a_s^t)$ can be approximated by the most recent observation:

$$\tilde{r}(a_s^t) = \begin{cases} r(a_s^t), & a_s^t = \dot{a}_s^t, \\ \tilde{r}(a_s^{t-1}), & \text{otherwise}. \end{cases} \tag{5}$$

Building upon the local and global information from the above definitions, we can now formulate the estimated action regret. Let $R_s^a$ denote the estimated regret of taking bus run $a$ at station $H_s$ up to day $T$, with the formulation provided in equation (6). The former term on the right-hand side of equation (6) is a linear combination of the local average reward and the global average reward of taking a bus. By maximizing this reward, we can find the best estimated bus run, i.e., the one with the maximum expected reward. The latter term on the right-hand side of equation (6) represents the history estimates of taking a bus. Thus, the estimated action regret $R_s^a$ can be seen as an estimate of the average amount lost up to time $T$ for not taking the best estimated action.
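The bookkeeping behind the regret estimate can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the class and method names are hypothetical, untaken runs start from an estimate of zero, and the blending parameter `weight` is an assumed stand-in for the coefficients of the linear combination of local and global information, which are not specified in the text above.

```python
from typing import List, Sequence


class RegretEstimator:
    """Tracks one agent's reward estimates per bus run and derives estimated regret."""

    def __init__(self, n_runs: int) -> None:
        self.latest: List[float] = [0.0] * n_runs  # most recent reward estimate per run
        self.totals: List[float] = [0.0] * n_runs  # sum of daily estimates per run
        self.days = 0

    def observe(self, taken: int, reward: float) -> None:
        """Record one day: the taken run gets its experienced reward,
        every other run keeps the previous day's estimate."""
        self.latest[taken] = reward
        self.days += 1
        for a, estimate in enumerate(self.latest):
            self.totals[a] += estimate

    def local_mean(self, a: int) -> float:
        """Average of the agent's own daily estimates for run a (needs days >= 1)."""
        return self.totals[a] / self.days

    def regret(self, a: int, global_mean: Sequence[float], weight: float = 0.5) -> float:
        """Estimated regret of run a: best blended mean reward minus a's local mean.

        `weight` (hypothetical) is the share given to the app's global mean.
        """
        blended = [(1 - weight) * self.local_mean(b) + weight * global_mean[b]
                   for b in range(len(self.latest))]
        return max(blended) - self.local_mean(a)
```

For example, after observing a reward of -10 on run 0 and -4 on run 1 on two consecutive days, the local means are -10 and -2, so with matching global means the regret of run 0 is 8 and the regret of run 1 is 0.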
Algorithm 1 (excerpt):
(2) Initialize history of estimates: $E = 0$;
(3) Initialize learning and exploration rates:

Learning Process.
A sketch of the learning process is illustrated in Algorithm 1. In the Q-learning model, we use the ε-greedy principle to balance exploration and exploitation. The ε-greedy principle works as in equation (7). A uniform random number between 0 and 1 is generated and then compared with ε, which we call the exploration rate. If the generated number is smaller than ε, we choose to explore, i.e., not to exploit what we have learned so far. In this case, the bus run is selected randomly, independent of the action-value estimates. Otherwise, the bus run with the highest estimated reward is selected.
Taking a commuter departing from station $H_s$ as an example, his/her learning process works as follows. On each day $t \in [1, T]$, he/she receives the recommendation information $\bar{r}(a_s)$ from the app. Then, he/she chooses a bus $\dot{a}_s^t \in A$ following the ε-greedy principle. Upon arriving, the commuter immediately calculates the reward $r(\dot{a}_s^t)$ from his/her commuting cost. Afterward, the commuter updates his/her history of action estimates $E$ using equation (5) and calculates the estimated regret of taking bus run $\dot{a}_s^t$ using equation (6). Finally, the commuter updates the Q value of action $\dot{a}_s^t$ using the estimated action regret for that action, as follows:

$$Q(\dot{a}_s^t) \leftarrow (1 - \alpha)\, Q(\dot{a}_s^t) + \alpha\, R_s^{\dot{a}},$$

where $\alpha$ is the learning rate.
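The selection and update steps of this daily loop can be sketched as below. The sketch rests on two assumptions that go beyond the text: the Q-table stores regret values, so the greedy choice is taken to be the run with the lowest stored regret (which the text equates with the highest reward), and the update is written as the usual exponential-smoothing rule for stateless Q-learning. The function names are hypothetical.

```python
import random
from typing import List


def epsilon_greedy(q: List[float], epsilon: float, rng: random.Random) -> int:
    """Pick a bus-run index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))              # explore: uniform random bus run
    return min(range(len(q)), key=q.__getitem__)  # exploit: lowest stored regret


def update_q(q: List[float], action: int, regret: float, alpha: float) -> None:
    """Blend the new regret estimate into the stored Q value (assumed update rule)."""
    q[action] = (1 - alpha) * q[action] + alpha * regret


# Hypothetical Q-table with three bus runs.
q_table = [3.0, 1.0, 2.0]
choice = epsilon_greedy(q_table, epsilon=0.0, rng=random.Random(0))  # pure exploitation
update_q(q_table, action=0, regret=1.0, alpha=0.5)
```

With ε set to zero the agent always exploits, so the example picks the second run; in practice ε is typically kept positive, and often decayed over days, so that unvisited runs continue to be sampled.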

Spatial-Temporal Characteristics.
We first plot the aggregate travel profile in Figure 3 without implementing a SWH policy, i.e., $a_1^* = a_2^* = 40$, serving as a reference line for comparative experiments. In addition, the commuting costs of commuters on each bus are calculated at each station. The result is represented in box plots (25%-75% quartile, 1.5 IQR) as shown in Figure 4.
We make the following observations from Figures 3 and 4:
(1) Commuters from upstream stations utilize more bus services than those from downstream stations. For example, commuters departing from station $H_1$ take the bus services in a range of [24, 45], and this range narrows to [26, 44], [31, 43], and [32, 43] for commuters departing from $H_2$, $H_3$, and $H_4$, respectively. That is to say, the farther the station is from the workplace, the longer the duration of the commuting period.
(2) The profile of the cumulative number of departures exhibits a single-peak shape. Under the assumption that the unit cost of late arrival is higher than the unit cost of early arrival, the rate at which departures decline among late-arriving commuters is higher than the rate at which they rise among early-arriving ones. Due to the limited bus capacity, buses around the on-time service ($a^* = 40$) are fully occupied.
(3) The commuting costs at stations $H_1$, $H_2$, and $H_3$ exhibit centralized distributions. That is to say, commuters from the same departure station have almost identical and minimal commuting costs regardless of which bus they take. In other words, user equilibrium is almost achieved. We use the term "almost" because the standard deviation of commuting costs for users departing from the station is relatively high.
The aforementioned simulation results align closely with the analytical results obtained by Tian et al. [15]. This confirms the reliability of the proposed learning model and thus enables us to apply it to a numerical evaluation of the effect of the SWH policy. The outcomes of applying the SWH policy are presented in Figures 5 and 6, depicting departure time profiles and the corresponding commuting costs, respectively. Here, the results are obtained from a typical SWH policy with $a_1^* = 30$, $a_2^* = 40$, and $\Phi(0.5, 0.5)$, with the other parameters at their default values. The staggered time interval is 10 bus runs, i.e., 50 minutes, given that the time headway is $H_{bus} = 5/60$ (h).
From Figures 5 and 6, we draw the following findings:
(1) The SWH policy does influence commuters' departure time choices and alters the cumulative departure flows. The profile of the cumulative number of departures exhibits a double-peak shape. The two peaks are at the work start times of $a_1^* = 30$ and $a_2^* = 40$, respectively.
(2) The SWH-induced equilibrium is identified, where commuters departing from the same station with the same work start time encounter identical minimal costs, regardless of the bus run they choose. As shown in Figure 6, the cost variation among commuters from the same group at the same station is small.
(3) This segregation ultimately leads to a reduction in the mean commuting cost. For instance, when the SWH policy is implemented, the mean commuting cost for commuters from station $H_1$ is 10.58, compared to 14.26 when the SWH policy is not implemented.
It can be expected that as the staggered time interval increases, commuters from the two groups will gradually become more separated. When the time interval is sufficiently large, commuters from the two groups will not share the same bus run. To elucidate this segregation effect, we introduce a new index to measure the degree of mixing between the two categories of commuters.
Here, the mixed degree $\sigma$ is defined as follows:

$$\sigma = \frac{\operatorname{card}(X_1 \cap X_2)}{\operatorname{card}(X_1 \cup X_2)},$$

where $X_1$ and $X_2$ are the sets of indices of the buses serving the commuters from Groups 1 and 2, respectively. Thus, $\operatorname{card}(X_1 \cap X_2)$ indicates the number of buses that are shared by both groups and $\operatorname{card}(X_1 \cup X_2)$ refers to the total number of utilized buses. When $\sigma = 0$, commuters from the two groups are totally separated from each other; when $\sigma = 1$, all buses are shared by commuters from the two groups.
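The mixed degree is the ratio of shared buses to utilized buses, which can be computed directly with set operations. The sketch below is illustrative only: the function name is hypothetical, and returning zero when no bus is used at all is an assumed convention that the text does not address.

```python
from typing import Iterable


def mixed_degree(buses_group1: Iterable[int], buses_group2: Iterable[int]) -> float:
    """Share of utilized bus runs that carry commuters from both groups."""
    x1, x2 = set(buses_group1), set(buses_group2)
    used = x1 | x2           # all bus runs used by at least one group
    if not used:
        return 0.0           # assumed convention: no buses used means no mixing
    return len(x1 & x2) / len(used)


# Hypothetical bus-run indices used by each group.
sigma = mixed_degree([28, 29, 30, 31], [30, 31, 32, 33])
```

Here two of the six utilized runs are shared, so the mixed degree is one third; disjoint sets give 0 and identical sets give 1, matching the two limiting cases described above.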
Figure 7 shows how the ratio $\sigma$ varies with the travel demand division $\rho$ and the staggered time interval $\Delta t$. Note that, for a given travel demand division, there is a critical staggered time interval that divides the curve of the mixed ratio into two regions: a volume-mixed region and a volume-separated region.
In the volume-mixed region, the value of the mixed ratio $\sigma$ decreases as the staggered time interval increases. When the staggered time interval is larger than the aforementioned critical value, $\sigma$ no longer depends on the staggered time interval and stays at its minimum value of 0. This means that in the volume-separated region, a commuter's decision is not influenced by the commuters from the other group. Moreover, the results suggest that the value of this critical staggered time interval depends on the travel demand division. Usually, it decreases with the value of $\rho$. That is to say, a smaller staggered time interval is enough to separate the two groups if the volume proportion of the group with the earlier work start time (i.e., Group 1) is larger.

Optimal Design.
For the SWH policy, the relationship between the staggered time interval and the proportion of the staggered population needs to be determined appropriately to reduce the in-vehicle crowding. Depending on whether physical distancing is enforced, we solve the SWH policy design problem in a normal case (in Section 4.3.1) and in a pandemic outbreak case (in Section 4.3.2).

Normal Period.
There is no passenger flow restriction in the normal period, and each bus can serve passengers up to its maximum capacity, i.e., $C_{bus} = 80$ persons. Three types of costs (the mean total commuting cost, the in-vehicle crowding cost, and the schedule delay cost) are calculated by altering the staggered time interval and the demand division. The results are illustrated in Figure 8. Here, the mean values of the three related costs are calculated by averaging the costs of all commuters in the transit system.
As indicated in Figure 8(b), achieving the minimum crowding cost requires satisfying two conditions: (1) ensuring a sufficiently large staggered time interval, and (2) equally dividing the staggered population proportion. To obtain an exact solution, we address the following two subproblems: (I) determining if there exists an optimal time interval that minimizes crowding costs for a given demand division and (II) establishing whether there is an optimal division of commuters that minimizes crowding costs when the staggered time interval is specified.

(1) Fixing the Demand Proportion of the Two Groups.
Figure 9 shows the relationship between the mean crowding cost and the staggered time interval $\Delta t$ for six selected travel demand divisions. The curve labeled by $\rho = 1.0$ stands for a special case of the SWH policy where the two groups have the same work start time. This is used as the baseline for comparison.
One can draw the following conclusions from Figure 9:
(1) There are two critical values of $\Delta t$ that divide the crowding cost curve into three regions: a policy-failure region, a cost-reduction region, and a minimum cost region. We distinguish the above two critical values by $\Delta\underline{t}$ and $\Delta\bar{t}$.
(2) Let $\Delta\dot{t}$ denote the optimal staggered time interval, where the minimum cost is achieved by the smallest staggered time interval. By definition, $\Delta\dot{t} = \Delta\bar{t}$. We find a tight relationship between the value of $\Delta\dot{t}$ and the value of the demand mixed ratio $\sigma$. Recall that in Figure 7, the volume relationships of the two groups fall into two cases: a volume-mixed region and a volume-separated region. These two regions are separated exactly by the critical value $\Delta\dot{t}$. The minimum crowding cost is achieved by separating the two classes of commuters until they travel independently, i.e., in a volume-separated region. When the staggered time interval is smaller than $\Delta\dot{t}$, the two groups are in a volume-mixed region and their departure time decisions affect each other. A simple case is provided in Figure 10 with demand division $\rho = 0.5$.
(3) The optimal staggered time interval depends on the quantitative relationship between the two staggered groups. Generally, the optimal staggered time interval decreases as the demand division increases.
(4) The minimum cost is also sensitive to the demand division. Owing to the symmetry of the demand division, complementary divisions such as $\rho = 0.1$ and $\rho = 0.9$ yield the same minimum cost. Moreover, among all of the divisions, the minimum crowding cost is achieved when $\rho = 0.5$.
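The critical intervals described above can be read off a simulated cost curve in a few lines. The sketch below is one possible reading, under stated assumptions: the function name is hypothetical, the cost at the smallest interval is taken as the no-policy baseline, and the sample numbers are made up for illustration and are not results from the paper.

```python
from typing import Sequence, Tuple


def critical_intervals(intervals: Sequence[float], costs: Sequence[float],
                       tol: float = 1e-9) -> Tuple[float, float]:
    """Return (lower, upper) critical staggered intervals of a crowding cost curve.

    lower: largest interval up to which the cost still equals the baseline
           (end of the policy-failure region).
    upper: smallest interval at which the minimum cost is reached
           (the optimal staggered time interval).
    """
    pairs = sorted(zip(intervals, costs))
    baseline = pairs[0][1]                 # assumed: smallest interval is the baseline
    minimum = min(c for _, c in pairs)
    lower = pairs[0][0]
    for dt, c in pairs:
        if abs(c - baseline) <= tol:
            lower = dt                     # cost has not dropped yet
        else:
            break
    upper = next(dt for dt, c in pairs if c <= minimum + tol)
    return lower, upper


# Made-up cost curve: flat, then decreasing, then flat at the minimum.
lower, upper = critical_intervals([0, 5, 10, 15, 20], [9.0, 9.0, 7.5, 6.0, 6.0])
```

On this made-up curve the policy has no effect up to an interval of 5, and the minimum cost is first reached at 15, so any interval beyond 15 brings no further reduction in crowding.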
(2) Fixing the Staggered Time Interval.
Figure 11 presents the mean crowding costs as a function of the demand division $\rho$ for six given staggered time intervals. The curve with $\Delta t = 0$ represents a special case where the SWH policy is not implemented, serving as the baseline for comparison. One can reach the following conclusions from Figure 11:
(1) There are two critical demand divisions that divide the crowding cost curve into three regions: a policy-failure region, a cost-reduction region, and a cost-increase region. For clarity, the above two critical values are denoted by $\underline{\rho}$ and $\bar{\rho}$, with $\underline{\rho} \le \bar{\rho}$. In the policy-failure region, i.e., $\rho \in [0, \underline{\rho}]$, the mean crowding cost does not depend on the staggered time interval, and the SWH policy fails to mitigate in-vehicle congestion. Then, a further increase in the demand division $\rho$ will ease in-vehicle congestion. At the value $\bar{\rho}$, the minimum cost is reached. If the demand proportion is larger than $\bar{\rho}$, in-vehicle congestion increases. Taking $\Delta t = 1$ as an example, the two critical values of demand division are $\underline{\rho} = 0.3$ and $\bar{\rho} = 0.8$, respectively. However, if the staggered time interval is larger, the crowding cost is quite sensitive to the demand division, and the policy-failure region no longer exists.
(2) When the staggered time interval is predetermined, the optimal division of the two categories of commuters is defined as the division at which the minimum in-vehicle crowding cost is reached. Let $\dot{\rho}$ indicate the optimal division. This means that $\dot{\rho} = \bar{\rho}$.
(3) The optimal demand proportion $\dot{\rho}$ is sensitive to the staggered time interval. Generally, as the staggered time interval increases, the optimal demand proportion $\dot{\rho}$ tends to decrease. For example, $\dot{\rho} = 0.7$ when $\Delta t = 5$, while $\dot{\rho} = 0.6$ when $\Delta t = 10$. However, when the staggered time interval is large enough, the value of $\dot{\rho}$ stabilizes and ceases to decrease further, settling at a value of 0.5. For instance, the optimal demand proportion for $\Delta t = 20$ and $\Delta t = 15$ is identical, with the same value of $\dot{\rho} = 0.5$.
(4) The minimum cost is also sensitive to the staggered time interval. The minimum cost decreases as the staggered time interval increases. However, once the staggered time interval surpasses a critical value, the minimum cost no longer decreases.
In summary, the staggered time interval and the volume division are two controllable parameters for designing a SWH policy. From the point of view of local optimization, we find the scenario-dependent optimal value of one control variable that minimizes the crowding cost, assuming that the other control variable is predetermined.
Figure 12 illustrates the efficient control regions for designing an efficient SWH policy.
(i) Given the travel demand division, as Figure 12(a) suggests, the staggered time interval should be within its upper and lower boundaries. A staggered time interval that is smaller (or larger) than its lower boundary (or upper boundary) will not relieve in-vehicle congestion.
(ii) In the same way, when the staggered time interval is predetermined, the travel demand division needs to be selected within the crowding cost reduction region, as shown in Figure 12(b).
From the point of view of system optimization, the optimal combination of the staggered time interval and the volume division should be set to achieve the minimum crowding cost. As shown in Figure 12, the volume division needs to be set at $\dot{\rho} = 0.5$ and the staggered time interval at $\Delta\dot{t} = 15$.
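The system-level choice can be pictured as a search over a grid of the two control variables. The sketch below assumes a table of mean crowding costs indexed by (ρ, Δt) and, among the combinations that reach the minimum cost, prefers the smallest staggered time interval. The function `toy_cost` is a made-up surface that only imitates the qualitative pattern reported above; it is not the simulated data behind Figure 12.

```python
def system_optimum(cost, rhos, dts, tol=1e-9):
    """Return (rho, dt) with minimum cost, breaking ties by the smallest dt.

    cost: dict mapping (rho, dt) -> mean crowding cost.
    """
    best = min(cost[(r, d)] for r in rhos for d in dts)
    # Store candidates as (dt, rho) so that min() prefers the smallest dt.
    candidates = [(d, r) for r in rhos for d in dts
                  if cost[(r, d)] - best <= tol]
    d, r = min(candidates)
    return r, d


def toy_cost(rho, dt):
    """Hypothetical cost surface: saturates in dt, symmetric in rho."""
    return 1.0 - 0.8 * (min(dt, 15) / 15) * 4 * rho * (1 - rho)


rhos = [i / 10 for i in range(1, 10)]
dts = [0, 5, 10, 15, 20]
cost = {(r, d): toy_cost(r, d) for r in rhos for d in dts}
print(system_optimum(cost, rhos, dts))
```

The tie-breaking rule mirrors the definition of the optimal staggered time interval as the smallest interval at which the minimum cost is achieved.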

Pandemic Outbreak Period. This section examines the supplementary impact of physical distancing on public transport services when the SWH policy is implemented. In this analysis, we presume that the physical distancing policy limits bus capacity to 50% of total occupancy, i.e., $C_{\mathrm{bus}} = 40$ (persons). Amid the COVID-19 pandemic, overall transport demand decreased significantly due to community activity restrictions. Nevertheless, for comparison, we assume that the total demand remains the same as in the prepandemic scenarios (as discussed in Section 4.3.1). Figure 13 shows the joint impact of the travel demand division ρ and the staggered time interval Δt on the mean commuting cost and its two components (the in-vehicle crowding cost and the schedule delay cost). Compared to the scenario with $C_{\mathrm{bus}} = 80$, physical distancing leads to an additional reduction in in-vehicle congestion costs. However, due to the limited boarding constraint, certain commuters have to shift their departure times, either earlier or later, to take a bus, which raises schedule delay costs. On average, the decrease in crowding costs fails to offset the rise in schedule delay costs, so total commuting costs increase overall.
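The limited boarding constraint can be illustrated with a minimal sketch of a single bus run on a many-to-one line, assuming passengers board in station order, nobody alights before the destination, and commuters who cannot board must pick another run. The function name and the waiting counts are hypothetical.

```python
def run_bus(waiting, capacity):
    """Board passengers station by station, upstream station first.

    waiting:  commuters who want this bus at each station.
    capacity: maximum number of passengers on board.
    Returns (boarded, left_behind), one entry per station.
    """
    onboard = 0
    boarded, left_behind = [], []
    for w in waiting:
        b = min(w, capacity - onboard)  # only the remaining seats can be used
        onboard += b
        boarded.append(b)
        left_behind.append(w - b)
    return boarded, left_behind


waiting = [15, 25, 15, 10]       # hypothetical demand for one run
print(run_bus(waiting, 80))      # default capacity
print(run_bus(waiting, 40))      # capacity under physical distancing
```

With the halved capacity, the bus in this example fills up at the second station, so the commuters left behind are all at the downstream stations.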
To delve deeper into the combined impact of the SWH policy and physical distancing, we assess the changes in the three types of costs incurred by commuters who board at the same station and have identical work start times. Two cases are discussed, i.e., Δt = 5 and Δt = 15, and the results are illustrated in Figures 14(a) and 14(b), respectively. Here, we set the demand proportion to ρ = 0.5, which means that the commuters are equally divided into two groups at each station.
We draw the following conclusions from Figure 14:
(1) Commuters from the downstream stations, i.e., stations $H_3$ and $H_4$, are significantly affected by the physical distancing measures. These commuters need to depart earlier or later to avoid a fully loaded bus. The considerable rise in schedule delay costs outweighs the benefits gained from reduced in-vehicle crowding. This leads to a significant surge in the total commuting cost.
(2) When the staggered time interval is relatively small, i.e., Δt = 5, commuters from Group 2 (with a later work start time) at downstream stations suffer much higher commuting costs than those from Group 1. The rise in total commuting cost is mainly driven by the increase in schedule delay costs. However, the difference between the two groups disappears when the staggered time interval is large enough. As indicated in Figure 14(b), commuters from the two groups have almost the same cost.
Finally, we provide the optimal parameter settings for designing an efficient SWH policy under the requirement of physical distancing. Diagrams of the efficient control boundaries are given in Figure 15. The combined effect of the SWH policy and physical distancing changes the efficient control boundaries; however, the difference from Figure 12 is insignificant. Specifically, the policy-failure region is slightly smaller. In terms of system optimization, the optimal demand proportion should be set to ρ = 0.5 and the staggered time interval to $\Delta\dot{t} = 15$. These values are identical to the case where the SWH policy is implemented under normal conditions, i.e., $C_{\mathrm{bus}} = 80$.

Two factors are considered when evaluating the risk of infection. One factor is the level of physical contact between passengers, which is clearly related to crowd density. Typically, a higher number of physical contacts (or greater crowd density) implies a higher risk of infection to some extent. The second factor is the duration of physical contact. Longer durations in a crowded environment increase the probability of passengers getting infected.
A feasible way to assess this risk is to use simulation technology that describes a social-activity contact network and simultaneous disease transmission (Mo et al. [26]). However, we do not consider such a method because its analyses are complex and tedious and it introduces more parameters. Moreover, we lack actual data to calibrate the model parameters.
Here, we adopt the value of in-vehicle crowding as a surrogate index for the risk of infection when commuting on a bus line. This is reasonable since in-vehicle crowding is defined as a function of the degree of crowding and the in-bus time, which are the two critical factors for evaluating the risk of infection. The risk of infection increases if commuters travel on a more crowded bus, and it is much higher if they travel for longer distances.
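To make the surrogate index concrete, the sketch below accumulates a crowding cost over the segments a commuter rides, using the crowding function g(n) = 0.35n (RMB/h) and the travel times τ from the parameter settings. The summation form, the assumption that τ[k] is the travel time of the segment after station k, and the boarding numbers are illustrative assumptions rather than the paper's exact formulation.

```python
from itertools import accumulate

TAU = (0.2, 0.2, 0.3, 0.1)  # segment travel times in hours (Section 4.1)


def g(n):
    """Crowding cost rate in RMB/h for n passengers on board."""
    return 0.35 * n


def crowding_cost(boarded, station, tau=TAU):
    """Crowding cost for a commuter boarding at `station` (0-based).

    boarded: passengers boarding this bus at each station.
    The on-board load on segment k is the cumulative boarding up to k.
    """
    loads = list(accumulate(boarded))
    return sum(g(loads[k]) * tau[k] for k in range(station, len(tau)))


boarded = [10, 20, 10, 10]                 # hypothetical boarding counts
upstream = crowding_cost(boarded, 0)       # rides all four segments
downstream = crowding_cost(boarded, 3)     # rides only the last segment
print(round(upstream, 2), round(downstream, 2))
```

In this example the upstream commuter accumulates about 8.75 RMB and the downstream commuter about 1.75 RMB, which reflects both factors: how crowded the bus is and how long the commuter stays on it.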
By this definition, we conclude that the SWH policy provides a significantly safer commuting environment for public transit in terms of the risk of infection. However, transit safety benefits are not uniformly distributed throughout the bus schedule. Buses near the work start times are typically fully loaded and fail to meet the requirements of physical distancing. In contrast, buses that depart earlier or later than the scheduled work start time are safer because they carry fewer passengers. To illustrate this point, Figure 16 reports the number of utilized buses and the number of infection-safe buses. Here, an infection-safe bus is one on which the number of onboard passengers is no more than 50 per cent of its maximum occupancy. We take the case without SWH as the baseline for comparison, i.e., ρ = 0 and Δt = 0. In the case without SWH, the mean number of utilized buses is 22.81, of which a mean of 13.43 buses are safe. When implementing the SWH policy with parameter settings of ρ = 0.5 and Δt = 15, the mean number of utilized buses is 31.51 and the mean number of safe buses is 23.44.
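The counting rule for infection-safe buses can be sketched as follows, assuming that a utilized bus is one that carries at least one passenger and that each bus is summarized by its peak onboard load. Both assumptions, the function name, and the sample loads are hypothetical.

```python
def count_buses(peak_loads, capacity=80, safe_share=0.5):
    """Return (number of utilized buses, number of infection-safe buses).

    peak_loads: peak number of onboard passengers for each bus run.
    A bus is infection-safe if its load is at most safe_share * capacity.
    """
    utilized = [n for n in peak_loads if n > 0]
    safe = [n for n in utilized if n <= safe_share * capacity]
    return len(utilized), len(safe)


peak_loads = [0, 12, 40, 41, 80, 0, 25]  # made-up loads for seven runs
print(count_buses(peak_loads))
```

Averaging these two counts over repeated simulation runs would give quantities comparable to the means reported above.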
On top of the SWH policy, enforcing physical distancing (or limiting the maximum bus load) further reduces the risk of infection. Figure 17 shows the percentage reduction in infection risk achieved by combining SWH with physical distancing, relative to implementing SWH alone. It is crucial to note that physical distancing has a significant impact only when the staggered time interval is small. If the staggered time interval is large, the bus load factor (i.e., the mean number of passengers per bus) is already greatly reduced. In this case, physical distancing does not dramatically affect the bus load factor, so its effectiveness is limited. For example, in Figure 17, infection risk decreases by 13.7% when the staggered time interval is Δt = 2, whereas it declines by only 2.54% when the staggered time interval is Δt = 18.

Conclusions
This study examined how the SWH policy affects commuting patterns during peak hours. It focused on a simple bus route with multiple origins and a single destination. Commuters' daily departure time choices were simulated using a multiagent-based Q-learning model. In this model, the regret value served as the signal for reinforcement learning, guiding individuals in making optimal choices for their departure times. The study explored the effects of SWH on commuting costs and on the time-space distribution of departure flows. Results indicated that a well-designed SWH policy influences commuters' departure time choices, leading to a deconcentration of the temporal distribution of travel demand. Notably, a new SWH-induced equilibrium is achieved, where commuters departing from the same station with the same work start time experience identical minimal costs, regardless of their choice of bus.
Concerning the design of an effective SWH policy, the following conclusions are drawn. First, given the division of travel demand, the minimum in-vehicle crowding is achieved when the staggered time interval surpasses a certain threshold. Second, given a staggered time interval, in-vehicle crowding is reduced by properly adjusting the division proportion of the two groups. These conclusions can be extended to situations involving physical distancing during epidemic outbreaks. It is worth noting that the SWH policy also contributes to lowering the risk of infection during such periods.
In this study, we focused solely on the benefits commuters gain from adopting the SWH policy. We neglected the decisions of firms to impose start times (arrival times) on their employees. For firms, the optimal design should not significantly deviate from the initial work schedules and should yield minimal changes (Yildirimoglu et al. [27]). This is because implementing substantial changes in work schedules may reduce positive production externalities. Thus, a SWH policy should mitigate congestion on public transit networks while reducing the impact on enterprise productivity as much as possible. This topic will be considered in future studies. Another meaningful extension is to replace the current simplified bus line model with a more realistic one that takes into account the stochastic nature of public transport operations. This will allow us to explore how travel choice behavior influences the overall reliability of the bus line. In addition, it would be interesting to investigate the combined effect of the SWH policy and other bus operating methods, such as stop-skipping and limited boarding, on the overall efficiency of the bus system.

Figure 1: Bus line with multiple board-only stations and a single destination station.

Figure 2: Scheme of the multiagent-based learning process.

4.1. Parameter Settings. The simulation conditions are set as follows: S = 4 stations, M = 50 buses, τ = (0.2, 0.2, 0.3, 0.1) (h), $n_1 = 200$, $n_2 = 350$, $n_3 = 200$, and $n_4 = 150$ (persons). According to Tian et al. [15], g(n) = 0.35n (RMB/h) and (β, c) = (10, 30) (RMB/h). The default bus capacity and time headway are set to $C_{\mathrm{bus}} = 80$ (persons) and $H_{\mathrm{bus}} = 5/60$ (h), unless otherwise stated. For the SWH policy, we fix the work start time of Group 2 to $a_2^* = 40$. Then, the control parameters are the travel demand division ρ and the work start time of Group 1. Here, the proportion ρ is identical for the passengers boarding at all the stops. In each test, the number of iterations of the learning model is set to T = 2000. The mean values of the related costs are calculated over 50 repetitions to guarantee accuracy.
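For readers who want to reproduce the setup, the settings above can be collected in one place. The sketch below is only an illustration: the class and function names are hypothetical, ρ is assumed to be the share of Group 1 at every station, and the work start time of Group 1 is assumed to be that of Group 2 minus the staggered time interval Δt, in the same time unit as $a_2^*$.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Settings:
    tau: Tuple[float, ...] = (0.2, 0.2, 0.3, 0.1)   # travel times (h)
    demand: Tuple[int, ...] = (200, 350, 200, 150)  # n_1..n_4 (persons)
    n_buses: int = 50                               # M
    capacity: int = 80                              # C_bus (persons)
    headway_h: float = 5 / 60                       # H_bus (h)
    a2_star: float = 40                             # work start time, Group 2
    iterations: int = 2000                          # T
    repetitions: int = 50


def split_demand(s: Settings, rho: float) -> List[Tuple[int, int]]:
    """Per-station (Group 1, Group 2) sizes; rho = assumed Group 1 share."""
    sizes = []
    for n in s.demand:
        g1 = round(rho * n)
        sizes.append((g1, n - g1))
    return sizes


def work_start_group1(s: Settings, dt: float) -> float:
    """Assumed work start time of Group 1: dt earlier than Group 2."""
    return s.a2_star - dt


s = Settings()
print(sum(s.demand), split_demand(s, 0.5), work_start_group1(s, 15))
```

Keeping the settings immutable makes it harder for a repetition to change them by accident between runs.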

Figure 3: Commuters' departure time distribution when SWH policy is not implemented.

Figure 4: Box chart of commuters' commuting costs from the same station when SWH policy is not implemented.

Figure 8: SWH policy: diagrams of the three related costs in the space of the travel demand division ρ and staggered time interval Δt. (a) Total commuting cost. (b) In-vehicle crowding cost. (c) Schedule delay cost.

Figure 9: Mean crowding costs as a function of the staggered time interval for six selected demand divisions.

Figure 12: Diagram of efficient control boundaries when only SWH policy is implemented. (a) Fixing the travel demand division. (b) Fixing the staggered time interval.

Figure 13: SWH-distancing policy: diagrams of the three related costs in the space of the travel demand division ρ and staggered time interval Δt. (a) Total commuting cost. (b) In-vehicle crowding cost. (c) Schedule delay cost.

Figure 15: Diagram of efficient control boundaries when both SWH and physical distancing policies are implemented. (a) Fixing the travel demand division. (b) Fixing the staggered time interval.

Figure 16: The number of utilized buses and safe buses when SWH policy is implemented.

Figure 17: Percentage reduction in infection risk by enforcing physical distancing based on the SWH policy.
.1. SWH Policy. As a preliminary analysis, we here consider a simple scenario with two work start times. Then, the control variables relevant to the SWH policy are (i) the demand proportion of the two groups and (ii) the staggered time interval of the two groups. Formally, bus commuters are divided into two groups: (1) commuters in Group 1 have the same work start time a *