Agent-Based Modeling and Simulation for the Bus-Corridor Problem in a Many-to-One Mass Transit System

With the growing problem of urban traffic congestion, departure time choice is becoming an increasingly important factor for commuters. Using multiagent modeling and the Bush-Mosteller reinforcement learning model, we simulated the day-to-day evolution of commuters' departure time choice on a many-to-one mass transit system during the morning peak period. We first verified the model by comparison with traditional analytical methods and then investigated how the departure time equilibrium forms. With the model verified, some initial assumptions were relaxed and two groups of experiments were carried out to consider commuters' heterogeneity and memory limitations. The results showed that the departure time distribution of heterogeneous commuters is broader and has a lower peak at equilibrium, and that different types of commuters behave in different patterns. When each commuter has a limited memory, fluctuations exist in the evolutionary dynamics of the system, and hence an ideal equilibrium can hardly be reached. This research helps in acquiring a better understanding of commuters' departure time choice and the commuting equilibrium of the peak period; the approach also provides an effective way to explore the formation and evolution of complicated traffic phenomena.


Introduction
The agglomeration of commuters has resulted in severe traffic congestion. Departure time should be considered when congestion exists. The "corridor problem" is crucial in the study of commuters' departure time. The core of this problem is the commuters' departure pattern at equilibrium, at which no commuter can experience a lower commuting cost by departing at a different time. Vickrey [1] was the first to depict a similar problem. In his model, a road with limited capacity connects residential locations and workplaces; commuters drive daily from home to work and minimize their commuting costs by departing at a suitable time. Commuting cost includes the costs of travel time, schedule delay, and queuing time. All commuters' costs are equal when equilibrium is reached. This model, which is called the "bottleneck model," analyzes the mechanism of commuters' departure time choice succinctly and directly. The "bottleneck model" has been extensively studied in the past decades. Studies have considered different attendance times [2], elastic traffic demand [3], and so on. Ramadurai et al. [4] summarized related studies.
The "corridor problem" assumes that all people commute by driving. However, the use of private cars results in serious congestion and pollution. Public transportation priority has become an efficient method to solve these problems. In the 1990s, researchers studied the departure time choice of commuters who utilize mass transit in peak periods, which is termed the "bus-corridor problem." Sumi et al. [5] proposed an optimization model of commuters' departure time and route choice in mass transit systems, in which a commuter's departure time is mainly determined by his or her scheduled arrival and the operational features of the system. Arnott and DePalma [6] conjectured a departure pattern of equilibrium; however, the pattern was not proven because of a discretization problem. Huang et al. [7,8] modeled commuters' feelings of discomfort induced by crowding. Sumalee et al. [9] proposed a stochastic dynamic model with an explicit seat allocation process. Tian et al. [10] analyzed the equilibrium properties of a variant of the "bus-corridor problem"; in-vehicle crowding and schedule delay cost are considered in the model, and the numerical examples are consistent with the pattern of departure conjectured by Arnott and DePalma [6].
The "bus-corridor problem" is generally considered a variant of the "corridor problem," in which users commute through a transit line that connects residential locations and workplaces. However, several differences exist between the two problems. First, commuters select an alternative from a discrete decision space, which is why Arnott and DePalma [6] conjectured a departure pattern of equilibrium but failed to prove it. Second, an additional commuter on a given train generally increases crowding in that train but has no effect on the speed of the train or other trains, whereas an additional vehicle on a given road generally slows down other vehicles and causes time loss to subsequent road users. Third, the "bus-corridor problem" is a more realistic problem than the "corridor problem," particularly in large cities, such as Beijing, Hong Kong, London, and New York, where many users go to work by using the transit system. Several policies, such as those for ticket pricing, line frequency, and service level, can be investigated in the context of the "bus-corridor problem."
The "corridor problem" and "bus-corridor problem" attempt to depict the morning commute problem. Numerous studies have been conducted to investigate the decision-making process of commuters. To examine the effect of incentives on commuters' morning commute decision, Zhang et al. [11] conducted an empirical study on the Beijing subway system and recommended two policies to reduce congestion during peak periods. With regard to morning commute modeling, Maio et al. [12] presented a day-to-day route choice model to study the influence of experience on commuters' behavior. Further studies on day-to-day models of route choice behavior in response to previous travel experience can be found in Watling's work [13]. Ben-Akiva et al. [14] presented an elegant modeling framework that incorporates behavioral models of drivers' route and departure time choices, in which drivers follow a bounded rational decision rule.
Several studies have focused on individual differences among commuters. By incorporating the notion of punctuality reliability, Siu and Lo [15] found that risk-averse travelers (those with high punctuality reliability) opt to depart from home at an early time; this finding was confirmed by an empirical study. References [16,17] investigated a type of users with bounded rationality, specifically, "oblivious users." References [18-20] studied the morning commute problem within the context of heterogeneous commuters who have different value-of-time or early/late arrival penalty parameters.
Other studies have investigated some distinct phenomena. Xiao et al. [21] studied the "tactical waiting" phenomenon where travelers find it advantageous to delay reaching the bottleneck by slowing down or waiting. Feng et al. [22] developed a travel cost function with constraints of train capacity to depict the phenomenon where commuters have to wait at platforms for several scheduled intervals because of large passenger flow volume in the morning commute.
Review of the related literature indicates that modeling the behavioral pattern of travelers is a critical factor when dealing with the morning commute problem. Most existing traffic models that incorporate departure time choice are analytical models. A microsimulation approach is particularly attractive because of its ability to describe various behavioral hypotheses of an individual. Ettema et al. [23] proposed a microsimulation model system, in which a mental model of travelers' travel circumstances was utilized to make the departure time decision. To deal with the transit assignment problem (TAP), the authors of [24,25] proposed a multiagent learning-based approach and demonstrated the flexibility of the multiagent method in representing the different views of TAP. Wahba and Shalaby [26] took a significant step toward the advancement of TAP modeling by providing the operational integrated dynamic modeling framework MILATRAS; departure time and path choices are considered in the framework. MILATRAS has been applied to a large-scale real-world transit network and exhibited promising predictability [27]. The core of MILATRAS is a departure time and transit path choice model based on the Markovian decision process, which can be found in [28].
Microsimulation is a method of "bottom-up" modeling. This process has inherent superiority because of its capability to depict individual behavior and the interaction between the system level and the individual. This process is adept at modeling the nonlinearity of systems, such as the emergence of congestion; thus, it is a proper method of dealing with the "corridor problem." Another benefit of utilizing microsimulation is that new behavioral principles in psychology or economics can be flexibly incorporated instead of the utility-maximizing assumption in traditional analytical models.
Although the "corridor problem" is only concerned with travelers' departure time choice, it remains an interesting problem worthy of attention. First, this problem is a critical issue when the aim is to optimize the performance of a transit system. Second, the many perceptual factors (e.g., in-vehicle crowding) or details of the commuting process (e.g., seat allocation) that shape the behavioral pattern of travelers result in the emergence of various phenomena in the "corridor problem." Tian et al. [10] investigated the equilibrium properties of a type of the "corridor problem" by considering in-vehicle crowding and schedule delay; Tian's model is thus an ideal reference point in verifying the feasibility of an agent-based approach.
Based on Tian's model, this study aims to establish a departure choice model that utilizes an agent-based approach to understand the departure time equilibrium during peak periods. Commuters are expected to adjust their behavior according to their experience with the performance of the transit system and base their daily travel decisions on the accumulated experience gathered from repetitively traveling through the transit network on consecutive days. This concept is similar to that implemented by Ettema et al. [23]. In the present study, a transit line that links residential locations and workplaces may emerge when the transit line serves a concentric city where all commuters are assumed to work in a highly compact city center and live in dispersed surrounding suburban areas. To concentrate on the mechanism of departure time choice, we did not consider commuters' route choice. Thus, demand is exogenous. We also assumed that urban transit is the only option for all commuters. An agent-based approach is presented, in which reinforcement learning is adopted to represent passengers' adaptation and account for the dynamics in commuters' behavioral pattern.
Many researchers have suggested that a microsimulation approach is highly suitable when modeling individuals in complex or real-life systems (see [29-32]). Our study supports this view. With the same parameter settings and assumptions, our model produced results in line with those in Tian's work. Additionally, the model offers insight into the commuter characteristics that lead to system equilibrium. When some of its assumptions are relaxed or features are added to it, the agent-based approach remains flexible in characterizing complicated circumstances. Two realistic conditions, commuters' heterogeneity and commuters' limited memory (indicating bounded rationality), were investigated. Results show that, in the heterogeneity case, the distribution of commuters' departure time is broad at equilibrium. In the case of limited memory, commuters present a certain degree of irrationality and randomness, and ideal equilibrium is difficult to reach.
Section 2 states the problem and presents the model and learning algorithm. Section 3 presents the verification of the proposed model. Section 4 describes two subsequent experiments, in which several idealistic assumptions are relaxed. This section also contains our analysis. Section 5 provides the conclusion.

Commuters' Departure Time Choice Model Based on a Multiagent Approach
A bus line with multiple origins and a single destination (Figure 1) was considered in accordance with Tian's research [10]. Buses set out from the furthest residential location and run toward the workplace.
Figure 1: The bus line with multiple origins and a single destination.
All commuters from the same station have uniform bus fare and in-vehicle travel time. Thus, these two factors would not affect the commuters' departure time choice. Commuters have to make a trade-off between crowding cost and arrival penalty. We assumed that all commuters attempt to minimize their individual total commuting costs by selecting their departure times (or bus services). Traditional theory treats the departure time choice of all commuters as a noncooperative game; that is, after a long period of adjustment, a state of equilibrium (user equilibrium) is reached. At equilibrium, the total commuting costs of commuters departing from the same station are identical and no one experiences a lower individual cost by unilaterally changing his/her departure time. Mathematically, this condition can be expressed as
$$TC_i^j \begin{cases} = TC_i, & \text{if } n_i^j > 0, \\ \ge TC_i, & \text{if } n_i^j = 0, \end{cases} \quad (1)$$
where $TC_i^j$ is the commuting cost of taking bus $j$ from station $H_i$, $TC_i$ is the equilibrium commuting cost from station $H_i$ to workplace $W$, and $n_i^j$ is the number of commuters from station $H_i$ taking bus $j$. This equation indicates that if some commuters departing from $H_i$ utilize bus $j$, then their individual commuting cost is equal to the equilibrium commuting cost; otherwise, the commuting cost by bus $j$ is not less than the equilibrium cost.
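The equilibrium condition can be checked numerically for one station. The following is a minimal sketch rather than the procedure used in this study; the function name, the list-based inputs, and the tolerance `tol` are assumptions introduced for illustration.

```python
from typing import Sequence


def is_user_equilibrium(costs: Sequence[float],
                        loads: Sequence[int],
                        tol: float = 1e-9) -> bool:
    """Check the user-equilibrium condition for a single station.

    costs[k] is the commuting cost of taking bus k from this station and
    loads[k] is the number of this station's commuters on bus k.
    (Hypothetical helper; names and tolerance are illustrative.)
    """
    used = [c for c, n in zip(costs, loads) if n > 0]
    if not used:
        return True  # nobody travels, so the condition holds trivially
    eq_cost = min(used)
    # Every bus that is actually used must cost the equilibrium cost ...
    if any(abs(c - eq_cost) > tol for c in used):
        return False
    # ... and no unused bus may be cheaper than the equilibrium cost.
    return all(c >= eq_cost - tol for c, n in zip(costs, loads) if n == 0)
```

With costs `[5, 3, 3, 4]` and loads `[0, 10, 12, 0]` the condition holds; it fails as soon as an unused bus is cheaper than a used one.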

Proposed Model.
Each commuter is considered an agent whose basic behavior pattern is to take the bus to work every day; the commuting cost is computed afterward. A commuter's knowledge of departure time choice is updated by reinforcement learning. The commuter agent is composed of several modules as shown in Figure 2.
The function of each module is explained as follows.
The perception module is utilized to perceive the external environment, such as in-vehicle crowding and early/late arrival penalty.
The memory module is utilized to save and extract previous commuting records.
The cost valuation module is utilized to calculate the commuting cost based on information from the perception module.
The experience module contains an agent's feeling and evaluation of the entire peak-period commuting. This module is updated continuously as the system evolves.
The learning mechanism is the core module that provides agents with intelligence and is utilized to update a commuter's experience and knowledge of peak-hour commuting. The Bush-Mosteller (BM) reinforcement learning model is adopted in this study.
The decision-making module determines which bus to take. Decisions are made using information from the experience module and the commuter's memory.

BM Learning Model.
The BM model is a stochastic model designed to analyze data with changing probabilities. BM is a classical learning model widely utilized to address different problems. Macy and Flache [33] applied this model to three classes of social dilemma and investigated cooperative equilibrium. Zhou et al. [34] designed a BM model-based power control algorithm to address cognitive radio network problems. Wynne [35] explored the BM learning model as an account of nonverbal transitive inference performance. In the BM model, a player can only make decisions by learning from his or her own experiences; choices and perceived utilities are private and cannot be known by others. Hence, the BM model is a fully distributed learning model. This feature fits our purpose well. Therefore, we modified the model and utilized it in the peak-period commuting context.
The BM model generally consists of a stochastic decision rule and a learning algorithm, in which the consequences of a decision create positive and negative stimuli (rewards and punishments). The stimuli update the probability ($p$) that the decision will be repeated. Choices that have resulted in satisfactory outcomes (i.e., outcomes that met or exceeded aspirations) tend to be repeated in the future, whereas choices that have resulted in unsatisfactory experiences will be avoided. The stochastic decision rule determines which strategy to select stochastically. Unlike the widely utilized $\varepsilon$-greedy action-choice model, in which a fixed parameter $\varepsilon$ is employed to determine whether to exploit the current experience (i.e., choosing the selection with the highest immediate reward with probability $1 - \varepsilon$) or explore other selections, the decision rule involves simple selection according to the probability of each strategy, which is updated by the stimuli every turn.
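The difference between the two action-choice rules can be sketched in a few lines. This is an illustrative sketch only: the function names, the example probability vector, and the seeded generator are assumptions and are not taken from the study.

```python
import random


def choose_by_probability(probs, rng=random):
    """BM-style rule: draw strategy k with probability probs[k]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def choose_epsilon_greedy(rewards, epsilon, rng=random):
    """Contrast case: explore uniformly with probability epsilon,
    otherwise exploit the strategy with the highest estimated reward."""
    if rng.random() < epsilon:
        return rng.randrange(len(rewards))
    return max(range(len(rewards)), key=lambda k: rewards[k])


rng = random.Random(42)  # seeded only to make the sketch repeatable
draws = [choose_by_probability([0.1, 0.7, 0.2], rng) for _ in range(10_000)]
share_of_bus_1 = draws.count(1) / len(draws)  # close to 0.7 for a large sample
```

Under the probability-based rule, exploration fades on its own as one strategy's probability approaches 1, whereas the greedy rule keeps exploring at the fixed rate set by `epsilon`.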
The BM model comprises concepts such as "payoff" and "aspiration." These concepts are used to calculate the stimulus. From the perspective of knowledge, players need to know their strategy set and all the potential utilities; they are assumed to have basic math skills to perform some calculations.
When the BM model is applied to the case of peak-period commuting, strategy $j$ means taking bus $j$ and $p_j$ stands for the probability of selecting strategy $j$. Hence, a commuter's strategy set is the bus service set and corresponds to a vector of probabilities (denoted by $\mathbf{p}$). The stochastic decision rule utilizes $\mathbf{p}$ to make a decision about which bus to take every day. Commuting cost $c_j$ is obtained from the cost valuation module previously mentioned. Figure 3 illustrates how the probability of strategy $j$ changes.
In the standard BM model, the premise for calculating stimulus $s_j$ is that players know all the potential costs (or payoffs). Therefore, they can calculate the maximum absolute value of the difference between all the possible costs and their aspirations. For example, in Macy's work on social dilemmas [33], stimulus $s_j$ is calculated as
$$s_j = \frac{\pi_j - A}{\sup\left[\,|T - A|,\ |R - A|,\ |P - A|,\ |S - A|\,\right]}, \quad (2)$$
where $\pi_j$ stands for the payoff of selecting strategy $j$, $A$ is the aspiration level, and $T$, $R$, $P$, $S$ are the four potential payoffs. In our context, a commuter is unable to know all the potential commuting costs. Thus, we utilized extreme historical costs to settle this issue and modified the calculation of stimulus $s_j$, which is expressed as
$$s_j = \frac{A - c_j}{\sup\left[\,|c_{\max} - A|,\ |c_{\min} - A|\,\right]}, \quad (3)$$
where $c_j$ is the average commuting cost of decision $j$, $A$ is the commuter's aspiration level, $c_{\max}$ is the highest commuting cost perceived (including the current turn), and $c_{\min}$ is the lowest commuting cost perceived (including the current turn). The denominator in (3) represents the supremum of the absolute value of the difference between any cost the commuter has ever perceived and his/her aspiration. With this scaling factor, the absolute value of $s_j$ never exceeds unity. By utilizing individual $c_{\max}$ and $c_{\min}$, the commuters' distributed learning pattern is enhanced.
Aspiration $A$ provides the reference point that separates positive from negative stimulus in calculating $s_j$. If the commuting cost is lower than the aspiration, the commuter receives a positive stimulus; otherwise, a negative stimulus is received.
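The modified stimulus can be sketched as follows: the difference between aspiration and cost, scaled by the largest gap between the aspiration and the extreme costs perceived so far. The function and argument names are assumptions, and returning zero when the scaling factor vanishes is a choice made for this sketch only.

```python
def stimulus(avg_cost: float, aspiration: float,
             c_max: float, c_min: float) -> float:
    """Scaled stimulus for a strategy (illustrative sketch).

    Positive when avg_cost is below the aspiration, negative otherwise.
    c_max and c_min are the highest and lowest costs perceived so far.
    """
    scale = max(abs(c_max - aspiration), abs(c_min - aspiration))
    if scale == 0:
        # Assumption of this sketch: all perceived costs equal the
        # aspiration, so the outcome is treated as neutral.
        return 0.0
    return (aspiration - avg_cost) / scale


# With aspiration 320 and perceived costs ranging from 280 to 400:
s_good = stimulus(300, 320, 400, 280)  # cost below aspiration, 20 / 80
s_bad = stimulus(400, 320, 400, 280)   # worst cost seen, -80 / 80
```

As long as `avg_cost` lies between `c_min` and `c_max`, the result stays within [-1, 1].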
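When strategy $j$ is selected, its corresponding probability $p_j$ is updated as follows:
$$p_{j,t+1} = \begin{cases} p_{j,t} + (1 - p_{j,t})\, l\, s_{j,t}, & s_{j,t} \ge 0, \\ p_{j,t} + p_{j,t}\, l\, s_{j,t}, & s_{j,t} < 0, \end{cases} \quad (4)$$
where $p_{j,t}$ represents the probability of strategy $j$ at day $t$, $l$ is the learning rate ($0 < l < 1$), and $s_{j,t}$ is the stimulus experienced after taking strategy $j$ at day $t$. To ensure that the sum of all probabilities is always equal to unity, the probabilities of the other strategies are updated by Equation (5), which applies to every strategy in the strategy set other than $j$.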
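One day of learning can be sketched as below. The update of the chosen strategy follows the standard Bush-Mosteller rule. The treatment of the remaining strategies is an assumption of this sketch: they are rescaled proportionally so that the vector still sums to one, which may differ from the exact rule used in this study. All names are illustrative.

```python
def bm_update(probs, chosen, stim, rate):
    """Return the probability vector after one Bush-Mosteller step.

    probs  : current probabilities, summing to 1
    chosen : index of the strategy taken today
    stim   : stimulus in [-1, 1] experienced for that strategy
    rate   : learning rate, 0 < rate < 1
    """
    p = probs[chosen]
    if stim >= 0:
        new_p = p + (1 - p) * rate * stim   # reward: move toward 1
    else:
        new_p = p + p * rate * stim         # punishment: move toward 0

    rest = 1 - p  # probability mass held by the other strategies
    updated = []
    for k, q in enumerate(probs):
        if k == chosen:
            updated.append(new_p)
        elif rest > 0:
            # Assumed rule: rescale the others proportionally.
            updated.append(q * (1 - new_p) / rest)
        else:
            # The chosen strategy held all the mass; share the rest evenly.
            updated.append((1 - new_p) / (len(probs) - 1))
    return updated


after_reward = bm_update([0.25, 0.25, 0.5], chosen=0, stim=1.0, rate=0.2)
after_penalty = bm_update([0.25, 0.25, 0.5], chosen=0, stim=-1.0, rate=0.2)
```

In the first call the chosen probability rises from 0.25 to 0.4; in the second it falls to 0.2, and the other entries absorb the difference in proportion to their previous values.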

Model Verification
Owing to the difficulty of obtaining empirical data, the traditional analytical model is compared with our model to verify the latter. Tian's model was selected for this purpose because Tian investigated the equilibrium properties of peak-period commuting in a many-to-one mass transit system [10]. Considering that Tian's conclusions were drawn under the assumption that commuters are homogeneous and have complete information, an experiment with consistent assumptions and parameter settings similar to those utilized by Tian was conducted to verify our model. All commuters from the same station have uniform bus fare and in-vehicle travel time. Thus, these two factors would not affect the commuters' departure time choice. Commuters have to make a trade-off between crowding cost and arrival penalty. The total commuting cost of a commuter at station $H_i$ choosing bus $j$ is
$$TC_i^j = C_i^j + \delta(j), \quad (6)$$
where $C_i^j$ represents a commuter's in-vehicle crowding cost from station $H_i$ to $W$ on bus $j$. Crowding levels and the travel time between stations are considered when calculating $C_i^j$:
$$C_i^j = \sum_{s=i}^{m} g\!\left(\sum_{k=1}^{s} n_k^j\right) \tau_s, \quad (7)$$
where $n_k^j$ is the number of commuters taking bus $j$ from station $H_k$, $\tau_s$ represents the travel time between $H_s$ and $H_{s+1}$, $m$ is the number of stations, and crowding function $g(\cdot)$ reflects a commuter's perception of in-vehicle crowding. $\delta(j)$ is the early/late arrival penalty of commuters on bus $j$; the definition of early/late arrival penalty is consistent with that in Vickrey's bottleneck model [1]. $J = \{a, \ldots, 2, 1, 0, -1, -2, \ldots, -b\}$ was employed as the index set of bus services, where $a$ and $b$ are sufficiently large to ensure that all commuters can arrive at the workplace during the peak period considered. Only one bus is assumed to arrive at workplace $W$ on time (arrival at work-start time); this bus is denoted by 0. Thus, $j > 0$ denotes buses arriving early, and the early arrival time is $j \times h$; $j < 0$ denotes buses arriving late, and the late arrival time is $-j \times h$, where $h$ is the interval between successive buses. $\delta(j)$ is provided by
$$\delta(j) = \begin{cases} \beta\, j\, h, & j \ge 0, \\ -\gamma\, j\, h, & j < 0, \end{cases} \quad (8)$$
where $\beta$ and $\gamma$ denote the early and late arrival penalties per unit time, respectively.
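The cost structure described above (crowding cost accumulated over the links travelled, plus an early/late arrival penalty) can be sketched as follows. This is an illustration under stated assumptions: stations are indexed from 0 starting upstream, the crowding cost of a link is the crowding function of the on-board load multiplied by the link travel time, and the linear crowding function and all numerical values in the example are hypothetical.

```python
def arrival_penalty(j, headway, beta, gamma):
    """Penalty for bus j: early if j > 0, late if j < 0, zero if j == 0."""
    return beta * j * headway if j >= 0 else -gamma * j * headway


def crowding_cost(i, boardings, travel_times, g):
    """Crowding cost for a commuter boarding at station i (0 = furthest).

    boardings[k]    : commuters boarding this bus at station k
    travel_times[s] : travel time of the link leaving station s
    g               : crowding function of the on-board load
    """
    on_board = sum(boardings[:i])  # riders already on the bus upstream
    total = 0.0
    for s in range(i, len(boardings)):
        on_board += boardings[s]
        total += g(on_board) * travel_times[s]
    return total


def total_cost(i, j, boardings, travel_times, g, headway, beta, gamma):
    """Total commuting cost: crowding cost plus arrival penalty."""
    return (crowding_cost(i, boardings, travel_times, g)
            + arrival_penalty(j, headway, beta, gamma))


# Hypothetical two-station example with a linear crowding function.
g_linear = lambda n: 0.1 * n
cost_upstream = total_cost(0, 2, [10, 20], [5, 3], g_linear,
                           headway=5, beta=1.0, gamma=3.0)
```

Because the downstream commuter only travels the last link, the same bus costs that commuter less in crowding than it costs an upstream commuter, while the arrival penalty is shared by everyone on the bus.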
Tian derived the following four properties of the equilibrium state.
Property 1. At equilibrium, if train $j$ is selected by commuters at nonstarting station $H_i$, then this train must have also been selected by cohorts of commuters from upstream station(s).
Property 2. At equilibrium, if train $j$ is selected by commuters at nonstarting station $H_i$, then commuters must have taken the same train at the previous station.
Property 3. At equilibrium, if train $j$ is selected by commuters at nonstarting station $H_i$, then the number of all commuters from upstream stations is a constant independent of train number $j$. Furthermore, $n_i^j$ is a constant independent of train number $j$, denoted as $n_i$.
Property 4. For any boarding station $H_i$, except the last one $H_m$, a time duration exists during which the numbers of commuters taking all trains are identical and maximal ($n_i$).
A numerical example from Tian's study is presented in Figure 4.
Considering that agent-based simulation differs from the traditional analytical method and that random disturbances exist, the simulated distribution (Figure 5) is regarded as a result that meets the four equilibrium properties. Insights into the formation of equilibrium can be obtained through the analysis of the original simulation data.
We examined whether any difference exists among commuters from each station, especially whether some form of ordering exists in the formation of equilibrium. To this end, we selected an indicator to represent the stability of a commuter's departure time choice. For each commuter, the maximum probability in $\mathbf{p}$, which is denoted as $p_{\max}$, fits our needs well. A large $p_{\max}$ suggests a stable departure time for a commuter. If $p_{\max}$ reaches 1, the commuter will stick to one bus and always depart at the corresponding time.
A typical evolutionary process of a commuter's $p_{\max}$ is shown in Figure 6(a). After approximately 500 days, the commuter's $p_{\max}$ reached 1. According to the stochastic decision rule of the BM model, this commuter will always select one specific bus afterward. This condition may lead to overly "cold" (exploitative) reinforcement, meaning that the agent may freeze into always implementing the wrong strategy; however, such an unsatisfactory outcome is only a theoretical possibility. BM learners are very successful in practice (see [36] for further discussion on BM or related reinforcement learning models). Analysis of the original simulation data reveals that approximately 360 days are needed for a commuter to explore different alternatives until he/she freezes into selecting a specific strategy.
Figure 6(b) shows the average $p_{\max}$ of the commuters of each station.
At the beginning of the simulation (roughly before day 250), commuters from $H_4$ had the highest stability of departure time selection, followed by those from $H_3$. However, their growth slowed down afterward, and commuters from $H_1$ and $H_2$ exhibited faster growth. Finally, by checking the time taken for $p_{\max}$ to reach 1, we found that upstream commuters reached equilibrium faster than downstream commuters. The number of commuters in one bus shows the overall level of commuters' willingness to select this bus. The number also reflects the distribution of commuters' departure time. Figure 7 shows the change in the number of commuters in several typical buses (bus 0 is the on-time bus, buses 7 and 17 are buses that arrive before work-start time, and bus −3 is the bus that arrives after work-start time).
The number of commuters in bus 17 or bus −3 became stable after approximately 1200 days of evolution, that in bus 7 became stable after approximately 1800 days, and that in bus 0 continued to fluctuate after 2500 days. For buses whose arrival time at the workplace is close to the work-start time, the number of commuters fluctuates more than that of buses whose arrival time is far from the work-start time.
Previous research has shown that when equilibrium is reached, each commuter's commuting cost is equal. To verify this finding, each commuter's average cost was examined in this study from day 1500 to day 3000, when most commuters had reached equilibrium. The result is shown in Figure 8.
The commuting costs of commuters from upstream stations are high, and the data of $H_1$, $H_2$, and $H_3$ exhibit centralized distributions. The data of $H_4$ are scattered. Most of these commuters have a cost of approximately 320; some reached approximately 350, and a few exceeded 400 and were even close to 500. By checking the original data, we found that commuters whose commuting costs are more than 320 mostly chose among buses 8, 7, 6, 5, 4, and −1, and their early/late arrival penalties are higher than those of other commuters from $H_4$.
The average cost of commuters on each bus was also calculated (from day 1 to 3000). As shown in Figure 9, the average costs of commuters from the first three stations exhibit a similar tendency. The average cost decreased over the early buses and then remained stable over a period in which the commuting cost of taking any bus was identical and minimal. The farther a station is from the workplace, the longer this duration is. However, no similar duration existed for commuters from $H_4$. All these results corroborate Property 4 in Tian's research and explain why the average costs of commuters from $H_4$ are more scattered than those of other stations (Figure 8).
Figure 9 explains the distribution in Figure 5. Figure 5 shows that the departure time choices of all commuters in the first three stations at equilibrium are in those durations (when the average cost was identical and minimum).In other words, these commuters minimized their commuting costs.
In summary, our model can reproduce the results of the traditional analytical method; its results meet the four equilibrium properties. With the developed multiagent technology and learning model, we can gain insights into the formation of equilibrium. This provides a new means to understand equilibrium.

Simulation Results and Analysis
As shown in the verification experiment, inexperienced commuters can minimize their costs through reinforcement learning. Our simulation result is in accordance with the four equilibrium properties. All these findings verify the validity of the model. Therefore, some idealized assumptions were relaxed to evaluate the distribution of commuters' departure time equilibrium in realistic situations.
First, commuters should be heterogeneous; that is, they should have diverse perceptions of commuting cost. Second, in the original model, a commuter's memory increases indefinitely as the system evolves and his/her aspiration is the average of all costs perceived. However, in reality, old memories always have less influence and commuters' aspirations are mainly derived from recent travel experience. Unlimited memory is unrealistic.
Hence, two groups of comparative experiments were conducted. In the heterogeneous group, commuters have individual differences in the perception of in-vehicle crowding or early/late penalty. The other group involves the situation wherein commuters have limited memory.

Experiments in Consideration of Commuters' Heterogeneity.
In our model, commuting costs include in-vehicle crowding and early/late penalty. Crowding function $g(\cdot)$ is utilized to represent a commuter's perception of in-vehicle crowding, and the coefficients of early/late penalty ($\beta$, $\gamma$) are employed to represent a commuter's perception of early/late penalty. The following experiments were conducted for these two aspects.

In-Vehicle Crowding.
In reality, commuters have different perceptions of crowding. Some elderly people or children hesitate to board a crowded carriage, whereas some young people do not mind such a situation. In this experiment, commuters were divided into three types based on their tolerance level of crowding. These three types are normal, crowding-sensitive, and crowding-insensitive commuters. Their crowding functions and mixing ratios are shown in Table 1.
The commuters were uniformly mixed, and the parameters were similar to those in the verification experiment in Section 3. Figures 10 and 11 show the results of our simulation. Figure 11(a) shows that commuters' equilibrium did not meet Properties 3 and 4 because these properties were derived under the assumption of homogeneity, which was relaxed in this experiment. The distribution of commuters' choices had a broader range and a lower peak than that of the verification experiment. In addition, the average $p_{\max}$ of commuters differed little among stations.
In our experiment, different groups of commuters exhibited different patterns of departure time choice.
Comparison of the different commuters' choices revealed that normal commuters mostly select from buses [12, −3], whereas buses [3, −1] were favored by downstream commuters. The crowding-sensitive commuters from the first three stations selected buses before bus 12 or after bus −3, whereas all those from the fourth station selected from buses [12, −3]. Most of the crowding-insensitive commuters selected buses [3, 0]. Thus, all buses can be divided into three types. The first type contains buses [3, 0], whose arrival times are close to the work-start time. These buses have more commuters at equilibrium, with a high level of in-vehicle crowding and minimal early/late arrival penalty. The majority of crowding-insensitive and downstream normal commuters selected these buses at equilibrium. The second type contains buses [11, 4] and buses [−1, −2]. Their arrival times are before or after the work-start time. A few commuters rode in these buses, and taking these buses would generate a certain early/late arrival penalty. Normal commuters from the first three stations mainly selected this type. The third type contains buses [21, 12] and buses [−3, −8]. Their arrival at the workplace is either too early or too late. Crowding-sensitive commuters from upstream stations mostly selected this type. These buses have a very low level of in-vehicle crowding but a very high early/late arrival penalty.
Table 1: Crowding functions and mixing ratios of the three commuter types (mixing ratio: 50%, 25%, 25%).

Early/Late Arrival Penalty.
Another aspect of commuters' heterogeneity is that commuters have different perceptions of early/late arrival penalty. For example, some companies have very strict attendance policies; consequently, commuters working in these companies may be more sensitive to being late. By contrast, some companies have flexible working hours; thus, employee punctuality is not an issue.
In the experiment on early/late arrival penalty, commuters were divided into three types based on their sensitivity toward early/late arrival penalty. The three types are normal, early-sensitive, and late-sensitive commuters. Their coefficients of early/late arrival penalty and mixing ratios were set as in Table 2.
The commuters were uniformly mixed, and parameters were similar to those in the verification experiment. Figures 12 and 13 show the results of the simulation.
We identified a distribution similar to that in the experiment in Section 4.1.1 in Figure 13(a); that is, the distribution had a broader range and a lower peak compared with that in the verification experiment.
Comparison of different commuters' choices revealed that most of the normal commuters selected from buses [12, −2] and the early-sensitive commuters mainly selected from buses [3, −5]. Most of the late-sensitive commuters from the first three stations selected from buses [21, 10], whereas those from the fourth station preferred buses [9, 3]. The normal commuters mainly selected buses whose arrival time is close to the work-start time. These buses are crowded with passengers but could still ensure that commuters would not be late or too early for work. Buses that arrived after work-start time were selected mainly by the early-sensitive commuters. Buses whose arrival time is before work-start time contained most of the late-sensitive commuters. These commuters would never be late for work and could enjoy sufficient space in the bus at the cost of getting up earlier than other commuters.

Experiments in Consideration of Commuters' Limitation of Memory.
In the verification experiment, a commuter's memory capability is unlimited, and his/her aspiration originates from all previous commuting experiences. To explore the departure time choice of commuters when this assumption is relaxed, commuters' memory capability was limited in this set of experiments. Thus, a commuter's aspiration is determined only by recent experience. Three experiments were conducted. In the first experiment, memory capability was confined to 3 days to represent short-term memory. Memory capabilities of 30 days and 100 days were then tested as middle-term and long-term memory, respectively. The other parameter settings were in line with those in the verification experiment.
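The effect of a limited memory on the aspiration level can be sketched with a sliding window over past payoffs. This is a minimal sketch under a stated assumption: the aspiration is taken to be the arithmetic mean of the remembered payoffs, which this section does not confirm. The class name `Memory` and its methods are hypothetical.

```python
from collections import deque
from typing import Optional


class Memory:
    """Stores past payoffs; capacity=None models the unlimited-memory case."""

    def __init__(self, capacity: Optional[int] = None):
        # A deque with maxlen silently drops the oldest entry when full.
        self._payoffs = deque(maxlen=capacity)

    def record(self, payoff: float) -> None:
        self._payoffs.append(payoff)

    def aspiration(self) -> float:
        """Assumed form: mean of the payoffs still held in memory."""
        if not self._payoffs:
            raise ValueError("no experience recorded yet")
        return sum(self._payoffs) / len(self._payoffs)


if __name__ == "__main__":
    short, unlimited = Memory(capacity=3), Memory()
    for payoff in [1.0, 2.0, 3.0, 4.0]:
        short.record(payoff)
        unlimited.record(payoff)
    print(short.aspiration(), unlimited.aspiration())
```

With a capacity of 3, 30, or 100 the same class would cover the short-, middle-, and long-term memory settings; a short window makes the aspiration track recent payoffs closely, which is consistent with the stronger fluctuations reported for that case.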
In the first experiment, we observed a distribution of commuters at equilibrium analogous to that in the verification experiment (referred to as ideal equilibrium because no idealized assumptions were relaxed). However, the stable state did not last long (roughly between days 500 and 1000). Afterward, the system evolved between some short-term states, in which commuter numbers in certain buses were abnormally large or small. Figure 14 shows this phenomenon.
Commuters with middle-term memory showed a more stable evolution, as Figure 15 shows.
Compared with that in the two previous experiments, commuters' distribution maintained a relatively stable state in the long-term memory experiment, as shown in Figure 16.
The results of this set of experiments indicate that the extension of each commuter's memory capability may improve a commuter group's general stability of departure time selection. The extension renders the choice distribution stable and close to ideal equilibrium. However, comparing the snapshots on day 500, the distribution of commuters with short-term memory appears to be closer to the ideal equilibrium. Therefore, commuters with short-term memory may learn faster than those with long-term memory, although their learning rates are identical (0.2).
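For reference, a single Bush-Mosteller update with a learning rate of 0.2 can be sketched as follows. This follows the commonly used textbook form of the BM rule and is not necessarily the exact variant used in this paper: the stimulus is assumed to be already normalized to [−1, 1], and rescaling the unchosen options proportionally is an assumption of the sketch. The function name `bm_update` is hypothetical.

```python
from typing import List


def bm_update(probs: List[float], chosen: int, stimulus: float,
              rate: float = 0.2) -> List[float]:
    """Return updated choice probabilities after one day's experience.

    stimulus > 0 means the payoff exceeded the aspiration (reinforce),
    stimulus < 0 means it fell short (inhibit).
    """
    if len(probs) < 2:
        raise ValueError("need at least two options")
    if not -1.0 <= stimulus <= 1.0:
        raise ValueError("stimulus must lie in [-1, 1]")
    p = probs[chosen]
    if stimulus >= 0:
        new_p = p + (1.0 - p) * rate * stimulus
    else:
        new_p = p + p * rate * stimulus
    rest = 1.0 - p  # probability mass currently on the other options
    updated = []
    for i, q in enumerate(probs):
        if i == chosen:
            updated.append(new_p)
        elif rest > 0:
            # Assumed: other options keep their relative shares.
            updated.append(q * (1.0 - new_p) / rest)
        else:
            # Chosen option had probability 1; spread freed mass evenly.
            updated.append((1.0 - new_p) / (len(probs) - 1))
    return updated


if __name__ == "__main__":
    print(bm_update([0.25, 0.25, 0.25, 0.25], chosen=0, stimulus=1.0))
```

Because the rate is the same in all three memory experiments, any difference in learning speed would come from how the aspiration (and hence the stimulus) is formed, not from the update rule itself.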
Figure 17 shows that equilibrium is difficult to reach when a commuter's memory is limited. An approximate equilibrium can be formed in the case of long-term memory. Conversely, the number of commuters in certain buses in the short-term memory experiment fluctuated sharply and roughly periodically. This phenomenon may be due to the irrationality of commuters.
The above analysis indicates that ideal equilibrium is difficult to reach when commuters' memory capability is limited. When commuters have short memory, their departure time distribution is unstable. The number of commuters in certain buses may fluctuate substantially, and commuters are always willing to try other buses instead of sticking to merely one. Although ideal equilibrium may not exist when a commuter's memory is limited, we suppose that this situation is closer to reality because having perfect rationality is impossible in reality.

Conclusion and Discussion
The peak-period commuting of commuters who utilize a bus line with multiple origins and a single destination was modeled. Multiagent modeling and the BM learning model were utilized to simulate commuters' departure time choice behavior. Comparison with the results of the traditional analytical method verified the feasibility of the proposed model and revealed that the model can be extended to cases where commuters are heterogeneous or have limited memory. Compared with traditional analytical methods, our model has several characteristics.
(1) Commuters are inexperienced. Their only prior knowledge is the schedule of bus services. Their intelligence is gained in the process of reinforcement learning through continuous trial-and-error and amendment. Good strategies are reinforced so that they can adapt to the environment.
(2) By using the multiagent modeling method, we can reproduce many details in the process of commuting, such as the fully distributed learning pattern or commuters' memory limitation. Commuters' equilibrium is the result of a continuous process of game and learning rather than the solution of a resolvable mathematical problem. Thus, the model conforms closely to reality.
Simulation experiments show that the equilibrium of peak-period commuting is built from upstream stations to downstream stations. Most commuters obtain their minimum commuting cost at equilibrium. Considering the heterogeneity of commuters, the distribution of their departure time choices is broad and has a low peak. Considering the memory limitation of commuters, results show that short-term memory may accelerate the formation of a state similar to ideal equilibrium; however, this state would not last long and irrationality can be observed afterward as the system evolves. On the contrary, an approximate equilibrium can be formed in the long-term memory case.
The two experiments illustrate that the agent-based approach can be extended to incorporate travelers' heterogeneity or bounded rationality. Although these factors have been successfully described in several traditional analytical models (see [37-41]), strict assumptions are usually required. The agent-based approach is more lenient.
The use of agent-based technology to model travelers' behavior is becoming popular. To the authors' knowledge and based on the literature review, only a few agent-based studies have been conducted to address the "bus-corridor problem." We implemented an agent-based approach to depict the problem that is based on Tian's variant. The results show the compatibility of the proposed model with the traditional analytical model and prove that the former is actually a more lenient method to investigate commuters' departure time equilibrium. Compared with Wahba and Shalaby's work [25], this study primarily aims to demonstrate the flexibility and compatibility of the proposed approach in terms of the "bus-corridor problem." Tian's model was employed for comparison, and commuters' path choice was neglected.
In the proposed model, there exists a loosely coupled relationship between agent and environment. We can delve into the evolutionary process and find out how commuters adapt to shocks. The dynamics of the system after a disruption can be reproduced. Furthermore, by incorporating management measures and evaluating system performance (e.g., calculating average utility of all users), policies which have the potential to reduce influences of the disruption can also be studied. Thus, it has an advantage over traditional analytical methods in investigating network vulnerability problems.
Another advantage of this approach is that it delves into individual-level dynamics and provides insight into the evolutionary process. The proposed approach presents an intuitive, bottom-up perspective to depict the morning commute situation of passengers; it is flexible such that it can incorporate commuters' mental model, decision process, individual differences, and so on. Hence, this study can account for the effect of commuters' characteristics on departure time equilibrium. By altering the perception module, we can restore the individual differences in commuters' subjective perception (e.g., the subjective perception of travel time is considered one of the main elements that influence route choice [12]). We can also modify the cost valuation module by adding new factors or changing the process of calculating cost. Hence, the influence of different incentives on commuters' behavior can be investigated. These factors can be general, such as service level, riding comfort, and transit route reliability. By altering the memory module, the ability of commuters to process information or their bounded rationality can be reflected.
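The modular structure described above (perception, cost valuation, memory) lends itself to an agent whose modules can be swapped independently. The sketch below only illustrates that idea; the class `Commuter`, its fields, and the example functions are hypothetical and are not taken from the original implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Commuter:
    # Perception module: objective travel time -> perceived travel time.
    perceive: Callable[[float], float]
    # Cost valuation module: (perceived time, crowding level) -> cost.
    cost: Callable[[float, float], float]
    # Memory module, here simply a list of experienced costs.
    memory: List[float] = field(default_factory=list)

    def experience(self, travel_time: float, crowding: float) -> float:
        """Perceive one trip, value it, and store the resulting cost."""
        c = self.cost(self.perceive(travel_time), crowding)
        self.memory.append(c)
        return c


if __name__ == "__main__":
    # Hypothetical commuter who overestimates travel time by 10%
    # and weights crowding twice as heavily as time.
    agent = Commuter(perceive=lambda t: 1.1 * t,
                     cost=lambda t, crowd: t + 2.0 * crowd)
    print(agent.experience(travel_time=10.0, crowding=1.0))
```

Replacing `perceive` or `cost` with another function changes one aspect of behavior without touching the rest of the agent, which is the kind of flexibility the paragraph above refers to.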
In future research, our model faces the challenge of demonstrating how a sophisticated pattern of commuters' departure time might emerge when applied to a network structure. The effect of bus service quality on commuters should also be incorporated.

Figure 2: Commuter's modules and its interaction with the external environment.

Figure 3: The learning process of decision .

Figure 7: The change of commuters' number on typical buses.

Figure 10: Commuters' average  max and number of commuters on typical buses.

Figure 12: Commuters' average  max and number of commuters on typical buses.

Table 1: Different commuter groups' crowding functions and mixing ratios.

Table 2: Commuters' early/late arrival penalty coefficients and mixing ratios of the different types.

Minimal difference was observed in commuters' average  max between this experiment and the verification experiment. However, less time was required for the numbers of commuters on peak buses to reach a steady state.