A Subjective Optimal Strategy for Transit Simulation Models

A behavioural modelling framework with a dynamic travel strategy path choice approach is presented for unreliable multiservice transit networks. The framework is especially suitable for dynamic run-oriented simulation models that use subjective strategy-based path choice models. After an analysis of the travel strategy approach in unreliable transit networks and the related hyperpaths, the search for the optimal strategy is formulated as the solution of a Markov decision problem. The new modelling framework is then presented and applied to a real network. The paper concludes with an overview of the benefits of the new behavioural framework and outlines scope for further research.


Introduction
Transit network planning requires prediction of bus travel times, on-board loads, and other state variables representing system operations. One way to obtain such variables is to use simulation models [1,2], which reproduce interactions over time among travellers, transit vehicles, and sometimes also other vehicles sharing the right of way.
In simulation models, a transit supply module supports detailed simulation of vehicles serving stops according to a given schedule [3], picking up and dropping off passengers, while monitoring the capacities and speeds of transit vehicles. The simulation takes into account cases in which passengers cannot board a vehicle because its capacity limit has already been reached. Examples of supply model components are those of the simulators MATSim [3], BUSMEZZO [4], and DYBUS [2]. These simulators perform a within-day dynamic simulation. Each transit vehicle is followed from the departure terminal to the arrival terminal and, at each bus departure from a stop, the forecasted vehicle travel times, which account for the irregularities of the transit services, are updated. Each traveller of a time-dependent origin-destination matrix is followed from origin to final destination, and dynamic routing is applied, taking into account real-time information on the current and forecasted states of the transit network. Further, a day-to-day simulation with a traveller learning and forecasting process of service attributes allows a demand-supply equilibrium condition to be obtained.
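The capacity check mentioned above can be sketched as follows. This is a minimal illustration only, assuming a single run serving one stop, passengers identified solely by their alighting stop, and first-come-first-served boarding; the `Vehicle` type and `serve_stop` function are hypothetical and are not taken from MATSim, BUSMEZZO or DYBUS.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Vehicle:
    capacity: int
    # Alighting stop of each passenger currently on board.
    onboard: list = field(default_factory=list)


def serve_stop(vehicle: Vehicle, stop: str, queue: deque) -> tuple:
    """Drop off passengers for `stop`, then board waiting passengers in FIFO
    order until the capacity limit is reached.

    Returns (alighted, boarded, left_behind); left-behind passengers fail to
    board and must wait for a later run.
    """
    alighted = vehicle.onboard.count(stop)
    vehicle.onboard = [d for d in vehicle.onboard if d != stop]
    boarded = 0
    while queue and len(vehicle.onboard) < vehicle.capacity:
        vehicle.onboard.append(queue.popleft())
        boarded += 1
    return alighted, boarded, len(queue)


bus = Vehicle(capacity=3, onboard=["B", "C"])
waiting = deque(["C", "D", "D"])
print(serve_stop(bus, "B", waiting))  # (1, 2, 1): one passenger is left behind
```

A real supply module would also track run schedules, dwell times and line choice at the stop; the sketch isolates only the boarding-failure rule.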
While the supply and demand-supply interaction components of transit simulation models are quite well defined in the literature [4], traveller path choice modelling still has limitations. A case that requires in-depth analysis is that of multiservice stochastic (unreliable) service networks, where at some bus stops more than one line is available to reach the destination and some path attributes (e.g., waiting time, on-board time, and on-board occupancy degree) are random variables.
According to the seminal paper by Spiess [5], in the case of multiservice stochastic networks a stochastic decision approach should be considered and optimal travel strategy modelling should be applied. In stochastic decision theory, an optimal strategy, detailed in Section 2 below, is the behaviour rule that travellers should follow to optimise the expected value of the experienced travel utility.
Two types of travel strategies can be considered from a modelling point of view. One is the objective (or normative) optimal strategy, which is the behaviour that travellers should follow to optimise the expected value of the experienced travel utility. A different question is the actual strategic behaviour that travellers adopt given their cognitive constraints and their own perceived path attributes: the subjective (or descriptive) optimal strategy. Drawing on data collected through new ticketing technologies, recent research confirms that, on unreliable transit networks with diversion nodes, subjective travel strategies are sometimes applied [6][7][8][9][10][11]. These subjective strategies can differ among travellers and very often differ from the objective optimal strategy. Therefore, in transit path choice modelling, a subjective optimal strategy should in principle be used for each traveller, or at least for each traveller category. In practice, such an approach would be very complex, and therefore the literature assumes a unique optimal strategy valid for all travellers. Further, two main approaches have been followed so far to determine the applied optimal strategy. In one approach, an objective optimal strategy is searched for and adopted, such as the optimal strategy reported in Spiess and Florian [12], thereby neglecting travellers' cognitive limitations and simplifications. The other, as in BUSMEZZO [4] and DYBUS [2], applies random utility path choice models, in which the stochasticity of the services is hidden in the stochasticity of the path choice utilities.
This analysis of transit path choice modelling in simulation shows the need to reproduce traveller behaviour not with a hypothetical objective optimal strategy but with a subjective strategy-based approach, obtained with a stochastic decision approach and more realistic with respect to travellers' cognitive and computational capacities. Building on the first investigation by Nuzzolo and Comi [13], this paper proposes such a subjective travel strategy approach, in which travellers' utilities are defined as combinations of anticipated values through traveller parameters to be estimated. The estimation of these parameters is simplified by the new opportunities offered by big data collection and processing, which allow effective reverse assignment procedures to be applied [14].
The paper is structured as follows. Section 2 analyses the travel strategy approach in unreliable transit networks and the related hyperpaths, while Section 3 considers the search for an optimal travel strategy as the solution of a Markov decision problem. Section 4 presents the proposed behavioural framework, and Section 5 reports some concluding remarks and future research perspectives.

Transit Travel Strategy and Hyperpaths
Let there be an origin-destination pair od and an unreliable transit service network with diversion nodes, that is, nodes where choices are made among different subpaths. Because of transit service stochasticity, rather than relying on a single path from origin to destination selected before the trip, users should adopt a travel strategy ST, which is [5] a set of coherent behavioural decision rules (diversion rules) applied at diversion nodes according to random service occurrences (e.g., random arrival times of buses at a stop, random transit vehicle crowding, failure to board, and so on), with the aim of minimising the expected travel cost or maximising the expected travel utility.
Nguyen and Pallottino [15] highlighted the underlying graph structure of Spiess' basic strategy concept, introducing a graph-theoretic framework and the concept of hyperpath, which is an acyclic subnetwork, connecting the origin to the destination and including a subset of diversion nodes and a subset of diversion links.At each diversion node, the choice of diversion link depends on the occurrences of transit services and therefore there are certain probabilities for choosing a link among the alternative diversion links [16].
In general, two types of graph representation of a transit service network can be used: the line graph and the run graph. While the nodes of a line graph (see Figure 1) have only spatial coordinates, in a run graph the nodes have space-time coordinates (diachronic graph). Hence, below we refer to two types of hyperpath representation: line hyperpaths and run hyperpaths. Each line hyperpath corresponds to run hyperpaths with the same spatial nodes but with different temporal coordinates for each spatial node.
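To make the distinction concrete, the sketch below builds a small run (diachronic) graph from timetable data: each node is a (stop, time) pair, ride links follow individual runs, and waiting links join successive events at the same stop. Dropping the time coordinate recovers the line-graph nodes. This is one simple construction assumed for illustration; the function name and data layout are hypothetical, and the graphs used in the cited simulators may include further node and link types (access, egress, dwell, and so on).

```python
from collections import defaultdict


def build_run_graph(runs):
    """runs: {run_id: [(stop, time_min), ...]} listed in stop order.

    Returns (nodes, links). Nodes are (stop, time) pairs; each link is
    (from_node, to_node, (kind, label)) with kind "ride" along a run or
    "wait" between successive events at the same stop.
    """
    nodes, links = set(), []
    times_at_stop = defaultdict(set)
    for run_id, events in runs.items():
        for (s1, t1), (s2, t2) in zip(events, events[1:]):
            links.append(((s1, t1), (s2, t2), ("ride", run_id)))
        for stop, t in events:
            nodes.add((stop, t))
            times_at_stop[stop].add(t)
    for stop, times in times_at_stop.items():
        ordered = sorted(times)
        for t1, t2 in zip(ordered, ordered[1:]):
            links.append(((stop, t1), (stop, t2), ("wait", stop)))
    return nodes, links


runs = {"r1": [("A", 0), ("B", 5)], "r2": [("A", 10), ("B", 14)]}
nodes, links = build_run_graph(runs)
line_graph_nodes = {stop for stop, _ in nodes}  # drop the time coordinate
print(len(nodes), len(links), sorted(line_graph_nodes))  # 4 4 ['A', 'B']
```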

Optimal Travel Strategy Search
Although this paper focuses on subjective optimal strategies, objective optimal strategy search methods are first analysed since such methods can suggest efficient search methods for the subjective case as well.

Objective Optimal Travel Strategies as Solutions to Markov Decision Problems.
Path choice in an unreliable service network entails decision making without comprehensive knowledge of the possible future evolution of all relevant factors. Hence the outcomes of any decision depend partly on randomness and partly on the agent's decisions. Therefore, a general theoretical framework for objective optimal strategy search can be found in stochastic decision theory. If path choice is considered as decision making in a Markov decision process (MDPs), the Markov decision problem (MDPm) approach can be adopted, as reported, for example, in Nuzzolo and Comi [17] and summarised below for the reader's convenience.
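A Markov decision process (MDPs; [18]) can be defined by the quintuple $(T;\ S;\ A_s;\ p_\tau(\cdot \mid s,a);\ r_\tau(s,a))$, where $T$ is the set of decision epochs $\tau$, $S$ is the state space, $A_s$ is the set of actions available in state $s$, $p_\tau(s' \mid s,a)$ is the probability that the system moves to state $s'$ when action $a$ is chosen in state $s$ at epoch $\tau$, and $r_\tau(s,a)$ is the reward obtained by choosing action $a$ in state $s$. The reward may also depend on the next state of the system, effectively becoming an expected reward, expressed as

$$ r_\tau(s,a) = \sum_{s' \in S} r_\tau(s,a,s')\, p_\tau(s' \mid s,a) \tag{1} $$

where $r_\tau(s,a,s')$ is the reward obtained when the system is next in state $s'$. An MDPs with a specified optimality criterion (hence forming a sextuple) is called a Markov decision problem (MDPm).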
Policies $\pi$ are functions that specify, for each state, which action to perform. The solution of an MDPm provides the decision maker with an optimal policy $\pi^*$ that associates with each state $s \in S$ an action $a \in A_s$ optimising a predefined objective function.
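The expected-reward definition and the notion of a policy can be illustrated with a toy data layout. The sketch assumes dictionaries keyed by (state, action) and (state, action, next state); the node labels and numbers are invented for illustration, and rewards are read as (negative) utilities.

```python
from math import isclose

# Hypothetical toy data: p[(s, a)] maps each next state s' to p(s' | s, a);
# r3[(s, a, s')] is the reward obtained when the system moves from s to s'.
p = {("F", "a78"): {"E": 0.75, "G": 0.25}}
r3 = {("F", "a78", "E"): -8.0, ("F", "a78", "G"): -16.0}


def expected_reward(s, a, p, r3):
    """r(s, a) = sum over s' of r(s, a, s') * p(s' | s, a)."""
    outcomes = p[(s, a)]
    assert isclose(sum(outcomes.values()), 1.0), "transition probabilities must sum to 1"
    return sum(prob * r3[(s, a, nxt)] for nxt, prob in outcomes.items())


# A (deterministic) policy is simply a mapping from states to actions.
policy = {"F": "a78"}
print(expected_reward("F", policy["F"], p, r3))  # -10.0
```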

Objective Optimal Travel Strategy as an Optimal Policy of MDPm.
Given a run service network, the optimal travel strategy $ST^*$ can be seen as the optimal policy $\pi^*$ of a finite and discrete MDPm, considering that (i) the set $T$ is the set of times $\tau$ when the traveller is at a diversion node $s$ and a diversion link has to be chosen; (ii) the state space $S$ is the set of diversion nodes among which travellers can move; (iii) an action $a$ is a set of diversion links among which travellers can choose with a given diversion rule, and the action set $A_s$ is the set of actions $a$ available at $s$; (iv) the change over time of the traveller's location within the diversion node set is a Markov process; (v) the transition probabilities $p_\tau(s' \mid s,a)$ are the probabilities of going from a diversion node $s$ at time $\tau$ to each of the following diversion nodes $s'$ if action $a$ is applied; (vi) the reward function $r_\tau(s,a)$ is the expected utility of applying action $a$ at diversion node $s$; (vii) the optimal policy $\pi^*$ gives the best sequence of actions, considering the expected utility up to the destination.
To represent an MDPm, a state-action tree can be used. At every diversion node, each action can be represented by a set of outgoing links to the next diversion nodes. Figure 2 reports the decision tree for the diachronic graph. For example, at diversion node F three different actions are possible: (1) using run 7 (action $a_7$); (2) using run 8 (action $a_8$); (3) comparing the expected utility of the arriving run with the expected utility of the next run and then choosing the best (action $a_{7+8}$). With regard to transition probabilities, consider again diversion node F in Figure 2. If action $a_{7+8}$ is applied, the probability of moving onto the next diversion node served by line 7 is equal to the probability of using line 7, and the probability of moving onto the next diversion node served by line 8 is equal to the probability of using line 8. If action $a_7$ is applied, the probability of moving onto the node served by line 7 is equal to 1. The same holds for action $a_8$ and the node served by line 8.
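One way to obtain these transition probabilities analytically is the classical frequency-based hypothesis used by Spiess and Florian [12]: if bus arrivals of each line are independent Poisson processes, the first accepted bus belongs to line i with probability f_i / Σf and the expected wait is 1 / Σf. The sketch applies this to the three actions at node F with hypothetical frequencies; it is only an illustration of the objective case, not part of the proposed framework.

```python
def transition_probs(action, freq):
    """action: lines accepted at the stop, e.g. (7,), (8,) or (7, 8).
    freq: buses per minute for each line.

    Assumes independent Poisson bus arrivals: the first accepted bus belongs to
    line i with probability f_i / sum(f), and the expected wait is 1 / sum(f).
    Returns ({line: probability}, expected_wait_minutes).
    """
    total = sum(freq[line] for line in action)
    return {line: freq[line] / total for line in action}, 1.0 / total


freq = {7: 0.10, 8: 0.05}  # hypothetical: one bus every 10 and 20 minutes
for action in [(7,), (8,), (7, 8)]:
    probs, wait = transition_probs(action, freq)
    print(action, {k: round(v, 3) for k, v in probs.items()}, round(wait, 2))
```

For the combined action this gives probabilities of about 0.667 (line 7) and 0.333 (line 8) and an expected wait of about 6.67 minutes, against 10 and 20 minutes for the single-line actions.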

Objective Optimal Strategy Search Methods.
As explored above, the search for an objective optimal travel strategy in a transit network is equivalent to the solution of a Markov decision problem (MDPm). When the transition probabilities and the expected rewards are known or computable, this solution can be found through exact linear or dynamic programming algorithms. In particular, efficient network algorithms based on the Bellman equation [19] can be used, as in Nguyen and Pallottino [15]. For example, in the case of the optimal strategy reported in Spiess and Florian [12], the hypotheses of random arrivals of buses and users at stops and of limited information on services allow the transition probabilities to be computed analytically, although such hypotheses are often not consistent with the case studies in question. A more recent example is the dynamic routing of Gentile [20]. Note that in this case the operating conditions assumed for the transit system are, in several cases, very different from the real ones. In order to take into account the specific conditions of a case study without knowledge of the transition probabilities, some authors use approximate MDPm solution approaches, such as reinforcement learning methods (see, for example, the simulator MILATRANS in [21]). However, this approach requires exploration and exploitation processes with excessive computation times to reproduce each event. Other authors, in order to consider the actual conditions of the case study, use an adaptive routing problem in a stochastic time-dependent transit network, in which the link travel times are discrete random variables with known probability distributions [22]. Nuzzolo and Comi [11,17] indicate a way to estimate the transition probabilities and the expected rewards for intelligent transit networks and thus to apply an exact objective optimal strategy search method.
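When transition probabilities and rewards are known, the Bellman recursion can be solved backwards from the destination. The sketch below does this for a small acyclic state-action graph whose states are diversion nodes and whose rewards are disutilities in minutes. The network, action names and numbers are hypothetical, and the code is a generic dynamic-programming illustration, not the hyperpath algorithm of Nguyen and Pallottino [15].

```python
def solve_acyclic_mdp(actions, destination):
    """actions[s] = {a: [(prob, reward, next_state), ...]}; the graph must be acyclic.

    Bellman equation: V(destination) = 0 and
    V(s) = max_a sum_{s'} p(s'|s,a) * (r(s,a,s') + V(s')).
    Returns the value function V and the optimal policy.
    """
    V, policy = {destination: 0.0}, {}

    def value(s):
        if s not in V:  # memoised depth-first evaluation
            best_a, best_v = None, float("-inf")
            for a, outcomes in actions[s].items():
                v = sum(p * (r + value(nxt)) for p, r, nxt in outcomes)
                if v > best_v:
                    best_a, best_v = a, v
            V[s], policy[s] = best_v, best_a
        return V[s]

    for s in actions:
        value(s)
    return V, policy


actions = {
    "O": {"walk_to_F": [(1.0, -5.0, "F")], "direct": [(1.0, -30.0, "D")]},
    "F": {
        "a7": [(1.0, -22.0, "D")],
        "a8": [(1.0, -28.0, "D")],
        # Board the first of runs 7 and 8 to arrive (hypothetical outcomes).
        "a7+8": [(0.5, -18.0, "D"), (0.5, -24.0, "D")],
    },
}
V, policy = solve_acyclic_mdp(actions, "D")
print(policy)  # {'F': 'a7+8', 'O': 'walk_to_F'}
print(V["O"])  # -26.0
```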

Subjective Optimal Strategy Search.
In order to find the subjective optimal strategy given the actual conditions of the case study, some authors assume diversion rules which are too complex in relation to travellers' cognitive capacity.For example, a comparison of optimal subhyperpaths is applied by Nuzzolo et al. [2] in the simulator DYBUSRT.

The Proposed Behavioural Framework
In this paper, an approach is proposed which applies path choice behavioural modelling based on a dynamic subjective travel strategy and defined in the framework of a Markov decision problem.The proposed model, an advanced version of that presented in Comi and Nuzzolo [13], allows for service occurrences and information provided to travellers and considers some travellers' cognitive limitations and simplifications.In the following subsections, the proposed behavioural framework is presented and examined in the MDPm perspective.Further, some application examples are reported.
Traveller behavioural assumptions are defined in the context of (i) an unreliable (stochastic) and within-day dynamic transit service network with diversion nodes; (ii) transit users who travel frequently on the origin-destination (O-D) pair (frequent users) and are equipped with advanced mobile route planners providing real-time individual predictive information, i.e., a set of suitable lines and the corresponding path attributes (travel time components) from the current position to the destination; (iii) subjective optimal strategy-based travel behaviour.

Traveller Behavioural Hypotheses
Master Hyperpaths.
Given an O-D pair od and its set of available paths at time $\tau$, traveller $u$, as a frequent user on O-D pair od and with the support of an advanced transit trip planner, is assumed to consider a subset of line paths that are feasible for the traveller, that is, paths that satisfy some logical and behavioural constraints. This subset, sometimes called a mental map (see, e.g., [21]), is here called the master line hyperpath $MHP^u_{od}$ (Figure 3). Due to the randomness of transit services, travellers do not refer exactly to time $\tau$ but to a time slice $\Delta\tau$ (e.g., $\Delta\tau = \tau \pm 5$ min), even if, for simplicity, we continue to use $\tau$ below.
As a master line hyperpath can depend on the time slice $\tau$ and the day $t$, owing to the within-day and day-to-day dynamics of the transit service, it is denoted $MHP^u_{od,\tau,t}$. A master line hyperpath can be dynamically updated at each diversion node according to the service state at time $\tau$ of day $t$. For example, information on disrupted lines allows such lines to be eliminated.
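A minimal sketch of how a master line hyperpath might be pruned and then updated, assuming each candidate line path is summarised by its sequence of lines and its access time. The `LinePath` type, the constraint thresholds (maximum transfers, maximum access time) and the simplified "same line boarded twice" check are hypothetical stand-ins for the logical and behavioural constraints; the last check mimics the removal of lines reported as disrupted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinePath:
    lines: tuple          # lines used in order, e.g. ("L1", "L5")
    access_min: float     # walking time to the first stop (minutes)


def master_hyperpath(paths, max_transfers=2, max_access_min=15.0, disrupted=frozenset()):
    """Keep the candidate line paths that pass simple logical and behavioural
    constraints and do not use any currently disrupted line."""
    kept = []
    for path in paths:
        if len(set(path.lines)) < len(path.lines):   # logical: same line boarded twice
            continue
        if len(path.lines) - 1 > max_transfers:      # behavioural: too many transfers
            continue
        if path.access_min > max_access_min:         # behavioural: access too long
            continue
        if set(disrupted) & set(path.lines):         # real-time update: disrupted line
            continue
        kept.append(path)
    return kept


candidates = [
    LinePath(("L1",), 5.0),
    LinePath(("L1", "L5"), 5.0),
    LinePath(("L1", "L5", "L7", "L9"), 3.0),
    LinePath(("L1", "L1"), 4.0),
    LinePath(("L7",), 20.0),
]
print(len(master_hyperpath(candidates)))                    # 2
print(len(master_hyperpath(candidates, disrupted={"L5"})))  # 1
```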

Experienced Path Utility.
Given a line service graph, a travel strategy ST is defined through a line hyperpath HP from origin $o$ to destination $d$, with a set of diversion nodes and a diversion rule $dr_i$ for each diversion node $i$, which determines the diversion link choice behaviour at that node. Hence a strategy ST will be denoted $ST[HP; dr]$, with $dr$ the set of diversion rules $dr_i$. Note that on a service network several strategies, and therefore several corresponding hyperpaths, can be used. Given a diversion rule $dr$ and an objective function $Of$, the strategy $ST^*[HP^*; dr]$ that optimises this function is the optimal strategy conditional upon the diversion rule $dr$ and the objective function $Of$, with $HP^*$ the corresponding optimal hyperpath.
As a result of random service occurrences and of the traveller's choices according to a diversion rule $dr$, each feasible path $k$ from the origin to the destination has a certain probability of use, and its experienced path utility $TU^u_{k,\tau,t}$ is a random variable.
Therefore, it can be assumed that travellers consider the average $ATU$ of the long-period experienced values of all the random utilities $TU$ relative to all the paths of strategy ST. Thus, the subjective optimal strategy $ST^{*u}$ is the strategy with the maximum average experienced utility $ATU^{*u}$ perceived by traveller $u$.
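The averaging step can be illustrated with made-up experienced utilities (negative values read as disutilities) for two hypothetical strategies of one traveller. The sketch simply takes the arithmetic mean of the recorded values, which assumes each recorded day is an equally weighted observation of TU.

```python
from statistics import mean


def subjective_optimal_strategy(experienced):
    """experienced: {strategy: [TU on day 1, TU on day 2, ...]} for one traveller.

    ATU is the long-period average of the experienced utilities; the subjective
    optimal strategy is the one with the highest ATU.
    """
    atu = {st: mean(tus) for st, tus in experienced.items()}
    return max(atu, key=atu.get), atu


best, atu = subjective_optimal_strategy({
    "ST1": [-30.0, -32.0, -28.0],   # steady
    "ST2": [-25.0, -35.0, -24.0],   # more variable, better on average
})
print(best, atu)
```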

Dynamic Travel Choices and Diversion Rule.
We assume that a traveller $u$, in order to optimise his/her travel utility, applies the following dynamic travel behaviour: "Given a master line hyperpath, at each diversion node an optimal diversion link is chosen (with the diversion rule reported below) and the corresponding path is used up to the next diversion node, where a new optimal diversion link is chosen and used." The proposed diversion rule $dr^u$ is as follows: given a master line hyperpath $MHP^u_{od,\tau,t}$, at diversion node $i$ and time $\tau$ of day $t$, traveller $u$ considers all the diversion links $il$, associates with each of them an anticipated utility, defined below, and chooses the diversion link $il^*$ with the maximum anticipated utility.
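The dynamic behaviour described above can be sketched as a loop that re-applies the diversion rule on arrival at each diversion node. The function and argument names are hypothetical; anticipated utilities are passed in as a callable so that they can reflect the information available when the node is reached, and the hyperpath is assumed acyclic so the loop terminates.

```python
def travel(origin, destination, diversion_links, anticipated_utility):
    """Apply the diversion rule dynamically along the trip.

    diversion_links[node] -> list of (link_id, next_diversion_node)
    anticipated_utility(node, link_id) -> float, evaluated only on arrival at node,
    so it can reflect the information available at that moment.
    The hyperpath is assumed acyclic, so the loop terminates at the destination.
    """
    node, chosen = origin, []
    while node != destination:
        link, next_node = max(
            diversion_links[node], key=lambda ln: anticipated_utility(node, ln[0])
        )
        chosen.append(link)
        node = next_node
    return chosen


links = {"O": [("O-B", "B"), ("O-C", "C")], "B": [("B-D", "D")], "C": [("C-D", "D")]}
au = {("O", "O-B"): -20.0, ("O", "O-C"): -18.0, ("B", "B-D"): -12.0, ("C", "C-D"): -10.0}
print(travel("O", "D", links, lambda n, l: au[(n, l)]))  # ['O-C', 'C-D']
```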

Diversion Link Anticipated Utility.
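Given a diversion node $i$ and a diversion link $il$, the anticipated utility $AU_{il}$ is obtained by summing (i) the anticipated utility $AU_k$ of the subpath $k$ from diversion node $i$ up to the next diversion node $w$, including the diversion link $il$, and (ii) the nodal anticipated utility $NAU_w$ of the diversion node $w$, computed over the subpaths from $w$ up to the next nodes $w'$:

$$ AU^u_{il,\tau,t} = AU^u_{k,\tau,t} + NAU^u_{w,\tau,t} \tag{2} $$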
For example, the anticipated utility of link B-F of Figure 2 is given by the anticipated utility of subpath B-F plus the nodal anticipated utility of node F, which in turn is a function of the anticipated utility of subpaths F-E-D and F-G-D.
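Anticipated Utility of Subpaths.
Given a subpath $k$ up to the next diversion node $w$, the anticipated utility $AU^u_{k,\tau,t}$ at time $\tau$ of day $t$ is a linear function of the vector $\mathbf{AX}^u_{k,\tau,t}$ of its attributes, anticipated by traveller $u$ at time $\tau$ of day $t$:

$$ AU^u_{k,\tau,t} = \sum_j \beta_j \, AX^u_{j,\tau,t} \tag{3} $$

with $\beta_j$ the parameters of the utility function. In turn, the attributes $AX$ anticipated by travellers are functions of the path attributes forecasted (if any) by travellers and of those forecasted by the information system:

$$ AX^u_{j,\tau,t} = \alpha^u \, X^{IS}_{j,\tau,t} + \left(1-\alpha^u\right) X^u_{j,\tau,t} \tag{4} $$

where (i) $AX^u_{j,\tau,t}$ is the $j$-th anticipated attribute value at time $\tau$ of day $t$; (ii) $X^{IS}_{j,\tau,t}$ is the $j$-th attribute value forecasted by the information system; (iii) $X^u_{j,\tau,t}$ is the value (if any) of the $j$-th attribute forecasted by traveller $u$ at time $\tau$ of day $t$ (traveller forecasting process); (iv) $\alpha^u \in [0,1]$ is the weight given by traveller $u$ to the information provided, dependent on the traveller's compliance with the information system.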
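A minimal sketch of the two steps, with made-up attribute values and parameters. The names `alpha` and `beta` are placeholders for the weight given to the information system and for the utility parameters; the convex-combination form is an assumption consistent with that weight lying in [0, 1], and the fallback to the information-system value when the traveller has no forecast is likewise an assumption of this sketch.

```python
def anticipated_attributes(x_info, x_traveller, alpha):
    """Combine information-system and traveller forecasts attribute by attribute:
    AX_j = alpha * X_info_j + (1 - alpha) * X_traveller_j.
    If the traveller has no forecast for an attribute (None), the
    information-system value is used as is (an assumption of this sketch).
    """
    return [xi if xt is None else alpha * xi + (1 - alpha) * xt
            for xi, xt in zip(x_info, x_traveller)]


def subpath_anticipated_utility(beta, ax):
    """Linear anticipated utility: AU_k = sum_j beta_j * AX_j."""
    return sum(b * a for b, a in zip(beta, ax))


# Hypothetical attributes: waiting time and on-board time (minutes).
ax = anticipated_attributes(x_info=[10.0, 20.0], x_traveller=[14.0, None], alpha=0.5)
print(ax)                                             # [12.0, 20.0]
print(subpath_anticipated_utility([-2.0, -1.0], ax))  # -44.0
```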

Traveller Forecasted Attributes of a Path.
Assuming that travellers use an exponential smoothing forecasting method [23], the values $X^u_{j,\tau,t}$ of the $j$-th attributes forecasted by traveller $u$ at time $\tau$ of day $t$ are given by

$$ X^u_{j,\tau,t} = \nu^u \, EX^u_{j,\tau,t-1} + \left(1-\nu^u\right) X^u_{j,\tau,t-1} \tag{5} $$

where (i) $EX^u_{j,\tau,t-1}$ is the value of the $j$-th attribute experienced by traveller $u$ at time $\tau$ of day $t-1$; (ii) $X^u_{j,\tau,t-1}$ is the value of the $j$-th attribute forecasted by traveller $u$ at time $\tau$ of day $t-1$; (iii) $\nu^u \in [0,1]$ is the weight given to attributes experienced on day $t-1$, depending on the memory process of traveller $u$.
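The day-to-day smoothing of a single attribute can be sketched as below; `nu` is a placeholder name for the weight on the most recent experience (the application in this paper uses 0.3), and the experienced values and initial forecast are invented.

```python
def smoothed_forecasts(experienced, nu, initial):
    """Day-to-day exponential smoothing of one attribute:
    X_t = nu * EX_{t-1} + (1 - nu) * X_{t-1}.
    experienced: values experienced on days 1..n; initial: forecast for day 1.
    Returns the forecasts for days 1..n+1."""
    forecasts = [initial]
    for ex in experienced:
        forecasts.append(nu * ex + (1 - nu) * forecasts[-1])
    return forecasts


print(smoothed_forecasts([20.0, 20.0], nu=0.5, initial=10.0))  # [10.0, 15.0, 17.5]
```

A larger `nu` makes the forecast react faster to the latest experience; a smaller one gives more weight to the accumulated memory.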
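Nodal Anticipated Utility of Next Diversion Node.
The nodal anticipated utility $NAU^u_{w,\tau,t}$ at time $\tau$ of day $t$ of the diversion node $w$, with subpaths $k_w$ up to their next diversion nodes $w'$, is obtained by travellers as a function of the anticipated utilities of these subpaths:

$$ NAU^u_{w,\tau,t} = \sum_{k_w} \pi^u_{\tau,t}[k_w] \, AU^u_{k_w,\tau,t} \tag{6} $$

where $\pi^u_{\tau,t}[k_w]$ is the perceived share of using path $k_w$ at time $\tau$ in the past days and $AU^u_{k_w,\tau,t}$ is the anticipated utility of subpath $k_w$ at time $\tau$ of day $t$. The shares $\pi^u_{\tau,t}[k_w]$ perceived by traveller $u$ at time $\tau$ of day $t$ are obtained from the traveller's memory of past path use, where $\delta^u_{k_w,t'}$ is the weight given by the traveller to path $k_w$ in relation to day $t'$ if path $k_w$ was used on day $t'$, with $\gamma^u$ the parameter of the traveller's memory process.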
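The perceived shares and the nodal anticipated utility can be sketched together with the diversion-link sum for link B-F of the example (subpath utility of B-F plus nodal utility of F, whose subpaths are F-E-D and F-G-D). The exact share formula is not reproduced here, so the sketch assumes a geometric memory weighting, gamma ** (today - day), normalised to sum to 1; this weighting, the function names and all numbers are hypothetical.

```python
def perceived_shares(use_days, today, gamma):
    """use_days: {subpath: [days on which the traveller used it]}.

    ASSUMED weighting (not the paper's exact formula): a use on day d counts
    gamma ** (today - d), so gamma = 1 gives plain past frequencies and
    gamma < 1 makes older days fade. Shares are normalised to sum to 1;
    with no history, all subpaths get equal shares.
    """
    weights = {k: sum(gamma ** (today - d) for d in days) for k, days in use_days.items()}
    total = sum(weights.values())
    if total == 0:
        return {k: 1 / len(weights) for k in weights}
    return {k: w / total for k, w in weights.items()}


def nodal_anticipated_utility(shares, au):
    """NAU_w = sum over subpaths k_w of share(k_w) * AU(k_w)."""
    return sum(shares[k] * au[k] for k in shares)


# Node F of the example: two subpaths towards destination D (hypothetical data).
shares = perceived_shares({"F-E-D": [1, 2, 3], "F-G-D": [4]}, today=5, gamma=1.0)
nau_f = nodal_anticipated_utility(shares, {"F-E-D": -12.0, "F-G-D": -20.0})
au_bf = -8.0 + nau_f  # anticipated utility of subpath B-F plus nodal AU of F
print(shares, nau_f, au_bf)  # shares 0.75 / 0.25, NAU_F = -14.0, AU_B-F = -22.0
```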
In the learning process, travellers search for the optimal weights $\alpha^u$ that maximise the average experienced utility (ATU), as simulated in the application test of Section 4.2 below.

Example of Diversion Choices.
As an example of a diversion choice, consider the choice at origin O of the first boarding stop in the master hyperpath of Figures 2 and 3.
Traveller $u$ is assumed: (i) to identify, within the master line hyperpath, the set of diversion links rooted at O, in our case the links O-B and O-C; (ii) to associate an anticipated utility $AU_{ol}$ with each diversion link $ol$, in our case:

$$ AU_{OB} = AU_{k_{OB}} + NAU_B, \qquad AU_{OC} = AU_{k_{OC}} + NAU_C \tag{8} $$

(iii) to use the diversion link $ol^*$ with the maximum anticipated utility $AU_{ol^*}$.
Subsequently, at time $\tau_r$, when traveller $u$ is at the (first boarding or interchange) stop $s$ and a run $r$ of a line belonging to the run master hyperpath arrives (as depicted in Figure 2), s/he is assumed: (i) to consider the diversion link $l_r$ to board run $r$ and the diversion link $l_{wait}$ to wait; (ii) to associate an anticipated utility with each of these two diversion links; (iii) to compare the anticipated utilities of these diversion links; (iv) to board run $r$ if the anticipated utility of the link incorporating run $r$ is greater than the maximum anticipated utility associated with the waiting link $l_{wait}$; (v) if the traveller does not board run $r$, to reapply the process when the next run arrives.
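The board-or-wait rule at a stop reduces to a scan over arriving runs. In this sketch the two anticipated utilities are supplied as callables evaluated when each run arrives; the function name, run identifiers and values are hypothetical.

```python
def first_run_boarded(arrivals, au_board, au_wait):
    """arrivals: runs in order of arrival at the stop.
    au_board(run): anticipated utility of the link that boards `run`.
    au_wait(run): maximum anticipated utility of waiting instead, evaluated
    when `run` arrives. Returns the first run worth boarding, or None."""
    for run in arrivals:
        if au_board(run) > au_wait(run):
            return run
    return None


board = {"r1": -30.0, "r2": -25.0, "r3": -27.0}
wait = {"r1": -26.0, "r2": -27.0, "r3": -26.0}
print(first_run_boarded(["r1", "r2", "r3"], board.get, wait.get))  # r2
```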
Model Parameter Estimation.
The application of the presented model requires knowledge of the following parameters: (i) the parameters $\beta_j$ of the anticipated utility function $AU_k$; (ii) the weight $\alpha^u \in [0,1]$ given by travellers to the information provided; (iii) the weight $\nu^u \in [0,1]$ given to attributes experienced on day $t-1$; (iv) the parameter $\gamma^u$ of the traveller's memory process for the perceived shares of using paths $k_w$.
Parameters $\beta_j$ can be obtained, for example, with standard stated-preference surveys and aggregate random utility model calibration. Parameters $\nu$, $\alpha$, and $\gamma$ can be obtained by applying a reverse assignment procedure [14] that minimises the distance between measured alighting and boarding (or on-board) counts and those obtained through the model [24,25].
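A brute-force stand-in for such a calibration loop is sketched below: candidate parameter values are enumerated on a grid, the model is run for each combination, and the combination whose modelled counts are closest (sum of squared differences) to the measured counts is kept. `toy_simulate` is a hypothetical placeholder for a full assignment run, and the grid and distance measure are assumptions; the actual procedure in [14] may differ.

```python
from itertools import product


def calibrate(measured, simulate, grid):
    """Return the parameter combination whose simulated counts best match
    the measured counts, together with its sum of squared errors."""
    names = list(grid)
    best, best_err = None, float("inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        err = sum((m - s) ** 2 for m, s in zip(measured, simulate(params)))
        if err < best_err:
            best, best_err = params, err
    return best, best_err


def toy_simulate(params):
    # Placeholder for a full assignment run: returns two "counts".
    return [100 * params["alpha"], 50 * params["nu"]]


grid = {"alpha": [0.1, 0.3, 0.5], "nu": [0.1, 0.3, 0.5]}
best, err = calibrate([30.0, 15.0], toy_simulate, grid)
print(best)  # {'alpha': 0.3, 'nu': 0.3}
```

In practice each evaluation is a full multi-day simulation, so a grid this naive would be replaced by a more economical search.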

An Application to a Real Network.
An application of the proposed path choice modelling, with a unique subjective optimal strategy and the same parameters $\nu$, $\alpha$, and $\gamma$ for all travellers, within the assignment model in DYBUSRT [2], was carried out on the same network used in the authors' other studies of run-oriented transit assignment. The aim of the application was to assess how different values of the parameter $\alpha$, and hence different combinations of the forecasts of the traveller and of the information system, affect the expected value of the average experienced utility ATU. The service network (Figure 4) was obtained from the real service structure of the Fuorigrotta district in Naples (Italy), whose bus running time coefficients of variation were modified for the purpose of the simulation. As regards the master line hyperpath, following the literature on choice set formation reviewed by Bovy [26], the master set of path alternatives was generated from the set of all available paths by applying logical constraints, to avoid loops, successive boarding of the same run, or the use of opposite lines, and behavioural constraints, to eliminate alternatives with unrealistic values of attributes such as number of transfers, transfer time, access and egress times, and schedule delay. Combining the residual paths, a master line hyperpath from each origin o to each destination d was generated. Level-of-service attributes composing path utilities were calculated using a diachronic graph whose service subgraph consists of about 10,400 nodes and 20,100 links. The experienced path utility function is the same as that reported by Nuzzolo et al. [2].
The simulation reproduces an initial transient of about 60 days, to build up the travellers' prior knowledge of path attributes and to reach an equilibrium state, followed by 30 replications of each simulation period, in order to obtain statistically significant estimates of the expected values of the state variables (i.e., confidence interval method with specified precision at 95%). Anticipated attributes are estimated assuming the parameter $\nu$ equal to 0.3 [27,28], while $\gamma$ was set equal to 1.
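The confidence-interval check on replications can be sketched as follows. The sketch uses a normal approximation (z of about 1.96 at 95%), which is an assumption: with 30 replications a Student-t quantile would give a slightly wider interval. The pilot values are invented, and `replications_needed` applies the usual rule n = (z·s / precision)², rounded up.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.95):
    """Normal-approximation confidence interval for the mean of replications."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    m = mean(samples)
    return m - half_width, m + half_width


def replications_needed(pilot, precision, level=0.95):
    """Replications needed so that the half-width does not exceed `precision`,
    using the standard deviation of a pilot set of replications."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * stdev(pilot) / precision) ** 2)


pilot = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values of a state variable
print(confidence_interval(pilot))
print(replications_needed(pilot, precision=0.5))  # 39
```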
The assignment algorithm is coded in C++ and data are managed with a Postgres 9.1 DBMS. As the code is optimised for multicore CPU processing, simulation times strictly depend on the CPU architecture (i.e., number of cores and processors) and on the operating system. For the simulated three-hour morning period of a workday (7:00-10:00 am), the simulation takes 35 seconds on a computer with an Intel Core 2 Duo 3.33 GHz and 8 GB RAM running Mac OS X. This time is reduced to 12 seconds on a computer equipped with two Intel Core i7 2.93 GHz processors and 16 GB RAM running Microsoft Windows 7.
Four different coefficients of variation of bus running times were used to represent different levels of service unreliability.
The results (see Table 1) indicate that the weights used to combine the forecasted values strongly influence the average experienced utility, and that the weights that minimise the experienced travel disutility strongly depend on the unreliability of the transit system. As expected, with increasing transit service unreliability, and hence increasing forecasting failures, the best overall performance is obtained with a low $\alpha$ parameter, which gives much more weight to personal than to system forecasted attribute values.

The Proposed Behavioural Framework from an MDPm Perspective.
If the behavioural framework with the proposed diversion rule is applied, the subjective optimal strategy found at a diversion node can be considered as an approximate solution of an MDPm, where (i) the master hyperpath is found by considering quite simple logical and behavioural constraints (see Figure 3); (ii) the perceived shares of use of subpaths $k_w$ at time $\tau$ in previous days, from the diversion node $w$ up to the next diversion node, are proxies for transition probabilities (see (6)); (iii) the anticipated utilities are proxies for expected rewards (see (2)). Indeed, the anticipated utilities are functions of the anticipated path attributes, given by a combination of the values forecasted by the information system and those forecasted by travellers. The attribute values forecasted by the information system, if obtained through statistical forecasting methods, are estimates of expected values. The attribute values forecasted by the traveller are obtained through exponential smoothing and are hence proxies of expected values. Thus the anticipated utilities can be assumed to be proxies of expected utilities; (iv) the traveller, at each diversion node, considers as an action only that of choosing among all available diversion links. Referring to the example depicted in Figure 2, the state-action trees are thus simplified as reported in Figure 5, where at node B the only possible action is $a_{5+6}$ while at node F it is action $a_{7+8}$.

Conclusions and Research Perspectives
This paper sought to overcome some limits of transit path choice modelling, especially the use, for multiservice stochastic networks, of an objective optimal travel strategy instead of subjective strategies. A path choice model was therefore developed using a dynamic subjective travel strategy, defined in the framework of a Markov decision problem. The subjective optimal strategy can be considered as the solution of a simplified MDPm with approximate transition probabilities and approximate expected rewards. It takes into account service occurrences and the information provided to travellers, and it applies a diversion rule that considers some of the travellers' cognitive limitations and simplifications.
Even if the proposed modelling framework requires several model parameters, the new opportunities resulting from the availability of large quantities of data from automated data collection make model parameter estimation and updating easier, for example by using the reverse assignment method recalled in the paper. The same data availability can help to obtain new models of travel strategy generation for different categories of users, to be used as subjective travel strategies in assignment models. Therefore, the next steps in this research will be the setup and testing of an overall procedure, including reverse assignment parameter estimation, on the test network. In the near future, through a greater deployment of bidirectional communication between travellers and information centres, a suitable quantity of data will become available, making it possible, at least in theory, to calibrate not only individual model parameters but also specific subjective strategy-based transit path choice models.
Further research should explore master line hyperpath modelling and the development of travel strategies within theories other than that of expected utility.In addition, the introduction of stochastic path choice models which take into account user perception errors and analyst modelling errors is another possible modelling improvement.

Figure 1: Example of line and diachronic graphs.

Figure 2: Example of run hyperpath with a diversion choice at origin.

Figure 3: Example of a master line hyperpath.

Figure 5: Example of reduced action tree of Figure 2.