Estimation Markov Decision Process of Multimodal Trip Chain between Integrated Transportation Hubs in Urban Agglomeration Based on Generalized Cost

An efficient multimodal transportation network is crucial to the development of an urban agglomeration. Rapid transfer of travelers between integrated transportation hubs is essential for long-distance multimodal trip chains. A Markov Decision Process (MDP) framework was estimated to explore the optimal transfer trip chain of different income groups: the states are the nodes between hubs, the actions between states are bus, subway, taxi, and walk, and the reward functions are calculated from the generalized travel cost between states after travelers make an action decision. The optimal trip chain can be obtained through a value iteration algorithm. In the case study, multimodal transfer trip chains of different types between Beijing Capital International Airport and Beijingxi Station in the Beijing-Tianjin-Hebei urban agglomeration were constructed with the MDP to compare the optimal trip chains of various groups. The findings of this study are as follows: (1) long-distance travelers always prefer the unimodal, fewer-transfers trip chain between hubs; (2) long-distance travelers are more likely to choose a trip chain with more transfers than one with a long waiting time; (3) individual income differences affect the generalized cost of trip chains and also influence the optimal choice of trip chain through the MDP framework. One potential application of this study is to complement research on the transfer behavior of multimodal trip chains in long-distance travel, which can help management alleviate the excessive passenger-flow pressure between integrated hubs caused by sudden surges in travel during severe weather or holidays.


Introduction
The United States predicted that 70% of the world's population would live in cities by 2050 [1]. The urbanization process of China has accelerated since 1978, and the proportion of the urban population has continued to rise [2]. As a model of global urbanization, China's speed of urbanization is reflected in the rapid gathering of the urban population. This aggregation of people tends to form urban agglomerations, each composed of a core megacity and at least 2∼3 large cities, with close economic and transportation connections that support coordinated development between regions. In the formation of an urban agglomeration, transportation construction is a crucial area of investment. The multimodal comprehensive transportation network connects core cities and surrounding cities to ensure the rapid flow of people, information, and material within the urban agglomeration and thus the coordinated development of multiple cities. Compared with urban transportation, the comprehensive multimodal transportation network of an urban agglomeration contains more intercity travel modes and integrated hubs.
Travelers across the urban agglomeration always choose multiple travel modes and transfer through more than two integrated hubs on the way from point i in city A to point j in city B. The comprehensive transportation network provides passengers with quicker and more convenient travel, but it also makes their choices more complex and changeable, and more factors influence those choices. The multimodal transportation network of an urban agglomeration differs from urban transportation networks in being dynamic, uncertain, complex, and confluent. Travelers are mainly affected by distance, time, and cost when choosing a unimodal trip [3][4][5]. On a multimodal long-distance trip, however, they also care about other factors such as transfer convenience, waiting time, walking time, location patterns, and the purpose of travel [6][7][8]. Previous research on trip chains has mainly emphasized multimodal transportation within urban transportation [9][10][11][12][13][14]. A limited number of studies have addressed the long-distance trip chain formed by urban travel and intercity travel through large integrated transport hubs in an urban agglomeration.
The dynamic nature of the multimodal transportation network in an urban agglomeration refers to the uncertainty of the traveler's trip chain. Compared with a single transportation network, the multimodal transportation network is less reliable. When natural disasters affect one or more transportation networks, or when holidays or events cause an abnormal increase in passenger flow, the efficiency and reliability of the multimodal transportation network in the urban agglomeration decrease. With the recent growth driven by the government and transportation departments, more forms of intermodal transport have appeared in the existing multimodal trip chain supply system, such as air-railway, highway-railway, and air-highway combinations. However, this is still not enough to accommodate the increasingly changeable and complex trip chains of multimodal travelers in the urban agglomeration, and it hinders efforts to improve the efficiency of the extremely limited multimodal transportation supply.
Most previous studies explore factors that affect travelers' behavior and transfer habits in urban multimodal transportation networks, including travel cost, time, walking distance, and accessibility of hubs [12,15,16,17]. However, knowledge about intercity travel or travel within urban agglomerations is lacking. Understanding the impact of different travel times, travel costs, and numbers of transfers is essential for understanding travelers' decisions on their trip chains and for improving the coordination of multimodal transportation. This paper uses a Markov Decision Process (MDP) to estimate travelers' trip chain decision-making in the multimodal transportation network of an urban agglomeration. In probability theory and mathematical statistics, a Markov chain is a stochastic process with the Markov property defined on a discrete index set and state space; its counterpart on a continuous index set is called a Markov process. The MDP framework builds on Markov chain theory and is widely used in machine learning to analyze system control problems involving random probability. One study accounted for a vacant taxi's long-term profit over the full working period by formulating an MDP; it defined the state as the node where the vacant taxi is located and the action as the link the taxi takes out of the node.
This MDP problem is solved by value iteration to find an optimal routing policy [18]. Another study, on driving behavior, also applied the MDP framework to obtain the Personally Revealed Choices through a value iteration algorithm. That study defined the states as the surrounding states of the vehicle and the actions as the driver's decisions to accelerate, decelerate, or maintain speed; the individual's reward function was estimated with an MNL model. The Personally Revealed Choices that maximize the expected sum of rewards for individual drivers were found by the MDP to understand drivers' behavioral decisions based on the surrounding states of vehicles [19]. These studies show that the MDP framework can solve complex system control problems consisting of random elements such as states, actions, and transition probabilities. To construct the corresponding MDP framework, it is essential to decompose the complex system problem, that is, to define the states and actions and to determine the reward function and transition probabilities. Therefore, this study builds upon previous studies and constructs a multimodal transportation network to extract travelers' decision-making when trip chains include different modes and cross several integrated hubs.
This study benefits from the availability of multisource transportation and economic data, which allows it to explore the choice behavior of multimodal trip chain travelers from the perspective of different income groups. The study aims to find the optimal trip chains for different income groups among the multiple types of trip chains available. By treating the optimal action value as the realization of an optimal policy in an MDP framework, it is possible to define states as transportation nodes (e.g., bus station, subway station, HSR station, airport). The framework derives a state's value for the different actions travelers take (i.e., bus, railway, taxi, and walk), quantified in terms of accumulated discounted rewards. The state transition probability is based on the authors' previous survey of passengers' transfer behavior decisions during their multimodal trip chains under abnormal events [20]. The reward function is determined by the generalized travel cost, which considers both the travel time and the travel cost incurred when travelers leave state $S_i$ for state $S_{i+1}$. The generalized travel cost is calculated as the weighted sum of the valued travel time and the travel cost. A more accurate calculation method is used to obtain the travel time of the trip chain, depending on whether the traveler changes stations and on the difference in walking distance during the interchange. The weights of the valued travel time and the travel cost are adjusted according to the income level of each group in order to analyze how changes in the reward function across income groups affect the value of nodes in the MDP framework, to understand the advantages and disadvantages of different trip chains, and to choose a suitable transfer trip chain for each income group. The MDP is a reasonable framework for modeling the decisions of multimodal trip chain travelers between two integrated transportation hubs of an urban agglomeration.
The methods applied in this paper have the potential to form a foundation for evacuating multimodal trip chain transfer passengers in different scenarios (e.g., holiday peak traffic, traffic aggregation caused by unexpected events) and can provide suggestions for resolving the passenger aggregation that unusual events impose on the operation of multimodal transportation systems in urban agglomerations. The outline of this paper is as follows. Section 2 provides a review of the literature. Section 3 introduces the problem and the definitions of the multimodal trip chain model based on the MDP framework. Section 4 presents the computational experiments for the Beijing-Tianjin-Hebei urban agglomeration. Finally, Section 5 concludes the study and discusses potential directions for future work.

Literature Review
Early works focused on the trip chain itself. Primerano defined the trip chain as linking secondary activities to a primary activity through travel that is made from when an individual leaves home to when they return home [21][22][23]. The trip chain in early studies was usually defined as a home-to-home loop. Strathman and Dueker listed seven types of trip chains and divided them into simple and complex trip chains [24]. A trip chain with two trips (such as from home to work and from work to home) is called a simple trip chain, and a trip chain with more than two trips (for example, with added trips from work to a restaurant or shopping mall) is called a complex trip chain [25]. In some studies, stop frequency was used to measure the complexity of the trip chain [26].
Most studies of the relationship between travel behavior and trip chains found that trip chaining precedes mode choice, and that the activity locations travelers include in their trip chains generally influence their travel mode choice [9,10,27]. This pattern is often reflected in the commuter's trip chain. For example, commuters often add shopping, picking up children, or dinner after work and then make their travel mode choices after these activities have been determined. The choice of mode in the trip chain is influenced not only by its activities but also by travelers' gender, age, household income level, employment status, and number of children [22,26,28,29,30].
Research on the complex trip chain has mainly focused on individuals' travel behavior, especially comparisons of private cars and public transport [10]. A prominent finding is that the more complex the trip chain, the more travelers depend on private cars and the less likely they are to choose public transport [31]. These findings are fundamental to understanding that travelers tend to choose the more flexible and convenient mode (such as auto for long distances and walking or cycling for short distances) when their trips have multiple destinations [32,33]. An interesting finding also shows that the higher the minimum density at destinations, the lower the odds of a complex trip chain and of auto mode choice [34]. Similarly, Daisy's study proposed that the complex trip chain could be a significant barrier to shifting from driving alone to public transport [35].
For urban multimodal travelers, previous studies have analyzed the perception of transfers and travelers' intentions regarding the multimodal trip chain, especially travelers' attention to the number of transfers, waiting time, and walking time. Some research suggested that moving auto users to the public transportation network requires reducing barriers to transfers, such as long initial and final walking times and long waiting times [36,37]. Other research has investigated the perception of transfers from different perspectives [16,38,39]. Most studies show that waiting time is penalized more than walking time, and that travelers perceive out-of-vehicle time as more demanding than in-vehicle time [15,40]. The simple and complex trip chains mentioned above start or end at home. Another type of trip chain runs from i in city A to j in city B; this kind of trip chain always includes both urban and intercity traffic and is closely related to integrated transportation hubs. It is called multimodal transportation or integrated transportation in previous studies; the Madrid Regional Transport Authority in 1985 defined multimodal interchange as "An area whose purpose is to minimize the inevitable sensation of having to change from one mode of transportation to another and efficiently using the inevitable waiting time." [41] The multimodal transportation hub plays a vital role in society, and it can benefit the government and different stakeholders [42]. It is regarded as a symbol of "urban identity" and "urban mobility." The hub is a critical link in the multimodal travel chain: studies proposed that the smoother the transfer at the transportation hub, the more likely travelers are to choose multimodal travel [43,44]. This research found that the transfer penalty is strongly related to the internal corridor structure of the hub.
Studies on the long-distance trip chain are not as detailed as those on urban complex trip chains. Most of them focused on the factors that influence travelers' choices, such as travel time, travel cost, and transportation service. In research on multimodal transportation behavior, scholars tended to analyze the competition and coupling relationships between several modes (e.g., HSR and air, train and bus) [45,46]. In Ma's research, a MIMIC model was constructed to explore the causal relationship between the socioeconomic attributes of passengers and transfer characteristics (such as transfer comfort, transfer convenience, and transfer economy) [20]. It found that gender, occupation, and departure time greatly influence the choice of travel chain.
D'Este proposed a model that uses Markov chains to represent trip chaining behavior and thereby extend the utility of the traditional four-step travel demand models [47]. A sequence of system states was calculated to represent an individual's likelihood of participating in an activity at a particular segment of a trip chain. Other studies used Markov chains or MDPs in transportation problems to predict travel time or its distribution from simulation data. Past studies have defined the trip chain and divided home-based trip chains into simple and complex trip chains. However, there is no definition of the multimodal trip chain for urban agglomerations or intercity travel. Most of the focus and resources have been directed towards exploring the factors that influence travelers' mode choice for long-distance travel.
Journal of Advanced Transportation 3

Multimodal Trip Chain in Urban Agglomeration.
The multimodal transportation network in an urban agglomeration is generally composed of railway, highway, aviation, and urban transport, and its efficient, coordinated operation is inseparable from the connectivity provided by various transportation hubs. The combination of urban public transportation and intercity transportation is the main object of research on multimodal travel in urban agglomerations. The similarity with urban multimodal travel is that the transfer time between different modes needs to be considered. The difference is that multimodal travel in an urban agglomeration must also consider the waiting time between urban public transport and intercity transport, especially for transfers at railway hubs, highway passenger stations, or aviation hubs. As a result, the multimodal travel chain in an urban agglomeration is more complex than the urban one.

States.
In a sequential process, if the state $S_{i+1}$ at time i + 1 depends only on the state $S_i$ at time i and has nothing to do with any state before time i, then the process is said to have the Markov property. For travelers in a multimodal transportation network, the state at node i + 1 depends only on the travel decision made at node i and has nothing to do with the trip before node i. Therefore, the multimodal trip chain can be seen as having the Markov property, and the whole process of the traveler's trip chain from the origin to the destination can be regarded as a Markov process. The origin, the destination, and the nodes between the origin and the destination can be considered the states $S_i$.

Rewards (Penalties).
A Markov process only considers the transition probability between states; it cannot integrate the rewards associated with the state transitions. The Markov Reward Process (MRP) is introduced to better handle dynamic decision processes. The MRP is a tuple $\langle S, P, R, \gamma \rangle$, in which R represents the reward (penalty) function, defined as the expected reward obtained when moving from state $S_i$ to the next state $S_{i+1}$.
In this study, the generalized travel cost of leaving state i for the next state i + 1 is taken as the reward function. The generalized travel cost is obtained by weighting travel time and travel cost. In the multimodal trip chain, the travel time of a trip chain depends not only on the travel time of each segment of the chain but also on the transfer time and waiting time between two distinct segments. The rewards (penalties) therefore consist of the transfer time $R_S^a(c)$, the waiting time $R_S^a(w)$, and the transport time $R_S^a(t)$ from $S_i$ to $S_{i+1}$. The waiting time can be determined from the departure frequency of the public transport line. De Cea and Fernandez [48] give formula (1) for the waiting time, which involves a deviation factor related to the time reliability of a vehicle operating in the traffic network and the departure frequency $f_k$.
The travel time from state $S_i$ to state $S_{i+1}$ falls into one of four situations (see Table 1).

Harvest.
In the MRP, the sum of all discounted rewards from state $S_0$ until the end state is called the harvest (G), commonly known as the return; its mathematical expression is given in formula (2). Here γ is the attenuation (discount) factor; to make the model mathematically tractable, it is restricted to 0 < γ < 1. In the multimodal transportation network of the urban agglomeration, there may be several trip chains from origin to destination. The total reward or penalty of each trip chain can be calculated according to Table 1.
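As an illustration of the harvest, the sketch below sums the per-segment rewards of one trip chain with a discount factor. It assumes the common indexing in which the first reward is undiscounted and the k-th later reward is multiplied by γ^k; the function name `harvest` and the segment costs are hypothetical and are not taken from the paper's tables.

```python
def harvest(rewards, gamma):
    """Discounted sum of the rewards collected along one state sequence.

    Assumed indexing: G = r_0 + gamma * r_1 + gamma**2 * r_2 + ...
    """
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


# Hypothetical generalized costs (CNY) of the three segments of one trip chain.
segment_costs = [10.0, 4.0, 6.0]
chain_harvest = harvest(segment_costs, gamma=0.9)
print(chain_harvest)
```

Because the rewards here are costs, a smaller harvest corresponds to a cheaper trip chain under this convention.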

States' Value and the Action-Value Function
Harvest can reflect every state's importance in a state sequence. However, there may be several sequences, and the same state $S_i$ may appear in several different state sequences. To overcome the inconvenience of using harvest to describe the importance of a state across different state sequences, the more precise concept of "value" is introduced; the value function maps each state to its value. Value is the expectation of the state's harvest in the Markov reward process, and it can be calculated according to formulas (3) and (4). Formula (4) shows that the value of a state is composed of the reward for leaving the state and the values of the subsequent states, weighted by their transition probabilities and discounted by the attenuation factor.
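The decomposition described for formula (4) can be sketched as a fixed-point iteration: each state's value is its leaving reward plus the discounted, probability-weighted values of its successors. This is a minimal sketch under assumptions; the function name `mrp_state_values`, the three-state chain, and its rewards are hypothetical and only illustrate the recursion.

```python
def mrp_state_values(P, R, gamma, sweeps=500):
    """Iterate v(i) = R[i] + gamma * sum_j P[i][j] * v(j) for a Markov reward process."""
    n = len(R)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [R[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


# Hypothetical chain: state 0 leads to state 1 or 2 with equal probability,
# state 1 leads to state 2, and state 2 is an absorbing end state with zero reward.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
R = [4.0, 2.0, 0.0]
values = mrp_state_values(P, R, gamma=0.9)
print(values)
```

The same state can sit on several sequences (state 2 is reached both directly and via state 1), and the value averages over them, which is what harvest alone cannot express.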
When individual choices are involved in the Markov Reward Process, it is necessary to introduce the Markov Decision Process. The MDP is a tuple $\langle S, A, P, R, \gamma \rangle$, in which S is a finite set of states, A is a finite set of actions that includes all the choices an individual may make in the decision, P is the action-based state transition probability matrix, and R is the reward function based on state and action.

The Action-Value Function.
In an MDP, an individual selects an action from the action set according to his or her understanding of the current state, while the dynamics of the environment determine the individual's subsequent state after a specific action has been selected. The policy is denoted by the letter π and describes how an individual chooses an action from the action set in a given state.
For the same MDP, different policies produce different Markov processes or MRPs, and hence different state value functions.
The previously defined value function is therefore refined into formula (6): the policy-based value function in the MDP represents the expected harvest when following the current policy π starting from state $S_i$. The value of the different actions that individuals can select in the same state is called the action-value function based on policy π; it represents the expected harvest from performing a specific action a (a ∈ A) in the current state $S_i$ and following the policy π afterwards. The value of an action is denoted by $q_\pi(i, a)$.
According to Bellman (1954), we can obtain two Bellman equations, formulas (8) and (9). It can be seen that the action is a bridge between two adjacent states in the MDP.
The value of an action is related to both the state value before this action and the subsequent state value after this action, which is what the Bellman equation for the action value expresses. Similarly, the value of state $S_i$ can be expressed by the values of all possible actions in this state: $v_\pi(i) = \sum_{a \in A} \pi(a \mid i)\, q_\pi(i, a)$.
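For reference, the relationships between state values and action values described here can be written in the standard textbook form of the Bellman expectation equations. The symbols $R_i^a$ (expected reward for taking action a in state i) and $P_{ij}^a$ (probability of moving from state i to state j under action a) are notational assumptions, and the layout may differ from the paper's own formulas (8) and (9).

```latex
\begin{align}
v_\pi(i) &= \sum_{a \in A} \pi(a \mid i)\, q_\pi(i, a), \\
q_\pi(i, a) &= R_i^a + \gamma \sum_{j \in S} P_{ij}^a\, v_\pi(j), \\
v_\pi(i) &= \sum_{a \in A} \pi(a \mid i) \Bigl( R_i^a + \gamma \sum_{j \in S} P_{ij}^a\, v_\pi(j) \Bigr).
\end{align}
```

Substituting the second line into the first gives the third, which shows the action acting as the bridge between the values of two adjacent states.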
In searching for the optimal policy, a better policy is usually determined by comparing two policies, so the criterion for comparison needs to be defined. The optimal state value function $v_*(s)$ is the best state value attainable over all policies; the optimal action-value function $q_*(s, a)$ is the best action value attainable over all policies.
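The search for the optimal state values and the corresponding policy can be sketched with value iteration. This is a minimal sketch under stated assumptions, not the authors' implementation: rewards are entered as negative generalized costs so that maximizing value minimizes cost, and the network, mode names, costs, and the helper names `action_value` and `value_iteration` are all hypothetical.

```python
def action_value(outcomes, values, gamma):
    """Expected value of one action: sum over (probability, next_state, reward)."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-9, max_sweeps=10_000):
    """Return optimal state values and a greedy policy for a small finite MDP."""
    values = {s: 0.0 for s in transitions}
    for _ in range(max_sweeps):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # end state: no actions, value stays 0
                continue
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical network: state 0 is the origin hub, state 2 the destination hub.
# Each action maps to a list of (probability, next_state, reward); rewards are
# negative generalized costs (assumption).
transitions = {
    0: {"subway": [(1.0, 1, -5.0)], "taxi": [(1.0, 2, -30.0)]},
    1: {"subway": [(1.0, 2, -6.0)], "walk": [(1.0, 2, -9.0)]},
    2: {},
}
optimal_values, optimal_policy = value_iteration(transitions, gamma=0.9)
print(optimal_values, optimal_policy)
```

In this toy network the two-segment subway chain beats the direct taxi because its discounted cost is lower; changing the costs to reflect a higher value of time would flip the choice.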

4.1. The Multimodal Network and Transportation Data.
The Beijing-Tianjin-Hebei urban agglomeration includes a core megacity (Beijing) and two large cities (Tianjin and Shijiazhuang), with a permanent resident population of 110 million. It is one of the crucial urban agglomerations in China and plays a vital role in relieving Beijing of its noncapital functions, strengthening the linkage between Beijing and Tianjin, and enhancing the comprehensive capacity of Hebei province. The 2022 Winter Olympics will be held in Beijing and in Zhangjiakou, Hebei province. The event will bring a surge in traffic flow, putting more pressure and risk on the transportation network of the Beijing-Tianjin-Hebei urban agglomeration.
Take the transportation corridor from Beijing Capital International Airport to Beijingxi Railway Station as an example: the more multimodal trip chain travelers pass through these two hubs, the greater the pressure on the corridor between them, and the more likely passengers are to gather in the two hubs. The traditional way of evacuating passengers from transportation hubs always ignores the traveler's final destination, which does not help the transportation efficiency of the urban agglomeration. Considering the travel characteristics of passengers on trip chains in urban agglomerations, travel time, travel cost, and the number of transfers are taken as the critical factors for their transfer between hubs. Combining the study of evacuation schemes under abnormal passenger flow with the different needs of different travel groups is more effective in relieving blocked interhub passages in urban agglomerations. In this case, the multimodal transportation network constructed between Beijing Capital International Airport (BCIA) and Beijingxi Railway Station (BJXRS) is shown in Figure 1.
The trip chains shown in Figure 1 are based on the schemes recommended by AutoNavi Map in different periods and include the main modes of transportation, such as subway, bus, taxi (including online car-hailing), and airport bus. According to Figure 1, there are nine trip chains from BCIA to BJXRS.
As shown in Table 2, the nine trip chains were divided into four categories according to the number of transfers and the modes involved. Long-distance multimodal trip chain passengers with luggage in an urban agglomeration always consider travel time, travel cost, and comfort, and multimodal transfers may bring more walking distance and waiting time. Therefore, the type of travel mode and the number of transfers are used to determine the category of the trip chain. A trip chain with more than one transfer is called "more transfers," and a trip chain with no transfer or only one transfer is called "fewer transfers." Combined with whether passengers switch travel modes within the trip chain, this yields the following four categories of trip chains: (1) unimodal more transfers; (2) multimodal more transfers; (3) multimodal fewer transfers; (4) unimodal fewer transfers. Table 3 shows each segment's travel information for trip chains No. 1 to No. 9. Rows 3∼11 of Table 3 represent the nine trip chains, and the columns I, II, III, and IV represent the first, second, third, and fourth trip segments of each trip chain. For example, line 2, row 3, "0 ⟶ 2 (S)" indicates that the first trip segment of trip chain No. 1 is from the origin node (BCIA) to node "2" (Dongzhimen), and "(S)" indicates that travelers choose the subway for this trip segment. Line 3, row 3 indicates the type of travel time calculation for this trip segment, referring to Table 1.
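The categorization rule can be sketched as a small function. It assumes that a trip chain is given as the list of modes of its consecutive segments and that every boundary between two segments counts as one transfer; the function name `classify_trip_chain` and the example chains are hypothetical and do not reproduce Table 2.

```python
def classify_trip_chain(segment_modes):
    """Label a trip chain by modality and number of transfers.

    Assumption: each boundary between consecutive segments is one transfer.
    """
    transfers = len(segment_modes) - 1
    modality = "unimodal" if len(set(segment_modes)) == 1 else "multimodal"
    # "More transfers" means more than one transfer, as defined in the text.
    transfer_level = "more transfers" if transfers > 1 else "fewer transfers"
    return f"{modality} {transfer_level}"


# Hypothetical chains, not the paper's trip chains No. 1 to No. 9.
print(classify_trip_chain(["subway", "subway", "subway"]))
print(classify_trip_chain(["airport bus", "subway", "bus"]))
print(classify_trip_chain(["bus", "subway"]))
print(classify_trip_chain(["taxi"]))
```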

Comparison of Harvest.
The multimodal trip chain MDP constructed in this study regards the generalized cost from $S_i$ to $S_{i+1}$ as the reward. The generalized cost better describes the travel cost of different income groups, who care about both travel time and travel cost. It is obtained as the weighted sum of the valued travel time and the monetary cost of the same trip segment, and the weights of time and cost can be adjusted to represent different income groups. Travel time is usually converted into a monetary time cost by introducing a value of time. Formula (12) shows how travel time is valued.

Here T is the valued time (CNY/h), t is the travel time (h), and Y is the annual income of the traveler (CNY).
In formula (12), "Y" is the annual income of urban residents, "22" is the average number of working days per month, and "8" is the average number of working hours per day; multiplying these by 12 gives the number of working hours per year for urban residents. "t" represents the travel time of each trip segment of the traveler's trip chain. According to the survey data of the National Bureau of Statistics of China, the per capita disposable income of Chinese residents in 2020 was 32,189 yuan, and residents are divided into five equal-sized income groups. The per capita disposable incomes of the low-income, low-middle-income, middle-income, upper-middle-income, and high-income groups are 7,869 yuan, 16,443 yuan, 26,249 yuan, 41,172 yuan, and 80,294 yuan, respectively. Formula (13) was adopted to calculate the generalized travel cost of each income group.
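The valuation described for formula (12) can be sketched as follows. The sketch assumes that the hourly value of time is the annual income divided by 12 × 22 × 8 working hours and that the valued time of a segment is this hourly value multiplied by the travel time; the exact form of formula (12) is an assumption, and the function names are hypothetical. The income figures are the group incomes quoted in the text.

```python
WORKING_HOURS_PER_YEAR = 12 * 22 * 8  # months x working days x hours per day

# Per capita disposable income of the five income groups (CNY, 2020), from the text.
INCOME_GROUPS = {
    "low": 7869,
    "low-middle": 16443,
    "middle": 26249,
    "upper-middle": 41172,
    "high": 80294,
}


def hourly_time_value(annual_income):
    """Value of one hour of travel time in CNY (assumed form of formula (12))."""
    return annual_income / WORKING_HOURS_PER_YEAR


def valued_time(annual_income, travel_hours):
    """Monetary value of a trip segment's travel time in CNY."""
    return hourly_time_value(annual_income) * travel_hours


for group, income in INCOME_GROUPS.items():
    print(group, round(hourly_time_value(income), 2))
```

Under these assumptions, one hour of travel is worth roughly 3.7 CNY to the low-income group and roughly 38 CNY to the high-income group, about a tenfold difference.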
In formula (13), $R(i)$ is the reward function of each trip segment of the trip chain; $\mu_1$ and $\mu_2$ are the weighting factors for the valued time and the travel cost, respectively; $T_i$ is the valued time of the trip segment, and $c_i$ is the travel cost of the trip segment.
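The weighted sum described for formula (13) can be sketched as below, assuming the form R = μ1 · T + μ2 · c. The two travel options, their durations and fares, and the hourly time value of 20 CNY are hypothetical; the weights (0, 1) and (1, 0) stand in for travelers who consider only cost or only time.

```python
def generalized_cost(travel_hours, fare, hourly_time_value, mu_time, mu_cost):
    """Weighted sum of valued travel time and fare (assumed form of formula (13))."""
    return mu_time * hourly_time_value * travel_hours + mu_cost * fare


def cheapest(options, hourly_time_value, mu_time, mu_cost):
    """Name of the option with the lowest generalized cost."""
    return min(
        options,
        key=lambda name: generalized_cost(
            *options[name], hourly_time_value, mu_time, mu_cost
        ),
    )


# Hypothetical options: (travel time in hours, fare in CNY).
options = {"airport bus": (1.5, 30.0), "taxi": (1.0, 120.0)}

print(cheapest(options, 20.0, mu_time=0.0, mu_cost=1.0))  # cost-only traveler
print(cheapest(options, 20.0, mu_time=1.0, mu_cost=0.0))  # time-only traveler
```

With these illustrative numbers the cost-only traveler picks the airport bus and the time-only traveler picks the taxi, which mirrors how shifting the weights changes the preferred trip chain.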
It can be seen that the time value coefficients of different income groups differ, mainly because of the average annual income of each group. Travelers are divided into seven groups in this study to explore the various groups' trip chain choices in depth. In addition to the five groups of varying incomes (see Table 4), groups a and c were set up to represent those who consider only travel cost and only travel time, respectively. For the group that cares only about economic expenditure, the generalized cost of a trip chain consists of travel cost only; for the groups that care about both, the generalized cost is determined by adjusting the weights of valued time and cost (shown in Table 4, rows 5∼6; lines 4∼8).
According to formula (2) in Section 3.2, the harvest of different trip chains can be calculated, and the results are shown in Table 5. The harvest results in Table 5 give every trip chain's total reward for the different traveler groups. Comparing the harvest results shows that trip chain No. 3 is the least-cost choice for people who care only about cost when transferring from BCIA to BJXRS, and trip chain No. 9 is the least-cost choice for both the low-income and low-middle-income groups. Trip chain No. 1 has the minimum generalized cost for the middle-income and upper-middle-income groups. Trip chain No. 7 has the minimum generalized cost for the high-income group and for passengers who care only about travel time. Figure 2 compares the generalized travel cost of each trip chain across groups more intuitively. Trip chain No. 7, which belongs to the unimodal fewer transfers type, has the maximum generalized cost in groups a, $b_1$, $b_2$, and $b_3$, but it becomes a better or the best choice in groups $b_4$, $b_5$, and c. Travelers who take a taxi from BCIA to BJXRS may obtain the least generalized cost when they have a higher annual income. In contrast, trip chains No. 3 and No. 4 show a lower generalized cost in groups a, $b_1$, $b_2$, and $b_3$, but they become the chains with the highest generalized cost in groups $b_5$ and c.
From line 2 in Table 5, it is readily observed that trip chains Nos. 3, 4, 8, and 9 are among the lower generalized cost chains, but none of them except trip chain 9 is the least-cost choice in groups $b_1$∼$b_5$ and group c. Differences in their types might cause this result. Based on Table 2, trip chains 3, 4, and 8 are multimodal trip chains, whereas trip chain 9 is a unimodal trip chain. This can be explained by travelers preferring trip chains that include fewer modes. An interesting finding is that trip chain 1 (type (1)) has the least generalized cost in the $b_3$ and $b_4$ groups, whereas the trip chains with fewer transfers (types (3) and (4)) do not. Checking the information in Table 3 shows that there is a long waiting time in the first segment of trip chains No. 3 and No. 4. The average waiting time of that segment is 1.5 hours, and the time value coefficient is higher for the middle-income and upper-middle-income groups. This means that more attention should be paid to out-of-vehicle time in the trip chains of urban agglomerations than before. Figure 3 compares the harvest of four pairs of trip chains, considering differences in traveler income. Figure 3(a) compares unimodal and multimodal trip chains with more transfers; Figure 3(b) compares the two multimodal trip chains with more and fewer transfers, respectively; Figure 3(c) compares the harvest of the two unimodal trip chains without transfer; Figure 3(d) compares unimodal and multimodal trip chains with fewer transfers. Looking first at Figure 3(a), the generalized cost of the unimodal trip chain (No. 1) becomes gradually lower than that of the multimodal one (No. 2) for the middle-income, upper-middle-income, and high-income groups and for travelers who do not care about travel costs.
This change becomes more evident as income increases. The comparison between trip chains No. 3 and No. 4 is similar: as Figure 3(b) shows, the fewer-transfers type of trip chain has a lower generalized cost than the more-transfers type as income increases. However, the change in the gap between the two trip chains is relatively small. A clear discrepancy between trip chains of the same type appears for trip chains No. 6 and No. 7. As shown in Figure 3(c), the generalized cost of the unimodal fewer-transfers type of trip chain differs sharply across income groups. The generalized cost of taking an airport bus is less than that of a taxi for the middle and below-middle-income groups and for those who do not care about travel time. On the contrary, taking a taxi is the lower generalized cost trip chain for travelers with a higher annual income or who do not care about travel costs. The equivalence point of trip chains No. 6 and No. 7 lies between the upper-middle and high-income groups. Over 60% of travelers may choose the airport bus for a unimodal fewer-transfers trip between two hubs in urban agglomeration. A similar but critically different pattern appears for trip chains No. 8 and No. 9 (Figure 3(d)), where the turning point occurs at the middle-income group. The only difference between these two chains is whether travelers choose the subway in the first segment; the low-middle-income and low-income groups need to select the traditional bus to obtain the lower generalized cost. Thus, travelers who do not care about travel costs or who have a higher income always choose the subway for the first trip to obtain the lower generalized cost. Income growth is therefore positively associated with subway mode preference.
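The equivalence point between two competing options can be made concrete with a short worked sketch. It assumes a linear generalized cost (fare plus value of time multiplied by travel time), which is a simplification and may not match the paper's cost function; the function name and the example fares and times are hypothetical.

```python
def break_even_value_of_time(fare_a, hours_a, fare_b, hours_b):
    """Value of time (money per hour) at which options a and b cost the same.

    Solves fare_a + v * hours_a == fare_b + v * hours_b for v, under the
    assumed linear generalized cost. Raises ValueError if the travel times
    are equal, because then no finite break-even value exists.
    """
    if hours_a == hours_b:
        raise ValueError("equal travel times: no break-even value of time")
    return (fare_b - fare_a) / (hours_a - hours_b)


# Hypothetical example: airport bus (30 CNY, 1.5 h) versus taxi (120 CNY, 1.0 h).
threshold = break_even_value_of_time(30.0, 1.5, 120.0, 1.0)
```

Under these placeholder figures the threshold is 180 CNY per hour: travelers whose value of time lies below it face a lower generalized cost on the bus, and those above it on the taxi. A turning point located between two income groups corresponds to such a threshold falling between their time value coefficients.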
States' Value and the Action-Value Function.
Table 6 presents the state values of the MDP, computed in Python. As shown in Figure 1, the multimodal transportation network has eight nodes, and travelers need to travel from S0 to S7. This study relies on data from a survey conducted by the authors in May 2020, which focused on the transfer decisions of multimodal trip chain passengers between integrated hubs of the Beijing-Tianjin-Hebei urban agglomeration.
The results of this survey show that the transition probabilities of transfer passengers when leaving the integrated hub are bus (34.8%), subway (53.7%), and taxi (11.5%). In this part of the analysis, the generalized cost is treated as a penalty in the reward function, so the generalized travel cost was input as a negative value when solving for the state values. Since the formula for calculating the generalized cost differs for each income group, the corresponding state values were calculated separately for groups a to c (see Table 6).
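The following Python sketch illustrates how state values can be obtained by value iteration when rewards are negative generalized costs. It is a simplified illustration, not the authors' code: the four-node network, its costs, the deterministic transitions, and the discount factor of 1 are all assumptions. Only the mode shares (34.8%, 53.7%, 11.5%) come from the survey, and applying them as action probabilities at the origin is likewise an assumption.

```python
# Toy network, NOT the paper's eight-node network. Each entry maps
# state -> {action: (next_state, generalized_cost)}; all costs are made up.
NETWORK = {
    "S0": {"bus": ("S1", 40.0), "subway": ("S2", 35.0), "taxi": ("S3", 150.0)},
    "S1": {"bus": ("S3", 30.0)},
    "S2": {"subway": ("S3", 20.0), "walk": ("S1", 5.0)},
    "S3": {},  # destination (terminal state, value fixed at 0)
}

# Mode shares reported by the survey for passengers leaving the hub.
MODE_SHARES = {"bus": 0.348, "subway": 0.537, "taxi": 0.115}


def value_iteration(network, gamma=1.0, tol=1e-9, max_iter=1000):
    """Optimal state values with reward = -generalized cost (in-place sweeps)."""
    values = {state: 0.0 for state in network}
    for _ in range(max_iter):
        delta = 0.0
        for state, actions in network.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(-cost + gamma * values[nxt] for nxt, cost in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    return values


def origin_value_under_shares(network, values, origin, shares, gamma=1.0):
    """Expected value at the origin if the first mode follows the given shares."""
    return sum(
        shares[action] * (-cost + gamma * values[nxt])
        for action, (nxt, cost) in network[origin].items()
    )
```

Calling `value_iteration(NETWORK)` gives the value of every state under the best available choices, while `origin_value_under_shares` weights the first decision by the observed mode shares instead of taking the maximum. Repeating the calculation with group-specific costs yields one set of state values per traveler group.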
As shown in Figure 1, travelers make several choices when moving from state Si to the next state Si+1, and their actions include different mode and route choices. The traveler's decision in Si can be described by the action-value function, which represents the expected reward when the traveler takes an action in Si. Formula (14) for the action-value function can be derived from the Bellman equation. The optimal action-value q*(s, a) of the 14 segments starting from node S0 and arriving at the destination S7 through different travel modes can be calculated (see Table 7). One of our analysis interests is comparing different types of trip chains in the MDP framework of the multimodal transportation network. For this purpose, the total action-values of the 9 trip chains were calculated, as shown in Table 8.
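For reference, the textbook Bellman optimality relations between the optimal state value and the optimal action-value are given below. This is the standard form and may differ in notation from the paper's formula (14); the symbols P (transition probability), r (reward), γ (discount factor), and C (generalized cost) are assumptions introduced here for illustration.

```latex
\begin{aligned}
q_*(s,a) &= \sum_{s'} P(s' \mid s,a)\,\bigl[\, r(s,a,s') + \gamma\, v_*(s') \,\bigr], \\
v_*(s)   &= \max_{a}\; q_*(s,a), \qquad r(s,a,s') = -\,C(s,a,s').
\end{aligned}
```

Because the reward is the negative generalized cost, maximizing the action-value is equivalent to minimizing the expected generalized cost from the current state to the destination.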
Results in Table 8 and Figure 4 show that, for the high-income group and travelers who do not care about travel cost, trip chain No. 7, the unimodal chain with the fewest transfers, is the best choice in terms of both the minimum generalized cost and the optimal total action-value.
Turning to the low, middle, and upper-middle-income groups and travelers who only care about travel costs, the unimodal trip chains show advantages in almost every comparison. The unimodal fewer-transfers trip chain No. 6 has the optimal total action-value. The unimodal more-transfers trip chain No. 1 also performs better than the others in some groups (middle and upper-middle-income). A possible explanation is that the transfers of trip chain No. 1 happen within the same station, with shorter walking and waiting times.
Another finding is that no multimodal trip chain shows an advantage in the comparison of total action-values. A possible explanation is that travelers with long-distance trip chains are less likely to transfer between different modes than between different routes of the same mode.

Conclusion
In this paper, the multimodal trip chain between hubs in urban agglomeration is regarded as an MDP problem, with the aim of analyzing trip chains of different types across traveler groups. To analyze the travel time of a trip chain accurately, four types of calculation were used to obtain travelers' time according to the pattern of transfer, and trip chains were divided into four categories depending on the number of transfers and the interchange of travel modes. Then, travelers were divided into seven groups to compare the generalized cost of different trip chains in finer detail. Starting from the dynamic and random characteristics of the multimodal transportation network, we applied the Markov Decision Process to study an individual's optimal choice on the network by comparing the traveler's total action-value of each trip chain.
It should be noted that almost all results in this study show that a unimodal trip chain, with either fewer or more transfers, is the best choice for each particular group. The comparison of the minimum generalized cost and the optimal total action-value in the MDP also reveals other factors that have an important impact on the traveler's decision-making, such as waiting time, walking time, the number of transfers, and whether a transfer happens within the same station. These findings complement previous studies on the factors influencing trip chain choice (including in-vehicle time and travel cost).
According to the above analysis and discussion, several conclusions can be drawn from this study. First, for travelers who make long-distance trips with transfers, waiting time plays a decisive role in travel behavior. Rather than accept a long waiting time, people may prefer to transfer more times. Second, individual income differences and personal travel characteristics affect travelers' choices. This shows up as a significant difference when comparing trip chains of the same type, such as the unimodal fewer-transfers type and the multimodal more-transfers type. The transition occurs in the middle and upper-middle-income groups when travelers hesitate between taxis and buses from one integrated transportation hub to another in the urban agglomeration. However, the transition occurs at the middle-income group when people choose between subway and buses in the multimodal more-transfers type of trip chain. Finally, travelers are more likely to choose unimodal than multimodal trip chains, even if the unimodal trip chain involves more transfers, which is consistent with previous research [15].
The findings of this study suggest the great importance of travel management for long-distance multimodal trip chains in urban agglomeration. The essential difference between a multimodal trip chain in an urban agglomeration and one within a single city is that the former does not start or end at home, and travelers always determine their activities before choosing the modes of the trip chain, especially the mode of the long trip in the chain. Another difference is that a long-distance trip chain traveler will not stay long in a hub or node, since a long stay may delay the following trip. Transportation efficiency is the most important factor for travelers with long-distance trip chains. Based on the results of this study, the unimodal fewer-transfers type of trip chain is the better choice for enhancing transportation capacity between two hubs.
Therefore, to relieve the traffic pressure that rises during bad weather or abnormal situations, shuttle bus and subway services from one comprehensive hub to another should be increased, along with more taxis or online car-hailing services. Appropriate discounts for shuttle buses can attract middle and low-middle-income travelers. The pressure on subway transportation can thus be reduced, and the shares of the different transport modes will be more balanced. This research covers only part of the long-distance multimodal trip chain in urban agglomeration: it focuses on the trip chain between two integrated hubs within a long-distance trip and takes two important hubs in the Beijing-Tianjin-Hebei urban agglomeration as a case study. In the future, the intercity trip can be added to the long trip chain to compare the different types of trip chains in urban agglomeration. In summary, this study provides a method for comparing different multimodal trip chains of urban agglomeration travelers and considers more factors that influence long-distance travelers' behavior.
Data Availability
The data, models, and codes used to support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.