A Tree-Based Model for Setting Optimal Train Fare Zones

Given a railway line with n stops and the number of travelers between each pair of stops, we show how to split these stops into k different fare zones in order to maximize the benefit obtained from the sale of tickets to the travelers. We present a method to obtain this solution that is based on finding the longest path in a weighted rooted tree. This method is more efficient than the combinatorial method, in which all the possible distributions have to be considered in order to decide which one is optimal.


Introduction
In regional or local transportation networks, a fare system in which the stops are grouped into different zones is usually applied. This is the case for subway, bus, and regional railway transportation networks. The price of a ticket between two stops depends on the number of zones that a passenger must traverse during the trip. The stops and the number of zones are determined by the company. The assignment of the stops to certain zones usually depends on their distance to the center of the network. This is usually an arbitrary decision, and it may respond to different reasons. The maximization of benefits can be directed either at social welfare or at profit. In the latter case, an analysis of the flow of travelers between every pair of stops is required. One may wonder whether a change of fares intended to increase the income would result in a decrease of the demand; however, this depends on several factors such as the type of area, the trip purposes, the distance of the journey, and the ticket types, among others. Moreover, one also has to distinguish between short-, medium-, and long-term elasticities [1]. In our work, we will consider that the number of travelers between each pair of stops is constant and independent of changes in the fare prices.
1.1. Related Work. In 1980 Webster and Bly published a collaborative report [2] identifying many factors that influence the demand of travelers. This report has been of great value to public transport operators and transport planners, and it has been recently updated in [1]. That same year Cervello analyzed the effect of flat rates versus differentiated ones [3]. In particular, he studied the impact of the variability of fares depending on the distance and the time of day. Moreover, the differentiated fares can be further subdivided into zonal fares, distance-based fares, sectional fares, and time-based fares [4, 5]. Zonal fares are the most common fare systems in regional railway and underground transportation networks in Spain.
Several studies have been conducted in the last thirty years in order to study fare optimization; see, for instance, [6-9]. The objective functions in these studies were to maximize total profit, social welfare, or both. In some cases, elasticities in the demand of the transportation service were included. The solution of these problems was given by the maximization of a certain profit function subject to certain constraints. Sometimes, the maximization can be split into two levels using a bilevel model approach, with the operator model at the upper level and the user model at the lower one; see [10, 11]. For further information we refer the reader to [12, Section 2.2].

Mathematical Problems in Engineering
The problem of fare zone design for local public transportation networks using a graph-theoretical approach was considered by Hamacher and Schöbel in [13]; see also [14, 15]. In these works, the design is obtained in order to minimize either the maximal or the average deviation between the fare zone system and the distance tariff for all the pairs of stops. Some heuristic algorithms along these lines were provided in [15].
In this work we present an algorithm to find the optimal fare zone system for a transport line in which the stops are located one after the other. Our model is based on finding the longest path between the root and one of the leaves of a certain weighted directed rooted tree. Weighted graphs often appear in the modeling of problems of logistics [16], scheduling and transport routing [17], and management in the air industry [18, 19] and in car-rental companies [20]; see also [21].
Apart from this problem, the use of discrete mathematics in railway transportation is not new and has been considered for different problems [22-24]. This is done, for instance, in the rolling stock planning phase [25, 26] and in its integration with the timetabling phase [27-29], or in the assignment of tracks to trains [30, 31]. In particular, graph models have been used for choosing a fare planning model that maximizes the revenue [32, 33].

Preliminaries on Weighted Graphs and Trees.
We recall that a directed weighted graph is a 3-tuple $G = (N, A, w)$, where $N$ is the set of nodes, $A$ is the set of arcs (ordered pairs of nodes), and $w : A \to \mathbb{N}_0$ assigns to every arc $(u, v) \in A$, with $u, v \in N$, a nonnegative integer denoted by $w(u, v)$, or just $w_{u,v}$. A path on a graph is defined as an alternating sequence of nonrepeated nodes and arcs of the form $u_1, (u_1, u_2), u_2, (u_2, u_3), u_3, \dots, u_{m-1}, (u_{m-1}, u_m), u_m$ for some nodes $u_1, u_2, \dots, u_m \in N$ and arcs $(u_1, u_2), (u_2, u_3), \dots, (u_{m-1}, u_m) \in A$. A directed tree is a directed graph which would be a tree if the directions on the arcs were ignored and in which all the arcs are directed away from a particular node, which is called the root of the tree. A nondirected graph is a tree if it is connected (every pair of nodes is joined by a path) and it verifies that $|A| = |N| - 1$.
The solution of the longest path problem on a graph $G = (N, A, w)$ consists of finding the path between a certain pair of nodes $u, v \in N$ whose sum of arc weights is maximal. Although this is in general an NP-hard problem, its solution can be computed in linear time on directed acyclic graphs. For further information on graphs and trees, we refer the reader to [21, 34-36].
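The linear-time computation on directed acyclic graphs mentioned above can be illustrated with a short Python sketch. It is not taken from the cited references: the function name `longest_path_dag` and the dictionary-based arc representation are assumptions made only for this illustration. The sketch orders the nodes topologically and then relaxes every arc once.

```python
from collections import defaultdict


def longest_path_dag(nodes, arcs, source):
    """Return (weight, path) of a heaviest path starting at `source`.

    nodes: list of hashable node names
    arcs:  dict mapping (u, v) -> nonnegative integer weight
    The graph is assumed to be acyclic; a cycle raises ValueError.
    """
    succ = defaultdict(list)
    indeg = {u: 0 for u in nodes}
    for (u, v), w in arcs.items():
        succ[u].append((v, w))
        indeg[v] += 1

    # Kahn's algorithm: topological order in O(|N| + |A|).
    order = [u for u in indeg if indeg[u] == 0]
    for u in order:  # the list grows while we iterate over it
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                order.append(v)
    if len(order) != len(indeg):
        raise ValueError("graph contains a cycle")

    # Relax each arc once, following the topological order.
    dist = {u: None for u in indeg}  # None marks "not reachable from source"
    pred = {}
    dist[source] = 0
    for u in order:
        if dist[u] is None:
            continue
        for v, w in succ[u]:
            if dist[v] is None or dist[u] + w > dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u

    end = max((u for u in dist if dist[u] is not None), key=lambda u: dist[u])
    path = [end]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return dist[end], path[::-1]
```

On a rooted tree every node has a single incoming arc, so the same procedure reduces to accumulating the weights along each root-to-leaf path and keeping the largest total.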

Organization of the Paper.
The paper is organized as follows: in Section 2 we introduce the notation and we calculate the total number of admissible solutions to the problem using a combinatorial approach. Section 3 is devoted to the complete analysis of the case of 2 zones. A general algorithm for the construction of a directed rooted tree for an arbitrary number of stops and zones is provided in Section 4. Then the solution to our problem reduces to finding the longest path from the root to one of the leaves of this tree. In Section 5 we give details about the size of the tree constructed in Section 4 and how to label its nodes. This will help us to study the computational cost of this problem, which is shown in Section 6. We conclude this section with an example of application.

Notation
Our algorithm for finding the optimal solution considers partial admissible solutions of the problem and compares them with each other in order to choose the optimal one. To deal with these partial distributions and with all the admissible solutions to the problem, we introduce a special notation for referring to them. Moreover, since we want to obtain the optimal revenue for the company that supplies the service, the number of passengers between every pair of stops must be taken into account.
Firstly, let us consider a railway line where we know the location of $n$ stops. For determining a fare system of $k$ zones, we assume that there is at least one stop in every zone; therefore $n \ge k$. The case $n = k$ has a trivial solution, since there is a unique stop in every zone, so the cases of interest are the ones with $n > k$.
Secondly, let us consider $n$ stops, namely, $\{s_1, s_2, \dots, s_n\}$. These stops are assumed to be ordered along the railway line; namely, $s_1 \prec s_2 \prec s_3 \prec \dots \prec s_n$, with $s_1$ being the first one and $s_n$ the last one. We also assume that every stop $s_i$ is located in exactly one of the $k$ zones, and we denote its zone by $z_i$, with $1 \le i \le n$ and $1 \le z_i \le k$.
We assume the following statements about the distribution of stops into zones. The first one indicates that two consecutive stops in the line must be in the same zone or in contiguous zones.
(A1) For every $1 \le i < n$, either $z_{i+1} = z_i$ or $z_{i+1} = z_i + 1$.
(A2) Consider $z_1 = 1$ and $z_n = k$; that is, the first and the last stop of the line are located in the first and in the last zone, respectively.
(A3) For every $1 < j < k$ there exists some $1 < i < n$ such that $z_i = j$; that is, we have at least one stop in every zone.
We introduce a notation for referring to particular distributions of $n$ stops into $k$ zones. We also consider partial distributions that deal with $n' < n$ stops distributed into $k$ zones.
If we distribute all the $n$ stops, we refer to this distribution as an admissible one.
Definition 2. We define an admissible distribution of $n$ stops into $k$ zones as a $k$-tuple of the form $[a_1, a_2, \dots, a_k]$, where $a_j$ is the number of stops in zone $j$, $a_1 + a_2 + \dots + a_k = n$, and $a_j \ge 1$ for all $1 \le j \le k$.
The total number of admissible distributions can be computed using a combinatorial approach.
Proposition 3. Let $n$ be the number of stops to be distributed into $k$ zones. The total number of admissible distributions of these $n$ stops into $k$ zones is $(n - 1)!/((n - k)!\,(k - 1)!)$.
If we assume the necessity of having at least one symbol S in every zone, then we can rephrase the problem as calculating how many strings can be constructed with $n - k$ symbols S and $k - 1$ symbols C. Such a string has the form
$$\underbrace{S \cdots S}_{b_1}\, C\, \underbrace{S \cdots S}_{b_2}\, C \cdots C\, \underbrace{S \cdots S}_{b_k}$$
with $b_1, \dots, b_k \ge 0$ and $b_1 + \dots + b_k = n - k$. The total number of strings constructed with two different elements S and C, with S appearing $n - k$ times and C appearing $k - 1$ times, is given by
$$\binom{n-1}{k-1} = \frac{(n-1)!}{(n-k)!\,(k-1)!},$$
which is the number of permutations with repetition of $n - 1$ elements with $n - k$ of one type and $k - 1$ of the other type.
For $n$ much greater than $k$, this is of order $O(n^{k-1})$.
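The count in Proposition 3 can be checked numerically with a small Python sketch. The function name `admissible_distributions` is an assumption made for this illustration; the idea is simply to choose the $k - 1$ places, among the $n - 1$ gaps between consecutive stops, where the line changes zone.

```python
from itertools import combinations
from math import comb


def admissible_distributions(n, k):
    """Yield every tuple (a_1, ..., a_k) with a_j >= 1 and a_1 + ... + a_k = n."""
    # A cut at position c means that stop c is the last stop of its zone.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple(bounds[j + 1] - bounds[j] for j in range(k))


# Example: 5 stops into 3 zones.
distributions = list(admissible_distributions(5, 3))
print(len(distributions), comb(5 - 1, 3 - 1))
```

Enumerating the distributions in this way is exactly the combinatorial method whose cost grows like $O(n^{k-1})$ candidates for $n$ much greater than $k$.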
Given an admissible distribution $[a_1, a_2, \dots, a_k]$, we can compute the benefit obtained from the passengers' tickets. For every pair of stops $s_i, s_j$ with $1 \le i, j \le n$, $i \ne j$, we denote by $p_{i,j}$ the number of passengers that depart from $s_i$ and arrive at $s_j$ in a certain period of time. A passenger with a return ticket between two stops is counted twice, once in each direction. We assume that these passengers pay twice the price of a single ticket and that no discount is granted to them.
Once we have an estimation of the number of passengers between every single pair of stops, we can compute which will be the benefit obtained by the sale of tickets after applying a certain assignment of stops to the zones.As we have indicated in the Introduction, we are assuming that a change of the location of the stops in the zones does not modify the number of travelers between every pair of stops.
We consider a fare system for the sale of tickets based on the number of zones. The price of a ticket depends on how many zones a passenger crosses during the trip, that is, a counting zone tariff system. If a passenger remains in the same zone, then the fare is $h_0$. If it is required to visit 2 zones, that is, the initial and final stops are in contiguous zones, then the price is $h_0 + h_1$. When a passenger visits 3 zones, the cost of the ticket is $h_0 + h_2$, and so on. For simplicity, when we compute the benefit coming from a certain distribution of stops into zones, we omit the fixed value $h_0$, since it is applied to all the tickets. For maximizing the benefit, the idea is to find a distribution that globally forces the passengers to traverse the highest number of zones, taking into account the aforementioned restrictions set on the distribution of stops into zones. This is the underlying idea on which the following algorithms are based.
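The benefit of a given distribution under this counting zone tariff can be sketched in Python as follows. The function names `zones_of` and `benefit`, the 0-based stop indices, and the list `h` with `h[0] == 0` (the fixed fare is omitted, as in the text) are assumptions made for this illustration.

```python
def zones_of(distribution):
    """Expand [a_1, ..., a_k] into the zone (1-based) of every stop, in line order."""
    zones = []
    for zone, count in enumerate(distribution, start=1):
        zones.extend([zone] * count)
    return zones


def benefit(distribution, p, h):
    """Revenue above the fixed fare for one distribution of stops into zones.

    p[i][j]: passengers travelling from stop i to stop j (0-based indices)
    h[d]:    fare increment of a trip whose end stops are d zones apart, h[0] == 0
    """
    z = zones_of(distribution)
    n = len(z)
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += p[i][j] * h[abs(z[i] - z[j])]
    return total


# Hypothetical demand for 3 stops and 2 zones, with increment h_1 = 1.
p = [[0, 10, 5],
     [10, 0, 20],
     [5, 20, 0]]
print(benefit((2, 1), p, [0, 1]), benefit((1, 2), p, [0, 1]))
```

In this made-up example the distribution [2, 1] separates the busiest pair of stops into different zones, which is why it yields the larger value.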

The Case of 2 Zones
The simplest case of just a single zone poses no problem for any number $n$ of stops. In this section we solve the problem of finding a distribution of $n$ stops into 2 zones such that the revenue obtained from the passengers' tickets is maximized. First, we introduce the trees that model this problem in terms of $n \in \mathbb{N}$. Later, we will show how to compute the optimal solution using the structure of this tree. Some figures will help us to show the construction of the trees used to model different statements of the problem. A dashed line will be used for separating admissible distributions (nodes) with the last stop in different zones. In addition, a node with a thick line around it will be used to indicate the admissible distributions at every step, that is, the ones verifying assumptions (A1), (A2), and (A3).

Definition of the Tree for 𝑛 Stops and 2 Zones, 𝑛 ≥ 3.
We start at the first stop $s_1$, which must be located in zone 1. This is represented by the partial admissible distribution [1, 0]. Then we can add one more stop, $s_2$. By assumption (A1), $s_2$ can be assigned either to zone 1 or to zone 2. The first case corresponds to the partial distribution [2, 0] and the second one to the unique admissible distribution [1, 1]. A pair of arcs departing from [1, 0] and arriving at [2, 0] and [1, 1] shows that these distributions are generated from [1, 0]; see Figure 1.
A third stop, $s_3$, can be added. Taking into account (A1), we have two options. On the one hand, the distribution [2, 0] lets us generate two new distributions: [3, 0], if we add the third stop to zone 1, and [2, 1], if we add it to zone 2. On the other hand, from distribution [1, 1] we can only add $s_3$ to the second zone, since there is no third zone in this case. This yields the distribution [1, 2], too. However, the only admissible distributions with 3 stops are [2, 1] and [1, 2], and not [3, 0]. Figure 2 represents this case.
We can proceed by induction using (A1) to construct a tree for each number $n$ of stops. Suppose that we have done the construction up to $n - 1$ stops. We will see how to get the partial distributions for $n$ stops. There are two kinds of nodes representing a distribution of $n - 1$ stops, according to their form, as follows: (i) the node $[n - 1, 0]$, from which we generate $[n, 0]$ and $[n - 1, 1]$; (ii) the nodes $[a, b]$ with $a + b = n - 1$ and $b \ge 1$, from which we only generate $[a, b + 1]$. With this procedure we construct a tree with root $[1, 0]$; see Figure 3. Fixing a value $n \in \mathbb{N}$, the nodes $[a, b]$ with $a + b = n$ will be the leaves of the tree. In particular, every leaf, except $[n, 0]$, corresponds bijectively to one of the admissible distributions of $n$ stops into 2 zones.
Figure 1: Tree for 2 zones and 2 stops.
We point out that in our representations of the trees generated by this procedure, all nodes with the same value of $a + b$ are displayed in the same horizontal row, as we can also see in Figure 3. Moreover, the nodes of the form $[a, b]$ with $b = 0$ are set to the left of the dashed line, since they correspond to partial distributions that only have stops in zone 1. The rest of the nodes are set to the right of the dashed line, since they have at least one stop in zone 2.
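The inductive construction for 2 zones can be sketched in Python as follows. The function name `build_two_zone_tree` and the representation of the tree as a dictionary from each node to its list of children are assumptions made for this illustration.

```python
def build_two_zone_tree(n):
    """Return {node: [children]} for the tree rooted at (1, 0) with n stops."""
    children = {}
    level = [(1, 0)]
    for _ in range(n - 1):              # add the stops s_2, ..., s_n one at a time
        next_level = []
        for a, b in level:
            if b == 0:
                kids = [(a + 1, 0), (a, 1)]   # stay in zone 1 or open zone 2
            else:
                kids = [(a, b + 1)]           # only zone 2 is available
            children[(a, b)] = kids
            next_level.extend(kids)
        level = next_level
    for leaf in level:                  # nodes with n stops have no children
        children[leaf] = []
    return children


tree = build_two_zone_tree(3)
leaves = sorted(node for node, kids in tree.items() if not kids)
print(leaves)
```

For 3 stops the leaves are (1, 2), (2, 1), and (3, 0); only the first two satisfy (A2) and (A3), in agreement with the discussion of Figure 2.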

Optimal Solution for 𝑛 Stops and 2 Zones.
For finding the optimal arrangement of $n$ stops into 2 zones, we assign weights to all arcs so that, if we take an admissible distribution $[a, b]$ with $a + b = n$ and $b \ne 0$, then the sum of the weights of the arcs in the path that connects $[1, 0]$ with $[a, b]$ represents the total benefit obtained from the passengers that have to change zone during their trip.
Theorem 4. Let $T$ be a tree constructed under assumptions (A1), (A2), and (A3) for representing the admissible distributions of $n$ stops into 2 zones. Let us assume that the number of passengers between every pair of stops, $(p_{i,j})_{1 \le i,j \le n}$, is given. Then there is an assignment of weights to the arcs such that the weight of the longest path from the root $[1, 0]$ to a leaf of the form $[a, b]$ with $1 \le a, b \le n$ and $a + b = n$ returns the maximal benefit for distributing $n$ stops into 2 zones.
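The weight that we assign to the arc from $[a, b]$ to $[a, b + 1]$, which places the stop $s_{a+b+1}$ in zone 2, represents the increment of benefit due to the changes of zone of the passengers that travel between each one of the stops $\{s_1, \dots, s_a\}$ and $s_{a+b+1}$. The following formula gives the value of the weights for all the arcs in the tree:
$$w([a, b], [a, b + 1]) = h_1 \sum_{i=1}^{a} \left( p_{i,a+b+1} + p_{a+b+1,i} \right), \qquad w([a, 0], [a + 1, 0]) = 0.$$
In this way, the optimal solution is given by the weight of the longest path among all the paths from the root $[1, 0]$ to a leaf $[a, b]$ with $a + b = n$ and $b \ne 0$.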
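A minimal Python sketch of this longest-path computation for 2 zones is given below. It assumes that arcs which keep a stop in zone 1 have weight 0 and that an arc placing stop $m$ in zone 2 has weight $h_1$ times the number of passengers travelling, in either direction, between $m$ and the stops of zone 1. The function name `best_two_zone_split` and the 0-based indices are choices made for this illustration.

```python
def best_two_zone_split(p, h1=1):
    """Return (benefit, (a, b)) for the heaviest path from (1, 0) to a leaf with b >= 1.

    p[i][j]: passengers travelling from stop i to stop j (0-based indices)
    h1:      fare increment for changing to the contiguous zone
    """
    n = len(p)
    best = None
    for a in range(1, n):
        # The path (1,0) -> ... -> (a,0) has weight 0; then every remaining
        # stop m is added to zone 2 along (a,0) -> (a,1) -> ... -> (a,n-a).
        total = 0
        for m in range(a, n):
            total += h1 * sum(p[i][m] + p[m][i] for i in range(a))
        if best is None or total > best[0]:
            best = (total, (a, n - a))
    return best


# Hypothetical demand for 3 stops.
p = [[0, 10, 5],
     [10, 0, 20],
     [5, 20, 0]]
print(best_two_zone_split(p))
```

Since each leaf is reached by a single path, the accumulated weight at a leaf coincides with the revenue that a direct evaluation of the corresponding distribution would give.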

The General Case of 𝑘 Zones
In this section we will see how to construct a tree for the case of $n$ stops and $k$ zones, with $k \ge 2$ and $n \ge 3$. The algorithm is mainly based on assumption (A1), which permits defining the nodes and the arcs inductively, as we have seen in Section 3.
An example of a tree for 5 stops and 3 zones is represented in Figure 4.
Step k. Since $a_k \ne 0$, consider the following.
Theorem 5. Let us consider $n$ stops to be distributed into $k$ zones and assume that the number of passengers between every pair of stops, $(p_{i,j})_{1 \le i,j \le n}$, is given. Let $T$ be a weighted tree constructed under assumptions (A1), (A2), and (A3) as indicated in Section 4.1. Then the longest path from the root $[1, 0, \dots, 0]$ to a leaf of the form $[a_1, \dots, a_k]$ with $1 \le a_1, \dots, a_k \le n$ and $a_1 + \dots + a_k = n$ returns the maximal benefit for distributing $n$ stops into $k$ zones.
Remark 6. We point out that in substeps $1.2, \dots, (k - 1).2$ some of the nodes defined are useless for finding the solution, since they do not belong to any path from the root $[1, 0, 0, \dots, 0]$ to any admissible distribution. However, their definition will simplify the enumeration of the nodes, as we will see in the next section.
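The statement of Theorem 5 can be illustrated with the following Python sketch, which builds the tree level by level and keeps, for every node, the weight of its path from the root. The arc weights used here are an assumption that generalizes the 2-zone case: placing stop $m$ in zone $q$ adds, for every earlier stop $i$, the fare increment corresponding to the zone difference times the passengers travelling between $i$ and $m$ in both directions. The function name `best_distribution`, the 0-based indices, and the list `h` (with `h[0] == 0` and at least $k$ entries) are likewise assumptions.

```python
def best_distribution(p, k, h):
    """Longest root-to-leaf path in the tree of partial distributions.

    p[i][j]: passengers travelling from stop i to stop j (0-based indices)
    h[d]:    fare increment of a trip whose end stops are d zones apart, h[0] == 0
    Returns (max_benefit, distribution), or None if no admissible leaf exists.
    """
    n = len(p)
    root = (1,) + (0,) * (k - 1)
    level = {root: 0}                     # node -> weight of its path from the root
    for m in range(1, n):                 # m: 0-based index of the stop being added
        next_level = {}
        for node, acc in level.items():
            last = max(j for j in range(k) if node[j] > 0)    # current zone, 0-based
            zones = [z for z, c in enumerate(node) for _ in range(c)]
            for q in (last, last + 1):    # (A1): same zone or the next one
                if q == k:
                    continue
                gain = sum(h[q - zones[i]] * (p[i][m] + p[m][i]) for i in range(m))
                child = node[:q] + (node[q] + 1,) + node[q + 1:]
                next_level[child] = acc + gain
        level = next_level
    # Only leaves with at least one stop in every zone are admissible.
    leaves = {node: w for node, w in level.items() if min(node) >= 1}
    if not leaves:
        return None
    best = max(leaves, key=leaves.get)
    return leaves[best], best


# Hypothetical symmetric demand for 4 stops, 3 zones, increments h_1 = 2, h_2 = 3.
p = [[0, 1, 2, 3],
     [1, 0, 4, 5],
     [2, 4, 0, 6],
     [3, 5, 6, 0]]
print(best_distribution(p, 3, [0, 2, 3]))
```

Every node has a unique parent, so the dictionary `next_level` never receives the same key twice. This sketch recomputes the zone of every earlier stop at each node for readability, so it should not be read as an implementation with the cost analyzed in Section 6.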

Some Considerations regarding the Nodes
The algorithm in Section 4 lets us construct a tree that models our problem.As we have seen in the previous section, the construction of the nodes and the arcs joining them can be done inductively.However, it could be of interest to find an explicit formula for calculating the total number of nodes in each tree; see, for instance, the case of 3 zones and 5 stops in Figure 5 and the case of 4 zones and 6 stops in the figure in the Supplementary Material available online at http://dx.doi.org/10.1155/2014/384321.
Suppose that we have $n$ stops and $k$ zones, with $n > k \ge 2$. At the $i$th step of the construction of the tree, $i = 1, \dots, n$, we have all admissible distributions of $i$ stops arranged in a row. More precisely, we have all distributions of the form $[a_1, \dots, a_k]$ with $\sum_{\ell=1}^{k} a_\ell = i$ and such that, if $a_\ell = 0$, then the next entries are null; that is, $a_{\ell+1} = \dots = a_k = 0$.
As the number of nodes in a row is represented by combinatorial numbers, in order to assign labels to the nodes, we can use an existing connection with the partial sums by rows of Pascal's triangle.Furthermore, explicit formulas for these numbers can be provided using generating functions [38].
Proposition 8 (see [38]). Let us consider the functions
If we denote by $c_{k,n}$ the $n$th coefficient of the Taylor expansion of $f_k$ at $x = 0$, then the following relation holds:
Example 9. Let us analyze in more detail the basic cases of 2 and 3 zones with $n$ stops. For 2 zones, the total number of nodes of the tree generated for distributing $n$ stops is
$$\sum_{i=1}^{n} i = \frac{n^2 + n}{2}.$$
Similarly, for 3 zones and $n$ stops the number of nodes needed to define the tree is
$$\binom{n}{1} + \binom{n}{2} + \binom{n}{3} = \frac{n^3 + 5n}{6},$$
and for 4 zones and $n$ stops we have a tree with the following number of nodes:
$$\binom{n}{1} + \binom{n}{2} + \binom{n}{3} + \binom{n}{4} = \frac{n^4 - 2n^3 + 11n^2 + 14n}{24}.$$
Remark 10. The formulas in Proposition 8 and in Example 9 can be obtained taking into account that the sum of combinatorial numbers along a diagonal of Pascal's triangle can be expressed as an arithmetic sequence of higher order [39]. The general formula for the number of nodes in the problem of $k$ zones and $n$ stops is given by
$$\sum_{i=1}^{n} \sum_{j=0}^{k-1} \binom{i-1}{j},$$
which can also be rewritten as
$$\sum_{j=1}^{k} \binom{n}{j}.$$
Therefore, using properties of arithmetic sequences of higher order [39], and if $n$ is much greater than $k$, the number of nodes of the tree for the problem of $k$ zones and $n$ stops is of order $O(n^k)$.
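The size of the tree can also be checked by direct construction. The following Python sketch counts the nodes row by row, assuming that a node whose last occupied zone is $j < k$ has two children (same zone or next zone) and that a node in zone $k$ has one, and compares the result with the sum of binomial coefficients $\binom{n}{1} + \dots + \binom{n}{k}$. The function names `count_nodes` and `closed_form` are hypothetical.

```python
from math import comb


def count_nodes(n, k):
    """Count the nodes of the tree for n stops and k zones, row by row."""
    level = {(1,) + (0,) * (k - 1)}
    total = len(level)
    for _ in range(n - 1):
        next_level = set()
        for node in level:
            last = max(j for j in range(k) if node[j] > 0)
            # Child that keeps the new stop in the current zone.
            next_level.add(node[:last] + (node[last] + 1,) + node[last + 1:])
            # Child that opens the next zone, if there is one.
            if last + 1 < k:
                next_level.add(node[:last + 1] + (1,) + node[last + 2:])
        level = next_level
        total += len(level)
    return total


def closed_form(n, k):
    """Sum of the binomial coefficients C(n, 1), ..., C(n, k)."""
    return sum(comb(n, j) for j in range(1, k + 1))


print(count_nodes(5, 3), closed_form(5, 3))
```

This count includes the nodes that do not lead to any admissible distribution, in line with Remark 6.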
We can also enumerate all admissible distributions of a tree using a sequential order.As we have already seen, this can be done recursively.Here we provide explicit formulas for the cases of 2 and 3 zones.
Example 11. The number corresponding to a node representing the (partial) admissible distribution $[a_1, a_2]$, with $a_1 + a_2 \le n$, in the case of 2 zones and $n$ stops can be easily defined from the sequence $\{t_m\}_{m \ge 1}$ with $t_m = (m^2 + m)/2$:
For the case of 3 zones and $n$ stops, the number assigned to a (partial) admissible distribution $[a_1, a_2, a_3]$, with $a_1 + a_2 + a_3 \le n$, can be easily defined from the sequence $\{t_m\}_{m \ge 1}$ with
Remark 12. We point out that in the algorithm of Section 4, at some steps of the form ($\ast$.2), we introduced additional nodes for simplifying the notation, as indicated at the beginning of this section (see also Remark 6), despite the fact that they are not considered for finding the optimal solution. Since no admissible distribution is accessible from one of these nodes, we can remove them and reduce the number of nodes that we consider in our model. Nevertheless, we have preserved them for the clarity of the enumeration.

Discussion and Results
As we have already mentioned in Section 2, for $n$ stops to be distributed into $k$ zones, we have to evaluate $\binom{n-1}{k-1}$ possible solutions. The cost of generating all these solutions is of order $O(n^k)$, since $k$ nested loops from 1 to $n$ are required. For each one of these possible solutions one has to consider the $n(n - 1)$ arrangements of two different stops and compute the income received from the passengers between every pair of stops (in both directions). Then the cost is of order $O(n^{k+2})$.
If we think in terms of the tree constructed in Section 4.1 for modeling the problem, each possible solution to the problem of the optimal assignment of stops into zones is associated with a leaf of that tree or, in other words, with a node in zone $k$ at step $n$. This is consistent with the definition of $c_{k,n}$, which is in fact $\binom{n-1}{k-1}$. In order to get the benefit from a certain distribution $[a_1, \dots, a_k]$ of $n$ stops into $k$ zones, we have to compute the length (in terms of weight) of the unique path that connects the root $[1, 0, \dots, 0]$ with the node associated with $[a_1, \dots, a_k]$. Each path from the root to a leaf consists of $n - 1$ edges. The tree structure used

Proof.
By assumption (A2), $z_1 = 1$ and $z_n = k$. The other $n - 2$ stops must be distributed into the $k$ zones. By assumption (A3) we have to assign at least one of these $n - 2$ stops to each zone $j$ for $2 \le j \le k - 1$. Let us count how many admissible distributions we can find by rearranging the other $n - 2$ stops. Let us consider two types of symbols, S and C, where S stands for a stop and C for a change of zone. A string of $n$ symbols S and $k - 1$ symbols C represents an admissible distribution of $n$ stops into $k$ zones whenever there is at least (i) one S before the first C, (ii) one S after the last C, (iii) one S between every pair of consecutive symbols C.

Figure 5: Tree for 3 zones and 5 stops with labels at the nodes.
4.2. Optimal Solution for 𝑛 Stops and 𝑘 Zones. With an argument similar to the one in the proof of Theorem 4, we can obtain the optimal solution for $k$ zones.