An Optimization Method for the Train Service Network Design Problem

Effective railway freight transportation relies on a well-designed train service network. This paper investigates the train service network design problem at the tactical level for the Chinese railway system. It aims to determine the types of train services to be offered, how many trains of each service are to be dispatched per day (service frequency), and by which train services shipments are to be transported. An integer programming model is proposed to address this problem. The optimization model considers both through train services between nonadjacent yards and two classes of service between two adjacent yards (i.e., shuttle train services directly from one yard to its adjacent yard, and local train services that make at least one intermediate stop). The objective of the model is to transport all the shipments at minimal cost. The costs consist of accumulation costs, classification costs, train operation costs, and train travel costs. The NP-hard nature of the problem prevents an exact solution algorithm from finding the optimal solution within a reasonable time, even for small-scale cases. Therefore, an improved genetic algorithm is designed and employed here. To demonstrate the proposed model and the algorithm, a case study on a real-world sub-network in China is carried out. The computational results show that the proposed approach can obtain high-quality solutions with satisfactory speed. Moreover, a comparative analysis against a case in which all the shuttle train services between any two adjacent yards are assumed to be provided without optimization reveals some interesting insights.


Introduction
Railway freight is an important logistical system supporting both global and regional economies. Millions of railcars are continuously moving every day and night. In 2018, around 2.6 trillion ton-km of goods were transported on the Chinese railway network, which comprises 131,000 kilometers of operating mileage [1]. The demand for transportation is growing more rapidly than infrastructure investment and construction, thus motivating the optimization of current railway operations.
A physical railway network consists of terminals and links with restricted capacity. A freight shipment is defined as being between terminals. A common method of planning railway transportation is first to arrange direct services between high-demand or high-priority origin-destination (O-D) pairs. Scheduled train services are then dispatched hauling the high-value shipments. The remaining shipments undergo consolidation operations. At the appropriate yard, shipments with different final destinations are assembled into blocks, each of which is an arbitrary unit to be considered. The blocks are eventually attached to a train, and may be transferred from one train to another. When a block arrives at its destination, it is separated from the train, and its cars are sorted. Generally, this type of railway operation consists of the following sub-problems: the blocking problem, the block-to-train problem, and the train routing and scheduling problem. The blocking problem is the foremost sub-problem; its aim is to determine the overall blocks to be built at each yard and the specific shipments that should be placed into each block to reduce the amount of intermediate handling as they travel from their origins to their respective destinations. Once a shipment is placed in a block, it will not be reclassified until it reaches that block's destination. After a blocking policy is developed, the next step is to identify which trains should carry which blocks to their destinations. On top of these two sub-problems, the train routing and scheduling problem is considered, which determines trains' routes and timetables so as to minimize the total costs of carrying the cars [2][3][4][5].
Discrete Dynamics in Nature and Society
There is a rich literature on the blocking problem and the block-to-train problem. Bodin et al. [6] established an arc-based mixed integer programming model that considered the capacity constraints at each yard to calculate the maximum number of blocks and the maximum volume of cars that can be handled. Barnhart et al. [7] formulated the railway blocking problem as a network design problem with maximum degree and flow constraints on the nodes; they proposed a heuristic Lagrangian relaxation approach to solve the problem. Ahuja et al. [8] proposed a model for the railway blocking problem using a very large-scale neighborhood search technique, as adopted by many railway companies. Jha et al. [9] formulated both arc-based and path-based time-space network models to solve the block-to-train problem at the operational level. Their models assigned a blocking plan for a given train schedule with respect to minimizing global transportation costs; they also developed greedy and Lagrangian relaxation heuristic algorithms to solve the model. Xiao et al. [10] also solved the block-to-train problem at the tactical level, which determines both the supplied services and the transportation strategy for each block established in the railway network. Yue et al. [11] introduced a model which can comprehensively describe the blocking policy and various combinations of multi-route O-D pairs in large-scale railway networks, and proposed an improved ant colony algorithm to solve the problem. Lin et al. [12] formulated and solved the problem of train connection services in the Chinese network to determine the freight train services and their frequencies, considering the differences between the freight rail networks in China and North America. However, they only considered single-block trains. Fügenschuh et al. [13] presented a linear mixed-integer model for the car routing problem arising from Deutsche Bahn's operations. The model sought the most economical car routing, and considered train and car travel kilometers and the number of sorting tracks used. For a railway network only using single-block trains, the blocking and train make-up problems are naturally combined.
Given that the blocking, the block-to-train, and the train routing and scheduling sub-problems are interrelated, some researchers have considered several of the issues as an integrated optimization problem. Zhu et al. [14] presented a model integrating service selection and scheduling, car classification, and blocking based on a cyclic three-layer space-time network. Crainic et al. [15] presented a general optimization model that considered the interactions between routing freight traffic, scheduling train services, and allocating classification work on a rail network. Gorman [16] developed a heuristic approach based on the genetic algorithm and tabu search to plan the freight railway operation by integrating train scheduling and demand-flow problems. Kwona et al. [17] developed an algorithm to improve a given blocking plan and block-to-train assignment, formulated the problem as a linear multi-commodity flow problem, and used the column generation technique to solve it. Keaton [18] formulated the train operating problem as a mixed integer programming model to determine which pairs of terminals are to be provided with direct train connections and the frequencies of service. The objective was to minimize the sum of the costs of the train, car time, and classification yard, while not exceeding limits on train size and yard volumes.
The voluminous literature on the blocking and block-to-train problems focuses mainly on the railway networks of North America and Europe, which are significantly different from that of China. Specifically, railway operation in China is supervised by the China Railway Corporation, whose hierarchical management system is different from the parallel management by different companies in North America. As the demand for passenger and freight transportation exceeds the capacity of the system, the emphasis in China is less on "scheduling", and more on managing the train and line operations to operate freight trains between passenger traffic. Freight trains are arranged by waiting until enough cars have been collected that are traveling either to the destination yard or farther. Therefore, the models and algorithms applicable to North American and European systems cannot reflect the situation of the Chinese railway network, and the methods proposed in the literature cannot be applied directly in this study. The literature on the Chinese railway network (e.g., Lin et al. [12]) generally assumes that all the shuttle train services between any two adjacent yards are provided without optimization. This simplifies the problem and reduces the modeling complexity, but also leads to suboptimal solutions. Therefore, in order to find globally optimal solutions, it is necessary to relax the assumption and reformulate the mathematical model, which constitutes the motivation of this paper. Moreover, the train service network design problem is a typical combinatorial optimization problem and has an inherent NP-hard nature, making it difficult for an exact solution algorithm to find the optimal solution within a reasonable time. Therefore, it is of great significance to develop heuristic algorithms (e.g., evolutionary algorithms [19][20][21][22][23]) to obtain high-quality solutions quickly.
This paper primarily addresses the problem of train service network design optimization at the tactical level. Its contributions are summarized as follows. (1) Two classes of train services between two adjacent yards are considered, which is different from the traditional method that assumes all the services between any two adjacent yards are shuttle services without optimization. (2) An optimization model for the problem of train service network design is proposed with respect to minimizing car-hour consumption across all the yards in the railway network. (3) An improved genetic algorithm is designed to solve the optimization model. (4) The proposed model and algorithm were tested in a realistically sized railway network. The computational results show the proposed approach can obtain reasonable-quality solutions with satisfactory speed. Moreover, a comparative analysis against a method with traditional assumptions reveals some interesting insights.
The remainder of this paper is organized as follows. Section 2 further describes the problem. An integer programming formulation is presented in Section 3. Section 4 presents a heuristic approach based on the improved genetic algorithm to solve the model. Section 5 tests the model and the approach to solving it. Finally, Section 6 summarizes the research.

Description of the Problem
There are various types of railway service including local train services, shuttle train services and through train services. These services can be used in different ways to transport a given shipment. The overall transportation process is outlined using the example in Figure 1. Figure 1 shows a simple line network consisting of four yards ( , , , ; red circles) and six railway logistics centers ( 1 , 2 , 1 , 2 , 3 , 1 ; green circles) located between pairs of adjacent yards. The green arcs (labeled 1-3) represent shuttle train services formed at one yard and broken up at the adjacent yard. The blue arcs (labeled 4-6) are through train services that are formed at one yard, pass through one or more yards and are finally broken up at a relatively distant yard. Both of these service types are collectively called direct train services. The red dotted arcs (labeled 7-9) denote local train services which pick up and deliver shipments between yards and logistics centers and carry shipments among the logistics centers. The local and shuttle train services both run between two adjacent yards; the former makes intermediate stops while the latter goes directly from one yard to the other. The local train service naturally takes longer than the shuttle train service. We consider the methods of transporting shipments → 2 , → , and → . The shipment → 2 can only be shipped by the local train service Arc 7, which forms at yard , stops at intermediate stations 1 and 2 , and finally breaks up at yard . There are two options for the shipment → : one is by Arc 1, a direct shuttle train service from to ; the other is the local train service Arc 7, which stops at both of the logistics centers 1 and 2 . These two intermediate stops make the travel time longer than that for direct shipping, despite both trains following the same path. These train services can be represented as → 1 → and → 7 → , respectively. In summary, shipments between two adjacent yards can be carried either by a shuttle or local train service.
The shipment → can be carried directly by the through train service Arc 4 from its origin to the destination without classification: i.e., (1) → 4 → . Other possible strategies, numbered (2) to (7), are also available, and the list is not exhaustive. Strategies (2), (3), (4), and (5) involve the shipment being classified once, while two classification operations are needed in strategies (6) and (7). The above analysis shows that different types of shipments have different transportation strategies, which involve different costs incurred from accumulation, reclassification, train operation, and train travel. Selecting the best shipping strategy for each shipment while minimizing the total costs is a combinatorial optimization problem.
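To make the combinatorial nature of this choice concrete, the following Python sketch enumerates the service chains for a shipment travelling end to end on a four-yard line network. The yard labels A to D and the function name `service_chains` are hypothetical, introduced only for illustration. The sketch assumes that a leg between adjacent yards may use either a shuttle or a local train, and that a leg between nonadjacent yards uses a through train.

```python
from itertools import combinations, product

def service_chains(yards):
    """List every service chain from yards[0] to yards[-1] on a line network."""
    last = len(yards) - 1
    chains = []
    for k in range(last):  # k = number of intermediate classification yards
        for stops in combinations(range(1, last), k):
            nodes = (0, *stops, last)
            legs = list(zip(nodes, nodes[1:]))
            # Adjacent legs: shuttle or local train; longer legs: through train.
            options = [("shuttle", "local") if b - a == 1 else ("through",)
                       for a, b in legs]
            for kinds in product(*options):
                chains.append([(yards[a], kind, yards[b])
                               for (a, b), kind in zip(legs, kinds)])
    return chains

chains = service_chains(["A", "B", "C", "D"])  # hypothetical yard labels
by_reclassifications = {}
for chain in chains:
    # A chain with n legs is reclassified at n - 1 intermediate yards.
    by_reclassifications.setdefault(len(chain) - 1, []).append(chain)
```

Under these assumptions the enumeration yields one chain without classification, four chains with a single classification, and eight chains with two classifications. This is consistent with the text, which lists one direct strategy, four strategies with one classification, and two of the strategies with two classifications.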

Mathematical Model
This section proposes an integer programming formulation for this problem. The model aims to minimize the total costs while satisfying constraints on train and yard capacity. Two cost factors are considered: the economic costs of train operation and the time delay cost of cars at yards during the journey. The model determines the types and frequencies of train services to run between yards and which cars are to be consolidated into a given train service. The following assumptions facilitate the model formulation.
The first assumption is that the shipment routing is given in advance. Each shipment is restricted to one transportation strategy (train service chain), and will not be split during the shipping. Each freight train is arranged when the collected cars reach its size. In general, local trains are unique in their size and operation costs compared with other types; their distinctive details are given in this paper. The positive shipment volumes among the logistics centers mean that local train services running between two adjacent yards must be provided without optimization.

Notations.
The following notations are defined:
: Set of yards in the railway network. , , , , and refer to any yard belonging to .
: Set of yards along the path from yard to , including , and .
: Set of yards adjacent to yard .
: Set of the intermediate logistics centers along the path from yard to yard , ∈ . represents the th logistics center, ∈ .
: Average car accumulation parameter at yard per day. The accumulation cost of a train is the product of the term and the number of cars in the train ( , defined below). This parameter represents the influences of all factors other than the size of the train on the accumulation cost. It depends on the arrived car flow and the level of organizational work at the accumulation yard. Its detailed description and calculation can be found in the literature [24].
: Utilization coefficient of classification capacity at yard .
: Average extra time cost (in hours) per car at the yard compared with when a train passes through the yard without classification.
: Number of shunting lines at yard .
: Classification capacity in terms of the number of cars at yard .
: The original shipment from yard to yard in terms of the number of cars.
: Shipment originating from yard and destined to the intermediate logistics centers located between yard and yard , ∈ .
: Average number of cars per train (excluding local trains).
: Average number of cars for local trains.
: Number of intermediate logistics centers along the path from yard to yard , ∈ .
, +1 : Shipments from logistics center to +1 , ∈ , 0 ≤ ≤ .
: Fixed train operation costs of train → (excluding local trains).
: Fixed operation cost of a local train → , ∈ .
: Average accumulation time in hours for a local train at yard without a shuttle train running.
: Average accumulation time in hours for a local train at yard with a shuttle train running.
: Average travel time in hours of a shuttle train → , ∈ .
: Average travel time in hours of a local train → , ∈ .
: Coefficient of converting the train operation cost to the equivalent car-hour consumption.
Three groups of decision variables are defined:
: Its value is 1 if the through train service → , ∉ is dispatched. Otherwise, it is zero.
: Its value is 1 if the shuttle train service → , ∈ is dispatched. Otherwise, it is zero.
: Its value is 1 if the cars whose destination is yard take the direct train service → at yard . Otherwise, it is zero.
The formulation also requires the following intermediate variables to be defined.
: Actual number of cars from yard to , including the original shipment demand and cars from other yards to the yard that are classified at yard .
: Number of cars allocated to the train service → .
: Number of cars classified at yard .
: Operational frequency of local train service → , ∈ .
: Number of cars from the yard to , ∈ .

Objective Function.
The objective of the model is to minimize the total costs comprising those of accumulation, classification, train travel, and train operation. Cars traveling between two adjacent yards should be distinguished from those traveling among other yards. The final objective function is expressed as follows.
where ∑ ∈ ∑ ∈ + represents accumulation car-hour costs when a shuttle and a local train service are dispatched simultaneously between two adjacent yards; in this case, the shipments between two adjacent yards are transported by the shuttle train service. ∑ ∈ ∑ ∈ 1 − + represents accumulation car-hour costs when only the local train service runs between two adjacent yards, and thus shipments between two adjacent yards are transported by the local train. In these two cases, the cars traveling between two adjacent yards take different types of train, generating different travel costs, which can be calculated as ∑ ∈ ∑ ∈ + 1 − . The operation costs of the local train are defined as ∑ ∈ ∑ ∈ 1 − , while those of the shuttle train are ∑ ∈ ∑ ∈ / . For cars moving between two nonadjacent yards, the costs include those of accumulation ∑ ∈ ∑ ∉ , classification ∑ ∈ , and train operation ∑ ∈ ∑ ∉ / .

Constraints.
The formulation includes the following constraints:
(1) The cars from yard to yard are either assigned to a through train service without classification or carried indirectly by a sequence of train services.
(2) Cars going from yard to yard can be classified at the intermediate yard only if a train service → is arranged.
(3) The number of cars classified at each yard should not exceed its classification capacity. The value of is calculated as
The value of is calculated as
(4) The number of occupied sorting tracks should be less than the number of available tracks. Here g is a step function reflecting the utilization of the tracks. Assuming that one track can service a maximum of 200 cars, it is determined as follows: where is calculated as
(5) The operation frequency of the local train needs to meet the transportation demands. The value of w is calculated as
(6) The values of the decision variables , , and can be either 0 or 1.

Solution Approach
The optimization of the train service network design model can be regarded as a combinatorial optimization problem of the decision variables: , , and . The total number of decision variables increases exponentially with the number of yards in the network. This is a typical multi-variable NP-hard problem, which can be feasibly solved by heuristic methods such as the genetic algorithm, particle swarm optimization, and the simulated annealing algorithm. Most many-objective evolutionary algorithms use a one-by-one selection strategy to solve many-objective optimization problems because of their incapability to balance convergence and diversity in the high-dimensional objective space [25]. Considering that the first classification yard of cars between two yards is an integer value, it is convenient to implement encoding in a genetic algorithm, which is therefore applied here to solve the model. The genetic algorithm, first proposed by Holland [26], is an adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics. It is of high generality and robustness, and is especially suitable for solving combinatorial optimization problems, because the decision variables are easy to represent during the encoding process. It has been used successfully to solve train service network design problems [27][28][29][30].
The genetic algorithm includes the following elements: encoding, population size, initial population, fitness function, crossover, mutation, population regeneration, penalty function, and ending conditions. Figure 2 shows the framework of using the genetic algorithm to solve constrained combinatorial optimization problems. The genetic algorithm is set up as follows to solve this specific optimization model.
(1) Encoding: We set the variable to indicate the first classification yard of the cars from yard to yard as a gene to encode. The value range of each is − { }. Note that the chosen value of the cars between two adjacent yards can be zero to indicate that only the local train service is dispatched. In a railway network with | | yards, each is set with a specific value and arranged in a fixed sequence, as shown in Figure 3, which forms a chromosome. Each chromosome is a possible solution for the model. Changing the value of each gene in the chromosome obtains different chromosomes that constitute the solution space of the problem.
(2) Population size and initialization: The population size has an important influence on the performance of the algorithm. Its preferred value usually depends on the size of the problem. The initial population is obtained by randomly choosing the first classification yard for each car flow.
[...] (3) and (4). Therefore, we use the following formula to evaluate the feasibility of the new chromosomes and adjust the value of the genes where necessary to guarantee that the chromosomes in CrossPOP( + 1) represent feasible solutions.
(7) Dynamic mutation: Each gene in the chromosomes of CrossPOP( + 1) is mutated with probability , which is changed at each iteration. The initially chosen value for is decreased by multiplying it by a decay factor until it reaches a specified minimum value . Similarly, after mutation, formula (17) evaluates the feasibility of the newly generated chromosomes, and adjusts the values of the genes where necessary to guarantee that the chromosomes in MutPOP( + 1) represent feasible solutions. An operation that filters identical individuals retains only one in the population; the remaining repeated individuals are updated by the mutation operation.

Data Preparation.
This section tests the formulation and the proposed algorithm on a realistic railway sub-network in Northeast China, as shown in Figure 4. The network contains 21 yards, numbered 1-21. [...] demands between any two yards. The fixed train operation costs are listed in Table 2. The technical parameters of each yard are shown in Table 3. Note that the operation cost values here are not actual currency costs, but a relative value for comparison purposes. The value of α is defined as 3, the average train size is 55, and the average local train size [...] time for local train at yard with and without a shuttle train running and , the number of intermediate logistics centers between two adjacent yards , the average travel time of a shuttle train , the average travel time of a local train and the fixed operation cost of a local train . The travel path of each shipment is defined in Appendix B. The row header indicates the origin yard, and the column header indicates the destination yard. The cell value is the yard sequence along the route: i.e., . Note that real-world data are processed for some parameters. The dummy data used here are solely for testing the model and algorithm.
Partial table (rows 1-3 of 21; the final entry of row 3 is not available):
     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
 1   - 305 323 303 295 321 295 323 325 348 345 363 348 318 345 347 348 359 345 345 345
 2 295   - 309 347 323 295 323 345 356 356 359 323 325 312 363 323 298 356 363 363 363
 3 323 311   - 312 330 323 318 333 323 325 348 345 363 348 318 345 363 348 318 318
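Three pieces of the model and algorithm described above can be sketched in Python: the sorting-track step function of constraint (4), the encoding and random initialization of steps (1) and (2), and the decaying mutation probability of step (7). This is a minimal sketch under stated assumptions, not the authors' implementation. The ceiling form of the step function is an assumption derived only from the statement that one track serves at most 200 cars. All function names, the small line network, and the parameter values are hypothetical.

```python
import math
import random

CARS_PER_TRACK = 200  # from the text: one track can service at most 200 cars

def tracks_needed(cars):
    """Assumed ceiling form of the step function g for a flow of `cars` cars."""
    return math.ceil(cars / CARS_PER_TRACK) if cars > 0 else 0

def gene_values(pair, path, adjacent):
    """Allowed gene values: yards on the path except the origin, plus 0
    (local train only) when the two yards are adjacent."""
    values = list(path[1:])
    if pair in adjacent:
        values.append(0)
    return values

def random_chromosome(paths, adjacent, rng):
    """One gene per car flow (i, j): its first classification yard."""
    return {pair: rng.choice(gene_values(pair, path, adjacent))
            for pair, path in paths.items()}

def mutation_probability(p0, decay, p_min, iteration):
    """Initial probability p0, multiplied by `decay` each iteration, floored at p_min."""
    return max(p_min, p0 * decay ** iteration)

def mutate(genes, paths, adjacent, p, rng):
    """Resample each gene independently with probability p."""
    child = dict(genes)
    for pair, path in paths.items():
        if rng.random() < p:
            child[pair] = rng.choice(gene_values(pair, path, adjacent))
    return child

# Hypothetical three-yard line network 1 - 2 - 3 with fixed shipment paths.
paths = {(1, 2): [1, 2], (1, 3): [1, 2, 3], (2, 3): [2, 3]}
adjacent = {(1, 2), (2, 3)}
rng = random.Random(0)
parent = random_chromosome(paths, adjacent, rng)
p = mutation_probability(p0=0.1, decay=0.9, p_min=0.01, iteration=5)
child = mutate(parent, paths, adjacent, p, rng)
```

The feasibility repair that the text applies after crossover and mutation is not reproduced here, because the corresponding formula is not available in this text. A fuller version would check each chromosome against the classification capacity and sorting-track limits before accepting it.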

Conclusions
This paper proposes a model for the tactical optimization problem of train service network design. It aims to determine the type and frequency of each service provided among yards. The constraints are formulated in terms of classification capacity, number of tracks, and actual operational requirements. The objective function considers the train operation costs and the car-hour consumption due to accumulation, classification, and train travel. An improved genetic algorithm was developed to overcome the difficulties in solving the model, which includes an enormous number of decision variables and complicated constraints. It was tested in a real-world 21-yard Chinese railway sub-network. The approach produced relatively stable results within a reasonable computation time. The results show that, compared with the case in which all shuttle train services between pairs of adjacent yards are dispatched, the optimization model saves cost, as it reduces the number of services and the total number of dispatched trains. In particular, the decrease in accumulation car-hour consumption makes the largest contribution to the final saving in total cost, although the costs of classification, train operation, and train travel slightly increase. Future research will focus on timetabling based on the train operation plans obtained here.

Appendix
See Tables 10 and 11.
[...] best solution. Table 4 shows the dispatched train services and frequencies. In total, 163 direct services, including 39 shuttle train services, are arranged. In addition, given that the arcs in the network are directed, 62 local train services are arranged without optimization; they and their frequencies are listed in Table 5. Table 6 shows the value of . The row header indicates the origin yard and the column header indicates the destination yard . The cell value is the first classification yard of the cars going from to . Clearly, if the cell value is equal to , a train service from to is provided. Note that for a pair of adjacent yards and , if the cell value is zero, only the local service is dispatched between them. Consider as an example the cars going from yard 2 to yard 9: they are first classified at yard 3, where they are merged with cars traveling from yard 3 to yard 9; these cars are then reclassified at yard 8, and merged with cars from that yard also going to yard 9; the final leg is the local service from yard 8 to yard 9. Table 7 shows the classified cars and the utilized sorting tracks at each yard.
The performance of the proposed improved genetic algorithm (IGA) is highlighted through comparison with two other state-of-the-art algorithms widely used in combinatorial optimization problems: the simulated annealing (SA) [31] and particle swarm optimization (PSO) algorithms [32]. Like the proposed IGA, the SA and PSO algorithms were each run 10 times on the same railway network. The SA solutions (objective value) vary from 668,287 to 681,763. The variance is 13,580,513 and the average solution time is 501 s. The PSO solutions vary from 661,896 to 673,411, with a variance of 13,288,476 and an average solution time of 487 s. Table 8 presents the computational results of the three methods, IGA, SA, and PSO. Table 8 shows that all three meta-heuristic methods find similar solutions for the same problem instance. Among them, the proposed IGA outperforms the others in terms of both solution quality and solution time. Therefore, it represents a good choice for solving the train service network design problem.
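The way a table of first classification yards is read can be sketched in Python using the yard 2 to yard 9 example above. The function name `service_chain` and the dictionary layout are hypothetical. The three cell values are assumptions chosen to reproduce the chain described in the text; in particular, the value 0 for the pair (8, 9) is inferred from the statement that the final leg is the local service.

```python
def service_chain(first_yard, origin, destination):
    """Follow first-classification-yard entries from origin to destination."""
    chain, current, visited = [], origin, set()
    while current != destination:
        if current in visited:
            raise ValueError("cycle in first-classification table")
        visited.add(current)
        nxt = first_yard[(current, destination)]
        if nxt == 0:
            # Zero means only the local service runs to the adjacent destination.
            chain.append((current, destination, "local"))
            current = destination
        else:
            chain.append((current, nxt, "direct"))
            current = nxt
    return chain

# Assumed cell values consistent with the example in the text.
first_yard = {(2, 9): 3, (3, 9): 8, (8, 9): 0}
chain = service_chain(first_yard, 2, 9)
```

Each tuple in `chain` is one leg of the journey, so the number of reclassifications for a flow is the number of legs minus one, which is two in this example.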

Comparison with the Traditional Assumption-Based Method.
Consider another case having all the shuttle services between two adjacent yards provided without optimization, which is a widely adopted assumption in existing literature, [...]

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.