Application of Modified NSGA-II to the Transit Network Design Problem

&e transit network design problem involves determining a certain number of routes to operate in an urban area to balance the costs of the passengers and the operator. In this paper, we simultaneously determine the route structure of each route and the number of routes in the final solution. A novel initial route set generation algorithm and a route set size alternating heuristic are embedded into a nondominated sorting genetic algorithm-II(NSGA-II-) based solution framework to produce the approximate Pareto front. &e initial route set generation algorithm aims to generate high-quality initial solutions for succeeding optimization procedures. To explore the solution space and to have solutions with a different number of routes, a route set size alternating heuristic is developed to change the number of routes in a solution by adding or deleting one route. Experiments were performed onMandl’s network and four larger Mumford’s networks. Compared with a fixed route set size approach, the proposed NSGA-IIbased solution method can produce an approximate Pareto front with much higher solution quality as well as improved computation efficiency.


Introduction
Followed by frequency setting, timetable development, bus scheduling, and driver scheduling, the transit network design problem (TNDP) [1] aims to determine a set of routes to operate in an urban area to balance the costs of the passengers and the operator. e performance of the succeeding four stages highly depends on the quality of the results of the transit network design. us, the TNDP has been continuously studied during the last five decades. Known as a NPhard problem, the TNDP is a difficult combinatorial optimization problem, and its optimal solution is difficult to find even by using supercomputers.
Despite the attention given TNDP, the determination of a suitable number of routes (or a range) for a specific network has received little attention. e route set size is an important parameter in the transit network design because of the following reasons. First, the route set size implicitly reflects the total route length, which directly affects the fuel costs, the mileage of the public transport, and other operation costs. Second, there is a trade-off between the route set size and the average travel time, which is an important indicator of the transit network performance. A larger route set size usually indicates more direct services for the passengers. Optimizing the route set size can lead to solutions with a wide range of trade-off levels between the operator cost and the passenger cost.
Pattnaik et al. [2] proposed a variable string length coding method in which the route set size is determined along with the route set. Similarly, Tom and Mohan [3] incorporated the variation for the route set size in the coding to address the transit network routing and frequency setting problem. However, the mentioned research studies imply two limitations: (i) the optimal route set is only the subset of a large set of candidate routes generated by a candidate route generation algorithm. is method is not practical for large networks as the candidate route set scales exponentially with the network size. (ii) e resultant model can only give one answer per run. To address these issues, in this work, we consider the number of routes as an endogenous variable and vary it during the search for the approximate Pareto front. In other words, a route set size alternating heuristic is used together with the NSGA-II for solving simultaneously the TNDP and the route set size determination problem.
is method can achieve a relatively good balance between the passenger cost and the operator cost. However, the determination of a suitable set of weights usually requires many experiments, which is time consuming, and the suitability of the weights obtained may vary from network to network. Besides, some researchers [10][11][12][13][14][15][16][17][18] approached the TNDP by considering the operator cost or the passenger cost as the only objective to be optimized. As a result, given the conflicting nature of the objectives, the optimal solution obtained might not be practical from the viewpoint of the other stakeholder. erefore, in this work, we try to produce an approximate Pareto front from which the desired solution can be selected by the decision maker.
e main contributions of this research are threefold: (a) the development of an initial route set generation algorithm to produce high-quality initial solutions for the TNDP; (b) the proposal of a route set size alternating heuristic to alternate the number of routes in a solution during the optimization procedure; and (c) the illustration of the applicability of the proposed solution method in five networks with different scales. e remainder of this paper is organized as follows. In Section 2, we review existing works on route set initialization and optimization methods. e transit network design problem is described in Section 3. Section 4 elaborates the proposed initial route set generation algorithm and route set size alternating heuristic. Section 5 presents numerical results, and we conclude in Section 6.

Literature Review
e typical solution method for TNDP comprises two parts: an initialization procedure and an optimization algorithm. In this section, we systematically review related works from these two perspectives.

Initialization Procedure.
Most initialization procedures can be classified into two categories: initial route set generation and candidate route set generation. One route set consists of a predefined number of stop sequences which can be used to define a transit network. As our work is focusing on the transit routing problem, interesting readers for other transit network planning problems can refer to Guihaire and Hao [19], Kepaptsoglou and Karlaftis [20], Farahani et al. [21], Abedin et al. [22], and Liu and Ceder [23]. Lampkin and Saalmans [24] first designed a heuristic algorithm to generate route set. ey produced a skeleton route of four nodes and inserted acceptable nodes one by one into the skeleton route according to a predefined criterion until a complete route is obtained. After each insertion, demand satisfied by the formed route will be eliminated in order to guide the process. is framework of constructing route was adopted to develop initial route generation algorithm by Baaj and Mahmassani [5]. Mandl [25] approached urban transit routing problem with two steps: first all the shortest paths connecting every origin-destination pair were found to generate feasible initial route set, and then used heuristics to improve the quality of initial route set. With the consideration of transit demand, Baaj and Mahmassani [5] designed a route generation algorithm (RGA) to generate initial route set. Specifically, high demand node pairs are selected to define the initial skeletons with the help of a designer. Once the skeletons were specified, they were expanded into routes with a node selection or insertion strategy. e algorithm can also generate different sets of routes with different trade-offs between passengers and operator costs. Inspired by RGA [5], Mauttone and Urquhart [26] proposed another route generation algorithm called pair insertion algorithm. ey used a novel strategy of inserting pairs of nodes, instead of the original expansion of routes by inserting single vertices. Experimental results reported by them evidenced that pair insertion algorithm can produce better solutions from the operator viewpoint with similar quality from the passengers' viewpoint.
To avoid defining a skeleton first, Chakroborty and Wivedi [12] determined each route by first selecting the starting node and then adding adjacent nodes to the former node until some termination criteria are satisfied. e probability of selecting a node i is determined by its activity level which is calculated based on the number of trips originating from node i and starting from node i. Similarly, Mumford [27] constructed the first route by adding adjacent nodes to a randomly selected start node and initiated other routes in the same way with a start node used in previously generated routes. Kechagiopoulos and Beligiannis [13] and Owais and Osman [28] proposed a similar procedure for initial route set generation. Instead of selecting nodes randomly, Nikolic and Teodorovic [16] developed a deterministic procedure for initial route set generation. ey generated the initial solution by always selecting the shortest path with the largest number of passengers that enjoy the direct service till the number of routes generated equals a predefined number. Most recently, Kilic and Gok [14] formulated an initial route set procedure based on the edge usage statistics. e statistics was obtained by computing all the shortest paths in the graph and then calculating the edge usage score based on the total traffic passing over a single edge. Each edge score was weighted by the inverse length of the edge and normalized. e normalized value is set as the discrete probability for the edge. Each route is then expanded by selecting adjacent edges with predefined probabilities. eir method improves upon published results for Mumford's larger networks.
In an initial route set generation procedure, multiple initial route sets are required for population-based optimization methods. Candidate route set generation methods generate a set of candidate routes. Fan and Machemehl [7] and Fan and Machemehl [29] modified Yen's k-shortest path algorithm [30] to generate all feasible routes whose distances satisfy the minimum and maximum length constraints for all nonzero demand origin-destination pairs. Similarly, Arbex and Cunha [31] generated a route database: for each origindestination pair with nonzero travel demand, the associated OD path that corresponds to the ideal shortest direct travel time is added to the database as a bus route, along with those routes whose travel time does not exceed the fastest time by a maximum route detour factor specified by the planner. In addition, Cipriani et al. [6] developed a heuristic algorithm to generate three different and complementary sets of rational and realistic routes. A set of candidate routes is formed by using Newton gravity theory and a special shortest path procedure [10]. In Afandizadeh et al. [4], paths with lengths within a desirable limit (i.e., travel time up to 1.2 shortest path travel time) are selected as initial paths. e best subset of the large candidate route set is selected by an optimization procedure.

Optimization Algorithm.
After the generation of an initial route set, heuristics-or metaheuristics-based optimization algorithms are utilized to improve solution quality, e.g., genetic algorithm (GA), simulated annealing, bee colony algorithm, artificial bee colony, particle swarm optimization, nondominated sorting genetic algorithm-II, tabu search, memetic algorithm, differential evolution, and selection hyperheuristics. For candidate route set approaches, the most commonly used algorithm is GA.
A GA with seven proposed genetic operators is designed to facilitate the search within a reasonable amount of time [8]. e model designs the bus routes in two phases: the route improvement algorithm using genetic operators and coordination of headways to improve the efficiency of the network. e results showed that the proposed model is more efficient than the binary-coded genetic algorithm benchmark. Fan and Machemehl [7] developed a GA to select an optimal set of routes from the candidate route set solution space. Numerical results showed that genetic algorithms outperform local search methods with multiple starting points and provide no worse solution quality than either simulated annealing or tabu search. Cipriani et al. [6] proposed a parallel GA for finding a suboptimal set of routes with the associated frequencies. Numerical experiments carried out on network of the city of Rome showed that transit demand can be served effectively by a bus network composed of a lower number of lines. Chew et al. [32] developed a GA to address the biobjective transit routing problem. e computational results for Mandl's network showed that the proposed algorithm performs better than the previous best published results in most cases. Afandizadeh et al. [4] applied GA to the city of Mashhad and the results indicated highly effective improvement in terms of travel time and fleet size. Nayeem et al. [15] developed a GA with elitism to optimize the size of satisfied passengers, the overall travel time of served passengers, and the total number of transfers. Owais and Osman [28] proposed a GA to tackle the multiobjective transit network design problem by optimizing average travel time and the required bus fleet.
Besides, Fan and Machemehl [29] used a simulated annealing algorithm for the first time to solve the optimal bus transit route network design problem at the distribution node level. Zhao [33] proposed a stochastic local search method based on simulated annealing and fast descent search. Numerical experiments showed that the proposed method is capable of tackling large-scale transit network design optimization problems and producing results in a reasonable amount of time. Later, Zhao and Zeng [18] developed a metaheuristic search scheme that combines simulated annealing, tabu, and greedy search methods to optimize transit route network structure, vehicle headways, and timetables. Kilic and Gok [14] applied hill climbing algorithm and tabu search algorithm, respectively, on the transit network design problem. e experiment results for Mumford's four larger networks showed improvements over results in Mumford [27].
In the last decade, Nikolic and Teodorovic [16] designed a bee colony optimization algorithm for transit network design.
is algorithm uses a similarity among the way in which bees in nature look for food and the way in which optimization algorithms search for an optimum of a combinatorial optimization problem. e numerical experiments performed on Mandl's network and a larger instance showed that BCO algorithm is competitive with other approaches in the literature, and it can generate high-quality solutions. Szeto and Jiang [34] developed a hybrid artificial bee colony algorithm to solve the transit route and frequency design problem. e memetic algorithm (MA) is one of the recent growing evolutionary computation algorithms. Zhao et al. [35] developed a MA algorithm to design both optimal route configuration and service frequency for the urban transit network. Compared with traditional algorithms, the results obtained by MA demonstrates that the proposed algorithm could improve the computational performance. Kechagiopoulos and Beligiannis [13] developed an optimization algorithm based on particle swarm optimization.
Most recently, Buba and Lee [36] applied differential evolution algorithm and a repair mechanism for the multiobjective transit routing problem. Numerical experiments are performed on Mandl's network and four larger networks of Mumford [27]. Buba and Lee [37] proposed a hybrid differential evolution with particle swarm optimization to design the transit network structure and associated frequencies. e computational results showed improvements over results in most of the previous studies. Ahmed et al. [38] evaluated the performance of a set of selection hyperheuristics on the route design problem of bus networks, with the goal of minimizing the passengers' travel time and the operator's costs.

Problem Definition
e transit network design problem (TNDP) can be defined by an undirected graph G � (V, E) and a travel demand matrix D. V is the set of vertices v 1 , v 2 , . . . , v n representing predefined pick-up/drop-off points. E is the set of edges e 1 , e 2 , . . . , e m standing for direct transport edges between vertices. Each edge e � (i, j) ∈ E is associated with a travel time l(i, j) denoting the necessary time for a vehicle to travel from vertex i to vertex j. (l(i, j) � ∞, if e(i, j)∄ ). Travel demand matrix D � d ij , i, j ∈ V is given to present the travel demand between every pair of vertices; d ij denotes the number of trips per time unit from vertex i to vertex j. A route r can be expressed as a sequence of adjacent vertices i�1 v i is the set of vertices of route set R (|r| is the number of vertices in node sequence r). E r k � e(r ki , r k,i+1) ): i � 1, . . . , |r k | − 1 denotes the set of edges of route r k and E R � ∪ r∈R E r . To compare our outcomes with existing results, this study mainly focuses on the generation of route sets and leaves the frequency setting problem for future studies.
From the viewpoint of passengers, the travel time and number of transfers are the major concerns. e average travel time ATT (including in-vehicle travel time and a transfer penalty) is selected as the indicator of the service level to depict the passenger cost. Since the average travel time includes the transfer cost, the number of transfers is implicitly included in the ATT. e objective function from the passengers' viewpoint is defined as follows: where d ij denotes the number of trips per time unit from vertex i to vertex j. p is the transfer cost representing the inconvenience of moving from one vehicle to another, and t ij (R, p) represents the shortest travel time from vertex i to vertex j by using route set R. To be consistent with former research studies, the transfer cost p is set as 5 minutes [4,6,16,27,32]. To compare our results with existing published results, we assume that passengers always choose the shortest path to reach their destinations. From the operator's perspective, the sum of the lengths of all routes in a route set is directly proportional to the fleet size, which is an important component in the operator cost [26]. us, the total route length (TRL) of a route set is taken as a proxy. e objective function from the operator's viewpoint is given as follows: where L r � r i�2 l(r i−1 , r i ) represents the length of route r. Mathematically, this optimization problem can be stated as Constraint (5) limits the number of vertices of each route. Constraint (6) ensures that all vertices in V can be found in at least one route. Constraint (7) ensures that every vertex is reachable from other vertices in the subgraph defined by R [36]. Constraint (8) ensures that each node can only be visited by a route at most once. Recall that route r k is described as a node sequence (v k1 , v k2 , . . . , v kq ), where q denotes the number of nodes in node sequence r k . Constraint (9) ensures that all consecutive vertices are connected by an existing edge (l(i, j) ≠ ∞). e uniqueness of route r in R is ensured through the definition of route set R.

Overall Solution Framework.
In the developed solution method, a novel initial route set generation algorithm is constructed to produce high-quality initial route sets, while a route set size alternating heuristic is developed to switch the route set size of a solution by adding or deleting an entire route.
ese two algorithms are embedded into the framework of NSGA-II to search for the approximate Pareto front in terms of the operator cost (i.e., the total route length, TRL) and the passenger cost (i.e., the average travel time, ATT).
Multiple high-quality initial solutions are firstly produced by the initial route set generation algorithm. Figure 1 presents the flowchart of the proposed initial route set algorithm. Two objective values are then calculated for each initial solution to evaluate the fitness of initial solutions. And all initial solutions are regarded as parents and used to generate the offspring through genetic operators (i.e., selection, crossover, and mutation operators). In this work, the application of crossover operator generates only one offspring. erefore, the number of solutions in the offspring remains the same as the parents. After the mutation, the route set size alternating heuristic is applied to the offspring to change the number of routes in an offspring by adding/ deleting if its solution quality has not been improved for a predefined number of generations. In order to select the individuals of the next generation, the NSGA-II is subsequently applied to the parents and the offspring. For extensive surveys on NSGA-II, readers are referred to Deb et al. [39]. is process is repeated until the stopping criteria are satisfied.
e following three datasets are needed to perform route combination algorithm (RCA): (i) initial candidate route set (containing all the shortest paths) denoted by P; (ii) transit demand served by each candidate route and denoted by P v ; and (iii) the origin-destination pairs served by each candidate route and denoted by P d . Note that due to the symmetry of demand and travel time, only triangular elements are used.

Initial Route Set Generation Algorithm.
e proposed initial route set generation algorithm is based on the observation that maximizing the travel demand that can be satisfied directly (without transfers) is a key component in the generation of an initial route set. is algorithm consists of two procedures: absorption and combination. As the algorithm is based on the combination of routes, it is called RCA. ese procedures generate an initial route set based on the following parameters: SP, D, and G. SP contains all the shortest paths connecting every origin-destination pair. G is the undirected graph. An example in which a graph and a matrix D is used to illustrate the initial route set generation algorithm is shown in Figure 2. e numbers adjacent to the links represent the travel times in minutes. Table 1 gives the dataset structure that is needed to generate an initial route set. e second column gives all the shortest paths between the vertices in the example graph. OD pairs served by the shortest paths are shown in the third column. And the last column contains the travel demands served directly by each shortest path. For example, for the second route in Table 1, the shortest path (1, 2, 3) starts from vertex 1, goes through vertex 2, and terminates at vertex 3.
is path serves the travel demand of one OD pair (1-3).

Absorption Procedure.
If a route is completely included in another route, we consider that this route can be absorbed by another route. e basic principle of absorption is to eliminate routes which are completely included in another route. By doing this, the number of routes in R and the total route length can be efectively decreased. e absorption procedure is composed of four steps: Step 1. Sort the routes in R in the order of the number of vertices of each route. e routes that serve no demand are eliminated. Let M be the total number of routes in R and set m � 1. For example, the routes in Table 1 are sorted and given in Figure 3(a). Particularly, the third route (1,4) in Table 1 is eliminated since it serves no demand.
Step 2. Find all the routes (except route m) that can absorb route m. Let Q be the set of routes that can absorb route m. If m � M, stop; otherwise, go to Step 3.
Step 3. If Q is empty, set m � m + 1, and return to Step 2; otherwise, based on the travel demands served by routes in Q, a probability p q is appointed to each route q and reflects the probability of each route to be selected as the route to absorb route m. e probability of selecting route q is calculated by Step 4. Use the roulette wheel selection method to select a route from Q based on the values of probabilities p q . As soon as a route is selected to absorb route m, the ODs and demands served by route m are assigned to route q. en, we need to delete route m from R and return to Step 2.
An example for the absorption procedure is shown in Figure 3. In Figure 3(a), the first route (1, 2) is completely included in route 6 (1, 2, 3), namely, the first route is absorbed by route 6 and the travel demand of the first route can be served by route 6. erefore, we need to erase the first route from R.
From Figure 3, it can be seen that the number of routes in R is decreased from 9 to 4, but the travel demand served by each route is increased. After the absorption procedure, 3 * N rou routes with high demands are selected as the input datasets for the following procedure, where N rou is the number of routes in a solution set (predefined according to Table 2).

Combination Procedure.
e combination procedure aims to construct new routes with pairs of routes in R. e basic principle of the combination procedure is to combine pairs of routes with high demands. In this way, we can simultaneously decrease the number of routes and increase the travel demands served by one route. Before constructing new routes, pairs of routes are selected. e selection of one pair of routes relies on the relationship between two routes. In the light of the positions of vertices shared by two routes, the relationships between two routes can be classified into six types. Figure 4 shows the six types and their combined results.
If two routes fall into types of 1, 2, 3, or 5, then these two routes can be combined. e combination procedure is composed of five steps: Step 1. Sort the routes in R in descending order based on the transit demand of each route. Let M be the total Journal of Advanced Is path1 the longest path in P?  Journal of Advanced Transportation number of routes in R and set m � 1. For example, the routes in Figure 3 are sorted and given in Figure 5(a).

No
Step 2. Find all the routes (except route m) that can combine with route m. Let Z be the set of routes that can combine with route m. If m � M, go to Step 5; otherwise, go to Step 3.
Step 3. If Z is empty, set m � m + 1, and return to Step 2; otherwise, according to the travel demands served by routes in Z, a probability p z is distributed to each route z and depicts the probability of each route to be selected as the route to combine with route m. e probability of selecting route z is computed by Step 4. Use the roulette wheel selection method to choose a route from Z based on the values of probabilities p z . As soon as a route z is selected, route m and route z are combined into a new route according to Figure 4. e new route serves the travel demands and ODs of both routes. en, we should delete routes m and z from R and return to Step 2.
Step 5. Sort routes again in R in descending order by the transit demand of each route. e initial route set is formed of the first N rou (the number of routes in a route set) feasible routes in R. Check the feasibility of the initial route set by using feasibility checks described in Section 4.3.
An example for the combination procedure using the output of the absorption example is shown in Figure 5. First, the routes are sorted in descending order based on the travel  Journal of Advanced Transportation demand served directly by each route. e relationship between the first route and other routes are determined, that is, routes 1 and 2 fall into type 1, routes 1 and 3 belong to type 1, and routes 1 and 4 are in type 2. With the aid of equation (11), the probabilities of selecting these three routes are 0.36, 0.33, and 0.30. We assume that route 2 is chosen to combine with route 1, and the new route is constructed according to Figure 4. It should be noted that the demand covered by the new route is doubled. e total route length of R is also decreased from 36 minutes to 34 minutes. e number of routes also decreases from 4 to 3, as shown in Figure 5(b). Similarly, in Figure 5(b), routes 1 and 2 are combined. From Figure 5, it is not uncommon that not only the number of routes in R is decreased from 4 to 2, but also the travel demand served by each route is increased. Hence, the result of the combination procedure can be used as an initial route set for optimization algorithms.

Feasibility Checks.
e TNDP is a multiconstrained problem, and the feasibility of the solutions should be checked carefully. Inspired by [27,31], our proposed feasibility checks are detailed as follows: (1) Route feasibility check: check if every route is feasible, that is, no repeated vertices appear in a route, and the number of vertices in a route must be within a predefined range. (2) Route uniqueness check: if one route is the same as a segment of another route or another entire route, we regard this route as nonunique. As a result, this route is directly eliminated. in the solution set at least once. is process constructs an appeared-vertex list by searching all the routes and checks if every vertex appears in the list. (4) Network connectivity check: the purpose of this procedure is to determine whether a route is connected to the transit network.
After feasibility checks, a repair mechanism is applied to repair infeasible solutions. It should be noted that the generated initial solutions are always feasible for the first two feasibility checks. Only infeasible initial solutions which fail area coverage check are repaired. Other infeasible solutions (network connectivity check) are simply discarded during Combined results

None
The shared vertices are on the first part of route 1 and on the last part of route 2.
The shared vertices are on the first part of route 1 and on the middle part of route 2.
The shared vertices are on the middle parts of both routes.
The shared vertices are not continuous on both routes.
There is no vertex shared by both routes.
The shared vertices are on the first parts of both routes.   Journal of Advanced Transportation initial solution generation and the initial route set generation algorithm is called again. e repair mechanism for initial solutions is described as follows. e repair mechanism randomly selects a route from the solution and then applies insert operation. e insert operation constructs a list of candidate nodes by searching all positions and checking the existence of the edges (in the origin graph) between the used and unused nodes of the modified route, and then it randomly selects one of the candidate nodes and inserts it into the route. e above procedure is repeated until all nodes can be found in the route set. If there are nodes that cannot be inserted into the initial route set, this solution is discarded. Note that this procedure is also applied to the output solutions of route set size alternating heuristic.

Route Operators.
Due to the complex structure of the route set, one crossover and three mutation operators are designed to facilitate the generation of offspring. e route crossover operator is developed to exchange two routes between two individuals. Each individual of the current generation is selected in turn as the first parent. e second parent is randomly determined. en, two routes from each parent are randomly determined and exchanged.
ree mutation operators, namely, the sequence, insert, and delete mutation operators, are proposed in this study to mutate the route structure of an individual. Specifically, the sequence mutation operator is developed to replace a random sequence of a route with a k-th shortest path. For the inserted mutation operator, all the positions in the route will be checked to determine whether there exists a vertex that can be connected to two adjacent vertices in the route with only two edges. en, a suitable vertex will be randomly selected and inserted into the route. Comparatively, the delete mutation operator constructs a candidate list of vertices and randomly deletes a vertex.
Before a crossover/mutation operation is made, the result of crossover/mutation will be checked for route feasibility and route uniqueness. If these checks return true, the crossover/mutation is made. e solutions are feasible for the first two checks all the time. After all solutions go through crossover and mutations, the last two feasibility checks are performed for child solutions.
If the area coverage constraint is not satisfied, a route in the child solution is selected from the solution and then insert operation is applied. e insert operation constructs a list of candidate nodes by searching all positions and checking the existence of the edges (in the origin graph) between the used and unused nodes of the modified route, and then it randomly selects one of the candidate nodes and inserts it into the route. e above procedure is repeated until all unused nodes are inserted. If there are nodes that cannot be inserted into the child solution, the child solution is replaced by its parent.
If the subgraph defined by a child solution is not connected, the child solution is substituted by its parent.

Route Set Size Alternating Heuristic.
As mentioned before, the route set size is regarded as an endogenous variable. erefore, for every individual that survives into the next generation, a descent route set size alternating heuristic is employed to change the route set size. To ensure efficiency and secure potential individuals, only individuals who have not been improved for a predefined number of generations (set as 5 in this article) will go through this heuristic. e proposed route set size alternating heuristic comprises two procedures: route addition and route deletion. For each selected route set, either route addition or route deletion is randomly selected with the same probability. Solutions going through this heuristic are checked for area coverage constraint and repaired when necessary using the repair mechanism described in Section 4.3. If the solution is not feasible after repair, the changes made by route set size alternating heuristic are undone. e process of this heuristic is detailed as follows.

Route Addition.
For each route set R, a deviation ratio is calculated for every nonzero transit demand to estimate the quality of their trips. en, the OD pairs with a low service quality are selected. Finally, the greedy algorithm proposed by Nikolic and Teodorovic [16] is used to determine the most suitable route to add into the route set R. e quality indicator for passengers between vertices i and j is defined as follows: Using routes of R, this represents a deviation ratio of the minimum travel time from the theoretical best value (which only depends on the road network G). t ij (R) is computed by using an all-or-nothing assignment approach, and t ij (G) is the in-vehicle travel time of the shortest path in G between vertices i and j. According to this formulation, lower values of Δ ij (R) are desired. en, the OD pairs with a high value of Δ ij (R) are considered as not well served. e set of these OD pairs can be derived as follows: e OD pair that has the highest value of ds(i, j) is used to generate the most suitable route. is route is obtained by connecting this OD pair with the shortest path. By inserting this route into R, we can increase the percentage of the satisfied travel demand with zero transfer.

Route Deletion.
e route deletion procedure is proposed to delete one route from the route set R.
is procedure is based on the analysis of the allocation of each travel demand. e route deletion procedure works in the following way. For each route set R that is selected to perform the route deletion procedure, the route whose removal from the route set has the least influence on the quality of the current route set is determined. en, this route should be deleted from the route set R. Based on the analysis of the allocation of each transit demand, every route Journal of Advanced Transportation r k is assigned a weight w k , which reflects the importance of each route. And this weight is calculated as follows: where T k is the round-trip time of route r k . d ij is the transit demand per unit time from vertex i to vertex j. t ij (r k ) represents the in-vehicle travel time on route r k for passengers traveling between vertices i and j. e value of t ij (r k ) is obtained as follows: (1) If there is a direct service between vertices i and j using route r k , the t ij (r k ) value is the in-vehicle travel time. is definition has significant implications: (i) more passengers and longer in-vehicle travel time spent on route r k mean more contribution to the weight w k ; (ii) the weight of route r k rapidly diminishes when the length route r k increases. After the weight calculation procedure, the least important route is determined and removed from the route set R.

Experimental Results
To illustrate the effectiveness of the proposed initial route set generation algorithm and demonstrate the effectiveness of the route set size alternating heuristic on solution quality, in this section, various experiments have been conducted and these solutions are systematically compared with existing published results. e proposed algorithms are tested on Mandl's benchmark network [25] and four larger instances in Mumford [27]. Table 2 gives a brief description of these five networks. e solutions are evaluated by the percentages of satisfied demand with 0, 1, and 2 transfers (d 0 , d 1 , d 2 ), the percentages of unsatisfied demand (d un ), and the average travel time (ATT), as well as the total route length (TRL). All the experiments are performed on a standard PC that has an Intel Core i5-3470 chip with 3.20 GHz and 8 GB RAM. e number of individuals in a generation is set to 20. e probabilities of all four route operators are set to 1/|R|, 1/|R|, 0.005, and 0.005. NSGA-II terminates when the approximate Pareto fronts are not improved for 100 generations.

Effect of the Proposed Initial Route Set Generation
Algorithm. To illustrate the effectiveness of the proposed initial route set generation algorithm, we compare our initial solutions with reported initial solutions for Mandl's network [25] and Mumford's large networks [27].

Mandl's Network. In this section, we compare our initial solutions with reported initial solutions in Nikolic and
Teodorovic [14,16] for Mandl's network. Initial route sets that contain 4, 6, 7, and 8 routes for Mandl's network were obtained. Other initialization procedures mentioned in the literature review are not compared, since the initial solutions produced by their methods are not reported in their works. e solutions are plotted on the two-dimensional space defined by the objectives (ATT and TRL). Note that the 4route solution in Nikolic and Teodorovic [16] is not plotted as it is not feasible.
In Figure 6, it can be seen that the initial solutions obtained by the proposed algorithm have small values of ATT, while other solutions have small values of TRL. e results from greedy algorithm have a better trade-off in terms of both objective function values. However, the greedy algorithm can only produce one specific solution for a network and |R| since it contains no randomization. Greedy algorithm may not be suitable for population-based optimization algorithms which require multiple different initial solutions. Table 3 compares our initial solutions for Mandl's network with other approaches in terms of six parameters (i.e., d 0 , d 1 , d 2 , d un , ATT, and TRL). It can be seen that the initial solutions obtained by RCA have better values of ATT than the ones in Nikolic and Teodorovic [16] and Kilic and Gok [14]. 4-route solution in Nikolic and Teodorovic [16] has a smaller ATT. However, 6.96% demand are unsatisfied in that solution. On the other hand, the initial solutions of Nikolic and Teodorovic [16] and Kilic and Gok [14] have lower values of TRL. For the initial route sets, the increase of the number of routes |R| has a direct impact on the values of ATT and TRL. For example, when |R| increases from 4 to 8, the value of ATT of the solutions acquired by RCA gradually decreases from 11.01 to 10.12, while the TRL value increases from 116 to 259. It should be noted that the values of d 2 and d un in all the initial solutions acquired by RCA are zero. is indicates that all the passengers can reach their destination with at most one transfer.

Mumford's Networks.
In this section, we further compare our initial solutions for Mumford's large networks with existing initial solutions [14]. Other initialization procedures are not compared, since the initial solutions of their methods are not reported in their works. We also plot the final solutions in Mumford [27] and Kilic and Gok [14] for four larger networks.
In Figure 7, the initial solutions generated by the proposed algorithm outperform other initial solutions in terms of both objectives. e initial solutions generated by the proposed algorithm have similar TRL values, and the ATT values of our results are generally smaller than the initial solutions in Kilic and Gok [14]. Note that for network Mumford2 and network Mumford3, the initial solutions produced by the proposed algorithm are even better than the final solutions in Mumford [27] and Kilic and Gok [14], which implies that the proposed initial route set generation algorithm can produce high-quality initial solutions for large networks. For network Mumford3, some initial solutions produced by Kilic and Gok [14] are also dominating the final solution produced by Mumford [27]. e initial solutions used in Mumford [27] are randomly generated to encourage coverage and connectivity of the route network. e importance of the initial route set generation algorithm may be overlooked.
We further compare the initial solutions in Kilic and Gok [14] and ours for Mumford's networks in terms of six parameters (i.e., d 0 , d 1 , d 2 , d un , ATT, and TRL). e results are integrated and presented in Table 4. e number of routes in a solution is set as the recommended value in Table 2.
From the results in Table 4, it can be seen that the solutions obtained by RCA are better than the ones obtained by Kilic and Gok [14] in terms of both objectives (i.e., ATT and TRL). Compared to the solutions in Kilic and Gok [14], the solutions obtained by RCA for the four networks can reduce the ATT by 8.23%, 8.10%, 6.46%, and 5.88%, respectively, and reduce the TRL by 6.92%, 24.06%, 6.07%, and 9.12%, separately.
ese results demonstrate that the RCA can significantly improve the quality of the initial solutions for large networks when compared with the initial procedure proposed by Kilic and Gok [14]. erefore, the RCA is used in the following experiments to generate high-quality initial solutions for the optimization procedure. We also compare the computation time reported in Kilic and Gok [40] and ours to further display the superiority of the proposed algorithm. e corresponding outcomes are depicted in Table 5. Notwithstanding the difference of data structure and subroutines, it can be seen that RCA is very efficient, as it takes only 3 seconds to generate an initial solution for the largest network (i.e., Mumford3). On the contrary, the computation time for network Mumford3 in Kilic and Gok [14] is 8 hours. e computation resource needed for RCA is negligible.

Effect of the Route Set Size Alternating Heuristic.
is experiment compares the approximate Pareto fronts obtained from the NSGA-II with and without a route set size Journal of Advanced Transportation alternating heuristic. To make the comparison as fair as possible, the NSGA-II with variable |R| starts with the same initial solutions for the NSGA-II with fixed |R| in Table 2 (for Mandl's network, |R| � 6).

Mandl's Network. Figures 8(a)-8(e)
compare the results of the NSGA-II with variable |R| and fixed |R|. An immediate observation is that the approximate Pareto fronts are very close. When |R| increases, the best ATT value decreases as expected, while the corresponding TRL value increases. For the NSGA-II with variable |R|, the theoretical best value of ATT (10.0058) is reached when |R| � 19, and all passengers can travel with the shortest path. is is the first time a transit network with theoretical ATT is presented for Mandl's network.
In addition, Figure 8(f ) compares the approximate Pareto front obtained from the NSGA-II with and without a route set size alternating heuristic. e approximate Pareto front with fixed |R| is obtained by comparing all the approximate Pareto fronts with |R| � 4 to |R| � 8. It can be seen that the NSGA-II with variable |R| performs as well as the NSGA-II with fixed |R|. is shows that the NSGA-II with variable |R| leads to the same solution quality of the solutions obtained with fixed |R|. Moreover, the NSGA-II with variable |R| can produce an approximate Pareto front with a wider range of trade-offs between the two objectives. Table 6 summarizes the computation time of each NSGA-II. Note that the computation time is measured by the number of generations generated by each NSGA-II. As shown in Table 6, the computation time of NSGA-II with variable |R| is relatively close to that of the NSGA-II with fixed |R|. However, to obtain the approximate Pareto fronts with different values of |R|, the NSGA-II with fixed |R| must be performed many times with different values of |R|. As a result, for fixed |R| approaches, the total computation time grows as the number of possible values of |R| increases. is suggests that the route set size alternating heuristic can save a large amount of computation time without deteriorating the quality of the approximate Pareto front obtained.

Mumford's Networks.
To assess the merits of our route set size alternation heuristic, we run the optimization algorithm for Mumford's four networks with the same initial solutions. To make a fair comparison, each NSGA-II was terminated when 3000 generations were produced. e fixed value of |R| is set to the recommended value in Mumford [27]. e results are plotted on the two-dimensional space defined by both objectives in Figure 9.
In Figure 9, the first observation is that two curves contact at some points. e joint points represent the best solutions from the passenger's perspective with predefined |R|. e operator's best solutions with fixed |R| are dominated by the results with variable |R| except the solution for Mumford0. Specifically, for network Mum-ford0 (i.e., Figure 9(a)), the solutions with fixed |R| coincide with the lower part of the other curve; for network Mumford1 (i.e., Figure 9(b)), the approximate Pareto front of fixed |R| is overlapped with the one with variable | R| on the upper part. As shown in Figure 9(c), the overlapped solutions are few. One interesting observation in Figures 9(c) and 9(d) is that the overlapped solutions locate on the upper part of the curve, rather than on the middle area. is is result of a larger |R|, which also reinforces the idea that it is difficult to select a suitable |R| for the best solution to TNDP.
Generally, the NSGA-II with fixed |R| can produce highquality solutions, but most of these solutions are dominated by solutions with a small |R|. As a result, the NSGA-II with fixed |R| needs to be executed many times with different values of |R| to produce the same approximate Pareto front generated by the NSGA-II with variable |R|. However, when the network grows in size, the evaluation of one solution will take a great deal of time, and there are more values of |R| to be searched. As a result, a decision maker is forced to compromise on solution quality and computation resource. Note that all the results are produced by a single run of the NSGA-II. Table 7 further presents the computation time for Mumford's networks. e computation times with variable |R| are generally larger than the ones with fixed |R| because NSGA-II with variable |R| has to search solution space with larger |R| values than the recommended value. To produce the same Pareto front, NSGA-II with fixed |R| has to be run many times with different input values for |R|. e total computation time scales with the range of |R|. In practice, the NSGA-II with variable |R| is easier to implement.        Beligiannis [13], Ahmed et al. [38], Buba and Lee [36], and Buba and Lee [37]. e approximate Pareto fronts of Kilic [42] and John et al. [43] plotted in Figure 10 are obtained via private communication. From Figure 10, it can be seen that our results determined by varying the value of |R| are competitive with other results. In general, our results are quite similar to the combined approximate Pareto front in Kilic [42] and the results in John et al. [43]. e ATT of our best 8-route solution is 10.09, which is almost the same as the ATT values in Nikolic and Teodorovic [16] (i.e., 10.09) and Ahmed et al. [38] (i.e., 10.08). However, their TRL values are, respectively, 288 and 272 minutes, which are much larger than ours (i.e., 233 minutes). Tables 8 and 9 separately compare the detailed performance indicators of the solutions in Figure 10 from the perspectives of passenger and operator.
In Table 8, we compare our results with former research studies in terms of d 0 , d 1 , d 2 , d un , ATT, and TRL. e presented solutions of our work are selected from the obtained approximate Pareto front generated by NSGA-II with variable |R|. e TRL values of Nikolic and Teodorovic [16] and Kechagiopoulos and Beligiannis [13] are obtained through our evaluation functions since they did not report these values.
From Table 8, the proposed solution method outperforms most of the former results. e ATT values of our method are very close to the smallest ATT values in four cases. For 4-route solutions, the best ATT (obtained by Ahmed et al. [38]) is 10.48 which is slightly smaller than ours (i.e., 10.56). However, our solution does have a much smaller TRL value (i.e., 124) when compared with the result (i.e., 148) in Ahmed et al. [38]. For 6-route solutions, the best ATT (obtained by Buba and Lee [36]) is 10.16, which is slightly smaller than ours (i.e., 10.19). e difference is less than 1%. On the contrary, the TRL of our solution is 195 (i.e., 12% less than 223). From the passenger's perspective, our solution method is competitive with other approaches in the literature.
Recall that the |R| value is not predefined. As a result, there is only one best solution from the operator's perspective (in the last column in Table 9). So, we supplement the best solutions from NSGA-II with fixed |R| in Figure 8.
e results of the proposed NSGA-II with fixed |R| are less impressive than other literature. Our operator's costs are close to the lower bound of TRL. is is because the objective of our search algorithm is to search for the approximate Pareto front, instead of optimizing the passenger and/or operator cost. Generally, the best TRL values of our results are slightly larger than the lower bound of TRL (i.e., 12%). e best result for operator obtained from NSGA-II with variable |R| contains only 3 routes. Figure 11 compares the obtained approximate Pareto front with recently published results for Mumford's larger networks in Mumford [27], Kilic and Gok [14], Kilic [42], John et al. [43], Buba and Lee [36], Nayeem et al. [15], and Ahmed et al. [38]. e approximate fronts obtained by John et al. [43] and Kilic [42] are obtained via private communication. Note that the results of Nayeem et al. [15] are obtained by evaluating the reported final solution in their work (the only feasible route set is for network Mumford1).

Mumford's Networks.
From Figure 11, it can be seen that for network Mumford0, the obtained front has a large duplication with former results. When network size grows, the gap between the obtained front and former results gets larger. is implies that our method has an advantage over large networks. e approximate Pareto fronts acquired by our method cover a larger area than other results because the |R| value is not predefined. Tables 10 and 11 summarize the detailed performance indicators of the best solutions from the perspectives of both the passenger and the operator in Figure 11.
In Table 10, we compare our results from the passenger's perspective with other methods for Mumford's four larger networks. Note that the results of Nayeem et al. [15] are obtained by evaluating the reported final solution in their work (the only feasible route set is for network Mumford1).
From Table 10, it can be seen that our proposed method with variable |R| outperforms the methods in Mumford [27], Nayeem et al. [15], Kilic and Gok [14], John et al. [43], Kilic [42], Buba and Lee [36], and Ahmed et al. [38] in terms of the ATT. e proposed solution method also found the best d 0 in all instances except network Mumford2. e percentages of demand satisfied with at most one transfer (d 0 + d 1 ) are  Mumford [27] Chew et al. [32] Kılıç and Gök [40] Kılıç and Gök [14] Nikolić and Teodorović [16] Kechagiopoulos and Beligiannis [13] John et al. [43] Kılıç [42] Ahmed and Lee [36] Ahmed and Lee [37] Ahmed et al. [38] NSGA         Chew et al. [32] John et al. [43] Kılıç [42] Ahmed and Lee [36] Ahmed et al. [38] NSGA-II with fixed |R|    over 99% for all instances. Although the TRL values of our approach are generally larger than other approaches, a solution that dominates the results of existing methods in terms of both objectives can be found in Figure 11.
Besides, Table 11 compares our results from the operator's perspective for Mumford's four larger networks with recently published results in Mumford [27], John et al. [43], Kilic [42], Buba and Lee [36], and Ahmed et al. [38]. Our approach found the best results in networks Mumford1, Mumford2, and Mumford3 in terms of TRL.
is is not unexpected, since the number of routes in our approach can be alternated by the route set size alternating heuristic as  Nayeem et al. [15] Kılıç and Gök [14] John et al. [43] Kılıç [42] Ahmed and Lee [36] Ahmed et al. [38] NSGA-II with variable |R|    long as the solution is feasible. e best solution for network Mumford0 is obtained by Ahmed et al. [38]. e TRL of their result is one minute less than ours (i.e., 95 minutes). On the contrary, the ATT of our result is smaller than theirs, and all demands can be satisfied with our route set. Generally, the proposed solution method is competitive with existing approaches in terms of both objectives for larger instances.

Recommended Value of |R|.
In the TNDP, it is difficult to state a suitable number of routes for the best solution [44]. By embedding a route set generation heuristic in a biobjective optimization framework, we try to find the approximate Pareto front without presetting |R|. e aim of this section is to demonstrate the efect of |R| on both objective values in the approximate Pareto fronts. Figure 12 shows the |R| values of the approximate Pareto front solutions for five networks. As expected, both objectives are generally conflicting. With |R| increasing, the ATT decreases and the TRL increases at the same time.
When |R| is reaching the theoretical largest value, the ATT will converge to the theoretical lowest value (shown in Table 2). When |R| is larger than the theoretical largest value, the increase of |R| will only lead to larger TRL but no decrease in the ATT. e figures reveal that when |R| reaches a certain value, a further increase in the |R| value can barely lead to decreasing of ATT. For example, for network Mumford3, the ATT decreases from 27.54 minutes to 27.29 minutes when |R| increases from 80 to 100, while the TRL increases from 6894 to 8688. is demonstrates that a large |R| is not always beneficial to both the passenger and the operator.
From Figure 12, it can be seen that the number of solutions with specific |R| in the approximate Pareto fronts is fairly small for four larger networks. is indicates that the best solutions with predefined |R| are only a small part of the approximate Pareto front. e proposed approach with variable |R| is more practical and efficient for large networks as the search process for large networks is computationally time consuming. Based on the observation of Figure 12, we present a recommended range of the |R| value for each instance in Table 12. It should be noted that the recommended values are only meant for future researchers to limit the number of routes to avoid wasting computation resources on solutions with extreme trade-off level.

Conclusions
An integrated solution method is proposed to simultaneously solve the transit network design problem and the route set size determination problem. e solution method includes a novel initial route set generation algorithm that aims to produce high-quality initial solutions for the optimization algorithm and to explore the solution space with a different number of routes, and it also contains a route set size alternating heuristic to alternate the number of routes in a solution. ese two algorithms are embedded into NSGA-II-based solution framework to find the approximate Pareto front in terms of both objectives (i.e., the average travel time and total route length). e proposed initial route set generation algorithm is based on the intuitive observation that maximizing the travel demand that can be satisfied directly (without transfers) is a key component in the generation of an initial route set. e initial solutions produced indicate that the proposed initial route set generation algorithm is capable of producing highquality initial solutions for large networks. Note that the initial solutions obtained for networks Mumford2 and Mumford3 are dominating final solutions generated by some former research studies in terms of both objectives. e comparison of computation time also shows that the proposed algorithm is very efficient, as it only takes three seconds to generate an initial solution for the largest network (i.e., Mumford3).
Based on the analysis of the allocation of each travel demand, to improve the local optimal solution and explore a solution space with a different |R| value, a route set size alternating heuristic is developed to switch the number of routes in a solution. e experimental results show that the route set size alternating heuristic can save considerable computation time without deteriorating the quality of the approximate Pareto front obtained. When compared with the fixed route set size approach, the approximate Pareto fronts obtained by varying the number of routes in a solution can lead to a wider range of trade-offs, which enables a decision maker to focus on the trade-offs within this constrained set of approximate Pareto front solutions, rather than needing to concentrate on a fixed number of routes before the generation of a transit network. For future work, the proposed solution method should be extended to include bus headway determination and fleet size constraint, and we will extend our research by extracting dynamic transit demand from available sources.
Data Availability e transit network instances used to support the findings of this study have been deposited in http://users.cs.cf.ac.uk/C. L.Mumford/Research%20Topics/UTRP/Outline.html. Disclosure e authors are responsible for any error in the article.

Conflicts of Interest
e authors declare that they have no conflicts of interest.