An Urban Bus Network Generation Algorithm Based on Particle Swarm Optimization and Force Field Properties

Due to continuous urban sprawl, large-scale bus network design has become a major challenge in urban transport planning. The continuous increase in urban population and scale makes the factors considered in urban route network design increasingly complex. Contemporary public transportation network design problems are based more on efficiency goals such as the accessibility and comfort of the transportation network, which increases the difficulty of analyzing the problem. Bus network design is not only an NP-hard (nondeterministic polynomial) problem but also a multivariable and multiobjective problem. This paper focuses on the bivariate and multiobjective bus network design problem of route generation and station selection. It proposes an algorithm called the Pseudo Force Field. By combining the idea of Particle Swarm Optimization (PSO) and the properties of the force field, a feasible route generation scheme is provided for the design of the bus network. The algorithm does not need to determine the end station and achieves a high degree of demand completion. This solves the problem of selecting terminal stations in large-scale road network design. On this basis, the article combines the Genetic Algorithm (GA) and the Pareto frontier to provide a new route optimization algorithm and proves the effectiveness of the algorithm. The model has achieved theoretical results in the design of the bus route network in the megacity of Shenzhen, China.


Introduction
Public transportation is one of the important ways to balance transportation supply and demand, save energy, and reduce emissions, and it is highly valued by cities at all levels worldwide. According to existing research, transport systems in some countries account for approximately 20% of total annual greenhouse gas emissions from the energy sector [1]. Motorized travel is the main contributor to the worsening global greenhouse effect, and a reasonable public transportation system is of great significance in mitigating it and achieving global sustainable development [2]. On the other hand, the public transportation system is the main mode of travel for residents of large cities. According to a survey report on residents' traffic behavior and willingness conducted by the Shenzhen Municipal Government in 2019, public transportation accounted for 48% of residents' motorized travel, reaching 61% during peak hours. In terms of travel experience, more than 40% of residents hope for a better one. The public transportation system plays an irreplaceable role in the travel of urban residents. A good public transportation system helps to improve residents' happiness index and promotes the economic development of the city, which is of great significance to the city's development.
Among the existing travel modes, rail transit and bus transportation are the main forms of public transportation. The concept of the transit network design problem (TNDP) was first proposed by Baaj and Mahmassani in 1991 [3] to describe the public transport system network design problem. It is a typical TSP-type multiobjective NP-hard problem [4]. Urban rail transit is mainly based on the subway, and bus transportation is realized through the operation of the bus network. Bus transportation has more flexible routes and greater accessibility than rail transportation, but it is far inferior to rail transportation in operating speed and load capacity. In cities with several forms of public transportation, the bus route network often accommodates traffic demands that cannot be met by rail transportation. Therefore, the design of the bus transportation system is often more rigorous and meticulous than that of rail transportation, and more influencing factors need to be considered.
In the existing research, most route generation algorithms are built on and optimized for economic benefit indicators such as route length [5]. However, in the design of an urban route network, accessibility and demand completion now take priority over the traditional economic benefit index. Moreover, low demand completion leads to a general decrease in the quality of the overall set of alternative routes, which may cause the results obtained by the algorithm to lose practical value. This is particularly prominent in the network design of large cities. Therefore, a route generation method whose main purpose is demand fulfillment is of great significance to the design of an urban route network.
In this paper, a route generation algorithm is proposed. The prototype of the algorithm is inspired by the logistics distribution problem and has similarities with Particle Swarm Optimization (PSO). This paper constructs a Pseudo Force Field model based on this idea and designs a new route generation method. The main contributions of the research are as follows:
(1) The article explores a new way of route generation. This method combines the properties of the PSO algorithm and the physical force field to provide a better initial solution set for the Genetic Algorithm (GA). The combination of the algorithm and the GA is shown in Figure 1. The first experiment demonstrates the advantages of this algorithm over the traditional initial solution generation algorithm.
(2) The article makes a fundamental change to the optimization method of the GA and proposes a new optimization method. Through this route generation method, the GA is transformed from the traditional route selection problem into a station ranking problem. The second experiment shows that this optimization method preserves route network quality during optimization and expands the solution set space.

Literature Review
Route networks are often analyzed by transforming them into a topological network structure. Through mathematical topology techniques, the relative influence between different routes and the traffic efficiency of the route network are studied [6]. Rivera-Royero et al. study route network performance through 11 RNP concepts and develop a classification scheme to map possible relationships and boundaries between them [7]. Munir et al. provide a template for the analysis of demand-type indicators by evaluating the effectiveness of travel demand management strategies [8]. Khan and Fatmi provide a metric for assessing the safety of traffic networks, filling a gap in this field [9]. Jiang et al. analyzed the impact of route network design on people's lives from multiple perspectives, including environmental pollution, traffic accidents, and noise emissions [10]. These researchers analyzed the impact of route network design on urban operation from different angles and by different means. This further reflects the importance of route network design and provides a strong theoretical basis for route optimization evaluation.
In the early design of route networks, the economic benefits of the route were the main consideration. Pentek et al. demonstrated a relatively classic economical route network design method through the study of forest route design [11]. Among optimization algorithms, bilevel programming was the main method of early route network design. It optimizes the route network based on mathematical operations research. The studies of Ben-Ayed et al. and Zhang and Gao are typical examples from two different periods of this kind of road network algorithm [12], [13]. With the introduction of bionic algorithms, traditional mathematical programming methods such as bilevel programming have gradually been replaced. Early examples include Martins and Pato and Pattnaik et al., who applied the tabu search algorithm and the genetic algorithm to the bus network design problem [14], [15]. Ngamchai and Lovell improved the encoding method of the route algorithm, which greatly improved the problem-solving efficiency of bionic algorithms [16]. The abovementioned studies all take basic route attributes such as route length as the research object, and very few algorithms generate initial routes according to demand. However, relatively mature algorithms already exist for extracting demand data from urban traffic systems [17]. Because demand data are highly volatile, their fluctuation cannot be considered in road network design. Therefore, current public transportation network design is based on deterministic transportation networks and deterministic travel demand [6]. Zhang et al. studied route saturation optimization by using the golden ratio to improve the genetic algorithm [18]. This is a relatively classic study of demand-oriented problems. Badia et al. further deepen the accessibility aspect of the demand problem and add the transfer accessibility problem to the TNDP [19].
According to a summary of the existing research, the route network optimization algorithm has gradually matured in theory. However, there is still a large gap in demand-oriented research, including route generation and solution. The algorithm proposed in this paper gives an example of demand-based, goal-oriented route network optimization. In fact, the algorithm idea has been adopted by Sun et al. as the generation method of the initial routes in road network design [20]. In addition, similar studies in other fields have applied PSO and GA in combination and have shown the combination to be feasible. For example, Pandey et al. combined PSO and GA to build a multiobjective model that addresses increased power loss in the power supply system [21]. In the design of public transport networks, GA cannot resolve the bivariate conflict between station selection and route generation, which is particularly prominent in large-scale route network design problems.
This paper provides a feasible solution for the route generation of GA by combining the PSO idea and the force field properties. In the evaluation and discussion in this paper, demand fulfillment is the most important research objective.

Assumptions.
In the design of this study, the following assumptions are made. These assumptions are valid in subsequent model representations and simulation calculations. They constrain the application scenarios of the model and improve its operational efficiency.

Journal of Advanced Transportation
(1) The model does not consider one-way and two-way traffic on different road sections, and this aspect is not constrained in the simulation calculation.
(2) Routes only consider major bus service lines, and each line must connect two terminal stops, one at each end.
(3) There are no isolated bus stations in the station set; that is, each station connects at least two lines.
(4) The experimental background assumes that the influence of the subway on passenger transport is unchanged, and the influence of the route structure on the subway is not considered in the calculation.

Explanation of Symbols.
In the algorithm, the latitude and longitude grid is regarded as a two-dimensional plane for calculation, where the longitude is the x coordinate and the latitude is the y coordinate. The following variables are defined as initial variables that do not change throughout the discussion. They are the origins of the other variables. The variables are shown in Table 1. In the formulas of the article, "∘" stands for the Hadamard product, "×" stands for matrix multiplication, and "·" stands for scalar multiplication and vector multiplication. First, some special matrices are explained. These matrices are used in subsequent formulations and have no practical meaning of their own in the discussion of this article. Ones(X, Y) represents a matrix whose elements are all 1 and whose shape is (X, Y). I(A) is a matrix binary function:
The demand data are contained in the two NEED-type matrices of the above table, NEED_level and NEED. Each element in these matrices represents a demand, and both are square matrices whose dimension equals the number of stations. NEED(i,j) represents the demand from the i-th station to the j-th station. NEED_level(i,j) represents the demand level from station i to station j (the definition of the demand level is detailed in the data processing section). The OD matrices contain information about the accessibility of the route and comprise three matrices: OD, ODVqx_point, and ODVqy_point. OD is a square matrix whose dimension equals the number of stations, and the value of OD(i,j) represents the length of the route between two stations; that is, the matrix contains both the length information and the reachability information of the route. If an element is 0, the two stations are not reachable from each other.
For the two station vector matrices Vqx_point and Vqy_point, each element of the matrix satisfies the following:
where Point_ix and Point_jx represent the x-coordinates of the i-th and j-th stations, respectively, and Point_iy and Point_jy represent the y-coordinates of the i-th and j-th stations, respectively. According to the above formula, the mathematical expressions for ODVqx_point and ODVqy_point are given in formula (3).
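The matrix definitions above can be illustrated with a small sketch. It rests on assumptions that are not confirmed by the text: Vqx_point(i,j) is taken as Point_jx − Point_ix (the sign convention is a guess), I(A) is taken to map nonzero entries to 1 and zero entries to 0, and ODVqx_point is taken as I(OD) ∘ Vqx_point. The helper names are hypothetical.

```python
def indicator(matrix):
    """I(A): 1 where the entry is nonzero, 0 elsewhere (assumed definition)."""
    return [[1 if value != 0 else 0 for value in row] for row in matrix]


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equally shaped matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def difference_matrices(points):
    """Return (Vqx, Vqy) with entry (i, j) = coordinate of j minus coordinate of i.

    The sign convention is an assumption made for this sketch.
    """
    vqx = [[xj - xi for (xj, _) in points] for (xi, _) in points]
    vqy = [[yj - yi for (_, yj) in points] for (_, yi) in points]
    return vqx, vqy


# Three stations; OD holds segment lengths, 0 means "not directly reachable".
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
od = [[0, 5, 0],
      [5, 0, 5],
      [0, 5, 0]]

vqx, vqy = difference_matrices(points)
od_vqx = hadamard(indicator(od), vqx)  # direction components kept only for reachable pairs
od_vqy = hadamard(indicator(od), vqy)
```

Masking with I(OD) keeps direction information only for station pairs that are directly connected, which is what the next-station selection needs.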

Pseudo Force Field Algorithm.
In the original PSO, the algorithm simulates the feeding habits of the bird flocks to find the optimal solution. Since the optimization process of PSO is similar to the path generation process, it is often applied to the line generation problem. In the route generation algorithm introduced in this paper, the idea of PSO in the optimization process is borrowed to deal with the relationship between route generation and demand changes. Specifically, the Pseudo Force Field algorithm will construct a fitness function according to different locations and changing demands during the route generation process and update the "speed" of route generation.
According to Sun et al. [20], the pseudo force field algorithm is based on the basic form of the electric field strength calculation. Considering the basic properties of the electric field, there is the following formula for the magnitude of the electric field experienced at a point in R^2:
The calculation of the force field for each station in the route network should not be affected by the demands of all other stations throughout the route network. When calculating the effective field force experienced by each calculated station, the set of stations affecting the calculated station needs to be determined. Taking the design of shorter routes as an example, stations that cannot be reached within the specified route length should be excluded from the station set [22]. For a route starting from a station, the set of effective stations is determined as in formula (5), where Point represents the first station of the route; T is a matrix of shape (Pointnum, Pointnum); and NEED*_level represents the demand class matrix generated by all valid stations. In the parameter design, the influence area is set slightly larger than the route length. This prevents route generation from failing when the route length has reached the target but the route has not yet reached a terminal station. In the simulation calculation, [BGnode/Pointnum] is used as the expansion of the station set, which is designed based on the density of terminal stations among all stations. If generation is slow, the value can be adjusted according to the actual situation without affecting the subsequent calculation. In the process of generating the route, each station is regarded as a point of measure 0 in R^2, and the charge of the station is its initial demand (the initial demand is the sum of the demand that has this station as the starting station).
The point charges lie in R^2, with the Euclidean distance as the distance between any two of them. Under the assumption that differences between vehicles are not considered, the urban bus network is regarded as a simple superimposed electric field, and the electric field force experienced at each point is the electric field strength at that point. The specific presentation in the route network is shown in Figure 2.
Figure 2 presents a vector diagram of the local field strength at a point. The calculated point is subjected to the field strength of all valid points (or stations). The field strength exerted by each valid point depends on its charge and on its distance from the calculated point. The direction of the field force on the calculated point is obtained by superimposing the field strength vectors acting on it. Its formula is as follows:
In the formula, NEED**_level represents the matrix formed by the initial demand (the actual meaning of this matrix is explained in Part 4.2, formula (13)), which is a (Pointnum, 1) matrix. F_Point_i represents the resultant force on point i in the pseudo force field, which is a two-dimensional vector.
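The superposition described above can be sketched as follows. This is an illustrative reading, not the paper's exact formula: it assumes a Coulomb-like contribution with magnitude charge / distance² directed from the calculated station toward each valid station (so the resultant points toward dense demand), and it omits any physical constant. The function name and the `valid` mask are hypothetical.

```python
import math


def resultant_force(index, points, charge, valid):
    """Sum the field contributions of all valid stations acting on station `index`.

    Assumed form: each station j contributes charge[j] / d^2 along the unit
    vector from station `index` to station j (attractive, no constant factor).
    """
    xi, yi = points[index]
    fx = fy = 0.0
    for j, (xj, yj) in enumerate(points):
        if j == index or not valid[j] or charge[j] == 0:
            continue
        dx, dy = xj - xi, yj - yi
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            continue  # coincident stations carry no direction information
        scale = charge[j] / dist ** 3  # (charge / d^2) * (1 / d) for the unit vector
        fx += scale * dx
        fy += scale * dy
    return fx, fy


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
charge = [0.0, 2.0, 8.0]  # tentative demand of each station
force = resultant_force(0, points, charge, [True, True, True])
```

In this toy case the nearer, weaker station and the farther, stronger station contribute equally, so the resultant points diagonally between them.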
According to the force field vector of the calculated point, the most suitable approximate path direction at that point can be obtained. It directs the route to areas with denser demand so that the route can complete more demand. In the algorithm, the direction of the force field vector is used as an important criterion for choosing the next station of the route. That is, the vector here is the direction of the "speed" determined by the particle at that point according to the demand data. When selecting the next station, the reachable station whose direction vector from the calculated station forms the largest cosine (that is, the smallest angle) with the force field vector of the calculated station is chosen. The specific presentation is shown in Figure 3.
In Figure 3, the next reachable station should be the one with the largest cosine of the angle between the two vectors. The vector direction between two points has nothing to do with the route; it depends only on the relative position of the two points in space. When selecting the next reachable station, stations for which the cosine of the included angle is less than 0 are eliminated first to avoid loopback or backtracking in the route. In route generation, reachable stations are selected in a loop until the length of the route reaches the given requirement and another terminal (starting or ending) station appears in the set of route stations. Its basic mathematical formula is expressed as follows:
In the above formula, newpoint represents the selected reachable station. If its node ID is i, then newpoint is a matrix of shape (Pointnum, 1) whose elements are all 0 except newpoint(i,1), which is 1. The route generation algorithm steers the route toward demand-intensive areas as far as possible under the constraint of the specified route length. This addresses the problem of routes that avoid demand and are therefore invalid, and it helps optimize the demand-related objectives of the solution. In this algorithm, the magnitude of the "speed" does not change. The algorithm only uses the direction of the "speed" for station selection, and the distance traveled each time is one station. The inertia index of the algorithm is 0. That is, the speed calculation at each station (or position) is completely determined by this point, regardless of the "speed" at previous stations.
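The selection rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: `reachable` is a hypothetical list of station IDs adjacent to the current station, `visited` is a hypothetical set of stations already on the route, and ties are broken by list order.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two 2-D vectors (0.0 if either is zero)."""
    norm = math.hypot(u[0], u[1]) * math.hypot(v[0], v[1])
    return (u[0] * v[0] + u[1] * v[1]) / norm if norm else 0.0


def choose_next_station(current, force, points, reachable, visited):
    """Pick the reachable station best aligned with the force vector.

    Stations with a negative cosine are discarded to avoid backtracking.
    Returns None when no candidate survives.
    """
    cx, cy = points[current]
    best, best_cos = None, -1.0
    for j in reachable:
        if j in visited:
            continue
        c = cosine((points[j][0] - cx, points[j][1] - cy), force)
        if c < 0.0:
            continue
        if c > best_cos:
            best, best_cos = j, c
    return best


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (-1.0, 0.0)]
next_station = choose_next_station(0, (1.0, 0.5), points, [1, 2, 3], {0})
```

With the force vector (1.0, 0.5), station 2 is slightly better aligned than station 1, and station 3 is rejected because it lies behind the current station.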

Model Optimization.
Through the Pseudo Force Field algorithm, the route design problem can be transformed into a ranking problem over stations or station groups, and the optimal Pareto frontier can be obtained by combining it with the GA. However, in the process of generating routes, the algorithm often encounters problems such as loopback and interruption. For this reason, more fine-grained constraints must be placed on the model. In this work, the following three basic requirements are put forward for route generation:
(1) Generated routes contain no loopbacks or duplicated stations.
(2) The first and last points of the generated route must belong to the set of terminal stations.
(3) The route length is equal to or greater than the required length threshold.
If a generated route cannot meet the above constraints, route generation is considered to have failed and the route is discarded.
For a given ordering of a set of stations or station groups, if the stations are ordered directly, it is enough to loop through the station sequence. If the ordering is by station group, then to generate several groups of routes with lengths A_1, A_2, A_3, ... and numbers of routes a_1, a_2, a_3, ..., tasks must be assigned to the station groups. Since station groups do not necessarily contain the same number of stations, the number of routes undertaken by each station group should also vary. For a station group E, let E′ be the ratio of the number of stations it contains to the total number of stations. For n types of routes with different lengths, there is the following formula for the number of routes undertaken by the station group:
Each item of the series in the formula represents the number of routes of a certain length type undertaken by this station group; the routes are then assigned to each station group accordingly. In generating routes, the choice of stations within a station group is random. (The random selection is explained in detail in the data processing section.) During route generation, each time a station is selected it is removed from the station group, regardless of whether a route is output. This prevents multiple routes from being generated from the same station, which would produce highly overlapping routes and degrade the route generation effect. If all stations of a station group have been used but its route task has not been completed, the remaining task amount is recorded and the station group is temporarily skipped. After the tasks of all other station groups are completed, the station groups are traversed again to complete the unfinished tasks of each station group.
In the classification, a station group may account for a very small proportion of all stations, so that no route task can be allocated to it according to the proportion. At the beginning of task assignment, if the number of routes undertaken by a station group is 0, 1 is added to the number of routes for this station group. This may make the total number of routes differ from the originally designed number, but it guarantees that every station group undertakes at least one route.
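The proportional assignment with the minimum-one rule can be sketched as below. The rounding rule is an assumption (floor of E′ · a_k, computed with integer arithmetic), as is the choice to give the extra route to the first length class when a group would otherwise receive nothing; the function name is hypothetical.

```python
def allocate_routes(group_sizes, route_counts):
    """Split route counts among station groups in proportion to group size.

    group_sizes  -- number of stations in each station group
    route_counts -- a_1, a_2, ...: routes wanted for each length class
    Returns one list per group with its share of each length class.
    Rounding (floor) and the placement of the extra route are assumptions.
    """
    total = sum(group_sizes)
    plan = []
    for size in group_sizes:
        counts = [(size * wanted) // total for wanted in route_counts]
        if sum(counts) == 0:
            counts[0] += 1  # a group with no task still undertakes one route
        plan.append(counts)
    return plan


# Three groups with 50, 45, and 5 stations; 10 short and 4 long routes wanted.
plan = allocate_routes([50, 45, 5], [10, 4])
```

Here the smallest group would receive no routes by proportion alone, so it is given one, and the plan totals 13 routes instead of the 14 designed.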
After a route is generated, it affects the demand data, so the demand level of the demand satisfied by the route must be reduced. The specific formula is as follows:
If the route connects station i and station j, the demand level from station i to station j and from station j to station i is reduced by 1 level (the basis for the reduction is detailed in the data processing section below). If the demand level is 0, it is left unchanged. The final output of the Re function is a matrix with the same shape as its input NEED_level. NEED(new)_level and NEED_level are the same variable; NEED(new)_level is a temporary representation of NEED_level after the update, so this modification changes the matrix NEED_level. Its meaning is represented more intuitively in Figure 4.
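A minimal sketch of the level reduction follows. It assumes that a route "connects" every pair of stations it passes through, and it returns an updated copy rather than modifying the matrix in place; the text treats the result as the same variable, so a caller would reassign it. The function name is hypothetical.

```python
from itertools import combinations


def reduce_demand_levels(need_level, route):
    """Lower by one level every demand served by `route`, never below 0.

    Assumption: the route serves each ordered pair of stations on it.
    """
    updated = [row[:] for row in need_level]
    for i, j in combinations(route, 2):
        for a, b in ((i, j), (j, i)):
            if updated[a][b] > 0:
                updated[a][b] -= 1
    return updated


need_level = [[0, 2, 1],
              [0, 0, 3],
              [1, 0, 0]]
need_level_new = reduce_demand_levels(need_level, [0, 1, 2])
```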
In the process of generating the route, the route is guided by the demand of the stations to avoid reversing and similar situations. A column matrix NEED**_level with Pointnum rows represents the tentative demand in the route generation process. NEED**_level(i,1) represents the sum of the demand levels of the point with node ID i, that is, the charge of this point. NEED**_level only takes effect in the current route generation and does not affect the original demand data.
In the process of route generation, NEED**_level should be changed as the route is extended. When the route selects a new station, demand should change as follows.
(1) The demand (or charge) in NEED**_level of the passed stations should be set to 0. This ensures that routes do not loop back and that stations are not chosen repeatedly.
(2) When a new stop is selected for the route, all demand starting from the new stop should be "activated." In other words, for every demand starting from this station, the corresponding demand (or charge) of its target station in NEED**_level is increased. The amount of increase is determined by the level of that demand.
NEED(new)**_level = NEED**_level OD, where C = C(i,1) = 0 if point i is a selected point.
Formula (13) explains the relationship between NEED**_level and NEED*_level. The definition of NEED*_level is already given in formula (5). In addition, formulas (14) and (15) show that NEED**_level changes constantly during route generation. These changes only take effect for the current route generation, not for other routes. NEED(new)**_level and NEED**_level are the same variable; NEED(new)**_level is just a temporary representation of NEED**_level after the previous two changes. C and D are two matrices of shape (Pointnum, 1). In each processing step, formula (14) must be applied before formula (15). This guarantees that the demand (or charge) of each passed station is 0. The specific generation process is shown in Figure 5.
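The two update steps can be sketched as follows. The ordering is an assumption chosen to match the stated guarantee: activation of demand from the new stop is applied first, and the passed stations are zeroed afterwards, so that no passed station ends up with a nonzero charge. The amount added per demand is assumed to equal its level, and all names are hypothetical.

```python
def extend_route(charge, need_level, new_station, passed):
    """Update the tentative demand (charge) after `new_station` joins the route.

    charge      -- list of length Pointnum, the tentative demand per station
    need_level  -- square matrix of demand levels (row = origin, column = target)
    passed      -- set of stations already on the route
    Returns a new list; the caller's data are left untouched.
    """
    updated = charge[:]
    # Activate demand: every target of a demand starting at the new stop gains charge.
    for target, level in enumerate(need_level[new_station]):
        updated[target] += level
    # Zero the stations on the route, including the one just added.
    for station in passed | {new_station}:
        updated[station] = 0
    return updated


charge = [3, 1, 0, 2]
need_level = [[0, 2, 0, 1],
              [1, 0, 2, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
charge_new = extend_route(charge, need_level, new_station=1, passed={0})
```

Station 0 receives activated demand from station 1 in the first step, but the zeroing step then clears it, which is why the order matters.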
When choosing route lengths, short-route tasks are given priority. On the one hand, when generating short routes, the set of stations that are valid for the calculated station can be determined according to the scope of influence and is not affected by distant unreachable stations. On the other hand, short routes can clear the demand in small areas and reduce the complexity of the demand data, so that subsequent long routes are not affected by local demand and can better serve cross-regional demand, maximizing the value of long routes.

Evaluating Indicator.
In modern bus networks, route network analysis should consider the operational efficiency and comfort of the network [23]. According to the actual situation, two indicators are used as measurement standards in the calculation, namely, the load degree of the network and the complexity of the route network. The route index mainly involves the travel demand of passengers and the length of the route, which reflects how efficiently the route completes demand and the comfort of passengers. In the calculation, the average load level of all routes in the entire route network is used as the rating index, where the load level of a route is the ratio of its passenger turnover to the total vehicle mileage. Its mathematical formula is as follows:
In the formula, N represents the number of routes that can satisfy the demand between stations i and j; P is the set of all stations that the route passes through; Roadlen represents the length of the route segment from the i-th station to the j-th station on the route; Roadnum represents the number of routes; Weight_road represents the load of a single route in the route network. The complexity of the route network intuitively reflects the load balance of all routes in the bus network. The indicator adopts the Lorenz curve model and is calculated based on the Gini coefficient formula, which reflects fairness. Its specific formula is as follows:
All routes are arranged in ascending order according to the parameter Weight_road in formula (18); Weight_road_i in formula (20) represents the value of Weight_road for the i-th route in this arrangement. Weight_network in formula (19) and H in formula (20) are the main bases for obtaining the Pareto frontier. The subsequent optimization process and results revolve around these two indicators.
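The two indicators can be sketched from a list of per-route loads. This uses the standard discrete Gini coefficient over values sorted in ascending order, which may differ in form from the paper's formula (20); the per-route loads are taken as given inputs, and the function names are hypothetical.

```python
def mean_load(route_loads):
    """Average load over all routes (the network-level load indicator)."""
    return sum(route_loads) / len(route_loads)


def gini(route_loads):
    """Standard discrete Gini coefficient of the route loads.

    0 means perfectly balanced loads; values near 1 mean that a few
    routes carry almost all of the load.
    """
    ordered = sorted(route_loads)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * value for rank, value in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


balanced = gini([1.0, 1.0, 1.0, 1.0])
skewed = gini([0.0, 0.0, 0.0, 4.0])
```

Both example networks have the same mean load, so only the Gini value separates them, which is why the two indicators are used together.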

Genetic Algorithm Design and Pareto Frontier.
Due to the large number of terminal stations, the terminal stations can be classified into several station groups by K-Means clustering (this is explained in detail in the Data Processing section, Part 5.2.1). In the population structure, each bus network (a set of routes) is regarded as an individual, and the multiple bus networks generated by different orderings of the terminal station groups are regarded as a population. Accordingly, fitness in the genetic algorithm is determined from the score of each bus network individual on the two indicators.
Before detailing the GA, the application of the Delaunay algorithm to the Pareto frontier needs to be explained. The Delaunay triangulation algorithm connects a set of points on a two-dimensional plane into a nonconcave triangular network. From this network, all edges and edge points can be extracted. According to the basic principles of data envelopment analysis (DEA), it is not difficult to prove that the Pareto front must be composed of several edge points and edge segments of the triangulation. Therefore, the Pareto frontier of the two-dimensional plane points can be obtained by suitably sorting the edge points extracted from the triangulation. In the specific calculation, the point with the minimum value of either index in the edge point set is selected as the first selection point. Let it be M_1 = (x_1, y_1); then, in the solution set DM composed of frontier solutions, the relationship between the other frontier points and this point should conform to the following formula:
According to the above formula, all the selected edge points and edges are connected to form a polyline, which is the Pareto front. With the nonconcave property of the Delaunay triangular network and the basic theory of DEA for the Pareto frontier, it is not difficult to prove that all other Pareto-supported solutions can be derived from this set of Pareto frontier points, which has been confirmed in other fields [24].
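For readers who want to experiment, a frontier can also be extracted with a plain dominance filter. This is a different and simpler technique than the Delaunay-based extraction described above, not a reimplementation of it, and it assumes that both indicators are to be minimized. The function name is hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated points of a 2-D set, assuming minimization.

    After sorting by x (then y), a point is on the front exactly when its
    y value is strictly smaller than every y value seen before it.
    """
    front = []
    best_y = float("inf")
    for x, y in sorted(set(points)):
        if y < best_y:
            front.append((x, y))
            best_y = y
    return front


scores = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6), (5, 2)]
front = pareto_front(scores)
```

The result is already ordered by the first indicator, so consecutive points can be joined into the frontier polyline.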
In the model solution, several stations or station groups are randomly arranged, and the corresponding bus networks are generated as the initial population. According to the Pseudo Force Field algorithm, each station ranking represents a definite and unique bus network; that is, the station rankings form a one-to-one mapping with the bus networks. The two negative indicators of the network are combined with the Delaunay algorithm to obtain the Pareto frontier individuals in the population [25]. The two index values of each individual are mapped to a point Q in the two-dimensional coordinate system; Q is connected to the origin, and the intersection P of this straight line with the Pareto frontier is found, with the formula:
Mark is then solved and used as the fitness in the GA, as shown in Figure 6.
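The geometric construction can be sketched as a ray and polyline intersection. The score definition is an assumption: Mark is taken here as |OP| / |OQ|, so an individual on the frontier scores 1 and individuals farther from the origin score less. The frontier is passed as a hypothetical list of points ordered along the polyline.

```python
def mark(q, front):
    """Score point q against a frontier polyline, assuming Mark = |OP| / |OQ|.

    P is the intersection of the ray from the origin through q with the
    frontier. Since P = t * q for some t > 0, the ratio is simply t.
    Returns None when the ray misses every frontier segment.
    """
    qx, qy = q
    for (ax, ay), (bx, by) in zip(front, front[1:]):
        dx, dy = bx - ax, by - ay
        denom = qx * dy - qy * dx
        if denom == 0:
            continue  # ray is parallel to this segment
        s = (qy * ax - qx * ay) / denom  # position along the segment, 0..1
        t = (ax * dy - ay * dx) / denom  # position along the ray
        if 0.0 <= s <= 1.0 and t > 0.0:
            return t
    return None


front = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
score = mark((6.0, 4.0), front)
```

For Q = (6, 4) the ray crosses the frontier at P = (3, 2), halfway to Q, so the score is 0.5.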
In the optimization algorithm, the traditional sorting genetic algorithm is used for calculation. Because this paper does not study the performance of the optimization algorithm, very precise considerations are not adopted in the GA optimization process. In the subsequent calculation, the GA optimizes an ordering problem over sequence numbers without repetition. The specific crossover and mutation processes are shown in Figures 7 and 8.
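Since Figures 7 and 8 are not reproduced here, the following is only one plausible set of operators for a repetition-free ordering: a crossover that keeps a protected segment from one parent and fills the remaining positions from the other parent after deleting duplicates, and a swap mutation. Both operators and their names are assumptions, not the paper's confirmed procedure.

```python
import random


def crossover(parent_a, parent_b, lo, hi):
    """Keep parent_a[lo:hi] in place; fill the rest in parent_b's order.

    Genes of parent_b that already appear in the protected segment are
    dropped, so the child remains a permutation.
    """
    protected = parent_a[lo:hi]
    kept = set(protected)
    filler = [gene for gene in parent_b if gene not in kept]
    return filler[:lo] + protected + filler[lo:]


def swap_mutation(sequence, rng):
    """Exchange two distinct positions chosen at random."""
    mutated = sequence[:]
    i, j = rng.sample(range(len(mutated)), 2)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


child = crossover([0, 1, 2, 3, 4, 5], [5, 3, 1, 0, 2, 4], lo=2, hi=4)
mutant = swap_mutation(child, random.Random(0))
```

Both operators preserve the permutation property, so every offspring still maps to a valid station group ordering.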
In addition, in the GA, individuals are selected by roulette selection. This selection is made according to probabilities constructed from the fractional proportions defined above. The specific GA process is shown in the following pseudocode.

Journal of Advanced Transportation
Algorithm: GA
Input: Demand, terminal station, station
Output: route network
def Pseudo Force Field (terminal station sequence):
    output route network
def Evaluation (route network):
    output Pareto Front
begin
    initialize terminal station sequence
    T ⟵ 0
    while (T < cycle times):
        T ⟵ T + 1
        route network ⟵ Pseudo Force Field (terminal station sequence)
        Mark ⟵ Evaluation (route network)
        new sequence ⟵ Inheritance, Crossover, Mutation by Mark
        terminal station sequence ⟵ new sequence
    end while
end
ALGORITHM 1: Pseudocode for the optimization algorithm.
To improve the convergence speed of the calculation, an elite retention strategy is adopted in the algorithm. Each time a Pareto frontier solution is generated, the frontier solution, including its indicator values, the corresponding station group ordering, and the generated bus network, is retained to the next generation to participate in the construction of the Delaunay triangulation and the competition for survival. If a new solution replaces or joins the original frontier solutions, it becomes part of the frontier, and any replaced frontier solution is treated as a common solution that participates in operations such as selection and crossover. This strategy enriches the frontier solution set of the results and provides more data for analyzing excellent route network results.
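Two pieces of the GA described above lend themselves to a short sketch: roulette selection and the update of the retained frontier when a new solution arrives. The code assumes that selection probability is proportional to Mark and that both indicators are minimized; the function names and the demotion list are hypothetical conveniences, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roulette_select(marks: Sequence[float], rng: random.Random) -> int:
    """Return an index drawn with probability proportional to its mark."""
    total = sum(marks)
    if total <= 0:
        raise ValueError("marks must contain a positive value")
    threshold = rng.random() * total
    running = 0.0
    for index, m in enumerate(marks):
        running += m
        if threshold < running:
            return index
    return len(marks) - 1  # guard against floating-point round-off


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both indicators and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def update_front(front: List[Point], candidate: Point) -> Tuple[List[Point], List[Point]]:
    """Return (new front, demoted solutions) after offering `candidate`."""
    if any(f == candidate or dominates(f, candidate) for f in front):
        return list(front), []  # candidate is not a frontier solution
    demoted = [f for f in front if dominates(candidate, f)]
    kept = [f for f in front if not dominates(candidate, f)]
    return kept + [candidate], demoted
```

Demoted solutions are returned rather than discarded, mirroring the statement that a replaced frontier solution continues as a common solution in later generations.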

Numerical Experiment
The results show the advantages of the algorithm from two aspects: the change in demand completion during route generation and the change in demand completion during the optimization process. The experiment uses the K-Shortest-Path (KSP) algorithm for comparison. The KSP algorithm is the most commonly used initial route generation algorithm; in GA-based design of large route networks, it is the main way of generating the initial route set. This comparison allows the effectiveness of the algorithm in this paper to be demonstrated visually.
The approximate data of the computing environment of the two algorithms are shown in Table 2.
The number of routes must be set before the experiment. The choice of this parameter is affected by the actual situation. The city of Shenzhen has 2,406 simplified stations after sorting, with a total of approximately 900 conventional lines. For the theoretical route generation design of 512 stations, that is, approximately 1/5 of the area of Shenzhen, the theoretical estimate of the number of routes should be approximately 180. Using the period of maximum passenger flow among all periods of the regular day as the calculation data, the number of routes is set to be 5% more than the regular number. In the calculation, 200 routes will be generated.

Performance of Route Generation.
In the first experiment, more than 180 routes were generated and recorded with random station ordering using the Pseudo Force Field algorithm. In the generated route group, routes with the same start and end station pair are excluded. For the KSP algorithm, the start and end station pairs of the generated routes are reused, and the shortest route is generated for each pair. Again, the routes generated in this section follow the guidelines mentioned earlier. These guidelines are restated below:
(1) The generated route does not have loopbacks. That is, there are no duplicate stations in the route;
(2) The generated route must use terminal stations as the start and end of the route;
(3) The route length must be longer than or equal to the length of the route task design;
(4) The route does not allow the same terminal station at both ends.
Routes that do not meet the above requirements need to be eliminated during the generation process and are regarded as invalid routes. In the subsequent optimization experiments and generation comparison experiments, the above basic constraints need to be observed.
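The four guidelines above can be checked mechanically. The following is a minimal sketch, assuming a route is a sequence of station codes and that route length is counted in stations; `is_valid_route` and its parameters are hypothetical names.

```python
from typing import Sequence, Set


def is_valid_route(route: Sequence[int], terminals: Set[int], min_length: int) -> bool:
    """Check the four basic constraints on a generated route."""
    if len(route) < 2:
        return False
    no_loopback = len(set(route)) == len(route)       # (1) no duplicate stations
    ends_at_terminals = (route[0] in terminals
                         and route[-1] in terminals)  # (2) terminal at both ends
    long_enough = len(route) >= min_length            # (3) meets the design length
    distinct_ends = route[0] != route[-1]             # (4) different terminals
    return no_loopback and ends_at_terminals and long_enough and distinct_ends


terminals = {1, 9}
valid = [r for r in ([1, 4, 5, 9], [1, 4, 4, 9], [1, 4, 5, 7], [1, 9])
         if is_valid_route(r, terminals, min_length=3)]
```

Constraint (4) is already implied by constraint (1) for routes of two or more stations; it is kept as a separate check only to mirror the list.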

Comparison.
For the route algorithm proposed in this paper, routes of length 30 (according to the algorithm, a route may be longer than 30) are generated in the calculation, and the number of routes increases step by step. Each route is assumed to fully complete all demands it traverses, regardless of route affordability. This means that the reduction of the demand level mentioned in Part 4.2 is defined here as reducing a passed demand to zero. Moreover, each demand is defined as 1, and no demand grading is performed. The variation between the number of routes and the demand completion of the two algorithms is shown in Figure 9.
As shown in Figure 9, the Pseudo Force Field algorithm performs better in demand completion. In the same process of generating 184 routes, the Pseudo Force Field algorithm completes 41% of the demand. In contrast, the KSP algorithm fulfills approximately 15% of the demand. This means that within a reasonable number of routes, the Pseudo Force Field algorithm is more efficient for demand completion. Experiments with more routes were not performed because such numbers were considered to exceed a reasonable range. Too many routes are meaningless for transit network design, even if they perform better in demand fulfillment.
Through the above experiments, the routes generated by the Pseudo Force Field algorithm show high completion efficiency for demand. This is an excellent performance for large urban road networks. In the following experiments, we will further demonstrate the performance of the Pseudo Force Field algorithm in optimization.
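The demand completion curve compared in this experiment can be sketched as follows, under the stated simplifications: every demand counts as 1, and a route fully completes any demand whose two stations it passes. The sketch additionally assumes that direction along the route is ignored; `completion_curve` and the sample data are hypothetical.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def completion_curve(routes: Sequence[Sequence[int]], demand_pairs: Iterable[Pair]) -> List[float]:
    """Cumulative share of demand pairs completed after each added route."""
    remaining: Set[Pair] = set(demand_pairs)
    total = len(remaining)
    curve: List[float] = []
    for route in routes:
        stations = set(route)
        # A passed demand is reduced to zero, i.e. removed from the remaining set.
        remaining = {(o, d) for o, d in remaining
                     if not (o in stations and d in stations)}
        curve.append((total - len(remaining)) / total if total else 0.0)
    return curve


curve = completion_curve([[1, 2, 3], [4, 5], [1, 2]],
                         {(1, 3), (3, 1), (2, 5), (4, 5)})
```

Plotting such a curve for two route sets built from the same start and end station pairs gives a comparison of the kind described for Figure 9.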

Performance of Optimization Process.
Based on the Pseudo Force Field algorithm, a set of route networks can be generated in sequence from an ordered station code. Therefore, the bus network design problem is transformed from the original route generation problem into the problem of ordering the terminal stations. This experiment demonstrates the stability with which the generative algorithm satisfies the demand during the optimization process through two indicators related to the demand.

Data Processing.
There are 117 terminal stations among the 512 stations in the selected range. Due to the limitation of computing power, it is not possible to directly optimize the ranking of the stations in the calculation. Therefore, the K-Means clustering method is used in the simulation calculation, and the starting and ending stations are divided into a limited set of station areas [26]. Each area contains several terminal stations, and the 117 station codes are converted into station group codes. Since the selection of stations within a station group is random in route generation, the station group code and the road network cannot form a meaningful mapping. To ensure the successful convergence of the algorithm, the station classification must keep the mapping between the ranking code of a given station grouping and the score of the generated route network stable. That is, the score fluctuation should be within a small range. It is not difficult to see that, for a fixed set of terminal stations, the fewer the station group categories, the higher the volatility of the score. Therefore, it is necessary to find a suitable number of station classifications through experiment.
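The grouping of terminal stations into station areas can be sketched with a small Lloyd-style K-Means on station coordinates. This is a minimal pure-Python illustration, assuming two-dimensional planar coordinates and random initial centers; the paper does not state its implementation, and `kmeans`, `sq_dist`, and the sample coordinates are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: List[Point], k: int, iterations: int = 50, seed: int = 0) -> List[int]:
    """Return a cluster label for every point (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each station to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical terminal coordinates forming two well-separated areas.
stations = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
groups = kmeans(stations, k=2)
```

Each label then plays the role of a station group code, and the GA orders the group codes instead of the individual terminal stations.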
Based on the above theoretical basis, the K-Means clustering method is used to continuously reduce the number of classification categories from 100 categories. Each time the same station group is sorted and encoded to generate 50 groups of route networks, the similarity between the route networks is calculated, and the stability of the classification method is analyzed from the degree of dispersion of the road network on the two index data [27]. Its specific calculation formula is as follows: In the formula, the data x represent the vector on the $R^2$ metric space distributed on the two indicators of route network complexity and route network load in this problem, and $x_i$ represents the value of the i-th data.
In addition, to account for the order of magnitude difference between the two indicators, the variance calculation uses the degree of change relative to the mean as the raw data.
That is, the raw data are processed by the following formula: where μ represents the original value and n represents the number of route network groups. According to the information entropy obtained by the above formula, combined with the respective variances of the two indicators, 48 route networks are generated for each classification method for data analysis. The specific data performance is shown in Figure 10.
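The normalization idea described above, taking the change relative to the mean before computing the variance, can be sketched as follows. The exact formula is not recoverable from the text, so the form (value minus mean) divided by mean and the population variance (division by n) are assumptions; `relative_variance` and the sample values are hypothetical.

```python
from typing import Sequence


def relative_variance(values: Sequence[float]) -> float:
    """Variance of the deviations from the mean, expressed relative to the mean.

    Assumed form: r_i = (v_i - mean) / mean, variance = sum(r_i ** 2) / n.
    """
    n = len(values)
    mean = sum(values) / n
    relative = [(v - mean) / mean for v in values]
    return sum(r * r for r in relative) / n


# Two indicators on very different scales but with the same relative spread.
load_scores = [90.0, 100.0, 110.0]
complexity_scores = [9.0, 10.0, 11.0]
spread_load = relative_variance(load_scores)
spread_complexity = relative_variance(complexity_scores)
```

Because the deviations are divided by the mean, the two indicators yield comparable values even when their magnitudes differ by orders of magnitude, which is the stated purpose of the preprocessing.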
When the number of categories is between 10 and 60, there are large numerical fluctuations in the overall degree of confusion and in the variance of the load degree, so these values are a poor basis for classification. Above 60 categories, the variance of the load degree, the variance of the complexity, and the overall degree of confusion all show a relatively stable or declining trend. Therefore, when choosing the number of clustering categories, more than 60 categories should be preferred for K-Means two-dimensional area clustering. In the simulation calculation, the K-Means clustering method is adopted to divide the terminal stations into 70 station groups, and the routes are generated and solved.
According to Article 4 of "Technical Conditions for Safe Operation of Motor Vehicles" issued by China in 2004, the floor area for standing passengers in urban buses and trolleybuses shall be not less than 0.125 square meters per person. In summary, the standard bus design verification number is 45 people, and the bus operation standard is a 5 min headway during the peak period. Therefore, according to the three bus scheduling standards formulated by Sheu [28], the bus dispatch frequency is set to be time-invariant, and real-time passenger demand data are considered to be collected through advanced intelligent transportation system technologies (such as automatic passenger counting systems), regardless of changing passengers arriving at the terminal and arriving at the originating station. In the calculation, it is assumed that each bus line theoretically carries 540 passengers for each one-hour demand (12 departures per hour at 45 passengers each), and the subsection clustering method is used to classify all demands. In the processing of demand data, for a given station pair, the maximum demand of the station pair over all periods of a regular day is selected as the original demand data to construct a demand matrix. In addition, with 540 as the dividing line, the raw demand data are graded; that is, in the case of legal transportation, each time a vehicle passes through, the weight of this station is reduced by one unit. Therefore, the demand level of each point where demand is not 0 can be expressed as follows: The practical significance of this division is that the demand level from the i-th station to the j-th station gives the optimal number of routes required to satisfy it.
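The grading arithmetic can be sketched as follows. The capacity of 540 follows from 45 passengers per bus and a 5 min headway; the grading rule itself is not recoverable from the text, so rounding up the ratio of demand to 540 is an assumption, and the function names are hypothetical.

```python
import math

BUS_CAPACITY = 45   # passengers per bus (design verification number)
HEADWAY_MIN = 5     # peak-period headway in minutes


def hourly_line_capacity(capacity: int = BUS_CAPACITY, headway_min: int = HEADWAY_MIN) -> int:
    """Passengers one line can carry per hour: buses per hour times capacity."""
    return capacity * (60 // headway_min)


def demand_level(hourly_demand: float, line_capacity: int = 540) -> int:
    """Assumed grading: number of lines needed to carry the hourly demand."""
    if hourly_demand <= 0:
        return 0
    return math.ceil(hourly_demand / line_capacity)


levels = [demand_level(d) for d in (0, 1, 540, 541, 1300)]
```

Under this assumption, each route that passes a station pair lowers its level by one unit, so a level-3 pair needs three passing routes before its demand counts as fully served.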

Result of Calculation.
In the calculation, multithreading can be used for route generation to speed up the calculation. According to actual conditions, a total of 200 lines with lengths of 10, 20, and 30 are generated in the calculation. The proportion of these routes is approximately 3 : 3 : 4, and the k value is selected as 1.5 for testing. According to the above algorithm, the variables in Table 3 for the experiments are determined in the calculation.
In the experiment, an i7-11700K is used for calculation in the Windows 10 environment; without GPU acceleration, generating 50 sets of lines by multiprocessing takes 5 min. In the presentation of the results, the solution with the minimum distance from the origin among the frontier solutions of each generation of the population is presented as the overall population level. The two indicators corresponding to the optimal route network are shown in Figures 11 and 12. In Figure 11, the left ordinate represents the distance between the Pareto frontier and the origin, in which the minimum value of the product of the two indicators over the entire Pareto frontier is used as the numerical result. The right ordinate represents two demand completion degrees, namely, the accessibility rate in the case of direct access and the accessibility rate in the case of one transfer. The figure shows that in the process of solving the Pareto frontier, the algorithm keeps the direct access rate stable above 60% and the one-transfer reachability rate above 80%. The algorithm ensures the stability of the demand completion degree while optimizing the target. It should be noted that the demand in this experiment is defined in the context of demand stratification. This means that completing the same demand multiple times can repeatedly reduce the remaining value of the demand. This results in data-level differences between the two experiments. Figure 12 shows the performance of the two indicators in the GA. In Figure 12, the left ordinate represents the load of the corresponding optimal route network in each iteration, and the right ordinate represents the complexity of the route network. Both index values steadily decrease in the iterative process.
Figure 11: Achievable efficiency and index optimization.
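The two accessibility rates reported above can be sketched as follows. The sketch assumes unit demand per station pair (no stratification), ignores travel direction, and counts a directly served pair as also reachable within one transfer; `access_rates` and the sample network are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def access_rates(routes: Sequence[Sequence[int]], demand_pairs: List[Pair]) -> Tuple[float, float]:
    """Return (direct access rate, reachability rate with at most one transfer)."""
    route_sets = [set(r) for r in routes]
    direct = transfer = 0
    for origin, dest in demand_pairs:
        with_origin = [s for s in route_sets if origin in s]
        with_dest = [s for s in route_sets if dest in s]
        if any(dest in s for s in with_origin):
            direct += 1
            transfer += 1
        # One transfer: a route through the origin shares a station
        # with a route through the destination.
        elif any(a & b for a in with_origin for b in with_dest):
            transfer += 1
    n = len(demand_pairs)
    return (direct / n, transfer / n) if n else (0.0, 0.0)


rates = access_rates([[1, 2, 3], [3, 4, 5], [6, 7]],
                     [(1, 2), (1, 5), (1, 7), (6, 7)])
```

In this small example the pair (1, 5) is reachable only by changing at station 3, and the pair (1, 7) is not reachable within one transfer.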
The main data changes in the experiments are shown in Table 4.

Conclusion
The above results and analysis suggest that the Pseudo Force Field has higher performance in demand completion than traditional route generation algorithms. On the other hand, the Pseudo Force Field algorithm gives researchers a unique optimization method and guarantees the quality of the routes during the optimization process. Compared with the traditional route generation algorithm, the Pseudo Force Field has a larger solution space. For example, consider a basic route network design problem with k starting and ending stations. Without considering the constraints, in the traditional optimization method, the number of combinations of n lines selected from m lines is $C_m^n$. In contrast, there are $k^n$ cases when station orderings may contain duplicates and $k!$ cases when they may not (this paper uses nonrepeated sites for experiments). The explosive growth of the solution space makes the advantages of the Pseudo Force Field algorithm particularly prominent in large route networks. This means that the Pseudo Force Field algorithm can provide a route generation method with greater potential for better optimization methods in the future and further improve the route network optimization results. It is not difficult to see in the research that the common sorting GA cannot achieve convergence over such a large solution space. In addition, the demand completion of the routes still needs to be improved. In the experiment, the way the demand is processed has a great influence on the quality of the generated routes. Because processing methods for changing demand are lacking, the algorithm should treat demand more precisely in follow-up work to further improve the generation effect.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.