Expected Length of the Shortest Path of the Traveling Salesman Problem in 3D Space

Finding the shortest path of the traveling salesman problem (TSP) is a typical NP-hard problem and one of the basic optimization problems. TSP in three-dimensional space (3D-TSP) is an extension of TSP. It plays an important role in the ﬁelds of 3D path planning and UAV inspection, such as forest ﬁre patrol path planning. Many existing studies have focused on the expected length of the shortest path of TSP in 2D space. The expected length of the shortest path in 3D space has not yet been studied. To ﬁll this gap, this research focuses on developing models to estimate the expected length of the shortest path of 3D-TSP. First, diﬀerent experimental scenarios are designed by combining diﬀerent service areas and the number of demand points. Under each scenario, the speciﬁed number of demand points is randomly generated, and an improved genetic algorithm and Gurobi are used to ﬁnd the shortest path. A total of 500 experiments are performed for each scenario, and the average length of the shortest path is calculated. The models to estimate the expected length of the shortest path are proposed. Model parameters are estimated and k-fold cross-validation is used to evaluate the goodness of ﬁt. Results show that all the models ﬁt the data well and the best model is selected. The developed models can be used to estimate the expected length of the shortest path of 3D-TSP and provide important references for many applications.


Introduction
e traveling salesman problem (TSP) is a typical combinatorial optimization problem and is also NP-hard [1,2]. It is explained as a salesman trying to find the shortest path (or minimum travel cost) through a given set of cities, where the travel distances (or travel costs) between cities are also given, under the requirement of visiting each city only once and returning to the starting city [3]. TSP has a wide range of applications in transportation, logistics, computer science, genetics, and other fields [4][5][6][7][8][9]. e traveling salesman problem in three-dimensional space (3D-TSP) is a generalization of the classic TSP. 3D-TSP refers to an executor, such as a drone that tries to find the shortest path in the three-dimensional space and returns to the starting point after visiting each of the given demand points. e difference between 3D-TSP and the classic TSP is that the coordinates of the cities or demand points are no longer represented by two-dimensional coordinates but by three-dimensional coordinates. Many problems, such as the unmanned aerial vehicle (UAV) patrolling for inspection, could be regarded as a TSP in 3D space [10]. In these applications, it is important to know the expected length of the shortest path of the 3D-TSP under a given area and the number of demand points in order to determine the zoning strategy of the service area and the number of executors needed.
So far, many scholars have studied the expected length of the shortest path and proposed estimation models that adopt the number of cities and service area parameters as independent variables [11][12][13][14][15]. However, models of the expected length of the shortest path of 3D-TSP have not yet been studied. As a result, this study contributes to determining the influence of the number of demand points and the relevant parameters of the service area on the expected length of the shortest path of 3D-TSP and establishing the estimation models of the expected length. e remainder of this paper is structured as follows. Section 2 provides a review of the related literature on the algorithms to solve TSP and the estimation models of the expected length of the shortest path of TSP. Section 3 gives the formal definition of 3D-TSP, its objective function, and the improved genetic algorithm (GA) used in this paper. Section 4 describes the simulation experiment scenarios used in this paper, proposes the variables that can be used to construct the models, builds the models based on the results of the simulation experiment, and evaluates the goodness of fit of the models. Section 5 provides a summary of the entire study.

Literature Review
No research has yet been conducted on the expected length of the shortest path of the 3D TSP. us, the studies related to the expected length of the shortest path of the 2D TSP are reviewed.
Beardwood et al. [11] proposed that the expected length of the shortest path through N points in abounded plane region is always proportional to ��� � NA √ when N is relatively large. In two-dimensional space, this result can be used to estimate the expected length of the shortest path of the 2D TSP. e specific function is provided as Model (1) in Table 1. In the model, L is the expected length of the shortest path of the 2D TSP, A is the area of the service area, N is the number of points in the area, and α is the constant of proportionality, which is not related to the shape of the area. After that, many scholars have estimated the value of α [16][17][18][19][20][21][22][23].
Based on the study of Beardwood et al., Daganzo [12] considered the influence of shape on the expected length of the shortest route and proposed Model (2), as shown in Table 1. In the model, Φ is an unknown constant, δ is the density of points (δ � N/A), l is the width of the rectangle, N is the number of points, and A is the area of the service area. In the same year, Daganzo [13] proposed Model (3) to estimate the expected length when there is a fleet of vehicles operating in an area with heterogeneous demand, as shown in Table 1. In the model, α 1 is an unknown constant, A is the area of the subregion, N is the number of points in the subregion, D is the average straight-line distance from each point in the subregion to the depot, and C is the maximum number of times that a car can stop.
Chien [14] presented seven functions for the length of the shortest path of TSP on the basis of the actual locations of customers and the depot and conducted a large number of Monte Carlo experiments to determine the values of related parameters. e results show that when the service area and randomly distributed points in the area are provided, Model (4) in Table 1 performs the best and can obtain a highly accurate approximation of the expected length. In this function, α 1 and α 2 are unknown constants, RBAR is the average straight-line distance between the depot (origin) and customers, RECTAN2 is the smallest rectangular area that covers all the customers, and N is the size of TSP (number of customers plus 1).
Kwon [15] constructed the regression model and neural network model of the expected length of the shortest path for 2D TSP and compared the results with the model proposed by Beardwood et al., Daganzo, and Robuste et al. [24]. e results show that the model built by Kwon performed best. Different from the previous model, Kwon's model also considers the effect of the aspect ratio S and S/N of the rectangular service area on the expected length of the shortest path. e specific functions of the regression models established by Kwon are shown in Models (5) and (6) in Table 1. In Models (5) and (6), α 1 , α 2 , α 3 , and α 4 are unknown constants; N is the number of points in TSP (the sum of the number of depot and all the customers); S is the ratio of the length to the width of the service area; A is the smallest rectangular area that covers all the points; D is the average straight-line distance from all the customers to the depot.
is study primarily refers to Model (5) in Table 1 (proposed by Kwon) to construct our models.
When studying the expected length of the shortest path of the 3D TSP, we need to first solve the given 3D TSP in order to obtain the shortest path. e solution algorithm of 3D-TSP is similar to that of the classical TSP. e only difference is the calculation of the distance matrix. erefore, this paper adopts the solution algorithm of TSP to find the optimal solution for 3D-TSP. Many scholars have studied various algorithms to solve TSP. Solution algorithms for TSP can be divided into two types: exact algorithm and heuristic algorithm [19]. For small-scale problems, exact algorithms can be used to solve them. For large-scale or medium-scale problems, exact algorithms cannot generate the optimal solution in a reasonably short time, and thus heuristic algorithms are needed to find the approximate optimal solution. Some commonly used algorithms [25][26][27][28] are shown in Table 2.
For the exact algorithm, the solver Cplex or Gurobiis is usually used when the number of demand points is less than or equal to 100. For heuristics algorithm, GA is a random parallel search algorithm based on natural selection and genetics. It is an efficient method to find the optimal solution without relying on any initial data. Due to its simplicity, GA is widely used to solve TSP. However, it has a certain dependence on the selection of the initial population. As a result, GA is often improved by combining it with other heuristic algorithms. For example, Li et al. [2] proposed to combine GA with other algorithms such as greedy, hill climbing (HC), and simulated annealing (SA) algorithm to improve GA to achieve better performance. Meng et al. [29] proposed a PBIL algorithm combining GA and competitive learning to explore better solutions. Later, Meng et al. [30] also proposed a variable neighborhood search (VNS) approach that utilizes a two-stage greedy initialization algorithm to generate an initial solution and employs a 2-opt method for local search. Xu et al. [31] proposed a biobjective variable neighborhood search (BVNS) approach based on the VNS proposed by Meng et al. Kumar [32] proposed to improve the GA with a greedy algorithm to improve the quality of the initial population. Among them, the combination of GA and the greedy algorithm has generated good results within a reasonable time. erefore, this paper uses Gurobi and GA combined with the greedy algorithm to obtain the shortest path of the 3D-TSP in different scenarios. A randomly generated geometric TSP instance [3] assumes each point is independently and randomly generated from the service area based on the uniform distribution. e travel cost between two points is the Euclidean distance between the corresponding points. is study implements relevant experimental designs and randomly generates TSP examples in 3D space. In addition, TSPLIB contains more than 100 TSP instances and 85,900 cities [33]. It provides a testbed for the proposed algorithm.

3D-TSP.
Let us give a formal definition of the 3D-TSP. Suppose that there is a drone (or performer of action) that needs to visit each of the N demand points once and come back to the origin. It can be formulated over a complete is associated with a weight d ij , which represents a visit cost (or distance) between two demand points i and j. e objective is to determine the route along which the drone departs from the origin, visits each point once, and returns to the origin with the lowest total travel cost. e integer programming model of 3D-TSP is presented as follows [4]. Binary variable x ij � 1, i ≠ j, i, j ∈ V, if the drone passes through the edge (i, j); and otherwise, x ij � 0. e objective function of the problem is Subject to the following constraints: first, the drone departs from the demand point numbered 0 (origin) and returns to the origin, that is, Each demand point except 0 must be visited by the drone exactly once, that is: For each demand point, introduce a decision variable μ i , ∀i ∈ V, μ i ≥ 0. μ i is the access order of demand point i. For example, in μ 1 � 5, it can be understood that point 1 is the fifth point visited from the starting point. M is a very large number. M should be an upper bound of μ i − μ j + 1. Some papers pointed out that the effect will be better if the tightest upper bound is taken, so we take M � N [25]. Construct Miller-Tucker-Zemlin (MTZ) constraint to eliminate subtour, that is: It should be noted that since the classical TSP is NP-hard [1, 2] and 3D-TSP is an extension of the classical TSP in the 3D space, 3D-TSP is also NP-hard. When all the points of the 3D-TSP fall onto the same plane, 3D-TSP turns into the classical TSP.

Improved GA Algorithm.
Due to the complexity of 3D-TSP and the relatively large number of demand points in this study, it is necessary to use heuristic algorithms to solve the 3D-TSPs. As mentioned in the literature review, GA has a good global search ability and is easy to be combined with other algorithms for improvement. Combining the GA with the greedy algorithm can shorten the time to obtain the solution. It is achieved by using the greedy algorithm to improve the initial population of the GA since a high-quality initial population can accelerate the evolution of the GA to quickly reach a satisfactory optimal solution [2,32]. erefore, this paper chooses to use the GA combined with the greedy algorithm to find the shortest path of 3D-TSP. e initial parameters include the population size M, the number of chromosome genes N (that is, the number of demand points), the number of iterations C, the crossover probability P c , and the mutation probability P m . e specific steps to perform the algorithm are as follows.
Step 1. Generate the digital codes (chromosomes) of M feasible solutions of 3D-TSP through the greedy algorithm as the initial population (the feasible solution set). Step 2. Select the inverse of the total length of the path as the tness function, which indicates the shorter the distance is, the better the tness function is. e tness of each individual is calculated by the tness function.
Step 3. Use the wheel selection mechanism for the selection operation, eliminate individuals with low tness, and select individuals with high tness.
Step 4. Randomly select two individuals and perform the partial matching crossover. e speci c steps are as follows: rst randomly generate two intersection points, determine the matching areas between the two points, and exchange the two matching areas. If the code repetition occurs outside the matching area, it will be replaced based on the corresponding position in the matching area.
Step 5. Randomly select an individual, extract two genes from the individual code, and exchange their positions. In this way, variation of individual code is achieved.
e selection, crossover, and mutation of individuals generate a new population for the next generation. After a new population is generated, it is evaluated and then selected, crossed, and mutated. ese steps are repeated until the maximum number of iterations or the termination condition of the algorithm is reached. Finally, an approximate optimal solution is found. e owchart of the improved GA is as follows (Figure 1). Figure 2 presents an example of the 3D-TSP. In this example, there are 10 demand points and the volume of the service area is 1000 with equal length, width, and height. e red line in the gure is the obtained shortest path. It can be seen from the gure that the length of the shortest path is 42.0285. e coordinates of each demand point are marked.

Experiments and Results
In this section, we rst design the simulation experiment scenarios according to the previous literature [14,15,34], including the setting of the volume, shape, and the number of demand points of the service area.
is section rst compares the solution obtained by the GA combined with the greedy algorithm to the solution obtained by the solver Gurobi. e better solutions are selected for further analysis and modeling.

Settings of the Experimental Scenarios.
e volume of the service area, the number of demand points, and the shape of the service area are three important factors that a ect the expected length of the shortest path. We only consider the shape of the service area as a cuboid or cube. We assume that the longest side is length, the second longest side is width, and the shortest side is height.
is study also assumes that the service area has three parameters: the     volume of the service area (V), the ratio of length to width (S1, the ratio of the longest side to the second longest side), and the ratio of length to height (S2, the ratio of the longest side to the shortest side). When V is xed, the shape of the service area will change with the change in S1 and S2. For example, when S1 S2 1, which means that the length, width, and height are equal, the shape of the service area is a cube.
A total of 500 (5·10·10) experimental scenarios are designed in this study. e range of V includes 1000, 8000, 27,000, 64,000, and 125,000. Under the premise of keeping V constant, let S1 and S2 increase from 1 to 4 at an interval of 1, such that the shape of the service area is gradually elongated from a cube to a cuboid. After the service area is set, demand points are randomly generated in each of the 3D service areas. e number of demand points increases from 10 to 100 at intervals of 10. e demand generation process is repeated under each scenario.

Algorithm Comparison.
e computer environment is as follows: Windows 10 (64 bit), CPU is AMD R9-5900HX processor and baseline speed is 3.30 GHz. First of all, in order to verify the e ectiveness of the improved GA, this paper selects 4 cases from the TSPLIB database with the number of cities less than or equal to 100. e solution results are shown in Table 3. e test results show that the improved GA algorithm used in this paper generates results that are very close to the solution given by the TSPLIB. e TSPLIB o cial website states that any solution that is close to or better than the solution they provide can be considered a good solution [33], which veri es the e ectiveness of the algorithm. e improved GA and Gurobi are used to obtain the shortest path lengths of 3D-TSP. e computing time and quality of the two solutions are compared. We set the parameters of the improved GA as M 200, C 10,000, P c 0.7, P m 0.1. 500 simulation experiments were performed for each scenario, and the average value of the length of the shortest path of the 500 experiments was taken as the expected length of the shortest path in this scenario. To prove that 500 simulation experiments for each scenario are su cient to accurately estimate the expected length of the shortest path, we plotted the relationship between the average length of the shortest path and the number of experiments (see Figure 3). As shown in the gure, the uctuation is large when the number of simulations is less than 100. When the number of simulations is between 100 and 300, the uctuation is smaller. When the number of simulations exceeds 300, the average value tends to be stable. To obtain accurate results, we set the number of simulation experiments to 500.
In each of the 50 scenarios with N 10, we compute the di erence between the lengths of the shortest paths obtained by the two methods and divide the di erence by the length obtained by Gurobi to get a percentage. Figure 4 shows the distribution of the percentage. Figure 4 shows that 90% of the di erence is lower than 1%, and the biggest di erence is 1.37%, which is also very small. As a result, the quality of the solutions obtained by the two methods is not much di erent. At the same time, we found that the improved GA takes much less computing time than using Gurobi. Especially when the value of N is greater than 30, Gurobi cannot generate the results for the 500 scenarios in three days, while the improved GA can. As a result, this paper uses the solution obtained by the improved GA to further establish the estimation models.

Estimation Models of the Expected Length of the Shortest
Path. We start by de ning some variables that will be used in the estimation models. We then explore the relationship between these variables and the expected length of the shortest path and propose possible models for estimating the expected length. Finally, the goodness of t of the proposed models is evaluated.
ese studies assume that the service area is a rectangle, where A is the area of the service area, N is the  Journal of Advanced Transportation number of demand points in the service area, S is the ratio of the length of the service area to its width, and L is the expected length of the shortest path in the service area. Because this study focuses on the 3D space, the number of variables is expanded to ve (V, N, S1, S2, and L), where V is the volume of the service area, S1 is the ratio of the length to the width, and S2 is the ratio of the length to the height.

Analysis of the Expected Length of the Shortest Path.
Before establishing the estimation models, it is necessary to analyze the relationship between the variables and L based on the experimental data. e results of the expected length of the shortest path for some of the 500 scenarios obtained using the improved GA are shown in Table 4. e demand points are randomly generated three-dimensional coordinates, as explained in Section 4.1. is table shows the change in the mean value of the shortest path length when di erent values of V, N, S1, and S2 are selected. As indicated in Table 4, the mean value of the shortest path length increases with an increase in V, N, S1, and S2. When V, S1, and S2 are xed, the mean value of the shortest path length increases as N increases. When N, S1, and S2 are xed, the mean value of the shortest path length increases as V increases. When V and N are xed, the mean value of the shortest path length increases gradually as S1 and S2 increase.
We use the data when the shape of the service area is a cube as an example to demonstrate the relationship between the expected length of the shortest path and the volume of the service area and the number of demand points in the service area. We rst present the relationship between the volume of the service area (V) and the expected length of the shortest path (L) under the di erent number of demand points (N) ( Figure 5). As shown in Figure 5, when V does not change, L gradually increases as N increases. When N is constant, L gradually increases with an increase in V and there is a nonlinear relationship between the two variables. It can be seen from the gure that L and the cubic root of the volume of the service area ( V √ . erefore, we use the data when the shape of the service area is a cube as an example to visualize the relationship between the variables in the form of a scatterplot. e scatterplot of the expected length of the shortest path and N √ V 3 √ is shown in Figure 6. e scatterplot of the expected length of the shortest path and NV 3 √ is presented in Figure 7. Figure 6 shows an evident linear relationship between the expected length of the shortest path (L) and N √ V 3 √ . By contrast, the linear relationship between L and NV 3 √ in Figure 7 is not that obvious.
In addition, this study calculates that when the volume of the service area is xed to 1000, the ratio of L to N √ V 3 √ and the ratio of L to NV 3 √ under di erent shapes are provided in Tables 5 and 6. As indicated in Table 5, when the volume of the service area (V) and the number of demand points (N) are xed, the ratio increases as the shape of the service area is elongated (refer to any row in Table 5). When the shape and volume of the service area are xed, the ratio increases with the number of demand points (refer to any column in Table 5). When the volume of the service area is xed, the di erence between ratios becomes smaller as the number of demand points increases. e conclusions drawn from Table 6 are similar to those drawn from Table 5.

Model Results
. From the analysis above, we could build the following models: where the coefficient is a linear combination of the independent variables. is study establishes multiple linear regression equations based on the data obtained in the simulation experiments. Test and analyze the signi cance of the comprehensive linear e ect of each independent variable on the dependent variable through RStudio, select the independent variable that only has a signi cant e ect on the independent variable, and establish the regression equations. Based on correlation analysis, eight di erent models are considered and presented in Table 7. Models 1 and 2 are extensions of the models proposed by Beardwood [11]. Models 3-6 are extensions of the models proposed by Kwon [15]. In these models, α, α 1 , α 2 , α 3 , and α 4 are unknown parameters.
After the regression models are determined, the experimental data is used to estimate the unknown parameters of the models, and RStudio is used to analyze the signi cance of the coe cients of the tted regression equations. e parameter estimation method used in this study is the least square method. eir adjusted R 2 and p values of each model are recorded. e results are summarized in Table 8. From Table 8, we can see that the R 2 of each model is very high, and the p value is very small. at means all the models t the data well. Model 1 has the highest R 2 .
However, R 2 should not be the only evaluation measure of the model because the models could su er from an over tting problem. To deal with the possible over tting problem, the k-fold cross-validation method is used [35]. e mean absolute error (MAE) and mean absolute percentage error (MAPE) of each model are calculated at the same time to compare the performance of these 6 models. e k-fold cross-validation is a cross-validation method that can e ectively compare models [35]. It rst divides the   Service region shapes S1 1 S1 1 S1 1 S1 1 S1 2 S1 2 S1 2 S1 3 S1 3 S1 4  Service region shapes S1 1 S1 1 S1 1 S1 1 S1 2 S1 2 S1 2 S1 3 S1 3 S1 4  Table 9. e MAE and MAPE of the testing set are larger than those of the training set, and such findings are reasonable. Among the 6 models, Model 1 and Model 2 are the simplest models. e MAE and MAPE of Model 1 are smaller than those of Model 2. Model 1 is selected as the representative of the simple models. Models 3 to 6 are more complex and fit the data better in terms of MAE and MAPE. Among them, Model 4 has the smallest MAE and MAPE, followed by Model 3. As a result, Model 4 is selected as the representative of the complex models.

Conclusions
In this paper, we intend to study the expected length of the shortest path of the 3D-TSP. By changing the volume, shape, and number of demand points within the service area, 500 experimental scenarios were designed. e improved GA and the solver Gurobi are used to obtain the shortest path of 3D-TSP in each scenario. e results of the two methods are compared and analyzed. ough the Gurobi could generate better results, it takes much longer computing time. Especially when the number of demand points is large, it becomes infeasible to obtain the results in an acceptable time. e shortest path obtained by the improved GA is very similar to that obtained by the Gurobi. As a result, the shortest path obtained by the improved GA is used in this study for further analysis and modeling. Regression models of the expected length of the shortest path are proposed based on data analysis and previous studies. e coefficients of models are fitted using RStudio. e k-fold cross-validation is applied to evaluate the model performance based on MAE and MAPE. All the models proposed in this study fit the training and testing datasets well. e results indicate that Model 1 can accurately estimate the expected length of the shortest path of the 3D-TSP and has a simple formulation. For a more complex model formulation, Model 4 could generate better prediction accuracy. e results of this study can be used to estimate the expected length of the shortest path of 3D-TSP. Although this study uses linear regression to model the expected length of the shortest path of the 3D-TSP, machine learning methods, such as neural networks, are also worth exploring. e limitation of this study is as follows. Firstly, we obtained the solution using the improved GA. e solution may not be globally optimal. Better algorithms could be explored in the future to make sure the global optimal solution is obtained. Secondly, this study is based on experiments designed under ideal conditions. e applicability of the proposed models in practice should be further verified.
Data Availability e data used to prove the validity of the algorithm of this study are available from https://comopt.ifi.uni-heidelberg. de/software/TSPLIB95/.

Conflicts of Interest
e authors declare no conflicts of interest regarding the publication of this paper.