TPA : A Two-Phase Approach Using Simulated Annealing for the Optimization of Census Taker Routes in Mexico

Censuses in Mexico are taken by the National Institute of Statistics and Geography (INEGI). In this paper a Two-Phase Approach (TPA) to optimize the routes of INEGI’s census takers is presented. For each pollster, in the first phase, a route is produced by means of the Simulated Annealing (SA) heuristic, which attempts to minimize the travel distance subject to particular constraints. Whenever the route is unrealizable, it is made realizable in the second phase by constructing a visibility graph for each obstacle and applying Dijkstra’s algorithm to determine the shortest path in this graph. A tuning methodology based on the irace package was used to determine the parameter values for TPA on a subset of 150 instances provided by INEGI. The practical effectiveness of TPA was assessed on another subset of 1962 instances, comparing its performance with that of the in-use heuristic (INEGIH). The results show that TPA clearly outperforms INEGIH. The average improvement is of 47.11%.


Introduction
The National Institute of Statistics and Geography (INEGI, its acronym in Spanish) is the government institution responsible for requesting, processing, analyzing, preserving, and publishing statistical and geographical data that sustain decision making processes in Mexico.These activities are done continuously, require significant public budgets, and involve specific rules and care to ensure the reliability of the generated information.Among its several functions, the planning and execution of official censuses are important.
To carry out censuses and polls, INEGI faces the logistic problem of determining a minimal length travel route through the city blocks of the instance assigned to each census taker or pollster.This problem is subject to two constraints: (a) the route must be done exclusively by accessible roads, and (b) for practical reasons each city block must be completely traversed before passing to the next one.
Presently, INEGI approaches this problem by applying the following rules of thumb (INEGI H ) [1]: (a) the first city block to be traversed by the pollster is the one situated in the northwest corner of its assigned instance; (b) each city block must be completely toured, starting from its northwest corner, before going to the next one; (c) once a city block has been toured, the next one is chosen following a zigzag path from north to south (see Figures 1 and 2).
From the combinatorial optimization viewpoint, this problem can be posed as a Minimal Hamiltonian Path Problem (MHPP) with side constraints.Given a graph  with weighted edges, MHPP consists in finding a minimal length path that visits each node of  exactly once.The similarity between MHPP and the well-known Traveling Salesman Problem (TSP) is clear.Hence, in view of the constraint (b) above, INEGI's problem could be modeled as a directed General Traveling Salesman Problem (GTSP), which, in turn, can be transformed in a directed TSP, allowing us to use the standard solution methods for TSP (see [2] and the references therein).
Also, INEGI's problem could be seen as an unpublished member of the family of arc-routing problems, but as its instances data are only available in terms of vertex coordinates, this kind of modeling did not appear straightforward.For a thorough account of arc-routing problems see [3].
The decision was to model INEGI's problem as an undirected MHPP with side constraint (b) in a geometric realm.The  →  transformation together with standard solution methods did not seem attractive because, being of general use, it does not take advantage of the geometric aspects of the problem, and its performance depends significantly on the number of vertices of the instances.In contrast, this work proposes a solution method whose performance relies more on the number of blocks than on the number of vertices.Other approaches could be the subject of future research.As MHPP has been classified as NP-complete in complexity theory [4], the proposal is an ad hoc heuristic procedure, called TPA (TPA: Two-Phase Approach), composed by two phases.The first one uses an SA algorithm [5] aimed at solving MHPP in a graph whose nodes correspond to the vertices of the polygons defining the city blocks.The second phase is devoted to repair unrealizable solutions (with crossings over the houses or buildings) obtained by the SA algorithm.This is done by means of computational geometry concepts such as visibility graphs [6,7] and then finding the shortest paths in these graphs with Dijkstra's algorithm [8].
TPA was assessed on a subset of 1962 real-world instances provided by INEGI.Results show that TPA is able to attain an average improvement of as much as 47.11% with respect to INEGI H .
The rest of the paper has been organized as follows.In Section 2 the problem under study is formally defined.The implementation details of TPA are discussed in Section 3.Then, Section 4 presents computational experiments aimed at determining suitable parameter settings for TPA and at comparing its performance with that of INEGI H . Finally, the main conclusions of this research are provided in Section 5.

Problem Formalization
The optimization problem studied in this paper consists in finding a minimal length route that traverses all the city blocks in a given instance, subject to the following specific requirements: (a) each city block must be completely traversed before passing to the next one, and (b) the route must be done exclusively by accessible roads, namely, without crossing any city block.In this sense, it can be seen as an instance of the MHPP.
Figure 3(a) shows an unrealizable route because of crossings over city blocks.However, crossings can be eliminated by adding vertices to the route (see Figure 3(b)).Most instances have blocks with irregular, asymmetric geometries, and many more vertices.
Let  = ⋃   ∈ (  ) be the set composed of all the vertices of  city blocks, and let  = ||.Given a subset   ⊆  that includes at least one vertex from each city block   ∈ , a route is an order  = (V 1 , V 2 , . . ., V  ) on   of size  ≥ .Then, the total length (cost) of route  is where dist(, V) stands for the Euclidean distance between vertices , V ∈   .The problem consists in finding a route  * minimizing (1); namely, where R is the set of all the possible routes.

TPA
The problem under study was solved by implementing TPA as mentioned in Section 1. Below, its pseudocode is shown as Algorithm 1.In the first phase of TPA a route  * is constructed by applying the SA algorithm.The core of SA is a neighborhood function, called  3 (, ), composed of two carefully designed neighborhood relations (see line 10).The route  * obtained by SA is expected to be optimal or nearoptimal; however, it could be unrealizable because, very likely, the neighborhood function generates routes with crossings (1) TPA (N, D, T i , maxIter, Q) over city blocks.Thus, the second phase of TPA (see line 20) is devoted to repair those unrealizable solutions.This is done by adding to  * one or more vertices of some city blocks so as to avoid crossings over them.The implementation details of the TPA algorithm are as follows.

Internal Representation.
During the SA process, a route  (a solution) is represented by vectors  = ( 1 ,  2 . . .,   ) and , where, for  = 1, . . ., ,   is the city block visited in the th step of the route and V  is the vertex of block   where the route leaves   to go to block  +1 , or where the route arrives to   when coming from block  −1 .

Neighborhood Function.
The main objective of the neighborhood function  3 (, ) is to produce solutions (routes) by making small changes to the current solution.This neighborhood function is composed of two neighborhood relations (see Algorithm 2).The first one,  1 (), generates a neighboring solution   by swapping, in the current solution , two randomly selected city blocks (uniform distribution).Its range is  * ( − 1)/2.
The second neighborhood function,  2 (), changes the vertex of one randomly selected city block, currently used in route , to produce a neighboring solution   with the following procedure.First a city block, say   , is randomly selected.Then, the next block in the current route  (namely,  +1 ) is also considered to identify one of its vertices satisfying the conditions: (a) it minimizes the distance between city blocks   and  +1 and (b) it avoids crossings when connecting these two city blocks (see Algorithm 3).The latter condition is ensured by an efficient verification of lines intersection through computational geometry procedures [6].The range of  2 () is bounded by  * arg max During the search process, a combination of the  1 () and  2 () neighborhood relations is employed.The former is applied with probability , while the latter is employed at (1 − ) rate.Thus, the neighborhood function  3 (, ) is defined as where  is a random number uniformly distributed in the interval [0, 1]. (

Evaluation Function.
The selection of the evaluation function is a critical aspect of any search procedure.First, to efficiently test each potential solution, the evaluation function must be as simple as possible.Secondly, it must be sensitive enough to identify promising search regions in the solution space.Finally, the evaluation function must be consistent: a solution that is better than others must contribute with a better value of the objective function [9].
In TPA implementation, the cost of a solution  is calculated using formula (1).To determine the cost of a route  with formula (1), the perimeter of every city block   ∈  and the Euclidean distance between every pair of vertices (V  , V +1 ) ∈  in the route must be computed.( + ) instructions must be executed (remember that  represents the number of city blocks in the problem instance and  is the size of an order (route) on   ⊊ ) by this complete evaluation scheme.Nevertheless, TPA employs an incremental evaluation of the neighboring solutions which takes advantage of the fact that the evaluation function D(, ) contains a constant term, ∑   ∈ (  ).This is precomputed only once and used along the search process.Furthermore, when a neighboring solution   is generated with function  1 (), the incremental evaluation scheme only recomputes the (up to four) Euclidean distances that change, in order to obtain the cost of the route   .This is much faster than the ( + ) operations originally required.
A similar incremental evaluation mechanism was implemented for the neighborhood function  2 ().As a consequence, TPA analyzes thousands of neighboring solutions employing only a very small fraction of the time that would be required by the complete evaluation scheme.This is possible thanks to the use of appropriate data structures.

Cooling Schedule.
A cooling schedule is defined by the following parameters: initial temperature, maximum number of solutions generated at each temperature value (Markov chain length), rule for decreasing the temperature, and stop condition.The cooling schedule governs the convergence of any SA [10].
In TPA implementation, a geometrical cooling scheme (static) was defined due to its simplicity, requested computational time, and quality of the solutions produced in preliminary experiments.It starts at an initial temperature   .Then, the temperature is decreased at each round by a positive factor  < 1 using the relation   =  −1 .For each temperature   , the maximum number of neighboring solutions was fixed to maxIter =  * ( 1 +  2 ).Hence, it depends directly on the range of the neighborhood function and on the constant , which is called "moves factor."This is because more neighboring solutions are required for denser instances.

Stop Condition.
TPA stops if either the best-known solution does not change during a predefined number of consecutive temperature decrements  or the current temperature is lower than a prefixed value   .

Repair Mechanism.
Let  denote the route that SA offers as optimal solution in the first phase of TPA.Recall that  is defined by vectors ( 1 ,  2 , . . .,   ) and (V 1 , V 2 , . . ., V  ).The second phase of TPA consists in applying the following repair mechanism on  (see Algorithm 4).
Let ℎ() denote the centroid of city block  and, for  = 1, . . .,  − 1, let () = 2 * dist(ℎ(  ), ℎ( +1 )), where ℎ(  ) and ℎ( +1 ) are the centroids of two neighboring blocks.Further, for  = 1, . . ., , let Γ() be the set of blocks  in the instance such that dist (ℎ () , ℎ (  )) ≤  () or dist (ℎ () , ℎ ( +1 )) ≤  () .block , if ℓ has nonempty intersection with the interior of the polygon defining .The crossings can be of two types: (a) self-crossings, when an edge (V  , V +1 ) crosses the city block   itself, and (b) neighboring-crossings, those belonging to () \   .For example, considering the blocks of Figure 3(a) as a subset of an instance and block  as reference, the line segment 1-2 is a self-crossing, and the segment 2-3 is a neighboring-crossing.Then, whenever () is not empty, a set of neighboring blocks  is produced (see lines [10][11][12][13][14][15].Set  serves to build up a visibility graph  = (, ); see [7].In , the set of nodes  is composed of the vertices of all city blocks in , and edge (, V) ∈  if (, V) is side of a city block in  or if the line segment with endpoints  and V does not cross a block in  (namely,  is visible from V and vice versa).Each edge in  has as weight the Euclidean distance between its endpoints.
Figure 4(a) presents a visibility graph between vertices 1 and 3. Figure 4(b) shows a realizable path between vertices 1 and 3. A shortest path between vertices V  and V +1 in  is obtained by means of Dijkstra's algorithm.Of course, this path, of minimal length, does not cross any single block.Dijkstra's algorithm is a well-known, efficient algorithm to solve the so-called Shortest Path Problem (SPP) in a graph with nonnegative edge costs [8].

Computational Experiments
In this section, two computational experiments are presented.The first one was aimed at determining the parameter values of TPA that yield the best trade-off between solution quality and computing time.The second experiment was carried out to compare TPA with INEGI H in solving the route problem.For these experiments, the algorithms were coded in Java SE Development Kit version 7, update 51.They were run sequentially into a cluster equipped with 10 InfiniBand interconnected nodes, each of which features 8 cores running at 2.66 GHz, has a total of 16 GB of RAM, and uses CentOS distribution of the Linux Operating System.

TPA Parameter Tuning.
Optimizing parameter values is an important activity when implementing efficient algorithms.The literature offers many possibilities to find a combination of parameter values that allow a particular algorithm to furnish the best solution quality in reasonable computational times [11,12].
In this work, the strategy was to find the most appropriate parameter values of TPA by employing the iterated racing procedure (/-) [13].This offline automatic configuration procedure is implemented in  as part of the irace package [14] and has been successfully used in several research projects [15].I/F-Race is based on the use of racing [16] and Friedman's nonparametric two-way analysis of variance by ranks.It consists of the following three steps, executed iteratively until a predefined termination criterion is met: (1) sampling new parameter configurations according to either a normal or a discrete distribution, depending on whether the parameters are numeric or categorical, (2) selecting the best parameter configurations from the newly sampled ones by means of racing, and (3) updating the sampling distribution in order to bias the sampling towards the best configurations.Further details of the I/F-Race iterative procedure and its implementation can be found in [13,14,16].
For the experiments INEGI provided a set of 2112 instances, where the number  of city blocks varies from 3 to 103, and the total number of vertices  is in the range [15,8461].From this set a subset of 150 instances served in the tuning experiments, and the remaining 1962 instances were used during the performance evaluation of TPA, as it will be detailed later.From preliminary experiments, it was found that TPA required different parameter configurations depending on the density of the instance to be solved.The density of a simple undirected graph  = (, ) is defined as 2 * ||/(|| * (|| − 1)).For this specific problem || = , and  is the number of blocks.Then, the total number of blocks sides in a particular instance can be expressed as || =  +  − 1.Consequently, the density of a particular instance can be computed as Thus, the decision was to sort the test set obtained from INEGI according to density.Then, five representative ranks of density were identified (see Table 1) and 30 instances from each rank were randomly selected for tuning the TPA algorithm.The idea behind this is to have a representative sample of the instances for this experiment and to produce one good parameter configuration for each density rank.A sample of 30 statistically representative instances of each rank was used to run the irace package [15].1500 independent runs were executed for each instance.
In Table 2, the allowed range of possible parameter settings used for the tuning process is presented.  , , and  stand, respectively, for initial temperature, moves factor, and maximum number of consecutive temperature decrements without improvement.For all the ranks, the range of the final temperature   was [0.05, 0.5], the range of the cooling factor  was [0.88, 0.99], and the range of the probability of using each of the two neighborhood functions  vec was [0, 1000].
As a result, the five different parameter configurations depicted in Table 2 were obtained.These parameter configurations are therefore used in the experimentation reported next.
In Table 3, the parameter combinations found with irace for TPA are presented.  grows as density grows.  does not present a defined trend; however the values are relatively similar.Moves factor  grows as density grows; however, in the densest rank, the value drops significantly to 0.66.In this rank, the instances have up to 103 blocks and 8461 vertices.A smaller  parameter is required because of the large size of the The results from this experiment are presented in Table 5, summarizing, for each rank, the average data obtained by the compared algorithms.The solutions found by INEGI H are listed in column 2. Columns 3 to 6 show, respectively, the best, the average and the standard deviation of the total travel distance (cost) reached by TPA, and the needed CPU time in seconds.Additionally, the average improvement attained by TPA was computed as 100 * (INEGI H −Avg.)/INEGI H . Table 5 shows that TPA clearly outperforms INEGI H with an average improvement of 47.11%.This highlights the suitability of TPA.Also, a relatively small standard deviation is an indicator of the algorithm's precision and robustness.TPA consumes, in average, 0.54 seconds per run.CPU time consumed by TPA is acceptable and fully justified by considering that it is able to outperform INEGI H in terms of cost.Results achieved by TPA are illustrated in Figure 5.The subset of 1962 instances is indexed in nonincreasing order of cost-improvement.
In regard to the instance with the highest number of blocks (92 blocks, 478 vertices), the improvement was of 70.90%.Further, in the instance with the highest number of vertices (51 blocks, 8461 vertices), the improvement was of 40.30%.The highest improvement, 98.82%, was presented in an instance of 59 blocks and 519 vertices.The lowest improvement was of 1.18% and was presented by an instance with 6 blocks and 275 vertices.TPA seems to be an effective approach to find good quality solutions for the route problem faced by INEGI.In Table 5 it can be observed that ranks 2-5 present a relatively small standard deviation.This is an indicator of the algorithm's precision and robustness, since it shows that, on average, the performance of TPA does not present important fluctuations.However, the standard deviation for rank 1 has a high value, which might be explained by the high variability of the instances' size.For example, the maximum total perimeter of an instance belonging to this subset is 59.48 times the minimum total perimeter of another one.Further, when comparing the length of the optimal route in any two instances of rank 1, the maximum ratio is as much as 256.74.

Conclusions and Future Work
In this paper a Two-Phase Approach, called TPA, was proposed for the solution of a logistic problem faced by INEGI.In the first phase of TPA a route  * , of expected minimal length, is produced by an ad hoc implementation of the SA heuristic.Due to the particular implementation, route  * is likely to be unrealizable, namely, with crossings over the city blocks.Hence, the second phase of TPA is aimed to repair route  * , namely, to eliminate the crossings from it.This is done with the use of visibility graphs [6] and Dijkstra's algorithm [8].
The parameter values for TPA were established through intensive experimentation on a representative subset of 150 instances.A tuning methodology based on the irace package was applied.The practical effectiveness of TPA was assessed on another subset of 1962 instances drawn from the test set, by comparing its performance with that of the in-use heuristic (INEGI H ). The results show that TPA clearly outperforms INEGI H ; the average improvement over the total distance traveled is of 47.11% (636.31versus 328.77).
The problem resolved in this work has common elements with traditional arc-routing problems; however some details make it original.For example, the constraints are that (a) the route must be done exclusively by accessible roads, and (b) for practical reasons each city block must be completely traversed before passing to the next one.Constraint (a) implies a combination of an arc-routing problem with geometric aspects.Standard procedures exclude these considerations and therefore are not applied directly.The composed neighborhood implemented in the SA algorithm is an ad hoc strategy for this specific problem in order to face the presence of blocks and vertices with the challenge of irregular geometries causing unrealizable solutions.Also, the integral solution given to the problem, as a whole, is not reported in the literature.In this sense, this work represents an original contribution.
Although outstanding results were obtained by TPA, they may be still improved.Future work would concentrate on designing alternative neighborhood and evaluation functions, so as to boost the algorithms performance, since it is well-known that these are two important components to define the so-called landscape of the search problem, and they impact thus greatly the overall efficiency [9].Another future work could focus on a different problem modeling or different heuristic approaches.The instances as well as detailed results of the experiments presented in this paper are available under request.

Figure 1 :
Figure 1: Example of a route produced by INEGI H .

1 Figure 2 :
Figure 2: Example of a city block traversal.

Table 1 :
Five ranks of density of the test set with 2112 instances.

Table 2 :
Allowed range of possible parameter settings used for the tuning process.

Table 3 :
Parameter combinations found with irace for TPA.

Table 4 :
Instances used to evaluate TPA performance.() and  2 (), and the relation maxIter =  * ( 1 +  2 ).4.2.Comparing TPA with INEGI  .The main objective of this section is to compare average solutions achieved by TPA with respect to INEGI H . Table 4 shows 1962 (2212 − 150) instances used to evaluate TPA performance.Next, results are compared with INEGI H . Due to the nondeterministic nature of TPA, 30 independent runs were executed for each instance.The parameter values employed by TPA are those previously found in Section 4.1.

Table 5 :
Overall performance comparison of TPA with respect to INEGI H .