A CDT-Based Heuristic Zone Design Approach for Economic Census Investigators

This paper addresses a special zone design problem for economic census investigators that is motivated by a real-world application. This paper presented a heuristic multikernel growth approach via Constrained Delaunay Triangulation (CDT). This approach not only solved the barriers problem but also dealt with the polygon data in zoning procedure. In addition, it uses a new heuristic method to speed up the zoning process greatly on the premise of the required quality of zoning. At last, two special instances for economic census were performed, highlighting the performance of this approach.


Introduction
Zone design is widely used as a method of spatial partitioning in geospatial sciences.It is similar to districting problem, which is a classical topic in optimization research.Both of the zone design and the districting problem use a certain of optimize operations divide a geographical region into some zones (or districts).The difference between them is to meet different criteria or constraints, such as balance, compactness, and contiguity.However, zone design pays more attention to how to build a spatial auxiliary graph.Sometimes, the zone design, also called district design or territory design, is usually considered a strategic activity [1].The zone design has a broad range of applications, such as in political redistricting [2][3][4][5][6][7], sales territory alignment [8][9][10], school redistricting [11], logistics districting [12][13][14][15], and delimitating zones for land-use allocation and/or land acquisition and apportionment [16][17][18].
Traditional methods for zone design mainly focus on the complex geometries and spatial relationships between them.However, there are still some problems to be solved, such as nodes representation problem and geometrical barriers problem.Solving these problems from a GIScience perspective may lead to a tractable solution, highlighting the importance of geographical insights [19,20].The use of ordinary and nonordinary Voronoi diagrams to solve districting problems has been reported in many literatures [21][22][23].Novaes et al. [22] took advantage of the Voronoi diagram to detect the spatial relationship and solved geometrical barriers problem successfully.However, all above approaches are based on point data but are not suitable for polygon data.Murray and Tong [19] have raised the issue that only point-based representations are too simplistic to represent more complex vector objects like polygons in GIS.On the other hand, traditional methods make the shape of zones tend to be a circle or a square [21,24].That no longer applies with the presence of geometrical barriers problem.Another remarkable feature of traditional districting methods is that running procedures for optimizing the districting results usually cost too much time, which is in seconds or minutes [25][26][27].
So it is hard to use the previous studies or existing optimization software platform (such as CPLEX or MOSEK) to find an effective solution.To solve problems in this paper, we have to find a way different from the traditional ones.This paper presented a heuristic multikernel growth approach via Constrained Delaunay Triangulation (CDT) and introduced

Problems Definition
This paper discussed a special zoning problem for economic census application.The objective of the problem is to partition a region into some census zones for investigators.Every investigator takes charge of surveying the financial situation of all companies in one zone.The partitioned zones should be contiguous, compact, and balanced.The main constraints presented are as follows: (1) contiguity makes sure that units in every zone are geographically adjacent; (2) compactness requires total travelling time in one zone as small as possible; (3) balance ensures that the workloads of every investigator are as equal as possible.The workload mainly includes investigating time and travelling time.In our application, data sources are polygon data and line obstacle data.Polygon data stands for the shape of buildings, where companies located.Some polygons are touching, but some of them are not touching, while line obstacle data stands for fences or busy roads which cannot be passed by.Additionally, it is difficult for ordinary users to give some upper and lower limits on a scalar attribute (e.g., investigating time and travelling time), because they hope to use it as simple as possible.To solve problems in this paper, we have to find a way different from the traditional ones.
First problem is about how to generate auxiliary graph for these touching or nontouching building polygons.Most of literatures often use a point to represent a polygon and then generate auxiliary graph based on these simplified points.However, in most of cases, the auxiliary graph mentioned above may not truly represent buildings' connection and touching relationships, because sometimes a polygon with complex shape may distort the connection and touching relationships of neighbor polygons.The partitions based on simple point-representation may result in some errors.Although Chou and Li [28,29] and Ríos-Mercado and Fernández [27] discussed how to generate auxiliary graph directly based on polygons, they only considered touching polygons rather than nontouching polygons.Therefore, it is necessary to generate an auxiliary graph based on touching or nontouching polygons.In this paper, we used Constrained Delaunay Triangulation (CDT) to generate an Auxiliary Graph for nontouching Polygons (AGP).The shortest distance between discrete polygons in the auxiliary graph is related to the shapes of touching polygons, which would be described in Section 3.1.
The second problem is how to adjust AGP when line obstacles are involved.Novaes et al. [22] discussed some geometrical barriers problem, but they still focused on point data based on Voronoi diagram.Their method is not suitable for our polygon data based on CDT.In our application, geographical barriers may be polylines or polygons; they will affect the connection of building polygons.When a barrier crosses some edges of an auxiliary graph, these edges (paths) will be cut.The adjusted graph is called auxiliary graph for polygons and barriers (AGPB).
The third problem is how to speed up the zoning process.In fact, spatial partition is a classical NP-hard problem.Most of literatures adopt the multikernel growth approach.The approach firstly selects a certain number of basic nodes as "seeds" (centers) for some zones and then successively adds every other node to its neighboring center until all have been assigned [23].In the optimization phase, they need to constantly swap two adjacent nodes of zones in order to achieve optimized zones.When it comes to too many nodes, it is exhausted.It is feasible to consider at heuristics [6,8,30,31], such as Tabu Search [32,33] and simulated annealing [17,34,35].The problem of traditional solutions is that bad seeds selection would generate bad partitions initiation.That means they often took more time to get optimization partitions.The procedure usually spends a lot of time after 20,000∼80,000 iterations [7].In the paper, we use some heuristic strategies about spatial structure information to participate in the location of nodes and then avoided some unnecessary iterations.

A Heuristic Zone Design Approach Based on Constrained Delaunay Triangulation
There are two major phases in our method.Phase 1 is to construct an auxiliary graph for polygons and barriers and at last write them into XML files.Phase 2 is to use a spatial heuristic strategy to realize multikernel growth.The heuristic strategy continues to run as a simple optimization after all nodes are allocated.A flow chart of the general procedure is presented in Figure 1.Details of the steps are provided below.

Phase I: Auxiliary Graph Construction. The left part of
Figure 1 shows the flow chart of Phase I.The AGPB construction is an optional operation.Users can input polygon data and barriers data based on actual situations.Finally, an auxiliary graph, which reflects the connectivity between nodes, is written into XML files.need to be deleted.After the deletion of all the triangulations contained in the polygons, the polygons are attached in the graph as nodes.The relationship between adjacent nodes can be obtained by recursions.
In the third step above, the shortest paths between polygons are obtained by taking advantage of Delaunay Triangles.In fact, after the second step of CDT construction, triangles must connect two or three polygons.If three points of a triangle belong to three different polygons, its three edges are all likely to be the shortest paths between two polygons.Therefore, all edges in this type of triangles should be stored as candidates for the shortest path.If points of a triangulation belong to two polygons, it must be such a situation that a point belongs to a polygon and the other two points belong to another polygon.In those cases, the shortest path may not be the one with the edges existing in the triangle but rather the vertical distance from a point to a line.In Figure 2, we assume that there is a Delaunay Triangle connected with two polygons, and vertex (  ,   ) belongs to one triangle and vertexes (  ,   ) and (  ,   ) belong to the other one.Assume that the edge  stands for the shortest edge which is from vertex  to the straight line determined by vertexes  and .The location of vertex  has two situations: either location inside the edge  or location outside the edge .We assume that the slope of straight line  is .If point  moves from vertex  along -axis by  units, it will move along axis by  =   +  *  units.Straight line  is perpendicular to straight line  and we obtain the formula as follows: Equation ( 1) can be further rearranged as follows: If 0 ≤  ≤   −   , vertex  is located between vertex  and vertex  and the shortest path in the triangulation is the segment .
If  >   −   , vertex  is located in the extension line of segment  and the shortest path in the triangulation is the segment .
If  < 0, vertex  is located in the extension line of segment  and the shortest path in the triangulation is the segment .

Construction of AGPB.
Further, the existing graph should be adjusted when geographical barriers exist.After the construction of AGP, we continue to put the vertexes of geometrical barriers into the graph and construct AGPB.In the process above we should find out all the edges intersecting with counter lines of geometrical barriers.Assume that the edge with the start point  and end point  needs to be adjusted.According to the graph, the geometrical barriers associated with points  and  can be found out and the set of their vertexes is assumed to be noted as  aset and  aset , respectively.
Assume that the line with start point  and end point  is defined as Line⟨, , ⟩ and  stands for a point or a line or the collection of both.For example, the Line⟨, , ⟩ means points , , and  form a line; the line of Line⟨, Line⟨, , ⟩, ⟩ is composed of points , , , , and .Suppose O (defined as in (3)) is the set of barriers and is defined as follows where the integer  means the count of geometrical barriers.Consider ( = 1, . . ., ) represents a polygon or a line.Assume that the line with the minimum length in a set  of lines is noted as Line minlength {}.The shortest path function (noted as Iter Line(⋅)) of points  and  can be calculated as follows: Iter Line (, , ) = Line minlength {Line  ⟨, , ⟩ , Line  ⟨, , ⟩ , Line  ⟨, , ⟩ , Line  ⟨, , ⟩} .

(4)
As can be seen from the formula, the shortest path is calculated by considering four cases.Calculations of four cases are as follows: (1) The shortest path is composed of three points, which are point , point  from  aset , and point .Mathematical expression is as follows: And we set a set  as follows: (2) The shortest path is composed of three points, which are point , point  from  aset , and point .Mathematical expression is as follows: And we set a set  as follows: (3) The shortest path is composed of four points, which are point , point  from ( aset -), point  from ( aset -), and point .Mathematical expression is as follows: And we set a set  as follows: (4) An iterative process will be taken into account in this case.One point in the ( aset --) and one point in the ( aset --) are put into the function () and a set of lines (noted as sublines) can be obtained.Finally select the shortest one from sublines, put points  to  into the first and last position of the shortest subline, and the shortest path would be found out.The line on behalf of the shortest path is composed of not less than four points.Mathematical expression is as follows: The Antonio example is used here to illuminate the new method.Firstly the Delaunay Triangulation is created based on points , , , and .Add vertexes of barriers  1 and  2 to adjust the DT.The DT after adjustment is shown in Figure 3.According to DT, triangles  1 ,  2 ,  3 , and  4 are found out.So  aset = {, } and  aset = {, }, where , , , and  represent vertexes of barriers shown in the figure.Line ⟨, , ⟩, Line ⟨, , ⟩, and Line ⟨, , ⟩ do not exist, while Line ⟨, , ⟩ = Line⟨, , , ⟩.So the shortest path from point  to point  is  →  →  → .
Phase I mainly accomplishes operations for determining the spatial relations of data.It is relatively independent of Phase II.The graph obtained from Phase I can be stored and used at any time in Phase II with different parameters.Therefore, the design of two phases can save significant amount of time during graph construction.  = Euclidean distance between nodes  and .

Phase II: The Heuristic Multikernel Growth
= edge connecting nodes  and .
MST  = minimum spanning tree of the potential zone .
= weight of compactness.
The following decision and auxiliary variables are defined: The mathematical formulation is as follows:

Balance
Minimize (max { 1 , . . .,   , . . .,   } − min { 1 , . . .,   , . . .,   }) , where Compactness where The formulations above are drove by a specific zone problem in the real-world application.The problem is different from the traditional ones because we cannot give out the exact bounds for varieties or constraints.So we cannot completely use the methods of previous work or the existing optimization software platform (such as CPLEX or MOSEK) to solve the problem.In this work, multikernel growth is employed as a simple heuristic partition method.This method begins by selecting a certain number of basic nodes as "kernels" for the zones.The algorithm then successively adds, to each kernel, neighboring basic nodes by primarily considering the workload balance and compactness objectives.

A Heuristic Strategy.
The specific multiobjective problem is solved in this work by the heuristic multikernel growth method (AMKG for short) based on an auxiliary graph.The contiguity objective is assumed to be achieved as long as nodes are connected by edges in an auxiliary graph.The balance and compactness objectives are achieved by a heuristic strategy: the zone with the smallest workload is selected and allocated with a specific basic node which contains a relative large workload and is relatively close to the zones.Furthermore, the specific basic node is found out firstly from unallocated nodes around the zone otherwise, if it failed to be got, from allocated nodes around the zone.The specific basic node is selected by calculating the "cost" which is defined by (19).To achieve the workload balance and compactness objective, "kernels" are greedily allocated basic nodes with the largest "cost" via an iterative procedure.The "cost" of node  can be defined as follows: where  stands for a positive number of type double,  means the average distance value of the edges in graph network,  means the average value of workload, and We can see from (19) that the compactness objective can be generally achieved because   would be as small as possible in the basic nodes allocation when the largest "cost" is selected.
It shoud be noted that in the heuristic multikernel growth the allocating basic node may be an unallocated one but may be an already allocated one.The proposed algorithm is with a certain degree of self-regulation and not sensitive to the initial choice of kernels.If allocating nodes must be unallocated nodes and the initial kernels are baddly selected, some zones would have no new unallocated node allocated in the process of zones growth.The final result may make a serious deviation from the balance object.For example, Figure 4(a) shows that zone I with the minimun workload (which is 40) needs to continue to be allocated nodes, but no unallocated nodes can be allocated.The proposed heuristic multikernel growth makes the zones growth "greedy" and "competitive."The zones with minimum workload may "eat" the adjacent node which belonged to other zones.In Figure 4(a), zone I gets the node with workload of 12 from zone II.Zone II needs to be allocated nodes because its workload becomes minimum.In accordance with the heuristic described previously the nodes with workload of 14 and 10 were allocated to zone II.According to the heuristic rules, the procedure would be continued as in Figures 4(c) and 4(d) and the final results are shown in Figure 4(e).

A Simple Optimization.
The heuristic multikernel growth also has a simple and quick procedure of optimization for the result of nodes allocation after all nodes are allocated.The most classic optimization procedure for zoning problem may be the local search.It usually needs a huge time to run and it is hard to know when to stop the optimization procedure.The optimization procedure of the heuristic multikernel growth follows the same heurisic strategy as the nodes allocation.Its objective is to make the maximun workload of zones as small as possible (see (21)).So it is simple to be implemented and a little time comsuming: Minimize (max { 1 , . . .,   , . . .,   }) .zone centers.Kalcsics et al. [26] thought a commonly used method was to solve in each territory resulting from the last allocation phase a 1-median problem.But it would be very complex, because much iteration should be taken to avoid bad cases.In our method, the initial seeds should be nodes which contain the largest number of companies.Of course it may not be optimization to take the initial seeds as kernels.If it is not optimization, this method will use a heuristic algorithm to adjust kernels.Nodes allocation takes into account three criteria: contiguity, balance, and compactness.The contiguity objective is assumed to be achieved as long as nodes are connected by edges in the graph.The balance and compactness objectives are achieved step by step in the allocation procedure.The general allocation procedure of improved multikernel growth method is illustrated in Figure 5. Zone  with the minimum workload would be selected to "grow."The connected basic node  with the maximum cost(, ) will be selected and allocated to zone .The simple optimization is similar to nodes allocation.Both of them follow the same heuristic strategy, but the main iteration of simple optimization is that if the first node with maximum "cost" in the nodes sorted list enlarges the minimum workload of zones or reduces the maximum workload of zones, it should be switched; otherwise the next node with the second maximum "cost" would be selected and repeat that logic.The iterative procedure would be continued until the maximum workload of zones could not be smaller any more by the heuristic strategy.
Assuming that the number of graphs which kernels generate is noted as , the workload of the -sub graph (1 ≤  ≤ ) is noted as   .The main procedure of improved multikernel growth is shown as follows: (a) Take one graph, the workload of which is noted as   , and identify the number of kernels which is  where (b) Obtain  nodes with the largest number of companies as centers for zones, and take the number of companies as the initial workload of zones.
(c) Find out the zone (noted as ) with the minimum workload and find out its adjacent nodes via auxiliary subgraph firstly from unallocated nodes.If there are no unallocated nodes, the algorithm will find out its adjacent nodes from allocated ones; simultaneously calculate their cost by (19).
(d) Allocate the node with the maximum cost to the zone  and recalculate its workload   .
(e) If all the nodes in the subgraph are allocated, turn to step (c); otherwise continue.
(f) Take the zone with the minimum workload, and try to "eat" its adjacent node with the biggest "cost."If the maximum workload is not smaller or the zone whose basic unit is "eaten" becomes not continuous, another adjacent node with the next biggest "cost" would be tried.
(g) Continue step (f) until the maximum workload could not be smaller any more.
(h) Exit the procedure if the entire graph is handled; otherwise turn to step (a).

Computational Experiments and Results
To test the performance of the proposed solution procedure, a series of experiments were generated.All problem experiments were solved on a 2.53 GHz Pentium processor with 1.93 AGPB of RAM running with Windows XP as the operating system.Our procedure model is solved based on ArcEngine 9.3.A real-world application from an economic census whose operations consist of surveying the financial state of companies within a service region was performed in this work.
Test data were all obtained from the real data of an economic census in Beijing, China.We solved two different sizes of instances, which are classified by number of nodes and zones: Test one: 213 nodes and 3 zones.
Each instance has two phases.Phase I checked the topology of polygons and got touching polygons recording zero distance.Then AGP would be generated.After AGP was created, we continued to turn the graph into AGPB wherever geometrical barriers existed.In Phase II, the heuristic multikernel growth approach was performed.The nodes with a relatively largest workload were selected as initial kernels.Nodes allocation and simple optimization would be applied following proposed heuristic strategy.
In test one, polygons were input and AGP was built in Phase I.In Phase II, several experiments are repeated on different values of parameters  and .In Test two, geographical barriers are considered.Therefore, AGPB was built in Phase I. Several distances are solved based on different numbers of zones with the same values of  and .Further, a comparison test between considering barriers and nonconsidering barriers is given in Section 4.2.This paper uses the Standard Deviation of Workload Values of zones to evaluate the balance of workload (STDEV ) and uses the AVERAGE sum length of all MST edges in zones to evaluate the compactness of zones (AVERAGE ).A smaller value of STDEV  means a better balance result while a smaller value of AVERAGE  means a better compactness result.Finally, time cost of instances was discussed.4.1.Test One.The test data of test one is selected by a random rectangle from the economic census data.213 buildings are selected into the test data and 1189 companies are involved.This indicates that there are 213 nodes whose total workload is 1189, and the number of companies located in every building is shown in Figure 6(a).Two spatial auxiliary graphs based on polygon center points and polygons were, respectively, built In generation of the graph, triangles between polygons were initially created and then AGP was created consecutively.Figure 6(b) shows details of AGPB.Finally, we decided to generate three zones from the test data.
In the first step of Phase II, we selected three nodes with the largest number of companies as kernels.In the second step, we set different values to  and different results are obtained.In Figure 7, we thought travel time on road can be negligible relative to the investigation time in companies, so we set  = 0 and  = 10.The average value of workload is 396.33 (the number of companies actually) and that of three zones is 397, 396, and 396, respectively.This means that the result has got a good balance of workload for zoning.We also can find out that in Figure 7 continuity of zones has been held.Compactness of zones is used to control the total travel time in our application so it would be acceptable due to negligible travel time.
When  > 0, the workload content included the travel cost and the compactness objective would take into account the distance of edges between polygons by setting  = 10.We set different values for  and the numerical results are presented in Figure 8.The parameter  stands for the number of companies,  for the average length of MST, and  for the workload in zones.
In Figure 8, we note that, in general, values of STDEV  and AVERAGE  almost have no large change with the change of the value of .Two charts (see Figure 8) are generated from test results and support the assumption stated above.This indicates that the balance of workload and  the compactness objectives are affected not too much by the composition of workload content.In practical application, the appropriate value of  needs to be considered when considering workload content and we can set its value as a ratio of investigation time to travel time.With changes in the value of , standard deviation of workload is stable and is always approximately 0.6 in most cases.The average value of distances changes around 1250.This indicates that the proposed heuristic works and holds the balance of workload and compactness to a certain extent when  is specified and whatever value  is.
To test the effect of the parameter  in the heuristic multikernel growth, another twenty experiments were performed with twenty different values of  and two specific values of .The test result of experiments is shown as in Figure 9.It is noted that when the value of  is relatively small, the balance of workload is in general well held but the compactness is badly achieved.This means that  is a control to coordinate compactness and balance objective.From Figure 9 we can find out that a specific  corresponds to a relatively specific value of AVERAGE .So in the real application we usually specify  by considering the travel time limits or means of transportation.

Test Two.
Test data of large instances are selected by a random rectangle from the economic census data.There are 741 polygon nodes whose total workload is 7107 and the number of companies located in every building is shown in Figure 10(a).In addition, we considered some busy roads or fences through the test data as geographical barriers, which can be seen in Figure 10(b).Finally, we aimed at generating 3\4\5 zones from the test data.
To illustrate the impact of geometrical barriers to the zoning results, we have made comparison experiments without   barriers but with the same parameters  and .In Figures 11 and 12, we could find out that geometrical barriers had affected the zoning results.
The comparison between effects of zoning results also has been made.Set  = 0.08 and  = 100, and effects of balance and compactness between the 3\4\5 zones with barriers or no barriers are shown in Table 1.We can see that geometrical barriers have affected the zoning results mainly affecting the distribution of members in zones but having relatively little effect to zoning results.II.Therefore, once the graph is built up and stored in a file, there is no necessity to rebuild the auxiliary graph and the file only needs to be read prior to the procedure of the proposed heuristic strategy with different values of parameters  and .Therefore, the average computational time will be decreased greatly and can well meet our requirements of the special zoning problem in the economic census.

Conclusions and Possible Further Work
This paper addresses a special zoning problem for economic census that is motivated by a real-world application.Firstly, we introduced the Constrained Delaunay Triangulation to solve several problems such as polygon-based graph construction problem and geographical barriers problem which are generally countered in the zoning problem.The difficulty of this NP-complete problem motivated us to propose the heuristic multikernel growth following a heuristic strategy to speed up the zoning process greatly in the premise of the required quality of zoning.According to real-world instances, we present problem formulations to optimize three criteria: contiguity, compactness, and balance of workload among zones.Two special instances for economic census were performed, highlighting the applicability of this approach.They resulted in acceptable compact zones with considerable balance of workload.Test one showed the partition results of touching and nontouching polygons.Some experiments showed how to determine the values of  and  and the meanings of them; that is, the values of  determine workload composition while  determines balance and compactness object.Test one also showed that compactness would be better achieved if  is larger while balance object would become worse.Test two showed that our method could successfully solve zoning problem with barriers and still could achieve some reasonable results.Test three in Section 4.3 showed that our method could get optimized results within acceptable time cost.
The simple spatial heuristic approach in this paper makes a good performance in most of cases, but sometimes it maybe results in bad compactness objective.The reason is that the compactness objective is achieved by minimizing the maximum length of MST constructed in the zones when the kernels are growing without considering the shape of zones.In the future, we will do our efforts on the problem and maybe take into account the shape of zones for compactness.

Figure 3 :
Figure 3: Search the short path around barriers.

3. 2 . 1 .
Problem Formulation.Before describing the heuristic multikernel growth, it is necessary to formally specify the problem formulation of interest.The problem formulation can be specified from the follow three criteria: balance of workloads, compactness of zones, and contiguity of nodes.The following parameters are defined:  = node set. = edge set. = index of nodes. = index of nodes. = index of potential zones. = number of zones to plan.  = node .  = workload of the node .  = node set of the zone .

(21) 3 . 2 . 4 .
The Implementation.The improved multikernel growth method includes three steps: (1) seeds selection;(2) nodes allocation; (3) simple optimization.There exist several approaches for determining a new configuration of District with Min(W j ) is selected Node with Max(cost(i, j)) is selected
(a) Polygon data and workload e (b) Construct the AGP based on points (c) Construct the AGP based on polygons

Figure 6 :
Figure 6: Generation of the auxiliary graph.

Figure 8 :Figure 9 :
Figure 8: Standard deviation of workload and average length of MST.
Barrier edges Polygon nodes(a) Test data for large instance AGPB edges (b) Auxiliary graph for large instance

Figure 10 :
Figure 10: Test data and its auxiliary graph with geometrical barriers.
if node  blongs to the district 1, if node  and  are connected by an existing edge   0, otherwise.
Constraint (17)means a node belongs to only one zone while constraint (18) means a zone contains more than one node: )Contiguity.If a zone is an undirected graph whose nodes stand for buildings of the zone, an edge   exists between nodes  and  if and only if  ↔ = 1.A zone is contiguous if and only if the graph is connected (a path exists between every pair of adjacent nodes).A zone is feasible only if it is contiguous.

Table 1 :
Results of large instances.

Table 2 :
Time cost in different instances.