New Methods for the Construction of Test Cases for Partitioning Heuristics

Partitioning is an important problem in the design automation of integrated circuits. In many of its formulations this problem is NP-hard, and several heuristic methods have been proposed for its solution. To evaluate the effectiveness of the various partitioning heuristics, it is desirable to have test cases with known optimal solutions that are as “random looking” as possible. In this paper, we describe several methods for the construction of such test cases. All our methods except one use the theory of network flow. The remaining method uses a relationship between a partitioning problem and the geometric clustering problem. The latter problem can be solved in polynomial time in any fixed dimension.

Partitioning in several of its formulations is NP-hard [7]. Therefore, the various existing partitioning methods are heuristic in nature [8][9][10][11][12][13][14][15]. In analyzing the effectiveness of partitioning methods, it is useful to have test cases (inputs) for which an optimal solution is known. In this paper, we restrict our attention to one important formulation of the partitioning problem, which we will refer to as the minimum bisection problem (MBP). Informally, MBP seeks to partition a graph into two parts of about equal sizes such that the sum of the weights on the edges cut by the partition is minimized. We will describe four methods for the generation of test cases with known optimal solutions for MBP. Three of these methods use network flow theory [16], and the fourth method uses a relationship between MBP and the geometric clustering problem (GC) [13, 17]. The latter problem can be solved in polynomial time in any fixed dimension [18]. In all our methods we restrict ourselves to partitions of equal sizes.
The remainder of this paper consists of five more sections. Section 2 describes some preliminary notation and definitions. Section 3 reviews the Krishnamurthy-Mellema method for the generation of test cases for MBP. Section 4 describes three methods for the generation of test cases for MBP using network flow theory. Section 5 describes a method for the generation of test cases for MBP using a polynomial-time algorithm for GC. Finally, Section 6 concludes this paper.

2. PRELIMINARIES
We assume that the reader is familiar with the notions of a set, a multiset, and a graph. The number of elements of a set or a multiset $A$ is denoted by $|A|$. The sum of two multisets $A$ and $B$ is the multiset $A + B$ consisting of all elements of $A$ and all elements of $B$. Thus, $|A + B| = |A| + |B|$. Repeated elements of a multiset $A$ can be considered distinct by giving them distinct labels. In this case, the multiset can be considered a set. A mapping $f: A \to \mathbb{R}$ from a multiset $A$ (considered as a set) to the reals is a weighting function on $A$. Given $f$, the weight of an element $a \in A$ is $f(a)$, and the weight of a subset $B \subseteq A$ is $\sum_{b \in B} f(b)$. By a slight abuse of notation, we write $f(B)$ for the weight of $B \subseteq A$.

A graph $G(V, E)$ is vertex-weighted if there exists a weighting function $S$ defined on $V$. For $v \in V$, we say that $S(v)$ is the size of $v$. Similarly, for $B \subseteq V$, we say that $S(B)$ is the size of $B$. In this paper, the sizes of the vertices of a vertex-weighted graph are all assumed to be strictly positive. A graph $G(V, E)$ is edge-weighted if there exists a weighting function $W$ defined on $E$. In the graphs considered in this paper, repeated (parallel) edges between pairs of vertices are allowed, i.e., the set of edges $E$ is a multiset. However, for partitioning purposes, parallel edges can be replaced by a single edge having a weight equal to the sum of their weights. Self-loops do not play any role in partitioning problems, and therefore they can be ignored if they exist in a graph. A weighted graph is a graph which is both vertex-weighted and edge-weighted. Note that any graph can be considered a weighted graph by simply assigning unit weights to its vertices and edges.
Given a graph $G(V, E)$, we use $S(v)$ for the size of a vertex $v \in V$, and $W(e)$ for the weight of an edge $e \in E$. The sum of two graphs $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ is the graph $G(V, E)$ given by $V = V_1 \cup V_2$ and $E = E_1 + E_2$. In this case, we write $G = G_1 + G_2$. Clearly, for $l > 2$ the operation $G = G_1 + G_2 + \cdots + G_l$ is well-defined, since the operations $\cup$ and $+$ are both commutative and associative.

A partition of a graph $G(V, E)$ is a partition $(V_1, V_2)$ of the vertex set $V$, i.e., $V_1 \cap V_2 = \emptyset$, $V_1 \cup V_2 = V$, and both $V_1$ and $V_2$ are non-empty. If $S(V_1) = S(V_2)$, then we say that the partition $(V_1, V_2)$ is a bisection of $G$. An edge $uv$ is said to be cut by a partition $(A, B)$ of $G$ if $u \in A$ and $v \in B$. The cost of a partition $(A, B)$ of $G$ is the sum of the weights of the edges cut by $(A, B)$ and is denoted by $\mathrm{cost}(A, B)$. The minimum cut problem (MCP) seeks a minimum-cost partition of a graph, and the minimum bisection problem (MBP) seeks a minimum-cost bisection of a graph. Using network flow techniques [16], MCP can be solved efficiently in polynomial time. However, MBP is NP-hard [7].
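The definitions above can be made concrete with a small sketch. The representation used here (edges as `(u, v, weight)` triples, sizes as a dictionary) and the function names are illustrative assumptions, not notation from the paper.

```python
from collections import Counter


def cut_cost(edges, part_a):
    """Sum of the weights of the edges with exactly one endpoint in part_a.

    edges is a list (multiset) of (u, v, weight) triples; parallel edges
    and self-loops are allowed, and self-loops never contribute.
    """
    a = set(part_a)
    return sum(w for u, v, w in edges if (u in a) != (v in a))


def is_bisection(sizes, part_a):
    """True if part_a and its complement are non-empty and have equal size."""
    a = set(part_a)
    b = set(sizes) - a
    if not a or not b:
        return False
    return sum(sizes[v] for v in a) == sum(sizes[v] for v in b)


def merge_parallel_edges(edges):
    """Replace parallel edges by one edge of summed weight; drop self-loops."""
    merged = Counter()
    for u, v, w in edges:
        if u != v:
            merged[frozenset((u, v))] += w
    return [(*sorted(pair), w) for pair, w in merged.items()]


# Hypothetical example: two parallel edges between 1 and 2, a self-loop at 3.
edges = [(1, 2, 3), (1, 2, 2), (2, 3, 1), (3, 3, 7), (3, 4, 4), (4, 1, 1)]
sizes = {1: 1, 2: 1, 3: 1, 4: 1}
simple_edges = merge_parallel_edges(edges)
```

Merging parallel edges and dropping self-loops leaves the cost of every partition unchanged, which is why the text treats them as harmless for partitioning purposes.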

3. THE KRISHNAMURTHY-MELLEMA METHOD
This method appeared in [19, 20] for the case of a hypergraph. Here, we consider its application to the case of a graph. The description of this method here is slightly different from that in the original papers [19, 20]. However, the basic idea is still the same. We will first establish a general lemma which captures the underlying principle of the Krishnamurthy-Mellema method.
Lemma 1: Let $G(V, E) = G_1(V_1, E_1) + G_2(V_2, E_2) + \cdots + G_l(V_l, E_l)$ be the sum of $l \ge 2$ weighted graphs such that the sets $V_i$ are weighted by the same function $S: V \to \mathbb{R}$. Let $c$ be the minimum cost of any bisection of $G$, and let $c_i$, $1 \le i \le l$, be the minimum cost of any partition of $G_i$. If every bisection $(A, B)$ of $G$ is such that $(A \cap V_i, B \cap V_i)$ is a partition of $G_i$ for $1 \le i \le l$, then

$c \ge \sum_{i=1}^{l} c_i$. (*)

The Krishnamurthy-Mellema method implicitly uses Lemma 1. In this method a graph $G(V, E)$ is constructed as a sum of a number of other graphs $G_i(V_i, E_i)$, $1 \le i \le l$, such that $G$ satisfies the hypothesis of Lemma 1 and admits a bisection $(A, B)$ such that $(A \cap V_i, B \cap V_i)$ is a minimum-cost partition of $G_i$, $1 \le i \le l$. Thus, equality is achieved in (*) and $\mathrm{cost}(A, B)$ must be the minimum cost of any bisection of $G$. In the original Krishnamurthy-Mellema method only unit weights are used for the edges. However, this method can be extended to construct graphs with non-unit weights in a trivial way.
Here is an algorithmic description of the Krishnamurthy-Mellema method:
Step 1. Let $n$ be the desired number of vertices, and let $k$ be the desired optimal cost.
Step 2. Let $V$ be a set of $n$ distinct vertices. Consider any partition $(A, B)$ of $V$. Assign weights to the vertices such that $S(A) = S(B) = S(V)/2$. Set counter $i = 1$.
Step 3. Randomly choose a subset $V_i$ of $V$ such that $A_i = A \cap V_i \ne \emptyset$, $B_i = B \cap V_i \ne \emptyset$, and $S(V_i) > S(V)/2$. Set the edge set $E_i = \emptyset$.
Step 4. Randomly choose a vertex $a \in A_i$ and a vertex $b \in B_i$, and set $E_i = E_i \cup \{ab\}$.
Step 5. Add enough edges to $E_i$ so that $G_i(V_i, E_i)$ becomes connected. The edges added in this step are not cut by the partition $(A_i, B_i)$.
Step 6. If $i < k$ then increment $i$ and go to step 3.
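The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all vertex sizes and edge weights are 1, $n$ is even, the final graph is taken to be the sum $G = G_1 + \cdots + G_k$, and step 5 is realized with a random spanning tree inside $A_i$ and another inside $B_i$. The names `km_test_case`, `cut_cost`, and `brute_force_min_bisection` are hypothetical.

```python
import itertools
import random


def km_test_case(n, k, seed=None):
    """Build a multigraph on n unit-size vertices whose optimal bisection costs k.

    Returns (edges, A, B), where edges is a list (multiset) of vertex pairs
    and (A, B) is a bisection cutting exactly k edges.
    """
    if n < 4 or n % 2:
        raise ValueError("n must be even and at least 4")
    rng = random.Random(seed)
    vertices = list(range(n))
    rng.shuffle(vertices)
    # Step 2: with unit sizes, |A| = |B| gives S(A) = S(B) = S(V)/2.
    A, B = set(vertices[: n // 2]), set(vertices[n // 2:])
    edges = []
    for _ in range(k):
        # Step 3: |V_i| > n/2, so V_i necessarily meets both A and B.
        V_i = rng.sample(range(n), rng.randint(n // 2 + 1, n))
        A_i = [v for v in V_i if v in A]
        B_i = [v for v in V_i if v in B]
        # Step 4: the single edge of G_i that is cut by (A, B).
        edges.append((rng.choice(A_i), rng.choice(B_i)))
        # Step 5: a random spanning tree inside A_i and one inside B_i
        # connect G_i without adding any edge cut by (A_i, B_i).
        for side in (A_i, B_i):
            for j in range(1, len(side)):
                edges.append((side[j], side[rng.randrange(j)]))
    return edges, A, B


def cut_cost(edges, X):
    """Number of unit-weight edges with exactly one endpoint in X."""
    return sum(1 for u, v in edges if (u in X) != (v in X))


def brute_force_min_bisection(n, edges):
    """Exhaustive check, only feasible for very small n."""
    return min(cut_cost(edges, set(X))
               for X in itertools.combinations(range(n), n // 2))
```

For small instances the planted cost can be confirmed exhaustively, for example by comparing `brute_force_min_bisection(8, edges)` with `k` after calling `km_test_case(8, 3)`.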
Clearly, any bisection $(X, Y)$ of $G$ is such that $(X_i, Y_i) = (X \cap V_i, Y \cap V_i)$ is a partition of $G_i$ for $1 \le i \le k$, since $S(V_i) > S(V)/2$ for $1 \le i \le k$.
Also, the minimum cost of a partition of $G_i$ is 1 for $1 \le i \le k$. Therefore, by Lemma 1, the minimum cost of any bisection of $G$ is at least $\sum_{i=1}^{k} 1 = k$. But the bisection $(A, B)$ chosen by the Krishnamurthy-Mellema method has a cost equal to $k$, since only the $k$ edges chosen in step 4 are cut by $(A, B)$.
Consequently, k must be the minimum cost of any bisection of G.
The above method is straightforward. However, it has two disadvantages. First, the number of vertices in the set $V_i$ chosen in step 3 is $\Omega(n)$ under moderate assumptions. Hence, $\Omega(n)$ edges are added in step 5 to guarantee the connectedness of $G_i$. Consequently, the number of edges in the resulting graph $G$ is $\Omega(nk)$ and is therefore proportional to $k$. This is not desirable, since in a random-looking graph no clear relationship should exist between the optimal bisection cost and the number of edges. The number of edges in $G$ can be reduced by replacing every $p$ parallel edges by a single edge of weight $p$. However, even in this case, the graph is still less random, since we know that with very high probability $G$ admits an optimal bisection which cuts only unit-weight edges. Thus, if an edge $e = uv$ has weight greater than 1, then $u$ and $v$ can be forced to be in the same part while keeping a high probability of finding an optimal bisection. Thus, a special algorithm can be developed to find an optimal bisection in $G$.

The second disadvantage of the Krishnamurthy-Mellema method is that it does not handle edges of different weights. The obvious extension of this method to handle different weights is to give all the edges chosen in steps 4 and 5 during iteration $i$ the same weight $a_i$, $1 \le i \le k$. Then, the cost of an optimal bisection of $G$ becomes $\sum_{i=1}^{k} a_i$ rather than $k$. But again $G$ has a non-random structure, since the edge set of $G$ can be partitioned into $k$ disjoint parts $E_1, E_2, \ldots, E_k$ such that the weight of any edge in $E_i$ is $a_i$ for $1 \le i \le k$.

4. METHODS BASED ON NETWORK FLOW THEORY
The methods in this section rely on the fact that MCP can be solved in polynomial time using network flow techniques [16]. The input to all the methods in this section consists of an integer $n$. The number of vertices of the generated graphs will be $O(n)$.
Method NF1:
Step 1. Let $G(V, E)$ be a random edge-weighted graph on $n$ vertices.
Step 2. Find a partition $(A, B)$ of $G$ of minimum cost $c$.
Step 3. Assign weights to the vertices of $G$ such that $S(A) = S(B) = S(V)/2$.
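The three steps of Method NF1 can be sketched as follows. Two choices in this sketch are assumptions that do not come from the paper: the minimum-cost partition in step 2 is computed with the Stoer-Wagner global minimum cut algorithm (a compact substitute for the network flow techniques of [16], valid for non-negative weights), and step 3 gives every vertex of $A$ the size $|B|$ and every vertex of $B$ the size $|A|$, which is one simple way to obtain $S(A) = S(B)$. The function names and the `density` and `max_weight` parameters are hypothetical.

```python
import random


def stoer_wagner(n, w):
    """Global minimum cut of an undirected graph with non-negative weights.

    w is a symmetric n x n weight matrix with a zero diagonal.
    Returns (cost, side), where side lists the vertices on one side.
    """
    w = [row[:] for row in w]
    groups = [[v] for v in range(n)]  # original vertices merged into each node
    active = list(range(n))
    best_cost, best_side = float("inf"), None
    while len(active) > 1:
        # Maximum adjacency ordering of the currently active nodes.
        attach = {v: 0 for v in active}
        remaining = set(active)
        order = []
        while remaining:
            nxt = max(remaining, key=lambda v: attach[v])
            remaining.remove(nxt)
            order.append(nxt)
            for v in remaining:
                attach[v] += w[nxt][v]
        s, t = order[-2], order[-1]
        # attach[t] is the weight of the cut separating t from the rest.
        if attach[t] < best_cost:
            best_cost, best_side = attach[t], groups[t][:]
        # Merge t into s.
        groups[s].extend(groups[t])
        for v in active:
            if v != s and v != t:
                w[s][v] += w[t][v]
                w[v][s] = w[s][v]
        active.remove(t)
    return best_cost, best_side


def cut_cost(w, X):
    """Total weight of the edges between X and its complement."""
    return sum(w[u][v] for u in X for v in range(len(w)) if v not in X)


def method_nf1(n, seed=None, density=0.5, max_weight=9):
    """Return (w, sizes, A, B, c) where (A, B) is an optimal bisection of cost c."""
    rng = random.Random(seed)
    # Step 1: random edge-weighted graph as a symmetric weight matrix.
    w = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < density:
                w[u][v] = w[v][u] = rng.randint(1, max_weight)
    # Step 2: minimum-cost partition (A, B).
    c, side = stoer_wagner(n, w)
    A = set(side)
    B = set(range(n)) - A
    # Step 3: S(A) = |A| * |B| = S(B), with strictly positive sizes.
    sizes = {v: (len(B) if v in A else len(A)) for v in range(n)}
    return w, sizes, A, B, c
```

The size assignment makes the drawback discussed after Lemma 2 visible: when $|A|$ and $|B|$ differ greatly, the vertices on the smaller side receive much larger sizes.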
Lemma 2: The minimum cost of any bisection of the graph G generated by Method NF1 is c.
Proof: It suffices to show that $(A, B)$ is a bisection of minimum cost. By step 3, $(A, B)$ is a bisection of $G$. By step 2, $\mathrm{cost}(A, B) = c$ and $c$ is the minimum cost of any partition of $G$. Since every bisection is a partition, no bisection of $G$ can cost less than $c$. □

Method NF1 may be disadvantageous, since the cardinality of one of the subsets $A$ or $B$ may be much less than the cardinality of the other subset. Therefore, to enforce $S(A) = S(B)$ in step 3 of Method NF1, much larger weights must be assigned to the vertices of the subset of smaller cardinality. The next two methods avoid the disadvantage of Method NF1.
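Method NF2:
Step 1. Let $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ be two random edge-weighted graphs with disjoint vertex sets.
Step 2. Find a partition $(A_1, B_1)$ of $G_1$ of minimum cost $c_1$.
Step 3. Find a partition $(A_2, B_2)$ of $G_2$ of minimum cost $c_2$.
Step 4. Let $G = G_1 + G_2$.
Step 5. Assume without loss of generality that $|A_1| \ge |B_1|$ and $|A_2| \le |B_2|$. Randomly add enough edges to $G$ between vertices of $A_1$ and $A_2$ and enough edges between vertices of $B_1$ and $B_2$, until the partition $(V_1, V_2)$ in $G$ has cost $c_1 + c_2$.
Step 6. Let $A = A_1 \cup A_2$ and $B = B_1 \cup B_2$. Assign weights to the vertices of $G$ so that $S(A) = S(B) = S(V_1) = S(V_2) = S(V)/2$. This is always possible (e.g., $S(A_1) = S(B_2) = x$ and $S(A_2) = S(B_1) = y - x$, where $0 < x < y$).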
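The size assignment suggested in step 6 can be checked with a short worked example. The sketch below is illustrative only: it assumes that the total size of each of the four parts is split evenly among its vertices (the paper does not say how sizes are distributed within a part), and the function name `nf2_sizes` and the sample vertex labels are hypothetical. Exact rational arithmetic keeps the equalities exact.

```python
from fractions import Fraction


def nf2_sizes(A1, B1, A2, B2, x=1, y=3):
    """Vertex sizes with S(A1) = S(B2) = x and S(A2) = S(B1) = y - x.

    Each part's total is divided evenly among its vertices, so every
    vertex size is strictly positive whenever 0 < x < y.
    """
    if not 0 < x < y:
        raise ValueError("need 0 < x < y")
    targets = [(A1, Fraction(x)), (B2, Fraction(x)),
               (A2, Fraction(y - x)), (B1, Fraction(y - x))]
    sizes = {}
    for part, total in targets:
        for v in part:
            sizes[v] = total / len(part)
    return sizes


def size_of(sizes, part):
    """S(part): the sum of the sizes of the vertices in part."""
    return sum(sizes[v] for v in part)


# Hypothetical parts with |A1| >= |B1| and |A2| <= |B2|.
A1, B1 = {"a1", "a2", "a3"}, {"b1"}
A2, B2 = {"c1"}, {"d1", "d2"}
sizes = nf2_sizes(A1, B1, A2, B2, x=1, y=3)
```

With these values $S(A) = S(A_1) + S(A_2) = x + (y - x) = y$, and the same sum is obtained for $B$, $V_1$, and $V_2$, so each equals $S(V)/2$.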
Lemma 3: $c_1 + c_2$ is the minimum cost of any bisection of the graph $G$ generated by Method NF2.
Proof: It suffices to show that $(A, B)$ is a minimum-cost bisection of $G$. By step 6, $S(A) = S(B) = S(V)/2$, and hence $(A, B)$ is a bisection of $G$. None of the edges added in step 5 is cut by $(A, B)$. Therefore, by steps 2 and 3, $\mathrm{cost}(A, B) = c_1 + c_2$.
Step 2. For every edge $uv$ in $G_1$, put a corresponding edge $f(u)f(v)$ in $G_2$ and assign to it the same weight as $uv$ in $G_1$.
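By performing the above operation, we have actually created a subgraph of $G_2$ which is isomorphic to $G_1$. We say that we have embedded $G_1$ into $G_2$ by $f$. The next method uses embedding operations to find a graph with a known cost for the optimal bisection.

Method NF3:
Step 1. Let $G_1(V_1, E_1)$ and $G_2(V_2, E_2)$ be two random edge-weighted graphs on $n$ vertices each.
Step 2. Find a partition $(A_1, B_1)$ of $G_1$ of minimum cost $c_1$.
Step 3. Find a partition $(A_2, B_2)$ of $G_2$ of minimum cost $c_2$.
Step 4. Assume without loss of generality that $|A_1| \ge |B_1|$ and $|A_2| \le |B_2|$. Consider an empty graph $G(V, \emptyset)$ such that $|V| = |A_1| + |B_2|$. Let $(A, B)$ be a partition of $G$ such that $|A| = |A_1|$ and $|B| = |B_2|$.
Step 5. Embed $G_1$ in $G$ by an injective mapping $f_1: V_1 \to V$ such that $f_1(A_1) = A$ and $f_1(B_1) \subseteq B$.
Step 6. Embed $G_2$ in $G$ by an injective mapping $f_2: V_2 \to V$ such that $f_2(B_2) = B$ and $f_2(A_2) \subseteq A$.
Step 7. Assign weights to the vertices in $V$ such that $S(A) = S(B) = S(V)/2$.

Lemma 4: $c_1 + c_2$ is the minimum cost of any bisection of the graph $G$ generated by Method NF3.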
Proof: By steps 5 and 6, we have $\mathrm{cost}(A, B) = c_1 + c_2$. Let $X = f_1(V_1)$ and let $Y = f_2(V_2)$. By steps 5 and 7, $S(X) = S(A) + S(f_1(B_1)) > S(A) = S(V)/2$. Similarly, by steps 6 and 7, $S(Y) = S(B) + S(f_2(A_2)) > S(B) = S(V)/2$. Thus in any bisection $(A', B')$ of $G$, we must have $X_1 = X \cap A' \ne \emptyset$, $X_2 = X \cap B' \ne \emptyset$, $Y_1 = Y \cap A' \ne \emptyset$, and $Y_2 = Y \cap B' \ne \emptyset$. Also, $(X_1, X_2)$ is a partition of the subgraph of $G$ that is isomorphic to $G_1$ by $f_1$, and $(Y_1, Y_2)$ is a partition of the subgraph of $G$ that is isomorphic to $G_2$ by $f_2$. Therefore, by steps 2 and 3, $\mathrm{cost}(X_1, X_2) \ge c_1$ as an isomorphic partition of $G_1$ (i.e., here we only sum the weights of the edges of the subgraph that is isomorphic to $G_1$ in $G$), and $\mathrm{cost}(Y_1, Y_2) \ge c_2$ as an isomorphic partition of $G_2$. Consequently, since the edge sets of the two subgraphs of $G$ that are isomorphic to $G_1$ and $G_2$ are disjoint by construction, $\mathrm{cost}(A', B') \ge c_1 + c_2$. □

5. A METHOD BASED ON GEOMETRIC CLUSTERING
In this section, we describe a method for the generation of test cases for MBP using an algorithm for the solution of the geometric clustering problem (GC). The relationship between MBP and GC is due to Frankle and Karp [13]. The method described here relies on an algorithm for GC which runs in polynomial time in any fixed dimension. This algorithm is due to Montgomery-Smith and Saab [18], and a similar algorithm was independently discovered by Arun and Rao [21] at about the same time. The theorem which is the basis of the algorithm by Montgomery-Smith and Saab allows the generated graphs to have only unit sizes for the vertices, and it does not allow for parallel edges.
It is useful here to review the matrix representation of simple graphs, where a simple graph is one that does not admit parallel edges.
The connection matrix $C = (c_{ij})$ of a simple graph is defined by $c_{ij} = 0$ if there exists no edge between vertices $i$ and $j$. Otherwise, $c_{ij}$ is equal to the weight of the edge $ij$. For MBP, self-loops do not matter, so they can be ignored if they exist in the graph.
Given an even number $n$ of points (vectors) in $d$-dimensional Euclidean space, GC seeks to find a partition of these $n$ points into two sets $S_1$ and $S_2$ of equal cardinality such that the Euclidean distance between the centroids of the points in $S_1$ and $S_2$ is maximized, where the centroid of a set of points $S$ is given by $(1/|S|) \sum_{p \in S} p$. In the sequel, a partition of a set $S$ of $n$ points into two sets of equal cardinality will be referred to as a bisection of $S$. The $n$ points of an instance $S$ of GC can be conveniently represented as the $n$ columns of a $d \times n$ matrix $B$. Let $C = B^T B$ be considered as the connection matrix of a graph $G(V, E)$ on $n$ vertices. We can consider an integer $i$ to be either the $i$-th vertex in $G$ or the $i$-th point in the set $S$. Therefore, there exists a one-to-one correspondence between the bisections of $S$ and the bisections of $G$. In fact, we can consider a partition $(X, Y)$ of the set $\{1, 2, \ldots, n\}$ into two parts $X$ and $Y$ of equal cardinality as a bisection of $G$ or a bisection of $S$. The cost of a bisection $(X, Y)$ of $S$ is defined to be the negative of the Euclidean distance between the centroids of the points in $X$ and $Y$.

Theorem 1: Let $B$ be a $d \times n$ matrix representing an even number $n$ of points of a set $S$. Let $C = B^T B$ be considered as the connection matrix of a graph $G(V, E)$. Let $c_1$ and $c_2$ be the costs of a bisection $(X, Y)$ in $G$ and $S$ respectively. Then, there exist two constants $a$ and $b$ independent of the bisection $(X, Y)$ such that

$c_2 = a c_1 + b$. (**)

The relationship between MBP and GC given by Theorem 1 was first noted by Frankle and Karp [13]. A proof of Theorem 1 can also be found in [17]. As an immediate consequence of Theorem 1, a bisection $(X, Y)$ is an optimal solution of GC if and only if it is an optimal solution of the corresponding instance of MBP. Thus, if we can solve GC in polynomial time, then we can construct test cases for MBP with known optimal cost in polynomial time as follows:
Step 1. Generate a random $d \times n$ matrix $B$, where $n$ is even.
Step 2. Solve GC using the columns of $B$ as points in $d$-dimensional Euclidean space. Let $c_2$ be the cost of the optimal solution for GC.
Step 3. Construct the graph $G$ given by its connection matrix $C = B^T B$. The cost of an optimal bisection in $G$ is given by (**).
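The three steps can be tried out on small instances with the sketch below, which rests on several assumptions that are not from the paper. GC is solved by exhaustive enumeration rather than by the polynomial-time algorithm of [18], so it is only usable for very small $n$. The points have integer coordinates so that all arithmetic is exact. The affine relation is checked for the squared centroid distance, for which a direct expansion gives $(n/2)^2 \cdot \mathrm{dist}^2 = \sum_{i,j} c_{ij} - 4\,\mathrm{cost}(X, Y)$; because squaring is monotone on non-negative numbers, the optimal bisections are the same whether the distance or its square is used. All function names are hypothetical.

```python
import itertools
import random
from fractions import Fraction


def gram_matrix(points):
    """C = B^T B, where the points are the columns of B: c_ij = <p_i, p_j>."""
    n = len(points)
    return [[sum(a * b for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]


def cut_cost(C, X):
    """Sum of c_ij over i in X and j outside X (diagonal entries never count)."""
    return sum(C[i][j] for i in X for j in range(len(C)) if j not in X)


def centroid_dist_sq(points, X):
    """Squared distance between the centroids of X and its complement."""
    n, d = len(points), len(points[0])
    diff = [sum(p[t] if i in X else -p[t] for i, p in enumerate(points))
            for t in range(d)]
    return Fraction(sum(x * x for x in diff), (n // 2) ** 2)


def gc_test_case(n, d, seed=None, coord_range=5):
    """Return (points, C, X, cost): X is an optimal bisection side, cost its cut."""
    rng = random.Random(seed)
    # Step 1: random points, i.e. the columns of a random d x n matrix B.
    points = [tuple(rng.randint(-coord_range, coord_range) for _ in range(d))
              for _ in range(n)]
    # Step 2: brute-force GC; fixing vertex 0 in X avoids counting (X, Y) twice.
    sides = [frozenset(X) for X in itertools.combinations(range(n), n // 2)
             if 0 in X]
    best = max(sides, key=lambda X: centroid_dist_sq(points, X))
    # Step 3: the graph is given by its connection matrix C = B^T B.
    C = gram_matrix(points)
    return points, C, best, cut_cost(C, best)
```

Note that the entries of $B^T B$ are inner products and can be negative, so the graphs produced this way may carry negative edge weights.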
Let $(X, Y)$ be a partition of a set $S$ of points in a $d$-dimensional Euclidean space $E$, and let $H$ be a hyperplane in $E$. We say that $H$ separates $(X, Y)$ if the set $S$ can be split into three subsets $L$, $R$, and $M$, where $L$ and $R$ lie on either side of $H$, $M \subseteq H$, $L \subseteq X$, and $R \subseteq Y$. Furthermore, if $x, y \in M$ implies $x = y$, then we say that $(X, Y)$ is well separated by $H$. The following two theorems, which are proved in [18], form the basis for a polynomial-time algorithm for GC in any fixed dimension.

Theorem 2: Let $S$ be a set of an even number of points in a Euclidean space $E$. If $(X, Y)$ is an optimal bisection of $S$, then $(X, Y)$ is well separated.

Theorem 3: Let $S$ be a set of an even number of points in a Euclidean space $E$ of dimension $d$ such that the affine hull of $S$ is $E$, and suppose that $(X, Y)$ is a well separated partition of $S$. Then, there exists a sequence of affine subspaces $E = H_d \supseteq H_{d-1} \supseteq \cdots \supseteq H_1 \supseteq H_0$, where
a) $\dim(H_k) = k$, $0 \le k \le d$.
b) For $1 \le k \le d$, $H_k$ is the affine hull of $k + 1$ affinely independent points in $S$.
c) $X = \bigcup_{k=0}^{d} L_k$ and $Y = \bigcup_{k=0}^{d} R_k$, where, if $k > 0$, then $L_k$ and $R_k$ lie on either side of $H_{k-1}$ in $H_k$, and $L_0$ and $R_0$ contain at most the single point (possibly repeated) that is in $H_0$.

Theorem 2 says that an optimal bisection must be well separated, and Theorem 3 gives us a procedure for enumerating the well-separated partitions, among which an optimal bisection exists. Thus, by going through all possible sequences of hyperplanes $H_d, H_{d-1}, \ldots, H_0$, one is bound to find an optimal solution for GC. In fact, this is exactly the algorithm of Montgomery-Smith and Saab. A straightforward analysis shows that this algorithm runs in $O(n^{d^2})$ time on $n$ points in a $d$-dimensional space. However, for points in general position, this algorithm can be expected to run in $O(n^{d})$ time. Nevertheless, this algorithm remains highly exponential in the dimension $d$. Therefore, the method for constructing test cases for MBP using this algorithm for solving GC is only practical for small values of $d$. However, the test cases for MBP generated by the method of this section are expected to be highly random.
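The simplest case, $d = 1$, gives a feel for what well separated means. On a line, a separating hyperplane is a single point, so a well-separated bisection puts the $n/2$ smallest values on one side and the $n/2$ largest on the other. The sketch below covers only this one-dimensional special case and is not the general algorithm of [18]; the function names and the sample values are hypothetical.

```python
import itertools
from fractions import Fraction


def bisect_points_on_line(values):
    """Split the indices of an even number of reals into lower and upper halves."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    half = len(values) // 2
    return set(order[:half]), set(order[half:])


def centroid_gap(values, X):
    """Distance between the centroid of X and the centroid of its complement."""
    half = len(values) // 2
    sum_x = sum(values[i] for i in X)
    # sum(Y) - sum(X) = total - 2 * sum(X), and both parts have `half` points.
    return Fraction(abs(sum(values) - 2 * sum_x), half)


def brute_force_gap(values):
    """Largest centroid gap over all bisections; only feasible for small inputs."""
    n = len(values)
    return max(centroid_gap(values, X)
               for X in itertools.combinations(range(n), n // 2))


# Hypothetical instance with a repeated value.
values = [5, -2, 7, 0, 3, 3]
X, Y = bisect_points_on_line(values)
```

Sorting costs $O(n \log n)$, which shows how mild the problem is for $d = 1$; the cost of the enumeration grows quickly as $d$ increases.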

6. CONCLUSION
In this paper, we have presented several methods for the generation of test cases for MBP for which the optimal cost is known. All the methods except the last one are based on the fact that MCP can be solved efficiently using network flow techniques. The last method uses a relationship between MBP and GC, and it relies on a polynomial-time algorithm for GC in any fixed dimension. The methods described