Area Optimization of Slicing Floorplans in Parallel

We first present a parallel algorithm for finding the optimal implementations for the modules of a slicing floorplan that respects a given slicing tree. The algorithm runs in O(n) time and requires O(n) processors, where n is the number of modules. It is based on a new O(n^2) sequential algorithm for solving the above problem. We then present a parallel algorithm for finding a set of optimal implementations for a slicing floorplan whose corresponding slicing tree has height O(log n). This algorithm runs in O(n) time using O(log n) processors. Our parallel algorithms do not need shared memory and can be implemented in a distributed system.


INTRODUCTION
Floorplan design is the first task in VLSI layout and perhaps the most important one. It is the problem of allocating space to a set of modules on the chip in order to minimize the area of the chip.
A chip is a floor rectangle with additional information about the relative positions of basic modules (circuits) such as registers, ALU, etc. The target of floorplanning is to partition the floor rectangle into smaller ones, called basic rectangles, and embed the basic modules into these small rectangles while preserving the relative positions of the modules [7,8]. A module is called rigid if its dimensions are given; otherwise it is called flexible. In this paper we assume that all modules are flexible. This flexibility allows the designer to manipulate the structure of the modules during floorplanning. For each module we are given a list of pairs (height, width), called implementations.
Given the relative positions of the modules of a chip, we wish to find the best implementations of these modules, sometimes called cells, in order to minimize the total layout area of the chip. Notice that the objective function (layout area) to be minimized is nondecreasing. This problem has received considerable attention recently [1-2, 4-10].
*Research supported in part by the Texas Advanced Research Program under Grant No. 3972.
A floorplan is a partition of the floor rectangle using vertical and horizontal line segments called slices. A floorplan is slicing if it is either a basic rectangle or there is a slice that partitions the enclosing rectangle into two slicing floorplans, see Figure 1. There are two ways to represent a slicing floorplan: (a) using series-parallel graphs [4], and (b) using a slicing tree [6]. A slicing tree T is a rooted binary tree that gives the natural hierarchical description of a slicing floorplan. Each nonleaf node of T is labeled either "H" or "V", specifying whether the corresponding slice is horizontal or vertical. Each leaf corresponds to a basic rectangle. In general, there are many slicing trees that describe a given slicing floorplan. Notice that a slicing tree with n leaves has 2n - 1 nodes.
If each basic rectangle has c implementations, where c is a constant, then there are O(c^n) possible sets of implementations for the floor rectangle.
Stockmeyer [6] presented an algorithm for finding a set of optimal implementations of the cells of a slicing floorplan that requires O(n^2) time in the worst case. In fact, the algorithm runs in time O(nl), where l is the height of the slicing tree. At the beginning, each leaf of the slicing tree has two pairs, corresponding to the two possible implementations of the basic rectangle. At each node, the lists of the children are merged in order to produce a new list containing all the implementations of the basic rectangles in the subtree that could give minimum total area. If the node is labeled "H" (resp. "V") then the lists of its children have to be sorted in decreasing order of widths (resp. heights), and the generated list (of the parent node) is also sorted in decreasing order of widths (resp. heights). Therefore, if the label of a parent node is not compatible with the labels of its children, the lists of the children will not have the order that the parent node requires. In this case, the algorithm has to reverse them. This fact makes the algorithm hard to parallelize.
In this paper, we assume that each basic rectangle has c implementations, where c is a constant. We first present a new sequential algorithm that runs in O(n^2) time and eliminates the need for reversing the lists when they are incompatible; therefore it is easily parallelizable. The main idea of the parallel algorithm is to use a processor for each nonleaf node of the slicing tree and propagate the pairs of the lists from the children to the parents in a pipeline fashion.
The algorithm runs in O(n) time and requires O(n) processors (i.e., we achieve optimal speedup). However, there are slicing floorplans whose corresponding slicing trees have height O(log n). Such slicing floorplans are rather important, because in practice the slicing floorplans are obtained by recursively using circuit bipartitioning techniques. Hence, in most practical applications, the height of the slicing trees is O(log n). In the second part of this paper, we present a parallel algorithm to compute a set of optimal implementations (one for each basic rectangle) of such a floorplan in O(n) time, using O(log n) processors. Moreover, our parallel algorithms do not need shared memory and can be implemented in a distributed system.

SLICING TREES AND FLOORPLANNING
A vertical slice is a vertical line segment that is enclosed in some rectangle and cuts the rectangle into two smaller ones. Similarly, a horizontal slice is a horizontal line segment that cuts the rectangle into two smaller ones. For each basic rectangle, there is a list of possible implementations of the form {(h_1, w_1), (h_2, w_2), ..., (h_m, w_m)}, where h_i is the height and w_i is the width of the rectangle. As discussed above, every node of T corresponds to a rectangle, and there is a list of pairs associated with each node. These pairs are ordered by their heights (resp. widths) either in decreasing or in increasing order. If they are ordered in decreasing order of height (resp. width) then we say that height (resp. width) is the major-value; otherwise, if they are ordered in increasing order of height (resp. width), then we say that height (resp. width) is the minor-value. We will see later that if a list has height (resp. width) as its major- (resp. minor-) value, then it has width (resp. height) as its minor- (resp. major-) value.
Procedure M_V merges a pair of two neighboring rectangles separated by a vertical slice into a larger rectangle. The new height is the greater height of the two and the new width is the sum of the widths. Suppose (h1, w1) and (h2, w2) denote implementations of the two rectangles. The new rectangle is (max{h1, h2}, w1 + w2). Procedure M_H is defined similarly. If the slice is horizontal, the new rectangle is (h1 + h2, max{w1, w2}). Figure 2 shows examples of running Procedures M_V and M_H.
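The two combination rules above can be written down directly. The following is a minimal Python sketch; the function names `m_v` and `m_h` and the `Pair` alias are illustrative choices, and pairs are assumed to be (height, width) tuples as in the text.

```python
from typing import Tuple

Pair = Tuple[int, int]  # (height, width); this alias is an illustrative assumption


def m_v(a: Pair, b: Pair) -> Pair:
    """Combine two rectangles separated by a vertical slice (side by side)."""
    (h1, w1), (h2, w2) = a, b
    return (max(h1, h2), w1 + w2)


def m_h(a: Pair, b: Pair) -> Pair:
    """Combine two rectangles separated by a horizontal slice (stacked)."""
    (h1, w1), (h2, w2) = a, b
    return (h1 + h2, max(w1, w2))


if __name__ == "__main__":
    print(m_v((3, 2), (5, 4)))  # (5, 6)
    print(m_h((3, 2), (5, 4)))  # (8, 4)
```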
Let (h, w) be a pair in list L. We say that (h, w) is useless if there is another pair (h', w') in L such that h ≥ h' and w ≥ w'; if there is no such (h', w') in L then (h, w) is called useful.
Lemma 1 Let L be the list of node u of a slicing tree T that contains all the useful pairs and no useless pairs. The major-value of L is height if and only if the minor-value of L is width.
Proof: (⇒) Suppose that the major-value of L is height but the minor-value of L is not width. Let (h_j, w_j) be the first pair of L that violates the increasing order of width. So there is a pair (h_i, w_i) in L such that h_i > h_j and w_i > w_j. This implies that (h_i, w_i) is useless, a contradiction. (⇐) Similar to the previous proof.
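To make the definition of useful pairs and the statement of Lemma 1 concrete, here is a small Python sketch. The helper name `useful_pairs` is hypothetical; it simply filters out useless pairs by one sorted sweep and returns the survivors in decreasing order of height.

```python
def useful_pairs(pairs):
    """Return the useful pairs of `pairs`, ordered by decreasing height.

    A pair (h, w) is useless if another pair (h2, w2) has h2 <= h and w2 <= w.
    """
    kept = []
    best_w = float("inf")
    # Visit pairs by increasing height (ties: increasing width). A pair is
    # useful exactly when it is strictly narrower than every pair seen so far.
    for h, w in sorted(set(pairs)):
        if w < best_w:
            kept.append((h, w))
            best_w = w
    kept.reverse()  # decreasing height, i.e. height is the major-value
    return kept


if __name__ == "__main__":
    result = useful_pairs([(5, 1), (4, 3), (4, 2), (3, 3), (2, 6), (1, 6)])
    print(result)  # [(5, 1), (4, 2), (3, 3), (1, 6)]
```

In the printed list the heights decrease while the widths increase, which is the relationship Lemma 1 states for a list containing only useful pairs.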
Let u be a node in T with children u1 and u2.
Procedure Merge_V
/* (h_i, w_i) is the i-th of the m pairs of L1; (h'_j, w'_j) is the j-th of the k pairs of L2 */
(1) i := 1; j := 1;
(2) while i ≤ m and j ≤ k do
(3) begin
(4)    M_V((h_i, w_i), (h'_j, w'_j));
(5)    if h_i > h'_j then i := i + 1
(6)    else if h_i = h'_j then begin
(7)       i := i + 1;
(8)       j := j + 1;
(9)    end
(10)   else j := j + 1
(11) end;
Procedure Merge_H is symmetric to Procedure Merge_V. It merges all the useful implementations of two rectangles separated by a horizontal slice and obtains all the useful implementations of the enclosing rectangle. The parent node is labeled "H" and the major-value of L1 and L2 is width. These two procedures are variations of the ones described in [6] and are included here for completeness.
Lemma 2 Let u be a node of a slicing tree T with children u1 and u2. Assume that L1 and L2 are the lists of u1 and u2, respectively. If u is labeled "V" (resp. "H") and the major-value of L1 and L2 is height (resp. width), then the list L of u generated by Procedure Merge_V (resp. Merge_H) has height (resp. width) as its major-value and contains all the useful pairs and no useless pairs. Furthermore, both procedures run in O(|L1| + |L2|) time and produce at most |L1| + |L2| - 1 pairs.
Proof: First assume that u is labeled "V." Let L be the list generated by Procedure Merge_V. It is clear that all the pairs in L are ordered in decreasing order of their height and L contains no useless pairs. The reason is as follows: Procedure Merge_V starts by merging the first pair of L1 with the first pair of L2. When two pairs from L1 and L2 are merged, if the height of one pair is greater than or equal to the height of the other pair, then the pair with the greater height is discarded (if the heights of both pairs are equal, then both pairs are discarded) and the next pair is considered. This process continues until the pairs in one of the lists are exhausted. Since the pairs in L1 and L2 have height as their major-value, L also has height as its major-value. Moreover, the reason that L contains no useless pairs is that all the pairs of L1 and L2 are ordered in increasing order of their width. As the process goes on, the generated pairs have larger width. Hence all the pairs in L are useful and ordered in increasing order of their width.
Next we will show that L contains all the useful pairs. Suppose it does not. Then there must be a pair (h, w) such that (h, w) is useful but is not in L. Let (h, w) be generated by merging pairs (h_i, w_i) and (h'_j, w'_j) from L1 and L2, respectively. Since (h, w) is not in L, either (a) (h_i, w_i) merges with some pair of L2 other than (h'_j, w'_j), or (b) (h_i, w_i) does not merge with any pair of L2.
... h_i, hence h' ≤ h. Since w_t < w_i, we have w' < w. This implies that (h, w) is useless, a contradiction.
Next assume that (h'_j, w'_j) is considered before (h_i, w_i) in Procedure Merge_V. Similarly, if there is a pair (h_t, w_t) that is the last pair in L1 that merged with (h'_j, w'_j), and h_t > h_i, then (h, w) is again useless, also a contradiction.
(b) If (h_i, w_i) does not merge with any pair of L2 then, when (h_i, w_i) is considered, all the pairs in L2 are exhausted. Similar to (a), this contradicts the fact that (h, w) is useful.
From the above discussion, we conclude that L contains all the useful pairs and no useless pairs. Since Procedure Merge_V moves one pair down each time in at least one of L1 and L2, it needs O(|L1| + |L2|) time to generate L. In addition, since a pair of L is generated by merging a pair in L1 with a pair in L2, there are at most |L1| + |L2| pairs of L generated by Procedure Merge_V. In fact the number of pairs generated is at most |L1| + |L2| - 1, since the last two pairs will generate at most one pair.
Assume that u is labeled "H." Let the major-values of L1 and L2 be width. Symmetrically (to Procedure Merge_V), Procedure Merge_H merges L1 with L2 to generate L, where L has width as its major-value and contains all the useful pairs and no useless pairs. Moreover, symmetrically to Procedure Merge_V, Procedure Merge_H takes O(|L1| + |L2|) time and generates at most |L1| + |L2| - 1 pairs.
Let u be a node of T and u1, u2 be its two children. The list of u can be obtained by applying Procedure Merge_V or Procedure Merge_H on the lists of u1 and u2, depending on the label of u. If the labels of u1 and/or u2 are not compatible with the label of u, the lists of u1 and/or u2 need to be reversed in advance. We can compute the list of the root by merging the lists of its children that have been computed in this bottom-up fashion. Notice that every pair p, generated by Procedures Merge_V and Merge_H, has two pointers that point to the two pairs that generated p. Finally, performing a linear scan of the final list we can determine the implementation that generates the minimum area of the floor rectangle, and using the pointers we can trace down to the leaf nodes and obtain the optimal implementations of the basic rectangles.
FIGURE 3 Eight different ways to label a node and its two children.
There are eight different ways to label a node and its two children, as shown in Figure 3. For the cases where the labels are not compatible (cases (c)-(f)), the list of the child (or both children) with a different label has to be reversed. In order to reverse a list, we have to wait until all the pairs of the list are generated. Hence the list of the parent cannot be generated until both children's lists are completely generated. This makes the algorithm very hard to parallelize. Our new floorplanning algorithm does not need to reverse lists. The computation starts whenever there is a nonredundant pair available in each of the children's lists, and it continues until all the useful pairs of the parent are generated.
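As an illustration of Procedure Merge_V, the following Python sketch merges two lists that are both ordered by decreasing height. It is a sketch under the assumption that each input list contains only useful pairs; the function name `merge_v` and the 0-based indexing are choices made here, not part of the original procedure.

```python
def merge_v(l1, l2):
    """Merge the lists of two children of a node labeled "V".

    Both inputs are lists of (height, width) pairs in decreasing height
    (hence increasing width). The output has the same ordering.
    """
    out = []
    i = j = 0
    while i < len(l1) and j < len(l2):
        (h1, w1), (h2, w2) = l1[i], l2[j]
        out.append((max(h1, h2), w1 + w2))  # M_V of the two current pairs
        if h1 > h2:
            i += 1          # discard the taller pair
        elif h1 == h2:
            i += 1          # equal heights: discard both
            j += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    l1 = [(4, 1), (3, 2), (1, 5)]
    l2 = [(5, 2), (2, 3), (1, 6)]
    print(merge_v(l1, l2))  # [(5, 3), (4, 4), (3, 5), (2, 8), (1, 11)]
```

For this input the output has 5 = |L1| + |L2| - 1 pairs, matching the bound in Lemma 2.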

THE NEW FLOORPLANNING ALGORITHM
In this section we present a new procedure that generates the list of a parent without reversing the lists of its children. This procedure is perhaps the most important ingredient in parallelizing this floorplanning problem.
Suppose u is a node of T and u1, u2 are its two children. Let L1 and L2 be the lists of u1 and u2, respectively. Now let the lists of all the leaves have height as their major-value. For those sibling leaves whose parents are labeled "V," the lists are merged using Procedure Merge_V, described in the previous section. For those sibling leaves whose parents are labeled "H," we use Procedure Mg_H, described below. This ensures that all the lists of the nodes of T will have height as their major-value.
We first present the ideas underlying Mg_H informally. Let u, the parent of u1 and u2, be labeled "H" and L be the list that is generated by merging the lists L1 and L2. Suppose p_i = (h_i, w_i) and p'_j = (h'_j, w'_j) are pairs in L1 and L2, respectively, that will be merged. If none of them is redundant, then we merge them. If none of them is the last pair of its list, we look at the pairs p_{i+1} = (h_{i+1}, w_{i+1}) and p'_{j+1} = (h'_{j+1}, w'_{j+1}). If w_{i+1} > w'_{j+1} then p_i and p'_{j+1} are to be merged next; if w_{i+1} = w'_{j+1} then p_{i+1} and p'_{j+1} are to be merged next; if w_{i+1} < w'_{j+1} then p_{i+1} and p'_j are to be merged next. The procedure terminates when both p_i and p'_j are the last pairs of their lists.
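The informal description above can be sketched in Python as follows. This is an interpretation rather than a transcription of Procedure Mg_H: it assumes both input lists are non-empty, contain only useful pairs, and are ordered by decreasing height (so widths increase). The name `mg_h` and the way redundant pairs are skipped inside the main loop are choices made for this sketch.

```python
def mg_h(l1, l2):
    """Merge the lists of two children of a node labeled "H" without reversal.

    Inputs and output are lists of (height, width) pairs in decreasing height.
    """
    out = []
    i = j = 0
    while True:
        # Skip redundant pairs: moving to the next pair lowers the height
        # without making the combined rectangle any wider.
        while i + 1 < len(l1) and l1[i + 1][1] <= l2[j][1]:
            i += 1
        while j + 1 < len(l2) and l2[j + 1][1] <= l1[i][1]:
            j += 1
        (h1, w1), (h2, w2) = l1[i], l2[j]
        out.append((h1 + h2, max(w1, w2)))  # M_H of the two current pairs
        last_i = i + 1 == len(l1)
        last_j = j + 1 == len(l2)
        if last_i and last_j:
            return out
        # Compare the widths of the next pairs; a finished list never advances.
        next_wi = float("inf") if last_i else l1[i + 1][1]
        next_wj = float("inf") if last_j else l2[j + 1][1]
        if next_wi <= next_wj:
            i += 1
        if next_wj <= next_wi:
            j += 1


if __name__ == "__main__":
    l1 = [(4, 1), (3, 2), (1, 5)]
    l2 = [(5, 2), (2, 3), (1, 6)]
    print(mg_h(l1, l2))  # [(8, 2), (5, 3), (3, 5), (2, 6)]
```

In this example the first pair of `l1` is skipped as redundant, and the output is again in decreasing height and increasing width, so it can be fed directly to a parent labeled either "V" or "H".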

Procedure Mg_H
(1) begin
(2) i := 1; j := 1;
(3) while (h_i, w_i) is not the last pair of L1 or (h'_j, w'_j) is not the last pair of L2 do
(4) begin
(5)    if i = 1 and (h'_j, w'_j) is the last pair of L2 then M_H((h_i, w_i), (h'_j, w'_j)) /* M_H((h_i, w_i), (h'_j, w'_j)) = (h_i + h'_j, max{w_i, w'_j}) */
(6)    else if j = 1 and (h_i, w_i) is the last pair of L1 then M_H((h_i, w_i), (h'_j, w'_j))
(7)    else if i = 1 and w'_j < w'_{j+1} ≤ w_i then j := j + 1 /* (h'_j, w'_j) is not the last pair of L2, but it is redundant */
(8)    else if j = 1 and w_i < w_{i+1} ≤ w'_j then i := i + 1 /* (h_i, w_i) is not the last pair of L1, but it is redundant */
(9)    else if (h_i, w_i) is the last pair of L1
(10)   then begin
(11)      M_H((h_i, w_i), (h'_j, w'_j));
       ...
This procedure needs constant time to generate a new pair of the list L of u. We will show that a pair p is useful in L if and only if p is generated by Procedure Mg_H. Let u be a node of the slicing tree T with children u1 and u2. Assume that L1 and L2 are the lists of u1 and u2, and their major-value is height. Let u be labeled "H," and L be the list of u. We have the following lemma:
Lemma 3 Let p = (h, w) and p = M_H(p_i, p'_j), where p_i = (h_i, w_i) and p'_j = (h'_j, w'_j) are pairs in L1 and L2, respectively. If p is a useful pair for u then p is generated by Procedure Mg_H.
Proof: We distinguish the following cases:
Case 1. p_i is the last pair of L1 and p'_j is the last pair of L2. By Procedure Mg_H, p is generated.
Case 2. p'_j is the last pair of L2 and p_i is not the last pair of L1. Obviously, p'_j is not the first pair of L2 because L2 contains at least two pairs. Also, (h'_{j+1}, w'_{j+1}) does not exist. Hence none of the Steps (5) and (6) are executed for p_i and p'_j. Now suppose p_i does not merge with p'_j. According to Procedure Mg_H, this can happen only if p_i merges with (h'_{j-1}, w'_{j-1}) and w_{i+1} ≤ w'_j. If w_{i+1} = w'_j then (h_{i+1}, w_{i+1}) will merge with p'_j and generate the pair (h_{i+1} + h'_j, max{w_{i+1}, w'_j}). Since the major-value of L1 is height, by Lemma 1, h_{i+1} < h_i and w_{i+1} > w_i. Hence the generated pair makes p useless, a contradiction. If w_{i+1} < w'_j, again, by Procedure Mg_H, there exists a pair p_t = (h_t, w_t) in L1 such that h_t < h_i and p_t merges with p'_j. Similar to the discussion above, p becomes useless, a contradiction.
Case 3. p_i is the last pair of L1 and p'_j is not the last pair of L2. Similar to Case 2.
Case 4. p_i and p'_j are not the last pairs of L1 and L2, respectively. Apparently, p_i and p'_j cannot be redundant, otherwise p is useless. Suppose p_i does not merge with p'_j. Similar to the previous case, if this is true, then p is useless, a contradiction.
Considering all the possible cases for p_i and p'_j, we conclude that if p is a useful pair for u, then p is generated by Procedure Mg_H.
Next, we will show that Procedure Mg_H does not generate any useless pairs. Lemma 4 shows that for any two consecutive pairs (h, w) and (h', w') generated by running Procedure Mg_H on L1 and L2, we have h > h' and w < w'. We need this property in order to show that Procedure Mg_H does not generate useless pairs.
Lemma 4 Let p = (h, w) and p' = (h', w') be two consecutive pairs generated by running Procedure Mg_H on L1 and L2, where the major-value of L1 and L2 is height. If p is generated before p' then h > h' and w < w'.
Proof: Let p = M_H(p_i, p'_j), where p_i = (h_i, w_i) and p'_j = (h'_j, w'_j) are pairs in L1 and L2 respectively.
Case 2. p'_j is the last pair of L2.
Proof: Suppose p is in L*. From Lemma 3, p is generated by Procedure Mg_H. Suppose that p is not in L*. Then there must be a useful pair q in L* that makes p useless. But q is also generated by Procedure Mg_H. Using Lemma 5, the fact that q is generated by Mg_H implies that p is not generated by Procedure Mg_H, a contradiction.
Since Procedure Mg_H moves one pair down each time in at least one of L1 ...
After selecting the best implementation of the floor rectangle, we need to trace down to the basic rectangles in order to obtain the optimal implementations of the cells. This is done easily by keeping two pointers for each pair in each list that point to the pairs of the children that generated the pair. The Algorithm FP below uses Procedures Merge_V and Mg_H to find the optimal implementations of the cells.
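The bottom-up computation, the scan of the root list and the pointer trace-back described here can be sketched end to end in Python. This is a minimal sketch under stated assumptions: the tuple-based tree encoding, the dictionary results and all function names are hypothetical, leaf lists are assumed non-empty, and a recursive post-order pass stands in for the level-by-level order (both visit children before parents).

```python
# Hypothetical tree encoding (not from the paper):
#   leaf:     ("leaf", name, [(height, width), ...])
#   internal: ("V" or "H", left_subtree, right_subtree)

def prune(pairs):
    """Useful pairs only, in decreasing height (so increasing width)."""
    kept, best_w = [], float("inf")
    for h, w in sorted(set(pairs)):
        if w < best_w:
            kept.append((h, w))
            best_w = w
    return kept[::-1]

def merge_v(l1, l2):
    """Parent labeled "V". Entries are (h, w, i, j); i, j point into l1, l2."""
    out, i, j = [], 0, 0
    while i < len(l1) and j < len(l2):
        (h1, w1), (h2, w2) = l1[i][:2], l2[j][:2]
        out.append((max(h1, h2), w1 + w2, i, j))
        if h1 >= h2:
            i += 1
        if h2 >= h1:  # uses the saved heights, so equal heights advance both
            j += 1
    return out

def mg_h(l1, l2):
    """Parent labeled "H"; children stay in decreasing height (no reversal)."""
    out, i, j = [], 0, 0
    while True:
        while i + 1 < len(l1) and l1[i + 1][1] <= l2[j][1]:
            i += 1  # skip a redundant pair of l1
        while j + 1 < len(l2) and l2[j + 1][1] <= l1[i][1]:
            j += 1  # skip a redundant pair of l2
        out.append((l1[i][0] + l2[j][0], max(l1[i][1], l2[j][1]), i, j))
        last_i, last_j = i + 1 == len(l1), j + 1 == len(l2)
        if last_i and last_j:
            return out
        wi = float("inf") if last_i else l1[i + 1][1]
        wj = float("inf") if last_j else l2[j + 1][1]
        if wi <= wj:
            i += 1
        if wj <= wi:
            j += 1

def build(node):
    """Bottom-up pass computing the list of every node."""
    if node[0] == "leaf":
        entries = [(h, w, None, None) for h, w in prune(node[2])]
        return {"name": node[1], "list": entries}
    left, right = build(node[1]), build(node[2])
    merge = merge_v if node[0] == "V" else mg_h
    return {"list": merge(left["list"], right["list"]), "kids": (left, right)}

def trace(res, idx, chosen):
    """Follow the two pointers of entry idx down to the leaves."""
    h, w, i, j = res["list"][idx]
    if "name" in res:
        chosen[res["name"]] = (h, w)
    else:
        trace(res["kids"][0], i, chosen)
        trace(res["kids"][1], j, chosen)

def floorplan(tree):
    """Return (minimum area, chosen implementation per leaf)."""
    root = build(tree)
    entries = root["list"]
    best = min(range(len(entries)), key=lambda k: entries[k][0] * entries[k][1])
    chosen = {}
    trace(root, best, chosen)
    return entries[best][0] * entries[best][1], chosen

if __name__ == "__main__":
    tree = ("V",
            ("H", ("leaf", "A", [(2, 3), (3, 2)]), ("leaf", "B", [(1, 4), (2, 2)])),
            ("leaf", "C", [(4, 1), (2, 3)]))
    print(floorplan(tree))  # (15, {'A': (3, 2), 'B': (2, 2), 'C': (4, 1)})
```

In the example, A is stacked on B (5 by 2) and C sits beside them, giving a 5 by 3 floor rectangle of area 15, which is the minimum over all eight combinations of leaf implementations.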

Algorithm FP
(1) begin
(2) Prepare the lists of all the leaves so that the major-value is height.
(3) From the second-to-bottom level to the root level do
(4)    From the leftmost node to the rightmost node do
(5)       if this node is labeled "V" then call Procedure Merge_V
(6)       else call Procedure Mg_H;
(7) Let Lr be the list of the root. Scan all the pairs in Lr and select the one with minimum area, i.e., height × width.
(8) Using the pointers of the selected pair, trace down to the cells and return the optimal implementations of the cells.
If T is a skewed slicing tree with internal nodes labeled "V" and "H" alternately, the height of T is O(n), see Figure 4. Because there are O(n) cells, the time needed for generating L is O(n^2). Hence we have the following Theorem:
Theorem 2 Let T be a slicing tree with n leaves. Algorithm FP computes the optimal implementations of the cells in time O(n^2).

THE PARALLEL ALGORITHM
Let r be the root of a slicing tree T, and assume that the lists of all the leaves are prepared such that the major-value is height. In this section we present an algorithm that uses O(n) processors and computes the list of r in O(n) time. To simplify our description, we assign a processor to each internal node. If the node is labeled "V," and for each of its children's lists there is at least one pair available, then we apply Procedure Merge_V to merge them; if it is labeled "H" and for each of its children's lists there are at least two pairs available, then we apply Procedure Mg_H to merge them. As soon as a pair is generated by a processor, it is sent to the processor of its parent. A processor stops computing when Procedure Merge_V or Mg_H is finished.
There is a way to embed an arbitrary binary tree into a hypercube with dilation three (i.e., such that any two neighboring processors in the tree are at distance at most three in the hypercube). Furthermore, an arbitrary binary tree can be embedded into its optimal hypercube with dilation five [3]. Hence, we can embed a slicing tree into a hypercube so that the communication delay of our parallel algorithms is only multiplied by three or five.
Notice that we do not need to assign one processor per internal node. The exact number of processors depends on the structure of the slicing tree. One can use a pool of available processors. When a processor is needed, it is taken from the pool. Similarly, when a processor finishes its computation, it is returned to the pool. However, in this model, we assume that the processors are fully interconnected.
Algorithm PFP
(1) begin
(2) for all the internal nodes the corresponding processors do the following in parallel:
    /* each processor contains two lists corresponding to the children's lists */
    while there are pairs available in their children's lists do
    begin
       if the node is labeled "V" and there is a pair available in each list of its children then
       begin
          merge the children's lists (according to Procedure Merge_V);
          each generated pair is passed to the processor of the parent node;
       end
       else if the node is labeled "H" and there are at least two pairs available in both children's lists then
       begin
          merge the children's lists (according to Procedure Mg_H);
          each generated pair is passed to the processor of the parent node;
       end;
    end
(3) Let Lr be the list of the root. Scan all the pairs in Lr and select the one with minimum area.
(4) Using the pointers of the selected pair, trace down to the cells and return the optimal implementations of the cells.
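The pipelined data flow of Algorithm PFP can be imitated in a single thread with Python generators: each internal node consumes pairs from its children one at a time and emits a pair as soon as it can, without waiting for a child's list to be complete. This is only a simulation of the data flow, not a parallel implementation, and it omits the trace-back pointers. The `Peekable` helper, the function names and the tuple-based tree encoding are assumptions of this sketch; leaf lists are assumed non-empty, pruned to useful pairs and given in decreasing height.

```python
class Peekable:
    """Iterator wrapper with look-ahead; models pairs arriving from a child."""

    def __init__(self, it):
        self._it, self._buf = iter(it), []

    def peek(self, k=0):
        """Return the k-th pending pair, or None if the stream ends first."""
        while len(self._buf) <= k:
            try:
                self._buf.append(next(self._it))
            except StopIteration:
                return None
        return self._buf[k]

    def pop(self):
        self.peek()
        return self._buf.pop(0)


def stream_v(a, b):
    """Node labeled "V": needs one pending pair from each child."""
    a, b = Peekable(a), Peekable(b)
    while a.peek() is not None and b.peek() is not None:
        (h1, w1), (h2, w2) = a.peek(), b.peek()
        yield (max(h1, h2), w1 + w2)
        if h1 >= h2:
            a.pop()
        if h2 >= h1:  # saved heights: equal heights advance both children
            b.pop()


def stream_h(a, b):
    """Node labeled "H": looks at the current and the next pair of each child."""
    a, b = Peekable(a), Peekable(b)
    while True:
        while a.peek(1) is not None and a.peek(1)[1] <= b.peek()[1]:
            a.pop()  # skip a redundant pair
        while b.peek(1) is not None and b.peek(1)[1] <= a.peek()[1]:
            b.pop()  # skip a redundant pair
        (h1, w1), (h2, w2) = a.peek(), b.peek()
        yield (h1 + h2, max(w1, w2))
        na, nb = a.peek(1), b.peek(1)
        if na is None and nb is None:
            return
        wa = float("inf") if na is None else na[1]
        wb = float("inf") if nb is None else nb[1]
        if wa <= wb:
            a.pop()
        if wb <= wa:
            b.pop()


def stream(node):
    """Lazily produce the list of `node`, pulling pairs from its children."""
    if node[0] == "leaf":
        return iter(node[2])
    merge = stream_v if node[0] == "V" else stream_h
    return merge(stream(node[1]), stream(node[2]))


if __name__ == "__main__":
    tree = ("V",
            ("H", ("leaf", "A", [(3, 2), (2, 3)]), ("leaf", "B", [(2, 2), (1, 4)])),
            ("leaf", "C", [(4, 1), (2, 3)]))
    root_list = list(stream(tree))
    print(root_list)  # [(5, 3), (4, 4), (3, 7)]
    print(min(h * w for h, w in root_list))  # 15
```

Because every list keeps height as its major-value, no node ever has to wait for a complete child list in order to reverse it, which is what makes this streaming formulation possible.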
Before we continue, we need the following simple observations:
(1) Each pair is generated in constant time. The time taken to generate a pair is called a basic step.
(2) In Procedure Mg_H the redundant pairs will not generate pairs in their parent's list, because if a pair is known to be redundant, it will not be merged with pairs of the other list.
(3) If (h_i, w_i) is not redundant then (h_{i+1}, w_{i+1}) is not redundant because w_{i+1} ...
Let u be a node of slicing tree T with children u1 and u2, and let L, L1 and L2 be the lists of u, u1 and u2 respectively. Also let u be labeled "H" and the major-value of L1 and L2 be height. Suppose that the first k pairs of L2 are redundant. Using Procedure Mg_H to merge L1 and L2, the processor in u generates the first pair of L k + 2 basic steps after it started computing. This is because Procedure Mg_H will skip the first k pairs of L2, and start merging the first pair of L1 and the (k + 1)th pair of L2 at basic step k + 1. Notice that we assumed that a processor needs one basic step to check and skip a redundant pair, although in fact it takes less time than generating a pair.
Lemma 6 Let u be a node of T, and T' be the subtree rooted at u. Assume that the height of T' is l. If there are k redundant pairs generated by the nodes of T', then the processor at u starts generating pairs no later than basic step l + k.
Proof: If u is labeled "V" then there are no redundant pairs to be considered. So we only need to consider the case that u is labeled "H." Let T1, T2 be the subtrees of T rooted at u1 and u2. We use induction on l. Assume that l = 1. This implies that u1 and u2 are both leaves. Assume that there are c1 and c2 pairs in L1 and L2, respectively, where c1 and c2 are constants. Suppose there are k redundant pairs for T'; then those k redundant pairs must be either in L1 or L2. W.l.o.g., let the redundant pairs be in L1. Hence the processor at u is checking and skipping the redundant pairs until the (k + 1)th pair of L1 is considered, and starts merging the pairs at basic step 1 + k. Now assume that for t < l, where t is the height of T', the processor at u starts generating pairs at basic step t + k, where k is the number of redundant pairs generated by the nodes of T'.
Let the height of T' be l. Assume that the number of redundant pairs generated by the nodes of T1 and T2 is k1 and k2, respectively. Also assume that there are m redundant pairs either in L1 or in L2. Since the height of both T1 and T2 is less than l, u1 starts generating pairs no later than basic step l - 1 + k1 and u2 starts generating pairs no later than basic step l - 1 + k2. Hence, u will start generating pairs no later than basic step l + max{k1, k2} + m. But the nodes of T' generate k = k1 + k2 + m redundant pairs and max{k1, k2} ≤ k1 + k2. Therefore, the processor at u starts generating pairs no later than basic step l + k.
Lemma 7 Let each leaf of T have at most c implementations. The processor at the root of T will start generating pairs no later than (c + 1)n basic steps.
Proof: Let l be the height of T. Since each redundant pair does not generate any pair for upper nodes, if at some level i there are p pairs in total and q of them are redundant, then there are at most p - q pairs for every level higher than i. By Lemma 2 and Theorem 1, we know that there are in total no more than cn pairs generated by the nodes in the same level of T. Hence, there are at most cn pairs generated by the root of T. This implies that there are at most cn redundant pairs generated by the nodes of T. By Lemma 6, the processor at the root starts generating pairs no later than basic step l + cn ≤ (c + 1)n.
In the next lemma we show that for any processor, once it starts generating pairs, it will always have enough pairs in its children's lists so that it generates a new pair for each basic step until the pairs of one of its children's lists are exhausted.
Lemma 8 Once a processor at some node u of T starts generating pairs it will generate a new pair for each basic step until one of the lists of its children is exhausted.
Proof: Let the height of u be l. Let u1 and u2 be the children of u, and L1, L2 be the lists of u1 and u2, respectively. We use induction on l. Assume that l = 1. Clearly, u1 and u2 are both leaves. The processor at u does not need to wait for new pairs arriving at u1 and u2. Hence it will not be interrupted.
Next, assume that for t < l, where t is the height of u, the processor at u will generate a new pair for each basic step until one of its children's lists is exhausted.
Now consider the processor at u that has height l.
Let the processor at u start generating pairs at basic step j. This implies that there are enough non-redundant pairs of L1 and L2 available at basic step j - 1 (by Algorithm PFP). So after basic step j, by induction hypothesis, new pairs will be generated at u1 and u2. If u is labeled "V," there are no redundant pairs that need to be considered at u. Again, by induction hypothesis, at each basic step, there are new pairs generated at u1 and u2, respectively, until one of L1 or L2 is exhausted. If u is labeled "H," by Observation 3, the newly generated pairs at u1 and u2 are not redundant. Hence, there are always enough pairs available for merging at u1 and u2, thus the processor at u will not be interrupted until no more pairs arrive at u1 or u2.
Combining the above results we obtain the following:
Theorem 3 Let T be a slicing tree with n leaves. Algorithm PFP computes the optimal implementations of the cells in O(n) time using O(n) processors.
Proof: Since T has n leaves, there are 2n - 1 nodes in T. Hence O(n) processors are enough for the parallelization.
By Lemma 7, the processor at the root will start generating pairs no later than (c + 1)n basic steps; by Lemma 8, once it starts generating pairs, it will not stop until all the pairs are generated. Furthermore, since there are at most cn pairs in Lr, the list of the root, the time for generating Lr can be calculated as follows:
Total generating time = Broadcasting time + Waiting time + Execution time of the processor at the root ≤ (c + 1)n + 0 + cn = O(n).
For steps (3) and (4), only O(n) time is needed to select the pair and trace down to each cell. Hence the whole process can be done in O(n) time.

THE PARALLEL ALGORITHM FOR ALMOST BALANCED SLICING TREES
As we discussed in the Introduction, the target of floorplanning is to determine a set of implementations (one for each basic rectangle) such that the relative positions of the basic rectangles are preserved, and the total layout area of the chip is minimized. The topology, i.e., the relative positions of basic rectangles, of a slicing floorplan is obtained by recursively using circuit bipartitioning techniques. A bipartition divides a given circuit into two parts such that: (1) the sizes (the number of basic rectangles) of the two parts are as equal as possible, and (2) the number of nets connecting the two parts is minimized. Hence, clearly, the slicing trees generated by bipartitioning techniques have height O(log n).
The result of Theorem 3 achieves optimal speedup in the worst case, i.e., when the slicing tree has height O(n). However, if the height of the tree is O(log n), the sequential algorithm takes only O(n log n) time. Here we will show how to compute the optimal implementations in O(n) time using O(log n) processors.
(3) The nodes at level i are the roots of O(log n) subtrees. Dedicate one processor to each subtree.
(4) For each such subtree, use Algorithm FP (except for the last two steps) to compute the lists of the roots sequentially. Of course all O(log n) lists are computed in parallel independently.
(5) Consider the subtree of T consisting of the root and all the nodes down to level i; call it T'. T' has O(log n) leaves (the roots of the previous subtrees), whose lists were computed in Step (4). Call Algorithm PFP on T', using a processor for each non-leaf node of T'.
Figure 5 shows an example of a balanced slicing tree with height 4. During the first step of Algorithm FBT, T is partitioned into T' and O(log n) subtrees. Each such subtree has O(n/log n) leaves and height O(log n). If we use Algorithm FP, the list of the root of each subtree can clearly be obtained in O(n) time. Also, T' is a balanced tree with O(log n) leaves whose lists contain O(n/log n) pairs. Since there are at most cn pairs in each level of T', Algorithm PFP will compute the best implementations of the cells in O(n) time. Hence we have the following:
Theorem 4 Let T be a slicing tree with n leaves and height O(log n). Algorithm FBT computes the optimal implementations of the cells in O(n) time using O(log n) processors.
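The time bounds behind Theorem 4 can be summarized in a short calculation. This is a sketch that only restates the bounds given in the text, assuming the O(nl) bound for the sequential algorithm on a tree with n leaves and height l; the symbols T_sub, T_top and P are introduced here for illustration.

```latex
\begin{align*}
T_{\text{sub}} &= O\!\left(\frac{n}{\log n}\cdot \log n\right) = O(n)
  && \text{(Algorithm FP on one subtree, all subtrees in parallel)}\\
T_{\text{top}} &= O(n)
  && \text{(Algorithm PFP on } T' \text{, at most } cn \text{ pairs per level)}\\
T_{\text{FBT}} &= T_{\text{sub}} + T_{\text{top}} = O(n),\qquad P = O(\log n)\\
P \cdot T_{\text{FBT}} &= O(n \log n)
  && \text{(matches the sequential } O(n\log n) \text{ bound)}
\end{align*}
```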

EXPERIMENTAL RESULTS
We simulated Algorithm PFP to generate all the useful implementations of F in Pascal on a Sun 4 workstation under the UNIX operating system. We compared the number of steps needed for the sequential algorithm with the number of steps needed for the parallel algorithm. In EXP1, we tested 10 skewed slicing trees. In EXP2, we tested 10 non-skewed slicing trees. The trees are randomly generated. The results are shown in Table I and Table II, respectively. For the parallel algorithm, we also computed the number of processors used. The number of processors to be used is obtained as follows: We use a pool of available processors. When a processor is needed, it is taken from the pool. When a processor finishes its computation, it is returned to the pool. We can see that the number of processors needed in both EXP1 and EXP2 is close to 70% of the number of leaves of the slicing trees. The results are also plotted in Figure 6 and Figure 7. These two figures verify that the sequential algorithm needs O(n^2) time and the parallel algorithm needs O(n) time to find the optimal implementations of the basic rectangles for F. Also we note that the time needed for skewed slicing trees is more than the time needed for non-skewed slicing trees.

CONCLUDING REMARKS
In this paper, we presented a parallel algorithm to compute the optimal implementations of the basic rectangles for any slicing floorplan. Our algorithm runs in O(n) time with O(n) processors, in the worst case, where n is the number of basic rectangles. We also presented a more efficient algorithm that solves the area optimization problem for floorplans whose corresponding slicing trees have height O(log n).
Namely, our algorithm runs in O(n) time and uses O(log n) processors. Both parallel algorithms achieve optimal speedup. In addition, our algorithms do not need shared memory and can be implemented in a distributed system.
We provided experimental results that verify the theoretical speedup of Algorithm PFP, and showed that we use about 0.7n processors.
When the height, l, of the slicing tree is more than O(log n) and less than O(n), the sequential algorithm of Section 3 takes O(nl) time to find the optimal implementations. There is an interesting question: Is it possible to achieve optimal speedup in this case? Our preliminary results indicate that it is possible to solve the problem in O(n) time using O(l) processors.
However, the technique seems to be rather complicated and not very realistic, because we need a powerful parallel machine with shared memory that allows concurrent reads and concurrent writes.
Another interesting question is the following: Can this problem be solved in poly-log time using a polynomial number of processors, or is the problem P-complete?

FIGURE 4 A skewed slicing tree.

FIGURE 5 A balanced slicing tree with height 4.

FIGURE 7 The experimental results of EXP2.
Similar to Case 1.
w'_{j+1} > w_{i+1}. As in Case 1, w_i < w_{i+1}, otherwise p is not generated. Comparing p and p', h > h' and w < w'. Similarly, we can prove that for the remaining comparisons of w_{i+1} and w'_{j+1} we also have h > h' and w < w'.
A simple induction on Lemma 4 proves the fol-