A Fast Clustering-Based Min-Cut Placement Algorithm with Simulated-Annealing Performance

Placement is an important constrained optimization problem in the design of very large scale
integrated (VLSI) circuits [1–4]. Simulated annealing [5] and min-cut placement [6] are two
of the most successful approaches to the placement problem. Min-cut methods yield less
congested and more routable placements at the expense of greater wire-length, while simulated
annealing methods tend to optimize total wire-length with little emphasis on minimizing
congestion. It is also well known that min-cut algorithms are substantially
faster than simulated-annealing-based methods. In this paper, a fast min-cut algorithm
(ROW-PLACE) for row-based placement is presented and is empirically shown to achieve
simulated-annealing-quality wire-length on a number of benchmark circuits. In comparison
with Timberwolf 6 [7], ROW-PLACE is at least 12 times faster in its normal mode and at
least 25 times faster in its fast mode. The good results of ROW-PLACE are achieved using
a very effective clustering-based partitioning algorithm in combination with constructive
methods that reduce the wire-length of nets involved in terminal propagation.


INTRODUCTION
The design of VLSI circuits is a complex process that transforms a design specification into a physical circuit through several inter-dependent steps. The layout problem is an important stage of the overall design process and it involves the assignment of geometric locations to the elements of the circuit and to the electric wire connections among them. Due to its enormous combinatorial complexity, the layout problem has traditionally been solved in two stages. In the first stage, called placement, all the circuit elements are assigned fixed geometric locations on the layout surface. The second stage, called routing, consists of the physical realization of the connections among the elements of the circuit subject to a technology-dependent set of constraints. Placement and routing are inter-dependent and better layouts may be achieved by performing placement and routing simultaneously.
38 Y. SAAB
There have been previous efforts at combining placement and routing [8][9][10][11], but the ever increasing size of electronic circuits is rendering placement followed by routing the more practical approach to the layout problem. Routing is highly dependent on the placement stage. A good placement simplifies the subsequent routing step, while a bad placement may render routing an impossible task. Therefore, routability is a major goal of placement among many other goals such as minimizing timing delays on critical nets, maximizing circuit performance, and minimizing layout area.
Successful placement algorithms have been obtained using general optimization paradigms such as simulated annealing [5,7] and genetic algorithms [12][13][14]. Other existing placement techniques are:
Incremental construction: The strategy used here is to place the nodes of the circuit successively until all of them have been placed. Seed nodes are placed first. Subsequently, based on a selection rule, an unplaced node is chosen and is placed in the next best vacant position. This process is repeated until all the circuit elements have been placed. Approaches of this kind are reported in [15].
Node exchange: This is an iterative improvement approach in which an initial placement is improved by the exchange of the positions of some of the nodes. A widely used strategy is the exchange of the positions of two nodes until no further improvements can be made [16,17].
Combinatorial methods: Branch-and-bound is a strategy that systematically explores the solution space of a combinatorial problem in search of the optimal solution [3]. The strategy is used to prune or reduce the number of placements explored in what would otherwise be a complete exhaustive search of all possible placements. Branch-and-bound algorithms tend to be computationally expensive and are not used except for problems of small size [1]. Branch-and-bound techniques have been reported in [15].
Analytical methods: It is possible to formulate the placement problem as a non-linear mathematical problem, where non-linear programming techniques are applicable [18,19].
Min-cut placement: Methods in this class place the nodes of the circuit by a recursive use of a partitioning method. Basically, the circuit is partitioned into two parts. The available layout area is then cut by a straight line into two parts, one on each side of the cutting line. Each of the two parts of the circuit is then assigned to one of the two parts of the layout area. This gives rise to two smaller placement problems which are then recursively solved by the same method until each sub-circuit consists of one cell [6,11,20].
Force-directed placement: Methods in this class view the connections among circuit nodes as binding forces that are trying to keep the connected nodes in close proximity. Therefore, a node is in a good location if the total force exerted on it is zero. A common denominator of these methods is the determination of the best location for a node. Usually, nodes are successively moved in some order to their ideal locations in an effort to optimize the placement. Such methods are reported in [21].
Spectral methods: These methods use a mathematical formulation of the placement problem, where placement properties are usually related to the eigenvalues of an associated matrix. These relationships are exploited to design approximate placement algorithms. Several spectral approaches have been reported in the literature [22,23].
Resistive network optimization: This approach has been proposed by Cheng and Kuh [24]. Basically, the placement problem is formulated as the problem of minimizing the power dissipation in a resistive network. A solution of the resistive network is then translated into a placement of the circuit.
Among all placement methods, simulated annealing is currently the most popular and is the best algorithm available in terms of placement quality, but it is too time consuming. One of the best available simulated annealing placement packages is Timberwolf 6 [7]. Min-cut algorithms rank second to simulated annealing in terms of placement quality but are substantially faster [1,4]. The contribution of this paper is the design of a min-cut algorithm (ROW-PLACE) with results that are competitive with Timberwolf 6 in terms of quality. In terms of speed, ROW-PLACE in its normal mode is at least 12 times faster than Timberwolf 6, and, in its fast mode, ROW-PLACE is at least 25 times faster than Timberwolf 6. ROW-PLACE is distinguished from previous min-cut placement methods by an effective clustering-based partitioning algorithm in combination with constructive methods that reduce the wire-length of nets involved in terminal propagation.

ROW-BASED PLACEMENT
For placement purposes, an electrical circuit consists of a hypergraph along with geometric descriptions of its components. A hypergraph $G(V, E)$ consists of a set of nodes $V$ and a set of nets $E$. Each net $e \in E$ is a subset of 2 or more nodes in $V$. In the hypergraph model of an electrical circuit, each node corresponds to a component of the circuit, and each net represents a common electrical signal among its constituent nodes. A pin is a point of contact of a net with one of its constituent nodes. A net may touch a node in more than one pin. The locations of the pins of a node are specified by relative coordinates with respect to the center of that node. The nodes are usually rectangular in shape and are placed so that their sides are parallel to the reference coordinate axes in the plane. Therefore, the location of a node is completely specified by the coordinates of its center if only one orientation of the node is allowed.
Row-based placement is an approach applicable to design styles such as standard cells, gate arrays, and field programmable gate arrays. In this approach, the nodes of the circuit have a common height but differ in length, and they can be placed in horizontal rows, where each row has the same common height as the nodes. The space between rows is reserved for routing. The number of rows is a user-chosen parameter and is usually chosen so that the layout space used is approximately a square. The length of a row is the sum of the lengths of the nodes assigned to it. Therefore, to avoid wasted space at the end of short rows, the placement algorithm must balance the lengths of rows.

OUTLINE OF ROW-PLACE
Roughly, the min-cut approach used is the same as in [20]. We alternate the partitioning of the circuit nodes by vertical and horizontal lines until the nodes are localized in small areas where they can be assigned to specific locations in specific rows. To ease the description of the process, let us define a rectangulation of the layout surface to consist of:
1) A partition of the rows into horizontal slabs. Each slab consists of one or more consecutive rows and each row belongs to a unique slab.
2) A partition of each slab into an ordered sequence of rectangles by means of vertical lines.
Consider a slab $S$ of a rectangulation that spans rows $l$ to $h$ and consists of $k$ rectangles $r_1, \ldots, r_k$ ordered from left to right. Let $m = \lfloor (l + h)/2 \rfloor$. The slab $S$ can now be partitioned into two slabs $U$ and $D$ by a horizontal line that cuts each rectangle $r_i$ in $S$ into two rectangles $u_i$ and $d_i$. Slab $U$ spans rows $m + 1$ to $h$ and consists of $k$ rectangles $u_1, \ldots, u_k$, and slab $D$ spans rows $l$ to $m$ and consists of $k$ rectangles $d_1, \ldots, d_k$. A rectangulation can be refined by cutting each of its slabs with more than one row by a horizontal line. Call this operation a y-refinement.
A slab $S$ that consists of $k$ rectangles $r_1, \ldots, r_k$ can be refined by cutting each one of its rectangles in half by a vertical line. Let $x_i$ and $y_i$ be the left and right halves of rectangle $r_i$. The refined slab consists of $2k$ rectangles $x_1, y_1, \ldots, x_k, y_k$. An x-refinement is the application of the above operation to each slab of the rectangulation.
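The two refinement operations can be sketched in Python as follows. This is only an illustration and not the ROW-PLACE implementation: the function names `y_refine_rows` and `x_refine_slab` are hypothetical, the cut row is assumed to be rounded down, and the `split` callable stands in for the partitioning algorithm.

```python
def y_refine_rows(low, high):
    """Split the row range low..high of a slab into (U_rows, D_rows).

    Assumes the cut row is m = floor((low + high) / 2): slab D keeps
    rows low..m and slab U takes rows m+1..high.
    """
    if low >= high:
        raise ValueError("a single-row slab cannot be y-refined")
    m = (low + high) // 2
    return (m + 1, high), (low, m)


def x_refine_slab(rectangles, split):
    """Replace each rectangle r_i by its left and right halves x_i, y_i.

    `rectangles` is the left-to-right list of node collections in a slab.
    `split` is any callable returning (left_nodes, right_nodes).
    """
    refined = []
    for nodes in rectangles:
        left, right = split(nodes)
        refined.extend([left, right])
    return refined
```

For a slab spanning rows 1 to 4, `y_refine_rows(1, 4)` assigns rows 3 and 4 to the upper slab and rows 1 and 2 to the lower slab.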
A placement of a circuit into a rectangulation consists of assigning each of its nodes to a unique rectangle. The length of a rectangle residing in a slab that spans rows $l$ to $h$ is equal to the sum of all the lengths of the nodes assigned to it divided by $h - l + 1$. The x-coordinates of rectangles are computed according to the order of rectangles in their respective slabs from left to right. Consider a slab of $k$ rectangles $r_1, \ldots, r_k$ that are ordered from left to right. Let $x_i$ be the x-length of rectangle $r_i$. The x-coordinate of rectangle $r_i$ is $x_i/2 + \sum_{j=1}^{i-1} x_j$. Thus, $r_1$ has $x_1/2$ as its x-coordinate, $r_2$ has $x_2/2 + x_1$ as its x-coordinate, and so on. Each row in the placement is at some y level. Let $h$ now denote the common height of all the nodes of the circuit (not the row index used above). Then row $i$ has the y-coordinate $h/2 + 2(i - 1)h$. Thus, it is assumed as in Timberwolf [7] that adjacent rows are separated by distance $h$. This assumption is only used to estimate the wire-length; the separation between adjacent rows can only be determined after routing. The y-coordinate of a slab is the average y-coordinate of its rows. Each rectangle of the slab has the same y-coordinate as the slab it belongs to. The x and y coordinates of a node in a placement are those of its enclosing rectangle. The length of a net in a placement is the half-perimeter of the rectangle that encloses all its pins. The wire-length of the placement is the sum of the lengths of all nets.
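The coordinate and wire-length rules of this paragraph can be written out as a short Python sketch. The function names are hypothetical and pins are simplified to absolute (x, y) points; the sketch only restates the formulas above.

```python
def rectangle_x_centers(lengths):
    """x-coordinate of each rectangle in a slab, left to right:
    half its own length plus the lengths of all rectangles before it."""
    centers, offset = [], 0.0
    for length in lengths:
        centers.append(offset + length / 2)
        offset += length
    return centers


def row_y_center(i, height):
    """y-coordinate of row i (1-based), assuming adjacent rows are
    separated by one node height, as in the wire-length estimate."""
    return height / 2 + 2 * (i - 1) * height


def half_perimeter(pins):
    """Half-perimeter of the bounding box of a net's pins [(x, y), ...]."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def wire_length(nets):
    """Sum of the half-perimeter lengths of all nets."""
    return sum(half_perimeter(pins) for pins in nets)
```

With a node height of 5, rows 1 and 2 are centered at y = 2.5 and y = 12.5, which leaves a gap of exactly one node height between them.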
A placement into a rectangulation can be refined by applying either a y-refinement or an x-refinement. When a rectangle is cut into two rectangles, the nodes assigned to it are partitioned between the two resulting rectangles so that the two new rectangles are about equal in length.
The initial rectangulation is one that consists of one slab that spans all the rows and that consists of one rectangle. By repeated applications of x-refinements and y-refinements, we reach a rectangulation where each rectangle contains one node of the circuit and each slab spans one row. At this stage, the coordinates of each node specify the location of that node in the final placement. Many different sequences of x-refinements and y-refinements have been tried. The best approach was the one that alternates between the two refinements as in [20].
Figure 1 shows a placement of a small circuit obtained using ROW-PLACE. The lengths of nodes 1 through 8 are 22, 18, 30, 10, 15, 25, 35, and 5 respectively. The common height of the nodes is 5. Nets are represented by dashed lines connecting the nodes. The sequence of rectangulations that led to the placement in Figure 1 is shown below. Each rectangulation is shown as a sequence of slabs. The rows spanned by each slab are indicated. Rectangles in each slab are listed from left to right as sets of nodes.
Initial rectangulation:
slab(row 1, row 4): {1, 2, 3, 4, 5, 6, 7, 8}
After y-refinement:
slab1(row 3, row 4): {1, 2, 3, 4}
slab2(row 1, row 2): {5, 6, 7, 8}
After x-refinement:
slab1(row 3, row 4): {1, 3}, {2, 4}
slab2(row 1, row 2): {5, 7}, {6, 8}
After y-refinement, the final rectangulation is:
slab1.1(row 4, row 4): {1}, {2}
slab1.2(row 3, row 3): {3}, {4}
slab2.1(row 2, row 2): {5}, {6}
slab2.2(row 1, row 1): {7}, {8}

TERMINAL PROPAGATION
Some of the earlier min-cut algorithms partitioned blocks of nodes without considering connections to other blocks. This scheme was inadequate because the nets that enter a block from the outside have an effect on where the elements of this block ought to be placed. Terminal propagation is a technique that allows each block of the circuit to remember and to include the effect of its connections with other blocks. This technique was first introduced by Dunlop and Kernighan [20], but it required the computation of a rectilinear Steiner tree. In ROW-PLACE, a simple terminal propagation approach is used.
Given a hypergraph $G(V, E)$ and a subset $X \subset V$, let $G[X]$ denote the sub-hypergraph of $G$ induced by $X$. $G[X]$ itself is a hypergraph with node set $X$ and net set $\{N \cap X : N \in E \text{ and } |N \cap X| > 1\}$. When a rectangle $R$ is being cut into two rectangles $U$ and $D$, the set $X$ of nodes in $R$ must be partitioned between $U$ and $D$. This is accomplished by a partitioning algorithm applied to the sub-hypergraph $G[X]$ of the circuit hypergraph $G$. Terminal propagation is included by adding two artificial nodes $u$ and $d$ of zero length to the node set of $G[X]$. Node $u$ ($d$) is to remain locked in $U$ ($D$) during the partitioning of $G[X]$. Nets that have nodes outside rectangle $R$ are biased toward $U$ or $D$.
Let $N$ be a net of the circuit that has nodes in common with rectangle $R$. Let $m$ and $M$ be the minimum and maximum of the coordinates of the nodes in $N$ in the direction perpendicular to the cut line. Let $m^*$ and $M^*$ be two parameters chosen as explained below. If both $m < m^*$ and $M^* < M$, then $N$ is deleted from $G[X]$. Otherwise, $N$ is represented in $G[X]$ by the net $(N \cap X) \cup B$, where $B$ is equal to $\emptyset$, $\{d\}$ or $\{u\}$ according to whether $m^* \le m \le M \le M^*$, $m < m^* \le M \le M^*$, or $m^* \le m \le M^* < M$. If the cut line is horizontal, then $m^* = M^* = y$, where $y$ is the y-coordinate of rectangle $R$. If the cut line is vertical, then $m^* = x - l/8$ and $M^* = x + l/8$, where $x$ and $l$ are respectively the x-coordinate and the length of rectangle $R$. Basically, $m^*$ and $M^*$ are the levels of two lines on either side of the cut line and parallel to it. Let $S$ denote the area between these two lines. A net in $G[X]$ is either deleted, biased to the appropriate side, or not biased according to whether it has nodes on both sides of $S$, on only one side of $S$, or exclusively inside $S$. In the case of a vertical cut, $m^*$ ($M^*$) is the mid-point of the segment between the center of rectangle $R$ and the target center of rectangle $D$ ($U$). The above values chosen for $m^*$ and $M^*$, in addition to being the most intuitive and natural for this type of placement, have been empirically verified to be the best after experimentation with several other choices. Because nodes have equal heights but different lengths, it was necessary to include a buffer zone in the case of a vertical cut but not in the case of a horizontal cut. In the case of a vertical cut, a node close to the cut line is not allowed to bias the partitioning process one way or another because it may end up on the other side of the cut line due to the different lengths of nodes. This does not happen for horizontal cuts, since a node, once assigned to one side of a horizontal cut line, stays on this same side for the remainder of the placement process.
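The net classification used in terminal propagation can be illustrated with a short Python sketch. The names `thresholds` and `classify_net` and the string labels are hypothetical; the sketch assumes that a smaller coordinate lies on the D side and a larger one on the U side, which matches the definition of the two threshold lines for a vertical cut.

```python
def thresholds(cut, x, y, length):
    """Return (m_star, M_star) for a rectangle centered at (x, y).

    A horizontal cut has no buffer zone; a vertical cut uses a buffer
    of one eighth of the rectangle length on each side of the center.
    """
    if cut == "horizontal":
        return y, y
    if cut == "vertical":
        return x - length / 8, x + length / 8
    raise ValueError("cut must be 'horizontal' or 'vertical'")


def classify_net(m, M, m_star, M_star):
    """Classify a net from the extreme coordinates m <= M of its nodes.

    Because the net has a node inside the rectangle being cut, its
    extent always reaches the zone between the two threshold lines.
    """
    if m < m_star and M_star < M:
        return "deleted"     # nodes on both sides of the zone
    if m < m_star:
        return "bias_d"      # extends only toward D: add bias node d
    if M_star < M:
        return "bias_u"      # extends only toward U: add bias node u
    return "unbiased"        # lies entirely inside the zone
```

For a rectangle of length 16 centered at x = 40, a vertical cut gives thresholds 38 and 42, so a net spanning x = 10 to x = 40 is biased toward D.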

THE PARTITIONING ALGORITHM
The input to the partitioning algorithm considered here is a hypergraph $G(V, E)$ with integer node sizes. Define the size $S(A)$ of a subset $A \subset V$ as the sum of the sizes of its constituent nodes. A partition of $G(V, E)$ is an ordered pair $(U, D)$ such that $U \cup D = V$ and $U \cap D = \emptyset$. A partition $(U, D)$ is feasible if $|S(U) - T| \le M$, where $M$ is the maximum node size in $V$ and $T$ is a desired target size for $U$. In ROW-PLACE, $T$ is chosen so that a rectangle is cut into two rectangles of about equal length. The input hypergraph may contain one or both of the two bias nodes $u$ and $d$ of zero length that are designated to stay locked in $U$ and $D$ respectively during the partitioning process.
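The feasibility condition can be checked directly, as in the following Python sketch. The function names and the dictionary-based representation of node sizes are assumptions made for illustration.

```python
def size(nodes, node_size):
    """S(A): the sum of the sizes of the nodes in A."""
    return sum(node_size[v] for v in nodes)


def is_feasible(U, D, node_size, target):
    """True if (U, D) partitions all nodes and |S(U) - T| <= M,
    where M is the largest node size and T is the target size of U."""
    U, D = set(U), set(D)
    if (U | D) != set(node_size) or (U & D):
        return False
    largest = max(node_size.values())
    return abs(size(U, node_size) - target) <= largest
```

The tolerance of one maximum node size reflects that an exact balance may be unreachable when nodes have different sizes.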
Compaction is an operation in which subsets of nodes are coalesced (compacted or clustered) into a single node each. When a subset of nodes $X \subset V$ of a hypergraph $G(V, E)$ is compacted into a single node $x$, the nets incident with node $x$ in the new hypergraph are of the form $(e \cap (V - X)) \cup \{x\}$, where $e \in E$ is a net originally incident with some node in $X$, with the provision that nets that are reduced to one node are discarded. All other nets and the nodes in $V - X$ remain the same.
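A single compaction step can be sketched in Python as follows. The function name `compact` and the representation of nets as sets are assumptions; the text does not say whether nets that become identical after compaction are merged, so this sketch keeps them as separate nets.

```python
def compact(nets, X, x):
    """Coalesce the nodes in X into one new node x.

    Each net touching X becomes (e - X) | {x}; nets reduced to a
    single node are discarded; all other nets are left unchanged.
    """
    X = set(X)
    result = []
    for e in nets:
        e = set(e)
        if e & X:
            e = (e - X) | {x}
        if len(e) >= 2:
            result.append(frozenset(e))
    return result
```

For example, compacting nodes 1 and 2 into a cluster node removes the net {1, 2} altogether and turns {2, 3, 4} into a net joining the cluster with nodes 3 and 4.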
The algorithm, BISECT, briefly presented here is described in [25] and in more detail in [26]. BISECT uses information collected during iterative improvement to incorporate compactions of nodes in a dynamic way. In [25,26], it is empirically shown that BISECT results can be up to 73 times better than those of the Fiduccia-Mattheyses algorithm (FM). The good empirical performance of ROW-PLACE is mainly due to the highly effective partitioning algorithm BISECT.
Let BISECT_AND_COMPACT(G, P1, P2, G', P'1, P'2) be a function that takes as input a hypergraph G(V, E) along with an initial feasible partition (P1, P2) of G, and outputs a compacted hypergraph G'(V', E') along with a feasible partition (P'1, P'2) of G'.
The following is a pseudo-code of BISECT:
  Generate an initial feasible partition (U, D) of G;
  save (U, D) as the best current feasible partition;
  WHILE (improvements are made) DO
    let H be a copy of the hypergraph G;
    let (P1, P2) be the best current feasible partition;
    WHILE (improvements or compactions are made) DO
      BISECT_AND_COMPACT(H, P1, P2, H', P'1, P'2);
      set H = H' and (P1, P2) = (P'1, P'2);
      compute the new best feasible partition of G from (P1, P2);
Let (P1, P2) be a partition (not necessarily feasible) of a hypergraph $G(V, E)$. For each $i \in V$, define $B(i)$ to be the subset of the partition that contains node $i$, and let $\bar{B}(i) = V - B(i)$ be the other subset. The cost of a partition $(U, D)$ is the number of nets cut and is denoted by $cost(U, D)$. The gain of a node $i$ is defined as $cost(B(i), \bar{B}(i)) - cost(B(i) - \{i\}, \bar{B}(i) \cup \{i\})$, which is the decrease in cost due to moving node $i$ from $B(i)$ to $\bar{B}(i)$.
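The cost and gain definitions can be made concrete with a small Python sketch. It recomputes the cut from scratch for clarity; the names are hypothetical and no attempt is made to reproduce the incremental gain bookkeeping that an efficient partitioner would use.

```python
def cost(nets, U):
    """Number of nets cut by the partition (U, V - U): a net is cut
    when it has at least one node on each side."""
    U = set(U)
    return sum(1 for e in nets if 0 < len(set(e) & U) < len(set(e)))


def gain(nets, U, i):
    """Decrease in cost obtained by moving node i to the other side."""
    U = set(U)
    moved = U - {i} if i in U else U | {i}
    return cost(nets, U) - cost(nets, moved)
```

On the chain of nets {1, 2}, {2, 3}, {3, 4} with U = {1, 3}, all three nets are cut, and moving node 3 to the other side uncuts two of them, so its gain is 2.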
The function BISECT_AND_COMPACT performs the following steps:
1) Free all nodes and set $c = 0$.
2) Forward move: Of the subsets P1 and P2, select the one that has excess size, call it F, and call the other subset T. Move a sequence of nodes $f_1, \ldots, f_k$ from F to T using a highest-gain-first scheme until either F is out of free nodes or a stopping criterion is satisfied. Lock $f_1, \ldots, f_k$ in T, set $c = c + 1$, and let $L_c = \{f_1, \ldots, f_k\}$.
3) Restore balance: If the size of P1 is equal to its target size then do nothing. Otherwise, of P1 and P2, call F the subset that has excess size, and call the other subset T. Move a sequence of nodes $r_1, \ldots, r_j$ from F to T using a highest-gain-first scheme until either F is out of free nodes or the size balance cannot be improved. Lock $r_1, \ldots, r_j$ in T, set $c = c + 1$, and let $L_c = \{r_1, \ldots, r_j\}$.
4) Save: If the current partition is an improved feasible partition then save it.
5) Repeat: If there are still free nodes then go to Step 2.
6) Compaction: Compact G by coalescing together some connected subsets of $G[L_1], G[L_2], \ldots, G[L_c]$.
BISECT is the main feature of our min-cut approach. However, to keep this paper short, the reader is referred to [25,26], in which the specific details of the implementation of BISECT are discussed at length. This leaves enough space to discuss additional relevant details of our implementation of ROW-PLACE without duplicating material available elsewhere. In the remainder of this section, the methods used to generate the initial feasible partition are presented.
Random initial feasible partition: Here each node is placed with equal probability in U or in D until one of U or D exceeds its target size.At this point, the remaining nodes are put in the other subset that is not yet filled to capacity.As an exception, the first two nodes are placed in U and D respectively to guarantee that both subsets of the partition are not empty.This initial partitioning scheme is used when the input hypergraph does not contain any of the bias nodes.
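A Python sketch of this random construction is given below. The function name is hypothetical, and the target size of D is assumed to be the total size minus the target size of U, which the text does not state explicitly.

```python
import random


def random_initial_partition(nodes, node_size, target_u, rng=random):
    """Randomly build (U, D); the first two nodes seed U and D."""
    nodes = list(nodes)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes")
    total = sum(node_size[v] for v in nodes)
    # Assumption: D's target is whatever remains after U's target.
    target = {"U": target_u, "D": total - target_u}
    parts = {"U": [nodes[0]], "D": [nodes[1]]}

    def over(side):
        return sum(node_size[v] for v in parts[side]) > target[side]

    # Side that has exceeded its target, if any.
    filled = next((s for s in ("U", "D") if over(s)), None)
    for v in nodes[2:]:
        if filled is None:
            side = rng.choice(("U", "D"))
        else:
            side = "D" if filled == "U" else "U"
        parts[side].append(v)
        if filled is None and over(side):
            filled = side
    return parts["U"], parts["D"]
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient when several runs are compared.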
One-sided construction: This method is used to generate the initial feasible partition when the input hypergraph has only one bias node.In this case the nodes are successively put in either U or D depending on whether the bias node is u or d.Since both cases are similar, assume without loss of generality that the bias node is u.Let T be the desired target size of U. Initially the bias node u is put in U. Then other nodes are consecutively added to U until the target size T is reached or is exceeded.At this point, the remaining nodes are put in D. The nodes are put in U using a highest-gain-first scheme, where the gain of a node is the number of nets connecting it to U. To guarantee that both U and D are not empty, the last node is assigned to D even if U has not yet been filled to capacity.
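The one-sided construction can be sketched in Python as follows, for the case where the bias node is u. The function name, the representation of nets as sets, and the tie-breaking rule (first node in list order) are assumptions.

```python
def one_sided_construction(nodes, nets, node_size, target_u, bias="u"):
    """Grow U from the bias node, highest gain first.

    The gain of a free node is the number of nets connecting it to U.
    The last free node always goes to D so that D is not empty.
    """
    U = {bias}
    free = [v for v in nodes if v != bias]
    size_u = 0

    def gain(v):
        return sum(1 for e in nets if v in e and (set(e) - {v}) & U)

    while len(free) > 1 and size_u < target_u:
        best = max(free, key=gain)   # ties: first in list order
        free.remove(best)
        U.add(best)
        size_u += node_size[best]
    return U, set(free)
```

Because gains are recomputed after every addition, nodes attached to nets already pulled into U are preferred next, which tends to keep those nets uncut.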
Two-sided construction: This method is used to generate the initial feasible partition when the input hypergraph contains the two bias nodes u and d. Initially the bias node u is put in U, and the bias node d is put in D. Then nodes are alternately put in U and D until one of the subsets U or D reaches or exceeds its target size. At this point, the remaining nodes are put in the other subset that is not yet filled to capacity.
In this initial partitioning scheme, the gain of a node is the number of nets connecting it to U minus the number of nets connecting it to D. The next node to be added to U has the highest current gain among all remaining unassigned nodes. The next node to be added to D has the least current gain among all remaining unassigned nodes.
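The two-sided construction can be sketched along the same lines. The function name, the choice of U for the first turn, and the tie-breaking rule are assumptions not fixed by the text.

```python
def two_sided_construction(nodes, nets, node_size, target_u, target_d):
    """Alternately grow U (highest gain) and D (lowest gain) from the
    bias nodes 'u' and 'd'; gain = nets to U minus nets to D."""
    U, D = {"u"}, {"d"}
    free = [v for v in nodes if v not in ("u", "d")]
    size_u = size_d = 0

    def gain(v):
        to_u = sum(1 for e in nets if v in e and (set(e) - {v}) & U)
        to_d = sum(1 for e in nets if v in e and (set(e) - {v}) & D)
        return to_u - to_d

    turn = "U"   # assumption: U receives the first node
    while free and size_u < target_u and size_d < target_d:
        if turn == "U":
            v = max(free, key=gain)
            U.add(v)
            size_u += node_size[v]
        else:
            v = min(free, key=gain)
            D.add(v)
            size_d += node_size[v]
        free.remove(v)
        turn = "D" if turn == "U" else "U"
    # Remaining nodes go to the subset that is not yet filled.
    (D if size_u >= target_u else U).update(free)
    return U, D
```

Nodes tied to u by many nets are drawn into U early and nodes tied to d into D, so nets involved in terminal propagation start out uncut whenever possible.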
One-sided and two-sided constructions of the initial partition are used to reduce the wire-length of nets involved in terminal propagation. For example, suppose a rectangle is being cut in half by a vertical line into two rectangles U and D. The x-length of a net involved in terminal propagation will be at least the minimum length of the new rectangles U and D if this net is cut, since this net will have to cross the whole length of either U or D. One-sided and two-sided constructions of the initial partition are used so that most of the nets involved in terminal propagation are not cut by the initial partition and are thus favored to remain uncut in later stages of the partitioning algorithm.

TUNING OF ROW-PLACE
In this section we present some enhancements used in the implementation of ROW-PLACE.
Preprocessing: The initial rectangulation consists of one slab that spans all the rows and contains one rectangle R. Let l and h be the length and height of R. Normally, R is a square, i.e., l = h. However, when N = [l/h] > 1, better performance was achieved by first cutting R by vertical lines into N almost-square rectangles. This was done by repeatedly cutting the longest rectangle in the slab into two rectangles by a vertical line, where the target length of the leftmost of the two new rectangles was set equal to h. This process terminates when the length of the longest rectangle in the slab is strictly less than 1.75h.
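The preprocessing loop can be simulated on rectangle lengths alone, as in the Python sketch below. It is an idealization: every cut is assumed to hit its target length exactly, whereas in a real run the achieved length depends on the node lengths. The function name is hypothetical.

```python
def preprocess_lengths(length, height):
    """Cut the longest rectangle until every rectangle in the slab is
    shorter than 1.75 * height. Each cut leaves a left piece of length
    `height` and a right piece holding the remainder."""
    rects = [float(length)]
    while max(rects) >= 1.75 * height:
        i = rects.index(max(rects))
        longest = rects[i]
        rects[i:i + 1] = [float(height), longest - height]
    return rects
```

A slab of length 40 and height 10 ends up as four squares of side 10, while a slab of length 27 is cut only once because the remaining piece of length 17 is already below the 1.75h threshold.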
Row length adjustment: During y-refinement, each slab that spans more than one row is cut into two slabs by a horizontal line. Consider a slab $S$ consisting of $k$ rectangles $R_1, \ldots, R_k$ listed in their order from left to right. The process of dividing $S$ into two slabs proceeds by dividing each rectangle $R_i$ of $S$ into two rectangles $U_i$ and $D_i$, where $U_i$ and $D_i$ belong to the upper and lower new slabs respectively. Ideally, $U_i$ and $D_i$ should have equal lengths, but in practice this may not always be achieved. If no adjustments are made, the lengths of the final rows of the placement may be unbalanced. In addition, unbalanced cuts of earlier (leftmost) rectangles in a slab cause opposite shifts in the positions of later rectangles in the two new slabs. These opposite shifts cause nets to stretch in the x-direction and thus increase the wire-length. This problem is corrected by processing the rectangles $R_1, \ldots, R_k$ in that order. When $R_i$ is being cut into $U_i$ and $D_i$, the previous rectangles $R_1, \ldots, R_{i-1}$ have already been cut. Let top (bottom) denote the total length of $U_1, \ldots, U_{i-1}$ ($D_1, \ldots, D_{i-1}$). The target length of $U_i$ is set equal to the length of $R_i$ plus bottom − top to adjust for the previous length imbalance.
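The running correction can be sketched in Python as follows. The sketch transcribes the target rule exactly as stated in the text; the function name and the `cut` callable, which stands in for the partitioning step and reports the lengths actually achieved, are hypothetical.

```python
def upper_targets(lengths, cut):
    """Return the target length used for each U_i, left to right.

    `lengths[i]` is the length of R_i and `cut(i, target)` returns the
    achieved (len_U_i, len_D_i) for that rectangle.
    """
    top = bottom = 0.0
    targets = []
    for i, length in enumerate(lengths):
        target = length + (bottom - top)   # rule as stated in the text
        targets.append(target)
        len_u, len_d = cut(i, target)
        top += len_u
        bottom += len_d
    return targets
```

If the first cut leaves the upper slab 4 units longer than the lower one, the target for the next upper rectangle is reduced by 4, pulling the two new slabs back toward equal length.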
Another effort to balance the lengths of rows was made by restricting x-refinements. Specifically, during x-refinement, a rectangle that spans more than one row is not allowed to be cut unless the number of nodes in it is greater than UB = min(16, [high/low]), where low and high are the minimum and the maximum node length. This is done so that in subsequent y-refinements, the number of nodes per rectangle is large enough to permit the partitioning algorithm to achieve the desired size ratio of the two parts. For example, suppose the first rectangle in a slab has two nodes of lengths 10 and 200, respectively. During y-refinement, such a rectangle is problematic because it cannot be cleanly divided into two rectangles of about equal length, and thus it will lead to a large size imbalance that decreases the placement quality. The above restriction on x-refinements is meant to avoid the appearance of such problematic rectangles.
Local improvements: Two-interchange of nodes is used as a last step in improving the wire-length. However, nodes are only allowed to move locally. More precisely, the nodes are stored in a 2-dimensional table T. A node in T(i, j) can only be interchanged with a node in T(m, n), where $|i - m| \le$ ROW_RANGE and $|j - n| \le$ RANGE. ROW_RANGE and RANGE are two user-specified parameters, and they were respectively set to 1 and 5 in our experimentation.
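The locality restriction on interchanges can be sketched in Python as follows. The table is assumed to be a list of rows, each a list of nodes, with 0-based indices; the function names are hypothetical and the defaults mirror the parameter values quoted above.

```python
def may_interchange(pos_a, pos_b, row_range=1, col_range=5):
    """True if the table positions (i, j) and (m, n) are close enough
    for their nodes to be interchanged."""
    (i, j), (m, n) = pos_a, pos_b
    return abs(i - m) <= row_range and abs(j - n) <= col_range


def candidates(table, i, j, row_range=1, col_range=5):
    """Yield every table position whose node may swap with T(i, j)."""
    for m in range(max(0, i - row_range), min(len(table), i + row_range + 1)):
        row = table[m]
        for n in range(max(0, j - col_range), min(len(row), j + col_range + 1)):
            if (m, n) != (i, j):
                yield (m, n)
```

Restricting each node to a small window keeps the number of candidate swaps per node bounded by a constant, independent of the circuit size.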
After one two-interchange step, a node in a row may overlap with other nodes in the same row. Node overlaps are removed by adjusting node positions by a sweep of each row from left to right.
Iteration of refinements: Due to the use of terminal propagation, the partitioning of the nodes of one rectangle is influenced by the coordinates of nodes in other rectangles. Thus it is possible to get better performance by iterating x-refinements and y-refinements as long as the wire-length of the current placement can be improved. Normally each refinement is iterated 3 to 4 times.
Iteration of the partitioning algorithm: When the input subcircuit to the partitioning algorithm does not contain either of the two bias nodes, the initial partition is randomly generated. To improve performance in this case, the best partition generated by LIMIT runs of the partitioning algorithm is used, where LIMIT is a user-chosen parameter. In our experimentation, LIMIT was set equal to 5.

EXPERIMENTAL RESULTS
All our experiments were performed on a DEC 5000-240 workstation with 96 Megabytes of RAM. The results of ROW-PLACE were compared with those of Timberwolf 6. Due to the use of randomness, both ROW-PLACE and Timberwolf 6 produce different results in different runs. For this reason, all results used are averages of 5 different runs in each case.
The circuits used are listed in Table I in increasing order of size. The circuits Primary 1 and Primary 2 are two benchmark circuits from the 1987 Physical Design Workshop. The circuits fract, struct, biomed, industry2, industry3 are benchmark circuits from the 1991 Physical Design Workshop.
The first set of experiments shows the improved performance of ROW-PLACE due to preprocessing for those circuits with aspect ratio different from 1. In Table II, the wire-length results of ROW-PLACE without preprocessing ($W_{np}$) are expressed as percentages over the wire-length produced by ROW-PLACE with preprocessing ($W_p$), i.e., the numbers shown are of the form $(W_{np} - W_p)/W_p \times 100$. The number shown in the bottom corner of Table II is $(\sum W_{np} - \sum W_p)/\sum W_p \times 100$. For individual circuits, the improvement due to preprocessing ranges from 2.9% for circuit t200g to 22.9% for circuit ckta7. The overall improvement for all circuits is 9%.
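The two percentage measures used in the tables can be written as the following Python helpers. The function names are hypothetical and the numbers in the usage note are made up for illustration; they are not taken from the tables.

```python
def percent_over(w, w_ref):
    """Percentage by which wire-length w exceeds the reference w_ref."""
    return (w - w_ref) / w_ref * 100


def total_percent_over(ws, w_refs):
    """The same measure applied to wire-lengths summed over all circuits,
    as in the bottom entry of a table."""
    return (sum(ws) - sum(w_refs)) / sum(w_refs) * 100
```

For instance, wire-lengths of 110 and 240 against references of 100 and 200 are 10% and 20% over individually, but about 16.7% over in total, because the total weights larger circuits more heavily than a plain average of the per-circuit percentages would.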
The second set of experiments is intended to show the contribution of some specific parts of ROW-PLACE: (1) the improvement due to local node interchange, (2) the improvement due to the use of one-sided and two-sided constructions to generate the initial partition of the partitioning algorithm, and (3) the improvement due to the use of BISECT as the partitioning algorithm. The results are shown in Table III and are expressed as wire-length percentages over the wire-length produced by ROW-PLACE. The first column (local) shows the improvement due to the local node interchange step at the end of ROW-PLACE. The second column (random) shows the improvement due to the use of one-sided and two-sided constructions to generate the initial partition of BISECT over the use of a random initial partition. The third column (fm) shows the improvement due to the use of BISECT rather than the Fiduccia-Mattheyses algorithm (FM) [27] as the partitioning algorithm in ROW-PLACE. The last column (fm_random) expresses the results of ROW-PLACE using FM with a random initial partition as the partitioning algorithm, as percentages over the results of ROW-PLACE. The bottom row (total) of Table III expresses the results of the other algorithms as percentages over the results of ROW-PLACE in terms of the sum of the wire-length for all the circuits involved. The following observations can be made:
1) The improvement due to local node interchange ranges from 1.5% for the circuit industry3 to 7.6% for the circuit good. The overall improvement for all circuits is 2.2%.
2) The overall improvement for all circuits due to the use of one-sided and two-sided constructions of the initial partition over the use of a random initial partition in BISECT is 2.4%. However, ROW-PLACE using a random initial partition in BISECT generated better results for some individual circuits, as indicated by the negative numbers in the second column of Table III. This is mostly due to the stability of BISECT as is demonstrated in [25,26]. Nevertheless, one-sided and two-sided constructions of the initial partition lead to better results overall and for most individual circuits.
3) The third column in Table III shows the advantage of using BISECT rather than FM as the partitioning algorithm. Except for the circuit l000g, ROW-PLACE performed better using BISECT. The improvement can be as much as 17% for the circuit ckta7 and is 9.4% overall.
4) The 4th column in Table III shows the significance of using one-sided and two-sided constructions to generate the initial partition for an unstable algorithm such as FM. By using FM with a random initial partition instead of BISECT in ROW-PLACE, the results are 22.5% worse, in comparison with 9.4% worse when one-sided and two-sided constructions are used to generate the initial partition of FM. This is to be contrasted with the 2.4% deterioration in performance when BISECT with a random initial partition was used in ROW-PLACE (the second column). These results show that one-sided and two-sided constructions provide good starts for iterative improvement, while at the same time they show that BISECT is not as sensitive as FM to the initial partition, as is indicated in [25,26].
The third set of experiments is intended to show the good performance of ROW-PLACE in comparison with Timberwolf 6 [7], a simulated-annealing-based algorithm widely recognized as the champion of row-based placement algorithms. The default parameter settings were used to run Timberwolf 6. Timberwolf 6 was also run with the parameter TWSCfast set to 10 (tw_fast(10)). This effectively speeds up Timberwolf 6 by a factor of 10 so that its running time is comparable to that of ROW-PLACE. The purpose of this is to determine whether or not Timberwolf 6 is capable of producing good placements when it is run at the same speed as ROW-PLACE. ROW-PLACE was run in three different settings: (1) the usual setting as described in the paper (usual), (2) the usual setting without iterating refinements (no_iter), and (3) the same as the no_iter setting without iterating the partitioning algorithm (LIMIT 1) and with node interchange limited to adjacent nodes in the same row (fast).
The wire-length results are compared in Table IV. The results of each algorithm are expressed as percentages over the results of Timberwolf 6. The bottom line of Table IV shows percentages over the results of Timberwolf 6 in terms of the sum of the wire-length over all circuits.
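As an illustration only, the percentage convention used in Tables III and IV can be sketched in Python as follows. The function names, circuit names, and wire-length values are hypothetical and are not taken from the tables. The sketch assumes that a percentage "over" a reference means the relative excess in wire-length, and that the bottom row is computed from the sums of wire-length over all circuits rather than by averaging the per-circuit percentages.

```python
def percent_over(result, baseline):
    """Percentage by which `result` exceeds `baseline`.

    A negative value means `result` has the shorter wire-length.
    """
    return 100.0 * (result - baseline) / baseline


def total_percent_over(results, baselines):
    """Bottom-row figure: compare the sums of wire-length over all circuits."""
    return percent_over(sum(results.values()), sum(baselines.values()))


# Hypothetical wire-lengths in arbitrary units (assumed values, not from the paper).
baseline = {"circuit_a": 1000.0, "circuit_b": 4000.0}
variant = {"circuit_a": 980.0, "circuit_b": 4200.0}

per_circuit = {name: percent_over(variant[name], baseline[name]) for name in baseline}
overall = total_percent_over(variant, baseline)
```

With these assumed values the variant is 2% better on circuit_a, 5% worse on circuit_b, and 3.6% worse in total, which is how a column can contain negative entries for individual circuits while its bottom-row entry is positive.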
The timing results are shown in Table V and they are expressed as multiples of the run-time of ROW-PLACE using the third setting (fast), which is shown in CPU seconds in the first column of this table.
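The normalization of Table V can be turned into speed-up factors with a short calculation, sketched below in Python. The three multiples are the overall figures quoted in the discussion of Table V (163.8 for Timberwolf 6, 14.0 for ROW-PLACE(usual), and 6.6 for ROW-PLACE(no_iter), with ROW-PLACE(fast) equal to 1.0 by definition); the dictionary layout and the function name are assumptions made for illustration.

```python
# Overall run-time multiples relative to ROW-PLACE(fast), as quoted for Table V.
multiples = {
    "Timberwolf 6": 163.8,
    "ROW-PLACE(usual)": 14.0,
    "ROW-PLACE(no_iter)": 6.6,
    "ROW-PLACE(fast)": 1.0,
}


def speedup_over(reference, multiples):
    """Return how many times faster each other entry is than `reference`."""
    ref = multiples[reference]
    return {name: ref / m for name, m in multiples.items() if name != reference}


speedups = speedup_over("Timberwolf 6", multiples)
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f} times faster than Timberwolf 6")
```

Because every entry is a multiple of the same reference run-time, the ratio of two multiples is the ratio of the underlying run-times, so the CPU seconds in the first column are not needed for this comparison.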
The results of tw_fast(10) are significantly worse than those of Timberwolf 6 and are in fact the worst. The results of ROW-PLACE(usual) range from 7.8% better for the circuit biomed to 7.1% worse for the circuit primary2, and are overall within 1.2% of the results of Timberwolf 6. The results of ROW-PLACE(no_iter) range from 5.1% better for the circuit biomed to 8.8% worse for the circuit primary2, and are overall within 5.4% of the results of Timberwolf 6. The results of ROW-PLACE(fast) are always worse than those of Timberwolf 6 but are consistently better than those of tw_fast(10) except for the small circuits good and fract. Overall, the results of ROW-PLACE(fast) are 16.3% worse than the results of Timberwolf 6 and are about 18.4% (= 34.7% - 16.3%) better than those of tw_fast(10). The timing results in Table V show that ROW-PLACE(usual), ROW-PLACE(no_iter), and ROW-PLACE(fast) are respectively 11.7 (= 163.8/14.0), 24.8 (= 163.8/6.6), and 163.8 times faster than Timberwolf 6. Table V also shows that tw_fast(10) is slower than ROW-PLACE(usual), while Table IV shows that the results of tw_fast(10) are significantly worse than the results of ROW-PLACE(usual). This shows that Timberwolf 6 cannot achieve results comparable to those of ROW-PLACE in the same amount of running time.

CONCLUDING REMARKS
In this paper, a fast and effective algorithm, ROW-PLACE, has been presented. ROW-PLACE achieves results comparable to those of Timberwolf 6 while running 12 times faster. Also, good placements can be achieved 25 times faster than with Timberwolf 6 using ROW-PLACE(no_iter). The good results of ROW-PLACE are achieved using an improved clustering-based partitioning algorithm in combination with constructive methods that reduce the wire-length of nets involved in terminal propagation. This shows that if a good partitioning algorithm is used, min-cut placement can achieve simulated-annealing-quality placement in much less time. ROW-PLACE is quite fast. For the circuit industry3 with 15059 nodes and 21807 nets, ROW-PLACE generated a placement in less than two hours (6355 seconds). Therefore, ROW-PLACE is suitable for use on large problems.

I am indebted to Dr. Carl Sechen for providing me with a copy of Timberwolf 6 and I wish to express to him my sincere thanks. This work was supported by the National Science Foundation under Grant MIP-9208293.

FIGURE
A small circuit placed by ROW-PLACE.

TABLE III
Effect of specific features of ROW-PLACE.

TABLE V
Time comparisons.