Analytical Engines are Unnecessary in Top-down Partitioning-based Placement*

The top-down "quadratic placement" methodology is rooted in such works as [36, 9, 32] and is reputedly the basis of commercial and in-house VLSI placement tools. This methodology iterates between two basic steps: solving sparse systems of linear equations to achieve a continuous placement solution, and "legalization" of the placement by transportation or partitioning. Our work, which extends [5], studies implementation choices and underlying motivations for the quadratic placement methodology. We first recall some observations from [5], e.g., that (i) Krylov subspace engines for solving sparse linear systems are more effective than traditional successive over-relaxation (SOR) engines [33] and (ii) that correlation convergence criteria can maintain solution quality while using substantially fewer solver iterations. The focus of our investigation is the coupling of numerical solvers to iterative partitioners that is a hallmark of the quadratic placement methodology. We provide evidence that this coupling may have historically been motivated by the pre-1990’s weakness of min-cut partitioners, i.e., numerical engines were needed to provide helpful hints to weak min-cut partitioners. In particular, we show that a modern multilevel FM implementation [2] derives no benefit from such coupling. We also show that most insights obtained from study of individual min-cut partitioning instances (within the top-down placement) also hold within the overall context of a complete top-down placer implementation.


INTRODUCTION
In the physical implementation of deep-submicron ICs, placement solution quality is a major determinant of whether timing correctness and routing completion will be achieved. The first-order objective is to place connected modules close together, to reduce total routing and lower bounds on signal delay. This implies a minimum-wirelength based placement objective. Because there are many layout iterations (including those between placement and global/detailed routing, performance optimizations, technology mapping and logic synthesis) and because fast (constructive) placement estimation is needed within the floorplanner, an ideal placement tool will offer fast, consistent, and high-quality results. Due to its speed, "global" perspective, and ability to address wirelength-based objectives, the quadratic placement methodology (cf. such antecedents as [36, 14, 9, 33]) has been widely adopted in industry.
In this work, we revisit the quadratic placement methodology and develop insights into its effective implementation, particularly in light of recent algorithmic developments for partitioning. Our paper is organized as follows. After defining notation, Section 2 synthesizes a generic model of the quadratic placement methodology. The key elements of our model are: (i) top-down hierarchical placement, and (ii) use of a sparse linear systems solver, coupled with a min-cut iterative (FM-type) partitioner, to obtain any given partition within the top-down placement process. Section 3 discusses effective implementation of the quadratic placement methodology. We briefly review previous results [5] that suggest use of Krylov-subspace solvers, along with correlation convergence criteria within the solvers, to improve efficiency. We then focus on the coupling between the linear system solver and the min-cut partitioning step as the key to implementing quadratic placement: issues include the type of wirelength-based objective addressed by the solver, how the solver result is used by the partitioner, and the type of partitioning engine employed. In Section 4, we use a high-quality placement testbed to experimentally assess our various hypotheses. Our most significant results show that the solver-partitioner coupling may have historically been motivated by the pre-1990's weakness of min-cut partitioners, i.e., numerical engines were needed to provide helpful hints to weak min-cut partitioners. In particular, we show that a modern multilevel FM implementation [2] derives no benefit from such coupling. We also contrast the abilities of linear-wirelength and squared-wirelength solver objectives to drive partitioners to good solutions. Finally, we show that insights obtained from study of individual min-cut partitioning instances (within top-down placement) apply to the overall top-down placement results as well. We conclude the paper in Section 5 with a list of ongoing research directions, and some comments on the relevance of the quadratic placement methodology to future design methodology requirements.
2. A SYNTHESIS OF THE QUADRATIC PLACEMENT METHODOLOGY

2.1. Notation and Definitions
A gate-level netlist is represented for placement by a weighted hypergraph. The n vertices correspond to modules, with vertex weights representing module areas or routing requirements. Hyperedges correspond to signal nets, with hyperedge weights representing criticalities and/or multiplicities. The two-dimensional layout region is represented as an array of legal placement locations. Placement seeks to assign all modules of the design to legal locations, such that no modules overlap and chip timing and routability are optimized. The placement problem is a type of NP-hard quadratic assignment.
The numerical techniques used within the quadratic placement methodology apply only to graphs (hypergraphs with all hyperedge sizes equal to 2). Therefore, we must assume some transformation of hypergraphs to graphs via a net model. For a given multipin signal net with k pins, the graph edges that represent the net may be constructed in several ways, e.g., a directed star model of k edges, an unoriented star model, or an unoriented clique model of k(k-1)/2 edges (see [3] for a review). Throughout the discussion and experiments reported below, we use the standard clique model for nets of degree 10 or less, and the directed star model for nets of degree larger than 10, to preserve sparsity. The resulting weighted graph representation G(V, E) of the circuit topology has edge weights a_ij derived by "superposing" all derived edges in the obvious manner. The standard undirected clique model [25] assigns all clique edges weight 1/(k-1).
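The net-model construction above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are ours, and the unit star-edge weight is an assumption (the paper does not specify the directed-star weighting here).

```python
from collections import defaultdict

def net_edges(pins, star_center, threshold=10):
    # One k-pin net -> weighted graph edges: a clique with edge weight
    # 1/(k-1) for k <= threshold, else a star through a fresh center
    # vertex (unit star-edge weight is our assumption, not the paper's).
    k = len(pins)
    if k <= threshold:
        w = 1.0 / (k - 1)
        return [(pins[i], pins[j], w)
                for i in range(k) for j in range(i + 1, k)]
    return [(star_center, p, 1.0) for p in pins]

def superpose(nets, n, threshold=10):
    # Superpose all derived edges into one weight map a[(i, j)];
    # star centers receive fresh vertex indices n, n+1, ...
    a = defaultdict(float)
    next_center = n
    for pins in nets:
        for u, v, w in net_edges(pins, next_center, threshold):
            i, j = min(u, v), max(u, v)
            a[(i, j)] += w
        if len(pins) > threshold:
            next_center += 1
    return dict(a)
```

For example, superposing a 2-pin net {0, 1} (weight 1) and a 3-pin clique {0, 1, 2} (weights 1/2) yields a_01 = 1.5.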
DEFINITION  The n × n Laplacian matrix Q = (q_ij) has off-diagonal entry q_ij equal to -a_ij for i ≠ j, and diagonal entry q_ii equal to Σ_j a_ij, i.e., the sum of edge weights incident to vertex v_i.

Squared Wirelength Formulation  Minimize the objective

Φ_q(x) = Σ_{i>j} a_ij (x_i - x_j)^2

such that x_{c+1}, ..., x_n are fixed. This function can also be written as Φ_q(x) = (1/2) x^T Q x.
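The Laplacian construction and the quadratic-form identity can be checked numerically. A small sketch with numpy (function name and the toy weights are ours):

```python
import numpy as np

def laplacian(a, n):
    # Build the n x n Laplacian from a weight map a[(i, j)] = a_ij:
    # off-diagonal q_ij = -a_ij, diagonal q_ii = sum of incident weights.
    Q = np.zeros((n, n))
    for (i, j), w in a.items():
        Q[i, j] -= w
        Q[j, i] -= w
        Q[i, i] += w
        Q[j, j] += w
    return Q

# The quadratic form reproduces the pairwise sum: x^T Q x equals
# sum_{i>j} a_ij (x_i - x_j)^2; a global 1/2 prefactor, as in the text,
# only rescales the objective and leaves the minimizer unchanged.
a = {(0, 1): 2.0, (1, 2): 1.0}
x = np.array([0.0, 1.0, 3.0])
Q = laplacian(a, 3)
pairwise = sum(w * (x[i] - x[j]) ** 2 for (i, j), w in a.items())
assert np.isclose(pairwise, x @ Q @ x)   # both equal 6.0 here
```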
We are interested in quadratic placers that solve the two-dimensional placement problem with a top- down approach, i.e., one-dimensional placement in the horizontal direction is used to divide the netlist into left and right halves, after which one- dimensional placement in the vertical direction is used to subdivide the netlist into quarters, etc.
Certain vertices are fixed, typically due to pre-placement of I/O pads or the inducing of terminals around a block's periphery during top-down placement. All other vertices are movable. The one-dimensional placement problem seeks to place movable vertices onto the real line so as to minimize an objective function that depends on the edge weights and the vertex coordinates. The n-dimensional placement vector x = (x_i) denotes the physical locations of modules v_1, ..., v_n on the real line, i.e., x_i is the coordinate of vertex v_i. Let c be the number of movable modules and let f = n - c be the number of fixed ("pad") modules. Without loss of generality, the c movable modules are v_1, ..., v_c and the f fixed modules are v_{c+1}, ..., v_n. The modules can always be permuted prior to optimization to ensure this condition is satisfied.

2.2. Essential Structure of a Quadratic Placer
We now state the essential components of the quadratic placement methodology. Our primary goal is to establish the coupling between (i) numerical methods for sparse linear systems and (ii) min-cut optimizations or other means of "spreading", or "legalizing", a continuous placement solution. We illustrate our discussion by referring to the PROUD algorithm of Tsay et al. [32, 33].
An unconstrained formulation is obtained by optimizing the objective function Φ_q(x) for the c movable modules while satisfying the f fixed pad constraints, but without discrete slot constraints. (The term slot constraint, originated by Cheng and Kuh [9], refers to the fact that a legal placement must locate modules within the two-dimensional array of allowed locations (slots). E.g., the first-order slot constraint forces the sum of module coordinates to equal the sum of slot coordinates.) The objective function can be written as Φ_q(x), where x_f denotes the vector of fixed module positions and x_c denotes the vector of movable module positions; the Laplacian Q is partitioned into four corresponding parts Q_cc, Q_cf, Q_fc and Q_ff, with Q_cf^T = Q_fc. In this formulation, the optimal positions of all movable modules lie inside the convex hull of the fixed module locations [33]. Hence, we can consider the minimization problem for Φ_q(x) over this convex hull. Since Φ_q(x) is a strictly convex smooth function over a compact set (in c-dimensional Euclidean space), the unique minimum objective function value is attained either at the extremal (stationary) point or at a boundary point; the nature of the problem implies that it will be at the extremal point. To find the zero of the derivative of Φ_q(x), we solve the c × c linear system

∇Φ_q(x) = Q_cc x_c + Q_cf x_f = 0,

which can be rewritten as

Q_cc x_c = -Q_cf x_f.    (1)

This development is similar to that of other "force-directed" or "resistive network" analogies (see, e.g., [36, 27, 14, 9]). The essential tradeoff is the relaxation of discrete slot constraints, along with the changing of the "true" linear wirelength objective into a squared wirelength objective, to obtain a continuous quadratic objective for which a global minimum can be found. The typical resulting "global placement" concentrates modules in the center of the layout region. Hence, the key issue is how the "global placement" (actually, a continuous solution obtained using an incorrect objective) should be "spread" or "legalized" into a solution to the original discrete problem.

[Footnote 1: Some other clique models have been proposed; e.g., the model of [33] assigns all clique edges weight 8/k^3. We have obtained similar results for the clique model of [33] as well as for a directed star model.]

[Footnote 2: "Decomposition" of the two-dimensional placement problem into independent one-dimensional placement problems is used in the quadratic placement methodology to yield smaller linear systems. Notice that for the quadratic objective, the Euclidean problem decomposes cleanly into coordinates (by Pythagoras' theorem) while the Manhattan problem does not. (Hence, we presumably are minimizing squared Euclidean wirelength.) For the linear objective, the Euclidean problem does not decompose while the Manhattan objective does.]
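The reduced system Q_cc x_c = -Q_cf x_f can be solved with any sparse symmetric-positive-definite solver. A minimal sketch on a hypothetical 4-vertex path with two fixed pads (the instance, pad coordinates, and choice of conjugate gradient are illustrative assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg

# Toy instance: path 0 - 1 - 2 - 3 with unit edge weights; vertices
# 0 and 3 are fixed pads at coordinates 0.0 and 3.0 (our assumption).
movable, fixed = [1, 2], [0, 3]
Q = np.array([[ 1, -1,  0,  0],
              [-1,  2, -1,  0],
              [ 0, -1,  2, -1],
              [ 0,  0, -1,  1]], dtype=float)
x_f = np.array([0.0, 3.0])

Q_cc = csr_matrix(Q[np.ix_(movable, movable)])   # SPD when pads exist
Q_cf = Q[np.ix_(movable, fixed)]
rhs = -Q_cf @ x_f

x_c, info = cg(Q_cc, rhs)   # conjugate gradient on the reduced system
assert info == 0            # 0 means the solver converged
# x_c is approximately [1.0, 2.0]: the movable modules spread evenly
# between the pads, as expected for a uniform chain.
```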
Two approaches have been used to obtain a feasible placement from a "global placement". The first approach is based on assignment, either in one step (to the entire two-dimensional array of slots) or in two steps (to rows, and then to slots within rows) [14]. The second and more widely-used approach is based on partitioning: the global placement result is used to derive a horizontal or vertical cut in the layout, and the continuous squared-wirelength optimization is recursively applied to the resulting subproblems (see [36, 27, 9, 24]). The main difficulty is making partitioning decisions on the extremely overlapped modules in the middle of the layout. The obvious median-based partitioning (find the median module and use it as a "splitter") is sensitive to numerical convergence criteria. Thus, FM-type iterative improvement strategies ([22, 13]; see [3] as well as the discussion of Section 3 for a review) are commonly used to refine the resulting partitioning (see, e.g., [24]). Since the typical objective for iterative improvement partitioning is some form of minimum weighted cut, the quadratic placement methodology can be seen to be quite similar in structure to top-down min-cut placers, with initial cuts induced from (one-dimensional) placements under the squared-wirelength objective.

[Footnote 3: Formally, denote the n modules of the netlist hypergraph H(V, E) as V = {v_1, v_2, ..., v_n}. A net e ∈ E is a subset of V with size greater than one. A bipartitioning P = {X, Y} is a pair of disjoint clusters (i.e., subsets of V) X and Y such that X ∪ Y = V. The cut of a bipartitioning P = {X, Y} is the number of nets which contain modules in both X and Y, i.e., cut(P) = |{e | e ∩ X ≠ ∅, e ∩ Y ≠ ∅}|. Let A(v) denote the area of v ∈ V and let A(S) = Σ_{v∈S} A(v) denote the area of a subset S ⊆ V. Given a balance tolerance r, the min-cut bipartitioning problem seeks a solution P = {X, Y} that minimizes cut(P) subject to A(V)(1 - r)/2 ≤ A(X), A(Y) ≤ A(V)(1 + r)/2.]
We summarize the essential structure of a quadratic placer as consisting of: a sparse linear systems solver; a min-cut iterative (FM-type) partitioner; and a top-down hierarchical min-cut framework wherein, for any given partitioning instance, the solver results are used to induce an initial solution for the iterative partitioner.

3. EFFECTIVE IMPLEMENTATION OF THE QUADRATIC PLACEMENT METHODOLOGY
In this section, we list five major degrees of freedom in the implementation of the quadratic placement methodology. For two of these degrees of freedom (the use of modern Krylov-subspace solvers with preconditioners, and correlation convergence criteria) we refer the reader to the earlier work of [5]. The remaining three degrees of freedom (linear-wirelength vs. squared-wirelength objectives, the solver-partitioner interface, and the use of modern multilevel FM partitioners) are the focus of detailed experiments in Section 4 below.
Of course, we realize that there are many other degrees of freedom in the implementation of a "quadratic placer", e.g., the number of partitioning trials made at a given level of the top-down placement, the integration of metaheuristics such as cycling and overlapping [19], etc. In our experience, tuning these degrees of freedom is an important activity, requiring substantial effort. However, we have chosen to concentrate on what we believe are the more fundamental issues for quadratic placement implementation. In particular, our line of investigation is motivated by the recent advances in algorithm technology for iterative partitioning. Our main question is: do modern partitioners really require seeding from (one-dimensional placements computed by) numerical solvers? Our discussion sets the stage for experimental evidence showing that implementation choices for quadratic placement are dominated by the strengths of modern multilevel partitioners. We find that the "quadratic placement methodology" no longer benefits significantly from the use of linear systems solvers that minimize quadratic wirelength objectives.

3.1. Use of Krylov-subspace Solvers with Preconditioners
Recall that quadratic placement requires solving sparse linear systems with dimensions on the order of the number of movable modules in a given one-dimensional placement instance. The time complexity of an iterative solver depends on both the cost of a single iteration (which is constant during the solution of a given system) and the number of iterations needed until iterates adequately approximate the true solution. The theory of iterative methods shows that the number of iterations needed to obtain a good approximation in norm depends on the spectrum of the matrix involved [15]. Hence the idea of a preconditioner: a way to transform the original system into an equivalent one with an "improved" spectrum. Because most implementations of preconditioners entail additional per-iteration cost, one must carefully examine the overall efficiency of solver/preconditioner combinations on particular classes of instances: more expensive iterations must be balanced against the number of iterations saved.

[Footnote 4: The PROUD placer [33, 32], which we have cited as a prototype for the quadratic placement methodology, is more careful than previous works in how it applies partitioning during the top-down min-cut process. Suppose that a vertical cut has been made along some centerline, so that the left and right halves must each be split by horizontal cuts. PROUD applies a "block Gauss-Seidel" analogy, as follows. Modules in the left half are ordered vertically, followed by terminal propagation (projection) to the centerline. The projected terminals then influence the vertical ordering of the modules in the right half. The modules of the right half are then fixed and projected to the centerline, where they influence a new vertical ordering of the left half. Eventually, both orderings converge and can be split to induce subproblems. In this way, PROUD affords more global interaction between siblings in the hierarchy (see also the "cycling and overlapping" metaheuristics discussed in [19]). More generally, we recognize (i) that intermediate steps involving assignment or transportation may also be used to derive hierarchical subproblems from the initial global placement; (ii) that while top-down bipartitioning is the standard approach, hierarchical subproblems may also be derived by top-down quadrisection; (iii) that iterative partitioners can use more sophisticated (placement-based) objectives than the traditional weighted min-cut; and (iv) that any number of metaheuristics can be used within the general top-down framework to improve routability, timing, etc. (such recent works as [34, 19] exemplify such possibilities). Nonetheless, we have chosen to synthesize a quadratic placement methodology based on FM-type iterative bipartitioning: this is the cleanest (and most typical) framework for the solver-partitioner interaction in quadratic placement.]
The work of [5] experimentally compares iteration counts and CPU times for various combinations of solvers and preconditioners when solving typical systems arising within the quadratic placement methodology. These studies were motivated by the observation that works such as [33] use the 1950's-era SOR iteration to solve the linear system in Eq. (1), despite many improved methods for solving sparse symmetric linear systems having been developed in recent years. (See [7] for pseudocodes of solvers and preconditioners, as well as a taxonomy of the types of systems to which these iterations apply; [15] gives theoretical analysis.) The authors of [5] test comparable implementations of solver-preconditioner combinations using the PETSc library [6], and conclude that BiConjugate Gradient Stabilized (BiCGS) is among the best solvers. Though it does not guarantee convergence, BiCGS is good even for degenerate (not necessarily symmetric) matrices and provides more robust convergence than conjugate gradient (CG). For preconditioners, incomplete LU-factorization and the Successive Over-relaxation family (including SSOR) are particularly successful. In verifying the superiority of BiCGS, the performance of SOR and SSOR was evaluated with the best value of the relaxation parameter, which was determined to be 1.95 (for SOR/SSOR solvers) and 1.0 (for SOR/SSOR preconditioners) over a range of problem instances.
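The solver-preconditioner pairing discussed above (the paper uses PETSc) can be approximated with SciPy's analogous routines. A minimal sketch, assuming a stand-in tridiagonal SPD system rather than a real placement matrix: BiCGStab with an incomplete-LU preconditioner built by spilu.

```python
import numpy as np
from scipy.sparse import diags, csc_matrix
from scipy.sparse.linalg import bicgstab, spilu, LinearOperator

# Hypothetical SPD system standing in for Eq. (1): a 1-D Laplacian
# with a small diagonal shift so it is nonsingular.
n = 200
A = csc_matrix(diags([-1.0, 2.01, -1.0], [-1, 0, 1], shape=(n, n)))
b = np.ones(n)

# ILU preconditioner: factor A approximately as L*U, then apply M^{-1}
# via the triangular solves packaged in ilu.solve.
ilu = spilu(A)
M = LinearOperator((n, n), matvec=ilu.solve)

x, info = bicgstab(A, b, M=M)
assert info == 0   # 0 means the iteration converged
residual = np.linalg.norm(A @ x - b)
assert residual <= 1e-3 * np.linalg.norm(b)
```

For a tridiagonal matrix the ILU factorization is essentially exact, so BiCGStab converges in very few iterations; on real placement Laplacians the preconditioner is inexact and the cost/iteration-count tradeoff described in the text applies.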
3.2. Linear-wirelength vs. Squared-wirelength Objectives

In Section 2.2 above, we saw that the continuous placement formulation with squared-wirelength objective has a unique optimum solution that can be found by solving a sparse linear system. However, while being convenient to work with, the quadratic objective does not correspond to any intuitive physical quantity. Compared to the linear objective, the quadratic objective more heavily weights long connections and less heavily weights short connections. To see this, consider the example of two wires with respective lengths 2 and 10.
According to the linear objective, the relative cost of the long versus the short wire is 10/2 = 5, but according to the quadratic objective, the relative cost of the long versus the short wire is 100/4 = 25, i.e., the cost ratio of long to short wires is much higher under the quadratic objective.
Several works have suggested that the following linear wirelength objective is superior for placement:

Linear Wirelength Formulation  Minimize the objective

Φ_l(x) = Σ_{i>j} a_ij |x_i - x_j|

such that x_{c+1}, ..., x_n are fixed.
Mahmoud et al. [26] and Sigl et al. [31] demonstrate the superiority of the linear-wirelength objective for analog and row-based placement, and Riess et al. [30] show that a one-dimensional placement which minimizes linear wirelength can lead to excellent netlist bipartitioning results.

[Footnote 5: Briefly, iterative methods for solving large systems of linear equations can be classified as stationary or non-stationary. Stationary methods include Jacobi, Gauss-Seidel, Successive Over-relaxation (SOR) and Symmetric Successive Over-relaxation (SSOR). They are older, easier to implement, and computationally cheaper per iteration. Non-stationary methods include Conjugate Gradient (CG), Generalized Minimal Residual (GMRES) and numerous variations. These are relatively newer and harder to implement, but afford much faster convergence; their additional computational expense per iteration is normally justified by much smaller numbers of iterations. Solvers which provide smooth convergence can also be used as preconditioners. Direct solvers present a different source of preconditioners for iterative methods, with examples being incomplete Cholesky (ICC), LU-factorization and incomplete LU-factorization (ILU).]
Optimizing the linear wirelength is less straightforward. For example, there can be multiple optimal solutions (consider a single movable module connected to two fixed pads by edges of equal weight: this module can be optimally placed anywhere between the two pads). The set of optimal placements is again closed and contained within the convex hull of fixed pad locations (see [33]). Thus, direct minimization of the linear wirelength objective can be achieved by linear programming, but this is usually computationally expensive. Consequently, most placers that address the linear wirelength objective find a solution by iteratively solving several quadratic formulations. We use the GORDIAN-L placer [31] to illustrate this technique. (Note that the set of constraints that GORDIAN-L can handle is more general than described here. In particular, GORDIAN-L can handle center-of-gravity constraints whereby the coordinates for any subset of modules must be centered around a prescribed center. However, the technique described here for optimizing a linear objective by transforming it into a quadratic objective is independent of the types of constraints applied.) The objective Φ_l can be rewritten as

Φ_l(x) = Σ_{i>j} a_ij |x_i - x_j| = Σ_{i>j} a_ij (x_i - x_j)^2 / |x_i - x_j|.

If the |x_i - x_j| term in the denominator were constant, then a quadratic objective would result, which could be solved via the above technique. GORDIAN-L first solves the system for Φ_q(x) to obtain a reasonable approximation for the |x_i - x_j| terms. Call this solution x^0. GORDIAN-L then derives successively improved solutions x^1, x^2, ... until there is no significant difference between x^{k-1} and x^k. From a given solution x^{k-1}, the next solution x^k is obtained by minimizing

Σ_{i>j} g_ij (x_i - x_j)^2, where g_ij = a_ij / |x_i^{k-1} - x_j^{k-1}|.

Note that the coefficients g_ij are adjusted between iterations. The iterations terminate when the differences (x_i^k - x_j^k) no longer change significantly. (It turns out that this approach is actually a special case of a method due to Weiszfeld [35]; see [5] for a detailed discussion.) Below, we will experimentally study the effects of choosing the linear-wirelength vs. the squared-wirelength objective for one-dimensional placement within the quadratic placement methodology. In particular, we will consider the effects on individual cutsizes as well as on total placement wirelength.
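The reweighting scheme above can be sketched compactly. This is our illustration of the GORDIAN-L-style (Weiszfeld) iteration, not its implementation: function names, the eps guard against division by zero, and the toy instance are all our assumptions.

```python
import numpy as np

def solve_quadratic(weights, x, movable, n):
    # Minimize sum w_ij (x_i - x_j)^2 over the movable coordinates,
    # holding the remaining (pad) coordinates in x fixed.
    Q = np.zeros((n, n))
    for (i, j), w in weights.items():
        Q[i, j] -= w; Q[j, i] -= w; Q[i, i] += w; Q[j, j] += w
    m = list(movable)
    f = sorted(set(range(n)) - set(m))
    out = x.copy()
    out[m] = np.linalg.solve(Q[np.ix_(m, m)], -Q[np.ix_(m, f)] @ x[f])
    return out

def linear_wl_placement(a, pads, movable, n, iters=50, eps=1e-6):
    # Start from the squared-wirelength solution x^0, then repeatedly
    # minimize sum g_ij (x_i - x_j)^2 with g_ij = a_ij / |x_i - x_j|
    # taken from the previous iterate.
    x = np.zeros(n)
    for i, v in pads.items():
        x[i] = v
    x = solve_quadratic(a, x, movable, n)
    for _ in range(iters):
        g = {(i, j): w / max(abs(x[i] - x[j]), eps)
             for (i, j), w in a.items()}
        x_new = solve_quadratic(g, x, movable, n)
        if np.max(np.abs(x_new - x)) < 1e-7:
            return x_new
        x = x_new
    return x

# One movable module (1) between pads at 0 and 10, heavier edge to the
# left pad: the linear objective 2|x| + |x - 10| is minimized at x = 0,
# while the initial quadratic solve gives x = 10/3.
x = linear_wl_placement({(0, 1): 2.0, (1, 2): 1.0}, {0: 0.0, 2: 10.0}, [1], 3)
```

The example illustrates the text's point: the squared-wirelength solution sits at the weighted centroid (10/3), while the reweighted iterations drive the module toward the linear-wirelength optimum (the weighted median, here the heavier pad at 0).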

3.3. Correlation Convergence Criteria
Any iterative solver builds a sequence of iterates that converges to the solution x of Eq. (1). In the quadratic placement methodology that we have described, the one-dimensional placement information is used to "seed", i.e., construct an initial solution for, the partitioner. How soon the iteration can be stopped will affect the CPU efficiency of the overall implementation. Typically, iterative solvers have stopping criteria, or convergence tests, that are based on some norm of the residual vector for an iterate, which is taken to represent error with respect to the true solution. In practice, most norms are equivalent, and various heuristics (check convergence every j iterations, check differences of iterates rather than residual vectors, etc.) can reduce the time spent on convergence tests. Constructing an initial min-cut partitioning solution from a one-dimensional placement solution wastes information, particularly if nothing is retained but memberships of vertices in "left" and "right" initial partitions. If the final iterate will be sorted and split to induce an initial solution for the min-cut partitioner, then the iteration should terminate as soon as further changes will be inessential to the partitioner. Determination of "inessential" fundamentally depends on the strength and stability of the partitioner, as will be discussed in the next subsection. However, regardless of what partitioner is used, solver iterations should certainly stop when the left and right groups stabilize. The work of [5] proposed a number of correlation convergence criteria, based on permutations, that may be useful in efficiently measuring such stabilization. Instead of residual norms, correlation convergence criteria use correlations and rank correlations between successive iterates to compute their similarity. Convergence is detected when such a measure becomes sufficiently close to 1, and iterations are stopped. (Note the analogy to residual norms, which are used in traditional convergence criteria but tend to 0 rather than to 1.) In Section 4 below, we provide evidence that simple and efficiently-computed measures of correlation or rank correlation between successive iterates indeed yield useful correlation convergence criteria.

[Footnote 6: When solving the system Ax = b, the residual vector for a given iterate x^k is b - Ax^k.]
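A rank-correlation stopping test of the kind described above can be sketched in a few lines. The 0.999 threshold and the function names are illustrative assumptions, not values from [5].

```python
import numpy as np

def rank_correlation(x_prev, x_curr):
    # Spearman-style rank correlation: Pearson correlation of the ranks
    # (double argsort turns coordinates into their ranks).
    r1 = np.argsort(np.argsort(x_prev)).astype(float)
    r2 = np.argsort(np.argsort(x_curr)).astype(float)
    return float(np.corrcoef(r1, r2)[0, 1])

def converged(x_prev, x_curr, threshold=0.999):
    # Stop the solver once successive iterates induce (nearly) the same
    # left-to-right ordering of modules; 0.999 is an illustrative
    # threshold, not the paper's value.
    return rank_correlation(x_prev, x_curr) >= threshold

a = np.array([0.1, 0.5, 0.2, 0.9])
b = a + 1e-4 * np.array([1.0, -1.0, 1.0, -1.0])  # tiny wiggle, same ranks
assert converged(a, b)         # ordering unchanged -> correlation 1
assert not converged(a, -a)    # reversed ordering -> correlation -1
```

Because only the ordering matters for the subsequent sort-and-split seeding, such a test can fire well before the residual norm reaches a traditional tolerance.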

3.4. The Solver-partitioner Interface
A fourth degree of freedom in implementing the quadratic placement methodology is the "solver-partitioner interface", namely, the manner in which an initial solution for min-cut partitioning is constructed from a given solver iterate. The key decision concerns how much information to retain from the iterate when "seeding" the partitioner. Above, we noted that the usual practice is to sort coordinates of the iterate, then pre-seed some percentage of modules (corresponding to the most extreme coordinates) into the left and right initial partitions. Many implementation choices must be faced, e.g., whether the pre-seeded modules should be locked or unlocked within the partitioner; how to construct initial assignments for the remaining (not pre-seeded) modules; whether the pre-seeding should be based on module areas or module cardinalities; etc. In our experiments below, we apply the following procedure to create the initial bipartitioning solution.
The midpoint of the iterate is determined, such that the sums of module areas on either side of the midpoint are as close to equal as possible. Fixed pads are assigned to left or right partitions according to whether they are to the left or right of the midpoint in the iterate. On the left (right) side of the midpoint, a prescribed seeding percentage of the modules with smallest (largest) coordinates are pre-seeded (but not locked) into the left (right) partition.
The percentage is computed on each side according to module cardinality, rather than module area. In the experiments below, we study seeding percentages of 0%, 25%, 50% and 100%.
Remaining (not pre-seeded, not pad) modules are randomly assigned to the left and right partitions, such that the resulting partitioning is balanced. More precisely, we randomly order these remaining modules, then assign each in turn to the partition that currently has smaller total module area.
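The three steps above can be sketched for movable modules only (pad handling is omitted, and all names are ours): area-balanced midpoint, cardinality-based seeding of the extremes, then randomized area-balancing of the rest.

```python
import random

def seed_partition(order, area, pct, rng=None):
    # order: movable modules sorted by 1-D placement coordinate;
    # area:  module -> area; pct: seeding percentage (0..100).
    rng = rng or random.Random(0)
    # 1. Area-balanced midpoint along the sorted order.
    total = sum(area[m] for m in order)
    run, mid = 0.0, len(order)
    for i, m in enumerate(order):
        if run + area[m] > total / 2:
            mid = i
            break
        run += area[m]
    left_side, right_side = order[:mid], order[mid:]
    # 2. Pre-seed pct% (by cardinality) of the extreme modules per side.
    k_l = int(len(left_side) * pct / 100)
    k_r = int(len(right_side) * pct / 100)
    left = set(left_side[:k_l])                       # smallest coords
    right = set(right_side[-k_r:]) if k_r else set()  # largest coords
    # 3. Randomly order the rest; assign each to the lighter partition.
    rest = [m for m in order if m not in left and m not in right]
    rng.shuffle(rest)
    for m in rest:
        lighter = (left if sum(area[x] for x in left)
                   <= sum(area[x] for x in right) else right)
        lighter.add(m)
    return left, right
```

For ten unit-area modules and a 50% seeding percentage, the two extreme modules on each side are pre-seeded and the remaining six are distributed to keep the areas balanced.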

3.5. Use of Modern (Multilevel) FM Partitioners
Recall that a motivation of our present investigation is that the use of numerical linear systems solvers that minimize quadratic wirelength may be a historical accident, resulting from the pre-1990's weakness of min-cut partitioners. The standard FM bipartitioning approach consists of iterative improvement based on the Kernighan-Lin algorithm, using the improvement of Fiduccia-Mattheyses [13]. The FM algorithm begins with some initial solution {X, Y} and proceeds in a series of passes.
During a pass, modules are successively moved between X and Y until each module has been moved exactly once. Given a current solution {X', Y'}, the previously unmoved module v ∈ X' (or Y') with highest gain (= cut({X', Y'}) - cut({X' - v, Y' + v})) is moved from X' to Y'. After each pass, the best solution {X', Y'} observed during the pass becomes the initial solution for a new pass, and the passes terminate when a pass does not improve the initial solution.

Recent work [1, 2, 17, 18, 20, 21] has illustrated the promise of multilevel approaches for partitioning large circuits. Multilevel partitioning recursively clusters ("coarsens") the instance until its size is smaller than a given threshold, then unclusters ("uncoarsens") the instance while applying a partitioning refinement algorithm. Work in multilevel partitioning was originally prominent in the scientific computing literature for partitioning finite-element graphs [18, 21, 28]. Hendrickson and Leland [18] developed a very efficient multilevel partitioning algorithm, included in their Chaco package. Metis, another multilevel partitioning package targeted to finite-element graphs, was developed by Karypis and Kumar [21]. In the VLSI CAD community, previous multilevel works include [1, 10, 17, 20] and [2]. As shown in [20] and [2], multilevel implementations of the FM approach give the strongest and most stable results yet reported in the VLSI partitioning literature. Thus, our fifth degree of freedom assesses the use of multilevel FM versus traditional FM implementations.

[Footnote 7: This precept also applies when the iterate is used to "seed" the partitioner. For example, one can seed the initial partitioning solution with only a percentage (say, 20%) of the vertices having the most extreme coordinates (with all other vertices randomly assigned), because these vertices are more likely to be "correctly" assigned. The GORDIAN and GORDIAN-L placers use such a strategy [24, 31].]

4. EXPERIMENTAL RESULTS

4.1. Experimental Setup
Our top-down placement testbed includes the following elements.
Plain FM and multilevel FM bipartitioning engines. The plain FM implementation uses a LIFO gain bucket organization for improved performance [16]. The multilevel FM implementation uses a CLIP-FM core [11] and follows the description of [2] with respect to use of heavy-edge matching for coarsening [20, 2] and the value of the matching ratio (r = 0.33) for coarsening/uncoarsening.

Numerical iterations to minimize the squared-wirelength and linear-wirelength objectives. To minimize squared wirelength, we use a BiConjugate Gradient Stabilized (BiCGS) solver without preconditioner, following the conclusions of [5]. To minimize linear wirelength, we apply the Weiszfeld iteration described in [4], using the same BiCGS solver. For this objective, we also use an ILU preconditioner, since linear-wirelength minimization is inherently harder than squared-wirelength minimization.
A top-down quadratic placer framework. Within this framework, relevant implementation choices are:

Final (non-overlapping) module placements are evaluated by the sum of net bounding box half-perimeters.
Nets are modeled as weighted graph edges for the numerical solvers using the standard clique model for nets of degree 10 or less, and the directed star model for nets of degree greater than 10.
Pads (or block terminals) are kept fixed in the positions originally specified by the designer.
For multi-pin nets with pins outside the current partitioning instance, straightforward terminal propagation is used.
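The net-model and terminal-propagation choices above can be sketched as follows. The 1/(k-1) clique edge weight and the clamp-to-region form of propagation are common conventions chosen here for illustration, not necessarily the exact ones in our implementation.

```python
def net_to_edges(pins, star_id, threshold=10):
    """Model one net (a list of vertex ids) as weighted graph edges:
    a clique for nets of degree <= threshold, a star with a newly
    created center vertex otherwise. Returns (edges, next star id)."""
    k = len(pins)
    if k <= threshold:
        w = 1.0 / (k - 1) if k > 1 else 0.0   # illustrative clique weight
        edges = [(pins[i], pins[j], w)
                 for i in range(k) for j in range(i + 1, k)]
        return edges, star_id
    return [(star_id, p, 1.0) for p in pins], star_id + 1

def propagate_terminal(px, py, region):
    """Clamp an external pin to the nearest point of the current
    placement region so it acts as a fixed terminal for this instance."""
    x0, y0, x1, y1 = region
    return min(max(px, x0), x1), min(max(py, y0), y1)

edges3, _ = net_to_edges([1, 2, 3], star_id=100)
edges12, nxt = net_to_edges(list(range(12)), star_id=100)
print(len(edges3), len(edges12), nxt)                     # 3 12 101
print(propagate_terminal(15.0, -2.0, (0.0, 0.0, 10.0, 10.0)))  # (10.0, 0.0)
```

A 3-pin net thus yields three clique edges, while a 12-pin net yields twelve star edges plus one auxiliary vertex whose position is determined by the solver.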
We use four standard-cell test cases from industry, which we read in Cadence LEF/DEF 4.5 format (see Table I).
Our basic experiment explores the various degrees of freedom from the previous section, as follows.
For each bipartitioning instance with ≥ 50 modules, we use a solver to obtain a one-dimensional placement minimizing either squared or linear wirelength. For smaller instances, we do not produce placements and instead use a random initial solution in the bipartitioning.
We use either LIFO FM (FM) or ML CLIP-FM (MLFM) to obtain a minimum-cut exact bisection (using exact module areas, with tolerance equal to the largest individual module area in the instance). Note that when MLFM is used, its coarsening phase is constrained by the pre-seeding. Pre-seeded modules are not allowed to be matched to modules pre-seeded in the opposite partition.
Exhaustive enumeration of all possible placements is used for end-cases having 5 modules or fewer.
Each minimum-cut bisection is the best result from 5 multi-starts, with randomization in the initial assignment of non-pad/non-seeded modules, and in the heavy-edge matching based coarsening stage of MLFM.
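The exhaustive end-case step mentioned above can be sketched as follows: every one-to-one assignment of modules to legal positions is tried, which is practical only because end-cases have at most 5 modules. The 1-D slot positions and net list here are illustrative, not from our test cases.

```python
from itertools import permutations

def best_end_case(modules, slots, nets):
    """Try every assignment of modules to slot positions and keep the
    one minimizing total net span (a 1-D analogue of the bounding box
    half-perimeter objective). Feasible only for tiny instances."""
    best_pos, best_wl = None, float("inf")
    for perm in permutations(slots):
        pos = dict(zip(modules, perm))
        wl = sum(max(pos[m] for m in net) - min(pos[m] for m in net)
                 for net in nets)
        if wl < best_wl:
            best_pos, best_wl = pos, wl
    return best_pos, best_wl

pos, wl = best_end_case(["a", "b", "c"], [0, 1, 2],
                        [("a", "b"), ("b", "c")])
print(wl, pos["b"])   # 2 1 -- "b" is placed between its two neighbors
```

With k modules the loop examines k! placements, so the 5-module bound keeps the enumeration at no more than 120 candidates per end-case.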
Runtimes for our placer on a 300 MHz Sun Ultra-10 are given in Table II. We emphasize that a tuned implementation would be much more efficient, e.g., in practice solvers would not be run with such rigorous convergence criteria. For the top-level bipartition instances alone, the quadratic solver required 11 seconds (178 iterations) and 45 seconds (154 iterations), while the linear-wirelength (Weiszfeld) solver required 205 and 802 seconds, for Case 1 and Case 4 respectively. The 5 starts of LIFO FM required 2 and 11 seconds, and the 5 starts of ML CLIP-FM (including all clustering operations) required 4 and 20 seconds, for Case 1 and Case 4 respectively.
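The partial pre-seeding levels used in our experiments follow the extreme-coordinate idea noted earlier (footnote 7): fix only the modules whose solver coordinates are most extreme, and leave the rest to the partitioner's randomized start. A minimal sketch, with function and parameter names of our own choosing:

```python
def preseed(coords, fraction):
    """Given 1-D solver coordinates {module: x}, pre-assign the given
    fraction of modules: those with the smallest coordinates to side 0
    and those with the largest to side 1. Remaining modules are left
    unassigned for the partitioner's randomized initial solution."""
    order = sorted(coords, key=coords.get)   # module ids by coordinate
    k = int(fraction * len(order) / 2)       # modules fixed per side
    seed = {m: 0 for m in order[:k]}
    if k:
        seed.update({m: 1 for m in order[-k:]})
    return seed

coords = {m: m / 10.0 for m in range(10)}
print(preseed(coords, 0.2))   # {0: 0, 9: 1}: only the extremes are fixed
print(preseed(coords, 1.0))   # all ten modules fixed, five per side
```

At fraction 1.0 this degenerates to splitting the sorted placement at the median, which corresponds to the fully pre-seeded (100%) configurations in our experiments.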

Experimental Data
For each experiment configuration and test case, we examine both the final placement result as well as the results of the top-level bisection step. In the context of the top-level bisection, we save 20 different iterates of the squared-wirelength minimization (BiCGS engine) to pre-seed partitioners in separate experiments. These 20 iterates are chosen uniformly spaced in the interval between the first and final solver iterates (the stopping criterion is for successive iterates to differ by less than 10^-8 times the norm of the residual). For each of these iterates, each partitioning engine, and each level of pre-seeding, Figures 1 through 4 show the best cuts achieved in 5 random starts, averaged over 5 separate trials. Oscillations in the figures, particularly for MLFM results, are due to the randomizations inherent in the experimental setup.
For linear-wirelength minimization only one iterate is available, as the Weiszfeld iteration typically converges in a single iteration. The best cutsizes obtained when this iterate is used for pre-seeding (100%, 50%, 25% and 0% pre-seeding) are given in the captions of each figure. Tables III through VI document the similarity of each iterate of the squared-wirelength minimization to the next iterate and to the final iterate, using correlation and rank correlation measures [29]. We additionally report the similarity of the resulting partitioning solutions to the solution achieved using the final iterate. Here, the similarity measure is Hamming distance, i.e., the minimum number of modules that must be moved to transform one solution into the other.

Footnote 8: More precisely, we treat each bipartitioning solution as a 0-1 vector, so the Hamming distance between two partitioning solutions is a measure of how dissimilar the two solutions are. If X = (x_i) and Y = (y_i) are two bipartitioning solutions, their Hamming distance is the sum over i = 1, ..., n of |x_i - y_i|. If this quantity is larger than n/2, then the coordinates in Y are flipped and the quantity is recomputed.
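The similarity measure described above can be computed directly; since part labels are arbitrary, the complement of one solution is used whenever that yields a smaller count.

```python
def partition_distance(x, y):
    """Hamming distance between two bipartitioning solutions given as
    0/1 vectors, taking the smaller of the count against y and the
    count against y's complement (part labels are interchangeable)."""
    d = sum(1 for a, b in zip(x, y) if a != b)
    return min(d, len(x) - d)

print(partition_distance([0, 0, 1, 1], [1, 1, 0, 0]))  # 0: same bisection
print(partition_distance([0, 0, 1, 1], [0, 1, 1, 1]))  # 1: move one module
```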
In our study of complete placement results, within each experimental configuration we pre-seed each partitioner call with the corresponding final solver iterate, and report the sum of net bounding box half-perimeters in the final placement. These results are given in Table VII.

FIGURE 4  Best cut after pre-seeding with solver iterates for Case 4. The x-axis is the index of the iterate, and the y-axis is the cutsize. When pre-seeding with the final (converged) Weiszfeld iterate, the best cuts were 737, 754, 643 and 590 for FM (0%, 25%, 50% and 100% pre-seeding); and 318, 328, 337 and 331 for MLFM (0%, 25%, 50% and 100% pre-seeding).

Discussion
We first note that fully pre-seeded (100%) runs are still somewhat randomized, as we do not pre-seed partitioning instances of size 5 through 50 (small instances with 5 or fewer cells are solved optimally with an enumerative approach). Figures 1 through 4 justify the traditional quadratic placement methodology, in the sense that a (LIFO) FM partitioner clearly benefits from pre-seeding by a quadratic (squared-wirelength) solver. We see that full (100%) pre-seeding reduces the FM cutsize by as much as 35%, compared to no pre-seeding (0%). On the other hand, MLFM cutsizes are clearly not improved, and in some cases are worse, when pre-seeded with results from the quadratic solver. This confirms that with modern partitioners, pre-seeding from analytical placements only hurts solution quality.
With regard to the use of a linear-wirelength minimizer, we observe that the FM partitions are still generally better with more pre-seeding, but that the improvement versus pre-seeding with the quadratic solver is somewhat unpredictable (in two cases, FM results are distinctly worse when pre-seeded with the Weiszfeld solution; see footnote 9). The MLFM partitioner is still hurt by pre-seeding, but the linear-wirelength pre-seeding is less damaging than the squared-wirelength pre-seeding. Again, our main conclusion is that a modern multilevel partitioner no longer requires pre-seeding by an analytical solver, particularly for large instances.
The Hamming distance studies in Tables III through VI show that pre-seeding with early iterates leads to partitioning solutions that are structurally similar to those achieved with later iterates (footnote 10). We also see that correlation and rank correlation to the next iterate increase steadily. Since using later iterates does not usually improve cutsize, this suggests that correlation convergence measures can be used for early termination of numerical solvers. Finally, we observe that linear-wirelength minima do not seem strongly correlated with squared-wirelength minima.
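One way to realize such a correlation-based stopping test is to compute the rank correlation between successive solver iterates and stop once it exceeds a threshold, i.e., once the relative order of module coordinates has stabilized. The following is a sketch; the threshold value and function names are ours, and ties in the ranking are broken arbitrarily, which is adequate for illustration.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

def spearman(x, y):
    """Rank correlation: Pearson correlation of the rank vectors."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    return pearson(ranks(x), ranks(y))

def converged(prev, cur, threshold=0.999):
    """Stop the solver once successive iterates are almost perfectly
    rank-correlated (threshold is illustrative, not tuned)."""
    return spearman(prev, cur) >= threshold

print(converged([0.1, 0.4, 0.2, 0.9], [0.12, 0.41, 0.22, 0.88]))  # True
print(converged([0.1, 0.4, 0.2, 0.9], [0.9, 0.1, 0.4, 0.2]))      # False
```

Because the top-down flow only consumes the ordering of coordinates (through the bisection seed), such a test can terminate the solver well before the usual residual-norm criterion is met.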
We conclude this section by discussing the total wirelength results for complete placements obtained with each of the experimental configurations. From Table VII, we see that conclusions obtained for individual bisection instances still apply to complete placements: (i) FM is weaker than MLFM; (ii) FM benefits from 100% pre-seeding (but not always from 25% pre-seeding), but MLFM does better with no pre-seeding than with pre-seeding from quadratic placement; and (iii) MLFM performs somewhat better when partially pre-seeded with the placements produced by the Weiszfeld algorithm, while FM does not benefit from such pre-seeding. Overall, the worst results were achieved by FM with no pre-seeding, with partial pre-seeding by a squared-wirelength solver, or with full pre-seeding by a linear-wirelength solver. The best results were achieved by MLFM with no pre-seeding or with linear-wirelength based pre-seeding.

CONCLUSIONS AND FUTURE DIRECTIONS
We have synthesized the motivations and structure for a generic "quadratic placement" methodology, and developed a testbed that allows exploration of several key implementation decisions. Our experiments compare different combinations of partitioners, analytical solvers, and pre-seeding strategies within the solver-partitioner interface. We observe that:

Traditional pre-seeding with a quadratic (squared-wirelength) engine does not improve either cutsize or placement results if a strong partitioner (e.g., ML CLIP-FM) is used. If pre-seeding is used, earlier iterates are often as good as later iterates, and correlation convergence tests based on correlation and rank correlation between iterates can save CPU time by detecting when the relative order of module locations stabilizes.
Pre-seeding with a linear-wirelength engine may be useful if issues such as stability and reproducibility of solution structure are considered, but such a conclusion, if true, would require a more elaborate experimental design to demonstrate.
These observations suggest that with the transition from classic FM partitioners to modern multilevel partitioners, quadratic engines may no longer be necessary in top-down placement, and may even lead to a loss of solution quality when applied.
Our ongoing research encompasses the following areas.
We are improving our placement testbed to enable "apples-to-apples" comparison with commercial tools. This entails building infrastructure for timing-driven layout (industry-standard timing models, timing constraints formats, delay calculation and static timing analysis algorithms, etc.) as well as interfaces to leading routing engines. Our existing placement capability is extremely competitive for wirelength minimization, but routability analyses and legality-checking are inferior to those in commercial systems, making direct comparisons difficult.

We recognize that our experiments do not address other issues, notably (i) the number of multi-starts required for stable solution quality, and (ii) reproducibility of solution structure, that may yet reveal advantages of solver-based pre-seeding strategies. Since these issues move us into the details of metaheuristics within the top-down placement, we defer them to future research.

Footnote 9: From the results of [30], we would expect superior performance from a Weiszfeld-seeded FM partitioner.

Footnote 10: The Hamming distances are occasionally surprisingly large, even though cutsizes are similar.
We are studying non-hierarchical alternatives for the interface between analytical solvers and the layout substrate. Several recent approaches, based on novel formulations for both solver and "legalizer", appear promising.

Finally, several drivers suggest looking beyond the quadratic placement methodology. (1) Well-known limits of quadratic placers include inability to naturally model path timing constraints, invariance of orderings to unequal horizontal and vertical routing resources, and the requirement of pre-placed pads to "anchor" the analytical placement.
(2) Future top-down design methodologies will tend to have smaller random-logic blocks in order to gain predictability; these may not be large enough for a quadratic placer to show its "global awareness" and runtime advantages.
(3) The advent of block-based designs, with synthesized glue logic spread out over disconnected regions, may lead to a design planning / block building / assembly flow that is also ill-matched to quadratic placers. Thus, we must also seek new placement approaches that can be better suited to future placement contexts.

TABLE II
Total CPU times for our placer (300 MHz Sun Ultra-10) on smallest and largest test cases, under various

TABLE III
Correlation convergence studies for the top-level bisection of Case 1 and pre-seeding with early iterates of the squared-wirelength minimization

TABLE IV
Correlation convergence studies for the top-level bisection of Case 2 and pre-seeding with early iterates of the squared-wirelength minimization

TABLE V
Correlation convergence studies for the top-level bisection of Case 3 and pre-seeding with early iterates of the squared-wirelength minimization

TABLE VI
Correlation convergence studies for the top-level bisection of Case 4 and pre-seeding with early iterates of the squared-wirelength minimization

TABLE VII
Average final wirelength results for top-down placement of Case 4