Power Distribution Synthesis for VLSI

The synthesis of the power distribution network is an important problem in the layout design of VLSI systems. In this paper we propose novel methods for designing minimal-area power distribution nets while satisfying voltage drop and electromigration constraints. We will see that our methods significantly improve upon current techniques. We propose two novel greedy heuristics for power net design: one based on bottom-up tree construction using greedy merging, and the other based on top-down linearly separable partitioning. We test the efficacy of our techniques on benchmark instances. The areas required by our methods on typical instances are significantly smaller than those obtained using previous methods.


INTRODUCTION
The power distribution network supplies reference voltages to the circuit elements of an electronic system. It is designed after the placement phase of layout design. Thus the positions of the devices and their circuit functionality are known. As the power distribution net is one of the largest nets to be routed and subsequent nets are routed with it as a blockage, the area it occupies is of concern. Good power distribution design is also essential as it affects the reliability and speed of the system. The reliability is affected as the mean time to electromigration is determined by the current densities in the power lines. The speed is degraded if the voltage drop on the line is excessive, so that the current drive of the devices is smaller than required.
Several trends make optimized power distribution synthesis imperative [20]. As technologies scale to the sub-micron regime, the interconnect resistance increases and the total current drawn by the system increases, so that voltage drops on the power lines increase. If the supply voltage is scaled down for low power, as in portable applications, the noise margin is further reduced and the problem becomes increasingly important. With device counts increasing, the problem becomes harder as its scale is larger. While it is possible to alleviate the problem by increasing the number of metal layers, increasing the process complexity considerably lowers the yield. There is clearly a need for better algorithms for power distribution synthesis that exploit all the available degrees of freedom in order to reduce the area requirements, while ensuring correct circuit operation.
Early work on power distribution design centered around single layer routing of power and ground networks [18, 17, 23]. The emphasis was on connectivity, and given the small circuit complexities, electrical effects were not considered. Subsequently, electromigration and line voltage drop problems became severe enough to warrant optimization techniques. Algorithms were developed [3, 6, 14] to widen the wires to satisfy electrical constraints, while minimizing the area, given an input topology. The problem had been broken into two parts: one of finding a topology and the other of sizing it.
Topology optimization was recognized as a powerful tool in [16]. The approach was meant specifically for standard cell layouts. The central idea was to add enhancement buses of minimum area to pre-existing power buses to meet performance constraints. However, high performance systems, which are the systems in need of such optimization, are invariably full custom, so that unrestricted topologies can be much better than constrained ones. Nevertheless, this work demonstrated that topology design was an important degree of freedom, one whose use could lead to much better solutions.
Simultaneous topology optimization and wire sizing were considered in [21]. The entire problem was solved in one step using simulated annealing, but all topologies in the search space are sub-sets of a general grid. This is extremely restrictive and misses a large class of optimum solutions. In particular, when voltage drop constraints are very stringent, optimum solutions may have sinks fed by a single path from the pad. Thus, if two sinks are in the same channel, the optimum topology might run parallel lines to each sink. This is not allowed in the general grid formulation. The origin of this weakness in the formulation can be traced back to the use of simulated annealing as the optimization engine. If a simulator is to be used in an optimization loop, the run time required by the simulator should be small. In order to keep the simulation fast, parallel lines in the same channel are not allowed. This automatically forfeits the possibility of obtaining optimum star routes in a channel. Yet another shortcoming of the simulated annealing approach is the large computational expense. The time required for problem instances with tens of sinks is tens of hours. While the power supply routing problem is usually solved only once during layout design, the use of simulated annealing for large scale designs with tens of thousands of sinks is ruled out even if the problem were partitioned into hundreds of smaller sub-problems. Algorithms that exploit problem structure and quickly return good solutions are needed. While we have railed against simulated annealing, to be fair it should be pointed out that the approach is meant primarily for small analog power distribution problems where noise coupling is critical.
Another topology optimization procedure has been proposed in [8]. The procedure starts with a sized power distribution mesh and returns a sized power distribution tree which satisfies the same electrical constraints while using less area. This shows that area minimal topologies are typically trees, so that fast heuristics that synthesize area minimal trees are potentially useful. We will see that this turns out to be the case.
Section II outlines our problem formulation and discusses many insights into the structure of the problem. We observe that the problem is closely related to other layout problems like timing driven routing and clock routing, enabling the re-use of many ideas. We will also see why present techniques for power net routing are insufficient.
The use of information about the times during which sinks draw currents has been omitted from previous research on power distribution design. Most formulations either accept the worst case currents of the sinks or require simulation over all patterns of interest. While analysis tools for power distribution networks are more sophisticated [15, 25, 2], all previous works on synthesis [3, 21] lose important temporal information in the formulation. In Section III, we will survey the current estimation techniques in use during synthesis and look at better ways of representing the necessary temporal information so that design methods can utilize it. This leads to two powerful representations: the current compatibility graph and current interval sets. We will see that in general, the current compatibility graph representation leads to intractable problems when used for design. However, important special cases lead to simple and elegant means for its use during design. When more temporal information is available, current interval sets can be applied.
Section IV outlines two heuristics for the problem: a bottom-up greedy merging construction and a top-down partitioning-based divide-and-conquer strategy. Section V reports the performance of these heuristics as compared to previous methods. Finally, Section VI concludes with directions for future work.

the sinks draw currents and the maximum allowed current density $J_{\max}$,

Minimize $\sum_{j \in B} w_j l_j$

where $B$ is the set of all branches of a tree whose topology is to be determined, and $w_j$ and $l_j$ are the width and length of a branch $j$. The cost function is therefore the area of the power distribution net.
The electromigration constraints are given by formula (1):

$w_j \ge I_{j,\max} / J_{\max} \quad \forall j \in B$   (1)

This means that the wire width of each tree branch should be large enough to keep the estimated maximum current density within limits.
The vertical voltage drop constraints for each leaf node (sink) are as follows:

$\sum_{j \in P_i} R_0 \, l_j \, I_{j,\max} / w_j \le V_{vm} \quad \forall i$   (2)

PROBLEM FORMULATION
We wish to design a minimal-area, sized power supply net, given only the positions of the sinks, currents drawn at sinks, temporal sink current information, source pad position, and technology parameters. The temporal sink current information will be discussed in detail in Section III. It represents information about the times at which sinks draw current from the power net.
Thus we state the Power Tree Construction (PTC) problem as: Given $P = \{p_1, p_2, \ldots, p_n\}$, a set of sink positions on the Manhattan plane, the interconnect resistance per unit length $R_0$, the source pad position, the maximum allowed voltage drop $V_{vm}$ from the pad to each sink (the vertical voltage drop constraint), the maximum voltage drop difference $V_{hm}$ between any pair of sinks (the horizontal constraint), the peak sink currents $\{I_i\}$, temporal information regarding the times at which

In (2), $P_i$ is the unique path from the root (pad) of the tree to the $i$th sink, $R_0$ is the resistance per unit length and $I_{j,\max}$ is the current in the sub-tree downstream of the branch $j$.
The vertical voltage drop constraints are imposed so that the current drive of the sinks does not fall below prescribed limits.
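As an illustration of how the objective and the two constraint families interact, the sketch below evaluates a given sized tree. It is not the authors' implementation: the `Branch` class and all function names are hypothetical, the drop across branch j is assumed to be $R_0 l_j I_j / w_j$, and the branch current is taken as the plain sum of downstream sink currents, which is the crudest estimate and the one that the temporal representations of Section III are meant to improve.

```python
from dataclasses import dataclass, field

@dataclass
class Branch:
    """One tree branch and the node it feeds (hypothetical structure)."""
    length: float                 # l_j
    width: float                  # w_j
    sink_current: float = 0.0     # current drawn at the node fed by this branch
    children: list = field(default_factory=list)

def downstream_current(b):
    # Assumption: worst case branch current = sum of all currents below it.
    return b.sink_current + sum(downstream_current(c) for c in b.children)

def area(b):
    # Objective: sum of w_j * l_j over all branches of the sub-tree.
    return b.length * b.width + sum(area(c) for c in b.children)

def feasible(b, r0, j_max, v_vm, drop_above=0.0):
    """True if every branch is wide enough for electromigration and every
    leaf sees a pad-to-sink drop of at most v_vm."""
    i_b = downstream_current(b)
    if b.width < i_b / j_max:          # electromigration constraint
        return False
    drop = drop_above + r0 * b.length * i_b / b.width
    if not b.children:                 # leaf: vertical voltage drop constraint
        return drop <= v_vm
    return all(feasible(c, r0, j_max, v_vm, drop) for c in b.children)

# A trunk from the pad feeding two unit-current sinks.
trunk = Branch(10.0, 2.0, children=[Branch(5.0, 1.0, 1.0), Branch(5.0, 1.0, 1.0)])
```

Widening the trunk lowers the drop seen by both sinks at the cost of area, which is the trade-off the sizing step has to resolve.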
The horizontal voltage drop constraints impose the need for bounding the difference between reference voltages seen by communicating circuit elements. For CMOS, they are implied by the vertical voltage drop constraints [3], as the minimum non-zero current drawn by a sub-tree is very small. However, this may not be the case for custom ECL circuits, which essentially draw large, constant currents. In this case, optimization is possible if the horizontal drop constraints are stringent and the vertical constraints are not.
The decision version of the PTC problem is NP-complete by restriction to the minimum Steiner tree problem, which is NP-complete [9]. Without electromigration and voltage drop constraints, the problem is one of finding a Steiner tree of minimum cost, an NP-complete problem. This means that an efficient exact solution in polynomial time is unlikely, and leads to the quest for good polynomial time approximation algorithms.
We need to see if standard signal routing techniques such as minimum Steiner tree routing or bounded-radius bounded-cost trees are sufficient for our problem. The accompanying figure shows a small example with three sinks A, B and C. The pad position is indicated by the square. The instance also gives us the times at which the sinks draw current: A and B draw current at the same time, followed by the sink C. For this instance neither a minimum Steiner tree nor a bounded-radius bounded-cost tree is optimum in terms of area. The geometric formulations simply do not consider some of the available electrical information. Note that critical temporal information is used to avoid sub-optimal design. This motivates the need for good representation schemes for temporal information, which we study next.

BRANCH CURRENT ESTIMATION
The currents flowing in the branches of a power distribution network determine the wire widths necessary for satisfying electromigration and voltage drop constraints. It is therefore important to be able to accurately estimate the worst case currents in any branch. Previous synthesis approaches typically assume that the current in a branch of a tree is just the sum of the leaf currents. This is a very poor estimate and the artificial nature of this formulation has been recognized [6].

FIGURE Insufficiency of geometric formulations. (i) A problem instance with 3 sinks. (ii) A minimum Steiner tree. (iii) A bounded-radius bounded-cost tree. (iv) Best design.
In this section we propose a remedy to this problem. First we will discuss a graph-theoretic representation for sink current temporal information. We then study computational complexity issues related to finding good branch current estimates using this information. We will also consider representation of currents as a set of time intervals corresponding to the times at which current is drawn.
We call two sinks current-compatible if the sinks never draw current at the same time. This relation is used to define a current-compatibility graph G whose vertex set corresponds to the set of sinks and whose edge set consists of current-compatible sink pairs. Note that information about disjoint temporal utilization of functional units is not difficult to obtain, especially from a high-level synthesis system. In a self-timed system, there is a partial order on the events occurring in the system [5]. This might lead to some sub-systems never being activated at the same time as others. In domino logic, a stage evaluates only after its predecessor is done. In a Boolean network, a zeroth order delay model would give the level of a gate as the time instant at which it draws current. Thus, such temporal information can be obtained not only by simulation, but also by straightforward structural analysis.
The computational complexity associated with calculating the branch currents using this graph is of interest. Consider the homogeneous case where all sinks draw the same current. We define sub-sets $V_1, V_2, \ldots, V_k$ such that the sub-graph induced by each $V_i$ is a clique and $\sum_i (|V_i| - 1) \ge K$.
We call the maximum such gain attainable the weight of G and denote it as W(G). The intuition behind this problem is as follows. If we have a clique of size n as the current compatibility graph, the current in the root of the tree is that of a single sink: the worst case current estimate has been reduced from n to 1, indicating a gain of n-1 in terms of the current bound. Clearly, if we have k cliques $V_1, V_2, \ldots, V_k$, for each $V_i$ we get a gain of $|V_i| - 1$ in the current bound. Thus maximum weight corresponds to a maximum gain in the current bound. Conversely, if the current compatibility graph is not a clique of size n, we cannot get a gain of n-1, as this would mean that only one of the sinks is conducting at a given time, a contradiction. Thus, the bound is tight. The following theorem formalizes this argument: the branch current estimation problem is translated into a graph-theoretic problem.

THEOREM
The worst case current drawn by the root of a tree with a current compatibility graph G(V, E) is given by $I_{\max} = \sum_i I_i - W(G)$.

The computational complexity of finding this partitioning is given by the following theorem.
Proof The problem is in NP, as we can test each of the sub-graphs for the presence of all edges in time polynomial in the number of vertices, and the number of such sub-graphs is linear in vertex set size. The transformation from [EXACT COVER BY 3-SETS (X3C)] to [PARTITION INTO TRIANGLES] given in [10] can be used for this problem as well. The statement of the X3C problem is as follows.
[X3C] INSTANCE: A finite set X with $|X| = 3q$ and a collection C of 3-element subsets of X.
QUESTION: Does C contain an exact (pairwise disjoint) cover for X?
Note that if a clause is in the set cover, we can get a gain corresponding to 4 triangles in the [WEIGHTED CLIQUE COVER] problem. If the clause is not in the set cover, we get a gain corresponding to 3 triangles. Thus, the [X3C] instance has an exact cover if and only if the weighted clique cover instance has a weight of at least $2(q + 3|C|)$, where $|C|$ is the number of clauses and 3q is the size of X. In essence, the [WEIGHTED CLIQUE COVER] problem for any transformation from [X3C] is equivalent to the [PARTITION INTO TRIANGLES] problem as there are no cliques of size greater than 4, and we are done. □

The NP-completeness result motivates the search for polynomial time estimates that are only a constant factor above optimal. This is in fact possible, as proved in Theorem 3 below.

THEOREM 3
The weight of the graph W(G) is at most twice that of any maximal matching on the graph G.
Proof Consider an optimum partitioning $V_1, V_2, \ldots, V_k$, where each $V_i$ induces a clique in G. Clearly, the maximal matching on G has at least $\lfloor |V_i|/2 \rfloor$ edges contributed by each such clique, else it is not maximal. The weight of G is the sum of the weights of the k cliques, each of which contributes at least half its size $|V_i|$ to the matching. The weight of G is therefore at most twice that of the matching. □
The bound above is tight. For a triangle, the weight is 2, while the matching is a single edge giving a weight of 1. Maximal matching on a graph with $|V|$ vertices can be determined in $O(|V|)$ time using a simple scan algorithm: add an edge from the next scanned vertex and skip the adjacent vertex during the scan.
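The scan can be sketched as follows for the homogeneous case. This is a minimal illustration, not the authors' code: the function names and the adjacency-dictionary input are assumptions, and as written the scan touches every edge once, so it runs in time linear in the number of vertices plus edges. Since at most one sink of each matched pair conducts at any time, a matching with m edges on n unit-current sinks bounds the root current by n - m.

```python
def maximal_matching(adj):
    """Greedy maximal matching. adj maps each sink to the set of sinks it is
    current-compatible with (they never draw current at the same time)."""
    matched = set()
    matching = []
    for u in adj:
        if u in matched:
            continue
        for v in adj[u]:
            if v != u and v not in matched:
                matching.append((u, v))
                matched.update((u, v))
                break
    return matching

def conservative_root_current(adj, unit_current=1.0):
    """Upper bound on the root current for equal sink currents: each matched
    pair contributes the current of only one sink."""
    return (len(adj) - len(maximal_matching(adj))) * unit_current

# A triangle: the matching finds one edge (gain 1), whereas the true weight
# is 2, which is the tight case of the factor-of-two bound.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```

The estimate is pessimistic by at most the factor given in Theorem 3, in exchange for avoiding the NP-complete clique partitioning.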
The current-compatibility graph carries information only about pairwise relationships between sinks regarding the current drawn, but not about the actual individual sink current waveform envelopes (worst case sink current drawn during a clock period). This graph representation is appropriate when little is known about the exact temporal behavior. However, an alternative representation is possible which captures information about the actual time instants at which sinks draw current. Note that in the context of design flow, we are past the placement phase but signal routing has not been done. Voltage-controlled circuits draw currents when they drive their respective loads, so that sub-intervals of the clock period when there are demands on the power net can be determined. Such interval information would use bounds on delays of nets [19], as placement is done, but not routing. We explain this with an example.
Figure 3 shows a netlist and the intervals for each sink. The sinks correspond to the gates in the netlist. Assuming that each of the inputs is available at the same time and the levels of the gates give the time at which the gates draw currents, the intervals for the gates are shown. If each gate draws one unit of current, the bound on the sink current is 2, while the sum of currents is 3. Note that the use of temporal information not only leads to better wire sizing, but also allows for better topology design.

FIGURE 3 The use of sink current temporal information in design. (i) The netlist. (ii) The sink current intervals. (iii) Design without temporal information. (iv) Best design.

If each sink is associated with a set of intervals corresponding to the time instants at which it draws current, the following procedure gets the best current estimate.

Current Estimation
Input: A family F of sets of time intervals
Output: The worst case current estimate
begin
    for each interval in F
        increment bins corresponding to times contained in the interval
    worst case estimate = size of the largest bin
end
FIGURE 4 Estimating the worst case currents given a family of time interval sets.
For simplicity, the time intervals are assumed to be drawn from a small finite set, as this is usually the case. This means that we can have a bin for each time instant between 0 and T (the clock period).
Example Let the family of interval sets consist of three sets {[0, 1], [3, 4]}, {[2, 4]} and {[0, 2], [3, 4]}. In our current estimation problem this means there are three sinks that draw a unit of current at the given times during a clock period. Four bins corresponding to unit times [0, 1] to [3, 4] are initialized. After the loop, the first bin has two units in it, the second has 1, the third has 1 and the fourth has 3. The worst case current is 3. During the time interval [3, 4], three units of current are drawn from the root (pad) of the tree. The generalization to the case of unequal sink currents is straightforward: we increment the bin by the amount of the corresponding current. The intervals could also be over an arbitrary set and our procedure would still work with minor changes.
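The binning procedure and the example can be reproduced with a few lines of Python. The sketch assumes integer interval endpoints, with bin t covering the unit slot [t, t+1], and assumes that the intervals of a single sink do not overlap one another; the function name and the optional `currents` argument for unequal sink currents are illustrative.

```python
def worst_case_current(family, currents=None):
    """family[i] lists the (start, end) intervals during which sink i draws
    current. Returns the worst case current and the per-slot totals."""
    if currents is None:
        currents = [1] * len(family)          # homogeneous unit currents
    period = max(end for intervals in family for _, end in intervals)
    bins = [0] * period                        # one bin per unit time slot
    for intervals, amount in zip(family, currents):
        for start, end in intervals:
            for t in range(start, end):
                bins[t] += amount
    return max(bins), bins

# The three sinks of the example above.
family = [[(0, 1), (3, 4)], [(2, 4)], [(0, 2), (3, 4)]]
peak, bins = worst_case_current(family)
print(peak, bins)   # 3 [2, 1, 1, 3]
```

The same routine applied to the sinks of a sub-tree gives the current of the branch feeding that sub-tree.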

GREEDY HEURISTICS
In this section, we propose two greedy heuristics for the problem. The first heuristic is a greedy bottom-up tree construction strategy which simultaneously designs the topology and sizes the wires. The second heuristic employs the divide-and-conquer approach based on linearly separable partitioning to solve the problem. First we discuss the characteristics of the problem and motivate the algorithms. Then we highlight some features of the heuristics.
Voltage drop at a sink can be expressed by the formula given on the left hand side of inequality (2). Note that this expression is isomorphic to the one for calculating Elmore delay in the performance-driven interconnect design problem [4, 19].
The delay to the $i$th sink when resistance per unit length is $R_0$, driver resistance is $R_d$, capacitance per unit length is $C_0$ and the load capacitances are $C_{L_m}$ can be determined from the following formula:

$D_i = \sum_{k \in P_i} C_k R_0 l_k / w_k + R_d C_0 L(T) + R_d \sum_m C_{L_m} \quad \forall i$

Here, $C_k$ is the downstream capacitance seen by a branch k (this is the sum of sub-tree load capacitances and sub-tree interconnect capacitance), $L(T)$ is the net length, and $l_k$ and $w_k$ are the length and width of the branch k. The expressions for voltage drop and sink delay are isomorphic if $C_0 = 0$ and $R_d = 0$ in the sink delay equation. The delay in performance-driven interconnect design corresponds to the voltage drop in our problem and the sink capacitances correspond to the sink currents. Wire sizing behavior is the same too: wire widening decreases the interconnect resistance in both cases. There are differences, however: the load capacitances are constant while the load currents are time-varying. The delay expression has a term which is quadratic in the wire length. This is absent in the voltage drop expression.
The horizontal constraints are exactly the same as bounded skew constraints in clock routing [24]. Horizontal constraints require the difference between leaf voltages to be small, while bounded skew routing asks for delays to sinks to differ by at most a given constant. Therefore our topology design should reduce to the problem of clock routing when the only constraints are the horizontal voltage drops.

Bottom-Up Greedy Merging
The PTC problem appears to be closely related to the performance-driven interconnect design problem and to clock routing, two problems which have seen considerable research over the past few years. A common thread that runs through both of the problems is the tremendous, almost unreasonable effectiveness of greedy methods. In clock routing, the greedy algorithm proposed in [7] returns remarkably small wire length. Similarly, in performance driven interconnect design, greedy methods [1, 4] have proven effective. It is therefore natural to expect a greedy algorithm to do well for power supply net routing too.
In this section we propose to build the power net in a bottom-up fashion. The basic operation performed by our greedy algorithm is the merging of sub-trees. Below, we discuss in detail the greedy merge operation.
Consider two sub-trees $T_1$ and $T_2$, rooted at positions $Z_1$ and $Z_2$ on the Manhattan plane, as shown in Figure 5. Let the maximum voltage drops from the sub-tree roots to any of the sinks be $V_{M1}$ and $V_{M2}$ respectively. Let their sub-tree currents be $I_1$ and $I_2$. We need to decide a merge point as the position of the new root. This is chosen to be the point on the periphery of the bounding box of the two sub-tree roots that is closest to the pad. This makes our tree an arborescence tree, using the terminology of [4]. The motivation for using arborescence trees is that the sum of sink to pad distances is minimized. This term appears in a lower bound on the sum of voltage drops to sinks.

FIGURE 5 Merging sub-trees $T_1$ and $T_2$ to get a new sub-tree T.
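The choice of merge point can be illustrated by clamping the pad coordinates into the bounding box of the two roots, which yields the box point nearest the pad in Manhattan distance. The function names are hypothetical, and the sketch assumes the pad lies outside the box; if the pad were inside, clamping would return the pad position itself rather than a point on the periphery.

```python
def manhattan(p, q):
    """Manhattan (rectilinear) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def merge_point(z1, z2, pad):
    """Point of the bounding box of roots z1 and z2 that is closest to pad."""
    lo_x, hi_x = sorted((z1[0], z2[0]))
    lo_y, hi_y = sorted((z1[1], z2[1]))
    x = min(max(pad[0], lo_x), hi_x)   # clamp the pad's x into the box
    y = min(max(pad[1], lo_y), hi_y)   # clamp the pad's y into the box
    return (x, y)

z1, z2, pad = (2, 5), (6, 3), (0, 0)
root = merge_point(z1, z2, pad)        # (2, 3)
```

With this choice the two new branches together are exactly as long as the Manhattan distance between the roots, and every sink keeps a shortest path to the pad, which is the arborescence property mentioned above.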
The widths of the segments $w_1$ and $w_2$ are determined next. This is done using the wire sizing techniques of [14] for homogeneous vertical voltage drop constraints.
We have the electromigration constraints

$w_i \ge I_{i,\max} / J_{\max}, \quad i = 1, 2$

The voltage drop constraints are

$V_{\max} \ge V_{Mi} + I_i R_0 l_i / w_i, \quad i = 1, 2$

If the minimum widths given by the electromigration constraints are insufficient for satisfying the voltage drop requirements, we size up the sub-tree by a factor $\alpha$. This decreases the sub-tree voltage drop by a factor of $\alpha$ but increases the sub-tree area $A_i$, so that there is an optimum $\alpha$ for minimum area increase, while satisfying constraints [14]. The optimum $\alpha$ and $w_i$ can be computed in O(1) time for any pair of sub-trees. The new largest voltage drop to any sink can also be found in O(1) time. Note that we size up sub-trees so that the width may be increased during subsequent merges.
Our heuristic chooses the sub-trees to merge using a minimum area increase criterion. In other words, we find the cost increase for the merge of each pair of current sub-trees and choose the merge that gives the smallest area increase. Following the merge, the current drawn by the new sub-tree is determined. This uses the temporal information provided by our formulation. Note that if a sub-tree formed during this process draws a lot of current, it would not be merged right away with other sub-trees, because the topology design uses the exact current information to make better-informed decisions. The algorithm is outlined below.

Algorithm GM (Greedy Merging)
Input: Sink positions {Z_i}, voltage drop & electromigration constraints, technology
Output: Sized topology of minimal tree
begin
    initialize list of sub-trees to the list of sinks
    repeat { minimum_cost_merge() } n-1 times
end
minimum_cost_merge()
    t = min{cost of sub-tree merges for all pairs of sub-trees}
        where cost = area increase due to the merge (as described in Figure 4)
    merge the sub-trees and append to the list
end
FIGURE 6 Greedy merging algorithm.

Note that the topology design is dynamically driven by wire sizing considerations, and therefore the wire sizing and topology design are determined simultaneously. At any stage of the above algorithm, we have sized sub-trees with exact information about the current they draw and the maximum voltage drop so far from the root of the sub-tree to any of its sinks. If the maximum voltage drop from the root to a sink of a sub-tree is approaching the maximum allowed, the area increase incurred by a merge with this sub-tree goes up, so that topology decisions are guided in the right direction.
The time complexity of the greedy merging algorithm is $O(n^3)$, as we have n merges, each of which is decided upon in $O(n^2)$ time. Each sub-tree is represented by its maximum voltage drop, its children set and the current. Each of these can be computed in linear time for a new sub-tree that merges two others. Thus $O(n^3)$ time is sufficient. In our experience, this greedy heuristic takes a few seconds for instances with a hundred sinks, so that the constants associated with this growth are not large. When combined with partitioning, this enables us to solve problem instances with tens of thousands of sinks in minutes, as opposed to many days as required by competing approaches such as simulated annealing.
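The overall loop structure can be sketched as below. This is a deliberately simplified illustration and not the authors' Algorithm GM: branch widths are set by the electromigration bound alone, the voltage-drop-driven sizing of [14] is omitted, and each sink's temporal information is assumed to be a per-time-slot current profile (a hypothetical encoding of the current interval sets). All names are illustrative.

```python
from itertools import combinations

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def merge_point(z1, z2, pad):
    # Clamp the pad into the bounding box of the two sub-tree roots.
    x = min(max(pad[0], min(z1[0], z2[0])), max(z1[0], z2[0]))
    y = min(max(pad[1], min(z1[1], z2[1])), max(z1[1], z2[1]))
    return (x, y)

def greedy_merge(sinks, pad, j_max):
    """sinks: list of (position, profile), profile[t] = current in slot t.
    Returns (total wire area, peak current at the pad)."""
    trees = [(pos, list(profile)) for pos, profile in sinks]
    area = 0.0
    while len(trees) > 1:
        best = None
        for a, b in combinations(range(len(trees)), 2):
            root = merge_point(trees[a][0], trees[b][0], pad)
            # Area increase: each new branch has length |root - z| and the
            # minimum width peak_current / j_max (electromigration only).
            cost = sum(manhattan(root, z) * max(p) / j_max
                       for z, p in (trees[a], trees[b]))
            if best is None or cost < best[0]:
                best = (cost, a, b, root)
        cost, a, b, root = best
        # Temporal information: add the profiles slot by slot.
        merged = [x + y for x, y in zip(trees[a][1], trees[b][1])]
        trees = [t for k, t in enumerate(trees) if k not in (a, b)]
        trees.append((root, merged))
        area += cost
    root, profile = trees[0]
    area += manhattan(root, pad) * max(profile) / j_max   # trunk to the pad
    return area, max(profile)

# Two unit-current sinks that never conduct together share a unit-width trunk.
disjoint = greedy_merge([((2, 0), [1, 0]), ((4, 0), [0, 1])], (0, 0), 1.0)
together = greedy_merge([((2, 0), [1, 0]), ((4, 0), [1, 0])], (0, 0), 1.0)
```

Even in this reduced form, the example shows the effect described in the text: the same geometry needs less area when the sinks are known to draw current at different times.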

Top-Down Topology Design
If the tree is designed in a top-down manner, it is possible to ensure single-layer routable trees. This can be done by dividing the problem into two pieces whose convex hulls do not intersect, so that the routing internal to the two pieces does not intersect. We also have to ensure that at each stage the wire segment connecting the roots of the two pieces to the pad does not intersect any of the other branches that are topologically lower down in the tree. This idea has been used for finding single-layer routable clock trees [13]. We use linearly separable partitions for dividing the problem into smaller pieces. A partition of a set of points is said to be linearly separable if there exists a line separating the points into two clusters. Linearly separable partitions naturally give us smaller problem instances whose convex hulls do not intersect. Besides, the number of such partitions grows at most quadratically with the number of points if no three of the points are collinear. We can clearly choose one such partition that results in the smallest estimated area.

If we insist on arborescence trees, the root of each sub-tree corresponding to the sub-problems is just the point closest to the pad and contained in the bounding box (the smallest rectangle enclosing all pins). We therefore know the length of the top-level routing. The current in the branches going from the tree root to each of its children can be determined exactly, as it depends solely on the children in each sub-tree. The only information that we do not have when deciding on the partition is the routing area that each sub-problem will require. Estimates are required for this, and we approximate the unknown sub-net area by the product of the diameter of the bounding box containing all sinks of the sub-tree and the square root of the number of sinks. This is motivated by the probabilistic results of [22].

Our top-down heuristic is outlined below. Note that it decides on the topology completely before sizing the wires, and therefore corresponds to conventional methods of choosing a topology first. The topology chosen, however, is cognizant of the electrical constraints that will be encountered later. Once again, note the pivotal role played by the current estimation, which enables good partition cost evaluation.
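The sub-net area estimate used when comparing partitions can be sketched as follows. The weighting by the sub-tree current follows the find_cost routine of Algorithm TD (Figure 7); the function name is hypothetical, and taking the "diameter" of the bounding box to be its half-perimeter (the Manhattan distance between opposite corners) is an assumption made here for illustration.

```python
import math

def estimated_subnet_cost(sinks, subtree_current):
    """Estimate I * sqrt(|S|) * diameter for a set S of sink positions."""
    xs = [x for x, _ in sinks]
    ys = [y for _, y in sinks]
    # Assumed definition: Manhattan diameter of the bounding box.
    diameter = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return subtree_current * math.sqrt(len(sinks)) * diameter

left = [(0, 0), (3, 4), (1, 1), (2, 2)]
cost_left = estimated_subnet_cost(left, 2.0)   # 2 * sqrt(4) * 7
```

A partition (L, R) would then be scored by adding the two sub-net estimates to the area of the branches joining the root to the two sub-tree roots.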
The time complexity of this procedure depends on the size of the acceptable partitions. If we insist on balanced partitions, the time complexity is $O(n^3)$. This is determined as follows.

$T(n) = 2\,T(n/2) + k\,n^3$

There are two sub-problems of equal size and the sub-division takes cubic time because $O(n^2)$ possible partitions have to be evaluated and each evaluation takes linear time. However, if we allow partitions of arbitrary size, $O(n^4)$ will be the worst case time complexity, as we might end up with a sub-problem of size n-1. Just as with the bottom-up construction, the constants associated with this growth are small, so that this heuristic takes about a minute for an instance with 150 sinks.
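Both bounds follow from unrolling the recurrences. The derivation below is a sketch that assumes the partitioning step costs exactly $k n^3$ on a problem of size n and that $T(1)$ is a constant.

```latex
\begin{align*}
\text{Balanced:}\quad
T(n) &= 2\,T(n/2) + k n^3
      = \sum_{i=0}^{\log_2 n - 1} 2^i\, k \left(\frac{n}{2^i}\right)^{3} + n\,T(1) \\
     &= k n^3 \sum_{i=0}^{\log_2 n - 1} 4^{-i} + n\,T(1)
      \;\le\; \tfrac{4}{3}\, k n^3 + n\,T(1) \;=\; O(n^3), \\[4pt]
\text{Unbalanced:}\quad
T(n) &= T(n-1) + T(1) + k n^3
      = n\,T(1) + k \sum_{m=2}^{n} m^3 \;=\; O(n^4).
\end{align*}
```

In the balanced case the work shrinks by a factor of four per level, so the top-level partitioning dominates; in the fully unbalanced case every level pays a cubic cost on a problem only one sink smaller.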

RESULTS
We implemented our algorithm in C on a SPARC 5 workstation. We have compared our heuristics with the previous approach of starting off with a topology and sizing it to satisfy the constraints. The two topologies that we considered were the near-optimal minimum Steiner tree and the star topology. Choosing the topology and then sizing it represents the conventional solution to the problem [17]. We tested how well our heuristics performed in the first set of experiments. In the second set of experiments we checked how useful the time domain current information was, as represented by the current interval sets.

Algorithm TD (Top-Down topology design)
Input: Sink positions, current information, electromigration & voltage drop constraints, technology information
Output: Power distribution topology
begin
    if only two sinks, say l, r, present then
        return binary tree as topology with l and r as children
    for each linearly separable partition (induced by a pair of points)
        find_cost(partition)
    choose partition (L, R) of smallest cost
    embed root at smallest x-, y-coordinate of all sinks in the sub-tree
    TD(L)
    TD(R)
end
find_cost(partition: L, R)
begin
    return (I(L)*sqrt(|L|)*diameter(L) + I(R)*sqrt(|R|)*diameter(R) + root_branch_area)
end
FIGURE 7 Top-Down topology design algorithm.
There are no benchmarks for power supply routing. We therefore introduce our own benchmarks, based on the widely available clock routing benchmarks from [24]. We derived 8 benchmarks from the benchmarks R1 and R2 introduced in [24]. A typical high performance design has tens of thousands of gates and close to a hundred power supply pads. The number of sinks per pad is therefore a few hundred. We chose each of R1 and R2 to have four supply pads at the four corners of the die. The assignment of pins to pads was done by a closest point heuristic, i.e., a pin was assigned to its closest pad, which gives a reasonable partition of the problem into smaller sub-problems. The total current, i.e., the sum of sink currents, was chosen to be 1 A. The distribution of sink currents $I_i$ was chosen to be proportional to the corresponding load capacitances $C_{L_i}$, which is a valid assumption since large capacitive loads occur when circuit elements are sized up to increase speed, therefore increasing current requirements. The die sizes for R1 and R2 were set to be 7.5 mm × 7.5 mm and cm × cm respectively. The number of sinks in each of our eight benchmarks is shown in Table I. The suffix of each name indicates which corner the pad is at. E.g., R1.LL represents the benchmark obtained from R1 with sinks closest to the lower left corner.
The results of our first set of experiments are shown in Table II.
We compared the areas of sized Iterated 1-Steiner trees [12], sized star routing, our greedy merging algorithm and the top-down heuristic. The run time for the largest example, R2.LR, is about one minute on a SPARC 5, and grows cubically with the number of sinks in both our implementations. As the time required is reasonably small, we do not list the run-time requirements for all the benchmarks. Very large examples will have to be partitioned into several smaller sub-problems involving a single pad each. The size of these sub-problems will typically be in the range that we cover. The iterated 1-Steiner heuristic returns trees with small net length and would therefore be used by layout tools which use standard signal routing procedures for power distribution net topology design. The star route supplies each pin with an individual route from the pad. This topology was studied in [3], and the conclusion was that star routing could be competitive in row-based routing schemes used for standard cells.
The resistance per unit grid is milliohm and the maximum current density is mA/micron^2. The time interval for current drawn by each sink was chosen to be the level in a Boolean network. The number of sinks at the highest levels is usually larger than the number at smaller levels. The level is therefore randomly generated from a triangular probability distribution. The levels correspond to a tree-like circuit with 7 levels.
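The random level assignment can be sketched in Python as follows. The exact shape of the triangular distribution is not given in the text; this sketch assumes that the probability of level k is proportional to k, so that higher levels receive more sinks. The function name and the seed parameter are illustrative.

```python
import random


def draw_levels(num_sinks, num_levels=7, seed=0):
    """Draw one level per sink from a discrete triangular distribution."""
    rng = random.Random(seed)                # fixed seed for repeatable benchmarks
    levels = list(range(1, num_levels + 1))
    weights = levels                         # assumption: P(level = k) proportional to k
    return rng.choices(levels, weights=weights, k=num_sinks)
```

Under this assumption, level 7 is drawn about seven times as often as level 1 (probabilities 7/28 and 1/28).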
Sizing was done using the techniques of [14]. We report areas and net lengths for three different vertical voltage drop constraints. The areas are in units of 10^6 grids (1 grid = 0.1 micron × 3 micron). The net lengths are shown in Table III. The lengths are in terms of 10^6 grid units (0.1 micron).
The net length comparison clearly shows that the iterated 1-Steiner trees have the smallest net lengths (the iterated 1-Steiner approach is one of the best heuristics for the minimum rectilinear Steiner tree problem) and the star routes the largest, as expected. The areas of the trees returned by the greedy merging procedure, however, are significantly smaller than those of the other algorithms, mainly due to the simultaneous topology design and wire sizing.
Table IV lists the maximum current drawn from the pad for each case. Note that this is much smaller than the sum of the sink currents. Thus, using current intervals does indeed translate into better estimates and smaller areas.
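The gap between the peak pad current and the sum of sink currents can be illustrated with a small Python sketch. It is a simplified stand-in for the current interval sets, under the assumption that each sink draws a constant current over a contiguous range of discrete time slots. The tuple layout and function names are hypothetical.

```python
def peak_root_current(sinks, num_slots=7):
    """Largest total current over all time slots.

    sinks: list of (current, first_slot, last_slot), slots 0-indexed and inclusive.
    """
    load = [0.0] * num_slots
    for current, first, last in sinks:
        for t in range(first, last + 1):
            load[t] += current               # sink contributes only while it is active
    return max(load)


def summed_root_current(sinks):
    """Conventional estimate: every sink is assumed to draw current at once."""
    return sum(current for current, _, _ in sinks)
```

For three sinks drawing 0.25 A, 0.25 A and 0.5 A in three different slots, the peak is 0.5 A while the summed estimate is 1 A. If every interval covers all slots, the two estimates coincide.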
In the next set of experiments we test how the granularity of the temporal information affects our results. Recall that each sink could draw current at any of 7 time instants. Clearly, if our delay estimation capabilities in the form of net delay bounds are not so precise, there is more uncertainty in the time interval during which a circuit element draws current. In other words, the granularity of the interval will be larger. Note that the standard method of using the sum of sink currents as a branch current is recovered when the time interval spans the entire clock period. In Table V we show how reduction of temporal information granularity affects the results. There, we compared the top-down and bottom-up greedy heuristics, with the results shown in Table VI. We see that the improved current estimates significantly influence the areas of power distribution trees.

6. CONCLUSIONS AND FUTURE WORK
We have proposed two new heuristics for synthesizing the power distribution network. These heuristics use temporal information about the sinks to obtain nets whose areas are much smaller than those obtained by the previous methods. Several issues remain to be solved. While we have developed greedy heuristics, no exact exponential-time algorithm has been proposed. Note that the Hanan grid [11], which gives the positions of optimum internal nodes of the tree for rectilinear minimum Steiner trees, is no longer sufficient for the PTC problem. This can be seen by considering an instance with only a horizontal constraint that requires the internal node to be outside the Hanan grid. Thus, it is of interest to find characterizations of optimum solutions. It is also of interest to generate solutions in polynomial time with performance guarantees, i.e., with area at most a constant factor away from optimal. Our work has been motivated primarily by practical applications, and our approach seems reasonable for all test cases seen.

Figure 2
Figure 2 below shows the local replacement for each clause c_i (x_i, y, z) in the [X3C] instance to the [WEIGHTED CLIQUE COVER] problem. Note that if a clause is in the set cover, we can get a gain corresponding to 4 triangles in the

TABLE IV
number of levels, for a 200 mV vertical constraint. The corresponding root currents are different

TABLE V
Effect of smaller temporal information granularity