Combining technology mapping with layout

Due to the significant contribution of interconnect to the area and speed of today's circuits 
and the technological trend toward smaller and faster gates which will make the effects of 
interconnect even more substantial, interconnect optimization must be performed during all 
phases of the design. The premise of this paper is that by increasing the interaction between 
logic synthesis and physical design, circuits with smaller area and interconnection length, 
and improved performance and routability can be obtained compared to when the two processes 
are done separately. In particular, this paper describes an integrated approach to 
technology mapping and physical design which finds solutions in both domains of design 
representation simultaneously and interactively. The two processes are performed in lockstep: 
technology mapping takes advantage of detailed information about the interconnect 
delays and the layout cost of various optimization alternatives; placement itself is guided by 
the evolving logic structure and accurate path-based delay traces. Using these techniques, 
circuits with smaller area and higher performance have been synthesized.


INTRODUCTION
A principal goal of the electronic design automation effort is to provide designers with the capability to employ a complete top-down design methodology, compiling abstract architectural descriptions into an optimal implementation for a specific physical medium. In fact, as density increases, system and ASIC designers have no choice but to converge to such a top-down design methodology. However, there is at least one major difficulty that must be overcome in order to develop the full potential and promise of this methodology. The problem is that the top-down design approach makes high-level decisions about optimization, scheduling, allocation, logic partitioning and restructuring, and so forth in terms of abstract views of behavior and structure. During these early steps, factors such as characteristics of the physical medium, layout, interconnect and parasitics are ignored. Physical design, which is expected to address these issues, comes much later in the design hierarchy. By then, many of the key architectural and structural decisions have been made, which limits the capability of the physical design tools to generate the "best" solutions in terms of area and performance. In addition, once some parameter at the physical design stage fails to satisfy a constraint imposed on it, the synthesis must be modified (even repeated) so as to accommodate the constraint. The new change may cause some other constraint to be violated, and the process must be iterated.
It is, therefore, necessary to develop models, algorithms and techniques to increase the power and robustness of the top-down design methodologies. The goal, which is easy to state but difficult to achieve, is to integrate the various design steps while keeping the computational complexity manageable. This paper is a step toward achieving this objective; that is, it describes novel techniques to couple logic synthesis to physical design and to explicitly control the interconnection length during logic synthesis. This will allow physical design and logic synthesis to evolve together and will ensure higher degrees of system-level integration.
The goal of logic synthesis is to produce a circuit which satisfies a set of logic equations, occupies minimal silicon area and meets the timing constraints. Logic synthesis is often divided into a technology-independent and a technology-dependent phase. In the first phase, transformations are applied on a Boolean network to find a representation with the least number of literals in the factored form. Additional timing optimization transformations are applied on this minimal area network to improve the circuit performance. The role of the technology-dependent phase is to finish the synthesis of the circuit by performing the final gate selection from a target library. The technology-dependent phase is, to a large extent, constrained by the structure of the optimized Boolean network. Most previous work in logic synthesis [4, 3] has focused on minimizing gate area and delay through a chain of gates without considering the area needed to hold the interconnect lines or the delay through the lines. It is generally assumed that interconnect optimization can be relegated to the physical design phase. Only recently has some attention been given to interconnect optimization during logic synthesis [1, 20, 19].
1.1 Interconnect Effects
Interconnections are becoming a major concern in today's high-performance, high-density ASIC designs because the distributed RC time delay of these lines increases rapidly as chip sizes grow and minimum feature sizes shrink [2]. With recent studies [23, 10] indicating that interconnections occupy more than half the total chip area and account for a significant part of the chip delay, it is appropriate that wiring be integrated into the cost function for logic synthesis. To elaborate on this point, consider Figure 1, which shows a performance-optimized two-input NAND gate driving a performance-optimized inverter gate through 0.2 cm of aluminum interconnect (2 µm wide, 0.5 µm thick, with a 1.0 µm thick field oxide beneath it); 0.2 cm is the expected length of a local interconnect line on a 2 cm × 2 cm chip [2]. Two methods are used to calculate the rise time (to 50% of its final value) at the input of the inverter gate: one method ignores the capacitance and resistance of the interconnect line, the second method accounts for them [22]. Gate delays are taken from data sheets for an industrial 1-micron technology, and the interconnect capacitance and resistance are calculated using expressions given in [2]. The delay calculations clearly show that interconnect capacitance dominates gate input capacitance and that interconnect resistance may be ignored without introducing much error. When the RC tree forms branches, delays for the branching nodes can be calculated independently and accumulated to obtain the delays at the sink nodes. This calculation, however, requires knowledge of net topologies, which is not available before global routing. This effect will not be addressed here. Furthermore, the transmission line properties of interconnect lines are ignored for on-chip connections. Therefore, an accurate expression for propagation delay through gates connected by local interconnect lines is given by
$$d(l) = \tau + R_s\,(C_g + C_i\, l)$$
where $\tau$ is the intrinsic gate delay, $R_s$ is the on-resistance of the driver gate, $C_g$ is the input capacitance of the fanout gate, $C_i$ is the interconnect capacitance per unit length and $l$ is the interconnect length.
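The delay expression above, intrinsic delay plus driver on-resistance times the sum of the fanout gate capacitance and the wire capacitance, can be evaluated with a few lines of code. The following is a minimal sketch; the function name and all numeric values are hypothetical and chosen only to illustrate how the wire term can dominate, not taken from Figure 1 or from any data sheet.

```python
def gate_delay(tau, r_on, c_gate, c_per_len, length):
    """Delay of a gate driving one fanout gate over a local wire:
    intrinsic delay + on-resistance * (gate input cap + wire cap)."""
    return tau + r_on * (c_gate + c_per_len * length)

# Hypothetical values for illustration only (not from the paper).
TAU = 0.2e-9        # intrinsic gate delay, seconds
R_ON = 2.0e3        # driver on-resistance, ohms
C_GATE = 0.05e-12   # fanout gate input capacitance, farads
C_PER_CM = 2.0e-12  # interconnect capacitance per unit length, farads/cm

without_wire = gate_delay(TAU, R_ON, C_GATE, C_PER_CM, 0.0)  # wire ignored
with_wire = gate_delay(TAU, R_ON, C_GATE, C_PER_CM, 0.2)     # 0.2 cm of wire
print(without_wire, with_wire)
```

With these assumed numbers the wire capacitance is several times the gate input capacitance, so ignoring it would underestimate the delay considerably.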
In summary, with the existing technology, the capacitive term is dominated by the capacitance between the interconnection and substrate. For local aluminum lines, the resistive term is dominated by the on-resistance of the MOS transistor; for polysilicon and global aluminum lines on large-size circuits, the resistive term is controlled by the interconnection resistance. As the chip dimension increases and the minimum feature size decreases, the interconnection resistance increases rapidly while the MOS on-resistance remains relatively unchanged; the interconnection capacitance bottoms out at about 1-2 pF/cm while the input gate capacitance decreases. Therefore, the distributed RC delay of interconnect lines will become even more dominant in the future.

Concepts and Examples
A Boolean network $N$ is a directed acyclic graph (DAG) such that for each node $i$ in $N$ there is an associated representation of a Boolean function $f_i$ and a Boolean variable $y_i$, where $y_i = f_i$. There is a directed edge $(i, j)$ from $y_i$ to $y_j$ iff $f_j$ depends explicitly on $y_i$ or $y_i'$. A node $y_i$ is a fanin of a node $y_j$ if there is a directed edge $(i, j)$ and a fanout if there is a directed edge $(j, i)$. A node $y_i$ is a transitive fanin of a node $y_j$ if there is a directed path from $y_i$ to $y_j$ and a transitive fanout if there is a directed path from $y_j$ to $y_i$. Primary inputs are inputs of the Boolean network and primary outputs are its outputs. Intermediate nodes of the Boolean network have at least one fanin and one fanout. A Boolean network is an implementation or representation of a set of incompletely specified Boolean functions.
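The fanin, fanout and transitive fanin relations defined above map directly onto a small graph data structure. The sketch below is only illustrative: the class and method names are hypothetical, and the node functions themselves are left out because only the connectivity matters for these definitions.

```python
class BooleanNetwork:
    """Connectivity of a Boolean network stored as a DAG (functions omitted)."""

    def __init__(self):
        self.fanins = {}   # node -> set of nodes it depends on
        self.fanouts = {}  # node -> set of nodes that depend on it

    def add_edge(self, i, j):
        """Record a directed edge (i, j): node j depends explicitly on node i."""
        for n in (i, j):
            self.fanins.setdefault(n, set())
            self.fanouts.setdefault(n, set())
        self.fanins[j].add(i)
        self.fanouts[i].add(j)

    def transitive_fanin(self, j):
        """All nodes from which there is a directed path to j."""
        seen, stack = set(), list(self.fanins[j])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(self.fanins[n])
        return seen

    def primary_inputs(self):
        return {n for n, f in self.fanins.items() if not f}


net = BooleanNetwork()
for src, dst in [("a", "y1"), ("b", "y1"), ("y1", "y2"), ("c", "y2")]:
    net.add_edge(src, dst)
print(sorted(net.transitive_fanin("y2")))
```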
Given a Boolean network optimized by technology-independent logic operations and a target library, technology mapping is the process of binding nodes in the network to gates in the library such that the area of the final implementation is minimized and timing constraints are satisfied. We illustrate the incorporation of the interconnect into technology mapping with a simple example. Figure 2a shows a small portion of a NAND-decomposed Boolean network. Source nodes $s_i$ have either been mapped (and hence have been assigned matching gates and positions) or are fixed at the circuit boundary. Note that $s_1$ and $s_2$ are positioned near one another but far from $s_3$ and $s_4$. The objective is to transfer the signals from the $s_i$'s to the sink node implementing the desired logic function while using minimum wire length. Conventional technology mappers attempt to find a solution with the smallest-area gate which matches as many intermediate nodes as possible (the solution with one AND4 gate in Figure 2a). This is a good approach if the fanin gates $s_i$ can be placed near the matching gates. However, in many cases, these gates are either strongly connected to different gate clusters on the layout plane or are fixed at the circuit boundary and hence may have positions far from one another and from the matching gate. Therefore, a solution with one distribution point may incur a large interconnection cost. In fact, there is often an optimum number of mapped gates greater than one which will result in overall minimum wire cost, as depicted in Figure 2. If the number of sources is large, say four or more, then it will pay off to consider how close the sources can be placed by a good placement optimizer before deciding whether a solution of one gate (with high fanin count) or a solution of more than one gate (with low fanin counts) should be chosen. The technology mapper proposed here selects the solution with one AND2 and one AND3 gate in Figure 2a.
Figure 2b illustrates the importance of a layout-directed technology decomposition for the technology mapping scheme. This figure shows the same decomposition tree as in Figure 2a. However, this time, as a result of placing the NAND-decomposed network, source nodes $s_1$ and $s_3$ ($s_2$ and $s_4$) have been positioned near one another. Signals coming from $s_1$ and $s_3$ ($s_2$ and $s_4$) enter the network at topologically distant points. This is undesirable because the mapper has lost the option of reducing the wiring cost by breaking one big gate into smaller gates; i.e., in Figure 2b, the mapping solution with one AND4 gate is superior to other solutions in terms of both total gate area and interconnection length. Therefore, Figure 2a provides a better technology decomposition (and hence potential for higher quality mapping) than that in Figure 2b.

Overview
In this paper, we present LDTM [20], a technology mapping program, built on top of MIS [4], which tightly and interactively couples mapping to placement. LDTM's key idea is to generate a placement of the optimized multi-level Boolean network which captures the structure of the network. The placement information is used to evaluate the cost of a gate matching during the decomposition and mapping processes. The placement is dynamically updated in order to maintain the correspondence between logic and layout representations. In the end, a mapped network along with a companion placement solution are generated. The placement solution is then globally relaxed in order to produce a feasible placement according to the target layout style (e.g., standard-cell or sea-of-gates). This design flow is in contrast with the existing synthesis systems, which separate the technology mapping and placement steps.
Technology mapping is driven by layout information derived from the placement of the Boolean network. It is, therefore, essential to generate a placement solution which not only captures the global connectivity structure of the network, but also produces the shortest directed path between any pair of primary input and primary output nodes. A flow-oriented approach to the placement of general directed acyclic graphs is described in [16].
In Section 2, the technology mapping paradigm is presented. Section 3 describes a technique for creating a feasible placement solution from the companion placement solution. Sections 4 and 5 contain experimental results, discussions and future research directions.

LAYOUT-DRIVEN TECHNOLOGY MAPPING
A successful and efficient solution to the technology mapping problem was suggested by K. Keutzer and implemented in DAGON [12] and MIS [9]. The idea is to reduce technology mapping to DAG covering and to approximate DAG covering by a sequence of tree coverings which can be performed optimally using dynamic programming. The DAGON and MIS technology mappers generate circuits with small active cell area but ignore the area contributed by interconnections between gates. Consequently, these mappers produce gates with high fanin count, which often increase routing congestion during the final layout and increase interconnection lengths. Similarly, performance-oriented technology mapping programs suffer from lack of detailed information about the interconnect delay. The approach presented here, however, explicitly considers interconnect and routing complexity during the mapping.
Our layout-directed technology mapping is based on the DAG covering formulation, which can be summarized as follows. A set of complete base functions is chosen, such as a two-input NAND gate and an inverter. The optimized logic equations (obtained from technology-independent optimization) are converted into a graph where every node is one of the base functions. This graph is called the subject graph. Each library gate is also represented by a graph consisting of only base functions. Each such graph is called a pattern graph. A library gate may have many different pattern graphs. The technology mapping problem is then defined as the problem of finding a minimum cost covering of the subject graph by choosing from the collection of pattern graphs for all gates in the library. For area optimization, the cost of a cover is defined as the sum of gate areas. For performance optimization, the cost of a cover is defined as the critical path delay of the resulting circuit. In particular, the mapper binds a given logic circuit onto a set of gates in the target library, minimizing post-layout area, delay or area under delay constraints.

Technology Decomposition
The procedure for converting an optimized Boolean network into the subject graph (i.e., technology decomposition) is not unique, and it is an open problem to determine which of the possible subject graphs yields an optimum solution when an optimum covering algorithm is applied [5]. The goal of our technology decomposition procedure is to find a circuit representation with minimum signal arrival time at the primary outputs and minimum number of wire crossings (given an initial placement of the optimized Boolean network).
The decomposition process starts by constructing AND-OR trees implementing the sum-of-products representation of the logic function associated with each intermediate node in the Boolean network. The function of the AND subtrees is to compute the product terms (cubes) and that of the OR subtrees is to compute the sum of the product terms. The input signals to the AND subtrees and then the cubes in the OR subtrees are ordered. The conversion from the ordered AND-OR subtrees to the gates in the base function set is accomplished using unbalanced NAND decomposition, which provides late arriving inputs with shorter paths through the NAND-decomposed subnetwork [25, 7].
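The idea of giving late arriving inputs shorter paths can be illustrated with a Huffman-style construction: repeatedly combine the two earliest-arriving signals, so that the latest signal is merged last and sits closest to the root. This is a sketch of that general heuristic, not necessarily the exact procedure of [25, 7]; it builds a tree of two-input nodes with a hypothetical unit gate delay and leaves out the conversion to NAND and inverter gates as well as the placement-derived input ordering.

```python
import heapq
from itertools import count

def decompose(arrivals, gate_delay=1.0):
    """Build an unbalanced two-input tree; return (output arrival time, tree).
    arrivals maps each input name to its arrival time."""
    tie = count()  # tie-breaker so the heap never compares tree objects
    heap = [(t, next(tie), name) for name, t in arrivals.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        t1, _, a = heapq.heappop(heap)   # two earliest signals
        t2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (max(t1, t2) + gate_delay, next(tie), (a, b)))
    t, _, tree = heap[0]
    return t, tree

def depth(tree, name, d=0):
    """Number of gates between input `name` and the root of the tree."""
    if tree == name:
        return d
    if isinstance(tree, tuple):
        for child in tree:
            r = depth(child, name, d + 1)
            if r is not None:
                return r
    return None

arrival, tree = decompose({"a": 0.0, "b": 0.0, "c": 0.0, "d": 5.0})
print(arrival, tree, depth(tree, "d"))
```

In this example the late input d passes through a single gate, and the output arrival time is 6 time units, whereas a balanced tree pairing c with d would give 7.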
In order to derive the input signal ordering, one refers to the companion placement solution for the Boolean network. Each multi-pin net signal is modeled by a star connection from the source toward the sinks. By circularly traversing around each node (for example, starting from the positive horizontal axis and proceeding in a counter-clockwise fashion), a unique ordering of the input signals to the node can be determined. This ordering is directly related to the positions of the fanin nodes with respect to the node in question.
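The circular traversal described above amounts to sorting the fanin nodes by the angle of the vector from the node to each fanin, measured counter-clockwise from the positive horizontal axis. A minimal sketch follows; the function name and the coordinates are hypothetical.

```python
import math

def order_fanins(node_pos, fanin_pos):
    """Order fanin names counter-clockwise around the node,
    starting from the positive horizontal axis."""
    cx, cy = node_pos

    def angle(name):
        x, y = fanin_pos[name]
        # atan2 returns values in (-pi, pi]; fold them into [0, 2*pi).
        return math.atan2(y - cy, x - cx) % (2 * math.pi)

    return sorted(fanin_pos, key=angle)

# Hypothetical placement: four fanins around a node at the origin.
positions = {"d": (0, -1), "b": (0, 1), "a": (1, 0), "c": (-1, 0)}
ordering = order_fanins((0, 0), positions)
print(ordering)
```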
The cube ordering is achieved by setting up a linear assignment problem. S slots are placed on an imaginary inner circle around the node, and the projections of the fanin signals onto an imaginary outer circle around the node are found (Figure 3). Then, a linear assignment cost matrix $C$ is set up whose $C_{ik}$ entry corresponds to the cost of assigning cube $i$ to slot $k$. This entry is equal to zero if slot $k$ falls inside the shortest circular span for the immediate support of cube $i$. Otherwise, the cost is proportional to the angular distance of slot $k$ from the nearest end of the support span of cube $i$. The linear assignment program [7] determines a cube assignment with the minimum sum-cost. The cube ordering is easily derived from the cube positions obtained by the above linear assignment procedure. The process of ordering input signals, cubes and then primitive gate decomposition is recursively applied to all nodes in the Boolean network in order to produce the subject graph.
FIGURE 3 Cube ordering viewed as a linear assignment problem (example function F = x1 x3 x4 + x2 x3' + x4' x5).
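The cost matrix described above can be sketched as follows. For brevity this example finds the minimum sum-cost assignment by brute force over permutations, which is only practical for a handful of cubes; a real implementation would use a linear assignment solver instead. Angles are in degrees, each support span is given as a (start, end) pair traversed counter-clockwise, and all names and values are hypothetical.

```python
from itertools import permutations

def circ_dist(a, b):
    """Angular distance between two angles on a circle, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def slot_cost(slot, span):
    """Zero if the slot lies inside the span, otherwise the angular
    distance to the nearest end of the span."""
    lo, hi = span
    if (slot - lo) % 360.0 <= (hi - lo) % 360.0:
        return 0.0
    return min(circ_dist(slot, lo), circ_dist(slot, hi))

def order_cubes(spans, slots):
    """Assign cubes to slots with minimum total cost (brute force),
    then return the cubes ordered by the angle of their slot."""
    cubes = list(spans)
    best = min(
        permutations(range(len(slots)), len(cubes)),
        key=lambda perm: sum(slot_cost(slots[k], spans[c])
                             for c, k in zip(cubes, perm)),
    )
    assignment = dict(zip(cubes, best))
    return sorted(cubes, key=lambda c: slots[assignment[c]])

# Hypothetical support spans for three cubes and three evenly spaced slots.
spans = {"c1": (100.0, 140.0), "c2": (230.0, 250.0), "c3": (350.0, 10.0)}
cube_order = order_cubes(spans, [0.0, 120.0, 240.0])
print(cube_order)
```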

DAG Covering for Minimum Layout Area
Consider a Boolean network $N$ which has been transformed into a subject graph consisting of only two-input NAND and inverter gates. This is the network in its unmapped form, which will be referred to as the inchoate network, $N_{inchoate}$. In DAGON, $N_{inchoate}$ is partitioned into a set of maximal trees, $T_i$, and an optimal dynamic programming solution is found for each tree. In MIS, $N_{inchoate}$ is split into a set of logic cones $K_i$, where each cone corresponds to a primary output and all its transitive fanin nodes. This allows covering across tree boundaries and, as a result, may duplicate logic. The MIS technology mapper implements DAGON as a subset. Our mapper uses cone partitioning.
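For a single tree of two-input NAND and inverter nodes, the dynamic programming step can be sketched as below: at each node, every library pattern that matches is tried, and its cost is the gate area plus the best costs of the subtrees at the pattern inputs. This is a simplified, area-only illustration in the spirit of tree covering, not the LDTM implementation; the tiny library, its areas and the single orientation of the NAND3 pattern are all hypothetical, and the wire cost term is omitted.

```python
from functools import lru_cache

# Hypothetical library: (gate name, area, pattern graph over base functions).
# "x" marks a pattern input. Only one orientation of NAND3 is listed.
LIBRARY = [
    ("INV",   1, ("INV", "x")),
    ("NAND2", 2, ("NAND", "x", "x")),
    ("AND2",  3, ("INV", ("NAND", "x", "x"))),
    ("NAND3", 3, ("NAND", ("INV", ("NAND", "x", "x")), "x")),
]

def match(pattern, node):
    """Return the subject nodes bound to the pattern inputs, or None."""
    if pattern == "x":
        return [node]
    if isinstance(node, str) or node[0] != pattern[0] or len(node) != len(pattern):
        return None
    leaves = []
    for p, n in zip(pattern[1:], node[1:]):
        sub = match(p, n)
        if sub is None:
            return None
        leaves.extend(sub)
    return leaves

@lru_cache(maxsize=None)
def best_cover(node):
    """Minimum-area cover of the tree rooted at node: (area, gate names)."""
    if isinstance(node, str):          # primary input: nothing to cover
        return 0, ()
    best = None
    for name, area, pattern in LIBRARY:
        leaves = match(pattern, node)
        if leaves is None:
            continue
        total, gates = area, (name,)
        for leaf in leaves:            # add the best covers of the inputs
            sub_area, sub_gates = best_cover(leaf)
            total += sub_area
            gates += sub_gates
        if best is None or total < best[0]:
            best = (total, gates)
    return best

# Subject tree for a three-input NAND: NAND(INV(NAND(a, b)), c).
subject = ("NAND", ("INV", ("NAND", "a", "b")), "c")
print(best_cover(subject))
```

Here the single NAND3 gate (area 3) beats the two covers built from smaller gates (area 5 each), which is the behavior of a purely area-driven mapper.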
In the cost of a match m at node v, inputs(v, m) refers to the list of nodes of N which correspond to the inputs of m, gate(m) is the physical gate corresponding to m, and gate($v_i$) is the best gate matching at node $v_i$. The area cost calculation is similar to that in MIS. The wire cost wire_cost(v, m) consists of two terms. The first term is the interconnection length required to complete connections from gate(m) to its fanin gates, i.e., gate($v_i$). The second term is the dynamic programming recursive cost and represents the sum of wire lengths required to connect all gates from primary inputs up to gate($v_i$). In order to calculate wire_cost(v, m), the position of m must be known, as shown next.
At the beginning, nodes are assigned valid placePositions based on the initial global placement solution. As nodes are mapped, mapPositions are calculated and stored on nodes. In particular, match m is placed at the center of mass (or median) of its fanin and fanout rectangles. Due to the postorder traversal of the network during the mapping procedure, inputs(v, m) have already been mapped and therefore their mapPositions are used; outputs(v) are not mapped yet and their placePositions are used.
Depending on the wire length metric adopted, the local placement problem can be solved efficiently or can become difficult. Consider Figure 5, which shows the enclosing rectangles for the fanin and fanout nets of match m at v. (The enclosing rectangle for a fanin (fanout) net is the smallest rectangle enclosing the gates connected by the net.) Given a norm and the coordinates of these fanin and fanout rectangles $r$, the problem is to find a point $p$ which results in the minimum sum of distances between that point and the rectangles. In the case of the Manhattan norm, the solution easily follows by observing that the distance function has a separable form with respect to the variables $x$ and $y$.
That is, the $x$ distance of point $p$ from rectangle $r$ is
$$\tfrac{1}{2}\left(|r.ll.x - p.x| + |r.ur.x - p.x| - |r.ur.x - r.ll.x|\right)$$
where $ll$ and $ur$ refer to the lower left and the upper right corners of rectangle $r$. The constant term is dropped and the problem can be restated as: find $x$ such that $\sum_i |x - x_i|$ is minimum, where $x_i$ corresponds to either the left or the right corner point coordinate of each of the rectangles. The problem is a special case of solving for the median of a graph, which is presented in [11]. It can be shown that this problem, treating only a linear tree rather than a general graph, is very easy to solve; the solution is the median point of the sorted list of $x_i$'s. For the Euclidean norm, the optimal point location problem can be solved approximately by placing m at the center of mass of its fanin and fanout rectangles [18].
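The median solution for the Manhattan norm can be checked numerically. The sketch below handles the x coordinate only (y is treated identically and independently); each rectangle is reduced to its left and right x coordinates, and the function names and example rectangles are hypothetical. Because there are two corner coordinates per rectangle, the list has even length and any point between the two middle values is optimal; the lower one is returned here.

```python
def x_distance(px, rect):
    """x distance from point px to a rectangle given as (ll.x, ur.x);
    zero whenever px lies within the rectangle's x extent."""
    llx, urx = rect
    return 0.5 * (abs(llx - px) + abs(urx - px) - abs(urx - llx))

def total_x_distance(px, rects):
    return sum(x_distance(px, r) for r in rects)

def best_x(rects):
    """Median of all left and right corner coordinates (lower median)."""
    xs = sorted(x for rect in rects for x in rect)
    return xs[(len(xs) - 1) // 2]

# Hypothetical x extents of three fanin/fanout rectangles.
rects = [(0.0, 2.0), (4.0, 6.0), (10.0, 12.0)]
bx = best_x(rects)
print(bx, total_x_distance(bx, rects))
```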
When constructing the enclosing rectangle of fanin net i, it is important to know the fanout nodes of source node $v_i$. These fanout nodes of $v_i$ are dynamically defined based on the current partially mapped network. First, we give some definitions. A sink node in a pattern graph is defined as a node which does not fan out to any other node in the pattern graph. In Figure 4, assume that cone $K_1$ has been mapped. At this point, nodes can be classified into four categories. An egg is a node which has not been processed (visited) by the mapper. A nestling is a node in the current cone, $K_2$, which has been visited. It cannot be predicted whether a nestling will be present in the final mapped network until primary output $PO_2$ is reached. A dove is a node in $K_1$ which is a non-sink element of some pattern match. Such a node will not be present in the final mapped network because it has been merged into another. A hawk is a node in $K_1$ which is a sink node in some pattern match. Such a node will inevitably show up in the final mapped network. Note that every dove has been merged into (fallen prey to) at least one hawk. A nestling can become a hawk or a dove. Due to the possibility of logic duplication, it may be possible for a dove to reincarnate and restart the node's life cycle as an egg and later become a hawk. The dynamic fanout of fanin $v_i$ for match m at v is a hawk, a nestling or an egg which has $v_i$ as its fanin. For example, the list of dynamic fanouts of node $v_4$ consists of nodes $v$, $x_2$ and $f_4$. During the construction of the enclosing rectangle of a net (say 4), nodes covered by the current match are excluded from the net. In addition, mapPositions are used for hawks and nestlings ($x_2$ and $v_4$) and placePositions are used for eggs ($f_4$).
After positioning gate(m), the wire cost associated with the matching of m at node v must be calculated. This cost consists of the sum of the wire lengths from m to its fanins. Consider fanin $v_i$ of m. If it is driving only the input pin of m, the wire length calculation is point-to-point. However, if it is driving multiple fanout pins (including the input pin of m), then the half-perimeter length of the minimum box bounding all pins on the net is used. Note that during technology mapping for minimum area, only the length of the line connecting each fanin $v_i$ to m is of interest. Therefore, the net length must be divided by the dynamic fanout count at $v_i$ in order to get the expected wire length contributed by the connection from $v_i$ to m and thereby avoid duplicate accounting of the wire cost.
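The per-fanin wire length just described can be sketched as a half-perimeter computation followed by a division by the dynamic fanout count. The sketch assumes the Manhattan metric, under which the point-to-point case is simply the half perimeter of a two-pin box; the function names and coordinates are hypothetical.

```python
def half_perimeter(points):
    """Half perimeter of the smallest box bounding all (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def fanin_wire_length(source_pos, match_pos, other_fanout_pos=()):
    """Expected wire length charged to the connection from one fanin
    source to the match: net length divided by the dynamic fanout count."""
    pins = [source_pos, match_pos] + list(other_fanout_pos)
    dynamic_fanout_count = 1 + len(other_fanout_pos)
    return half_perimeter(pins) / dynamic_fanout_count

# Fanin drives only the match: point-to-point Manhattan length.
single = fanin_wire_length((0, 0), (3, 4))
# Fanin also drives one other dynamic fanout located at (6, 0).
shared = fanin_wire_length((0, 0), (3, 4), [(6, 0)])
print(single, shared)
```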

DAG Covering for Minimum Circuit Delay
In the delay mode, the best mapping at a node is determined based on the arrival time of the signal at the node output. As pointed out earlier, it is the delay in the interconnect that is of prime importance. Hence, it is only natural that wiring delay be incorporated into the calculation of the arrival time during technology mapping.
Consider a gate $g$ with output line $y$ and input lines $i$, $i = 1 \ldots p$. Let $g$ fan out to the inputs of gates $g_j$. In a simple linear delay model, the delay through $g$ is a linear function of its output load capacitance $C_L$. The slope of this linearity can be thought of as the output resistance, and the offset (at zero $C_L$) can be thought of as the intrinsic delay through $g$. In general, the delays from different inputs to the output are different. Therefore, the intrinsic delay from input $i$ to $y$ is denoted by $I_i$, and the output resistance at $y$ corresponding to input $i$ is denoted by $R_i$. $I_i$ and $R_i$ each have separate values for rising and falling delays.
Based on this general model, the arrival time at $y$ from input $i$, $t_{y,i}$, can be easily calculated as
$$t_{y,i} = t_i + I_i + R_i\, C_L$$
where $t_i$ is the arrival time at input line $i$.
Using a worst case analysis, the output arrival time at $y$, $t_y$, is defined as the time at which all signals from input lines will be available at $y$ and is given by
$$t_y = \max_{i = 1 \ldots p} t_{y,i}.$$
This calculation for the arrival time requires that the value of $C_L$ be known. $C_L$ is the equivalent capacitive load at $y$. This capacitance is modeled as
$$C_L = \sum_{j=1}^{n} C_j + C_w$$
where $C_j$ denotes the capacitance at the input of fanout gate $g_j$, and $n$ is the number of fanout nodes.
$C_w$ represents the capacitance due to the interconnections which connect $g$ to its fanout nodes. We calculate $C_w$ accurately using the placement information, which provides the horizontal and vertical extents of the signal nets, and using a technology file which provides the capacitance per unit length of horizontal and vertical interconnect.
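The arrival time model above (per-input arrival time plus intrinsic delay plus output resistance times load, with the load being the sum of fanout input capacitances and the wiring capacitance) can be sketched in a few lines. The separate rising and falling values are left out, units are arbitrary, and all names and numbers are hypothetical.

```python
def load_capacitance(fanout_caps, h_extent, v_extent, c_h, c_v):
    """Load at the gate output: sum of fanout input capacitances plus
    wiring capacitance from the horizontal and vertical net extents."""
    c_wire = c_h * h_extent + c_v * v_extent
    return sum(fanout_caps) + c_wire

def output_arrival(input_arrivals, intrinsic, resistance, c_load):
    """Worst-case output arrival: max over inputs of t_i + I_i + R_i * C_L."""
    return max(t + i + r * c_load
               for t, i, r in zip(input_arrivals, intrinsic, resistance))

# Hypothetical two-input gate driving two fanout gates.
c_load = load_capacitance([0.1, 0.2], h_extent=1.0, v_extent=0.5,
                          c_h=2.0, c_v=2.0)
t_out = output_arrival([1.0, 2.0], intrinsic=[0.5, 0.4],
                       resistance=[1.0, 0.5], c_load=c_load)
print(c_load, t_out)
```

In this example the first input determines the output arrival time even though it arrives earlier, because its output resistance makes it more sensitive to the load.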
In Figure 4, fanouts of v are not yet mapped. This implies that the load $C_L$ at the output of gate(m) cannot be determined exactly. This difficulty can be handled by assuming a default load, i.e., all types of gates are assumed to have the same input capacitance. This assumption is also adopted in MIS2.1. However, in order to calculate the wiring capacitance, the positions of the gates at the fanouts of v must be known. This information is not available, and instead the positions of the fanout gates are read from the initial placement solution of the subject graph. The simplification gives rise to inaccuracies in the arrival time calculation. To prevent the inaccuracy from propagating through, the following observation is used: when matching m at v, the capacitance at the output of inputs(v, m) is known because the type and position of their fanout gate, which is gate(m), is known. If the output arrival times of inputs(v, m) are updated, then the input arrival time of gate(m) is accurate. Therefore, the output arrival time for gate(m) can be calculated with less error (that is, the error will be due to the unknown load only).

PLACEMENT RELAXATION
After the logic synthesis stage, a net list of gates and a companion placement solution are available. The placement solution, however, has overlapping gates and has not yet been mapped to rows (in the case of the standard cell layout methodology) or to slots (in the case of the sea-of-gates style). The objective of the global relaxation step is to eliminate gate overlaps and produce an even distribution of gates over the layout image. Two basic approaches are generally used for mapping a global placement result to legal locations: (1) perform a minimum squared error linear assignment which maps the cells in the global placement to the legal positions simultaneously; (2) use a hierarchical bi-partitioning technique to obtain a feasible placement solution.
In our application, the initial (globally optimized) placement is modified throughout the synthesis (based on local considerations only), and therefore, the resulting companion placement is not globally optimized. However, we do not want to throw away the companion placement (which has influenced many of our decisions during the synthesis) and place the mapped network from scratch.
We have adopted the top-down bi-partitioning heuristic in the following way. Our placement procedure consists of alternating and interacting global function optimization and partitioning steps. In particular, for a circuit with $M$ gates, the placement procedure goes through $m = \lceil \log_2 M \rceil$ steps in order to produce a detailed placement. Now, assume that an initial placement solution for the circuit and two parameters $N_s$ and $N_f$ are given. These parameters specify the start and finish conditions for the relaxation procedure; that is, relaxation begins when the number of modules per hierarchical region is $N_s$ and ends when this number is $N_f$. Let $s = \lceil \log_2 N_s \rceil$ and $f = \lceil \log_2 N_f \rceil$; then $m \ge s \ge f \ge 0$. We modify the placement procedure so that it goes through steps $m - s$ to $m - f$ only, thereby achieving the relaxation goal without drastically disturbing the initial placement solution. Note that $N_s = M$ corresponds to doing placement from scratch while $N_f = 1$ corresponds to the detailed placement (one module per region). We have obtained the best results when setting $N_s = M/8$ and $N_f = 1$.
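The range of partitioning steps used for relaxation can be computed directly from the circuit size and the start and finish parameters. The sketch below assumes that step k of the top-down procedure corresponds to roughly M / 2^k modules per region, so that relaxation covers steps m - s through m - f; this reading of the step indices is an interpretation of the text, and the function name is hypothetical.

```python
import math

def relaxation_steps(num_gates, n_start, n_finish):
    """Partitioning steps executed during relaxation, assuming step k
    leaves about num_gates / 2**k modules per region."""
    m = math.ceil(math.log2(num_gates))
    s = math.ceil(math.log2(n_start))
    f = math.ceil(math.log2(n_finish))
    if not (m >= s >= f >= 0):
        raise ValueError("parameters must satisfy m >= s >= f >= 0")
    return range(m - s, m - f + 1)

# 1024 gates, start at M/8 modules per region, finish at one per region.
steps = list(relaxation_steps(1024, 1024 // 8, 1))
print(steps)
```

Under this assumption, starting at M modules per region reproduces the full sequence of steps 0 through m, i.e., placement from scratch.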

EXPERIMENTAL RESULTS
Our objective was to show that by integrating technology mapping and gate placement, one can improve the quality of mapping both in terms of layout area and circuit performance. In order to provide a fair basis for comparison, two pipelines were used to produce the results: (1) read in the optimized circuit; do balanced tree technology decomposition; read in the lib2.genlib standard cell library; run the MIS technology mapper in timing mode; write the mapped circuit to the database; do detailed placement and routing. (2) Read in the optimized circuit; do layout-driven technology decomposition; read in the lib2.genlib standard cell library; run LDTM in timing mode; do detailed placement and routing. The following tools were used for generating the layouts: the GORDIAN package for placement [14], the TIMBER WOLF global router [15], and the YACR detailed router [21].
In both cases, the technology-independent optimizations were performed using the MIS program. The benchmarks were optimized for minimum area using the rugged script [24]. This script produces the optimized circuits. The literal count results (before technology mapping) are listed in Table I.
Table II shows post-mapping comparisons between MIS and LDTM in terms of total gate area and logic delay calculated by ignoring the wiring loads (i.e., using the input capacitances and the intrinsic and fanout delays for logic gates only), and the cpu times on a Sparc Station II workstation. The MIS mapper produces somewhat better results in terms of logic delay and is faster by a factor of 1.8.
Table III shows post-placement comparisons between MIS and LDTM in terms of total chip area and circuit delay calculated after placement and routing and accounting for the wiring loads. As expected, LDTM shows an average chip area improvement of 7% and a total delay improvement of 9% compared to MIS. This improvement is mainly due to reduced wiring load on critical signal paths as a result of LDTM's gate selection policy. We used a value of 3 pF/cm for the capacitance per unit length. As the technology scales down and the contribution of the wiring delay to total circuit delay increases, the percentage improvement of a layout-driven mapper (such as LDTM) over a conventional mapper (such as that of MIS) will increase.
These tables were generated by running the MIS and LDTM mappers in the timing mode. A similar trend existed when we ran these mappers in the area mode.

CONCLUDING REMARKS
In this paper, we put forth techniques for coupling logic synthesis and placement. To achieve this objective, we studied the effects of interconnect on circuit area and performance, presented appropriate models and computational procedures for estimating wiring delay during synthesis, and introduced a scheme for maintaining simultaneous and interactive data representations in logic and layout domains.
Layout information should be considered during all logic synthesis operations in order to improve the circuit routability and reduce the interconnect contribution to circuit area and delay. This is important since, as discussed earlier, interconnects play a primary role in determining the longest paths through a circuit. Therefore, layout-driven algorithms and techniques for performing the various logic operations must be developed. In manipulating the initial representation of the logic function, five operations are key: decomposition, extraction, factoring, substitution, and collapsing [5]. We are developing layout-integrated procedures for these logic operations which aim at minimizing the total interconnection length of the synthesized network as well as the total number of literals in the factored form representation of the network. For example, consider the extraction procedure, which identifies common subexpressions among various functions. The literal value of a kernel measures the difference in the number of literals in the network if that kernel is extracted and made into a new node [6, 4]. We can similarly define the interconnection value of a kernel as the difference in the total wire length in the network if that kernel is extracted and made into a new node. We then use a kernel-selection policy which chooses a kernel with the greatest cost reduction in terms of a linear combination of literal and interconnection values [19].
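The kernel-selection policy based on a linear combination of literal and interconnection values can be sketched as a weighted maximum. The weights, field names and candidate values below are hypothetical; how the two values are computed and how the weights are chosen is outside the scope of this sketch.

```python
def select_kernel(candidates, alpha=1.0, beta=1.0):
    """Pick the kernel with the largest weighted saving, where the saving
    is alpha * literal value + beta * interconnection value."""
    return max(candidates,
               key=lambda k: alpha * k["literal_value"]
                             + beta * k["interconnect_value"])

# Hypothetical candidate kernels with precomputed values.
candidates = [
    {"name": "k1", "literal_value": 5, "interconnect_value": 1.0},
    {"name": "k2", "literal_value": 4, "interconnect_value": 4.0},
]
layout_aware = select_kernel(candidates)            # both terms weighted
literal_only = select_kernel(candidates, beta=0.0)  # conventional choice
print(layout_aware["name"], literal_only["name"])
```

The example shows how accounting for the interconnection value can change which kernel is extracted compared with a purely literal-driven choice.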
Performance optimization logic restructuring operations (e.g., depth reduction [13], partial collapse and resynthesis along the critical paths [25], logic clustering and partial collapse [26], etc.) are often used to speed up the synthesized network. Layout information can be exploited by most of these transformations. For example, consider the circuit speed-up procedure given in [25]. This procedure identifies an ε-network (i.e., a sub-network in which all the signals have a slack within ε of the most negative slack), partially collapses the nodes in the ε-network, and redecomposes these nodes using a timing-driven decomposition scheme such that the resulting network is faster and the area increase is minimal. During this process, if information about the interconnect delay becomes available (through a placement of the original network and incremental calculation of positions for the collapsed and later newly created nodes), then the following improvements are possible: the timing analysis for finding the ε-critical paths is performed more accurately; during decomposition both literal saving and interconnect saving values of the candidate divisors are considered; interconnect delays influence the selection of the best timing-divisors; and NAND-decomposition of collapsed nodes can be performed using information about positions of fanin nodes.
The layout-driven technology mapping procedure can be easily extended to solve the problem of minimizing gate area plus wiring subject to required time constraints on the primary outputs. The idea is to incorporate the wiring area and delay into the initial calculation of area-delay trade-off curves and the subsequent gate selection steps [8].
Previous approaches for Table Look-Up (TLU) based FPGAs have aimed at minimizing the number of TLUs. However, routing resources in these architectures are very limited, and therefore mapping for improved routability is an important consideration. The layout-driven approach can be extended to the FPGA synthesis problem.

FIGURE 2 Active gate area versus wire length trade-off.

FIGURE 5 Dynamic updating of placement positions (Euclidean norm).

TABLE I
Multi-level benchmarks: number of literals in factored form

TABLE II
Comparison of the total gate area and logic delay after placement and routing. The cpu time reflects the mapping time

TABLE III
Comparison of the final chip area and circuit delay after placement and routing. The cpu time reflects the mapping, placement