Two-dimensional Placement Using Tabu Search

Search based placement of modules is an important problem in VLSI design. It is always desirable that the search converge quickly to a high quality solution. This paper presents a tabu search based optimization technique to place modules on a regular two-dimensional array. The goal of the technique is to speed up the placement process. The technique is based on a two-step placement strategy. The first step is targeted toward improving circuit routability and the second step addresses circuit performance. The technique is demonstrated through placement of several benchmark circuits on academic as well as commercial FPGAs. Results are compared to placements generated by commercial CAE tools and published simulated annealing based techniques. The tabu search technique compares favorably to published simulated annealing based techniques, and it demonstrates an average execution time speedup of a factor of 20 with no impact on quality of results when compared to commercial tools.


INTRODUCTION
Two-dimensional placement is a well studied topic. However, its importance cannot be ignored, because design complexities and requirements keep changing. One technology that is evolving very rapidly is the field programmable gate array (FPGA). Currently, commercially available devices can map up to four million gate equivalent designs [12] and some of the newly announced products like Altera's APEX series [11] will map two million gate equivalent designs. The typical CAD flow for mapping circuits to FPGAs takes place in four inter-dependent steps: design entry, technology mapping, physical placement, and interconnect routing. Improvements in CAD tool technology for mapping circuits to FPGAs have not kept pace with hardware improvements. Currently it takes minutes to hours to map circuits of 10 K gate equivalent designs to FPGAs. We need faster algorithms that provide high quality results, where quality is measured by mapped circuit performance.
Our goal is to speed up the mapping process by speeding up the placement step. Motivated by this goal, we have designed a two-step fast placement tool for array based designs. Our algorithms make use of tabu search [9] based optimization for finding good placement solutions. Circuit routability is enhanced using an edge based model for minimizing the total wire length of the circuit. The circuit timing optimizations are carried out using an edge based model for critical path length minimization.
As a practical demonstration, we map several benchmark designs on FPGAs and compare the results against a simulated annealing search based technique for placement [21]. We also compare the results to the simulated annealing based, ultra fast placement work done in [22], and we compare results with commercial CAE tools from Xilinx, both the XACT 5.2 PPR tools and the M1 tools. We show a significant speedup in convergence to good quality solutions when compared to simulated annealing. Similarly we show favorable results when compared to [22], and we show an average execution time speedup of a factor of 20 with no impact on quality of results when compared to commercial tools.
This paper is organized as follows. In Section 2 we describe fundamentals of the tabu search optimization technique. In Sections 3 and 4 we formally describe the placement problem and related research. In Section 5 we describe our two-dimensional placement solution. We formally describe the model we used for placing circuits on two-dimensional arrays. Then we describe our placement based on total wire length minimization and critical edge length minimization respectively. In Sections 6 and 7 we describe our test methodology and analyze the data, and in Section 8 we conclude the paper by providing a summary and ideas for future work.

TABU SEARCH
In this section we present an overview of the tabu search optimization technique. Tabu search is a meta-heuristic approach for solving constrained optimization problems. When used properly, tabu search approaches near optimal solutions in a relatively short amount of time compared to other non-deterministic random move based methods [9]. Unlike approaches such as simulated annealing that rely on good random choices, tabu search exploits both good and bad strategic choices to guide the search process. Tabu search uses the idea of a move to define the neighborhood of any given solution, and imposes restrictions in the form of a tabu on certain moves to avoid local optima.
As a meta-heuristic, tabu search guides local heuristic search procedures beyond local optima. In tabu search, a list of possible moves is created. In the short term, as moves in the list are executed, tabus, or restrictions, are placed on the executed moves in order to avoid local optima. A tabu is typically in the form of a time limit, and unless certain conditions are met (e.g., aspiration criteria), the move will not be performed again until the time limit has expired. This short term phase is associated with intensification of the search strategy. During the intensification stage, short term memory is used to explore closely related solutions in the local neighborhood. Good tabu search strategies also include long term memory that is used to diversify the search. Diversification moves the current solution out of the local neighborhood. For example, frequency of moves is a good candidate for long term memory storage. Penalties can be placed on moves that are frequently executed. Then, less frequently executed moves can lead the search to unexplored areas.

PROBLEM DESCRIPTION
The set $M$ is used to represent the modules that will be placed onto a two-dimensional array $L$. The set $S$ is used to represent the signals connecting the modules in $M$ together. Given a set of modules $M = \{m_1, m_2, \ldots, m_n\}$ and a set of signals $S = \{s_1, s_2, \ldots, s_q\}$, we associate with each module $m_i \in M$ a set of signals $S_{m_i}$, where $S_{m_i} \subseteq S$. Similarly, with each signal $s_i \in S$ we associate a set of modules $M_{s_i}$, where $M_{s_i} = \{m_j \mid s_i \in S_{m_j}\}$. $M_{s_i}$ is said to be a signal net. We are also given a set of locations $L = \{l_1, l_2, \ldots, l_p\}$, where $p \geq |M|$. The placement problem then becomes how to assign each module $m_i \in M$ to a unique location $l_j \in L$ such that an objective function is optimized [4]. In our case, the objective function is to maximize circuit performance by minimizing path length. For the case of mapping $m_i \in M$ to a regular two-dimensional array, each $l_j \in L$ is represented by a unique $(x_j, y_j)$ location on the surface of the two-dimensional array, where $x_j$ and $y_j$ are integers. Figure 1 shows the 16 element set $L$ for an example 4 x 4 two-dimensional array.
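The sets in this formulation map directly onto simple data structures. The following Python sketch is only an illustration: the module and signal names, the helper names `make_locations` and `is_valid_placement`, and the example placement are all hypothetical and do not come from the benchmark circuits.

```python
from itertools import product

def make_locations(width, height):
    """Return the location set L of a width x height array as (x, y) pairs."""
    return [(x, y) for y, x in product(range(height), range(width))]

def is_valid_placement(placement, modules, locations):
    """Check that every module is assigned a distinct location taken from L."""
    used = list(placement.values())
    return (set(placement) == set(modules)
            and len(set(used)) == len(used)
            and set(used) <= set(locations))

# Hypothetical example data: three modules and two signals.
modules = ["m1", "m2", "m3"]
signals_of = {"m1": {"s1"}, "m2": {"s1", "s2"}, "m3": {"s2"}}  # S_{m_i}

# Build each signal net M_{s_i} = {m_j | s_i in S_{m_j}}.
net_of = {}
for module, signals in signals_of.items():
    for signal in signals:
        net_of.setdefault(signal, set()).add(module)

locations = make_locations(4, 4)   # the 16 element set L of a 4 x 4 array
placement = {"m1": (0, 0), "m2": (1, 0), "m3": (1, 1)}
```

A placement that puts two modules on the same location fails the check, which corresponds to the requirement that each module receives a unique location.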

RELATED WORK
Much research is associated with placement and floorplanning for FPGAs [1-8, 10, 13-20, 23-26]. However, use of the tabu search technique for placement is very limited. In this section we briefly describe research related to tabu search for placement or floorplanning.
Song and Vannelli developed a tabu search based placement algorithm for minimizing total wire length [23]. Their cost function was based only on total wire length and was therefore designed to enhance routability rather than performance, whereas our algorithm also improves circuit performance.
Lim, Chee and Wu developed a tabu search based placement with global routing strategy for standard cells [14, 15]. Their algorithm for standard cells is a divide and conquer strategy based on successive partitioning, while ours uses force directed placement.

PLACEMENT
In this section we describe our two-dimensional placement solution. First we describe our model for abstracting the information from $M$ and $S$ into a graph $G$. Then we describe our total wire length minimization and critical edge length minimization tabu searches respectively.

Model
We convert each multi-terminal net to a set of edges where each edge consists of the driving terminal and one driven terminal. We use this model to keep net sources and sinks in close proximity, thereby enhancing circuit performance. We create the set of edges by converting the hyper-graph input circuit model described earlier to a graph $G = (V, E)$ where $V = \{v_1, v_2, \ldots, v_n\}$, $|V| = n$, $E = \{e_1, e_2, \ldots, e_m\}$, and $|E| = m$. Each vertex $v_i \in V$ corresponds to a circuit module $m_i \in M$. Each edge $e_i \in E$ connects a pair of vertices $(v_j, v_k) \mid v_j, v_k \in V$. The elements of $E$ are created by considering each signal $s_i \in S$. If we let $m_j \in M_{s_i}$ be the source module for signal $s_i$, then an edge $(v_j, v_k)$ is added to $E$ for each $m_k \in M_{s_i} \mid j \neq k$.
At any given time, each element of $V$ is mapped to a unique element of $L$, and the minimum requirement for mapping is $|V| \leq |L|$.
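The conversion from multi-terminal nets to source-to-sink edges can be sketched in a few lines of Python. The representation of a net as a `(source, sinks)` pair and the function name `nets_to_edges` are assumptions made for this illustration.

```python
def nets_to_edges(nets):
    """Expand each net into one (source, sink) edge per driven terminal."""
    edges = []
    for source, sinks in nets:
        for sink in sinks:
            if sink != source:            # j != k: never connect a vertex to itself
                edges.append((source, sink))
    return edges

# Hypothetical circuit: v1 drives v2 and v3, and v2 drives v4.
nets = [("v1", ["v2", "v3"]), ("v2", ["v4"])]
edges = nets_to_edges(nets)
```

A net with one source and k sinks therefore contributes k edges, all of which share the source vertex.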
For the first step of our tabu search based placement strategy, TS_TWL, we seek to enhance routability by minimizing total wire length (TWL). We estimate TWL using the Manhattan length of each edge $e_i \in E$, and we seek to minimize the following function:

$$\mathrm{TWL} = \sum_{\forall e_i \in E} \mathrm{M\_Length}(e_i)$$

The second step of our tabu search based placement strategy, TS_EDGE, seeks to enhance circuit performance by minimizing the length of critical circuit edges. To accomplish this, we traverse $G$ and determine a path weight $pw_i$ for each path $p_i \in P$, where $P$ is the set of all paths for $G$. For simplicity we let $pw_i$ be the maximum level for each $p_i \in P$. Edges in critical paths receive a higher weight. Figure 2 shows an example circuit with six paths. In Figure 2 path $p_1$ is at level 1; paths $p_2$ and $p_3$ are at level 2; and paths $p_4$, $p_5$ and $p_6$ are at level 3. We associate with each edge $e_j \in E$ a set of paths $P_{e_j}$, where $P_{e_j} \subseteq P$. For example, in Figure 2 we have $P_{e_1} = \{p_1\}$, $P_{e_2} = \{p_2, p_3, p_4, p_5, p_6\}$, $P_{e_3} = \{p_4, p_5, p_6\}$, $P_{e_4} = \{p_2\}$, $P_{e_5} = \{p_3\}$, $P_{e_6} = \{p_4\}$, $P_{e_7} = \{p_5\}$, and $P_{e_8} = \{p_6\}$. Then we determine a weight $w_j$ for each edge $e_j \in E$:

$$\forall e_j \in E, \quad w_j = \max_{\forall p_i \in P_{e_j}} (pw_i)$$

For example, in Figure 2 the weight for edge $e_2$ is the maximum path weight over the paths in the set $\{p_2, p_3, p_4, p_5, p_6\}$, or $w_2 = 3$. Similarly for $e_3$ in Figure 2, $w_3$ is the maximum path weight over the paths in the set $\{p_4, p_5, p_6\}$, or $w_3 = 3$. The determination of the edge weights is accomplished with a breadth first search. Then we weight the Manhattan length of each edge $e_j \in E$ by multiplying the Manhattan length of edge $e_j$ by its corresponding weight $w_j$.
For our timing driven tabu search based step, TS_EDGE, we use a two part optimization function. First we minimize the weighted length of the longest edge. Second, since many configurations may have the same weighted longest edge length, we add together the lengths of the $n$ longest edges (NLE) and minimize NLE in the event of a tie.
Key to the development of a tabu search is a search list. For TS_TWL our search list $U$ consists of all possible swaps of vertices occupying adjacent locations in $L$. This implies two basic swap moves: horizontal (swap of adjacent vertices with the same y coordinate) and vertical (swap of adjacent vertices with the same x coordinate). Valid swaps also include the exchange of a vertex from a position in $L$ into an adjacent empty location in $L$.
There are two reasons this move type was chosen: (1) to keep the move list short, and (2) to minimize the overhead of updating the move list after a move is executed. Given a two-dimensional array $L$ of width $W$ units and height $H$ units, there are $|U| = (H \times (W-1)) + ((H-1) \times W) \approx 2(H \times W)$ possible swaps or moves in $U$. Therefore $U = \{u_1, u_2, \ldots, u_n\}$ where $n = |U|$. Figure 3 shows an example horizontal swap move $u_i$ and vertical swap move $u_j$. For TS_TWL, given a random initial placement in $L$, we seek to optimize our objective function, minimization of TWL, by selecting an appropriate sequence of moves from $U$.
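As a minimal illustration of the TWL cost and of the size of the move list, the Python sketch below enumerates every adjacent-location swap on a W x H array and sums Manhattan edge lengths. The function names and the small example placement are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) locations."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def total_wire_length(edges, pos):
    """TWL: sum of the Manhattan lengths of all edges under placement pos."""
    return sum(manhattan(pos[u], pos[v]) for u, v in edges)

def adjacent_swaps(width, height):
    """Enumerate the move list U as pairs of adjacent locations."""
    moves = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                moves.append(((x, y), (x + 1, y)))   # horizontal swap
            if y + 1 < height:
                moves.append(((x, y), (x, y + 1)))   # vertical swap
    return moves

W, H = 4, 4
U = adjacent_swaps(W, H)

# Hypothetical placement of three vertices connected by two edges.
pos = {"v1": (0, 0), "v2": (2, 1), "v3": (0, 3)}
edges = [("v1", "v2"), ("v1", "v3")]
cost = total_wire_length(edges, pos)
```

For the 4 x 4 array the list holds 4 x 3 + 3 x 4 = 24 moves, which is below the 2(H x W) = 32 bound because locations on the right and top borders have fewer neighbors.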
In TS_TWL, each move $u_i \in U$ has an associated attractiveness $A_i$, the sum of the adjacent forces pulling on the vertices $v_j$ and $v_k$ that make up move $u_i$. For the vertical move we have

$$A_i = M(v_j) \times PN(v_j) + M(v_k) \times PS(v_k)$$

and for the horizontal move

$$A_i = M(v_j) \times PE(v_j) + M(v_k) \times PW(v_k).$$

Each vertex $v_i \in V$ has one multiplication factor $M(v_i)$ and four associated pulls or forces: $PN(v_i)$, $PE(v_i)$, $PS(v_i)$, and $PW(v_i)$. The functions $X(v_i)$ and $Y(v_i)$ respectively return the current x and y coordinates of vertex $v_i$, and the pulls are computed from these coordinates. For example, Figure 4 shows edges $e_1 = (v_1, v_2)$ and $e_3 = (v_1, v_3) \in E_{v_1}$, from which the pulls on $v_1$ can be calculated. Overall we see vertex $v_1$ has a slight pull to the north and a strong pull to the west. Similarly, in Figure 4 we see vertex $v_4$ has $PN(v_4) = PS(v_4) = 0$, $PE(v_4) = 2$, and $PW(v_4) = -2$ due to edge $e_2 = (v_4, v_5) \in E_{v_4}$.
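The Python sketch below shows one way the four pulls and the horizontal attractiveness could be computed. It assumes that each pull is the signed sum of coordinate differences to the vertices sharing an edge with $v_i$, and that y grows toward the north. This model is an assumption chosen to agree with the values quoted for $v_4$ ($PE = 2$, $PW = -2$), not a definition taken from the text. The coordinates are hypothetical and are not read from Figure 4.

```python
def pulls(v, edges, pos):
    """Return (PN, PE, PS, PW) for vertex v under the assumed signed model."""
    x, y = pos[v]
    pull_east = 0
    pull_north = 0
    for a, b in edges:
        if v in (a, b) and a != b:
            other = b if a == v else a
            ox, oy = pos[other]
            pull_east += ox - x     # positive when the neighbor lies to the east
            pull_north += oy - y    # positive when the neighbor lies to the north
    return pull_north, pull_east, -pull_north, -pull_east

def horizontal_attractiveness(left, right, edges, pos, factor):
    """A_i = M(left) * PE(left) + M(right) * PW(right) for a horizontal swap."""
    pe_left = pulls(left, edges, pos)[1]
    pw_right = pulls(right, edges, pos)[3]
    return factor[left] * pe_left + factor[right] * pw_right

# Hypothetical layout: v4 sits immediately west of v1.
pos = {"v1": (3, 0), "v2": (1, 0), "v3": (2, 1), "v4": (2, 0), "v5": (4, 0)}
edges = [("v1", "v2"), ("v4", "v5"), ("v1", "v3")]
factor = {v: 1 for v in pos}        # M(v) = 1 for every vertex

a_i = horizontal_attractiveness("v4", "v1", edges, pos, factor)
```

With these coordinates $v_1$ has a pull of 1 to the north and 3 to the west, and the horizontal move that swaps $v_4$ and $v_1$ evaluates to 1 x 2 + 1 x 3 = 5.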
Figure 4 shows that horizontal move $u_i$ consists of swapping the positions of vertices $v_1$ and $v_4$. If initially $M(v_j) = 1\ \forall v_j \in V$, we can calculate the attractiveness $A_i$ for horizontal move $u_i$ in Figure 4: $A_i = M(v_4) \times PE(v_4) + M(v_1) \times PW(v_1) = 1 \times 2 + 1 \times 3 = 5$. In a similar manner $A_i$ is calculated for each $u_i \in U$. (For a given move $u_i \in U$, if one or both of the adjacent slots are empty of vertices, then the pull corresponding to each empty slot is set to 0.)
If we used the move list $U$ in a typical greedy search strategy (i.e., given an initial placement, find a move that would improve the minimum TWL) we would quickly reach a local optimum. This local optimum would probably not be close to the global optimum, the minimum TWL. However, by applying the concepts of tabu search, i.e., accepting strategic moves that may not improve the current minimum TWL, we climb out of local optima to rapidly converge on near optimal solutions. After executing move $u_i \in U$ we set a tabu tenure for $u_i$. Move $u_i$ will not be executed again until the tabu tenure has expired or our aspiration criterion is satisfied. In this way we climb out of local optima and accept the current best move even if it does not improve the current minimum TWL.

FIGURE 4 Example pull calculation.
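The interplay of move list, tabu tenure, and aspiration can be sketched as a short loop. The Python example below is a simplified illustration, not the TS_TWL implementation: it ranks candidate moves by the TWL they would produce rather than by the attractiveness $A_i$, and the function name, tenure, and iteration count are hypothetical choices.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def twl(edges, pos):
    return sum(manhattan(pos[u], pos[v]) for u, v in edges)

def tabu_twl(edges, pos, width, height, iterations=100, tenure=5):
    """Tabu search over adjacent-location swaps that minimizes TWL."""
    pos = dict(pos)
    moves = [((x, y), (x + 1, y)) for y in range(height) for x in range(width - 1)]
    moves += [((x, y), (x, y + 1)) for y in range(height - 1) for x in range(width)]
    tabu_until = {}                       # move -> first iteration it is allowed again
    best_cost, best_pos = twl(edges, pos), dict(pos)
    for it in range(iterations):
        at = {loc: v for v, loc in pos.items()}   # location -> vertex
        chosen = None
        for move in moves:
            a, b = move
            if a not in at and b not in at:
                continue                  # swapping two empty slots changes nothing
            trial = dict(pos)
            if a in at:
                trial[at[a]] = b
            if b in at:
                trial[at[b]] = a
            cost = twl(edges, trial)
            is_tabu = tabu_until.get(move, 0) > it
            if is_tabu and cost >= best_cost:
                continue                  # tabu, and aspiration is not satisfied
            if chosen is None or cost < chosen[0]:
                chosen = (cost, move, trial)
        if chosen is None:
            continue
        cost, move, pos = chosen          # accept the best move even if it is worse
        tabu_until[move] = it + 1 + tenure
        if cost < best_cost:
            best_cost, best_pos = cost, dict(pos)
    return best_cost, best_pos

# Hypothetical three-vertex circuit on a 4 x 4 array.
edges = [("a", "b"), ("b", "c")]
start = {"a": (0, 0), "b": (3, 3), "c": (0, 3)}
best_cost, best_pos = tabu_twl(edges, start, 4, 4)
```

The aspiration rule used here admits a tabu move only when it would beat the best TWL seen so far. Other aspiration criteria are possible, and the text does not specify which one is used.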

Timing Driven Placement
For our timing driven tabu search based algorithm, TS_EDGE, we use the edge list $E$ as our search list. We order our edge list $E$ in descending order according to each edge's weighted Manhattan length. We then search the edge list, looking at each of the two vertices attached to each edge as possible candidates for a move. Therefore in algorithm TS_EDGE, $E = \{e_1, e_2, \ldots, e_n\}$, where $n = |E|$, is our search or move list. The vertices attached to the edges with the longest weighted Manhattan lengths are the most attractive candidates for moving closer together. By moving these vertices closer together, the longest edges are shortened, thereby enhancing circuit performance and reducing the longest paths.
Once an edge is selected from the search list, we look at only one of the edge's two vertices as a possible move candidate. For simplicity we pick one of two possible moves for the vertex selected: vertical swap or horizontal swap. For the horizontal swap, adjacent vertices with the same y coordinate are swapped. For the vertical swap, adjacent vertices with the same x coordinate are swapped. Figure 5 shows an example horizontal swap move for vertex $v_2$ attached to vertex $v_1$ by edge $e_i \in E$. In this case the Manhattan length of edge $e_i$ is reduced by one. Figure 6 shows an example vertical swap move for the vertex $v_5$ attached to vertex $v_4$ by edge $e_j \in E$. Similarly, the length of edge $e_j$ is reduced by one. In our tabu search based algorithm, TS_EDGE, given a random or other initial placement in $L$, we seek to optimize our objective function, minimization of the longest weighted edge length (or in the case of a tie, NLE), by selecting an appropriate sequence of moves from $E$.
If we used the search list $E$ in a typical greedy search strategy (i.e., given an initial placement, find a move that would improve the current minimum longest edge length) we would quickly reach a local optimum. This local optimum would probably not be close to the global optimum, the minimum longest edge length/NLE combination. However, by applying the concepts of tabu search, i.e., accepting strategic moves that may not improve the current minimum NLE, our TS_EDGE algorithm climbs out of local optima to rapidly converge on near optimal solutions. After executing a move for a vertex on edge $e_i \in E$ we set a tabu tenure (the number of iterations for which a vertex's position is locked) for the moved vertex on edge $e_i$. This vertex on edge $e_i$ will not be moved again until the tabu tenure has expired or our aspiration criterion is satisfied. In this way we climb out of local optima and accept the current best move even if it does not improve the current best solution.
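The two part TS_EDGE objective can be expressed as a tuple that is compared lexicographically, so that NLE only matters when the longest weighted edge lengths tie. The Python sketch below uses the path sets and levels of the Figure 2 example for the edge weights. The Manhattan lengths, the value n = 3, and the function names are hypothetical.

```python
def edge_weights(paths_of_edge, path_weight):
    """w_j is the maximum path weight over all paths that use edge e_j."""
    return {e: max(path_weight[p] for p in paths)
            for e, paths in paths_of_edge.items()}

def ts_edge_cost(lengths, weights, n):
    """Return (longest weighted edge length, sum of the n longest weighted lengths)."""
    weighted = sorted((weights[e] * lengths[e] for e in lengths), reverse=True)
    return weighted[0], sum(weighted[:n])

# Path levels and path sets from the Figure 2 example.
path_weight = {"p1": 1, "p2": 2, "p3": 2, "p4": 3, "p5": 3, "p6": 3}
paths_of_edge = {
    "e1": ["p1"], "e2": ["p2", "p3", "p4", "p5", "p6"], "e3": ["p4", "p5", "p6"],
    "e4": ["p2"], "e5": ["p3"], "e6": ["p4"], "e7": ["p5"], "e8": ["p6"],
}
weights = edge_weights(paths_of_edge, path_weight)

# Hypothetical Manhattan lengths for two candidate placements.
lengths_a = {"e1": 4, "e2": 1, "e3": 2, "e4": 1, "e5": 3, "e6": 1, "e7": 2, "e8": 1}
lengths_b = dict(lengths_a, e7=1)   # same placement with edge e7 shortened by one

cost_a = ts_edge_cost(lengths_a, weights, n=3)
cost_b = ts_edge_cost(lengths_b, weights, n=3)
```

Both placements share the same longest weighted edge length, so the comparison `cost_b < cost_a` is decided by the NLE term alone.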

TEST METHOD
We empirically tested TS_TWL, TS_EDGE, and the 2-step (TS_TWL followed by TS_EDGE) tabu search based placement methodologies described above using Xilinx Netlist Format (XNF) benchmark circuits available from MCNC (email benchmarks@mcnc.org) and netlist benchmark circuits from the University of Toronto (http://www.eecg.toronto.edu/vaughn). For comparison to the simulated annealing technique, we placed the benchmark circuits using our algorithms as well as a simulated annealing placement algorithm with the same TWL cost function [21].
For the XNF benchmark circuits (see Tab. I), we used Xilinx routers to route all placed circuits. Additionally we placed and routed the XNF benchmark circuits using Xilinx PPR and the more recent M1 tools for comparison of execution times and result quality. We used statistics available from the Xilinx tools to compare XNF circuit placement quality. For the benchmark circuits from Toronto (see Tab. II), we placed the circuits using our TS_TWL algorithm and compared the execution time to the ultra fast placement work [22] done at Toronto. We assume each of the vertices in the circuits can be mapped to one and only one location on the smallest square array $L$ such that $|L| = \lceil \sqrt{|V|} \rceil^2$. The Unix time function was used to determine system placement times for our tabu search algorithms and the simulated annealing based algorithm.
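Sizing the smallest square array for a circuit is a one-line computation. The Python sketch below uses integer arithmetic to avoid floating point rounding. The function name `square_array_side` is hypothetical.

```python
from math import isqrt

def square_array_side(num_vertices):
    """Side length of the smallest square array with at least num_vertices locations."""
    if num_vertices <= 0:
        return 0
    return isqrt(num_vertices - 1) + 1    # integer form of ceil(sqrt(num_vertices))

side = square_array_side(10)
num_locations = side * side
```

A circuit with 10 vertices therefore needs a 4 x 4 array with 16 locations, and a circuit with exactly 16 vertices fills the same array completely.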

RESULTS
Figures 7-9 plot the current minimum TWL (y-axis) against execution time (x-axis) for example runs of TS_TWL and simulated annealing on XNF circuits c2670, c3540 and c6288 respectively. In each of the figures, the solid line shows TWL versus execution time for TS_TWL and the dashed lines show TWL versus execution time for the simulated annealing algorithm set for three different rates of convergence. These figures demonstrate the fast convergence of the generic tabu search on good solutions relative to the generic simulated annealing algorithm. The simulated annealing algorithm approaches the near optimal solution of minimum TWL, but it suffers from either slow start up time or slow overall rate of convergence, depending on how the simulated annealing parameters are chosen. Overall, these graphs demonstrate the applicability of tabu search as a stand alone placement tool or as a first pass placement method for initializing the input for further refinement by simulated annealing or other random move based approaches to placement. It should be noted that many methods exist to enhance simulated annealing, but many of these can also be applied to tabu search to speed it up.
Table III shows average execution times for the placement step of circuit mapping by TS_TWL, TS_EDGE, and the combined tabu search approach (TS_TWL then TS_EDGE). The same random placements were used as inputs to all algorithms (except for M1, where we had no control over technology mapping). All algorithms and tools were executed on an UltraSPARC 1 workstation. Times for PPR and M1 were taken for the default tool settings and are used only for comparison. We found that performing a 2-step approach (TS_TWL then TS_EDGE) could greatly reduce the execution time of our tabu search based algorithms. A true comparison cannot be made between tabu search and the commercial tools since the objective functions of the commercial tools are unknown; however, we provide data from the commercial tools for informational purposes. The 2-step tabu search is approximately 25 times faster than PPR and 20 times faster than M1.
Table IV shows the static delay calculations done on the post-routed XNF circuits. The worst case static pad-to-pad delay for the 2-step tabu search is very similar to that of the XNF circuits placed by PPR and M1. Therefore, for the benchmark XNF circuits used to test the tabu search method, result quality was similar to that of the commercial tools.
Table V shows the execution time comparison of TS_TWL to the Ultra Fast placement tool and the modified VPR tool (modified to improve execution time at the cost of placement quality) from the University of Toronto [22]. A direct comparison of the algorithms cannot be made since they have different goals and cost functions; however, relative to execution time, the tabu search method is on the same order as that of the tools from Toronto.

CONCLUSIONS
We have described a two-step routability and performance driven tabu search based algorithm for placement of circuits on two-dimensional arrays. Our tabu search based method performs extensive local and diversified searches that result in very well placed circuits with high performance. We demonstrated the approach with benchmark circuits from MCNC and the University of Toronto. For the benchmark circuits from MCNC, we compared the results to both simulated annealing [21] and commercial tools. Our results demonstrate that good placements were found much more quickly than by the commercial tools and were of similar quality. For the benchmark circuits from Toronto, our placement times compared favorably to the Ultra Fast placement work at the University of Toronto. We feel the tabu search based methods presented here can be used in a stand alone fast placement tool for large designs or for fast initial placement to use as input to other placement algorithms.
Our future work includes adding routability and performance estimation to the tabu search based placement approach. This will reduce the necessity of successively performing the placement step and allow feedback for any necessary iterations of the circuit mapping process.

FIGURE 3 Example horizontal and vertical moves.