Technology Mapping for FPGA Using Generalized Functional Decomposition *

In this paper, we address technology mapping for RAM-based FPGAs. Functional decomposition is applied to decompose a large function into a set of smaller subfunctions such that each subfunction can be implemented using a single logic cell. Our system is mainly divided into two parts. The first part is designed specifically for totally symmetric functions, for which a Fast-Decompose algorithm based on weight dependency is proposed. The second part deals with general functions; we consider techniques such as output partition, variable partition, don't care assignment and encoding to minimize the number of derived subfunctions. Using these techniques together, our tool, Fun-Map, improves the mapping results compared with other tools in terms of area and delay.


INTRODUCTION
Field Programmable Gate Arrays (FPGA's) consist of arrays of programmable logic blocks and programmable routing networks. They provide both fast turn-around time and user programmability for ASIC design. There are mainly two types of FPGA architecture: lookup table (RAM) based (e.g., AT&T, Xilinx) and multiplexer based (e.g., Actel). A novel feature of RAM-based devices is that a basic logic cell can implement any Boolean function that satisfies the I/O constraints of the logic cell. In this paper, we concentrate on technology mapping for RAM-based FPGA's.
The functional decomposition theory developed by Ashenhurst [1] has been used for designing switching circuits [4]. It has also been used for PLA decomposition [5, 8, 30, 32, 33] and multilevel logic synthesis [9, 26]. In this paper, we apply functional decomposition to RAM-based FPGA technology mapping. Functional decomposition decomposes a function according to its functionality rather than the structure of a given network.
A totally symmetric function is one in which each of the input variables plays the same role in determining the value of the function. Based on weight dependency and functional decomposition, a fast algorithm using full-adders to map totally symmetric functions is also proposed.
The rest of this paper is organized as follows. In the next section, functional decomposition for FPGA design is presented. Section 3 gives an overview of the technology mapping system. In Section 4, a weight based algorithm specifically for handling totally symmetric functions is presented. [...]

[...] as the Boolean space spanned by X.

Definition 2.4 Given a function f(X) and a partition $\pi = (X_1, X_2)$, let $b_1, b_2 \in BS(X_1)$. Then $b_1$ and $b_2$ are compatible with respect to $\pi$ if $f(b_1, c) = f(b_2, c)$ for all $c \in BS(X_2)$; otherwise, they are incompatible. Compatibility is denoted $b_1 \sim_\pi b_2 / f$.
This compatibility divides $BS(X_1)$ into equivalent classes; any two elements in the same equivalent class are compatible.
Definition 2.5 Given a function f(X) and a partition $\pi = (X_1, X_2)$, $EQU(f, \pi)$ is defined as the number of equivalent classes in $BS(X_1)$ with respect to $\pi$.
Example 2.1 Let $f(a, b, c, d) = (\bar a \bar b + ab)c + (\bar a b + a \bar b)d$ and $\pi = (\{a, b\}, \{c, d\})$. Since $f_{\bar a \bar b} = f_{ab} = c$ and $f_{\bar a b} = f_{a \bar b} = d$, we have $00 \sim 11$ and $01 \sim 10$ with respect to $\pi$, so $EQU(f, \pi) = 2$.
Consider a Boolean function f(X) and a partition $\pi = (X_1, X_2)$. The decomposition of f with respect to $\pi$ under consideration is
$$f(X) = f_t(f_e(X_1), X_2), \qquad (1)$$
where f, $f_t$ and $f_e$ are all multiple-output functions.
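Definitions 2.4 and 2.5 can be checked by brute force on small functions. The Python sketch below is only an illustration (Fun-Map itself works on cubes and OBDDs): it builds the row of each vector of $BS(X_1)$ in the decomposition chart, treats vectors with identical rows as compatible, and counts the resulting classes as $EQU(f, \pi)$. The helper names and the convention that the $X_1$ bits come first in the input tuple are our own assumptions.

```python
from itertools import product


def equivalent_classes(f, x1_size, x2_size):
    """Group the vectors of BS(X1) into equivalent classes (Definition 2.4).

    Hypothetical helper. f takes a tuple of bits ordered (X1 bits..., X2 bits...)
    and returns an output value (an int, or a tuple for a multiple-output f).
    b1 ~ b2 iff f(b1, c) == f(b2, c) for every c in BS(X2), i.e. iff their
    rows in the decomposition chart are identical.
    """
    classes = {}
    for b in product((0, 1), repeat=x1_size):
        row = tuple(f(b + c) for c in product((0, 1), repeat=x2_size))
        classes.setdefault(row, []).append(b)
    return list(classes.values())


def example_f(v):
    # Example 2.1: f = (a'b' + ab)c + (a'b + ab')d
    a, b, c, d = v
    return c if a == b else d


classes = equivalent_classes(example_f, 2, 2)
print(len(classes), classes)  # 2 [[(0, 0), (1, 1)], [(0, 1), (1, 0)]]
```

The printed classes match the example: 00 ~ 11 and 01 ~ 10, so $EQU(f, \pi) = 2$.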
The lower bound on the number of encoding functions, $|f_e|$, depends on the number m of equivalent classes in $BS(X_1)$: $|f_e| \ge \lceil \log_2 m \rceil$ [28].
If the number of equivalent classes in $BS(X_1)$ is less than or equal to 2, a single encoding function suffices and the decomposition is a simple disjoint decomposition; otherwise, it is a complex disjoint decomposition [4]. These two types of decompositions are shown in Figure 1.
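As a small worked check of the bound $|f_e| \ge \lceil \log_2 m \rceil$ and of the simple/complex distinction, the sketch below computes both from the class count m. The function names are hypothetical; the integer trick `(m - 1).bit_length()` is used instead of a floating-point logarithm.

```python
def min_encoding_functions(num_classes: int) -> int:
    """Lower bound |f_e| >= ceil(log2 m) for m equivalent classes in BS(X1)."""
    # (m - 1).bit_length() equals ceil(log2 m) for every m >= 1
    # and avoids floating-point rounding in log2.
    return (num_classes - 1).bit_length()


def decomposition_type(num_classes: int) -> str:
    """Simple disjoint decomposition when at most 2 classes (one f_e output)."""
    return "simple" if num_classes <= 2 else "complex"


for m in (2, 3, 4, 5):
    print(m, min_encoding_functions(m), decomposition_type(m))
```

For Example 2.1 (m = 2) this gives one encoding function, i.e. a simple disjoint decomposition.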
When using functional decomposition to decompose f(X), two problems should be considered: (1) the partition problem: finding a good partition $\pi$; (2) the encoding problem: encoding the equivalent classes.
For the first problem, a partition with a minimal number of equivalent classes should be found to minimize the number of interconnections between $f_e$ and $f_t$. For the encoding problem, $f_e$ can be obtained by assigning different binary codes to the equivalent classes in $BS(X_1)$. However, an assignment that minimizes the overall cost of realizing the functions $f_e$ and $f_t$ needs to be selected.

FPGA Mapping with Complex Disjoint Decomposition
Simple disjoint decomposition handles only a very limited set of functions. For more general functions, we consider complex disjoint decomposition. Definition 2.6 The support of f(X) is defined as the set of variables on which f explicitly depends. It is denoted as sup(f). sup(f) can be obtained by eliminating every variable v for which $f_{\bar v} = f_v$.

FIGURE 1 The functional decomposition of f(X) by $\pi = (X_1, X_2)$.

Definition 2.7 Given a function f and an integer k > 0, f is feasible with respect to k if $|sup(f)| \le k$; otherwise, it is infeasible.
Recall that a logic cell can implement any function of up to k inputs. If a function f(X) has $|sup(f)| > k$, it cannot be realized directly by a single logic cell. To map an infeasible function into logic cells, functional decomposition can be used to decompose it into feasible subfunctions.
From Equation (1), the derived subfunctions are $f_t$ and $f_e$. To make $f_e$ feasible, the partition $\pi = (X_1, X_2)$ used for decomposition must have $|X_1| \le k$; then each output of $f_e$ can be implemented by a single logic cell. Similarly, to implement $f_t$, the size of its input set $X_t = X_2 \cup f_e$ must be at most k; otherwise, $f_t$ must be further decomposed until all subfunctions are feasible.
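The support and feasibility conditions can be sketched with truth-table cofactors. This is a minimal illustration assuming small, completely specified functions; the helper names are hypothetical, and `ft_inputs` assumes a minimum-length encoding so that $|f_e| = \lceil \log_2 m \rceil$. The value k = 5 in the usage lines is only an example (it matches the 5-input CLB discussed in Section 4).

```python
from itertools import product


def support(f, n):
    """sup(f): indices of variables v whose cofactors f_v' and f_v differ."""
    sup = []
    for i in range(n):
        for v in product((0, 1), repeat=n):
            if v[i] == 0:
                w = v[:i] + (1,) + v[i + 1:]  # same point with x_i = 1
                if f(v) != f(w):
                    sup.append(i)
                    break
    return sup


def is_feasible(f, n, k):
    """Definition 2.7: f is feasible w.r.t. k iff |sup(f)| <= k."""
    return len(support(f, n)) <= k


def ft_inputs(x2_size, num_classes):
    """|X_t| = |X2| + |f_e|, assuming a minimum-length encoding of the classes."""
    return x2_size + (num_classes - 1).bit_length()


def g(v):
    # five declared inputs, but x2 and x4 are never used
    return v[0] ^ v[1] ^ v[3]


print(support(g, 5), is_feasible(g, 5, 3), is_feasible(g, 5, 2))
print(ft_inputs(3, 4) <= 5, ft_inputs(3, 5) <= 5)
```

With $|X_2| = 3$ and k = 5, four classes keep $f_t$ feasible (3 + 2 inputs), while five classes force a further decomposition (3 + 3 inputs).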

SYSTEM OVERVIEW
In this section, we give an overview of our functional decomposition based technology mapping system. The system is mainly divided into two parts. The first part is designed specifically for totally symmetric functions; the second part uses the decomposition technique to decompose general functions. The flowchart of this technology mapping system is shown in Figure 2. The system proceeds by first removing the functions which can be implemented using a single CLB. Then, the remaining functions are checked to see whether any of them is totally symmetric. Symmetry is checked by examining the on-set and off-set of the function f [10]. The characteristic set returns the symmetry character of f: it gives the numbers of 1's among the variables of X for which f is in its on-set. A fast decomposition is then performed on totally symmetric functions, using full-adders to synthesize subfunctions. This Fast-Decompose procedure is discussed in detail in Section 4.
General functions are handled in the second part. Techniques such as output partition, variable partition, don't care assignment and encoding are considered. Output partition divides the primary outputs into several groups based on some criterion; each group is then mapped individually. Variable partition finds the "best" partition of the input variables for a completely specified function. For incompletely specified functions, variable partition and don't care assignment are considered together. To derive the subfunctions $f_t$ and $f_e$, we need to encode the equivalent classes to obtain $f_e$. In many cases, the encoding generates a don't care set for $f_t$, which is very useful in the next level decomposition. These techniques are presented in Section 5. The functional decomposition algorithm used in the system is described in Figure 3.

A MAPPING ALGORITHM FOR TOTALLY SYMMETRIC FUNCTIONS
Symmetric Functions
A function f(X) is said to be symmetric with respect to a set $\lambda \subseteq X$ if and only if it is invariant under any permutation of the variables in $\lambda$; $\lambda$ is called a symmetry set of f. If $\lambda = X$, f is totally symmetric; otherwise, it is partially symmetric. For the case $|\lambda| = 2$, f is pairwise symmetric with respect to the symmetry pair $\lambda$.
For a totally symmetric function f, there is a simpler representation: f can be specified by the numbers of 1's among its inputs for which f is 1. This is stated formally in [29]; a totally symmetric function of n variables specified in this way is written $S^n_A$.
Definition 4.1 For a totally symmetric function $f = S^n_A$, the set A is the characteristic set of f.
Example 4.1 For the three-variable majority function $f = x_1x_2 + x_2x_3 + x_1x_3$, $A = \{2, 3\}$; for the two-variable parity function $f = x_1\bar x_2 + \bar x_1 x_2$, $A = \{1\}$.
Since the value of $f(x_1, x_2, \ldots, x_n) = S^n_A$ depends only on the characteristic set A, it can also be specified by the equation
$$x_1 + x_2 + \cdots + x_n = a_i \quad \text{for } a_i \in A \subseteq \{0, 1, \ldots, n\}. \qquad (2)$$
This can be viewed as a kind of threshold logic in which each input $x_i$ is assigned a weight 1. For any input vector $\alpha$, if Equation (2) evaluates to true, then $f(\alpha) = 1$; otherwise $f(\alpha) = 0$.
Example 4.2 Consider the three-variable majority function $f = S^3_{\{2,3\}}$. It is specified by the equation $x_1 + x_2 + x_3 = a_i$ for $a_i \in \{2, 3\}$.
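Equation (2) suggests a direct test for total symmetry: f is totally symmetric exactly when its value depends only on the weight of the input vector. The sketch below is a truth-table illustration under that assumption (completely specified, small n); it is not the on-set/off-set method of [10], and the function names are hypothetical.

```python
from itertools import product


def characteristic_set(f, n):
    """Return A if f (completely specified, n inputs) is totally symmetric, else None.

    f is totally symmetric iff its value depends only on the weight (number
    of 1's) of the input vector; A collects the weights for which f = 1.
    """
    value_at_weight = {}
    for v in product((0, 1), repeat=n):
        w, val = sum(v), f(v)
        if value_at_weight.setdefault(w, val) != val:
            return None  # two vectors of equal weight disagree
    return {w for w, val in value_at_weight.items() if val}


def symmetric_function(A):
    """S_A: evaluate Equation (2), i.e. f = 1 iff x1 + ... + xn is in A."""
    return lambda v: int(sum(v) in A)


def majority(v):
    x1, x2, x3 = v
    return (x1 & x2) | (x2 & x3) | (x1 & x3)


print(characteristic_set(majority, 3))                     # {2, 3}
print(characteristic_set(lambda v: v[0] ^ v[1], 2))        # {1}
print(characteristic_set(lambda v: v[0] & (1 - v[1]), 2))  # None
```

The first two calls reproduce Example 4.1; the third function is not symmetric, so no characteristic set exists.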
Based on a theorem proposed in [11], the symmetry sets of a function f can be detected by first finding all symmetry pairs of f and then using these pairs to form larger symmetry sets. Clearly, if variables $x_i$ and $x_j$ form a symmetry pair of f, then $f = f_{x_i x_j}$, where $f_{x_i x_j}$ is obtained from f by exchanging the variables $x_i$ and $x_j$. We have proposed an efficient transpositional operation [35] for computing $f_{x_i x_j}$ from f in OBDD (Ordered Binary Decision Diagram) representation [2]. With f and $f_{x_i x_j}$ represented as OBDDs, the equivalence check $f = f_{x_i x_j}$ can be performed in constant time. Using this method, it is relatively easy to detect large symmetry sets of completely specified functions. We also generalize this method to incompletely specified functions.
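The pair-then-merge idea can be sketched without OBDDs. As a swapped-in technique, the code below compares f with $f_{x_i x_j}$ by exhaustive evaluation over $BS(X)$, which is exponential in n and only suitable for small, completely specified functions; the OBDD transpositional operation of [35] is what makes the check cheap in practice. Merging uses the fact that pairwise symmetry is transitive, since the transposition $(i\,k)$ equals $(i\,j)(j\,k)(i\,j)$. All names are hypothetical.

```python
from itertools import product


def exchange(v, i, j):
    """Input vector with the values of x_i and x_j swapped."""
    w = list(v)
    w[i], w[j] = w[j], w[i]
    return tuple(w)


def is_symmetry_pair(f, n, i, j):
    """True iff f == f_{x_i x_j}, checked exhaustively over BS(X)."""
    return all(f(v) == f(exchange(v, i, j)) for v in product((0, 1), repeat=n))


def symmetry_sets(f, n):
    """Merge symmetry pairs into maximal symmetry sets.

    Pairwise symmetry is transitive, so comparing a new variable with one
    representative per existing set is enough.
    """
    sets = []
    for i in range(n):
        for s in sets:
            if is_symmetry_pair(f, n, s[0], i):
                s.append(i)
                break
        else:
            sets.append([i])
    return sets


def h(v):
    x0, x1, x2, x3 = v
    return (x0 & x1) | (x2 ^ x3)


print(symmetry_sets(h, 4))  # [[0, 1], [2, 3]]
```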

A Weight Based Algorithm
The following lemma suggests that a totally symmetric function is especially suitable for functional decomposition.
Lemma 4.2 Given a function f(X), let the set $X_s \subseteq X$ be a symmetry set of f. Then the number of equivalent classes in $BS(X_s)$ is not greater than k + 1, where $|X_s| = k$.
Proof: The Boolean space of $X_s$, $BS(X_s)$, is divided into classes according to the number of 1's in the variables of $X_s$. Since f is invariant under any permutation of the variables in $X_s$, any elements (minterms) with the same number of 1's in the variables of $X_s$ are in the same equivalent class. Counting the number of 1's in the variables of $X_s$, there are at most k + 1 cases. Therefore, the number of equivalent classes in $BS(X_s)$ is not greater than k + 1.

For a totally symmetric function f(X) and a subset $X_s \subseteq X$ with $|X_s| = k$, there are at most k + 1 equivalent classes in $BS(X_s)$. Compared to general functions, where the number of equivalent classes may be $2^k$, totally symmetric functions are especially suitable for functional decomposition. For k + 1 equivalent classes, we need at least $\lceil \log_2 (k + 1) \rceil$ subfunctions to encode the information concerning $X_s$. Since there exists an exponential number of different encodings in terms of the encoding length [28] and the synthesized circuit depends heavily on the encoding, we need an effective and efficient encoding algorithm.

In order to retain the symmetric property of the function, we propose the following encoding method. Consider a totally symmetric function f(X) with characteristic set A and a subset $X_s \subseteq X$, where $|X_s| = k$. First, we partition the elements in $BS(X_s)$ into k + 1 groups according to the number of 1's in an element. The elements in the same group have the same number of 1's and are thus in the same equivalent class. The weight of a group is the number of 1's in an element of the group. Then, the group with weight i is encoded as the binary value i. Figure 4 shows the encoding for the case $|X_s| = 4$, where three subfunctions $e_0$, $e_1$ and $e_2$ are derived. With such an encoding, each bit corresponds to a weight that is a power of 2; in Figure 4, $e_0$, $e_1$, $e_2$ correspond to the weights $2^0$, $2^1$, $2^2$, respectively.

Now we partition the inputs into m sets $X_{s1}, X_{s2}, \ldots, X_{sm}$, where $\bigcup_{j=1}^{m} X_{sj} = X$ and $X_{si} \cap X_{sj} = \emptyset$ for $i \ne j$, and encode each set of inputs as described above. We obtain a structure as shown in Figure 5. Let us denote the encoded output bits for input set $X_{sj}$ as $e_{ji}$ for $0 \le i \le l$, where l + 1 is the encoding length and $e_{ji}$ represents weight $2^i$ for the input set $X_{sj}$. By grouping the encoded outputs with the same weight together, the totally symmetric function can be written as $f(X) = f_t(e_s)$, where $f_t$ is specified by
$$2^l(e_{1l} + e_{2l} + \cdots + e_{ml}) + \cdots + 2^0(e_{10} + e_{20} + \cdots + e_{m0}) = a_i \quad \text{for } a_i \in A \subseteq \{0, 1, \ldots, n\}, \qquad (3)$$
where l + 1 is the encoding length. Notice that $f_t$ is a partially symmetric function with l + 1 symmetry sets $S_0, S_1, \ldots, S_l$. Each symmetry set $S_i = \{e_{1i}, e_{2i}, \ldots, e_{mi}\}$ has a weight $2^i$. Now, for each set $S_i$, we define a function $f_{t_i}$ of $S_i$ that counts the number of 1's among $e_{1i}, e_{2i}, \ldots, e_{mi}$. Thus $f_{t_i}$ is also a totally symmetric function, and it can be decomposed and encoded using the same method. Note that $f_{t_i}$ is defined on $S_i$ and the weight of its output must be multiplied by $2^i$. This process can be applied recursively until the size of each symmetry set is less than a predefined number.
The last stage function is then synthesized directly according to the weights that its input bits represent.

FIGURE 5 The top level decomposition of f(X).
FIGURE 6 The decomposition process of 9-input f(X).
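The weight-based encoding and Equation (3) can be checked numerically. In the minimal sketch below (hypothetical names, plain integers instead of logic), each input set is encoded by its weight in binary, and $f_t$ recombines the encoded bits with weights $2^i$ before testing membership in A. It simulates the encoding only; it does not build the recursive network of Figure 6.

```python
def encode_group(bits):
    """Weight-based encoding of one input set X_sj.

    The group with weight i (i ones) is encoded as the binary value i.
    Returns [e_j0, e_j1, ..., e_jl], where e_ji carries weight 2**i.
    k.bit_length() == ceil(log2(k + 1)) bits cover the weights 0..k.
    """
    length = len(bits).bit_length()
    w = sum(bits)
    return [(w >> i) & 1 for i in range(length)]


def f_t(encoded_sets, A):
    """Equation (3): sum 2**i * e_ji over all sets j and bits i, then test against A."""
    total = sum(e << i for bits in encoded_sets for i, e in enumerate(bits))
    return int(total in A)


def f_decomposed(x, set_size, A):
    """Partition x into disjoint input sets, encode each one, then apply f_t."""
    sets = [x[s:s + set_size] for s in range(0, len(x), set_size)]
    return f_t([encode_group(s) for s in sets], A)


print(encode_group((1, 1, 0, 1)))  # weight 3 -> [1, 1, 0] (e0, e1, e2)
```

For a 9-input function split into three 3-input sets, `f_decomposed(x, 3, A)` agrees with $S^9_A$ on every input vector, which is exactly the claim of Equation (3).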

FPGA Technology Mapping
In the Xilinx 3000 series family, a Configurable Logic Block (CLB) can implement any single function of up to 5 variables, or any two-output function of up to 5 input variables with each output depending on at most 4 inputs. To take advantage of the Xilinx architecture, we set the input size of subfunctions to 3. Thus there are 4 groups in the Boolean space of an input set, and the encoding length is two, so the encoding can be implemented in a single CLB. With such a grouping, the two encoded bits become the sum bit and carry bit of a full-adder, and a CLB corresponds to a full-adder. The symmetric function is thus constructed using the full-adder as a basic block [36]. The decomposition algorithm is shown in Figure 8.
As shown in Figure 8, Fast-Decompose stops when the size of each symmetry set is less than 3 or the function $f_t$ is feasible. The time complexity of this procedure is proportional to the number of full-adders used. In [34], we have formally proved that the number of full-adders used for constructing a totally symmetric function $f = S^n_A$ is less than n, where n is the number of input variables. Therefore, the time complexity of Fast-Decompose is bounded by O(n).

FIGURE 7 The synthesized circuit of 9-input f(X).
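The full-adder bound can be illustrated with a simple column-compression model. The sketch below is an assumption-laden stand-in for Figure 8, which is not reproduced here: it greedily feeds three signals of the same weight into a full-adder (sum keeps the weight, carry doubles it) until every weight column holds fewer than 3 signals, and it ignores the paper's alternative stopping rule based on $f_t$ feasibility. Because each full-adder turns 3 signals into 2, the count is n minus the number of remaining signals, i.e. at most n - 1. Names are hypothetical.

```python
def full_adder(a, b, c):
    """One CLB in the Xilinx 3000 mapping: sum (weight 2**i), carry (weight 2**(i+1))."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def fast_decompose(bits):
    """Compress input bits with full-adders until every weight column has < 3 bits.

    columns[i] holds signals of weight 2**i. Returns (columns, full_adder_count).
    """
    columns = {0: list(bits)}
    count = 0
    i = 0
    while i in columns:  # a column exists only if it received inputs or carries
        col = columns[i]
        while len(col) >= 3:
            s, carry = full_adder(col.pop(), col.pop(), col.pop())
            col.append(s)
            columns.setdefault(i + 1, []).append(carry)
            count += 1
        i += 1
    return columns, count


def last_stage(columns, A):
    """Final function: weight the remaining signals and test membership in A."""
    return int(sum(b << i for i, col in columns.items() for b in col) in A)


cols, n_fa = fast_decompose([1, 0, 1, 1, 0, 1, 1, 1, 0])  # 9 inputs, six 1's
print(n_fa, last_stage(cols, {6}))  # 5 1
```

For the 9-input case this model uses 5 full-adders and leaves 4 weighted signals for the last stage function.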

A GENERAL DECOMPOSITION ALGORITHM
To handle general functions and the last stage function resulting from the decomposition of symmetric functions, we consider complex disjoint decomposition. In this section, we map general Boolean functions to FPGA's. Several techniques are proposed to improve the mapping results; they are briefly described below.
(1) Output partition: partition the outputs into a set of groups, and then decompose each group individually;
(2) Variable partition: find the "best" partition $\pi$ with the minimum number of equivalent classes for completely specified functions;
(3) Don't care assignment: assign the don't cares to minimize the number of equivalent classes for incompletely specified functions;
(4) Encoding: derive the subfunctions $f_t$ and $f_e$ and generate don't cares; non-decomposable functions are also handled separately.

Output Partition
For a given function f(X) and a partition $\pi = (X_1, X_2)$, we say that f is non-decomposable with respect to $\pi$ if $\lceil \log_2 m \rceil = |X_1|$, where m is the number of equivalent classes in $BS(X_1)$; in this case encoding does not reduce the number of signals. In some cases, a function is non-decomposable if the multiple-output function is treated as a whole, while it is decomposable if each individual output is handled separately.

FIGURE 9
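The non-decomposability test and the whole-versus-individual effect can be demonstrated by brute force. The two outputs below, $f_1 = (a \oplus b)c$ and $f_2 = ab + d$ with $X_1 = \{a, b\}$, are a hypothetical example of our own, not the one in Example 5.1. Each output alone has 2 equivalent classes, but together they have 3, and $\lceil \log_2 3 \rceil = 2 = |X_1|$.

```python
from itertools import product


def count_classes(f, x1_size, x2_size):
    """Number of distinct decomposition-chart rows, i.e. EQU(f, pi)."""
    rows = {tuple(f(b + c) for c in product((0, 1), repeat=x2_size))
            for b in product((0, 1), repeat=x1_size)}
    return len(rows)


def non_decomposable(f, x1_size, x2_size):
    """ceil(log2 m) == |X1|: encoding the classes saves no signals."""
    m = count_classes(f, x1_size, x2_size)
    return (m - 1).bit_length() == x1_size


def f1(v):
    a, b, c, d = v
    return (a ^ b) & c


def f2(v):
    a, b, c, d = v
    return (a & b) | d


def both(v):
    return (f1(v), f2(v))  # the 2-output function treated as a whole


print(non_decomposable(f1, 2, 2), non_decomposable(f2, 2, 2), non_decomposable(both, 2, 2))
# False False True
```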
However, if we consider each output separately, we cannot take subcircuit sharing into consideration.
From the above observations, we suggest that the outputs of f be partitioned into several decomposable groups, each of which is then decomposed individually. We propose to partition the outputs by grouping those with "similar" best partitions. Usually, for a given function, the best partition is not unique. We first find all good partitions for each individual output. Then we partition the outputs into several groups by checking whether two outputs have the same or almost the same good partition. Grouped functions are more likely to be decomposable if they share the same good partition. Moreover, since the grouped outputs use the same input partition, subcircuit sharing is more likely.
As described above, the time complexity of output partition is dominated by finding good input partitions for each output. Since finding the best input partition requires searching all possible input partitions, it is exponential in the input size. Instead of exhaustive search, we propose a branch and bound algorithm for finding good input partitions, which is described in the following section. Therefore, the time complexity of output partition is O(np), where n is the number of outputs and p is the time complexity of the branch and bound algorithm.
Variable Partition
To find the best input partition, we need to define a cost function for input partitions and provide a procedure for computing the cost of a given partition. Based on the cost function, we also need an efficient algorithm for searching for a best partition. In this section, we propose an easily computed cost function to estimate the number of equivalent classes of a given partition. A branch and bound algorithm is then developed to find the partition with the least estimated cost among all partitions.
In [27], the number of equivalent classes for a given partition is computed using the number of different overlappings of cubes, where an overlapping of cubes means that the intersection of the cubes is not empty. Intuitively, fewer cube patterns produce fewer cube overlappings and thus fewer equivalent classes. Based on this observation, we consider a partition that has the smallest number of cube patterns to be a best partition. We now define cube patterns. Definition 5.1 Let $C_f = \{c_1, c_2, \ldots, c_m\}$ be the on-set of f(X) and $\pi = (X_1, X_2)$ be a partition of X. Then the cube pattern $Cube(C_f, X_1)$ is defined as $Cube(C_f, X_1) = \{l_i \mid c_i = l_i r_i, (l_i, r_i) \in C_f, i = 1, \ldots, m\}$ and $Cube(C_f, X_2)$ is defined as $Cube(C_f, X_2) = \{r_i \mid c_i = l_i r_i, (l_i, r_i) \in C_f, i = 1, \ldots, m\}$, where $l_i$ and $r_i$ are the coordinates associated with the variables in $X_1$ and the variables in $X_2$, respectively.
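The cube-pattern cost and the branch and bound search can be sketched together. Several assumptions are ours: cubes are strings over {0, 1, -} in a fixed variable order; the cost of $(X_1, X - X_1)$ is $|Cube(C_f, X_1)|$; and the bounding rule relies on the monotonicity that restricting patterns to fewer variables cannot create new patterns, which we take to be the content of Observation 5.1. We also assume the comparator in the following example is the 2-bit equality function A = B, which reproduces the costs 4 and 2 quoted there. Helper names are hypothetical.

```python
def cube_patterns(cubes, subset):
    """Cube(C_f, X1): distinct restrictions of the on-set cubes to the variables in subset."""
    return {tuple(cube[i] for i in subset) for cube in cubes}


def cost(cubes, subset):
    """Estimated cost of the partition (subset, X - subset): number of cube patterns."""
    return len(cube_patterns(cubes, subset))


def greedy_subset(cubes, n, size):
    """Initial solution: add the variable that keeps the cost lowest, one at a time."""
    chosen = []
    for _ in range(size):
        rest = [v for v in range(n) if v not in chosen]
        chosen.append(min(rest, key=lambda v: cost(cubes, chosen + [v])))
    return chosen


def best_subset(cubes, n, size):
    """Branch and bound over subsets of {0..n-1} with |X1| == size.

    Bound: a node's cost never exceeds the cost of any extension, so a node
    whose cost is already >= best cannot lead to a better solution node.
    """
    best = greedy_subset(cubes, n, size)
    best_cost = cost(cubes, best)

    def search(path, start):
        nonlocal best, best_cost
        c = cost(cubes, path)
        if c >= best_cost:
            return  # bounded node
        if len(path) == size:
            best, best_cost = list(path), c  # better solution node
            return
        # branch only on variables that still leave room to reach |X1| == size
        for v in range(start, n - (size - len(path)) + 1):
            search(path + [v], v + 1)

    search([], 0)
    return sorted(best), best_cost


# On-set minterms of a 2-bit equality comparator, variables ordered (a1, a2, b1, b2)
cubes = ["0000", "0101", "1010", "1111"]
print(cost(cubes, [0, 1]), cost(cubes, [0, 2]))  # 4 2
print(best_subset(cubes, 4, 2))                  # ([0, 2], 2)
```

The greedy pass supplies the first "best" value, exactly as the text describes, and every single-variable node then has cost 2, so the whole tree is bounded immediately.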
[...] which compares 2-bit binary numbers $A = (a_1, a_2)$ and $B = (b_1, b_2)$. Assume we have two partitions $\pi_1 = (\{a_1, a_2\}, \{b_1, b_2\})$ and $\pi_2 = (\{a_1, b_1\}, \{a_2, b_2\})$. Then $Cube(f, \{a_1, a_2\}) = \{\bar a_1 \bar a_2, \bar a_1 a_2, a_1 \bar a_2, a_1 a_2\}$, $Cube(f, \{b_1, b_2\}) = \{\bar b_1 \bar b_2, \bar b_1 b_2, b_1 \bar b_2, b_1 b_2\}$, $Cube(f, \{a_1, b_1\}) = \{\bar a_1 \bar b_1, a_1 b_1\}$ and $Cube(f, \{a_2, b_2\}) = \{\bar a_2 \bar b_2, a_2 b_2\}$. Thus $cost(f, \pi_1) = 4$ and $cost(f, \pi_2) = 2$. The decomposition charts of these two partitions are shown in Figure 10; the numbers of equivalent classes for $\pi_1$ and $\pi_2$ are 4 and 2, respectively. The partition $\pi_2$ has a smaller cost and fewer equivalent classes than $\pi_1$.
With this cost function, we now present a branch and bound algorithm for finding the partition with the least estimated cost.
Observation 5.1 Given a function f(X) and two partitions $\pi_1 = (Y, X - Y)$ and $\pi_2 = (Z, X - Z)$, [...]
From the cost function defined above, we know that the problem of finding a best partition $\pi = (X_1, X_2)$ can be solved by finding a subset $S \subseteq X$ with $|S| = \min(|X_1|, |X_2|)$ and the minimum number of cube patterns. Without loss of generality, we let $X_1$ be the set of smaller size. Based on Observation 5.1, a branch and bound algorithm is developed for finding a subset $X_1 \subseteq X$ with the minimum number of cube patterns.
First, we find a best solution by including variables into $X_1$ one by one in a greedy way. The best solution found is then used to bound the search space. We use an example to explain the branch and bound algorithm in more detail. Consider the function f of Figure 11. To find a best partition $\pi = (X_1, X_2)$ with $|X_1| = 3$, the search tree is shown in Figure 11. In the search tree, the square nodes at level 3 are solution nodes and the circle nodes are expansion nodes. Each edge is tagged with a variable $v \in X$. Each node N has a cost $|Cube(C_f, P_N)|$, where $P_N \subseteq X$ is the set of variables tagged on the path from Root to node N. The solution nodes are those nodes with $|P_N| = 3$.
Initially, the best cost best is set to $\infty$ and $\pi = (X_1 = \emptyset, X_2 = X)$. We first find the best partition with $|X_1| = 1$; $X_1 = \{a\}$ is found to have the least cost. We then include a second variable into $X_1$ starting with $X_1 = \{a\}$, and the second level expansion nodes from $X_1 = \{a\}$ are searched. The process continues until we find a solution, $X_1 = \{a, b, c\}$. Now we search for a better solution by expanding expansion nodes. Only those expansion nodes whose cost is less than the best so far are searched. An expansion node with cost greater than the best is not searched because, by Observation 5.1, all solution nodes in its subtree have costs greater than the best. The expansion nodes marked with the symbol B are bounded nodes. The search tree in Figure 11 shows that the optimal solution is the solution node with cost 2.

FIGURE 11 The search tree of f.

This estimated cost function has been tested on some circuits in the MCNC benchmark set. Table I shows the results obtained by the estimated cost using the branch and bound algorithm and by the exact cost using enumeration. The column Sum gives the sum of the numbers of equivalent classes of all outputs with respect to the partition found. The column CPU gives the running time in seconds. Note that example cmpn is an n-bit comparator and addn an n-bit adder. The experimental results show that our estimated cost finds the best partition in most cases. The branch and bound algorithm is basically an exhaustive search method, and its worst case complexity is exponential in the input size. However, Table I shows that for large examples, our algorithm can significantly bound the search space.
* indicates that the CPU time exceeds two hours.

Don't Care Assignment
In some cases, we are given an incompletely specified function. In other cases, a don't care set may be generated for $f_t$ when deriving subfunctions. This don't care set is very useful in searching for a good decomposition: if the don't cares are assigned values properly, the number of equivalent classes can be minimized. We show an example to explain this. Example 5.3 Consider an incompletely specified function f(a, b, c, d) and a partition $\pi = (\{a, b\}, \{c, d\})$. Its decomposition chart is shown in Figure 12(a).
Two functions $f_1$ and $f_2$ are generated by different don't care assignments. Their decomposition charts are shown in Figure 12(b) and (c), respectively. We can see that $EQU(f_1, \pi) = 2$ and $EQU(f_2, \pi) =$ [...].
Now a problem arises: how should the don't cares be assigned to minimize the number of equivalent classes of an incompletely specified function? This problem has been formulated as a graph coloring problem in [25], which uses a heuristic for coloring the graph with the minimum number of colors; each color corresponds to an equivalent class. Rather than using graph coloring, we develop a procedure based on the algorithm proposed in [27]. That algorithm computes the communication complexity of completely specified functions; we modify it to make it suitable for incompletely specified functions. We briefly review the algorithm below.
The detailed procedure can be found in [34].
The algorithm used to compute the number of equivalent classes for incompletely specified function is mainly based on Lemma 1 in [31].This lemma is repeated here as Lemma 5.1.
Lemma 5.1 Let $C_1(f)$ ($C_0(f)$) be the on-set (off-set) of f(X) and let $\pi = (X_1, X_2)$ be a partition. Given $v_0, v_1 \in BS(X_1)$, $v_0$ and $v_1$ are incompatible if and only if there are cubes $(l_1, r_1) \in C_1(f)$ and $(l_0, r_0) \in C_0(f)$, where $l_i$ and $r_i$ are the coordinates associated with the variables in $X_1$ and the variables in $X_2$, respectively, such that (1) $r_0 \cap r_1 \ne \emptyset$, and (2) $v_0 \in l_1$ and $v_1 \in l_0$, or $v_0 \in l_0$ and $v_1 \in l_1$.
Our algorithm is mainly divided into two stages.
The first stage is to partition all rows (elements in $BS(X_1)$) of the decomposition chart into compatible groups, such that any two rows in the same group are mutually compatible. Two rows are compatible if their care elements are compatible; that is, for all vertically corresponding elements, it never happens that one row has output 0 and the other has output 1. For example, rows 00 and 01 in Figure 12(a) are compatible, whereas rows 00 and 10 are incompatible. Based on this observation, we define the operator shown in Table II(a), and Lemma 5.2 is proposed for compatibility checking.

The second stage is to assign the don't cares for each compatible group. An operator, shown in Table II(b), is defined for don't care assignment: if two rows $R_1$ and $R_2$ are compatible, applying this operator to $R_1$ and $R_2$ gives the row after don't care assignment. We give an example to illustrate this. Example 5.4 Consider the function f and partition $\pi$ of Example 5.3. By Lemma 5.2, rows 00 and 01 are compatible and thus form a compatible group $G_1$. A group $G_2$ of rows 10 and 11 is formed in the same way. After don't care assignment, the rows for $G_1$ and $G_2$ are 0010 and 1001, respectively.
The resultant decomposition chart is shown in Figure 12(b).
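The two stages can be sketched on decomposition-chart rows written as strings over {0, 1, -}. This is a minimal illustration, not the cube-based procedure of [27] and [34]: we assume the Table II(a) check means "no column where one row has 0 and the other 1", we assume the Table II(b) operator lets a care value override '-', and we use greedy first-fit grouping. The chart below is hypothetical (the actual chart of Example 5.3 is only in Figure 12), chosen so that it yields the rows 0010 and 1001 quoted in Example 5.4.

```python
def compatible(r1, r2):
    """Rows clash only where one has 0 and the other has 1 (assumed Table II(a))."""
    return all({x, y} != {"0", "1"} for x, y in zip(r1, r2))


def merge(r1, r2):
    """Don't care assignment for two compatible rows (assumed Table II(b)):
    a care value overrides '-'; equal care values stay unchanged."""
    return "".join(y if x == "-" else x for x, y in zip(r1, r2))


def assign_dont_cares(rows):
    """Greedy first-fit grouping of compatible rows, merging each group as it grows.

    A row is compatible with every member of a group iff it is compatible with
    the group's merged row, so one check per group suffices.
    """
    groups = []  # each entry: [merged_row, [row labels]]
    for label, row in rows.items():
        for g in groups:
            if compatible(g[0], row):
                g[0] = merge(g[0], row)
                g[1].append(label)
                break
        else:
            groups.append([row, [label]])
    return groups


# Hypothetical chart: rows indexed by ab, columns by cd
rows = {"00": "0-1-", "01": "-0-0", "10": "1-0-", "11": "-001"}
print(assign_dont_cares(rows))  # [['0010', ['00', '01']], ['1001', ['10', '11']]]
```

Any column that is '-' in every member of a group stays unassigned and could be set freely.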

Since this algorithm is divided into two stages, we analyze the time complexity of each stage. The first stage is mainly based on the algorithm proposed in [27], so its time complexity is O(pq), where p is the number of cubes in the on-set of f and q is the number of different overlappings of cubes. In the second stage, the don't cares in the rows of a compatible group are assigned values according to the on-set and off-set cubes in these rows. The number of these cubes is no more than p + r, where p and r are the numbers of cubes in the on-set and off-set of f, respectively. The time complexity of this stage is therefore bounded by O(m(p + r)), where m is the number of compatible classes.
By the above observations, the overall time com- plexity of this algorithm is O(pq + m(p + r)).

Encoding
The Encoding procedure is shown in Figure 13. It is mainly divided into two parts. The first part handles non-decomposable functions. In many cases, a function f(X) may be non-decomposable with respect to all partitions. To decompose such an f, we propose a simple heuristic: divide f into k subfunctions such that each subfunction has $\lceil m/k \rceil$ equivalent classes in $BS(X_1)$ with respect to $\pi$. Then f can be specified as $f = f_1 + f_2 + \cdots + f_k$. As a function of $f_1, \ldots, f_k$, f is feasible with respect to k and may be implemented by a single CLB. Each function $f_i$ is then decomposed individually.
The second part deals with decomposable functions. The procedure includes deriving the subfunctions $f_t$ and $f_e$. $f_e$ is derived by assigning different binary codes to the equivalent classes. We propose a heuristic that assigns adjacent Gray codes to two similar equivalent classes, where two classes are similar if their rows have the same output values in most of the vertically corresponding entries. For example, for the decomposition chart shown in Figure 14, there are 3 row patterns $P_0$, $P_1$ and $P_2$. $P_0$ and $P_1$ agree in 6 output entries, but $P_0$ and $P_2$ agree in only one, so we say that $P_0$ is more similar to $P_1$ than to $P_2$. This heuristic tries to make $f_t$ as simple as possible (with fewer cubes). After $f_e$ is encoded, $f_t$ is derived directly. In many cases, a don't care set may be generated for $f_t$; this happens when the number of equivalent classes is m and $2^{\lceil \log_2 m \rceil} > m$. This don't care set is very useful in the next level decomposition.

FIGURE 13 The Encoding procedure.

As shown in Figure 13, the Encoding procedure is divided into two parts. For non-decomposable functions, dividing f into k subfunctions $f_i$ can be done in constant time, so the time complexity of this part is O(kT), where T is the time complexity of Fun-Map. For decomposable functions, the time complexity is dominated by the similarity checking among the equivalent classes. Consider two row patterns $P_i$ and $P_j$. Their similarity depends on the on-set of $P_i \odot P_j$, where $\odot$ is the equivalence (XNOR) operation: the larger the on-set of $P_i \odot P_j$, the more similar they are. Using OBDD representation, this check can be done in time $O(S_i S_j)$, where $S_i$ and $S_j$ are the sizes of the OBDD's representing $P_i$ and $P_j$, respectively. Suppose there are m equivalent classes with row patterns $P_1, P_2, \ldots, P_m$. The time complexity of similarity checking is then bounded by $O(m^2 S^2)$, where $S = \max_i S_i$ is the largest OBDD size.

FIGURE 14 The example for similar rows.
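The similarity measure and a Gray-code assignment can be sketched as follows. Similarity is the number of agreeing entries, i.e. the size of the on-set of $P_i \odot P_j$ over the chart's columns. The chaining rule is our own simplification of the heuristic, not the paper's exact procedure: starting from $P_0$, we repeatedly append the unused pattern most similar to the last one and give consecutive patterns consecutive Gray codes, which are adjacent. The row patterns are hypothetical, chosen only to match the agreement counts quoted for Figure 14 (6 for $P_0, P_1$ and 1 for $P_0, P_2$).

```python
def similarity(p, q):
    """Number of positions where two row patterns agree (on-set of p XNOR q)."""
    return sum(x == y for x, y in zip(p, q))


def gray(i):
    """i-th reflected binary Gray code; gray(i) and gray(i + 1) differ in one bit."""
    return i ^ (i >> 1)


def gray_assign(patterns):
    """Chain the patterns by similarity, then give consecutive chain entries
    consecutive Gray codes. Returns (codes by pattern index, number of unused codes)."""
    order = [0]
    remaining = set(range(1, len(patterns)))
    while remaining:
        last = patterns[order[-1]]
        nxt = max(sorted(remaining), key=lambda j: similarity(last, patterns[j]))
        order.append(nxt)
        remaining.remove(nxt)
    width = max(1, (len(patterns) - 1).bit_length())  # ceil(log2 m) code bits
    codes = {idx: format(gray(pos), f"0{width}b") for pos, idx in enumerate(order)}
    unused = 2 ** width - len(patterns)  # unused codes become don't cares of f_t
    return codes, unused


P = ["11111100", "11111111", "00000010"]  # hypothetical P0, P1, P2
print(similarity(P[0], P[1]), similarity(P[0], P[2]))  # 6 1
print(gray_assign(P))  # ({0: '00', 1: '01', 2: '11'}, 1)
```

With m = 3 classes, $2^{\lceil \log_2 3 \rceil} - 3 = 1$ code is unused, which is the don't care the text says is passed on to $f_t$.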

EXPERIMENTAL RESULTS
The procedure, Fun-Map, has been implemented on top of MIS [3] and runs on a SUN4/370 (a 12.5 MIPS machine). The program reads an input file in blif format. All output networks are verified by the "verify" command of the MIS system.
We have tested examples from the MCNC benchmark set. Among the MCNC benchmarks, 9symml, rd84 and rd73 are totally symmetric functions. Table III compares Fast-Decompose with other tools in terms of area, and Table IV compares them in terms of the number of levels. The columns C and L give the number of CLB's and the number of levels of the output network, respectively. Compared to the other systems, we obtain the best results in both area and delay.
Table V shows the results of Fun-Map compared with TRADE [25], mispga [22] and mispga (new) [23]. TRADE was also run on a SUN4/370, while mispga and mispga (new) were run on a DEC 5500 (2.28 mips machine). In this table, the time column gives the running time of each program as measured by the time command of MIS. The experimental results show that our program performs better than mispga and mispga (new) on most of the benchmarks in terms of both delay and area. Compared to TRADE, our program is also competitive, and it runs faster than TRADE.

DISCUSSION AND CONCLUSION
In this paper, a functional decomposition approach is applied to RAM-based FPGA technology mapping. Using this approach, infeasible functions can be decomposed into several feasible subfunctions, and each subfunction can then be implemented directly by a CLB.
Our program is mainly divided into two parts. The first part is designed specifically for totally symmetric functions, and the second part deals with general functions. In the first part, a weight based algorithm, Fast-Decompose, is proposed. In the second part, techniques such as output partition, variable partition, don't care assignment and encoding are used to obtain better decomposition results. These techniques have been implemented and tested on many benchmark examples, and the results show that our algorithm is indeed effective. However, the generated results can still be improved; for example, the criterion used for handling non-decomposable functions could be refined. We are currently working toward a better technique for handling non-decomposable functions.

*This work was supported by the National Science Council, R.O.C., under contract no. NSC 81-0404-E-007-610.

FIGURE 2 The flowchart of Fun-Map.

FIGURE 4 The encoding for n = 4.

Example 5.1
Consider a 2-output function f(a, b, c, d) consisting of $f_1$ and $f_2$. The decomposition [...]

FIGURE 10 The decomposition charts of the 2-bit comparison function.

FIGURE 12 An example of don't care assignment.

FIGURE 14 The example for similar rows.

TABLE I Comparisons of Branch and Bound Algorithm (B.B.) with [...]

TABLE III Comparisons of Number of CLB's