Timing-Driven Circuit Implementation

We consider the problem of selecting the proper implementation of each circuit module from a cell library to minimize the propagation delay along every path from any primary input to any primary output, subject to an upper bound on the total area of the circuit. Different module implementations may have different areas and different delays on their paths. We show that the latter problem is NP-hard even for directed acyclic graphs with two implementations per module and no restrictions on the overall area of the circuit. We present a novel retiming-based heuristic for determining the minimum clock period on sequential circuits. Although our heuristics can handle a bound on the total area of the circuit, the emphasis is on timing.


INTRODUCTION
The circuit implementation problem studied here is related to the technology mapping problem studied earlier by Brayton et al. [1], Keutzer [7], Pedram et al. [13], Touati et al. [16,17], Chaudhary et al. [3] and others. These authors examine the problem of mapping a Boolean network using gates from a finite-size cell library. In this paper, however, we consider Boolean networks that have already been mapped.
We examine the problem of selecting, from a cell library, an implementation for each module so that the propagation delay along any path from any primary input to any primary output is minimum. It is desirable that the total area of the circuit does not exceed a given bound. Every module implementation may have different delays along the paths connecting different pairs of input-output (I/O) terminals. Different implementations may also have different areas. A similar problem, the basic circuit implementation (BCI) problem, was studied in [2,11]; there, different implementations may have different areas, but the delays within each module are uniform.
The circuit implementation problem is very complex since many factors, such as delay, area, and power, must be taken into consideration. However, a recent trend in Computer Aided Design (CAD) focuses on timing-driven applications, where priority is given to the maximum delay along any I/O path in a designed combinational circuit, or between any two flip-flops in a synchronous sequential circuit [15]. Thus, our primary goal is to select module implementations so that we minimize the maximum delay of a given circuit. For the sake of simplicity, we focus on the Timing-Driven General Circuit Implementation (TDGCI) model, where the module implementations are selected without considering the area of each implementation. However, the heuristic solutions presented in this paper can be easily extended to consider a bound on the total area of the implemented circuit.
We consider the pin-dependent MIS library delay model as formulated in [3], where the arrival time at the output $g_o$ of some module $g$ is expressed as

$\mathrm{arrival}(g_o, C_{g_o}) = \max_{g_i \in \mathrm{inputs}(g)} \left( \tau_{g_i,g_o} + R_{g_i,g_o} C_{g_o} + \mathrm{arrival}(g_i, C_{g_i}) \right)$,

where $\tau_{g_i,g_o}$ is the intrinsic gate delay from input $g_i$ to output $g_o$ of $g$, $R_{g_i,g_o}$ is the drive resistance of $g$ corresponding to a signal transition at input $g_i$, $C_{g_o}$ is the load capacitance seen at $g_o$, and $\mathrm{arrival}(g_i, C_{g_i})$ is the arrival time at input $g_i$ corresponding to the load $C_{g_i}$ seen at that input [3]. The load capacitance $C_g$ depends on the input pin capacitances of the gates it is driving (see footnote 2). Observe that if all pin capacitances of all module implementations are the same, we arrive at a simpler delay model, the simplified TDGCI problem, where every module implementation has local delays on its various paths that do not depend on the delays on paths of other modules in the circuit. In that case, the delay along a path can be computed by simply adding the delays on the module edges on the path.
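The recursion of the pin-dependent model can be illustrated with a minimal Python sketch. The netlist encoding (the `fanins` and `load` dictionaries), the gate names, and the numeric values below are hypothetical and chosen only for illustration; primary inputs are assumed to have arrival time 0.

```python
def arrival(gate, fanins, load, memo=None):
    """Arrival time at the output of `gate` under the pin-dependent model.

    fanins: {gate: [(driver, tau, R), ...]}  (hypothetical encoding)
            tau is the intrinsic delay from that input pin to the output,
            R is the drive resistance for a transition at that input pin.
    load:   {gate: load capacitance seen at the gate output}
    Gates without an entry in `fanins` are treated as primary inputs.
    """
    if memo is None:
        memo = {}
    if gate in memo:
        return memo[gate]
    pins = fanins.get(gate, [])
    if not pins:
        memo[gate] = 0.0          # assumption: primary inputs arrive at time 0
        return 0.0
    # max over input pins of: intrinsic delay + R * C_load + arrival at the pin
    memo[gate] = max(
        tau + R * load[gate] + arrival(driver, fanins, load, memo)
        for driver, tau, R in pins
    )
    return memo[gate]


# Toy example: g2 is driven by g1 and by primary input c.
fanins = {
    "g1": [("a", 1.0, 2.0), ("b", 1.5, 1.0)],
    "g2": [("g1", 0.5, 1.0), ("c", 2.0, 0.5)],
}
load = {"g1": 0.5, "g2": 1.0}
t_g2 = arrival("g2", fanins, load)
```

If every gate sees the same load, the term `R * load[gate]` becomes a per-pin constant and the computation reduces to adding fixed edge delays along a path, which is the simplified model described above.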
Most of the previous work in the literature is on a simpler model, the BCI problem [2, 11, 10]. In this model the delay at each module output does not depend on the path taken through the module. For example, the rising transition of an m-input CMOS NAND gate can be approximated as [18]: $t_r = \frac{R_p}{n}(m n C_d + C_r + k C_g)$, where $R_p$ is the effective resistance of the p-device in a minimum-sized inverter, $n$ is the width multiplier for p-devices in this gate, $k$ is the fan-out, $m$ is the fan-in, $C_g$ is the gate capacitance of a minimum-sized inverter, $C_d$ is the source/drain capacitance of a minimum-sized inverter, and $C_r$ is the routing capacitance [18]. A similar approximation is given for the falling transition. Observe that, due to the different capacitances, the delay of one module may depend on the delay of a neighboring module. However, the authors in [2,11,10] have considered a simplified BCI model so that the delay along a path can be computed by summing the delays of the modules on the path. Chan [2] has shown that the simplified BCI problem is NP-hard for circuits with tree topology. Furthermore, for the simplified BCI model, Chan [2] has given a pseudo-polynomial time algorithm for trees, and a heuristic for basic circuits modeled by directed acyclic graphs (dags). Later, Li et al. [11] showed that the simplified BCI problem is NP-hard. They also developed a pseudo-polynomial time algorithm that obtains optimal solutions for basic series-parallel circuits [11]. In addition, they proposed six heuristics for basic combinational circuits under the simplified BCI model, without actually proving that the BCI problem on combinational circuits is NP-hard in the strong sense [11]. We have shown that the latter problem is indeed NP-hard by reducing from the One-In-Three 3SAT problem. The reduction is given in the Appendix. The latter was also recently, and independently, shown in [10] by reducing from the 3SAT problem. Both reductions hold for the restricted case of two implementations per module.
It is often the case in VLSI that parameters related to capacitances are ignored so that algorithmic solutions can be obtained more easily. This is, for example, the case in the clustering problem for delay minimization studied in [8, 12, 14], among others.
Footnote 2: The latter recursive formula is slightly different from the one in [3] since we only consider Boolean networks that have already been mapped.
In another context, the authors in [9] allow similar simplifications when they perform retiming in a sequential circuit. Retiming is a technique that repositions the existing flip-flops of a sequential circuit so that its operation is not modified and the clock period is minimized. This is equivalent to maintaining the same number of flip-flops on each cycle. Leiserson and Saxe have presented efficient retiming algorithms [9]. In fact, in this paper we modify one of the algorithms in [9] to solve the TDGCI problem efficiently for sequential circuits. Thus, the heuristics proposed in this paper are derived by initially considering the simplified TDGCI problem. At this point we decide the implementation for every module. Subsequently, we report the actual delay according to the pin-dependent MIS library delay model. In this way we derive accurate solutions in a more efficient (in terms of running time) manner.
The paper is organized as follows. In Section 2, we solve an interesting open problem and show that the TDGCI problem remains NP-hard on directed acyclic graphs (combinational circuits), where all the modules in the library have the same pin capacitances and there are only two possible implementations for each module. This is an improvement over the result in Li et al. [11].
In Section 3, we consider the TDGCI problem on general circuits and give heuristics for both combinational and sequential circuits. In sequential circuits the goal is to minimize the clock period. We define the latter problem as that of selecting an implementation for each of the circuit modules, so that the clock period of the circuit is minimized and the circuit function is preserved. As in [9], we assume delays on the combinational components only, and not on the flip-flops. In the first part of Section 3 we present a method for combinational circuits. This approach resembles the iterative improvement methodology, a popular approach in CAD, which needs to be modified so that it is more time efficient. We emphasize that iterative improvement turns out to be a very expensive (time-consuming) operation, especially when one insists on working directly on the pin-dependent MIS library delay model, whereas our proposed method is much more efficient in running time and, surprisingly, also in quality of results when compared to iterative improvement. Then we present an $O(|V|^2 |E| \log |V|)$ time approach that uses retiming techniques as well as the latter method to obtain efficient solutions for the circuit implementation problem on sequential circuits. Our proposed heuristics are then compared with two alternative approaches in Section 4.
Below we present some notation. A circuit is an interconnected set of modules. Each module has a number of input and output pins. A module with p input pins and q output pins is called a (p, q)-module. Some of the input pins of a (p, q)-module may be primary inputs and some output pins may be primary outputs.

THE COMPLEXITY OF THE TDGCI PROBLEM
We first show that the TDGCI decision problem is NP-complete on directed acyclic graphs with two implementations per module and uniform pin capacitances.

TIMING-DRIVEN CIRCUIT IMPLEMENTATION PROBLEM (TDGCI)
Input: A combinational logic circuit, two implementations for each module of the circuit, and a delay D. Output: Is there an implementation such that the delay of the given circuit is at most D?
We reduce from the following NP-complete problem in [5], MAX-2SAT. Input: A set U of variables, a collection C of clauses over U such that each clause $c_k \in C$ has $|c_k| = 2$, and a positive integer $K \le |C|$. Question: Is there a truth assignment for U that simultaneously satisfies at least K of the clauses in C?
THEOREM 2.1 The TDGCI decision problem is NP-complete in the strong sense even if the maximum number of possible implementations for each module is two.
Proof. The TDGCI decision problem is easily shown to be in NP. Next, we transform MAX-2SAT to TDGCI. Let $U = \{u_1, u_2, \ldots, u_n\}$ be a set of variables and $C = \{c_1, c_2, \ldots, c_m\}$ be a set of clauses making up an instance of MAX-2SAT. We shall construct an instance of TDGCI such that the MAX-2SAT instance is satisfiable if and only if the constructed TDGCI instance has an implementation with delay at most D. We obtain the TDGCI instance as follows.
Let $l_1^k$, $l_2^k$ be the first and the second literals of clause $c_k$, $1 \le k \le |C|$, respectively. Note that any literal $l_i^k$ is either equal to some $u_j$ or its complement $u_j'$, where $u_j \in U$. For every clause $c_k$, $1 \le k \le |C|$, we construct two modules called variable modules and a module called clause module. We call these three modules the $k$th block.
The structure of the variables and clause modules is given below.
Each variable module is a (4,4)-module and corresponds to a literal in the particular clause.
For example, if $l_1^k$ is a literal of $u_2$ and $l_2^k$ is a literal of $u_3$, we construct two variable modules which we label $U_2^k$ and $U_3^k$, respectively. The structure of module $U_i^k$ is independent of whether the respective literal is $u_i$ or $u_i'$ and is given in Figure 1a, which also gives the labeling of the $r$th input and the $r$th output of the module. The clause module $C_k$, corresponding to clause $c_k$, is a (2, 2)-module whose $r$th input and $r$th output are labeled analogously. The structure of clause module $C_k$ is given in Figure 1b.
Observe that it contains 2 internal nodes (this constraint can be removed, but it simplifies the description of the reduction) and 5 internal edges, labeled $e_i^k$, $1 \le i \le 5$, respectively.
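The variable and clause modules are connected as follows. If $l_1^k$ is $u_m$, $u_m \in U$, then the first input of $C_k$ is connected to one of the first two outputs of $U_m^k$, and if $l_1^k$ is $u_m'$, it is connected to the other one. Similarly, if $l_2^k$ is $u_n$, $u_n \in U$, then the second input of $C_k$ is connected to one of the first two outputs of $U_n^k$, and if $l_2^k$ is $u_n'$, $u_n \in U$, it is connected to the other one.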
Observe that up to this point we have specified only the connections on the first two inputs and outputs of every variable module in our design, as well as all the interconnections to and from any clause module. (We postpone the remaining interconnections for later.) We will now give two possible implementations for each variable and clause module, and an upper bound D on the delay, so that we guarantee that at least K clauses are satisfied if and only if the delay is at most D. This will be shown assuming a consistent assignment of true/false values on all appearances of each variable. This consistency will be guaranteed later via connections of the 3rd and 4th inputs and outputs of each variable module. Every variable module $U_i^k$ has two implementations: If $u_i$ is assigned the value true, then we assign delays 0, 1, 0, 1 along the edges connecting the $r$th input and output, $1 \le r \le 4$, respectively. If $u_i$ is assigned the value false, then we assign delays 1, 0, 1, 0, respectively. Similarly, each clause module $C_k$ has two implementations. The implementation of a clause module depends on the true/false evaluation of the first literal of the respective clause. If the first literal of a clause is true, then we choose the implementation where $e_1^k = 1$ and $e_2^k = e_3^k = e_4^k = e_5^k = 0$. Otherwise, we choose the second implementation, where $e_2^k = 1$ and $e_1^k = e_3^k = e_4^k = e_5^k = 0$. Finally, we set the delay D to be $2m - K$.
An example of the TDGCI instance corresponding to a MAX-2SAT instance is shown in Figure 1c.
We now show that the MAX-2SAT instance is satisfiable if and only if the constructed TDGCI instance has an implementation with delay $D'$ at most D. Let $c_k$ be a clause and consider the path from any of the first two inputs of the corresponding variable modules to any of the two outputs of the clause module $C_k$. If either one literal or both literals of $c_k$ are evaluated to be true, then both outputs of $C_k$ will have delay 1. If both literals are evaluated to be false, then both outputs will have delay 2. From the way we constructed the TDGCI instance, the delay accumulated up to the $i$th block is added to the delay up to the $(i+1)$st block. Thus, we ensure that every satisfied clause $c_k$ increases the delay up to the $k$th block by 1 unit. On the other hand, if $c_k$ is a nonsatisfied clause, the delay up to the $k$th block is increased by 2 units.
If the MAX-2SAT instance is satisfied, there are at least K satisfied clauses and at most $m - K$ clauses that are not satisfied. If there are exactly K clauses satisfied, then $D' = 2(m - K) + K = 2m - K$. It is easy to see that in the constructed TDGCI instance the delay is then at most D. Conversely, suppose that the constructed TDGCI instance is satisfied. This implies that the delay along the longest path is at most $2m - K$, or equivalently, there are at least K blocks with delay 1. The latter implies that there are at least K clauses that are satisfied in the MAX-2SAT instance.
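The counting argument can be checked on small instances with the following Python sketch. It does not build the circuit; it only mirrors the accounting of the proof (1 unit of delay per satisfied clause, 2 units per nonsatisfied clause) under the assumption that the truth assignment is already consistent. The literal encoding `(variable, is_positive)` and the function names are hypothetical.

```python
def chain_delay(clauses, assignment):
    """Delay accumulated over all blocks: 1 per satisfied clause, 2 otherwise.

    clauses:    list of 2-literal clauses; a literal is (variable, is_positive)
    assignment: {variable: bool}, assumed consistent across all appearances
    """
    delay = 0
    for clause in clauses:
        satisfied = any(assignment[var] == positive for var, positive in clause)
        delay += 1 if satisfied else 2
    return delay


def meets_bound(clauses, assignment, K):
    """True iff the accumulated delay is at most D = 2m - K."""
    m = len(clauses)
    return chain_delay(clauses, assignment) <= 2 * m - K


# (u1 or u2), (u1' or u2), (u1' or u2')
clauses = [
    (("u1", True), ("u2", True)),
    (("u1", False), ("u2", True)),
    (("u1", False), ("u2", False)),
]
assignment = {"u1": True, "u2": False}   # satisfies the first and third clause
```

With two of the three clauses satisfied, the delay is $2 \cdot 3 - 2 = 4$, so the bound holds for $K = 2$ and fails for $K = 3$.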
Up to now, we have assumed that we can ensure a consistent true/false assignment on all appearances of each variable. The construction so far does not necessarily ensure this. However, we enforce it by enlarging our construction as follows: For every variable $u_i$ of the MAX-2SAT instance, we construct a (2, 2)-module which we call the control module and label $CU_i$. The structure of such a module, together with the labeling of its $r$th input and $r$th output, is given in Figure 1d. Note that a variable $u_i$ may appear in more than one clause, and for each of its appearances we have created a new variable module. We connect these variable modules with their corresponding control modules as follows: We connect the 3rd and 4th outputs of the variable module corresponding to the $k$th appearance of variable $u_i$, $k \ge 1$, with the 3rd and 4th inputs, respectively, of the variable module corresponding to the next, $(k+1)$st, appearance of variable $u_i$. We continue in this manner, and last we connect the 3rd and 4th outputs of the variable module corresponding to the last appearance of variable $u_i$ with the 1st and 2nd inputs, respectively, of the control module $CU_i$. Let $m_i$ be the number of clauses that contain either $u_i$ or $u_i'$. Every control module $CU_i$ has two implementations: If $u_i$ is assigned the value true, then we choose the 1st implementation, where the edge connecting the first input and the first output of $CU_i$ has delay $2m - K$, and the edge connecting the second input and the second output has delay $2m - K - m_i$. We call these edges the 1st and 2nd internal edge of $CU_i$, respectively. If $u_i$ is assigned the value false, then we choose the 2nd implementation, where the latter edges have delays $2m - K - m_i$ and $2m - K$, respectively.
We now show that the control modules guarantee a consistent true/false assignment to the variables. If the delay in the TDGCI instance is at most $2m - K$, then the delays up to both outputs of every control module $CU_i$ must be at most $2m - K$. However, our construction enforces that exactly one of the two internal edges of $CU_i$ is assigned a delay of exactly $2m - K$ units. If the first internal edge of $CU_j$ is assigned delay $2m - K$, then the delay up to the first input of $CU_j$ is 0. This in turn implies that the delays on all edges connecting the 3rd input and output of every variable module corresponding to variable $u_j$ must be 0. The latter enforces a consistent true assignment to all appearances of variable $u_j$. Similarly, if the second internal edge of module $CU_j$ is assigned delay $2m - K$, then we ensure a consistent false assignment to all appearances of variable $u_j$.

Combinational Circuits
We propose a heuristic for solving the TDGCI problem on combinational circuits modeled by dags. For simplicity, we assume two implementations per module. We borrow ideas from the iterative improvement methodology. We first define the gain g(u) of a node u to be the difference between the delays of the longest path considering the node's current and complementary implementation, respectively. Let the two delays be denoted $D_1$ and $D_2$. The gain of node u is $g(u) = D_1 - D_2$, i.e., the benefit resulting from interchanging the module's implementation. In order to speed up operations, however, we employ a technique that calculates the gain of the modules approximately (based on local principles), and in most cases provably correctly. In order to be able to perform these local computations, we must assume that all modules in the library have identical pin capacitances and therefore that the load capacitance seen at each module output is the same. Note that this is where our approach differs from straightforward iterative improvement. The major operation of our heuristic is to select the module with the biggest calculated gain, change its implementation, and lock it. A locked module is one for which (a) the implementation has been found, and (b) the implementation will not be changed in later iterations of the program (see footnote 3). Finally, after all circuit modules have been assigned an implementation, we perform an extra step (topological search) to enhance the quality of our results. We consider the modules' actual pin capacitances and, via a longest path computation, we calculate precisely the (minimized) maximum delay of the circuit.
In a preprocessing step, we transform the input combinational circuit to an acyclic directed graph, $G = (V, E)$, by substituting every (p, q)-module by (p + q) vertices. Figure 2 illustrates the transformation. Note that in the transformed graph, delays exist only on the edges that correspond to edges internal to the modules in the original circuit.
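The preprocessing step can be sketched in Python as follows. The dictionary layout of `modules`, the `nets` list of (driver pin, sink pin) pairs, and all names are assumptions made for this illustration only; the sketch creates one vertex per pin, puts the module delays on internal edges, and gives external (net) edges zero delay.

```python
def build_graph(modules, nets):
    """Replace every (p, q)-module by p + q vertices (one per pin).

    modules: {name: {"inputs": [...], "outputs": [...],
                     "delays": {(in_pin, out_pin): delay}}}
    nets:    list of ((module, out_pin), (module, in_pin)) connections
    Returns (vertices, edges) with edges as {(u, v): delay}.
    """
    vertices = set()
    edges = {}
    for name, m in modules.items():
        # one vertex per input pin and per output pin
        vertices.update((name, pin) for pin in m["inputs"] + m["outputs"])
        # internal edges carry the pin-to-pin delays of the module
        for (in_pin, out_pin), delay in m["delays"].items():
            edges[((name, in_pin), (name, out_pin))] = delay
    for driver, sink in nets:
        edges[(driver, sink)] = 0      # external edges have no delay
    return vertices, edges


modules = {
    "M1": {"inputs": ["a", "b"], "outputs": ["y"],
           "delays": {("a", "y"): 3, ("b", "y"): 1}},
    "M2": {"inputs": ["a"], "outputs": ["y"],
           "delays": {("a", "y"): 2}},
}
nets = [(("M1", "y"), ("M2", "a"))]
vertices, edges = build_graph(modules, nets)
```

Here the (2, 1)-module contributes 3 vertices and the (1, 1)-module contributes 2, for 5 vertices in total.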
Next we describe the procedure that calculates the gains. Each vertex u contains two fields, lp_in1(u) and lp_in2(u). The former stores the maximum delay on the longest path from S to u, and the latter the maximum delay on the longest path from D to vertex u. We calculate the value of both fields via two longest path computations, one from S to D, and one from D to S. Let $M_1$ be some (p, q)-module selected for evaluation. The heuristic adds lp_in1($p_i$) to lp_in2($q_j$), for every pair ($p_i$, $q_j$) of $M_1$. It then adds the result to the weight of the internal edge from $p_i$ to $q_j$. Observe that this is the maximum delay on the longest path from S to D passing through vertices $p_i$ and $q_j$. Let us call this maximum delay $l^1_{max}$. Next, it compares the value of $l^1_{max}$ with the value of the maximum delay on the global longest path from S to D, call it $l_{max}$. If $l_{max} - l^1_{max} = 0$, then vertices $p_i$, $q_j$ are on the global longest path from S to D. If $l_{max} - l^1_{max} > 0$, then vertices $p_i$, $q_j$ are not on the global longest path from S to D. Either way, a flag is set to record the status of each vertex. Let $l^2_{max}$ = lp_in1($p_i$) + lp_in2($q_j$) + $d_2$, where $d_2$ is the delay of the second implementation of $M_1$ from $p_i$ to $q_j$.
The heuristic computes $g(M_1)$ as follows: If $l^2_{max} > l_{max}$, then $g(M_1) = l_{max} - l^2_{max}$. If $l^2_{max} < l_{max}$ and vertices $p_i$, $q_j$ are not on the global longest path, then $g(M_1) = 0$. If $l^2_{max} = l_{max}$, then $g(M_1) = 0$. If $l^2_{max} < l_{max}$ and vertices $p_i$, $q_j$ are on the global longest path, then $g(M_1) = l_{max} - l^2_{max}$.
Footnote 3: This process is in fact repeated a fixed number of times, unlocking all cells each time, in the hope of further reducing the overall delay.
For every pair ($p_i$, $q_j$) of $M_1$ there is a corresponding gain for $M_1$. The smallest of these gains is the one that our heuristic considers as $g(M_1)$. Let $|V|$ be the number of modules of the given circuit, and $|E|$ the number of all edges of the graph. The heuristic requires $O(|V||E|)$ time. This is much less than the straightforward iterative improvement scenario (even for the simplified TDGCI model), where every time a module changes implementation we have to compute the gain exactly.
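The local gain rules can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the edge-dictionary encoding, the function names, and the toy circuit are assumptions, and the terminals S and D are left implicit (every source vertex starts at delay 0, which corresponds to zero-weight edges from S and to D).

```python
from collections import defaultdict


def longest_from_sources(vertices, edges):
    """Longest-path delay from any source to each vertex of a dag."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for (u, v), d in edges.items():
        succ[u].append((v, d))
        indeg[v] += 1
    dist = {v: 0 for v in vertices}
    ready = [v for v in vertices if indeg[v] == 0]
    while ready:                          # process in topological order
        u = ready.pop()
        for v, d in succ[u]:
            dist[v] = max(dist[v], dist[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return dist


def module_gain(edges, pairs, alt_delay):
    """Approximate gain of swapping a module to its second implementation.

    pairs:     internal (p_i, q_j) edges of the module
    alt_delay: {(p_i, q_j): delay under the second implementation}
    """
    vertices = {x for e in edges for x in e}
    lp_in1 = longest_from_sources(vertices, edges)            # from S
    reversed_edges = {(v, u): d for (u, v), d in edges.items()}
    lp_in2 = longest_from_sources(vertices, reversed_edges)   # from D
    l_max = max(lp_in1.values())
    gains = []
    for p, q in pairs:
        l1 = lp_in1[p] + edges[(p, q)] + lp_in2[q]     # current implementation
        l2 = lp_in1[p] + alt_delay[(p, q)] + lp_in2[q]  # second implementation
        if l2 > l_max or (l2 < l_max and l1 == l_max):
            gains.append(l_max - l2)
        else:
            gains.append(0)
    return min(gains)                     # smallest gain over all pin pairs


# Modules A and C both drive module B.
edges = {("a_i", "a_o"): 5, ("c_i", "c_o"): 6, ("b_i", "b_o"): 3,
         ("a_o", "b_i"): 0, ("c_o", "b_i"): 0}
gain_a = module_gain(edges, [("a_i", "a_o")], {("a_i", "a_o"): 2})
gain_b = module_gain(edges, [("b_i", "b_o")], {("b_i", "b_o"): 4})
gain_c = module_gain(edges, [("c_i", "c_o")], {("c_i", "c_o"): 1})
```

In this toy circuit the rule reports a gain of 5 for module C, although swapping C alone would leave the path through A (delay 8) critical and reduce the longest path from 9 only to 8. This illustrates why the locally computed gain is an approximation of the exact one.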

Sequential Circuits
A straightforward heuristic that obtains a solution to the TDGCI problem on sequential circuits is based on the idea of generating a combinational circuit by selecting flip-flops to break the cycles, and then applying the algorithm above. Figure 3 illustrates the process. Our proposed approach uses the concept of retiming. Leiserson and Saxe [9] proposed algorithms for clock period minimization, for circuits with uniform as well as nonuniform delays on the modules. We found the proposed algorithm for circuits with nonuniform delays on the modules [9] to be complicated to implement. In this paper, we present an approach for the general model which, although it has asymptotically the same time complexity as the respective algorithm in [9], is faster in practice and much easier to implement.
The formulation of our problem requires a retiming technique with some constraints on top of the ones in [9]. More precisely, we perform retiming on a graph constructed as follows: Every module's input and output is represented by a node, and every internal edge is substituted by a node and two edges labeled as prohibitive. An edge is called prohibitive if it is not allowed to host any flip-flop. After this transformation, all nodes have uniform delays (see also Fig. 4). Our problem is to perform retiming on a cyclic graph $G = (V, E)$, constructed from the sequential circuit, so that we minimize the clock period subject to a set of prohibitive edges $E_p$. It can be shown that the problem is equivalent to assigning weights r(u) to the nodes $u \in V$ that satisfy conditions C1 and C2 of [9] together with the following condition: C3: $r(u) - r(v) = w(e)$, $\forall e = (u, v) \in E_p$, where $E_p$ is the set of prohibitive edges.
C1 and C2 guarantee clock period minimization in [9]. C3 guarantees that no flip-flop is placed on a prohibitive edge.
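Condition C3 can be illustrated with a short Python sketch. It uses the standard retimed edge weight of [9], $w_r(e) = w(e) + r(v) - r(u)$ for $e = (u, v)$, and checks only legality (nonnegative register counts and no register on a prohibitive edge); it does not perform clock period minimization. The graph encoding and function names are hypothetical.

```python
def retimed_weights(edges, r):
    """Register count on each edge after retiming: w_r(e) = w(e) + r(v) - r(u)."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def is_legal(edges, r, prohibitive):
    """A retiming is legal here if no edge gets a negative register count
    and every prohibitive edge ends up with zero registers (condition C3)."""
    wr = retimed_weights(edges, r)
    nonnegative = all(w >= 0 for w in wr.values())
    prohibitive_clear = all(wr[e] == 0 for e in prohibitive)
    return nonnegative and prohibitive_clear


# A three-node cycle; the edge (b, c) models an internal, prohibitive edge.
edges = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}
prohibitive = [("b", "c")]

ok_identity = is_legal(edges, {"a": 0, "b": 0, "c": 0}, prohibitive)
ok_shifted = is_legal(edges, {"a": 0, "b": -1, "c": -1}, prohibitive)
bad_split = is_legal(edges, {"a": 0, "b": -1, "c": 0}, prohibitive)
```

The last retiming would move a register onto the prohibitive edge (b, c), since $w_r = 0 + 0 - (-1) = 1$, and is therefore rejected.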
Next, we present a more detailed description of the heuristic. First we substitute every (p, q)-module by (pq + p + q) modules whose delay is uniform. All $p_i$, $q_j$ modules have delay 0. If $\max(p, q) > s$, where s is a small constant (we set $s = 3$ in our experiments), we consider the (p, q)-module as a module with uniform delay equal to the maximum delay among all I/O paths. We follow this type of approach to speed up operations. We call this model the prohibitive-edge model. The rest of the graph construction is the same as the one described in the straightforward approach earlier. Thus, the created graph is a dag, and we can select the module with the biggest gain by employing our proposed approach for combinational circuits. In a second step, the heuristic changes the created dag to a graph with cycles, by considering registers.
Then it performs retiming on the new graph by applying the algorithm for clock period minimization, algorithm OPT2 in [9], adapted to the prohibitive-edge model described above. Thus the second step runs in $O(|V||E| \log |V|)$ time. The heuristic repeats the two steps until no feasible retiming exists. The time complexity is therefore $O(|V| \cdot |V||E| \log |V|) = O(|V|^2 |E| \log |V|)$.
For comparison, we also implemented a variation of the previously described approach. In this version, although we create a graph using the prohibitive-edge scheme described above, we do not break any cycles. Furthermore, we perform retiming once to obtain the minimum clock period before any module changes have taken place. Then we calculate the gain of each module by performing retiming, using the algorithm in [9]. The gain of a module is equal to the minimum clock period evaluated before any module changes minus the minimum clock period of the circuit considering the alternative implementation of the particular module. The module with the biggest gain is selected and locked. Thus, for every module change, the heuristic performs retiming once for each unchanged module. The time complexity is $O(|V|^3 |E| \log |V|)$.

EXPERIMENTAL RESULTS
We implemented both our approach and the straightforward iterative improvement for combinational circuits in C and ran them on a Sun SPARC System 4/330. We experimented on several ISCAS'85 benchmarks. Since the ISCAS'85 circuits do not include a list of possible implementations for each module, we generated these randomly by using the function rand() from the standard library. For simplicity, we considered uniform pin capacitances and the simplified model. We applied mod 10 to all generated numbers, so that the delays range from 0 to 9. We treat every cell from the library as a module. Table I gives the experimental results for our heuristic, Comb 1, and the straightforward iterative improvement approach, Comb 2. In Table I, "initial delay" denotes the initial delay of the circuit's longest path, "delay" the minimum delay obtained for the longest path, and "time" the time required for the particular heuristic to terminate. The time is expressed in seconds.
When we constructed Comb 1, we tried to stay as close to the exact gain computation as possible, expecting to get slightly worse results than those of Comb 2 (since the gains were computed approximately) but faster. We observed that in 80% of the gain selections Comb 1 did indeed select the best actual gain. From Table I, observe that Comb 1 is not only much faster than Comb 2, as expected, but also produces smaller delays. A possible explanation of this behavior is that the suboptimal gains helped in escaping local minima.
We implemented our heuristics for sequential circuits in C and ran them on a Sun SPARC System 2. We experimented on several ISCAS'89 benchmarks. For simplicity, we ran our three heuristics based on the assumption that all modules have uniform delays. We used the delays given by the ISCAS'89 circuit data as the delays of the first implementation. We generated the delays of the modules for the second implementation randomly using the function rand() from the standard library. In addition, we applied mod 10 to all generated numbers so the delays range from 0 to 9. As before, we treat each cell from the library as a module. Table II presents the experimental results for the straightforward approach, Seq 1, our heuristic, Seq 2, and the variation, Seq 3.
Under "initial delay", we give the initial delay of the circuit's longest path. Under "delay", we list the minimum delay obtained for the longest path. Under "time", we give the time required for the particular heuristic to terminate. The time is expressed in seconds. Moreover, under "initial min. clock" we give the initial minimum clock period obtained by using retiming in Seq 2 and Seq 3, before any module swaps have taken place. Although retiming appears to be a relatively slow process, we were able to obtain good results by using the prohibitive-edge scheme described in Section 3. In practice, our implementation was faster than the approach in [9] for the general model (with the simplifications described in the prohibitive-edge model) by approximately 10%. Seq 3 was inapplicable even for small circuits. More importantly, the latter modification did not lead to any improvements in the quality of the obtained implementations.
For simplicity, we implemented all of our heuristics without considering any area constraints. Note, though, that all of our heuristics can be trivially modified in order to handle a given upper bound on the total area of the circuit. The idea is to select the module that has the highest gain but whose selection does not violate the area upper bound. When the module with the biggest gain has been obtained, the particular heuristic checks whether, by considering this module, the overall area of the circuit exceeds the given area bound. If the area bound is exceeded, then we discard the module and proceed to the module that has the highest gain among the remaining ones. Otherwise, the module is selected.
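The area-aware selection rule just described can be sketched in Python as follows. The candidate tuple layout `(module, gain, area_change)` and the function name are hypothetical; `area_change` is assumed to be the change in total area caused by swapping that module's implementation.

```python
def select_module(candidates, current_area, area_bound):
    """Pick the highest-gain module whose swap keeps the area within the bound.

    candidates: list of (module, gain, area_change) tuples
    Returns the chosen module name, or None if every swap violates the bound.
    """
    # Examine candidates from the biggest gain downwards.
    for module, gain, area_change in sorted(
        candidates, key=lambda c: c[1], reverse=True
    ):
        if current_area + area_change <= area_bound:
            return module          # first feasible candidate wins
        # otherwise discard it and try the next-best gain
    return None


candidates = [("A", 5, 4), ("B", 3, 1), ("C", 1, 0)]
choice = select_module(candidates, current_area=10, area_bound=12)
```

In this example module A has the biggest gain but would raise the area to 14, so it is discarded and B is selected.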

CONCLUSION
We have shown that the general circuit implementation problem is NP-hard in the strong sense even when each module has only two implementations and there is no constraint on the total area of the circuit. We call this problem the TDGCI problem. The BCI problem, where all paths in a module have the same delays but different implementations have different areas, is also NP-hard in the strong sense even for two implementations per module.
We proposed the first heuristic for combinational circuits under the general circuit implementation model. The heuristic uses the iterative improvement methodology and outperforms an alternative iterative improvement scenario. We also proposed a retiming-based heuristic for sequential circuits which uses the one for combinational circuits as a subroutine. The approach is compared to two other schemes we devised.

... contain variable $u_i$. Clearly $\sum_i m_i = 3m$. Every variable module $U_i$ has two implementations. In the first implementation, the area is 0 and the delay is 1. In the second implementation, the area is $m_i$ and the delay is 0 (see also Fig. 5b). If $u_i$ is assigned the value true, then we assign module $U_i$ its first implementation. If $u_i$ is assigned the value false, then we assign it its second implementation. Note that the way we construct the variable modules guarantees a consistent true/false assignment on the variables. In addition, we set the area A to be 2m, and the delay D to be 1. An example of the BCI instance corresponding to a RESTRICTED ONE-IN-THREE 3SAT instance is shown in Figure 5b.
We now show that the RESTRICTED ONE-IN-THREE 3SAT instance is satisfiable if and only if the constructed BCI instance has an implementation with area at most A and delay at most D. We call clause path $c_k$ the path along the variable modules for the variables corresponding to the clause $c_k$ (see Fig. 5b). Observe that the clause paths are the longest paths among all input-output paths. In fact, all the remaining input-output paths consist of one edge, and can have delay at most 1. If the RESTRICTED ONE-IN-THREE 3SAT instance is satisfied, then each clause $c_k \in C$ has one true and two false literals. Therefore, the delay on every clause path is the same and equal to 1, and the area of the whole BCI instance is 2m. Thus, the BCI instance is satisfied.
On the other hand, suppose that the constructed BCI instance is satisfied. Then the delay along any input-output path is at most 1, and the area of the BCI instance is at most 2m. Next, we show that the delay along any clause path is exactly 1, and the area of the BCI instance is exactly 2m. Assume, by contradiction, that one of the clause paths has delay 0. This means that all three variables of the corresponding clause are assigned the value false, and therefore, the corresponding variable modules are assigned their second implementation. Let these three variables be $u_i$, $u_j$, and $u_r$. From the construction, it follows that modules $U_i$, $U_j$, and $U_r$ contribute $m_i$, $m_j$, and $m_r$ units of area, respectively, to the overall area of the BCI instance. Moreover, in order to satisfy the upper bound on the delay, every other clause must have at least two variables assigned the value false. Thus, besides the clause whose three variables are evaluated to be false, all the other clauses must have at most one variable evaluated to be true. Thus, in the RESTRICTED ONE-IN-THREE 3SAT instance there are fewer than m variables which are evaluated to be true. The latter implies that the overall area exceeds the bound of 2m, a contradiction.
Therefore no clause path can have delay 0, and all clause paths have delay exactly 1. The latter implies that exactly one variable per clause is true.
Since our construction guarantees a consistent true/false assignment to the variables, we conclude that the RESTRICTED ONE-IN-THREE 3SAT instance is satisfied.

FIGURE 1 The reduction for the TDGCI problem. (a) Variable module $U_i^k$. (b) Clause module $C_k$. (c) The TDGCI instance (without enforcing consistency in the assignment of the true/false values). (d) Control module $CU_i$. (e) The complete TDGCI instance.
Figure 1e shows the complete construction of the reduction of Figure 1c, and illustrates the latter construction.

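FIGURE 2 Constructing graph G for combinational circuits. (a) A combinational circuit. (b) The graph that corresponds to the circuit C of (a). Each net of C is a set of nodes (a node per pin) connected with external edges. For each module of C, graph G contains a set of internal edges (presented inside the big circles). Delays (> 0) exist only on internal edges. These delays correspond to the internal delays of the modules in C.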

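FIGURE 3 The straightforward approach for sequential circuits. (a) A sequential circuit. (b) The directed acyclic graph that corresponds to the circuit of (a). Every register R is replaced by two modules, Rin and Rout. All edges from S to any Rin and from any Rout to D have 0 weight. Similarly, all edges from any Rin to any module, and from any module to any Rout, have 0 weight. The rest of the construction is identical to the one for combinational circuits.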

FIGURE 4 The transformation using the prohibitive edge model. (a) A functional element with nonuniform delays. (b) The element of (a) after the transformation. The labels inside the circles indicate uniform delays on the respective nodes.

FIGURE 5 The reduction for the BCI problem. (a) Variable module $U_i$. (b) The BCI instance.

TABLE I
Results for Comb 1 and Comb 2

TABLE II
Results for Seq 1, Seq 2 and Seq 3