General Decomposition and Its Use in Digital Circuit Synthesis

ion (relating several units of lower level time to one unit of higher level time, e.g. relating the delays of structural implementation elements to the clock period of a sequential machine). The structural, behavioural and data abstractions have actually been used to prove the general full-decomposition theorem. This theorem gives the necessary and sufficient conditions that must be fulfilled by each general composition of partial machines in order to realize the functional behaviour specified by the original machine. Once proved, the general full-decomposition theorem provides synthesis rules that are problem-independent and guarantee functional correctness. Parametric correctness, in the sense of satisfying the hard physical constraints, is achieved with class-specific rules, which differ for different classes of target architectures (building blocks). These rules are obtained by modelling the synthesis problem as a multiobjective constrained optimization problem (see Section 6.3). The parametric optimization is guided by problem-instance-specific rules. These rules are constructed and selected automatically by the search algorithms, based on information about the characteristic features of a sequential machine related to the characteristics of the building blocks and the optimization aims [29]-[32]. Many researchers and designers are convinced that "correctness by construction" makes post-factum verification unnecessary. This is not true: even if the construction rules are proved correct, their application can be faulty due to mistakes made by designers or errors in the synthesis tools. The physical constraints are verified by estimating the parameters involved, using abstract modelling, lower-level synthesis tools or simulation, and checking the estimates against the constraints.
Verification of the near-optimal satisfaction of the objectives consists of checking the performance of the synthesis tools by using benchmarking and statistical analysis of the synthesised designs [29]-[32]. The functional correctness is checked by repeatedly applying two elementary verification pro-

INTRODUCTION
Modern microelectronic technology gives opportunities to build digital circuits of huge complexity and provides a large variety of logic building blocks. A rapidly growing interest in programmable devices has also been observed, as a result of their very attractive characteristic features. However, programmable devices impose limitations on various circuit parameters due to input, output, functionality, memory, communication and speed constraints.
On the other hand, traditional logic design methods are not suitable for very complex circuits or for implementations with constrained building blocks, for the following main reasons: they are devoted only to some very special cases of possible implementation structures; they often leave unconsidered some important parameters that significantly influence the actual design objectives; they often fail to find global optima for large designs; they do not consider hard constraints; and they often do not treat correctness aspects in an appropriate manner.
Logic synthesis is typically performed without any relation to the target implementation structure; therefore, technology mapping must be applied in order to map the synthesized logic network onto a network of building blocks that can be implemented. Sometimes the technology mapping algorithm is unable to construct any implementable network from a given initial network, and it cannot guarantee an optimal solution if the initial network is constructed without any regard to the future implementation.
The bad practice of target-independent logic synthesis follows from the lack of appropriate modelling tools and synthesis methods for digital circuit structures. Traditional logic modelling tools model circuits in terms of functionally complete systems composed of a minimal number of some special structural elements (e.g. AND+OR+NOT, NAND, NOR, MUX or AND+EXOR), instead of modelling them in terms of all the structural elements at the designer's disposal or, more generally, in terms of all possible subcircuits. For example, the commonly used Boolean algebra enables us to express all possible Boolean functions but fails to model their implementation structures. Boolean algebra makes it possible to decompose functions exclusively into networks consisting of AND, OR and NOT subfunctions, or into the equivalent NAND or NOR networks, while in general they can be decomposed into subfunctions of any kind. Similarly, binary decision diagrams enable us to express Boolean functions exclusively in the form of two-input multiplexer networks.
L. JÓŹWIAK
From the above we can conclude that, with traditional methods, the opportunities created by microelectronic technology cannot be exploited effectively. It has become extremely important to develop a new generation of methods that deal effectively and efficiently with design complexity and the characteristic features of modern building blocks, enabling the modelling and synthesis of all reasonable circuit structures and providing "correctness by construction", easy correctness verification, and intelligent search algorithms for the effective and efficient exploration of the huge space of correct circuit structures.
In order to solve this problem, a structural decomposition approach may be used. It consists of transforming a system into a structure of two or more cooperating subsystems in such a way that the original system's behaviour is retained and certain constraints and objectives are satisfied.
The theoretical work in this field was started in the early 1960s by Ashenhurst [4] and Curtis [10] for combinational circuits and by Hartmanis [16][17][18] for sequential circuits. However, they over-simplified the actual problems and left unconsidered some important parameters that significantly influence the actual design objectives. For example, Hartmanis and others considered decomposition only partially, by decomposing the internal states of sequential machines. This was an incomplete solution, because the most important design parameters of a circuit implementing a sequential machine (complexity, speed, testability, etc.), as well as the possibility of implementing a machine with limited building blocks, depend on the whole implementation structure, i.e. on the distribution of the machine's inputs, outputs, state memory and functionality, and on the interconnections between the building blocks. So, from the practical viewpoint, it is necessary to decompose the whole sequential or combinational process into an appropriate structure, i.e. to perform full-decomposition. The theoretical works on decomposition from the 1960s and 1970s should be considered as first steps towards a complete decomposition theory. The first practical solutions were obtained in the 1980s (e.g. [1][5][21][27][35][45][46]).
The strongest recent stimulus for developing decomposition methods and tools has come from the newest generation of multi-block programmable devices. In the case of fine-grained multi-block FPGAs, the hard constraints are active for virtually all non-trivial circuits, and implementation is impossible without decomposition.
In this paper, we present the fundamentals of a decompositional design methodology that meets the requirements of today's complex circuits and modern microelectronic technology. The methodology is based on the theory of general full-decompositions, which we have developed during the last few years and applied in a number of prototype decomposition tools [22]-[30]. General decomposition consists of transforming a sequential machine or a Boolean function into a structure of two or more cooperating partial machines in such a way that the original machine's behaviour is retained and all the important structural attributes relating to inputs, outputs, state memory elements, functional units and interconnections between the units are appropriately considered at the same time.
Our previous publications focused on heuristic search algorithms for decomposition and presented some benchmark results [26]-[30]. The main aim of this paper is to present a general full-decomposition model and to show that this model, together with its theorem, constitutes the theory of digital circuit structures at the highest abstraction level and forms a sound basis for the construction of decomposition algorithms. Since general full-decomposition includes various special decomposition types for sequential and combinational circuits, the general decomposition theorem will also be interpreted for some important special cases. A further aim of this paper is to explain how the model can be used for digital circuit synthesis, with a focus on correctness and optimization aspects.

Sequential Machines and Realisations
A sequential (Mealy) machine M (Fig. 1) is an algebraic system defined by $M = (I, S, O, \delta, \lambda)$, where I is a finite set of inputs, S is a finite non-empty set of internal states, O is a finite set of outputs, $\delta: S \times I \to S$ is the next-state function, and $\lambda: S \times I \to O$ is the output function.
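A table-driven encoding makes the definition concrete. The sketch below is illustrative only: the class name `MealyMachine`, its field names and the toy parity machine are assumptions, with δ and λ stored as dictionaries keyed by (state, input) pairs.

```python
from dataclasses import dataclass


@dataclass
class MealyMachine:
    """Hypothetical table-driven encoding of M = (I, S, O, delta, lam)."""
    inputs: frozenset
    states: frozenset
    outputs: frozenset
    delta: dict  # (state, input) -> next state
    lam: dict    # (state, input) -> output

    def run(self, start, xs):
        """Return the output sequence produced from state `start` on inputs `xs`."""
        s, ys = start, []
        for x in xs:
            ys.append(self.lam[(s, x)])  # Mealy output: depends on present state and input
            s = self.delta[(s, x)]       # then move to the next state
        return ys


# Toy machine: outputs 1 when the number of 1s seen so far (including this input) is odd.
parity = MealyMachine(
    inputs=frozenset({0, 1}),
    states=frozenset({"even", "odd"}),
    outputs=frozenset({0, 1}),
    delta={("even", 0): "even", ("even", 1): "odd", ("odd", 0): "odd", ("odd", 1): "even"},
    lam={("even", 0): 0, ("even", 1): 1, ("odd", 0): 1, ("odd", 1): 0},
)

print(parity.run("even", [1, 0, 1, 1]))  # [1, 1, 0, 1]
```

Moore machines, state machines and combinational machines fit the same encoding by making `lam` ignore the input, dropping it, or using a single state.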
Sometimes the design requirements do not completely specify a machine. For example, certain input/state combinations may never occur due to external constraints, or due to realizing the machine in such a way that some of the input/state combinations of the realization are not used for implementing the original machine. From the functional viewpoint, the designer therefore does not care what the next states or outputs for such combinations will be. Sometimes outputs are sampled only at specified times; when they are not being sampled, they may be unspecified. If a certain input/state combination is followed by a general reset signal, the output for this combination should be specified, but the next state need not be. In all such situations one speaks of so-called "don't care" conditions, commonly denoted by "-". To account for "don't care" conditions, the sequential machine definition is extended by slightly changing the definitions of the functions δ and λ: $\delta: S \times I \to S \cup \{-\}$ and $\lambda: S \times I \to O \cup \{-\}$ (for a single-output machine), or $\lambda = [\lambda_j]$, $\lambda_j: S \times I \to O_j \cup \{-\}$, $O = [O_j]$ (for a multiple-output machine). A sequential machine without "don't cares" will be referred to as completely specified, and one with "don't cares" as incompletely specified.
A hardware implementation of a sequential machine is called a sequential circuit. If the output values are independent of the input values, i.e. $\lambda: S \to O$, the sequential machine is called a Moore machine. If the output set O and the output function λ are not defined, or λ is the identity function of a Moore machine, then the sequential machine $M = (I, S, \delta)$ is called a state machine. A sequential machine with one state and a trivial next-state function is called a combinational machine (combinational function), and its hardware implementation is called a combinational circuit. Since the output values of a combinational machine are independent of the state values, i.e. $\lambda: I \to O$, and its trivial state behaviour is not important, the combinational machine is completely defined by $M = (I, O, \lambda)$.
Since a Moore machine, a state machine and a combinational machine can be considered special cases of a Mealy machine, the formal considerations of this paper are limited to Mealy machines.
If M' is a realization of M, then for all possible input sequences the output sequences produced by machine M and by its imitation M' are identical after renaming them (i.e. after encoding the inputs and decoding the outputs).
Realization in this sense will be referred to as realization of the output behaviour. Since the state mapping is a function into nonvoid subsets of S', this is a multi-state realization.
In some cases, the internal state of the machine must be known outside. Therefore, realization of the state behaviour is also important. Realization of the state and output behaviour is a special case of output behaviour realization in which the state mapping is a one-to-one function, so that its inverse exists.
The sequential machine composed as a structure consisting of the input encoder, M' and the output decoder (and the inverse state mapping, for state behaviour realization) (Figs. 2 and 3) will be referred to as the realization structure for M defined by M', and will be denoted by str(M').
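The definition of realization quantifies over all input sequences, so it cannot be tested exhaustively in general. As a hedged illustration only, the sketch below checks input sequences up to a fixed length `k`, which can refute but never prove a realization. It assumes, hypothetically, a single initial-state correspondence (`start`, `start_prime`) instead of the general state mapping into subsets of S', an input encoder `enc_in` and an output decoder `dec_out` given as Python callables, and machines given as (delta, lam) dictionary pairs; all names are illustrative.

```python
from itertools import product


def run(delta, lam, start, xs):
    """Output sequence of a table-driven Mealy machine from `start` on inputs `xs`."""
    s, ys = start, []
    for x in xs:
        ys.append(lam[(s, x)])
        s = delta[(s, x)]
    return ys


def realizes_up_to(m, m_prime, inputs, start, start_prime, enc_in, dec_out, k):
    """Bounded test: does M' (with input encoding and output decoding) reproduce
    the outputs of M for every input sequence of length 1..k?"""
    for n in range(1, k + 1):
        for xs in product(inputs, repeat=n):
            ys = run(*m, start, xs)
            ys_prime = run(*m_prime, start_prime, [enc_in(x) for x in xs])
            if [dec_out(y) for y in ys_prime] != ys:
                return False
    return True


# M: parity machine; M': the same behaviour with renamed states, inputs and outputs.
M = ({("even", 0): "even", ("even", 1): "odd", ("odd", 0): "odd", ("odd", 1): "even"},
     {("even", 0): 0, ("even", 1): 1, ("odd", 0): 1, ("odd", 1): 0})
M_PRIME = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0},
           {(0, "a"): "F", (0, "b"): "T", (1, "a"): "T", (1, "b"): "F"})
ENC = {0: "a", 1: "b"}.get   # input encoder I -> I'
DEC = {"F": 0, "T": 1}.get   # output decoder O' -> O

print(realizes_up_to(M, M_PRIME, [0, 1], "even", 0, ENC, DEC, k=4))  # True
```

Swapping the decoder to `{"F": 1, "T": 0}.get` makes the check fail on the very first one-symbol sequence.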

Partitions and Partition Pairs
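Partitions and partition pairs are useful for modelling information and information flows inside and between machines. Let S be any set of elements. A partition π on S is a collection of disjoint, non-empty subsets of S, called blocks, whose union is S; the block of π containing an element s is denoted by $[s]_{\pi}$. The partition in which each element of S forms a separate block is called the zero partition and is denoted by $\pi_S(0)$. The partition containing all the elements of S in one block is called the identity partition and is denoted by $\pi_S(1)$.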
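Let $\pi_1$ and $\pi_2$ be two partitions on S. The partition product $\pi_1 \cdot \pi_2$ is the partition on S such that $[s]_{\pi_1 \cdot \pi_2} = [t]_{\pi_1 \cdot \pi_2}$ if and only if $[s]_{\pi_1} = [t]_{\pi_1}$ and $[s]_{\pi_2} = [t]_{\pi_2}$. The partition sum $\pi_1 + \pi_2$ is the partition on S such that $[s]_{\pi_1 + \pi_2} = [t]_{\pi_1 + \pi_2}$ if and only if a sequence $s = s_0, s_1, \ldots, s_n = t$, $s_i \in S$, exists for which either $[s_i]_{\pi_1} = [s_{i+1}]_{\pi_1}$ or $[s_i]_{\pi_2} = [s_{i+1}]_{\pi_2}$ for each $0 \le i \le n-1$. From these definitions it follows that the blocks of $\pi_1 \cdot \pi_2$ are the non-empty intersections of the blocks of $\pi_1$ and $\pi_2$, while the blocks of $\pi_1 + \pi_2$ are obtained by repeatedly combining the blocks of $\pi_1$ and $\pi_2$ that contain common elements.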
Also, $\pi_2$ is greater than or equal to $\pi_1$, written $\pi_1 \le \pi_2$, if and only if each block of $\pi_1$ is included in a block of $\pi_2$. Thus $\pi_1 \le \pi_2$ if and only if $\pi_1 \cdot \pi_2 = \pi_1$, or if and only if $\pi_1 + \pi_2 = \pi_2$.
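The partition operations above are simple set computations. A minimal sketch, assuming partitions are represented as frozensets of frozenset blocks (a representation chosen only for illustration; the helper names `partition`, `product`, `psum` and `leq` are hypothetical):

```python
from itertools import combinations


def partition(*blocks):
    """Build a partition as a frozenset of frozenset blocks."""
    return frozenset(frozenset(b) for b in blocks)


def product(p1, p2):
    """pi1 . pi2: the non-empty intersections of blocks of pi1 and pi2."""
    return frozenset(b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2)


def psum(p1, p2):
    """pi1 + pi2: repeatedly merge blocks of pi1 and pi2 that share an element."""
    blocks = [set(b) for b in p1 | p2]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(blocks)), 2):
            if blocks[i] & blocks[j]:
                blocks[i] |= blocks.pop(j)  # j > i, so index i stays valid
                merged = True
                break
    return frozenset(frozenset(b) for b in blocks)


def leq(p1, p2):
    """pi1 <= pi2 iff every block of pi1 is included in some block of pi2."""
    return all(any(b1 <= b2 for b2 in p2) for b1 in p1)


P1 = partition({1, 2}, {3, 4}, {5, 6})
P2 = partition({1}, {2, 3}, {4}, {5, 6})
print(sorted(map(sorted, product(P1, P2))))  # [[1], [2], [3], [4], [5, 6]]
print(sorted(map(sorted, psum(P1, P2))))     # [[1, 2, 3, 4], [5, 6]]
```

The sum chains {1, 2}, {2, 3} and {3, 4} together through their shared elements, exactly as the sequence condition in the definition requires.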
Any partition π on S can be interpreted as an equivalence relation on S whose equivalence classes are the blocks of π. Under this interpretation, the partition π gives information about the elements of S with precision limited to the equivalence class: with this information it is possible to distinguish elements from different classes, but impossible to distinguish elements of the same class. The partition product can be interpreted as the product of the equivalence relations introduced by the partitions; it represents the combined information about the elements of S provided by the relations together. The partition sum can be interpreted as the sum of the equivalence relations introduced by the partitions; it represents the information about the elements of S that remains after applying the combined abstraction provided by the relations involved. The partial ordering relation ≤ expresses the fact that if $\pi_1 \le \pi_2$, then $\pi_1$ (and thus its associated equivalence relation) provides information about the elements of S that is at least as precise as the information given by $\pi_2$ (and its associated equivalence relation).
A zero partition provides complete information about the elements of S and an identity partition gives no information.
A set system π on a set S is defined as a collection of subsets $B_i$ of S such that $\bigcup_i B_i = S$ and $B_i \not\subseteq B_j$ for $i \ne j$. The only difference between a partition and a set system is that the blocks of a set system are not required to be disjoint. A set system π on S can be interpreted as a compatibility relation on S whose compatibility classes are the blocks of π. Such a compatibility relation is reflexive and symmetric but is not required to be transitive. If it is transitive, i.e. the blocks $B_i$ of the set system π are disjoint, then the compatibility relation is an equivalence relation and the set system π is a partition.
By a slight abuse of notation, let δ and λ also denote the functions that map subsets of S, I or $S \times I$ into subsets of S or O, respectively, in accordance with the mappings provided by δ and λ for the elements of those subsets, i.e.
$\delta(B, x) = \{\delta(s, x) \mid s \in B \subseteq S \wedge x \in I\}$, $\delta(s, A) = \{\delta(s, x) \mid s \in S \wedge x \in A \subseteq I\}$, $\delta(D) = \{\delta(s, x) \mid (s, x) \in D \subseteq S \times I\}$,
$\lambda(B, x) = \{\lambda(s, x) \mid s \in B \subseteq S \wedge x \in I\}$, $\lambda(s, A) = \{\lambda(s, x) \mid s \in S \wedge x \in A \subseteq I\}$, $\lambda(D) = \{\lambda(s, x) \mid (s, x) \in D \subseteq S \times I\}$.
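Set systems and the extended mappings can be expressed with Python sets. In this hedged sketch (hypothetical helper names; δ and λ as dictionaries on S × I), a set system is checked for covering and non-inclusion, a partition additionally for disjointness, and λ is extended to a block of states under a fixed input.

```python
def is_set_system(blocks, universe):
    """A set system: the blocks cover `universe` and no block is included in another."""
    blocks = [frozenset(b) for b in blocks]
    covers = frozenset().union(*blocks) == frozenset(universe)
    no_inclusion = all(not a <= b
                       for i, a in enumerate(blocks)
                       for j, b in enumerate(blocks) if i != j)
    return covers and no_inclusion


def is_partition(blocks, universe):
    """A partition is a set system whose blocks are pairwise disjoint."""
    blocks = [frozenset(b) for b in blocks]
    disjoint = all(not a & b for i, a in enumerate(blocks) for b in blocks[i + 1:])
    return disjoint and is_set_system(blocks, universe)


def image(f, pairs):
    """delta or lam (a dict on S x I) extended to a subset D of S x I."""
    return {f[p] for p in pairs}


def image_of_block(f, B, x):
    """delta(B, x) or lam(B, x): the images of all states of block B under input x."""
    return image(f, {(s, x) for s in B})


# Toy parity machine outputs (hypothetical), keyed by (state, input).
LAM = {("even", 0): 0, ("even", 1): 1, ("odd", 0): 1, ("odd", 1): 0}
print(is_set_system([{1, 2}, {2, 3}], {1, 2, 3}))  # True: overlapping blocks are allowed
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))   # False: blocks are not disjoint
print(image_of_block(LAM, {"even", "odd"}, 0))      # {0, 1}
```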
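Let $\pi_S$, $\pi'_S$, $\pi_I$, $\pi_O$ and $\pi_{S \times I}$ be partitions on $M = (I, S, O, \delta, \lambda)$; in particular, $\pi_S$ and $\pi'_S$ on S, $\pi_I$ on I, $\pi_O$ on O, and $\pi_{S \times I}$ on $S \times I$.
$(\pi_S, \pi'_S)$ is an S-S partition pair if and only if $\forall B \in \pi_S\ \forall x \in I\ \exists B' \in \pi'_S: \delta(B, x) \subseteq B'$.
$(\pi_I, \pi_S)$ is an I-S partition pair if and only if $\forall A \in \pi_I\ \forall s \in S\ \exists B \in \pi_S: \delta(s, A) \subseteq B$.
$(\pi_I, \pi_O)$ is an I-O partition pair if and only if $\forall A \in \pi_I\ \forall s \in S\ \exists C \in \pi_O: \lambda(s, A) \subseteq C$.
$(\pi_{S \times I}, \pi_S)$ is an $S \times I$-S partition pair if and only if $\forall D \in \pi_{S \times I}\ \exists B \in \pi_S: \delta(D) \subseteq B$.
$(\pi_{S \times I}, \pi_O)$ is an $S \times I$-O partition pair if and only if $\forall D \in \pi_{S \times I}\ \exists C \in \pi_O: \lambda(D) \subseteq C$.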
The interpretation of the notions introduced above is as follows. $(\pi_S, \pi'_S)$ is an S-S partition pair if and only if the blocks of $\pi_S$ are mapped by M into the blocks of $\pi'_S$, i.e. the input and the block of $\pi_S$ unambiguously determine the block of $\pi'_S$ in which the next state will be contained. In other words, knowing the input and having information about the present state with precision to the equivalence classes introduced by $\pi_S$, it is possible to compute the information about the next state with precision to the equivalence classes introduced by $\pi'_S$. The I-S, S-O and I-O partition pairs are interpreted similarly. A partition $\pi_S$ has the substitution property (it is an SP partition) if and only if $(\pi_S, \pi_S)$ is an S-S partition pair.
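The S-S partition pair condition and the substitution property can be checked directly from a next-state table. The sketch below assumes δ is a Python dictionary keyed by (state, input) pairs and partitions are lists of sets; `is_ss_pair`, `is_sp` and the modulo-4 counter are illustrative assumptions, not part of the original text.

```python
def is_ss_pair(delta, inputs, pi, pi2):
    """(pi, pi2) is an S-S pair iff, for every block B of pi and every input x,
    the image delta(B, x) lies inside a single block of pi2."""
    for B in pi:
        for x in inputs:
            image = {delta[(s, x)] for s in B}
            if not any(image <= set(B2) for B2 in pi2):
                return False
    return True


def is_sp(delta, inputs, pi):
    """pi has the substitution property iff (pi, pi) is an S-S pair."""
    return is_ss_pair(delta, inputs, pi, pi)


# Modulo-4 counter: input 1 increments the state, input 0 holds it.
DELTA = {(s, x): (s + x) % 4 for s in range(4) for x in (0, 1)}
print(is_sp(DELTA, (0, 1), [{0, 2}, {1, 3}]))  # True: the parity of the count is predictable
print(is_sp(DELTA, (0, 1), [{0, 1}, {2, 3}]))  # False: from {0, 1}, input 1 leads to 1 or 2
```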
Let π be a partition on S. The minimal second partition that forms an S-S partition pair with π as its first partition will be denoted by $m_{S-S}(\pi)$. The maximal first partition that forms an S-S partition pair with π as its second partition will be denoted by $M_{S-S}(\pi)$. It can be proved [18] that
$m_{S-S}(\pi) = \prod \{\pi_i \mid (\pi, \pi_i) \text{ is an S-S partition pair}\}$,
$M_{S-S}(\pi) = \sum \{\pi_i \mid (\pi_i, \pi) \text{ is an S-S partition pair}\}$.
The partitions $m_{S-S}(\pi)$ and $M_{S-S}(\pi)$ are interpreted as follows. For a given π, $m_{S-S}(\pi)$ describes the largest amount of information that can be computed about the next state of M knowing the block of π that contains the present state. $M_{S-S}(\pi)$ describes the minimum amount of information that must be known about the present state of M in order to compute the next state with precision to π. In a strictly similar way, the m and M operators can be defined and interpreted for I-S, S-O and I-O partition pairs.
A partition $\pi_{S \times I}$ on $S \times I$ is induced by an output partition $\pi_O$ (notation: $\pi_{S \times I} = ind(\pi_O)$) if, whenever it is known that the output y of M is contained in a block $C \in \pi_O$, it is also known that the pair (s, x) consisting of the present state and input of M is contained in a block $D \in \pi_{S \times I}$, where block D is unambiguously indicated by block C. Block D is then said to be induced on $S \times I$ by block C of $\pi_O$, denoted $D = ind(C)$.
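The m and M operators for S-S pairs can be computed without enumerating all partitions. The following sketch (hypothetical helper names `m_ss` and `M_ss`; δ as a dictionary) uses two direct consequences of the S-S pair definition: every image δ(B, x) must fall into one block of $m_{S-S}(\pi)$, so those images can be merged with a union-find structure; and two states share a block of $M_{S-S}(\pi)$ exactly when, for every input, their next states lie in the same block of π. It is an illustration, not the construction used in [18].

```python
def m_ss(delta, states, inputs, pi):
    """m_{S-S}(pi): the smallest pi2 such that (pi, pi2) is an S-S pair.
    Every image delta(B, x) must fall into one block, so merge them (union-find)."""
    parent = {s: s for s in states}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for B in pi:
        for x in inputs:
            image = [delta[(s, x)] for s in B]
            for t in image[1:]:
                parent[find(t)] = find(image[0])
    blocks = {}
    for s in states:
        blocks.setdefault(find(s), set()).add(s)
    return {frozenset(b) for b in blocks.values()}


def M_ss(delta, states, inputs, pi):
    """M_{S-S}(pi): the largest pi1 such that (pi1, pi) is an S-S pair.
    s and t share a block iff delta(s, x) and delta(t, x) share a block of pi for all x."""
    index = {s: i for i, B in enumerate(pi) for s in B}
    blocks = {}
    for s in states:
        signature = tuple(index[delta[(s, x)]] for x in inputs)  # `inputs` must be ordered
        blocks.setdefault(signature, set()).add(s)
    return {frozenset(b) for b in blocks.values()}


# Modulo-4 counter: input 1 increments the state, input 0 holds it.
DELTA = {(s, x): (s + x) % 4 for s in range(4) for x in (0, 1)}
parity = [{0, 2}, {1, 3}]
print(m_ss(DELTA, range(4), (0, 1), parity) == {frozenset(b) for b in parity})  # True
```

For the counter, knowing only whether the state is in {0, 1} or {2, 3} yields no information about the next state ($m_{S-S}$ is the identity partition), whereas predicting that block requires the exact present state ($M_{S-S}$ is the zero partition).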
It is possible to prove that $\pi_{S \times I}$ is a partition induced on $S \times I$ by an output partition $\pi_O$ if and only if $\pi_{S \times I} \ge M_{S \times I - O}(\pi_O)$, i.e. the smallest partition induced by a given $\pi_O$ is $\pi_{S \times I} = M_{S \times I - O}(\pi_O)$. A partition $\pi_{S \times I}$ on $S \times I$ is induced by a state partition $\pi_S$ ($\pi_{S \times I} = ind(\pi_S)$) if, whenever it is known that the present state s of M is contained in a block $B \in \pi_S$, it is also known that each pair (s, x) is contained in a block $D \in \pi_{S \times I}$, where block D is unambiguously indicated by block B ($D = ind(B)$). The notion of a partition induced on $S \times I$ by an input partition $\pi_I$ can be defined and interpreted in a similar way.
It is easy to prove that the smallest partition induced on $S \times I$ by a given $\pi_S$ (respectively $\pi_I$) is the partition $\pi_{S \times I}$ with $[(s, x)]_{\pi_{S \times I}} = [(t, z)]_{\pi_{S \times I}}$ if and only if $[s]_{\pi_S} = [t]_{\pi_S}$ (respectively $[x]_{\pi_I} = [z]_{\pi_I}$).
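For completely specified machines, the smallest induced partitions described above can be constructed directly. A sketch under that assumption (λ as a dictionary on S × I, partitions as lists of sets; the helper names are hypothetical):

```python
def induced_by_state(pi_s, inputs):
    """Smallest partition on S x I induced by pi_s: (s, x) ~ (t, z) iff [s] = [t]."""
    return {frozenset((s, x) for s in B for x in inputs) for B in pi_s}


def induced_by_input(pi_i, states):
    """Smallest partition on S x I induced by pi_i: (s, x) ~ (t, z) iff [x] = [z]."""
    return {frozenset((s, x) for s in states for x in A) for A in pi_i}


def induced_by_output(lam, pi_o):
    """Smallest partition on S x I induced by pi_o, i.e. M_{SxI-O}(pi_o):
    pairs whose outputs fall into the same block of pi_o are grouped together."""
    groups = {}
    for pair, y in lam.items():
        c = next(i for i, C in enumerate(pi_o) if y in C)  # index of the block holding y
        groups.setdefault(c, set()).add(pair)
    return {frozenset(g) for g in groups.values()}


# Toy parity machine outputs (hypothetical), keyed by (state, input).
LAM = {("even", 0): 0, ("even", 1): 1, ("odd", 0): 1, ("odd", 1): 0}
print(len(induced_by_output(LAM, [{0}, {1}])))  # 2
```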
For the purpose of bit decompositions (in which the input and/or output bits, instead of the input/output symbols, are appropriately distributed), the concept of bit partitions has been introduced. Let $B = \{b_1, b_2, \ldots, b_{|B|}\}$ be a set of bits, and let $T = \{t_1, t_2, \ldots, t_{|T|}\}$ be a set of symbols (bit-value patterns) on B. Each bit $b_k \in B$ induces a two-block partition $\pi_T(b_k)$ on the set T (for incompletely specified machines, on the subset of T for which the value of this bit is specified). One block of $\pi_T(b_k)$ contains the symbols for which bit $b_k$ has the value 0; the other block contains the symbols for which $b_k$ has the value 1. In the following text, $\pi_T(b_k)$ will also be used to denote the partition on T induced by $b_k$.
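The bit partitions $\pi_T(b_k)$ can be generated mechanically from the symbol encodings. The sketch below assumes symbols are written as strings of '0', '1' and '-' (unspecified) characters, one character per bit; the helper names are hypothetical. It also shows that, for a two-bit code, the product of the partitions induced by both bits is the zero partition on T, i.e. the bits together distinguish every symbol.

```python
from functools import reduce


def bit_partition(symbols, k):
    """pi_T(b_k): split symbols (bit strings such as "01-") by the value of bit k.
    Symbols with '-' (unspecified) at position k are left out, as for
    incompletely specified machines."""
    zeros = frozenset(t for t in symbols if t[k] == "0")
    ones = frozenset(t for t in symbols if t[k] == "1")
    return {b for b in (zeros, ones) if b}


def product(p1, p2):
    """Partition product: non-empty pairwise block intersections."""
    return {a & b for a in p1 for b in p2 if a & b}


T = {"00", "01", "10", "11"}
all_bits = reduce(product, [bit_partition(T, k) for k in range(2)])
print(all_bits == {frozenset({t}) for t in T})  # True
```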
A partition $\pi_B$ on the set of bits B, $\pi_B = \{\{b_1\}, \{b_2\}, \ldots, \{b_k\}, \{b_{k+1}, \ldots, b_{|B|}\}\}$, in which the important bits (for distinguishing between certain symbols) $b_1, \ldots, b_k$ are kept in separate blocks and the don't care bits $b_{k+1}, \ldots, b_{|B|}$ are kept in a single block, called the don't care block (denoted by $dcb(\pi_B)$), is called a bit partition. The product (·) and sum (+) operations, as well as the ordering relation (≤), for bit partitions are defined in the same way as for "normal" partitions, except that a block of the product that is the product of a block (important or don't care) with an important block is an important block, and a block of the sum that is the sum of some blocks (important or don't care) with a don't care block is a don't care block. The zero bit partition is defined as a bit partition with an empty don't care block.
If $\pi_T = ind(\pi_B)$, then, having $\pi_B$, one can compute the blocks of $\pi_T$. If $\pi_B = ind(\pi_T)$, then, knowing the block of $\pi_T$, one can compute the values of all the important bits of $\pi_B$.
The last two definitions relate the symbol and bit partitions and allow the bit full-decomposition to be considered as a special case of a symbol full-decomposition.

GENERAL FULL-COMPOSITION
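Let us consider the realization of $M = (I, S, O, \delta, \lambda)$ (Fig. 3) by a machine M' that is a composition of n partial machines $M_i$ (as shown in Fig. 4 for the case of two machines).
A general composition $GC = (\{M_i\}, \{Con_i\})$ of n sequential machines consists of: $\{M_i = (I_i^*, S_i, O_i, \delta_i, \lambda_i),\ I_i^* = I_i \times I'_i,\ 1 \le i \le n\}$, a set of sequential machines referred to as component machines; and $\{Con_i: \times_{j=1}^{n} O_j \to I'_i,\ 1 \le i, j \le n\}$, a set of surjective functions referred to as the connection rules of the component machines.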
A general composition is said to be in canonical form if and only if the connection rules $Con_i$ compute vector values of the form $Con_i(y_1, \ldots, y_n) = (Con_{1,i}(y_1), \ldots, Con_{n,i}(y_n))$, i.e. (partial) output information from a component machine j, $1 \le j \le n$, is transmitted separately to the input of a machine i, $1 \le i \le n$, without being combined with (partial) output information from the other partial machines k, $1 \le k \le n$, $k \ne j$.
A general composition is said to be in maximally preprocessed form if the connection rules $Con_i$ compute scalar values, i.e. information transmitted from various partial machines to a certain machine is combined before it is connected to the input of this machine. Of course, compositions in partially preprocessed form, lying between these two extremes, are also possible.
Allowing external local connections between the outputs and inputs of a component machine $M_i$ gives more freedom in describing the circuit structure.
A component machine $M_i$ can then influence its own behaviour partly through its internal state and partly by affecting its own inputs. The precise form of this influence is defined by the specific choice of the connections $Con_i$ and of the machine functions $\delta_i$ and $\lambda_i$.
Definition 2. A general composition GC of n sequential machines defines the general composition machine $M_{GC}(GC) = M_{GC}(\{M_i\}, \{Con_i\}) = (I_{GC}, S_{GC}, O_{GC}, \delta_{GC}, \lambda_{GC})$ with
$I_{GC} = \times_{i=1}^{n} I_i$, $S_{GC} = \times_{i=1}^{n} S_i$, $O_{GC} = \times_{i=1}^{n} O_i$,
$\delta_{GC}(s_{GC}, x_{GC}) = \delta_{GC}((s_1, \ldots, s_n), (x_1, \ldots, x_n)) = \times_{i=1}^{n} \delta_i(s_i, (x_i, Con_i(y_1, \ldots, y_n)))$,
$\lambda_{GC}(s_{GC}, x_{GC}) = \lambda_{GC}((s_1, \ldots, s_n), (x_1, \ldots, x_n)) = \times_{i=1}^{n} \lambda_i(s_i, (x_i, Con_i(y_1, \ldots, y_n)))$,
where $y_j \in O_j$ is the output of component machine $M_j$.
Formal definitions for compositions TC of various special types T can be introduced in a very similar way, as special cases of the above definition [22]-[25]. Each of them defines the appropriate type T composition machine $M_{TC}$. We will say that a composition TC of type T of n sequential machines $M_i$ realizes machine M if and only if $M_{TC}$ realizes M. We will not distinguish between the type T composition TC and the type T composition machine $M_{TC}$ unless this could lead to misunderstanding.
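Definition 2 contains an apparent circularity: each output $y_j$ is itself computed using $Con_j$. For a composition whose machine-level information flow is acyclic, the outputs can simply be evaluated in dependency order. The following sketch simulates one clock step under that assumption, using Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later); the data layout (`machines`, `deps`, `con`) and the two toy machines forming a serial composition are hypothetical.

```python
from graphlib import TopologicalSorter


def gc_step(machines, deps, con, state, x):
    """One step of a general composition machine M_GC (illustrative sketch).

    machines[i] = (delta_i, lam_i): callables f(s_i, (x_i, extra_i)).
    deps[i]     = indices of the machines whose outputs Con_i reads.
    con[i]      = callable mapping {j: y_j for j in deps[i]} to the extra input I'_i.
    Assumes the machine-level dependency graph is acyclic (a legal composition).
    """
    n = len(machines)
    ys, extra = {}, {}
    # Evaluate the machines in dependency order so each Con_i sees finished outputs.
    for i in TopologicalSorter({i: deps[i] for i in range(n)}).static_order():
        extra[i] = con[i]({j: ys[j] for j in deps[i]})
        ys[i] = machines[i][1](state[i], (x[i], extra[i]))
    next_state = tuple(machines[i][0](state[i], (x[i], extra[i])) for i in range(n))
    return next_state, tuple(ys[i] for i in range(n))


# Serial composition: M1 toggles on its primary input and outputs its state;
# M2 has no primary input, toggles when M1 outputs 1, and outputs s2 AND y1.
machines = [
    (lambda s, ie: s ^ ie[0], lambda s, ie: s),
    (lambda s, ie: s ^ ie[1], lambda s, ie: s & ie[1]),
]
deps = {0: set(), 1: {0}}
con = {0: lambda ys: None, 1: lambda ys: ys[0]}

state, trace = (0, 0), []
for x1 in (1, 1, 0):
    state, out = gc_step(machines, deps, con, state, (x1, None))
    trace.append((state, out))
print(trace)  # [((1, 0), (0, 0)), ((0, 1), (1, 0)), ((0, 1), (0, 0))]
```

If `deps` contained a cycle, `static_order()` would raise `graphlib.CycleError`, which corresponds to an information loop at the level of total information flows.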
In a general composition, there is a danger of information loops occurring in the exchanged information. Such loops at the level of elementary (binary) signal lines would result in sequential behaviour of the interconnected combinational circuits that compute λ, instead of the required combinational behaviour. We say that a general composition is legal if and only if the composition $\lambda^*$ of the output functions $\lambda_i$ is guaranteed to be a function. This is satisfied if and only if the values of each elementary (binary) signal used for information exchange between the partial machines are computed independently of the values of that same signal. Cyclic signal flows can, of course, arise only through the interconnection circuitry. Checking for acyclic signal flow is equivalent to tracing the primary information sources, i.e. checking whether the values of each elementary signal line used for information exchange between partial machines, or for transmitting information between the outputs and inputs of a machine, are ultimately computed from the primary input and state information of the composition machine only. Therefore, the partial machines must together possess enough primary input and state information to compute all the information transmitted by the interconnections. The legality of the composition guarantees that the information that has to be transmitted will actually be computed by the partial machines.
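The legality check described above amounts to tracing every exchanged signal back to primary input and state signals, which fails exactly when the signal dependency graph contains a cycle. A minimal sketch, assuming each exchanged signal is described by the set of signals it is computed from (the signal names and the helper names `primary_sources` and `is_legal` are hypothetical):

```python
from graphlib import TopologicalSorter, CycleError


def primary_sources(signal_deps):
    """Trace each exchanged signal back to primary input/state signals.

    signal_deps maps a signal name to the set of signals it is computed from;
    names that never appear as keys are treated as primary (input or state) signals.
    Raises CycleError if some signal depends, directly or indirectly, on itself.
    """
    sources = {}
    for sig in TopologicalSorter(signal_deps).static_order():
        if sig in signal_deps:
            sources[sig] = set().union(*(sources[d] for d in signal_deps[sig]))
        else:
            sources[sig] = {sig}  # primary input or state signal
    return sources


def is_legal(signal_deps):
    """Legal iff every exchanged signal is computed from primary information only."""
    try:
        primary_sources(signal_deps)
        return True
    except CycleError:
        return False


LEGAL = {"y1": {"s1", "x1"}, "y2": {"s2", "y1"}}  # serial, unidirectional flow
LOOP = {"a": {"b", "x1"}, "b": {"a", "s2"}}       # a and b depend on each other
print(is_legal(LEGAL), is_legal(LOOP))  # True False
```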
Fortunately, legality is structurally guaranteed for most of the special cases of the general composition, because information loops are, in most cases, impossible at the level of the total information flows between the outputs and inputs of the partial machines. The absence of closed loops at the level of partial information flows (signal lines) has to be checked only in the most general cases of the general composition, i.e. for machines other than Moore machines, where the exchanged information is computed in more than one partial machine using some information transmitted from the other machines, or in the presence of local connections. In particular, the legality of the composition is structurally guaranteed for Moore machines (where the partial state information of the component machines is transmitted between the partial machines) and for the following compositions of Mealy machines without local connections: parallel compositions (without information exchange), serial compositions (with unidirectional information flow), and compositions in which the exchanged information is computed only from the (primary) input and state information of each partial machine itself.
4. GENERAL FULL-DECOMPOSITION
Let us consider realizations of M by a machine M' that is a general full composition of n partial machines $M_i$. In order to construct a general full-decomposition of a machine M, it is necessary to find partial machines $M_i = (I_i^*, S_i, O_i, \delta_i, \lambda_i)$, their interconnection structure represented by $Con_i$, and the mappings $I \to \times_{i=1}^{n} I_i$ (input encoder), $S \to \times_{i=1}^{n} S_i$ (state mapping) and $\times_{i=1}^{n} O_i \to O$ (output decoder), such that the machines $M_i$ interconnected by $Con_i$, together with these three mappings, realize the behaviour of M (Fig. 5).
The state $S_i$, input $I_i$ and output $O_i$ of each component machine provide partial information about the state S, the input I and the pairs $S \times I$ of the original machine, respectively; in this way they all provide partial information about $S \times I$. Information from the outputs $O_j$ can be transmitted between the component machines through the extra inputs $I'_i$. Each component machine $M_i$ must be able to compute its next-state and output information only from the partial information about $S \times I$ provided by $S_i$, $I_i$ and $I'_i$. A general composition of the component machines, together with the input encoder and the output and state decoders, must be able to realize the behaviour of the original machine M.
FIGURE 5 General full-decomposition of a machine M into two component machines M1 and M2 without local connections.
Implementation of the general decomposition model requires four sorts of components: an input coder (preprocessor), simultaneously operating component machines $M_i$ (main processors), output/state decoders (postprocessors), and the communication circuitry $Con_i$. The input coder, the output/state decoders and the communication circuitry can be implemented as combinational circuits. In some special cases, these circuits reduce to the appropriate distribution or joining of the input, state or output bit lines. The implementation of the component machines depends on the nature of the original machine M: the component machines can take the form of a general sequential machine, a Moore machine, a state machine or a combinational circuit. Each component of the general full-decomposition model can be further decomposed using this model or its special cases.
The general full-decomposition model is powerful and very natural. It models an information-processing system in terms of information flows and cooperating information-processing units. Special cases of the general full-decomposition model cover all other known models for decomposing sequential and combinational circuits.
Various full-decomposition types can be distinguished on the basis of the type of connections between the component machines (general, serial or parallel connections, with state and input, state only, or input only information transmitted) and on the type of coding/decoding (symbol or bit coding/decoding) [22]-[27].
Definition 3. The machine $str(M_{TC}(\{M_i\}, \{Con_i\}))$ is a full-decomposition of type T of machine M if and only if the type T composition of the machines $M_i$ realizes M. In particular, the machine $str(M_{GC}(\{M_i\}, \{Con_i\}))$ is a general full-decomposition of machine M if and only if the general composition GC of the machines $M_i$ realizes M.
Each component machine $M_i$ of a general decomposition computes partial information about the next state and output of the original machine M, and the component machines, cooperating together, realize the required behaviour specified by M. To realize this behaviour, the component machines must together be able to compute enough information about the next state and output of M within each period between successive sampling moments of the state and output information.
Below, the theorem concerning the existence of a general decomposition will be presented. For simplicity of presentation, only single-state realizations of completely specified sequential machines will be considered from here on. However, the presented results can be extended quite naturally: all the concepts presented remain valid if the partitions are replaced by set systems, in order to cover multi-state realizations and incompletely specified machines (weak or extended partition pairs can also be used in place of partition pairs) [18].
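Let $\pi_I^i$, $\pi_S^i$ and $\pi_{S \times I}^i$ ($1 \le i \le n$) be partitions on I, S and $S \times I$, respectively, of $M = (I, S, O, \delta, \lambda)$. Let $\pi_{S \times I}^{J_i}$ be a partition on $S \times I$ such that $\pi_{S \times I}^{J_i} \ge \prod_{j \in J_i} \pi_{S \times I}^j$. Let $\pi_{S \times I} = \prod_{i} \pi_{S \times I}^i$. Let $\pi_{IS}^i$ and $\pi_{SS}^i$ be the partitions induced on $S \times I$ by $\pi_I^i$ and $\pi_S^i$, respectively, and let $\pi_{ISS}^i = \pi_{IS}^i \cdot \pi_{SS}^i \cdot \pi_{S \times I}^{J_i}$. Below, the term "trinity of partitions" will be used in the sense of three strongly related partitions.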
The proof of Theorem 1 can be found in the Appendix.
Theorem 1 can be interpreted in terms of the equivalence relations introduced by the appropriate partitions; it is interpreted graphically in Fig. 6 for the case of two partial machines without local connections. When computing the output function $\lambda$, the sequential machine M classifies the elements of $S \times I$ into the classes of an equivalence relation: the elements of $S \times I$ mapped by $\lambda$ into the same value are in the same class, and those mapped into different values are in different classes. Analogously, when computing the next-state function $\delta$, M classifies the elements of $S \times I$ into the classes of an equivalence relation defined by the values of $\delta$. It is possible to obtain each of these classifications as a product of n other classifications, defined by the appropriate output and next-state functions of the partial machines, if and only if the following conditions are satisfied: the product $\prod_i \pi_S^i$ of the state classification relations of the partial machines forms the state classification relation $\pi_S(0)$ of the original machine M (condition (5)); the product $\prod_i \pi_{S\times I}^i$ of the output classification relations of the partial machines forms a classification relation that enables unambiguous computation of the output classification relation $\pi_O(0)$ of M (condition (4)); each partial machine is able to compute its own classifications $\pi_S^i$ and $\pi_{S\times I}^i$ from the present-state and primary-input classification provided by its own state and primary input, together with the classification of the elements of $S \times I$ provided by the extra input from the other machines (conditions (1) and (2)); and the composition of the partial machines is legal (condition (3)).
FIGURE 6 General full-decomposition of M with two component machines $M_1$ and $M_2$ defined by the trinities $(\pi_I^1, \pi_S^1, \pi_{S\times I}^1)$ and $(\pi_I^2, \pi_S^2, \pi_{S\times I}^2)$.
For incompletely specified machines or multi-state realizations, partitions have to be replaced with set systems, and a theorem similar to Theorem 1 can be proved. Its interpretation is slightly different: the classifications of the elements of $S \times I$ computed by the original and component machines no longer define equivalence relations but compatibility relations, denoted by the appropriate set systems. In these relations, each element can be a member of several compatibility classes, because compatibility relations are not required to be transitive.
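The classification argument above can be made concrete on a small example. The sketch below assumes a hypothetical two-state machine whose tables are invented for illustration; it computes the classifications of $S \times I$ induced by $\lambda$ and $\delta$ and checks that the product of a state-based and an input-based classification refines both, while neither factor alone refines the output classification.

```python
from itertools import product

# Hypothetical machine M = (I, S, O, delta, lam); the tables are illustrative only.
S = ["s0", "s1"]
I = ["a", "b"]
delta = {("s0", "a"): "s0", ("s0", "b"): "s1", ("s1", "a"): "s1", ("s1", "b"): "s0"}
lam = {("s0", "a"): 0, ("s0", "b"): 1, ("s1", "a"): 1, ("s1", "b"): 0}
SxI = list(product(S, I))


def kernel(f, domain):
    """Classify `domain`: elements mapped by f to the same value share a block."""
    classes = {}
    for x in domain:
        classes.setdefault(f(x), set()).add(x)
    return [frozenset(block) for block in classes.values()]


def partition_product(p, q):
    """Product (meet) of two partitions: all nonempty pairwise block intersections."""
    return [a & b for a in p for b in q if a & b]


def refines(p, q):
    """p <= q: every block of p lies inside some block of q."""
    return all(any(a <= b for b in q) for a in p)


out_class = kernel(lambda x: lam[x], SxI)      # classification computed by lambda
next_class = kernel(lambda x: delta[x], SxI)   # classification computed by delta
by_state = kernel(lambda x: x[0], SxI)         # distinguishes present states only
by_input = kernel(lambda x: x[1], SxI)         # distinguishes inputs only

both = partition_product(by_state, by_input)
print(refines(both, out_class), refines(by_state, out_class))  # True False
```

Here the product happens to be the zero partition; in a real decomposition the point is to find coarser factors whose product still refines the required classifications.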
The next section of the paper is devoted to the discussion of some important special cases of the general full-decomposition model and Theorem 1. Usage of the model for decompositional logic synthesis is discussed in Section 6 and an example to illustrate the usage is presented in Section 7.

SPECIAL CASES
The general full-decomposition model covers all other known structural models for sequential and combinational circuits, including the following: parallel full-decompositions [19]-[27], in which each of the component machines can compute its own next state and output independently (Fig. 7); serial full-decompositions [22]-[27], in which only one of the component machines ($M_2$) uses information from the other machine ($M_1$) in order to compute its own next state and output (Fig. 8); decompositions with separate realization of the next-state and output functions [24] (Fig. 9); bit full-decompositions [25]-[27], where the input and output decoders are reduced to the appropriate distribution of the input and output bit lines (Fig. 10); input-bit parallel full-decompositions (referred to in the literature as cascade decompositions [30], serial decompositions [35], Boolean decompositions [14][44], decompositions with generalized decoders [9][44], three-level decompositions [37], or decompositions into submachines [46]) (Fig. 11); bit parallel full-decompositions (parallel decompositions [21], output decompositions [30]) (Fig. 12); and all the special structures modelled by Boolean algebra, BDDs, or any other traditional means (for example, the two-level AND-OR structures of combinational circuits are special parallel decompositions with partial machines restricted to AND functions and with an output decoder composed exclusively of OR functions).
Below, the general decomposition theorem will be interpreted for a number of important special cases. For simplicity of presentation, later in this section we consider decompositions with only two machines and without local connections; however, the presented results can very easily be extended to n machines.
FIGURE 12 The bit parallel full-decomposition.

Sequential Machines
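Let $\pi_I, \tau_I$, $\pi_S, \tau_S$, and $\pi_{S\times I}, \tau_{S\times I}$ be partitions on $I$, $S$, and $S \times I$, respectively, of $M = (I, S, O, \delta, \lambda)$. Let $\pi'_{S\times I}$ and $\tau'_{S\times I}$ be partitions on $S \times I$ such that $\pi'_{S\times I} \geq \pi_{S\times I}$ and $\tau'_{S\times I} \geq \tau_{S\times I}$, and let $\pi_{I_{S\times I}}$, $\tau_{I_{S\times I}}$, $\pi_{S_{S\times I}}$, $\tau_{S_{S\times I}}$ be the partitions induced on $S \times I$ by $\pi_I$, $\tau_I$, $\pi_S$, $\tau_S$, respectively.
Theorem 2 (general decomposition with two component machines and without local connections). A sequential machine $M = (I, S, O, \delta, \lambda)$ has a general full-decomposition with output behaviour realization if and only if two trinities of partitions $(\pi_I, \pi_S, \pi_{S\times I})$ and $(\tau_I, \tau_S, \tau_{S\times I})$ exist such that:
(1) $(\pi_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \cdot \tau'_{S\times I}, \pi_S)$ and $(\tau_{I_{S\times I}} \cdot \tau_{S_{S\times I}} \cdot \pi'_{S\times I}, \tau_S)$ are $S \times I$-$S$ partition pairs,
(2) $\pi_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \cdot \tau'_{S\times I} \leq \pi_{S\times I}$ and $\tau_{I_{S\times I}} \cdot \tau_{S_{S\times I}} \cdot \pi'_{S\times I} \leq \tau_{S\times I}$,
(3) $\pi_{I_{S\times I}} \cdot \tau_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \cdot \tau_{S_{S\times I}} \leq \pi'_{S\times I}$ and $\pi_{I_{S\times I}} \cdot \tau_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \cdot \tau_{S_{S\times I}} \leq \tau'_{S\times I}$,
(4) $(\pi_{S\times I} \cdot \tau_{S\times I}, \pi_O(0))$ is an $S \times I$-$O$ partition pair.
Additionally, if (5) $\pi_S \cdot \tau_S = \pi_S(0)$ is satisfied, then the state behaviour of M will also be realized.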
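Conditions (1), (2), and (4) of Theorem 2 can be checked mechanically once the partitions are given as sets of blocks. The sketch below assumes a hypothetical machine with states encoded as pairs $(u, v)$, next-state function $\delta((u, v), x) = (u \oplus x, v \oplus u)$ and output $\lambda((u, v), x) = v$. Machine $M_1$ keeps $u$ and receives nothing from $M_2$ ($\tau'_{S\times I}$ is the one-block partition), while $M_2$ keeps $v$ and receives $u$ through $\pi'_{S\times I}$; this particular instance is therefore serial. All names are illustrative.

```python
from itertools import product

# Hypothetical machine: states are pairs (u, v), inputs are bits x.
S = [(u, v) for u in (0, 1) for v in (0, 1)]
I = [0, 1]
O = [0, 1]
SxI = list(product(S, I))


def delta(s, x):
    u, v = s
    return (u ^ x, v ^ u)


def lam(s, x):
    return s[1]


def next_state(e):
    return delta(*e)


def kernel(f, domain):
    """Partition of `domain`: elements with equal f-value share a block."""
    classes = {}
    for e in domain:
        classes.setdefault(f(e), set()).add(e)
    return [frozenset(b) for b in classes.values()]


def prod(*parts):
    """Product (meet) of partitions on the same set."""
    result = parts[0]
    for q in parts[1:]:
        result = [a & b for a in result for b in q if a & b]
    return result


def refines(p, q):
    """p <= q: every block of p lies inside some block of q."""
    return all(any(a <= b for b in q) for a in p)


def is_partition_pair(p, q, f):
    """(p, q) is a partition pair w.r.t. f if f maps each block of p into one block of q."""
    def block_of(y):
        return next(i for i, b in enumerate(q) if y in b)
    return all(len({block_of(f(e)) for e in blk}) == 1 for blk in p)


one = [frozenset(SxI)]                        # one-block partition on S x I
pi_I_SxI = kernel(lambda e: e[1], SxI)        # M1 sees the primary input
pi_S_SxI = kernel(lambda e: e[0][0], SxI)     # M1 state: u
tau_I_SxI = one                               # M2 ignores the primary input
tau_S_SxI = kernel(lambda e: e[0][1], SxI)    # M2 state: v
pi_SxI, tau_SxI = pi_S_SxI, tau_S_SxI         # component outputs: u and v
pi_extra, tau_extra = pi_SxI, one             # pi'_{SxI}: u goes to M2; tau'_{SxI}: nothing
pi_S = kernel(lambda s: s[0], S)
tau_S = kernel(lambda s: s[1], S)

cond1 = (is_partition_pair(prod(pi_I_SxI, pi_S_SxI, tau_extra), pi_S, next_state)
         and is_partition_pair(prod(tau_I_SxI, tau_S_SxI, pi_extra), tau_S, next_state))
cond2 = (refines(prod(pi_I_SxI, pi_S_SxI, tau_extra), pi_SxI)
         and refines(prod(tau_I_SxI, tau_S_SxI, pi_extra), tau_SxI))
zero_O = [frozenset({o}) for o in O]          # pi_O(0)
cond4 = is_partition_pair(prod(pi_SxI, tau_SxI), zero_O, lambda e: lam(*e))
print(cond1, cond2, cond4)  # True True True
```

Dropping $\pi'_{S\times I}$ from $M_2$'s inputs breaks condition (1), because the next value of $v$ depends on $u$.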
Theorem 2 is interpreted graphically in Fig. 13. The primary input, state, and output sets of the partial machines are defined as blocks of the appropriate partitions from the partition trinities $(\pi_I, \pi_S, \pi_{S\times I})$ and $(\tau_I, \tau_S, \tau_{S\times I})$. The interconnection circuits $Con_{1,2}$ and $Con_{2,1}$ compute the extra input information of each component machine from the output information of the other component machine, by computing the blocks of $\pi'_{S\times I}$ and $\tau'_{S\times I}$, respectively. The next-state and output functions of the partial machines map the blocks of their state and input partitions (primary inputs and inputs from the interconnections) into the blocks of their state and output partitions. The input decoder computes the blocks of $\pi_I$ and $\tau_I$ from the input information of the original machine M. The state and output decoders compute the state and output information of the original machine M from the state and output information of the partial machines, represented by the partitions $\pi_S$, $\tau_S$ and $\pi_{S\times I}$, $\tau_{S\times I}$, respectively.
In the case of the general decomposition with state connection (type S), $\pi'_{S\times I}$ and $\tau'_{S\times I}$ give partial information about the states of $M_1$ and $M_2$. This information can be described by state partitions $\pi'_S$ and $\tau'_S$ induced on S by $\pi'_{S\times I}$ and $\tau'_{S\times I}$. It is now possible to formulate conditions (1), (2), and (3) of Theorem 2 partly in terms of $\pi_I$, $\tau_I$, $\pi_S$, $\tau_S$, $\pi'_S$, and $\tau'_S$ directly, instead of in terms of the corresponding partitions on $S \times I$.
Theorem 3. A sequential machine $M = (I, S, O, \delta, \lambda)$ has a general full-decomposition of type S with output behaviour realization if and only if two trinities of partitions $(\pi_I, \pi_S, \pi_{S\times I})$ and $(\tau_I, \tau_S, \tau_{S\times I})$ exist such that:
(1a) $(\pi_I, \pi_S)$ and $(\tau_I, \tau_S)$ are $I$-$S$ partition pairs,
(1b) $(\pi_S \cdot \tau'_S, \pi_S)$ and $(\tau_S \cdot \pi'_S, \tau_S)$ are $S$-$S$ partition pairs,
(2) $\pi_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \cdot \tau'_{S_{S\times I}} \leq \pi_{S\times I}$ and $\tau_{I_{S\times I}} \cdot \tau_{S_{S\times I}} \cdot \pi'_{S_{S\times I}} \leq \tau_{S\times I}$, where $\pi'_{S_{S\times I}} = \mathrm{ind}(\pi'_S)$ and $\tau'_{S_{S\times I}} = \mathrm{ind}(\tau'_S)$,
(3) $\pi'_S \geq \pi_S \cdot \tau_S$ and $\tau'_S \geq \pi_S \cdot \tau_S$,
(4) $(\pi_{S\times I} \cdot \tau_{S\times I}, \pi_O(0))$ is an $S \times I$-$O$ partition pair.
FIGURE 13 General full-decomposition of M with two component machines $M_1$ and $M_2$ defined by the trinities $(\pi_I, \pi_S, \pi_{S\times I})$ and $(\tau_I, \tau_S, \tau_{S\times I})$.
L. JÓŹWIAK
Additionally, if (5) $\pi_S \cdot \tau_S = \pi_S(0)$ is also satisfied, then the state behaviour of M will also be realized.
In a parallel decomposition, no information flows between the partial machines. So, the partitions $\pi'_{S\times I}$, $\tau'_{S\times I}$ and $\pi'_S$, $\tau'_S$ in Theorems 2 and 3 are reduced to $\pi_{S\times I}(1)$ and $\pi_S(1)$, respectively. In this manner, Theorems 2 and 3 are reduced to the following theorem.
Theorem 4. A machine M has a parallel full-decomposition with output behaviour realization if and only if two partition trinities $(\pi_I, \pi_S, \pi_{S\times I})$ and $(\tau_I, \tau_S, \tau_{S\times I})$ exist that satisfy the following conditions:
(1) $(\pi_I, \pi_S)$ and $(\tau_I, \tau_S)$ are $I$-$S$ partition pairs,
(2) $\pi_S$ and $\tau_S$ are SP-partitions,
(3) $\pi_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \leq \pi_{S\times I}$ and $\tau_{I_{S\times I}} \cdot \tau_{S_{S\times I}} \leq \tau_{S\times I}$,
(4) $(\pi_{S\times I} \cdot \tau_{S\times I}, \pi_O(0))$ is an $S \times I$-$O$ partition pair.
If the condition (5) $\pi_S \cdot \tau_S = \pi_S(0)$ is also satisfied, then the state behaviour of M will also be realized.
In a serial decomposition, only one of the component machines, say $M_2$, uses information from the output of the other machine ($M_1$). So, the partition $\tau'_{S\times I}$ in Theorem 2 is reduced to $\pi_{S\times I}(1)$. In this manner, Theorem 2 is reduced to the following theorem.
Theorem 5. A machine M has a serial full-decomposition with output behaviour realization if and only if two partition trinities $(\pi_I, \pi_S, \pi_{S\times I})$ and $(\tau_I, \tau_S, \tau_{S\times I})$ exist and the following conditions hold:
(1) $(\pi_I, \pi_S)$ is an $I$-$S$ partition pair, $\pi_S$ is an SP-partition, and $(\tau_{I_{S\times I}} \cdot \tau_{S_{S\times I}} \cdot \pi'_{S\times I}, \tau_S)$ is an $S \times I$-$S$ partition pair, where $\pi'_{S\times I} \geq \pi_{S\times I}$,
(2) $\pi_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \leq \pi_{S\times I}$ and $\tau_{I_{S\times I}} \cdot \tau_{S_{S\times I}} \cdot \pi'_{S\times I} \leq \tau_{S\times I}$,
(3) $(\pi_{S\times I} \cdot \tau_{S\times I}, \pi_O(0))$ is an $S \times I$-$O$ partition pair.
If the condition (4) $\pi_S \cdot \tau_S = \pi_S(0)$ is also satisfied, then the state behaviour of M will also be realized.
If $\tau'_S = \pi_S(1)$, Theorem 3 is reduced to the following theorem.
Theorem 6. A machine M has a serial full-decomposition of type S with output behaviour realization if and only if two partition trinities $(\pi_I, \pi_S, \pi_{S\times I})$ and $(\tau_I, \tau_S, \tau_{S\times I})$ exist that satisfy the following conditions:
(1) $(\pi_I, \pi_S)$ and $(\tau_I, \tau_S)$ are $I$-$S$ partition pairs,
(2) $\pi_S$ is an SP-partition and $(\tau_S \cdot \pi'_S, \tau_S)$ is an $S$-$S$ partition pair, where $\pi'_S \geq \pi_S$,
(3) $\pi_{I_{S\times I}} \cdot \pi_{S_{S\times I}} \leq \pi_{S\times I}$ and $\tau_{I_{S\times I}} \cdot \tau_{S_{S\times I}} \cdot \pi'_{S_{S\times I}} \leq \tau_{S\times I}$, where $\pi'_{S_{S\times I}} = \mathrm{ind}(\pi'_S)$,
(4) $(\pi_{S\times I} \cdot \tau_{S\times I}, \pi_O(0))$ is an $S \times I$-$O$ partition pair.
If the condition (5) $\pi_S \cdot \tau_S = \pi_S(0)$ is also satisfied, then the state behaviour of M will also be realized.
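The SP-partition conditions appearing in Theorems 4 to 6, and the state-realization condition $\pi_S \cdot \tau_S = \pi_S(0)$, can be tested directly on a state table. The sketch below assumes a hypothetical four-state machine with states encoded as pairs $(u, v)$ and $\delta((u, v), x) = (u \oplus x, v \wedge x)$, so that each component is updated independently; the candidate partitions are illustrative.

```python
# Hypothetical machine: states are pairs (u, v), inputs are bits x.
S = [(u, v) for u in (0, 1) for v in (0, 1)]
I = [0, 1]


def delta(s, x):
    u, v = s
    return (u ^ x, v & x)


def kernel(f, domain):
    """Partition of `domain`: elements with equal f-value share a block."""
    classes = {}
    for e in domain:
        classes.setdefault(f(e), set()).add(e)
    return [frozenset(b) for b in classes.values()]


def is_sp_partition(p):
    """(p, p) is an S-S partition pair: for every input, each block maps into one block."""
    def block_of(s):
        return next(i for i, b in enumerate(p) if s in b)
    return all(len({block_of(delta(s, x)) for s in blk}) == 1 for blk in p for x in I)


def is_zero(parts):
    """True if the product of the given partitions is pi_S(0), i.e. all singletons."""
    blocks = [frozenset(S)]
    for q in parts:
        blocks = [a & b for a in blocks for b in q if a & b]
    return all(len(b) == 1 for b in blocks)


pi_S = kernel(lambda s: s[0], S)           # candidate for M1: remembers u
tau_S = kernel(lambda s: s[1], S)          # candidate for M2: remembers v
mixed = kernel(lambda s: s[0] ^ s[1], S)   # remembers only u XOR v

print(is_sp_partition(pi_S), is_sp_partition(tau_S), is_sp_partition(mixed))  # True True False
print(is_zero([pi_S, tau_S]))  # True
```

Both candidate partitions have the substitution property and their product is the zero partition, so this hypothetical machine admits a parallel decomposition that also realizes its state behaviour.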
Bit decomposition is a special case of symbol decomposition in which the input and output decoders are reduced to the appropriate distribution of the input and output bit lines (Fig. 10). So, the partitions $\pi_I$, $\tau_I$, $\pi_O$, and $\tau_O$ in the decomposition theorems should be replaced by the appropriate bit partitions $\pi_{IB}$, $\tau_{IB}$, $\pi_{OB}$, and $\tau_{OB}$ on the set of input bits IB and the set of output bits OB. In this way, Theorem 2, for example, is transformed into the following theorem.
Theorem 7. A machine M has a general bit full-decomposition with output behaviour realization if and only if two bit partition trinities $(\pi_{IB}, \pi_S, \pi_{OB})$ and $(\tau_{IB}, \tau_S, \tau_{OB})$ exist that satisfy the following conditions:
(2) $(\pi_I \cdot \tau_I, \pi_O(0))$ is an $I$-$O$ partition pair. By replacing in Theorem 11 the input partitions $\pi_I$ and $\tau_I$ by the bit partitions $\pi_{IB}$ and $\tau_{IB}$, the following theorem can be obtained.

Combinational Machines
A combinational machine can be considered as a special case of a sequential machine with one state and a trivial next-state function. Therefore, a combinational machine is completely specified by $M = (I, O, \lambda)$, because the trivial state information can be eliminated from further consideration. In this way, Theorem 2 is reduced to the following theorem.

Decompositional Logic Synthesis
The aim of synthesis is to provide a circuit structure that realizes the specified behaviour, satisfies certain con- straints and optimizes specific objectives.
In general, the constraints and objectives refer to the circuit's performance and how the various resources are used during the whole life-cycle of the circuit.They can be formulated along various dimensions such as time, area, inputs, outputs, power consumption, testability, reliability, maintainability, design time or cost etc.
In our methodology, the behaviour is specified in the form of an original sequential machine or Boolean function and the physical constraints and objectives are modelled as a constrained multi-objective optimization problem.
Decompositional synthesis consists of applying the general decomposition model and theorem, or their special cases, a number of times and in this way, adding the structural information to the design specifications until a directly implementable design description is obtained.
By repetitive use of the general full-decomposition model or its special cases, all possible implementation structures for sequential and combinatorial machines (all meaningful partial machine networks) can be obtained.
The appropriate decomposition theorems guarantee correctness by construction and limit the search for solutions to the decompositional structures that realize the specified behaviour. The model, together with its theorems, forms a basis for decompositional synthesis. The model and the information on how it was used during synthesis allow us to check the correctness of the synthesis in a relatively easy way, by backward mapping the synthesis result into the specification. In this manner, it is possible to check the correctness of the behaviour of a human designer or an automatic design tool (see Section 6.2 for further information and Section 7 for an example).
For small sequential or combinational machines, the optimal decompositions can be found by implicit enumeration, limited only by the properties of the building blocks and the algebraic properties described by the appropriate decomposition theorems and partition pair theory. For large systems, the number of possible decompositions is so great that an implicit exhaustive search, performed using only the algebraic and building-block properties, is impossible. It becomes necessary to construct the most promising decompositions using the theorems presented in this paper together with appropriate heuristics. The heuristic evaluation functions and selection mechanisms must limit the search space to a manageable size while keeping high-quality solutions in this limited space.
Our methodology guarantees "correctness by construction", easy post factum correctness verification, and satisfaction of all the originally specified constraints. The objectives can be satisfied only near-optimally, because the problems at hand are computationally complex and heuristic algorithms must be used.

Correctness Aspects
Currently, simulation and prototype testing are commonly used to validate designs, but this approach is not sufficient for complex circuits. Formal validation techniques are more promising and have therefore been applied in our methodology. We will show that they can be used very effectively and efficiently. Proving correctness consists of providing evidence that the realization relation holds between an original specification and its implementation. Since synthesis involves adding detailed information to the specification, proving correctness must involve the opposite, i.e. abstracting from this information and thereby relating the detailed implementation description to its more abstract specification. The correctness-proving process associated with decompositional logic synthesis is performed as a series of abstraction steps that gradually relate the lower-level design descriptions to the immediately higher-level specifications, starting from the bottom-level implementation and continuing until the original top-level specification is reached.
Four types of abstraction are used in this process: structural abstraction (hiding the information about a circuit's internal structure by computing the behavioural description for the composition of partial machines); data abstraction (hiding the information about the implementation of data by replacing the binary data values with their symbolic abstract representations); behavioural abstraction (e.g. leaving unspecified the behaviour for certain state/input combinations which will never occur in the operating environment ("don't cares")); and temporal abstraction (relating several units of lower-level time to one unit of higher-level time, e.g. relating the delays of structural implementation elements to the clock period of a sequential machine).
The structural, behavioural and data abstractions have actually been used to prove the general full-decomposition theorem. This theorem gives the necessary and sufficient conditions that must be fulfilled by each general composition of partial machines in order to realize the functional behaviour specified by the original machine. Once proved, the general full-decomposition theorem provides synthesis rules that are problem-independent and guarantee functional correctness. Parametric correctness, in the sense of satisfying the hard physical constraints, is achieved with class-specific rules, which are distinct for different classes of target architectures (building blocks). These rules are obtained by modelling the synthesis problem as a multi-objective constrained optimization problem (see Section 6.3). The parametric optimization is guided by rules that are problem-instance specific. These rules are constructed and selected automatically by the search algorithms, based on information about the characteristic features of a sequential machine related to the characteristics of the building blocks and the optimization aims [29]-[32].
Many researchers and designers are convinced that "correctness by construction" makes post-factum verification unnecessary. This is not true. Even if the construction rules are proved to be correct, their application can be faulty due to mistakes made by designers or errors in the synthesis tools.
The physical constraints are verified by estimating the parameters involved, using abstract modelling, lower-level synthesis tools, or simulation, and checking the estimates against the constraints. Verification of the near-optimal satisfaction of the objectives consists of checking the performance of the synthesis tools by benchmarking and statistical analysis of the synthesised designs [29]-[32]. Functional correctness is checked by repeatedly applying two elementary verification processes: checking the correctness of each particular decomposition (transformation) computed by the synthesis tools, i.e. verifying whether the proposed system of partitions and the associated state, input, and output mappings satisfy the conditions of the general full-decomposition theorem; and checking whether each particular planned decomposition has been applied successfully, which is performed by reverse mapping of the decompositional implementation structure into its specification.
In general, design verification is a complex process because one does not know the sequence of transformations that has to be performed in order to show that an implementation satisfies its specification. In our methodology, verification is simple, because the sequence of transformations results from the information produced during synthesis. If the information about the synthesis transformations to be performed is memorized, then finding the reverse transformations and performing the reverse mapping is very easy (see an example in Section 7). Instead of verifying that a certain implementation is a realization of a given specification, as traditional verification methods do, we check that the specific planned decompositional structure is correct and then prove that the synthesised implementation is the planned realization of the specification. This results in a very efficient verification process. Of course, it requires prior knowledge of the class of correct structures and knowledge of the decisions made when selecting a certain structure from this class. The first part of the required knowledge is general, and it is given in the form of the general full-decomposition model and its theorem proven in this paper. The second part of the knowledge is problem-instance dependent; however, it must be found anyway before constructing the required decomposition. Therefore, the only extra activity needed to enable the reverse mapping is keeping a record of the decisions taken during the construction of a certain decompositional structure, i.e. recording which instance of the model is intended to be used. This record can be kept in terms of (partial) machines and appropriate mapping functions. Since the highest-level record represents the original machine and the lowest-level one the resulting realization structure, the extra information recorded is limited to the tables of the partial machines at the intermediate levels and the appropriate mapping functions.
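Reverse mapping can be sketched as follows, assuming a hypothetical record of one parallel decomposition step: the tables of the two partial machines, the state encoding, and the output decoder. The behaviour of the original four-state machine is recomputed from this record for every state/input pair and compared with the specification table; each mismatch localizes a synthesis fault. The machine, the encoding, and all names are illustrative.

```python
# Original specification (hypothetical): next state and output for each (state, input).
SPEC_DELTA = {("A", 0): "A", ("A", 1): "C", ("B", 0): "A", ("B", 1): "D",
              ("C", 0): "C", ("C", 1): "A", ("D", 0): "C", ("D", 1): "B"}
SPEC_LAMBDA = {(s, x): o for s, o in {"A": 0, "B": 1, "C": 1, "D": 0}.items() for x in (0, 1)}

# Record kept during synthesis (hypothetical): state encoding and partial-machine tables.
ENCODE = {"A": (0, 0), "B": (0, 1), "C": (1, 0), "D": (1, 1)}
DECODE = {code: s for s, code in ENCODE.items()}          # state decoder
DELTA1 = {(u, x): u ^ x for u in (0, 1) for x in (0, 1)}  # M1 next-state table
DELTA2 = {(v, x): v & x for v in (0, 1) for x in (0, 1)}  # M2 next-state table
OUT1 = {0: 0, 1: 1}                                       # M1 output table
OUT2 = {0: 0, 1: 1}                                       # M2 output table


def output_decoder(o1, o2):
    """Combine the partial outputs into the output of the original machine."""
    return o1 ^ o2


def reverse_map(delta1, delta2):
    """Recompute M's behaviour from the recorded decomposition; return mismatching pairs."""
    mismatches = []
    for (s, x), spec_next in SPEC_DELTA.items():
        u, v = ENCODE[s]
        got_next = DECODE[(delta1[(u, x)], delta2[(v, x)])]
        got_out = output_decoder(OUT1[u], OUT2[v])
        if got_next != spec_next or got_out != SPEC_LAMBDA[(s, x)]:
            mismatches.append((s, x))
    return mismatches


print(reverse_map(DELTA1, DELTA2))  # []

faulty = dict(DELTA2)
faulty[(1, 1)] = 0                  # simulated synthesis fault in M2's table
print(reverse_map(DELTA1, faulty))  # [('B', 1), ('D', 1)]
```

The cost of the check is one pass over the specification table, which is in line with the observation that comparison time is proportional to the dimensions of the specification.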
Since the verification processes use operators that are the reverse of those used during synthesis, the probability of synthesis faults being masked by faults during verification is negligible. Therefore, verification performed in this way is very reliable. Synthesis faults can be rapidly detected and localized, because the elementary verification processes can be performed immediately after finding or applying an elementary transformation.
Post factum verification by reverse mapping is much more efficient than verification by traditional methods that do not use information from the synthesis. The time savings result from the fact that it is no longer necessary to find the sequence of verification transformations, because this sequence is unambiguously defined by the sequence of synthesis transformations. Therefore, the verification time is composed exclusively of the time for performing the reverse transformations (which is comparable to the time for performing the synthesis transformations) and the time for comparing the original specification with the result of the reverse mapping (which is proportional to the dimensions of the specification). Verification by reverse mapping is also much easier than proving the correctness of the complex software of the synthesis tool and ensuring the correct functioning of its hardware. It is equivalent to showing that the synthesis tool has performed correctly for a particular case. An example of verification by reverse mapping can be found in Section 7.
The principle of reverse mapping is very general. It can be applied for off-line and on-line correctness verification of various systems in all cases where the forward transformations are known. In particular, it can be applied for design verification of any kind of transformational design.

Search for Optimal Solutions
For large sequential or combinatorial systems, an exhaustive search for optimal decompositions is impossible. It is necessary to construct only the most promising decompositions, using heuristic search algorithms, and to choose the best of these. In our previous publications [29]-[32], some specific heuristic decomposition algorithms have been described and benchmark results from their software implementation have been presented. This section describes the underlying principles of those algorithms. These principles can be used to construct heuristic search algorithms for various decomposition problems.
A specific decomposition problem can be modelled as a special multi-objective constrained optimization problem (MOCOP) [29][30][32]. A MOCOP can be characterized by four sets of components: a set $V = \{V_1, V_2, \dots, V_n\}$ of variables, a set $D = \{D_1, D_2, \dots, D_n\}$ of domains for the variables, a set $C = \{C_1, C_2, \dots, C_k\}$ of constraints, and a set $O = \{O_1, O_2, \dots, O_p\}$ of objectives. Each domain $D_i \in D$ represents a finite or infinite set of values. A constraint $C_i \in C$ is an m-ary relation on m of the n variables $V_j \in V$, and it can be viewed as a mapping from the Cartesian product of the corresponding domains to the set $\{T, F\}$ (T = "true", F = "false"). An objective $O_i \in O$ is a real-valued function of the variables from V, i.e. a function from $D_1 \times \dots \times D_n$ to $\mathbb{R}$ (the set of real numbers). To solve a MOCOP is to find an assignment of values to the variables $V_j$, $j = 1, \dots, n$, such that all constraints $C_i$, $i = 1, \dots, k$, are satisfied. Since a MOCOP involves several objectives, trade-offs between them are possible. Therefore, the complete formulation of a MOCOP must include trade-off information. This trade-off information can be formulated in several different ways: as an order of the objectives, a utility function, ranking information, local preferences for small changes in the values of the objectives, etc. [43]. The choice of a specific formulation depends on the particular problem. To solve a MOCOP optimally is to solve it in such a way that the most preferred solution is found according to the objectives $O_i$, $i = 1, \dots, p$, and the actual trade-off information. When modelling structural decomposition problems, the particular variables and domains correspond to various structural attributes (dimensions) of the implementation architecture, such as inputs, outputs, memory elements, functional elements, interconnections, etc., and their possible values. The objectives and constraints are functions and relations of those structural dimensions, such as the number of inputs (outputs, memory elements, etc.), area, speed, etc., or the limits imposed on such characteristics.
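The four components of a MOCOP map naturally onto a small data structure. The sketch below is only an illustration: it assumes that all objectives are minimized and that the trade-off information is given as an order of the objectives (one of the formulations listed above), and it solves a tiny, hypothetical instance by exhaustive enumeration, which is precisely what becomes infeasible for realistic decomposition problems.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Assignment = Dict[str, int]


@dataclass
class MOCOP:
    """Variables V, domains D, constraints C, and objectives O (all minimized).

    Trade-off information: the objectives are listed in priority order (lexicographic).
    """
    variables: List[str]
    domains: Dict[str, Sequence[int]]
    constraints: List[Callable[[Assignment], bool]]
    objectives: List[Callable[[Assignment], float]]

    def solve_exhaustively(self) -> Optional[Tuple[Assignment, Tuple[float, ...]]]:
        best = None
        for values in product(*(self.domains[v] for v in self.variables)):
            a = dict(zip(self.variables, values))
            if not all(c(a) for c in self.constraints):
                continue  # violates a hard constraint
            score = tuple(o(a) for o in self.objectives)
            if best is None or score < best[1]:  # tuple comparison = objective order
                best = (a, score)
        return best


# Hypothetical instance: spread 8 primary inputs over k component machines, each built
# from a block with at most 5 inputs; minimize k first, then the inputs per block.
problem = MOCOP(
    variables=["k", "inputs_per_block"],
    domains={"k": range(1, 5), "inputs_per_block": range(1, 9)},
    constraints=[lambda a: a["k"] * a["inputs_per_block"] >= 8,
                 lambda a: a["inputs_per_block"] <= 5],
    objectives=[lambda a: a["k"], lambda a: a["inputs_per_block"]],
)
print(problem.solve_exhaustively())  # ({'k': 2, 'inputs_per_block': 4}, (2, 4))
```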
The solution of the MOCOP model of a specific decomposition problem can be found by defining a set of elementary component machines (atomic computations), constructing a certain partition on the set of elementary component machines, and implementing each partition block as a separate component machine. If the model of a certain decomposition problem involves constraints, it can be solved by special multidimensional packing algorithms [29][30]. Models without hard constraints can be solved by special multi-objective clustering algorithms [32]. The partitioning process that produces partitions on the set of elementary component machines is preceded by an analysis of the characteristic features of the original machine related to the characteristics of the building blocks. In particular, the input, output, and state information and their interrelations are analysed. This analysis enables us to distinguish the elementary component machines and to characterize them and their correlations. Its results are used to guide the heuristic packing or clustering processes.
The partitioning problem is represented in terms of a space of states, where each state corresponds to a particular (partial) solution. A (partial) solution consists of a (partially constructed) partition. The tree of (partial) solutions is an implicit tree, i.e. it is defined only by means of an initial state, the rules for generating the tree, and the termination criteria. The rules describe how to generate successors for each partial solution (i.e. they define move operators, which are (partial) mappings from states to states). Any state that meets a termination criterion is called a goal state. In packing algorithms, partitions are constructed by putting unallocated elements successively into the partition blocks [29][30].
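The implicit tree can be expressed as a successor function. The sketch below assumes a packing variant in which hypothetical elementary component machines are allocated in a fixed order and the only hard constraint is a maximum block size; for illustration it expands the whole tree, which a heuristic search would avoid.

```python
from typing import FrozenSet, List, Tuple

# A search state: the partition blocks built so far and how many elements are allocated.
State = Tuple[Tuple[FrozenSet[str], ...], int]


def successors(state: State, elements: List[str], capacity: int) -> List[State]:
    """Move operators of a packing search: put the next unallocated element into an
    existing block (if the hard size limit allows it) or into a new block."""
    blocks, k = state
    if k == len(elements):
        return []  # goal state: every element has been allocated
    e = elements[k]
    result = []
    for i, b in enumerate(blocks):
        if len(b) < capacity:
            result.append((blocks[:i] + (b | {e},) + blocks[i + 1:], k + 1))
    result.append((blocks + (frozenset({e}),), k + 1))
    return result


def goal_states(elements: List[str], capacity: int) -> List[State]:
    """Expand the implicit tree from the initial state (no blocks) to all goal states."""
    frontier: List[State] = [((), 0)]
    goals = []
    while frontier:
        state = frontier.pop()
        nxt = successors(state, elements, capacity)
        if not nxt:
            goals.append(state)
        frontier.extend(nxt)
    return goals


elems = ["e1", "e2", "e3"]  # hypothetical elementary component machines
print(len(goal_states(elems, capacity=3)), len(goal_states(elems, capacity=2)))  # 5 4
```

With no effective size limit the goal states are all five partitions of three elements; the capacity of 2 removes the single-block partition.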
Clustering algorithms construct the successor partitions by merging some of the partition blocks [32].Heuristic algorithms are used to select the most promising partial solutions and to develop them further when applying only the best move operators.
A heuristic search algorithm can be effective and efficient if it is able to appropriately combine a broad search of the solution space in many promising directions with fast convergence to (near-)optimal solutions. Fast convergence can result from using the knowledge accumulated in the previous search steps for selecting the most promising (partial) solutions and move operators.
We have developed a special heuristic double beam-search algorithm in order to efficiently construct a limited set of "sub-optimal" partitions. Beam-search is a variation of breadth-first search in which only a limited number of the most promising alternatives are explored in parallel. Beam-search requires two data structures: current states (containing the states that were constructed earlier and are currently being considered for extension) and candidate states (containing the states being created as direct extensions of the current states). A third data structure, final states, contains the states with completed partitions.
Our double beam algorithm uses two selection mechanisms: Select Moves and Select States.Move operators are evaluated and selected in relation to a certain state (dynamically).Only a few of the most promising move operators are chosen for a certain state from current states by Select Moves, using a number of choice strategies and heuristic evaluation functions.By applying the selected move operators for each of the current states, a new set of candidate states is created.The work of the second selection mechanism Select States, is twofold: it scans the candidate states for completed partitions in order to include them into the final states and it examines the rest of the candidate states in order to include the best of them into the (new generation of) current states.The beam-search algorithm stops if the set of current states is empty.
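A compact skeleton of the double beam-search loop is given below. It is a sketch, not the authors' implementation: the problem is supplied through hypothetical callbacks (`moves`, `apply`, `op_quality`, `state_quality`, `is_goal`), `moves` is assumed to return only operators that respect the hard constraints, qualities are assumed to be non-negative, and the parameter defaults are arbitrary. Both selection mechanisms use the threshold rule described in the following paragraphs.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def select(items: Iterable[T], quality: Callable[[T], float],
           max_count: int, factor: float) -> List[T]:
    """Keep at most max_count items with quality >= factor * (best quality), best first."""
    scored = [(quality(it), it) for it in items]
    if not scored:
        return []
    q_max = max(q for q, _ in scored)
    kept = [(q, it) for q, it in scored if q >= factor * q_max]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [it for _, it in kept[:max_count]]


def double_beam_search(initial, moves, apply, op_quality, state_quality, is_goal,
                       max_moves=2, oq_factor=0.9, max_states=4, sq_factor=0.5):
    current, final = [initial], []
    while current:  # the search stops when the set of current states is empty
        candidates = []
        for st in current:
            # Select Moves: a few of the best operators, evaluated for this state.
            ops = select(moves(st), lambda op: op_quality(st, op), max_moves, oq_factor)
            candidates.extend(apply(st, op) for op in ops)
        # Select States: completed solutions go to final; the best of the rest survive.
        final.extend(c for c in candidates if is_goal(c))
        rest = [c for c in candidates if not is_goal(c)]
        current = select(rest, state_quality, max_states, sq_factor)
    return final


# Toy usage (hypothetical problem): build a 3-bit tuple, preferring 1-bits.
result = double_beam_search(
    initial=(),
    moves=lambda st: [0, 1],
    apply=lambda st, op: st + (op,),
    op_quality=lambda st, op: 1.0 if op == 1 else 0.5,
    state_quality=lambda st: sum(st) + 1.0,
    is_goal=lambda st: len(st) == 3,
)
print(result)  # [(1, 1, 1)]
```

With `oq_factor=0.9` the weaker operator never passes the threshold, so the beam collapses to a single path; lowering the factor widens the search.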
The selection mechanisms Select Moves and Select States must ensure that a solution violating the hard constraints will not be constructed, and they try to satisfy the objectives optimally with a limited expenditure of computation time and memory space. In order to fulfil the first task, Select Moves selects only those move operators which, applied to a given state, do not lead to the violation of hard constraints. In order to fulfil the second task, Select Moves and Select States select a number of the most promising operators or states, respectively, using the estimates provided by heuristic evaluation functions. The selection mechanisms and evaluation functions together determine the extent of the search and the quality of the results.
The selection mechanisms use heuristic elaborations of one coherent decision rule: "in each state of the search, take a decision which has the greatest chance of leading to the optimal solution, i.e. a decision which is most certain according to the estimations given by the heuristic evaluation functions". If there are several decisions of the same or comparable quality, a number of them will be tried in parallel (beam-search).
According to the above rule, Select Moves applies those move operators which maximize the choice certainty in a given current state, and it leaves the operators which are open to doubt for future consideration. Since the information contained in the partial solutions and used by the evaluation functions grows with the progress of the computations, the uncertainty related to the operators decreases. In each computation step, Select Moves maximizes the conditional probability that the application of a certain move operator to a certain solution state leads to the optimal complete solution. Under this condition, it maximizes the growth of the information in the partial solution, which is then used in the successive computation steps to estimate the quality of choices. The quality Q(op) of an operator is determined by these two factors. Select Moves is controlled by two parameters: MAXMOVES (the maximum number of operator alternatives explored) and OQFACTOR (the quality factor for operators). Select Moves selects no more than MAXMOVES of the highest-quality operators op such that $Q(op) \geq \mathrm{OQFACTOR} \cdot Q_{op\,max}$, where $Q_{op\,max}$ is the quality of the best alternative operator. Poor-quality alternatives are not taken into account.
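The Select Moves rule can be illustrated with invented numbers. In the sketch below the operator names and quality values are hypothetical (merge operators of a clustering search). With OQFACTOR = 0.8 the threshold is $0.8 \cdot 0.95 = 0.76$, so the 0.40 operator is discarded as a poor alternative, and MAXMOVES = 2 then keeps the two best of the remaining three.

```python
def select_moves(op_qualities, max_moves, oq_factor):
    """Return at most max_moves operators op with Q(op) >= oq_factor * Qopmax, best first."""
    if not op_qualities:
        return []
    q_max = max(op_qualities.values())
    good = [op for op, q in op_qualities.items() if q >= oq_factor * q_max]
    good.sort(key=lambda op: op_qualities[op], reverse=True)
    return good[:max_moves]


# Hypothetical operator qualities Q(op) for one current state.
qualities = {"merge(B1,B2)": 0.92, "merge(B1,B3)": 0.80,
             "merge(B3,B4)": 0.95, "merge(B2,B4)": 0.40}
print(select_moves(qualities, max_moves=2, oq_factor=0.8))  # ['merge(B3,B4)', 'merge(B1,B2)']
```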
In addition to selecting the final states, the task of Select States is to choose the most promising candidate states for a new generation of current states. Select States is controlled by two parameters: MAXSTATES (the maximum number of state alternatives explored) and SQFACTOR (the state quality factor). Select States selects no more than MAXSTATES of the highest-quality alternative states PS for which $Q(PS) \geq \mathrm{SQFACTOR} \cdot Q_{max}$, where Q(PS) denotes the quality of an alternative PS and $Q_{max}$ denotes the quality of the best alternative. Poor-quality alternatives are not taken into account. Q(PS) can be computed by accumulating the qualities of the operator choices made during the construction of PS and predicting the quality of the best possible future choices on the way to a complete solution. Another possibility consists of predicting the quality of the best complete solution that can be achieved from a given present state PS.
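The first way of estimating Q(PS) mentioned above can be sketched as follows. The additive form, the optimistic constant assumed for every remaining choice, and the candidate data are assumptions made for illustration. With SQFACTOR = 0.9 the threshold is $0.9 \cdot 3.9 = 3.51$, so PS2 (quality 3.1) is discarded.

```python
def state_quality(choice_qualities, remaining_choices, best_future_quality):
    """Hypothetical Q(PS): accumulated quality of the operator choices made so far plus an
    optimistic prediction that each remaining choice reaches best_future_quality."""
    return sum(choice_qualities) + remaining_choices * best_future_quality


def select_states(states, max_states, sq_factor):
    """Return at most max_states names PS with Q(PS) >= sq_factor * Qmax, best first."""
    q = {name: state_quality(*args) for name, args in states.items()}
    if not q:
        return []
    q_max = max(q.values())
    good = sorted((n for n in q if q[n] >= sq_factor * q_max), key=q.get, reverse=True)
    return good[:max_states]


# Hypothetical candidates: (qualities of past choices, choices left, predicted best quality).
candidates = {
    "PS1": ([0.9, 0.8], 2, 1.0),
    "PS2": ([0.6, 0.5], 2, 1.0),
    "PS3": ([0.9], 3, 1.0),
}
print(select_states(candidates, max_states=4, sq_factor=0.9))  # ['PS3', 'PS1']
```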
Generally, operators and partial solutions are estimated with some uncertainty. This uncertainty decreases with the progress of the computations, because both the "sure" information contained in partial solutions and the quality of prediction grow with this progress. In the first phase of the search, operators can be chosen with much more certainty than partial solutions. In this phase, partial solutions hardly exist or, in other words, they are far from being complete solutions, and almost anything can happen to them on the way to the achievable complete solutions.
Therefore, in the first phase, the search should be based almost exclusively on the choice of operators and, as the computation progresses, increasingly on the choice of partial solutions. In our algorithm, this is achieved by giving MAXMOVES a relatively low value compared to MAXSTATES, and OQFACTOR a relatively high value compared to SQFACTOR.
Since the uncertainty of the estimates decreases as the computation progresses, MAXMOVES and MAXSTATES can be decreased and OQFACTOR and SQFACTOR increased over the course of the computation, which increases the search efficiency.
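The parameter policy described in the two paragraphs above can be expressed as a schedule over search progress. The sketch below assumes a simple linear interpolation between a start and an end setting; the paper does not state how the parameters are varied, so `BeamParams`, `schedule`, the interpolation and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeamParams:
    max_moves: int
    max_states: int
    oq_factor: float
    sq_factor: float


# Hypothetical start/end settings: initially MAXMOVES is low relative to
# MAXSTATES and OQFACTOR is high relative to SQFACTOR, as the text prescribes.
START = BeamParams(max_moves=3, max_states=10, oq_factor=0.8, sq_factor=0.5)
END = BeamParams(max_moves=1, max_states=2, oq_factor=0.95, sq_factor=0.9)


def schedule(progress: float, start: BeamParams = START, end: BeamParams = END) -> BeamParams:
    """Linearly shrink the beam widths and raise the quality factors.

    progress is the assumed fraction of the construction completed (0.0 to 1.0),
    e.g. the current search depth divided by the depth of a complete solution.
    """
    p = min(max(progress, 0.0), 1.0)

    def lerp(a: float, b: float) -> float:
        return a + (b - a) * p

    return BeamParams(
        max_moves=max(1, round(lerp(start.max_moves, end.max_moves))),
        max_states=max(1, round(lerp(start.max_states, end.max_states))),
        oq_factor=lerp(start.oq_factor, end.oq_factor),
        sq_factor=lerp(start.sq_factor, end.sq_factor),
    )


for step in range(5):
    print(schedule(step / 4))
```

A linear schedule is only the simplest choice; any monotone schedule that narrows the beams and tightens the quality thresholds follows the same idea.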
In the method described above, the double-beam search allows effective and efficient decision-making under changing uncertainty.
In the first search phase, the algorithm is divergent to a high degree, i.e. a large number of the most promising directions in the search space are tried. In the second phase, when the search directions and operators can already be estimated with a relatively high degree of certainty, the search becomes more and more convergent. The highly divergent character of the search in the first phase, combined with the continuous interplay between the partial solutions in the second phase, gives the double-beam algorithm its global character.
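How the two beams might interact in one constructive loop is sketched below. This is an illustrative sketch, not the author's implementation: the function names, the shared `beam_select` helper and the toy problem (building a 4-bit string one bit at a time) are hypothetical, and a real decomposition program would use problem-specific operators and evaluation functions. For simplicity, the best complete solution is returned instead of a set of final states.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def beam_select(items: Sequence[T], quality: Callable[[T], float],
                width: int, factor: float) -> List[T]:
    """Keep at most `width` items whose quality is >= factor * best quality."""
    if not items:
        return []
    ranked = sorted(items, key=quality, reverse=True)
    threshold = factor * quality(ranked[0])
    return [it for it in ranked if quality(it) >= threshold][:width]


def double_beam_search(initial, moves, apply_op, op_quality, state_quality, is_complete,
                       max_moves=2, max_states=3, oq_factor=0.9, sq_factor=0.5):
    """Constructive search with one beam over operators and one over states."""
    current, finals = [initial], []
    while current:
        successors = []
        for ps in current:
            # First beam (Select Moves): best operators for this partial solution.
            chosen = beam_select(moves(ps), lambda op: op_quality(ps, op),
                                 max_moves, oq_factor)
            successors.extend(apply_op(ps, op) for op in chosen)
        # Complete solutions are set aside as final states.
        finals.extend(s for s in successors if is_complete(s))
        # Second beam (Select States): the next generation of current states.
        current = beam_select([s for s in successors if not is_complete(s)],
                              state_quality, max_states, sq_factor)
    return max(finals, key=state_quality) if finals else None


# Hypothetical toy problem: build a 4-bit string, one bit per step.
TARGET = "1011"


def moves(s):
    return ["0", "1"]


def apply_op(s, bit):
    return s + bit


def op_quality(s, bit):
    return 1.0 if TARGET[len(s)] == bit else 0.2


def state_quality(s):
    return sum(a == b for a, b in zip(s, TARGET))


def is_complete(s):
    return len(s) == len(TARGET)


print(double_beam_search("", moves, apply_op, op_quality, state_quality, is_complete))
# Wider, more divergent beams explore more partial solutions before converging.
print(double_beam_search("", moves, apply_op, op_quality, state_quality, is_complete,
                         oq_factor=0.1, sq_factor=0.0))
```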
The search method presented was implemented in a number of decomposition and state assignment programs and, when tested on benchmarks, it efficiently produced very good results [29]-[31].
Of course, the solutions found by our constructive double-beam algorithm can be used as good initial solutions for search algorithms that operate in the space of complete solutions (e.g. local search, simulated annealing, tabu search or genetic algorithms). However, this was not necessary in the tested cases, because the double-beam algorithm constructed strictly optimal solutions [29], [30].
Complex multiple general decomposition problems can be solved by decomposing them into systems of more specific subproblems, which are easier to solve than the original problem, and then solving these systems of subproblems with systems of cooperating subproblem-specific algorithms.
L. JÓŹWIAK
In this section, we have discussed only some very general principles of searching for optimal decompositions. For each particular decomposition problem, the problem-specific features should be used to distinguish the elementary component machines (atomic computations) and to perform the partitioning processes effectively and efficiently. In this way, the generic packaging or clustering algorithms are transformed into problem-specific algorithms. For example, in the case of a traditional two-level AND-OR decomposition of Boolean functions, the atomic computations can be defined as computations of minterms, the partial machines are limited to AND circuits that compute product terms, and the output decoder is limited to an OR circuit. In this case, the decomposition problem can be viewed as clustering minterms into larger terms. The aim is to find the minimal number of clusters (terms, partial machines) that realize all the atomic computations (minterms).

7. EXAMPLE
The aim of the example is to illustrate the use of the proposed decompositional synthesis methodology for logic synthesis and correctness verification.
Since conditions (1)-(6) are fulfilled, the conditions of Theorem 8 are satisfied. Therefore, it is possible to construct a parallel bit full-decomposition of M into M1 and M2 with the state and output behaviour realization shown in Figure 14.
Since conditions (7)-(8) are fulfilled, the conditions of Theorem 7 are satisfied. Hence, M2 can be decomposed further into M2,1 and M2,2 by constructing a general output-bit full-decomposition with the state and output behaviour realization shown in Figure 14. The next-state and output tables of the partial machines from Figure 14 and the state decoding functions $\phi'$ and $\phi'_2$ are given in Tables II-VII.
Since each of the partial machines M1, M2,1 and M2,2 has two states, only one two-state memory element is required to implement the state memory of each of them. If D flip-flops are used and the binary state values are assigned according to the output values of each partial machine, the output logic reduces to identity functions, and the assignments and excitation functions given in Tables VIII-X result. Performing the decomposition twice thus yields a state assignment for M.
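The step from a two-state partial machine to D flip-flop excitation functions can be sketched as follows. Because each state is encoded by the partial machine's own output value, the output function is the identity y = q, and since a D flip-flop simply stores its input, the excitation D equals the code of the next state. The machine below, with states U and V and two-bit inputs, is hypothetical and does not reproduce the paper's Tables II-X; `assign_by_output` and `d_excitation` are illustrative names.

```python
from typing import Dict, Tuple

# Hypothetical two-state partial machine (not one of the paper's Tables II-X):
# next-state function delta(state, input) and a state-determined output.
NEXT_STATE: Dict[Tuple[str, str], str] = {
    ("U", "00"): "V", ("U", "01"): "U", ("U", "10"): "V",
    ("V", "00"): "U", ("V", "01"): "V", ("V", "10"): "V",
}
OUTPUT: Dict[str, int] = {"U": 0, "V": 1}


def assign_by_output(output: Dict[str, int]) -> Dict[str, int]:
    """Use each state's output value as its one-bit binary code."""
    if len(set(output.values())) != len(output):
        raise ValueError("output values must distinguish the states")
    return dict(output)


def d_excitation(next_state: Dict[Tuple[str, str], str],
                 codes: Dict[str, int]) -> Dict[Tuple[int, str], int]:
    """A D flip-flop stores its input, so D(q, x) is the code of the next state."""
    return {(codes[s], x): codes[ns] for (s, x), ns in next_state.items()}


codes = assign_by_output(OUTPUT)
excitation = d_excitation(NEXT_STATE, codes)
# With this assignment the output function is the identity y = q.
assert all(OUTPUT[s] == codes[s] for s in OUTPUT)
for (q, x), d in sorted(excitation.items()):
    print(f"q={q} x={x} -> D={d}")
```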
Designing the decompositional implementation of M can proceed further with combinational logic synthesis and layout synthesis, in order to optimize the specified excitation functions and to find an optimal layout. Combinational logic synthesis can also apply the decompositional paradigm. Since the decompositional realization of M given in Tables II-X was obtained using previously proven correct ways of construction (Theorems 7 and 8), this realization is correct. However, it is correct only if those correct ways of construction were actually applied and not merely planned, i.e. if the human designer or automatic tool that actually applied these theorems performed without fault. Since this cannot be guaranteed, the actual application of the correct construction must be checked for correctness.
Applying the concept of reverse mapping, one can check directly whether the designed composition of partial machines realizes the specified behaviour of M.
In the first step, the decompositional structure of M2 is mapped onto its specification. From the tables of M2,1 and M2,2 (Tables V and VI), the table of the composition machine of M2,1 and M2,2 (Table XI) is computed, and the table for M2 (Table III) is then obtained by applying the state decoding function $\phi'_2: S_{2,1} \times S_{2,2} \rightarrow S_2$ (Table VII) to the table of the composition machine. The second step maps the composition of M1 and M2 onto the specification of M; it is performed in exactly the same way as the first step. Since the decomposition was correct, the reverse mapping did not uncover any faults.
Let us now introduce a fault, for example by replacing the correct state F in the first row and third column of the table of M2,1 (Table V) with the faulty state G. In this way, we obtain Table XII. The table of the faulty composition machine of M2,1 and M2,2 is shown as Table XIII. If we now apply the state decoding function $\phi'_2: S_{2,1} \times S_{2,2} \rightarrow S_2$ to Table XIII, we obtain a table for M2 (Table XIV). A simple comparison of the table for M2 obtained from the reverse mapping (Table XIV) with the specification of M2 obtained during the synthesis process (Table III) uncovers the fault: in the first row and third column, state "C" appears in one table and "-" appears at the same place in the other.
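The reverse-mapping check and the fault experiment described above can be sketched on a small hypothetical machine. To keep the sketch short, it assumes a parallel composition without local connections (each partial machine sees only the primary input and its own state) and checks only next-state behaviour; the paper's example uses a general output-bit decomposition, and its Tables V-XIV are not reproduced here. All machine tables and function names are illustrative. As in the paper's example, a state pair outside the domain of the decoding function decodes to "-".

```python
from itertools import product

# Hypothetical example (not the paper's tables): M has states A-D, and two
# two-state partial machines run in parallel on the same primary input.
DELTA1 = {("X", "0"): "Y", ("X", "1"): "X", ("Y", "0"): "X", ("Y", "1"): "Y"}
DELTA2 = {("U", "0"): "U", ("U", "1"): "V", ("V", "0"): "V", ("V", "1"): "U"}
# State decoding function phi: S1 x S2 -> S.
PHI = {("X", "U"): "A", ("X", "V"): "B", ("Y", "U"): "C", ("Y", "V"): "D"}
# Specification of M's next-state behaviour.
SPEC = {("A", "0"): "C", ("A", "1"): "B", ("B", "0"): "D", ("B", "1"): "A",
        ("C", "0"): "A", ("C", "1"): "D", ("D", "0"): "B", ("D", "1"): "C"}
INPUTS = ["0", "1"]


def compose_parallel(delta1, delta2, inputs):
    """Next-state table of the composition machine on state pairs."""
    states1 = sorted({s for s, _ in delta1})
    states2 = sorted({s for s, _ in delta2})
    return {((s1, s2), x): (delta1[(s1, x)], delta2[(s2, x)])
            for s1, s2, x in product(states1, states2, inputs)}


def reverse_map(composed, phi):
    """Decode current and next state pairs; next pairs outside phi decode to '-'."""
    return {(phi[pair], x): phi.get(nxt, "-")
            for (pair, x), nxt in composed.items() if pair in phi}


def mismatches(spec, realized):
    """Entries specified in spec (not '-') that the realization does not match."""
    return [(key, want, realized.get(key, "-"))
            for key, want in sorted(spec.items())
            if want != "-" and realized.get(key, "-") != want]


correct = reverse_map(compose_parallel(DELTA1, DELTA2, INPUTS), PHI)
print(mismatches(SPEC, correct))  # []

# Inject a fault: replace the correct next state Y in one entry of M1 by X.
faulty_delta1 = {**DELTA1, ("X", "0"): "X"}
faulty = reverse_map(compose_parallel(faulty_delta1, DELTA2, INPUTS), PHI)
print(mismatches(SPEC, faulty))
```

The single faulty entry of M1 shows up twice in the reverse-mapped table of M (rows A and B, input 0), because M1's state X takes part in encoding both A and B.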

CONCLUSIONS
Implementing a sequential machine or a Boolean function requires finding a composition of structural elements that realizes the input-output behaviour specified by the machine or function and satisfies a given set of constraints and objectives. In general, the problem of finding an optimal implementation remains unsolved. Only the special case of two-level logic with unconstrained minimization (in the sense of minimal term cover) can be handled by exact techniques for designs of up to about 20 inputs [11] and by nearly optimal heuristic techniques for larger designs [6]. Constrained optimization, or optimization for objectives other than minimum term cover, has remained unsolved even for two-level logic structures, which, after all, form a small subclass of all possible implementation structures.
Contrary to traditional logic design methods, which deal with some very specific implementation structures, the decomposition methodology presented enables the modelling and construction of a large class of functionally correct digital circuit structures, which includes all structures known from the literature as special cases. The decomposition theory presented shows that the structures tackled by the traditional logic synthesis methods form only a small subset of the possible implementation structures. The traditional methods can, of course, be used to solve the specific decomposition problems they were developed for (at least if they solve them effectively and efficiently); however, they allow us to exploit only a small part of the opportunities created by modern microelectronic technology. Decompositional logic synthesis enables us to deal efficiently with design complexity and with the characteristic features of modern building blocks. It makes extensive examination of the solution space possible and allows the machine's structural features to be exploited in relation to a given set of objectives and constraints. Additionally, the partial circuits are smaller than the original ones and easier to design, optimize, implement and test. The decomposition methodology presented ensures "correctness by construction" and solves the problem of design validation very effectively and efficiently. As we have shown, "correctness by construction" and post-factum verification are two complementary validation approaches that should be used jointly during the design process. Designing with previously proven constructions can make not only synthesis but also post-factum verification much easier. The structural knowledge generated during the synthesis process is used for post-factum verification, raising its efficiency enormously. Appropriate structural knowledge, in an appropriate form and appropriately applied, is the main reason for the extremely efficient verification process of our methodology.
For example, Luba et al. [35] applied the input-bit parallel decomposition to combinational circuits and saved, on average, more than 50% of the silicon area compared with traditional implementations, and as much as 90% in one example. Similar results were reported in [8], [9] and [46]. Decomposition results for sequential machines show that the overall silicon area of the decomposed machine is often much smaller than that of the best monolithic implementation, and the component machines are always smaller than the original ones [2], [26], [39], [40].
Recently, there has been substantial progress in developing efficient heuristic methods for various special cases of general decomposition and for different optimization objectives and constraints [9], [29], [30], [32], [37], [44], [45]; however, no overall method is yet available for efficiently exploring the entire space of multiple general decompositions in a systematic manner. This important research area remains open.

APPENDIX: THE PROOF OF THEOREM 1
First, it will be shown how to construct a general full-decomposition that will realize the output behaviour of M.

FIGURE 3 State and output behaviour realization.

FIGURE 4 General full-composition of two component machines M1 and M2 without local connections.

FIGURE 14 Decompositional realization structure for M.

For a given $s \in S$, the block of partition $\pi$ containing $s$ is denoted as $[s]_\pi$, while $[s]_\pi = [t]_\pi$ denotes that $s$ and $t$ are in the same block of $\pi$. Similarly, the block of partition $\pi$ containing $S'$, where $S' \subseteq S$, is denoted by $[S']_\pi$. The partition containing only one element of $S$ in each block is called a zero partition. The partition product $\pi_1 \cdot \pi_2$ is a partition on $S$ such that $[s]_{\pi_1 \cdot \pi_2} = [t]_{\pi_1 \cdot \pi_2}$ if and only if $[s]_{\pi_1} = [t]_{\pi_1}$ and $[s]_{\pi_2} = [t]_{\pi_2}$.
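These partition operations can be made concrete with a small sketch in which a partition is represented as a list of disjoint frozensets covering $S$. The representation and the function names `block_of`, `zero_partition` and `partition_product` are illustrative assumptions. The product is computed as the nonempty pairwise intersections of blocks, which satisfies the defining property above; the loop at the end checks that property exhaustively for the example.

```python
from itertools import product
from typing import FrozenSet, Hashable, Iterable, List

Block = FrozenSet[Hashable]


def block_of(s: Hashable, pi: List[Block]) -> Block:
    """[s]_pi: the block of partition pi that contains s."""
    for block in pi:
        if s in block:
            return block
    raise KeyError(s)


def zero_partition(S: Iterable[Hashable]) -> List[Block]:
    """The partition with exactly one element of S in each block."""
    return [frozenset({s}) for s in S]


def partition_product(pi1: List[Block], pi2: List[Block]) -> List[Block]:
    """pi1 . pi2: the nonempty intersections of a block of pi1 with a block of pi2."""
    return [b1 & b2 for b1 in pi1 for b2 in pi2 if b1 & b2]


S = {1, 2, 3, 4, 5, 6}
pi1 = [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
pi2 = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
prod = partition_product(pi1, pi2)
print(sorted(sorted(b) for b in prod))  # [[1, 2], [3], [4], [5, 6]]

# Check the defining property: same block of pi1.pi2 iff same block of both.
for s, t in product(S, S):
    same_prod = block_of(s, prod) == block_of(t, prod)
    same_both = block_of(s, pi1) == block_of(t, pi1) and block_of(s, pi2) == block_of(t, pi2)
    assert same_prod == same_both
```

Taking the product with the zero partition always yields the zero partition again, since every intersection with a singleton block is either empty or that singleton.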

TABLE State/output table of the original machine M