Analysis and Characterization of State Assignment Techniques for Sequential Machines



I. INTRODUCTION
Automated synthesis of finite state machines (FSMs) typically refers to a process by which a high-level description of an FSM is transformed into a gate-level netlist. Beginning with a state transition table, a typical synthesis process is shown in Figure 1. An important step in the synthesis process is the state assignment. The state assignment directly influences, in part, many parameters of the resulting circuit such as circuit area, circuit performance, and circuit testability [1]-[4]. Examples of state assignment tools include KISS [2], MUSTANG [5], NOVA [6], MUSE [7], and JEDI [8]. Two-level minimization seeks to reduce the number of terms in the Boolean equations that realize the FSM by reducing redundancy and exploiting don't-care conditions. An example of a two-level minimization tool is ESPRESSO [9]. Multi-level minimization refers to factoring two-level Boolean equations into multiple levels. The objective is to maximize the sharing of common Boolean expressions in order to reduce the number of required logic literals. Examples of multi-level minimization tools include MIS [10] and DECAF [11]. Technology mapping of multi-level Boolean equations is performed in order to map the equations into standard cells. Consideration is given to minimizing a cost function subject to design constraints such as speed and area. After technology mapping, a netlist is generated which can be interfaced to silicon compilation tools for placement and routing. Although each step in Figure 1 is shown as a separate entity, the steps are interdependent. For example, state assignment algorithms that target two-level minimization may not be efficient for multi-level minimization [5].
In the synthesis of an FSM, important criteria are considered which include in part: 1) silicon area, 2) propagation delay time, and 3) testability.
For each of the above criteria, the particular state assignment which is chosen plays a crucial and central role. In this paper, we study the problem of state assignment as it relates to the above criteria for the case of multi-level minimization. The study involves the analysis and characterization of various classes of state assignment techniques in relation to the above criteria. The result of a study involving various FSM benchmarks is presented. The results show that the simple technique of one-hot encoding [12] often produces better results than those attained by complex state assignment algorithms.

II. PRELIMINARY DISCUSSION
In general, an FSM F(NSD, M, OD) is composed of three sections as shown in Figure 2. The next state decoder (NSD) and output decoder (OD) are composed of combinational logic (and are often grouped into a single combinational logic block), and the memory (M) consists of a set of flip-flops. We shall restrict M to be composed of D-type flip-flops organized as a parallel-in parallel-out (PIPO) shift register in the regular mode of operation. An FSM can be described via a state transition diagram (STD) or via a state transition table (STT). The STD and STT describe the set of FSM states, the conditions that cause a transition from one state to another, and the conditions that cause the assertion of output signals.
For a given STD, a state assignment is made by assigning a unique binary code to each of its states. As measured by the number of bits, the length of the state assignment code, N, determines the size of the PIPO in M. An STD having P states requires a minimum length code of size ⌈log2 P⌉, where ⌈log2 P⌉ is the least integer greater than or equal to log2 P. However, utilizing a minimum length code does not necessarily result in an optimum solution relative to the area, speed, and testability of the FSM (see Section IV).
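As a small illustration of the minimum code length, the following Python sketch computes ⌈log2 P⌉ for a given number of states. The function name `min_code_length` is a hypothetical helper introduced here for illustration and is not part of any tool discussed in this paper.

```python
def min_code_length(num_states: int) -> int:
    """Return the smallest N such that 2**N >= num_states, i.e. ceil(log2(P)).

    Hypothetical helper for illustration only.
    """
    if num_states < 1:
        raise ValueError("an FSM needs at least one state")
    # (P - 1).bit_length() equals ceil(log2(P)) for P >= 1 and avoids
    # floating-point rounding problems that math.log2 could introduce.
    return (num_states - 1).bit_length()


if __name__ == "__main__":
    for p in (2, 5, 8, 9):
        print(p, "states need at least", min_code_length(p), "state variables")
```

Integer arithmetic is used instead of a floating-point logarithm so that exact powers of two are handled reliably.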
We shall assign a cost function C(A, D, T) to a given FSM, where A is the silicon area, D is the worst-case propagation delay time, and T is the testability. Each of the parameters A, D, and T is a complex function and is interrelated to the other two.
In the following, we shall examine the effect of state assignment on each of the above parameters.

Area
For the purpose of quantifying the silicon area, A, occupied by an FSM, we shall assume that the process technology and layout style are fixed. We shall further assume that a standard set of cells is used for the layout. Now, A can be separated into two parts: the first is the area occupied by the cells, and the second is the area occupied by routing. Thus, A depends on the number of required cells and the regularity of the circuit structure. In order to optimize A, one has to study the effect of state assignment on NSD, M, and OD. It has been established (see above) that the code length N determines the size of M. In addition to N, the actual composition of the code also influences A. It has been shown that a significant improvement can be achieved by having the state assignment satisfy certain constraints which are derived by applying a number of rules [13]. Under these rules, the following groups of states should be given adjacent state assignments (two codes are adjacent if they differ in only 1 bit position): Rule 1. states having a common next-state, Rule 2. next-states of a given state, and Rule 3. states having common outputs. The use of the above rules can help to reduce the amount of logic in NSD (rules 1 and 2) and OD (rule 3). It has been shown that using variations of the above rules, which also include the input space of the FSM, to derive weighted constraints can be effective for multi-level minimization [5].
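To make the notion of adjacency concrete, the following Python sketch checks which targeted state pairs end up with codes that differ in exactly one bit under a given assignment. The state names, the target pairs, and both assignments are hypothetical examples chosen for illustration. They are not taken from the benchmarks in this paper.

```python
def hamming_distance(code_a: str, code_b: str) -> int:
    """Number of bit positions in which two equal-length codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(a != b for a, b in zip(code_a, code_b))


def satisfied_pairs(assignment, target_pairs):
    """Return the target pairs whose codes are adjacent (distance 1)."""
    return [
        (p, q)
        for p, q in target_pairs
        if hamming_distance(assignment[p], assignment[q]) == 1
    ]


# Hypothetical adjacency targets, e.g. as produced by Rules 1-3.
TARGETS = [("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s0")]

# Two hypothetical minimum-length assignments for the same four states.
NATURAL = {"s0": "00", "s1": "01", "s2": "10", "s3": "11"}
GRAY = {"s0": "00", "s1": "01", "s2": "11", "s3": "10"}

if __name__ == "__main__":
    print(len(satisfied_pairs(NATURAL, TARGETS)), "targets met by NATURAL")
    print(len(satisfied_pairs(GRAY, TARGETS)), "targets met by GRAY")
```

Both assignments use the same code length, yet they satisfy a different number of adjacency targets, which illustrates why the composition of the code matters in addition to N.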

Delay Time
In general, OD is a function of the present state and the present input combination. The present state is computed from the previous state via NSD. Thus in order to compute the next outputs, logic signals are propagated through NSD, M, and OD. Hence, in general, the worst-case propagation delay path traverses a set of logic elements belonging to NSD, M, and OD. Since M consists of a PIPO, a state assignment that produces a wider PIPO (more bits) would not, at least to a first order, increase the propagation delay time through M (this does not take into account additional capacitance and power dissipation). However, if such a state assignment produces less logic in NSD and OD, the value of D would be reduced (everything else being equal). In other words, if increasing the size of M results in less logic in NSD and OD, we would expect a speed improvement. Indeed this was the case for most of the FSMs which were studied as presented in Section IV.

Testability
Testability refers to the degree to which internal circuit nodes can be controlled and observed. Test generation for sequential circuits has been recognized to be especially difficult [14]. For example, in order to detect a fault in NSD, the FSM has to be placed in an appropriate state. However, this is a difficult task for two main reasons. First, the flip-flops are a coded representation of the states and must be set to the appropriate code so as to propagate the fault from NSD through M and on to the outputs of OD. Second, the feedback paths pose a problem because they are not directly controllable. With proper Design-For-Test such as the use of scan paths [14]-[16], the test process can be simplified. In the scan path approach, the flip-flops in M are threaded together to form a serial shift register in the test mode of operation. The combinational portion of the FSM is separated from M so that standard test algorithms for combinational circuits such as D [17], PODEM [18], and FAN [19] can be utilized. Unfortunately, scan paths serialize the test process and can result in an unacceptable amount of overhead for some applications having tight performance and silicon area requirements.
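The serial nature of a scan path can be sketched with a simple behavioral model in Python. The function `scan_shift` and its list-based representation of the flip-flops are assumptions made for illustration only. The model ignores clocking details and the multiplexing between the regular mode and the test mode.

```python
def scan_shift(chain, serial_bits):
    """Shift serial_bits into a scan chain, one bit per clock cycle.

    chain[0] is the flip-flop at the scan-in end and chain[-1] is the
    flip-flop at the scan-out end. Returns (new_chain, scanned_out_bits).
    Hypothetical behavioral model, for illustration only.
    """
    chain = list(chain)
    scanned_out = []
    for bit in serial_bits:
        scanned_out.append(chain[-1])   # value leaving at scan-out
        chain = [bit] + chain[:-1]      # every stage moves one position
    return chain, scanned_out


if __name__ == "__main__":
    # Loading one pattern into N = 4 flip-flops costs N clock cycles.
    state, _ = scan_shift([0, 0, 0, 0], [1, 0, 1, 1])
    print("flip-flops after load:", state)
    # Unloading the captured values costs another N clock cycles.
    _, observed = scan_shift(state, [0, 0, 0, 0])
    print("bits observed at scan-out:", observed)
```

Each pattern applied to the combinational logic costs on the order of N clock cycles to load and N more to unload, which is the serialization overhead referred to in the text.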
Recently, it has been shown that there exists an intimate relationship between the synthesis approach chosen for an FSM and the testability of the FSM [3]. A set of codes of distance-2 was used to enhance the testability of FSMs at the cost of a 0% to 30% circuit area overhead for the benchmark samples which were presented in the study. Also, in a previous paper [20], it was demonstrated that the one-hot state assignment results in circuits that can be easily tested without the use of scan paths. In the next section we will briefly review the reasons for the positive effect that one-hot encoding has on testability. In Section IV, we compare the one-hot state assignment with other types of state assignment techniques relative to testability.

III. ANALYSIS
In the past, designers have sought to use a minimum length code in order to reduce the cost of M because flip-flops were expensive. However, in VLSI CMOS applications a D-type dynamic flip-flop can be realized with eight transistors [21]. Furthermore, the amount of logic in NSD and OD depends, in part, on the code length N. For many FSMs, as the value of N is increased the amount of logic in NSD and OD begins to decrease (to a certain limit) because: 1) more don't-care conditions become available for use in minimization, and 2) a larger N implies the existence of more state variables which can be used to satisfy more of the adjacency constraints. Thus minimizing N (i.e., using a minimum length code) only guarantees an optimum M but does not necessarily imply that A is optimum, because A is affected by M as well as NSD and OD. The issue is to determine the length and composition of the state code which optimizes A for the single entity composed of the parts NSD, M, and OD. Similar arguments can be made for D (delay) and T (testability). For example, the testability of an FSM is affected not only by the number of sequential elements in M but also by the amount of logic in NSD and OD.
We have found that a one-hot state assignment, which utilizes a maximum length code where N is equal to the number of states, results in easily testable circuits. The fact that each flip-flop in the FSM represents a unique state in one-hot encoding, and that only one flip-flop can be at logic "1" in any given clock cycle, greatly simplifies the process of path sensitization. Once the FSM is initialized to a given state, a path from the state to an output point can be easily sensitized by using the state-to-state sequencing information given in the STD. Another advantage of one-hot encoding is that the STD maps directly into hardware. The resulting circuit has a one-to-one correspondence with the STD which allows for test generation at the STD level (for gate-level faults). A test algorithm called Branch Testing has been developed for one-hot encoded FSMs [18]. Branch Testing identifies a minimum set of paths (between the top-most state and the bottom state) containing all the edges in the STD and then proceeds to sensitize each path using the STD sequencing information. Branch Testing requires the use of only three test points and a 2-way multiplexer regardless of the size of the FSM. More detailed information on Branch Testing can be found in reference 18.
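The direct mapping of an STD into one-hot hardware can be illustrated with a short behavioral sketch in Python. Each STD edge contributes one AND term (source flip-flop AND input condition), and the D input of each flip-flop is the OR of the terms on its incoming edges. The three-state machine, its state names, and its single-bit input are hypothetical. The sketch also assumes a deterministic and completely specified STD, which is what keeps exactly one flip-flop at logic "1".

```python
# Hypothetical STD edges: (present_state, input_value, next_state).
EDGES = [
    ("idle", 0, "idle"),
    ("idle", 1, "run"),
    ("run", 0, "done"),
    ("run", 1, "run"),
    ("done", 0, "idle"),
    ("done", 1, "idle"),
]
STATES = ["idle", "run", "done"]


def one_hot_step(flops, x, edges=EDGES, states=STATES):
    """Compute the next flip-flop values of a one-hot encoded FSM.

    flops maps each state name to 0 or 1 with exactly one entry at 1.
    """
    nxt = {s: 0 for s in states}
    for src, cond, dst in edges:
        # One AND term per edge, OR-ed into the destination flip-flop.
        nxt[dst] |= flops[src] & int(x == cond)
    return nxt


def run(inputs, start="idle"):
    """Apply an input sequence and return the list of states visited."""
    flops = {s: int(s == start) for s in STATES}
    visited = []
    for x in inputs:
        flops = one_hot_step(flops, x)
        assert sum(flops.values()) == 1  # one-hot invariant
        visited.append(next(s for s, v in flops.items() if v))
    return visited


if __name__ == "__main__":
    print(run([1, 1, 0, 0]))
```

Because every edge of the STD corresponds to one identifiable term in the logic, sensitizing a path amounts to following a sequence of edges in the STD.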
In order to quantify the effect of state assignment on A, D, and T, we developed three state assignment tools.

Tool 1
Only two rules are utilized to obtain a constraint matrix that specifies the states which are targeted to be adjacent: Rule 1. states having a common next-state with the transition controlled by the same input literal (a literal is a logic variable or its complement) as shown in Figure 3a, and Rule 2. next states of the same state as shown in Figure 3b. Once the constraint matrix is developed, each constraint is given a certain weight factor. Generally, the state that appears most often in the constraint matrix is given some priority. A heuristic is used to assign a set of state codes that best satisfies the constraints. We used a technique similar to that of KISS where the vertices and faces of a Boolean hypercube are employed to encode states and groups of states respectively.
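The two rules of tool 1 can be sketched as follows in Python. This is an illustrative reading of the rules and not the actual implementation of tool 1: the row format of the STT, the unit weight added per rule instance, and the tie-breaking by state name are all assumptions, since the text does not specify the weight factors.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tool1_constraints(stt):
    """Derive weighted adjacency targets from rows (present, literal, next).

    Returns a Counter mapping a sorted state pair to its weight. Each rule
    instance adds 1 to the weight, which is an assumed weighting scheme.
    """
    by_next_and_literal = defaultdict(set)  # Rule 1 groups
    by_present = defaultdict(set)           # Rule 2 groups
    for present, literal, nxt in stt:
        by_next_and_literal[(nxt, literal)].add(present)
        by_present[present].add(nxt)

    weights = Counter()
    groups = list(by_next_and_literal.values()) + list(by_present.values())
    for group in groups:
        for pair in combinations(sorted(group), 2):
            weights[pair] += 1
    return weights


def state_priority(weights):
    """Order states by how often they appear in the constraints."""
    appearances = Counter()
    for (a, b), w in weights.items():
        appearances[a] += w
        appearances[b] += w
    ranked = sorted(appearances.items(), key=lambda kv: (-kv[1], kv[0]))
    return [state for state, _ in ranked]


# Hypothetical STT; "x" and "x'" denote an input literal and its complement.
STT = [
    ("a", "x", "c"),
    ("b", "x", "c"),
    ("a", "x'", "b"),
    ("c", "x", "d"),
    ("b", "x'", "d"),
]

if __name__ == "__main__":
    w = tool1_constraints(STT)
    print(dict(w))
    print(state_priority(w))
```

The resulting weights and priority order would then be handed to a code-assignment heuristic, which is outside the scope of this sketch.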

Tool 2
Herein, the term unconditional output refers to an output that depends only on the present state of the associated FSM, independent of the present input combination.The term conditional output refers to an output that depends on the present state and the present input combination.
A total of four rules are utilized: Rule 1. same as that of tool 1, Rule 2. same as that of tool 1, Rule 3. states with common conditional outputs are targeted to be adjacent only if the outputs are conditional on the same input literal(s) as shown in Figure 3c, and Rule 4. states with common unconditional outputs are targeted to be adjacent as shown in Figure 3d. The weight factors and state codes are assigned as in tool 1 except that rules 3 and 4 are given higher priority because tool 2 targets FSMs having many outputs.
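The distinction between conditional and unconditional outputs, and the way rules 3 and 4 use it, can be sketched in Python as follows. The row format, the output names, and the assumption that the STT lists every input condition of every state are all hypothetical choices made for illustration. This is not the implementation of tool 2.

```python
from collections import defaultdict
from itertools import combinations


def classify_outputs(stt):
    """Split the outputs of each state into unconditional and conditional.

    stt rows are (present, literal, next, outputs). An output asserted on
    every listed transition of a state is treated as unconditional.
    """
    per_state = defaultdict(list)
    for present, _literal, _nxt, outputs in stt:
        per_state[present].append(set(outputs))
    result = {}
    for state, out_sets in per_state.items():
        always = set.intersection(*out_sets)
        ever = set.union(*out_sets)
        result[state] = {"unconditional": always, "conditional": ever - always}
    return result


def tool2_output_pairs(stt):
    """Return {(rule, (state_a, state_b))} targets from rules 3 and 4."""
    classes = classify_outputs(stt)
    groups = defaultdict(set)
    for present, literal, _nxt, outputs in stt:
        for out in outputs:
            if out in classes[present]["unconditional"]:
                groups[("rule4", out)].add(present)
            else:
                # Conditional outputs only match on the same input literal.
                groups[("rule3", out, literal)].add(present)
    pairs = set()
    for key, states in groups.items():
        for pair in combinations(sorted(states), 2):
            pairs.add((key[0], pair))
    return pairs


# Hypothetical STT with outputs "ready" and "load".
STT2 = [
    ("a", "x", "b", {"ready"}),
    ("a", "x'", "a", {"ready", "load"}),
    ("b", "x", "a", set()),
    ("b", "x'", "b", {"load"}),
    ("c", "x", "a", {"ready"}),
    ("c", "x'", "c", {"ready"}),
]

if __name__ == "__main__":
    print(classify_outputs(STT2))
    print(tool2_output_pairs(STT2))
```

In this example, states a and b share the conditional output "load" on the same literal (rule 3), while states a and c share the unconditional output "ready" (rule 4).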

Tool 3
One-hot encoding is used for an FSM of P states, where N = P. The one-hot assignment implies that the code for state i has a single "1" in the ith position and "0"s elsewhere.
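A one-hot assignment is straightforward to generate, as the following Python sketch shows. The function name `one_hot_codes` and the convention of counting bit positions from the left are assumptions made for illustration.

```python
def one_hot_codes(states):
    """Assign state i a code of length P with a single '1' in position i.

    Positions are counted from the left, which is an assumed convention.
    """
    p = len(states)
    return {
        state: "".join("1" if j == i else "0" for j in range(p))
        for i, state in enumerate(states)
    }


if __name__ == "__main__":
    for state, code in one_hot_codes(["s0", "s1", "s2", "s3"]).items():
        print(state, code)
```

Any two distinct one-hot codes differ in exactly two bit positions, so the code length equals the number of states and no two states share a flip-flop.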
Given an FSM of P states, tools 1 and 2 produce codes of length N where ⌈log2 P⌉ ≤ N ≤ P, and tool 3 produces a code of length P. In order to complete the study, we chose to use the tool MUSTANG which provides the option of producing a minimum length code. The results of the study are presented in the next section.

IV. RESULTS
The statistics of 13 practical FSMs are shown in Table I. The FSM samples were taken from various sources including: 1) the MCNC 1989 Logic Synthesis Workshop set of FSM benchmarks, 2) university projects, and 3) reference 10. The FSMs represent a wide range of samples which includes Moore machines (no OD) and Mealy machines. The "# outputs" column in Table I refers to the number of outputs from OD. In the case of a Moore machine, we specify the number of outputs to be zero.
The study involved a comparison between MUSTANG, tool 1, tool 2, and tool 3. The parameters of interest are: 1) the number of state variables (Table II), 2) the number of transistors in NSD and OD (Table III), 3) the total silicon area after placement and routing (Table IV), 4) the worst-case propagation delay time (Table V), and 5) the test length (Table VI).
The silicon area results were obtained as follows.
ESPRESSO was used for two-level minimization and DECAF was used for multi-level minimization. The VPNR (Vanilla Place aNd Route) [22] system was used for cell placement and routing. The area results are given in units of λ², where λ is the minimum feature length.
CRYSTAL [23] was used for finding the worst-case path. The results are given in units of nanoseconds and were obtained by using the circuit parameter values of the MOSIS SCMOS 3-micron p-well process.
The test results for tool 3 (one-hot) were obtained by applying the Branch Testing algorithm. In the other three cases, a scan path was employed to partition NSD and OD from M. Test patterns for NSD and OD were then generated using MIKE [24] and applied serially via the scan path. The scan path itself was tested by shifting in a sequence of alternating 1's and 0's. The fault simulation data were obtained from the CADAT [25] fault simulator. MIKE uses [...]
[...] IV that one-hot encoding performs well for the larger FSM samples. It performs especially well for FSMs having many conditional outputs. However, it performs poorly for Moore machines. The reason is embedded in the fact that conditional outputs are directly available in the NSD portion of a one-hot encoded FSM. This allows the OD to share many common terms with the NSD.
In the case of propagation delay time, it is observed that one-hot encoding performs well for most of the FSM samples. This can be attributed to the fact that an increase in the number of state variables does not, to a first order, increase the delay time because M is organized as a parallel-in parallel-out shift register. In general, such an increase will result in less combinational logic, which would explain the results shown in Table V.
The test data of Table VI also support the case for one-hot encoding. The circuits resulting from MUSTANG, tool 1, and tool 2 utilize a scan path to separate NSD and OD from M, whereas the circuits resulting from tool 3 utilize no scan paths. Hence, we are really comparing a case where test pattern generation was performed for the combinational portion (separately from the flip-flop portion) with a case where test pattern generation was performed for the FSM regarded as a single entity with no separation between the combinational and flip-flop portions.

V. CONCLUSIONS
Table VII summarizes the experimental results by comparing the one-hot encoding data against the tool that performed best in each category (area, delay, and testability) for each of the FSM samples. Based on Table VII, we give one-hot encoding high marks for: 1) simplicity, 2) speed performance, 3) testability, and 4) area performance for Mealy machines that have a large number of outputs. We give one-hot encoding low marks for area performance for Moore-type and small FSMs.
The good performance of one-hot encoding relative to silicon area is, in part, due to the degree of regularity of the resulting circuit structures. Regular structures lend themselves to more compact (denser) results. The speed performance results were due to shorter critical paths. Finally, the test results show that one-hot encoded FSMs do not require a scan chain, but are inherently easily testable. In conclusion, for VLSI applications, the simple technique of one-hot encoding offers a competitive performance package for many FSM structures.
82 R.Z.MAKKI AND S. SU

FIGURE 3 Examples of state assignment rules where state-a and state-b are given adjacent assignments.

TABLE II Number of State Variables in State Code

TABLE VI Test
[...] like approach. We start our analysis by observing the number of state variables used in the state code for each of the FSMs in Table II. The number of state variables directly affects the size of M. The largest number of state variables was incurred by tool 3 and the smallest was incurred by MUSTANG. Table III shows that tool 3 incurs the least amount of combinational logic for most of the FSM samples. The data shown in Table IV represent the area occupied by the entire FSM. The area occupied by a single D flip-flop is 58λ × 64λ. It is clear from Table [...]

TABLE VII Area, Delay, Test, and Transistor Count Results in Comparison