
A novel concept of logic redundancy insertion is presented that facilitates significant latency reduction in self-timed adder circuits. The proposed concept is universal in the sense that it can be extended to a variety of self-timed design methods, and redundant logic can be incorporated to generate efficient self-timed realizations of iterative logic specifications. Based on a case study of a 32-bit self-timed carry-ripple adder, redundant implementations are found to reduce the data path latency by 21.1% on average, at the expense of average increases in area and power of 2.3% and 0.8%, respectively, compared with their nonredundant counterparts. When further peephole logic optimizations are considered, the delay reduction in one specific scenario reaches 31%, with only meager area and power penalties of 0.6% and 1.2%, respectively. Moreover, redundant logic adders enable spacer propagation in constant time while exhibiting actual-case latency for the addition of valid data.

The design chapter of the 2009 International Technology Roadmap for Semiconductors (ITRS) predicts that adaptive digital circuits will become increasingly necessary in the future as a consequence of increasing variability [

Although the term “self-timed” has been used to refer to asynchronous circuits in general, self-timed circuits actually constitute a robust class of asynchronous circuits, namely, input/output mode circuits. Circuits operating in the input/output mode impose no timing assumptions on when the environment should respond to the circuit. The robustness of self-timed circuits usually results from employing a delay-insensitive (DI) code for data representation, communication, and processing, together with the commonly adopted 4-phase (return-to-zero) handshake signaling convention. Among the family of DI codes [

According to dual-rail data encoding, each data wire

Delay-insensitive dual-rail data encoding and 4-phase handshaking.

With reference to Figure

The dual-rail data bus is initially in the spacer state. The sender transmits a codeword (valid data), which results in “low” to “high” transitions on the bus wires corresponding to the nonzero bits of the codeword (i.e., exactly one rail of every dual-rail signal is driven to logic “high”).

After the receiver receives the codeword, it drives the

The sender waits for the

After an unbounded but finite (positive) amount of time, the receiver drives the

The timing diagram for the 4-phase asynchronous signaling protocol is shown in Figure
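The four phases described above can be traced in a small behavioral model. This is a sketch that assumes an idealized sender and receiver evaluated strictly in order, with no wire delays; the acknowledge wire name `ack` and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

DualRail = Tuple[int, int]  # (true rail, false rail)
SPACER: DualRail = (0, 0)   # empty word separating two valid codewords


def encode(bit: int) -> DualRail:
    """Dual-rail encode one bit: 1 -> (1, 0), 0 -> (0, 1)."""
    return (1, 0) if bit else (0, 1)


def all_valid(bus: List[DualRail]) -> bool:
    """Completion detection: exactly one rail of every pair is high."""
    return all(t + f == 1 for t, f in bus)


def all_spacer(bus: List[DualRail]) -> bool:
    """Spacer detection: every pair has returned to (0, 0)."""
    return all(pair == SPACER for pair in bus)


def four_phase_transfer(bits: List[int]) -> Tuple[List[str], Optional[List[int]]]:
    """Walk one 4-phase (return-to-zero) transfer and log each phase."""
    trace: List[str] = []
    bus = [SPACER] * len(bits)  # the bus starts in the spacer state
    ack = 0                     # hypothetical acknowledge wire, initially low
    received: Optional[List[int]] = None

    # Phase 1: the sender drives a valid codeword onto the bus.
    bus = [encode(b) for b in bits]
    trace.append("data valid")

    # Phase 2: once the whole codeword has arrived, the receiver latches it
    # (the true rails carry the binary value) and raises ack.
    if all_valid(bus):
        received = [t for t, _ in bus]
        ack = 1
        trace.append("ack high")

    # Phase 3: seeing ack high, the sender returns the bus to the spacer.
    if ack == 1:
        bus = [SPACER] * len(bits)
        trace.append("data spacer")

    # Phase 4: once every pair is back at (0, 0), the receiver lowers ack.
    if all_spacer(bus):
        ack = 0
        trace.append("ack low")

    return trace, received
```

For example, `four_phase_transfer([1, 0, 1])` returns `(["data valid", "ack high", "data spacer", "ack low"], [1, 0, 1])`: valid data and spacer alternate, and each transition on the bus is acknowledged before the next one is issued.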

Data representation via dual-rail and 1-of-4 encoding formats.

Single-rail input 1 | Single-rail input 2 | Dual-rail encoded input 1 | Dual-rail encoded input 2 | 1-of-4: E3 | E2 | E1 | E0 |
---|---|---|---|---|---|---|---|
0 | 0 | (0 1) | (0 1) | 0 | 0 | 0 | 1 |
0 | 1 | (0 1) | (1 0) | 0 | 0 | 1 | 0 |
1 | 0 | (1 0) | (0 1) | 0 | 1 | 0 | 0 |
1 | 1 | (1 0) | (1 0) | 1 | 0 | 0 | 0 |

Timing diagram of a 4-phase handshake discipline.

The 1-of-4 encoded values of single-rail inputs given in Table

Although higher-order encoding schemes are available, the most widely used DI codes are the dual-rail code, which allows easy mapping from conventional binary functions, and the 1-of-4 code. This is because, in self-timed data paths, the encoding performed by the sender and the membership test and decoding performed by the receiver are important, and their complexity depends on the size of the message space to be coded [
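The encoding, membership test, and decoding tasks mentioned above are small for a single 1-of-4 group. The sketch below follows the code assignment in the table (E0 for the pair 00 up to E3 for 11); the function names are illustrative, not taken from the paper.

```python
from typing import Optional, Sequence, Tuple

Quad = Tuple[int, int, int, int]  # (E3, E2, E1, E0), as in the table


def one_of_four_encode(x: int, y: int) -> Quad:
    """Encode the single-rail pair (x, y) as one-hot (E3, E2, E1, E0)."""
    idx = 2 * x + y  # (0, 0) -> E0, (0, 1) -> E1, (1, 0) -> E2, (1, 1) -> E3
    e3, e2, e1, e0 = (int(i == idx) for i in (3, 2, 1, 0))
    return (e3, e2, e1, e0)


def one_of_four_status(code: Sequence[int]) -> str:
    """Membership test: one high wire is valid, no high wire is the spacer."""
    high = sum(code)
    if high == 0:
        return "spacer"
    return "valid" if high == 1 else "illegal"


def one_of_four_decode(code: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Recover (x, y) from a valid group; return None for spacer or illegal codes."""
    if one_of_four_status(code) != "valid":
        return None
    idx = 3 - list(code).index(1)  # tuple position 0 holds E3
    return (idx >> 1, idx & 1)
```

A valid 1-of-4 group drives exactly one of its four wires high per two bits, whereas the same two bits sent as two dual-rail pairs drive two of four wires high, which is one reason the 1-of-4 code is attractive despite its slightly more involved decoding.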

Seitz classified self-timed logic circuits into two robust categories on the basis of their indication (acknowledgement) behavior as

all the inputs become defined (valid)/undefined (spacer) before any output becomes defined/undefined; that is, any or all of the output(s) become defined/undefined only after all the inputs have become defined/undefined,

all the outputs become defined/undefined before any input becomes undefined/defined.

Some inputs become defined (undefined) before some outputs become defined (undefined); that is, some outputs could become defined (undefined) only after at least some inputs have become defined (undefined).

All the inputs become defined (undefined) before all the outputs become defined (undefined); that is, all the outputs could become defined (undefined) only after all the inputs have become defined (undefined).

All the outputs become defined (undefined) before any input becomes undefined (defined).

The signaling scheme for strong- and weak-indication timing regimes in terms of the input-output characteristics is illustrated graphically in Figure

Depicting the input-output behavior of strong- and weak-indication circuits.

This section deals with an efficient method of reducing the critical path delay of self-timed adders by means of a novel concept called

Logic redundancy can be incorporated into a self-timed circuit implementation by careful duplication of similar logic. This can lead to multiple acknowledgements, which may help to simplify the timing assumptions, and it can also facilitate faster reset of the logic, with constant latency, during the return-to-zero phase. Logic redundancy achieved through input-incomplete gates essentially introduces the weak-indication property into the circuit, since it relaxes the indication constraints on those outputs that are candidates for optimization. (

The basic equations corresponding to a dual-rail encoded full adder are given by (

The circuit shown in Figure

Proposed weak-indication full adder design (SSSC_DRE adder).
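For experimentation, here is a minimal Python model of a dual-rail full adder in the common textbook sum-of-products form. It is an assumption that this matches the reduced equations of the proposed design; the rail names (`a1`/`a0` for the true/false rail) and the function names are hypothetical.

```python
from itertools import product
from typing import Tuple

Rail = Tuple[int, int]  # (true rail, false rail); (0, 0) is the spacer


def dr(bit: int) -> Rail:
    """Dual-rail encode a binary digit."""
    return (1, 0) if bit else (0, 1)


def full_adder_dr(a: Rail, b: Rail, c: Rail) -> Tuple[Rail, Rail]:
    """Textbook sum-of-products dual-rail full adder; returns (sum, carry)."""
    a1, a0 = a
    b1, b0 = b
    c1, c0 = c
    # Sum rails: minterms with an odd (s1) or even (s0) number of ones.
    s1 = (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)
    s0 = (a0 & b0 & c0) | (a1 & b1 & c0) | (a1 & b0 & c1) | (a0 & b1 & c1)
    # Carry rails: majority of the true rails (co1) or of the false rails (co0).
    co1 = (a1 & b1) | (a1 & c1) | (b1 & c1)
    co0 = (a0 & b0) | (a0 & c0) | (b0 & c0)
    return (s1, s0), (co1, co0)


def check_full_adder() -> bool:
    """Compare against integer addition for all eight valid input codewords."""
    for a, b, c in product((0, 1), repeat=3):
        s, co = full_adder_dr(dr(a), dr(b), dr(c))
        if s != dr((a + b + c) % 2) or co != dr((a + b + c) // 2):
            return False
    return True
```

Applying the all-spacer input `((0, 0), (0, 0), (0, 0))` returns spacers on both outputs, so the same equations serve both the valid-data and the return-to-zero phases at the Boolean level.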

Firstly, the responsibility of indication is confined to the sum outputs of the adder block, which frees the carry signal from indication constraints and thereby facilitates fast carry propagation. The carry outputs can become defined/undefined on the arrival of only a subset of the inputs, whereas the sum outputs must wait for all the inputs to become defined/undefined. The full adder therefore satisfies Seitz’s weak-indication timing constraints. This style of implementation is labeled as the

Secondly, the full adder block depicted in Figure
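The behavior described in the first point can be checked by brute force: enumerate every combination of spacer and valid inputs and ask whether each output can leave the spacer before all inputs are defined. This sketch reuses a textbook sum-of-products full adder (an assumption, not necessarily the circuit in the figure) and models only the spacer-to-valid phase; the state-holding elements a real implementation uses to indicate the return to spacer are not modeled.

```python
from itertools import product
from typing import Dict, Tuple

Rail = Tuple[int, int]  # (true rail, false rail)
SPACER: Rail = (0, 0)
STATES = (SPACER, (1, 0), (0, 1))  # spacer, logic 1, logic 0


def full_adder_dr(a: Rail, b: Rail, c: Rail) -> Tuple[Rail, Rail]:
    """Sum-of-products dual-rail full adder (plain AND/OR, no state holding)."""
    (a1, a0), (b1, b0), (c1, c0) = a, b, c
    s1 = (a1 & b0 & c0) | (a0 & b1 & c0) | (a0 & b0 & c1) | (a1 & b1 & c1)
    s0 = (a0 & b0 & c0) | (a1 & b1 & c0) | (a1 & b0 & c1) | (a0 & b1 & c1)
    co1 = (a1 & b1) | (a1 & c1) | (b1 & c1)
    co0 = (a0 & b0) | (a0 & c0) | (b0 & c0)
    return (s1, s0), (co1, co0)


def defined_with_incomplete_inputs() -> Dict[str, bool]:
    """During evaluation (spacer -> valid), can an output become defined
    while at least one input is still a spacer?"""
    early = {"sum": False, "carry": False}
    for a, b, c in product(STATES, repeat=3):
        if SPACER not in (a, b, c):
            continue  # all inputs defined: not an early case
        s, co = full_adder_dr(a, b, c)
        early["sum"] |= s != SPACER
        early["carry"] |= co != SPACER
    return early
```

The result `{"sum": False, "carry": True}` mirrors the classification above: the sum is input-complete, while the carry is input-incomplete. For instance, two valid 1 operands produce a carry of `(1, 0)` while the carry input is still `(0, 0)`.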

We now consider a variety of scenarios where logic redundancy is explicit in a circuit design. To this end, we analyze some adder circuits which employ a uniform DI data encoding protocol (dual-rail encoding) for both primary inputs and outputs, or a combination of DI codes (dual-rail and 1-of-4 codes) for primary inputs, but a single DI code (dual-rail code) for the primary outputs.

The term “hybrid input encoding” denotes a mix of at least two different DI data encoding schemes adopted for the primary inputs. In the single-bit full adder block, the augend and addend input bits can be encoded jointly using a 1-of-4 code, while the carry input and the sum and carry outputs adopt the dual-rail code; that is, hybrid encoding is used for the primary inputs and uniform encoding for the primary outputs. The structure of the

The general expressions governing a full adder block utilizing hybrid input encoding for inputs and dual-rail encoding for outputs are given below. In the equations that follow, (

The full adder block that synthesizes equation (

Hybrid input encoded full adder block (SSSC_HIE_NRL adder).
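A hedged behavioral sketch of a full adder with a jointly 1-of-4 encoded augend/addend pair and dual-rail carries is given below. The equations are derived here from the table's code assignment and are not claimed to be the exact expressions of the SSSC_HIE designs; all names are illustrative.

```python
from itertools import product
from typing import Tuple

Rail = Tuple[int, int]            # dual-rail (true, false); (0, 0) = spacer
Quad = Tuple[int, int, int, int]  # 1-of-4 (E3, E2, E1, E0); all zero = spacer


def one_of_four(a: int, b: int) -> Quad:
    """Joint 1-of-4 code of augend bit a and addend bit b (table assignment)."""
    idx = 2 * a + b
    e3, e2, e1, e0 = (int(i == idx) for i in (3, 2, 1, 0))
    return (e3, e2, e1, e0)


def dr(bit: int) -> Rail:
    return (1, 0) if bit else (0, 1)


def hybrid_full_adder(ab: Quad, c: Rail) -> Tuple[Rail, Rail]:
    """Full adder with 1-of-4 (a, b) input and dual-rail carry in/out."""
    e3, e2, e1, e0 = ab
    c1, c0 = c
    odd = e2 | e1   # a XOR b = 1: the carry-in propagates
    even = e3 | e0  # a XOR b = 0: the carry-out is fixed by a and b alone
    s = ((odd & c0) | (even & c1), (even & c0) | (odd & c1))
    co = (e3 | (odd & c1), e0 | (odd & c0))
    return s, co


def check_hybrid() -> bool:
    """Compare against integer addition for all valid input combinations."""
    for a, b, c in product((0, 1), repeat=3):
        s, co = hybrid_full_adder(one_of_four(a, b), dr(c))
        if s != dr((a + b + c) % 2) or co != dr((a + b + c) // 2):
            return False
    return True
```

Because `E3` (both operands 1) or `E0` (both operands 0) settles the carry on its own, the carry-out can become defined before the carry-in arrives, the same early-carry property as in the weak-indication dual-rail full adder.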

The synthesized hybrid input encoded full adder block that incorporates logic redundancy is shown in Figure

Hybrid input encoded full adder including redundancy (SSSC_HIE_RL adder).

Here, gates

We now analyze the effect of introducing redundant logic in a self-timed dual-bit adder module that employs homogeneous data encoding for both its primary inputs and outputs. Homogeneous encoding means that the same DI data encoding protocol, here dual-rail encoding, is adopted for all the primary inputs and outputs of a function block. The dual-bit adder block consists of dual-rail encoded versions of five single-rail inputs, namely,

The reduced orthogonal sum-of-products forms of the encoded outputs of the dual-bit adder, expressed in terms of the encoded inputs, are given below. In an orthogonal sum-of-products form, the logical conjunction of any pair of product terms evaluates to logic 0:

The architecture of the

Dual-rail encoded

Redundant logic insertion in a homogeneously encoded dual-bit adder.
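The orthogonality property of the sum-of-products forms relies on the two rails of a dual-rail signal never being high together, so two product terms are mutually exclusive when one uses `x1` and the other `x0` of some signal `x`. The checker below is a sketch under that assumption; the rail-naming convention (a trailing `1` or `0`) and the example carry expressions are illustrative, not the dual-bit adder equations themselves.

```python
from itertools import combinations
from typing import FrozenSet, List

Term = FrozenSet[str]  # product of rail literals such as "a1" or "b0"


def term(*rails: str) -> Term:
    return frozenset(rails)


def opposite(rail: str) -> str:
    """'a1' <-> 'a0': the other rail of the same dual-rail signal."""
    return rail[:-1] + ("0" if rail[-1] == "1" else "1")


def disjoint(t: Term, u: Term) -> bool:
    """t AND u is 0 if they use opposite rails of some signal (x1 * x0 = 0)."""
    return any(opposite(r) in u for r in t)


def is_orthogonal(sop: List[Term]) -> bool:
    """Every pair of product terms must be disjoint."""
    return all(disjoint(t, u) for t, u in combinations(sop, 2))


# True rail of a full adder carry-out, written two ways.
majority = [term("a1", "b1"), term("a1", "c1"), term("b1", "c1")]
orthogonal = [term("a1", "b1"), term("a1", "b0", "c1"), term("a0", "b1", "c1")]
```

The majority form fails the check because, for example, `a1 b1` and `a1 c1` can be high at the same time; the orthogonal form covers the same valid input patterns while guaranteeing that at most one product term is high for any valid input.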

The gate output node labeled “

Heterogeneous encoding implies a combination of at least two different DI codes (e.g., dual-rail and 1-of-4 codes) used to encode the primary inputs and outputs of a self-timed logic circuit. A dual-bit adder block based on heterogeneous DI data encoding can represent the augend and addend inputs and the sum outputs by a 1-of-4 code, while the input and output carry signals are represented using the dual-rail code. Adopting such an encoding scheme, the minimized expressions for the function block outputs are given below. It should be noted that the 1-of-4 code assignments for the augend and addend inputs and the sum outputs are the reverse of the assignments given in Table

The dual-bit adder module that synthesizes (

Weakly indicating heterogeneously encoded dual-bit adder module.
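When minimizing the heterogeneous dual-bit adder expressions, a behavioral reference model is handy for checking them exhaustively. The sketch below is such a model under stated assumptions: each 2-bit operand and the 2-bit sum are single 1-of-4 groups, the carries are dual-rail, and the `reverse` flag is only one possible reading of the reversed code assignment mentioned above. All names are hypothetical.

```python
from itertools import product
from typing import Tuple

Rail = Tuple[int, int]            # dual-rail (true, false)
Quad = Tuple[int, int, int, int]  # 1-of-4 (E3, E2, E1, E0)


def enc4(v: int, reverse: bool = False) -> Quad:
    """1-of-4 code of a 2-bit value. reverse=True mirrors the assignment
    (value 0 on E3 ... value 3 on E0); that reading is an assumption."""
    idx = 3 - v if reverse else v
    e3, e2, e1, e0 = (int(i == idx) for i in (3, 2, 1, 0))
    return (e3, e2, e1, e0)


def dec4(code: Quad, reverse: bool = False) -> int:
    """Inverse of enc4 for a valid code group."""
    idx = 3 - code.index(1)  # tuple position 0 holds E3
    return 3 - idx if reverse else idx


def dual_bit_adder_ref(a: Quad, b: Quad, cin: Rail, reverse: bool = False) -> Tuple[Quad, Rail]:
    """Behavioral reference: 1-of-4 operands and sum, dual-rail carries."""
    total = dec4(a, reverse) + dec4(b, reverse) + cin[0]  # cin[0] is the true rail
    cout: Rail = (1, 0) if total >= 4 else (0, 1)
    return enc4(total % 4, reverse), cout


def check_reference(reverse: bool = False) -> bool:
    """Exhaustive check over all 32 operand/carry-in combinations."""
    for a, b, c in product(range(4), range(4), (0, 1)):
        cin: Rail = (1, 0) if c else (0, 1)
        s, co = dual_bit_adder_ref(enc4(a, reverse), enc4(b, reverse), cin, reverse)
        if dec4(s, reverse) + 4 * co[0] != a + b + c:
            return False
    return True
```

Any candidate gate-level expression for the sum groups and the carry-out can be compared against `dual_bit_adder_ref` over the same 32 input combinations.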

The

Heterogeneously encoded dual-bit adder based

(a) Self-timed system handling heterogeneously encoded inputs and outputs, (b) dual-rail to 1-of-4 encoder, (c) 1-of-4 to dual-rail decoder.
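The encoder and decoder in panels (b) and (c) can be captured at the Boolean level as follows. This is a sketch that assumes the code assignment of the table and plain AND/OR functions; how the actual circuits satisfy the indication constraints (for example, with state-holding gates) is not modeled, and the function names are illustrative.

```python
from typing import Tuple

Rail = Tuple[int, int]            # dual-rail (true, false); (0, 0) = spacer
Quad = Tuple[int, int, int, int]  # 1-of-4 (E3, E2, E1, E0); all zero = spacer


def dr_to_1of4(x: Rail, y: Rail) -> Quad:
    """Encoder: one 2-input AND per code word (table assignment)."""
    x1, x0 = x
    y1, y0 = y
    return (x1 & y1, x1 & y0, x0 & y1, x0 & y0)


def one_of_4_to_dr(code: Quad) -> Tuple[Rail, Rail]:
    """Decoder: each output rail is the OR of the two code words that imply it."""
    e3, e2, e1, e0 = code
    x = (e3 | e2, e1 | e0)  # first bit is 1 for E3 and E2
    y = (e3 | e1, e2 | e0)  # second bit is 1 for E3 and E1
    return x, y
```

A partially arrived input, such as a valid first bit with the second still at spacer, yields the all-zero 1-of-4 spacer, so the encoder does not emit a code word until both dual-rail pairs are valid.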

To demonstrate the usefulness of the proposed concept of logic redundancy insertion, simulations were performed on a 32-bit self-timed ripple-carry adder (RCA) architecture. In this context, a subset of well-known self-timed design methods [
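The actual-case latency behavior can be illustrated with a unit-delay model of an RCA built from early-carry cells like the weak-indication full adder described earlier. This is a simplified sketch, not the gate-level simulation used for the results that follow: every delay is one unit, and the function names and parameters are hypothetical.

```python
import random


def completion_time(a: int, b: int, width: int = 32) -> int:
    """Unit-delay estimate of when all sum bits and the carry-out are valid.

    Assumptions (illustrative only): the carry-in and all operand bits arrive
    at t = 0; each carry or sum output takes one unit after its last needed
    input; a stage with a_i == b_i fixes its carry without the carry-in.
    """
    carry_ready = 0  # time at which the carry into the current stage is valid
    done = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        sum_ready = carry_ready + 1  # the sum waits for every input
        if ai == bi:
            carry_ready = 1  # generate or kill: carry-in not needed
        else:
            carry_ready += 1  # propagate: wait for the carry-in
        done = max(done, sum_ready, carry_ready)
    return done


def average_completion(trials: int = 10_000, width: int = 32, seed: int = 1) -> float:
    """Mean completion time over uniformly random operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += completion_time(rng.getrandbits(width), rng.getrandbits(width), width)
    return total / trials
```

The worst-case pair `completion_time(0, 2**32 - 1)` returns 32 (a full ripple), whereas `completion_time(0, 0)` returns 2. Random operands land much closer to the lower end, because long runs of propagate positions are rare.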

Delay, area, and power of various nonredundant 32-bit self-timed RCAs.

Adder realization style | Delay (ns) | Area (µm²) | Power (µW) |
---|---|---|---|
SSSC_HIE_NRL (weak) | 8.0 | 6633 (78) | 619.1 |
DIMS_DSSC_DRE (weak)* [ | 12.8 | 21833 (1202) | 1025.9 |
Toms_DSSC_DRE (strong) [ | 9.4 | 10793 (512) | 693.1 |
Toms_DSSC_HE (strong) [ | 9.0 | 12121 (479) | 695.9 |
Folco et al._DSSC_DRE (weak) [ | 5.9 | 9417 (426) | 740.4 |
DSSC_DRE (weak) | 5.9 | 14921 (770) | 871.9 |
DSSC_HE (weak) | 5.8 | 10889 (402) | 688.4 |

The indication type of each adder is given in parentheses in the first column of the table, and the values in parentheses in the third column give the area of the corresponding individual single-bit/dual-bit self-timed adder block. The delay, area, and power parameters of the different redundant-logic 32-bit self-timed RCAs are given in Table

Delay, area, and power metrics of various redundant 32-bit self-timed RCAs.

Adder realization style | Delay (ns) | Area (µm²) | Power (µW) |
---|---|---|---|
SSSC_HIE_RL (weak) | 5.9 | 6953 (88) | 630.2 |
DIMS_DSSC_DRE (weak) [ | 10.5 | 22473 (1242) | 1034.6 |
Folco et al._DSSC_DRE (weak) [ | 4.7 | 9577 (436) | 743.8 |
DSSC_DRE (weak) | 4.6 | 15081 (780) | 875.3 |
DSSC_HE (weak) | 4.6 | 11049 (412) | 691.9 |
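The relative changes between the two tables can be recomputed directly. The sketch below takes an unweighted mean of the per-design percentage changes over the five adder styles that appear in both tables; this averaging rule is an assumption, and the dictionary keys are shortened labels.

```python
from typing import Dict, Tuple

Metrics = Tuple[float, float, float]  # (delay, area, power)

# Values copied from the two tables: (nonredundant, redundant).
PAIRS: Dict[str, Tuple[Metrics, Metrics]] = {
    "SSSC_HIE": ((8.0, 6633, 619.1), (5.9, 6953, 630.2)),
    "DIMS_DSSC_DRE": ((12.8, 21833, 1025.9), (10.5, 22473, 1034.6)),
    "Folco et al._DSSC_DRE": ((5.9, 9417, 740.4), (4.7, 9577, 743.8)),
    "DSSC_DRE": ((5.9, 14921, 871.9), (4.6, 15081, 875.3)),
    "DSSC_HE": ((5.8, 10889, 688.4), (4.6, 11049, 691.9)),
}


def pct_change(old: float, new: float) -> float:
    """Relative change in percent (negative means a reduction)."""
    return 100.0 * (new - old) / old


def mean_changes() -> Metrics:
    """Unweighted mean of the per-design percentage changes."""
    sums = [0.0, 0.0, 0.0]
    for before, after in PAIRS.values():
        for k in range(3):
            sums[k] += pct_change(before[k], after[k])
    n = len(PAIRS)
    return (sums[0] / n, sums[1] / n, sums[2] / n)
```

With this rule the means come out at roughly -21.5% delay, +2.4% area, and +0.8% power. Since the averaging behind the figures quoted in the abstract is not spelled out, small differences from the stated 21.1% and 2.3% are to be expected.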

With reference to the DSSC_HE adder module shown in Figure ^{2}, and 696.8

This paper described a new concept of redundant logic insertion that can be used to minimize the data path delay of self-timed arithmetic circuits. Logic redundancy was shown to be feasible with respect to many self-timed design methods, especially for synthesizing iterative logic specifications. Its advantages were demonstrated using a 32-bit self-timed carry-ripple adder, and the simulation results show that significant latency reductions can be achieved at the expense of only marginal increases in area and power. It was also discussed how logic redundancy paves the way for constant-latency operation by permitting fast reset when spacer data are applied, while actual-case latency is obtained for the addition of valid data.

This research was supported in part by the Engineering and Physical Sciences Research Council, UK, under Grant EP/D052238/1. The first author was additionally supported by a bursary from the School of Computer Science of the University of Manchester.