A Fast Dynamic 64-bit Comparator with Small Transistor Count

CHUA-CHIN WANG*, YA-HSIN HSUEH, HSIN-LONG WU and CHIH-FENG WU

Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 80424, Taiwan, ROC

(Received 1 May 2000; Revised 16 March 2001)

In this paper, we propose a 64-bit fast dynamic CMOS comparator with small transistor count. Major features of the proposed comparator are the rearrangement and re-ordering of transistors in the evaluation block of a dynamic cell, and the insertion of a weak n feedback inverter, which helps the pull-down operation to ground. The simulation results given by pre-layout tools, e.g. HSPICE, and post-layout tools, e.g. TimeMill, reveal that the delay is around 2.5 ns while the operating clock rate reaches 100 MHz. A physical chip is fabricated to verify the correctness of our design by using UMC (United Microelectronics Company) 0.5 μm (2P2M) technology.

Keywords: Comparator; Dynamic CMOS; Small transistor count; Feedback inverter; High speed; VLSI

INTRODUCTION

High-speed operation has long been a target of circuit design owing to the speed demands of supercomputing, CPUs, etc. One of the critical operations is the comparison of two binary data words. Theoretically, the fastest comparator is made of fully combinational logic gates. However, the gate count, the area and the fan-in become problems when the data are very long, e.g. \( n = 64 \). Besides, wide-bit comparators are key components in the design of parallel testing, signature analyzers and built-in self-test (BIST) circuits, etc. [4]. Although high fan-in gates are useful in a number of applications, they are not practical in a single stage of static CMOS. Since the NMOS and PMOS networks of a static CMOS gate are duals of each other, one of them will always be arranged in series. These transistors also increase the loading seen by the previous stages. When a large fan-in is required, dynamic logic thus has to be used [1,2]. Meanwhile, prior dynamic logic design styles suffer from different difficulties. For example, domino logic [6] cannot implement inverting functions; NORA [6] has the charge sharing problem; all-N-logic [6] and robust single phase clocking [1] cannot operate correctly under clocks with short rise time or fall time, so they cannot be easily integrated with other parts of a logic design; single-phase logic [6] and Zipper CMOS [6] contain slow P-logic blocks. In this work, we propose a fast 64-bit dynamic comparator with small transistor count.

FAST 64-BIT COMPARATOR CIRCUIT

Prior Comparators

Three comparator circuits have been proposed [5].

1. The equality comparator using the combination of XNOR gates and a NAND gate is shown in Fig. 1.
2. The comparator using a pass-gate logic structure is shown in Fig. 2.
3. As shown in Fig. 3, another version of the comparator, using a merged XNOR/NOR gate and pseudo-nMOS FETs, is presented.

Equality Comparator

An example of the proposed dynamic CMOS 4-bit equality comparator is shown in Fig. 4. When the CLK is low, Node_1 is precharged to \( V_{dd} \). If \( A(0) \) and \( B(0) \) are both high, then \( N1 \) and \( N2 \) are on and \( P1 \) and \( P2 \) are off. Thus, no current path exists during the evaluation period, and Node_1 is kept high. If \( A(0) \) is high and \( B(0) \) is low, then \( N1 \) and \( P2 \) are on. Thus, a current path is formed between Node_1 and ground through \( P2 \) and \( N1 \) during the evaluation period, and Node_1 is pulled down. The truth table is tabulated in Table I.

In short, if any pair \( A(i) \) and \( B(i) \) is not equal, a current path will be formed and Node_1 will be low. By contrast, if \( A(i) \) is equal to \( B(i) \) for all \( i \), Node_1 will remain high [5]. Notably, because PMOS transistors are used in the discharge path, Node_1 can only be discharged to \( |V_{tp}| \) instead of GND. Thus, a latch must be connected to Node_1. The weak \( n \) feedback is used to pull Node_1 down to ground when Node_1 is in the low state. The weak \( p \) feedback is utilized to latch Node_1 to \( V_{dd} \) when Node_1 is in the high state. Hence, the charge redistribution problem can be resolved. The pull-up time is determined only by the pull-up transistor \( P_0 \), but the ground switch \( N_0 \) will increase the pull-down time. Note that the ground switch may be omitted if the two inputs of every pair are guaranteed to be in the same state during the precharge period [2,5].
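The evaluation behaviour described above can be summarized in a small behavioural model. This is only a logic-level sketch in Python, not a circuit simulation: the function name `equality_node` is hypothetical, the second discharge path (N2 with P1) is inferred by symmetry from the text, and analog effects such as the \( |V_{tp}| \) level and the feedback latch are ignored.

```python
def equality_node(a_bits, b_bits):
    """Logic level of Node_1 after evaluation.

    Returns 1 if Node_1 keeps its precharged level (all bit pairs equal)
    and 0 if some bit pair opens a discharge path.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    for a, b in zip(a_bits, b_bits):
        # Path through N1 (gated by A) and P2 (gated by B): conducts for A=1, B=0.
        path_n1_p2 = (a == 1) and (b == 0)
        # Assumed symmetric path through N2 (gated by B) and P1 (gated by A).
        path_n2_p1 = (b == 1) and (a == 0)
        if path_n1_p2 or path_n2_p1:
            return 0  # Node_1 is discharged
    return 1  # no current path: Node_1 stays high


print(equality_node([1, 0, 1, 1], [1, 0, 1, 1]))  # equal words
print(equality_node([1, 0, 1, 1], [1, 0, 0, 1]))  # one differing bit
```

Each discharge path in this model contains exactly two devices, which mirrors the two-transistor series depth claimed for the proposed cell.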

### Zero/one Detector

The same design methodology can be applied to another important application, the zero/one detector. Notably, detecting all ones or all zeros on wide words requires large fan-in AND or OR gates. Constructing a tree of AND gates can overcome this problem, as shown in Fig. 5. Alternatively, another version of the design, as shown in Fig. 6, was proposed [5]. The zero/one detector is also employed in the parallel testing of memory, where the outputs of the arrays are compared against the expected data, as shown in Fig. 7.

FIGURE 4 Proposed equality comparator.

FIGURE 5 Prior zero/one comparator (a).

FIGURE 6 Prior zero/one comparator (b).

The proposed circuit of a 4-bit zero/one comparator is shown in Fig. 8. When the CLK is low, Node_1 is precharged to \( V_{dd} \). If \( Ref \), the reference data, is set high and \( D_0, D_1, D_2 \) and \( D_3 \) are all high, then \( N, N_0, N_1, N_2 \) and \( N_3 \) are on while \( P, P_0, P_1, P_2 \) and \( P_3 \) are all off. Thus, no current path exists during the evaluation period, and Node_1 is kept high. Similarly, if \( Ref \) is low and \( D_0, D_1, D_2 \) and \( D_3 \) are all low, then \( N, N_0, N_1, N_2 \) and \( N_3 \) are off and \( P, P_0, P_1, P_2 \) and \( P_3 \) are all on. Thus, no current path exists during the evaluation period either, and Node_1 is kept high. If any input differs from \( Ref \), some NMOS and PMOS transistors will be turned on simultaneously. A current path will then be formed between Node_1 and ground during the evaluation period, and Node_1 will be discharged to low. The truth table is tabulated in Table II.
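The same behaviour can be written as a short logic-level sketch. The function name `zero_one_node` is hypothetical, and the model deliberately describes only the outcome stated in the text (discharge whenever some input differs from Ref), not the transistor topology of Fig. 8.

```python
def zero_one_node(data_bits, ref):
    """Logic level of Node_1 after evaluation of the zero/one comparator.

    Node_1 stays at its precharged level (1) only if every data bit equals
    the reference bit; any mismatch discharges it (0).
    """
    if ref not in (0, 1):
        raise ValueError("ref must be 0 or 1")
    for d in data_bits:
        if d != ref:
            return 0  # an NMOS and a PMOS conduct together: discharge path
    return 1


print(zero_one_node([1, 1, 1, 1], ref=1))  # all ones, Ref high
print(zero_one_node([0, 0, 0, 0], ref=0))  # all zeros, Ref low
print(zero_one_node([0, 1, 0, 0], ref=0))  # one input differs from Ref
```

Setting `ref` selects whether the cell acts as an all-ones or an all-zeros detector, which is how a single circuit covers both cases.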

### Transistor Count and Speed Comparison

The total transistor count of the mentioned circuits is summarized in Table III.

Note that the tiny XOR with 6 transistors is used for the traditional comparators. The transistor count of the proposed comparators is much lower than that of the other comparators with the same functionality. Regarding the speed comparison, owing to the low input capacitance of the dynamic logic, the speed performance is better than that of other logic styles. The input capacitances of the different comparators are tabulated in Table IV. Notably, the input capacitance of the proposed circuit is the smallest. Besides, there are only two stages in the proposed circuits, which makes the total delay shorter. Thus, the speed performance of the proposed design is expected to be better than that of the previous designs.

Design of the 64-bit Comparator

Following the proposed design strategy, a hierarchical design of a fast 64-bit comparator is shown in Fig. 9. It is composed of eight 8-bit equality comparators and one 8-bit zero/one comparator. Each 8-bit equality comparator determines the equality of one of the eight corresponding bytes of the two 64-bit inputs and sends one output signal to the 8-bit zero/one comparator, whose Ref is set to “0”. In other words, the 64 bits are divided into eight bytes that are evaluated at the same time, and then the 8-bit zero/one comparator produces the final output signal. HSPICE is employed to optimize the speed. The lengths of all transistors are set to 0.6 μm, while their widths are listed in Table V.
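The two-level organization described above can be sketched as a behavioural model. All function names here are hypothetical. The polarity of the first-stage outputs is an assumption: the sketch supposes that each byte comparator delivers “0” when its bytes match, which is consistent with Ref being set to “0” in the second stage but is not stated explicitly in the text.

```python
def split_bytes(word, n_bytes=8):
    """Split an unsigned integer into n_bytes bytes, least significant first."""
    return [(word >> (8 * i)) & 0xFF for i in range(n_bytes)]


def byte_mismatch(a_byte, b_byte):
    """Assumed first-stage output: 0 if the two bytes are equal, 1 otherwise."""
    return 0 if a_byte == b_byte else 1


def all_equal_to_ref(flags, ref=0):
    """Second stage: behaves like a zero/one comparator on the eight flags."""
    return all(f == ref for f in flags)


def compare64(a, b):
    """True if the two 64-bit words are equal."""
    flags = [byte_mismatch(x, y) for x, y in zip(split_bytes(a), split_bytes(b))]
    return all_equal_to_ref(flags, ref=0)


print(compare64(0x1FF, 0x1FF))  # identical words
print(compare64(0x1FF, 0x1FE))  # words differing in a single bit
```

In hardware the eight byte comparisons evaluate concurrently; the list comprehension here only models their combined result, not their timing.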

SIMULATIONS AND CHIP LAYOUT

The entire 64-bit comparator simulated by HSPICE reveals a very short delay as tabulated in Table VI.

The clock rate can run up to 200 MHz with 0.01 ps rise/fall time. Figure 10 shows the waveform at a clock rate of 200 MHz. It corresponds to the worst-case scenario, in which the two 64-bit inputs differ in only one bit. The TimeMill simulation results indicate a delay of 2.5 ns without pads and 4.5 ns with pads.
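As a rough plausibility check on these figures, the delays can be compared with the time available for evaluation. The sketch below assumes a 50% duty cycle and that evaluation must finish within one clock half-period; neither assumption is stated in the text, and the helper names are hypothetical.

```python
def evaluation_window_ns(clock_mhz, duty=0.5):
    """Length of the evaluation phase in ns, assuming it lasts `duty` of a period."""
    period_ns = 1000.0 / clock_mhz
    return period_ns * duty


def fits(delay_ns, clock_mhz, duty=0.5):
    """True if the delay does not exceed the assumed evaluation window."""
    return delay_ns <= evaluation_window_ns(clock_mhz, duty)


# Delays quoted in the text: HSPICE clk-to-output, TimeMill without and with pads.
for label, delay in [("HSPICE", 2.126), ("TimeMill, no pads", 2.5), ("TimeMill, pads", 4.5)]:
    print(label, fits(delay, 200), fits(delay, 100))
```

Under these assumptions the 2.5 ns core delay just fills the 2.5 ns window at 200 MHz, whereas the 4.5 ns delay with pads only fits the 5 ns window at 100 MHz, which would be consistent with the 100 MHz figure quoted in the abstract.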

The design is carried out using UMC (United Microelectronics Company) 0.5 μm (2P2M) technology. The chip layout with pads, shown in Fig. 11, occupies 1.8 × 1.8 mm² while the core is only 145 × 240 μm². The data are input and output serially in byte-wide units. We also simulated several comparator designs using different logic styles. Note that adders/subtractors are also often used as comparators. The results are tabulated in Table VII.

### TABLE V

<table>
<thead>
<tr>
<th>Transistor</th>
<th>\( W \) in equality comparator</th>
<th>\( W \) in zero/one comparator</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( P_{\text{clk}} \)</td>
<td>10</td>
<td>15</td>
</tr>
<tr>
<td>\( N_{\text{clk}} \)</td>
<td>15</td>
<td>20</td>
</tr>
<tr>
<td>\( P_{\text{evaluation block}} \)</td>
<td>10</td>
<td>2.5</td>
</tr>
<tr>
<td>\( N_{\text{evaluation block}} \)</td>
<td>5</td>
<td>10</td>
</tr>
<tr>
<td>\( P_{\text{inverter}} \)</td>
<td>15</td>
<td>20</td>
</tr>
<tr>
<td>\( N_{\text{inverter}} \)</td>
<td>1</td>
<td>2</td>
</tr>
<tr>
<td>\( P_{\text{feedback}} \)</td>
<td>0.9</td>
<td>0.9</td>
</tr>
<tr>
<td>\( N_{\text{feedback}} \)</td>
<td>0.9</td>
<td>0.9</td>
</tr>
</tbody>
</table>

### TABLE VI

<table>
<thead>
<tr>
<th>I/O path</th>
<th>Delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>clk → output</td>
<td>2.126</td>
</tr>
<tr>
<td>input → output</td>
<td>2.120</td>
</tr>
</tbody>
</table>

The proposed design was approved by CIC (Chip Implementation Center) of NSC (National Science Council).

### TABLE VII The performance comparison of different designs

<table>
<thead>
<tr>
<th>Logic</th>
<th>Delay</th>
<th># Transistors</th>
</tr>
</thead>
<tbody>
<tr>
<td>64-b PLA-ANT CLA [6]</td>
<td>4.0 ns</td>
<td>8352</td>
</tr>
<tr>
<td>32-b EMODL adder [1]</td>
<td>2.7 ns</td>
<td>1537 (gates)</td>
</tr>
<tr>
<td>8-b TSPC adder (1 μm) [3]</td>
<td>7.5 ns</td>
<td>1832</td>
</tr>
<tr>
<td>All-N-logic [3]</td>
<td>Failed</td>
<td>2062</td>
</tr>
<tr>
<td>The proposed</td>
<td>2.50 ns</td>
<td>328</td>
</tr>
</tbody>
</table>
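The transistor count of the proposed design in Table VII can be cross-checked with a short calculation. The per-cell breakdown below is an assumption inferred from the device names in the text and in Table V (four evaluation transistors per bit in the equality cell, two per bit plus a Ref pair in the zero/one cell, and in each cell a clocked pair, an output inverter and a feedback pair); the function names are hypothetical.

```python
def equality_cell_transistors(n_bits):
    """Assumed device count of one n-bit equality comparator cell."""
    evaluation = 4 * n_bits  # two NMOS and two PMOS per bit pair
    clocked = 2              # precharge PMOS and ground-switch NMOS
    inverter = 2             # output inverter
    feedback = 2             # weak n and weak p feedback devices
    return evaluation + clocked + inverter + feedback


def zero_one_cell_transistors(n_bits):
    """Assumed device count of one n-bit zero/one comparator cell."""
    evaluation = 2 * n_bits  # one NMOS and one PMOS per data bit
    reference = 2            # N and P gated by Ref
    clocked = 2
    inverter = 2
    feedback = 2
    return evaluation + reference + clocked + inverter + feedback


total = 8 * equality_cell_transistors(8) + zero_one_cell_transistors(8)
print(total)  # 8 * 38 + 24 = 328
```

Under this assumed breakdown the total agrees with the 328 transistors listed for the proposed design.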

### FIGURE 10 Simulation waveform.

### FIGURE 11 Chip layout.

### FIGURE 12 Die photo.

### FIGURE 13 Simulation waveforms given randomly normal inputs.

### FIGURE 14 Simulation waveforms given the worst case of inputs (“1FF” and “1FE”).

CONCLUSION

Several dynamic CMOS comparators are proposed with a number of advantages. The transistor count is much lower than that of other similar designs. Although the fan-in is high, the number of series transistors is only two, which in turn reduces the pull-down delay. Compared with XOR-based equality comparators and deterministic comparators, the proposed design is much faster. The design methodology has been used to implement a fast 64-bit dynamic comparator.

Acknowledgements

This research was partially supported by National Science Council under grant NSC 88-2219-E-110-001 and 89-2215-E-110-014.

References


Authors’ Biographies

Chua-Chin Wang was born in Taiwan, in 1962. He received the BS degree in Electrical Engineering from National Taiwan University, Taiwan, in 1984 and the MS and PhD degrees in electrical engineering from State University of New York, Stony Brook, in 1988 and 1992, respectively. Currently he is a Professor in the Department of Electrical Engineering, National Sun Yat-Sen University, Taiwan. His research interests include low-power logic and circuit design, VLSI design, and neural networks implementations.

Ya-Hsin Hsueh was born in Taiwan, in 1976. She received the BS and MS degrees in Electrical Engineering from National Sun Yat-Sen University, Taiwan, in 1998 and 2000, respectively. She is currently working toward the PhD degree in Electrical Engineering at National Sun Yat-Sen University. Her current research interests are VLSI design and interfacing I/O circuits.

Hsin-Long Wu was born in Taiwan, in 1976. He received the BS and MS degrees in Electrical Engineering from National Sun Yat-Sen University, Taiwan, in 1998 and 2000, respectively. He is currently working in the Computer and Communication Lab of the Industrial Technology Research Institute. His current research interests are VLSI design and system integration.

Chi-Feng Wu was born in Kaohsiung, Taiwan, in 1961. He received his BS degree in electrical engineering from National Cheng Kung University, Tainan, Taiwan and the MS degree (1994) and the PhD degree (2000) in Electrical Engineering from National Sun Yat-Sen University, Taiwan. Since 1987, he has been working with Philips Semiconductor in Kaohsiung. Currently he is the Factory Manager of Wafer Testing Factory of Philips Semiconductor, Kaohsiung. His major research interests include design for testability, Iddq testing and VLSI design.