Statistical Estimation of the Switching Activity in VLSI Circuits

F. N. Najm and M. G. Xakellis

Higher levels of integration have led to a generation of integrated circuits for which 
power dissipation and reliability are major design concerns. In CMOS circuits, both of 
these problems are directly related to the extent of circuit switching activity. The average 
number of transitions per second at a circuit node is a measure of switching activity that 
has been called the transition density. This paper presents a statistical simulation 
technique to estimate individual node transition densities in combinational logic 
circuits. The strength of this approach is that the desired accuracy and confidence can be 
specified up-front by the user. Another key feature is the classification of nodes into two 
categories: regular- and low-density nodes. Regular-density nodes are certified with 
user-specified percentage error and confidence levels. Low-density nodes are certified 
with an absolute error, with the same confidence. This speeds convergence while 
sacrificing percentage accuracy only on nodes which contribute little to power 
dissipation and have few reliability problems.


INTRODUCTION
The advent of VLSI technology has brought new challenges to the manufacture of integrated circuits. Higher levels of integration and shrinking line widths have led to a generation of devices which are more sensitive to power dissipation and reliability problems than typical devices of a few years ago. In these circuits excessive power dissipation may cause run-time errors and device destruction due to overheating, while reliability issues may shorten device lifetime. It is especially useful to diagnose and correct these problems before circuits are fabricated. In CMOS circuits, gates draw current and consume power only when making logical transitions. As a result, power dissipation and reliability strongly depend on the extent of circuit switching activity. Hence, there is a need for CAD tools that can estimate circuit switching activity during the design phase.
Circuit activity is strongly dependent on the inputs being applied to the circuit. For one input set the circuit may experience no transitions, while for another it may switch frequently. During the first input set the circuit dissipates little power and experiences little wear, but for the second its activity could cause device failure. However, the specific input pattern sets cannot be predicted upfront. Furthermore, it is impractical to simulate a circuit for all possible inputs. Thus, this input pattern dependence severely complicates the estimation of circuit activity.
Recently, some approaches have been proposed to overcome this problem by using probabilities to represent typical behavior at the circuit inputs. In [1], the average number of transitions per second at a circuit node is proposed as a measure of switching activity, called the transition density. An algorithm was also proposed to propagate specified input transition densities into the circuit to compute the densities at all the nodes. The algorithm is very efficient, but it neglects the correlation between signals at internal nodes. This leads to errors in the individual node densities that may not always be acceptable, especially since the desired accuracy cannot be specified up-front.
This correlation problem was avoided in [2], where the total average power of the circuit (a weighted sum of the node transition densities) was statistically estimated by simulating the circuit for randomly generated input patterns. The power value is updated iteratively until it converges to the true power with a user-specified accuracy (percentage error tolerance), and a user-specified confidence level. It was found that convergence is very fast because the distribution of the overall circuit power was very nearly Gaussian and very narrow about its mean.
While power estimation is one important reason to find the transition densities in a circuit, it is not the only one. If we assume that the power bus carries a constant voltage Vdd, then a single logic gate draws an average current [1] of:

I_x = (1/2) Vdd C_x D(x)   (1)

and dissipates an average power of:

P_x = (1/2) Vdd² C_x D(x)   (2)

where C_x is the total capacitance, and D(x) the transition density, at the gate output node x. Thus the individual node transition densities can be used with (2) to find the individual gate power values, which are helpful in order to avoid hot spots and to ensure that the power dissipation is relatively uniform across the chip. Furthermore, the individual node densities can be used with (1) to estimate the average current in the power and ground busses, for use in electromigration analysis. However, it becomes extremely inefficient to use the statistical sampling technique in [2] to estimate the transition densities at every gate output. This is because a large number of input patterns would be required to converge for nodes that switch very infrequently, as we will demonstrate in section 2.
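As a small illustration of (1) and (2), the following sketch computes per-gate average current and power from a transition density. The function names and example values are ours, not the paper's:

```python
def avg_current(vdd, cx, dx):
    """Average supply current I_x = (1/2) * Vdd * C_x * D(x), per (1)."""
    return 0.5 * vdd * cx * dx

def avg_power(vdd, cx, dx):
    """Average power P_x = (1/2) * Vdd^2 * C_x * D(x), per (2)."""
    return 0.5 * vdd * vdd * cx * dx

# Example: 5 V supply, 100 fF output load, 1e7 transitions/sec
p = avg_power(5.0, 100e-15, 1e7)    # -> 1.25e-5 W, i.e. 12.5 uW
i = avg_current(5.0, 100e-15, 1e7)  # -> 2.5e-6 A
```

Summing avg_power over all gate outputs recovers the total-power figure estimated in [2]; keeping the per-node values is what enables hot-spot and electromigration analysis.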
In this paper, we will present an extension of the approach in [2] whereby we remove the above limitation and efficiently estimate the transition density at all circuit nodes. To overcome the slow convergence problem, we apply absolute error bounds, instead of percentage error bounds, to nodes with low transition density values. This is done by establishing a threshold, η_min, to classify node transition density values. Any node with a transition density value less than the threshold is classified as a low-density node and is certified with an absolute error. Nodes with transition density values equal to or above the threshold are classified as regular-density nodes and are certified with a percentage error. A major advantage of this approach is that the desired accuracy can be specified up-front by the user. Furthermore, the percentage error bound is relaxed (i.e., replaced by an absolute error bound) only on low-density nodes. These nodes dissipate little power and have few reliability problems. As with other previous work in this area, our technique is presently restricted to combinational circuits (we are in the process of extending it to include sequential circuits).
The statistical simulation techniques to be presented are implemented in a prototype called "Mean Estimator of Density" (MED). MED's performance is evaluated by looking at the accuracy of its results, its convergence rate, and its execution time. Preliminary results of this work have appeared in [3]. This paper is organized as follows. In the next section, the statistical estimation technique is described. Section 3 presents experimental data and evaluates MED's performance, while section 4 presents a summary. Finally, two appendices are presented that contain some required theoretical results.

PROPOSED SOLUTION
This section presents our statistical estimation technique for computing the transition densities at all circuit nodes. It is expected that the user will supply the transition density, denoted D(x), for every circuit input node. The user should also specify the fraction of time that a circuit input signal is high, called the probability at that node, and denoted by P(x). If unspecified, these probabilities are assigned a default value of 1/2. This technique, as well as the other techniques reviewed in the introduction, applies only to combinational circuits. It can be applied to the combinational part of a sequential circuit provided that the transition densities at the flip-flop outputs (which are inputs to the combinational part) are specified. Given the input transition densities and probabilities, we can use a random number generator to generate corresponding logic input waveforms with which to drive a simulator. Based on such a simulation of the circuit for a given time period T, we can count the number of transitions at every node; this number will be called a sample taken at that node. If we repeat this process N times and form the average η̄ of the number of transitions at a node, the so-called sample mean, then η̄/T is an estimate of the transition density at that node. It is well known from statistics [4] that for large values of N, the sample mean η̄ will approach η, the true average number of transitions in T. Likewise, the sample standard deviation s will approach σ, where σ² is the variance of the number of transitions in T. One continues to take samples (make simulation runs) until η̄ is close enough to η. The method by which one tests for this is called the stopping criterion, discussed next; sub-section B then details the mechanism of input waveform generation.
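The sampling scheme just described can be sketched as follows. This is a hypothetical interface, not MED's actual code: simulate_period() stands in for one randomized logic simulation of length T returning the transition count observed at one node, and should_stop(mean, s, n) stands in for a stopping criterion such as the one derived in sub-section A:

```python
import random
import statistics

def estimate_density(simulate_period, T, should_stop, n_min=30, max_runs=100000):
    """Estimate one node's transition density by repeated random simulation."""
    samples = []
    for _ in range(max_runs):
        samples.append(simulate_period())
        n = len(samples)
        if n >= n_min:  # require ~30 samples so the sample mean is near-normal
            mean = statistics.fmean(samples)
            s = statistics.stdev(samples)
            if should_stop(mean, s, n):
                break
    # the sample mean of transition counts over a period T estimates D(x)
    return statistics.fmean(samples) / T
```

In MED one such running mean is kept per node and the loop continues until the last node's criterion is satisfied; the single-node form above is only meant to show the structure of the estimator.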

A. Stopping Criterion
According to the Central Limit Theorem [4], η̄ is a value of a random variable with mean η whose distribution approaches the normal distribution for large N. The minimum number of samples, N, to satisfy near-normality is typically 30. It is also known that for such values of N one may use s as an estimate of σ.
Since the distribution of sample means is near-normal, we can make inferences about the quality of an individual sample mean. A standard confidence-interval result [4] can be put in a form more applicable to our problem, so that with confidence (1−α)×100% we have:

|η̄ − η| ≤ z_{α/2} s/√N   (5)

If ε₁ is a small positive number, and if

N ≥ (z_{α/2} s / (ε₁ η̄))²   (6)

samples are taken, then ε₁ places an upper bound on the percentage error of the sample mean with (1−α)×100% confidence:

|η̄ − η| / η̄ ≤ ε₁   (7)

This may also be expressed as the percent deviation from the population mean η:

|η̄ − η| / η ≤ ε₁/(1−ε₁) ≜ ε   (8)

where ε is defined to be a user-specified error tolerance. Thus (6) provides a stopping criterion that yields the accuracy specified in (8) with confidence (1−α)×100%.
It should be clear from (6) that as η̄ approaches 0, the required number of samples N grows without bound, so that convergence is very slow for low-density nodes. To overcome this, whenever η̄ < η_min we substitute the threshold η_min for η̄ in (6), giving the alternative stopping criterion:

N ≥ (z_{α/2} s / (ε₁ η_min))²   (9)

Thus ε η_min becomes an absolute error bound that characterizes the accuracy for low-density nodes.
We therefore classify the circuit nodes into regular-density nodes and low-density nodes.
During the algorithm (after N exceeds 30), (6) is used as a stopping criterion as long as η̄ ≥ η_min; otherwise (9) is used instead. The value of η_min can be specified by the user and strongly affects the speed of the algorithm, as will be shown in section 3.
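The two-regime criterion can be captured in a few lines. This is a sketch with our own symbol names, combining (6) and (9):

```python
def required_samples(sample_mean, s, eps1, z_half_alpha, eta_min):
    """Minimum sample count per stopping criteria (6) and (9).
    Regular-density nodes (sample_mean >= eta_min) get the percentage-error
    bound of (6); low-density nodes fall back to the absolute bound of (9)
    by substituting eta_min for the sample mean."""
    denom = sample_mean if sample_mean >= eta_min else eta_min
    return (z_half_alpha * s / (eps1 * denom)) ** 2
```

For example, with z_{α/2} = 1.96 (95% confidence), ε₁ = 0.05, and s = 0.3, a node with sample mean 0.5 needs about 553 samples, whereas applying (6) directly to a node with sample mean 0.005 would demand over 5 million; with η_min = 0.1, every low-density node instead needs the same ≈13,830 samples.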
Let η̄_r and η_r be the measured and true density values for a regular-density node, and let η̄_l and η_l be the corresponding values for a low-density node. Since η̄_r ≥ η_min, at convergence and for small ε and ε₁ we have:

|η̄_l − η_l| ≤ ε η_min   (10)

|η̄_r − η_r| ≤ ε η_r,  with η_r ≥ η_min/(1+ε)   (11)

so that the absolute error values for low-density nodes should be less than the absolute errors for regular-density nodes. Although low-density nodes require the longest time to converge, they have the least effect on circuit power and reliability. Therefore the above strategy reduces the execution time, with little or no penalty.

B. Input Generation
Our implementation of this technique has two modes, synchronous and asynchronous, as shown in the block diagram in Figure 1. In the synchronous mode, we assume that the (combinational) circuit is part of a larger synchronous sequential circuit design, so that its input events should be generated in synchrony. Otherwise, asynchronous operation is assumed and events do not have to be synchronized. Thus the only difference between synchronous and asynchronous operation is in the generation of the input transitions.

B.1. Synchronous Mode
In the synchronous mode, an input node may transition only at the beginning of a clock cycle, so that the input pulse widths are discrete multiples of the clock period, Tc. The distribution of the high (and low) pulses at the inputs is arbitrary, and can be user-specified. The choice of distribution is not very important because, as observed in [2], the power is relatively insensitive to the particular distribution; rather, it depends mainly on the input transition densities. Our implementation assumes that the distribution is geometric [4]. This arises from a simple sufficient condition that an input signal be Markov [5], i.e., that its value after a clock edge depends only on its value before the clock edge, once that value is specified, and not on its values during earlier clock cycles. Under this assumption, we show in appendix B that the pulse widths have a geometric distribution. If μ0 and μ1 are the mean low and high pulse widths, computed as shown in appendix A from:

μ0 = 2(1 − P(x)) / D(x)   (12)

μ1 = 2 P(x) / D(x)   (13)

then it is also shown in appendix B that the probability that a low signal will transition high on the clock is:

p01 = Tc / μ0   (14)

and the probability of a high signal transitioning low on the clock is:

p10 = Tc / μ1   (15)

A random number generator uses (14) and (15) to generate input transitions for every clock cycle.
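A minimal sketch of synchronous input generation, assuming D(x) is given in transitions per second and Tc in seconds; the function name and interface are ours, not MED's:

```python
import random

def sync_waveform(P, D, Tc, n_cycles, rng=random.Random(1)):
    """Generate one input's value per clock cycle from P(x) and D(x),
    using the per-clock transition probabilities of (12)-(15)."""
    mu0 = 2.0 * (1.0 - P) / D   # mean low pulse width, per (12)
    mu1 = 2.0 * P / D           # mean high pulse width, per (13)
    p01, p10 = Tc / mu0, Tc / mu1
    value, wave = 0, []
    for _ in range(n_cycles):
        # Markov update at each clock edge: flip with probability p01 or p10
        if rng.random() < (p10 if value else p01):
            value = 1 - value
        wave.append(value)
    return wave
```

Because the per-cycle flip decision depends only on the current value, the low and high pulse widths come out geometrically distributed, consistent with the Markov assumption above.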

B.2. Asynchronous Mode
For circuits running asynchronously, input transition generation proceeds differently. Since input transitions may occur at any time, the input generation routine determines the length of time between transitions instead of the probability of transitioning at the clock edge. Again, the distribution of the pulse widths is arbitrary, and can be specified by the user. Our implementation is based on a Markov assumption, so that the length of time between successive transitions is a random variable with an exponential distribution [5]. The length of time a signal stays in the low (high) state has mean μ0 (μ1). From this information, the waveform is easily generated using an exponential random number generator. Additionally, when running asynchronously the simulator requires a setup period. This is a waiting period during which no samples are collected. It is needed for the same reasons that a setup period was required in [2]. Briefly, it allows the circuit to "get up to speed": before sampling begins, transitions at the inputs must be allowed to propagate into the internal nodes of the circuit. Until all levels of the circuit are involved, switching activity is artificially low and any power or reliability estimates will be skewed. The length of the setup period should be, as was also shown in [2], no less than the maximum delay of the circuit.
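The corresponding asynchronous generator draws exponential holding times directly. Again the interface is an assumption of ours, meant only to show the mechanism:

```python
import random

def async_waveform(P, D, total_time, rng=random.Random(7)):
    """Generate (transition time, new value) events for one input.
    State holding times are exponential with means mu0 = 2(1-P)/D (low)
    and mu1 = 2P/D (high)."""
    mu0 = 2.0 * (1.0 - P) / D
    mu1 = 2.0 * P / D
    t, value, events = 0.0, 0, []
    while t < total_time:
        # expovariate takes a rate, i.e. 1/mean, and returns a holding time
        t += rng.expovariate(1.0 / (mu1 if value else mu0))
        value = 1 - value
        events.append((t, value))
    return events
```

Over a long run the event count divided by total_time approaches D(x), and the fraction of time spent high approaches P(x), matching the specified input statistics.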

EXPERIMENTAL RESULTS
This technique has been implemented in the program MED (Mean Estimator of Density), in which the basic simulation capability is event-driven, gate level, with a scalable delay timing model (based on output capacitance and fanout). In general, any simulation strategy can be used, so that the technique presented can be wrapped around any existing simulator and simulation library. In this section we present data collected with MED, and show that it is both accurate and practical on a number of large benchmark circuits.

A. Input Specification
The experimental results to be presented are based on a specification of the typical circuit inputs as follows.
In the synchronous mode, we assumed that the circuit would be operated near its maximum operating frequency, so that the clock cycle time, Tc, is close to the maximum circuit delay, Tmax.
Unless otherwise specified, the results presented were based on a value of Tc slightly longer than Tmax.
Secondly, the transition density values were normalized to the clock period, i.e., the transition densities used by the program are expressed in terms of transitions per clock cycle, rather than transitions per second. The output densities are then invariant to clock cycle time, and the user has a more intuitive view of circuit activity: 0.5 transitions per clock cycle is more informative than 5×10⁷ transitions per second.
Finally, it was specified that every input node has a probability of 1/2 and a transition density of 1/2. Thus, on average, each input node was assumed to spend an equal time high and low, and to have one transition every other clock cycle.
Asynchronous input probability and density assumptions are similar to the synchronous assumptions. In this case, the transition densities are normalized by Tmax, and inputs are assumed to have probabilities of 1/2 and transition densities of 1/2.

B. Data Collection
The issues to be investigated are (1) the error of the technique, (2) the handling of low-density nodes, and (3) the practicality of the technique for large circuits. The data collected should allow MED's performance to be evaluated in the above three categories.

B.1. Density Values
The first step in evaluating MED's performance is to establish a set of accurate node transition densities. This baseline would then be used to calculate the actual error of the estimated transition density values. This was done by running MED for a long time on the benchmark circuits presented at ISCAS in 1985 [6]. Typically, in order to achieve 99.99% confidence and 1% error tolerance for all the nodes, this required millions of input vectors and hours or days of CPU time. Table I lists the circuits, number of gates, and number of samples required for each circuit and mode of operation.

B.2. Calculating Error Distributions
To verify that MED produces results within the specified error tolerances, 10 runs with η_min varying linearly from 0.05 to 0.50 were executed with 95% confidence (α = 0.05) and 5% error tolerance for each circuit. Tables II and III give the percentage of transition density values out of bounds for all the circuits under investigation. From the tables it can be seen that this percentage is very low, well below the specified 5%. This happens because many of the nodes are oversampled, since the simulator will run until the last node converges. This yields more accuracy on some nodes than what is actually specified by the user.

B.3. Comparison of η_min and Execution Time
It is expected that, since the simulator runs until its last node converges, and since low-density nodes require the longest time to converge, adjusting η_min will significantly affect overall simulation time while sacrificing percentage accuracy on only a small number of nodes.
Ten simulations were run with η_min varying linearly from 0.05 to 0.50. SUN Sparc-ELC execution times in CPU seconds are tabulated and reported in Table IV. Low-density nodes typically require the largest number of samples to converge, and as a result execution time drops dramatically as η_min rises. In some cases, however, the lowest-density nodes are not the last to converge, and adjustment of η_min has no effect on execution time.

The simulation times for all circuits except c6288 follow a general downward trend, as shown in Figure 2. The curves result from averaging circuit execution times (excluding c6288) normalized by the time required for the circuit to simulate with η_min = 0.05.

The behavior of circuit c6288 is an exception to this trend. Its execution times are essentially invariant to η_min for 0 < η_min < 0.5. This occurs because c6288 has regular-density nodes with considerable variation, and at least one regular-density node with density above 0.5 converges after all the low-density nodes. Because of this, the last nodes to converge are not affected by η_min.

B.4. Execution Times on Large Circuits

For the technique to gain wide acceptability, it must have reasonable execution times on larger circuits. The circuits used in this section are the largest ones presented at ISCAS in 1989 [7].
Circuits were first simulated with a high η_min. This provided a rough estimate of each circuit's transition density distribution. The simulation was then rerun with η_min chosen to classify under 20% of the nodes as low-density nodes while providing reasonable execution times. The number of gates, execution times, and percentage of low-density nodes are shown for each circuit in Table V. Considering the high accuracy level (5% error at 95% confidence), the execution times are reasonable, especially for the more common class of synchronous circuits, and indicate that this approach is applicable to large circuits.

SUMMARY
We have presented a statistical estimation technique, implemented in the program MED, which estimates individual node transition densities with user-specified accuracy and confidence. Data were gathered to verify that both regular- and low-density node transition density values are within the stated error bounds. Trials were run with 95% confidence and 5% error tolerance. It was found that well over 95% of regular-density node transition density values have less than 5% error. This occurs because many of the nodes converge quickly and are subsequently oversampled. Low-density nodes also performed well: well over 95% of low-density node transition density values have less than the specified absolute error.
Data were also gathered to investigate the variation of execution time with η_min. In most cases, it was found that the execution time for circuits falls dramatically as η_min rises. This occurs because the lowest-density nodes typically converge last.

DEFINITION 1
The signal probability of x(k), to be denoted P(x), is defined as:

P(x) ≜ lim_{K→∞} (1/(2K+1)) Σ_{k=−K}^{+K} x(k)   (A.1)

It can be shown that the limit in (A.1) always exists. If x(k) ≠ x(k−1), we say that the signal undergoes a transition at time k. Corresponding to every logic signal x(k), one can construct another logic signal t_x(k) so that t_x(k) = 1 if x(k) undergoes a transition at k, and t_x(k) = 0 otherwise. Let n_x(K) be the number of transitions of x(k) in the interval (−K, +K]. The time between two consecutive transitions of x(k) will be referred to as an intertransition time: if x(k) has a transition at i and the next transition is at i + n, then there is an intertransition time of length n between the two transitions. Let μ1 (μ0) be the average of the high (low) intertransition times of x(k), i.e., those corresponding to x(k) = 1 (0). Let z ∈ Z be a random variable with the cumulative distribution function F_z(k) = 1/2 for any finite k, and with F_z(−∞) = 0 and F_z(+∞) = 1. One might say that z is uniformly distributed over the whole integer set Z. We use z to construct from x(k) a stochastic 0-1 process x̃(k), called its companion process, defined as follows.
DEFINITION 3  Given a logic signal x(k) and a random variable z, uniformly distributed over Z, define a 0-1 stochastic process x̃(k), called the companion process of x(k), given by:

x̃(k) ≜ x(k + z)

For any given k = k₁, x̃(k₁) is the random variable x(k₁ + z), a function of the random variable z. Intuitively, x̃(k) is a family of shifted copies of x(k), each shifted by a value of the random variable z. Thus, not only is x(k) a sample of x̃(k), but one can also relate statistics of the process x̃(k) to properties of the logic signal x(k), as follows.