A NEW STATISTICAL METHOD FOR MAXIMUM POWER ESTIMATION IN CMOS VLSI CIRCUITS



INTRODUCTION
Excessive instantaneous power dissipation in CMOS VLSI circuits is a potential source of many reliability and performance problems, such as physical failures due to overheating or electromigration, functional failures caused by voltage drop (or IR-drop) on power rails, reduced noise margins and switching speed degradation. The issues regarding reliability are becoming even more important as we move into deep-submicron ICs, which exhibit decreased feature sizes and power supply levels. Consequently, there is an increasing need for tools and methods that perform fast and accurate estimation of the circuit's maximum power requirements during the design phase.
*Corresponding author, e-mail: abari@cs.ntua.gr
Previous work in this field has focused on the creation and exploitation of deterministic or stochastic information (such as switching activity and signal correlations) about the circuit's inputs, without explicitly simulating the circuit. While most related techniques exhibit good timing figures, they make some simplifying assumptions that often result in loss of accuracy. In addition, their speed and algorithmic efficiency generally degrade with large or complex circuit structures. Statistical methods might offer a promising alternative, but they have not yet seen widespread application, due to the inadequacy of existing statistical models to deal with extreme situations such as those in maximum power analysis. Even though a step in the right direction was made in [1], the reported method overlooked some important facts about extremes.
This paper presents a new statistical method for maximum power estimation which is based on extreme value theory, and more precisely on the properties of extreme order statistics of the probability distribution of the instantaneous power drawn from the power supply bus. The proposed method has proved extremely fast, as only a small subset of input patterns is required to obtain a reasonably accurate estimate of maximum power. Furthermore, the number of units of this subset is constant and independent of the circuit size or complexity, whereas most competing techniques impose a computational load that grows exponentially with circuit size and may prove prohibitively expensive to apply to large circuits. On the other hand, there are no over-simplifying assumptions to hinder the necessary accuracy, since the statistical inference is performed on the outcome of highly accurate simulation of the circuit under typical input vectors.

2. THEORETICAL BACKGROUND

2.1. Order Statistics and Extreme Order Statistics
DEFINITION 2.1 Let $X_1, X_2, \ldots, X_n$ be a sample of size n from a given population with cumulative distribution function (cdf) F(x). The units $X_r$ are drawn at random, so that they form independent and identically distributed (iid) random variables with cdf equal to F(x). If they are subsequently rearranged in increasing order of magnitude, $X_{1:n} < X_{2:n} < \cdots < X_{n:n}$, then the r-th term $X_{r:n}$ of this new sequence is called the r-th order statistic of the sample. In particular, the terms $X_{1:n} = \min(X_1, X_2, \ldots, X_n)$ and $X_{n:n} = \max(X_1, X_2, \ldots, X_n)$ are called extreme order statistics (minima and maxima order statistics respectively) or simply extremes.
THEOREM 2.1 [2] The cdf $F_{X_{r:n}}(x)$ of the r-th order statistic of a sample drawn from a population with cdf F(x) is given by:
$$F_{X_{r:n}}(x) = r\binom{n}{r}\int_0^{F(x)} u^{r-1}(1-u)^{n-r}\,du = I_{F(x)}(r, n-r+1) \qquad (2.1)$$
where $I_p(a, b)$ is the incomplete beta function. By setting r = n we obtain the cdf of the maxima:
$$F_{X_{n:n}}(x) = F^n(x) \qquad (2.2)$$
The above expressions require the knowledge of the parent cdf.
However, in the frequently encountered case where the probability distribution of a physical quantity is unknown, they prove quite insufficient. Therefore we have to turn to the asymptotic distributions of extremes (if any) that emerge in the limit $n \to \infty$ of the sample size.
for given normalizing sequences $\{a_n\}$ and $\{b_n > 0\}$.
Then, cdf F(x) lies in the domain of attraction for maxima of'.
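Expressions (2.1) and (2.2) of Theorem 2.1 can be checked numerically. The sketch below is only an illustration: it evaluates the incomplete beta function through its equivalent binomial-sum form $\sum_{k=r}^{n}\binom{n}{k}F^k(1-F)^{n-k}$, and the function name `order_stat_cdf` and the chosen value of F(x) are assumptions that do not come from the paper.

```python
from math import comb

def order_stat_cdf(F_x: float, r: int, n: int) -> float:
    """cdf of the r-th order statistic at a point where the parent cdf is F_x.

    Binomial-sum form of I_{F(x)}(r, n - r + 1): the probability that at
    least r of the n iid units fall at or below x.
    """
    return sum(comb(n, k) * F_x**k * (1.0 - F_x)**(n - k) for k in range(r, n + 1))

# Illustrative values (assumed): sample size n = 100 and a point with F(x) = 0.97.
n = 100
F_x = 0.97
max_cdf = order_stat_cdf(F_x, n, n)   # r = n: cdf of the maximum
min_cdf = order_stat_cdf(F_x, 1, n)   # r = 1: cdf of the minimum

print(max_cdf, F_x**n)                # agrees with (2.2): F^n(x)
print(min_cdf, 1.0 - (1.0 - F_x)**n)  # agrees with 1 - (1 - F(x))^n
```

Note that the computation needs the parent value F(x) as an input, which is exactly the limitation that motivates the asymptotic treatment.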

2.3. Properties of Asymptotic Distributions
This section contains some important facts about the three aforementioned families of asymptotic distributions. Theorem 2.3 implies that a parent distribution with a finite upper end point ($w(F) < \infty$) cannot lie in a Frechet-type domain of attraction, whereas a distribution with an infinite upper end point ($w(F) = +\infty$) cannot belong to a Weibull-type domain of attraction. However, these observations are not enough to guarantee a Frechet or Weibull-type domain of attraction for a particular distribution. In fact the Gumbel limit distribution, even though it is not itself bounded, asymptotically models many parent distributions with either finite or infinite end points. The upper end point of the Weibull family (2.8) is equal to the location parameter $\mu$, while both Frechet and Gumbel families have infinite upper end points.
The domain of attraction of a cdf F(x) is determined by the rate with which its right tail approaches the value of one as $x \to w(F)$. Depending on this rate we can distinguish between three families of distributions [4], namely the Cauchy-Pareto family, the Beta family (a special case of which is the well-known uniform distribution) and the Exponential family (which includes most of the common statistical distributions such as the Normal and Gamma ones). The Cauchy-Pareto family is characterized by an infinite upper end point and a very slow rate of convergence to one at its tail, and is therefore asymptotically modeled by a Frechet limit distribution. The tail of a distribution of the Exponential family converges to one at least as fast as that of the exponential distribution with cdf $F(x) = 1 - \exp(-(x - a_2)/a)$. Both unbounded and bounded cdfs belong to this family and are all modeled by a Gumbel limit distribution at their extremes. Finally, members of the Beta family of distributions, which asymptotically approach the Weibull limit distribution, exhibit the fastest (approximately linear) convergence towards one and are all bounded by finite upper limits.
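The convergence of maxima to a limit distribution can be illustrated for one member of the Exponential family and one member of the Beta family. The sketch below is a minimal illustration under stated assumptions: the unit exponential and the uniform parents, the textbook normalizing sequences ($a_n = \log n$, $b_n = 1$ and $a_n = 1$, $b_n = 1/n$ respectively) and all function names are chosen here for illustration and are not taken from the paper.

```python
import math

def normalized_max_cdf_exponential(x: float, n: int) -> float:
    """F^n(a_n + b_n x) for the unit exponential parent F(t) = 1 - exp(-t),
    with the assumed normalizing sequences a_n = log n, b_n = 1."""
    t = math.log(n) + x
    return (1.0 - math.exp(-t)) ** n if t > 0.0 else 0.0

def normalized_max_cdf_uniform(x: float, n: int) -> float:
    """F^n(a_n + b_n x) for the uniform parent on [0, 1] (Beta family),
    with a_n = 1 (the upper end point) and b_n = 1/n; meaningful for x <= 0."""
    t = min(max(1.0 + x / n, 0.0), 1.0)
    return t ** n

def gumbel_limit(x: float) -> float:
    """Standard Gumbel cdf for maxima."""
    return math.exp(-math.exp(-x))

def weibull_limit_shape1(x: float) -> float:
    """Weibull cdf for maxima with unit scale and shape, upper end point 0."""
    return math.exp(x) if x < 0.0 else 1.0

# The distance to the limit shrinks as the sample size n grows.
for n in (10, 100, 1000):
    err_exp = abs(normalized_max_cdf_exponential(0.5, n) - gumbel_limit(0.5))
    err_uni = abs(normalized_max_cdf_uniform(-1.0, n) - weibull_limit_shape1(-1.0))
    print(n, err_exp, err_uni)
```

Both errors decrease with n, in line with the statement that the Exponential family is attracted to the Gumbel limit and the Beta family to the Weibull limit.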
2.4. Tail Equivalence
DEFINITION 2.3 Two cdfs F(x) and G(x) are referred to as right tail equivalent if and only if w(F) = w(G) and
$$\lim_{x \to w(F)} \frac{1 - F(x)}{1 - G(x)} = 1$$
THEOREM 2.4 [5] Two right tail equivalent cdfs F(x) and G(x) belong to the same domain of attraction for maxima and have the same normalizing sequences $a_n$ and $b_n$, i.e.,
$$\lim_{n \to \infty} F^n(a_n + b_n x) = \lim_{n \to \infty} G^n(a_n + b_n x) = H(x)$$
The practical implication of the above theorem when dealing with extreme value problems is that a common cdf F(x) may be fitted to the right tail of an unknown cdf G(x) which is observed to follow the same limit distribution as F(x). This will be the basis of the maximum power estimation method proposed in this paper.

2.5. Probability Plots
The basic concept of a probability plot of a given parametric family $F(x; \theta)$ of distributions with vector parameter $\theta$ is to modify the scales of the random variable X and probability P axes in such a way that the plot against X of any cdf belonging to that family becomes a straight line [6]. This allows one to decide whether or not a set of ordered data follows a probability distribution drawn from that particular family.
More specifically, if $F(x; \theta)$ can be written as $y = F(x; \theta) = h^{-1}(a\,g(x) + b)$, then the transformation $\xi = g(x)$, $\eta = h(y)$ changes $F(x; \theta)$ into a family of straight lines $\eta = a\xi + b$.
Such a transformation can be applied to the parametric families (2.7)-(2.9) by taking logarithms twice:
$$-\log(-\log(y)) = \beta\log(x - \mu) - \beta\log\sigma \qquad (2.10)$$
$$-\log(-\log(y)) = -\beta\log(\mu - x) + \beta\log\sigma \qquad (2.11)$$
$$-\log(-\log(y)) = \frac{x}{\sigma} - \frac{\mu}{\sigma} \qquad (2.12)$$
The above equations provide the required x-axis transformations $g(x) = \log(x - \mu)$, $g(x) = -\log(\mu - x)$ and $g(x) = x$ for Frechet, Weibull and Gumbel probability plots respectively, as well as the y-axis transformation $\eta = h(y) = -\log(-\log(y))$, which is identical in all three cases. We notice that the x-axis of the Gumbel plot undergoes no change, whereas the other two plots turn to logarithmic abscissa scales. Moreover, Weibull and Frechet plots require knowledge of the location parameter $\mu$ for the computation of the transformed x-axis scale.
The Gumbel probability plot is the standard starting point when the domain of attraction of the data set is unknown. The primary reason for this is that the Gumbel family of distributions occupies a central position between the Weibull and Frechet families, and the shape of the data on a Gumbel plot allows us to determine the type of limit distribution (further details on this subject are given in the next section). Another reason is that the location parameter $\mu$ of the Weibull and Frechet cdfs is not known in advance. In that case we should first lay the data on a Gumbel plot and, after having decided in favor of a Weibull or Frechet cdf, proceed by trial and error with $\mu$ on the corresponding plot until a straight line is obtained.
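The axis transformations above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the empirical plotting position $p_i = (i - 0.5)/m$ is an assumption (the text does not state which one is used), and the function names are hypothetical.

```python
import math

def reduced_variate(p: float) -> float:
    """y-axis transformation eta = h(y) = -log(-log(y)), common to all three plots."""
    return -math.log(-math.log(p))

def plotting_positions(m: int):
    """Assumed empirical probabilities p_i = (i - 0.5)/m for the ordered data."""
    return [(i - 0.5) / m for i in range(1, m + 1)]

def probability_plot(sample, kind="gumbel", mu=None):
    """Return the (xi, eta) points of a Gumbel, Frechet or Weibull probability plot.

    The location parameter mu is required for the Frechet plot (mu below all
    data) and for the Weibull plot (mu above all data).
    """
    xs = sorted(sample)
    if kind == "gumbel":
        g = lambda x: x                      # abscissa unchanged
    elif kind == "frechet":
        g = lambda x: math.log(x - mu)       # g(x) = log(x - mu)
    elif kind == "weibull":
        g = lambda x: -math.log(mu - x)      # g(x) = -log(mu - x)
    else:
        raise ValueError(f"unknown plot type: {kind}")
    return [(g(x), reduced_variate(p)) for x, p in zip(xs, plotting_positions(len(xs)))]

# Small illustrative data set (made-up values).
for xi, eta in probability_plot([3.0, 1.0, 2.0]):
    print(xi, eta)
```

Data drawn from the plotted family fall close to a straight line, which is what the subsequent curvature test and parameter estimation rely on.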

2.6. Selection of Limit Distribution
In this section we show that the domain of attraction of a set of data can be inferred directly from the corresponding Gumbel probability plot, and specifically from the curvature it exhibits.
The curvature of the plot can be quantified as the quotient of the slopes of two straight lines fitted on two successive intervals of the particular plot segment we are interested in [2]. More precisely, we compute the quantity $S = S_{n_1,n_2}/S_{n_3,n_4}$, where $S_{n_i,n_j}$ is the slope of the least-squares straight line fit on the order statistics with index k between $n_i$ and $n_j$. If the value of S, calculated on a Gumbel probability plot, is well above 1 or well below 1, we may decide in favor of a Weibull or Frechet-type domain of attraction respectively. Significance levels of this test are also provided in the literature [2], obtained not by analytical calculation but from extensive Monte Carlo simulations (over 5000 units) applied to a Gumbel parent.
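A possible computation of the curvature statistic is sketched below. Two choices are assumptions of this sketch rather than statements of the paper: the plotting position $p_i = (i - 0.5)/m$, and the orientation of the slopes, which are taken here as the data regressed on the reduced variate $\eta$ so that a bounded tail flattens and gives S above 1, in agreement with the decision rule in the text. Function names are hypothetical.

```python
import math

def ls_slope(u, v):
    """Slope of the least-squares straight line v = slope * u + intercept."""
    k = len(u)
    mean_u = sum(u) / k
    mean_v = sum(v) / k
    sxy = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sxx = sum((a - mean_u) ** 2 for a in u)
    return sxy / sxx

def curvature_statistic(maxima, n1, n2, n3, n4):
    """S = S_{n1,n2} / S_{n3,n4} on a Gumbel plot (indices are 1-based, inclusive)."""
    xs = sorted(maxima)
    m = len(xs)
    eta = [-math.log(-math.log((i - 0.5) / m)) for i in range(1, m + 1)]
    lower = ls_slope(eta[n1 - 1:n2], xs[n1 - 1:n2])   # slope on the lower interval
    upper = ls_slope(eta[n3 - 1:n4], xs[n3 - 1:n4])   # slope on the upper interval
    return lower / upper

# Synthetic check with exact quantiles of the three limit families (made-up parameters).
m = 30
eta = [-math.log(-math.log((i - 0.5) / m)) for i in range(1, m + 1)]
gumbel_data = [5.0 + 2.0 * e for e in eta]        # straight line on the plot
weibull_data = [10.0 - math.exp(-e) for e in eta]  # bounded above by 10
frechet_data = [math.exp(e) for e in eta]          # heavy upper tail
for data in (gumbel_data, weibull_data, frechet_data):
    print(curvature_statistic(data, 1, m // 2, m // 2, m))
```

The value of S obtained on real data still has to be compared with the tabulated critical values; those are not reproduced by this sketch.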

2.7. Parameter Estimation
The next step after determining the domain of attraction of a parent distribution is to estimate the parameters $\mu$, $\sigma$ and (possibly) $\beta$ which are found in expressions (2.7)-(2.9).
If the domain of attraction turns out to be of Gumbel-type, we must first perform a least-squares straight line fit to the Gumbel probability plot and then estimate parameters $\mu$ and $\sigma$ by matching the line equation coefficients to the ones in (2.12).
If the domain of attraction is of Weibull or Frechet-type, we must create the corresponding probability plot for various values of $\mu$ until we obtain the best straight line in terms of the minimum fit error. After $\mu$ is determined, we may proceed with the estimation of parameters $\sigma$ and $\beta$ by matching the line equation to (2.10) or (2.11), just as in the Gumbel case. However, for the particular problem we are dealing with, we do not need those parameters, since (as will become clear in Section 3) we are only interested in the location parameter $\mu$.
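For the Gumbel case, matching the fitted line to the Gumbel plot equation $-\log(-\log(y)) = x/\sigma - \mu/\sigma$ gives $\sigma$ as the reciprocal of the slope and $\mu$ as minus the intercept divided by the slope. The following sketch shows one way to do this; the plotting position $(i - 0.5)/m$ and the function names are assumptions made for illustration.

```python
import math

def fit_line(u, v):
    """Least-squares fit v = slope * u + intercept; returns (slope, intercept)."""
    k = len(u)
    mean_u = sum(u) / k
    mean_v = sum(v) / k
    sxy = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sxx = sum((a - mean_u) ** 2 for a in u)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u

def fit_gumbel(maxima):
    """Estimate the Gumbel location mu and scale sigma from a sample of maxima."""
    xs = sorted(maxima)
    m = len(xs)
    eta = [-math.log(-math.log((i - 0.5) / m)) for i in range(1, m + 1)]
    slope, intercept = fit_line(xs, eta)   # eta = x/sigma - mu/sigma
    sigma = 1.0 / slope
    mu = -intercept * sigma
    return mu, sigma

# Synthetic check: exact Gumbel quantiles with mu = 5 and sigma = 2 (made-up values).
m = 30
eta = [-math.log(-math.log((i - 0.5) / m)) for i in range(1, m + 1)]
print(fit_gumbel([5.0 + 2.0 * e for e in eta]))
```

On simulated power data the points scatter around the line, so the estimates carry sampling error that this noise-free check does not show.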

Problem Formulation
The term "maximum power" refers to the worst-case instantaneous power (or equivalently current) that is drawn from the supply bus. In order to render the estimation problem precise, we have to consider the sources of power dissipation in digital CMOS circuits, which are [8] $P_{\mathrm{switching}}$, $P_{\mathrm{short\text{-}circuit}}$ and $P_{\mathrm{leakage}}$. The first two quantities are the dynamic components of total power, since they only occur at transitions between logic states, whereas the third one is the static component, as it is a constant source of dissipation. Because of its static nature, leakage power may be regarded as an offset to the total instantaneous power, while at the same time it is two to three orders of magnitude smaller than the other two components. It is therefore clear that the worst-case power conditions arise strictly on transition intervals and that any corresponding analysis should be restricted to a clock cycle which marks a transition between two consecutive binary vectors. Because of this, maximum power estimation is also referred to as cycle-accurate power estimation.
Within this context, estimation of maximum power is a fundamentally different problem from that of average power estimation. The former is related to the maximum value, among all possible vector pairs, of the instantaneous power consumed within a clock cycle, whereas the latter involves the mean power of a sequence of input patterns applied over a large number of clock cycles that cover an extended period of time. From a statistical point of view, the probability distribution of average power exhibits smoothing properties over time and approximates a normal (Gaussian) distribution as the number of clock cycles tends to infinity. On the other hand, instantaneous power is considered within short periods of time and therefore its probability distribution is completely undetermined. In addition, a normal distribution model may be suitable for the center of the distribution but not for the tail, which always follows an asymptotic extreme value model. Consequently, conventional probabilistic and statistical techniques [9,10], though they have proved successful for average power estimation, are not adequate to handle maximum power, and a radically different approach should be adopted which makes use of extreme value theory.
Instantaneous power dissipation in a CMOS circuit depends upon the specific pair of binary vectors $(v_1, v_2)$ that define a transition on a particular clock edge. If the number of primary inputs is n, then each such vector consists of n bits and the population of input vector pairs contains $2^n \cdot 2^n = 4^n$ units. We will restrict our analysis to combinational circuits and, furthermore, we will assume that all primary inputs execute their transitions simultaneously on the same clock edge. However, the generalization to a sequential circuit is straightforward if we expand the "previous" input vector $v_1$ to encompass the state bits (or outputs) of all sequential elements (flip-flops) of the circuit. In that case, if the number of state bits is m, then the input population will contain $2^{n+m} \cdot 2^n = 4^n \cdot 2^m$ units. We should also remark that the input population may not always include all possible vector pairs, but only selected ones that correspond to typical circuit operation or input switching activity (constrained problem).
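The size of the input population and the drawing of one unit from the unconstrained population can be sketched as follows. The function names are hypothetical, and uniform, independent bits are assumed for the random draw; a constrained problem would replace that draw with one that respects the given switching activity.

```python
import random

def population_size(n_inputs: int, n_state_bits: int = 0) -> int:
    """Number of input vector pairs: 2^(n+m) * 2^n = 4^n * 2^m (m = 0 if combinational)."""
    return 2 ** (n_inputs + n_state_bits) * 2 ** n_inputs

def random_vector_pair(n_inputs: int, rng: random.Random):
    """Draw one (v1, v2) pair uniformly from the unconstrained population."""
    v1 = tuple(rng.getrandbits(1) for _ in range(n_inputs))
    v2 = tuple(rng.getrandbits(1) for _ in range(n_inputs))
    return v1, v2

print(population_size(8))        # combinational circuit with 8 primary inputs
print(population_size(8, 3))     # same circuit with 3 state bits added
print(random_vector_pair(8, random.Random(0)))
```

Even for a modest number of inputs the population is far too large to enumerate, which is why a small random subset is simulated instead.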

Procedure Description
Since we have defined an input population of vector pairs, we may regard the instantaneous power consumption of a particular circuit as a random variable with a certain probability distribution. The required maximum power will then be represented by the upper end point w(F) of this distribution. The proposed method closely follows the concepts presented in Section 2. However, in order to apply them we must assume a continuous probability distribution of power consumption and an input population of infinite size. Both assumptions are usually justified in practice for circuits of reasonable size with a moderate number of primary inputs (8 or more).
A flow chart of the estimation procedure is depicted in Figure 1. The main point is to acquire a number of distinct samples, form an additional sample of maxima and graphically transfer it to a Gumbel probability plot in order to determine the domain of attraction and ultimately the upper end point of the probability distribution.

FIGURE 1 Flow chart of the proposed method for maximum power estimation.
The procedure begins with the generation of a number of vector pairs, either at random or under certain constraints. This is equivalent to random sampling out of the unconstrained or constrained population of input vector pairs. The circuit under maximum power analysis is entered in a transistor-level or gate-level simulator such as SPICE or PowerMill. Each generated vector pair is fed as input for transient analysis to the simulator and the peak power during transition time is recorded. The set of power values obtained for all generated vector pairs forms one of the samples to be used hereafter, and the procedure is repeated for the rest of them.
The choice of sample size is critical at this point, in order to ensure the validity of the asymptotic models for the minimum possible number of units. Experiments performed on various parent distributions have shown that the rate of convergence to the corresponding domains of attraction is reasonably fast. Indeed, most distributions already approach their limit distribution as soon as the sample reaches a size of about 30 to 50 units, depending on the type of distribution, the problem size and the quality of the selected units. The worst case is encountered with the normal distribution, which takes about 100 units to approach its Gumbel-type limit distribution. Thus, the sample size is set to n = 100. However, one may further improve the quality of the final estimate by increasing the sample size. The choice of the number of samples, or equivalently the size of the sample of maxima, is less critical and a total of m = 30 is more than adequate in most cases. On the whole, the total number of units required to obtain an estimate of the maximum power is 3000, which is significantly smaller (by a factor of 2X to 10X) than that required by the majority of the existing methods.
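The sampling stage described above can be sketched as a short loop. The simulator call is the part that cannot be reproduced here: `toy_peak_power` is a hypothetical stand-in (toggle count plus an offset) used only so that the sketch runs, and it is not a power model from the paper. The remaining names are likewise illustrative.

```python
import random

def toy_peak_power(v1, v2):
    """Hypothetical stand-in for a SPICE or PowerMill transient run.
    Peak power is taken as 1 plus the number of toggling inputs (NOT a real model)."""
    return 1.0 + sum(a != b for a, b in zip(v1, v2))

def sample_of_maxima(peak_power, n_inputs, n=100, m=30, seed=0):
    """Draw m samples of n random vector pairs each and keep every sample's maximum.

    Returns (samples, maxima): the m samples of peak power values and the
    sample of maxima of size m, i.e. n * m simulated units in total.
    """
    rng = random.Random(seed)
    samples, maxima = [], []
    for _ in range(m):
        sample = []
        for _ in range(n):
            v1 = [rng.getrandbits(1) for _ in range(n_inputs)]
            v2 = [rng.getrandbits(1) for _ in range(n_inputs)]
            sample.append(peak_power(v1, v2))   # one simulator run per vector pair
        samples.append(sample)
        maxima.append(max(sample))
    return samples, maxima

samples, maxima = sample_of_maxima(toy_peak_power, n_inputs=16)
print(sum(len(s) for s in samples), len(maxima))
```

The individual samples are kept alongside the maxima because the Gumbel case later needs a tail fit on each sample as well as the Gumbel plot of the maxima.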
The sample of maxima of power values is subsequently laid on a Gumbel probability plot and the domain of attraction is determined with the aid of the curvature method presented in Section 2.6. According to Monte Carlo tests reported in [2], the critical value of the curvature S for a sample size of 100 is S = 2.588, with a significance level of 95%. The indices of the order statistics that define the intervals encountered in the calculation of S are chosen as $n_1 = 1$, $n_2 = n_3 = m/2$ and $n_4 = m$.
Due to physical limitations, it is obvious that the probability distribution of power consumption for a particular circuit is upper bounded (i.e., it has a finite upper end point) and therefore it can only lie in the domain of attraction of Gumbel or Weibull families (Theorem 2.3).Consequently the proposed method has considered both types of asymptotic models.
If the limit distribution turns out to be of Gumbel-type, we may infer (from the discussion in Section 2.3) that the parent distribution belongs to the Exponential family regarding its rate of convergence to one as $x \to w(F)$. The tail equivalence principle (Theorem 2.4) then allows us to approximate the right tail of the unknown parent distribution with a function of the form:
$$F(x) = a_1 - \exp\left(-\frac{x - a_2}{a}\right) \qquad (3.1)$$
If $a_1$ is close to 1, then F(x) is a probability distribution with the same rate of convergence to one (and consequently with the same Gumbel-type domain of attraction) as the standard exponential distribution, but which has a finite upper end point equal to:
$$w(F) = F^{-1}(1) = a_2 - a\log(a_1 - 1) \qquad (3.2)$$
Estimates of parameters $a_2$ and $a$ (location and scale respectively) may be derived with the same approach cited in Section 2.7 for the Gumbel parameter estimation, i.e., by fitting a straight line on a probability plot and by subsequently matching the coefficients of the line and plot equations. Care has to be taken, though, in that the type of probability plot is now exponential, and also that the sample does not necessarily follow the distribution for which the plot is designed, which means that the fit should only include the right tail of the sample. A good choice for the number of points comprising the tail is $2\sqrt{n}$. Parameters $a_2$ and $a$ are estimated for all 30 samples of power and their mean values are taken. The extra parameter $a_1$ in (3.2) cannot be estimated from the exponential probability plot, since the latter is designed for the standard exponential distribution $F(x) = 1 - \exp(-(x - a_2)/a)$. However, $a_1$ is associated with the Gumbel location parameter $\mu$ of the sample of maxima, through the relation (see Appendix):
$$a_1 = \exp\left(-\frac{\mu - a_2}{a}\right) + 1 - \frac{1}{n} \qquad (3.3)$$
For the estimation of parameter $\mu$ we refer again to Section 2.7 and the Gumbel probability plot of maxima that we have already created.
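The Gumbel-case computation can be sketched in two steps: a tail fit on the exponential probability plot, whose equation is $-\log(1 - y) = x/a - a_2/a$, followed by the evaluation of the upper end point from the upper end point expression $a_2 - a\log(a_1 - 1)$ with $a_1$ taken from (3.3). This is a sketch under assumptions: the plotting position $(i - 0.5)/n$, the rounding of the tail size $2\sqrt{n}$, the handling of a non-positive logarithm argument and all function names are illustrative choices, not the authors' code.

```python
import math

def fit_line(u, v):
    """Least-squares fit v = slope * u + intercept; returns (slope, intercept)."""
    k = len(u)
    mean_u = sum(u) / k
    mean_v = sum(v) / k
    sxy = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    sxx = sum((p - mean_u) ** 2 for p in u)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u

def fit_exponential_tail(sample):
    """Estimate (a2, a) from the 2*sqrt(n) largest units on an exponential plot."""
    xs = sorted(sample)
    n = len(xs)
    k = min(n, max(2, round(2 * math.sqrt(n))))          # number of tail points
    idx = range(n - k + 1, n + 1)                         # 1-based tail indices
    tail_x = [xs[i - 1] for i in idx]
    tail_y = [-math.log(1.0 - (i - 0.5) / n) for i in idx]
    slope, intercept = fit_line(tail_x, tail_y)           # y = x/a - a2/a
    a = 1.0 / slope
    return -intercept * a, a

def gumbel_case_upper_end_point(mu, a2, a, n):
    """Upper end point a2 - a*log(a1 - 1), with a1 - 1 = exp(-(mu - a2)/a) - 1/n."""
    arg = math.exp(-(mu - a2) / a) - 1.0 / n
    # A non-positive argument means the fitted tail is not bounded (assumed handling).
    return a2 - a * math.log(arg) if arg > 0.0 else math.inf

# Consistency check with made-up values: a1 = 1.001, a2 = 0, a = 1, n = 100.
mu = -math.log(0.001 + 1.0 / 100)           # location implied by relation (3.3)
print(gumbel_case_upper_end_point(mu, 0.0, 1.0, 100), -math.log(0.001))
```

In the procedure described above, `fit_exponential_tail` would be applied to each of the 30 samples and the resulting values of $a_2$ and $a$ averaged before the end point is evaluated.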
When (3.3) is substituted in (3.2), we obtain the following estimate of the upper end point for the Gumbel case:
$$w(F) = a_2 - a\log\left[\exp\left(-\frac{\mu - a_2}{a}\right) - \frac{1}{n}\right] \qquad (3.4)$$
The case of a Weibull limit distribution is considerably easier than the Gumbel one, since the upper end point is equal to the location parameter $\mu$, which can be estimated as the value that gives the best straight line in terms of the minimum fit error on the Weibull plot (Section 2.7). To be more specific, we calculate the sum of squared errors between the data and the corresponding coordinates of the straight line fit as a function of $\mu$, and then employ a function minimization algorithm such as the Nelder-Mead simplex method [7].
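The Weibull-case search can be sketched as a one-dimensional minimization of the fit error over $\mu$. To keep the example dependency-free, a plain grid search is used here in place of the Nelder-Mead simplex method named in the text. The plotting position $(i - 0.5)/m$, the measurement of errors along the $\eta$ axis, the grid parameters and the function names are all assumptions of this sketch.

```python
import math

def weibull_plot_sse(maxima, mu):
    """Sum of squared errors of the least-squares line on a Weibull plot
    built with location parameter mu (mu must exceed every data value)."""
    xs = sorted(maxima)
    m = len(xs)
    xi = [-math.log(mu - x) for x in xs]                        # g(x) = -log(mu - x)
    eta = [-math.log(-math.log((i - 0.5) / m)) for i in range(1, m + 1)]
    mean_xi = sum(xi) / m
    mean_eta = sum(eta) / m
    sxx = sum((u - mean_xi) ** 2 for u in xi)
    sxy = sum((u - mean_xi) * (v - mean_eta) for u, v in zip(xi, eta))
    slope = sxy / sxx
    intercept = mean_eta - slope * mean_xi
    return sum((v - (slope * u + intercept)) ** 2 for u, v in zip(xi, eta))

def weibull_upper_end_point(maxima, step, n_steps):
    """Grid search (swapped in for Nelder-Mead) for the mu that minimizes the SSE."""
    top = max(maxima)
    candidates = [top + step * k for k in range(1, n_steps + 1)]
    return min(candidates, key=lambda mu: weibull_plot_sse(maxima, mu))

# Synthetic check: exact Weibull quantiles with mu = 10, sigma = 2, beta = 2 (made up).
m = 30
data = [10.0 - 2.0 * (-math.log((i - 0.5) / m)) ** 0.5 for i in range(1, m + 1)]
print(weibull_upper_end_point(data, step=0.01, n_steps=200))
```

The grid resolution bounds the accuracy of the estimate, so a finer step or a proper minimizer would be preferable on real data; the search range above the largest observation also has to be wide enough to contain the true end point.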

EXPERIMENTAL RESULTS
Experiments have been conducted on two sets of distributions which are assumed to come close to the behavior of instantaneous power consumption in VLSI circuits. Distributions that belong to the first set asymptotically follow the Gumbel model, whereas members of the second set are associated with the Weibull extreme model. Each set comprises four distributions with ranges of values 15.00-49.54 mW, 15.00-222.69 mW, 15.00-533.08 mW and 500.00-707.23 mW respectively. The difference between these two sets is their speed of ascent from the lower end point towards the upper end point. Experimental results for the first and the second set of distributions are shown in Tables I and II respectively. Both tables display the average, maximum and minimum values of the estimated upper end point and the relative percentage error over 100 runs of the procedure for each distribution. The intermediate graphs created for a single run of the estimation procedure applied on the third distribution of both sets are depicted in Figures 2-4 (Gumbel set) and Figures 5-7 (Weibull set).
Two important observations can be made by closer examination of the above tables. The first one is that the method appears to be much more precise when dealing with distributions of Weibull-type domain of attraction. This is not a direct result of our method but stems from the fact that, for random and equal-sized samples, a distribution with a fast rate of ascent will exhibit more sample points belonging to its right tail than a distribution with a slower rate of ascent (as is obvious from Figs. 2 and 5). The second observation is that estimates of the upper end point in the Gumbel case appear, on average, slightly biased towards values larger than the actual ones. This can be attributed to the fact that estimation of parameters $a_2$ and $a$ of (3.1) is performed through a probability plot which is designed for the standard exponential distribution. This distribution has an infinite upper end point, which causes a small positive bias on the estimated value of a finite one.

CONCLUSIONS
Fast and accurate estimation of maximum instantaneous power makes it possible both to spot reliability problems early in the design phase and to design and optimize the power distribution network efficiently. A statistical approach based on extreme value theory has been proposed to address these issues. The method's strongest benefit is its speed, which is mainly due to the fact that a small subset of input vectors captures all the necessary information for extreme value analysis. In addition, the simulation-based nature of the method renders it a very attractive alternative solution, as its integration within the design flow of digital CMOS VLSI circuits is straightforward.

FIGURE 3 Exponential probability plot (with equation $-\log(1 - y) = x/a - a_2/a$) and least-squares straight line fit on tail data.
Gumbel probability plot (with equation $-\log(-\log(y)) = x/\sigma - \mu/\sigma$).

TABLE I Results for distributions with Gumbel-type domain of attraction