Signal Strength Based Switching Activity Modeling and Estimation for DSP Applications

We present an effective switching activity modeling and estimation technique for components under resource sharing. The model uses word-level signal statistics to generate a single parameter, called signal strength. Using the signal strength, we construct power models for both the sharing and non-sharing cases of computing resources. The model enables us to estimate switching activity effectively at a higher level of design abstraction. We have conducted several experiments using both synthetic and real data to evaluate our method, and have compared competing architectures for the relative power consumption of their different components. The results show that the average difference between the proposed method and very accurate power simulation (as opposed to switching estimation) using PowerMill is up to 12%.


I. INTRODUCTION
Power efficient applications such as portable computing and wireless communication devices have driven the VLSI industry to treat power consumption as one of the major implementation constraints. High-level power estimation at the architectural or register-transfer level enables designers to evaluate competing architectures in the early phase of the design process. Designers can explore the design space with greater flexibility and make better trade-offs at higher levels. Hence, costly redesign steps can be avoided.
Several bottom-up approaches have been proposed to address this issue [1-5]. The power factor approximation technique in [1] uses a constant-type model to determine a weighting factor that models the average power consumed by a given module. The model assumes that switching activity is constant regardless of differences in the inputs. In contrast, activity-sensitive power estimation techniques account for the variation of power dissipation due to different signal statistics. A bitwise data model is used in [2], while others use word-level signal statistics to construct macromodels. In [5], a method to characterize switching activity based on average signal power is proposed. This systematic way of characterizing switching activity provides an effective basis for estimating power consumption hierarchically and for evaluating different designs efficiently. One drawback of the method is that it is not applicable to designs that require sharing of computing resources. The purpose of resource sharing is to reuse computing components and to reduce the number of resources needed in a design so that the area is minimized. This is common practice in DSP-processor-based implementations. Though several studies mention the possible increase in switching energy when resources are shared [6,7], high-level power modeling has not really indicated how sharing resources impacts power consumption.
In current CMOS technology, power consumption is dominated by dynamic components [8]. Hence, we use switching activity as the metric for comparing power consumption. We count the number of switches at each output of the basic cells in a functional module and sum them to obtain the switching activity. For example, the basic cells of an array multiplier are AND gates, half adders and full adders. Relative weighting factors are applied to the different kinds of internal cells to reflect the output load capacitance driven by each gate. We also count the switches at the input pins and multiply them by the bit width, N, to account for input buffers/drivers. A delay-free model is assumed in our current work. As we ignore the internal node switching activity, this method cannot be used for accurate power estimation. However, our objective is to provide high-level models with which competing architectures can be compared quickly and with reasonable accuracy for low switching activity (and hence, low power).
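As a rough illustration of the counting described above, the following Python sketch tallies bit toggles between consecutive sign-magnitude words, scales the input-pin toggles by the bit width N, and forms a weighted sum over basic cell types. The function names, the cell-type labels and any weights passed in are hypothetical; the text does not give numeric weighting factors.

```python
def to_sign_magnitude(value, n_bits):
    """Encode an integer as an n_bits-wide sign-magnitude word (MSB is the sign)."""
    magnitude = abs(value)
    if magnitude >= 1 << (n_bits - 1):
        raise ValueError("magnitude does not fit in n_bits - 1 bits")
    sign = 1 if value < 0 else 0
    return (sign << (n_bits - 1)) | magnitude


def count_toggles(words):
    """Count bit flips between consecutive words (delay-free, no glitches)."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(words, words[1:]))


def input_switching(values, n_bits):
    """Input-pin toggles multiplied by the bit width N (input buffers/drivers)."""
    words = [to_sign_magnitude(v, n_bits) for v in values]
    return n_bits * count_toggles(words)


def weighted_cell_switching(toggles_per_cell_type, weights):
    """Sum output toggles of the basic cells, each scaled by a relative load weight.

    Both arguments are dicts keyed by a cell-type label (hypothetical labels).
    """
    return sum(weights[kind] * count for kind, count in toggles_per_cell_type.items())
```

For a functional module, the per-cell toggle counts would come from a gate-level simulation of the module; only the bookkeeping is shown here.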
We focus our work on data-dominated applications, such as DSP. By constructing analytical or table-lookup models for basic building blocks, such as arithmetic modules, multiplexers and registers, power (switching activity) estimates for a variety of DSP architectures can be obtained in a hierarchical way.
We present a power macromodeling approach to estimate the switching activity of designs. The approach takes into account the effect of sharing resources and does not treat the building blocks as independent black boxes.
The rest of the paper is organized as follows. Section II provides background for our work. Section III describes the techniques employed to characterize modules in resource-sharing designs. Experimental results are reported in Section IV. Finally, Section V draws the conclusions.

II. PRELIMINARIES
The power model we present uses word-level statistics, while the switching activity occurs at the bit level. A bit-level model based on word-level statistics is introduced first. Following that, we define a signal parameter called signal strength, $\eta$, and state the procedures used to characterize switching activity. Throughout this paper the sign-magnitude number representation is assumed.
A bit-level switching activity model based on word-level signal statistics is proposed in [2]. In the bit-level model, the effects of input word-level statistics on the least significant bits (LSBs) and the most significant bits (MSBs) are divided into three regions. The first region (MSBs) has low switching activity. The second region (LSBs) has high switching activity. The last region, in between, is considered to be a linear transition connecting the other two. This is shown in Figure 1 for the sign-magnitude (SM) representation. There are two breakpoints, $BP_0$ and $BP_1$, that separate the three regions. Switching activity at the MSBs, the region above $BP_1$, is considered to be zero. Switching activity in the lower-bit region, i.e., below $BP_0$, is 0.5.
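The three-region model can be sketched as a piecewise-linear function of bit position. This is only an illustration under stated assumptions: bit 0 is taken to be the LSB, the breakpoints are passed in as plain numbers, and the function names are hypothetical.

```python
def bit_activity(bit, bp0, bp1):
    """Switching activity of one bit position in the three-region model.

    At or below bp0: 0.5 (LSB region); at or above bp1: 0.0 (MSB region);
    in between: a linear transition connecting the two levels.
    """
    if bit <= bp0:
        return 0.5
    if bit >= bp1:
        return 0.0
    return 0.5 * (bp1 - bit) / (bp1 - bp0)


def word_activity(n_bits, bp0, bp1):
    """Expected number of toggling bits per word transition for an n_bits word."""
    return sum(bit_activity(bit, bp0, bp1) for bit in range(n_bits))
```

With breakpoints at 4 and 8 in a 16-bit word, for instance, the five lowest bits contribute 0.5 each, bits 5 to 7 contribute the linear ramp, and the remaining bits contribute nothing.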

Signal-strength Based Power Model
The signal-strength based power model uses a parameter called signal strength to characterize power. Let $X[n]$ be the input sequence to a DSP system. Usually $X[n]$ can be assumed to be a stationary Gaussian process. The signal strength, $\eta$, is defined as the number of bits needed to represent the average signal power. $\eta$ is given as
$$\eta = \log_2\left(\frac{\sqrt{E(X^2[n])}}{d}\,(2^{N-1}-1) + 1\right) \qquad (1)$$
where $X[n]$ represents an input sequence, $E(X^2[n])$ is the average signal power of $X[n]$ and $N$ is the number of bits used to represent the signal value.
$\sqrt{E(X^2[n])}$ equals the standard deviation of zero-mean signals. All signals are assumed to be uniformly quantized in a dynamic range of $\pm d$ and are represented in sign-magnitude form using $N$ bits. With the given statistics of an input sequence, we can compute $\eta$ of the sequence by using Eq. (1). The breakpoints $BP_0$ and $BP_1$ that define the three regions of a multi-bit signal can be calculated from the word-level signal statistics ($\eta$ and temporal correlation $\rho$) using Eq. (2) [5].
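A minimal sketch of the signal strength computation follows. It assumes Eq. (1) has the form $\eta = \log_2\left(\sqrt{E(X^2[n])}/d \cdot (2^{N-1}-1) + 1\right)$, i.e., the number of bits needed for the RMS value after uniform quantization to $N$-bit sign-magnitude over $\pm d$. The function name and argument names are hypothetical.

```python
import math


def signal_strength(samples, n_bits, d):
    """Signal strength (eta) of a sequence, under the assumed form of Eq. (1).

    samples: real-valued sequence within the dynamic range [-d, +d]
    n_bits:  word length N of the sign-magnitude representation
    d:       half-width of the dynamic range
    """
    mean_power = sum(x * x for x in samples) / len(samples)  # E(X^2[n])
    rms = math.sqrt(mean_power)
    return math.log2(rms / d * (2 ** (n_bits - 1) - 1) + 1)
```

Under this form, a full-scale signal (RMS equal to d) gives η = N − 1, the full magnitude width, and an all-zero signal gives η = 0.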
In [5], switching activities of building blocks are formulated based on empirical equations of the statistics of the signals applied at their inputs. An example that uses $\eta$ to characterize components is illustrated as follows. Consider an adder whose two primary input ports are A and B. Let $L$ be the length of the primary input vectors. The sequences to the two inputs are $(a(1), a(2), \ldots, a(L))$ and $(b(1), b(2), \ldots, b(L))$, respectively. The $\eta_A$ (or $\eta_B$) of the sequence $(a(i))$ (or $(b(i))$) to input A (or B) is obtained by gathering signal statistics and applying Eq. (1).
After the $\eta$'s of the input signals are obtained, the switching activity of a functional block can be formulated as a function of the $\eta$'s of the input sequences by extensive simulation and proper approximation. One can perform a linear approximation by fitting the simulated results to a first-order polynomial. As an example in [5], the switching activity of an adder can be expressed as $SW = 1.59 + 0.49\,\eta_A + 0.49\,\eta_B$. This kind of modeling method works well for switching activity estimation of fully parallel implementations, since no operations (multiplications) share the same components (multipliers). In real implementations, sharing of computing resources must be considered for some resource-constrained applications.
II.3. Definitions for Array Multipliers
Before we consider generalized models, we need to define some terms for the array multipliers used in our experiments. Two attributes are used to classify an array multiplier [9]. One is the order of the operand's bits: most significant bit (MSB) first or least significant bit (LSB) first. The other attribute is the kind of adders used to sum the partial products: a ripple (RP) carry type or a carry-save (CS) type. A most-significant-bit-first array multiplier with carry-save adders will be referred to as an MSB-first CS multiplier.
The idea of sharing resources implies that a shared functional unit has to multiplex its input from different sources. Hence the assumption of stationary signals in the original signal-strength based switching model is no longer valid. Therefore, it is necessary to have a model that can be used when resources are shared. Consider a module that is shared by two operations. For input A of the module, the new sequence is formed by alternating between two source sequences, $(a_1^{(i)})$ and $(a_2^{(i)})$, and is equal to $(a_1^{(1)}, a_2^{(1)}, a_1^{(2)}, a_2^{(2)}, \ldots, a_1^{(L)}, a_2^{(L)}) = \mathrm{interleave}((a_1^{(i)}), (a_2^{(i)}))$. Interleave() is a function that mixes two sequences alternately as shown above. Parameters that relate the switching activity differences of $(a_1^{(i)})$ and $(a_2^{(i)})$ must be added for more accurate modeling.
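The interleaving of source sequences at a shared input can be written down directly. The sketch below is illustrative; accepting more than two sequences is an extension added here for convenience, and the function name simply mirrors the Interleave() notation of the text.

```python
def interleave(*sequences):
    """Mix equally long sequences by taking one element from each in turn.

    interleave(a1, a2) yields a1[0], a2[0], a1[1], a2[1], ...
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return [seq[i] for i in range(length) for seq in sequences]
```

The resulting sequence is what the shared functional unit actually sees at its input, which is why its statistics differ from those of either source sequence alone.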
The difference in switching activity of a resource, referred to as $\Delta_{sw}$, is defined as the difference in switching without sharing and with sharing. $\Delta_{sw}$ can be positive or negative. Experimentally, we have observed an increase in switching activity in most cases of resource sharing. We observed that the $\Delta_{sw}$ of a shared functional unit is affected by the difference of the $\eta$'s of the input sequences $(a_1^{(i)})$ and $(a_2^{(i)})$. $\Delta\eta$ will be used as the notation for the difference of the $\eta$'s of any two signals. Qualitatively speaking, the larger the $\Delta\eta$ of two sequences, the larger the $\Delta_{sw}$ observed. $\Delta\eta$ can be used for estimating switching activity under resource sharing for the following reasons. First, $\eta$ is related to average signal power, so the $\Delta\eta$ of two signals will indicate $\Delta_{sw}$. Second, it is shown in [5] that $\eta + 2$ marks the beginning of the most significant bits (MSBs) region in the word-level statistical model. The switching activity in the MSB region is very low and is generally assumed to be zero when modeling. However, the width of the MSB region will affect the switching activity.
The sharing condition is based on the difference of the $\eta$'s at every primary input of a functional unit. Consider a two-input functional unit, FU, that is shared by two operations, OP1 and OP2. Each operation originally has two input sequences, $(a_1^{(i)})$ and $(b_1^{(i)})$ for OP1 and $(a_2^{(i)})$ and $(b_2^{(i)})$ for OP2. By applying Eq. (1), the corresponding $\eta$'s can be computed; they are denoted $\eta_{a1}$, $\eta_{b1}$, $\eta_{a2}$ and $\eta_{b2}$. The shared FU will then have two new input sequences $(v_a^{(j)})$ and $(v_b^{(j)})$, where $(v_a^{(j)}) = \mathrm{interleave}((a_1^{(i)}), (a_2^{(i)}))$ and $(v_b^{(j)}) = \mathrm{interleave}((b_1^{(i)}), (b_2^{(i)}))$.
The $\Delta\eta$ of the two sequences into input A is computed as $D_a = |\eta_{a1} - \eta_{a2}|$; $D_b = |\eta_{b1} - \eta_{b2}|$ is the corresponding value for input B.
The switching activity of the shared FU will be constructed using $D_a$ and $D_b$ as well as $\eta_{a1}$ and $\eta_{b1}$.
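A table-lookup estimate for a shared two-input FU might be organized as in the sketch below. It rests on several assumptions that are not spelled out in the text at this point: the estimate is taken to be the sum of the two base-table entries (no sharing) plus one difference-table entry selected by the sharing condition $(D_a, D_b)$; the tables are plain dicts keyed by integer $\eta$ values; and the operations are ordered so that $\eta_{a2} \ge \eta_{a1}$ and $\eta_{b2} \ge \eta_{b1}$. All names and the numbers in the usage example are hypothetical.

```python
def sharing_condition(eta_a1, eta_b1, eta_a2, eta_b2):
    """Return (D_a, D_b), the signal strength differences at inputs A and B."""
    return abs(eta_a1 - eta_a2), abs(eta_b1 - eta_b2)


def estimate_shared_fu(base_table, diff_tables, eta_a1, eta_b1, eta_a2, eta_b2):
    """Estimate switching of an FU shared by OP1 and OP2 (assumed model form).

    base_table:  dict mapping (eta_a, eta_b) to switching without sharing
    diff_tables: dict mapping (D_a, D_b) to a dict mapping (eta_a1, eta_b1)
                 to the switching difference under that sharing condition
    """
    d_a, d_b = sharing_condition(eta_a1, eta_b1, eta_a2, eta_b2)
    no_sharing = base_table[(eta_a1, eta_b1)] + base_table[(eta_a2, eta_b2)]
    delta_sw = diff_tables[(d_a, d_b)][(eta_a1, eta_b1)]
    return no_sharing + delta_sw
```

For example, with hypothetical entries `base_table = {(6, 3): 100.0, (9, 3): 140.0}` and `diff_tables = {(3, 0): {(6, 3): 30.0}}`, the call `estimate_shared_fu(base_table, diff_tables, 6, 3, 9, 3)` combines the two base entries with the difference entry for the condition $D_a = 3$, $D_b = 0$.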
Consider the construction of a difference table for the FU under the sharing condition $D_a = 3$ and $D_b = 0$. For every entry in the table, two sequences whose $\eta$'s are $\eta_{a1}$ and $\eta_{a2}$ are first generated by a sequence generator, where $\eta_{a2} = \eta_{a1} + D_a = \eta_{a1} + 3$. Then they are alternately fed to input A of the FU. The same procedure is applied to the other two sequences at input B of the FU, whose $\eta$'s are $\eta_{b1}$ and $\eta_{b2}$ ($\eta_{b2} = \eta_{b1} + D_b = \eta_{b1} + 0 = \eta_{b1}$). To form a difference table, $\eta_{a1}$ and $\eta_{b1}$ range up to $16 - D_a$ and $16 - D_b$, respectively, for a 16-bit data width. Figure 2 plots a difference table when $D_a = 3$ and $D_b = 0$ for a 16-bit MSB-first CS array multiplier.
Though we used a two-port functional module to explain our idea, our method can be applied to other common DSP modules such as registers and multiplexers. A register can be modeled as a one-port module. Eqs. (3(a)) and (3(b)) remain the same, while the last part of Eq. (3) is modified to
$$SW(OP1, OP2) = SW_{base}(\eta_{a1}) + SW_{base}(\eta_{a2}) + SW_{diff}(\eta_{a1}, D_{a1}) + SW_{diff}(\eta_{a2}, D_{a2}) \qquad (4)$$
Let us now turn our attention to multiplexers (MUXes). Since our modeling for resource sharing actually considers a module under multiplexing, the modeling of the MUX is quite unique in our model and requires special treatment. We first focus on a 2-to-1 MUX and expand our model to an n-to-1 MUX later. Since we focus on data-dominated applications, we assume that an n-to-1 MUX multiplexes among its input sources with equal opportunity. So each input sequence of a 2-to-1 MUX has a 50% chance of appearing at the output of the MUX.
Since the MUX is always in multiplexing mode, there is no need for a base table. We can consider a 2-to-1 MUX as a one-input module that shares two sequences from its two inputs. Therefore, we can formulate the switching counts as follows:
$$SW(\eta_{a1}, \eta_{a2}) = SW_{diff}(\eta_{a1}, D_{a1}) + SW_{diff}(\eta_{a2}, D_{a2}) \qquad (5)$$
III.3. Sharing of n Operations
Our work can be generalized to characterize switching activity (power) for sharing a functional unit among n operations. It is observed that switching transitions occur when two consecutive vectors are different. If the order of the interleaved sequences is known, the variation of switching activity is the summation of n individual mixings. Suppose n = 3 and the order of the interleaved sequences is $(a_1^{(i)})$, $(a_2^{(i)})$ and $(a_3^{(i)})$. Three operations, OP1, OP2 and OP3, share one functional unit. The first part of the switching activity ($SW_{no\_sharing}$) is the summation of the individual switching activities without sharing a resource. The second part of the switching activity can be expressed as
$$\Delta_{sw} = SW_{diff}(OP1, OP2, OP3) = SW_{diff}(OP1, OP2) + SW_{diff}(OP2, OP3) + SW_{diff}(OP3, OP1).$$
$\Delta_{sw}$ is the summation over every consecutive pair in the order of mixing. For sharing among n operations, we need to look up n pairs:
$$\Delta_{sw} = \sum_{j=1}^{n} SW_{diff}(\eta_a^{(j)}, \eta_b^{(j)}, D_a^{(j)}, D_b^{(j)}) \qquad (6)$$
where the superscript $j$ denotes the $j$-th consecutive pair and $n$ is the number of operations sharing one resource. This makes our technique very effective and fast in estimating switching activity (or power). The generalization to an n-to-1 MUX is the expansion of Eq. (5) for a 2-to-1 MUX. The switching counts can be formulated using the following equation.
$$SW_{MUX}(\{a_i\}_{i=1 \ldots n}) = SW_{diff}(\eta_{a1}, D_{a1}) + SW_{diff}(\eta_{a2}, D_{a2}) + \cdots \qquad (7)$$
… $(N-2) \times (N-2)$. This storage can be reduced to $O(N)$ if polynomial surface fitting can be applied to each table. Since we have experimentally observed as much as 20% mismatch between the data and a quadratic approximation, we do not use polynomial surface fitting to determine the switching activity. Instead, we use tabular forms to store the results.
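Returning to the sharing of n operations, the consecutive-pair rule of Eq. (6) can be sketched as a cyclic walk over the mixing order. The helper names are hypothetical, and `pair_diff` stands in for whatever difference-table lookup is used for a pair of operations.

```python
def consecutive_pairs(operations):
    """Cyclic consecutive pairs in mixing order: (OP1, OP2), ..., (OPn, OP1)."""
    n = len(operations)
    return [(operations[j], operations[(j + 1) % n]) for j in range(n)]


def delta_sw(operations, pair_diff):
    """Sum the switching difference over the n consecutive pairs.

    pair_diff: callable taking two operations and returning the looked-up
               switching difference for mixing them (assumed interface).
    """
    return sum(pair_diff(first, second) for first, second in consecutive_pairs(operations))
```

Because only n lookups are needed for n sharing operations, the cost of the estimate grows linearly with the degree of sharing.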
III.5. Analysis of Characterized Results of Functional Units
We have constructed a set of switching activity models for different types of multipliers and adders using the method described above. The relative switching increment due to resource sharing is called the percentage switching increment ($\gamma$) and is defined as
$$\gamma = \frac{SW_{diff}(\eta_a, \eta_b, D_a, D_b)}{SW_{base}(\eta_{a1}, \eta_{b1}) + SW_{base}(\eta_{a2}, \eta_{b2})} \times 100\% \qquad (8)$$
The quantity $\gamma$ indicates the relative switching behavior under sharing of resources. Figure 3 shows the percentage switching increment for an MSB-first CS array multiplier under resource sharing. There are five bars at every position of $\eta_{b1}$, one for each of $D_a = 1, 2, 3, 4, 9$. The mixing conditions are $D_b = 0$ while $D_a$ varies. $D_a$ is varied with $\eta_{a1}$ fixed at 6. As $D_a = |\eta_{a1} - \eta_{a2}|$, $\eta_{a2}$ takes the values 7, 8, 9, 10, 15. We collect data from 5 different mixing conditions, $(D_a = 1, D_b = 0)$, $(D_a = 2, D_b = 0)$, ..., $(D_a = 9, D_b = 0)$. We take the data at $\eta_{a1} = 6$ and $\eta_{b1} = 1, 2, \ldots, 14$ from each mixing condition and place them side by side. The percentage switching activity increment at $D_a = 9$ averages 22% and reaches as high as 28%. It is observed that $\gamma$ increases as $D_a$ increases. Similar behaviors can be observed for LSB-first array multipliers, tree multipliers and adders.
We also normalize the bars at every position of $\eta_{b1}$ by the largest of the five bars. For example, the bars that represent $\gamma$ at $D_a = 1, 2, 3, 4, 9$ are normalized by the $\gamma$ at $D_a = 9$. Figure 4 shows the normalized increment in switching. We observe that no matter how large $\eta_{b1}$ is, there is always about 90% difference between $D_a = 1$ and $D_a = 9$.
FIGURE 3 Percentage switching increment ($\gamma$) for a 16-bit MSB-first CS array multiplier under resource sharing when $D_b = 0$, $D_a$ is varying and $\eta_{a1}$ is fixed at 6.
FIGURE 4 Normalized percentage switching increment for a 16-bit MSB-first CS array multiplier under resource sharing when $D_b = 0$, $D_a$ is varying and $\eta_{a1}$ is fixed at 6.
These results suggest that the $\Delta\eta$ of the sharing sequences has a significant impact on switching activity.
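The two quantities used in this analysis are simple ratios. The sketch below assumes that Eq. (8) is the switching difference divided by the sum of the two base (no sharing) switching values, expressed in percent, and that normalization divides each bar by the largest bar in its group. The function names are hypothetical.

```python
def percentage_switching_increment(sw_diff, sw_base_op1, sw_base_op2):
    """Relative switching increment due to sharing, in percent (assumed Eq. (8))."""
    return 100.0 * sw_diff / (sw_base_op1 + sw_base_op2)


def normalize_by_largest(values):
    """Scale a group of bars so that the largest one becomes 1.0."""
    largest = max(values)
    return [value / largest for value in values]
```

Applied to the five bars at one position of $\eta_{b1}$, `normalize_by_largest` gives the relative heights from which the gap between the smallest and largest sharing condition can be read off.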

IV. EXPERIMENTAL RESULTS
We compare our signal-strength based switching activity estimation (SSSAE) with a power simulator, PowerMill, on the relative power consumption of DSP functional units. SSSAE works at a higher level of design abstraction, i.e., the architectural or register-transfer level (RTL), while PowerMill needs a detailed implementation at the circuit level. The prototype SSSAE software has been implemented in the C++ programming language.
Our evaluation shows that there is about a 6-17% difference between switching activity estimation and very accurate power simulation in the relative power consumption of the different components of an implementation.
IV.1. Setup and Procedures
We have two implementations of a 6-tap FIR filter, at the RTL and at the circuit level. The two implementations use the same number of resources, but have different resource bindings for the functional units. Figures 5 and 6 show the two implementations at RTL. The information needed for switching (power) estimation differs between SSSAE and PowerMill. SSSAE requires a scheduled and allocated data flow graph (DFG) and input signal statistics, while PowerMill needs a circuit-level description and data sequences. We use the following procedure to obtain the circuit-level implementation. The scheduled and allocated DFGs are first implemented in VHDL. Then we use Synopsys Design Compiler to synthesize the VHDL code using a low-power standard cell library developed by Carnegie Mellon University [10]. The cells in the library are implemented in 0.35 µm CMOS technology with a 3.3 V power supply.
We use a variety of data sequences to evaluate SSSAE. These data sequences are obtained from various sources, as shown in Table I. Signal statistics are propagated from the primary inputs using the analytical method given in [11]. This method propagates the statistics quickly and with reasonable accuracy. With the signal statistics thus obtained, the signal strength can be computed as described in Section II. Our switching activity estimation is not designed for accurate estimation of real power consumption, but focuses on the relative power dissipation in components and rapid evaluation time. The results obtained from PowerMill have confirmed that our estimates are in good agreement with the circuit-level simulation. The execution time for each design using PowerMill took about …
Table notes: (a) the array multiplier label stands for an MSB-first RP array multiplier; (b) WTMult stands for a Wallace-tree multiplier.
We compare the relative power dissipation of different components of an implementation using two methods: one obtained using power simulation (i.e., PowerMill) and the other by switching activity estimation (i.e., SSSAE). Here FUs represent functional units, MUXes multiplexers and REGs registers.
Tables II and III show the relative power consumption for different components of the  implementation.The average difference between the estimation and the simulation is 11.27% for FUs, 1.52% for MUXes and 12.09% for REGs.
The power consumption due to the registers ranges from 70% to 80% of the total power consumption.
It is observed from Table IV that the ratio between switching activity and accurate power simulation is similar for different rows of the table.
The ratio is related to the real capacitance in the circuit. By extracting the ratio and applying it to our models, the accuracy can be increased. For example, we can obtain the empirical factor for the FUs of Implementation I as 0.618 from Table IV. Empirical factors for MUXes and REGs can be obtained similarly. They are 0.622 and 1.23, respectively. Use of such factors reduces the error between the power simulation and the switching activity estimation to within 5%, as shown in Table V. Even though our modeling method does not include capacitance information of the design or the glitches that occur during the operation of real circuits, the model still provides reasonable accuracy for the relative power consumption.
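The calibration step can be sketched as follows. The direction of the ratio is an assumption made here: the factor is taken as simulated value divided by estimated value, averaged over the rows of a table, so that multiplying an estimate by the factor moves it toward the simulation. The function names and the sample numbers in the usage note are hypothetical.

```python
def empirical_factor(simulated, estimated):
    """Average ratio of simulated to estimated values over paired table rows.

    Assumption: the factor is simulation / estimation, so that a later
    estimate multiplied by the factor approximates the simulated value.
    """
    ratios = [sim / est for sim, est in zip(simulated, estimated)]
    return sum(ratios) / len(ratios)


def apply_factor(estimate, factor):
    """Scale a switching-activity estimate by a component-type factor."""
    return estimate * factor
```

A separate factor would be extracted per component type (FUs, MUXes, REGs); for instance, `empirical_factor([6.18, 12.36], [10.0, 20.0])` gives a value close to 0.618 for made-up rows whose ratios agree.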

V. CONCLUSIONS
We have proposed an effective switching activity modeling and estimation technique that uses signal strength to characterize and evaluate components under resource sharing.This scheme considers not only switching activities due to varying input signal statistics, but also transitions stemming from hardware sharing.The effectiveness of the proposed model is verified by two implementations of an FIR filter.A variety of synthetic and real data are applied to the FIR filters.The results show that our models track the circuit-level simulator reasonably well.
The transformation of parallel algorithms to a resource-limited design is very common in DSP applications. The proposed generalized signal-strength based model provides a valuable tool to estimate and compare switching activities in the early phase of design when the input signal statistics are known.

FIGURE 2 Switching differences ($\Delta_{sw}$) for a 16-bit MSB-first CS array multiplier under resource sharing when the mixing condition is $D_a = 3$ and $D_b = 0$.

FIGURE 5 Implementation I of a 6-tap FIR filter. The control unit and control lines are not shown here.

FIGURE 6 Implementation II of a 6-tap FIR filter.
Therefore, the difference in the width of the two MSB regions can be an indication of $\Delta_{sw}$ due to resource sharing.
III.2. Generalized Signal-strength Based Switching Activity Model
By including $\Delta\eta$, the signal strength difference of the sharing sequences, we can generalize the switching activity model. The generalized switching activity model of a functional unit consists of a base switching table and a set of difference switching tables. The base table is constructed under the condition that no computing resources are shared. The switching activity in the base table can be derived from the original signal-strength based power model. The difference tables are constructed under different sharing conditions.

TABLE Description
… while our switching activity estimation at RTL took less than 30 secs on the same platform.
IV.2. Analysis of Results

TABLE IV
The ratio for FUs between switching estimation and power simulation extracted from Table II

TABLE V
Simulated and estimated relative power dissipation for the implementation of Figure 5. The estimated switching activity for the different components has been multiplied by the extracted empirical factor