Comparative Power Analysis of an Adaptive Bus Encoding Method on the MBUS Structure

This paper proposes a novel bus encoding method on MBUS in order to reduce the power consumption of system-on-chips (SoCs). The main contribution is to lower the bus activity by an average of 64.55% and thus decrease the IO power consumption by reconfiguring the MBUS transmission. This method is effective because field-programmable gate array (FPGA) IOs are likely to have very large capacitance associated with them and consequently dissipate a lot of dynamic power. Experimental results show that, averaged over the test modes, total power falls to 70.96% of that of the original MBUS implementation.


Introduction
For energy-limited IoT and wearable devices, low-power techniques that prolong battery life have become an important constraint in embedded chip design [1,2]. A general formula for dynamic power dissipation can be written as

$$P_{\text{dynamic}} = \alpha \, C \, V^{2} f, \tag{1}$$

where the supply voltage, denoted as $V$, and the load capacitance, denoted as $C$, are decided by the tape-out technology [3]. The maximum clock or operational frequency, denoted as $f$, is the reciprocal of the sum of the setup time ($t_{\text{setup}}$), the critical combinational path delay between flip-flops ($t_{\text{propagation}}$), the clock-to-Q delay ($t_{\text{clk-Q}}$), and the clock skew ($t_{\text{skew}}$). All four parameters are constant and cannot be changed after synthesis and technology selection. Hence, as equation (1) indicates, one of the most effective ways to reduce on-chip power dissipation is to reduce the toggle activity, denoted as $\alpha$, of signals, IOs, and logic. Moreover, the toggling rate of the on-chip bus has become one of the main design issues because it dominates the power consumption and degrades performance owing to its complex scalability [4,5].
For example, the existing on-chip buses, such as AHB [6] and AXI [7] from ARM Holdings, Wishbone from Silicore Corporation [8], and OCP from OCP-IP [9], consume considerable hardware resources in terms of slice/gate count and energy, owing to their large number of IO and signal definitions and their complicated structures. They are designed for a broad range of applications and are characterized by high flexibility, scalability, and universality. In this context, a cost-effective and power-efficient control bus named master bus (MBUS) [10] was proposed in our previous work for specific IoT applications, striking a better balance between the limited energy of tiny embedded chips and the high speed required by complex computations. Furthermore, it was improved in [11] to preselect the data sequence for Advanced Encryption Standard (AES) engines, so that the state buffering and rescheduling overhead can be reduced.
Based on the MBUS protocol, this paper presents a novel bus encoding technique on the existing address signal in order to reduce the dynamic power. More specifically, the contributions are as follows:
(1) We represent four different transfer types with two always-zero bits of the address signal, by which the switching activity can be decreased and thus the IO power dissipation can be lowered.
(2) As a case study, we apply this method to a generic control/central bus, MBUS, and introduce the basic idea of the hardware structures. Field-programmable gate array (FPGA) results show that total power falls to an average of 70.96% of that of the original design.
The rest of the article is structured as follows: Section 2 presents related work and Section 3 discusses our proposed encoding methodology. Section 4 introduces the hardware implementation. We then demonstrate and evaluate the approach with simulation, synthesis, and power measurement in Section 5. Last, we summarize our work in Section 6.

Related Work
In earlier works, there were many bus encoding algorithms aiming to reduce the power dissipation on interfaces by mapping the information on IOs or signals to a form which has less transition activity than the original, such as the bus-invert encoding [12,13], gray code [14], serial T0, and combined bus-invert and T0 technology [15].
The basic idea of bus-invert encoding is to flip a transmission word when the Hamming distance between the current word and the previous word is greater than half of the bus size; otherwise, no encoding is applied to the transmission word [12]. In this way the maximum number of lines that switch can be limited to 50% of the bus size. Gray and T0 encodings target situations in which the address to be sent is consecutive to the one sent previously [14]. Furthermore, the bus-invert technique is combined with T0 in [15], thus obtaining more activity reduction than the T0 technique alone.
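As an illustration of the bus-invert rule described above, the following is a minimal Python sketch, not the implementation from [12]. The function names, the 16-bit default width, and the all-zero idle bus state are assumptions made for this example; the Hamming distance is measured against the word previously driven on the data lines.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width=16):
    """Return (bus_word, invert_flag) pairs for a sequence of data words.

    Assumption: the bus idles at all zeros before the first transfer.
    """
    mask = (1 << width) - 1
    prev_bus = 0  # hypothetical idle state of the data lines
    encoded = []
    for word in words:
        word &= mask
        if hamming(word, prev_bus) > width // 2:
            # More than half of the lines would toggle: send the inverse
            # and raise the extra invert line instead.
            bus, invert = (~word) & mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        prev_bus = bus
    return encoded
```

The extra invert line returned alongside each word is the redundant control line that this family of encodings requires.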
However, all of these encoding algorithms require redundant control lines and overhead logic to recognise the switching rate between subsequent transmissions. They are also less effective when applied to a single or interleaving transfer mode, because in this case the percentage of in-sequence bursts decreases. To overcome these issues, this paper proposes a new encoding method that uses the existing address lines. By differentiating and processing different register configuration modes, this encoding technique decreases the average toggle rate. More importantly, our proposed encoder employs only the concatenation and shifting operators of the hardware description language (HDL), which are converted to just reconnected signals and IOs after synthesis.

Proposed Encoding MBUS
In this section, we present an encoding MBUS protocol, capable of lowering the bus activity and thus decreasing the IO energy consumption.

An Analysis on MBUS Protocol.
MBUS is defined as a control bus for functional register configuration [10]. Considering the instruction operations on energy-limited chips, MBUS is designed for minimal power consumption and reduced interface complexity, so it supports only a single transfer mode with at least one command cycle and one data cycle.
As shown in Figure 1, "m_ce" and "m_wr" represent the data transfer enable signal and the transfer direction (1 for write and 0 for read), respectively. "m_data" is designed as a bidirectional interface shared by the write and read channels, so that wire usage efficiency is increased and the hardware interconnection is simplified. Since MBUS is a single-master, multislave structure, there is no arbitration, so the command stage takes only one master cycle, in which "m_ce," "m_wr," and "m_addr" are sent from master to slave.
From slave to master, a valid signal, denoted as "s_vld," is defined to handshake with the request in order to avoid metastable signals crossing different clock domains and overflows of command FIFOs. Additionally, a response delay timer is required to detect command errors. If the current response is a timeout, the command is indicated as "error" and must be "retried" or "discarded" by the master.
By analyzing the MBUS protocol, it can be observed that the sequence of command stages can be predicted, exploited, and rearranged using a scheduler or an arbiter [16]. Moreover, reducing the toggle rate on "m_addr" in the command stage can dramatically decrease the entire MBUS IO power because of its multibit definition.

Functional Register Map.
As an example, our work uses a digital system built around an 8-bit microprocessor with 16-bit address lines, which can address up to 64 KB of memory. The hardware of the system is arranged so that devices on the address bus respond only to the particular addresses intended for them, while all other addresses are ignored. This is the job of the address decoding circuitry, which establishes the memory map of the system. For instance, the system's register map may look like Figure 2. This memory map contains 16 bases, each for one application-specific module or peripheral, which is also quite common in actual system architectures.
As a case study, the first 4 KB of address space, denoted as Base#0 in Figure 2, may be allotted to random access memory (RAM), another 4 KB of Base#1 to read only memory (ROM), and the remainder to a variety of other devices such as host or device interfaces, timers, counters, and wireless devices. In one of the bases, Base#F used by a host interface, for example, the offset or pointer "0xF18" represents the transfer length, "0x000" indicates the transfer direction, 1 for write and 0 for read, and "0xF0C" performs the transfer enable. Hence the CPU can send 512 B of data out through the host interface by sequentially writing "0x200" to the memory location "0xFF18," "0x1" to the memory location "0xF000," and "0x1" to the memory location "0xFF0C." The address switching rates are then 37.5% for each of the two subsequent operations.
An alternative way to drive the bus is in the order of "0xF000," "0xFF18," and "0xFF08." In this way the toggle rate of the first transmission does not change, but the switching rate of the second decreases to 6.25%. Furthermore, since consecutive or incrementing register configuration, such as "0xF000," "0xF004," "0xF008," and "0xF00C" in sequence, is very common in software configuration, an encoding bus can be employed to reduce the toggle rate instead of directly sending the original addresses.
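The switching rates quoted above are Hamming distances between successive 16-bit addresses divided by the bus width. The short Python sketch below reproduces them; the helper names `toggle_rate` and `sequence_rates` are hypothetical and introduced only for this illustration.

```python
def toggle_rate(prev: int, cur: int, width: int = 16) -> float:
    """Fraction of address lines that change between two bus words."""
    mask = (1 << width) - 1
    return bin((prev ^ cur) & mask).count("1") / width


def sequence_rates(addrs, width: int = 16):
    """Toggle rate of every transition in an address sequence."""
    return [toggle_rate(a, b, width) for a, b in zip(addrs, addrs[1:])]


# Address orders taken from the text.
first_order = sequence_rates([0xFF18, 0xF000, 0xFF0C])
alternative_order = sequence_rates([0xF000, 0xFF18, 0xFF08])
```

Here `first_order` evaluates to `[0.375, 0.375]` and `alternative_order` to `[0.375, 0.0625]`, matching the 37.5% and 6.25% figures in the text.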

The Address Encoding Method.
In our study, the data bus is word-sized, so the least significant 2 bits of the MBUS address are always 0. Instead of filling this 2-bit field with zeros, we redefine it as 2 flags, "Same Base" and "Consecutive Address." As shown in Figure 3, the "Same Base" field indicates that the previous and current addresses on the bus are from the same base, 1 for valid and 0 for invalid. Likewise, the "Consecutive Address" field indicates that the two addresses are consecutive in the memory location. In other words, when the "Consecutive Address" flag is asserted, the present address is the previous address plus 4, since the bus is word-sized.
More specifically, Algorithm 1 introduces the encoding procedure and Figure 4 explains the encoding method with four different test cases.
(1) Mode#0: software configurations are generally consecutive and incrementing in the memory location. In this case the base and offset fields hold their previous values, and both 2-bit identifiers, "Same Base" and "Consecutive Address," are asserted. As the example in Figure 4 shows, since the upper 14 bits of the 16-bit current address do not change in Mode#0, the IO switching power decreases dramatically. On the decoder side, one 10-bit adder is required to compute the original address (the least significant 2 bits are always 0 and the most significant 4 bits are directly connected).
(2) Mode#1: when the transfers are consecutive and base-crossing, the current "Base Address" field is updated and the "Same Base" flag is deasserted. Because base-crossing results in a large percentage of IO flipping, this method can reduce the power dissipation considerably. As the example in Figure 4 shows, the "Offset Address" field holds the previous address, so that the flipping rate is reduced from 68.75% (0x0FFC to 0x1000) to 12.5% (0x0FFC to 0x1FFD). Unlike in Mode#0, on the slave side the original address can be decoded simply by concatenating the 4-bit "Base Address" with the initial offset of the current base, which in this case is 12 bits of 0.
(3) Mode#2: memory addresses are often interleaved within the same base. In this case the 2-bit identifiers are set to binary "10," and the power consumption is similar to that of the traditional address transmission because the toggle rate is similar. As shown in Figure 4, in the third case the number of switching bits changes from 6 to 7.
(4) Mode#3: in the case of base-crossing and address-interleaving, the switching rate is not modified. For instance, the least significant 2 bits, "Same Base" and "Consecutive Address," are both 0 in Figure 4 and the "Base Address" and "Offset Address" fields have to be updated, so that the dynamic IO power is the same as that of the conventional bus communication.
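The four modes above can be sketched in Python as follows. This is one reading of the mode descriptions, not a transcription of Algorithm 1 or of the Verilog design. It assumes that "Same Base" occupies bit 1 and "Consecutive Address" bit 0 (consistent with the identifier "10" for Mode#2 and the word 0x1FFD for Mode#1), that the first transfer is sent unencoded, and that all addresses are 16-bit and word-aligned. The function names are hypothetical.

```python
def encode(addrs):
    """Map word-aligned 16-bit addresses to encoded bus words."""
    bus_words = []
    prev_addr = None  # previous original address
    prev_bus = 0      # previous word driven on the address lines
    for addr in addrs:
        assert 0 <= addr <= 0xFFFF and addr & 0x3 == 0
        if prev_addr is None:
            bus = addr  # assumption: first transfer goes out unencoded
        else:
            same_base = (addr >> 12) == (prev_addr >> 12)
            consecutive = addr == ((prev_addr + 4) & 0xFFFF)
            if same_base and consecutive:
                # Mode#0: hold base and offset lines, flags = 11.
                bus = (prev_bus & 0xFFFC) | 0b11
            elif consecutive:
                # Mode#1: update base, hold offset lines, flags = 01.
                bus = (addr & 0xF000) | (prev_bus & 0x0FFC) | 0b01
            elif same_base:
                # Mode#2: send the address, flags = 10.
                bus = addr | 0b10
            else:
                # Mode#3: send the address unchanged, flags = 00.
                bus = addr
        bus_words.append(bus)
        prev_addr, prev_bus = addr, bus
    return bus_words


def decode(bus_words):
    """Recover the original addresses from encoded bus words."""
    addrs = []
    prev_addr = 0
    for bus in bus_words:
        flags = bus & 0x3
        if flags == 0b11:
            # Mode#0: previous offset plus 4, base unchanged.
            addr = (prev_addr & 0xF000) | ((prev_addr + 4) & 0x0FFC)
        elif flags == 0b01:
            # Mode#1: new base concatenated with a zero offset.
            addr = bus & 0xF000
        else:
            # Mode#2 and Mode#3: strip the flag bits.
            addr = bus & 0xFFFC
        addrs.append(addr)
        prev_addr = addr
    return addrs
```

With these assumptions, `encode([0x0FFC, 0x1000])` yields the bus words 0x0FFC and 0x1FFD, which is the Mode#1 example given in the text, and `decode` recovers the original pair.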

Implementation of the Encoding MBUS Structure
With the address encoding concepts introduced in Section 3, we now implement the design under test (DUT) and evaluate it using a performance evaluation methodology [17].
As a case study, we consider a 16-bit-wide MBUS address for both the original and the encoding tests. The DUT structure is shown in Figure 5. First we design an MBUS master using Verilog HDL. The basic module of the master is a bus encoder. Then, we implement a decoder on the slave side as a verification intellectual property (VIP).
In our work, we use Mentor Graphics ModelSim 10.4d as the simulator. After simulation we obtain the waveforms (VCD), and after synthesis we obtain the netlists (NCD). These files are required to analyze the power consumption using XPower Analyzer (XPA). Moreover, Xilinx ISE 14.7 is employed as the synthesis tool with the target device Spartan6-XC6SLX4L in our study.
We perform a 10 µs simulation to examine the tests. For example, Figure 6(a) shows a vector of a 4-beat increment transfer. The addresses on the encoding MBUS are "0x0FF0," "0x0FF3," "0x0FF3," and "0x0FF3." Notice that the last three transactions are not modified, so that the switching activity, calculated as 2/(16 × 4) in this case, is lower than the toggle rate of the original MBUS. Since the original MBUS address must change in each transaction, in the order of "0x0FF0," "0x0FF4," "0x0FF8," and "0x0FFC," its toggle rate is computed as 3/(16 × 4). Although the improvement is limited for this 4-beat transfer, the IO power dissipation is reduced much more when the number of transactions is large.
In Figure 6(b), the power saving is greater than in the first case, because the entire 10-bit offset field holds its previous value. The switching rate is only 2/16 for the encoding MBUS, whereas the original bus has a toggle rate of 11/16. For Mode#2 and Mode#3, the power consumption of the encoding bus is similar to that of the original MBUS; hence, the power saving is very low.
In what follows, the power consumption reports and corresponding signal rate information are generated by the XPA tool. We claim and experimentally show that there is an almost linear relationship between the signal rate reduction and the power reduction for bus transfers. As the example in Table 1 shows, the IO power dissipation is 16.04 mW with a 12.2% signal rate on the original MBUS (third row), and the IO power cost is reduced to 2.53 mW owing to the very low signal rate (1.9%) on the encoding MBUS (fourth row).
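The near-linear relationship can be checked directly from the two Table 1 rows quoted above. The Python sketch below is only an arithmetic illustration; `relative_reduction` is a hypothetical helper, and the four input values are the ones stated in the text.

```python
def relative_reduction(original: float, encoded: float) -> float:
    """Fractional decrease from the original value to the encoded value."""
    return (original - encoded) / original


# Values quoted from Table 1: signal rate in percent, IO power in mW.
signal_rate_cut = relative_reduction(12.2, 1.9)
io_power_cut = relative_reduction(16.04, 2.53)
```

The signal rate drops by about 84.4% and the IO power by about 84.2%, so the two reductions differ by well under one percentage point.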
Furthermore, we summarize the total power dissipation, including both static and dynamic power, in Table 2. As the example in the third and fourth rows shows, in Mode#0 the power consumed by our proposed encoding method is reduced by (15 mW/29 mW) × 100 = 51.7% compared with the original bus.
In the same way we can estimate the power dissipation of the other test modes. Generally, in Mode#1 (fifth and sixth rows) up to 70.9% of the power is saved, but in Mode#2 (seventh and eighth rows) and Mode#3 (ninth and tenth rows) the power dissipation is very similar to that of the original MBUS, which confirms the expectation and analysis in Section 3.

Results and Analysis
In this section, the system performance in terms of speed and power consumption is analyzed and summarized.
In general, it can be observed that our proposed work significantly reduces the total power consumption in Mode#0 and Mode#1 by lowering the IO power, that is, by decreasing the switching activity. In the best case, Mode#1, the toggle rate and IO power fall to as little as 18.9% of the original values, and the total power in the same test falls to 29.1% of the original, which corresponds to the 70.9% saving.
The worst case occurs in Mode#2, where the power consumption of our design rises to 1.1 times that of the original MBUS. Assuming each test mode accounts for 25% of the transfers, the proposed method reduces the system power to an average of 70.96% of that of the original bus. Since the consecutive (Mode#0) and bank/page/base-crossing (Mode#1) cases occur frequently in register configuration, our proposed work is suitable for control/central buses such as MBUS.
Moreover, the system speed is estimated in Table 3. The maximum operational frequency can reach 323.217 MHz with a critical path of 3.094 ns, as shown in the second row. As shown in the other rows, the DUT meets all the timing constraints such as setup and hold time.
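The reported maximum frequency is the reciprocal of the critical path delay. As a quick arithmetic check (a sketch with a hypothetical helper name, using only the two numbers quoted above):

```python
def max_frequency_mhz(critical_path_ns: float) -> float:
    """Maximum clock frequency in MHz for a critical path given in ns."""
    return 1e3 / critical_path_ns


f_max = max_frequency_mhz(3.094)
```

This gives roughly 323.2 MHz; the small gap to the reported 323.217 MHz is presumably due to rounding of the path delay in the timing report.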

Conclusions
This paper proposes a novel address encoding method and applies it to the MBUS protocol as a case study. Power analysis results show that, with equal weight given to the different test modes, total power falls to an average of 70.96% of that of the original MBUS, and that up to 70.9% of the power is saved in the specific case of base-crossing, incrementing addresses.
So far the encoder has been considered as the design under test; in the future we will focus on performance evaluation of the whole system, involving both masters and slaves. Compared with the master design, which uses just concatenation operators, the decoder implementation will cost more slices and power because of its complex address decoding structure, involving one 10-bit adder, one 4-channel multiplexer, 14 latches, and some flip-flops for a state machine.

Figure 4: Examples of different register configuration modes.

Table 1: Signal rate and power dissipation.

Table 2: Power dissipation of original and encoding MBUS.