Design of Smart Power-Saving Architecture for Network on Chip

In a network-on-chip (NoC), transferring data through virtual channels avoids data loss and deadlock. Each input or output port of a router contains many virtual channels. Because a router includes five I/O ports, the power consumed by the virtual channels is an important issue. In this paper, a novel architecture, namely, Smart Power-Saving (SPS), is proposed for low power consumption and low area in the virtual channels of an NoC. The SPS architecture adapts to different environmental factors to dynamically save power and optimize area in the NoC. Compared with related works, the proposed method reduces power consumption by 37.31%, 45.79%, and 19.26% and reduces area by 49.4%, 25.5%, and 14.4%, respectively.


Introduction
In recent years, 3-dimensional ICs and TSV (through-silicon via) technology have been proposed to solve area issues. The 3-dimensional IC of the Intel Ivy Bridge processor and the 16-core multicore architecture can be implemented in 22 nm [1]. Multicore and heterogeneous systems are therefore a popular research topic in SoC (system-on-chip) design. These architectures require high throughput and performance to transfer data in a multicore SoC. The NoC (network-on-chip) has been proposed to meet this requirement, but it introduces new problems such as power consumption and area [2,3].
The NoC architecture [1] consists of processing elements (PEs), network interfaces (NIs), routers, and a topology, as shown in Figure 1. The PEs transfer information to the NI, and the NI packages the information into flits and passes them to the routers. Routers are of three kinds: corner router (CR), edge router (ER), and router (R). The CR, ER, and R have three, four, and five I/O ports, respectively, and each port includes a number of virtual channels. A router includes the transmission channel, routing computation (RC), virtual channel arbiter (VA), switch arbiter (SA), and crossbar (XBAR). The flits comprise header, body, and tail flits; the header flit carries the PE priority, source address, destination address, and so forth. The RC uses the header flit and the routing algorithm to find the transmission path. The VA uses two-stage arbitration to select the highest-priority packet for transmission and then signs up a transmission channel. The SA uses two-stage arbitration to select the body flits that enter the XBAR for transmission. The VA operates when a packet arrives, and the SA operates when a flit arrives. The tail flit is the last flit; after it passes, the router unregisters the transmission channel. Router topologies include mesh, star, and fat tree [4,5].
Yoon et al. [6] showed in their analysis that virtual channels (VCs) can avoid routing and protocol deadlock and improve routing performance when packet traffic is congested. VCs solve the hard problem of packet switching, but they introduce power, area, and other issues in the NoC.
Nicopoulos et al. [2] proposed the IntelliBuffer architecture, which addresses PV (process variation) to reduce power consumption in layer 1 [7]. It differs from the conventional architecture in two fundamental ways. First, the slots use clock-gating to reduce power consumption when they are empty. To avoid losing data in transmission, the clock of one slot in each I/O port stays enabled to accept data. Second, the router creates a leakage classification register (LCR) table, and the write and read pointers always access the slots with the lowest power consumption according to the LCR table.
Taassori et al. [3] proposed an adaptive data compression technique that reduces the number of packet bits in layer 3 [7]. It reduces the number of transmissions and therefore lowers the power consumption of the router. Palma et al. [8] use the T-Bus-Invert technique to reduce the Hamming-distance transition activity and thereby the power consumption. Jafarzadeh et al. [9] use end-to-end data coding to minimize the switching activity and the routing path and thereby reduce NI power consumption. Lee et al. [10] proposed a buffer clock-gating architecture that uses clock-gating to reduce the transmission power consumption when slots are empty and full. Ezz-Eldin et al. [11] proposed an adaptive virtual channel with two parts in layer 1 [7]. First, the work uses a hierarchical multiplexing tree for the virtual channels (VCs) to reduce area. Second, it uses clock-gating to reduce power consumption. Rosa et al. [12] proposed dynamic frequency scaling in the PE for NoCs. It considers the communication and loading rates to control the router frequency and reduce the power consumption.
Huaxi et al. [13] proposed a fat tree-based optical NoC; the architecture covers topology, placement, layout, and protocol. That work proposed a low-power, low-cost optical turnaround router to reduce power consumption. Gu et al. [14] proposed the Cygnus router, which optimizes the router algorithms to reduce power consumption. Swaminathan et al. [15] create two FIFOs in the NI and use dynamically configured data access to the two FIFOs to improve throughput and power consumption.
In the next section we analyse the power consumption under different VC accesses. In Section 3 we introduce the topology and router packet architecture and add the SPS to the router to save power. In Section 4 we present the SPS with the router design. Section 5 contains experimental results and Section 6 concludes this paper.

Power Issue with Virtual Channels
Multicore architectures and big data communication will become more common in the next generation of chips. Traditional communication technologies cannot carry the large amount of traffic on multicore and heterogeneous chips. The NoC can solve this issue: it uses network-style transmission so that different cores can communicate at the same time. However, although the NoC solves the communication issue, heavy data access increases the power consumption.
The router, composed of the arbitration unit and the transmission unit [16], is illustrated in Figure 2. The arbitration unit selects the highest-priority packet to send to the next router. It includes routing computation (RC), the VC arbiter (VA), and the switch arbiter (SA). The RC calculates routing paths and priorities. The VA contains a number of two-stage arbiters that select packets and sign up VCs. The first stage selects the local highest-priority packet from the input VCs to the crossbar and signs up VCs. The second stage selects the global highest-priority packet from the crossbar input to the output VCs and signs up VCs. The SA also contains a number of two-stage arbiters, which select flits for transmission. The first stage selects the local highest-priority flits from the input VCs to the crossbar. The second stage selects the global highest-priority flits from the crossbar input to the output VCs. The VA executes once per packet and the SA executes once per flit.
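The two-stage selection described above can be sketched as follows. This is a minimal illustration, not the paper's arbiter: the names `local_stage`, `global_stage`, and `two_stage_arbitrate` are hypothetical, a larger integer is assumed to mean a higher priority, and all requests are assumed to compete for the same output port.

```python
from typing import Dict, List, Optional, Tuple

# A request is (priority, packet_id); a larger priority wins (assumption).
Request = Tuple[int, str]


def local_stage(port_vcs: List[Optional[Request]]) -> Optional[Request]:
    """Stage 1: pick the highest-priority request among one input port's VCs."""
    candidates = [r for r in port_vcs if r is not None]
    return max(candidates, default=None)


def global_stage(
    winners: Dict[str, Optional[Request]]
) -> Optional[Tuple[str, Request]]:
    """Stage 2: pick the highest-priority stage-1 winner across input ports."""
    live = [(req, port) for port, req in winners.items() if req is not None]
    if not live:
        return None
    req, port = max(live)
    return port, req


def two_stage_arbitrate(
    inputs: Dict[str, List[Optional[Request]]]
) -> Optional[Tuple[str, Request]]:
    """Run both stages; returns (winning input port, winning request)."""
    winners = {port: local_stage(vcs) for port, vcs in inputs.items()}
    return global_stage(winners)
```

Ties are broken here by tuple comparison on the packet identifier and port name, which is an arbitrary choice made only to keep the sketch deterministic.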
The transmission unit of the router is illustrated in Figure 3. This unit includes VCs that carry large packets from the input physical channel to the output physical channel. The power consumption of the VCs is calculated as shown in (1), in terms of four variables: the number of packets or flits accessed in the VCs, the access frequency of the VCs, the capacitance, and the voltage of the VCs. Nicopoulos et al. [2] and Katabami et al. [17] proposed clock-gating to solve this issue.
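For orientation, the four quantities named for (1) are the ones that appear in the standard CMOS dynamic power model. Assuming (1) follows that model, and using the hypothetical symbols $N$ (number of accessed packets or flits), $f$ (access frequency), $C$ (capacitance), and $V$ (voltage), a consistent form would be:

```latex
P_{\mathrm{VC}} \;\propto\; N \cdot f \cdot C \cdot V^{2}
```

Under this assumed form, stopping the clock of a VC that has nothing to do removes its contribution to the frequency-dependent term, which is the lever that clock-gating uses.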
In this paper, we propose dynamic control of each virtual channel clock in different transmission environments. Whether or not packet transfer is complete, the SPS can effectively reduce the power consumption without affecting the transmission performance. If the two addresses being compared are equal, the flit has arrived.

Router and Topology with SPS
Otherwise, the X-Y routing algorithm proceeds in two stages. In stage one, the flits are forwarded by the X-axis routers until the current X address equals the destination X address. In stage two, the flits are forwarded to the destination by the Y-axis routers. The virtual channel is initialized when a packet is transmitted between two routers; this procedure is shown in Algorithm 1.
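The two routing stages can be sketched as a next-hop function. This is a minimal illustration under stated assumptions: routers are addressed by `(x, y)` coordinates, and the direction names and axis orientation (`east` increases x, `north` increases y) are hypothetical choices, not taken from the paper.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) router address in the mesh (assumed layout)

# Assumed orientation: east/west move along X, north/south move along Y.
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def xy_next_hop(current: Coord, dest: Coord) -> str:
    """Return the output direction for one hop of X-Y routing."""
    cx, cy = current
    dx, dy = dest
    if cx != dx:  # stage one: travel along X until the X addresses match
        return "east" if dx > cx else "west"
    if cy != dy:  # stage two: travel along Y to the destination
        return "north" if dy > cy else "south"
    return "local"  # addresses are equal: the flit has arrived


def xy_path(src: Coord, dest: Coord) -> List[Coord]:
    """List every router visited from src to dest, both included."""
    path = [src]
    cur = src
    while True:
        direction = xy_next_hop(cur, dest)
        if direction == "local":
            return path
        cur = (cur[0] + STEP[direction][0], cur[1] + STEP[direction][1])
        path.append(cur)
```

In a 2 × 2 mesh, for example, `xy_path((0, 0), (1, 1))` visits `(0, 0)`, `(1, 0)`, and `(1, 1)`: one X hop followed by one Y hop.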
The arbiter control method is designed around the different transmission modes. The VC arbiter and switch bar follow the topology and priority used to design the routing computation unit. Algorithm 2 constructs the two-stage VC arbitration performed per packet. Stage 1 selects the high-priority packet to enter the crossbar from the local VCs (input VCs) at lines 3 to 4 and lines 8 to 10. Stage 2 selects the most important packet for transmission from the global VCs (output VCs) at lines 5 to 6 and lines 11 to 13.

Sign up Algorithm
Input:  ℎ and   .
(1) while (flits arrival) do
(2)   if ( ℎ2 is header and  is free channel)
(3)     {sign up the channel and select the channel to output}
(4)   else if ( ℎ2 is body and  =  ℎ2 )
(5)     {select the channel to output}
(6)   else if ( ℎ2 is tail and  =  ℎ2 )
(7)     {clear the channel and select the channel to output;}
(8)   else
(9)     {read back flit to virtual channel}
(10) end while

In the transmission channel architecture, the router has four directions that connect to other routers and one local physical channel that connects to the PE. Every physical channel except the local one has VCs. The switch bar supports transmission of the most important packet to the output channel. The SPS controls the power consumption of each VC when the channel status changes. The SPS architecture is introduced in the next section.
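The sign-up steps listed above can be sketched in code. This is a minimal illustration, not the paper's implementation: the `Flit` and `Channel` types are hypothetical, and the equality tests in the algorithm are assumed to compare the packet that currently owns the channel with the packet the arriving flit belongs to.

```python
from dataclasses import dataclass
from typing import Optional

HEADER, BODY, TAIL = "header", "body", "tail"


@dataclass
class Flit:
    kind: str       # HEADER, BODY, or TAIL
    packet_id: int  # hypothetical field identifying the owning packet


class Channel:
    """One transmission channel that a packet signs up and later clears."""

    def __init__(self) -> None:
        self.owner: Optional[int] = None  # None means the channel is free

    def handle(self, flit: Flit) -> str:
        """Return "output" if the flit is forwarded, else "read_back"."""
        if flit.kind == HEADER and self.owner is None:
            self.owner = flit.packet_id  # sign up the channel
            return "output"
        if flit.kind == BODY and self.owner == flit.packet_id:
            return "output"
        if flit.kind == TAIL and self.owner == flit.packet_id:
            self.owner = None  # clear the channel after the last flit
            return "output"
        return "read_back"  # flit stays in its virtual channel
```

A header for a second packet is read back while the first packet still holds the channel, and it succeeds only after the first packet's tail flit has cleared the channel.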

Topology Architecture.
A topology defines the packet transmission paths between routers and links. Router connection topologies are shown in Figure 6; they include star, mesh, ring, and tree topologies. In the arbitration unit, the RC algorithm depends on the topology, whereas the VA and SA algorithms depend on packet priority. In this paper, the topology is a 2 × 2 mesh, the RC algorithm is X-Y routing, and the VA and SA algorithms are lottery based [18].
The connection between a router and a PE is shown in Figure 7; the PE and the router exchange information through the network interface (NI), which handles the information passed between them. The NI has a two-level design [19], as shown in Figure 8. It contains three modules to meet the specifications of the different layers. The shell module must meet the IP specification. The kernel module must meet the NoC topology specification.

Flits with Router Architecture.
The flit specification for the router is shown in Figure 9. The 2-bit flit type 00 represents a packet that consists of a single flit; this flit type does not sign up VCs. The type 01 represents the header flit, which includes routing information and the address; this flit type always signs up a channel. The type 10 represents the body flit, which carries transmission information; its payload records a segment of the packet. The type 11 represents the tail flit, which carries the last transmission information; it not only records the last packet segment but also clears the VCs.
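The 2-bit flit type field can be sketched as a small encoder and decoder. Only the four type codes come from the text; the 32-bit payload width, the placement of the type field in the top bits, and all function names are assumptions made for illustration, since the field layout of Figure 9 is not reproduced here.

```python
from enum import IntEnum
from typing import Tuple


class FlitType(IntEnum):
    SINGLE = 0b00  # one-flit packet, does not sign up VCs
    HEADER = 0b01  # routing information and address, signs up a channel
    BODY = 0b10    # a segment of the packet
    TAIL = 0b11    # last segment, clears the VCs


PAYLOAD_BITS = 32  # assumed width, not taken from the paper
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def pack_flit(ftype: FlitType, payload: int) -> int:
    """Place the 2-bit type above the payload (assumed layout)."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in the assumed field width")
    return (int(ftype) << PAYLOAD_BITS) | payload


def unpack_flit(word: int) -> Tuple[FlitType, int]:
    """Split a packed word back into (type, payload)."""
    return FlitType(word >> PAYLOAD_BITS), word & PAYLOAD_MASK


def signs_up_vc(ftype: FlitType) -> bool:
    return ftype is FlitType.HEADER


def releases_vc(ftype: FlitType) -> bool:
    return ftype is FlitType.TAIL
```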

SPS with Router Design
A VC contains many slots to access data, which leads to extra power consumption. In this paper, we propose the SPS architecture to reduce this power consumption.

4.1. Router with SPS Architecture.
The proposed router with SPS architecture is illustrated in Figure 10. The physical channels (PCs) connect to other routers and carry information. The input VCs (IVCs) store information from the PCs; they are usually implemented as FIFOs or other sequential logic. The arbiter, which includes the RC, VA, and SA, decides flit priority and controls the input switch logic (ISL) and output switch logic (OSL) to transmit flits. The crossbar (CR) connects the IVCs to the OVCs under the switch signal from the arbiter. The output VCs (OVCs) store information from the CR. The proposed SPS uses the transmission channel status to dynamically control the IVC and OVC clocks so that they run only for essential operations.
The VCs with SPS architecture are illustrated in Figure 11. The SPS controls the system clock fed to the input and output VCs to reduce power consumption. In this architecture, each VC contains multiple slots, numbered from 0, to access data. In (3), the quantity with subscript 1 is the power consumption with empty gating. The clock-gating architecture does not control the clock when the VCs are in the full state, in which the VCs keep storing flits while waiting for transmission.
The SPS consumes power in Clock Block B. Our analysis of the SPS architecture is shown in (4), where the quantity with subscript 2 is the power consumption of the SPS. It saves the power that the VCs would otherwise consume under both empty and full gating.
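The difference between the three schemes can be illustrated by counting clocked cycles over a timing trace. This is a simplified sketch under stated assumptions: Clock Block A means the VC has nothing to transmit, B means it is writing, and C means data are waiting to be transmitted; the architecture without clock-gating is clocked in all three blocks, clock-gating in B and C, and the SPS in B. The trace format and function name are hypothetical, and the overhead of the gating logic itself is ignored.

```python
from collections import Counter

# Which clock blocks still receive a clock under each scheme (see lead-in).
CLOCKED_BLOCKS = {
    "none": {"A", "B", "C"},     # no clock-gating
    "clock_gating": {"B", "C"},  # empty gating only
    "sps": {"B"},                # empty and full gating
}


def clocked_cycles(trace: str, scheme: str) -> int:
    """Count cycles in which the VC is clocked; trace is a string of A/B/C."""
    counts = Counter(trace)
    return sum(counts[block] for block in CLOCKED_BLOCKS[scheme])
```

For a trace such as `"AABBBCCCCA"` (three idle cycles, three write cycles, four waiting cycles), the three schemes clock the VC for 10, 7, and 3 cycles, respectively, so the benefit of the SPS grows with the share of time that flits spend waiting.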

Design of SPS.
The proposed SPS uses the status of the VCs to dynamically control the clock of each VC. The CFSMs of the SPS with VCs are illustrated in Figure 13; the architecture contains two CFSMs.
The first CFSM includes the initial, empty, full, and waiting statuses. Initial status: when the VC is reset, the structure enters the initial status and stays there until a flit arrives. Empty status: when the user resets the VCs or the flits are transported to the next storage unit, the structure enters this status. Full status: the VC is full of stored flits. Waiting status: when the user resets the VCs or the flit store is complete, the structure enters this status.
The VCs with SPS algorithm is illustrated in Algorithm 4. In line 3, the VCs initialize the VC count and flags. When a channel packet or an arbiter signal arrives, the VCs access flits and update the VC count at lines 4 to 9. Whenever the VC count changes, the VC flags are updated at lines 10 to 17.
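The count-and-flag bookkeeping described above can be sketched as a small class. This is an illustration under assumptions rather than a transcription of Algorithm 4: the class and method names are hypothetical, a channel packet is modelled as `write`, an arbiter signal as `read`, and the waiting status is assumed to mean that the VC holds flits but is not full.

```python
class VirtualChannel:
    """Tracks the flit count of one VC and derives its status flag."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("a VC needs at least one slot")
        self.depth = depth       # number of slots in the VC
        self.count = 0           # flits currently stored
        self.status = "initial"  # holds until the first flit arrives

    def _update_status(self) -> None:
        # The flag follows the count (assumed mapping, see lead-in).
        if self.count == 0:
            self.status = "empty"
        elif self.count == self.depth:
            self.status = "full"
        else:
            self.status = "waiting"

    def write(self) -> bool:
        """A channel packet arrives; store one flit if a slot is free."""
        if self.count == self.depth:
            return False
        self.count += 1
        self._update_status()
        return True

    def read(self) -> bool:
        """An arbiter signal arrives; forward one flit if any is stored."""
        if self.count == 0:
            return False
        self.count -= 1
        self._update_status()
        return True
```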
The second CFSM includes the initial, clock-gating, and wake-up statuses. Initial status: this is the same as the initial status of the first CFSM. Clock-gating: when the VC changes to full or empty, the SPS disables the clock of this VC and enters this status. Wake up: when a flit needs to be stored, one VC wakes up.
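The second CFSM can be sketched as a transition function. This is a minimal illustration, not the paper's controller: the function names and the `store_request` input are hypothetical, a store request is assumed to take precedence over gating, and the clock is assumed to stay enabled in the initial status.

```python
INITIAL, CLOCK_GATING, WAKE_UP = "initial", "clock_gating", "wake_up"


def sps_next_state(state: str, vc_status: str, store_request: bool) -> str:
    """Next SPS status for one VC, given the VC status from the first CFSM."""
    if store_request:
        return WAKE_UP        # a flit needs to be stored: re-enable the clock
    if vc_status in ("empty", "full"):
        return CLOCK_GATING   # nothing to write: disable this VC's clock
    return state              # otherwise keep the current status


def clock_enabled(state: str) -> bool:
    """The VC clock runs in every status except clock-gating (assumption)."""
    return state != CLOCK_GATING
```

The controller stays in its current status while the VC is neither empty nor full, so a partially filled VC that was woken up keeps its clock until it fills or drains.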
The SPS algorithm is illustrated in Algorithm 5. In line 3, the SPS initializes the VC clocks and reads the status from the VCs.

Figure 4: Topology and router relation with SPS.

Figure 11: VCs with SPS architecture.

Design of SPS Control Timing.
The VC access timing diagrams of the SPS architecture are illustrated in Figure 12. Clock Block A indicates that the VCs have no information to transmit. Clock Block B indicates that the VCs are writing information. Clock Block C indicates that the data in the VCs are waiting to be transmitted. Our analysis of the architecture without clock-gating is shown in (2). The analysis distinguishes the power consumed by slot access, the power consumed when the slot content is full, the power consumed when the slot content is empty, and the remaining power consumption outside these three. The architecture without clock-gating does not control the clock of the sequential logic in the VCs, so the logic keeps consuming power in a high-transmission structure. The clock-gating architecture consumes power in Clock Block B and Clock Block C. Our analysis of the clock-gating architecture is shown in (3).