Reconfigurable logic devices (RLDs) are classified as fine-grained or coarse-grained based on their basic logic cell architecture. Each architecture has its own advantages, so it is difficult for either one to balance operation speed and implementation area across various applications. In the present paper, we propose a variable grain logic cell (VGLC) architecture, which is based on a 4-bit ripple carry adder with configuration memory bits, and develop a technology mapping tool for it. The key feature of the VGLC architecture is its variable granularity, which offers a tradeoff between the coarse-grained and fine-grained types required for implementing arithmetic and random logic, respectively. Finally, we evaluate the proposed logic cell using the newly developed technology mapping tool. The VGLC improves the logic depth by 31% and reduces the amount of configuration data by 55% on average, as compared to the Virtex-4 logic cell architecture.

System large-scale integrated circuits (LSICs), which exhibit high performance and have high densities, are manufactured using advanced process technologies. However, their high costs are an enormous disadvantage. To respond to diversifications in market trends and frequent changes in standards, various types of products must be manufactured in low volumes, despite the fact that conventional production facilities essentially target high volumes of only a few types of LSICs. It follows that the cost benefit of each chip is poor because of the increase in both the nonrecurring expense (NRE) and design cost.

Hence, a reconfigurable logic device (RLD), which has circuit programmability, is applied to embedded systems as a hardware intellectual property (IP) core. Owing to its flexibility, designs can be implemented with the shortest turn-around time from specification to implementation. In particular, design complexity and power distribution problems may be solved using a dynamically reconfigurable processor. Such flexibility is necessary in order to adapt to frequently changing market cycles, such as that of mobile phones.

However, conventional RLDs, that is, commercial field-programmable gate arrays (FPGAs), cannot implement circuits efficiently, so the chip area and power consumption increase. Because embedded systems in particular require a small area and low power consumption, these parameters are critical for such products. In terms of the quality of the actual product, conventional FPGAs are not satisfactory in either performance or function.

We have studied a reconfigurable logic architecture
that has both flexibility and high performance [

The use of a VGLC as a reconfigurable logic core.

Bug fixing

Avoidance of high volume

Version upgrade

Multiple mode/specification

The remainder of the present paper is organized as
follows. Related research is described in Section

The need to
optimize a general-purpose reconfigurable logic cell architecture is recognized
in the field of reconfigurable IP-core design. For example, mixed-grain
FPGA [

Conventional RLDs are roughly classified into
fine-grained and coarse-grained architecture types based on the granularity of
the basic logic block [

However, these RLDs achieve efficient implementation only in their specific application domains. If a fine-grained RLD implements a digital filter, which requires several arithmetic operations, a large circuit area is required. On the other hand, coarse-grained RLDs incur area and delay overheads when implementing glue logic.

In order to overcome this disadvantage, traditional
RLDs have dedicated circuits inside and outside the logic block. Table

Structures of typical RLDs.

Device | Virtex-4 (XC4VLX15) [ | DRP-1 [ |
---|---|---|

Logic block structure | 4-LUT | 8-bit ALU |

Logic block structure | Carry logic | 8-bit DMU |

Logic block structure | MULT AND | Register file |

Logic block structure | Distributed RAM 64 bit | (8 bit |

No. of logic blocks (LBs) | 112,288 | 512 |

Dedicated IP | 18-bit multiplier | 32-bit multiplier |

In this section, we present a type of application
domain analysis. For this purpose, we use six types of OpenCores [

We first synthesize six industrial designs using the Synopsys Design Compiler with an ASIC standard cell library. Here, we extract the datapath circuits in the form of macroblocks using the Synopsys DesignWare library. This allows the extraction of various macroblocks, such as adders, multipliers, and wide-ranging MUXes. Circuits that are not detected as macroblocks are categorized as random logic computation circuits.

We mapped these designs onto standard cells with a CMOS
0.35-

Synthesis results of six benchmark circuits.

In the following section, we propose a homogeneous
architecture in which, as discussed in Section

As mentioned above, since the granularity of a conventional logic cell is fixed, it is difficult to implement circuits efficiently for all applications. This problem can be solved if the computational granularity of a logic cell can be changed. We propose a logic cell that is based on a 4-bit ripple carry adder with configuration memory bits. In this section, we first describe a hybrid cell (HC), which is used as the basic element of the VGLC. Next, we introduce the novel logic cell structure and its functions.

The arithmetic computation is based on an adder, and
the random logic is expressed efficiently in a “canonical form,” such as a
lookup table. Figure

Basic structure of a hybrid cell.

Each part enclosed by brackets corresponds to a
configuration memory bit. This equation can represent the 16 logic patterns
with a 4-bit configuration memory as well as a 2-input LUT. Both Figures
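
The bracketed equation itself is not reproduced in this text. As a minimal sketch, assuming it is the standard 2-input positive-polarity Reed-Muller form f(x, y) = c0 ⊕ c1·x ⊕ c2·y ⊕ c3·x·y with one configuration memory bit per coefficient, the following Python check illustrates why 4 bits suffice for all 16 two-input functions, just as with a 2-input LUT. The names `reed_muller_2` and `truth_table` are illustrative and do not come from the paper.

```python
from itertools import product

def reed_muller_2(c0, c1, c2, c3, x, y):
    """2-input Reed-Muller form; c0..c3 play the role of configuration bits (assumed layout)."""
    return c0 ^ (c1 & x) ^ (c2 & y) ^ (c3 & x & y)

def truth_table(cfg):
    """Truth table of one configuration, ordered (x, y) = 00, 01, 10, 11."""
    return tuple(reed_muller_2(*cfg, x, y) for x, y in product((0, 1), repeat=2))

# Enumerate all 2^4 configurations and collect the distinct functions they realize.
tables = {truth_table(cfg) for cfg in product((0, 1), repeat=4)}
print(len(tables))  # number of distinct 2-input functions covered
```

Because every configuration yields a different truth table, the mapping between configuration bits and 2-input functions is one-to-one, which is the property the text attributes to the hybrid cell.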

Furthermore, to allow increased functionality beyond a
1-bit full adder and 2-input Reed-Muller canonical form, we construct a
structure called basic logic element (BLE), as shown in
Figure

Components and functions of a BLE.

We propose a VGLC architecture with HC.
Figure

Variable grain logic cell architecture.

The carry_in and carry_out terminals are connected directly to the
adjoining VGLCs by dedicated lines. The add/sub control signal (AS) and the
carry selector memory (CP) are shared by all four BLEs. As shown in Figure

Basic function of a VGLC.

Arith. Mode

Shift-register Mode

Random Logic Mode

Misc. Logic Mode

Wide-range MUX Mode

When each BLE is used as a full adder, the VGLC forms a 4-bit ripple carry adder or subtractor with a carry line between the BLEs. Since the carry path is created via the local interconnection within the logic block, the carry propagates at high speed. The bit width of the computation can be expanded using the dedicated carry_in and carry_out lines. The add/sub function can be selected dynamically using AS.
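
The arithmetic mode can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: subtraction is modeled as two's-complement addition (AS inverts the B operand and is also fed into the least significant carry_in), which the text does not confirm, and the function names `full_adder`, `vglc_arith`, and `add_sub` are hypothetical.

```python
def full_adder(a, b, cin):
    """1-bit full adder, the role one BLE plays in arithmetic mode."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def vglc_arith(a_bits, b_bits, carry_in, as_ctrl):
    """One VGLC: four chained full adders, bits given LSB first.
    as_ctrl = 0 adds, as_ctrl = 1 subtracts (assumed two's-complement scheme)."""
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b ^ as_ctrl, carry)
        sums.append(s)
    return sums, carry

def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def add_sub(a, b, as_ctrl, width=8):
    """Chain width/4 VGLCs through carry_in/carry_out to widen the operation."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    carry, out = as_ctrl, []
    for i in range(0, width, 4):
        sums, carry = vglc_arith(a_bits[i:i + 4], b_bits[i:i + 4], carry, as_ctrl)
        out.extend(sums)
    return from_bits(out), carry

print(add_sub(100, 27, 0))  # 8-bit addition across two chained cells
print(add_sub(100, 27, 1))  # 8-bit subtraction with the same structure
```

The model shows why the dedicated carry lines matter: the only signal that crosses a cell boundary is the single carry bit, so widening the operand does not require general routing.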

Figure

Output selector part.

When a BLE is used in the canonical form, the VGLC is
composed of a 3-input or 4-input canonical form (3-CF or 4-CF) with MUX10-12
(see Figure

It has been shown
that the VGLC can represent 2-, 3-, and 4-input canonical forms. However, this
requires a larger area than 2-, 3-, and 4-LUTs, respectively. In order to
reduce these overheads, we use the misc. logic (miscellaneous logic) function,
which exploits the gate structure of the BLE. Since a BLE can be used as a
4-input/2-output module, it can represent logic patterns with up to 3 or 4
inputs using 4 configuration memory bits. Table

Output logic pattern of the BLE (CP = 0).

AS | Four-input variable | Four-input variable | Three-input variable | Three-input variable |
---|---|---|---|---|

0 | 182/65,536 | 24/65,536 | 120/256 | 43/256 |

1 | 230/65,536 | 24/65,536 | 148/256 | 43/256 |

Total | 446/65,536 (0.68%) | | 206/256 (80.47%) | |

Example of a three-input misc. logic function.

By using multiple BLEs, the VGLC can also increase the number of inputs that can be expressed. With two BLEs, the coverages are 73.6% for 4-input logic and 50.3% for 5-input logic. A maximum of four BLEs can be combined.

In Figure

In order to map the five functions of the VGLC efficiently, datapath circuits (e.g., adders) and wide-ranging multiplexers (e.g., 8:1 multiplexers) are prepared in advance in the user library for the logic synthesis process. We can extract these circuits as macroblocks during logic synthesis. The remaining circuits are mapped using the Misc. Logic and Random Logic modes.

We need to implement Random Logic circuits
using the VGLC structure. UCLA FlowMap [

Correspondence to the netlist that contains the macroblock.

Addition of the Misc. Logic mode for random logic mapping.

First, since HeteroMap targets single-output LUTs, it
must be modified such that the netlist will include a multioutput macroblock.
Second, the VGLC has the

The minterm expansion of an

Two functions

The function that has the
smallest binary number representation among the functions of a

Figure

VGLC-HeteroMap.

In this evaluation, we show the number of logic cells, the logic depth, the area, and the amount of configuration data required to implement benchmark circuits. This section first gives a brief description of the environment and the models used in the following evaluations.

We need to
compare both the coarse-grained and fine-grained types fairly with the VGLC.
However, coarse-grained logic cells have various structures, and there is
no freely available CAD tool for them. Therefore, the comparison is
restricted to fine-grained logic cells of three LUT-based structure types, as shown in Figures

Homogeneous LUT structures with carry chain.

6-LUT with carry chain

4-LUT with carry chain

Heterogeneous LUT structures with carry chain.

Three components of the VGLC.

Similar to the area count, the total amount of
configuration data is characterized by the product: the number of configuration
memory bits per logic cell

The combination
delay of the 4-LUT (

Implementation parameters for VGLC.

Function | |
---|---|

1-BLE (e.g., 2-CF, 3-misc.) | 2.4 |

2-BLE (e.g., 3-CF, 4-misc., 5-misc.) | 2.5 |

4-BLE (e.g., 4-CF, 6-misc.) | 2.6 |

Note that although the 0.35-

Figure

Architecture evaluation flow.

(A) mapping method using Random Logic mode only,

(B) mapping method using (A) + Misc. Logic mode + macroblock,

(C) mapping method using each comparison logic cell.

First, the Verilog RTL netlists of the circuits are used as inputs for the synthesis tool, Synopsys Design Compiler, and are mapped onto the standard cell library. Adders and/or wide-ranging multiplexers are extracted as macroblocks using the DesignWare library. We then convert the gate-level netlists into the BLIF format using a Perl script, and the netlists are mapped using VGLC-HeteroMap. Second, each comparison logic cell is mapped using the same netlists and technology mapping tool as the VGLC. Finally, we obtain the mapping delay, the logic depth, and the number of logic cells.
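
The conversion script itself is not shown in the paper. To make the netlist-to-BLIF step concrete, the following is a hypothetical Python sketch (not the authors' Perl script) that writes a tiny gate-level netlist in BLIF. The input representation, a list of `(gate type, inputs, output)` tuples, and the names `COVERS` and `to_blif` are assumptions made for illustration; only the BLIF keywords `.model`, `.inputs`, `.outputs`, `.names`, and `.end` are part of the format.

```python
# Single-output covers for a few gate types, written as BLIF cube rows.
COVERS = {
    "AND": ["11 1"],
    "OR": ["1- 1", "-1 1"],
    "XOR": ["10 1", "01 1"],
    "NOT": ["0 1"],
}

def to_blif(model, inputs, outputs, gates):
    """Emit BLIF text for a netlist given as (gate_type, input_nets, output_net) tuples."""
    lines = [
        f".model {model}",
        ".inputs " + " ".join(inputs),
        ".outputs " + " ".join(outputs),
    ]
    for kind, in_nets, out_net in gates:
        # Each gate becomes one .names block: input nets, then the output net.
        lines.append(".names " + " ".join(list(in_nets) + [out_net]))
        lines.extend(COVERS[kind])
    lines.append(".end")
    return "\n".join(lines) + "\n"

half_adder = to_blif(
    "half_adder", ["a", "b"], ["s", "c"],
    [("XOR", ["a", "b"], "s"), ("AND", ["a", "b"], "c")],
)
print(half_adder)
```

A real flow would also have to carry the extracted macroblocks through the conversion, for example as separate subcircuits, which this sketch does not attempt.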

In this evaluation, OpenCores [

We first
evaluate the effect of the misc. logic functions. In this evaluation, five OpenCores
circuits and large/medium MCNC benchmarks are used. Table

Number of logic cells and mapping delay.

Circuit | No. of logic cells (A) | No. of logic cells (B) | Mapping delay [ns] (A) | Mapping delay [ns] (B) |
---|---|---|---|---|

C7552 | 563 | 410 | 19.9 | 15.4 |

s5378 | 430 | 262 | 14.9 | 12.7 |

C2670 | 158 | 96 | 17.3 | 12.7 |

misex3 | 1,855 | 1,667 | 17.7 | 17.5 |

seq | 1,452 | 1,006 | 17.5 | 17.4 |

ac97 | 4,154 | 2,145 | 10.1 | 7.7 |

aes | 11,590 | 4,633 | 20.5 | 19.9 |

biquad | 791 | 578 | 25.1 | 20.2 |

sha256 | 3,499 | 1,719 | 17.7 | 14.9 |

vga | 780 | 385 | 12.8 | 12.7 |

Ratio of Misc. Logic (five MCNC benchmarks).

Ratio of Misc. Logic (five OpenCores benchmarks).

Table

Normalized delay, area, and configuration data.

Circuit | Mapping delay (A) | Mapping delay (B) | Area (A) | Area (B) | Data (A) | Data (B) |
---|---|---|---|---|---|---|

C7552 | 1.00 | 0.77 | 1.00 | 0.73 | 1.00 | 0.73 |

s5378 | 1.00 | 0.85 | 1.00 | 0.61 | 1.00 | 0.61 |

C2670 | 1.00 | 0.73 | 1.00 | 0.61 | 1.00 | 0.61 |

misex3 | 1.00 | 0.99 | 1.00 | 0.90 | 1.00 | 0.90 |

seq | 1.00 | 0.99 | 1.00 | 0.69 | 1.00 | 0.69 |

ac97 | 1.00 | 0.76 | 1.00 | 0.52 | 1.00 | 0.52 |

aes | 1.00 | 0.97 | 1.00 | 0.40 | 1.00 | 0.40 |

biquad | 1.00 | 0.80 | 1.00 | 0.73 | 1.00 | 0.73 |

sha256 | 1.00 | 0.84 | 1.00 | 0.49 | 1.00 | 0.49 |

vga | 1.00 | 0.99 | 1.00 | 0.49 | 1.00 | 0.49 |

Ave. | 1.00 | 0.87 | 1.00 | 0.62 | 1.00 | 0.62 |
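
The normalized values can be reproduced from the raw cell counts and mapping delays reported for methods (A) and (B). The short check below does this for three circuits; it assumes that each normalized entry is simply the (B) value divided by the (A) value and rounded to two decimals, and the names `raw` and `normalize` are illustrative.

```python
# (cells_A, cells_B), (delay_A_ns, delay_B_ns) taken from the raw results table.
raw = {
    "C7552": ((563, 410), (19.9, 15.4)),
    "aes": ((11590, 4633), (20.5, 19.9)),
    "vga": ((780, 385), (12.8, 12.7)),
}

def normalize(a, b):
    """Value for method (B) relative to method (A), rounded to two decimals."""
    return round(b / a, 2)

# For each circuit: (normalized delay, normalized cell count).
norm = {
    name: (normalize(*delay), normalize(*cells))
    for name, (cells, delay) in raw.items()
}
for name, (delay_ratio, cell_ratio) in norm.items():
    print(f"{name}: delay {delay_ratio:.2f}, cells {cell_ratio:.2f}")
```

The cell-count ratio matches both the Area and Data columns, which is what one would expect when every logic cell contributes the same area and the same number of configuration bits.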

Table

Logic depth, area, and data for different architectures.

Circuit | VGLC depth | VGLC area | VGLC data | Virtex-4 depth | Virtex-4 area | Virtex-4 data |
---|---|---|---|---|---|---|

C7552 | 6 | 273,060 | 11,070 | 8 | 167,480 | 13,825 |

s5378 | 5 | 174,492 | 7,074 | 6 | 132,288 | 10,920 |

C2670 | 5 | 63,936 | 2,592 | 7 | 45,156 | 3,728 |

misex3 | 7 | 1,110,222 | 45,009 | 7 | 592,328 | 48,895 |

seq | 7 | 669,996 | 27,162 | 7 | 506,680 | 41,825 |

ac97 | 3 | 1,428,570 | 57,915 | 4 | 1,225,996 | 101,203 |

aes | 8 | 3,085,578 | 125,091 | 8 | 1,982,624 | 163,660 |

biquad | 18 | 384,948 | 15,606 | 25 | 274,328 | 22,645 |

sha256 | 48 | 1,144,854 | 46,413 | 88 | 1,098,372 | 90,668 |

vga | 11 | 256,410 | 10,395 | 18 | 272,632 | 22,505 |

Circuit | 4-LUT depth | 4-LUT area | 4-LUT data | 6-LUT depth | 6-LUT area | 6-LUT data |
---|---|---|---|---|---|---|

C7552 | 8 | 154,840 | 12,640 | 6 | 316,892 | 30,418 |

s5378 | 6 | 124,852 | 10,192 | 5 | 291,066 | 27,939 |

C2670 | 7 | 41,748 | 3,408 | 5 | 76,082 | 7,303 |

misex3 | 7 | 547,624 | 44,704 | 7 | 1,022,570 | 98,155 |

seq | 7 | 468,440 | 38,240 | 6 | 1,173,338 | 112,627 |

ac97 | 4 | 1,115,828 | 91,088 | 4 | 1,885,298 | 180,967 |

aes | 8 | 1,833,776 | 149,696 | 7 | 4,150,308 | 398,382 |

biquad | 26 | 255,192 | 20,832 | 22 | 442,532 | 42,478 |

sha256 | 168 | 1,015,476 | 82,896 | 168 | 2,228,016 | 213,864 |

vga | 33 | 250,880 | 20,480 | 33 | 517,916 | 49,714 |
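
The VGLC columns above are consistent with the product rule for area and configuration data (per-cell cost times number of logic cells). As a sanity check, dividing the VGLC area and data by the number of logic cells obtained with mapping method (B) gives the same per-cell values for each circuit tried here. These per-cell constants are inferred from the tables rather than quoted from the paper, and the variable names are illustrative.

```python
# Number of VGLCs for mapping method (B), from the logic cell count table.
cells_b = {"C7552": 410, "ac97": 2145, "aes": 4633, "vga": 385}

# (area, configuration data) for the VGLC, from the architecture comparison table.
vglc = {
    "C7552": (273060, 11070),
    "ac97": (1428570, 57915),
    "aes": (3085578, 125091),
    "vga": (256410, 10395),
}

def per_cell(name):
    """Return (area per cell, configuration bits per cell) as exact fractions of the totals."""
    area, data = vglc[name]
    n = cells_b[name]
    return area / n, data / n

per_cell_costs = {per_cell(name) for name in cells_b}
print(per_cell_costs)  # a single entry means the per-cell cost is constant
```

If the inferred constants are right, each VGLC accounts for 27 configuration bits, so the data totals follow directly from the cell counts.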

Normalized logic depth, area, and configuration data.

Circuit | Virtex-4 depth | Virtex-4 area | Virtex-4 data | 4-LUT depth | 4-LUT area | 4-LUT data | 6-LUT depth | 6-LUT area | 6-LUT data |
---|---|---|---|---|---|---|---|---|---|

C7552 | 1.33 | 0.61 | 1.25 | 1.33 | 0.57 | 1.14 | 1.00 | 1.16 | 2.75 |

s5378 | 1.20 | 0.76 | 1.54 | 1.20 | 0.72 | 1.44 | 1.00 | 1.67 | 3.95 |

C2670 | 1.40 | 0.71 | 1.44 | 1.40 | 0.65 | 1.31 | 1.00 | 1.19 | 2.82 |

misex3 | 1.00 | 0.53 | 1.09 | 1.00 | 0.49 | 0.99 | 1.00 | 0.92 | 2.18 |

seq | 1.00 | 0.76 | 1.54 | 1.00 | 0.70 | 1.41 | 0.86 | 1.75 | 4.15 |

ac97 | 1.33 | 0.86 | 1.75 | 1.33 | 0.78 | 1.57 | 1.33 | 1.32 | 3.12 |

aes | 1.00 | 0.64 | 1.31 | 1.00 | 0.59 | 1.20 | 0.88 | 1.35 | 3.18 |

biquad | 1.39 | 0.71 | 1.45 | 1.44 | 0.66 | 1.33 | 1.22 | 1.15 | 2.72 |

sha256 | 1.83 | 0.96 | 1.95 | 3.50 | 0.89 | 1.79 | 3.50 | 1.95 | 4.61 |

vga | 1.64 | 1.06 | 2.16 | 3.00 | 0.98 | 1.97 | 3.00 | 2.02 | 4.78 |

Min. | 1.00 | 0.53 | 1.09 | 1.00 | 0.49 | 0.99 | 0.86 | 0.92 | 2.18 |

Max. | 1.83 | 1.06 | 2.16 | 3.50 | 0.98 | 1.97 | 3.50 | 2.02 | 4.78 |

Ave. | 1.31 | 0.76 | 1.55 | 1.62 | 0.70 | 1.42 | 1.48 | 1.45 | 3.43 |
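
Each entry in this table appears to be the value for the compared logic cell divided by the corresponding VGLC value, so a ratio above 1.00 favors the VGLC. The sketch below reproduces a few rows from the raw depth, area, and data figures; the normalization rule and the helper name `normalize_to_vglc` are assumptions for illustration.

```python
# (depth, area, data) from the architecture comparison table.
vglc = {"C7552": (6, 273060, 11070), "sha256": (48, 1144854, 46413)}
others = {
    "Virtex-4": {"C7552": (8, 167480, 13825), "sha256": (88, 1098372, 90668)},
    "6-LUT": {"C7552": (6, 316892, 30418), "sha256": (168, 2228016, 213864)},
}

def normalize_to_vglc(values, base):
    """Divide each metric by the VGLC value and round to two decimals."""
    return tuple(round(v / b, 2) for v, b in zip(values, base))

normalized = {
    arch: {name: normalize_to_vglc(vals, vglc[name]) for name, vals in rows.items()}
    for arch, rows in others.items()
}
for arch, rows in normalized.items():
    for name, ratios in rows.items():
        print(arch, name, ratios)
```

Read this way, the Virtex-4 averages of 1.31 for depth and 1.55 for data correspond to the 31% and 55% figures quoted in the abstract and conclusion.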

The logic depth and the amount of configuration data are
reduced, on average, relative to all of the compared logic cells. The misc. logic
function and heterogeneous mapping may therefore have a considerable effect,
on large-scale applications in particular. With fewer logic cells to cross, the
routing delay in a modern FPGA is lower. Therefore, it is necessary to
pack more logic into one block in order to avoid routing delay, which is a
major problem in deep submicron FPGAs. It is important that the VGLC can
pack more logic per logic cell. On the other hand,
Table

In the present evaluation, we assume an island-style routing
architecture. In practice, the routing area
dominates the die area and has a significant influence
on circuit performance (area, delay, and power). However, since the VGLC is
used as a logic IP core for small or medium circuits, we believe that the
number of routing tracks required in island-style routing is lower than that
of stand-alone modern FPGAs. Moreover, the VGLC does not restrict the
routing architecture to the above style, and various connection and routing
architectures can be implemented according to the specifications. For example,
Renesas MX-core [

In the present paper, we have proposed a VGLC architecture and evaluated its area, delay, and configuration data using a technology mapping tool called VGLC-HeteroMap. The novel architecture, which is based on a 4-bit ripple carry adder that includes configuration memory bits, offers a tradeoff between coarse and fine granularity and can be used for efficient mapping in a given application. In our evaluation, the VGLC improves the logic depth by 31% and reduces the amount of configuration data by 55% on average, compared to the Virtex-4 logic cell.

The present study did not consider the routing
network, which is likely to dominate the area and delay in an FPGA
implementation. In the future, we will study the connection block and routing
structure required to best support the VGLC. Currently, we are attempting to
optimize the full custom design in order to evaluate the chip performance and are
developing clustering, placement, and routing tools for the proposed architecture.
Furthermore, the VGLC can be used not only as an IP core but also as an
FPGA logic element. Since the VGLC can reduce the logic depth compared to
traditional LUT architectures, it can be implemented as a 2D
array of VGLCs connected by an island-style routing architecture.
Kuon and Rose [

The present study was supported by the VLSI Design and Education Center (VDEC) at the University of Tokyo in collaboration with Synopsys, Inc., Calif, USA.