
In modern SRAM-based Field-Programmable Gate Arrays (FPGAs), the Look-Up Table (LUT) is the principal constituent logic element, capable of realizing every possible Boolean function of its inputs. However, this flexibility of LUTs comes with a heavy area penalty. Part of this area overhead comes from the configuration memory, which grows exponentially with LUT size. In this paper, we first present a detailed analysis of a previously proposed FPGA architecture that allows LUTs to share their memory (SRAM) tables among NPN-equivalent functions, reducing both area and the number of configuration bits. We then propose several methods to improve the existing architecture. A new clustering technique is introduced that packs NPN-equivalent functions together inside a Configurable Logic Block (CLB). We also employ a recently proposed high-performance Boolean matching algorithm to perform NPN classification. To enhance area savings further, we evaluate the feasibility of more than two LUTs sharing the same SRAM table. Consequently, this work explores the SRAM table sharing approach for a range of LUT sizes (4–7) while varying the cluster size (4–16). Experimental results on the MCNC benchmark circuit set show an overall area reduction of ~7% while maintaining the same critical path delay.

Look-Up Tables (LUTs) in an FPGA offer generous flexibility in implementing logic functions. A LUT is an

To bridge this gap between FPGAs and ASICs, FPGA architectures have been under continuous overhaul, ever since their inception. Previously published articles such as [

In the past few years, some research has been focused towards exploring innovative logic blocks for FPGA, such as [

All of the architectures discussed above utilize the concept of NPN-class equivalence [

The main drawback of the logic blocks proposed in [

Meanwhile, a lot of research has also been directed towards architectures with a reduced number of configuration memory cells. Architectures such as [

Kimura et al. [

This work employs a novel CLB [

(a) LUTs with shared SRAM vectors and (b) CN logic.

The experiments in [

In an earlier work [

The remainder of this paper is organized as follows. Section

This section describes the steps involved in mapping NPN-equivalent functions to LUTs with shared SRAM tables.

Two functions, say

For
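The NPN-equivalence test at the heart of this flow can be illustrated concretely. The following is a brute-force sketch (not the high-speed Boolean matcher the paper actually uses) that decides whether two small truth tables are NPN-equivalent by enumerating all input Negations, input Permutations, and output Negations:

```python
from itertools import permutations

def npn_equivalent(f, g, n):
    """Brute-force NPN equivalence test for two n-input functions,
    each given as a tuple of 2**n output bits (index = input pattern).
    Practical only for small n; real flows use canonical-form-based
    Boolean matching instead."""
    size = 1 << n
    for perm in permutations(range(n)):       # input permutation
        for neg in range(size):               # input negation mask
            for out_neg in (0, 1):            # output negation
                h = []
                for x in range(size):
                    # build the transformed input pattern
                    y = 0
                    for j in range(n):
                        bit = ((x >> j) & 1) ^ ((neg >> j) & 1)
                        if bit:
                            y |= 1 << perm[j]
                    h.append(f[y] ^ out_neg)
                if tuple(h) == tuple(g):
                    return True
    return False
```

For example, 2-input AND and OR are NPN-equivalent (negate both inputs and the output, by De Morgan's law), whereas AND and XOR are not, since no negation or permutation changes the number of ones in a truth table to 2.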

In [

After finding the equivalence classes, LUTs are clustered using the T-VPack algorithm [

A CLB with 10 LUTs (

We employ two clustering approaches which attempt to map NPN-equivalent functions on these shared pairs. The one used in [
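The class-aware packing idea can be sketched as follows. This is a hypothetical simplification for illustration only: it groups LUTs by NPN class so that same-class functions land in the same CLB and can be placed on shared pairs, whereas the actual T-VPack-based packer also weighs shared nets and timing criticality when filling a cluster:

```python
from collections import defaultdict

def pack_by_npn_class(lut_classes, cluster_size):
    """Greedy illustrative packer.  `lut_classes` maps LUT id -> NPN
    class id; returns a list of CLBs (each a list of LUT ids).  Larger
    classes are drained first, approximating a priority queue of
    NPN classes."""
    by_class = defaultdict(list)
    for lut, cls in lut_classes.items():
        by_class[cls].append(lut)
    # drain classes largest-first so same-class LUTs cluster together
    queues = sorted(by_class.values(), key=len, reverse=True)
    clbs, current = [], []
    for group in queues:
        for lut in group:
            current.append(lut)
            if len(current) == cluster_size:
                clbs.append(current)
                current = []
    if current:
        clbs.append(current)
    return clbs
```

With six LUTs in three classes and a cluster size of 4, the three LUTs of the largest class end up in the same CLB, maximizing the chance that a shared pair can be formed.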

A comparative analysis of the two approaches has also been performed which will be discussed in Section

Modified CAD flow with equivalence analysis.

All the architecture files we have used for experimentation extract their parameters from the iFAR

The number of inputs to the cluster
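The sentence above is truncated in the source. A widely used heuristic in the VPR literature (due to Ahmed and Rose) sizes the number of distinct cluster inputs as I = K(N+1)/2 for a cluster of N K-input LUTs; whether the iFAR architecture files use exactly this value is an assumption here:

```python
def cluster_inputs(k, n):
    """Common VPR-era heuristic (Ahmed & Rose) for the number of
    distinct inputs I a cluster of N K-input LUTs needs to keep the
    LUTs nearly fully utilized: I = K*(N+1)/2, rounded up.
    NOTE: illustrative assumption; the paper's architecture files
    may specify a different value."""
    return -(-k * (n + 1) // 2)  # ceiling division
```

For instance, a cluster of ten 4-input LUTs would get 22 cluster inputs under this heuristic.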

Table

Architecture file parameters.

Parameter | Value |
---|---|
RMinWidth NMOS | 2800 |
RMinWidth PMOS | 7077 |
Switch block type | Wilton |
Switch type | MUX |
Switch delay | 103 ps |
Fs | 3 |
Segment length | 4 |
Segment type | Unidir. |

Since

The model employed in VPR to perform area estimation is known as the Minimum Width Transistor Area Model (MWTM) [

For a

Since, for a
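As a rough illustration of this transistor-count bookkeeping, the LUT-SRAM portion of a CLB's area can be estimated as follows. This sketch assumes each configuration bit is a 6-transistor SRAM cell; VPR's MWTM additionally counts buffers, multiplexers, and pass transistors, which are omitted here:

```python
def lut_sram_transistor_area(n_luts, k, shared_tables=0, sram_cell_t=6):
    """Rough minimum-width-transistor count for the LUT SRAM of a CLB:
    each K-input LUT table holds 2**K bits, each bit assumed to be a
    6T SRAM cell.  Every shared table removes one full 2**K-bit table.
    Illustrative only -- not VPR's complete MWTM."""
    tables = n_luts - shared_tables
    return tables * (1 << k) * sram_cell_t
```

For a CLB of ten 6-input LUTs, fully pairing the LUTs (five shared tables) halves the SRAM transistor count from 3840 to 1920 under these assumptions.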

The LUTs whose I/Os are appended with CN logic cells will incur an additional delay in their look-up times. To estimate this delay, we simulated the CN logic cell in Cadence Virtuoso 6.1.5 using a 150 nm process. The propagation delay is the average value for rise and fall transitions. The propagation delay of the CN logic (

To evaluate the performance of the CLBs with shared SRAM tables, we have performed rigorous testing on a variety of architectures by varying

In this section, we present our observations for the following set of results:

Number of NPN-equivalence classes as a function of input size

Comparison of the two clustering methods (discussed in Section

Effects of varying the number of shared pairs (

Effects of varying the degree of sharing (

Impact of modified clustering on routability

Channel width, critical path, logic area, and total FPGA area for cluster sizes

NPN-Equivalence analysis is performed for an input circuit after it is synthesized as shown in Figure

Number of used NPN-equivalence classes for input size
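The number of NPN-equivalence classes grows very quickly with input size (4 classes for 2 inputs, 14 for 3, and 222 for 4), which is why only a fraction of classes is actually used by mapped circuits. For tiny input sizes the class count can be verified by brute force; this sketch enumerates every function and marks its whole NPN orbit as seen:

```python
from itertools import permutations

def count_npn_classes(n):
    """Count NPN equivalence classes of n-input Boolean functions by
    brute force (practical only for n <= 3; n = 4 already means 65536
    functions).  A function is encoded as a 2**n-bit integer."""
    size = 1 << n
    perms = list(permutations(range(n)))
    seen, classes = set(), 0
    for f in range(1 << size):
        if f in seen:
            continue
        classes += 1                      # f starts a new class
        tt = [(f >> x) & 1 for x in range(size)]
        for perm in perms:                # mark the entire NPN orbit
            for neg in range(size):
                for out_neg in (0, 1):
                    h = 0
                    for x in range(size):
                        y = 0
                        for j in range(n):
                            if ((x >> j) & 1) ^ ((neg >> j) & 1):
                                y |= 1 << perm[j]
                        if tt[y] ^ out_neg:
                            h |= 1 << x
                    seen.add(h)
    return classes
```

Because the NPN transformations form a group, marking all transforms of one representative covers its class exactly once.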

In Section

Hence, to evaluate the performance of the two methods, we observed the number of CLBs required to implement the whole circuit. The number of required CLBs serves as a measure of relative efficiency: a clustering approach that requires more CLBs is not well suited to mapping NPN-equivalent functions on LUTs that share their SRAM tables. Moreover, an increase in the number of CLBs affects the final place-and-route results, such as critical path delay and routing channel width.

In Figures

Comparison of the clustering methods for

Comparison of the clustering methods for

The area and configuration memory savings are directly related to the number of shared SRAM tables in a CLB. For example, in a CLB with cluster size
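The worked example above is cut off in the source. As a generic illustration, the fraction of LUT configuration bits saved by a given number of shared pairs can be computed directly, since each shared pair eliminates one 2^K-bit table:

```python
def sram_table_savings(n_luts, k, shared_pairs):
    """Fraction of LUT configuration bits saved in an N-LUT CLB when
    `shared_pairs` LUT pairs each share a single 2**K-bit SRAM table.
    Routing/mux configuration bits are ignored in this sketch."""
    assert 2 * shared_pairs <= n_luts
    total = n_luts * (1 << k)
    saved = shared_pairs * (1 << k)
    return saved / total
```

For example, a 10-LUT CLB with five shared pairs saves 50% of its LUT configuration bits, whereas two shared pairs save 20%, independent of K.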

Although a high value of

Figures

Number of CLBs utilized for cluster size of

Number of CLBs utilized for cluster size of

The idea of a LUT pair sharing a memory vector can be extended to three or more LUTs, provided all of them are mapped with functions from the same NPN class. In this article, we extend the degree of sharing (

However, similar to the number of shared pairs (
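The best-case memory benefit of a higher sharing degree g can be quantified with a short sketch. This assumes every sharing group can actually be filled with same-class LUTs, which, as noted above, becomes harder as g grows:

```python
import math

def config_bits_with_degree(n_luts, k, g):
    """Best-case LUT configuration bits for an N-LUT CLB when groups
    of `g` LUTs may share one 2**K-bit SRAM table (g = 2 is the pair
    case, g = 1 means no sharing).  Assumes every group is fully
    populated with NPN-equivalent functions -- an optimistic bound."""
    tables = math.ceil(n_luts / g)
    return tables * (1 << k)
```

For a 12-LUT CLB with 6-input LUTs, moving from no sharing to pair sharing halves the bits (768 to 384), while three-way sharing reaches 256; the returns diminish even as the matching problem gets harder.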

To depict this behavior, we plot the number of LUTs left empty during clustering, due to the lack of NPN-equivalent classes in the priority queue, against the varying input size (

Number of LUTs missed for cluster size

In Sections

To analyze the impact of greater CLB usage on circuit routability, we normalize the number of CLBs and the channel widths required to implement the MCNC benchmark circuits using the modified architecture

Impact of high CLB usage on channel width.

We first consider the CLBs with cluster size

Average area savings for

Average critical path for

For the experimentation with cluster size

The results in Figure

The results obtained for the cluster size

Average area savings for

Average critical path for

The results in Figure

Most recently proposed FPGA architectures focus on replacing legacy LUTs with innovative, high-coverage logic blocks. Although such logic blocks offer high area and performance gains for a particular benchmark suite, they are not generic enough to maintain quality of results over a wide range of circuits. In this paper, we have explored a novel FPGA architecture which allows sharing LUTs' SRAM vectors between NPN-equivalent functions. To find NPN equivalence, a state-of-the-art high-speed Boolean matching algorithm has been employed. Furthermore, an efficient packing technique has been introduced to cluster NPN-equivalent functions together inside a CLB. By using CLBs with shared LUTs (for cluster size

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This project is funded by NSTIP, Saudi Arabia. The authors acknowledge the support of STU (Science and Technology Unit), Umm Al-Qura University, Saudi Arabia.