Memory Chips with Adjustable Configurations *

In this paper, we present the concept of Field Programmable Memory Cell Arrays (FPMCAs) as the memory counterpart to Field Programmable Gate Arrays which have proved their utility in design and rapid prototyping. Principles of dynamic reconfigurability using programmable logic and programmable interconnect are incorporated into random access memories to achieve this flexibility. We first present the design of a variable width RAM (VaWiRAM) which is a simple example of a Field Programmable Memory Cell Array. The configuration of VaWiRAMs can be adjusted by setting a few configuration pins on the memory chip. A VaWiRAM reconfigurable between widths and Wmax can be constructed with the extra cost of Wmax pass gates, (Wmax/2) 2-to-1 multiplexers, and [log2[log2(k)+ 1] mode pins. A novel scheme to overlap the address pins with mode control pins and achieve the mode control with only one extra pin is also presented. The paper discusses the architecture of the proposed VaWiRAMs in detail, analyzes the design tradeoffs and introduces the concept of FPMCAs.


INTRODUCTION
Designing efficient memory systems for modern microprocessors is increasingly becoming a chal- lenge [10, 12, 15, 18, 19,21].The width of DRAMs available in the market is lower than the bus-width of most modern high performance processors and hence most computers normally use single-in-line memory modules (SIMMs) which are small printed circuit boards stocked with multiple DRAMs ar- ranged in parallel, to yield a width of 32 to 144 bits, matching the memory width to the computer's bus-width.Granularity the minimum increment of memory that may be added to a system [11] in- creases as the width of the memory chip becomes narrower with respect to the memory bus.For in- stance, on a 32-bit wide bus that uses 4-Mbit   DRAMs, if the chips are organized as M X 4 bits, *This paper is based on "VaWiRAM: A Variable Width Random Access Memory Module", by Lizy Kurian John which appeared in the Proceedings of the 9th International Conference on VLSI Design, January 1996, page number 219-224, (C) 1995 IEEE.U.S.
Patent 5, 867, 422 has been granted to the author on FPMCAs.Lizy John's research is supported by grants from the United States National Science Foundation, the State of Texas Advanced Technology Program, IBM, DELL, AMD, and Intel.
L.K. JOHN   then it takes eight of them to fill out the 32-bit bus with a minimum granularity of 4 MBytes of mem- ory.If, however, the memory chip is organized as 4M X bit, then 32 of them are needed to fill out a 32-bit bus with a minimum granularity of 16   MBytes of memory.If the processor bus-width is 64 bits, the granularity increases even further.The organization of the chip thus affects the expandability of the memory system.It would be desirable for a user to be able to configure the memory chip to any desired width, without being forced to adopt a certain granularity.In this paper, we present the design of a variable width RAM (VaWiRAM) that can be programmed to configure itself to variable widths.In essence, if a memory chip of N b bits is available, our design makes it possible to configure the chip to 2NXb/2 bits, or 4NXb/4 bits, or in general kNXb/k bits, where k is a power of two.
Currently, each SIMM can be used only with processors that have exactly the same bus-width.Even a configuration with half the bus-width can- not efficiently use the same SIMM.With the avail- ability of variable-width RAMs, programmable SIMMs can easily be constructed.By changing the values on a few control pins, the chips can be re- programmed to different configurations.
Another fact motivating the research in this paper is that different processors need different inter- face logic in order to be able to use the same memory chip.Currently memory vendors manufacture sepa- rate cache modules for different processors.For instance, Motorola MCM72BA32 and MCM72BA 64 BurstRAMs are [17] designed to be secondary cache modules for the Intel Pentium processor.For the PowerPC and MC68040 processors, Motorola provides the 64 K X 18 Bit BurstRAM, MCM 67M618.It would be desirable to embed program- mable logic on a memory chip so that it may be easily interfaced to various processors.
The primary objective of this paper is to present the design of a variable width RAM (VaWiRAM) which can be configured to different widths.A second objective of this paper is to present the design of a fully programmable memory cell array (FPMCA) with programmable logic on the chip, enabling to interface the memory module to vari- ous processors, with reduced parts count.
In Section 2, we describe the related research in the past.In Section 3, the design of the VaWiRAM is described.The design of the programmable out- put data control circuitry and the programmable column decoder unit are described in detail.Various trade-offs involved in the design are examined.
In Section 4, exten.sions to the VaWiRAM, especial- ly Field Programmable Memory Cell Arrays (FPMCAs), are explained.In Section 5, we offer concluding remarks.
2. RELATED RESEARCH Some of the look-up table (LUT) based FPGAs from manufacturers such as Xilinx [22] and Altera [2] offer a method to realize reconfigurable RAM [20].For instance, the look up tables on the Xilinx XC4000 series of FPGAs can be used to construct RAMs and it is possible to realize RAMs of different sizes using these FPGAs.However, since FPGAs are primarily intended for programmable logic, it'is inefficient to use them as RAMs.For instance, Xilinx XC4020 (a 20,000 gate chip with more than 240 pins) [22] can yield only 28.8k RAM (not really surprising, because XC4020 is not intended to be a memory chip).What we pre- sent in lhis paper are dedicated memory resources with reconfigurability.Such FPMCAs were first discussed in [14].Programmable memories have been discussed by Bertin et al. [4,5], however the programmability had been for functionality rather than for configuration.Some reconfigurability at the factory is available in most RAMs for fault-tolerant purposes.For instance, many RAMs incorporate spare rows and columns to replace faulty cells.In many cases, the programming technology is laser-interconnect and reconfigurability is limited to one time.This paper presents the design of a user-programmable mem- ory cell array which can be adjusted to different configurations multiple times.

ARCHITECTURE
Conventional memory chips contain a matrix of memory cells, row and column address decoders, input and output buffers and control circuitry.In dynamic RAMs, the I/O circuits will include sense amplifiers etc. Figure illustrates the architecture of the proposed VaWiRAM chip, which is similar to conventional memory chips except that programmability has been incorporated into the various blocks.For instance, the column address decoder (PCADU), output data control unit (PADCU), and input data control unit are programmable.The row decoder and the memory cells are not shown to be reconfigurable, but they also could be made reconfigurable.We have restricted the program- mability to a few selected units in order to reduce the cost of reconfigurability.(We chose the column decoder to be programmable, but one could have chosen the row decoder instead.)The architecture also includes an extra block, the Mode Control Unit (MCU) which receives mode information from the user and delivers appropriate control signals to the various programmable units.
In order to incorporate programmability into the VaWiRAM, we use pass gates, and multi- plexers.A pass transistor can be used to realize a programmable interconnection between two wires.The pass transistor turns on or off depending on the control value.When off, the pass gate presents a very high resistance and thereby an open connec- tion between the two wires.When the pass gate is turned on, it forms a relatively low resistance and thereby a closed connection between the two wires.
Thus the connection between the two wires can be turned on or off by changing the control values.Similarly, a bidirectional pass gate can be con- structed with two pass transistors and an inverter.
A multiplexer may be used if there are several input wires.The control signals to control the pass gates and multiplexers is applied at special mode input pins (here, the MCU input pins).The details of the various blocks inside the chip are presented below.The design uses a hardwired row decoder and a programmable column deco- der.The job of the programmable circuitry is to configure the data width to the desired width and correspondingly configure the column decoder, column address pins and chip data pins.At this juncture, we define a few terms.

DEFINITIONS
Wmax: The maximum allowed width of the chip.Wmax configuration: A VaWiRAM in its max- imum width.
Wmin: The minimum allowed width of the chip.Wmin configuration: A VaWiRAM in its minimum width.
Width Variability factor (k): The ratio of the high- est desired width to the lowest desired width.
For illustration purposes, we use a 1M X chip which is reconfigurable to 512kX2 bits, or 256k X 4 bits, or 128k X 8 bits, or 64kX 16 bits.
Since we keep the row decoder to be fixed, the 1M X chip can be viewed as one thousand k X blocks as in Figure 2.For this chip, maximum width, Wmax 16   minimum width, Wmin Pmgrammabl; Logic t""reali'ze Clumn decoders, I/0' Buers,'nd"othet control circuitry FIGURE 2 A 1M-bit chip can be viewed as 1000 blocks of lk bit size.and width-variability factor, k=16.

Programmable Column Address Decoder
Unit (PCADU) When data-width of a chip increases by factor k, the number of address lines get reduced by logz(k) and hence if there are n address bits in the column address in the minimum width case there will be n-log2(k) address bits in the maximum width case.For instance, the chip under design will need 20, 19, 18, 17, or 16 address pins and 1,2, 4, 8, 16  data pins respectively in the five different config- urations.
The PCADU is set up in two stages with a layer of multiplexers constituting the first stage as in Figure 3a, and a layer of decoders in the second stage as in Figure 3b.THEOREM If there are n bits in the column address in the minimum width case and n-logz(k) bits in the maximum width case, the programmable column decoder can be set up as k/2 multiplexers, each of size 2-to-l, and k decoders each of size n-log2(k) to 2 n-lg2(k).
In this example, we set up the first stage as eight 2-to-1 muxes as in Figure 3a, and the second stage as sixteen 6-to-64 decoders as illustrated in Figure 3b.Let us assume that A0, A1, A2,..., A19 denote the twenty address lines.In the 1M configuration, the lower 10 address pins A0, A1,..., A9 will be connected to the row decoder, and the higher 10 address bits, A10,A11,... ,A19 are connected to the column decoder.The column decoder will use 10, 9, 8, 7, and 6 bits when the width of the chip is 1,2, 4, 8, and 16 bits respectively.The 6 input lines to all of the sixteen 6-to-64 decoders in Figure 3b are connected to address lines A10, A11,..., A15.Only one of the sixteen 6-to-64 decoders will be enabled concurrently if chip width is 1.Two, four, eight or all sixteen of the decoders will be enabled simultaneously if desired chip-width is 2, 4, 8, or 16  respectively.The circuitry that will accomplish the FIGURE 3 Programmable column address decoders.
enabling of different numbers of decoders is shown in Figure 3a.The inputs and outputs of the cir- cuitry in Figure 3a are illustrated in Table I.The PODCU in VaWiRAM incorporates pass gates in order to realize width variability.Pass gates are placed between the data lines as in Figure 4, so that the lines can be connected to- gether or not, depending on the desired width.
THEOREM 2 To achieve a width-variability factor of k, log2(k) levels of pass gates are required.
Proof By connecting together pairs of data lines in the Wmax configuration, a width of Wmax/2 can be realized.Since width-variability factor is k, Wmax/Wmi k.The width gets divided by two for each level of pass gates; so log2(k) levels of pass gates will be required to achieve a width variability factor of k.
THEOREM 3 The total number of pass gates re- quired to achieve a width-variability factor of k is log2 (k) Wmax Z 2 i=1 Proof In the Wmax configuration, (Wmax/2) pass gates are required to connect together pairs of data lines and obtain a width of (Wmax/2).(Wmax/4) pass gates will be required for the next level, (Wmax/8) pass gates for the next level and so on.
Proof If Wmin 1, k Wmax and Wmax/2 --Wmax/4 --Wmax/8 +"" THEOREM 5 To achieve a width-variability factor of k, logz(k) control values are sufficient.Proof To achieve width-variability factor of k, logz(k) levels of pass gates are sufficient.Each pass gate in the same level has identical values.Hence logz(k) control values are sufficient.Actually we can further reduce the number of control signals, which is explained in the subsection on Mode Control Unit (MCU).
For illustration, from Theorem 2, four levels of pass gates are required to change the width between and 16. Figure 4 illustrates how 16 data lines can be controlled by 4 mode select lines to yield any width from 16 to 1, which are powers of two.From Theorem 3, the number of gates required can be verified to be 15. Figure 5 illustrates the values required to control the pass gates.
A similar circuit has to be incorporated into the input circuitry (PIDCU) also.Because of the similarity, the design of PIDCU is not described here.The mode select lines controlling the pass transistors in the input data lines can be the same as those in the output circuitry.

Mode Control Unit (MCU)
The control values required for the PCADU, PODCU and PIDCU are identical and are genera- ted by the MCU.Out of the log2(k) control values, not all combinations are used and hence we could further minimize the number of control signals required for the operation of the chip.
THEOREM 6 If the width-variability factor is k, [log(log2k + 1) bits of information are sufficientfor the mode information.
Proof If the width-variability factor is k, there are log2k + distinct widths considering the re- striction that only powers of two are allowed.If there are log2k + distinct widths, [log(log2k + 1)   bits of information will be sufficient to indicate the desired width.
Illustration In our example, the width-variability factor is 16.Hence there will be log2(16)+ or 5 distinct widths that may be desired (actually 1,2, 4, 8, and 16).Since there are 5 different modes to be supported, [log2(5) or 3 bits are required to re- present the mode information.Even in a case for programming from bit to 1024 bit width, only four lines are sufficient.
For our example design, the mode control unit has three input bits which are used to generate the select signals for the programmable column dec- oder and the input and output data control units, according to Figure 6.The input combinations to correspond to each output combination is rather arbitrary and different choices can lead to different number of gates in an implementation.The input combinations shown in Figure 6 is the trivial choice but not the optimal choice.There are ways to overlap the functions of some of the mode control pins with the address pins, which is dis- cussed later. 3.4.Tradeoffs

Pin Limitations
The most serious problem associated with Va- WiRAMs is the requirement for a large number of pins for the chip.VaWiRAM chips should be able to support the number of data pins corresponding to the largest width to be supported, the number of address pins corresponding to the smallest width, and at least log2[log2(k)+ 1] mode pins.Wider configurations naturally require larger number of data pins and hence the increase in the number of data pins should not be considered as a disadvan- tage.The need to have mode control pins is a disad- vantage, however, we can overlap some of the mode control pins with the address pins as described below.
While increasing the data width by k times, the number of address lines required reduces by log(k) and those lines can be used to deliver the mode information to the chip.Figure 7  manner in which the address lines and the mode control bits can be shared.All mode control pins except one can be time-shared with the address pins.That line if 0 commands the chip to be in the narrowest possible configuration.If this mode control line is 1, the chip is in a wider mode.Hence at least the highest address line (here, A20) is not required for addressing.So the highest address line can be used for mode information.If that line is also 0, the chip is twice as wide as the narrowest configuration.If A20 is a 1, the chip is even wider.So A19 is not required for addressing and will be used for mode information.If A19 is 0, the chip is 4 times as wide as the narrowest configuration.If A19 is 1, the chip is wider than 4 times the original width.So A18 will not be required for addressing and can be used for mode information.If A18 is a 0, the chip is 8 times the original width and if A18 is a 1, the chip is 16 times the original width.Thus mode control can be accomplished by just one extra pin.

Area Overhead
The VaWiRAM requires more hardware than a conventional RAM.The input and output data control unit will need (Wmax-1) extra pass gates each.Similarly the programmable address decoder unit requires k/2 extra multiplexers each of size 2-to- 1.The mode control unit also consumes a few gates.
Table II illustrates the area overhead for achiev- ing the programmability for a 64k bit memory array.The overhead is computed as the ratio of the number of transistors utilized for reconfigur- ability to the number of transistors utilized for the actual memory cells.Static RAM cells with 6- transistors are assumed for the RAM.A pass gate is assumed to contain 4 transistors and a 2-input mux is assumed to contain 6 transistors including the required inverters.The mode control unit is assumed to be built as a compound logic function implemented at transistor level.Figure 8 illustrates the percentage overhead for programmability upto a width variability factor of 256, for 16k, 64k and 1M RAMs.
Compared to the flexibility yielded by the de- sign, the incurred overhead is negligible.For exam- ple, for a 1M RAM, the area expended in achieving a programmable width from bit to 256 bits is only 0.046%.For smaller RAMs, the percent- age overhead is higher.Overall, the area overhead increases with the width variability factor k and reduces with increasing array sizes.

Time Considerations
The circuitry incorporated for the reconfigurability of VaWiRAMs can contribute to longer line de- lays and slower switching.The various program- mable units, namely the PCADU, PODCU, and MCU will add a few gate delays to the RAM ac- cess process unless carefully implemented.A naive design will slow down the chip, however, on care- ful observation, it was clear that many of these delays could be eliminated by appropriate circuit design.The delays can be reduced or eliminated by overlapping them with operations in the con- ventional RAM.It is difficult to obtain precise estimate of the delays without detailed simulations of the whole circuit and such an analysis is beyond the scope of this paper.A preliminary evaluation shows that the PODCU will contribute one tran- sistor delay.Similarly, the MCU will lead to adding two transistor delays.

EXTENSIONS TO THE VaWiRAM ARCHITECTURE
In VaWiRAM, we restricted reconfigurability to the column decoder and the input-output data control.We could remove this restriction to yield a chip with programmable memory array as well as programmable row and column decoders.The most general view of such a programmable me- mory cell array would be as in Figure 9.There will be a sea of memory blocks (individual memory cells or large groups of cells) which can be inter- connected in ways desired by the user, using pro- grammable interconnect present in various locations within the chip.The row and column decoders, I/O circuits and interconnections be- tween the memory blocks will be configured by the user from the programmable logic present in the chip.The chip will also have I/O blocks which can be configured as input or output buffers.The pins on the chip are not hard-wired to any address or data lines and will be connected by the user to yield the desired configuration.These program-mable logic blocks may be similar to the config- urable logic blocks in commercial FPGAs or they can be tailored to constitute elements specific to interface logic needed by memory chips.The programmable logic can be used to dynamically alter the data width of the RAM, and to configure any control logic required to interface the memory to the processor.
The voltages controlling the various program- mable points in the chip reside in static RAM cells constituting the 'configuration RAM'.The chip configuration can be altered by loading a different set of bits into the configuration RAM.Smaller memory blocks will yield increased flexibility during reconfiguration but will increase the overhead associated with reprogramming.Higher amount of routing resources will help easy routing but will increase the overhead.Partitioning, placement and routing algorithms developed in connection with FPGAs may be used for FPMCAs as well.
Every programmable point consumes silicon and reduces the real estate that can be used for mem- ory cells.Fine-grain programmability in which every memory block is a single cell will be very inefficient due to this reason.A whole range of devices from less efficient and highly flexible to highly efficient and less flexible, as in Figure 10 can be designed.The key to designing a successful pro- grammable device is striking the right balance between the overhead associated with reprogrammability and the flexibility gained by it.
The FPMCA concept could be implemented in several ways.Depending on the amount of pro- grammability provided by the chip, we categorize FPMCAs to various levels as indicated in Table III.
(i) Level 0 is conventional RAM with no reconfig- urability of organization.There is no extra flexibility or overhead.Level 0 is defined only to form a baseline configuration for comparison purposes.It may be noticed that level 0 FPMCAs have an overhead of 0%.(ii) Level   reconfigurable between widths and Wma can be constructed using Wma pass gates, (Wmax/2) 2-to-1 multiplexers, and Ilog2[log2 (k) + 1] mode pins.Level FPMCAs have an overhead typically smaller than 10%.The area overhead of a few FPMCAs with program- mability equal to the square root of the total array size is presented in Table IV.
The area overhead is seen to be less than 6.5 % for RAMs of size lk to 64k bits.As the size of the RAM reduces, the overhead can go beyond   10% (for instance, in the case of a 256 bit array with a width variability factor of 16, the overhead is 11%).However, it should be remembered that the classification is being used only to illustrate the general trend that reconfigurability at a fine-grain costs more than coarse-grain reconfigurability.(iii) Level 2 or Medium Grain Reconfigurable FPMCAs contain two-dimensional program- mability rather than only programmability of width as in VaWiRAMs.Each of the memory blocks is not a single cell, but it is a macrocell consisting of a two-dimensional array of in- dividual cells.Level 2 FPMCAs have overhead typically ranging from 10% to 50%.It may be noticed that according to our definition, 50% overhead means that the overhead is equal to the usable memory area.Overhead decreases as the size of the macro-block increases.
(iv) Level 3 or Fine Grain Reconfigurable FPMCAs contain a highly programmable memory array in which each memory block is an individual cell.As expected, the area overhead is extre- mely high, and is higher than the real estate spent on memory.(Not too surprising-in many FPGAs, more silicon is spent on inter- connections rather than on programmable logic).Level 3 FPMCAs have an overhead higher than 50%.An evaluation of the over- head for level 3 FPMCAs of sizes 256 bits to 2k bits is illustrated in Table V.It may be noticed that this computation considers only the hard- ware required to yield a programmable array; it does not consider any programmable interface logic on the chip intended to interface the chip to processors or other devices.
The distinction between different classes of devices is not rigid.As mentioned before, the classi- fication is intended only to illustrate the general trends.FPMCAs is a new concept and the treat- ment provided here is rather preliminary.However, it is an important step towards understanding the tradeoffs in their design.
The dynamic reconfigurability of FPGAs [2,. 6,22] have been employed in several interesting applications [3, 8, 9, 13].Availability of FPMCAs can lead to interesting applications requiring large amounts of flexible memory.It will be useful for developing adaptive cache architectures or branch predictors that dynamically reconfigure itself to suit the application being run on the processor.As in FPGA based systems where hardware upgrades become a software procedure that can be accom- plished by sending a different configuration pat- tern on a diskette or via an electronic transmission [16], systems with FPMCAs can easily be upgrad- ed or reconfigured.The proposed FPMCA will add more flexibility to all users of memory chips.Memory manufacturers need not make X1,X4, X16 and X32 chips.The proposed generic memory array can be programmed to the desired config- uration easily.
The FPMCAs will also be extremely useful for prototyping large memory systems.Field Programmable Interconnect Devices (FPIDs) [1] are often combined with FPGAs to realize field program- mable boards [7].The FPMCA will add more flexibility to such field programmable boards and programmable backplanes.Currently different memory chips have to be interconnected through FPIDs in order to realize flexible memory organi- zations.With the availability of FPMCAs, the programmable boards and backplanes will become more efficient.There's no doubt that FPMCAs will be a powerful addition to the suite of program- mable hardware consisting of FPGAs, FPIDs, and field programmable boards and backplanes.

SUMMARY AND CONCLUDING REMARKS
In this paper, we presented the design of a variable width RAM called VaWiRAM which allows the user to configure the RAM to the desirable width.
A VaWiRAM chip reconfigurable between widths and Wma can be constructed at an additional cost of Wmaxpass gates, (Wmax/2) 2-to-1 mul- tiplexers, and logz[logz(k)+ 1] mode pins.This paper discussed the architecture of the proposed Variable Width RAMs (VaWiRAMs), and analyz- ed the trade-offs in their design.Although a naive implementation requires log2[ log2(k) + 1] mode pins, a novel scheme requiring only a single mode pin was presented.We also discussed a powerful extension to the VaWiRAM, the Field Programmable Memory Cell Arrays (FPMCAs) which are reconfigurable memory chips with embedded logic to realize the interface logic to other system com- ponents with reduced parts count.The FPMCAs il- lustrate the potential for a fully reconfigurable memory component, whereas the VaWiRAM illus- trates that limited versions of such systems can be constructed with extremely low overhead.Al- though the concept can be applied to static and dynamic RAMs, design of adjustable dynamic RAMs is more complex (compared to SRAMs) due to the need for refreshing.received her Ph.D in Computer Engineering from Penn State in August 1993.Her research interests include high performance processor and memory architectures, program behavior studies, compiler optimization for high performance processors, rapid prototyping, etc. Currently she is supported by grants from the National Science Foundation including a CAREER Award, and the State of Texas Advanced Technology Program.She re-

FIGURE 5
FIGURE5 Values for the select signals for the 5 different configurations.

FIGURE 6
FIGURE 6 MCU inputs and outputs.

FIGURE 10
FIGURE 10The spectrum of programmable memory cell arrays.

TABLE Inputs
Circuitry for reconfiguring data width.A simplified symbol is used to represent the pass gate.

TABLE II Area
1 or Coarse Grain Reconfigurable FPMCA is in fact VaWiRAM; i.e., RAM with very minimal programmability.A chip

TABLE IV
Area overhead for VaWiRAMs with a width variability factor equal to the square root of the size of the array

TABLE V
Area overhead for level 3 FPMCAs