A Stochastic D/A Converter Based on a Cellular Automaton Architecture

The design and VLSI implementation of a new stochastic D/A converter using the properties of Cellular Automata (CA) are presented in this paper. The converter is implemented in a Double Layer Metal (DLM), 0.7 µm, N-well CMOS process provided by European Silicon Structures (ES2). Its maximum conversion rate is 6 kHz, and it is intended for use in low-cost applications. Additionally, the proposed approach integrates into digital techniques more easily than other popular D/A techniques.


INTRODUCTION
The goal of a D/A converter is to convert a quantity specified as a binary number to a voltage or current proportional to the value of this number. The most popular D/A conversion techniques are: (i) scaled resistors into a summing junction, (ii) the R-2R ladder and (iii) the stochastic technique. The vast majority of converters use the resistor ladder approach. The stochastic technique, though slower, is significantly lower in cost and integrates into digital techniques more easily [1].
The key building block of a stochastic D/A converter is the Pseudo Random Number Generator (PRNG). In the stochastic D/A converter approach the pseudo-random number generation is usually performed through software. Although this results in a low-cost implementation, the conversion rates achieved are only a few tens of hertz. The use of specific hardware (CA or LFSRs) in the pseudo-random number generation process significantly improves conversion rates. PRNGs based on Additive Cellular Automata (ACA) do not have long feedback loops and, therefore, can operate at significantly higher speeds than traditional Linear Feedback Shift Register (LFSR) based PRNGs [2,3]. A six-cell CA circuit is 1.7 times faster than the analogous LFSR circuit. Their autocorrelation and cross-correlation properties also compare favourably with those of LFSR generators. Furthermore, their regular structure and interconnections result in efficient layouts and less silicon. This paper presents for the first time, as far as we know, the design and VLSI implementation of a stochastic D/A converter using the properties of CA. Its maximum speed of operation is 6 kHz. The D/A converter is intended to be used in low-cost D/A conversion applications.

DEFINITIONS OF CELLULAR AUTOMATA
One-dimensional (1-D) CA consist of cells arranged in a straight line, as shown in Figure 1 [4].
The local state of a CA is defined as the value $a_i^{(t)}$ of the cell at position $i$ at time step $t$. A CA evolves in discrete time steps, and the value of a local state at any given clock cycle depends on the cell neighbourhood values on the previous clock cycle, according to a specific rule (local rule).
A neighbourhood that consists of a cell and its two adjacent cells is called a 3-neighbourhood. The CA local rule, $g$, of a 3-neighbourhood CA is denoted as:
$$a_i^{(t+1)} = g\left(a_{i-1}^{(t)},\, a_i^{(t)},\, a_{i+1}^{(t)}\right) \tag{1}$$
A rule can be represented by a state transition table, as shown in Table I. The local rules for the CA are described by an 8-bit number. This number represents the state transition function of the CA, and its decimal equivalent is referred to as the rule number [5]. There are $2^8$ distinct CA rules in one dimension with a 3-neighbourhood. A CA that uses only linear functions to form the local rule is called an additive CA. For example, the XOR operation (modulo 2 addition) of the two nearest neighbours defines rule 90 [4], where:
$$a_i^{(t)} = a_{i-1}^{(t-1)} \oplus a_{i+1}^{(t-1)} \tag{2}$$
($\oplus$ denotes mod 2 addition). Rule 150 is defined by the relation [4]:
$$a_i^{(t)} = a_{i-1}^{(t-1)} \oplus a_i^{(t-1)} \oplus a_{i+1}^{(t-1)} \tag{3}$$
The boundary conditions can be either periodic or null. Periodic boundary conditions mean that the CA forms a ring, thereby making the first and last cells neighbours, whilst null boundary conditions assume that the first and last cells consider their missing neighbour cell to always have a zero value.
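The rule numbering described above can be checked with a short Python sketch. The helper names (`rule_90`, `rule_150`, `rule_table`, `rule_number`) are illustrative and do not come from the paper. The sketch lists the eight neighbourhood patterns from 111 down to 000, applies the XOR definitions of rules 90 and 150, and reads the resulting next-state column as an 8-bit number.

```python
from itertools import product


def rule_90(left, centre, right):
    # Rule 90: XOR of the two nearest neighbours (the centre cell is ignored).
    return left ^ right


def rule_150(left, centre, right):
    # Rule 150: XOR of the cell itself and its two nearest neighbours.
    return left ^ centre ^ right


def rule_table(rule):
    # State transition table with neighbourhoods ordered 111, 110, ..., 000.
    return [((l, c, r), rule(l, c, r))
            for l, c, r in product((1, 0), repeat=3)]


def rule_number(rule):
    # Read the next-state column as an 8-bit binary number (111 is the MSB).
    number = 0
    for _, next_state in rule_table(rule):
        number = (number << 1) | next_state
    return number


if __name__ == "__main__":
    for name, rule in (("90", rule_90), ("150", rule_150)):
        print("rule", name, "->", rule_number(rule))
```

Under this ordering the two XOR definitions yield the decimal values 90 and 150, which is where the rule names come from.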
The global state of a CA is defined as the ordered set of the local states of its cells. The total number of possible global states for a CA with $N$ distinct cells is $2^N$, each state being uniquely specified by an integer word of $N$ bits, which represent the individual cells. An $N$-bit additive CA is characterised by an $N \times N$ matrix $T_N$, called the global rule transition matrix, operating over GF(2).
The construction of the $T_N$ matrix is based on the neighbourhood dependence of the cells. For example, rule 90 (shown here with null boundary conditions) is characterised by the following matrix:
$$T_N = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \tag{4}$$
If $A(t) = [a_1^{(t)}, a_2^{(t)}, \ldots, a_N^{(t)}]^T$ is the global state vector at clock cycle $t$, then the next global state, $A(t+1)$, at clock cycle $t+1$ can be obtained directly from the relation:
$$A(t+1) = T_N \cdot A(t) \tag{5}$$
The characteristic polynomial of the matrix $T_N$ completely characterises the CA and is given by the relation:
$$P_N(x) = \det\left[T_N + x \cdot I_N\right] \tag{6}$$
where $I_N$ denotes the $N \times N$ identity matrix.

CA BASED PSEUDO RANDOM BINARY SEQUENCE GENERATION
In a homogeneous CA the same rule is applied to all cells. A Hybrid Cellular Automaton (HCA) renounces the uniformity of the rule, and different cells can follow different rules to update their states. An $N$-bit CA that yields a sequence of global states of length $2^N - 1$ (maximum length sequence) is called exhaustive [2]. A CA is exhaustive if its characteristic polynomial $P_N(x)$ is primitive. An exhaustive CA can be built from a null-bounded HCA with an alternation of additive rules resulting in a primitive polynomial $P_N(x)$ [2].
As an example of the behaviour of such a hybrid additive CA (HACA), construct a null boundary CA of even length $N$, and then alternate rule 90 and rule 150, as shown in Figure 2a. The transition matrix in this case is:
$$T_N = \begin{bmatrix} c_{11} & 1 & 0 & \cdots & 0 & 0 \\ 1 & c_{22} & 1 & \cdots & 0 & 0 \\ 0 & 1 & c_{33} & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & c_{NN} \end{bmatrix} \tag{7}$$
where $c_{ii}$ ($1 \le i \le N$) can be either 0 or 1; $c_{ii} = 0$ refers to rule 90, whilst $c_{ii} = 1$ refers to rule 150. It can be seen from equation (7) that the HACA can be constructed directly from the values on the diagonal, if its 90/150 matrix is known [6].
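The matrix formulation can be tried out on a small case with the following Python sketch. It is an illustration under stated assumptions rather than the circuit of the paper: the function names are hypothetical, and a 4-cell null-boundary HACA alternating rule 90 and rule 150 is chosen only because it is small enough to check by hand. The sketch builds the 90/150 matrix from its diagonal, applies the next-state relation over GF(2), and counts the clock cycles until the seed state reappears.

```python
def transition_matrix(diagonal):
    # Null-boundary 90/150 matrix: ones on the sub- and super-diagonal,
    # diagonal entry 0 for a rule 90 cell and 1 for a rule 150 cell.
    n = len(diagonal)
    return [[diagonal[i] if i == j else int(abs(i - j) == 1)
             for j in range(n)] for i in range(n)]


def next_state(matrix, state):
    # A(t+1) = T_N . A(t), with all arithmetic modulo 2, i.e. over GF(2).
    return [sum(m * a for m, a in zip(row, state)) % 2 for row in matrix]


def cycle_length(matrix, seed):
    # Number of clock cycles until the seed state reappears.
    # Assumes the seed lies on a cycle, which holds when T_N is
    # invertible over GF(2); otherwise this loop may not terminate.
    state, steps = next_state(matrix, seed), 1
    while state != seed:
        state, steps = next_state(matrix, state), steps + 1
    return steps


# Illustrative 4-cell HACA: rule 90, rule 150, rule 90, rule 150.
T4 = transition_matrix([0, 1, 0, 1])

if __name__ == "__main__":
    print(cycle_length(T4, [1, 0, 0, 0]))
```

For this 4-cell example the characteristic polynomial works out to $x^4 + x + 1$, which is primitive, so the automaton steps through all $2^4 - 1 = 15$ non-zero states before repeating.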
Figures 2b and 2c show the cells of the CA based pseudo random number generator. The required communications are restricted to the nearest neighbours, and thus the speed advantage becomes obvious. A HACA is much faster than an LFSR with the same characteristic polynomial [2][3]. The most critical nodes, as far as the speed of the 8-bit HACA circuit is concerned, are the outputs of the three-input XOR gates. As a result, the speed of a HACA does not depend on its length, but on the delay of the three-input XOR. For larger implementations the advantage in speed is even more significant, as the CA approach does not become slower, whereas the LFSR technique becomes appreciably slower, since the delay grows linearly with the number of stages. Also, LFSRs and HACA produce sequences of global states with similar statistical characteristics and uniform distributions, but HACA are superior to LFSRs in several statistical tests [7]. Figure 3 shows two different presentations of the state-time diagram of the 8-bit CA based PRNG. The state-time diagram shown in Figure 3b assigns each bit in the CA to a horizontal pixel and assigns the pixel the value "#" if the corresponding bit is a logical 1.
The block diagram of a stochastic converter is shown in Figure 4. In this circuit, the digital value is converted to a time ratio, which is output as a pulse width. The pulses are output as $V_{max}$ for time $T_{on}$ followed by zero for time $T_{off}$ in a continuously repeated cycle until a new $V_{out}$ is required. The pulse train is then integrated by an external lossy integrator circuit. This is a classical case of a smoothing (or reconstruction) filter. The mean output is:
$$V_{out} = V_{max}\,\frac{T_{on}}{T_{on} + T_{off}}$$
The long-term average of the pulse train is the correct mean $V_{out}$, but there is a problem, as the frequency $f$ of the output signal should satisfy $f \ll 1/(T_{on} + T_{off})$. The problem can be visualised by considering the capacitor being charged up for time $T_{on}$ and then discharging for time $T_{off}$. In this way the output fluctuates about the desired value.
The proposed solution is to distribute $T_{on}$ into many more, shorter ON times, and to distribute $T_{off}$ into a similar number of OFF times, spreading them evenly through the cycle while leaving the total on and off times in a cycle unaltered. This can be achieved through a PRNG that produces pseudo random sequences with a uniform distribution. CA based PRNGs provide an alternative to conventional LFSR based generators with improved randomness properties [3]. The digital input value is compared with the present value of the PRNG and, if the digital input is greater, $V_{max}$ is output for the duration of that minor cycle; otherwise the output is zero. Of course, there are now minor cycles, but the same timing accuracy is required for a given digital to analogue accuracy as for a simple pulse width cycle. Also, notice that for an eight-bit resolution, 256 possible time divisions per major pulse are required.
FIGURE 5 Parallel pipelined comparator.
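The comparison scheme described above can be illustrated with a toy Python model. This is a sketch under stated assumptions, not the 8-bit circuit of the paper: it uses a hypothetical 4-cell null-boundary HACA (rules 90, 150, 90, 150, which has period 15) as the PRNG, reads cell 0 as the most significant bit, and all function names are invented for illustration. It produces the comparator output for one major cycle and compares it with a plain pulse-width stream of the same duty.

```python
def haca_step(state, rules):
    # One clock of a null-boundary 90/150 CA; state is a list of bits.
    n = len(state)
    nxt = []
    for i, rule in enumerate(rules):
        left = state[i - 1] if i > 0 else 0
        right = state[i + 1] if i < n - 1 else 0
        centre = state[i] if rule == 150 else 0
        nxt.append(left ^ centre ^ right)
    return nxt


def prng_values(rules, seed, count):
    # Yield `count` successive PRNG values, reading cell 0 as the MSB.
    state = list(seed)
    for _ in range(count):
        yield int("".join(map(str, state)), 2)
        state = haca_step(state, rules)


def comparator_stream(digital_input, rules, seed, count):
    # Output 1 (V_max) in a minor cycle when the input exceeds the PRNG value.
    return [int(digital_input > v) for v in prng_values(rules, seed, count)]


def longest_on_run(stream):
    # Longest run of consecutive ON minor cycles within the stream.
    best = run = 0
    for bit in stream:
        run = run + 1 if bit else 0
        best = max(best, run)
    return best


RULES = [90, 150, 90, 150]  # toy 4-cell HACA (assumption, not the 8-bit design)
SEED = [1, 0, 0, 0]
stream = comparator_stream(8, RULES, SEED, 15)
pwm = [1] * sum(stream) + [0] * (15 - sum(stream))  # same duty as a single pulse

if __name__ == "__main__":
    print("stochastic:", stream, "longest ON run", longest_on_run(stream))
    print("pulse width:", pwm, "longest ON run", longest_on_run(pwm))
```

In this toy case every non-zero 4-bit value appears once per major cycle, so an input of 8 is ON for 7 of the 15 minor cycles in both streams. The stochastic stream breaks those 7 ON cycles into runs of at most 3, which is what reduces the ripple seen by the smoothing filter.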
The comparator, shown in Figure 5, is a parallel pipelined 8-bit comparator. It operates similarly to traditional comparators, but the results of the processing levels are latched using the pipelining principle to speed up the comparison process. Thus, the comparator can operate at a speed similar to that of the PRNG. The maximum frequency of operation of the digital block, implemented in VLSI, is 125 MHz (worst case). The die size of the chip is 1.65 mm × 1.65 mm = 2.72 mm². A block level layout of this chip is shown in Figure 6. The inputs to the chip are the 8-bit digital data, the clock, and the power and ground connections, whereas the output is the output of the comparator. The simulation and test language STL, a high level language with a structure similar to the C language, has been used to examine the functionality of the chip. The STL simulation output comparison capability allows the expected output values specified in the STL source program to be compared automatically with the results of the simulation. No errors have been detected during this process.
The required integration is performed by an external smoothing filter. In each minor cycle, the output must be either on or off. Each of these time divisions has a duration of $t = 1/(125\ \text{MHz}) = 8$ ns, and the major cycle has a duration of $255 \cdot t \approx 2$ µs. The integrator must integrate a number of such cycles before its output settles. With a single pole integrator with a time constant of 10 µs, the settling time will be approximately 4.5 time constants, given as 40.5 µs. Assuming four outputs per cycle, the maximum conversion frequency becomes 6 kHz. Figure 7a depicts the output of the comparator for two digital inputs, whereas Figure 7b shows the output of the integrator for the same inputs. This is the analogue output of the proposed technique.
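The timing figures above can be reproduced with a few lines of Python. The constant names are illustrative. The settling time is taken as the 40.5 µs stated in the text, and "four outputs per cycle" is read here, as an assumption, to mean four settling intervals per conversion, which is the reading that reproduces the quoted rate of about 6 kHz.

```python
CLOCK_HZ = 125e6            # worst-case clock of the digital block (from the text)
MINOR_CYCLES = 255          # minor cycles per major cycle at 8-bit resolution
SETTLING_S = 40.5e-6        # integrator settling time as stated in the text
SETTLES_PER_CONVERSION = 4  # assumed reading of "four outputs per cycle"

minor_cycle_s = 1.0 / CLOCK_HZ                     # one time division
major_cycle_s = MINOR_CYCLES * minor_cycle_s       # one full PRNG period
major_cycles_per_settle = SETTLING_S / major_cycle_s
max_conversion_hz = 1.0 / (SETTLES_PER_CONVERSION * SETTLING_S)

if __name__ == "__main__":
    print(f"minor cycle: {minor_cycle_s * 1e9:.1f} ns")
    print(f"major cycle: {major_cycle_s * 1e6:.2f} us")
    print(f"major cycles per settling time: {major_cycles_per_settle:.1f}")
    print(f"maximum conversion rate: {max_conversion_hz / 1e3:.2f} kHz")
```

With these inputs the integrator averages roughly 20 major cycles per settling interval, and the conversion rate comes out slightly above 6 kHz.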
It is clear from the previous discussion that the higher the resolution of the converter, the lower its speed. As this is a CMOS device, a low power consumption is expected. The circuit draws only 0.5 µA from the +5 V dc power supply. The linearity error is ±1/2 LSB maximum.

CONCLUSIONS
The design and VLSI implementation of a new stochastic D/A converter using the properties of CA have been presented in this paper. The implementation technique is faster than similar existing ones, owing to the speed of the CA based PRNG. Furthermore, it integrates more easily with digital techniques. Its maximum frequency of operation is 6 kHz. The converter is intended for use in low-cost D/A conversion applications.

FIGURE 4 Block diagram of the stochastic D/A converter.

FIGURE 6 Block level layout of the ASIC.

TABLE State