Hardware implementation of artificial neural networks (ANNs) makes it possible to exploit their inherent parallelism. Nevertheless, such implementations require a large amount of resources in terms of area and power dissipation. Recently, Reservoir Computing (RC) has emerged as a strategic technique for designing recurrent neural networks (RNNs) with simple learning capabilities. In this work, we present a new approach to implementing RC systems with digital gates. The proposed method is based on probabilistic computing concepts to reduce the hardware required for the different arithmetic operations. The result is a highly functional system with low hardware resource usage. The presented methodology is applied to chaotic time-series forecasting.

General architecture of RC systems. All connections in the reservoir are randomly chosen and kept fixed except for the ones that couple the reservoir to the output layer (dashed arrows).

The RC architecture is composed of three parts: an input layer, the reservoir, and an output layer (see Figure

The general expression to estimate the neuron states is given by

A simple cycle reservoir (SCR) topology. Units are organized in a cycle.

It has been observed that the reservoir configured with a ring structure presents only a slightly worse performance than the classical topology [

The RC principle can be used to implement computations on dynamical systems treating them as reservoirs. For example, it has been used to perform computation on hardware platforms such as analog electronics [

Despite the potential benefits of the hardware realization of ANNs, implementing massive neural networks in a single chip is a challenging task because ANN algorithms are “multiplication-rich” and the multiplication operation is relatively expensive to implement [

Stochastic computing (SC) has evolved as a feasible alternative to implement complex computations due to the simplicity of the involved circuitry. It is based on applying probabilistic laws to logic cells, where variables are encoded in the random switching activity of internal bits [

While the major benefits of SC are low hardware cost, low power requirements, and its inherent high error tolerance, the main drawback of SC is long computation time, which tends to grow exponentially with respect to precision. Over the years, SC has been recognized as potentially useful in specialized systems, where small size, low power, or soft-error tolerance is required and limited precision or speed is acceptable [

Even though SC-based ANNs seem unlikely to achieve speed-up compared to the conventional binary logic ones, they can be an interesting solution for those electronic systems implementing computational intelligence techniques and requiring low power dissipation but not demanding very high computational speed such as wireless sensor networks [

Another appealing feature of SC implementations is a high degree of error tolerance. Stochastic circuits tolerate environmental errors that seriously affect the behavior of conventional circuits. A single bit flip (especially of a high-significance bit) causes a huge error in a binary circuit, but flipping a few bits in a long bitstream has little effect on the value of the stochastic number represented. Therefore, SC can be attractive for applications such as spacecraft electronics, which must operate under radiation-induced error conditions.
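This robustness can be illustrated with a minimal Python sketch (an illustrative software model, not part of the hardware design; all parameter values are arbitrary): flipping a few bits of a long unipolar bitstream barely shifts the decoded value, whereas a single high-significance bit flip in a binary word changes it drastically.

```python
import random

def unipolar_value(bits):
    """Decode a unipolar stochastic bitstream: value = fraction of 1s."""
    return sum(bits) / len(bits)

random.seed(0)
N = 4096
p = 0.75  # value to encode

# Stochastic encoding: each bit is 1 with probability p.
stream = [1 if random.random() < p else 0 for _ in range(N)]

# Flip a handful of random bits (soft errors).
corrupted = stream[:]
for i in random.sample(range(N), 8):
    corrupted[i] ^= 1

# The decoded value moves by at most 8/N, regardless of which bits flip.
err = abs(unipolar_value(corrupted) - unipolar_value(stream))
assert err <= 8 / N

# By contrast, flipping the top bit of a 16-bit binary word changes it by 2^15.
x = 0b1100_0000_0000_0000
assert abs((x ^ 0x8000) - x) == 2 ** 15
```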

We specifically focus on the implementation of the reservoir system with ring topology of Figure

Furthermore, we propose an implementation scheme for the stochastic ESNs that overcomes a major challenge of stochastic computing: the significant number of resources consumed by the stochastic number generators (SNGs) [

The proposed methodology is used to implement massive reservoir networks and applied to a challenging time-series prediction task.

In stochastic-based computations, a global clock signal provides the time interval during which all stochastic signals are stable (settled to 0 or 1, LOW or HIGH). During a clock cycle, each node of the circuit has a probability

Basic concept of the stochastic codification. Information is coded as the probability “

Product operation of two stochastic signals with switching activities

Pulsed signals follow probabilistic laws when they are evaluated through logic gates. For instance, the AND gate provides at its output the product of its inputs (i.e., the collision probability of the signals), as illustrated in Figure
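This multiplication principle can be reproduced with a short Python model (illustrative only; the probability values are arbitrary): two uncorrelated unipolar bitstreams pass through a logical AND, and the output switching activity approaches the product of the input probabilities.

```python
import random

random.seed(1)
N = 100_000
p1, p2 = 0.5, 0.6

# Generate two independent unipolar bitstreams.
s1 = [random.random() < p1 for _ in range(N)]
s2 = [random.random() < p2 for _ in range(N)]

# A single AND gate multiplies the encoded probabilities:
# P(out = 1) = P(s1 = 1) * P(s2 = 1) when the streams are uncorrelated.
out = [a and b for a, b in zip(s1, s2)]
estimate = sum(out) / N

assert abs(estimate - p1 * p2) < 0.01
```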

Stochastic arithmetic circuits. (a) Unipolar and bipolar multipliers. (b) Unipolar complementary operation and bipolar negation. (c) Adder used for both unipolar and bipolar notation.

A stochastic computing system requires converting any real number (either in the unipolar

Binary numbers are converted to pulsed signals using a Binary to Pulsed (B2P) block; see Figure
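The principle of a B2P converter can be modeled in software as a comparison between the binary word and a fresh pseudorandom number at each clock cycle. The sketch below is a simplified Python model (a software random source stands in for the hardware LFSR; names and values are ours):

```python
import random

def b2p(value, width, n_cycles, rng):
    """Binary-to-pulsed conversion: each clock cycle, compare the binary
    word against a fresh pseudorandom number and emit 1 when the random
    number is below it, so P(bit = 1) = value / 2**width (unipolar coding)."""
    limit = 1 << width
    return [1 if rng.randrange(limit) < value else 0 for _ in range(n_cycles)]

rng = random.Random(2)
bits = b2p(value=0x4000, width=16, n_cycles=65536, rng=rng)  # encodes 0.25
assert abs(sum(bits) / len(bits) - 0.25) < 0.01
```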

(a) Pulse to binary converter

A probabilistic error is always present during conversions. When converting a switching signal with probability
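The dependence of this conversion error on the bitstream length can be checked numerically. The following Python sketch (an illustrative model with arbitrary parameters) shows the spread of a pulse-to-binary conversion shrinking roughly as the inverse square root of the evaluation time, as binomial statistics predict.

```python
import random
import statistics

def p2b(p, n_cycles, rng):
    """Pulse-to-binary conversion: count the 1s of the bitstream (an n-bit
    counter in hardware) and normalize to recover the encoded probability."""
    return sum(rng.random() < p for _ in range(n_cycles)) / n_cycles

rng = random.Random(3)
p = 0.5
spreads = []
for n in (64, 1024, 16384):
    trials = [p2b(p, n, rng) for _ in range(100)]
    spreads.append(statistics.pstdev(trials))
    # Binomial statistics: the conversion error shrinks as 1/sqrt(n).
    assert spreads[-1] < 2 * (p * (1 - p) / n) ** 0.5

assert spreads[0] > spreads[-1]  # longer evaluation time, smaller error
```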

ESNs and, in general, ANNs are composed of individual artificial neurons performing a mathematical function. In particular, a neuron receives one or more inputs, weights each of them, and sums the results; the weighted sum is then passed through a nonlinear function, known as the activation function, which usually has a sigmoid shape. Figure

Operation and schematics of an artificial neuron with two inputs.
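As a point of reference, the neuron operation can be written in a few lines of Python (an illustrative model; tanh is used here as the sigmoid-shaped activation, and the input and weight values are arbitrary):

```python
import math

def neuron(inputs, weights, bias=0.0):
    """Artificial neuron: weighted sum of the inputs followed by a
    sigmoid-shaped activation function (here tanh, common in ESNs)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(s)

# Two-input example: tanh(0.8 * 0.5 + 0.3 * (-0.2)) = tanh(0.34)
y = neuron([0.5, -0.2], [0.8, 0.3])
assert abs(y - math.tanh(0.34)) < 1e-9
```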

Figure

ESN with cyclic topology composed of two-input neurons.
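The state update of such a cyclic reservoir can be sketched in Python as follows (an illustrative software model: each two-input neuron combines the input sample with its ring predecessor's state; the input-weight sign pattern is chosen randomly here, whereas the SCR prescription uses a fixed aperiodic pattern):

```python
import math
import random

def scr_step(state, u, r, v):
    """One update of a simple cycle reservoir (SCR): neuron i combines the
    input sample u with the state of its ring predecessor, then applies
    tanh. r: ring weight (same for all units); v: per-unit input weights."""
    n = len(state)
    # state[i - 1] with i = 0 wraps to the last unit, closing the ring.
    return [math.tanh(r * state[i - 1] + v[i] * u) for i in range(n)]

random.seed(4)
n = 20
v = [random.choice([-0.5, 0.5]) for _ in range(n)]  # fixed magnitude, random signs
x = [0.0] * n
for t in range(100):
    x = scr_step(x, math.sin(0.2 * t), r=0.9, v=v)

assert all(-1.0 < xi < 1.0 for xi in x)  # tanh keeps states bounded
```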

The SC-based implementation of the operations performed by the single neuron of Figure

SC-based two-input sigmoid neuron. The linear part uses probabilistic logic, whereas the nonlinear activation function is implemented classically. The output of the linear part is multiplied by 2 (shifting the binary word one position to the left) to compensate for the scaled sum performed by the multiplexer.
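The scaled addition and its ×2 compensation can be reproduced in a short Python model (illustrative only; probability values are arbitrary): a multiplexer driven by a select line with p = 0.5 outputs half the sum of its inputs, and a one-bit left shift of the P2B counter restores the full sum.

```python
import random

random.seed(5)
N = 65536
pa, pb = 0.3, 0.5  # values carried by the two weighted-input bitstreams

a = [random.random() < pa for _ in range(N)]
b = [random.random() < pb for _ in range(N)]
sel = [random.random() < 0.5 for _ in range(N)]  # select line at p = 0.5

# Multiplexer adder: picks a or b each cycle, so it computes (pa + pb) / 2.
mux_out = [ai if s else bi for ai, bi, s in zip(a, b, sel)]
half_sum = sum(mux_out) / N
assert abs(half_sum - (pa + pb) / 2) < 0.01

# Convert back to binary and shift one position to the left (multiply by 2)
# to undo the scaling, as in the neuron's linear stage.
counter = sum(mux_out)          # P2B: count the 1s
restored = (counter << 1) / N   # << 1 compensates the 1/2 factor
assert abs(restored - (pa + pb)) < 0.02
```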

Regarding the computation of the sigmoid function, which is a crucial issue for the neural implementation, there are different stochastic approaches to reproduce the hyperbolic tangent function [

Implementation of the SIG-sigmoid function using direct bit-level mapping proposed by Tommiska [

Experimental measurements of the SIG-sigmoid approximation are presented in Figure

(a) Experimental measurements of the SIG-sigmoid function [

It can be appreciated in Figure

It is worth noticing that the four sequences of pseudorandom numbers required by the B2P converters contained in each neuron do not need to be different for each neuron, since the neurons communicate with each other using binary magnitudes instead of probabilistic bitstreams (the output of a neuron is sent to another neuron as a binary value). The use of common random number generators for all the neurons reduces the number of required logic elements per neuron. Actually, the three B2P converter blocks (each containing an LFSR and an
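For reference, a maximal-length LFSR of the kind used in the shared SNGs can be modeled in a few lines of Python (the 16-bit width and tap positions below are a textbook choice for illustration, not necessarily those of the actual design, which employs a 26-bit LFSR):

```python
def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11
    (a maximal-length polynomial, period 2**16 - 1)."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

# A single LFSR can feed the B2P comparators of every neuron because
# neurons exchange binary values, not correlated bitstreams.
state = 0xACE1
seen = set()
for _ in range(1000):
    state = lfsr16_step(state)
    seen.add(state)

assert len(seen) == 1000  # no repeats within the first 1000 steps
```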

SC-based implementation of an ESN with cyclic architecture. A few pseudorandom number generators are shared by all neurons.

The greatest advantage of SC is the minimal use of resources for addition and multiplication. However, uncorrelated bitstreams, and therefore a large number of stochastic number generators (SNGs), are required. Since SNGs can account for a significant portion of the circuit [

The hardware resources consumed by our SC-based implementation of ESNs are presented in the next section for different network sizes. A breakdown of the logic elements used by each component of the neuron is also included.

Furthermore, a classical digital implementation of the ESN with cyclic architecture has been realized. The required operations illustrated in Figure

A software program has been developed which allows the ESN structure (using either the SC approach or the deterministic one) to be exported automatically to a VHDL hardware description. The program generates the VHDL code for the reservoir with any desired number of neurons and weight configuration. This VHDL code can finally be synthesised to an actual hardware implementation.

As an example of functionality of the proposed methodology, a small reservoir computer was synthesized on a Cyclone IV medium cost FPGA (see Table

Spent hardware resources of the medium-sized Cyclone IV (EP4CE115F297C7N) FPGA for the 20-unit and 50-unit reservoir networks.

| Implementation approach | Stochastic | | Conventional | |
|---|---|---|---|---|
| Reservoir size | 20 neurons | 50 neurons | 20 neurons | 50 neurons |
| Total logic elements (LEs) | 2186 (1.9%) | 5306 (4.6%) | 9013 (7.9%) | 19975 (17.4%) |
| Combinational functions | 2149 (1.9%) | 5251 (4.6%) | 9013 (7.9%) | 19975 (17.4%) |
| Dedicated logic registers | 858 (0.7%) | 2054 (1.8%) | 320 (0.3%) | 800 (0.7%) |

The output layer, which only requires a multiplier-adder circuit, was implemented using conventional binary logic with a resolution of 8 bits for the output weights. In addition, a numerical model of the stochastic-based reservoir hardware was developed in MATLAB for more efficient training and debugging. The resolution of each variable is limited according to the hardware.

The first task selected to be performed is a nonlinear transformation of the input:

Traces of two arbitrarily selected neurons from the reservoir when driven by a sinusoid input. Experimental values (symbols) are plotted along with the numerical results (lines).

The MATLAB model of the stochastic hardware allows us to perform a comprehensive search for the configuration of the stochastic reservoir network that provides optimal results. This is preferable to using a classical reservoir model, since the optimal configuration values can differ considerably between the two scenarios. In Figure

Simulation results for the mean square error (MSE) in the fitting task according to the classical and stochastic approaches. The two scanned parameters are

Once the optimum parameters were determined, the hardware was configured, trained, and experimentally tested. The training (assessment of the output layer optimal weights) was carried out using the experimental outputs of individual neurons. This training consisted of a linear regression of the teacher output
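This training step amounts to an ordinary least-squares fit of the output weights to the collected neuron states. The following Python sketch illustrates the computation via the normal equations (synthetic data stands in for the experimental neuron outputs; all names are ours):

```python
import random

def train_readout(states, teacher):
    """Least-squares fit of output weights w so that states @ w ~ teacher,
    solved via the normal equations (S^T S) w = S^T y with a small
    Gaussian-elimination solver to keep the sketch dependency-free."""
    n = len(states[0])
    A = [[sum(s[i] * s[j] for s in states) for j in range(n)] for i in range(n)]
    b = [sum(s[i] * y for s, y in zip(states, teacher)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

# Hypothetical data: recover known weights from noiseless "neuron states".
random.seed(6)
true_w = [0.4, -1.2, 0.7]
S = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
y = [sum(wi * si for wi, si in zip(true_w, s)) for s in S]
w = train_readout(S, y)
assert all(abs(a - b) < 1e-6 for a, b in zip(w, true_w))
```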

In the time-series experiment, we performed a total of 250 time steps. The first 20 time steps (the transient) are neglected, from

An experimental training error

Simulation and experimental results of the mean square error (MSE) for the optimum reservoir configuration. The performance is represented for different evaluation periods. The estimated deterministic error is also given as a reference.

Finally, we show in Figure

Input signal

The evaluation time of the network is of the order of 1.3 ms when using 16-bit counters in the

A more complex task is implemented for a proper validation of the proposed methodology. This task consists of a one-step-ahead prediction of the Santa Fe dataset [

The analysis of this task was conducted using the MATLAB model of the stochastic hardware for two different reservoirs (with

Simulation results for the normalized mean square error (NMSE) in the time-series prediction task according to the deterministic and stochastic approaches. The two scanned parameters are

The configuration parameters allowing the best performance error for the validation dataset were applied to the network when processing the test set. The final optimum results as a function of the number of neurons in the reservoir are depicted in Figure

Normalized mean square error (NMSE) for the time-series prediction task. The stochastic-based results are displayed for a 20-unit and for a 50-unit reservoir using different values of the evaluation time. The corresponding results obtained with a deterministic approach are also represented.

In Figure

Segment of the laser time-series (predicted and targeted values). Predictions performed using the stochastic methodology with

The hardware resources required to implement the proposed stochastic-based reservoir networks together with the resources used for the conventional deterministic implementations are presented in Table

Comparison of the logic elements spent by the stochastic implementation and the conventional one.

The implemented reservoir did not require any memory bits except the ones used to store the input data values. The values shown in Table

The use of the cyclic architecture allows significant resource saving compared to the standard random ESN implementation. A preliminary study [

A breakdown of the hardware requirements of each component of the SC-based neuron is illustrated in Figure

Breakdown of the hardware requirements of each component of the SC-based neuron.

The B2P converters, which are used as common elements by all the neurons, use 46 LEs each whereas the SNG (based on a 26-bit LFSR) consumes 26 LEs. Therefore, significant resource saving is achieved by sharing these components.

The proposed SC-based neuron design appears to be optimal in terms of hardware resources. Further reduction of the area requirements is only possible at the cost of a loss of accuracy, using, for example, a coarser approximation of the sigmoid function or lower-order P2B converters (the presented results are for neurons using pulsed-signal conversions to 16-bit binary magnitudes).

In this work, we have proposed and analysed an alternative architecture that exploits stochastic computing for doing time-series prediction with echo state networks. We have found that the stochastic architecture requires less area than a conventional hardware implementation. This characteristic makes the ESN implementation possible using low cost FPGA devices. Moreover, it has the advantage of being much more tolerant to soft errors (bit flips) than the deterministic implementation, which makes it particularly useful for applications that need to operate in harsh environments such as space.

However, it should be noted that the stochastic implementation requires a relatively large number of clock cycles to achieve a given precision compared to a conventional binary logic implementation. For instance, to obtain 16-bit resolution, a computation time of 2^{16} clock cycles is needed.

Therefore, in general, potential applications of the stochastic implementations are specialized systems where small size, low cost, low power, or soft-error tolerance is required, and limited speed is acceptable. The presented SC-based ESN approach can be an interesting solution, by way of example, for electronic systems implementing computational intelligence techniques and requiring low power dissipation such as wireless sensor networks, predictive controllers, or medical monitoring applications.

For the ESN, a ring topology has been selected, since it minimizes hardware resources while the precision of the network is not reduced with respect to a classical random topology. In addition, we have proposed an area-efficient implementation scheme that employs probabilistic logic for the arithmetic operations and conventional binary logic for the nonlinear activation function. This scheme reduces the number of SNGs, which are expensive in terms of hardware resources, by sharing common SNGs among all neurons. It has been observed that the area cost of the proposed implementation is dominated by the P2B converters and the sigmoid function.

The proposed methodology has been used to implement a massive reservoir network and has achieved good performance in a chaotic time-series prediction task.

Reservoir networks present some advantages compared to conventional recurrent neural networks that enable a more efficient hardware implementation. A major benefit of RC networks is their sparse connectivity. This characteristic allows a simple wiring that matches the FPGA capabilities. Additionally, a simple training process can be performed offline.

The use of the stochastic logic implies certain constraints. The shortcomings are the evaluation time and the precision. Nevertheless, these drawbacks are compensated for by the much simpler architecture and by the stochastic logic’s inherent noise immunity which, all in all, allow a massive, parallel, and reliable implementation.

The authors declare that there is no conflict of interest regarding the publication of this paper.

This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO), the Regional European Development Funds (FEDER), the Comunitat Autònoma de les Illes Balears under Grant Contracts TEC2011-23113, TEC2014-56244-R, and AA/EE018/2012, and a fellowship (FPI/1513/2012) financed by the European Social Fund (ESF) and the Govern de les Illes Balears (Conselleria d’Educació, Cultura i Universitats).