Architecture Analysis of an FPGA-Based Hopfield Neural Network

Interconnections between electronic circuits and neural computation have been a strongly researched topic in the machine learning field, driven by several practical requirements, including decreasing training and operation times in high-performance applications and reducing cost, size, and energy consumption for autonomous or embedded developments. Field programmable gate array (FPGA) hardware shows some inherent features typically associated with neural networks, such as parallel processing, modular execution, and dynamic adaptation, and works on different types of FPGA-based neural networks have been presented in recent years. This paper analyzes architectural characteristics of a Hopfield Neural Network implemented in FPGA, such as maximum operating frequency and chip-area occupancy as a function of the network capacity. The FPGA implementation methodology, which does not employ multipliers in the architecture developed for the Hopfield neural model, is also presented in detail.


Introduction
For nearly 50 years, artificial neural networks (ANNs) have been applied to a wide variety of problems in engineering and scientific fields, such as function approximation, systems control, pattern recognition, and pattern retrieval [1,2]. Most of those applications were designed using software simulations of the networks but, recently, some studies have extended the computational simulations by directly implementing ANNs in hardware [3].
Although some works have reported network implementations in analog circuits [4] and in optical devices [5], most of the research on ANN hardware implementations has used digital technologies. General-purpose processors and application-specific integrated circuits (ASICs) are the two technologies usually employed in such developments. While general-purpose processors are often chosen for economic reasons, ASIC implementations provide an adequate solution for executing parallel architectures of neural networks [6].
In the last decade, however, FPGA-based neurocomputers have become a topic of strong interest due to the larger capabilities and lower costs of reprogrammable logic devices [7]. Other relevant reasons to choose FPGAs, reported in the literature, include the high performance obtained with parallel processing in hardware when compared to sequential processing in software implementations [8], the reduction of power consumption in robotics or general embedded applications [9], and the preservation of the flexibility of software simulations while prototyping. In this last respect, FPGAs present advantages over ASIC neurocomputers because of the lower hardware cost and shorter circuit development period.
Among the several studies on different network models implemented in electronic circuits, works addressing specifically the hardware implementation of Hopfield Neural Networks (HNNs) have been published only recently. Those works usually aimed at solving a target problem, such as image pattern recognition [10], identification of DNA motifs [11], or video compression [12], and only a few studies concentrated on the peculiar features of FPGA implementations of the HNN [13], which, unlike other neural architectures, is fully connected.
Driven by such motivation, the present work analyzes the occupied chip area according to the number of HNN nodes and the precision required to represent its internal parameters. The paper is organized as follows. The next section briefly reviews the concepts of the network proposed by Hopfield, Section 3 describes details of the implementation strategy, and the last two sections present the results, conclusions, and discussions raised by this work.

Hopfield Neural Network
The HNN is a single-layer network with output feedback and dynamic behavior, in which the weight matrix $W$, connecting all its $N$ neurons, can be calculated by

$$W = \sum_{k=1}^{p} \xi^{k} \left(\xi^{k}\right)^{T} - p\,I, \qquad (1)$$

with each one of the $p$ stored patterns represented by a vector $\xi^{k}$ of $N$ elements $\xi_i^{k} \in \{-1; +1\}$ and $I$ denoting the identity matrix.
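As a minimal illustration of the storage rule above, the following Python sketch builds the weight matrix from a list of bipolar patterns by summing the products of pattern elements and leaving the diagonal at zero, which is the effect of subtracting the identity matrix scaled by the number of patterns. The function name `hopfield_weights` and the two 8-element patterns are hypothetical; they are chosen only for illustration and are not taken from the experiments.

```python
def hopfield_weights(patterns):
    """Weights w[i][j] = sum over stored patterns of x[i] * x[j], with a zero diagonal."""
    n = len(patterns[0])                      # number of neurons
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:                        # diagonal stays 0 (the "- p*I" term)
                w[i][j] = sum(x[i] * x[j] for x in patterns)
    return w


# Two hypothetical bipolar patterns (assumption, for illustration only).
patterns = [
    [+1, +1, +1, +1, -1, -1, -1, -1],
    [+1, -1, +1, -1, +1, -1, +1, -1],
]
W = hopfield_weights(patterns)

for row in W:
    print(row)
```

With two stored patterns, every off-diagonal weight produced by this sketch is −2, 0, or +2, so no weight exceeds the number of stored patterns in magnitude, and the matrix is symmetric.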
It has been shown [14] that, for a weight matrix with symmetric connections, that is, $w_{ij} = w_{ji}$, and asynchronous update of the neural activation, $x_i$, the motion equation for the neuron's activation always leads to convergence to stable states. In the discrete-time version of the network, the neural activation is updated according to

$$x_i(t+1) = \operatorname{sgn}\left(v_i(t)\right), \qquad v_i(t) = \sum_{j \neq i} w_{ij}\, x_j(t), \qquad (2)$$

where $\operatorname{sgn}(\cdot)$ is a sign function of the neural potential $v_i$ and $x_i \in \{-1; +1\}$.
The model, proposed by Hopfield in 1982 [15], can also be interpreted as a content-addressable memory (CAM) due to its ability to restore previously memorized patterns from initial inputs corresponding to noisy and/or partial versions of such stored patterns. The CAM storage capacity $p$ increases with $N$ and, with allowance for a small fraction of errors, references [16,17] demonstrated the following relation:

$$p \approx 0.14\,N. \qquad (3)$$

Application-Dependent Parameters.
The number of nodes $N$ of the Hopfield architecture is defined by the length of the binary strings stored by the associative memory, which is specific to each application. For example, when using the associative architecture for the storage of binary images, the number of neural nodes has to equal the number of pixels [14]. The number $p$ of patterns stored in the associative Hopfield architecture, each one with a length of $N$ bits, is also defined by the requirements of the application: $p$ is the number of distinct patterns relevant for the associative process. In the case of storage of images of decimal digits, for example, $p$ will be 10. Notice, though, that the theoretical study of Hopfield architectures indicates that a reasonable relationship must hold between the number of stored patterns $p$ and the size of the architecture $N$. According to [16], when the ratio $p/N$ increases beyond 0.14, the associative memory performance degrades significantly, and we say that the storage capacity of the architecture is exceeded.
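The asynchronous, discrete-time operation described in this section can be sketched in a few lines of Python. This is a software illustration under stated assumptions, not the circuit of the paper: the names `recall`, `stored`, and `probe` are hypothetical, a potential of exactly zero is mapped to +1 (an assumption about the sign function), and the loop stops when one full pass over the neurons changes nothing.

```python
def recall(w, x0, max_epochs=100):
    """Update neurons one at a time until a whole epoch leaves the state unchanged."""
    x = list(x0)
    n = len(x)
    for epoch in range(1, max_epochs + 1):
        before = list(x)
        for i in range(n):
            # Neural potential: weighted sum of the other neurons' activations.
            v = sum(w[i][j] * x[j] for j in range(n) if j != i)
            x[i] = 1 if v >= 0 else -1        # assumption: sgn(0) = +1
        if x == before:                       # stable state reached
            return x, epoch
    return x, max_epochs


# One hypothetical stored pattern and its weights (zero diagonal).
stored = [+1, +1, +1, +1, -1, -1, -1, -1]
n = len(stored)
w = [[stored[i] * stored[j] if i != j else 0 for j in range(n)] for i in range(n)]

# Probe with the first bit corrupted.
probe = [-1] + stored[1:]
restored, epochs = recall(w, probe)
print(restored, epochs)
```

In this small case the corrupted bit is repaired during the first epoch, and the second epoch only confirms that the state no longer changes. With one stored pattern and eight neurons the load is 1/8, below the 0.14 ratio quoted in the text.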

Implementation Architecture
The digital system developed for the FPGA-based HNN is depicted in Figure 1 and consists of a control unit (CU) and a data flow (DF), which implements the processes shown in the block diagram of Figure 2.

Data Flow.
The input unit of Figure 2 receives the prompting input patterns and allows the update of the neural activation $x_i$, according to the architecture illustrated in Figure 3, which comprises registers (R) and multiplexers (MUX).
The weight unit (Figure 2) is designed to compute $w_{ij} x_j$ without employing multipliers. The process is executed by registering the values of $+w_{ij}$ and $-w_{ij}$ and selecting between them based on the $x_j$ state (−1 or +1). Figure 4 depicts the partial architecture of the weight unit, focusing on the implementation of the calculation involved in a specific neuron (neuron #1).
From Figure 4, it can be noticed that the proposed system implements the asynchronous version of the HNN, since the weight unit is designed to compute one weighted value ($w_{ij} x_j$) at a time.
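The multiplier-free selection performed by the weight unit can be mimicked in software as follows. The sketch is only a behavioral analogy: the function name `mux_weighted_value` is hypothetical, and the two local variables stand in for the registers that hold the positive and the negated weight.

```python
def mux_weighted_value(w_ij, x_j):
    """Return the weighted activation by selecting +w or -w; no multiplication is used."""
    plus_w = w_ij            # register holding +w
    minus_w = -w_ij          # register holding -w
    # The activation state acts as the multiplexer select line.
    return plus_w if x_j == +1 else minus_w


# The selection agrees with the ordinary product for every bipolar state.
for w_ij in range(-4, 5):
    for x_j in (-1, +1):
        assert mux_weighted_value(w_ij, x_j) == w_ij * x_j
print("multiplexer selection matches the product")
```

Because the activation takes only two values, the product reduces to a two-way choice, which is why a multiplexer is sufficient.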
The summation unit, in Figure 2, performs the function $\operatorname{sgn}(v_i)$ by inverting the sign bit resulting from the sum of the weighted neural activations $w_{ij} x_j$ and feeding it back to the input unit. The architecture of this unit is shown in Figure 5.
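A short Python sketch may help to show why inverting the sign bit of a two's complement sum yields the sign function. It assumes that an activation of +1 is encoded as bit 1 and −1 as bit 0, which the text does not state explicitly; the function name `activation_bit` and the 6-bit width used in the example are likewise illustrative.

```python
def activation_bit(v, width):
    """Inverted sign bit of v in `width`-bit two's complement: 1 if v >= 0, else 0."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= v <= hi:
        raise ValueError("value does not fit in the given width")
    raw = v & ((1 << width) - 1)              # two's complement bit pattern
    sign_bit = (raw >> (width - 1)) & 1       # most significant bit
    return sign_bit ^ 1                       # inversion gives the new activation bit


for v in (-30, -1, 0, 1, 30):
    bit = activation_bit(v, width=6)
    state = +1 if bit else -1                 # assumed encoding: bit 1 -> +1, bit 0 -> -1
    print(v, bit, state)
```

Under this encoding a sum of exactly zero produces an activation of +1, since the sign bit of zero is 0.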
The architecture of the final block in Figure 2, the output unit, is depicted in Figure 6. The unit outputs the state of activation for the entire network if a stable state is reached; that is, $x_i(t) = x_i(t+1)$ for $i = 1, 2, \ldots, N$. The termination condition is identified by a comparison between the network state at the beginning and the network state at the end of an operational epoch, that is, after updating all neurons.
Finally, Figure 7 presents the circuitry used to implement the computations concerning one neuron. The illustration covers a single neuron (neuron #1) and depicts the interconnections between the input, weight, and summation units. Connections to the output unit are not shown.

Control Unit.
The control unit is designed to activate the sequence of registers in the data flow (Figure 1). More specifically, after the capture-input is triggered, the signals originated by the control unit sequentially enable the registers of the input unit (including R_IN), the register of the weight unit, and R_ADDER in the summation unit. The control unit also generates the selection signals for the multiplexers in the input unit, to admit external data only in the first cycle of operation, and in the weight unit, to enable the computation of each neuron. At the end of an operational cycle, the unit generates control signals that allow the initial and final neural activation patterns of the entire network to be compared, in order to detect convergence. Algorithm 1 summarizes the control unit by presenting the pseudocode of the state machine.

Dimensions of the Registers.
Internal parameters of the network use two's complement representation and, due to the high number of interconnections in HNNs, it is important to establish the size of the weight registers. From (1), the maximum weight value is $\pm p$, reached when all the stored patterns contribute identical values to that weight. Therefore, the length of the weight registers ($l_w$) can be calculated by

$$l_w = \log_2 p + 2. \qquad (4)$$

The two bits are added because $\log_2 p$ bits represent only $p$ different states rather than the actual number $p$, and because the weight sign must also be represented. Equation (4) can be rewritten as a function of $N$, according to (3), by

$$l_w = \log_2\left(0.14\,N\right) + 2, \qquad (5)$$

with $N$ always a positive number.
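As a cross-check on register sizing, the following Python sketch computes the smallest two's complement width able to hold every integer between −m and +m. It is a lower bound derived from the representation itself, not the sizing formula of the paper, which adds a fixed margin and may therefore yield a larger width. The helper name `twos_complement_bits` is hypothetical.

```python
def twos_complement_bits(max_abs):
    """Smallest width b such that -max_abs..+max_abs fits in b-bit two's complement."""
    bits = 1
    # A b-bit two's complement word covers -2**(b-1) .. 2**(b-1) - 1.
    while (1 << (bits - 1)) - 1 < max_abs:
        bits += 1
    return bits


# Weight bound of +/- p for a few pattern counts.
for p in (2, 3, 4):
    print(p, twos_complement_bits(p))
```

For a weight bound of ±2, for example, three bits are the minimum, since a two-bit word covers only −2 to +1.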
Also, in order to prevent overflows while registering the weighted summations, it is possible to calculate, from (2), the maximum value of $v_i$, in a network with $N$ neurons, as $\pm p(N - 1)$. From the capacity analysis described by (3), the number of bits needed to register the value of $v_i$ can be written as a function of the number of neurons by

$$l_v = \log_2\left(0.14\,N\,(N - 1)\right) + 1, \qquad (6)$$

where $l_v$ denotes the length of the $v$ registers. The bit added is due to the representation of the sign of $v_i$. Also, in order to save space on the FPGA, only half of the $w_{ij}$ are stored. Since, as mentioned earlier, $w_{ij} = w_{ji}$, the registered weights are used twice in the present implementation [13].
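One possible way to store only half of a symmetric weight matrix is to keep the entries above the diagonal in a flat list and to map both index orders to the same position. The Python sketch below illustrates that idea; the layout and the name `tri_index` are assumptions made for the example and are not the memory organization described in [13] or used on the chip.

```python
def tri_index(i, j, n):
    """Position of the weight between neurons i and j in a flat list of the i < j pairs."""
    if i == j:
        raise ValueError("diagonal weights are zero and are not stored")
    if i > j:
        i, j = j, i                       # symmetric weights share one entry
    # Rows 0..i-1 hold (n-1), (n-2), ... entries; then offset inside row i.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


n = 5
positions = [tri_index(i, j, n) for i in range(n) for j in range(i + 1, n)]
print(positions)                          # one slot per unordered pair
print(tri_index(3, 1, n) == tri_index(1, 3, n))
```

For a network of n neurons this layout needs n(n − 1)/2 weight entries instead of n², and each stored entry serves both directions of a connection.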

Results and Discussions
This section presents the set of experiments conducted in order to obtain some relevant parameters of the HNN implementation on FPGA, such as maximum operating frequency and chip-area occupancy, as a function of the number of neurons. The target device chosen to embed the network was the Spartan-3E XC3S250E from Xilinx Inc. The choice was made in order to employ a device with features similar to those used in other published works [10,13], that is, an FPGA with approximately equal numbers of logic cells and system gates.
The parameters set for the first experiment were $N$ = 16 neurons, $p$ = 2 patterns, $l_w$ = 3 bits, and $l_v$ = 6 bits. According to ISE Project Navigator 13.1, used to design the proposed architecture, the maximum frequency was 81.390 MHz and the digital system occupied 197 slices, which corresponds to 8% of the space available on the chip. Figure 8 shows the network response to a prompting input pattern at a Hamming distance of 7 from a stable state, that is, with 7 corrupted bits. The figure shows that the network converged in approximately 1.3 μs, which is equivalent to 100 clock cycles.
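The relation between the cycle count and the convergence time can be checked with a short calculation. The sketch below assumes, purely for illustration, that the clock runs at the maximum frequency reported by the synthesis tool; the simulation of Figure 8 may have used a slightly slower clock, which would account for a somewhat longer time.

```python
f_max_hz = 81.390e6          # maximum frequency quoted in the text
cycles = 100                 # clock cycles to convergence quoted in the text

period_ns = 1e9 / f_max_hz               # duration of one clock cycle
time_us = cycles / f_max_hz * 1e6        # total time for the quoted cycle count

print(f"clock period: {period_ns:.3f} ns")
print(f"{cycles} cycles: {time_us:.3f} us")
```

Under that assumption, 100 cycles last a little over one microsecond, which places the convergence time in the microsecond range.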
In order to allow a graphic visualization of another experiment conducted, Figures 9 and 10 are presented. In this experiment, the parameters were set to $N$ = 32 neurons, $p$ = 4 patterns, $l_w$ = 5 bits, and $l_v$ = 9 bits. Figures 9 and 10 depict two stable states stored by the HNN (left side of the pictures) and the prompting inputs applied to the network with some corrupted bits (right side of the pictures). The first stored pattern, in Figure 9, represents the letter "U" and the second stored pattern, in Figure 10, represents the number "5". Both input patterns applied to the network are at a Hamming distance of 10 from the stored patterns, that is, they have 10 corrupted bits, and both were successfully restored by the developed system.
Table 1 shows the information obtained from the experiments conducted. The table contains the parameters $N$ and $p$ for the sequence of HNNs implemented, together with the maximum frequency, the maximum output time after clock, and the number of occupied slices for each implementation. Also, in order to better visualize the data, Figures 11 and 12 illustrate the maximum frequency and the number of occupied slices graphically, according to the network size.
From the obtained data, it can be seen that the architecture developed takes, at most, 4.144 ns to produce an output after the clock input, independently of the network configuration. Such stability is due to the parallel architecture implemented, because adding neurons to the network does not increase the depth of the proposed circuit and, therefore, does not increase the maximum output time after clock.
Figure 9: Illustration of one pattern memorized by the HNN representing the letter "U" (left) and its corrupted version applied to the network (right). After convergence, the system successfully output the stored pattern (left).
As expected, Figure 11 shows a decrease of the maximum input frequency of the system as neurons are added to the network. This decrease is due to a greater spread in the distribution of the FPGA resources caused by the increased number of logic elements and interconnections used in the network architecture.
The chip-area occupancy, however, presents a significant decrease between the 13th and 14th implementations, despite the addition of a neuron to the network, as shown in Figure 12. This occurrence is due to the increased number of patterns memorized by the network. The addition of one neuron, from 28 to 29 units, between experiments 13 and 14 (Table 1), allows the network to store one more pattern, raising the number of stable states from 3 to 4, which generates only even values of weights (no odd values are possible), according to (1). Even values represented in binary format have a zero in the least significant bit. This enables the FPGA synthesis tool to reduce the entire logic chain employed to implement the presented architecture.
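The parity argument can be verified by enumerating every value that a single off-diagonal weight can take. The Python sketch below does so for three and four stored patterns; the function name `offdiag_weight_values` is hypothetical, and the enumeration relies only on the fact that each stored pattern contributes either −1 or +1 to a weight.

```python
from itertools import product


def offdiag_weight_values(p):
    """All values one off-diagonal weight can take when p bipolar patterns are stored."""
    # Each pattern contributes the product of two bipolar elements: -1 or +1.
    return sorted({sum(terms) for terms in product((-1, +1), repeat=p)})


for p in (3, 4):
    values = offdiag_weight_values(p)
    all_even = all(v % 2 == 0 for v in values)
    print(p, values, "all even" if all_even else "odd values present")
```

With an even number of patterns every weight is even, so its least significant bit is always zero, whereas an odd number of patterns produces only odd weights.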
Lastly, an experiment was conducted in order to compare the proposed strategy of employing multiplexers instead of multipliers to calculate $w_{ij} x_j$ with the common approach. The experiment consisted of implementing the target circuit of the weight unit (Section 3.1) using both architectures, multiplexers and multipliers, for the first and the last trials of Table 1. In the first implementation ($N$ = 16 neurons, $l_w$ = 3 bits), the use of multiplexers allows a reduction of 71 occupied slices on the chip. For the 17th implementation in Table 1 ($N$ = 32 neurons, $l_w$ = 5 bits), the proposed architecture allows a saving of 128 slices in the weight unit when compared to the use of common multipliers.

Conclusions
The present paper details a methodology for implementing the asynchronous version of the Hopfield Neural Network on an FPGA. The proposed architecture avoids the use of multipliers by using an array of multiplexers and does not store duplicated weights on the chip. Along with the proposal of the digital system, an approach to estimate the length of the registers involved in the network design is presented. A set of experiments with the developed neural chip shows that the maximum operating frequency decreases as the number of neurons grows. On the other hand, an interesting observation is that the chip-area utilization does not always increase with the network size; it also depends on other variables, such as the weight matrix and the number of stored patterns of the HNN. The present proposal allows the implementation of larger networks on the chip by properly setting such parameters.

Figure 8: Simulation of the HNN, converging in approximately 1.3 μs to the stored stable state: "000011110000".

Figure 10: Illustration of one pattern memorized by the HNN representing the number "5" (left) and its corrupted version applied to the network (right). After convergence, the system successfully output the stored pattern (left).

Table 1: Parameters obtained from the experiments.