Implementation of Blockchain Consensus Algorithm on Embedded Architecture

The adoption of Internet of Things (IoT) technology across many applications, such as autonomous systems, communication, and healthcare, is driving the market’s growth at a positive rate. The emergence of advanced data analytics techniques such as blockchain for connected IoT devices has the potential to reduce cost and increase cloud platform adoption. Blockchain is a key technology for real-time IoT applications, providing trust in distributed robotic systems running on embedded hardware without the need for certification authorities. There are many challenges in blockchain IoT applications, such as power consumption and execution time. These specific constraints have to be carefully considered alongside other constraints such as the number of nodes and data security. In this paper, a novel approach is discussed, based on a hybrid HW/SW architecture and designed for Proof of Work (PoW) consensus, which is the most used consensus mechanism in blockchain. The proposed architecture is validated using the Ethereum blockchain with Keccak 256 and the field-programmable gate array (FPGA) ZedBoard development kit. This implementation shows an improvement in execution time of 338% and a reduction in power consumption of 255% compared to the use of Nvidia Maxwell GPUs.


Introduction
The global IoT market is expected to reach a value of USD 1,386.06 billion by 2026 from USD 761.4 billion in 2020, at a CAGR of 10.53% during the period 2021-2026 [1].
IoT technology connects various devices, such as mobile phones, sensors, and household appliances, to collect and share data for the next industrial revolution of intelligent connectivity. The fourth industrial revolution (Industry 4.0) interconnects smart digital technology with the real world to create smart manufacturing and supply chain management [1]. In the current context, the emergence of Industry 4.0 and the adoption of IoT devices require manufacturers to implement innovative ways to advance production with intelligent connectivity that uses more robotics and avoids industrial accidents and machine downtime. Therefore, industries, hospitals, supply chains, governments, banks, and logistics need to be connected using Distributed Ledger Technologies (DLT) such as blockchain to react quickly in a more connected world. This will enable more secure processing of the big data generated by IoT devices.
Blockchain is a data storage, management, and distribution technology that is transparent and secure and that operates without a central control body [2].
Unlike traditional methods, blockchain allows peer-to-peer transfer of digital assets without the need for an intermediary.
Representing the new generation of blockchain, Ethereum can play the role of a public blockchain like Bitcoin or of a private blockchain such as Hyperledger Fabric. It is also the basis of other blockchains that serve as application-specific frameworks. One example is Azur, the blockchain proposed by Microsoft, which was optimized to take advantage of the characteristics of the cloud. Another example is the Grid+ blockchain, which is used in energy management applications.
To preserve the security of the blockchain, a specific algorithm, known as consensus, is used. It allows a new block to be added to the blockchain without compromising the integrity of data stored in the distributed ledger.
Moreover, some blockchains are defined with smart contracts and software platforms that act as links in the blockchain. However, all these blockchains use consensus to preserve their security. In this context, several types of consensus are proposed in the literature, such as Proof of Work (PoW), Proof of Stake (PoS), Proof of Authority (PoA), PBFT, Ripple, and Raft [14]. These consensus algorithms have different complexity levels. One of the most complex and energy-intensive is PoW, which has been used in several blockchains such as Bitcoin, Ethereum, and IoTA [15]. As an example, the mining process time is approximately 10 minutes for Bitcoin [16] and 15 seconds for Ethereum using an Nvidia RTX 3080 GPU [17]. Regardless of the number of miners, it still takes about 10 minutes to mine one Bitcoin. At 600 seconds (10 minutes), all else being equal, it will take 72,000 GW (or 72 terawatts) of power to mine a Bitcoin using the average power usage provided by ASIC miners [16]. The use of blockchain, particularly the mining part, requires significant computing resources. In this paper, a feasibility study of implementing the blockchain on an embedded system, and particularly on a field-programmable gate array (FPGA), is presented, taking into consideration all the resource requirements to validate this approach.
An embedded architecture is proposed to implement the PoW consensus, especially on FPGA-based architecture.
This optimized architecture should accelerate the classical PoW process and consequently minimize the energy consumption. The proposed architecture is chosen according to a comparison between different software (SW), hardware (HW), and mixed architectures.
More precisely, the main contribution of this paper consists of two parts. First, an embedded architecture is proposed to implement the PoW consensus algorithm on FPGA. This part is called the off-chain system block. The second part is dedicated to the design of an off-chain/on-chain system. The PoW implementation, and particularly the hash algorithm, is off-chain (on FPGA). The node smart contract, transactions, and blocks are on-chain (they are implemented on the Raspberry Pi 3 platform).
The remainder of this paper is organized as follows. In Section 2, we describe the basic notions of the blockchain, particularly its different consensus algorithms, followed by a study of embedded technologies and mixed HW/SW architectures [18]. In Section 3, the PoW used in the Ethereum blockchain is described in detail. The profiling of this function allows us to determine the embedded architecture to be chosen. Section 4 is reserved for the choice of the architecture and the different parts of our system containing the consensus implementation. Its last part presents the results obtained and the comparison between the SW implementation on GPU and the HW-implemented architecture in terms of execution time and energy consumption. Finally, in Section 5, we conclude and give potential perspectives.

Blockchain Overview.
In this section, we give an overview of the blockchain technology and its different classes and main components.

Security-Based Blockchain Classification.
From the security point of view, blockchain can be classified as public, consortium, and private.

Public Blockchain.
The blockchain is said to be public because it is open to everyone. Thus, it can be likened to a marketplace where anyone can open a store to offer any products and services. In this case, there are no restrictions on the comings and goings of visitors, who are free to visit the different stores to make purchases.
Consequently, a public blockchain has several characteristics: the network is decentralized and open to all actors without any restriction, and data can be consulted by all without any restriction, but it is indelible, forgery-proof, and cannot be modified afterwards. In this class of blockchain, the use of the PoW consensus makes the blockchain's transactions practically impossible to falsify and very difficult to manipulate.

Consortium Blockchain.
It consists of a permissioned blockchain which is partially decentralized and differs from public blockchains because its network is only accessible to a limited number of users.
New members must be validated by the nodes and already existing members in the consortium, and the accessibility of the data depends on the access rights granted to each node. It can be compared to a corporate marketplace (here, the "consortium") for which only consortium members would be allowed to open a store to offer products and services. However, the consortium may grant some exemptions to open additional stores. The comings and goings in this marketplace are normally restricted by the rules defined by the consortium.
It should be noted that the vast majority of existing consortium blockchains operate under the Proof of Authority (PoA) system. As examples of consortium blockchains, we can cite Ripple [20], Funds DLT, etc.

Private Blockchain.
In contrast to public blockchains, private blockchains (of which permissioned blockchains are a special case) are like distributed databases. Their characteristics are as follows: (i) The network is accessible to a limited number of users. New entrants must be validated by a central decision-making body. (ii) The accessibility of the data depends on the access rights of each node. These rights are defined by the central decision-making body. (iii) On a private blockchain, the consensus is based on the trust placed in all the validator nodes.
A private blockchain can be compared to a marketplace where all members authorized to launch a store, or to sell products and services, belong to the same structure.
As a result, the use cases are numerous. Like distributed databases, private blockchains are useful for sharing confidential or important data within an organization or among the different entities of a group.
There are many examples of private blockchains. We can cite Hyperledger Fabric, Grid+, Azur, Ethereum (which can be both private and public), etc.

Consensus Algorithms.
Consensus arises from the transition away from centralized systems, such as the banking system and database management systems, in which an administrator or a central system validates or invalidates transactions.
In this kind of system, the administrator is the one who decides whether a transaction is valid or invalid. In decentralized systems such as blockchains, the absence of an administrator requires another protocol for verification and validation. The intermediary functions are moved to the periphery, to the peers participating in the infrastructure of the chain. These peers do not necessarily know each other, which is characteristic of a decentralized system. The consensus algorithm consists of first setting up a process to validate, verify, and confirm transactions, then recording the transactions in a large distributed ledger, creating a block record (a chain of blocks), and finally implementing a consensus protocol.
Thus, validation, verification, consensus, and immutable recording lead to trust and security of the blockchain.
Several types of consensus are used in the blockchain, including PoW [21], PoS [21], PoA [21], PBFT [21], Ripple [20], and DAC [21]. In this paper, we describe only the PoW algorithm, which will be implemented in HW (FPGA platform). In the next part, we describe the state of the art of embedded systems.

Overview of Embedded Architectures' Solutions
The evolution of electronics and microelectronics has made it possible to minimize the size of transistors to increase the number of electronic components integrated on the same chip. The main component is the microprocessor. Microprocessors consist of one or more central processing units (CPUs), as well as other modules required for their operation such as memory controllers, cache memory, and I/O controllers.
However, in some systems, the integrated circuit contains not only the microprocessor but also other components such as microcontrollers and GPUs. Such a system is called a System on Chip (SoC). These SoCs are designed to minimize space and power consumption while preserving the performance required by the constraints of the target applications.
For example, a typical modern SoC contains the CPU, the GPU, the communication modules (Wi-Fi, Bluetooth, etc.), a module for localization, as well as other subsystems and coprocessors providing various functions such as device security [9].
These SoCs are used in applied computer systems generally called embedded systems. Although there is no formal definition of the latter, they are generally information systems designed for well-defined tasks [22] and integrated in other products [23]. The use of embedded systems has also reached blockchain technology. Thus, e-health, agriculture, light and heavy industry, e-learning, and augmented reality [24] applications are often based on SoCs to set up systems that meet their different needs.
Thus, we find different architectures suited to the different needs: single-processor systems whose performance is enhanced by HW accelerators (IPs) [24], or massively parallel architectures that take advantage of a large number of processors operating in parallel [25].
Although embedded systems are used in several domains, their use in the blockchain domain has remained rather limited, especially for FPGA technology. In fact, despite the FPGA's various internal resources, such as embedded high-speed memory, parallel computing blocks, and a flexible architecture, which are suitable for computationally complex applications, its use is still limited to the PoW consensus.
Such an idea is rarely discussed in the literature. We mention in particular the work presented in [18], where the authors presented the possibility of implementing an embedded robotics application managed by blockchain.
In the work by Chaari [26], an embedded system based on a Raspberry Pi 3 platform was used. One of the problems encountered in this work is essentially that the Raspberry Pi is unable to run all the PoW consensus software functions due to its limited capabilities.
In this paper, the main target is to propose an embedded architecture suitable for blockchain applications and able to support the implementation of the PoW consensus. Hence, we will show the feasibility of such an architecture, adapted to Ethereum PoW on FPGAs, as well as the gain it provides.

Ethereum Blockchain Components.
In this section, we are interested in blockchain components, especially those of the Ethereum blockchain. The blockchain is based on specific terminology representing important concepts. Among the main concepts of the blockchain are the following.

Transactions.
These are the exchanges of data between different users. Each transaction is signed with the sender's private key.
Thanks to this signature, the security of the transactions is guaranteed. Therefore, any modification of these transactions during transmission can be detected and rejected.

Blocks.
A block is a record in the blockchain which contains the confirmed transactions.
Thus, each open transaction will be added to a block. After a period, for a new block containing transactions to be added to the blockchain, it must be validated by a selected participant called a miner. This validation operation is called mining.

The Block Chains.
Each block in the blockchain is linked to the previous block. This link is made by inserting the hash of the previous block. Therefore, each block includes not only its own hash but also the hash of the previous block. Figure 1 illustrates this linking. In this way, the blockchain is protected from any form of corruption.
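The linking described above can be illustrated with a minimal Python sketch. It is not the Ethereum block format: the block fields and the helper names (`block_hash`, `append_block`, `chain_is_valid`) are hypothetical, and `hashlib.sha3_256` (NIST SHA-3) is used only as a stand-in, since it uses a different padding from Ethereum's Keccak 256 and therefore produces different digests.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's content (stand-in hash: NIST SHA3-256, not Keccak 256)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha3_256(payload).hexdigest()


def append_block(chain: list, transactions: list) -> dict:
    """Create a block that stores the hash of the previous block."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "transactions": transactions,
        "prev_hash": prev_hash,
    }
    chain.append(block)
    return block


def chain_is_valid(chain: list) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Hypothetical example data.
chain = []
append_block(chain, ["genesis"])
append_block(chain, ["A pays B 5"])
append_block(chain, ["B pays C 2"])
```

Changing any field of an earlier block changes its hash, so the `prev_hash` stored in the next block no longer matches and `chain_is_valid` returns `False`.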

Smart Contracts.
A smart contract is software "installed" on a blockchain solution. It is the most important link in the blockchain. It runs automatically as soon as the various preprogrammed constraints are checked. Even though it is not a legal document, the smart contract automates the execution of a contractual commitment.
A consensus algorithm is a process through which all the nodes of the blockchain network achieve a common agreement about the actual state of the distributed ledger [26]. A well-designed consensus protocol can ensure the fault tolerance, authenticity, and security of a blockchain system.

Ethereum Consensus Algorithm.
The Ethereum consensus is based on the Ethash algorithm, also known as the Dagger Hashimoto algorithm. The simplified diagram [28] in Figure 2 represents the structure of this algorithm, and particularly its main part [29]. The profiling of the Ethash algorithm shows that the most frequently used and most time-consuming part is the Keccak 256 part. Therefore, we will implement this part in HW.

From SW to HW Architecture
We notice that the implementation of new technologies (IoT, identification, recognition, virtual reality, etc.) is no longer carried out on traditional platforms (PCs, servers, GPUs, etc.) but on embedded systems that can be either generic or well-tailored to the specific requirements of these emerging applications.
To set up a customized solution, it is important to use a mixed SW/HW design allowing an adequate mixture of programmability and computing power.
Unlike the development of computer-based software and systems, which is very resource-intensive, the implementation of a System-on-Chip is based on a specific methodology to meet the limitations imposed by the target platforms. In this section, we characterize the methodology used to realize the design flow of a System-on-Chip. The development can be carried out according to several models. The V model presents the development cycle of a system. This approach is based on two axes: (i) an axis of specification and design, whose parameter is the realization time, and (ii) an axis of realization and integration, whose parameters are the systems and components. Starting from a defined need, the first stage, which is the specification stage, consists of defining the system to be realized in generic terms and then specifying the performance to be met. Then, the design stage is carried out. Like the specification, the design has two parts: a first, generic part followed by a second, detailed part during which the system is subdivided into different blocks.
This conceptual approach leaves room for the realization of the components of our system.
Once the system realization part is completed, a battery of tests is necessary before obtaining our finished product. We start with unit tests to verify the functioning of the previously defined blocks. Then, the integration of these different blocks is tested. After that, a performance verification is set up to check the specification presented in the first part. Then, the system integration is done for validation. Finally, an operational test is carried out to verify compliance with the expected specification. Once this is completed, our product is finalized. It thus meets the need defined previously [21,30].

Embedded System Fields of Application
The use of embedded systems is emerging in several fields such as agriculture, Industry 4.0, smart cities, and e-health. To design efficient embedded architectures for blockchain applications, we need to profile the consensus algorithm in order to design an architecture on the FPGA platform. The result can be a monoprocessor or a multiprocessor architecture. In the latter case, the different tasks are distributed across the processors during program execution.
In other systems, it is possible to have a monoprocessor architecture with coprocessors (also named IPs). These coprocessors are designed using a HW description language such as VHDL, Verilog, SystemVerilog, or SystemC.
Such an approach was used, for example, in the study by Frikha et al. [31], where the authors implemented an adaptive multimedia multiconstraint system based on dynamic reconfiguration on FPGAs, with augmented reality as a case study. In the work by Boutekkouk [32], the author presented the design of an intelligent embedded system. This system can be used in many artificial intelligence-based systems such as expert systems, neural networks, and other sophisticated artificial intelligence (AI) models to guarantee some important characteristics such as self-learning, self-optimising, and self-adaptation. Among the embedded systems' application fields, we can also mention smart cities [33], smart agriculture [34], and e-health [35]. All these IoT-based fields use embedded systems mainly for their adaptability in designing systems with low energy consumption.
In this paper, we choose a monoprocessor system coupled with hardware accelerators that execute the most complex part of the application. Using the same approach proposed in the study by Frikha et al. [31], we profiled the consensus algorithm proposed by Ethereum. Thanks to this profiling, we can choose the architecture that best minimizes the resources and improves on the SW execution time.
We propose to implement an embedded architecture for the Ethereum hash algorithm. This algorithm, named Ethash, is based on SHA-3. The implemented part is the Keccak 256 algorithm.
To the best of our knowledge, this blockchain approach has not been previously implemented. Additionally, the key idea of the work is to address the problem of the high energy consumption of public blockchains.

The Proposed Consensus Embedded Architecture
Since the PoW consensus algorithm is the most time-consuming and energy-intensive part of the blockchain process, the aim of this paper is to reduce its execution time. The proposed approach is based on a mixed on-chain and off-chain implementation. Only one part of the implementation (the on-chain part) is connected to the blockchain. The other part (the off-chain part) is connected directly to the on-chain part, and it is responsible for giving the consensus result.
More precisely, inspired by Baklouti and Abid [25], we have set up this system to implement the PoW consensus, and more specifically the Keccak 256 part of the algorithm, on FPGA to perform the off-chain hashing.
Keccak 256 is the part of Ethash that is repeated throughout the PoW. Figure 3 represents the Keccak deployment architecture.
In this section, we compare the software implementation and the hardware implementation of the Keccak hash algorithm. Profiling shows that Keccak is the most complex, energy-consuming, time-consuming, and frequently repeated function.
The inputs of the Keccak system are the proposed new block, the header of the most recent block, and the nonce value. Hashing the combination of these inputs gives a hash number. If this number is less than the target value, the PoW is solved; otherwise, the nonce is incremented and the whole process is tried again.
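The nonce search described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Ethash procedure: the header bytes, the 8-byte nonce encoding, the target value, and the names `pow_hash` and `mine` are all hypothetical, and `hashlib.sha3_256` (NIST SHA-3) stands in for Keccak 256, which uses a different padding and gives different digests.

```python
import hashlib


def pow_hash(header: bytes, nonce: int) -> int:
    """Hash the block data together with the nonce and return it as an integer.

    Stand-in hash: NIST SHA3-256, not Ethereum's Keccak 256.
    """
    data = header + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def mine(header: bytes, target: int, max_nonce: int = 2**32):
    """Increment the nonce until the hash number falls below the target."""
    for nonce in range(max_nonce):
        if pow_hash(header, nonce) < target:
            return nonce  # PoW solved
    return None  # no solution in the searched range


# Hypothetical inputs: new block data combined with the latest block header.
header = b"previous-block-header|new-block-data"
# Assumed target: roughly one hash in 4096 is below 2**244.
target = 1 << 244
nonce = mine(header, target)
```

A lower target means fewer hash values qualify, so more nonces must be tried on average; this is how the target encodes the mining difficulty.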
The mining difficulty is determined by comparing the hash number with the target value. As mentioned in the work by Chaari [26], implementing the Ethereum blockchain node on a resource-constrained platform such as the Raspberry Pi 3 shows that running PoW leads to a platform crash.
As a first contribution, we present here the study we carried out in order to divide our node into two parts: a node without PoW that works on-chain and runs on the ARM processors of the Raspberry Pi 3, and an off-chain verification part implemented on FPGA.
In the following section, we will describe the obtained results and the implemented system.

Initial System.
After writing our genesis file and running the init command on the Raspberry Pi 3, the initialization of our blockchain was successful. Then, we were able to execute the node and access the JavaScript console, where we performed some basic ether transfer transactions between the predefined accounts, which were successfully submitted. However, the moment mining started, the Raspberry Pi 3 would overheat and stop functioning. For that reason, we executed another node from the same blockchain on a computer that was able to mine the transactions and synchronize the results with the node running on the Raspberry Pi 3, as illustrated in Figure 4. Therefore, using Proof of Work, a Raspberry Pi 3 can only synchronize the mined blocks but not mine new ones. That is why we decided to implement the consensus system off-chain. By taking the Java code of the Ethereum node, we managed to isolate the part corresponding to the PoW consensus. This code has also been profiled to obtain the result of Figure 3. The result of this profiling is described in Figure 5. Several loops are present: the loop over the nonce is repetitive and independent of any other input. We can consequently implement it as a VHDL system and create several generators of nonce values.

VHDL Keccak Implementation.
Due to the health crisis and the impossibility of obtaining more powerful platforms, we chose to use the available ones. Hence, we use the Raspberry Pi 3. The ZedBoard was chosen because it outperforms the Virtex 5 ML 507 board. The Keccak code was implemented in VHDL to create an ASIC that works off-chain to compute the hash and to set up the PoW consensus. We used the Xilinx ZedBoard FPGA as a prototyping platform to realize the Keccak [29]. This board is an evaluation and development board based on the Xilinx Zynq 7000.
Combining a dual Cortex-A9 Processing System (PS) with 85,000 Series-7 Programmable Logic (PL) cells, the Zynq-7000 AP SoC can be targeted for broad use in many applications. The ZedBoard's robust mix of on-board peripherals and expansion capabilities makes it an ideal platform for both novice and experienced designers [29].
Figure 6 represents the proposed architecture of the Keccak RTL implementation. It contains the different inputs and outputs as well as the logic gates, FIFO, Padder block, Hash block, and different RTL registers.
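To make the role of a padder block concrete, the following Python sketch shows the standard Keccak multi-rate padding (pad10*1) for a 136-byte rate, which is the rate of Keccak 256. It is only an illustration: the source does not describe the internals of the authors' Padder block, and the name `keccak_pad` is hypothetical.

```python
RATE_BYTES = 136  # Keccak 256 rate: (1600 - 2 * 256) / 8 bytes


def keccak_pad(message: bytes, rate: int = RATE_BYTES) -> bytes:
    """Pad a message to a multiple of the rate using Keccak's pad10*1 rule."""
    pad_len = rate - (len(message) % rate)  # always between 1 and rate
    padding = bytearray(pad_len)
    # First padding bit. Original Keccak (as used by Ethereum) uses 0x01;
    # NIST SHA-3 uses 0x06 here, which is why the two digests differ.
    padding[0] |= 0x01
    # Final padding bit; when pad_len == 1 both bits share one byte (0x81).
    padding[-1] |= 0x80
    return message + bytes(padding)


padded = keccak_pad(b"abc")
```

After padding, the message length is a whole number of 136-byte blocks, so the hash stage can absorb it one block at a time.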

Simulation Results.
Following the Keccak RTL implementation, the VHDL code simulation is shown in Figure 7. The nonce value that yields the hash value is indicated in the figure by the arrow. Note that the nonce value used to obtain the proper hash is 239327.

SW and HW Comparison.
After implementing the code, we compared the SW version of the code, implemented in Java and running on the Raspberry Pi 3, with the two HW architectures. The HW1 architecture represents the complete implementation of the Keccak code presented in Figure 3. HW2 uses 4 nonce-generating IPs working in parallel in order to parallelize the code and minimize the execution time.
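The idea behind HW2 can be sketched in software as four nonce generators that each cover a disjoint slice of the nonce space. The interleaved scheme below (generator k tests nonces k, k + 4, k + 8, and so on) is an assumption made for illustration; the source does not state how the nonce space is divided. The names `pow_hash` and `mine_interleaved` are hypothetical, the generators are simulated sequentially rather than run in parallel, and `hashlib.sha3_256` stands in for Keccak 256.

```python
import hashlib

NUM_GENERATORS = 4  # HW2 in the text uses 4 nonce-generating IPs


def pow_hash(header: bytes, nonce: int) -> int:
    """Stand-in PoW hash (NIST SHA3-256, not Ethereum's Keccak 256)."""
    data = header + nonce.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def mine_interleaved(header: bytes, target: int, max_steps: int = 1_000_000):
    """Simulate the generators in lockstep; each step tests one nonce per lane.

    Returns (nonce, lane, steps) for the first solution, or None.
    """
    for step in range(max_steps):
        for lane in range(NUM_GENERATORS):
            nonce = step * NUM_GENERATORS + lane  # lane k sees k, k+4, k+8, ...
            if pow_hash(header, nonce) < target:
                return nonce, lane, step + 1
    return None


header = b"previous-block-header|new-block-data"  # hypothetical input
target = 1 << 244  # assumed target, about one success per 4096 hashes
result = mine_interleaved(header, target)
```

In this model the number of lockstep steps is about a quarter of the winning nonce, which is the ideal case; a real design also pays for control logic and result collection, so the measured gain can be smaller.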
We notice that the HW1 gain compared to the SW is approximately 5.25x. The HW2 gain compared to the SW is approximately 7.55x. The power consumption of the Raspberry Pi 3 is 3.7 W; in the HW version, HW1 requires 1.2 W, while HW2 requires 1.7 W. The difference in consumption, despite the use of the same platform (ZedBoard) for HW1 and HW2, is due to the duplication of the nonce generator IPs. Table 1 illustrates the obtained results of the HW/SW comparison.
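As a rough worked example, the speedup and power figures quoted above can be combined into an energy-per-task ratio, using energy = power x time. This is a derived estimate rather than a result reported by the authors, and it assumes that the quoted wattages are average power over the whole run and that the speedups apply to the same workload.

```python
# Figures quoted in the text.
SW_POWER_W = 3.7    # Raspberry Pi 3
HW1_POWER_W = 1.2   # ZedBoard, single Keccak implementation
HW2_POWER_W = 1.7   # ZedBoard, 4 nonce generators
HW1_SPEEDUP = 5.25  # execution-time gain of HW1 over SW
HW2_SPEEDUP = 7.55  # execution-time gain of HW2 over SW


def energy_ratio(sw_power: float, hw_power: float, speedup: float) -> float:
    """Return E_sw / E_hw, with E = P * t and t_sw = speedup * t_hw."""
    return (sw_power * speedup) / hw_power


hw1_ratio = energy_ratio(SW_POWER_W, HW1_POWER_W, HW1_SPEEDUP)
hw2_ratio = energy_ratio(SW_POWER_W, HW2_POWER_W, HW2_SPEEDUP)
print(f"HW1 uses about {hw1_ratio:.1f}x less energy than SW")
print(f"HW2 uses about {hw2_ratio:.1f}x less energy than SW")
```

Under these assumptions both ratios come out at about 16x (roughly 16.2 for HW1 and 16.4 for HW2), which suggests that the extra power drawn by the duplicated nonce generators is offset by the shorter run time.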
The system obtained after this implementation is described in the following figures, which show the classical architecture of Ethereum followed by the on-chain/off-chain architecture that has been adopted. Figure 8 presents the part proposed in this paper, with an on-chain architecture implemented on the Raspberry Pi 3 and an off-chain one set up on FPGA. Figure 9 represents a comparison between the classical blockchain architecture and the proposed architecture.

Conclusion
In this paper, we have highlighted the HW implementation of the PoW consensus. This consensus is used in the Ethereum blockchain. We were able to demonstrate that, to successfully implement this consensus on low-resource platforms, it is possible to use an on-chain system to transfer and receive data and an off-chain system to implement the consensus and send the result to the on-chain node. This system, despite its complexity, allows a gain of at least 5 times in execution time compared to a pure SW system, while minimizing energy consumption. It can also be improved and accelerated by working on the different blocks of the consensus. Indeed, we have added 4 nonce generator IPs, but we could improve the result even more by adding more Keccak 256 and/or 512 IPs to obtain a more efficient and faster system.

Security and Communication Networks
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.