Nakamoto Consensus to Accelerate Supervised Classification Algorithms for Multiparty Computing

Bitcoin mining consumes tremendous amounts of electricity to solve the hash problem. At the same time, large-scale applications of artificial intelligence (AI) require efficient and secure computing. There are many computing devices in use, and their hardware resources are highly heterogeneous. This means a mechanism is needed to coordinate cooperation among computing devices, and a good computing structure is required when data are dispersed. In this paper, we propose an architecture in which devices (also called nodes) can reach a consensus on task results using off-chain smart contracts and private data. The proposed distributed computing architecture can accelerate computing-intensive and data-intensive supervised classification algorithms with limited resources. This architecture can significantly strengthen privacy protection and prevent leakage of distributed data. Our proposed architecture supports heterogeneous data, making computing on each device more efficient. We used mathematical formulas to prove the correctness and robustness of our system and derived the condition for stopping a given task. In the experiments, we transformed Bitcoin hash collision into distributed computing on several nodes and evaluated the training and prediction accuracy for handwritten digit images (MNIST). The experimental results demonstrate the effectiveness of the proposed method.


Introduction
Artificial intelligence (AI) has significantly affected many aspects of human life, solving tasks such as image classification and object detection with supervised classification algorithms. Supervised classification algorithms use computational methods to learn information directly from data, and prediction accuracy generally increases with the number of training samples. However, more training samples also mean longer training time.
Thus, an architecture in which nodes can quickly obtain results and reach a consensus through off-chain smart contracts and private data would be extremely useful.
At present, research on blockchain-based computational power can be summarized as follows: (1) Some works leverage auction mechanisms to offload tasks [1][2][3][4][5][6]. In these works, an application is divided into multiple tasks, which are offloaded to a cloud server or edge servers, with time consumption, energy consumption, and the edge servers' reputation serving as auction criteria. To our knowledge, these works do not consider heterogeneous devices or privacy. (2) Several methods use deep learning to derive task offloading for heterogeneous devices [7][8][9]. However, privacy issues have not been well solved, or the computing devices must work in a permissioned network. (3) Distributed computing based on Federated Learning (FL) has been proposed because it can protect privacy and reduce network burden [10][11][12][13]. FL can complete AI computation without disclosing data, but it is not well suited to heterogeneous devices.
Smart contracts are naturally a distributed computation technology. Although mature, on-chain smart contracts have shortcomings in running complex programs. For example, Bitcoin scripts are not Turing-complete [13], and Ethereum is ill-suited to executing complex computations [14][15][16].
A consistent result must be obtained when computing over distributed and heterogeneous data, while also considering speed, energy consumption, and privacy protection. Reputation is an important index for evaluating nodes: nodes with a higher reputation can process more tasks and earn more rewards. Many blockchain-based computing models punish malicious nodes [17][18][19]. However, with different training samples and devices, even honest nodes can make mistakes in AI computations (e.g., supervised classification algorithms). Such penalties would make node reputations fluctuate greatly and affect the calculation results.
In our proposed model, off-chain smart contracts and private data (edge data centers) are leveraged for multiparty computing. Our method can speed up training and improve both prediction accuracy and privacy. Our experimental results on MNIST show that the cooperation of many low-power nodes is not weaker than centralized servers. The key points of this paper can be summarized as follows: (1) We propose a computing model with strong privacy protection and compatibility for supervised classification algorithms, which accelerates training and improves the accuracy of prediction results. (2) In the proposed architecture, we prove that the prediction result with the most supporters is most likely to be the correct one. (3) Using Nakamoto consensus, we quantify how the number of nodes and the prediction accuracy of a single node affect the accuracy of the entire blockchain, and we derive the condition for task termination. (4) We further discuss the influence of malicious nodes and lazy nodes on the prediction results and prove robustness against both. The remainder of this article is organized as follows. Section 2 discusses the background of the study and explains the rationale behind the design. Section 3 presents the novel distributed computing architecture. Section 4 explains how the methodology improves the prediction accuracy and robustness of the supervised classification algorithm. Section 5 presents the experimental results and discusses the performance of the proposed architecture. Finally, Section 6 concludes the study.

Background
Like traditional programs, smart contracts can be stored and executed. However, smart contracts are distributed programs residing in the blockchain. They are triggered automatically according to instructions and do not require the participation of a third party. For a trigger event, the result of distributed execution must be unique, so all nodes need to agree on it; the mechanism for reaching this agreement is known as the consensus mechanism. This section introduces the consensus mechanism in Bitcoin and presents relevant research on smart contracts.

Bitcoin and Nakamoto Consensus.
At present, blockchain technology has become a research hot spot in finance, IoT, copyright protection, and information technology. It is a decentralized peer-to-peer (P2P) architecture in which the nodes are the network participants. Blockchain establishes transparency and trust without relying on a trusted third party.
As the first widely deployed decentralized global currency, Bitcoin has attracted increasing attention. Nodes in Bitcoin compete to solve challenging Proof of Work (PoW) problems, and a solution is found about every ten minutes. The winners who solve the problems receive bonuses, which are recorded in blocks. The blocks propagate among Bitcoin nodes through the network, so every node keeps a redundant record of the bonuses. When a block is accepted and added to the blockchain, the height of the blockchain increases by one. In some cases, multiple nodes solve the problem before receiving the solutions of other nodes, so multiple blocks may be generated at the same height.
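The PoW "hash problem" described above can be illustrated with a toy miner. This is a minimal sketch, assuming an arbitrary byte string stands in for the 80-byte Bitcoin block header and using an artificially easy target; the names `double_sha256` and `mine` are hypothetical. It keeps two real Bitcoin details: the hash is SHA-256 applied twice, and the digest is compared with the target as a little-endian 256-bit integer.

```python
import hashlib
from typing import Optional, Tuple


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_prefix: bytes, target: int,
         max_nonce: int = 2**32) -> Optional[Tuple[int, bytes]]:
    """Try nonces until the hash, read as a little-endian integer, is below target."""
    for nonce in range(max_nonce):
        # Append a 4-byte little-endian nonce to the toy header.
        digest = double_sha256(header_prefix + nonce.to_bytes(4, "little"))
        if int.from_bytes(digest, "little") < target:
            return nonce, digest
    return None  # no nonce in range satisfied the target


# A toy target of 2**244 needs about 2**12 = 4096 attempts on average.
result = mine(b"toy block header", target=2**244)
```

Halving the target doubles the expected number of attempts, which is the lever that Bitcoin's difficulty adjustment pulls.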
A consensus mechanism is an algorithm by which a group of nodes reaches agreement on events and their order.
There are many consensus mechanisms, such as PBFT [20], Paxos [21], Raft [22], Proof of Stake (PoS) [23], Tangle [24], and Shimmer [19]. The core technology for reaching consensus in Bitcoin is the Nakamoto consensus, as presented in Figure 1. In the figure, blocks are chained in succession from Block_0 to Block_{n-1}, where Block_0 is at height 0 and Block_{n-1} is at height n-1. At height n, nodes Alice and Bob declare their blocks Block_n and Block*_n concurrently. Node Carol receives Block_n before Block*_n, so Block_n is added to Carol's fork as the tip, and Block*_n is stored as a backup. The same goes for Alice and Bob: each builds on its own block, and blocks received later are stored in memory as backups. Alice and Carol therefore keep the same tip, while Bob keeps a different one. At height n+1, Carol publishes its block Block_{n+1}, whose header contains the hash of Block_n. Because the fork with Block_{n+1} is longer than the fork with Block*_n, when Bob receives Block_{n+1}, it keeps Block_{n+1} in memory, activates the previous block of Block_{n+1} (i.e., Block_n), and reserves Block*_n as a backup. At height n+2, Bob publishes its new block Block_{n+2} with the header pointing to Block_{n+1}. As a result, Alice's bonus in Block_n is accepted by every node, and Bob's bonus in Block*_n is ignored.
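The fork choice in this example can be sketched as follows. This is a minimal illustration, assuming blocks arrive after their parents and that chain length stands in for chain weight (Bitcoin Core actually selects the chain with the most cumulative work); the `Block` and `NodeView` names are hypothetical. The scenario replays Bob's view from Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Block:
    name: str
    parent: Optional[str]  # previous block's name; None for the oldest block in view


class NodeView:
    """One node's local view: every block it has received plus its active tip."""

    def __init__(self, base: Block):
        self.blocks: Dict[str, Block] = {base.name: base}
        self.height: Dict[str, int] = {base.name: 0}
        self.tip = base.name

    def receive(self, block: Block) -> None:
        # Assumes the parent was received earlier (otherwise KeyError).
        self.blocks[block.name] = block
        self.height[block.name] = self.height[block.parent] + 1
        # Switch only to a strictly longer fork. On a tie the first-seen tip stays
        # active and the competing block is merely stored as a backup.
        if self.height[block.name] > self.height[self.tip]:
            self.tip = block.name

    def best_chain(self) -> List[str]:
        chain, name = [], self.tip
        while name is not None:
            chain.append(name)
            name = self.blocks[name].parent
        return chain[::-1]


# Bob's view in Figure 1 (heights are relative to Block_{n-1}).
bob = NodeView(Block("Block_{n-1}", None))
bob.receive(Block("Block*_n", "Block_{n-1}"))   # Bob's own block
bob.receive(Block("Block_n", "Block_{n-1}"))    # Alice's block: a tie, kept as backup
tip_after_tie = bob.tip                          # still "Block*_n"
bob.receive(Block("Block_{n+1}", "Block_n"))    # Carol's block makes Alice's fork longer
print(bob.best_chain())  # ['Block_{n-1}', 'Block_n', 'Block_{n+1}']
```

Keeping the first-seen tip on ties is what lets Alice and Bob temporarily disagree at height n until a longer fork appears.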
At height n, there are two competing blocks, which form forks. After several rounds of competition, the longest fork is considered the best chain in Bitcoin. The competition to solve the PoW problem in exchange for computing-power incentives is also called mining [25]. Bitcoin adjusts the mining difficulty so that a block is found about every 10 minutes. In early October 2020, the Bitcoin difficulty was 19.30 T and the hash rate reached 138.09 exahashes per second (EH/s) [26]. Such enormous computing power makes Bitcoin one of the most energy-consuming applications. According to Digiconomist [27], the estimated power used by miners to verify Bitcoin blockchain transactions is 70.89 TWh a year, which is greater than the annual electricity consumption of Colombia and 41 other countries. Therefore, the energy wasted on hash collision has become an emerging concern.
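The difficulty and hash-rate figures above can be cross-checked with a short calculation. On average, about difficulty × 2^32 hashes are needed to find a block, so dividing by the hash rate gives the expected block interval. The sketch also includes a simplified version of Bitcoin's retargeting rule: every 2016 blocks, the difficulty is rescaled by the ratio of the two-week target timespan to the actual timespan, with the adjustment clamped to a factor of 4. It works directly on difficulty rather than on the compact target encoding that Bitcoin Core uses, so treat it as an approximation; the function names are hypothetical.

```python
# Seconds in one retarget period: 2016 blocks at a 600-second target spacing.
TARGET_TIMESPAN = 2016 * 600  # 1,209,600 s (two weeks)


def expected_block_interval(difficulty: float, hash_rate: float) -> float:
    """Expected seconds per block: roughly difficulty * 2**32 hashes per block."""
    return difficulty * 2**32 / hash_rate


def next_difficulty(old_difficulty: float, actual_timespan: float) -> float:
    """Simplified retarget: scale by target/actual time, clamped to a factor of 4."""
    clamped = min(max(actual_timespan, TARGET_TIMESPAN / 4), TARGET_TIMESPAN * 4)
    return old_difficulty * TARGET_TIMESPAN / clamped


# October 2020 values quoted in the text: 19.30 T difficulty, 138.09 EH/s.
interval = expected_block_interval(19.30e12, 138.09e18)
print(f"{interval:.0f} s")  # prints "600 s"
```

With the October 2020 values, the expected interval comes out at about 600 s, consistent with the 10-minute design goal. If blocks arrive twice as fast as intended, the retarget doubles the difficulty.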

Smart Contracts.
The concept of smart contracts was proposed by Nick Szabo in the 1990s [28]. He proposed embedding contract terms into computer components. However, this concept remained theoretical because the required technologies and protocols were not available at the time. Today, these requirements are met, allowing smart contracts to be implemented with blockchain technology.
There are two types of smart contracts: on-chain smart contracts and off-chain smart contracts. On-chain smart contracts, such as Bitcoin scripts [29], Ethereum smart contracts [30], and Fabric chaincode [31], are executed by all nodes in the network. On-chain smart contracts have three disadvantages. First, they must be run by all nodes, which means that they scale poorly. Second, because of the halting problem (also called the statement loop problem), the contract environment cannot decide in advance whether a script will terminate, so a script caught in an infinite loop keeps increasing the load until the system crashes. Third, although external data can be fed to smart contracts by oracles, these data are visible to all nodes.
Off-chain smart contracts are executed outside the core protocol, and only a subset of nodes needs to execute them; examples include the ongoing IOTA Smart Contracts [32], FastKitten [17], Ekiden [33], and ZoKrates [34]. In these systems, tasks are computed off-chain using multiparty computation (MPC) [35,36], and consensus is reached on-chain. Although off-chain smart contracts do not burden the network and can handle heterogeneous data, their overall security depends on the security of each device. A summary of these works is presented in Table 1. Although these works adopt off-chain smart contracts to enable efficient decentralized task execution at low cost, none of them considers heterogeneous data and devices.

Novel Distributed Computing Architecture
Traditionally, computers spend much time training a large number of samples.
There are many types of computing devices in ubiquitous use around the world, such as smartphones, smart vehicles, and wearable devices. Therefore, we propose a blockchain-based architecture for supervised classification. The model can gather the computing power of scattered devices and reduce calculation time while ensuring accuracy. Devices are heterogeneous: some have powerful hardware, some have strong operating systems, and some have effective training samples. To retain each device's advantage, we propose a blockchain framework named RapidTrainChain, with flexible off-chain smart contracts and a compatible consensus mechanism named Proof of Prediction (PoP) for node cooperation. In PoP, the longest chain is selected as the best chain. Each block of the chain stores transactions and task solutions. Identical solutions are linked in the same fork, while different solutions are in different forks.
RapidTrainChain is designed as a distributed computing system that maximizes overall performance and protects data. The system architecture is shown in Figure 2. Off-chain algorithms and private data are managed by devices, which are also called nodes in the blockchain. When RapidTrainChain receives a task, the node determines whether to start a new task. If so, the node triggers the off-chain smart contract through an interface to start computing; a node can start working on a task when it is free. After the nodes complete a task, the prediction results are stored in the blocks of RapidTrainChain. Identical solutions are stored in the same fork, and the longest fork is considered the best chain. The solution stored in the blocks of the best chain is the final solution for the task. The node can decide when to stop working, as discussed in Section 4.5.
Security and Communication Networks
In contrast with hash collision in Bitcoin or Ethereum, the accuracy of a prediction result cannot be verified by nodes, so every legal prediction result from every node is stored in RapidTrainChain. Multiple nodes can also generate the same prediction result simultaneously, giving rise to blocks with the same prediction result on different forks. In this case, the supporters of one prediction result are split across forks, so the result with the most supporters may fail to form the best chain. To ensure that blocks supporting the same prediction are in the same fork, every node checks whether its own prediction is consistent with the prediction in the latest received block. If it is, and the node's own block is not yet in that fork, the node publishes a new block following the received one. The workflow is shown in Figure 3.
All nodes begin training at t_0. Bob finishes training first and publishes its prediction result in the yellow block at t_1. Later, Alice and Carol publish their results at t_2. Alice's and Carol's prediction results are the same, but they differ from Bob's. Because Alice and Carol publish their blocks simultaneously, they cannot follow each other's block, and they cannot follow Bob's block because their prediction results differ from Bob's. At t_3, Alice and Carol each publish another block, Block_{n+1} and Block*_{n+1} respectively, so both of their forks are the longest. At t_4, Dave publishes a prediction result identical to Alice's and Carol's. Dave receives Block_{n+1} before Block*_{n+1}, so its Block_{n+2} follows Block_{n+1}, and the fork with Block_{n+2} becomes the best chain. In general, every node has its own off-chain smart contracts and private training data. When a new task is started, nodes train on their private samples with their off-chain smart contracts and publish the prediction results in blocks. Blocks with the same prediction result are connected in the same fork, while different prediction results are stored in different forks. The result with the most supporters, which is the longest fork, is considered the best chain. The accuracy of the prediction is discussed in Section 4.1.
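The fork-building rule and the Figure 3 workflow can be sketched as follows. This is a minimal illustration, not the RapidTrainChain implementation. It assumes that a node whose prediction differs from the latest received block starts a new fork from the task's starting block, that each call receives the latest block the publishing node has seen, and that the best chain is simply the tip with the greatest height. The names `PopBlock`, `publish`, and `best_tip`, and the label values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PopBlock:
    miner: str
    prediction: Optional[int]       # None for the block that starts the task
    parent: Optional["PopBlock"]
    height: int


def fork_miners(tip: PopBlock) -> Set[str]:
    """Names of all nodes that already have a block on the fork ending at tip."""
    miners, block = set(), tip
    while block is not None:
        miners.add(block.miner)
        block = block.parent
    return miners


def publish(miner: str, prediction: int, root: PopBlock,
            latest_received: Optional[PopBlock]) -> PopBlock:
    """Extend the latest received block if it carries the same prediction and the
    miner is not yet on that fork; otherwise start a new fork from the root."""
    if (latest_received is not None
            and latest_received.prediction == prediction
            and miner not in fork_miners(latest_received)):
        parent = latest_received
    else:
        parent = root
    return PopBlock(miner, prediction, parent, parent.height + 1)


def best_tip(blocks: Iterable[PopBlock]) -> PopBlock:
    """The longest fork wins (ties go to the first block listed)."""
    return max(blocks, key=lambda b: b.height)


# Replay of Figure 3 with hypothetical labels: Bob predicts 7, the others predict 3.
root = PopBlock("task", None, None, 0)
bob = publish("Bob", 7, root, None)          # t1
alice = publish("Alice", 3, root, bob)       # t2: differs from Bob, new fork
carol = publish("Carol", 3, root, bob)       # t2: same situation
alice2 = publish("Alice", 3, root, carol)    # t3: Block_{n+1} extends Carol's block
carol2 = publish("Carol", 3, root, alice)    # t3: Block*_{n+1} extends Alice's block
dave = publish("Dave", 3, root, alice2)      # t4: Dave saw Block_{n+1} first
winner = best_tip([bob, alice, carol, alice2, carol2, dave])
print(winner.miner, winner.prediction, winner.height)  # Dave 3 3
```

The `miner not in fork_miners(...)` check is what stops a node from adding a second vote to a fork it already supports.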
Parameters and descriptions used in this paper are listed in Table 2.
Although the longest fork is the best chain in both PoP and PoW, PoP differs from PoW in the following ways: (1) PoP does not work on hash collision or any specific algorithm; it works with various off-chain smart contracts. (2) PoP does not wait for a fixed period for confirmation and security; the confirmation condition for task solving is discussed in Section 4.3 and Section 4.4. (3) Nodes do not verify the correctness of prediction results, but they reject illegal blocks. (4) Identical prediction results are chained in the blocks of one fork.

Quantification and Proofs
The smart contract is executed on multiple nodes in a distributed way. Because no single node has all the training samples, a node can produce incorrect results. This section shows that the consensus mechanism makes the voted result correct with high probability. Even when some nodes skip calculations or try to cheat RapidTrainChain out of self-interest, RapidTrainChain still obtains the correct result.

Accuracy Estimation.
Because the data are distributed and private, each node has fewer training samples than a centralized system, so a single node has lower predictive accuracy. However, nodes that collaborate through blockchain technology can provide high-accuracy prediction results.
In this paper, RapidTrainChain works on the premise that appropriate private data and smart contracts are adopted. For example, if a delivery person's smartphone shows relatively more people ordering hotpot today, it is more likely that the weather would be cold rather than hot or mild.
This suggests that takeout order data and appropriate off-chain smart contracts on smartphones could be used for weather inference. Under this premise, nodes are more likely to choose the correct classification.
In the example, the weather may be hot, mild, or cold, so $|C_{weather}| = 3$. The correct solution is also called the target; if today is cold, $target_{weather} = cold$. Under the premise that the private data and off-chain smart contracts are appropriate, $node_i$ is more likely to predict cold weather; that is, $p^{weather}_{node_i}(cold) > 1/3$. As in voting, the solution with the most supporters is elected, and the more nodes participate, the higher the accuracy. RapidTrainChain's accuracy $P^{x}(target_x)$ changes with the number of nodes and the accuracy of each node. We start the proof with a simple case:
(1) For any solution to a given task, every node has the same probability of producing it; that is, $\forall node_i, node_j \in Nodes,\ \forall c_k \in C_x,\ p^{x}_{node_i}(c_k) = p^{x}_{node_j}(c_k)$.
(2) Apart from $target_x$, a node is equally likely to produce each of the other solutions; that is, $\forall c_l, c_m \in C_x \setminus \{target_x\},\ p^{x}_{node_i}(c_l) = p^{x}_{node_i}(c_m)$.
Algorithm 1 calculates RapidTrainChain's accuracy $P^{x}(target_x)$. Using this algorithm, we can get the …
Suppose that the accuracy of a node named $node_j$ is $p^{x}_{node_j}(target_x)$. The total accuracy of RapidTrainChain, $P^{x}(target_x)$, can be characterized as follows:
(1) If $node_j$ properly trains its samples and computes more accurately than $node_i$, that is, $p^{x}_{node_j}(target_x) > p^{x}_{node_i}(target_x) > 1/|C_x|$, then with the number of nodes held constant, the accuracy $P^{x}(target_x)$ improves.
(2) If $node_j$ properly trains its samples but computes less accurately than $node_i$, that is, $p^{x}_{node_i}(target_x) > p^{x}_{node_j}(target_x) > 1/|C_x|$, then given enough nodes computing at least as accurately as $node_j$, the accuracy $P^{x}(target_x)$ is still high.
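Algorithm 1 itself is not reproduced in this section, so the following is a swapped-in exact calculation rather than the authors' algorithm. Under assumptions (1) and (2), and additionally assuming that nodes predict independently, each node outputs $target_x$ with probability p and each other solution with probability (1 − p)/(|C_x| − 1). The sketch counts the task as solved only when the target fork is strictly longer than every other fork (ties count as failures, also an assumption). It conditions on the number t of target votes and counts the ways the remaining votes can be spread over the other solutions with every other fork shorter than t. The function names are hypothetical.

```python
from math import comb


def prob_all_below(m: int, r: int, cap: int) -> float:
    """Probability that m votes, each uniform over r solutions, leave every
    solution with fewer than cap votes."""
    if r == 0:
        return 1.0 if m == 0 else 0.0
    # ways[j]: labeled assignments of j votes to the solutions processed so far,
    # with each solution receiving at most cap - 1 votes.
    ways = [1] + [0] * m
    for _ in range(r):
        new = [0] * (m + 1)
        for used in range(m + 1):
            if ways[used] == 0:
                continue
            for j in range(min(cap - 1, m - used) + 1):
                # Choose which j of the used + j votes go to this solution.
                new[used + j] += ways[used] * comb(used + j, j)
        ways = new
    return ways[m] / r ** m


def plurality_accuracy(n_nodes: int, p_target: float, n_classes: int) -> float:
    """P(the target fork is strictly the longest) for n_nodes independent nodes."""
    total = 0.0
    for t in range(n_nodes + 1):
        # Probability that exactly t nodes predict the target.
        p_t = comb(n_nodes, t) * p_target ** t * (1 - p_target) ** (n_nodes - t)
        total += p_t * prob_all_below(n_nodes - t, n_classes - 1, t)
    return total


for n in (1, 10, 40):
    print(n, round(plurality_accuracy(n, 0.3, 10), 3))
```

With p = 0.3 on a 10-class task (so p > 1/|C_x|), a single node is right 30% of the time, and the printed accuracy grows as nodes are added, which matches the qualitative claim above.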

Robustness with Lazy Nodes.
Some nodes may skip training and make predictions directly, picking their prediction results at random from the set of all solutions $C_x$; these nodes are called lazy nodes. Nodes that properly train their samples are called honest nodes. The accuracies of these two types of nodes are as follows: (1) for lazy nodes, $p^{x}_{lazy\_node}(target_x) = 1/|C_x|$, and (2) for honest nodes, $p^{x}_{honest\_node}(target_x) > 1/|C_x|$, where $lazy\_node \in Nodes_{lazy}$ and $honest\_node \in Nodes_{honest}$. Therefore, the expected fork lengths for the different solutions are as follows:
$$E\big[|Nodes^{x}_{target_x}|\big] = |Nodes_{honest}|\, p^{x}_{honest\_node}(target_x) + \frac{|Nodes_{lazy}|}{|C_x|} \qquad (2)$$
$$E\big[|Nodes^{x}_{c_l}|\big] = |Nodes_{honest}|\, \frac{1 - p^{x}_{honest\_node}(target_x)}{|C_x| - 1} + \frac{|Nodes_{lazy}|}{|C_x|}, \quad c_l \in C_x \setminus \{target_x\} \qquad (3)$$
Equation (2) is the expected length of the fork with $target_x$, that is, $|Nodes^{x}_{target_x}|$. Equation (3) is the expected length of each other fork. The distance between the tips of the forks equals equation (2) minus equation (3), that is, $|Nodes_{honest}| \times \frac{p^{x}_{honest\_node}(target_x)\,|C_x| - 1}{|C_x| - 1}$. As long as $p^{x}_{honest\_node}(target_x) > 1/|C_x|$ and $|C_x| > 1$, the more honest nodes there are, the more likely the fork with $target_x$ is the longest. Because the lazy-node terms cancel, the number of lazy nodes does not affect the expected distance between the forks.
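Equations (2) and (3) translate directly into code. The sketch below, with hypothetical function names, evaluates both expected fork lengths and shows that the lazy-node term cancels in their difference, which equals $|Nodes_{honest}| (p|C_x| - 1)/(|C_x| - 1)$. It covers expectations only; it does not model the extra randomness that lazy nodes add to the actual fork lengths.

```python
def expected_fork_lengths(n_honest: int, n_lazy: int,
                          p_target: float, n_classes: int) -> tuple:
    """Expected blocks on the target fork and on each non-target fork."""
    lazy_share = n_lazy / n_classes  # lazy nodes pick each solution with prob. 1/|C_x|
    target_len = n_honest * p_target + lazy_share                          # equation (2)
    other_len = n_honest * (1 - p_target) / (n_classes - 1) + lazy_share   # equation (3)
    return target_len, other_len


def expected_gap(n_honest: int, n_lazy: int, p_target: float, n_classes: int) -> float:
    """Equation (2) minus equation (3)."""
    target_len, other_len = expected_fork_lengths(n_honest, n_lazy, p_target, n_classes)
    return target_len - other_len


# 10 honest nodes with accuracy 0.3 on a 10-class task, without and with 500 lazy nodes.
print(round(expected_gap(10, 0, 0.3, 10), 4), round(expected_gap(10, 500, 0.3, 10), 4))
# 2.2222 2.2222
```

Adding 500 lazy nodes lengthens every fork by the same 50 blocks in expectation, so the expected lead of the target fork is unchanged.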

Solutions Competitions.
Suppose that one solution other than $target_x$ is easier to obtain than all the remaining solutions; that is, $\exists c_k \in C_x \setminus \{target_x\}$ such that $\forall c_l \in C_x \setminus \{target_x, c_k\},\ p^{x}_{honest\_node}(c_k) \ge p^{x}_{honest\_node}(c_l)$. The race between the fork with $c_k$ ($Nodes^{x}_{c_k}$) and the fork with $target_x$ ($Nodes^{x}_{target_x}$) can be characterized as a binomial random walk. The probability $pro_z$ that the fork with $c_k$ catches up with the fork with $target_x$ from $z$ blocks behind is given in the following equation:
$$pro_z = \begin{cases} 1, & p^{x}_{honest\_node}(target_x) \le p^{x}_{honest\_node}(c_k) \\ \left( \dfrac{p^{x}_{honest\_node}(c_k)}{p^{x}_{honest\_node}(target_x)} \right)^{z}, & p^{x}_{honest\_node}(target_x) > p^{x}_{honest\_node}(c_k) \end{cases}$$
If $p^{x}_{honest\_node}(target_x) > p^{x}_{honest\_node}(c_k)$, $pro_z$ drops exponentially as $z$ increases. As the fork with $target_x$ gains more blocks, the chance that the fork with $c_k$ overtakes it becomes vanishingly small.
Task Duration.
RapidTrainChain works on a task until one fork is unambiguously the longest, that is, until the fork with some solution $c_k \in C_x$ is so long that no other fork can possibly become longer. When such a fork exists, RapidTrainChain stops the task. However, the number of nodes in a public network is unbounded, so this condition may never be met. If the other forks are very unlikely to catch up with the leading fork, RapidTrainChain should stop working on the task. As discussed in Section 4.3 and Section 4.4, if there are more than $|Nodes_{Sybil}| / \big(p^{x}_{honest\_node}(target_x) - p^{x}_{honest\_node}(c_w)\big)$ honest nodes after a certain period, and supposing $p^{x}_{honest\_node}(c_w) / p^{x}_{honest\_node}(target_x) \le 0.5$, RapidTrainChain stops the task when one fork is 21 blocks longer than the others.
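The catch-up probability and the stopping rule can be explored numerically. Writing p(·) for $p^{x}_{honest\_node}(\cdot)$, this sketch assumes the gambler's-ruin form from the Bitcoin whitepaper: counting only blocks that land on one of the two competing forks, each such block extends the fork with $c_k$ with probability $p(c_k)/(p(c_k) + p(target_x))$, so the chance of ever closing a z-block gap is $(p(c_k)/p(target_x))^z$ when $p(target_x) > p(c_k)$ and 1 otherwise. With a ratio of 0.5, a 21-block lead leaves a catch-up probability of $0.5^{21} \approx 4.8 \times 10^{-7}$. The threshold passed to `min_lead` is a hypothetical parameter, since the text does not state which threshold yields its 21-block rule. `min_honest_nodes` evaluates the honest-node bound $|Nodes_{Sybil}| / (p(target_x) - p(c_w))$, assuming every Sybil node backs $c_w$. All function names are hypothetical.

```python
def catch_up_probability(p_wrong: float, p_target: float, z: int) -> float:
    """Probability that the fork with a wrong solution ever closes a z-block deficit."""
    if p_wrong >= p_target:
        return 1.0
    return (p_wrong / p_target) ** z


def min_lead(ratio: float, threshold: float) -> int:
    """Smallest lead z with ratio**z below threshold (ratio = p_wrong / p_target)."""
    if not 0 < ratio < 1 or threshold <= 0:
        raise ValueError("need 0 < ratio < 1 and threshold > 0")
    z = 0
    while ratio ** z >= threshold:
        z += 1
    return z


def min_honest_nodes(n_sybil: int, p_target: float, p_wrong: float) -> float:
    """Honest nodes needed so their expected lead, n * (p_target - p_wrong),
    matches the votes of n_sybil Sybil nodes that all back the wrong solution."""
    return n_sybil / (p_target - p_wrong)


print(catch_up_probability(0.25, 0.5, 21))  # 4.76837158203125e-07
print(min_honest_nodes(10, 0.75, 0.25))     # 20.0
```

Because the catch-up probability depends only on the ratio of the two accuracies, the required lead grows with the logarithm of the tolerated risk rather than with the number of nodes.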

Implementation and Evaluation
We modified the mining function in the Bitcoin source code (generatetoaddress in mining.cpp), the validation function (CheckProofOfWork in pow.cpp), and other Bitcoin-related functions to invoke off-chain smart contracts. The off-chain smart contracts train on samples, make predictions, and store the prediction results in blocks. We set up 10 less powerful nodes in RapidTrainChain and one powerful computer for comparison, as shown in Table 3. Each node in RapidTrainChain holds 5,000 training samples, while the powerful computer holds 50,000 training samples. We monitored the performance of the nodes in RapidTrainChain and compared it with the computer's performance. To make the performance data comparable, the nodes in RapidTrainChain and the powerful computer used the same algorithm to train on their samples and make predictions repeatedly. A summary of the node performance is presented in Figure 6. The main highlights are as follows: (1) From Figure 6(a), RapidTrainChain and the powerful computer start simultaneously at t = 20 s. RapidTrainChain stops at t = 1240 s, while the powerful computer stops at t = 5420 s, so the powerful computer takes more than four times as long as RapidTrainChain. (2) Figures 6(a) and 6(c) present the CPU and memory performance charts, showing that a single node in RapidTrainChain consumes much less CPU and memory. This is because each node trains on fewer samples and the computational burden is shared. (3) Figure 6(d) shows the data storage performance chart. A single node in RapidTrainChain requires slightly more storage than the powerful computer because it must store messages from other nodes. (4) As shown in the network performance chart in Figure 6(b), the powerful computer consumes little bandwidth, while RapidTrainChain uses some bandwidth for block transfer.
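The "more than four times" figure follows from the reported start and stop times: both systems start at t = 20 s, so the elapsed times are 1220 s and 5400 s. The short calculation below also checks that the 10 nodes together hold as many samples as the powerful computer. The variable names are illustrative only.

```python
start_s = 20
chain_stop_s = 1240        # RapidTrainChain stop time reported above
computer_stop_s = 5420     # powerful computer stop time reported above

chain_elapsed = chain_stop_s - start_s          # 1220 s
computer_elapsed = computer_stop_s - start_s    # 5400 s
speedup = computer_elapsed / chain_elapsed

n_nodes, samples_per_node = 10, 5_000
total_node_samples = n_nodes * samples_per_node  # same 50,000 as the single computer

print(f"speedup about {speedup:.2f}x, total node samples = {total_node_samples}")
# speedup about 4.43x, total node samples = 50000
```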
We used a convolutional neural network (CNN) algorithm to make predictions on MNIST. Because existing CNN algorithms perform very well in digit recognition, the prediction accuracy would be high even with only 5,000 training samples. Thus, we divided the 50,000 samples into 20 parts (each containing 2,500 samples) and assigned the parts to 40 nodes. As shown in Figure 7, the accuracy of RapidTrainChain is much higher than the average accuracy of the nodes and increases as the number of nodes increases.
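The comparison in Figure 7 between RapidTrainChain's accuracy and the average node accuracy can be reproduced on held-out predictions with a plurality vote. This is a sketch under stated assumptions: each test image is treated as a separate task, the label with the most votes stands in for the longest fork, and ties go to the label counted first (`collections.Counter.most_common` orders equal counts by first occurrence). The function names and the toy predictions are hypothetical, not data from the experiment.

```python
from collections import Counter
from typing import List, Sequence


def vote_accuracy(node_predictions: Sequence[Sequence[int]],
                  labels: Sequence[int]) -> float:
    """Accuracy of the per-image plurality vote across all nodes."""
    correct = 0
    for i, label in enumerate(labels):
        votes = Counter(preds[i] for preds in node_predictions)
        winner, _ = votes.most_common(1)[0]
        correct += winner == label
    return correct / len(labels)


def mean_node_accuracy(node_predictions: Sequence[Sequence[int]],
                       labels: Sequence[int]) -> float:
    """Average of the individual nodes' accuracies."""
    per_node: List[float] = [
        sum(p == y for p, y in zip(preds, labels)) / len(labels)
        for preds in node_predictions
    ]
    return sum(per_node) / len(per_node)


# Toy example: three nodes, three test images.
labels = [1, 2, 3]
node_predictions = [[1, 2, 3], [1, 2, 0], [0, 2, 3]]
print(mean_node_accuracy(node_predictions, labels), vote_accuracy(node_predictions, labels))
```

In the toy example, each node makes at most one mistake on a different image, so the vote corrects every error even though the average node accuracy is only 7/9.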

Conclusion
In this paper, we presented a novel supervised learning approach based on Bitcoin (RapidTrainChain). In the proposed algorithm, we introduced a rapid, compatible consensus mechanism (PoP), which helps RapidTrainChain make an accurate prediction. We formalized the cooperation mechanism to reduce the workload of a single node while maintaining overall accuracy, improving overall efficiency, and ensuring overall privacy. In the experiment, we showed the influence of honest nodes, lazy nodes, and Sybil attackers on the overall accuracy. We implemented our proposed algorithm and evaluated its efficiency. Our results suggest that RapidTrainChain does not depend heavily on the computational power of single nodes and is friendly to heterogeneous devices. We found that the more nodes are used in RapidTrainChain, the more secure the system becomes. Moreover, our results showed that the number of nodes does not affect the processing time for a given task and that RapidTrainChain can be applied in the public network.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.