Secure Data Collaborative Computing Scheme Based on Blockchain

With the rapid development of information technology, organizations increasingly cooperate to share data and make full use of its value. The integrity and privacy of data must be guaranteed, and collaborative computing must be carried out on top of data sharing. In this paper, to achieve secure data sharing and fair collaborative computing, a secure data collaborative computing scheme based on blockchain is proposed. A data storage and query model based on a Bloom filter is designed to improve the efficiency of data query and sharing. The MPC contract is designed according to the specific requirements. The participants are assumed to be rational, and the contract encourages them to execute the protocol honestly so that the computation is fair. A secure multiparty computation protocol based on secret sharing is introduced. The problem of identity and vote privacy in electronic voting is solved. The scheme is analyzed and discussed in terms of storage expansion, anticollusion, verifiability, and privacy.


Introduction
Secure multiparty computing (SMPC) is a kind of privacy-preserving collaborative computing in which participants do not trust each other and there is no trusted third party. It applies to situations in which organizations cooperate in pursuit of common interests but do not fully trust each other. Yao proposed secure two-party computation to solve the "millionaire problem" in [1], which was extended by Goldreich et al. [2]. The basic model of secure multiparty computation was thereby established theoretically.
Specifically, SMPC means that each participant holds its own data $x_i$ and the participants jointly calculate the function $y_i = f(x_i)$ $(1 \le i \le n)$, obtaining the corresponding calculation results $y_i$. In this calculation process, the function f replaces the trusted third party of the ideal situation. The basic requirement of a secure multiparty computing protocol is to ensure the security of the protocol and the fairness of computation. However, some participants in the calculation may collude to disclose data. Colluding participants can be divided into two types. Semihonest participants (passive attack): participants perform computing tasks in accordance with the protocol but may leak their input data and calculation results to attackers, so the attackers can obtain data. Malicious participants (active attack): participants perform the calculation task according to the attacker's instructions. They not only disclose the input data and calculation results to the attacker but also tamper with the data or even terminate the protocol according to the attacker's intention.
At present, in SMPC research, [3,4] focus on how to prevent collusion in multiparty computing but pay no attention to privacy protection. SMPC also has other problems: frequent interaction between participants reduces efficiency, and fairness cannot be achieved when only some of the participants obtain the output. To solve these problems, the works [5,6] focus on the design of smart contracts for multiparty cooperation and propose solutions to specific problems such as collusion among participants and contract disputes. Based on the Bitcoin network [7][8][9], a fair SMPC protocol with a penalty mechanism is proposed. To solve the fairness and robustness problems in SMPC, the BFR-MPC scheme is proposed [10]. An incentive mechanism is used to encourage all parties to cooperate, and those who do not cooperate are punished economically. The fairness of the scheme is proved by game theory. In [11], the blocks are partitioned in the MapReduce framework to realize data storage, improved homomorphic encryption is used to process the ciphertext directly, and proxy reencryption is used for data sharing.
To solve the fairness and privacy problems of SMPC, blockchain has become an effective and feasible solution. It provides a trusted execution environment for SMPC and uses an incentive mechanism to ensure computational fairness. However, some problems remain: (1) SMPC based on the Bitcoin network is limited in practical application scenarios because it cannot provide a Turing-complete implementation of complex functions. (2) An incentive mechanism is introduced to ensure the fairness of SMPC, but the security and query efficiency of data storage have not been studied in depth. (3) To ensure the real-time performance of transaction calculation, the consensus algorithm needs to be improved to raise consensus efficiency.
In response to the above problems, this paper combines the key technology of blockchain with SMPC. (1) The calculation contract is designed according to the actual demand of the computing transaction, which suits complex practical scenarios. (2) A Bloom filter provides flexible data access for users and realizes efficient, authorized sharing of private data. (3) The improved consensus algorithm raises the efficiency of consensus and makes the nodes consistent quickly. In the SMPC scheme based on blockchain, secret sharing is carried out to prevent the participants from colluding to disclose data, thus ensuring the safety of the data. The incentive mechanism of blockchain promotes the fairness of computing. The rest of this paper is organized as follows: in Section 2, we introduce key technologies such as secret sharing, data storage access control, and consensus algorithms. In Section 3, we introduce a system model and an SMPC protocol algorithm based on secret sharing. In Section 4, we perform security verification and performance analysis on the proposed architecture. Section 5 analyzes how our solution solves the problems in actual application scenarios. Finally, we come to the conclusions in Section 6.

Related Work
In this section, we introduce secret sharing, the data storage access structure in the blockchain system, and the consensus algorithm (RBFT) used in our architecture.

Secret Sharing.
Shamir's threshold secret sharing scheme is described in detail in [12]. Suppose that there are participants who do not trust each other but abide by a secret sharing protocol. The protocol is divided into two processes: secret distribution and reconstruction.
Secret distribution: choose a secret S to share among n participants. Each participant $R_i$ has a secret $S_i$ and a verifiable public identity $x_i$ $(1 \le i \le n)$. Participant $R_i$ randomly selects a polynomial $f_i(x) = S_i + a_{i1}x + \cdots + a_{it}x^t$ to hide its own secret $S_i$, where $n > t$ and $t \in \mathbb{Z}^+$, and sends the shares of the secret to the other participants.
Secret reconstruction: when the participants agree to recover the secret S, the threshold must be reached before reconstruction is possible. Assuming that there are t + 1 participants who can calculate $F(x_1), \ldots, F(x_{t+1})$, the secret S is obtained by the Lagrange interpolation formula. If the number of participants does not reach the threshold, it is impossible to get any information about S.
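The distribution and reconstruction steps can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: a single dealer shares one secret (the scheme above lets every participant $R_i$ share its own $S_i$), the public identities are taken to be $x_i = i$, and the prime field size and the function names make_shares and reconstruct are hypothetical choices for this example.

```python
import secrets

# Assumption: arithmetic is done modulo a fixed prime (here the Mersenne prime 2^127 - 1).
PRIME = 2**127 - 1

def make_shares(secret, t, n, prime=PRIME):
    """Split `secret` into n shares with a random degree-t polynomial f, f(0) = secret."""
    if not (0 <= secret < prime and 0 < t < n):
        raise ValueError("need 0 <= secret < prime and 0 < t < n")
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(t)]
    shares = []
    for x in range(1, n + 1):          # public identity x_i = i (assumption)
        y = 0
        for c in reversed(coeffs):     # Horner evaluation of f(x) mod prime
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares

def reconstruct(shares, prime=PRIME):
    """Recover f(0) by Lagrange interpolation; needs at least t + 1 distinct shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * (-xm)) % prime
                den = (den * (xj - xm)) % prime
        secret = (secret + yj * num * pow(den, -1, prime)) % prime
    return secret

shares = make_shares(123456789, t=2, n=5)
recovered = reconstruct(shares[:3])    # any t + 1 = 3 shares suffice
```

With t = 2, any three of the five shares recover the secret, while two shares are consistent with every possible value of the secret.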

Data Storage Access.
Manuskin et al. [13] used fragmented Ostraka nodes to solve the problem of block storage capacity limitation, which improves the query efficiency without affecting the security of the underlying consensus mechanism. Jia et al. [14] proposed an efficient query method ElasticQM for the scalable model of blockchain storage capacity. By extending the storage structure and improving the search algorithm, the query efficiency is improved. In [15], we used Bloom filter to generate keyword index and proved that it is secure against keyword search attack, which improves data query efficiency on the basis of reducing data storage space on the chain.
In the blockchain fragmentation storage model, the data is stored on the blockchain. All blocks then store the data synchronously, which increases the complexity of the consensus algorithm and takes up a lot of storage space. This wastes on-chain resources and correspondingly increases the cost of data query. The scheme combines an on-chain index with off-chain storage, which solves the problem that blocks cannot easily store massive data and is more compatible with traditional databases.
In the scheme, the off-chain database stores all the data of the data owner. The data owner extracts the keywords and uses the Bloom filter to generate the keyword index. Then the public key of the specified data inquirer is used to encrypt the data information. The generated keyword index information is stored in the index block, which stores the key and the corresponding storage address value. To query the corresponding data, the data inquirer only needs to decrypt with his own private key and use the storage address value to locate and retrieve the complete original data. The specific process is shown in Figure 1.
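The keyword-index step can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the filter size, the number of hash functions, the SHA-256-based hashing, the names BloomFilter, build_index_entry, and lookup, and the example addresses are all assumptions, and the public-key encryption of the index entry is omitted.

```python
import hashlib

class BloomFilter:
    """Fixed-size Bloom filter; the bit array is held in a Python integer."""

    def __init__(self, size_bits=1024, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, keyword):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{keyword}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, keyword):
        for pos in self._positions(keyword):
            self.bits |= 1 << pos

    def might_contain(self, keyword):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(keyword))

def build_index_entry(keywords, storage_address):
    """On-chain index entry: keyword filter plus the off-chain storage address."""
    bf = BloomFilter()
    for kw in keywords:
        bf.add(kw)
    return {"filter": bf, "address": storage_address}

def lookup(index, keyword):
    """Return the addresses of all entries whose filter may contain the keyword."""
    return [e["address"] for e in index if e["filter"].might_contain(keyword)]

index = [
    build_index_entry(["blockchain", "voting"], "db://records/1"),  # hypothetical address
    build_index_entry(["medical"], "db://records/2"),               # hypothetical address
]
hits = lookup(index, "voting")
```

A Bloom filter never produces false negatives, so a stored keyword is always found; it can produce false positives, so a hit should be confirmed against the off-chain record.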

Consensus Algorithm.
Due to the decentralized characteristics of blockchain, it is difficult for nodes to reach consensus on stored data, which is prone to loss and damage. Although the PBFT algorithm has higher consistency than other consensus algorithms, its availability faces challenges. If a single node fails, the system must wait for a view change, and it is difficult to recover the failed node in time.
The redundant Byzantine fault tolerance (RBFT) algorithm, obtained by improving the PBFT algorithm in [16], has a complete disaster recovery mechanism, which ensures that the consensus process can switch to dynamic node recovery when data inconsistency occurs. Compared with the PBFT algorithm, it has higher TPS and lower delay. The scheme adopts a timeout mechanism in the RBFT view switching protocol, which can identify and deal with faulty nodes in time.
RBFT retains the original three phases of PBFT (preprepare, prepare, and commit) and has the same fault tolerance capability as PBFT. An important transaction calculation and verification step is added to ensure that consensus is reached on the execution order of transaction calculations and on the result of block verification. The specific process is shown in Figure 2.
The RBFT consensus process steps are as follows: (1) Transaction forwarding stage: the client sends the transaction to any node, and the node receives it and broadcasts it to the other nodes. (2) Preprepare phase: the master node packages the transactions into blocks according to the self-defined timeout mechanism and maximum block size strategy and verifies the transactions. Finally, the transaction information and verification results are written into the preprepare message for broadcast.
The related variables of RBFT are as follows: (1) RBFT requires the number of nodes N to be at least 4 and can tolerate at most $f = \lfloor (N - 1)/3 \rfloor$ malicious nodes. (2) The number of nodes needed to reach consensus is quorum $= (N + f + 1)/2$.
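The two RBFT parameters can be computed directly from the node count. The short Python sketch below is illustrative only; the function names are hypothetical, and rounding the quorum up to the next integer when (N + f + 1) is odd is an assumption of this example, since the text does not state how fractional values are handled.

```python
def max_faulty(n_nodes):
    """Largest number of malicious nodes tolerated: f = floor((N - 1) / 3)."""
    if n_nodes < 4:
        raise ValueError("RBFT requires at least 4 nodes")
    return (n_nodes - 1) // 3

def quorum(n_nodes):
    """Nodes needed to reach consensus: (N + f + 1) / 2, rounded up (assumption)."""
    f = max_faulty(n_nodes)
    return (n_nodes + f + 2) // 2   # integer ceiling of (N + f + 1) / 2

table = {n: (max_faulty(n), quorum(n)) for n in (4, 5, 7, 10)}
```

For the minimum network of 4 nodes this gives f = 1 and a quorum of 3.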

SMPC Scheme Based on Blockchain
We introduce the system model structure, calculation contract, and the MPC protocol based on secret sharing and compare and analyze the calculation complexity of the protocol algorithm in this section.

System Model.
This section introduces the blockchain network, computing network, and MPC contract in the scheme model, as shown in Figure 3. As a distributed ledger, blockchain provides a fair and reliable environment for computing. Computing participants must register in the MPC contract and pay the corresponding deposit. During the calculation, if only some nodes obtain the calculation results, a penalty is deducted from the deposit. If each computing node complies with the protocol, it gets the corresponding output and pays a fee for it. Users are authorized to query the data stored on the chain, which realizes data sharing and fair transaction calculation. Considering the requirements of the electronic voting scenario, it is necessary to anonymize the identity of voters and candidates and encrypt the voting data.
Firstly, the data owner extracts the keywords of the stored data, generates the keyword index using the Bloom filter, and encrypts the corresponding index information and the address value of the data storage with the public key of the inquirer. The index table is stored in the index block of the chain, and the inquirer uses the private key to decrypt the data.
Secondly, when the calculation requester sends the transaction calculation request, it submits the public key to the blockchain and pays the deposit to the smart contract. e improved consensus mechanism is used to ensure the consistency of nodes. According to the request content, each node queries the original data for joint calculation. After that, the calculation result is encrypted by public key, and the requester receives the result and decrypts it with private key.
Finally, the transaction is verified by the verification node after the transaction calculation. If the transaction calculation is wrong or one of the participants does not get the calculation result, the malicious participant is traced through the time stamp and punished by the smart contract. At the same time, the honest participants are rewarded and the block status is changed accordingly. Otherwise, the deposit is refunded.

Calculation Contract.
A computing contract is a kind of smart contract that participants negotiate and agree on before taking part in multiparty computing services. It is deployed on the blockchain platform in the form of code and is executed automatically when its trigger conditions are met, without human intervention. The contract includes the calculation node parameters, initial state, calculation process script, and execution conditions of the current calculation task. The calculation can be divided into several rounds according to the calculation tasks and conditions. Each round of calculation in the contract has an initial judgment condition and a time limit. In the initialization phase, participants (computing nodes or users) register on the MPC contract and pay the deposit. When the number of nodes involved in the calculation task and the correct input meet the initial judgment conditions, the calculation process script can be executed. If an error is reported during the calculation, it may be caused by nondeterministic factors (e.g., network congestion or code vulnerability), so the error must be detected and corrected and the calculation reexecuted. If the calculation results are obtained, they must be submitted correctly in the same round. The verification node in the contract verifies whether the submitted information is correct. At the start of the next round of computing tasks, the contract checks the correctness of the messages submitted by all participants to determine whether the calculation continues. Each intermediate round is executed in an integrated sandbox environment and cannot be observed or interpreted from outside. Only in the last round does the contract verify that the computing nodes publish their output results correctly and reward them or deduct their deposits.
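The deposit, verification, and settlement logic of such a contract can be sketched in Python. This is a hedged illustration rather than the authors' contract code: the class MPCContract and its method names are hypothetical, rounds and time limits are omitted, and the settlement rule (forfeited deposits are split equally among honest participants, with any remainder left in the contract) is an assumption, since the text only states that dishonest participants are penalized and honest ones rewarded.

```python
class MPCContract:
    """Toy model of the deposit and settlement logic of an MPC contract."""

    def __init__(self, deposit, min_participants):
        self.deposit = deposit
        self.min_participants = min_participants
        self.balances = {}        # participant -> locked deposit
        self.verified = set()     # participants whose final output was verified

    def register(self, participant, amount):
        """Initialization phase: a participant joins by paying the deposit."""
        if amount < self.deposit:
            raise ValueError("deposit too small")
        self.balances[participant] = amount

    def can_start(self):
        """Initial judgment condition: enough registered participants."""
        return len(self.balances) >= self.min_participants

    def submit_output(self, participant, is_valid):
        """Last round: the verification node records whether the output is correct."""
        if participant not in self.balances:
            raise KeyError("participant is not registered")
        if is_valid:
            self.verified.add(participant)

    def settle(self):
        """Refund honest participants and share forfeited deposits among them."""
        honest = [p for p in self.balances if p in self.verified]
        cheaters = [p for p in self.balances if p not in self.verified]
        pool = sum(self.balances[p] for p in cheaters)
        bonus = pool // len(honest) if honest else 0   # assumed reward rule
        payouts = {p: self.balances[p] + bonus for p in honest}
        payouts.update({p: 0 for p in cheaters})
        return payouts

contract = MPCContract(deposit=10, min_participants=4)
for name in ("a", "b", "c", "d"):
    contract.register(name, 10)
for name, ok in (("a", True), ("b", True), ("c", True), ("d", False)):
    contract.submit_output(name, ok)
payouts = contract.settle()
```

In this run, participant d forfeits its deposit and each of the other three receives its deposit plus an equal share of the forfeited amount. If every participant publishes a valid output, all deposits are simply refunded.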

MPC Protocol.
To ensure data integrity, privacy, and computing efficiency, homomorphic encryption and multiparty computing are combined for private transaction calculation [17,18]. A homomorphic encryption scheme based on semantic security, under which no polynomial-time algorithm can distinguish the encryptions of arbitrary data, is proposed in [19][20][21]. Threshold secret sharing solves the problem of data privacy leakage caused by the collusion of participants in SMPC [20]. Suppose that the public key cryptographic mechanism is $(sk, pk, W, C, E_n, D_e)$; the encryption algorithm and decryption algorithm are $E_n(\cdot)$ and $D_e(\cdot)$, respectively. The encryption key and decryption key are pk and sk, respectively (where pk is public and sk is secret).
The ciphertext space is C, and the plaintext space is W. For any two pieces of data $n_1, n_2 \in W$ and any constant $k \in \mathbb{Z}$, if $n_1 + n_2 \in W$ and $kn_i \in W$ $(i = 1, 2)$, then
$E_{n(pk)}(n_1) \oplus_h E_{n(pk)}(n_2) = E_n(n_1 + n_2)$, $k \otimes_h E_{n(pk)}(n_i) = E_n(kn_i)$,
where "$\oplus_h$" and "$\otimes_h$," respectively, represent the addition homomorphic operator and the multiplication homomorphic operator. Following the analysis of homomorphic encryption schemes in [22], the homomorphic properties are applied to matrix operations in linear spaces. Assume that, in linear space $V^{n \times n}$, vector $v = (a_1, \ldots, a_n)$ is an n-dimensional row vector, and $v^T$ is an n-dimensional column vector. The encryption algorithm $E_n$ is used to encrypt each element in vector $v = (a_1, \ldots, a_n)$. Further extending it to matrix $B \in V^{n \times m}$, we can get
$E_n(v) = (E_n(a_1), \ldots, E_n(a_n)) \Longrightarrow E_n(B)_{i \times j} = E_n(B_{i \times j})$. (3)
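The two homomorphic operators can be made concrete with a toy example. The text does not name a specific cryptosystem, so the Python sketch below swaps in the Paillier cryptosystem as one well-known scheme with these properties: the addition operator is realized as ciphertext multiplication modulo $n^2$ and the multiplication-by-constant operator as ciphertext exponentiation. The tiny primes, the function names, and the choice of Paillier are all assumptions of this illustration, and the parameters are far too small for real use.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key generation; p and q are tiny illustrative primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g = n + 1                                          # common simple generator
    mu = pow(lam, -1, n)       # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r coprime to n."""
    n, g = pk
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n      # L(u) = (u - 1) / n, then multiply by mu

def add_h(pk, c1, c2):
    """Addition homomorphism: E(n1) (+) E(n2) = E(n1 + n2)."""
    n, _ = pk
    return (c1 * c2) % (n * n)

def mul_h(pk, k, c):
    """Multiplication by a constant: k (x) E(n1) = E(k * n1)."""
    n, _ = pk
    return pow(c, k, n * n)

def encrypt_vector(pk, v):
    """Elementwise encryption, as in equation (3)."""
    return [encrypt(pk, a) for a in v]

pk, sk = keygen()
c1, c2 = encrypt(pk, 17), encrypt(pk, 25)
sum_plain = decrypt(pk, sk, add_h(pk, c1, c2))
scaled_plain = decrypt(pk, sk, mul_h(pk, 3, c1))
```

Here the sum and the scaled value are obtained by operating on ciphertexts only; the plaintexts 17 and 25 are never combined directly.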

Protocol Design.
Suppose that there is a set of computing participants $R = (R_1, \ldots, R_k)$. Each participant $R_i$ has a matrix $C_i$ and a corresponding vector $\alpha_i$, and the corresponding relationship is $R_i: C_i \longrightarrow \alpha_i$ ($1 \le i \le k$ and $k \ge 4$), where $C_i$ is an $n \times n$-dimensional matrix and $\alpha_i$ is an n-dimensional vector. The value range of the number of participants k is related to the number of nodes N in the above consensus mechanism. To keep the input parameters private, the invertible matrix mentioned in [20] is used to disguise the cooperative solution of the linear system $\left(\sum_{i=1}^{k} C_i\right)x = \sum_{i=1}^{k}\alpha_i$ as $Q\left(\sum_{i=1}^{k} C_i\right)x = Q\sum_{i=1}^{k}\alpha_i$, where Q is a randomly generated n-order invertible matrix and the solution x is unchanged. The specific steps of the protocol are shown in Table 1.
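The effect of the invertible matrix can be checked numerically. The Python sketch below works on plaintext values only (no encryption) and assumes, in line with Step 3 of Table 1, that the disguise is a left-multiplication of the summed matrix and the summed vector by Q. The concrete matrices, the fixed Q, and the helper names are hypothetical choices for this example; exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_sum(mats):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]

def vec_sum(vecs):
    return [sum(vals) for vals in zip(*vecs)]

def solve(A, b):
    """Gauss-Jordan elimination over the rationals; assumes A is invertible."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n] for row in M]

# Hypothetical inputs of k = 4 participants with n = 2.
C = [[[1, 0], [0, 1]], [[1, 1], [0, 1]], [[0, 1], [1, 0]], [[1, 0], [1, 1]]]
alpha = [[1, 2], [0, 1], [3, 0], [1, 1]]
Q = [[2, 1], [1, 1]]                    # invertible (determinant 1); fixed here for clarity

C_sum, alpha_sum = mat_sum(C), vec_sum(alpha)
x = solve(C_sum, alpha_sum)                                   # original system
x_disguised = solve(mat_mul(Q, C_sum), mat_vec(Q, alpha_sum)) # disguised system
```

The disguised coefficients differ from the true sums, yet both systems have the same solution, which is why a party that only sees the disguised system can still compute x.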

Computational Complexity.
In the above MPC protocol, $kn(n + 1)$ encryption operations are performed in Step 1. When calculating the encryption matrix $\sum_{i=1}^{k} E_{n(pk)}(C_i) = E_{n(pk)}\left(\sum_{i=1}^{k} C_i\right)$ and the corresponding encryption vectors $\sum_{i=1}^{k} E_{n(pk)}(\alpha_i) = E_{n(pk)}\left(\sum_{i=1}^{k} \alpha_i\right)$ in Step 2, there are a total of $(k - 1)(n + 1)n$ addition homomorphic operations. In Step 3, calculating $E_{n(pk)}\left(Q\sum_{i=1}^{k} C_i\right)$ takes $kn^2(n - 1)$ addition homomorphic operations and $kn^3$ multiplication homomorphic operations, and calculating $E_{n(pk)}\left(Q\sum_{i=1}^{k} \alpha_i\right)$ takes $kn(n - 1)$ addition homomorphic operations and $kn^2$ multiplication homomorphic operations. In Step 4, a total of $kn(n + 1)$ decryption operations are performed.
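The per-step counts stated above can be tallied for given k and n. The helper below is a small Python sketch that simply evaluates those expressions; the function name operation_counts and the grouping of the totals are choices made for this example.

```python
def operation_counts(k, n):
    """Evaluate the operation counts stated for Steps 1-4 of the protocol."""
    return {
        "encryptions": k * n * (n + 1),                      # Step 1
        "additions": (k - 1) * (n + 1) * n                   # Step 2
                     + k * n**2 * (n - 1)                    # Step 3, matrix part
                     + k * n * (n - 1),                      # Step 3, vector part
        "multiplications": k * n**3 + k * n**2,              # Step 3
        "decryptions": k * n * (n + 1),                      # Step 4
    }

counts = operation_counts(k=4, n=4)
```

For the smallest configuration considered in the scheme, k = 4 participants with 4-dimensional matrices, this gives 80 encryptions, 300 addition homomorphic operations, 320 multiplication homomorphic operations, and 80 decryptions.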
Compared with the calculation time y1 of the multiplication homomorphism and the addition homomorphism of the SMPC protocol in [23], the calculation time y2 of this protocol is reduced. Since the improved consensus mechanism requires at least 4 participants, the lower limit of the matrix dimensions is set to 4 dimensions, as shown in Figures 4 and 5.

Security Proof and Performance Analysis
In this section, we play a game between the adversary (attacker) and the challenger (computing participant) under the DBDH assumption. This proves that the scheme is secure against chosen-ciphertext attacks. We then analyze the performance of the scheme in terms of anticollusion, verifiability, scalability, and privacy.

Security Proof
Lemma 1. Based on the DBDH assumption, our scheme can resist chosen-ciphertext attacks in the random oracle model [24], so our scheme is IND-CCA secure.
Proof. Suppose that there is a probabilistic polynomial-time (PPT) adversary A. Given a public key encryption scheme $I = (Gen, En, De)$, the adversary A uses the auxiliary input function $\phi$ in polynomial time to play a game with the challenger B as follows: (1) Key generation: challenger B runs the key generation algorithm Gen( ) to obtain the public and private key pair (pk, sk) and sends the public key pk to A.

Security and Communication Networks
(2) Inquiry 1: ciphertext decryption oracle OracleDe( ): A submits ciphertext C to OracleDe( ), and B runs OracleDe( ) to decrypt C and obtain the corresponding information. Key leakage oracle OracleLe( ): for any auxiliary input function $\phi \in I$, the adversary A sends $\phi$ to B, and B sends $\phi(pk, sk)$ to A.
In the scheme, the advantage of adversary A in winning the game is defined as $Adv_A = |\Pr(\sigma = \sigma^*) - (1/2)|$, and the advantage of adversary A in breaking the encryption scheme in probabilistic polynomial time is $\varepsilon = \varepsilon(n) = Adv^{(0)}_{A, f(n)}$ ($\varepsilon$ is negligible). The challenge ciphertexts generated by $En(pk, M_\sigma)$ and $En^*(\phi(g, sk), M_\sigma)$ (g is a common parameter) are indistinguishable. Therefore, the advantage of adversary A in the game is negligible. No adversary can break our algorithm, so our solution is secure.

Prevention of Collusion.
Collusion attack is a key problem in the real world, especially when malicious parties collude to obtain the data of other participants for their own benefit (e.g., for economic gain). Under the semihonest model, our proposed SMPC scheme based on blockchain can avoid collusion and enable participants to execute the protocol honestly. Blockchain provides a trusted environment for collaborative computing through cryptography. Participants' data are encrypted and stored on the chain, and the master key needs to be reconstructed when executing the computing protocol, which effectively prevents collusion among participants. At the same time, the data is encrypted and stored on the chain, and the inputs of the MPC protocol algorithm are disguised by a random invertible matrix. After analyzing the game between the adversary and the challenger, the generated challenge ciphertext is indistinguishable, and the advantage of the adversary in breaking the encryption is negligible, which proves that our solution can resist chosen-ciphertext attacks. Secondly, the participants register and confirm their identity information, and the blocks of data storage carry unique time stamps. At the same time, they pay a deposit as a guarantee to abide by the contract, which makes noncollusion more profitable than collusion and eliminates the participants' motive to collude.

Table 1: Specific steps of the calculation protocol.
MPC protocol
Input: for $R_i$, there is an $n \times n$-dimensional matrix $C_i$ and a corresponding n-dimensional vector $\alpha_i$, where $1 \le i \le k$ and $k \ge 4$.
Initialization: randomly select a pair of public and private keys (pk, sk), decompose the private key to obtain subkeys $sk_l$ $(1 \le l \le k)$, distribute the subkeys to the participants $R = (R_1, \ldots, R_k)$, and randomly select an invertible matrix Q with the same dimension for participant $R_1$.
Step 1: Use the public key pk to encrypt the matrix and corresponding vector owned by participant $R_j$ $(2 \le j \le k)$ to obtain $E_{n(pk)}(C_j)$ and $E_{n(pk)}(\alpha_j)$; then send $E_{n(pk)}(C_j)$ and $E_{n(pk)}(\alpha_j)$ to $R_1$.
Step 2: $R_1$ uses the public key pk to encrypt the matrix $C_1$ and the vector $\alpha_1$ to obtain $E_{n(pk)}(C_1)$ and $E_{n(pk)}(\alpha_1)$, and at the same time calculates the encryption matrix $\sum_{i=1}^{k} E_{n(pk)}(C_i) = E_{n(pk)}\left(\sum_{i=1}^{k} C_i\right)$ and the corresponding encryption vectors $\sum_{i=1}^{k} E_{n(pk)}(\alpha_i) = E_{n(pk)}\left(\sum_{i=1}^{k} \alpha_i\right)$.
Step 3: Use the invertible matrix Q to calculate $E_{n(pk)}\left(Q\sum_{i=1}^{k} C_i\right)$ and $E_{n(pk)}\left(Q\sum_{i=1}^{k} \alpha_i\right)$; then send them to the corresponding participant R.
Step 4: When the number of participants possessing subkey $sk_i$ reaches the threshold, the master key sk is reconstructed by the Lagrange interpolation formula, and the encryption matrix $D_{e(sk)}(E_{n(pk)}(\ldots$

Verifiable.
Due to the lack of a trusted third party in existing secure multiparty computing, there is a potential security risk that participants collude to tamper with data. The decentralization of blockchain provides a credible environment for multiparty computing, and the core of decentralization is the consensus mechanism. For secure multiparty computing based on blockchain, before calculation, the nodes need to use the consensus algorithm to generate blocks and verify their synchronization, so as to ensure efficient consistency between nodes. Homomorphic encryption calculation can then be carried out, which ensures the verifiability of transaction calculation.

Scalable.
Blockchain is a decentralized ledger. Information stored on the chain through encryption cannot be tampered with. The blocks generated by consensus on the chain are all traceable through their time stamps. However, because block storage is limited, it is not possible to store all the information completely in the block, and the block partition method is not a general way to ensure both the security and the fast query of stored information. In the scheme, we combine off-chain storage with an on-chain index: the original files are stored in the off-chain database, while the blocks on the chain store the storage address and keyword index, which are easy to look up.

Privacy.
This scheme has three privacy protection measures. First, when data is stored, it is encrypted using cryptographic techniques; the encrypted data file is stored in the off-chain database, and the storage address and index information are stored in the block. Only users with the decryption key or authorized users can get the original data. Second, before calculation, the data stored in the block must go through the consensus mechanism before it is written to the chain. In the improved consensus algorithm, the information received by a node is checked against the information written by the master node. If the information is consistent, the stored data is correct. Third, threshold secret sharing is used to decompose the key and distribute it to each participant. In decryption, secret reconstruction is needed to recover the master key. When the subkey shares held by the participants reach the threshold, the master key can be recovered. Otherwise, the participants cannot reconstruct the key for decryption. Because the scheme is encrypted during the data storage and calculation phases, we gave a formal proof of security against chosen-ciphertext attacks for the entire scheme, which shows that our scheme is secure and can protect data privacy.

Application Scenarios
The scheme is suitable for applications with high privacy requirements that also need easy information sharing, for example, the typical scenario of electronic elections. In the traditional voting scheme, the privacy and correctness of voting depend on the credibility of the vote counter (teller). Because the teller has great power in the voting scheme and the ballot contains sensitive identity information and vote data of the voters and candidates, voters may be unable to make a real choice, which affects the fairness and effectiveness of the voting results. The traditional voting scheme has the following limitations: (1) the teller knows the content of the vote; (2) the teller may know the identity of the voter; (3) the result of the ballot depends on correct counting by the teller; (4) the voter must trust the teller not to tamper with the ballot.
To solve the above problems, our proposed multiparty computing electronic voting scheme based on blockchain ensures the anonymity of identity and the security of ballot data. The whole voting process is open and verifiable, and security and privacy are protected throughout the voting process. A voter is qualified to vote only after passing registered identity authentication; once the voter's vote is submitted, it cannot be modified or deleted; except for the voting participants (i.e., voters and candidates), no third party can obtain the specific content of the vote; the voter can verify his or her own vote, including whether it has been tampered with and whether it is included in the final result; everyone other than the voter can also check whether the vote has been tampered with and whether it is included in the final result. Authorized parties can verify and supervise the voting process at any time.

Scheme Comparison
Through the performance analysis, we compare the scheme with existing blockchain-based SMPC schemes. To prevent collusion, Wang and Sen-ching and Luo and Li [4,23] improved the mechanisms used in previous work. The influence of privacy preference on preventing collusion attacks is analyzed, and the limitation of identical preferences in a symmetric game is identified. The SMPC scheme based on the Bitcoin network [9] can prevent collusion and improve fairness through a punishment mechanism. Due to its limitations, it cannot be applied to complex functional scenarios. References [10,11] both use blockchain for secure multiparty computing, without relying on a trusted third party. Reference [10] obtains the correct output while ensuring the privacy of the input. To solve the problem of fairness and robustness, the BFR-MPC scheme is proposed. Fairness is improved by encouraging the cooperation of all parties through an incentive mechanism. A secure data sharing and multiparty computing model is proposed in [11]. The data is partitioned to expand the storage capacity, and the consensus algorithm is improved to ensure consistency between nodes. The homomorphic encryption algorithm computes directly on ciphertext to ensure privacy. Finally, the performance analysis and simulation results show that the efficiency is improved significantly. Our scheme carries out secure multiparty computation on the blockchain and combines an on-chain index with off-chain storage to expand the block storage, and the use of a Bloom filter improves the efficiency of data query and verification. A calculation contract is implemented as a smart contract that requires a deposit during transaction calculation to prevent collusion among participants. The scheme has high fairness and security. The specific performance comparison is shown in Table 2.

Concluding Remarks
Blockchain is a decentralized distributed storage structure, which does not lose or damage data because of a single node failure. The scheme carries out multiparty collaborative computing on the blockchain. The data is encrypted to ensure security. Secure multiparty computing ensures that multiple participants who do not trust each other can perform the given computing tasks while protecting data privacy. In addition, to prevent the calculation participants from colluding to destroy the correctness of the calculation results, a specific MPC contract is designed on top of the smart contracts of the blockchain to encourage the participants to abide by the protocol and take part in the calculation honestly. The characteristics of blockchain, such as unforgeability, decentralization, and anonymity, combined with secure multiparty computing, can be applied in the fields of electronic auction, bidding, and the sharing of sensitive medical information. Future work is to apply the combination of blockchain and secure multiparty computing in specific real scenarios to solve the existing key problems. At the same time, regulatory authorities need to formulate standards and improve regulatory policies according to actual business needs.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.