Audit as You Go: A Smart Contract-Based Outsourced Data Integrity Auditing Scheme for Multiauditor Scenarios with One Person, One Vote

The data outsourcing services provided by cloud storage have greatly reduced the burden of data management for users, but the issue of remote data integrity poses further security concerns and computing burdens. The introduction of a third-party auditor (TPA) frees data owners from the auditing burden and alleviates disputes over the audit results between data owners and cloud storage providers. However, malicious cloud servers may collude with TPAs to deceive users for financial profit. Hiring multiple auditors for a single audit assignment appears to be a way to address this problem, but the ensuing voting issues need to be further explored. In this paper, we propose a smart contract-based outsourced data integrity auditing scheme for multiauditor scenarios. Unlike some existing schemes that use reputation-like factors as voting weights, auditors in our scheme vote equally and audit as they go, without any maintenance. This mechanism not only frees auditors from chores unrelated to auditing but also avoids the centralization that comes with excessively high voting weights. The challenge used to check the integrity of the outsourced data is jointly generated by all involved auditors. Any collusion would be detected as long as there exists more than one honest auditor in the audit. We implement and deploy the scheme as Ethereum smart contracts. With the help of blockchain, the entire auditing process is public and transparent. Both the generated data and the obtained results are persisted immutably, which ensures the traceability of all historical audits. Comprehensive theoretical and experimental analyses demonstrate that our scheme meets the claimed targets with high efficiency and low gas costs.


Introduction
With the rapid development of the information age, individuals and organizations have produced a large amount of data. By 2025, the amount of data generated globally is expected to reach 463 exabytes each day [1]. Traditional local storage models can no longer meet the management needs of such a massive volume. Cloud storage is quickly attracting users with its scalability, low cost, and location independence [2]. With technologies such as virtualization, cloud storage converges loosely coupled nodes into a powerful platform that provides unified services to users. Today, more and more people are willing to migrate their local data to leased cloud storage [3]. However, once the data is uploaded to the cloud storage, the owner completely loses control over it. Owners are obliged to access the data through the interface provided by cloud storage servers, and they have to rely entirely on the cloud storage to ensure the integrity of their data. Unfortunately, even though cloud storage employs a variety of advanced technologies to guarantee the reliability and robustness of users' data, corruption caused by hardware failure, management errors, or external attacks still occurs [4]. What is worse, malicious servers may even delete data that is rarely accessed by users in order to free up storage space and gain greater profit. In addition, once data integrity has been compromised, intentionally or otherwise, dishonest storage servers tend to conceal the incident to protect their reputation. How to effectively check the integrity of data stored in the cloud has therefore become a research hotspot.
To address this problem, several remote data integrity auditing schemes have been proposed [5][6][7][8][9][10][11][12]. These schemes enable users to efficiently audit the integrity of their data without a complete download. To achieve this, a user divides the original file into blocks and then generates a tag for each block, which is used to verify the integrity of the corresponding block. When a file audit is launched, a challenge is generated and sent to the storage server. The challenge contains a collection of selected block indexes and a collection of random numbers corresponding to the indexes. On receiving the challenge, the cloud server picks the data blocks specified in the challenge and combines them with the random numbers to obtain an integrity proof. By verifying the proof, the data owner can determine whether the cloud server is actually keeping his data intact.
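The spot-checking idea described above can be quantified with a standard sampling argument. The sketch below is not taken from the scheme itself; it assumes the challenged indexes are drawn uniformly without replacement, and the names `n` (blocks in the file), `t` (corrupted blocks), and `c` (challenged blocks) as well as the sample values are illustrative.

```python
from fractions import Fraction


def detection_probability(n: int, t: int, c: int) -> float:
    """Chance that a challenge of c distinct blocks hits at least one of the
    t corrupted blocks in a file of n blocks (sampling without replacement)."""
    if not (0 <= t <= n and 0 <= c <= n):
        raise ValueError("require 0 <= t <= n and 0 <= c <= n")
    miss = Fraction(1)  # probability that every challenged block is intact
    for i in range(c):
        intact_left = n - t - i
        if intact_left <= 0:
            return 1.0  # more challenged blocks than intact blocks remain
        miss *= Fraction(intact_left, n - i)
    return float(1 - miss)


if __name__ == "__main__":
    # Illustrative setting: 10,000 blocks, 1% of them corrupted.
    for c in (100, 300, 460):
        print(c, round(detection_probability(10_000, 100, c), 4))
```

Under these illustrative values, the detection probability exceeds 0.95 at c = 300 and 0.99 at c = 460, which is why a challenge can cover only a small fraction of the file.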
To get rid of tedious audit routines and complex calculations, data owners would like to delegate audit tasks to TPAs. However, introducing a third party poses the additional risk that malicious cloud servers may try to trick their users by colluding with auditors. Employing multiple auditors on an audit assignment and determining the final audit outcome based on the votes of all participants can mitigate this collusion, but designing a reasonable voting mechanism with multiple untrusted participants is still challenging. The common method to deal with inconsistent voting results in a multiparticipant scenario is weighted voting, where a weight, typically represented by reputation, is maintained for each auditor. The weight of an auditor stands for the extent to which his vote influences the final result. In addition, when an auditor's vote is consistent with the final result, his reputation increases; otherwise, it decreases. Intuitively, weighted voting hopes to build a virtuous ecosystem where honest auditors always tell the truth and their reputation keeps rising. On the contrary, dishonest auditors who are caught cheating receive a reduction in reputation as punishment, and their reputation keeps declining as the cheating continues. At this rate, a few reputable auditors in the system are bound to become "elders," and their excessive voice will gradually centralize the system. In contrast to weighted voting, the result of nonweighted voting depends only on the number of votes each candidate receives, and the only thing that needs to be considered is membership, namely, who can vote and who is not allowed to vote in the system. If there is no threshold for voting, malicious attackers can easily generate a large number of accounts at very low cost to take part in an audit, directly affecting the final result through an overwhelming numerical advantage. This type of attack is known as the Sybil attack [13], which is common on peer-to-peer networks.
In summary, the introduction of multiple auditors may somewhat mitigate the collusion, but the problem has not been fundamentally solved and the following threats remain.
(1) In weighted voting, the mechanism leads the system to become progressively centralized. The collusion of a few reputable auditors is enough to sway the final outcome, even if they are outnumbered by honest auditors. This reduces the cost for malicious cloud servers to do evil, while also weakening user confidence in the auditing system.
(2) In nonweighted voting, without a reasonable membership mechanism for the system, Sybil attacks can be easily launched. Malicious cloud servers can generate or buy large numbers of audit accounts to promote their desired results.
(3) The collusion between malicious cloud servers and auditors makes the detection of corrupted data fail. This collusion is undetectable because there is no way to distinguish whether an auditor's challenge is randomly generated or carefully constructed. With this, the cloud server can "truly" pass the proof verification by saving only a small part of specific data blocks.
1.1. Motivation. As mentioned above, in contrast to weighted voting, which inevitably leads to centralization, nonweighted voting only needs a reasonable membership mechanism to avoid Sybil attacks. Besides, we consider writing the whole auditing process as smart contracts deployed to Ethereum, where any externally owned account (also known as a user account) can participate by simply paying a deposit. Another benefit of using smart contracts as the carrier for the multiauditor scenario is that it makes the process public, transparent, and traceable. The participants, details of the execution process, intermediate data, and the final result of the audit assignment are permanently recorded on the blockchain. Anyone can look up any historical audit without worrying about loss or manipulation.

Contribution.
Based on the above motivation, we design a remote data integrity audit scheme based on Ethereum smart contracts with the following features:
(i) Cheating resistance. Without complete retention of the user data, no spoofing by the server can pass the data integrity audit. This is the basic security requirement for remote data integrity auditing.
(ii) Smart contract-based audit. The auditing process is scheduled by smart contracts. Any Ethereum externally owned account can participate in the audit and has nothing to maintain, namely, audit as you go. Every single audit instance is persisted on the blockchain, which ensures public transparency and traceability.
(iii) Collusion resistance. We propose an aggregated challenge generation algorithm, where the final challenge is composed of the shares independently submitted by each auditor. Thus, as long as there exists at least one honest auditor, the challenge will not be generated as malicious auditors might expect. We also design a nonweighted voting mechanism, namely "one person, one vote." When the audit results are inconsistent, arbitration is enforced: the honest are rewarded and the dishonest are punished.

Related Works.
Traditional remote data integrity verification mechanisms fall into two main types: provable data possession (PDP) and proof of retrievability (PoR). In PDP, the user requests a proof by challenging some randomly selected blocks on the server and then determines the integrity of the remote data by verifying that the proof is correct. A PoR scheme stores each encrypted file in a cloud server together with a set of pseudorandom blocks. The client can then check the integrity of the data by verifying that the server retains the pseudorandom blocks. In 2007, Ateniese et al. [5] first defined PDP and proposed a PDP scheme. In their scheme, the data user randomly selects several blocks of data to verify the integrity of the data with low communication and computational cost. If the integrity verification of these selected blocks passes, it can be determined that the server has the complete data with high probability. Later, Juels and Kaliski [14] proposed a PoR model whose main idea is to embed a set of random values called "sentinels," so that the auditor can check the integrity of the data by checking the presence of sentinels at specific positions. Shacham and Waters proposed two PoR schemes based on the homomorphic linear verifier [15], which further improved the efficiency of the PoR scheme proposed by Juels and Kaliski [14]. To implement PDP on dynamic cloud data, Ateniese et al. proposed another PDP scheme [16], which supports all dynamic operations except insertion. Shen et al. [6] proposed a dynamic PDP scheme that supports fully dynamic operations. Later, various PDP and PoR schemes were proposed to extend the performance or functionality of traditional schemes, enriching the integrity checking capabilities for outsourced data with features such as deduplication [17], batch audit [18,19], and data update [7,20].
To reduce the computational burden on the user side, public auditing schemes [8, 10-12, 21-23] have been proposed to allow TPAs to audit the integrity of cloud data on behalf of data owners. To guarantee the integrity of medical data and reduce the burden on the data owner, Li et al. [24] propose an efficient, privacy-preserving public auditing protocol for cloud-based medical storage systems that supports batch auditing and dynamic data updates. This scheme not only saves computation costs for the TPA and the data owner but also reduces the communication overhead between the TPA and cloud servers. Considering that key retention is a burden for data users, Shen et al. [25] propose a new paradigm called "data integrity auditing without private key storage," which utilizes a linear sketch with coding and error correction processes to confirm the identity of the user. To enable data integrity auditing under the multiwriter model, He et al. [26] propose the first public auditing scheme for shared data that supports fully dynamic operations. To implement the new paradigm, they propose a specially designed authenticated structure, called the blockless Merkle tree, and a novel cryptographic primitive, called permission-based signature. In edge computing scenarios, caching data on edge servers can minimize users' data retrieval latency. However, this new architecture poses challenges for traditional data audit models. Li et al. [27] propose a new data structure named variable Merkle hash tree (VMHT) for generating the integrity proofs of the data replicas during the audit, which solves the above problem. Considering that existing schemes suffer from complex certificate management or key escrow problems, Gudeme et al. [28] propose a certificateless privacy-preserving public auditing scheme for dynamic shared data with group user revocation in cloud storage, without public key infrastructure (PKI) or identity-based cryptography (IBC). To verify whether an untrusted CSP stores all replicas in different geographic locations, Yu et al. [29] propose a dynamic multireplica auditing scheme in which both the integrity and the geographic locations of a cloud user's data replicas are verified.
Recently, blockchain has been considered one of the most promising technologies to provide security support for IoT systems [30]. It was initially used to provide digital payments [31] and is now commonly used for smart contracts [32,33] and data storage. The trust issues associated with traditional data integrity verification make the integration of blockchain into data integrity verification an inevitable trend. Based on a distributed data storage blockchain, Zhang et al. [34] proposed a privacy-preserving electronic health record (EHR) public auditing scheme to prevent malicious behavior by the TPA. However, it does not support batch auditing and data updates. Liu et al. [35] proposed applying blockchain to avoid the use of a TPA, and Yue et al. [36] proposed a blockchain-based framework that attempts to obtain trustworthy audit results. They all lack the necessary considerations to ensure the credibility of the results of off-chain events. Kun et al. [37] implemented private blockchain-based data validation in an untrustworthy environment, but their solution requires building and deploying a private blockchain, which is very difficult in practice. Zhou et al. [38] proposed a witnessing model to credibly enforce smart contract-based off-chain cloud service level agreements (SLA). Miao et al. [39] proposed a mechanism to generate challenges using block hashes, but the method does not guarantee that the audit results will not be tampered with off-chain. There are also some blockchain-based multiaudit models [37,40]. However, their proof validation process runs in smart contracts or in blockchains using proof of work, which can incur excessive public-chain costs or validation time. Zhang et al. [41] propose a certificateless public verification scheme against procrastinating auditors (CPVPA) by using blockchain technology. CPVPA is built on certificateless cryptography and is free from the certificate management problem. This scheme mitigates the impact of the TPA's laziness on the audit. To solve the problem of repeated auditing of data shared by multiple tenants, Xu et al. [42] propose a blockchain-based deduplicatable data auditing mechanism, which also addresses problems such as high cost and reliance on trusted third parties in traditional approaches. Chen et al. [33] proposed a blockchain-based crowdsourcing auditing approach to achieve trustworthiness in audit results. The model relies on an untrusted audit committee. However, the scheme maintains a reputation as the voting weight for each auditor, which may introduce the disadvantage of centralization to integrity auditing.

Organization.
The rest of the paper is organized as follows. We discuss the preliminaries in Section 2. Section 3 describes the subalgorithms executed by each participant and the scheduling framework of the scheme. The security analysis and formal proof are given in Section 4. Section 5 analyzes the implementation and performance. Finally, Section 6 concludes the paper.

Bilinear Map.
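Let $G$ and $G_T$ be two multiplicative cyclic groups of large prime order $q$, and let $g$ be a generator of $G$. A map $e: G \times G \to G_T$ is a bilinear map if it has the following properties:
(1) Bilinearity: for all $u, v \in G$ and $a, b \in Z_q^*$, $e(u^a, v^b) = e(u, v)^{ab}$.
(2) Non-degeneracy: $e(g, g) \neq 1$.
(3) Computability: there exists an efficient algorithm to calculate $e(u, v)$ for all $u, v \in G$.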

Complexity Assumption
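Definition 1. (Computational Diffie-Hellman (CDH) problem). Given the tuple $(g, g^a, g^b)$, where $a, b \in Z_q^*$ are unknown, the CDH problem is to calculate $g^{ab}$.
Definition 2. (CDH assumption). The advantage of any probabilistic polynomial-time (PPT) algorithm $A$ in solving the CDH problem in $G$ is negligible. It is defined as $\mathrm{Adv}^{\mathrm{CDH}}_{A} = \Pr[A(g, g^a, g^b) = g^{ab}] \le \varepsilon$. Here, $\varepsilon$ denotes a negligible value.
Definition 3. (Discrete logarithm (DL) problem). Given the tuple $(g, g^a)$, where $a \in Z_q^*$ is unknown, the DL problem is to calculate $a$.
Definition 4. (DL assumption). The advantage of any PPT algorithm $A$ in solving the DL problem in $G$ is negligible. It is defined as $\mathrm{Adv}^{\mathrm{DL}}_{A} = \Pr[A(g, g^a) = a] \le \varepsilon$. Here, $\varepsilon$ denotes a negligible value.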

Blockchain and Smart Contract.
Blockchain technology enables decentralized peer-to-peer transactions, coordination, and collaboration without trust through data encryption, timestamps, and distributed consensus. A "smart contract" is simply a program that runs on the blockchain. It is a collection of code (its functions) and data (its state) that resides at a specific address on the blockchain. Smart contracts are typically used to automate the execution of an agreement so that all participants can be immediately certain of the outcome, without an intermediary's involvement or loss of time. User accounts can interact with a smart contract by submitting transactions that execute a function defined in the smart contract. Smart contracts cannot be deleted by default, and interactions with them are irreversible. Thanks to blockchain's immutability, the execution of smart contracts and the data they generate cannot be changed later. This is very important when the goal is to make a process trustworthy.
The scheduling part of the audit assignments can be stripped out of the overall audit logic and put into a smart contract. The parties participate in the audit by interacting with the contract. The contract is responsible for driving the audit process, collecting the intermediate results of each participant's calculations, assigning calculation tasks to each participant, completing the vote tally, and outputting the final audit results.

Two-Phase Commit.
The concept of two-phase commit (2PC) is derived from database management systems. It is a standardized protocol that ensures that a database commit is carried out correctly when the commit operation must be broken into two separate parts. Since our audit scheme is based on smart contracts, any data submitted by the participants is publicly available, which poses a security risk to the operation of the protocol. The purpose of introducing 2PC in a public system is to ensure that the data submitted by each participant remains confidential to the others until every participant has committed.
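The two-step submission described above can be sketched as a commit-then-reveal round. This is a minimal off-chain simulation, not the authors' contract code: the class `CommitRevealRound`, its method names, and the use of SHA-256 as the hash (an Ethereum contract would more likely use Keccak-256) are all assumptions made for illustration.

```python
import hashlib
import secrets


def commit(value: int) -> str:
    """Hash a 256-bit secret number (SHA-256 used as a stand-in hash)."""
    return hashlib.sha256(value.to_bytes(32, "big")).hexdigest()


class CommitRevealRound:
    """Hypothetical stand-in for the contract logic of one 2PC round."""

    def __init__(self) -> None:
        self.commit_open = True
        self.commitments = {}    # auditor -> (h(r), h(s))
        self.revealed = {}       # auditor -> (r, s)
        self.eliminated = set()  # auditors whose reveal did not match

    def submit_commitment(self, auditor: str, h_r: str, h_s: str) -> None:
        if not self.commit_open:
            raise RuntimeError("commit phase is closed")
        self.commitments[auditor] = (h_r, h_s)

    def close_commit_phase(self) -> None:
        self.commit_open = False

    def submit_reveal(self, auditor: str, r: int, s: int) -> bool:
        if self.commit_open:
            raise RuntimeError("reveal is not allowed before all commits")
        h_r, h_s = self.commitments[auditor]
        if commit(r) == h_r and commit(s) == h_s:
            self.revealed[auditor] = (r, s)
            return True
        self.eliminated.add(auditor)  # the deposit would be deducted here
        return False


if __name__ == "__main__":
    r, s = secrets.randbits(255), secrets.randbits(255)
    rnd = CommitRevealRound()
    rnd.submit_commitment("auditor-1", commit(r), commit(s))
    rnd.close_commit_phase()
    print(rnd.submit_reveal("auditor-1", r, s))
```

Closing the commit phase before any reveal is accepted is what keeps each auditor's numbers hidden from the others while they can still choose their own.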

Proposed Scheme
In this section, we introduce the components of the proposed system and then explain the subalgorithms related to data integrity auditing and their executors.

System Model.
The system consists of four kinds of entities: a data owner, a storage provider, auditors, and smart contracts. There can be any number of auditors.
(i) Data Owner. The data owner rents cloud storage services and outsources large amounts of data to the cloud storage. The data owner may be an individual or an organization.
(ii) Storage Provider. The storage provider provides cloud storage services to the data owner. It has significant storage capacity and powerful computing capability. When receiving a data auditing challenge, the storage provider should respond to the auditors with an integrity proof.
(iii) Auditor. The auditor challenges the storage provider and determines the integrity of the user data by verifying the proof returned by the provider.
(iv) Smart Contract. The smart contract stipulates the audit process. There are two smart contracts in the system. AMSC (assignment management smart contract) manages the audit assignments, while AASC (audit assignment smart contract) is instantiated by AMSC and performs a specific auditing assignment.

Notations.
To make the proposed scheme more clearly understood, we summarize the main notations involved in Table 1.

Auditing Framework.
This section introduces how the smart contracts drive the interaction of the participants and achieve data security checks and aggregation. Note that all steps are executed on-chain except for the data outsourcing, indicated by the dotted line in Figure 1, which is done off-chain.
(i) Deploy AMSC: AMSC is deployed at the very beginning. All storage providers, data owners, and auditors in the system listen at its address for events.
(ii) Enroll File: assuming data outsourcing has been done off-chain, the data owner submits the file identifier $F_{id}$ and his storage provider's address to AMSC to enroll the file.
(iii) Request Audit, Instantiate a New AASC, and Inform Audit Information: these three steps are performed consecutively. When an audit is launched, the data owner sends an auditing request to AMSC along with the fee he is willing to pay. The request includes $F_{id}$ and the number of data blocks to be challenged, $c$. After a brief verification, AMSC deploys an AASC instance for this file. All participants listening to AMSC receive this event. Consequently, the data owner and his storage provider begin to listen at the newly deployed AASC's address.
(iv) Apply Audit: at the same time, auditors also receive the above event. Any interested auditor can apply for the audit by generating two large random numbers $r, s \in Z_p^*$ and submitting them to AASC with 2PC. A sufficient deposit is also required. The detailed process of this 2PC is illustrated in Figure 2.
(...) If the auditors do not reach a consistent conclusion about the result, AASC sends $r_s, s_s, \mu, \sigma$ to the data owner, who then performs an arbitration to obtain the final auditing result. Based on this result, AASC distributes the balance in the contract account as rewards to the auditors who reached the same result as the data owner. The detailed process is illustrated in Figure 3.
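The final tally and reward step described above can be sketched as follows. This is only an illustration under stated assumptions: the function `settle`, the equal split of the contract balance among matching auditors, the balance being the audit fee plus all deposits, and the sample amounts are hypothetical and are not specified in this form by the scheme.

```python
from typing import Dict, Optional, Tuple


def settle(votes: Dict[str, int], fee: int, deposit: int,
           owner_result: Optional[int] = None) -> Tuple[int, Dict[str, int]]:
    """One person, one vote: return the final result and each auditor's payout.

    votes maps an auditor address to its audit result (1 = intact, 0 = not).
    owner_result is the data owner's arbitration, used only on disagreement.
    """
    if not votes:
        raise ValueError("at least one auditor is required")
    if any(v not in (0, 1) for v in votes.values()):
        raise ValueError("each vote must be 0 or 1")

    distinct = set(votes.values())
    if len(distinct) == 1:
        final = distinct.pop()  # unanimous, no arbitration needed
    else:
        if owner_result not in (0, 1):
            raise ValueError("inconsistent votes: owner arbitration required")
        final = owner_result

    winners = [a for a, v in votes.items() if v == final]
    balance = fee + deposit * len(votes)  # assumed contract balance
    share = balance // len(winners)       # assumed equal split
    payouts = {a: (share if a in winners else 0) for a in votes}
    return final, payouts


if __name__ == "__main__":
    print(settle({"a1": 1, "a2": 0, "a3": 1}, fee=300, deposit=100,
                 owner_result=1))
```

Every vote counts the same regardless of who casts it, so no per-auditor state such as reputation has to be stored between audits.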

Algorithms.
This section introduces the calculations that each participant needs to complete in an auditing assignment.

Figure 1: Smart contract-based audit process framework.

Figure 2: Flowchart of the two-phase commit for challenge generation. An auditor generates $r, s$ and submits $h(r), h(s)$ in the first phase, then submits $r, s$ in the second phase. If the auditor's $r, s$ match its $h(r)$ and $h(s)$, they are gathered to generate the challenge; otherwise, the auditor is eliminated and its deposit is deducted.

Figure 3: Flowchart of result submission and arbitration. Each auditor submits its audit result according to the proof. If all audit results are consistent, the result is taken as the final one; otherwise, the data owner submits its audit result according to the proof.

(1) Setup($1^\lambda$) → {pk, sk}: (...) permutation $\pi: Z_q^* \times \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$, a secure hash function $h: G_T \to Z_p$, and a random value $u \in G$. Then the data owner selects a random value $sk = x \in Z_q^*$ as the secret key and calculates the public key as $v = g^x$. Finally, the data owner releases $pk = \{q, G, G_T, e, H, h, \phi, \pi, u, v\}$ to the public and keeps $sk$ secret.
(2) TagGen($F, F_{id}, pk, sk$) → {$\Phi, t$}: this algorithm is executed by the data owner. Let the file $F = \{m_1, m_2, \ldots, m_n\}$ be identified by $F_{id}$ (...). Then, for each block $m_i$ ($i = 1, \ldots, n$), the data owner computes the corresponding authenticator (...) is the signature of the file. The data owner then uploads the data file $F$ and the corresponding data tag (...).
(3) (...): this algorithm is executed by the storage provider. Besides verifying the validity of the file identifier's signature, the storage provider checks the correctness of each authenticator by (...) and outputs the result of the authenticator verification, 1 for true and 0 for false.
(4) Challenge(·) → {$r_s, s_s$}: this algorithm is executed by the auditors together with AASC. Each auditor independently picks two large random numbers $r$ and $s$ from $Z_p^*$ and sends them to AASC. AASC aggregates all the $r$'s and $s$'s into $r_s$ and $s_s$, respectively. AASC finally sends the two numbers to the storage provider as the data integrity challenge.
(...) holds, then output the auditing result, 1 for true and 0 for false.
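Since the challenge consists of only two aggregated numbers, the storage provider and every auditor must expand them into the same block indexes and coefficients. The sketch below illustrates one way to do this deterministically. It is not the scheme's actual construction: HMAC-SHA256 stands in for the pseudorandom function and permutation, the roles assigned to `r_s` (indexes) and `s_s` (coefficients) are assumptions, and the modulus `P` is an arbitrary illustrative value.

```python
import hashlib
import hmac
from typing import List, Tuple


def _prf(seed: int, counter: int) -> int:
    """Keyed pseudorandom value; HMAC-SHA256 is a stand-in for the PRF."""
    key = seed.to_bytes(32, "big")
    msg = counter.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def expand_challenge(r_s: int, s_s: int, n: int, c: int,
                     p: int) -> List[Tuple[int, int]]:
    """Derive c distinct block indexes in 1..n and c coefficients in 1..p-1."""
    if not 1 <= c <= n:
        raise ValueError("require 1 <= c <= n")
    # Keyed shuffle: order all indexes by their PRF value under r_s.
    order = sorted(range(1, n + 1), key=lambda i: _prf(r_s, i))
    indexes = order[:c]
    # One nonzero coefficient per challenged block, keyed by s_s.
    coefficients = [1 + _prf(s_s, j) % (p - 1) for j in range(c)]
    return list(zip(indexes, coefficients))


if __name__ == "__main__":
    P = 2**255 - 19  # illustrative modulus, not the scheme's group order
    for index, coeff in expand_challenge(12345, 67890, n=1000, c=5, p=P):
        print(index, hex(coeff)[:18])
```

Because the expansion depends only on the two seeds and public parameters, anyone can recompute the challenged set later from the values recorded on-chain.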

Security Model.
We consider our scheme to fulfill the following two security requirements. First, the integrity of the challenged files is properly verified if the storage provider and auditors execute the protocol honestly. Second, the scheme prevents semitrusted storage providers from deceiving the auditors about the integrity of the challenged data. That is, if the storage provider does not have the intact data file, it cannot generate a correct proof of data integrity. The first security requirement is defined as follows.
Definition 5. The proposed scheme is correct for data integrity checking if, for any random $r_s, s_s \in Z_p^*$, a data file $F$, and the corresponding tag $\Phi$, the following equation holds:
$\mathrm{ProofVerify}(r_s, s_s, \mathrm{ProofGen}(r_s, s_s, F, \Phi)) = 1. \quad (3)$
The second security requirement aims to resist the three attacks mentioned in [43] that can be launched by the storage provider, namely the forge attack, the replay attack, and the replace attack. In each of these three attacks, the semitrusted storage provider responds to auditors with an invalid proof. We can capture the requirement through a security game that covers all three attacks.
This security game consists of an adversary A and a challenger C. A plays the role of a semitrusted storage provider who tries to trick auditors by forging a data integrity proof. The game is described as follows:
(1) C runs the Setup($1^\lambda$) algorithm to generate {pk, sk}, then releases pk to A.
(2) A repeatedly makes queries to C for some files. C returns $\Phi \leftarrow$ TagGen($F, F_{id}, pk, sk$) to A.
(3) Finally, A outputs $\{\sigma, \mu\}$ for a data file $F$ and data tag $\Phi$ on the challenge $\{r_s, s_s\}$.
We define the advantage of A as $\mathrm{Adv}_A = \Pr[\mathrm{ProofVerify}(r_s, s_s, \mu, \sigma) = 1]$. We say the adversary wins the above game if $\mathrm{Adv}_A$ is non-negligible.

Definition 6.
The proposed scheme is sound if there exists an efficient extraction algorithm such that, whenever the adversary A outputs $\{\sigma, \mu\}$ for the data file $F$ and data tag $\Phi$ on the challenge $\{r_s, s_s\}$ and wins the above game, the extraction algorithm recovers the file $F$ from $\Phi$ and $\{\sigma, \mu\}$.

Theorem 1. (Auditing correctness). When the storage provider stores the user's data correctly, the proof it generates can be verified by auditors.
Proof. Given a valid proof $P = \{\mu, \sigma\}$ from the storage provider, the verification equation (1) in the ProofVerify algorithm will hold. Based on the properties of the bilinear map, the verification equation (1) can be proved correct by deriving the left-hand side from the right-hand side as follows: (...)

Security and Communication Networks
We use the hybrid argument technique to prove soundness, as in [15]. "Hybrid arguments" have been used extensively in cryptography for many years. Such an argument is essentially a sequence of transitions based on indistinguishability. First of all, we define the following games: Game-0. Game-0 is the original game defined in Section 4.1.
Game-1. Game-1 is the same as Game-0, except that the challenger C keeps a local list of all the tags he has signed. If the adversary A ever submits a tag $\Phi$ that (1) has a valid signature under $sk$ but (2) has not been signed by C, then C announces failure and aborts.
Game-2. Game-2 is the same as Game-1, except that C records all responses to TagGen queries from A. If A succeeds but the $\sigma$ output by A is not equal to $\prod_{j \in I} \sigma_j^{v_j}$, the challenger C announces failure and aborts.
Game-3. Game-3 is the same as Game-2, except that the challenger C announces failure and aborts if at least one $\mu' \neq \sum_{j \in I} v_j \cdot m_j$.

Lemma 1. If there exists an algorithm A that can distinguish between Game-0 and Game-1 with a non-negligible probability, then we can construct an algorithm B to break the existential unforgeability with non-negligible advantage.
Analysis. If A causes C to abort in Game-1, then we can use A to construct an algorithm B against the existential unforgeability of the signature scheme.

Lemma 2. If there exists an algorithm A that can distinguish between Game-1 and Game-2 with a non-negligible probability, then we can construct an algorithm B to break the computational Diffie-Hellman assumption with non-negligible advantage.
Analysis. Suppose that $g^x$ and $g^y$ are elements of the CDH problem and we set $v = g^x$, $u = g^y$. Suppose A can respond with a signature $\sigma'$ that is different from the expected signature $\sigma$. We can compute (...). Therefore, we can calculate $g^{x \cdot y} = (\sigma'/\sigma)$ (...)

Lemma 3.
If there exists an algorithm A that can distinguish between Game-2 and Game-3 with a non-negligible probability, then we can construct an algorithm B to break the computational Diffie-Hellman assumption with non-negligible advantage.
Analysis. We assume that h(·) is a random oracle controlled by an extractor that answers the hash queries posed by the adversary. For $\eta = h(S)$ from the extractor, the adversary outputs $\{\mu, \sigma\}$ such that (...). Then, the extractor sets $h(S)$ to be $\eta^* \neq \eta$. The adversary outputs $\{\mu^*, \sigma\}$ such that (...). We divide the above two equations, then we have (...)

Theorem 2. (Soundness). Assume that the computational Diffie-Hellman problem is hard in bilinear groups and the digital signature scheme is existentially unforgeable. Then no probabilistic polynomial-time adversary can break the soundness of the scheme with a non-negligible probability.
Proof. Any adversary's advantage in Game-3 must be 0, because if there is no intact file $F$, i.e., at least one $\mu' \neq \sum_{j \in I} m_j v_j$, the challenger always announces failure and aborts. According to the game sequence and Lemmas 1-3, the advantage of the adversary in the original game, Game-0, must be negligible.

Analysis of Collusion Resistance.
As portrayed in the ProofGen algorithm, $r_s$ is used as the seed for the pseudorandom function $\phi$ to generate the indexes of the blocks to be challenged. This means that if $\phi$ is secure, the indexes cannot be known without knowing $r_s$. As proven in Theorem 2, the probability that a storage provider generates a proof that passes the verification without preserving the complete data is negligible. By colluding with the storage provider, malicious auditors can try to make the negotiated seed fall into a designed set, so that the indexes and random numbers of the challenged blocks are generated as they expect. The storage provider would then only need to store a small part of the real data blocks to pass the ProofVerify algorithm. However, as long as at least one honest auditor is involved in the audit, the generation of the aggregated random numbers is not controlled by the malicious auditors, and the probability that the number happens to fall in the designed set is $w/|Z_p^*|$, where $w$ is the size of the set, which is negligible.
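The probability $w/|Z_p^*|$ can be checked exhaustively on a toy example. The sketch below assumes that shares are aggregated by multiplication modulo $p$; the aggregation function is not spelled out here, so this choice, the tiny prime 101, and the names `aggregate` and `count_hits` are illustrative assumptions only.

```python
from typing import Iterable, Set


def aggregate(shares: Iterable[int], p: int) -> int:
    """Combine auditors' shares in Z_p^* (assumed rule: product mod p)."""
    acc = 1
    for x in shares:
        if not 1 <= x < p:
            raise ValueError("each share must lie in Z_p^*")
        acc = (acc * x) % p
    return acc


def count_hits(malicious_shares: Iterable[int], designed_set: Set[int],
               p: int) -> int:
    """Count honest shares in Z_p^* that land the aggregate in designed_set."""
    fixed = aggregate(malicious_shares, p)
    return sum(1 for honest in range(1, p)
               if (fixed * honest) % p in designed_set)


if __name__ == "__main__":
    p = 101                  # toy prime; real parameters are far larger
    designed = {3, 9, 27}    # seeds the colluders would like to obtain
    hits = count_hits([5, 17, 42], designed, p)
    print(hits, "of", p - 1, "honest shares hit the designed set")

    # With no honest auditor, the last colluder can force any target seed.
    forced = (3 * pow(aggregate([5, 17], p), -1, p)) % p
    print(aggregate([5, 17, forced], p))
```

Multiplying by a fixed nonzero value permutes $Z_p^*$, so a single uniformly random honest share makes the aggregate uniform no matter what the other shares are; exactly $w$ of the $p - 1$ possible honest shares lead into a designed set of size $w$.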

Discussion on Data Owner's Trustworthiness.
The only security assumption in our scheme is that the data owner is honest. The data owner performs arbitration when the auditors do not reach a consensus on the audit results. This is different from cutting out the auditors and letting the data owner perform the audit directly by himself. In a system where only two parties participate, the conclusions declared by either party are unconvincing. In our scheme, arbitration is performed only when the auditors' results are inconsistent, which means that each kind of result is reached by multiple individuals. Moreover, an auditor who lies will definitely be discovered, because the data owner can perform the audit directly by himself during arbitration. This leaves auditors with no reason to lie, meaning that arbitration should rarely need to be enforced.

Discussion on the Employability of Two-phase Commit.
Our scheme uses two-phase commit (2PC) in two phases, Challenge Generation and Result Submission. The generation of the challenge in our scheme relies on two numbers submitted by each auditor independently, which are kept confidential from the other auditors. If a malicious auditor knew the other auditors' numbers, he could construct special numbers that prompt the smart contract to generate the challenge he intends, which would make the whole scheme fail. We therefore divide the submission of the secret numbers into two steps by introducing 2PC: the first step submits the hash value, and the second step submits the corresponding hash key. Due to the one-way nature of hashing, a malicious auditor cannot derive the secret numbers in the first step even if he knows the hash values, and thus cannot influence the generation of the final challenge by constructing his own number. In the Result Submission phase, some lazy auditors may choose to copy other auditors' results. In the first step, the honest auditor can concatenate the audit result with its blockchain address to calculate the hash value. In this way, the smart contract can determine whether an auditor has copied someone else's result by checking whether the key submitted in the second step matches the hash value submitted in the first step.
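The two steps described above can be sketched off-chain as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the contract's hash function, the addresses are placeholder strings, and the function names `commit` and `reveal_ok` are hypothetical rather than taken from the deployed contracts.

```python
import hashlib

def commit(value: bytes, address: str) -> str:
    """Step 1: publish only the hash of (value || sender address)."""
    return hashlib.sha256(value + address.encode()).hexdigest()

def reveal_ok(commitment: str, value: bytes, sender: str) -> bool:
    """Step 2: recompute the hash from the revealed value and the sender's
    own address, and compare it with the commitment stored in step 1."""
    return commit(value, sender) == commitment

honest, lazy = "0xHONEST", "0xLAZY"   # placeholder addresses

# The honest auditor commits to its result bound to its own address.
c_honest = commit(b"pass", honest)

# A lazy auditor copies the commitment without doing the audit.
c_lazy = c_honest

print(reveal_ok(c_honest, b"pass", honest))  # honest reveal is accepted
print(reveal_ok(c_lazy, b"pass", lazy))      # copied commitment is rejected
```

Binding the sender's address into the hash is what makes a copied commitment useless: the copier cannot produce a value that hashes to it under a different address.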

Implementation and Performance Analysis
In this section, we discuss the performance of the proposed scheme in terms of computation cost and gas cost. We carry out a series of simulation experiments to evaluate the performance of our scheme, and the code can be found at https://github.com/TDMaker/sc-paper. Note that, since the underlying layer of our scheme is a P2P overlay network, the network traffic required to maintain it is necessarily much larger than that of end-to-end schemes, so we omit the comparison of communication costs.

Environment.
The experiments were carried out on an Ubuntu Desktop 20.04 machine with an Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz × 4 and 4 GB of RAM. For the local computation of each participant, we use the pairing-based cryptography (PBC) library [44] and the GNU multiple precision arithmetic (GMP) library [45], and we implement the simulation in C. We use the a.param parameter file of the PBC library as the pairing parameters. The smart contracts are written in Solidity and run on the Rinkeby Ethereum test net. Each participant uses programs written in JavaScript to interact with the smart contracts through the npm web3 package [46].

Computation Analysis.
We analyze the computation costs of all subalgorithms of the proposed protocol. We chose the size of the data block to be 160 bits. Without loss of generality, we vary the block count from 100 to 1000 in increments of 100. TagGen and TagVerify are executed only once for the same file, and their time overhead is relatively large compared with the other algorithms, so they are shown separately in Figure 4. The rest of the algorithms need to be executed repeatedly during each audit and are shown in Figure 5. Setup is used to generate the system parameters. Since its time overhead is static and relatively small, we do not plot it and only note that it takes 4.715 ms, averaged over ten experiments.

TagGen and TagVerify are used to compute and check the data owner's outsourced authenticators, and they take much longer than the other algorithms. This time cost increases with the size of the user file, which might become quite large. Fortunately, this cost is incurred only once and can be paid offline. Challenge determines two random numbers that constitute the challenged blocks' index sequence, which is very fast. ProofGen calculates the integrity proof by aggregating the challenged data blocks. Its time cost depends mainly on the length of the challenge sequence and increases with the number of challenged blocks. ProofVerify checks the integrity proof generated by the storage provider. Its time cost also increases with the number of challenged blocks, for the same reason as ProofGen. In our protocol, the most frequently executed algorithms are Challenge, ProofGen, and ProofVerify, which are periodically performed by the storage provider and auditors. Thus, data owners in our protocol have little workload after data outsourcing, except when arbitration is needed.
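To illustrate why the cost of ProofGen and ProofVerify grows with the number of challenged blocks, the expansion of a short seed into c block indexes can be sketched as below. This is an assumption-laden sketch, not the scheme's construction: HMAC-SHA256 stands in for the pseudorandom function, and the function names `prf` and `challenge_indexes` are hypothetical. The seed plays the role of the negotiated value $r_s$.

```python
import hashlib
import hmac

def prf(seed: int, counter: int) -> int:
    """HMAC-SHA256 used as a stand-in pseudorandom function (assumption)."""
    key = seed.to_bytes(32, "big")
    msg = counter.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")

def challenge_indexes(r_s: int, n: int, c: int) -> list[int]:
    """Expand seed r_s into c distinct block indexes in [0, n)."""
    if not 0 < c <= n:
        raise ValueError("need 0 < c <= n")
    chosen, seen, counter = [], set(), 0
    while len(chosen) < c:
        idx = prf(r_s, counter) % n
        counter += 1
        if idx not in seen:          # skip repeats so every index is distinct
            seen.add(idx)
            chosen.append(idx)
    return chosen

# Anyone holding the same seed derives the same c indexes, so the storage
# provider and every auditor agree on which blocks are challenged.
indexes = challenge_indexes(r_s=123456789, n=1000, c=10)
print(indexes)
```

Each derived index costs the prover one block aggregation and the verifier one corresponding term, which is consistent with both costs rising with c.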
Comparison: to show the efficiency advantage of our scheme, we compare it with the schemes proposed in [24, 25, 47-49] and list the results in Table 2. The computation cost of our scheme mainly lies in expensive operations such as multiplication, exponentiation, and pairing. Other operations, such as hashing and addition, incur only negligible costs, so we omit them from the analysis. For simplicity, we use $T_{mul}$, $T_{exp}$, and $T_p$ to denote the overhead of a multiplication, an exponentiation, and a pairing operation on group G, respectively. Suppose there are n blocks in total, of which c blocks are challenged. The overall efficiency of the scheme depends mainly on the algorithms TagGen, TagVerify, ProofGen, and ProofVerify. However, since TagGen and TagVerify are run only once, their impact on the overall efficiency of the audit protocol is negligible. Therefore, we compare only the algorithms ProofGen and ProofVerify. In proof generation, our scheme takes the same number of multiplication operations as [48], but one more than the four remaining schemes. Our scheme has the same number of exponentiation operations as the first three schemes. Meanwhile, [25] has one fewer exponentiation operation than ours, and [24] has one more. The pairing operation is the most time-consuming operation, but it occurs only in [24, 47, 48]. In proof verification, the schemes in [25, 47] need two pairing operations and the scheme in [49] needs three, whereas in the scheme in [48] the number of pairing operations is linear in the number of challenged blocks. Our scheme and the scheme in [48] save (c + 1) multiplication operations and two exponentiation operations compared with the schemes in [47, 49]. Although [24] outperforms the other schemes in terms of exponentiation and pairing operations, its number of multiplication operations increases linearly.
Nonetheless, each of the above schemes trades some computation for the functional properties it targets, on top of its security guarantees. The computation cost comparison alone should therefore be taken only as a rough reference.

Gas Cost Analysis.
Gas is the fuel to be paid for running smart contracts on Ethereum. It measures how much "work" needs to be done for an operation or a series of operations. Gas prevents junk transactions from blocking the network and serves as additional income for miners. We deployed our smart contracts on Rinkeby [50], which is an Ethereum test net (or test network). The only difference between the case where all auditors reach the same conclusion and the case where they do not is an additional arbitration step performed by the data owner at the end. All other steps are exactly the same. Therefore, we only explain the case that requires the data owner's arbitration. Because any number of lying auditors can be detected as long as there exists at least one honest auditor, and because the number of dishonest auditors has no effect on the final result, we introduce only two auditors in the experiment: an honest auditor and a dishonest one. Figure 6 illustrates such an audit assignment, where the vertical axis lists the participants and the horizontal axis shows how much gas, in Wei, each participant spends to execute each algorithm. Wei is the smallest unit of currency in Ethereum: 1 Ether = $10^{18}$ Wei. Note that Inform Submit Hash Key 1, Inform Submit Hash Key 2, and Inform Proof Gen are emitted events (which are only auxiliary steps), so we did not list them in Section 3.3 for clarity. Submit Hash Key 1 and Submit Hash Key 2 are substeps of 2PC, so they have not been listed either. As we can see from the figure, the two algorithms with the highest gas cost are Deploy TMSC and Request Audit, because these two algorithms involve the deployment of smart contracts. The large amount of gas consumed by smart contract deployment comes from two sources. First, the CREATE opcode, which is called during contract creation, costs a fixed 32,000 gas. Second, the contract's bytecode must be stored: more bytecode means more storage, and each byte costs 200 gas. This adds up very quickly. Fortunately, Deploy TMSC deploys the management contract for audit assignments only once in an audit system, while Request Audit instantiates a contract once for every audit assignment executed. The other operations require very little gas. In an audit assignment, each additional auditor increases the gas by less than 400,000. The gas overhead for the other participants is fixed, except when data owner arbitration is required, which needs an additional 100,000 gas; this overhead is insignificant compared to the reward it can earn.
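The two deployment cost components named above can be turned into a rough estimate. The sketch below uses only the fixed 32,000 gas for CREATE and the 200 gas per stored byte; the bytecode length and gas price are hypothetical inputs, and other costs of a real deployment transaction (such as the base transaction fee, calldata, and constructor execution) are deliberately left out.

```python
CREATE_GAS = 32_000               # fixed cost of the CREATE opcode
CODE_DEPOSIT_GAS_PER_BYTE = 200   # cost of storing each byte of contract code
WEI_PER_ETHER = 10**18

def deployment_gas(bytecode_len: int) -> int:
    """Lower-bound gas estimate for deploying bytecode_len bytes of code."""
    return CREATE_GAS + CODE_DEPOSIT_GAS_PER_BYTE * bytecode_len

def fee_in_ether(gas_used: int, gas_price_wei: int) -> float:
    """Convert gas used at a given gas price (in Wei) into Ether."""
    return gas_used * gas_price_wei / WEI_PER_ETHER

# Hypothetical contract of 5,000 bytes deployed at a gas price of 1 Gwei.
gas = deployment_gas(5_000)
print(gas)                          # 1032000
print(fee_in_ether(gas, 10**9))     # fee in Ether at 1 Gwei per gas
```

Even a modest contract is dominated by the per-byte storage term, which matches the observation that Deploy TMSC and Request Audit are the most expensive steps.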

Conclusion
In this paper, we design a remote data integrity auditing scheme based on Ethereum smart contracts. The challenge of this scheme is jointly generated by all Ethereum users participating in the audit. When auditing results are inconsistent, the data owner will complete the final arbitration.

Table 2: Computation comparison with some existing schemes.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that there are no conflicts of interest with this study.