A Scalable Blockchain-Based Integrity Verification Scheme

Ensuring the integrity of remote data is a prerequisite for implementing cloud-edge computing. Traditional data integrity verification schemes require users to spend a lot of time regularly checking their data, which is not suitable for large-scale IoT (Internet of Things) data. On the other hand, the introduction of a third-party auditor (TPA) may bring about greater privacy and security issues. We use blockchain to address the problem of TPA. However, implementing dynamic integrity verification with blockchain is a bigger challenge due to the low throughput and poor scalability of blockchain. More importantly, whether there is a security problem with blockchain-based integrity verification is not yet known. In this paper, we propose a scalable blockchain-based integrity verification scheme that implements fully dynamic operations and blockless verification. The scheme builds scalable homomorphic verification tags based on ZSS (Zhang-Safavi-Susilo) short signatures. We exploit smart contract technology to replace TPA for integrity verification tasks, which not only eliminates the risk of privacy leakage but also resists collusion attacks. Furthermore, we formally define a blockchain-based security model and prove that our scheme is secure under the security assumption of cryptographic primitives. Finally, the mathematical analysis of our scheme shows that both the computational complexity and the communication complexity of an audit are $O(c)$, in which $c$ is the number of challenge blocks. We compare our scheme with other schemes, and the results show that our scheme has the lowest time consumption to complete an audit.


Introduction
The rapid development of the Internet of Things (IoT) brings huge amounts of data. IoT devices store data on the cloud for cloud-edge computing. However, ensuring the integrity of remote data is the prerequisite for implementing cloud-edge computing [1].
Traditional cloud data integrity verification schemes [2,3] rely on techniques such as message authentication codes and hash functions to let users know the status of their data. Nonetheless, these approaches have large computation and communication overheads since users need to retrieve all data. Some schemes [4,5] reduce the verification overhead of the integrity verification system by constructing homomorphic verification tags. While these schemes enable quick auditing of data, users still need to spend a lot of time auditing their data periodically. To reduce the auditing burden on users, third-party auditors (TPAs) [6] are introduced to perform auditing tasks on cloud data. However, in real-world scenarios, TPAs are not completely trustworthy, and there are two threats [7-9]. First, a malicious TPA may extract private data by auditing the same data blocks over and over again. Second, a malicious TPA may collude with cloud servers to produce fake audit results.
Fortunately, blockchain smart contract technology [10,11] makes it possible to address these issues simultaneously. Smart contracts are encapsulated scripts that execute automatically. Therefore, we can use smart contracts to perform auditing tasks instead of TPAs. However, the low throughput and poor scalability of blockchain make it difficult to use blockchain in dynamic cloud storage. Therefore, it is a huge challenge to address the scalability of integrity verification schemes in blockchain network environments. More importantly, it remains unclear whether the security of integrity verification schemes is affected in the open network environment of blockchain. To the best of our knowledge, no existing scheme gives a formal security proof. Therefore, it is essential to give a proof of security for blockchain-based integrity verification schemes.
In this paper, we propose a scalable blockchain-based integrity verification scheme that enables fully dynamic operations such as insertion, deletion, and modification to address the issues raised above. We create a scalable homomorphic verification tag based on the ZSS (Zhang-Safavi-Susilo) short signature, which uses basic cryptographic hash functions such as SHA-1 or MD5 and does not require expensive special hash algorithms to achieve scalability. The scheme supports blockless verification, which allows users to audit their data without retrieving all of it. In addition, we use blockchain smart contract technology instead of TPA for the task of integrity verification, which not only eliminates the risk of privacy leakage but also protects against collusion attacks. To evaluate the level of security of our scheme in a blockchain environment, we formally define a blockchain-based security model and demonstrate that the scheme is secure against adaptive chosen message attacks under the security assumption of cryptographic primitives.
1.1. Contributions. The following are the main contributions of this paper:
(1) We propose a scalable blockchain-based integrity verification (SBB-IV) scheme that implements fully dynamic operations and blockless verification. The scheme achieves scalability under blockchain networks by building scalable homomorphic verification tags (HVTs) based on ZSS short signatures, which use general cryptographic hash functions and do not require expensive special hash functions
(2) We exploit smart contract technology to replace TPA for integrity verification tasks, which not only eliminates the risk of privacy leakage but also resists collusion attacks. Furthermore, we formally define a blockchain-based security model that captures the semantic security of adaptive chosen message attacks (CMA). We show that the SBB-IV scheme is secure against adaptive CMA under the security assumption of the q-CAA problem
(3) The mathematical analysis of our scheme shows that both the computational complexity and the communication complexity of the scheme are $O(c)$, in which $c$ is the number of challenge blocks. In addition, we run a series of tests on Hyperledger Fabric V2.2 and compare our scheme with the current state of the art. Our scheme is more efficient, as it takes only 2.3 seconds to conduct an audit when 1% of the data blocks are faulty

1.2. Paper Organization. The remainder of this work is arranged in the following manner. We provide an overview of related works in Section 2. Preliminaries are shown in Section 3. The network, threat, framework, protocol, and security model are all shown in Section 4. The detailed algorithms are presented in Section 5. We examine the correctness, dynamics, and security in Section 6. The mathematical analysis and experimental results are presented in Section 7. The paper concludes with Section 8.

Related Works
2.1. Traditional Data Integrity Verification. Provable data possession (PDP) [12] and proofs of retrievability (POR) [13] are two types of data integrity verification models. The PDP model was formally specified by Ateniese et al. [12], who presented an HVT based on RSA (Rivest-Shamir-Adleman) signatures. They separated the data into blocks and calculated the HVTs for each one. The user then chose a fixed number of blocks for verification at random. Although the sampling approach decreases the computing cost from linear to constant, the scheme is not capable of dynamic operations due to the fixed index of blocks. Juels et al. [13] presented a sentinel-based POR technique in which data segments (sentinels) were randomly inserted into the full data encoded using error correction codes. Due to the limited number of sentinels, it can only undertake limited auditing. BLS (Boneh-Lynn-Shacham) signatures were utilized by Shacham et al. [14] to create HVTs, which reduces communication overhead because the BLS signature is shorter than the RSA signature. Wang et al. [8] described how to build a dynamic PDP system using a Merkle tree, an authenticated data structure. Similarly, Erway et al. [15,16] proposed a skip-list-based dynamic-PDP (DPDP) system. Instead of using a fixed index, these data structures indicate block positions in terms of the order of leaf nodes, allowing blocks to be dynamically inserted at varied locations. However, because these data structures require supplementary information to validate the leaf node placements, they have a computational and communication complexity of $O(\log n)$, making them unsuitable for large-scale data. By first encrypting the data and then providing some precomputed hashes of the encrypted data to the TPA, Shah et al. [6,17,18] introduced a TPA to audit the data. The TPA, on the other hand, will be unable to continue auditing after the hashes run out.
Furthermore, a hostile TPA may collect information by auditing the same data blocks over and over again. Although random mask approaches [5,9,19,20] have been devised to obscure the linear combination of data and prevent the TPA from extracting it, they are still ineffective in preventing collusion attempts.

2.2. Blockchain-Based Data Integrity Verification. By replacing the integrity management service of centralized nodes with a fully decentralized data integrity service, Liu et al. [10] proposed a blockchain-based Internet of Things (IoT) data integrity service framework that eliminates TPA. However, as they only implemented the proposed protocol's basic features, the efficiency of building smart contracts for IoT devices is insufficient for large-scale IoT data. To assure data availability and privacy, Liang et al. [23] suggested a decentralized and dependable cloud data source protection architecture. The architecture used tamper-proof blockchain records and embedded data provenance in blockchain transactions, with auditors verifying the data's origins based on the information in the blocks. Paying the blockchain miners, on the other hand, would be prohibitively expensive for cloud customers. To address the problem of unreliability in traditional verification procedures, Yue et al. [22,24] presented a blockchain-based P2P cloud storage data integrity verification methodology. The approach used the Merkle tree to verify data integrity and examined system performance using various Merkle tree architectures. Wang et al. [25] proposed a decentralized architecture to tackle the traditional paradigm's single-point trust problem through communal trust. The architecture built a public protocol that maintains the data state under public scrutiny and prevents storage parties from engaging in fraudulent activities. For large-scale IoT data, Wang et al. [11] developed a blockchain-based data integrity verification system. They constructed a prototype system of edge computing processors near IoT devices to preprocess large-scale IoT data and performed data integrity verification in the form of transactions. None of the aforementioned approaches provide formal proof of security, and the security of integrity verification in a blockchain network setting remains an open question.
We compare our scheme with the state of the art in Table 1.

Smart Contract.
A blockchain is a distributed database that uses encryption, hashing, timestamping, consensus mechanisms, and other techniques [26]. All operations (transactions) are recorded on the blockchain, which is a chained data structure with tamper-proof features. A smart contract is a blockchain-based event-driven program [27]. It is contained within a virtual node that allows automated script execution and data processing in response to event triggers. Smart contracts, like transactions on the blockchain, offer distributed storage and tamper-proof characteristics. Being different from traditional executable programs, smart contracts are distributed and run according to preset rules that create communication protocols between communicating parties [28]. As a consequence, smart contracts enable traceable and irreversible activities without the involvement of a third party.

ZSS Signature.
Let $g$ be a generator of the group $G$, which is a cyclic additive group of large prime order $p$. Let $G_T$ be a cyclic multiplicative group of order $p$. A map $e : G \times G \to G_T$ is a bilinear pairing if it satisfies the following properties:
(1) Bilinear: for all $P, Q \in G$ and $a, b \in \mathbb{Z}_p$, the equation $e(aP, bQ) = e(P, Q)^{ab}$ holds
(2) Computability: for all $P, Q \in G$, there is an efficient algorithm to calculate $e(P, Q)$
(3) Nondegenerate: there exist $P, Q \in G$ such that $e(P, Q) \neq 1$, which means that the map does not send all pairs in $G \times G$ to the identity in $G_T$
The ZSS signature [29] includes three algorithms: KeyGen, Sign, and Verify. Let $H : \{0, 1\}^* \to \{0, 1\}^\lambda$ be a secure hash function.
(i) KeyGen: Randomly select an integer $\alpha \leftarrow \mathbb{Z}_p^*$ and compute $\alpha g$. The private key is $sk = \alpha$, and the public key is $pk = \alpha g$
(ii) Sign: Given a message $m \in \{0, 1\}^*$, the signature is $Sig = \frac{1}{H(m) + \alpha} g$
(iii) Verify: Given a signature $Sig$, a public key $pk$, and a message $m$, compute $H(m)$ and verify the equation:
$$e(H(m)g + pk, Sig) = e(g, g)$$

Wireless Communications and Mobile Computing
If the equation holds, the signature is valid; otherwise, the signature is invalid.

Figure 1 depicts the SBB-IV scheme's network model, which consists of three entities: data owner devices (DODs), cloud service providers (CSPs), and smart contracts.
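To make the algebra of the ZSS signature concrete, the following minimal sketch models KeyGen, Sign, and Verify in a deliberately insecure toy setting. A group element $xg$ is represented by its scalar $x$ modulo a prime, and the pairing $e(xg, yg)$ by the exponent $xy$ of $e(g, g)$. The prime, the use of SHA-256, and all function names are assumptions made for illustration only; a real implementation would rely on a pairing library such as PBC.

```python
import hashlib
import secrets

# Toy prime group order (an illustrative assumption; 2**127 - 1 is a Mersenne prime).
P = 2**127 - 1


def h(message: bytes) -> int:
    """Hash a message to an integer modulo P (SHA-256 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % P


def pairing(x: int, y: int) -> int:
    """Toy pairing: e(x*g, y*g) is represented by the exponent x*y of e(g, g)."""
    return (x * y) % P


def keygen():
    """Return (sk, pk). In this toy model pk = alpha*g is stored as the scalar
    alpha itself, which would leak the key; a real group hides alpha."""
    alpha = secrets.randbelow(P - 1) + 1  # alpha in Z_p^*
    return alpha, alpha


def sign(sk: int, message: bytes) -> int:
    """Sig = (1 / (H(m) + alpha)) * g, stored as a scalar modulo P."""
    return pow((h(message) + sk) % P, -1, P)


def verify(pk: int, message: bytes, sig: int) -> bool:
    """Check e(H(m)*g + pk, Sig) == e(g, g); e(g, g) has exponent 1 here."""
    return pairing((h(message) + pk) % P, sig) == pairing(1, 1)


sk, pk = keygen()
sig = sign(sk, b"block-0")
print(verify(pk, b"block-0", sig))  # signature checked against the signed message
print(verify(pk, b"block-1", sig))  # signature checked against a different message
```

The check succeeds for the signed message because $(H(m) + \alpha) \cdot \frac{1}{H(m) + \alpha} = 1$ in the exponent, which is exactly the bilinearity argument behind the verification equation.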

Network and Threat Model.
(1) DODs: DODs act as nodes on the blockchain network, outsourcing users' data to CSPs and paying for the execution of smart contracts
(2) CSPs: Data storage and maintenance services are provided by CSPs, which are connected to the blockchain network as nodes
(3) Smart contracts: Smart contracts are virtual nodes that contain automated scripts. They cannot be destroyed or modified by any adversary

DODs outsource users' data to CSPs and pay for the execution through smart contracts on the blockchain network. Smart contracts issue a challenge to audit cloud data integrity. Based on the proof created by CSPs, smart contracts use the verification algorithm to check the proof's validity and deliver the outcomes to DODs. Finally, the blockchain records every operation. The TPA collusion attack is avoided in this process because smart contracts are automated execution scripts. As a result, only the threat model described below is considered in this paper.

Malicious CSP.
The malicious CSP knows the data and the public information; its purpose is to cheat the smart contracts. That is, the malicious CSP owns the knowledge <Data, public information> and wants to find a fake proof that passes the verification of smart contracts.

In the Challenge stage, DOD sends a random seed to the challenge smart contract (C-SC). Then, using the algorithm $chal \leftarrow Challenge(seed)$, C-SC produces a challenge and transmits it to CSP and A-SC.
In the Verify stage, CSP creates a proof according to $chal$ using the procedure $P \leftarrow Response(pk, chal, \Phi, M)$ and delivers it to A-SC. The proof is then verified by A-SC using the algorithm $VerifyProof(pk, chal, P, HS)$, and the result is sent to DOD.
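The Challenge stage described above turns a seed into a list of sampled block positions and coefficients. The sketch below shows one way such a derivation could look, assuming HMAC-SHA256 as the pseudorandom function and a keyed ranking of positions as a stand-in for a pseudorandom permutation. The function names, the 0-based indices, and the parameters n, c, and p are hypothetical and are not taken from the scheme's specification.

```python
import hashlib
import hmac


def prf(key: bytes, label: bytes, i: int) -> int:
    """Pseudorandom function built from HMAC-SHA256 (an illustrative choice)."""
    msg = label + i.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def challenge(seed: bytes, n: int, c: int, p: int):
    """Derive c distinct block indices in [0, n) and coefficients in [1, p - 1].

    Ranking positions by a keyed PRF value gives a keyed permutation of the
    indices; the first c entries of that permutation are the sampled blocks.
    """
    if not 0 < c <= n:
        raise ValueError("c must satisfy 0 < c <= n")
    order = sorted(range(n), key=lambda i: prf(seed, b"perm", i))
    chosen = order[:c]
    return [(i, prf(seed, b"coef", j) % (p - 1) + 1) for j, i in enumerate(chosen)]


chal = challenge(b"example-seed", n=1000, c=5, p=2**127 - 1)
for index, coefficient in chal:
    print(index, coefficient)
```

Because the derivation is deterministic in the seed, any party holding the same seed (here, CSP and A-SC) obtains the same challenge, while a CSP that does not know the seed in advance cannot predict which blocks will be sampled.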

Blockchain-Based Security Model. An interactive game between a challenger $C$, a smart contract $S$, and an adversary $A$ defines the blockchain-based security model. In the Setup phase, we convey the challenged data $M$ to the adversary to capture the semantic security of adaptive chosen message attack. As a result, in the Query phase, the adversary can adaptively pick multiple data blocks for the HVT query. The game is played in the following manner:
(i) Setup: $C$ generates and sends a public key $pk$ to $A$. Then, $C$ sends data $M = \{m_1, m_2, \cdots, m_n\}$ and its HVT sequence $\Phi$ to $A$. Finally, $C$ sends the hash sequence $HS$ to $S$
The game process is illustrated in Figure 3. The security definition of the scheme is as follows.

Definition 1. If no probabilistic polynomial-time adversary $A$ can win the game with nonnegligible probability, the SBB-IV scheme is secure against the adaptive chosen message attack when the integrity of the remote data $M$ is violated.

Scheme Detail. The SBB-IV scheme is described in full in this section:
(1) $KeyGen(1^\kappa) \to (pk, sk)$. According to the security parameter $\kappa$, the algorithm selects a random number $\alpha \leftarrow \mathbb{Z}_p^*$ as the private key $sk$ and then computes $Y = \alpha g$ as the public key
(2) $HVTGen(sk, M) \to (\Phi, HS)$. The algorithm first splits the data $M$ into $n$ equal-length blocks; that is, $M = \{m_1, m_2, \cdots, m_n\}$. Next, for $1 \le i \le n$, it calculates the hash value $H(m_i)$ for a block $m_i$ and then computes the HVT as in the following equation:
$$\delta_i = \frac{1}{H(m_i) + m_i + \alpha} g \qquad (2)$$
Finally, it outputs an HVT sequence, $\Phi = \{\delta_1, \delta_2, \cdots, \delta_n\}$, and a hash sequence $HS = \{H(m_1), H(m_2), \cdots, H(m_n)\}$.
(5) $VerifyProof(pk, chal, P, HS) \to \{1, 0\}$. The algorithm accepts $pk$, $P$, $chal = \{(i_j, v_j)\}_{1 \le j \le c}$, and $HS$ as input and then calculates:
Finally, the output depends on whether the following equation holds:
If the equation holds, the algorithm outputs 1; otherwise, it outputs 0.

Dynamic. Because the HVT built in the scheme (as shown in Equation (2)) is based solely on the block and excludes a fixed numerical index, the scheme supports fully dynamic operations such as modification, insertion, and deletion. The scheme generates a hash sequence that records the location of each block. As a consequence, the following procedure is used to update the data:
Step 1: DOD delivers a request to A-SC, $Request = (ope, pos, con)$, where $ope$ represents the update operation, $pos$ denotes the update position, and $con$ represents the updated content. Note that when a delete operation is performed, $con$ is empty
Step 2: According to the request, A-SC performs the corresponding update operation and records the modification on the blockchain
Step 3: Once the record has been written to the blockchain successfully, DOD sends the Request to CSP to complete the update

5.4. Implementation. In our scheme, we encapsulate the Challenge algorithm and the VerifyProof algorithm into smart contracts to perform the task of auditing instead of TPA. Users can trigger the execution of smart contracts by sending a seed. The seed includes the number of audit blocks and security parameters, and users can also set parameters such as the audit cycle time and the number of audits performed according to their needs. The parameter $c$ is the number of randomly sampled blocks in one audit. The larger the parameter $c$, the higher the audit confidence and the higher the computational overhead. Therefore, users set $c$ according to their needs to make a trade-off between confidence level and computation overhead. The tamper-proof nature of smart contracts eliminates the possibility of privacy leakage and collusion attacks. Although we put the hash sequence on the public smart contract, an attacker cannot recover the data from the hash values because of the collision resistance and one-way nature of the hash function.
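The index-free tag construction and the update procedure described above can be illustrated with a small toy model. The sketch assumes that a tag is computed from the block alone as the inverse of $H(m) + m + \alpha$, represents each tag by a scalar modulo a toy prime instead of a group element, reads a block as an integer modulo that prime, and merges the roles of DOD, A-SC, and CSP into a single function for brevity. All names and parameters are hypothetical.

```python
import hashlib

P = 2**127 - 1  # toy prime order (illustrative assumption)


def h(block: bytes) -> int:
    """Hash a block to an integer modulo P (SHA-256 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(block).digest(), "big") % P


def hvt(alpha: int, block: bytes) -> int:
    """Toy tag: scalar of (1 / (H(m) + m + alpha)) * g; no block index is used."""
    m = int.from_bytes(block, "big") % P  # block read as an integer (assumption)
    return pow((h(block) + m + alpha) % P, -1, P)


def hvt_gen(alpha: int, data: bytes, block_size: int):
    """Split data into blocks and return (blocks, tag sequence, hash sequence)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return blocks, [hvt(alpha, b) for b in blocks], [h(b) for b in blocks]


def apply_update(alpha, blocks, tags, hs, ope, pos, con=b""):
    """Apply Request = (ope, pos, con) to the blocks, tags, and hash sequence."""
    if ope == "insert":
        blocks.insert(pos, con)
        tags.insert(pos, hvt(alpha, con))
        hs.insert(pos, h(con))
    elif ope == "modify":
        blocks[pos], tags[pos], hs[pos] = con, hvt(alpha, con), h(con)
    elif ope == "delete":
        del blocks[pos], tags[pos], hs[pos]
    else:
        raise ValueError("unknown operation: " + ope)


alpha = 123456789
blocks, tags, hs = hvt_gen(alpha, b"abcdefghijkl", block_size=4)
before = list(tags)
apply_update(alpha, blocks, tags, hs, "insert", 1, b"ZZZZ")
# Tags of untouched blocks are unchanged; they only shift position.
print(tags[0] == before[0], tags[2:] == before[1:])
```

Since no tag embeds its position, an insertion or deletion only shifts entries in the sequences, and only the inserted or modified block needs a new tag. This is consistent with deletion being the cheapest operation in the experiments.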
6. Scheme Analysis

6.1. Correctness. The PRP and PRF functions in the Challenge algorithm of the SBB-IV scheme ensure that the blocks are randomly picked for each audit, making it impossible for a malicious CSP to prepare proofs ahead of time. If the remote data is preserved, the proof $P$ generated by the Response algorithm will always pass the VerifyProof algorithm's verification. The scheme is correct in the following ways:

6.2. Security. We treat the hash function $H(\cdot)$ as a random oracle and reduce the security of the SBB-IV scheme to the q-CAA problem [21].

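Definition 2 (q-CAA problem). For an integer $q$, $\alpha \in_R \mathbb{Z}_p$, and $g \in G$, given
$$\left\{ g, \alpha g, w_1, \cdots, w_q, \frac{1}{w_1 + \alpha} g, \cdots, \frac{1}{w_q + \alpha} g \right\},$$
where $w_1, \cdots, w_q \in_R \mathbb{Z}_p$, compute $\frac{1}{w + \alpha} g$ for some $w \notin \{w_1, \cdots, w_q\}$.
q-CAA assumption. The q-CAA problem is $(t, \varepsilon)$-hard if, for any $t$-time adversary $A$, the advantage of $A$ in solving the problem is at most $\varepsilon$, where $\varepsilon$ is a negligible probability and $w_1, \cdots, w_q \in_R \mathbb{Z}_p$.
Theorem 3. Suppose the $(t, \varepsilon)$-q-CAA assumption holds in the group $G$. Then our scheme is $(t, \varepsilon)$-secure against adaptive chosen message attack under the random oracle model.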
Proof. If an adversary $A$ can break the security of the SBB-IV scheme, we show how a challenger $C$ can use $A$ to solve the q-CAA problem. The challenger $C$ is given $g \in G$, $Y = \alpha g$, $w_1, w_2, \cdots, w_q$, and $\delta_1 = \frac{1}{w_1 + \alpha} g$, $\delta_2 = \frac{1}{w_2 + \alpha} g$, $\cdots$, $\delta_q = \frac{1}{w_q + \alpha} g$, and her goal is to calculate $\frac{1}{w + \alpha} g$ for some $w \notin \{w_1, w_2, \cdots, w_q\}$. The interactive game between the challenger $C$, a smart contract $S$, and the adversary $A$ proceeds as follows:
(i) Setup: The challenger $C$ generates the public key $pk = Y = \alpha g$ and sends it to $A$. Then, $C$ selects data $M = \{m_1, m_2, \cdots, m_n\}$ and constructs its HVT sequence $\Phi$ and its hash sequence $HS$ as follows.
$C$ maintains a list of tuples $\langle w_i, H_i, m_i \rangle$. The list is initially empty. For a block $m_i$, $C$ selects a $w_i \in \{w_1, \cdots, w_n\}$ and computes
When the audited block is corrupted, suppose that a block $m_j$ is corrupted and $A$ can forge a fake proof that passes the verification with a nonnegligible probability.
We assume that the fake proof is $P^* = \{\theta^*, \mu^*, \eta^*\}$, where
When $S$ verifies the proof $P^*$ by executing the algorithm VerifyProof, it computes
Therefore, the process of verification is as follows:
If the fake proof passes the verification, we get $e(\eta^*, g) \cdot e(\mu^* + \theta^*, g) = e(g, g)$. Hence, from the above derivation, we get the following equation:
where $-g(v_j / \delta_j^*) + v_j (H(m_j) + m_j^* + \alpha) = 0$. That is, $g(1 / \delta_j^*) = H(m_j) + m_j^* + \alpha$. As a result, we get $\delta_j^* = \frac{1}{H(m_j) + m_j^* + \alpha} g$. Since we have assumed that $m_j \neq m_j^*$, we discuss two cases.
(i) Case 1: $H(m_j) + m_j^* = H(m_j) + m_j$. In this case, we get $m_j^* = m_j$, which contradicts our hypothesis. Therefore, this case shows that the block $m_j$ is not corrupted when the adversary $A$ wins the game.
(ii) Case 2: $H(m_j) + m_j^* \neq H(m_j) + m_j$. This case shows that when the adversary $A$ finds a fake HVT $\delta_j^*$ with a nonnegligible probability in time $t$, the challenger $C$ finds $\frac{1}{w + \alpha} g$ for some $w \notin \{w_1, \cdots, w_q\}$ with the same nonnegligible probability in time $t$, which means $C$ solves the q-CAA problem.
In summary, if the $(t, \varepsilon)$-q-CAA assumption holds in the group $G$, our scheme is $(t, \varepsilon)$-secure against adaptive chosen message attack under the random oracle model.

Scalability.
In an IoT data storage system, cloud storage must scale with the continuous growth of IoT data. In the SBB-IV scheme, we divide large data into smaller blocks, which enables fine-grained control of the data and enhances the scalability of the cloud storage system. Meanwhile, the proposed scheme is fully dynamic, which means node devices can insert, modify, and delete uploaded data according to their needs. Furthermore, the scheme can be compatible with more systems without compromising efficiency, since HVTs are computed using general cryptographic hash functions rather than expensive elliptic curve hash functions [30]. As a result, the scheme is suitable for the integrity verification of large-scale IoT data.
In addition to IoT systems, our scheme can also be applied to a blockchain-based P2P (peer-to-peer) file system. In this system, an edge device is a peer node, and each peer node can become a client or server. Our scheme solves the bandwidth problem of sharing files from a central server to clients. Files can be shared through different nodes without requesting all files from a central server. At the same time, due to the homomorphism of HVTs, the speed at which nodes verify file integrity is greatly improved. Therefore, the SBB-IV scheme greatly improves the scalability and efficiency of file sharing.

Evaluation
To evaluate the performance of the SBB-IV scheme, we conduct a mathematical analysis and a series of experiments in this part. The pairing-based cryptography library (PBC, http://crypto.stanford.edu/pbc/) is used in our experiments. The experiments are implemented in the Go programming language and run on an Intel(R) Core(TM) i7-10700 CPU with 16 GB of RAM. The blockchain platform is Hyperledger Fabric 2.2.0. The security level is set to 80 bits, implying that $|p| = 160$. We set each block's size to 8 KB and produce 1000, 5000, 10000, 50000, and 100000 blocks for the test. We report the average values across 10 trials throughout the evaluation.

In the Setup phase, the user sends $pk$, $M$, $\Phi$, and $HS$, so the communication complexity is $O(n)$. In the Challenge phase, C-SC sends the challenge $chal = \{(i, v_i)\}_{s_1 \le i \le s_c}$, which is $2c|p|$ bits. In the Verify phase, CSP sends the proof $P = \{\theta, \mu, \eta\}$, which is $3|p|$ bits. Therefore, the communication complexity of an audit is $O(c)$.
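As a quick illustration of the communication cost stated above, the following sketch evaluates the challenge size $2c|p|$ and the proof size $3|p|$ for a given number of challenged blocks. The function name is hypothetical, and the example value c = 300 is borrowed from the audit experiments purely for illustration.

```python
def audit_communication_bits(c: int, p_bits: int = 160) -> dict:
    """Return the size in bits of the challenge, the proof, and their sum.

    The challenge carries c pairs (i, v_i), each element |p| bits long;
    the proof carries three elements of |p| bits each.
    """
    challenge_bits = 2 * c * p_bits
    proof_bits = 3 * p_bits
    return {
        "challenge": challenge_bits,
        "proof": proof_bits,
        "total": challenge_bits + proof_bits,
    }


cost = audit_communication_bits(c=300)
print(cost["challenge"] // 8, "bytes of challenge")
print(cost["proof"] // 8, "bytes of proof")
```

The proof size is independent of both $c$ and $n$, so the challenge dominates the per-audit traffic and grows linearly in $c$.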

Experiments.
In this section, we evaluate the actual performance of the scheme with a series of experiments.

Setup.
In the Setup stage, the user's main computation overhead comes from the HVTGen algorithm. At the same time, the smart contract needs to store a hash sequence $HS$. In our experiments, we set the number of blocks to 1,000, 5,000, 10,000, 50,000, and 100,000, respectively. Because each block is 8 KB in size, 100,000 blocks represent about 780 MB of data. As shown in Figure 4, the time consumption of the HVTGen algorithm grows linearly, but the algorithm only needs to be executed once. For 780 MB of data, the smart contract's storage consumption is only 15.04 MB, which is easily achievable for a distributed ledger (Figure 5).

7.2.2. Audit. As we discussed in Section 5, in the Challenge algorithm, the larger the parameter $c$, the higher the audit confidence and the higher the computational overhead. To fully understand the time consumption of an audit, we locally tested the overall time consumption of three algorithms: the Challenge algorithm, the Response algorithm, and the VerifyProof algorithm. We select three other blockchain-based schemes (YLZ [22], WCF [25], and WHZ [11]) for comparison. In our experiments, we set the number of audited blocks $c$ to 200, 300, 400, 500, and 700, respectively. The experimental results (as presented in Figure 6) show that our scheme has the lowest overall time consumption for one audit.
In addition, we test the time consumption of the Response algorithm running locally and the time consumption of the VerifyProof algorithm running in the encapsulated smart contract. Our blockchain platform uses Hyperledger Fabric 2.2.0, and we build a test network on a virtual machine (Ubuntu 20.04). Let $P_x$ denote the probability of detecting corruption and $t$ the number of corrupted blocks; we get
Equation (13) shows that when $t$ blocks are corrupted, different values of $c$ produce different confidence levels. Therefore, assuming that 1% of the blocks are corrupted, we set $c = 300$ and $c = 460$ to get 95% and 99% confidence, respectively. As presented in Figure 7, the time cost of the Response and VerifyProof algorithms remains stable as the number of blocks increases.
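The confidence levels quoted above are consistent with the standard sampling argument used in the PDP literature: if $t$ of $n$ blocks are corrupted and $c$ distinct blocks are sampled uniformly, the probability of hitting at least one corrupted block is $P_x = 1 - \prod_{i=0}^{c-1} \frac{n - t - i}{n - i}$. The sketch below evaluates this expression; it is an assumption that this matches the form intended by the paper's equation, and the function name and sample values of $n$ are illustrative.

```python
def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that sampling c distinct blocks out of n hits at least one
    of the t corrupted blocks (sampling without replacement)."""
    if not (0 <= t <= n and 0 <= c <= n):
        raise ValueError("require 0 <= t <= n and 0 <= c <= n")
    miss = 1.0
    for i in range(c):
        if n - t - i <= 0:
            return 1.0  # more samples than intact blocks: detection is certain
        miss *= (n - t - i) / (n - i)
    return 1.0 - miss


n = 100_000
t = n // 100  # 1% of the blocks corrupted
for c in (300, 460):
    print(c, round(detection_probability(n, t, c), 4))
```

With 1% corruption, the probability depends almost entirely on $c$ rather than on $n$, which is why a fixed sample of a few hundred blocks suffices even for very large files.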

Dynamic.
For dynamic simulation tests, we use $n = 50000$. In our setting, the time it takes to update a block is determined by the time it takes for the blockchain to write the record and the time it takes to produce HVTs. We configured an endorsement node in our test network to write operation records to the Hyperledger (https://hyperledgerfabric.readthedocs.io/en/latest/index.html). The time consumption of insertion and modification grows linearly with the number of updated blocks, whereas the time consumption of deletion remains constant, as shown in Figure 8. Because deletion does not involve the creation of new HVTs, insertion and modification take longer than deletion. The time spent on deletion is primarily due to the time spent writing records by the endorsing node. Because the procedure for both operations is the same, insertion and modification take almost the same amount of time.

Conclusion
This paper mainly solves three problems, including the problem of TPA's privacy leakage and collusion attack, the problem of poor blockchain scalability, and the security problem of blockchain-based integrity verification schemes.
To address the problems above, we propose a scalable blockchain-based integrity verification scheme that implements fully dynamic operations and blockless verification. The scheme builds scalable homomorphic verification tags based on ZSS short signatures. We exploit smart contract technology to replace TPA for integrity verification tasks, which not only eliminates the risk of privacy leakage but also resists collusion attacks. Furthermore, we formally define a blockchain-based security model that captures the semantic security of adaptive chosen message attacks. We show that our scheme is secure under the security assumption of cryptographic primitives. Finally, the mathematical analysis of our scheme shows that both the computational complexity and the communication complexity of an audit are $O(c)$, in which $c$ is the number of challenge blocks. We compare our scheme with other schemes, and the results show that our scheme has the lowest time consumption to complete an audit.

Data Availability
The data that support the findings of this study are not openly available due to reasons of sensitivity and are available from the corresponding author upon reasonable request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.