HB+-MHT: Lightweight and Efficient Data Integrity Verification Scheme for Cloud Virtual Machines

With the rapid development of cloud computing, cloud storage is widely used. In the cloud environment, users’ virtual machine system images and data are stored on the cloud server. Virtual machine escape and Trojan or virus attacks make it challenging to ensure the integrity of virtual machine systems. Trusted computing is expensive when data integrity must be verified at random, and it does not adapt to dynamic data changes. Provable data integrity is a potential solution to this problem. The Merkle Hash Tree (MHT) model is widely adopted in provable data integrity. Although MHT requires only a small amount of evidence for verification, the number of hash calculations performed by the verifier and the efficiency of evidence queries on the server are not optimal. Moreover, MHT does not consider how frequently each piece of data is verified. Properly handling these factors can improve the actual verification performance. In this paper, a lightweight and efficient data integrity verification approach called HB+-MHT is proposed for the tenant virtual machine (TVM) in cloud computing. In HB+-MHT, the Huffman hash tree scheme is used for small file verification to ensure that hot files have shorter paths, which reduces the amount of evidence required for verification. Meanwhile, the B+ hash tree scheme is used for big file verification, which effectively reduces the evidence query time and the number of hash calculations. The experimental results show that the proposed scheme performs data integrity verification well, with reduced computing and storage overhead.


Introduction
The wide application of cloud computing provides users with convenient and cheap services, which greatly reduces the storage overhead and management burden of users. However, due to the huge scale, unprecedented openness, and complexity of cloud computing systems, these systems face more severe security issues than before. The centrally managed cloud computing centers have become targets of attacks, which endangers the data confidentiality, integrity, and availability of cloud platforms [1][2][3].
When users or enterprises outsource their data to cloud computing service providers or entrust these providers to run their applications, the providers have priority access to the data or applications. Cloud system tenants usually use encryption to ensure the security of the private data stored on the cloud server [4]. Meanwhile, the development of encryption and access control technology provides reliable confidentiality for the tenant's private data [5]. However, due to risks such as negligent internal personnel management, hacker attacks, or server system failures, cloud service providers cannot convince users that their data has not been maliciously tampered with. Since all data of the tenant virtual machine are stored in the cloud, security issues often occur, such as virtual machine escape and attacks on virtual machines by illegal users outside the system. These issues bring great challenges to the integrity protection of tenant data in the system. In the cloud environment, it is usually assumed that the cloud server is not trusted, and the tenant virtual machine can only establish trust through online interaction with the cloud server. In addition, there are many dynamic services in the cloud environment, so effectively ensuring data integrity in a dynamic environment is crucial to users' experience of cloud services. Due to the openness and dynamic characteristics of cloud computing, it is difficult to establish trust, which makes the integrity of tenants' dynamic data a problem. Therefore, it is urgent to design an efficient integrity verification scheme to ensure the integrity of tenant virtual machine systems and data in the cloud. Building a secure and reliable integrity verification method has become a research hotspot in the field of information security.
The traditional trusted computing method is based on the trusted computing technology proposed by the Trusted Computing Group (TCG) [6], which tries to provide credibility for the endpoints in a distributed computing environment. The Trusted Platform Module (TPM) is introduced into the hardware layer of the computing platform to provide a hardware-based root of trust. Starting from this root, the integrity of the hardware and software of a local platform is measured layer by layer through the trust chain transmission mechanism. The traditional trusted computing methods have a low verification level and coarse granularity, and they face a high computational cost in dynamic system measurement. When verifying the integrity of a file, these methods need to calculate the integrity of all previous files in turn, which requires a lot of computing power to verify data integrity at random and does not adapt to dynamic data changes.
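The cost of the layer-by-layer measurement described above can be illustrated with a minimal sketch of a TPM-style extend chain. The initial register value, the file names, and the helper names `extend` and `replay` are assumptions made for illustration only; they are not taken from the paper.

```python
import hashlib

def extend(register: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new register = H(old register || H(measurement))."""
    return hashlib.sha1(register + hashlib.sha1(measurement).digest()).digest()

def replay(measurements):
    """Recompute the aggregate by replaying the whole measurement list in order."""
    register = b"\x00" * 20  # assumed initial register value
    for m in measurements:
        register = extend(register, m)
    return register

# Hypothetical measurement list.
files = [b"bootloader", b"kernel", b"init", b"app"]
aggregate = replay(files)

# Changing any earlier entry changes the aggregate, so checking the last file
# still requires every earlier measurement to be replayed.
tampered = replay([b"bootloader", b"kernel-modified", b"init", b"app"])
```

Because the aggregate depends on the full ordered list, a random check of one file costs a replay of everything measured before it, which is the overhead the paragraph above refers to.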
Provable data integrity (PDI) is considered a potential solution to this problem. According to whether fault-tolerant processing is adopted, PDI can be divided into two types: the proof of retrievability (POR) model and the provable data possession (PDP) model [7,8]. PDP judges whether the cloud data is damaged, while POR attempts to recover the data after identifying the damage. PDP pays more attention to detection efficiency, while POR pays more attention to data recovery. Since the goal here is to detect Trojans or viruses destroying cloud data, this paper mainly discusses the PDP mechanism.
Merkle Hash Tree (MHT) is commonly used in PDP for the cloud environment. MHT is a well-studied authentication structure [9] that is intended to efficiently prove that a data set is undamaged and unaltered. It is constructed as a binary tree whose leaves are the hashes of authentic data values. Each middle node is the hash of the concatenation of its left and right child nodes, and the root node is signed by the integrity management authority. The research on MHT has greatly improved the efficiency of PDP. However, there are two issues that MHT does not consider. One issue is that the verification frequency of specific data is not considered, so every data item has an authentication path of equal length. It is reasonable that frequently verified data should have a shorter path. The other issue is that MHT designs the authentication structure to reduce the amount of evidence but ignores the number of hash calculations performed by the verifier and the evidence query time of the server, which may be important performance constraints in a real environment.
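As a point of reference for the rest of the paper, the binary MHT described above can be sketched as follows. This is a minimal illustration that assumes SHA-1 as the hash and a power-of-two number of blocks; the function names and the sample blocks are hypothetical, and the root signature is omitted.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def build_levels(blocks):
    """Return all tree levels, leaves first. Assumes len(blocks) is a power of two."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes from leaf to root, each tagged with whether the sibling is on the left."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], index > sibling))
        index //= 2
    return path

def root_from_path(block, path):
    """Verifier side: rebuild the root from one block and its sibling hashes."""
    node = h(block)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node

blocks = [b"d1", b"d2", b"d3", b"d4", b"d5", b"d6", b"d7", b"d8"]
levels = build_levels(blocks)
root = levels[-1][0]
proof = auth_path(levels, 5)  # evidence for the sixth block
```

Every leaf in this tree has a path of the same length (three sibling hashes for eight blocks), regardless of how often it is verified, which is the first of the two issues raised above.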
To address the aforementioned challenges, this paper proposes a data integrity verification approach called HB+-MHT for TVM. HB+-MHT adopts different verification schemes according to the file size. Small files are loaded into the TVM, and an optimized hash tree is adopted for verification. In this case, compared with the communication overhead of transmitting small files to TVM, the traditional PDP methods with multiple signature operations might take more verification time. For large files, an improved PDP method is adopted because the communication overhead of transmitting large files to TVM may be too high. Our contributions are summarized as follows:
(i) The Huffman Merkle scheme is designed to verify the integrity of small files. According to different integrity verification frequencies or weights, the authentication path of hot files is optimized and shortened effectively.
(ii) The B+ hash tree scheme is designed to verify the integrity of large files, which optimizes the server evidence query and reduces the number of client hash verifications.
(iii) Experiments show that our scheme realizes data integrity verification well, reduces the computational overhead effectively, and achieves high verification efficiency.
The rest of the paper is arranged as follows. Section 2 discusses the related work, Section 3 describes the detailed design of the scheme, Section 4 evaluates and analyzes the experimental results, and finally, Section 5 concludes the paper.

Related Work
TCG introduces the TPM in the hardware layer. Based on the root of trust and the trust chain transfer mechanism, integrity measurement can be implemented for the hardware and software layers of the local platform. However, the integrity measurement mechanism proposed by TCG has a low verification level. It can only verify the credibility of the operating system of the computing platform and does not specify a verification method for application layer credibility. In the process of verification, the whole measurement list must be sent to the verifier, which easily leaks measurements. Reiner Sailer et al. [10] proposed IMA, a more secure system integrity measurement architecture. The system kernel maintains a list of measurements and hashes the measured values to obtain an aggregate value as a reliable basis for verifying the measurements. Meanwhile, the system extends the concept of TCG trust measurement to the dynamic application layer. However, every verification needs to hash all the measurements again to obtain the aggregate value, which greatly reduces the efficiency of system operation. Also, the privacy protection problem of the TCG remote authentication mechanism remains unsolved. To address the shortcomings of IMA, Xu et al. [11] store the measurements in a Merkle tree structure so that only the authentication path related to a measurement needs to be sent. However, since the Merkle tree is a balanced binary tree, too many middle nodes need to be queried. So, the average search complexity is high, and this method needs to be further optimized.
The current PDP mechanisms mainly rely on message authentication codes (MACs), RSA signatures, or BLS signatures, and they can support dynamic operations, multiple copies, privacy protection, etc. In the MAC-based data possession proof [12], the MAC value is used as the verification metadata to verify the integrity of the data stored on the remote server. However, users need to download the data for integrity verification, which results in a lot of communication overhead and leaks private data. Also, this method does not support dynamic data integrity verification. To support dynamic data operations, Erway et al. [13] introduced the skip list as a dynamic data structure. This scheme has some problems, such as a long authentication path, a large amount of required auxiliary information, and high computational and communication overhead. Wang et al. [14] proposed a PDP mechanism based on the Merkle tree, which uses the MHT structure to ensure the correctness of data block locations. Meanwhile, they used the BLS signature mechanism to ensure the integrity of data block content. Although the model supports dynamic data operations, inserting data tends to increase the scale of the MHT and make it unbalanced. Shen et al. [15] proposed a fully dynamic structure combining a doubly linked list and a position array to support dynamic data updates more effectively. Tan et al. [16] proposed a lightweight integrity verification scheme based on the skip list and the BLS signature mechanism to reduce the overhead of generating verification metadata for verifiers. Sun et al. [17] proposed an adaptive authenticated data structure with privacy preservation for big data streams in the cloud, which can provide real-time authentication of outsourced big data. Jin et al. [18] verified frequently accessed cloud data by signature verification. However, this method requires verifiers to download the data locally and to verify the data that has not been queried for a long time through PDP.
In general, many methods reduce the amount of verification evidence through MHT and extend the structure of MHT for further performance optimization or security enhancement. However, MHT may not be optimal in some cases. For example, a frequently verified file has the same amount of evidence as an infrequently verified file in the traditional MHT-based scheme. It is more reasonable for hotspot files to have shorter authentication paths. For the verification of large files, the PDP scheme already reduces the amount of evidence significantly because the file itself is not transferred, so how quickly the cloud storage server can find the evidence and how quickly the client can calculate the hash values may become the main bottleneck. To this end, this paper proposes the Huffman hash tree and the B+ hash tree to overcome the bottleneck.

Detailed Design
3.1. Overall Framework. As shown in Figure 1, HB+-MHT consists of the TVM, the Cloud Storage Server (CSS), and the Key Management Center (KMC). KMC securely distributes public and private keys to tenants and transmits the tenants' public keys to CSS. TVM runs the query agent to initiate a data integrity verification challenge to CSS. CSS queries the evidence and generates a proof as a response. TVM verifies the data integrity by checking the correctness of the proof. The HB+-MHT cloud data integrity verification approach consists of two verification schemes. One is the Huffman hash tree (HHT) scheme, which is generally used to verify the integrity of small files. Compared with the communication overhead of transmitting small files, the traditional PDP methods with more signature operations might take more computing time. Therefore, for small files, this scheme loads the files into TVM and combines the hash tree with the Huffman tree to realize HHT, thereby shortening the authentication path and improving the verification efficiency of the files with high verification weight. The other scheme is based on the combination of the B+ hash tree and the BLS aggregate signature. It reduces communication overhead by not loading files into TVM and achieves no-copy verification of large files. Because the amount of evidence is significantly reduced when the PDP scheme is adopted, how quickly the cloud storage server can find the evidence and how quickly the client can calculate the hash values become the main bottleneck. The B+ hash tree optimizes hash computation and evidence organization, which effectively reduces the processing time of CSS and TVM.

B+HT: B+ Hash Tree. B+HT is an extension of the combination of the B+ tree and MHT, and it is also a balanced multiway evidence lookup tree. For a B+HT of order m, each node in the tree has at most m subtrees, and all nonleaf nodes except the root have at least $\lceil m/2 \rceil$ subtrees. A middle node consists of a tuple $(C_1, H_1, P_1, C_2, H_2, P_2, \ldots, C_m, H_m, P_m)$, where $C_i$ is the number of leaf nodes of its i-th subtree; $H_i$ is the hash value of the concatenation of all data items in the root node of its i-th subtree; $P_i$ points to the root node of its i-th subtree. The leaf nodes represent all data blocks of the file. The root node has one more item than a middle node, which is the signature of the root node data. The data of B+HT usually has to be stored on a disk due to limited memory. Most operating systems read data in blocks and put data written at one time into one disk block or multiple consecutive disk blocks. Therefore, if the size of a B+HT node is slightly smaller than the disk block size u, evidence in a node can be queried quickly. Let v be the size of the subtree information in a node; then, m should be approximately equal to u/v. To ensure performance and tree balance, the number of subtrees of each middle node cannot be less than $\lceil m/2 \rceil$. Each hash calculation in MHT only takes the data of two subtrees as the input, while each hash calculation in B+HT takes the data of multiple subtrees as the input. Therefore, B+HT can obtain the total hash value of all blocks through far fewer hash operations than MHT, which means that the number of hash calculations for evidence verification is greatly reduced. Each middle node stores the number of leaves of each subtree, which enables a fast data block search. As an example, Figure 2 shows the lookup of the data block $d_{11}$. According to the root node, it is deduced that $d_{11}$ is the seventh leaf node of $V_3$'s subtree. Similarly, according to the $V_3$ node, it is known that $d_{11}$ is the fourth leaf node of $V_7$'s subtree.
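The lookup by leaf counts can be sketched as follows. Figure 2 is not reproduced here, so the tree layout below is a hypothetical one chosen only to agree with the numbers quoted in the text (the block d11 being the seventh leaf under v3 and the fourth leaf under v7); hashes and signatures are left out, and the helper names are assumptions.

```python
# A node is a list of (leaf_count, child) items; a child is either another
# node (a list) or a block label (a string). Hashes are omitted in this sketch.

def count_leaves(child):
    return sum(c for c, _ in child) if isinstance(child, list) else 1

def make_node(children):
    return [(count_leaves(c), c) for c in children]

def locate(node, k):
    """Return the k-th leaf (1-based) and the child position taken at each node."""
    route = []
    while isinstance(node, list):
        for pos, (count, child) in enumerate(node):
            if count >= k:
                route.append(pos)
                node = child
                break
            k -= count  # skip this subtree and keep counting in the next one
        else:
            raise IndexError("leaf index out of range")
    return node, route

# Hypothetical layout with order m = 4.
v4 = make_node(["d1", "d2"])
v5 = make_node(["d3", "d4"])
v6 = make_node(["d5", "d6", "d7"])
v7 = make_node(["d8", "d9", "d10", "d11"])
v2 = make_node([v4, v5])
v3 = make_node([v6, v7])
v1 = make_node([v2, v3])

leaf, route = locate(v1, 11)
```

In this layout the search for the 11th leaf skips the 4 leaves under v2, continues as the 7th leaf under v3, skips the 3 leaves under v6, and ends as the 4th leaf under v7, so only one node per level is inspected.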
We define the authentication path $path_i$ of data block $d_i$ as the set of nodes passed on the path from the leaf node of $d_i$ to the root node. The set of hash values of the sibling nodes of all nodes on the authentication path is called the auxiliary authentication information, which is denoted as $\Omega_i$. As shown in Figure 2, the authentication path of $d_{11}$ is $Path_{11} = \{v_1, v_3, v_7\}$, and its auxiliary authentication information is $\Omega_{11} = \{\ldots, h(d_{10})\rangle\}$. The authentication path is clearly much shorter than that of MHT.
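The effect of a wider node on the verifier can be sketched with a simplified m-ary hash tree. This is an illustration under stated assumptions, not the paper's B+HT: each node hash covers only the concatenated child hashes (the leaf counts and pointers of the real node tuple are left out), SHA-1 is assumed, and the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def build(leaf_hashes, m):
    """Group hashes m at a time, level by level; returns levels, leaves first."""
    levels = [leaf_hashes]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(b"".join(cur[i:i + m])) for i in range(0, len(cur), m)])
    return levels

def prove(levels, index, m):
    """Per level: (position inside the group, hashes of the other group members)."""
    proof = []
    for level in levels[:-1]:
        start = (index // m) * m
        group = level[start:start + m]
        pos = index - start
        proof.append((pos, group[:pos] + group[pos + 1:]))
        index //= m
    return proof

def root_from_proof(leaf_hash, proof):
    """Verifier side: one hash call per level of the tree."""
    node = leaf_hash
    for pos, siblings in proof:
        node = h(b"".join(siblings[:pos] + [node] + siblings[pos:]))
    return node

leaves = [h(b"d%d" % i) for i in range(1, 17)]
wide, narrow = build(leaves, 4), build(leaves, 2)
p4, p2 = prove(wide, 10, 4), prove(narrow, 10, 2)
```

For 16 blocks, the order-4 tree needs 2 hash calls on the path while the binary tree needs 4. The order-4 proof carries 6 sibling hashes against 4 in the binary case, so the saving in this sketch is in hash calls and lookups per level rather than in the number of hashes sent.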

HHT: Huffman Hash Tree. The Huffman hash tree (HHT) is an extension of the combination of the Huffman tree and MHT, in which each leaf node is assigned a weight that reflects how frequently its data is verified. A middle node is represented as a tuple (LC, LP, H, RC, RP), where H is the hash of the concatenation of the data of its left and right children; LP and RP point to the left and right child nodes, respectively; LC and RC are the numbers of leaf nodes contained in the left and right children, respectively. In HHT, the leaf nodes are numbered sequentially from left to right. The path of a leaf can be quickly located according to the LC and RC values stored in the middle nodes. The root node additionally stores a signature of its hash value H. For a leaf l, this paper defines l's weighted path length as the product of l's path length and l's weight. Let AWPL be the average weighted path length of all leaf nodes in the tree. Suppose the weights of the n leaf nodes are $(w_1, w_2, \ldots, w_n)$ with $\sum_{i=1}^{n} w_i = 1$; then, the binary hash tree with the smallest AWPL is called the Huffman hash tree.
For example, in Figure 3, there are four files a, b, c, and d with verification probabilities of (0.6, 0.2, 0.1, 0.1), respectively.
Three binary hash trees are constructed with (I) AWPL = 2, (II) AWPL = 2.8, and (III) AWPL = 1.6. The AWPL of tree (III) is the smallest, and it can be verified that tree (III) is exactly a Huffman hash tree. The general process of generating HHT is as follows.
(1) According to the given weights $(w_1, w_2, \ldots, w_n)$ with $\sum_{i=1}^{n} w_i = 1$, a set $F = \{T_1, T_2, \ldots, T_n\}$ of n binary trees is formed. Each $T_i$ has only one node, and its left and right subtrees are empty.
(2) Select the two trees $T_{s1}$ and $T_{s2}$ with the smallest root node weights from F and build a new binary tree $T_{new}$ with $T_{s1}$ and $T_{s2}$ as its left and right subtrees. The weight of $T_{new}$'s root node is the sum of the weights of $T_{s1}$ and $T_{s2}$, and LC, LP, RC, and RP of $T_{new}$'s root node are assigned the corresponding values according to $T_{s1}$ and $T_{s2}$.
(3) Remove $T_{s1}$ and $T_{s2}$ from F and add $T_{new}$ to F.
(4) Repeat Steps 2 and 3 until there is only one tree in F. This tree is a Huffman hash tree.
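The four construction steps can be sketched with a priority queue. The sketch below is an illustration under assumptions: SHA-1 as the hash, dictionary-based nodes that carry the fields LC, LP, H, RC, and RP, a hypothetical `leaves` helper field, and AWPL computed as the sum of weight times depth over all leaves (which reproduces the values 2 and 1.6 quoted for trees (I) and (III)). File contents are placeholders.

```python
import hashlib
import heapq
import itertools

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def build_hht(files):
    """files: dict name -> (weight, content). Returns the root node as a dict."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = []
    for name, (weight, content) in files.items():
        leaf = {"name": name, "H": h(content), "leaves": 1}
        heapq.heappush(heap, (weight, next(tie), leaf))
    while len(heap) > 1:
        # Steps 2 and 3: merge the two trees with the smallest root weights.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        node = {"LC": t1["leaves"], "LP": t1, "H": h(t1["H"] + t2["H"]),
                "RC": t2["leaves"], "RP": t2,
                "leaves": t1["leaves"] + t2["leaves"]}
        heapq.heappush(heap, (w1 + w2, next(tie), node))
    return heap[0][2]

def depths(node, depth=0, out=None):
    """Map each leaf name to its path length from the root."""
    out = {} if out is None else out
    if "name" in node:
        out[node["name"]] = depth
    else:
        depths(node["LP"], depth + 1, out)
        depths(node["RP"], depth + 1, out)
    return out

def awpl(files, root):
    d = depths(root)
    return sum(files[name][0] * d[name] for name in files)

files = {"a": (0.6, b"file a"), "b": (0.2, b"file b"),
         "c": (0.1, b"file c"), "d": (0.1, b"file d")}
root = build_hht(files)
```

With the weights (0.6, 0.2, 0.1, 0.1) of the example, the hot file a ends up one step from the root, b two steps, and c and d three steps, giving an AWPL of 1.6 as for tree (III).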
(1) Initialization Phase. Step 1. The KMC runs the key generation algorithm. Let $G_1$ be a multiplicative cyclic group of large prime order p, and let g be a generator of $G_1$. Randomly select $MSK \in_R Z_p$ and calculate $PK = g^{MSK}$. Then, the public parameter is PK, and the system master private key is MSK. KMC generates the tenant's private key sk and the tenant's public key according to the tenant's user ID. The generated keys are sent to TVM.
Step 2. TVM establishes a hash tree HHT of integrity measurements. The generation of HHT is shown in Algorithm 1. The data stored in each leaf node of HHT is the hash value of one of the initial small files of the system. The internal nodes are constructed according to the properties of the hash tree and generated by hash operations. TVM signs the HHT root node with the tenant private key, and a timestamp [19] is added to the signature: $sig_{sk}(Nodehash_{rootnode}) = H(Nodehash_{rootnode} \| timestamp)^{sk}$.
Step 3. TVM sends the file set that needs to be verified, its HHT, and $sig_{sk}(Nodehash_{rootnode})$ to CSS. The data are saved and maintained by CSS.
(2) Challenge Verification Phase. Challenge verification is initiated by TVM or a third-party auditor (TPA) on behalf of the tenant. TVM generates a query challenge about the file $f_i$ and sends it to CSS. Then, CSS generates the auxiliary authentication information $\Omega_i$ according to HHT and sends a proof $pro = \{\Omega_i, sig_{sk}(Nodehash_{rootnode})\}$ to TVM. After receiving the proof, TVM calculates the hash value of the file and uses $\Omega_i$ to calculate $Nodehash'_{rootnode}$. The verification process can be expressed as $e(H(Nodehash'_{rootnode} \| timestamp), g^{sk}) \stackrel{?}{=} e(sig_{sk}(Nodehash_{rootnode}), g)$. If the verification is successful, the integrity of the file is not damaged.
For the file insertion operation, since a newly added file $f_{new}$ has not been verified and has the lowest weight, a leaf node $f_{low}$ with the lowest weight is selected, and $f_{low}$ is expanded into a subtree $f_{subtree}$ with only three nodes. The left child of $f_{subtree}$ is $f_{new}$, and the right child is $f_{low}$. Accordingly, all nodes on the path from the root node of $f_{subtree}$ to the root node of the tree need to be updated, so the complexity of the insertion update operation is proportional to the height of the tree.
For the file deletion operation, if nodes with high weight are deleted, the nodes with low weight need to move up as a whole, which requires a lot of calculation. Meanwhile, CSS needs to exchange a lot of information with TVM. Considering this overhead, this paper keeps the leaf nodes corresponding to the deleted files in place and only clears the hash values stored in them. Accordingly, all nodes on the path need to be updated, so the complexity of the deletion update is proportional to the height of the tree. When files are added later, these dummy node locations are used first.
(4) Verification Frequency Change. Tenants' personalized verification requirements may change dynamically and often show locality; i.e., some files are frequently verified in a certain period. The static HHT generated at one time may not provide good verification services under dynamic changes. Also, regenerating the whole HHT might […] $W_{hht}$ and requests the tenant to re-sign the relevant path. This operation occurs when TVM verifies f. Therefore, the evidence information related to f can be reused and verified at one time. In the global update mode, the Euclidean distance between the vectors $W_{cur}$ and $W_{hht}$ is calculated regularly. When the distance is greater than the threshold, HHT is regenerated and re-signed. Considering the high overhead of this method, the update interval is long, and the threshold is high. Local update follows the locality principle so that the hot files can always be verified quickly with a shorter authentication path. Moreover, since the update occurs when the tenant verifies the hot file, the reuse of evidence saves communication bandwidth and computational overhead. The global update ensures that HHT does not fall into a local optimum and that the updated HHT stays adapted to the current verification frequencies.
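The global update trigger can be sketched as a distance test between two weight vectors. The representation of the vectors as dictionaries keyed by file name, the function names, and the threshold value 0.5 are assumptions made for illustration; the paper does not state a concrete threshold.

```python
import math

def weight_distance(w_cur, w_hht):
    """Euclidean distance between the current weights and the weights the HHT was built with."""
    return math.sqrt(sum((w_cur[f] - w_hht[f]) ** 2 for f in w_hht))

def needs_global_update(w_cur, w_hht, threshold):
    """True when the tree should be regenerated and re-signed."""
    return weight_distance(w_cur, w_hht) > threshold

# Weights used when the tree was generated.
w_hht = {"a": 0.6, "b": 0.2, "c": 0.1, "d": 0.1}

# Small drift: the hot file is still a.
w_small = {"a": 0.55, "b": 0.25, "c": 0.1, "d": 0.1}

# Large drift: file d has become the hot file.
w_large = {"a": 0.1, "b": 0.2, "c": 0.1, "d": 0.6}

THRESHOLD = 0.5  # hypothetical value
small_drift = needs_global_update(w_small, w_hht, THRESHOLD)
large_drift = needs_global_update(w_large, w_hht, THRESHOLD)
```

In this example the small drift gives a distance of about 0.07 and leaves the tree untouched, while the large drift gives about 0.71 and would trigger regeneration.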

B+HT Scheme for Big Files
(1) Initialization Phase. First, TVM divides the file into fixed-size blocks to obtain the data block set $D = (d_1, d_2, d_3, \ldots, d_n)$ and hashes all blocks to obtain the hash set $\mu = \{H(d_i)\}_{1 \le i \le n}$. Then, TVM randomly generates $\varphi \leftarrow Z_p$ and uses the tenant private key sk to sign each data block $d_i$, which yields the signature set $\Phi = \{\sigma_i\}_{1 \le i \le n}$ through $\sigma_i = (H(d_i) \cdot \varphi^{d_i})^{sk}$. Then, TVM uses the elements of $\mu$ as leaf nodes to construct the B+HT structure C and signs C with sk, i.e., $sig_{sk}(Nodehash_{rootnode}) = H(Nodehash_{rootnode} \| timestamp)^{sk}$, where a timestamp is added to the signature. Finally, TVM sends $\{D, \Phi, C, sig_{sk}(Nodehash_{rootnode})\}$ to CSS for saving.
(2) Challenge Verification Phase. Challenge verification is initiated by TVM. TVM randomly selects the location set I of data blocks to be verified. Then, it randomly generates $\varepsilon_i \leftarrow Z_p$ for each $i \in I$ and sends a challenge message $\{(i, \varepsilon_i)\}_{i \in I}$ to CSS. CSS calculates $\mu = \sum_{i \in I} \varepsilon_i d_i \in Z_p$ and $\omega = \prod_{i \in I} \sigma_i^{\varepsilon_i}$. Next, it sends $\mu$, $\omega$, the hash $H(d_i)$ of each target data block, and the auxiliary authentication information $\Omega_i$ of each target data block to TVM as the proof pro.
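The way the per-block signatures aggregate into the pair (μ, ω) can be illustrated with a toy sketch. It is deliberately not the BLS construction of the paper: it works in the integers modulo a fixed prime instead of a pairing group, treats φ as a group element, and lets the verifier use sk directly where the real scheme uses a pairing check with the public key. The modulus, block contents, and all function names are assumptions.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (assumption); the paper uses a pairing group

def H(block: bytes) -> int:
    return int.from_bytes(hashlib.sha1(block).digest(), "big") % P

def as_int(block: bytes) -> int:
    return int.from_bytes(block, "big")

def tag(block, phi, sk):
    """sigma_i = (H(d_i) * phi^{d_i})^{sk} mod P."""
    return pow(H(block) * pow(phi, as_int(block), P) % P, sk, P)

def respond(blocks, tags, challenge):
    """CSS side: mu = sum eps_i * d_i, omega = prod sigma_i^{eps_i}."""
    mu = sum(eps * as_int(blocks[i]) for i, eps in challenge) % (P - 1)
    omega = 1
    for i, eps in challenge:
        omega = omega * pow(tags[i], eps, P) % P
    return mu, omega

def check(block_hashes, challenge, mu, omega, phi, sk):
    """Toy check that uses sk in place of the pairing equation."""
    acc = pow(phi, mu, P)
    for i, eps in challenge:
        acc = acc * pow(block_hashes[i], eps, P) % P
    return pow(acc, sk, P) == omega

sk = secrets.randbelow(P - 2) + 1
phi = secrets.randbelow(P - 3) + 2
blocks = [b"block-%d" % i for i in range(8)]
tags = [tag(b, phi, sk) for b in blocks]

challenge = [(i, secrets.randbelow(P - 2) + 1) for i in (1, 4, 6)]
mu, omega = respond(blocks, tags, challenge)
hashes = {i: H(blocks[i]) for i, _ in challenge}
ok = check(hashes, challenge, mu, omega, phi, sk)
```

The verifier never sees the challenged blocks themselves, only μ, ω, and the block hashes. Note that the exponent μ is reduced modulo P − 1 here because of the toy group; in the scheme above it lives in Z_p.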
After receiving the proof, TVM uses $\{\Omega_i, H(d_i)\}_{i \in I}$ to calculate $Nodehash'_{rootnode}$ and performs the verification $e(H(Nodehash'_{rootnode} \| timestamp), g^{sk}) \stackrel{?}{=} e(sig_{sk}(Nodehash_{rootnode}), g)$.
If this verification passes, TVM continues with the second check, on the aggregated values $\mu$ and $\omega$. If that verification is also successful, the integrity of the file is not damaged. The whole process of challenge verification is shown in Figure 5.
Correctness: Through formula (1), TVM can verify the integrity of a data block $d_i$ of a file F stored in CSS without downloading $d_i$. However, verifying the integrity of multiple data blocks in F does not guarantee that F is not destroyed, even if TVM has verified all data blocks. This is because formula (1) does not bind the location of the data block $d_i$ in F, so an untrusted CSS can use another data block $d_j$ from a different location. If the data of $d_j$ is correct, this substitution passes the verification of (1). Formula (2) constrains the position of each data block with the hash tree. Although the hash tree alone can verify the content, that verification can only be conducted when the verified data block $d_i$ is loaded into TVM. Therefore, only the combination of formulas (1) and (2) realizes PDP verification of both content and location.
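The check on the aggregated values can be derived from the definitions given in the initialization and challenge phases. The derivation below is a sketch from those definitions, not necessarily the exact form of the authors' formula; it assumes that $\varphi$ and $H(\cdot)$ are treated as elements of $G_1$ and that $g^{sk}$ is the tenant's public key.

```latex
\begin{aligned}
\omega &= \prod_{i \in I} \sigma_i^{\varepsilon_i}
        = \prod_{i \in I} \left( H(d_i) \cdot \varphi^{d_i} \right)^{sk \cdot \varepsilon_i} \\
       &= \left( \prod_{i \in I} H(d_i)^{\varepsilon_i} \cdot \varphi^{\sum_{i \in I} \varepsilon_i d_i} \right)^{sk}
        = \left( \prod_{i \in I} H(d_i)^{\varepsilon_i} \cdot \varphi^{\mu} \right)^{sk}, \\
e(\omega, g) &= e\!\left( \prod_{i \in I} H(d_i)^{\varepsilon_i} \cdot \varphi^{\mu},\; g^{sk} \right)
\quad \text{by bilinearity, } e(a^{x}, g) = e(a, g^{x}).
\end{aligned}
```

The right-hand side uses only values the verifier holds ($H(d_i)$, $\varepsilon_i$, $\mu$, $\varphi$, and the public key), which is why the blocks $d_i$ do not have to be downloaded.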
(3) Dynamic Data Update Phase. To make the B+HT scheme more suitable for the cloud environment, the integrity verification scheme must support the dynamic update of data. The block update operations include modification, insertion, and deletion.
If the verification is correct, the auxiliary information $\Omega_i$ and $h(d')$ are used to calculate $sig_{sk}(Nodehash'_{rootnode})$, and CSS updates $sig_{sk}(Nodehash'_{rootnode})$.

(b) Data insertion
The data insertion operation inserts a new data block. Besides changing the physical structure of the file, this operation may cause a node to violate the structural properties of the B+ hash tree, so that the node has to be split and the authentication structure C changes. We use the B+HT tree in Figure 2 to illustrate the insertion operation. Suppose TVM wants to insert a block $d_{new}$ after a block $d_i$. First, TVM generates the signature $\sigma'$ corresponding to $d_{new}$ and then sends the insertion request message to CSS. After CSS receives the message, it first finds the node $node_{involve-i}$ that contains $hash(d_i)$. Assume that the order of the B+HT tree is k. There are two situations for the current node, as shown in Figure 7.

Security and Communication Networks
(1) The number of subtrees of $node_{involve-i}$ is less than k. In this case, $H(d_{new})$ is added to $node_{involve-i}$, and the involved nodes on the authentication path are updated. Figure 7(a) shows the case of inserting a new node m after the block $d_1$ in Figure 2. Then, CSS directly generates a proof $pro(\Omega_{new}, H(d_{new}))$.
(2) The number of subtrees of $node_{involve-i}$ is equal to k. In this case, $node_{involve-i}$ needs to be split, and if its parent node $node_{parent-involve-i}$ also has k subtrees, the parent needs to be split as well, and so on up the tree. Specifically, CSS splits $node_{involve-i}$ into $node_{involve-i-1}$ and $node_{involve-i-2}$. $node_{involve-i-1}$ contains the information of all subnodes up to and including $hash(d_i)$ in $node_{involve-i}$, and $node_{involve-i-2}$ contains the information of all subnodes after $hash(d_i)$ in $node_{involve-i}$. The information of $d_{new}$ is appended to the end of $node_{involve-i-1}$.
Then, CSS updates the information of $node_{parent-involve-i}$. The item about the subnode $node_{involve-i}$ in $node_{parent-involve-i}$ is divided into two items. The hash values in the two items are $hash(node_{involve-i-1})$ and $hash(node_{involve-i-2})$, respectively, and their numbers of leaf nodes and child pointers are assigned according to $node_{involve-i-1}$ and $node_{involve-i-2}$, respectively. Figure 7(b) shows the case of inserting a new node n after block $d_9$ in Figure 2. CSS generates the proof $pro(\Omega_{new}, H(d_{new}))$ according to the new structure of B+HT. After receiving pro, TVM continues to calculate the values of the nodes on the authentication path until $Nodehash'_{rootnode}$ is obtained. If the verification passes, TVM updates the root signature and sends it to CSS.
(c) Data deletion
Data deletion removes a leaf node, and it is the opposite of data insertion. Assume the order of the B+HT tree is k. If a node $d_{del}$ is deleted, the simple case is that the number of subtrees of the node $d_{hash-del}$ containing $hash(d_{del})$ is greater than $\lceil k/2 \rceil - 1$. In this case, the item of $d_{del}$ in $d_{hash-del}$ is simply deleted. Taking the B+HT tree in Figure 2 as an example, Figure 8(a) shows the processing of deleting node $d_5$.
If the number of subtrees of $d_{hash-del}$ is less than $\lceil k/2 \rceil - 1$, the remaining items of $d_{hash-del}$ must form new nodes together with its sibling nodes to satisfy the properties of the B+HT tree. The resulting changes to the parent nodes are processed in the same way. For example, Figure 8(b) shows the process of deleting node $d_1$ and merging its parent node with its neighboring sibling node.
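The two insertion cases for a single node can be sketched as follows. The items are block labels standing in for the stored hashes, and the node contents are hypothetical. The text does not say what happens when the target is the last item of a full node (the second node would be empty), so the midpoint fallback below is an assumption of this sketch.

```python
def insert_after(items, target, new_item, k):
    """Insert new_item after target in a node that holds at most k items.

    Returns one node when there is room, or two nodes after a split. On a
    split, the first node keeps everything up to and including target and
    receives new_item at its end; the second node keeps the remaining items.
    """
    pos = items.index(target)
    merged = items[:pos + 1] + [new_item] + items[pos + 1:]
    if k > len(items):
        return [merged]  # case (1): the node is not full
    cut = pos + 2  # case (2): split right after new_item
    if cut >= len(merged):
        # Target was the last item: fall back to a midpoint split (assumption).
        cut = len(merged) // 2
    return [merged[:cut], merged[cut:]]

K = 4  # hypothetical order

# Case (1): a node with two items has room for a third.
no_split = insert_after(["d1", "d2"], "d1", "m", K)

# Case (2): a full node is split around the insertion point.
split = insert_after(["d8", "d9", "d10", "d11"], "d9", "n", K)
```

After a split, the parent replaces its single item for the old node with two items, one per new node, and the hashes on the path to the root are recomputed; that propagation is not part of this sketch.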

Security Analysis.
The security of HB+-MHT is analyzed from the aspects of integrity verification and data update, including whether it can prevent an untrusted CSS from cheating tenants with incorrect data and whether it can prevent untrusted third parties from maliciously updating authentication data in CSS.

Theorem 1.
If the hash algorithm is collision-resistant and the signature algorithm is unforgeable, no adversary against the B + HT scheme could make the verifier accept in a data verification protocol instance with a nonnegligible probability, unless it responds by correctly calculating the value. Proof.
e BLS signature scheme is adopted, whose unforgeability is proved in [14]. e security of HHT and B + HT trees is mainly analyzed. It will be proven that if there is an adversary A who can break the scheme with nonnegligible probability advantage, then A has an algorithm B to find a pair of hash collisions with a nonnegligible probability advantage.
Given a block d i to be verified with its correct authentication path path i , and correct auxiliary authentication information Ω i , TVM's verification process can be denoted as a Boolean function verify (i, H(d i ), Ω i ) with parameters (i, H (d i ), Ω i ), and its calculation of path nodes starts from the leaf node. e analysis of the j th node on the path path i yields where the j-1 th node on path i is the r th child of the j node, K is the order of B + HT, and Ω , it is contradictory to the collision-resistant characteristics of the hash function. It can be concluded from the above discussion that the probability the adversary A can destroy the data verification protocol of the B + HT is negligible. erefore, the data verification protocol of the B + HT is secure. □ Theorem 2. If the hash algorithm is collision-resistant and the signature algorithm is unforgeable, no adversary against the B + HT scheme could make CSS accept in a data update protocol instance with a nonnegligible probability, unless it responds by correctly calculating the value.
Proof. When an adversary A submits an update request of inserting a block d malice ′ after the i th block to CSS, A will receive a message (i, H(d i ), Ω i ) from CSS. According to d malice ′ , Ω i and the current time timestamp now, A needs to recalculate the root value hash root of the tree and sign the hash value of the concatenation of hash root and timestamp now . Because the signature cannot be forged and A has no private key, A can only reuse a previously signed message signature old . Assume signature old � hash(hash root-old ||timestamp old ). e current timestamp timestamp new is greater than timestamp old and Ω i cannot be changed. If A can construct hash root-new by adjusting d malice ′ that satisfies hash (hash root-new ||timestamp new ) � hash (hash root-old ||timestamp old ), then A wins the security game. However, due to hash root-new ≠ hash root-old and timestamp new ≠ timestamp old , it is contradictory to the collision-resistant characteristics of the hash function. It can be concluded from the above discussion that the probability the adversary A can destroy the data insertion protocol of the B + HT is negligible. e analysis of the deletion and modification operations is similar. erefore, the data update protocol of the B + HT is secure.
From the perspective of security, HHT is a variant of MHT, and the security of the authentication structure of the binary hash tree has been analyzed in [14]. Therefore, the analysis of HHT is omitted here.
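The path recomputation behind verify(i, H(d_i), Ω_i) in the analysis of the data verification protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: build_levels, prove, and verify are hypothetical helpers, each internal node is taken to be the SHA-1 hash of the concatenation of up to K child hashes, and the child position r is derived from the block index i.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def build_levels(leaf_hashes, k):
    # Level 0 holds the leaf hashes; each higher level hashes groups of up to k children.
    levels = [list(leaf_hashes)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(b"".join(cur[j:j + k])) for j in range(0, len(cur), k)])
    return levels

def prove(levels, index, k):
    # Auxiliary information: the sibling hashes of the path node at every level.
    proof = []
    for level in levels[:-1]:
        start = (index // k) * k
        group = level[start:start + k]
        r = index - start
        proof.append(group[:r] + group[r + 1:])
        index //= k
    return proof

def verify(index, leaf_hash, proof, root, k):
    # Recompute the path from the leaf upward; the (j-1)-th path node is
    # re-inserted as the r-th child among the siblings before hashing.
    node = leaf_hash
    for siblings in proof:
        r = index % k
        node = h(b"".join(siblings[:r] + [node] + siblings[r:]))
        index //= k
    return node == root

K = 4
blocks = [b"block-%d" % n for n in range(10)]
leaves = [h(b) for b in blocks]
levels = build_levels(leaves, K)
root = levels[-1][0]
ok = verify(3, leaves[3], prove(levels, 3, K), root, K)
```

Because the position r is fixed by the index, a correct hash presented at the wrong index also fails, which is what location authentication requires.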

Performance Evaluation.
Here, representative schemes are compared to better evaluate the performance of HB-MHT. More specifically, HHT is compared with MHT, and B+HT is compared with Wang's scheme [14] in terms of space and time complexity. The experiment is performed in a simulation environment. TVM and CSS run on two separate hosts, each equipped with an Intel(R) Core(TM) i7-8700K @ 3.70 GHz CPU and 16.0 GB RAM and running the Windows 10 operating system. The two hosts are directly connected through a gigabit network cable. B+HT is implemented in C++, and the development tool is Visual Studio 2019. The hash operation uses SHA-1, and the signature operation uses BLS. In addition, a pairing-based cryptography library [20] is adopted to implement these cryptographic algorithms.

Space Complexity.
In PDP integrity verification schemes, TPA or TVM stores only a small amount of information, such as the signature and timestamp, which takes up little storage space. The storage overhead of CSS is analyzed in Table 1, where l_G1 is the signature length, l_H is the hash length, M is the number of files, item_hht is the size of a node in HHT, item_mht is the size of a node in MHT, N is the number of file blocks, and K is the order of the B+HT tree.
As shown in Table 1, the storage overhead of CSS is mainly related to the storage of tree nodes. For small file verification, HHT occupies slightly more storage space than MHT because each HHT node also records the number of leaf nodes in its subtree, which takes 32 bits. However, compared with the 160-bit SHA-1 hash value stored in a tree node, the increase in storage space is small. For large file verification, the storage overhead of B+HT and Wang's scheme [14] for storing block signatures is the same. However, B+HT occupies significantly less storage space than Wang's scheme for storing tree nodes. As K increases, the storage saved by B+HT relative to Wang's scheme [14] grows logarithmically. Of course, K cannot increase indefinitely, and a node should not exceed the size of a disk block.
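As a rough illustration of why a larger fan-out leaves fewer tree nodes to store, the sketch below counts the nodes of a hash tree built bottom-up over N leaves. It does not reproduce the expressions of Table 1: count_nodes is a hypothetical helper, the tree is assumed to group K children per parent at every level, and N = 1024 is an illustrative value.

```python
def count_nodes(n_leaves: int, k: int) -> int:
    # Total number of nodes (leaves plus internal nodes) when every
    # parent covers up to k children.
    total, width = n_leaves, n_leaves
    while width > 1:
        width = -(-width // k)  # ceiling division
        total += width
    return total

N = 1024  # illustrative number of blocks (assumption)
nodes_by_order = {k: count_nodes(N, k) for k in (2, 4, 16)}

# Relative size of the 32-bit leaf counter against a 160-bit SHA-1 hash.
hht_counter_ratio = 32 / 160
```

For N = 1024 this gives 2047 nodes for a binary tree, 1365 for K = 4, and 1093 for K = 16; the per-node layout of the actual schemes may differ.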

Time Complexity.
(1) HHT Scheme. Because HHT is derived from MHT, the time complexity of the two is evaluated and compared. Integrity verification services are used more frequently than data update services, so the time from the initiation of a user request to the end of verification is analyzed here. The main overhead includes CSS's evidence query time T_query, the communication time T_com, and TVM's verification time T_ver. The time cost of each phase of the two schemes is shown in Table 2. It can be seen that whether HHT is superior to MHT mainly depends on whether HHT can provide an AWPL lower than log_2 M, where M is the number of files. Therefore, the value of AWPL is tested. During the initialization stage, given M files with randomly allocated weights, B+HT runs Algorithm 1 to build an HHT tree. In the verification stage, in order to simulate local selection, we set up a set of 50 files, and only the files in this set can be selected. After each verification of a file from the set, we randomly replace a small number of files in the set. The files to be verified are selected by roulette-wheel selection according to their verification weights. Meanwhile, a variation factor is introduced to slightly adjust the weights during verification. Depending on whether the VFC mechanism is enabled, HHT is divided into HHT-1 (disabled) and HHT-2 (enabled). Let C be the number of tests and let L be the average authentication path length. Then, the value of L/log_2 M is recorded in Table 3 to intuitively reflect the performance of HHT.
The test results show that the HHT scheme is generally better than the MHT scheme. The HHT scheme performs better when M is small or C is large, which is mainly attributed to sampling variance. HHT optimizes the authentication paths of the tree according to the weights, and as the number of samples increases, the sample statistics better reflect the true behavior of HHT. In addition, the fact that the HHT-2 scheme outperforms the HHT-1 scheme shows that the VFC mechanism adapts better to changes in verification probability.
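The comparison of the weighted path length against log_2 M can be sketched with a standard Huffman construction. This is only an illustration under assumptions: huffman_depths, awpl, and roulette_pick are hypothetical helpers, the construction is the textbook Huffman algorithm rather than the paper's Algorithm 1, AWPL is taken to be the weight-normalized average leaf depth, and the weights are made up.

```python
import heapq
import itertools
import math
import random

def huffman_depths(weights):
    # Build a Huffman tree over the file weights and return each file's leaf depth.
    counter = itertools.count()  # tie-breaker so the leaf lists are never compared
    heap = [(w, next(counter), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    depths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for i in a + b:
            depths[i] += 1  # every leaf under the merged node moves one level down
        heapq.heappush(heap, (w1 + w2, next(counter), a + b))
    return depths

def awpl(weights, depths):
    # Weight-normalized average path length (assumed definition).
    return sum(w * d for w, d in zip(weights, depths)) / sum(weights)

def roulette_pick(weights, rng):
    # Roulette-wheel selection: probability proportional to weight.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

weights = [8, 4, 2, 1, 1]  # hypothetical verification weights
depths = huffman_depths(weights)
ratio = awpl(weights, depths) / math.log2(len(weights))
picked = roulette_pick(weights, random.Random(0))
```

With these skewed weights the frequently verified file sits at depth 1 and the ratio falls below 1, while uniform weights over a power-of-two number of files give a ratio of exactly 1, the balanced-tree baseline.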
(2) B+HT Scheme. B+HT and Wang's scheme [14] adopt the same signature scheme for the content verification of data blocks. The difference between them is the hash tree structure used for the location authentication of data blocks. Thus, this paper mainly compares the performance of their hash trees. TVM's verification time T_ver is mainly related to the number of hash calculations, which depends on the height of the tree. The communication time T_com is mainly related to the product of the node size and the height of the tree. CSS's evidence query time T_query mainly depends on the number of node queries, i.e., the height of the tree. The complexity of location verification for a single data block is shown in Table 4. It can be seen that the verification performance of TVM and CSS in our scheme is log_2 K times higher than that of Wang's scheme [14]. When the order K of B+HT equals 4, T_ver and T_query are halved. Of course, this is accompanied by an increase in T_com. Assuming that the number of blocks M = 1204, K = 4, and the hash length is 20 bytes, B+HT increases the traffic by only 0.6 k over Wang's scheme [14] for each verified data block. In fact, with the development of communication technology, the increased traffic is negligible. This scheme is suitable for the case where the computing power of both TVM and CSS is weak or their burden is heavy, and the communication bandwidth is good. This case is common in the cloud environment.
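The trade-off between tree height and evidence size can be made concrete with a small calculation. The sketch below is an assumption-laden illustration, not a reproduction of Table 4: tree_height and location_cost are hypothetical helpers, the evidence is taken to consist only of the K-1 sibling hashes per level, and 1024 blocks is an illustrative count.

```python
def tree_height(n_blocks: int, k: int) -> int:
    # Number of levels above the leaves when each parent covers up to k children.
    height, width = 0, n_blocks
    while width > 1:
        width = -(-width // k)  # ceiling division
        height += 1
    return height

def location_cost(n_blocks: int, k: int, hash_len: int = 20) -> dict:
    height = tree_height(n_blocks, k)
    return {
        "hashes": height,          # hash computations by the verifier
        "nodes_queried": height,   # node lookups by the server
        "evidence_bytes": height * (k - 1) * hash_len,  # sibling hashes only
    }

binary = location_cost(1024, 2)
order4 = location_cost(1024, 4)
```

Under these assumptions, moving from a binary tree to K = 4 halves the hash computations and node lookups (10 to 5) while the sibling-hash evidence grows from 200 to 300 bytes, a factor of 1.5. Any additional per-node fields in the real schemes would change the byte counts.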
From the perspective of tenant experience, the total verification response time is further evaluated, because the tenant may be concerned about how long verification takes to complete after a query request is sent. The total time T_sum is equal to the sum of T_query, T_com, and T_ver. The experimental parameters are as follows: K = 4, BLS signature, a block size of 20 bytes, and 1 Gbit/s bandwidth. Figure 9 shows the time overhead of the two schemes with the same detection probability of 99% under different file sizes. It can be seen that T_sum of our scheme is lower than that of Wang's scheme. When K = 4, the overhead of T_com increases by 1.5x, while the overhead of T_query and T_ver decreases by 2x. The test results show that the time saved in T_query and T_ver exceeds the time added to T_com. In this setting, T_query and T_ver are the performance bottleneck. Moreover, the larger the file, the greater the height of the tree and the more complex the system management, resulting in slower disk I/O lookups and a larger T_query.
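The condition under which the total time improves can be written out explicitly. The derivation below treats the stated factors for K = 4 (1.5x for T_com, 2x for T_query and T_ver) as exact, which is an assumption; the primed symbol for the B+HT total is introduced here only for illustration.

```latex
\begin{align*}
T_{\mathrm{sum}}  &= T_{\mathrm{query}} + T_{\mathrm{com}} + T_{\mathrm{ver}}, \\
T'_{\mathrm{sum}} &= \tfrac{1}{2}\,T_{\mathrm{query}} + \tfrac{3}{2}\,T_{\mathrm{com}} + \tfrac{1}{2}\,T_{\mathrm{ver}}, \\
T'_{\mathrm{sum}} < T_{\mathrm{sum}}
  &\iff \tfrac{1}{2}\,T_{\mathrm{com}} < \tfrac{1}{2}\left(T_{\mathrm{query}} + T_{\mathrm{ver}}\right)
   \iff T_{\mathrm{com}} < T_{\mathrm{query}} + T_{\mathrm{ver}}.
\end{align*}
```

In other words, under this assumption the K = 4 tree wins whenever communication is not the dominant cost, which matches the observation that T_query and T_ver are the bottleneck.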

Conclusions
This paper proposes a lightweight and efficient integrity verification method for cloud data file systems, which uses different integrity verification schemes for different file types to improve verification efficiency. The Huffman Merkle tree verification scheme is used for small files; it shortens the authentication path of small files according to the file verification frequency or a user-defined file verification weight. Meanwhile, the B+ hash tree integrity verification scheme is used to verify large files, which effectively reduces the query response time of nodes. In addition, a bilinear aggregate signature is used to verify the existence of large files, which effectively reduces the requirements on user computing power and communication bandwidth. The experimental results indicate that our scheme can perform data integrity verification well, with less computation and communication overhead and higher verification efficiency than the existing methods.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.