Ldasip: A Lightweight Dynamic Audit Approach for Sensitive Information Protection in Cloud Storage

How to audit the integrity of data stored in a cloud that is only partially trusted is an important problem restricting the development of cloud storage. Although several data integrity audit schemes exist for cloud storage, the growing need to protect sensitive information and to support large-scale data storage and dynamic updates significantly increases audit cost, which seriously affects the efficiency of existing cloud audit systems. To solve this problem, we propose Ldasip, a lightweight dynamic auditing method that supports sensitive information protection in cloud storage. Building on identity-based data integrity auditing, Ldasip introduces a data masking technique to protect users' sensitive information. In addition, an improved multibranch tree structure is proposed to realize dynamic auditing and reduce communication overhead in the verification process. Theoretical analysis and comprehensive experiments demonstrate the effectiveness of Ldasip. The results show that Ldasip ensures the correctness of the audit, protects the sensitive information in users' stored content, and supports dynamic data updates with less audit time and communication overhead.


Introduction
With the rapid development of information and network technology, user data are growing explosively. The emergence of cloud services addresses the limitations of computing on and storing large amounts of data locally. However, once data are outsourced to the cloud, users lose absolute control over them, and the cloud data may be tampered with or destroyed, intentionally or unintentionally, by attackers or cloud service providers. For example, in 2020, an intruder downloaded about 400 GB of data from a UN cloud server in Europe, compromising the personal information of more than 4,000 UN staff members [1]. In 2021, more than 200 million pieces of user data stored on Sina servers, including the personal data of 7.3 million Chinese citizens, were stolen and made public by hackers [2]. In 2022, an intruder downloaded data from a cloud server of a Croatian telephone operator, compromising the personal information of more than 200,000 people [3]. These events greatly reduce users' trust in cloud services and restrict the development and adoption of cloud storage. Therefore, maintaining the security of cloud storage data is one of the key problems to be solved.
Security auditing is an important approach to ensuring data integrity. Existing data integrity audit schemes for cloud environments generally fall into two categories: private audit and third-party audit. Private audit can be performed only by the users themselves; it is more efficient, but it requires users to handle data signing, audit verification, and the computation and maintenance of a large amount of related information [4]. In third-party audit [5], data integrity verification is completed by a trusted third party, which sends the verification report to the user; this greatly reduces the user's computation cost. Consequently, third-party audit is attracting increasing attention from researchers worldwide.
Existing work has proposed solutions in three aspects: protection of sensitive information, reduction of audit cost, and dynamic data updates.
Existing methods for protecting sensitive information mainly address two types of threats. First, third-party auditors are usually honest but curious: when they verify data integrity on behalf of users, they may infer the original data from the audit proofs returned by the cloud service provider, leaking user data [6]. Second, the cloud service provider (CSP) is not trustworthy and may leak user data to malicious parties for its own benefit. Moreover, introducing third-party auditors adds communication overhead between the user and the auditor. Meanwhile, to facilitate validation, the auditor needs to maintain a large amount of validation metadata, state, secret key information, and so on, which itself incurs additional computation and storage overhead.
In addition, for third-party audit, most data integrity verification structures rely on a public key infrastructure (PKI), in which users need to generate and manage public key certificates and auditors need to validate them. To simplify certificate management, Wang et al. [7] proposed an identity-based integrity verification scheme based on bilinear pairings. In this scheme, a trusted key generator was introduced to generate the private key used to sign the user data, and the user's ID number, e-mail address, or name was directly used as the public key. This eliminated the cost of generating and managing user certificates and the cost for the third-party auditor of verifying them. However, most current identity-based audit schemes do not consider the scenario in which a user updates data in the cloud. Supporting dynamic data operations means allowing users to insert, delete, and modify data files while the integrity of the user data remains auditable. Guo et al. [8] put forward an identity-based dynamic integrity audit scheme. However, this scheme adopted the Merkle Hash Tree authentication structure: as the number of data blocks grows, authentication requires too much auxiliary information, so the cloud service provider takes a long time to locate data blocks during verification and updates, which brings extra computation and communication overhead to the cloud service provider.
To solve these problems, we propose Ldasip, a lightweight dynamic audit method that supports sensitive information protection in cloud storage. By adopting the identity-based audit model, this method avoids complex certificate management work such as the issuance, management, and revocation of public key certificates. At the same time, a data masking technique is introduced into the proof generation algorithm executed by the cloud service provider, which prevents the third party from deriving the original data from the audit proofs and thus protects sensitive information in user data. In addition, an improved multibranch tree structure is proposed. Selecting deputy root nodes to store integrity information shortens the authentication path, and the locality principle is adopted to reduce data block query time. Together, these improve the efficiency of third-party audit and reduce communication overhead in the verification process while realizing dynamic audit.
Compared with existing work, our contributions are summarized as follows.
(1) We propose a new proof generation algorithm for the cloud service provider that introduces data masking, which prevents third parties from inferring the user's original data from audit proofs after multiple challenges. We also propose a generation and verification mechanism for third-party legal authority, which ensures that only a third party legitimately authorized by the user can verify files on the user's behalf and reduces the security threat posed by the third-party auditor.
(2) We propose an improved multibranch authentication tree structure. Node utilization is improved by storing data block information on the nonleaf nodes of the authentication tree, and deputy root nodes are selected to describe the integrity of a node and all its descendants. In this way, the cloud service provider does not need to traverse the whole authentication tree when querying data blocks, which shortens the authentication path and improves the efficiency of user signature while supporting dynamic data updates.
(3) We conduct theoretical analysis and experimental evaluation of the proposed scheme. The results show that Ldasip supports dynamic audit and sensitive information protection and has lightweight challenge-response and integrity verification costs compared with existing classical schemes such as [9, 10].
The rest of this paper is organized as follows. We first present related work and describe the system model, then introduce Ldasip in detail and provide a theoretical security analysis. Finally, we present our experimental results, conclude the paper, and discuss possible future work.

Related Work
In cloud storage, data that users outsource to the cloud face both external attacks and internal threats, and data integrity audit is an important solution [11].
There are several typical approaches to data integrity audit. The following analysis and comparison cover three aspects: audit performance optimization, support for sensitive information protection, and support for dynamic data operations.
In the process of data integrity audit, the protection of sensitive user information is mainly threatened by untrustworthy cloud service providers and third-party auditors. Ateniese et al. [12] first put forward the concept of provable data possession (PDP), a verification scheme based on homomorphic verification tags and a random sampling strategy. However, the computation and communication overhead of this scheme was high, and it did not consider the protection of sensitive user information. Shah et al. [13] proposed encrypting users' data with symmetric encryption, computing hash values, and sending them to auditors. In this scheme, auditors needed to verify whether the server held the previously promised decryption key; moreover, the scheme was only applicable to encrypted files, and users needed to re-download their data from the cloud to local storage, which increased the computation cost of the audit process. Wang et al. [14] proposed a privacy-preserving remote data integrity audit scheme using random masking, but this scheme was not applicable to identity-based integrity audit scenarios and could not prove the authenticity of the audit proofs sent by cloud service providers. To resist attacks from quantum computers, Tan et al. [15] proposed a lattice-based audit scheme that constructs a random mask to cover the audit proof, so that user data are protected from curious third parties. Wang et al. [16] proposed a data integrity audit method based on Hash Message Authentication Code (HMAC) and indistinguishability obfuscation, which supported the protection of sensitive user information. However, this method required the user to manage certificates, which incurred large computation and storage overhead. Therefore, Han et al. [17] put forward a distributed data integrity audit scheme based on blockchain, which could resist various attacks in the integrity audit process by exploiting the decentralized, tamper-proof, and traceable characteristics of blockchain systems. However, this method did not support data updates.
Early data integrity verification schemes focused on ensuring the integrity of static data. When users performed dynamic operations such as adding, deleting, and modifying files, they needed to download the files locally, update them, and then upload them to the cloud again, which incurs a lot of computation and communication overhead. Erway et al. [18] constructed dynamic data audit schemes based on a rank-based authenticated skip list (DPDP-I) and an RSA tree-based authenticated dictionary (DPDP-II); however, when the cloud service provider updated the data, updating the underlying nodes of the skip list led to heavy computation overhead, and the structure became long when the number of data blocks was large, which increased the amount of auxiliary authentication information. Wang et al. [19] proposed a dynamic audit method based on classic Merkle Hash Tree block tag authentication and introduced bilinear aggregate signatures to support batch audit, but the scheme did not consider the protection of sensitive user data. Daniel et al. [20] proposed building a data structure based on file hash values to realize dynamic audit. Jian Shen et al. [21] put forward a new dynamic structure composed of a doubly linked information table and a position array to realize dynamic audit. Sookhak et al. [22] put forward a dynamic audit scheme based on a divide-and-conquer table, which divided the data structure into k parts to reduce its size and lowered the computation cost of users during updates. However, the scheme did not consider the protection of sensitive user information. T. Shang et al. [23] proposed a Merkle hash tree data structure for block tag authentication that allows users to insert data after each data block, which could effectively improve the efficiency of dynamic integrity audit. Yuan et al. [24] proposed a modified index hash table (MIHT) structure, which can effectively realize data dynamics.
Existing performance optimization efforts focus on reducing the computation, storage, and communication overhead of users, third-party auditors, and cloud service providers in the data integrity audit process. Ateniese et al. [25] proposed a Proof of Data Possession protocol (E-PDP) that reduces the auditor's computation and communication cost by randomly sampling data blocks. However, this scheme did not consider the heavy computation cost on the cloud server. Sookhak et al. [22] proposed a data integrity audit method based on file compression and an improved algebraic signature, which could reduce the computation and communication overhead between the user and the cloud service provider. Third-party data integrity audit structures mostly rely on public key infrastructure: users need to generate and manage public key certificates, and third-party auditors need to maintain a large amount of verification metadata, state, and key information, and to verify certificates. Therefore, Wang et al. [7] proposed an identity-based integrity audit scheme based on bilinear pairings to simplify key and certificate management. A trusted key generator was introduced into the original three-party interaction and was responsible for generating the private key the user uses to sign data. The user's ID number, e-mail address, name, or other information was treated as the public key, so the user no longer needed to generate and manage certificates and third-party auditors no longer needed to verify them, which reduced the computation overhead of both. However, this method was only suitable for small-scale users. For this reason, Zhang et al. [26] proposed an identity-based audit scheme that introduces a hierarchical private key generator suitable for large-scale user groups, which improved audit efficiency for large numbers of users. Shen et al. [27] proposed a data integrity audit scheme without private key storage, in which biometric data (such as iris scans and fingerprints) serve as the user's fuzzy private key, avoiding hardware tokens and improving the security and efficiency of audit. However, these schemes supported neither the protection of sensitive user information nor dynamic updates.

System Model
In an identity-based cloud data integrity audit system, the public and private keys are generated from the user's identity, and a third party is responsible for verifying the integrity and availability of the data stored in the cloud [7]. As shown in Figure 1, there are four entities in the identity-based data integrity audit system: user, third-party auditor (TPA), cloud service provider (CSP), and private key generator (PKG). Users are entities that outsource their data to cloud servers. Cloud service providers are entities that have powerful storage resources and provide storage services for users. A third-party auditor is a trusted entity with professional data audit capability that performs data integrity verification on behalf of users. The PKG is a trusted entity that generates the system parameters and the users' private keys.
As shown in Figure 1, the identity-based data integrity audit process is as follows:

Security and Communication Networks
(1) The user sends his identity information (ID) to the PKG. The PKG calculates the user's private key from the ID and sends it to the user.
(2) The user uses his private key to generate tags for verifying the integrity of the data blocks and uploads all data blocks and their corresponding tags to the cloud.
(3) To verify the integrity of the data stored in the cloud, the user authorizes a TPA to send an audit challenge to the cloud.
(4) When the CSP receives an audit challenge from the TPA, it generates an audit proof from the challenge and the user data blocks stored in the cloud and sends the proof to the TPA.
(5) After receiving the audit proof, the TPA checks its correctness to judge whether the user's data stored in the cloud are intact and sends the verification result to the user.
Identity-based data integrity audit still faces the following security risks and performance problems.
During the audit process, data privacy is mainly threatened in two ways. As shown in Figure 1, first, the third-party auditor is not completely trustworthy and, out of curiosity, may deduce the user's original data when verifying the audit proof in step ①, thus leaking user data. Second, cloud service providers lack credibility, and data stored by users may be damaged or lost due to external attacks or adverse events in the cloud in step ②. The audit process also faces performance problems in the dynamic update and integrity verification stages. When users update cloud data, they must download the whole file locally to update it, which incurs computation and communication overhead in step ③. In addition, although data integrity audit lets users verify the integrity of cloud data at any time, a time-consuming audit degrades the user experience. For example, the file signature generation process in step ④ and the integrity verification process in step ⑤ take too much time, which seriously affects audit performance.
Under the framework above, this paper proposes a lightweight dynamic integrity audit approach that supports sensitive data protection. The approach has the following objectives:
(1) Ensure the correctness of the private key. The user must be able to verify that the private key sent by the PKG is correct.
(2) Ensure the correctness of TPA audit authorization. Only a TPA authorized by the user can obtain a response from the CSP.
(3) Ensure the correctness of the verification process [28]. A valid proof produced by the proof generation algorithm passes the verification algorithm with overwhelming probability. In other words, if both the TPA and the CSP behave honestly and the data files are stored correctly, the audit proof the CSP generates from the challenge information must pass verification.
(4) Support sensitive data protection. User identity and data content are not disclosed to the TPA during the audit process.
(5) Ensure storage integrity [29]. A CSP that does not hold the user data cannot produce a valid audit proof.

Design of Ldasip
In this section, we present the design of the proposed Ldasip approach, including its working principle and its core functions in detail.

Working Principle.
In Ldasip, different function modules are deployed on the users, the PKG, the TPA, and the CSP. The architecture of Ldasip is shown in Figure 2. The PKG is responsible for system initialization and key generation. Users perform third-party legal authority generation, signature generation, construction of the improved multibranch tree authentication structure, update request generation, and update proof verification. The TPA runs the challenge and verification modules. The CSP runs the third-party legal authority verification, audit proof generation, and update proof generation modules.
The system initialization module is executed by the PKG to generate the system parameters for data integrity audit. It is realized by $\mathrm{Setup}(1^{\lambda}) \rightarrow (PP, msk)$: it takes a security parameter $\lambda$ as input and generates the public parameters $PP$ and a master key $msk$ based on a bilinear mapping, which are used for subsequent data integrity verification.
The key generation module is executed on the PKG [30] and generates the user's private key from the identity information the user provides. It is realized by $\mathrm{KeyExtract}(PP, msk, ID) \rightarrow sk_{ID}$, which takes the public parameters, the master key, and the user ID as input and outputs the private key corresponding to the ID. In this process, the user sends the ID to the PKG, and the PKG calculates the user's private key and sends it to the user through a secure channel. On receiving the private key, the user checks its validity with formula (1) (details are shown in Table 1) and accepts the key if the equation holds; otherwise, the user discards it.
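The Setup and KeyExtract steps, together with the user's key check, can be illustrated with a minimal sketch. It assumes a toy "pairing" e(a, b) = h^(ab mod q) over the additive group (Z_q, +), which is bilinear but offers no security and only stands in for a real pairing-friendly curve. Formula (1) is not reproduced in this section, so key_is_valid uses the standard Boneh-Franklin-style extraction check e(sk_ID, g) = e(H_1(ID), P_pub); this is an assumption, not the paper's formula. hash_to_group is a hypothetical H_1, and P_pub = s·g is written in additive notation with the master key s from Table 1.

```python
import hashlib
import secrets

# Toy group parameters (illustration only, NOT secure):
# q is prime and p = 2q + 1 is prime, so Z_p^* has a subgroup of order q.
P_MOD = 2039   # p: modulus for the target group G_T
Q_ORD = 1019   # q: prime order of G and G_T
H_GT = 4       # 4 = 2^2 is a quadratic residue mod p, hence a generator of G_T
G_GEN = 1      # g: generator of the additive group G = (Z_q, +)


def pair(a: int, b: int) -> int:
    """Toy bilinear map e: G x G -> G_T with e(a, b) = h^(a*b mod q) mod p."""
    return pow(H_GT, (a * b) % Q_ORD, P_MOD)


def hash_to_group(data: bytes) -> int:
    """Hypothetical H_1: hash bytes onto a nonzero element of G."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest % (Q_ORD - 1) + 1


def setup():
    """Setup: draw the master key s and publish P_pub = s * g (additive notation)."""
    msk = secrets.randbelow(Q_ORD - 1) + 1
    pp = {"g": G_GEN, "P_pub": (msk * G_GEN) % Q_ORD}
    return pp, msk


def key_extract(pp: dict, msk: int, identity: str) -> int:
    """KeyExtract: sk_ID = s * H_1(ID); pp is kept to mirror KeyExtract(PP, msk, ID)."""
    return (msk * hash_to_group(identity.encode())) % Q_ORD


def key_is_valid(pp: dict, identity: str, sk_id: int) -> bool:
    """User-side check (assumed): e(sk_ID, g) == e(H_1(ID), P_pub)."""
    q_id = hash_to_group(identity.encode())
    return pair(sk_id, pp["g"]) == pair(q_id, pp["P_pub"])
```

For example, a key produced by key_extract(pp, msk, "alice@example.com") after setup() is accepted by key_is_valid, while the same key incremented by one is rejected.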
The third-party legal authority generation module is executed by a user. The user authorizes a TPA to perform data integrity audit on his behalf and, from the TPA's identity information and his own, generates a legal authority for the authorized TPA and a legal authority verification value for the CSP, so that the legitimacy of the TPA can be verified during the audit process. The function is realized by $\mathrm{Entrust}(ID, ID_{TPA}) \rightarrow Entrust$, whose inputs are the user ID and the TPA's ID and whose output is the corresponding legal authority. To prevent malicious attackers from launching denial-of-service attacks on the CSP, only a TPA authorized by the user is allowed to launch an integrity audit challenge. In Ldasip, the user generates the authorization, calculates the legal authority verification value, sends the legal authority to the TPA, and sends the legal authority verification value to the CSP.
The improved multibranch tree authentication structure construction module is executed on a user, who constructs the improved multibranch tree authentication structure and sends it, together with the signatures, to the CSP. The function is realized by $\mathrm{Construct}(m_i) \rightarrow C$, which takes the data blocks as input and outputs the authentication structure. The details of the algorithm are described in the following subsections.
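One simple way to realize Construct(m_i) so that nonleaf nodes also carry data blocks is an array layout of a complete n-ary tree. The sketch below is hypothetical: it assigns block k to slot k, with slot 0 as the root, and computes only the parent/child relations and the depth; hashing, deputy root selection, and access frequencies are omitted.

```python
def construct(num_blocks: int, n: int) -> dict:
    """Hypothetical array layout for Construct(m_i) -> C.

    Block k sits at slot k of a complete n-ary tree, so every node, leaf or
    nonleaf, carries a data block. Slot 0 is the root R.
    """
    if num_blocks < 1 or n < 2:
        raise ValueError("need at least one block and a branching factor n >= 2")
    # The children of slot k are slots n*k + 1 .. n*k + n (when they exist).
    parent = {k: (k - 1) // n for k in range(1, num_blocks)}
    children = {
        k: [c for c in range(n * k + 1, n * k + n + 1) if c < num_blocks]
        for k in range(num_blocks)
    }
    # Count levels: level j (0-based) holds up to n**j slots.
    depth, covered, width = 0, 0, 1
    while covered < num_blocks:
        covered += width
        width *= n
        depth += 1
    return {"parent": parent, "children": children, "depth": depth}
```

With this layout, construct(21, 4) yields a tree of depth 3, whereas a leaf-only tree of the same depth and branching factor would hold only 4^2 = 16 blocks.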
The signature module is executed on the user and generates signatures of the data blocks and of the root node and deputy root nodes of the multibranch authentication tree. The function is realized by $\mathrm{Sign}(m_i, sk_{ID}) \rightarrow (\sigma_i, \Gamma, c)$, whose inputs are the data blocks and the user's private key and whose outputs are the file authentication tags and the authentication structure signatures. The user generates an authentication tag for each data block of the file with a hash operation, signs the root node and the deputy root nodes with his private key, sends the data blocks and signatures to the cloud server, and then deletes the local data blocks.
The challenge module is executed on a TPA to generate an audit challenge. The function is realized by $\mathrm{Challenge}(PP, ID) \rightarrow chal$, which takes the public parameters and the user ID as input and outputs a challenge $chal$. The TPA randomly selects a set of multiple elements to form $chal$ and sends it to the cloud together with the legal authority information (user ID, TPA ID, and the corresponding legal authority).
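The challenge can be sketched as follows, assuming (as the proof formula $\mu = \sum_{i \in I} v_i m_i$ used later suggests) that chal pairs each sampled block index i with a random coefficient v_i in Z_q^*. The function name challenge and its parameters num_blocks and sample_size are hypothetical and differ from the Challenge(PP, ID) signature above.

```python
import secrets


def challenge(num_blocks: int, sample_size: int, q: int) -> list:
    """Hypothetical Challenge: sample distinct block indices and pair each index i
    with a random coefficient v_i in Z_q^*; returns chal as [(i, v_i), ...]."""
    rng = secrets.SystemRandom()                  # OS-backed CSPRNG
    indices = sorted(rng.sample(range(num_blocks), sample_size))
    return [(i, rng.randrange(1, q)) for i in indices]
```

Drawing both the indices and the coefficients from secrets.SystemRandom means the CSP cannot predict which blocks will be challenged or how their responses will be weighted.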
The third-party legal authority verification module is executed on the CSP to verify the validity of the TPA authorized by the user. It is realized by $\mathrm{EntrustV}(PP, Entrust, V, ID, ID_{TPA}) \rightarrow \{0, 1\}$, whose inputs are the public parameters, the legal authority, the legal authority verification value, the user ID, and the TPA ID, and whose output is 0 or 1. The CSP checks whether the TPA is legal with formula (2). If the formula holds, the CSP considers the TPA legal and executes the audit proof generation algorithm; otherwise, it terminates the process.
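The authorization flow of Entrust and EntrustV can be sketched in a toy group. Formula (2) is not reproduced in this section, so the construction below is an assumption: the authority is x·H(ID || ID_TPA), V = x·g is the additive counterpart of Table 1's V = g^x, and the CSP accepts when e(Entrust, g) = e(H(ID || ID_TPA), V). The toy pairing provides no security (here V even reveals x), and hash_to_group is a hypothetical hash onto G.

```python
import hashlib
import secrets

# Toy group as a stand-in for a real pairing (illustration only, NOT secure).
P_MOD, Q_ORD, H_GT, G_GEN = 2039, 1019, 4, 1


def pair(a: int, b: int) -> int:
    """Toy bilinear map e(a, b) = h^(a*b mod q) mod p."""
    return pow(H_GT, (a * b) % Q_ORD, P_MOD)


def hash_to_group(data: bytes) -> int:
    """Hypothetical hash onto a nonzero element of G."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (Q_ORD - 1) + 1


def entrust(user_id: str, tpa_id: str):
    """User side: return (legal authority for the TPA, verification value V for the CSP)."""
    x = secrets.randbelow(Q_ORD - 1) + 1                       # authorization secret x in Z_q^*
    v = (x * G_GEN) % Q_ORD                                     # V = x * g
    binding = hash_to_group(f"{user_id}|{tpa_id}".encode())    # H(ID || ID_TPA)
    return (x * binding) % Q_ORD, v


def entrust_verify(authority: int, v: int, user_id: str, tpa_id: str) -> int:
    """CSP side: 1 if e(authority, g) == e(H(ID || ID_TPA), V), else 0."""
    binding = hash_to_group(f"{user_id}|{tpa_id}".encode())
    return int(pair(authority, G_GEN) == pair(binding, v))
```

A tampered authority, for example (authority + 1) mod q, makes entrust_verify return 0.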

The audit proof generation module based on data masking is executed on the CSP and generates an audit proof according to the audit challenge sent by the TPA. It is realized by $\mathrm{Proof}(chal, \sigma_i, m_i) \rightarrow P$, whose inputs are the data blocks $m_i$, the authentication tags $\sigma_i$, and the challenge $chal$, and whose output is the audit proof $P$. Details are introduced in the following subsection.
The verification module is executed on the TPA; it verifies the audit proof returned by the CSP, judges whether the CSP stores the user data completely, and sends the verification result to the user. It is realized by $\mathrm{Verify}(chal, PP, ID, P) \rightarrow \{0, 1\}$, whose inputs are the challenge $chal$, the public parameters, the user ID, and the audit proof, and whose output, 0 or 1, indicates whether the file stored in the cloud has been tampered with. The TPA first checks whether formulas (3) and (4) hold. If they do not, the integrity of the file cannot be guaranteed and the TPA outputs 0 (fail). If they hold, the TPA checks formula (5) to judge whether the proof is correct: if the equation holds, the data stored in the cloud are intact and the TPA outputs 1; otherwise, it outputs 0.
Supporting dynamic updates means that users can update cloud data without downloading files from the cloud. User data are updated through interaction between the user and the CSP. The module consists of two parts: update request generation and update proof verification, run by the user, and update proof generation, run by the cloud service provider. By executing the update request generation module, a user generates an update request $update$ and sends it to the CSP. After receiving the request, the CSP runs the update proof generation module and sends the update proof $P_{update}$ to the user. The user then verifies $P_{update}$. If the verification succeeds, the update operation was performed correctly and the user can delete the locally stored data; otherwise, the verification fails. The specific protocols are described in the following sections.
For the convenience of subsequent introduction, a unified symbol description table is given in Table 1.

Construction of Improved Multibranch Tree Authentication Structure.
The traditional multibranch tree authentication structure [31] stores the hash values of data blocks only in the leaf nodes, so the data structure is large and node utilization is low. In Ldasip, the multibranch authentication tree is restructured as shown in Figure 3. First, data block information is also stored on nonleaf nodes of the multibranch tree, such as $z_{n+1}$. Second, deputy root nodes are set to shorten the authentication path. Third, following the principle of locality, an access frequency is added to each node to record how often its data block is accessed.

Table 1: Symbol descriptions.
Symbol: Meaning
$\lambda$: A security parameter given as input
$G$: An additive cyclic group whose order is a large prime $q > 2^k$
$G_T$: A multiplicative cyclic group whose order is a large prime $q > 2^k$
$e: G \times G \rightarrow G_T$: A bilinear mapping
$H_1$, $H_3$, …: Hash functions
$g$, …: Generators of group $G$
$P_{pub}$ ($mpk$): The PKG randomly selects $x_0 \in \mathbb{Z}_q^*$ and calculates $P_{pub} = g^{x_0}$
$msk$: $s$, selected randomly by the PKG
$sk_{ID}$: The private key corresponding to $ID$
$pk$: The public key, $pk = B \cdot P_{pub}$
$x$, $V$: $x \in \mathbb{Z}_q^*$ is the secret key used to generate the authorization, and $V = g^x$ is the legal authority verification value
$m_i$: A data block of the file
$C$: The improved multibranch tree authentication structure
$I$: The index of the data block $m_i$
$Name$: The file identifier
$V_n$: The current version number
Definition 1. Deputy root node $R^*$. In a multibranch tree, a node selected from all the tree nodes in a region of the tree to describe the integrity of the information stored in itself and in all of its descendant nodes is called the deputy root node of this region, denoted by $R^*$.
In Figure 3, node $R^*$ is the deputy root node of all nodes in the rectangular box area, and $Q$ is set as the identifier of the deputy root node, which is expressed as follows.
The root node $R$ is a special deputy root node that describes the integrity of the whole file. Each $R^*$ is signed separately, yielding the unique signature set $\Gamma$ of the deputy root nodes of file $F$. These deputy root nodes and signatures are used for data integrity verification and user data updates.
In the multibranch tree structure, an n-branch tree lies below each deputy root node; that is, each nonleaf node of the tree has n child nodes, while each node other than the root has exactly one parent node. The depth of the tree is d, and each node in the tree is a data container that stores the node identification information and the hash information of its corresponding data block.
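Assuming a full n-ary tree whose d levels run from the deputy root (level 1) down to the leaves (level d), the capacity gained by storing data blocks in every node rather than only in the leaves can be worked out directly:

$$
N_{\text{leaf}} = n^{d-1}, \qquad
N_{\text{all}} = \sum_{k=0}^{d-1} n^{k} = \frac{n^{d}-1}{n-1}, \qquad
\text{e.g. } n = 4,\ d = 3:\ N_{\text{leaf}} = 16,\ N_{\text{all}} = 21.
$$

The ratio $N_{\text{all}} / N_{\text{leaf}}$ tends to $n/(n-1)$ as $d$ grows, so under this assumption the relative gain is largest for small branching factors.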
According to the principle of locality [32], researchers analyzing programs have found that data access exhibits phases and clustering, and that clustering usually manifests as temporal locality and spatial locality. Temporal locality means that data that have just been accessed are likely to be accessed again shortly afterward. Spatial locality means that after one datum is accessed, data at adjacent addresses are likely to be accessed shortly thereafter.
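Temporal locality is what the recent access frequency F = q/t of Definition 2 measures. A minimal sliding-window tracker for F is sketched below; the class name AccessFrequency and the convention that callers pass timestamps explicitly (for instance from time.monotonic()) are assumptions.

```python
from collections import deque


class AccessFrequency:
    """Sliding-window estimate of F = q / t: q accesses within the last t time units."""

    def __init__(self, window: float):
        self.window = window          # t, the length of the recent interval
        self._times = deque()         # timestamps of recent accesses, oldest first

    def record(self, now: float) -> None:
        """Register one access to the node's data block at time `now`."""
        self._times.append(now)
        self._evict(now)

    def frequency(self, now: float) -> float:
        """Return F = q / t for accesses strictly newer than now - window."""
        self._evict(now)
        return len(self._times) / self.window

    def _evict(self, now: float) -> None:
        # Drop accesses that have fallen out of the window.
        while self._times and self._times[0] <= now - self.window:
            self._times.popleft()
```

With a 10-unit window and accesses at t = 1, 2, 3, and 12, a query at t = 12 keeps only the accesses at 3 and 12, giving F = 2/10 = 0.2.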
Therefore, this paper uses the locality principle to design the information stored in the nodes of the multibranch tree, aiming at efficient search in the tree. Specifically, the node storage information is as follows.
Definition 2. Node storage information. The information stored in a node of the multibranch tree is denoted as $z_i = (\Psi, h(z_i), F)$. Here $\Psi = (Q, A_i)$ is the identifier, where $Q$ is the identifier of the deputy root node and $A_i$ is the node version number, which ensures the freshness of the node; $h(z_i)$ is the hash value of the node, obtained by concatenating the hash value $h(m_i)$ of its corresponding data block with the hash values of its child nodes and hashing the result; and $F$ is the recent access frequency, that is, if there are $q$ access requests in the most recent time interval $t$, then $F = q/t$.

Definition 3. Authentication path. The authentication path of the $i$-th node is the set of nodes on the bottom-up path from the node whose authentication the user requests to its deputy root node, recorded as $path_i = (r_{1,i}, r_{2,i}, \ldots, r_{d,root^*})$, where $path_l = d$ is the authentication path length, $r_{1,i}$ is the $i$-th node to be verified, and $r_{d,root^*}$ is the deputy root node.

The construction of the improved multibranch tree authentication structure consists of initialization and multibranch tree construction, shown in Algorithm 1 and Algorithm 2.
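Definitions 2 and 3 can be made concrete with a small sketch. It assumes SHA-256 as the hash h, stores a data block in every node, computes h(z_i) = h(h(m_i) || h(z_c1) || ... || h(z_cn)) bottom-up, and collects the authentication path by following parent links up to the deputy root. Node, rehash, and auth_path are hypothetical names, not the paper's Algorithms 1 and 2.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


def h(data: bytes) -> bytes:
    """Assumed hash function (SHA-256)."""
    return hashlib.sha256(data).digest()


@dataclass(eq=False)
class Node:
    """z_i = (Psi, h(z_i), F) with Psi = (Q, A_i); every node also holds a block m_i."""
    block: bytes                                   # data block m_i stored at this node
    deputy_id: str                                 # Q: identifier of the region's deputy root
    version: int = 0                               # A_i: node version number (freshness)
    freq: float = 0.0                              # F: recent access frequency
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)
    digest: bytes = b""                            # h(z_i), filled in by rehash()

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def rehash(self) -> bytes:
        """Recompute h(z_i) = h(h(m_i) || h(z_c1) || ... || h(z_cn)) for this subtree."""
        for child in self.children:
            child.rehash()
        self.digest = h(h(self.block) + b"".join(c.digest for c in self.children))
        return self.digest


def auth_path(node: Node, deputy_root: Node) -> List[Node]:
    """path_i = (r_1, ..., r_d): nodes from `node` up to and including its deputy root."""
    path = [node]
    while path[-1] is not deputy_root:
        if path[-1].parent is None:
            raise ValueError("node does not lie below the given deputy root")
        path.append(path[-1].parent)
    return path
```

Because every node's digest folds in its children's digests, changing any block below a deputy root changes that deputy root's h(z_i).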
Compared with traditional audit schemes based on multibranch trees, Ldasip stores data information in the nonleaf nodes of the multibranch tree. For the same depth and branching factor, Ldasip stores more data blocks, so a file can be divided into smaller blocks, and smaller blocks shorten the time needed to compute hash values, which improves the efficiency of operations on the whole tree structure. In addition, by adding deputy root nodes, Ldasip decentralizes integrity information away from the single root. During the integrity audit, the CSP traverses the tree structure to retrieve the data blocks named in the audit challenge sent by the TPA; Ldasip lets it read the recent access frequency of data blocks from the node storage information of the deputy root nodes, start traversing from the deputy root node with the highest recent access frequency, and quickly find the region where the target data block is located.
us, it does not need to traverse the entire tree structure to leaf nodes like the traditional scheme, and Ldasip shortens the retrieval path and reduces the file retrieval time of CSP and the computation cost of cloud service providers.
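The frequency-guided retrieval described above can be sketched as follows. This is a hypothetical illustration, not the paper's Algorithm 3: it assumes each deputy root carries its recent access frequency F, visits deputy-root subtrees in descending order of F, searches each subtree depth-first, and counts the nodes visited so the saving over a full traversal is visible. The names `TreeNode` and `find_block` are ours.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class TreeNode:
    index: int                        # sequence number i of the block held by this node
    freq: float = 0.0                 # recent access frequency F = q/t (used on deputy roots)
    children: List["TreeNode"] = field(default_factory=list)


def find_block(deputy_roots: List[TreeNode], target: int) -> Tuple[Optional[TreeNode], int]:
    """Return (node holding block `target` or None, number of nodes visited)."""
    visited = 0
    for root in sorted(deputy_roots, key=lambda r: r.freq, reverse=True):
        stack = [root]                # depth-first search inside one deputy-root subtree
        while stack:
            node = stack.pop()
            visited += 1
            if node.index == target:
                return node, visited
            stack.extend(node.children)
    return None, visited


cold = TreeNode(0, freq=0.1, children=[TreeNode(1), TreeNode(2), TreeNode(3)])
hot = TreeNode(10, freq=0.9, children=[TreeNode(11), TreeNode(12)])
node, steps = find_block([cold, hot], 12)
```

Here block 12 in the frequently accessed subtree is found after two visits, while block 3 in the rarely accessed subtree is reached only after the hot subtree is exhausted; the paper's actual ordering rule may differ.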
Similarly, when users dynamically update a file, the CSP can quickly find the corresponding data block by traversing from the deputy root node with high access frequency and then update the file. Ldasip only needs to update the hash values up to the corresponding deputy root node, which reduces the number of levels whose hash values must be updated, shortens the bottom-up hash computation path, and reduces the computation overhead of the CSP.

Generation of Audit Proof Based on Data Masking.
In Ldasip, a data masking technique is introduced into the proof generation algorithm executed by the CSP to prevent users' sensitive information from being derived by curious third parties. The details are as follows.
The CSP is responsible for executing the proof generation algorithm. After receiving the audit challenge chal sent by the third-party auditor, the CSP first calculates $T = \prod_{i \in I} \sigma_i^{v_i}$ and $\mu = \sum_{i \in I} v_i m_i$ and then sends them to the TPA as the audit proof for verification. However, if $\mu$ is sent directly to the TPA, then as the number of verifications increases, the TPA can very likely recover the data blocks $m_i$ by solving the system of linear equations $\mu = \sum_{i \in I} v_i m_i$. To solve this problem, we borrow Cong Wang's idea of random masking [19], but instead of introducing a random masking factor, we directly use the hash values of the user data blocks. The CSP applies the hash function $H_3$ to each user data block, so that $\mu = \sum_{i \in I} v_i H_3(m_i)$ hides the user's original data blocks and prevents the user data and sensitive information from being deduced and leaked by curious third parties. Finally, $P = \{T, \mu, \{H_1(z_i), \Omega_i\}_{i \in I}, c, \Gamma_{1 \le i \le I}\}$ is sent to the TPA as the audit proof. The audit proof generation process consists of two parts, the block searching algorithm and the proof generation algorithm, which are shown in Algorithm 3 and Algorithm 4.

Figure 3: Improved multibranch certification tree structure.

Security and Communication Networks
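The risk that motivates masking, and the effect of hashing the blocks, can be illustrated with a toy Python experiment. All parameters are assumptions: a 61-bit prime `Q` stands in for the group order q, SHA-256 reduced modulo `Q` stands in for $H_3$, blocks are short byte strings so that they fit in $Z_q$, and only the $\mu$ component of the proof is modeled (no pairings or tags). After as many audits over the same blocks as there are blocks, a curious TPA can solve the linear system for the unmasked $\mu = \sum v_i m_i$ and recover every $m_i$; with $\mu = \sum v_i H_3(m_i)$, the same computation yields only the values $H_3(m_i)$.

```python
import hashlib
import random

Q = 2**61 - 1  # toy prime modulus standing in for q (an assumption)


def H3(block: bytes) -> int:
    """Hypothetical stand-in for H_3: SHA-256 of the block, reduced into Z_q."""
    return int.from_bytes(hashlib.sha256(block).digest(), "big") % Q


def mu(blocks, coeffs, masked):
    """mu = sum v_i * m_i (unmasked) or sum v_i * H3(m_i) (masked), mod Q."""
    vals = [H3(b) if masked else int.from_bytes(b, "big") for b in blocks]
    return sum(v * x for v, x in zip(coeffs, vals)) % Q


def solve_mod(A, y, q=Q):
    """Gauss-Jordan elimination over Z_q; returns x with A x = y (mod q)."""
    n = len(A)
    M = [list(row) + [yi] for row, yi in zip(A, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] % q)
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, q)          # modular inverse (Python 3.8+)
        M[col] = [x * inv % q for x in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % q for a, b in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


blocks = [b"pw=1234", b"age=42", b"HIV+"]      # short sensitive blocks (< 61 bits each)
rng = random.Random(7)
audits = [[rng.randrange(1, Q) for _ in blocks] for _ in blocks]  # v_i of each audit

# A curious TPA collects one mu per audit and solves the resulting linear system.
recovered_plain = solve_mod(audits, [mu(blocks, v, masked=False) for v in audits])
recovered_masked = solve_mod(audits, [mu(blocks, v, masked=True) for v in audits])
leaked = [x.to_bytes(len(b), "big") for x, b in zip(recovered_plain, blocks)]
```

The masked run still solves the system; what changes is that the unknowns the TPA recovers are $H_3(m_i)$ values rather than the block contents.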

Authentication Protocol Supporting User Data Update.
Common user updates to data mainly include inserting, deleting, and modifying data blocks [33]. During a data update, the user first sends an update request to the CSP; the CSP then updates the data block, generates an update proof together with the new root node and the new deputy root node, updates the authentication structure, and sends the update proof to the user. Before verifying the updated data block, the user verifies the validity of the improved multibranch tree authentication structure. If this verification passes, the protocol continues; otherwise, it terminates. The user then recalculates the root node and the deputy root node and compares them with the values returned by the CSP. If they are consistent, the CSP has updated the file correctly; otherwise, it has not. After verification, the user signs the new root node and deputy root node and sends them to the CSP for updating.
Because the deputy root node $R^*$ is added to the multibranch tree and each node in the subtree rooted at it stores the hash value of its corresponding data block, the CSP does not need to descend to the bottom leaf nodes when retrieving a data block for an update, which shortens the file retrieval path and reduces the number of levels whose hash values must be updated.
This scheme reduces the computation overhead of the CSP as well as the auxiliary information and communication overhead between the CSP and users. In addition, the CSP only updates the deputy root node nearest to the data block being updated, which reduces the auxiliary information required to generate the audit proof and improves the audit efficiency of the whole system.
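The user-side check in the update protocol (recompute the deputy root from the updated block and the auxiliary information, then compare it with the value returned by the CSP) can be sketched in Python. This is an assumption-laden illustration rather than the paper's algorithm: SHA-256 stands in for the node hash, each node hash is taken as $H(h(m) \| \text{child hashes})$ following Definition 2, the auxiliary information is modeled as a hypothetical list of (block hash, sibling hashes, position) entries from the updated node up to its deputy root, and the subsequent signing step is omitted.

```python
import hashlib
from typing import List, Tuple


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for the scheme's hash


def node_hash(block_hash: bytes, child_hashes: List[bytes]) -> bytes:
    # h(z) = H(h(m) || h(child_1) || ... || h(child_k))
    return H(block_hash + b"".join(child_hashes))


# Hypothetical auxiliary entry for one ancestor: (h(m) of the ancestor,
# hashes of its other children in order, position of the updated branch).
AuxEntry = Tuple[bytes, List[bytes], int]


def recompute_deputy_root(block: bytes, own_children: List[bytes], aux: List[AuxEntry]) -> bytes:
    h = node_hash(H(block), own_children)
    for block_hash, siblings, pos in aux:   # walk bottom-up to the deputy root
        h = node_hash(block_hash, siblings[:pos] + [h] + siblings[pos:])
    return h


def verify_update(new_block: bytes, own_children: List[bytes],
                  aux: List[AuxEntry], returned_deputy_root: bytes) -> bool:
    """Accept the CSP's update only if the recomputed R*' equals the returned one."""
    return recompute_deputy_root(new_block, own_children, aux) == returned_deputy_root


# Toy subtree: deputy root "r" with children "a" and "b"; "b" has one child "c".
h_a = node_hash(H(b"a"), [])
aux_for_c = [(H(b"b"), [], 0), (H(b"r"), [h_a], 1)]
honest_root = node_hash(H(b"r"), [h_a, node_hash(H(b"b"), [node_hash(H(b"c2"), [])])])
```

Only the hashes along one branch up to the deputy root are recomputed, which mirrors the reduced auxiliary information the section describes.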

Verification Protocol When Data Are Modified.
The user's data modification operation is essentially a replacement: the target data block is located first and then replaced, changing data block $m_i$ to $m_i'$. The modification process is shown in Figure 4.
(1) The user generates the update request using the authentication tag of the new data block; the authentication tag is …

Verification Protocol When Data Are Deleted.
As shown in Figure 5, the verification interaction during data deletion is as follows:
(1) During data deletion, the user first sends a delete request $update = (D, i)$ to the CSP, where $D$ denotes the deletion operation and $i$ is the sequence number of the data block to be deleted.
(2) After receiving the request, the CSP updates the file: it retrieves the data structure, finds and deletes the specified data block $m_i$, updates the hash values of the root node $R'$ and the deputy root node $R^{*\prime}$, and then sends the update proof $P_{update} = \{\{H_1(z_i), \Omega_i\}_{i \in I}, c, R', \Gamma_{1 \le i \le I}, R^{*\prime}\}$ to the user.

Verification Protocol When Data Are Inserted.
Assume that $m_i^*$ is to be inserted after data block $m_i$. The data insertion process is shown in Figure 6.
(1) The user first generates an authentication tag $\sigma_i^* = \left(H_1(name \| V_n^* \| t_i^*) \cdot u^{H_3(m_i^*)}\right)^{sk_{ID}}$ for the data block to be inserted and sends an update request $update = (I, i, m_i^*, \sigma_i^*)$ to the CSP, where $I$ denotes the insertion operation.
(2) The CSP inserts the data block at the specified position. After receiving the update request, the CSP retrieves the location of data block $m_i$, inserts $m_i^*$ after it, and updates the authentication tag set to $T' = \{\sigma_1, \sigma_2, \ldots, \sigma_i, \sigma_i^*, \ldots, \sigma_I\}$. The hash value of the node holding the inserted data block is updated to cover both the original data block $m_i$ and the inserted data block $m_i^*$. The CSP then updates the hash value of the deputy root node and sends the update proof $P_{update} = \{\{H_1(z_i), \Omega_i\}_{i \in I}, c, R', \Gamma_{1 \le i \le I}, R^{*\prime}\}$ to the user.
(3) The user uses $\{H_1(z_i), \Omega_i\}$ to calculate $R$ and $R^*$ and verifies the root $R$ and the deputy root $R^*$ through equations (3) and (4). If they are equal, verification continues; otherwise, it outputs failure. The user then uses $\{H_1(z_i'), \Omega_i\}$ to calculate $R''$ and $R^{*\prime\prime}$ and checks that $R'' = R'$ and $R^{*\prime\prime} = R^{*\prime}$.
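The CSP-side bookkeeping of the insertion step (placing $m_i^*$ and $\sigma_i^*$ right after position $i$, and rehashing the node that now covers both $m_i$ and $m_i^*$) can be sketched as follows. This is a hypothetical illustration: tags are opaque byte strings standing in for group elements, sequence numbers are 1-based as in the text, SHA-256 stands in for the scheme's hash, and the layout $H(h(m_i) \| h(m_i^*) \| \text{child hashes})$ for the merged node is an assumption.

```python
import hashlib
from typing import List, Tuple


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for the scheme's hash


def insert_after(blocks: List[bytes], tags: List[bytes], i: int,
                 new_block: bytes, new_tag: bytes) -> Tuple[List[bytes], List[bytes]]:
    """Insert m_i* and sigma_i* right after the i-th block (1-based), as in T'."""
    if not 1 <= i <= len(blocks):
        raise IndexError("sequence number out of range")
    return blocks[:i] + [new_block] + blocks[i:], tags[:i] + [new_tag] + tags[i:]


def merged_node_hash(original: bytes, inserted: bytes, child_hashes: List[bytes]) -> bytes:
    # Assumed layout: the node holding m_i now hashes h(m_i) || h(m_i*) || child hashes.
    return H(H(original) + H(inserted) + b"".join(child_hashes))


blocks, tags = insert_after([b"m1", b"m2", b"m3"], [b"s1", b"s2", b"s3"], 2, b"m2*", b"s2*")
```

Returning new lists rather than mutating in place keeps the pre-update state available, which is convenient when the update proof must be checked against the old root before the change is committed.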

Theoretical Analysis
In this section, Ldasip is analyzed theoretically, including a functional comparison, a security analysis, and a comparison of computation and communication costs with existing schemes.

Functional Comparison.
There are several identity-based data integrity audit schemes, among which scheme [9] and scheme [10] are classic. As shown in Table 2, both Ldasip and scheme [9] adopt identity-based integrity auditing, while scheme [10] adopts fuzzy identity-based data integrity auditing. All three schemes simplify certificate management. Scheme [9] does not support data privacy protection, and neither scheme [9] nor scheme [10] supports dynamic data operations or batch verification. Ldasip supports data privacy protection, dynamic data operations, and batch verification at the same time.

Security Analysis.
This section mainly analyzes the correctness, integrity, and sensitive information protection ability of Ldasip.

Correctness Analysis.
The correctness of a cloud audit approach means that the information generated by the private key generation algorithm KeyExtract(), the authorization algorithm Entrust(), and the proof generation algorithm Proof() passes the corresponding verification equations with overwhelming probability. The analysis results are presented below as propositions; see the appendix for the proofs.
can be proved to be true.

Proposition 6.
Proposition 6 shows that if both the TPA and the CSP are trusted and the data files are stored correctly in Ldasip, the audit proof generated from the challenge information always passes validation.

Soundness Analysis.
The following analysis shows that any CSP that can generate valid proofs that pass the verification algorithm is in fact storing the complete files. See the appendix for the proof process.

B)modq ) cannot be verified when m i is replaced by m' i .
Proposition 7 indicates that an adversary cannot provide a valid audit proof if it does not store the files, or does not store them completely. In other words, if the outsourced data stored in the cloud have been corrupted, it is computationally infeasible for the CSP to forge data that yield a verifiable audit proof.

Ability to Protect Sensitive Information.
The following part analyzes whether Ldasip can protect sensitive information.

Proposition 8.
Suppose that the TPA holds the audit proof $P = \{T, \mu, \{H_1(z_i), \Omega_i\}_{i \in I}, c, \Gamma_{1 \le i \le I}\}$; it still cannot infer the data block $m_i$. The proof is given in the appendix. Proposition 8 shows that the TPA cannot obtain the user's original data from the audit proof.

Computation Overhead.
The computation cost of Ldasip is analyzed in Propositions 9 to 13; see the appendix for the proofs.
Proposition 9. Let $\mathrm{Hash}_G$ denote a hash operation in $G$, $\mathrm{Exp}_G$ a power (exponentiation) operation in $G$, and $\mathrm{Pair}$ a pairing operation $e: G \times G \to G_T$. The computation cost of the integrity audit scheme in the third-party legal authority generation and verification phase is $2\mathrm{Hash}_G + 2\mathrm{Exp}_G + 2\mathrm{Pair}$.

Proposition 10. Let $\mathrm{Hash}_G$ denote a hash operation in $G$, $\mathrm{Exp}_G$ an exponentiation operation in $G$, $\mathrm{Mul}_G$ a multiplication operation in $G$, and $\mathrm{Hash}_{Z_q^*}$ a hash operation in $Z_q^*$. The computation overhead of the signature generation stage is $(n+2)\mathrm{Hash}_G + n\,\mathrm{Mul}_G + (2n+2)\mathrm{Exp}_G + n\,\mathrm{Hash}_{Z_q^*}$.

Table 2: Function comparison of related schemes (columns: Scheme, Public verification, Simplify certificate management, Support data privacy, Data dynamic, Batch verification).
Proposition 11. Let $\mathrm{Hash}_{Z_q^*}$ denote a hash operation in $Z_q^*$, $\mathrm{Exp}_G$ an exponentiation operation in $G$, $\mathrm{Mul}_G$ a multiplication operation in $G$, $\mathrm{Mul}_{Z_q^*}$ and $\mathrm{Add}_{Z_q^*}$ one multiplication and one addition in $Z_q^*$, respectively, and $c$ the number of challenged data blocks. The computation overhead of the challenge response phase is $(c-1)\mathrm{Mul}_G + c\,\mathrm{Exp}_G + c\,\mathrm{Mul}_{Z_q^*} + (c-1)\mathrm{Add}_{Z_q^*}$.

Proposition 12. Let $\mathrm{Hash}_G$ denote a hash operation in $G$, $\mathrm{Hash}_{Z_q^*}$ a hash operation in $Z_q^*$, $\mathrm{Exp}_G$ an exponentiation operation in $G$, $\mathrm{Pair}$ a pairing operation $e: G \times G \to G_T$, $\mathrm{Mul}_G$ a multiplication operation in $G$, and $c$ the number of challenged data blocks. The computation overhead of the verification phase is $(c+2)\mathrm{Exp}_G + (c+2)\mathrm{Hash}_G + c\,\mathrm{Mul}_G + \mathrm{Hash}_{Z_q^*} + 2\mathrm{Pair}$.
Proposition 13. With the same notation, the computation overhead of the dynamic operation phase is $7\mathrm{Hash}_G + 4\mathrm{Exp}_G + 4\mathrm{Pair} + \mathrm{Mul}_G + \mathrm{Hash}_{Z_q^*}$.
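Propositions 9 to 13 can be turned into a small cost calculator that returns operation counts per phase for given $n$ and $c$ and optionally converts them into an estimated time. This is a hypothetical helper: the operation names are ours, the dynamic-phase entry counts $\mathrm{Mul}_G$ and $\mathrm{Hash}_{Z_q^*}$ once each (an assumption about how that term combines), and the per-operation timings in `placeholder_ms` are illustrative placeholders, not measured values.

```python
from collections import Counter
from typing import Dict


def ldasip_costs(n: int, c: int) -> Dict[str, Counter]:
    """Operation counts per phase (Propositions 9-13); n signed blocks, c >= 1 challenged."""
    return {
        "legal_authority": Counter(HashG=2, ExpG=2, Pair=2),
        "signature": Counter(HashG=n + 2, MulG=n, ExpG=2 * n + 2, HashZq=n),
        "challenge_response": Counter(MulG=c - 1, ExpG=c, MulZq=c, AddZq=c - 1),
        "verification": Counter(ExpG=c + 2, HashG=c + 2, MulG=c, HashZq=1, Pair=2),
        "dynamic": Counter(HashG=7, ExpG=4, Pair=4, MulG=1, HashZq=1),
    }


def estimate_ms(ops: Counter, unit_ms: Dict[str, float]) -> float:
    """Weighted sum of operation counts; unit_ms maps operation name to milliseconds."""
    return sum(count * unit_ms[op] for op, count in ops.items())


costs = ldasip_costs(n=1000, c=300)
placeholder_ms = {"HashG": 1.0, "ExpG": 2.0, "Pair": 5.0, "MulG": 0.1,
                  "HashZq": 0.01, "MulZq": 0.01, "AddZq": 0.001}
```

`estimate_ms(costs["verification"], placeholder_ms)` weights the counts by the placeholder timings; with real measurements it would give a rough per-phase estimate. Every phase except the constant-cost ones grows linearly in $n$ or $c$.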
Based on Propositions 9 to 13, Ldasip can be compared with scheme [9] and scheme [10] in terms of computation overhead in the signature generation, challenge response, verification, and dynamic operation phases. As shown in Table 3, the computation cost of Ldasip in the signature phase is slightly higher than that of scheme [9], because Ldasip supports dynamic updates and therefore must also compute the signatures of the root node and the deputy root nodes. This extra cost is still far lower than the cost of static auditing, in which users must download files locally to update them. Moreover, Ldasip supports sensitive information protection, whereas scheme [9] does not. Compared with scheme [10], the computation cost of Ldasip in the tag generation and verification stages is clearly lower. Ldasip uses the improved multibranch tree as the authentication structure in the audit process, which allows the tree to be constructed at low cost. This structure also shortens the authentication path during auditing, effectively reduces the computational burden on users and third-party auditors, and improves the efficiency of integrity auditing. Because it reduces the computation burden on the TPA and the user while supporting both sensitive information protection and dynamic operations, Ldasip is well suited to the cloud storage environment.

Communication Overhead.
Communication overhead mainly arises from transmitting the legal authority, the audit challenge, and the audit proof, and from the data update process. Proposition 14 analyzes the communication cost of each stage; see the appendix for the proofs.

Proposition 14.
Let $|p|$ denote the size of an element in $G$. Based on security considerations and practical experience, random numbers are chosen from $Z_q^*$ for a large prime $q$, and $|q|$ denotes the size of an element in $Z_q^*$. Let $|m|$ be the size of a data block, $n$ the number of data blocks, $|n|$ the size of an element of the set $[1, n]$, $c$ the number of challenged data blocks, and $l$ the number of deputy root nodes included in the retrieval process of the authentication structure. Then the communication overhead of the authorization process is $2|p| + |q|$; that of uploading the data tag values and signatures is $(n+2)|p| + n|m|$; that of the audit challenge and response process is $(c+3)|p| + (c+1)|q| + c|n|$; and that of the data update process is $(l+7)|p| + |m|$.
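Proposition 14 can likewise be evaluated numerically. The Python helper below is a hypothetical sketch: it encodes the per-stage formulas, checks that the audit challenge $c(|n|+|q|)$ plus the audit proof $(c+3)|p|+|q|$ (the split used in Table 4) add up to the stated challenge-response total, and uses example sizes that are assumptions: $|p| = 512$ bits (taken from the 512-bit base field in the experiments), $|q| = 160$ bits, 1 KB blocks, 20-bit block indices, $c = 460$, and $l = 3$.

```python
def ldasip_comm_bits(n, c, l, p, q, m, idx):
    """Per-stage communication in bits (Proposition 14); p, q, m, idx are element sizes."""
    return {
        "authorization": 2 * p + q,
        "tag_and_signature_upload": (n + 2) * p + n * m,
        "audit_challenge": c * (idx + q),
        "audit_proof": (c + 3) * p + q,
        "data_update": (l + 7) * p + m,
    }


sizes = dict(p=512, q=160, m=8 * 1024, idx=20)   # assumed sizes in bits
comm = ldasip_comm_bits(n=204800, c=460, l=3, **sizes)
round_trip = comm["audit_challenge"] + comm["audit_proof"]
```

With these assumed sizes, one audit round trip costs 320,016 bits (40,002 bytes), and this figure does not depend on the number of stored blocks $n$.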
As shown in Table 4, the communication cost of Ldasip is comparable to that of scheme [9] when uploading signatures and audit proofs, while Ldasip additionally supports sensitive data protection and dynamic data updates. The communication cost of Ldasip is lower than that of scheme [10], and unlike scheme [10], Ldasip supports dynamic data updates and achieves efficient data integrity auditing.
We further analyzed and compared the communication complexity with schemes [9, 10]. As seen from Table 5, Ldasip reduces the communication complexity of each entity in the audit process. Here, $n$ represents the number of data blocks and $m$ represents the number of child nodes of a deputy root node in the multibranch tree, where $n$ is much larger than $m$.

Table 3: Qualitative analysis of the calculation cost of this scheme and existing schemes.

Experimental Evaluation
We have implemented Ldasip on an OpenStack-based cloud computing platform and conducted comprehensive experiments comparing it with existing schemes [9, 10] in terms of public verification, simplified certificate management, support for data protection, data dynamics, and batch verification.

Experimental Evaluation.
The experimental topology is shown in Figure 7. Five physical machines are used in the experiment: one serves as the user, one as the PKG, and one as the TPA, while the cloud service provider runs the cloud platform services on the remaining two. We deployed the control node on one physical machine and the compute node and network node on another. Figure 8 is a sequence diagram of the interactions between entities. In our experiment, we set the base field size to 512 bits and the element size of $Z_q^*$ ($|q|$) to 160 bits.

Authentication Structure Effect.
In this experiment, we compared the CSP computation time and the authentication tree construction time of Ldasip with those of the traditional MHT authentication structure for different numbers of data blocks. A 200 MB file is divided into 1 KB data blocks, different numbers of data blocks are extracted from it in each run, and the computation time is recorded. In Figures 9 and 10, the horizontal axis is the number of data blocks, and the vertical axis is the CSP computation time and the authentication structure construction time, respectively. The construction time of Ldasip is greatly reduced compared with that of MHT, and its audit efficiency is significantly improved. Therefore, Ldasip can reduce the computational burden on users and cloud service providers and thus improve the performance of the audit method. Table 6 shows the specific algorithms of Ldasip and schemes [9, 10] in the signature generation, challenge response, and verification stages.
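A back-of-the-envelope calculation helps explain why a multibranch tree that stores data in every node has shorter paths than a binary MHT. Taking 1 MB = 1024 KB, the 200 MB file split into 1 KB blocks gives n = 204,800 blocks. The sketch below computes the leaf-to-root depth of a binary MHT and the height of a k-ary tree holding one block per node; the branching factor k = 4 is a hypothetical choice for illustration, and deputy roots, which would shorten paths further, are ignored.

```python
def mht_depth(n: int) -> int:
    """Leaf-to-root depth of a binary Merkle hash tree with n leaves: ceil(log2 n)."""
    return (n - 1).bit_length()


def kary_height(n: int, k: int) -> int:
    """Smallest h such that a k-ary tree storing one block in every node
    (levels 0..h, i.e. 1 + k + ... + k^h nodes) can hold n blocks."""
    h, capacity, level = 0, 1, 1
    while capacity < n:
        level *= k
        capacity += level
        h += 1
    return h


n_blocks = 200 * 1024          # 200 MB file, 1 KB blocks (1 MB = 1024 KB)
print(mht_depth(n_blocks), kary_height(n_blocks, k=4))  # prints: 18 9
```

In this toy setting a block sits at most 9 levels below the root instead of 18, which is the kind of path shortening the section attributes to the improved structure; the measured gains in Figures 9 and 10 also depend on implementation details not modeled here.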

Computation Overhead.
In this experiment, we compared the signature generation overhead of schemes [9, 10] with that of Ldasip for different numbers of data blocks. A 20 MB file is divided into 1,000,000 data blocks; the number of selected data blocks is varied from 0 to 1000 in steps of 100, and the computation overhead is recorded.
In Figure 11, the horizontal axis is the number of data blocks for which signatures are generated, and the vertical axis is the signature computation time. The experimental results show that, for all methods, the computation overhead of the signature process increases linearly with the number of data blocks. The cost of Ldasip in the signature generation stage is lower than that of scheme [10] but slightly higher than that of scheme [9], because Ldasip supports dynamic updates and must also compute the signatures of the root node and the deputy root nodes. This cost is still far lower than that of static auditing, in which users must download files locally to update them.

Table 4: Communication cost of related schemes.
Scheme [9]: data tag value and signature upload $n(|p|+|m|)$; audit challenge $c(|n|+|q|)$; audit proof $|p|+|q|$; data update -.
Scheme [10]: data tag value and signature upload $(3n+1)|p|+n|m|$; audit challenge $c(|n|+|q|)$; audit proof $s|q|+(2c+1)|p|$; data update -.
Ldasip: data tag value and signature upload $(n+2)|p|+n|m|$; audit challenge $c(|n|+|q|)$; audit proof $(c+3)|p|+|q|$; data update $(l+7)|p|+|m|$.
Next, we compared the challenge response overhead of schemes [9, 10] with that of Ldasip for different numbers of data blocks, using the same setting: a 20 MB file divided into 1,000,000 data blocks, with the number of challenged data blocks varied from 0 to 1000 in steps of 100.
In Figure 12, the horizontal axis is the number of challenged data blocks, and the vertical axis is the challenge response overhead. The experimental results show that, for all methods, the computation overhead of the challenge response process increases linearly with the number of challenged data blocks. However, the overhead of Ldasip is lower than that of schemes [9, 10], because the improved multibranch tree lets the CSP spend less time searching for and computing over data blocks. In addition, Ldasip also protects users' sensitive data.
Finally, we compared the verification overhead of schemes [9, 10] with that of Ldasip under the same setting: a 20 MB file divided into 1,000,000 data blocks, with the number of challenged data blocks varied from 0 to 1000 in steps of 100.
In Figure 13, the horizontal axis is the number of challenged data blocks, and the vertical axis is the verification overhead.
The experimental results show that the computation overhead of verification increases with the number of challenged data blocks. Ldasip and scheme [9] have nearly the same calculation process and parameter sizes in the verification stage, and both simulation results change linearly: because the length of a single challenge chal is constant, the computation overhead grows linearly with the number of challenged data blocks. Although Ldasip adds root node integrity verification, the extra computation is very small and practically negligible. Therefore, the cost of Ldasip is basically the same as that of scheme [9], but significantly lower than that of scheme [10].
Note: ω represents the user identity, and $s_k$, $r_k$ represent a random value for each user ID.

Conclusion
In recent years, cloud storage services have become an increasingly important part of the information technology industry, and ensuring the integrity of data outsourced to the cloud is critical. Therefore, we proposed Ldasip, a lightweight dynamic audit method that supports sensitive information protection in cloud storage. Building on identity-based data integrity auditing, a data masking technique is introduced to protect users' sensitive information. An improved multibranch tree structure is proposed to realize dynamic auditing and reduce the communication overhead during verification. A third-party legal authority verification mechanism is introduced to ensure that only a third party legitimately authorized by the user can handle the files on the user's behalf, which reduces the security threat posed by the third-party auditor. Finally, the theoretical analysis and experimental evaluation results of Ldasip were presented.
However, some issues are not covered in this paper. First, more efforts should be made to support data integrity audit in more complex cloud service scenarios such as the cloud service composition. Second, this paper mainly studies the integrity audit of cloud storage data by trusted third-party auditors and supports the integrity audit of dynamic operation of data by users, without focusing on the security and performance issues in data sharing scenarios. In the future, the security and performance issues of integrity audit will be further considered when data are shared by entities other than users.

APPENDIX
Proof of Proposition 1. Given the correct private key $sk_{ID} = b + x_0 H_2(ID, B) \bmod q$ generated by the PKG, the verification equation (A.1) in the KeyExtract algorithm holds. Based on the properties of bilinear mapping, the equation (A