Improved Outsourced Provable Data Possession for Secure Cloud Storage

With the advent of data outsourcing, how to efficiently verify the integrity of data stored at an untrusted cloud service provider (CSP) has become a significant problem in cloud storage. In 2019, Guo et al. proposed an outsourced dynamic provable data possession scheme with batch update for secure cloud storage. Although their scheme is novel, in this paper we show that their proposal is not secure: a malicious cloud server is able to forge the authentication labels, and thus it can forge or delete the user's data but still provide a correct data possession proof. Based on the original protocol, we propose an improved auditing scheme, and our new protocol is efficient yet resistant to these attacks.


Introduction
Since 2007, as one of the most interesting topics in the computer field, cloud computing has experienced rapid development and has become a key research direction for large-scale enterprises and institutions. Its high flexibility, scalability, high cost-performance ratio, and other characteristics allow it to serve storage, medical, financial, education, and other fields [1][2][3]. Among them, cloud storage is an emerging technology that develops cloud computing in the direction of data storage [4]. Compared with traditional data storage methods, cloud storage has the advantages of high performance and low cost. Cloud storage takes data storage and data management as its basic functions, allowing users to connect from any location and store local data and information on the cloud, which facilitates users' management of resources.
However, with the widespread application of cloud storage technology, its security has received more and more attention from users and has gradually become the key to the sustainable development of cloud storage technology. On the one hand, cloud service providers (CSPs) may delete users' stored data in order to free up storage space for their own interests, or may want to obtain users' private data [5]. On the other hand, the CSP has great openness and complexity, and it easily becomes a central target of various malicious attacks, leading to the loss, leakage, tampering, or damage of users' data. Therefore, cloud storage integrity audits have emerged to solve this problem. Users regularly audit the integrity of their own data stored in the cloud, discover whether their data have been discarded or tampered with, and take corresponding remedial measures.

Related Work.
In the early years, cloud audit-related research was mainly about the integrity verification of remote data. Users do not hold the original data and can only verify the integrity of the data stored on the cloud server through a protocol. In 2003, Deswarte et al. [6] proposed the first audit scheme that supports remote data integrity verification. The scheme is based on the Diffie-Hellman key exchange protocol, using the homomorphic properties of RSA signatures and the difficulty of computing discrete logarithms as its security basis. The entire file is represented by a large number and then subjected to modular exponentiation to achieve remote data integrity auditing. However, this solution incurs substantial computational overhead, which is a heavy burden for users. In 2006, Filho et al. [7], based on an RSA homomorphic hash function, used the hash function to compress large data files into small hash values before performing operations.
This scheme reduces the computational expense, but it is still not suitable for large-scale data storage in a cloud storage environment. Its biggest contribution is that it highlighted the important role of homomorphic hash functions in remote data integrity verification. In 2008, Sebe et al. [8] improved the previous scheme based on the idea of partitioning. The scheme divides the large data file into blocks and then performs the calculation on each data block, which greatly reduces the computational expense. But the prover still needs to access all the data when generating the evidence, so this scheme is also not suitable for large data files. The above schemes all require the user, as the verifier, to maintain a metadata set for verification. On the one hand, it is easy for users to lose or leak these metadata, which leads to the disclosure of private data. On the other hand, for users with limited computing resources, huge amounts of outsourced data increase the computational overhead of the audit process. In addition, in the event of a data corruption accident, the user and the CSP will shift responsibility onto each other and cannot provide effective evidence to confirm who should be responsible for the accident. Thus, scholars have introduced an impartial third-party auditor to audit on behalf of users. Auditors are more professional than users in terms of data preservation and computing performance, and in the event of an accident, they can assign accountability and resolve the problem in a fair manner. Therefore, the audit scheme has gradually changed from a private audit between the user and the CSP to a public audit among users, the CSP, and third-party auditors (TPAs). In 2007, Shah et al. [9] proposed a public audit scheme based on the difficulty of discrete logarithm computation to audit ciphertext data and key integrity. The scheme uses a keyed hash function to precalculate a certain number of response values stored by the auditor.
During the audit process, the auditor only needs to match the evidence provided by the server against the prestored response values. However, the number of audits in this scheme is limited by the number of prestored response values. The amount of calculation required for an integrity audit of all data is not a small expense, even for professional third-party auditors. Scholars have been studying how to increase audit efficiency to reduce computational overhead, but from another angle, reducing the amount of data that needs to be audited can also achieve this goal. In 2005, Naor and Rothblum [10] proposed an online memory detection scheme.
The scheme studied sublinear authentication and proposed related authentication protocols. The basic idea of sublinear authentication is to verify the integrity of all the original data by verifying the integrity of a small, randomly specified part of the data blocks. In 2007, Ateniese et al. [11] proposed the first probabilistic provable data possession (PDP) auditing scheme with both security and practicality. The scheme is based on RSA homomorphic authentication labels, which realize the audit of outsourced data. The metadata of multiple data blocks can be aggregated into one value, which effectively reduces the communication overhead, and a random sampling strategy is adopted to check the user's remote data instead of verifying all of it, so the calculation cost is effectively reduced.
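The effectiveness of the random sampling strategy can be quantified: if t out of n blocks are corrupted and c blocks are challenged uniformly without replacement, the detection probability is P = 1 − ∏_{i=0}^{c−1} (n−t−i)/(n−i) ≥ 1 − ((n−t)/n)^c. The following is a minimal sketch of this calculation; the concrete numbers are illustrative and are not taken from the cited papers:

```python
def detection_probability(n, t, c):
    """Probability that sampling c of n blocks without replacement
    hits at least one of t corrupted blocks."""
    p_miss = 1.0
    for i in range(c):
        p_miss *= (n - t - i) / (n - i)
    return 1.0 - p_miss

# With 1% of 10,000 blocks corrupted, challenging ~300 blocks already
# detects the corruption with probability above 95%.
print(detection_probability(10_000, 100, 300))
```

This is why probabilistic PDP schemes can keep the per-audit cost essentially constant in the file size.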
With the continuous improvement of audit schemes, extended requirements have been raised, for example, audit schemes that support privacy protection or batch audits. In 2010, Wang et al. [12] proposed, for the first time, an audit scheme supporting privacy protection through the integration of homomorphic authentication tags and random mask technology, in which bilinear signatures are used to support batch audits. In 2013, Yang et al. [13] proposed an audit solution based on index table technology that supports dynamic data update, where tag aggregation technology is used to process multiple audit requests from different users to support batch audits in a multi-user, multi-cloud environment. In 2015, Hui et al. [14] proposed a public audit scheme based on a dynamic hash table (DHT), which records the attribute information of data blocks to support dynamic data update and improve efficiency. The scheme also supports privacy protection and batch auditing.
In 2007, Juels and Kaliski [15] proposed the original proof of retrievability (POR) scheme. Different from the PDP schemes described above, a POR scheme can repair corrupted data when damage is detected. The scheme uses sampling and error correction codes to perform fault-tolerant preprocessing on outsourced data files, which can restore the data with a certain probability when they are damaged. In 2008, Shacham and Waters [16] proposed a compact POR scheme. The scheme draws on the idea of homomorphic authentication tags and effectively aggregates the evidence into a smaller value, allowing the verifier to perform any number of audits while also reducing the communication overhead of the verification process. POR and PDP have their own application scenarios. The former can recover damaged data, while the latter is more flexible and can be applied to privacy protection, dynamic auditing, and batch auditing. Cloud auditing schemes are constantly being improved based on users' needs. While scholars study how to reduce computing and communication costs, they also try to expand functions horizontally or combine auditing with different technologies for innovation. In 2013, Zhao et al. [17] proposed the first identity-based cloud audit scheme, which uses random mask technology to achieve privacy protection. In an identity-based cloud auditing scheme, only the private key generator (PKG) holds a public-private key pair and its public key certificate. The public keys of other users can be calculated from their identity information, and their private keys are generated by the PKG, which reduces the calculation and communication overhead of the scheme. In 2015, Zhang and Dong [18] proposed the first certificateless cloud audit scheme that can resist malicious auditors. Thus, the concept of malicious auditors was introduced into cloud audit schemes for the first time.
The certificateless cloud audit scheme solves the certificate management problem of certificate-based cloud audit solutions and the key escrow problem of identity-based audit solutions. In 2016, Xin et al. [19] combined transparent watermarking technology with the auditing scheme, proposing a scheme to audit the integrity of static multimedia data, which can greatly save multimedia data calculation and storage costs.

Our Contribution.
Recently, an outsourced dynamic provable data possession scheme with batch update for secure cloud storage (ODPDP) was proposed by Guo et al. [20]. However, we find that there are security problems in their scheme. The adversary can easily forge authentication labels. Even if all the outsourced data have been deleted by the cloud server, the CSP can still give a correct data possession proof. Moreover, a malicious auditor can skip the auditing work and conspire with the cloud server to forge the audit log and deceive the client. Finally, we propose an improved secure auditing protocol, and a rough analysis shows that our new protocol is secure and can be used in practical settings.

Organization.
This paper is organized as follows. In Section 2, we describe the system model of our scheme. In Section 3, we review Guo et al.'s outsourced dynamic provable data possession scheme with batch update for secure cloud storage. In Section 4, we present our attacks on the original scheme to show that it is not secure. In Section 5, we give our improved secure auditing scheme and roughly analyze its security. Finally, in Section 6, we draw some conclusions.

System Model
First, for ease of understanding, the notations used in this paper and their corresponding meanings are listed in Table 1.
There are three entities in the system model of the ODPDP scheme: the CSP (cloud service provider), the client, and the auditor, as depicted in Figure 1.
The following three entities are involved: (1) CSP (cloud service provider): the service provider, which has abundant computing power and physical storage capacity and realizes the maintenance and management of the data received from the client. This party is honest but curious. (2) Client: the data owner, which outsources the data that need to be calculated and stored to the CSP, is concerned about the integrity of the outsourced data, and regularly checks whether the auditor performs the audit work honestly. (3) Auditor: the third-party auditor, which accepts audit tasks from the client and is responsible for ensuring the integrity of the client's data stored at the CSP.
The protocols used in the ODPDP scheme are as follows:

(1) Setup(1^k) → {client: sk_c, vk_c, sk, pk; auditor: sk_a, vk_a; CSP: sk_CSP, vk_CSP}: the random key generation protocol. Each participant inputs a security parameter k, and the protocol generates a signing-verifying key pair for each participant. For convenience of expression, we assume that every participant in each subsequent protocol always takes the other participants' public keys and its own secret keys as input.

(2) Store(client: M) → {client: P, C; auditor: P, C, T; CSP: P, M*}: the interactive protocol among the three parties. It takes the keys of the three participants and the data M owned by the client as input, and outputs the processed data M* = {M, Σ} for the CSP, where Σ is the tag vector of M generated by the client through the secret key sk. For the auditor, it outputs an RBMT T built on M. Besides, it outputs a public parameter P confirmed by all three participants and a contract C between the client and the auditor.

(3) AuditData(...) → {auditor: dec_a, L}: the interactive protocol between the CSP and the auditor, which convinces the auditor that the integrity of M stored at the CSP is intact. The auditor uses the functionality of Bitcoin to extract a pseudo-random challenge Q and sends it to the CSP. The CSP computes a proof of data possession based on Q and M* and sends it to the auditor for verification. The auditor verifies the proof from the CSP through Q, T, pk and then outputs a binary value dec_a indicating whether it accepts the proof, together with a log entry L recording the auditing behavior.

(4) AuditLog(...) → {client: dec_c}: the interactive protocol between the client and the auditor, which helps the client audit a log file Λ consisting of the log entries recorded by the auditor. The aim of this protocol is to check whether the auditor accomplished the auditing task.
After the auditor receives the random subset B of Bitcoin block indices released by the client, it calculates the proof of the specified logs based on B, T, λ and sends the proof to the client. The client checks the received proof and then outputs a binary value dec_c, indicating whether it accepts the proof. Compared with the AuditData protocol, this protocol runs much less frequently and is computationally much more efficient.

Review of Guo et al.'s Scheme
In Guo et al.'s scheme, three parties are involved: the user, the auditor, and the CSP. In their scheme, they used the rank-based Merkle tree (RBMT) to protect the integrity of the data block hashes, while the hash values and tags protect the integrity of the data blocks. Then, they proposed a multileaf-authenticated (MLA) solution for the RBMT to authenticate multiple leaf nodes and their indices together without storing status values and height values. At the same time, they proposed an efficient homomorphic verifiable tag (EHVT) based on the BLS signature to reduce the client's log verification effort. For the specific implementation of these techniques, one can refer to the original paper [20]. Concretely, the following algorithms are involved in their scheme.

Security and Communication Networks 3
3.1. Store Protocol.
The data file is divided into M = (m_1, m_2, ..., m_n), and each data block consists of s sectors, having the form m_i = (m_i1, m_i2, ..., m_is).
Constructing RBMT. With all data blocks, the client first computes the hash value h_i = H(m_i) of each data block m_i. Then, the client constructs the RBMT T on top of the ordered hash values, meaning that each leaf node w_i stores the corresponding hash value h_i.
Computing EHVT. Based on g, λ, and the secret key sk = x, the client computes the tag of each data block m_i as

σ_i = (λ^{h_i} · g_1^{m_i1} · g_2^{m_i2} ··· g_s^{m_is})^x.  (1)
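To make the tag computation above concrete, the following sketch instantiates it over a small Schnorr-style subgroup. This is a toy stand-in for the BLS pairing group used in the actual scheme; all parameter values and helper names here are our own illustration, not taken from [20]:

```python
import hashlib
import random

# Toy Schnorr group standing in for the BLS group: p = 2q + 1 with
# p, q prime, and g = 2^2 generates the order-q subgroup.  Toy sizes only.
p, q, g = 10007, 5003, 4

def H(block):
    """Hash a data block (a list of sectors) into Z_q."""
    data = b"".join(m.to_bytes(4, "big") for m in block)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def keygen(s, rng):
    """Client key generation: sk = x, pk = (g, lam, g_1..g_s, y)."""
    x = rng.randrange(1, q)
    lam = pow(g, rng.randrange(1, q), p)
    gs = [pow(g, rng.randrange(1, q), p) for _ in range(s)]
    return x, (g, lam, gs, pow(g, x, p))

def tag(x, lam, gs, block):
    """EHVT-style tag: sigma_i = (lam^{h_i} * prod_j g_j^{m_ij})^x mod p."""
    base = pow(lam, H(block), p)
    for g_j, m_ij in zip(gs, block):
        base = base * pow(g_j, m_ij, p) % p
    return pow(base, x, p)

rng = random.Random(1)
x, (g_, lam, gs, y) = keygen(2, rng)
sigma = tag(x, lam, gs, [123, 456])   # tag for one block with s = 2 sectors
```

In the real scheme the verification of such tags uses a bilinear pairing against the public key; the sketch only shows how the tag itself is assembled.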

Outsourcing Auditing Work.
Auditing work is outsourced to the auditor: T together with Sig_{sk_c}(T) is sent to the auditor. Then, the auditor verifies Sig_{sk_c}(T).

Agreeing Parameters.
A public parameter P = {n, h_root} needs to be agreed on by the three participants, where n denotes the number of data blocks and h_root denotes the Merkle root of T. In addition, the client and the auditor also need to agree on a contract C = {BI, F, l}, where C denotes the auditor's checking policy: the auditing work starts from the Bitcoin block index BI, the auditing frequency depends on F, and l dictates the number of challenged data blocks for each check. Then, the client deletes M and T from its local storage and maintains only a constant amount of metadata.

AuditData Protocol.
The scheme leverages the Bitcoin blockchain as a time-dependent pseudo-random source to generate periodic challenges. The auditor inputs a time t ∈ τ to obtain the hash value hash(b) ∈ {0, 1}^{l_hash} of the latest block that has appeared since time t in the Bitcoin blockchain.
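The derivation of the challenge keys from a block hash can be sketched with a counter-mode SHA-256 construction. This is our own stand-in for the scheme's PRBG; the key lengths and function names are illustrative assumptions, not the construction from [20]:

```python
import hashlib

def prbg(seed: bytes, nbytes: int) -> bytes:
    """Counter-mode SHA-256 pseudo-random bit generator (stand-in)."""
    out = bytearray()
    ctr = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return bytes(out[:nbytes])

def derive_challenge_keys(block_hash: bytes):
    """Split the PRBG output into the permutation key k_pi and PRF key k_f."""
    stream = prbg(block_hash, 64)
    return stream[:32], stream[32:]

# A hypothetical Bitcoin block hash seeds both keys deterministically, so
# CSP, auditor, and client all derive the same challenge from the chain.
k_pi, k_f = derive_challenge_keys(bytes.fromhex("ab" * 32))
```

Because every party can read the same block hash from the public chain, the auditor cannot bias the challenge, which is the point of using Bitcoin as the randomness source.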
Then, PRBG is invoked on the input hash(b) to acquire pseudo-random bits, which the auditor uses to select a pair of keys (k_π^(b), k_f^(b)). At last, the auditor generates a challenge Q^(b) and sends it to the CSP, where the block b corresponds to the time t.

Upon receiving the challenge Q^(b), the CSP first computes the challenged indices i_η = π_{k_π^(b)}(η) and the corresponding coefficients a_η (1 ≤ η ≤ l). Then, the CSP computes the proof of data possession ρ^(b) over the challenged data blocks to prove their integrity and responds to the auditor with the proof.

Then, the auditor verifies the correctness of ρ^(b). First, the auditor recomputes the challenged indices and coefficients. Second, the auditor computes the corresponding authentication value with T. Third, the auditor verifies the proof ρ^(b) by checking the verification equation. If the equation holds, the auditor is assured that the challenged data blocks are intact. Lastly, the auditor saves a log entry in the log file Λ to record the auditing work.

3.4. AuditLog Protocol.

The client chooses a random subset B of indices of Bitcoin blocks and sends it to the auditor. Once receiving B, the auditor finds Q^(b), h^(b), and ρ^(b) from its log file Λ for each b ∈ B and computes the aggregated values.
In addition, for each b ∈ B, the auditor reads k_π^(b) from Q^(b) and computes the challenged indices i_η (1 ≤ η ≤ l) by invoking π_{k_π^(b)}(η). After eliminating the repetitive indices, the final ordered challenge index vector is denoted by (i_1, i_2, ..., i_c). Then, the auditor obtains the corresponding multi-proof ⊔_p. At last, the auditor generates the proof ρ^(B) of the appointed logs and sends it to the client with Sig_{sk_a}(ρ^(B)).
After verifying Sig_{sk_a}(ρ^(B)), for each b ∈ B, the client first invokes PRBG(hash(b)) to get Q^(b) and reconstructs the challenged indices and coefficients i_η, a_η (1 ≤ η ≤ l). Then, the client verifies the correctness of ⊔_p. If the verification passes, it means that all the challenged leaf nodes w_{i_j} (1 ≤ j ≤ c) in ⊔_p are authenticated, and the corresponding hash value h_{i_j} stored in each leaf node w_{i_j} can be accepted by the client. Finally, with λ and all the authenticated h_{i_j}, the client verifies h^(B) by checking the corresponding equation. If this verification passes, the client checks the last equation by using her secret key sk and the verified h^(B). If that equation holds, the client is assured that the auditor honestly audited the CSP for all the past challenged data blocks appointed by B.
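The AuditData challenge-response flow can be sketched end to end. We again use a toy Schnorr subgroup in place of the BLS pairing group; consequently the final check below is done with the secret exponent x rather than a pairing, so it only illustrates the homomorphic aggregation, not the scheme's public verifiability. All names and parameters are our own illustration:

```python
import hashlib
import random

p, q, g = 10007, 5003, 4   # toy group: p = 2q + 1, g of order q

def H(block):
    data = b"".join(m.to_bytes(4, "big") for m in block)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prf(seed, eta, mod):
    return int.from_bytes(hashlib.sha256(seed + eta.to_bytes(4, "big")).digest(), "big") % mod

def tag(x, lam, gs, block):
    base = pow(lam, H(block), p)
    for g_j, m in zip(gs, block):
        base = base * pow(g_j, m, p) % p
    return pow(base, x, p)

def challenge(block_hash, n, l):
    """Derive l challenged (index, coefficient) pairs from a block hash."""
    return [(prf(b"pi" + block_hash, e, n), 1 + prf(b"f" + block_hash, e, q - 1))
            for e in range(l)]

def prove(M, tags, Q):
    """CSP side: aggregate challenged sectors and tags homomorphically."""
    s = len(M[0])
    mu = [sum(a * M[i][j] for i, a in Q) % q for j in range(s)]
    sigma = 1
    for i, a in Q:
        sigma = sigma * pow(tags[i], a, p) % p
    return mu, sigma

def verify(x, lam, gs, hashes, Q, mu, sigma):
    """Check sigma == (lam^{sum a*h_i} * prod_j g_j^{mu_j})^x mod p."""
    e = sum(a * hashes[i] for i, a in Q) % q
    expected = pow(lam, e, p)
    for g_j, mu_j in zip(gs, mu):
        expected = expected * pow(g_j, mu_j, p) % p
    return pow(expected, x, p) == sigma

rng = random.Random(2)
x = rng.randrange(1, q)
lam = pow(g, rng.randrange(1, q), p)
gs = [pow(g, rng.randrange(1, q), p) for _ in range(2)]

M = [[rng.randrange(q) for _ in range(2)] for _ in range(8)]  # 8 blocks, s = 2
tags = [tag(x, lam, gs, m) for m in M]
hashes = [H(m) for m in M]

Q = challenge(b"\xab" * 32, len(M), 4)   # hypothetical Bitcoin block hash
mu, sigma = prove(M, tags, Q)
print(verify(x, lam, gs, hashes, Q, mu, sigma))   # an honest proof verifies (True)
```

The key property shown is that the aggregated tag σ commits to exactly the challenged linear combination μ, so any change to the stored blocks or tags breaks the equality.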

Our Attack
In Guo et al.'s auditing protocol, the security model indicates that a malicious CSP cannot forge a false proof to pass the challenger's verification and that the client can resist collusion attacks between a malicious CSP and the auditor. However, we find that some key information can be extracted from the client's pk, the data blocks, and their corresponding tags, all of which are known to the CSP. In this section, we first show how the CSP extracts this key information and uses it to forge "correct" data blocks and their corresponding tags. Then, we show how a malicious CSP and auditor collude to use a false proof to pass the client's verification.

Attack I.
Our attack is based on the following observation: the public key of the client is

pk = (g, λ, g_1, g_2, ..., g_s, y),  (12)

and this public key is known to all; thus, the adversary can easily use it to forge authentication labels. Concretely, the adversary launches the following attack:

(1) In the Store protocol, the CSP receives M* from the client, which includes the client's data and the corresponding authentication tags. The adversary can thus obtain a large number of authentication tags:

σ_i = (λ^{h_i} · g_1^{m_i1} · g_2^{m_i2} ··· g_s^{m_is})^x, 1 ≤ i ≤ n.  (13)

(2) Writing g_j = g^{α_j}, the above equations can be rewritten as follows:

σ_1 = λ^{x h_1} · g^{α_1 x m_11} · g^{α_2 x m_12} ··· g^{α_s x m_1s},
σ_2 = λ^{x h_2} · g^{α_1 x m_21} · g^{α_2 x m_22} ··· g^{α_s x m_2s},
⋮
σ_n = λ^{x h_n} · g^{α_1 x m_n1} · g^{α_2 x m_n2} ··· g^{α_s x m_ns}.  (14)

The CSP knows the data blocks of the client, and it can calculate the corresponding hash values h_i through H(m_i). In order to simplify the attack process, let s = 2 and take three linearly independent tags σ_1, σ_2, σ_3 of the form

σ_i = A^{h_i} · B^{m_i1} · C^{m_i2}, i = 1, 2, 3, where A = λ^x, B = g^{α_1 x}, C = g^{α_2 x}.

(3) With these equations, the adversary can compute A, B, and C.
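The extraction step can be demonstrated concretely. The sketch below re-implements the tag scheme over the same kind of toy Schnorr subgroup (a stand-in for the BLS group; all parameters are illustrative) and shows that, knowing only public values, the data blocks, and their tags, an adversary can recover A = λ^x, B = g^{α_1 x}, C = g^{α_2 x} by inverting the exponent matrix modulo the (public) group order, and then forge a valid tag for a block the client never signed:

```python
import hashlib
import random

p, q, g = 10007, 5003, 4   # toy group: p = 2q + 1, g of order q

def H(block):
    data = b"".join(m.to_bytes(4, "big") for m in block)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def tag(x, lam, gs, block):
    base = pow(lam, H(block), p)
    for g_j, m in zip(gs, block):
        base = base * pow(g_j, m, p) % p
    return pow(base, x, p)

def inv3(E, q):
    """Invert a 3x3 matrix over Z_q (q prime) by Gauss-Jordan elimination."""
    n = 3
    A = [[E[r][c] % q for c in range(n)] + [int(c == r) for c in range(n)]
         for r in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if A[r][c] % q)
        A[c], A[piv] = A[piv], A[c]
        inv = pow(A[c][c], -1, q)
        A[c] = [v * inv % q for v in A[c]]
        for r in range(n):
            if r != c and A[r][c]:
                f = A[r][c]
                A[r] = [(A[r][k] - f * A[c][k]) % q for k in range(2 * n)]
    return [row[n:] for row in A]

# Honest setup (the adversary never uses x below).
rng = random.Random(7)
x = rng.randrange(1, q)
lam = pow(g, rng.randrange(1, q), p)
gs = [pow(g, rng.randrange(1, q), p) for _ in range(2)]   # s = 2

blocks = [[3, 1], [1, 4], [2, 7]]                 # three stored blocks
sigmas = [tag(x, lam, gs, b) for b in blocks]     # tags known to the CSP

# sigma_i = A^{h_i} * B^{m_i1} * C^{m_i2}; solve for A, B, C:
# row k of E^{-1} gives the exponents combining the tags into the k-th unknown.
E = [[H(b), b[0], b[1]] for b in blocks]
Einv = inv3(E, q)
ABC = []
for k in range(3):
    v = 1
    for i in range(3):
        v = v * pow(sigmas[i], Einv[k][i], p) % p
    ABC.append(v)
A, B, C = ABC

# Forge a valid tag for a block the client never signed.
fake = [999, 888]
forged = pow(A, H(fake), p) * pow(B, fake[0], p) * pow(C, fake[1], p) % p
print(forged == tag(x, lam, gs, fake))   # the forgery verifies
```

With A, B, and C in hand, the CSP can tag arbitrary data, so it can delete or replace the client's file and still answer every challenge correctly, which is exactly the attack described above.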