Co-Check: Collaborative Outsourced Data Auditing in Multicloud Environment

With the increasing demand for ubiquitous connectivity, wireless technology has significantly improved our daily lives. Meanwhile, together with cloud-computing technology (e.g., cloud storage services and big data processing), new wireless networking technology becomes the foundation infrastructure of emerging communication networks. Particularly, cloud storage has been widely used in services, such as data outsourcing and resource sharing, among the heterogeneous wireless environments because of its convenience, low cost, and flexibility. However, users/clients lose the physical control of their data after outsourcing. Consequently, ensuring the integrity of the outsourced data becomes an important security requirement of cloud storage applications. In this paper, we present Co-Check, a collaborative multicloud data integrity audition scheme, which is based on BLS (Boneh-Lynn-Shacham) signature and homomorphic tags. According to the proposed scheme, clients can audit their outsourced data in a one-round challenge-response interaction with low performance overhead. Our scheme also supports dynamic data maintenance. The theoretical analysis and experiment results illustrate that our scheme is provably secure and efficient.


Introduction
With the increasing demand for ubiquitous connectivity, wireless technology has significantly improved our daily lives. Meanwhile, together with cloud-computing technology (e.g., cloud storage services and big data processing), heterogeneous wireless networking technology has become a foundation infrastructure widely adopted by emerging communication networks, for instance, IoT (Internet of Things), C-RAN (cloud radio access network), and body-area networks, as shown in Figure 1. Particularly, cloud storage has been widely used in services such as wireless data outsourcing and resource sharing, thanks to its convenience, low cost, and flexibility. Nowadays, online service providers, such as Amazon and Baidu, operate large data centers and offer unlimited storage capacity for users, relieving their burden of local data management and maintenance [1,2]. In addition, cloud storage enables universal data access from any place. However, users lose physical control of their outsourced data, while the cloud storage service provider is not always trustworthy. Dishonest service providers may conceal the fact that users' data have been damaged due to misoperations or unexpected accidents. Even worse, malicious service providers may delete data seldom accessed by users to gain more benefits. How to ensure the integrity of remotely outsourced data therefore becomes a serious concern for users selecting cloud storage services.
Traditional data integrity verification solutions [3,4], which are based on hash functions and digital signatures, are impractical for auditing cloud data remotely because retrieving the outsourced files incurs unacceptable communication and computational overhead. To check remote data integrity effectively without retrieving the whole outsourced document, Ateniese et al. presented the first probabilistic verification model, called provable data possession (PDP), based on a homomorphic cryptography algorithm and sampling techniques [5]. Taking the public verifiability into [...] Most of these previous works mainly target the problem of data integrity audition in a single-cloud storage environment rather than a heterogeneous cloud infrastructure that combines multiple internal (private) and/or external (public) cloud resources [8,9]. In the multicloud environment, users split their data, duplicate file blocks, and outsource them to different CSP (Cloud Service Provider) servers. The solutions above cannot enforce data integrity checking efficiently in such an environment, where data are spread over multiple servers. Aiming at this problem, Zhu et al. proposed a cooperative provable data possession (CPDP) scheme [8,10] for the multicloud environment. However, in the CPDP scheme, one security parameter is independent of the other parameters, and thus servers can bypass the authentication by forging this parameter in the response sequence. Moreover, in the process of third-party public verification, the third party needs to know exactly where every data block is stored. This poses a threat to users' data storage privacy and increases the operation overhead for the third-party auditor, which must maintain the storage state of the file blocks. Besides effectiveness, efficiency is also a significant concern for a data integrity auditing solution in the multicloud storage environment.
In this paper, we present Co-Check, a collaborative multicloud data integrity audition scheme based on the BLS signature and homomorphic tags. Under the proposed scheme, users can audit their outsourced data in one challenge-response interaction with low communication cost. Our scheme also enables public verification and supports dynamic data maintenance, so that users can modify and delete data with low performance overhead. The contributions made by this paper are summarized as follows.
(i) We propose an effective collaborative multicloud data audition scheme enabling users to conduct data integrity checking among multiple CSP servers simultaneously in a one-round challenge-response procedure.
(ii) The audition procedure of our scheme is stateless and supports unlimited challenge-response interactions. Moreover, the proposed scheme supports dynamic data maintenance efficiently.
(iii) We prototype our scheme and conduct a system evaluation. The theoretical analysis and experiment results illustrate that our scheme is provably secure and efficient.
Paper Organization. The rest of this paper is organized as follows. Section 2 describes the security goals, system model, and overall architecture of our approach; Section 3 presents the collaborative multicloud data integrity audition scheme; Section 4 gives the theoretical analysis and evaluates our protocol with respect to security and performance; Section 5 discusses the related work; and Section 6 concludes the paper.

Approach Overview
2.1. System Framework. As shown in Figure 2, the general multicloud storage system includes three types of network entities.
(i) Client (or User). (We use the terms user and client interchangeably in this paper.) Clients outsource data to reduce local storage overhead and make use of the computation resources provided by the cloud service providers in the multicloud storage system.
(ii) Cloud Service Provider (CSP). CSPs that possess a large quantity of hardware and software resources are clustered to provide remote data storage services. We assume that there is an organizer in the CSP cluster, a mediation node that interacts with users and other CSPs.
(iii) Third-Party Authority (TPA). The TPA is an optional, partially trusted entity in the multicloud scenario.
In the multicloud storage system shown in Figure 2, the user splits her/his documents into several file blocks. The file blocks are distributed across the cloud storage servers deployed by different cloud service providers. In addition, to improve access efficiency and ensure data retrievability, users might also duplicate the file blocks and spread the copies over several cloud servers.
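The splitting and replication just described can be sketched as follows. This is a minimal illustration, assuming fixed-size blocks and a simple round-robin placement; the function names, the block size, and the placement rule are hypothetical and are not prescribed by the scheme.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, csps: List[str], copies: int) -> Dict[int, List[str]]:
    """Assign every block index to `copies` distinct CSPs in round-robin order.

    Hypothetical policy: copy r of block i goes to CSP number (i + r) mod len(csps).
    """
    if copies > len(csps):
        raise ValueError("cannot place more copies than there are CSPs")
    return {
        i: [csps[(i + r) % len(csps)] for r in range(copies)]
        for i in range(num_blocks)
    }


data = b"0123456789"
blocks = split_into_blocks(data, 4)
placement = place_blocks(len(blocks), ["CSP1", "CSP2", "CSP3"], copies=2)
```

Any placement policy would do here; the only property the later stages rely on is that each copy of a block ends up on a different server.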

Challenges and Goals.
As the CSPs in the multicloud system cannot always be trusted, users need an integrity audition mechanism that ensures their outsourced data are stored correctly, without unauthorized access by CSP servers or other entities. To make the audition more efficient, another challenge of data integrity audition in the multicloud environment is to conduct parallel checking, that is, verifying the integrity of file blocks stored on different CSP servers simultaneously. Moreover, securely supporting dynamic maintenance is also a major concern of multicloud data audition.
Aiming to address the above challenges, the goal of this paper is to propose an effective multicloud data integrity audition mechanism satisfying the following requirements.
(i) Correctness: benign servers will prove themselves successfully, and no misbehaving server can bypass the checking.
(ii) Batch verification: the client can simultaneously verify the integrity of the file blocks distributed over different CSP servers without retrieving the file.
(iii) Stateless and unbounded checking: the audition procedure is stateless and supports unlimited challenge-response interactions.

Collaborative Data Integrity Audition Model.
Our collaborative data audition model consists of three stages, as defined in our preliminary version [11]: initialization, challenge-response, and integrity checking. Motivated by the sampling technique introduced by Ateniese et al. [5], users split their files and distribute the file blocks among the cloud service providers (CSPs) in the initialization and preprocessing stage. Meanwhile, users keep the corresponding metadata for future audition. Here we use the BLS signature to create the homomorphic tags because of its homomorphic property. Instead of retrieving the whole file to verify its correctness, in stages II and III users generate the audit challenges from part of the metadata stored at the client side, which improves audition efficiency and ensures, with a high confidence rate, that malicious CSPs cannot bypass the check. Additionally, our scheme designates a subprocedure to support dynamic maintenance. The procedure of our scheme is shown in Figure 3.
(1) Stage I: Initialization and Preprocessing. Stage I consists of steps (1)-(2) in Figure 3. In step (1), the user selects system parameters and generates the keys for the BLS algorithm used in the successive steps. Meanwhile, the user splits the file into a set of file blocks, each of which consists of several file sectors. Then the user computes the homomorphic tags corresponding to the file sectors. After preprocessing the outsourced file, the user distributes the file blocks, together with the metadata for audition, to the cloud servers belonging to the different CSPs and keeps the secret parameter locally.
(2) Stage II: Challenge-Response. Stage II includes steps (3)-(6) in Figure 3. When the user wants to audit her/his outsourced file, she/he computes a challenge sequence corresponding to the file blocks under test. The user sends the challenge sequence to the organizer, and the organizer forwards the challenges to the target CSP servers that store the user's file blocks. The CSP servers calculate their proofs and return them to the organizer. The organizer aggregates the proofs received and sends the corresponding answer to the user.
(3) Stage III: Integrity Checking. Based on the response received from the organizer, the user verifies the data integrity in step (7) shown in Figure 3. If the data are stored correctly, the algorithm outputs "TRUE"; otherwise, it outputs "FALSE," which means that there exist misbehaving CSP servers.
Dynamic Maintenance. When users need to conduct dynamic operations on their outsourced data, they recreate the tags corresponding to the new file sectors and send them to the organizer for updating. All the symbols used in this paper are listed in Notation.
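The confidence rate obtained from sampling can be quantified with a short calculation. The sketch below assumes, hypothetically, that t of the n outsourced blocks are corrupted and that the user challenges c blocks chosen uniformly at random without replacement; the function name and its parameters are illustrative and are not taken from the scheme.

```python
from math import comb


def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that at least one of c sampled blocks is among the t corrupted ones.

    The sample is drawn without replacement from n blocks, so the chance of
    missing every corrupted block is C(n - t, c) / C(n, c).
    """
    if not (0 <= t <= n and 0 <= c <= n):
        raise ValueError("require 0 <= t <= n and 0 <= c <= n")
    return 1.0 - comb(n - t, c) / comb(n, c)


# Hypothetical workload: 10,000 blocks, 1% corrupted, 460 blocks challenged.
p_detect = detection_probability(10_000, 100, 460)
```

Under these assumed numbers the detection probability exceeds 0.99, which is why a challenge covering only a small fraction of the blocks can still give a high confidence rate.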

Collaborative Multicloud Data Integrity Audition Scheme
In this section, we present our collaborative multicloud data integrity audition scheme in detail. The notations and concepts employed in our work are illustrated below.
(i) $\mathrm{params} = (p, G, G_T, e, g)$ is the system parameter. $p$ is a large prime number and is the order of the cyclic group $G$; $e : G \times G \to G_T$ is a nondegenerate bilinear map. $g$ is the generator of $G$.
(ii) $c$ is the number of CSPs, and the CSP set is represented as $\{\mathrm{CSP}_1, \mathrm{CSP}_2, \ldots, \mathrm{CSP}_c\}$.
(iii) $F$ is the user's file and $F_n$ is the file name. The file $F$ is separated into $n$ blocks, each of which contains $s$ sectors: $F = \{m_{ij}\}_{n \times s}$, where $m_{ij} \in \mathbb{Z}_p$.
(iv) $Q$ is the challenge generated by users.
As shown in Figure 3, our scheme includes three entities: a user, CSP servers, and an organizer, which is also one of the CSP servers. The integrity checking scheme is carried out in the following eight steps.
Step 1 (initialization). The user selects a security parameter and the system parameters. She/he randomly selects an $x \in \mathbb{Z}_p^*$ as the private key. The public key is $v \leftarrow g^x \in G$. Then the user gets pk = $\{v, g\}$ and sk = $\{x\}$.
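As a reminder of why BLS-style keys yield homomorphic tags, the identities below sketch the standard BLS relations. The notation is an assumption made for this illustration: $x$ is the private key, $v = g^x$ the public key, $H$ a hash function onto $G$, and $\nu_i$ hypothetical challenge coefficients. This is the textbook BLS signature, not necessarily the exact tag construction of the scheme.

```latex
% Standard BLS signing and verification (assumed notation)
\sigma = H(m)^{x}, \qquad
e(\sigma, g) = e\bigl(H(m)^{x}, g\bigr) = e\bigl(H(m), g\bigr)^{x}
             = e\bigl(H(m), g^{x}\bigr) = e\bigl(H(m), v\bigr).

% Homomorphic aggregation: several signatures collapse into one pairing check
\prod_{i} \sigma_i^{\nu_i} = \Bigl(\prod_{i} H(m_i)^{\nu_i}\Bigr)^{x}
\;\Longrightarrow\;
e\Bigl(\prod_{i} \sigma_i^{\nu_i},\, g\Bigr) = e\Bigl(\prod_{i} H(m_i)^{\nu_i},\, v\Bigr).
```

The second identity is what allows tags held by different servers to be combined and checked with a single pairing equation.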
The user splits the file $F$ into $n$ blocks, each of which contains $s$ sectors, so that $F = \{m_{ij}\}_{n \times s}$. We assume that $\mathrm{num}_i$ ($i = 1, \ldots, n$) is the total number of copies of data block $m_i$ stored in different CSPs, and that a per-block counter records how many times each block has been updated. The initial value of every counter is 0. Each block is represented by the concatenation of its index $i$, $\mathrm{num}_i$, and its update counter, where $\|$ denotes concatenation.
The user randomly selects $s$ parameters $u_1, \ldots, u_s \in G$ and computes one tag for each of the $\mathrm{num}_i$ copies of every data block $m_i$ ($i = 1, \ldots, n$); thus the set of all tags is obtained. As shown in Figure 4, $m_i$ ($i = 1, \ldots, n$) represents the data blocks from the file.
Step 2 (data outsourcing). The user sends the file $F$ and the corresponding tags to the organizer, and the organizer distributes the data blocks with their corresponding tags to different CSP servers (as shown in Figure 5). If a file block is stored in several copies, every copy of the file block has its own tag. For instance, if data block $m_i$ is stored in $\mathrm{num}_i$ copies, then there are $\mathrm{num}_i$ tags, and each CSP holding a copy stores $m_i$ along with one of these $\mathrm{num}_i$ tags. The user computes the public parameter and sends it to the trusted third party for storage. The user keeps the private key at the client side.
Step 4 (challenge delivery, Forward(chal)). The organizer forwards the received challenge chal to the CSP servers. Without loss of generality, we assume that a given number of CSP servers store the blocks challenged by the user.
Step 5 (proof creation and delivery, GenProof). Each service provider that stores challenged blocks computes its evidence according to formula (3) and returns the resulting proof to the organizer.
Step 6 (proof aggregation and response, Aggregation). The organizer aggregates the proofs received from the CSP servers into a single response and sends it to the user.
Step 7 (user verification). After receiving the two-component response sent by the organizer, the user gets the public parameter from the trusted third party and verifies the response according to formula (4). If formula (4) holds, the outsourced data are stored correctly and the output is "TRUE"; otherwise, the output is "FALSE." We summarize the interactions of collaborative auditing in Figure 6.
Dynamic Update. When users need to update a data block, they increase the block's update counter by one, recompute the tags of all $\mathrm{num}_i$ copies of the block, and send the updated block along with the corresponding tags to the organizer. After that, the organizer conducts the distributed storing operation. Because each tag is bound to the position of its data block in the sequence, the scheme supports only part of the update operations, namely, data modification and deletion.
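The challenge, proof, aggregation, and verification steps can be illustrated with a small simulation. This is only a sketch under stated assumptions: the tag shape (a hashed block identifier multiplied by one public element per sector, raised to the private key) follows the common BLS-based homomorphic tag pattern and is not claimed to be the scheme's exact formula. Group elements are represented by their exponents modulo a prime, so the pairing degenerates to multiplication and the model is insecure by construction. All function names and block identifiers are hypothetical.

```python
import hashlib
import secrets

# Toy model, insecure by construction: a group element g^a is represented by
# its exponent a modulo the prime P, so the pairing e(g^a, g^b) becomes a*b mod P.
P = 2**127 - 1  # Mersenne prime standing in for the group order


def h(block_id: str) -> int:
    """Stand-in for hashing a block identifier into the group."""
    return int.from_bytes(hashlib.sha256(block_id.encode()).digest(), "big") % P


def make_tag(x, u, block_id, sectors):
    """Exponent form of (H(id) * prod_j u_j^{m_j})^x, the assumed tag shape."""
    return x * (h(block_id) + sum(uj * mj for uj, mj in zip(u, sectors))) % P


def gen_proof(challenge, stored_blocks, stored_tags):
    """Proof of one CSP over the challenged blocks it actually stores."""
    hit = [(bid, coef) for bid, coef in challenge.items() if bid in stored_blocks]
    sigma = sum(coef * stored_tags[bid] for bid, coef in hit) % P
    width = len(next(iter(stored_blocks.values())))
    mu = [sum(coef * stored_blocks[bid][j] for bid, coef in hit) % P
          for j in range(width)]
    return sigma, mu


def aggregate(proofs):
    """Organizer: combine the per-CSP proofs into one response."""
    sigma = sum(p[0] for p in proofs) % P
    mu = [sum(column) % P for column in zip(*(p[1] for p in proofs))]
    return sigma, mu


def verify(v, u, challenge, response):
    """Check e(sigma, g) == e(prod H(id)^coef * prod u_j^mu_j, v) in exponent form."""
    sigma, mu = response
    rhs = sum(coef * h(bid) for bid, coef in challenge.items())
    rhs += sum(uj * mj for uj, mj in zip(u, mu))
    return sigma == rhs * v % P


x = secrets.randbelow(P - 1) + 1   # private key
v = x                              # public key g^x in exponent form (toy only)
u = [secrets.randbelow(P - 1) + 1 for _ in range(3)]
blocks = {"F|0": [1, 2, 3], "F|1": [4, 5, 6], "F|2": [7, 8, 9], "F|3": [10, 11, 12]}
tags = {bid: make_tag(x, u, bid, m) for bid, m in blocks.items()}
csp1 = {bid: blocks[bid] for bid in ("F|0", "F|1")}
csp2 = {bid: blocks[bid] for bid in ("F|2", "F|3")}
challenge = {bid: secrets.randbelow(P - 1) + 1 for bid in ("F|0", "F|2", "F|3")}

honest = aggregate([gen_proof(challenge, csp, tags) for csp in (csp1, csp2)])
ok = verify(v, u, challenge, honest)

csp2_bad = {**csp2, "F|2": [7, 8, 0]}  # one sector silently corrupted
cheating = aggregate([gen_proof(challenge, csp1, tags),
                      gen_proof(challenge, csp2_bad, tags)])
bad = verify(v, u, challenge, cheating)
```

In this toy run the honest response verifies and the response computed over the corrupted block does not. A real implementation would replace the exponent arithmetic with pairing-group operations, for example from the PBC library used in the evaluation.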

Security Analysis.
In this section, we prove two properties to ensure data integrity under our scheme.

Theorem 1. Correctness. If all CSP servers keep user's data correctly, they can successfully pass the challenge-response verification procedure initiated by the user.
Proof. To verify the data correctness, according to step (7), the user evaluates the pairing in formula (4). It can be noticed from steps (5)-(6) [...] This completes our proof.
Theorem 2. If there exists a probabilistic polynomial-time adversary adv that is able to convince the TPA to accept fake proof information for a corrupted file with nonnegligible probability, then it is possible to construct a polynomial-time algorithm $\mathcal{B}$ that solves the computational Diffie-Hellman (CDH) problem with nonnegligible probability by invoking adv.

[Figure: collaborative multicloud data auditing between the client and the organizer]
Proof. Suppose that the algorithm $\mathcal{B}$ is given an instance $(g, g^a, g^b)$ of the CDH problem, and its goal is to compute $g^{ab}$. The algorithm $\mathcal{B}$ executes the following interactive game with adv in the security model.
Setup. Let $v = g^a$ be the public key of the user, and choose a hash function $H : \{0, 1\}^* \to G$, which acts as a random oracle in the following security proof. For each of the public sector parameters, $\mathcal{B}$ randomly selects an exponent and sets the parameter accordingly. Finally, it returns the public parameters params to the adversary adv.
Hash Query. At any time, the adversary adv can adaptively query the hash oracle on a string of its choice. To respond to these queries, the algorithm maintains an $H$-list, which is initially empty, and responds as follows: (1) If an entry for the queried string already exists in the $H$-list, $\mathcal{B}$ retrieves the corresponding tuple and sends its hash value to the adversary adv.
TagGen Oracle. At any time, the adversary can adaptively query the TagGen oracle with a message. To respond, $\mathcal{B}$ executes as follows: (1) First, it divides the message into blocks.
Challenge. The adversary adv chooses a subset of the indices $\{1, \ldots, n\}$ of the data blocks such that at least one index in the subset has flag value 1 in its $H$-list tuple, and every index in the subset has been queried to the hash oracle before.
The challenger then sets the challenge information chal, which pairs each chosen index with a coefficient, and sends it to the adversary.
Thus, we have It means that It means that the solution of the CDH problem can be solved.
From the above simulation, we know whether  could output the correct solution of CDH problem depends on whether the simulation aborts during the TagGen Query and Challenge phases and whether the adversary could output a valid proof information for the challenge information.The adversary is allowed to make the Hash Query at most  times.Nonabort probability during TagGen Query phase requires that all   = 0 for  = 1 to ; thus its probability is ()  .Nonabort probability during Challenge phase requires that at least one index  * 's   * = 1; thus, its probability is at least (1 − ) −1 , where  is the size of subset .Thus, its success probability is When  = 1/, then This completes our proof.

Performance Analysis.
We prototyped our algorithm using the PBC library to implement the cryptographic primitives. The evaluation was conducted on a desktop with an Intel Core 2 Duo CPU @2.66 GHz, running Ubuntu 10.10 in Oracle VM VirtualBox Version 4.2.10 configured with 2 GB of memory.
The security parameter of the bilinear pairing function is configured as 80, which means the prime number $p$ is 160 bits. In the evaluation, we set the file size to 80 KB, 160 KB, and 320 KB, respectively. The results of the evaluation are given in Table 1.
The experiment results in Table 1 show that the time costs of preprocessing and challenge generation are not influenced by the number of file blocks. The time cost of proof generation decreases as $n$, the number of file blocks, decreases; in contrast, the time cost of verification increases when $n$ decreases. The time cost of preprocessing increases proportionally with the file size. When the file size increases, the challenge generation time remains almost unchanged, while the time costs of proof generation and verification increase.

Related Work
Based on different properties of the proposed models or schemes, related work can be classified as static data verification schemes, integrity verification schemes supporting dynamic operation on data, and verification schemes in multicloud environments. In this section, we discuss the related work in detail.
Another solution, which is based on the RSA signature, requires users to sign the data before they are outsourced, with the labels kept at the server side. Challenges can be issued randomly in the process of verification, and the bandwidth is $O(1)$. The computational cost of the server is $O(n)$ ($n$ is the number of file blocks), which increases linearly with the file size. Gazzoni Filho and Barreto [4] proposed a remote data integrity verification scheme that combines RSA signature and hash function techniques. Their method can verify the same file an unlimited number of times, but the whole data package is required to conduct each verification.
Sebe et al. [12] proposed a new integrity verification scheme based on Diffie-Hellman key exchange. In their scheme, the computational overhead at the user and server sides is $O(1)$, while the storage cost at the user side, $O(n)$, increases linearly with the entire data size. Their follow-up work [13] combines Diffie-Hellman key exchange and the RSA signature to realize remote data integrity verification.
To reduce computational overhead, Ateniese et al. [5,14] proposed a probabilistic remote data integrity checking scheme called provable data possession (PDP), which uses homomorphic verification tags and a sampling technique.
Ateniese et al. [6] proposed a framework that adopts homomorphic identification protocols for data integrity verification and demonstrated it by instantiating the framework with the homomorphic identification protocol of Shoup [15]. The authors define the model of homomorphic identification, the model of data integrity verification, and the corresponding attack models.
The schemes above can only detect whether data are properly stored; they cannot correct errors or recover the data. Another branch of remote data integrity checking therefore focuses on error correction and retrievability along with cloud data audition.
Juels and Kaliski Jr. [16] combined data possession checking with error-correcting coding and were the first to propose the POR (proof of retrievability) model for remote data storage. This model adds indistinguishable sentinels to the encoded data, which not only preserves data integrity but also ensures data availability. Their scheme is designed to handle encrypted data.
Shacham and Waters [17] proposed two types of POR schemes: one is a public authentication scheme based on the BLS signature, and the other is a private authentication scheme based on a pseudorandom function. Both schemes require little interaction and computation. Bowers et al. [18,19] introduced a POR scheme for distributed static data storage systems and implemented it in practice.
Naor and Rothblum [20] studied the issue of whether files are badly damaged when they are stored on a remote server. They first apply an error-correcting code to the entire file and then compute a message authentication code (MAC) for every data block to verify its integrity. When the integrity is damaged and the damage is within the error-correcting range, the errors can be detected and corrected.
Xu and Chang [21] proposed a highly efficient POR scheme, in which each data block consists of $s$ group elements (sub-blocks), the storage overhead is $1/s$ of the file size, and the computational cost is $O(s)$.
In addition, for static data in the cloud, multiple integrity verification schemes have been proposed that support public verification and preserve users' privacy. Cloud storage users worry not only that data on the cloud server may be damaged but also that their data may leak to unauthorized third parties, especially sensitive information such as personal health reports and corporate financial reports. Therefore, the most direct way to preserve privacy is for users to encrypt sensitive data before storing them in the cloud. With a data integrity detection scheme, they can still verify the data at any time.
Shah et al. [22] considered the problem raised by integrity verification of data storage after it is encrypted and proposed

Figure 1: Overall architecture of the multicloud based heterogeneous wireless network.

Figure 2: Multicloud based data storing for wireless communication systems.

Table 1: Evaluation results of our approach. Pre is the time cost for preprocessing; Gch is the time cost for generating the challenge; Gpr is the time cost for generating the proof; Ver is the time cost for verification.
5.1. Static Data Integrity Verification. Early research on outsourced verification focuses on static archive data. Deswarte et al. [3] were the first to propose remote data integrity verification. They proposed two solutions to this problem. One is to precompute the hash value of a file and check whether the hash value returned by the server matches the one stored locally; this solution can significantly reduce the communication bandwidth between users and the server to