Towards Dynamic Remote Data Auditing in Computational Clouds

Cloud computing is a significant shift of computational paradigm in which computing as a utility and remote data storage have great potential. Enterprises and businesses are increasingly interested in outsourcing their data to the cloud to lessen the burden of local data storage and maintenance. However, the outsourced data and the computation outcomes are not always trustworthy because the data owners lack control over, and physical possession of, the data. To address this issue, researchers have focused on designing remote data auditing (RDA) techniques. The majority of these techniques, however, are only applicable to static archive data and cannot audit dynamically updated outsourced data. We propose an efficient RDA technique based on algebraic signature properties for cloud storage systems and also present a new data structure capable of efficiently supporting dynamic data operations such as append, insert, modify, and delete. Moreover, this data structure makes our method applicable to large-scale data with minimum computation cost. The comparative analysis with the state-of-the-art RDA schemes shows that the proposed scheme is secure and highly efficient in terms of the computation and communication overhead on the auditor and server.


Introduction
Besides being a promising business concept, cloud computing is also becoming the fastest growing segment of the IT industry. It is all about moving services, computation, and/or data off-site to an internal or external, location-transparent facility or contractor. It is a way to increase capacity or to add capabilities without investing in new infrastructure, licensing new software, or training new personnel. Although many cloud definitions exist, they all agree that this paradigm aims at offering every network-accessible computing resource "as-a-service" (XaaS); the most highly structured definition comes from the National Institute of Standards and Technology (NIST) [1][2][3]. According to it, cloud computing is a key technology for enabling convenient, on-demand network access to a shared pool of configurable computing resources with negligible service provider interaction or management effort. Therefore, enterprises and businesses tend to outsource their data to cloud storage without investing in extra hardware, software, and maintenance [4].
Although the cloud offers attractive services for data owners, storing data on a remote server and entrusting its management to a third party results in losing physical control over the data [5,6]. Though the cloud has a promising, resilient, and reliable architecture, the data in the cloud is still susceptible to many threats and encounters many security challenges, which might compromise the confidentiality, integrity, and availability of the data. For example, a cloud provider may delete less frequently accessed data to free disk space or hide data damage in order to protect its reputation. Owners may also lose their outsourced data on the cloud due to service and data disruptions in the servers of major cloud infrastructure providers such as Amazon.
The main contributions of this paper are as follows.
(1) We propose an efficient remote data auditing scheme for data storage in cloud computing based on algebraic signature. Our data auditing scheme incurs the minimum computation and communication cost on client and cloud server.
(2) We design a new data structure to efficiently support dynamic data operations, such as insert, append, delete, and modify operations. This data structure empowers our method to be applicable for large scale data with least computation cost on client and server.
(3) We implement our scheme to prove the security, justify the performance of our method, and compare it with the state-of-the-art data auditing methods.
The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 introduces the preliminaries and the fundamental concepts that are used in the construction of our method. In Section 4, we introduce the details of our remote data auditing scheme. Section 5 presents the security analysis of our scheme, and Section 6 gives the performance analysis in terms of computation overhead. Finally, the conclusion of this paper is presented in Section 7.

Related Work
Recently, a great deal of attention has been paid to the RDA schemes that are used to check the correctness of outsourced data in cloud computing [15][16][17][18][19][20][21][22]. Ateniese et al. [15] were the first to propose the provable data possession (PDP) scheme to check the correctness of static outsourced data in cloud storage without having to retrieve the data. They used an RSA-based homomorphic verifiable tag to combine the tags and to build a proof message that permits the client to check whether the server has specific blocks, even when the client has no access to the blocks. However, the PDP scheme incurs high computation and communication cost on the server side due to the usage of RSA numbering over the whole file. It also has linear storage for the user and fails to provide secure data possession when the server has a malicious intent [23,24]. In [16], Ateniese et al. addressed the lack of data update support in the original PDP method [15] and developed a semidynamic data auditing method based on symmetric-key operations. This method allows a user to modify, delete, or append the stored data in the cloud. However, the data owner needs to regenerate all remaining challenges during the update operation, which makes it inapplicable for huge files.
Juels and Kaliski [25] defined a type of RDA technique, namely, proof of retrievability (POR), in which an auditor is also able to recover from and mitigate data corruption by using forward error-correcting codes when data is stored in an untrusted cloud. To achieve this goal, the data owner creates a set of sentinel blocks by using a one-way function and inserts the sentinels randomly among the data blocks before uploading to the server. If the server modifies a small portion of the file, the verifier can detect it and check the integrity of the file easily because of the effect of the modification on the sentinels. However, the number of queries in such a method depends on the number of inserted sentinel blocks. Moreover, the POR method incurs high computation overhead on the client side because of the error recovery and data encryption processes. The work proposed by Shacham and Waters [26] improved the efficiency and security of the original POR based on the data fragmentation concept. The authors used the BLS homomorphic authentication [27] to generate a fixed-size tag by aggregating all of the tags to minimize the network computation cost and used the Reed-Solomon code to recover the corrupted blocks. The main disadvantage of this method is that it supports only static data. Furthermore, during the public verification process, the privacy of the data cannot be protected against a trusted third party. The majority of POR methods fail to efficiently support dynamic data update because the server is unable to determine the relation between the data blocks and the encrypted code words. Cash et al. [28] addressed this issue and designed the first dynamic POR scheme by using the ORAM technique [29]. However, the dynamic POR method also incurs high computation overhead on the client and server side.
The work by Erway et al. [18] addressed the dynamicity issue in the PDP schemes by combining the skip list [30], rank-based information, and an authentication dictionary. Each node in this data structure stores, as its rank, the number of nodes reachable from it. Although the dynamic PDP method ensures the integrity of variable-sized data blocks, it is unable to verify the integrity of an individual block [31].
Wang et al. [19] employed a combination of the Merkle hash tree (MHT) [32] and bilinear aggregate signature [27] to propose a dynamic remote data auditing in cloud computing. The main contribution of this method is in manipulating the classic MHT construction by sorting the leaf nodes from left to right in order to support dynamic update and determine the insert, delete, or modify positions by following this sequence and computing the root in MHT. However, the method leaks the data content to the auditor and incurs heavy computation cost on the auditor.
Yang and Jia [17] implemented an efficient data auditing scheme to overcome the privacy issue in [19]. The authors used the bilinearity property of the bilinear pairing to generate an encrypted proof that only the auditor is able to verify. They also designed a new data structure to support dynamic operations, in which the data owner needs to store a row, including the block index and the block logical location, for each block of the outsourced file. During the delete and insert operations, the auditor has to find the position of the required block ($i$) and shift the remaining blocks ($n - i$) to create or delete a row in this data structure. However, as the number of blocks in the data structure increases, the auditor needs to shift a huge number of blocks, which incurs high computation overhead on the auditor. The other drawback of this method is that deleting or inserting a large data block imposes high computational overhead on the auditor side. Furthermore, the bilinear pairing computation is more expensive than the algebraic structure that is used in our method [33,34].

Preliminaries
This section provides an overview on the background of our dynamic remote data auditing method. We first describe the general architecture of the remote data auditing protocol. Then, we state the fundamental technique of this method that is called algebraic signature in order to audit the outsourced data efficiently.

System Model.
The architecture of RDA protocols in a network comprises four key entities: (1) user: an enterprise or individual with permission to read the data stored in the cloud; (2) data owner (DO): an enterprise or business that stores its data in cloud storage and is able to perform update operations (modify, delete, and insert); (3) cloud storage provider (CSP): the entity responsible for backing up the user data and generating a proof in response to received challenges; and (4) third party auditor (TPA): the entity that audits and verifies the outsourced data. In public auditing models, the TPA checks whether the data remains intact over time. Private auditing schemes, however, do not support a TPA; only the DO can check the integrity of the data. Figure 1 depicts the typical RDA components and their interactions.

Algebraic Signatures.
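An algebraic signature is a type of hash function with algebraic properties that allows the signatures of unseen messages to be computed in a limited way. The fundamental feature of algebraic signature schemes is that taking the signature of the sum of some random blocks gives the same result as taking the sum of the signatures of the corresponding blocks [35].
Let $\alpha$ be a nonzero element of the Galois field and let a file $F$ be a vector of $n$ elements of the field, $F = (f[1], f[2], \ldots, f[n])$. The algebraic signature of $F$ is

$$S_\alpha(F) = \sum_{i=1}^{n} f[i] \cdot \alpha^{i-1},$$

where addition and multiplication are the operations of the Galois field. In the following, a number of algebraic signature properties are listed.

Proposition 1. Litwin and Schwarz [36] have also shown that the algebraic signature of the concatenation of two blocks $f[1]$ with length $r$ and $f[2]$ is computed by

$$S_\alpha(f[1] \,\|\, f[2]) = S_\alpha(f[1]) + \alpha^{r} \cdot S_\alpha(f[2]).$$

Proposition 2. The algebraic signature of the summation of the blocks of a file is equal to the summation of the signatures of the blocks:

$$S_\alpha(f[1] + f[2] + \cdots + f[n]) = S_\alpha(f[1]) + S_\alpha(f[2]) + \cdots + S_\alpha(f[n]).$$

Proof. Assume that the file $F$ is divided into $n$ blocks and each block consists of $s$ sectors. Then,

$$S_\alpha\Big(\sum_{i=1}^{n} f[i]\Big) = \sum_{j=1}^{s} \Big(\sum_{i=1}^{n} f[i][j]\Big) \alpha^{j-1} = \sum_{i=1}^{n} \sum_{j=1}^{s} f[i][j] \cdot \alpha^{j-1} = \sum_{i=1}^{n} S_\alpha(f[i]),$$

where $f[i][j]$ indicates the $j$th sector of block $i$ in file $F$.

Proposition 3. The algebraic signature of the summation of two files $F$ and $G$ is equal to the summation of the signatures of the files:

$$S_\alpha(F + G) = S_\alpha(F) + S_\alpha(G).$$

Proof. Assume that the files $F$ and $G$ each include $n$ blocks. Then, the summation of the signatures of these files can be computed by

$$S_\alpha(F) + S_\alpha(G) = \sum_{i=1}^{n} f[i] \cdot \alpha^{i-1} + \sum_{i=1}^{n} g[i] \cdot \alpha^{i-1} = \sum_{i=1}^{n} (f[i] + g[i]) \cdot \alpha^{i-1} = S_\alpha(F + G).$$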
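The linearity and concatenation properties in this section can be checked numerically. The following is a minimal sketch, assuming the standard form of the signature (a sum of symbols weighted by powers of a field element, with addition being XOR) over GF(2^16). The generator polynomial `PRIM_POLY`, the choice of the element `ALPHA`, and the helper names `gf_mul`, `gf_pow`, and `signature` are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch of algebraic signatures over GF(2^16).
# PRIM_POLY, ALPHA and all function names are assumptions made for this example.

PRIM_POLY = 0x1100B  # x^16 + x^12 + x^3 + x + 1 (assumed generator polynomial)
ALPHA = 0x0002       # the field element "x" (assumed signature base)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^16) elements: carry-less multiply with reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # field addition is XOR
        b >>= 1
        a <<= 1
        if a & 0x10000:          # degree reached 16: reduce by the polynomial
            a ^= PRIM_POLY
    return result


def gf_pow(a: int, e: int) -> int:
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


def signature(symbols: list[int]) -> int:
    """Signature of a sequence of 16-bit symbols: XOR of symbol_j * ALPHA^j."""
    sig = 0
    power = 1
    for symbol in symbols:
        sig ^= gf_mul(symbol, power)
        power = gf_mul(power, ALPHA)
    return sig


if __name__ == "__main__":
    p = [0x1234, 0xABCD, 0x0001, 0xFFFF]
    q = [0x0F0F, 0x8000, 0x7777, 0x0002]
    summed = [x ^ y for x, y in zip(p, q)]
    # Signature of a sum equals the sum of the signatures.
    print(signature(summed) == signature(p) ^ signature(q))
    # Signature of a concatenation: sig(P) + ALPHA^len(P) * sig(Q).
    print(signature(p + q) == signature(p) ^ gf_mul(gf_pow(ALPHA, len(p)), signature(q)))
```

Both checks hold for any choice of symbols because they rely only on the distributivity of field multiplication over XOR.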

The Proposed Scheme
This section presents the applied techniques and algorithms of our dynamic remote data auditing scheme. We also show the correctness proof of our method by using the characteristics of the algebraic signature technique.

Data Auditing Algorithm.
Suppose that file $F$ includes $n$ data blocks and each block is divided into $s$ sectors by using the data fragment technique. If the last block has fewer sectors, we increase the size of the block by setting $f[n][j] = 0$ for the missing sectors $j \le s$. Our data storage auditing scheme consists of the following phases.
Setup. The DO first generates the public and secret keys by executing the keygen algorithm ($\text{KeyGen}(1^k) \rightarrow (pk, sk)$). Then, the unique tag (metadata) for each block $f[i]$ of the input file $F$ is computed from the algebraic signature of the block together with $ID_F$, the unique identity of the file; $L_i$, the logical number of the block in the DCT; and $V_i$, the version of the data block. The DO also computes a value over the concatenation $(ID_F \,\|\, i \,\|\, L_i \,\|\, V_i)$ for each data block to prevent the replay attack. When all of the tags are generated, the DO outsources the data blocks along with the corresponding tags to the cloud.
Challenge. When the DO decides to check the correctness of the outsourced data, it selects $c$ data blocks randomly as a challenge message ($chal = \{l_i\}_{i=1}^{c}$) by using a pseudorandom permutation [37] keyed with a fresh randomly chosen key in order to prevent the server from anticipating the block indices.
Proof. Upon receiving the challenge message, the cloud computes a linear combination of the challenged blocks ($\mu$) and the aggregate of the authenticator tags ($\sigma$) as a proof message based on the received challenge and the corresponding tags.
Verification. Upon receiving the pair $(\mu, \sigma)$, the DO uses the algebraic signature of the block tags to verify the correctness of the blocks.
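The challenge, proof, and verification phases can be sketched as follows. This is a deliberately simplified illustration of why the check works, not the scheme itself: the tag here is only the algebraic signature of the block, so the binding to the file identity, logical index, and version (and therefore any security against a cheating server) is omitted. `random.Random(seed).sample` stands in for the keyed pseudorandom permutation, and the field parameters and all function names are assumptions made for the example.

```python
# Simplified sketch of the audit round trip (correctness only, no security).
# All names and parameters below are assumptions made for this example.
import random

PRIM_POLY = 0x1100B  # assumed generator polynomial for GF(2^16)
ALPHA = 0x0002       # assumed signature base


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^16) elements."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10000:
            a ^= PRIM_POLY
    return result


def signature(sectors: list[int]) -> int:
    """Algebraic signature of one block given as a list of 16-bit sectors."""
    sig, power = 0, 1
    for sector in sectors:
        sig ^= gf_mul(sector, power)
        power = gf_mul(power, ALPHA)
    return sig


def make_tags(blocks: list[list[int]]) -> list[int]:
    """Setup (simplified): one tag per block, equal to its signature."""
    return [signature(block) for block in blocks]


def challenge(n: int, c: int, seed: int) -> list[int]:
    """Challenge: pick c distinct block indices (stand-in for a keyed PRP)."""
    return random.Random(seed).sample(range(n), c)


def prove(blocks: list[list[int]], tags: list[int], chal: list[int]):
    """Proof: sector-wise sum of challenged blocks and sum of their tags."""
    mu = [0] * len(blocks[0])
    sigma = 0
    for i in chal:
        mu = [x ^ y for x, y in zip(mu, blocks[i])]
        sigma ^= tags[i]
    return mu, sigma


def verify(mu: list[int], sigma: int) -> bool:
    """Verification: signature of the summed blocks must equal summed tags."""
    return signature(mu) == sigma


if __name__ == "__main__":
    rng = random.Random(1)
    blocks = [[rng.randrange(1 << 16) for _ in range(8)] for _ in range(20)]
    tags = make_tags(blocks)
    chal = challenge(len(blocks), 5, seed=42)
    print(verify(*prove(blocks, tags, chal)))   # intact data passes
    blocks[chal[0]][0] ^= 1                     # corrupt one challenged block
    print(verify(*prove(blocks, tags, chal)))   # corrupted data fails
```

The check succeeds on intact data because the signature of a sum of blocks equals the sum of their signatures, which is the property the verification step relies on.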

Dynamic Data Operations.
To support dynamic data update, we propose a data structure called the Divide and Conquer Table (DCT). The DCT prevents the server from using a previous version of the stored data instead of the updated one to pass the verification phase (replay attack). The DCT consists of two components: the logical index ($L_i$) and the version number ($V_i$). The $L_i$ indicates the original index of the data block, and the $V_i$ indicates the current version of the block on the basis of the number of updates. When a data block is updated, its $V_i$ in the DCT must be incremented by 1. The index of each block in the DCT also denotes the physical position of the outsourced data block.
This data structure must be created by the DO before outsourcing a data block to the cloud. The DO is in charge of managing the DCT during update operations. Therefore, as the size of the file increases, a huge computation overhead is imposed on the owner side. For example, to insert a new data block after the $i$th block, the data owner must shift $n - i$ blocks, which wastes time and imposes additional computation overhead. To overcome this issue, we reduce the size of the DCT by dividing it into $k$ data structures, each of which is able to store $\lceil n/k \rceil$ of the data blocks. As a result, when the DO decides to insert a new block after the $i$th block, the data owner only needs to shift $\lceil n/k \rceil - i$ data blocks. The experimental results show that the proposed data structure is able to support large-scale data efficiently. In the rest of this section, we discuss how our scheme performs dynamic data operations, such as modify, insert, delete, and append.
Data Modification. One of the important requirements of remote data auditing techniques is to support the data modification operation, in which the DO is able to replace specified blocks with new ones. Suppose that the DO wants to modify the $i$th block of the file $F$. The DO executes the modification algorithm to perform the following modifications: (1) finding the specific DCT that holds the required block on the basis of the ranges of the DCTs and then updating $V_i = V_i + 1$; (2) generating a new block tag for the modified data block.
For the insert operation, the final step is (5) sending the insert request message to the CSP, which includes the file identity $ID_F$, the position $i + 1$, the new data block, and its tag values.
When the CSP receives this message, the new data block and the corresponding tag are inserted after position $i$ in the file. For example, Figure 3 illustrates that the data owner only needs to shift 3 entries down to insert a new block ($DCT_2[3] = \{16, 1\}$) after block $f[7]$ in the second table and then increases the ranges of all following tables and the uprange of $DCT_2$.
Data Append. The append operation refers to the insertion of a new data block at the end of the data blocks. Therefore, the DO only needs to insert a new row at the end of the last DCT without having to shift any entries of the DCTs. For instance, Figure 4 shows that to append a new block, the data owner only needs to create a free row in the last table and increase its uprange ($UR_3 = UR_3 + 1$).
Data Delete. The delete operation is the opposite of the insert operation: the $i$th block of the file $F$ ($f[i]$) is removed. To achieve this goal, the DO finds the DCT that contains the required block on the basis of the DCT ranges. Then, the block is removed by shifting all of the subsequent blocks ($\lceil n/k \rceil - i$) one position up. The DO sends a request to the CSP to delete the $i$th block of the file $F$. As shown in Figure 5, to delete the 4th data block ($f[4]$), the data owner only needs to shift up 1 row ($f[5]$); the ranges of the following tables are reduced along with the uprange of the first table ($UR_1 = UR_1 - 1$).
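The four dynamic operations can be sketched on a small in-memory model of the DCT. This is a minimal illustration under stated assumptions: each row is a `[logical_index, version]` pair, initial versions start at 1, a new block receives the next unused logical index, and the table ranges are derived from the table lengths instead of being stored. The class and method names are hypothetical.

```python
# Minimal in-memory sketch of a Divide and Conquer Table (DCT).
# Class name, method names and row layout are assumptions for illustration.


class DivideAndConquerTable:
    def __init__(self, n_blocks: int, k: int):
        size = -(-n_blocks // k)  # ceil(n / k) rows per table
        rows = [[i, 1] for i in range(1, n_blocks + 1)]  # [logical index, version]
        self.tables = [rows[j:j + size] for j in range(0, n_blocks, size)]
        self.next_logical = n_blocks + 1

    def _locate(self, i: int):
        """Map the 1-based block position i to (table number, row offset)."""
        if i < 1:
            raise IndexError(i)
        start = 0
        for t, table in enumerate(self.tables):
            if i <= start + len(table):   # i falls inside this table's range
                return t, i - start - 1
            start += len(table)
        raise IndexError(i)

    def modify(self, i: int):
        """Modify: bump the version number of block i."""
        t, pos = self._locate(i)
        self.tables[t][pos][1] += 1
        return self.tables[t][pos]

    def insert_after(self, i: int) -> int:
        """Insert a new row after block i; return how many rows were shifted."""
        t, pos = self._locate(i)
        table = self.tables[t]
        shifted = len(table) - (pos + 1)  # only rows of this table move down
        table.insert(pos + 1, [self.next_logical, 1])
        self.next_logical += 1
        return shifted

    def append(self) -> None:
        """Append: add a row to the last table; nothing is shifted."""
        self.tables[-1].append([self.next_logical, 1])
        self.next_logical += 1

    def delete(self, i: int) -> int:
        """Delete block i; return how many rows were shifted up."""
        t, pos = self._locate(i)
        table = self.tables[t]
        shifted = len(table) - (pos + 1)
        del table[pos]
        return shifted


if __name__ == "__main__":
    dct = DivideAndConquerTable(15, 3)      # three tables of five rows
    print(dct.insert_after(7))              # rows shifted in the second table
    print(dct.tables[1])
```

With 15 blocks in 3 tables, inserting after block 7 moves only the rows of the second table, and deleting block 4 moves only the last row of the first table, which mirrors the examples described for Figures 3 and 5.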

Security Analysis
In this section, we evaluate our remote data auditing construction in terms of security and correctness.
In the first step, we analyze the correctness of the verification algorithm. Upon receiving the challenge message ($\{l_i\}_{i=1}^{c}$), the CSP generates a pair $(\mu, \sigma)$ as a proof message by using (8). When the DO obtains the proof message from the server, it verifies the proof message to ensure the storage correctness by using (9). The equality in (9) holds because of the algebraic signature properties. Our scheme relies on the algebraic signature, which generates a small entity as a signature for each block and is able to reveal any modification of the original block. The algebraic signature can also verify a large amount of data stored on distributed storage systems with minimum computation and communication overhead [35]. Moreover, the probability of collision in the algebraic signature is negligible [36]. For example, if the length of the signature is 64 bits, the probability of collision is very small ($2^{-64}$). As a result, the algebraic signature technique is useful for verifying the correctness of outsourced data, especially on resource-restricted devices.

Performance Analysis
In this section, we assess the performance of the proposed remote data auditing method. We also analyze the probability of misbehavior detection of this scheme. We give the computation complexity during the insert, delete, append, and modify operations and compare the results with the state-of-the-art remote data auditing methods proposed by Yang and Jia [17] and Wang et al. [19].

Probability of Misbehavior Detection.
Our remote data auditing scheme is constructed on the basis of a random sampling strategy to reduce the workload on the server. In the sampling technique, the input file ($F$) is divided into several blocks ($n$) and a random number of blocks ($c$) are used to perform batch processing. We analyse the probability of misbehaviour detection of our scheme based on the block sampling.
Suppose the CSP modifies $m$ blocks out of the $n$ outsourced blocks. Then, the probability of corrupted blocks is equal to $p_b = m/n$. Let $c$ be the number of blocks that the DO asks to verify in the challenge step and let $s$ be the number of sectors in each block. Let $X$ be a discrete random variable that indicates the number of blocks chosen by the DO that match the blocks modified by the CSP. We compute the probability that at least one of the blocks picked by the DO matches one of the blocks modified by the server, namely, $P_X = P(X \ge 1)$, as follows:

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \prod_{j=0}^{c-1} \frac{n - m - j}{n - j}. \quad (14)$$

On one hand,

$$\frac{n - m - j}{n - j} \le \frac{n - m}{n}, \quad 0 \le j \le c - 1. \quad (15)$$

Therefore,

$$(14), (15) \Rightarrow P(X \ge 1) \ge 1 - \left(1 - \frac{m}{n}\right)^{c} = 1 - (1 - p_b)^{c}. \quad (16)$$

Since each of the blocks consists of $s$ sectors, this probability can also be expressed on the basis of sector corruption. On the other hand,

$$\frac{n - m - j}{n - j} \ge \frac{n - c + 1 - m}{n - c + 1}, \quad 0 \le j \le c - 1.$$

Therefore,

$$P(X \ge 1) \le 1 - \left(1 - \frac{m}{n - c + 1}\right)^{c}.$$

Then, we can conclude that the probability of misbehavior detection is in

$$1 - \left(1 - \frac{m}{n}\right)^{c} \le P(X \ge 1) \le 1 - \left(1 - \frac{m}{n - c + 1}\right)^{c}.$$

Suppose the DO divides a 1 GB file into 125,000 blocks of 8 KB each and outsources the blocks to the cloud. Figure 6 shows the number of challenge blocks ($c$) required to detect different numbers of corrupted blocks ($m$) when the probability of misbehaviour detection is taken from the set $P_X = \{0.7, 0.8, 0.9, 0.99, 0.99999\}$. For example, if the server modifies 0.1 of the outsourced blocks ($p_b$), the DO needs to randomly select 98 blocks as a challenge to achieve $P_X$ of at least 0.99999. As is clear, as the number of corrupted blocks increases, fewer challenge blocks are required to achieve a given probability of detection.
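The sampling analysis can be explored numerically. The sketch below assumes the usual model in which the DO samples $c$ of the $n$ blocks without replacement and the server has corrupted $m$ of them; it computes the exact detection probability, the lower bound $1 - (1 - p_b)^c$, and the smallest $c$ for which that lower bound reaches a target probability. The function names are hypothetical.

```python
# Sketch of the block-sampling detection probability (function names assumed).
import math


def detection_exact(n: int, m: int, c: int) -> float:
    """P(X >= 1) when c of n blocks are sampled without replacement
    and m blocks are corrupted: 1 - C(n - m, c) / C(n, c)."""
    return 1.0 - math.comb(n - m, c) / math.comb(n, c)


def detection_lower_bound(p_b: float, c: int) -> float:
    """Lower bound 1 - (1 - p_b)^c, with p_b = m / n."""
    return 1.0 - (1.0 - p_b) ** c


def min_challenges(p_b: float, target: float) -> int:
    """Smallest c such that the lower bound is at least the target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_b))


if __name__ == "__main__":
    n, m = 125_000, 1_250            # 1% of the blocks corrupted
    c = min_challenges(m / n, 0.99)
    print(c, detection_lower_bound(m / n, c), detection_exact(n, m, c))
```

Because the exact probability is never below the lower bound, choosing $c$ from the lower bound is a conservative way to size the challenge.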
Figure 7 illustrates the number of challenge blocks when the probability of misbehavior detection is between 0.5 and 1 with a variable rate of data corruption. For example, if the server modifies 0.01% of the outsourced blocks, the DO needs to randomly select 520 data blocks as a challenge to detect the corrupted blocks with a probability of 0.9899. It can also be seen that when the rate of corrupted blocks is more than 0.3%, the minimum number of challenge blocks is used to audit the outsourced data.

Evaluation and Experimental Results.
Table 1 shows a comparison of our scheme and state-of-the-art remote data auditing protocols based on the communication and computation overhead during dynamic data update, where $n$ is the number of blocks, $s$ is the number of sectors of a block, $c$ indicates the number of challenge blocks in each auditing query, and $k$ indicates the number of the DCTs.
From the table, we can see that the method of Wang et al. [19] has the maximum computation overhead during dynamic data update. In the Yang scheme [17], to insert a block after, or to delete, a specific block ($f[i]$), the verifier must shift ($n - i$) entries in the data structure. Therefore, the computation overhead of this method during insert and delete operations is $O(n)$. We improve our auditing scheme by designing a new data structure (DCT) to reduce the computation overhead. As mentioned earlier in Section 4.2, the verifier only needs to shift ($n/k - i$) blocks, which incurs $O(n/k)$ computation overhead on the verifier. It is important to mention that to find a block ($f[i]$) in the DCT structure, the verifier only needs to divide the location of the block by the number of blocks in each DCT and find the appropriate DCT, which incurs negligible overhead on the verifier.
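The difference between the two shifting costs can be made concrete with a small calculation. The sketch below assumes 1-based block positions and tables that are still at their initial size of $\lceil n/k \rceil$ rows (the last table may be shorter); the function names are hypothetical.

```python
# Sketch comparing shift counts for one table versus k DCTs (names assumed).


def rows_per_dct(n: int, k: int) -> int:
    """Number of rows stored in each DCT: ceil(n / k)."""
    return -(-n // k)


def locate(i: int, n: int, k: int):
    """Return (DCT number, position in that DCT), both 1-based, for block i.
    The quotient selects the table and the remainder gives the position."""
    quotient, remainder = divmod(i - 1, rows_per_dct(n, k))
    return quotient + 1, remainder + 1


def shifts_single_table(i: int, n: int) -> int:
    """Rows moved when inserting after block i in one table of n rows."""
    return n - i


def shifts_dct(i: int, n: int, k: int) -> int:
    """Rows moved when inserting after block i with k DCTs."""
    size = rows_per_dct(n, k)
    table, pos = locate(i, n, k)
    rows_in_table = min(size, n - (table - 1) * size)  # last table may be short
    return rows_in_table - pos


if __name__ == "__main__":
    n, k = 125_000, 10
    for i in (1, 60_000, 124_999):
        print(i, locate(i, n, k), shifts_single_table(i, n), shifts_dct(i, n, k))
```

In the worst case the single table moves close to $n$ rows, while the divided structure never moves more than the rows of one table.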
The first step in performing insert, delete, append, and modify operations is to identify which DCT contains the $i$th data block of the file. The auditor is able to find the $i$th data block by computing the quotient of a division of the requested block index ($i$) by the number of data blocks in each DCT structure ($n/k$). The quotient gives the DCT number, and the remainder of the division gives the position of the block in that DCT. To insert a new data block after the $i$th data block or to delete the $i$th data block, the auditor has to find the relevant DCT and the position of the block in it ($i$) and then move the remaining blocks of the DCT ($n/k - i$) forward or backward. Modifying a block only requires finding it in the DCTs and modifying the content. Finally, for an append operation, the auditor must insert a new data block after the last data block of the last DCT, which requires no shifting.
We set up our own Eucalyptus private infrastructure as a service (IaaS) cloud in order to conduct this experiment, using the existing IT infrastructure of the Center for Mobile Cloud Computing (C4MCC). Eucalyptus is an acronym for "Elastic Utility Computing Architecture for Linking Your Programs to Useful Systems" and is a Linux-based open-source software architecture that can be installed without modification on all major Linux operating systems such as RHEL, CentOS, Ubuntu, and Debian. We chose Eucalyptus, firstly, because of its compatibility with the Amazon AWS APIs [38], which means that we can use Eucalyptus commands to manage Amazon or Eucalyptus instances and move freely between a Eucalyptus private cloud and the Amazon public cloud, making it a hybrid cloud. Secondly, the Eucalyptus cloud computing architecture is highly scalable because of its distributed nature and is flexible enough to support businesses of any size.
Thirdly, it allows applications to be developed in-house on Eucalyptus and then migrated to AWS. Eucalyptus was initially designed at the University of California, Santa Barbara, to support high performance computing (HPC) research [39]. The main components of our Eucalyptus installation, each of which has its own Web-service interface, are as follows.
(1) Cloud controller (CLC) is actually the entry point into the cloud for administrators, managers, developers, and end-users and is accountable for satisfying the request of node managers. CLC is also responsible for making and implementing high level scheduling decisions with the help of cluster controllers.
(2) Cluster controller (CC) generally executes on a computer system that has network connectivity to the systems running node controllers (NCs) and to the machine running the CLC. It actually manages a number of VMs and schedules their execution on particular NCs.
(3) Node controller (NC) is executed on every system that is selected for hosting VM instances. It manages the life cycle of instances by making interaction with the OS and the hypervisor running on the same system and the CC.
(4) Storage controller (SC) implements block-accessed network storage similar to Amazon Elastic Block Storage (EBS). It has the ability to send disk traffic across the local network to a remote storage site.
(5) Walrus permits different users to store persistent data. It sets access control policies for users to allow certain operations such as delete and create. Its interface is compatible with Amazon's S3 and is used to store and access both the virtual machine images and user data. It is a file-level storage system, in contrast to the block-level storage provided by the SC.
We calculated the signature by defining multiplication in $GF(2^l)$ as polynomial multiplication modulo a generator polynomial. Multiplication by the unknown $X$ is carried out by a left shift followed by an XOR with a parameter corresponding to the generator polynomial. As a result, $\alpha$ can be identified with the unknown $X$, so that multiplication by $\alpha$ consists of a left shift operation followed by a conditional XOR. Broder [40] proposed a technique to perform several shift operations at one time by creating a table of the values that are used as the XOR operand. In this simulation, we assume that the length of a bit string ($l$) is 16 bits and the length of each block is 8 KB. We also divide each of the blocks into equal bit strings to compute the algebraic signature of each block.
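The shift-and-conditional-XOR multiplication described above can be sketched directly. The example below assumes 16-bit strings, the generator polynomial $x^{16} + x^{12} + x^{3} + x + 1$, big-endian byte order when cutting a block into bit strings, and Horner's rule to accumulate the signature; these choices and the function names are assumptions for illustration, and the table-based speedup attributed to Broder is not shown.

```python
# Sketch: block signature using only left shifts and conditional XORs.
# PRIM_POLY, the byte order and the function names are assumptions.

PRIM_POLY = 0x1100B  # x^16 + x^12 + x^3 + x + 1 (assumed generator polynomial)


def mul_alpha(x: int) -> int:
    """Multiply a GF(2^16) element by alpha: left shift, then XOR with the
    generator polynomial if the shift overflowed 16 bits."""
    x <<= 1
    if x & 0x10000:
        x ^= PRIM_POLY
    return x


def block_signature(block: bytes) -> int:
    """Signature sum_j p_j * alpha^j of a block cut into 16-bit strings p_j,
    evaluated with Horner's rule from the last string to the first."""
    if len(block) % 2:
        raise ValueError("block length must be a multiple of 2 bytes")
    sig = 0
    for offset in range(len(block) - 2, -1, -2):
        symbol = int.from_bytes(block[offset:offset + 2], "big")
        sig = mul_alpha(sig) ^ symbol
    return sig


if __name__ == "__main__":
    block_a = bytes(range(256)) * 32             # one 8 KB block
    block_b = bytes(reversed(range(256))) * 32   # another 8 KB block
    summed = bytes(x ^ y for x, y in zip(block_a, block_b))
    print(hex(block_signature(block_a)))
    # Linearity still holds with this evaluation order.
    print(block_signature(summed) == block_signature(block_a) ^ block_signature(block_b))
```

Horner's rule needs one multiplication by $\alpha$ per bit string, so an 8 KB block costs 4096 shift-and-XOR steps.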
We conduct the experiments by updating an outsourced file ($F$) of length 1 GB, consisting of 125,000 data blocks, and demonstrate the efficiency of the proposed scheme in Figure 8, where the number of updated (inserted or deleted) blocks increases from 100 to 1000 with intervals of 8. To insert or delete a block in the Wang scheme, the auditor needs to find the position of the block ($i$) in the MHT. Moreover, inserting or deleting a block requires recalculating the hash of the root each time, which incurs a huge computation overhead on the auditor. Similarly, in the Yang method, after finding the position of the block ($i$), the auditor has to shift the remaining ($n - i$) blocks for every insert or delete operation. Repeating this process many (100 to 1000) times results in a significant computation overhead on the auditor. The proposed method uses 10 DCTs of size 12,500 instead of the single array of size 125,000 used in the Yang scheme. Consequently, the reduced number of shifts in our method results in the minimum computation overhead on the client side. Figure 8 shows the performance in terms of computation cost under different numbers of update (insert or delete) operations. The analysis of the results shows the efficiency of our scheme.
We also show the impact of the file size on the computation overhead of the auditor in Figure 9, where the DO updates outsourced files of different sizes, from 1 GB to 10 GB, by inserting or deleting 100 blocks at random positions. The computation overhead of the Wang method increases dramatically from 0.8 to approximately 2.3 as the file size increases, because the auditor encounters a huge number of data blocks in the MHT. Similarly, in the Yang scheme, when the size of the input file increases from 1 GB to 10 GB with the same data block size (8 KB), the number of data blocks also increases. Consequently, the auditor has to shift a huge number of blocks to insert or delete a data block. As shown in Figure 9, our method incurs the minimum overhead on the auditor (at most 0.2 sec when the file size is 10 GB) because it uses 10 DCTs instead of one and applies the algebraic signature. Therefore, our method is applicable to auditing large-scale files dynamically. Figures 8 and 9 show the performance and efficiency of our scheme in terms of computation overhead. The comparative analysis shows that our scheme is more efficient than the Wang and Yang schemes.

Conclusion
In this paper, we present an efficient remote data auditing scheme to ensure the data storage security in cloud computing. To achieve this goal, we employed algebraic signature properties that empower our scheme to verify the integrity of outsourced data and reduce the computation overhead on the client and server side of the cloud. We also design a new data structure, namely, divide and conquer table, to support dynamic data update, including insert, delete, append, and modify operations. The divide and conquer table also allows the verifier to audit the large scale data and perform a large number of insert and delete operations with minimum computation overhead on the verifier and server.
The security and performance analysis shows the efficiency and provability of our scheme.
As part of future work, we will extend our scheme to find the optimal number of divisions in the divide and conquer table. We will also improve our scheme to make it applicable to distributed cloud servers.