Audit Outsourced Data in Internet of Things

With the increase in network transmission rates, the Internet of Things (IoT) has gradually become a trend. Users can upload the data generated by their devices to a cloud database to save local storage space, thereby reducing local storage costs. Because uploading data to the cloud means losing physical control of it, an audit is required. Traditional audit protocols are not entirely suitable for lightweight devices in the IoT. This paper proposes a new audit protocol suitable for lightweight devices with weak computing power. The protocol transfers part of the computation of data tags to a cloud storage provider (CSP) with strong computing power, thereby reducing the number of computing entities introduced. Our scheme supports dynamic operations on data and guarantees the effectiveness of the challenge response by means of a blockchain. Compared with existing schemes, our scheme is more secure and effective.


Introduction
Due to the large-scale application of 5G, the Internet of Things has developed rapidly. At the same time, many emerging technologies have appeared, such as cloud storage [1,2]. Because of advantages such as scalability and lower upfront cost, more and more entities choose to store data in the cloud. With cloud storage, users are freed from the physical limitations of local devices and can store and share their data anytime, anywhere. So far, many studies have focused on the cloud [3][4][5][6]. Gartner's latest cloud computing market tracking data shows that the global cloud computing infrastructure service market continued to grow rapidly in 2019, with year-on-year growth of 37.3% [7].
Although cloud storage brings great convenience, risks follow. The main risk of cloud storage is that users upload their data to the cloud server and lose physical control of it. At the same time, the cloud service provider is not completely reliable: it may suffer downtime or attacks and passively lose users' data. To make matters worse, the CSP may actively discard users' infrequently accessed data to reduce its own operating costs. Frequent cloud security incidents have aggravated people's concerns about cloud security and hindered the development of cloud storage [8].
In order to ensure the integrity of data, promptly detect dishonest behavior by the CSP, and urge the CSP to provide high-quality storage services, cloud security auditing has gradually become a hot issue in cloud storage. Through auditing, users can learn whether the data stored in the cloud has been damaged and, at the same time, can effectively supervise the services provided by the CSP.
Traditional cryptographic schemes cannot be used directly for auditing, and downloading the data from cloud storage for verification is impractical because of the excessive overhead. With the deepening of research, proof of retrievability (PoR) and provable data possession (PDP) models were proposed one after another. The data owner sets a small data tag for each data block. During an inspection, by aggregating a randomly selected set of data blocks and their tags into a small piece of evidence, damage to the data can be detected with high probability. In order to reduce the burden on users, a trusted third party is introduced to perform audits on their behalf, thus realizing public auditing.
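The "high probability" of detection under random sampling can be made concrete. The following is our own illustration of the standard PDP sampling analysis (not a formula taken from this paper): if t of n blocks are corrupted and the auditor samples c distinct blocks uniformly at random, the detection probability is 1 - C(n-t, c)/C(n, c).

```python
# Spot-checking probability in the PDP model (illustrative sketch; function
# name and parameters are ours, not from the scheme itself).
from math import comb

def detection_probability(n: int, t: int, c: int) -> float:
    """Probability that sampling c distinct blocks out of n hits
    at least one of the t corrupted blocks."""
    if c > n - t:  # sample larger than the intact set: detection is certain
        return 1.0
    return 1.0 - comb(n - t, c) / comb(n, c)

# With 10,000 blocks of which 1% are corrupted, roughly 460 random
# samples are enough for about 99% detection probability.
p = detection_probability(10_000, 100, 460)
```

This is why a constant-size challenge (a few hundred blocks) suffices regardless of the total file size.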
Existing public audit schemes can already support the auditing of dynamically updated data by constructing hash tables, Merkle hash trees, and other data structures, with the help of auxiliary authentication information. Although good progress has been made, many problems remain. We have conducted specific research and found the following deficiencies. First of all, in the data preprocessing stage, users need to calculate a large number of data block tags on their own, which is not friendly to devices with weak computing power. Secondly, the TPA needs to store some additional information, and with multiple users this storage requirement grows. Furthermore, current public auditing relies on the assumption that the TPA is honest: once the TPA colludes with the CSP, the entire public audit becomes invalid. In addition, even if the user tries to limit dishonest behavior of the TPA by checking the TPA's audit log, the audit protocol is still defeated by collusion attacks, because the TPA and CSP can deliberately select intact data blocks for auditing, which cannot be detected from the log.
After reviewing the shortcomings of previous schemes, and building on our previous research [9], we propose an efficient outsourcing audit scheme for lightweight devices in the IoT environment. Our contributions are as follows: (1) We have studied the outsourcing audit of lightweight devices in the IoT environment, designed an audit protocol suitable for lightweight devices, and proved the correctness of the proposed scheme under the PDP model. (2) Unlike most schemes, which outsource computing tasks to the TPA, we outsource them directly to the CSP, since the data is ultimately stored at the CSP and the CSP has powerful computing capabilities, thereby avoiding a waste of resources. (3) Unlike most schemes, in which the TPA selects the challenge blocks, we generate challenge blocks based on the Ethereum network, effectively avoiding collusion attacks.

Related Work
In recent years, with the increasing maturity of cloud storage technology, more and more users choose to outsource data to the cloud. Therefore, cloud security auditing has become particularly important. Blum et al. [10] conducted research on cloud auditing for the first time. Juels and Kaliski proposed proofs of retrievability (PoR) [11], mainly used for the static archiving of large files. Shacham and Waters [12] designed an improved PoR scheme over [11]. They used a publicly verifiable homomorphic authenticator constructed from BLS signatures [13] to aggregate the proofs into small tags. Unfortunately, it still does not support dynamic operations. Ateniese et al. [14] defined a provable data possession (PDP) model based on a homomorphic linear authenticator constructed with RSA. This is a probabilistic verification model that allows auditors to verify integrity without retrieving the entire file from the cloud server. Their follow-up work [15] still does not support fully dynamic operations.
Erway et al. [16] extended the PDP model and proposed dynamic provable data possession (DPDP), which uses a rank-based authenticated skip list to perform provable updates on stored data. Nevertheless, its computation and communication costs are still expensive. Wang et al. [17] considered dynamic data storage in a distributed setting; their challenge-response protocol can both determine the correctness of the data and locate possible errors. Similar to [15], they only consider partial support for dynamic operations. Later, they proposed an audit scheme that supports privacy protection [18]. Then, in [19], they improved the previous PDP model by employing a Merkle hash tree for block tag authentication. Zhu et al. [20] discussed multicloud storage and proposed a cooperative PDP scheme that can effectively support data migration. Yang and Jia [21] proposed an audit scheme with both dynamic-data and privacy-protection properties. Armknecht et al. [22] proposed a privately auditable PoR scheme in which audits can be delegated while preventing collusion among malicious clients, auditors, and cloud servers. Liu et al. [23] pointed out that the MHT by itself is not sufficient to verify the block index, which may lead to replacement attacks.
They provide a top-down multireplica MHT data audit scheme for dynamic big data storage in the cloud. Guan et al. [24] proposed the first cloud storage audit scheme based on indistinguishability obfuscation. Wang et al. [25] designed a novel identity-based proxy-oriented remote data integrity auditing scheme by introducing a proxy to process data for users. Sookhak et al. [26] presented a new data structure named the divide-and-conquer table, which can be used for auditing big data storage. In [27], Merkle hash tree based trusted third-party auditing of big data was proposed. Shen et al. [28] proposed a remote data integrity auditing scheme that supports the hiding of sensitive information; in their scheme, sensitive information is protected while other information is unaffected. Rao et al. [29] presented a new approach, based on a batch-leaves-authenticated Merkle hash tree, that batch-verifies multiple leaf nodes together with their indexes.

Bilinear Pairing.
Bilinear pairing [30] refers to a bilinear mapping between cyclic groups. Since the set of all points on an elliptic curve forms a group, the bilinear pairing operation can be realized over elliptic curves.

Discrete Logarithm Assumption.
Diffie and Hellman introduced the concept of public-key cryptography [31] in 1976. Since then, the security of many cryptosystems has depended on the discrete logarithm assumption; Odlyzko [32] gave a detailed survey of this. Once these assumptions become easy to solve, the corresponding systems will be broken. The security of our scheme depends on the following hard problems. CDH problem: the computational Diffie-Hellman problem is that, given g, g^x, g^y ∈ G, it is difficult to calculate g^xy for unknown x, y ∈ Z_q^*. DLP problem: the discrete logarithm problem is that, given g, g^x ∈ G, it is difficult to calculate x ∈ Z_q^*.

BLS Homomorphic Signature.
The BLS signature [13] is a short signature scheme constructed by Boneh et al. using bilinear pairing. Because of its short signature length at the same security strength, it has a large advantage in communication and storage overhead.
The BLS signature scheme consists of three algorithms, where H(·): {0, 1}* → G is a collision-resistant hash function: (1) KeyGen: select a random element α from the finite field Z_q and calculate v ← g^α as the public key, obtaining the private/public key pair (α, v). (2) Sign: for a message m, calculate the signature σ ← H(m)^α. (3) Verify: given (m, σ), check whether e(σ, g) = e(H(m), v); accept if the equation holds.
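The homomorphic property that makes BLS tags aggregable can be seen numerically. The toy below is our own simplification: real BLS works over pairing-friendly elliptic curves, while here we use a subgroup of Z_p* only to show that a product of tags σ_i = H(d_i)^α, raised to coefficients r_i, equals the tag of the aggregated hashes; this is not a secure signature.

```python
# Toy demonstration of the homomorphic property behind BLS tags (NOT secure;
# modulus, generator, and all helper names are our own choices).
import hashlib
import secrets

P = 2**127 - 1   # a Mersenne prime, used only as a toy modulus
G = 3            # toy generator

def h(data: bytes) -> int:
    """Stand-in for the hash-to-group function H: {0,1}* -> G."""
    return pow(G, int.from_bytes(hashlib.sha256(data).digest(), "big"), P)

alpha = secrets.randbelow(P - 2) + 1   # private key
v = pow(G, alpha, P)                   # public key (checked via pairings in the real scheme)

blocks = [b"block-%d" % i for i in range(4)]
tags = [pow(h(d), alpha, P) for d in blocks]          # sigma_i = H(d_i)^alpha
coeffs = [secrets.randbelow(1000) + 1 for _ in blocks]

# Aggregate of tags equals tag of aggregate:
#   prod sigma_i^{r_i} == (prod H(d_i)^{r_i})^alpha
agg_tags, agg_hashes = 1, 1
for d, s, r in zip(blocks, tags, coeffs):
    agg_tags = (agg_tags * pow(s, r, P)) % P
    agg_hashes = (agg_hashes * pow(h(d), r, P)) % P

assert agg_tags == pow(agg_hashes, alpha, P)
```

This identity is exactly what lets an auditor check one aggregated proof instead of s individual tags.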

Merkle Hash Tree with Relative Nodes.
The Merkle hash tree was proposed by Merkle [33]. It is a tree-like data structure widely used in data integrity verification: hashes are repeatedly combined until a single root hash is formed. If there is a way to confirm the correctness of the root hash, then confirming the integrity of a particular data block of the file only requires redoing the hash computations from a few leaf nodes up to the root of the tree. In this paper, in order to guarantee the correct location of each data block and prevent data block replacement attacks, we introduce auxiliary authentication information into the hash tree to bind each data block to its location information. Let Num be the number of nodes on the path from the root node to the target node, so that Num_root = 1. In order to record the node direction, we introduce Dir: if Dir = 0, the node is a left child; if Dir = 1, the node is a right child. At the same time, we stipulate that Dir = 2 at the root node.

Blockchain.
Blockchain [34] is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. It is a series of data blocks associated by cryptographic methods and is essentially a decentralized database. Each data block contains a batch of Bitcoin network transaction information used to verify the validity of the information. A block is a storage unit that records the communication information of the block nodes within a certain period of time, and blocks are linked by random hashing: each block contains the hash value of the previous block. As information exchange grows, blocks are successively connected, and the result is called a blockchain. Due to its decentralization, it has received more and more research attention [35].
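The Merkle hash tree construction with direction bits described above can be sketched in code. This is a minimal toy of our own (helper names and the duplicate-last-node padding rule are our choices, not the paper's): each auxiliary authentication entry pairs a Dir bit with the sibling hash needed to recompute the path to the root.

```python
# Minimal Merkle hash tree with (Dir, sibling-hash) auxiliary authentication
# information; Dir = 0 means the node on the path is a left child, 1 a right child.
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_levels(leaves):
    """All tree levels bottom-up; duplicates the last node on odd-sized levels."""
    levels = [[H(d) for d in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]
        levels.append([H(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def auth_path(levels, index):
    """Auxiliary info for leaf `index`: list of (Dir, sibling) pairs."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        path.append((index & 1, level[index ^ 1]))
        index //= 2
    return path

def verify(leaf, path, root):
    node = H(leaf)
    for direction, sibling in path:
        node = H(sibling + node) if direction else H(node + sibling)
    return node == root

leaves = [b"d1", b"d2", b"d3", b"d4"]
levels = build_levels(leaves)
root = levels[-1][0]
assert verify(b"d3", auth_path(levels, 2), root)
assert not verify(b"dX", auth_path(levels, 2), root)
```

Because the Dir bits fix the left/right position at every level, a block presented at the wrong index fails verification, which is what blocks replacement attacks.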

Problem Statement
The local storage space of IoT devices is limited, and the data generated over time may exceed the storage space of the device itself. Therefore, the data generated by the device can be uploaded to the CSP through the network to reduce the cost of local storage hardware. Although existing audit protocols have developed rapidly, many problems remain. The first is computation overhead: IoT devices do not have the powerful computing capability required to calculate data tags. The second is the effectiveness of auditing: invalidation of audit results caused by collusion attacks should be fully avoided. Finally, there is the timeliness of auditing: it should be ensured that the third party performs the audit promptly and records the results. In response to the above problems, we have designed a more efficient and secure audit scheme.

System Model
In our system model, there are three different participants: the user, the CSP, and the TPA. Their characteristics are as follows. User: the owner of the local IoT device, who uploads the data to the CSP and entrusts maintenance, management, and calculation to the CSP, while entrusting the data audit service to the TPA. CSP: composed of many cloud servers, with strong computing and storage resources. TPA: helps users audit the data stored on the cloud servers.

The Proposed Scheme

Notations and Definitions.
Table 1 shows all the notations and their corresponding definitions used in this paper.

Details.
In this section, we describe the scheme in detail. In most existing schemes, the user uploads a series of information (such as the data blocks with their tags) to the CSP after all the data has been preprocessed locally. For the IoT environment, considering the storage and computation overhead of the local device, we first upload the encrypted data to the CSP. Then, we perform a series of calculations, such as data tag and MHT generation, with the assistance of the CSP, which greatly reduces the overhead of the devices. The specific details are as follows.
Data Outsourcing Stage.
Before data uploading, global parameter generation, key generation, and data splitting are required. After successful uploading, DataPreproc, TagGen, and TagVerify are executed, specifically:
GlobalPmGen: the user selects a security parameter λ for the IoT devices according to the required security strength, such that the large prime q satisfies log_2 q ≤ λ, selects the multiplicative cyclic group G of order q with generator g, and selects the collision-resistant one-way hash function H(·): {0, 1}* → G.
KeyGen: the user selects α_u in Z_q as the private key for signing and calculates v = g^{α_u} as the corresponding public key.
DataSplit: the user divides the data into n blocks, D = (d_1, d_2, ..., d_n), i = 1, 2, ..., n, and uploads them to the CSP.
DataPreproc: the CSP calculates the auxiliary calculation information Ξ for the uploaded blocks and returns it to the user, who checks it by sampling; if the check fails, the CSP is warned and required to recalculate.
TagGen: if the check is passed, H(d_i)^{α_u} is calculated, and the tag σ_i, i = 1, 2, ..., n, is generated by (1) according to Ξ. The user thus obtains the homomorphic tag set Φ = {σ_i | i = 1, 2, ..., n}. Then, the user signs the root hash H_R with α_u: σ_{H_R} = (H_R)^{α_u}. Finally, the user sends (Φ, σ_{H_R}) to the CSP.
TagVerify: after receiving (Φ, σ_{H_R}), the CSP verifies the correctness of σ_{H_R} by (2). After (2) passes, the CSP verifies the correctness of each homomorphic tag by (3). If (3) fails, the CSP refuses to accept and asks the user to resend; if it holds, the CSP returns a proof of storage to the user. After the user receives it, they delete the local data and keep only the key information.
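The division of labor in this stage can be sketched as a toy simulation. This is our own simplification under loud assumptions: a plain modular group stands in for the pairing-friendly curve, the function names are invented, and the user's sampling check of Ξ is only noted in a comment. The point is that the CSP does the expensive hash-to-group work while the user finishes each tag with one exponentiation.

```python
# Sketch of the outsourcing stage in a toy group (names and group are ours).
import hashlib

P, G = 2**127 - 1, 3   # toy modulus and generator, not a real pairing group

def h_to_group(data: bytes) -> int:
    """Stand-in for H: {0,1}* -> G."""
    return pow(G, int.from_bytes(hashlib.sha256(data).digest(), "big"), P)

def data_split(data: bytes, block_size: int):
    """DataSplit: cut the file into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def csp_aux_info(blocks):
    """CSP side (DataPreproc): precompute the group hashes as aux info Xi.
    In the scheme the user spot-checks these before trusting them."""
    return [h_to_group(d) for d in blocks]

def user_tag_gen(aux, alpha):
    """User side (TagGen): one exponentiation per block instead of hash-to-group."""
    return [pow(x, alpha, P) for x in aux]

data = b"x" * 1000
blocks = data_split(data, 256)
alpha = 0x5eed                      # toy private key alpha_u
aux = csp_aux_info(blocks)
tags = user_tag_gen(aux, alpha)
```

In the real scheme, TagVerify on the CSP side checks each tag with a pairing equation against the public key v rather than with knowledge of α_u.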

TPA Audit Stage.
In order to reduce the burden on users, it is common to delegate audit tasks to the TPA; here, the TPA can be any authorized entity, which realizes public auditing. The TPA audit stage includes the following three algorithms.
ChallengeGen: the system automatically triggers the audit through an Ethereum smart contract. Specifically, if the last ℓ digits (ℓ ∈ N) of the current block hash meet the set conditions, the smart contract performs the calculation and sequentially obtains the challenged data block numbers with their corresponding coefficients, where ℓ determines the TPA challenge request period. At the same time, to ensure the random selection of the challenge blocks, we generate the challenge request from the current block hash and the user's ID number. Assuming that s data block indexes I_i and their corresponding coefficients r_i ∈ Z_q are finally selected, they are combined into a challenge request Cl: {(I_i, r_i)}_{1≤i≤s} and sent to the CSP.
ProofGen: according to Cl, the CSP queries the user's data and the corresponding tags stored on the server and then generates the integrity storage proof P, which is composed of (R, σ, MHT, σ_{H_R}), where R and σ are the aggregated data value and aggregated tag computed from the challenged blocks.
ProofVerify: if (2) does not hold, the data has been corrupted and the TPA returns "refuse"; if (2) holds, the TPA continues to check (6). If (6) holds, it returns "accept," indicating that the data is stored intact. After verification, the TPA sends 0 ETH to the user's Ethereum contract account and notes the block number of the challenge response together with the audit result.
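The key property of ChallengeGen is that the challenge is a deterministic, publicly recomputable function of the block hash and the user ID. The derivation below is our own sketch (the paper does not specify the exact expansion), but it shows why neither the TPA nor the CSP can hand-pick intact blocks: anyone can regenerate the same challenge from public data.

```python
# Sketch of blockchain-driven challenge generation (derivation is ours).
import hashlib

def gen_challenge(block_hash: bytes, user_id: bytes, n: int, s: int, q: int):
    """Derive s (index, coefficient) pairs over n data blocks,
    coefficients in Z_q, deterministically from public inputs."""
    challenge, seen, counter = [], set(), 0
    while len(challenge) < s:
        digest = hashlib.sha256(
            block_hash + user_id + counter.to_bytes(4, "big")
        ).digest()
        idx = int.from_bytes(digest[:16], "big") % n
        if idx not in seen:                      # keep block indices distinct
            seen.add(idx)
            coeff = int.from_bytes(digest[16:], "big") % q or 1
            challenge.append((idx, coeff))
        counter += 1
    return challenge

# Anyone (e.g. the user, in the user audit stage) can recompute the same
# challenge from the public block hash and detect a substituted one.
c1 = gen_challenge(b"\x00" * 32, b"user-42", n=1000, s=5, q=2**61 - 1)
```

A TPA that substitutes its own index set will fail this recomputation check during the user audit stage.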

User Audit Stage.
In order to check whether the TPA has completed the audit tasks honestly, the user needs to check the TPA's audit logs within a data update cycle (considering the user's cost, this cycle should be much longer than the TPA's irregular audit period). The user retrieves the historical transaction information from the Ethereum account and extracts x pieces of challenge information with their corresponding random numbers. First, the user randomly samples the challenged data block information and checks whether the selected challenge blocks match the blocks generated randomly from the blockchain according to the predetermined algorithm; if not, the TPA is fraudulent. If the TPA honestly selected the random blocks, the user then checks whether the verification equation holds through the bilinear pairing operation. If it holds, the TPA has completed the user's audit tasks honestly; if it fails, the TPA is fraudulent.

Dynamic Update Stage
(1) Data Block Insertion. Suppose that a data block d* needs to be inserted after the k-th data block d_k. First, the user sends d* to the CSP. Then, the CSP generates the corresponding auxiliary calculation information Ξ*. Next, the user calculates the homomorphic tag σ* of the data block according to the returned Ξ*. Finally, the user constructs an update request Update = (insert, k, σ*) and sends it to the CSP.
After receiving the request, the CSP finds the corresponding position, inserts the data, and verifies and stores the signature σ*. Data block modification is handled analogously, with an update request Update = (modify, k, σ*). If the verification is passed, the TPA responds to the user with "accept." Then, the user signs the new root hash H_R′ and sends σ_{H_R′} to the CSP. Finally, the user can delete d* and σ_{d*} locally.
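The user-side cheapness of dynamic updates comes from the fact that the CSP rebuilds the tree while the user only verifies and re-signs the new root. The sketch below (our own simplified helpers, reusing a plain Merkle root without the paper's Dir/Num annotations) shows an insertion and the resulting root change.

```python
# Sketch of data-block insertion: the CSP rebuilds the MHT; the user only
# needs to re-sign the new root (helpers are simplified and ours).
import hashlib

def H(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    level = [H(d) for d in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"d1", b"d2", b"d3", b"d4"]
old_root = merkle_root(blocks)

# Insert d* after the k-th block (here k = 2, i.e. after b"d2").
k, d_star = 2, b"d*"
blocks = blocks[:k] + [d_star] + blocks[k:]
new_root = merkle_root(blocks)

assert new_root != old_root   # any update changes the root the user must re-sign
```

The user's work is constant: verify the returned path for d*, then sign new_root, regardless of how many internal nodes the CSP had to recompute.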

Batch Audit.
Due to the aggregation property of bilinear pairings in our scheme, the TPA can audit the data of multiple users at one time, and the CSP can also process multiple verification requests at the same time, which reduces the overall cost.

Evaluation
In this section, we present the correctness analysis, security analysis, and performance analysis of our scheme.

Correctness Analysis.
For each challenge and its corresponding proof, each entity can ensure that the data is stored correctly by verifying the equation. We will prove the correctness of these equations.
After receiving the auxiliary calculation information, the data block tags can be generated by the user based on it, which greatly reduces the computation overhead on the user side. The correctness of equation (1) can be proved as follows. The correctness of equation (2) can be proved as follows. In the data outsourcing stage, the correctness of equation (3) can be proved as follows. Then, proof (6) can be proved as follows. Due to the aggregation property of bilinear pairings, all the proofs can be verified at once, which greatly reduces the burden of the audit.

Security Analysis
Lemma 1. If the CSP cheats during the auxiliary information generation, it will be detected with a nonnegligible probability.
Proof. Suppose the CSP cheats during the process of generating the auxiliary information. There are two situations: (1) after data uploading, in order to save storage space, the CSP deliberately discards part of the data and uses saved data blocks of other users to produce the auxiliary calculation information Ξ, so as to trick the user into generating real tags for fake data blocks and thus pass the follow-up audit. Let a be the ratio of erroneous hashes, b the probability that the user detects an error, and x the number of hashes that the user checks; then b = 1 − (1 − a)^x. For example, with a = 1%, the user only needs to check about 460 hashes to detect the cheating with 99% probability. (2) The CSP honestly computes the user's MHT but forges the calculation of v_{d_i}; then, in the follow-up audit, equation (3) cannot be passed. In both situations, it is impossible for the CSP to pass the verification. □

Lemma 2. If the CSP loses part of the data, it cannot respond to the audit challenge successfully.
Proof. Without loss of generality, suppose that block d_11 among the challenged data blocks has been lost. There are three situations: (1) the CSP hopes to pass the audit by forging this data, that is, forging d_11 as d_11′. Then, owing to the change of the node hash, the root hash changes accordingly, equation (2) for root node signature verification fails, the challenge response fails, and the CSP cannot pass the audit.
(2) The CSP hopes to replace d_11 with another data block d_21 of the user to participate in the challenge response; then the auxiliary authentication information changes, the root hash changes accordingly, equation (2) does not hold, and the challenge response fails. (3) The CSP retains only the hash value of d_11; equation (2) can then be passed, but the hash is irreversible, so the CSP cannot recover d_11 and cannot pass the verification of (6). Therefore, if the CSP loses part of the data, it cannot successfully respond to the challenge. □

Lemma 3. If the CSP and the TPA collude, the collusion will be detected.

Proof. Suppose the CSP and the TPA conduct a collusion attack. There are two situations: (1) the TPA deliberately selects data blocks that are intact at the CSP to challenge; (2) the TPA forges the audit results. For (1), our scheme uses the Ethereum blockchain to generate challenge requests automatically, and the selected challenge blocks with their corresponding coefficients are generated randomly; therefore, it is impossible to manually select the challenge numbers and their coefficients, and the TPA cannot help the CSP falsify audit results by deliberately selecting intact blocks. For (2), owing to the user's sampling audit, forged audit results cannot pass the check. Therefore, our scheme can resist collusion attacks. □

Lemma 4. If the TPA does not perform the audit in time, this will be detected.

Proof. Suppose the TPA does not perform the audit within the prescribed time limit but instead audits all previous data at one time just before the user audit; this makes the audit invalid, because missing data blocks cannot be detected in time. Owing to the introduction of the blockchain, the TPA needs to send 0 ETH to the user's contract account after each audit and mark the challenge-response data block, so each audit process is public and verifiable. And because of the user audit, delayed audits will be detected in time. Therefore, it can be detected whether the TPA performed the audit in time. □

Performance Analysis.
In this section, we analyse the computation and memory overhead and make a brief comparison with Rao's scheme [29].
This scheme mainly involves four basic cryptographic operations: the hash operation, modular exponentiation, modular multiplication, and the bilinear pairing operation, represented by the symbols T_H, T_exp, T_mul, and T_bp, respectively. We summarize these basic operations at each stage of the scheme in Table 2.
We compared the computation overhead of the user, CSP, and TPA at various stages in Table 2. It can be seen intuitively that our computation overhead on the user side has a great advantage, because we outsource part of the calculation to the CSP to reduce the user's computation overhead; this is especially friendly to lightweight IoT devices.
Besides, we comprehensively evaluate the performance through experiments. The experimental environment is as follows: a laptop with a 3.30 GHz Core i5-4590 CPU and 4 GB DDR3-1600 RAM. Each algorithm is implemented in the Java programming language using the popular Java cryptography library JPBC 2.0, which integrates a large number of commonly used cryptographic algorithms. We choose a type-D curve to construct the group G; the group element size is set to 21 bytes, the large prime q is 1024 bits, and the hash operation is SHA-256. The outsourced data set is 1 GB, and the data block size is varied from 1 kB to 256 kB.
Firstly, we measured the computation overhead of the user side during the data outsourcing stage. Figure 1 shows how the user-side computation overhead changes with the block size in the data outsourcing stage; the abscissa is the data block size, and the ordinate is the user-side computation time. We can see intuitively that, as the data block size increases, that is, as the number of blocks decreases, the computation overhead of both schemes decreases. The computation time difference is largest when the data block size is smallest and smallest when the data block size is largest.
This is because the more blocks there are, the more tags need to be calculated, the greater the depth of the constructed MHT, and the greater the overhead. That is to say, when there are more data blocks, the computation overhead of our scheme has a greater advantage. This advantage comes from the fact that we transfer most of the calculations to the CSP, which has higher computational performance. Therefore, for data outsourcing, our scheme is friendlier to lightweight devices with weak storage and computing capabilities. Then, we measured the computation overhead of the user side during the dynamic update stage. Figure 2 shows how the user-side computation overhead changes with the block size in the dynamic update stage; the abscissa is the data block size, and the ordinate is the user-side computation time. We can see intuitively that the computation overhead of our scheme is a small constant and does not change with the block size, whereas the computation overhead of Rao's scheme [29] decreases as the block size increases. The main reason is that, in the dynamic update stage, data block insertion, deletion, or modification changes the structure of the Merkle hash tree; a large number of node values in the hash tree then change accordingly, which causes a lot of recalculation. Therefore, as the number of blocks increases, the number of recalculations increases. Our scheme fully demonstrates its advantage in computation overhead: users only need to perform very little verification calculation.
Finally, we measured the communication overhead of the user side during the data outsourcing stage. We can see intuitively from Figure 3 that, as the data block size increases, that is, as the number of data blocks decreases, the communication overhead of both schemes gradually decreases and tends to 1 GB. At this stage, since the data is ultimately uploaded to the CSP, the communication overhead for the 1 GB of data is unavoidable. In addition, some auxiliary information, such as the data tags and the Merkle hash tree, must also be uploaded. Our scheme removes the process of uploading part of the auxiliary information to the TPA, thereby reducing part of the communication overhead, so our scheme has a slight advantage at this stage. We believe that an authorized third party should retain very little user data information and that any authorized entity should be able to perform audits on behalf of users knowing only the user's public key information, realizing a true public audit.
In summary, owing to the outsourcing of part of the calculation, our scheme significantly reduces the computation overhead on the user side. Since the data is ultimately stored at the CSP, selecting the CSP instead of the TPA for these calculations reduces overhead to a certain extent.