Blockchain-Based Proof of Retrievability Scheme

In the Internet of Things, user information is usually collected by all kinds of smart devices. The collected user information is stored in cloud storage, where there is a risk of information leakage. To protect the security and privacy of user information, the user and the cloud provider periodically execute a protocol called a proof of retrievability scheme. A proof of retrievability scheme ensures the security of the data by generating a proof that convinces the user that the cloud provider correctly stores the user information. In this paper, we construct a proof of retrievability scheme using blockchain technology. Because data stored in a blockchain cannot be tampered with, the integrity of the data is ensured. Specifically, some related definitions, security models, and a blockchain-based construction of a proof of retrievability scheme are given. Then the validity and security of the scheme are proved. As a result, user information can be protected by our scheme.


Background.
As information systems enter our daily lives, many appliances and services handle private user information, such as surveillance cameras, smartwatches, smart door locks, and online supermarkets. They provide a lot of convenience for our life. However, their providers collect user information and store it in the cloud, where new technologies are widely used [1][2][3][4][5][6][7]. Due to the vulnerability of the cloud, user information could be attacked by hackers and can be easily stolen if the cloud storage provider is compromised. Among the problems and challenges of cloud storage [8][9][10], this paper considers only the problem of how to ensure the security and integrity of the information. To solve it, three kinds of methods are used [11]: proof of ownership (PoW), provable data possession (PDP), and proof of retrievability (PoR). We focus on PoR; for the state of the art of PoR, the reader is referred to [11][12][13][14][15][16][17][18][19][20][21][22][23][24][25].
Generally, PoR schemes are designed under different settings and security models. On the one hand, some schemes [11][12][13][14] are for static data. Some schemes [15][16][17] discuss the multiserver setting. In these schemes, the client can identify machines and recover the data from the others by using the audit mechanism. Other schemes [18][19][20][21][22][23][24][25] are for dynamic data. On the other hand, the works in [18][19][20][21] are about security. The authors of [22][23][24] researched memory checking and studied how to authenticate remotely stored dynamic data. The scheme in [25] is for the multiserver and dynamic data setting.
Recently, blockchain has been used to eliminate the trusted third party in many protocols [26]. However, it is still unknown how to utilize blockchain in PoR schemes, which poses a new challenge in constructing a PoR scheme.

Motivation and Contribution.
The concept of blockchain was first proposed in 2008 in "Bitcoin: a peer-to-peer electronic cash system" [27], published on a cryptography mailing list by a scholar known by the pseudonym "Satoshi Nakamoto." The verification, bookkeeping, storage, maintenance, and transmission of the data in a blockchain are all based on a distributed system structure, and the trust relationship between distributed nodes is established by purely mathematical methods instead of a central authority. Thus, a decentralized and reliable distributed system can be formed. The goal of blockchain is to provide trust for transactions between untrusted entities, without the need for a trusted third party. At present, many institutions have combined industry conditions with the characteristics of blockchain and made beneficial attempts in many industries, including payment, Internet of Things, credit investigation, transaction settlement and clearing, crowdfunding, equity transaction, audit, supply chain, digital asset management, notarization, and other fields [28][29][30][31][32][33]. We consider using blockchain technology to solve the problem of the trusted third party in the verification of the PoR scheme.
In this paper, we first define a security model for the blockchain-based proof of retrievability by modifying the model in [14,34,35]. Secondly, we propose the first concrete PoR scheme based on blockchain. Finally, we demonstrate that the proposed scheme is provably secure in the new model.

Organization.
The rest of the paper is organized as follows. Preliminaries are given in Section 2. In Section 3, we formally define the framework and security model for blockchain-based PoR schemes. Then a concrete construction of a blockchain-based scheme is presented in Section 4. We analyze the security of the proposed scheme in Section 5. Finally, conclusions are drawn in Section 6.

Preliminaries
In this section, some notions are introduced such as hash function, Merkle tree, blockchain, and bilinear pairing.

Hash Function.
The hash function $H$ maps data $x$ of arbitrary length (input) to data $y = H(x)$ of fixed length (output); $y$ is called the hash of $x$. Many hash functions [36] are publicly available and can be selected based on the context.
This transformation is a compression mapping, which has the following properties:
(i) The space of hash values is usually much smaller than the space of inputs.
(ii) Different inputs may hash to the same output, but it is hard to find two different inputs $x, x'$ such that $H(x) = H(x')$.
(iii) It is infeasible to determine the input value $x$ from the hash value $y$.
Assumption 1 (hash function preimage assumption). Given $y = H(x)$, it is hard to compute $x$.
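As a brief illustration of the compression property, the following Python sketch uses SHA-256 from the standard library as a stand-in for $H$. The paper does not fix a particular hash function, so this choice, like the function name `H` in the code, is an assumption made only for the example.

```python
import hashlib


def H(x: bytes) -> bytes:
    """Stand-in for the hash function H (SHA-256 is an assumption)."""
    return hashlib.sha256(x).digest()


short_digest = H(b"x")
long_digest = H(b"x" * 1_000_000)

# Inputs of very different lengths map to outputs of the same fixed length.
print(len(short_digest), len(long_digest))

# Two inputs that differ in one character give different digests.
print(H(b"block-1").hex()[:16])
print(H(b"block-2").hex()[:16])
```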

Merkle Tree.
A Merkle tree, also known as a hash tree, is a tree that stores hash values. A leaf node of a Merkle tree holds the hash value of a data block. A nonleaf node holds the cryptographic hash of its child nodes. Figure 1 presents a simple example of a Merkle tree with 4 pieces of data. Let $f$ be a hash function and let $X = \{x_0, x_1, x_2, x_3\}$ denote the set of data used to generate the Merkle tree. A Merkle tree is generated as follows: firstly, for all leaf nodes, $y_{\mathrm{bin}(i)} = f(x_i)$, where $i = 0, 1, 2, 3$ and $\mathrm{bin}(i)$ is the binary form of $i$; secondly, for all inner nodes, the value of the node is $f(y_l \| y_r)$, where $y_l$ and $y_r$ are the values of the left child and the right child, respectively. A Merkle tree is valid if and only if the value of each inner node equals $f(y_l \| y_r)$. As a result, this example outputs the following:

$y_0 = f(y_{0,0} \| y_{0,1})$, $y_1 = f(y_{1,0} \| y_{1,1})$, $y = f(y_0 \| y_1)$.

In a Merkle tree, the value of the root node is called the hash of the Merkle tree. For the example in Figure 1, the hash of that tree with data $X$ is

$y = f(f(f(x_0) \| f(x_1)) \| f(f(x_2) \| f(x_3)))$. (2)

In the rest of this paper, we use Merkel(X) to denote the Merkle tree created from the data set $X$ and use $H(T)$ to denote the hash of a Merkle tree $T$, where $H$ is the underlying hash function. For example, the hash of the Merkle tree created from the data set $X$ is denoted by $H(\mathrm{Merkel}(X))$.
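The construction of the four-block example can be sketched in Python as follows. This is a minimal sketch, not the implementation used in the paper: SHA-256 as $f$, byte-string concatenation for $\|$, and the helper name `merkle_root` are all assumptions, and the number of blocks is assumed to be a power of two.

```python
import hashlib


def f(data: bytes) -> bytes:
    """Underlying hash function f (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def merkle_root(blocks: list) -> bytes:
    """Return the root value (the hash of the Merkle tree) over `blocks`."""
    n = len(blocks)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("this sketch assumes a power-of-two number of blocks")
    level = [f(x) for x in blocks]  # leaf values: f(x_i)
    while len(level) > 1:
        # Each inner node is f(y_l || y_r) of its two children.
        level = [f(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]


X = [b"x0", b"x1", b"x2", b"x3"]
root = merkle_root(X)

# Changing any single block changes the root value.
tampered_root = merkle_root([b"x0", b"x1", b"xx", b"x3"])
print(root.hex())
print(root != tampered_root)
```

Only the root value needs to be published to commit to all four blocks, which is what allows the construction later in the paper to store a single hash on the blockchain.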

Blockchain.
Within a blockchain, the hash function is used to determine the state of the blockchain. Figure 2 shows the structure of a blockchain, which can be viewed as a linked list of blocks. Every block has four basic fields: the hash of the previous block, the timestamp of generation, a random number for security, and the hash of a Merkle tree. Usually, the corresponding Merkle tree is linked with the block too. Two neighboring blocks are linked by a hash pointer to the previous block, which creates a chain of connected blocks, hence the name blockchain. By linking blocks in this manner, the ordered hashes of all $n$ blocks, computed with a hash function $f$, represent the entire state of the blockchain. A blockchain is valid if $f(\mathrm{Block}(i-1))$ equals the previous-block hash field stored in $\mathrm{Block}(i)$, for all $1 \le i \le n$.
To utilize blockchain for a data set $X$ (see the example in Subsection 2.2), a corresponding Merkle tree $T$ is constructed from $X$. Then a new block $B$, denoted by $B(X)$, can be generated with the help of a timestamp provider. Adding more parameters, we use $B(X; ts)$ to denote a block, where $X$ is the data set used to generate the hash of the Merkle tree, $ts$ is the timestamp of the current time, and rand is the random number. Moreover, a blockchain provides operations on blocks; the construction in this paper uses NewBlock and AppendBlock. Issues remain in maintaining a blockchain, such as generating blocks [37,38] and updating efficiently [39]. To summarize the characteristics of blockchain, we make the following assumption.
Assumption 3 (blockchain assumption). The state and all blocks of a blockchain are hard to modify after they have been generated.
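The validity condition on the chain of hash pointers can be illustrated with a small Python sketch. The field names, the string encoding used to hash a block, the all-zero value for the first block's pointer, and the helpers `append_block` and `is_valid` are assumptions made for the example; they are not the paper's NewBlock and AppendBlock operations.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Block:
    prev_hash: bytes    # hash of the previous block
    timestamp: int      # timestamp of generation
    rand: int           # random number
    merkle_hash: bytes  # hash of a Merkle tree


def f(block: Block) -> bytes:
    """Hash of a block; the field encoding is an illustrative assumption."""
    encoded = (
        f"{block.prev_hash.hex()}|{block.timestamp}|"
        f"{block.rand}|{block.merkle_hash.hex()}"
    )
    return hashlib.sha256(encoded.encode()).digest()


def append_block(chain: list, timestamp: int, rand: int, merkle_hash: bytes) -> None:
    """Link a new block to the current last block by a hash pointer."""
    prev_hash = f(chain[-1]) if chain else bytes(32)
    chain.append(Block(prev_hash, timestamp, rand, merkle_hash))


def is_valid(chain: list) -> bool:
    """Check that every block stores the hash of its predecessor."""
    return all(chain[i].prev_hash == f(chain[i - 1]) for i in range(1, len(chain)))


chain = []
for i in range(3):
    append_block(chain, 1000 + i, i, hashlib.sha256(bytes([i])).digest())
print(is_valid(chain))

# Modifying a block in the middle breaks the pointer stored in its successor.
tampered = list(chain)
tampered[1] = replace(chain[1], merkle_hash=bytes(32))
print(is_valid(tampered))
```

The second check fails because the successor of the modified block still stores the hash of the original block, which is the intuition behind Assumption 3.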

Bilinear Pairing.
Bilinear pairing, also called bilinear mapping, was first used to construct a tripartite key exchange protocol [40]. It involves three multiplicative cyclic groups $G_1$, $G_2$, and $G_T$ of prime order $p$. A bilinear pairing is a mapping $e: G_1 \times G_2 \to G_T$ satisfying the following conditions:
(1) For any $g_1 \in G_1$, $g_2 \in G_2$, and $a, b \in \mathbb{Z}_p$, $e(g_1^a, g_2^b) = e(g_1, g_2)^{ab}$.
(2) There exist two elements $g_1 \in G_1$ and $g_2 \in G_2$ such that $e(g_1, g_2) \neq 1$.
Let $c_1, c_2, c_3 \in \mathbb{Z}_p$ and let $g_1, g_2, g$ be generators of $G_1, G_2, G_T$, respectively. There are two security assumptions related to bilinear pairing. [...] and a randomly selected element $T \in G_T$, it is hard to distinguish $e(g_1, g_2)^{c_1 c_2 c_3}$ from $T$.

Security Model
3.1. System Setting.
Our system has three entities: the user, the cloud storage provider where user information is stored, and a blockchain where several timestamp providers are available to all entities. The structure of the system setting is shown in Figure 3.
(i) The User. The user is the entity who wants to store data on the cloud storage. Whenever the user wants to check whether the data is correctly stored on the cloud storage, a PoR request is generated and sent to the cloud storage. With the help of the blockchain, the user can verify the retrievability of the stored data from the proof received from the cloud storage provider.
(ii) The Cloud Storage Provider. The cloud storage provider is the entity that actually stores the data for the user. In addition, the cloud storage provider generates and sends the proof of retrievability after receiving the request from the user.

Timestamp Usage.
The timestamps are provided to both the user and the cloud storage provider. A timestamp guarantees the existence of the data, because the hash value computed over the data is included in the next timestamp. In our scheme, we modify the traditional timestamp computation. At the end of every proof generation, the timestamp provider computes a timestamp for the current time and publishes it on the blockchain. The cloud storage provider uses the timestamp to compute the hash value in the blockchain.
The use of timestamps benefits security. On the one hand, running a PoR scheme at two different moments yields a PoR for the duration between the two moments. On the other hand, it gives a timeline of PoR records, which can be used to analyze the efficiency.

Definition.
There are five algorithms in a blockchain-based PoR scheme, described as follows:
(i) KeyGen: The input is the security parameter, and the output is the public key and private key of the system and the user.
(ii) Outsource: The input is the private key and the user data $M$, and the output is a data set $Y$ with $n$ blocks and one tag $\sigma$ for each block. The algorithm also generates new blocks on the blockchain for the data.
(iii) RequestChallenge: The user randomly selects a challenge $r$ and sends it to the cloud storage provider.
(iv) ResponseProof: The proof process is an interactive protocol. The input is a public key, the file name, and the tag of the file, and the output is a proof in response to the challenge.
(v) VerifyProof: The input is the system parameter and a proof, and the output is accept or reject.
Remark 1. Note that the system parameter includes the structure and the state of the selected blockchain, as well as other auxiliary public information such as the hash function implementations and bilinear pairing implementations.

Security Model.
Under the assumptions mentioned in Section 2, a blockchain-based PoR scheme is secure if it satisfies the following two properties.
(1) Correctness. A blockchain-based PoR scheme is correct if, for every valid proof generated by the algorithms defined above (KeyGen, Outsource, RequestChallenge, ResponseProof, and VerifyProof), the verification algorithm outputs accept.
(2) Reasonableness. A blockchain-based PoR scheme is reasonable if no malicious cloud storage provider can generate a proof for which VerifyProof outputs accept without correctly storing the user data. That is, the user believes that the cloud storage provider can generate the proof only if it correctly stores the user data.
Formally, a blockchain-based PoR scheme is reasonable if the probability that any probabilistic polynomial-time adversary wins the game described below is negligible.
(1) In the RequestChallenge algorithm, the challenger randomly generates a challenge message and sends it to the adversary.
(2) The adversary first generates a data set by running an arbitrary algorithm that returns a proof. The proof is sent to the challenger in the ResponseProof algorithm.

High-Level Description.
In this section, we propose a blockchain-based PoR scheme. To cut costs, the cloud storage provider only needs to generate a Merkle tree for a data set and store the hash of the Merkle tree in the blockchain. The data set can be stored anywhere by the cloud storage provider. When the user requests a challenge of PoR, the cloud storage provider fetches the Merkle tree and generates a PoR for the user with the help of the blockchain.
[...], $\times$), and a bilinear pairing $e$ on $\mathbb{Z}_p$. The user randomly chooses a nonzero element $s \in \mathbb{Z}_p$ as a private key, and computes and publishes $g^s \in \mathbb{Z}_p$ as a public key.

4.2.2. Outsource.
When a user wants to store a file on the cloud storage, the following interactive algorithm is run between the user and the cloud storage provider.
(1) Given a data set $X = \{x_1, x_2, \ldots, x_m\}$, the user uses an error correction code to get the encoded data $Y$. If some blocks $Y' \subset Y$ are lost by the cloud storage, the error correction code is used to reconstruct the original data set $X$ [41].

RequestChallenge.
To verify that the provider has stored the data correctly, the user randomly selects an integer $1 \le k \le n$ indicating which block should be checked. Then $k$ and $r_k$ are sent to the provider to request a challenge.

ResponseProof.
For the cloud storage provider, there are $n$ blocks of data, and the $k$-th block is requested to be checked. When the provider receives a challenge request, a PoR is generated as follows:
(1) Randomly select a nonzero element $x \in \mathbb{Z}_p$.

(5) The timestamp provider verifies that $H(\mathrm{Merkel}(\Sigma))$ is valid upon receiving it. If it is valid, then a timestamp $ts$ is generated to run
$\mathrm{AppendBlock}(\mathrm{NewBlock}(\mathrm{Merkel}(\Sigma), ts))$, (5)
and $ts$ is sent back to the cloud storage provider. Otherwise, the algorithm is terminated.
(6) The cloud storage provider generates the proof $\mathrm{proof}_k$, whose components include $H_1 = H(\mathrm{Merkel}(\Sigma))$ and $H_2 = H(ts \| \mathrm{Merkel}(\Sigma))$. Then $\mathrm{proof}_k$ is sent back to the user.

VerifyProof.
After receiving the proof, the user performs the following operations in order:
(1) Send $ts$ to the timestamp provider in the blockchain. If no accept is returned, then the algorithm terminates with a reject.
Remark 3. The above operations first check that the blockchain (without Merkle trees) and the Merkle tree related to the last block are valid. Secondly, the existence of the $k$-th block is checked.

Theorem 1. The verification process is correct; that is, $e(\sigma, g^s) = e(\sigma_k^s, g^x)$ holds, where $\sigma = \sigma_k^x$.

Security and Communication Networks 5
Proof. It follows from the properties of bilinear pairing that
$e(\sigma, g^s) = e(\sigma_k^x, g^s) = e(\sigma_k, g^{sx}) = e(\sigma_k^s, g^x)$. (8) □
Remark 4. Due to Assumptions 4 and 5, the private key $s$ remains secure even if the result of the bilinear pairing computation is public.
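The identity in Theorem 1 can be checked numerically with a deliberately insecure toy pairing. In the sketch below, elements of the source group are represented directly by their exponents with respect to $g$, so "raising to a power" becomes multiplication modulo the group order, and the pairing maps two exponents $a, b$ to a fixed generator of the target group raised to $ab$. This keeps bilinearity but exposes all discrete logarithms, so it only illustrates the algebra and is not the pairing used in the paper. The parameters `Q`, `P`, `GT_GEN`, the helper `power`, and the random placeholder tag `sigma_k` are assumptions made for the example.

```python
import secrets

Q = 1019       # toy prime group order (plays the role of p in the paper)
P = 2 * Q + 1  # 2039, a prime, so Z_P^* has a subgroup of order Q
GT_GEN = 4     # generator of the order-Q subgroup of Z_P^*


def power(elem: int, k: int) -> int:
    """'elem^k' in the toy source group (elements stored as exponents of g)."""
    return (elem * k) % Q


def e(u: int, v: int) -> int:
    """Toy pairing: e(g^a, g^b) = GT_GEN^(a*b) mod P."""
    return pow(GT_GEN, (u * v) % Q, P)


g = 1                                   # g itself, stored as exponent 1
s = secrets.randbelow(Q - 1) + 1        # user's private key, nonzero
x = secrets.randbelow(Q - 1) + 1        # nonzero random element chosen in ResponseProof
sigma_k = secrets.randbelow(Q - 1) + 1  # placeholder for the tag of block k

sigma = power(sigma_k, x)                  # sigma = sigma_k^x
lhs = e(sigma, power(g, s))                # e(sigma, g^s)
rhs = e(power(sigma_k, s), power(g, x))    # e(sigma_k^s, g^x)
print(lhs == rhs)
```

Both sides reduce to the target-group generator raised to $\sigma_k \cdot s \cdot x$ in the exponent, which mirrors the chain of equalities in the proof.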

Reasonableness
Theorem 2. If the cloud storage provider is honest, the final proof must be $\mathrm{proof}_k$ as generated in ResponseProof, where $H_1 = H(\mathrm{Merkel}(\Sigma))$, $H_2 = H(ts \| \mathrm{Merkel}(\Sigma))$, and $ts$ is the current timestamp.
Proof. If the cloud storage provider is honest, the following points hold true:
(i) $\sigma$ and $\sigma_k$ guarantee that the cloud storage provider stores at least the $k$-th block, which is not revealed to the public in the bilinear pairing computation (see Remark 5.1).
(ii) The Merkle tree is created by the cloud storage provider at the time $ts$ was required, and the leaf nodes of the tree are all part of the data set $Y$. It follows from Assumptions 1 and 2 that these hashes cannot be found without knowing the original data set $Y$.
(iii) The timestamp $ts$ generated by the blockchain is trusted according to Assumption 3.
(iv) The consistency of the Merkle tree and of the timestamp is assured by $H_1$ and $H_2$, respectively.
To sum up, the cloud storage provider must have stored the data set correctly if VerifyProof returns accept.
Proof. If the cloud server $S$ is dishonest, that is, the server modifies, deletes, or tampers with a piece of the file without authorization of the user, $S$ cannot compute the value of the root node correctly, so it cannot prove that it has completely stored the data. By verifying the Merkle tree, the user can eventually identify which piece of the file $S$ has modified.
For example, to verify whether the fifth block of the file has been modified, the following procedure can be followed, with the structure shown in Figure 4:
(i) Verify node 1. Verify that the calculated value of node 1 is correct through the values of node 2 and node 3.
(ii) Verify the value of node 3. The receiver computes the value of node 3 from the values of node 6 and node 7 that he has received and verifies whether the calculated value of node 3 is correct.
(iii) Compute the value of node 6. The receiver computes the value of node 6 from the values of node 12 and node 13 that he has received and verifies whether the calculated value of node 6 is correct.
(iv) The receiver computes the value of node 12 from the value of $Y_5$ and verifies that the value of node 6 calculated from it is correct.
Correctness can be determined by whether the value of node 6 is consistent. This allows the user to track down the file blocks that have been modified.
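The check of the fifth block can be sketched as a Merkle authentication path. The sketch assumes a tree with eight leaves numbered as in the procedure above: node 1 is the root, node $i$ has children $2i$ and $2i+1$, and the blocks $Y_1, \ldots, Y_8$ sit at nodes 8 to 15, so that $Y_5$ is node 12. The hash SHA-256 and the helpers `build_tree`, `auth_path`, and `verify` are hypothetical names introduced for the example. It recomputes the path bottom-up (12, 6, 3, 1), whereas the text lists the same checks from the root downwards.

```python
import hashlib


def f(data: bytes) -> bytes:
    """Underlying hash function (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def build_tree(blocks: list) -> dict:
    """Heap-indexed tree: node 1 is the root, node i has children 2i and 2i+1."""
    n = len(blocks)  # assumed to be a power of two
    nodes = {n + j: f(b) for j, b in enumerate(blocks)}
    for i in range(n - 1, 0, -1):
        nodes[i] = f(nodes[2 * i] + nodes[2 * i + 1])
    return nodes


def auth_path(nodes: dict, leaf: int) -> list:
    """Sibling values met on the way from `leaf` up to the root."""
    path = []
    while leaf > 1:
        path.append(nodes[leaf ^ 1])  # the sibling index differs in the last bit
        leaf //= 2
    return path


def verify(block: bytes, leaf: int, path: list, root: bytes) -> bool:
    """Recompute the root from one block and its sibling values."""
    value = f(block)
    for sibling in path:
        # An even index is a left child, an odd index is a right child.
        value = f(value + sibling) if leaf % 2 == 0 else f(sibling + value)
        leaf //= 2
    return value == root


Y = [f"Y{j}".encode() for j in range(1, 9)]  # Y_1 .. Y_8
nodes = build_tree(Y)
path = auth_path(nodes, 12)                  # siblings: nodes 13, 7, 2

print(verify(Y[4], 12, path, nodes[1]))         # the stored fifth block
print(verify(b"modified", 12, path, nodes[1]))  # a modified fifth block
```

Only three sibling values are needed to check one block out of eight, so the cost of locating a modified block grows with the height of the tree rather than with the number of blocks.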
(i) If a data storage provider uses an old timestamp $ts$, then it is rejected in the first step (1) in Section 4.2.5, since the timestamp provider can easily find that such a $ts$ has expired. In other words, such a $ts$ may be valid only for a short time, and the user cannot run the protocol twice within such a short time.
(ii) If a data storage provider uses an old proof $\mathrm{proof}_k$, then it is rejected in the seventh step (7) in Section 4.2.5, since $g^x$ is bound to a challenge $x$ that is randomly generated by the user. $x$ should be different in two runs of the protocol.
In short, our protocol is resistant to replay attacks with an old timestamp $ts$ or an old proof $\mathrm{proof}_k$.

Resistance to Collusion Attack.
If the timestamp provider (and by extension the blockchain provider) colludes with the data storage provider, then the data storage provider effectively also plays the role of a timestamp provider in the blockchain context. However, according to the security analysis of blockchain [42,43], such malicious timestamp providers can be detected by the nodes in the blockchain network. Under Assumption 3 (blockchain assumption), our protocol is resistant to such collusion attacks, which can be reduced to an attack in the blockchain context.

Conclusion
In order to protect the security and integrity of user data, we formally defined a novel security model for a blockchain-based PoR scheme and proposed a secure scheme under the defined security model. The properties of the PoR scheme and the characteristics of blockchain ensure the security and the integrity of data, respectively. Furthermore, we proved the correctness and reasonableness of our scheme, so that user data is more secure under our scheme. In our scheme, blockchain plays an essential role in the privacy and security of user data. We believe that combining blockchain with PoR schemes will continue to promote technical progress in this area. However, many attacks have not yet been considered, such as reset attacks and malicious attacks. To improve performance, it would be interesting to remove the bilinear mapping while preserving the same security level.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest in this work.