Toward Serverless and Efficient Encrypted Deduplication in Mobile Cloud Computing Environments

With the proliferation of new mobile devices, mobile cloud computing technology has emerged to provide rich computing and storage functions for mobile users. The explosive growth of mobile data has led to an increased demand for solutions that conserve storage resources. Data deduplication is a promising technique that eliminates data redundancy for storage. For mobile cloud storage services, enabling the deduplication of encrypted data is of vital importance to reduce costs and preserve data confidentiality. However, recently proposed solutions for encrypted deduplication lack the desired level of security and efficiency. In this paper, we propose a novel scheme for serverless efficient encrypted deduplication (SEED) in mobile cloud computing environments. Without the aid of additional servers, SEED ensures confidentiality, data integrity, and collusion resistance for outsourced data. The absence of dedicated servers increases the effectiveness of SEED for mobile cloud storage services, in which user mobility is essential. In addition, noninteractive file encryption with the support of lazy encryption greatly reduces latency in the file-upload process. The proposed indexing structure (D-tree) supports the deduplication algorithm and thus makes SEED much more efficient and scalable. Security and performance analyses prove the efficiency and effectiveness of SEED for mobile cloud storage services.


Introduction
Most mobile devices, such as smartphones and Internet of Things products, are constantly connected to the Internet thanks to advances in mobile wireless network technology. Mobile cloud computing (MCC) [1,2], also referred to as mobile edge computing [3], has emerged to fulfill the need for ubiquitous, low-latency services and applications for mobile users. Through the combination of cloud computing, mobile computing, and wireless networks, MCC provides a rich array of computing and storage options for mobile users [4].
With the explosive growth in the volume of data outsourced from mobile devices, it is crucial for mobile cloud service providers (MCSPs) to minimize the costs of storing outsourced data. Data deduplication, a technique that eliminates data redundancy, can achieve this goal and reduce resource use, including disk space and network bandwidth, by more than 90% [5].
To maintain confidentiality of the outsourced data, it is essential to devise a technique to conduct deduplication over encrypted data. As a first attempt at encrypted deduplication, convergent encryption (CE) [6,7] was proposed. CE computes an encryption key from the hash of the data, thus generating identical ciphertexts from identical plaintexts. Although the method is quite simple, it is vulnerable to brute-force attacks [8,9] because encryption keys are deterministically computed from plaintext, which makes them predictable. For example, given a CE ciphertext $C$ (of plaintext $F$) and a dictionary of possible plaintexts $D = \{F_1, F_2, \ldots, F_n\}$, an adversary can derive the encryption key for each plaintext in $D$ and encrypt that plaintext until $C$ is found.
Server-aided encryption [8,10-12] addresses this problem and aims to mitigate brute-force attacks on encrypted deduplication. This approach uses a dedicated key server for the generation of encryption keys. The key server possesses its own secret key and performs an oblivious key generation protocol [13] with users: for each request, it generates an encryption key, using the secret key and a blinded hash computed from the data, and then returns it to the user. By doing so, the randomness of the key server's secret key contributes to the encryption keys, which makes brute-force attacks infeasible while the secret key is kept hidden from adversaries.
Despite its resistance to brute-force attacks, server-aided encryption has several limitations when applied to an MCC environment. First, the achievement of security comes at the cost of managing key servers, which are subject to single-point-of-failure or server-compromise attacks [8]. Second, the dedicated key servers, which usually reside within on-premises networks, severely reduce user mobility, which significantly degrades the effectiveness and performance of the MCC technology. This restriction on mobility could be relieved by deploying multiple key servers over geographically separated areas. However, this not only incurs high deployment costs but also exposes the system to a wide variety of security threats.
For the successful provisioning of ubiquitous, low-latency, and secure storage services in a mobile cloud environment, it is necessary to devise serverless encryption that enables brute-force-resistant encrypted deduplication without the aid of additional servers.
In this paper, we propose a novel scheme for serverless and efficient encrypted deduplication (SEED) in MCC environments. Instead of using key servers, users perform bilinear pairing-based encryption on files using their own public and secret keys. The bilinearity of file encryption allows an equality test to be conducted for the ciphertexts generated under different secret keys and thus enables cross-user deduplication of encrypted data. The encryption algorithm randomizes all ciphertexts and the corresponding tags, which are susceptible to exposure to adversaries, using a random source supplied from the users' secret keys. The provable security ensures that no information about the plaintext is revealed from either ciphertexts or tags. Furthermore, file encryption allows the tags to be computed independently of the ciphertexts, which makes it possible for the ciphertexts and tags to be generated in parallel. This property of SEED enables lazy encryption, a novel feature in which ciphertext generation, a computationally expensive component of encryption, can be delayed or even omitted in the case of client-side deduplication.
In addition to bilinear pairing-based file encryption, SEED is based on an efficient deduplication algorithm. For this, we propose D-tree, a new indexing data structure that supports deduplication. D-tree is a random binary tree, which is a binary search tree that is formed from the random permutation of nodes. Each node in a D-tree contains a tag for an outsourced file as deduplication information within the storage. The cloud server (i.e., MCSP) can perform a binary search over the D-tree for identical files within the storage by running equality tests on each node. Because nodes are balanced in a random binary tree, D-tree preserves logarithmic computational complexity in the worst case for the deduplication algorithm.
SEED is significantly more practical for mobile cloud storage services than existing solutions because of the following advantages: (i) It eliminates the need for key servers, which severely restrict user mobility. The absence of key servers also allows noninteractive file encryption: users can generate encryption keys directly without server interaction. In combination with lazy encryption, efficient and low-latency file uploading to a mobile cloud is realized. (ii) The random binary-tree-based deduplication algorithm reduces the run time complexity of finding duplicates to $O(\log n)$, where $n$ is the number of outsourced files in the storage. This makes the scheme much more efficient and scalable, especially considering very large data items being outsourced. (iii) The use of users' secret keys for file encryption ensures strong data confidentiality even for predictable data, while also guaranteeing data integrity and resistance against collusion attacks. (iv) Noninteractive file encryption with the support of lazy encryption greatly reduces latency in the file uploading process.

Contribution.
We make several contributions in this paper: (i) We address the challenge of encrypted deduplication in an MCC environment and propose a novel serverless and efficient encrypted deduplication scheme, called SEED, suitable for this environment. (ii) The security of SEED is rigorously analyzed in terms of data confidentiality, data integrity, and collusion resistance. (iii) The effectiveness of SEED is validated by an extensive analysis of its efficiency and performance.
Organization. The remainder of this paper is organized as follows. In Sections 3 and 4, we present the system model and background knowledge, respectively. In Section 5, we describe the proposed scheme in detail. We analyze the security of the scheme in Section 6 and present a comparative and performance analysis in Section 7. Finally, we conclude the paper in Section 8.

Convergent Encryption.
CE is a cryptographic algorithm that generates identical ciphertexts from identical plaintexts [6,7]. In CE, a convergent key $k$ is derived by computing $H(M)$, where $M$ is data (or a file) and $H$ is a cryptographic hash function. The ciphertext $C \leftarrow \mathrm{Enc}_k(M)$ is then computed with a conventional symmetric encryption algorithm $\mathrm{Enc}$ and the convergent key $k$. A given plaintext $M$ will always produce an identical ciphertext $C$. Bellare et al. [14][15][16] presented message-locked encryption (MLE), which is a generalized framework for CE, and attempted to formalize security. MLE essentially follows the CE approach in the sense that it derives encryption keys deterministically from $M$. Despite the novel nature of encrypted deduplication, CE and MLE are insufficiently secure for two reasons [17]. First, they cannot preserve semantic security due to their deterministic nature. Second, the distribution of the message space is the only entropic source of randomness in the convergent key. Thus, the effective key space is reduced to the message space, which is very small compared to the nominal key space. This ultimately renders CE and MLE susceptible to brute-force attacks [8].
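The determinism described above, and the dictionary attack it enables, can be illustrated with a minimal sketch. The SHA-256 counter-mode keystream is only a toy stand-in for the symmetric cipher Enc, and the function names and the sample plaintexts are assumptions made for this illustration.

```python
import hashlib
from typing import List, Optional, Tuple

def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream (SHA-256 in counter mode), a stand-in for AES."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def ce_encrypt(message: bytes) -> Tuple[bytes, bytes]:
    """Convergent encryption: k = H(M), C = Enc_k(M)."""
    key = hashlib.sha256(message).digest()
    stream = keystream(key, len(message))
    cipher = bytes(m ^ s for m, s in zip(message, stream))
    return key, cipher

def dictionary_attack(cipher: bytes, candidates: List[bytes]) -> Optional[bytes]:
    """Re-encrypt every candidate plaintext and compare with the target ciphertext."""
    for candidate in candidates:
        if ce_encrypt(candidate)[1] == cipher:
            return candidate
    return None

# Two users encrypting the same file obtain the same ciphertext (deduplicable).
_, c1 = ce_encrypt(b"salary: 5000")
_, c2 = ce_encrypt(b"salary: 5000")

# An adversary holding c1 and a small dictionary recovers the plaintext offline.
guesses = [b"salary: %d" % s for s in range(4000, 6001, 500)]
recovered = dictionary_attack(c1, guesses)
```

The attack needs no key material at all: every candidate in the dictionary yields its own key, so the cost is one encryption per guess.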

Server-Aided Encryption.
To overcome the weaknesses of CE, it is necessary to strengthen the generation of convergent keys so that the key space has high min-entropy. Several solutions have been proposed to achieve this goal. The approach used in server-aided encryption is to generate convergent keys through interacting with key servers. By doing so, the probability distribution of the convergent keys becomes independent of the distribution of the message space, and thus brute-force attacks can be mitigated.
DupLESS [8] was the first attempt at server-aided encryption. In this approach, users run an interactive key generation protocol with a key server to compute convergent keys. The protocol operates on an RSA-based Oblivious Pseudorandom Function (OPRF) [13], and thus, it guarantees that the convergent keys can be computed without revealing any information about the message or the secret of the key server. In this way, adversaries, such as the MCSP or users, cannot recover plaintext (i.e., messages) with offline brute-force attacks on ciphertext, even if the plaintext is easily predictable. The security of the DupLESS scheme requires the aid of a key server, which is inherently vulnerable to the single-point-of-failure problem. That is, data confidentiality cannot be retained if the server is compromised.
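The blinded key generation idea can be sketched with a textbook RSA blind signature. This is an illustration of the general technique, not the DupLESS implementation: the toy modulus (a product of two Mersenne primes), the function names, and the final SHA-256 key derivation are all assumptions chosen to keep the sketch short, and the parameters are far too weak for real use.

```python
import hashlib
import math
import secrets

# Toy RSA key held by the key server (assumed parameters, insecure sizes).
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))      # server's secret exponent (Python 3.8+)

def hash_to_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def blind(message: bytes):
    """User side: hide H(M) behind a random factor r^E."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (hash_to_int(message) * pow(r, E, N)) % N, r

def server_sign(blinded: int) -> int:
    """Key server side: sign the blinded value without learning H(M)."""
    return pow(blinded, D, N)

def unblind(signed: int, r: int, message: bytes) -> bytes:
    """User side: remove r, verify the signature, and derive the convergent key."""
    sig = (signed * pow(r, -1, N)) % N
    if pow(sig, E, N) != hash_to_int(message):
        raise ValueError("key server returned an invalid signature")
    return hashlib.sha256(sig.to_bytes((N.bit_length() + 7) // 8, "big")).digest()

def derive_key(message: bytes) -> bytes:
    blinded, r = blind(message)
    return unblind(server_sign(blinded), r, message)
```

Two users holding the same file derive the same key, yet the key depends on the server's secret exponent, so an offline dictionary attack is no longer possible without querying the server. The sketch also makes the drawback visible: every key derivation requires a round trip to the key server.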
Subsequent attempts at server-aided encryption have been made to overcome the drawbacks of DupLESS. Miao et al. [11] proposed multiserver-aided encryption, which uses several key servers rather than just one. In this approach, key servers cooperate with each other to process convergent key generation requests. More specifically, convergent keys are generated by executing a threshold blind-signature-based protocol [29] with the aid of the group of key servers. Each key server uses a share of a secret key to generate a partial blind signature for the message a user requested. e partial blind signatures are then combined, and, in turn, a convergent key is computed from the blind signature. Unlike DupLESS, multiserver-aided encryption can resist server-compromise attacks unless the attackers gain access to more than t (i.e., the threshold) key servers.
Another solution, proposed by Duan [10], addressed the single point of failure inherent in server-aided encryption. Similar to multiserver-aided encryption, in this approach, multiple entities are involved in key generation using an RSA threshold signature. However, it differs in that the tasks of the key servers are distributed to a number of signers (i.e., a qualified subset of users). A key server participates in the system only during the setup phase: it generates a secret key and disperses shares of the secret key across the signers. Convergent keys can be acquired if more than $t$ signers participate in the interactive key generation protocol. Zhang et al. [30] proposed a server-aided encrypted deduplication scheme for electronic health systems. The aforementioned schemes achieved the goal of mitigating server-compromise attacks on a DupLESS system. However, all server-aided encryption schemes fundamentally require key servers. The necessity of dedicated servers severely restricts user mobility, limiting their application in MCC environments.

Serverless Encryption.
Another approach has been proposed to achieve high levels of data confidentiality in encrypted deduplication without the need for additional servers.
Liu et al. [31] proposed serverless encryption that uses Password Authenticated Key Exchange (PAKE) [32]. Instead of interacting with key servers, it allows convergent keys to be derived in cooperation with online checkers (i.e., a subset of uploaders) through a PAKE-based protocol. However, despite the advantages of removing the servers, this scheme suffers from lower performance, including high latency, because many PAKE steps are required when conducting file encryption.
Several schemes for serverless encryption use pairing-based cryptography. Abadi et al. [15] proposed a scheme that deviates from MLE by fully randomizing all components of the ciphertexts. A study preceding SEED [33] is also built on bilinear pairing encryption algorithms to make the ciphertexts indistinguishable from a random distribution.
In these pairing-based schemes, a test algorithm that checks for equality among the ciphertexts is necessary because the ciphertexts are fully randomized. However, deduplication using an equality test algorithm inherently has a linear time complexity with the number of files in the storage. Without a tree-based indexing structure, it seriously degrades the performance of the cloud storage service.

System Model.
In this paper, we consider a general architecture of mobile cloud storage services where multiple mobile users outsource their data to remote storage.

Security and Communication Networks
User: this is an entity who owns data (or files; we use the terms "file" and "data" interchangeably in this paper) and wishes to outsource the data to the cloud storage. A user who has uploaded data is referred to as an uploader: he/she is the initial uploader of a file $F$ if it is the first time that $F$ has been uploaded to the storage, or a subsequent uploader otherwise.
MCSP: this is an entity that is equipped with abundant storage and computing resources and provides cloud storage services to mobile users. It has an interest in saving storage costs, so it performs deduplication of the outsourced data.

Threat Model and Security Goals.
We consider honest-but-curious adversaries in our threat model. That is, for assigned tasks, the MCSP and users will faithfully perform their work within the system. However, they have an interest in obtaining as much information as possible about the outsourced data, beyond their privileges.
Thus, our primary security goal is to prevent them from accessing the plaintext version of encrypted data.
In this study, we consider two types of adversaries: (i) an outside adversary, who makes an effort to learn useful information about the outsourced data by playing the role of a user and (ii) an inside adversary, who may be an honest-but-curious MCSP or intruders that have compromised the storage server. Specifically, we aim to achieve the following security goals in the proposed scheme: (i) Data confidentiality: no adversary can acquire information from the outsourced data using brute-force attacks unless they obtain the corresponding key. (ii) Data integrity: any valid user should be able to check whether the data downloaded from cloud storage has been kept intact. (iii) Collusion resistance: any adversaries without valid ownership of the data should be blocked from obtaining useful information from the data even if they collude with each other.

Server-Side and Client-Side Deduplication.
Data deduplication can be classified into two kinds of approaches according to the location where the deduplication occurs. In server-side deduplication, the MCSP performs deduplication once files have been uploaded to the storage. On the other hand, client-side deduplication is executed on the user's side. That is, before outsourcing a file, a user sends a corresponding tag to the MCSP to check whether the file already exists and, if so, omits the further upload.

Bilinear Pairings and Hard Problem
Bilinear Map. Let $G$ and $G_T$ be two multiplicative cyclic groups of prime order $p$. Let $g$ be a generator of $G$. A bilinear map is a function $e: G \times G \to G_T$ with the following properties: (i) bilinearity: $e(u^a, v^b) = e(u, v)^{ab}$ for all $u, v \in G$ and $a, b \in \mathbb{Z}_p$; (ii) nondegeneracy: $e(g, g) \neq 1$; (iii) computability: $e(u, v)$ can be computed efficiently for all $u, v \in G$.
Bilinear Diffie-Hellman (BDH) Problem. Let $a, b, c \in \mathbb{Z}_p^*$ be chosen at random and let $g$ be a generator of $G$. The BDH problem is to compute $e(g, g)^{abc} \in G_T$ given $g, g^a, g^b, g^c \in G$ as input. The BDH assumption [34] states that no probabilistic polynomial time algorithm can solve the BDH problem with nonnegligible advantage.
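As a short worked example, assuming the standard bilinearity property $e(u^a, v^b) = e(u, v)^{ab}$, exponents can be moved freely between the two arguments of the pairing. The symbols $x_i$ and $x_j$ below stand for two arbitrary exponents in $\mathbb{Z}_p^*$; this symmetry is what lets values raised to different users' secret exponents be compared.

```latex
\begin{align*}
e(g^{a}, g^{b}) &= e(g, g)^{ab} = e(g^{b}, g^{a}), \\
e(u^{x_i}, g^{x_j}) &= e(u, g)^{x_i x_j} = e(u^{x_j}, g^{x_i})
  \quad \text{for any } u \in G .
\end{align*}
```

By contrast, computing $e(g, g)^{abc}$ from $g^a$, $g^b$, and $g^c$ would require moving a third exponent into a map that takes only two arguments, which is exactly what the BDH assumption states is infeasible.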

Random Binary Tree.
A binary tree is referred to as a random binary tree if it is constructed at random from a probability distribution (e.g., a uniform distribution) of binary trees. A random binary tree of size $n \in \mathbb{N}$ is formed in the following way. First, a random permutation of $n$ elements is chosen, and the elements of the permutation are added one by one into a binary tree. The addition of elements is similar to the way that elements are inserted into a binary search tree. The root node of a random binary tree is obtained from the first element of the permutation. Each subsequent element is then evaluated on the tree from the root until it reaches a leaf. The evaluation result $b \in \{\text{Left}, \text{Right}\}$ selects the child node for the next evaluation.
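The construction above can be sketched as follows, assuming integer keys and plain binary-search-tree insertion without rebalancing; the names `Node`, `insert`, `height`, and `random_binary_tree` are illustrative. The height of such a tree is logarithmic in expectation, whereas inserting the same keys in sorted order degenerates into a chain.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain binary-search-tree insertion, no rebalancing."""
    if root is None:
        return Node(key)
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(key))
            return root
        node = child

def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative)."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best

def random_binary_tree(n, seed=None):
    """Insert the elements of a random permutation of 0..n-1 one by one."""
    rng = random.Random(seed)
    keys = list(range(n))
    rng.shuffle(keys)
    root = None
    for key in keys:
        root = insert(root, key)
    return root

random_tree = random_binary_tree(1000, seed=7)

sorted_tree = None
for k in range(200):                 # sorted insertion order: worst case
    sorted_tree = insert(sorted_tree, k)
```

Comparing `height(random_tree)` with `height(sorted_tree)` shows why the insertion order, and hence the randomization, matters for search cost.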

Serverless and Efficient Encrypted Deduplication
SEED consists of two building blocks: file encryption and deduplication. We first present these building blocks in Section 5.1 and then describe a data outsourcing protocol constructed upon them in Section 5.2.

File Encryption.
We introduce some notation prior to giving details on our file encryption algorithm. Let $G$ and $G_T$ be two multiplicative groups of prime order $p$, and let $H_1: \{0,1\}^* \to G$ be a hash function family. Let $SE_k$ be a symmetric encryption algorithm with an encryption key $k \in \mathcal{K}$, where $\mathcal{K}$ is the key space of the underlying block cipher (e.g., AES), and let $K: G_T \to \mathcal{K}$ be a key derivation function.
$\langle PK, SK \rangle \leftarrow \mathrm{KeyGen}(J)$. Given global information $J = \langle p, g \rangle$, this algorithm runs as follows: (1) Pick a random value $x \leftarrow \mathbb{Z}_p^*$ and compute $g^x$. (2) Set $PK = g^x$ as the public key and $SK = x$ as the secret key, then return $\langle PK, SK \rangle$.
$\langle C, MK, \tau \rangle \leftarrow \mathrm{Encrypt}(SK, M)$. Given a secret key $SK = x$ and a message $M$, this algorithm returns a ciphertext $C = \langle C_1, C_2, T \rangle$, a message-derived key $MK = \langle dk, rk \rangle$, and a tag $\tau$.
$T' \leftarrow \mathrm{ReEnc}(MK, T)$. Given a reencryption key $rk$ in a message-derived key $MK$ and a part of a ciphertext $T$, this algorithm computes $T' = T^{rk}$ and returns $T'$.
$M \leftarrow \mathrm{Decrypt}(SK, MK, C)$. Given a secret key $SK$, a message-derived key $MK = \langle dk, rk \rangle$, and a ciphertext $C = \langle C_1, C_2, T \rangle$, this algorithm runs as follows: (1) If $T$ is not reencrypted (without loss of security, we assume that the information about whether $T$ is reencrypted is implicitly attached to the ciphertext $C$), recover $s$ from $C_2$ and $T$. (2) If $T$ is reencrypted with another reencryption key $rk'$, recover $s$ from $C_2$ and the reencrypted $T$. (3) Compute a symmetric decryption key $\kappa = K(s)$ and recover $M$ by decrypting $C_1$ with $\kappa$.
$\{\text{True}, \text{False}\} \leftarrow \mathrm{Test}(\delta_i, \delta_j)$. Parse $\delta_i$ and $\delta_j$ as $\langle PK_i, \tau_i \rangle$ and $\langle PK_j, \tau_j \rangle$, respectively. Then, given public keys $PK_i = g^{x_i}$ and $PK_j = g^{x_j}$ and tags $\tau_i = \nu^{x_i}$ and $\tau_j = (\nu')^{x_j}$, this algorithm runs as follows: (1) Check whether the equation $e(\tau_i, PK_j) = e(\tau_j, PK_i)$ holds. (2) If the equation holds, return True. Otherwise, return False.
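A minimal sketch of the cross-user equality test follows. It assumes that the check is $e(\tau_i, PK_j) = e(\tau_j, PK_i)$, which follows from the key and tag forms given above, and it replaces the real pairing with a deliberately insecure toy pairing over a 1019-element group computed through a discrete-log table. The single-hash `hash_to_group` is a hypothetical stand-in for how $\nu$ is derived from the message; all names and parameters are assumptions for illustration.

```python
import hashlib
import secrets

# Toy setting: G = G_T = the order-P subgroup of Z_Q^*, with Q = 2P + 1.
P, Q, G = 1019, 2039, 4
DLOG = {pow(G, a, Q): a for a in range(P)}    # feasible only because P is tiny

def pairing(u: int, v: int) -> int:
    """Toy symmetric pairing e(g^a, g^b) = g^(ab); insecure, illustration only."""
    return pow(G, (DLOG[u] * DLOG[v]) % P, Q)

def keygen():
    """KeyGen: pick x in Z_P^*, return (PK, SK) = (g^x, x)."""
    x = secrets.randbelow(P - 1) + 1
    return pow(G, x, Q), x

def hash_to_group(message: bytes) -> int:
    """Hypothetical stand-in for deriving nu in G from the message."""
    a = int.from_bytes(hashlib.sha256(message).digest(), "big") % (P - 1) + 1
    return pow(G, a, Q)

def tag(sk: int, message: bytes) -> int:
    """tau = nu^x for the user's secret key x."""
    return pow(hash_to_group(message), sk, Q)

def equality_test(delta_i, delta_j) -> bool:
    """Test on <PK, tau> pairs produced under two different secret keys."""
    (pk_i, tau_i), (pk_j, tau_j) = delta_i, delta_j
    return pairing(tau_i, pk_j) == pairing(tau_j, pk_i)

pk_a, sk_a = keygen()
pk_b, sk_b = keygen()
delta_a = (pk_a, tag(sk_a, b"shared report"))
delta_b = (pk_b, tag(sk_b, b"shared report"))
same_file = equality_test(delta_a, delta_b)   # True although the tags differ
```

Both sides of the comparison evaluate to $e(\nu, g)^{x_i x_j}$ when the two users tagged the same message, so the MCSP can detect the duplicate without knowing either secret key.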

Deduplication.
The performance of deduplication depends on the computational complexity of the algorithm used to find file duplicates in the storage. To achieve efficient deduplication that is as fast as a binary search algorithm with logarithmic complexity, we define a D-tree, a binary-tree-based data structure for deduplication. A D-tree is a random binary tree of size $n$, where $n$ is the number of all the distinct outsourced files in the storage. Each node $N_i$ ($0 \leq i \leq n - 1$) in a D-tree contains deduplication information $\delta_i$ for each outsourced file.
A D-tree is an index structure for the storage of the MCSP. Once file $F$ has been uploaded to the storage, the MCSP checks whether it has a duplicate. For this, it performs a binary search over a D-tree using the Test algorithm given in the previous section. The search path for $F$ is determined at random based on a globally publicized random seed. If a node that contains information $\delta$ of $F$ is found, then the MCSP performs deduplication. Otherwise, it creates a new node for $F$ and inserts it at the leaf node on the search path.
We introduce here some notation for our deduplication scheme. Let $\Delta$ be a D-tree of size $n$, and let $N_0, \ldots, N_{n-1}$ be its nodes. $\delta_i = \langle PK_j, \tau_j \rangle$ denotes the deduplication information assigned to node $N_i$, where $j$ indicates the initial uploader $u_j$ who outsourced $\tau_j$ and the corresponding file. Let $\pi$ be the maximum height of $\Delta$, and let $\psi \in \{0,1\}^{\lambda}$ be a random seed chosen from a uniform distribution. Let $D = \langle \psi, \pi \rangle$ be global publicly known information.

A path vector $p = \langle b_1, \ldots, b_{\pi} \rangle$ is computed from a file and the random seed $\psi$ by the DPath algorithm, which relies on a hash function family.
Vector $p$ denotes a search path from the root node of $\Delta$: the bit value of $b_i$ indicates the left or right child of the node at the $(i - 1)$th level of $\Delta$. Figure 1 shows an instance of a D-tree of size $n = 7$ and $\pi = 3$ and its storage structure. Nodes in the tree are traversed from the root node $N_0$ with respect to a search path $p$, in which the bit information of each element indicates the next child node: bit 0 directs the traversal to the left child and bit 1 to the right child. For example, nodes traversed along a path $p = \{0, 1, 0\}$ include $N_0$, $N_1$, $N_4$, and $N_7$, with which the corresponding deduplication information $\delta_0$, $\delta_1$, $\delta_4$, and $\delta_7$ are sequentially evaluated using the Test algorithm.
Details of the D-tree based deduplication algorithms are given below.
$\mathrm{InsertNode}(N_i, p)$. Given a node $N_i$ and a path $p$, this algorithm inserts $N_i$ at the leaf node of $\Delta$ on $p$.
$\mathrm{DeleteNode}(N_i)$. Given a node $N_i$, this algorithm deletes the node from $\Delta$. If $N_i$ is a non-leaf node, then the deletion is performed by replacing it with one of its child nodes.

$\langle \text{DuplicateFound}, k \rangle \leftarrow \mathrm{FindDuplicate}(p, \delta_i)$. Given the path vector $p$ of a message $M$ and its corresponding deduplication information $\delta_i = \langle PK_i, \tau_i \rangle$, this algorithm runs as follows (the detailed procedure is presented in Algorithm 2).
(1) Get the root node $N_0$ of a D-tree $\Delta$.
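The path-directed search can be sketched as below, under several assumptions that the text does not fix: `d_path` derives the direction bits from SHA-256 over the seed and the file, a node position is identified by the path prefix leading to it, and the equality test is passed in as a callable. The behavior when every slot on a path is occupied is not specified in the text, so this sketch simply raises an error in that case.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, ...]      # path prefix identifying a node; () is the root

def d_path(file_bytes: bytes, seed: bytes, max_height: int) -> List[int]:
    """Hypothetical DPath: up to 256 direction bits from H(seed || file)."""
    digest = hashlib.sha256(seed + file_bytes).digest()
    bits = [(byte >> i) & 1 for byte in digest for i in range(8)]
    return bits[:max_height]

class DTree:
    def __init__(self, test: Callable[[object, object], bool]):
        self.nodes: Dict[Position, object] = {}
        self.test = test        # equality test on deduplication information
        self.tests_run = 0

    def find_duplicate(self, path: List[int], delta) -> Tuple[bool, Position]:
        """Walk the path from the root; stop at a match or at the first free slot."""
        for depth in range(len(path) + 1):
            pos = tuple(path[:depth])
            if pos not in self.nodes:
                return False, pos           # free slot where delta can be inserted
            self.tests_run += 1
            if self.test(self.nodes[pos], delta):
                return True, pos            # duplicate found at this node
        raise OverflowError("path exhausted; handling is not specified in the text")

    def insert_node(self, pos: Position, delta) -> None:
        self.nodes[pos] = delta

# Usage with plain equality standing in for the pairing-based Test algorithm.
seed, max_height = b"public-seed", 32
tree = DTree(lambda stored, new: stored == new)
for name in [b"a.txt", b"b.txt", b"c.txt", b"a.txt"]:
    found, pos = tree.find_duplicate(d_path(name, seed, max_height), name)
    if not found:
        tree.insert_node(pos, name)
```

Because identical files produce identical paths, a duplicate can only sit on the path of the new file, so one lookup runs at most `max_height + 1` equality tests instead of one test per stored file.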

Data Outsourcing in Server-Side Deduplication.
We first present the data outsourcing protocol in server-side deduplication. It consists of four operations: system setup, file upload, file download, and file deletion. For clarity, we denote the public key and secret key that belong to user $u_i$ as $PK_i$ and $SK_i$, respectively. We also denote a message-derived key calculated from $SK_i$ as $MK_i$. The details of the proposed protocol are given as follows.
System Setup. Given security parameter $\lambda$, the system generates public information $Pub = \langle J, D \rangle$. $J$ consists of the generator $g$ of $G$ and the order $p$, and $D$ consists of the randomly generated value $\psi \in \{0,1\}^{\lambda}$ and the maximum height $\pi$ of a D-tree $\Delta$. Each user $u_i$ generates a pair of public key $PK_i$ and secret key $SK_i$ by running $\langle PK_i, SK_i \rangle \leftarrow \mathrm{KeyGen}(J)$.
Then, $PK_i$ is made public, while $SK_i$ is kept secret.
File Upload. Suppose that a user $u_i$ wishes to upload a file $F$ to the MCSP. $u_i$ performs a file uploading operation as follows: (1) $u_i$ encrypts $F$ by running $\mathrm{Encrypt}(SK_i, F)$ with his/her secret key $SK_i$ to get its ciphertext $C = \langle C_1, C_2, T_i \rangle$, a message-derived key $MK_i$, and a tag $\tau$. (2) $u_i$ computes a path vector $p$ by running $\mathrm{DPath}(F, \psi)$. (3) Then, $u_i$ sends $\langle i, \tau, p, C \rangle$ to the MCSP, where $i$ is the identifier of $u_i$, and keeps $MK_i$ secret for later use.
Once the encrypted file $C$, as well as its corresponding tag $\tau$ and path $p$, are uploaded, the MCSP tries to eliminate the duplicate of $F$ by running deduplication as follows: (1) Given $\langle i, \tau, p \rangle$, the MCSP runs the $\mathrm{FindDuplicate}(p, \delta_i)$ algorithm, where $\delta_i = \langle PK_i, \tau \rangle$.
(2) If the result is $\langle \text{False}, k \rangle$, then $C$ has not been previously uploaded to the storage. $k$ is the position of a new node $N_k$ to which the deduplication information of file $F$ will be assigned. $N_k$ is on $p$ (if $k \geq 1$) or is the root node if the D-tree $\Delta$ is empty (i.e., $k = 0$). The MCSP inserts $\delta_i = \langle PK_i, \tau \rangle$ at position $k$ in $\Delta$ and stores $C$ with a link to $\delta_i$. The user $u_i$ is then assigned as the initial uploader of $F$.
(3) If the result is $\langle \text{True}, k \rangle$, then $u_i$ is a subsequent uploader of $F$. $k$ indicates the position of the node $N_k$ that stores the deduplication information of $F$. Hence, the MCSP does not have to store $C_1$ and $C_2$ of $C$, only $T_i$. Prior to storing $T_i$, the MCSP finds the initial uploader $u_j$ assigned to $N_k$ and asks $u_j$ to reencrypt $T_i$. Upon receipt of the request, $u_j$ computes $T_i' = \mathrm{ReEnc}(MK_j, T_i)$ with his/her own key $MK_j$ and returns $T_i'$. The MCSP appends $T_i'$ to the end of the stored tuple $C = \langle C_1, C_2, T_j, \ldots, T_l' \rangle$, where $l$ is the identifier of another subsequent uploader.
Algorithm 1: D-path.
File Download. User $u_i$ interacts with the MCSP to download an outsourced file $F$. The details are as follows: (1) $u_i$ sends a request to download the outsourced file $F$ to the MCSP. (2) Upon receiving the request, the MCSP sends the corresponding ciphertext $C = \langle C_1, C_2, T_i \rangle$ to $u_i$. (3) Given message-derived key $MK_i$ and secret key $SK_i$, $u_i$ recovers $F$ by running $\mathrm{Decrypt}(SK_i, MK_i, C)$. (4) If the result is $\perp$, then $u_i$ drops the ciphertext.
File Deletion. Upon receiving a deletion request for $F$ from user $u_i$, the MCSP runs the following steps: (1) If $u_i$ is the only user who owns the file $F$, the MCSP removes $C$ from the storage. It also deletes the corresponding node in $\Delta$ by running the DeleteNode algorithm.
(2) Otherwise, the MCSP only removes $T_i$ from $C$.

Data Outsourcing in Client-Side Deduplication.
The previously described protocol of SEED is based on server-side deduplication. We can easily modify the protocol to operate in a client-side deduplication mode. Specifically, it can be modified such that instead of fully uploading $\langle i, \tau, C \rangle$, user $u_i$ first sends $\langle i, \tau, T_i \rangle$ to the MCSP. The encrypted components $C_1$ and $C_2$ will be uploaded later only if FindDuplicate finds no duplicate of the file.
Lazy Encryption. SEED takes advantage of lazy encryption to further enhance the computational efficiency of the file uploading process in client-side deduplication. Lazy encryption is a novel technique that delays file encryption until the MCSP requests the upload of the subsequent ciphertexts as a result of the FindDuplicate function. It allows a user to omit the job of file encryption when a duplicate is found in the remote storage. In the file uploading process, the task of encryption (i.e., executing SE in the Encrypt algorithm) comprises the majority of the computation. Hence, lazy encryption significantly reduces the computational burden of the client. This is a crucial performance factor in mobile devices because it is directly related to a reduction of power consumption. The lazy encryption technique is enabled in SEED due to the concurrency property of the Encrypt algorithm (in Section 5.1.1). More specifically, in the encryption algorithm, a tag $\tau$ can be computed concurrently and independently of the computation of a ciphertext. Figure 2(a) intuitively depicts the concurrent processing of the file encryption. The concurrency property is only found in the proposed scheme. All the existing schemes, including MLE [14] and DupLESS [8], have sequential processing; the encryption and tag generation processes are inherently performed in a sequential way (see Figure 2(b)).
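The control flow of lazy encryption can be sketched as follows. The function `upload_client_side` and its three callables are hypothetical names introduced for this illustration; the hash-based tag, the XOR "cipher", and the in-memory tag set are toy stand-ins, and the upload of $T_i$ by a subsequent uploader is omitted for brevity.

```python
import hashlib
from typing import Callable, Optional

def upload_client_side(
    file_bytes: bytes,
    compute_tag: Callable[[bytes], bytes],
    encrypt: Callable[[bytes], bytes],
    server_has_duplicate: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the ciphertext to upload, or None if the server already holds the file."""
    tag = compute_tag(file_bytes)        # cheap and independent of the ciphertext
    if server_has_duplicate(tag):
        return None                      # duplicate: symmetric encryption is skipped
    return encrypt(file_bytes)           # no duplicate: encrypt only now

# Toy stand-ins (assumptions) to make the flow observable.
stored_tags = set()
encrypt_calls = []

def toy_tag(data: bytes) -> bytes:
    return hashlib.sha256(b"tag" + data).digest()

def toy_encrypt(data: bytes) -> bytes:
    encrypt_calls.append(len(data))      # record that the expensive step ran
    return bytes(b ^ 0x5A for b in data)

def server_check(tag: bytes) -> bool:
    if tag in stored_tags:
        return True
    stored_tags.add(tag)
    return False

first = upload_client_side(b"holiday.jpg bytes", toy_tag, toy_encrypt, server_check)
second = upload_client_side(b"holiday.jpg bytes", toy_tag, toy_encrypt, server_check)
```

The second upload of the same file never calls the encryption routine, which is the saving that lazy encryption targets on battery-constrained devices. A scheme whose tag is computed from the ciphertext could not reorder the steps this way.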

Side-Channel Prevention. Client-side deduplication is inherently vulnerable to a side-channel attack [35], by which adversaries can infer information about the existence of a specific file in the cloud storage. To defend against such an attack, we use a randomized-threshold approach [35]. In this technique, a randomly chosen threshold $t_F$ ($2 \leq t_F \leq d$, where $d$ is a security parameter) is assigned to each $F$ in the storage, along with a counter $c_F$ that counts the number of previous uploads of $F$. Unless $c_F$ reaches $t_F$, a user will be required to fully upload $F$, as in server-side deduplication, despite the existence of the file in the storage.
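The server-side bookkeeping for the randomized threshold can be sketched as below. The class name `ThresholdGate`, the string file identifiers, and the injectable random source are assumptions made for illustration.

```python
import random

class ThresholdGate:
    """Per-file randomized threshold t_F and upload counter c_F."""

    def __init__(self, d: int, rng=None):
        if d < 2:
            raise ValueError("security parameter d must be at least 2")
        self.d = d
        self.rng = rng or random.SystemRandom()
        self.threshold = {}     # file id -> t_F, drawn once from [2, d]
        self.counter = {}       # file id -> c_F, number of previous uploads

    def must_upload_fully(self, file_id: str) -> bool:
        """Record one upload; True means the client must send the whole file."""
        if file_id not in self.threshold:
            self.threshold[file_id] = self.rng.randint(2, self.d)
            self.counter[file_id] = 0
        previous = self.counter[file_id]
        self.counter[file_id] = previous + 1
        return previous < self.threshold[file_id]

gate = ThresholdGate(d=5, rng=random.Random(0))
decisions = [gate.must_upload_fully("F") for _ in range(8)]
```

The first $t_F$ uploads of a file are always requested in full, so observing a full upload request does not tell an adversary whether the file was already stored; only after the hidden threshold is reached does deduplication become visible to the client.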

Security Analysis
In this section, we analyze the security of SEED regarding data confidentiality, data integrity, and collusion resistance.

Data Confidentiality.
As mentioned in the previous section, our primary security goal with SEED is to guarantee the confidentiality of users' outsourced data. In our threat model, we consider an MCSP that is faithful but not fully trusted. Therefore, any leakage of users' data to adversaries, including the MCSP and unauthorized users, should be prevented. Because our threat model considers various types of attacks from both internal and external adversaries, we analyze data confidentiality according to these attacks. In the analysis, we assume that all public information, including the public keys of users, is known a priori to the adversaries.
Proof. In the security game, the adversary $A$ is given a correct ciphertext for each query $M$ in the case of $b = 1$. We will show that even in such a case, $A$ cannot get any information about $M$ from the ciphertext and cannot distinguish it from random with nonnegligible advantage. Suppose that the challenger responds to $A$'s queries as follows: for an $H_1$-random oracle query of $M$, the challenger picks a random $\mu \in G$ and returns it to $A$. For an Encrypt oracle query, the challenger returns the corresponding ciphertext.

Because the underlying symmetric encryption algorithm SE is semantically secure, the ciphertext $C_1$ is indistinguishable from random data. That is, because $s \in G_T$ is chosen at random, the symmetric encryption key $\kappa$, which is derived from $s$, as well as $C_1 = SE_{\kappa}(M)$, are made pseudorandom. Therefore, $A$ cannot get any useful information from $C_1$ except with a negligible advantage, unless $s$ is known to $A$.
Recovering $s$ from $C_2$ and $T$ is as hard as solving the BDH problem. Suppose that $A$ can compute $s$ from $C_2$ and $T$ in polynomial time with nonnegligible probability $\varepsilon$. We can construct an algorithm $B$ that solves the BDH problem using $A$: given a BDH instance $\langle g^a, g^b, g^c \rangle$, $B$ sets up the instance of $A$ such that $\langle PK, C_2, T \rangle = \langle g^a, R, g^b \rangle$, where $R$ is chosen at random from $G_T$, and runs $A$. For an $H_1$-query of $M$, $B$ responds to $A$ with $\mu = g^c$. From the view of $A$, the instance is a valid ciphertext of $M$, such that $PK = g^x = g^a$, where $s$ and $rk$ are random values from $G_T$ and $\mathbb{Z}_p^*$, respectively. If $A$ terminates and returns $s'$ as its output, then $B$ outputs $O = R/s'$ as the solution of the BDH problem. With nonnegligible probability $\varepsilon$, the output $O$ is the correct answer to the BDH problem, which contradicts the BDH assumption. Therefore, computing $s$ from $C_2$ and $T$ is infeasible.
Moreover, because the ciphertexts $C_2$ and $T$ are blended with two random values $s$ and $rk$, these ciphertexts are indistinguishable from random data, except with a negligible probability. With regard to a tag $\tau = \nu^x$ ($= H_1(H_1(M))^x$), $A$ also cannot distinguish it from random, because for any distinct messages the random oracle $H_1$ makes $\nu$ randomized. Therefore, SEED makes ciphertexts and tags indistinguishable from random data, which implies that $A$ has a negligible advantage in winning the security game. □

Security against Online Brute-Force Attacks.
Now, we analyze the security of SEED against online brute-force attacks. We consider outside adversaries (e.g., unauthorized users) with a dictionary that contains candidates for a file of interest F. The attack proceeds as follows: the adversary repeatedly performs a file upload operation for each candidate F′ until he/she observes a deduplication event, which indicates that the candidate file matches F in the storage.
If the proposed scheme is run in server-side deduplication mode, such an attack cannot succeed. This is because all candidates in the dictionary will eventually be sent to the MCSP during the operation, and thus the adversary can infer no information about whether deduplication takes place. In the case of client-side deduplication, the uploading of a certain file may be omitted if it already exists in the storage, which may give information to the adversary. However, the randomized-threshold strategy makes the adversary fully upload the file even if it exists in the storage and thus obfuscates the information about the file. As analyzed in [35], the adversary fails to obtain the information with probability 1 − 1/(d − 1), where d is a security parameter.
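The effect of the randomized-threshold strategy can be illustrated with a small simulation. The sketch below assumes that the strategy of [35] works as a secret per-file threshold drawn uniformly from [2, d], with client-side deduplication becoming visible only once the number of uploads reaches that threshold. The class name, method names, and string tags are hypothetical and are not part of SEED.

```python
import random


class RandomizedThresholdServer:
    """Toy server that hides deduplication events behind a per-file random threshold."""

    def __init__(self, d, rng=None):
        if d < 2:
            raise ValueError("d must be at least 2")
        self.d = d
        self.rng = rng or random.Random()
        self.thresholds = {}  # tag -> secret threshold t, drawn from [2, d] (assumption)
        self.copies = {}      # tag -> number of uploads seen so far

    def upload(self, tag):
        """Return True if the client must send the full file, False if the upload is skipped."""
        if tag not in self.thresholds:
            self.thresholds[tag] = self.rng.randint(2, self.d)
            self.copies[tag] = 0
        self.copies[tag] += 1
        # Until the upload count reaches the secret threshold, the file is
        # uploaded in full even though a copy may already be stored.
        return self.copies[tag] < self.thresholds[tag]
```

Under this assumption, an adversary who probes a file right after its first upload sees a skipped upload only when the threshold happens to be 2, which occurs with probability 1/(d − 1).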

Data Integrity.
The integrity of outsourced data can be compromised by data corruption due to defects in the storage system or adversaries' intentional attacks. SEED provides users with the ability to detect alteration of the outsourced data easily. Say that a user has downloaded an outsourced ciphertext $C = \langle C_1, C_2, T \rangle$ from the MCSP. While running the Decrypt algorithm, the user can restore the plain data F′ from the ciphertext and then compute $dk'$. If $dk'$ and the decryption key $dk$ $(= \mu^{SK})$ are different, Decrypt outputs ⊥, and the user knows that the outsourced file has been modified. Notice that the probability of Decrypt yielding an output other than ⊥ is negligible for F ≠ F′, thanks to the collision-resistant property of the cryptographic hash function $H_1$. Thus, SEED offers an integrity model that allows users to validate the outsourced data effectively.
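The integrity check can be sketched in a few lines. This is a toy illustration only: exponentiation modulo a fixed prime stands in for exponentiation in the pairing group G, SHA-256 stands in for $H_1$, and the function names and the modulus are assumptions made for this example rather than parts of the scheme.

```python
import hashlib

# Toy prime modulus standing in for the group G (assumption, not the SEED parameters).
P = 2**127 - 1


def h1(data: bytes) -> int:
    """Hash bytes to a nonzero value modulo P (stand-in for H_1)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1) + 1


def decryption_key(data: bytes, sk: int) -> int:
    """Compute dk = mu^SK with mu = H_1(data)."""
    return pow(h1(data), sk, P)


def verify_integrity(recovered: bytes, dk: int, sk: int):
    """Return the recovered data if its recomputed key equals dk, else None (the ⊥ output)."""
    return recovered if decryption_key(recovered, sk) == dk else None
```

A file that was altered in storage decrypts to some F′ ≠ F, so the recomputed key differs from $dk$ unless a hash collision occurs, and the check returns None.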

Collusion Resistance.
SEED also provides security against collusion attacks. Let us consider the collusion of unauthorized users who do not have valid ownership of the file of interest F. Although they have access to the ciphertext C of the file, they need the correct decryption key $dk$ $(= \mu^x)$ to decrypt it. Suppose that the colluding users have obtained sufficiently many decryption keys for other files. Even with these decryption keys, it is impossible to compute the correct decryption key $dk$ for F unless they know both $\mu$ and the secret key $x$.
We also consider an attack in which unauthorized users collude with an MCSP. In addition to decryption keys for other files, they would have access to ciphertexts other than C on the storage. However, because other ciphertexts contain no information about F, these adversaries learn nothing about F. This is the same as the former case, which requires the adversary to compute the correct decryption key $dk$ of F to succeed in the attack. Therefore, the proposed scheme resists attacks by colluding adversaries.

Comparative Analysis.
We comparatively analyzed secure deduplication schemes regarding attack resistance, mobility support, file encryption, and deduplication cost. The results are summarized in Table 1. CE (or MLE) has the cheapest computational cost among deduplication schemes, because no math operations, such as exponentiation or group multiplication, are required to perform file encryption. However, because of its weak security against brute-force attacks, this scheme cannot guarantee strong confidentiality of the outsourced data. This implies that CE is also vulnerable to server-compromise attacks, because attackers who have compromised cloud servers can easily revert CE ciphertexts to plaintexts by brute-force recovery.
Server-aided encryption schemes achieve resistance against brute-force attacks using an OPRF protocol (and its variants) with key servers. However, they are also vulnerable to server-compromise attacks. This is because if one of the key servers is compromised and a secret key is leaked from the server, then the security of the whole system is downgraded to the level of CE. This implies that such schemes fail to guarantee strong confidentiality of outsourced data. Several works by Miao et al. [11] and Duan [10] tried to alleviate the risk of such attacks. However, these approaches still fail if more than k (i.e., threshold) servers are compromised.
Besides, in server-aided encryption, the cost of key computation is larger than in other solutions, and clients are required to interact with key servers to generate convergent keys. This inevitably adds a nonnegligible latency to file outsourcing operations. Such intrinsic latency and the need for key servers, which usually reside in central data centers, make server-aided encryption solutions less attractive in MCC environments, where the support of low-latency service and mobility is critically important.
Liu et al.'s scheme [31] eliminates the need for key servers. Instead of OPRF, this scheme uses a PAKE protocol to achieve security against brute-force attacks. The lack of additional servers inherently leads to improved security that prevents server-compromise attacks. In this scheme, however, executing many PAKE protocols with online checkers (i.e., users) is mandatory for each file encryption. As with server-aided encryption, this incurs high latency in performing file encryption, which degrades the effectiveness of the scheme and makes it unsuitable for MCC environments.
Abadi et al.'s scheme [15] requires neither additional servers nor interactive protocols with any entities in file encryption. However, full randomization in file encryption incurs an extremely high computational cost for ciphertext computation. In addition, randomized ciphertexts lose the ordering information that is necessary to allow deduplication using a tree-based index structure. Jiang et al. [36] addressed this problem of Abadi et al.'s scheme and proposed a method that achieves logarithmic complexity in searching for duplicate files under fully randomized deduplication. Their method uses a tree-based data structure called a decision tree, which is similar to a D-tree. Despite the shared tree-based approach, there are significant differences: in Jiang et al.'s scheme, a user is required to interactively query the cloud server for each node on a path to find duplicates, while this can be achieved in a noninteractive manner in the proposed scheme.
As analyzed in Section 6.1, SEED guarantees strong confidentiality against brute-force attacks without using any additional key servers. Even if the MCSP is compromised, plain data cannot be recovered because the success probability of a brute-force attack is negligible. Therefore, SEED offers further security against server-compromise attacks. Although more math operations for ciphertext computation are needed than in server-aided encryption schemes, no interactions with servers are necessary while conducting file encryption. In addition, SEED achieves low latency in file encryption because it supports the novel property of lazy encryption, which is infeasible for other client-side deduplication schemes that require full ciphertext computation for tag generation. Using a random binary tree reduces the complexity of the deduplication algorithm to O(log n), which makes SEED much more efficient and scalable in MCC environments.

Experiments.
To evaluate the computational efficiency, we implemented SEED and other deduplication schemes using Charm [37], a Python-based framework for prototyping cryptosystems. Charm provides useful math operations, such as group multiplication, exponentiation, and bilinear pairing, through Python wrapper modules of the native C libraries GNU Multiple Precision Arithmetic Library (GMP) and Pairing-Based Cryptography Library (PBC). Therefore, the performance overhead caused by the use of Python is limited to less than 1% [37]. We selected the SS501 curve in our experiment, which is a supersingular elliptic curve with symmetric Type 1 pairing. We chose SHA-256 as the cryptographic hash function and AES-CBC with 128-bit keys as the symmetric block cipher algorithm.
Our implementation consists of two modules: a client-side program simulating a file-uploading user and a server-side program simulating the MCSP, which oversees deduplication. In all our experiments, the client-side program was executed on a PC with an Intel Core i7-4770 3.4 GHz CPU and 4 GB of RAM, and the server-side program was executed on a server with an Intel Xeon E5-2676 2.4 GHz CPU and 8 GB of RAM. Ubuntu 14.04 LTS (64-bit) was installed and run on both the PC and the server. For server-aided encryption schemes, we used a LAN with a 100 Mbps Ethernet link to execute interactive protocols with a remote key server.

File Encryption.
In secure deduplication schemes, file encryption makes up the majority of a user's computational burden in the file uploading phase. Therefore, we measured the execution time of file encryption in SEED and other schemes. For server-aided encryption, we chose DupLESS as a comparative scheme because its computational cost is the cheapest of its kind [17]. We conducted the experiment for both deduplication architectures (i.e., client-side and server-side deduplication) with sample files whose size varied from 1 MB to 1 GB. Regarding client-side deduplication, we assumed that deduplication always happens for all the sample files. For each experiment, the measurement was repeated 1,000 times. The results of the experiments are shown in Figure 3. The term "Execution time" on the y-axis refers to the elapsed time to compute a ciphertext C from a corresponding file F. For client-side deduplication in SEED, it is the time required to generate a tag $\tau$, because of the above assumption.
As shown in Figure 3, SEED shows better computational performance than DupLESS [8], Liu et al.'s scheme [31], and Abadi et al.'s scheme [15], which essentially require large computational tasks or high-latency interactions with remote entities during file encryption. Among server-side deduplication schemes, CE shows the shortest execution time because of its simplicity. However, when operated as client-side deduplication (Figure 3(b)), SEED shows the best computational performance owing to the novel property of lazy encryption. This is because the encryption of a file (i.e., the SE operation) can be omitted when deduplication takes place. All other schemes, including CE, must generate a full ciphertext whatever the deduplication result is, because the ciphertext is required for computing the corresponding tag. Hence, for those schemes, the file encryption performance under client-side deduplication did not differ from that under server-side deduplication.
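The control flow behind lazy encryption can be sketched as follows. This is a simplified stand-in, not the SEED algorithms: modular exponentiation replaces the group operation, SHA-256 replaces $H_1$, a single secret $x$ and set membership replace the pairing-based Test algorithm that SEED uses to compare tags across users, and `toy_encrypt` is an insecure placeholder for SE. All function names are hypothetical.

```python
import hashlib
import secrets

# Toy prime modulus standing in for the group G (assumption).
P = 2**127 - 1


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Placeholder for SE: XOR with a SHA-256 counter keystream. Not secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def make_tag(data: bytes, x: int) -> int:
    """tau = nu^x with nu = H_1(H_1(M)); needs only hashes of the file, no ciphertext."""
    mu = hashlib.sha256(data).digest()
    nu = int.from_bytes(hashlib.sha256(mu).digest(), "big") % (P - 1) + 1
    return pow(nu, x, P)


def lazy_upload(data: bytes, x: int, stored_tags: set, encrypt=toy_encrypt):
    """Return (tag, ciphertext); ciphertext is None when a duplicate makes SE unnecessary."""
    tag = make_tag(data, x)
    if tag in stored_tags:         # duplicate found: skip the SE operation entirely
        return tag, None
    key = secrets.token_bytes(32)  # stands in for kappa derived from a random s
    return tag, encrypt(key, data)
```

Because the tag depends only on hashes of the plaintext, the expensive pass over the file with SE runs only for the first upload. A scheme whose tag is computed from the ciphertext cannot reorder the steps this way.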

Deduplication.
In our second experiment, we measured the computational efficiency of the D-tree-based deduplication algorithm. The data set for the experiment consisted of files sampled from Windows system files, media files, Office files, and so on. The number of files varied from 100 to 20,000. The maximum height of the D-tree was set to π = 15.
For the comparison, we also implemented the deduplication algorithms of other schemes. We chose a red-black tree as the indexing structure of our implementations for CE, DupLESS, and Liu et al.'s scheme. A red-black tree is a type of self-balancing binary search tree that guarantees searching in O(log n) time, even in the worst case [38]. For Abadi et al.'s scheme, we used sequential search, because the equality test algorithm does not support a binary search tree. Figure 4 shows the result of the experiment. The term "Number of test operations" refers to the number of operations to test equality between ciphertexts for each deduplication. For SEED, it means the number of executions of the Test algorithm. Because the D-tree allows binary search for tags, the number of Test executions for each data set is almost the same as in the other schemes using red-black trees. SEED achieves 2-3 orders of magnitude higher performance (i.e., fewer equality-test operations) than Abadi et al.'s scheme.
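The gap between the two search strategies can be reproduced with a simple counting sketch. Here a sorted Python list of integers stands in for a tree-based index (D-tree or red-black tree), and each comparison counts as one equality-test operation; the function names and the integer tags are illustrative assumptions.

```python
def binary_search_tests(sorted_tags, target):
    """Return (found, tests): binary search that counts one test per visited entry."""
    lo, hi, tests = 0, len(sorted_tags) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        tests += 1
        if sorted_tags[mid] == target:
            return True, tests
        if sorted_tags[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, tests


def sequential_search_tests(tags, target):
    """Return (found, tests): linear scan that counts one test per visited entry."""
    tests = 0
    for tag in tags:
        tests += 1
        if tag == target:
            return True, tests
    return False, tests
```

For 20,000 entries, binary search needs at most 15 tests, whereas a sequential scan may need all 20,000, a ratio of roughly three orders of magnitude.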
We also measured the actual elapsed time during the execution of deduplication algorithms. Figure 5 presents the execution time to complete deduplication for each data set. SEED needed slightly more execution time than the other schemes that use red-black trees, because the Test algorithm includes bilinear pairing operations, which incur high computational costs. Despite the computational overhead, however, the execution time does not exceed 150 ms even for the data set with the maximum number of files. We believe that the computational overhead can be further reduced using high-performance computing technologies, such as distributed and concurrent processing.

Discussion
8.1. Reliability of the Initial Uploader. In the proposed scheme, an initial uploader contributes to subsequent file upload processes by reencrypting a part of a ciphertext. Because the reencryption is crucial for subsequent uploaders to access the encrypted content, the initial uploader is required to remain online to serve requests without interruption. In a mobile environment, for which the proposed scheme is intended, mobile devices are likely to be connected to the Internet most of the time. Hence, we reasonably assume that the reliability of the participation of an initial uploader (i.e., a mobile device) will be acceptable in most cases. However, we should consider the possibility that the initial uploader might not be available for various reasons (e.g., temporary loss of the connection). For the sake of more reliable service, we may relax the protocol so that the first N (N ≥ 1) users that uploaded a file are regarded as the initial uploaders. Subsequent uploaders will be able to successfully conduct the file upload process if at least one initial uploader responds to the reencryption request.
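The effect of this relaxation can be estimated numerically. The sketch below assumes, as the reliability analysis does, that each initial uploader is unavailable independently with probability p, so that at least one of N responds with probability $1 - p^N$. The function names and the small tolerance used to absorb floating-point rounding are choices made for this example.

```python
def upload_success_probability(p: float, n: int) -> float:
    """Probability that at least one of n initial uploaders responds."""
    return 1.0 - p ** n


def min_initial_uploaders(p: float, target: float) -> int:
    """Smallest N whose success probability reaches the target availability."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must satisfy 0 < target < 1")
    n = 1
    # The 1e-12 tolerance avoids an extra uploader caused only by rounding error.
    while upload_success_probability(p, n) < target - 1e-12:
        n += 1
    return n
```

With a 95% availability target, `min_initial_uploaders(0.05, 0.95)` gives 1 and `min_initial_uploaders(0.1, 0.95)` gives 2.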
We analyzed the reliability of file uploading for subsequent uploaders with regard to the number of initial uploaders. Consider the case where an initial uploader is not available when a reencryption request has been sent. Suppose that this event happens with probability p, independently for each initial uploader. Then, the probability that at least one initial uploader will successfully respond to the request is $1 - p^N$. Figure 6 shows the probability with regard to N and p. Commercial cloud services such as AWS and Azure usually provide a service-level agreement (SLA) that guarantees more than 95% service availability. With this information, we can choose the appropriate parameter N. For instance, we choose N = 1 for the case of p = 0.05 and N = 2 for p = 0.1.

Storage
The proposed scheme is built on a pairing-based cryptosystem. Therefore, a ciphertext generated under the proposed scheme consists of several components whose size is directly related to the pairing. More specifically, the Encrypt algorithm, described in Section 5.1.1, generates a ciphertext $C = \langle C_1, C_2, T \rangle$, among which $C_2$ is an element of $G_T$ and $T$ is an element of $G$, where $G$ and $G_T$ are multiplicative groups that form a pairing $e: G \times G \to G_T$. Hence, the ciphertext size expands by exactly $|G| + |G_T|$.
Regarding the storage overhead for the cloud service provider (i.e., MCSP), the ciphertext expansion may cause a certain level of performance degradation. However, the storage overhead can be minimized due to the deduplication feature of the proposed scheme. That is, it is not necessary to store all the ciphertext components for deduplicated files in the storage. As described in Section 5.2.1, only $T$ in $C = \langle C_1, C_2, T \rangle$ needs to be stored in the case where a duplicate file is found.
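The storage accounting described above can be made concrete with a short calculation. The sketch assumes that $|C_1|$ equals the file size (ignoring cipher padding and the IV) and treats the serialized sizes of elements of $G$ and $G_T$ as parameters, because they depend on the chosen curve; the function names and the example sizes in the note below are hypothetical.

```python
def stored_with_dedup(file_bytes: int, uploads: int, g_bytes: int, gt_bytes: int) -> int:
    """Bytes stored when the first upload keeps <C1, C2, T> and each duplicate keeps only T."""
    if uploads < 1:
        raise ValueError("uploads must be at least 1")
    full_ciphertext = file_bytes + gt_bytes + g_bytes  # C1, C2 in G_T, T in G
    return full_ciphertext + (uploads - 1) * g_bytes


def stored_without_dedup(file_bytes: int, uploads: int, g_bytes: int, gt_bytes: int) -> int:
    """Bytes stored when every upload keeps its full ciphertext."""
    if uploads < 1:
        raise ValueError("uploads must be at least 1")
    return uploads * (file_bytes + gt_bytes + g_bytes)
```

For example, with a 1,000,000-byte file uploaded 10 times and assumed element sizes of 64 bytes for $G$ and 128 bytes for $G_T$, the deduplicated store holds 1,000,768 bytes instead of 10,001,920.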

Conclusion
In this paper, we addressed the problem of deduplication over encrypted data in MCC environments by proposing SEED, a serverless and efficient encrypted deduplication scheme. The novelty of SEED originates from the elimination of key servers, which severely restrict user mobility, without sacrificing data confidentiality. The computational efficiency of file encryption is achieved through noninteractive file encryption and support for lazy encryption. As a result, SEED offers efficient, low-latency file uploading for mobile cloud storage.
Furthermore, a D-tree-based deduplication algorithm successfully reduces the time complexity of deduplication to O(log n). This makes SEED much more efficient and scalable, even in the case of large data items being outsourced to the storage. The security of SEED was rigorously analyzed in this paper, and it was shown that the proposed scheme strongly guarantees security against brute-force attacks without the help of any key servers. The analysis showed that other desired security properties, such as data integrity and collusion resistance, were also achieved by SEED.
Extensive comparative analysis and experiments were conducted to evaluate the performance of SEED. We showed that SEED has advantages in security and efficiency compared to other encrypted deduplication solutions.

Data Availability
The experimental results used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they do not have any conflicts of interest regarding the publication of this paper.

Acknowledgments
This work was extended from the poster presented at IEEE CloudCom [33]. This research was conducted under a Research Grant from Kwangwoon University in 2020. Kill-Switch and Biomarker-Based Defense System for Life-Threatening Internet of Things Medical Devices), and (No. 2020-0-00325, Traceability Assurance Technology Development for Full Lifecycle Data Safety of Cloud Edge).