PMDP: A Framework for Preserving Multiparty Data Privacy in Cloud Computing

1 State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002, China
2 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100088, China
3 School of Computer Science and Information Security, Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China


1. Introduction
With the significantly increasing data size and the rapid development of the corresponding data analysis technology, original data, which usually has the characteristics of big volume, heterogeneity, and low quality, begins to play a very important role in various fields, such as healthcare, advertising, government decision-making, and transportation. This is mainly because deep mining and analysis over these large datasets (i.e., big data) can reveal hidden and valuable information and thus produce great benefits. On the other hand, owing to these characteristics of big data and the pursuit of better outputs with more complex analysis, big data processing requires more computational overhead and resource expenditure, which challenges the traditional data processing model.
Cloud computing provides a ubiquitous and on-demand approach to accessing a shared pool of configurable computing resources, which can be rapidly provisioned and released with minimal management effort [1]. Therefore, it offers a desirable platform for big data processing and enables users to outsource their computations to cloud servers whose computing capabilities are sufficient for big data processing. To this end, users need to outsource their original data to cloud servers. However, this raises serious security and privacy issues, especially when the data is sensitive for users. For illustrative purposes, we consider the following scenario.
Mobile wearable devices have become very popular in recent years. With a smart band on your wrist, you can not only collect your own health data, like sleep time, heart rate, and motion trail, but also compare your amount of exercise with the average level or other people's levels. In this case, the data containing your private information is collected, aggregated, analyzed, and published by businesses with the assistance of cloud servers, which may result in privacy disclosure for users. Some related reports have been published (https://techcrunch.com/2016/11/03/fitbit-jawbone-garmin-and-mio-fitness-bands-criticized-for-privacy-failings/). Even worse, with the rise of the sharing economy, user privacy issues become increasingly prominent. For instance, Uber, which brings convenience by providing a car-ride service, has been accused of allowing its employees to look through and gather users' travel data and device information at will, and its application named God View has long been criticized since it can track users even after they get off the car (https://www.nbcnews.com/tech/tech-news/uber-fined-settlement-ny-over-godview-tracking-n491706).
To guarantee data confidentiality and preserve user privacy, various security mechanisms have been developed and employed in each phase of the data life cycle, which roughly comprises data storage, data processing, and data publishing (security issues also exist in the processes of data acquisition and data destruction, but they are out of the scope of this paper). For example, attribute-based encryption schemes [2,3] are used to secure data storage on public cloud servers, secure multiparty computation schemes [4,5] are introduced to protect data aggregation, and differential privacy mechanisms [6] provide a way to quantify the privacy disclosed in data publishing. In addition, we also note that several works have been proposed to protect data processing and data publishing by combining differential privacy with other cryptographic primitives. However, there are few studies concerning privacy preservation throughout the full life cycle of multiparty data, especially in the context of cloud computing. On the other hand, from a practical viewpoint, once the data privacy is exposed in some phase of the data life cycle, the security mechanisms deployed in the other phases become useless. Therefore, it is necessary to address the privacy issues of big data in cloud computing from a global perspective.

1.1. Our Contribution.
In this paper we propose a general framework for Preserving Multiparty Data Privacy (PMDP for short) in cloud computing, which provides complete protection throughout the entire life cycle of users' data and is suitable for securing multiparty data aggregation and publication with the assistance of an untrusted cloud server. Specifically, the contributions of this study can be summarized as follows:
(1) Based on well-studied security mechanisms for preserving user privacy in the processes of data storage, processing, and publishing, respectively, we combine these techniques in a nontrivial and tight manner and propose the PMDP framework, which covers the full lifecycle of multiple users' data.
(2) We present the principles for choosing the building security mechanisms involved in the PMDP framework, together with a specific instance. Furthermore, to illustrate the advantages and practicability of the PMDP framework, we evaluate the performance of the instance in terms of efficiency and functionality by comparing it with other related works.
(3) We formally discuss the security of the PMDP framework. Concretely speaking, we reduce its security to the security of the building mechanisms, including fully homomorphic encryption, secure multiparty computation, and differential privacy, all of which enjoy provable security. Thus, the PMDP framework is also provably secure.
(4) We put forward a reinforced version of the PMDP framework, named sPMDP, to provide stronger security and privacy guarantees. In addition, we also show the application scenarios of the PMDP and sPMDP frameworks.
1.2. Outline. The remainder of the paper is organized as follows. In Section 2, we review the related work on techniques for data privacy in different phases. Section 3 introduces some preliminary knowledge used in this paper. The PMDP framework is presented in Section 4. Section 5 illustrates an instantiation of the framework. In Section 6, we evaluate the performance of the framework and discuss its security along with its application scenarios. We propose the reinforced framework sPMDP and analyze its security in Section 7.
Finally, some concluding remarks are given in Section 8.

2. Related Work
In this section, in the context of cloud computing, we briefly introduce the security mechanisms used to protect data privacy in each phase of the data life cycle.
Secure Data Storage. When data storage is outsourced to cloud servers, the data owner completely loses access control over his/her data, yet for privacy reasons the owner hopes that the outsourced data can only be accessed by authorized users. A natural solution is to encrypt the data before sending it to cloud servers so that only users holding the corresponding secret keys can decrypt it. Although traditional public key encryption schemes can guarantee data security, they suffer from efficiency limitations. A newer approach is identity-based encryption, but it faces some new challenges [7]. As an extension of identity-based encryption, attribute-based encryption (ABE) [8] enables the data owner to place a fine-grained access policy over the outsourced data and can effectively address the problem of securing data storage in cloud computing. For this reason, many ABE schemes [9,10] with extended functionalities have been proposed. In addition, there are also various privacy-preserving authentication protocols, like two-factor authentication [11,12], three-factor authentication [13,14], end-to-end authentication [15], and so on. Some recent works focus on practical application fields, such as smart metering [16], the Internet of Things [17], and WBANs [18].
Secure Data Processing. The purpose of aggregating and storing data is to analyze it and extract valuable information. However, when the data is outsourced to cloud servers under the above encryption mechanisms, the data analyst has to download and decrypt the data before processing it, which is not convenient enough to satisfy the data analysis demands of the big data era. Fortunately, fully homomorphic encryption (FHE) [19], which allows cloud servers to evaluate arbitrary functions on encrypted data without decryption, can simultaneously guarantee the security of the data in the storage and processing phases. Due to its significant advantages for secure data sharing, many FHE schemes have been proposed to improve the security and efficiency of the original one. Besides, some frameworks for efficient and privacy-preserving outsourced computation have been built on FHE, such as EPOM [20], POFD [21], and POCR [22].
Another important security mechanism used to secure data processing is secure multiparty computation (MPC), which enables multiple users to perform an assigned computation on their collected data and obtain the computation result without learning anything about one another's data. To improve the efficiency of MPC in the context of cloud computing, several variants of MPC have been proposed, such as on-the-fly MPC [23] and cloud-assisted MPC. In addition, motivated by the security requirements of practical applications (e.g., outsourced database query, private set intersection, and information retrieval), some efficient and application-specific cloud-based MPC protocols [24,25] have been designed.
Secure Data Publishing. The significance of output privacy is remarkable, especially in the context of big data. Due to advances in data mining technology, preserving data privacy is getting more and more difficult, since the sensitive information in original data suffers from direct and indirect (via inference) exposure during the mining process. Namely, not only the original data but also the data mining output can lead to the disclosure of sensitive information, so it is necessary to pay attention to output privacy. Anonymization technologies are widely used to preserve data privacy in the process of data publishing. Although classical anonymization methods (e.g., k-anonymity, l-diversity, and t-closeness) have been well studied and various corresponding algorithms exist, they cannot resist structure-based deanonymization attacks [26], which implies that the published data would reveal user privacy. In contrast, differential privacy [27] provides strong theoretical guarantees on the privacy of data by adding noise with specific distributions to raw data. Roughly, the research on differential privacy applied to data publishing comprises two directions, interactive and noninteractive data publishing. In the first, the fundamental methods respond to queries by perturbing the outcome derived from the original dataset, including the Laplace mechanism and the exponential mechanism [27]. These methods are easy to implement, but the noise required to achieve privacy protection is relatively large. Afterwards, researchers developed techniques [28] that answer queries according to a noisy histogram generated from the raw data, which have low sensitivity and comparatively small noise. As for the noninteractive mode, current research mostly focuses on batch query [29], contingency table publishing [30], grouping and generalization [31], and sanitized dataset publishing [32].
Secure Data in Multiphase. The MPC technique can protect data privacy in the input and computation process, but it is not designed for output privacy; indeed, from a theoretical perspective, MPC obviously cannot guarantee output privacy. With the rapid development of database technology and cloud computing, researchers have begun to design security schemes capable of preserving data privacy in multiple phases of its lifecycle simultaneously. For instance, Pettai and Laud [33] gave a good example of combining MPC and differential privacy with reasonable performance to achieve both computational and output privacy. However, their framework, which we call DPSharemind, is built upon GUPT [34], which is secure only under the assumption that a trusted third party exists. Consequently, in the situation where multiple clients intend to delegate the computation of a joint function on their data to an untrusted cloud, the security of the framework cannot be guaranteed. Bindschaedler et al. [35] showed how to obtain a noisy (differentially private) aggregation result in a star network topology using Shamir's secret sharing scheme and additively homomorphic encryption; we call their work DPStar. They also ensured that the amount of noise in the final result can neither be reduced by colluding entities nor be secretly influenced by a cheating aggregator, which has important practical significance.

3. Preliminaries
In this section we review the concepts of secure multiparty computation and differential privacy, which are the building blocks of our PMDP framework.

3.1. Secure Multiparty Computation.
Our framework is partially built upon the on-the-fly MPC protocol that is constructed from multikey FHE. In this paper we use the FHE based on the NTRU encryption scheme of Hoffstein et al. with the modifications of Stehlé and Steinfeld [36], so we start from NTRU encryption.
NTRU Encryption. The NTRU cryptosystem is constructed over the ring R := Z[x]/⟨x^n + 1⟩, where n = 2^k for some integer k ∈ N. Let q be an odd prime number and χ be a B-bounded distribution over R (B ≪ q). Denote the polynomial ring R/qR by R_q and the coefficient-wise reduction modulo q into the set {−⌊q/2⌋, ..., ⌊q/2⌋} by [·]_q. Roughly, given a security parameter κ, the NTRU cryptosystem is specified as follows.
Keygen(1^κ): the key generation algorithm samples polynomials f′, g ← χ and lets f = 2f′ + 1. Particularly, if f is not invertible in R_q, then f′ is resampled. Denote the inverse of f in R_q by f^−1; then the public key pk and the secret key sk are calculated as sk = f and pk = h = [2g f^−1]_q. To encrypt a message bit m, Enc(pk, m) samples s, e ← χ and outputs the ciphertext c = [hs + 2e + m]_q; Dec(sk, c) computes μ = [fc]_q and outputs m = μ mod 2, which is correct as long as the noise term stays below q/2.
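For concreteness, the following is a toy Python sketch of this NTRU-style scheme. The parameters (n = 8, q = 257), the ternary distribution standing in for χ, and the naive matrix-based inversion are illustration-only choices with no security whatsoever; a real instantiation needs cryptographically sized parameters.

```python
import random

N, Q = 8, 257   # toy sizes: ring degree n = 2^3 and odd prime modulus q = 257

def polymul(a, b):
    """Negacyclic convolution: multiplication in Z[x]/(x^N + 1), coefficients mod Q."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] = (res[i + j] + ai * bj) % Q
            else:  # x^N = -1
                res[i + j - N] = (res[i + j - N] - ai * bj) % Q
    return res

def center(a):
    """Coefficient-wise reduction modulo Q into {-floor(Q/2), ..., floor(Q/2)}."""
    return [((c + Q // 2) % Q) - Q // 2 for c in a]

def inverse(f):
    """Invert f in R_q by solving f * u = 1 as a linear system modulo Q
    (naive Gaussian elimination); returns None if f is not invertible."""
    cols = [polymul(f, [int(k == j) for k in range(N)]) for j in range(N)]
    M = [[cols[j][i] for j in range(N)] for i in range(N)]  # column j is f * x^j
    rhs = [1] + [0] * (N - 1)
    for col in range(N):
        piv = next((r for r in range(col, N) if M[r][col]), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        inv = pow(M[col][col], Q - 2, Q)  # modular inverse of the pivot (Q is prime)
        M[col] = [v * inv % Q for v in M[col]]
        rhs[col] = rhs[col] * inv % Q
        for r in range(N):
            if r != col and M[r][col]:
                t = M[r][col]
                M[r] = [(M[r][k] - t * M[col][k]) % Q for k in range(N)]
                rhs[r] = (rhs[r] - t * rhs[col]) % Q
    return rhs

def small():
    """Stand-in for the B-bounded distribution chi: ternary coefficients."""
    return [random.choice((-1, 0, 1)) for _ in range(N)]

def keygen():
    """sk = f = 2f' + 1 (resampled until invertible), pk = h = [2 g f^{-1}]_q."""
    while True:
        fp, g = small(), small()
        f = [2 * c % Q for c in fp]
        f[0] = (f[0] + 1) % Q
        finv = inverse(f)
        if finv is not None:
            return polymul([2 * c % Q for c in g], finv), f  # (pk, sk)

def encrypt(h, m):
    """Enc(pk, m): c = [h s + 2 e + m]_q for a bit vector m of length N."""
    s, e = small(), small()
    hs = polymul(h, s)
    return [(hs[i] + 2 * e[i] + m[i]) % Q for i in range(N)]

def decrypt(f, c):
    """Dec(sk, c): m = [f c]_q mod 2, valid while the noise stays below q/2."""
    return [v % 2 for v in center(polymul(f, c))]
```

With these tiny parameters the noise term 2gs + 2fe + fm has coefficients of magnitude at most 88 < q/2 = 128, so decryption always recovers the plaintext.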
With the NTRU encryption scheme and the conversion techniques introduced in [23], we can derive a multikey fully homomorphic encryption scheme. Denote by {E^(κ) = (Keygen, Enc, Dec, Eval)}_{κ>0} the family of the resulting multikey fully homomorphic encryption schemes and by P the collection of parties. The notation Eval denotes the homomorphic evaluation performed by the cloud (the relinearization and squashing involved in homomorphic multiplication are not detailed due to space limitations). Then, an on-the-fly MPC protocol secure against semimalicious adversaries can be constructed as follows.
Step 1. For each i ∈ [N], the participant P_i samples a key tuple (pk_i, sk_i, ek_i) ← Keygen(1^κ; r_i) and uses pk_i to encrypt his/her input x_i: c_i ← Enc(pk_i, x_i), where ek_i is an evaluation key. Then (pk_i, ek_i, c_i) is sent to a cloud server S. At this point a function F, represented as a circuit C, has been selected on {x_i}_{i∈V} for some V ⊆ [N]. Let t = |V|.
Step 2. The cloud server homomorphically evaluates the circuit on the uploaded ciphertexts, computing c ← Eval(C, (c_1, pk_1, ek_1), ..., (c_t, pk_t, ek_t)).

Step 3. By running a general secure MPC protocol Π_dec^mul, the parties P_1, ..., P_t jointly compute the decryption function on c and obtain the output F(x_1, ..., x_t). The above on-the-fly MPC protocol can be modified to achieve security against malicious adversaries by adding zero-knowledge proofs and a succinct noninteractive argument of knowledge system, which is used in both of our frameworks to guarantee their security property.
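The message flow of this protocol can be mimicked with a deliberately simplified stand-in: below, a one-time-pad-style additively homomorphic "encryption" replaces the multikey FHE, and the evaluated circuit is fixed to SUM, so the sketch illustrates only the flow (per-party keys, cloud-side evaluation on ciphertexts, joint decryption by all involved parties), not any actual security guarantee.

```python
import random

Q = 2**31 - 1  # toy modulus standing in for the ciphertext space

class Party:
    """Each party has its own key; ciphertexts under *different* keys can still be
    combined by the cloud, mimicking the multikey property (for a SUM circuit only)."""
    def __init__(self, x):
        self.x = x
        self.k = random.randrange(Q)          # per-party secret key (stands in for sk_i)

    def encrypt(self):
        return (self.x + self.k) % Q          # c_i = x_i + k_i: additively homomorphic pad

    def partial_decrypt(self, c):
        return (c - self.k) % Q               # each party strips only its own key share

def cloud_eval(ciphertexts):
    """The untrusted cloud evaluates the circuit (here fixed to SUM) on ciphertexts only."""
    total = 0
    for c in ciphertexts:
        total = (total + c) % Q
    return total

parties = [Party(x) for x in (12, 30, 7)]     # parties encrypt and upload
c = cloud_eval(p.encrypt() for p in parties)  # the cloud never sees any x_i
for p in parties:                             # joint decryption needs all involved parties
    c = p.partial_decrypt(c)
print(c)  # prints 49 = 12 + 30 + 7
```

As in the real protocol, no single party (and not the cloud) can decrypt alone; decryption only succeeds when every involved party contributes its share.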

3.2. Differential Privacy.
Informally, differential privacy guarantees that the presence or absence of any single record in a dataset has only a limited impact on the outputs of the queries executed on the dataset. The formal definition is captured as follows.
Definition 1 (ε-differential privacy [27]). An algorithm A satisfies ε-differential privacy (ε-DP) if for any pair of neighboring datasets D and D′ and any S ⊆ Range(A), it holds that Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S], where Range(A) denotes the collection of all possible outputs of the algorithm A.
The datasets D and D′ are neighboring provided that they differ in only one tuple. We denote this by D ≃ D′. We can see that the change in the probability distribution of the output caused by adding/removing any single tuple is bounded by the factor e^ε.
As a major ε-differential privacy mechanism, the Laplace mechanism perturbs the output of a function f_DP on a dataset D by adding to f_DP(D) noise randomly sampled from the Laplace distribution. We define the global sensitivity of f_DP as Δf_DP = max_{D ≃ D′} ||f_DP(D) − f_DP(D′)||_1. Then the Laplace mechanism A_L is given as A_L(D) = f_DP(D) + Lap(Δf_DP/ε), where Lap(b) denotes a random variable with density proportional to exp(−|x|/b).

Sample-and-Aggregate. The Sample-and-Aggregate technique provides a way to lower the global sensitivity and improve the parallelism of the algorithm, and further yields a differentially private method of computing a function F. The basic mechanism is specified in Algorithm 1.

4. Our Framework
In this section, we first introduce the entities involved in the PMDP framework, then provide an overview of the framework, and finally present its details. Before doing so, we summarize the basic notations used throughout this paper in Notations.

4.1. Involved Entities.
The PMDP framework involves the following entities:
(1) Completely trusted authority: the authority takes charge of producing secret keys for all legal system users. Since the authority can decrypt any ciphertext generated by any user, we suppose that it is completely trusted.
(2) Semitrusted cloud server: a cloud server has powerful storage and computation resources that can be easily accessed on demand by system users. In the framework it is in charge of performing the evaluation part of FHE in an MPC protocol and is assumed to be semitrusted. Namely, the cloud server will honestly complete the given computation assignments but will try to learn information about the outsourced data.
(3) System users: in the PMDP framework, several system users can compute the value of a public function on their private data. Each participant is associated with a unique identifier id_i (i = 1, 2, ..., N) and holds the corresponding secret key issued by the authority. Let P be the collection of all parties. Each user may be corrupted by adversaries, which results in the disclosure of the user's secret key.
(4) Malicious adversaries: in fact, malicious adversaries do not participate in the procedure of the PMDP framework, but they do exist when we consider the security of the framework. In this paper, only adversaries that can corrupt any subset of t < N parties are considered. In the privacy analysis of our framework, we assume that adversaries have strong background knowledge.

4.2. Overview of the PMDP Framework. Roughly, the PMDP framework is a nontrivial and tight integration of MPC and DP built on the Sample-and-Aggregate mechanism; the enhanced framework sPMDP in Section 7 is even tighter, because it is based on PMDP and its noise addition is conducted on the cloud during the evaluation stage of MPC.
The PMDP framework consists of the following six stages.
Stage 0. Initially, the authority sets up the system by generating public parameters and corresponding secret keys. As in most other frameworks, a correct setup procedure is essential for the framework to work as intended.
Stage 1. Each participant encrypts his/her private data with a multikey fully homomorphic encryption scheme and outsources the resulting ciphertext to the cloud server.
Stage 2. The cloud server identifies the parties involved in the multiparty computation of the corresponding ciphertexts and partitions them into some blocks.
Stage 3. The cloud server operates on the encrypted data with on-the-fly MPC and outputs the calculated results to their corresponding blocks in the form of ciphertext.
Stage 4. The parties in the same block decrypt the returned ciphertext. Furthermore, all the decrypted results from different blocks are aggregated into a final result in accordance with the partition and sampling method of the second stage.
Stage 5. Finally, to ensure output privacy, a designated participant first runs a differential privacy mechanism on the final result and then publishes the designated result.

4.3. Details of the PMDP Framework. Now we present the details of the PMDP framework. As shown in Figure 1, the entire procedure is comprised of the following phases.

System Setup.
Initially, the authority makes the framework concrete by choosing appropriate algorithms for each part and presetting the related parameters according to a security parameter κ. Since our framework is mainly based on the on-the-fly MPC from multikey FHE, the following components are necessary:
(a) a collection of multikey fully homomorphic encryption schemes with semantic security;
(b) an NIZK argument system for the associated NP relation;
(c) an adaptively extractable SNARK system for all of NP;
(d) a family of collision-resistant hash functions;
(e) an N-party MPC protocol Π_dec^mul, secure against malicious adversaries corrupting t < N parties, for computing the family of decryption functions g_{c,pk_1,ek_1,...,pk_N,ek_N}((sk_1, r_1), ..., (sk_N, r_N)).
In addition, the sampling and partitioning method Ψ should also be determined in this part, after which the aggregation mechanism Λ_agg becomes explicit. Note that we use differential privacy in the last stage. Since there are several available mechanisms achieving differential privacy and many variants of its original definition, participants should take care to select an appropriate one in light of the function to be computed on the cloud server.
Moreover, the authority specializes the NIZK proof system Ω_enc according to a security parameter κ and sends the resulting common reference string crs_enc to the cloud server and all parties. The privacy budget ε in differential privacy is also initialized.

Data Encryption and Uploading.
In this phase all parties encrypt their data and upload them to the cloud server, before deciding which kind of multiparty computation will be performed on the outsourced data. Namely, this phase can be completed in an offline way, which reduces the communication delay of the framework. Specifically, to encrypt his/her data x_i, each party P_i ∈ P runs the key generation and encryption algorithms of the multikey FHE scheme, obtaining a ciphertext c_i together with an NIZK proof π_enc^i that the ciphertext is well formed. In addition, P_i samples a hash key hk_i and computes a hash value of the ciphertext as h_i = H_hk_i(c_i). Moreover, P_i generates a tuple of verification reference string and private verification key (vrs_i, priv_i) ← Setup_Φ(1^κ). After that, P_i sends the tuple (pk_i, ek_i, c_i, π_enc^i, hk_i, h_i, vrs_i) to the cloud server and keeps the corresponding secret/private keys and random values secret.
Note that after receiving the tuple from P_i, the cloud server will check whether the ciphertext c_i is well formed by verifying the associated proof π_enc^i.
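The hash value h_i can be illustrated with the keyed hash HmacSHA256 named in the instantiation of Section 5; the byte strings below are placeholders for the real ciphertext and keys, and the NIZK proof π_enc is not modeled here.

```python
import hashlib
import hmac
import os

ciphertext = os.urandom(64)  # placeholder bytes standing in for the FHE ciphertext c_i
hk = os.urandom(32)          # per-party hash key hk_i

# h_i = H_{hk_i}(c_i), instantiated with HMAC-SHA256
tag = hmac.new(hk, ciphertext, hashlib.sha256).hexdigest()

def verify(key, ctxt, expected):
    """Recompute the keyed hash and compare in constant time; this detects any
    later modification of the stored ciphertext."""
    return hmac.compare_digest(hmac.new(key, ctxt, hashlib.sha256).hexdigest(), expected)
```

Anyone holding hk_i can recompute the tag and detect tampering with the stored ciphertext, while parties without the key cannot forge a matching tag.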

Sampling and Partitioning.
In this phase all participants agree on the mechanism of sampling and partitioning. There are a variety of sampling and partitioning mechanisms with different features for differential privacy over datasets, such as random sampling, uniform (fixed-size) sampling, fraction sampling, Bernoulli sampling, random partitioning, and cell-based and kd-tree-based partitioning. Such mechanisms can not only decrease the sensitivity of the function to be computed, but also parallelize the computation on the cloud server. Typically, if sampling is not necessary, random partitioning is a common choice due to its simplicity, effectiveness, and practicability. This is the first step of the well-known Sample-and-Aggregate algorithm. By randomly partitioning the original dataset D into m disjoint subsets {D_j | 1 ≤ j ≤ m} of almost the same size, we can perform the secure delegation of multiparty computation on each subset in the next phase. When the original dataset is partitioned, the corresponding collection of parties P is also partitioned into several subsets {P_j | 1 ≤ j ≤ m}. The parties belonging to the same P_j form a group.
It is suggested that the size of the blocks obtained from sampling and partitioning lie in the range (N^0.5, N^0.6), since too big a size makes the sampling and partitioning operation meaningless, while too small a size threatens the data privacy of the participants.
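A sketch of the random-partitioning step at this scale, assuming m = ⌊N^0.4⌋ blocks (the block count used in the instantiation of Section 5), which puts the block sizes near N^0.6:

```python
import math
import random

def partition_indices(n_parties, rng=random):
    """Randomly partition party indices into m = floor(n^0.4) blocks, so each
    block holds roughly n^0.6 members; block sizes differ by at most one when
    m does not divide n evenly."""
    m = max(1, math.floor(n_parties ** 0.4))
    idx = list(range(n_parties))
    rng.shuffle(idx)
    return [idx[j::m] for j in range(m)]
```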

Homomorphic Evaluation.
In this phase the cloud server first represents the function F to be computed as a circuit C, and then performs the multiparty computation on each subset D_j (1 ≤ j ≤ m).
Concretely, for each j ∈ {1, ..., m}, let the size of P_j be t_j and the corresponding parties be {P_1, ..., P_{t_j}}. The cloud server computes c_j ← Eval(C, (c_1, pk_1, ek_1), ..., (c_{t_j}, pk_{t_j}, ek_{t_j})), (17) and produces succinct arguments {π_j}_{j∈[m]} for the corresponding NP language. First, the parties in each group P_j jointly decrypt the returned ciphertext c_j by running the protocol Π_dec^mul, obtaining the group result y_j. Second, these participants from different groups merge their outputs into a common output by performing the aggregation mechanism y ← Λ_agg({y_j | j = 1, ..., m}). (20) To simplify the above aggregation procedure, the party with the most computing power in each group is assigned as the agent of the group, and all these agents are designated to accomplish the aggregation work. Moreover, the aggregation procedure can also be outsourced to the cloud server by using a secure delegation protocol, for example, the cloud-assisted MPC from threshold FHE [5].
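As one concrete, assumed choice of Λ_agg suited to a mean-valued target function F, the agents can merge the group results by a size-weighted mean:

```python
def lambda_agg(group_outputs, group_sizes):
    """A size-weighted mean as Lambda_agg: appropriate when the target function F
    is itself a mean, so groups of different sizes are weighted fairly. In
    general, Lambda_agg must be chosen to match the F fixed at system setup."""
    total = sum(group_sizes)
    return sum(y * s for y, s in zip(group_outputs, group_sizes)) / total
```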

Privacy-Preserving Result Release.
To preserve the output privacy, in this phase the noise generated by a differential privacy mechanism is added to the aggregated output. Specifically, the following issues should be jointly considered by the parties before sampling the noise.
First, since there are a few variants of ε-differential privacy, it is necessary and important to make a proper choice among them according to the participants' requirements. Particularly, (ε, δ)-differential privacy is the most widely used one and suits situations with a looser restriction on privacy loss. Personalized differential privacy provides more accurate control of the consumption of the privacy budget at an individual level and is usually used in interactive queries. Concentrated differential privacy gives high-probability bounds for the cumulative loss of numerical computations and works well for preserving group privacy. All in all, the choice should be based on the computation to be made on the cloud server and the desired privacy-preserving level.
Second, researchers have developed several differential privacy mechanisms, which achieve the same DP definition in different application scenarios. For example, the Laplace mechanism can achieve ε-DP for real-valued queries, while the exponential mechanism is an ε-DP method of sampling from a discrete set of candidate outputs. Thus, if parties want to publish their average wage in a differentially private way, the Laplace mechanism is suitable; but if they want to release the result of their secure vote for a new leader, the exponential mechanism is the better option.
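The voting example can be sketched with the exponential mechanism as follows; the vote tallies and the unit sensitivity of the per-candidate vote-count utility are illustrative assumptions, not values from the paper.

```python
import math
import random

def exponential_mechanism(candidates, utility, eps, sensitivity, rng=random):
    """Sample a candidate r with probability proportional to
    exp(eps * u(r) / (2 * Du)), which gives eps-DP when `utility` has global
    sensitivity `sensitivity`."""
    u_max = max(utility(r) for r in candidates)  # shift by the max for numerical stability
    weights = [math.exp(eps * (utility(r) - u_max) / (2.0 * sensitivity))
               for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical leader election: the utility of a candidate is his/her vote
# count, and one voter switching a ballot moves any single count by at most 1.
votes = {"Alice": 40, "Bob": 35, "Carol": 25}
winner = exponential_mechanism(list(votes), votes.get, eps=1.0, sensitivity=1)
```

The output is always a valid candidate (never a noisy non-integer tally), which is exactly why the exponential mechanism fits discrete choices like elections.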
Third, by the definition of differential privacy, the sensitivity of the function is a key parameter in any concrete realization of differential privacy. Additionally, the computation outsourced to the cloud server and the aggregation mechanism used significantly affect the magnitude of the sensitivity, so it is essential to pay attention to both of them.
The last issue to be considered is which party should be responsible for generating, handling, and adding the noise. It seems that any party can take on such a task, but this relies on the assumption that all parties are at least semihonest and do not collude in the result release phase. Otherwise, malicious participants may try to weaken the differential privacy of honest participants by sampling noise smaller than the required magnitude, and semihonest participants may collude to disclose the information of honest participants by sharing their input data and subtracting the noise term from the published result. In our framework we assume that all participants are honest, or at least semihonest without collusion, in this phase. As a result, the output privacy can be guaranteed by selecting an agent of all parties to run the differential privacy mechanism on the aggregated result.

5. An Instantiation of the Framework
In this section, to illustrate the effectiveness of the PMDP framework, we present an instance of it and show how to use it to solve the problem of N parties securely computing and publishing their average wage with the help of a semitrusted cloud server. Specifically, this instantiated framework consists of the following stages.
Stage 0. The authority instantiates the components listed in Section 4, including (c) an adaptively extractable SNARK system for all of NP, (d) the family of cryptographically collision-resistant hash functions HmacSHA256, and (e) the cloud-assisted N-party MPC protocol proposed by Asharov et al. [5] for computing the family of decryption functions.

Stage 1. Each party P_i encrypts his/her wage under his/her own public key, obtaining a ciphertext of the form c_i = [h_i s_i + 2e_i + x_i]_q, where s_i and e_i are randomly sampled from χ and x_i is P_i's wage (the additional symbols related to ek_i can be further referred to in [23]).
Then, P_i sends the tuple (pk_i, ek_i, c_i, π_enc^i, hk_i, h_i, vrs_i) to the cloud server, which will verify the correctness of the proof π_enc^i. The values (s_i, e_i), (hk_i, h_i), and priv_i are locally maintained by P_i.
Stage 2. After gathering the encrypted data from all parties, the cloud server forms them into a ciphertext dataset D. Then, following the Sample-and-Aggregate algorithm [37], it randomly partitions D into m = ⌊N^0.4⌋ disjoint subsets {D_j | 1 ≤ j ≤ m} of almost the same size. Meanwhile, the collection of all parties P is also partitioned into the corresponding groups {P_j | 1 ≤ j ≤ m}.
Stage 3. Since the problem is to compute the average wage of all parties, the cloud server needs to evaluate the mean value function, represented as a circuit C_avr. To this end, for each subgroup P_j = {P_1, ..., P_{t_j}} (1 ≤ j ≤ m), the cloud server computes c_j ← Eval(C_avr, (c_1, pk_1, ek_1), ..., (c_{t_j}, pk_{t_j}, ek_{t_j})) and generates the succinct arguments {π_k}_{k∈[t_j]} for the corresponding NP language.

Stage 4. The parties in each group P_j jointly decrypt c_j and obtain the group average. Then, the agent in the group P_j clamps it into the interval [y_left, y_right], where y_left, y_right are the left and right bounds in the Sample-and-Aggregate algorithm. Furthermore, all agents calculate the average y of the clamped group results with the help of the on-the-fly MPC protocol.

Stage 5. All parties vote for a publisher from all subgroup agents and authorize him/her to sample a noise and publish the final average wage of all parties as y_DP = y + Lap((y_right − y_left)/(mε)), where ε is the privacy budget.
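The plaintext logic of these stages can be sketched as follows; the clamp bounds and the Laplace scale (hi − lo)/(m·ε) are assumptions consistent with the Sample-and-Aggregate mechanism rather than formulas quoted from the instance, and the real framework evaluates the group means under multikey FHE rather than in the clear.

```python
import math
import random
import statistics

def lap(scale, rng=random):
    """Sample Lap(scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_average_wage(wages, eps, lo, hi, rng=random):
    """Plaintext sketch of Stages 2-5: partition the N wages into m = floor(N^0.4)
    random groups, take each group's mean, clamp it to the public bounds [lo, hi],
    average the clamped group means, and add Laplace noise of scale
    (hi - lo) / (m * eps) -- one wage affects a single group mean, by at most
    (hi - lo) / m after clamping."""
    n = len(wages)
    m = max(1, math.floor(n ** 0.4))
    idx = list(range(n))
    rng.shuffle(idx)
    groups = [[wages[i] for i in idx[j::m]] for j in range(m)]
    clamped = [min(hi, max(lo, statistics.mean(g))) for g in groups]
    return statistics.mean(clamped) + lap((hi - lo) / (m * eps), rng)
```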

6. Performance Discussion and Security Analysis
In this section we first systematically analyze the security of the PMDP framework along with its application scenarios. Then, we briefly discuss the performance of the PMDP framework in terms of computation overhead and security properties.

6.1. Security Analysis.
To simplify the security analysis of the PMDP framework, we separate participants into two categories, honest and semihonest, as in the security analysis of MPC protocols. On the other hand, the cloud server is always assumed to be untrusted. In addition, since the framework uses a differential privacy mechanism, we also take the background knowledge attack into consideration. Below we present the security analysis of our framework in the honest model and the semihonest model, respectively. Since we extend the security models of traditional MPC protocols to those of our frameworks, in the following analysis we mainly focus on demonstrating that output privacy cannot be violated, as input and computational privacy are already proven to be guaranteed by prior works on MPC [23].
6.1.1. Honest Model. We first show that the PMDP framework preserves data privacy under the assumption that all participants are honest and do not attempt to obtain others' information. That is, each participant honestly follows the framework procedure as required. Specifically, in Stage 1 of the framework, each participant's data is encrypted with FHE and then outsourced to the cloud server; thus, data privacy in the storage phase is ensured by the security of the underlying FHE. In Stage 3, all data are computed in the form of ciphertexts with the on-the-fly MPC protocol based on multikey FHE, whose security and correctness have been proved before. Therefore, data privacy in the processing phase is also protected, which means that each participant learns only his/her own input and the output of his/her group. The operations of sampling, partitioning, and aggregation in Stages 2 and 4 do not influence the data privacy or the correctness of intermediate results if chosen appropriately, though there may be some accuracy loss. The aggregated result is transformed into a differentially private form and published in Stage 5, which ensures output privacy owing to the theoretical guarantee of differential privacy. To sum up, in the honest model, the PMDP framework preserves the data privacy of all participants throughout the data lifecycle, without affecting data usability.
Although the honest model is very simple and may seem idealized, the framework can be deployed in particular applications under it. A practical example is a group of medical facilities that try to obtain meaningful health insights by analyzing their clients' health data. Obviously, these facilities need health data storage, processing, and release. By invoking our framework, these facilities can be regarded as participants whose data is put to good use without any risk of privacy disclosure. In this instance, the medical facilities only care about the features, objective laws, and tendencies of the population rather than information about individuals, so the setting complies with the definition of the honest model.

6.1.2. Semihonest Model.
In the semihonest model, a semihonest participant follows the framework procedure but also tries to learn information about other parties. Moreover, a semihonest participant may collude with others, including other semihonest participants and external adversaries with background knowledge. In this work we assume that there is at least one honest party among all parties. We will show that the PMDP framework is secure in the semihonest model only when there is no collusion.
Firstly, we consider the noncolluding situation. A semihonest participant P_i knows his/her own input x_i, the local result of his/her group, the aggregated global result of all groups, the noise, and the output y_DP. Besides, he/she knows the method of sampling and partitioning.

6.2. Performance Discussion. Since the PMDP framework involves several general security mechanisms, it is difficult to accurately evaluate its computation and communication costs. Thus, we illustrate its efficiency by discussing the performance of the instance presented in the prior section.
In the instance, each general party who is not assigned to be an agent needs to generate his/her public/secret keys and perform encryption and NIZK operations in Stage 2. Moreover, such a party is also in charge of running the verification and decryption algorithms in Stage 5. Thus, his/her computation cost is roughly the same as that of each party in an on-the-fly MPC protocol. According to the analysis of López-Alt et al. [23], the computation cost of each general party is at most polylogarithmic in the circuit size and the total size of all inputs, and polynomial in the size of his/her own input. An assigned agent has one extra calculation task, namely the aggregation operation; in most cases, the aggregation complexity is linear in the size of the agent's group. In addition, since we assume that the cloud server has sufficient computation resources, we omit its computation overhead.
In Table 1 we compare the PMDP framework with other related works in terms of security properties, including delegation of storage, privacy in different processes, the reliability of the cloud server, the security model, and whether they support distributed computation. We select three representative works for comparison, namely, the GUPT system [34], the scheme designed by Pettai and Laud [33], which we call DPSharemind, and the protocol from Bindschaedler et al. [35], which we call DPStar. The last row concerns the sPMDP framework, which we introduce later.
From Table 1, we can see that PMDP overall outperforms the works in the first three rows. Firstly, PMDP enjoys an advantage in delegation of storage, which is important in the era of big data: it employs FHE, while GUPT does not consider data storage, and both DPSharemind and DPStar employ secret sharing, thus breaking the integrity of data. Consequently, all works but GUPT can provide a lifelong data privacy guarantee covering input privacy, computational privacy, and output privacy. All of the works can be cloud-assisted, but only PMDP and DPStar allow the cloud server to be untrusted. The security model mainly concerns the participants of the works. We have shown that PMDP is secure under the noncolluding semihonest model, and this is where it is weaker than DPStar. However, DPStar does not support distributed computing; only PMDP and DPSharemind do, since they use sampling and aggregation. In order to make up for the shortcoming of PMDP in its security model, we propose the sPMDP framework in the next section.
Before introducing sPMDP, we further illustrate the differences between PMDP and DPStar, the most recent published work addressing the privacy concerns of multiparty computation. The fundamental difference is that DPStar uses Shamir secret sharing and homomorphic encryption, whereas PMDP uses on-the-fly MPC from multikey FHE. As a result, the security model of PMDP can be strengthened easily while that of DPStar cannot. Besides, in [35] DPStar is first introduced as a summation protocol and then extended to other queries, including count queries, histograms, and linear combinations, meaning that it has to be readjusted and several new operations have to be added when facing different queries. By contrast, PMDP is a framework whose details are fully specified, so it can be applied to different queries easily. Hence the generality and usability of DPStar are not as good as those of PMDP. From the above, we conclude that DPStar has good security properties and important practical significance, while our PMDP framework has advantages over it in delegation of storage, distributed computing, generality, and usability. As a reinforced version of PMDP, sPMDP also has an advantage over DPStar in the security model, and we prove this in the next section.

Security Enhanced PMDP Framework
The main reason that the PMDP framework suffers from the above attacks is that too much information is accessible to participants. If the intermediate result of each group is unknown to everyone, then some of the attacks are no longer effective. Furthermore, the aggregated result and the noise should also be unavailable to any participant; otherwise an adversary with strong background knowledge might be able to infer users' inputs. Motivated by these observations, we propose a security enhanced PMDP framework (sPMDP for short).
Since the initialization mechanism and cryptographic primitives used in the sPMDP framework are similar to those in PMDP, we only briefly introduce its details. As shown in Figure 2, the sPMDP framework consists of the following stages.
Stage 0. Initialize the system as in the PMDP framework.

Stage 3. Each party samples a noise value as the agent does in the PMDP framework, and all parties compute the ciphertext of the average noise of all parties with the help of an on-the-fly MPC protocol. In particular, we use the protocol without its decryption step, and the ciphertext of the average noise is used in the next stage. The average noise is denoted by η and its ciphertext by η_enc. The value of η is never learned by any entity during the whole procedure.
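Stage 3's guarantee, that the total noise is computable while no single party's noise is ever revealed, can be sketched with additive secret sharing standing in for the multikey-FHE protocol. This is a simplification we introduce purely for illustration (the real framework keeps even the average encrypted and never decrypts it); the field modulus and party count are arbitrary choices:

```python
import random

PRIME = 2_147_483_647  # field modulus for additive sharing (illustrative choice)

def share(value: int, n: int) -> list[int]:
    # Split `value` into n additive shares: they sum to value mod PRIME,
    # and any n-1 of them together reveal nothing about value.
    shares = [random.randrange(PRIME) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Each of 3 parties samples a private noise term (as in Stage 3).
noises = [7, 12, 5]
n = len(noises)

# Every party shares its noise among all parties; party j ends up holding
# one share of every noise value and locally adds them up.
all_shares = [share(x, n) for x in noises]
local_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]

# Only the combination of all local sums reconstructs the total noise;
# no individual noise value is ever exposed along the way.
total_noise = sum(local_sums) % PRIME
assert total_noise == sum(noises) % PRIME
```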
Then, the function f* is represented as a circuit C*, and the cloud server evaluates C* over the outsourced ciphertexts together with the ciphertext of the average noise. The cloud server also produces succinct arguments {π*_i}, i ∈ [n], for the NP language L*, similar to the language L in the original framework.
Stage 5. Each party runs the verification and decryption algorithms, and y_DP is released as the final result.
As to the security of sPMDP against malicious adversaries, we prove it from two different perspectives. Firstly, we demonstrate the security by theoretical proof. The difference between PMDP and sPMDP is obvious: in sPMDP, we integrate the operations of sampling, aggregation, and noise generation and addition into the on-the-fly MPC scheme, thus letting the on-the-fly MPC scheme provide a global security guarantee. The aim of merging functions is to turn the original evaluation function and the average-noise-addition function into a single function evaluated by the cloud, thus protecting the amount of noise against malicious participants. Another advantage of merging functions, besides enhancing security, is that it optimizes the structure of the PMDP framework, reducing participant assignments and achieving a tighter combination of the MPC protocol and differential privacy. It has been proved in [23] that the on-the-fly MPC scheme is secure against malicious adversaries, which means that sPMDP is secure under the malicious model; hence the security of sPMDP is stronger than that of PMDP and DPStar. It is remarkable that the computation of the encrypted average noise also uses the on-the-fly MPC scheme, thus ensuring that no one learns others' noise or the value of the average noise.

Now we roughly explain the security of the sPMDP framework from another perspective. Note that each party in sPMDP only knows his/her own input and noise in (44). Because there is at least one honest party, w.l.o.g. we assume that some party is honest and will not reveal his/her input and noise to others. Therefore, for the other parties, (44) contains at least two unknowns, namely the honest party's input and noise. From the perspective of solutions of the equation, an equation with more than one unknown and no further conditions has infinitely many solutions, so the input and the noise of the honest party will always remain unknown to others. Again, we see that in sPMDP the privacy of honest parties is preserved.
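The "infinitely many solutions" argument can be made concrete with a toy instance (the numbers and the summation query are our own illustration, not the framework's actual function): suppose the published quantity is y = x1 + x2 + r1 + r2 and an adversary controlling party 2 knows x2, r2, and y. Every candidate value of the honest party's noise r1 yields a consistent input x1, so the adversary cannot pin x1 down:

```python
# Published (noisy) sum and the adversary's knowledge: party 2's input/noise.
y = 40.0            # y = x1 + x2 + r1 + r2
x2, r2 = 12.0, 3.0

# The adversary can only derive the combined unknown x1 + r1:
combined = y - x2 - r2   # = x1 + r1

# Any candidate noise r1 gives a consistent input x1 = combined - r1,
# so infinitely many (x1, r1) pairs match the adversary's observation.
candidates = [(combined - r1, r1) for r1 in (0.0, 5.0, 10.0, 25.0)]
for x1, r1 in candidates:
    assert x1 + x2 + r1 + r2 == y
```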
As the price of stronger security guarantees, the computation overhead of each party in sPMDP is larger than that in PMDP; the additional part comes from the computation of the encrypted average noise. Overall, its computation cost still follows that of the on-the-fly MPC protocol.

Conclusion
In this work we focus on the problem of preserving multiparty data privacy throughout its lifecycle in cloud computing and propose a privacy-preserving framework named PMDP. The framework is built upon on-the-fly MPC, the sampling, partitioning, and aggregation mechanisms, and differential privacy, thus simultaneously guaranteeing input privacy, computational privacy, and output privacy in cloud computing, even if the cloud server is untrusted. The security analysis shows that the framework achieves the intended security goals in the honest model. To counter the potential attacks in the semihonest model, we further present a security enhanced PMDP framework. The performance discussion indicates that the proposed frameworks offer advantages in security guarantees and thus are more desirable for secure multiparty data aggregation and publishing.

Notations

ε: The privacy budget
Δ: The sensitivity of f_DP
A: An algorithm satisfying ε-differential privacy
Range(A): All possible outputs of A
D, D': A pair of neighboring datasets
y_DP: The differentially private output
f_DP: A function on a dataset
B_left, B_right: Clipping range
P_i: A participant of the framework
n: The number of participants
x_i: The input of a participant
k, k*: The number of blocks (groups)
D, D_i: The original dataset and its subsets
Φ: An adaptively extractable SNARK system
Ω_enc: A NIZK argument system
R_enc: An NP relation
H: A family of collision-resistant hash functions
Π_dec: An n-party decryption MPC protocol
Dec[·]: Decryption functions
hk_i: Hash keys
h_i: The hash digest of a ciphertext
π_enc,i: A NIZK argument for a ciphertext
vrs_i: A verification reference string
priv_i: A private verification key
versuc: The verification of the argument is successful
crs_enc: Security parameter
P, P_i: The set of parties and its subsets
X, X_i: The set of all parties' data and its subsets
P*: An honest participant
P_i, P°: Semihonest participants
Λ_agg: The aggregation function
r_i: A random value
s, e, s_i, e_i: Polynomials sampled from some distribution
f: The function that the multiple parties want to compute
f': The merge of f and the aggregation function
f*: The sum of f' and the average noise
C, C*: The circuits of f' and f*
η_i: The random noise following some distribution
η, η_enc: The average noise and its ciphertext

Stage 1. All parties encrypt their inputs with multikey FHE and upload the resulting ciphertexts to the cloud server.

Stage 2. Parties decide the method of sampling and partitioning and obtain the groups {X_i | 1 ≤ i ≤ k*}.

Stage 4. All parties (or an agent of all parties) first merge the original function f to be computed on each group and the aggregation function Λ_agg for all groups into one function f', which takes all parties' inputs:

f' = Merge(f, Λ_agg) = Λ_agg({f(X_i), i = 1, . . ., k*}).
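The merge step can be sketched in plain Python over plaintext data (the function name `merge` and the toy per-group/aggregation functions are our own illustration; in the framework the merged function is compiled into a circuit and evaluated by the cloud over ciphertexts):

```python
from statistics import mean

def merge(f, agg):
    # Build f' = agg({f(X_i)}): apply the per-group function f to every
    # group, then combine the group results with the aggregation function.
    def merged(groups):
        return agg([f(group) for group in groups])
    return merged

# Toy instance: per-group sums, aggregated by averaging across groups.
f_prime = merge(sum, mean)
groups = [[1, 2, 3], [4, 5], [6]]
result = f_prime(groups)   # mean([6, 9, 6]) = 7
```

The point of the construction is that `f_prime` is a single function of all parties' inputs, so one circuit evaluation by the cloud covers both the per-group computation and the aggregation.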