Sharing Privacy Protected and Statistically Sound Clinical Research Data Using Outsourced Data Storage

Sharing clinical research data stored in outsourced, generally available cloud computing services is critical to scientific progress. Researchers are able to obtain valuable information that they would not otherwise be able to access; however, privacy concerns arise when sharing clinical data in these outsourced publicly available data storage services. HIPAA requires researchers to deidentify private information when disclosing clinical data for research purposes and describes two available methods for doing so. Unfortunately, both techniques degrade statistical accuracy. Therefore, the need to protect privacy presents a significant problem for data sharing between hospitals and researchers. In this paper, we propose a controlled secure aggregation protocol to secure both privacy and accuracy when researchers outsource their clinical research data for sharing. Since clinical data must remain private beyond a patient's lifetime, we take advantage of lattice-based homomorphic encryption to provide long-term security against quantum computing attacks. Using lattice-based homomorphic encryption, we design an aggregation protocol that aggregates outsourced ciphertexts under distinct public keys. It enables researchers to get aggregated results from outsourced ciphertexts of distinct researchers. To the best of our knowledge, our protocol is the first aggregation protocol that can aggregate ciphertexts encrypted with distinct public keys.


Introduction
Researchers can accelerate their learning curve if they are able to freely access clinical data from other studies. Such clinical data sharing in outsourced publicly available services is crucial to scientific progress in clinical research. The benefits of clinical data sharing using these services have been widely reported, including reduced research costs, reduced management costs, improved quality control, and reduced time in discovering diseases and dealing with them effectively. Through shared data, researchers access valuable information that they would not ordinarily obtain. In its policy statement on grants, the U.S. National Institutes of Health (NIH) supports data sharing by requiring investigators to include a plan for data sharing or explain why data sharing is not possible.
The problem with clinical data sharing in outsourced publicly available services for research is that researchers can inadvertently violate patient privacy. HIPAA (Health Insurance Portability and Accountability Act) offers protection of patients' personal health information, but it is difficult not to invade patient privacy while sharing clinical data in outsourced publicly available data storage services [1]. Therefore, researchers would rather not make their data publicly available than run the risk of violating HIPAA.
To mitigate privacy concerns, HIPAA describes two ways to use and disclose clinical data for research purposes. Under the HIPAA Safe Harbor policy, clinical data should be deidentified so that patients are not individually identifiable. The policy stipulates that the data sharer should deidentify data by removing 18 specific data attributes, such as name, address, and all dates related to the individual patient, which may include birth date and date of death. (In addition, some researchers continue to assert that combinations of other data that are excluded from the HIPAA Safe Harbor policy could individually identify a specific person with nonnegligible probability, so they insist that more than 18 specific data attributes should be included in the Safe Harbor policy [2][3][4].) Once identifying information has been removed, the deidentified data are no longer subject to Institutional Review Board (IRB) oversight. Alternatively, researchers may use anonymity techniques to deidentify patient information instead of removing all of the 18 or more data attributes that are required to be deidentified. To date, several anonymity techniques have been proposed, such as k-anonymity [5][6][7], ℓ-diversity [8], and t-closeness [9].
Deidentification is useful for protecting patient privacy when sharing clinical data in outsourced publicly available data storage services, but it degrades statistical accuracy because it makes precise statistical results difficult to obtain. In cases where accurate statistical data on patients are critical, the anonymity techniques for deidentification are therefore not sufficient. Working from poorly deidentified data, researchers can make bad decisions. Therefore, there needs to be a privacy-preserving method that still yields accurate statistical data.
In this work, we propose how to outsource clinical research data securely and how to control the outsourced data against potential breaches of privacy, while not compromising the accuracy of statistical results. For example, a malicious researcher could circumvent any encryption by asking for one piece of data on one patient at a time; in this way, the researcher could ultimately obtain each patient's private information. We propose a method that will foil such a malicious attempt.
The system environment we propose for hospitals, an aggregator, and researchers is illustrated in Figure 1. In our system, each hospital outsources its own clinical data to cloud storage servers. The clinical data must be deidentified or encrypted to be stored publicly. We use a hybrid method to store the clinical data; that is, we deidentify the clinical data for approximate statistical data requests and encrypt numerical clinical data for accurate statistical data requests. Therefore, researchers can request both approximate and accurate statistical data. Researchers obtain approximate statistical data directly from the cloud storage servers but cannot obtain accurate statistical data directly. When researchers would like to get accurate statistical data, they can get the data through the aggregator. The aggregator aggregates the requested data from the encrypted database stored in the cloud storage servers and then asks each hospital to consent by helping to decrypt the aggregated data. Hospitals can refuse the request of the aggregator, unless initial consents that have been obtained from patients allow the secondary research. Since there are ethical and practical issues associated with aggregating databases [10], hospitals should ensure that they are following "best practices" for their outsourced data, such as determining whether initial consents that have been obtained allow secondary research.
Since clinical data should remain private beyond a patient's lifetime, cryptographic long-term security is essential [11] in the area of managing clinical data. Therefore, we take advantage of a lattice-based homomorphic encryption scheme in order to encrypt clinical data. Lattice-based cryptography is believed to be secure against quantum computing attacks and is therefore a candidate for long-term security. RSA, ECC, and DLP-based cryptosystems, which have gained the most attention so far, could be attacked with quantum computers [12]. Large-scale quantum computing is not yet practical, but may become so in our lifetime. Furthermore, lattice-based cryptographic algorithms can be efficient in computational overhead because they require only linear operations on matrices, such as addition, multiplication, and inversion.
In 2009, Gentry proposed the first fully homomorphic encryption scheme using ideal lattices [13]. In 2010, Gentry et al. proposed a novel homomorphic encryption scheme (referred to as the GHV homomorphic encryption scheme hereafter) that supports one multiplicative and polynomially many additive operations on encrypted data [14]. As a building block, we use a variant of the GHV homomorphic encryption scheme that supports only additive operations. This makes it possible to aggregate ciphertexts that are encrypted under distinct public keys. Due to this property, the aggregator can aggregate the outsourced encrypted data from hospitals. Therefore, once hospitals outsource their clinical data, they do not need to encrypt the clinical data again for individual researchers. Each hospital only has to encrypt the clinical data once and then outsource the encrypted data.

Contributions.
In this paper, we propose a controlled secure aggregation protocol for sharing clinical research data that balances the interests of hospitals and researchers. The main contributions of this paper are as follows.
(i) Researchers can get approximate statistical data from deidentified clinical data directly. Researchers can also obtain accurate and aggregated clinical data from the encrypted database through the aggregator by obtaining each hospital's consent.
(ii) We take advantage of a lattice-based homomorphic encryption scheme that is believed to be secure against quantum computing attacks. Therefore, our protocol resists quantum attacks and could remain secure in the long term.
(iii) The aggregator can aggregate encrypted clinical data that are encrypted with distinct public keys. Therefore, hospitals do not have to encrypt the clinical data again whenever researchers send requests.
To the best of our knowledge, our protocol is the first protocol that takes advantage of lattice-based homomorphic encryption in order to share outsourced clinical research data.
Organization. The remainder of this paper is organized as follows. Section 2 provides related works and background. Section 3 presents our controlled secure aggregation protocol. We present our secure clinical data aggregation system in Section 4 and analyze it in Section 5. We provide our conclusions in Section 6.

Related Works and Background
In this section, we present related works and background.

Data Aggregation Based on Homomorphic Encryption. In 2004, Hacıgümüş et al. proposed an aggregation protocol over encrypted relational databases [15]. They designed the aggregation protocol using the PH (Privacy Homomorphism), which supports additive and multiplicative operations. In the aggregation protocol, permitted users can get accurate aggregated data. However, Mykletun and Tsudik showed that the aggregation protocol using the PH is not secure against ciphertext-only attacks [16]. Since then, various aggregation protocols over encrypted data have been proposed in the literature [17][18][19][20][21][22][23]. Few of those protocols focus on the health-care environment. In addition, most of them consider aggregation over a single provider's data.
Molina et al. [22] designed the aggregation protocol, HICCUPS, using homomorphic encryption in the healthcare environment.In HICCUPS, clinical data of multiple providers can be aggregated as follows: caregivers who store clinical data on their own database are randomly chosen as the aggregator.When a researcher requests the aggregated result, the aggregator aggregates the encrypted clinical data from each caregiver and sends the aggregated result to the researcher.
Since HICCUPS is not based on an outsourcing system, caregivers have to provide clinical data whenever a researcher requests particular data. In addition, HICCUPS requires each caregiver to aggregate and encrypt clinical data with the researcher's public key so that the aggregator can aggregate the encrypted clinical data. However, a malicious aggregator may want a researcher to get a misleading result and may intentionally exclude the encrypted clinical data from certain caregivers. Even if the malicious aggregator fabricates the aggregated result on purpose, there is no way for a researcher to detect the malicious behavior of the aggregator in HICCUPS.
To resolve the above issues, we design the controlled secure aggregation protocol, which can aggregate outsourced ciphertexts under distinct public keys. Therefore, data providers (or hospitals) do not have to encrypt clinical data again once they have outsourced their clinical data. Our protocol also enables a researcher to detect malicious behavior of the aggregator. If the malicious aggregator excludes the encrypted clinical data from certain data providers on purpose, a researcher can detect that. Since the data providers (or hospitals) collaboratively make the aggregated data decryptable by a researcher, if the aggregated data are generated maliciously, then the researcher cannot get a plausible result. Instead, the researcher gets a random-looking result that does not appear meaningful. Therefore, in our protocol, the researcher can be sure that the requested data are aggregated correctly.

Anonymity Techniques for Deidentification. Samarati and Sweeney introduced an anonymity technique called k-anonymity [5][6][7]. They considered a relational database that consists of unique identifiers, quasi-identifiers, and sensitive attributes. A unique identifier is any attribute that is able to identify only one private individual, such as a personal ID, an e-mail address, or a cell phone number. A quasi-identifier is any set of attributes that can be joined with additional information to identify only one private individual, such as a zip code and a birth date. A sensitive attribute is any attribute that a data owner does not want to publish, such as health-care data. In order to preserve privacy, all unique identifiers must be removed and all quasi-identifiers must be anonymized. In k-anonymity, the quasi-identifier values of each record are indistinguishable from those of at least k − 1 other records. Tables 1 and 2 are examples of the original health-care data and the 4-anonymous health-care data.
However, k-anonymity is not secure against homogeneity attacks and background knowledge attacks [8]. For example, suppose that Alice knows that Bob is in his twenties and his zip code is 13032; then, Alice can infer from Table 2 that Bob must have a gastric ulcer.
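The k-anonymity condition and the homogeneity attack described above can be sketched in a few lines. This is a minimal illustration only: the records below are hypothetical and are not the paper's Tables 1 and 2, and the function name `is_k_anonymous` is an assumption made for this example.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in >= k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical generalized records (NOT the paper's Table 2).
rows = [
    {"zip": "130**", "age": "<30", "disease": "gastric ulcer"},
    {"zip": "130**", "age": "<30", "disease": "gastric ulcer"},
    {"zip": "130**", "age": "<30", "disease": "gastric ulcer"},
    {"zip": "130**", "age": "<30", "disease": "gastric ulcer"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "heart disease"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "cancer"},
]

print(is_k_anonymous(rows, ["zip", "age"], 4))  # True
print(is_k_anonymous(rows, ["zip", "age"], 5))  # False

# Homogeneity attack: the table is 4-anonymous, yet everyone in Bob's
# equivalence class has the same disease, so his diagnosis is revealed.
bobs_class = {r["disease"] for r in rows if (r["zip"], r["age"]) == ("130**", "<30")}
print(bobs_class)  # {'gastric ulcer'}
```

The last lines show why a table can satisfy k-anonymity and still leak a sensitive attribute when an equivalence class is homogeneous.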
To mitigate these attacks, Machanavajjhala et al. introduced a new anonymity technique, called ℓ-diversity [8]. In ℓ-diversity, every equivalence class of records that share the same quasi-identifiers must contain ℓ or more different sensitive attribute values. Table 3 shows 3-diverse health-care data.
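As a rough illustration of the ℓ-diversity condition, the sketch below checks the simplest ("distinct values") reading of the definition. The records and the function name `is_l_diverse` are hypothetical and are not taken from the paper's Table 3.

```python
from collections import defaultdict

def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class has >= l distinct sensitive values."""
    values_per_class = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values_per_class[key].add(row[sensitive])
    return all(len(values) >= l for values in values_per_class.values())

# Hypothetical equivalence classes (NOT the paper's Table 3).
homogeneous = [
    {"zip": "130**", "age": "<30", "disease": "gastric ulcer"} for _ in range(4)
]
diverse = [
    {"zip": "130**", "age": "<30", "disease": "gastric ulcer"},
    {"zip": "130**", "age": "<30", "disease": "gastritis"},
    {"zip": "130**", "age": "<30", "disease": "stomach cancer"},
    {"zip": "130**", "age": "<30", "disease": "gastric ulcer"},
]

print(is_l_diverse(homogeneous, ["zip", "age"], "disease", 2))  # False
print(is_l_diverse(diverse, ["zip", "age"], "disease", 3))      # True
```

Note that the second class passes the 3-diversity check even though all three values are stomach-related, so a purely count-based check says nothing about how similar the sensitive values are.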
Subsequently, Li et al. showed that ℓ-diversity is insufficient for anonymity [9]. In ℓ-diversity, information can still leak if there is a significant difference between the distribution of sensitive attributes in an equivalence class and the distribution of all sensitive attributes. For example, if Alice knows Bob's personal information such as his age and zip code, she will be able to infer from Table 3 that Bob has a stomach-related disease (e.g., gastric ulcer, gastritis, or stomach cancer). To mitigate this problem, Li et al. introduced another anonymity technique, called t-closeness [9]. t-closeness requires the distribution of sensitive attributes in any equivalence class to be similar to that of all sensitive attributes.

2.3. GHV Homomorphic Encryption Scheme. The GHV homomorphic encryption scheme supports one multiplicative and polynomially many additive operations on encrypted data [14]. The security of the GHV homomorphic encryption scheme is based on the hardness of the learning with errors (LWE) problem [24], a widely studied lattice assumption.
For a correctly formed ciphertext, the output of GHV.Dec is the encrypted plaintext. In this paper, we use, as a building block, a variant of the GHV homomorphic encryption scheme that supports only additive operations. We call this variant the GHV* homomorphic encryption scheme hereafter.

Ajtai's One-Way Function.
Ajtai constructed a one-way function whose security is based on some well known approximation problems in lattices [27,28].
Let $n$ be the security parameter, $m$ a positive integer, and $q$ a positive integer. For a uniformly random matrix $M \in \mathbb{Z}_q^{n \times m}$ and $r \in \{0, 1\}^m$, Ajtai's one-way function $h_M : \{0, 1\}^m \to \mathbb{Z}_q^n$ is as follows: $h_M(r) = Mr \pmod q$. Note that Ajtai's one-way function $h_M$ is regular [29]; that is, every output of $h_M$ is uniformly distributed over $\mathbb{Z}_q^n$ [30].
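A minimal sketch of evaluating Ajtai's function, $h_M(r) = Mr \bmod q$ for a binary input $r$, is given below. The toy dimensions and modulus are assumptions chosen for readability and are far too small to be secure; the function name `ajtai_hash` is likewise hypothetical.

```python
import random

def ajtai_hash(M, r, q):
    """Evaluate h_M(r) = M r mod q for a binary vector r."""
    if any(bit not in (0, 1) for bit in r):
        raise ValueError("r must be a binary vector")
    return [sum(m_ij * r_j for m_ij, r_j in zip(row, r)) % q for row in M]

# Tiny hand-checkable instance: M is 2x2, q = 5.
print(ajtai_hash([[1, 2], [3, 4]], [1, 1], 5))  # [3, 2]

# Toy random instance (n, m, q are illustrative, not secure parameters).
n, m, q = 4, 16, 97
rng = random.Random(0)
M = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
r = [rng.randrange(2) for _ in range(m)]
s = ajtai_hash(M, r, q)  # a vector in Z_q^n
```

The function compresses an m-bit input into n values modulo q, which is how the protocol later derives the vector s from a short random string r.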

Controlled Secure Aggregation Protocol
In this section, we propose our controlled secure aggregation protocol (CSA protocol hereafter). Let $n$ be the security parameter; the other parameters used in our CSA protocol are chosen according to $n$. Suppose that there are $\mathsf{n}$ users $U_i$ ($1 \le i \le \mathsf{n}$), a receiver $U_R$, and an aggregator AGG. Each user $U_i$ ($1 \le i \le \mathsf{n}$) outsources its own numerical data $b_i$ in encrypted form. We assume that the receiver $U_R$ wants to know an aggregated value $b = \sum_{j=1}^{|I|} b_{i_j}$, where $I = \{i_1, \ldots, i_{|I|}\} \subseteq \{1, \ldots, \mathsf{n}\}$ and $|I|$ is the number of elements in $I$. We also assume that the receiver $U_R$ has a public key $pk_R = A_R$ and a secret key $sk_R = T_R$ obtained by performing GHV*.Key. Then the receiver $U_R$ can get $b$ by performing our CSA protocol.
Our CSA protocol consists of the following phases, which are illustrated in Box 1: Key Generation, Encryption, Aggregation, re-Aggregation, and dec-Aggregation. In the Key Generation phase, each user generates a public key pair and a secret key. In the Encryption phase, each user encrypts its numerical data with its public key pair. In the Aggregation phase, ciphertexts generated under distinct public key pairs are aggregated. That is, to get an aggregated value, the receiver $U_R$ lets the aggregator AGG know $I = \{i_1, \ldots, i_{|I|}\}$. Then AGG aggregates the ciphertexts on $b_{i_j}$ ($1 \le j \le |I|$) in this phase. In the re-Aggregation phase, the user $U_i$ eliminates $A_i s_i$ from the ciphertext it receives and adds GHV*.Enc($A_R$, 0) $= A_R s'_i + p x'_i \pmod q$, which is a ciphertext on 0 under the receiver's public key $A_R$. In this phase, a ciphertext under the public key $A_i$ is converted into a ciphertext under the public key $A_R$. As a result, a ciphertext that is decryptable by a user $U_i$ is converted into a ciphertext that is decryptable by the receiver $U_R$, maintaining the same plaintext $b_i$. This phase is needed in the dec-Aggregation phase to make an aggregated ciphertext $c$ decryptable by the receiver $U_R$; decryption succeeds because the accumulated error is a sufficiently short value [14]. In the dec-Aggregation phase, any user can refuse to perform the CSA.reAgg algorithm, unless initial consents that have been obtained from patients allow the secondary research. Then the receiver cannot get the result. The receiver can get the result only if all users perform the CSA.reAgg algorithm. That means the receiver can get the aggregated value that he/she is seeking only by the unanimous consent of all users whose data are aggregated. That is the reason why we use the term "controlled" in the CSA protocol.

CSA.Key: given the security parameter and the remaining scheme parameters, output a public key pair $pk_i = (A_i, M_i)$ and a secret key $sk_i = T_i$; the first step is to perform GHV*.Key.

Security.
We now analyze the security of our controlled secure aggregation protocol.
First, we show that our encryption scheme CSA.Enc($pk_i$, $b_i$) is IND-CPA secure. Intuitively, the only difference between CSA.Enc($pk_i$, $b_i$) and the GHV* homomorphic encryption scheme is how the vector $s_i \in \mathbb{Z}_q^n$ is generated. In the GHV* homomorphic encryption scheme, the vector $s_i$ is chosen uniformly, but in CSA.Enc($pk_i$, $b_i$) it is generated by computing $s_i = h_{M_i}(r_i)$ using a randomly chosen vector $r_i \in \{0, 1\}^m$. Since every output of Ajtai's one-way function $h_{M_i} : \{0, 1\}^m \to \mathbb{Z}_q^n$ is uniformly distributed over $\mathbb{Z}_q^n$, the vector $s_i = h_{M_i}(r_i)$ in our encryption scheme is uniformly distributed over $\mathbb{Z}_q^n$. Therefore, the security of CSA.Enc($pk_i$, $b_i$) is the same as that of the GHV* homomorphic encryption scheme. In our controlled secure aggregation protocol CSA, ciphertexts generated under distinct public key pairs can be aggregated. To decrypt the aggregated ciphertext $c$, which is generated from the ciphertexts of users $U_{i_j}$ ($1 \le j \le |I|$), each of these users runs CSA.reAgg in turn. Therefore, the output of CSA.reAgg($c_{j-1}$, $c'_{i_j}$, $pk_{i_j}$, $sk_{i_j}$, $A_R$) is an aggregation of secure ciphertexts.
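In the fifth step of the CSA.reAgg algorithm, GHV*.Enc($A_R$, 0) is added so that the protocol is secure against an adversary $\mathcal{A}$ who can eavesdrop on our controlled secure aggregation protocol CSA. Without this term, $\mathcal{A}$ could subtract consecutive ciphertexts to obtain $c_{j-1} - c_j = A_{i_j} s_{i_j} \pmod q$ and then compute $c_{i_j} - A_{i_j} s_{i_j} = p x_{i_j} + b_{i_j} \pmod q$ from the outsourced ciphertext $c_{i_j}$.
Since $p x_{i_j} + b_{i_j}$ is a sufficiently short value, $p x_{i_j} + b_{i_j} \pmod q$ is the same as $p x_{i_j} + b_{i_j}$ [14]. Therefore, $\mathcal{A}$ could decrypt $c_{i_j}$ without the secret key $sk_{i_j}$.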
In the dec-Aggregation phase, after all the users $U_{i_j}$ ($1 \le j \le |I|$) eliminate $A_{i_j} s_{i_j}$ from $c$, the result has the same form as a ciphertext generated under the public key $A_R$. Therefore, the receiver $U_R$ can decrypt it.
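The masking algebra behind the Aggregation and dec-Aggregation phases can be checked numerically with a toy sketch. Several things here are assumptions made only for illustration: the noise is taken to enter as $p \cdot x$, the parameters are tiny and insecure, each user keeps $A_i s_i$ in memory instead of recovering $r_i$ by decrypting $c'_i$, and the receiver's trapdoor decryption with $T_R$ is replaced by a stand-in that subtracts the known masks $A_R s'_i$. The function name `csa_demo` is hypothetical.

```python
import random

def mat_vec(A, v, q):
    """Matrix-vector product modulo q."""
    return [sum(a * x for a, x in zip(row, v)) % q for row in A]

def rand_matrix(rng, rows, cols, q):
    return [[rng.randrange(q) for _ in range(cols)] for _ in range(rows)]

def small_noise(rng, m, bound=2):
    return [rng.randint(-bound, bound) for _ in range(m)]

def csa_demo(values, p=1009, q=1_000_000_007, n=3, m=6, refusing=(), seed=1):
    rng = random.Random(seed)
    A_R = rand_matrix(rng, m, n, q)            # receiver's public key
    users = []
    for value in values:                       # Encryption, one user per value
        A_i = rand_matrix(rng, m, n, q)
        M_i = rand_matrix(rng, n, m, q)
        r_i = [rng.randrange(2) for _ in range(m)]
        s_i = mat_vec(M_i, r_i, q)             # s_i = h_{M_i}(r_i)
        As = mat_vec(A_i, s_i, q)
        x_i = small_noise(rng, m)
        b_i = [value] + [0] * (m - 1)          # value stored in coordinate 0
        c_i = [(As[k] + p * x_i[k] + b_i[k]) % q for k in range(m)]
        users.append((As, c_i))
    # Aggregation: add ciphertexts that were produced under distinct keys.
    c = [sum(c_i[k] for _, c_i in users) % q for k in range(m)]
    # dec-Aggregation: each consenting user strips A_i s_i and adds Enc(A_R, 0).
    masks = [0] * m
    for idx, (As, _) in enumerate(users):
        if idx in refusing:
            continue                           # this user withholds consent
        s_new = [rng.randrange(q) for _ in range(n)]
        x_new = small_noise(rng, m)
        mask = mat_vec(A_R, s_new, q)
        c = [(c[k] - As[k] + mask[k] + p * x_new[k]) % q for k in range(m)]
        masks = [(masks[k] + mask[k]) % q for k in range(m)]
    # Stand-in for the receiver's decryption: remove A_R-masks, center, mod p.
    lifted = [(c[k] - masks[k]) % q for k in range(m)]
    centered = [v - q if v > q // 2 else v for v in lifted]
    return [v % p for v in centered]

total = csa_demo([31, 35, 22, 43])
print(total[0], total[0] / 4)  # 131 32.75
```

Calling `csa_demo([31, 35, 22, 43], refusing={1})` leaves one user's $A_i s_i$ term in place, and the recovered vector is then typically unrelated to the true sum, which is the "controlled" behavior the section describes.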

Secure Clinical Data Aggregation System
In this section, we provide an overview of our system and how it works.

System Overview. The proposed system environment consists of hospitals, an aggregator, and researchers. In our system, each hospital outsources its clinical data to cloud storage servers. Hospitals use the following hybrid method to store data when outsourcing their clinical research data to cloud servers: they make anonymous data publicly available in the cloud servers using the anonymity techniques for deidentification described in Section 2.2. In addition, hospitals store their encrypted numerical data together with the anonymous data for statistical accuracy. Suppose that there are 4 hospitals, $H_1$, $H_2$, $H_3$, and $H_4$, that want to share their clinical data and have public and secret key pairs $(pk_1, sk_1)$, $(pk_2, sk_2)$, $(pk_3, sk_3)$, and $(pk_4, sk_4)$ of our CSA protocol, respectively. Suppose that there is an aggregator AGG and a researcher R who has a public and secret key pair $(pk_R, sk_R) = (A_R, T_R)$ of the GHV* homomorphic encryption scheme. The original clinical data of the hospitals are shown in Table 4. Each hospital outsources its clinical data to cloud storage servers. That is, $H_i$ stores deidentified nonsensitive data (such as zip code and age), sensitive data in raw form, and numerical data (such as age) encrypted using CSA.Enc($pk_i$, Age) on cloud servers. Both anonymous and encrypted clinical data on cloud servers are shown in Table 5, where each encrypted entry is an output of CSA.Enc, for example CSA.Enc($pk_1$, 31), and so on.
When the researcher R wants to know a rough estimate of the age of the hospitals' cancer patients, R can get the estimate directly from the cloud servers. When R wants to know the exact average age of the hospitals' cancer patients, R can ask the aggregator AGG for an aggregated age. AGG sums the ages of cancer patients within each hospital and then totals the ages across hospitals. That is, AGG first performs homomorphic additions on ciphertexts under the same public key.

Attack Model.
For designing a secure clinical data aggregation system, the following conditions should be considered.
(1) (Anonymity) Adversaries should not be able to identify any single private individual by looking at the data on cloud storage servers.
(2) (Confidentiality) Adversaries should not be able to learn any information from the encrypted numerical data on cloud storage servers.
(3) (External security) Third parties (external adversaries) should not learn any information from the information flow.
(4) (Internal security) Hospitals and researchers (internal adversaries), except the researcher who sends a request, should not learn any information from the information flow.
Suppose that there are $n_H$ hospitals $H_i$ ($1 \le i \le n_H$), $n_R$ researchers $R_j$ ($1 \le j \le n_R$), and an aggregator AGG. We assume that the $i$th hospital $H_i$ has $n_T$ tuples and the relational database in the cloud servers has $n_C$ numerical clinical data attributes.
The building blocks of our secure clinical data aggregation (SCDA) system are our controlled secure aggregation protocol CSA and the GHV* homomorphic encryption scheme. Our SCDA system consists of the following phases, which are illustrated in Box 2: Preparation, Data Publication, Query, Aggregation, Consent, and Acquisition. In the Preparation phase, each hospital and each researcher generates a public key (pair) and a secret key. In the Data Publication phase, each hospital encrypts its numerical clinical data with its public key pair and produces anonymous data using anonymity techniques for deidentification. Then each hospital stores both in the cloud servers. In the Query phase, one of the researchers asks the aggregator AGG for aggregated clinical data. In the Aggregation phase, ciphertexts generated by distinct hospitals are aggregated. In the Consent phase, each hospital goes through the procedure for consent. In the Acquisition phase, the researcher gets the aggregated clinical data.

Analysis
In this section, we analyze the security and efficiency of our protocol.
Using the above parameters, a ciphertext pair is only six times as large as a plaintext because $q \approx p^3$ and the lengths of a plaintext and a ciphertext pair are $m \log_2 p$ bits and $2 \cdot m \log_2 q$ bits, respectively. Our SCDA system supports $n^c$ additive operations in common with [14]. In the Query phase of our SCDA system, therefore, the number of tuples included in a request for aggregated data must be less than $n^c$. Table 6 provides examples of secure parameters.
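The factor of six can be verified directly from the two stated lengths, reading the ratio between the moduli as $q \approx p^3$ (so that $\log_2 q \approx 3 \log_2 p$), where $p$ denotes the plaintext modulus and $q$ the ciphertext modulus:

```latex
\[
\frac{\text{ciphertext pair}}{\text{plaintext}}
  = \frac{2 \cdot m \log_2 q}{m \log_2 p}
  \approx \frac{2 \cdot m \cdot 3 \log_2 p}{m \log_2 p}
  = 6 .
\]
```

The dimension $m$ cancels, so under this reading the expansion factor depends only on the ratio of the bit lengths of the two moduli and on the fact that each data item is stored as a pair of ciphertexts.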

Security.
We now show that our SCDA system is anonymous, confidential, and secure against external and internal adversaries.
Theorem 3 (anonymity). Our SCDA system is anonymous if the anonymity techniques used in our SCDA system are anonymous.
Proof of Theorem 3. We use anonymity techniques for deidentification, which guarantee anonymity. In our SCDA system, each hospital outsources its clinical data to cloud storage servers using these techniques. Therefore, researchers, other hospitals, and third parties cannot identify any individual using the data on cloud storage servers.
Besides the anonymity techniques mentioned in Section 2.2, we could use techniques that make a statistical database differentially private. In 2006, Dwork introduced the concept of "differential privacy," which provides a strong privacy guarantee for statistical databases [31]. To achieve differential privacy, we could add appropriately chosen random noise to the answers computed from statistical databases.
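One standard way to add "appropriately chosen random noise" is the Laplace mechanism, which the text above does not name; the sketch below uses it as an assumed example for a counting query, whose sensitivity is 1. The records, the value of epsilon, and the function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy count; a counting query has sensitivity 1, so scale = 1/epsilon."""
    true_count = sum(1 for rec in records if predicate(rec))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical records; a smaller epsilon means more noise and more privacy.
rng = random.Random(2024)
records = [
    {"age": 31, "disease": "cancer"},
    {"age": 35, "disease": "cancer"},
    {"age": 52, "disease": "flu"},
    {"age": 22, "disease": "cancer"},
]
noisy = private_count(records, lambda rec: rec["disease"] == "cancer", 0.5, rng)
print(noisy)
```

The published answer is the true count plus noise, so repeated queries consume privacy budget; that accounting is outside the scope of this sketch.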

Theorem 4 (confidentiality). Our encrypted numerical data are confidential if the GHV* homomorphic encryption scheme is IND-CPA secure and every output of Ajtai's one-way function $h_{M_i}$ is uniformly distributed over $\mathbb{Z}_q^n$.
Proof of Theorem 4. By Theorem 1 in Section 3.1, the encrypted numerical data are confidential.
Theorem 5 (external and internal security). Our SCDA system is secure against external and internal adversaries if the anonymity techniques for deidentification are anonymous and the GHV* homomorphic encryption scheme is secure.
Proof of Theorem 5. All clinical data outsourced to cloud storage servers are anonymous and confidential since all hospitals use the anonymity techniques for deidentification and the GHV* homomorphic encryption scheme. All transmitted data in our SCDA system are encrypted by the GHV* homomorphic encryption scheme with fresh random numbers. Therefore, our SCDA system is secure against external and internal adversaries if the anonymity techniques for deidentification are anonymous and the GHV* homomorphic encryption scheme is secure.

Efficiency.
Table 7 shows the complexity of our SCDA system. The parameters in Table 7, including $n_H$, $n_R$, $n_C$, $n_T$, and $I$, follow Box 2.

Experimental Results.
To demonstrate the efficiency of our system, we use MATLAB on a computer with an Intel(R) Core(TM) i3-2100 CPU (3.10 GHz) processor and 4 GB of RAM. Table 8 gives our experimental results. We assume that there are 100 hospitals with 100 clinical data records each. Each row in Table 8 represents the mean of 15 trials. As illustrated in Section 5.1,  >   because  ≥  3+1 log 5  and the number of tuples included in a request for aggregated data should be less than   . If we use the above method (i.e., b  ∈ Z  2 ), then our SCDA system has no overflow problem, because b 1 + ⋅ ⋅ ⋅ + b   ∈ Z   .
5.6. Long-Term Confidentiality. In the area of managing sensitive information, cryptographic long-term confidentiality is essential [11]. In 1996, Shor showed that the RSA cryptosystem can be broken by quantum attacks [12]. Cryptosystems based on the DLP (Discrete Logarithm Problem) and ECC (Elliptic Curve Cryptography), which are important alternatives to the RSA cryptosystem, can also be broken by quantum attacks. In our SCDA system, we use the GHV* homomorphic encryption scheme, which is secure if the LWE problem is hard. The LWE problem is at least as hard as worst-case approximate variants of the SVP (Shortest Vector Problem), for which no efficient quantum algorithm is known. Therefore, our SCDA system is expected to provide long-term confidentiality because all algorithms in our SCDA system rely on problems believed to be hard even for quantum attacks.

Conclusion
In this paper, we have proposed how to outsource clinical research data securely and how to control the outsourced data against potential breaches of privacy, while still allowing accurate statistical patient data to be shared. To achieve this, we designed the controlled secure aggregation protocol, which enables a researcher to get aggregated results from outsourced ciphertexts of distinct researchers. Since our protocol is built on the lattice-based GHV* homomorphic encryption scheme, it offers long-term security against quantum computing attacks and is efficient in computational overhead.
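The GHV homomorphic encryption scheme consists of the following algorithms.
(i) GHV.Key: given the security parameter $n$ and the remaining scheme parameters, output a public key $pk = A \in \mathbb{Z}_q^{m \times n}$ and a secret key $sk = T \in \mathbb{Z}^{m \times m}$ such that $TA = 0 \pmod q$, $T$ is invertible, and the elements of $T$ are bounded by $O(n \log q)$. (To generate the two matrices $A$ and $T$, the trapdoor sampling algorithm in [26] can be used. For further details, please refer to [14].)
(ii) GHV.Enc($pk$, $B$): given $pk$ and a plaintext $B \in \mathbb{Z}_p^{m \times m}$, choose a uniformly random matrix $S \in \mathbb{Z}_q^{n \times m}$ and a Gaussian error matrix $X \in \mathbb{Z}_q^{m \times m}$. Then output a ciphertext $C = AS + pX + B \pmod q$.
(iii) GHV.Dec($sk$, $C$): given $sk$ and a ciphertext $C \in \mathbb{Z}_q^{m \times m}$, compute $E = TCT^{t} \pmod q$. Then output a plaintext $B = T^{-1}E(T^{t})^{-1} \pmod p$.
The matrices $S$, $X$, $B$, and $C$ of the GHV homomorphic encryption scheme can be replaced with vectors $s \in \mathbb{Z}_q^{n}$, $x \in \mathbb{Z}_q^{m}$, $b \in \mathbb{Z}_p^{m}$, and $c \in \mathbb{Z}_q^{m}$ of the GHV* homomorphic encryption scheme without any loss of security. Then the IND-CPA secure [25] GHV* homomorphic encryption scheme GHV* = {GHV*.Key, GHV*.Enc, GHV*.Dec, GHV*.Add} is as follows.
(i) GHV*.Key: given the security parameter $n$ and the remaining scheme parameters, output a public key $pk = A \in \mathbb{Z}_q^{m \times n}$ and a secret key $sk = T \in \mathbb{Z}^{m \times m}$ such that $TA = 0 \pmod q$, $T$ is invertible, and the elements of $T$ are bounded by $O(n \log q)$.
(ii) GHV*.Enc($pk$, $b$): given $pk$ and a plaintext $b \in \mathbb{Z}_p^{m}$, choose a uniformly random vector $s \in \mathbb{Z}_q^{n}$ and a Gaussian error vector $x \in \mathbb{Z}_q^{m}$. Then output a ciphertext $c = As + px + b \pmod q$.
(iii) GHV*.Dec($sk$, $c$): given $sk$ and a ciphertext $c \in \mathbb{Z}_q^{m}$, compute $e = Tc \pmod q$. Then output a plaintext $b = T^{-1}e \pmod p$.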
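Why GHV* decryption works, and why ciphertexts under the same key add, can be traced in a few lines. The derivation below is a sketch under two assumptions that the extracted text does not show explicitly: the noise enters the ciphertext as $p \cdot x$ for a plaintext modulus $p$, and the entries of $pTx + Tb$ are small enough relative to $q$ that reduction modulo $q$ does not wrap around.

```latex
\begin{align*}
Tc &\equiv T(As + px + b) \equiv pTx + Tb \pmod q
   && \text{since } TA \equiv 0 \pmod q,\\
e &= pTx + Tb
   && \text{over the integers, by the shortness assumption},\\
T^{-1}e &= px + b, \qquad (px + b) \bmod p = b
   && \text{for } b \in \mathbb{Z}_p^{m},\\
c_1 + c_2 &\equiv A(s_1 + s_2) + p(x_1 + x_2) + (b_1 + b_2) \pmod q .
\end{align*}
```

The last line has the same form as a fresh ciphertext of $b_1 + b_2$ with larger noise, which is why only a bounded number of additions is supported and why the sum of the plaintexts must stay below $p$.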
Aggregation. AGG computes $c = c_{i_1} + \cdots + c_{i_{|I|}}$.
re-Aggregation. Each user $U_i$ ($1 \le i \le \mathsf{n}$) can run the following CSA.reAgg($c_{j-1}$, $c'_i$, $pk_i$, $sk_i$, $A_R$) algorithm to get a re-aggregated ciphertext.
CSA.reAgg($c_{j-1}$, $c'_i$, $pk_i$, $sk_i$, $A_R$): given an aggregated ciphertext $c_{j-1}$, a ciphertext $c'_i$, a public key pair $pk_i = (A_i, M_i)$, a secret key $sk_i = T_i$, and the public key $A_R$ of $U_R$, output $c_j$ by performing the following steps: (1) perform GHV*.Dec($sk_i$, $c'_i$) to get $r_i$, (2) compute $s_i = h_{M_i}(r_i)$, (3) choose a uniformly random vector $s'_i \in \mathbb{Z}_q^{n}$, (4) choose a Gaussian error vector $x'_i \in \mathbb{Z}_q^{m}$, (5) compute $c_j = c_{j-1} - A_i s_i + A_R s'_i + p x'_i \pmod q$.
dec-Aggregation. AGG gives the aggregated ciphertext $c = c_{i_1} + \cdots + c_{i_{|I|}}$ to $U_{i_1}$, and a ciphertext $c'_{i_j}$ and the public key $A_R$ of $U_R$ to each user $U_{i_j}$ ($1 \le j \le |I|$). Let $c_0 = c$; the receiver $U_R$ then obtains $b = b_{i_1} + \cdots + b_{i_{|I|}}$ as follows. In the dec-Aggregation phase, each user $U_{i_j}$ ($1 \le j \le |I|$) in turn makes the aggregated ciphertext $c$ decryptable by the receiver $U_R$. Through these phases, the receiver $U_R$ can get an aggregated value $b = \sum_{j=1}^{|I|} b_{i_j}$.
For example, assume that $\mathsf{n} = 5$ users participate in our controlled secure aggregation protocol CSA and each user $U_i$ ($1 \le i \le 5$) has its numerical data $b_i$. Each user $U_i$ outsources its numerical data in the encrypted form $(c_i, c'_i) = (A_i s_i + p x_i + b_i \pmod q,\ A_i s'_i + p x'_i + r_i \pmod q)$ using the CSA.Enc($pk_i$, $b_i$) algorithm. Suppose that the receiver $U_R$ wants to know the aggregated value $b = b_2 + b_5$. The receiver $U_R$ lets the aggregator AGG know $I = \{2, 5\}$. Then AGG runs the CSA.Agg($(c_2, c'_2)$, $(c_5, c'_5)$, $\{2, 5\}$) algorithm to get $c = c_2 + c_5 = A_2 s_2 + p x_2 + b_2 + A_5 s_5 + p x_5 + b_5 \pmod q$. AGG gives $c$, $c'_2$, and $A_R$ to $U_2$, and $c'_5$ and $A_R$ to $U_5$. Then $U_2$ runs CSA.reAgg($c_0$, …

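Theorem 1. Our encryption scheme CSA.Enc($pk_i$, $\mathbf{b}_i$) provides IND-CPA security if the GHV* homomorphic encryption scheme provides IND-CPA security and every output of Ajtai's one-way function $h_{\mathbf{M}_i} : \{0, 1\}^{m} \to \mathbb{Z}_q^{n}$ is uniformly distributed over $\mathbb{Z}_q^{n}$.

Proof of Theorem 1. Formally, we show that if there exists an adversary $\mathcal{A}$ breaking the IND-CPA security of our encryption scheme CSA.Enc($pk_i$, $\mathbf{b}_i$), then there exists a challenger $\mathcal{C}$ breaking the IND-CPA security of the GHV* homomorphic encryption scheme. Let $\{pk = \mathbf{A}, \text{GHV*.params}\}$ be an instance given to $\mathcal{C}$. $\mathcal{C}$ chooses a uniformly random matrix $\mathbf{M}_i \in \mathbb{Z}_q^{n \times m}$ and sends $\{pk_i = \mathbf{A}_i = \mathbf{A},\ \text{params} = (\text{GHV*.params}, \mathbf{M}_i)\}$ to $\mathcal{A}$. $\mathcal{A}$ chooses $\{\mathbf{b}_0, \mathbf{b}_1\}$ and sends $\{\mathbf{b}_0, \mathbf{b}_1\}$ to $\mathcal{C}$. $\mathcal{C}$ outputs $\{\mathbf{b}_0, \mathbf{b}_1\}$ as its own challenge plaintexts and is returned $\mathbf{c}_\beta$, where $\beta \in \{0, 1\}$. $\mathcal{C}$ sends $\mathbf{c}_\beta$ to $\mathcal{A}$, and $\mathcal{A}$ outputs $\beta' \in \{0, 1\}$. Then $\mathcal{C}$ outputs $\beta' \in \{0, 1\}$.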

and $\mathbf{c}_3$, respectively. After the agreement procedure, R can get the aggregated ciphertext $\mathbf{c}_3$ under his/her public key. Then R obtains the sum of the ages of the cancer patients by computing GHV*.Dec($sk_R$, $\mathbf{c}_3$) $= \mathbf{b} = 31 + 35 + 22 + 43 = 131$, and hence their average age, $131/4 = 32.75$.
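The arithmetic of this example can be reproduced directly. In the sketch below the four ages are the ones quoted in the text; in the protocol itself R would only see their decrypted sum, and the count of 4 patients is taken from the text rather than computed by the protocol.

```python
# Ages of the four cancer patients quoted in the example.
ages = [31, 35, 22, 43]

# R learns only the aggregated sum from the decrypted ciphertext.
total = sum(ages)

# Dividing by the number of contributing tuples gives the average age.
average = total / len(ages)

print(total, average)  # 131 32.75
```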

) algorithm to get $(\mathbf{c}_i, \mathbf{c}'_i)$. Then AGG runs the CSA.Agg($\{(\mathbf{c}_i, \mathbf{c}'_i)\}_{i \in I}$, $I$) algorithm to get $\mathbf{c}$ and $\{\mathbf{c}'_i\}_{i \in I}$.

Consent. AGG determines the order in which hospitals consented to $R_j$'s request, then sends $\mathbf{c}$ to the first hospital and $(\mathbf{c}'_i, \mathbf{A}_j)$ to each hospital $H_i$ ($i \in I$). Each hospital $H_i$ ($i \in I$) in turn performs the dec-Aggregation phase of our CSA protocol. If any hospital $H_i$ ($i \in I$) does not want $R_j$ to have the aggregated clinical data, it can deny the request by simply not performing the dec-Aggregation phase.

Acquisition. After the consent procedure, the last hospital $H_{|I|}$ sends $\mathbf{c}_{|I|}$ to $R_j$. $R_j$ runs GHV*.Dec($sk_{R_j}$, $\mathbf{c}_{|I|}$) to get $\mathbf{b}$, the aggregated clinical data.

Box 2: SCDA Protocol.

Table 1: Original health-care data.
has a stomach-related disease (e.g., gastric ulcer, gastritis, or stomach cancer). To mitigate this potential problem, Li et al. introduced another anonymity technique, called $t$-closeness

Table 4: Original clinical data.

Table 5: Anonymous and encrypted clinical data stored on cloud servers.

Table 6: Examples of secure parameters.

Each hospital $H_i$ ($1 \le i \le n_H$) runs the CSA.Key($1^n, 1^m, q$) algorithm to get a public key pair $pk_i = (\mathbf{A}_i, \mathbf{M}_i)$ and a secret key $sk_i = \mathbf{T}_i$. Each researcher $R_j$ ($1 \le j \le n_R$) runs the GHV*.Key($1^n, 1^m, q$) algorithm to get a public key $pk_j = \mathbf{A}_j$ and a secret key $sk_j = \mathbf{T}_j$.

Data Publication. For all $k$ ($1 \le k \le n_C$) and $l$ ($1 \le l \le n_{T_i}$), each hospital $H_i$ ($1 \le i \le n_H$) runs the CSA.Enc($pk_i$, $\mathbf{b}_{i,k,l}$) algorithm to get a ciphertext pair $(\mathbf{c}_{i,k,l}, \mathbf{c}'_{i,k,l})$, where $\mathbf{b}_{i,k,l}$ is the $l$th cell of the $k$th numeric clinical data attribute of the $i$th hospital $H_i$. Then each hospital $H_i$ ($1 \le i \le n_H$) makes its data anonymous using anonymity techniques for de-identification. Finally, each hospital $H_i$ ($1 \le i \le n_H$) outsources its data to the cloud servers.

Query. The $j$th researcher $R_j$ sends a request for aggregated data to the aggregator AGG. We assume that $R_j$ is interested in the $k$th attribute and that $|I|$ hospitals, $H_i$ ($i \in I$), have the data in which $R_j$ is interested. Each hospital $H_i$ ($i \in I$) has $\tau_i$ tuples that meet the request.

Aggregation. AGG retrieves all ciphertext pairs satisfying $R_j$'s request. For each $i \in I$, AGG runs the GHV*.Add($\mathbf{c}_{i,k,1}, \ldots, \mathbf{c}_{i,k,\tau_i}$) and GHV*.Add($\mathbf{c}'_{i,k,1}, \ldots, \mathbf{c}'_{i,k,\tau_i}$

Table 7: Complexity analysis of our SCDA system.

$(n_H) \cdot$ CSA.Key $+\ (n_R) \cdot$ GHV*.Key; $\cdot$
Data Publication: $(n_C \cdot n_H \cdot n_T) \cdot$ CSA.Enc; $(n_C \cdot n_H \cdot n_T \cdot |\mathbf{c}|)$

Let $n_H$ be the number of hospitals, $n_R$ the number of researchers, $n_C$ the number of numeric clinical data attributes, $n_T$ the average number of tuples owned by the $i$th hospital $H_i$, $|I|$ the number of hospitals in which the $j$th researcher $R_j$ is interested, $\tau$ the average number of tuples in the $i$th hospital $H_i$ in which the $j$th researcher $R_j$ is interested, and $|\mathbf{c}|$ the size of a ciphertext in the GHV* homomorphic encryption scheme.

Table 8: Experimental results of our SCDA system.