Cloud computing is well suited to medical diagnosis in e-health services, where strong computing ability is required. However, despite the substantial benefits of cloud computing, the medical diagnosis field is not yet ready to adopt it, because medical data are sensitive and using the cloud raises serious concerns about privacy infringement. For instance, a compromised e-health cloud server might expose the medical datasets outsourced by multiple medical data owners, or infringe on the privacy of a patient inquirer by leaking his/her symptom or diagnosis result. In this paper, we propose a medical diagnosis system that uses e-health cloud servers in a privacy preserving manner when medical datasets are owned by multiple data owners. The proposed system is the first to achieve the privacy of the medical dataset, symptoms, and diagnosis results and to hide the data access pattern even from the e-health cloud servers that perform computations on the data, while remaining robust against collusion of the entities. As a building block of the proposed diagnosis system, we design a novel privacy preserving protocol for finding the
Cloud computing, as an emerging computing paradigm, is revolutionizing the data processing methodology of many organizations because of its resource efficiency and reduced management cost. As the costs of healthcare services rise, e-health is considered one of the promising fields that could benefit from cloud computing [
Meanwhile, adopting cloud computing for medical diagnosis raises privacy issues because medical data contain sensitive personal information. Specifically, if medical data owners such as hospitals outsource their medical diagnosis datasets to the e-health cloud in plaintext, a compromised e-health cloud service provider might expose them. Similarly, if a patient inquirer exchanges his/her symptom and diagnosis result with the e-health cloud in plaintext, the compromised provider might infringe on his/her privacy by exposing them. Even if the medical data owners and the patient inquirer encrypt their data before sending them to the e-health cloud, the compromised provider might still obtain additional information by observing data access patterns during processing.
The Health Insurance Portability and Accountability Act (HIPAA) mandates that the privacy and security of individually identifiable health information be guaranteed [
For medical diagnosis, case-based reasoning (CBR), which has been applied to medical diagnosis since the late 1980s [
In a real healthcare service environment, health records are owned by multiple data owners such as hospitals, which are unwilling to reveal them because of privacy or legal issues. If a single data owner collects the health records in order to outsource them to e-health cloud servers, this raises privacy concerns. Unfortunately, most of the previous works to compute
The main theme of this paper is to design a privacy preserving
Functionality comparison with related works.
Functionality | [ | [ | Our study |
---|---|---|---|
Privacy of dataset | O | O | O |
Privacy of input query | O | O | O |
Privacy of | X | O | O |
Privacy of data access pattern | X | O | O |
Robustness for collusion attack | O | X | O |
As one of the building blocks of our PP
As mentioned in [
In MPC based on secret sharing, data are shared among multiple cloud servers, and each share reveals nothing about the original data, which can be reconstructed only when a sufficient number of shares (i.e., more than the predefined threshold value) are combined. Since our PP
The remaining part of this paper is organized as follows: in Section
We explain MPC protocols based on Shamir’s secret sharing in Section
MPC allows a set of parties (i.e., cloud servers) to jointly compute an agreed function on their inputs in a distributed fashion and to obtain the result of the function but nothing else. Each party receives shares generated from the input values of the function and computes the result using the shares. MPC assumes that an adversary can compromise at most
MPC based on secret sharing proceeds in three phases: input sharing, computation, and output reconstruction. In the input sharing phase, a party or an external entity holding a secret
Since Shamir’s secret sharing has a linear property, addition in MPC is homomorphic. For addition of [
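The three phases and the linearity of addition can be illustrated with a minimal Python sketch of Shamir's secret sharing. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the prime modulus `PRIME`, the helper names `share`, `add_shares`, and `reconstruct`, and the choice of three servers with threshold 1 are all hypothetical. Multiplication, which needs an extra degree-reduction step, is omitted.

```python
import secrets

PRIME = 2_147_483_647  # illustrative field modulus (2^31 - 1), an assumption

def share(secret, n, t):
    """Input sharing: split `secret` into n shares; any t + 1 reconstruct it."""
    # Random polynomial of degree t whose constant term is the secret.
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(t)]
    return [
        (x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
        for x in range(1, n + 1)
    ]

def add_shares(shares_a, shares_b):
    """Computation: each server adds its two local shares, with no communication."""
    return [(x, (ya + yb) % PRIME) for (x, ya), (_, yb) in zip(shares_a, shares_b)]

def reconstruct(shares):
    """Output reconstruction: Lagrange interpolation at x = 0 over the field."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

a_shares = share(25, n=3, t=1)
b_shares = share(17, n=3, t=1)
sum_shares = add_shares(a_shares, b_shares)
total = reconstruct(sum_shares[:2])  # any t + 1 = 2 shares suffice
```

In this sketch a single share is a uniformly random field element, so a server that holds only one share learns nothing about 25 or 17, while any two servers can jointly recover the sum.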
Multiplication by two shares [
Our proposed protocol uses comparison (lessThan) and equality MPC operations as well as basic addition and multiplication. In [
Notations for MPC operations.
Syntax | Output |
---|---|
[a] + [b], [a] + b | [a+b] |
[a]−[b], [a] − b | [a−b] |
[a]∗[b], [a]∗b | [a∗b] |
 | [1] if a < b, and [0] otherwise |
 | [1] if a == b, and [0] otherwise |
 | a |
We evaluate the efficiency of a protocol in terms of both the number of rounds and the amount of communication. We measure the round complexity by the invocation count of a dominant operation performed in parallel and the communication complexity by the total number of invocations of the dominant operation to be carried out, as in [
We outline the proposed PP
The proposed PP
Architecture of the proposed PP
We represent the medical data by symptom and its diagnosis result, denoted by (
We assume that the input symptom of a patient inquirer consists of
In this subsection, we explain how cloud servers privately generate a global dataset from the datasets distributed among multiple data owners for PP
In order to carry out the proposed PP
We consider a semi-honest adversary model in which a compromised entity follows the specified protocol but tries to obtain additional information about the datasets of data owners, the input query, intermediate results, and
The attack scenarios in our PP
Since our PP
For simplicity,
PP
The basic idea of PE-FTK is to find the top-
While PE-FTK examines each bit of all data from the most significant bit (we will call it While examining each bit from the most significant bit to the least significant bit, PE-FTK computes It decides candidate data, that is, the data whose current bit is 1 among the data in which bitwise 0 continually appears in the prior bit For the next bit of candidate data, it computes
Table
Example of PE-FTK (dataset {16, 12, 11, 10, 9} and
Data | Data in binary |
---|---|
16 | 1 0 0 0 0 |
12 | 0 1 1 0 0 |
11 | 0 1 0 1 1 |
10 | 0 1 0 1 0 |
9 | 0 1 0 0 1 |
Bit-round | ( | Step | Result set | Candidate set |
---|---|---|---|---|
1 | 1 < 3 | 2-3 | {16} | |
2 | 5 > 3 | 2-1 | {16} | {12, 11, 10, 9} |
3 | 2 < 3 | 5-3 | {16, 12} | {11, 10, 9} |
4 | 4 > 3 | 5-2 | {16, 12} | {11, 10} |
5 | 3 == 3 | 5-1 | {16, 12, 11} | |
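The bit-round logic of the example can be reproduced with a plaintext Python sketch. This is only an illustration of the bitwise search idea on unencrypted integers: it does not operate on secret shares, it does not follow the line structure of the PE-FTK algorithm, and the function name `top_k_bitwise`, its return format, and its handling of ties after the last bit are assumptions made for the sketch.

```python
def top_k_bitwise(data, k, bit_length):
    """Plaintext sketch: find the k largest values by scanning bits MSB to LSB.

    Returns (result, counts), where counts[i] is the number compared
    against k in bit-round i + 1.
    """
    result = []              # values already confirmed to be in the top k
    candidates = list(data)  # values still undecided after the bits seen so far
    counts = []
    for bit in range(bit_length - 1, -1, -1):
        ones = [v for v in candidates if (v >> bit) & 1]
        zeros = [v for v in candidates if not (v >> bit) & 1]
        count = len(result) + len(ones)
        counts.append(count)
        if count < k:
            # All values with a 1 here belong to the top k; keep searching the rest.
            result += ones
            candidates = zeros
        elif count > k:
            # Too many values; narrow the search to those with a 1 here.
            candidates = ones
        else:
            # Exactly k values found. A private protocol would presumably keep
            # going to avoid leaking this point; the sketch stops for brevity.
            result += ones
            candidates = []
            break
    # Remaining candidates are identical values (ties); take as many as needed.
    result += candidates[: max(0, k - len(result))]
    return result, counts

result, counts = top_k_bitwise([16, 12, 11, 10, 9], k=3, bit_length=5)
```

For the dataset of the example with k = 3, the sketch yields the result set {16, 12, 11} and the per-round counts 1, 5, 2, 4, 3, matching the comparisons listed in the table.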
Input: dataset
Output:
for
end for
for
end for
return
PE-FTK consists of part 1 (lines 2–13) and part 2 (lines 15–24). When it checks the
We present the PPkNN protocol in Algorithm
Input: input query
Output: class score information
return
PP
In this section, we discuss the efficiency and the security of the proposed protocols. Specifically, we analyze the empirical result of PE-FTK implementation in Section
We implemented the proposed PE-FTK with the source code of [
Figures
The number of bit-rounds and average running time according to the number of data.
The number of bit-rounds and average running time according to the length of data.
The number of bit-rounds and average running time according to
As seen in PE-FTK (Algorithm
Table
Running time of PE-FTK (seconds).
Operation | Average running time for one round | Total running time |
---|---|---|
[ | 18.3 | 132.8–332.7 |
Our study | 12 | 106.7–118.8 |
Figure
Figure
Figure
As explained above, we evaluated the complexity of PE-FTK with the execution count of part 2 (lines 15–24), since the complexity of part 2 contributes most to that of PE-FTK. Table
Round complexity and communication complexity of PE-FTK (
Operation | Comparison: round | Comparison: communication | Equality: round | Equality: communication | Result |
---|---|---|---|---|---|
[ | 2 | ( | | | Probabilistic |
Our study | | | | | Deterministic |
The complexity of PP
In part 1 of our PE-FTK, cloud servers reconstruct the number of the highest data (
As a variation of PE-FTK, it is possible to find the top-
We show that the proposed PP
Since data owners send randomized shares of their datasets to each of the cloud servers in the input sharing phase, at most
Similar to data owners, an inquirer sends each cloud server a randomized share of the input query generated in the secret sharing phase and receives
Compromised cloud servers can attempt to guess additional information by observing data access patterns even though the stored data are randomized. For example, when the compromised cloud servers collude with an inquirer, the compromised inquirer can send an input query to cloud servers and the compromised cloud servers can observe the data access patterns. However, since the cloud servers access all data to compute
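The idea of touching every record regardless of its content can be sketched in plaintext Python as follows. This is a hypothetical illustration, not the protocol of this paper: the function `class_scores_oblivious`, the 0/1 indicator list `in_top_k`, and the integer class labels are assumptions, and in an actual MPC setting the indicators, labels, and equality tests would be secret-shared values and MPC operations rather than plain integers.

```python
def class_scores_oblivious(labels, in_top_k, num_classes):
    """Accumulate class scores while visiting every record in the same way.

    labels[i]   : class label of record i (an integer in [0, num_classes))
    in_top_k[i] : 1 if record i was selected as a nearest neighbor, else 0
    """
    scores = [0] * num_classes
    for label, bit in zip(labels, in_top_k):
        for c in range(num_classes):
            # The same multiply-and-add is performed for every (record, class)
            # pair, so the sequence of accesses does not depend on the data.
            scores[c] += bit * int(label == c)
    return scores

scores = class_scores_oblivious(
    labels=[0, 1, 1, 0, 2],
    in_top_k=[1, 1, 1, 0, 0],
    num_classes=3,
)
```

Because unselected records contribute a zero term instead of being skipped, an observer of the access sequence sees the same pattern for any input query.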
In this section, we review existing works related to PP
After Lindell and Pinkas first introduced privacy preserving data mining in [
In [
In [
In this paper, we proposed PP
The data used to support the findings of this study are included within the article.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was supported by Samsung Electronics.