The secure subset problem is an important problem in secure multiparty computation, a vital field of cryptography. Most existing protocols for this problem keep the elements of only one set private while leaking the elements of the other set. In other words, they do not solve the secure subset problem completely. Although a few studies have addressed the fully private subset problem, their protocols are mainly based on oblivious polynomial evaluation and are computationally inefficient. In this study, we first design an efficient secure subset protocol for sets whose elements are drawn from a known set, based on a new encoding method and a homomorphic encryption scheme. Because this protocol becomes inefficient when the elements of the sets are taken from a large domain, we further use the Bloom filter and a homomorphic encryption scheme to present an efficient protocol whose computational complexity is linear in the cardinality of the larger set, which makes it practical for inputs consisting of large amounts of data. However, the second protocol may yield a false positive. The false-positive probability can be rapidly decreased by re-executing the protocol with different hash functions. Finally, we present experimental performance analyses of these protocols.
Funding: National Natural Science Foundation of China (61272435, 61373020); Fundamental Research Funds for the Central Universities (2016TS061); National Foundation Fund of China (201706870028); Natural Science Foundation of Inner Mongolia (2017MS0602); University Scientific Research Project of Inner Mongolia (NJZY17164).
1. Introduction
The rapid development of networks provides a great opportunity for multiparty cooperative computation, but it also challenges the privacy of the participants' information. In a complex network environment, parties may not trust each other during computations and need to keep their information private. Secure multiparty computation is a key technology for preserving privacy in cooperative computations, and it therefore attracts increasing attention in the international cryptographic community.
Secure multiparty computation was first introduced by Yao [1] as a millionaires’ problem in 1982. The millionaires’ problem can be described as follows. Two millionaires, Alice and Bob, want to know who is richer, but neither Alice nor Bob wants to disclose her/his own wealth to the other. This is a secure two-party computation problem. After this, Ben-Or et al. [2] gave the first secure multiparty computation protocol. A secure multiparty computation involves any two or more parties who use their own private data to cooperatively compute a function in order to obtain the predetermined output while keeping their input information private. Secure multiparty computation is a general cryptographic protocol. Many cryptographic protocols for cooperative computations that contain two or more parties can be viewed as secure multiparty computation protocols, and these include key exchange protocols [3], digital signature protocols [4], secret sharing protocols [5], zero-knowledge proof protocols [6], and oblivious transfer protocols [7]. Secure multiparty computation is a key technology in network security, and it has been the focus of the international cryptographic community for many years. The Turing Award winner Goldwasser [8] predicted that “the field of multiparty computations is today where public key cryptography was ten years ago, namely, an extremely powerful tool and rich theory whose real-life usage is at this time only beginning but will become in the future an integral part of our computing reality.”
Goldreich et al. [9, 10] thoroughly studied the secure multiparty computation problem and established its theoretical foundation. They proved that secure multiparty computation problems are theoretically solvable and proposed a general solution to them. Because the general solution is inefficient and impractical for specific problems, they also noted that special solutions should be developed for special problems to improve efficiency. This observation has motivated researchers to study solutions to various secure multiparty computation problems, including millionaires' problems [11, 12], secure computational geometry problems [13], comparison of information without leakage [14], private bidding and auction problems [15], and privacy-preserving data mining problems [16]. In addition, many other new secure multiparty computation problems remain to be studied.
Because many problems can be abstracted as set problems, private set operations are a highly important field in secure multiparty computation. These problems include set intersection [18], set union [19], and subset [17]. The set intersection and set union problems have been widely studied, whereas there are only a few studies of the subset problem, even though the subset problem has a variety of applications.
In data mining, an important principle of association rules (the Apriori principle) states that if an itemset is frequent, then all of its subsets must also be frequent [20]. Suppose that Alice and Bob are both suppliers of a supermarket W. Alice has a large frequent itemset A generated by data mining on the transactions of W. Bob has an itemset B and wants to know whether B is also a frequent itemset. However, he cannot perform data mining on the transaction data of W (either he cannot obtain the transaction data or he lacks data mining expertise). Therefore, he resorts to the Apriori principle, but he does not want to disclose B to Alice; likewise, Alice wishes to keep A secret. In this application, they must privately determine whether B⊆A.
In secret sharing, a secret is divided into w shares, and they are privately given to w parties who are called the legal shareholders, and any t or more shareholders can reconstruct the secret. During the reconstruction of the secret, some illegal shareholders may take part in the reconstruction. To prevent illegal shareholders from taking part in the reconstruction, the authenticity of the shareholder participants must be privately determined. This is where the secure subset protocol comes into play.
It is well known that the subset problem is a special case of set intersection. However, existing set intersection protocols yield solutions to the subset problem that are both insecure and inefficient. For the subset problem, we only need to determine whether B⊆A, whereas intersection protocols compute every element x with x∈A∧x∈B. First, this discloses the common elements of A and B, which the subset problem does not require. Second, the subset problem is a decision problem and does not need to compute all the elements of A∩B. Thus, set intersection protocols are not suitable for the subset problem.
Given two sets A and B with |A|≥|B|, most existing private subset protocols fall into one of two cases: either the two parties prove that B⊆A while leaking the elements of set B [21–23], or they prove that B⊆A without keeping the elements of set A private [24–27].
In addition, Kissner and Song [17] proposed a secure solution to the subset problem based on the Paillier additively homomorphic encryption scheme [28], the representation of a set's elements as the roots of a polynomial, and the mathematical properties of polynomials. Their solution keeps both sets A and B private. Let δ be the encryption of the polynomial p(x) that represents the larger set A. If B⊆A, then p(b)=0 for every element b∈B, and vice versa; that is, B⊆A ⇔ ∀b∈B, p(b)=0. The party holding the smaller set B evaluates the encrypted polynomial δ at each element b∈B to obtain |B| ciphertexts and multiplies these ciphertexts to obtain β. If β is an encryption of 0, then B⊆A. However, this protocol requires 2mn+4m+8 log N+mn modular multiplications modulo N² (where |A|=m and |B|=n; details are presented in Section 5.1). Because this cost depends on the product of |A| and |B|, the protocol is inefficient for large amounts of data.
Furthermore, Ye et al. [29] and Sang and Shen [30] separately proposed subset protocols that are mainly based on oblivious polynomial evaluation and are similar to Kissner's protocol. The subset protocol of [29] works in a distributed setting: using Shamir's (t,w) secret sharing scheme, the polynomial constructed from the larger set A is distributed to multiple servers, and the party holding the smaller set interacts with at least t servers to solve the subset problem based on the standard variant of ElGamal encryption. The overall computational cost is O(t|A||B|), and the communication cost is O(t|A||B| log p) bits. In the subset protocol of [30], Sang and Shen used a non-malleable non-interactive zero-knowledge (NIZK) argument based on the Boneh-Goh-Nissim (BGN) cryptosystem to protect against malicious attacks. Excluding the cost of the NIZK argument, the computational complexity of this protocol is O(|A||B|). By contrast, our protocols have computational complexity linear in the cardinality of the larger set, O(|A|) (details are presented in Section 5.1).
Moreover, Blanton and Aguiar [31] designed an efficient subset protocol based on oblivious algorithms, such as oblivious sorting and oblivious equality testing. Unfortunately, this protocol is constructed with the circuit method and therefore inherits its drawbacks [32].
Shundong et al. [12] described a secure subset protocol that retains the privacy of both sets A and B, and it is based on symmetric cryptography and has high efficiency. However, the smaller set B can only have one element in the protocol. If set B has more than one element, the parties have to execute the protocol |B| times and choose new pseudorandom sequences on each occasion, which is tedious.
In this study, we propose two secure subset protocols for different situations using homomorphic encryption schemes, which can be multiplicative or additive. Because a multiplicatively homomorphic encryption scheme is more efficient than an additive one, we use a multiplicative one to build our protocols. To the best of our knowledge, current encryption schemes can encrypt only integer messages, whereas in many common applications the sets come from a known universe whose elements are not integers. For this case, we design an efficient protocol based on a new encoding method and a homomorphic encryption scheme; its computational complexity is linear in the size of the larger set. For sets taken from a large domain, we further present an efficient protocol based on Bloom filters and a homomorphic encryption scheme, which improves efficiency at a small cost in accuracy. Furthermore, we show that, by using the Bloom filter, we can solve the subset problem for sets taken from an exponentially large domain.
The rest of this paper is organized as follows. In Section 2, we introduce some preliminaries. In Section 3, we propose an efficient secure subset protocol for sets whose elements are drawn from a known set using a new encoding method and homomorphic encryption schemes. In Section 4, we show the secure subset protocol for sets within a large domain based on the Bloom filter and homomorphic encryption schemes, while in Section 5, we present an analysis of secure subset protocols and the experimental implementation. Finally, in Section 6, we conclude this paper.
2. Preliminaries
2.1. Secure Subset Problem
Alice has a set A={a1,…,am}, and Bob has a set B={b1,…,bn}. Alice and Bob want to determine whether B is a subset of A without disclosing any information about the elements of their sets relative to each other. This can be abstracted as a secure subset problem.
2.2. Homomorphic Encryption Scheme
A homomorphic encryption scheme is an encryption scheme with some special properties that make the homomorphic encryption scheme a building block of many secure multiparty computation protocols. A conventional public key encryption scheme E consists of three algorithms: KeyGenE, EncryptE, and DecryptE.
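$\mathrm{KeyGen}_E$. $\mathrm{KeyGen}_E$ takes a security parameter k as input and outputs a secret key sk and the corresponding public key pk, together with the definition of the plaintext space $\mathcal{P}$ and the ciphertext space $\mathcal{C}$:
$$ (sk, pk, \mathcal{P}, \mathcal{C}) \leftarrow \mathrm{KeyGen}_E(k). \tag{1} $$
$\mathrm{Encrypt}_E$. Taking pk and a plaintext $M \in \mathcal{P}$ as inputs, $\mathrm{Encrypt}_E$ outputs a ciphertext $C \in \mathcal{C}$:
$$ C \leftarrow \mathrm{Encrypt}_E(pk, M). \tag{2} $$
$\mathrm{Decrypt}_E$. Taking a ciphertext $C \in \mathcal{C}$ and the secret key sk as inputs, $\mathrm{Decrypt}_E$ outputs the plaintext $M \in \mathcal{P}$:
$$ M \leftarrow \mathrm{Decrypt}_E(sk, C). \tag{3} $$
In addition to the three conventional algorithms, a homomorphic encryption scheme E has an efficient algorithm $\mathrm{Evaluate}_E$, which takes as inputs the public key pk, an operation S, and a tuple of ciphertexts $\mathbf{C} = \langle C_1, \dots, C_s \rangle$ ($C_i$ is the ciphertext of $M_i$, $i = 1, \dots, s$), and outputs a ciphertext of $S(M_1, \dots, M_s)$:
$$ \mathrm{Encrypt}_E(pk, S(M_1, \dots, M_s)) = \mathrm{Evaluate}_E(pk, S, \mathbf{C}). \tag{4} $$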
Our construction uses semantically secure public key encryption schemes that preserve a group homomorphism under certain computational assumptions. This property is satisfied by the Paillier encryption scheme [28] under the Composite Residuosity Class (CRC) assumption and by the ElGamal encryption scheme [33] under the Computational Diffie-Hellman (CDH) assumption. Details are presented as follows.
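Paillier Encryption Scheme
(i) KeyGen. On input a security parameter k, this algorithm generates two large primes p and q, sets $N = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, and chooses $g \in \mathbb{Z}_{N^2}^*$ such that $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$, where
$$ L(x) = \frac{x - 1}{N}. \tag{5} $$
$\mathbb{Z}_{N^2}^*$ is the ciphertext space, and $\mathbb{Z}_N$ is the plaintext space. The public key is $(g, N)$, and the private key is $\lambda$.
(ii) Encrypt. To encrypt a plaintext $M \in \mathbb{Z}_N$, the Encrypt algorithm selects a random number $r \in \mathbb{Z}_N^*$ and computes
$$ E(M) = g^{M} \cdot r^{N} \bmod N^2. \tag{6} $$
(iii) Decrypt. To decrypt the ciphertext $C = E(M) \in \mathbb{Z}_{N^2}^*$, the Decrypt algorithm computes
$$ M = \frac{L(C^{\lambda} \bmod N^2)}{L(g^{\lambda} \bmod N^2)} \bmod N. \tag{7} $$
(iv) Evaluate. For ciphertexts $C_1 = E(M_1)$, $C_2 = E(M_2)$, and $C_3 = E(M)$ and a constant c, we have
$$ \begin{aligned} E(M_1) \cdot E(M_2) &= (g^{M_1} r_1^{N} \bmod N^2) \cdot (g^{M_2} r_2^{N} \bmod N^2) = g^{M_1 + M_2} (r_1 r_2)^{N} \bmod N^2 = E(M_1 + M_2), \\ E(M)^{c} &= (g^{M} r^{N} \bmod N^2)^{c} = g^{cM} r^{cN} \bmod N^2 = E(cM). \end{aligned} \tag{8} $$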
In this encryption scheme, if M1=0, then E(M1)·E(M2)=E(M1+M2)=E(M2).
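The following Python sketch (Python 3.9+) illustrates KeyGen, Encrypt, Decrypt, and Evaluate with deliberately tiny primes, so it is only a toy for checking the homomorphic identities in (8), not a secure implementation. The choice g = N + 1 and the helper names `add_ct` and `scale_ct` are our own assumptions rather than part of the scheme's description.

```python
import math
import random

# Toy primes for illustration only; real Paillier keys use primes of 1024+ bits.
P_TOY, Q_TOY = 293, 433


def L(x, n):
    """L(x) = (x - 1) / N from equation (5)."""
    return (x - 1) // n


def keygen(p=P_TOY, q=Q_TOY):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1  # a common valid choice of g (assumption)
    assert math.gcd(L(pow(g, lam, n * n), n), n) == 1  # KeyGen condition
    return (g, n), lam


def encrypt(pk, m):
    g, n = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be a unit modulo N
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pk, lam, c):
    g, n = pk
    mu = pow(L(pow(g, lam, n * n), n), -1, n)  # inverse of the denominator in (7)
    return (L(pow(c, lam, n * n), n) * mu) % n


def add_ct(pk, c1, c2):
    """E(M1) * E(M2) = E(M1 + M2)."""
    return (c1 * c2) % (pk[1] ** 2)


def scale_ct(pk, c, k):
    """E(M)^k = E(kM)."""
    return pow(c, k, pk[1] ** 2)


pk, lam = keygen()
c_sum = add_ct(pk, encrypt(pk, 20), encrypt(pk, 22))
print(decrypt(pk, lam, c_sum))  # 42
```

Because E(0)·E(M) decrypts to M, an encryption of 0 acts as a neutral element, which is exactly the property stated in the remark above.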
ElGamal Encryption Scheme
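(i) KeyGen. On input a security parameter γ, the KeyGen algorithm generates a large prime p and a generator α, and it randomly chooses a number z as the private key. The public key is $y = \alpha^{z} \bmod p$.
(ii) Encrypt. Taking M and y as inputs, the Encrypt algorithm selects a random number r and computes
$$ E(M) = (c_1, c_2) = (\alpha^{r} \bmod p,\; M y^{r} \bmod p). \tag{9} $$
(iii) Decrypt. This algorithm takes E(M) and z as inputs and computes
$$ c_2 \cdot c_1^{-z} \bmod p = M y^{r} \cdot (\alpha^{r})^{-z} \bmod p = M \bmod p. \tag{10} $$
(iv) Evaluate. Given ciphertexts $E(M_1)$, $E(M_2)$, and $E(M)$ and a constant c, we can compute (componentwise)
$$ \begin{aligned} E(M_1) \cdot E(M_2) &= (\alpha^{r_1} \bmod p,\; M_1 y^{r_1} \bmod p) \cdot (\alpha^{r_2} \bmod p,\; M_2 y^{r_2} \bmod p) = (\alpha^{r_1 + r_2} \bmod p,\; M_1 M_2 y^{r_1 + r_2} \bmod p) = E(M_1 \cdot M_2), \\ E(M)^{c} &= (\alpha^{r} \bmod p,\; M y^{r} \bmod p)^{c} = (\alpha^{rc} \bmod p,\; M^{c} y^{rc} \bmod p) = E(M^{c}). \end{aligned} \tag{11} $$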
In this encryption scheme, if M1=1, then E(M1)·E(M2)=E(M1·M2)=E(M2).
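A similar toy sketch for ElGamal (Python 3.8+) exercises equations (9)-(11). It works modulo the Mersenne prime 2^127 − 1 with base α = 3; both parameter choices are our own assumptions for illustration, and a real deployment would use a vetted group in which the relevant Diffie-Hellman assumption is believed to hold. Ciphertexts are pairs, and the homomorphic operations act componentwise.

```python
import random

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
P = 2**127 - 1
ALPHA = 3


def keygen():
    z = random.randrange(2, P - 1)          # private key
    return pow(ALPHA, z, P), z              # (public key y, private key z)


def encrypt(y, m):
    r = random.randrange(1, P - 1)
    return pow(ALPHA, r, P), (m * pow(y, r, P)) % P   # (c1, c2) as in (9)


def decrypt(z, ct):
    c1, c2 = ct
    return (c2 * pow(c1, -z, P)) % P        # c2 * c1^(-z) as in (10)


def mul_ct(a, b):
    """E(M1) * E(M2) = E(M1 * M2), computed componentwise."""
    return (a[0] * b[0]) % P, (a[1] * b[1]) % P


def pow_ct(ct, c):
    """E(M)^c = E(M^c); note that E(M)^0 = (1, 1), a trivial encryption of 1."""
    return pow(ct[0], c, P), pow(ct[1], c, P)


y, z = keygen()
print(decrypt(z, mul_ct(encrypt(y, 6), encrypt(y, 7))))   # 42
```

Multiplying by an encryption of 1 leaves the plaintext unchanged, which is the neutral-element property that the 1-r encoding below relies on.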
These two schemes are semantically secure under the CRC assumption and the CDH assumption, respectively. That is, given two messages $M_0$ and $M_1$ and a ciphertext $E(M_t)$ ($t \in \{0, 1\}$) produced by either scheme, no probabilistic polynomial-time algorithm can determine whether $E(M_t)$ encrypts $M_0$ or $M_1$ with non-negligible advantage.
2.3. Security of Secure Multiparty Computation
We assume that all parties are semihonest. A semihonest party follows the prescribed protocol correctly, except that it keeps a record of all its intermediate computations and may try to derive the other party's private inputs from this record. Goldreich [10] also designed a compiler that forces each party either to behave in a semihonest manner or to be detected: given a protocol π that privately computes a function f in the semihonest model, the compiler produces a new protocol Π that privately computes f in the malicious model. This result shows that protocols designed in the semihonest model are an important foundation. Therefore, our work focuses on solutions to the subset problem in the semihonest model.
Different methods are used to prove security in different cryptographic fields. Reducing security to a hard assumption in the standard model or the random oracle model is suitable for encryption schemes and signature schemes. The simulation paradigm is widely accepted for proving the security of secure multiparty computation protocols. Its basic idea is to compare a real secure multiparty computation protocol with an ideal one: the real protocol is considered secure if it leaks no more information than the ideal one. The ideal secure multiparty computation protocol can be described as follows.
Assume that there is an absolutely trusted third party, denoted by Trent, who neither lies nor leaks any information that should not be revealed. Alice has a number x1, Bob has a number x2, and they want to securely compute a function f(x1,x2). They proceed as follows: (a) Alice and Bob send x1 and x2, respectively, to Trent; (b) Trent computes f(x1,x2); and (c) Trent tells Alice and Bob the result.
Because most secure multiparty computation protocols are constructed using public key encryption schemes, the security proof for a secure multiparty computation protocol is to reduce the security of the protocol to the security of the public key encryption scheme on which the protocol is based. That is, to prove that a multiparty computation protocol is secure, we must prove that the real secure multiparty computation protocol does not leak more information than the ideal protocol with the assumption that the public key encryption scheme used in the real protocol is secure. In other words, the information that a party obtains in a real secure multiparty computation protocol can be simulated by a simulator that only obtains the result and one party’s input, and if the sets of information obtained from both methods are computationally indistinguishable, the real protocol is secure.
Intuitively, a protocol that computes f is secure if whatever a set of semihonest parties can obtain after participating in the protocol could be obtained from the inputs and outputs of these same parties. In the simulation paradigm, this means that the VIEW (this will be discussed later) of a set of semihonest parties during a protocol execution can be simulated by their inputs and outputs.
Suppose that two parties, Alice and Bob, have sets A and B, respectively. They want to privately compute f(A,B), which is a polynomial-time function. Further, suppose that π is a protocol computing f(A,B). The VIEW of Alice, who has the set A, during the execution of π on the input (A,B), is denoted by $\mathrm{VIEW}_1^{\pi}(A,B) = (A, r_1, m_1^1, \dots, m_t^1)$, where $r_1$ is the result of Alice's internal coin tosses and $m_i^1$ ($i = 1, \dots, t$) is the i-th message that Alice received. Alice's output after the execution of π is denoted by $\mathrm{OUTPUT}_1^{\pi}(A,B)$, which is implicit in her VIEW. Similarly, Bob's VIEW and output during the execution of π are $\mathrm{VIEW}_2^{\pi}(A,B) = (B, r_2, m_1^2, \dots, m_t^2)$ and $\mathrm{OUTPUT}_2^{\pi}(A,B)$.
Definition 1 (security in the semihonest model [10]).
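For a function $f = (f_1, f_2)$, we say that π privately computes f if there exist two probabilistic polynomial-time simulators, denoted by $S_1$ and $S_2$, such that
$$ \begin{aligned} \{(S_1(A, f_1(A,B)), f_2(A,B))\}_{A,B \in \{0,1\}^*} &\stackrel{c}{\equiv} \{(\mathrm{VIEW}_1^{\pi}(A,B), \mathrm{OUTPUT}_2^{\pi}(A,B))\}_{A,B \in \{0,1\}^*}, \\ \{(f_1(A,B), S_2(B, f_2(A,B)))\}_{A,B \in \{0,1\}^*} &\stackrel{c}{\equiv} \{(\mathrm{OUTPUT}_1^{\pi}(A,B), \mathrm{VIEW}_2^{\pi}(A,B))\}_{A,B \in \{0,1\}^*}, \end{aligned} \tag{12} $$
where $\stackrel{c}{\equiv}$ denotes computational indistinguishability.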
3. Protocol for Sets Whose Elements Are Drawn from a Known Set
Suppose that Alice has a set A and Bob has a set B. A straightforward way to solve the subset problem for A and B, without regard to privacy, is as follows: Alice sends her set A to Bob, Bob determines whether B⊆A, and he then tells Alice the result. Thus, both Alice and Bob learn the subset relation between A and B.
By the definition of a subset, B⊆A if and only if every element x∈B satisfies x∈A. Thus, we can reduce the subset problem to checking whether all the elements of set B are in set A: if they are, then B⊆A; otherwise, B⊈A.
Suppose Alice and Bob have sets A={a1,…,am}, B={b1,…,bn} (A,B⊆U={u1,…,ul}), respectively. They want to determine whether or not B⊆A without disclosing either A or B.
3.1. Foundations of This Protocol
Before describing the idea of our protocol, we present its building blocks, a 1-r encoding method and a 1-0 encoding method, both of which are based on the characteristic vector of a set.
1-r Encoding. A 1-r encoding encodes a set as a 1-r vector, in which every component is either 1 or a random number r with $r \in \mathbb{Z}_p^*$ and $r \neq 1$. A set $A = \{a_1, \dots, a_m\} \subseteq U = \{u_1, \dots, u_l\}$ is encoded as the 1-r vector $V_A^r = (v_{a_1}, \dots, v_{a_l})$ as follows: for $i = 1, \dots, l$, if $u_i \in A$, then $v_{a_i} = 1$; otherwise, $v_{a_i} = r$ ($r \in \mathbb{Z}_p^*$, $r \neq 1$). This can also be described by the following pseudocode:
For i = 1 to l
    If u_i ∈ A
        v_{a_i} ← 1
    Else v_{a_i} ← r (r ∈ Z_p^*, r ≠ 1)
End
1-0 Encoding. This method is similar to the 1-r encoding, with one difference. A set $B = \{b_1, \dots, b_n\} \subseteq U = \{u_1, \dots, u_l\}$ is encoded as the 1-0 vector $V_B^1 = (v_{b_1}, \dots, v_{b_l})$ as follows: for $i = 1, \dots, l$, if $u_i \in B$, then $v_{b_i} = 1$; otherwise, $v_{b_i} = 0$. This can also be described by the following pseudocode:
For i = 1 to l
    If u_i ∈ B
        v_{b_i} ← 1
    Else v_{b_i} ← 0
End
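The two encodings translate directly into code. This sketch assumes that the universe U is given as an ordered Python list, and it draws a fresh random r from {2, …, p − 1} (that is, $\mathbb{Z}_p^*$ without 1) for every non-member position, consistent with the distinct values $r_{a_3}$ and $r_{a_6}$ in Table 1. The modulus value is our assumption.

```python
import random

P = 2**127 - 1  # illustrative prime modulus (assumption)


def encode_1r(A, U, p=P):
    """1-r encoding: 1 if u_i is in A, otherwise a fresh random r in Z_p^*, r != 1."""
    return [1 if u in A else random.randrange(2, p) for u in U]


def encode_10(B, U):
    """1-0 encoding: the characteristic vector of B over U."""
    return [1 if u in B else 0 for u in U]


U = [11, 12, 13, 14, 15, 16]
print(encode_10({11, 14, 15}, U))  # [1, 0, 0, 1, 1, 0]
```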
From a high-level perspective, the 1-r encoding (1-0 encoding) encodes an x∈A (x∈B) with a one component and an x∉A (x∉B) with a random (zero) component. Alice and Bob can use the above encoding methods to compute the subset problem.
Alice encodes set A as a 1-r vector $V_A^r$, and Bob encodes set B as a 1-0 vector $V_B^1$. Alice sends her vector $V_A^r$ to Bob. Bob selects the components of $V_A^r$ at the positions of the one components of $V_B^1$ and computes their product $v = \prod_{v_{b_i} = 1} v_{a_i}$. If v = 1, then B⊆A; otherwise, B⊈A. This is the principle for deciding the subset relation between sets A and B. Table 1 gives a simple example, where V′ is the vector selected from $V_A^r$ according to the one components of $V_B^1$.
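Table 1: Principle of the subset problem for sets whose elements are drawn from a known set.
Set/vector | 1 | 2  | 3         | 4  | 5  | 6
U          | 11 | 12 | 13        | 14 | 15 | 16
A          | 11 | 12 |           | 14 | 15 |
$V_A^r$    | 1  | 1  | $r_{a_3}$ | 1  | 1  | $r_{a_6}$
B          | 11 |    |           | 14 | 15 |
$V_B^1$    | 1  | 0  | 0         | 1  | 1  | 0
V′         | 1  |    |           | 1  | 1  |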
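The plaintext (and therefore insecure) decision rule can be checked against Table 1 with the following sketch. The random 1-r components are drawn modulo an illustrative prime, which is our assumption; the hypothetical function `plain_subset` returns the selected vector V′ together with the decision v = 1.

```python
import math
import random

P = 2**127 - 1                      # illustrative prime modulus (assumption)
U = [11, 12, 13, 14, 15, 16]        # the known universe of Table 1


def plain_subset(A, B, universe=U, p=P):
    v_a = [1 if u in A else random.randrange(2, p) for u in universe]  # 1-r encoding of A
    v_b = [1 if u in B else 0 for u in universe]                       # 1-0 encoding of B
    v_prime = [a for a, b in zip(v_a, v_b) if b == 1]                  # the vector V'
    v = math.prod(v_prime) % p
    return v_prime, v == 1


print(plain_subset({11, 12, 14, 15}, {11, 14, 15}))  # ([1, 1, 1], True)
```

If B contains an element outside A, V′ picks up a random component and v differs from 1 (except with negligible probability).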
Alice and Bob can also solve the subset problem in another way. Alice encodes A as a 0-r vector $V_A^{r*} = (v_{a_1}^*, \dots, v_{a_l}^*)$ using a 0-r encoding, and Bob encodes B as a 1-0 vector $V_B^1$ using the 1-0 encoding. The 0-r encoding is the same as the 1-r encoding, except that the one components are replaced by zero components and $r \in \mathbb{Z}_p^*$ (so $r \neq 0$). Bob computes $v^* = \sum_{v_{b_i} = 1} v_{a_i}^*$. If $v^* = 0$, then B⊆A; otherwise, B⊈A.
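The additive variant differs only in its neutral element. In this sketch (our own, with an assumed bound for the random values), non-members receive a random value in {1, …, p − 1}, and the sum is taken over the integers, so it is zero exactly when every selected component is zero. An encrypted version would instead work modulo the plaintext modulus, where a zero sum could also arise by chance, though only with small probability.

```python
import random

P = 2**127 - 1  # illustrative bound for the random components (assumption)


def encode_0r(A, U, p=P):
    """0-r encoding: 0 if u_i is in A, otherwise a random nonzero r < p."""
    return [0 if u in A else random.randrange(1, p) for u in U]


def subset_by_sum(A, B, U):
    v_a_star = encode_0r(A, U)
    v_b = [1 if u in B else 0 for u in U]                 # 1-0 encoding of B
    v_star = sum(a for a, b in zip(v_a_star, v_b) if b == 1)
    return v_star == 0


U = [11, 12, 13, 14, 15, 16]
print(subset_by_sum({11, 12, 14, 15}, {11, 14, 15}, U))  # True
```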
Although the above approaches easily determine whether B⊆A, they are not secure, because Bob learns Alice's encoded vector and thus her set A. We therefore use semantically secure homomorphic encryption schemes to compute v or $v^*$ privately and thereby determine privately whether B⊆A.
3.2. Protocol
We give a solution to the secure subset problem in Protocol 2 based on the above foundations. Because a multiplicatively homomorphic encryption scheme is more efficient than an additive one, we choose a multiplicative one and encode A as $V_A^r$. The ElGamal encryption scheme is semantically secure if the CDH assumption holds, so ciphertexts of the same plaintext are indistinguishable; in particular, because encryption is randomized, repeated encryptions of plaintext 1 yield different ciphertexts. In addition, the ElGamal encryption scheme is multiplicatively homomorphic, so we can obtain $E(M_1 \cdot M_2)$ from the ciphertexts $E(M_1)$ and $E(M_2)$. Furthermore, $E(M)^1 = E(M)$ and $E(M)^0 = (1, 1)$, which is a trivial encryption of 1. Thus, we present Protocol 2 based on the ElGamal encryption scheme. For ease of explanation, we define P(A,B) as follows: if B⊆A, then P(A,B) = 1; otherwise, P(A,B) = 0.
Protocol 2.
Secure subset protocol for sets whose elements are drawn from a known set.
Inputs. Alice and Bob’s input sets A={a1,…,am} and B={b1,…,bn}(A,B⊆U={u1,…,ul}).
Output. P(A,B).
(1) Alice generates her private key sk and the corresponding public key pk. She publishes pk and keeps sk private.
(2) Alice encodes set A as the vector $V_A^r = (v_{a_1}, \dots, v_{a_l})$ using the 1-r encoding. She then encrypts $V_A^r$ with pk as
$$ E(V_A^r) = (E(v_{a_1}), \dots, E(v_{a_l})) \tag{13} $$
and sends $E(V_A^r)$ to Bob.
(3) Bob encodes B as $V_B^1 = (v_{b_1}, \dots, v_{b_l})$ using the 1-0 encoding. He computes
$$ E(v) = \prod_{i=1}^{l} E(v_{a_i})^{v_{b_i}} \bmod p = E\Big(\prod_{v_{b_i} = 1} v_{a_i}\Big). \tag{14} $$
Furthermore, he randomly chooses a number $r_b \in \mathbb{Z}_p^*$ and computes
$$ E(V) = E(v)^{r_b} \bmod p = (\alpha^{r} \bmod p,\; v \cdot y^{r} \bmod p)^{r_b} = (\alpha^{r r_b} \bmod p,\; v^{r_b} y^{r r_b} \bmod p) = E(v^{r_b}). \tag{15} $$
Then, he sends E(V) to Alice.
(4) Alice decrypts E(V) to obtain V. If V = 1, Alice tells Bob that B⊆A; otherwise, she tells Bob that B⊈A.
Because ciphertexts of random numbers are themselves random, in step (2) Alice need only encrypt the one components of $V_A^r$ and can fill the remaining positions with random ciphertexts. That is, she needs to encrypt only her own m elements, which reduces the computational cost.
If v = 1, then $V = v^{r_b} = 1$; if v ≠ 1, then $V = v^{r_b}$ is a random number. Thus, the random number $r_b$ does not change the result. In this protocol, all parties are semihonest and may try to derive information from the message sequences they obtain; the random number $r_b$ randomizes Bob's computation. If Bob did not apply $r_b$ in step (3), Alice could deduce information about B. When |B| = n is small, she could identify which of her ciphertexts Bob multiplied to obtain E(v) (for example, by comparing E(v) with products of subsets of her ciphertexts), thereby recovering the 1-0 vector $V_B^1$ and hence the set B. Even when |B| = n is large enough that Alice cannot derive Bob's set from E(v), if B is not a subset of A, Alice could learn which elements of B are not in A from v and $V_A^r$.
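To make the message flow of Protocol 2 concrete, the following single-process Python sketch (3.8+) runs its four steps with a toy ElGamal instance (modulus 2^127 − 1 and base α = 3, both our assumptions and not secure parameters). Two further choices are ours rather than the protocol's: Bob multiplies only the ciphertexts at positions where $v_{b_i} = 1$, which is equivalent to raising each $E(v_{a_i})$ to the power $v_{b_i} \in \{0, 1\}$ in (14); and he draws $r_b$ coprime to p − 1, so that $x \mapsto x^{r_b}$ is a bijection on $\mathbb{Z}_p^*$ and hence $v^{r_b} = 1$ exactly when v = 1. The function names are hypothetical.

```python
import math
import random

P = 2**127 - 1   # toy prime modulus (assumption, not a vetted group)
ALPHA = 3        # toy base (assumption)


def keygen():
    z = random.randrange(2, P - 1)
    return pow(ALPHA, z, P), z                      # (pk = y, sk = z)


def enc(y, m):
    r = random.randrange(1, P - 1)
    return pow(ALPHA, r, P), (m * pow(y, r, P)) % P


def dec(z, ct):
    return (ct[1] * pow(ct[0], -z, P)) % P


def alice_encode_encrypt(A, U, y):
    """Step (2): 1-r encode A and encrypt every component, giving E(V_A^r)."""
    return [enc(y, 1 if u in A else random.randrange(2, P)) for u in U]


def bob_combine(B, U, enc_va):
    """Step (3): E(v) as in (14), then E(V) = E(v)^{r_b} as in (15)."""
    c1, c2 = 1, 1                                    # (1, 1): trivial encryption of 1
    for u, (a1, a2) in zip(U, enc_va):
        if u in B:                                   # v_{b_i} = 1
            c1, c2 = (c1 * a1) % P, (c2 * a2) % P
    rb = random.randrange(2, P - 1)
    while math.gcd(rb, P - 1) != 1:                  # our assumption: r_b coprime to p - 1
        rb = random.randrange(2, P - 1)
    return pow(c1, rb, P), pow(c2, rb, P)


def alice_decide(z, enc_V):
    """Step (4): B is a subset of A iff V = 1."""
    return dec(z, enc_V) == 1


def protocol2(A, B, U):
    y, z = keygen()                                  # step (1)
    enc_va = alice_encode_encrypt(A, U, y)           # step (2), sent to Bob
    enc_V = bob_combine(B, U, enc_va)                # step (3), sent to Alice
    return alice_decide(z, enc_V)                    # step (4)


U = [11, 12, 13, 14, 15, 16]
print(protocol2({11, 12, 14, 15}, {11, 14, 15}, U))  # True
```

In an actual deployment, the two parties would run on different machines, and only $E(V_A^r)$, E(V), and the final decision would be exchanged.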
3.3. Security of Protocol 2
In this manuscript, we prove the security of Protocol 2 using the simulation paradigm.
Theorem 3.
Protocol 2, denoted by π, for computing the subset problem is private.
Proof.
To prove this theorem, we show that there exist two simulators S1 and S2 such that (12) holds. We first show the construction of S1.
$S_1$ receives $(A, P(A,B))$ as input and randomly chooses a set $B' \subseteq U$ such that $P(A,B') = P(A,B)$. $S_1$ simulates the execution of Protocol 2 on A and B′. It encodes A and B′ as $V_A^r = (v_{a_1}, \dots, v_{a_l})$ and $V_{B'}^1 = (v'_{b_1}, \dots, v'_{b_l})$, respectively.
$S_1$ encrypts the vector $V_A^r$ using the public key pk to obtain $E(V_A^r) = (E(v_{a_1}), \dots, E(v_{a_l}))$.
$S_1$ computes
$$ E(v') = \prod_{i=1}^{l} E(v_{a_i})^{v'_{b_i}} \bmod p = E\Big(\prod_{v'_{b_i} = 1} v_{a_i}\Big). \tag{16} $$
$S_1$ chooses a random number $r'_b \neq 0$ and computes $E(V') = E(v')^{r'_b} = E(v'^{\,r'_b})$.
$S_1$ decrypts E(V′) and obtains V′.
Let $S_1(A, P(A,B)) = \{A, V_A^r, E(V_A^r), E(V'), V'\}$, where V′ = 1 if and only if P(A,B′) = 1. In the real protocol, $\mathrm{VIEW}_1^{\pi}(A,B) = \{A, V_A^r, E(V_A^r), E(V), V\}$, where V = 1 if and only if P(A,B) = 1. Because the ElGamal encryption scheme is semantically secure under the CDH assumption, messages encrypted under this scheme are computationally indistinguishable. Hence, the message sequence that Alice obtains in Protocol 2 and the one that $S_1$ simulates are computationally indistinguishable. As P(A,B) = P(A,B′), it follows that
$$ \{(S_1(A, f_1(A,B)), f_2(A,B))\}_{A,B \in \{0,1\}^*} \stackrel{c}{\equiv} \{(\mathrm{VIEW}_1^{\pi}(A,B), \mathrm{OUTPUT}_2^{\pi}(A,B))\}_{A,B \in \{0,1\}^*}. \tag{17} $$
Now, let us examine the construction of S2. Based on the inputs (B,P(A,B)), S2 proceeds as follows:
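$S_2$ chooses a set $A' \subseteq U$ such that $P(A',B) = P(A,B)$ and simulates the execution of Protocol 2 on A′ and B. Using the 1-r and 1-0 encodings, $S_2$ encodes A′ as $V_{A'}^r = (v'_{a_1}, \dots, v'_{a_l})$ and B as $V_B^1 = (v_{b_1}, \dots, v_{b_l})$, respectively.
$S_2$ encrypts the vector $V_{A'}^r$ to obtain
$$ E(V_{A'}^r) = (E(v'_{a_1}), \dots, E(v'_{a_l})). \tag{18} $$
$S_2$ computes
$$ E(v') = \prod_{i=1}^{l} E(v'_{a_i})^{v_{b_i}} \bmod p = E\Big(\prod_{v_{b_i} = 1} v'_{a_i}\Big). \tag{19} $$
Furthermore, $S_2$ chooses a random number $r_b \neq 0$ and computes $E(V') = E(v')^{r_b} = E(v'^{\,r_b})$.
$S_2$ obtains V′ from E(V′).
Let $S_2(B, P(A,B)) = \{B, V_B^1, E(V_{A'}^r), E(V'), V'\}$, where V′ = 1 if and only if P(A′,B) = 1. In the real protocol, $\mathrm{VIEW}_2^{\pi}(A,B) = \{B, V_B^1, E(V_A^r), E(V), V\}$, where V = 1 if and only if P(A,B) = 1. Because messages encrypted with the ElGamal encryption scheme are computationally indistinguishable under the CDH assumption, the message sequence that Bob obtains in Protocol 2 and the one that $S_2$ simulates are computationally indistinguishable. As P(A,B) = P(A′,B), we have
$$ \{(f_1(A,B), S_2(B, f_2(A,B)))\}_{A,B \in \{0,1\}^*} \stackrel{c}{\equiv} \{(\mathrm{OUTPUT}_1^{\pi}(A,B), \mathrm{VIEW}_2^{\pi}(A,B))\}_{A,B \in \{0,1\}^*}. \tag{20} $$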
4. Protocol for Sets with Large Domains
In Protocol 2, we presented a subset protocol for sets whose elements are drawn from a known set. Because its communication complexity is linear in |U|, it is impractical when |U| is large. Therefore, we construct a secure subset protocol for sets taken from a large domain. Given two sets A and B (|A|≥|B|), this protocol has computational complexity linear in |A|, whereas the computational complexity of Kissner's protocol [17] is linear in the product of |A| and |B|. If |A|=|B|=m, Kissner's protocol has quadratic computational complexity O(m²), so it cannot generally be considered practical for inputs consisting of large amounts of data [34]. Our protocol reduces the computational cost at the expense of some accuracy: it has a negligible false-positive probability, and in Section 4.2 we show how to decrease this probability further.
4.1. Foundations of This Protocol
The following secure subset protocol is based on the Bloom filter [35] and a variant Bloom filter. We present the building blocks of the protocol before giving the idea behind the protocol.
Bloom Filter. A Bloom filter BFB=(w,n,k,H) is a vector of w bits that can represent a set B of at most n elements. There are k independent uniform hash functions H=(h1,…,hk), and each hj(j=1,…,k) maps the elements of B to [1,w] uniformly. In this paper, we use BFB[i] to denote the bit at index i in BFB. Initially, all bits in the array are set to 0. To insert an element b∈B into the filter, we compute each hash function hj(b) and set BFB[hj(b)]=1. After all the elements of B are inserted in BFB, the Bloom filter BFB represents set B.
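A minimal Bloom filter following this description is sketched below. The paper does not fix concrete hash functions, so the sketch assumes a hypothetical family $h_j(x)$ built from SHA-256 with the index j as a salt and reduced into [1, w]; positions are kept 1-based to match the notation $BF_B[i]$.

```python
import hashlib


def h(j, x, w):
    """Hypothetical hash family: SHA-256 of 'j|x', reduced into [1, w]."""
    digest = hashlib.sha256(f"{j}|{x}".encode()).digest()
    return int.from_bytes(digest, "big") % w + 1


class BloomFilter:
    def __init__(self, w, k):
        self.w, self.k = w, k
        self.bits = [0] * w                 # BF[i] is stored at bits[i - 1]

    def insert(self, x):
        for j in range(1, self.k + 1):
            self.bits[h(j, x, self.w) - 1] = 1

    def query(self, x):
        """False means 'definitely not a member'; True means 'member with high probability'."""
        return all(self.bits[h(j, x, self.w) - 1] == 1 for j in range(1, self.k + 1))


bf = BloomFilter(w=64, k=3)
for b in (134, 258):
    bf.insert(b)
print(bf.query(134), bf.query(258))  # True True
```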
To check whether an item b′ is in B, we check all components of $BF_B$ indexed by $h_j(b')$ (j = 1, …, k). If any of these bits is 0, then b′∉B; otherwise, b′∈B with high probability. A Bloom filter may thus yield a false positive, but it never yields a false negative: if b∈B, then $BF_B[h_j(b)] = 1$ for all j = 1, …, k, whereas if b∉B, it may still happen that $BF_B[h_j(b)] = 1$ for all j. The false-positive probability is
$$ P' = \frac{w!}{w^{k(n+1)}} \sum_{i=1}^{w} \sum_{j=1}^{i} (-1)^{i-j} \frac{j^{kn}\, i^{k}}{(w-i)!\, j!\, (i-j)!}. \tag{21} $$
According to [36], it is about $2^{-k}$, so k can be chosen according to the requirements of the practical application. If the size of $BF_B$ is $w = nk \log_2 e$, the probability that a specific component is one is 1/2 [37]; a non-member then passes all k checks with probability about $(1/2)^k = 2^{-k}$.
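Equation (21) can be evaluated exactly with rational arithmetic. The sketch below rewrites the factorial quotient as binomial coefficients, using $w!/((w-i)!\,j!\,(i-j)!) = \binom{w}{i}\binom{i}{j}$, and compares the result with the $2^{-k}$ estimate when w is set to $\lceil nk \log_2 e \rceil$. Rounding w up is our assumption, since w must be an integer, and the parameter values n = 20 and k = 4 are illustrative.

```python
from fractions import Fraction
from math import ceil, comb, e, log2


def bloom_fp_exact(w, n, k):
    """Exact false-positive probability P' of equation (21)."""
    total = 0
    for i in range(1, w + 1):
        inner = sum((-1) ** (i - j) * comb(i, j) * j ** (k * n) for j in range(1, i + 1))
        total += comb(w, i) * i ** k * inner
    return Fraction(total, w ** (k * (n + 1)))


n, k = 20, 4
w = ceil(n * k * log2(e))            # w = nk log2(e), rounded up to an integer
print(w, float(bloom_fp_exact(w, n, k)), 2 ** -k)
```

For one element and one hash function the formula reduces to 1/w, which is a quick sanity check on the transcription.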
Suppose Alice has a set A, and Bob has a set B. Sets A and B are taken from an exponentially large domain, and they can compute whether B⊆A using the Bloom filter as follows.
Protocol 4.
Secure subset protocol for sets taken from an exponentially large domain.
Inputs. Alice's set A={a1,…,am} and Bob's set B={b1,…,bn}.
Output. Whether B⊆A.
(1) Alice and Bob negotiate the parameters w, k, and H for their Bloom filters.
(2) Alice and Bob represent sets A and B as Bloom filters $BF_A$ and $BF_B$, respectively.
(3) Alice sends $BF_A$ to Bob.
(4) Bob checks $BF_A$ against $BF_B$. If $BF_A[i] = 1$ for every index i (i = 1, …, w) with $BF_B[i] = 1$, then B⊆A (with high probability); otherwise, B⊈A. He sends the result to Alice.
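Protocol 4 needs only hash evaluations, as the following plaintext sketch shows. It assumes a hypothetical SHA-256-based hash family in place of the negotiated H, and the helper names are illustrative. A True answer may be a false positive, whereas a False answer is always correct.

```python
import hashlib


def h(j, x, w):
    """Hypothetical negotiated hash function h_j: SHA-256 of 'j|x' reduced into [1, w]."""
    return int.from_bytes(hashlib.sha256(f"{j}|{x}".encode()).digest(), "big") % w + 1


def bloom(S, w, k):
    """Represent set S as a Bloom filter of w bits (list index i - 1 holds BF[i])."""
    bits = [0] * w
    for x in S:
        for j in range(1, k + 1):
            bits[h(j, x, w) - 1] = 1
    return bits


def bob_decides(bf_a, bf_b):
    """B is (probably) a subset of A iff BF_A[i] = 1 wherever BF_B[i] = 1."""
    return all(a == 1 for a, b in zip(bf_a, bf_b) if b == 1)


w, k = 1024, 5                                       # negotiated parameters
bf_a = bloom({134, 189, 258, 393}, w, k)             # Alice's filter, sent to Bob
print(bob_decides(bf_a, bloom({134, 258}, w, k)))    # True
```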
The above protocol has low computational complexity because it requires only hash-function evaluations. However, when the sets are not taken from an exponentially large domain, Bob can recover the set A from $BF_A$ by an exhaustive search. Therefore, for sets taken from a large, but not exponentially large, domain, we designed a solution based on the Bloom filter and a variant of the Bloom filter. Before presenting the principle behind this solution, we introduce the variant Bloom filter.
Variant Bloom Filter. The variant Bloom filter differs from the Bloom filter in one respect: each component of a Bloom filter is a 0 or 1 bit, whereas each component of the variant Bloom filter is either 1 or a random number r ($r \in \mathbb{Z}_p^*$, $r \neq 1$). To insert an element a∈A into a variant Bloom filter $VBF_A = (w, m, k, H)$ of a set A, we compute $h_j(a)$ (j = 1, …, k) and set $VBF_A[h_j(a)] = 1$. After all the elements of A are inserted, we set the remaining components of $VBF_A$ to random numbers other than 1. Because the variant Bloom filter merely replaces the 0 components of the Bloom filter with random numbers, its false-positive probability is the same as that of the Bloom filter.
Suppose that Alice and Bob hold sets A and B, respectively, drawn from a large domain, and they want to decide whether B⊆A. Alice represents set A as a variant Bloom filter VBFA and sends it to Bob. Bob represents set B as a Bloom filter BFB and computes
$$v = \prod_{i\in\{1,\dots,w\}:\ \mathrm{BF}_B[i]=1} \mathrm{VBF}_A[i]. \tag{22}$$
If v=1, then B⊆A; otherwise, B⊈A. This is the idea behind deciding whether B⊆A. Alternatively, Alice can represent set A as another variant Bloom filter VBFA∗, which is identical to VBFA except that every component equal to 1 in VBFA is 0 in VBFA∗ (the remaining components are random and nonzero). Bob then computes
$$v^{*} = \sum_{i\in\{1,\dots,w\}:\ \mathrm{BF}_B[i]=1} \mathrm{VBF}_A^{*}[i] \tag{23}$$
instead of v. If v∗=0, then B⊆A; otherwise, B⊈A.
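The two plaintext tests (22) and (23) can be sketched as follows. We assume a hypothetical SHA-256-based hash family and an illustrative prime modulus P = 2^127 − 1 for the random components (the protocol itself draws them from the encryption scheme's plaintext space). Following the definition of VBFA∗, the sum is taken over the positions where BFB[i]=1. As the text goes on to explain, this plaintext version is not yet private.

```python
import hashlib
import random

P = 2**127 - 1  # illustrative prime modulus (assumption); 2^127 - 1 is a Mersenne prime


def positions(x, w, k):
    """Hypothetical hash family h_1..h_k: SHA-256 of "j|x" reduced mod w (0-based)."""
    return {int.from_bytes(hashlib.sha256(f"{j}|{x}".encode()).digest(), "big") % w
            for j in range(1, k + 1)}


def bloom(items, w, k):
    bf = [0] * w
    for x in items:
        for i in positions(x, w, k):
            bf[i] = 1
    return bf


def variant_bloom(items, w, k, rng, star=False):
    """VBF_A: 1 at hashed positions, random values in [2, P-1] (never 0 or 1) elsewhere.
    VBF*_A (star=True): 0 at hashed positions, random nonzero values elsewhere."""
    hit = set().union(*(positions(x, w, k) for x in items))
    mark = 0 if star else 1
    return [mark if i in hit else rng.randrange(2, P) for i in range(w)]


def product_test(vbf_a, bf_b):
    """Eq. (22): v = product of VBF_A[i] over the 1 components of BF_B; v == 1 => B ⊆ A."""
    v = 1
    for a, b in zip(vbf_a, bf_b):
        if b == 1:
            v = v * a % P
    return v == 1


def sum_test(vbf_star_a, bf_b):
    """Eq. (23): v* = sum of VBF*_A[i] over the 1 components of BF_B; v* == 0 => B ⊆ A."""
    return sum(a for a, b in zip(vbf_star_a, bf_b) if b == 1) % P == 0


if __name__ == "__main__":
    rng = random.Random(7)
    w, k = 64, 4
    A, B = [134, 189, 258, 393], [134, 258]
    bf_b = bloom(B, w, k)
    print(product_test(variant_bloom(A, w, k, rng), bf_b))          # True
    print(sum_test(variant_bloom(A, w, k, rng, star=True), bf_b))   # True
```

When B⊄A, a wrong "subset" answer requires either a Bloom-filter false positive or the selected random components multiplying to 1 (or summing to 0) by chance, which is negligible for a large modulus.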
For illustration, Table 2 gives a small example using the variant Bloom filter VBFA. Suppose that Alice has a set A={134,189,258,393} and Bob has a set B={134,258}. Alice and Bob represent A and B as a variant Bloom filter VBFA and a Bloom filter BFB, respectively, both with length w=14 and hash functions H=(h1,h2,h3). The hash functions hj(x) (j=1,2,3) map any value uniformly to [1,14]; their values for this example are listed in Table 3. Alice sets to 1 the components of VBFA indexed by the hash values of 134, 189, 258, and 393 and fills the remaining, unmapped components with random numbers. Bob sets the components of BFB indexed by the hash values of 134 and 258 to 1 and all other components to 0. BF is the vector of components of VBFA selected by the 1 components of BFB. If the product v of all components of BF is 1, then B⊆A; otherwise, B⊈A. Here all six components of BF are 1, so v=1 and B⊆A.
Table 2: Idea of the subset problem for sets with a large domain.
i     1    2    3    4    5    6    7    8    9    10   11   12   13   14
VBFA  1    1    1    1    r5   1    1    1    1    1    1    1    r13  r14
BFB   1    1    0    1    0    0    1    0    1    0    0    1    0    0
BF    1    1         1              1         1              1
Table 3: Hash table.
x     h1   h2   h3
134   9    2    12
189   8    10   4
258   7    1    4
393   6    11   3
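The example in Tables 2 and 3 can be reproduced directly from the hash values listed in Table 3. Random components are kept symbolic (r5, r13, r14), as in Table 2. The variable names are ours and serve only to make the construction explicit.

```python
# Hash values (h1, h2, h3) from Table 3; positions are 1-based, w = 14.
TABLE3 = {134: (9, 2, 12), 189: (8, 10, 4), 258: (7, 1, 4), 393: (6, 11, 3)}
W = 14

A = [134, 189, 258, 393]   # Alice's set
B = [134, 258]             # Bob's set

ones_a = {pos for x in A for pos in TABLE3[x]}   # components of VBF_A set to 1
ones_b = {pos for x in B for pos in TABLE3[x]}   # components of BF_B set to 1

# VBF_A: 1 at hashed positions, symbolic random value r_i elsewhere
vbf_a = [1 if i in ones_a else f"r{i}" for i in range(1, W + 1)]
bf_b = [1 if i in ones_b else 0 for i in range(1, W + 1)]
# BF: the components of VBF_A selected by the 1 components of BF_B
bf = [c for c, b in zip(vbf_a, bf_b) if b == 1]

print(vbf_a)  # [1, 1, 1, 1, 'r5', 1, 1, 1, 1, 1, 1, 1, 'r13', 'r14']
print(bf_b)   # [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
print(bf)     # [1, 1, 1, 1, 1, 1]
print("B ⊆ A" if all(c == 1 for c in bf) else "B ⊈ A")  # B ⊆ A
```

Because no r_i is selected, the product v is 1 without any randomness entering the computation.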
Because the domain is not exponentially large, these plaintext ideas are not secure enough for strict applications: as with Protocol 4, Bob could test every element of the domain against VBFA. Fortunately, a secure scheme can be obtained with homomorphic encryption. The VBFA method can be implemented with a multiplicatively homomorphic encryption scheme, and the VBFA∗ method with an additively homomorphic one. Because multiplicatively homomorphic encryption schemes are usually more efficient than additively homomorphic ones, we present the protocol in the next subsection with the VBFA method and the ElGamal encryption scheme, which is multiplicatively homomorphic.
4.2. The Protocol
Protocol 5.
Secure subset protocol for sets within a large domain.
Inputs. Alice inputs set A={a1,…,am}, and Bob inputs set B={b1,…,bn}.
Output. Whether or not B⊆A.
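(1) Alice and Bob negotiate the parameters w, k, and H={h1,h2,…,hk} to construct their Bloom filters.
(2) Alice performs the following:
(a) generates her private key sk and the corresponding public key pk;
(b) represents her set A as a variant Bloom filter VBFA and encrypts it component-wise:
$$E(\mathrm{VBF}_A) = \big(E(\mathrm{VBF}_A[1]),\, E(\mathrm{VBF}_A[2]),\, \dots,\, E(\mathrm{VBF}_A[w])\big); \tag{24}$$
(c) sends E(VBFA) and pk to Bob.
(3) Bob computes the following:
    E(v) ← (1, 1)    (the trivial ElGamal encryption of 1)
    for i = 1 to w:
        if BFB[i] = 1: E(v) ← E(v)·E(VBFA[i])    (25)
    return E(v)
Bob then chooses a random rb∈Zp∗ with rb≠1 and evaluates
$$E(V) = E(v)^{r_b} \bmod p = \big((\alpha^{r} \bmod p)^{r_b},\ (v\cdot y^{r} \bmod p)^{r_b}\big) = \big(\alpha^{r r_b} \bmod p,\ v^{r_b}\, y^{r r_b} \bmod p\big) = E(v^{r_b}). \tag{26}$$
He sends E(V) to Alice.
(4) Alice decrypts E(V) and obtains V. If V=1, then B⊆A; otherwise, B⊈A.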
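A minimal end-to-end sketch of Protocol 5 with textbook ElGamal, under stated assumptions. The group is a toy one chosen for readability: the safe prime p = 2039 = 2·1019 + 1, with α = 4 generating the subgroup of quadratic residues of prime order q = 1019. Working in a prime-order subgroup is our choice, since it guarantees $v^{r_b} \neq 1$ whenever v ≠ 1; the paper's experiments use a 1024-bit p, and its exact group setup may differ. The filters reuse the layout of Table 2, and all function names are hypothetical.

```python
import random

# Toy group (assumption; far too small to be secure): safe prime p = 2q + 1, and
# alpha = 4 generates the subgroup of quadratic residues, which has prime order q.
p, q, alpha = 2039, 1019, 4


def keygen(rng):
    x = rng.randrange(1, q)                        # private key sk
    return x, pow(alpha, x, p)                     # (sk, pk = y = alpha^x mod p)


def encrypt(y, m, rng):
    r = rng.randrange(1, q)
    return pow(alpha, r, p), m * pow(y, r, p) % p  # (alpha^r, m * y^r) mod p


def decrypt(x, c):
    c1, c2 = c
    return c2 * pow(pow(c1, x, p), -1, p) % p      # m = c2 / c1^x mod p (Python 3.8+)


def random_component(rng):
    """A random subgroup element other than 1, standing in for r in Z_p^*."""
    return pow(alpha, rng.randrange(1, q), p)


def alice_step(vbf_a, rng):
    """Step (2): generate keys and encrypt VBF_A component-wise, Eq. (24)."""
    sk, pk = keygen(rng)
    return sk, pk, [encrypt(pk, m, rng) for m in vbf_a]


def bob_step(enc_vbf_a, bf_b, rng):
    """Step (3): multiply the selected ciphertexts, Eq. (25), then blind, Eq. (26)."""
    c1, c2 = 1, 1                                  # (1, 1) is a trivial encryption of 1
    for (a1, a2), b in zip(enc_vbf_a, bf_b):
        if b == 1:
            c1, c2 = c1 * a1 % p, c2 * a2 % p
    rb = rng.randrange(2, q)                       # r_b != 0, 1
    return pow(c1, rb, p), pow(c2, rb, p)          # E(v)^{r_b} = E(v^{r_b})


def alice_decide(sk, enc_V):
    """Step (4): B ⊆ A iff the decryption V equals 1."""
    return decrypt(sk, enc_V) == 1


if __name__ == "__main__":
    rng = random.Random(1)
    # VBF_A and BF_B laid out as in Table 2 (w = 14); positions 5, 13, 14 hold r5, r13, r14.
    vbf_a = [1] * 14
    for i in (4, 12, 13):                          # 0-based indices of positions 5, 13, 14
        vbf_a[i] = random_component(rng)
    bf_b = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    sk, pk, enc = alice_step(vbf_a, rng)
    print(alice_decide(sk, bob_step(enc, bf_b, rng)))   # True: B ⊆ A
    bf_b[4] = 1                                    # a Bob whose filter also hits position 5
    print(alice_decide(sk, bob_step(enc, bf_b, rng)))   # False: r5^{r_b} != 1
```

The second call models a Bob whose filter selects the random component r5. Because r5 has prime order q and 2 ≤ r_b < q, the blinded plaintext cannot collapse to 1.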
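The ElGamal encryption scheme is multiplicatively homomorphic: the component-wise product of ciphertexts decrypts to the product of the corresponding plaintexts. Thus,
$$E(\mathrm{VBF}_A[1])^{\mathrm{BF}_B[1]}\cdot E(\mathrm{VBF}_A[2])^{\mathrm{BF}_B[2]}\cdots E(\mathrm{VBF}_A[w])^{\mathrm{BF}_B[w]} = E\Big(\prod_{i:\ \mathrm{BF}_B[i]=1}\mathrm{VBF}_A[i]\Big) = E(v). \tag{27}$$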
In step (3), if v=1, then $v^{r_b}=1$; otherwise, $v^{r_b}\neq 1$. Thus, rb does not change the result. Without the random exponent rb, however, Alice would learn v itself. Since v is a product of random components that she generated, she could try to determine which of them Bob selected and thereby learn information about BFB. The argument is similar to that for Protocol 2, and we omit the details here. To reduce computation, Alice can, as in Protocol 2, encrypt only the 1 components of VBFA.
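To illustrate why the blinding exponent rb matters, suppose Bob returned E(v) unblinded. Alice knows every random component she placed in VBFA, so she can search for the combination whose product equals v and thereby learn which of those positions Bob's filter selected. The sketch below performs that search in plaintext. It is our own illustration: the modulus 2^61 − 1 (a Mersenne prime), the positions, and the function name are hypothetical. The search is exponential in the number of random components, so it only shows that v leaks information; it is not an efficient attack in general.

```python
import random
from itertools import combinations
from math import prod

P = 2**61 - 1  # illustrative prime modulus (assumption); 2^61 - 1 is a Mersenne prime


def leaked_positions(random_slots, v):
    """Alice's search over her own random components {position: value}:
    return every set of positions whose values multiply to v (mod P)."""
    hits = []
    for size in range(1, len(random_slots) + 1):
        for subset in combinations(sorted(random_slots), size):
            if prod(random_slots[i] for i in subset) % P == v:
                hits.append(subset)
    return hits


if __name__ == "__main__":
    rng = random.Random(3)
    slots = {5: rng.randrange(2, P), 13: rng.randrange(2, P), 14: rng.randrange(2, P)}
    v = slots[5] * slots[14] % P        # what an unblinded E(v) would decrypt to
    print(leaked_positions(slots, v))   # includes (5, 14): the positions Bob selected
```

With the blinding in step (3), Alice receives $v^{r_b}$ for an unknown random rb instead of v. She can still tell whether it equals 1, but she can no longer match it against these products.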
Analogously, Alice can represent A as VBFA∗ and encrypt VBFA∗ with an additively homomorphic encryption scheme, such as the Paillier encryption scheme. She sends E(VBFA∗) to Bob. Using the additive homomorphism and his Bloom filter BFB, Bob computes
$$E(v^{*}) = E\Big(\sum_{i:\ \mathrm{BF}_B[i]=1}\mathrm{VBF}_A^{*}[i]\Big) = \prod_{i:\ \mathrm{BF}_B[i]=1} E(\mathrm{VBF}_A^{*}[i]) \bmod N^{2}. \tag{28}$$
He chooses a random number $r_b^{*}\neq 0$ to randomize $v^{*}$ and obtains
$$E(V^{*}) = E(v^{*})^{r_b^{*}} = \big(g^{v^{*}} r^{N} \bmod N^{2}\big)^{r_b^{*}} = g^{r_b^{*} v^{*}}\big(r^{r_b^{*}}\big)^{N} \bmod N^{2} = E(r_b^{*} v^{*}). \tag{29}$$
Bob sends E(V∗) to Alice, who decrypts it. If V∗=0, then B⊆A; otherwise, B⊈A. Thus, they obtain the subset relation.
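The additive variant can be sketched with textbook Paillier encryption under stated assumptions. The primes 1009 and 1013 are toy values for readability only. We use the common simplification g = N + 1, for which μ = λ^{-1} mod N. We also draw $r_b^{*}$ from Z_N^* (coprime to N), which is slightly stronger than the condition $r_b^{*}\neq 0$ in the text and guarantees that a nonzero v∗ stays nonzero after blinding. The filter layout follows Table 2, and all function names are hypothetical.

```python
import random
from math import gcd

# Toy Paillier parameters (assumption; far too small to be secure).
P1, Q1 = 1009, 1013
N = P1 * Q1
N2 = N * N
LAM = (P1 - 1) * (Q1 - 1) // gcd(P1 - 1, Q1 - 1)   # lambda = lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                # with g = N + 1, mu = lambda^{-1} mod N


def random_unit(rng):
    """Random element of Z_N^* (coprime to N)."""
    while True:
        r = rng.randrange(1, N)
        if gcd(r, N) == 1:
            return r


def encrypt(m, rng):
    return pow(N + 1, m, N2) * pow(random_unit(rng), N, N2) % N2   # g^m r^N mod N^2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N      # L(c^lambda mod N^2) * mu mod N


def bob_step(enc_vbf_star, bf_b, rng):
    """Eq. (28): the product of selected ciphertexts encrypts v*; Eq. (29): blind by r_b*."""
    c = 1                                           # 1 = g^0 * 1^N, a trivial encryption of 0
    for ci, b in zip(enc_vbf_star, bf_b):
        if b == 1:
            c = c * ci % N2
    return pow(c, random_unit(rng), N2)             # E(v*)^{r_b*} = E(r_b* v*)


if __name__ == "__main__":
    rng = random.Random(11)
    # VBF*_A in the layout of Table 2: 0 where VBF_A is 1, random nonzero at positions 5, 13, 14.
    vbf_star = [0] * 14
    for i in (4, 12, 13):
        vbf_star[i] = rng.randrange(1, N)
    enc = [encrypt(m, rng) for m in vbf_star]       # Alice -> Bob
    bf_b = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
    print(decrypt(bob_step(enc, bf_b, rng)) == 0)   # True: B ⊆ A
    bf_b[4] = 1
    print(decrypt(bob_step(enc, bf_b, rng)) == 0)   # False
```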
The success probability of Protocol 5 is stated in Theorem 6.
Theorem 6.
Protocol 5 succeeds with probability $P = 1 - 2^{-nk}$.
Proof.
According to [38], the probability that a particular component of a Bloom filter is 1 is 1/2. Because the variant Bloom filter differs from the Bloom filter only in the values of its non-1 components, the same holds for the variant Bloom filter. In Protocol 5, Bob selects the nk components of E(VBFA) indexed by the 1 components of BFB and multiplies them to obtain E(v). E(v) is an encryption of 1 only if all nk selected components encrypt 1, which indicates that B⊆A; otherwise, B⊈A. Thus, the success probability of Protocol 5 is $P = 1 - 2^{-nk}$.
The success probability of Protocol 5 can be increased for important applications. If Alice and Bob choose another set of k different hash functions H′={h1′,h2′,…,hk′} and reexecute Protocol 5 on the same sets A and B, the false-positive probability of the second execution is also $2^{-nk}$. Because the two executions use independent hash functions, and a false positive requires both of them to err, the success probability becomes $P = 1 - 2^{-nk}\cdot 2^{-nk} = 1 - 2^{-2nk}$. More generally, executing Protocol 5 t times on the same sets, with different hash functions each time, succeeds with probability $P = 1 - 2^{-tnk}$.
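The arithmetic of repeated execution can be made concrete with exact fractions. The sketch simply takes the per-run false-positive probability $2^{-nk}$ from Theorem 6 and assumes the t runs err independently. The function name is hypothetical, and the parameters below (n = 1, k = 16) are illustrative.

```python
from fractions import Fraction


def success_probability(n: int, k: int, t: int = 1) -> Fraction:
    """P = 1 - (2^(-nk))^t = 1 - 2^(-tnk) for t executions with fresh hash functions."""
    return 1 - Fraction(1, 2 ** (n * k)) ** t


if __name__ == "__main__":
    for t in (1, 2, 3):
        print(t, 1 - success_probability(n=1, k=16, t=t))
    # 1 1/65536
    # 2 1/4294967296
    # 3 1/281474976710656
```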
Corollary 7.
Protocol 5, the secure subset protocol for sets within a large domain, is private.
Corollary 7 follows directly from Theorem 3, so we omit the proof here.
5. Analysis of the Above Protocols
5.1. Efficiency Analysis
Because the subset protocols of [29, 30] build on foundations similar to those of Kissner's protocol [17], and Kissner's protocol is the more efficient, we compare the computational and communication complexity of Kissner's protocol with those of Protocols 2 and 5.
Computational Complexity. In Protocol 2, Alice needs m encryptions and one decryption. Because the messages to be encrypted are all 1, each ElGamal encryption takes 2 log p modular multiplications, and each ElGamal decryption takes log p modular multiplications. Thus, Alice needs (2m+1) log p modular multiplications. Bob computes E(v) with 2n modular multiplications and E(V) with 2 log p, so his cost is 2 log p + 2n modular multiplications. The total computational overhead of Protocol 2 is therefore (2m+3) log p + 2n modular multiplications (mod p).
During Protocol 5, Alice encrypts her variant Bloom filter with mk encryptions. Because these components are 1, each encryption takes 2 log p modular multiplications. Adding the decryption of E(V), Alice performs (2mk+1) log p modular multiplications. Bob evaluates E(v) with 2nk modular multiplications and computes E(V) from E(v) with 2 log p more, for a total of 2nk + 2 log p modular multiplications. The total computational cost of Protocol 5 is (2mk+3) log p + 2nk modular multiplications (mod p). Because k is a constant, the computational cost is linear in m.
In the protocol proposed by Kissner and Song [17], suppose that Alice has a set A and Bob has a set B with |A|=m, |B|=n, and m≥n. Alice needs m+1 encryptions to encrypt her polynomial p(x), obtaining the encrypted polynomial δ, and one decryption to decrypt the ciphertext β. For every element bj∈B (j=1,…,n), Bob needs m modular exponentiations and m modular multiplications to evaluate δ; over the n elements of B, this takes mn modular exponentiations and mn modular multiplications. In the Paillier encryption scheme, every encryption and decryption requires two modular exponentiations, and every modular exponentiation requires 2 log N modular multiplications. Hence, this protocol takes (2mn+4m+8) log N + mn modular multiplications (mod N²).
Communication Complexity. Communication complexity can be measured either by the number of exchanged bits or by the number of communication rounds; in secure multiparty computation, the number of rounds is the more widely used measure. Protocols 2 and 5 and Kissner's protocol each require three communication rounds.
Table 4 summarizes the comparison. The modular multiplications are computed mod N² for Kissner's protocol and mod p for our protocols. For comparable security, we take log N = log p.
Table 4: Comparison of secure subset protocols.
Protocol                  Computation (modular multiplications)   Communication (rounds)
Kissner's protocol [17]   (2mn+4m+8) log N + mn                   3
Protocol 2                (2m+3) log p + 2n                       3
Protocol 5                (2mk+3) log p + 2nk                     3
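The cost formulas in Table 4 can be evaluated for concrete sizes. The sizes m = 1000 and n = 100 below are hypothetical; k = 16 and 1024-bit moduli follow Section 5.2. The results are operation counts from the analysis above, not measured running times, and the function names are ours.

```python
def cost_kissner(m, n, log_n):
    """(2mn + 4m + 8) log N + mn modular multiplications (mod N^2)."""
    return (2 * m * n + 4 * m + 8) * log_n + m * n


def cost_protocol2(m, n, log_p):
    """(2m + 3) log p + 2n modular multiplications (mod p)."""
    return (2 * m + 3) * log_p + 2 * n


def cost_protocol5(m, n, k, log_p):
    """(2mk + 3) log p + 2nk modular multiplications (mod p)."""
    return (2 * m * k + 3) * log_p + 2 * n * k


if __name__ == "__main__":
    m, n, k, bits = 1000, 100, 16, 1024
    print(cost_kissner(m, n, bits))       # 209004192
    print(cost_protocol2(m, n, bits))     # 2051272
    print(cost_protocol5(m, n, k, bits))  # 32774272
```

The mn log N term dominates Kissner's count, while Protocol 5 pays a constant factor k over Protocol 2 but remains linear in m and n.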
5.2. Performance Evaluation
Building on the efficiency analysis above, we now describe the experimental setting and evaluate the performance of Kissner's protocol and Protocols 2 and 5.
We implemented these protocols in Java and ran the experiments on a machine with a 64-bit Windows 10 operating system, an Intel(R) Core(TM) i3-2100 CPU @ 3.10 GHz, and 4 GB of memory. Both the Paillier modulus N and the ElGamal modulus p were set to 1024 bits.
The experimental results of the subset protocols are shown in Figure 1, where "KissnerP" denotes the protocol proposed by Kissner. Protocols 2 and 5 are based on the ElGamal encryption scheme, whereas Kissner's protocol is based on the Paillier encryption scheme. Because the success probability of Protocol 5 is $1-2^{-nk}$, we set k=16 in our implementation.
Figure 1: Comparison of the implementations of the subset protocols.
Figure 1 shows that Protocols 2 and 5 exhibit linear computational complexity, whereas Kissner's protocol exhibits quadratic complexity. Thus, our protocols are efficient and practical for large inputs. Protocol 5 may, however, yield a false positive with probability $2^{-nk}$. Choosing k sufficiently large, or reexecuting the protocol with different hash functions, makes this probability negligible; for example, with k=16 it is at most $2^{-16n}$.
6. Conclusion
The subset problem is an important building block in secure multiparty computation and has many applications in privacy-preserving computation. In this study, we first presented an efficient subset protocol for sets whose elements are drawn from a known set. For sets whose elements are drawn from a large domain, we further designed an efficient probabilistic subset protocol. Both protocols have computational complexity linear in the size of the larger set. However, our protocols assume semi-honest parties; in future work, we will address these problems in the malicious model.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant nos. 61272435 and 61373020), Fundamental Research Funds for the Central Universities (Grant no. 2016TS061), the National Foundation Fund of China (201706870028), Natural Science Foundation of Inner Mongolia (Grant no. 2017MS0602), and University Scientific Research Project of Inner Mongolia (Grant no. NJZY17164).
References
[1] A. C. Yao, "Protocols for secure computations," in Proceedings of the 23rd Annual Symposium on Foundations of Computer Science, 1982, pp. 160–164.
[2] M. Ben-Or, S. Goldwasser, and A. Wigderson, "Completeness theorems for non-cryptographic fault-tolerant distributed computation," in Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC '88), USA, May 1988, pp. 1–10, doi:10.1145/62212.62213.
[3] C. Bader, D. Hofheinz, T. Jager, E. Kiltz, and Y. Li, "Tightly-secure authenticated key exchange."
[4] A. Boldyreva, A. Palacio, and B. Warinschi, "Secure proxy signature schemes for delegation of signing rights."
[5] Y. Wu, L. Huang, X. Wang, and N. Yu, "An extensible cheat-proofing multi-secret sharing scheme with low computation complexity."
[6] R. Canetti, H. Lin, and O. Paneth, "Public-coin concurrent zero-knowledge in the global hash model."
[7] Y. Lindell and H. Zarosim, "On the feasibility of extending oblivious transfer."
[8] S. Goldwasser, "Multi-party computations: past and present," in Proceedings of the 16th Annual ACM Symposium on Principles of Distributed Computing, New York, NY, USA, ACM Press, 1997, pp. 1–6.
[9] O. Goldreich, S. Micali, and A. Wigderson, "How to play any mental game," in Proceedings of the Nineteenth Annual ACM Conference on Theory of Computing, Piscataway, NJ, USA, 1987, pp. 218–229, doi:10.1145/28395.28420.
[10] O. Goldreich.
[11] Y. Zhang and S. Zhong, "An efficient solution to generalized Yao's millionaires problem."
[12] S. Li, D. Wang, Y. Dai, and P. Luo, "Symmetric cryptographic solution to Yao's millionaires' problem and an evaluation of secure multiparty computations."
[13] S. Li, C. Wu, D. Wang, and Y. Dai, "Secure multiparty computation of solid geometric problems and their applications."
[14] R. Fagin, M. Naor, and P. Winkler, "Comparing information without leaking it."
[15] T. Mitsunaga, Y. Manabe, and T. Okamoto, "Efficient secure auction protocols based on the Boneh-Goh-Nissim encryption."
[16] D. Bogdanov, M. Niitsoo, T. Toft, and J. Willemson, "High-performance secure multi-party computation for data mining applications."
[17] L. Kissner and D. Song, "Privacy-preserving set operations."
[18] M. J. Freedman, C. Hazay, K. Nissim, and B. Pinkas, "Efficient set intersection with simulation-based security."
[19] J. Hong, J. W. Kim, J. Kim, K. Park, and J. H. Cheon, "Constant-round privacy preserving multiset union."
[20] G. Sheng, H. Hou, X. Jiang, and Y. Chen, "A novel association rule mining method of big data for power transformers state parameters based on probabilistic graph model."
[21] J. Camenisch and R. Chaabouni, "Efficient protocols for set membership and range proofs."
[22] J. Camenisch and A. Lysyanskaya, "Dynamic accumulators and application to efficient revocation of anonymous credentials."
[23] R. Cramer, I. Damgård, and B. Schoenmakers, "Proofs of partial knowledge and simplified design of witness hiding protocols."
[24] J. Camenisch and R. Chaabouni, "Efficient protocols for set membership and range proofs."
[25] M. H. Au, P. P. Tsang, W. Susilo, and Y. Mu, "Dynamic universal accumulators for DDH groups and their application to attribute-based anonymous credential systems."
[26] F. Guo, Y. Mu, W. Susilo, and V. Varadharajan, "Membership encryption and its applications."
[27] F. Guo, Y. Mu, and W. Susilo, "Subset membership encryption and its applications to oblivious transfer."
[28] P. Paillier, "Public-key cryptosystems based on composite degree residuosity classes."
[29] Q. Ye, H. Wang, and J. Pieprzyk, "Distributed private matching and set operations."
[30] Y. Sang and H. Shen, "Efficient and secure protocols for privacy-preserving set operations."
[31] M. Blanton and E. Aguiar, "Private and oblivious set and multiset operations."
[32] M. Bellare, V. T. Hoang, and P. Rogaway, "Foundations of garbled circuits," in Proceedings of the 2012 ACM Conference on Computer and Communications Security (CCS '12), USA, October 2012, pp. 784–796, doi:10.1145/2382196.2382279.
[33] T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithms."
[34] J. Feigenbaum, Y. Ishai, T. Malkin, K. Nissim, M. J. Strauss, and R. N. Wright, "Secure multiparty computation of approximations."
[35] C. Dong, L. Chen, and Z. Wen, "When private set intersection meets big data: an efficient and scalable protocol," in Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS '13), 2013, pp. 789–800, doi:10.1145/2508859.2516701.
[36] K. Christensen, A. Roginsky, and M. Jimeno, "A new analysis of the false positive rate of a Bloom filter."
[37] P. Bose, H. Guo, E. Kranakis, A. Maheshwari, P. Morin, J. Morrison, M. Smid, and Y. Tang, "On the false-positive rate of Bloom filters."
[38] A. Broder and M. Mitzenmacher, "Network applications of Bloom filters: a survey."