Efficient Secure Multiparty Subset Computation

The secure subset problem is an important problem in secure multiparty computation, a vital field of cryptography. Most existing protocols for this problem keep the elements of only one set private while leaking the elements of the other set. In other words, they do not solve the secure subset problem completely. The few studies that address the fully private subset problem rely mainly on oblivious polynomial evaluation, which is computationally inefficient. In this study, we first design an efficient secure subset protocol for sets whose elements are drawn from a known set, based on a new encoding method and a homomorphic encryption scheme. If the elements of the sets are taken from a large domain, the existing protocol is inefficient. Using a Bloom filter and a homomorphic encryption scheme, we further present an efficient protocol whose computational complexity is linear in the cardinality of the larger set, which makes it practical for inputs consisting of a large amount of data. However, this second protocol may yield a false positive. The false-positive probability can be rapidly decreased by re-executing the protocol with different hash functions. Furthermore, we present an experimental performance analysis of these protocols.


Introduction
The rapid development of networks provides a great opportunity for multiparty cooperative computation, and it also challenges the privacy of the participants' information. In a complex network environment, parties may not trust each other during computations, and they need to keep their information private. Secure multiparty computation is a key technology for preserving privacy in cooperative computations. Thus, secure multiparty computation attracts increasing attention in the international cryptographic community.
Secure multiparty computation was first introduced by Yao [1] through the millionaires' problem in 1982. The millionaires' problem can be described as follows. Two millionaires, Alice and Bob, want to know who is richer, but neither Alice nor Bob wants to disclose her/his own wealth to the other. This is a secure two-party computation problem. After this, Ben-Or et al. [2] gave the first secure multiparty computation protocol. A secure multiparty computation involves two or more parties who use their own private data to cooperatively compute a function in order to obtain a predetermined output while keeping their input information private. Secure multiparty computation is a general cryptographic protocol. Many cryptographic protocols for cooperative computations that involve two or more parties can be viewed as secure multiparty computation protocols, and these include key exchange protocols [3], digital signature protocols [4], secret sharing protocols [5], zero-knowledge proof protocols [6], and oblivious transfer protocols [7]. Secure multiparty computation is a key technology in network security, and it has been the focus of the international cryptographic community for many years. The Turing Award winner Goldwasser [8] predicted that "the field of multiparty computations is today where public key cryptography was ten years ago, namely, an extremely powerful tool and rich theory whose real-life usage is at this time only beginning but will become in the future an integral part of our computing reality." Goldreich et al. [9,10] thoroughly studied the secure multiparty computation problem and established its theoretical foundation. They proved that secure multiparty computation problems are theoretically solvable and proposed a general solution to them. Because the general solution is inefficient and impractical for special problems, they also noted that, to improve efficiency, special solutions should be developed for special problems. This observation has motivated the study of solutions to various secure multiparty computation problems. The problems studied include millionaires' problems [11,12], secure computational geometry problems [13], comparisons of information without it being leaked [14], private bidding and auction problems [15], and privacy-preserving data mining problems [16]. In addition, there are many other new secure multiparty problems that need to be studied.
Because many problems can be abstracted as set problems, private set operation is a highly important field in secure multiparty computation. These problems include set intersection [18], set union [19], and subsets [17]. The set intersection problem and the set union problem have been widely studied, while there are only a few studies of the subset problem. However, there are a variety of applications for the subset problem.
(1) In data mining, there is an important principle about association rules (the Apriori Principle), which states that if an itemset is frequent, then all of its subsets must also be frequent [20]. Suppose that both Alice and Bob are suppliers of a supermarket $S$. Alice has a large frequent itemset $X$ that is generated with data mining from the transactions of $S$. Bob has an itemset $Y$, and he wants to know whether $Y$ is also a frequent itemset. However, he cannot perform data mining on the transaction data of $S$ (either he cannot obtain the transaction data or he does not have data mining knowledge). Therefore, he resorts to the Apriori Principle, but he does not want to disclose $Y$ to Alice. As expected, Alice also wishes to keep $X$ a secret. In this application, they have to privately determine whether $Y \subseteq X$.
(2) In secret sharing, a secret is divided into $n$ shares, which are privately given to $n$ parties who are called the legal shareholders, and any $t$ or more shareholders can reconstruct the secret. During the reconstruction of the secret, some illegal shareholders may try to take part. To prevent illegal shareholders from taking part in the reconstruction, the authenticity of the participating shareholders must be privately determined. This is where the secure subset protocol comes into play.
It is generally known that the subset problem is a special case of the set intersection problem. However, when applied to the subset problem, existing set intersection protocols can lead to both insecure and inefficient solutions. For the subset problem, we only need to determine whether $Y \subseteq X$. Meanwhile, the intersection protocols have to compute every element $e$ where $e \in X \wedge e \in Y$. Used for the subset problem, this method first discloses the elements that $X$ and $Y$ have in common. Furthermore, the subset problem is a decision problem, and it does not need to compute all the elements of $X \cap Y$. Thus, the set intersection protocols are not suitable for the subset problem.
If there are two sets  and , where || ≥ ||, in most current studies, many private subset operations can be classified into two different cases.First, two parties proved that  ⊆ , leaking the elements of set  [21][22][23].Second, two parties proved that  ⊆  without keeping the privacy of the elements of set  [24][25][26][27].
In addition, Kissner and Song [17] proposed a secure solution to the subset problem based on the Paillier additively homomorphic encryption scheme [28], the representation of the elements of a set as roots of a polynomial, and the mathematical properties of polynomials. In their proposed solution, both sets $X$ and $Y$ can be kept private. The party who has the larger set $X$ encrypts the polynomial $f(x)$ that represents $X$. Note that if $Y \subseteq X$, then $f(y) = 0$ is true for every element $y \in Y$ (or vice versa). That is, $Y \subseteq X \Leftrightarrow \forall y \in Y,\ f(y) = 0$. The party who has the smaller set $Y$ evaluates the encrypted polynomial at each element $y \in Y$ to obtain $|Y|$ ciphertexts, and it multiplies these ciphertexts to obtain a single ciphertext. If this ciphertext is an encryption of 0, then $Y \subseteq X$. However, this protocol takes $(2mn + 4m + 8)\log N + mn$ modular multiplications, where $|X| = m$ and $|Y| = n$ (mod $N^2$; details are presented in Section 5.1). Because this cost depends on the product of $|X|$ and $|Y|$, the protocol is inefficient for the computation of a large quantity of data.
Furthermore, Ye et al. [29] and Sang and Shen [30] separately gave their subset protocols, which are mainly based on the oblivious polynomial evaluations, and which are similar to Kissner's protocol.The subset protocol of [29] was presented in the distributed setting.By using (, ) Shamir's secret sharing scheme, the polynomial constructed based on the larger set  was distributed to multiple servers.The party who had the smaller set interacted with at least  servers to compute the subset problem based on the standard variant of the ElGamal encryption.The overall cost for the computation is (||||), and the communication is (|||| log ) bits.In the subset protocol of [30], Sang utilized a nonmalleable NonInteractive Zero-Knowledge (NIZK) argument, which is based on the Boneh-Goh-Nissim (BGN) cryptosystem to protect it against malicious attacks.Without considering the computational complexity of malicious attacks, the computational complexity of this protocol is (||||) besides the NIZK argument.Meanwhile, our protocols have a linear computational complexity in the cardinality of the large set (||) (details are presented in Section 5.1).
Moreover, Blanton and Aguiar [31] created an efficient subset protocol based on oblivious algorithms, such as oblivious sorting algorithms and oblivious equality algorithms. Unfortunately, this protocol is constructed using the circuit method and has the drawbacks of the circuit method [32].
Shundong et al. [12] described a secure subset protocol that retains the privacy of both sets $X$ and $Y$; it is based on symmetric cryptography and has high efficiency. However, the smaller set $Y$ can only have one element in the protocol. If set $Y$ has more than one element, the parties have to execute the protocol $|Y|$ times and choose new pseudorandom sequences on each occasion, which is tedious.
In this study, we propose two secure subset protocols for different situations using homomorphic encryption schemes, which can be multiplicative or additive. Because a multiplicatively homomorphic encryption scheme is more efficient than an additive one, we choose a multiplicative one to build our protocols. To the best of our knowledge, encryption schemes can currently encrypt only integer messages. In addition, in many commonly occurring settings, the sets to be computed are drawn from a known set whose elements are not integers. For this case, we design an efficient protocol that is based on a new encoding method and a homomorphic encryption scheme. The computational complexity of this protocol is linear in the size of the larger set. For the situation in which the sets are taken from a large domain, we further present an efficient protocol based on Bloom filters and a homomorphic encryption scheme, which improves efficiency without compromising accuracy much. Furthermore, we show that, by using the Bloom filter, we can solve the subset problem for sets that are taken from an exponentially large domain.
The rest of this paper is organized as follows. In Section 2, we introduce some preliminaries. In Section 3, we propose an efficient secure subset protocol for sets whose elements are drawn from a known set using a new encoding method and homomorphic encryption schemes. In Section 4, we show the secure subset protocol for sets within a large domain based on the Bloom filter and homomorphic encryption schemes, while in Section 5, we present an analysis of the secure subset protocols and the experimental implementation. Finally, in Section 6, we conclude this paper.

Preliminaries
2.1. Secure Subset Problem. Alice has a set $X = \{x_1, \ldots, x_m\}$, and Bob has a set $Y = \{y_1, \ldots, y_n\}$. Alice and Bob want to determine whether $Y$ is a subset of $X$ without disclosing any information about the elements of their sets to each other. This can be abstracted as a secure subset problem.

Homomorphic Encryption Scheme.
A homomorphic encryption scheme is an encryption scheme with special properties that make it a building block of many secure multiparty computation protocols. A conventional public key encryption scheme $E$ consists of three algorithms: $\mathrm{KeyGen}_E$, $\mathrm{Encrypt}_E$, and $\mathrm{Decrypt}_E$.
(i) $\mathrm{KeyGen}_E$. $\mathrm{KeyGen}_E$ takes a security parameter as the input, and it outputs a secret key sk and the corresponding public key pk, together with the definition of the plaintext space $P$ and the ciphertext space $C$.
(ii) $\mathrm{Encrypt}_E$. Taking pk and a plaintext $m \in P$ as inputs, $\mathrm{Encrypt}_E$ outputs a ciphertext $c \in C$.
(iii) $\mathrm{Decrypt}_E$. Taking a ciphertext $c \in C$ and the secret key sk as inputs, $\mathrm{Decrypt}_E$ outputs the plaintext $m \in P$.
Our construction uses semantically secure public key encryption schemes that preserve the group homomorphism under some computational complexity assumptions. This property is provided by the Paillier encryption scheme [28] and the ElGamal encryption scheme [33] under the Composite Residuosity Class (CRC) assumption and the Computational Diffie-Hellman (CDH) assumption, respectively. Details are presented as follows.
Paillier Encryption Scheme
(i) KeyGen. On input of a security parameter, this algorithm generates two large primes $p$, $q$, sets $N = pq$ and $\lambda = \mathrm{lcm}(p - 1, q - 1)$, and computes $g$ such that $\gcd(L(g^{\lambda} \bmod N^2), N) = 1$, where $L(u)$ is defined as $L(u) = (u - 1)/N$. $Z^{*}_{N^2}$ is the ciphertext space, and $Z^{*}_{N}$ is the plaintext space. The public key is $(N, g)$, and the private key is $\lambda$.
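For reference, the remaining Paillier algorithms can be written as follows. This is a sketch of the standard textbook formulation of [28], not text recovered from this paper; it assumes $N = pq$, generator $g$, private key $\lambda$, and $L(u) = (u - 1)/N$, and the symbols $E$, $r$, and $t$ are notational assumptions.

```latex
\begin{align*}
\text{Encrypt:}\quad  & E(m) = g^{m}\, r^{N} \bmod N^{2}, \qquad r \in Z_{N}^{*} \text{ chosen at random},\\
\text{Decrypt:}\quad  & m = \frac{L\!\left(E(m)^{\lambda} \bmod N^{2}\right)}{L\!\left(g^{\lambda} \bmod N^{2}\right)} \bmod N,\\
\text{Evaluate:}\quad & E(m_1)\cdot E(m_2) \bmod N^{2} = E(m_1 + m_2 \bmod N),\\
                      & E(m)^{t} \bmod N^{2} = E(t\, m \bmod N).
\end{align*}
```

The last two lines are the additive homomorphism: multiplying ciphertexts adds plaintexts, and raising a ciphertext to a constant multiplies the plaintext by that constant.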

Security and Communication Networks
ElGamal Encryption Scheme
(i) KeyGen. On input of a security parameter, the algorithm generates a large prime $p$ and a generator $g$, and it randomly chooses a number $x$ as the private key. The public key is $h = g^{x} \bmod p$.
(ii) Encrypt. Taking $m$ and $h$ as inputs, the algorithm selects a random number $r$ and computes $E(m) = (c_1, c_2) = (g^{r} \bmod p,\ m h^{r} \bmod p)$.
(iii) Decrypt. This algorithm takes $E(m)$ and $x$ as inputs and computes $m = c_2 (c_1^{x})^{-1} \bmod p$.
(iv) Evaluate. Given ciphertexts $E(m_1)$, $E(m_2)$, and $E(m)$ and a constant $t$, we can compute that $E(m_1) \cdot E(m_2) = E(m_1 m_2)$ and $E(m)^{t} = E(m^{t})$.
These two schemes are semantically secure under the CRC assumption or the CDH assumption. That is, given two messages $m_0$ and $m_1$, as well as a ciphertext $E(m_b)$ ($b \in \{0, 1\}$) encrypted by these encryption schemes, no probabilistic polynomial-time algorithm can determine whether the ciphertext $E(m_b)$ is a ciphertext of $m_1$ or $m_0$ with nonnegligible advantage.
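The multiplicative homomorphism of ElGamal can be checked with a small sketch. It assumes the textbook ElGamal scheme over a toy 11-bit safe prime; the names `keygen`, `encrypt`, `decrypt`, `mul`, and `power` and the parameters `P`, `Q`, `G` are illustrative and are not taken from the paper's implementation. The group is far too small for any real use.

```python
import secrets

# Toy parameters (assumption, for illustration only): P = 2*Q + 1 is a safe prime
# and G = 4 generates the subgroup of prime order Q.
P = 2039
Q = (P - 1) // 2
G = 4

def keygen():
    """Return (private key x, public key h = G^x mod P)."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def encrypt(h, m):
    """E(m) = (G^r mod P, m * h^r mod P) for a fresh random r."""
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P

def decrypt(x, c):
    """m = c2 * (c1^x)^(-1) mod P; the inverse is taken via Fermat's little theorem."""
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - x, P)) % P

def mul(ca, cb):
    """Componentwise product of ciphertexts: E(m1) * E(m2) = E(m1 * m2)."""
    return (ca[0] * cb[0]) % P, (ca[1] * cb[1]) % P

def power(c, t):
    """Componentwise exponentiation: E(m)^t = E(m^t)."""
    return pow(c[0], t, P), pow(c[1], t, P)
```

Decryption is correct for any plaintext in $[1, P - 1]$; semantic security additionally requires plaintexts to lie in the order-$Q$ subgroup, which this sketch does not enforce.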

Security of Secure Multiparty Computation.
We assume that all parties are semihonest. In general, a semihonest party follows the prescribed protocol correctly, except that it keeps a record of all its intermediate computations and may try to derive the other party's private inputs from the record. Goldreich [10] also designed a compiler that can force each party to either behave in a semihonest manner or be detected. Given a protocol $\pi$, which privately computes a function $f$ in the semihonest model, this compiler can produce a new protocol $\Pi$, which privately computes $f$ in the malicious model. This work demonstrates that the study of the semihonest model is very important. Therefore, our work focuses on solutions to the subset problem in the semihonest model.
Different methods are used to prove security in different cryptographic fields. The proof method that reduces security to a hardness assumption in the standard model or the random oracle model is suitable for verifying encryption schemes and signature schemes. The simulation paradigm is widely accepted and is used to prove the security of secure multiparty computation protocols. The basic idea behind the simulation paradigm is to compare a real secure multiparty computation protocol with an ideal one. The real protocol is considered secure if it does not leak more information than the ideal one. The ideal secure multiparty computation protocol can be described as follows.
Assume that there is an absolutely trusted third party, denoted by Trent, who will neither lie nor leak any information that should not be revealed. Alice has a number $x_1$, Bob has a number $x_2$, and they want to securely compute a function $f(x_1, x_2)$. They can proceed as follows: (a) Alice and Bob, respectively, send $x_1$ and $x_2$ to Trent; (b) Trent computes $f(x_1, x_2)$; (c) Trent sends the result to Alice and Bob.
Because most secure multiparty computation protocols are constructed using public key encryption schemes, the security proof for a secure multiparty computation protocol reduces the security of the protocol to the security of the public key encryption scheme on which the protocol is based. That is, to prove that a multiparty computation protocol is secure, we must prove that the real protocol does not leak more information than the ideal protocol under the assumption that the public key encryption scheme used in the real protocol is secure. In other words, if the information that a party obtains in a real secure multiparty computation protocol can be simulated by a simulator that only obtains the result and that party's input, such that the two sets of information are computationally indistinguishable, then the real protocol is secure.
Intuitively, a protocol that computes $f$ is secure if whatever a set of semihonest parties can obtain after participating in the protocol could be obtained from the inputs and outputs of these same parties. In the simulation paradigm, this means that the VIEW (this will be discussed later) of a set of semihonest parties during a protocol execution can be simulated from their inputs and outputs.
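Suppose that there are two parties Alice and Bob who have sets $X$ and $Y$, respectively. They want to privately compute $f(X, Y)$, which is a polynomial-time function. Further, suppose that $\Pi$ is a protocol computing the function $f(X, Y)$. The VIEW of Alice, who has the set $X$, during the execution of $\Pi$ on the input $(X, Y)$, is denoted by $\mathrm{VIEW}^{\Pi}_{1}(X, Y) = (X, r^{1}, m^{1}_{1}, \ldots, m^{1}_{t})$, where $r^{1}$ is the outcome of Alice's internal coin tosses and $m^{1}_{i}$ is the $i$th message that she received. The VIEW of Bob, $\mathrm{VIEW}^{\Pi}_{2}(X, Y)$, is defined analogously. We say that $\Pi$ privately computes $f$ if there exist probabilistic polynomial-time simulators $S_1$ and $S_2$ such that
$$\{S_1(X, f_1(X, Y))\}_{X,Y} \stackrel{c}{\equiv} \{\mathrm{VIEW}^{\Pi}_{1}(X, Y)\}_{X,Y}, \qquad \{S_2(Y, f_2(X, Y))\}_{X,Y} \stackrel{c}{\equiv} \{\mathrm{VIEW}^{\Pi}_{2}(X, Y)\}_{X,Y}, \tag{12}$$
where $f_1$ and $f_2$ denote the outputs of Alice and Bob, respectively, and $\stackrel{c}{\equiv}$ denotes computational indistinguishability.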

Protocol for Sets Whose Elements Are Drawn from a Known Set
Suppose that Alice has a set $X$ and Bob has a set $Y$. A straightforward way to compute the subset problem between $X$ and $Y$, without worrying about privacy, is as follows: Alice sends her set $X$ to Bob; Bob computes whether $Y \subseteq X$ and then tells the result to Alice. Thus, Alice and Bob obtain the subset relation between $X$ and $Y$.
By the definition of subset, if $Y \subseteq X$, then $y \in Y \Rightarrow y \in X$ for any element $y$. Thus, we can reduce the subset problem to checking whether all the elements of set $Y$ are in set $X$. If all the elements of $Y$ are elements of $X$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. Suppose Alice and Bob have sets $X = \{x_1, \ldots, x_m\}$ and $Y = \{y_1, \ldots, y_n\}$ ($X, Y \subseteq U = \{u_1, \ldots, u_d\}$), respectively. They want to determine whether or not $Y \subseteq X$ without disclosing either $X$ or $Y$.

Foundations of This Protocol.
Before we describe the idea of our protocol, we first present the building blocks, a 1-$r$ encoding method and a 1-0 encoding method, which are based on the mathematical definition of the characteristic vector.

From a high-level perspective, the 1-$r$ encoding (1-0 encoding) encodes each $u_i \in X$ ($u_i \in Y$) as a one component and each $u_i \notin X$ ($u_i \notin Y$) as a random (zero) component. Alice and Bob can use the above encoding methods to compute the subset problem.
Alice encodes set $X$ as a 1-$r$ vector $V_X^{r}$, and Bob encodes set $Y$ as a 1-0 vector $V_Y^{10}$. Alice sends her vector $V_X^{r}$ to Bob. Bob chooses the components of $V_X^{r}$ corresponding to the one components of $V_Y^{10}$ and computes their product $V$. If $V = 1$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. This is the principle of deciding the subset relation between sets $X$ and $Y$. For simplicity, we give a simple example in Table 1, which also lists the vector that is chosen from $V_X^{r}$ according to the one components of $V_Y^{10}$. Alice and Bob can also compute the subset relation using another method. Alice and Bob encode $X$ as a 0-$r$ vector $V_X^{r*}$ and $Y$ as a 1-0 vector $V_Y^{10}$ based on a 0-$r$ encoding and the 1-0 encoding, respectively. The 0-$r$ encoding is similar to the 1-$r$ encoding and requires only that we change the one components to zero components and choose each random component $r$ from the plaintext space with $r \neq 0$. Bob computes $V^{*} = \sum_{i} v_i$, the sum of the chosen components $v_i$. If $V^{*} = 0$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. Although the above approaches easily determine whether $Y \subseteq X$, they are not secure. We use semantically secure and homomorphic encryption schemes to privately compute $V$ or $V^{*}$ in order to privately determine whether $Y \subseteq X$.
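The unencrypted version of this check can be sketched as follows. The function names, the example universe, and the modulus `M` (the Mersenne prime $2^{61} - 1$) are assumptions made for illustration; the paper's own encoding definition and Table 1 are not reproduced here.

```python
import secrets

M = 2**61 - 1  # assumed prime modulus for the random components

def encode_1r(subset, universe, modulus=M):
    """1-r encoding: 1 for each universe element in the set, a random value != 1 otherwise."""
    return [1 if u in subset else secrets.randbelow(modulus - 2) + 2 for u in universe]

def encode_10(subset, universe):
    """1-0 encoding: the ordinary characteristic vector of the set."""
    return [1 if u in subset else 0 for u in universe]

def subset_by_product(vx_r, vy_10, modulus=M):
    """Multiply the components of vx_r selected by the ones of vy_10; product 1 means subset."""
    v = 1
    for a, b in zip(vx_r, vy_10):
        if b == 1:
            v = (v * a) % modulus
    return v == 1

universe = ["bread", "milk", "rice", "salt", "tea"]
alice = {"bread", "milk", "rice"}
print(subset_by_product(encode_1r(alice, universe), encode_10({"bread", "rice"}, universe)))
```

When several selected components are random, their product could equal 1 by chance, but with a modulus of this size that happens only with negligible probability.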

Protocol.
We give a solution to the secure subset problem in Protocol 2 based on the above foundations. Because a multiplicatively homomorphic encryption scheme is more efficient than an additive one, we choose a multiplicative one and encode $X$ as $V_X^{r}$ to present this protocol. The ElGamal encryption scheme is semantically secure if the CDH assumption holds, which makes the ciphertexts of the same plaintext indistinguishable. Therefore, we can have different ciphertexts of plaintext 1. In addition, the ElGamal encryption scheme is multiplicatively homomorphic, and we can therefore obtain $E(m_1 \cdot m_2)$ using ciphertexts $E(m_1)$ and $E(m_2)$. Furthermore, $E(m)^{1} = E(m)$ and $E(m)^{0} = 1$. Thus, we present Protocol 2 based on the ElGamal encryption scheme. For ease of explanation, we define $P(X, Y)$ as follows: if $Y \subseteq X$, $P(X, Y) = 1$; otherwise, $P(X, Y) = 0$.
Protocol 2. Secure subset protocol for sets whose elements are drawn from a known set.
(1) Alice generates both her private key sk and its corresponding public key pk.She publishes pk while keeping sk private.
Because the ciphertexts of random numbers are also random, Alice needs only to encrypt the one components of $V_X^{r}$ in step (2). That is, Alice needs only to encrypt her own $m$ elements. This reduces the computational complexity.
In step (3), $r_b$ is a random number chosen by Bob. If $V = 1$, then $V^{r_b} = 1$; otherwise, $V^{r_b} \neq 1$. Thus, the random number $r_b$ does not change the result. In this protocol, all the parties are semihonest and may try to derive information from the message sequences that they obtain. The random number $r_b$ randomizes the computation of Bob. In step (3), if Bob does not insert the random number $r_b$, Alice may deduce useful information from $E(V)$. If $|Y| = n$ is small, Alice can identify the ciphertexts that Bob used to compute the product ciphertext $E(V)$. Thus, Alice obtains the 1-0 vector $V_Y^{10}$ and, furthermore, the set $Y$. Even if $|Y| = n$ is so large that Alice cannot derive Bob's set from $E(V)$, when $Y$ is not a subset of $X$, Alice may still derive which elements are not in set $X$ based on $V$ and $V_X^{r}$.
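The message flow discussed above can be sketched end to end. Steps (2) to (4) in this sketch are reconstructed from the surrounding discussion (Alice encrypts her 1-$r$ vector, Bob multiplies the selected ciphertexts and raises the product to a random power, Alice decrypts and compares with 1); the toy ElGamal group and all function names are assumptions, not the paper's implementation.

```python
import secrets

# Toy ElGamal group (assumption): safe prime P = 2*Q + 1, G of prime order Q.
P, Q, G = 2039, 1019, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def encrypt(h, m):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P

def decrypt(x, c):
    return (c[1] * pow(c[0], P - 1 - x, P)) % P

def alice_encrypt_vector(h, X, universe):
    """Step (2), assumed: encrypt 1 for u in X and a random group element != 1 otherwise."""
    def component(u):
        return 1 if u in X else pow(G, secrets.randbelow(Q - 1) + 1, P)
    return [encrypt(h, component(u)) for u in universe]

def bob_combine(enc_vec, Y, universe):
    """Step (3), assumed: multiply the ciphertexts selected by Y, then raise to random r_b."""
    c1, c2 = 1, 1
    for c, u in zip(enc_vec, universe):
        if u in Y:
            c1, c2 = (c1 * c[0]) % P, (c2 * c[1]) % P
    r_b = secrets.randbelow(Q - 1) + 1
    return pow(c1, r_b, P), pow(c2, r_b, P)

def alice_decide(x, c):
    """Step (4), assumed: the plaintext is 1 exactly when V = 1."""
    return decrypt(x, c) == 1

universe = ["bread", "milk", "rice", "salt", "tea"]
x, h = keygen()
enc = alice_encrypt_vector(h, {"bread", "milk", "rice"}, universe)
print(alice_decide(x, bob_combine(enc, {"bread", "rice"}, universe)))
```

For simplicity the sketch encrypts every component, whereas the text notes that Alice can replace the ciphertexts of random components with random values. In a group this small, a product of several random components equals 1 with probability about $1/Q$, so realistic parameters are needed before the answer can be trusted.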

Security of Protocol 2.
In this paper, we prove the security of Protocol 2 using the simulation paradigm.
Theorem 3. Protocol 2, denoted by $\Pi$, for computing the subset problem is private.
Proof. To prove this theorem, we show that there exist two simulators $S_1$ and $S_2$ such that (12) holds. We first show the construction of $S_1$.

Protocol for Sets with Large Domains
In Protocol 2, we present a subset protocol for sets whose elements are drawn from a known set. Because the communication complexity is linear in $|U|$, this is awkward if $|U|$ is large. Therefore, we construct a secure subset protocol for sets taken from a large domain. Suppose there are two sets $X$ and $Y$ ($|X| \geq |Y|$). This protocol is efficient, with a computational complexity that is linear in $|X|$, whereas the computational complexity of Kissner's protocol [17] is linear in the product of $|X|$ and $|Y|$. If $|X| = |Y| = n$, the protocol of Kissner has a quadratic $O(n^2)$ computational complexity. Thus, that protocol cannot generally be considered practical for inputs consisting of a large amount of data [34]. However, the protocol that we construct reduces the computational cost at the expense of accuracy. That is, our protocol has a small false-positive probability, and in Section 4.2, we show how to decrease it.

Foundations of This Protocol.
The following secure subset protocol is based on the Bloom filter [35] and a variant Bloom filter. We present the building blocks of the protocol before giving the idea behind the protocol.
Bloom Filter. A Bloom filter $\mathrm{BF}_X = (\ell, m, k, H)$ is a vector of $\ell$ bits that can represent a set $X$ of at most $m$ elements. There are $k$ independent uniform hash functions $H = (h_1, \ldots, h_k)$, and each $h_i$ ($i = 1, \ldots, k$) maps the elements of $X$ to $[1, \ell]$ uniformly. In this paper, we use $\mathrm{BF}_X[j]$ to denote the bit at index $j$ in $\mathrm{BF}_X$. Initially, all bits in the array are set to 0. To insert an element $x \in X$ into the filter, we compute each hash function $h_i(x)$ and set $\mathrm{BF}_X[h_i(x)] = 1$. After all the elements of $X$ are inserted in $\mathrm{BF}_X$, the Bloom filter $\mathrm{BF}_X$ represents set $X$.
To check if an item $y$ is in $X$, we check all components of $\mathrm{BF}_X$ to which $y$ is hashed. If any of these bits is 0, then $y \notin X$; otherwise, $y \in X$ with high probability. While a Bloom filter may yield a false positive, it never yields a false negative. That is, if $x \in X$, it must be that $\mathrm{BF}_X[h_i(x)] = 1$ ($i = 1, \ldots, k$); if $x \notin X$, it may still be that $\mathrm{BF}_X[h_i(x)] = 1$ for all $i$. The probability of a false positive is $\left(1 - (1 - 1/\ell)^{km}\right)^{k} \approx \left(1 - e^{-km/\ell}\right)^{k}$. According to [36], it is about $2^{-k}$. We can choose $k$ based on our practical applications. If the size of $\mathrm{BF}_X$ is $\ell = mk \log_2 e$, the probability that a specific component is one is 1/2 [37]. Suppose Alice has a set $X$, and Bob has a set $Y$. Sets $X$ and $Y$ are taken from an exponentially large domain, and they can compute whether $Y \subseteq X$ using the Bloom filter as follows.
Protocol 4. Secure subset protocol for sets taken from an exponentially large domain.

Inputs. Alice and Bob input sets $X$ and $Y$, respectively.
Output. Whether $Y \subseteq X$.
(1) Alice and Bob negotiate the parameters $\ell$, $k$, $H$ for their Bloom filters.
(2) Alice and Bob represent sets $X$ and $Y$ as Bloom filters $\mathrm{BF}_X$ and $\mathrm{BF}_Y$, respectively.
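The Bloom filter operations used here can be sketched as follows. The salted SHA-256 construction of the $k$ hash functions, the zero-based indexing, and the final comparison step (every 1 bit of Bob's filter must also be set in Alice's filter) are assumptions made for illustration; they are one natural way to complete the protocol, not the paper's stated steps.

```python
import hashlib

def positions(item, k, length):
    """k positions in [0, length) derived from salted SHA-256 digests (an assumption)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest(), "big") % length
        for i in range(k)
    ]

def bloom(items, k, length):
    """Insert every item by setting its k hashed positions to 1."""
    bits = [0] * length
    for item in items:
        for p in positions(item, k, length):
            bits[p] = 1
    return bits

def maybe_contains(bits, item, k):
    """False means definitely absent; True means present or a false positive."""
    return all(bits[p] for p in positions(item, k, len(bits)))

def maybe_subset(bf_x, bf_y):
    """Assumed comparison step: every 1 bit of bf_y must also be 1 in bf_x."""
    return all(a >= b for a, b in zip(bf_x, bf_y))

bf_x = bloom([134, 189, 258, 393], k=3, length=64)
bf_y = bloom([134, 258], k=3, length=64)
print(maybe_subset(bf_x, bf_y))
```

A True answer may be a false positive, while a False answer is always correct, which mirrors the one-sided error of the underlying filter.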
The above protocol has a low computational complexity because it relies only on hash functions. However, when the sets of the parties are not taken from an exponentially large domain, Bob can obtain the set $X$ from $\mathrm{BF}_X$ using an exhaustive search. Thus, we design a solution for sets taken from a large, but not exponentially large, domain based on the Bloom filter and a variant of the Bloom filter. Before presenting the principles behind this solution, we describe the variant of the Bloom filter.
Variant Bloom Filter. The variant Bloom filter is similar to the Bloom filter with a small difference. In the Bloom filter, each component is either a 0 or a 1 bit, while each component of the variant Bloom filter is either $r$ ($r \in Z^{*}_{p} \wedge r \neq 1$) or 1. Similarly, to insert an element $x \in X$ into a variant Bloom filter $\mathrm{VBF}_X = (\ell, m, k, H)$ of a set $X$, we compute $h_i(x)$ ($i = 1, \ldots, k$) and set $\mathrm{VBF}_X[h_i(x)] = 1$. After all the elements of $X$ are inserted, we let the remaining components of $\mathrm{VBF}_X$ be random numbers other than 1. Because the variant Bloom filter merely changes the 0 components of the Bloom filter to random numbers, its false-positive probability is the same as that of the Bloom filter. Suppose that Alice and Bob have sets $X$ and $Y$, respectively, which are taken from a large domain, and they want to decide whether $Y \subseteq X$. Alice represents set $X$ as a variant Bloom filter $\mathrm{VBF}_X$ and sends it to Bob. Bob represents set $Y$ as a Bloom filter $\mathrm{BF}_Y$. He computes $V = \prod_{i=1}^{\ell} \mathrm{VBF}_X[i]^{\mathrm{BF}_Y[i]}$. If $V = 1$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. This is the idea behind deciding whether $Y \subseteq X$. Alternatively, Alice and Bob can solve the subset problem by representing set $X$ as another variant Bloom filter $\mathrm{VBF}^{*}_X$, which is similar to $\mathrm{VBF}_X$ except that the 1 components of $\mathrm{VBF}_X$ are replaced with 0 components. Bob computes $V^{*} = \sum_{i=1}^{\ell} \mathrm{BF}_Y[i] \cdot \mathrm{VBF}^{*}_X[i]$ instead of $V$. If $V^{*} = 0$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. For simplicity, we give a simple example in Table 2 with the variant Bloom filter $\mathrm{VBF}_X$. Suppose that Alice has a set $X = \{134, 189, 258, 393\}$ and Bob has a set $Y = \{134, 258\}$. Alice and Bob represent $X$ and $Y$ as a variant Bloom filter $\mathrm{VBF}_X$ and a Bloom filter $\mathrm{BF}_Y$, respectively. Let the length $\ell = 14$ and the hash functions $H = (h_1, h_2, h_3)$ for both $\mathrm{VBF}_X$ and $\mathrm{BF}_Y$. The hash functions $h_i$ ($i = 1, 2, 3$) map any value to $[1, 14]$ uniformly, as in Table 3. Alice sets the components of $\mathrm{VBF}_X$ corresponding to 134, 189, 258, and 393 to 1 and sets the other, unmapped components to random numbers. Bob sets the components of $\mathrm{BF}_Y$ corresponding to 134 and 258 to 1 and the other components to 0.
In Table 2, the chosen vector consists of the components of $\mathrm{VBF}_X$ selected according to the 1 components of $\mathrm{BF}_Y$. Thus, if the product $V$ of all the components of this chosen vector is 1, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. Because the domain of the sets is not exponentially large, these ideas are insufficiently secure for strict applications. Fortunately, we can obtain a secure scheme using homomorphic encryption. The $\mathrm{VBF}_X$ method can be implemented with a multiplicatively homomorphic encryption scheme, and the $\mathrm{VBF}^{*}_X$ method can be implemented with an additively homomorphic encryption scheme. Because multiplicatively homomorphic encryption schemes are usually more efficient than additive ones, we present our protocol in the next subsection with the $\mathrm{VBF}_X$ method and the ElGamal encryption scheme, which has a multiplicative homomorphism.

The Protocol
Protocol 5. Secure subset protocol for sets within a large domain.
Output. Whether or not $Y \subseteq X$.
The ElGamal encryption scheme is multiplicatively homomorphic, and multiplying the ciphertexts is the same as multiplying the corresponding plaintexts. Thus,
$$E(\mathrm{VBF}_X[1])^{\mathrm{BF}_Y[1]} \cdot E(\mathrm{VBF}_X[2])^{\mathrm{BF}_Y[2]} \cdots E(\mathrm{VBF}_X[\ell])^{\mathrm{BF}_Y[\ell]} = E\Big(\prod_{i=1}^{\ell} \mathrm{VBF}_X[i]^{\mathrm{BF}_Y[i]}\Big) = E(V).$$
In step (3), if $V = 1$, then $V^{r_b} = 1$; otherwise, $V^{r_b} \neq 1$. Thus, the random number $r_b$ does not change the result. However, if Bob does not insert the random number $r_b$, Alice may obtain more information than she should. This is similar to Protocol 2 (we omit the details in this paper). To lower the computational complexity, Alice can encrypt only the 1 components of $\mathrm{VBF}_X$, as in Protocol 2.
Analogously, Alice can also represent $X$ as $\mathrm{VBF}^{*}_X$ and encrypt $\mathrm{VBF}^{*}_X$ with an additively homomorphic encryption scheme, such as the Paillier encryption scheme. She sends $E(\mathrm{VBF}^{*}_X)$ to Bob. Bob computes $E(V^{*}) = \prod_{i=1}^{\ell} E(\mathrm{VBF}^{*}_X[i])^{\mathrm{BF}_Y[i]}$ based on the additive homomorphism and his Bloom filter $\mathrm{BF}_Y$. He chooses a random number $r_b^{*}$ ($r_b^{*} \neq 0$) to randomize $V^{*}$ and obtains $E(V^{*})^{r_b^{*}} = E(r_b^{*} V^{*})$. Bob sends $E(r_b^{*} V^{*})$ to Alice. Alice decrypts $E(r_b^{*} V^{*})$. If $r_b^{*} V^{*} = 0$, then $Y \subseteq X$; otherwise, $Y \not\subseteq X$. Thus, they obtain the subset relation.
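The multiplicative variant of Protocol 5 can be sketched as follows. The step structure is reconstructed from the discussion above (Alice encrypts her variant Bloom filter componentwise, Bob multiplies the ciphertexts at the positions set in his Bloom filter and raises the product to a random power, Alice decrypts and compares with 1). The toy ElGamal group, the salted SHA-256 hash positions, and every function name are assumptions for illustration only.

```python
import hashlib
import secrets

# Toy ElGamal group (assumption): safe prime P = 2*Q + 1, G of prime order Q.
P, Q, G = 2039, 1019, 4

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def encrypt(h, m):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), (m * pow(h, r, P)) % P

def decrypt(x, c):
    return (c[1] * pow(c[0], P - 1 - x, P)) % P

def positions(item, k, length):
    """k zero-based positions from salted SHA-256 digests (an assumption)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest(), "big") % length
        for i in range(k)
    ]

def alice_encrypted_vbf(h, X, k, length):
    """Encrypt 1 at positions hit by X and a random group element != 1 elsewhere."""
    ones = {p for x in X for p in positions(x, k, length)}
    return [
        encrypt(h, 1 if j in ones else pow(G, secrets.randbelow(Q - 1) + 1, P))
        for j in range(length)
    ]

def bob_combine(enc_vbf, Y, k, length):
    """Multiply ciphertexts at the 1 positions of Bob's Bloom filter, then randomize."""
    c1, c2 = 1, 1
    for j in {p for y in Y for p in positions(y, k, length)}:
        c1, c2 = (c1 * enc_vbf[j][0]) % P, (c2 * enc_vbf[j][1]) % P
    r_b = secrets.randbelow(Q - 1) + 1
    return pow(c1, r_b, P), pow(c2, r_b, P)

def alice_decide(x, c):
    """Plaintext 1 means Y is (probably) a subset of X."""
    return decrypt(x, c) == 1

x, h = keygen()
enc = alice_encrypted_vbf(h, [134, 189, 258, 393], k=3, length=64)
print(alice_decide(x, bob_combine(enc, [134, 258], k=3, length=64)))
```

A True result is subject to the Bloom filter's false-positive probability and, in a group this small, also to the chance that random components multiply to 1; a subset is never reported as a non-subset.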
The success probability of Protocol 5 is stated in Theorem 6.
Theorem 6. Protocol 5 succeeds with probability $1 - 2^{-k}$.
Proof. According to [38], the probability that a particular component in the Bloom filter is 1 is 1/2. Because the variant Bloom filter is similar to the Bloom filter, the probability is also 1/2. In Protocol 5, Bob chooses the components of $E(\mathrm{VBF}_X)$ indicated by the 1 components of $\mathrm{BF}_Y$ and computes their product $E(V)$. The product $E(V)$ is $E(1)$ only if all of the components that Bob chose from $E(\mathrm{VBF}_X)$ encrypt 1. This shows that $Y \subseteq X$; otherwise, $Y \not\subseteq X$. Thus, the success probability of Protocol 5 is $1 - 2^{-k}$.
The success probability of Protocol 5 can be increased for important applications. If Alice and Bob choose another set of $k$ different hash functions $H' = \{h'_1, h'_2, \ldots, h'_k\}$ to re-execute Protocol 5 for the same sets $X$ and $Y$, the probability of a false positive is again $2^{-k}$. In addition, these two executions are independent. Therefore, the success probability is $1 - 2^{-k} \cdot 2^{-k} = 1 - 2^{-2k}$. Thus, the success probability of executing Protocol 5 $t$ times for the same sets, with different hash functions on each occasion, is $1 - 2^{-tk}$.
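The effect of repetition can be tabulated with a short sketch. It assumes, as in the argument above, that each run has a false-positive probability of exactly $2^{-k}$ and that runs with fresh hash functions are independent; the function names and the notion of a target error of $2^{-\text{target\_bits}}$ are illustrative.

```python
import math

def success_probability(k, t=1):
    """Success probability after t independent runs with k hash functions each."""
    return 1 - 2.0 ** (-k * t)

def runs_needed(k, target_bits):
    """Smallest t such that the failure probability 2^(-t*k) is at most 2^(-target_bits)."""
    return math.ceil(target_bits / k)

for t in (1, 2, 3):
    print(t, success_probability(16, t))
```

For example, with $k = 16$, three runs are enough to push the failure probability below $2^{-40}$.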

Corollary 7. Secure subset protocol for sets within a large domain is private.
Based on Theorem 3, it is easy to prove Corollary 7, and we omit the proof here.

Analysis of above Protocols
5.1.Efficiency Analysis.Because the subset protocols of [29,30] have foundations that are similar to Kissner's protocol [17] and Kissner's protocol is more efficient, we give the efficiency comparisons of computational complexity and communication complexity among the protocol of Kissner, Protocols 2 and 5 in this analysis.
Computational Complexity.In Protocol 2, Alice needs  encryptions and one decryption.While the messages to be encrypted are 1, each ElGamal encryption takes 2 log  modular multiplications.For the ElGamal encryption scheme, each decryption takes log  modular multiplications.Thus, Alice needs (2 + 1) log  modular multiplications.Bob computes (V) using 2 modular multiplications, and it requires 2 log  to compute () for Bob.The computational cost of Bob is 2 log +2 modular multiplications.Therefore, the computational overhead of Protocol 2 is (2+3) log +2 modular multiplications (mod ).
Alice encrypts her variant Bloom filter using  encryptions during the execution of Protocol 5.Because the components are 1, each encryption takes 2 log  modular multiplications.Alice also needs to decrypt ().Thus, Alice takes (2 + 1) log  modular multiplications.Bob evaluates (V) using 2 modular multiplications.He computes () based on (V) taking 2 log  modular multiplications, and 2 + 2 log  modular multiplications are required during Protocol 5.The total computational cost is (2 + 3) log  + 2 modular multiplications (mod ) in Protocol 5.Because  is a constant, the computational cost is linear in .
In the protocol proposed by Kissner and Song [17], suppose that Alice's set has $n$ elements and Bob's set has $\ell$ elements, where $n \ge \ell$. Alice needs $n + 1$ encryptions to encrypt the coefficients of her polynomial and one decryption to decrypt the returned ciphertext. Bob needs $n$ modular exponentiations and $n$ modular multiplications to evaluate the encrypted polynomial at each of his elements. Because he has $\ell$ elements, this takes Bob $n\ell$ modular exponentiations and $n\ell$ modular multiplications. For the Paillier encryption scheme with modulus $N$, every encryption and every decryption requires two modular exponentiations, and every modular exponentiation requires $2\log N$ modular multiplications. Altogether there are $2(n + 1) + 2 + n\ell$ modular exponentiations, so this protocol takes $(2n\ell + 4n + 8)\log N + n\ell$ modular multiplications (mod $N^2$).
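The same tally can be written out for the polynomial-based protocol. In the following Python sketch, the function name and the parameters `n` (size of Alice's set), `l` (size of Bob's set), and `log_N` (bit length of the Paillier modulus) are assumptions introduced for illustration.

```python
def kissner_cost(n: int, l: int, log_N: int) -> int:
    """Modular multiplications (mod N^2) for the polynomial-based protocol.

    Alice: n + 1 encryptions and one decryption, two exponentiations each.
    Bob:   n exponentiations and n multiplications per element, l elements.
    Each exponentiation is counted as 2*log_N modular multiplications.
    """
    exponentiations = 2 * (n + 1) + 2 + n * l
    return exponentiations * 2 * log_N + n * l


if __name__ == "__main__":
    # The total matches the closed form (2nl + 4n + 8) log N + nl.
    n, l, log_N = 10, 5, 1024
    print(kissner_cost(n, l, log_N), (2 * n * l + 4 * n + 8) * log_N + n * l)
```

When both sets grow together, the `n * l` term dominates, so doubling both sizes more than triples the count in this model.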
Communication Complexity. Communication complexity can be measured by the number of exchanged bits or by the number of communication rounds. In secure multiparty computation, the number of rounds is the widely used measure. Protocol 2, Protocol 5, and Kissner's protocol each involve three communication rounds. Based on the above discussion, we summarize the comparison in Table 4. In this table, the modular multiplications for Kissner's protocol are modulo $N^2$, where $N$ is the Paillier modulus, and those for our proposed protocols are modulo $p$, the ElGamal modulus. To achieve the same security level, $\log N = \log p$.

5.2. Performance Evaluation. Following the efficiency analysis above, we describe the experimental setting and the performance evaluation. The experiment covers Kissner's protocol, Protocol 2, and Protocol 5.
We implemented these protocols in the Java programming language. The experimental environment was a Windows 10 64-bit operating system on an Intel(R) Core(TM) i3-2100 CPU @ 3.10 GHz with 4 GB of memory. We set both the Paillier modulus $N$ and the ElGamal modulus $p$ to 1024 bits.
The experimental results of the subset protocols are shown in Figure 1, where "KissnerP" denotes the protocol proposed by Kissner. Both Protocol 2 and Protocol 5 are based on the ElGamal encryption scheme, while Kissner's protocol is based on the Paillier encryption scheme. Because the success probability of Protocol 5 is $1 - 2^{-k}$, we instantiate $k = 16$ in the following implementation.
Figure 1 shows that both Protocol 2 and Protocol 5 have linear computational complexity, while the computational complexity of Kissner's protocol is quadratic. Thus, our protocols are efficient and practical for large inputs. However, Protocol 5 may yield a false positive with probability $2^{-k}$ (equivalently, its success probability is $1 - 2^{-k}$). The parameters governing this probability can be chosen sufficiently large to make it negligible.

Conclusion
The subset problem is an important building block in secure multiparty computation, and it has many applications in privacy-preserving problems. In this study, we first presented an efficient subset protocol for sets whose elements are drawn from a known set. For sets whose elements are drawn from a large domain, we further designed an approximate but efficient subset protocol. These protocols have linear computational complexity in the size of the larger set. However, our protocols assume that all parties are semihonest. In future work, it will be necessary to solve similar problems in the malicious model.
$x_1$ and $x_2$ to Trent, (b) Trent computes the function $f(x_1, x_2)$, and (c) Trent tells Alice and Bob the result.

Figure 1: Comparison of the implementation of subset protocols.

Table 1: Principle of the subset problem for sets whose elements are drawn from a known set.

Table 2: Idea of the subset problem for sets with a large domain.

Table 4: Comparison of secure subset protocols.