Multiparty Threshold Private Set Intersection Protocol with Low Communication Complexity

Multiparty threshold private set intersection (MP-TPSI) protocols allow $n$ mutually untrusted parties $P_1, P_2, \ldots, P_n$, holding data sets $A_1, A_2, \ldots, A_n$ of size $m$ each, to jointly compute the intersection $I = A_1 \cap A_2 \cap \cdots \cap A_n$ of their private data sets only if the intersection is large enough, namely of size at least $m - t$, where $t$ is the threshold, while revealing no private information about the data sets other than the intersection. In an MP-TPSI protocol, the parties first decide whether the intersection meets this threshold and then compute the intersection only if it does. However, existing MP-TPSI protocols use different forms of evaluation polynomials in the cardinality testing and intersection computing phases, so the parties must transmit and compute a large number of evaluation values, which leads to high communication and computational complexity. In addition, existing MP-TPSI protocols cannot guarantee both security and correctness: an adversary may learn information beyond the intersection, and elements outside the intersection may be output as part of it. To solve these issues, we propose an MP-TPSI protocol based on threshold fully homomorphic encryption (TFHE) and sparse polynomial interpolation. In the star network topology, the theoretical communication complexity of the proposed protocol depends on the threshold $t$ and the number of parties $n$, but not on the set size $m$. Moreover, the proposed protocol outperforms related MP-TPSI protocols in both computational and communication overheads. Furthermore, it tolerates up to $n - 1$ corrupted parties in the semi-honest model: no set of colluding parties can learn the input of an honest party, even in the strictest dishonest-majority setting.

However, in certain application scenarios, such as vertical federated learning (VFL) [37], the MP-PSI protocols mentioned above cannot satisfy the requirements. In VFL, the training data are distributed among multiple parties, each party holds different features of the same objects, and the parties want to combine the features of their common samples to train a better machine learning model. Notably, the parties are willing to perform multiparty entity alignment only when the sample intersection is large. If it is too small, sample alignment does not improve model performance, and the parties have no interest in jointly computing the intersection of their training samples. To determine whether the intersection is large enough before sample alignment, multiparty threshold private set intersection (MP-TPSI) protocols [38][39][40][41] have been introduced. They enable $n$ mutually distrusting parties $P_1, P_2, \ldots, P_n$, holding data sets $A_1, A_2, \ldots, A_n$ of size $m$ each, to jointly compute the intersection of all their private data sets only if its size is at least $m - t$, while revealing no private information about the data sets other than the intersection. An MP-TPSI protocol consists of two phases: the cardinality testing phase, in which the parties decide whether the intersection meets the threshold $t$, and the intersection computing phase, in which the parties compute the intersection if it does. Unfortunately, existing MP-TPSI protocols [38][39][40][41] still incur heavy communication complexity.
To solve this problem, this paper uses sparse polynomial interpolation and threshold fully homomorphic encryption (TFHE) [42] to construct an MP-TPSI protocol with low communication complexity. The main contributions are as follows.
(1) First, in a star network topology where the designated party $P_1$ can communicate with each party $P_i$ ($i = 2, 3, \ldots, n$), we represent each set as a polynomial and construct an MP-TPSI protocol based on TFHE. To reduce communication and computational costs, we use the same form of evaluation polynomial in the cardinality testing and intersection computing phases, so the parties need to transmit and compute only a small number of evaluation values.
(2) Second, in the proposed protocol, the theoretical communication complexity of the designated party $P_1$ and of each party $P_i$ ($i = 2, 3, \ldots, n$) is $O(tn)$ and $O(t)$, respectively, which is lower than that of the existing MP-TPSI protocols [38][39][40] and the TAHE-based MP-TPSI protocol [41]. In contrast to conventional MP-PSI protocols [28][29][30][31][32][33][34][35][36], the communication complexity of the proposed protocol depends only on the threshold $t$ and the number of parties $n$, not on the set size $m$.
(3) Finally, we evaluate the proposed protocol and the related TFHE-based MP-TPSI protocol [41] for $n \in \{2, 3, \ldots, 8\}$, $m \in \{2^{10}, 2^{11}, 2^{12}\}$, and $t \in \{2^9, 2^{10}, 2^{11}\}$. The experimental results show that, compared with the TFHE-based MP-TPSI protocol [41], the proposed protocol reduces computational and communication costs by about 92.0%-97.3% and 67.2%-67.3%, respectively.
The security analysis shows that the proposed protocol achieves semi-honest security in the dishonest-majority setting, where up to $n - 1$ parties may be corrupted. The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 reviews preliminaries. Section 4 describes our protocol in detail. Section 5 presents the performance evaluation, and Section 6 gives the security analysis. Finally, Section 7 concludes the paper.
By representing sets as polynomials, Kissner et al. [28] implemented PSI operations in the multiparty setting based on threshold additive HE (TAHE), which can be realized from Paillier encryption [43]. Leveraging Bloom filters (BF) [44] and exponential additive HE (AHE) [45], Miyaji et al. [29] presented a scalable MP-PSI protocol that introduces a dealer to reduce the parties' computational complexity. In a star network topology and building on the two-party protocol of [46], Hazay et al. [30] described MP-PSI protocols in the semi-honest and malicious settings. Kolesnikov et al. [31] proposed the oblivious programmable PRF (OPPRF), designed OPPRF-based MP-PSI protocols in the semi-honest model, and further optimized them for the augmented semi-honest model. Inbar et al. [32] extended the PSI construction of [12] to the multiparty setting and described MP-PSI protocols for the semi-honest and augmented semi-honest settings in a star network topology. Ghosh et al. [33] presented an OLE-based approach to secure MP-PSI in a star network topology, in which each party sets the elements of its own set as the roots of a polynomial. Lu et al. [34] proposed an MP-PSI protocol for VFL in a star network topology that can compute the intersection even when some parties are offline. Kavousi et al. [35] presented an efficient MP-PSI protocol using oblivious PRF (OPRF) that combines the star and path communication patterns: in the former, one central party communicates with all other parties, and in the latter, each party communicates with its neighbors. Based on TAHE schemes and BF, Bay et al. [36] proposed an MP-PSI protocol in a star network topology that is secure in the semi-honest model. However, the communication and computational complexity of the MP-PSI protocols [28][29][30][31][32][33][34][35][36] above depends on the size of the input data sets, which is a fundamental obstacle to efficiency.
Based on AHE, Ghosh et al. [38] introduced the first MP-TPSI protocol whose communication complexity depends on the threshold $t$ rather than on the set size $m$. However, Abadi et al. [47] pointed out that the protocol of [38] is insecure, because an adversary can learn information about honest parties' sets beyond the intersection. Using OPRF and hash functions, Mahdavi et al. [39] introduced two MP-TPSI constructions, $t$-PSI$_0$ and $t$-PSI, but their computational complexity is exponential in the threshold $t$, which results in poor performance. Employing TAHE instantiated from ElGamal encryption [48] and Paillier encryption [43], Branco et al. [40] developed protocols for securely computing linear algebra functions and proposed an MP-TPSI protocol in a star network topology. Badrinarayanan et al. [41] pointed out a subtle issue in the protocol of [38]: elements that are not in the intersection may be output as intersection elements. To address this issue, they proposed TAHE-based and TFHE-based MP-TPSI protocols in the star network topology. However, their TFHE-based protocol uses different forms of evaluation polynomials in the cardinality testing and intersection computing phases, which requires transmitting and computing a large number of evaluation values and leads to heavy communication and computational costs.

Notations.
For ease of reading, the definitions of symbols in the proposed MP-TPSI protocol are described in Table 2.

Security Model.
We define the security of the proposed MP-TPSI protocol in the universal composability (UC) framework [49]. For a multiparty protocol $\Pi$ that realizes an ideal functionality $F$, we define the security of $\Pi$ by comparing an ideal world with a real world.
In the ideal world, the $n$ parties send all their inputs to $F$ and receive the computation result. The simulator $S$ acts as the adversary in the ideal world, has complete control of the corrupted parties, and simulates $Z$'s view of the real protocol execution.
In the real world, the $n$ parties run $\Pi$, which may call an ideal functionality $G$. The environment $Z$ selects the inputs of all $n$ parties and simulates everything outside $\Pi$; $Z$ can act as the adversary and corrupt any subset of the parties.
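Let $\mathrm{Ideal}[Z, S, F]$ and $\mathrm{Real}[Z, \Pi, G]$ denote the output of $Z$ in the ideal and real worlds, respectively. We say that $\Pi$ securely realizes $F$ if there exists a simulator $S$ such that for every environment $Z$,
$$\mathrm{Ideal}[Z, S, F] \approx_c \mathrm{Real}[Z, \Pi, G],$$
where $\approx_c$ denotes computational indistinguishability.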

Functionality. Ideal functionality $F_{\text{MP-TPSI-CT}}$ for MP-TPSI cardinality testing:
In a star network topology, for $n$ parties $P_1, P_2, \ldots, P_n$ holding data sets $A_1, A_2, \ldots, A_n$ of equal size $m$, respectively, the goal of $F_{\text{MP-TPSI-CT}}$ is that, at the end of a multiparty protocol $\Pi$, every party $P_i$ learns whether $|A_i \setminus I| \le t$, where $I = A_1 \cap A_2 \cap \cdots \cap A_n$. The formal definition of $F_{\text{MP-TPSI-CT}}$ is given in Figure 1.
Ideal functionality $F_{\text{MP-TPSI-C}}$ for MP-TPSI computing: In a star network topology, for $n$ parties $P_1, P_2, \ldots, P_n$ holding data sets $A_1, A_2, \ldots, A_n$ of equal size $m$, respectively, the goal of $F_{\text{MP-TPSI-C}}$ is that, at the end of a multiparty protocol $\Pi$, either every party $P_i$ outputs the intersection $I = A_1 \cap A_2 \cap \cdots \cap A_n$ or every party outputs $\perp$. The formal definition of $F_{\text{MP-TPSI-C}}$ is given in Figure 2.
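To make the two ideal functionalities concrete, the following sketch computes in the clear what a trusted party would output. It assumes integer-valued sets, uses `None` to stand in for $\perp$, and the function names are hypothetical; it models only the ideal world, not the protocol.

```python
from typing import List, Optional, Set


def f_mp_tpsi_ct(sets: List[Set[int]], t: int) -> List[bool]:
    """Ideal cardinality testing: party i learns whether |A_i minus I| <= t."""
    intersection = set.intersection(*sets)
    return [len(a - intersection) <= t for a in sets]


def f_mp_tpsi_c(sets: List[Set[int]], t: int) -> Optional[Set[int]]:
    """Ideal intersection computing: output I to everyone, or None in place of ⊥."""
    m = len(sets[0])
    if any(len(a) != m for a in sets):
        raise ValueError("all input sets must have the same size m")
    intersection = set.intersection(*sets)
    # With equal sizes, |A_i minus I| = m - |I| for every party, i.e. |I| >= m - t.
    return intersection if m - len(intersection) <= t else None
```

For $A_1 = \{1,2,3,4\}$, $A_2 = \{1,2,3,5\}$, $A_3 = \{1,2,3,6\}$, `f_mp_tpsi_c` returns `{1, 2, 3}` for $t = 1$ and `None` for $t = 0$.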

Multiparty Threshold Private Set Intersection
In a star network topology where party $P_1$ is the designated party that can communicate with the other parties $P_2, P_3, \ldots, P_n$, suppose that the $n$ parties $P_1, P_2, \ldots, P_n$ hold input sets $A_1, A_2, \ldots, A_n$ of equal size $m$, respectively. Based on TFHE with distributed setup, we propose an MP-TPSI protocol in which each party $P_i$ can compute the intersection $I = A_1 \cap A_2 \cap \cdots \cap A_n$ when $|A_i \setminus I| \le t$. The proposed MP-TPSI protocol is formally described in Figure 3.
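The set-as-polynomial encoding that the protocol relies on can be checked in plaintext. The minimal sketch below leaves out TFHE entirely and assumes a hypothetical prime modulus $p = 65537$. It encodes each set $A$ as $a_A(z) = \prod_{a \in A}(z - a)$, forms $f(z) = a_{A_1}(z) + \cdots + a_{A_n}(z)$ as in the correctness argument, and confirms that dividing by $a_{A_1}(z)$ cancels the common factor $a_I(z)$, leaving a ratio that depends only on the sets $A_i \setminus I$. The function names are hypothetical.

```python
P = 65537  # hypothetical prime modulus for the plaintext field (illustrative choice)


def set_eval(elements, z, p=P):
    """Evaluate the set polynomial a_A(z) = prod_{a in A} (z - a) mod p."""
    acc = 1
    for a in elements:
        acc = acc * (z - a) % p
    return acc


def ratio_before_and_after_cancel(sets, z, p=P):
    """Return (f(z) / a_{A_1}(z), reduced ratio) with f = a_{A_1} + ... + a_{A_n}.

    The reduced ratio is built only from the difference sets A_i minus I,
    i.e. the common factor a_I(z) is cancelled from numerator and denominator.
    z must not be an element of any set, so every divisor is nonzero mod p.
    """
    inter = set.intersection(*sets)
    f_z = sum(set_eval(a, z, p) for a in sets) % p
    full = f_z * pow(set_eval(sets[0], z, p), -1, p) % p
    num = sum(set_eval(a - inter, z, p) for a in sets) % p
    reduced = num * pow(set_eval(sets[0] - inter, z, p), -1, p) % p
    return full, reduced
```

With $A_1 = \{1,2,3,4\}$, $A_2 = \{1,2,3,5\}$, $A_3 = \{1,2,3,6\}$ and $z = 100$, both returned values coincide, because $a_I(100) \neq 0$ modulo $p$. (Three-argument `pow` with exponent `-1` needs Python 3.8 or later.)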

Correctness. MP-TPSI cardinality testing:
First, consider the case in which MP-TPSI cardinality testing outputs true. By the correctness of TFHE, it suffices to reason about the underlying plaintexts. Since every set contains the intersection, $a_{A_i}(x) = a_I(x) \cdot a_{A_i \setminus I}(x)$, so the common factor $a_I(x)$ cancels in the rational interpolation polynomial:
$$y_1(x) = \frac{f(x)}{a_{A_1}(x)} = \frac{a_{A_1}(x) + a_{A_2}(x) + \cdots + a_{A_n}(x)}{a_{A_1}(x)} = \frac{a_{A_1 \setminus I}(x) + a_{A_2 \setminus I}(x) + \cdots + a_{A_n \setminus I}(x)}{a_{A_1 \setminus I}(x)}.$$
The numerator and the denominator each have degree at most $t$, so the total degree of the rational function $y_1(x)$ is at most $2t$. Therefore, $y_1(x)$ can be computed from $2t + 1$ evaluation values, and the equation $y_1(x)|_{x=z} = f(z)/a_{A_1}(z)$ holds at the check point $z$.
Next, consider the case in which MP-TPSI cardinality testing outputs false. From the equation above, $\gcd(a_{A_1 \setminus I}(x) + a_{A_2 \setminus I}(x) + \cdots + a_{A_n \setminus I}(x),\ a_{A_1 \setminus I}(x)) = 1$. Because all sets have the same size $m$, $|A_i \setminus I| = m - |I|$ for every $i$, so in this case $|A_i \setminus I| \ge t + 1$ for all $i$. Hence the numerator and the denominator both have degree at least $t + 1$, the total degree of $y_1(x)$ is at least $2t + 2$, and computing $y_1(x)$ requires at least $2t + 3$ evaluation values. However, MP-TPSI cardinality testing uses only $2t + 1$ evaluation values, so the equation
$$y_1(x)|_{x=z} = \frac{f(z)}{a_{A_1}(z)} = \frac{a_{A_1}(z) + a_{A_2}(z) + \cdots + a_{A_n}(z)}{a_{A_1}(z)}$$
does not hold. This shows that MP-TPSI cardinality testing is correct.
MP-TPSI computing: If $|A_i \setminus I| > t$ for any $i = 1, 2, \ldots, n$, MP-TPSI computing aborts after MP-TPSI cardinality testing. If $|A_i \setminus I| \le t$, the rational interpolation polynomial is
$$y_i(x) = \frac{a_{A_1 \setminus I}(x) + a_{A_2 \setminus I}(x) + \cdots + a_{A_n \setminus I}(x)}{a_{A_i \setminus I}(x)},$$
whose numerator and denominator each have degree at most $t$, so $y_i(x)$ has total degree at most $2t$. Since $\gcd(a_{A_1 \setminus I}(x) + \cdots + a_{A_n \setminus I}(x),\ a_{A_i \setminus I}(x)) = 1$, no common factors cancel between the numerator and the denominator.
Therefore, by the correctness of TFHE, each party $P_i$ can interpolate the rational function $y_i(x)$ from $2t + 1$ evaluation values. Finally, each $P_i$ obtains $A_i \setminus I$ as the set of roots of the denominator of $y_i(x)$ and computes the intersection $I = A_i \setminus (A_i \setminus I)$.
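The correctness argument can be exercised end to end in plaintext. The sketch below is a minimal illustration under stated assumptions: there is no TFHE layer, the values $y_i(z)$ are computed in the clear over a hypothetical prime field with $p = 65537$, and the evaluation points $1000, 1001, \ldots$ and the check point $2000$ are hypothetical choices outside every set. Rational interpolation uses a generic linear-algebra method (search for the smallest denominator degree $d \le t$ with a monic denominator), which may differ from the sparse polynomial interpolation algorithm used in the paper. Party $P_i$ recovers $A_i \setminus I$ by testing which of its own elements are roots of the denominator. All function names are hypothetical.

```python
P = 65537  # hypothetical prime field modulus for this plaintext sketch


def poly_eval(coeffs, x, p=P):
    """Horner evaluation of sum(coeffs[j] * x**j) mod p (lowest degree first)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def set_eval(elements, z, p=P):
    """Evaluate the set polynomial a_A(z) = prod_{a in A} (z - a) mod p."""
    acc = 1
    for a in elements:
        acc = acc * (z - a) % p
    return acc


def solve_mod(rows, rhs, p=P):
    """Gauss-Jordan elimination mod p; one solution (free variables = 0) or None."""
    aug = [list(row) + [b % p] for row, b in zip(rows, rhs)]
    n_cols, pivots, r = len(rows[0]), [], 0
    for c in range(n_cols):
        piv = next((i for i in range(r, len(aug)) if aug[i][c] % p), None)
        if piv is None:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        inv = pow(aug[r][c], -1, p)
        aug[r] = [v * inv % p for v in aug[r]]
        for i in range(len(aug)):
            if i != r and aug[i][c] % p:
                f = aug[i][c]
                aug[i] = [(vi - f * vr) % p for vi, vr in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] % p for row in aug[r:]):
        return None  # a row "0 = nonzero" remains: the system is inconsistent
    sol = [0] * n_cols
    for i, c in enumerate(pivots):
        sol[c] = aug[i][-1]
    return sol


def interpolate_rational(points, values, t, p=P):
    """Find N/D, D monic of smallest degree d <= t, deg N <= d, N(z) = y * D(z)."""
    for d in range(t + 1):
        rows, rhs = [], []
        for z, y in zip(points, values):
            # Unknowns: n_0..n_d, then D_0..D_{d-1}; D's x^d coefficient is fixed to 1.
            row = [pow(z, j, p) for j in range(d + 1)]
            row += [(-y * pow(z, j, p)) % p for j in range(d)]
            rows.append(row)
            rhs.append(y * pow(z, d, p) % p)
        sol = solve_mod(rows, rhs, p)
        if sol is not None:
            return sol[:d + 1], sol[d + 1:] + [1]
    return None


def cardinality_test(points, values, z_check, y_check, t, p=P):
    """Interpolate from the 2t + 1 values, then verify the fit at an extra point."""
    fit = interpolate_rational(points, values, t, p)
    if fit is None:
        return False
    num, den = fit
    d_val = poly_eval(den, z_check, p)
    return d_val != 0 and poly_eval(num, z_check, p) == y_check * d_val % p


def recover_intersection(own_set, den, p=P):
    """Roots of the denominator among A_i form A_i minus I; keep the rest."""
    return {a for a in own_set if poly_eval(den, a, p) != 0}


def party_values(sets, i, points, p=P):
    """Plaintext y_i(z) = (a_{A_1}(z) + ... + a_{A_n}(z)) / a_{A_i}(z)."""
    out = []
    for z in points:
        f_z = sum(set_eval(a, z, p) for a in sets) % p
        out.append(f_z * pow(set_eval(sets[i], z, p), -1, p) % p)
    return out
```

For example, with $A_1 = \{1,2,3,4\}$, $A_2 = \{1,2,3,5\}$, $A_3 = \{1,2,3,6\}$ and $t = 1$, $y_1(x) = (3x - 15)/(x - 4)$, so interpolation from the three values returns the denominator $x - 4$ and $P_1$ recovers $I = \{1, 2, 3\}$. With the sets $\{1,2,3,4\}$, $\{1,2,5,6\}$, $\{1,2,7,8\}$, $|A_i \setminus I| = 2 > t$ and the check at the extra point fails, matching the false case above.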

Performance Evaluation
The proposed MP-TPSI protocol improves on the TFHE-based MP-TPSI protocol [41], so we compare the two. In the star network topology, we implement the proposed protocol on top of Lattigo [50], a lattice-based multiparty HE library in Go that implements the full-RNS BFV scheme [51] and its multiparty variants. All experiments run on a 32-core Intel Xeon CPU with 256 GB of RAM. For the multiparty BFV scheme, to achieve 128-bit security, we set the polynomial degree to 4096, the ciphertext modulus to 109 bits, and the plaintext modulus to 17 bits. For ease of comparison, we run all experiments on the same machine with 16 threads, emulate network latency with the Linux tc command, and consider a LAN with 10 Gbps throughput and 0.2 ms round-trip time. Because the authors of [41] did not implement their TFHE-based MP-TPSI protocol, we implement it in the same experimental environment for a fair comparison.
Security and Communication Networks

Analysis of Communication Cost.
In a star network topology, with the parameters selected in Section 5.1, a ciphertext occupies $2 \times 4096 \times 109$ bits, a partial decryption $4096 \times 109$ bits, and a plaintext $4096 \times 17$ bits. Table 4 compares the communication cost of the proposed MP-TPSI protocol with that of the TFHE-based MP-TPSI protocol [41].
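The object sizes above follow directly from the Section 5.1 parameters. The short sketch below reproduces the arithmetic; it mirrors the formulas in the text and ignores any serialization overhead an actual library would add. Constant and function names are hypothetical.

```python
# Parameters from Section 5.1 (multiparty BFV).
RING_DEGREE = 4096  # polynomial degree
LOG_Q = 109         # ciphertext modulus size in bits
LOG_T = 17          # plaintext modulus size in bits


def object_sizes_bits(n=RING_DEGREE, log_q=LOG_Q, log_t=LOG_T):
    """Sizes in bits of a ciphertext, a partial decryption, and a plaintext."""
    ciphertext = 2 * n * log_q  # a ciphertext is a pair of ring elements mod q
    partial_dec = n * log_q     # a partial decryption is one ring element mod q
    plaintext = n * log_t       # a plaintext is one ring element mod t
    return ciphertext, partial_dec, plaintext


if __name__ == "__main__":
    names = ("ciphertext", "partial decryption", "plaintext")
    for name, bits in zip(names, object_sizes_bits()):
        print(f"{name}: {bits} bits = {bits // 8} bytes = {bits / 8 / 1024:.1f} KiB")
```

This gives 892,928, 446,464, and 69,632 bits, that is, 109.0, 54.5, and 8.5 KiB, respectively. Per-party totals then depend on how many of each object a party sends, which is what Table 4 tabulates.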

Security Analysis
In the security model, we assume an environment $Z$ that can corrupt a set $A^*$ of $n^* < n$ parties, and a simulator $S$ that knows the output value $w \in \{\text{true}, \text{false}\}$ of the ideal functionality $F_{\text{MP-TPSI-CT}}$. If $w = \text{true}$, $S$ sets $b = 0$; otherwise, it sets $b = 1$. $S$ also knows the output $I$ or $\perp$ of the ideal functionality $F_{\text{MP-TPSI-C}}$. In addition, for each corrupt party $A_i \in A^*$, $S$ has the input data set $A_i$ and the random value $r_i$ of $A_i$. The simulation strategy of $S$ is as follows.
Initialization. $S$ plays the role of each honest party $P_i$ in the distributed setup algorithm $\mathsf{TFHE.DisSet}$, just as in the real world. $S$ also knows the secret key shares $\{sk_i\}_{A_i \in A^*}$ of all corrupt parties in $A^*$.

MP-TPSI Cardinality Testing.
$S$ does the following. In Step 1, $S$ encodes the intersection $I = \{a_1, a_2, \ldots, a_{|I|}\}$ as the polynomial $a_I(x) = \prod_{a_i \in I}(x - a_i)$, chooses a random polynomial $u(x)$ of degree $t$, and computes $f(x) = a_I(x) \cdot u(x)$. In Steps 2-4, whenever an honest party $P_i$ would send an encrypted value, $S$ instead computes an encryption of zero, $\mathsf{TFHE.Enc}(0)$, on behalf of $P_i$, using fresh randomness as in the real world.
In Steps 5-6, instead of computing the value $b_i$ on behalf of each honest party $P_i$ by running the partial decryption algorithm $\mathsf{TFHE.PartDec}(sk_i, b)$ as in the real world, $S$ computes $\{b_i\}_{P_i \in \mathcal{P}} = \mathsf{TFHE.S}(C, b, b, \{sk_i\}_{A_i \in A^*})$, where $C$ is the circuit that $P_1$ evaluates to compute $b$ in the real world and $\mathcal{P}$ is the set of honest parties. If $P_1$ is honest, $S$ sends the evaluation value $b$ as in the real world.
MP-TPSI Computing. Instead of computing the value $f(k)_i$ on behalf of each honest party $P_i$ by running $\mathsf{TFHE.PartDec}(sk_i, f(k))$ as in the real world, $S$ computes $\{f(k)_i\}_{P_i \in \mathcal{P}} = \mathsf{TFHE.S}(C, f(k), f(k)_i, \{sk_i\}_{A_i \in A^*})$, where $C$ is the circuit that $P_1$ evaluates to compute $f(k)$ in the real world. If $P_1$ is honest, $S$ sends the evaluation value $f(k)$ as in the real world.
In Step 2, $S$ outputs the interpolation polynomial $y_i(x)$ and the set intersection $I$ on behalf of every honest party $P_i$, just as in the real world.
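Step 1 of the simulation needs only the intersection $I$ and the threshold $t$. A minimal sketch of that step over a hypothetical prime field with $p = 65537$ is given below. Coefficient lists run from lowest to highest degree, $u(x)$ is sampled uniformly among polynomials of degree exactly $t$ (an assumption about what "a random polynomial of degree $t$" means here), and all function names are hypothetical.

```python
import random

P = 65537  # hypothetical prime modulus for the plaintext field


def poly_mul(a, b, p=P):
    """Multiply two coefficient lists (lowest degree first) mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def set_poly(elements, p=P):
    """Coefficients of a_I(x) = prod_{a in I} (x - a) mod p."""
    coeffs = [1]
    for a in sorted(elements):
        coeffs = poly_mul(coeffs, [(-a) % p, 1], p)
    return coeffs


def poly_eval(coeffs, x, p=P):
    """Horner evaluation of a coefficient list at x mod p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def simulate_f(intersection, t, rng, p=P):
    """Simulator Step 1: f(x) = a_I(x) * u(x) with u uniform of degree exactly t."""
    u = [rng.randrange(p) for _ in range(t)] + [rng.randrange(1, p)]
    return poly_mul(set_poly(intersection, p), u, p)
```

For instance, `simulate_f({1, 2, 3}, 2, random.Random(0))` returns six coefficients: the resulting $f(x)$ has degree $|I| + t = 5$ and vanishes on every element of $I$, and it is computed from $I$ and $t$ alone.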
Next, using a simulator $S_h$, we show through a sequence of consecutive, computationally indistinguishable hybrids that the proposed MP-TPSI protocol is secure against the environment $Z$ in the semi-honest setting.

Conclusion
In this study, using sparse polynomial interpolation and TFHE, we introduce an MP-TPSI protocol with low communication complexity, in which the communication complexity depends only on the threshold $t$ and the number of parties $n$, not on the data set size $m$. Unlike existing MP-TPSI protocols, the proposed protocol uses the same form of evaluation polynomial in the cardinality testing and intersection computing phases, so the parties transmit and compute only a small number of evaluation values, which reduces communication and computational costs. The performance evaluation shows that, compared with the TFHE-based MP-TPSI protocol [41], our protocol reduces computational and communication costs by 92.0%-97.3% and 67.2%-67.3%, respectively. Moreover, the proposed protocol guarantees the correctness of the intersection result and protects the parties' data: a semi-honest adversary cannot learn any information beyond the intersection. In future work, we will study MP-TPSI in the broadcast communication setting, reduce the number of rounds, and design a more efficient MP-TPSI protocol with malicious security.

Appendix

Proof.
The difference between $\text{Hybrid}_0$ and $\text{Hybrid}_1$ is that in $\text{Hybrid}_0$, $S_h$ calls $F_{\text{MP-TPSI-CT}}$ honestly, while in $\text{Hybrid}_1$, $S_h$ simulates the ideal functionality $F_{\text{MP-TPSI-CT}}$, which returns true if $|A_i \setminus I| \le t$ and false otherwise. In $\text{Hybrid}_0$, the output of $F_{\text{MP-TPSI-CT}}$ is correct by the correctness of our protocol $\Pi_{\text{MP-TPSI}}$; in $\text{Hybrid}_1$, it is always correct by construction. Therefore, $\text{Hybrid}_0$ and $\text{Hybrid}_1$ are computationally indistinguishable.
Lemma 2. $\text{Hybrid}_1$ and $\text{Hybrid}_2$ are computationally indistinguishable due to the simulation-based security of TFHE [42].

Proof.
The difference between $\text{Hybrid}_1$ and $\text{Hybrid}_2$ is that in $\text{Hybrid}_1$, $S_h$ computes the TFHE partial decryptions of all honest parties as in the real world, while in $\text{Hybrid}_2$, $S_h$ simulates these partial decryptions by running $\mathsf{TFHE.S}$. Suppose an environment $Z$ can distinguish $\text{Hybrid}_1$ from $\text{Hybrid}_2$ with non-negligible probability $\epsilon$. We build a reduction algorithm $B$ that breaks the simulation-based security of TFHE with non-negligible probability $\epsilon'$. $B$ interacts with a challenger $C$ in TFHE's simulation-based security game and with $Z$ in the game of $\text{Hybrid}_1$ and $\text{Hybrid}_2$; the corrupt parties in the game between $B$ and $Z$ are the same as those in the game between $B$ and $C$. $B$ forwards to $C$ the public key shares $pk_i$ and secret key shares $sk_i$ of the corrupt parties received from $Z$, and forwards to $Z$ the public key shares $pk_i$ of the honest parties received from $C$. $B$ sends to $C$ the corrupt parties' input data sets $A_i$ and random values $r_i$ received from $Z$, and sends to $Z$ the honest parties' ciphertexts received from $C$. $B$ sends the evaluation circuit of the rational polynomial $f(x)$ to $C$, and $C$ returns the honest parties' partial decryptions to $B$. $B$ then continues to interact with $Z$ for the rest of the execution as in $\text{Hybrid}_1$. If $C$ sends honestly computed partial decryptions, the interaction between $B$ and $Z$ corresponds to $\text{Hybrid}_1$; if the partial decryptions are simulated by $\mathsf{TFHE.S}$, it corresponds to $\text{Hybrid}_2$.
Hence, if some $Z$ could distinguish $\text{Hybrid}_1$ from $\text{Hybrid}_2$ with non-negligible probability $\epsilon$, $B$ would break TFHE's simulation-based security with non-negligible probability $\epsilon'$, contradicting the simulation-based security of TFHE [42]. Therefore, $\text{Hybrid}_1$ is computationally indistinguishable from $\text{Hybrid}_2$.
Lemma 3. $\text{Hybrid}_2$ is statistically close to $\text{Hybrid}_3$.

Proof.
The difference between $\text{Hybrid}_2$ and $\text{Hybrid}_3$ is how the rational polynomial $f(x)$ is computed. In $\text{Hybrid}_2$, $S_h$ computes $f(x)$ as in the real world. For each $i \in [n]$, $\deg(r_i \cdot a_{A_i \setminus I}(x)) = t$; thus, $\deg(v_1(x)) = \deg(v_2(x)) = t$. Since $v_2(x)$ is statistically close to a uniformly random polynomial of degree $t$ and is independent of $v_1(x)$, the sum $v_1(x) + v_2(x)$ is also statistically close to uniform, so $f(x) = a_I(x) \cdot (v_1(x) + v_2(x))$ is distributed statistically close to $a_I(x) \cdot u(x)$, where $u(x)$ is a uniformly random polynomial of degree $t$. In $\text{Hybrid}_3$, $S_h$ computes $f(x) = a_I(x) \cdot u(x)$. Therefore, the distribution of $f(x)$ in $\text{Hybrid}_2$ is statistically close to its distribution in $\text{Hybrid}_3$.
Lemma 4. $\text{Hybrid}_3$ and $\text{Hybrid}_4$ are computationally indistinguishable due to the semantic security of TFHE [42].

Proof.
The difference between $\text{Hybrid}_3$ and $\text{Hybrid}_4$ is that in $\text{Hybrid}_3$, $S_h$ computes the TFHE encryptions of all honest parties as in the real world, while in $\text{Hybrid}_4$, $S_h$ computes encryptions of 0.
Suppose an environment $Z$ can distinguish $\text{Hybrid}_3$ from $\text{Hybrid}_4$ with non-negligible probability $\epsilon$. We build a reduction algorithm $B$ that breaks the semantic security of TFHE with non-negligible probability $\epsilon'$. $B$ interacts with a challenger $C$ in TFHE's semantic security game and with $Z$ in the game of $\text{Hybrid}_3$ and $\text{Hybrid}_4$; the corrupt parties in the game between $B$ and $Z$ are the same as those in the game between $B$ and $C$. $B$ forwards to $C$ the public key shares $pk_i$ and secret key shares $sk_i$ of the corrupt parties received from $Z$, and forwards to $Z$ the public key shares $pk_i$ of the honest parties received from $C$. $B$ submits the honestly generated plaintext and 0 to $C$ as its challenge messages, and $C$ returns a ciphertext of one of them. $B$ uses the ciphertext received from $C$ to interact with $Z$ and continues the rest of the execution as in $\text{Hybrid}_3$. If $C$ returns an encryption of the honestly generated plaintext, the interaction between $B$ and $Z$ corresponds to $\text{Hybrid}_3$; if it returns an encryption of 0, the interaction corresponds to $\text{Hybrid}_4$.
Hence, if some $Z$ could distinguish $\text{Hybrid}_3$ from $\text{Hybrid}_4$ with non-negligible probability $\epsilon$, $B$ would break TFHE's semantic security with non-negligible probability $\epsilon'$, contradicting the semantic security of TFHE [42]. Therefore, $\text{Hybrid}_3$ is computationally indistinguishable from $\text{Hybrid}_4$.
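The statistical step in the proof of Lemma 3 rests on a one-time-pad property: adding an independent, uniformly random polynomial $v_2(x)$ to any fixed polynomial $v_1(x)$ of degree at most $t$ yields a uniformly random polynomial, because $v_2 \mapsto v_1 + v_2$ is a bijection on coefficient vectors. The exhaustive check below illustrates this over a tiny field under simplifying assumptions: $v_2$ is taken to be exactly uniform over all polynomials of degree at most $t$ (the proof only claims statistical closeness to uniform polynomials of degree $t$), and the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def sum_distribution(v1, p):
    """Count each coefficient vector of v1 + v2 as v2 ranges over all
    polynomials of degree <= len(v1) - 1 over GF(p)."""
    counts = Counter()
    for v2 in product(range(p), repeat=len(v1)):
        counts[tuple((a + b) % p for a, b in zip(v1, v2))] += 1
    return counts
```

For $p = 5$ and the fixed polynomial $v_1(x) = 3 + 2x$, `sum_distribution((3, 2), 5)` hits each of the $25$ coefficient vectors exactly once, so $v_1 + v_2$ is uniform regardless of $v_1$.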

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare no conflicts of interest.