Private set intersection (PSI) allows participants to securely compute the intersection of their inputs, which has a wide range of applications such as privacy-preserving contact tracing of COVID-19. Most existing PSI protocols are based on asymmetric or symmetric cryptosystems, so key-related operations burden these systems. In this paper, we transform the problem of set intersection into the problem of finding roots of polynomials by using point-value polynomial representation, blind the polynomials’ point-value pairs with a pseudorandom function for secure transmission and computation, and then propose an efficient PSI protocol without any cryptosystem. We optimize the protocol with the permutation-based hashing technique, which divides a set into multiple subsets to reduce the degree of the polynomial. The following advantages can be seen from the experimental results and theoretical analysis: (1) no cryptosystem is used for data hiding or encryption and, thus, our design provides a lightweight system; (2) with set elements less than
Private set intersection (PSI) lets participants complete a computation on their private inputs without learning any information other than the set intersection. PSI has a wide range of applications such as privacy-preserving contact tracing for infection detection [
PSI has been well studied, and several cryptographic technologies have been proposed to implement it. According to the cryptographic techniques involved, PSI protocols fall mainly into the following three categories.
PSI based on public-key technology: the main cryptographic technique was homomorphic encryption. The sender encrypted its set and the receiver performed operations on the ciphertexts using the homomorphic property; the sender then decrypted the results with his private key and obtained the intersection. With small communication complexity, these protocols suited scenarios where the participants had strong computing power but communication bandwidth was a bottleneck. However, they had higher time complexity because of the public-key cryptography.
PSI based on generic circuits: the protocols transformed any function into a garbled Boolean circuit and then completed a generic secure computation. The circuit generator encrypted each circuit gate using a double symmetric cryptosystem and generated a garbled circuit; the evaluator computed keys for the output wires by decrypting the appropriate ciphertexts without learning any intermediate values. The key technique used in these protocols was the symmetric cryptosystem. The advantage of the generic circuit approach was that it made protocols easier to design and implement. But as a general solution, the garbled circuit could not achieve scalability, and the protocols were inefficient.
PSI based on oblivious transfer (OT): this kind of protocol introduced variants of OT. Elements were stored in some data structure, and the parties ran an OT for each bit of the inputs to get private outputs. Then, each party performed XOR operations with random values and its own elements. Lastly, the sender sent the results to the receiver, who locally checked for the existence of its inputs. To improve efficiency, most OT variants were implemented with a symmetric cryptosystem, so these protocols had lower time and communication complexity. Nevertheless, they required additional key-related computations such as secret key negotiation.
From the above analysis, PSI protocols based on public-key cryptosystems suffer from two constraints: low efficiency and the need for a complicated system to manage private/public keys. On the other hand, PSI protocols based on symmetric cryptosystems are more efficient, but negotiating or securely transferring secret keys leads to additional computation and communication. Furthermore, the secure storage of keys burdens the system. In this paper, we transform the problem of set intersection into the problem of finding the roots of polynomials by using point-value polynomial representation and propose an efficient PSI protocol without any cryptosystem.
Our work can be applied to the following several practical scenarios.
The COVID-19 pandemic has posed an unprecedented challenge for humanity. Because the virus is highly contagious, social distancing is a fundamental measure that many countries have adopted. By matching location information between infected patients and other people, contact tracing for infection detection lets users securely upload their data to a server; if a user later becomes infected, other users can check whether they have been in contact with that user. To protect users’ private location information, PSI can be applied to securely compute the shared location data.
Two national law enforcement bodies have a list of suspected terrorists. Due to national laws, they may not be allowed to disclose their whole lists, even when collaborating. Using a PSI protocol, both agencies can find commonly suspected terrorists and share their information, while other relevant information will not be disclosed.
Different space agencies operate their own orbiting satellites. To determine whether a pair of satellites in the same orbit may collide and to adjust orbits appropriately, these agencies need to share detailed orbital information. However, no agency wants to disclose anything beyond whether the orbital information indicates a collision. PSI can therefore be used to compute the probability of a collision among satellites without revealing other private information.
We transform the problem of the intersection of sets into the problem of finding roots of polynomials by using point-value polynomial representation and propose a new approach to PSI protocol without any cryptosystem. Then, we optimize our protocol based on the permutation-based hashing technique that reduces the length of the stored elements and the degree of the polynomial. Eventually, our protocol and the related PSI protocols are implemented on the Linux platform. The main contributions are as follows.
We propose a new approach for designing PSI protocol based on point-value polynomial representation and pseudorandom function. Firstly, we represent sets as polynomials’ point-value pairs. Each party denotes
We optimize the new PSI protocol using the permutation-based hashing method, which converts the hashed elements into shorter strings without collisions and reduces the degree of polynomials. The hashing is to create a two-dimensional table
We implement our hashing protocol and other related protocols in C/C++ on the Linux platform. We use Number Theory Library (NTL) [
The related works on PSI protocols are introduced in Section
According to the underlying cryptographic techniques, PSI protocols can be divided into the following three categories.
In 1986, Meadows [
The PSI protocol based on oblivious polynomial evaluation [
In 2009, Jarecki et al. [
In 2010, Cristofaro et al. proposed PSI and Authorized PSI (APSI) protocols [
In 2017, Chen et al. [
The two main approaches were Yao’s garbled circuits [
In 2012, Huang et al. [
In 2018, Pinkas et al. [
In 2001, Naor et al. [
In 2013, Dong et al. [
In 2015, Pinkas et al. [
We give the transformation from operations of sets to operations of polynomials. This representation allows us to represent a set using a random point evaluation polynomial.
Polynomial representation of a set. Given a set
Polynomial in point-value pairs: distinct point-value pairs
Set intersection: let
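The idea of this subsection can be illustrated with a small, non-private sketch. It is not the protocol of this paper: the prime modulus `P`, the sample points `1, 2, ...`, the random scalars `ra` and `rb`, and all function names are assumptions chosen for illustration. Each set is encoded as a polynomial whose roots are the set elements, the polynomials are exchanged only as point-value pairs, and an element of one set belongs to the intersection exactly when the randomly combined polynomial vanishes at it.

```python
import random

P = 2**61 - 1  # illustrative prime modulus (assumption, not from the paper)

def poly_from_roots(roots, p=P):
    """Coefficients (lowest degree first) of prod (x - r) mod p."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - r * c) % p      # term -r * c * x^i
            nxt[i + 1] = (nxt[i + 1] + c) % p  # term  c * x^(i+1)
        coeffs = nxt
    return coeffs

def evaluate(coeffs, x, p=P):
    """Horner evaluation of the polynomial at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def interpolate_at(pairs, x, p=P):
    """Lagrange interpolation: value at x of the polynomial through `pairs`."""
    total = 0
    for i, (xi, yi) in enumerate(pairs):
        num, den = 1, 1
        for j, (xj, _) in enumerate(pairs):
            if i != j:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def intersect_via_roots(set_a, set_b, p=P):
    pa, pb = poly_from_roots(set_a, p), poly_from_roots(set_b, p)
    # A polynomial of degree <= d is fixed by d + 1 distinct point-value pairs.
    xs = range(1, max(len(set_a), len(set_b)) + 2)
    ra, rb = random.randrange(1, p), random.randrange(1, p)  # nonzero scalars
    combined = [(x, (ra * evaluate(pa, x, p) + rb * evaluate(pb, x, p)) % p)
                for x in xs]
    # For e in set_a the combined value is rb * P_B(e), which is 0 iff e in set_b.
    return {e for e in set_a if interpolate_at(combined, e, p) == 0}

print(intersect_via_roots({3, 7, 11, 20}, {7, 20, 42}))
```

Working with point-value pairs rather than coefficients means the combination step is a pointwise sum; the sketch omits all blinding, so it shows only why roots of the combined polynomial identify common elements.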
The simple hashing maps each element
The permutation-based hashing technique converts the hashed elements into shorter strings that are stored in the hash table, which reduces storage space and computation complexity; it was proposed by Arbitman et al. [
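As a rough illustration of permutation-based hashing in its commonly described form, an element is split into a left part and a right part, the bin index is the left part XORed with a hash of the right part, and only the right part is stored. The hash `f` (SHA-256 reduced to the bin-index width), the 32-bit element size, and the 16-bin table below are assumptions for the sketch, not parameters taken from this paper.

```python
import hashlib

def f(x_right, bin_bits):
    """Hypothetical hash of the right part, reduced to a bin-index-sized value."""
    digest = hashlib.sha256(x_right.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << bin_bits)

def build_table(elements, elem_bits, bin_bits):
    """Store only the right part of each element in bin x_left XOR f(x_right)."""
    right_bits = elem_bits - bin_bits
    table = [[] for _ in range(1 << bin_bits)]
    for x in elements:
        x_left = x >> right_bits
        x_right = x & ((1 << right_bits) - 1)
        table[x_left ^ f(x_right, bin_bits)].append(x_right)
    return table

def recover(index, x_right, elem_bits, bin_bits):
    """Rebuild the full element from its bin index and stored right part."""
    right_bits = elem_bits - bin_bits
    x_left = index ^ f(x_right, bin_bits)
    return (x_left << right_bits) | x_right

elements = [0xDEADBEEF, 0x12345678, 0x0000FFFF]
table = build_table(elements, elem_bits=32, bin_bits=4)
for index, stored in enumerate(table):
    for x_right in stored:
        print(index, hex(x_right), hex(recover(index, x_right, 32, 4)))
```

Because the bin index together with the stored right part determines the left part, two different elements never produce the same stored value in the same bin, while each stored string is `bin_bits` shorter than the original element.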
This section focuses on the security definition of PSI protocol.
We consider a semihonest adversary who follows the protocol specification while trying to obtain extra information from the exchanged messages.
The functionality being implemented in this paper is
Semihonest security: in the semihonest model, a protocol
The functionality that is implemented in the new PSI protocol is
Setup: party
Initialization: each party
Select a dummy number
Construct polynomial
Compute vectors
Pick another random number
Intersection interaction: party Party Receiving party
where
Party
Party
Then, it sends the vector
Intersection result: party
Party
Party
Party
A PSI Protocol based on point-value polynomial representation.
Because for
Next,
Then,
And we can get the polynomial
Thus, the new protocol is correct.
We optimize the above protocol using permutation-based hashing. At first, each party constructs a two-dimensional hash table
Setup: party
Hashing: each party
Create a hash table
For every bin
Initialization: each party
Generate a pseudorandom value
Generate
Construct a polynomial
Choose a random number
Compute vectors
Intersection interaction: party
Party
Receiving party
where
Party
where
Party
where
Intersection result: for each bin
Party
Party
Party
where
Efficient PSI protocol using the permutation-based hashing.
Because for
Next,
Then,
And the polynomials
Thus, the hashing protocol is correct.
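The effect of hashing on polynomial degree can be seen in a non-private toy version of the per-bin computation. This is a sketch under stated assumptions, not the protocol above: it omits the pseudorandom blinding and the exchange of vectors entirely, and the hash `f`, modulus `P`, 32-bit elements, and 16 bins are hypothetical choices. Each bin gets its own polynomial whose roots are the stored right parts, so the largest degree is the largest bin load rather than the set size.

```python
import hashlib

P = 2**61 - 1  # illustrative prime modulus (assumption)

def f(x_right, bin_bits):
    """Hypothetical hash of the right part, reduced to a bin-index-sized value."""
    digest = hashlib.sha256(x_right.to_bytes(16, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << bin_bits)

def build_table(elements, elem_bits, bin_bits):
    """Permutation-based hashing: bin x_left XOR f(x_right) stores x_right."""
    right_bits = elem_bits - bin_bits
    table = [[] for _ in range(1 << bin_bits)]
    for x in elements:
        x_left, x_right = x >> right_bits, x & ((1 << right_bits) - 1)
        table[x_left ^ f(x_right, bin_bits)].append(x_right)
    return table

def poly_from_roots(roots, p=P):
    """Coefficients (lowest degree first) of prod (x - r) mod p."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - r * c) % p
            nxt[i + 1] = (nxt[i + 1] + c) % p
        coeffs = nxt
    return coeffs

def evaluate(coeffs, x, p=P):
    """Horner evaluation of the polynomial at x."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def hashed_intersection(set_a, set_b, elem_bits=32, bin_bits=4):
    right_bits = elem_bits - bin_bits
    table_a = build_table(set_a, elem_bits, bin_bits)
    table_b = build_table(set_b, elem_bits, bin_bits)
    common, max_degree = set(), 0
    for index, (bin_a, bin_b) in enumerate(zip(table_a, table_b)):
        poly_b = poly_from_roots(bin_b)          # one low-degree polynomial per bin
        max_degree = max(max_degree, len(bin_b))
        for x_right in bin_a:
            if evaluate(poly_b, x_right) == 0:   # common root within this bin
                x_left = index ^ f(x_right, bin_bits)
                common.add((x_left << right_bits) | x_right)
    return common, max_degree

common, max_degree = hashed_intersection(set(range(1000, 1100)), set(range(1050, 1200)))
print(len(common), max_degree)
```

Matching is done bin by bin, and the full element is rebuilt from the bin index and the stored right part, which is what allows the shorter strings to stand in for the original elements.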
The above hashing PSI protocol securely computes the set intersection in the presence of a semihonest adversary.
If
We prove it by considering the cases where each of the parties has been corrupted. In each case, we construct a simulator that is given only the corrupted party’s input and output and generates a simulated view that must be computationally indistinguishable from the view in the real protocol.
Corrupted party
The simulator
Create an empty view and then append
Pick a set
Construct polynomials
Generate the random values
Blind the polynomials’ values and get random vectors
Compute
Insert vectors
So, the view of the simulator
Note that, in both views, the input
Corrupted party
We construct a simulator
Create an empty view and then append
Pick a set
Construct polynomials
Generate the random values
Blind the polynomials’ values and get random vectors
Compute
Insert vector
So, the view of the simulator
Note that, in both views, the input
Combining the above, we can get that
Therefore, the hashing protocol is secure in the semihonest model.
We ran our experiments on a 64-bit desktop PC running Ubuntu 18.04 with Linux 4.4.0.59. All protocols were implemented and executed on the same hardware, equipped with an Intel Core i7-7700K CPU at 3.6 GHz and 8 GB of RAM. We implemented our protocol and related protocols [
We give the running times of related protocols in Table
Running time in ms with related PSI protocols.
Type | Protocol | Set size | |||
---|---|---|---|---|---|
Public-key | Cristofaro2010 [ | 779 | 12546 | 203036 | 3193920 |
Circuit | Huang2012 [ | 79 | 1377 | 32292 | — |
OT | Dong2013 [ | 105 | 448 | 4179 | 65218 |
OT | Pinkas2014 [ | 95 | 346 | 2991 | 49171 |
OT | Pinkas2018 [ | 311 | 362 | 702 | 5847 |
Novel | Our | 5095 | 132220 |
A detailed analysis with related PSI protocols is given in Table
Comparison with related PSI protocols.
Type | Protocol | Needing cryptosystem | Simulation-based security | Computation complexity | Communication complexity |
---|---|---|---|---|---|
Public-key | Cristofaro2010 [ | asym | Yes | | |
Circuit | Huang2012 [ | sym | No | | |
OT | Dong2013 [ | sym | Yes | | |
OT | Pinkas2014 [ | sym | No | | |
OT | Pinkas2018 [ | sym | No | | |
Novel | Our |
asym: public-key cryptography. sym: symmetric cryptography.
In this paper, we proposed a new approach to PSI protocol without any cryptosystem based on point-value polynomial representation and pseudorandom function and optimized it based on hashing techniques. Our protocol had high performance with set elements less than
All the pseudocode used to support the findings of this study is included within the article.
The authors declare that they have no conflicts of interest.
This work was supported by the National Natural Science Foundation of China under Grant nos. 61672010, 61702168, and 61701173 and the fund of Hubei Key Laboratory of Transportation Internet of Things (WHUTIOT-2017B001).