Efficient Cloud-Based Private Set Intersection Protocol with Hidden Access Attribute and Integrity Verification



Introduction
With the rapid progress of wireless communication and sensor technology, the Internet of Things (IoT) is changing our lives. From medical devices and air quality monitoring to intelligent street lights, energy-efficient buildings, smart homes, and more, the IoT is in more places than ever. As the number of connected IoT devices increases, the amount of data generated by these devices will also increase exponentially. According to a recent forecast from International Data Corporation (IDC) [1], there will be 41.6 billion connected IoT devices in 2025, and they will generate 79.4 ZB of data. For IoT devices with limited computation and storage resources, properly using and storing such vast amounts of data is an important challenge. Cloud computing makes it possible to process and store massive amounts of data, and it also allows data users with limited resources to access the data stored in the cloud at any time and from anywhere.
As a cryptographic tool for secure multiparty computation, private set intersection (PSI) allows two parties holding sets to compute their intersection without revealing any information other than the intersection. Since PSI was proposed by Freedman et al. in [2], many PSI schemes [3][4][5][6][7] have been put forward. The strong privacy protection of PSI gives it important real-life applications, such as private contact tracing [8], DNA testing and pattern matching [9], remote medical diagnostics [10], and the effectiveness assessment of online advertising [11]. Over the last few years, PSI has been further developed and has become very practical, with extremely fast implementations that can process millions of items in seconds. However, most PSI schemes require the two parties possessing the datasets to jointly calculate the intersection locally. Driven by the commercial value of cloud computing services, a user might delegate a cloud service provider to execute the PSI computation on the datasets outsourced to the cloud.
To our knowledge, in most of the existing PSI protocols, both participants jointly compute the intersection of their sets in an interactive manner, which means that each participant must hold a local copy of its dataset. This places a heavy burden on resource-limited users. The advent of cloud computing makes the delegation of PSI computation promising, since cloud servers can provide flexible and cost-effective storage space and on-demand computing power. Recently, several cloud-based PSI protocols [6,12,13] have been proposed. In these schemes, to achieve PSI computation, the user needs to outsource its dataset to a cloud server. However, the cloud server is not fully trusted, and it might reveal or tamper with the items in the outsourced dataset. To ensure the privacy of the outsourced dataset, its items should be processed with cryptographic algorithms before outsourcing. However, the complicated cryptographic operations not only impose a heavy computation burden on resource-constrained data users but also impede access to the dataset of the data owner. To realize access control, data owners must be online in real time to execute PSI computations with the authorized data user.
Recently, to avoid requiring the data owner to be online in real time and to achieve fine-grained access control, Ali et al. proposed an attribute-based private set intersection computation protocol [14]. However, their scheme cannot ensure the integrity of the blinded dataset returned from the cloud: when the cloud is not fully trusted, it may return only part of the blinded dataset or delete some items of the outsourced dataset. Additionally, in their scheme, the access policy needs to be embedded in the blinded dataset. This kind of direct exposure of the access policy can reveal private information about data users, since the access policy often contains sensitive information.
To solve the above issues and provide fine-grained access control, in this paper we propose an efficient cloud-based private set intersection (PSI) computation protocol. The main contributions of this work are summarized as follows:
(i) Fine-grained access control of PSI computation: the protocol provides fine-grained access control for PSI computation in the cloud environment, and access control over the outsourced dataset of data owners is realized by applying attribute-based encryption.
(ii) Offline data owner: in the PSI-computation phase, the data owner does not need to be online in real time, which reduces the communication and computation burden of the data owner.
(iii) Resisting colluding attack: a collusion between the cloud server and unauthorized data users cannot obtain any information about the outsourced datasets.
(iv) Data secrecy: for the data owner, an authorized data user with dataset Y only learns the intersection X ∩ Y; the data user obtains no other information about dataset X.
(v) Integrity: the protocol ensures the integrity of the returned blinded dataset in order to resist malicious behavior of the cloud server.
(vi) Hidden access policy: to satisfy more practical privacy requirements, the proposed protocol ensures that the cloud server cannot derive any sensitive information about the attributes from the blinded dataset.

Paper Organization.
The remainder of the paper is organized as follows. Section 2 reviews related work, and Section 3 describes some preliminaries. In Section 4, we give the problem formulation of the proposed protocol. In Section 5, the proposed PSI protocol is given. In Section 6, we analyze the security of the proposed protocol. In Section 7, we evaluate the performance of the proposed protocol. Finally, the paper is summarized in Section 8.

Related Work
Meadows proposed the first secure PSI protocol [15] based on multiplicative homomorphic techniques. Because the scheme is based on public-key cryptography, its running time is unacceptable, in particular when the dataset becomes large. In [2], Freedman et al. proposed a private set intersection protocol by means of partially homomorphic encryption and a point-value polynomial representation of sets. Later, Hazay and Nissim extended it to the malicious setting in [16]. In [17], Kissner and Song proposed privacy-preserving set operations. Their scheme can compute not only the intersection of the sets but also the union of the sets in a privacy-preserving way.
In [18], Jarecki and Liu proposed a novel PSI protocol based on the composite residuosity problem. In their scheme, the user and the server obtain the intersection of the two datasets by using a parallel oblivious pseudorandom function (OPRF). However, it relies on the common reference string model. In [19], De Cristofaro et al. proposed two PSI protocols in the malicious model. However, their schemes are unable to hide the cardinality of the user's dataset. To overcome this problem, Ateniese et al. proposed a PSI protocol [20] under the RSA assumption, but their scheme is only proven secure in the random oracle model. Based on the scheme in [18], De Cristofaro and Tsudik presented an efficient PSI protocol [21] by using OPRF techniques. Over the past few years, many efficient PSI protocols [3-13, 16, 17] have been successively proposed.
According to the cryptographic techniques used to construct them, the existing PSI protocols are mainly classified into three groups:
(i) Public-key-based PSI protocols: homomorphic encryption is a common cryptographic technique for designing PSI protocols. In the early days, most PSI protocols were constructed based on homomorphic encryption; the protocols in [2,8,10,15,22,23] are classic instances. In this type of protocol, the data owner first encrypts dataset X to obtain the corresponding ciphertext C and sends C to the data user; then, using the homomorphic properties of the encryption scheme, the data user conducts some specific operations on the ciphertext C and its dataset Y. Finally, the data owner obtains the corresponding intersection by using its private key. This type of protocol is suitable for scenarios in which both participants possess strong computing capability. In general, such protocols incur a higher computation cost since public-key cryptography is involved. However, they are suitable for designing PSI protocols with custom functionality.
(ii) Circuit-based PSI protocols: the circuit-based generic technique of secure computation is another method for designing PSI protocols. Fairplay [24] gave the first PSI protocol using Yao's garbled-circuit approach. In subsequent work, Huang et al. presented three PSI protocols based on Yao's generic garbled-circuit method [25]. Their schemes are competitive with the fastest public-key-based protocols. Afterwards, Pinkas et al. gave some new optimizations for circuit-based PSI in [26]. Using the idea of secure multiparty computation, this type of protocol transforms the specific function into a garbled Boolean circuit to realize secure computation, and its key technique is symmetric cryptography. The advantage of this general circuit approach is that it makes the design and implementation of the protocol easier. However, due to its generality, the garbled circuit makes the scalability of the protocol poor.
(iii) Oblivious transfer (OT)-based PSI protocols: the oblivious transfer protocol is a foundation of secure computation. To realize large-scale data processing, Dong et al. presented two efficient PSI protocols [27] based on Bloom filters and the OT extension protocol. Their protocols are rather efficient and highly scalable compared with some other PSI protocols. To reduce the runtime, Pinkas et al. gave an optimization of the PSI protocol of [27] using random OT extension in [26]. The core idea of this type of protocol is to have both parties collaboratively engage in many OT protocols. In general, this type of PSI protocol has lower computation and communication costs, but extra key-related computations, such as secret key agreement, are required.
From the above analysis, public-key-based PSI protocols have higher computation complexity, while circuit-based and OT-based PSI protocols are more efficient because they use symmetric encryption; however, key agreement or the secure transfer of secret keys also requires additional computation and communication overhead.

Composite Order Bilinear Group.
Throughout the paper, we only consider composite order bilinear groups since our scheme is based on such a construction. In the following, we review some concepts of such bilinear pairings:
(1) $G_1$ and $G_T$ are two cyclic groups with the same composite order $N = pq$, where $p, q$ are two distinct primes, and solving the discrete logarithm problem in group $G_i$, $i \in \{1, T\}$, is deemed to be hard.
(2) Let $e: G_1 \times G_1 \to G_T$ denote a computable bilinear map which satisfies the following criteria:
(i) Bilinearity: for arbitrary $a, b \in Z_N$ and all $Q, F \in G_1$, we have $e(Q^a, F^b) = e(Q, F)^{ab}$.
(ii) Nondegeneracy: $\exists g, h \in G_1$ such that $e(g, h)$ has order $N$ in $G_T$.
(iii) Orthogonal property: let $G_p$ and $G_q$ denote the two subgroups of $G_1$ of order $p$ and $q$, respectively. For all $h_p \in G_p$ and all $h_q \in G_q$, $e(h_p, h_q) = 1$.
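The three properties above can be checked numerically in a toy model. The following is only a sketch for intuition, not a secure pairing and not the PBC "Type A1" construction: it assumes that an element $g^x$ of $G_1$ is represented by its exponent $x \bmod N$, and that the pairing multiplies exponents. The prime values and all function names are hypothetical choices for illustration.

```python
# Toy (insecure) model of a composite-order bilinear group, for intuition only.
# An element g^x of G_1 is stored as the exponent x mod N, and an element
# e(g, g)^z of G_T is stored as the exponent z mod N (so the identity is 0).
p, q = 1009, 1013      # small distinct primes; the paper uses 512-bit primes
N = p * q

def g1_mul(x, y):
    """Group operation in G_1: g^x * g^y = g^(x + y)."""
    return (x + y) % N

def g1_pow(x, a):
    """Exponentiation in G_1: (g^x)^a = g^(x * a)."""
    return (x * a) % N

def gt_pow(z, a):
    """Exponentiation in G_T: (e(g, g)^z)^a = e(g, g)^(z * a)."""
    return (z * a) % N

def pair(x, y):
    """Toy pairing: e(g^x, g^y) = e(g, g)^(x * y)."""
    return (x * y) % N

g = 1       # generator of G_1, order N
g_p = q     # g^q generates the subgroup G_p of order p
g_q = p     # g^p generates the subgroup G_q of order q

# Bilinearity: e(Q^a, F^b) = e(Q, F)^(ab)
bilinear_ok = pair(g1_pow(5, 3), g1_pow(7, 4)) == gt_pow(pair(5, 7), 3 * 4)
# Orthogonality: e(h_p, h_q) is the identity of G_T
orthogonal_ok = pair(g1_pow(g_p, 5), g1_pow(g_q, 7)) == 0
```

In this model the orthogonal property is immediate: the product of a multiple of $q$ and a multiple of $p$ is a multiple of $N$, which is the identity exponent.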

The Decisional Bilinear Diffie-Hellman Assumption in G.
Let $(g, g^a, g^b, g^c, T) \in G_1^4 \times G_T$ be a random 5-tuple, where $a, b, c \in Z_p$. The decisional bilinear Diffie-Hellman (DBDH) problem is to decide whether $T = e(g, g)^{abc}$. The advantage of a PPT algorithm $A$ in solving the DBDH problem in $G_1$ is defined as
$\varepsilon = \Pr[e(g, g)^{abc} = T \leftarrow A(g, g^a, g^c, g^b)]$. (1)
We say that the DBDH assumption holds if no PPT algorithm $A$ is capable of distinguishing $e(g, g)^{abc}$ from $T$ with a nonnegligible probability $\varepsilon$.
3.2. Access Tree. Access trees can make the representation of access control policies easier to understand. In what follows, we explain the access trees used in our constructions.
An access tree $T$ is a tree-like access structure in which each leaf node is associated with an attribute value and each inner node $l$ is represented by a threshold gate $(num_l, k_l)$, where $num_l$ is the number of children of inner node $l$ and $k_l$ is its threshold value satisfying $0 < k_l \le num_l$. Specifically, when $k_l = 1$ or $k_l = num_l$, the corresponding threshold gate is an OR-gate or an AND-gate, respectively. For each leaf node $l$, the threshold value is $k_l = 1$.
For the sake of presentation, the children of each node $l$ are ordered from 1 to $num_l$. We define parent(l) as the parent of node $l$, index(l) as the number associated with node $l$, and att(l) as the attribute associated with the leaf node $l$.
Assume that $R$ is the root of an access tree $T$; then we use $T_R$ to represent this tree, and $T_x$ denotes the subtree of $T$ rooted at node $x$ if $x$ is an inner node of $T$. If an attribute set $S$ satisfies the subtree $T_x$, we write $T_x(S) = 1$. Satisfaction is defined by the following two cases:
(1) When $x$ is a leaf node, $T_x(S) = 1$ is returned if and only if $att(x) \in S$.
(2) When $x$ is a nonleaf node, $T_x(S)$ is computed recursively: if at least $k_x$ of the children $z$ of node $x$ satisfy $T_z(S) = 1$, then $T_x(S) = 1$ is returned.
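The recursive satisfaction test $T_x(S)$ can be sketched as follows. This is a minimal illustration under assumed names: the `Node` class, the `AND`/`OR` helpers, and the example attributes are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Node:
    threshold: int = 1                  # k_l; equals 1 for every leaf
    attribute: Optional[str] = None     # att(l), set only for leaves
    children: List["Node"] = field(default_factory=list)

def satisfies(node: Node, attrs: Set[str]) -> bool:
    """Return True iff the attribute set satisfies the subtree rooted at node."""
    if not node.children:               # case (1): leaf node
        return node.attribute in attrs
    # case (2): inner node, at least k_x children must be satisfied
    satisfied = sum(1 for child in node.children if satisfies(child, attrs))
    return satisfied >= node.threshold

def leaf(attribute: str) -> Node:
    return Node(attribute=attribute)

def AND(*children: Node) -> Node:       # threshold gate with k_l = num_l
    return Node(threshold=len(children), children=list(children))

def OR(*children: Node) -> Node:        # threshold gate with k_l = 1
    return Node(threshold=1, children=list(children))

# Hypothetical policy: doctor AND (cardiology OR oncology)
policy = AND(leaf("doctor"), OR(leaf("cardiology"), leaf("oncology")))
```

For example, `satisfies(policy, {"doctor", "oncology"})` holds, while the attribute set `{"doctor"}` alone does not satisfy the policy.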

Problem Formulation
In this section, to better understand the motivation of the proposed scheme, we will give the system model and threat model that correspond to our protocol.

System Model.
For a cloud-based PSI computation protocol with fine-grained access control and integrity verification, the system model is shown in Figure 1. The system model consists of four entities: the key generation center (KGC), the cloud service provider (CSP), data owners, and data users. The roles of these entities are described as follows.

Key Generation Center.
It is responsible for establishing system parameters and generating the secret keys of the attributes for data users. In addition, it also generates a public-private key pair for the data owners' signature algorithm.

Cloud Service Provider.
It has abundant storage space and powerful computing capability, and it can provide storage services of the outsourced dataset for data owners and PSI computation services for data users.

Data Owner.
It is the owner of a dataset X. To achieve fine-grained access control, the data owner needs to define an access control policy before outsourcing the dataset; it then blinds the dataset based on the defined access policy.

Data User.
It also possesses a dataset Y and can request the CSP to generate a token in order to compute private set intersection X ∩ Y. It is worth noting that only data user whose attributes satisfy access policy defined by data owner can obtain the returned PSI token by the CSP.

Threat Model.
In our proposed protocol, as in [6,11,14,17], the CSP is not a fully trusted entity: it may attempt to tamper with or delete items in the outsourced dataset, and it might also try to extract sensitive information from the outsourced dataset by colluding with unauthorized data users. A data user is treated as a malicious entity. It may collude with the CSP to obtain more information beyond the intersection. Additionally, an unauthorized data user may attempt to obtain the qualification for PSI computation. The KGC is assumed to be a trusted entity. It is responsible for generating secret keys for the attribute sets of data users and public/private key pairs for data owners. The data owner is also assumed to be a trusted entity. It honestly encrypts its dataset and outsources it to the CSP. Data owners never reveal the elements of their private datasets.

Security Goals.
The goals of the proposed protocol are as follows: (1) It achieves fine-grained access control, so that only data users satisfying the access policy defined by the data owner can conduct PSI computation. (2) The authorized data users learn nothing except the intersection of the datasets; that is to say, it achieves data secrecy.

Our Concrete Construction
In this section, we present a concrete construction of the cloud-based PSI protocol with fine-grained access control and integrity verification. The protocol consists of the following three stages, which are described in detail below.

System Initiation.
In this stage, key generation center (KGC) is responsible for initializing system parameters and producing secret key of the data user with attribute set Att.
To do it, it needs to execute the following two algorithms: Setup and KeyGen.
Setup. Taking a security parameter $\lambda$ and the universal attribute set $U$ as inputs, it outputs $(G_1, G_T, N = p \cdot q, e, G_p, G_q, g_p, g_q) \leftarrow Gen(1^{\lambda})$, where $G_1$ and $G_T$ are cyclic groups of order $N$ and $p, q$ are two distinct 512-bit primes. Let $G_p$ and $G_q$ denote the two subgroups of $G_1$ of order $p$ and $q$, respectively, and let $g_p$ and $g_q$ be the generators of $G_p$ and $G_q$, respectively; $e: G_1 \times G_1 \to G_T$ is the bilinear map. For the universal attribute set $U = (w_1, \ldots, w_n)$, where $n$ is the maximum number of attributes, it randomly chooses $\alpha, t_1, t_2, \ldots, t_n \in Z_p$ and $R_0, R_1, \ldots, R_n \in G_q$ to compute the public keys $P_x = g_p \cdot R_0$, $P_y = e(g_p, g_p)^{\alpha}$, and $T_i = g_p^{t_i} \cdot R_i$ $(1 \le i \le n)$. Additionally, KGC chooses two collision-resistant hash functions $H$ and $H_0$. Finally, the public parameters Param, which include $N, e, H, H_0, P_x, P_y$, and $T_i$ $(1 \le i \le n)$, are published, and the master secret keys $(\alpha, t_i\ (1 \le i \le n))$ are securely stored. Note that each $T_i$ corresponds to the attribute $w_i$.

KeyGen (Att, α, a, β). A data user with an attribute set $Att = (w_1, \ldots, w_t)$ who wants to register with the system sends its attribute set $Att$ to KGC, and KGC then uses its master secret key to generate the secret key for the data user with attribute set $Att$.
(1) First of all, KGC randomly chooses a number $r \in Z_p$ to compute $sk_0 = g_p^{\alpha - r}$, the form used in the correctness analysis.
(2) Then, for each attribute $w_i$ in $Att$, it computes the corresponding key component $sk_i$. The resultant secret key is $sk_{Att} = (sk_0, \{sk_i\}_{w_i \in Att})$.
In addition, for the data owner, it randomly chooses $sk_o \in Z_p$ to compute the public key $pk_o = g_p^{sk_o}$. This public-private key pair $(pk_o, sk_o)$ is used to generate digital signatures.
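The security analysis later assumes that this signature is a Schnorr signature. As a rough illustration of how a key pair of the form $pk_o = g^{sk_o}$ signs and verifies a message, the sketch below implements textbook Schnorr over a tiny prime-order subgroup of $Z_P^*$. The group parameters, the use of SHA-256, and all function names are assumptions made for the example; the paper instead works in the subgroup $G_p$ of a pairing group, and the toy parameters offer no security.

```python
import hashlib
import secrets

# Toy Schnorr group (far too small for real use): G has prime order Q in Z_P^*.
P, Q = 2267, 103                 # Q divides P - 1 = 2 * 11 * 103
G = pow(2, (P - 1) // Q, P)      # element of order Q

def _challenge(commitment: int, message: bytes) -> int:
    """Hash the commitment and the message to a challenge in Z_Q."""
    data = commitment.to_bytes(4, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen():
    """Return (sk, pk) with pk = G^sk, mirroring pk_o = g_p^{sk_o}."""
    sk = 1 + secrets.randbelow(Q - 1)
    return sk, pow(G, sk, P)

def sign(sk: int, message: bytes):
    """Return a signature (e, s) with s = k + sk * e mod Q."""
    k = 1 + secrets.randbelow(Q - 1)          # fresh nonce per signature
    e = _challenge(pow(G, k, P), message)
    s = (k + sk * e) % Q
    return e, s

def verify(pk: int, message: bytes, signature) -> bool:
    """Recompute the commitment G^s * pk^(-e) and compare challenges."""
    e, s = signature
    commitment = pow(G, s, P) * pow(pk, Q - e, P) % P
    return _challenge(commitment, message) == e

sk_o, pk_o = keygen()
blinded_items = b"C_x1|C_x2|C_x3"             # placeholder for the signed string
delta_s = sign(sk_o, blinded_items)
```

A signature produced by `sign` always passes `verify` under the matching public key. With these toy parameters a tampered message is rejected only with probability about $1 - 1/Q$, which is one reason real deployments use large groups.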

Blinding of Dataset.
For a data owner, if it wants to outsource the dataset $X = \{x_i\}_{i \in I}$ to the cloud server, it needs to blind the dataset before outsourcing to ensure its security. To achieve confidentiality and fine-grained access control of the data, it executes the following Blind algorithm.

Blind (Param, T, X). For a dataset $X$ and an access structure $T$, the algorithm takes $X$ and $T$ as inputs and outputs the blinded data. The detailed process is given as follows.
(1) First of all, it picks a number $t \in Z_p$ and $R_0' \in G_q$ at random to compute $C = P_y^t$, $C_0 = P_x^t \cdot R_0'$, and $C_{x_i} = H_0(H(C \| pk_o) \oplus H_0(x_i))$ for each $x_i \in X$, where $|I| = |X|$ is the size of the dataset $X$ and $\|$ is the concatenation operator.
(2) Then, it produces the signature $\delta_s$ on $C_{x_1} \| \cdots \| C_{x_{|I|}}$.
(3) Next, it processes the access tree $T$, where index(v) denotes the index of node $v$ among the children of its parent node, and parent(v) is the parent node of node $v$. Finally, for each leaf node $i$ of access tree $T$ corresponding to an attribute $w_i$, the blinded data $C_i$ is computed, where $R_i'$ is a random element in group $G_q$. Note that $T_i$ corresponds to the attribute $w_i$.
(4) At last, the blinded dataset of $X$ is $BT = (T, \delta_s, C_0, \{C_{x_i}\}_{i \in I}, \{C_j\}_{j \in T})$, and the data owner uploads it to the cloud server.

PSI Computation.
To obtain the private set intersection with a blinded dataset BT of dataset X uploaded by a specific data owner, the stage is divided into three parts: Token1 generation, Token2 generation, and set intersection computation. Firstly, a data user with dataset Y runs the TokenGen1 algorithm to produce a PSI-token 1 for the cloud server, and then the cloud server runs the TokenGen2 algorithm to produce a PSI-token 2 by using PSI-token 1. Finally, the data user computes the intersection X ∩ Y of the two datasets X and Y by using the PSI-token 2 produced by the cloud server. The detailed processes are given as follows.
5.3.1. TokenGen1 (Param, sk_Att). Taking the system parameters Param and the secret key $sk_{Att}$ of the data user as inputs, this algorithm randomly selects a number $\mu \in Z_p$ to compute a PSI-token 1 $ToK_1$, whose $i$-th element is $sk_i^{\mu}$.

TokenGen2 (Param, ToK_1, BT). Taking the PSI-token 1 $ToK_1$ and the blinded dataset $BT$ as inputs, this algorithm first verifies whether $Att$ satisfies the access tree $T$ associated with the blinded dataset $BT$. If it does not, the algorithm outputs ⊥ and aborts.
For the sake of illustration, we define a recursive algorithm $Tok_2(\cdot)$ that takes as input an access tree $T$, an attribute set $Att$, the blinded dataset $BT$, and a node from $T$.
For each node in $T$, the cloud server executes the recursive algorithm $Tok_2(\cdot)$ as follows:
(1) If $\theta_i \in T$ is a leaf node and $att(\theta_i) \in Att$, where $w_i = att(\theta_i)$ is the attribute corresponding to leaf node $\theta_i$, then it computes a value from $sk_i^{\mu}$ and $C_i$, where $sk_i^{\mu}$ is the $i$-th element of PSI-token 1 $ToK_1$ and $C_i$ is the $i$-th element of $\{C_j\}_{j \in T}$.
(2) If $\theta_i \in T$ is a nonleaf node, the recursive algorithm runs as follows: for the children $child(\theta_i)$ of node $\theta_i$, it calls $Tok_2(T, Att, BT, index(j, child(\theta_i)))$ for $j = 1, \ldots, \omega$ and stores the results, where $child(\theta_i)$ denotes all children of $\theta_i$ and $index(j, child(\theta_i))$ denotes the $j$-th child node of $\theta_i$. The recursion descends until the children of a node are leaf nodes, and the stored child values are then combined by using the Lagrange coefficients of the child nodes, where $\omega$ is the threshold value of the parent node.
Through recursively running $Tok_2(T, Att, BT, \theta_i)$ in a bottom-to-top manner in access tree $T$, we obtain the PSI-token 2 $ToK_2$, which satisfies $ToK_2^{\mu^{-1}} = e(g_p, g_p)^{rt}$ as used in the correctness analysis. In the end, the algorithm returns the PSI-token 2 $ToK_2$, $\{C_{x_i}\}_{i \in I}$, and the signature $\delta_s$ of $\{C_{x_i}\}_{i \in I}$.
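The combination step at a threshold gate relies on Lagrange interpolation at zero. The sketch below shows that step on plain field elements: a secret is shared among the children of a gate with a random polynomial, and any threshold-sized subset of child values recovers it. The standard coefficient formula $\Delta_{i,S}(0) = \prod_{j \in S, j \ne i} \frac{-j}{i - j}$ is assumed here, since the paper's own expression is not reproduced in this text; in the protocol the same linear combination would be applied in the exponent of group elements rather than to plain integers. The modulus and function names are hypothetical.

```python
import secrets

P = 2**61 - 1   # a prime standing in for the subgroup order p (assumption)

def make_shares(secret: int, threshold: int, num_children: int) -> dict:
    """Share `secret` as f(0) of a random polynomial of degree threshold - 1."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(threshold - 1)]
    def f(x: int) -> int:
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    # the child with index i receives f(i), for i = 1..num_children
    return {i: f(i) for i in range(1, num_children + 1)}

def lagrange_at_zero(i: int, indices) -> int:
    """Coefficient prod_{j != i} (0 - j) / (i - j) mod P."""
    num, den = 1, 1
    for j in indices:
        if j != i:
            num = num * (-j) % P
            den = den * (i - j) % P
    return num * pow(den, -1, P) % P      # modular inverse, Python 3.8+

def combine(shares: dict) -> int:
    """Recover f(0) from the values of the satisfied children."""
    idx = list(shares)
    return sum(shares[i] * lagrange_at_zero(i, idx) for i in idx) % P

# A 2-out-of-3 gate: any two satisfied children recover the parent's value.
shares = make_shares(secret=42, threshold=2, num_children=3)
recovered = combine({1: shares[1], 3: shares[3]})
```

With fewer than `threshold` child values the interpolation yields an unrelated field element, which is what prevents a user whose attributes do not satisfy the gate from reconstructing the parent value.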

Set Intersection Computation. Upon receiving $ToK_2$ and $\{C_{x_i}\}_{i \in I}$, the algorithm takes $ToK_2$, $\{C_{x_i}\}_{i \in I}$, $Y$, and $\delta_s$ as inputs and executes Algorithm 1. In this algorithm, lines 1-3 check the validity of the signature $\delta_s$, which verifies the integrity of the dataset returned by the cloud server. Lines 7-17 search for the intersection of the two datasets: if $C_{x_i} = \tau_j$ holds, then an element of the intersection is found.
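The matching step can be sketched with ordinary hash functions. The example assumes that the data user derives, for each of its items $y_j$, a tag $\tau_j = H_0(H(Tmp \| pk_o) \oplus H_0(y_j))$ of the same form as the blinded items $C_{x_i}$, where $Tmp$ is the value recovered from PSI-token 2; this definition of $\tau_j$ is inferred from the correctness analysis and is not quoted from Algorithm 1. SHA-256 with domain-separation prefixes stands in for $H$ and $H_0$, group elements are represented by placeholder byte strings, and the signature check of lines 1-3 is omitted.

```python
import hashlib

def H(data: bytes) -> bytes:
    """Stand-in for the hash function H (assumption: SHA-256 with a prefix)."""
    return hashlib.sha256(b"H|" + data).digest()

def H0(data: bytes) -> bytes:
    """Stand-in for the hash function H_0."""
    return hashlib.sha256(b"H0|" + data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

def blind_item(mask: bytes, pk_o: bytes, item: str) -> bytes:
    """Compute H_0(H(mask || pk_o) XOR H_0(item))."""
    return H0(xor(H(mask + pk_o), H0(item.encode())))

def intersect(blinded_x, tmp: bytes, pk_o: bytes, dataset_y):
    """Return the items of Y whose tag matches some blinded item of X."""
    lookup = set(blinded_x)
    return [y for y in dataset_y if blind_item(tmp, pk_o, y) in lookup]

# Data owner side: blind X under the mask C = P_y^t (placeholder bytes here).
C = b"serialized P_y^t"
pk_o = b"serialized pk_o"
X = ["alice", "bob", "carol"]
blinded_X = [blind_item(C, pk_o, x) for x in X]

# Data user side: an authorized user recovers Tmp = C and matches its own items.
Y = ["bob", "dave", "carol"]
common = intersect(blinded_X, C, pk_o, Y)
```

A user who cannot recover the correct mask computes tags under a different key and, barring hash collisions, finds no matches.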

Correctness.
In this subsection, we show that the proposed scheme is correct. In the blinding stage, the dataset $X = \{x_i\}_{i \in I}$ is blinded into $BT$, and each $C_{x_i}$ in $BT$ has the format $C_{x_i} = H_0(H(C \| pk_o) \oplus H_0(x_i))$. In addition, in the stage of intersection computation, the data user can use PSI-token 2 to obtain the following relation:
$Tmp = ToK_2^{\mu^{-1}} \cdot e(C_0, sk_0) = e(g_p, g_p)^{rt} \cdot e(P_x^t \cdot R_0', g_p^{\alpha - r}) = e(g_p, g_p)^{rt} \cdot e(g_p^t, g_p^{\alpha - r}) = e(g_p, g_p)^{\alpha t} = P_y^t$.
The third equality holds because the $G_q$ components of $P_x^t \cdot R_0'$ vanish when paired with an element of $G_p$, by the orthogonal property.
Thus, the data user can obtain $X \cap Y$ by running Algorithm 1 in the PSI computation phase, which means that the proposed scheme is correct.
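The chain of equalities in the correctness argument can be replayed in a toy exponent model of the composite-order group. This is only a sanity-check sketch under stated assumptions: group elements are represented by exponents modulo $N$, the pairing multiplies exponents, the primes are tiny, and $ToK_2$ is taken to be $e(g_p, g_p)^{rt\mu}$, which is what the first equality of the derivation implies. All names are hypothetical.

```python
import secrets

p, q = 1009, 1013            # toy primes; the scheme uses 512-bit primes
N = p * q
g_p, g_q = q, p              # exponents of generators of G_p and G_q

def mul(x, y):
    """Group operation (G_1 or G_T) in exponent form."""
    return (x + y) % N

def power(x, a):
    """Exponentiation (G_1 or G_T) in exponent form."""
    return (x * a) % N

def pair(x, y):
    """Toy pairing e(g^x, g^y) = e(g, g)^(x * y)."""
    return (x * y) % N

def correctness_holds() -> bool:
    alpha, r, t, mu = (1 + secrets.randbelow(p - 1) for _ in range(4))
    R0 = power(g_q, 1 + secrets.randbelow(q - 1))        # R_0 in G_q
    R0_prime = power(g_q, 1 + secrets.randbelow(q - 1))  # R_0' in G_q

    P_x = mul(g_p, R0)                       # P_x = g_p * R_0
    P_y = power(pair(g_p, g_p), alpha)       # P_y = e(g_p, g_p)^alpha
    sk0 = power(g_p, (alpha - r) % p)        # sk_0 = g_p^(alpha - r)
    C0 = mul(power(P_x, t), R0_prime)        # C_0 = P_x^t * R_0'
    tok2 = power(pair(g_p, g_p), r * t * mu) # assumed form of PSI-token 2

    mu_inv = pow(mu, -1, p)                  # inverse of mu modulo p
    tmp = mul(power(tok2, mu_inv), pair(C0, sk0))
    return tmp == power(P_y, t)              # Tmp = P_y^t
```

Calling `correctness_holds()` repeatedly with fresh randomness returns `True` each time, because the $G_q$ blinding factors contribute multiples of $N$ to the exponent and therefore disappear.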

Security Analysis
In this section, we show that the proposed PSI scheme satisfies data secrecy and resists the adaptive chosen-dataset attack.

Theorem 1. Suppose that the decisional bilinear Diffie-Hellman problem (DBDHP) in $G_1$ is difficult to solve; then the proposed PSI scheme is adaptively secure against the chosen-dataset attack in the standard model.
Proof. Let $A$ be a probabilistic polynomial time (PPT) adversary who launches an attack on the proposed PSI scheme. If it breaks the proposed PSI scheme with a nonnegligible probability $\epsilon$, then we are able to construct a challenger CH which solves the DBDH problem.
First of all, let us recall the DBDH problem in the subgroup $G_p$ of group $G_1$. Assume that $\vec{Y} = (g_p, g_p^a, g_p^b, g_p^c, Z)$ is an instance of the DBDH problem, where $a, b, c$ are random numbers and $Z \in_R G_T$; the goal is to determine whether $e(g_p, g_p)^{abc} = Z$ holds. In the following interactive game, CH attempts to solve the DBDH problem by invoking $A$ as a subroutine.
6.1. Init Phase. In this phase, the adversary $A$ randomly selects the challenged access structure $T^*$ and an attribute set $U$ and sends them to the challenger.

Setup.
In this phase, CH initializes the system parameters based on the instance of the DBDH problem. Firstly, it chooses $\alpha' \in Z_N$ to compute $P_y = e(g_p^a, g_p^b) \cdot e(g_p, g_p)^{\alpha'}$.
This implies $\alpha = ab + \alpha'$. Also, for $i = 1$ to $n$, it randomly chooses $t_i \in Z_p$, $R_0 \in G_q$, and $R_i \in G_q$ to compute the public key component $T_i$ for every $w_i \in U$. Additionally, it sets $P_x = g_p \cdot R_0$ and chooses $sk_o \in Z_p$ to set the data owner's public key as $pk_o = g_p^{sk_o}$. Finally, it sends $(P_x, P_y, T_1, \ldots, T_n)$ to the adversary $A$.

Phase 1.
To simulate the game, A can adaptively make a series of queries in this phase.
(i) KeyGen query: when the adversary $A$ makes a KeyGen query with an attribute set $W = (w_1, \ldots, w_l)$, where $w_i \notin T^*$, CH responds by executing the following steps:
(1) First of all, it randomly chooses $r' \in Z_p$ to compute $sk_0$, which implicitly defines $r = ab + r'b$.
(2) For $i = 1$ to $l$, because all $w_i \notin T^*$, it can compute the key component $sk_i$.
(3) Finally, the secret key $sk_W = (sk_1, \ldots, sk_l, sk_0)$ of the attribute set $W$ is returned to the adversary $A$.
Subsequently, CH randomly flips a coin $c \in \{0, 1\}$ to produce the following blinded dataset:
(1) It sets $C^* = Z \cdot e(g_p^c, g_p^{\alpha'})$.
(2) Assume that root denotes the root node of $T^*$. Working from top to bottom to compute the leaf node information, we first construct a polynomial $f_{root}(x)$ for the root node. Although $f_{root}(x)$ is unknown, we can compute the exponential form of $f_{root}(x)$, where $a_i \in Z_p$ are random numbers and $\pi$ is the threshold value of the root node in access tree $T^*$. For each child node $v_i$ of the root node, we can compute the exponential form of $f_{v_i}(x)$, the polynomial corresponding to the node $v_i$ that is the $i$-th child of the root node. Applying the same method from top to bottom, we obtain the exponential form of $f_i(x)$, the polynomial of leaf node $i$ in access tree $T^*$. Because $T_i = g_p^{t_i} \cdot R_i$ and $R_i$ is a random element, the blinded leaf components can be computed.
(3) Then, it produces a signature $\delta_s^* = (\delta_1^*, \delta_2^*)$ by randomly choosing $t_0^*$, and the blinded dataset $BT^*$, which includes $\delta_s^*$ and $\{C_{x_i}\}_{i \in I}$, is returned to $A$.
Obviously, when $Z = e(g_p, g_p)^{abc}$, we have
$C^* = Z \cdot e(g_p^c, g_p^{\alpha'}) = e(g_p, g_p)^{abc + \alpha' c} = e(g_p, g_p)^{(ab + \alpha')c} = e(g_p, g_p)^{\alpha c} = P_y^c$. (22)
Thus, the blinded version of the dataset $X_c$ produced in the above way is valid.
6.5. Phase 2. In this phase, $A$ can still issue a series of new queries as in Phase 1, but the following restrictions must be satisfied:
(1) KeyGen queries for $T^*$ are not allowed.
(2) TokenGen1 queries for the blinded dataset $BT^*$ are not allowed.
6.6. Guess. Eventually, the adversary $A$ returns its guess $c'$. If $c = c'$, then CH outputs true. Otherwise, false is returned.
From the adversary $A$'s point of view, the simulation of CH is indistinguishable from the real game. When $Z$ is a random element of $G_T$, the produced blinded dataset $BT^*$ has the same distribution as a real blinded dataset and is independent of the choice of dataset $X_c$. In this case, the probability of $A$ guessing $c' = c$ is $\frac{1}{2}$. When $Z = e(g_p, g_p)^{abc}$ holds, the produced blinded dataset $BT^*$ is a valid one. If the adversary $A$ breaks the proposed scheme with nonnegligible probability $\varepsilon$, then CH can decide the DBDH instance $(g_p, g_p^a, g_p^b, g_p^c, Z)$ in $G_1$ with probability nonnegligibly greater than $\frac{1}{2}$. However, due to the difficulty of solving the DBDH problem in $G_1$, the probability of the adversary $A$ breaking the proposed scheme is negligible.

Theorem 2. If it is infeasible to generate a message that yields a given hash value for one-way hash function H(·), then our proposed PSI scheme can satisfy data secrecy in the standard model.
Proof. Suppose that there exists an adversary $A_{II}$ which breaks data secrecy in the proposed scheme; then we can construct an algorithm $B$ to solve the one-way problem of the hash function. Firstly, we review this problem: given a hash value $y$ and a one-way cryptographic hash function $H_0$, the goal is to find a number $\eta$ such that $H_0(\eta) = y$. To break the one-way problem of the hash function, the algorithm $B$ initializes the system parameters and plays an interactive game with the adversary $A_{II}$. The detailed process is given as follows:
(i) Setup: the algorithm $B$ takes a security parameter $\lambda$ as input and outputs the system parameters $(G_1, G_T, N = p \cdot q, e, G_p, G_q, g_p, g_q) \leftarrow Gen(1^{\lambda})$. For the universal attribute set $U = (w_1, \ldots, w_n)$, it randomly chooses $\alpha, t_1, t_2, \ldots, t_n \in Z_p$ and $R_0, R_1, \ldots, R_n \in G_q$ to compute the public keys $P_x = g_p \cdot R_0$, $P_y = e(g_p, g_p)^{\alpha}$, and $T_i = g_p^{t_i} \cdot R_i$ $(1 \le i \le n)$. Additionally, it chooses two collision-resistant hash functions $H$ and $H_0$. The public parameters
$(N, e, H, H_0, P_x, P_y, T_i\ (1 \le i \le n))$ (26)
are sent to the adversary $A_{II}$.
(ii) Phase 1: in this phase, the adversary $A_{II}$ is able to adaptively issue KeyGen queries and TokenGen1 queries. When the adversary $A_{II}$ issues such queries, the algorithm $B$ runs KeyGen() and TokenGen1() to respond to them, since it has the master secret key.
(iii) Challenge: to produce a challenge, the adversary $A_{II}$ submits an access tree $T^*$ and a $(k-1)$-element dataset $X^* = \{x_1^*, \ldots, x_{k-1}^*\}$ to the algorithm $B$. The algorithm $B$ randomly chooses a number $r \in Z_p$ to compute the blinded dataset $BT^*$.
(3) The other components of $BT^*$ are computed by the Blind algorithm, since the algorithm $B$ possesses the master secret key and the private key of the data owner. The resulting $BT^*$ is returned to the adversary $A_{II}$.
(iv) Phase 2: in this phase, the adversary $A_{II}$ is still able to issue the same queries as those in Phase 1.
(v) Guess: at last, the adversary $A_{II}$ outputs its guess $x_k^*$ for position $k$ of the dataset $X^*$ with nonnegligible probability (note that $x_k^* = d^*$ or $x_k^* \ne d^*$). If the adversary $A_{II}$ wins the game, then $x_k^*$ satisfies the blinding relation for position $k$. Thus, given a hash value $y$, the algorithm can find a value $H(P_y^r \| pk_o) \oplus H_0(y^*)$ that satisfies the relation required by the one-way problem. Obviously, this contradicts the one-way property of the hash function. Therefore, for two datasets $X^*$ and $Y^*$, the adversary can obtain nothing about $Y^* \setminus (X^* \cap Y^*)$ or $X^* \setminus (X^* \cap Y^*)$, except the intersection $X^* \cap Y^*$. □

Theorem 3. The proposed scheme can ensure the integrity of the blinded dataset returned by the cloud, assuming that the underlying Schnorr signature is unforgeable.
Proof. Suppose that there exists an adversary $A_{III}$ which breaks the integrity of the returned blinded dataset in the proposed scheme; then a challenger can construct an algorithm $B$ which breaks the Schnorr signature, that is, which outputs a new valid message-signature pair.
Let CH be a challenger. To break the security of the Schnorr signature, the challenger CH runs as follows:
(i) Setup: the algorithm $B$ takes the security parameter $\lambda$ as input and outputs the system parameters $(G_1, G_T, N = p \cdot q, e, G_p, G_q, g_p, g_q) \leftarrow Gen(1^{\lambda})$. For the universal attribute set $U = (w_1, \ldots, w_n)$, it randomly chooses $\alpha, t_1, t_2, \ldots, t_n \in Z_p$ and $R_0, R_1, \ldots, R_n \in G_q$ to compute the public keys. The public parameters are sent to the adversary $A_{III}$, and the master private key $(\alpha, t_i\ (1 \le i \le n))$ is kept secret.
(ii) Blinding dataset queries: the adversary $A_{III}$ is able to adaptively issue Blinding Dataset queries with a dataset $X = \{x_i\}_{1 \le i \le |X|}$ and an access tree $T$. When receiving a query $(X, T)$, the challenger executes as follows:
(1) It picks a number $t \in Z_p$ at random to compute the blinded items $C_{x_i}$.
(2) Then, it sends $C_{x_1} \| \cdots \| C_{x_{|I|}}$ to algorithm $B$, and algorithm $B$ makes a signing query to the signing oracle in the Schnorr signature security game with the string $C_{x_1} \| \cdots \| C_{x_{|I|}}$. After obtaining the returned signature $\delta_s$, $B$ returns it to the challenger CH.

Security and Communication Networks
(3) Next, the challenger CH computes the following values by adopting to access trees T: where R i ′ is a random element in group G q . (4) Finally, it returns the blinded dataset BT � (T, δ s , C 0 , C x i i∈I , C i i∈T ) to the adversary A III .
(iii) Output: eventually, the adversary A III outputs (Tok * 2 , C * x i i∈I , δ * s ). e adversary A III wins the game if δ s is a valid signature on C * x 1 � � � �· · · ‖C * x |X| and the string C * x 1 � � � �· · · ‖C * x |X| is never made signing query to B. If A III wins this game, then CH sets δ * s as the output of B algorithm. Because δ s is a valid signature on C * x 1 � � � �· · · ‖C * x |X| and never makes a signing query with string C * x 1 � � � �· · · ‖C * x |X| , it means that algorithm B successfully breaks the unforgeability of Schnorr signature.
This contradicts the unforgeability of the Schnorr signature.
Thus, the proposed PSI protocol ensures the integrity of the dataset. □
Theorem 4. Our proposed PSI scheme achieves hidden access attributes.
Proof. In the proposed scheme, attribute anonymity is achieved through the orthogonality property of composite-order bilinear groups. In the encryption phase, the random elements $R_0$ and $R_j$ are introduced into $C_0$ and $\{C_j\}_{j \in T}$ of the blinded dataset. This effectively prevents a malicious attacker from testing the access policy against a guessed attribute $w_j'$ and thereby guessing the access structure. Thus, the scheme achieves hidden access attributes.
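The integrity argument above relies on a Schnorr signature computed over the concatenation of the blinded elements. The following Python fragment is a minimal sketch of that idea, not the implementation used in the experiments. The group (integers modulo the Mersenne prime $2^{127}-1$ with exponents reduced modulo the group order), the generator, the SHA-256 challenge hash, and all function names are illustrative assumptions and are far too weak for real use; a deployment would use a standard prime-order group.

```python
import hashlib
import secrets

# Toy parameters (assumption, NOT secure): multiplicative group modulo the
# Mersenne prime 2^127 - 1, with exponents reduced modulo the group order.
P = 2**127 - 1
N = P - 1
G = 3


def challenge(r: int, message: bytes) -> int:
    """Hash the commitment r together with the message into an exponent."""
    digest = hashlib.sha256(r.to_bytes(16, "big") + message).digest()
    return int.from_bytes(digest, "big") % N


def keygen():
    """Return (secret key, public key = G^sk mod P)."""
    sk = secrets.randbelow(N - 1) + 1
    return sk, pow(G, sk, P)


def sign(sk: int, message: bytes):
    """Schnorr signature (e, s) with s = k + sk * e mod N."""
    k = secrets.randbelow(N - 1) + 1
    r = pow(G, k, P)
    e = challenge(r, message)
    return e, (k + sk * e) % N


def verify(pk: int, message: bytes, signature) -> bool:
    """Recompute r = G^s * pk^(-e) and compare the challenge."""
    e, s = signature
    r = (pow(G, s, P) * pow(pk, (-e) % N, P)) % P
    return challenge(r, message) == e


# Hypothetical blinded elements: fixed-length byte strings, so that plain
# concatenation C_{x_1} || ... || C_{x_n} is unambiguous.
blinded = [hashlib.sha256(x).digest() for x in (b"x1", b"x2", b"x3")]
sk, pk = keygen()
sig = sign(sk, b"".join(blinded))
print(verify(pk, b"".join(blinded), sig))
```

If the cloud returns a modified or truncated list of blinded elements, the concatenated string changes and verification under the data owner's public key fails, which is the property the reduction exploits.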

Performance Analysis
In this section, we evaluate the efficiency of the proposed PSI protocol in terms of computational cost. We do not compare it with other PSI protocols, because such a comparison would not be fair: the proposed protocol aims to support fine-grained access control with hidden attributes and to ensure the integrity of the returned dataset, whereas most existing PSI protocols aim at security against semihonest adversaries or at improved efficiency. To our knowledge, it is the first PSI protocol that supports attribute-hidden fine-grained access control together with integrity verification. The experimental results also show that the proposed PSI protocol is quite efficient.
To illustrate the effectiveness of the proposed PSI scheme, we implemented an experimental simulation on a laptop running Ubuntu 18.04 with an Intel(R) Core(TM) 4130 CPU @ 3.40 GHz and 4 GB of RAM. All algorithms are written in C on Linux. Because the proposed scheme is based on composite-order bilinear pairings, we adopt the "Type A1 pairing" of the PBC Library, which provides a level of security equivalent to the 1024-bit discrete logarithm problem. For simplicity, we assume that the access tree T consists only of a root node and n leaf nodes; that is, it has the form $(w_1 \text{ AND } w_2 \text{ AND } \cdots \text{ AND } w_n)$, where $w_i$ is an attribute.
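The shape of the access tree assumed in the experiments can be made concrete with a small plaintext sketch. The Python fragment below models a tree with one root and n leaves, where the root is an n-of-n threshold gate (an AND gate). The class and function names are hypothetical, and the check is done in the clear purely for illustration; in the proposed scheme the attributes are hidden and the policy is enforced cryptographically.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Node:
    attribute: str = ""      # used by leaf nodes
    threshold: int = 0       # used by gates: number of children that must hold
    children: List["Node"] = field(default_factory=list)


def and_tree(attributes: Iterable[str]) -> Node:
    """Build a root with n leaves whose threshold is n, i.e. w1 AND ... AND wn."""
    leaves = [Node(attribute=w) for w in attributes]
    return Node(threshold=len(leaves), children=leaves)


def satisfied(node: Node, user_attributes: Set[str]) -> bool:
    """Evaluate the tree bottom-up against a user's attribute set."""
    if not node.children:
        return node.attribute in user_attributes
    hits = sum(satisfied(child, user_attributes) for child in node.children)
    return hits >= node.threshold


policy = and_tree(["w1", "w2", "w3"])
print(satisfied(policy, {"w1", "w2", "w3", "w4"}))  # True
print(satisfied(policy, {"w1", "w3"}))              # False
```

Representing AND as an n-of-n threshold gate keeps the same evaluator usable for OR gates (threshold 1) and general k-of-n gates, although only the AND form is used in the measurements.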
In the proposed protocol, the private set intersection is computed by the data user alone, and the data owner does not need to participate in this phase. Throughout the protocol, once the dataset has been outsourced to the cloud server, the data owner can be offline. In contrast, in many PSI protocols [6,18,20,27], both the data owner and the data user have to be online and interact with each other.
Here, we show the performance of the proposed PSI protocol by evaluating the execution-time overhead of each algorithm. The Setup algorithm initializes the system parameters; its execution time is mainly determined by the cardinality of the universal attribute set and increases as this cardinality increases. The corresponding performance graph is given in Figure 2(a). The performance of the KeyGen algorithm is shown in Figure 2(b). According to Figure 2, the running time of the two algorithms is greatly affected by the number of attributes; thus, both curves are monotonically increasing straight lines. When the number of attributes is 60, the running time of each of the two algorithms is approximately 1.2 seconds, which is acceptable.
For the Blind algorithm, the running time comes mainly from generating the fine-grained access structure and is linear in the size of the attribute set. Once the access attributes are fixed, however, the cardinality of the dataset has little influence on the runtime, because the dataset is blinded with XOR operations, whose cost is negligible. The corresponding performance graphs are shown in Figures 3(a) and 3(b). Figure 3(b) shows that when the number of attributes is 20 and the cardinality of the dataset is $2^{20}$, the runtime of the algorithm is only about 371.578 ms.
For the TokenGen1 and TokenGen2 algorithms, the runtime is linear in the size of the data user's attribute set, because the two algorithms correspond to the decryption process of an attribute-based encryption scheme. Their performance graphs are shown in Figures 4(a) and 4(b).
For the PSI Computation algorithm, the runtime depends only on the cardinality of the dataset, as shown in Figure 5. For cardinalities from $2^{10}$ to $2^{18}$, the runtime hardly changes, since the algorithm needs only 3 exponentiations in $G_1$, 1 pairing operation, $|Y|$ hash operations, and $|Y|$ XOR operations, and hashing and XOR are lightweight operations that take almost no time. When the cardinality of the dataset is $2^{20}$, the PSI computation runtime is 296.369 ms.
Thus, it is very suitable for resource-limited data users.
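The per-element cost described above (one hash and one XOR per item of Y) can be illustrated with a short sketch. The Python fragment below assumes, purely for illustration, that each blinded element has the form $H(x) \oplus \mathit{mask}$, where the mask is a fixed-length value that an authorized user recovers once from the tokens; in the protocol that recovery is the step costing the 3 exponentiations and 1 pairing, and it is replaced here by a hard-coded hypothetical value. The function names and the use of SHA-256 are assumptions, not details taken from the scheme.

```python
import hashlib
from typing import Iterable, List, Set


def h(item: bytes) -> bytes:
    """Hash an element to a fixed-length string (SHA-256 is an assumption)."""
    return hashlib.sha256(item).digest()


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def blind(dataset: Iterable[bytes], mask: bytes) -> Set[bytes]:
    """Data owner side: one hash and one XOR per element."""
    return {xor(h(x), mask) for x in dataset}


def intersect(blinded: Set[bytes], user_set: Iterable[bytes], mask: bytes) -> List[bytes]:
    """Data user side: |Y| hashes, |Y| XORs, and |Y| set lookups."""
    return [y for y in user_set if xor(h(y), mask) in blinded]


# Hypothetical mask; in the protocol it would be derived from the tokens.
mask = hashlib.sha256(b"hypothetical session secret").digest()
owner_set = [b"alice", b"bob", b"carol"]
user_set = [b"bob", b"dave", b"carol"]

blinded_set = blind(owner_set, mask)
print(intersect(blinded_set, user_set, mask))  # [b'bob', b'carol']
```

Because the mask is recovered once and then reused for every element, the public-key work does not grow with the dataset, which is consistent with the nearly flat runtime reported for small and medium cardinalities.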

Conclusion
In this work, we presented the first private set intersection (PSI) protocol that supports attribute-hidden fine-grained access control and integrity verification, based on attribute-based encryption. The main goal of the proposed scheme is to provide customized functionality that caters to practical applications. Compared with most existing schemes, the proposed scheme has the following merits: (i) For the data owner, it achieves access control over the dataset by defining an access policy before the dataset is outsourced. (ii) It allows data owners to be offline during the whole PSI protocol. (iii) It ensures the integrity of the dataset returned from the cloud. (iv) The main PSI computation burden is transferred to the cloud. (v) It supports one-to-many PSI computations; that is, after the blinded dataset is outsourced to the cloud, data users can run PSI computations with the cloud an arbitrary number of times. In addition, after giving the corresponding security analysis, we evaluated each algorithm of the proposed scheme by experimental simulation; the results show that the proposed scheme is efficient and practical. To reduce the computational cost of the whole scheme, we will study a fine-grained access control PSI scheme with constant computation in future work.

Data Availability
This article contains the data that support the results of this study. If other data used to support the results of this study are needed, they can be obtained from the corresponding author.

Disclosure
The funders had no role in the design of the study; the collection, analyses, or interpretation of data; the writing of the manuscript; or the decision to publish the results.

Conflicts of Interest
The authors declare that they have no conflicts of interest.