HCV: Practical Multi-Keyword Conjunctive Query with Little Result Pattern Leakage



Introduction
With the wide application of cloud computing and the commercialization of 5G, more and more people tend to outsource their data. Some remote storage systems [1,2] help clients with limited resources manage large amounts of data at very low cost. Although outsourcing brings great convenience and higher efficiency, its security and privacy issues cannot be ignored. For example, the social networking site Facebook may have leaked the data of millions of users to the political firm Cambridge Analytica [3]. Encryption is a simple way to protect data, but it prevents the data from being searched. Searchable encryption (SE) addresses this issue by providing a way to search without decryption. Xiaodong et al. proposed the first SE scheme in 2000 [4]. They realized the retrieval of a keyword on encrypted data. Later, a series of practical SE schemes [5][6][7][8][9] were proposed.
The schemes mentioned above support only single-keyword queries. However, a practical system needs to find the documents containing a set of keywords, which is called a conjunctive query. A naive method is to perform a single-keyword query for each keyword one by one and then filter the desired documents. Nevertheless, if the resultant document set of a keyword is very large, this method is inefficient. Besides, it causes significant information leakage, as it reveals the resultant document set of each queried keyword. A passive attacker as in [10,11] can leverage the common leakages of searchable symmetric encryption (SSE) schemes to reveal the user's query. Recently, Zhang et al. [12] proposed an even more powerful attack, in which the attacker can adaptively inject new documents. With this power, the attacker can recover the content of the user's query by learning which injected documents match it [3].
Some conjunctive SSE schemes have been proposed to balance security and efficiency. Golle et al. first proposed conjunctive equality queries [13]. Each conjunctive query builds a set of tokens that can be used to identify matching documents in the database. Their methods leak only the set of matching documents. However, the workload of the server is heavy, the communication complexity between server and client is high, and the scalability of the solution is limited [14]. Cash et al. [14] proposed the first conjunctive query scheme with sublinear search complexity, "Oblivious Cross-Tags" (OXT). Before that, all other solutions worked linearly in the size of the database. Nevertheless, the OXT protocol leaks some "partial" information about the queries themselves and the database contents to the server. Result pattern (RP) leakage is one of the leakages identified in OXT, and an adversary can use it to steal information. File-injection attacks [12] have exploited RP leakage to reveal all queried keywords with 100% accuracy [15]. Kamara and Moataz [16] proposed highly efficient SSE schemes with worst-case sublinear search and optimal communication complexity. They used set operations (union, intersection, and complement) for efficiency, and their methods support conjunctive, disjunctive, and Boolean queries. However, set operations inevitably cause information leakage. For the sake of efficiency and functionality, Kamara and Moataz's method does not prevent RP leakage, and it leaks more than many other solutions. Nevertheless, their method of building the inverted index is a valuable reference for our work. Lai et al. [15] analyzed RP leakage and proposed "Hidden Cross-Tags" (HXT). The HXT protocol eliminates the keyword-pair result pattern (KPRP) leakage present in the OXT protocol, leaving only the minimal and significantly smaller whole result pattern (WRP) leakage. Thus, the HXT protocol offers higher security than the OXT protocol. Although the HXT protocol is efficient and practical, it cannot return precise query results because it uses a Bloom filter to build the index database.
Yin et al. [17] proposed an efficient and privacy-preserving multi-keyword conjunctive query scheme over the cloud. They used a binary tree structure and homomorphic encryption to achieve high query efficiency and small privacy leakage. This method supports multi-keyword conjunctive queries and protects their privacy. However, the scale of this scheme is limited by its tree-based index. Their experimental results are based on eight keywords, which is not enough in most settings.
The existing schemes mentioned above cannot both avoid unnecessary information leakage and return precise query results. We make progress on these issues and give an affirmative answer by constructing a practical SSE scheme. Like OXT and HXT, our construction focuses on conjunctive queries, since such queries are the most common in many practical settings. We assume that the server is honest but curious and that there is only one reader and one writer.

Our Contribution.
We make progress on the multi-keyword conjunctive query and construct a method with high privacy and accurate query results. The main contributions of this paper are summarized as follows:
(1) First, we address the multi-keyword conjunctive query setting, since it is a common search method for most people. Unlike index databases based on an inverted index, we propose a novel data structure, the counter vector (CV), to construct the index database; it is the basis of our proposed solutions. Using CV, we build a map from each three-tuple of keywords to the file-path collection containing all three keywords, and we design an algorithm to build the CV database as quickly as possible. Compared to Kamara and Moataz's multi-map [16], CV achieves higher search efficiency at the cost of storage efficiency. Although the number of keyword three-tuples is much larger than the number of keyword pairs, the actual growth of storage is limited by the sparsity of the original single-keyword inverted index database. Experiments show that when the keywords' weight is less than 110 and the number of keywords is not more than 512, the CV database adds about 30% storage to the keyword-pair database, and the search efficiency improves dozens of times.
(2) Second, we propose three schemes, CVX, ICV, and HCV, using two kinds of cryptographic primitives and the CV data structure. CVX is the basic scheme and has a much higher search efficiency than Kamara and Moataz's IEX. However, CVX's search efficiency is greatly affected by the weight of keywords (i.e., the number of documents containing a keyword). For this reason, we propose the improved ICV, which is more efficient for heavy-weight keywords. HCV uses the BGN homomorphic encryption algorithm to reduce the RP leakage [18]. We prove that HCV achieves ideal leakage for a 3-keyword query. All of these schemes are highly practical: they are easy to implement and can run on a PC.
(3) Third, we analyze the security of our schemes and evaluate the performance of all three. The experiments show that both CVX and ICV search much more efficiently than Kamara and Moataz's IEX. When the keywords' weight is larger, ICV has a significant advantage over CVX. For HCV, our privacy analysis indicates that it leaks less than Cash et al.'s OXT and returns accurate query results, which is better than the probabilistic results of Lai et al.'s HXT.
We compared the performance of some schemes, and Table 1 gives the details.

Related Work.
Xiaodong et al. [4] gave the first SSE scheme, whose search complexity is linear in the size of the database. Later, Goh [8] introduced a search index for each file and made the search cost proportional to the number of files. Curtmola et al. [7] presented the inverted index to achieve sublinear search complexity. This work defined two formal security models and gave a formal security definition. To support expressive queries, Golle et al. first proposed conjunctive equality queries [13]. In each conjunctive query, a set of tokens is built to identify matching documents in the database. Their methods leak little information; however, the performance is limited and the solution does not scale well.
To support more scalable and expressive queries, Cash et al. [14] proposed the Oblivious Cross-Tags (OXT) protocol with worst-case sublinear search complexity. The OXT protocol divides the conjunctive search process into an s-term and x-terms. The s-term is used for a regular single-keyword search, and the x-terms are used to acquire the identifiers of documents containing multiple keywords. However, the OXT protocol cannot avoid the "keyword-pair result pattern" (KPRP) leakage, which can be exploited by recent attacks [10,12]. Since then, a line of extensions [19][20][21] have been made to OXT. However, such schemes actually trade off performance, security, and functionality. To improve the efficiency of OXT, Kamara and Moataz [16] introduced two schemes, IEX-2Lev and IEX-ZMF, both of which achieve worst-case sublinear search complexity for Boolean queries. Lai et al. [15] proposed the "Hidden Cross-Tags" (HXT) protocol, which achieves conjunctive queries and reduces the leakage of OXT. The HXT protocol uses the "Cross-Tags Set" (XSet) data structure and encrypts it with a lightweight Hidden Vector Encryption (HVE), thereby avoiding the KPRP leakage. However, HXT's XSet is actually a Bloom filter, and thus it cannot give precise query results.
Very recently, some conjunctive SSE schemes with extended functions have been proposed. Wang et al. [22] pointed out that the scheme proposed in [23] is not correct. They proposed a new SE scheme that adopts a special additive homomorphic encryption scheme to achieve the multiplicative homomorphic property efficiently. Furthermore, they enhanced the security on the user side. Ma et al. presented a practical SSE protocol that supports conjunctive queries without KPRP leakage [24], called "Practical Hidden Cross-Tags" (PHXT). Using subset membership check (SMC), the PHXT protocol maintains the same storage size as OXT while preserving the same privacy and functionality as HXT. Fan et al. [25] proposed a verifiable conjunctive keyword search scheme based on cuckoo filter (VCKSCF), which significantly reduces verification and storage overhead. Gan et al. also focused on verifiable conjunctive SSE [26] and presented an efficient verifiable SSE (VSSE) scheme for conjunctive queries with sublinear search overhead. VSSE is built on the OXT protocol, completes the verification through Symmetric Hidden Vector Encryption (SHVE), and greatly reduces the computation payload of the verification process. For IoT applications, Zhang et al. [27] proposed a lightweight and efficient attribute-based encryption scheme for data sharing and searching (namely, LSABE). Their scheme can significantly reduce the computing cost of IoT devices while providing multiple-keyword search for data users.
This paper is organized as follows. We give the preliminaries in Section 2. In Section 3, we describe the details of the CVX SSE scheme, including the proof of correctness and the evaluation of efficiency and security. Section 4 describes the ICV scheme. In Section 5, we introduce the HCV scheme. The experimental results are shown in Section 6. We conclude in Section 7.

Preliminary
We first describe the notations and definitions. Then, we list part of them in Table 2.
$x \leftarrow A$ denotes that the element $x$ is output by algorithm $A$. For a tuple $v$ of $n$ elements, its $i$th element is denoted as $v_i$ or $v[i]$. Given an element $s \in v$, let $l^{-1}(s)$ denote the index of $s$ in $v$. For a set $S$, we use $\#S$ to represent its cardinality. For a string $s$, $|s|$ denotes its bit length and $s_i$ its $i$th bit. Given strings $s$ and $r$, $s \| r$ denotes their concatenation.
Multi-map (MM) is an abstract data type. Typically, it can be instantiated by an inverted index. An MM with capacity $n$ is a collection of $n$ label/tuple pairs $\{(l_i, V_i)\}_{i \le n}$. Getting the tuple associated with label $l_i$ is denoted as $V_i = \mathrm{MM}[l_i]$. Similarly, associating the tuple $V_i$ to label $l_i$ is denoted as $\mathrm{MM}[l_i] = V_i$.
The symbol $D = (D_1, \ldots, D_n)$ denotes a document collection. Each document contains a number of keywords from the universe $W$. The $i$th keyword in $W$ is denoted as $W[i]$, and the document identifier is denoted as $id(D_i)$. Each multi-map can be regarded as a database and is denoted as DB. The document collection containing keyword $w$ is written as $DB(w)$, and the set of keywords in $W$ that co-occur with $w$ is written as $coDB(w) \subseteq W$.
Note to the comparison table: we substitute many leakages by upper bounds and assume some search times' interaction. "PI" means probabilistic (Bloom filter) indexing, "RPH" means result pattern hiding, "Conj" indicates whether conjunctive queries are supported, and "Query comm" means the size of the message from the client. For notations, $q$ means the number of queried keywords and $n$ is the number of documents.
Informally, a private-key encryption scheme is secure against chosen-plaintext attacks (CPAs) if its ciphertexts do not reveal any partial information about the plaintext even to an adversary that can adaptively query an encryption oracle. Similarly, it is random-ciphertext-secure against chosen-plaintext attacks (RCPAs) if its ciphertexts are computationally indistinguishable from random even to an adversary that can adaptively query an encryption oracle [28].

Result Pattern Hiding Searchable Encryption.
Lai et al. [15] proposed result pattern (RP) hiding searchable encryption to resist RP leakage proposed by Cash et al. [14]. RP leakage is the leaked information obtained by the server during query. In [15], Lai et al. analyzed the RP leakage and gave three forms: single-keyword result pattern (SP) leakage, keyword-pair result pattern (KPRP) leakage, and multiple keyword cross-query intersection result pattern (IP) leakage.
The KPRP leakage is a "nonideal" leakage and can be eliminated. Consider an $n$-keyword conjunctive query $w_1 \wedge \cdots \wedge w_n$. During this process, the server learns, for every pair of query keywords of the form $(w_1, w_i)$, $2 \le i \le n$, the set $DB(w_1) \cap DB(w_i)$ of documents containing both keywords, and it also acquires the final query result, which is the set $\bigcap_{j=1}^{n} DB(w_j)$ of documents matching all $n$ query keywords.
Only the final query result, which is called whole result pattern (WRP) leakage, cannot be avoided during this process. The other leakages are intermediate steps of the query process and can be reduced in some ways.

Bilinear Groups of Composite Order.
Given a security parameter $k$, generate a tuple $(N, g, G, G_T, e)$, where $N = p \cdot q$ and $p, q$ are two $k$-bit prime numbers. $G$ and $G_T$ are two finite cyclic multiplicative groups of composite order $N$, $g \in G$ is a generator, and $e: G \times G \to G_T$ is a bilinear map with the following properties:
(i) Bilinearity. $e(g^a, h^b) = e(g, h)^{ab}$ for any $(g, h) \in G^2$ and $a, b \in \mathbb{Z}_N$.
(ii) Nondegeneracy. If $g$ is a generator of $G$, then $e(g, g)$ is a generator of $G_T$ with order $N$.
(iii) Computability. There exists an efficient algorithm to compute $e(g, h) \in G_T$ for all $(g, h) \in G^2$.
The BGN homomorphic encryption scheme used in this paper consists of the following algorithms:
(i) Key generation: Given a security parameter $k$, generate a tuple $(N, g, G, G_T, e)$ as described in Section 2.3.1. Set $h = g^q$; then, $h$ is a random generator of the subgroup of $G$ of order $p$. Compute the key pair, consisting of the private key $sk = p$ and the public key $pk = (N, G, G_T, e, g, h)$.
(ii) Encryption: Let $m$ denote the message to be encrypted, choose a random number $r \in \mathbb{Z}_N$, and compute the ciphertext $c = E(m, r) = g^m h^r \in G$.
(iii) Decryption: Given the ciphertext $c = E(m, r) = g^m h^r \in G$, compute $c^p = (g^m h^r)^p = (g^p)^m$. Set $\hat{g} = g^p$ and compute the discrete log of $c^p$ to base $\hat{g}$ using Pollard's lambda method (see [30] (p.128) and [17]).
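To make the three algorithms concrete, the following minimal sketch mimics them in a toy setting. It is an illustration under strong assumptions, not the BGN scheme itself: it works in a subgroup of order $N$ of the integers modulo a prime $P$ rather than in a pairing group (so only the additive homomorphism is shown), it uses tiny primes, and it recovers $m$ by brute force where the text uses Pollard's lambda method. The names `keygen`, `encrypt`, `decrypt`, and the bound `max_m` are hypothetical.

```python
import random


def is_prime(n):
    """Trial division; adequate for the tiny toy parameters used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def keygen(p=1009, q=1013):
    """Toy key generation: find a subgroup of order N = p*q inside Z_P^*."""
    N = p * q
    k = 1
    while not is_prime(k * N + 1):      # search for a prime P with N | P - 1
        k += 1
    P = k * N + 1
    x = 2
    while True:
        g = pow(x, (P - 1) // N, P)     # order of g divides N
        if pow(g, p, P) != 1 and pow(g, q, P) != 1:
            break                       # order is not 1, p, or q, so it is N
        x += 1
    h = pow(g, q, P)                    # h = g^q has order p
    return (N, P, g, h), p              # (public key, private key sk = p)


def encrypt(pk, m):
    N, P, g, h = pk
    r = random.randrange(N)
    return (pow(g, m, P) * pow(h, r, P)) % P   # c = g^m * h^r


def decrypt(pk, sk, c, max_m=1000):
    N, P, g, h = pk
    cp = pow(c, sk, P)                  # c^p = (g^p)^m because h^p = 1
    g_hat = pow(g, sk, P)
    acc = 1
    for m in range(max_m + 1):          # brute-force discrete log (assumption)
        if acc == cp:
            return m
        acc = (acc * g_hat) % P
    return None


pk, sk = keygen()
c1, c2 = encrypt(pk, 3), encrypt(pk, 4)
total = decrypt(pk, sk, (c1 * c2) % pk[1])  # product of ciphertexts encrypts 3 + 4
```

Multiplying two ciphertexts adds the underlying messages, which is the property the later construction relies on; the randomness $r$ disappears on decryption because $h^p = 1$.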

The Basic Scheme
In this section, we will introduce the basic multi-keyword conjunctive query scheme proposed in this paper. First of all, we give the details of a counter vector data structure.

Counter Vector.
A counter vector cv with length $n$ is an array of $n$ integers. Given a three-tuple $(w_1, w_2, w_3)$, suppose $DB(w_1)$ contains seven files $fid_1, \ldots, fid_7$; we can then get the counter vector cv as in Table 1. A collection of counter vectors composes the CV database. Given a counter vector $cv = (x_1, \ldots, x_n)$, both the 2-conjunctive keywords $(w_1, w_2)$ and the 3-conjunctive keywords $(w_1, w_2, w_3)$ can be queried easily from its entries.
In fact, our constructions are mainly based on the counter vector. First, we build the database (DB) containing CV and MM. Then, we encrypt them separately to get EDB = (EMM, ECV). During the search, we decrypt and get the counter vectors first and then query them to get the result.
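As a rough illustration of how a counter vector can answer both kinds of query, the sketch below assumes one particular encoding, which is not confirmed by the text: entry $x_i$ is 0 if $fid_i$ does not contain $w_2$, 1 if it contains $w_2$ but not $w_3$, and 2 if it contains both. This encoding is chosen only because the entries are later said to lie in $\{0, 1, 2\}$; the example values and the function names are hypothetical.

```python
def query_pair(files, cv):
    """Files of DB(w1) that also contain w2 (assumed encoding: x >= 1)."""
    return [f for f, x in zip(files, cv) if x >= 1]


def query_triple(files, cv):
    """Files of DB(w1) that contain both w2 and w3 (assumed encoding: x == 2)."""
    return [f for f, x in zip(files, cv) if x == 2]


# Seven files of DB(w1), as in the running example; the counters are made up.
files = [f"fid{i}" for i in range(1, 8)]
cv = [0, 1, 2, 0, 2, 1, 0]

pair_result = query_pair(files, cv)      # fid2, fid3, fid5, fid6
triple_result = query_triple(files, cv)  # fid3, fid5
```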

Counter Vector Database.
In the proposed mechanism, we need to construct the CV database. The details are shown in Algorithm 1.
Step 1: sort the original inverted index according to the length of $DB(w)$.
Step 2: for each keyword pair $(w_i, w_j)$, initialize an integer vector cv with length $\#DB(w_i)$, and compute $DB(w_i \wedge w_j)$.
Step 3: for each three-tuple $(w_i, w_j, w_k)$, compute $DB(w_i \wedge w_j \wedge w_k)$ and get the counter vectors as in Table 1.
Once all three-tuples are finished, we get the CV database.
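The three steps can be sketched as follows. This is a minimal plaintext sketch, not Algorithm 1 itself: it assumes the counter encoding 0/1/2 (no match, pair match, three-tuple match), it stores only three-tuples with at least one full match to reflect the sparsity argument, and the function name `build_cv_database` is hypothetical. The example index reuses the three inverted lists from the leakage comparison later in the paper.

```python
def build_cv_database(inverted_index):
    """inverted_index maps keyword -> set of file identifiers."""
    # Step 1: sort keywords by the length of DB(w), shortest first.
    keywords = sorted(inverted_index, key=lambda w: len(inverted_index[w]))
    cv_db = {}
    for a, wi in enumerate(keywords):
        files = sorted(inverted_index[wi])          # fixed order of DB(w_i)
        for b in range(a + 1, len(keywords)):
            wj = keywords[b]
            # Step 2: vector of length #DB(w_i) marking DB(w_i AND w_j).
            pair = [1 if f in inverted_index[wj] else 0 for f in files]
            if not any(pair):
                continue                            # empty pair: nothing to store
            for c in range(b + 1, len(keywords)):
                wk = keywords[c]
                # Step 3: raise the counter to 2 where w_k also matches.
                cv = [x + 1 if x and f in inverted_index[wk] else x
                      for f, x in zip(files, pair)]
                if 2 in cv:                         # assumption: keep non-empty tuples only
                    cv_db[(wi, wj, wk)] = cv
    return cv_db


index = {
    "w1": {"id1", "id4", "id5"},
    "w2": {"id1", "id2", "id4", "id6"},
    "w3": {"id2", "id4", "id5", "id6"},
}
cv_db = build_cv_database(index)
```

For this index the only stored entry is the three-tuple `("w1", "w2", "w3")` with counters `[1, 2, 0]` over the ordered files `id1, id4, id5`.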

Description of CVX Scheme.
The CVX mechanism is the basic scheme we propose. It consists of three modules: Setup, Token, and Query. Setup generates an index database DB and encrypts it to EDB. Token outputs a token given the key and the keyword set. Query returns the queried result given the token and EDB. These modules rely on several algorithms: the multi-map encryption scheme $\Sigma_{MM}$, the counter vector encryption scheme $\Sigma_{CV}$, a private-key encryption scheme SKE, a pseudo-random function $F$, and the function GetTag{·}, which is used to get tags given the counter vector and document database.
The CV structure is a multi-map data structure whose contents differ from those of MM, so CV and MM support the same encryption algorithm, although they are processed differently during Search. We choose a black-box MM encryption scheme as in Kamara and Moataz [16].
The scheme is described as follows.
(i) Setup. In the Setup module, an index database DB is generated and encrypted to EDB. Given the security parameter $k$, compute the keys for encryption. Then, given the keyword/docID pairs, generate an inverted index database MM and the counter vectors CV using Algorithm 1. MM maps each keyword $w \in W$ to the encrypted identifiers in $DB(w)$. CV maps each three-tuple $(w_i, w_j, w_k)$ to a vector of integers. Finally, encrypt MM and CV using the algorithms $\Sigma_{MM}$ and $\Sigma_{CV}$, respectively, to obtain the encrypted multi-map (EMM) and the encrypted counter vectors (ECV). The output of Setup consists mainly of the encrypted structures EDB = (EMM, ECV) and their keys.
(ii) Token. Given the key and a set of queried keywords $w = (w_1, \ldots, w_q)$, the Token algorithm outputs the token TK = (gtk, ltk). gtk is the global token used to query the MM, and ltk is the local token used to query the CV. ltk contains $\lceil (q-1)/2 \rceil$ subtokens. Taking $q = 5$ as an example, gtk is calculated from $w_1$, and ltk contains two subtokens (stk) computed from $(w_1, w_2, w_3)$ and $(w_1, w_4, w_5)$, respectively.
(iii) Query. The output of Query is obtained in the following steps. First, use the EMM and the global token gtk to get the encrypted multi-map entry mm = $DB(w_1)$. Second, use the local token ltk = $(stk_1, \ldots, stk_{\lceil (q-1)/2 \rceil})$ to get the encrypted counter vectors $cv_i$, $i = 1, \ldots, \lceil (q-1)/2 \rceil$, from ECV. Third, decrypt them using the existing encryption schemes $\Sigma_{MM}$ and $\Sigma_{CV}$ separately. Subsequently, for each $cv_i$, get the set $S_i$ of tags according to the counters, compute the intersection $S = \bigcap_i S_i$, and output the set $S$.
The CV-based conjunctive SSE scheme CVX = (Setup, Token, Query) is detailed in Algorithm 2.
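The grouping of queried keywords into three-tuples in the Token module can be sketched as below. This is an illustrative sketch under assumptions: HMAC-SHA256 stands in for the pseudo-random function $F$, the keys `k_g` and `k_l` and the function names are hypothetical, and for an even number of keywords the last keyword is repeated to fill the final three-tuple, a padding rule the text does not specify.

```python
import hashlib
import hmac
from math import ceil


def prf(key: bytes, message: str) -> bytes:
    """Stand-in for the PRF F (assumption: HMAC-SHA256)."""
    return hmac.new(key, message.encode(), hashlib.sha256).digest()


def cvx_token(k_g: bytes, k_l: bytes, keywords):
    """Return (gtk, ltk, triples) for the conjunctive query w_1 AND ... AND w_q."""
    w1, rest = keywords[0], list(keywords[1:])
    if len(rest) % 2 == 1:
        rest.append(rest[-1])            # assumed padding for an incomplete tuple
    gtk = prf(k_g, w1)                   # global token, computed from w_1
    triples = [(w1, rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]
    ltk = [prf(k_l, "|".join(t)) for t in triples]   # one subtoken per three-tuple
    return gtk, ltk, triples


gtk, ltk, triples = cvx_token(b"global-key", b"local-key",
                              ["w1", "w2", "w3", "w4", "w5"])
```

With $q = 5$ this yields the two three-tuples $(w_1, w_2, w_3)$ and $(w_1, w_4, w_5)$ from the example, that is, $\lceil (q-1)/2 \rceil = 2$ subtokens.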

Correctness and Efficiency.
In this subsection, we prove the correctness and analyze the efficiency of our scheme.

Correctness.
To show the correctness of CVX, we consider the operations on the counter vectors. For a conjunctive query $w = (w_1 \wedge \cdots \wedge w_q)$, the output of CVX.Query(EDB, tk) is $S = \bigcap_{i=1}^{\lceil (q-1)/2 \rceil} S_i$, where $S_i$ is the set of tags obtained from the $i$th counter vector, that is, the documents containing all keywords of the $i$th three-tuple. We want to get $T$ for conjunctive queries, where $T = \bigcap_{j=1}^{q} DB(w_j)$. Since every three-tuple contains $w_1$ and the three-tuples together cover all queried keywords, we know that $S = T$.

[Algorithm 1 (fragment). Input: inverted index. Output: CV database. (1) Sort the original inverted index according to $\#DB(w)$; initialize $tmp_{i,j}$ of length $\#DB(w)$ with 0; ...]

[Algorithm 2: Basic CV SSE.]

Efficiency.
In the complexity expressions of CVX, $C_w$ is a constant relating to $w$. $\#co(w)$ is the number of keyword sets that contain pairs $(w, v)$, $w, v \in W$. $\#co(w, v)$ is the number of keyword sets that contain three-tuples $(w, v, u)$, $w, v, u \in W$. strg is the storage complexity of $\Sigma_{MM}$, which is the black-box multi-map encryption scheme used in this paper.

Security Analysis.
CVX is adaptively secure under controlled disclosure [6]. We mainly consider its leakage functions, which include the Setup leakage and the Query leakage. The leakage profile of CVX is as follows. Its Setup leakage consists of $\mathcal{L}^{mm}_S(\mathrm{MM})$ and $\mathcal{L}^{cv}_S(\mathrm{CV})$, the Setup leakages of the multi-map encryption scheme and the counter vector encryption scheme, respectively. The Query leakage involves, for all $1 \le i \le \lceil (q-1)/2 \rceil$, a random function $f_i$ from $\{0,1\}^{|id| + \log \#W}$ to $\{0,1\}^k$. With adaptive security defined as in Kamara and Moataz [16] (Definition 4.2), we state the following theorem.
Theorem 1. CVX is $(\mathcal{L}^{cvx}_S, \mathcal{L}^{cvx}_Q)$-secure on the condition that $\Sigma_{MM}$ and $\Sigma_{CV}$ are adaptively $(\mathcal{L}^{mm}_S, \mathcal{L}^{mm}_Q)$-secure, SKE is RCPA-secure, and $F$ is pseudo-random.
Proof. Suppose $S_{CV}$ and $S_{MM}$ are the simulators guaranteed to exist by the adaptive semantic security of $\Sigma_{CV}$ and $\Sigma_{MM}$. To simulate EDB, the simulator $S$ for CVX takes the Setup leakage as input, computes $\mathrm{EMM} \leftarrow S_{MM}(\mathcal{L}^{mm}_S(\mathrm{MM}))$ and $\mathrm{ECV} \leftarrow S_{CV}(\mathcal{L}^{cv}_S(\mathrm{CV}))$, and outputs EDB = (ECV, EMM).
The token is simulated from the Query leakage: first gtk and then, for all $1 \le i \le \lceil (q-1)/2 \rceil$, each subtoken $stk_i$. For all probabilistic polynomial-time adversaries $A$, the probability that Real($k$) outputs 1 and the probability that Ideal($k$) outputs 1 are negligibly close. That is, if $\Sigma_{CV}$ and $\Sigma_{MM}$ are adaptively secure, SKE is RCPA-secure, and $F$ is pseudo-random, then the simulated EDB and tk are indistinguishable from the real EDB and tk. In particular, the leaked random tag is indistinguishable from the encrypted identifier in the Real($k$) experiment.

Description of ICV.
First, we define $\#DB(w)$ to be the weight of keyword $w$. When the average weight of all keywords is large, CVX becomes less efficient. For this reason, we improve CVX and propose the ICV scheme, which is more efficient when the average weight is heavy.
Similar to CVX, ICV has three modules (Setup, Token, and Query) and five algorithms: $\Sigma_{CV}$, $\Sigma_{MM}$, a private-key encryption scheme SKE, a pseudo-random function $F$, and the GetTag{·} function. The biggest difference between CVX and ICV lies in the construction of DB. In CVX, the MM is the inverted index, while ICV uses a two-dimensional inverted index to construct MM. That is, for each keyword pair $(w_i, w_j)$, $1 \le i, j \le |W|$, compute $DB(w_i \cap w_j)$ and store it as the value of $\mathrm{MM}[i, j]$; we then get a two-dimensional MM in which each label $l_{i,j}$ corresponds to the set $DB(w_i \cap w_j)$.
Obviously, we can query a keyword pair easily using the MM. For a single-keyword query, we only need to let $i = j$, that is, to acquire $DB(w_i \cap w_i)$. A $q$-conjunctive query with $q \ge 3$ can be parsed into $(q - 2)$ three-tuples; we query each three-tuple to get a set $S_i$, $1 \le i \le q - 2$, and then output the result $S = \bigcap_{i=1}^{q-2} S_i$. The scheme is described as follows.
(i) Setup. In the Setup process, an index database is generated and encrypted. First, given the security parameter $k$, compute the keys. Then, calculate the two-dimensional index database MM using the keyword/docID pairs. For every keyword pair $w_i, w_j \in W$, MM maps it to the encrypted identifiers in $DB(w_i \cap w_j)$. Furthermore, given a third keyword $w_k$, compute a binary counter vector $cv_{i,j}$ with length $|DB(w_i \cap w_j)|$: if the $ii$th tag of $DB(w_i \cap w_j)$ belongs to $DB(w_i \wedge w_j \wedge w_k)$, set $cv_{i,j}[ii]$ to 1; otherwise, set it to 0. CV thus maps each three-tuple $(w_i, w_j, w_k)$ to a binary vector with length $\#DB(w_i \cap w_j)$. Finally, encrypt MM and CV using the algorithms $\Sigma_{MM}$ and $\Sigma_{CV}$, respectively, to get EMM and ECV. The output of Setup consists mainly of the encrypted structures EDB = (EMM, ECV) and their keys.
(ii) Token. Given the key and a set of queried keywords $w = (w_1, \ldots, w_q)$ with $q \ge 3$, the Token algorithm generates the token TK = (gtk, ltk). gtk is the global token obtained from $(w_1, w_2)$ and is used to query the MM. ltk contains $q - 2$ subtokens $(stk_1, \ldots, stk_{q-2})$, one for each three-tuple $(w_1, w_2, w_i)$, $i = 3, \ldots, q$.
(iii) Query. The results of Query are obtained through the following steps. First, use the EMM and the global token gtk to get MM = $DB(w_1 \cap w_2)$. Then, parse ltk = $(stk_1, \ldots, stk_{q-2})$; for each $stk_i$, query ECV to get the encrypted counter vector $ecv_i$, $i = 1, \ldots, q - 2$, and decrypt it to $cv_i$ using $\Sigma_{CV}$. Finally, for each $cv_i$, run GetTag{·} to get the set $S_i$. Once all subtokens are processed, compute the intersection $S = \bigcap_i S_i$ and output the set $S$.
The details of the ICV SSE scheme ICV = (Setup, Token, Query) are given in Algorithm 3.
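The Query logic of ICV can be sketched on plaintext structures as follows. This minimal sketch omits all encryption and tokens and only shows how the binary counter vectors select tags from $DB(w_1 \cap w_2)$; the dictionary layout and the function name `icv_query` are hypothetical, and the data reuse the three inverted lists from the leakage comparison later in the paper.

```python
def icv_query(mm2, cv, keywords):
    """Plaintext model of ICV.Query for q >= 3 keywords.

    mm2 maps a keyword pair to the ordered list DB(w1 AND w2);
    cv maps a three-tuple to a binary vector aligned with that list.
    """
    w1, w2, *rest = keywords
    base = mm2[(w1, w2)]                       # result of the global token
    result = set(base)
    for wi in rest:                            # one subtoken per (w1, w2, w_i)
        bits = cv[(w1, w2, wi)]
        s_i = {f for f, b in zip(base, bits) if b == 1}   # role of GetTag
        result &= s_i                          # S is the intersection of all S_i
    return result


# DB(w1) = {id1, id4, id5}, DB(w2) = {id1, id2, id4, id6}, DB(w3) = {id2, id4, id5, id6}
mm2 = {("w1", "w2"): ["id1", "id4"]}
cv = {("w1", "w2", "w3"): [0, 1]}              # only id4 also contains w3
result = icv_query(mm2, cv, ["w1", "w2", "w3"])
```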

Correctness and Efficiency.
We now analyze the correctness and efficiency.

Correctness.
To show the correctness of ICV, we consider the operations on the counter vectors. For a conjunctive query $w = (w_1 \wedge \cdots \wedge w_q)$, we get the following equation, which gives the correctness of ICV: $\bigcap_{i=3}^{q} DB(w_1 \wedge w_2 \wedge w_i) = \bigcap_{j=1}^{q} DB(w_j)$.

Efficiency.
The Query complexity of ICV is $O(q - 2)$. The sizes of tokens are $O(q - 2)$. The communication complexity is optimal because the search result has no redundancy. In the storage complexity, $C_{w,v}$ is a constant relating to $w, v$, and strg is the storage complexity of $\Sigma_{MM}$, which is the black-box multi-map encryption scheme used in this paper.

Security Analysis.
Because CVX and ICV use the same data structures, MM and CV, and the same algorithms for security, their security is equivalent and can be proved in the same way.

Hidden Result Pattern CV SSE (HCV)
In this section, we propose our new hidden result pattern conjunctive keyword query scheme. This scheme employs Yin et al.'s scheme [17] to decrease the result pattern leakage. Unlike Lai et al.'s HXT scheme, we do not use a Bloom filter to construct the index database, so HCV can achieve accurate queries and even ideal leakage when $q = 3$. The details of HCV are given as follows.

Description of HCV.
The HCV mechanism consists of three modules: Setup, Token, and Query. In the Setup module, an index database DB is generated and encrypted to EDB. Token outputs a token tk for searching, given the key and the keyword set. Query returns the queried result given the token and EDB. These modules rely on several algorithms:
(i) A multi-map encryption scheme $\Sigma_{MM}$, used as a black box.
(ii) A counter vector encryption scheme $\Sigma_{HCV}$, based on the BGN homomorphic encryption technique.
(iii) A private-key encryption scheme SKE = (Gen, Enc, Dec), which is RCPA-secure.
(iv) A pseudo-random function $F$.
The scheme is described as follows.

Setup.
In the Setup process, an index database is generated and encrypted. First, the MM and CV databases are generated using the same method as in CVX.Setup. Then, MM and CV are encrypted separately.
MM is encrypted using the existing black-box encryption scheme $\Sigma_{MM}$, outputting EMM. CV is encrypted using $\Sigma_{HCV}$, which is based on the BGN homomorphic encryption technique. The output of Setup consists mainly of the encrypted structures EDB = (EMM, ECV) and their keys.
$\Sigma_{HCV}$ is described as follows.
(a) Initialize: Compute a public key $pk = (N, G, G_T, e, g, h)$ and the private key $sk = p$ according to the given security parameter $k$. Further, the data user initializes a one-way hash function $H: \{0,1\}^* \to \mathbb{Z}_N$ and chooses three random numbers $R_0$, $R_1$, and $R_2$ from $\mathbb{Z}_N$. The tuple $(sk, R_0, R_1, R_2)$ and the keyword dictionary $W$ are kept secret, and the public key pk and $H$ are sent to the cloud server.
(b) Encrypt CV: For each three-tuple $w = (w_{i_1}, w_{i_2}, w_{i_3})$, we get a counter vector $t_w = (t_1, \ldots, t_n)$, $t_i \in \{0, 1, 2\}$, and the user encrypts it as $c_i = e(g, g)^{H(R_{t_i} \| i) \cdot q}$, $i = 1, \ldots, n$, where $R_0, R_1, R_2$ are the three random numbers owned by the data user. Each counter vector $t_w$ is thus encrypted to $c = (c_1, \ldots, c_n)$, and all of the encrypted counter vectors constitute the ECV.
(c) Encrypt files: Encrypt each file $f_j \in D$ using the existing SKE encryption scheme, and then send the encrypted files to the cloud server, together with the EMM and ECV.

Token.
It is the same as CVX.

Query.
The conjunctive query between client and server proceeds as follows.

Security and Communication Networks
(a) The client sends the token tk to the server. The server parses it into subtokens $stk_i$ and, for each $i$, queries the ECV. If one of the queries returns $\emptyset$, the server returns $\emptyset$ to the client and ends the search. Otherwise, it sends a new query request command to the client.
(b) Once it receives the query request, the client initializes an integer vector $V$ of length $n$, sets each element of $V$ to a fixed value chosen from $\{0, 1, 2\}$, chooses $n$ random numbers $r_1, \ldots, r_n$, and encrypts the vector $V$ as $E(V) = (E(V_1), \ldots, E(V_n))$, where $E(V_i) = g^{H(R_{V_i} \| i) + p \cdot r_i}$. The client sends the subtokens and the $E(V)$s to the server.
(c) For each subtoken, the server computes $e(E(V_i), h)$ and matches the result with the $i$th element of the corresponding encrypted counter vector.
The details of the HCV SSE scheme HCV = (Setup, Token, Query) are given in Algorithm 4.

Correctness and Efficiency.
In this subsection, we prove the correctness and analyze the efficiency.

Correctness.
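The correctness of HCV follows from the equation $e(E(k_i), h) = e(g^{H(R_{k_i} \| i) + p \cdot r_i}, g^q) = e(g, g)^{H(R_{k_i} \| i) \cdot q} \cdot e(g, g)^{N \cdot r_i} = e(g, g)^{H(R_{k_i} \| i) \cdot q}$, which holds because $e(g, g)$ has order $N = p \cdot q$. Since this equation holds, the check operations in HCV.Query are valid: the value computed from the query equals the $i$th element of the encrypted counter vector when the queried value $k_i$ equals $t_i$, and the result is correct.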
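The algebra behind this check can be verified with a small simulation. The sketch below is an insecure "exponent model" used only to test the arithmetic, and it is an assumption of this illustration rather than part of the scheme: a group element $g^a$ is represented by its exponent $a$ modulo $N$, so the pairing $e(g^a, g^b) = e(g, g)^{ab}$ becomes a multiplication of exponents. SHA-256 stands in for $H$, two known Mersenne primes stand in for $p$ and $q$, and all function names are hypothetical.

```python
import hashlib
import secrets

p = 2**61 - 1            # toy private prime (a Mersenne prime)
q = 2**89 - 1            # toy second prime (a Mersenne prime)
N = p * q


def H(R, i):
    """Stand-in for the hash H(R || i) into Z_N (assumption: SHA-256)."""
    digest = hashlib.sha256(f"{R}|{i}".encode()).digest()
    return int.from_bytes(digest, "big") % N


def pairing(a, b):
    """Exponent model: e(g^a, g^b) = e(g, g)^(a*b), stored as a*b mod N."""
    return (a * b) % N


h = q                                            # h = g^q
R = [secrets.randbelow(N) for _ in range(3)]     # secret R_0, R_1, R_2


def encrypt_cv(t):
    """c_i = e(g, g)^(H(R_{t_i} || i) * q), as exponents mod N."""
    return [(H(R[t_i], i) * q) % N for i, t_i in enumerate(t, start=1)]


def encrypt_query(v):
    """E(V_i) = g^(H(R_{V_i} || i) + p * r_i) with fresh random r_i."""
    return [(H(R[v_i], i) + p * secrets.randbelow(N)) % N
            for i, v_i in enumerate(v, start=1)]


def matching_positions(ecv, eq):
    """Server-side check: e(E(V_i), h) == c_i, reported as 1-based positions."""
    return [i for i, (c, e) in enumerate(zip(ecv, eq), start=1)
            if pairing(e, h) == c]


ecv = encrypt_cv([0, 2, 1, 2, 0])
hits = matching_positions(ecv, encrypt_query([2] * 5))   # query for the value 2
```

The random term $p \cdot r_i$ vanishes after pairing with $h = g^q$ because $p \cdot q = N$, so only positions whose counter equals the queried value match (here positions 2 and 4), while two encryptions of the same query vector still look different.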

Efficiency.
The search complexity of HCV is $O(\lceil (q-1)/2 \rceil)$. The sizes of tokens are $O(\lceil (q-1)/2 \rceil)$. The communication complexity is optimal because the search result has no redundancy. CVX and HCV have the same CV data structure before encryption. However, the encryption algorithm of CVX keeps the length of the encrypted data unchanged, whereas HCV extends it by a factor of $N$, so the storage complexity of HCV is $N$ times that of CVX, where $N$ is determined by HCV's security parameter $k$.

Security Analysis.
In this section, we analyze the privacy and security of HCV. We focus on two aspects: one is the RP leakage, and the other is the privacy of the outsourced data and the conjunctive queries.

RP Leakage Comparison.
We consider two leakage components, KPRP and WRP, and illustrate the difference with an example. Consider a database containing six documents labelled $\{id_i\}_{1 \le i \le 6}$, each of which contains some keywords. The details are listed in Table 3. Consider the conjunctive query $w_1 \wedge w_2 \wedge w_3$, and suppose the keyword $w_1$ is the least frequent. For the three queried keywords, the inverted indexes are $DB(w_1) = \{id_1, id_4, id_5\}$, $DB(w_2) = \{id_1, id_2, id_4, id_6\}$, and $DB(w_3) = \{id_2, id_4, id_5, id_6\}$.
First, we compute the RP leakage component in Cash et al.'s OXT: $DB(w_1) \cap DB(w_2) = \{id_1, id_4\}$ and $DB(w_1) \cap DB(w_3) = \{id_4, id_5\}$. As shown in Table 4, the RP leakage reveals 4 entries of the inverted index.
Lai et al.'s HXT protocol eliminates the "partial query" (KPRP) leakage, which is a part of RP leakage. It only has the whole result pattern (WRP) leakage. In HXT, $\mathrm{WRP} = \bigcap_{j=1}^{3} DB(w_j)$. The HXT protocol reveals the exact result of the query, that is, only $\{id_4\}$. However, the HXT protocol leaks two entries, $w_2$ and $w_3$, as shown in Table 4.
Our scheme HCV reveals the exact result $\{id_4\}$, which is ideal leakage. Besides, HCV reveals nothing about the entries: the server only gets the encrypted keyword tuples. The leakage comparison is given in Table 5.
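The two leakage components of this example can be recomputed with a few lines of code. This is a small sketch on the plaintext inverted lists given above; the function names `kprp` and `wrp` are hypothetical and simply follow the definitions of keyword-pair and whole result pattern.

```python
def kprp(db, query):
    """Keyword-pair result pattern: DB(w1) AND DB(wi) for every i >= 2."""
    s_term = query[0]
    return {(s_term, w): db[s_term] & db[w] for w in query[1:]}


def wrp(db, query):
    """Whole result pattern: documents matching all queried keywords."""
    return set.intersection(*(db[w] for w in query))


db = {
    "w1": {"id1", "id4", "id5"},
    "w2": {"id1", "id2", "id4", "id6"},
    "w3": {"id2", "id4", "id5", "id6"},
}
query = ["w1", "w2", "w3"]

pair_leakage = kprp(db, query)    # what an OXT-style server additionally learns
whole_leakage = wrp(db, query)    # the unavoidable final result
```

The pair leakage contains four identifiers in total, while the whole result pattern contains only `id4`.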

HCV Privacy-Preserving
(1) Outsourcing Data Is Privacy-Preserving. In the HCV scheme, the outsourced data include a file collection and an index database, both of which are encrypted. The encrypted files are secure because the server performs no operation on them. The encrypted index includes two data structures, MM and CV. For MM, we use a black-box MM encryption scheme, whose security is not discussed here. The CV structure, as described in Section 5.1, contains $n$ encrypted elements represented as $e(g, g)^{H(R_{t_i} \| i) \cdot q}$, $i = 1, 2, \ldots, n$, where $t_i \in \{0, 1, 2\}$. Theorem 2 shows that the cloud server can learn nothing related to $t_i$.

Theorem 2.
If $H$ is a secure one-way hash function, the cloud server can learn nothing about $t_i \in \{0, 1, 2\}$ from the encrypted CV database.
Proof. We first consider a single encrypted element and then a pair of elements.
On the one hand, for each encrypted element $e(g, g)^{H(R_{t_i} \| i)\, q}$, to obtain the value of $R_{t_i}$ the server would have to either invert the function $H$ or exhaustively search the value of $R_{t_i}$. However, $H$ is a one-way function and cannot be inverted, and $R_{t_i}$ is a very large integer (i.e., 1024 bits), so exhausting it is currently infeasible. The cloud server can therefore only guess, and since $t_i \in \{0, 1, 2\}$, a guess is correct with probability only 1/3 for each element. On the other hand, for two different encrypted elements, e.g., $e(g, g)^{H(R_{t_i} \| i)\, q}$ and $e(g, g)^{H(R_{t_j} \| j)\, q}$ with $i \ne j$, the server cannot tell whether $t_i$ and $t_j$ have the same value, because the position index is hashed together with the secret and $H$ is a secure one-way hash function. The cloud server therefore learns nothing about $t_i$ ($1 \le i \le d$) from the encrypted tag array [17].

□
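The second part of the argument, that equal values of $t_i$ at different positions yield unrelated exponents, can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: SHA-256 stands in for $H$, the secrets `R[0]`, `R[1]`, `R[2]` are hypothetical 1024-bit random integers, and the byte encoding of $R_{t_i} \| i$ is an arbitrary choice; the pairing and the factor $q$ are omitted.

```python
import hashlib
import secrets

def tag_exponent(R_t: int, i: int) -> int:
    """Exponent H(R_t || i); SHA-256 and this encoding are assumptions."""
    data = R_t.to_bytes(128, "big") + b"|" + i.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# Hypothetical client-side secrets R_0, R_1, R_2 (1024 bits each).
R = [secrets.randbits(1024) for _ in range(3)]

# Hypothetical counter values t_1..t_5; positions 1, 3, and 5 share t = 0.
t = [0, 2, 0, 1, 0]
exponents = [tag_exponent(R[t_i], i) for i, t_i in enumerate(t, start=1)]

# Equal t values still give different exponents because the position i is hashed in.
for i, e in enumerate(exponents, start=1):
    print(i, t[i - 1], hex(e)[:18])
```

Without knowledge of the secrets, the printed exponents give no visible indication of which positions share the same counter value.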
(2) Conjunctive Query Is Privacy-Preserving. The conjunctive query contains three steps: query request, query processing, and query response.
(1) Setup. Input: security parameter $k$, inverted index. Output: $K$, EDB.
(2) Generate key $K = (K_g, K_l)$ according to $k$;
(3) Get the EMM as in CVX.Setup;
(4) Compute the CV database using Algorithm 1;
(5) Encrypt CV as in equation (15) to acquire ECV;
(6) Set $K = (K_g, K_l)$, EDB = {EMM, ECV}.
[...] (16), send $stk_l$ and $E(V_l)$ to Server.
(21) (c) Server does as follows:
(22) set each element of $P$ to integer 1; for $l = 1$: [...]

In the query request, the query vector $Q = (k_0, k_1, \ldots, k_{d-1})$ is encrypted as $E(Q) = (E(k_0), E(k_1), \ldots, E(k_{d-1}))$, where for each $k_i$ we have $E(k_i) = g^{H(R_{k_i} \| i) + p \cdot r_i}$. The encryption guarantees that $k_i$ is secure. Moreover, as each query uses a fresh random value $r_i$, the encryption is nondeterministic. That is, given two queries, the honest-but-curious server cannot determine whether they contain the same queried keywords.
For the query processing, the server receives the encrypted query vector $E(Q) = (E(k_0), E(k_1), \ldots, E(k_{d-1}))$, calculates $e(E(k_i), h)$, and matches the result against the $i$th element of the encrypted tag array. In this process, the server can only obtain information about the access pattern and the search pattern, which our scheme allows.
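The algebra behind the query request and query processing can be checked numerically in a toy setting. The Python sketch below is not the HCV implementation: it replaces the pairing group by a subgroup of order $N = pq$ of the integers modulo a small prime, models the map $e(\cdot, h)$ as exponentiation by $q$ (which assumes $h = g^q$, as is usual in BGN), and uses tiny primes. All function names and parameters are hypothetical. It only shows that the blinding term $p \cdot r_i$ vanishes, since $(g^{x + p r})^q = g^{xq}$ when $g$ has order $pq$.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for the toy sizes used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def find_group(p: int, q: int):
    """Return (P, g): a prime P with p*q dividing P-1 and g of order exactly p*q mod P."""
    N = p * q
    k = 1
    while not is_prime(k * N + 1):
        k += 1
    P = k * N + 1
    for x in range(2, P):
        g = pow(x, (P - 1) // N, P)          # order of g divides N
        if pow(g, p, P) != 1 and pow(g, q, P) != 1:
            return P, g                      # order is neither 1, p, nor q, hence N
    raise ValueError("no generator found")

def encrypt_query(x: int, r: int, p: int, g: int, P: int) -> int:
    """Toy E(k_i) = g^(x + p*r), where x plays the role of H(R_{k_i} || i)."""
    return pow(g, x + p * r, P)

def cv_tag(x: int, q: int, g: int, P: int) -> int:
    """Toy stand-in for the stored element e(g, g)^(x*q)."""
    return pow(g, x * q, P)

def server_match(token: int, tag: int, q: int, P: int) -> bool:
    """Toy stand-in for comparing e(E(k_i), h) with the stored tag."""
    return pow(token, q, P) == tag

p, q = 1009, 1013                            # toy primes; real parameters are far larger
P, g = find_group(p, q)

token_a = encrypt_query(5, 7, p, g, P)       # same exponent, randomness r = 7
token_b = encrypt_query(5, 8, p, g, P)       # same exponent, randomness r = 8
print(token_a != token_b)                                   # tokens differ
print(server_match(token_a, cv_tag(5, q, g, P), q, P))      # matching tag
print(server_match(token_a, cv_tag(6, q, g, P), q, P))      # non-matching tag
```

In this toy model the server-side check needs $q$ directly, which would not be acceptable in practice; in the scheme itself that role is played by the pairing with $h$.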
For the query response, the cloud server returns the search result to the client. The search result consists of encrypted files, and the encryption algorithm can be nondeterministic, e.g., AES-CBC. Thus, the adversary cannot correlate two queries even if they use the same queried keywords [17].

Experiment Evaluation
All experiments were run on an Intel(R) Core(TM) i5-8400 CPU @ 2.80 GHz with 8 GB RAM under a commodity Windows 10 system. We implemented the CVX and ICV schemes in Python, while for HCV we ran the experiments in C++ for efficiency. To show the practicability of our solution, the data used in the experiments come from the NSF e-mail dataset. The whole set contains 30799 keywords and 49078 files, from which we sampled subsets for the experiments.
We use the common hash algorithm HMAC-SHA1, and the encryption algorithm for CVX and ICV is AES-CBC. Both the CVX and ICV schemes consist of the Setup, Token, and Search algorithms. Setup and Token are run locally by the client, whereas Query is run by the server. We assume that the client has sufficient computing resources, so only the efficiency of Query is considered here. In the HCV scheme, we use the homomorphic encryption algorithm BGN and mainly evaluate the Setup performance. We generate the BGN parameters through the type a1 pairing in the PBC library, based on the curve $y^2 = x^3 + x$ (the group order $N$ is a 1024-bit number).
First, we test the performance of CVX and ICV and compare them with Kamara's IEX. To perform a search, Kamara's scheme needs to perform $q - 1$ decryptions and then compute the intersection of $q - 1$ sets, whereas CVX only needs to perform $(q - 1)/2$ decryptions and then intersect $(q - 1)/2$ vectors. Taking $q = 4$ as an example, we implement 2 20 experiment. For each keyword $w$, we assume that DB($w$) represents the weight of $w$. Search efficiency is closely related to the keywords' weight: the larger the keywords' weight, the less efficient the query operation.
For simplicity, we use the mean value of all keywords' total weight as the metric to test the Query efficiency. In the case of low weight, we compare the Search efficiency of CVX and ICV with that of Kamara's scheme. The experiment shows that the Search efficiency of CVX and that of ICV are not significantly different, but both are dramatically better than that of Kamara's IEX. The experimental results are shown in Figure 1. The abscissa is the mean value of keyword weight, and the ordinate is the Query time in seconds. The number of sampled keywords ranges from 756 to 779.
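The metric on the abscissa can be computed directly from an inverted index. The following Python sketch assumes that the weight of a keyword is the number of documents in DB($w$); the toy index and variable names are hypothetical.

```python
from statistics import mean

# Hypothetical toy inverted index: keyword -> set of matching document ids.
inverted_index = {
    "w1": {1, 4, 5},
    "w2": {1, 2, 4, 6},
    "w3": {2, 4, 5, 6},
}

# Assumed definition: the weight of w is the number of documents containing w.
weights = {w: len(ids) for w, ids in inverted_index.items()}

# Mean keyword weight, used as the x-axis value of the efficiency plots.
mean_weight = mean(weights.values())
print(weights, mean_weight)
```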
Then, we find that CVX becomes less efficient when the mean weight is larger, which is why we proposed ICV. When the weight is larger, the advantage of ICV is obvious: compared with CVX, ICV is more suitable for larger mean weight values. Figure 2 shows the experimental results. The abscissa is the mean value of keyword weight, and the ordinate is the Query time in seconds. The number of sampled keywords ranges from 203 to 537.
Second, we test the performance of HCV, which uses the BGN homomorphic encryption technique, with its high computational complexity, to encrypt the index database. Therefore, we focus on the Setup experiment, which consists of generating the CV database and encrypting the CV using the BGN algorithm. We mainly tested the time complexity and the storage complexity of Setup, both of which are closely related to the number of keywords and the weight of the keywords. Our experiments are based on the sampled NSF dataset, which is divided into two types: one contains about 512 keywords, and the other contains about 256 keywords. Our experiments mainly depict how the keyword number and weight affect the Setup efficiency. When the number of keywords is about 512, we set the mean weight to 23/51/83/112. When the number of sampled keywords is about 256, we set the mean weight to 53/76/95/116. The experimental results are shown in Figure 3. Figure 3(a) describes the time complexity of the HCV Setup process. The horizontal axis is the mean weight, and the vertical axis is the time of the Setup process. If we fix the number of keywords, the storage complexity depends on the number of counter vectors, which is mainly correlated with the keywords' weight. Figure 3(b) describes the relationship between the number of counter vectors and the keywords' weight, which reflects the storage complexity. The horizontal axis is the mean weight, and the vertical axis is the number of reserved counter vectors.

Conclusion
This paper proposed a novel CV data structure, which is used in our proposed CVX, ICV, and HCV conjunctive query SE schemes. In CVX, the search efficiency is greatly improved compared with Kamara's IEX. However, when the weight of the keywords is larger, the search efficiency of CVX decreases dramatically. We therefore improved CVX and proposed ICV, which is more efficient but leaks more information than CVX. In HCV, we use the homomorphic encryption technique (BGN) to encrypt the CV data structure in order to resist the RP information leakage. Security analysis shows that our scheme is secure, and performance evaluation validates its efficiency. However, the homomorphic encryption used in our scheme incurs heavy computation and storage costs, so our scheme can only be used for small datasets. In future work, we will study a scalable scheme and consider more security properties.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.