Efficient Searchable Symmetric Encryption Supporting Dynamic Multikeyword Ranked Search

Searchable symmetric encryption that supports dynamic multikeyword ranked search (SSE-DMKRS) has been intensively studied in recent years. Such a scheme allows data users to dynamically update documents and efficiently retrieve the most relevant ones. Previous schemes suffer from high computational costs, since their time and space complexities are linear in the size of the dictionary generated from the dataset. In this paper, by utilizing a shallow neural network model called "Word2vec" together with a balanced binary tree structure, we propose a highly efficient SSE-DMKRS scheme. The "Word2vec" tool can effectively convert the documents and queries into a group of vectors whose dimensions are much smaller than the size of the dictionary. As a result, we can significantly reduce the related space and time costs. Moreover, with the use of the tree-based index, our scheme achieves sublinear search time and supports dynamic operations such as insertion and deletion. Both theoretical and experimental analyses demonstrate that the efficiency of our scheme surpasses that of other schemes of the same kind, so that it has wide application prospects in the real world.


Introduction
Nowadays, with the development of network and virtualization technology, cloud computing has developed rapidly. Through cloud services, enterprises and individuals can obtain better computing and storage services at a lower cost. Since cloud servers are not entirely trusted, utilizing cloud services while maintaining data privacy is an essential concern. A straightforward way to address this issue is to encrypt the data before outsourcing it to the cloud servers. However, this approach fails to meet the requirement of data retrieval, since traditional encryption scrambles the original data, making it inconvenient to utilize. In this scenario, the users have to download all the ciphertext data and decrypt them locally, which brings huge transmission, storage, and computation overhead and is thus not applicable in the cloud environment.
Searchable encryption (SE) can support keyword search without decrypting the data, and thus it is very suitable for keyword search over ciphertext. In an SE scheme, data owners and authorized users share a secret key. Data owners can encrypt the sensitive data and upload them to the cloud server. If data users want to search the encrypted data, they generate an encrypted trapdoor by using the query and the secret key. When the cloud server receives the trapdoor, it tests the trapdoor against the encrypted data without decrypting these data and returns the data related to the query to the users. The first searchable symmetric encryption (SSE) scheme was proposed by Song et al. [1]. This scheme can not only encrypt data but also provide a search mechanism over the encrypted data. With the improvement of the security and efficiency of SSE schemes, they have attracted the community's attention. In recent years, researchers have focused on constructing solutions with complex query functions, such as multikeyword search [2][3][4][5][6][7], similarity search [8,9], and ranked search [10][11][12][13][14][15]. In particular, ranked search schemes can sort the query results according to the degree of relevance between the documents and queries and return only the most related (top-k) documents. Thus, ranked schemes can significantly reduce the computation and storage costs. The earliest ranked search schemes were proposed in [10,11], which only support single-keyword search. The first SSE scheme supporting multikeyword ranked search was given by Cao et al. [12]. The score evaluation method used in their scheme is the inner product between the query and document vectors. In their scheme, since each document has its own vector representation, the search time is linear in the number of documents in the dataset, which incurs a very high storage overhead in a big data environment. Then, Sun et al. [13] gave a similar scheme with better-than-linear search efficiency by using a tree-based index [16,17]. They adopt the technique of term frequency-inverse document frequency (TF-IDF) to evaluate the score between the index and queries. To further improve the search efficiency and support dynamic updates, Xia et al. proposed an efficient SSE scheme supporting dynamic multikeyword ranked search [14]. In their scheme, they construct a tree-based structure and propose a parallel search algorithm to accelerate the search process. Moreover, they also provide a dynamic update method to cope flexibly with the deletion and insertion of documents. Recently, by utilizing the Bloom filter [18], Guo et al. constructed an efficient SSE scheme supporting dynamic multikeyword ranked search [15] to further improve the efficiency of keyword search and index construction. Owing to the Bloom filter, the internal nodes in the index tree do not need to be encrypted, and the dimension of the vectors in the internal nodes is also reduced. As a result, this scheme achieves better performance than the previous similar schemes.
Another kind of SE is searchable public key encryption (SPE), which is built on a public key system. In SSE, the key for encrypting data is the same as the key for generating search trapdoors. By contrast, in SPE, the public key for encrypting data is open to the public, while the secret key for generating search trapdoors is only given to the authorized data receivers. The very first SPE scheme supporting keyword search was introduced by Boneh et al. and is known as public key encryption with keyword search (PEKS) [19]. However, their work only supports single-keyword search. In order to support more expressive queries, many SPE schemes [20][21][22][23] were proposed to realize advanced search, for example, conjunctive, disjunctive, and Boolean keyword search. By using a special hidden structure, Xu et al. proposed two SPE schemes supporting single-keyword search [24,25] whose search performance is very close to that of a practical SSE scheme. By converting an attribute-based encryption scheme, Han et al. proposed an SPE scheme which can control users' search permissions according to an access control policy [26]. After this, Kai et al. proposed an SPE scheme achieving both Boolean keyword search and fine-grained search permissions [27]. Sepehri et al. proposed a scalable proxy-based protocol for privacy-preserving queries, which allows authorized users to perform queries over data encrypted with different keys [28]. Later, by utilizing an El-Gamal elliptic curve encryption system, Sepehri et al. gave a similar scheme with better efficiency [29]. In order to improve search accuracy, Zhang et al. proposed an SPE scheme supporting semantic keyword search by adopting a method called "Word2vec" [30]. For the sake of brevity, we summarize some SPE and SSE schemes in Table 1, which describes the differences between our scheme and previous schemes.

Motivation.
The previous ranked search schemes in the symmetric key setting are secure and somewhat efficient. However, the index building, trapdoor generation, and search times are all linear in the size of the dictionary generated from the dataset, which is not suitable for the big data environment. According to the statistical information given in [20], the vocabulary size of a dataset is commonly on the order of 10^6. Therefore, it is necessary to construct a more efficient ranked search scheme. Motivated by this, in this paper, we aim to construct a novel SSE scheme supporting dynamic multikeyword ranked search (SSE-DMKRS) with high efficiency.

Contributions.
The main contributions are summarized as follows: (1) Based on the "Word2vec" technique [31], we propose a novel method which converts the documents and queries into vector representations. The dimension of the vector representation obtained by our method is nearly 10% of that in the previous SSE-DMKRS schemes [14,15]. (2) We propose an efficient index building algorithm which creates a balanced binary tree to index all the documents. The obtained index tree achieves sublinear search time and supports dynamic update operations. (3) Through applying the secure k-nearest neighbour (KNN) scheme [32] to encrypt the index tree and the query, we propose an efficient SSE-DMKRS scheme.
In addition, we implement our scheme on a widely used data collection. The experimental results show that our scheme greatly reduces the time costs of index building, trapdoor generation, keyword search, and updates without losing too much accuracy; e.g., the time cost of index building in our scheme is nearly 10% of that in the previous schemes. Meanwhile, the storage cost of the encrypted index is also reduced greatly; e.g., the storage cost of the index in our scheme is nearly one percent of that in the previous schemes. In conclusion, compared to the previous SSE-DMKRS schemes [14,15], our scheme is very suitable for the mobile cloud environment, in which the client device has limited computation and storage resources.

Organization.
This paper is organized as follows. In Section 2, we give a formal definition of the system model and threat model of our scheme and introduce the tools we adopt, namely, "Word2vec" and the vector space model. In Section 3, we present the construction of the search index tree and the SSE-DMKRS scheme. Besides, a detailed security analysis and the update operations of our scheme are also given. Theoretical and experimental analyses are given in Section 4. Section 5 concludes the paper.

Preliminaries
In this section, we first give the framework of the system model and introduce the threat model adopted in our scheme.
Then, we introduce some tools adopted in our scheme, including a famous term representation method from the field of natural language processing, namely, "Word2vec," and the vector space model. Finally, we present the design goals of our scheme. In addition, the main notations used in this paper are summarized in Table 2.

System Model.
The system model contains three different roles: data owner, data user, and cloud server. The data owner outsources a group of documents F = {f_1, f_2, . . ., f_n} to the cloud in ciphertext form C = {c_1, c_2, . . ., c_n}. Moreover, the data owner also generates an encrypted searchable index for keyword search operations. For each query over an arbitrary keyword set Q, the data user computes a search trapdoor T_Q of the query Q and sends it to the cloud server. Upon receiving T_Q from the data user, the cloud server searches against the encrypted index and returns the candidate encrypted documents. After this, the data user decrypts the candidate documents and obtains the plaintext.
As illustrated in Figure 1, the architecture of the system model is formally described as follows: (1) Data Owner (DO). DO holds a group of documents F = {f_1, f_2, . . ., f_n} and generates a secure searchable index I from F and an encrypted document collection C for F. Then, DO uploads I and C to the cloud server and distributes the secret key to the authorized data users. Furthermore, DO needs to update the index and documents stored in the cloud server. (2) Data User (DU). An authorized DU can launch a keyword query over the encrypted data by utilizing a trapdoor which is generated with the secret key fetched from DO. Moreover, DU can decrypt the encrypted documents by utilizing the secret key. (3) Cloud Server (CS). CS stores the encrypted index I and documents C from DO. When CS receives the trapdoor for query Q from DU, CS executes the keyword query over the index and returns the top-k most relevant encrypted documents associated with the query Q. Upon receiving update information from DO, CS also performs the update operation over the encrypted data. In addition, we assume that CS is "honest-but-curious," as in many searchable encryption schemes [12,14,15]. This means that CS honestly and correctly executes the algorithms in our scheme; however, CS curiously infers and analyses the received data to obtain extra private information.

Threat Model.
Throughout the paper, we mainly utilize the two threat models proposed by Cao et al. [12]: (1) Known Ciphertext Model. CS only knows the information of the encrypted index, ciphertext, and trapdoor. That is to say, CS can only execute ciphertext-only attacks in this model. (2) Known Background Model. CS knows more information than in the known ciphertext model, such as statistical information inferred from the documents. By taking advantage of such statistical information, e.g., term frequency (TF) and inverse document frequency (IDF), CS can conduct statistical attacks to verify whether certain keywords are in the query [33].

Design Goals.
As mentioned before, we aim to build a secure and efficient SSE-DMKRS scheme. The design goals of our scheme are described as follows:
(1) Efficiency. The scheme aims to realize sublinear search efficiency, and the time and space costs of index building and trapdoor generation should be much less than those of the current schemes.
(2) Privacy Preserving. Similar to previous schemes [12,14,15], our scheme needs to prevent CS from learning extra private information inferred from the documents, the secure index, and the queries. More precisely, the privacy requirements are listed as follows:
• Index and Trapdoor Privacy. The plaintext information concealed in the index and the trapdoor cannot be leaked to CS. This information involves the keywords and the corresponding vector representation of each keyword.
• Trapdoor Unlinkability. CS cannot determine whether two trapdoors are built from the same query.
• Keyword Privacy. CS cannot identify whether a specific keyword is in the trapdoor or index by analysing the search results and the statistical information of the documents.
(3) Dynamic. The scheme can efficiently support dynamic operations such as document insertion and deletion. Note that the efficiency of the update operations in our scheme is better than that of the previous SSE-DMKRS schemes.

Table 1: Comparison between our scheme and previous schemes.
Type | Scheme | Search function | Additional function
SSE | [6] | Conjunctive keyword search | -
SSE | [8] | Multikeyword fuzzy search | -
SSE | [11] | Single-keyword ranked search | -
SSE | [12] | Multikeyword ranked search | -
SSE | [14] | Multikeyword ranked search | Dynamic update
SSE | Ours | Multikeyword ranked search | Semantic search and dynamic update
SPE | [22] | Conjunctive and disjunctive keyword search | -
SPE | [23] | Boolean keyword search | -
SPE | [25] | Single-keyword search | Fast search
SPE | [27] | Boolean keyword search | Access control
SPE | [29] | Multikeyword search | Data sharing
SPE | [30] | Multikeyword search | Semantic search

Word2vec.
. "Word2Vec" model is a shallow, two-layer neural network, which is used to convert words into a group of vector representations [31]. Under this model, each word in the document set is mapped to a vector, which can be used to calculate the similarity between words. For instance, Figure 2 shows that, through training a simple corpus, three words "dog," "fox," and "orange" are mapped to three vector representations, respectively. By utilizing these vectors, the similarity among these three words can be calculated. We can find that the similarity between "dog" and "fox" is more than that between "dog" and "orange" since "dog" and "fox" are animals. us, we can utilize "Word2Vec" to convert the keywords in a corpus into a group of vector representations and then apply these vectors to perform ranked search.

Vector Space Model.
Table 2: Notations.
- Matrices for encryption (encryption key).
- Matrices for decryption (decryption key).
- N: the number of semantic keywords associated with each dictionary keyword.
- m: the number of keywords in D.
- d: the dimension of the vectors generated by using "Word2vec."
- k: the number of files returned to the user.

[Figure 1: System model. Data owners upload the encrypted e-mails and the encrypted index to the semitrusted cloud server; data users, holding the secret key distributed by the data owners, submit trapdoors and receive search results.]

The vector space model is a very popular method in the field of information retrieval, usually used along with the TF-IDF rule to realize top-k search, where TF is the term frequency and IDF is the inverse document frequency [34]. By utilizing the TF-IDF rule, the documents and queries can be represented as a group of vectors. These vectors can be adopted in the top-k search over the ciphertext [12,14,15]. However, the dimension of these vectors is linear in the number of words in the dataset, which is inefficient if the dataset contains a large number of words. To address this issue, we apply "Word2vec" to present a novel keyword conversion method, which is described as follows: (1) Through applying "Word2vec" to a corpus, we create a dictionary in which each keyword is associated with a vector representation.
(2) After this, we set W_i and Q as the vector representations of the document f_i and the query, respectively. Note that the dimensions of W_i and Q are very small, e.g., 200, which is significantly smaller than the number of words in the dataset. Thus, the proposed method is better than the previous method based on the TF-IDF rule. In addition, together with the vector space model mentioned above, we use the cosine measure to evaluate the relevance between the document and the query. The relevance evaluation function is defined in the next section.
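As a concrete illustration of the conversion method, the sketch below averages the per-keyword embeddings to obtain one small vector per document or query and then scores them with the cosine measure. The toy embeddings and the averaging rule are illustrative assumptions, since the text does not spell out the exact aggregation step.

```python
import math

# Hypothetical toy embeddings; in practice they come from "Word2vec".
emb = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}

def text_to_vector(keywords, embeddings, d):
    """Map a keyword set to one d-dimensional vector by averaging the
    embeddings of its keywords (averaging is an assumption here)."""
    vec = [0.0] * d
    for w in keywords:
        for i, x in enumerate(embeddings[w]):
            vec[i] += x
    n = max(len(keywords), 1)
    return [x / n for x in vec]

def cosine(u, v):
    """Cosine relevance between a document vector and a query vector."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

doc_pets = text_to_vector(["cat", "dog"], emb, 2)  # document about pets
doc_cars = text_to_vector(["car"], emb, 2)         # document about cars
query = text_to_vector(["dog"], emb, 2)
# The pet document outranks the car document for the query "dog".
assert cosine(doc_pets, query) > cosine(doc_cars, query)
```

Note that the resulting dimension d is fixed by the embedding model (e.g., 200), independent of the dictionary size.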

Proposed Scheme
In this section, we first give the algorithm for building the index tree and the search algorithm over this tree. Then, we give the concrete construction of our scheme and its dynamic update operations. Finally, we give a detailed analysis of the security of our scheme.

Search Index Balanced Binary Tree.
In this section, we adopt a balanced binary tree to create the search index, which will be used in our main scheme. Inspired by the construction process in [14], the tree building and the search process for our scheme are described as follows.
3.1.1. Tree Building Process. Formally, the data structure of the tree node u is defined as u = ⟨ID, u_min, u_max, P_l, P_r, FID⟩, where ID is the identity of the node u, u_min and u_max are the vector representations of the node u, P_l and P_r are pointers which point to u's left and right children, respectively, and FID stores the identity of a document if u is a leaf node. Note that, compared with the previous index trees [12,14,15], each node in our tree has two vectors, while it has only one vector in the previous trees. The main reason is that the node vectors in our tree may contain negative numbers, while the node vectors in the previous trees only contain positive numbers. For clarity, we give a simple example.
Let a and b be the two vectors of leaf nodes A and B, respectively. For the previous index trees, the vector of the parent node C of these two leaf nodes is built by taking the coordinate-wise maximum of a and b. When the vectors contain negative numbers, it may happen that the scores of the nodes A, B, and C are 0.2, 0.2, and −0.4, respectively. It is very important to note that the score of the parent node is then less than the scores of its children, which causes these two leaf nodes to be ignored in the tree search process even though they should be considered.
In our index tree, let the dimensions of u_min and u_max both be d. The methods for constructing u_min and u_max are denoted by M_1 and M_2, respectively, and given as follows: (1) M_1: if the node u is a leaf node corresponding to a file f, we create a vector u for f by adopting the keyword conversion method mentioned in Section 2.5. Then, we set u_min = u and u_max = u. (2) M_2: if the node u is an internal node, u_min and u_max are built from its children's vectors. Let P_l·u_min and P_l·u_max be the two vectors of u's left child, and let P_r·u_min and P_r·u_max be the two vectors of u's right child. Suppose that Min() and Max() are the coordinate-wise minimum and maximum functions, respectively. Then u_min is built as u_min[i] = Min(P_l·u_min[i], P_r·u_min[i]), and u_max is built as u_max[i] = Max(P_l·u_max[i], P_r·u_max[i]), for each i ∈ [1, d]. That is, u_max is built by taking the larger entries of P_l·u_max and P_r·u_max, and u_min is created by taking the smaller entries of P_l·u_min and P_r·u_min.
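A minimal sketch of the methods M_1 and M_2, with hypothetical 3-dimensional document vectors (note that the entries may be negative, as discussed above):

```python
def make_leaf(doc_vec):
    # Method M1: a leaf stores the document vector as both u_min and u_max.
    return (list(doc_vec), list(doc_vec))

def make_parent(left, right):
    # Method M2: coordinate-wise minimum of the children's u_min vectors
    # and coordinate-wise maximum of their u_max vectors.
    vmin = [min(a, b) for a, b in zip(left[0], right[0])]
    vmax = [max(a, b) for a, b in zip(left[1], right[1])]
    return (vmin, vmax)

a = make_leaf([0.2, -0.4, 0.3])
b = make_leaf([0.5, -0.1, -0.2])
c = make_parent(a, b)
# c == ([0.2, -0.4, -0.2], [0.5, -0.1, 0.3])
```

The pair (u_min, u_max) thus bounds, coordinate by coordinate, every document vector stored beneath the node.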
[Figure 2: A vector space representation of the words "dog," "fox," and "orange," showing that "dog" is closer to "fox" since they share more common attributes than "dog" and "orange."]
An illustration of the above methods is given in Figure 3. In Figure 3, when the node u is a leaf node, W denotes the keyword set of the file that u stores, and W is converted into a vector by the keyword conversion method. When the node u is an internal node, the vectors of its children are P_l·u_min, P_l·u_max, P_r·u_min, and P_r·u_max, from which the vectors u_min and u_max of the internal node are built. Based on the methods M_1 and M_2, and inspired by the tree building algorithm introduced in [14], our tree building algorithm is given in Algorithm 1. An example of the proposed index tree is given in Example 1 and Figure 4. In Algorithm 1, we use the function GenID() to generate a unique identity ID for each node and apply GenFID() to generate a unique file ID for each leaf node. CurrentNodeSet contains a group of nodes having no parent node, which still need to be processed, and |CurrentNodeSet| is the number of nodes in CurrentNodeSet. If |CurrentNodeSet| is even, we write |CurrentNodeSet| = 2h; otherwise, we write |CurrentNodeSet| = 2h + 1, where h is a positive integer. TempNodeSet is a set containing the newly generated nodes. Moreover, for each node u, if u is a leaf node, we use method M_1 to generate its vectors; otherwise, we use method M_2. 3.1.2. Search Process. We can utilize the relevance score function Score(u, Q), defined in equation (3), to evaluate which documents are the most related to the query. Moreover, we can verify that the score of a parent node is larger than its children's scores. This property can significantly reduce the number of nodes which must be checked in the search process.
The search process is given in Algorithm 2. In Algorithm 2, we use RList to store the top-k files which have the k largest relevance scores to the query. The RList is initialized as an empty list, and it is updated whenever a relevant file is found. The k-th score is defined as the smallest relevance score in the current RList and is initialized to a very small integer. By using the k-th score, we can accelerate the search process by ignoring paths with low scores, as illustrated in Example 1 and Figure 4.
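The pruned top-k traversal can be sketched as follows. The tuple-based node layout and the upper-bound score function are illustrative assumptions (the plaintext bound below stands in for the paper's equation (3)), but the pruning logic mirrors Algorithm 2: a subtree is skipped when its score bound does not beat the current k-th score.

```python
import heapq

# A tree node is a tuple (vmin, vmax, left, right, fid); fid is None for
# internal nodes. Vectors are hypothetical 2-dimensional examples.
def leaf(vec, fid):
    return (vec, vec, None, None, fid)

def inner(l, r):
    # Parent vectors: coordinate-wise min/max of the children (method M2).
    vmin = [min(a, b) for a, b in zip(l[0], r[0])]
    vmax = [max(a, b) for a, b in zip(l[1], r[1])]
    return (vmin, vmax, l, r, None)

def score(node, q):
    # Upper-bound relevance: per coordinate, pick whichever of vmin/vmax
    # maximizes the product with q, so a parent never scores below any
    # leaf beneath it (a stand-in for the paper's score function).
    vmin, vmax = node[0], node[1]
    return sum(qi * (vmax[i] if qi >= 0 else vmin[i]) for i, qi in enumerate(q))

def search(node, q, k, heap=None):
    """Top-k depth-first search with k-th-score pruning (cf. Algorithm 2).
    `heap` is a min-heap of (score, fid) holding the current best k."""
    if heap is None:
        heap = []
    fid = node[4]
    if fid is not None:                  # leaf: try to enter the result list
        s = score(node, q)
        if len(heap) < k:
            heapq.heappush(heap, (s, fid))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, fid))
    elif len(heap) < k or score(node, q) > heap[0][0]:  # otherwise prune
        search(node[2], q, k, heap)
        search(node[3], q, k, heap)
    return heap

tree = inner(inner(leaf([0.9, 0.1], "f1"), leaf([0.8, 0.2], "f2")),
             inner(leaf([0.1, 0.9], "f3"), leaf([0.2, 0.8], "f4")))
top2 = sorted(search(tree, [1.0, 0.0], 2), reverse=True)
# top2 == [(0.9, "f1"), (0.8, "f2")]; the f3/f4 subtree is never descended.
```

Because the bound at a parent dominates every leaf below it, skipping a low-scoring subtree never discards a true top-k result.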

Example 1.
An example of an index tree and a search process on this tree is illustrated in Figure 4. In Figure 4, we show an index tree for F = {f_1, f_2, . . ., f_6} in which the dimension of the vector of each node is 3. For each node u in the tree, the upper and lower vectors correspond to u_min and u_max, respectively. In the tree building process, we first generate the leaf nodes from F and then create the internal nodes based on these leaf nodes. Moreover, Figure 4 also gives an illustration of the search process. In Figure 4, we set the query vector q and suppose that the top-3 files will be returned to the data user. According to Algorithm 2, the search process begins at the root node r and calculates the scores between the query Q and the two child nodes r_11 and r_12 of r by using equation (3). Because the score between r_11 and Q is higher than that between r_12 and Q, Algorithm 2 will traverse the subtree with r_11 as the root node and compute the scores between the query Q and the two child nodes of r_11. Since the score between r_21 and Q is higher than that between r_22 and Q, Algorithm 2 will traverse the subtree with r_21 as the root node and add the leaf nodes f_1 and f_2 to the RList. After this, the subtree with r_22 as the root node is traversed, and the leaf nodes f_3 and f_4 are reached. Since the number of files in RList is less than 3, f_3 is added to RList directly. For the file f_4, since the number of files in RList now equals 3, Algorithm 2 compares the score between f_4 and Q to the minimum score in the RList. Because the score between f_4 and Q is smaller than the minimum score in the RList, f_4 is not added to the RList. At this point, the subtree with r_11 as the root node has been fully traversed, and Algorithm 2 turns to the subtree with r_12 as the root node. As the score between r_12 and Q is smaller than the minimum score in the RList, which means that the scores of all child nodes of r_12 are smaller than the minimum score in the RList (this property is described in Section 3.1.2), f_5 and f_6 will not be checked. Therefore, Algorithm 2 outputs RList = {f_1, f_2, f_3}.

Construction of SSE-DMKRS.
In this section, through combining the secure KNN algorithm [32] and the index tree building algorithm, we propose a concrete SSE-DMKRS scheme.
The SSE-DMKRS scheme consists of five algorithms. The algorithms KeyGen, DictionaryBuild, and IndexBuild are executed by the data owners, while the algorithms TrapdoorGen and Search are performed by the data users and the cloud server, respectively.
[Figure 3: Construction of the node vectors, where u_min and u_max are the vectors of a node u, and P_l·u_min, P_l·u_max, P_r·u_min, and P_r·u_max are the vectors of its children.]
Algorithm 1: BuildIndexTree (F, D).
Input: the document collection F = {f_1, f_2, . . ., f_n} and a semantic dictionary D generated by applying "Word2vec" to F.
Output: the index tree T.
(1) for each i ∈ [1, n] do:
(2) Construct a leaf node u for f_i, with u.ID = GenID(), u.P_l = u.P_r = NULL, u.FID = GenFID(f_i), and generate u_min and u_max according to the method M_1;
(3) Insert u into CurrentNodeSet;
(4) end for
(5) while |CurrentNodeSet| > 1 do:
(6) if |CurrentNodeSet| is even, i.e., 2h, then:
(7) for each pair of nodes u′ and u″ in CurrentNodeSet do:
(8) Create a parent node u for u′ and u″, with u.ID = GenID(), u.P_l = u′, u.P_r = u″, u.FID = NULL, and set u_min and u_max according to the method M_2;
(9) Insert u into TempNodeSet;
(10) end for
(11) else \\Suppose that |CurrentNodeSet| = 2h + 1
(12) for each pair of nodes u′ and u″ among the former 2h − 2 nodes in CurrentNodeSet do:
(13) Create a parent node u for u′ and u″;
(14) Insert u into TempNodeSet;
(15) end for
(16) Create a parent node u_1 for the (2h − 1)-th and (2h)-th nodes, and then generate a parent node u for the (2h + 1)-th node and u_1;
(17) Insert u into TempNodeSet;
(18) end if
(19) Set CurrentNodeSet = TempNodeSet and clear TempNodeSet;
(20) end while
(21) return CurrentNodeSet;
(22) \\Note that CurrentNodeSet now contains only one node, which is the root of the index tree T.

(ii) DictionaryBuild (F): given the document collection F = {f_1, f_2, . . ., f_n}, the algorithm runs "Word2vec" to generate the dictionary D of F. In the dictionary D, each keyword is associated with a vector representation. Besides, each keyword also corresponds to a set of semantically related keywords. (iii) IndexBuild (sk, F, D): given the document set F and the dictionary D for F, the algorithm first creates the index tree T by using the algorithm BuildIndexTree (F, D) (Algorithm 1). Then, for each node u in the tree T, the algorithm generates two random vector pairs for the vectors u_min and u_max, respectively.
More precisely, each of u_min and u_max is split into a pair of random vectors under the constraint that the pair recombines to the original vector, following the splitting rule of the secure KNN algorithm. Finally, for each node u, the algorithm multiplies the split vectors by the secret matrices in sk to compute the encrypted index I_u. Through replacing the plaintext vectors u_min and u_max with the encrypted index I_u, an encrypted index tree I_T is created.
(iv) TrapdoorGen (sk, Q): given a query keyword set Q, the algorithm first extends Q to a new semantic keyword set Q′. The process is as follows: (a) It generates a new keyword set Q′, which is initialized to an empty set. Then, based on Q′, the TrapdoorGen algorithm generates a pair of vectors q_min and q_max by adopting the method M_3. After this, it generates two random vector pairs for the vectors q_min and q_max, respectively. This process is similar to the corresponding process in the IndexBuild algorithm. Finally, the algorithm generates the trapdoor T_Q. (v) Search (sk, T_Q, I_T): for each node u in I_T, the algorithm computes the relevance score between the encrypted vector I_u and the trapdoor T_Q. According to equation (3), the relevance score calculated from the encrypted vector I_u and the trapdoor T_Q equals the value of Score(u, Q). By using this property, the algorithm can utilize the SearchIndexTree algorithm (Algorithm 2) to perform the ranked search.
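The property that encrypted scores equal plaintext scores comes from the secure KNN technique [32] and can be sketched as follows. The dimension is fixed at 2 and the matrix inverses are hard-coded to keep the example dependency-free; the matrices, the split indicator S, and the test vectors are all hypothetical. The point is that the inner product of an encrypted index vector and a trapdoor equals the plaintext inner product, so the server can rank without decrypting.

```python
import random

# Hypothetical secret key: invertible matrices M1, M2 (inverses hard-coded)
# and a split indicator S.
M1 = [[2.0, 1.0], [1.0, 1.0]]; M1_inv = [[1.0, -1.0], [-1.0, 2.0]]
M2 = [[1.0, 2.0], [0.0, 1.0]]; M2_inv = [[1.0, -2.0], [0.0, 1.0]]
S = [1, 0]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def encrypt_index(p):
    # Split p by S (random split where S[i]==1, copy where S[i]==0),
    # then multiply by the transposed secret matrices.
    p1, p2 = p[:], p[:]
    for i, bit in enumerate(S):
        if bit == 1:
            r = random.random()
            p1[i], p2[i] = r, p[i] - r
    return mat_vec(transpose(M1), p1), mat_vec(transpose(M2), p2)

def encrypt_query(q):
    # Complementary split: copy where S[i]==1, random split where S[i]==0,
    # then multiply by the inverse matrices.
    q1, q2 = q[:], q[:]
    for i, bit in enumerate(S):
        if bit == 0:
            r = random.random()
            q1[i], q2[i] = r, q[i] - r
    return mat_vec(M1_inv, q1), mat_vec(M2_inv, q2)

def enc_score(enc_p, trap):
    (c1, c2), (t1, t2) = enc_p, trap
    return sum(a * b for a, b in zip(c1, t1)) + sum(a * b for a, b in zip(c2, t2))

p, q = [0.3, 0.7], [0.5, 0.2]
plain = sum(a * b for a, b in zip(p, q))               # plaintext inner product
enc = enc_score(encrypt_index(p), encrypt_query(q))    # score over ciphertexts
assert abs(plain - enc) < 1e-9
```

The equality holds because (M1ᵀp1)·(M1⁻¹q1) = p1·q1 (and likewise for M2), and the complementary splits recombine so that p1·q1 + p2·q2 = p·q.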

Dynamic Update Operations.
Besides the search operation, the proposed scheme also supports dynamic operations, e.g., document insertion and deletion, satisfying the requirements of real-world applications. Because the proposed scheme is built over a balanced binary tree, the update operations are realized by modifying the nodes in the tree. Inspired by the update method introduced in [14,15], the update algorithms are presented as follows: (i) UpdateInfoGen (sk, T_s, f_i, Utype): this algorithm is executed by the data owners and generates the update information {I_s, c_i} for the cloud server, where T_s is a set containing all the update nodes, I_s is an encrypted form of T_s, f_i is the target document, c_i is an encrypted form of f_i, and Utype is the update type. In order to reduce the communication cost, the data owners store the unencrypted index tree on their own devices. For Utype ∈ {Ins, Del}, the algorithm works as follows: (a) If Utype = "Del," the algorithm will delete the document f_i from the tree. The algorithm first finds the leaf node associated with the document f_i and deletes it; the internal nodes associated with this leaf node are also added to T_s. Specifically, if the deletion operation would break the balance of the index tree, the algorithm can set the target leaf node as a fake node instead of removing it. After this, the algorithm encrypts T_s to generate I_s. Finally, the algorithm sends I_s to the cloud server and sets c_i as null.
Algorithm 2: SearchIndexTree (q, D, u, RList).
Input: a vector q of query Q, a semantic dictionary D generated by applying "Word2vec" to F, a node u of the index tree, and RList.
Output: RList.
(1) Split q into q_min and q_max according to the method M_3;
(2) if u is an internal node then:
(3) if Score(u, Q) > k-th score then:
(4) SearchIndexTree(q, D, u.P_l, RList);
(5) SearchIndexTree(q, D, u.P_r, RList);
(6) else:
(7) return
(8) end if
(9) else
(10) if Score(u, Q) > k-th score then: \\Update RList.
(11) Delete the element holding the smallest relevance score in RList;
(12) Insert a new element <Score(u, Q), u.FID> into RList, and sort the elements in RList;
(13) end if
(14) return
(15) end if

(b) If Utype = "Ins," the algorithm will insert a document f_i into the tree. The algorithm first creates a leaf node for f_i according to the method M_1 introduced in Section 3.1 and inserts this leaf node into T_s. Then, based on the method M_2, the algorithm updates the vectors of the internal nodes placed on the path from the root to the new leaf node and inserts these internal nodes into T_s. Here, the algorithm prefers to replace a fake leaf node with the new leaf node rather than insert a brand-new leaf node. Finally, the algorithm encrypts T_s and f_i to generate I_s and c_i, respectively, and sends them to the cloud server.
(ii) Update (I T , C, I s , c i , Utype): this algorithm is executed by the cloud server to update the index tree I T with encrypted nodes set I s . After this, if Utype � "Del," then the algorithm removes c i from C. Otherwise, the algorithm inserts c i to C.
Note that after a period of insertion and deletion operations, the number of keywords in the dictionary may change. Because the dimensions of the index and trapdoor vectors in the previous schemes are linear in the number of keywords in the dictionary, those schemes have to rebuild the search index tree. By contrast, our scheme is not affected by this problem. In the proposed scheme, the dimensions of the vectors in the index and trapdoor are determined by the "Word2vec" tool and set by the users. For example, if we set the dimension of the word vectors to 200, the dimension of each keyword's vector is 200, and thus the dimensions of the vectors u_min, u_max, q_min, and q_max are all 200. According to the above analysis, our scheme is more suitable for update operations than the previous schemes.

Security Analysis.
In this section, we analyse the security of the proposed SSE-DMKRS scheme according to the privacy requirements introduced in Section 2.3: (1) Index and Trapdoor Privacy. In the proposed scheme, each node u in the index tree and the query Q in the trapdoor are encrypted using the secure kNN algorithm introduced in [32]. Thus, attackers cannot obtain the original vectors in the tree nodes and the query, which means that index and trapdoor privacy are well protected.
(2) Trapdoor Unlinkability. In the trapdoor generation phase, the query vector is split randomly. Moreover, the same keyword set Q is extended to multiple different semantic keyword sets Q′. Hence, the same query Q is encrypted into different trapdoors, which means that the goal of trapdoor unlinkability is achieved. (3) Keyword Privacy. Since the index and the trapdoor are protected by the secure kNN algorithm, the adversary cannot infer plaintext information from them under the known ciphertext model. Considering that the known background model is common in real-world applications, we also analyse the security of the proposed scheme under this model. For the TrapdoorGen algorithm, the original query keyword set Q is extended to a new set Q′. Specifically, for each keyword q in Q, the algorithm randomly chooses a number k′, selects k′ semantic keywords related to q by utilizing the dictionary, and inserts these keywords into Q′. Supposing that each keyword is associated with N semantic keywords in the dictionary, each keyword can generate 2^N different keyword sets, since each semantic keyword can either be chosen or not. For example, if a keyword q is associated with three semantic keywords {q_1, q_2, q_3}, then q can generate 2^3 keyword sets: {q}, {q, q_1}, {q, q_2}, {q, q_3}, {q, q_1, q_2}, {q, q_1, q_3}, {q, q_2, q_3}, and {q, q_1, q_2, q_3}. Since the query Q usually contains more than one keyword, Q will generate more than 2^N different semantic keyword sets. In this way, the final similarity score is obfuscated by these random semantic keyword sets. Following the analysis in [14, 15], our scheme can protect keyword privacy under the known background model.
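The random extension of Q into a semantic set Q′ can be sketched as follows. The semantic dictionary contents and the 50% inclusion probability are illustrative assumptions (the paper only requires that each of the N related keywords may or may not be chosen, giving 2^N possible sets per keyword):

```python
import random

# Hypothetical semantic dictionary: each keyword maps to its N related
# keywords (in the paper these would come from the Word2vec model).
semantic = {
    "cloud":  ["server", "storage", "virtual"],
    "search": ["query", "retrieve", "lookup"],
}

def extend_query(query_keywords, rng=random):
    """Extend Q into a random semantic set Q'. Each related keyword is
    independently kept or dropped, so a keyword with N neighbours can
    yield 2**N distinct extended sets, obfuscating the similarity score."""
    extended = set(query_keywords)
    for q in query_keywords:
        for s in semantic.get(q, []):
            if rng.random() < 0.5:  # choose each semantic keyword or not
                extended.add(s)
    return extended
```

Because the extension is re-randomized on every query, two trapdoors for the same Q are built from different sets Q′, which is what breaks trapdoor linkability.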

Performance Evaluation
In this section, we analyse the proposed SSE-DMKRS scheme both theoretically and experimentally. A detailed experiment demonstrates that our scheme can efficiently perform dynamic ranked keyword search over encrypted data. Our experiments run on an Intel® Core™ i7 CPU at 2.90 GHz with 16 GB of memory and use a real-world e-mail dataset, the Enron e-mail dataset [35]. We mainly analyse the performance of our scheme in two respects: (1) the efficiency of the proposed scheme, including index building, trapdoor generation, search, and update; (2) the relationship between search precision and privacy level. Moreover, in order to show the advantages of our scheme, we also compare it to two closely related previous schemes, which we denote, for simplicity, by X15 and G19 [14, 15].

Index Building.
Since the dimension of each node's vector in both X15 and G19 is linear in the number of keywords in the dictionary (m), the time cost of index building is O(nm^2) in both schemes. Because d ≪ m, the time cost of index building in our scheme is much less than that in X15 and G19. In addition, in G19 the internal nodes are constructed with a Bloom filter, so the dimension of each internal node's vector is linear in b. Since b is usually smaller than m, the index building time in G19 is less than that in X15. Figure 6(a) shows that the time cost of index building in our scheme is much lower than in X15 and G19. More precisely, when n = 1000, m = 20000, d = 1000, and b = 10000, the index building time in X15 and G19 is nearly 100 to 200 times that of our scheme. As m increases, the advantage of our scheme becomes even more significant.
In addition, because the index tree has O(n) nodes and each node holds two d-dimensional vectors, the space complexity of our index tree is O(nd). By contrast, the space complexities of the index trees in X15 and G19 are both O(nm). As Table 3 shows, even when we set n = 1000, m = 20000, d = 1000, and b = 10000, the storage cost of the index tree in our scheme is still much less than that in X15 and G19.
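A back-of-the-envelope check of the O(nd) versus O(nm) storage gap, under the assumptions of 4-byte vector entries, counting n nodes per tree, and ignoring encryption overhead and the constant factor for internal nodes:

```python
# Parameters from the experiment section.
n, m, d = 1000, 20000, 1000
bytes_per_entry = 4  # assume 32-bit values per vector coordinate

ours = 2 * n * d * bytes_per_entry  # two d-dimensional vectors per node: O(nd)
prev = n * m * bytes_per_entry      # one m-dimensional vector per node: O(nm)

# Our index stays an order of magnitude smaller whenever 2d << m.
assert ours < prev
```

With these numbers, the sketch gives roughly 8 MB for our index against roughly 80 MB for an O(nm) index, consistent with the order-of-magnitude gap reported in Table 3.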

Trapdoor Generation.
In our scheme, the query is converted into two vectors, q_min and q_max, whose dimensions are both d. The trapdoor generation process multiplies these two vectors by the d × d matrices in the key, so the time complexity of trapdoor generation in our scheme is O(d^2). By contrast, since the dimensions of the query vectors in X15 and G19 are both m, their trapdoor generation complexities are both O(m^2). Thus, the time cost of trapdoor generation in our scheme is much less than that in X15 and G19. In particular, as Figure 6(b) shows, when n = 1000, m = 20000, and d = 1000, the trapdoor generation time in our scheme is 1.5 ms, while that in G19 and X15 is 287 ms and 290 ms, respectively.
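The cost profile of trapdoor generation can be sketched as below. This is a simplified stand-in, not the full secure kNN encryption (which additionally splits each vector with a random bit string before applying the key matrices); the random key matrices are assumed invertible, which holds with overwhelming probability but is not checked here:

```python
import random

d = 8
rng = random.Random(42)

# Secret key: two random d x d matrices (invertibility is assumed;
# a real implementation would verify or construct it explicitly).
M1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
M2 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]

def mat_vec(M, v):
    # A d x d matrix times a d-vector costs d*d multiplications: O(d^2).
    return [sum(M[i][j] * v[j] for j in range(d)) for i in range(d)]

def gen_trapdoor(q_min, q_max):
    """Encrypt the two query vectors with the key matrices.
    The total cost is two O(d^2) products, independent of the
    dictionary size m."""
    return mat_vec(M1, q_min), mat_vec(M2, q_max)

t_min, t_max = gen_trapdoor([1.0] * d, [0.5] * d)
assert len(t_min) == len(t_max) == d
```

Since d is user-chosen and d ≪ m, this is where the 1.5 ms versus ~290 ms gap in Figure 6(b) comes from.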

Search.
In the search process, if the relevance score between an internal node u and the query Q is less than the minimum relevance score of the current top-k documents, the subtree rooted at u will not be accessed.
Thus, not all nodes in the tree are visited during the search. Suppose there are θ leaf nodes that contain at least one keyword of the query Q. Since the height of the tree is O(log n) and a relevance score calculation takes O(d) time, the time complexity of the search process is O(θd · log n). For X15, because a relevance score calculation takes O(m), the search complexity is O(θm · log n). For G19, because each internal node contains a Bloom filter of size b and each leaf node involves a vector of size m, the search complexity is O(θ(m + b · log n)). As Figure 6(c) shows, when n = 1000, m = 20000, d = 1000, and b = 10000, the search time in our scheme is 36 ms, while that in G19 and X15 is 135 ms and 214 ms, respectively.

Update.
When the data owners want to insert or delete a document, they not only insert or delete a leaf node but also update O(log n) internal nodes. Since the encryption time for each node is O(d^2), the time complexity of an update operation is O(d^2 · log n). For the X15 scheme, because the encryption time for each node is O(m^2), the update complexity is O(m^2 · log n). For the G19 scheme, the internal nodes are based on an unencrypted Bloom filter, so the time for updating the internal nodes is negligible; thus, the update complexity in G19 is O(m^2), since only the leaf node is encrypted. As Figure 6(d) shows, when n = 1000, m = 20000, d = 1000, and b = 10000, the time for updating one document in our scheme is 16 ms, while that in X15 and G19 is 1020 ms and 107 ms, respectively.
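The update asymptotics can be made concrete with a small counting sketch. This counts scalar multiplications only, under the assumption of a perfectly balanced tree; it is not the encryption itself:

```python
import math

def update_cost(n, d):
    """Scalar multiplications for one update: O(log n) re-encrypted
    nodes on the root-to-leaf path, each encrypted via a d x d product."""
    path_len = math.ceil(math.log2(n)) + 1  # nodes on a root-to-leaf path
    return path_len * d * d

# With n = 1000, an update touches only ~11 nodes, and the per-node
# cost is driven by d (ours) versus m (X15), not by the tree size.
assert update_cost(1000, 200) < update_cost(1000, 20000)
```

For n = 1000, replacing d = 200 with m = 20000 multiplies the count by (m/d)^2 = 10000, which matches the direction (though not the exact ratio) of the 16 ms versus 1020 ms gap in Figure 6(d).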

Precision and Privacy.
The search precision of our scheme is affected by the group of semantic keywords related to the original index and query keywords. We measure our scheme with the metric called "precision" defined in [12]: precision P_k = k′/k, where k′ is the number of real top-k documents among the retrieved k documents. In addition, the semantic keywords in the index and query keyword sets disturb the relevance score calculation in the search process, which makes it harder for adversaries to identify keywords in the index and trapdoor through statistical information about the dataset. To measure the extent of this disturbance, we use the "rank privacy" measure introduced in [12]: rank privacy P̃_k = (Σ_i |r_i − r_i′|)/k², where r_i is the rank number of document i in the retrieved top-k documents and r_i′ is document i's real rank number in the true ranked results.
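The two metrics can be computed as below. This is our reading of the definitions in [12]; in particular, the convention that a retrieved document absent from the real top-k contributes a perturbation of k (and that perturbations are capped at k) is an assumption taken from that paper, not stated here:

```python
def precision(retrieved, real_topk):
    """P_k = k'/k, where k' is how many of the k retrieved documents
    belong to the real top-k result set."""
    k = len(retrieved)
    return sum(1 for doc in retrieved if doc in real_topk) / k

def rank_privacy(retrieved, real_rank):
    """Sum of rank perturbations |r_i - r_i'| over the retrieved
    documents, capped at k and normalised by k^2 (per [12])."""
    k = len(retrieved)
    total = 0
    for i, doc in enumerate(retrieved):
        r = i + 1                       # rank in the returned list
        r_real = real_rank.get(doc, k)  # missing documents count as rank k
        total += min(abs(r - r_real), k)
    return total / (k * k)
```

A perfectly ranked result gives precision 1.0 and rank privacy 0.0; the semantic-keyword obfuscation deliberately trades the former for the latter.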
We compare our scheme to X15 and G19 in terms of precision and rank privacy. Note that an important parameter in the two previous schemes is a standard deviation σ, which is used to adjust the relevance scores of the dummy keywords. In the comparison, we set σ = 0.05, the value usually used in those schemes. Besides, in our scheme, we set the number of semantic keywords for each keyword in the dictionary to 100 and the dimension of each node's vector to 1000 (d = 1000). Under these settings, the comparison is illustrated in Figure 7.
From Figure 7, as k grows from 10 to 50, the precision of our scheme decreases slightly from 59% to 55%, and the rank privacy increases slightly from 26% to 28%. For X15 and G19, the precision likewise decreases and the rank privacy increases as k grows; this characteristic is common to all three schemes. Because the vector representations of the index tree and query in our scheme are deeply compressed, some statistical information in the index and the query is lost. Thus, the precision of our scheme is lower than that of X15 and G19, while its rank privacy is correspondingly higher.

Impact of the Dimension of Vector Representation.
The dimension of the vector representation (d), which we set in "Word2vec," is an important parameter of our scheme; we now discuss its impact. Figure 8 shows the impact of d on efficiency: the time costs of index building, trapdoor generation, search, and update all increase as d grows. Besides, Figure 9 illustrates the impact of d on precision and rank privacy: as d increases from 200 to 1000, the precision of our scheme increases slightly, while the rank privacy decreases gradually. These observations are consistent with our earlier theoretical analysis. Hence, in the proposed scheme, data users can balance efficiency and accuracy by adjusting the parameter d to satisfy the requirements of different applications.

Discussion.
From the experimental results, when n = 1000, m = 20000, d = 200, and b = 10000, the index building time is 3 s, the generation time of a single trapdoor is 1.5 ms, and the search time is 36 ms, all of which are much better than the previous schemes X15 and G19. This efficiency demonstrates that our scheme is well suited for practical applications, especially mobile cloud settings in which clients have limited computation and storage resources. The experimental results also show that the precision of our scheme is lower than that of the previous two schemes, while its rank privacy is correspondingly higher. In addition, by using the "Word2vec" method, the vector representations used in our scheme capture the semantic information of the documents and queries. Based on these facts, we argue that the proposed scheme is suitable for applications requiring similarity and semantic search, such as mobile recommendation systems, mobile search engines, and online shopping systems.

Conclusions
In this paper, by applying "Word2vec" to construct the vector representations of documents and queries and adopting a balanced binary tree to index the documents, we proposed a searchable symmetric encryption scheme supporting dynamic multikeyword ranked search. Compared with previous schemes, ours tremendously reduces the time costs of index building, trapdoor generation, search, and update. Moreover, the storage cost of the secure index is also reduced significantly. Considering that the precision of our scheme can be further improved, in future work we will construct a more accurate scheme based on recent information retrieval techniques.

Data Availability
The data used to support the findings of this study are available from the following website: http://www.cs.cmu.edu/~enron/.

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.