Secure kNN Computation and Integrity Assurance of Data Outsourcing in the Cloud

As cloud computing is adopted massively and rapidly, individuals and enterprises prefer outsourcing their databases to the cloud service provider (CSP) to save the expenditure of managing and maintaining the data. The CSP hosts the outsourced databases and offers query services to clients, but it is not fully trusted, so security can be violated in multiple ways. Data privacy and query integrity are perceived as the two major concerns obstructing enterprises from outsourcing their databases. In this paper we propose a novel scheme that supports k-nearest neighbors (kNN) queries and kNN query authentication on an encrypted outsourced spatial database. We present an asymmetric scalar-product-preserving encryption scheme in which data points and query points are encrypted with different encryption keys, yet the CSP can determine the distance relation between encrypted data points and query points. Furthermore, we extend the similarity search tree into a novel verifiable SS-tree (VSS-tree) that supports efficient kNN queries and kNN query verification. Security analysis and experimental results indicate that our scheme not only maintains the confidentiality of the outsourced confidential data and query points but also incurs lower kNN query processing and verification overhead than the MR-tree.


Introduction
As spatial data resources have grown by leaps and bounds, enterprises that want to keep pace must expand both their hardware and software resources and recruit professionals to manage and maintain the data. Accordingly, the overhead of data maintenance has boomed. On the other hand, cloud computing has become progressively popular in recent years. This arises from its capability to offer numerous benefits, such as quick deployment, on-demand service, high scalability, and cost reduction [1][2][3][4]. A growing number of companies are motivated to outsource their daily business, even their core business, to the cloud service provider to eliminate the investment in hardware and software and to reduce the cost of maintaining data. Moreover, exploiting the advantages of cloud computing, end users can run online software applications and access the service at any time and any place [5]. Outsourcing spatial databases increasingly reflects this trend. Spatial data has many practical applications, such as environmental monitoring, location-based services, and flow control. In the data outsourcing model, the cloud service provider (CSP) hosts the outsourced databases and provides query services for the clients, and the data owner loses management and control of the outsourced data. Consequently, the confidentiality and security of the data can be violated. Data privacy and security problems count as the major factors obstructing data owners from outsourcing their databases to the CSP [6, 7].
Data encryption is the most frequently adopted approach to maintaining data confidentiality: only authorized parties can decrypt. It is noteworthy that the purpose of outsourcing data is to draw on the strong computing power and high bandwidth of the cloud service provider to offer rapid and efficient services to users. Yet traditional encryption approaches, such as DES and RSA, are primarily designed to encrypt confidential data; the encrypted data cannot support queries and analysis as efficiently as the original data. Many effective schemes [8][9][10][11] have been proposed to execute queries directly on encrypted data.
Besides data privacy, query integrity, also known as query authentication, is another critical problem in the domain of data outsourcing. Since the CSP is not fully trusted, it may return incorrect or incomplete query results. Extra authentication information must be offered to the client to ensure the correctness and completeness of query results without having to trust the CSP. Correctness means that the records in the results really exist in the owner's database and have not been modified by any user. Completeness means that all the records that satisfy the query condition are included in the results. Query integrity is particularly crucial when the results lay the foundation for critical decisions.
The k-nearest neighbors (kNN) query is a crucial data analysis operation that can be used as an independent query or as a core module of data mining and has been applied in many practical applications, such as geospatial technology, location-based services, and pattern recognition. Recent studies [12][13][14][15][16] have proposed various techniques to support either kNN queries on encrypted data or kNN query authentication. However, both privacy protection and query authentication should be provided in an insecure cloud computing environment. Thus, we focus on kNN query processing and kNN query authentication on an encrypted spatial dataset. In this paper, we introduce an asymmetric scalar-product-preserving encryption to encrypt confidential data points and query points, and then we propose an authenticated spatial index structure based on the SS-tree [17], called verifiable SS-tree (VSS-tree), for secure kNN query processing and kNN query authentication. Our main contributions are as follows: (1) We introduce an asymmetric scalar-product-preserving encryption through which the data owner encrypts confidential data points and query points with different encryption keys. The cloud server can perform a kNN query on the encrypted outsourced spatial database.
(2) We extend the SS-tree [17] and propose a novel verifiable SS-tree (VSS-tree) for kNN query processing and kNN query authentication.
(3) We perform a detailed security analysis and performance evaluation of our scheme.
The rest of this paper is organized as follows. The related work is reviewed in Section 2. The system model is proposed in Section 3. Section 4 specifies the encryption scheme. Section 5 elaborates on the VSS-tree. In Section 6, we perform a security analysis of our scheme. In Section 7, the performance and experimental results are presented. Finally, we conclude this paper in Section 8.

Related Work
A "bucket-based" encryption approach is proposed in [8, 9]. The domain of the private data is subdivided into multiple disjoint ranges, and each range is identified by a unique identifier. The cloud server performs a range query in the light of the identifiers and returns a superset of the real result set. The client has to do extra processing to get the real results. Agrawal et al. [10] proposed an order-preserving encryption to support one-dimensional range queries on encrypted data. The input data distribution is transformed into a user-specified target distribution. The encrypted data is kept in the same order as the original data, which simplifies the execution of encrypted range queries. Nevertheless, this scheme fails to resist known-plaintext attack [18]. Oliveira and Zaiane [11] proposed a distance-preserving transformation (DPT) approach. DPT transforms an original data point x into a new point Nx + t, where N is a d × d orthogonal matrix and t is a d-dimensional vector. DPT ensures that the Euclidean distance between any two encrypted data points is equal to that between the corresponding original data points; that is, d(x, y) = d(E(x), E(y)). However, DPT cannot resist level 2 and level 3 attacks [19]. Man et al. [20] proposed a data transformation approach to maintain data confidentiality.
Using the transformation function, the data owner and the user transform their original spatial data and query ranges into encrypted ones. The cloud server performs range queries on the encrypted data. Chen et al. [21] proposed a random space encryption approach to support range queries on encrypted data. The outsourced data and queries are encrypted on a trusted agent. The cloud server indexes the encrypted data and executes queries on it. And yet, the trusted agent may become the single point of failure and the network bottleneck of the system. Kalnis et al. [22] applied k-anonymity to the outsourced data. The cloud server cannot distinguish a record from at least k − 1 other records. Obviously, the cloud server cannot perform exact queries under this scheme. Similarly, Chow et al. [23] adopted location anonymity to hide the real locations of query points. Asymmetric scalar-product-preserving encryption (ASPE) is proposed in [12] for secure kNN queries on encrypted data. The outsourced data and query points are encrypted with different encryption keys, and the cloud server can execute a kNN query on the encrypted data. However, ASPE assumes that the clients are fully trusted, which is unrealistic in real applications. A client can easily obtain the encryption key from his legal inputs and outputs. Optimized ASPE is proposed in [13, 14], in which the clients are not trusted and only the data owner knows the encryption key. Data points and query points are extended to (2d + 2)-dimensional points in [13], which requires more than double the computation overhead of computing on the original data. Paillier homomorphic encryption is used in [14] to keep the query points confidential to the data owner. Thus, the client has to provide more computing resources to encrypt and decrypt the query points. A digital signature chain mechanism is adopted in [24][25][26][27] for query authentication. Each record is signed together with its immediate predecessor or successor record (attribute). The records and their corresponding signatures are stored together on the CSP. When answering a query, the CSP returns the matched records along with their corresponding signatures to the client, and thereupon the client verifies the correctness and completeness of the query results according to those signatures. However, the computational cost of digital signatures is high. Even with signature aggregation [28], the client has to provide extra computing power for signature verification and modular multiplications.
Unlike the signature chain, Merkle [29] first proposed the Merkle Hash Tree (MHT), a memory-based binary tree with authentication information, for one-dimensional equality queries. A digest rather than a signature, computed by a one-way and collision-resistant hash function, is associated with each node. Only one signature is computed, on the root of the MHT. Devanbu et al. [30] extended the MHT to support one-dimensional range queries. Pang and Tan [31] extended the MHT and proposed a verifiable B-tree (VB-tree). The data owner has to sign each record and node in the VB-tree, which results in high signature computation overhead, and the VB-tree only guarantees the correctness of the query results. Li et al. [32] proposed the Merkle B-tree (MBT) to support disk-based query authentication. To answer a range query, the CSP performs two depth-first traversals to find the leftmost and rightmost records of the query results and builds the verification object (VO). The VO includes the following information: the digests of entries contained in each visited internal node that do not overlap with the range; the query results along with the digests of the residual entries in the corresponding leaves; and the leftmost and rightmost records of the query results for completeness verification.
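The MHT idea can be sketched in a few lines. The sketch below is a toy illustration, not the construction from [29]: SHA-256, byte-string records, and duplicating the last digest on odd-sized levels are our assumptions. It shows why signing a single root digest suffices to authenticate every record.

```python
import hashlib

def h(b: bytes) -> bytes:
    # One-way, collision-resistant hash (SHA-256 assumed).
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    # Leaf digests; each internal digest = h(left || right).
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                # duplicate last digest on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

records = [b"r1", b"r2", b"r3", b"r4"]
root = merkle_root(records)               # only this digest needs a signature

# Tampering with any record changes the root, so the client detects it.
assert merkle_root([b"r1", b"rX", b"r3", b"r4"]) != root
```

Because only the root is signed, the owner pays one signature per database version rather than one per record, which is the advantage over signature chains noted above.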
In multidimensional database outsourcing, Cheng et al. [33] proposed the verifiable KD-tree (VKD-tree) and the verifiable R-tree (VR-tree), applying the signature chain to the KD-tree and the R-tree, respectively, to ensure the integrity of query results. Yang et al. [34] proposed the Merkle R-tree (MR-tree) and the Merkle R*-tree (MR*-tree) for query verification of spatial data. The MR*-tree combines concepts from the MBT and the R*-tree: the authentication information is embedded in the R*-tree. A range query is performed by a depth-first traversal of the MR*-tree. The VO comprises all the data entries in the visited leaf nodes and the MBRs, along with the corresponding digests, of the sibling nodes pruned at the visited internal nodes. Yiu et al. [15] presented a framework for authenticating moving kNN queries using the safe-region approach.
In other query authentication approaches, Xie et al. [35] proposed a probabilistic query integrity authentication scheme. The data owner inserts some fabricated records into the database and outsources them together to the CSP. The CSP cannot distinguish the fabricated records from the real ones. In response to a query, the CSP returns the query results (including the real and fabricated records) to the client. The client verifies the correctness and completeness of the query results by checking whether all the qualified fabricated records are included in the results.
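A toy sketch of this probabilistic completeness check might look as follows. The record layout, query, and counts are our own invention; a real deployment would draw the fabricated records from the same distribution as the real ones so the CSP cannot tell them apart.

```python
import random

random.seed(42)

# Data owner: real records plus fabricated ones known only to DO/client.
real = [(i, random.random()) for i in range(100)]
fakes = [(1000 + i, random.random()) for i in range(10)]
outsourced = real + fakes                 # CSP sees an indistinguishable mix

def range_query(db, lo, hi):
    # CSP-side query: return records whose value lies in [lo, hi].
    return [r for r in db if lo <= r[1] <= hi]

lo, hi = 0.2, 0.8
results = range_query(outsourced, lo, hi)

# Client-side check: every fabricated record that falls in [lo, hi]
# must appear in the results; a missing one exposes an incomplete answer.
expected_fakes = {r for r in fakes if lo <= r[1] <= hi}
assert expected_fakes <= set(results)
```

The guarantee is probabilistic: a CSP that drops a random fraction of the results is caught with probability growing in the number of fabricated records that fall inside the query range.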
Both privacy protection and query authentication are realized in [36][37][38]. Wang et al. [36] applied duplicated encryption for query verification. Part of the original data is encrypted with two different encryption keys. The user transforms a query into two different queries according to the encryption schemes and probabilistically verifies the query integrity by checking the two query result sets. Shamir secret sharing is used in [37] for privacy protection and secure multiparty computation. The service providers collaboratively compute the aggregation results without gaining knowledge of intermediate results. The integrity of the aggregate results is guaranteed by an MHT. The Pedersen commitment protocol and an MHT are used in [38] for aggregation computation and query verification.

System Framework and Assumption
3.1. System Model. Figure 1 illustrates our system framework. It consists of three parties: the data owner (DO), the client, and the cloud service provider (CSP). The data owner possesses two encryption keys. One is obtained from a trusted key distribution center and comprises a private key and a public key for signatures. The other is generated by the DO himself for encrypting sensitive data and query points. The DO outsources the encrypted database DB* to the CSP. Whenever updates occur, the corresponding encrypted data and the new signature are forwarded to the CSP. The CSP hosts the outsourced database DB* and provides query services for the clients. To process spatial queries efficiently, the CSP maintains an authenticated spatial index structure. For each incoming query, it initiates the search algorithm to find the kNN query results and builds the corresponding verification object (VO) for query authentication. To maintain the confidentiality of query points, a client transmits a processed query point to the DO for encryption. After receiving the encrypted query point, the client transmits it to the CSP for a kNN query. Once the VO is received, the client extracts the query results and performs query authentication. In general, the clients are located at the edge of the network and possess low network bandwidth and computing power. The clients only trust the signature information that the DO has published.

System Assumption.
We assume that the private data are numeric and denoted by real numbers, like the position coordinates of navigation, minimum, maximum, amount, etc. Each multidimensional piece of data is denoted as a column vector. The CSP is semitrusted: it can directly access the outsourced database DB*, fabricate or tamper with the data, and return a subset of the real result set to save computation power for providing paid services to more users; at the same time, the CSP executes the protocol of our scheme honestly. To maintain the confidentiality of query points, the client first transmits a processed query point to the DO for encryption, thereupon extracts the encrypted query point, and transmits it to the CSP for kNN query processing. The client verifies the kNN query results through the VO and the published public key. Furthermore, the client is semitrusted and may collude with the CSP or other clients to recover the original data. Therefore, the encryption key owned by the DO should not be revealed to the client or the CSP.
In summary, the attacks can be divided into three levels based on the knowledge the attackers can learn.
Level 1. The attacker only observes the encrypted database and the encrypted query points. This is known as the ciphertext-only attack [18].
Level 2. Besides the encrypted data, the attacker also knows part of the original plain data and some encryption information, such as the maximum, minimum, and distribution of the encrypted data. However, the attacker does not know the corresponding encrypted values of those plain data. This corresponds to the known-sample attack [39].
Level 3. In addition to the knowledge obtained in level 2, the attacker observes a set of plain data and knows the corresponding ciphertexts; this is known as the known-plaintext attack in cryptography [18].
It is evident that the knowledge a lower-level attacker learns is a subset of what a higher-level attacker learns. If an encryption scheme can resist higher-level attacks, it can also resist lower-level attacks. Since the known-sample attack is the one usually encountered in practical applications, we design our encryption scheme against known-sample attacks.
Based on these assumptions, we should preserve the confidentiality of the outsourced sensitive data and query points and provide query integrity authentication for kNN queries. The details are as follows: (1) Data privacy: the confidential data should not be revealed to anyone else. Only encrypted data is outsourced to the CSP.
(2) Query privacy: query privacy means that a client's query points should be kept private to himself. Neither the DO nor the CSP can obtain the plain query points.
(3) Key privacy: existing research usually shares the key with the clients. The CSP can easily obtain the key from a colluding or compromised client and recover the original data. Therefore, those schemes have to assume the clients are fully trusted. In our system assumption, each party is semitrusted, so the encryption key owned by the DO should not be disclosed to anyone else. The main symbols used are listed in Table 1.

Preliminary of ASPE.
The basic idea of ASPE [12] is the observation that the exact distances between points are not necessary for a kNN query. According to

d(p_i, q)² = ‖p_i‖² − 2 p_i · q + ‖q‖², (1)

where ‖p_i‖² is the scalar product of point p_i with itself, which can be computed in advance and stored together with the corresponding data for kNN queries. Since ‖q‖² is common to every comparison for the same query, ASPE does not need to preserve ‖q‖². For any two data points p_i and p_j, the distance between them can be computed by

d(p_i, p_j)² = ‖p_i‖² − 2 p_i · p_j + ‖p_j‖². (2)

It is easy to see from (1) and (2) that ASPE does not preserve the scalar product p_i · p_j, which ensures that the CSP cannot compute the distance between any two database points through (2). Moreover, the CSP is still able to determine which data point is nearer to the query point q through (1).
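The decomposition in (1) and (2) is easy to confirm numerically; the snippet below checks it for random points (NumPy and the chosen dimensionality are ours, for illustration only).

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = rng.normal(size=2), rng.normal(size=2)

# d(p, q)^2 = ||p||^2 - 2 p.q + ||q||^2, as in (1)/(2).
lhs = np.sum((p - q) ** 2)
rhs = p @ p - 2 * (p @ q) + q @ q
assert np.isclose(lhs, rhs)
```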

Definition 1 (asymmetric scalar-product-preserving encryption (ASPE)). An encryption function 𝐸 is an ASPE if and only if it satisfies the following two conditions:
(1) For any data point p_i and any query point q, p_i · q = E(p_i, K) · E(q, K).
(2) E is not distance-recoverable; that is, the distances among points cannot be recovered from the ciphertexts alone.
As can be seen from Definition 1, data points and query points must be encrypted with different encryption keys to ensure that the encrypted value of any query point q differs from that of any data point p in the database, even if q = p.
When encrypting a data point, ASPE randomly generates a (d + 1) × (d + 1) invertible matrix M as the encryption key and extends every data point p to a new (d + 1)-dimensional point p̂ = (pᵀ, −0.5‖p‖²)ᵀ, which is encrypted into E(p) = Mᵀ p̂. When encrypting a query point q, the client randomly selects a positive random number r and extends q to a new (d + 1)-dimensional point q̂ = r(qᵀ, 1)ᵀ, and then he encrypts q̂ into E(q) = M⁻¹ q̂, where M⁻¹ is the encryption key of query points. To determine whether an encrypted data point E(p_i) is nearer to a query point E(q) than E(p_j) is, the kNN search algorithm checks whether (E(p_i) − E(p_j)) · E(q) > 0:

(E(p_i) − E(p_j)) · E(q) = (p̂_i − p̂_j)ᵀ M M⁻¹ q̂ = r(p_i · q − 0.5‖p_i‖²) − r(p_j · q − 0.5‖p_j‖²) = 0.5 r (d(p_j, q)² − d(p_i, q)²). (3)

Since r is a positive random number, we can determine that

d(p_j, q)² − d(p_i, q)² > 0 ⟺ d(p_j, q) > d(p_i, q). (4)
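A minimal sketch of this encryption and of the comparison in (3), assuming NumPy and random test data, might look as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2

# Secret key: a random invertible (d+1) x (d+1) matrix M.
M = rng.normal(size=(d + 1, d + 1))
assert abs(np.linalg.det(M)) > 1e-6          # invertible with probability 1

def enc_point(p):
    # Extend p to (p, -0.5 ||p||^2) and multiply by M^T.
    p_hat = np.append(p, -0.5 * (p @ p))
    return M.T @ p_hat

def enc_query(q, r):
    # Extend q to r * (q, 1) and multiply by M^{-1}.
    q_hat = r * np.append(q, 1.0)
    return np.linalg.solve(M, q_hat)         # = M^{-1} q_hat

points = rng.normal(size=(20, d))
q = rng.normal(size=d)
Eq = enc_query(q, r=rng.uniform(0.5, 2.0))

# CSP side: by (3), a larger E(p).E(q) means a smaller distance to q,
# so ranking by the scalar product reproduces the true distance order.
enc_order = np.argsort([-(enc_point(p) @ Eq) for p in points])
true_order = np.argsort([np.sum((p - q) ** 2) for p in points])
assert np.array_equal(enc_order, true_order)
```

Note that the CSP only ever sees `enc_point(...)` and `Eq`; the ranking it computes never exposes the actual distances, in line with Definition 1.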

kNN Query on ASPE.
As described in Section 4.1, the client is assumed to be fully trusted by the data owner, and the encryption key and configuration information are shared with the client. However, in a more practical scenario, a client may be compromised or may collude with the CSP, so that the CSP can easily obtain the key and the private configuration and decrypt the encrypted data. One plausible approach is that the DO keeps the encryption key private and performs a secure two-party computation protocol [29, 30] with the clients: the DO encrypts a processed query point q̂ and transmits only the encrypted query point E(q) to the client without disclosing the encryption key M. However, the combination of ASPE and secure two-party computation remains unable to keep the key confidential [14]; the encryption key can be leaked from legal outputs. The client can choose d + 1 suitable query points Q = (q₁, q₂, . . ., q_{d+1}) and obtain the corresponding encrypted query points E(Q) = (E(q₁), E(q₂), . . ., E(q_{d+1})) = M⁻¹ Q̂, where Q̂ collects the extended query points. Obviously, if Q̂ is an invertible matrix, the client can obtain M⁻¹ = E(Q) Q̂⁻¹, by which the client can encrypt a new query point E(q_new) = E(Q) Q̂⁻¹ (q_newᵀ, 1)ᵀ. Therefore, the encryption key and the sensitive data are exposed to the attackers.
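This key-recovery attack can be sketched concretely. The toy below assumes, as the attack implicitly does, that the DO reuses one random factor r across queries; the recovered matrix K = r·M⁻¹ is then a positively scaled key, which encrypts new queries correctly for kNN purposes because positive scaling never changes the sign comparisons in (3).

```python
import numpy as np

rng = np.random.default_rng(2)
d = 2
M = rng.normal(size=(d + 1, d + 1))       # DO's secret key
r = 1.7                                   # fixed random factor (assumption)

def enc_query(q):
    # DO-side oracle: E(q) = M^{-1} * r * (q, 1).
    return np.linalg.solve(M, r * np.append(q, 1.0))

# Malicious client: choose d+1 independent query points and collect
# their encryptions, giving E_Q = r * M^{-1} * Q_hat.
Q = rng.normal(size=(d + 1, d))
Q_hat = np.column_stack([np.append(q, 1.0) for q in Q])
E_Q = np.column_stack([enc_query(q) for q in Q])

# K = E_Q Q_hat^{-1} = r M^{-1}: a scaled key, enough to forge queries.
K = E_Q @ np.linalg.inv(Q_hat)

q_new = rng.normal(size=d)
forged = K @ np.append(q_new, 1.0)
assert np.allclose(forged, enc_query(q_new))
```

This is exactly the leakage EASPE is designed to close: the mixing matrix, artificial columns, and per-query factor s (below) prevent the client from assembling such a linear system.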

Enhanced ASPE.
We propose an enhanced ASPE (EASPE) that keeps the encryption key confidential to the clients. Unlike ASPE, we assume that the three parties in our system model do not trust each other. Therefore, the DO must keep the encryption key confidential so that it cannot be obtained by anyone, while the client should keep the query points secret from the DO and the CSP. Our encryption scheme is similar to the approach proposed in [14]. However, the scheme in [14] adopted Paillier homomorphic encryption to encrypt query points, which burdened the client with more computation overhead. In our scheme, we apply a 1-out-of-n oblivious transfer protocol [40] for query processing. A 1-out-of-n oblivious transfer protocol [40] is a protocol in which one party, Bob, has n inputs x₁, . . ., x_n, and the other party, Alice, learns one of the inputs x_i for some 1 ≤ i ≤ n of her choice, without learning anything about the other inputs and without allowing Bob to learn anything about i. The client generates a random matrix containing the processed query point and sends it to the DO for encryption.
Before encrypting the data points, several artificial columns are introduced into the data points and filled with nonce random numbers generated independently, which allows identical points to be encrypted into different ciphertexts. Likewise, the client adds the same number of artificial columns to a query point and then perturbs the query point with independently generated random numbers. The client sends a mixed matrix Q_M, containing the extended query point and some randomly generated vectors, to the DO for encryption. Eventually, the DO perturbs Q_M before the matrix transformation so that the encrypted query points cannot reveal the key.
The outputs of the processing of p and q in the data processing stage are denoted by p̂ and q̂, respectively. The DO completes the encryption of p̂ and q̂ and outputs the ciphertexts E(p) and E(q) in the encryption stage. It is noteworthy that the DO cannot directly observe q̂ while encrypting the query point; nobody except the client knows the original query point. To simplify the description, q̂ is used to state our scheme in the first phase. Next, the two phases are elaborated on.
Data Processing. For each data point p, the DO first selects a positive integer c as a system security parameter in advance. In point perturbation, two random vectors, v of (d + 1) dimensions and w of c dimensions, are generated by the DO; they form part of the encryption key and are shared by all points in the database. A permutation function π randomly changes the order of the dimensions of the extended vector. After the foregoing processing, (5) is obtained.
For each query point q, the client first selects a positive random number r and creates a random vector z of c dimensions; the client then extends q to q̂ = ((qᵀ, 1), z)ᵀ and transmits q̂ to the DO. The DO generates a random vector of c dimensions to perturb the last c dimensions of q̂. Accordingly, (6) is obtained.
Encryption Phase. The DO generates an invertible matrix M as the encryption key to encrypt p̂, such that E(p) = Mᵀ p̂. For each query point q, the DO randomly generates a positive random number s to compute E(q) = s M⁻¹ q̂.
The details of the encryption process are as follows.
(1) The client generates the mixed matrix Q_M: the other (n − 1) column vectors are generated randomly in their first d dimensions and extended to (d + 1 + c)-dimensional column vectors in the same manner as the query point. The position β of the column vector q̂ is randomly selected from 1 to n and is known only to the client himself. The client transmits Q_M to the DO for encryption.
(2) For each query point, the DO randomly generates a random vector of c dimensions to confuse the last c dimensions of q̂, and then applies the permutation function π to obtain π(Q_M). The DO randomly selects a positive number s and computes the matrix W = s M⁻¹ π(Q_M).
(3) After obtaining W, the client extracts the encrypted query point E(q), that is, the βth column vector of the matrix W.
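The three steps can be sketched as follows. This is only a sketch of the hiding mechanism: it omits the DO's perturbation vector w and the permutation π, sends the whole mixed matrix instead of running a real oblivious transfer, and all names (Q_mix, beta, s) are our own.

```python
import numpy as np

rng = np.random.default_rng(3)
d, c, n = 2, 2, 5                         # data dims, artificial dims, mixing columns

# DO's secret material: invertible M over the extended dimension.
dim = d + 1 + c
M = rng.normal(size=(dim, dim))

# --- client side: hide the real query among n-1 decoy columns ------------
q = rng.normal(size=d)
z = rng.normal(size=c)                    # client's random padding
q_ext = np.concatenate([q, [1.0], z])     # (q, 1, z): a (d+1+c)-dim column

beta = rng.integers(n)                    # secret position, known only to client
Q_mix = rng.normal(size=(dim, n))         # n-1 random decoy columns
Q_mix[:, beta] = q_ext

# --- DO side: encrypt every column without learning beta -----------------
s = rng.uniform(0.5, 2.0)                 # fresh positive random factor
W = s * np.linalg.solve(M, Q_mix)         # W = s * M^{-1} * Q_mix

# --- client side: extract only its own column ----------------------------
E_q = W[:, beta]
assert np.allclose(E_q, s * np.linalg.solve(M, q_ext))
```

Because the DO processes all n columns identically, it learns nothing about which column carries the real query, while the client sees only s·M⁻¹ applied to vectors it cannot choose freely enough to solve for M.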

VSS-Tree
The simplest approach to find the results of a kNN query is to scan the entire database. Yet the query time and complexity are proportional to the data size and disk accesses, which usually cannot meet the needs of users. To improve the efficiency of spatial queries, researchers have built diverse spatial index structures, like the R-tree [41], the SS-tree [17], etc. In this section, we extend the SS-tree [17] with authentication information and build a verifiable SS-tree (VSS-tree) for kNN query processing and kNN query authentication.

VSS-Tree.
Unlike the R-tree and the R*-tree, the similarity search tree (SS-tree) [17] applies bounding spheres rather than bounding rectangles as region shapes. The SS-tree divides multidimensional points into isotropic neighborhoods. Due to the use of bounding spheres, the overlap area between regions is reduced, thereby improving kNN query efficiency. The structure of the SS-tree is shown in Figure 2.
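The pruning power of a bounding sphere comes from a cheap lower bound on the distance between a query and any enclosed point. The sketch below illustrates this bound and the storage comparison with an MBR; the mindist formula and NumPy usage are our illustration, not taken from [17].

```python
import numpy as np

rng = np.random.default_rng(4)
d = 2

pts = rng.normal(size=(50, d))
center = pts.mean(axis=0)                       # centroid of the underlying points
radius = np.max(np.linalg.norm(pts - center, axis=1))

# A sphere costs d+1 values (center + radius); an MBR needs 2d (two corners).
sphere_cost, mbr_cost = d + 1, 2 * d
assert sphere_cost < mbr_cost

def mindist(q):
    # Lower bound on the distance from q to any point inside the sphere,
    # by the triangle inequality: ||p - q|| >= ||q - center|| - radius.
    return max(0.0, np.linalg.norm(q - center) - radius)

q = rng.normal(size=d) + 5.0
# The bound is valid: no enclosed point can be closer than mindist(q),
# so a subtree whose mindist exceeds the current kth distance is pruned.
assert all(np.linalg.norm(p - q) >= mindist(q) - 1e-9 for p in pts)
```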
A verifiable SS-tree (VSS-tree) is built by extending the SS-tree with authentication information, and its structure is shown in Figure 3. The center of a bounding sphere is the centroid of the underlying points of its children. Compared with the R-tree, the SS-tree spends only about half the storage: a bounding sphere is denoted by a center and a radius, so its storage cost is one multidimensional point plus one scalar, while a rectangle is determined by the two points at its lower-left and upper-right corners, costing two multidimensional points. Consequently, the SS-tree has a larger fanout and a lower height. The structure of a leaf node is defined as follows:

Leaf: (E₁, . . ., E_f) (m ≤ f ≤ M), E_i: (p, s, h), (10)

where m and M denote the minimum and maximum numbers of entries in a leaf node, respectively. An entry of a leaf node is denoted as a triple (p, s, h), where p is a data point in the database, s is the enclosing sphere of p, and h is the hash value computed on the record that p points to. An internal node of the VSS-tree is defined as follows:

Node: (E₁, . . ., E_f) (m ≤ f ≤ M), E_i: (s, ptr, n, h), (11)

where s indicates the minimum bounding sphere that encompasses all the regions of the ith child, consisting of a center and a radius; the pointer ptr points to the ith child; the variable n indicates the number of points contained in the subtree rooted at that child; and the hash value h summarizes all the bounding spheres and their digests of the ith child. The center of a bounding sphere is the weighted centroid of its children's centers, each weighted by the number of points contained in the corresponding subtree. Algorithm 2, the kNN verification procedure, scans the VO element by element:

(1) for each element E in the VO do
(2)   if E is an entry then
(3)     C.Enlarge(E)
(4)     if E.getProduct() ≤ R.MaxDist and E.id in R then
(5)       result.add(E)
(6)     end if
(7)     if E.getProduct() ≤ R.MaxDist and E.id not in R then
(8)       Alarm the client
(9)     end if
(10)  end if
(11)  if E is a symbol [ then
(12)    begin the reconstruction of a new node (C, hash)
(13)  end if
(14)  if E is a pair (s, hash) then
(15)    if s.getProduct() ≤ R.MaxDist then
(16)      Alarm the client
(17)    end if
(18)    C.Enlarge(s)
(19)    h = h || hash
(20)  end if
(21)  if E is a symbol ] then
(22)    finish the current node and fold (C, hash(C)) into its parent
(23)  end if
(24) end for

Algorithm 2: kNN verification.
Once the VO and the signature are received, the client extracts the encrypted kNN query results from the VO and performs query verification. Differing from other approaches, the client obtains the maximum product R.MaxDist and verifies that every scalar product of an object not in the result list R exceeds it, in order to check the completeness of the kNN results. The verification process is as follows: (1) The client obtains R.MaxDist from the result list R and verifies that every scalar product in R is less than or equal to R.MaxDist, while the other scalar products in the VO are greater than R.MaxDist.
(2) The client verifies that the scalar product between each bounding sphere in a pair (s, hash) and the query point q is greater than R.MaxDist.
(3) The client checks whether the reconstructed hash h_root agrees with the signed digest s_root.

The kNN verification algorithm is shown in Algorithm 2.

The essence of the kNN verification algorithm is to reconstruct the VSS-tree by scanning the VO. During verification, the bounding spheres are enlarged gradually to encompass the objects read from the VO. Eventually, the algorithm reconstructs the bounding sphere and digest of the root node, and the client validates whether the reconstructed h_root agrees with the signed s_root for query verification.
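The stack-based reconstruction at the heart of this verification can be sketched as follows; the token layout, hash choice, and record contents are simplified stand-ins for the VO format described above.

```python
import hashlib

def h(*parts) -> bytes:
    # Digest over a sequence of byte strings or printable values.
    m = hashlib.sha256()
    for p in parts:
        m.update(p if isinstance(p, bytes) else str(p).encode())
    return m.digest()

# VO tokens: '[' opens a node, ']' closes it; a result record appears in
# full, a pruned subtree appears only as its (sphere, digest) pair.
vo = ['[', '[', ('rec', 'p1'), ('rec', 'p2'), ']',
      ('pruned', ('sphere2', h('subtree2'))), ']']

def reconstruct_root(vo):
    stack = [[]]                              # digests accumulated per open node
    for tok in vo:
        if tok == '[':
            stack.append([])                  # start reconstructing a child node
        elif tok == ']':
            digests = stack.pop()
            stack[-1].append(h(*digests))     # fold children into node digest
        elif tok[0] == 'rec':
            stack[-1].append(h(tok[1]))       # full record: hash it ourselves
        else:
            stack[-1].append(tok[1][1])       # pruned subtree: use given digest
    return stack[0][0]

root = reconstruct_root(vo)
# A tampered record yields a different root, so the signed root exposes it.
bad = ['[', '[', ('rec', 'pX'), ('rec', 'p2'), ']',
       ('pruned', ('sphere2', h('subtree2'))), ']']
assert reconstruct_root(bad) != root
```

Comparing the reconstructed root against the DO's signed digest simultaneously authenticates the returned records and the pruned subtrees, mirroring step (3) above.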

Security Analysis and Integrity Verification
6.1. Security Analysis. As described in Section 3, all three parties are semitrusted. In our scheme, the privacy issues of the outsourced database DB*, the query points, and the encryption key are considered. The CSP can directly access the outsourced database, so we need to ensure their confidentiality against the CSP. We consider data privacy together with query privacy against the CSP under level 2 attacks.

Theorem 2. EASPE is not distance-recoverable.
Proof. EASPE is an enhanced ASPE; its encryption key is {π, v, w, M}, where the invertible matrix M plays the role of the encryption key of ASPE. M and M⁻¹ are used by EASPE to encrypt the data points and the query points, respectively. As proved for ASPE, EASPE is likewise not distance-recoverable.

Theorem 3. EASPE is secure against level 2 attacks.
Proof. There are many types of level 2 attacks. According to the system security assumption, the following attacks are considered: distance-based inference attack, PCA, duplicate analysis, distribution analysis attack, and ICA-based attack. According to Theorem 2, EASPE is not distance-recoverable, so the distance-based inference attack is clearly not feasible against our scheme.
Principal component analysis (PCA) has been proposed in [19] to match the correlations in the known data with the correlations in the encrypted data. Using the matched data, the attacker endeavors to reconstruct the entire original database. However, in EASPE, the value in each dimension of E(p) is a linear combination of the values in all dimensions of the original data. EASPE adds c artificial columns and generates random vectors to confuse the original data. Furthermore, the DO uses the permutation function π to change the order of the dimensions of the extended data point randomly. It is evident that EASPE does not preserve the correlations among the original dimensions in the transformed space, and thus PCA is not applicable to EASPE.
Duplicate analysis [10] is applicable to attributes whose domains are small, such as the day of the week or the day of the month. Through the analysis of observations on the encrypted data, the attacker may determine the domain of the original attribute. Duplicate analysis targets value-based encryption, that is, encryption in which the values in each dimension are encrypted individually. However, EASPE is a tuple-based encryption, so duplicate analysis is not applicable to it. Similarly, the distribution analysis attack estimates p from E(p): observations on the encrypted database may help an attacker determine which intervals I₁, I₂, . . ., I_n the plain data fall into. This attack also targets value-based encryption and is not applicable to EASPE.
The ICA-based attack [18,19] tries to recover the plain data x from the transformed data y. The approach is based on the observation that the eigenvectors of the covariance of y are obtained from those of x by left-multiplying by M. Therefore, by estimating the two covariance matrices Σₓ and Σ_y and matching their eigenvectors, the attacker can produce M̂, an estimate of the transformation, and then a data record xᵢ is estimated as x̂ᵢ = M̂yᵢ. This attack assumes that the known samples follow the same distribution as the original data and that the matrix M is orthogonal or full rank. However, we introduce one-time random vectors for each data point and each query point, respectively. These random vectors are generated independently and kept private by the data owner, and M can be generated as an invertible but nonorthogonal matrix. Hence, EASPE impedes both ICA and the derivation of the transformation matrix M, and is therefore resilient to ICA-based attacks.
To keep the query points confidential to the data owner, a positive number β is randomly selected and a random vector r of c dimensions is generated to extend a query point q into a (d + 1 + c)-dimensional point q̂ = (β(q, 1), r). Then, a 1-out-of-n oblivious transfer protocol is used to generate a (d + 1 + c) × n matrix Q that includes the processed query point q̂ and (n − 1) other random column vectors. The position t of the column vector q̂ is randomly selected in the range from 1 to n and is known only to the client himself; the data owner cannot learn which column the client has chosen.
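The query-hiding step above can be sketched as follows (the oblivious transfer protocol itself is abstracted away; β, r, t, and n follow the description above, with toy values):

```python
# Sketch: the client hides the processed query q_hat among n-1 random
# decoy columns at a secret position t.  In the real protocol, a
# 1-out-of-n oblivious transfer lets the client retrieve only the
# encryption of column t without revealing t to the data owner.
import random

random.seed(3)
d, c, n = 2, 1, 5
q = [1.5, -2.0]
beta = random.uniform(0.5, 2.0)          # positive one-time scale
r = [random.uniform(-1, 1) for _ in range(c)]
q_hat = [beta * x for x in q + [1.0]] + r  # (beta*(q,1), r): d+1+c dimensions

t = random.randrange(n)                  # secret position, known only to client
columns = [[random.uniform(-5, 5) for _ in range(d + 1 + c)] for _ in range(n)]
columns[t] = q_hat                       # matrix Q sent to the data owner

print(columns[t] == q_hat)
```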

Theorem 4. The encryption key is kept confidential against the CSP and the clients.
Suppose that a client can transmit a number of query points to the data owner for encryption and then try to derive the encryption key from the correlation between the plaintexts and the corresponding ciphertexts. If the encryption key can be kept confidential to the clients, it is evident that the key is also confidential to the CSP. Thus, we only need to prove that the encryption key is confidential to the clients.
Proof. A client transmits processed query points to the data owner and interacts with the data owner during the query encryption stage. Consider first the encryption of query points without applying the permutation function. The data owner encrypts a processed query point q̂ = (β(q, 1), r) into q′ = M⁻¹q̂. The i-th dimension of q′ is q′ᵢ = mᵢ ⋅ q̂, where mᵢ is the i-th row of M⁻¹, as expanded in (14). In (14), all the values of {β, r, M⁻¹} are kept confidential to the client. The client cannot learn the correspondence between the dimensions of q̂ and those of q′. In addition, the permutation prevents the client from setting up equation (14); obviously, this enhances the security of our scheme. In conclusion, the encryption key is kept private against the CSP and the clients.

Integrity Verification.
Our scheme provides correctness and completeness verification for kNN queries.
Theorem 5. The correctness of kNN query results can be ensured by our scheme.
Proof. Suppose that there are one or more falsified or modified data points in the results. Note that the VSS-tree is built from bottom to top, and all data points in the database are involved in the construction of the root hash. Since the hash function is one-way and collision-resistant, the digest of any falsified or modified data must differ from the original one, and this change propagates from the leaf node to the root node, which makes the reconstructed root digest h_root different from the original one and thus fail to match the signed root digest. Therefore, the client can detect any falsified or modified data in the results.

Theorem 6. The completeness of kNN query results is ensured by our scheme.
Proof. Suppose that a data point p in a leaf node N is one of the kNN query results, but p is not included in the returned results. To make the reconstructed root hash h_root match the signed root digest, the VO must comprise either all the data entries in N or the (sphere, hash) pair of N. In the former case, the client can determine from the verification algorithm that p is one of the kNN results, since at least one point in the returned results is farther from the query point than p. In the latter case, the client can detect that the scalar product between N and the query point violates the pruning condition, which means that N comprises one or more data points closer to the query point than a returned result even though N was not visited by the search algorithm; this inconsistency is detected during the verification process.
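Both proofs rest on the same mechanism: a digest built bottom-up over all data points. A minimal sketch (a flat binary hash tree, not the actual VSS-tree layout) shows how any modification of the returned data breaks the match against the signed root digest:

```python
# Sketch of the correctness check: tampering with any point changes the
# reconstructed root digest, so it no longer matches the published one.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_digest(point):
    return h(repr(point).encode())

def root_digest(points):
    digests = [leaf_digest(p) for p in points]
    while len(digests) > 1:              # pairwise-combine up to the root
        digests = [h(digests[i] + digests[i + 1 if i + 1 < len(digests) else i])
                   for i in range(0, len(digests), 2)]
    return digests[0]

db = [(1, 2), (3, 4), (5, 6), (7, 8)]
signed_root = root_digest(db)            # published by the data owner

tampered = [(1, 2), (3, 4), (5, 999), (7, 8)]
print(root_digest(db) == signed_root)        # True: honest results verify
print(root_digest(tampered) == signed_root)  # False: modification detected
```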

Experiment Evaluation
In this section, we mainly evaluate and compare the performance of DPT and our scheme. All programs are implemented in Java. Experiments are performed on an Intel Core i7-4790 3.6 GHz computer with 8 GB RAM running Windows 7. The block size is set to 2048 KB and the default value of the security parameter c to 1. The experiments are conducted on both synthetic and real datasets. The random points generated in the synthetic database are uniformly distributed in a d-dimensional space. The real dataset adopted is "Shuttle" from the UCI repository, which comprises 58 K points and 9 dimensions. We run each experiment 100 times and take the average to show the performance of the different schemes. We carry out two experiments under different data cardinalities and dimensions in the synthetic database. In the first experiment, the data cardinality is varied from 50 K to 500 K with a fixed dimension d = 6. In the second experiment, the dimension is varied from 3 to 100 with a fixed data cardinality of 100 K. The performance is evaluated from the following aspects: (1) data encryption; (2) construction and storage of the VSS-tree; (3) kNN query; (4) query verification.

Key Generation and Data Encryption.
As described in Section 4, the transition matrix used in EASPE is a (d + 1 + c) × (d + 1 + c) invertible matrix. In practical applications, the dimension d of spatial data is usually less than 100. In our experiments, we generate the encryption key only once, which takes less than 1 ms for dimensions ranging from 3 to 100. Figure 5 illustrates the data encryption time for different data cardinalities. The encryption time includes generating the encryption key and encrypting all the data points. The encryption time on the Shuttle dataset is shown in Table 2.
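Key generation amounts to sampling a random (d + 1 + c) × (d + 1 + c) matrix and checking that it is invertible. A sketch in pure Python (the determinant routine and retry loop are ours; the paper's implementation is in Java) shows that this is cheap at these sizes:

```python
# Sketch of key generation: draw a random (d+1+c) x (d+1+c) matrix and
# retry until its determinant is nonzero, i.e., until it is invertible.
import random
import time

def determinant(M):
    """Elimination with partial pivoting; returns det(M)."""
    A = [row[:] for row in M]
    n, det = len(A), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            return 0.0                   # numerically singular
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det
        det *= A[col][col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return det

random.seed(42)
d, c = 6, 1
size = d + 1 + c
start = time.perf_counter()
while True:                              # almost always succeeds first try
    M = [[random.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
    if determinant(M) != 0.0:
        break
elapsed = time.perf_counter() - start
print(len(M) == size, elapsed < 1.0)
```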
As can be seen from Figure 5, the data encryption time is proportional to both the data dimension and the data cardinality. The encryption time of DPT is slightly shorter than that of EASPE because EASPE multiplies each (d + 1 + c)-dimensional point by a (d + 1 + c) × (d + 1 + c) matrix, whereas DPT multiplies each d-dimensional point by a d × d matrix and therefore performs fewer multiplications and additions per point. As EASPE has c more dimensions than ASPE, the encryption time of EASPE is also slightly larger than that of ASPE.

Construction and Storage Cost of VSS-Tree.
The storage cost of the VSS-tree is shown in Figure 6. The storage costs under all schemes are proportional to the data dimension and the data cardinality. Due to the (c + 1) added dimensions of EASPE, the storage cost under EASPE is larger than that under DPT and ASPE. Furthermore, since the SS-tree requires only about half the storage of the MR-tree, as described in Section 5.1, the storage cost of the MR-tree is larger than that of the VSS-tree.
The build time of the VSS-tree is shown in Figure 7. The build time of the VSS-tree under both encryption schemes is proportional to both the data cardinality and the dimension. The build time under EASPE is longer than that under DPT because a d-dimensional data point is extended to a (d + 1 + c)-dimensional data point in EASPE, which makes the computation overhead greater. It should also be noted that the larger the parameter c we set, the longer the time required to build the VSS-tree. The build time of the MR-tree is shorter than that of the VSS-tree; the reason is that a bounding rectangle requires only comparison operations on each dimension of a point, while a bounding sphere requires the center and radius to be computed.
Figure 8 exhibits the fanouts of internal nodes under the different encryption schemes. Since EASPE adds (c + 1) dimensions to each data point, the fanout of the VSS-tree based on it is slightly less than that based on DPT and ASPE. The fanouts under all schemes decrease as the dimension increases, because the storage cost of a record grows with the dimension. Furthermore, the MR-tree uses bounding rectangles, which store two values per dimension, so the fanout of the MR-tree is less than that of the VSS-tree.

kNN Query Cost.
We perform kNN queries on the VSS-tree and set k = 3. Figure 9 shows that the query processing time is proportional to both the data dimension and the data cardinality. The query efficiency under EASPE is higher than that under DPT. This is because the kNN search algorithm under EASPE performs (d + 1 + c) multiplications and (d + c) additions to compute the scalar product p′ ⋅ q′ for each visited entry, while under DPT the Euclidean distance dist(p, q) is computed, requiring d multiplications, d subtractions, and (d − 1) additions for each visited entry. As described in Section 5.1, the overlap areas of the bounding regions in the MR-tree are larger than those in the VSS-tree, so more nodes need to be accessed for a query; thus, the query processing time based on the MR-tree is longer than that based on the VSS-tree. The size of the VO directly affects the server's response speed and the network bandwidth consumption. In our experiment, the VO contains the multidimensional data points of the visited leaf nodes as well as the bounding spheres and corresponding digests of the pruned nodes. Figure 10 illustrates that the VO size increases with the data cardinality. Due to the use of bounding rectangles in the MR-tree, its VO size is larger than that under the VSS-tree.
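The reason the server can rank encrypted points by scalar product is the standard ASPE point extension: with p̂ = (p, −0.5‖p‖²) and q̂ = β(q, 1), the preserved product equals β(p·q − 0.5‖p‖²), which orders points inversely to their distance from q. A sketch (matrix encryption omitted, since it leaves scalar products unchanged):

```python
# Sketch: kNN ranking on extended points via scalar products alone.
# A LARGER product means a SMALLER distance to the query point, because
# dist(p, q)^2 = -2*(p.q - 0.5*||p||^2) + ||q||^2 and beta > 0.
import random

def extend_data(p):
    return p + [-0.5 * sum(x * x for x in p)]

def extend_query(q, beta):
    return [beta * x for x in q + [1.0]]

def knn_by_scalar_product(db, q, k, beta=1.7):
    q_hat = extend_query(q, beta)
    scored = [(sum(a * b for a, b in zip(extend_data(p), q_hat)), p) for p in db]
    scored.sort(key=lambda s: -s[0])     # largest product = nearest point
    return [p for _, p in scored[:k]]

random.seed(9)
db = [[random.uniform(0, 10), random.uniform(0, 10)] for _ in range(50)]
q = [5.0, 5.0]
secure = knn_by_scalar_product(db, q, 3)
plain = sorted(db, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))[:3]
print(secure == plain)                   # the two rankings agree
```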
On receiving the VO, the client extracts the query results from it and validates the correctness and completeness of the kNN query results. The verification cost includes the following: scanning the VO, hash computation, scalar product computation and comparison, and signature verification.
The verification time is shown in Figure 11. We can see that the verification time is proportional to the data cardinality. The verification time under EASPE is shorter than that under DPT; the reason is that the kNN verification algorithm computes the scalar product p′ ⋅ q′ under EASPE, while it computes dist(p, q) under DPT.

Conclusion
In this paper, EASPE is first introduced to support secure kNN queries. EASPE is not distance-recoverable and preserves only the scalar products between the data points in the database and the query points. In addition, we propose a verifiable spatial data index structure, the VSS-tree, to improve kNN query efficiency and provide kNN query verification. The security analysis and experiment results show that EASPE can resist level 2 attacks and that the cloud server can efficiently perform kNN queries on encrypted data points and query points. The encryption cost, kNN query cost, and verification cost meet practical requirements.
In the future, we shall consider practical application scenarios in which there is more than one data source or the outsourced databases are distributed across multiple cloud service providers. The VSS-tree shall be extended to support query authentication with multiple data sources or distributed databases.

Table 1: Symbol list.
M⁻¹, Mᵀ: inverse and transpose of matrix M
dist(pᵢ, q): the distance between pᵢ and q
dist(p′ᵢ, q′): the distance between p′ᵢ and q′
pᵀ, qᵀ: transposition of p, q
‖p‖: Euclidean norm of point p
E(p, Key): encryption function, where Key is the encryption key
p′ ⋅ q′: scalar product of p′ and q′

(4) Query authentication: based on the SS-tree, we propose a novel authenticated spatial index structure for kNN queries and kNN query authentication.
Nᵢ (1 ≤ i ≤ f) is an index to the children of a node N, j is an index to the dimensions, Nᵢ.center[j] indicates the j-th dimensional coordinate of Nᵢ.center, and N.num indicates the number of children of N. The radius of a bounding sphere is computed according to

N.radius = max over 1 ≤ i ≤ f of (‖N.center − Nᵢ.center‖ + Nᵢ.radius), (13)

where N.center indicates the center of the current node itself, Nᵢ.center and Nᵢ.radius indicate the center and radius of the i-th child node, respectively, and ‖N.center − Nᵢ.center‖ indicates the distance between the centers N.center and Nᵢ.center.
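Equation (13) can be sketched directly (the node layout is simplified here to (center, radius) pairs): taking the maximum over the children of the center-to-center distance plus the child radius guarantees that every child sphere is fully enclosed by the parent sphere.

```python
# Sketch of equation (13): the parent radius is the maximum, over all
# children, of (distance between centers + child radius).
import math

def enclosing_radius(center, children):
    """children: list of (child_center, child_radius) pairs."""
    return max(math.dist(center, c) + r for c, r in children)

children = [([0.0, 0.0], 1.0), ([3.0, 4.0], 2.0)]
center = [1.5, 2.0]                      # parent center (e.g., the centroid)
R = enclosing_radius(center, children)
print(R)                                 # 4.5: set by the child at distance 2.5
```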
The values {β, r, M⁻¹} are kept confidential to the client. The client only knows the original query point q and its corresponding encrypted query point q′. Let mᵢ denote the i-th row of M⁻¹, split as (mᵢ′, mᵢ″) over the first (d + 1) dimensions and the last c dimensions. The client can set up the equation q′ᵢ = βmᵢ′ ⋅ (q, 1) + mᵢ″ ⋅ r. The client can obtain enough encrypted query points through his legal inputs or by collusion with other clients. However, the invertible matrix M is generated randomly, and {β, r} are one-time random parameters selected independently for each query point, so the right-hand side of the equation is entirely random to the client. Moreover, the client can learn nothing about mᵢ (1 ≤ i ≤ d + 1 + c) from q′ᵢ. Furthermore, EASPE applies the permutation function π to the query points.