Authenticated Location-Aware Publish / Subscribe Services in Untrusted Outsourced Environments

Location-aware publish/subscribe is an important location-based service built on a server-initiated model. Often, the owner of massive spatio-textual messages and subscriptions outsources its location-aware publish/subscribe services to a third-party service provider (for example, a cloud service provider), who is responsible for delivering messages to their relevant subscribers. The issue arising here is that the messages delivered by the service provider might be tailored for profit purposes, intentionally or not. Therefore, it is essential to develop mechanisms which allow subscribers to verify the correctness of the messages delivered by the service provider. In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. We propose an authenticated framework which not only delivers the messages efficiently but also lets subscribers authenticate them at low cost. Extensive experiments on a real-world dataset demonstrate the effectiveness and efficiency of our proposed authenticated framework.


Introduction
With the rapid development of the mobile Internet and positioning-enabled devices (e.g., smart phones), massive amounts of data that contain both text information and geographical location information are being generated at an unprecedented scale on the Web. This enables location-based services (LBS), such as Foursquare (https://foursquare.com) and Yelp (https://www.yelp.com), to be extensively deployed in many systems and widely accepted by Internet users. Location-aware publish/subscribe is an important kind of LBS based on a server-initiated model (as opposed to a user-initiated model, such as spatial-keyword query). For example, in a Groupon system, subscribers register their spatio-textual subscriptions to capture their interests (e.g., "Adidas shoe discount at Beijing, China") (for the rest of this paper, we use "subscriber" and "subscription" interchangeably if the context is clear). For each Groupon message with a textual description and a location (e.g., "Adidas running shoes at cheap prices at Adidas factory store, Beijing, China"), the system delivers the message to relevant subscribers.
Since location-aware publish/subscribe is a compute-intensive task, a data owner of massive spatio-textual messages and subscriptions who wants to deliver each message to relevant subscribers efficiently needs to build up basic IT infrastructure and hire specialized personnel to strengthen its computing ability. However, as such cost might be unaffordable for small-to-medium businesses, outsourcing the data and computations to a third-party service provider (e.g., a cloud service provider) has been an appealing option. Yet, this outsourcing model presents a great challenge: the messages delivered by the service provider might be incomplete or incorrect. There are a variety of reasons for this. First, the service provider might deliver tailored messages to favor its sponsors. Second, the service provider might use inferior algorithms and deliver suboptimal messages to the subscribers to save computing resources. Third, with the growing popularity of the cloud, more and more security breaches and attacks on such systems have been reported. If an attacker takes control of the service provider's server, it may forge the messages for its own interest. These reasons necessitate the development of mechanisms that allow subscribers to authenticate the messages delivered by the service provider. The messages should be verified in terms of two conditions: (1) soundness and (2) completeness. The former means that the messages are not tampered with, while the latter implies that no valid message is missing.
In this paper, to take one step further towards practical deployment of location-aware publish/subscribe in untrusted outsourcing environments, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. To address this problem, we present an authenticated location-aware publish/subscribe framework. We assume that messages are allowed a maximum delay Δt to be delivered to their corresponding subscribers. The data owner organizes the messages within Δt′ (Δt′ ≤ Δt) in an authenticated data structure (ADS) called TMR-tree. Based on the TMR-tree, the service provider first computes the relevant messages for each subscription. For this step, we present an inverted index pruning technique that reduces the number of traversals of the inverted index (used to index the subscriptions), thus improving the efficiency of computing the relevant messages for each subscription. Then, the service provider constructs a verification object (VO) for each subscription, with which the corresponding subscriber can authenticate the messages delivered to it. A thorough experimental study on a real-world dataset is conducted over a wide range of workload settings to evaluate the effectiveness and efficiency of our proposed framework in terms of various performance metrics.
Roadmap. The rest of this paper is organized as follows. Section 2 introduces some preliminaries, which include the system model, problem definition, and background knowledge. Section 3 presents our proposed authenticated location-aware publish/subscribe framework. In Section 4, we experimentally evaluate the performance of our proposed framework. Related work on location-aware publish/subscribe and authenticated query processing is surveyed in Section 5. Finally, we conclude the paper in Section 6.

Preliminaries
In this section, we first describe our system model. Then, we define the problem studied in this paper. Finally, we introduce some background knowledge on cryptographic primitives and location-aware publish/subscribe which underlies our proposed framework.

System Model.
As shown in Figure 1, our system involves four entities: the data owner, the service provider, the subscribers, and the key distribution center (KDC).
First, the data owner builds an authenticated data structure (ADS) over the messages within Δt′ (Δt′ ≤ Δt; recall that Δt is a predefined maximum permissible delivery delay) and signs the ADS using the private key distributed by the KDC. Then, the data owner outsources the location-aware publish/subscribe services to the service provider, who provides the storage resources for the messages, the ADS, the signature of the ADS, and the algorithms. Based on the ADS, the service provider finds the messages which are relevant to the registered subscriptions and constructs a verification object (VO) for each subscription. After that, the service provider delivers the messages and the VO to the corresponding subscribers. The subscribers authenticate the soundness and completeness of these messages using the VO and the public key distributed by the KDC.
Throughout this paper, we assume that (1) the KDC and the data owner are trusted but the service provider is the potential adversary and might fabricate the messages (intentionally or not); (2) the KDC or the data owner does not collude with the service provider; (3) the computation and storage capacities of the service provider are polynomially bounded.

Problem Definition.
In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. That is, the subscribers first register their interests as subscriptions in the system. Then, the service provider not only needs to efficiently deliver the messages within Δt′ to the relevant subscribers, whose subscriptions have high relevancy to the messages, but also needs to construct a VO for each subscriber so that the subscriber can authenticate the soundness and completeness of the delivered messages. The VO should be as small as possible to minimize the communication cost between the service provider and the subscribers. Meanwhile, the VO should be cheap to check, to minimize the computational cost at the subscriber side.

Cryptographic Primitives.
We present the essential cryptographic primitives, namely, the one-way hash function, the cryptographic signature, and the Merkle hash tree, as follows.
One-Way Hash Function. A one-way hash function h(⋅) maps a message m of arbitrary length to a fixed-length output H(m). It works in one direction: it is easy to compute H(m) for a message m, but it is computationally infeasible to find a message that maps to a given hash value.
Cryptographic Signature. A cryptographic signature (or simply signature) is a mathematical scheme for demonstrating the authenticity of a digital message. A signer obtains a private/public key pair from the KDC. The private key is kept secret by the signer and the public key is publicly distributed. A digital message can be signed using the private key. The authenticity of the message can then be verified, using the public key, by anyone who receives the message.
Merkle Hash Tree. The Merkle hash tree (MHT) [1] is an authenticated data structure used for collectively authenticating a set of messages. The MHT is a binary tree built in a bottom-up manner, by first computing the hash values of the messages in the leaf nodes. The hash value of each internal node is derived from its two child nodes. Finally, the hash value of the root is signed by the owner of the messages. The MHT can be used to authenticate any subset of messages, in conjunction with a proof. The proof consists of the signed root and the sibling nodes (auxiliary hash values) on the path from the root down to the messages which need to be authenticated.
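The MHT construction and proof check described above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function and a message count that is a power of two; the function names are hypothetical, and signing the root is left out (the root digest is simply treated as trusted).

```python
import hashlib

def h(data: bytes) -> bytes:
    """One-way hash function (SHA-256 is an assumption made for this sketch)."""
    return hashlib.sha256(data).digest()

def build_levels(messages):
    """Return all tree levels, leaves first; assumes len(messages) is a power of two."""
    level = [h(m) for m in messages]
    levels = [level]
    while len(level) > 1:
        # Each internal hash is derived from its two children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_proof(levels, index):
    """Collect the sibling hashes on the path from leaf `index` up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof

def verify(message, proof, root):
    """Recompute the root from one message and its auxiliary hash values."""
    digest = h(message)
    for sibling, sibling_is_left in proof:
        digest = h(sibling + digest) if sibling_is_left else h(digest + sibling)
    return digest == root

messages = [b"m1", b"m2", b"m3", b"m4"]
levels = build_levels(messages)
root = levels[-1][0]            # in the MHT scheme, this is the value the owner signs
proof = make_proof(levels, 2)   # proof for b"m3"
```

With four messages, the proof for one message carries two auxiliary hashes, and any change to the message makes the recomputed root differ from the trusted one.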

Location-Aware Publish/Subscribe.
We present the state-of-the-art method [2] for location-aware publish/subscribe as follows.
A location-aware publish/subscribe service delivers each message, denoted by m = (m.T, m.L), to its relevant subscribers, who register spatio-textual subscriptions (each subscription is denoted by s = (s.T, s.L)) to capture their interests. Here m.L (s.L) is a spatial location given by its latitude and longitude, and m.T (s.T) is a set of keywords {t_1, t_2, ..., t_|m.T|} ({t_1, t_2, ..., t_|s.T|}). Each keyword t_i is associated with a weight w(t_i), which can be set to the inverse document frequency (IDF) of the keyword. To quantify the relevancy between a subscription and a message, [2] used the spatio-textual similarity function

SIM(s, m) = α ⋅ TSIM(s, m) + (1 − α) ⋅ SSIM(s, m), (1)

where TSIM(s, m) is a textual similarity function which is similar to the weighted Jaccard coefficient, and SSIM(s, m) is a spatial similarity function computed from DIST(s.L, m.L), the Euclidean distance between s.L and m.L, and maxDIST, the maximum user-tolerated Euclidean distance between subscriptions and messages (which can be set to the maximum distance between subscriptions). α is a preference parameter that tunes the weights of textual and spatial similarity.
A subscription s and a message m are called relevant if their similarity exceeds a threshold θ. Since subscribers usually have different preferences and requirements on α and θ (e.g., some subscribers prefer highly relevant results while others want to get more results), subscribers are allowed to set their own parameters α and θ. Therefore, a parameterized spatio-textual subscription can be redefined as s = (s.T, s.L, s.α, s.θ). Figure 2 shows an example of 11 parameterized spatio-textual subscriptions and 7 messages.
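To make the relevance test concrete, the following sketch evaluates similarity function (1) for one subscription and one message. The exact forms of TSIM and SSIM are assumptions here: TSIM is taken as the weight of the shared keywords divided by the total weight of the subscription's keywords, and SSIM as max(0, 1 − DIST/maxDIST). The dictionary layout and all names are hypothetical.

```python
import math

def tsim(sub_kw, msg_kw, w):
    """Assumed textual similarity: weight of shared keywords over weight of subscription keywords."""
    total = sum(w[t] for t in sub_kw)
    shared = sum(w[t] for t in sub_kw if t in msg_kw)
    return shared / total if total > 0 else 0.0

def ssim(sub_loc, msg_loc, max_dist):
    """Assumed spatial similarity: 1 at distance 0, falling linearly to 0 at max_dist."""
    return max(0.0, 1.0 - math.dist(sub_loc, msg_loc) / max_dist)

def sim(sub, msg, w, max_dist):
    """Similarity function (1): alpha * TSIM + (1 - alpha) * SSIM."""
    a = sub["alpha"]
    return a * tsim(sub["T"], msg["T"], w) + (1 - a) * ssim(sub["L"], msg["L"], max_dist)

def is_relevant(sub, msg, w, max_dist):
    # ">=" is used here; the text says the similarity must "exceed" the threshold.
    return sim(sub, msg, w, max_dist) >= sub["theta"]

weights = {"adidas": 2.0, "shoes": 1.0, "discount": 1.0}   # hypothetical IDF weights
sub = {"T": {"adidas", "shoes", "discount"}, "L": (0.0, 0.0), "alpha": 0.5, "theta": 0.7}
msg = {"T": {"adidas", "shoes", "running"}, "L": (3.0, 4.0)}
score = sim(sub, msg, weights, max_dist=50.0)
```

In this example TSIM is 3/4 and SSIM is 1 − 5/50, so the combined score is about 0.825 and the message passes a threshold of 0.7.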
To deliver messages to relevant subscribers efficiently, [2] proposed a spatial-oriented prefix to prune irrelevant subscriptions and devised a filter-verification framework. In particular, with respect to the textual filter, [2] claimed that if a subscription s is relevant to a message m, they must share at least one common keyword in the so-called prefix of s, denoted by SIG(s), which is computed from the textual similarity threshold. More specifically, based on (1), given a subscription s, since the spatial similarity cannot exceed 1, [2] deduced the textual similarity threshold s.θ_T = (s.θ − (1 − s.α)) / s.α. When s.θ_T ≤ 0, a message m may be relevant to s no matter whether they share common keywords. To address this issue, for a subscription s, if s.θ ≤ 1 − s.α, [2] introduced a virtual dummy keyword "*" with a weight of 0 (i.e., w(*) = 0), and the prefix of s includes its keywords and "*".
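The textual threshold can be worked out directly from (1). The derivation below is a sketch that assumes relevance is tested as SIM(s, m) ≥ s.θ and that s.α > 0; the symbol names s.α, s.θ, and s.θ_T are assumed notation, and the two numeric cases are illustrative only.

```latex
\begin{align*}
\mathrm{SIM}(s,m) \ge s.\theta \ \text{and}\ \mathrm{SSIM}(s,m) \le 1
  &\;\Longrightarrow\; s.\alpha \cdot \mathrm{TSIM}(s,m) + (1 - s.\alpha) \ge s.\theta \\
  &\;\Longrightarrow\; \mathrm{TSIM}(s,m) \ge \frac{s.\theta - (1 - s.\alpha)}{s.\alpha} = s.\theta_T .
\end{align*}
% Case 1: s.alpha = 0.5, s.theta = 0.7  gives  s.theta_T = (0.7 - 0.5) / 0.5 = 0.4,
%         so a relevant message must reach a textual similarity of at least 0.4.
% Case 2: s.alpha = 0.2, s.theta = 0.7  gives  s.theta <= 1 - s.alpha = 0.8 and
%         s.theta_T = (0.7 - 0.8) / 0.2 = -0.5 <= 0, so the dummy keyword "*" is needed.
```

In the second case the spatial term alone can already reach the threshold, which is why a message sharing no keyword with the subscription may still be relevant.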
Regarding the spatial filter, based on the first match keyword (denoted by t_i) between SIG(s) and m (i.e., m does not contain the keywords before t_i in SIG(s)), [2] estimated an upper bound on the textual similarity of s to m and, accordingly, a lower spatial similarity bound between s and m, denoted by LSB(s | t_i). For any message, if its spatial similarity to s is smaller than the lower spatial similarity bound LSB(s | t_i), the subscription s can be safely pruned. Given a subscription s and a message m, we do not know in advance which keyword is their first match keyword (if there is one), and the first match keywords of different messages to the same subscription differ. Therefore, [2] computed the lower spatial similarity bound LSB(s | t) for each keyword t in SIG(s). The prefix of each subscription s together with these lower spatial similarity bounds is called the spatial-oriented prefix. If subscription s is relevant to message m, there must exist a keyword t in SIG(s) ∩ m.T such that SSIM(s, m) ≥ LSB(s | t).
Based on the spatial-oriented prefix, [2] devised a filter-verification framework. In particular, an inverted index is first built on the spatial-oriented prefixes. Then, in the filter phase, for each message keyword t, the framework retrieves the inverted list ℒ(t) of t and, for each subscription s in ℒ(t), if SSIM(s, m) ≥ LSB(s | t), s is a candidate for the message m. In the verification phase, based on (1), the framework verifies whether each candidate s is an answer and, if yes, the message m is delivered to s.
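The two phases above can be sketched as follows. This is not the implementation of [2]: the lower spatial similarity bounds are taken as precomputed inputs with hypothetical values, the forms of TSIM and SSIM are the same assumptions as before (shared weight over subscription weight, and max(0, 1 − DIST/maxDIST)), and all names are illustrative.

```python
import math

def ssim(a, b, max_dist):
    return max(0.0, 1.0 - math.dist(a, b) / max_dist)

def tsim(sub_kw, msg_kw, w):
    total = sum(w[t] for t in sub_kw)
    return sum(w[t] for t in sub_kw if t in msg_kw) / total if total else 0.0

def filter_candidates(msg, index, subs, max_dist):
    """Filter phase: scan the inverted list of every message keyword (plus the dummy "*")."""
    candidates = set()
    for t in msg["T"] | {"*"}:
        for sid, lsb in index.get(t, []):
            # Keep s only if its spatial similarity reaches the bound stored for keyword t.
            if ssim(subs[sid]["L"], msg["L"], max_dist) >= lsb:
                candidates.add(sid)
    return candidates

def verify_candidates(msg, candidates, subs, w, max_dist):
    """Verification phase: evaluate similarity function (1) on every candidate."""
    answers = []
    for sid in sorted(candidates):
        s = subs[sid]
        score = (s["alpha"] * tsim(s["T"], msg["T"], w)
                 + (1 - s["alpha"]) * ssim(s["L"], msg["L"], max_dist))
        if score >= s["theta"]:
            answers.append(sid)
    return answers

w = {"a": 1.0, "b": 1.0}
subs = {
    "s0": {"T": {"a", "b"}, "L": (0.0, 0.0), "alpha": 0.5, "theta": 0.7},
    "s1": {"T": {"a"}, "L": (40.0, 0.0), "alpha": 0.5, "theta": 0.7},
}
# Inverted index over prefixes: keyword -> [(subscription id, hypothetical LSB value)].
index = {"a": [("s0", 0.4), ("s1", 0.4)], "b": [("s0", 0.9)]}
msg = {"T": {"a"}, "L": (0.0, 0.0)}
cands = filter_candidates(msg, index, subs, max_dist=50.0)
answers = verify_candidates(msg, cands, subs, w, max_dist=50.0)
```

Here s1 is dropped in the filter phase because its spatial similarity (0.2) is below its bound, so only s0 reaches the more expensive verification step.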

Authenticated Location-Aware Publish/Subscribe Framework
In this section, we present our proposed authenticated location-aware publish/subscribe framework. In the publish/subscribe scenario, the messages delivered to the subscribers need to be verified for correctness (i.e., soundness and completeness). However, compared with the subscription data, the message data set is unbounded and can be regarded as a data stream. In such a situation, we (actually the data owner in the practical framework) cannot construct an authenticated data structure (ADS) over the unbounded message data and then, based on such a structure, construct the VO for subscribers' authentication. Intuitively, therefore, we would need to sign every incoming message so that, when the signed message is delivered to its corresponding subscribers, they can authenticate it. However, when many messages need to be delivered to a single subscriber (the subscriber registers many interests in the framework), the communication cost between the service provider and this subscriber is high, since every message carries a signature. Moreover, since verifying a signature is not a cheap operation, the authentication cost at the subscriber is also high. To tackle this problem, we present an authenticated location-aware publish/subscribe framework, which not only delivers the messages more efficiently than the framework in the existing work [2], but also makes the subscribers' authentication available with low communication and authentication cost.
The main idea of our framework is to assume that the messages are allowed a maximum delay Δt to be delivered to their corresponding subscribers. Under this assumption, a batch of messages, rather than a single message, can be processed at a time. We organize these messages in a Merkle hash tree (MHT) like structure (i.e., the ADS). When more than one message is delivered to a subscriber, only one signature is returned, thereby reducing the communication and authentication cost. Moreover, recall that, in [2], an inverted index of the spatial-oriented prefixes of all the subscriptions is constructed and, when a message arrives, the framework searches the inverted index to compute which subscriptions are relevant to this message. The message has to be compared with every subscription in ℒ(t) for every message keyword t. To reduce the computational cost and improve the efficiency of message delivery, we present an inverted index pruning technique. By exploiting the spatial proximity between these messages (the messages are also organized in an R-tree like structure), we can prune from the inverted index some subscriptions which cannot become delivery destinations, so that they need not be involved in further computation.

Text-Aware Merkle R-Tree (TMR-Tree).
We first introduce the method of constructing the ADS, called Text-aware Merkle R-tree (TMR-tree), at the data owner side. Consider a predefined maximum permissible delivery delay Δt. The data owner builds one TMR-tree on all the messages within every time interval Δt′ (Δt′ ≤ Δt). Specifically, the TMR-tree has four main features:
(i) The messages in Δt′ are spatially organized in an R-tree.
(ii) Each node has a pseudo-text which is the union of the keywords in its children's texts. A node N with children V and W has pseudo-text N.T = V.T ∪ W.T.
(iii) Similar to the MHT, the TMR-tree stores one hash value in each node. Assume the default fanout of the TMR-tree is 2. A leaf node N with children (messages) V and W stores hash value H(N) = h(V | W). An internal node N with children V and W stores a hash value computed from H(V) and H(W), the hash values of V and W; more specifically, the spatial and textual information of V and W is also involved in the computation.
(iv) The hash value of the root of the TMR-tree is signed by the data owner, producing signature S.
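The four features can be illustrated with a small bottom-up construction. The sketch below is an assumption-laden simplification: messages are grouped by sorted location instead of true R-tree insertion, SHA-256 stands in for the hash function, each node hash covers the bounding rectangle, pseudo-text, and hash of every child (the exact composition used by the TMR-tree is not reproduced here), and signing the root is omitted. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple        # (xmin, ymin, xmax, ymax)
    text: frozenset   # pseudo-text: union of the children's keywords
    digest: bytes
    children: list = field(default_factory=list)

def h(*parts: bytes) -> bytes:
    """Hash several byte strings; each part is hashed first so the encoding is unambiguous."""
    return hashlib.sha256(b"".join(hashlib.sha256(p).digest() for p in parts)).digest()

def encode(mbr, text) -> bytes:
    return repr((tuple(mbr), sorted(text))).encode()

def message_entry(loc, keywords) -> Node:
    """Wrap a message as a degenerate node whose rectangle is its own location."""
    mbr, text = (loc[0], loc[1], loc[0], loc[1]), frozenset(keywords)
    return Node(mbr, text, h(encode(mbr, text)))

def merge(children) -> Node:
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    text = frozenset().union(*(c.text for c in children))
    # The parent hash binds each child's spatial and textual information to its hash.
    digest = h(*(encode(c.mbr, c.text) + c.digest for c in children))
    return Node(mbr, text, digest, list(children))

def build(messages, fanout=2) -> Node:
    level = [message_entry(loc, kw) for loc, kw in sorted(messages, key=lambda m: m[0])]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]

messages = [((1, 1), {"t1", "t2"}), ((2, 3), {"t2"}), ((8, 8), {"t3"}), ((9, 7), {"t4", "t5"})]
root = build(messages)   # root.digest is the value the data owner would sign
```

Because every parent hash covers its children's rectangles and pseudo-texts, changing the location or keywords of any message changes the root digest.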
Example 1. Figure 3 shows an example of the TMR-tree constructed over the messages in Figure 2. Here the fanout of the TMR-tree is set to 2. The leaf node  5 has a pseudo-text { 5 ,

Filter-Verification Framework with Inverted Index Pruning.
We use the idea of the filter-verification framework proposed in [2] and, based on it, present an inverted index pruning technique to reduce the computational cost, thereby improving the efficiency of message delivery. Since a location and a pseudo-text are associated with each node in the TMR-tree, each node in the TMR-tree can be treated as a dummy message. The main idea of our inverted index pruning technique is based on the following proposition.
Algorithm 1 shows the pseudo-code of our framework with the inverted index pruning technique. It takes the TMR-tree and ℒ as input. For each node N and each message m in the leaf nodes of the TMR-tree, we use N.ℒ (m.ℒ) to denote the inverted index whose subscriptions are likely to be relevant to N (m); it is pruned from the inverted index of N's (m's) parent, which we call the parent index of N (m). Thus, the ℒ of each node (or message) is pruned from its parent index. In the beginning, we set the parent index of the root (TMR-tree.Root) to ℒ, which is the inverted index of the spatial-oriented prefixes of all the subscriptions without any pruning (line (1)). Then, we initialize an empty stack S and push the root into it (lines (2)-(3)). In the filter phase, the elements of S are processed until S is empty (lines (4)-(11)). First, we pop an element e from S (line (5)). Then, for each keyword t_i in e.T, we retrieve the inverted list of t_i from the parent index of e and, for each subscription s_j in this list, if SSIM(s_j, e) ≥ LSB(s_j | t_i), s_j is added to e.ℒ(t_i) (lines (6)-(9)). Subscriptions which do not satisfy the condition in line (8) are pruned; that is, they are not added to e.ℒ. If e.ℒ ≠ ∅ and e is not a message, that is, there exist some subscriptions in e.ℒ which might be relevant to e's children, e's children are pushed into S (lines (10)-(11)). The subscriptions in each m_i.ℒ are the candidates which might be relevant to m_i. In the verification phase, for each message m_i, we verify whether each candidate s_j in m_i.ℒ is an answer of m_i and, if yes, s_j is added to the answer set of m_i, that is, A_i (lines (12)-(15)). After all the messages in the TMR-tree have been processed, all the answer sets (A_1, A_2, ..., A_M) are returned together (line (16)). Here we assume that M messages arrive within Δt′. Each subscription in A_i is a delivery destination of the message m_i.

Example 3. Figure 4 shows an example of the procedure of our proposed inverted index pruning technique in the filter-verification framework. In step A, the root is first popped from stack S. Then, for each keyword in Root.T (t_1 to t_5 and *), we retrieve the corresponding inverted list in the parent index of the root and compute the spatial similarity between the root and each subscription in this inverted list. Taking subscription s_9 as an example, we retrieve the inverted list of t_1 and compute the spatial similarity between the root and s_9, SSIM(s_9, Root), which equals 1. Since SSIM(s_9, Root) = 1 is greater than LSB(s_9 | t_1) = 0.60, s_9 is added to Root.ℒ(t_1). Notice that the spatial similarities between the root and s_0, s_1, s_9 in the inverted list of t_4, and s_3, s_7, s_8 in the inverted list of t_5, are smaller than these subscriptions' LSB(s | t); that is, they do not satisfy the condition in line (8) of Algorithm 1. Therefore, they are pruned and are not added to Root.ℒ. Then, since Root.ℒ ≠ ∅, its children N_1 and N_2 are pushed into stack S. Similarly, in step B, when N_2 is processed, s_9 in the inverted list of t_1, s_9 in the inverted list of t_2, s_5 in the inverted list of t_3, s_2 and s_8 in the inverted list of t_4, and s_4 in the inverted list of t_5 are pruned. Note that, in step C, since N_6.T does not contain the keywords t_3 and t_5, we need not compute the spatial similarity between N_6 and the subscriptions in the inverted lists of t_3 and t_5 (indicated by "###" in the example), and these subscriptions are also pruned. Finally, in step D, the candidates which might be relevant to m_6 are s_0, s_4, s_5, s_6, and s_8. Compared with processing all the subscriptions in the unpruned inverted index ℒ, the computational cost is reduced dramatically.

Time Complexity.
For the convenience of comparison between the state-of-the-art method for location-aware publish/subscribe and our proposed filter-verification framework with the inverted index pruning technique, we first give the time complexity of the filter-verification framework proposed in [2] as the following proposition.

Proposition 4. The time complexity of delivering one message m to its relevant subscriber s in the filter-verification framework proposed in [2] is O(∑_{t ∈ m.T} |ℒ(t)| + |m.T|), where ℒ is the inverted index built on the spatial-oriented prefixes.
Proof. In the filter phase, for each keyword t in m.T (including the dummy keyword "*"), the framework retrieves the inverted list ℒ(t) of t and, for each subscription s in ℒ(t), if SSIM(s, m) ≥ LSB(s | t), s is a candidate for the message m and we add it into the candidate set. Therefore, the time complexity of filtering is O(∑_{t ∈ m.T} |ℒ(t)|).
In the verification phase, based on the spatio-textual similarity function (see (1)), verifying the candidate takes O(|m.T|) time. Therefore, the time complexity of delivering one message m to its relevant subscriber s in the filter-verification framework proposed in [2] is O(∑_{t ∈ m.T} |ℒ(t)| + |m.T|).
The time complexity of our proposed filter-verification framework with inverted index pruning technique is given by the following proposition.

Proposition 5. The time complexity of delivering M messages that come within Δt′ to their relevant subscribers in our proposed filter-verification framework with the inverted index pruning technique is O(∑ …), where f is the fanout of the TMR-tree, N_i is a node in the TMR-tree (which can also be treated as a dummy message), and ℒ is the inverted index associated with N_i's parent, in which the subscriptions are likely to be relevant to N_i's parent.
Proof. A TMR-tree is constructed over the M messages that come within Δt′. If the fanout of the TMR-tree is f, in the worst case, the height of the TMR-tree (excluding the layer of messages) is h = log_f M. Thus, the number of internal and leaf nodes in the TMR-tree (assuming the root has depth 1) is ∑_{i=1}^{h} f^{i−1} = (f^h − 1)/(f − 1). In the filter phase, when visiting a node N (or a message m) in the TMR-tree, we retrieve its parent index, prune it, and generate N.ℒ (or m.ℒ). Therefore, the time complexity of filtering with inverted index pruning is O(∑ …). In the verification phase, suppose each message m_i within Δt′ is delivered to only one subscriber. The time complexity of verifying whether the M subscriptions are the answers of the messages is O(∑_{i=1}^{M} |m_i.T|). Therefore, the time complexity of delivering M messages that come within Δt′ to their relevant subscribers in our proposed filter-verification framework with the inverted index pruning technique is O(∑ …).
Compared with the filter-verification framework proposed in [2], our proposed filter-verification framework with the inverted index pruning technique needs to visit more inverted indexes (those associated with the internal and leaf nodes of the TMR-tree). However, since the subscriptions in each node's inverted index are progressively pruned from the root down to the leaf nodes of the TMR-tree, the total number of inverted list entries traversed is reduced. Therefore, our proposed filter-verification framework with the inverted index pruning technique can be considered efficient, which is also demonstrated by our experimental study (Section 4).
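The filter phase with inverted index pruning, as described for Algorithm 1, can be sketched as below. Several details are assumptions made for illustration: a node's similarity to a subscription is computed from the minimum distance between the subscription's location and the node's bounding rectangle, the LSB values are hypothetical precomputed inputs, and tree nodes are plain dictionaries. Messages whose ancestors end up with an empty index simply do not appear in the result.

```python
import math

def mindist(p, mbr):
    """Minimum Euclidean distance from point p to rectangle (xmin, ymin, xmax, ymax)."""
    dx = max(mbr[0] - p[0], 0.0, p[0] - mbr[2])
    dy = max(mbr[1] - p[1], 0.0, p[1] - mbr[3])
    return math.hypot(dx, dy)

def ssim(sub_loc, mbr, max_dist):
    return max(0.0, 1.0 - mindist(sub_loc, mbr) / max_dist)

def filter_with_pruning(root, index, sub_locs, max_dist):
    """Return {message id: candidate subscription ids}; `index` is the unpruned inverted index."""
    candidates = {}
    stack = [(root, index)]                  # every element carries its parent's index
    while stack:
        node, parent_index = stack.pop()
        pruned = {}
        for t in node["T"] | {"*"}:          # "*" is the dummy keyword
            kept = [(sid, lsb) for sid, lsb in parent_index.get(t, [])
                    if ssim(sub_locs[sid], node["mbr"], max_dist) >= lsb]
            if kept:
                pruned[t] = kept
        if "id" in node:                     # a message: its pruned index holds the candidates
            candidates[node["id"]] = {sid for lst in pruned.values() for sid, _ in lst}
        elif pruned:                         # descend only if some subscription survived
            stack.extend((child, pruned) for child in node["children"])
    return candidates

m1 = {"id": "m1", "T": {"a"}, "mbr": (0, 0, 0, 0)}
m2 = {"id": "m2", "T": {"b"}, "mbr": (1, 1, 1, 1)}
m3 = {"id": "m3", "T": {"a"}, "mbr": (90, 90, 90, 90)}
m4 = {"id": "m4", "T": {"c"}, "mbr": (91, 91, 91, 91)}
n1 = {"T": {"a", "b"}, "mbr": (0, 0, 1, 1), "children": [m1, m2]}
n2 = {"T": {"a", "c"}, "mbr": (90, 90, 91, 91), "children": [m3, m4]}
tree = {"T": {"a", "b", "c"}, "mbr": (0, 0, 91, 91), "children": [n1, n2]}
sub_locs = {"s0": (0, 0), "s1": (90, 90)}
index = {"a": [("s0", 0.5), ("s1", 0.5)], "b": [("s0", 0.9)], "c": [("s1", 0.9)]}
result = filter_with_pruning(tree, index, sub_locs, max_dist=100.0)
```

In this toy run, s1 is removed once at node n1 and s0 once at node n2, so neither is re-examined for the messages below those nodes; this is the saving the pruning technique aims for.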

VO Construction and Authentication.
After finding the subscribers who are the delivery destinations of the messages, that is, the answer sets A_i, the service provider still needs to construct a VO for each subscriber for authentication. Algorithm 2 shows the pseudo-code of constructing the VO. It takes the TMR-tree and the answers of each message m_i (A_i) as input. First, we initialize a VO for each subscription s_j in A_1 ∪ A_2 ∪ ⋅⋅⋅ ∪ A_M (denoted by VO_j) with the root of the TMR-tree (lines (1)-(2)). Then, we initialize an empty queue Q and put the root into it (lines (3)-(4)). The elements of Q are processed until Q is empty (lines (5)-(12)). When the distance between a message m_i and the picked element e (from Q) is 0, that is, m_i is in the subtree rooted at e, for each subscription s_j in A_i, we replace e in VO_j with three parts: (1) the token "["; (2) e's children; and (3) the token "]" (lines (6)-(10)). Note that we use a pair of tokens "[" and "]" to indicate the scope of the entries in e. Then, if e's children are not in Q and they are not messages, they are put into Q (lines (11)-(12)). Finally, each constructed VO_j is delivered to the subscriber s_j together with the corresponding messages (line (13)), where j ranges over the subscribers to whom messages will be delivered.
Example 6. Following the example in Figure 2, after computing the delivery destinations of messages m_1 to m_7, we obtain their answer sets as follows: A_1: …
To authenticate the soundness of the delivered messages, each subscriber s_j needs to scan its VO_j to recompute the hash value of the root of the TMR-tree and compare it against the root signature using the data owner's public key distributed by the KDC. Since each VO_j includes the entries which have been visited during message delivery, the subscriber can simulate the TMR-tree traversal, recursively reconstruct each MBR, and compute its hash value in a bottom-up manner. Specifically, each MBR and its hash value can be computed from the entries of its child node, which are delimited by "[" and "]".
To authenticate the completeness of the delivered messages, the subscriber s_j needs to check that each message in the results is indeed present in VO_j and that it satisfies the parameters s_j.α and s_j.θ. In addition, the subscriber needs to check that the other entries returned in VO_j do not satisfy s_j.α and s_j.θ.
Example 7. Still taking s_6 as an example, the subscriber can recursively reconstruct N_3 from m_1 and m_2, N_1 from N_3 and N_4, N_6 from m_6 and m_7, N_2 from N_5 and N_6, and finally the root from N_1 and N_2, and then compute the root's hash value and compare it against the root signature to authenticate the soundness of the delivered messages m_1, m_2, m_6, and m_7. As for authenticating the completeness of m_1, m_2, m_6, and m_7, the subscriber needs to recompute whether they satisfy s_6.α = 0.2 and s_6.θ = 0.7, while m_4 and m_5 do not.
From the example we can see that when more than one message is delivered to a subscriber, only one signature is returned, thus reducing the communication and authentication cost.
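The VO idea can be illustrated with a simplified sketch. It departs from Algorithm 2 in ways that should be read as assumptions: node expansion is driven by a hypothetical monotone predicate `matches(mbr, text)` (true for a node whenever it could contain a matching message) rather than by the answer sets, a nested Python list plays the role of the "[" and "]" tokens, SHA-256 replaces the hash function, and a trusted root digest stands in for checking the owner's signature.

```python
import hashlib

def h(*parts):
    """Hash several byte strings; each part is hashed first so the encoding is unambiguous."""
    return hashlib.sha256(b"".join(hashlib.sha256(p).digest() for p in parts)).digest()

def enc(mbr, text):
    return repr((tuple(mbr), sorted(text))).encode()

def combine(kids):
    """kids: list of (mbr, text, digest); returns the parent's (mbr, text, digest)."""
    mbr = (min(k[0][0] for k in kids), min(k[0][1] for k in kids),
           max(k[0][2] for k in kids), max(k[0][3] for k in kids))
    text = frozenset().union(*(k[1] for k in kids))
    return mbr, text, h(*(enc(k[0], k[1]) + k[2] for k in kids))

def message(mid, loc, text):
    mbr, text = (loc[0], loc[1], loc[0], loc[1]), frozenset(text)
    return {"id": mid, "summary": (mbr, text, h(enc(mbr, text))), "children": []}

def node(children):
    return {"summary": combine([c["summary"] for c in children]), "children": children}

def build_vo(n, matches):
    """Service provider: expand nodes that may hold results, keep only a digest for the rest."""
    mbr, text, digest = n["summary"]
    if not n["children"]:
        return ("msg", mbr, text, n["id"])
    if matches(mbr, text):
        return [build_vo(c, matches) for c in n["children"]]   # list acts as "[" ... "]"
    return ("node", mbr, text, digest)

def rebuild(entry):
    """Subscriber: recompute (mbr, text, digest) bottom-up from the VO."""
    if isinstance(entry, list):
        return combine([rebuild(e) for e in entry])
    kind, mbr, text, extra = entry
    return (mbr, text, extra) if kind == "node" else (mbr, text, h(enc(mbr, text)))

def flat(entry):
    return [x for e in entry for x in flat(e)] if isinstance(entry, list) else [entry]

def authenticate(vo, trusted_root_digest, delivered_ids, matches):
    sound = rebuild(vo)[2] == trusted_root_digest            # soundness: root hash agrees
    entries = flat(vo)
    results = {e[3] for e in entries if e[0] == "msg" and matches(e[1], e[2])}
    pruned_ok = all(not matches(e[1], e[2]) for e in entries if e[0] == "node")
    return sound and pruned_ok and results == set(delivered_ids)   # completeness

tree = node([node([message("m1", (0, 0), {"a"}), message("m2", (1, 1), {"b"})]),
             node([message("m3", (9, 9), {"a"}), message("m4", (10, 10), {"c"})])])
matches = lambda mbr, text: "a" in text and mbr[0] <= 5      # hypothetical relevance test
vo = build_vo(tree, matches)
ok = authenticate(vo, tree["summary"][2], {"m1"}, matches)
```

The subscriber receives two message entries and one node digest, yet checks a single root value, which mirrors the point that one signature can cover several delivered messages.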
Space and Time Complexity.
We first give a baseline method for the problem of authenticating messages in outsourced location-aware publish/subscribe services. Then, we give the space complexity of its VO and compare it with that of our proposed authenticated location-aware publish/subscribe framework. We also compare the time complexity of authentication in the baseline method and in our framework.
Baseline: the data owner signs every message within Δt′ and, when the signed messages are delivered to their corresponding subscribers, each VO_j consists of the messages (delivered to s_j) and their signatures. Then, the subscriber s_j can verify soundness by computing the hash value of each message in VO_j and comparing it against the message's signature. Recomputing the spatio-textual similarity between each message in VO_j and the subscription s_j enables the subscriber to verify completeness.
The space complexity of the VO of the baseline method is given by the following proposition. Compared with the baseline method, the space complexity of the VO of our proposed authenticated location-aware publish/subscribe framework is given by the following proposition. From the above propositions we can see that, in our proposed filter-verification framework with the inverted index pruning technique, if more than one message is delivered to a subscriber, only one signature is returned. Although the VO of our framework additionally contains dummy messages (tree nodes), its size is still smaller than that of the baseline method when many messages are delivered to the subscriber, since signatures are space consuming.
Since the authentication time is correlated with the size of the VO, the time complexity of authentication in our proposed filter-verification framework with the inverted index pruning technique is also smaller than that of the baseline method.

Experimental Study
In this section, we proceed to conduct extensive experiments to evaluate the performance of our proposed authenticated location-aware publish/subscribe framework.

Experiment Setup
Datasets.
Similar to [2], we use a real-world dataset, POI, which contains 10 million points of interest (POIs) in the USA. We randomly select 1-5 keywords from each POI to generate subscriptions. Thus, the average number of keywords in each subscription is 3. The maximum permissible response delay Δt and the message delivery interval Δt′ (Δt′ ≤ Δt) are both set to 5 minutes. During this interval, we randomly select 2000 POIs as messages. To generate long messages, we combine 10 POIs into a single message. The average number of keywords in each message is 41.

Parameters.
The performance of our proposed framework is evaluated by varying the preference s.α (0.1, 0.3, 0.5, 0.7, and 0.9) and the threshold s.θ (0.5, 0.6, 0.7, 0.8, and 0.9). We set s.α to 0.5 and s.θ to 0.7 in the default setting. When we vary one parameter, the other parameter is kept at its default value. We use the inverse document frequency (IDF) to generate keyword weights.

System Configuration.
All the experiments are run on a server with an Intel(R) Xeon(R) CPU E5-2609 v2 @2.5 GHz (Quad Core) and 64 GB RAM, running Linux Ubuntu. We use an in-memory setting and the programs are implemented in C++.

Performance Metrics.
The metrics for performance evaluation include (i) PAS and PC: percentage of accessed subscriptions and candidates, which indicate the ratios of accessed subscriptions in the inverted index of spatial-oriented prefixes and candidates to the number of total subscriptions (ii) FS: time of finding the relevant subscriptions for each message within Δ  (iii) CVO: time of constructing the VO (iv) VOS: VO size, which affects the communication cost between the service provider and subscribers (v) AM: time of authenticating the messages at the subscribers side Note that, in our framework, we process a batch of messages at one time; thus each time we first get a total value of each metric.Then, for the metrics PAS, PC, and FS, we report the average value corresponding to each message and, for the metrics CVO, VOS, and AM, we report the average value corresponding to each subscriber.4.1.5.Algorithms.For metrics (i), (ii), and (iii), algorithms to be evaluated in our experiments include (1) SP (the method of finding the relevant subscriptions for each message using the spatial-oriented prefixes, which is proposed in [2]); (2) SP + IIP (our filter-verification framework with inverted index pruning technique); (3) VOC (our method of constructing the VO).
Note that, to the best of our knowledge, this is the first attempt to define and solve the problem of authenticating messages in outsourced location-aware publish/subscribe services. Therefore, no existing algorithm is included in our experiments for comparison.

Performance Study
4.2.1. Cost at the Service Provider. The cost at the service provider is evaluated from two aspects. First, in Figure 6, we evaluate the ratios of accessed subscriptions and candidates to the total number of subscriptions (PAS and PC) as a function of the preference parameter and the threshold, where the accessed subscriptions are those accessed in the inverted index and the candidates are those verified using the Verify function in Algorithm 1. Second, as shown in Figure 7, we evaluate the running time as a function of the preference parameter and the threshold, which includes the time of finding the relevant subscriptions for each message (FS) and the time of constructing the VO (CVO).
According to Figures 6 and 7, we make the following observations. First, SP + IIP outperforms SP; that is, the PAS and PC of SP + IIP are both smaller than those of SP (Figure 6), and the FS of SP + IIP is smaller than that of SP (Figure 7). The reason is that SP + IIP uses the inverted index pruning technique to prune subscriptions from the inverted index of spatial-oriented prefixes. These pruned subscriptions are not relevant to the messages and thus need not be involved in the computation. Second, as the threshold increases, the performance of SP and SP + IIP improves, because for a larger threshold fewer subscriptions need to be visited and verified, and there is more opportunity to prune irrelevant subscriptions. Third, as the preference parameter decreases, SP and SP + IIP take much longer, because for a smaller preference parameter the spatial similarity is more important and accurate prefix bounds cannot be estimated. Fourth, as the preference parameter (threshold) increases, CVO increases (decreases) slightly, since we use the answers of each message to construct the VO and CVO depends on the number of answers: with a larger preference parameter (threshold), we get more (fewer) answers. Fifth, although our framework spends extra time constructing the VO for subscribers' authentication, the total running time (FS + CVO) is still better than that of SP. For example, in Figure 7, when the preference parameter is 0.3, SP + IIP costs around 60 ms and VOC costs about 20 ms, so the total running time is about 80 ms, which is still less than the 90 ms cost of SP.

Cost between the Service Provider and Subscribers.
We evaluate the metric VOS, that is, the VO size, which affects the communication overhead between the service provider and subscribers. Figure 8 shows VOS under the experimental settings when varying the preference parameter and the threshold.
From Figure 8, we make the following observations. First, ALPF outperforms BL, since we process a batch of messages rather than one message at a time: when many messages are delivered to a subscriber, the subscriber's VO contains only one signature, which is computed from the root hash value of the TMR-tree. In BL, however, the VO would include a signature for every message. Second, as the preference parameter (threshold) increases, VOS increases (decreases) in a nearly linear manner. The reason is that VOS depends on the number of messages delivered to each subscriber. When the preference parameter (threshold) increases, the number of answers of each message increases (decreases) and, correspondingly, the number of messages delivered to each subscriber increases (decreases).
Third, the largest value of VOS is about 240 KB, reached when the threshold is 0.5. This value is acceptable, especially when more than one message needs to be verified by a subscriber.

Cost at the Subscribers.
The last metric evaluated is AM, that is, the time of authenticating the messages at the subscriber side. AM is crucial since the subscribers may have limited computing resources. Figure 9 shows AM as a function of the preference parameter and the threshold.
According to Figure 9, we first find that, in ALPF, subscribers always need less time to authenticate the messages delivered to them than in BL. This is because, when soundness is verified in ALPF, a subscriber only needs to decrypt one signature and recompute the root hash value of the TMR-tree to compare against it. In BL, however, the number of decryption operations equals the number of messages delivered to the subscriber, and decryption is expensive compared with hashing. Thus, ALPF outperforms BL. Second, we find that, as the preference parameter (threshold) increases, AM increases (decreases) in a nearly linear manner, since AM is closely related to VOS and follows the same trend. Third, in the worst case, authenticating the messages costs a subscriber about 1.2 s, which is reasonable and should not noticeably degrade the subscriber's experience.

Security Analysis.
In this paper, we study the problem of authenticating messages in outsourced location-aware publish/subscribe services. Therefore, the goal of our security analysis is to show that the proposed authenticated location-aware publish/subscribe framework guarantees that subscribers can verify the soundness and completeness of the messages delivered to them.
Proof of Soundness. Assume that a message m delivered to a subscriber s is bogus or modified. In this paper, we adopt the commonly used hash function SHA1 [3]. Assuming that SHA1 is collision-resistant, and because the hash value of the root of the TMR-tree is computed recursively from the messages that arrive within the message delivery interval, which must include m, the recomputed root hash value of the TMR-tree will not match the signature, and the subscriber s detects the mismatch. Therefore, through our framework, subscribers can verify that the messages received from the service provider are sound.
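The soundness argument can be sketched with a plain binary Merkle tree. This is a simplification under stated assumptions, not the TMR-tree itself: the spatial and textual parts of each node are omitted, the signature check is replaced by a direct comparison with the root digest (as if the subscriber had already recovered it from the owner's signature), and all function names are hypothetical. SHA1 is used only to mirror the text; a stronger hash such as SHA-256 could be substituted in `digest`.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def build_levels(messages):
    """Return all levels of a binary Merkle tree, leaf digests first."""
    level = [digest(m) for m in messages]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # odd count: pair the last digest with itself
            level = level + [level[-1]]
            levels[-1] = level
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling digests needed to recompute the root from one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (digest, sibling is on the left)
        index //= 2
    return proof


def recompute_root(message, proof):
    """Subscriber side: rebuild the root digest from a message and its proof."""
    current = digest(message)
    for sibling, sibling_is_left in proof:
        current = digest(sibling + current) if sibling_is_left else digest(current + sibling)
    return current


batch = [b"m1: shoes sale", b"m2: coffee", b"m3: outlet"]
levels = build_levels(batch)
signed_root = levels[-1][0]  # stands in for the digest recovered from the signature
proof = make_proof(levels, 1)
genuine_ok = recompute_root(batch[1], proof) == signed_root
tampered_ok = recompute_root(b"m2: tampered", proof) == signed_root
print(genuine_ok, tampered_ok)
```

A single signed root covers the whole batch, which is why one signature suffices regardless of how many messages a subscriber receives.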

Proof of Completeness.
Let m be a message that satisfies the parameters (preference and threshold) of a subscriber s and is to be delivered to s. For the recomputed hash value of the root of the TMR-tree to match the signature (i.e., for soundness to hold), there are the following two cases:
(i) The message m is included in the corresponding VO. In this case, the subscriber s can confirm that m is a result by using the returned spatial and textual information of m.
(ii) The message m is not included in the corresponding VO. In this case, it must lie in a subtree whose root node is included in the VO. However, the subscriber s cannot confirm that this node fails to satisfy the parameters, since if m is a result, the node must satisfy them as well, which alerts the subscriber to a potential violation of completeness.
Therefore, through our framework, subscribers can receive complete messages from the service provider.

Related Work
Our work is related to location-aware publish/subscribe and authenticated query processing. Sections 5.1 and 5.2 review the related work in these areas.
Most studies in this field can be categorized according to how they evaluate the relevancy between subscriptions and messages [2, 4-8]. In particular, [4-7] use a spatial region to represent the spatial information of each subscription, spatial overlap to evaluate spatial similarity, and "AND"/"OR" semantics or Boolean expressions to evaluate textual relevancy, while [2, 8] combine textual relevancy and spatial similarity into a ranking function that quantifies the relevancy between subscriptions and messages.
Regarding the first category, Chen et al. [4] study the problem of matching Boolean range continuous queries over a stream of incoming spatio-textual messages in real time. A Boolean range continuous query continually retrieves the spatio-textual messages that arrive before a user-specified expiration time, satisfy the user's keywords connected by "AND" or "OR" semantics, and are located in the query range. The authors present the IQ-tree, a hybrid index based on the Quad-tree and inverted files. In [5], Li et al. study location-aware publish/subscribe, which delivers a message to the subscribers whose subscriptions spatially overlap with the message and whose keywords are all contained in the message ("AND" semantics). They propose the R^t-tree, which extends the R-tree by selecting some representative keywords from subscriptions and adding them to R-tree nodes to enable textual pruning. The matching algorithms of both [4] and [5] follow the filtering-and-refinement paradigm. More recently, studying the same problem, Wang et al. [6] find that, in [4, 5], the spatial factor is always prioritized during index construction regardless of the keyword distribution of the query set, and that the inverted indexing technique is not well suited to textual filtering. Therefore, they combine the keyword partition and the space partition in one tree structure when constructing the index for queries, based on the expected matching cost. They compute the cost from the number of queries associated with each partition and the probability that the partition is explored during message matching, instead of from the complexity of the filter and verification steps. Guo et al. [7] study filtering dynamic streams for continuous moving Boolean subscriptions. Different from previous works, their approach continuously monitors users' locations and sends nearby messages in real time, and it allows users to specify their interests with Boolean expressions, which provides better flexibility and expressiveness in describing an interest.
With respect to the second category, as introduced in Section 2.3, Hu et al. [2] study parameterized location-aware publish/subscribe, which requires subscribers to specify parameters to enable personalized filtering. In [8], Chen et al. study top-k spatial-keyword publish/subscribe, which aims to continuously feed the user with new spatio-textual messages whose temporal spatial-keyword scores are ranked within the top k. They use a Quad-tree to partition the whole space. Each subscription is assigned to a number of covering cells that form a disjoint partition of the entire space, and an inverted file ordered by subscription id is built to organize the subscriptions assigned to each cell.

Authenticated Query Processing.
Authenticated query processing has been studied extensively. Most studies on query authentication are based on an ADS, the Merkle hash tree (MHT) [1], as introduced in Section 2.3. The notion of the MHT is generalized to multiway trees and widely adapted to various index structures. Typical examples include the Merkle B-tree and its variant, the Embedded Merkle B-tree [9]. Following the concept of the MHT, the authenticated query processing problem has also been studied for relational data [9, 10], data streams [11-15], and textual search engines [16].
In the spatial database domain, there are also many query authentication applications based on the MHT. Yang et al. [17] first introduce the query authentication problem to the domain of spatial data and study the authentication of spatial range queries. They propose an ADS called the MR-tree, which combines the ideas of the MB-tree [9] and the R*-tree [18]. Yiu et al. investigate how to efficiently authenticate moving kNN queries [19], moving range queries [20], and shortest-path queries [21]. More recently, Hu et al. [22] and Chen et al. [23] develop new schemes for range and top-k query authentication that preserve the location privacy of queried objects. In addition, Lin et al. [24] investigate the authentication of location-based skyline queries and propose a new ADS called the MR-Sky-tree. Authentication of reverse k nearest neighbor queries is studied by Li et al. in [25]. For mixed data types, such as spatio-textual data, Su et al. [26] and Wu et al. [27] study the authentication problem for snapshot and moving top-k spatial-keyword queries, respectively. Yan et al. [28] explore the authentication problem in the area of spatio-textual similarity joins. Instead of supporting only relational data as [10] does, the authentication schemes proposed in [28] can support spatial data. Zhang et al. [29] study the authentication of location-based top-k queries, which ask for the POIs in a certain region with the k highest ratings for a POI attribute of interest.
Besides the MHT, other index structures, such as the Voronoi diagram and the prefix tree, can be used to construct the ADS. Hu et al. [30] propose a novel approach that authenticates spatial queries based on the neighborhood information derived from the Voronoi diagram. The problem of authenticating query results in data integration services is studied by Chen et al. in [31], which addresses multisource data authentication that can simultaneously support a wide range of query types. Based on the prefix tree, they propose the Homomorphic Secret Sharing Seal, which merges the authentication codes of nonresult values with a common prefix, thus allowing them to be verified as a whole.

Figure 4: An Example of procedures of filter-verification framework with inverted index pruning (partial).

Figure 5: An example of procedures of VO construction (VO 6 ).

Proposition 8. If there are $n$ messages $m_1, \ldots, m_n$ delivered to a subscriber $s$ at one time, the VO size for $s$, that is, the space complexity of the VO, is $O(\sum_{i=1}^{n} |m_i| + |S|)$, where $|S|$ is the size of the signature and each $|m_i|$ includes the size of the message's spatial and textual information.

Proposition 9. If there are $n$ messages $m_1, \ldots, m_n$ delivered to a subscriber $s$ at one time, the space complexity of the VO is $O(\sum_{i=1}^{n} |m_i| + \sum_{j=1}^{k} |d_j| + |S|)$, where $|S|$ is the size of the signature, each $d_j$ is a dummy message, and we assume that $k$ dummy messages are included in the VO.

Figure 6: Evaluation of PAS and PC.

Figure 7: Evaluation of FS and CVO.
Therefore, the prefix of $s$ can be defined as $SIG(s) = \{t_1, t_2, \ldots, t_{p-1}\}$. Since the total weight of the keywords outside $SIG(s)$ is smaller than the textual threshold of $s$, if a subscription $s$ is relevant to a message $m$ (i.e., $TSIM(s, m)$ is at least the textual threshold of $s$), they must share at least one common keyword in $SIG(s)$.
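The prefix idea can be sketched as follows. This is an illustration under assumptions: TSIM(s, m) is taken to be the total weight of the keywords shared by s and m, the keywords of s are already sorted in a fixed global order, and the name `prefix_signature` and the toy weights are hypothetical.

```python
def prefix_signature(keywords, weights, threshold):
    """Shortest prefix of `keywords` whose remaining suffix weighs less than `threshold`."""
    remaining = sum(weights[k] for k in keywords)
    prefix = []
    for kw in keywords:
        if remaining < threshold:
            break  # the keywords left cannot reach the threshold on their own
        prefix.append(kw)
        remaining -= weights[kw]
    return prefix


# Toy subscription: keywords in a fixed order with illustrative weights.
weights = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
keywords = ["a", "b", "c", "d"]
sig = prefix_signature(keywords, weights, threshold=0.5)
print(sig)
```

Here the keywords after the prefix weigh about 0.3 in total, so a message that shares none of the prefix keywords cannot reach the threshold of 0.5 and can be skipped without computing its exact similarity.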
$\{t_5, t_4, t_3, t_2, t_1\}$, which is the union of its children's texts, that is, $\{t_5, t_4, t_3\} \cup \{t_5, t_4, t_2, t_1\}$. Since every node, whether leaf or internal, is represented by a rectangle (the Minimum Bounding Rectangle (MBR) of the messages or of the other MBRs it contains), its location is defined by two points, which can be the bottom-left and upper-right points. For example, $N_6$ is the MBR of $m_6$ and $m_7$, and its location consists of the rectangle's bottom-left and upper-right points, (0.11, 0.70) and (0.20, 0.79).
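The way a parent entry is derived from its children can be sketched as below. The helper names are hypothetical, the hash component of the node is omitted, and the two message locations are invented so that their MBR matches the rectangle quoted above; pairing them with the two keyword sets is purely for illustration.

```python
def point_entry(keywords, x, y):
    """Entry for a single message: its keywords and a degenerate rectangle."""
    return set(keywords), (x, y, x, y)


def merge_children(children):
    """Combine child entries (keywords, (x_min, y_min, x_max, y_max)) into a parent entry."""
    text = set().union(*(kw for kw, _ in children))
    boxes = [box for _, box in children]
    mbr = (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
    return text, mbr


# Hypothetical locations chosen to reproduce the rectangle in the example.
left = point_entry(["t5", "t4", "t3"], 0.11, 0.79)
right = point_entry(["t5", "t4", "t2", "t1"], 0.20, 0.70)
parent_text, parent_mbr = merge_children([left, right])
print(sorted(parent_text), parent_mbr)
```

Because the parent's text is a superset of every child's text and its rectangle encloses every child's rectangle, a bound computed on the parent also holds for everything beneath it.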