Efficient Processing of Moving Top-k Spatial Keyword Queries in Directed and Dynamic Road Networks

A top-k spatial keyword (TkSK) query ranks objects based on their distance to the query location and their textual relevance to the query keywords. Several solutions have been proposed for top-k spatial keyword queries. However, most studies focus on Euclidean space or investigate only snapshot queries, in which both the query and the data objects are static. A few algorithms study TkSK queries in undirected road networks, where each edge is undirected and the distance between two points is the length of the shortest path connecting them. However, TkSK queries have not been thoroughly investigated in directed and dynamic spatial networks, where each edge has a particular orientation and its weight changes according to traffic conditions. Therefore, in this study, we address this problem by presenting a new method, called COSK, for processing continuous top-k spatial keyword queries for moving queries in directed and dynamic road networks. We first propose an efficient framework for processing snapshot TkSK queries. Furthermore, we propose a safe-exit-based approach to monitor the validity of the results of moving TkSK queries. Our experimental results demonstrate that COSK significantly outperforms existing techniques in terms of query processing time and communication cost.


Introduction
With the popularization of geo-tagged data (e.g., geo-tagged photos, videos, check-ins, and text messages), many online location-based services such as Google Maps, Yahoo Maps, and Bing Maps have started providing useful information via location-based queries [1][2][3][4]. Moreover, textual descriptions of points of interest, e.g., hotels, shopping malls, and tourist attractions, are easily accessible on the Web. These developments demand techniques that efficiently process top-k spatial keyword queries, which return a ranked list of the k best facilities based on their proximity to the query location and their relevance to the query keywords. Several algorithms have been proposed for processing top-k spatial keyword queries in Euclidean space [5,6]. Although a few algorithms study keyword queries in road networks, they all focus on undirected road networks. However, in real scenarios, urban road networks are directed and dynamic: each edge has a particular orientation, and its weight changes according to traffic conditions such as congestion and reversible lanes. Therefore, in this study, we investigate moving top-k spatial keyword queries in directed and dynamic road networks.
Wireless Communications and Mobile Computing
Top-k spatial keyword queries can be used in a wide range of applications, such as recommendation and decision support systems. For example, tourists may want to retrieve a ranked list of restaurants that serve Italian steak, based on the shortest distance from their location and the textual relevance to the query keywords. Tourists can issue a top-k spatial keyword query to a location-based service (LBS) to collect information about qualifying restaurants in their vicinity. With moving top-k spatial keyword queries, if they do not like the results, they can simply keep moving, and updated results will be provided until a desired restaurant is found. Typically, the query issuer follows the underlying road network to reach the desired location; therefore, TkSK algorithms designed for Euclidean space do not work in road networks. A road network is generally modeled as a weighted directed graph, where each edge has a direction and its weight can vary according to traffic conditions. Given a set of data objects $D = \{d_1, d_2, \ldots, d_{|D|}\}$, a query location, and a set of keywords, the TkSK query returns the best k data objects from D according to their combined textual and spatial relevance to the query. We use the distance function $dist(q, d)$ to represent the shortest network distance from q to data object d. Figure 1 presents an example of a directed road network, where rectangles represent data objects with textual descriptions and the triangle represents the query location. The number label on each edge indicates its weight, such as the time required to travel along it, e.g., $dist(n_1, d_1) = 1$ and $dist(n_1, n_2) = 2$. Consider a scenario where a tourist is interested in finding an "Italian Restaurant."
If an undirected road network is considered, the top-1 "Italian Restaurant" is $d_6$. However, in the directed road network, the shortest path from q to $d_6$ is $(q \rightarrow n_3 \rightarrow n_7 \rightarrow d_6)$. Therefore, in the directed road network, the top-1 result is $d_3$ because it is closer to the query location than $d_6$. Now, consider that the tourist is looking for "Cafe Bakery". Data object $d_7$ could score higher than data object $d_2$ because $d_7$ ("Cafe and bakery") is more textually relevant to the query keywords than $d_2$ ("Cafe"), and $dist(q, d_7)$ is only marginally greater than $dist(q, d_2)$.
Moving top-k spatial keyword queries in directed and dynamic road networks are useful for many location-based applications. However, query processing is costly because the movement of query object q may invalidate the query results. Therefore, the main challenge in processing moving TkSK queries is to keep the query results fresh while the query objects move freely. A straightforward approach is to increase the update frequency of the query. However, this approach still cannot guarantee up-to-date results between consecutive updates, and it increases both the computation and communication overhead: whenever the query object changes its location, it must report the new location to the server, which increases the communication cost, and the server must recompute the results, which increases the computation cost.
To address the aforementioned challenges, we first present an efficient technique for processing snapshot TkSK queries in directed road networks. Then, we present a safe-exit-based approach for processing and monitoring moving TkSK queries in which query object q moves freely in a directed spatial network. A safe exit point of query object q is a boundary point between the safe region and the nonsafe region of q, where the safe region is the part of the network within which the query result of q remains valid. Therefore, the query results are recomputed only when q leaves its safe region, which significantly reduces the computation and communication costs. To the best of our knowledge, this is the first attempt to study moving top-k spatial keyword queries in directed and dynamic road networks.
Below, we summarize our contributions: (i) We study the problem of continuously monitoring moving top-k spatial keyword queries in directed and dynamic road networks.
(ii) We present an algorithm for monitoring moving TkSK queries that efficiently computes the safe exit points of query object q in a directed road network, significantly reducing the computation and communication costs for moving queries.
(iii) We also propose a method that monitors the validity of the query results and the safe region when the weights of road segments are updated due to traffic conditions.
(iv) Finally, we conduct extensive experiments on real road network datasets and demonstrate the superiority of the proposed algorithm over the existing approach.
The remainder of this paper is structured as follows. Section 2 reviews existing work on processing TkSK queries in Euclidean space and road networks. Section 3 defines the terminology and describes the problem. Section 4 elaborates on the proposed query processing technique for TkSK queries in directed road networks. Section 5 presents our safe-exit-based technique for processing moving TkSK queries, and Section 6 describes how query results and safe regions are monitored in dynamic road networks. Section 7 presents a performance analysis of the proposed technique. Section 8 concludes this paper.

Related Work
In this section, we discuss related studies of top-k spatial keyword queries. The related work is divided into two subsections: Section 2.1 reviews snapshot TkSK queries, and Section 2.2 presents studies that address moving TkSK queries.

Snapshot Top-k Spatial Keyword Queries.
In recent years, spatial keyword queries have drawn the attention of many researchers, and several approaches have been proposed for ranking spatial data objects. Initially, Zhou et al. [7] worked on combining inverted indexes [8] and R-trees [9]. They proposed three different hybrid indexing structures and demonstrated that building an inverted index on top of an R-tree provides superior performance. Hariharan et al. [10] proposed the KR*-tree indexing structure, which captures the joint distribution of keywords in space. De Felipe et al. [11] proposed a data structure that combines an R-tree with text signatures: each node of the R-tree uses a signature to indicate the presence of keywords in its subtree. However, both of these approaches address only Boolean keyword queries in Euclidean space.
Top-k spatial keyword queries, where data objects are ranked according to their combined textual and spatial relevance to the query keywords, were first studied by Cong et al. [5] and Li et al. [6]. Both studies [5, 6] integrate location indexing and text indexing into IR-trees. However, they process top-k spatial keyword queries only in Euclidean space and are not suitable for road networks, where the distance between objects is determined by the shortest path connecting them. Later, Rocha et al. [12] proposed the S2I indexing technique, which maps each term in the vocabulary into either a separate block or an aR-tree for efficient processing of top-k spatial keyword queries. Zhang et al. [13] proposed the m-closest keywords query, which returns the spatially closest set of objects that together match m query keywords.
Top-k spatial keyword queries in road networks were introduced by Rocha et al. [14].In particular, they proposed three different indexing techniques (Basic Indexing, Enhanced Indexing, and Overlay Indexing) for processing spatial keyword queries in road networks.

Moving Top-k Spatial Keyword Queries.
Recently, research focus has shifted to the continuous processing of spatial queries in which query or data objects move arbitrarily in road networks, which is the most realistic scenario. Considerable research effort has been devoted to moving range, k nearest neighbor (kNN), and reverse k nearest neighbor (RkNN) queries [15][16][17][18]. However, efficient algorithms for moving top-k spatial keyword queries are lacking. Initially, Wu et al. [19] and Huang et al. [20] [22] proposed TPR-tree-based indexing to monitor moving top-k spatial keyword queries. In contrast to [21,22], in this study we consider moving top-k spatial keyword queries in directed and dynamic road networks, where each road segment has a particular orientation and its weight changes according to traffic conditions. Table 1 compares our problem scenario with related work in terms of query type, space domain, and orientation of road networks.

Preliminaries
Section 3.1 defines the terms and notations used in this paper. Section 3.2 formulates the problem using an example that illustrates the general results of top-k spatial keyword queries.

Road Network.
A road network is represented by a weighted directed graph $G = (N, E, W)$, where N, E, and W denote the node set, edge set, and edge distance matrix, respectively. The network distance of an edge changes depending on the traffic conditions. Each edge is also assigned an orientation that is either undirected or directed. An undirected edge is represented by $e = (n_s, n_e)$, where $n_s$ and $n_e$ are the boundary nodes of the edge, whereas a directed edge is represented by $e = \overrightarrow{(n_s, n_e)}$ or $e = \overleftarrow{(n_s, n_e)}$, where the arrow above the edge indicates its direction. We refer to $n_s$ as the starting node and $n_e$ as the ending node of an edge. For example, in Figure 1, $n_6$ is the starting node of edge $\overrightarrow{(n_6, n_2)}$, whereas it is the ending node of edge $\overleftarrow{(n_6, n_5)}$. The edge on which a query object is located is called the active edge. Note that the distance between two points $p_1$ and $p_2$ is not symmetric in directed road networks (i.e., $dist(p_1, p_2) \neq dist(p_2, p_1)$). For example, in Figure 1, $dist(d_3, d_4) = 3$, whereas $dist(d_4, d_3) = 11$ because the shortest path from $d_4$ to $d_3$ is $(d_4 \rightarrow n_6 \rightarrow n_2 \rightarrow n_3 \rightarrow d_3)$.
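The asymmetry of network distances can be illustrated with a minimal Python sketch that stores a directed graph as an adjacency list and runs Dijkstra's algorithm from each endpoint. The function name `dijkstra` and the toy network (nodes `a`, `b`, `c` and their weights) are hypothetical; they only mimic the 3-versus-11 situation described above and are not the actual weights of Figure 1.

```python
import heapq


def dijkstra(adj, source):
    """Shortest network distances from source in a directed graph.

    adj maps each node to a list of (neighbor, weight) pairs; a
    bidirectional edge is stored once in each direction.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical one-way network: a -> b costs 3, but the way back
# from b to a must detour through c.
adj = {"a": [("b", 3)], "b": [("c", 4)], "c": [("a", 7)]}
print(dijkstra(adj, "a")["b"])  # 3
print(dijkstra(adj, "b")["a"])  # 11
```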

Segment.
Segment $s = (p_1, p_2)$ is the part of an edge between two points $p_1$ and $p_2$ on the edge. An edge consists of one or more segments, and an edge itself is also considered a segment whose end points are its nodes. The weight of segment $s = (p_1, p_2)$ is denoted by $w(s)$.

Problem Formulation.
Similar to previous studies [5,14,23], we assume that each data object $d \in D$ has a point location $d.l$ in the road network and a text description $d.t$. Given a query location $q.l$, a set of keywords $q.t$, and the number k of data objects to return, the top-k spatial keyword query is defined as $q = (q.l, q.t, k)$; it returns the best k data objects from D according to a score that considers both spatial proximity and textual relevance. The score $\tau(d)$ of a data object d is defined as follows:

$$\tau(d) = \frac{TR(d.t, q.t)}{\left(1 + SR(d.l, q.l)\right)^{\alpha}} \qquad (1)$$

where $SR(d.l, q.l)$ is the spatial relevance between $d.l$ and $q.l$, $TR(d.t, q.t)$ is the textual relevance between $d.t$ and $q.t$, and $\alpha$ is a non-negative real number that determines the relative importance of the two measures. For example, if only textual relevance is considered, then $\alpha = 0$; if more importance is given to spatial relevance, then $\alpha > 1$.
Spatial relevance SR is defined as the shortest network distance from q to data object d: $SR(d.l, q.l) = dist(q.l, d.l)$. Thus, $SR(d_i.l, q.l) < SR(d_j.l, q.l)$ indicates that data object $d_i$ is more spatially relevant to q than data object $d_j$. The textual relevance TR can be computed using any popular information retrieval model, such as cosine similarity or the language model. In this study, we use the cosine similarity between $d.t$ and $q.t$, defined as follows:

$$TR(d.t, q.t) = \frac{\sum_{t \in q.t} w_{d.t}(t) \cdot w_{q.t}(t)}{\sqrt{\sum_{t \in d.t} \left(w_{d.t}(t)\right)^2} \cdot \sqrt{\sum_{t \in q.t} \left(w_{q.t}(t)\right)^2}} \qquad (2)$$

The weight $w_{d.t}(t) = 1 + \ln(f_{d.t}(t))$, where $f_{d.t}(t)$ is the frequency of term t in $d.t$. The weight $w_{q.t}(t) = \ln(1 + |D|/df_t)$, where $|D|$ is the number of objects in D and $df_t$ is the document frequency of term t. A higher TR indicates a higher textual relevance to the query keywords. We use a variation of cosine similarity based on the significance factor $\lambda_n(t)$ of term t in a document n, where n is either the description $d.t$ of a data object or the query keywords $q.t$. The significance $\lambda_n(t) = w_n(t) / \sqrt{\sum_{t' \in n} (w_n(t'))^2}$ is the weight of the term normalized by the length of the document [24,25]. Hence, the textual relevance can be rewritten as

$$TR(d.t, q.t) = \sum_{t \in q.t} \lambda_{d.t}(t) \cdot \lambda_{q.t}(t) \qquad (3)$$
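As an illustration, the scoring model can be sketched in Python. This is a minimal sketch assuming that the score has the form $\tau(d) = TR/(1 + SR)^{\alpha}$, which is consistent with the worked example in Section 4 ($\tau(d_4) = 0.5/8$); all function names are hypothetical helpers and not part of COSK.

```python
import math


def doc_weights(term_freqs):
    # w_{d.t}(t) = 1 + ln(f_{d.t}(t)) for every term that occurs in d.t
    return {t: 1 + math.log(f) for t, f in term_freqs.items() if f > 0}


def query_weights(query_terms, num_objects, doc_freq):
    # w_{q.t}(t) = ln(1 + |D| / df_t)
    return {t: math.log(1 + num_objects / doc_freq[t]) for t in query_terms}


def significance(weights):
    # lambda_n(t) = w_n(t) / sqrt(sum of squared weights in n)
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else {}


def textual_relevance(doc_sig, query_sig):
    # Equation (3): sum over query terms of lambda_{d.t}(t) * lambda_{q.t}(t)
    return sum(doc_sig.get(t, 0.0) * s for t, s in query_sig.items())


def score(tr, network_dist, alpha=1.0):
    # tau(d) = TR / (1 + SR)^alpha, where SR = dist(q.l, d.l)
    return tr / (1 + network_dist) ** alpha


d_sig = significance(doc_weights({"italian": 1, "restaurant": 1}))
q_sig = significance(query_weights(["italian", "restaurant"], 10, {"italian": 2, "restaurant": 4}))
print(round(textual_relevance(d_sig, q_sig), 3))
```

For instance, `score(0.5, 7)` yields 0.0625, the value quoted (rounded to 0.06) for $d_4$; with `alpha=0` the distance is ignored and the score equals TR.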

Query Processing System
In this section, we present the proposed query processing system, which indexes the data objects and prunes irrelevant edges for efficient query processing. In Section 4.1, we discuss the indexing framework, and in Section 4.2, we present an efficient keyword query processing algorithm for snapshot queries.
4.1. Indexing Framework.
In this study, our main work focuses on moving queries in directed and dynamic road networks. We use a method similar to the enhanced technique presented in [12] as our basic framework for processing snapshot queries in directed and dynamic road networks. The indexing framework combines a road network framework [1] for storing spatial information with an inverted file for indexing data objects. To traverse the network easily, we store the adjacent nodes of each node together with the node id ($n_{id}$), the edge id ($e_{id}$), the direction of the edge, and the weight of the edge. The indexing framework consists of two main components, a pruning component and an inverted file component, as illustrated in Figure 2. The pruning component first prunes edges that contain only data objects irrelevant to the query keywords. To achieve this, we introduce the highest significance $\lambda^+_t$ of a given term t among the descriptions of the objects lying on an edge. The value $\lambda^+_t$ of an edge is retrieved by a key composed of the edge id and the term id, $(e_{id}, t_{id})$, and it represents an upper bound on the significance of term t for any object lying on that edge. The inverted list of term t on an edge is accessed only if the upper-bound score, composed of $\lambda^+_t$ and the minimum network distance between q and the starting node of the edge, indicates that the edge may contain a candidate data object. Naturally, edges whose upper-bound scores are smaller than the score of the k-th object found so far are pruned.
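The pruning component can be sketched as follows, assuming that each object's term significances are already computed and that the score has the form $TR/(1 + dist)^{\alpha}$. The dictionary keyed by `(edge_id, term)` plays the role of the $(e_{id}, t_{id})$ key; the function names are hypothetical.

```python
from collections import defaultdict


def build_max_significance(objects):
    """objects: iterable of (edge_id, {term: significance}) pairs.

    Returns {(edge_id, term): lambda_plus}, the highest significance of each
    term among the objects lying on each edge.
    """
    best = defaultdict(float)
    for edge_id, sig in objects:
        for term, s in sig.items():
            best[(edge_id, term)] = max(best[(edge_id, term)], s)
    return dict(best)


def edge_upper_bound(max_sig, edge_id, query_sig, dist_to_start, alpha=1.0):
    # Optimistic textual relevance: each query term takes its best significance
    # on the edge; optimistic distance: the object sits at the starting node.
    tr_ub = sum(max_sig.get((edge_id, t), 0.0) * s for t, s in query_sig.items())
    return tr_ub / (1 + dist_to_start) ** alpha


def can_prune(max_sig, edge_id, query_sig, dist_to_start, tau_k, alpha=1.0):
    # An edge is skipped when even its upper bound cannot beat the k-th score.
    return edge_upper_bound(max_sig, edge_id, query_sig, dist_to_start, alpha) <= tau_k
```

The bound is safe for a directed edge because every object on it is reached through the starting node, so its distance from q is at least the distance to that node, and its textual relevance cannot exceed the sum computed with $\lambda^+_t$.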
We implement an inverted file for indexing data objects. The inverted file contains a vocabulary and inverted lists. The vocabulary keeps general information about each term (such as its frequency), which is used to compute the textual relevance of data objects. An inverted list stores the data objects located on edge $\overrightarrow{(n_s, n_e)}$ that have term t in their description and is identified by the key $(e_{id}, t_{id})$. Each inverted file is a set of inverted lists, with a separate inverted list for each term in the object descriptions. For each data object $d_i$, an inverted list stores two attributes: first, the distance $dist(n_s, d_i)$ between the starting node and the data object; second, the significance factor $\lambda_{d_i.t}(t)$ of term t in the description of the data object. Note that the network distance between two points in a directed road network is not symmetric (i.e., $dist(n_s, d_i) \neq dist(d_i, n_s)$). Recall that the starting node is chosen according to the orientation of the edge such that the edge is directed from the node toward the data object. In Figure 1, $n_3$ is the starting node for $d_7$. For bidirectional edges, either of the adjacent nodes can act as the starting node.
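A minimal in-memory version of the per-edge inverted file might look like the following Python sketch. The class name `EdgeInvertedFile`, its methods, and the object id stored in each posting are assumptions for illustration; the score again assumes the form $TR/(1 + dist)^{\alpha}$, with the object's distance taken as $dist(q, n_s) + dist(n_s, d)$.

```python
from collections import defaultdict


class EdgeInvertedFile:
    """Inverted lists keyed by (edge_id, term), plus a term vocabulary.

    Each posting holds (object id, distance from the edge's starting node,
    significance of the term in the object's description).
    """

    def __init__(self):
        self.lists = defaultdict(list)
        self.vocabulary = defaultdict(int)  # term -> number of objects containing it

    def add_object(self, obj_id, edge_id, dist_from_start, significance):
        for term, s in significance.items():
            self.lists[(edge_id, term)].append((obj_id, dist_from_start, s))
            self.vocabulary[term] += 1

    def postings(self, edge_id, term):
        return self.lists.get((edge_id, term), [])

    def candidates(self, edge_id, query_sig, dist_q_to_start, tau_k, alpha=1.0):
        # Accumulate TR per object from the lists of the query terms only.
        tr, offset = defaultdict(float), {}
        for term, q_s in query_sig.items():
            for obj_id, d, s in self.postings(edge_id, term):
                tr[obj_id] += s * q_s
                offset[obj_id] = d
        result = []
        for obj_id, value in tr.items():
            sc = value / (1 + dist_q_to_start + offset[obj_id]) ** alpha
            if sc > tau_k:
                result.append((sc, obj_id))
        return sorted(result, reverse=True)
```

Only the lists whose keys combine the edge with a query term are touched, which is what makes the $(e_{id}, t_{id})$ key convenient.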
The proposed indexing scheme has three main advantages. First, searching for objects relevant to the query keywords is efficient using the $(e_{id}, t_{id})$ pair. Second, the inverted file also stores the network distance between the starting node and each data object, which facilitates locating data objects in the directed road network. Finally, the pruning technique speeds up query processing by exploring fewer edges.
Algorithm 1 returns the top-k data objects with the highest scores according to their joint textual and spatial relevance to the query. The algorithm begins by exploring the active edge on which query object q is located and expands the network in increasing order of distance from q. Each entry in the min-heap is associated with an anchor point $p_a$ on an edge. For the active edge, q is the anchor point; otherwise, for directed edges, the ending node $n_e$ becomes the anchor point, and for bidirectional edges, either of the adjacent boundary nodes ($n_s$ or $n_e$) can become the anchor point. Let $R_k$ be the current set of top-k data objects and $\tau_k$ be the score of the k-th data object in $R_k$. A search function, applied to an edge with threshold $\tau_k$, retrieves the candidate data objects C on that edge whose scores $\tau(d)$ are better than $\tau_k$. Next, $R_k$ is updated with the data objects in C, and $\tau_k$ is updated accordingly. The algorithm continues its expansion, inserting the adjacent edges of each boundary node, until the heap is exhausted or the remaining data objects cannot achieve a score better than $\tau_k$. The upper-bound score $\tau(n)$ of node n is computed using $dist(q, n)$ and the maximum textual relevance ($TR = 1$). Therefore, if $\tau(n) \le \tau_k$, even an unexplored data object d that matches all query keywords cannot score better than the k-th object in $R_k$, because $dist(q, d.l) \ge dist(q, n)$. This holds because the algorithm always expands the node with the minimum distance to the query location.
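To do so, the search function computes an upper-bound score for each edge from the highest significance $\lambda^+_t$ of each term $t \in q.t$ and the shortest distance $dist(q.l, n_s)$ between the query location and the edge. In the next step, the inverted lists of term t are fetched only if their upper-bound score is greater than $\tau_k$. From these inverted lists, the objects with scores $\tau(d)$ greater than $\tau_k$ are returned.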
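The expansion loop of Algorithm 1 can be sketched in Python under simplifying assumptions: every edge is stored in its travel direction (a bidirectional edge as two directed edges), each object's textual relevance is precomputed and at most 1, and the score is $TR/(1 + dist)^{\alpha}$. The per-edge pruning with $\lambda^+_t$ is omitted for brevity, so only the node-level termination test is shown. All names are hypothetical.

```python
import heapq


def snapshot_topk(adj, edge_objects, q_edge, q_offset, k, alpha=1.0):
    """Incremental network expansion for a snapshot TkSK query (sketch).

    adj: node -> list of (next_node, edge_id, weight) for outgoing edges
    edge_objects: edge_id -> list of (obj_id, offset_from_start_node, tr)
    q_edge: (u, v, edge_id) of the active edge; q lies q_offset after u
    """
    best = {}  # obj_id -> best (largest) score seen so far

    def consider(obj_id, dist, tr):
        s = tr / (1 + dist) ** alpha
        if s > best.get(obj_id, 0.0):
            best[obj_id] = s

    def tau_k():
        top = sorted(best.values(), reverse=True)
        return top[k - 1] if len(top) >= k else 0.0

    u, v, eid = q_edge
    edge_len = next(w for _, e, w in adj[u] if e == eid)
    # Active edge: q is the anchor point, so only objects ahead of q are reached directly.
    for obj_id, off, tr in edge_objects.get(eid, []):
        if off >= q_offset:
            consider(obj_id, off - q_offset, tr)

    heap, settled = [(edge_len - q_offset, v)], set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        # Termination: with TR <= 1, nothing at distance >= d can beat tau_k.
        if 1.0 / (1 + d) ** alpha <= tau_k():
            break
        for nxt, e, w in adj.get(node, []):
            for obj_id, off, tr in edge_objects.get(e, []):
                consider(obj_id, d + off, tr)
            heapq.heappush(heap, (d + w, nxt))

    return sorted(best.items(), key=lambda item: -item[1])[:k]
```

Because nodes are settled in increasing distance order, the first time an edge is scanned from its starting node gives the shortest distance to every object on it.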
To illustrate the proposed algorithm, consider the road network presented in Figure 1. Assume that q issues a top-1 keyword query with q.t = "Italian Restaurant." For ease of presentation, we assume $\alpha = 1$ and define the textual relevance TR as the number of occurrences of query keywords in $d.t$ divided by the number of keywords in the document (the description of the data object). For example, $\tau(d_4) = TR(d_4.t, q.t)/(1 + SR(d_4.l, q.l)) = 0.5/8 \approx 0.06$. The algorithm starts the network expansion from the active edge $\overrightarrow{(n_2, n_3)}$, where q is the anchor point. Because edge $\overrightarrow{(n_2, n_3)}$ is directed from $n_2$ to $n_3$, the algorithm explores only $\overrightarrow{(q, n_3)}$, on which no data object is found. Then, $n_3$ becomes the anchor point, and edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ are inserted into the min-heap. Next, the search function retrieves the candidate data objects on edges $(n_3, n_4)$, $(n_3, n_5)$, and $(n_3, n_7)$ whose scores are better than $\tau_k$. On edge $(n_3, n_5)$, data object $d_3$ is retrieved with $\tau(d_3) = 0.2$; $d_3$ is inserted into $R_k$, and $\tau_k$ is set to 0.2. For edges $(n_3, n_4)$ and $(n_3, n_7)$, no candidate object is found because $d_2.t$ ("Cafe") and $d_7.t$ ("Cafe and Bakery") do not match q.t. The algorithm continues expanding edges whose upper-bound scores are greater than $\tau_k$. Edge $\overrightarrow{(n_7, n_2)}$ is explored next; its upper-bound score is 1/7, which is less than $\tau_k$. Similarly, for edge $\overleftarrow{(n_6, n_5)}$, the upper-bound score is $0.5/8 < \tau_k$. Therefore, the algorithm terminates and reports $d_3$ as the top-1 result.
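The termination test in this example reduces to a few comparisons, which can be checked directly using only the values quoted above:

```python
tau_k = 0.2  # score of d_3, the current top-1 result
upper_bounds = {
    "(n7, n2)": 1 / 7,    # best possible score on this edge
    "(n6, n5)": 0.5 / 8,  # bound quoted for this edge
}
can_terminate = all(ub < tau_k for ub in upper_bounds.values())
print(can_terminate)  # True
```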

Moving Top-k Spatial Keyword Queries
In this section, we present our method for monitoring moving top-k spatial keyword queries in which query objects move in a directed road network. Figure 3 provides an example of a TkSK query in a road network, where query point q issues a TkSK query at point $p_1$; the numbers on the arrows in the figure indicate the order of the steps. To obtain the top-k results at $p_1$, the server executes Algorithm 1 as described in Section 4.2. Now, suppose that the query object moves to $p_2$, as shown in Figure 4, and requires the top-k results at $p_2$. A simple method is to repeat the procedure executed at $p_1$. However, recomputing the results whenever q changes its location significantly increases the computation cost. It also increases the communication overhead because the query object must report its location whenever it moves, and the server must send back the result set. To address these issues, we introduce the safe exit approach.
In the proposed framework, the server computes safe exit points for each query object and maintains the set of moving queries. The query result remains valid as long as the query object stays within the safe region bounded by its safe exit points. Whenever a query object passes one of its safe exit points, the server recomputes the TkSK results and the safe exit points for that query object.
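On the client side, the safe-exit approach means that a query object contacts the server only after leaving its safe region. The following Python sketch assumes, hypothetically, that the safe region is encoded as offset intervals per edge and that the server is any callable returning a new safe region; neither encoding is prescribed by COSK.

```python
class SafeRegionClient:
    """Client-side sketch of the safe-exit approach (hypothetical design).

    safe_region maps edge_id -> list of (start, end) offset intervals on
    that edge; server is any callable (edge_id, offset) -> new safe_region.
    """

    def __init__(self, safe_region):
        self.safe_region = safe_region
        self.reports = 0  # number of location reports sent to the server

    def inside(self, edge_id, offset):
        return any(lo <= offset <= hi for lo, hi in self.safe_region.get(edge_id, []))

    def move_to(self, edge_id, offset, server):
        # Stay silent inside the safe region; report only after leaving it.
        if not self.inside(edge_id, offset):
            self.reports += 1
            self.safe_region = server(edge_id, offset)
```

Counting `reports` over a trajectory makes the communication saving visible: movements inside the safe region cost no messages at all.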
Next, we present our method for computing the safe exit points of a query object. A safe exit point is a point on a segment where the safe region and the nonsafe region meet. We compute safe exit points using a divide-and-conquer technique. Before presenting the detailed methodology, we define the terminology used in this section.
Definition 1 (safe region). A portion of the road network such that, as long as the query point lies within it, the top-k results of the query are guaranteed to remain valid.
Definition 2 (answer objects $D^+$). For a top-1 query, a data object d is called an answer object of query q if $\tau(d) > \tau(d_j)$, where $d_j$ is any other data object in the directed road network. This definition generalizes to TkSK queries: a data object d is an answer object of q if $\tau(d) > \tau(d_{k+1})$, where $d_{k+1}$ is the $(k+1)$-th ranked data object in the directed road network. In other words, the answer objects are exactly the top-k results of q.
Definition 3 (nonanswer objects $D^-$). For a top-1 query, a data object d is called a nonanswer object of query q if $\tau(d) < \tau(d_j)$ for some other data object $d_j$ in the directed road network. For TkSK queries, a data object d is a nonanswer object of q if $\tau(d) < \tau(d_k)$, where $d_k$ is the k-th ranked data object in the directed road network. Therefore, none of the nonanswer objects are in the top-k results of q.
As discussed earlier, the main challenge in the continuous processing of moving TkSK queries is to maintain the validity of the result set, because the movement of query objects can invalidate it. To monitor the validity of the result set, we propose a safe-region-based approach.
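The two definitions amount to splitting the ranked objects at position k. A minimal Python sketch, with a hypothetical function name and an assumed tie-breaking rule (ties are not addressed by the definitions), is:

```python
def split_answers(scores, k):
    """Split objects into answer objects D+ (top-k) and nonanswer objects D-.

    scores maps obj_id -> tau(d); ties are broken by object id (an assumption).
    Returns (answers, nonanswers, lowest answer d+_l, highest nonanswer d-_h).
    """
    ranked = sorted(scores, key=lambda o: (-scores[o], o))
    answers, nonanswers = ranked[:k], ranked[k:]
    lowest_answer = answers[-1] if answers else None
    highest_nonanswer = nonanswers[0] if nonanswers else None
    return answers, nonanswers, lowest_answer, highest_nonanswer
```

The lowest answer object and the highest nonanswer object returned here are the two objects whose scores are compared when safe exit points are computed.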

Computation of Safe Exit Points.
In this section, we present our technique for computing the safe exit points. The main goal is to find the points in the road network where the query result set changes. The result set changes when the score of the highest nonanswer object $d^-_h$ surpasses the score of the lowest answer object $d^+_l$. The textual relevance of a data object does not change as the query moves; therefore, the scores of data objects change only through their spatial relevance, which changes only when the query object moves. The computation of the safe exit points is based on two key observations.
Observation 1. If $D^+_{p_a} = D^+_{n_e}$, there is no safe exit point in the segment.
Explanation. $D^+_{p_a}$ is the set of answer objects at anchor point $p_a$, whereas $D^+_{n_e}$ is the set of answer objects at boundary node $n_e$. As discussed earlier, a safe exit point is a point where the query results change. If the query results at the starting and ending points of a segment/edge are the same, there is no point on it where the query result changes. Hence, we do not search for a safe exit point in that segment.

Observation 2. If $D^+_{p_a} \neq D^+_{n_e}$, there is a safe exit point in the segment.
Explanation. In contrast to Observation 1, if the query results at the starting and ending points differ, then there exists a point where the query results change. Hence, there is a safe exit point in the segment.
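Observations 1 and 2 translate into a simple filter that decides which segments need a search for safe exit points. In this hypothetical Python sketch, answer sets are compared as sets, since the definitions concern membership in the top-k rather than order.

```python
def has_safe_exit_point(answers_at_anchor, answers_at_end):
    # Observation 1: equal answer sets -> no safe exit point on the segment.
    # Observation 2: different answer sets -> a safe exit point exists.
    return set(answers_at_anchor) != set(answers_at_end)


def segments_to_search(segments):
    """segments: list of (segment_id, D+ at anchor point, D+ at ending node)."""
    return [sid for sid, start, end in segments if has_safe_exit_point(start, end)]
```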
To find the safe region, we consider the following cases. Case 1 (when $\alpha = 1$ and the highest nonanswer object and the lowest answer object have the same textual relevance). In this case, textual and spatial relevance have the same importance (i.e., $\alpha = 1$). In addition, the top-k result depends only on spatial relevance because the textual relevance of both objects is the same, so the data object that is closer to query point q becomes the answer object. For an undirected edge, the safe exit point $p_s$ is the center point, i.e., the point at which the network distances to the two objects are equal.
Case 2 (when $\alpha \neq 1$ and the highest nonanswer object and the lowest answer object have different textual relevance). In this case, the top-k result depends on $\alpha$ as well as on spatial and textual relevance. Consequently, for undirected edges, the midpoint between the lowest answer object and the highest nonanswer object no longer gives a valid safe exit point, so we introduce a divide-and-conquer technique that repeatedly divides the search space until it finds the point where the score of the nonanswer object becomes greater than that of the answer object. Typically, the safe exit point is closer to the data object with the lower textual relevance. Based on this observation, we first compute the midpoint as in Case 1 and then continue dividing the search space until we find the point.
The technique of Case 2 also applies in other situations where the safe exit point is not the midpoint between the lowest answer object and the highest nonanswer object. In these situations, the safe exit point depends on two or more factors, and it can be computed using the aforementioned divide-and-conquer technique. The safe exit point can be computed using Case 2 in the following scenarios:
(a) When $\alpha = 1$ and the nearest nonanswer object and the farthest answer object have different textual relevance.
(b) When $\alpha \neq 1$ and the nearest nonanswer object and the farthest answer object have the same textual relevance.
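The divide-and-conquer step can be sketched as a bisection over a segment parameterized by an offset x, assuming that the lowest answer object wins at x = 0, the highest nonanswer object wins at the far end, and their scores cross only once. The score form $TR/(1 + dist)^{\alpha}$ and all names are assumptions for illustration.

```python
def safe_exit_point(tau_answer, tau_nonanswer, length, tol=1e-9):
    """Bisection sketch of the divide-and-conquer search on [0, length].

    tau_answer(x), tau_nonanswer(x): scores of the lowest answer object and
    the highest nonanswer object when q is at offset x. Assumes the answer
    object wins at x = 0 and the scores cross at most once on the segment.
    """
    lo, hi = 0.0, length
    if tau_nonanswer(hi) < tau_answer(hi):
        return None  # the result does not change on this segment
    while hi - lo > tol:
        mid = (lo + hi) / 2  # halve the remaining search interval
        if tau_nonanswer(mid) >= tau_answer(mid):
            hi = mid  # the crossing lies at or before mid
        else:
            lo = mid
    return hi


# Case 1 geometry of the running example: distances x + 3 and -x + 5, equal TR.
p1 = safe_exit_point(lambda x: 1 / (1 + x + 3), lambda x: 1 / (1 + 5 - x), 3)
# Case 2: TR 0.5 for the answer object at x = 0, TR 1.0 for the nonanswer at x = 10.
p2 = safe_exit_point(lambda x: 0.5 / (1 + x), lambda x: 1.0 / (1 + 10 - x), 10)
```

Here `p1` converges to 1, the equal-distance point, while `p2` converges to 3, which lies closer to the object with the lower textual relevance, as stated in Case 2.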
Case 3 (when $\alpha = 0$). In this case, spatial relevance has no effect on the scores of data objects, so no monitoring is required. Similarly, for Case 2, if $TR(d^-_h.t, q.t) = TR(d^+_l.t, q.t)$, the safe exit point is computed by repeatedly halving the search space until we find the closest point such that $\tau(d^-_h) > \tau(d^+_l)$. If $TR(d^-_h.t, q.t) \neq TR(d^+_l.t, q.t)$, the safe exit point is computed in the same way as in Case 2.

Computation of Safe Exit Points: An Example.
Consider the same example in Figure 1, where query point q issues a top-1 keyword query with q.t = "Italian restaurant," and let $\alpha = 1$. The monitoring algorithm starts exploring from the active edge containing the query object q; therefore, $\overrightarrow{(q, n_3)}$ is explored first. As shown in … Therefore, according to Case 1, the safe exit point $p_1$ is the midpoint between $d_3$ and $d_6$, that is, $dist(p_1, d_3) = dist(p_1, d_6)$, where $dist(p_1, d_3) = x + 3$ and $dist(p_1, d_6) = -x + 5$ for $0 < x < 3$. Solving $x + 3 = -x + 5$ gives $x = 1$, which determines the position of $p_1$.
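Assuming the score form $TR/(1 + dist)^{\alpha}$ with $\alpha = 1$ and equal textual relevance for $d_3$ and $d_6$ (Case 1), the crossing condition reduces to equal distances:

```latex
\frac{TR}{1 + (x + 3)} = \frac{TR}{1 + (-x + 5)}
\;\Longrightarrow\; x + 3 = -x + 5
\;\Longrightarrow\; x = 1 \in (0, 3)
```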
Next, we determine the safe exit point in $(n_3, n_5)$. As shown in Table 3, the answer object at $n_5$ is also $d_3$, the same as at $n_3$; therefore, by Observation 1, $(n_3, n_5)$ contains no safe exit point.
The bold lines in Figure 5 indicate the safe region of q. The top-1 result remains $d_3$ as long as q lies within the safe region.
Next, we analyze the time complexity of determining a set of safe exit points using the set of qualifying objects $d \in D^+_{n_s} \cup D^+_{n_e} \cup D(n_s, n_e)$, where $D^+_{n_s}$ ($D^+_{n_e}$) denotes the set of k data objects that satisfy the query condition at $n_s$ ($n_e$), and $D(n_s, n_e)$ denotes the data objects on segment $(n_s, n_e)$. Using Dijkstra's algorithm [26], the time complexity of computing the set of answer objects at a query point q is $T(D^+_q) = O(|E| + |N| \log |N|)$. Hence, $T(D^+_{n_s}) = T(D^+_{n_e}) = O(|E| + |N| \log |N|)$ holds for the endpoints $n_s$ and $n_e$. The time complexity of determining the skyline $\Omega_h$ with the k-th highest score is $T(\Omega_h) = n_h \cdot O(|D^+_{n_s} \cup D^+_{n_e} \cup D(n_s, n_e)|)$, where $n_h$ is the number of qualifying objects that participate in the construction of that skyline. Therefore, the time complexity of determining a safe exit point coincides with that of determining two skylines, namely, the skyline $\Omega^+_l$ with the k-th highest (i.e., lowest among the answer objects) score and the skyline $\Omega^-_h$ with the highest score among the nonanswer objects, because the safe exit point lies at the crossing point of these two skylines.
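The two skylines can be evaluated pointwise: $\Omega^+_l$ as the k-th highest score among the answer objects and $\Omega^-_h$ as the highest score among the nonanswer objects, so each evaluation touches every qualifying object once. The Python sketch below assumes that distance functions of the offset x are given and that the skylines cross at most once, and it bisects on their difference. Names and inputs are hypothetical.

```python
def answer_skyline(x, answer_objs, k, alpha=1.0):
    # k-th highest score among the answer objects when q is at offset x.
    # answer_objs: list of (tr, dist_fn), where dist_fn(x) is the network distance.
    scores = sorted((tr / (1 + dist(x)) ** alpha for tr, dist in answer_objs), reverse=True)
    return scores[k - 1]


def nonanswer_skyline(x, nonanswer_objs, alpha=1.0):
    # Highest score among the nonanswer objects when q is at offset x.
    return max(tr / (1 + dist(x)) ** alpha for tr, dist in nonanswer_objs)


def skyline_crossing(answer_objs, nonanswer_objs, k, length, steps=60, alpha=1.0):
    """Offset where the nonanswer skyline reaches the answer skyline (single crossing assumed)."""
    def gap(x):
        return nonanswer_skyline(x, nonanswer_objs, alpha) - answer_skyline(x, answer_objs, k, alpha)

    lo, hi = 0.0, length
    if gap(hi) < 0:
        return None  # the top-k set is unchanged over the whole segment
    for _ in range(steps):
        mid = (lo + hi) / 2
        if gap(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi
```

For k = 1 this reduces to the pairwise bisection between the lowest answer object and the highest nonanswer object.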
Proof. We prove the correctness of the COSK algorithm by contradiction. Assume that $D^+_{n_s} \neq D^+_{n_e}$ but there is no safe exit point in road segment $(n_s, n_e)$. Then, for every point p on segment $(n_s, n_e)$, including its endpoints, the query result at p equals $D^+_{n_s}$, i.e., $D^+_p = D^+_{n_s}$. Taking $p = n_e$ yields $D^+_{n_e} = D^+_{n_s}$, which contradicts the assumption. Therefore, if $D^+_{n_s} \neq D^+_{n_e}$, a safe exit point exists in $(n_s, n_e)$. In addition, when $D^+_{n_s} \neq D^+_{n_e}$, a safe exit point is determined using the skyline $\Omega^+_l$ for answer objects and the skyline $\Omega^-_h$ with the highest score for nonanswer objects. The first skyline is a composite polyline drawn from the answer objects in $D^+_{n_s}$; the second is a composite polyline drawn from the nonanswer objects in $(D^+_{n_e} \cup D(n_s, n_e)) - D^+_{n_s}$.

Monitoring Query Results and Safe Regions in Dynamic Directed Road Networks
In this section, we discuss the monitoring of spatial keyword queries in dynamic road networks, where network distances change depending on traffic conditions. Updates to the weights of some edges may invalidate the query results or the safe region of q, even though q remains within its safe region. Figure 7 illustrates an example in which the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change. For convenience, we consider $\alpha = 1$ and q.t = "Italian restaurant." In Figure 7(a), the top-1 result is $d_1$, and the bold lines show the safe region of q. Now, suppose that at time $t_i$ the weights of edges $\overleftarrow{(n_1, n_2)}$ and $\overleftarrow{(n_1, n_6)}$ change due to heavy traffic, as shown in Figure 7(b). Such weight updates may invalidate the query result or the safe region of q; therefore, the validity of the results and the safe region must be monitored whenever changes occur.
Next, we introduce a monitoring region that allows the validity of the safe region to be checked efficiently when the weight of an edge changes. The monitoring region MR of q contains all points between q and its lowest-ranked answer object and between q and its highest-ranked nonanswer object. Formally, $MR = P(q, d^{+}_{l}) \cup P(q, d^{-}_{h})$, where $P(q, d^{+}_{l})$ is the set of points on the shortest path from q to the lowest-ranked answer object $d^{+}_{l}$, and $P(q, d^{-}_{h})$ is the set of points on the shortest path from q to the highest-ranked nonanswer object $d^{-}_{h}$. In the given example, $d^{+}_{l} = d_1$ and $d^{-}_{h} = \{d_2, d_3\}$. Therefore, the dotted lines in Figure 8(a) show the monitoring region of q. At time $t_j$, the update to edge $\overleftarrow{(n_1, n_6)}$, which is not part of the monitoring region, can safely be ignored.
However, the update on segment $\overrightarrow{(n_2, n_1)}$, which is associated with the monitoring region, may invalidate the results. As shown in Figure 8(b), after the update the top-1 result becomes $d_2$, and the bold lines represent the new safe region of q.
Algorithm 5 monitors the validity of the result set and the safe region of q when the weight of any edge changes. Suppose that the weight of edge $(n_i, n_j)$ changes at time t. First, the algorithm checks whether $(n_i, n_j)$ is associated with the monitoring region. If it is not, the algorithm simply ignores the update, and the query results and safe region remain valid. Otherwise, if the edge is associated with the monitoring region (i.e., $MR \cap (n_i, n_j) \neq \emptyset$), the algorithm reevaluates the query, and the top-k results and safe region of q are updated accordingly. Finally, the algorithm updates the monitoring region of q.
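A minimal sketch of this monitoring logic follows. It assumes, for illustration, that the monitoring region is kept at edge granularity as the set of directed edges on the shortest paths from q to the lowest-ranked answer object and to the highest-ranked nonanswer object, read from a Dijkstra predecessor map. Re-evaluation is supplied as a callback (`reevaluate`, a hypothetical stand-in for recomputing the top-k results and safe region) that returns the new monitoring region.

```python
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]   # directed edge (start node, end node)


def path_edges(pred: Dict[Node, Optional[Node]], target: Node) -> Set[Edge]:
    """Directed edges on the shortest path ending at `target`, read from a
    Dijkstra predecessor map in which the query's own entry maps to None."""
    edges: Set[Edge] = set()
    node = target
    while pred.get(node) is not None:
        edges.add((pred[node], node))
        node = pred[node]
    return edges


def monitoring_region(pred: Dict[Node, Optional[Node]],
                      lowest_answer_node: Node,
                      highest_nonanswer_node: Node) -> Set[Edge]:
    # Union of the two paths, approximated here at edge granularity.
    return path_edges(pred, lowest_answer_node) | path_edges(pred, highest_nonanswer_node)


class QueryMonitor:
    def __init__(self, mr: Set[Edge], reevaluate: Callable[[], Set[Edge]]):
        self.mr = mr
        self.reevaluate = reevaluate   # recomputes results and safe region, returns new MR
        self.recomputations = 0

    def on_edge_update(self, edge: Edge) -> bool:
        """Handle a weight change on `edge`; return True if it forced re-evaluation."""
        if edge not in self.mr:
            return False               # results and safe region remain valid
        self.mr = self.reevaluate()    # recompute, then refresh the monitoring region
        self.recomputations += 1
        return True
```

In this form, an update to an edge outside the region costs a single set lookup, which is what makes ignoring unaffected updates cheap.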

Performance Evaluation
In this section, we evaluate the performance of COSK through simulation experiments. We describe our experimental settings in Section 7.1 and present our experimental results for static and dynamic road networks in Sections 7.2 and 7.3, respectively.

7.1. Experimental Settings. All of our experiments were performed using real road networks, namely, Oldenburg, San Francisco, and San Joaquin. All three road networks were obtained from [27]. The original road network of San Francisco had 21,047 nodes and 21,692 edges. We reformatted the network, pruned approximately 30% of the nodes, and adjusted the edges and their weights accordingly, resulting in a network with 14,732 nodes and 14,316 edges. Both the directions of the edges and the locations of the data objects on the edges were generated randomly. The description of each data object was extracted from Twitter messages [28], with one tweet assigned per data object. Table 4 presents the characteristics of the data sets used in the experimental evaluation. We simulated moving query objects using a spatiotemporal data generator [29]. The input to the generator was the road network of the data set used, and the output was the set of query objects moving on that road network. Each experiment used 100 moving queries that were continuously monitored for 100 timestamps (1 timestamp = 1 second), and the average result is reported.
As a benchmark for COSK in static road networks, we implemented the CMTkSK+ algorithm [22], which also continuously monitors moving top-k spatial keyword queries in road networks. However, this algorithm was originally designed for undirected road networks. To make a fair comparison, we modified CMTkSK+ to process top-k spatial keyword queries in directed road networks and continue to refer to the modified version as CMTkSK+. Specifically, we modified the method for computing the distance between two points so that, in directed road networks, $\mathrm{dist}(p_1, p_2) \neq \mathrm{dist}(p_2, p_1)$ in general. Since CMTkSK+ does not handle top-k spatial keyword queries in dynamic road networks, in that setting we compared COSK with a basic algorithm that recomputes the results whenever the query object changes its location. All algorithms were implemented in Java and executed on a desktop PC with a 2.80-GHz Intel Core i5 processor with […]. Table 5 summarizes the parameters used in the experiments. In each experiment, we varied a single parameter within the range shown in Table 5 while keeping the other parameters at their default values (shown in bold). We evaluated the performance of the algorithms using the following measures: (1) the total server CPU time, which indicates the query processing time, and (2) the total communication cost, measured as the total number of points transferred between the clients and the server (i.e., the location updates sent by query objects, and the query results and safe exit points returned by the server). Battery power and wireless bandwidth consumption typically increase with the amount of data transferred between objects (clients) and the server. Thus, we used the amount of transferred data as the metric for communication cost.
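The communication-cost metric can be illustrated with a toy model. The sketch below assumes a query moving along a one-dimensional route, a hypothetical `safe_region_for` function that returns the interval bounded by two safe exit points, and a per-request cost of one location update, k result points, and two safe exit points; the periodic baseline sends one location update and receives k result points at every timestamp. These accounting rules are illustrative assumptions, not the exact bookkeeping used in the experiments.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[float, float]   # safe region bounded by two safe exit points


def periodic_cost(positions: List[float], k: int) -> int:
    # Baseline: one location update up and k result points down per timestamp.
    return len(positions) * (1 + k)


def safe_exit_cost(positions: List[float], k: int,
                   safe_region_for: Callable[[float], Interval]) -> int:
    cost = 0
    region: Optional[Interval] = None
    for p in positions:
        if region is None or not (region[0] <= p <= region[1]):
            # Left the safe region: report the location, receive k results
            # plus the two safe exit points bounding the new region.
            region = safe_region_for(p)
            cost += 1 + k + 2
    return cost
```

For a query that moves one unit per timestamp for 10 timestamps with k = 2 and safe regions of width 4, the periodic scheme transfers 30 points, whereas the safe-exit scheme contacts the server only three times and transfers 15.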

7.2. Experimental Results of Top-k Spatial Keyword Queries in Static Road Networks

7.2.1. Effect of k. Figure 9 shows the effect of the number of results k on the query processing time and communication cost of both algorithms. Figure 9(a) shows that the query processing time of both algorithms increases with k. This is expected because, as k increases, more data objects must be explored and verified. Nevertheless, COSK significantly outperforms CMTkSK+ for two main reasons. First, the search for relevant objects is very efficient when the highest significance factor is used; second, COSK does not need to verify the set of answer objects as long as the query object lies in its safe region. In contrast, the query processing time of CMTkSK+ increases significantly because it must periodically monitor and verify the set of candidate objects. In Figure 9(b), the communication costs of both algorithms increase with k. However, the proposed algorithm performs better than CMTkSK+ because no client-server communication is required while the query object lies within its safe exit points, whereas in CMTkSK+ the query object must report its location to the server whenever it moves.

7.2.2. Effect of $|D|$. This experiment was conducted on the San Joaquin dataset. Because this dataset includes only 19,098 data objects, we randomly generated approximately 30,000 additional data objects on different edges. In Figure 10, we evaluate the performance of COSK and CMTkSK+ by varying the cardinality of the data objects. Note that $|D| = 10$ corresponds to a low density of data points, while $|D| = 50$ corresponds to a high density. Interestingly, Figure 10(a) shows that the query processing times of both algorithms decrease as the cardinality of the data objects increases. For CMTkSK+, this is because the monitoring range of a query shrinks at high density. For COSK, it is mainly because fewer edges need to be expanded when the data density is high. In Figure 10(b), we study the influence of the cardinality of the data objects on the communication costs. CMTkSK+ incurs almost constant communication costs regardless of the data object cardinality, whereas the communication costs of COSK increase with $|D|$. This is expected because the safe region becomes smaller as the density of the data objects increases, so query objects leave it more often.

7.2.3. Effect of Query Keywords (n). Figure 11 shows the query processing time and communication cost of COSK and CMTkSK+ as a function of the number of query keywords. Figures 11(a) and 11(b) show that the performance of both algorithms degrades as the number of keywords increases. This is mainly because more query keywords may increase the number of relevant objects, resulting in higher query processing time and communication cost. However, the safe-region-based algorithm COSK scales better than CMTkSK+ because of its less expensive monitoring technique.
7.2.4. Effect of α. Figure 12 shows the impact of the query parameter α on the query processing time and communication cost. A small value of α gives more weight to textual relevance, whereas a large value of α gives more weight to spatial relevance. Interestingly, the query processing time is lower for higher values of α, that is, when spatial relevance is more important. This is mainly because fewer edges and objects must be explored and processed to determine the top-k data objects when spatial relevance dominates. Figure 12(b) shows that the number of messages sent by COSK decreases sharply as α increases.

7.2.5. Effect of Speed. Figure 13(a) shows the influence of the speed of the query objects on the query processing time of COSK and CMTkSK+. The performance of CMTkSK+ is not significantly influenced by speed because its candidate objects must be monitored at regular time intervals regardless of speed. In contrast, the performance of COSK gradually degrades as the speed increases because query objects leave their safe regions more frequently. Figure 13(b) shows the communication costs of COSK and CMTkSK+ with respect to the speed of the query objects. CMTkSK+ incurs almost constant communication costs because its server-initiated requests to verify candidate objects do not depend on speed. For COSK, query objects cross safe regions more frequently at high speeds, which increases the communication costs.

7.3. Experimental Results of Top-k Spatial Keyword Queries in Dynamic Road Networks

In this section, we evaluate the performance of COSK and the basic algorithm in dynamic road networks. The update rate denotes the percentage of all edges whose weights change at each timestamp. The length of an updated edge is randomly selected between 0.1 and 10 times its original length. Figure 17(a) depicts the query processing time of COSK and the basic algorithm. The query processing time of the basic algorithm is not significantly affected by the update rate, mainly because in the basic algorithm the query objects issue top-k spatial keyword queries at each timestamp anyway. However, the query processing time of COSK increases with the update rate because the probability that an updated edge is associated with the monitoring region of q increases with it. Therefore, when the update rate becomes large, the results must be updated frequently, which increases the query processing time.
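The edge-update workload described above can be generated along the following lines. This is a sketch under assumptions: edge weights are stored in a dictionary keyed by hypothetical (start, end) node IDs, the share of edges to update is given as a percentage `percent`, and each new weight is drawn uniformly at random between 0.1 and 10 times the original length (the text states only that it is selected randomly within that range).

```python
import random
from typing import Dict, Tuple

Edge = Tuple[int, int]   # hypothetical (start node ID, end node ID)


def sample_weight_updates(original: Dict[Edge, float], percent: float,
                          rng: random.Random) -> Dict[Edge, float]:
    """Choose `percent`% of all edges and return {edge: new weight}, each new
    weight drawn uniformly between 0.1x and 10x the edge's original length."""
    edges = sorted(original)                  # fixed order for reproducible seeds
    count = round(len(edges) * percent / 100.0)
    chosen = rng.sample(edges, count)
    return {e: original[e] * rng.uniform(0.1, 10.0) for e in chosen}
```

Scaling from the original length at every timestamp, rather than from the current weight, keeps weights within the stated 0.1x to 10x band over long runs.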

Conclusion
In this paper, we investigated moving top-k spatial keyword queries in directed and dynamic road networks. We presented an efficient indexing framework using inverted files that indexes the data objects on edges, allowing for the effective searching of data objects relevant to queries in terms of both textual and spatial relevance. We also presented a safe-exit-based algorithm called COSK to monitor moving top-k spatial keyword queries. We demonstrated that the query results remain valid as long as the query object resides within a safe region. Furthermore, COSK can effectively monitor the validity of query results and safe regions in dynamic road networks. Finally, an experimental evaluation conducted on real road networks demonstrated that COSK significantly reduced the query processing time and communication costs compared to the CMTkSK+ algorithm.

Figure 1: Illustration of directed road network.

Figure 3: Illustration of directed road network.

Figure 4: Illustration of directed road network.

Figure 5: Illustration of safe region of q.

Figure 7: Updating the weight of edges in a dynamic road network, where $t_i < t_j$.

Figure 9: Effect of k on query processing time and communication cost.

Figure 10: Effect of $|D|$ on query processing time and communication cost.

Figure 11: Effect of number of keywords on query processing time and communication cost.

Figure 12: Effect of α on query processing time and communication cost.

Figure 13: Effect of speed on query processing time and communication cost.

Figure 14: Effect of mobility on query processing time and communication cost.
Figure 17(b) shows the communication costs of COSK and the basic algorithm with respect to the percentage of edges whose weights change at each timestamp. The basic algorithm incurs almost constant communication costs regardless of this percentage. In contrast, the communication cost of COSK increases with it because the query results and safe regions need to be updated more frequently.

Figure 15: Effect of the percentage of directed edges on query processing time and communication cost.

Table 1: Comparisons with existing solutions.

Table 2 presents the notations used in this study.

4.2. Query Processing Algorithm. Our algorithm traverses the road network incrementally in a similar fashion to Dijkstra's

Wireless Communications and Mobile Computing

Table 2: Summary of notations used in this paper.
$\mathrm{dist}(p_s, p_e)$: Length of the shortest path from $p_s$ to $p_e$, where $p_s$ and $p_e$ represent the start and end points, respectively
$l(p_1, p_2)$: Length of the segment connecting two points $p_1$ and $p_2$
$n$: Node in the road network
$e = (n_s, n_e)$: Edge in edge set E, where $n_s$ and $n_e$ are the start and end points of the edge
$n_b$: Boundary node corresponding to the start ($n_s$) or end ($n_e$) point of an edge
$w(e)$: Weight of edge $(n_s, n_e)$
q: Query point in the road network
k: Number of top-ranked data objects requested by the query
d: Data object
D: Set of data objects, $D = \{d_1, d_2, \ldots, d_{|D|}\}$
$D(n_s, n_e)$: Set of data objects on an edge
$p_a$: Anchor point that corresponds to the start point of expansion
$s$: Safe exit point where the safe and non-safe regions of q intersect
α: Query parameter
$\mathrm{score}(d)$: Score of data object d
$TR(\cdot, \cdot)$: Textual relevance of data object d to the query keywords
$SR(\cdot, \cdot)$: Spatial relevance of data object d to the query location

Algorithm 1: EvaluateSnapshotQuery(Node, Edge). Input: edge ID, term ID, and the score of the k-th object. Output: candidate list.
between the lowest answer object and the highest nonanswer object. However, in the case of a directed edge, where $\mathrm{dist}(n_s, n_e) \neq \mathrm{dist}(n_e, n_s)$, the safe exit point is either $d^{+}_{l}$ or $n_e$. If $d^{+}_{l} \in \overline{n_s n_e}$, the safe exit point is $d^{+}_{l}$; otherwise, it is $n_e$.
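The rule for directed edges reduces to a simple case distinction. In the sketch below, positions on the edge are offsets measured from the start node (offset 0) toward the end node (offset `edge_length`), and `lowest_answer_offset` is a hypothetical parameter giving the offset of the lowest-ranked answer object when it lies on this edge, or `None` otherwise.

```python
from typing import Optional


def directed_safe_exit(edge_length: float,
                       lowest_answer_offset: Optional[float]) -> float:
    """Offset of the safe exit point on a directed edge traversed from its
    start node (offset 0) to its end node (offset edge_length)."""
    if lowest_answer_offset is not None and 0.0 < lowest_answer_offset < edge_length:
        return lowest_answer_offset   # lowest answer object lies on the edge
    return edge_length                # otherwise the exit is the end node
```

For instance, on an edge of length 10 with the lowest-ranked answer object at offset 4, the safe exit point is at offset 4; if that object lies elsewhere in the network, the safe exit point is the end node.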

Table 3: Computation of safe exit points for the example scenario.

Table 4: Summary of datasets.
7.2.6. Effect of Mobility. Figure 14 shows the effect of mobility (the percentage of query objects that are moving at any timestamp) on the performance of COSK and CMTkSK+. As expected, the query processing time and communication costs of both algorithms increase with mobility. Nevertheless, COSK outperforms CMTkSK+ in terms of both query processing time and communication costs.

7.2.7. Effect of Directed Edges. Figure 15 shows the impact of the percentage of directed edges on the performance of COSK and CMTkSK+. The query processing time increases with the percentage of directed edges because the algorithms need to explore more edges to retrieve the top-k results. However, the communication cost of neither algorithm is significantly affected by the percentage of directed edges.

7.2.8. Effect of Datasets. Figure 16 shows the index sizes of COSK and CMTkSK+ for different datasets. Both approaches have similar index sizes, although COSK has a minor space overhead because it additionally stores the highest significance factor of each edge. More importantly, this overhead is small compared with the gains COSK achieves in query processing time and communication costs.