K nearest neighbor search in navigation systems

A frequent type of query in a car navigation system is to find the k nearest neighbors (kNN) of a given query object (e.g., a car) using the actual road network maps. In road networks (spatial networks), the distances between objects depend on their network connectivity, and it is computationally expensive to compute these distances (e.g., shortest paths). In this paper, we propose a novel approach to efficiently and accurately evaluate kNN queries in a mobile information system that uses spatial network databases. The approach uses a first-order Voronoi diagram and Dijkstra's algorithm. It is based on partitioning a large network into small Voronoi regions and then pre-computing distances across the regions. By performing the across-network computation only for the border points of neighboring regions, we avoid global pre-computation between every object pair. Our empirical experiments with real-world data sets show that our proposed solution outperforms approaches that are based on on-line distance computation by up to one order of magnitude. In addition, our approach has better response times than approaches that are based on pre-computation.


Introduction
Over the last decade, due to rapid developments in information technology (IT), particularly communication technologies, a new breed of information systems, such as mobile information systems, has appeared. Mobility is perhaps the most important market and technological trend within information and communication technology. Mobile information systems will have to supply and adopt services that go beyond traditional web-based systems, and hence they come with new challenges for researchers, developers and users.
One of the well-known applications that depend on mobility is the car navigation system, which allows drivers to receive navigation instructions without taking their eyes off the road. Using a Global Positioning System (GPS) in the car navigation system enables the driver to perform a wide range of queries, from locating the car position, to finding a route from A to B, or dynamically selecting the best route in real time. One of the frequently used queries in such systems is the k nearest neighbor (kNN) query. This type of query is defined as: given a set of spatial objects (or points of interest, e.g., hospitals) and a query point (e.g., a vehicle's location), find the k closest objects to the query. An example of a kNN query is a query initiated by a GPS device in a vehicle to find the 5 closest restaurants to the vehicle. With spatial network databases (SNDB), objects are restricted to move on pre-defined paths (e.g., roads) that are specified by an underlying network. This means that the shortest network distance between objects (e.g., the vehicle and the restaurants) depends on the connectivity of the network rather than the objects' locations.
The majority of the existing work on kNN queries is based on either computing the distance between a query and the objects on-line [3,5,9], or utilizing index structures [4,6,8,10,11,13,16,17,18]. The solution proposed by the first group is based on the fact that the current algorithms for computing the distance between a query object q and an object O in a network will automatically lead to the computation of the distance between q and the objects that are (relatively) closer to q than O. The advantage of these approaches is that they explore the objects that are closer to q and compute their distances to q progressively. However, the main disadvantage of these approaches is that they perform poorly when the objects are not densely distributed in the network, since they then require a large portion of the network to be retrieved for distance computation. The second group of approaches is designed and optimized for metric or vector spatial index structures (e.g., M-tree and R-tree, respectively). The approaches that are based on metric index structures require pre-computation of the distances between the objects and grouping of the objects based on their distances to some reference object (this is more intelligent than a naive approach that pre-computes and stores distances between all the object pairs in the network). These solutions filter a small subset of a possibly large number of objects as the candidates for the closest neighbors of q, and require a refinement step to compute the actual distance between q and the candidates to find the actual nearest neighbors of q. The main drawback of applying these approaches to SNDB is that they do not offer any solution as to how to efficiently compute the distances between q and the candidates. Moreover, applying an approach similar to the first group to perform the refinement step in order to compute the distance between q and the candidates will render these approaches, which traverse index structures to provide a candidate set, redundant, since the network expansion approach does not require any candidate set to start with. In addition to this drawback, approaches that are based on vector index structures are only appropriate for spaces where the distance between objects is only a function of their spatial attributes (e.g., Euclidean distance) and cannot properly approximate the distances in a network.
A comprehensive solution for spatial queries in SNDB must fulfill these real-world requirements: 1) be able to incorporate the network connectivity to provide exact distances between objects, 2) efficiently answer the queries in real time in order to support kNN queries for moving objects, 3) be scalable in order to be applicable to usually very large networks, and 4) be independent of the density and distribution of the points of interest.
Taking into consideration that mobile devices are usually limited in memory resources and have lower computational power, in this paper we propose a novel approach that fulfills the above requirements by reducing the problem of distance computation in a very large network into the problem of distance computation in a number of much smaller networks plus some online "local" network expansion. The main idea behind our approach, termed Progressive Incremental Network Expansion (PINE), is to first partition a large network into smaller, more manageable regions. We achieve this by generating a network Voronoi diagram over the points of interest. Each cell of this Voronoi diagram is centered on one object (e.g., a restaurant) and contains the nodes (e.g., vehicles) that are closest to that object in network distance (and not in Euclidean distance). Next, we pre-compute the inter distances for each cell. That is, for each cell, we pre-compute the distances across the border points of the adjacent cells. This reduces the pre-computation time and space by localizing the computation to cells and a handful of neighbor-cell node pairs. Now, to find the k nearest neighbors of a query object q, we first find the first nearest neighbor by simply locating the Voronoi cell that contains q. This can be easily achieved by utilizing a spatial index (e.g., R-tree) that is generated for the Voronoi cells. Then, starting from the query point q, we perform network expansion at two different scales simultaneously to: 1) compute the distance from q to its first nearest neighbor (its Voronoi cell center point), and 2) explore the objects that are close to q (centers of surrounding Voronoi cells) and compute their distances to q during the expansion.
At the first scale, a network expansion similar to INE is performed inside the Voronoi cell that contains q (VC(q)), starting from q. To this end, we utilize the actual network links (e.g., roads) and nodes (e.g., restaurants, hospitals) to compute the distance from q (e.g., a vehicle) to its first nearest neighbor (the generator point of VC(q)) and to the border points of VC(q). When we reach a border point of VC(q), we start a second network expansion at the scale of the Voronoi polygons. Unlike INE and similar to VN3, the second expansion utilizes the inter-cell pre-computed distances to find the actual network distance from q to the objects in the other Voronoi cells surrounding VC(q). Note that both expansions are performed simultaneously. The first expansion continues until all border points of VC(q) are explored or all kNN are found.
To the best of our knowledge, the Incremental Network Expansion (INE) and the Voronoi-based Network Nearest Neighbor (VN3) approaches presented in [5,9], respectively, are the only other approaches that support exact kNN queries on spatial network databases. However, the performance of VN3 suffers with lower density data sets.
Our empirical experiments with real-world data sets (presented in Section 5) show that the disk access time of VN3 tends to be on average two to four times more than that of PINE. In addition, the CPU time of VN3 tends to be on average five to ten times more than that of PINE. Finally, we show that the required computation for the pre-computation component of VN3 is on average 13.59 times more than that of PINE. The INE approach also suffers from poor performance when the objects (e.g., restaurants) are not densely distributed in the network. Our empirical experiments show that the query processing time of INE is 10 to 12 times more than that of PINE, depending on the density of the points of interest. Also, we show that the performance of PINE is independent of the density and distribution of the points of interest, and of the location of the query object.
The remainder of this paper is organized as follows. We review the related work on k nearest neighbor queries in Section 2. We then provide a review of Voronoi diagrams and Dijkstra's algorithm, the basis of our proposed PINE approach, in Section 3. In Section 4, we discuss our proposed PINE approach. Finally, we discuss our experimental results and conclusions in Sections 5 and 6, respectively.

Related work
Numerous algorithms for k nearest neighbor (kNN) queries have been proposed. This type of query is extensively used in car navigation systems, geographical information systems, shape similarity in image databases, etc. Some of the algorithms are aimed at m-dimensional objects and are based on utilizing one of the variations of multidimensional vector or metric index structures. Other algorithms are based on pre-calculation of the solution space or the computation of the distance from a query object to its nearest neighbors on-line and per query. Finally, there are approaches that support exact k nearest neighbor queries on spatial network databases. In this section, we consider each group in turn.
Some of the algorithms are aimed at m-dimensional objects and are based on utilizing one of the variations of multidimensional vector or metric index structures. The algorithms that are based on index structures usually operate in two steps, filter and refinement, and their performance depends on their selectivity in the filter step. These approaches can be divided into two groups: 1) vector index structures [4,6,8,10,15,18], and 2) metric index structures [11,12,16,17]. Vector index structures are approaches that are designed to utilize spatial index structures and aim to minimize the number of candidates, index nodes and disk accesses required to obtain candidates. There are two major shortcomings with these approaches that render them impractical for networks: 1) Networks are metric spaces, i.e., the distance between two objects depends on the connectivity of the objects and not their spatial attributes; however, the filter step of these approaches is based on Minkowski distance metrics (e.g., Euclidean distance). Hence, the filter step of these approaches cannot be used for, or properly approximate, exact distances in networks. 2) These approaches do not propose any method to calculate the exact network distance between objects and the query for their refinement step; rather, they assume that the distance function can be easily calculated. Metric index structures are approaches that are also based on a filter and refinement process, but as opposed to the vector index structures, they index and filter the objects considering their metric distance. The main disadvantage of these approaches is that they do not offer any solution on how to efficiently compute the distances between the query and the candidates (i.e., the same as the second shortcoming of the approaches based on vector index structures), which is required by the refinement step.
There are also other algorithms that are based on pre-calculation of the solution space or the computation of the distance from a query object to its nearest neighbors on-line and per query. Berchtold et al. in [13] suggest pre-calculating, approximating and indexing the solution space for the nearest neighbor problem in m-dimensional spaces. Pre-calculating the solution space means determining the Voronoi diagram of the data points. The exact Voronoi cells in m-dimensional space are usually very complex, hence the authors propose indexing approximations of the Voronoi cells. This approach is only appropriate for the first nearest neighbor problem in high-dimensional spaces. Jung et al. in [14] propose an algorithm to find the shortest distance between any two points in a network. Their approach is based on partitioning a large graph into layers of smaller subgraphs and pushing up the pre-computed shortest paths between the borders of the subgraphs in a hierarchical manner to find the shortest path between two points. This approach can potentially be used in conjunction with one of the approaches that are based on metric index structures; however, its main disadvantage is its poor performance when multiple shortest path queries from different sources are issued at the same time.
Finally, to the best of our knowledge, there are only two other approaches that support exact kNN queries on spatial network databases (i.e., Papadias et al. in [5], and Kolahdouzan et al. in [9]). Papadias et al. in [5] propose a solution for nearest neighbor queries in network databases by introducing an architecture that integrates network and Euclidean information and captures pragmatic constraints. Their approach is based on generating a search region for the query point that expands from the query. This approach performs similarly to Dijkstra's algorithm, and the underlying data structures of the architecture aim to minimize the number of disk accesses that are required to fetch adjacent links and nodes from the database. The advantages of this approach are: 1) it offers a method that finds the exact distance in networks, and 2) the architecture can support other spatial queries like range search and closest pairs. Since the number of links and nodes that need to be retrieved and examined is inversely proportional to the cardinality ratio of the entities to the nodes in the network, the main disadvantage of this approach is a dramatic degradation in performance when this cardinality ratio is (far) less than 10%, which is the usual case for real-world scenarios (e.g., the real data sets representing the road network and different types of entities in the State of California show that this cardinality ratio is usually between 0.04% and 3%). This is because spatial databases are usually very large, and small values of the cardinality ratio lead to large portions of the database being retrieved. The same problem occurs for large values of k.
Kolahdouzan et al. in [9] propose a solution for k nearest neighbor queries in spatial networks, termed VN3, which is based on the properties of the Network Voronoi diagram (NVD). In addition, it uses localized pre-computation of the network distances for a very small percentage of neighboring nodes in the network to enhance query response time and reduce disk accesses. The iterative filter/refinement process of VN3 is based on the fact that the Network Voronoi Polygons (NVPs) of an NVD can directly be used to find the first nearest neighbor of a query object q. Subsequently, the adjacency information of the NVPs can be utilized to provide a candidate set for the other nearest neighbors of q. Finally, the pre-computed distances can be used to compute the actual network distances from q to the generators in the candidate set and consequently refine the set. The filter/refinement process in VN3 is iterative: at each step, first a new set of candidates is generated from the NVPs of the generators that are already selected as the nearest neighbors of q, then the pre-computed distances are used to select "only the next" nearest neighbor of q. The advantages of this approach are: 1) it offers a method that finds the exact distance in networks, 2) it has a fast query response time, and 3) it progressively returns the k nearest neighbors of a query point (i.e., at each iterative step it computes the exact next nearest neighbor and the shortest distance to it). The main disadvantage of this approach is its need for pre-computing and maintaining two different sets of data: 1) query to border computation: computing the network distances from q to the border points of its enclosing network Voronoi polygon, and 2) border to border computation: computing the network distances from the border points of the NVP of q to the border points of any of the other NVPs.

Background
Our proposed approach to address nearest neighbor queries is based on both the Voronoi diagram and Dijkstra's algorithm. A Voronoi diagram divides a space into disjoint polygons where the nearest neighbor of any point inside a polygon is the generator of the polygon. Dijkstra's algorithm is one of the most efficient algorithms for finding shortest paths from a source node to all the other nodes. In this section, we review the principles of Voronoi diagrams. We start with the Voronoi diagram for 2-dimensional Euclidean space and present only the properties that are used in our approach. We then discuss the network Voronoi diagram, where the distance between two objects in space is their shortest path in the network rather than their Euclidean distance, and which hence can be used for spatial networks. A thorough discussion on Voronoi diagrams is presented in [2]. Finally, we discuss Dijkstra's algorithm.

Voronoi diagram
The Voronoi diagram of a point set P, VD(P), is a unique diagram that consists of a set of collectively exhaustive and mutually exclusive Voronoi polygons (Voronoi cells), VPs. Each Voronoi polygon is associated with a point in P (called the generator point) and contains all the locations in the Euclidean plane that are closer to the generator point of the Voronoi cell than to any other generator point in P. The boundaries of the polygons, called Voronoi edges, are the set of locations that can be assigned to more than one generator. The Voronoi polygons that share the same edges are called adjacent polygons and their generators are called adjacent generators. Figure 1 shows an example of a Voronoi diagram [9], its polygons and generators. The following property holds for any Voronoi diagram and will be used later to answer kNN queries: "The nearest generator point of p_i (e.g., p_j) is among the generator points whose Voronoi polygons share similar Voronoi edges with VP(p_i)." [2,9].

Network Voronoi diagram
"A network Voronoi diagram, termed NVD, is defined for graphs and is a specialization of Voronoi diagrams where the location of objects is restricted to the links that connect the nodes of the graph and distance between objects is defined as their shortest path in the network rather than their Euclidean distance." [2,9]. Spatial networks (e.g., road networks) can be modeled as weighted planar graphs where the nodes of the graph represent the intersections and the roads are represented by the links connecting the nodes.
Assume a planar graph G(N, L) that consists of a set of nodes N = {p_1, . . ., p_n, p_n+1, . . ., p_o}, where the first n elements (i.e., P = {p_1, . . ., p_n}) are the generators (e.g., points of interest in a road network), and a set of links L = {l_1, . . ., l_k} that connects the nodes. Also assume that the network distance from a point p on a link in L to p_i in N, d_n(p, p_i), is defined as the shortest network distance from p to p_i. For all j ∈ I_n \ {i}, with I_n = {1, . . ., n}, we define:
$$Dom(p_i, p_j) = \{\, p \mid p \in \bigcup_{l \in L} l,\ d_n(p, p_i) \le d_n(p, p_j) \,\}$$
$$b(p_i, p_j) = \{\, p \mid p \in \bigcup_{l \in L} l,\ d_n(p, p_i) = d_n(p, p_j) \,\}$$
The set Dom(p_i, p_j), called the dominance region of p_i over p_j on links in L, specifies all points in all links in L that are closer to p_i than to p_j or equidistant from both. The set b(p_i, p_j), called the bisector or border points between p_i and p_j, specifies all points in all links in L that are equidistant from p_i and p_j. Consequently, the Voronoi link set associated with p_i and the network Voronoi diagram are defined as follows, respectively:
$$V_{link}(p_i) = \bigcap_{j \in I_n \setminus \{i\}} Dom(p_i, p_j)$$
$$NVD(P) = \{\, V_{link}(p_1), \ldots, V_{link}(p_n) \,\}$$
where V_link(p_i) specifies all the points in all the links in L that are closer to p_i than to any other generator point in N. Similar to the VD defined in Section 3.1, the elements of an NVD are also collectively exhaustive and mutually exclusive except for their border points. Note that b is a set of points which, unlike in a Voronoi diagram in Euclidean space, cannot directly generate polygons. However, by properly connecting adjacent border points of a generator g to each other without crossing any of the links, we can generate a bounding polygon for that generator, called a network Voronoi polygon and termed NVP(g). Note that the generation of NVP(g) only requires local network information, i.e., only the links and nodes that are in the area between g and its adjacent generators are used to generate NVP(g).
An example of an NVD [9] is shown in Fig. 2, where p_1, p_2, and p_3 are the generators. We can assume that the set of generators is the set of points of interest (e.g., hotels, restaurants, . . .) and that p_4 to p_16 are the intersections of a road network that are connected to each other by the set of streets L. The NVD of the graph, where each line style corresponds to the Voronoi link set of a generator, is shown in the same figure. Some links are completely contained in the V_link of a generator (e.g., the link connecting p_6 and p_9 is completely inside V_link(p_1)), while others are partially contained in different V_links (e.g., the link connecting p_4 and p_5 is divided between and contained in V_link(p_1) and V_link(p_2)). The figure also shows how adjacent border points should be connected to each other: if two adjacent border points are between two similar generators (e.g., b_5 and b_7), they can be connected with an arbitrary line that does not cross any of the members of L. Three or more adjacent border points (e.g., b_2, b_3 and b_5) can be connected to each other through an arbitrary auxiliary point (e.g., v in the figure). By using arbitrary lines and auxiliary points, NVPs become non-unique. However, since objects in a graph can only be located on links, different NVPs will contain exactly identical Voronoi link sets and hence are unique in this respect. Moreover, as shown in the figure and unlike Voronoi polygons in the Euclidean space, common edges between two NVPs may contain more than two border points and are not necessarily straight lines. Despite this, properties 1 and 2 of Section 3.1.1 are still valid for NVPs.
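The partition described above can be illustrated with a minimal Python sketch. It is not the construction used in [9]; it assumes that generators sit on graph nodes, that the graph is connected with positive link weights, and that no node is exactly equidistant from two generators. The names `network_voronoi`, `border_points` and `ROADS` are hypothetical. A multi-source run of Dijkstra's algorithm labels every node with its nearest generator, and a border point is then placed on each link whose two endpoints carry different labels, at the offset where both generators are equally far away.

```python
import heapq

def network_voronoi(graph, generators):
    """Label every node with its nearest generator by network distance.

    graph: dict node -> list of (neighbor, weight); an assumed layout.
    Returns (owner, dist): nearest generator and distance to it, per node.
    """
    dist = {g: 0.0 for g in generators}
    owner = {g: g for g in generators}
    heap = [(0.0, g, g) for g in generators]   # all generators start at distance 0
    heapq.heapify(heap)
    done = set()
    while heap:
        d, u, g = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                owner[v] = g                   # v currently belongs to g's cell
                heapq.heappush(heap, (nd, v, g))
    return owner, dist

def border_points(graph, owner, dist):
    """Return (u, v, offset) for each link crossing two cells.

    offset is measured from u along the link and solves
    dist[u] + offset == dist[v] + (w - offset).
    """
    borders = []
    for u, nbrs in graph.items():
        for v, w in nbrs:
            if u < v and owner[u] != owner[v]:  # visit each undirected link once
                borders.append((u, v, (w + dist[v] - dist[u]) / 2.0))
    return borders

# Hypothetical toy road: p1 --1-- a --4-- b --2-- p2, with generators p1 and p2.
ROADS = {
    "p1": [("a", 1)],
    "a": [("p1", 1), ("b", 4)],
    "b": [("a", 4), ("p2", 2)],
    "p2": [("b", 2)],
}
owner, dist = network_voronoi(ROADS, ["p1", "p2"])
borders = border_points(ROADS, owner, dist)
```

In this toy network the only border point lies on the link between a and b, 2.5 units from a, where the distance to either generator is 3.5.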

Dijkstra's algorithm
In a weighted graph G = (V, E) (otherwise known as a network), where V is a set of vertices and E is a set of edges, it is frequently desired to find the shortest path between two nodes. Dijkstra's algorithm is one of the most efficient algorithms for finding shortest paths from a source node to all the other nodes. The main idea of Dijkstra's algorithm is to maintain a set of vertices (S) whose final shortest-path weights from the source (s) have already been calculated, along with a complementary set of vertices (Q = V − S) whose shortest-path weights have not yet been determined. The algorithm repeatedly selects the vertex in Q with the minimum current shortest-path estimate. It updates the weight estimates of all vertices adjacent to the currently selected vertex (known as relaxation). The vertex is then added to the set S. The algorithm continues until the final shortest-path weights of all vertices have been calculated (i.e., until the set Q is empty).
The kNN problem can be solved by first applying Dijkstra's algorithm to find the distance from the source node to all other nodes in the graph. Next, by sorting the nodes in ascending order according to their distances from the source node, the kNN nodes can be identified as the top k elements of the sorted list.
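As a minimal sketch of this baseline (not of PINE itself), the following Python code runs Dijkstra's algorithm from the query node and then sorts the points of interest by distance. The adjacency-list layout and the names `dijkstra`, `knn_by_full_dijkstra` and `ROADS` are assumptions made for illustration.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distance from source to every reachable node.

    graph: dict node -> list of (neighbor, weight); an assumed layout.
    """
    dist = {source: 0.0}
    done = set()               # the set S of finalized vertices
    heap = [(0.0, source)]     # plays the role of Q, keyed by current estimate
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue           # stale queue entry
        done.add(u)
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):   # relaxation
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def knn_by_full_dijkstra(graph, source, objects, k):
    """Return the k objects closest to source as (distance, object) pairs."""
    dist = dijkstra(graph, source)
    reachable = [(dist[o], o) for o in objects if o in dist]
    return sorted(reachable)[:k]

# Hypothetical toy network: query node q, restaurants r1 and r2.
ROADS = {
    "q": [("a", 2), ("b", 5)],
    "a": [("q", 2), ("b", 1), ("r1", 4)],
    "b": [("q", 5), ("a", 1), ("r2", 1)],
    "r1": [("a", 4)],
    "r2": [("b", 1)],
}
nearest = knn_by_full_dijkstra(ROADS, "q", ["r1", "r2"], 2)
```

The sketch computes distances to every node before any neighbor is reported, which is exactly the cost that expansion-based approaches try to avoid on large networks.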

kNN queries using network Voronoi diagram expansion
In this section we propose a new approach, termed Progressive Incremental Network Expansion (PINE), based on Dijkstra's algorithm. PINE is used to find the exact kNN of a query point using a network Voronoi diagram and a network expansion algorithm. It performs network expansion starting from the query point q and examines the interest points (i.e., NNs) in the order they are encountered. This approach is also based on the properties of network Voronoi diagrams and on pre-computation of the network distances for a very small percentage of the nodes in the network.
PINE reduces the problem of distance computation in a very large network into the problem of distance computation in a number of much smaller networks plus some online "local" network expansion. The main idea behind our approach is to first partition a large network into smaller, more manageable regions (using a network Voronoi diagram). Next, we pre-compute the inter distances for each cell. This reduces the pre-computation time and space by localizing the computation to cells and a handful of neighbor-cell node pairs. Unlike INE [5], our expansion method utilizes the inter-cell pre-computed distances to find the actual network distance from the query point to the objects in the surrounding area, and hence saves on computation time. The performance of VN3 [9] suffers with lower density data sets, because it has to access both the pre-computed border-to-border (inter-cell) and query-to-border (intra-cell) distances stored with the polygons. To avoid this problem, PINE only needs to access the border-to-border (inter-cell) distances stored with the polygons.

Network node types
To explain how PINE works, we need to distinguish between two different types of nodes. The first node type (n_Net) represents the nodes that are inside NVP(q). These nodes are the original network map nodes (e.g., n_2 in Fig. 4). The second node type (n_NVP) represents the border points of a polygon. In the sequel we use BoP(e) to specify the set of border points of an entity e. An n_NVP node can belong either to BoP(q) or to the BoP(NVP) of one of the polygons that we will explore. These nodes are either original network map nodes or nodes generated to create the NVD (e.g., b_3, b_17, etc.).

Border to border computations
Border to border distance (inter-cell) computations are required to find the network distances from BoP(NVP(q)) to the border points of the NVP of any generator, BoP(NVP(g)). To this end, we pre-compute the point-to-point network distances between the border points of each NVP "separately". For example, this approach suggests that for the NVD shown in Fig. 4, the point-to-point network distances among {b_1, . . ., b_8} (corresponding to NVP(P_1)) be pre-computed. It also suggests that the point-to-point network distances among {b_1, b_2, b_14, . . ., b_19} (corresponding to NVP(P_3)) be pre-computed. Note that each border point (e.g., b_1) belongs to at least two NVPs (e.g., NVP(P_1) and NVP(P_3)) and hence its distances to all the border points of two NVPs must be pre-computed. The intuition for this approach is that once the point-to-point network distances among the border points of "each" NVP are computed, these distances can be used to find the network distances between the border points of "any" two NVPs. The other intuition is that this approach has low complexity with respect to both space and computation. The reasons are: 1) the pre-computation is only performed for the border points of each NVP separately, and in real-world scenarios (as opposed to the example shown in Fig. 4), the ratio of the total number of border points to the total number of nodes in the network is small (see Section 5), and 2) the pre-computation is performed for each NVP separately and not across all NVPs, and the border points of each NVP are fairly close to each other.
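The per-cell pre-computation can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: border points are assumed to have been inserted into the graph as ordinary nodes, each cell is given as the set of nodes it contains (border nodes included), and distances are computed with a search that never leaves the cell. The names `restricted_dijkstra`, `precompute_border_tables` and `undirected` are hypothetical. The sketch also stores the border-to-generator distance, which the PINE algorithm relies on later.

```python
import heapq

def restricted_dijkstra(graph, source, allowed):
    """Dijkstra from source that only visits nodes in `allowed`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, []):
            if v not in allowed:
                continue                      # never leave the cell
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def precompute_border_tables(graph, cells, borders):
    """For each cell, tables[gen][b][t] = in-cell distance from border b to t,
    where t is another border node of the cell or the generator itself."""
    tables = {}
    for gen, nodes in cells.items():
        cell_borders = sorted(nodes & borders)
        table = {}
        for b in cell_borders:
            dist = restricted_dijkstra(graph, b, nodes)
            targets = cell_borders + [gen]
            table[b] = {t: dist[t] for t in targets if t != b and t in dist}
        tables[gen] = table
    return tables

def undirected(edges):
    """Build an adjacency dict from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph

# Hypothetical two-cell network; b1 and b2 are equidistant from p1 and p2.
GRAPH = undirected([("p1", "x", 1), ("x", "b1", 2), ("x", "b2", 3),
                    ("b1", "p2", 3), ("b2", "p2", 4)])
CELLS = {"p1": {"p1", "x", "b1", "b2"}, "p2": {"p2", "b1", "b2"}}
tables = precompute_border_tables(GRAPH, CELLS, {"b1", "b2"})
```

Each border node appears in the table of both cells it belongs to, mirroring the observation that a border point's distances must be pre-computed for two NVPs.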

PINE algorithm
PINE works as follows. First, it locates the polygon (NVP(q)) that the query point q belongs to using the Contain() function, hence retrieving the first nearest neighbor (the generator point of NVP(q)). This is based on Voronoi Property 2 mentioned in Section 3.1.1. Note that we do not have the distance information from the query point to that neighbor yet. Then, the links that cover q are located and the nodes of those links are placed on a priority queue PQ according to their network distance to q and are explored later. Next, the node closest to q (i.e., the node on top of PQ), e.g., c, is removed from PQ and the nodes that are connected to c are retrieved from the database. Subsequently, the minimum possible network distance, d_mnp, from q to each of these nodes is computed and the nodes are placed on PQ (if they are already on PQ, their locations on the queue are updated based on their new distances to q). During this progressive process, when we reach an n_Net node we expand from it as in the INE algorithm, using the original network edges/nodes of NVP(q). However, when we reach an n_NVP node connecting to a new polygon (NVP(j)) adjacent to NVP(q), we do not need the original network edges/nodes of NVP(j). This is because, for any polygon, we know how to reach any border point from any other border point (the shortest path) and how much it costs. Therefore, we treat all the other border points of NVP(j) as nodes that we can reach directly from n_NVP and add them to PQ to be explored later. In addition, we know the distance from any border point of NVP(j) (including n_NVP) to its generator point (interest point). Therefore, we compute the d_mnp to that generator (not the actual shortest distance) and add the generator point of NVP(j) to our queue of k-1 candidate interest points (Cand-NN). At any point during the algorithm execution, the nodes inside Cand-NN are only candidate neighbors that we have reached so far, and the queue is kept in ascending order of distance. The order of the nodes in the queue may change at each execution step if we find a shorter path to a node that is already in Cand-NN. Nodes can also be added or dropped depending on their updated d_mnp. This is because our algorithm depends on Dijkstra's algorithm, which updates the distance estimates of all vertices adjacent to the currently selected vertex (known as the relaxation process). Only at the end of the progressive process does Cand-NN hold the next k-1 interest points of q, ordered from top to bottom. We recursively apply the above algorithm and terminate it when the queue Cand-NN is filled with k-1 neighbors and the distance from q to any element of PQ is greater than the distance to reach any node in Cand-NN. This means that, from any point in PQ, we cannot reach another neighbor with a distance shorter than any of the k-1 interest points already discovered. See Fig. 3 for the complete PINE algorithm.
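The two-scale expansion can be illustrated with a simplified Python sketch. It is not the algorithm of Fig. 3: it assumes border points are ordinary graph nodes, that the enclosing cell of q is already known (the role of Contain()), and that per-cell tables of border-to-border and border-to-generator distances are given. It also replaces the Cand-NN and d_mnp bookkeeping with plain Dijkstra-style finalization, reporting a generator when it is popped from the queue. The names `pine_like_knn`, `ROADS`, `CELLS` and `TABLES` are hypothetical.

```python
import heapq

def pine_like_knn(graph, cells, tables, q, q_gen, k):
    """Return up to k generators nearest to q as (distance, generator) pairs.

    graph : dict node -> [(neighbor, weight)]; only links of the query cell are needed
    cells : dict generator -> set of nodes in its cell (border nodes included)
    tables: dict generator -> {border: {target: precomputed distance}}
    """
    generators = set(cells)
    home = cells[q_gen]                   # nodes of the cell that contains q
    dist = {q: 0.0}
    heap = [(0.0, q)]
    done = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in generators:
            result.append((d, u))         # distance to u is final when popped
        moves = []
        if u in home:
            # First scale: expand over the real links inside the query cell.
            moves += [(v, w) for v, w in graph.get(u, []) if v in home]
        for gen, nodes in cells.items():
            if gen != q_gen and u in nodes:
                # Second scale: jump through another cell via its lookup table.
                moves += list(tables[gen].get(u, {}).items())
        for v, w in moves:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

# Hypothetical network with three cells in a row: p1 | b1 | p2 | b2 | p3.
ROADS = {                                 # links of the query cell only
    "q": [("x", 1)],
    "x": [("q", 1), ("p1", 1), ("b1", 2)],
    "p1": [("x", 1)],
    "b1": [("x", 2)],
}
CELLS = {"p1": {"p1", "q", "x", "b1"}, "p2": {"p2", "b1", "b2"}, "p3": {"p3", "b2"}}
TABLES = {
    "p1": {"b1": {"p1": 3.0}},
    "p2": {"b1": {"b2": 5.0, "p2": 3.0}, "b2": {"b1": 5.0, "p2": 2.0}},
    "p3": {"b2": {"p3": 2.0}},
}
neighbors = pine_like_knn(ROADS, CELLS, TABLES, "q", "p1", 3)
```

Note that the links of the cells around p2 and p3 never appear in `ROADS`: outside the query cell, the sketch only touches the pre-computed tables, which is the saving the approach aims for.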

Analysis
This approach is more suitable for large networks with a small number of points of interest (i.e., a large value of n/mb, where m is the number of points of interest, n the number of nodes in the network, and b the number of connected nodes stored on one disk block). The approach does not have to retrieve a large portion of the network data (edges/nodes) before the distance from q to BoP(NVP(q)) can be computed. It only retrieves the part of the network in the direction that it will explore next, and delays the exploration of the rest of the network until it is needed.
With the INE algorithm, if the k interest points cover a large spatial area, then the algorithm has to explore the exact network edges and nodes of that area (i.e., a large number of nodes and edges). In our approach, however, the search area is divided into a set of network Voronoi polygons. Hence, to find the first interest point, we only need to explore the exact network edges and nodes of the polygon that contains the query point q (NVP(q)). Then, to find the next k-1 interest points, we use the information stored with the polygons (i.e., the shortest paths between all border points of each polygon) instead of exploring the exact network edges and nodes of the rest of the polygons.
Our preliminary experiments show that the total number of border nodes of an NVP is much smaller than the number of actual network nodes inside the NVP. Therefore, PINE accesses fewer network nodes and links than INE to explore the same spatial area, and hence computes fewer distances online. In addition, this boosts performance, since it eliminates the need to execute complex distance-computation algorithms in the polygons adjacent to NVP(q); instead, the distances can be read from a lookup table in one disk block access. The disadvantage of this approach is the requirement for an off-line process to pre-calculate and store the above network distances.
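As an illustration of the off-line step, the following sketch builds a border-to-border lookup table for a single polygon by running Dijkstra's algorithm from each border point over the nodes of that polygon only. The function names, the dictionary-based table, and the restriction of paths to the interior of one cell are assumptions of this sketch, not details taken from our implementation.

```python
import heapq

def local_dijkstra(cell_graph, source):
    """Shortest distances from source using only the nodes of one polygon."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                      # stale queue entry
        for v, w in cell_graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def border_table(cell_graph, border_points):
    """Map each ordered pair of distinct border points to its network distance."""
    table = {}
    for b in border_points:
        dist = local_dijkstra(cell_graph, b)
        for other in border_points:
            if other != b and other in dist:
                table[(b, other)] = dist[other]
    return table
```

At query time, a distance across a neighboring polygon then becomes a dictionary lookup such as `table[("b1", "b3")]` rather than a new traversal of that polygon's edges.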

Performance evaluation
We conducted several experiments to: 1) compare the performance of PINE with its competitor, the INE approach presented in [5]; 2) evaluate the overhead of the pre-computations for PINE; and 3) compare the performance of PINE with that of VN3. We used a real-world data set obtained from NavTech Inc., which is used for navigation and GPS devices installed in cars and represents a network of approximately 110,000 links and 79,800 nodes of the road system in downtown Los Angeles. The experiments were performed on an IBM ZPro with dual Pentium III processors, 512 MB of RAM, and Oracle 9.2 as the database server. We present the average results of 1000 runs of k nearest neighbor queries where k varied from 1 to 500.

PINE vs. INE
Our experiments show that the total query response time of PINE is up to one order of magnitude less than that of INE. Table 1 shows the results of comparing query response time between PINE and the INE approach proposed in [5]. The first and second columns specify the entities (or points of interest) and their population and cardinality ratio (i.e., number of entities over number of links in the network). Depending on the density of the entities, the time incurred by INE to retrieve the network from the database is between 10.1 (for high densities and larger values of k) and 12.4 (for low densities and higher values of k) times more than that incurred by PINE. This is because, for lower densities of entities, INE requires a larger portion of the network to be retrieved. For example, while only 340 links are retrieved from the database to find the 10 closest restaurants to a query, 17,900 links (equal to 16% of the network) need to be retrieved to find the 10 closest hospitals to the same query object. Note that INE does not retrieve the required links in one step; rather, only a small number of links are retrieved from the database at each step. Note that PINE also requires pre-computed values to be retrieved from the database, and the number of required pre-computed values increases for lower densities of the entities and larger values of k. However, PINE retrieves the required data in only one step, resulting in much faster data retrieval.
Table 2 shows the overhead incurred by the pre-computations required by PINE. As shown in the table, for entities with higher densities (e.g., restaurants), which generate smaller and more numerous NVPs, the average number of nodes inside each NVP and the number of border points per NVP are smaller. This leads to a faster pre-computation process, since the pre-computations are performed in smaller local areas. The third column of the table shows the total number of border-to-border pre-computations, which is almost constant for entities with different densities. This is because when there are more NVPs (e.g., restaurants), the average number of border points is smaller, and when there are fewer NVPs (e.g., hospitals), the average number of border points is larger.
Figure 5 depicts the performance of PINE with respect to the size of the candidate set when kNN queries are performed for different entities. For each value of k (x-axis) we performed 1000 queries in which the location of the query point was randomly selected, and we averaged the results. Two observations can be made from the figure. First, the ratio of the size of the candidate set over k (SKS/k) decreases as k increases. For example, while 7 candidates are selected when k = 3 (2.33 times the value of k), only 31 candidates are selected when k = 20 (1.55 times the value of k). The figure also shows that for large values of k, the size of the candidate set becomes very close to k. The reason is that, as k increases, once a generator g is explored, it becomes more likely that some of its adjacent generators have already been explored (i.e., are already in the candidate NN queue) and no longer need to be examined. This is a very important feature of PINE, since for large values of k the average number of points of interest that must be examined per requested neighbor decreases significantly. The second observation is that PINE behaves independently of the density of the points of interest and their distribution in the network. For example, while Restaurants have a cardinality ratio almost 5 times that of Parks, the difference between the corresponding generated candidate sets is only 5.58% (for k = 500) to 4.2% (for k = 3). This means that whether the points of interest are very dense or sparsely scattered in the network, the performance of PINE does not change. This is because the average number of adjacent generators is "independent" of the density of the points of interest, their distribution, and the underlying network (see [2] for further details).
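The two ratios quoted above follow directly from the candidate counts. The short check below recomputes them; the helper name `candidate_ratio` is introduced only for this illustration.

```python
def candidate_ratio(candidates, k):
    """Size of the candidate set divided by the requested number of neighbors."""
    return candidates / k

# Values quoted in the text: 7 candidates for k = 3, 31 candidates for k = 20.
ratio_k3 = round(candidate_ratio(7, 3), 2)     # 2.33
ratio_k20 = round(candidate_ratio(31, 20), 2)  # 1.55
```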

PINE vs. VN3
Our experiments show that the query response time of PINE is about one half of that of VN3. Table 3 shows the results of comparing query response time between PINE and the VN3 approach proposed in [9]. As shown in the table, when k = 1, and regardless of the density of the entities, both PINE and VN3 generate the result set almost instantly. This is because a simple Contain() function is enough to find the first NN. However, PINE has slightly larger CPU and disk times. This is due to the fact that, with the VN3 method, the distances from each border point to all the nodes inside the polygons that contain the border point are pre-computed in an off-line process. With PINE, Contain() returns the generator point of NVP(q) (the first NN) but not the distance to it. Hence, Dijkstra's method is applied locally, using the local edges and nodes, to compute the distance to the first NN. However, for all other values of k and for different data densities, PINE outperforms VN3 in most cases in terms of CPU processing and disk times (see Fig. 7). VN3 CPU time tends to be on average 8.25 times more than that of PINE, and VN3 DISK time tends to be on average 2.7 times more than that of PINE. VN3 DISK time is higher because it has to access both the pre-computed border-to-border (inter-cell) and query-to-border (intra-cell) distances stored with the polygons, while PINE has to access only the border-to-border (inter-cell) distances stored with the polygons. VN3 CPU time is high because VN3 updates the distance from the query point to the

Table 4 shows the overhead incurred by the pre-computations required by VN3. The third column of the table shows the total number of border-to-border pre-computations (inter-cell computations), which is exactly the same as for PINE (shown in Table 2). The fourth column shows the extra computations (intra-cell computations) that are required for VN3.
Using Tables 2 and 4, we compare the total required pre-computations for both PINE and VN3. One observation is that VN3 requires on average a larger number of pre-computations than PINE (due to the need for the extra intra-cell computations); however, this ratio decreases as the density of the data set increases. This is because for entities with higher densities (e.g., restaurants), which generate smaller and more numerous NVPs, the average number of nodes inside each NVP and the number of border points per NVP are smaller. This leads to a faster pre-computation process, since the pre-computations are performed in smaller local areas. For example, for the hospital data set, VN3 requires 38.8 times the number of computations required by PINE, while it requires only 6.5 times the number of computations required by PINE for the restaurant data set (a higher-density set).
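To make the source of this ratio concrete, the sketch below counts pre-computed distances per polygon under a simplified model: one distance per unordered pair of border points (inter-cell) and one distance per border point and interior node (intra-cell). Both the counting model and the numbers in the example are assumptions for illustration; they are not the values reported in Tables 2 and 4.

```python
def precomputation_counts(cells):
    """cells: list of (border_points, inner_nodes) tuples, one per polygon.

    Returns (inter_cell, intra_cell) counts under the simplified model above.
    """
    inter = sum(b * (b - 1) // 2 for b, _ in cells)   # border-to-border pairs
    intra = sum(b * n for b, n in cells)              # border-to-inner-node pairs
    return inter, intra

def vn3_to_pine_ratio(cells):
    """Total VN3 pre-computations over PINE pre-computations (inter-cell only)."""
    inter, intra = precomputation_counts(cells)
    return (inter + intra) / inter

# Hypothetical layouts: one large polygon versus ten small ones.
sparse = [(10, 200)]
dense = [(4, 20)] * 10
```

With these made-up layouts, `vn3_to_pine_ratio(sparse)` is larger than `vn3_to_pine_ratio(dense)`, which is the same direction as the trend described above.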
The distance pre-computations are usually performed offline; hence, they should not affect the overall performance of VN3. However, as shown in Table 3, the time required by the database to retrieve the links from the network online is the dominant factor and the CPU times are almost negligible. Since VN3 requires a larger amount of pre-computed values to be retrieved from the database than PINE, PINE outperforms VN3. See Fig. 8 for a complete comparison between PINE, INE, and VN3 in terms of CPU, disk, and total query processing time for the restaurants data.

Conclusion
In this paper we presented a novel approach for k nearest neighbor queries in spatial network databases. Our approach, PINE, is based on pre-calculating the network Voronoi polygons (NVPs), pre-computing some network distances, and Dijkstra's algorithm. We showed how NVPs can immediately be used to find the first nearest neighbor of a query object. We also showed how the pre-computed distances expedite the process of finding the other nearest neighbors. The main features of PINE are as follows: 1) PINE's CPU time outperforms that of both INE and VN3, the only other approaches proposed for kNN queries in spatial network databases. It outperforms INE by a factor of 5.41 and VN3 by a factor of 8.25, depending on the value of k and the density of the points of interest. 2) PINE's DISK time outperforms that of both INE and VN3. It outperforms INE by a factor of 11.1 and VN3 by a factor of 2.7, depending on the value of k and the density of the points of interest. The time required by the database to retrieve the links from the network is the dominant factor and the CPU times are almost negligible. Hence, we can conclude that the total query response time of PINE is up to 11.1 times faster than that of INE, and up to 2.7 times faster than that of VN3. PINE's algorithm results in up to 2.33 times fewer candidates than the traditional approaches. In addition, the size of PINE's candidate set has less variance across different query point locations and densities of the points of interest. Consequently, the query response time becomes more deterministic, which is an important feature for many real-time kNN query applications.
As with VN3, the pre-computation required by PINE has low computation and space complexities, because the pre-computations are performed in local areas as opposed to across the entire network. VN3 and INE progressively return the k nearest neighbors of a query point (i.e., at each iterative step they compute the exact next nearest neighbor and the shortest distance to it), which is vital for an interactive real-time system such as a navigation system. With PINE, in contrast, the k nearest neighbors are returned at the end of the search algorithm. However, during each step it provides some candidates for the k nearest neighbors, which is useful for systems where the exact NNs are not required. As future work, we plan to extend PINE to address similar kNN queries such as group kNN and constrained kNN, and to find the actual shortest path between a query and its closest neighbors.

Fig. 7. Performance of PINE vs. VN3 for restaurants data.

Fig. 8. Performance comparison of PINE, INE, and VN3.

Table 1
Query processing time of PINE vs. INE

Table 2
Overhead of PINE pre-computations
Columns: Entities; Points inside each NVP; Average BPs per NVP; Number of pre-comp.

Note that for the given data set, restaurants and hospitals represent the entities with the maximum and minimum cardinality ratios. As shown in Table 1, when k = 1, and regardless of the density of the entities, PINE generates the result set almost instantly. This is because a simple Contain() function is enough to find the first NN. However, depending on the density of the entities, the INE approach requires between 0.49 and 12.4 seconds to provide the first NN. For all values of k and for different data densities, PINE always outperformed INE in terms of CPU processing times (values inside "( )") (see Fig. 6). INE CPU time tends to be on average 5.42 times more than that of PINE, and INE DISK time tends to be on average 11.11 times more than that of PINE. This is because INE explores a larger set of the exact network edges and nodes, while PINE uses the pre-computed distances stored with the polygons. It is obvious from Table 1 that the time required by the database to retrieve the links from the network is the dominant factor and the CPU times are almost negligible. Hence, we can conclude that the total query response time of PINE is up to 11.1 times faster than that of INE.

Table 3
Query processing time of PINE vs. VN3

INE only the border-to-border distances for the newly explored neighbor (one polygon) is updated in the Candidate queue. It is obvious from Table 3 that the time required by the database to retrieve the links from the network is the dominant factor and the CPU times are almost negligible. Hence, we can conclude that the total query response time of PINE is up to 2.7 times faster than that of VN3.