
This paper designs an efficient method for processing continuous skyline queries in road networks. Existing studies on continuous skyline queries consider only static road networks. This is a significant limitation, because road conditions change constantly. To support dynamically weighted road networks, this paper proposes a distributed skyline query method based on grid partitioning. The method combines a distributed computing framework with road network preprocessing: multiple parallel computing nodes are allocated and organized in grids. With this approach, the road network is simplified to a much smaller hub graph, which significantly reduces the query load on the central node. Theoretical analysis and experimental results both show that the proposed method achieves fast response times and a good balance between response time and accuracy. We therefore conclude that the proposed method is well suited to processing continuous skyline queries in dynamically weighted road networks.

Providing location-based services (LBSs) [

The skyline query was first introduced into the database domain in 2001 by Börzsönyi et al., who proposed two basic skyline query algorithms: block nested loops and divide and conquer (D&C) [

More recently, researchers have studied the continuous skyline query problem of moving objects in road networks from a variety of perspectives. In [

In skyline query processing for road networks, computing the shortest paths between moving objects accounts for most of the computation. Dijkstra’s algorithm [

In practice, queries that use the skyline-based moving object recommendation approach face three key challenges. First, moving-object data are collected by many sensors distributed throughout the road network. Traditional centralized data processing wastes large amounts of resources on data transmission, which lowers efficiency. Second, moving-object data are voluminous and widely scattered, so a centralized approach cannot meet the needs of real-time, on-demand applications. Finally, road networks are often dynamic rather than static: edge weights change over time because of traffic congestion and road maintenance. In general, therefore, an optimum route between vertices cannot be determined from precomputed road network data alone. Instead, the computation must be based on real-time data. However, because of the vast amount of data acquired from road networks, computation performed on a single node cannot meet the real-time requirement.

Existing skyline query methods for road networks calculate the shortest distance between moving objects in one of two ways. The first approach computes distances on the fly. The two objects are treated as two vertices in the road network, and a shortest path algorithm (e.g., Dijkstra’s algorithm) computes the distance directly. This approach also works for dynamically weighted road networks, but the required computation grows rapidly with the scale of the road network. The second approach precomputes and stores the shortest distances between the vertices of the road network. At query time, the shortest distance between two objects is obtained by looking up the shortest distance between the vertices nearest to each object. This approach yields faster query response times. However, it cannot be applied to dynamically weighted road networks, because the expensive precomputation would have to be repeated whenever edge weights change. Therefore, this paper proposes the grid-skyline query (Gsky) to strike a balance between these two approaches in large-scale dynamically weighted road networks.

Gsky divides a dynamic road network into small, manageable grids. The entire road network is partitioned into a finite number of grids, and a computing node is allocated to each grid. Each computing node collects and updates information about all moving objects in its grid and maintains the local distances between all vertices of the grid. In addition, a single central node coordinates the entire system. When a query is submitted, the central node gathers the required moving-object data and local distance information from the relevant computing nodes. It then computes the distances between the moving objects in real time. Finally, Gsky returns to the user a skyline set that is updated as the distances between the moving objects change.

Gsky handles moving-object information with distributed computing nodes. As a result, the central node is activated only when there is a query, and only the moving objects of interest are involved in the computation. This avoids a large amount of data transmission. The computing nodes update the distances between local vertices in parallel, and the required information is sent to the central node only when there is a query. The computing workload of the central node is therefore significantly reduced. The central node needs to maintain only the topology of the grids, which also reduces its data maintenance cost.

A road network can be abstractly defined as a weighted undirected graph G(V, E, W). The vertex set V denotes all intersections in the road network, and the edge set E denotes the road segments. w represents the length of each road, taking traffic congestion into account, with w

Example road network graph G with four grid partitions.

A moving object is defined as a point p(m, e, pos) moving along an edge of graph G. Here, m represents the moving object’s nonlocational attributes (i.e., symbol, type, and rating), e represents the edge, and pos is the distance along e from the edge’s starting point to the moving object. The starting point of each edge is one of its two endpoints, chosen according to a fixed rule.

The entire road network is divided into grids of equal size N × M. Each grid is defined as g(…); in the example road network, the four grids are $g_1$ to $g_4$.
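As an illustration of the grid partition, the following Python sketch assigns each vertex to a cell of a uniform cols × rows partition using its planar coordinates, and then marks bridge vertices. It is a sketch under stated assumptions, not the authors' implementation. It assumes that vertex coordinates are available and that edges are assigned by their endpoints rather than split at grid boundaries. It also treats a vertex as a bridge vertex if at least one incident edge leads into a different grid, which is a hypothetical stand-in for the paper's "outermost crossing points". The names `assign_grids` and `bridge_vertices` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Cell = Tuple[int, int]

def assign_grids(coords: Dict[Hashable, Tuple[float, float]],
                 width: float, height: float, cols: int, rows: int) -> Dict[Hashable, Cell]:
    """Map each vertex to the cell (gx, gy) of a uniform cols x rows grid.

    coords maps vertex -> (x, y) with 0 <= x <= width and 0 <= y <= height.
    """
    cell_w, cell_h = width / cols, height / rows
    grid_of = {}
    for v, (x, y) in coords.items():
        gx = min(int(x // cell_w), cols - 1)  # clamp points lying on the far boundary
        gy = min(int(y // cell_h), rows - 1)
        grid_of[v] = (gx, gy)
    return grid_of

def bridge_vertices(edges: Iterable[Tuple[Hashable, Hashable, float]],
                    grid_of: Dict[Hashable, Cell]) -> Set[Hashable]:
    """Assumed rule: a vertex is a bridge vertex if an incident edge leads into another grid."""
    bridges = set()
    for u, v, _length in edges:
        if grid_of[u] != grid_of[v]:
            bridges.update((u, v))
    return bridges
```

Each computing node would then own the vertices mapped to its cell, and only the bridge vertices need to be exposed to the central node.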

Assume that a moving object is identified as q(m, e,

Theorem

The distance between p and q is defined as the length of the shortest route between p and q in G; this distance is denoted as d(p, q).

For graph G, if

Example hub graph corresponding to the road network presented in Figure

The shortest route between any two vertices in a hub graph corresponds to the shortest route in the original road network.
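To make the hub graph and the theorem above concrete, the sketch below builds a hub graph from a partitioned road network. On a small example, it then checks that shortest distances between bridge vertices in the hub graph match those in the original graph. For illustration only, it assumes that bridge edges are the original edges whose endpoints lie in different grids. It also assumes that each virtual edge carries the shortest distance between two bridge vertices of the same grid, computed using only that grid's vertices (the work a computing node would do). `build_hub_graph` and the `(u, v, length)` edge format are hypothetical.

```python
import heapq
from collections import defaultdict

def dijkstra(adj, source, allowed=None):
    """Shortest distances from source; if `allowed` is given, only those vertices are entered."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            if allowed is not None and v not in allowed:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def build_hub_graph(edges, grid_of, bridges):
    """Hub graph = bridge edges (cross-grid road segments) + virtual edges
    (in-grid shortest distances between bridge vertices of the same grid)."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    hub = defaultdict(list)
    for u, v, w in edges:  # bridge edges keep their original length
        if grid_of[u] != grid_of[v]:
            hub[u].append((v, w))
            hub[v].append((u, w))
    members = defaultdict(set)
    for vertex, g in grid_of.items():
        members[g].add(vertex)
    for verts in members.values():  # one computing node's work per grid
        local = sorted(b for b in verts if b in bridges)
        for i, b1 in enumerate(local):
            dist = dijkstra(adj, b1, allowed=verts)  # the search stays inside the grid
            for b2 in local[i + 1:]:
                if b2 in dist:
                    hub[b1].append((b2, dist[b2]))
                    hub[b2].append((b1, dist[b2]))
    return dict(hub)
```

Non-bridge vertices disappear from the hub graph, which is what makes it much smaller than the original network.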

Consider the case in which the shortest route between two vertices in the hub graph does not include any virtual edges. On the basis of Definition

The shortest route between any two points in a hub graph traverses no more than one virtual edge in the same hub.

Assume that the shortest route between vertices p and q in hub graph … $(m_1, m_2) + \cdots +$ …,

From Definition

The nonlocational attributes of interest to the user, together with the distance d between the moving object and the user’s position, form the attribute space A. Each moving object defines a data point $p(a_1, a_2, \ldots, a_n, d)$ in A, with $a_1, a_2, \ldots, a_n, d$

Let C be the set of data points of interest to the user in a given road network. The points in C that are not dominated by any other point in C form the skyline set, denoted SKY(C) =
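The dominance test and the skyline definition can be sketched as follows. This Python fragment assumes, hypothetically, that smaller values are preferred for every attribute, including the distance d. A rating that should be maximized could be negated first. The fragment computes SKY(C) by brute force, comparing every pair of points, which is reasonable only for small candidate sets.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # (a1, ..., an, d)

def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q in every attribute and strictly better in at least one.
    Assumption: smaller is better for every attribute, including the distance d."""
    return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))

def skyline(points: Sequence[Point]) -> List[Point]:
    """Brute-force SKY(C): the points of C not dominated by any other point of C.
    A point never dominates itself, so no self-exclusion is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Consider C = {(3, 5.0), (2, 7.0), (4, 4.0), (3, 6.0), (5, 5.0)}. The last two points are dominated by (3, 5.0), so SKY(C) keeps the first three.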

The overall structure of the Gsky approach is presented in Figure

Schema of our Gsky approach.

The procedure is summarized as follows. When there is no query task, each computing node regularly updates the location information of its moving objects. It also computes and updates the distances between vertices as the relevant edge weights change. Once a query is initiated, the central node requests the regional location information of the moving objects of interest from the computing nodes and then processes the resulting global location information. Next, the central node computes the distance between each moving object of interest and the user’s position, gathering the virtual-edge information of each relevant hub from the computing nodes. Finally, the central node generates a skyline set from these distances and the nonlocational attributes of the moving objects, and returns this skyline set to the user.

The distributed computation structure is based on an approach known as MapReduce [

Maintaining the location information of a moving object involves both regional and global location information. The computing node in each grid updates the regional location information of each moving object, using the road information of the grid in which the computing node is located. According to Definition

When a query is submitted by a user, the central node requests the global location information from the computing nodes. The global location information of the moving objects can be described as (m, e, pos). According to Theorem

For example, the system allocates a computing node to maintain the moving object data in g3 shown in Figure

These calculations are performed in a single MapReduce pass; the detailed procedure is given in Algorithm

to mapper:

P.get(); // read object assigned to current mapper from input cache

foreach p in P

P.send(); // send the result to output cache

to reducer:

P.get(); // gather objects from each mapper
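The conversion performed by this MapReduce job can be simulated in plain Python, without Hadoop, as one map step per grid followed by a single reduce step. The record layout and the `offsets` table are hypothetical. The sketch assumes that each computing node stores an object's position relative to the point where the object's edge enters its grid. It also assumes that the central node knows the length from the edge's global starting point to that entry point, so the global position is the sum of the two.

```python
def mapper(grid_id, records, offsets):
    """Map step on one computing node: regional -> global location.

    records: (obj_id, m, edge_id, local_pos), with local_pos measured from where edge_id
    enters grid_id (hypothetical layout); offsets[(edge_id, grid_id)] is the length from
    the edge's global starting point to that entry point (0.0 for the grid holding the start).
    """
    for obj_id, m, edge_id, local_pos in records:
        yield obj_id, (m, edge_id, offsets[(edge_id, grid_id)] + local_pos)

def reducer(mapped_streams):
    """Reduce step on the central node: gather the global (m, e, pos) of every object."""
    table = {}
    for stream in mapped_streams:
        for obj_id, location in stream:
            table[obj_id] = location
    return table

# Usage: edge "e7" starts in grid g1 and enters grid g2 at length 3.5 from its start.
offsets = {("e7", "g1"): 0.0, ("e7", "g2"): 3.5}
regional = {"g1": [("o1", "cafe", "e7", 1.0)], "g2": [("o2", "hotel", "e7", 2.0)]}
table = reducer(mapper(g, recs, offsets) for g, recs in regional.items())
# table["o2"] == ("hotel", "e7", 5.5)
```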

Suppose that the user’s position on the road network is fixed and the moving objects move at a constant speed. To obtain the skyline set of interest to the user, the road network distances between the user and the moving objects must be computed. If the locations of the user and the moving objects of interest are treated as vertices in the road network, Dijkstra’s algorithm [

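More specifically, assume that two arbitrary points p and q lie on different roads, ab and cd, respectively. Let a be the starting point of road ab and c the starting point of road cd, and denote the (weighted) lengths of the two roads by $|ab|$ and $|cd|$. A route from p to q must then leave ab through a or b and enter cd through c or d. This gives four candidate routes:

$r_1(p, a, c, q)$, $r_2(p, a, d, q)$, $r_3(p, b, c, q)$, $r_4(p, b, d, q)$.

The lengths of the four routes are as follows:

$$d(r_1) = p.pos + d(a, c) + q.pos$$

$$d(r_2) = p.pos + d(a, d) + (|cd| - q.pos)$$

$$d(r_3) = (|ab| - p.pos) + d(b, c) + q.pos$$

$$d(r_4) = (|ab| - p.pos) + d(b, d) + (|cd| - q.pos)$$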

Therefore, $d(p, q) = \min(d(r_1), d(r_2), d(r_3), d(r_4))$. That is, the distance between p and q can be obtained from the distances between the endpoints of the two roads.

The central node computes the distance between two arbitrary points in the road network on the corresponding hub graph. The central node regularly updates the lengths of the bridge edges, whereas the computing nodes regularly update the lengths of the virtual edges. To calculate a distance, the central node runs Dijkstra’s algorithm from one endpoint and stops once the algorithm reaches the other endpoint. When route lengths need to be compared, MapReduce collects the lengths of the relevant virtual edges from the computing nodes. When calculating the optimum route, Theorem
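The query-time search on the hub graph can be sketched as Dijkstra's algorithm with early termination and lazily fetched virtual-edge lengths. In this hypothetical Python version, the central node holds bridge-edge lengths locally. The `fetch_lengths` callback stands in for the MapReduce request that collects one grid's virtual-edge lengths from its computing node. Lengths are fetched only for grids that the search actually reaches, and each grid is contacted at most once per query. The function returns the distance together with the number of grids contacted.

```python
import heapq

def hub_shortest_path(source, target, bridge_adj, virtual_adj, fetch_lengths):
    """Dijkstra on the hub graph that stops as soon as `target` is settled.

    bridge_adj[u]  -> [(v, length)]   bridge edges, lengths held by the central node
    virtual_adj[u] -> [(v, grid_id)]  virtual edges, lengths held by computing nodes
    fetch_lengths(grid_id) -> {(u, v): length} keyed with u < v (hypothetical request
    standing in for the MapReduce job that collects one grid's virtual-edge lengths).
    Returns (distance, number of grids contacted).
    """
    cache = {}  # grid_id -> virtual-edge lengths fetched during this query
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == target:
            return d, len(cache)  # early stop: the other endpoint has been reached
        neighbours = list(bridge_adj.get(u, ()))
        for v, grid_id in virtual_adj.get(u, ()):
            if grid_id not in cache:  # contact each grid at most once per query
                cache[grid_id] = fetch_lengths(grid_id)
            neighbours.append((v, cache[grid_id][tuple(sorted((u, v)))]))
        for v, w in neighbours:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf"), len(cache)
```

When the target is close to the source, the search can finish before some grids are reached. In that case, those computing nodes are never contacted.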

For example, suppose that in the road network shown in Figure

Algorithm

to reducer:

S.include(s); // define set S and add the source s to S

T=V-S; // T holds the unsettled vertices; set V includes all vertices in the road network

while(T)

to mapper:

Vedge.get(); // get virtual edges from the input cache

Vedge.length=GetLength(Vedge.id); // look up the current length of each virtual edge

Vedge.send(); // send the lengths of the virtual edges to the output cache

Finally, the computing nodes must keep the shortest path trees among their vertices up to date and return virtual-edge lengths to the central node as necessary. Because the weights change constantly, rebuilding the shortest path trees from scratch after every update would be time-consuming. It could also cause outdated information to be sent from the computing nodes to the central node. Therefore, an incremental updating algorithm [
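The idea of incremental updating can be illustrated for the simplest case, an edge that becomes shorter. The Python sketch below is not the incremental algorithm cited above. It is a standard decrease-only repair that reruns Dijkstra's algorithm starting only from the endpoint whose distance improves, so only the affected part of the shortest path tree is touched. Handling weight increases, which can invalidate whole subtrees, requires a more involved method such as the cited one. The names `decrease_edge`, `dist`, and `parent` are hypothetical.

```python
import heapq

def decrease_edge(adj, dist, parent, u, v, new_w):
    """Repair a shortest path tree (dist, parent) after edge (u, v) becomes shorter.

    Decrease-only: any vertex whose distance improves must now reach the source
    through the changed edge, so a Dijkstra pass seeded at that edge suffices.
    adj is an undirected adjacency dict {vertex: {neighbour: weight}}.
    Returns the set of vertices whose distance improved.
    """
    inf = float("inf")
    adj[u][v] = adj[v][u] = new_w
    heap = []
    for x, y in ((u, v), (v, u)):  # at most one direction can improve
        if dist.get(x, inf) + new_w < dist.get(y, inf):
            dist[y] = dist[x] + new_w
            parent[y] = x
            heapq.heappush(heap, (dist[y], y))
    improved = set()
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale entry superseded by a shorter distance
        improved.add(x)
        for y, w in adj[x].items():
            if d + w < dist.get(y, inf):
                dist[y] = d + w
                parent[y] = x
                heapq.heappush(heap, (dist[y], y))
    return improved
```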

Once the system finishes computing the distances between the moving objects of interest and the user’s position, the central node updates the skyline set using the distance attribute and the other nonlocational attributes. The number of moving objects of interest is usually limited, so an index-based skyline query algorithm is unnecessary. Among nonindexing skyline algorithms, the sort-filter-skyline (SFS) algorithm [

The nonlocational attributes of p and q do not change as the objects move, so the relationship between p and q on those attributes does not change. If the relationship between p and q on the distance attribute d also remains the same, then their relationship over all attributes, and hence their dominance relationship, does not change. Therefore, on the basis of Definition

To incorporate pruning rule 1 into the SFS algorithm, the system maintains two global tables, $M_1$ and $M_2$. $M_1$ records the dominance relationships between data points, and $M_2$ records the relationships between their distance attributes. SFS first sorts all data points by the distance attribute. At the same time, the system updates $M_2$ and then takes each point p from the pending skyline candidates in order of increasing distance. This process consists of the following three steps.

(1) Query $M_2$. If the relationship between the distance attributes of p and a point in SKY has not changed, obtain their dominance relationship from $M_1$. Otherwise, compute the dominance relationship between p and that point and update $M_1$.

(2) If there is a point dominating p in SKY, discard p; otherwise, add p to SKY.

(3) Return SKY when all of the points have been processed.
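The three steps above can be sketched in Python as an SFS variant that caches dominance results across queries. The sketch assumes that smaller values are preferred for every attribute and that each object's nonlocational attributes are fixed. M1 caches whether one point dominates another. M2 caches the sign of their distance difference at the time of that test, so a dominance test is recomputed only when the distance order of the pair has changed. Ties in distance are broken by the remaining attributes, so that a point is never dominated by a point that follows it in the sorted order; sorting by distance alone does not guarantee this. `IncrementalSFS` is a hypothetical name.

```python
def sign(x):
    return (x > 0) - (x < 0)

def dominates(p, q):
    """Smaller is better in every attribute (assumption)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

class IncrementalSFS:
    """SFS skyline with dominance results reused across queries (pruning rule 1).

    attrs[obj] holds the fixed nonlocational attributes of each object.
    M1[(s, p)] caches dominates(s, p); M2[(s, p)] caches sign(d_s - d_p) at that time.
    """

    def __init__(self, attrs):
        self.attrs = attrs
        self.M1, self.M2 = {}, {}
        self.comparisons = 0  # dominance tests actually evaluated

    def query(self, dist):
        point = {o: (*self.attrs[o], dist[o]) for o in dist}
        # Sort by distance, breaking ties by the other attributes.
        order = sorted(dist, key=lambda o: (dist[o], self.attrs[o]))
        sky = []
        for p in order:
            dominated = False
            for s in sky:
                key, rel = (s, p), sign(dist[s] - dist[p])
                if self.M2.get(key) != rel:  # distance order changed: recompute (step 1)
                    self.M1[key] = dominates(point[s], point[p])
                    self.M2[key] = rel
                    self.comparisons += 1
                if self.M1[key]:  # step 2: p is dominated by a skyline point
                    dominated = True
                    break
            if not dominated:
                sky.append(p)
        return sky  # step 3
```

If the objects move between queries but keep their relative distance order, `comparisons` stays unchanged. This is the saving that pruning rule 1 aims for.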

Algorithm

… $M_1$, comparison of distance $M_2$.

A.SortByDistance(); // sort points in A by distance attribute

for(i=1;i<=A.MaxNum;i++) // update M2

SKY.include(A

for(i=2;i<=A.MaxNum;i++) // compute SKY

return SKY;

Most of Gsky’s computation is spent calculating the distances between the moving objects and the user, so this subsection analyzes how that calculation is carried out. For convenience, the road network is modeled as an n × n grid network divided into M square grids of the same size, where $1 \le M \le n^2$. The road network is assumed to contain N vertices. Given the above, then

Here,

Bridge vertices are the outermost crossing points, identified as hollow circles in Figure

Example of bridge vertices (i.e., the hollow circles) within a grid (i.e., the bounding box).

Here, assuming that the number of bridge vertices within each grid is

Taking the example with road network shown in Figure

Further, assuming that the total number of bridge vertices in the hub graph is

Next, assuming that the number of interested moving objects is P, the computational capacity for a single computing node is

Next, on the basis of (^{3}); for the central node, the computational complexity is O(PMN).

Communication in the system consists of two parts. The first part is the traffic (denoted $D_1$) generated when the central node collects information from the computing node in each grid. This traffic equals the number of moving objects of interest, i.e., $D_1 = P$. The second part is the traffic generated by queries for the distances between the moving objects of interest and the user: the central node requests virtual-edge information from the computing nodes, and this traffic is denoted $D_2$. In the worst case, each of the P distance computations requires the central node to collect the virtual edges of all M grids. Assuming that the number of bridge vertices in each grid is

Given (

From (^{2}) for the road network. Therefore, when P ≤ N, the traffic through the system does not exceed the scale of data for the road network.

From Section ^{2}]. When M = 1, the computational complexity is $O(PN)$ for the central node and $O(N^3)$ for each computing node. In this case, Gsky degenerates into a precomputing algorithm on a single computing node. Conversely, when $M = n^2$, the computational complexity is $O(PN^2)$ for the central node and $O(1)$ for each computing node. In this case, Gsky degenerates into Dijkstra’s algorithm running on a single central node. To balance the computational load between the central node and the computing nodes, a suitable value of M must be chosen. Assume that the computational capacity of the central node is similar to that of the computing nodes. The load is then balanced when the computational load of the central node equals that of each computing node, i.e., from (

From (

Taking the example with road network shown in Figure
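The balance condition can be worked out explicitly under one stated assumption: that the cost of each computing node is $O((N/M)^3)$ and the cost of the central node is $O(PMN)$. This assumption reproduces both limiting cases given above, M = 1 and $M = n^2$ with $N \approx n^2$. Ignoring constant factors, equating the two loads gives:

$$PMN = \left(\frac{N}{M}\right)^{3} \;\Longrightarrow\; PM^{4}N = N^{3} \;\Longrightarrow\; M^{*} = \left(\frac{N^{2}}{P}\right)^{1/4} = \frac{\sqrt{N}}{P^{1/4}}$$

For illustration only, the hypothetical values $N = 10^4$ and $P = 16$ give $M^{*} = (10^8/16)^{1/4} = 50$ grids. In practice, $M^{*}$ would also be clamped to the range $1 \le M \le n^2$.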

In real road networks, however, vertices are not uniformly distributed. To balance the computational load between the nodes, the road network may therefore be divided into grids of unequal size, so that each grid contains a similar number of vertices. Balancing the load across nodes improves the efficiency of the entire system.
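One simple way to obtain such an uneven but load-balanced partition is sketched below in Python. Vertices are first split into vertical strips with equal vertex counts, and each strip is then split into cells with equal vertex counts, similar to one level of a k-d tree. This is an illustrative assumption, not the partition used in the experiments. The resulting cells are rectangles of different sizes, and `balanced_partition` is a hypothetical name.

```python
def balanced_partition(coords, cols, rows):
    """Assign vertices to cols x rows cells holding roughly equal numbers of vertices.

    coords maps vertex -> (x, y). Vertices are split into `cols` equal-count strips
    by x, then each strip is split into `rows` equal-count cells by y.
    """
    grid_of = {}
    by_x = sorted(coords, key=lambda v: coords[v][0])
    n = len(by_x)
    for i in range(cols):
        strip = by_x[i * n // cols:(i + 1) * n // cols]
        strip.sort(key=lambda v: coords[v][1])
        m = len(strip)
        for j in range(rows):
            for v in strip[j * m // rows:(j + 1) * m // rows]:
                grid_of[v] = (i, j)
    return grid_of
```

Because the output has the same form as a uniform grid assignment (vertex to cell), bridge vertices and virtual edges can be derived from it in the same way.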

The experiments used six PCs connected by 100 Mbps Ethernet. Each PC had a 2 GHz Intel Core 2 processor, 2 GB of RAM, a 260 GB hard drive, and the CentOS 6.2 operating system. The system used Hadoop Online Prototype (HOP) [

Two baseline algorithms were used for comparison with Gsky. First, the Dsky algorithm corresponds to the direct calculation method described in Section

Note that, for both Dsky and Psky, moving objects are managed by the distributed computing system proposed in this work. However, distance calculation and preprocessing are both performed at the central node.

The data used in the experiment were based on a real road network from Beijing [

Accuracy and response time were the two criteria used to evaluate each method. Both criteria were measured for all three algorithms while the scale of the road network was varied (i.e., as the number of vertices increased). Each reported result is the average of 10 runs.

In addition, Gsky and Psky were each evaluated in both synchronous and asynchronous modes. In synchronous mode, the virtual-edge information and the shortest path trees are updated before processing begins, which yields a high level of accuracy. In asynchronous mode, the virtual-edge information and the shortest path trees are updated periodically. Before processing, the system does not check whether the data are up to date; it uses their current state for distance computation, which keeps response times reasonable.

Figure

Query response times at different road network scales in synchronous mode (comparison of basic algorithms).

Figure

Query response times at different road network scales in asynchronous mode (comparison of basic algorithms).

Figure

Accuracy at different road network scales in asynchronous mode (comparison of basic algorithms).

In summary, in synchronous mode, the response time of Gsky increased steadily with the scale of the road network, indicating that Gsky performs well even on relatively large-scale road networks. We therefore conclude that Gsky is the best of the three evaluated methods in synchronous mode. In asynchronous mode, Gsky’s response times were not as short as those of Psky, and its accuracy was not as high as that of Dsky. However, Gsky performed best overall when both criteria are considered. We therefore conclude that, of the three methods, Gsky is the most suitable for real-world applications.

To compare Gsky with existing methods, two representative existing algorithms were implemented in the experimental environment described above. The first is the LBC algorithm in [

For LBC, the number of query points is set to 1, and one computing node is set up to maintain the moving objects and the road network data. This computing node is also responsible for maintaining the

Figure

Query response times at different road network scales in synchronous mode (comparison of existing algorithms).

Figure

Query response times at different road network scales in asynchronous mode (comparison of existing algorithms).

Finally, Figure

Accuracy at different road network scales in asynchronous mode (comparison of existing algorithms).

In summary, compared with the existing algorithms, Gsky maintains reasonable response times while balancing response time against high accuracy in large-scale dynamically weighted road networks.

In this paper, the Gsky algorithm was proposed for processing skyline queries over moving objects in a dynamically weighted road network. In this approach, the road network is first divided into grids so that it can be simplified into a small-scale hub graph. The skyline set of moving objects is then computed on the hub graph and recommended to users. The method supports dynamically weighted road networks and uses a distributed computing structure. This structure spreads the heavy computing load across the computing nodes, which reduces the load on the central node and improves query response times. The system also adopts a data-centric computing approach, in which each local road network is managed locally and information is collected only when necessary. This substantially reduces traffic throughout the system. Theoretical analysis and experiments show that, compared with current methods, Gsky maintains reasonable response times while balancing response time against high accuracy, even in very large-scale dynamically weighted road networks.

In the Gsky algorithm, vertices are not uniformly distributed in real road networks, so partitioning the road network into grids leaves the grids with uneven numbers of vertices. This unbalances the load across the computing nodes and reduces the efficiency of the system. Future research will focus on a partition method that produces a more balanced division of the road network, and on applying it to the proposed distributed system.

A road network which is defined as a weighted undirected graph

Vertex set of G

Edge set of G

Length of each edge

Positions of moving objects which are defined as points on edges

The moving object’s nonlocational attributes

Vertices in V

Edges in E

Lengths in W

Length between the starting points and position of the moving object on the edge

Grid

Vertices that fall in grid g

All edges passing through grid g

Starting points of edges that are defined in grid g

Points in

Global starting point of e

Length between s and

Set of

Length between

The length of the shortest route between p and q in G

Hub graph of G

Bridge vertices

Set of

Bridge edges

Set of

Virtual edges

Set of

Length of bridge edge

Set of

Length of virtual edge

Set of

Route from p to q in G

Data points

Attribute space

Attributes in A

Set of P

Skyline set of C

Number of grids in the grid network

Number of grids which divide the grid network

Number of vertices in the grid network

Number of vertices in one grid

Number of bridge vertices in one grid

Number of bridge vertices in the hub graph

Computational capacity for a single computing node

Computational capacity for the central node

Traffic distribution.

Location-based services

Global positioning system

Divide and conquer

Collaborative expansion

Euclidean distance constraint

Lower bound constraint

Grid-skyline query

Sort-filter-skyline

Hadoop online prototype

Direct skyline query with Dijkstra’s algorithm

Skyline query with precomputing

Skyline query with shortest range.

The datasets analyzed during the current study are available on the website

The authors declare that there are no conflicts of interest regarding the publication of this article.

The authors would like to thank the University of Shanghai for Science and Technology and Shanghai University of International Business and Economics for supporting this work and Dr. Nian X. Zhang for translating the manuscript. This work was supported by the National Natural Science Foundation of China (No. 61170277 and No. 61472256). The authors would like to thank Enago (www.enago.cn) for the English language review.