An Efficient Algorithm for Maximizing Range Sum Queries in a Road Network

Given a set of positive-weighted points and a query rectangle r (specified by a client) of given extents, the goal of a maximizing range sum (MaxRS) query is to find the optimal location of r such that the total weight of all the points covered by r is maximized. All existing methods for processing MaxRS queries assume the Euclidean distance metric. In many location-based applications, however, the motion of a client may be constrained by an underlying (spatial) road network; that is, the client cannot move freely in space. This paper addresses the problem of processing MaxRS queries in a road network. We propose an external-memory algorithm that is suited to large road network databases. In addition, in contrast to the existing methods, which retrieve only one optimal location, the proposed algorithm retrieves all optimal locations. Through simulations, we evaluate the performance of the proposed algorithm.


Introduction
With the widespread use of mobile computing devices [1][2][3][4][5][6][7], location-based services [8] have attracted much attention as one of the most promising applications, whose main functionality is to process location-related queries on spatial databases. Most traditional research in spatial databases has focused on finding nearby data objects (e.g., range queries and nearest neighbor queries [9]) rather than on finding the best location to optimize a certain objective. Recently, the maximizing range sum (MaxRS) query was introduced in [10]. This query is useful in many location-based applications, such as finding the most representative place in a city for a tourist with a limited reachable range or finding the best location for a pizza store with a limited delivery range. Given a set of positive-weighted points and a query rectangle r (specified by a client) of a given size, the goal of a MaxRS query is to find the optimal location of r such that the sum of the weights of all the points covered by r is maximized. Figure 1 shows an example of the MaxRS query, where the query rectangle r has a fixed size and every point is assumed to have a weight of 1. In the figure, the center of the solid-lined rectangle is the optimal location of r because the solid-lined rectangle covers the largest number of points (i.e., 3).
To process MaxRS queries, Choi et al. [10] proposed an external-memory algorithm, while Imai and Asano [11] proposed an internal-memory algorithm. Tao et al. [12] proposed a solution for approximate MaxRS queries, each of which retrieves a rectangle whose covered weight is at least (1 − ε)m*, where m* is the optimal covered weight and ε is an arbitrary constant between 0 and 1. All of these studies target Euclidean spaces. In many real-life location-based services, however, the motion of a client may be constrained by an underlying (spatial) road network; that is, the client cannot move freely in space. Consider a tourist service as an example, where a tourist (i.e., client) tries to find the hotel whose location is close to as many sightseeing spots as possible (e.g., within a 1.5 km walk of the hotel). A MaxRS query fits this scenario. However, the existing MaxRS query processing methods cannot be applied here because movement between the hotel and each sightseeing spot is confined to the underlying (spatial) road network, and thus the actual distance between two locations can differ significantly from their Euclidean distance. Figure 2 illustrates this difference: the Euclidean distance between facilities f2 and f4 is about 1.24, whereas traveling from f2 to f4 in real life requires passing through V5 and V4, for a total length of about 3.74, which is roughly three times the Euclidean distance. With this problem in mind, we study, for the first time to the best of our knowledge, the problem of processing MaxRS queries in a road network, where the distance between two points is determined by the length of the shortest path connecting them (i.e., the network distance [13]). Figure 2 shows an example of a road network, which consists of 5 nodes (square vertices) and 7 edges.
In the figure, there are 4 facilities (weighted points), each of which, denoted by f, is associated with a positive weight w(f) indicating the importance of f. The numbers in parentheses next to nodes and facilities show their respective coordinates. Note that this paper assumes that all facilities are located on edges of the road network. A MaxRS query in a road network is then defined as follows: given a set F of facilities and a radius d, the MaxRS query finds all the locations p (on the road network) that maximize the total weight of all the facilities whose network distance to p is less than or equal to d.
For the road network in Figure 2, Figure 3 shows an example of a MaxRS query with radius 1.5 (km), where the weight of each facility is 1. The network distance from every point on the stage to each of the three facilities f1, f2, and f3 is less than or equal to 1.5. Hence, the total weight of the facilities within network distance 1.5 of any point on the stage is 3, which is the maximum in this scenario. The stage is therefore an optimal result of this MaxRS query, and the user can choose any hotel on it.
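The query definition above can be made concrete with a small brute-force sketch. It is only an illustration under stated assumptions, not the algorithm proposed in this paper: the graph `adj`, the facility tuples `(start, end, offset, weight)`, and the function names are hypothetical, and only node locations are evaluated, whereas the actual query considers every point on every edge.

```python
import heapq

INF = float("inf")

def node_distances(adj, source):
    """Dijkstra: shortest network distance from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj[u].items():
            nd = d + length
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def covered_weight(adj, facilities, location, radius):
    """Total weight of facilities whose network distance to a node location is <= radius."""
    dist = node_distances(adj, location)
    total = 0.0
    for start, end, offset, weight in facilities:
        edge_len = adj[start][end]
        # A facility on an edge is reached through one of the two end nodes of that edge.
        d = min(dist.get(start, INF) + offset,
                dist.get(end, INF) + edge_len - offset)
        if d <= radius:
            total += weight
    return total

# Hypothetical path network A - B - C with edge lengths 1.0 and 2.0.
adj = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 2.0}, "C": {"B": 2.0}}
# Hypothetical facilities: (start node, end node, offset from start node, weight).
facilities = [("A", "B", 0.5, 1.0), ("B", "C", 0.5, 2.0), ("B", "C", 1.5, 1.0)]

weights = {n: covered_weight(adj, facilities, n, 1.0) for n in adj}
```

In this toy network, node B covers the first two facilities within radius 1.0, so it is the best of the three node locations.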
In this paper, we propose an external-memory algorithm for MaxRS queries in a road network. The proposed algorithm is suitable for large road network databases. In addition, in contrast to the existing methods, which find only one optimal location, the proposed algorithm finds all optimal locations. This helps clients with diverse interests choose their own best locations by considering additional conditions. The remainder of this paper is organized as follows. In Section 2, the problem is formally defined, and in Section 3, the details of the proposed algorithm are provided. In Section 4, the performance evaluation results are presented. In Section 5, related work is reviewed. Finally, Section 6 concludes the paper.

Problem Formulation
A road network is represented by an undirected graph G = (V, E), where V is a set of vertices (i.e., nodes) and E is a set of edges. Let F be a set of facilities, each of which, denoted by f, is located on an edge (in E) and is associated with a positive weight w(f).
Definition 1 (network range and network radius). The network range NR(p) of a point p in a road network consists of all points (in the network) whose network distance to p is less than or equal to the value d, where d is called the network radius of p.
(1) the rectangle intersection query discussed in [14], which is the fundamental idea for processing MaxRS queries in Euclidean space [10].

The Proposed Method
Definition 3 (max-enclosing rectangle query). Given a set P of points and a rectangle r of a given size, a max-enclosing rectangle query finds the location of r such that r encloses the maximum number of points in P. The MaxRS query computes the total weight of the enclosed points, while the max-enclosing rectangle query counts the number of points in the rectangle. Note that when all points have a weight of 1, the result of the MaxRS query equals that of the max-enclosing rectangle query.
Definition 4 (rectangle intersection query). Given a set R of rectangles, a rectangle intersection query finds the area where the largest number of rectangles overlap. Figure 4 shows an example of the max-enclosing rectangle query and an example of the rectangle intersection query. It can be observed from the figure that the optimal location in the max-enclosing rectangle query can be any point in the most overlapped area (i.e., the gray area, where 3 rectangles overlap), which is the outcome of the rectangle intersection query.
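The correspondence between the two queries can be checked with a brute-force sketch. The assumption here is the usual one: a rectangle of size width × height centered at c encloses a point exactly when c lies inside the rectangle of the same size centered at that point. The function names and sample points are hypothetical, every point has weight 1, and the quadratic candidate enumeration is for illustration only; it is not the plane-sweep method of [11] or [14].

```python
def dual_rectangles(points, width, height):
    """One rectangle of the query size centered at each point: (x1, y1, x2, y2)."""
    return [(x - width / 2, y - height / 2, x + width / 2, y + height / 2)
            for x, y in points]

def most_overlapped_point(rects):
    """Rectangle intersection query by brute force.

    The most overlapped area is an intersection of closed rectangles, so its
    lower-left corner combines the left edge of one rectangle with the bottom
    edge of another; testing all such combinations is sufficient.
    """
    xs = sorted({r[0] for r in rects})
    ys = sorted({r[1] for r in rects})
    best_count, best_point = 0, None
    for cx in xs:
        for cy in ys:
            count = sum(1 for x1, y1, x2, y2 in rects
                        if x1 <= cx <= x2 and y1 <= cy <= y2)
            if count > best_count:
                best_count, best_point = count, (cx, cy)
    return best_count, best_point

# Hypothetical unit-weight points and a 2 x 2 query rectangle.
points = [(0, 0), (1, 1), (2, 0), (10, 10)]
width, height = 2.0, 2.0

count, center = most_overlapped_point(dual_rectangles(points, width, height))

# Max-enclosing rectangle view: points enclosed by the rectangle centered at `center`.
enclosed = [p for p in points
            if abs(p[0] - center[0]) <= width / 2 and abs(p[1] - center[1]) <= height / 2]
```

The number of overlapping dual rectangles at `center` equals the number of points enclosed by the query rectangle placed there, which is the transformation the two definitions rely on.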
Our solution is based on the above idea. Consider the example of a MaxRS query in a road network shown in Figure 5. To simplify the discussion, we use a simple road network that consists of two edges (i.e., ⟨V1, V2⟩ and ⟨V2, V3⟩) and two facilities (i.e., f1 and f2) located on these two edges.
In this example, we assume that the weight of each facility is 1 and the network radius is 1. The gray solid segments in Figure 5 indicate the network range NR(f1) of facility f1, and the gray dotted segments indicate the network range NR(f2) of facility f2. Let S be the set of all such segments.
Definition 5 (location-weight). Let p be a location in the road network. The location-weight of p with regard to S equals the total weight of all the segments (in S) that cover p.
Definition 6 (max-segment). A max-segment with regard to S is a segment s such that every point in s has the same location-weight m, and no point in the network has a location-weight higher than m.
From the transformation idea mentioned above, we can see that the overlapping segment in Figure 5 is a max-segment. Because the max-segments in the network together contain all the optimal locations (i.e., the result of the MaxRS query in the road network), we need to find all max-segments in the network to evaluate the MaxRS query.
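The location-weight definition can be illustrated with a minimal sketch for a single edge. The representation is an assumption made for illustration: each segment is a tuple `(lo, hi, weight)` of offsets along one edge, and the sample values are hypothetical rather than taken from Figure 5.

```python
def location_weight(position, segments):
    """Total weight of the segments (closed intervals on one edge) that cover `position`."""
    return sum(weight for lo, hi, weight in segments if lo <= position <= hi)

# Hypothetical edge of length 3.0 carrying the network ranges of two unit-weight facilities.
segments = [(0.0, 2.0, 1),   # part of the network range of one facility
            (1.0, 3.0, 1)]   # part of the network range of another facility

weights = [location_weight(p, segments) for p in (0.5, 1.5, 2.5)]
```

Every position between offsets 1.0 and 2.0 has location-weight 2 and no position on this edge has a higher value, so under these assumptions the interval from 1.0 to 2.0 plays the role of a max-segment.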

Storage System.
Similar to the disk-based storage model proposed in [13], the road network and the facility set are stored in secondary storage. Figure 6 shows the files and indexes for the network and the facility set. In this storage model, the network (adjacency list) is stored in a flat file, which is indexed by a B+-tree. For each node V (e.g., V1), besides the information of V (i.e., node identifier and coordinates), we also store information about all adjacent nodes, including the adjacent node identifier and the Euclidean distance between V and the adjacent node (e.g., the length of edge ⟨V1, V2⟩ is 2.236). Similarly, the facility list is stored in a flat file and indexed by a B+-tree. To support the algorithm efficiently, besides the information of each facility f (i.e., facility identifier, coordinates, and weight), we store information about the edge that contains f, including the start node identifier, the end node identifier, and the Euclidean distance (offset) between the start node and f (e.g., the start node of f1 is V2, the end node of f1 is V3, and the length of segment ⟨V2, f1⟩ is 1.0).
The Scientific World Journal
For each facility f, the proposed algorithm generates segments that cover the network range NR(f). The segments generated by facility f have the weight of f, namely, w(f). These segments are organized in a seg-file. Then, we process the seg-file to find all max-segments. The following three main steps constitute the proposed algorithm: generating segments, inserting segments into the seg-file, and finding max-segments.

Generating Segments.
In this step, we generate segments from all facilities in the facility flat file. For each facility f, we generate the segments that cover the entire network range NR(f). This process is described in Algorithm 1. First, we retrieve the information of the edge that contains f, its start node, and its end node. Then, we generate the segments on the start-node side (lines 8-16), after which we generate the segments on the end-node side (lines 17-26). If the distance between f and the start node is greater than or equal to the network radius d, we only need to generate one segment whose length equals d (lines 9-10). Otherwise, we generate the segment between f and the start node (its length equals the offset of the facility, lines 13-14) and continue generating segments from the start node with the remaining network radius by calling the function recursiveGenerateSegs (line 15), which is described in Algorithm 2. We generate segments on the end-node side in the same way (with the new offset being the length from f to the end node, line 17). Each newly generated segment has the weight of f and carries the facility identifier of f. This facility identifier supports the merging process when more than one segment of f is generated on the same edge. The newly generated segments are inserted into the seg-file together with the edge that contains them. In our algorithm, we use a list (finished-edge-list) to hold the edges that have been completely processed while generating the segments of a facility. The edges in the finished-edge-list are not processed again during the invocation of the function recursiveGenerateSegs. After the segments of f have been generated, we clear the finished-edge-list before generating the segments of a new facility (line 27).
After the segments from a facility f to the start node (and to the end node) have been generated in Algorithm 1, if the network radius is greater than the distance between f and the start node (or the end node), segment generation continues from this start node (or end node) with the shortened network radius (lines 15 and 25). This process is described in Algorithm 2, which spreads the segments over the network range NR(f).
In Algorithm 2, we enumerate all edges incident to the current node (i.e., the node from which we start generating segments). These edges are obtained from the neighbor list of the current node, excluding the old node, which has already been processed (line 1). To process an edge, we consider two situations. In the first situation, the edge does not exist in the finished-edge-list (line 5). If the length of this edge (e.g., ⟨curN, neighN⟩) is greater than or equal to the new network radius, we only need to create a new segment that starts at the current node, runs toward the neighbor node, and has a length equal to the new radius. Then, we insert this segment into the seg-file (lines 6-8). If the length of the edge is smaller than the new network radius, we create a new segment between the current node and the neighbor node and insert it into the seg-file, after which we continue generating segments from the neighbor node with the shortened network radius (line 13). In the second situation, the edge already exists in the finished-edge-list (lines 15-19). If the length of the edge is smaller than the new network radius, we only need to generate segments from the neighbor node with the shortened network radius (line 17). This process continues until the generated segments cover the network range NR(f) of the original facility f. Figure 7 shows the process of generating the segments of facility f1 in the road network shown in Figure 3. In this example, the network radius is 1.5. First, we generate the segment ⟨f1, V2⟩ with length 1 and then two segments with length 0.5 on the two edges ⟨V1, V2⟩ and ⟨V2, V4⟩. After that, we generate the segment ⟨f1, V3⟩ with length 0.803 and three segments with the same length 0.697 on the three edges ⟨V1, V3⟩, ⟨V3, V4⟩, and ⟨V3, V5⟩. The numbers next to the segments show the order in which they are generated.
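The two generation procedures described above can be sketched in memory as follows. This is a simplified illustration under stated assumptions, not a transcription of Algorithms 1 and 2: the adjacency dictionary replaces the disk-based files, segments are returned in a list instead of being written to the seg-file, the function names are hypothetical, and all edge lengths except 2.236 for ⟨V1, V2⟩ and 1.803 for ⟨V2, V3⟩ are made-up values chosen so that the output mirrors the Figure 7 example.

```python
def make_seg(adj, u, v, lo, hi):
    """Segment [lo, hi] measured from u on edge (u, v), stored with the smaller node id first."""
    if u <= v:
        return ((u, v), lo, hi)
    length = adj[u][v]
    return ((v, u), length - hi, length - lo)

def spread(adj, cur, old, radius, finished, segs):
    """Continue generating segments from node `cur` with the remaining radius."""
    for nxt, length in adj[cur].items():
        if nxt == old:
            continue  # do not go straight back along the edge we arrived on
        key = (min(cur, nxt), max(cur, nxt))
        if key not in finished:
            if length >= radius:
                segs.append(make_seg(adj, cur, nxt, 0.0, radius))
            else:
                segs.append(make_seg(adj, cur, nxt, 0.0, length))
                finished.add(key)  # the whole edge is covered
                spread(adj, nxt, cur, radius - length, finished, segs)
        elif length < radius:
            spread(adj, nxt, cur, radius - length, finished, segs)

def generate_segments(adj, start, end, offset, radius):
    """Segments covering the network range of a facility at `offset` from `start` on (start, end)."""
    segs, finished = [], set()
    edge_len = adj[start][end]
    # Start-node side.
    segs.append(make_seg(adj, start, end, max(offset - radius, 0.0), offset))
    if offset < radius:
        spread(adj, start, end, radius - offset, finished, segs)
    # End-node side.
    segs.append(make_seg(adj, start, end, offset, min(offset + radius, edge_len)))
    if edge_len - offset < radius:
        spread(adj, end, start, radius - (edge_len - offset), finished, segs)
    return segs

def build_adj(edges):
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj

# Lengths of 2.0 are hypothetical; only 2.236 and 1.803 appear in the text.
adj = build_adj([("V1", "V2", 2.236), ("V2", "V3", 1.803), ("V2", "V4", 2.0),
                 ("V1", "V3", 2.0), ("V3", "V4", 2.0), ("V3", "V5", 2.0),
                 ("V4", "V5", 2.0)])

# Facility f1: start node V2, end node V3, offset 1.0, network radius 1.5.
segments = generate_segments(adj, "V2", "V3", 1.0, 1.5)
```

With these assumed lengths, the sketch yields seven segments in the same order as the walk-through above: one of length 1, two of length 0.5, one of length 0.803, and three of length 0.697 (up to floating-point rounding). Because the recursion follows every non-backtracking path, the same edge can receive overlapping segments, which is why a merging step is needed when segments are stored.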

Inserting Segments into Seg-File.
Segments generated in step 1 are inserted into the seg-file (together with information about the containing edge). Algorithm 3 describes this insertion process. One important property of the seg-file is that all segments on the same edge are grouped into one record (edge-record). Thus, each edge-record in the seg-file has the form ⟨edge, (segment 1, segment 2, ...)⟩. The seg-file is indexed by a B+-tree. This structure helps to find max-segments efficiently.
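The grouping of segments into edge-records can be sketched with an in-memory dictionary standing in for the B+-tree-indexed seg-file. This is an assumption-laden illustration rather than Algorithm 3: the function name, the tuple layout `(lo, hi, facility_id, weight)`, and the merging rule (overlapping or touching segments of the same facility on the same edge are united into one segment) are all hypothetical choices.

```python
def insert_segment(seg_file, edge, lo, hi, facility_id, weight):
    """Insert a segment into the edge-record of `edge`, merging it with
    overlapping or touching segments of the same facility."""
    record = seg_file.get(edge, [])
    merged_lo, merged_hi = lo, hi
    kept = []
    for s_lo, s_hi, s_fid, s_weight in record:
        same_facility = s_fid == facility_id
        overlaps = s_lo <= merged_hi and merged_lo <= s_hi
        if same_facility and overlaps:
            # Absorb the existing segment into the new one.
            merged_lo, merged_hi = min(merged_lo, s_lo), max(merged_hi, s_hi)
        else:
            kept.append((s_lo, s_hi, s_fid, s_weight))
    kept.append((merged_lo, merged_hi, facility_id, weight))
    seg_file[edge] = kept  # write the updated edge-record back

seg_file = {}
edge = ("V2", "V3")
insert_segment(seg_file, edge, 0.0, 1.0, "f1", 1.0)    # start-node side of f1
insert_segment(seg_file, edge, 1.0, 1.803, "f1", 1.0)  # end-node side of f1
insert_segment(seg_file, edge, 0.5, 1.2, "f2", 1.0)    # a segment of another facility
```

Merging by facility identifier keeps the weight of a facility from being counted twice at a location that two of its segments both cover, while segments of different facilities stay separate so that their weights can be summed later.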

Figure 8: Two segments of a facility in one edge.
be merged). After updating the segment list of the edge-record, we update this edge-record in the seg-file (lines 15-16). Figure 9 shows the records in the seg-file after the generating-segments step and the inserting-segments step have finished. In the figure, the segments generated from facility f1 are gray dotted segments, the segments generated from the facility

Finding Max-Segments.
After the seg-file has been constructed, Algorithm 4 is invoked to find the max-segments from the seg-file.
In this algorithm, we first find the locally optimal segments in each edge-record (line 4), after which we compare the maximum segment weights of the edge-records, and the segments that have the overall maximum weight are added to the list that forms the final result (lines 6-14). The locally optimal segments are found by the function lineSweep, which is a one-dimensional (line) version of the plane-sweep algorithm proposed in [11]. Figure 10 illustrates lineSweep on the record associated with the edge ⟨V3, V5⟩. Assume that we are sweeping along an edge (e.g., ⟨V3, V5⟩). When we meet the start point of a segment (e.g., position 1 in the case of segment 2), the weight of this segment is added to the running weight used to determine the local maximum-weight segment; when we meet the end point of a segment (e.g., position 4 in the case of segment 2), the weight of this segment is removed from the running weight. In the figure, the segment from position 3 to position 4 on edge ⟨V3, V5⟩ is the local maximum-weight segment of this record.
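The sweep described above can be sketched as follows. This is a minimal in-memory illustration, not Algorithm 4 itself: the names `line_sweep` and `find_max_segments`, the tuple layout `(lo, hi, weight)`, and the sample edge-records are assumptions. Segments are treated as closed intervals, so start events are ordered before end events at the same position.

```python
def line_sweep(segments):
    """Return (max weight, intervals attaining it) for the segments of one edge-record."""
    events = []
    for lo, hi, weight in segments:
        events.append((lo, 0, weight))    # start point: add the weight
        events.append((hi, 1, -weight))   # end point: remove the weight
    events.sort()                         # at equal positions, starts come before ends
    best, best_intervals, current = 0, [], 0
    for i in range(len(events) - 1):
        current += events[i][2]
        interval = (events[i][0], events[i + 1][0])  # `current` holds on this interval
        if current > best:
            best, best_intervals = current, [interval]
        elif current == best and best > 0:
            best_intervals.append(interval)
    return best, best_intervals

def find_max_segments(seg_file):
    """Compare the local maxima of all edge-records and keep the global ones."""
    best, result = 0, []
    for edge, segments in seg_file.items():
        local_best, intervals = line_sweep(segments)
        if local_best > best:
            best, result = local_best, [(edge, iv) for iv in intervals]
        elif local_best == best and best > 0:
            result.extend((edge, iv) for iv in intervals)
    return best, result

# Hypothetical edge-records: edge -> list of (lo, hi, weight).
seg_file = {
    ("V1", "V2"): [(0.0, 2.0, 1), (1.0, 3.0, 1)],
    ("V2", "V3"): [(0.0, 1.0, 1)],
}

max_weight, max_segments = find_max_segments(seg_file)
```

Because every interval whose weight equals the global maximum is kept, the sketch returns all max-segments rather than a single optimal location, which matches the stated goal of the proposed method.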

Simulation Setup.
We use two real datasets, namely, the North America (NA) road network and the San Francisco (SF) road network. These datasets are depicted in Figure 12. The NA dataset is obtained from http://www.cs.fsu.edu/~lifeifei/SpatialDataset.htm and the SF dataset is obtained from [15]. The cardinalities of the datasets are shown in Table 1. Because this is the first work on processing MaxRS queries in a road network database, we developed a naive algorithm for comparison with the proposed algorithm. The naive algorithm uses an unstructured seg-file, and thus the generated segments are inserted directly into the seg-file in step 2 (segments on the same edge are not grouped into one edge-record). In step 3, the naive algorithm reads the segments from the seg-file, groups the segments on the same edge, and finds the max-segments.
We use a disk-based storage model to store very large road network databases; thus, in our simulation, the performance metric is the number of I/Os, that is, the number of blocks read from or written to files. We do not consider CPU time because it is dominated by the I/O cost [10,12,16]. The default values of the parameters are shown in Table 2. Figure 13  method is much less sensitive to this parameter than the naive algorithm. Figure 14 shows the results for varying network radius (network range). When the network radius increases, the number of segments increases, and thus the I/O cost also increases. The increase in I/O cost is greater for the SF dataset than for the NA dataset because the density of edges in SF is higher than in NA; therefore, more segments are generated for SF than for NA. Figure 15 shows the results for varying buffer size. Although both algorithms perform better as the buffer size increases, the proposed algorithm is more sensitive to the buffer size than the naive algorithm. Figure 16 shows the results for varying block size. We can see that when the block size increases, the I/O cost decreases. This is because, as the block size increases, the number of objects stored in a block also increases, which reduces the number of blocks read and written. As with the buffer size, the proposed algorithm is more sensitive to the block size than the naive algorithm.

Related Work
In this section, we review related work on the facility location optimization problem in general and the MaxRS problem in particular.
Facility Location Optimization Problem. The MaxRS problem can be seen as an instance of the facility location optimization problem, which has been studied extensively in recent years. The aim of the facility location optimization problem is to find an optimal location that maximizes or minimizes an objective function. Cabello et al. introduced and investigated optimization problems according to the bichromatic reverse nearest neighbor (BRNN) rule [17], while Wong et al. [18] studied a related problem called MaxBRNN, which finds an optimal region that maximizes the size of BRNNs. These two problems are studied in Euclidean (L2) space. Du et al. [19] proposed the optimal-location query, which returns a location with maximum influence, where the influence of a location is the total weight of its RNNs. In an extended version of [19], Zhang et al. [20] proposed and solved the min-dist optimal-location query.
There are also some studies specifically about facility location optimization in road network databases. Xiao et al. [21] studied optimal location queries in road networks and introduced three important types of optimal location queries: the competitive location query, the MinSum location query, and the MinMax location query. Yan et al. [22] proposed algorithms for finding an optimal meeting point, which has the smallest sum of network distances to all the points in a given point set in a road network.
MaxRS Problem. Imai and Asano proposed an optimal algorithm for the max-enclosing rectangle problem [11] with time complexity O(n log n), where n is the number of rectangles. Nandy and Bhattacharya presented another algorithm with the same cost, which is based on the interval tree data structure [14]. These algorithms are internal-memory algorithms. Choi et al. [10] proposed an external-memory algorithm for the MaxRS problem with optimal I/O cost. Tao et al. [12] proposed a new problem called (1 − ε)-approximate MaxRS, which returns a rectangle whose covered weight is at least (1 − ε) times the optimal covered weight, where ε is an arbitrarily small constant between 0 and 1.
Another version of the MaxRS problem is the maximizing circular range sum (MaxCRS) problem, in which the query range is a circle instead of a rectangle. Chazelle and Lee [23] proposed an algorithm for the max-enclosing circle problem with time complexity O(n²). Because the max-enclosing circle problem is 3SUM-hard [24], a class of problems for which the best algorithm takes O(n²) time, many studies have used approximate approaches to solve it. Aronov and Har-Peled [25] gave a Monte-Carlo (1 − ε)-approximation algorithm for unweighted point sets that runs in O(n ε⁻² log n) time; this algorithm can be extended to the weighted case, giving an algorithm that uses O(n ε⁻² log² n) time. de Berg et al. [26] proposed another approximation algorithm for the max-enclosing circle problem with time complexity O(n log n + n ε⁻³). The MaxCRS problem is also addressed in [10] through a novel reduction that converts the MaxCRS problem to the MaxRS problem.

Conclusions
The MaxRS problem can be used in location-based applications to find the most profitable service place or the most serviceable place. All previous studies assume the Euclidean distance; however, in many location-based applications, the network distance is used instead. This paper proposed an efficient algorithm for solving the MaxRS problem in a road network database. The proposed external-memory algorithm is suitable for large road network datasets. Our algorithm returns all optimal locations (max-segments) on the network, whereas all previous methods return only one result.
This helps clients with diverse interests choose their own best locations by considering additional conditions. In future work, we plan to improve our method and analyze the complexity of the algorithm.