Transportation cyber-physical systems are constrained by spatiality and real-time requirements and exhibit a high level of heterogeneity. Applications such as traffic control therefore generally manage moving objects in a single-machine multithreaded manner, while suffering from frequent locking operations. To address this problem and improve the throughput of moving object databases, we propose a GPU-accelerated indexing method based on a grid data structure combined with quad-trees. We count object movements and decide on the GPU whether a particular node should be split or merged, so bottlenecked grid nodes can be converted to quad-trees without interfering with the CPU. This reduces the time other threads spend waiting on locks raised by object data updates. The method is simple yet adapts well to scenarios where the distribution of moving objects is skewed, and it avoids the shortcomings of existing methods, which either bottleneck on hot areas or spend substantial computation on structure balancing. Experiments suggest that our method achieves higher throughput and lower response time than existing indexing methods; the advantage is even more significant under a skewed distribution of moving objects.
National Key R&D Program of China (2018YFB1003404); National Natural Science Foundation of China (61872071)

1. Introduction
In modern economic infrastructure, a very important part is the transportation network. It connects cities, manufacturers, retailers, and nations by moving large volumes of freight and passengers through a complex network [1] (highways, railways, and so on). To better utilize this network, the transportation system generally consists of a physical and a cyber system (i.e., transportation cyber-physical systems, TCPSs) that together provide innovative services. However, TCPS with mobile devices presents additional complexity because of the uncertainty of the objects (vehicles and mobile devices). Many important applications such as traffic control and autonomous driving rely on the moving object database (MOD) [2], whose main goal is to return the moving objects within a queried range while meeting given time and spatial accuracy requirements. For example, in taxi applications, mobile users and taxis are moving objects, and recommending a taxi within a certain distance of a user is a typical moving object query [3].
TCPS, in general, demonstrates a high level of heterogeneity, encompassing sensor nodes, mobile devices, high-end workstations, and servers. The different components of TCPS may use nonuniform granularities of time and space, and TCPS is constrained by spatiality and real-time requirements [4]; traditional single-threaded spatial data management methods are therefore hard-pressed to meet the efficiency requirements of practical applications on massive moving object data. Although big data processing platforms such as Hadoop and Spark have advantages in dealing with massive data, their architectures are mostly distributed and thus require communication between nodes. The resulting time cost cannot meet the real-time requirements of moving object data management. Therefore, moving object management is currently implemented in a single-machine multithreaded manner.
The facets of TCPS mentioned above require the MOD to process large amounts of updates and queries in real time, and thus spatial indexes are generally used to enhance its performance. There are currently two major categories of moving object index structures: tree-based and grid-based. In the beginning, most indexes resided on disk since the index itself could be too large to fit in memory, and tree-based indexes [5–7] were the more popular structures because of sophisticated improvements that reduce disk I/O. Recently, the rapid development of computer hardware has made low-cost, high-capacity main memory capable of holding millions of moving objects. In fact, due to the high frequency of object position changes, an in-memory data structure for storing moving objects is much better than an on-disk one in terms of I/O efficiency [8]. Therefore, much work has been done to extend tree-based indexes to in-memory variants, such as [9–11]. However, the tree-based update operation is specially constructed to reduce disk I/O and is thus complicated and time-consuming.
On the other hand, compared with the tree-based index, which covers the space with its leaf nodes, the grid-based index divides space into grids of uniform size. In the in-memory environment, these simple uniform grids are easy to update and maintain and thus more efficient than their tree-based competitors. Šidlauskas et al. [9] proposed u-Grid, an update-efficient grid-based structure in which a secondary index is employed to support bottom-up updates. Xu et al. [12] proposed D-Grid, which takes advantage of velocity information to further improve query performance; they also proposed a lazy deletion and garbage cleaning mechanism for accelerated update processing. Šidlauskas et al. [13] proposed PGrid, a parallel main-memory indexing technique that supports heavy location-related query and update workloads by exploiting the parallelism of modern processors.
The grid-based index structure is indeed simple and easy to implement, but it is not suitable for uneven distributions, which are widespread in real-world applications [14]. For example, in traffic monitoring applications, the load of moving objects is uneven in space (downtown versus the suburbs) and in time (morning and evening peaks versus other periods). Although the tree-based index can make every leaf node contain approximately the same number of moving objects, experimental results show that its performance is not as good as grid-based indexing [9].
In fact, the performance bottleneck of single-machine multithreaded moving object management in main memory is no longer I/O but the delay caused by coordination among threads, most notably one thread's locking of the data structure, which stalls other threads. Reducing the impact of locking on other threads is therefore the key research direction in moving object management. We observe that nodes need to be locked only when objects enter or leave leaf nodes. Thus, the division of leaf nodes in a tree index should be determined not by the total number of moving objects but by the number of objects entering and leaving the leaf nodes per unit time. However, counting incoming and outgoing objects involves atomic operations, which frequently lock the counters. We also need to decide whether to split leaf nodes or merge adjacent leaf nodes according to the changing counts, which requires continuous calculation and would seriously affect the efficiency of object data updates. For these reasons, existing research does not divide leaf nodes by entry and exit counts [15].
Therefore, this paper proposes an adaptive, parallel, GPU-accelerated indexing method for transportation moving objects. It uses a grid structure integrated with quad-trees, in which the GPU counts the numbers of objects entering and leaving leaf nodes and decides whether leaf nodes need to be split or merged. The method occupies only a small amount of CPU computing resources and continuously optimizes the index structure without affecting data update and query efficiency. Experimental results show that its performance surpasses the best existing method for moving object updating.
2. Index Structure

2.1. Problem Definition
Given a space plane $S$ and a moving object set $O = \{o_1, \ldots, o_n\}$, each object is $o_i = (o_i^{id}, o_i^x, o_i^y, o_i^t)$, where $o_i^{id}$ is the unique identifier of $o_i$, $(o_i^x, o_i^y)$ is its location, and $o_i^t$ is the last update time. The query operation set is $Q = \{q_1, \ldots, q_e\}$, in which any query request is $q_j = (x_{min}, y_{min}, x_{max}, y_{max}, t_q)$. The first four items define a rectangular query box, and $t_q$ is the time when the query started. The purpose of the moving object database is to return to the user the moving objects located in the query box when query $q_j \in Q$ arrives.
This query method is called a range query. Since other types of queries, such as kNN queries, can be converted into a series of range queries, this article only discusses index support for range queries.
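As a concrete reading of the definitions above, the following C++ sketch (the names `Object`, `RangeQuery`, and `range_query` are ours, not the paper's) shows the range-query predicate the index must answer; a real MOD avoids the full scan shown here by means of the index structure.

```cpp
#include <cassert>
#include <vector>

// A moving object o_i = (id, x, y, t) and a range query
// q = (xmin, ymin, xmax, ymax, tq), as defined in Section 2.1.
struct Object { int id; double x, y, t; };
struct RangeQuery { double xmin, ymin, xmax, ymax, tq; };

// An object matches the query iff its last known position lies inside the box.
inline bool matches(const Object& o, const RangeQuery& q) {
    return o.x >= q.xmin && o.x <= q.xmax && o.y >= q.ymin && o.y <= q.ymax;
}

// Naive evaluation over a set of objects; the index's job is to avoid this scan.
std::vector<int> range_query(const std::vector<Object>& objs, const RangeQuery& q) {
    std::vector<int> result;
    for (const auto& o : objs)
        if (matches(o, q)) result.push_back(o.id);
    return result;
}
```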
2.1.1. Index Structure Design
The main task of the moving object database is to constantly update the positions of moving objects and return results according to query requirements. Therefore, the moving object index structure needs to meet two basic conditions:
(1) Find an object by its identifier $o_i^{id}$
(2) Find and update moving objects based on the object position $(o_i^x, o_i^y)$
An auxiliary index based on a hash table can be used to support condition (1).
Definition 1.
Auxiliary index: for all spatial objects, a hash table $\mathcal{H}$, keyed by $o_i^{id}$, stores key-value pairs of the form $(o_i^{id}, (p\_bkt, idx))$. $p\_bkt$ is the memory address of the bucket in which $o_i$ is located in the hybrid index (see Definition 2), and $idx$ is its relative position within that bucket.
Figure 1 shows an example of an auxiliary index. The nature of the hash table ensures that the auxiliary index can find the memory location of $o_i$ in constant time based on $o_i^{id}$.
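A minimal C++ sketch of Definition 1's auxiliary index; the `unordered_map` form matches equation (17) in Section 3, while the helper `locate` and its sentinel return value are our own illustration.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Forward declaration; the real Bucket layout is given in Section 4.
struct Bucket;

// Auxiliary index H from Definition 1: object id -> (bucket address, slot index).
using AuxIndex = std::unordered_map<int, std::pair<Bucket*, int>>;

// O(1) expected-time lookup of an object's storage location by id;
// (nullptr, -1) is our sentinel for "not present".
inline std::pair<Bucket*, int> locate(const AuxIndex& h, int oid) {
    auto it = h.find(oid);
    return it == h.end() ? std::pair<Bucket*, int>{nullptr, -1} : it->second;
}
```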
Both grid-based and tree-based indexes have advantages and disadvantages in satisfying condition (2). The grid-based indexing method directly calculates the grid to which $o_i$ belongs from $(o_i^x, o_i^y)$. However, when moving objects are not uniformly distributed in space, the number of moving objects in a hotspot grid becomes excessive, and updating object locations causes the hotspot grid to be locked frequently, which reduces parallel performance in the hotspot region.
Tree-based indexing reduces the congestion of objects in hotspots. However, locating $o_i$ by $(o_i^x, o_i^y)$ requires traversing from the root through a series of intermediate nodes down to a leaf, which is inefficient. An even more significant effect on overall efficiency is that tree indexes must constantly adjust their structure to fit the distribution of moving objects: both computing whether leaf nodes need adjusting and the adjustment operations themselves consume a large amount of computing resources.
This paper proposes a hybrid indexing method combining grid and quad-tree to avoid the disadvantages of the above two methods.
An example of the auxiliary index.
Definition 2.
Hybrid index: $P$ divides the space plane $S$ into $G_{num} = 2^\rho \times 2^\rho$ grids, each of which can be converted between a grid node and a quad-tree depending on conditions.
The hybrid index balances the load of each cell by transforming grids in hotspot regions into quad-trees [16]. $\rho$ is the gridding parameter, chosen by the selection condition given in [17]:

(1) $\rho = \frac{1}{2}(\log N - \log C_L)$,

where $N$ is the total number of moving objects in the space and $C_L$ is the capacity of a leaf node.
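A small sketch of how equation (1) might be evaluated. We assume base-2 logarithms (consistent with the $2^\rho \times 2^\rho$ grid) and round to the nearest integer, since the paper does not state a rounding rule.

```cpp
#include <cassert>
#include <cmath>

// Gridding parameter from equation (1): rho = (1/2)(log N - log C_L).
// Base-2 logs and nearest-integer rounding are our assumptions.
inline int grid_rho(double n_objects, double leaf_capacity) {
    return static_cast<int>(
        std::lround(0.5 * (std::log2(n_objects) - std::log2(leaf_capacity))));
}
```

For example, with $N = 2^{20}$ objects and a leaf capacity of 16, the space would be divided into $2^8 \times 2^8$ grids.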
During execution, the number of moving objects in each grid is monitored dynamically: grids that satisfy the split condition are converted into quad-trees, and quad-trees that meet the merge condition are converted back into grids. Within each quad-tree, leaf nodes are likewise split or merged according to the same conditions.
Figure 2 shows an example of a hybrid index. The uppermost layer in the right half is the plane $S$ divided into $G_{num} = 2^\rho \times 2^\rho$ grids, where black grids represent hot grids that have been converted to quad-trees; the lower layers show the spatial regions corresponding to quad-tree nodes at the various levels, with black nodes representing regions that are subdivided further. The left part of the figure zooms in on one of the quad-trees; after conversion to a quad-tree, the region is divided more finely.
Each grid node $c_i$ contains only one pointer $p_{bucket}$ to a bucket list $\mathcal{L}_i$. This list contains a series of fixed-size buckets that hold all moving objects belonging to the node. Each quad-tree node $\xi_i$ is represented as $\xi_i = (p_{bucket}, p_{c_1}, p_{c_2}, p_{c_3}, p_{c_4})$, where $p_{c_1}, \ldots, p_{c_4}$ are pointers to the child nodes in the four quadrants of the current node.
An example of the hybrid index.
2.1.2. Node Split Conditions
Since the splitting and merging operations on grids are essentially the same as those on quad-tree leaf nodes, we use the term node for both.
When an object moves inside a node $c$, we only need to lock the object itself to update its position, without locking the node. When a moving object enters or leaves node $c$, its record must be appended to or deleted from a bucket belonging to $c$, and $c$ must be locked to prevent data collisions caused by different threads concurrently updating $c$. Therefore, the update operations of all other threads on node $c$ must be suspended. The greater the number $n_c$ of moving objects entering and leaving $c$ per unit time, the more threads wait simultaneously. The total time required for update operations, $\varphi_c$, is the sum of the actual execution time $n_c\tau$ (where $\tau$ is the execution time of each update operation) and the waiting time $\psi_c$. Therefore, there is a positive correlation between the total update time and the number of moving objects entering and leaving the node.
By splitting the node, we can reduce the number of simultaneously waiting threads within it. However, after splitting a node $c$ into four nodes, the movement of objects within $c$ may become movement among multiple nodes, increasing the total number of moving objects entering and leaving nodes, i.e.,

(2) $n_c \le n_{c_1} + n_{c_2} + n_{c_3} + n_{c_4}$.
Therefore, to reduce $\varphi$, nodes must be split only moderately: only when

(3) $\varphi_c > \varphi_{c_1} + \varphi_{c_2} + \varphi_{c_3} + \varphi_{c_4}$

does splitting the node improve update performance. Conversely, when four sibling nodes $c_1, c_2, c_3,$ and $c_4$ satisfy $\varphi_{c_1} + \varphi_{c_2} + \varphi_{c_3} + \varphi_{c_4} > \varphi_c$, they need to be merged into one node.
This section quantifies the relationship between $\varphi$ and $n$. We first discuss the effect of each additional update on the total waiting time of moving objects.
Lemma 1.
Let the execution time of each entry/exit update operation on a node be $\tau$ and the number of moving objects entering or leaving the node per unit time be $n_c$ ($n_c > 1$). Denote the expected total waiting time for updating the entry/exit information of $n_c$ moving objects by $\psi(n_c)$. Then the expected waiting time for $n_c + 1$ moving objects is

(4) $\psi(n_c + 1) = \psi(n_c) + \frac{3}{2} n_c \tau^2 + 4(n_c - 1)\tau^3$.
Proof.
Assume that node $c$ has $n_c$ objects entering and leaving it within a unit time. Then the $(n_c + 1)$th object cannot move in or out of its cell until the previous $n_c$ objects finish updating, which is denoted by

(5) $\psi(n_c + 1) = \psi(n_c) + \phi(n_c)$, $n_c \ge 1$,

where $\phi(n_c)$ is the correlation function between the waiting time and the number of moving objects. There are two types of update waiting among the $n_c$ objects. Overlapping waiting means that one object's update start time falls within the period in which another object is performing an update; nonoverlapping waiting means that one object's update starts before another's and the interval between the two start times is greater than $\tau$, which ensures that the two objects update without waiting for each other. The function $\phi(n_c)$ combines the overlapping and nonoverlapping cases.

When the update waits of the $n_c$ objects do not overlap, let $p'_1$ be the probability that the $(n_c + 1)$th object's update falls within another object's execution period, and let $p''_1$ be the probability that it falls within the interval of length $\tau$ immediately before another object's update.

With $p'_1 = n_c$, the waiting time in the first case is the integral of $\tau$ over $[0, \tau]$, i.e.,

(6) $w'_1 = \int_0^\tau \tau \, d\tau$.

With $p''_1 = n_c \tau$, the waiting time in the second case is the integral of the constant 1 over $[0, \tau]$, i.e.,

(7) $w''_1 = \int_0^\tau 1 \, d\tau$.

The nonoverlapping waiting time among the $n_c$ objects can thus be expressed as $w_1 = p'_1 w'_1 + p''_1 w''_1$.

When the update waits of the $n_c$ objects overlap, the overlapping part must be postponed. Let the waiting time of the overlapping part be $w_2$, with each object's execution time equal to $\tau$. Each object can be overlapped for a period of $\tau$ on either side, so the period that can overlap is $o_t = 2\tau$.

The postponed period $w'_2$ is the integral of the overlappable period $o_t$ over $[0, \tau]$:

(8) $w'_2 = \int_0^\tau 2\tau \, d\tau$.

Among $n_c$ objects there are $n_c - 1$ overlapping pairs, so the total postponed period is

(9) $w'_2 = (n_c - 1)\int_0^\tau 2\tau \, d\tau$.

Since each object can be overlapped for $o_t = 2\tau$, the overlapping probability between two objects is

(10) $p'_2 = 2 \times o_t = 4\tau$.

The overlapping waiting time $w_2$ among the $n_c$ objects can then be expressed as

(11) $w_2 = p'_2 \times w'_2$, with $\phi(n_c) = w_1 + w_2$.

Substituting the above into $\phi(n_c)$, we get

(12) $\phi(n_c) = p'_1 w'_1 + p''_1 w''_1 + p'_2 w'_2 = n_c \int_0^\tau \tau \, d\tau + n_c \tau \int_0^\tau 1 \, d\tau + 4\tau (n_c - 1) \int_0^\tau 2\tau \, d\tau = \frac{3}{2} n_c \tau^2 + 4(n_c - 1)\tau^3$.

Finally, $\psi(n_c + 1) = \psi(n_c) + \frac{3}{2} n_c \tau^2 + 4(n_c - 1)\tau^3$.
The closed form of the waiting time $\psi(n)$ can be derived from Lemma 1.
Lemma 2.
Let the number of moving objects entering and leaving a node per unit time be $n$ and the update execution time of each object be $\tau$. Then the total waiting time $\psi(n)$ for $n$ update operations is

(13) $\psi(n) = \frac{3}{4} n(n-1)\tau^2 + 2(n-1)(n-2)\tau^3$.
Proof.
According to Lemma 1, the update waiting time of a moving object is related to the number of moving objects and the update start times. Formula (4) can be recursively expanded as

(14) $\psi(n) = \frac{3}{2}(n-1)\tau^2 + 4(n-2)\tau^3 + \psi(n-1)$,
$\psi(n-1) = \frac{3}{2}(n-2)\tau^2 + 4(n-3)\tau^3 + \psi(n-2)$,
$\ldots$,
$\psi(2) = \frac{3}{2}\tau^2 + \psi(1)$, $\psi(1) = 0$.

Substituting $\psi(1), \ldots, \psi(n-1)$ into the expansion gives

(15) $\psi(n) = \sum_{k=1}^{n-1} \frac{3}{2} k \tau^2 + \sum_{k=2}^{n-1} 4(k-1)\tau^3 = \frac{3}{4} n(n-1)\tau^2 + 2(n-1)(n-2)\tau^3$.

Thus, Lemma 2 is proved.
Lemma 2 describes the relationship between the object update waiting time and the number of object updates; the total update time $\varphi$ is the sum of the waiting time $\psi$ and the actual execution time $n\tau$.
Theorem 1.
Let the number of moving objects entering and leaving a node per unit time be $n$ and the update execution time of each object be $\tau$. Then the total update time $\varphi(n)$ per unit time, including the time the node keeps other threads waiting, can be expressed as

(16) $\varphi(n) = n\tau + \frac{3}{4} n(n-1)\tau^2 + 2(n-1)(n-2)\tau^3$.
Proof.
Substituting (13) into $\varphi(n) = n\tau + \psi(n)$ proves Theorem 1.
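The split decision of condition (3) can now be evaluated directly from Theorem 1. The following hedged C++ sketch (the function names are ours) computes $\varphi(n)$ from equation (16) and compares a node's cost against that of its four would-be children:

```cpp
#include <cassert>
#include <cmath>

// Total per-unit-time update cost from Theorem 1 (equation (16)):
// phi(n) = n*tau + (3/4)n(n-1)tau^2 + 2(n-1)(n-2)tau^3,
// where n counts objects entering/leaving the node and tau is one update's cost.
inline double phi(double n, double tau) {
    return n * tau + 0.75 * n * (n - 1) * tau * tau
         + 2.0 * (n - 1) * (n - 2) * tau * tau * tau;
}

// Split test from condition (3): split node c only if its cost exceeds the
// combined cost of its four would-be children. The child counts n1..n4 may sum
// to more than n (equation (2)) because cross-child moves are counted twice.
inline bool should_split(double n, double n1, double n2, double n3, double n4,
                         double tau) {
    return phi(n, tau) > phi(n1, tau) + phi(n2, tau) + phi(n3, tau) + phi(n4, tau);
}
```

The quadratic and cubic waiting terms dominate for busy nodes, which is why splitting a hot node pays off even though the children's entry/exit counts sum to more than the parent's.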
3. Data Structures and Algorithms
This section uses C++-like pseudocode to describe our index structure. The index structure is divided into two parts, the auxiliary index and the hybrid index, in which the auxiliary index $\mathcal{H}$ adopts a hash table of the form

(17) unordered_map<int, pair<Bucket*, int>>.
We use this structure to store $(o_i^{id}, (p\_bkt, idx))$ key-value pairs. The hybrid index takes the form of a 2-D array combined with quad-trees. Next, we focus on the hybrid index data structure and its associated algorithms.
The hybrid index data structure includes the grid index and the QuadTree index, and it balances the update load of each cell according to the conditions given by Theorem 1. It merges cells containing few objects to reduce the cost of querying quad-trees, and it splits cells containing many objects to reduce update waiting time.
From Section 2.1.2, we know that continuously comparing the total execution time $\varphi_c$ of a node $c$ with the total execution time $\varphi_{c_1} + \varphi_{c_2} + \varphi_{c_3} + \varphi_{c_4}$ of its four child nodes is crucial to deciding whether to split or merge a node. Since these comparisons consume massive computing resources, the indexing method performs them on the GPU. Although counting moving objects entering or leaving nodes does not require much calculation, each increment requires locking the counter, which seriously impacts overall parallel performance. Therefore, the counting operation is also performed on the GPU.
This section introduces the indexing algorithms from two aspects: CPU and GPU. All algorithms operate on a unified data structure stored in main memory. The data required for a GPU operation are copied from main memory to GPU memory only when used; after the calculation, the results are copied back to main memory.
4. Data Structure
The grid index in a hybrid index is implemented using a two-dimensional array:

(18) array<array<unique_ptr<Node>, width>, height>,

where width and height represent the number of grid columns and rows, respectively. The Node class is the parent of Cell and QuadTree. Cell represents a grid cell that contains only one pointer of type unique_ptr<Bucket>, and QuadTree represents a quad-tree:

(19) struct QuadTree {
  unique_ptr<Bucket> p_bucket;              /* bucket chain header */
  array<unique_ptr<QuadTree>, 4> children;  /* child nodes */
  QuadTree* parent;                         /* parent node */
  int left, right, floor, ceiling;          /* coverage */
};
Objects in a Cell or QuadTree are stored in a linked list of buckets. Buckets store the moving objects of a leaf node and have a fixed size; the number of moving objects determines the number of buckets of each leaf node. If the bucket is full when an object is inserted, a new bucket is created; if a bucket becomes empty when an object is deleted, the bucket is deleted. The bucket structure is as follows:

(20) struct Bucket {
  Site sites[MAX_SIZE];       /* array of objects */
  int current;                /* number of objects in the bucket */
  unique_ptr<Bucket> bucket;  /* next bucket */
};
The Site class holds moving object data, including id, x, y, and the update time tu. It follows that the data of one object needs at least 128 bits of memory (four 32-bit int values). An access violation may occur between different threads when reading and writing Site objects in parallel without protection. The traditional way to avoid this is to lock Site while reading and writing it. Since Site is the most frequently used class in the index, to avoid the impact of locking on performance, we pack the four fields of Site into one object of type __m128i, use the _mm_load_si128 and _mm_set_epi32 operations from the Intel SSE instruction set [18] to read and write the content, and use _mm_extract_epi32 to extract individual fields. In this way, the index can correctly read and write Site data without locking.
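A hedged sketch of this packing scheme. We keep to SSE2-only intrinsics (`_mm_shuffle_epi32` plus `_mm_cvtsi128_si32` in place of the SSE4.1 `_mm_extract_epi32` named above) so the code compiles without extra flags; note also that a 16-byte-aligned SSE load or store behaves as a single access on typical x86 hardware, which is the assumption the scheme relies on, although the ISA does not formally guarantee atomicity.

```cpp
#include <cassert>
#include <immintrin.h>

// One moving object: id, x, y, update time -- 4 x 32-bit = 128 bits,
// packed into a single __m128i so all fields move together.
struct alignas(16) Site {
    __m128i data;  // lanes: {0: id, 1: x, 2: y, 3: tu}
};

// Write all four fields with one 128-bit store (treated as a single access
// on typical x86 hardware, as the paper's scheme assumes).
inline void site_store(Site* s, int id, int x, int y, int tu) {
    _mm_store_si128(&s->data, _mm_set_epi32(tu, y, x, id));  // lane 0 = id
}

// Read one field with a single 128-bit load followed by a lane shuffle.
inline int site_load(const Site* s, int lane) {
    __m128i v = _mm_load_si128(&s->data);
    switch (lane) {
        case 1: v = _mm_shuffle_epi32(v, 0x55); break;  // broadcast lane 1
        case 2: v = _mm_shuffle_epi32(v, 0xAA); break;  // broadcast lane 2
        case 3: v = _mm_shuffle_epi32(v, 0xFF); break;  // broadcast lane 3
        default: break;                                 // lane 0 already low
    }
    return _mm_cvtsi128_si32(v);  // extract the low 32 bits
}
```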
The index also maintains two Node∗ lists, split_candidates and merge_candidates, which store the nodes that are candidates for splitting and merging, respectively. The decision of which nodes need splitting or merging is computed on the GPU.
4.1. CPU Algorithm
As objects keep updating, the structure of the quad-tree changes. The insertion algorithm inserts an object into the appropriate node based on its coordinates; the deletion algorithm finds the object's bucket by its id and deletes the object. Splitting and merging cells is the crucial operation for balancing quad-trees; only cells that meet the split or merge conditions are split or merged.
4.1.1. Spatial Object Insertion Algorithm
As its position changes, a moving object continually moves between nodes. The main purpose of Algorithm 1 is to insert an object into a leaf node. The idea is to find the leaf node into which the object should be inserted based on its position $(o_i^x, o_i^y)$ and check the state of the node's bucket. If the bucket is not full, the object is inserted directly; if the bucket is full, a new bucket n_bucket is created and inserted into the bucket list of the current node. The object $o_i$ is then inserted into the bucket, and the bucket's object count is incremented.
Algorithm 1: Inserting object oi into a leaf node (add_to_leaf).
Input: the moving object $o_i = (o_i^{id}, o_i^x, o_i^y, o_i^t)$
Output: no output; the leaf node is updated after the operation is completed
cur_leaf = get_leaf(oix, oiy); /∗ find the leaf node for position (oix, oiy) ∗/
if (is_full(cur_leaf.p_bucket))
n_bucket = new Bucket();
insert_bucket(n_bucket);
insert_object(oi); /∗ insert the object oi into the bucket of the current node ∗/
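A single-threaded C++ sketch of the bucket handling in Algorithm 1; the bucket capacity and the decision to prepend the new bucket at the head of the chain are our illustrative choices.

```cpp
#include <cassert>
#include <memory>

constexpr int kBucketCap = 4;  // illustrative fixed bucket size

struct Obj { int id, x, y, t; };

struct Bucket {
    Obj sites[kBucketCap];
    int current = 0;               // number of objects stored
    std::unique_ptr<Bucket> next;  // next bucket in the chain
};

struct Leaf { std::unique_ptr<Bucket> p_bucket; };

// Insert into the head bucket; if it is missing or full, prepend a fresh
// bucket first, matching the "create n_bucket and link it" step of Algorithm 1.
void add_to_leaf(Leaf& leaf, const Obj& o) {
    if (!leaf.p_bucket || leaf.p_bucket->current == kBucketCap) {
        auto n_bucket = std::make_unique<Bucket>();
        n_bucket->next = std::move(leaf.p_bucket);
        leaf.p_bucket = std::move(n_bucket);
    }
    leaf.p_bucket->sites[leaf.p_bucket->current++] = o;
}
```

Prepending keeps insertion O(1) regardless of chain length, since the only bucket that can be non-full is the head.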
4.1.2. Spatial Object Deletion Algorithm
When the location of a moving object is updated, an object that no longer belongs to the range of its current leaf node needs to be inserted into its new leaf node and deleted from the current one.
As shown in Algorithm 2, the bucket holding the object is found by the object's unique identifier $o_i^{id}$. The object is then removed from the bucket, and the bucket's object count is decremented. After deleting object $o_i$, if the bucket is empty, the bucket is also deleted.
Algorithm 2: Deleting an object from a leaf node (remove_from_leaf).
Input: the moving object $o_i = (o_i^{id}, o_i^x, o_i^y, o_i^t)$
Output: no output; the leaf node is updated after the operation is completed
bucket = get_bucket (oiid);
delete_from_bucket (oi);
if (is_empty(bucket))
delete_bucket(bucket); /∗ delete empty bucket ∗/
4.1.3. Cell Partitioning Algorithm
According to the node splitting condition in Section 2.1.2, when a leaf node satisfies $\varphi_c > \varphi_{c_1} + \varphi_{c_2} + \varphi_{c_3} + \varphi_{c_4}$, the QuadTree index structure divides the leaf node s_node into four equal-sized subgrids, where x_middle = (left + right)/2 and y_middle = (floor + ceiling)/2 (lines 1-2). After the division, the parent of every child is set to the current leaf node, and all objects in the divided node s_node are moved to the bucket of the corresponding child. If the parent of the split node belonged to the split_candidates list before the split, it is removed from split_candidates, and the current node is appended to split_candidates, as shown in Algorithm 3.
Algorithm 3: Splitting a leaf node (split_node).
Input: the leaf node s_node to be split
Output: no output; the quad-tree structure changes after the operation
x_middle = (left + right)/2;
y_middle = (floor + ceiling)/2;
s_node.children[0] = new QuadTree(x_middle, right, y_middle, ceiling); /∗ children [1–3] are initialized similarly ∗/
for each (child in s_node.children)
child.parent = s_node;
move each object oi in s_node.p_bucket to the corresponding child.p_bucket;
if (s_node.parent in split_candidates)
delete_from_split_candidates(s_node.parent);
insert_split_candidates(s_node);
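When redistributing objects in Algorithm 3, each object must be routed to one of the four children. Below is a small sketch of that quadrant selection; the paper fixes only children[0] (the upper-right quadrant spanned by x_middle..right and y_middle..ceiling), so the numbering of the other three children here is our assumption.

```cpp
#include <cassert>

// Node coverage, using the field names of the QuadTree struct in Section 4.
struct Extent { int left, right, floor, ceiling; };

// Index of the child quadrant that should receive a point (x, y) after a
// split. children[0] is the upper-right quadrant, as in Algorithm 3; the
// numbering of the remaining quadrants is an illustrative choice.
inline int child_index(const Extent& e, int x, int y) {
    int x_middle = (e.left + e.right) / 2;
    int y_middle = (e.floor + e.ceiling) / 2;
    bool right_half = x >= x_middle;
    bool upper_half = y >= y_middle;
    if (right_half) return upper_half ? 0 : 3;  // 0: upper-right, 3: lower-right
    return upper_half ? 1 : 2;                  // 1: upper-left,  2: lower-left
}
```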
4.1.4. Cell Merge Algorithm
According to the node merge condition in Section 2.1.2, when four sibling leaf nodes satisfy $\varphi_{c_1} + \varphi_{c_2} + \varphi_{c_3} + \varphi_{c_4} > \varphi_c$, the QuadTree index structure merges the four grids into one. The purpose is to adjust the quad-tree structure and balance the update load to improve update and query efficiency.
As shown in Algorithm 4, when a node satisfies the merge condition, the buckets of all its children are linked together and assigned to the bucket of the current node; the bucket chain of the current node is then adjusted, and empty buckets are deleted. If the current node m_node belongs to the merge_candidates list, it is deleted from merge_candidates. After merging, if all children of the parent of the current node are leaf nodes, the parent m_node.parent is added to merge_candidates.
Algorithm 4: Merging child nodes (merge_node).
Input: the node m_node whose children are to be merged
Output: no output; the quad-tree structure changes after the operation
if (all children of m_node are leaf nodes)
for each (child in m_node.children)
m_node.p_bucket ← child.p_bucket; /∗ link each child's buckets into m_node's chain ∗/
delete_null_bucket(m_node.p_bucket);
if (m_node in merge_candidates)
delete_from_merge_candidates(m_node);
if (all children of m_node.parent are leaf nodes)
insert_merge_candidates(m_node.parent);
4.2. GPU Algorithm
The GPU's logic computation ability is weaker than the CPU's, so it is not suitable for complex logical decisions and is better suited to large amounts of parallel data calculation. Unlike the CPU, the GPU adopts the SIMD (single-instruction, multiple-data) execution model: the same instruction executes on multiple sets of data at the same time, improving parallel efficiency [19]. Therefore, the work handed to the GPU by the index consists mainly of two parts involving large amounts of calculation: (i) counting the number of moving objects entering and leaving each node and (ii) deciding whether nodes need to be split or merged. Both operations involve the two Node∗ lists split_candidates and merge_candidates maintained in the index.
4.2.1. Counting Algorithm
We take counting objects entering a node as an example to describe the counting algorithm. If the operation runs on the CPU, several threads are usually launched according to the CPU's parallel capability, and each executes the process shown in Algorithm 5.
Input: partial object movement information linked list update_info_list
Output: update each node counter value
for each (item in some part of update_info_list)
node_in = get_location (item.new_pos);
lock_and_increase (node_in);
update_info_list stores a series of object movement records received by the system, including the object id, original location, and new location. Each CPU thread handles part of update_info_list. After get_location computes the node into which the object has moved, the third line increments that node's counter. Note that the counter must be locked when incrementing to ensure correct parallel reads and writes.
Compared with the CPU, a GPU generally has thousands of stream processors [13], each of which maintains a fixed number of counters for several nodes. Since each counter can only be accessed by one fixed stream processor, no locking is needed when incrementing. Algorithm 6 shows the flow in which each stream processor is responsible for m × n nodes.
threadIdx is a built-in stream processor index value set automatically by the system. The count array is local to each stream processor and stores the counters of the nodes it is responsible for. Line 6 determines whether a node belongs to the stream processor, and if so, line 7 increments the corresponding counter.
The indexing system calls GPU_count not only on all nodes in the lists split_candidates and merge_candidates to record the number of objects entering and leaving each node but also on the four child nodes into which each node in split_candidates may split and on the parent nodes into which four neighboring nodes in merge_candidates may merge.
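The counting kernel itself is CUDA, but the reason it needs no locks can be shown with a plain C++ emulation: each worker owns a disjoint set of node counters (partitioned here by a simple modulo rule of our own choosing), so no two workers ever increment the same counter.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// One movement record: which node the object moved into.
struct Update { int obj_id; int node_in; };

// Emulates the GPU counting scheme of Section 4.2: worker w owns the counters
// of the nodes with node_in % workers == w. Every worker scans all updates but
// only touches its own counters, so no locks or atomics are required.
std::vector<int> count_entries(const std::vector<Update>& updates,
                               int num_nodes, int workers) {
    std::vector<int> counters(num_nodes, 0);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            for (const auto& u : updates)      // scan all updates,
                if (u.node_in % workers == w)  // but count only owned nodes
                    ++counters[u.node_in];
        });
    }
    for (auto& t : pool) t.join();
    return counters;
}
```

The trade-off mirrors the paper's: each worker redundantly scans the whole update list, but counter ownership removes all synchronization on the hot increment path.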
4.2.2. Split Merge Judgment Algorithm
We take the split judgment as an example. Because the indexing system has already counted any node c in the split_candidates list and the four child nodes c1, c2, c3, and c4 into which it may split, we only need to assign the nodes of split_candidates to the stream processors and then substitute equation (16) into condition (3) to make the decision.
5. Simulation Experiment and Result Analysis
This section compares our index structure, denoted GAPI, with PGrid. PGrid is the state-of-the-art parallel moving object index structure and is widely adopted for indexing moving objects [13].
The experimental environment is Windows 10 with two Intel Xeon E5-2620 v3 six-core CPUs and an NVIDIA Quadro K2200 GPU. The code is implemented in C++ with CUDA Toolkit 7.5, and the spatial object data are generated by the open-source moving object generator MOTO (http://moto.sourceforge.net), which is based on the Brinkhoff [20] algorithm. We set the GAPI parameter $\rho = \frac{1}{2}(\log N - \log C_L)$ according to equation (1), where $N$ is the total number of moving objects and $C_L$ is the leaf node capacity. Experimental parameters are shown in Table 1.
Table 1: Experiment parameters.

Parameter                          | Experimental values        | Default
-----------------------------------|----------------------------|--------
Total area (km²)                   | 200000 × 200000            | —
Query area (km²)                   | 0.25, 1, 4, 16, 32         | 4
CPU threads                        | 24                         | 24
GPU stream processors              | 640                        | 640
Update/query ratio (×10³)          | 0.25, 0.5, 1, 2, 4, 8, 16  | 1
Number of spatial objects (×10⁶)   | 5, 10, 20, 40              | 10
Update interval (seconds)          | 10, 20, 40, 80, 160        | 10
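Reading equation (1) as ρ = (1/2)(log₂N − log₂C_L), i.e. ρ = log₄(N/C_L), the quad-tree depth at which N objects spread over leaves of capacity C_L, the parameter can be computed as below. The base-2 interpretation of the logarithm is our assumption, since equation (1) itself lies outside this section:

```cpp
#include <cassert>
#include <cmath>

// Parameter rho from equation (1), read as (log2 N - log2 CL) / 2,
// which equals log4(N / CL): each quad-tree split divides a node into
// four children, so this is the depth needed for leaves of capacity CL.
double compute_rho(double n_objects, double leaf_capacity)
{
    return 0.5 * (std::log2(n_objects) - std::log2(leaf_capacity));
}
```

For example, with N = 2²⁰ objects and C_L = 16, ρ evaluates to 8.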
Figure 3 shows the effect of the update/query ratio on throughput. As the update/query ratio increases, the throughput of both index structures increases. Our method's throughput is higher than PGrid's in most cases and lower only when the update/query ratio is small. This is because our method mainly optimizes the update operation: the GPU-assisted splitting and merging of the quad-tree reduce thread waiting time during parallel updates, so the average update time is better than PGrid's. Because a query on the quad-tree structure must traverse from the root node down to the leaf nodes, our index's query time is generally higher than PGrid's.
Effect of the update/query ratio on throughput.
Figure 4 shows the effect of the query area on throughput. Our method outperforms PGrid in all cases. As the query area grows, the overall throughput declines, because the query operations occupy more and more computing resources.
Effect of the query-area size on throughput.
Figure 5 shows the effect of the update interval on throughput. The longer the update interval, the greater the probability that an object moves out of its current node. As the update interval grows, PGrid's throughput drops drastically, while the throughput of our method remains stable. This is because our method dynamically adjusts the nodes according to the actual situation at run time. We can also see that GAPI's throughput drops more and more slowly as the update interval increases: a larger update interval gives GAPI enough time to adjust its nodes dynamically, which offsets the effect of the larger move-out ratio and yields a fairly stable throughput. Figure 5 also shows that our throughput is better than PGrid's in all cases.
Effect of the update interval on throughput.
Figure 6 shows the effect of the number of moving objects on throughput. As the number of moving objects increases, the throughput of both indexes decreases: the more moving objects there are, the greater the probability that multiple threads update the same node simultaneously, and the more threads must wait. However, as the number of moving objects increases, our method's throughput decreases markedly more slowly than PGrid's. This is because our method dynamically splits nodes according to the actual running conditions, which reduces the number of objects per node, reduces the number of waiting threads, and improves the efficiency of parallel updating.
Effect of the number of moving objects on throughput.
6. Conclusions
To satisfy the spatiality and timeliness requirements of TCPS applications, this paper proposes a hybrid indexing method, based on the grid index, that combines a quad-tree with GPU acceleration to avoid the disadvantages of both tree-based and grid-based indexing. Through experimental verification, the following conclusions are drawn:
Our index offloads the computation of balancing the index structure to the GPU, making full use of the GPU's ability to process large amounts of data quickly and reducing the CPU's computational load as much as possible, thereby greatly improving the efficiency of index optimization.
The index structure uses the number of objects entering and leaving a leaf node per unit of time as the split/merge criterion. Compared with traditional dynamic index structures, which use the number of objects in a leaf node as the criterion, our structure balances the update load on hotspots better, and its performance on moving-object updates is significantly better than that of existing methods.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Key R&D Program of China (2018YFB1003404) and the National Natural Science Foundation of China (61872071).
References
[1] D. P. F. Möller and H. Vakilzadian, "Cyber-physical systems in smart transportation," Proceedings of the 2016 IEEE International Conference on Electro Information Technology (EIT), Grand Forks, ND, USA, 2016.
[2] Y.-K. Huang, "Indexing and querying moving objects with uncertain speed and direction in spatiotemporal databases," 2014, 16(2), 139–160, doi: 10.1007/s10109-013-0191-6.
[3] T. Nguyen, Z. He, R. Zhang, and P. Ward, "Boosting moving object indexing through velocity partitioning," Proceedings of the VLDB Endowment, vol. 5, 2012, pp. 860–871, doi: 10.14778/2311906.2311913.
[4] G. Xiong, "Cyber-physical-social system in intelligent transportation," 2015, 2(3), 320–333.
[5] S. Šaltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez, "Indexing the positions of continuously moving objects," 2000, pp. 331–342.
[6] Y. N. Silva, X. Xiong, and W. G. Aref, "The RUM-tree: supporting frequent updates in R-trees using memos," 2009, 18(3), 719–738, doi: 10.1007/s00778-008-0120-3.
[7] Y. Zhu, S. Wang, X. Zhou, and Y. Zhang, "RUM+-tree: a new multidimensional index supporting frequent updates," 2013, pp. 235–240, doi: 10.1007/978-3-642-38562-9_24.
[8] D. Šidlauskas, S. Šaltenis, and C. S. Jensen, "Processing of extreme moving-object update and query workloads in main memory," 2014, 23(5), 817–841.
[9] D. Šidlauskas, S. Šaltenis, J. M. Christiansen, and D. Šaulys, "Trees or grids? Indexing moving objects in main memory," 2009, pp. 236–245.
[10] X. Xu, L. Xiong, V. Sunderam, J. Liu, and J. Luo, "Speed partitioning for indexing moving objects," 2015, pp. 216–234, doi: 10.1007/978-3-319-22363-6_12.
[11] R. B. Ray and A. K. Goel, "Supporting location-based services in a main-memory database," 2014, pp. 3–12.
[12] X. Xu, L. Xiong, and V. Sunderam, "D-Grid: an in-memory dual space grid index for moving object databases," 2016, pp. 252–261.
[13] D. Šidlauskas, S. Šaltenis, and C. S. Jensen, "Parallel main-memory indexing for moving-object query and update workloads," 2012, pp. 37–48.
[14] Q. Che, C.-W. Li, and Y. Zhang, "GAPI: GPU accelerated parallel method for indexing moving objects," 2017, 11(11), 1713–1722.
[15] L.-V. Nguyen-Dinh, W. G. Aref, and M. Mokbel, "Spatio-temporal access methods: Part 2 (2003–2010)," 2010, 33(2), 46–55.
[16] J. Tang, Z. Zhou, and K. Ning, "A novel spatial indexing mechanism leveraging dynamic quad-tree regional division," 2013.
[17] S. Chen, B. C. Ooi, and K.-L. Tan, "ST2B-tree: a self-tunable spatio-temporal B+-tree index for moving objects," 2008.
[18] A. Peleg, S. Wilkie, and U. Weiser, "Intel MMX for multimedia PCs," 1997, 40(1), 24–38, doi: 10.1145/242857.242865.
[19] S. Cook, CUDA Programming, Newnes, London, UK, 2012.
[20] A. D. Sarma, S. Gollapudi, and M. Najork, "A sketch-based distance oracle for web-scale graphs," 2010.