Velocity-Preserving Trajectory Compression Based on Retrace Point Detection

With the increasing popularity of GPS-equipped mobile devices such as smartphones and vehicle navigation systems, large numbers of trajectories containing valuable spatiotemporal information are being recorded. These trajectory records are typically generated and stored in large volumes, which places heavy pressure on device storage. Compressing trajectories is therefore a vital issue. Traditional trajectory compression techniques usually ignore or reduce trajectory semantics. In addition, most existing trajectory compression algorithms consider only the position errors of trajectories rather than their velocity errors. This paper proposes a velocity-preserving trajectory compression algorithm based on retrace point detection (VPTC-RP), which compresses a set of trajectories by removing redundant points while maintaining the skeleton of the trajectories as much as possible. In VPTC-RP, the retrace points and the velocity errors are used to reflect the speeds and directions associated with the points. VPTC-RP first determines the retrace points based on changes in movement direction and then extracts them from the original trajectories. In particular, the retrace points are put in a buffer, and the subtrajectories in the buffer are compressed according to the measured velocity errors. Simulations are carried out on the Geolife trajectory dataset, and the results indicate that VPTC-RP achieves a favorable tradeoff among compression error, compression ratio, and running time.


Introduction
With the rapid development of wireless communication and mobile computing technologies, more and more devices (e.g., mobile phones, smart watches/bands, and car navigation systems) are equipped with GPS modules, and thus the amount of trajectory data collected by GPS-enabled devices has increased drastically. To exploit the valuable information hidden in trajectory data, trajectory mining has attracted increasing attention, and its results can be applied to various fields, such as location recommendation [1], destination prediction [2], and personal navigation [3]. For example, the travel modes and preferred restaurants of some people (nodes) are reflected by their personal trajectories. Thus, based on their previous trajectories, suitable travel modes and restaurants can be recommended to these nodes.
Note, however, that the large volume of trajectory data imposes a heavy storage burden. For example, when a GPS device records its position every two seconds, it yields more than 10,000 points in one day (the recorded trajectories of 2000 moving nodes reach 1 GB). Besides, uploading and downloading trajectory data between mobile devices and servers consumes considerable time [4], so real-time positions can hardly be obtained and exploited.
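As a quick check of the sampling figures above, assuming uninterrupted recording at a fixed 2 s interval over 24 hours:

```latex
\frac{24 \times 3600\ \mathrm{s}}{2\ \mathrm{s/point}} = 43{,}200\ \text{points per node per day} > 10{,}000,
\qquad 2000 \times 43{,}200 = 8.64 \times 10^{7}\ \text{points}.
```

If the 1 GB figure refers to one day of recording, this corresponds to roughly $10^9 / (8.64 \times 10^7) \approx 11.6$ bytes per point.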
Therefore, it is essential to select some points from the original trajectories and discard the others while preserving the main features of the trajectories (e.g., the movement speeds and directions of nodes). To address this issue, several works have applied trajectory compression strategies to GPS trajectory data, which relieves the trajectory storage pressure accordingly.
Error measures evaluate the degree of deviation between the original trajectories and the compressed trajectories, and the main principle of trajectory compression is to compress trajectories under a certain error measure. Most trajectory compression algorithms employ some error measure, and different measures lead to different losses of trajectory features in the compressed trajectories. Therefore, different error measures should be selected for different compression targets, such as position, speed, and direction information. The main objectives of trajectory compression methods are to reduce both the number of trajectory points and the compression error, whereas some existing algorithms only make a tradeoff between compression accuracy and storage size.
Most existing methods consider the position errors rather than the velocity features of trajectories. Velocity represents the distance moved in a particular direction per unit time, and velocity features must be considered in several scenarios and applications [5,6]. For instance, nodes with different travel modes, such as walking, cycling, or taking buses or trains, have very different velocity features. Existing velocity-preserving trajectory compression methods [7,8] usually attempt to reduce the speed errors; these methods focus on the speed component of velocity, while the direction information is usually neglected.
Moreover, some current works ignore the semantic meanings of trajectories, and semantic trajectory compression methods have been used to solve this problem. Although an original trajectory records the movements of a mobile object, its meaning is hard to understand from raw coordinates (latitudes and longitudes). Original trajectories can be converted into sequences of points of interest, i.e., semantic forms that people can easily understand. Schmid et al. [9] propose the concept of semantic trajectory compression and argue that the dynamic state of a trajectory can be roughly expressed by some meaningful states and related events. In [10], the concept of a stay point is first proposed to extract life patterns from original trajectories. A stay point represents a geographic region where the node stays for a specific period, and each stay point carries some semantic meaning and hidden knowledge. For example, stay points can map trajectories to a semantic level and then reveal related regions of interest (e.g., restaurants, office buildings, and department stores). Besides, by analyzing the semantic meanings of trajectories, the retrace behaviors of nodes can be found and used to compress the trajectories. In particular, the points where the directions of nodes change sharply are referred to as retrace points (e.g., when someone lingers around a shopping region or tourists visit scenic spots), and these retrace points can help compress the original trajectories.
The movement of a pedestrian node typically exhibits strong randomness. Thus, this paper puts forward a trajectory compression algorithm based on retrace point detection to mine the semantics of trajectories. Specifically, to overcome the limitations of speed errors, we take into account the direction component of velocity and propose an online direction-preserving trajectory compression method. The main contributions of this paper are as follows:
(1) A new trajectory compression algorithm is proposed that compresses trajectories by extracting retrace points based on the maximum angular deviations of trajectory segments. The proposed algorithm removes redundant trajectory points while ensuring trajectory accuracy.
(2) The inscribed angle theorem and the distance ratio between trajectory points are applied to define the retrace region, within which each trajectory point can be taken as a retrace point.
(3) The standardized Euclidean distance between velocity vectors is used to measure the velocity (speed and direction) error. By compressing trajectories according to the velocity error rather than the speed error, more valuable information hidden in the trajectories can be preserved.
The rest of the paper is organized as follows: Section 2 reviews related works. The problem formulation is described in Section 3. Section 4 introduces the online velocity-preserving trajectory compression algorithm based on retrace point detection. Section 5 reports the simulation results for the performance evaluation of the algorithms. Finally, Section 6 concludes this paper.
This work is a significant extension of our earlier work [11]. Specifically, more related works are reviewed; VPTC-RP is improved and explained in more detail (e.g., a lemma and its proof are provided for defining the retrace region); more simulations are conducted to clarify the merits of VPTC-RP; and the time complexity of VPTC-RP is analyzed.

Related Works
In this section, we briefly review related trajectory compression algorithms, which can be roughly classified into two categories: offline compression and online compression. Offline compression algorithms collect complete trajectories and then delete redundant trajectory points. Because they have access to the complete original trajectories, these algorithms can more easily achieve global optimization and are more suitable for offline data analysis. In online compression, by contrast, mobile nodes compress real-time trajectory data while collecting it during movement, which makes online algorithms more suitable for real-time applications.
The best-known offline trajectory compression algorithm is the Douglas-Peucker (DP) algorithm [12], proposed by Douglas and Peucker. DP is a heuristic algorithm that confines the maximum position error of the compressed trajectory to a threshold. It starts with the first and the last points of a trajectory and recursively adds the point with the maximum perpendicular Euclidean distance (PED) until the PED of every point that is not retained is below a preset error threshold. DP adopts a top-down strategy, and the compressed trajectory is of high quality: compared with the original trajectory, it loses relatively little precision. However, its time complexity is high and can reach $O(N^2)$, where $N$ denotes the number of trajectory points. As shown in Figure 1, a trajectory starts from $p_1$ and ends with $p_7$; the PEDs of $p_4$ and $p_5$ are greater than the threshold, so $p_4$ and $p_5$ are retained. Although DP can effectively compress trajectories, it must run offline on fully collected trajectories, which makes it unsuitable for real-time applications in which trajectories must be compressed immediately. To address this, a generic sliding-window algorithm and the opening window (OPW) algorithm [13] were proposed; they compress the points within a window moving along the original trajectory. The sliding-window algorithm initializes a segment, called the sliding window, between the first point and the third point; then, it calculates the position errors of all points in the segment and repeatedly adds the point with the maximum error to the sliding window. Once a position error in the sliding window exceeds the threshold, the point exceeding the threshold is chosen as a feature point to con- OPW is an online trajectory compression algorithm.
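To make the DP recursion concrete, here is a minimal Python sketch, assuming planar (projected) x/y coordinates and PED as the error measure. The names `ped` and `douglas_peucker` are illustrative and not taken from [12].

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y); projected coordinates are assumed


def ped(p: Point, a: Point, b: Point) -> float:
    """Perpendicular Euclidean distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:  # degenerate segment: fall back to the point-to-point distance
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def douglas_peucker(points: List[Point], epsilon: float) -> List[int]:
    """Return the sorted indices of the points retained by DP with PED threshold epsilon."""
    if len(points) < 3:
        return list(range(len(points)))

    def recurse(first: int, last: int) -> List[int]:
        # find the interior point farthest from the chord first -> last
        max_dist, split = 0.0, -1
        for i in range(first + 1, last):
            d = ped(points[i], points[first], points[last])
            if d > max_dist:
                max_dist, split = d, i
        if max_dist <= epsilon:
            return [first, last]  # every skipped point is within the threshold
        left = recurse(first, split)
        right = recurse(split, last)
        return left[:-1] + right  # drop the duplicated split index

    return recurse(0, len(points) - 1)
```

For example, `douglas_peucker([(0, 0), (1, 1), (2, 2), (3, 3), (4, 0)], 0.5)` keeps indices `[0, 3, 4]`: the corner at `(3, 3)` is the farthest point from the first chord, and the two remaining sub-chords cover their interior points exactly.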
Unlike offline compression algorithms, online compression algorithms handle only local portions of a trajectory; their errors are larger than those of offline algorithms, but their average time complexity is much lower.
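The window-based online idea can be sketched as follows. This is a minimal sketch of one common opening-window variant, assuming PED as the error and assuming that, when the window fails, the last point of the still-valid window is kept and becomes the new anchor. Other variants keep the violating point instead, so this should not be read as the exact rule of [13].

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ped(p: Point, a: Point, b: Point) -> float:
    """Perpendicular Euclidean distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def opening_window(points: List[Point], epsilon: float) -> List[int]:
    """Grow a window from the anchor; when a PED check fails, keep the point before the float."""
    n = len(points)
    if n < 3:
        return list(range(n))
    kept = [0]
    anchor, float_idx = 0, 2
    while float_idx < n:
        violated = any(
            ped(points[i], points[anchor], points[float_idx]) > epsilon
            for i in range(anchor + 1, float_idx)
        )
        if violated:
            anchor = float_idx - 1  # last point of the window that was still valid
            kept.append(anchor)
            float_idx = anchor + 2
        else:
            float_idx += 1  # window still valid: open it one point further
    if kept[-1] != n - 1:
        kept.append(n - 1)  # always keep the final point
    return kept
```

Points are consumed in arrival order, and each decision only looks at the current window. This is what makes the approach usable online, at the cost of the locally greedy choices mentioned above.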
The Spatial QUalIty Simplification Heuristic (SQUISH) algorithm [14] uses the Time Synchronized Euclidean Distance (SED) error as a constraint to compress trajectories. It yields a short runtime, a high compression ratio, and a small trajectory error. However, SQUISH cannot guarantee a required compression ratio or compression error. The SQUISH-E algorithm [15] extends SQUISH and can minimize the trajectory SED error under a compression ratio bound λ. In particular, SQUISH-E uses a priority queue to store the priorities of the points to be processed and repeatedly removes the point with the lowest priority. When a point is deleted, the priorities of its two neighboring points are adjusted according to the priority of the deleted point. Recently, Yang et al. [16] proposed new error measures and developed a two-component method to compress trajectories. This method is derived from DP and simplifies trajectories with a guaranteed position error; it also enhances the semantic component through a data enrichment strategy that restricts the speed error, so that the original speed is preserved in the compressed representation as well.
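The priority-based removal described for SQUISH-E can be illustrated with a simplified Python sketch. Several choices here are assumptions, not the exact procedure of [15]:

- Instead of a heap, the buffer is rescanned every round, which is O(n²) but easy to read.
- Only the compression-ratio bound λ is enforced.
- When a point is removed, each neighbour keeps the larger of its own inherited value and the removed point's priority.

The names `sed` and `squish_e_ratio` are hypothetical.

```python
import math
from typing import List, Tuple

TPoint = Tuple[float, float, float]  # (x, y, t)


def sed(p: TPoint, a: TPoint, b: TPoint) -> float:
    """Time-synchronized Euclidean distance of p w.r.t. the segment a -> b."""
    if b[2] == a[2]:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    r = (p[2] - a[2]) / (b[2] - a[2])  # fraction of the segment's duration elapsed at p
    sx, sy = a[0] + r * (b[0] - a[0]), a[1] + r * (b[1] - a[1])
    return math.hypot(p[0] - sx, p[1] - sy)


def squish_e_ratio(points: List[TPoint], ratio: float) -> List[int]:
    """Simplified SQUISH-E(lambda): drop lowest-priority points until ceil(n / ratio) remain."""
    n = len(points)
    target = max(2, math.ceil(n / ratio))
    alive = list(range(n))     # indices still in the buffer, in temporal order
    inherited = [0.0] * n      # priority inherited from removed neighbours
    while len(alive) > target:
        best_pos, best_pri = -1, math.inf
        for pos in range(1, len(alive) - 1):  # the two endpoints are never removed
            i, pred, succ = alive[pos], alive[pos - 1], alive[pos + 1]
            pri = inherited[i] + sed(points[i], points[pred], points[succ])
            if pri < best_pri:
                best_pos, best_pri = pos, pri
        pred, succ = alive[best_pos - 1], alive[best_pos + 1]
        # assumed update rule: neighbours keep the larger of their own and the removed priority
        inherited[pred] = max(inherited[pred], best_pri)
        inherited[succ] = max(inherited[succ], best_pri)
        del alive[best_pos]
    return alive
```

With `points = [(0, 0, 0), (1, 0, 1), (2, 5, 2), (3, 0, 3), (4, 0, 4)]` and `ratio = 2.0`, the sketch keeps `[0, 2, 4]`: the spike at `t = 2` has the highest SED and survives.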
Besides, in [17], an enhanced Douglas-Peucker (EDP) algorithm applies a set of enhanced spatial-temporal constraints to simplify trajectory data. These constraints ensure that the essential properties of a trajectory are preserved by retaining critical points. Likewise, [18] analyzes the movement behaviors of nodes in terms of moving speeds, stop points, and moving directions, and proposes a novel Trajectory Partition Method based on combined movement Features (TPMF). TPMF first extracts the change points where the movement speeds of nodes vary significantly and then extracts the stop points by detecting the speed variations of nodes. Finally, the Douglas-Peucker algorithm is applied to partition the subtrajectories according to the extracted change points and stop points. Lin et al. propose a one-pass error-bounded trajectory simplification algorithm (OPERB) [19]. Based on a local distance checking method, OPERB maintains a directed line segment to approximate the buffered points and guarantees that the distance from the current point to the line segment is bounded. Reference [20] proposes a Joint Spatial-Temporal Trajectory Clustering Method (JSTTCM), in which spatial-temporal properties of the trajectories are exploited to cluster trajectory segments. In [21], the hierarchical structure of the DP-based trajectory compression algorithm is redesigned according to the GPU architecture and programming framework, which significantly accelerates the compression of large-scale vessel trajectories while maintaining the required compression quality. Reference [22] proposes an unsupervised learning method that automatically extracts low-dimensional features through a Convolutional Auto-Encoder (CAE); in particular, informative trajectory images are first generated by remapping the raw vessel trajectories into two-dimensional matrices.
Figure 1: A trajectory compressed by the Douglas-Peucker algorithm (panel labels: Step 1, Step 2, (b)).
Besides, a parallel trajectory compression algorithm based on the Hopfield neural network (HNN-based) is proposed in [23], which evidently reduces the processing delay. The total compression error, called the integral square error (ISE) between the original trajectories and the compressed trajectories, is the sum of the squared Euclidean distances over trajectory segments. The objective of the HNN-based algorithm is to find a subset of the original trajectory points that minimizes the total compression error. Typically, the HNN-based algorithm employs a two-dimensional binary Hopfield neural network to locate and save the points of the compressed trajectory. The network consists of N × M mutually interconnected neurons, and the rows and columns of the Hopfield network represent the points on the original trajectory and on the compressed trajectory, respectively. When the neuron state matrix reaches a stable state after updates, the total compression error is considered to be minimized.
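To make the ISE objective concrete, the sketch below computes ISE for a given subset of retained points. It then finds the minimum-ISE subset of size M by exhaustive search, which is only feasible for tiny trajectories. Two assumptions apply:

- Each skipped point contributes its squared perpendicular distance to the compressed segment that covers it. The exact distance used in [23] may differ.
- The endpoints are always kept.

The names `ise` and `min_ise_subset` are hypothetical.

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def ped(p: Point, a: Point, b: Point) -> float:
    """Perpendicular Euclidean distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def ise(points: List[Point], kept: Sequence[int]) -> float:
    """Sum of squared PEDs of the skipped points w.r.t. the compressed segment covering them."""
    total = 0.0
    for a, b in zip(kept, kept[1:]):
        for i in range(a + 1, b):
            total += ped(points[i], points[a], points[b]) ** 2
    return total


def min_ise_subset(points: List[Point], m: int) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search (tiny N only) for the m-point subset, endpoints included, with minimum ISE."""
    n = len(points)
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= n")
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for middle in itertools.combinations(range(1, n - 1), m - 2):
        kept = (0, *middle, n - 1)
        err = ise(points, kept)
        if best is None or err < best[0]:
            best = (err, kept)
    assert best is not None
    return best
```

The number of candidate subsets grows combinatorially with N. This is why approaches that search for a low-ISE subset without full enumeration, such as the HNN-based one, are of interest.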
Most of the aforementioned works compress trajectories only spatially, although critical movement features such as retrace points, moving speeds, and directions also need to be taken into account. Moreover, a favorable tradeoff between the compression ratio and the compression error should be achieved.

Trajectory Representation.
A trajectory represents the path that a moving object travels over time. A spatial trajectory can be described either by the path geometry or by the sequential positions of the object.
Definition 1 (GPS trajectory). Generally, a GPS trajectory $T$ consists of spatial data (such as latitudes and longitudes), and a timestamp $t$ is always stored along with the spatial data. A trajectory is thus an ordered sequence of time-stamped position points (locations), where each point is denoted by a tuple $p_n = \langle x_n, y_n, t_n \rangle$, and $x_n$ and $y_n$ represent the latitude and the longitude at timestamp $t_n$, respectively.
Definition 2 (trajectory segment). A trajectory segment, denoted by $Tr[1, n] = \{p_1, \cdots, p_n\}$, is a sequence extracted from the original trajectory.
Definition 3 (compressed trajectory segment). A compressed trajectory segment, denoted by $Tr'[1, n] = \{ps_1, \cdots, ps_m\}$, is the compression of $Tr[1, n] = \{p_1, \cdots, p_n\}$. All points of a compressed trajectory segment are contained in the original trajectory $T$ and keep their original order.
Definition 4 (direction of trajectory segment line). A line segment in a trajectory segment $Tr[1, n]$ is expressed as a vector. We have the following:
Definition 6 (compression ratio). Trajectory compression aims to extract a set of sequential points $Tr'[1, n] = \{ps_1, \cdots, ps_m\}$ from the trajectory segment $Tr[1, n]$ such that $Tr'[1, n]$ maintains the features of $Tr[1, n]$ as much as possible. The compression ratio is defined as $CR = m/n$ with $m < n$, where $n$ and $m$ denote the numbers of points in the original trajectory and in the compressed trajectory, respectively.
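The definitions above map onto a small data model. Below is a minimal Python sketch of Definitions 1, 2, 3, and 6, assuming immutable point records. "Compressed" is taken to mean an order-preserving subsequence of the original points. `GPSPoint`, `segment`, `is_compression_of`, and `compression_ratio` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class GPSPoint:
    """p_n = <x_n, y_n, t_n> from Definition 1 (latitude, longitude, timestamp)."""
    x: float
    y: float
    t: float


def segment(trajectory: Sequence[GPSPoint], start: int, end: int) -> List[GPSPoint]:
    """Tr[start, end] with 1-based, inclusive indices (Definition 2)."""
    return list(trajectory[start - 1:end])


def is_compression_of(original: Sequence[GPSPoint], compressed: Sequence[GPSPoint]) -> bool:
    """Definition 3 (assumed reading): compressed points appear in the original, in order."""
    it = iter(original)
    return all(q in it for q in compressed)  # `in` consumes the iterator, enforcing order


def compression_ratio(original: Sequence[GPSPoint], compressed: Sequence[GPSPoint]) -> float:
    """CR = m / n (Definition 6); smaller values mean stronger compression."""
    n, m = len(original), len(compressed)
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n")
    return m / n
```

For a six-point trajectory compressed to its first, fourth, and sixth points (as in Figure 3), `compression_ratio` returns `0.5`.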

Error Measurement.
First, we introduce two position error metrics, the Perpendicular Euclidean Distance (PED) [5] and the Time Synchronized Euclidean Distance (SED) [24]; then, we give some definitions for the error measurements.
3.2.1. Perpendicular Euclidean Distance. Given a trajectory segment $Tr[1, n]$ and its compressed representation $Tr'[1, n]$, the PED of $Tr'[1, n]$ with respect to a point For example, as shown in Figure 3(a), $p_1' = p_1$ and $p_4' = p_4$, where a trajectory containing $p_1, \cdots, p_6$ is compressed into $p_1$, $p_4$, and $p_6$.
Thus, with regard to a trajectory segment $Tr[s, k]$, for any point
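The difference between the two position errors can be illustrated with a Python sketch. It assumes planar x/y coordinates and that each original point is measured against the compressed segment whose endpoints bracket it, as in the $Tr'[1, 6] = \{p_1, p_4, p_6\}$ example. `point_errors` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple

TPoint = Tuple[float, float, float]  # (x, y, t)


def ped(p: TPoint, a: TPoint, b: TPoint) -> float:
    """Perpendicular distance from p to the line through a and b (timestamps ignored)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def sed(p: TPoint, a: TPoint, b: TPoint) -> float:
    """Distance from p to the time-synchronized position p' interpolated on a -> b."""
    if b[2] == a[2]:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    r = (p[2] - a[2]) / (b[2] - a[2])
    sx, sy = a[0] + r * (b[0] - a[0]), a[1] + r * (b[1] - a[1])
    return math.hypot(p[0] - sx, p[1] - sy)


def point_errors(points: List[TPoint], kept: Sequence[int]) -> Tuple[List[float], List[float]]:
    """PED and SED of every original point against the compressed segment that covers it."""
    peds = [0.0] * len(points)  # retained points have zero error
    seds = [0.0] * len(points)
    for a, b in zip(kept, kept[1:]):
        for i in range(a + 1, b):
            peds[i] = ped(points[i], points[a], points[b])
            seds[i] = sed(points[i], points[a], points[b])
    return peds, seds
```

Because the synchronized point $p_i'$ lies on the same line that PED measures against, SED is never smaller than PED. For instance, with `points = [(0, 0, 0), (1, 1, 1), (3, 1, 2), (4, 0, 3)]` and `kept = [0, 3]`, both interior points have PED 1 but SED $\sqrt{10}/3 \approx 1.054$.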

Speed Error.
GPS data collected by positioning modules usually do not contain speed information. Therefore, speed is calculated from the trajectory points and their relations, i.e., from the distance and the time interval between adjacent points. The speed between two adjacent points $p_s$ and $p_k$ is denoted by $sp(p_s, p_k)$ and is expressed as follows:
$$sp(p_s, p_k) = \frac{dis(p_s, p_k)}{t_k - t_s},$$
where $dis(p_s, p_k)$ denotes the Euclidean distance between $p_s$ and $p_k$. Likewise, if $p_s$ and $p_k$ are not adjacent, the speed between them is denoted by $tsp(p_s, p_k)$ and is calculated as the average speed of traveling from $p_s$ to $p_k$, i.e., the traveled distance divided by the elapsed time:
$$tsp(p_s, p_k) = \frac{\sum_{i=s}^{k-1} dis(p_i, p_{i+1})}{t_k - t_s}.$$
Speed error [25] is a vital metric for various traffic applications; it measures the difference between the actual speed and the estimated speed. The speed error between $p_s$ and $p_k$ is denoted by $SpdE(p_s, p_k)$, which is written as follows: For example, as shown in Figure 3, Besides, the segment $\overrightarrow{p_i p_{i+1}}$ also contains some direction Likewise, the coordinates of the average travel velocity vector have a form similar to those of the average velocity. The average travel velocity $V'(\overrightarrow{p_i p_j})$ can be represented by the point with coordinates $(V_X'(p_i p_j), V_Y'(p_i p_j))$. Note that the magnitude $|V'(\overrightarrow{p_i p_j})|$ is equal to $tsp(p_i, p_j)$. Let $\theta_{ave}(\overrightarrow{p_i p_j})$ denote the average direction of the trajectory segment $\overrightarrow{p_i p_j}$; $\theta_{ave}(\overrightarrow{p_i p_j})$ is expressed as follows:
Figure 3: PED and SED errors illustrated by $Tr[1, 6]$ and $Tr'[1, 6] = \{p_1, p_4, p_6\}$.
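The speed quantities above can be computed as in the following Python sketch, assuming planar coordinates and strictly increasing timestamps. Since the expression for $\theta_{ave}$ is not reproduced here, the sketch assumes $\theta_{ave}$ is the heading of the net displacement from $p_s$ to $p_k$. The function `average_travel_velocity` is therefore hypothetical.

```python
import math
from typing import List, Tuple

TPoint = Tuple[float, float, float]  # (x, y, t), planar coordinates assumed


def dis(a: TPoint, b: TPoint) -> float:
    """Euclidean distance dis(p_s, p_k)."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def sp(a: TPoint, b: TPoint) -> float:
    """Speed between adjacent points: distance divided by the time interval."""
    return dis(a, b) / (b[2] - a[2])


def tsp(points: List[TPoint], s: int, k: int) -> float:
    """Average travel speed from p_s to p_k: path length divided by elapsed time."""
    path = sum(dis(points[i], points[i + 1]) for i in range(s, k))
    return path / (points[k][2] - points[s][2])


def velocity(a: TPoint, b: TPoint) -> Tuple[float, float]:
    """Velocity vector (V_X, V_Y) of the straight move a -> b."""
    dt = b[2] - a[2]
    return ((b[0] - a[0]) / dt, (b[1] - a[1]) / dt)


def average_travel_velocity(points: List[TPoint], s: int, k: int) -> Tuple[float, float]:
    """Hypothetical V'(p_s p_k): magnitude tsp(p_s, p_k), direction theta_ave.

    theta_ave is ASSUMED here to be the heading of the net displacement p_s -> p_k.
    """
    theta_ave = math.atan2(points[k][1] - points[s][1], points[k][0] - points[s][0])
    speed = tsp(points, s, k)
    return (speed * math.cos(theta_ave), speed * math.sin(theta_ave))
```

For `points = [(0, 0, 0), (3, 4, 1), (6, 0, 2)]`, the straight-line velocity from the first to the last point is `(3.0, 0.0)`, while $V'$ is `(5.0, 0.0)`: the detour through `(3, 4)` makes the travel speed larger than the displacement speed.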

Wireless Communications and Mobile Computing
Moreover, we introduce the Direction Preserving Velocity Error (DPVE). Given a trajectory $T$ and its compressed trajectory $Tr'[1, n] = \{ps_1, \cdots, ps_m\}$, the compression velocity error of the segment $Tr[ps_k, ps_{k+1}]$ is denoted by $DPVE(ps_k, ps_{k+1})$ and represents the standardized Euclidean distance between the two velocity vectors $V(\overrightarrow{ps_k ps_{k+1}})$ and $V'(\overrightarrow{ps_k ps_{k+1}})$: where $dist(V(\overrightarrow{ps_k ps_{k+1}}), V'(\overrightarrow{ps_k ps_{k+1}}))$ is obtained by the following: where $ps_k$ and $ps_{k+1}$ denote two adjacent points in the compressed point set and $m_i$ denotes the average of the $i$-th dimension of $V(\overrightarrow{ps_k ps_{k+1}})$ and $V'(\overrightarrow{ps_k ps_{k+1}})$.
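A standardized Euclidean distance divides each squared component difference by a per-dimension variance before summing. The Python sketch below shows this for two-dimensional velocity vectors. As an assumption, the per-dimension variances are passed in explicitly, for example estimated over a set of segment velocities with `dimension_variances`, rather than derived from the paper's $m_i$. Both function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple

Vec = Tuple[float, float]  # (V_X, V_Y)


def dimension_variances(vectors: Sequence[Vec]) -> Tuple[float, float]:
    """Per-dimension population variance over a set of velocity vectors (assumed scaling)."""
    xs, ys = zip(*vectors)
    return statistics.pvariance(xs), statistics.pvariance(ys)


def dpve(v: Vec, v_prime: Vec, variances: Sequence[float]) -> float:
    """Standardized Euclidean distance between V and V': each squared gap is divided by its variance."""
    if any(var <= 0 for var in variances):
        raise ValueError("variances must be positive")
    return math.sqrt(sum((a - b) ** 2 / var for a, b, var in zip(v, v_prime, variances)))
```

Scaling each dimension by its variance keeps a large but typical fluctuation in one velocity component from dominating a small but unusual deviation in the other. Both the speed and the direction of the velocity therefore influence the error.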

Algorithm
In this section, we present the velocity-preserving trajectory compression algorithm based on retrace point detection (VPTC-RP). VPTC-RP takes into account the semantic meanings associated with movement randomness and exploits the retrace points and the velocity information to compress the trajectories.

Retrace Point Detection.
We note that sharp direction changes may occur frequently, especially in pedestrian trajectories, and a sharply changed direction at a point may indicate that the moving object is retracing towards previous points. In real scenes, some trajectory points indicate that pedestrian nodes pass through landmarks with high weights, and the trajectories around these landmarks have distinct features. Accordingly, these trajectory points are much more vital than other trajectory points. We therefore define a retrace point as follows: a node retraces towards a previous point, and several subsequent points are very close to that previous point, which is referred to as an anchor point. In a retracement, several points are typically located around the anchor point (e.g., someone lingers around some goods in a shopping mall). The points falling into such a retrace region are taken as retrace points, and the number of retrace points is typically small.
Then, we introduce an index, retrace stability, to measure the proximity of the subsequent points to the anchor point, through which the retrace points can be found. A retracement must be accompanied by a maximum directional deviation of the trajectory segment larger than π/2. Therefore, the retrace point detection algorithm should check the maximum directional deviation of the trajectory segment. A theorem and a lemma are given for finding the retrace points:
Theorem 7. An inscribed angle θ in a circle is half of the central angle 2θ that subtends the same arc.
Proof. As proven in [26], the inscribed angle does not change when its vertex is moved to different positions on the circle (on the same side of the chord).
According to Theorem 7, in the example shown in Figure 4, where $p_1$, $p_2$, $p_3$, and $p_4$ lie on the same circle, we have $\angle p_1p_3p_2 = \angle p_1p_4p_2$. Moreover, Lemma 8 can be derived from Theorem 7.

Lemma 8.
For any arc on a circle, if a point is located outside the region enclosed by the arc and its two endpoints, the angle formed by the point and the two endpoints is smaller than the inscribed angle. Conversely, the angle formed by a point inside this region is larger than the inscribed angle.
Proof. Given an arc $\widehat{p_1p_3}$ and the center $O$ of the circle, as depicted in Figure 4, $\angle p_1p_3p_2$ denotes the inscribed angle of $\widehat{p_1p_3}$. The points $p_3'$ and $p_3''$ are located on the ray $Op_3$. In $\triangle p_2p_3p_3''$, according to the exterior angle theorem, the exterior angle $\angle p_2p_3O$ is equal to $\angle Op_3''p_2 + \angle p_3p_2p_3''$. Likewise, $\angle p_1p_3O$ is equal to $\angle Op_3''p_1 + \angle p_3p_1p_3''$. Furthermore, we obtain
$$\angle p_1p_3p_2 = \angle p_1p_3O + \angle p_2p_3O = \angle p_1p_3''p_2 + \angle p_3p_1p_3'' + \angle p_3p_2p_3'' > \angle p_1p_3''p_2.$$
Thus, by symmetry, the points that subtend the same inscribed angle form an egg-shaped boundary on both sides of a line segment. To clarify the algorithm, we provide the following definitions:
Definition 9 (retrace region). Given a line segment and an inscribed angle threshold, the two arcs formed by the points subtending the threshold angle on the two sides of the line segment generate an egg-shaped area; this egg-shaped area, enclosed by the dotted line, is termed a retrace region. As shown in Figure 4, the shaded area represents a retrace region.
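Theorem 7, Lemma 8, and Definition 9 reduce to a simple membership test: a point lies in the retrace region of a segment when it sees that segment under an angle of at least the threshold. A minimal Python sketch follows, with hypothetical names `angle_at` and `in_retrace_region`.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle a-vertex-b in radians, in [0, pi]."""
    ux, uy = a[0] - vertex[0], a[1] - vertex[1]
    vx, vy = b[0] - vertex[0], b[1] - vertex[1]
    return abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))


def in_retrace_region(p: Point, a: Point, b: Point, theta_th: float) -> bool:
    """True if p sees the segment ab under an angle of at least theta_th (boundary included)."""
    return angle_at(p, a, b) >= theta_th
```

With `a, b = (-1, 0), (1, 0)`, every point on the unit circle, such as `(0, 1)` or `(0.6, 0.8)`, sees the segment at π/2, as Theorem 7 predicts. The inner point `(0, 0.5)` sees it at a larger angle and the outer point `(0, 2)` at a smaller one, matching Lemma 8.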
Definition 10 (retrace stability). Given an anchor point, a float point, and the line segment between the two points, the retrace stability $S_{re}$ of a subsequent point is determined by two variables: (1) the distance ratio δ, defined as the ratio of the distance from the subsequent point to the float point to its distance to the anchor point,
$$\delta = \frac{dis(p_{fl}, p_{sub})}{dis(p_{an}, p_{sub})},$$
where $dis(p_{fl}, p_{sub})$ and $dis(p_{an}, p_{sub})$ denote the distance from the subsequent point to the float point and the distance to the anchor point, respectively. Then, we explore the impact of δ on the relationship between the subsequent point and the anchor point; to clarify this issue, we provide Theorem 11.
Theorem 11. When the distance ratio δ is larger, the subsequent point is closer to the anchor point.
Proof. In a rectangular coordinate system, let the float point be $p_{fl}(-a, 0)$ with $a > 0$, and let the anchor point $p_{an}(0, 0)$ be the origin, as shown in Figure 5. Suppose the subsequent point is $p_{sub}(x, y)$ and $dis(p_{fl}, p_{sub})/dis(p_{an}, p_{sub}) = k$ $(k > 0, k \neq 1)$. Thus, we obtain
$$\frac{\sqrt{(x + a)^2 + y^2}}{\sqrt{x^2 + y^2}} = k \;\Longrightarrow\; \left(x - \frac{a}{k^2 - 1}\right)^2 + y^2 = \frac{a^2k^2}{(k^2 - 1)^2}.$$
Hence, for the same δ = k, $p_{sub}$ moves on the circle with center $\left(\frac{a}{k^2 - 1}, 0\right)$ and radius $R = \frac{ak}{|k^2 - 1|}$.
The function $f(k) = \frac{k}{k^2 - 1}$ is an odd function; here it is considered on the domain $(0, 1) \cup (1, +\infty)$. Its derivative is
$$f'(k) = \frac{(k^2 - 1) - 2k^2}{(k^2 - 1)^2} = -\frac{k^2 + 1}{(k^2 - 1)^2} < 0,$$
so $f(k)$ is monotonically decreasing on each interval of the domain. Similarly, $\frac{ak}{k^2 - 1}$ is monotonically decreasing on $(1, +\infty)$, i.e., $R$ shrinks as $k$ grows beyond 1.
Thus, for δ > 1, the circle traced by the subsequent point at a larger distance ratio δ is contained in the circle at a smaller δ (the circle spans $x \in [-a/(k+1), a/(k-1)]$, an interval that shrinks towards the anchor point as $k$ increases), i.e., the subsequent point is closer to the anchor point.
Moreover, when k = 1, the subsequent point lies on the perpendicular bisector of the line segment $p_{fl}p_{an}$; when k > 1, it lies on the same side of the perpendicular bisector as the anchor point (the right side in Figure 5). Therefore, as δ increases over the interval $(0, +\infty)$, the subsequent point gets closer to the anchor point.
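A quick numerical check of the circle in the proof of Theorem 11 is sketched below, with $p_{fl} = (-a, 0)$ and $p_{an}$ at the origin. Completing the square gives the center $(a/(k^2-1), 0)$ and radius $ak/|k^2-1|$. The sketch samples points on that circle, confirms that each has distance ratio $k$, and checks that the radius shrinks as $k$ grows beyond 1. `ratio_circle` and `distance_ratio` are hypothetical helper names.

```python
import math
from typing import Tuple


def ratio_circle(a: float, k: float) -> Tuple[Tuple[float, float], float]:
    """Circle of points with dis(p_fl, p) / dis(p_an, p) = k, for p_fl = (-a, 0), p_an = (0, 0)."""
    if a <= 0 or k <= 0 or k == 1:
        raise ValueError("need a > 0, k > 0 and k != 1")
    center = (a / (k * k - 1), 0.0)
    radius = a * k / abs(k * k - 1)
    return center, radius


def distance_ratio(p: Tuple[float, float], a: float) -> float:
    """delta = dis(p_fl, p) / dis(p_an, p) with p_fl = (-a, 0) and p_an at the origin."""
    return math.hypot(p[0] + a, p[1]) / math.hypot(p[0], p[1])
```

For `a = 1` and `k = 2`, the circle is centered at `(1/3, 0)` with radius `2/3`. Every sampled point on it has `distance_ratio` equal to 2, and the radii for `k = 1.5, 2, 3` are `1.2`, `0.667`, and `0.375`.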
Then, $S_{re}(p_{an}, p_{fl})$ is expressed as $(\delta, \theta)$.
Definition 12 (retrace point). If $S_{re}(p_{an}, p_{fl}) \geq (\delta_{th}, \theta_{th})$ componentwise, i.e., $\delta \geq \delta_{th}$ and $\theta \geq \theta_{th}$, where $\delta_{th}$ and $\theta_{th}$ denote the preset thresholds of δ and θ, respectively, then the subsequent point is considered to be located in the retrace region. A circle (including its boundary) is then generated, centered at the anchor point with the distance between the subsequent point and the anchor point as its radius, and each point falling into this circle is taken as a retrace point.
An example of retrace points is given in Figure 6, where $p_5$ shows a trend of retracing towards $p_1$. Hence, $p_1$ is regarded as the anchor point and $p_4$ as the float point. If $S_{re}(p_1, p_4) \geq (\delta_{th}, \theta_{th})$, then $p_5$ is a retrace point. Then, within the circle centered at $p_1$ with radius $|\overrightarrow{p_1p_5}|$, $p_6$ and $p_7$ are also taken as retrace points.
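Definition 12 can be sketched in Python under two stated assumptions. First, θ is taken as the angle at the subsequent point subtended by the anchor and float points. Second, retrace points are collected by scanning forward from the subsequent point until the node leaves the circle. The coordinates below are toy values, not those of Figure 6; index 0 plays the role of $p_1$. `retrace_points` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dis(a: Point, b: Point) -> float:
    return math.hypot(b[0] - a[0], b[1] - a[1])


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Angle a-vertex-b in radians, in [0, pi]."""
    ux, uy = a[0] - vertex[0], a[1] - vertex[1]
    vx, vy = b[0] - vertex[0], b[1] - vertex[1]
    return abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))


def retrace_points(points: List[Point], an: int, fl: int, sub: int,
                   delta_th: float, theta_th: float) -> List[int]:
    """Definition 12 sketch: test S_re = (delta, theta), then collect points inside the circle."""
    delta = dis(points[fl], points[sub]) / dis(points[an], points[sub])
    theta = angle_at(points[sub], points[an], points[fl])  # assumed meaning of theta
    if delta < delta_th or theta < theta_th:
        return []
    radius = dis(points[an], points[sub])
    found = []
    r = sub
    # assumed: scan forward from the subsequent point until the node leaves the circle
    while r < len(points) and dis(points[an], points[r]) <= radius:
        found.append(r)
        r += 1
    return found
```

With `points = [(0, 0), (2, 0), (4, 0), (6, 0), (1, 0.5), (0.5, 0.5), (-0.5, 0.5), (5, 5)]`, anchor `0`, float `3`, and subsequent point `4`, the thresholds `δ_th = 1.5` and `θ_th = 0.2π` are met. The sketch returns `[4, 5, 6]`, mirroring how $p_5$, $p_6$, and $p_7$ are found in the example.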
The pseudocode of retrace point detection (RPD) is given in Algorithm 1.
The retrace point detection steps are described as follows: Step 1. The process starts by defining a segment between the first point p 1 and the third point p 3 in Tr½1, n, where p 1 is taken as the first anchor point p i and p 3 is taken as the first float point p j . With regard to each segment between the anchor point and the float point p i p j ! , the direction deviation εðp i p j ! Þ is calculated, and then, the subsequent points in Tr ½1, n will play the roles of p i and p j , and the direction deviations will be processed as well.
Step 2. The maximum direction deviation in Tr½1, n is selected, and the selected pair of anchor point and float point (with the maximum direction deviation) is recorded as p s and p e , respectively. Especially, if εðp i p j ! Þ is larger than π/2, then p i and p j will be directly recorded as p s and p e , which is due to the fact that this derivation indicates a trend of retracing towards p s .
Step 3. If the retrace stability of p e is larger than a preset threshold, then a retrace point detection is performed to find the retrace points ðp r ∈ RPÞ, so that the inequality disðp s , p r Þ ≤ disðp s , p e Þ is satisfied.
Step 4. The above steps will be executed until all points in Tr½1, n have been processed, and the retrace points have been found.
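Steps 1-4 can be sketched in Python as below. Several pieces are assumptions, since $\varepsilon(\overrightarrow{p_ip_j})$ is not defined explicitly here:

- ε is taken as the largest heading change of any segment up to $p_j$ relative to the first segment $p_i \to p_{i+1}$.
- When ε first exceeds π/2, $p_{j-1}$ serves as the float point and $p_j$ as the retracing point.
- Only the π/2 shortcut of Step 2 is implemented, not the global maximum selection.
- After a retrace group is found, detection resumes from its last point.

All names are hypothetical, and the loop is O(n³) in the worst case.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def dis(a: Point, b: Point) -> float:
    return math.hypot(b[0] - a[0], b[1] - a[1])


def heading(a: Point, b: Point) -> float:
    """Direction of the vector a -> b in radians."""
    return math.atan2(b[1] - a[1], b[0] - a[0])


def turn(h1: float, h2: float) -> float:
    """Absolute difference between two headings, folded into [0, pi]."""
    d = abs(h1 - h2) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def direction_deviation(points: List[Point], i: int, j: int) -> float:
    """Assumed epsilon(p_i p_j): largest turn of any segment up to p_j relative to p_i -> p_{i+1}."""
    ref = heading(points[i], points[i + 1])
    return max(turn(ref, heading(points[k], points[k + 1])) for k in range(i + 1, j))


def detect_retrace_points(points: List[Point], delta_th: float = 1.5,
                          theta_th: float = 0.2 * math.pi) -> List[int]:
    n = len(points)
    retrace: Set[int] = set()
    i = 0  # current anchor point p_i
    while i + 2 < n:
        advanced = False
        for j in range(i + 2, n):  # Step 1: slide the float point
            if direction_deviation(points, i, j) <= math.pi / 2:
                continue
            # Step 2 (pi/2 shortcut): p_i is the anchor, p_{j-1} the float, p_j the retracing point
            an, fl, sub = i, j - 1, j
            delta = dis(points[fl], points[sub]) / max(dis(points[an], points[sub]), 1e-12)
            theta = turn(heading(points[sub], points[an]), heading(points[sub], points[fl]))
            if delta >= delta_th and theta >= theta_th:  # Step 3: retrace stability check
                radius = dis(points[an], points[sub])
                r = sub
                while r < n and dis(points[an], points[r]) <= radius:
                    retrace.add(r)
                    r += 1
                i, advanced = r - 1, True  # continue from the last retrace point
            break
        if not advanced:
            i += 1  # Step 4: move the anchor forward and repeat
    return sorted(retrace)
```

The defaults `delta_th = 1.5` and `theta_th = 0.2π` mirror the thresholds chosen later in the simulations. On a straight path the sketch finds no retrace points. On the toy path `[(0, 0), (2, 0), (4, 0), (6, 0), (1, 0.5), (0.5, 0.5), (-0.5, 0.5), (5, 5)]`, it returns `[4, 5, 6]`.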

Trajectory Compression.
In VPTC-RP, the retrace points are first deleted based on the results of retrace point detection, and then the velocity information is preserved as much as possible during trajectory compression. The Direction Preserving Velocity Error (DPVE) is proposed to preserve velocity (speed and direction) information; it is represented by the standardized Euclidean distance between two velocity vectors, and this measure preserves speed information well. Moreover, the retrace points are redundant trajectory points where the directions of nodes change sharply. Therefore, by adopting the above mechanisms, both speed and direction information can be preserved in the compressed trajectories.
The pseudocode of VPTC-RP is given in Algorithm 2.
Step 5. The retrace points are found according to Steps 1-4. For each pair of anchor point and float point, two cases are considered:
(a) If $\varepsilon(\overrightarrow{p_ip_j}) > \pi/2$, let $p_r$ be the last point in the current retrace point set; then the anchor point $p_s$ associated with $p_r$ is added to the compressed trajectory.
(b) If $\varepsilon(\overrightarrow{p_ip_j}) \leq \pi/2$ or no retrace points can be found, $DPVE(p_i, p_j)$ is calculated and the result is denoted by λ. If λ is larger than a preset threshold, then $p_j$ is added to the compressed trajectory $Tr'[1, n]$.
Step 6. When all points in the trajectory have been processed and compressed, the compressed points will be output.
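Steps 5 and 6 can be outlined as follows. This is a Python sketch that takes the detected retrace points and their anchors as inputs, deletes the retrace points first, always keeps the anchors (case (a)), and otherwise keeps $p_j$ when the velocity error against the current anchor exceeds the threshold (case (b)). Because DPVE's exact normalization is not reproduced here, the default error is a hypothetical stand-in, `velocity_deviation`: the largest Euclidean gap between any segment velocity and the chord velocity. `vptc_rp_sketch` is likewise a hypothetical name.

```python
import math
from typing import Callable, List, Set, Tuple

TPoint = Tuple[float, float, float]  # (x, y, t), strictly increasing t assumed


def velocity(a: TPoint, b: TPoint) -> Tuple[float, float]:
    dt = b[2] - a[2]
    return ((b[0] - a[0]) / dt, (b[1] - a[1]) / dt)


def velocity_deviation(points: List[TPoint], i: int, j: int) -> float:
    """Hypothetical stand-in for DPVE: worst gap between segment velocities and the chord velocity."""
    vx, vy = velocity(points[i], points[j])
    return max(
        math.hypot(sx - vx, sy - vy)
        for sx, sy in (velocity(points[k], points[k + 1]) for k in range(i, j))
    )


def vptc_rp_sketch(points: List[TPoint], retrace: Set[int], anchors: Set[int], lam_th: float,
                   error: Callable[[List[TPoint], int, int], float] = velocity_deviation) -> List[int]:
    """Return the original indices of the kept points."""
    idx = [i for i in range(len(points)) if i not in retrace]  # retrace points are deleted first
    sub = [points[i] for i in idx]
    kept, anchor = [0], 0
    for j in range(1, len(sub)):
        # case (a): anchors of retrace groups are kept; case (b): keep p_j when lambda > threshold
        if idx[j] in anchors or error(sub, anchor, j) > lam_th:
            kept.append(j)
            anchor = j
    if len(sub) > 1 and kept[-1] != len(sub) - 1:
        kept.append(len(sub) - 1)  # Step 6: the last point closes the compressed trajectory
    return [idx[k] for k in kept]
```

Consider the path `[(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 3), (3, 1, 4), (3, 2, 5)]` with no retrace points and `lam_th = 0.5`. The turn at `t = 4` pushes the error above the threshold, so, following case (b), the point at index 4 is kept and the result is `[0, 4, 5]`.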
According to these steps, VPTC-RP adopts the sliding-window strategy. If VPTC-RP fails to find any retrace points, its process is similar to that of the sliding-window algorithm. Therefore, the worst-case time complexity of VPTC-RP is $O(n\beta)$, where β is the maximum buffer size and n is the number of points in the original trajectory.

Simulation Settings.
In this section, we conduct an extensive simulation study of VPTC-RP on the Geolife dataset. All simulations are run on a computer with Windows 10, a 1.60 GHz CPU, and 8 GB of memory, and the trajectory compression algorithms are implemented in Python.
5.2. Dataset. The GPS trajectory dataset is taken from the Geolife project of Microsoft Research Asia [27], which collected the trajectories of 182 users over five years (from April 2007 to August 2012). The dataset contains 17,621 trajectories with a total distance of 1,292,951 kilometers and a total duration of 50,176 hours, recording a broad range of users' outdoor movements. These trajectories were recorded by different GPS loggers and GPS phones and have a variety

Parameter Settings.
After filtering out abnormal data, we randomly select 450 trajectories for the simulations. These trajectories are collected in one day from 20 different users, which is sufficient to exhibit the behaviors of nodes. The total number of trajectory points is 449,796. To study the effect of parameter variations, we divide all the trajectories into three groups, whose details are shown in Table 1.
First, we examine the impacts of the inscribed angle threshold θ and the distance ratio δ on the number of retrace points; the simulation results are given in Figure 7.
From Figures 7(a)-7(c), we observe that the curves decrease rapidly as the distance ratio threshold increases from 1.1 to 1.4, because VPTC-RP treats more points as retrace points when a smaller threshold δ is set. When δ ≥ 1.4, the curves descend slowly; in particular, when 1.5 ≤ δ ≤ 1.9, the curves remain almost stable, i.e., the number of retrace points fluctuates only slightly, because the trajectory points in retrace regions become denser in this range. Thus, δ should be selected from the interval [1.4, 1.6]. Moreover, the number of extracted retrace points at δ = 1.4 or δ = 1.6 differs obviously from that at δ = 1.5. Hence, we set $\delta_{th} = 1.5$ in the following simulations.
As shown in Figures 8(a)-8(c), the number of retrace points remains almost unchanged for 0.2π ≤ θ ≤ 0.4π, i.e., a small θ has no obvious impact on the number of retrace points when θ ∈ [0.2π, 0.4π]. In contrast, for 0.4π ≤ θ ≤ 0.8π, the number of retrace points decreases rapidly as θ increases, so too few retrace points are extracted in this range. Thus, we set θ_th = 0.2π to extract more retrace points. The main simulation parameters are provided in Table 2.
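To make the monotone effects of θ and δ concrete, the following sketch flags a point when both its heading change and a local path-to-chord distance ratio reach their thresholds. This is a hypothetical detector written only for illustration: the functions turning_angle, distance_ratio, and retrace_candidates, the window half-width k, and the use of planar coordinates in meters are all assumptions, and it is not the retrace-point definition used by VPTC-RP. Under these assumptions, raising either threshold can only shrink the set of flagged points, which matches the downward trends described for Figures 7 and 8.

```python
import math
from typing import List, Sequence, Tuple

XY = Tuple[float, float]  # planar position in meters (assumed projection)


def turning_angle(a: XY, b: XY, c: XY) -> float:
    """Heading change at b in [0, pi]: 0 = straight on, pi = full U-turn."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(h2 - h1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def distance_ratio(pts: Sequence[XY], i: int, k: int) -> float:
    """Path length divided by straight-line distance over the window [i-k, i+k]."""
    lo, hi = max(0, i - k), min(len(pts) - 1, i + k)
    path = sum(math.dist(pts[j], pts[j + 1]) for j in range(lo, hi))
    chord = math.dist(pts[lo], pts[hi])
    return math.inf if chord == 0 else path / chord


def retrace_candidates(pts: Sequence[XY], theta_th: float = 0.2 * math.pi,
                       delta_th: float = 1.5, k: int = 2) -> List[int]:
    """Indices whose heading change reaches theta_th AND whose local detour
    ratio reaches delta_th (hypothetical rule, for illustration only)."""
    return [i for i in range(1, len(pts) - 1)
            if turning_angle(pts[i - 1], pts[i], pts[i + 1]) >= theta_th
            and distance_ratio(pts, i, k) >= delta_th]
```

For example, on the out-and-back path [(0, 0), (10, 0), (20, 0), (10, 1), (0, 1)], retrace_candidates returns [2] with θ_th = 0.2π and δ_th = 1.5, and returns an empty list when θ_th is raised to 0.99π.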

Wireless Communications and Mobile Computing
We compare VPTC-RP with other algorithms in terms of several evaluation metrics:
(i) The compression error between the compressed trajectories and the original trajectories. Based on the compression error, we define a new metric, termed average velocity error, to measure the velocity error between two adjacent compressed points. For example, for a trajectory Tr[1, n] and its compressed representation Tr′[1, n] = {ps_1, ⋯, ps_m}, the average velocity error is written as follows: where ps_k denotes the k-th compressed point in the compressed trajectory point set and DPVE(ps_k, ps_{k+1}) represents the velocity error between ps_k and ps_{k+1}.
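The average velocity error can be sketched as the mean of a pairwise velocity error over the m − 1 adjacent compressed pairs. This is a minimal sketch under two assumptions: the averaging over m − 1 pairs is inferred from the description above, and toy_velocity_error is a hypothetical stand-in for DPVE that compares each original segment's velocity vector with that of the shortcut segment between two retained points. It is not the paper's DPVE definition.

```python
import math
from typing import Callable, List, Sequence, Tuple

TPoint = Tuple[float, float, float]  # (t seconds, x meters, y meters)


def toy_velocity_error(traj: Sequence[TPoint], i: int, j: int) -> float:
    """Hypothetical stand-in for DPVE(ps_k, ps_k+1), with ps_k = traj[i] and
    ps_k+1 = traj[j]: mean magnitude of the difference between each original
    segment's velocity vector in [i, j] and the shortcut's velocity vector.
    Assumes i < j and strictly increasing timestamps."""
    (ti, xi, yi), (tj, xj, yj) = traj[i], traj[j]
    vx, vy = (xj - xi) / (tj - ti), (yj - yi) / (tj - ti)
    errors = []
    for (t1, x1, y1), (t2, x2, y2) in zip(traj[i:j], traj[i + 1:j + 1]):
        ux, uy = (x2 - x1) / (t2 - t1), (y2 - y1) / (t2 - t1)
        errors.append(math.hypot(ux - vx, uy - vy))
    return sum(errors) / len(errors)


def average_velocity_error(
    traj: Sequence[TPoint],
    kept: List[int],
    pair_error: Callable[[Sequence[TPoint], int, int], float] = toy_velocity_error,
) -> float:
    """Mean pairwise error over the m - 1 adjacent retained points (assumed)."""
    pairs = list(zip(kept, kept[1:]))
    if not pairs:
        return 0.0
    return sum(pair_error(traj, i, j) for i, j in pairs) / len(pairs)
```

For the trajectory [(0, 0, 0), (1, 2, 0), (2, 2, 0)], keeping only the endpoints gives an average velocity error of 1.0, while keeping all three points gives 0.0. Passing the pairwise error as a parameter keeps the averaging independent of how the per-pair velocity error is defined.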

Number of Residual Retrace Points.
In this section, VPTC-RP is compared with four algorithms: DP, SQUISH-E, OPW, and the HNN-based algorithm. The parameter settings of DP, SQUISH-E, and OPW are given in Table 3, and the following simulations are run on trajectory #tra1. The HNN-based algorithm employs a Hopfield network consisting of N × M mutually interconnected neurons, whose rows and columns represent the points on the original trajectory and the positions of points on the compressed trajectory, respectively. For example, a firing neuron V_{x,i} indicates that point p_x on the original trajectory is the i-th point on the compressed trajectory. In each column except columns 1 and M, the neuron with the maximum input is set to 1 and all other neurons in that column are set to 0. First, we observe the number of residual retrace points after compression; the simulation results are reported in Figure 9.
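The column-wise update rule described above can be sketched as a winner-take-all step over an N × M matrix of neuron inputs. This minimal sketch covers only the state update, not the network's energy function or iteration schedule; pinning columns 1 and M to the first and last original points is an assumption, and winner_take_all and decode are hypothetical names. Indices are 0-based in the code, so columns 1 and M of the text correspond to columns 0 and M − 1.

```python
from typing import List


def winner_take_all(u: List[List[float]]) -> List[List[int]]:
    """Binary states V (N x M) from net inputs u (N x M), 0-based indices."""
    n, m = len(u), len(u[0])
    v = [[0] * m for _ in range(n)]
    # Assumption: the first and last compressed positions are pinned to the
    # first and last original points.
    v[0][0] = 1
    v[n - 1][m - 1] = 1
    for i in range(1, m - 1):
        # The neuron with the largest input in column i fires; the rest stay 0.
        column = [row[i] for row in u]
        v[column.index(max(column))][i] = 1
    return v


def decode(v: List[List[int]]) -> List[int]:
    """Original-point index chosen for each compressed position."""
    n, m = len(v), len(v[0])
    return [next(x for x in range(n) if v[x][i] == 1) for i in range(m)]


u = [[0.0, 0.1, 0.0],
     [0.0, 0.2, 0.0],
     [0.0, 0.9, 0.0],
     [0.0, 0.3, 0.0],
     [0.0, 0.0, 0.0]]
print(decode(winner_take_all(u)))  # [0, 2, 4]
```

Each column ends up with exactly one firing neuron, so decoding a state matrix always yields one original-point index per compressed position.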
As shown in Figure 9, the number of residual retrace points is observed under different compression ratios. The curve of SQUISH-E is lower than those of the other algorithms when the compression ratio is smaller than 50%, and higher than those of DP, the HNN-based algorithm, and OPW when the compression ratio is larger than 50%. In particular, the curves of DP and OPW are always very close to each other. Moreover, the HNN-based algorithm obtains the smallest number of residual retrace points among the compared algorithms when the compression ratio is larger than 50%. This is because SQUISH-E compresses the trajectories according to the SED measurement, which considers temporal information, and hence the number of residual retrace points is significantly reduced. When the compression ratio is very large, DP can compress some dense points in a small region where the spatial error is extremely low. Although the HNN-based algorithm can achieve the optimal results, it compresses the trajectories under squared Euclidean distances, which ignore temporal information. Thus, SQUISH-E outperforms the HNN-based algorithm when the compression ratio is very small. These phenomena imply that many retrace points are ignored when the trajectories are compressed.
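The role of temporal information in SED can be seen in a small example. SED measures the distance between a point and the position the object would occupy at the same timestamp if it moved uniformly along the segment between the retained neighbors, whereas a purely spatial (perpendicular) distance ignores timestamps. The sketch below assumes planar coordinates and strictly increasing timestamps; sed and perpendicular_distance are illustrative helpers.

```python
import math
from typing import Tuple

TPoint = Tuple[float, float, float]  # (t, x, y): timestamp and planar position


def sed(p: TPoint, a: TPoint, b: TPoint) -> float:
    """Distance from p to the time-synchronized position on segment a->b."""
    ta, xa, ya = a
    tb, xb, yb = b
    t, x, y = p
    r = (t - ta) / (tb - ta)  # fraction of the segment's duration elapsed
    sx, sy = xa + r * (xb - xa), ya + r * (yb - ya)
    return math.hypot(x - sx, y - sy)


def perpendicular_distance(p: TPoint, a: TPoint, b: TPoint) -> float:
    """Purely spatial distance from p to the line through a and b."""
    _, xa, ya = a
    _, xb, yb = b
    _, x, y = p
    dx, dy = xb - xa, yb - ya
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - xa, y - ya)
    return abs(dx * (y - ya) - dy * (x - xa)) / length


a, b = (0.0, 0.0, 0.0), (10.0, 10.0, 0.0)
p = (8.0, 2.0, 0.0)  # on the segment, but behind the uniform-speed schedule
print(sed(p, a, b))                     # 6.0
print(perpendicular_distance(p, a, b))  # 0.0
```

A point that lies exactly on the segment but lags behind the uniform schedule therefore has zero spatial error but a nonzero SED.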

Algorithm Comparisons.
In Figure 10, the running times of the different algorithms are compared. The running time of SQUISH-E is always shorter than those of DP, OPW, VPTC-RP, and the HNN-based algorithm, owing to the priority queue in SQUISH-E, which enables fast removal of points. Besides, VPTC-RP runs faster than DP and OPW when the compression ratio is smaller than 30%, because VPTC-RP compresses the trajectories based on the retrace points, which significantly reduces the number of computations, especially when the original trajectories contain more retrace points. The running time of the HNN-based algorithm is much longer than those of the others because it requires an extremely large number of iterations. Figure 11 illustrates that VPTC-RP achieves a smaller average velocity error than DP, OPW, SQUISH-E, and the HNN-based algorithm, because VPTC-RP exploits velocity information during compression and preserves the trajectory velocity as much as possible. When the compression ratio is larger than 40%, these curves are very close to each other, because a large number of points must be deleted from the original trajectories.
Besides, note that OPW has the worst average velocity error when the compression ratio is smaller than 40%, because OPW takes into account neither the velocity information nor the semantic meanings of trajectories. In addition, the HNN-based algorithm achieves a smaller average velocity error than DP, OPW, and SQUISH-E when the compression ratio is larger than 50%. This is attributed to the fact that the HNN-based algorithm is a metaheuristic that searches for an optimal selection of compressed points.
Therefore, VPTC-RP achieves a preferable tradeoff between the running time and the average velocity error by detecting retrace points, which carry velocity information and semantic meanings.
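The speed advantage that the comparison above attributes to SQUISH-E's priority queue can be illustrated with a simplified, SQUISH-like bottom-up compressor: interior points sit in a min-heap keyed by SED, the cheapest point is removed, and only its two neighbors are re-scored, so each removal needs O(log n) heap operations instead of a rescan of the trajectory. This sketch is a simplification rather than SQUISH-E itself: it targets a fixed number of retained points instead of SQUISH-E's own parameters, and it carries a removed point's priority forward to its neighbors by taking the maximum (an assumption). pq_compress is a hypothetical name.

```python
import heapq
import math
from typing import List, Sequence, Tuple

TPoint = Tuple[float, float, float]  # (t seconds, x meters, y meters)


def sed(p: TPoint, a: TPoint, b: TPoint) -> float:
    """Synchronized Euclidean distance of p to segment a->b (needs tb > ta)."""
    ta, xa, ya = a
    tb, xb, yb = b
    t, x, y = p
    r = (t - ta) / (tb - ta)
    return math.hypot(x - (xa + r * (xb - xa)), y - (ya + r * (yb - ya)))


def pq_compress(traj: Sequence[TPoint], keep: int) -> List[int]:
    """Return indices of retained points; endpoints are always kept."""
    n = len(traj)
    keep = max(keep, 2)
    if keep >= n:
        return list(range(n))
    prev = list(range(-1, n - 1))  # doubly linked list over surviving points
    nxt = list(range(1, n + 1))
    carried = [0.0] * n            # error inherited from removed neighbours
    prio = [math.inf] * n          # current priority of each point
    alive = [True] * n
    heap: List[Tuple[float, int]] = []

    def rescore(i: int) -> None:
        # Endpoints are never queued, so they can never be removed.
        if 0 < i < n - 1:
            prio[i] = carried[i] + sed(traj[i], traj[prev[i]], traj[nxt[i]])
            heapq.heappush(heap, (prio[i], i))

    for i in range(1, n - 1):
        rescore(i)

    remaining = n
    while remaining > keep:
        p, i = heapq.heappop(heap)
        if not alive[i] or p != prio[i]:
            continue  # stale entry left behind by an earlier rescore
        alive[i] = False
        remaining -= 1
        a, b = prev[i], nxt[i]
        nxt[a], prev[b] = b, a     # unlink i
        for j in (a, b):
            carried[j] = max(carried[j], p)
            rescore(j)
    return [i for i in range(n) if alive[i]]
```

On [(0, 0, 0), (1, 1, 0), (2, 2, 5), (3, 3, 0), (4, 4, 0)] with keep = 3, the sketch retains indices [0, 2, 4], i.e., the two endpoints and the spike. Stale heap entries are skipped lazily rather than deleted, which keeps every update at heap cost.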

Conclusion
In this paper, we investigate the problem of trajectory compression based on retrace point detection. We define retrace points as the positions where the moving direction changes sharply. After detecting the retrace points, VPTC-RP (velocity-preserving trajectory compression algorithm based on retrace point detection) compresses the trajectories under the Direction Preserving Velocity Error (DPVE). VPTC-RP can preserve both the velocity information and the semantic meanings of the trajectories as much as possible. However, because DPVE focuses on preserving velocity information, it may fail to capture other information, such as position information. Besides, when the interval between two consecutive trajectory points is extremely long, the retrace point detection may also fail.
Future research will exploit knowledge of real road networks to further improve the detection accuracy of retrace points. In addition, we will investigate the classification of transportation modes based on the compressed trajectories.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure
This work is a significant extension of our earlier work, which was published in IEEE ICCT 2020.