Releasing Differentially Private Trajectory Datasets Without Revealing Trajectory Correlations

With the prevailing use of smartphones and location-based services, vast amounts of trajectory data are collected and used for many applications. When trajectory data are sent to a third-party research institute for analytical applications, the privacy of users may be severely compromised. For example, the relationships between users can be revealed from the correlations between their trajectories. In this paper, we propose a method for releasing trajectory datasets without revealing the correlations between trajectories, called RDPT. In RDPT, we first quantify the trajectory correlation and convert the problem of protecting trajectory correlations into that of reducing the trajectory similarities between users while preserving the utility of the perturbed trajectories. Based on this insight, we model a multi-objective optimization problem and solve it with a particle swarm optimization algorithm modified to satisfy differential privacy. We then generate synthetic trajectories in which the correlations between trajectories are reduced. We conduct extensive experiments on three real trajectory datasets. The experimental results show that RDPT achieves data utility almost equivalent to, and privacy better than, the existing methods.


Introduction
With the extensive use of smart mobile devices furnished with location-based applications, such as WeChat and Twitter, vast amounts of trajectory data are being collected by location-based service (LBS) providers. Trajectory data are widely used in many analytical applications, such as intelligent transportation, smart cities, infectious disease surveillance and epidemiological investigations. When LBS providers want to know whether a traffic congestion or a certain event will occur in a certain area in the future, they often send the trajectories to a third-party research institute for offline analysis. However, the third-party research institute is not always trusted and is likely a potential adversary. Moreover, since trajectories are formed by users' temporal and spatial behavior as well as their social relationships [1], trajectories can reflect the features of users' behaviors. For example, an adversary could infer the social relationships of a user by analyzing the correlations between the trajectories of users. If two trajectories are correlated and have many check-in points that are close or even coincident, indicating that the users often visit nearby or identical locations, the attacker can infer with high probability that the users of the two trajectories are friends, because the two users often have activities together. From the social relationships, the attacker can even infer the health status, daily habits and occupations of the users with high probability [2]. Therefore, the privacy of users is seriously compromised.
There are many privacy-preserving methods for trajectories, and these approaches can be summarized into three categories: methods that do not consider correlation protection [3][4][5], mechanisms considering correlations within a single trajectory [6][7][8] and approaches considering the correlations between trajectories [9][10][11]. The first two categories do not take the correlation between trajectories into consideration, so the trajectories perturbed by these approaches still face privacy leakage risks.
Several researchers [9][10][11] have proposed privacy-preserving mechanisms that consider trajectory correlation. However, the approaches in the literature [9,10] are restricted to scenarios of publishing two trajectories. For example, these methods work well when two passengers in DiDi want to hide their trajectory correlation. However, there are more than two users in real social applications, so the number of trajectories published offline is usually greater than two. Moreover, even if we can reduce the correlation between any two trajectories in a dataset, there is still a risk that the correlations amongst three or more real trajectories are not reduced, which would still reveal the privacy of those users. Under scenarios in which a large number of trajectories need to be released, the research in the literature [9,10] is no longer effective. In addition, although Zhao et al. [11] proposed a location privacy-preserving mechanism considering trajectory correlation, their approach is derived from k-anonymity and cannot resist location inference attacks [12][13][14]. Therefore, even if the trajectories are perturbed by existing methods, the problem of privacy leakage caused by trajectory correlation persists.
To solve the problem of privacy leakage caused by trajectory correlation when LBS providers release a large number of trajectories offline, we propose a method for releasing differentially private trajectories, called RDPT. First, the continuous geographical space of the real trajectories is discretized via an adaptive grid partition method. The continuous geographical space is divided into N × N top-level cells, and the original location trajectories are converted into cell trajectories. For each top-level cell, to preserve privacy when we divide the top-level cell into bottom-level cells, we add Laplacian noise [15] to the frequencies of locations in the top-level cell. Second, as the spatial distribution and frequencies of visits to the top-level cells are important features for calculating the correlation of trajectories, we extract a cell visit probability vector from each user's cell trajectory. Then we quantify the trajectory correlation and convert the problem of protecting the trajectory correlation into that of reducing the similarity of each pair of obfuscated cell visit probability vectors while retaining the similarity between each real cell visit probability vector and the corresponding perturbed one, so as to preserve the utility of the perturbed trajectories. Based on this insight, we model a multi-objective optimization problem to find the balance between security and data utility. By solving the problem with the particle swarm optimization algorithm [16], modified to satisfy differential privacy (PSO-DP), we obtain a perturbed cell visit probability vector for a user's cell trajectory and reduce the correlations between the obfuscated trajectory of that user and the perturbed trajectories of other users. Finally, based on the perturbed cell visit probability vector, we generate a synthetic trajectory.
Thus, the correlations between the trajectories of users are reduced and the social relationships between users are protected.
To solve a multi-objective optimization problem, there are generally several classes of methods, such as derivative-based methods and evolutionary algorithms (EAs). Derivative-based methods are useful for problems with low dimensionality, so they are not suitable for our multi-objective optimization problem, which is complex and has high dimensionality (N^2). Conversely, EAs are widely used optimization methods that can effectively deal with complex problems. The genetic algorithm (GA) [17], particle swarm optimization (PSO) algorithms [16,18,19] and differential evolution (DE) algorithms [20,21] are widely used EAs. However, these algorithms cannot be used directly to solve our problem. No matter which algorithm in the literature [16,18,19,20,21] is selected, we need to add noise to satisfy differential privacy, because reading an original trajectory would lead to a potential privacy leakage. Furthermore, compared with GA and DE, PSO has a simpler principle and fewer parameters, and we need not process crossover and mutation of the trajectories as in the genetic algorithm with differential privacy [17], which ensures its efficiency in different situations. Moreover, since users usually interact with other users in social networking, the trajectory of a user is shaped by the user's social ties in LBS applications, which is similar to the idea of PSO. Therefore, we choose a classical PSO to solve our problem. The difference between our PSO-DP and other multi-objective optimization algorithms is that we modify the selection procedure of a classical PSO algorithm with the exponential mechanism of differential privacy. When we select a local optimal particle, since we read a real trajectory, we need to add noise to preserve privacy. Hence, we select a particle as the local optimal particle according to probabilities computed from the utility scores of the local particles.
Usually, the local optimal particle is selected with a higher probability, but because of the randomness we may or may not select it. Thus, privacy is preserved. To the best of our knowledge, this is the first work to combine PSO with differential privacy for trajectory privacy preservation.
The main contributions of this paper are summarized as follows: (1) We propose a method of releasing differentially private trajectory datasets, called RDPT. We first model the problem of reducing the correlations between the trajectories of users while retaining data utility as a multi-objective optimization problem. By solving the problem, we can synthesize trajectories for users.
(2) We adapt a classical PSO algorithm to solve the modelled multi-objective optimization problem. By modifying the selection procedure of local optimal particles in each iteration to satisfy differential privacy, we obtain the PSO-DP algorithm and prove its security.
(3) We evaluate our method on three real datasets, with five different metrics and two specific implementations of quantifying the trajectory correlation. The experimental results demonstrate that our method achieves data utility almost equivalent to, and privacy better than, the existing methods. The remainder of this paper is organized as follows. In Section 2 we review the related work in the literature. We present preliminaries in Section 3 and formally define the problem in Section 4. In Section 5 we give a detailed description of RDPT. We show the experimental results and detailed analysis in Section 6. Finally, we conclude this paper in Section 7.

Related Work
In this section we briefly review the three existing categories of privacy protection methods for trajectories. Then we analyze evolutionary algorithms satisfying differential privacy.
Trajectory privacy protection methods without considering the correlation. These approaches can be divided into suppression methods [4], bounded perturbation [22], k-anonymity and its derived approaches [3,23,24], and differentially private methods [25][26][27][28][29][30]. Hasan et al. proposed a privacy architecture with a bounded perturbation technique to protect a user's trajectory from privacy breaches [22]. Huo et al. proposed a k-anonymity method called "YCWA" [23]. They first extracted the sensitive locations in the trajectory according to the time interval and location density. They then partitioned the geographic space of the trajectories into discrete k-anonymity regions, and the sensitive locations in the trajectories were replaced with the corresponding k-anonymity regions to protect them. The YCWA method reduces the complexity and information loss of k-anonymity over complete trajectories and preserves the privacy of individuals. Zhang et al. proposed a dual-K mechanism (DKM) to protect users' trajectory privacy [24]. DKM first inserts multiple anonymizers between the user and the location service provider (LSP), and K query locations are sent to different anonymizers to achieve K-anonymity. Simultaneously, they combined dynamic pseudonym and location selection mechanisms to improve user trajectory privacy. Although k-anonymity has been widely applied in many applications, the datasets published by these methods still face combination attacks and background knowledge attacks. Differential privacy has become a widely adopted privacy protection mechanism because of its strong privacy guarantee. In the literature [28], Ding et al. proposed a stream processing framework with differential privacy that contains two modules for trajectories.
One module can concurrently receive real-time queries from individuals and release newly sanitized trajectories, and the other module comprises three algorithms based on differential privacy to facilitate publication of the distribution of location statistics. Wang et al. developed a privacy-preserving reference system that can extract privacy-demanding feature-based anchors, which can subsequently be used to calibrate sequences from raw trajectories [29]. They provided a private trajectory data sanitization approach that scales to large spatial domains reflecting realistic trajectories.
In the literature [30], the authors presented an algorithm for protecting sensitive place visits in privacy-preserving trajectory publishing. By generalizing sensitive places using sensitive zones and distorting the sub-trajectories within the sensitive zones based on differential privacy, their method not only prevents leakage of sensitive place visits but also preserves individual movement features. These privacy-preserving approaches for trajectories focus on the sensitive information of locations and do not consider the privacy leakage caused by the trajectory correlations contained in the trajectories.
Trajectory privacy protection methods considering correlations within a single trajectory. The spatiotemporal correlation contained in trajectory data easily leads to privacy leakage for users [31]. Many researchers have proposed trajectory privacy-preserving methods considering the temporal and spatial correlation within a trajectory. He et al. proposed a differentially private approach based on the spatiotemporal correlation in a trajectory, called DPT [6]. They first constructed a hierarchical index system to capture users' mobility features. They then constructed a prefix tree to represent the spatial transfer features between adjacent locations of trajectories and added Laplacian noise to the visit frequencies of each node in the prefix tree. Finally, the perturbed trajectory is synthesized according to the obfuscated prefix tree to protect the spatial correlation contained within the trajectory. In the literature [7], the authors presented a differentially private method called TGM for publishing trajectories. In TGM, they partitioned the geographical space and constructed a prefix sequence graph to model the spatial transfer features between grids in trajectories. Then the trajectories were iteratively synthesized using an exponential mechanism. Gursoy et al. [8] proposed a differentially private trajectory synthesis method. They extracted the Markov transition matrix, the trajectory length probability distribution and the journey probability distribution, and obfuscated the three features with the Laplacian mechanism. During trajectory synthesis, the trajectories were processed to defend against Bayesian attacks, area (sub-trajectory) sniffing attacks and abnormal location leakage attacks. These methods only consider spatiotemporal correlations within a single trajectory and still do not address the correlation between different trajectories, which can still cause serious privacy leakage.
Trajectory privacy protection methods with correlation between different trajectories. Ou et al. [9] proposed a trajectory publication mechanism based on a hidden Markov model (HMM) to protect the correlation between trajectories. Similarly, in [10] the authors proposed an n-body Laplace framework. Then, under the framework, they presented two privacy protection methods for two types of data utilities. However, these methods are restricted to the scenario of releasing two trajectories and cannot provide efficient privacy protection when publishing a large number of trajectories offline. To mitigate the social relationship attacks caused by trajectory correlation, Zhao et al. designed an effective model to simultaneously deal with social relationship attacks and re-identification attacks while maintaining high data utility [11]. In their model, they proposed a sliding-window algorithm that is a variant of K-anonymity, i.e., K^m-anonymity.
The K^m-anonymity generates anonymized trajectories according to a social-aware distance, which concerns both the spatiotemporal distance and the social proximity. Moreover, K^m-anonymity performs the anonymization on sub-trajectories in an m-length window instead of on entire trajectories. However, the K^m-anonymity approach only satisfies k-anonymity and cannot resist homogeneous attacks or location inference attacks [12][13][14].
We summarize the three categories of privacy-preserving approaches in Table 1.
Evolutionary algorithms satisfying differential privacy. Evolutionary algorithms are widely used to solve multi-objective optimization problems, but there are few works on their application to privacy protection. In [17], Zhang et al. proposed PrivGene, a differentially private model fitting framework based on genetic algorithms. They modified the genetic algorithm and proposed a differentially private version of it. In PrivGene, the authors use an exponential mechanism to select parent individuals for crossover and mutation, thus enhancing the security of the selection process.

Preliminaries
In this section, we introduce the basic concepts, including differential privacy, global sensitivity, and particle swarm optimization.
Definition 1 (Trajectory). A trajectory T is a time-series sequence of tuples (location, time), i.e., T = ⟨(ℓ1, t1), (ℓ2, t2), . . . , (ℓL, tL)⟩, where ℓi is a location consisting of latitude and longitude, ti is the moment when location ℓi is generated, and L is the number of locations in trajectory T.
Definition 2 (Neighbor datasets). Suppose D1 and D2 are two datasets. D1 and D2 are neighbor datasets if and only if they differ in at most one record, i.e., D2 = D1 ∪ {T} or D1 = D2 ∪ {T} for some trajectory T.

Definition 3 (Differential privacy). Let A be a privacy protection mechanism, O be any output of A, and D1 and D2 be neighbor datasets. A satisfies ε-differential privacy if we have:

Pr[A(D1) = O] ≤ e^ε · Pr[A(D2) = O].

Definition 4 (Global sensitivity). Global sensitivity indicates the maximum difference between the results of a query over neighboring datasets. For a query function Q, the global sensitivity is:

ΔQ = max over neighbor datasets D1, D2 of ‖Q(D1) − Q(D2)‖1.

There are two widely used mechanisms for achieving differential privacy, i.e., the Laplace mechanism [15] and the exponential mechanism [32]. The Laplace mechanism is suitable for perturbing numerical query results, and the exponential mechanism is suitable for perturbing non-numerical query results. We use both mechanisms in RDPT.
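As a concrete illustration of the Laplace mechanism, the following minimal sketch perturbs a numerical query answer with noise of scale ΔQ/ε (the function name and signature are our own, not from the paper):

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Return the query answer perturbed with Laplace noise of scale sensitivity/epsilon."""
    rng = rng if rng is not None else np.random.default_rng()
    return true_value + rng.laplace(0.0, sensitivity / epsilon)
```

A smaller ε yields a larger noise scale and hence stronger privacy; with a very large ε the noise becomes negligible.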
There are two common properties of differential privacy. The first is sequential composition, indicating that a sequence of computations, each of which provides differential privacy in isolation, also provides differential privacy in sequence, with the privacy budget ε accumulated. The second is parallel composition, meaning that if the computations are performed on disjoint datasets, the privacy budget is not accumulated additively but is instead determined by the worst privacy guarantee among all computations. The following definitions formally describe these two properties.
Definition 5 (Sequential composition). Suppose an algorithm A runs n randomized algorithms A1, A2, . . . , An, and each Ai (1 ≤ i ≤ n) satisfies εi-differential privacy. When the outputs of A over a dataset D are published in sequence, A satisfies (Σ i=1..n εi)-differential privacy.

Definition 6 (Parallel composition). Suppose an algorithm A runs n randomized algorithms A1, A2, . . . , An, each Ai satisfies εi-differential privacy, and the Ai run on disjoint subsets of a dataset D. Then A satisfies (max i εi)-differential privacy.

Particle Swarm Optimization (PSO). PSO is an evolutionary algorithm [16]. In PSO, each particle searches for the optimal solution as an extremum relative to the objective. The best individual extremum in the swarm is considered the current global optimal solution. Then, each particle adjusts its speed and position based on its own extremum and the global optimal solution. This process is iterated until PSO converges, and the current global optimal solution is then taken as the final solution of the given optimization problem.
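The classical PSO loop described above can be sketched as follows (a minimal single-objective version with the linearly decreasing inertia weight used by PSO-IW later in the paper; all parameter values are illustrative, not the paper's settings):

```python
import numpy as np

def pso(objective, dim, n_particles=30, iters=100, bounds=(-5.0, 5.0), seed=0):
    """Classical PSO with a linearly decreasing inertia weight (0.9 -> 0.4)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))   # particle positions
    vel = np.zeros((n_particles, dim))              # particle velocities
    pbest = pos.copy()                              # per-particle best positions
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()        # global best position
    for t in range(iters):
        w = 0.9 - 0.5 * t / max(iters - 1, 1)       # inertia weight decreases linearly
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + 2.0 * r1 * (pbest - pos) + 2.0 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()
```

On a simple sphere objective this converges quickly; the paper's PSO-DP later replaces the deterministic selection of best particles with a randomized, differentially private one.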

The problem
Attack hypothesis: Suppose a third-party research institute obtains a trajectory dataset for analytical applications. Since the third-party institute is not always trusted and is likely a potential adversary, if the data owners had not perturbed the trajectories before releasing the trajectory dataset offline, the adversary could extract the correlations from the trajectories, and the social ties among users or other private information would be revealed. Therefore, in this paper, we suppose the adversary can obtain the following information.
(1) The perturbed trajectory dataset D′ = {T′1, T′2, . . . , T′N′}, where T′ is a perturbed trajectory and N′ is the number of trajectories. (2) The set of users U, where for any user u ∈ U there is only one trajectory T′ ∈ D′. (3) The quantification of the trajectory correlation and the trajectory privacy-preserving method in this paper.
Our goal is as follows: given a real trajectory dataset, we perturb the dataset to reduce the trajectory correlation between a trajectory and the other trajectories to the greatest extent possible, while ensuring high data utility. Even if the adversary obtains the perturbed dataset, the quantification method of the trajectory correlation and the privacy-preserving approach, the adversary still cannot infer the social relationships between users. In a word, we protect as much privacy of individuals as possible while maintaining high data utility.

Releasing correlated trajectory datasets

5.1. The overview. Before describing RDPT in detail, we provide a high-level description of our approach. Suppose that the overall privacy budget consumed by RDPT is ε = ε1 + ε2. RDPT consists of the following three steps:

Step 1. We divide the geographical space of D into N^2 identical cells and obtain a grid G via an adaptive grid partition method. Then each trajectory Tu in D is converted from location mode into cell mode, i.e., a cell trajectory T^C_u, and the dataset D^C of cell trajectories is obtained. Then the cell visit probability vectors are extracted from the cell trajectories in D^C, and we can quantify the trajectory correlation using the N^2-dimensional cell visit probability vectors. For each cell in grid G (hereafter called a top-level cell), to preserve privacy when we divide it into bottom-level cells, we add Laplacian noise to the density of locations in the top-level cell over all trajectories.
The density of locations in a top-level cell for a trajectory is calculated by normalizing the location visit frequencies after duplicate visits to locations are removed. The privacy budget consumed in this step is ε1, and the bottom-level cells will be used in Step 3 for synthesizing the final trajectory.
Step 2. According to the quantification method of the trajectory correlation and the cell visit probability vectors, in order to reduce the correlation between T^C_u and the other perturbed cell trajectories, while preserving the utility of the dataset as much as possible, we model a multi-objective optimization problem. Then we solve the problem via a particle swarm optimization algorithm modified with an exponential mechanism and obtain a perturbed cell visit probability vector for T^C_u. The privacy budget consumed in this step is ε2.
Step 3. According to the perturbed cell visit probability vector and the bottom-level cells (into which Laplacian noise was added when the top-level cells were divided), we generate a synthetic trajectory in location mode for the trajectory Tu. After all the trajectories in the dataset D are processed one by one according to Step 2 and Step 3, we obtain the perturbed dataset D′.

Adaptive Grid Partition.
It is difficult to generate location visit probability vectors of the same dimension from the original location trajectory dataset D. Therefore, we divide the geographic space of D into N^2 identical cells and obtain the grid space G. Then, the geographic space of D is discretized, and the original dataset D is converted into a set of trajectories in cell mode, D^C. We have the following definitions.
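The conversion from location mode to cell mode can be sketched as follows (a minimal illustration; the uniform bounding-box grid and the helper names `build_grid` and `to_cell_trajectory` are our assumptions, not the paper's API):

```python
import numpy as np

def build_grid(points, n):
    """Bounding box of all locations, split into an n x n top-level grid."""
    pts = np.asarray(points, dtype=float)
    lat_min, lon_min = pts.min(axis=0)
    lat_max, lon_max = pts.max(axis=0)
    return lat_min, lon_min, (lat_max - lat_min) / n, (lon_max - lon_min) / n

def to_cell_trajectory(traj, grid, n):
    """Map each (lat, lon) point to a top-level cell index in [0, n*n)."""
    lat0, lon0, dlat, dlon = grid
    cells = []
    for lat, lon in traj:
        row = min(int((lat - lat0) / dlat), n - 1)  # clamp the upper boundary
        col = min(int((lon - lon0) / dlon), n - 1)
        cells.append(row * n + col)
    return cells
```

Applying this to every trajectory in D yields the cell-mode dataset D^C on a common N^2-cell domain.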

We improve the adaptive grid partition method in the literature [8]. Suppose that the total number of locations in a top-level cell for a dataset D is X. Intuitively, a large X would lead to many bottom-level cells. However, in extreme cases, if all of the X locations occur in one place (or several nearby places), then all locations would fall into exactly one bottom-level cell, meaning there would be no locations in the other bottom-level cells; this results in many useless bottom-level cells and hurts the efficiency of trajectory synthesis in the last step. Therefore, both the diversity and the number of locations need to be considered when constructing the query function for bottom-level cell partitioning. Thus, after we divide D into N^2 top-level cells, for each top-level cell C_i we count the sum of the normalized numbers of distinct locations in that cell over all cell trajectories:

q(C_i) = Σ over T^C_j ∈ D^C of f_{C_i, T^C_j} / |T^C_j|,

where f_{C_i, T^C_j} in the numerator is the number of different locations inside the top-level cell C_i for a cell trajectory T^C_j corresponding to the original trajectory T_j, |T^C_j| is the number of locations in T^C_j, and q(C_i) is the density of locations in the top-level cell C_i over the cell trajectories in D^C.
q(C i ) is used for the adaptive partition of cell C i . To enhance the security of adaptive grid partitioning, we add Laplacian noise to q(C i ) to obtain a stochastic partition for the bottom-level cells.
The key parameter for the Laplacian noise is the global sensitivity. We have the following Theorem 1.

Theorem 1.
The global sensitivity of the query q(C_i) is 1.
Proof. Suppose that the neighboring datasets for the query are D^C_1 and D^C_2 = D^C_1 ∪ {T^C}, where T^C is a cell trajectory. Then we have

|q over D^C_2 (C_i) − q over D^C_1 (C_i)| = f_{C_i, T^C} / |T^C| ≤ 1,

since the number of distinct locations of T^C inside C_i cannot exceed the total number of locations in T^C.
According to Definition 4, the global sensitivity of the query q(C_i) is 1. Therefore, it suffices to add Lap(1/ε1) noise to each query answer to obtain the noisy answer q̃(C_i) = q(C_i) + Lap(1/ε1), where ε1 is the privacy budget. Then, each top-level cell C_i is further divided into B_i^2 bottom-level cells according to q̃(C_i), where B_i is proportional to q̃(C_i) and is set as in the literature [8].
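The noisy density query and the resulting bottom-level granularity can be sketched as follows. The square-root scaling of B_i is an assumption standing in for the exact formula of [8], and the helper names are ours:

```python
import numpy as np
import math

def cell_density(cell_trajs, cell):
    """q(C_i): for each cell trajectory, the number of distinct locations falling
    in C_i divided by the trajectory length, summed over all trajectories."""
    q = 0.0
    for traj in cell_trajs:                  # traj: list of (cell_id, location) pairs
        distinct = {loc for c, loc in traj if c == cell}
        q += len(distinct) / len(traj)
    return q

def noisy_cell_density(cell_trajs, cell, epsilon1, rng):
    """Add Lap(1/epsilon1) noise; the global sensitivity of q is 1 (Theorem 1)."""
    return cell_density(cell_trajs, cell) + rng.laplace(0.0, 1.0 / epsilon1)

def bottom_level_granularity(q_noisy, c=1.0):
    """Split a top-level cell into B_i x B_i bottom-level cells; the sqrt scaling
    with the noisy density is our assumption, not the exact rule of [8]."""
    return max(1, math.ceil(math.sqrt(max(q_noisy, 0.0) * c)))
```

Clamping the noisy density at zero prevents negative densities (a possible outcome of the Laplace noise) from breaking the partition.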

Quantifying Trajectory Correlation.
Quantifying trajectory correlation is a fundamental problem. There are three categories of methods: extraction of features from original check-in data [10,33,34], machine learning [35,36], and statistical information of trajectories [31,37]. The first category has limitations: the lengths of different trajectories differ, so we need to preprocess the trajectories via interpolation to align them, which introduces unnecessary errors into the quantification of the trajectory correlation. In addition, these methods have strict requirements on the length of trajectories; for instance, a trajectory cannot be too long. For the second category, the time cost is too high to be suitable for applications in which the trajectory correlation must be computed many times. Conversely, the methods based on the statistical information of trajectories consume less time and are more efficient; their cost is proportional to the trajectory length or to the number of top-level cells. For our problem, the frequencies of visits to locations in the top-level cells are important features of a trajectory, and the sequence of cells describes the spatial distribution and the transitions amongst check-ins within the trajectory.
We therefore use the cell visit probability vector of a trajectory as the statistical information describing the features of a trajectory. In addition, we can align the trajectories over the top-level cells via the adaptive grid partition. Hence, we compute the trajectory correlation using the third category of methods, and the statistical information we use in RDPT is the cell visit probability vector of a trajectory.
Definition 8 (The cell visit probability vector). After we partition the geographic space of a trajectory dataset D into N × N identical cells, for a cell trajectory T^C we define a cell visit probability vector P^C, which is an N^2-dimensional vector. The k-th component of P^C is calculated as follows: for a cell C_k, if T^C has X locations within C_k and the number of locations in T^C (i.e., the length of T^C) is L, then P^C[k] = X/L.
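Definition 8 reduces to a normalized histogram over the N^2 cells, which can be sketched directly (helper name is ours):

```python
import numpy as np

def cell_visit_probability_vector(cell_traj, n):
    """P^C[k] = (#locations of the trajectory falling in cell k) / trajectory length."""
    p = np.zeros(n * n)
    for cell in cell_traj:
        p[cell] += 1.0
    return p / len(cell_traj)
```

By construction the components are nonnegative and sum to 1, which is exactly the constraint imposed on the perturbed vector later.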

Definition 9 (Trajectory Correlation).
The trajectory correlation S is a measure of the correlation between two trajectories. Suppose P^C_i and P^C_j are the cell visit probability vectors of two trajectories T^C_i and T^C_j, respectively, and a function sim is a method of calculating vector similarity. We define S as follows:

S(T^C_i, T^C_j) = sim(P^C_i, P^C_j).

There are many common implementations for calculating the similarity between vectors, such as the cosine similarity and the Pearson correlation coefficient. Therefore, our method can be applied with different specific implementations of the trajectory correlation.
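With cosine similarity as the plug-in sim function, Definition 9 can be sketched as follows (a minimal illustration; the paper also allows other similarities such as the Pearson coefficient):

```python
import numpy as np

def cosine_similarity(p, q):
    """Cosine similarity of two vectors; 0.0 when either vector is all zeros."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    denom = np.linalg.norm(p) * np.linalg.norm(q)
    return float(p @ q / denom) if denom > 0 else 0.0

def trajectory_correlation(p_i, p_j, sim=cosine_similarity):
    """S(T_i, T_j) = sim(P_i^C, P_j^C); sim is pluggable."""
    return sim(p_i, p_j)
```

Two trajectories that visit the same cells with the same relative frequencies get correlation 1, while trajectories with disjoint cell supports get correlation 0.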

Modeling a multi-objective optimization problem.
From this subsection, we describe the second step of RDPT. We first model a multi-objective optimization problem:

minimize sim(P^C, P^{C,syn}_v) for every P^{C,syn}_v ∈ Ψ, and maximize sim(P^C, P^{C,act}_u),   (6)

where P^C is the N^2-dimensional vector to be solved, P^{C,act}_u is the cell visit probability vector of the real cell trajectory T^{C,act}_u, and Ψ = {P^{C,syn}_v | v ≠ u} denotes the set of perturbed cell visit probability vectors of the trajectories of other users. By solving the extrema of the two objectives in formula (6), we obtain a solution P^C, which is the perturbed cell visit probability vector P^{C,syn}_u corresponding to T^{C,act}_u.
Since P^C is a cell visit probability vector, the sum of all its components equals 1, so formula (6) has the following constraint:

Σ k=1..N^2 P^C[k] = 1.

To preserve data utility, we restrict the lower and upper bounds of each component of P^C. If the cell of a real cell trajectory T^{C,act}_u in time slot i is C_j, the actual scope of activities of user u is within C_j and the eight cells adjacent to C_j.
Therefore, if C_j is perturbed to one of its 9 adjacent cells (including C_j itself) while solving the multi-objective optimization problem in formula (6), the data utility of the trajectory of user u will not be largely lost. We calculate the maximum count of locations over the 9 adjacent cells of C_j and divide this count by the number of locations in T^{C,act}_u; this gives the j-th component of the upper bound vector U′ of P^C for C_j. After traversing each top-level cell C_j (1 ≤ j ≤ N^2) corresponding to each cell in trajectory T^{C,act}_u, we obtain the upper bound vector U′ for all components of P^C, while the lower bound vector of P^C is the zero vector.
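One plausible reading of the upper bound computation can be sketched as follows. This is a sketch under our assumptions: a cell gets a nonzero cap only if it lies in the 3x3 neighborhood of a visited cell, and the cap is the largest neighboring visit count divided by the trajectory length:

```python
import numpy as np

def upper_bound_vector(cell_traj, n):
    """U'[k]: max visit count among the 3x3 neighborhood of cell k, divided by the
    trajectory length; zero for cells with no visited neighbor."""
    counts = np.zeros(n * n)
    for cell in cell_traj:
        counts[cell] += 1
    length = len(cell_traj)
    u = np.zeros(n * n)
    for cell in range(n * n):
        r, c = divmod(cell, n)
        neigh = [nr * n + nc
                 for nr in range(max(r - 1, 0), min(r + 2, n))
                 for nc in range(max(c - 1, 0), min(c + 2, n))]
        u[cell] = max(counts[k] for k in neigh) / length
    return u
```

Cells far from every visited cell keep an upper bound of zero, which confines the perturbed probability mass to the user's plausible activity area.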
We choose the particle swarm optimization algorithm (PSO) to solve the problem in formula (6). Since a particle is the basic object in the iterative process of PSO, we need to integrate the cell visit probability vector P^C into a particle. We have the following definition of a particle.
Definition 10 (Particle). A particle is a four-tuple W = (P^C, V, Λ, F), where P^C is the cell visit probability vector to be solved and denotes the position vector of the particle, V is the speed vector of the particle, F is the utility score function of the particle, which is determined by the objective functions in formula (6), and Λ is the degree of violation of the constraint, i.e., Λ = |Σ k=1..N^2 P^C[k] − 1|. The particle swarm optimization algorithm with a linearly decreasing inertia weight (PSO-IW) in the literature [16] is a commonly used version. The C-th round iteration of PSO-IW has two important steps: select and update. In the select step, the new local optimal particles and the global optimal particle are chosen from all the particles in the particle swarm Ω, the historically global optimal particles and the historically local optimal particles. In the update step, the velocity vector V and the position vector P^C of each particle are modified. However, when calculating the objective function value of each particle in the select step, we need to read the real cell visit probability vector from the cell trajectory dataset D^C, which would lead to a potential privacy leakage. Therefore, we modify the select step and present a particle swarm optimization algorithm with differential privacy to enhance security when solving the multi-objective optimization problem in formula (6).

The particle swarm optimization algorithm with differential privacy.
In this subsection, we describe the particle swarm optimization algorithm with differential privacy (PSO-DP) in detail.

The privacy budget for the C-th round iteration.
In PSO-DP, the noise of differential privacy is added in the select step. Specifically, the original select step is replaced by a step em_select, which satisfies differential privacy by introducing an exponential mechanism into the selection of new local optimal particles.
In PSO-DP, suppose the step em_select is executed M times; the privacy budget ε_2 is then divided into M parts. When the number of iterations C is small, the randomness of the particles is strong and the difference between each particle in the swarm and the final solution is large. At this stage, the particle swarm optimization algorithm can search the solution space thoroughly, and we do not need to allocate much privacy budget to determining the global and local optimal particles. As C increases, the particles in the swarm Ω tend to become steady, and we need to allocate more privacy budget to reduce the randomness introduced by differential privacy, so as not to deteriorate the convergence of the particle swarm optimization algorithm. Therefore, we use the reciprocals of the triangular numbers, which elegantly provide this property: the series 1/3, 1/6, ..., 2/((M + 1)(M + 2)), whose infinite sum converges to 1, so the total privacy budget it consumes over M iterations is less than ε_2. Next, we divide the remaining privacy budget 2ε_2/(M + 2) equally among the M iterations, and then the privacy budget for the C-th iteration is computed as follows:

ε_2^C = 2ε_2/((M - C + 1)(M - C + 2)) + 2ε_2/(M(M + 2)),

where the triangular-number share is small for the early iterations and grows toward 1/3 for the last, and the M budgets sum to ε_2.
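The allocation above can be sketched and checked numerically; we assume the iteration counter C runs from 0 to M - 1 (so the triangular-number series is traversed in reverse), under which the M parts sum exactly to ε_2.

```python
# Sketch of the per-iteration privacy budget schedule of PSO-DP, assuming
# the iteration counter C of the em_select calls runs from 0 to M-1.
# Early iterations get tiny budgets; later ones get more.

def iteration_budget(eps2, C, M):
    """Budget for the C-th em_select call: a reciprocal-triangular-number
    share plus an equal share of the leftover 2*eps2/(M+2)."""
    tri = 2.0 * eps2 / ((M - C + 1) * (M - C + 2))
    rest = 2.0 * eps2 / (M * (M + 2))
    return tri + rest

def schedule(eps2, M):
    return [iteration_budget(eps2, C, M) for C in range(M)]
```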

5.4.2. The utility score function for the exponential mechanism.
In step em_select, the selection of local optimal particles is perturbed by an exponential mechanism. To select the local optimal particle with the highest probability using the exponential mechanism, each particle needs to be evaluated by the utility score function F. The evaluation of a particle is determined by the objective functions. Therefore, according to formula (6), we perturb each particle with an exponential mechanism using the utility score function as follows.
In formula (10), F(P_C) is the utility score for the cell visit probability vector P_C, and O and O′ denote the values of the two objective functions when P_C is substituted into formula (6). The function Rank denotes the rank of the objective function value of P_C among all candidate particles. For example, Rank(O(P_C, P_u^{C,act})) denotes the ascending rank of the objective function value O(P_C, P_u^{C,act}) among those of all candidate cell visit probability vectors.
In the formula (10) for F(P_C), the former part (before the +) evaluates how well the perturbed cell visit probability vector preserves the real cell trajectory T_u^{C,act} of user u, i.e., the data utility, while the latter part (after the +) evaluates how much the perturbed cell visit probability vector P_C reduces its trajectory correlations with the perturbed cell visit probability vectors of other users, i.e., the data security. Here c is a weight, and we define c = 1/√(e^{√ε_2}). Then: (1) c decreases as the privacy budget ε_2 for PSO-DP grows; when ε_2 is small, F(P_C) focuses on security, and when ε_2 is large, F(P_C) focuses on data utility; (2) the maximum value of c does not exceed 1; and (3) the two square root operations, over ε_2 and over e^{√ε_2}, prevent c from becoming too small (when ε_2 is large) or too large (when ε_2 is small). Thus F(P_C) balances data utility and security to an appropriate extent.
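A sketch of the rank-based utility score, assuming the weight c = 1/√(e^{√ε_2}) as reconstructed above; the exact way the two ranks are combined in formula (10) is our illustration, not a verbatim transcription.

```python
import math

# Hedged sketch of the utility score behind formula (10). Lower ascending
# ranks of the objective values O and O' score higher; the security part
# is weighted by c = 1/sqrt(exp(sqrt(eps2))).

def weight_c(eps2):
    return 1.0 / math.sqrt(math.exp(math.sqrt(eps2)))

def ascending_rank(value, all_values):
    """1-based ascending rank of `value` among `all_values`."""
    return 1 + sorted(all_values).index(value)

def utility_score(o, o_prime, all_o, all_o_prime, eps2):
    c = weight_c(eps2)
    n = len(all_o)
    # utility part (before +) rewards a low rank of O (good data utility);
    # security part (after +) rewards a low rank of O', weighted by c
    return (n - ascending_rank(o, all_o)) + c * (n - ascending_rank(o_prime, all_o_prime))
```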
According to the constraint condition in formula (7), when the constraint is satisfied, the trajectory correlation between the solution P_C of user u in formula (6) and the perturbed cell trajectories of other users is reduced. Namely, the security of the output of PSO-DP is increased.
To reduce the noise added to the particle swarm optimization algorithm, after the utility scores F for the cell visit probability vectors of all candidate particles are calculated, the utility scores F of all particles are normalized; the global sensitivity of the exponential mechanism is then 1. We thus have the following theorem.

Theorem 2. When the utility scores F calculated by formula (10) for all candidate particles are normalized, the global sensitivity of queries based on F is 1.

Proof. Suppose the set of cell visit probability vectors in the candidate particles is Γ, and let D_1^C and D_2^C be two neighboring cell trajectory datasets differing in one cell trajectory x. The sets of cell visit probability vectors for D_1^C and D_2^C are D_1^P and D_2^P, respectively. According to formula (10), we can calculate the utility scores of all cell visit probability vectors as F = {F_i | i ∈ [1, |Γ|]}, where each utility score is a weighted sum of Rank(·). Normalizing the utility scores by the denominator NF maps every score into [0, 1]. Comparing the normalized scores computed from D_1^P and D_2^P via formula (10) and NF, the change of the score of any cell visit probability vector is therefore at most 1. According to Definition 4, the global sensitivity of the normalized utility score F is 1.
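A small numeric illustration of the theorem, assuming a min-max style normalization (the paper's normalizing denominator NF may differ): every normalized score lies in [0, 1], so one changed trajectory can shift any score by at most 1.

```python
# Numeric illustration of Theorem 2: after normalizing the utility scores
# of all candidate particles into [0, 1], changing one underlying trajectory
# can shift any normalized score by at most 1, which is the global
# sensitivity used by the exponential mechanism.

def normalize(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```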
According to Theorem 2, when we add noise to the utility score F with an exponential mechanism, the global sensitivity is 1. □

5.4.3. PSO-DP algorithm. We modify the particle swarm optimization algorithm with differential privacy and obtain the new algorithm PSO-DP, shown in Algorithm 1. The output of Algorithm 1 is the perturbed cell visit probability vector P_u^{C,syn}. In Algorithm 1, the step em_select that adds noise to PSO-DP to enhance the security of Algorithm 1 is shown in Algorithm 2. After P_u^{C,act} is perturbed in Algorithm 1, user u is added to Ψ, so that Ψ is up to date when we use Algorithm 1 to perturb the cell visit probability vector of the next user.
In Algorithm 2, an exponential mechanism is introduced when selecting the local optimal particles. We use the exponential mechanism to calculate the probabilities for the particles in step 4. In step 5, we select a local optimal particle according to these probabilities, and thus the security is enhanced.
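Steps 4 and 5 of em_select can be sketched as the standard exponential mechanism over normalized scores (sensitivity 1 by Theorem 2); the helper names are ours.

```python
import math, random

# Sketch of the em_select perturbation: each candidate particle is selected
# with probability proportional to exp(eps_C * score / 2), the standard
# exponential mechanism with sensitivity 1.

def em_probabilities(scores, eps_c):
    weights = [math.exp(eps_c * s / 2.0) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def em_select_index(scores, eps_c, rng=random.random):
    """Sample one candidate index following the exponential-mechanism
    distribution over the given (normalized) utility scores."""
    probs = em_probabilities(scores, eps_c)
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```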

Trajectory Synthesis.
After we obtain the perturbed cell visit probability vector P_u^{C,syn}, we synthesize the perturbed location trajectory T_u′ for user u.
We first obtain a top-level cell set Set_C from P_u^{C,syn}, in which the visit probability of each cell is greater than 0. Then we randomly select a cell from Set_C for each time slot of T_u, and from that cell we select the bottom-level cell with the maximum density of check-ins. We randomly generate a location in the bottom-level cell as the perturbed location. Then the perturbed location trajectory T_u′ is formed. This is the post-processing step of RDPT; it consumes no privacy budget and preserves the privacy of individuals [39], because Laplacian noise is added when a cell is partitioned into bottom-level cells.
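A hedged sketch of this synthesis step; the data structures for bottom-level cell densities and bounding boxes are our assumptions, not the paper's implementation.

```python
import random

# Sketch of trajectory synthesis: keep the top-level cells with visit
# probability > 0, pick one per time slot, descend to its densest
# bottom-level cell, and draw a uniform location inside that cell.

def synthesize(p_syn, densities, bounds, n_slots, rng=random):
    """p_syn: dict top_cell -> visit probability.
    densities: top_cell -> {bottom_cell: noisy check-in density}.
    bounds: bottom_cell -> (lat0, lat1, lon0, lon1).
    Returns the perturbed location trajectory as (lat, lon) pairs."""
    candidates = [c for c, p in p_syn.items() if p > 0]
    traj = []
    for _ in range(n_slots):
        top = rng.choice(candidates)
        bottom = max(densities[top], key=densities[top].get)  # densest bottom cell
        lat0, lat1, lon0, lon1 = bounds[bottom]
        traj.append((rng.uniform(lat0, lat1), rng.uniform(lon0, lon1)))
    return traj
```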
In the following, we present Algorithm 3 to process all trajectories in a dataset. Algorithm 3 has two stages: the adaptive grid partition in lines 2-8, and calling PSO-DP in line 14 to perturb the trajectories in the original dataset one by one. The privacy budgets consumed by the two stages are ε_1 and ε_2, respectively.

The Parameters and Convergence of RDPT.
In Algorithm 3, several parameters influence the solution of the multi-objective optimization problem in formula (6). These parameters are the population size α of the particle swarm Ω, the number β of new global optimal particles selected in each iteration, and the parameter M that controls the maximum number of iterations. According to the literature [40, 41], since there are two objective functions and one constraint in formulas (6) and (7), we select a value in the range [70, 300] for the population size α. Usually we let β = 1. We empirically select an appropriate M so that PSO-DP converges for each trajectory; usually we set M in [400, 1000]. When ε_2 is larger, M can be smaller; when ε_2 is smaller, for example ε_2 = 0.1, PSO-DP needs more iterations (M is set to 1000). For different datasets, these parameters should be analyzed and empirically adjusted to improve the results. We did not adjust the other parameters of PSO-DP.
It is difficult to prove the convergence of Algorithm 1 (the PSO-DP algorithm). In practice, when the difference between the objective function values of two iterations is less than 10^-3 for several consecutive iterations and the number of iterations is larger than M/2, we consider PSO-DP to be convergent. The exponential mechanism interferes with the convergence of PSO-DP, but the general trend of PSO-DP is convergent. In fact, the randomness introduced by the exponential mechanism may let PSO-DP jump out of a local optimal solution and reach a better one. Even so, a few trajectories may not converge; in that case, we re-execute Algorithm 1 for the trajectory, after which PSO-DP usually converges. In our experiments, Algorithm 1 converged for all trajectories in D.
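The empirical convergence test described above can be sketched as follows; the window length (`patience`) of "several consecutive iterations" is our assumption.

```python
# Sketch of the empirical convergence test: PSO-DP is treated as converged
# once the objective value changes by less than 1e-3 over several
# consecutive iterations and more than M/2 iterations have elapsed.

def is_converged(history, M, tol=1e-3, patience=5):
    """history: objective values per iteration, oldest first."""
    if len(history) <= M / 2 or len(history) < patience + 1:
        return False
    recent = history[-(patience + 1):]
    return all(abs(b - a) < tol for a, b in zip(recent, recent[1:]))
```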

Privacy Analysis.
The need for data privacy appears in two different scenarios. One is the data collection scenario, in which individuals regard the data collector as untrusted and send their check-ins with local differential privacy (e.g., voluntarily on social network sites). The other is the data releasing scenario, in which datasets are released to a third-party research institute for analytical applications and differential privacy is applied over the centralized datasets. RDPT addresses the privacy disclosure in the second scenario.
Next, we prove that RDPT satisfies differential privacy. We first prove that Algorithm 1 satisfies differential privacy.

Theorem 3. Algorithm 1 satisfies ε_2-differential privacy.
Proof. We first analyze Algorithm 2, in which the important step is selecting a local optimal particle. When Algorithm 2 is called in each iteration, the exponential mechanism selects a new local optimal particle for each particle in the swarm, and the corresponding candidate particle sets are disjoint from one another. According to the parallel composition property of differential privacy in Definition 6, the privacy budget consumed by selecting a new local optimal particle for each particle is ε_2^C; therefore, the process of selecting local optimal particles in Algorithm 2 satisfies max{ε_2^C} = ε_2^C-differential privacy. For the selection of global optimal particles, we only select β global optimal particles from the union of the new local optimal particles and the historically global optimal particles, so no further noise is needed. Therefore, Algorithm 2 satisfies ε_2^C-differential privacy as a whole. In Algorithm 1, Algorithm 2 is called sequentially at most M times; hence, by the sequential composition property in Definition 5, the privacy budget consumed by Algorithm 1 is no more than the sum of the ε_2^C over the M calls, which equals ε_2; i.e., Algorithm 1 satisfies ε_2-differential privacy. □

Theorem 4. Algorithm 3 (RDPT) satisfies ε-differential privacy.
Proof. Suppose the total privacy budget is ε. We split it into two parts in RDPT, denoted ε_1 and ε_2: ε_1 is for the adaptive grid partitioning, while ε_2 is for perturbing the cell visit probability vectors in Algorithm 3.
In lines 2-8 of Algorithm 3, we add Laplacian noise to the result of formula (3) to ensure the privacy of dividing a top-level cell into bottom-level cells. As shown in Theorem 1, the sensitivity of formula (3) is 1, which means that the added Laplacian noise follows the distribution Lap(1/ε_1). Therefore, the adaptive grid partitioning in Algorithm 3 satisfies ε_1-differential privacy.
When we perturb a real trajectory in the dataset, Algorithm 1 is called in line 14 of Algorithm 3. Algorithm 1 only reads the real trajectory of one user and perturbs its cell visit probability vector. According to the sequential composition property in Definition 5 and Theorem 3, perturbing one real trajectory with Algorithm 1 satisfies ε_2-differential privacy. When the next real trajectory is processed, the real trajectory that Algorithm 1 reads is disjoint from the previous one. Therefore, although Algorithm 1 is called N′ times, perturbing all real trajectories in the dataset still satisfies ε_2-differential privacy according to the parallel composition property in Definition 6.
As a result, Algorithm 3 (RDPT) satisfies ε_1 + ε_2 = ε-differential privacy according to the sequential composition property in Definition 5. □
RDPT not only satisfies differential privacy theoretically but also reduces the trajectory correlation; the privacy is thus preserved.
Algorithm 1: PSO-DP.
Input: privacy budget ε_2; the real cell visit probability vector P_u^{C,act} of the current user u; the set Ψ = {v | v ≠ u} of other users whose cell visit probability vectors have already been perturbed by RDPT (if u is the first user to be processed, Ψ = ∅); the upper bound vector U′ for the particle position vector; the number of global optimal particles β; the population size α; and the maximum number of iterations M.
Output: the perturbed cell visit probability vector P_u^{C,syn} of user u.
(1) Initialize the particle swarm Ω, where each P_C_i is randomly initialized such that each component of P_C_i is not higher than the upper bound of its corresponding dimension in U′ and the sum of all components of P_C_i is 1; V is a zero vector; Λ_i is calculated by formula (8) and F_i by formula (10); when F_i is calculated, the objective function values O_i, O_i′ are calculated by formula (6)
(2) for C = 0 to M - 1 do
(3) ε_2^C = 2ε_2/((M - C + 1)(M - C + 2)) + 2ε_2/(M(M + 2))
(4) Initialize two sets: Ω* = Ω to save the local optimal particles, and Θ = ∅ to save the global optimal particles
(5) Ω*, Θ = em_select(ε_2^C, Ω, Ω*, Θ, α, β) //call Algorithm 2
(6) w_min = 0.4, w_max = 0.9 //the bounds of the inertia weight w for updating the velocity of a particle, obtained from the literature [16]
(7) c_1 = c_2 = 2 //two accelerating constants obtained from the literature [16]
(8)-(13) ...
(14) F′ ...
(15)-(16) ...
(17) recompute Λ_i and F_i according to formulas (8) and (10), respectively; when F_i is calculated, the objective function values O_i, O_i′ are calculated by formula (6)

Experiments and Analysis
Extensive experiments are conducted on three real datasets to verify the effectiveness of RDPT. By adjusting the total privacy budget ε, we compare RDPT with several methods over five metrics to verify that RDPT achieves almost the same data utility as, and better security than, the existing methods. We also verify the stability of RDPT over two specific implementations of trajectory correlation.

Datasets.
We use three real datasets in our experiments: Gowalla [42], Yonsei [10] and Geolife [8]. The three datasets represent different applications for releasing trajectory data offline.

Algorithm 2: em_select.
Input: privacy budget ε_2^C; particle swarm Ω; the set of local optimal particles Ω*; the set of global optimal particles Θ; population size α; and the number of global optimal particles β.
Output: the new set of local optimal particles Ω* and the new set of global optimal particles Θ.
(1) for i = 0 to α - 1 do
(2) let σ = {F_i, F_i^{lBest}} //F_i and F_i^{lBest} are the utility scores of particles W_i and W_i^{lBest}, respectively
(3) ...
(4) According to the set of utility scores σ, the privacy budget ε_2^C, and the exponential mechanism, generate a set of probabilities σ′ = {exp(ε_2^C F/2) / Σ_{F∈σ} exp(ε_2^C F/2)} //according to Theorem 2, the global sensitivity is 1
(5) Sample a particle j from the two candidates following the probability distribution in σ′
(6) ...
(7) end for
(8) Π = Ω* ∪ Θ and Θ = ∅ //generate the candidate particle set for the global best particles according to [38]
(9) sort Π in descending order according to the objective function in formula (6)
(10) select the first β particles from Π as the new global best particles Θ
(11) return Ω* and Θ

Algorithm 3 (excerpt):
(...) find the bottom-level cell C_ij of C_i with the largest q(C_ij) from MD
(20) randomly select a location in C_ij and add it into T′ as a perturbed location

Gowalla is a location-based social networking website where users share their locations by checking in. The dataset Gowalla-New York (GNY) in our experiments is the subset of check-in records in the Gowalla dataset whose latitude and longitude coordinates are located in the city of New York. The dataset depicts the relationships and daily activities of users in a real social network. In data preprocessing, locations whose latitude, longitude or timestamp (time slot) is null are deleted. In addition, if the number of locations in a trajectory is less than 80, the trajectory is also deleted, because the trajectory correlation between two users forms over a relatively long time. Thus the experiments over this dataset can simulate a real scene and real patterns in releasing a trajectory dataset offline.
The Yonsei dataset was collected by Yonsei University, Seoul, Korea [10]. The trajectories of nine graduate students at Yonsei University were collected with the mobile location service application SmartDC over two months in 2011. The dataset Yonsei-Seoul (YSO) is the subset of locations in the Yonsei dataset whose latitude and longitude coordinates are located in Seoul. Unlike the GNY dataset, all users in the Yonsei dataset have relationships with one another, and the data are denser than in the GNY dataset. The Yonsei dataset also records the behavior of users and simulates a real scene of releasing trajectories offline.
The Geolife dataset contains trajectories of 182 users over three years. Locations are represented by GPS latitude and longitude coordinates together with the date and time when the user visited them. We selected the trajectories of 33 users from October 23 to November 6, 2008, all of whose locations are in Beijing. For each user, we connect the user's multiple trajectories into one trajectory, obtaining a dataset (GEO) for our experiments. The data are much denser than in the other two datasets. The statistical information of the datasets after preprocessing is summarized in Table 2.

6.2. The methods to be compared.
We use DPT [6], TGM [7], and AdaTrace [8] as three comparison methods. These methods consider the correlations within a trajectory, and they are privacy-preserving approaches with differential privacy. Although the correlations within a trajectory in DPT, TGM and AdaTrace differ from the trajectory correlation between trajectories in RDPT, we still choose them for comparison because their aim is to generate synthetic trajectories that provide efficient protection when publishing a large number of trajectories offline. As for the methods in the literature [9, 10], they only perturb two trajectories of the same length and require that the two trajectories not be too long; it is difficult to find a real dataset on which to compare with them, so we do not compare with these two methods. For the comparison approaches, we use the parameters recommended by the literature or the parameters that improve their experimental results.

The metrics.
We select five metrics to evaluate the data utility and privacy of RDPT. The metrics for evaluating data utility are the Jensen-Shannon divergence of the location visit probability vector (JS) [6-8], the Kendall coefficient of the location visit probabilities (Ken) [8], and the query error (Qerr) [8]. The ability to resist a Bayes inference attack (ϑ) [8] and the ability to protect the trajectory correlation (δ) are the two metrics for privacy.
The Jensen-Shannon divergence (JS) of the location visit probability vector.
This metric evaluates the degree to which the location visit frequency vectors are preserved between the real dataset and the perturbed dataset. The smaller the value, the better the spatial distribution feature is preserved and the smaller the utility loss of the dataset. The equation for calculating JS is

JS(P, Q) = (1/2) KL(P || (P + Q)/2) + (1/2) KL(Q || (P + Q)/2),

where P, Q denote the global location visit probability vectors in the real dataset and the perturbed dataset, and KL(·) is the Kullback-Leibler divergence. The Kendall coefficient (Ken) of the location visit frequencies.
The Kendall coefficient (Ken) evaluates, for any pair of locations in the datasets, whether the order of the visit frequencies of the two locations is the same in the real dataset and the perturbed dataset. The larger the value, the more relatively "hot" locations are preserved and the higher the data utility. To describe the metric, we first define "order preserving": for two locations ℓ1 and ℓ2, if the visit frequency of ℓ1 is larger than (or less than) that of ℓ2 in the real dataset and likewise in the perturbed dataset, then ℓ1 and ℓ2 form a pair of "order preserving" locations. Suppose the number of "order preserving" pairs is X and the number of "order not preserving" pairs is X′ over the real and perturbed datasets. The equation for calculating Ken is

Ken = (X - X′) / (n(n - 1)/2),

where n is the number of locations in a dataset.

The query error (Qerr). We define a query q as follows: count the number of trajectories passing through a specific region R in a dataset. Let q(D) denote the query result over a dataset D; the query error (Qerr) is calculated as

Qerr = avg_q |q(D) - q(D′)| / max{q(D), b},

where b controls the impact of the query results in extreme cases. In our experiments, we let b = 1% × |D| [8]. The query regions are a set of 500 regions selected uniformly at random, to prevent the query error from being influenced by accidentally abnormal query results. The smaller Qerr is, the better the numbers of trajectories traversing different regions of the dataset are preserved, and the more useful the dataset is for commercial block planning and other application scenarios.

These three metrics are used in the three compared papers. The JS for the trip distribution in DPT [6], TGM [7] and AdaTrace [8] is the same as the JS in our paper, where the trip distribution indicates the global location visit probability vectors. Ken and Qerr are used in AdaTrace. Therefore, selecting these three metrics ensures the fairness of the comparison.
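Under their standard definitions, the three utility metrics can be sketched as follows; the exact forms in the paper's formulas (e.g., normalization constants) may differ slightly.

```python
import math

# Sketches of the three utility metrics: the Jensen-Shannon divergence
# JS(P, Q) = KL(P||M)/2 + KL(Q||M)/2 with M = (P+Q)/2, the Kendall-style
# coefficient Ken = (X - X') / (n(n-1)/2), and the query error
# |q(D) - q(D')| / max(q(D), b) averaged over the query regions.

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def kendall(freq_real, freq_pert):
    n = len(freq_real)
    x = x_bad = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = freq_real[i] - freq_real[j]
            b = freq_pert[i] - freq_pert[j]
            if a * b > 0:
                x += 1          # order preserving pair
            elif a * b < 0:
                x_bad += 1      # order not preserving pair
    return (x - x_bad) / (n * (n - 1) / 2)

def qerr(q_real, q_pert, b):
    return sum(abs(r - p) / max(r, b) for r, p in zip(q_real, q_pert)) / len(q_real)
```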
The ability to resist a Bayes inference attack (ϑ). Suppose Z is a sensitive region; P(Z|D) is the vector corresponding to the prior Markov transition matrix of Z over the real dataset D, known to the attacker, and P(Z|D′) is the vector corresponding to the posterior Markov transition matrix of Z over the perturbed dataset D′, also known to the attacker. We evaluate the difference between P(Z|D) and P(Z|D′) using the Jensen-Shannon divergence as follows:
ϑ = max{JS(P(Z_i|D), P(Z_i|D′)) | Z_i ∈ Z}. (16)

The smaller ϑ is, the smaller the difference between P(Z|D) and P(Z|D′), the less privacy the attacker can infer, and the higher the security. To avoid being influenced by a specific sensitive region Z, we take the maximum over all regions in the sensitive region set Z as the final result.

The ability to protect the trajectory correlation (δ). This metric evaluates the degree to which a privacy-preserving method protects the trajectory correlation: the larger the value, the better the trajectory correlation is protected, and the lower the probability that an attacker infers social relationships through trajectory correlation. The metric is defined as the reduction of the trajectory correlations from the real dataset to the perturbed dataset.

In the following, we compare RDPT with DPT [6], TGM [7], and AdaTrace [8] over the above five metrics (the Jensen-Shannon divergence JS, the Kendall coefficient Ken, the query error Qerr, the ability to resist a Bayes inference attack ϑ, and the ability to protect trajectory correlation δ) on the three datasets.
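Formula (16) can be sketched directly; the encoding of sensitive regions as a dict of transition probability vectors is our assumption.

```python
import math

# Sketch of the Bayes-inference-resistance metric in formula (16): for each
# sensitive region, compare the attacker's prior and posterior Markov
# transition vectors with the JS divergence, and report the maximum.

def _kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def _js(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def theta(priors, posteriors):
    """priors/posteriors: dict region -> transition probability vector."""
    return max(_js(priors[z], posteriors[z]) for z in priors)
```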

Experimental results and analysis.
We implement RDPT in Java, and the experiments are conducted on a computer with a 3.60 GHz Intel(R) Core(TM) i5-8600K CPU and 16 GB of memory. We adjust the parameters in our experiments to guarantee that Algorithm 1 converges. Each experiment is repeated several times, and we report the average values of the five metrics. The parameters for our experiments are listed in Table 3.
In the following subsections, we compare the data utility as ε increases. We use the Pearson correlation coefficient and cosine similarity to calculate the trajectory correlation in RDPT. In the following results, RDPT-Pearson and RDPT-Cos denote the results with the trajectory correlation calculated by the Pearson correlation coefficient and by cosine similarity, respectively.
6.4.1. Comparison of data utility. We first compare RDPT with DPT, TGM, and AdaTrace over the three datasets on the three data utility metrics: JS, Ken, and Qerr.

The experimental results. Figures 1, 2 and 3 show the results of RDPT, DPT, TGM and AdaTrace for different ε over the three datasets on the metrics JS, Ken and Qerr, respectively. From Figure 1, the JS results of RDPT are better than those of the other three methods. In Figure 2, the Ken results of RDPT are better than those of the other three methods over the GNY and YSO datasets, but the Ken result of RDPT-Pearson is lower than that of AdaTrace over the Geolife dataset when ε = 1, where the difference in Ken between RDPT-Pearson and AdaTrace is less than 0.04. Therefore, the Ken results of RDPT over Geolife are better than or almost equivalent to those of the other three methods.
In Figure 3(a), the Qerr results of RDPT are slightly worse than those of DPT and very close to those of AdaTrace, but much better than those of TGM. From Figure 3(b), RDPT-Pearson and RDPT-Cos are better than TGM and AdaTrace, and slightly worse than DPT; however, the difference in Qerr between DPT and RDPT is about 1%. In Figure 3(c), the results of RDPT are slightly worse than those of TGM and better than those of AdaTrace; moreover, RDPT-Pearson is better than DPT, and RDPT-Cos is almost equivalent to DPT. Overall, the Qerr values of RDPT, DPT, and TGM are close.
From Figures 1 to 3, RDPT preserves data utility almost equivalent to that of the comparison methods under both implementations of trajectory correlation.
Analysis of the results. RDPT perturbs the trajectories one by one and preserves the spatial distribution feature of the locations of each trajectory. DPT, TGM, and AdaTrace all focus on features of the global spatial distribution: they replace the spatial distribution feature of each trajectory with the common spatial distribution feature of all trajectories.
Then, since the spatial distribution feature is generalized, their JS and Ken metrics are worse than or equivalent to those of RDPT. Moreover, as shown in the multi-objective functions in formula (6), the first part (i.e., O(P_C, P_u^{C,act})) is formulated to preserve the cell visit probability vectors of the trajectories. After solving the problem by PSO-DP, we obtain its extremum and achieve this purpose, thus preserving better JS and Ken metrics than DPT, TGM, and AdaTrace. As for the Qerr metric, the post-processing in step 3 introduces noise, so the counts of trajectories traversing different regions change slightly, and the Qerr metric is slightly worse than those of DPT and TGM. As a result, the data utility of RDPT is almost equivalent to that of DPT, TGM, and AdaTrace.

Comparison of privacy.
The experimental results. Since the ability to protect the trajectory correlation (δ) is influenced by the specific implementation of trajectory correlation, we show the results of DPT, TGM, and AdaTrace with the trajectory correlation computed by both the Pearson correlation coefficient and cosine similarity. The experimental results for different privacy budgets ε over the three datasets are shown in Figures 4 and 5 for the two metrics, the ability to resist Bayes inference attacks (ϑ) and the ability to protect the trajectory correlation (δ), respectively. As shown in Figure 4, for ϑ, RDPT-Pearson and RDPT-Cos are better than DPT, TGM, and AdaTrace for all privacy budgets ε. The results indicate that RDPT preserves the features of the Markov transition matrices in sensitive regions better than the comparison methods, so RDPT avoids the leakage of privacy in sensitive areas after the dataset is perturbed. For δ, the results of RDPT are better than those of DPT, TGM, and AdaTrace; in addition, the smaller ε is, the better the metric.

Analysis of the results. The metric ϑ reflects the spatial feature of the Markov transition matrices of the trajectories, and the results in Figures 1 and 2 show that RDPT better preserves the location visit probability vector (i.e., the spatial feature); therefore, RDPT is also better than the compared methods on ϑ. For the metric δ, since RDPT focuses on preserving the spatial distribution feature of each user's own trajectory, the differences between the spatial distribution features of the perturbed trajectories are large. DPT, TGM and AdaTrace instead preserve the global spatial distribution features of all users and generate perturbed trajectories by constructing mobility models based on these global features.
Therefore, the spatial distribution features of the perturbed trajectories are similar in their perturbed datasets, so the average trajectory correlation between the perturbed trajectories and the original trajectories is reduced less. Hence, their δ performance is worse than that of RDPT. Moreover, as shown in the multi-objective functions in formula (6), the second part (i.e., O′(P_C, Ψ)) is formulated to protect the trajectory correlation between different trajectories. After solving the optimization problem via PSO-DP, we obtain its extremum and achieve this goal, which means that the trajectory correlation is well protected (i.e., δ is large). Consequently, RDPT naturally outperforms DPT, TGM and AdaTrace on δ. From Figure 5, the δ results of RDPT over the Geolife dataset are not as good as those over GNY and YSO. The reason is that the GNY and YSO datasets come from real social networking where some users have relationships, so the trajectory correlations between users are higher, while the users in the Geolife dataset usually have no social relationships, so the reduction in trajectory correlation is less pronounced.

Conclusion
In this paper, we proposed a differentially private trajectory publication method, named RDPT, to protect the trajectory correlation. We designed a multi-objective optimization problem that aims to reduce the trajectory correlation between a given trajectory and the other trajectories while preserving data utility. We integrated the optimization problem into RDPT and solved it with a modified particle swarm optimization algorithm that satisfies differential privacy. Experimental results on three real datasets show that RDPT achieves data utility almost equivalent to that of existing methods under two different implementations of trajectory correlation. Moreover, RDPT provides better privacy assurance than existing methods and is more suitable for preserving the privacy of long trajectories. How to improve our method for dense datasets with very long trajectories, such as Geolife, is our future work.

Conflicts of Interest
The authors declare that they have no conflicts of interest.