Privacy Protection Method for Vehicle Trajectory Based on VLPR Data



Introduction
With the coming of the big data era, big data are increasingly connected with our lives. The development of location-aware technology, such as mobile communications and sensing, digitizes the geographical locations of people and objects. Such location data have the characteristics of being large scale, fast updating, and high value, which has brought great changes to people's lives, business operations, and scientific research. Trajectory data are the sequence of location information from a moving object that is recorded in chronological order.
There are two main applications for trajectory data: online applications and offline applications. The former generally refers to location-based services (LBSs), in which location service providers can provide customized services by obtaining the real-time location information of mobile objects. For example, when a customer searches for a destination using a map application, the service provider can provide the customer with a reasonable path based on the customer's current trajectory. The latter refers to the location service provider's desire to discover the intrinsic value of a user's trajectory, so the trajectory data are directly acquired and analyzed or made available to third parties. Traditional car insurance companies, for example, set insurance prices based on the average risk of vehicles, but as user customization needs increase, companies have begun to rely on travel time, public locations, travel processes, and other information in data organizations to provide customized services to users [1,2]. However, while location data benefit people, they also increase the risk of personal privacy disclosure because the data contain not only the user's location information but also implicit information such as the user's home address, lifestyle, health status, social relations, and behavioral patterns. For example, Gambs et al. use GPS data to successfully infer a person's home address and company information [3]. Matsuo et al. use position sensor data to build user feature matrices, classify the data using machine learning techniques, and accurately infer individuals' work, age, and smoking habits [4]. Song et al. use personal trajectory data to explore people's past, present, and future positions with 93% accuracy [5]. Using telephone call data from a European country and analyzing it based on random space-time points, Montjoye et al. found that 95% of users could be accurately identified by randomly selecting 4 space-time points [6].
These examples show that privacy protection is a critical issue.
Trajectory privacy is defined as the property that a moving object is not traceable over a certain distance or time. Trajectory privacy preservation mainly focuses on LBS-based privacy protection and the privacy protection of released data; the related work is described below. Olurin et al. believe that the privacy preference platform (P3P) and P3P-based technologies have great developmental potential; therefore, they summarize the research on P3Ps and explain that the current challenge is to design a formalized privacy policy language [7]. For smartphone query privacy, Shokri et al. propose a user-collaboration privacy protection method based on LBSs, which saves user query information in a buffer and passes it to other users who search for such information. Simulation results show that this method is lightweight and has low collaboration costs [8]. Khoshgozaran et al., aiming for strong privacy, propose a PIR-based framework in which the server answers client queries without being able to infer any information from them. Experiments conducted on real networks have validated the usefulness of the scheme, and the remaining challenge is how to reduce the space and time costs [9]. Terrovitis and Mamoulis show that there is a high risk of privacy breaches by adversaries if the trajectories are published with processing that only hides the identities of users. To prevent this type of breach, they develop a data suppression technique that iteratively suppresses selected locations from the original trajectory until the privacy constraints are met [10]. Based on the k-anonymity model, Nergiz et al. propose a generalization-based method to solve the problem of individual recognition in static trajectory datasets. They redefine the notion of k-anonymity for sequences of space-time points and further enhance privacy protection by releasing only a randomly generated set of representative trajectories.
Experiments on real and synthetic data show that the method is effective for publishing trajectory datasets [11]. Sun et al. deem that a privacy mechanism should be consciously and consistently integrated into system designs, and, on this basis, the virtual travel line (VTL) model is proposed. The VTL can be used to standardize when and where mobile data are collected to meet the needs of traffic modeling and privacy protection. Traffic-knowledge-based adversary models are developed to evaluate the performance of the VTL zone method and filtering algorithms. The algorithm is evaluated in terms of its privacy protection and the data demands of traffic applications [12,13]. The above studies show that trajectory privacy has become a key issue in trajectory data publishing. VLPR data, as a kind of trajectory data, face the same problem.
With the development of intelligent transportation, increasingly more urban service organizations hope to be able to obtain VLPR data from traffic management departments to provide better urban service solutions [14]. However, even if the vehicle license plate numbers have been replaced by quasi-identifiers (QIs) before the data are released, the trajectories are still unique relative to their QIs. Previous works [15] have shown that adversaries can still connect trajectories to users by using background knowledge. Gao et al. find that, even when the temporal granularity is set to half a day, five spatiotemporal records are sufficient to uniquely identify approximately 90% of individuals in VLPR data. Besides, in a dataset, an adversary can easily obtain a unique trajectory in a time interval [t1, t2] when the point of interest (POI) information for the trajectory is open to the public; thus, if the adversary has the background knowledge that Mike once visited the POI in the same time interval [t1, t2], then the adversary has reason to infer that the trajectory comes from Mike. In this situation, traffic management departments choose to disclose the VLPR data without the vehicle license plate numbers lest the trajectories be linked and privacy be violated. However, our research finds that a privacy adversary can still use various methods to acquire unique vehicle trajectories from the incomplete data. If the VLPR data are released without processing, more than 90% of the vehicle trajectories between the upstream and downstream VLPRs can be successfully linked. To reduce the probability of successful linkages of vehicle trajectories, the (m, n)-bucket model based on time series is proposed, and the results can provide theoretical suggestions for relevant institutions when releasing VLPR data. The rest of this paper is organized as follows. The notations used throughout this paper are summarized in Section 2.
Section 3 offers a brief introduction to the VLPR system. An adversary model based on VLPR data is proposed in Section 4. A VLPR data publishing method, namely, the (m, n)-bucket model based on time series, is proposed in Section 5. The criteria that are used to evaluate the performance of the (m, n)-bucket model and to address the data requirements for privacy preservation are discussed in Section 6. Some case studies are carried out to evaluate the effectiveness of the (m, n)-bucket model by making privacy attacks in Section 7. Some conclusions are summarized in Section 8.

Notations
We use the notations listed in Table 1 throughout this paper.

VLPR System
The VLPR is a system that adopts advanced photoelectric technology, image processing technology, and pattern recognition technology to take an image of every passing vehicle and automatically identify its license plate. Today, VLPR systems are ubiquitous in cities, and they are usually placed a few meters away from the road. When a vehicle is detected at a distance in front of the device, the system begins to capture the vehicle's image. The vehicle's information, such as its license plate number, type, and color, is identified by the image algorithm; then, the identifying data, which are related to geographic information in the VLPR system, are stored in the database and are finally uploaded to the transportation authority data center. The storage format for VLPR data can be described as follows.
As shown in Table 2, the vehicle license plate number is directly associated with user privacy and is classified as sensitive attribute (SA) information. The other information consists of additional descriptions of the vehicle passing through the device, which are called QI attributes. It is common to encrypt or encode the SA attributes before releasing data so that adversaries do not have direct access to owner information, but in this situation, the vehicle trajectory can still be easily obtained. Therefore, the data management department may choose to suppress the SA attributes; however, based on our research, there are still ways to breach trajectory privacy by using only the QI attributes, which increases the likelihood that an adversary breaches a vehicle owner's privacy. To the best of our knowledge, related work on this topic has not yet been reported.

Adversary Model
To link the traces from two VLPRs, an adversary model based on the passing time, color, type, and brand of vehicles is proposed. Even though this adversary model only considers three QI attributes of the vehicle other than passing time, the successful linking rate is impressive (see Section 6). In this part, the adversary model is illustrated in detail. Two cases are given.

Trajectory Linking between Two Neighboring VLPRs without an Intersection (Case 1). Consider two neighboring
VLPRs where there is no entrance or exit between them, which is the simplest case, as shown in Figure 1. Suppose that the travel time from VLPR 1 to VLPR 2 is relatively stable.
Define the time when vehicles are at VLPR 1 as the departure time (denoted as Ω_1) and the time when vehicles are at VLPR 2 as the arrival time (denoted as Ω_2). By observing the departure and arrival times from the VLPR system of the Shenluo highway in Foshan, China, it is found that the travel times almost satisfy a log-normal distribution, as shown in Figure 2. In most studies [13,16,17], the travel time is estimated using the length of the road divided by the speed limit on the road. However, the traffic flow is different at different times, resulting in different average speeds. As a result, it is difficult to accurately calculate the estimated travel time at different moments using this method. In this paper, the travel time is estimated using the log-normal distribution function, whose density is

f(t) = (1 / (t δ √(2π))) exp(−(ln t − μ)^2 / (2δ^2)).

In the previous equation, t is the travel time from VLPR 1 to VLPR 2 for the vehicle, μ is the logarithmic mean value, and δ is the logarithmic standard deviation. Since this is not the focus of this paper, it will not be discussed in detail here. For more information about the log-normal distribution function, refer to the relevant papers [4,18]. The fitting effect is also shown in Figure 2.
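The fitting step above can be sketched as follows; the sample travel times and the helper names `fit_lognormal` and `expected_travel_time` are illustrative assumptions, not part of the paper's dataset.

```python
import math

def fit_lognormal(travel_times):
    """Fit a log-normal distribution by taking the mean (mu) and the
    standard deviation (delta) of the logarithms of the travel times."""
    logs = [math.log(t) for t in travel_times]
    mu = sum(logs) / len(logs)
    delta = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, delta

def expected_travel_time(mu, delta):
    """Expected value of a log-normal variable: E(x) = exp(mu + delta^2 / 2)."""
    return math.exp(mu + delta ** 2 / 2)

# Illustrative travel times (seconds) observed between VLPR 1 and VLPR 2.
observed = [410, 395, 430, 450, 402, 388, 470, 415]
mu, delta = fit_lognormal(observed)
E = expected_travel_time(mu, delta)
```

The expected value E(x) obtained this way is what the adversary model uses to predict arrival times.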
In this paper, if the trace of vehicle n ∈ Ω_1 and the trace of vehicle m ∈ Ω_2 are released, T_2^{n,p} is used to represent the estimated arrival time of vehicle n. This is shown in the following equation, where E(x) is the expected travel time and T_1^n is the departure time of vehicle n:

T_2^{n,p} = T_1^n + E(x).

However, it is impossible for each vehicle to travel exactly as predicted. Therefore, the estimated arrival time should be predicted by using a threshold T_t = rate · E(x), where rate is a constant. That is, for a trace n ∈ Ω_1, the estimated arrival time can be predicted to lie within the time period [T_2^{n,p} − T_t, T_2^{n,p} + T_t]. In other words, a suspect list (denoted as S_n) can be obtained for VLPR 2 in this time period. If S_n contains exactly one vehicle m, it can be said that m is the same vehicle as n. However, S_n usually contains a number of vehicles, which indicates that certain rules should be employed to confirm the target. To determine which vehicle in S_n is n, the other attributes of the VLPR data are used, which have been illustrated in detail in Section 3. In this paper, in addition to time, only three QI attributes (type, color, and brand) of the VLPR data are considered to recognize which vehicle at VLPR 2 is the same vehicle as that at VLPR 1. In this adversary model, the QI attributes from the VLPR data are utilized to reduce the number of suspected vehicles. Obviously, the more QI attributes that are utilized, the fewer suspicious objects will remain. In the case that the filtered suspect list still contains more than one vehicle, the closest time rule is used to find the predicted vehicle, which satisfies

|T_2^m − T_2^{n,p}| = min_{k ∈ S_n} |T_2^k − T_2^{n,p}|,

where T_2^m is the arrival time of vehicle m in S_n. The previous equation indicates that vehicles m and n are the same vehicle since the arrival time of m is the closest to the estimated arrival time of n. That is, the vehicle trajectories at the two VLPRs are successfully linked, which leads to a privacy disclosure. Algorithm 1 (input: the released trace sets Ω_1 and Ω_2 that pass through VLPR 1 and VLPR 2, respectively; output: the predicted vehicles S_m) describes the process of trajectory linking in detail.

Definition 1. T_1^i and T_2^i represent the times at which vehicle i is at VLPR 1 and VLPR 2, respectively. In addition, we also have the color (referred to as C_1^i and C_2^i), type (referred to as Ty_1^i and Ty_2^i), and brand (referred to as Br_1^i and Br_2^i) of the vehicle. t_1^i and t_2^i denote the records of vehicle i at VLPR 1 and VLPR 2, respectively. t_n^k represents vehicle k in the suspect list S_n. S_m is the output of the algorithm, namely, the predicted vehicles.

Trajectory Linking with an Intersection Situation (Case 2).
The road topology is complicated in real situations. Vehicles that arrive at one VLPR usually come from different VLPRs (e.g., VLPR 1, VLPR 2, VLPR 3, and VLPR 4). The vehicles that arrive at VLPR 1 may come from VLPR 2, VLPR 3, or VLPR 4 with the same probability, and the adversary wants to link the vehicle traces from VLPR 2, VLPR 3, and VLPR 4 to those in VLPR 1, as shown in Figure 3. That is, the adversary wants to know which upstream VLPR each vehicle in VLPR 1 comes from. The travel time for each VLPR pair is predicted following the same rules as in Case 1, so the linking can be performed using Algorithm 1.
Different from Case 1, in the simplest case (without an intersection), all the vehicles arriving at VLPR 2 come from VLPR 1. If two vehicles meet the requirements of Algorithm 1, it can be said that they are the same vehicle traveling from VLPR 1 to VLPR 2. However, in the complicated case (Case 2), we want to know which upstream VLPR a vehicle comes from, and there may be identical vehicles at different upstream VLPRs. That is, when vehicle m is found at VLPR 1, Algorithm 1 may also return one vehicle p from VLPR 2 and one vehicle q from VLPR 3 if the two vehicles p and q have the same attributes, as shown in Figure 4.
This means that the linking fails for these kinds of vehicles. Fortunately, the probability of encountering such vehicles is so small that it can be ignored. Therefore, Algorithm 1 can also be used to link the trajectories in intersection situations.

A Privacy Preservation Method for VLPR Data Publishing
Privacy preservation is a serious concern when publishing VLPR data. If the original VLPR data are released without any processing, travelers' privacy is highly vulnerable to breaches, as shown in Section 6. To reduce the probability of trajectory linkages, a VLPR data publishing method, namely, the (m, n)-bucket model based on time series, is proposed.

Publishing the Original Data.
Publishing the original data refers to releasing all the data that are collected by a VLPR, which provides the least privacy protection but the most usable data. If all the data that are collected in VLPR systems are released, then, when the VLPR layout is dense enough and the data quality is good enough, it is easy for an adversary to link the traces between the upstream and downstream VLPRs. In fact, due to the complexity of the road network, it is not a simple process for adversaries to keep track of the same vehicle across several intersections. However, with the development of smart cities, road network designs have become more standardized, VLPR layouts have become denser, and the collected data have become more accurate; therefore, it will become increasingly easy for adversaries to link traces between multiple VLPRs in the near future. If all vehicle traces are released without processing, as shown in Section 6, a majority of the traces can be linked by using specific adversary models. Therefore, a certain data processing approach is needed to process the data that are collected in VLPR systems to ensure the privacy of travelers.

(m, n)-Bucket Model Based on Time Series
The application of generalization to protect against privacy disclosures has been very well studied in the literature [19][20][21]. This approach generalizes a dataset to a certain extent by transforming the QI values into more general forms and dividing the tuples into equivalence groups in which the values of each QI attribute are the same. Therefore, the records in the same equivalence group are indistinguishable, which meets k-anonymity [22][23][24] and l-diversity [25,26]. Although generalization is a common method for privacy preservation, it loses much information and fails with high-dimensional data, which is not conducive to the effective analysis of the data. To overcome the drawbacks of generalization, a bucketization method called anatomy, which protects both the privacy and the correlations in the microdata, is proposed [27]. Bucketization first divides the tuples in the table into buckets and then randomly separates the quasi-identifiers (QIs) from the sensitive attributes by permuting the SA values in each bucket. Although bucketization can be used to deal with high-dimensional data, it also tends to disclose more information because all the QIs are released. To overcome the limitations of generalization and bucketization and to obtain better privacy preservation and data utility, the slicing algorithm was proposed, which can prevent attribute disclosure and membership disclosure while releasing data [28]. However, slicing has been found to risk the disclosure of some absolute facts, which would help adversaries find invalid records in released datasets, resulting in personal privacy violations [29].
To improve the confidentiality of the published data in a slice table, a method called value swapping is proposed. Its purpose is to decrease the attribute disclosure risk for absolute facts and to ensure l-diverse slicing. To meet the different requirements for the protection of identities and sensitive values, an approach called cross-bucket generalization is presented [30]. This approach provides higher attribute protection than identity protection and is proven to be useful for privacy preservation. Besides, there are other methods that replace sensitive attributes with anonymous characters independently at random [31] or add randomized records to the original database [32]. These methods usually achieve good privacy protection. However, the former causes a great deal of information loss, and the latter adds a lot of spurious information, which is unacceptable in the field of big data. In order to achieve the effect of privacy protection while keeping the original data as much as possible, we propose an algorithm that shuffles the sensitive attribute values within the same bucket. This method neither adds nor removes data but simply shuffles the sensitive data within the same bucket.
The related works mentioned above are not suitable for vehicle trajectory privacy preservation because the existing privacy protection methods are applicable only to data without time correlations: to meet the k-anonymity or l-diversity requirement, the order of the data in the table can be adjusted at will. However, VLPR data have strong time correlations, and so the positions of the data in the table cannot be adjusted at will. Therefore, based on the above research, the (m, n)-bucket model based on time series is proposed in this paper.

Formalization of (m, n)-Bucket Model.
Define T as the VLPR data that are to be published. T contains d quasi-identifier (QI) attributes QI = {QI_1, QI_2, ..., QI_d} and e tuples T = {t_1, t_2, ..., t_e}. Each QI_i (1 ≤ i ≤ d) can be categorical in the VLPR dataset.
Definition 1 (Bucket). A tuple partition consists of several subsets of T such that each tuple belongs to exactly one subset. These subsets are referred to as buckets, and we denote them as B_1, B_2, ..., B_b. Then, ∪_{i=1}^{b} B_i = T and, for any 1 ≤ i ≠ j ≤ b, B_i ∩ B_j = ∅.

Definition 2 ((m, n)-bucket). T is sorted according to the time series and is divided into several buckets, where each bucket contains m tuples. After dividing T into buckets, the values of the specified sensitive columns are shuffled inside each bucket, and n is the number of columns being shuffled. For example, in Table 3, when t_2 belongs to VLPR 1, t_2 = (purple, large, Toyota). When the brand column is the only sensitive column that needs to be shuffled, the set of matching buckets (MBs) for t_2 is {B_1, B_2}. If the brand and type columns are the sensitive columns, the set of MBs for t_2 is {B_1, B_2, B_3}.

Privacy Preservation.
In this section, the protection principle of the (m, n)-bucket model for vehicle identities is analyzed in detail. Generally, a tuple t may have multiple MBs. Consider an adversary who knows all the QI values of t in VLPR 1 and attempts to infer t's SA_i(v) in VLPR 2 to link the trajectories according to time inference, where SA_i(v) represents the value of the specified column SA_i being shuffled. The adversary needs to determine in which buckets t might reside. Tuple t may be in any one of its MBs. Let p(t, B) be the probability that tuple t is in bucket B. It is worth mentioning that the calculation of p(t, B) takes only the nonsensitive QIs into account. For instance, in Table 3, if the brand column is the only sensitive column, then B_1 and B_2 are the MBs of t_2 and B_3 is not, because the nonsensitive type and color columns must remain bound together.
In the second step, the adversary computes p(t, SA_i(v)), which is the probability that t contains SA_i(v); in other words, it is the probability that t is successfully identified. Specifically, let p(SA_i(v) | t, B) represent the probability that t contains SA_i(v) given that t is in bucket B. Then, the probability p(t, SA_i(v)) can be computed as

p(t, SA_i(v)) = Σ_{B ∈ MB(t)} p(t, B) · p(SA_i(v) | t, B).

In a VLPR table T to which the (m, n)-bucket model is applied, let v′ denote the most frequent value in SA_i within a bucket and let a_i denote its number of occurrences; here, d and n represent the numbers of QI attributes and sensitive attributes, respectively. For every value v and every bucket, the frequency of v is no greater than that of v′, so p(SA_i(v) | t, B) ≤ a_i/m. Since the n sensitive columns are shuffled independently within a bucket, for any individual t, p(t, SA_1(v), SA_2(v), ..., SA_n(v)) meets the following formula:

p(t, SA_1(v), SA_2(v), ..., SA_n(v)) ≤ Σ_{B ∈ MB(t)} p(t, B) · ∏_{i=1}^{n} (a_i/m) ≜ p′.    (5)

Formula (5) can be proven by induction on n. If n = 1 and SA_1 is the sensitive column, then p(t, SA_1(v)) = Σ_{B ∈ MB(t)} p(t, B) · p(SA_1(v) | t, B) ≤ Σ_{B ∈ MB(t)} p(t, B) · (a_1/m); therefore, when n = 1, formula (5) is true. Suppose that formula (5) is true when n = k and SA_1, SA_2, ..., SA_k are the sensitive columns. When n = k + 1 and SA_1, SA_2, ..., SA_k, SA_{k+1} are the sensitive columns, the independence of the shuffles gives p(SA_1(v), ..., SA_{k+1}(v) | t, B) = p(SA_1(v), ..., SA_k(v) | t, B) · p(SA_{k+1}(v) | t, B) ≤ ∏_{i=1}^{k} (a_i/m) · (a_{k+1}/m), and summing over the MBs yields formula (5) for n = k + 1. In summary, formula (5) is proven to be correct. Because a_i ≤ m for every i, each additional shuffled column multiplies the bound p′ by a factor no greater than one; this means that the greater the value of n, the lower the probability of t being successfully identified. Under the condition of a constant n, in general, the greater the value of m, the more p′ decreases. When m increases by 1, p′ increases only if the added value is equal to SA_i(v′); otherwise, it decreases. Since SA_i usually contains many different values, the probability that the added value is not equal to SA_i(v′) is always greater than the probability that it is equal to SA_i(v′).

(m, n)-Bucket Algorithm.
This section presents an algorithm to implement the (m, n)-bucket model. It calculates the maximum linking probability of each individual. The procedure is presented in Algorithm 2. The variable t_j[QI_n] denotes the nonsensitive columns of vehicle t_j (line 5). The data structures T_suspect and Buckets store the suspect tuples and suspect buckets, respectively (lines 6 and 7). SA_n denotes the sensitive columns (line 9). MB stores the matching buckets (line 10). The variable s_num is the number of tuples with the same QI_n as t_i (line 12). The variable p_max_onemb is the maximum probability of one matching bucket (line 16). The data structure P_one stores all the p_max_onemb values of one tuple, and the variable p_all_one is the sum of all the p_max_onemb values in P_one. The data structure P_all stores all the p_all_one values.
For example, in Table 3, suppose that the times of all tuples meet the time-window requirement, that is, every t_j in Table 3 is a temporal suspect. The adversary obtains one tuple t_i = (purple, large, Toyota) from VLPR 1. The maximum probability with which the adversary can identify t_i from Table 3 is calculated as follows. Suppose that the brand and type columns are the sensitive columns, so that SA_n comprises the brand and type columns and QI_n is the color column. The tuple t_j denotes t_2, t_4, t_7, t_9, and t_11; in other words, the tuples t_2, t_4, t_7, t_9, and t_11 are stored in T_suspect.

Evaluation Criteria
In this section, the criteria that are used to evaluate the performance of the (m, n)-bucket model and to address the data requirements for privacy preservation are discussed. To evaluate the performance of the (m, n)-bucket model, the proposed adversary model is used. The percentage of correctly linked trajectories (denoted as pr) is proposed to evaluate the privacy preservation. Let pr be the probability that the trajectories of one vehicle can be successfully linked between two VLPRs. It can be calculated as the number of correct inferences divided by the total number of trajectories passing through both VLPRs.
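The metric pr can be computed directly from the adversary's predictions; the dictionary layout and the helper name `linking_rate` below are illustrative assumptions.

```python
def linking_rate(predicted, truth):
    """pr: fraction of trajectories correctly linked between two VLPRs.
    `predicted` maps each upstream vehicle id to the inferred downstream
    id (or None if no link was made); `truth` maps it to the actual id."""
    correct = sum(1 for v, d in predicted.items()
                  if d is not None and truth.get(v) == d)
    return correct / len(truth)

truth = {'A': 'a1', 'B': 'b1', 'C': 'c1', 'D': 'd1'}
pred  = {'A': 'a1', 'B': 'x9', 'C': 'c1', 'D': None}
pr = linking_rate(pred, truth)   # 2 of 4 trajectories linked correctly
```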
It is important to ensure that after the privacy preservation method is applied, the published datasets can still be used for traffic modeling or other research purposes, such as the impact of different vehicle types on environmental pollution and people's preferences for vehicle colors and brands. According to the needs or priorities of different users, data managers can release data with different ms and ns by using the (m, n)-bucket model that is proposed in this paper. In other words, high-priority users can access data with smaller ms and ns while low-priority users can access data with larger ms and ns.

Experiments for Case 1 and Case 2.
In this section, the proposed adversary model is used to evaluate the effectiveness of the (m, n)-bucket model by making privacy attacks. The evaluation is done using VLPR data that were collected on the Shenluo highway (Case 1) and at the intersection of Songcheng Road (Case 2) in Guangdong Province, China. The datasets have 3 discrete attributes, as summarized in Table 4. The numbers of vehicles in Case 1 and Case 2 are illustrated in detail in Table 5. In Case 2, the reason why the numbers of vehicles at VLPR 1 and VLPR 3 are much larger than those at the other junctions is that they are on the main road. The geometry of the network is shown in Figure 5. The road network is located in Foshan, China; Shenluo Road is a highway, and Songcheng Road is an urban road. In Case 1, vehicles can only travel from VLPR 1 to VLPR 2 because there is no other exit between them. The road length between VLPR 1 and VLPR 2 is approximately 10 kilometers. The data that were collected between 10:00 a.m. and 12:00 p.m. are used in Case 1. The road network in Case 2 includes two intersections, and the length between them is approximately 0.7 kilometers. A vehicle trace at the downstream VLPR 1 may be the same vehicle as a target trace in the previously released traces at the upstream VLPR 2, VLPR 3, or VLPR 4. The data that are used in Case 2 were collected from July 9th to 11th. Based on the proposed method, it is assumed here that the adversary has access to the released vehicle traces within each VLPR; this kind of data could be archived historical data. Using the proposed adversary model to attack the released datasets, the performance of the (m, n)-bucket model is evaluated with respect to privacy.
Algorithm 2 proceeds as follows:
(1) divide each m tuples into one bucket according to the time series
(2) for t_i in VLPR 1
…
(6) add t_j to T_suspect
(7) add all tuples with the same bucket id as t_j into Buckets
(8) for each bucket in Buckets
(9) if any tuple in the bucket has the same SA_n (n ∈ {1, 2, ..., n}) as t_i
(10) add the bucket to MB
(11) if MB is not empty
(12) s_num = the number of tuples with the same QI_n (n ∈ {n + 1, n + 2, ..., d}) as t_i
(13) for each bucket in MB
(14) calculate the number of occurrences of the most frequent value in each of the sensitive columns, named a_1, a_2, ..., a_n
(15) calculate the number of tuples in the bucket with the same QI_n (n ∈ {n + 1, n + 2, ..., d}) as t_i, named b
(16) p_max_onemb = (b/s_num) · (a_1/m) · (a_2/m) · ... · (a_n/m)
(17) add p_max_onemb to P_one

Suppose that the original VLPR data are released and the proposed adversary model is used to track the vehicle trajectories between the upstream and downstream VLPRs with different thresholds. Figure 6 illustrates the change of p with different rates, where the x-coordinate is the rate and the y-coordinate is the tracking probability. Figure 6(a) shows the performance in Case 1 from 10:00 a.m. to 12:00 p.m., where 10 h represents the probability that the vehicle trajectories are successfully linked between 10 and 11 o'clock, and so on. Figure 6(b) shows the performance in Case 2 from July 9th to 11th, where 2-1 July 9th denotes the probability that the vehicle trajectories are successfully linked between VLPR 2 and VLPR 1 on July 9th, and so on. Figure 6 illustrates that a small threshold limits the adversary; however, once T_t exceeds 0.7 times the estimated travel time, a large proportion of vehicle traces (approximately 94% in Case 1 and more than 90% in Case 2) can still be successfully tracked, so releasing all of the original data within each VLPR is not sufficient to preserve privacy.
However, the general performance of the adversary model varies from one case to another, depending on the road topology between the upstream and downstream VLPRs. For example, if vehicles leave the road or end their trips between the VLPR pair, the estimated travel time becomes inaccurate, eventually leading to a decline in the successful linking rate. In conclusion, in order to obtain the maximum linking rate, the value of T_t is suggested to be no less than 0.7 times the estimated travel time.
We assume that the adversary is always able to track vehicle trajectories with the maximum probability. For Case 1 and Case 2, the privacy performances with different values of m and n are illustrated in Figure 7, where the x-coordinate is m and the y-coordinate is the tracking probability. In Figure 7, type means that only the type column is the sensitive column; type & color means that the type and color columns are the sensitive columns; type & color & brand means that the type, color, and brand columns are the sensitive columns; and so on. Figure 7(a) represents the results of Case 1. Figure 7(b) denotes the results between VLPR 1 and VLPR 2 in Case 2, Figure 7(c) represents the results between VLPR 1 and VLPR 3 in Case 2, and Figure 7(d) denotes the results between VLPR 1 and VLPR 4 in Case 2.
As seen from Figure 7, if only one column is the sensitive column, the probability converges at a certain value and does not continue to decline. For the same curve, p decreases at first as m increases; when m increases to a certain value, p converges within an interval. The convergence rate differs for different sensitive columns. For different values of n, p declines at different rates: the larger n is, the faster p drops. If n and m are large enough, the probability can even approach zero. For Case 1 and Case 2, p changes similarly, which shows that the adversary model proposed in this paper is applicable not only to the simple situation (Case 1) but also to the intersection situation (Case 2).

Comparison.
To verify the effectiveness of the (m, n)-bucket model that is proposed in this paper, the following experiment is performed using the 2018/07/30 12:00-12:59 data in Case 1. The calculated probability is the maximum probability (denoted as p(t, SA_i(v)) in Section 4.2.3) of each individual being linked, while the actual linking probability (denoted as pr) is measured over all individuals; therefore, to compare p(t, SA_i(v)) with pr, the average value of p(t, SA_i(v)) over all individuals t is used as the total calculated probability (denoted as pc). The results are shown in Figure 8, where pc-type represents the calculated probability with type as the sensitive column, pr-type denotes the actual linking probability with type as the sensitive column, and so on. In general, pc is greater than pr, but there are exceptions. For example, in Figure 8(b), when m is equal to 2, pr-type-brand is greater than pc-type-brand.
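The averaging step that turns the per-individual maxima p(t, SA_i(v)) into pc, alongside the empirical linking rate pr, can be written as a small helper (the function and parameter names are hypothetical):

```python
def calculated_vs_actual(p_max_per_individual, linked_flags):
    """pc: average of each individual's maximum calculated linking
    probability; pr: actual fraction of individuals whose trajectories
    were successfully linked in the experiment."""
    pc = sum(p_max_per_individual) / len(p_max_per_individual)
    pr = sum(linked_flags) / len(linked_flags)
    return pc, pr
```

Because pc averages worst-case bounds while pr counts realized links, pc usually dominates pr, but a link can still succeed even when its calculated probability is small, producing the occasional pr > pc seen in Figure 8(b).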
This is because a small probability does not necessarily mean that the link fails.
The results show that the derived mathematical model can well expound the privacy protection principle of the (m, n)-bucket model.

Conclusion
In this paper, an adversary model based on the passing time, color, type, and brand of VLPR data is developed to evaluate the performance of the proposed (m, n)-bucket model based on time series. When tracking vehicles between two VLPRs, releasing original data within VLPRs helps protect privacy but not at a satisfactory level; that is, this method of publishing data can lead to a high risk of privacy disclosure. Therefore, a data publishing model that can protect privacy is needed. The proposed (m, n)-bucket model is useful for privacy preservation. It overcomes the limitations of generalization and bucketization, which can only handle a single SA column and data without time correlation requirements. The proposed model is a promising technique for handling high-dimensional data because n can take any value in the model, and users can set different values of m and n as needed. Meanwhile, the mathematical principle by which the model prevents privacy disclosure is expounded. Our experiments demonstrate that the proposed model has great significance for privacy preservation.
It is worth mentioning that although only three QI attributes of vehicles are selected for the modeling discussions, this does not mean that the proposed model is only applicable to data with three QI attributes. The more vehicle QI attributes are considered, the greater the tracking probability will be. Fortunately, the model that is presented in this paper applies to any number of QI attributes; three QI attributes are selected only to verify the validity of the (m, n)-bucket model. Which column is the SA is not the focus of this paper, and this question will be one of the next research directions. Although the method presented in this paper usually achieves good results, it still has some limitations. It cannot deal with vehicles that move too fast or too slow: because the travel time is estimated using the log-normal distribution function, the estimate becomes unreliable at extreme speeds. The model can also be used to evaluate other privacy-protection scenarios, such as personalized privacy protection and continuous data publication, and it is recommended that these issues be studied on the basis of the proposed model.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.