An Approach Based on Customized Robust Cloaked Region for Geographic Location Information Privacy Protection

Location-based services (LBS) have gained huge popularity because of the easy availability of modern mobile devices and the fast development of geographical information science (GIS). However, the lack of protection for private user positions might give rise to privacy concerns. This kind of problem is especially serious in mobile application environments because many mobile applications tend to use LBS. In this paper, we propose a new privacy-preserving approach using a customized robust cloaked region (RCR), depending on a peer-to-peer structure and the premise that users do not trust each other when sharing their geographical locations. Two algorithms are used to generate the RCR with high user density. The area of the RCR is controlled by the user’s demanded degree of protection. To enhance the resistance to regional background knowledge attack, we incorporate a location semantic value into each unit of the user map. According to extensive simulations, our method can effectively obfuscate a user’s geographical location into a highly indistinguishable region because of the disturbance of nearby users and different equally possible locations.


Introduction
With the development of geographical information science (GIS) techniques and the popularity of mobile devices, location-based services (LBS) have been widely employed; navigation services are an everyday example. Although these services are popular, LBS have raised many privacy concerns, and location privacy has drawn considerable attention from researchers. Among the solutions for preserving geographical location privacy, spatial obfuscation has been proved the most effective one. In general, it deliberately generalizes the user's precise position into a region, so that adversaries can only retrieve a coarse location. Gruteser and Grunwald [1] first defined k-anonymity in the context of spatial obfuscation. Several methods, such as the Casper framework presented by Mokbel et al. [2] and the nearest neighbor cloak and the Hilbert cloak proposed by Kalnis et al. [3], rely on a trusted third party (TTP) that acts as an anonymizer. However, the anonymizer is highly likely to become a single point of attack, and a severe privacy leakage will be incurred if the adversary gains access to it.
In addition, k-anonymity can also be implemented by a decentralized peer-to-peer (P2P) network [4]. Another P2P approach called MobiHide, presented by Ghinita et al. [5], uses Hilbert space-filling curves to hide the query issuer among a group of k users.
Although location anonymity suggests a practical way to obfuscate the user's precise location, it is vulnerable to many types of attack without careful consideration, and one of the most significant attacks is regional background attack (RBA). Specifically, an adversary could utilize the background knowledge to increase the precision of the user's position by excluding nonreachable area from the obfuscated area [6].
To address the abovementioned drawbacks when applying spatial obfuscation, this paper introduces a new approach using a customized robust cloaked region (RCR). The proposed approach is based on a P2P structure and designed to address the privacy concern raised by sharing accurate positions. The main idea is that users no longer share their exact locations but an RCR based on nearby users' RCRs. Different from the cloaked region formed by applying k-anonymity, the number of users included in the RCR in our context is unknown, rather than a specific number. Instead, the size of the cloaked region is controlled in accordance with the privacy preference of the user, which avoids an overlarge cloaked region or time delay caused by low user population [1]. The major contributions of this paper can be summarized as follows:
(i) We propose the idea of replacing the precise position with a cloaked region when sharing location information in the peer-to-peer structure, which greatly reduces the risk of privacy breach.
(ii) We propose two effective algorithms to calculate the cloaked region with a high user density.
(iii) We enhance the robustness of the cloaked region by integrating location semantics and considering a time parameter to reflect the fluctuation of population. This approach successfully defends against RBA, since a more reasonable cloaked region is produced that is hard to shrink based on limited background knowledge.
(iv) Extensive experiments are performed with real-world data, and the results have validated the correctness and superiority of the proposed approach and algorithm.

Related Work
Historically, researchers have exerted great efforts toward the protection of location privacy. According to the protection goals, existing mechanisms can be classified into three main categories: protection for user identities, protection for position information, and protection for querying content. User identity protection usually modifies or hides real identities of mobile users while position protection adds noise to users' precise locations. Query content protection enables a data holder to share person-specific records in such a way that the released information remains practically useful but the identity of the individuals who are the subjects of the data cannot be determined [7].

Protection of User Identities.
Anonymity and pseudonymity are two common ways to protect user identities. Anonymity makes an individual indistinguishable from all other individuals in a set, while in pseudonymity, an individual maintains a persistent identity (a pseudonym) that cannot be linked to their actual identities [8]. Among all the anonymity approaches, k-anonymity is the most widely used, which obfuscates user identities within a group of users. Another common approach to protect user identities is mix-zones [9] where the user identity is mixed with all other users in the zone by changing the pseudonym. Freudiger [10] incorporated the mix-zone model with techniques of encryption to produce higher levels of privacy protection. However, such technique is limited to those LBS that do not require the user's identity. Also, with advancements in techniques of data mining, pseudonymity and anonymity are no longer safe enough since identity can often be inferred from the location [6].

Protection for Position Information.
Existing works designed to preserve the privacy of position information can be mainly divided into two categories: position dummies and spatial obfuscation. The main idea of position dummies [11] is that when a user sends a service request to the Location Service Provider (LSP), in addition to its real location, it also sends multiple randomly created false positions. However, by monitoring long-term movement patterns of the user, the server may distinguish the genuine location from dummies. In order to reduce the computational cost, Hong and Landay [12] proposed a landmark approach using a landmark to replace the user's real position. On the other hand, spatial obfuscation approaches [1, 13-15] deliberately reduce the precision of position information sent to the LSP by replacing the user's position information with a larger region.

Protection for Query Content.
K-anonymity is a widespread general privacy concept first proposed by Samarati and Sweeney [16] to protect the query content, which guarantees that each record is indistinguishable from at least k − 1 other records with respect to certain identifying attributes [17]. The model of k-anonymity has been enhanced by various approaches to increase the level of privacy protection. The most prominent enhancements are strong k-anonymity [18], l-diversity [17], t-closeness [19], p-sensitivity [20], and (α, k)-anonymity [21]. In addition, several works use differential privacy to protect location privacy in information datasets, where the presence of no single user could significantly change the outcome of the aggregated location information [22]. Nonexposure location anonymity, presented by Pan et al. [23], is the first study that explores the problem of location cloaking without exposing the accurate user locations. It is designed for k-anonymity, and cloaking is performed based on the proximity information among mobile users, instead of directly on their coordinates. The PROBE approach proposed by Damiani et al. [24] obfuscates the user location by taking into account the geographical context and the user's privacy preferences. The substantial difference with our system is that we not only perform spatial obfuscation based on location semantics but also provide anonymity by making use of nearby users' locations. Moreover, we noticed that the population of a location varies greatly at different times; therefore, we incorporate a time parameter with location semantics to ensure correctness.

Cloaked Region Generation Algorithm
In this section, we will firstly present several fundamental definitions and then demonstrate the workflow of the cloaked region generation. Finally, two cloaked region generation algorithms will be introduced.

Definition 1.
A cloaked region is a rectangle represented by $R_x$ and is defined as the following five-element tuple:

$$R_x = (x_l, y_l, x_r, y_r, id). \tag{1}$$

In this definition, $(x_l, y_l)$ and $(x_r, y_r)$ stand for the coordinates of the bottom left and upper right corners of the cloaked region, respectively. The element id uniquely identifies a cloaked region, so two ids cannot be the same.

Definition 2.
The size of the cloaked region $S(R_x)$ is defined as the area of the rectangle: $S(R_x) = (x_r - x_l)(y_r - y_l)$.

Definition 3.
Amax denotes the maximum allowed size, and Amin denotes the minimum allowed size for the cloaked region. Users are allowed to set Amax and Amin to restrict the size of the generated cloaked region so as to satisfy varied degrees of individual privacy protection requirements.

In this algorithm, a gridding map named userMap is created upon actual geography [25-27]. The actual location of every user is represented by a cell in this map, and a single cell can accommodate more than one user. The workflow of the cloaked region generation is as follows:
(i) The targeted user joins a peer-to-peer distributed system and then searches for neighbors through a point-to-point communication protocol on his terminal.
(ii) The neighbors share their cloaked regions with the targeted user. To ensure efficiency, when a neighboring user is asked to share his/her location, if he/she has generated an RCR for previous queries and still remains in it, the previously generated RCR will be delivered; otherwise, he/she will generate one randomly.
(iii) A cloaked region is calculated according to the generation algorithm.
(iv) Finally, a group member is randomly selected to send the intended query together with the generated cloaked region to the LSP.
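As a minimal sketch of Definitions 1 to 3, the cloaked region can be held in a small data structure. The class and method names below are illustrative and do not come from the paper, and two assumptions are made: the size is taken to be the area of the rectangle, and the bounds Amin and Amax are treated as inclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloakedRegion:
    """Rectangle R_x = (xl, yl, xr, yr, id) as in Definition 1."""
    xl: float  # bottom-left corner, x
    yl: float  # bottom-left corner, y
    xr: float  # upper-right corner, x
    yr: float  # upper-right corner, y
    id: str    # unique identifier of the region

    def size(self) -> float:
        # Assumption: S(R_x) is the area of the rectangle.
        return (self.xr - self.xl) * (self.yr - self.yl)

    def within_limits(self, a_min: float, a_max: float) -> bool:
        # Assumption: Amin and Amax are inclusive bounds on the size.
        return a_min <= self.size() <= a_max


region = CloakedRegion(xl=2, yl=3, xr=6, yr=8, id="u2")
print(region.size())                 # 20
print(region.within_limits(10, 25))  # True
```

A user with a stricter privacy preference would raise Amin, so that a region of size 20 such as this one would no longer be accepted.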
To further clarify the abovementioned process, we present an example here. As shown in Figure 1, the targeted user u1 has six neighbors, u2 to u7, who share their own cloaked regions with u1 through a peer-to-peer communication protocol. As shown in Figure 2, the cloaked regions are represented by rectangles of varied size generated according to the predetermined Amax and Amin. As illustrated in Figure 3, a rectangular userMap is generated which is centered on the precise location of the targeted user and covers all the cloaked regions of his neighbors. Note that the size of the userMap can be adjusted by users to meet an individual privacy protection requirement. The cloaked region cell value assignment principle is defined as follows. It is based on the assumption that the user appears anywhere in his cloaked region with equal probability. For example, the region M2 in Figure 3 is a 4 * 5 rectangle if the unit length is defined as the side length of an individual cell.
Then, the probability of user u2 appearing at each cell is equal to 1/S(M2), which is 5%. For simplicity, we assign each cell a value of 100 times its probability. If one cell is covered by more than one cloaked region, then its ultimate value is the sum of its cell values over all the cloaked regions that cover it. Figure 4 shows the result of the assignment according to this principle.
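The assignment principle above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `build_user_map` and the representation of a neighbor's region as `(col, row, width, height)` in cell units are hypothetical.

```python
def build_user_map(width, height, regions):
    """Build a userMap of cell values.

    regions: list of (col, row, w, h) rectangles in cell units,
    all assumed to lie inside the width x height map.
    """
    user_map = [[0.0] * width for _ in range(height)]
    for col, row, w, h in regions:
        # Each cell gets 100 times the uniform probability 1 / S(M).
        value = 100.0 / (w * h)
        for r in range(row, row + h):
            for c in range(col, col + w):
                # Cells covered by several regions accumulate their values.
                user_map[r][c] += value
    return user_map


# A 4 * 5 region (like M2) and an overlapping 2 * 2 region.
grid = build_user_map(8, 8, [(1, 1, 4, 5), (4, 5, 2, 2)])
print(grid[1][1])  # 5.0, covered by the 4 * 5 region only
print(grid[5][4])  # 30.0, covered by both regions
print(grid[6][5])  # 25.0, covered by the 2 * 2 region only
```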
A high-quality cloaked region should contain as many users as possible while occupying as small an area as possible, for the sake of usability to the LBS. Therefore, it is crucial to provide a tradeoff between the sum of cell values in a region and its size. To capture this tradeoff, we use userDensity as the measure of the quality of a cloaked region.

Definition 4.
The userDensity of a region is defined as the ratio of the number of users in this region to the size of the region. According to the cloaked region cell value assignment principle, the number of users in a certain region equals the sum of the cell values in this region.
Thus, the user density of a specified region M can be expressed mathematically as

$$userDensity(M) = \frac{\sum_{c \in M} CellValue(c)}{S(M)},$$

where c ranges over the cells of M. To obtain the cloaked region, we look for the area which possesses the largest userDensity. For this purpose, we propose two algorithms to generate the cloaked region.
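A small sketch of this ratio on a gridded userMap is given below. It assumes that a region is an axis-aligned block of cells given by inclusive row and column indices and that its size is its number of cells; the function name and the sample values are illustrative only.

```python
def user_density(user_map, top, left, bottom, right):
    """Sum of cell values in the rectangle divided by its number of cells.

    The bounds are inclusive cell indices (an assumption of this sketch).
    """
    total = sum(user_map[r][c]
                for r in range(top, bottom + 1)
                for c in range(left, right + 1))
    cells = (bottom - top + 1) * (right - left + 1)
    return total / cells


sample_map = [
    [0, 5, 5],
    [0, 30, 25],
    [0, 25, 25],
]
whole = user_density(sample_map, 0, 0, 2, 2)  # entire 3 * 3 map
block = user_density(sample_map, 1, 1, 2, 2)  # lower-right 2 * 2 block
print(block)  # 26.25
```

In this example the smaller block has the higher density, which is the kind of region both generation algorithms try to find.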

Exhaustive Method.
The exhaustive method, as given in Algorithm 1, traverses all the rectangles containing the cell where the targeted user is located until the one with maximum user density is found and then returns that rectangle as the cloaked region to the user. This algorithm guarantees that the generated cloaked region has the highest user density achievable under the size constraints, that is, a smaller size with more users. However, its drawback is the high computational complexity, which makes it time-consuming in practice.
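Since Algorithm 1 is not reproduced in the text, the following is only a sketch of the idea described above: enumerate every rectangle that contains the target cell, keep those whose size lies within the bounds, and return the densest one. The function name, the inclusive treatment of Amin and Amax, and the sample map are assumptions.

```python
def exhaustive_cloak(user_map, center, a_min, a_max):
    """Return ((top, left, bottom, right), density) of the densest
    rectangle that contains `center` and has a_min <= size <= a_max."""
    rows, cols = len(user_map), len(user_map[0])
    cr, cc = center
    best, best_density = None, -1.0
    for top in range(cr + 1):                 # top edge at or above center
        for bottom in range(cr, rows):        # bottom edge at or below
            for left in range(cc + 1):        # left edge at or before
                for right in range(cc, cols):
                    size = (bottom - top + 1) * (right - left + 1)
                    if not a_min <= size <= a_max:
                        continue
                    total = sum(user_map[r][c]
                                for r in range(top, bottom + 1)
                                for c in range(left, right + 1))
                    density = total / size
                    if density > best_density:
                        best = (top, left, bottom, right)
                        best_density = density
    return best, best_density


cell_values = [
    [0, 5, 5, 0],
    [0, 30, 25, 0],
    [0, 20, 25, 0],
    [0, 0, 0, 0],
]
rect, density = exhaustive_cloak(cell_values, center=(1, 1), a_min=2, a_max=6)
print(rect, density)  # (1, 1, 1, 2) 27.5
```

The four nested loops over the rectangle edges, combined with the summation inside, are what make this approach expensive on large maps.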

Heuristic Method.
Given the high computational complexity of the exhaustive method, we develop a heuristic method, Algorithm 2, to keep the balance between quality of privacy protection and cost of computation.
In the userMap, the higher the value of a cell, the more likely a user is to appear in this cell. When a cell with a high value is located far from the targeted user, the cloaked region generated for the targeted user shifts toward the high-value cell, which may result in a deviated cloaked region and a decline in the quality of the LBS.
Thus, in order to guarantee an acceptable quality of service with lower algorithmic complexity, the distance between the targeted user and his neighbor should be taken into consideration when performing the cell value assignment. The dCellValue can be depicted in the following formula:

Robust Cloaked Region Generation Algorithm
The aforementioned cloaked region generation algorithm is based more on theory than on daily-life scenarios. Thus, it is not immune to adversaries with background knowledge. In this section, we introduce the customized coefficient Diversity, which greatly enhances the defensiveness of the region generation algorithm.

Location Semantics.
In our daily lives, each location possesses certain semantics according to which a representative group of users is distributed, and this allows adversaries with background knowledge, in particular map knowledge, to infer the precise user position by excluding nonreachable areas from the cloaked area. The cloaked region generation algorithm can be more defensive if location semantics is taken into consideration, since the area generated will contain fewer places where users are less likely to appear.
Definition 6. Location semantics K is defined as the degree to which a user is likely to appear in a certain location. K is a number between 0 and 1, where 0 stands for "zero likely" with a probability of zero and 1 stands for "very likely" with a probability of one. K is a variable that indicates the spatial distribution of population and the user's preference for a location. It varies for different users. For example, if the user is a doctor, the semantics of a hospital for this user is much higher than for people who do not work in the hospital (the hospital is sensitive for a patient, but not for a doctor).

Input: Nearby users' cloaked regions and customized size constraints Amin and Amax
Output: The cloaked region for the target user
(1) Initialize the userMap with the dCellValue;
(2) Initialize cloaked region M to contain Ccenter only;
(3) while search for cell c with maximum dCellValue do
(4)     extend M to cell c;
(5)     if S(M) > Amax then
(6)         withdraw M to its previous state and set the dCellValue of c to 0;
(7)     else
(8)         for each cell in M do
(9)             set dCellValue to 0;
(10)        end
(11)    end
(12) end
(13) return the cloaked region M
ALGORITHM 2: Generate cloaked region using the heuristic method.
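The greedy loop of Algorithm 2 can be sketched in Python as follows. Several points are assumptions of this sketch and not statements from the paper: the dCellValue grid is taken as precomputed input, "extend M to cell c" is read as growing M to the smallest rectangle covering both M and c, the loop stops once no cell with a positive dCellValue remains, and Amin is not checked because the pseudocode only tests Amax. The function name is hypothetical.

```python
def heuristic_cloak(d_values, center, a_max):
    """Greedy cloaked region as (top, left, bottom, right), inclusive."""
    values = [row[:] for row in d_values]  # work on a copy
    top, left = center
    bottom, right = center                 # M starts as the center cell only
    values[top][left] = 0
    while True:
        # Search for the cell with the maximum remaining dCellValue.
        best_val, best_cell = 0, None
        for r, row in enumerate(values):
            for c, v in enumerate(row):
                if v > best_val:
                    best_val, best_cell = v, (r, c)
        if best_cell is None:              # no positive cell is left
            break
        r, c = best_cell
        # Tentatively extend M to cover the chosen cell.
        n_top, n_left = min(top, r), min(left, c)
        n_bottom, n_right = max(bottom, r), max(right, c)
        size = (n_bottom - n_top + 1) * (n_right - n_left + 1)
        if size > a_max:
            values[r][c] = 0               # withdraw: keep M, discard cell
        else:
            top, left, bottom, right = n_top, n_left, n_bottom, n_right
            for rr in range(top, bottom + 1):
                for cc in range(left, right + 1):
                    values[rr][cc] = 0     # cells inside M are consumed
    return top, left, bottom, right


d_cell_values = [
    [0, 5, 5, 40],
    [0, 30, 25, 0],
    [0, 20, 25, 0],
    [0, 0, 0, 0],
]
print(heuristic_cloak(d_cell_values, center=(1, 1), a_max=4))  # (1, 1, 2, 2)
```

With `a_max=6` the same input yields `(0, 1, 1, 3)`: the region is pulled toward the distant high-value cell, which illustrates why the cell values are weighted by distance before the greedy search.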
probability of each region $p_1(g), p_2(g), p_3(g), \ldots, p_n(g)$. Let L be the sum of the probabilities of g in all the cloaked regions in which it appears, which is denoted as

$$L(g) = p_1(g) + p_2(g) + \cdots + p_n(g). \tag{6}$$

Definition 9. Given a cloaked region M, its user density can be defined as where g denotes the cell in M and |M| stands for the size of M.

Time Parameter.
In practice, not only does the location semantics play an important role in privacy protection, but the time parameter can also enhance the defensiveness, since the crowdedness of a place varies in a day; for instance, people are more likely to gather in the office during the day while in their residence during the night.

Definition 10.
The user's preference for a certain position at a time point or during a time period is denoted as T.

Size Customization.
As there is a trade-off between the benefits gained from LBS and possibility that private information might be revealed (at least partially), a user can choose the size of the cloaked region to be generated by assigning the values of Amax and Amin, as in Algorithm 3. Bigger Amax can result in a larger obfuscated area, and vice versa.
Definition 11. A customized coefficient named Diversity can be defined by a four-element tuple consisting of location semantics K, time parameter T, Amax, and Amin: $Diversity = (K, T, A_{max}, A_{min})$.
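As a rough illustration, the Diversity tuple could be represented as an immutable record with basic validation. The field names are hypothetical. The check on K follows Definition 6, while the requirement 0 < Amin <= Amax is an assumption of this sketch, and no range is enforced for T because the text does not fix one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diversity:
    k: float      # location semantics K, between 0 and 1 (Definition 6)
    t: float      # time parameter T (range not fixed by the text)
    a_max: float  # maximum allowed size of the cloaked region
    a_min: float  # minimum allowed size of the cloaked region

    def __post_init__(self):
        if not 0.0 <= self.k <= 1.0:
            raise ValueError("K must lie between 0 and 1")
        # Assumption: sizes are positive and Amin does not exceed Amax.
        if not 0 < self.a_min <= self.a_max:
            raise ValueError("need 0 < Amin <= Amax")


setting = Diversity(k=0.8, t=0.5, a_max=81, a_min=25)
print(setting.a_max - setting.a_min)  # 56
```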

Experiments
The experiment is divided into two parts. The first part compares the performance of the two cloaked region generation algorithms, namely, the exhaustive method and the heuristic method; the second part demonstrates the correctness and superiority of our RCR generation algorithms.
Our method is applied at a specific moment in time for simplification. Therefore, we implement our algorithms under static scenarios without taking mobility into account. However, our approach could be extended to moving users with further improvements and necessary modifications. The experiments were conducted on the Windows 7 operating system with an Intel Core (TM) i5-4200U 1.60 GHz CPU and 8 GB RAM. We obtained experimental data from the Brinkhoff generator, which is an object-oriented data generation system developed by researcher Thomas Brinkhoff in 2000. Brinkhoff's generator takes a gridding map as input, adopts a discrete timing model, and produces new objects at each time stamp. The generator runs on JDK 1.8.

Comparison Experiments between Exhaustive and Heuristic Methods.
In the comparison experiments between the exhaustive method and the heuristic method, we created a 2500 * 2500 gridding map upon the actual map of Oldenburg in Germany, as shown in Figure 5, and adopted it as the input map. The userMap was divided into 20 * 20 rectangles, and the number of simulated users varied from 1000 to 5000. Since the number of neighbors around the targeted user and the size restriction (set by the targeted user) both have a large impact on the generated cloaked region, we examined these two variables separately in the experiment. The parameter setting is given in Table 1.
For performance evaluation, we measure the average value of the following metrics: (1) user density, (2) size of the RCR generated, and (3) running time of the algorithm.

User Density Comparison.
As shown in Figures 6 and 7, the user density of the cloaked region generated by the exhaustive method is always higher than that generated by the heuristic method, as the exhaustive method by design finds the cloaked region with maximum user density under the specified size restriction. In terms of user density of the cloaked region, the exhaustive method therefore performs better than the heuristic method.

Size Comparison.
As shown in Figures 8 and 9, the size of the cloaked region generated by the heuristic method is always larger than that generated by the exhaustive method. Since the heuristic method adopts a greedy algorithm, the cloaked region continues to extend to the cell with maximum dCellValue until it reaches its upper size bound. As a larger cloaked region leads to a decline in the quality of the LBS, the exhaustive method performs better in this respect.

Running Time Comparison.
As shown in Figures 10 and 11, the running time of the exhaustive method is much longer than that of the heuristic method, and the difference grows with the increase in region size. The longer running time of the exhaustive method lies in the fact that it continues searching the rectangles containing the targeted user until the one with maximum user density is found. Also, as Amin and Amax get larger, the exhaustive method takes even longer to find the desired rectangle.
That is why the running time of the exhaustive method soars as the allowed size of the generated RCR grows. As shown in Figure 11, the running time of the heuristic method decreases as Amin and Amax grow larger. This is because, when the number of users remains constant, the user density falls as the size of the cloaked region grows. As a result, the RCR extension time declines as well. In terms of the running time for generating the RCR, the heuristic method performs better.

RCR Generation Algorithm Experiment.
In the RCR generation algorithm experiment, we created a 25 * 25 gridding map upon a district map of the city of Wuhan, China (as shown in Figure 12), and adopted it as the input map. The number of simulated neighbors is increased from 10 to 50, and the size of the RCR is restricted to [25, 81]. After updating the gridding map with the location semantics shown in Table 2, we obtain the gridding map shown in Figure 13. For performance evaluation, we measure the average values of the following metrics: (1) user density of the RCR, (2) size of the RCR, and (3) the defensiveness of the RCR generation algorithm against adversaries.
As shown in Figure 14, as the number of neighbors increases from 10 to 50, the user density of the RCR rises, as expected.
As indicated in Figure 15, the average size of the RCR generated grows larger as the number of neighbors rises.
That is because, as the number of neighbors increases, the RCR tends to contain areas which are of high user density but located far from the targeted user. Consequently, the size of the RCR grows larger.

(2) Assign the calculated P(g) for each cell in the userMap;
(3) for each rectangle M = (i, j) in the userMap do
(4)     if M contains Ccenter, and S(M) < Amax, S(M) > Amin then
(5)         calculate userDensity(M);
(6)         if userDensity(M) > maxUserDensity then
(7)             maxUserDensity = userDensity(M); cl = i; cr = j;
(8)         end
(9)     end
(10) end
(11) return RCR determined by cr and cl
ALGORITHM 3: Integrated robust cloaked region (RCR) generation algorithm.

Table 3 demonstrates the ability of our RCR generation algorithm to prevent the user's spatial privacy from being revealed. An RCR of high quality ought to contain areas of similar location semantics K so that an adversary with background knowledge cannot use map matching to exclude irrelevant areas and thus shrink the cloaked region below the intended size. As shown in Table 3, our algorithm is capable of generating a high-quality cloaked area as a defense against adversaries with background knowledge. For example, when t is weekday daytime, the probability of the RCR containing a lake is only 1.5%, whereas the probability rises to 18.3% at night. This is because in the daytime, people are less likely to be around a lake as they may need to work or study, which is also why K for the lake in the daytime is small. As a consequence, the generated obfuscated area tends to cover less lake area. However, at night, the K for the mall and office building becomes the same as that for the lake, since people are more likely to stay at home on a weekday evening. As K for the lake, park, office building, and mall are the same, the probability of the generated RCR containing a lake goes up, so that the adversary cannot easily exclude the lake region from the RCR.

Conclusions
We proposed a new mechanism for preserving geographical location privacy. The mechanism is based on a peer-to-peer structure and uses an RCR to eliminate the trust concern within a user group. The RCR is generated under customized size constraints with the aim of achieving as high a user density as possible. Two methods were developed to calculate the RCR: the exhaustive method and the heuristic method. There are still several points worth further research based on the work in this paper: (1) The mechanism we propose mainly focuses on the snapshot query scenario, and it is not suited to a user making continuous LBS queries. (2) Location semantics is only introduced as a concept in this article, and to make it practical in real-time applications, a semantic labeling framework needs to be worked out. (3) The protection model proposed can be further strengthened to resist other possible attacks; for example, the user's identity may still be revealed through the user's profile attributes, even if protected by an anonymity user group; therefore, more sophisticated strategies should be applied when providing anonymity.

Data Availability
The data used to support the findings of this study are available from the first author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.