KLPPS: A k -Anonymous Location Privacy Protection Scheme via Dummies and Stackelberg Game

Protecting location privacy has become an irreversible trend; some problems also come such as system structures adopted by location privacy protection schemes suﬀer from single point of failure or the mobile device performance bottlenecks, and these schemes cannot resist single-point attacks and inference attacks and achieve a tradeoﬀ between privacy level and service quality. To solve these problems, we propose a k -anonymous location privacy protection scheme via dummies and Stackelberg game. First, we analyze the merits and drawbacks of the existing location privacy preservation system architecture and propose a semitrusted third party-based location privacy preservation architecture. Next, taking into account both location semantic diversity, physical dispersion


Introduction
With the rapid development of mobile devices and social networks, location-based service (LBS) has become a vital part of our daily activities in recent years. With smartphones or tablets, users can download location-based applications from Apple Store or Google Play Store. With the help of these applications, users can easily send queries to a service provider and obtain LBSs related to some points of interest. For example, users can check the bus schedule, the price information of nearby restaurants or gas stations, etc. Undoubtedly, by submitting LBS queries, users can enjoy the convenience provided by LBS. However, since the untrusted service provider has all the information about users such as where they are at which time, what kind of queries they submit, what they are doing, etc., he may track users in various ways or release their personal data to third parties. us, we need to take appropriate measures to protect users' privacy.
Many approaches [1][2][3] have been taken to address such privacy problems, where the location k−anonymity and location perturbation are commonly used. e existing location k−anonymity technology usually adopts the structure based on a trusted third party (TTP) [3]. e TTP structure refers to introduce a trusted third party, called centralized location anonymizer, between the user and the service provider, and the usage of the location anonymizer to make the target user's information indistinguishable from that of at least k − 1 other users, so that the probability of location leakage is therefore at most 1/k. Specifically, to achieve k− anonymity, an LBS-related query is submitted to the service provider via a centralized location anonymizer, which enlarges the queried location into a bigger cloaking region covering many other users (e.g., k − 1) geographically. As a result, it is hard for the untrusted service provider to distinguish the user's real location from this area. However, these approaches of using k− anonymity have a fatal problem. It heavily relies on the location anonymizer, which suffers from a single point of failure [4]. If the adversary gains control of it, the privacy of all users will be compromised.
In response to the problems existed in the TTP structure, some researchers have proposed a dummy location technology that can also achieve k−anonymity, which uses an independent system structure [2]. e independent structure only contains the user and the service provider, where the user uses the mobile terminal to generate k − 1 dummy locations and then sends k − 1 dummy locations combined with the user's real location to the service provider. As a result, it is hard for the untrusted service provider to distinguish the user's real location from the other dummy locations. Since the structure achieves functions such as location anonymity and filtering query results through a mobile terminal instead of a location anonymizer, there is no single point of failure caused by the location anonymizer. In 2008, with the birth of Bitcoin [5], blockchain technology has been widely used in finance, medical care, supply chain, and other fields. Blockchain [6][7][8], as the underlying technology of Bitcoin, realizes distributed information interaction and collective maintenance of data in a decentralized and autonomous way, with decentralization, tamper-proof, autonomy, traceability, etc. Simultaneously, the security of consensus protocols [9] and the protection of user privacy [10] in the blockchain has become a new research hotspot. Chen et al. [11] proposed a dynamic multikey fully homomorphic encryption. e decentralized characteristic of blockchain opens a new door for location privacy protection. Based on this idea, [12] proposes a distributed k−anonymity location privacy protection scheme based on blockchain, which can also achieve k−anonymity without the help of a location anonymizer.
In the LBS, the user firstly adopts approaches that are based on perturbing the information reported to the service provider, so to prevent the disclosure of one's location. Clearly, the perturbation of the information sent to the service provider leads to a degradation of service quality, and consequently, there is a trade-off between the level of privacy that the user wishes to guarantee and the service quality loss that she will have to accept; however, the adversary formulates corresponding strategies based on the privacy protection method adopted by the user and infers the real location of the user by observing the perturbation of the information. Since the relationship between users and adversaries objectively conforms to the game relationship between participants in the Stackelberg game model, introducing the Stackelberg game method into the field of location privacy protection is an important research direction. Shokri et al. [13] took the lead in introducing the Stackelberg game into location privacy protection and proposed a location privacy protection scheme based on the Stackelberg game (SG). e solution assumes that the adversary has acquired prior knowledge and allows the user and the adversary to play the game in turn: e level of privacy protection is maximized by user when the service quality loss is less than a given threshold, whereas the adversary strives to minimize the level of privacy protection based on prior knowledge and offset location. By playing the game above, this strategy can finally optimize the level of privacy protection while ensuring that the service quality loss is less than a given threshold.
Based on the analysis above, the existing location privacy protection schemes have the following shortcomings: (1) the existing location privacy protection schemes either adopt TTP structure that has a single point of failure or the independent system structure. However, users in the independent structure use mobile terminals to perform location anonymity algorithms and filter query results, which will greatly increase the client's pressure, affecting the service quality in turn. (2) On the one hand, these schemes do not fully consider the semantics, physical dispersion, and query probability of the location when selecting dummy locations. On the other hand, they do not fully consider the background knowledge that the adversary may have, which adversaries can use to infer the users' location privacy information. So they cannot effectively resist single-point attacks and inference attacks. (3) Since such schemes need to sacrifice the service quality for improving the privacy protection level, there is no trade-off between service quality and privacy protection level. Aiming at related shortcomings above, this paper comprehensively considers features such as side information, location semantics, physical dispersion of locations combined with dummy locations, k−anonymity technology, Stackelberg game and other ideas, and then designs a k−anonymous location privacy protection scheme (KLPPS) based on Stackelberg game and dummy locations, which can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy. Our contributions are mainly as follows: (1) A semitrusted third party (STTP) based location privacy protection structure is proposed. e STTP is based on the TTP structure by adding an encryption server and WiFi-AP, and stores the user's privacy information on three party servers through a certain mechanism. In the STTP, even if the adversary steals the information on the location anonymizer, he still cannot locate the user and obtain the user's complete privacy information, which effectively solves the single point of failure existed in the TTP structure. Meanwhile, the location anonymizer is responsible for implementing location anonymity algorithms and filtering query results, etc., therefore, also solve the mobile device performance bottlenecks that exist in the independent structure.
(2) We propose a dummy location selection algorithm based on location semantics and physical distance (SPDDS). Compared with existing dummy location selection algorithms, SPDDS takes into account the characteristics such as location semantic diversity, physical dispersion, query probability and offset location when selecting dummy locations, which can effectively protect users' location privacy against single-point attack. Furthermore, we propose a location anonymous optimization method based on Stackelberg game, which introduces Stackelberg game to improve the dummy location selection algorithm. More specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring location) by using the framework of Stackelberg games, and find an optimal dummy location set by solving the game equilibrium. e optimal dummy location set can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy. (3) We conduct a comprehensive experiment to evaluate the proposed scheme. Experimental results show that our scheme can effectively resist single-point attacks and inference attacks while effectively balancing service quality and location privacye when compared with other dummy-based schemes. e rest of the paper is organized as follows. We discuss the related work in Section 2. Section 3 presents some preliminaries of this paper. Section 4 presents the structure of STTP and the interactive process. We present the SPDDS algorithms and a location anonymous optimization method based on Stackelberg game in Section 5. Section 6 presents the experimental process as well as results. We conclude the paper in Section 7.

Related Works
In this section, we first analyze the merits and drawbacks of mainstream existing location privacy protection system structure. Furthermore, we review major existing techniques for preventing location privacy leakage including privacy protection scheme based on dummy and privacy protection scheme based on Stackelberg game in Sections 2.2 to 2.3, respectively.

Location Privacy Protection System
Structure. As a large body of location privacy protection technologies has been proposed, the system structure on which various privacy protection technologies depend has shown distinct category differences. As the carrier of privacy protection technology implementation, the system structure has got sufficiently researched and developed.
Currently, there are two mainstream system structures: TTP−based central server structure and independent system structure. In the TTP structure [14][15][16], the location anonymizer obtains the location information of all users and is responsible for implementing the location privacy protection mechanism. It is currently a more commonly used privacy protection system structure, the advantage of which is that the location anonymizer can obtain the location information of all users and assist users in filtering service data. e disadvantage is that it relies on a location anonymizer to enlarge the queried location into a bigger cloaking region, and hence the location anonymizer becomes the central point of failure. References [2,17,18] proposed an independent system structure, where users can protect their location privacy according to their own capabilities and knowledge. e architecture treats each user as an independent individual, allows the user's device to implement a location privacy protection mechanism, directly sends a service request to the service provider, and receives the query result. e advantage of this system structure is that the deployment is conveniently simple, and it is convenient for users to adjust the privacy protection granularity according to their privacy protection needs. However, the implementation of privacy protection algorithms has been limited by the performance of mobile devices. Meanwhile, filtering query results will also increase the burden on the mobile client, which in turn affects service quality.

Privacy Protection Scheme Based on Dummy.
Location dummies are aimed to secure users' accurate location by sending k − 1 false locations ("dummies") together with the true location so that the probability of location leakage is reduced to 1/k. Compared to the traditional k−anonymity, this approach sends exact locations instead of cloaked regions to a service provider, which can return a more precise query result and avoid single-point failure.
Kido et al. [19,20] first proposed to use dummy locations to achieve anonymity without employing a central server. However, they only concentrate on reducing the communication costs. Moreover, they employ a random walk model to generate dummy locations, and it cannot resist side information attacks due to lack of considering factors such as query probability. Subsequently, although Dapeng et al. [21] proposed the ABR algorithm based on query probability, which can resist side information attacks. However, it cannot resist homogeneity attacks and location similarity attacks for not considering the physical dispersion and location semantic diversity. e UPHIF algorithm proposed by Chang et al. [22] protected location privacy to a certain extent, but did not consider the location semantic diversity, so it cannot deal with location similarity attacks. Niu et al. [23] selected dummy locations based on entropy metrics, and proposed a dummy location selection (DLS) scheme and its improved scheme (enhanced − DLS). Although the enhanced − DLS scheme can resist side information attacks and homogeneity attacks, which cannot resist location similarity attacks for lacking of considering the location semantic diversity. References [24,25] all considered the user's location semantic diversity, which can effectively resist location similarity attacks, but all have the problem that the cloaked region is too big, affecting the service quality in turn. Although [26,27] fully considered the location semantic diversity and physical dispersion, which can effectively resist homogeneity attacks and location similarity attacks, but they cannot resist the side information attack for not considering the query probability.

Privacy Protection Scheme Based on Stackelberg Game.
In a big data environment, an adversary can use the various data collected to infer the privacy information of the user's location [28], which is called the location inference attack. Because the traditional dummy-based privacy protection scheme cannot effectively resist this kind of inference attack, the location privacy protection mechanism based on probabilistic reasoning [29,30] has gradually attracted the attention of scholars. Such methods are based on perturbing the real locations of a user to the service provider, in order to increase the uncertainty of the adversary about a user's true whereabouts. However, the perturbation of the information sent to the service provider leads to a degradation of service quality, and consequently, there is a trade-off between the level of privacy that the user wishes to guarantee and the service quality loss that she will have to accept. So, the Stackelberg game has become an important means of balancing the level of privacy protection and service-quality requirements in such methods.
Based on [13] and combined the ideas of k−anonymity and dummy location, Xingyou et al. [31] propose HCLS and its improved scheme (HCLS − SG). Although the HCLS − SG scheme can effectively resist inference attacks and better balance service quality and location privacy, it cannot resist location similarity attacks for not considering location semantic diversity. Bordenabe et al. [32] also introduced differential privacy on the basis of [13] and constructed a privacy protection mechanism that optimizes the quality of service. Since differential privacy does not depend on prior, this mechanism can minimize the service quality loss under the premise of satisfying location indistinguishability. Shokri [18] further proposed using two indicators of differential privacy and distortion privacy to optimize the privacy protection strategy based on the Stackelberg game. Differential privacy limits the extent of user privacy leakage, while distortion privacy measures the error rate of inferring a user's privacy. By combining these two standards, this privacy protection strategy can resist more kinds of inference attacks while ensuring privacy protection requirements.

Preliminaries
In this section, we first introduce some relative definitions of location privacy protection algorithm; meanwhile, we summarize the notations introduced throughout the section in Table 1 and then introduce the relative concepts of Stackelberg game.

Relative Definitions of Location Privacy Protection Algorithm
Definition 1. According to [27], location semantic tree (LST), a true structure used to represent the semantic relations between two locations within the range of a Wi-Fi access points (Wi-Fi AP), which satisfies the following requirements: (a) Each nonleaf node stands for the category of its children nodes and each leaf node for a real location l (b) e depth of LST, denoted as h, is equal to the maximum number of layers of categories plus 1 (c) e semantic distance d sem (l i , l j ) between two locations l i , l j (i ≠ j) is the number of hops from leaf node n i to leaf node n j Definition 2. User's privacy requirements S, represented by two-tuple (k, u) that has the following meanings: (a) k denotes the anonymous degree of our location privacy preservation model. More specifically, each query is sent with at least k − 1 dummy locations and its offset location (we use offset location instead of the real location), making that the probability of offset location leakage is therefore 1/k. (b) u represents the minimum acceptable value of semantic distance between two locations in dummy location set (DLS). In other words, it satisfies the inequality: Definition 3 (location map distance). If we let Map cur represent the map information within the range of the current Wi-Fi AP. For any two locations l i , l j (i ≠ j), the location map distance is the physical distance between the two locations on Map cur , the value of which ranges from tens of meters to hundreds.
Definition 4 (location query probability (LQP)). As shown in Figure 1, in a map divided into m × m cells with equal size. Each cell has a query probability based on the previous query history, which is denoted as where i � 1, 2, . . . , m 2 , m 2 i�1 p i � 1. e depth of the color in the figure indicates LQP (the darker the color, the greater the LQP), and the white area indicates that the location has never had a location service request, so these locations may be rivers, barren mountains, and other places that are easily filtered by the adversary.

Definition 5.
e probability of exposing real location (PERL), which has been used to measure the effectiveness of the algorithm against side information attacks, is computed by where k denotes the anonymous degree and k ′ represents the number of dummy locations filtered by the adversary through the side information attack. e larger the PERL, the less effective the algorithm resists side information attacks; the smaller the PERL, the better the algorithm resists side information attacks.
Definition 6. Location physical dispersion (PD), which has been used to measure the effectiveness of the algorithm against location homogeneity attacks, is obtained by computing the minimum physical distance between any two locations in a DLS. e specific process is shown in (4): where i, j � 1, 2, . . . , k, i ≠ j. e greater the minimum distance between any two locations in the DLS, the greater the PD and the greater the coverage of the DLS, the better the algorithm's resistance to location homogeneity attacks.
where SEM � d sem |d sem (l i , l j ) < u , k � |DLS|, and C 2 k is a combination formulas, we call DLS a θ− secure set. We use θ as a privacy protection index of location semantics in our experimental analysis in Section 6. Our aim is to achieve the maximum θ, i.e., to make it equal to 1, such that two locations in DLS belong to different categories.

Definition 8.
e adversary uses background knowledge to run an inference attack on DLS in order to output estimation l of the user's actual locations and the attack result can be denoted as g(l|DLS), and we define the location privacy protection mechanism (LPPM) that the adversary knows as f(DLS|l). en, we follow the definition in [33] and quantify the user's privacy level (PL) as the adversary's expected error in his inference attack, i.e., the expected distortion in the reconstructed event. We compute the expectation over all l, l, and DLS: PL directly reflects the adversary's inference on the user's actual location. e larger the PL, the less accurate the adversary's inference, and the better the effect of the algorithm resist the inference attack.
Definition 9. We define the process of generating the offset location l d as f(l d |l). en, following the definition in [13], the LBS response quality depends on the offset location l d output by f(l d |l) and not on the user's actual location l. e distortion introduced in the offset location determines the quality of service that the user experiences. e more similar the actual and the offset location are, the higher the service Table 1: Summary of notations.

Symbol
Meaning d sem (l i , l j ) e semantic distance between two locations l i , l j (i ≠ j) d phy (l i , l j ) e physical distance between two locations l i , l j (i ≠ j) d que (l i , l j ) e query probability distance between two locations l i , l j (i ≠ j), which is obtained by calculating the difference between the query probabilities of two locations θ Representation of the semantic diversity between locations φ(l) Location access profile of the user (probability of being at location when accessing the LBS) l Actual location l of the user l d Offset location output by the f(l d |l) DLS Set of possible dummy locations output by the LPPM f(DLS|l) e location privacy protection mechanism (LPPM): probability of replacing l with DLS l Adversary's estimate of the user's actual location g(l|DLS) Adversary's attack function: probability of estimating l as user's actual location, if DLS is observed f(l d |l) Function of generating offset location: probability of replacing l with l d Qos loss Expected quality loss of an LPPM with location obfuscation function f(l d |l) Q max loss Maximum tolerable service quality loss PL Expected location privacy of the user with profile φ(l) using LPPM f(DLS|l) against attack g(l|DLS) Security and Communication Networks 5 quality is. e expected quality loss (Qos loss ) due to f(l d |l) is computed as an average of d phy (l, l d ) over all l and l d : We set a service quality threshold Q max loss , which represents the maximum service quality loss that the user can accept. e location privacy protection scheme designed in this paper needs to guarantee Qos loss ≤ Q max loss because if the service quality loss exceeds the threshold, the location service request results obtained cannot satisfy the requirements of users.

Stackelberg Game.
e classic Stackelberg game is a twoplayer game composed of a leader and a follower [34]. e leader first determines his strategy, and after observing the leader's strategy, the follower chooses the strategy that maximizes his utility to play the game. In the field of location privacy protection, the terms "protector" and "leader," "adversary" and "follower" can be used interchangeably. For simplicity of expression, the leader (protector), denoted as Θ, is often referred to as she, whereas the follower (adversary), represented by Ψ, is referred to as he.
Definition 10. In the field of location privacy protection, the strong Stackelberg equilibrium (SSE) is generally used as the solution of the Stackelberg game. e definition of SSE is described as follows.
A strategy combination is SSE, if and only if it satisfies the following conditions: (a) e leader's strategy is the best response: where c ∈ Γ means a particular type of follower, x, x ′ ∈ X represents the leaders' mixed strategy, and q ∈ Q represents the followers' mixed strategy. (b) e follower's strategy is the best response: where x ∈ X represents the leaders' mixed strategy, q, q ′ ∈ Q represents the followers' mixed strategy. (c) If there are multiple best responses for the followers, the followers choose the most favorable strategy for the leader to break the deadlock: where Q * (x) is the follower's best response strategy set under the leader's strategy is x.

System Model
We first give the definition of the single-point attack mode and inference attack mode in Sections 4.1 and 4.2 respectively, and then introduce the structure of STTP in Section 4.3. Finally we present the interactive process of our scheme in Section 4.4.

Single-Point Attack
Model. From the time dimension, the adversary relies on the intercepted single location-service request to infer the user's private information, which is called the single-point attack model [35]. In the model, the main attack methods of adversaries include side information attacks, homogeneity attacks and location similarity attacks.
Side information refers to information used by adversaries to filter dummy locations and help reduce anonymity, including map information and location query probability. For example, for a randomly generated dummy location set, some locations may be in a river or no man's land, and adversaries can easily filter out these locations based on the map information. Assuming that the location anonymity requirement is k, when k ′ of the k − 1 dummy locations are filtered by the adversary based on the side information, the k−anonymity requirement is not satisfied, resulting in a decrease in the level of privacy protection.
Homogeneity attack means that the adversary analyzes the distance between multiple locations in a DLS to infer a user's privacy. Specifically, if the distance between any two locations is very close such as in the same building, although the DLS satisfies k−anonymity, the user's location privacy cannot be well protected because the cloaking region is too small. e location similarity attack means that the adversary analyzes the semantic information in the cloaking region to infer a user's privacy. More specifically, if the region contains only one kind of semantic information, such as a hospital or school, the adversary can infer the user's behavior.

Inference Attack Model.
In a big data environment, an adversary can use the various data collected to infer the privacy information of the user's location [28], which is called the location inference attack.
In the location inference model, the adversary has certain background knowledge such as the user's service request history records, LPPM, etc. Using the user's service request history records, the adversary can calculate the user's query probability distribution φ(l). When the user sends a query request again, if the location query probability distribution in the anonymous set is not uniform, the adversary can infer that the user is likely to be located in a location with a higher probability. While for the LPPM, the adversary can analyze the intercepted location request, combined with the anonymity algorithm, to infer the probability that each location in the anonymous set is the user's true location, so as to make the inference attack more accurate.

e Structure of STTP.
It can be seen from Section 2.1 that, for the current two mainstream location privacy protection system structures, the TTP structure has the problem of a single point of failure, while the independent structure has the problem of mobile device performance bottlenecks. In view of the problems above, we have designed a semi-trusted third party (STTP) based location privacy protection structure. STTP is based on the traditional TTP structure by adding an encryption server and Wi-Fi AP and stores the user's private information in the threeparty server through a certain mechanism, which results in that even if the location anonymizer has been controlled by adversaries, STTP also protects the user's private information to a certain extent. Furthermore, the location anonymizer is responsible for implementing the privacy protection algorithms and filtering query results, so there are no problems such as mobile device performance bottlenecks. STTP is shown in Figure 2, which consists of the following five entities: User: using a mobile terminal to initiate a location service request when needed. Wi-Fi AP: providing network support, and calculating, storing LST and LQP. Encryption server (ES): providing encryption and decryption key pairs corresponding to the user's pseudonym. Location anonymizer (LA): converting the user's actual location into a dummy location set, and after the service provider returns the query result, extracting appropriate service information and returning it to the user. Service provider (SP): return the corresponding service result according to the location query request. e proposed scheme assumes that ES, LA, and SP are "honest and curious." On the one hand, they will not disrupt the protocol process and can faithfully complete their work following the agreement; on the other hand, they all want to analyze more other sensitive information about the user from what they have mastered. Meanwhile, we further set that ES, LA, and SP cannot collude with each other, that is, they will not be controlled by an adversary simultaneously. ere will be no secrets for the user if the three parties conspire, so this setting is reasonable.

Interactive Process.
ere are eight steps in the interactive process of the proposed scheme. e specific implementation of each step is described below (as shown in Figure 2): (1) Before initiating a location service request, the user first requests Map cur , LST, and LQP from the Wi-Fi AP.
(2) e Wi-Fi AP generates Map cur , computes and stores LQP of all locations within its current coverage area, generates and saves LST by collecting location semantic information within its radio range, and then sends Map cur , LST, and LQP to the user. It should be noted that for any Wi-Fi AP, the location within its coverage area is relatively stable, so LST and LQP do not need to change frequently. (3) e user then requests the pseudonym U pseu and key pair from ES. Specifically, if there are multiple service requests at the same location within the limited time t session , the user only applies for the pseudonym and key pair once; when the time exceeds t session or the user's real location changes, she will reapply for a new pseudonym and key, so as to achieve the effect of confusing her identity. (4) ES generates the corresponding pseudonym U pseu and RSA key pair (Key public , Key privacy ), returns U pseu and Key public to the user, and sends U pseu and Key privacy to the SP. It should be noted that, as an example, the solution uses the classic RSA algorithm for encryption, and it can be replaced by other encryption algorithms according to actual requirements. In addition, the solution requires ES to only act as a provider of pseudonyms and keys, so ES does not store related pseudonyms and keys locally. (5) e user first encrypts his query content Query with the public key Key public and then sends his current pseudonym U pseu , encrypted query content Query ′ , current real location l, Map cur , LQP, and LST to LA together. (6) After receiving the information, LA performs the corresponding location anonymity algorithm that generates a dummy location set DLS to hide l and then sends U pseu , Query ′ and DLS to the SP. (7) After receiving the location service request, the SP first searches for the corresponding private key Key privacy according to U pseu , which is used to decrypt Query ′ , and then provides the service result Result(Query|DLS) according to Query and DLS, finally return it to LA. (8) After receiving Result(Query|DLS), the LA first identifies the corresponding location l d according to the U pseu , and then filters out the query result Result(l d ) from Result(Query|DLS) and finally returns it to the user.

Proposed Scheme
In this section, we first introduce a dummy location selection algorithm based on location semantics and physical distance (SPDDS) and then present a location anonymous optimization method based on Stackelberg game.

A Dummy Location Selection Algorithm Based on Location
Semantics and Physical Distance. Based on the analysis above, the final dummy location set not only needs to avoid selecting places that are easy to be filtered by adversaries, such as rivers and no man's land but also to meet the semantic diversity while making the locations as dispersed as possible. In other words, the final dummy location set needs to simultaneously satisfy (11)- (13) Security and Communication Networks 7 DLS � arg max min d sem l i , l j , DLS � arg max min d phy l i , l j , where l i , l j ∈ DLS, i ≠ j, P r (l i ) > 0, P r (l d ) > 0. It can be formulated as a multiobjective optimization problem (MOP) since three factors are considered simultaneously. However, we put forward a simpler objective formulas considering the complexity of MOP. In each dummy location set, we would like to make sure that (14) can be satisfied. Consequently, we propose a dummy location selection algorithm based on location semantics and physical distance (SPDDS).
DLS � arg max min d sem l i , l j + r · d phy l i , l j max d que l i , l j + 1 where l i , l j ∈ DLS, i ≠ j. d que (l i , l j ) + 1 is to avoid the situation where the two locations have the same probability, that is, the difference between the query probability of the two locations is 0. Here, r is a controllable factor for balancing the share of semantic distance, physical distance, and query probability distance since d sem (l i , l j ) ≤ 2 · (h − 1), where h is the depth of LST and hence is usually less than 10 while d phy (l i , l j ), as Wi-Fi transmission distance, ranges from hundreds of meters to thousands, whereas the query probability distance is always less than 1. Consequently, we set r � 0.03. Meanwhile, in order to balance service quality while the proposed algorithm can resist inference attacks effectively, we should take into account both PL and Qos loss . So, we propose a location anonymous optimization method based on Stackelberg game, which introduces Stackelberg game to optimize the dummy location selection algorithm. More specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring location) by using the framework of Stackelberg games, to find an optimal dummy location set. e optimal dummy location set can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy. e main idea of SPDDS is to first select an offset location to replace the user's real location; secondly, selecting all locations that satisfy the semantic difference with the existing locations in the current dummy location set as the dummy location candidate set (DLCS); then, selecting an optimal location in the DLCS which refers to the location formed by satisfying (14); finally, a set consisting of an offset location and k − 1 dummy locations is generated. Algorithm 1 shows the formal description of the SPDDS algorithm.

A Location Anonymous Optimization Method Based on
Stackelberg Game. We propose a location anonymous optimization method based on Stackelberg game, which optimizes the dummy location selection algorithm by introducing the Stackelberg game. More specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring location) by using the framework of Stackelberg game, and based on which we construct the related linear programs. We can find an optimal dummy location set by solving the linear programs, which can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy.

Location Inference Model.
In the location inference model, the adversary has certain background knowledge such as the user's service request history records, LPPM, etc. Using the user's service request history records, the adversary can calculate the user's query probability distribution φ(l). When the user sends a query request again, if the  location query probability distribution in the anonymous set is not uniform, the adversary can infer that the user is likely to be located in a location with a higher probability. While for the LPPM, the adversary can analyze the intercepted location request, combined with the anonymity algorithm, to infer the probability that each location in the anonymous set is the user's true location, so as to make the inference attack more accurate.
Based on the existing knowledge (φ(l), f(DLS|l), etc.), the adversary can form the posterior distribution on the true location l of the user, conditional on the anonymous set DLS: e adversary's objective is then to choose l to minimize the user's conditional expected privacy, where the expectation is taken under P r (l|DLS). e user's conditional expected privacy for an arbitrary l is l P r (l|DLS)d phy (l, l), (16) and for the minimizing l, it is min l l P r (l|DLS)d phy (l, l). (17) If there are multiple values of l that satisfy (17), then the adversary randomizes arbitrarily among them. e probability with which l is chosen in this randomization is g(l|DLS).

Stackelberg Game Optimization Process.
Here, we assume that the adversary has some background knowledge. Specifically, he will infer the user's actual location l as much as possible according to φ(l), the LPPM used by the LA, the anonymous result DLS, and other background knowledge. Relatively, we can assume that all the background knowledge that LA knows will be used by the adversary, so LA can use the adversary's optimal attack strategy as a parameter to optimize the generation process of the dummy location set DLS.
We formalize the process above by using the framework of Stackelberg game. In a Stackelberg game the leader, in our case, the LA plays first by giving the dummy location set DLS according to the relative location privacy protection Input: l: user's real location S: user's privacy requirement Map cur : map information in current Wi-Fi AP LST: location semantic tree LQP: location query probability; Output: DLS: dummy location set; (1) Divide the Map cur as the sample space into m × m grids; (2) Generate semantic distance matric (S DM) according to the LST and geographic distance matrix (GDM) according to the Map cur and probability distance matric (PDM) according to the LQP respectively (3) According to GDM and PDM, choose the n − 1 locations with p i > 0, which are closest to l, and then a offset location set (M set ) consisting of the n − 1 locations and l has been generated; (4) Randomly choose a location from M set as offset location l d (5) Generate a dummy location candidate set (DLCS) for all p i > 0 locations in the whole grid space; (6) DLS � l d (7) Remove l d from DLCS; (8) while |DLS| < k do (9) if DLCS � ϕ then (10) anonymity failed; (11) else (12) max � 0; BestLoc � ϕ; (13) for each Loc in DLCS do (14) if d sem (Loc, DLS.last) ≤ u then (15) anonymity failed; (16) go back to line 13; (17) else (18) m � d sem (Loc, DLS.last) + r · [d phy (Loc, DLS.last)]/d que (Loc, DLS.last) (19) compute the maximum m according to SDM, GDM and PDM, which is recorded with max, and then assign the corresponding Loc to BestLoc (20) end (21) end (22) DLS � DLS ∪ BestLoc Security and Communication Networks algorithm f(DLS|l). e follower, in our case the adversary, plays next by estimating the user's true location, knowing some background knowledge.
We use the distance between the adversary's inferred location l and the user's actual location l to measure the utility of the participants in the game: the greater the distance, the greater the LA returns, indicating that the anonymous algorithm is more effective in resisting inference attacks; on the contrary, the smaller the distance, the greater the adversary returns, the more effective the adversary's attack strategy. e game model is also an instance of a zero-sum game, as the adversary's gains (or losses) of utility is exactly balanced by the losses (or gains) of the utility of the user: the information gained (lost) by the adversary is the location privacy lost (gained) by the user. e purpose of Stackelberg game optimization is to find SSE so that the adversary cannot obtain more benefits by optimizing the attack strategy (that is, the adversary cannot make more accurate inferences about the actual location of the user). In this paper, SPDDS optimized by Stackelberg game is denoted as SPDDS SG.
It should be noted that DLS is the result obtained by l d further anonymously, so f(DLS|l) can be further expressed by In some cases, f(DLS|l) and f(l d |l) are equal, the reason is that the adversary can filter out dummy locations except l d in such cases.
We see that, for a given DLS, the user's conditional expected privacy is given by (17). e probability that DLS is output by the LPPM is P r (DLS) � l φ(l)f(DLS|l). Hence, the user's unconditional expected privacy (averaged over all DLS) is DLS P r (DLS) min To facilitate the computations, we define Incorporating x into (19), we write the unconditional expected privacy of the user as which the user aims to maximize by choosing the optimal DLS. To facilitate the computations, (20) can be transformed to a series of linear constraints: In addition, SPDDS SG needs to conceal the user's real location on the premise of ensuring the user's service quality. In order to ensure the quality of service, we set the service quality threshold Q max loss to limit the maximum service quality loss. e specific process is In summary, SPDDS SG can be solved by a linear program. e final definition of linear program is where C 1 is used to maximize the utility of the adversary; C 2 reflects the service quality constraint; C 3 indicates that the sum of the generation probability of the dummy location set must be 1; C 4 indicates the probability of each candidate dummy location set is greater than zero. SPDDS SG solves the objective function under the constraints in (24) and obtains the optimal dummy location set, which can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy.

Simulations and Results
In this section, we use Python software to simulate the experiment. First, we give the relevant parameters of the experiment. Furthermore, we simulate the experimental results and analysis of the proposed scheme.

Simulation Setup.
Our scheme is implemented in MATLAB and performed on a Windows 10 PC with an Intel Core i5-8500 CPU, a 3.00 GHz processor and a 8.00 GB main memory. We use a real road map of Guangzhou from Google Maps, since Guangzhou as a provincial capital in southern China is a big city with enough users in LBS and its central urban area has been covered by Wi-Fi APs in 2016. e coverage area of each Wi-Fi AP is about 700 ∼ 800 m, the sample space is divided into 13 × 13 cells with equal size, and a total of 13 559 sample trajectories are used as historical data to calculate the historical query probability of each cell. Besides, all locations in our experiments are divided into 6 categories semantically as follows: Education and Science, Administration and Housing, Medical care, Shopping malls, Public places, Catering and Entertainment. e value ranges of the main parameters u and k of the experiment are 3 ≤ u ≤ 7 and 2 ≤ k ≤ 30, respectively.

Results and Analysis.
We first evaluate the effectiveness of our proposed scheme in resisting single-point attacks from three assessment metrics as follows: (1) PERL. As is shown in Definition 5, it reflects the effectiveness of the algorithm in resisting side information attack. (2) PD. As is shown in Definition 6, the larger the PD, the more dispersed the dummy locations in the DLS, and the better the effectiveness of the algorithm in resisting homogeneity attack. (3) θ. As is shown in Definition 7, it refers to the level of semantic diversity in the anonymous set, which reflects the effectiveness of the algorithm in resisting location similarity attack; Next, evaluating the effectiveness of the scheme in resisting inference attack while balancing location privacy and service quality from two assessment metrics as follows: (1) PL. As is shown in Definition 8, the larger the PL is, the better the effect of LPPM against inference attacks is. (2) Qos loss . As is shown in Definition 9, it reflects the effectiveness of the algorithm in balancing location privacy and service quality.

Effectiveness of the Scheme against Single-point Attacks.
(1) PERL vs k. In Figure 3(a), we compare the PERL of KLPPS, Max Min Dist DS [27], Simp Max Min Dist DS [27], enhanced − DLS [23], and HCLS − SG [31] schemes. As we can see, the PERL of the five schemes shows a downward trend with the increase of k, which means that the larger the k, the more difficult it is for adversaries to filter out invalid locations in the anonymous set through side information attacks, the better the effect of the scheme against side information attacks. e PERL of the KLPPS, enhanced − DLS, and HCLS − SG is lower than that of the Max Min Dist DS and Simp Max Min Dist DS. And that of the KLPPS, enhanced − DLS, and HCLS − SG are basically the same. e reason is that the KLPPS, enhanced − DLS and HCLS − SG all consider the query probability and avoid selecting locations with low access probability such as lakes and forests to form an anonymous set; whereas the Max Min Dist DS and Simp Max Min Dist DS do not consider the query probability, so there will be cases where invalid locations are selected, and thereby the adversary can filter out ones through side information attacks. In summary, the KLPPS scheme can effectively resist side information attacks.
(2) P D vs k. Figure 3  e reason is that the KLPPS, Max Min Dist DS, and Simp Max Min Dist DS schemes all consider the semantic information of the location when selecting dummy locations, thereby ensuring semantic diversity, while the enhanced − DLS and HCLS − SG schemes only consider the query probability of each location point instead of considering the situation that each location point may have the same semantic information. Moreover, the location points with higher query probability are often in hotspot areas, between which the semantic information is very similar and therefore not satisfying the semantic diversity. Consequently, the enhanced − DLS and HCLS − SG schemes behave such badly in semantic diversity that they cannot resist location similarity attacks. In summary, the KLPPS scheme can effectively resist location similarity attacks. e experimental results above show that the KLPPS scheme can effectively resist homogeneity attacks, location similarity attacks, and side information attacks simultaneously compared with the Max Min Dist DS, Simp Max Min Dist DS, enhanced − DLS, and HCLS − SG schemes, thereby effectively resisting single-point attacks.

Effectiveness of the Scheme against Inference Attacks and Balances PL and Qos loss .
Combining the location inference model and (6), it can be seen that the adversary can perform inference attacks, the purpose of which is to choose l based on existing knowledge to minimize the expected user privacy. (25) defines this attack strategy: Combining (6) and (25), we can construct the following linear program to find the optimal l: We use the model defined by (26) to run inference attacks on KLPPS, HCLS [31], SG [13], and HCLS − SG to make a comparison from two aspects of PL and Qos loss , evaluating the effectiveness of the KLPPS scheme.
(1) PL. e definition of PL is shown in (6), the larger the PL, the better the effect of LPPM against inference attacks. As shown in (24), the preset service quality loss threshold Q max loss and anonymous degree k have a greater impact on PL, so we evaluate the effectiveness of the KLPPS scheme against inference attacks from the two assessment metrics of Q max loss and k.  Figure 4(a). As shown in the figure, we can draw the following 3 conclusions. First, the PL of KLPPS, SG, and HCLS − SG is significantly better than HCLS, which results from that HCLS does not consider the inference attack strategy of the adversary. Secondly, the KLPPS and HCLS − SG behave better than SG, the reason is that the former two use dummy location anonymity, increasing the difficulty of the adversary's inference. Finally, with the increase of Q max loss , the PL of the four schemes all increase, however the growth trend of PL of the four schemes slows down when reaching a certain level, which is related to the query probability distribution of user's location, indicating that the influence of Q max loss on PL is limited.

(b) PL vs k
We compare the PL of KLPPS, HCLS, SG, and HCLS − SG schemes under different k in Figure 4(b). As we can see, three conclusions have been drawn below. First of all, as the value of k increases, the PL of KLPPS, HCLS, and HCLS − SG are significantly improved while not the SG. e reason is that SG only provides offset locations and does not consider dummy location anonymity; Secondly, the KLPPS and HCLS − SG behave better in improving PL than that of HCLS, which results from that the former two all consider the adversary's inference attack strategy, while the latter does not, so HCLS is less effective in resisting inference attacks than that of the KLPPS and HCLS − SG; Finally, the growth trend of of the four schemes slows down when reaching a certain level, which results from that the four schemes all use the offset location instead of the real location to protect users' privacy. More specifically, the choice of l d will make the service quality loss Qos loss is gradually approaching Q max loss with the increase of k, and the four schemes are all maximized PL under the premise of ensuring that Qos loss ≤ Q max loss , which means that when Qos loss gradually approaches Q max loss , the growth trend of PL slows down until Qos loss � Q max loss , reaching the maximum value.
(2) Qos loss . Qos loss is closely related to PL. Specifically, in some cases, users allow losing a certain service quality in exchange for higher privacy. In the experiment, we set the maximum service quality loss that the user can accept as Q max loss � 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5 { } and analyze the relationship between Qos loss and PL on this condition. Figure 5 shows the experimental results. First of all, it can be seen from the figure that the Qos loss and PL of the three schemes all increase with the increase of Q max loss . e reason for this is obvious: in the first place, we can see the PL will increase to a certain extent with the increase of Q max loss from Figure 4(a). In the second place, from the definition of Q max loss and Qos loss , it is obvious that Qos loss will increase to a certain extent with the increase of Q max loss . Secondly, under the same Q max loss , the PL of KLPPS and HCLS − SG is significantly higher than that of HCLS, but Qos loss is also higher than HCLS to a certain extent, which results from that both KLPPS and HCLS − SG will make full use of the limited maximum service quality loss to optimize the selection of dummy location set, so as to improve location privacy as much as possible while ensuring that the loss of service quality does not exceed the constraints of service quality, thereby effectively balancing service quality and location privacy. In addition, under the same Q max loss , the PL of KLPPS is slightly larger than that of HCLS − SG while Qos loss is slightly smaller than that of HCLS − SG, indicating that KLPPS is better than HCLS − SG in balancing service quality and location privacy. e experimental results above show that the scheme can effectively resist inference attacks while effectively balancing service quality and location privacy compared with the SG, HCLS, and HCLS − SG schemes.

Conclusion
ere are some problems such as single point of failure and without the ability to effectively resist single-point attack and inference attack, etc., in the traditional k−anonymous location privacy protection schemes. To solve the problems above, by analyzing the merits and drawbacks of the existing location privacy protection system architecture, we propose a semitrusted third party based location privacy protection architecture that can tackle performance bottleneck of mobile device and single point of failure. en, comprehensively considering the characteristics of side information, semantic diversity, and physical dispersion of locations combined with the ideas of dummy location technology and offset location, a dummy location selection algorithm based on location semantics and physical distance is proposed to effectively resist single-point attacks. Finally, we propose a location anonymous optimization method based on Stackelberg game to optimize the dummy selection algorithm. Specifically, we formalize the mutual optimization of user-adversary objectives (location privacy vs. correctness of inferring location) by using the framework of Stackelberg games, to find an optimal dummy location set. e optimal dummy location set can resist single-point attacks and inference attacks while effectively balancing service quality and location privacy. e experimental results further verify the effectiveness of the proposed scheme. However, our work still has the following shortcomings. First, in LBS, more and more people use continuous query services such as navigation services, etc., while our scheme can be only applied in snapshot query scenario not the continuous query scenario. Secondly, in different application scenarios, users have different requirements for privacy protection level and service quality, so it needs to be improved as much as possible in terms of balancing data validity and privacy levels. e next work hopes to improve our scheme to make it suitable for continuous query scenarios. Meanwhile, aiming at users with different needs in different scenarios, on the basis of further balancing service quality and privacy protection level, we design a personalized location privacy protection scheme.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest. Security and Communication Networks 13