Protecting Check-In Data Privacy in Blockchain Transactions with Preserving High Trajectory Pattern Utility

Because the blockchain is secure and tamper-resistant, it has been widely used in many industries, such as finance, digital tokens, and e-commerce logistics. A notable security feature of the blockchain is that nodes verify each transaction initiated on a block, and the process is broadcast throughout the whole network. On the one hand, this ensures the security of every transaction; on the other hand, it can easily disclose the private information of transaction users. Therefore, protecting the sensitive information of transaction users while preserving the security of the blockchain has become a hot issue. This paper proposes a check-in privacy protection (CPP) algorithm based on check-in location generalization, which can be applied to blockchain transactions to address the leakage of transaction users' sensitive information. The CPP algorithm not only protects the privacy of check-in data but also preserves the high utility of trajectory pattern data. First, when a sensitive check-in location is generalized, location types are recommended based on the user's trajectory pattern by using a Markov chain. Second, so that the generalized locations are scattered as much as possible and the attacker cannot infer the real location, a heuristic rule is designed to select generalized locations from the recommended location types while maintaining the similarity between the anonymous trajectory and the original trajectory. In addition, a generalized location search strategy is designed to improve the efficiency of the algorithm. Experiments on real spatial-temporal check-in data indicate that our algorithm effectively protects the privacy of sensitive check-ins while ensuring the high utility of trajectory pattern data.


Introduction
In recent years, the blockchain [1] has been broadly used in finance, digital tokens, e-commerce logistics, and many other industries because of its security and tamper resistance. A significant security feature of the blockchain is that nodes authenticate each transaction initiated on a block, and the process is broadcast throughout the network. This ensures the security of the transaction but also exposes transaction users to privacy harm. Hence, protecting the sensitive information of transaction users while ensuring the security of the blockchain [2,3] is an issue worthy of attention. With the constant development of mobile networks [4,5], vehicular networks [6][7][8][9], wireless communication networks [10], and GPS-enabled devices, a mass of check-in data [11] of mobile users has been collected and utilized.
Check-in data capture characteristics of human behavior and play a key role in major social science issues such as disease transmission, epidemic prevention and control, poverty eradication, and urban planning, as well as in everyday applications such as route recommendation and bus travel. Governments and many research institutions hope to create more value from such data through data mining. However, a trajectory contains many sensitive check-ins. If these sensitive check-in data are leaked, users' private information (home address, religious belief, interests, health, and so on) can be obtained and used by malicious attackers. Therefore, protecting sensitive check-in data in trajectories has become a challenging problem.
A check-in records that a user visits a certain place at a certain time. A sensitive check-in is a check-in that the user wants to keep from being leaked. The historical trajectory set of user $u$, $Tr = \{tr_1, tr_2, tr_3, tr_4, tr_5, tr_6\}$, is shown in Figure 1(a), and the trajectory $tr_5 = \langle (l_2, t_{51}), (l_5, t_{52}), (l_7, t_{53}), (l_8, t_{54}) \rangle$ contains four check-ins in chronological order. The check-in $(l_7, t_{53})$ indicates that user $u$ visits location $l_7$ at time $t_{53}$. The location type of $l_7$ is the zoo, and $t_{53}$ falls within user $u$'s office hours. User $u$ does not want to disclose this check-in; therefore, $(l_7, t_{53})$ is set as a sensitive check-in of user $u$. Currently, there is no privacy protection technology designed for sensitive check-ins, and location privacy protection [12][13][14][15] is the closest to this problem.
Location generalization [16] is a popular location privacy protection method; it retains the user's complete location information, requires little computation, and has a simple mechanism. However, current location generalization methods do not consider the user's trajectory pattern, which may reduce the degree of privacy protection for a sensitive check-in or even directly reveal the user's real sensitive check-in.

Wireless Communications and Mobile Computing
For example, the trajectory in Figure 1(b) is an anonymous trajectory obtained by location generalization under a 4-anonymity privacy requirement (4-anonymity means that the probability of identifying the sensitive check-in location from this anonymous trajectory is no more than 1/4). The generalized locations $l_7^1$, $l_7^2$, and $l_7^3$ are obtained by the random method in [17]; together with the real sensitive check-in location $l_7$, they form an anonymous location set that is released with the trajectory, so that the attacker cannot guess the real check-in location of user $u$ at time $t_{53}$. This anonymous trajectory has two problems. (1) It does not conform to the user's trajectory pattern. The location type of $l_5$ is bank. According to the user's historical trajectories, the next location type after the bank is the zoo, the coffee shop, or the bank, with visit probabilities of 4/7, 2/7, and 1/7, respectively. However, the location type of $l_7^1$ and $l_7^3$ is the fitness room. The attacker can therefore easily deduce from the user's historical trajectories that $(l_7^1, t_{53})$ and $(l_7^3, t_{53})$ are false check-ins. (2) The similarity between the anonymous trajectory and the original trajectory decreases. Let $l_7^a$ be the anonymous central location generated by random generalization, and let $l_7^{a'}$ be the anonymous central location generated by another anonymization. As shown in Figure 1(b), the shape of the anonymous trajectory $\langle (l_2, t_{51}), (l_5, t_{52}), (l_7^a, t_{53}), (l_8, t_{54}) \rangle$ differs greatly from that of the original trajectory $\langle (l_2, t_{51}), (l_5, t_{52}), (l_7, t_{53}), (l_8, t_{54}) \rangle$. In Figure 1(c), the shape of the anonymous trajectory $\langle (l_2, t_{51}), (l_5, t_{52}), (l_7^{a'}, t_{53}), (l_8, t_{54}) \rangle$ is closer to that of the original trajectory.
Because of these two problems, the probability of identifying the sensitive check-in exceeds 1/4, which leads to privacy disclosure of the sensitive check-in. To solve this problem, this paper proposes a check-in privacy protection algorithm based on check-in location generalization that protects the privacy of check-in data and preserves the high utility of trajectory pattern data.
The main contributions are as follows:
(1) We propose a check-in privacy protection (CPP) algorithm based on check-in location generalization.
(2) We recommend generalized location types by using Markov chain technology and design a heuristic rule to select generalized locations.
(3) We optimize the generalized location search strategy to improve the efficiency of the algorithm.
(4) Extensive empirical studies show that our algorithm efficiently protects check-in data while preserving the high utility of trajectory pattern data.
The rest of the paper is organized as follows. Section 2 analyzes related work. Section 3 presents some important concepts and the problem definition. Section 4 elaborates our scheme in detail. Section 5 evaluates the performance of CPP. We conclude this paper in Section 6.

Related Work
Sweeney [12] first proposed the concept of the k-anonymity model, and it was first applied in the relational database.
Subsequently, Gruteser and Grunwald and Gruteser and Liu [18,19] applied the k-anonymity model to location privacy protection. Its core idea is that the anonymous server selects $k - 1$ generalized locations to form an anonymous set with the user's real location, so that the $k$ locations cannot be distinguished from each other. Gedik and Ling [20] proposed the Clique Cloak algorithm, which constructs the anonymous region from a graph model that combines time and space factors and transforms the problem of building the anonymous set into the problem of finding $k - 1$ neighbors in the graph model. Wang et al. [21] proposed a generalized location generation scheme based on semantic information and query probability, which can generate $k - 1$ generalized locations related to the semantic information of the user's location. Niu et al. [22] proposed an enhanced DLS algorithm, which first selects $2k$ generalized locations whose query probability is similar to that of the real location by calculating the location entropy and then selects $k - 1$ generalized locations from them by calculating the product of location distances. Lu et al. [23] proposed two generalized location generation algorithms, CirDummy and Grid-Dummy, which realize location k-anonymity while considering the shape of the user's privacy region.
Dwork [13] first proposed the differential privacy protection method, which protects privacy by adding noise to distort data. Differential privacy, which is built on a strict mathematical definition, has two characteristics: first, it is not affected by the background knowledge of attackers, and second, it is not affected by changes to a specific data record. Xiong et al. [24] proposed a spatial crowdsourcing algorithm based on a reward mechanism, which protects location privacy by adding Laplace noise to location data. Xu et al. [25] proposed a hybrid location privacy protection method, which divides locations into discrete locations and nondiscrete locations. For discrete locations, differential privacy is used directly for noise processing, while for nondiscrete locations, a k-means clustering algorithm based on differential privacy is used for generalization. However, excessive noise leads to poor data availability and serious errors. Thus, Ping et al. [26] proposed PriLocation, a differential privacy protection method with noise reduction, to effectively solve the problem caused by excessive noise.
The basic idea of the location privacy protection method based on encryption technology is to encrypt the user's query information. Even if the attacker obtains the query information, he cannot know the real privacy information behind the query information. Zhang and Ni [14,15] proposed a neighbor query method PRN-KNN, which uses a spatial encryption algorithm to enable users to quickly query k-neighbor candidate sets and introduces pseudo-random number secret rules to effectively reduce algorithm processing time. Papadopoulos et al. [27] used security hardware to assist PIR protocol and protected user location privacy through KNN query. Encryption-based location privacy protection technology can better ensure data availability and service accuracy, but the disadvantage is a large amount of calculation.

Preliminaries and Problem Definition
The check-in data set of user $u$ is represented as $C_u = \{c_i \mid i \in [1, m]\}$. The check-in $c_i = (l_i, t_i)$ indicates that user $u$ visits location $l_i$ at time $t_i$, where $t_i$ is the check-in time and $l_i$ is a specific location on the map, such as Northeastern University, Wanda Plaza, or Beiling Park; $(x, y)$ denotes the latitude and longitude of a specific location. $T_i$ represents the location type of a specific location, such as university, shopping center, or park.
Definition 1 Sensitive check-in. Given a trajectory $tr = \langle (l_1, t_1), (l_s, t_s), \cdots, (l_n, t_n) \rangle$, if the user does not want the check-in $(l_s, t_s)$ to be exposed, then $(l_s, t_s)$ is called a sensitive check-in. As shown in Figure 1(b), $(l_7, t_{53})$ is a sensitive check-in in trajectory $tr_5$.

Definition 2 Trajectory pattern matrix M. Given an $m \times m$ matrix $M$ whose rows and columns correspond to the location types $T_1, \cdots, T_m$, the entry $M(T_i, T_j)$ represents the probability that the user travels from location type $T_i$ to location type $T_j$.
As shown in Table 1(a), the location types zoo, fitness room, coffee shop, and bank are denoted $T_1$, $T_2$, $T_3$, and $T_4$, respectively. The user trajectory pattern matrix $M$ is obtained from the location type transitions in the user's historical trajectory set $Tr$, where $M(T_1, T_2)$ is the probability that the user travels from location type $T_1$ to location type $T_2$ next. In Figure 1(a), location type $T_1$ includes $l_7$, $l_8$, and $l_9$. The next locations of $l_7$ are $l_1$ ($T_2$) and $l_8$ ($T_1$). The next location of $l_8$ is $l_2$ ($T_3$). The next locations of $l_9$ are $l_3$ ($T_2$) and $l_5$ ($T_4$). Two of these five transitions out of $T_1$ lead to $T_2$; therefore, the value of $M(T_1, T_2)$ is 2/5.
Definition 3 Check-in location generalization. Given a check-in $(l_s, t_s)$, the generalization operation converts location $l_s$ of the check-in $(l_s, t_s)$ into a location set $L' = \{l_s, l_1', l_2', \cdots, l_i'\}$ containing $1 + i$ locations, such that every location in the set is equally likely to appear between moment $t_{s-1}$ and moment $t_{s+1}$.
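The transition-probability estimate of Definition 2 can be illustrated with a minimal sketch. The function name `pattern_matrix` and the type sequences below are hypothetical; the sequences are constructed only to reproduce the five transitions out of $T_1$ listed in the text, not the full trajectory set of Figure 1(a).

```python
from collections import defaultdict
from fractions import Fraction


def pattern_matrix(type_trajectories):
    """Estimate M(T_i, T_j) as (# transitions T_i -> T_j) / (# transitions out of T_i)."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in type_trajectories:
        # Count every pair of consecutive location types in the trajectory.
        for current, nxt in zip(traj, traj[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: Fraction(c, total) for nxt, c in row.items()}
    return matrix


# Hypothetical type sequences (assumption): they encode only the transitions
# l7->l1 (T2), l7->l8 (T1), l8->l2 (T3), l9->l3 (T2), and l9->l5 (T4).
trajectories = [
    ["T1", "T2"],        # l7 -> l1
    ["T1", "T1", "T3"],  # l7 -> l8 -> l2
    ["T1", "T2"],        # l9 -> l3
    ["T1", "T4"],        # l9 -> l5
]
M = pattern_matrix(trajectories)
print(M["T1"]["T2"])  # 2/5
```

Exact fractions are used so that the result can be compared directly with the value 2/5 derived above.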
Definition 4 Anonymous trajectory. The anonymous trajectory is the trajectory obtained by replacing the sensitive check-in location $l_s$ in the original trajectory with the anonymous center location $l_s^a$ produced by anonymization.
Definition 5 Trajectory pattern similarity. Given the original trajectory pattern matrix $M$ and the anonymous trajectory pattern matrix $M'$ (both of order $m$), the trajectory pattern similarity $\mathrm{sim}(M, M')$ is given by Formula (1). As shown in Figure 1(b), the anonymous trajectory pattern matrix $M'$ is obtained from the original trajectory pattern matrix $M$ after anonymization, and the value of the trajectory pattern similarity $\mathrm{sim}(M, M')$ is 99.93%.
Definition 6 Check-in k-anonymity. Given a sensitive check-in $(l_s, t_s)$, the generalized location set $c' = \{l_s^1, \cdots, l_s^m\}$ is obtained through the check-in location generalization operation, where $\mathrm{size}(c') \ge k$, so that the leakage rate of the check-in location is not greater than $1/k$; this is called check-in k-anonymity.
Definition 7 Location exposure rate LE. A generalized location is denoted $l'$, and the location anonymous set consists of the real check-in location $l$ and $k - 1$ generalized locations, namely, $LAS = \{l, l_1, l_2, \cdots, l_{k-1}\}$. Given the user's location anonymous set $LAS$, the probability that the attacker, using background knowledge to analyze $LAS$, infers the user's real check-in location is given by Formula (2), where $|LAS|$ is the total number of locations in the anonymous set and $|LAS'|$ is the number of generalized locations that the attacker can identify.
Definition 8 Distance between trajectories. Given the original trajectory with sensitive check-in location $l_s$ and the anonymous trajectory with anonymous center location $l_s^a$, the distance between the trajectories is defined as the Euclidean distance between these two locations, as given in Formula (3).
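The Euclidean distance in Definition 8 can be written out as follows, under the assumption that $l_s$ and $l_s^a$ have planar coordinates $(x_s, y_s)$ and $(x_s^a, y_s^a)$. The coordinate symbols and the name $tr_{dist}$ are illustrative choices based on the notation used elsewhere in this paper.

```latex
tr_{dist}(l_s, l_s^a) = \sqrt{(x_s - x_s^a)^2 + (y_s - y_s^a)^2}
```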

Check-In Privacy Protection Algorithm Based on Generalization of Check-In Location
In this section, the check-in privacy protection algorithm based on check-in location generalization (Algorithm 1) is proposed. The main idea is to select generalized locations according to the original trajectory pattern matrix during check-in location generalization, so that the generalization operation changes the similarity between the original trajectory pattern matrix and the anonymous trajectory pattern matrix as little as possible. Thus, high data availability of the anonymous trajectory in terms of trajectory patterns is guaranteed. The algorithm framework of this paper is shown in Figure 2. The framework shows that users' sensitive check-ins are protected by the four algorithms (Algorithms 1-4) proposed in this paper, and this method can be used to protect user identity information in blockchain transactions. First, the Markov chain-location type recommendation (MC-LTR) algorithm is used to recommend the set of location types for sensitive check-ins (line 2). Next, the generalizing location search (GLS) algorithm is used to search for specific locations in the generalization area (line 3). Then, the location assignment based on trajectory pattern (LATP) algorithm allocates the number of generalized locations for each recommended location type so that the change of the anonymized trajectory pattern matrix is minimal (line 4). After that, the dummy location selection (DLS) algorithm is used to obtain the candidate array of generalized locations (line 5). As shown in Formula (4), score is a heuristic function that combines the product of the distances between the generalized locations and the sensitive check-in location with the distance between the trajectories before and after anonymization. The higher the value, the more scattered the generalized locations are around the sensitive check-in location, and the closer the anonymous trajectory is to the real trajectory.
Finally, the CPP algorithm returns an anonymous location set containing k locations (line 6).
For example, suppose we protect the sensitive check-in $(l_7, t_{53})$ in trajectory $tr_5$. A random choice of location type is likely to expose the user's sensitive check-in, so the MC-LTR algorithm is used to ensure that the generalized location types conform to the user's historical trajectory pattern. The location type of the sensitive check-in location $l_7$ is $T_1$. The location types predicted by the Markov chain are $T_1$, $T_2$, $T_3$, and $T_4$, with recommendation probabilities of 4/35, 0, 1/14, and 4/49, respectively. Therefore, the recommended set of location types for the sensitive check-in $(l_7, t_{53})$ is $T = \{T_1, T_4, T_3\}$. The search for specific locations of the recommended location types mainly considers two factors: historical average speed and time accessibility. Within the query area, the GLS algorithm puts the specific locations found for each location type in $T$ into the location queue $L$, namely, $L = [l_1^1, l_1^2, l_4^1, l_4^2, l_4^3, l_3^2]$, where location type $T_1$ contains two specific locations $l_1^1$ and $l_1^2$, location type $T_4$ contains three specific locations $l_4^1$, $l_4^2$, and $l_4^3$, and location type $T_3$ contains one specific location $l_3^2$. To achieve 4-anonymity, three generalized locations are selected from the location queue $L$ so that the anonymous set achieves the best protection. By using the LATP algorithm, one location type meeting the anonymity requirement is selected at a time, and the quantity array $S$ of generalized location types is obtained, where $S[T_1] = 2$, $S[T_4] = 1$, and $S[T_3] = 0$. When the specific locations are selected, both the dispersion among locations and the change in shape between the original trajectory and the anonymous trajectory are considered, in order to prevent location exposure and preserve trajectory similarity.
Finally, the DLS algorithm is used to select 3 candidates from the location ...
... check-in, it is predicted that the generalized location types of the sensitive check-in are $T_1$, $T_2$, $T_3$, and $T_4$, with probabilities of 4/7, 0, 2/7, and 1/7, respectively. The transfer probabilities from each generalized location type to the location type of the check-in that follows the sensitive location are 1/5, 0, 1/4, and 4/7, respectively. Therefore, the recommended set of generalized location types for the sensitive check-in is $T = \{T_1, T_4, T_3\}$. As shown in Table 2, the user reverse trajectory pattern matrix $R(M)$ is obtained by traversing the user trajectory set $Tr$ in reverse time order. In Algorithm 2, $M$ and $R(M)$ are taken as inputs to recommend the set $T$ of location types that conforms to the trajectory pattern.
Algorithm 2 shows the recommendation process of location types when generalizing a sensitive check-in. The algorithm takes into account three possible positions of the sensitive check-in in the trajectory. When the sensitive check-in is at the beginning of the trajectory, the reverse trajectory pattern matrix is used to recommend location types for the sensitive check-in (lines 2 and 3). When it is at the end of the trajectory, the trajectory pattern matrix is used (lines 4 and 5). When it is at neither the head nor the tail of the trajectory, the combination of the two trajectory pattern matrices is used (lines 6 and 7). Finally, the top $s$ location types with the highest recommendation probabilities are selected from the recommended location types and put into the set $T$ of location types.
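The recommendation step for a check-in in the middle of a trajectory can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 2. The function name `recommend_types` is hypothetical. The product rule for combining the two directions is inferred from the worked example (4/7 × 1/5 = 4/35, 2/7 × 1/4 = 1/14, 1/7 × 4/7 = 4/49), and dropping types with zero probability is an assumption.

```python
from fractions import Fraction


def recommend_types(from_prev=None, to_next=None, s=3):
    """Rank candidate location types for a sensitive check-in.

    from_prev: {type: probability of reaching this type from the previous check-in}
               (None when the sensitive check-in starts the trajectory)
    to_next:   {type: probability of moving from this type to the next check-in}
               (None when the sensitive check-in ends the trajectory)
    """
    if from_prev is None:
        scores = dict(to_next)
    elif to_next is None:
        scores = dict(from_prev)
    else:
        # Middle of the trajectory: combine both directions by multiplication (assumed).
        scores = {t: from_prev[t] * to_next.get(t, 0) for t in from_prev}
    ranked = sorted((t for t in scores if scores[t] > 0),
                    key=lambda t: scores[t], reverse=True)
    return ranked[:s], scores


# Probabilities quoted in the text for the sensitive check-in (l7, t53).
from_prev = {"T1": Fraction(4, 7), "T2": Fraction(0), "T3": Fraction(2, 7), "T4": Fraction(1, 7)}
to_next = {"T1": Fraction(1, 5), "T2": Fraction(0), "T3": Fraction(1, 4), "T4": Fraction(4, 7)}
types, scores = recommend_types(from_prev, to_next, s=3)
print(types)  # ['T1', 'T4', 'T3']
```

With the quoted probabilities, the ranking reproduces the recommended set $T = \{T_1, T_4, T_3\}$ in decreasing order of recommendation probability.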

Search for Specific Generalization Location. This section mainly introduces the use of a generalized location search algorithm (Algorithm 3) to generate location queue L.
According to the recommended location types, specific locations of the corresponding types are searched for in the query area. At the same time, each selected location must satisfy time accessibility. For example, the query areas of the sensitive check-in $(l_7, t_{53})$ are the circular areas centered at the previous check-in location $l_5$ and at the next check-in location $l_8$, with query radii $v(t_{53} - t_{52})$ and $v(t_{54} - t_{53})$, respectively, where $v$ is the average speed calculated from the user's historical trajectories. First, the specific locations corresponding to each location type in the query area are put into the location queue $L$. There are two locations of location type $T_1$, three locations of location type $T_4$, and one location of location type $T_3$. Then, $l.D(R)$ is defined as the group of locations within distance $R$ of the sensitive location $l$. As shown in Table 3, searching for the locations within 0.7 km of the sensitive location $l_7$ gives $l_7.D(0.7) = \{l_1^1, l_1^2, l_4^1, l_4^2, l_3^2\}$, where distances are measured in kilometers (km). The generalized location search algorithm proposed in this paper performs a breadth-first search over the query area. The list $l.D$ stores the candidate locations found and the corresponding distance index (D-index), and binary search is then used to select the qualifying locations in order to reduce the running time of the algorithm.
Input: privacy protection threshold $k$, generalized location type set $T$, trajectory pattern matrix $M$, sensitive location type $T_s$, location queue $L$;
Output: generalized location type quantity array $S$.
...
update $M$ based on $S[T_i]$ to change $M$ to $M_{T_i}$;
8. ...
Algorithm 4: The location assignment based on trajectory pattern algorithm.
Definition 9 Distance index (D-index). Given a sensitive location $l$, the distance index (D-index) between this location and other locations is defined as a list $l.D$. The elements stored in the list are candidate locations together with the distance between each candidate location and the sensitive location $l$, and the entries in $l.D$ are sorted by distance in ascending order.
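The candidate search described in this section can be sketched in two parts: a time-accessibility test and a sorted distance index queried by binary search. This is a minimal sketch under stated assumptions. The function names, the planar coordinates, and the reading of time accessibility as membership in both query circles are assumptions, and the coordinates are hypothetical rather than taken from Table 3.

```python
import bisect
import math


def reachable(xy, prev_xy, next_xy, v, dt_prev, dt_next):
    """Time accessibility, assumed to mean that xy lies inside both query circles."""
    return (math.dist(xy, prev_xy) <= v * dt_prev
            and math.dist(xy, next_xy) <= v * dt_next)


def build_d_index(sensitive_xy, candidates):
    """Return (distances, names), both sorted by ascending distance to the sensitive location."""
    pairs = sorted((math.dist(sensitive_xy, xy), name) for name, xy in candidates.items())
    return [d for d, _ in pairs], [name for _, name in pairs]


def within(d_index, radius):
    """l.D(R): names of candidates at distance <= radius, located by binary search."""
    distances, names = d_index
    return names[:bisect.bisect_right(distances, radius)]


# Hypothetical planar coordinates in km; the sensitive location l7 sits at the origin.
candidates = {
    "l1_1": (0.3, 0.0), "l1_2": (0.0, 0.5), "l4_1": (-0.6, 0.0),
    "l4_2": (0.0, -0.65), "l4_3": (0.9, 0.0), "l3_2": (0.2, 0.0),
}
d_index = build_d_index((0.0, 0.0), candidates)
print(within(d_index, 0.7))
```

Because the index is built once and kept sorted, each radius query costs one binary search plus a slice, which matches the motivation given for the D-index.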

Generalized Location Quantity Allocation. This section mainly introduces the generalized location quantity allocation algorithm based on the trajectory pattern (Algorithm 4). Its purpose is to determine the number of specific locations allocated to each recommended location type while maximizing the similarity of the trajectory pattern matrix. As shown in Table 4, five generalized locations ($l_1^1$, $l_1^2$, $l_4^1$, $l_4^2$, $l_3^2$) are found for the sensitive check-in $(l_7, t_{53})$ by the generalized location search algorithm. Because the privacy protection threshold $k$ is 4, three of the five generalized locations are selected so as to maximize the similarity of the trajectory pattern matrix. In the allocation algorithm, generalized locations of the same location type as the sensitive check-in location are assigned first (line 2), so the two generalized locations of location type $T_1$ are assigned. The remaining generalized locations are determined by adding one generalized location of a different location type at a time and calculating the similarity of the resulting trajectory pattern matrix (lines 4-9). When a generalized location of type $T_3$ is added, the trajectory pattern similarity is 99.81%; when a generalized location of type $T_4$ is added, it is 99.93%. Therefore, one generalized location of type $T_4$ is assigned, and the algorithm returns the generalized location type quantity array $S$.
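The greedy allocation walked through above can be sketched as follows. This is an illustration under assumptions, not a reconstruction of Algorithm 4. The function name `allocate_quantities` is hypothetical, and the similarity computation is passed in as a callback because the matrix update and Formula (1) are not reproduced here. In the usage example the callback simply returns the two similarity values reported in the text.

```python
def allocate_quantities(k, types, sensitive_type, available, similarity_if_added):
    """Greedily allocate k-1 generalized locations to location types.

    available[t]:              number of candidate locations of type t in the queue
    similarity_if_added(S, t): pattern similarity if one more location of type t is added
    """
    S = {t: 0 for t in types}
    # First assign locations of the same type as the sensitive check-in
    # (assumption: all available ones, capped at k-1).
    S[sensitive_type] = min(available.get(sensitive_type, 0), k - 1)
    while sum(S.values()) < k - 1:
        open_types = [t for t in types if S[t] < available.get(t, 0)]
        if not open_types:
            break  # not enough candidates to reach k-1
        # Add the type whose extra location keeps the similarity highest.
        best = max(open_types, key=lambda t: similarity_if_added(S, t))
        S[best] += 1
    return S


# Values from the worked example: adding T3 gives 99.81%, adding T4 gives 99.93%.
reported = {"T3": 0.9981, "T4": 0.9993}
available = {"T1": 2, "T4": 2, "T3": 1}
S = allocate_quantities(4, ["T1", "T4", "T3"], "T1", available,
                        lambda current, t: reported[t])
print(S)  # {'T1': 2, 'T4': 1, 'T3': 0}
```

Passing the similarity as a callback keeps the greedy loop independent of how the anonymous trajectory pattern matrix is actually updated.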

Selection of Candidate Generalized Location. This section mainly introduces the selection of candidate generalized locations by the candidate location selection algorithm (Algorithm 5). The generalized location candidate array needs to meet two conditions. (1) The locations in the array are as scattered as possible, which effectively prevents the attacker from inferring the real location. (2) The center location of the area formed by the locations is as close as possible to the sensitive check-in location, which ensures that the anonymous trajectory is similar to the original trajectory. The traditional way to measure the dispersion between locations is the sum of the distances between locations, $\sum_{i \ne j} dist(l_i, l_j)$. However, the product of the distances, $\prod_{i \ne j} dist(l_i, l_j)$, reflects the dispersion of locations better in most cases. As shown in Figure 3, A and B are generalized locations that have already been selected, and C and D are candidates. When the third generalized location is selected, both C and D meet the requirement under the sum-of-distances method, because $tr_{dist}(D, A) + tr_{dist}(D, B) = tr_{dist}(C, A) + tr_{dist}(C, B)$. Under the product-of-distances method, however, we should choose the generalized location C, because $tr_{dist}(C, A) \cdot tr_{dist}(C, B) > tr_{dist}(D, A) \cdot tr_{dist}(D, B)$ and the anonymous region formed by the generalized locations A, B, and C is more scattered. Therefore, the product-of-distances method is adopted in this algorithm.
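The difference between the two dispersion measures can be checked numerically. The coordinates below are hypothetical and are not those of Figure 3; they are chosen so that C and D have the same distance sum to A and B but different distance products.

```python
import math

A, B = (0.0, 0.0), (6.0, 0.0)  # already selected generalized locations
C, D = (3.0, 4.0), (8.0, 0.0)  # candidates for the third location


def dist_sum(p):
    """Sum of the distances from p to the selected locations A and B."""
    return math.dist(p, A) + math.dist(p, B)


def dist_product(p):
    """Product of the distances from p to the selected locations A and B."""
    return math.dist(p, A) * math.dist(p, B)


print(dist_sum(C), dist_sum(D))          # the sums tie, so they cannot separate C and D
print(dist_product(C), dist_product(D))  # the product is larger for C
best = max([("C", C), ("D", D)], key=lambda item: dist_product(item[1]))[0]
print(best)
```

The sum ties at 10 for both candidates, whereas the product prefers C (25 versus 16), which is the candidate farther from the line through A and B.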

Algorithm Complexity Analysis.
In the MC-LTR algorithm, generalized location types are recommended through the trajectory pattern matrix. Let $|M|$ be the order of the user trajectory pattern matrix; the time complexity of the MC-LTR algorithm is then $O(|M| + |M| \log_2 |M|)$. In the GLS algorithm, we use binary search to find specific generalized locations that match the location type, and the complexity of building the location queue for any sensitive location $l_i$ is $O(\log_2 |D\text{-index}[l_i]|)$. In the LATP algorithm, we need to assign generalized locations to the recommended location types. Because the privacy protection threshold is $k$, the generalized locations are allocated over $k - 1$ cycles, and each cycle needs to update the matrix and calculate the matrix similarity. The time complexity of the algorithm is $O((k - 1)[(|s| - 1)(3 + |M| \cdot |M|) + 2]) = O(k \cdot s|M|^2)$. In the DLS algorithm, we need to choose $k - 1$ candidate generalized locations from the generalization region, and $|S[T_i]|$ locations must be examined each time, so the complexity is $O(k \cdot |S|)$. Therefore, the time complexity of the CPP algorithm is $O(k \cdot s|M|^2)$.
Input: generalized location type quantity array $S$, real sensitive location $l_s$, privacy protection threshold $k$, location queue $L$;
Output: generalized location candidate array $Cand_l$.
1. Initialize $Cand_l[T_i] = 0$, $i \in [1, |S|]$;
2. while lens($Cand_l$) < $k - 1$ do
3.   for each location type
4.     if lens($Cand_l$) == 0 then
5.       Select the location furthest from the real sensitive location from the generalized locations in $T_i$ and add it to $Cand_l$;
6.     else
7.       Select the location with the maximum Score in $T_i$ and add it to $Cand_l$;
8. Return $Cand_l$;
Algorithm 5: The dummy location selection algorithm.
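The selection loop of Algorithm 5 can be sketched as runnable code. This is a sketch under loudly stated assumptions. The per-type quota handling, the function names, and the coordinates are hypothetical. In particular, `assumed_score` is only a stand-in for Formula (4), which is not reproduced here: it rewards a large product of distances to the sensitive location and penalizes a centroid that drifts away from it.

```python
import math


def select_candidates(S, sensitive_xy, k, queue, score):
    """Greedy selection in the spirit of Algorithm 5.

    S:     {type: number of locations to pick for that type}
    queue: {type: [(name, (x, y)), ...]} candidate locations per type
    score: callable(chosen_xy_list, candidate_xy) -> float, higher is better
    """
    chosen = []                                   # [(name, xy), ...]
    remaining = dict(S)
    pool = {t: list(locs) for t, locs in queue.items()}
    while len(chosen) < k - 1:
        progressed = False
        for t in S:
            if remaining[t] == 0 or not pool.get(t) or len(chosen) >= k - 1:
                continue
            if not chosen:
                # First pick: the location furthest from the sensitive location.
                pick = max(pool[t], key=lambda c: math.dist(c[1], sensitive_xy))
            else:
                chosen_xy = [xy for _, xy in chosen]
                pick = max(pool[t], key=lambda c: score(chosen_xy, c[1]))
            pool[t].remove(pick)
            chosen.append(pick)
            remaining[t] -= 1
            progressed = True
        if not progressed:
            break                                 # quotas or candidates exhausted
    return [name for name, _ in chosen]


def assumed_score(sensitive_xy):
    """Stand-in for Formula (4) (assumption): dispersion product over centroid drift."""
    def score(chosen_xy, cand_xy):
        pts = chosen_xy + [cand_xy]
        spread = math.prod(math.dist(p, sensitive_xy) for p in pts)
        everyone = pts + [sensitive_xy]
        cx = sum(p[0] for p in everyone) / len(everyone)
        cy = sum(p[1] for p in everyone) / len(everyone)
        return spread / (1.0 + math.dist((cx, cy), sensitive_xy))
    return score


# Hypothetical coordinates in km; quotas follow the worked example S[T1]=2, S[T4]=1, S[T3]=0.
origin = (0.0, 0.0)
queue = {
    "T1": [("l1_1", (0.3, 0.0)), ("l1_2", (0.0, 0.5))],
    "T4": [("l4_1", (-0.6, 0.0)), ("l4_2", (0.0, -0.65))],
    "T3": [("l3_2", (0.2, 0.0))],
}
picked = select_candidates({"T1": 2, "T4": 1, "T3": 0}, origin, 4, queue, assumed_score(origin))
print(picked)
```

Together with the real sensitive location, the three selected names form an anonymous set of $k = 4$ locations.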

Experimental Evaluation and Analysis
This section analyzes and evaluates the performance of the proposed check-in privacy protection algorithm based on check-in location generalization. The data used in the experiment come from two real data sets, Brightkite and Gowalla, published by the complex network analysis platform of Stanford University. The map data of California, where these two data sets are located, are also obtained. First, users whose cumulative check-in days are fewer than 50 are filtered out of the data sets, and then trajectories that contain only one check-in are deleted. Finally, 5000 users and their corresponding data are selected from the two data sets. Table 5 shows the relevant information of the experimental data. The proposed check-in privacy protection algorithm based on the generalization of check-in location (denoted CPP) is compared with the dummy location selection algorithm based on multiobjective optimization (denoted enhanced DLS) [12] and the location privacy protection algorithm based on random selection (denoted RS) [7]. The performance of the algorithms is analyzed by comparison, and the influence of the algorithm parameters is evaluated. In the tests, the privacy protection anonymity parameter $k$ ranges from 2 to 32.
The software and hardware environment of this experiment are as follows: (1) hardware environment: Intel Xeon 3.90 GHz CPU and 256 GB; (2) operating system platform:
This section compares the performance of the CPP algorithm, the enhanced DLS algorithm, and the RS algorithm by analyzing the running time of the algorithms, the change of score value, and data availability. The following can be seen from Figures 4 and 5: (1) The running time of the three algorithms increases with the privacy protection threshold $k$. The running time of the CPP algorithm lies between those of the RS algorithm and the enhanced DLS algorithm. The running time of the enhanced DLS algorithm grows significantly as $k$ increases, whereas the running time of the RS algorithm is the smallest and tends to be stable. The enhanced DLS algorithm must consider the query probability and the entropy when selecting generalized locations, so its running time increases with the number of generalized locations. The CPP algorithm saves running time through its location search algorithm. (2) The score value measures the degree of dispersion between locations and the distance between the anonymous trajectory and the original trajectory.
When the score value is larger, the selected generalized locations and the sensitive check-in location are more dispersed, and the generalized trajectory is closer to the real trajectory. With the increase of the privacy protection threshold $k$, the scores of the three algorithms increase significantly. The CPP algorithm performs best in score value because it uses the heuristic rule to select each generalized location, which ensures that each selected generalized location maximizes the score value. In contrast, random selection of each generalized location introduces uncertainty and makes the score value unstable. When the order of magnitude of the score value is too large, this paper reports the logarithm of the score value.
Data availability can be evaluated from three aspects: the change of the user trajectory pattern, the change of access location types, and the change of access location points. The following can be seen from Figures 6-8:
(1) The differences in location type transition probability between the trajectory pattern matrices before and after anonymization are divided into four intervals, $0$ to $10^{-9}$, $10^{-9}$ to $10^{-7}$, $10^{-7}$ to $10^{-5}$, and $> 10^{-5}$, and the number of differences falling into each interval is counted. With the CPP algorithm, more than 98% of the probability differences fall in the $0$ to $10^{-9}$ interval, which shows that the change of location type transition probability is small and the similarity of the trajectory pattern matrix before and after anonymization is high, while the performance of the enhanced DLS algorithm and the RS algorithm is lower than that of the CPP algorithm.
(2) The changes in the type of user access location. The total number of original location types visited by a user is 38. It can be seen from the figure that the number of access location types under the CPP algorithm changes little as $k$ increases, and the algorithm does not generate new location types. The number of access location types under the enhanced DLS algorithm increases gradually as $k$ increases, while that under the RS algorithm changes significantly as $k$ increases. This is because the CPP algorithm recommends location types according to the user's trajectory pattern, whereas a generalized location selected by the random method is uncertain, and its location type may fall outside the range of the user's original access location types.
(3) For the user's access location change index, let $n_i$ be the number of user visits to access location $l_i$ before generalization and $n_i'$ the number of visits to the same location after generalization. The change in the number of visits for each location is defined as $\Delta n_i = |n_i - n_i'|$. Given a standard number threshold $N$, the location set $S$ collects the locations whose change in the number of visits is greater than or equal to the threshold: $S = \{\, l_i \mid \Delta n_i \geq N \,\}$. The value of $|S|$ is the number of location points in the set $S$. The smaller the value of $|S|$, the more stable the number of visits of the user to each access location after anonymity, and the better the anonymity. It can be seen from Figure 8 that the CPP algorithm proposed in this paper has the smallest $|S|$ value, which is much lower than that of the other two algorithms, so its anonymous protection effect is the best. The RS algorithm easily generates new locations when selecting generalized locations, and the number of new locations is uncertain, so its $|S|$ value is larger.
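The three data availability measures described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names, the toy data, the first-order estimate of the location type transition probabilities, and the choice of which interval endpoints are inclusive are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Upper bounds of the first three intervals (assumed inclusive on the right).
BOUNDS = (1e-9, 1e-7, 1e-5)


def transition_matrix(types):
    """Estimate first-order transition probabilities between location types
    from one sequence of visited types (an assumed estimator)."""
    counts = defaultdict(Counter)
    for a, b in zip(types, types[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


def bucket_differences(p_before, p_after):
    """Count absolute probability differences falling into the four intervals
    [0, 1e-9], (1e-9, 1e-7], (1e-7, 1e-5], and > 1e-5."""
    buckets = [0, 0, 0, 0]
    pairs = {(a, b) for m in (p_before, p_after) for a in m for b in m[a]}
    for a, b in pairs:
        before = p_before.get(a, {}).get(b, 0.0)
        after = p_after.get(a, {}).get(b, 0.0)
        diff = abs(before - after)
        buckets[sum(diff > bound for bound in BOUNDS)] += 1
    return buckets


def new_location_types(types_before, types_after):
    """Location types that appear only after generalization."""
    return set(types_after) - set(types_before)


def changed_locations(visits_before, visits_after, threshold):
    """S = { l_i : |n_i - n_i'| >= N }, with a missing location counted as 0 visits."""
    locations = set(visits_before) | set(visits_after)
    return {loc for loc in locations
            if abs(visits_before.get(loc, 0) - visits_after.get(loc, 0)) >= threshold}


# Toy data (hypothetical, not taken from the experimental data set).
original_types = ["home", "cafe", "office", "cafe", "home"]
anonymous_types = ["home", "cafe", "office", "gym", "home"]
visits_before = {"l1": 5, "l2": 3, "l3": 1}
visits_after = {"l1": 5, "l2": 1, "l4": 2}

histogram = bucket_differences(transition_matrix(original_types),
                               transition_matrix(anonymous_types))
added_types = new_location_types(original_types, anonymous_types)
S = changed_locations(visits_before, visits_after, threshold=2)
print(histogram, added_types, len(S))
```

In this toy example only one transition probability is unchanged, one new location type appears, and two locations reach the threshold. An anonymization that preserves the trajectory pattern well would place most differences in the first interval, add no new types, and yield a small $|S|$.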

Conclusion
This paper proposes, for the first time, a check-in privacy protection algorithm based on check-in location generalization for protecting sensitive check-ins. The algorithm can be applied to blockchain transactions to address the privacy protection problem for transaction users' sensitive information. Taking the user's trajectory pattern into account, the algorithm recommends location types for the generalized check-in locations and selects generalized locations that minimize the change in the trajectory pattern. Experiments on real check-in data sets show that the CPP algorithm can effectively protect the sensitive check-ins in the trajectory, greatly reduce the probability that an attacker identifies the real sensitive check-ins, and maintain high availability of the trajectory pattern data. The method is suited to protecting locations in geographically dense areas. However, the k-anonymity method may not be feasible in areas with sparse geographical density, and this problem needs further study.

Data Availability
All data included in this study are available from the corresponding author upon request.

Wireless Communications and Mobile Computing

Conflicts of Interest
The authors declare that they have no conflicts of interest.