Privacy-Preserving Incentive Mechanism for Mobile Crowdsensing

Incentive mechanisms are crucial for motivating adequate users to provide reliable data in mobile crowdsensing (MCS) systems. However, the privacy leakage of most existing incentive mechanisms makes users unwilling to participate in sensing tasks. In this paper, we propose a privacy-preserving incentive mechanism based on truth discovery. Specifically, we use a secure truth discovery scheme to calculate the ground truth and the weight of users' data while protecting their privacy. Besides, to ensure the accuracy of the MCS results, a data eligibility assessment protocol is proposed to remove the sensing data of unreliable users before performing the truth discovery scheme. Finally, we distribute rewards to users based on their data quality. The analysis shows that our model can protect users' privacy and prevent the malicious behavior of users and task publishers. In addition, the experimental results demonstrate that our model has high performance, reasonable reward distribution, and robustness to users dropping out.


Introduction
As more and more sensors are integrated into human-carried mobile devices, such as GPS locators, gyroscopes, environmental sensors, and accelerometers, these devices can collect various types of data [1]. Therefore, the MCS system [2][3][4] can utilize the sensors equipped in mobile devices to collect sensing data and complete various sensing tasks [5], such as navigation services [6], traffic monitoring [7], indoor positioning [8], and environmental monitoring [9]. In general, the MCS system consists of three entities: a task requester, a sensing server, and participating users, as shown in Figure 1. The task requester publishes sensing tasks and pays rewards for sensing results. The server recruits users according to the sensing task, processes the data from users, and sends the results to the task publisher. Users collect sensing data based on the requirements of the sensing task and get rewards.
In the practical MCS system, the sensing data collected by users are not always reliable [10,11] due to various factors (such as poor sensor quality, lack of effort, and background noise). Therefore, the final result may be inaccurate if we treat the data provided by each user equally (e.g., by averaging).
To solve this problem, truth discovery [12][13][14] has received wide attention from industry and academia. The main idea of most truth discovery schemes is that a user will be given a higher weight (i.e., reliability) if the user's data are closer to the ground truth. Also, the data provided by a user will be counted more in the aggregation procedure if this user has a higher weight. Recently, a number of truth discovery methods [15] have been proposed to calculate users' weights and aggregated results based on this basic idea. But one problem with these methods is that users have to stay online to interact with the server. Otherwise, the MCS system may fail and have to restart. Therefore, if we design a truth discovery scheme that allows users to exit, the MCS system can achieve stronger robustness. The proper functioning of truth discovery requires enough users and high-quality sensing data. Generally, the MCS system utilizes an incentive mechanism [16][17][18] to motivate sufficient users to participate in sensing tasks. However, because of monetary incentives, malicious users attempt to earn rewards with little or no effort. Although truth discovery can assign low weights to malicious users, their continuous input of erroneous data can result in the unavailability of the MCS system [19]. Consequently, the evaluation of data quality is critical to the MCS system. To improve data quality, users who provide incorrect data can be removed before the sensing data are aggregated [20]. On the one hand, we can get more accurate aggregation results. On the other hand, users who provide eligible data can get more monetary rewards.
Although incentive mechanisms have improved a lot, users' privacy protection remains inadequate. When users submit sensing data, their sensitive or private information [21][22][23] may be leaked, including identity privacy [24], location privacy, and data privacy. Also, privacy disclosure [25] reduces users' willingness to participate in sensing tasks. Some incentive mechanisms only consider the cost for users to collect sensing data but ignore the potential cost of privacy disclosure. Recently, some researchers have designed privacy-preserving incentive mechanisms [26][27][28]. In [20], an incentive method is proposed to protect the user's identity and data privacy. Still, the user's sensing data are submitted to the task publisher without regard for the privacy of the sensing data. In [29], the incentive mechanism is designed under the assumption of a trusted platform, which may not hold in practice since the platform itself might be attacked by hackers.
To address these issues, we propose a privacy-preserving incentive mechanism based on truth discovery, called PAID. In our PAID, the task publisher sets data constraints, such as time, location [30], budget [31], and sensing data. If the user does not collect the sensing data at the required time and location or sensing data are not in the qualified range, we believe that the user's sensing data are not credible (i.e., unqualified). After removing the unqualified user's data, the qualified user's sensing data will be submitted to the server to calculate the ground truth and weight. We also design a secure truth discovery scheme, which uses secret sharing technology and key agreement protocol and can still work when some users drop out. Moreover, our truth discovery can ensure that other parties cannot obtain users' sensing data except users themselves. Finally, we calculate every user's data quality according to the weight and distribute the reward.
In summary, the main contributions of this paper are as follows: (i) We introduce a privacy-preserving interval judgment scheme to remove users who provide unreliable data before performing the truth discovery scheme. Removing unqualified users in advance can greatly improve the quality of the sensing data used in the truth discovery scheme, improve the accuracy of results, and save the reward budget.
(ii) We introduce a secure truth discovery scheme so that our incentive mechanism model can obtain the ground truth and the weight of each user's data while protecting the user's privacy. Then, we design a reasonable reward distribution scheme based on the data weight of users. Moreover, our incentive mechanism model allows users to drop out at any time. (iii) Analysis shows that our model is secure. Also, experimental results demonstrate that our model has high performance and can achieve reasonable reward distribution. The remainder of this paper is organized as follows. In Section 2, we describe the problem statement. In Sections 3 and 4, we introduce the cryptographic primitives and the technical intuition behind our model. Then, we discuss PAID in detail in Section 5. Next, Sections 6 and 7 carry out the analysis and performance evaluation. Finally, we discuss the related work and conclude the paper in Sections 8 and 9.

Problem Statement
In this section, we introduce the background of truth discovery and our system model. Then, we describe the threat model and our design goals. Table 1 summarizes the main notations in this paper.

Truth Discovery.
Truth discovery [32] is widely used in the MCS system to solve the conflicts between sensing data collected from multiple sources. Although the methods of estimating weights and calculating ground truth are different, their general processes are similar. Specifically, truth discovery initializes a random ground truth and then iteratively updates the weight and ground truth until convergence.

Weight Update.
Suppose that the ground truth of the object is fixed. If a user's sensing data are close to the ground truth, a higher weight should be assigned to that user. The weight w_i of each user u_i can be iteratively updated as follows:

w_i = log( (Σ_{u_j ∈ U} d(x_j, x*)) / d(x_i, x*) ),

where d(·, ·) is a distance function (e.g., d(x_i, x*) = (x_i − x*)^2). We use U to represent the set of users, and |U| is the number of users in the set U. The sensing data collected by the user u_i are denoted as x_i, in which i is the index of u_i, and x* is the estimated ground truth.
Truth Update. Suppose that the weight of each user is fixed. The ground truth is then updated as the weighted average of the users' data:

x* = (Σ_{u_i ∈ U} w_i · x_i) / (Σ_{u_i ∈ U} w_i).

The final ground truth x* is obtained by iteratively running the weight update and the truth update until the convergence condition is satisfied.
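As a concrete illustration, the weight and truth updates above can be sketched in a few lines of Python. This is a plaintext sketch without the privacy protections introduced later; the squared-distance weight formula is one common choice (CRH-style truth discovery), and the initialization by the mean is an assumption for the example.

```python
import math

def truth_discovery(data, iters=10):
    """Iteratively estimate the ground truth and user weights.

    data: list of sensing values x_i, one per user.
    Returns (x_star, weights).
    """
    x_star = sum(data) / len(data)                 # initial ground truth
    weights = [1.0] * len(data)
    for _ in range(iters):
        # Weight update: users closer to the current truth get higher weight.
        dists = [(x - x_star) ** 2 + 1e-12 for x in data]
        total = sum(dists)
        weights = [math.log(total / d) for d in dists]
        # Truth update: weighted average of the users' data.
        x_star = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return x_star, weights
```

Running this on data where one user reports an outlier shows the outlier receiving the lowest weight while the estimate converges near the honest values.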

System
Model. Similar to the general MCS system, our PAID comprises three entities: a task publisher (TP), a server (S), and users. In our PAID, the TP publishes tasks and requirements to S and gets the ground truth of the object from S. The server S recruits adequate users and removes the users who provide unqualified data. After receiving the sensing data of all users, S performs the truth discovery scheme and gets the ground truth and the weight of each user. To prevent the TP from refusing to pay the reward, we require the TP to prepay the reward to S as a guarantee. After getting the weight of each user, the server S calculates the data quality and distributes the rewards. Users collect sensing data and earn monetary rewards by providing qualified data. Moreover, our PAID can protect users' privacy of time, location, identity, and sensing data. Unlike general MCS models, in our PAID, the TP and S can only get the aggregated result instead of users' sensing data. Figure 2 shows the flow of our PAID. The specific process of our model is as follows.
(1) Task Publish. The TP publishes a sensing task to S, including sensing objects, data eligibility requirements, and budget.
(2) User Recruitment. The server S broadcasts the sensing task and recruits participating users.
(3) Eligibility Assessment. The server S judges whether every user's sensing data meet the qualification requirements.
(4) Prepayment. The TP prepays the monetary reward to S to avoid the denial of payment attack.
(5) Submission Notification. The server S notifies qualified users to submit sensing data.
(Table 1, continued)
B: budget constraint of a sensing task.
(pk_T, sk_T): key pair of public-key encryption.
Enc(P, pk_T): (IND-CPA) public-key encryption function, C = Enc(P, pk_T), where P is a plaintext.
Dec(C, sk_T): public-key decryption function, P = Dec(C, sk_T), where C is a ciphertext.
SEnc(P, k_i): symmetric encryption function, C = SEnc(P, k_i), where k_i is the key.
SDec(C, k_i): symmetric decryption function, P = SDec(C, k_i).
π: a reward control parameter.
q_i: the data quality of the user u_i, q_i = w_i / Σ_{u_j ∈ U_6} w_j.
q̄: the mean of the data quality, q̄ = (Σ_{u_i ∈ U_6} q_i) / |U_6| = 1/|U_6|.
p_i: the monetary reward of the user u_i.
(6) Data Submission and Eligibility Confirmation. Users submit the masked sensing data to S. The server S then confirms whether the submitted sensing data are qualified to prevent malicious users from tampering with the data.
(7) Deviation Elimination. The server S removes users who tamper with their sensing data and eliminates the deviation in the data aggregation caused by these dropped users.
(8) Secure Truth Discovery. The server S calculates the ground truth and the weight of each user by performing the secure truth discovery scheme.
(9) Reward Distribution. The server S calculates the data quality of each user and distributes the rewards.
(10) Task Completion. The server S sends the ground truth of the sensing object to the TP.

reat Model.
In this section, we mainly consider the potential threats from the TP, the server S, and users. We suppose that the TP is dishonest. After getting data from S, the TP may launch a denial of payment attack (DoP) and refuse to pay rewards. The server S is considered honest-but-curious [33,34]. Specifically, the server S follows the protocol execution instructions, but it also attempts to spy on users' private data. In other words, the server S may launch inference attacks (IAs) on the users' private data.
We assume that users are untrusted. Some malicious users may provide erroneous data and launch a data pollution attack (DPA). Besides, untrusted users may forge multiple identities and initiate a Sybil attack (SA) to earn more monetary rewards.

Design Goals.
In this section, we introduce the design goals of our PAID, which are divided into privacy and security goals and property goals. The privacy goals protect the user's private data, and the security goals avoid malicious attacks. The details are as follows.
(i) Privacy Goals. PAID can protect users' location privacy, data privacy, and identity privacy. Specifically, the location and sensing data of a user cannot be obtained by any other party except the user himself. And users' real identities are not disclosed when performing a sensing task. (ii) Security Goals. In our PAID, users can avoid the denial of payment attack (DoP) of the TP. The server S cannot initiate an inference attack (IA) on users. The server S can resist the data pollution attack (DPA) launched by malicious users. And our PAID guarantees fairness by resisting the Sybil attack (SA).
Our PAID also requires the following property goals.
(i) Eligibility. If users' data do not meet the eligibility requirements, they cannot pass the eligibility assessment. In other words, the sensing data adopted by our PAID must be eligible. (ii) Zero Knowledge. When the server S assesses whether users' data meet the eligibility requirements, it cannot obtain the content of users' private data. (iii) Payment Rationality. Each user gets non-negative utility as long as the user provides qualified data. (iv) Budget Rationality. The total monetary reward paid by the TP does not exceed the budget constraint.

Preliminaries
In this section, we review the cryptographic primitives used in our PAID.

Secret Sharing.
We use Shamir's t-out-of-N secret sharing protocol [35], which can split each user's secret s into N shares, where any t shares can be used to reconstruct s. Still, it is impossible to get any information about s if the shares obtained by attackers are less than t.
We assume that some integers can be identified with distinct elements in a finite field F, where F is parameterized with a size of l > 2^k (in which k is the security parameter). These integers represent all users' IDs, and we use the symbol U to denote the set of users' IDs. Then, Shamir's secret sharing protocol consists of two steps as below.
(i) Shamir.share(s, t, U) ⟶ {(u_i, s_i)}_{u_i∈U}: the inputs of the sharing algorithm are a secret s, a threshold t ≤ |U|, and a set U of N field elements denoting the users' IDs, where |U| = N. It outputs a set of shares s_i, each of which is associated with its corresponding user u_i. (ii) Shamir.recon({(u_i, s_i)}_{u_i∈M}, t) ⟶ s: the inputs of the reconstruction algorithm are the shares corresponding to a subset M ⊆ U and a threshold t, where t ≤ |M|, and it outputs the secret s.
Correctness requires that, ∀s ∈ F and ∀t, N with t ≤ N, if {(u_i, s_i)}_{u_i∈U} ← Shamir.share(s, t, U) and M ⊆ U with |M| ≥ t, then Shamir.recon({(u_i, s_i)}_{u_i∈M}, t) = s. Security requires that, ∀s, s′ ∈ F and any M ⊆ U with t > |M|, the shares of M generated from s and the shares of M generated from s′ satisfy {(u_i, s_i)}_{u_i∈M} ≡ {(u_i, s_i′)}_{u_i∈M}, where "≡" indicates that the two distributions are indistinguishable.
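A minimal sketch of Shamir.share and Shamir.recon in Python may clarify the two algorithms. The field prime and the user IDs below are illustrative assumptions, not parameters from the paper.

```python
import random

PRIME = 2**61 - 1  # the finite field F; secret and shares live in F_PRIME

def shamir_share(secret, t, user_ids):
    """Split `secret` into one share per user ID; any t shares reconstruct it."""
    # Random polynomial of degree t-1 with constant term = secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def poly(x):
        return sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
    return {uid: poly(uid) for uid in user_ids}

def shamir_recon(shares, t):
    """Lagrange interpolation at x = 0 using any t of the shares."""
    pts = list(shares.items())[:t]
    secret = 0
    for i, (xi, yi) in enumerate(pts):
        num, den = 1, 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Multiply by the modular inverse of the denominator (Fermat).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

Any t shares recover the secret; with fewer than t shares, every candidate secret is equally consistent, which is the security property stated above.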

Key Agreement.
We utilize the Diffie-Hellman key agreement called SIGMA [36] in our PAID to generate a session key between two users. Typically, SIGMA is described in three parts as follows.
(i) KA.param(k) ⟶ (G, g, q, H): the algorithm's input is a security parameter k. It samples a group G of prime order q, along with a generator g and a hash function H, where H is set as SHA-256 for practicality in our model. (ii) KA.gen(G, g, q, H) ⟶ (x, g^x): the algorithm's inputs are a group G of prime order q, along with a generator g and a hash function H. It samples a random x ← Z_q and computes g^x, where x and g^x will be denoted as the secret key SK_i and the public key PK_i in the following sections.
(iii) KA.agree(x_i, g^{x_j}, sign_j(g^{x_i}, g^{x_j}), MAC_{k_v}(u_j)) ⟶ s_{i,j}: the algorithm's inputs are the user u_i's secret key x_i, the user u_j's public key g^{x_j}, the signed signature sign_j(g^{x_i}, g^{x_j}), and MAC_{k_v}(u_j) from the user u_j, where k_v is used as the MAC key. It outputs a session key s_{i,j} between the user u_i and the user u_j. For simplicity, we use KA.agree(x_i, g^{x_j}) ⟶ s_{i,j} to represent the above process in the following sections.
Correctness requires that KA.agree(SK_i, PK_j) = KA.agree(SK_j, PK_i) for any private and public keys generated by the users u_i and u_j if the two users use the same parameters. Security requires that the shared key s_{i,j} is indistinguishable from a uniformly random string for any adversary who is given the public keys PK_i and PK_j (but does not have the corresponding secret keys SK_i and SK_j).
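A toy sketch of KA.gen and KA.agree may help. The group parameters below are illustrative and insecure (a real deployment uses a standardized large prime-order group), and the SIGMA signatures and MACs are omitted; only the Diffie-Hellman core and the SHA-256 key derivation are shown.

```python
import hashlib
import random

# Toy group parameters (assumptions for illustration only).
P = 2**61 - 1   # modulus
G = 3           # generator

def ka_gen():
    """Sample (SK_i, PK_i) = (x, g^x)."""
    x = random.randrange(2, P - 1)
    return x, pow(G, x, P)

def ka_agree(sk_i, pk_j):
    """Derive the session key s_ij = H(pk_j^sk_i), with H = SHA-256."""
    shared = pow(pk_j, sk_i, P)
    return hashlib.sha256(str(shared).encode()).hexdigest()
```

Correctness follows because (g^{x_j})^{x_i} = (g^{x_i})^{x_j}, so both parties hash the same group element.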

Paillier Cryptosystem.
The Paillier cryptosystem [37] is a probabilistic public-key cryptosystem. It consists of three parts as follows.
(i) Paillier.gen(N, g) ⟶ (sk_p, pk_p): the key generation algorithm's inputs are a number N and g ← Z*_{N^2}, where N is the product of two large primes p, q. It outputs a secret key sk_p and a public key pk_p, where pk_p is computed from (N, g), and sk_p = lcm(p − 1, q − 1). (ii) Paillier.enc(m, pk_p) ⟶ c: the encryption algorithm's inputs are a plaintext m (where m < N) and a public key pk_p. It outputs a ciphertext c. (iii) Paillier.dec(c, sk_p) ⟶ m: the decryption algorithm's inputs are a ciphertext c (where c < N^2) and a secret key sk_p. It outputs the plaintext m.
The Paillier cryptosystem has the property of homomorphic addition. We assume that E is the encryption function; then, for any plaintexts m_1, m_2, E(m_1) · E(m_2) mod N^2 = E(m_1 + m_2 mod N).

Technical Intuition
In this section, we first introduce how the interval judgment scheme can judge users' data eligibility while protecting users' privacy. Then, we note that truth discovery mainly involves aggregating multiple users' data in a secure manner. Therefore, we require that the server S only get the sum of users' inputs, not their content. And we propose a double-masking scheme to achieve this goal.

Interval Judgment Scheme for Privacy Protection.
In our PAID, we use the interval judgment scheme [38] based on the Paillier cryptosystem to determine the sensing data eligibility. Every user u_i provides sensing data x_i, and the server S provides a continuous integer interval [y_1, y_2] (y_1, y_2 ← Z*). The server S can judge whether the user u_i's sensing data x_i fall within the interval range [y_1, y_2] without knowing the data x_i. The user u_i also cannot obtain any information about the integer interval. The scheme is divided into four steps as follows.
(i) The user u_i gets (pk_p, sk_p) ← Paillier.gen(N, g); then, u_i computes E(x_i) using pk_p and sends it to S. (ii) The server S picks two random numbers k, b (k, b ← Z*) to construct a monotone increasing (or decreasing) function f(x) = kx + b, computes f(y_1), f(y_2), and the ciphertext c = E(f(x_i)) (via the homomorphic property, c = E(x_i)^k · E(b)), and sends them to u_i. (iii) After receiving the information from the server S, the user u_i gets f(x_i) ← Paillier.dec(c, sk_p) and then compares the sizes of f(y_1), f(y_2), and f(x_i). Next, this size relationship is sent to the server S. (iv) After receiving the message from u_i, the server S judges whether f(x_i) lies between f(y_1) and f(y_2). If so, we know x_i ∈ [y_1, y_2] because of the monotonicity of the function f(x) = kx + b, i.e., the user u_i passes the data eligibility assessment. Otherwise, the user u_i fails to pass the eligibility assessment of the server S.
It should be noted that, since the user u_i does not know the monotonicity of the function f(x) = kx + b, it is impossible to infer whether the data x_i are in the range of the interval [y_1, y_2] from the size relationship. For simplicity, we formulate the above process as an interval judgment function denoted by ins(x_i, y_1, y_2). If the user u_i passes the eligibility assessment of the server S, ins(x_i, y_1, y_2) = 1; otherwise, ins(x_i, y_1, y_2) = 0.
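The logic of ins(x_i, y_1, y_2) can be sketched as follows. For brevity, this sketch performs the affine masking over plaintext integers; in the real protocol, the server evaluates E(f(x_i)) = E(x_i)^k · E(b) homomorphically, and the user only ever sees f(x_i), never k or b.

```python
import random

def interval_judgment(x, y1, y2):
    """Server-side view of ins(x, y1, y2); returns 1 iff x in [y1, y2].

    The server picks a secret affine map f(v) = k*v + b whose monotonicity
    (the sign of k) the user never learns, so reporting the order of
    f(y1), f(x), f(y2) reveals nothing about x itself.
    """
    k = random.randrange(1, 10**9)
    b = random.randrange(1, 10**9)
    if random.random() < 0.5:      # randomly choose increasing or decreasing
        k = -k
    f = lambda v: k * v + b
    # The user would decrypt f(x) and report only the size relationship.
    lo, hi = sorted((f(y1), f(y2)))
    return 1 if lo <= f(x) <= hi else 0
```

Because f is strictly monotone, f(x) lies between f(y_1) and f(y_2) exactly when x lies in [y_1, y_2], regardless of the sign of k.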

One-Masking Scheme.
Assume that all users are represented in sequence as the integers 1, …, n. And any pair of users (u_i, u_j), i < j, agrees on a random value r_{i,j}. We add r_{i,j} to the user u_i's data x_i and subtract r_{i,j} from the user u_j's data x_j to mask all users' raw data. In other words, each user u_i computes

y_i = x_i + Σ_{j: i<j} r_{i,j} − Σ_{j: j<i} r_{j,i} (mod R),

where we assume x_i and the masks are in Z_R with order R for simplicity. Then, each user u_i submits y_i to the server S, and S computes

z = Σ_{u_i∈U} y_i = Σ_{u_i∈U} x_i (mod R),

since all the pairwise masks cancel. However, this approach has two shortcomings. The first one is that every user u_i needs to exchange the value r_{i,j} with all other users, which results in quadratic communication overhead (|U|^2) if done naively. The second one is that the protocol fails if any user u_i drops out, since the server cannot eliminate the values r_{i,j} associated with u_i in the final aggregated result z.
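A short sketch of the one-masking aggregation (the modulus R and the sample data are illustrative):

```python
import random

def one_mask_aggregate(data, R=2**32):
    """Each pair (i, j), i < j, shares a random r[i,j]; user i adds it and
    user j subtracts it, so the server's sum of y_i equals the sum of x_i."""
    n = len(data)
    r = {(i, j): random.randrange(R) for i in range(n) for j in range(i + 1, n)}
    masked = []
    for i, x in enumerate(data):
        y = (x
             + sum(r[(i, j)] for j in range(i + 1, n))   # masks u_i adds
             - sum(r[(j, i)] for j in range(i)))         # masks u_i subtracts
        masked.append(y % R)
    # The server only sees `masked`; all pairwise masks cancel in the sum.
    return sum(masked) % R
```

Each individual y_i looks uniformly random to the server, yet the total equals the true sum; removing any one y_i from the sum, however, leaves its masks uncancelled, which is exactly the dropout problem described above.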

Double-Masking Scheme.
To solve these security problems, we introduce a double-masking scheme [39,40]. In the work [40], the double-masking scheme is used for privacy-preserving data aggregation. And the scheme in [40] can also protect location privacy and verify the aggregation results. In our model, location privacy protection is implemented by the interval judgment scheme, and our secure truth discovery will confirm the data consistency. The details of the double-masking scheme are as follows.
Every user u_i can get a session key r_{i,j} with every other user u_j by engaging in the Diffie-Hellman key agreement after the server S broadcasts all of the Diffie-Hellman public keys. Then, we can utilize a pseudorandom generator (PRG) to reduce the high communication overhead by having the parties agree on a common seed instead of the whole mask r_{i,j}.
We use the threshold secret sharing scheme to solve the issue that users are not allowed to drop out. Every user u i can send his secret shares to other users. Once some users cannot submit data in time, other users can recover masks associated with these users by submitting shares of these users' secrets to S, as long as the number of dropped users is less than t (i.e., threshold of Shamir's secret sharing).
However, there is a problem that may lead to users' data being leaked to S. There is a scenario where a user u_i is very slow to send data to S. The server S considers that the user u_i has dropped out and asks for the shares of the user u_i's secret from all other users. Then, the server receives the delayed data y_i after recovering u_i's mask. At this time, the server S can remove all the masks r_{i,j} and get the plaintext x_i.
To improve the scheme, we introduce an additional random seed n_i to mask the data. Specifically, each user u_i selects a random seed n_i in the round of generating r_{i,j} and then creates and distributes shares of n_i to all other users during the secret sharing round. Now, users calculate y_i as follows:

y_i = x_i + PRG(n_i) + Σ_{j: i<j} PRG(r_{i,j}) − Σ_{j: j<i} PRG(r_{j,i}) (mod R).

Note that an honest user will never reveal both kinds of shares of the same user to the server S. During the recovery round, the server S can request either a share of r_{i,j} or a share of n_i from each surviving user u_j. After gathering at least t shares of r_{i,j} for all dropped users and t shares of n_i for all surviving users, the server S can eliminate the remaining masks to reveal the sum.
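The double-masking computation and the server's recovery step can be sketched as follows. A toy seeded PRG stands in for a cryptographic one, and the Shamir-based share recovery is abstracted into direct access to the relevant seeds; the function and parameter names are assumptions for the sketch.

```python
import random

def prg(seed, R=2**32):
    """Toy PRG expanding a seed into one mask value (insecure stand-in)."""
    return random.Random(seed).randrange(R)

def double_mask(data, seeds, pair_seeds, R=2**32):
    """y_i = x_i + PRG(n_i) + sum_{j>i} PRG(r_ij) - sum_{j<i} PRG(r_ji) mod R."""
    n = len(data)
    masked = []
    for i, x in enumerate(data):
        y = (x + prg(seeds[i])) % R                      # self mask n_i
        for j in range(n):
            if j != i:
                s = prg(pair_seeds[min(i, j), max(i, j)])
                y = (y + s) % R if i < j else (y - s) % R
        masked.append(y)
    return masked

def recover_sum(masked, alive, seeds, pair_seeds, R=2**32):
    """Remove PRG(n_i) of survivors and PRG(r_ij) toward dropped users."""
    total = sum(masked[i] for i in alive) % R
    for i in alive:
        total = (total - prg(seeds[i])) % R              # n_i shares revealed
        for j in range(len(seeds)):
            if j not in alive and j != i:                # r_ij shares revealed
                s = prg(pair_seeds[min(i, j), max(i, j)])
                total = (total + s) % R if i > j else (total - s) % R
    return total
```

With everyone online, only the n_i masks need removing; when a user drops, the survivors' pairwise masks toward that user are recovered instead, and the sum of the surviving users' data is revealed without exposing any individual x_i.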

Our Proposed Scheme
In this section, we first provide an overview of our PAID. Then, we show the details of the three critical designs in our PAID, including eligibility assessment, truth discovery, and reward distribution. In the eligibility assessment stage, the server S judges whether users' sensing data meet the requirements of a sensing task. In the truth discovery stage, the server S can calculate each user's weight and the ground truth required by the sensing task without knowing the users' sensing data. In the reward distribution stage, the server S computes the quality of the sensing data from each user's weight and then pays rewards to users.

5.1.
Overview. For convenience, we introduce a simple case. We set up a sensing task T to collect the temperature of urban roads in the evening. There are range requirements for the time, location, and sensing data (i.e., temperature). To be more precise, the time range is required to be 5-8 pm on February 3rd, the location range is required to be 12.45-12.55 E and 41.79-41.99 N, and the temperature requirement is 10-15°C. In our PAID, we consider the range requirement as the data eligibility requirement E. The data D_i (D_i = (x_i, τ_i, ℓ_i, ℓ_i′)) collected by a user u_i meet the eligibility requirement E if 10 ≤ x_i ≤ 15, 5 ≤ τ_i ≤ 8, 12.45 ≤ ℓ_i ≤ 12.55, and 41.79 ≤ ℓ_i′ ≤ 41.99. Since the data collected by mobile devices are usually rational numbers, in our PAID, we transform each eligible interval into an integer interval by moving the decimal point to the right. The sensing task T involves three entities: a task publisher (TP), a server (S), and users. And the specific steps are as follows.
Step 1 (Task Publish). The task publisher TP initializes a public key pk_T and a private key sk_T, a reward control parameter π (π is a decimal number), a task budget B, the number of users N, and the eligibility requirements E for a sensing task T. The public key pk_T is used to encrypt the information that the server S needs to send to the TP, and the TP decrypts the ciphertext using the private key sk_T. Then, the TP sends the information {T, pk_T, π, N, B, E} to S as a task request.
Step 2 (User Recruitment). The server S broadcasts the sensing task information {T, π, N, B} and recruits N users who request to participate in the sensing task.
Then, S generates a key pair (PK_S^i, SK_S^i) using the key agreement scheme for every user u_i and sends PK_S^i to u_i.
Step 3 (Eligibility Assessment). Each user u_i confirms whether c_i ≤ (B − π)/N, where c_i denotes the sensing cost of u_i, and the posted lowest reward is denoted as (B − π)/N. If c_i ≤ (B − π)/N, u_i starts the sensing task and collects the data D_i. The user u_i then generates a key pair (PK_i, SK_i) using the key agreement scheme and computes a session key k_i ← KA.agree(SK_i, PK_S^i) as u_i's anonymous identity information. Then, the user u_i performs the interval judgment scheme ins(D_i, E) and sends the public key PK_i to S.
Step 4 (Prepayment). After recruiting N eligible users, the server S requests the TP to prepay the budget reward B for the sensing task T to prevent the denial of payment attack. And the server S calculates the session key k_i ← KA.agree(SK_S^i, PK_i) with each eligible user u_i.
Step 5 (Submission Notification). After getting the budget reward B, the server S informs the eligible users u_i (1 ≤ i ≤ N) to submit data.
Step 6 (Data Submission and Eligibility Confirmation). After receiving the submission notification, each user u_i performs the double-masking scheme to mask the sensing data x_i and get y_i and, at the same time, executes the eligibility confirmation ins(D_i, E) to prevent malicious users from modifying data. Then, u_i encrypts the data y_i using the symmetric encryption algorithm and sends the ciphertext SEnc(y_i, k_i) to S. The session key k_i is the key of the symmetric encryption.
Step 7 (Deviation Elimination). For users who tamper with data during data submission, the server S regards them as dropped users and discards their data. Then, S gets the plaintext y_i = SDec(SEnc(y_i, k_i), k_i) and requests the seed n_i and the noise r_{i,j} between each dropped user u_i and each surviving user u_j to eliminate the impact on the aggregated result.
Step 8 (Secure Truth Discovery). The server S computes each surviving user u_i's weight w_i and the ground truth x* of the sensing object utilizing the truth discovery algorithm. The detailed algorithm process will be introduced later.
Step 9 (Reward Distribution). The server S calculates the sensing data quality q_i = w_i / Σ_{j=1}^{m} w_j, where m is the number of online users. Then, S pays a monetary reward p_i = (B/m) + π · (q_i − q̄) to u_i, where π · (q_i − q̄) denotes the payment adjustment, q̄ = 1/m is the mean data quality, m ≤ N, and 1 ≤ i ≤ m.
Step 10 (Task Completion). The server S encrypts the ground truth x* using pk_T and sends Enc(x*, pk_T) to the TP. And the TP can decrypt the data using sk_T, i.e., x* = Dec(Enc(x*, pk_T), sk_T).
In our PAID, only users who pass the eligibility assessment and eligibility confirmation can obtain the monetary reward. Thus, users cannot cheat S to get a reward with unreliable data. We can also ensure the quality of the sensing data used by the truth discovery algorithm and obtain a more accurate ground truth x*. Moreover, since the TP pays the task reward to S in advance and S pays the reward to u_i according to the quality of u_i's sensing data after the task is accomplished, the TP cannot refuse to pay the reward. Besides, S cannot get users' raw sensing data, time, and location information, which protects the users' privacy. The anonymous identity of each user is determined by both the user and S. S only assigns one random identity token to each user, so malicious users cannot forge multiple identities.
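The quality and payment computation in Step 9 can be checked numerically. The sketch below uses the formulas q_i = w_i / Σ_j w_j and p_i = B/m + π(q_i − q̄) and illustrates that the payments always sum to exactly the budget B; the example weights are invented.

```python
def distribute_rewards(weights, B, pi):
    """Pay p_i = B/m + pi * (q_i - q_mean), with q_i = w_i / sum(w)."""
    m = len(weights)
    total_w = sum(weights)
    q = [w / total_w for w in weights]
    q_mean = 1 / m            # the qualities sum to 1, so their mean is 1/m
    # The pi-terms sum to zero, so sum(p) == B (budget rationality).
    return [B / m + pi * (qi - q_mean) for qi in q]
```

For example, with weights [2, 1, 1], B = 120, and π = 30, the qualities are [0.5, 0.25, 0.25] and the payments are [45.0, 37.5, 37.5]: above-average quality earns a bonus, below-average quality a deduction, and the total never exceeds the budget.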

Eligibility Assessment.
In our PAID, there are three benefits to the design of the eligibility assessment. First, it can prevent users who provide unreliable or erroneous sensing data from receiving monetary rewards, which avoids wasting budgets. Secondly, filtering out unqualified sensing data can improve the accuracy of the sensing task result.
Thirdly, the data quality q_i of each user u_i is related to the sensing object's ground truth x*, and an inaccurate ground truth will lead to unfair incentives. The process of eligibility assessment and eligibility confirmation is similar.
The purpose of the eligibility assessment is to filter out unqualified users preliminarily. Thus, the unqualified users do not need to communicate with other users to perform the double-masking scheme, by which the communication overhead can be reduced. The eligibility confirmation is designed to prevent malicious users from altering the original qualified data. The detailed process of eligibility assessment and eligibility confirmation is as follows.
Step 1. Each user u_i initializes a key pair (pk_p^i, sk_p^i) ← Paillier.gen(N, g). Then, u_i encrypts the sensing data D_i using pk_p^i and sends the ciphertext E(D_i) to S. Generally, E(D_i) consists of four parts: E(x_i), E(τ_i), E(ℓ_i), and E(ℓ_i′).
Step 2. After receiving E(D_i), the server S picks different random k, b (k, b ← Z*) and constructs a monotone increasing (or decreasing) function f(D_i) = kD_i + b for each value in the quadruple (x_i, τ_i, ℓ_i, ℓ_i′). The monotonicity of the four functions need not be consistent. Then, S sends {f(E_l), f(E_r), c} (c = {c_1, c_2, c_3, c_4}) to u_i, where [E_l, E_r] denotes the corresponding eligible interval. For convenience, we will not describe the components of D_i and E separately in the following text.
Step 3. After receiving {f(E_l), f(E_r), c} from S, each user u_i gets f(D_i) ← Paillier.dec(c, sk_p^i) and then compares the sizes of f(E_l), f(E_r), and f(D_i). Next, the size relationship is sent to S.
Step 4. After the server S receives the information from u_i, S judges whether f(D_i) lies between f(E_l) and f(E_r); if so, D_i ∈ [E_l, E_r] because of the monotonicity of the functions. And S determines that u_i passes the eligibility assessment. Otherwise, it fails.
Because users do not know the functions' monotonicity, they cannot infer the size relationship between the qualified data and the eligibility requirement. Therefore, we can assume that malicious users have a very low probability of passing the eligibility assessment. Moreover, during the eligibility assessment, u_i cannot learn the specific qualified interval. S also cannot get u_i's sensing data, which protects u_i's privacy. The above process is represented by ins(D_i, E). If u_i passes the eligibility assessment, then ins(D_i, E) = 1. If not, ins(D_i, E) = 0.

Secure Truth Discovery.
In the secure truth discovery scheme [15], data exchange occurs between users and the server S. The user u_i needs to collect the sensing data x_i, perform the double-masking scheme to mask the raw input data y_i (y_i = dist(x_i, x*)), and then send the masked input data z_i to S. The server S receives the masked input data z_i from each user u_i and aggregates the input data of the online users. Each user u_i can drop out at any time. As long as the number of surviving users is not less than the threshold t, S can eliminate the deviation caused by dropped users and restore the aggregation results. The detailed process is as follows.
Step 0 (Key Generation). Assume N users submit sensing data in the data submission phase. Given the security parameter k and threshold value t, a trusted third party creates three key pairs for each user u i as follows.
(PK_i^s, SK_i^s), (PK_i^a, SK_i^a), (PK_i^r, SK_i^r) ← KA.gen(k), where (PK_i^s, SK_i^s) is used for signatures, (PK_i^a, SK_i^a) is used to generate session keys with other users for symmetric encryption, and (PK_i^r, SK_i^r) is used to generate a session key with each other user u_j as the noise seed r_{i,j}. Then, each user u_i signs its two public keys using SK_i^s as ρ_i ← sign(SK_i^s, PK_i^a ‖ PK_i^r) and sends (ρ_i ‖ PK_i^a ‖ PK_i^r) to S. When receiving messages from at least t users (denote the surviving users as a set U_1 ⊆ U), S broadcasts {(u_j, ρ_j, PK_j^a, PK_j^r)}_{u_j ∈ U_1} to all users. Otherwise, abort.
Step 1 (Key Sharing). After receiving the information from S, each user u_i confirms that |U_1| ≥ t; then, u_i verifies that the signature ρ_j is valid under the public key PK_j^s for every other user u_j. If not, abort. Next, u_i selects a random parameter n_i ← F and generates shares of n_i and SK_i^r as follows: {(u_j, n_{j,i})}_{u_j ∈ U_1} ← Shamir.share(n_i, t, U_1) and {(u_j, SK_{j,i}^r)}_{u_j ∈ U_1} ← Shamir.share(SK_i^r, t, U_1). Then, each user u_i generates a session key with every other user u_j ∈ U_1 \ {u_i} and uses the symmetric authenticated encryption to encrypt the two types of shares as follows.
where the symmetric authenticated encryption is indistinguishable under ciphertext integrity attack and chosen plaintext attack. It ensures the confidentiality and integrity of the messages exchanged between two parties; we do not repeat the details here. If any of the above processes fails, abort. Otherwise, each user u_i sends T_{j,i} to S. When receiving messages from at least t users (denote the surviving users as a set U_2), S randomly initializes the ground truth x* and then broadcasts {T_{j,i}}_{u_j ∈ U_2} and x* to all users. Otherwise, abort.
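The Shamir.share and Shamir.recon operations used above can be sketched as follows (the field prime and function names are our own choices, not fixed by the paper):

```python
import random

PRIME = 2**61 - 1  # field modulus for the shares (an assumed parameter)

def shamir_share(secret, t, users):
    # Split `secret` into one share per user id (ids must be nonzero);
    # any t shares reconstruct the secret, fewer reveal nothing.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def poly(x):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the degree-(t-1) polynomial
            y = (y * x + c) % PRIME
        return y
    return {u: poly(u) for u in users}

def shamir_recon(shares, t):
    # Lagrange interpolation at x = 0 using any t of the shares.
    pts = list(shares.items())[:t]
    secret = 0
    for xi, yi in pts:
        num, den = 1, 1
        for xj, _ in pts:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

In Step 1 each user would invoke the sharing routine twice, once for n_i and once for SK_i^r.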
Step 2 (Masking Input Data). After receiving x* and {T_{j,i}}_{u_j ∈ U_2} from S, each user u_i confirms that |U_2| ≥ t, then computes r_{i,j} ← KA.agree(SK_i^r, PK_j^r) for every user u_j ∈ U_2 \ {u_i}, and gets the masked input data z_i^2 as follows.

where dist(x_i, x*) is the input data in the second round, represented by y_i^2 for convenience, and z_i^2 denotes the masked input data. If any of the above processes fails, abort. Otherwise, each user u_i sends z_i^2 to S. When receiving z_i^2 from at least t users (denote the surviving users as a set U_3), S sends the list U_3 to all users. Otherwise, abort.
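The masking in Step 2 follows Bonawitz et al.'s double-masking construction [39]; a plaintext sketch (the helper names and the toy prg are our assumptions):

```python
import random

MOD = 2**61 - 1

def prg(seed):
    # Stand-in for a PRG expanding a seed into one field element.
    return random.Random(seed).randrange(MOD)

def mask_input(i, y, n_i, r, users):
    # z_i = y_i + PRG(n_i) + sum_{j>i} PRG(r_{i,j}) - sum_{j<i} PRG(r_{j,i})
    z = (y + prg(n_i)) % MOD
    for j in users:
        if j != i:
            s = prg(r[(min(i, j), max(i, j))])  # shared pairwise seed
            z = (z + s if i < j else z - s) % MOD
    return z

def aggregate(zs, ns):
    # Pairwise masks cancel across surviving users; S removes the PRG(n_i)
    # terms after reconstructing the n_i shares, leaving sum(y_i).
    return (sum(zs) - sum(prg(n) for n in ns)) % MOD
```

Because each pairwise mask is added by the lower-indexed user and subtracted by the higher-indexed one, the masks cancel in the sum, while any individual z_i looks uniformly random.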
Step 3 (Consistency Check). After receiving the list U_3 from S, each user u_i confirms that |U_3| ≥ t. Then, u_i calculates the signature ρ_i′ ← sign(SK_i^s, U_3) and sends it to S. When receiving ρ_i′ from at least t users (denote the surviving users as a set U_4), S sends {(u_j, ρ_j′)}_{u_j ∈ U_4} to all users. Otherwise, abort.
Step 4 (Unmasking). After receiving the list U_4 from S, each user u_i confirms that U_4 ⊆ U_3 and |U_4| ≥ t and that each signature ρ_j′ is valid under the public key PK_j^s. Then, u_i decrypts T_{j,i} for users u_j ∈ U_2 \ {u_i} as follows.
Then, n_{j,i} (u_j ∈ U_3) and SK_{j,i}^r (u_j ∈ U_2 \ U_3) are sent to S if u_i = u_i′ and u_j = u_j′. If any of the above processes fails, abort. After receiving the messages from users, S performs the deviation elimination, regards users who modified the data as dropped users, and discards the dropped users' data. The surviving users are then denoted as a set U_5 ⊆ U_4. If |U_5| ≥ t, the secret keys SK_i^r and masks PRG(r_{i,j}), u_i ∈ U_2 \ U_3, can be reconstructed as follows.
Furthermore, PRG(n_i), u_i ∈ U_3, can be reconstructed as follows.
PRG(n_i) ← PRG(Shamir.recon({n_{j,i}}_{u_j ∈ U_5}, t)). (15)
Next, the aggregated result of y_i^2 can be calculated as follows.
Then, S selects a random positive noise value m to mask the raw aggregation result, preventing users from obtaining weight information.
Next, S sends W result to all users.
Step 5 (Masked Input Generation). After receiving W_result from S, each user u_i computes r_{i,j} ← KA.agree(SK_i^r, PK_j^r) for every surviving user u_j ∈ U_5 \ {u_i}. Then, each user u_i calculates the masked weight information as follows.
So, the masked input data are denoted as (z_i^5′, z_i^5″), where the raw input data are y_i^5′ = w_i + m and y_i^5″ = (w_i + m) · x_i. If any of the above processes fails, abort. Otherwise, each user u_i sends (z_i^5′, z_i^5″) to S.
Step 6 (Unmasking). After receiving (z_i^5′, z_i^5″) from at least t users (denote the surviving users as a set U_6), S sends the list U_6 to all users. Otherwise, abort. Then, each user u_i decrypts T_{j,i} for every user u_j ∈ U_6 \ {u_i} as follows.
(u_i′ ‖ u_j′ ‖ n_{j,i} ‖ SK_{j,i}^r) ← SDec(KA.agree(SK_i^a, PK_j^a), T_{j,i}). (19)
Then, n_{j,i} (u_j ∈ U_6) and SK_{j,i}^r (u_j ∈ U_5 \ U_6) are sent to S if u_i = u_i′ and u_j = u_j′. If any of the above processes fails, abort.
After receiving the information from at least t users (denote the surviving users as a set U_7), S restores the secret key SK_j^r for each user u_j ∈ U_5 \ U_6 and PRG(n_i), u_i ∈ U_6, as follows.
Then, S can calculate the aggregation results as follows.
Next, S eliminates the random noise value m as follows.
Therefore, the current ground truth x* and the weight w_i of every user u_i ∈ U_6 can be calculated using formulas (1) and (2) as follows.
Thus, S can obtain the final ground truth x* and the weight w_i of every user u_i by repeating Steps 0 to 6 until the convergence conditions are met. The weight w_i is then used to calculate the data quality q_i of each user u_i.
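In the clear, the iteration that Steps 0-6 compute securely is CRH-style truth discovery [32]; a minimal sketch (the initialisation, distance function, and fixed iteration count are simplifying assumptions of ours):

```python
import math

def truth_discovery(data, iters=10):
    # data: {user: sensed value}. Returns the estimated ground truth x*
    # and each user's weight w_i. Weight update: w_i = log(sum_j d_j / d_i);
    # truth update: x* = sum_i w_i * x_i / sum_i w_i.
    x_star = sum(data.values()) / len(data)   # simple initialisation
    weights = {u: 1.0 for u in data}
    for _ in range(iters):
        dist = {u: (x - x_star) ** 2 + 1e-12 for u, x in data.items()}
        total = sum(dist.values())
        weights = {u: math.log(total / d) for u, d in dist.items()}
        x_star = sum(weights[u] * x for u, x in data.items()) / sum(weights.values())
    return x_star, weights
```

Users whose readings sit far from the consensus receive small weights, which later yields a small quality q_i and hence a smaller reward.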

Reward Distribution.
The weight w_i calculated by truth discovery represents the effective contribution of user u_i. Still, to facilitate reward distribution, we further quantify the data quality q_i of every user u_i. Then, S can compute the monetary reward p_i according to the data quality q_i of u_i.
To achieve the rationality of reward distribution, we set Σ_{u_i ∈ U_6} q_i = 1, so the data quality q_i of each user u_i can be calculated as follows.
Next, we calculate the monetary reward p_i of each user u_i as follows; the higher the quality q_i of u_i's data, the larger the reward u_i gets.
where q̄ (q̄ = Σ_{u_i ∈ U_6} q_i / |U_6| = 1/|U_6|) is the average quality of all surviving users. π (0 < π < B/N) is the reward control parameter, a small rational number whose function is to ensure that the reward p_i is non-negative. And |U_6| is the number of surviving users. Since |U_6| ≤ N, the lowest reward a user can get is (B − π)/N. When the number of final online users is |U_6| = N, each user u_i's reward is p_i = (B/N) + π · (q_i − q̄). If some users drop out and |U_6| < N, S distributes the task budget among the surviving users u_i ∈ U_6, and each user's reward is p_i = (B/|U_6|) + π · (q_i − q̄). Therefore, our reward distribution formula is applicable regardless of whether any users go offline.

Analysis
In this section, we introduce property analysis, privacy analysis, and security analysis to illustrate the feasibility of our PAID.

Property Analysis.
In this section, we introduce eligibility, zero knowledge, payment rationality, and budget rationality of our PAID.

Theorem 1 (eligibility). If the data collected by a user do not meet the eligibility requirement E, the user cannot pass the eligibility assessment.

Proof. Assume that the user's data are denoted as s and the eligibility requirement interval is [a, b]. The user obtains the ciphertext E(s) using homomorphic encryption. Then, S picks different random values k, b′ and constructs a strictly monotone increasing (or decreasing) function f(x) = kx + b′. Then, S computes f(a), f(b), and c = E(s)^k E(b′) = E(ks + b′). When receiving f(a), f(b), c from S, the user decrypts c to get f(s) and compares the values of f(a), f(b), and f(s). Because the user does not know the monotonicity of the function, it is impossible to determine the order relationship among the three numbers. Therefore, if the user's data are not qualified, the user cannot pass the qualification judgment. □

Theorem 2 (zero knowledge). During the eligibility assessment, neither the server S nor any other party learns a user's sensing data s.

Proof. Similar to the description in Theorem 1, we assume that the user's data are s and the server S receives the user's homomorphically encrypted ciphertext E(s). Since the Paillier cryptosystem is indistinguishable under the chosen plaintext attack, a malicious user has no way to recover the plaintext s. The server S may be curious about each user's data, but it cannot obtain the data s without knowing the secret key. □

Theorem 3 (payment rationality). Every user's utility ut_i is non-negative.

Proof. The utility ut_i of each user u_i is determined by the cost c_i of u_i and the real reward from the task publisher TP, i.e., ut_i = p_i − c_i. If the data provided by an untrusted user u_i are not qualified, u_i cannot pass the eligibility assessment, so the untrusted user's utility is ut_i = 0. When c_i > (B − π)/N ((B − π)/N is the posted lowest price), an honest user u_i refuses to participate in the sensing task, so the trusted user's utility is ut_i = 0. When c_i ≤ (B − π)/N, an honest user u_i participates in the sensing task and earns the reward p_i = (B/m) + π · (q_i − q̄). Since q̄ = Σ_{i=1}^m q_i / m = 1/m, 0 < q_i < 1, and m ≤ N, we have p_i = (B/m) + π · (q_i − 1/m) > (B − π)/m ≥ (B − π)/N ≥ c_i. Therefore, p_i − c_i > 0. To summarize, a user's real reward is always non-negative. □

Theorem 4 (budget rationality). The total reward paid to all users does not exceed the task budget B.

Proof. The total rewards for all users are calculated as follows.
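The omitted summation can be reconstructed from the reward formula above (a sketch consistent with q̄ = 1/m and Σ q_i = 1):

```latex
\sum_{i=1}^{m} p_i
  = \sum_{i=1}^{m} \left( \frac{B}{m} + \pi \left( q_i - \bar{q} \right) \right)
  = B + \pi \left( \sum_{i=1}^{m} q_i - m \cdot \frac{1}{m} \right)
  = B + \pi \, (1 - 1) = B .
```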
Hence, Σ_{i=1}^{m} p_i ≤ B, i.e., our PAID is budget rational. □

Privacy Analysis. In this section, we demonstrate the protection of users' sensing data, location, and identity privacy in our PAID.

Theorem 5 (data and location privacy protection). Except for the user himself, no other party can obtain the user's sensing data or location data.
Proof. In PAID, the parties that may steal users' data and location privacy are mainly the server S and external attackers. Specifically, the server S may try to obtain users' sensing data and location privacy in the eligibility assessment and truth discovery. External attackers steal data and location privacy by eavesdropping on the communication between the server S and users. According to Theorem 2, our PAID has the zero-knowledge property, so the server S cannot learn users' sensing data or location data in the eligibility assessment. In truth discovery, users' sensing data are sent to S only after the double-masking scheme is applied, and S cannot recover users' raw sensing data from the double-masked data. Furthermore, before the communication between a user u_i and S, the data are encrypted by the AES symmetric encryption function SEnc(y_i, k_i).
Therefore, as long as SEnc(y_i, k_i) is secure, external attackers cannot steal the data y_i by eavesdropping on the communication. □

Theorem 6 (identity privacy protection). When users participate in a sensing task, they use an anonymous identity rather than their real identity. Therefore, no PPT adversary can distinguish the users' identities.
Proof. In PAID, the anonymous identity of a user u_i is represented by k_i ← KA.agree(SK_i, PK_S^i), and the real identity of u_i is SK_i, where SK_i = x_i ← Z_q and PK_S^i is a token assigned by S. The user u_i uses the anonymous identity k_i rather than the real identity SK_i to participate in a sensing task. Because of the DDH problem, a PPT adversary cannot recover the real identity SK_i of the user u_i from the anonymous identity k_i. We omit the detailed proof; interested readers can find more details in the literature [36].
□

Security Analysis. In this section, we describe the attacks our PAID can resist, including the denial of payment attack (DoP), inference attack (IA), data pollution attack (DPA), and Sybil attack (SA).
(1) Resistance to Denial of Payment Attack (DoP). We use a prepayment mechanism in our PAID. At the beginning of a sensing task, the task publisher TP pays users' monetary rewards to S in advance. If a malicious TP refuses to pay the monetary reward after receiving the data, S can pay the rewards to users according to the reward distribution formula. Therefore, the TP cannot refuse to pay users the reward.
(2) Resistance to Inference Attack (IA). The server S cannot launch an inference attack on users' data due to the zero-knowledge property of our PAID.
(3) Resistance to Data Pollution Attack (DPA). Our PAID introduces the eligibility assessment, and unqualified data submitted by users are not used in the truth discovery algorithm. Therefore, our PAID can resist the data pollution attack (DPA).
(4) Resistance to Sybil Attack (SA). The anonymous identity k_i of a user u_i requires the information PK_i provided by the user and the token PK_S^i assigned by S. Each user can obtain only one token from S and then derives the anonymous identity k_i using the key agreement algorithm. Hence, untrusted users cannot forge vast numbers of fake identities to launch the Sybil attack (SA).

Performance Evaluation
In this section, we use a temperature dataset from Roma for the performance evaluation. First, we describe the computational and communication overhead of the eligibility assessment.
Then, we show the performance of the truth discovery algorithm. Finally, the comparison with related work shows that the quality quantification and incentive mechanism are effective.
In our experiment, the server has an Intel(R) Xeon(R) E3-1231 v3 3.4 GHz CPU, 16 GB RAM, a 256 GB SSD, and a 1 TB mechanical hard disk and runs the Ubuntu 18.04 operating system. The mobile devices run Android with a 2.2 GHz CPU and 4 GB RAM.
The Roma temperature dataset includes users' ID, date, time, longitude, latitude, and temperature. In particular, the precision of location, time, and sensing data (temperature) is 1 meter, 1 second, and 0.01°C, respectively. Before performing the eligibility assessment, we convert each decimal interval to the corresponding integer interval by moving the decimal point to the right. Figure 3 shows the statistical results of 232 qualified users. We select 100 data points from all qualified data for the performance evaluation.

Evaluation of Eligibility Assessment.
In this section, we analyze the computational and communication overhead in the eligibility assessment. Table 2 shows the performance comparison between our PAID and related work.

Computational Overhead.
One Paillier homomorphic encryption requires two exponentiations (exp), one multiplication (mul), and one modular operation (mod). One decryption requires two exponentiations (exp), three divisions (div), and two modular operations (mod). In our interval judgment scheme, each user u_i performs one encryption and one decryption, so the computational cost of the users is 4n · exp + n · mul + 3n · div + 3n · mod, where n is the number of users. The server performs one encryption E(b) and calculates c = E(x_i)^k E(b), so the computational overhead of the server is 3n · exp + 2n · mul + 2n · mod. Consequently, the total computational overhead is n · (7 · exp + 3 · mul + 3 · div + 5 · mod), and the computational complexity of the interval judgment scheme is O(n).

Communication Overhead.

According to our interval judgment scheme, a user sends the encrypted data E(x_i) to the server S, and the communication overhead is ‖N²‖ bits, where N is the product of two large primes p, q. After receiving the encrypted data E(x_i), the server S calculates c = E(x_i)^k E(b) and sends it to the user; this communication overhead is ‖c‖ bits, where ‖c‖ denotes the bit length of the ciphertext. So the total communication overhead is ‖N²‖ + ‖c‖ bits.

Evaluation of Truth Discovery.
In this section, we select 100 users to participate in the performance comparison of truth discovery. We compare the truth discovery of our PAID with related work in five aspects: accuracy, convergence, robustness to users dropping out, computational overhead, and communication overhead. The evaluation results show that our truth discovery algorithm has good accuracy, quick convergence, and high robustness to users dropping out. Besides, the computational and communication overheads of our algorithm are lower than those of the related work. Therefore, our truth discovery algorithm is reasonable.

Accuracy.
We utilize the root mean squared error (RMSE) to measure the accuracy gap between PAID and CRH [32]. Figure 4 shows that the accuracy of PAID and CRH is similar when different numbers of users participate in a sensing task.

Convergence.
To demonstrate the convergence of our truth discovery algorithm in PAID, we choose four different initial values and calculate the error rate of the ground truth. As shown in Figure 5, our PAID converges within a few iterations for every initial value.

Robustness to Users Dropping Out.
To analyze the robustness of our PAID to dropped users, we count the number of PAID failures and compare with the related work PPTD [41]. A failure means that the model cannot continue to run and has to restart because of users' exit. In PPTD, it is considered a failure once any user quits during the whole truth discovery process. In our PAID, it is deemed a failure only when the number of online users is less than the threshold t (t = 25 in our experiment). We repeat the experiment 50 times to count the failure times of the two models. Figure 6 shows the failure times of the two models when different numbers of users participate in a sensing task. The number of PPTD failures increases as the number of users increases. However, as long as the number of online users is greater than the threshold, our PAID is robust to dropped users.

Table 2: Overhead comparison of the interval judgment schemes.
Scheme | Computational overhead | Communication overhead
Ours | n · (7 · exp + 3 · mul + 3 · div + 5 · mod) | ‖N²‖ + ‖c‖
[20] | n · (10 · exp + 5 · mul + 5 · comp + 5 · mod) | 3‖p‖ + 2‖c‖
Note. p is a large prime; exp, mul, div, comp, and mod denote one exponentiation, one multiplication, one division, one comparison, and one modular operation, respectively.

Computational Overhead.
We compare the computational overhead of PAID and PPTD [41]. Figure 7 shows the running time of the two schemes for different numbers of users. The running time of our PAID is far less than that of PPTD.

Communication Overhead.
We count the communication overhead of users in a complete iterative process and compare our scheme with PPTD [41]. We do not count the server's communication overhead because the total communication overhead of all users can be regarded as the communication cost of the server. Table 3 shows that the communication overhead of our PAID is far less than that of PPTD for every number of users.

Evaluation of Incentive Mechanism.
In this section, we compare the monetary rewards of our PAID and related work. In the experiment, we select 100 users, including 80 qualified users and 20 unqualified users, and set the budget B = 100 and π = 0.3. DQTE [42] is a scheme that includes unqualified users in the reward distribution, while DQTE+ removes unqualified users before the reward distribution. As Figure 8 shows, users in DQTE get almost the same rewards. Although DQTE+ removes unqualified users, there is no obvious difference in users' rewards except for an increase in each user's monetary reward. However, our scheme provides higher monetary rewards for users who submit higher-quality data. Therefore, our scheme can effectively motivate users to provide high-quality sensing data.

Related Work
Truth discovery is an effective technology that can calculate the ground truth and users' data quality from conflicting sensing data. Li et al. [32] proposed a general truth discovery scheme, but privacy protection is not in their work's scope. To protect users' private data, Miao et al. [41] proposed the first privacy-preserving truth discovery scheme using the Paillier cryptosystem, but its computational and communication costs are huge. Zheng et al. [43] designed a privacy-aware truth discovery scheme, which greatly reduces the computational and communication overhead through a secure sum protocol. Zhang et al. [44] designed a truth discovery scheme using a one-way hash chain to ensure privacy, with all truth discovery operations completed by fog and cloud platforms. Tang et al. [45] used two servers to complete the calculation process of truth discovery, which can effectively protect the privacy of users' sensing data. However, these works do not take into account the failure of the MCS system caused by users' exit. Bonawitz et al. [39] proposed a double-masking scheme for secure data aggregation that allows users to exit. After that, Xu et al. [15] designed a privacy-preserving truth discovery scheme based on the double-masking scheme. However, these truth discovery schemes do not incorporate incentive mechanisms. If malicious users constantly input erroneous data, the reliability of the results in the MCS system suffers.
Another line of previous work [42, 46] related to this paper is the incentive mechanism in the MCS system. Zhang et al. [47] presented a reverse auction model that can motivate online users to participate in sensing tasks. Jin et al. [16] designed an incentive mechanism based on reverse combinatorial auctions, which can maximize social welfare and effectively motivate users. Yang et al. [42] introduced a quality-aware incentive mechanism, which distributes rewards to users after calculating their data quality. However, these works do not consider users' privacy. In [27], the authors designed a privacy-preserving incentive mechanism model. Nevertheless, these solutions cannot eliminate users who provide erroneous data. Zhao et al. [20] presented an incentive mechanism model that evaluates the reliability of users' data while protecting data privacy. Still, users' sensing data must be submitted to the task publisher, so the privacy protection of sensing data remains insufficient. Later, Zhao et al. [48] proposed a privacy-preserving incentive mechanism based on truth discovery. This model uses two servers to achieve real-time reward distribution while protecting users' privacy. However, most existing works do not take users' exit into account.

Conclusion
In this paper, we propose a privacy-preserving incentive mechanism based on truth discovery in the MCS system. Specifically, we introduce an eligibility assessment scheme to estimate whether the data submitted by users are qualified. Next, the truth discovery scheme calculates the ground truth and the weight of each user. Then, we quantify users' data quality by their weights and distribute the rewards. Besides, we demonstrate that PAID satisfies eligibility, zero knowledge, payment rationality, and budget rationality. The analysis shows that our PAID can resist the denial of payment attack, inference attack, data pollution attack, and Sybil attack. Finally, experiments illustrate that PAID is effective, efficient, and robust to dropped users. In future work, we will design an incentive mechanism model for the collection of multidimensional sensing data.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.