A Collusion-Resistant and Privacy-Preserving Data Aggregation Protocol in Crowdsensing System

With the pervasiveness and increasing capability of smart devices, mobile crowdsensing has been applied in more and more practical scenarios and provides a more convenient solution with low costs for existing problems. In this paper, we consider an untrusted aggregator collecting a group of users’ data, in which personal private information may be contained. Most previous work either focuses on computing particular functions based on the sensing data or ignores the collusion attack between users and the aggregator. We design a new protocol to help the aggregator collect all the users’ raw data while resisting collusion attacks. Specifically, the bitwise XOR homomorphic functions and aggregate signature are explored, and a novel key system is designed to achieve collusion resistance. In our system, only the aggregator can decrypt the ciphertext. Theoretical analysis shows that our protocol can capture k-source anonymity. In addition, extensive experiments are conducted to demonstrate the feasibility and efficiency of our algorithms.


Introduction
Recently, smart devices and wireless networks have developed rapidly. Smart devices, such as smartphones, tablets, and smart watches, have become ubiquitous all over the world. They have not only strong, independent computational capability but also rich embedded sensors. Advances in wireless communication technology connect them more tightly, which can be leveraged to develop more applications. People can use the embedded sensors to collect different kinds of data, including pictures, sounds, and videos. The strong capability and low cost of such systems have given rise to a popular new paradigm named crowdsensing.
In a typical crowdsensing application, the server, or the aggregator, recruits a group of users to work for it. Having been informed about their sensing tasks by the aggregator, the users collect data with the relevant sensors on their devices and upload them to the server through Wi-Fi or a 3G/4G network. Recently, a myriad of crowdsensing applications have been developed in different areas such as transportation [1], environment monitoring [2], healthcare [3], and social networks [4]. In this paper, we consider an aggregator that wants to periodically collect data and compute some functions on them to obtain the desired information. For example, to monitor the health situation of a particular district, the aggregator recruits some users in this district to collect their body temperature or blood oxygen. The users need to contribute their data each hour by sensing the relevant data and uploading them.
However, most such applications require users to upload their private data, which may breach an individual's security and privacy. Concerned about these threats, users tend to refuse to participate in crowdsensing. Therefore, the user's security and privacy should be protected in a crowdsensing system. Many previous works [5][6][7][8][9][10][11] have focused on this challenge. Specifically, [5] allows the server to evaluate any multivariate polynomial. However, users need to communicate with each other to generate their encryption keys. In [7], the aggregator can only acquire the summation of all the raw data. Neither of them considers collusion between users and the aggregator. The scheme in [6] gives a solution by using more complex encryption keys. Unfortunately, that protocol requires one more round of key exchange than the number of colluding adversaries, and [8,9] focus on multimedia data collection and require data interchange. Li et al. [10] proposed a novel key system to resist collusion attacks, but it only supports sum and min aggregation. Zhang et al. [11] first proposed a scheme where the aggregator can acquire all the raw data; thus different functions can be computed in one round. Each user's privacy is protected by delinking data from its source. Thus, the only information the aggregator knows is that a particular data item belongs to one of $k$ users, which is called $k$-source anonymity. Nevertheless, there are still some problems in [11]. Specifically, each user owns half of another user's secret keys, so the scheme cannot resist collusion attacks. Moreover, the aggregator does not have decryption keys. Therefore, an outside adversary that can eavesdrop on all the ciphertexts can decrypt them to get all the data, which harms the aggregator's benefits.
In this paper, we propose a novel protocol that not only supports different aggregation functions but also achieves collusion resistance. In each time period, we use the timestamp as a public parameter. The bitwise XOR homomorphic function is executed in the encryption phase. All the users take their encryption keys and the timestamp as the parameters of the encryption functions to generate pseudorandom bit strings, which serve as a one-time pad to encrypt the raw data. To prevent collusion attacks, a novel encryption algorithm is designed. An aggregate signature is also used to protect data integrity and achieve identity authentication.
Our contributions are threefold: (i) We propose a novel protocol that protects users' privacy and resists collusion attacks while the aggregator can obtain all the users' raw data and compute any function on them. We assume that the aggregator and a fraction of users are not reliable and may collude with each other; even so, they cannot obtain any valuable information.
(ii) We protect the aggregator's benefits by preventing outside adversaries from decrypting the ciphertexts. All the ciphertexts are also guarded against tampering by an aggregate signature. If any abnormal data is found, the aggregator can ask the trusted authority to get involved according to the signature.
(iii) We prove that our protocol achieves $k$-source anonymity. Theoretical analysis shows the computational cost of our protocol. In addition, extensive experiments demonstrate that the protocol can be executed efficiently.
The remainder of this paper is organized as follows. Section 2 discusses related work. In Section 3, we present our system model, security model, and design goals. After introducing preliminary knowledge in Section 4, we elaborate our aggregation scheme in Section 5 and prove the security of the protocol in Section 6. Section 7 shows our experimental results. Finally, we conclude the paper in Section 8.

Related Work
Data aggregation was first discussed in wireless sensor networks (WSN) [12][13][14][15][16]. Although there are many differences between WSN and crowdsensing, the work on WSN still offers inspiration for solving problems in crowdsensing. The works in [17,18] consider user recruitment and incentives in crowdsensing. The papers [19][20][21] assume a trusted server although they are devoted to protecting users' privacy. References [5][6][7][8][9][22] contribute to overcoming the challenge when the aggregator is untrusted. However, bidirectional communication between users is required in these schemes, which is a strong assumption in a crowdsensing system. Jung et al. [5] first proposed their product protocol and sum protocol and combined them to evaluate any multivariate polynomial. In the product protocol, all the users are arranged in a circle and each user exchanges his/her public parameter with his/her left and right neighbors. With two public parameters and the secret key, the user can derive a pseudorandom number to execute encryption operations. In the sum protocol, they use a modular property to compute the summation efficiently. However, the pseudorandom number can only be used once and may breach privacy if it is used in several rounds. Therefore, the setup phase, in which users have to communicate with the other two, must be executed in each round. Collusion between users is also a security issue. Jung et al. [6] tried to solve the problem by using more complex ways to generate the pseudorandom number; correspondingly, more rounds of key exchange are needed, and their scheme only supports particular aggregation functions.
In [10,23], privacy-preserving aggregation protocols are proposed that do not require communication among users. Zhang et al. [23] allowed the aggregator to obtain the minimum or the $k$th minimum value of all the data without knowing the individual values; they assumed a semihonest aggregator and only supported a single aggregation function. Li et al. [10] proposed a scheme to resist collusion by using a novel key management system. The key dealer generates a key set which contains hundreds or thousands of elements. Then, the set is divided and distributed to the users and the aggregator, and each of them owns multiple secret keys. The ability to resist collusion attacks relies on the difficulty for the adversary of guessing an honest user's keys correctly from a big set. However, this scheme can only support sum and min aggregation.
Different from the schemes which protect privacy by hiding content, Zhang et al. [11] delink all the users' data from their sources, so that the aggregator can collect all the raw data while remaining unaware of their corresponding owners. Thus, the aggregator can compute any complex aggregation function on them. Each user has two keys to encrypt data, and he/she shares each of them with one of two other users. Therefore, the aggregator can decrypt all the ciphertexts without decryption keys when the bitwise XOR homomorphic functions are employed. However, that paper ignores the collusion attack and cannot protect the aggregator's benefits, because an honest user's secret key can easily be recovered and, without decryption keys, the aggregator has no more ability than an outside adversary. In this paper, we propose a novel protocol to solve the issues which are not dealt with in [11], while protecting data integrity and achieving authentication and traceability.

Models and Design Goal
3.1. System Model. Our system comprises three parties: $n$ participants, an aggregator, and a trusted authority (TA). The participants want to contribute data to the aggregator and get the corresponding reward from it, and the aggregator is willing to collect data and compute some functions on these data, including sum and product aggregation. Only a one-way communication channel from the participants to the aggregator is needed, which could be 3G/4G, Wi-Fi, or any other kind of channel supported by the system parties. We show our system model in Figure 1 and describe the details as follows: (i) TA. The TA is responsible for initializing the whole system, which includes registering the aggregator and participants, generating and distributing keys, and revealing and revoking malicious participants.
Once the system initialization phase is finished, the TA stays off-line in all the phases unless abnormal behavior occurs. (ii) Participant. The participants may be mobile users who hold smartphones with various sensors or vehicles with built-in sensors. They wish to sense data and upload them to the aggregator periodically to get a reward. The $n$ participants in our system are numbered from 1 to $n$, and they collaborate to push data to the aggregator in each time period, for instance, once every fifteen minutes; the time periods can be listed in order. Peer-to-peer communication is not required among participants.
In the remainder of this paper, we use the terms user and participant interchangeably.
(iii) Aggregator. The aggregator periodically collects the participants' data and uses them to compute arbitrary aggregation functions. The aggregation result can be leveraged to get commercial benefits. The data can be time series data, location-based data, or any other kind of predefined numerical data.

Security Model.
Because the collected data may include users' sensitive information, we mainly focus on the participants' privacy in our security model. No adversary should be able to link data with the real data owner, so that the users' privacy is not compromised even if internal adversaries, namely, the malicious users and the aggregator, collude to snoop into it. Meanwhile, if abnormal data is detected, the TA is involved to reveal the malicious users' real identities and revoke them from the system.
(i) TA. In our system, we assume the TA is fully trusted and cannot be compromised. The communication channel between the TA and participants is secure or can be protected by cryptographic tools.
(ii) Participant. The participants honestly execute the protocol, but they are curious about other participants' data. This assumption is based on the fact that some users can leverage others' valid data to receive a reward instead of collecting data on their own, which would consume computation resources, battery, and other resources. A fraction of malicious users may also collude to recover valid users' secret keys, which compromises users' privacy. Another abnormal behavior is data pollution; that is, some users deliberately upload incorrect data to the aggregator, leading to a wrong aggregation result. Many previous studies have focused on this issue but cannot solve the problem perfectly. Thus, in this paper, we provide traceability to identify the malicious users when abnormal data is found in a round.
(iii) Aggregator. The aggregator is honest but curious. The aggregator can collect all the users' data and thus has more ability to breach users' privacy. The untrusted aggregator can also collude with some malicious users to recover users' secret keys and thus link the data with its owner. Other parties may eavesdrop on all the users' data to compute the aggregation themselves, thus causing the aggregator monetary loss.

Design Goals.
Under our system model, our design goal is to develop a framework that holds the required security properties. We not only protect the privacy of users but also resist collusion attacks launched by internal parties and other attacks, such as message tampering, launched by outside adversaries. Specifically, the following goals should be achieved.
(i) Protecting Participants' Privacy. When the participants upload their sensing data to the aggregator, we should guarantee not only that an outside adversary cannot eavesdrop on or tamper with the original data but also that other users cannot decrypt the ciphertext. Any tampered data can be recognized by the aggregator, which then sends a retransmission request to the user. No illegal party can forge a legal user's signature. The user's data should not be linkable to its source by any party, including the aggregator.
(ii) Safeguarding the Aggregator's Benefits. We assume any party, including an outside adversary, has the ability to eavesdrop on all the uploaded data; if other parties can recover the original data, they can compute any aggregation result and thus seriously damage the aggregator's benefits. Therefore, we design the protocol to prevent illegal parties from getting valuable data.
(iii) Computation Efficiency and Accuracy. The proposed framework should achieve computation efficiency and accuracy; in particular, (i) the users can efficiently encrypt the data and compute the corresponding signature on it; (ii) the aggregator can efficiently verify all the users' signatures and recover the original data accurately; (iii) with all the original data, the aggregator can accurately compute any function on them with high efficiency. Our goal is to design efficient data aggregation protocols that can be used by an untrusted aggregator to collect all users' data in a source-anonymous manner.

Bilinear Map and Aggregate Signature.
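Let $G_1$ and $G_2$ be two cyclic groups of prime order $p$, with generators $g_1$ and $g_2$, respectively. There exists an additional group $G_T$ such that $|G_1| = |G_2| = |G_T|$. A bilinear map is a map $\hat{e}: G_1 \times G_2 \rightarrow G_T$ with the following properties. (i) Bilinearity: for all $u \in G_1$, $v \in G_2$, and $a, b \in \mathbb{Z}$, $\hat{e}(u^a, v^b) = \hat{e}(u, v)^{ab}$. (ii) Nondegeneracy: $\hat{e}(g_1, g_2) \neq 1$.
Aggregation. For the aggregating subset of users $U \subseteq \mathbb{U}$, an index $i$ is assigned to each user, where $i \in [1, k]$ and $k = |U|$. Each user $u_i \in U$ provides a signature $\sigma_i \in G_1$ on a message $M_i \in \{0, 1\}^*$ of his/her choice. The messages $M_i$ must all be distinct. Compute $\sigma \leftarrow \prod_{i=1}^{k} \sigma_i$. The aggregate signature is $\sigma \in G_1$.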
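For orientation, the signing and aggregate-verification relations of the standard pairing-based aggregate signature can be written as below. This is a sketch under the assumption that the scheme in this section is that standard construction; the hash $H: \{0,1\}^* \rightarrow G_1$, the secret key $x_i$, and the public key $v_i = g_2^{x_i}$ are notation chosen here for illustration.

```latex
\sigma_i = H(M_i)^{x_i}, \qquad
\sigma = \prod_{i=1}^{k} \sigma_i, \qquad
\hat{e}(\sigma, g_2) \stackrel{?}{=} \prod_{i=1}^{k} \hat{e}\bigl(H(M_i), v_i\bigr)
```

The check follows from bilinearity, since $\hat{e}(H(M_i)^{x_i}, g_2) = \hat{e}(H(M_i), g_2^{x_i})$ for every $i$.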

The Collusion-Resistant and Privacy-Preserving Data Aggregation Protocol
In this section, we present our privacy-preserving aggregation scheme to achieve the aforementioned design goals.
5.1. Overview. Our proposed scheme achieves $k$-source anonymity while preventing an adversary from tampering with users' uploaded messages and generating invalid signatures. During the system initialization phase, the TA generates the users' public/secret keys for message encryption and authentication and the aggregator's secret key for decryption. When the aggregator wants to collect data from $n$ users, it first confirms the time period $t$ with the users. All the users sense the data, encrypt it with their own secret keys, sign the encrypted data, and then collectively upload all the data to the aggregator. If a user does not have data to send, he/she uploads a predefined value, so that the other users' data can still be decrypted. After all the data are collected, the aggregator first aggregates all users' signatures and verifies them. If the signatures are valid, all the users' original data can be recovered with the aggregator's secret key but cannot be linked with their owners' real identities. Our scheme consists of three algorithms: the System Initialization algorithm assigns keys to the users and the aggregator; the Enc&Sign algorithm encrypts users' data and signs the ciphertext; and the Verify&Dec algorithm verifies and decrypts users' data.
We state the basic idea of the encryption and decryption here. It rests on an equation over the secret keys, numbered (1). Each secret key is used as the key of the pseudorandom hash function $h$. We assign the left part of (1) to all users and the right part to the aggregator. They use the same $t$ as the parameter of $h$ and the resulting hash values as the pad to encrypt or decrypt the data, and thus the aggregator can eliminate all the pads with its keys.
However, although the users share the work of computing their hash functions, the aggregator has to compute all of its hash functions by itself every time. Therefore, we move some elements from the right part of (1) to the left. Thus the aggregator knows fewer secret keys and computes far fewer hash functions, and each user only needs to compute a few more functions. The notations in our scheme are listed in the Notations.
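One identity that is consistent with this description is sketched below. It is an assumption made for illustration, not a restatement of equation (1): $\dot{S}$ and $\hat{S}$ are the universal additive and subtractive key sets from the Notations, $S_A$ is a label introduced here for the aggregator's decryption key set, and $h_s(t)$ is the pseudorandom function keyed by $s$ and evaluated on the time period $t$.

```latex
\bigoplus_{s \in \dot{S}} h_s(t) \;\oplus\; \bigoplus_{s \in \hat{S}} h_s(t)
  \;=\; \bigoplus_{s \in S_A} h_s(t),
\qquad \dot{S} = \hat{S} \cup S_A, \quad \hat{S} \cap S_A = \emptyset
```

Every key in $\hat{S}$ occurs twice on the left and cancels because $x \oplus x = 0$, so only the pads under the aggregator's keys remain. The users jointly evaluate the left side, and the aggregator evaluates the much shorter right side.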
Then the TA randomly generates $nc$ bit strings $s_1, s_2, \ldots, s_{nc}$ as secret keys, where $n$ is the number of the users and $c$ is the number of additive keys assigned to each user. Following the idea described in the previous subsection, we distribute these secret keys as follows: (iii) Let $S_i = \dot{S}_i \cup \hat{S}_i$ for $i = 1, 2, \ldots, n$; each $S_i$ is sent to user $i$ as his/her encryption key set. Also, all the keys in the aggregator's key set $S_A$ are sent to the aggregator as decryption keys.
The signing key is also generated in this phase. For each user $i$ ($i = 1, 2, \ldots, n$), the TA generates a pseudo ID $pid_i \in \{0, 1\}^{\lceil \log_2 n \rceil}$ for him/her in each period, a secret signing key $sk_i = x_i$ chosen at random from $\mathbb{Z}_p$, and a public signing key $pk_i = g_2^{x_i}$ for the pseudo ID.
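The key distribution can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `distribute_keys`, the parameter names `c` (additive keys per user) and `q` (aggregator keys), and the 16-byte key length are all chosen here. The additive and subtractive sets are kept as separate lists, assuming a key never lands in both sets of the same user.

```python
import secrets


def distribute_keys(n, c, q, key_bytes=16):
    """Return (additive, subtractive, aggregator) key lists for n users."""
    rng = secrets.SystemRandom()
    keys = [secrets.token_bytes(key_bytes) for _ in range(n * c)]

    # Additive sets: the n*c keys are split into n disjoint sets of c keys.
    pool = keys[:]
    rng.shuffle(pool)
    additive = [pool[i * c:(i + 1) * c] for i in range(n)]

    # The aggregator receives q keys; the remaining n*c - q keys are dealt
    # into n disjoint subtractive sets whose sizes differ by at most one.
    pool = keys[:]
    rng.shuffle(pool)
    aggregator, rest = pool[:q], pool[q:]
    subtractive = [rest[i::n] for i in range(n)]
    return additive, subtractive, aggregator


additive, subtractive, aggregator = distribute_keys(n=100, c=7, q=13)
```

With `n=100`, `c=7`, `q=13`, each user ends up with 7 additive keys and 6 or 7 subtractive keys, which matches the bound of at most 14 encryption keys per user quoted in the security analysis.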

Enc&Sign.
In each period $t$, before the users upload their data to the aggregator, a sequence number $seq(i) \in [1, n]$ is generated for user $i$, where $i = 1, 2, \ldots, n$. $\{seq(i)\}_{i=1,2,\ldots,n}$ is a permutation of $\{1, 2, \ldots, n\}$, which is used to scramble the order of users' data. Each user $i$ encrypts his/her data according to $seq(i)$, and the aggregator does not know the owner of the $j$th data after decryption, where $j = 1, 2, \ldots, n$, because $\{seq(i)\}_{i=1,2,\ldots,n}$ is unknown to it and thus it cannot find the $i$ such that $seq(i) = j$. For security reasons, the sequence number should be changed randomly. We emphasize that the sequence number can be generated by the TA or through communication among the users.
For user $i$, although he/she only owns $l$-bit data, he/she has to upload $n * l$ bits of ciphertext to hide his/her data; otherwise the connection between his/her identity and his/her data can be easily found. Therefore, each user has to compute an extra $(n - 1)l$ bits of ciphertext.

Input:
For each user $i$ ($i = 1, 2, \ldots, n$), input his/her pseudo ID $pid_i$, secret encryption key set $S_i$, and secret signing key $sk_i = x_i$. Each user uploads his/her data $d_i \in \{0, 1\}^l$ in the time period $t$. The symbol $a \mid b$ represents the concatenation of $a$ and $b$, and $\bigoplus_{s \in S_i} h_s(\cdot)$ denotes the exclusive-or of all the results of function $h$ for each element $s$ in $S_i$.

Output:
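The user $i$ outputs the ciphertext $c_i$ and the signature $\sigma_i$ as follows: begin
(1) Generate $n$ random $l$-bit strings $r_i^j$ ($j = 1, 2, \ldots, n$) using $r_i^j = \bigoplus_{s \in S_i} h_s(t \mid j)$.
(2) The data $d_i$ is encrypted as $c_i = (\{0\}^l \oplus r_i^1) \mid \cdots \mid (d_i \oplus r_i^{seq(i)}) \mid \cdots \mid (\{0\}^l \oplus r_i^n)$.
(3) The signature on the ciphertext is computed with the secret signing key. Finally, the signature $\sigma_i$ is generated.
Each user executes Algorithm 1 and then sends $c_i$ and $\sigma_i$ to the aggregator with his/her pseudo ID $pid_i$.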

Verify&Dec.
The aggregator runs Algorithm 2 to fetch the data. After receiving all the ciphertexts from the $n$ users, the aggregator first uses the users' public keys to verify their signatures. If the algorithm outputs −1, which means that some signatures are invalid, the aggregator discards the invalid data and asks for a retransmission. Otherwise, all the users' original data can be recovered with the aggregator's secret key.
To decrypt all the users' data, the aggregator takes the exclusive-OR of all the ciphertexts. The result is split into $n$ parts, each of which is an $l$-bit string. We know that the $seq(i)$th part of user $i$'s ciphertext is the ciphertext of his/her data, while every other part encrypts the all-zero string. Therefore, the aggregator first computes the combined pad from its decryption keys in Step (2) and uses it to decrypt all the ciphertexts. The original data is output in Step (3) as an $nl$-bit string, which can be divided into the segments $[1, l], [l + 1, 2l], \ldots, [(n - 1)l + 1, nl]$, where each segment is one user's original data. However, the aggregator cannot link each data item with its owner, because $seq(i)$ is unknown to it.
If the aggregator finds any abnormal data in the recovered data, for example, in one of the segments, it can request the TA to recover the identity of the malicious user. The aggregator sends all the received data, including the ciphertexts and signatures together with the accompanying parameters, to the TA. If $seq(i)$ is known by the TA, it can directly find the malicious users. Otherwise, the TA recovers the corresponding secret encryption keys and real identities from the pseudo IDs, decrypts all the data, and reveals the malicious users' identities.
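The encryption and decryption steps can be exercised end to end with the toy Python sketch below. It is an illustration under several assumptions, not the paper's implementation: signatures are omitted, data is handled in whole bytes of at most 64 bytes, the pseudorandom function is HMAC-SHA512 truncated to the data length, and the helper names (`prf`, `pad`, `encrypt`, `decrypt`) and the six-key layout are made up for the example.

```python
import hashlib
import hmac
from functools import reduce


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def prf(key, t, j, nbytes):
    # Stand-in for h_s(t | j): HMAC-SHA512 truncated to nbytes (nbytes <= 64).
    msg = t.to_bytes(8, "big") + j.to_bytes(4, "big")
    return hmac.new(key, msg, hashlib.sha512).digest()[:nbytes]


def pad(keys, t, j, nbytes):
    # XOR of the PRF outputs under every key in the list, for slot j.
    return reduce(xor, (prf(k, t, j, nbytes) for k in keys), bytes(nbytes))


def encrypt(data, keys, t, seq, n):
    # Slot seq carries the data; every other slot carries zeros. All are padded.
    zero = bytes(len(data))
    return b"".join(
        xor(data if j == seq else zero, pad(keys, t, j, len(data)))
        for j in range(1, n + 1)
    )


def decrypt(ciphertexts, agg_keys, t, nbytes):
    # XOR all ciphertexts, strip the aggregator's pad, split into n slots.
    n = len(ciphertexts)
    combined = reduce(xor, ciphertexts)
    agg_pad = b"".join(pad(agg_keys, t, j, nbytes) for j in range(1, n + 1))
    plain = xor(combined, agg_pad)
    return [plain[j * nbytes:(j + 1) * nbytes] for j in range(n)]


# Toy layout: six keys, three users, the aggregator holds k[0] and k[3].
k = [bytes([i]) * 16 for i in range(1, 7)]
additive = [[k[0], k[1]], [k[2], k[3]], [k[4], k[5]]]
subtractive = [[k[2], k[5]], [k[4]], [k[1]]]
agg_keys = [k[0], k[3]]

data = [b"\x11\x11", b"\x22\x22", b"\x33\x33"]
seq = [2, 3, 1]  # secret permutation, unknown to the aggregator
t = 7
cts = [encrypt(data[i], additive[i] + subtractive[i], t, seq[i], 3)
       for i in range(3)]
recovered = decrypt(cts, agg_keys, t, 2)
```

The aggregator obtains all three values in slot order, but the slot order reveals nothing about which user produced which value unless `seq` is known.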

Security Analysis
In this section, we analyze our framework and elaborate how our protocol achieves the design goals under the security model. Specifically, we focus on the following three aspects: why our protocol holds $k$-source anonymity, so that the only knowledge the server can get is that the data owner is one of the $k$ participants in the system; why the collusion attack cannot help adversaries recover the users' secret keys; and why data integrity is protected and identity authentication is guaranteed.

Our Protocol Is 𝑘-Source Anonymous.
Following the definition of $k$-source anonymity in Section 4, we want to prove that if we interchange any two users' data in the same interval, no adversary, including the aggregator, can efficiently tell the difference.
Given a group of $n$ participants and their corresponding sensing data, each drawn from the message space of $l$-bit strings, each user runs Algorithm 1 to encrypt his/her data and sends the generated ciphertext to the aggregator. Note that the ciphertext has been defined in Step (2) of Algorithm 1. According to the definition, proving that our protocol is $k$-source anonymous is equivalent to proving that the aggregator's view is computationally indistinguishable when the data of any two users $i$ and $j$, where $1 \leq i \leq j \leq n$, are interchanged, for every dataset in the message space.
To prove this, we construct a simulator to run the same protocol. First, the simulator generates a pseudorandom permutation function on $[1, n]$ and produces a permutation of $[1, n]$. Then, the simulator permutes the original dataset accordingly. The protocol is executed with the permuted dataset as input and outputs the corresponding result. Because the dataset and its pseudorandom permutation cannot be distinguished in polynomial time, the two outputs are computationally indistinguishable. The same conclusion holds for the dataset in which the data of users $i$ and $j$ are interchanged; otherwise, that dataset and its pseudorandom permutation could be distinguished in polynomial time. Therefore, the two views are computationally indistinguishable.
6.2. Our Protocol Is Collusion-Resilient. The ability to resist collusion attacks depends on the size of each user's encryption key set and of the aggregator's decryption key set. If we increase the size of the users' key sets, governed by $c$, and the size of the aggregator's key set, $q$, the security level can be enhanced. Furthermore, when more users participate in the aggregation, our protocol can achieve better security.
In our scheme, the malicious users may collude to recover the aggregator's decryption keys, or collude with the aggregator to recover the other honest users' encryption keys. We consider the probability with which an honest user's key can be guessed successfully in a single trial, the probability of recovering the aggregator's key in a single trial, and the proportion of malicious users who collude with the aggregator. The corresponding expressions are given in [10]. Assume that there are at most 30% malicious users in the group and that the security requirement is that both probabilities are at most $2^{-80}$. When the number of participants is $10^2$, $c$ and $q$ can be set as 7 and 13, respectively, which means that the aggregator owns 13 decryption keys and every user owns 14 encryption keys at most. When the number of participants reaches $10^3$, we set $c = 5$ and $q = 9$.
Therefore, when we set $n = 100$, $c = 7$, and $q = 13$, or $n = 1000$, $c = 5$, and $q = 9$, both probabilities stay at or below $2^{-80}$. Because the probability of recovering secret keys in a single trial is not bigger than $2^{-80}$, our protocol is collusion-resilient. Even if the number of users changes, we can adjust $c$ and $q$ to resist collusion attacks.
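The key counts quoted for these parameter choices can be checked with a few lines of Python. The sketch assumes that the total number of keys is the number of users times the per-user additive key count and that the keys not given to the aggregator are split as evenly as possible into the subtractive sets; the function name `key_layout` and the returned field names are chosen here.

```python
def key_layout(n, c, q):
    """Count keys per party for n users, c additive keys each, q aggregator keys."""
    total = n * c                 # all secret keys
    rest = total - q              # keys left for the subtractive sets
    base = rest // n              # size of the smaller subtractive sets
    larger = rest - n * base      # number of sets holding base + 1 keys
    smaller = n - larger          # number of sets holding base keys
    max_user_keys = c + base + (1 if larger else 0)
    return {
        "total": total,
        "larger_sets": larger,
        "smaller_sets": smaller,
        "max_user_keys": max_user_keys,
    }


small_group = key_layout(n=100, c=7, q=13)
large_group = key_layout(n=1000, c=5, q=9)
```

For 100 users the bound on a user's key count is 14, and for 1000 users it is 10, in line with the figures used in the analysis and in the experiments.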

Our Protocol Achieves Authentication and Data Integrity.
Our protocol uses the aggregate signature as a building block to achieve authentication and the integrity of users' data. The adversary can break authentication and data integrity if and only if it can forge the aggregate signature. However, the aggregate signature described in Section 4 is proven to be secure under the aggregate chosen-key security model [24], so the probability with which the adversary can generate a valid aggregate signature is negligible. Furthermore, each user's data is bound to a pseudo ID. Therefore, authentication and data integrity are achieved.

Performance Evaluation
In this section, we implement our system and evaluate the performance of each phase of our protocol, which demonstrates the efficiency and feasibility of our system. The comparison with other existing aggregation protocols is also carried out through experiments and theoretical analysis.
The main cost for the participants in our protocol is to encrypt and sign the uploaded data. We simulate the participants' steps to examine the computational cost and compare it with previous work. The aggregator's efficiency is determined by the verification and decryption, which the experiments also show to be acceptable.

Implementation and Experimental Settings.
We implement our protocol on a desktop with an Intel Core i7-4790 3.60 GHz CPU and 8 GB memory. The compilation environment is Visual Studio 2013 on Windows 10, and the cryptographic functions in the algorithm are provided by the MIRACL library. We use the same hash function as that in [25], which uses HMAC-SHA512 as the pseudorandom function. For raw data shorter than 512 bits, HMAC-SHA512 outputs a pseudorandom 512-bit string, which is cut into substrings of the data length, and all these substrings are XORed together. When the data is longer than 512 bits, we invoke HMAC-SHA512 several times and concatenate the outputs, while the remaining part is handled in the same way as in the shorter case.
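The folding of HMAC-SHA512 output to an arbitrary bit length can be sketched as follows. This is a hedged illustration, not the code used in the experiments: the helper names `fold` and `prf_bits` are invented, a 4-byte block counter appended to the message is an assumption made to obtain distinct blocks for long outputs, and a final chunk shorter than the target length is simply XORed in as is.

```python
import hashlib
import hmac


def fold(value, nbits):
    """XOR together the successive nbits-wide chunks of an integer."""
    mask = (1 << nbits) - 1
    out = 0
    while value:
        out ^= value & mask
        value >>= nbits
    return out


def prf_bits(key, msg, nbits):
    """Derive an nbits-long pseudorandom integer from HMAC-SHA512."""
    out = 0
    counter = 0
    remaining = nbits
    while remaining > 0:
        block = msg + counter.to_bytes(4, "big")
        digest = hmac.new(key, block, hashlib.sha512).digest()
        value = int.from_bytes(digest, "big")
        take = min(512, remaining)
        # Full 512-bit blocks are used directly; a shorter tail is folded.
        chunk = value if take == 512 else fold(value, take)
        out = (out << take) | chunk
        remaining -= take
        counter += 1
    return out


short_pad = prf_bits(b"k" * 16, b"t|1", 100)
long_pad = prf_bits(b"k" * 16, b"t|1", 1000)
```

Folding keeps every bit of the digest involved in the output, whereas plain truncation would discard most of it; either choice yields a pad of the required length.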
All the users' data is generated by sampling uniformly from $[0, 2^l - 1]$, where $l$ is the data length in bits. We do not adopt real-life data because the value of the user's data does not affect the efficiency of our protocol. We only measure the computational cost on both sides and do not consider the communication cost between them, because each user's upload is $n * l$ bits, generally from several to tens of kilobytes, which can be transmitted in a short time. All the algorithms in our protocol are executed 10 times, and we take the average running time as the final output.

Computational Cost at User Side.
All the computational cost at the user side comes from Algorithm 1. In our scheme, to encrypt the raw data, a user needs to compute a number of pseudorandom functions equal to the number of users multiplied by the number of his/her keys, $|S_i|$, for $i = 1, 2, \ldots, n$.
Figure 2 shows our computation time. We set the group size as $n = 1000$, the proportion of malicious users as 0.3, and both key-recovery probabilities below $2^{-80}$. According to [10], we let $c = 5$, which means that each user owns nearly 10 encryption keys. During each time period, each user computes 10 pseudorandom functions and takes 10000 exclusive-OR operations on $l$-bit data. We can see that our algorithm is efficient and can be applied in a real environment. In this figure, the encryption time increases linearly with the data length. If the data length is smaller than 2000 bits, the encryption time is no more than one second. When the data length reaches 5000 bits, the encryption finishes within two seconds.
Figure 3 shows the relationship between the encryption time and the number of users when the data length is 1000 bits. As the number of users increases, each user needs to compute more ciphertexts. However, it takes each user only 0.21 s to encrypt when the number of users increases to 500.
Here we compare our protocol with that in [11], where, to encrypt the raw data, users have to compute 2 pseudorandom functions and take 2 exclusive-OR operations on $l$-bit data. Although our computational cost is five times as much as that in [11], we achieve more security properties. When the data length is 100 bits, our protocol costs 0.24 s and the scheme proposed in [11] costs 0.05 s. When the data length is 1000 bits, 0.58 s and 0.12 s are needed, respectively.
We do not list a detailed comparison of signature time and verification time here, because the signature time depends heavily on the aggregate signature scheme. In fact, signing the final ciphertext takes only 30 ms when the data length is 5000 bits.

Computational Cost at Aggregator Side.
Different from [11], the aggregator owns decryption keys to prevent others from obtaining all the raw data.In our scheme, the aggregator needs to compute  pseudorandom functions and take  * ( + ) exclusive-OR operations on  bits' data.However, it takes  * ( + ) exclusive-OR operations on  bits' data in [11].
If we set $n = 1000$, the proportion of malicious users to 0.3, and both key-recovery probabilities below $2^{-80}$, the result is shown in Figure 4. The aggregator needs only several seconds to decrypt all the data when the data length is 5000 bits, and it takes only about one second more than [11] to resist the collusion attack. As the number of exclusive-OR operations increases, the time to compute pseudorandom functions no longer dominates the whole running time. The experiment shows that decryption takes 0.26 s when the data length is 100 bits, while 0.031 s is needed in [11]. When the data length reaches 1000 bits, our protocol and [11] consume 0.72 s and 0.23 s, respectively.
When the data length is set as 1000 bits, the computational cost at the aggregator side is shown in Figure 5. The decryption time is proportional to the number of users. If the group includes 100 users, the decryption can be finished in 0.1 s. Even when the number of users grows to 1000, the decryption time does not exceed one second.

Conclusion
In this paper, we propose a novel protocol that allows an untrusted aggregator to compute any aggregation function on all users' data. Collusion resistance is achieved even if some malicious users collude with the untrusted aggregator. The data collection finishes in one round of communication, so a bidirectional communication channel is not needed in our protocol. We also protect all users' data integrity by using the aggregate signature. Security analysis shows that $k$-source anonymity is achieved. Through extensive performance evaluations, we have demonstrated that the proposed scheme is efficient at both the user side and the aggregator side. Dynamic joining and leaving of users have not been discussed in our scheme. In the future, we will continue our efforts to address this issue.

Figure 1: System model of data aggregation.

Figure 2: Computational cost of encryption at user side.

Figure 3: Computational cost of encryption at user side.

Figure 5: Computational cost of decryption at aggregator side.

Notations
$n$: The number of users in the scheme
$\dot{S}$: The universal additive secret key set
$\dot{S}_i$: The user $i$'s additive secret key set
$\hat{S}$: The universal subtractive secret key set
$\hat{S}_i$: The user $i$'s subtractive secret key set
$S_i$: The user $i$'s encryption key set, $S_i = \dot{S}_i \cup \hat{S}_i$

4.1. $k$-Source Anonymity. Assuming that there is a group of $k$ users, each user uploads his/her data to the aggregator. Although the aggregator can obtain the exact values of all data, it still cannot link each data item with its owner, because every user's data is hidden in the dataset of $k$ elements. The more users the group contains, the higher the security level the system achieves. For a specific user, if the adversary can only know that the user's data belongs to one of the $k$ users, we say this user's data holds $k$-source anonymity. If all data captures $k$-source anonymity in a data aggregation protocol, we say this protocol achieves $k$-source anonymity. Intuitively, the definition states that if the aggregator cannot efficiently notice whether we switch two data items, the aggregation protocol with $k$ users is $k$-source anonymous.

$k$-Source Anonymous. A data aggregation process or a protocol is $k$-source anonymous if $V(\ldots, P_i(x_i), \ldots, P_j(x_j), \ldots) \stackrel{c}{\equiv} V(\ldots, P_i(x_j), \ldots, P_j(x_i), \ldots)$ is satisfied, where $k$ denotes the number of users for any group $G$, $P_i$ and $P_j$ are two users in $G$, $\{x_1, \ldots, x_k\} \in \{M\}^k$ is a data aggregation sample, $M$ denotes the message space, $V(\ldots, P_i(x), \ldots)$ denotes the data aggregator's view when running the protocol with $x$ ($x \in \{x_1, \ldots, x_k\}$) as $P_i$'s input ($i = 1, 2, \ldots, k$), and $\stackrel{c}{\equiv}$ denotes computational indistinguishability.

(i) The keys are randomly divided into $n$ disjoint subsets. (ii) The TA divides the remaining $nc - q$ secret keys into $n$ random disjoint parts, denoted as $\hat{S}_i$, $i = 1, 2, \ldots, n$, which are called subtractive secret key sets. Among them, there are $nc - q - n \times \lfloor (nc - q)/n \rfloor$ subtractive sets containing $\lfloor (nc - q)/n \rfloor + 1$ keys, and the other $(1 + \lfloor (nc - q)/n \rfloor)n - nc + q$ sets contain $\lfloor (nc - q)/n \rfloor$ keys. We also use $\hat{S}$ to denote the universal subtractive set, where $\hat{S} = \bigcup_{i=1}^{n} \hat{S}_i$. It is clear that $\dot{S} = \hat{S} \cup S_A$, where $S_A$ is the aggregator's key set.

Output $c_i$ and $\sigma_i$. end
Algorithm 1: Enc&Sign.
To generate $n * l$ bits of ciphertext, user $i$ first uses his/her encryption keys as the secret keys of $h_s(\cdot)$ to generate an $n * l$-bit one-time pad, $r_i^1, r_i^2, \ldots, r_i^n$. Notice that all $r_i^j$ are different, because each of them is generated with a different parameter $t \mid j$. To scramble the position of $d_i$, we encrypt $d_i$ with the pad as $\{0\}^l \oplus r_i^1, \{0\}^l \oplus r_i^2, \ldots, d_i \oplus r_i^{seq(i)}, \ldots, \{0\}^l \oplus r_i^n$. Then all of them are concatenated one by one to generate $c_i$. If any user does not have data to upload, he/she can simply set his/her data to 0.