A Secure Privacy-Preserving Data Aggregation Model in Wearable Wireless Sensor Networks

With the rapid development and widespread use of wearable wireless sensors, data aggregation has become one of the most important research areas. However, the sensitive data collected by sensor nodes may be leaked at the intermediate aggregator nodes, so privacy preservation is an increasingly important issue in secure data aggregation. In this paper, we propose a secure privacy-preserving data aggregation model, which adopts a mixed data aggregation structure. Data integrity is verified both at the cluster head and at the base station. Some nodes adopt a slicing technique to avoid leaking data to the cluster head within a cluster. Furthermore, a mechanism is given to locate the compromised nodes. The analysis shows that the model is robust to many attacks and has a lower communication overhead.


Introduction
Recently, wearable wireless sensors have become powerful and are rapidly expanding in healthcare monitoring [1][2][3]. Wearable sensors can be used to collect data and transmit them to users. The data collected at nearby locations are often similar to each other, and the power of the sensors is limited. Therefore, data aggregation techniques are used to reduce the communication overhead [4,5]. In the process of data aggregation, data are aggregated by the aggregation nodes. Unfortunately, data aggregation is vulnerable to attacks, and the data are often sensitive or private. If the sensitive data are revealed, this may bring serious threats or economic loss. So, secure data aggregation plays an important role in wearable sensors.
In this paper, a secure privacy-preserving data aggregation model is proposed. The model adopts a mixed data aggregation structure of tree and cluster. Data integrity is verified both at the cluster head and at the base station. Moreover, a locating mechanism is provided, which can locate the compromised node.
The remainder of this paper is organized as follows. In Section 2, the related work is summarized. A new secure privacy-preserving data aggregation model (SPPDA) is proposed and analyzed in Section 3. In Sections 4 and 5, the security and performance of the model are analyzed. Finally, the conclusion of this paper is given.

Related Work
Recently, secure data aggregation has become an important issue for wearable sensors. Cryptography is an efficient mechanism to secure data aggregation. Moreover, homomorphic encryption allows encrypted messages from sensors to be aggregated directly without decryption, so it has a short aggregation delay.
Castelluccia et al. [6] proposed a simple and provably secure additively homomorphic stream cipher, which is slightly less efficient in bandwidth than hop-by-hop aggregation. Girao et al. [7] proposed an approach that conceals sensed and aggregated data end-to-end, which is feasible and frequently even more energy efficient than hop-by-hop encryption, which addresses a much weaker attacker model. Feng et al. [8] proposed a family of secret perturbation-based schemes, which can protect sensor data confidentiality without disrupting additive data aggregation.

Journal of Electrical and Computer Engineering
All the homomorphic encryption schemes above use symmetric keys, and their security depends on the length of the key. In contrast, the security of asymmetric-key schemes depends on the intractability of the underlying problems. So asymmetric-key schemes have also been designed.
Boneh et al. [9] proposed a homomorphic public key encryption scheme, which improved the efficiency of election systems based on homomorphic encryption. Mykletun et al. [10] revisited and investigated the applicability of additively homomorphic public-key encryption algorithms for certain classes of wireless sensor networks and provided recommendations for selecting the most suitable public key schemes for different topologies and wireless sensor network scenarios. Girao et al. [11] provided an approach for a tiny Persistent Encrypted Data Storage (tinyPEDS) of the environmental fingerprint. Bahi et al. [12] proposed a secure end-to-end encrypted data aggregation scheme, which significantly reduces computation and communication overhead and can be practically implemented in off-the-shelf sensor platforms. Ozdemir and Xiao [13] proposed a novel integrity protecting hierarchical concealed data aggregation protocol, which is more efficient than other privacy homomorphic data aggregation schemes. Lin et al. [14] proposed a new concealed data aggregation scheme, which is robust and efficient. Zhou et al. [15] proposed Secure-Enhanced Data Aggregation, which can achieve the highest security on the aggregated result compared with other asymmetric schemes.
However, the models above can at most detect the presence of compromised nodes when verifying data integrity; they cannot locate them. In this paper, we present a new secure privacy-preserving data aggregation model (SPPDA), which adopts a mixed data aggregation structure. The network is divided into clusters, and data aggregation trees are used both within clusters and between clusters. Firstly, some of the nodes adopt a slicing technique to avoid leaking data to the cluster head. Secondly, data in the cluster are aggregated and sent to the cluster head, and the cluster head verifies the data integrity to narrow down the range of the compromised node. Lastly, the cluster heads forward the data to the base station, and the data integrity is verified again at the base station. Furthermore, the model gives a mechanism to locate the compromised nodes. The analysis shows that this model has lower communication overhead.

SPPDA Model
The model uses a cluster-structured network which contains three kinds of nodes: base station, cluster heads, and cluster nodes. The network is divided into two layers: inner-cluster and intercluster. In the inner-cluster layer, data are sent to the cluster head, and the cluster head verifies the data integrity to narrow down the range of the compromised node. In the intercluster layer, data are sent to the base station, and the integrity is verified at the base station. Furthermore, a mechanism is proposed to locate the compromised node. The SPPDA model can be divided into initialization, key distribution, inner-cluster data aggregation, and intercluster data aggregation.

Initialization.
The initialization of the SPPDA model includes three parts: cluster head voting, inner-cluster data aggregation tree, and intercluster data aggregation tree.
(1) Cluster Head Voting. Using existing clustering protocols [16,17], the network can be divided into many clusters. During clustering, a trust management mechanism [18,19] can be used to help select the cluster head. Generally, the result satisfies two conditions: the cluster heads have higher trust values, and the clusters are evenly distributed in the monitoring area.
(2) Inner-Cluster Data Aggregation Tree. In each cluster, the data are sent to the cluster head along the data aggregation tree [20]. The inner-cluster data aggregation tree is built by a given data aggregation tree protocol. It satisfies two conditions: the degree of the cluster head is large enough, and the number of aggregation nodes is not more than the number of leaf nodes.
Lastly, each cluster head sets the compromising threshold $Th_{ch}$, which is used to judge whether a branch in the cluster is compromised.
(3) Intercluster Data Aggregation Tree. After the cluster heads have aggregated the data of their clusters, the data in the cluster heads are sent to the base station along the intercluster data aggregation tree. The structure of the intercluster data aggregation tree is similar to that of the inner-cluster data aggregation tree. Lastly, the base station sets the compromising threshold $Th_{ch}$, which is used to judge whether a branch of the BS is compromised.
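As an illustration, the two conditions placed on an aggregation tree can be checked mechanically. The following is a minimal sketch, assuming the tree is given as a child-to-parent map. The function name `satisfies_tree_conditions` and the parameter `min_head_degree` are hypothetical, since the text only requires the degree of the head to be "large enough", and the example cluster assumes that each branch is a chain.

```python
def satisfies_tree_conditions(parent, head, min_head_degree):
    """Check the two stated conditions for an aggregation tree.

    parent: dict mapping every non-root node to its parent (assumed encoding).
    head: the root of the tree (cluster head or base station).
    min_head_degree: assumed lower bound standing in for "large enough".
    """
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)
    # Aggregation nodes are the non-root nodes that have at least one child.
    aggregators = [n for n in parent if n in children]
    leaves = [n for n in parent if n not in children]
    head_degree = len(children.get(head, []))
    return head_degree >= min_head_degree and len(aggregators) <= len(leaves)


# A cluster with one head and two chain-shaped branches of two nodes each.
cluster = {"CN1": "CH", "CN2": "CN1", "CN3": "CH", "CN4": "CN3"}
print(satisfies_tree_conditions(cluster, "CH", min_head_degree=2))
```

The second condition (no more aggregation nodes than leaf nodes) is the one the locating mechanism later relies on when a branch is split.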

The Key Distribution.
In the SPPDA model, there are three sets of keys: the BS (base station) key, the CH (cluster head) key, and the N (neighbor) key. The BS key is generated by the base station and is used to secure the communication between the cluster heads and the base station. The CH key is generated by each cluster head and is used to secure the communication between the cluster nodes and the cluster head. The N key is generated offline and is used to secure the communication between a node and its neighbors. The structure of each key is described as follows.
(1) BS Key Distribution. The BS generates three primes $(q_1, q_2, q_3)$ and an elliptic curve $E$ of order $n = q_1 q_2 q_3$. Then, according to the degree of the BS, denoted $\mathrm{degree}_{BS}$, $\mathrm{degree}_{BS}$ groups of points $\{u_i, v_i, w_i\}_{\mathrm{degree}_{BS}}$ are selected from $E$, and the order of those points is $n$.
For each group $i$, we get three new points $P_i$, $Q_i$, and $H_i$. Here, $P_i$ is used to encrypt the aggregated data, $Q_i$ is used to record the number of the cluster, and $H_i$ is used to mix the encrypted result and enhance the security of the data.
Then, the BS gets a group of keys. The public key is $(n, P_i, Q_i, H_i, E)$ and the private key is $(q_1, q_2, q_3)$. The public key is distributed to the cluster heads in a secure way, and the private key is retained by the BS.
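The key generation above can be illustrated with a toy analogue. The sketch below is not the elliptic-curve construction of the model: it assumes a multiplicative subgroup of integers modulo a prime as a stand-in for the curve, uses insecurely small primes, and assumes one particular way of deriving the three points (raising an element of order $n$ to a product of two of the primes), chosen so that multiplying by $q_2 q_3$ isolates the data component as in the decryption step. All function and variable names are hypothetical.

```python
import math
import random


def is_prime(x):
    """Trial-division primality test, adequate for toy-sized numbers."""
    return x > 1 and all(x % d for d in range(2, math.isqrt(x) + 1))


def keygen(q1, q2, q3, seed=0):
    """Toy analogue of BS key generation in a subgroup of Z_p^* (not a curve)."""
    rng = random.Random(seed)
    n = q1 * q2 * q3
    # Find a prime p with n | p - 1 so that Z_p^* contains a subgroup of order n.
    k = 1
    while not is_prime(k * n + 1):
        k += 1
    p = k * n + 1

    def element_of_order_n():
        # Project a random element into the order-n subgroup; retry until no
        # prime factor of n reduces its order.
        while True:
            g = pow(rng.randrange(2, p - 1), (p - 1) // n, p)
            if all(pow(g, n // q, p) != 1 for q in (q1, q2, q3)):
                return g

    u, v, w = (element_of_order_n() for _ in range(3))
    P = pow(u, q2 * q3, p)  # order q1: assumed to carry the data
    Q = pow(v, q1 * q2, p)  # order q3: assumed to carry the node count
    H = pow(w, q1 * q3, p)  # order q2: assumed to carry the random mask
    public = {"p": p, "n": n, "P": P, "Q": Q, "H": H}
    private = (q1, q2, q3)
    return public, private


public, private = keygen(1009, 101, 103)
print(public["n"], private)
```

Because each derived element has prime order, raising a combined value to the product of the other two primes removes the other two components, which is the property the decryption relies on.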
(2) CH Key Distribution. After the BS generates its key, each cluster head begins to generate the CH key. For example, CH($j$) first generates three primes $(q_1^{(j)}, q_2^{(j)}, q_3^{(j)})$ and an elliptic curve $E^{(j)}$. The order of $E^{(j)}$ is $n^{(j)} = q_1^{(j)} q_2^{(j)} q_3^{(j)}$. According to the degree of CH($j$), denoted $\mathrm{degree}^{(j)}$, $\mathrm{degree}^{(j)}$ groups of points are selected from $E^{(j)}$, and the order of those points is $n^{(j)}$.
For each group $i$, we get three new points $P_i^{(j)}$, $Q_i^{(j)}$, and $H_i^{(j)}$. Here, $P_i^{(j)}$ is used to encrypt the aggregated data, $Q_i^{(j)}$ is used to record the number of the cluster, and $H_i^{(j)}$ is used to mix the encrypted result and enhance the security of the data.
Then, CH($j$) gets a group of keys. The public key is $(n^{(j)}, P_i^{(j)}, Q_i^{(j)}, H_i^{(j)}, E^{(j)})_{\mathrm{degree}^{(j)}}$ and the private key is $(q_1^{(j)}, q_2^{(j)}, q_3^{(j)})$. Lastly, the public key is distributed to the cluster nodes in a secure way, and the private key is retained by CH($j$).
(3) N Key. The N key distribution consists of five steps [21]: (1) generation of a large pool of $P$ keys and their key identifiers; (2) random drawing of $k$ keys out of $P$ without replacement to establish the key ring of a sensor; (3) loading of the key ring into the memory of each sensor; (4) saving of the key identifiers of a key ring and the associated sensor identifier on a trusted controller node; (5) for each node, loading the $i$th controller node with the key shared with that node. If two nodes cannot share a key but can be connected by a link consisting of some nodes, this link can be the secure link between these two nodes.
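The key-ring drawing and the multi-hop secure link can be sketched as follows. This is a minimal illustration under stated assumptions: keys are modelled as random 128-bit integers, the controller-node steps (4) and (5) are omitted, and the names `make_key_rings`, `shared_key_ids`, and `secure_path` are hypothetical.

```python
import random
from collections import deque


def make_key_rings(num_nodes, pool_size, ring_size, seed=0):
    """Steps (1)-(3): build a key pool and draw one key ring per sensor."""
    rng = random.Random(seed)
    pool = {key_id: rng.getrandbits(128) for key_id in range(pool_size)}
    rings = {
        node: set(rng.sample(range(pool_size), ring_size))  # without replacement
        for node in range(num_nodes)
    }
    return pool, rings


def shared_key_ids(rings, a, b):
    """Key identifiers that two nodes hold in common (may be empty)."""
    return rings[a] & rings[b]


def secure_path(rings, neighbors, src, dst):
    """Breadth-first search over links whose two endpoints share a key."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in neighbors[current]:
            if nxt not in previous and rings[current] & rings[nxt]:
                previous[nxt] = current
                queue.append(nxt)
    return None


# Nodes 0 and 2 share no key, but both share a key with node 1.
rings = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}}
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(secure_path(rings, neighbors, 0, 2))  # [0, 1, 2]
```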

Inner-Cluster Data Aggregation.
In the inner-cluster data aggregation, the cluster heads can obtain the plaintext, which leaves the data insufficiently protected. Therefore, before the inner-cluster data aggregation, the slicing and mixing scheme [22] is used in each cluster.
(1) Slicing. In each cluster, we call a node a "leaf node" if some neighbors of this node belong to other clusters. Each leaf node slices its data into two parts. One slice is sent to a neighbor node in another cluster and the other is kept by the node itself. Figure 1 shows the slicing scheme. The solid line is the route along which the data is transmitted to the cluster head. The dotted line is the route along which the leaf nodes send the slices to the neighbor nodes in other clusters. In Cluster 1, there are 4 leaf nodes: $CN_{11}$, $CN_{12}$, $CN_{13}$, and $CN_{14}$. According to the rule above, these nodes divide their data into two slices. One is kept by the node itself; the other is sent to a neighbor node in another cluster along the dotted line. $CN_{11}$ and $CN_{12}$ send their slices to neighbor nodes in other clusters not drawn in Figure 1. $CN_{13}$ sends its slice to $CN_{22}$ in Cluster 2 and receives a slice from $CN_{21}$ in Cluster 2. $CN_{14}$ sends its slice to $CN_{31}$ in Cluster 3.
(2) Mixing. After all the leaf nodes have sent their slices, every node recomputes its data. A node that receives slices adds them to its own data to obtain a new value.
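The effect of slicing and mixing can be illustrated with a short sketch: individual values seen by a cluster head change, while the network-wide sum is preserved. The sketch assumes non-negative integer readings, a uniformly random split, and that each node sends at most one slice. The readings and the function name `slice_and_mix` are hypothetical, while the three links follow the description of Figure 1.

```python
import random


def slice_and_mix(readings, slice_links, seed=0):
    """Split each sender's reading in two and add the sent slice at the receiver.

    readings: dict node -> non-negative integer reading (hypothetical values).
    slice_links: list of (sender, receiver) pairs in different clusters;
                 each sender is assumed to appear at most once.
    """
    rng = random.Random(seed)
    mixed = dict(readings)
    for sender, receiver in slice_links:
        piece = rng.randint(0, readings[sender])  # slice that leaves the cluster
        mixed[sender] -= piece                    # slice kept by the sender
        mixed[receiver] += piece                  # mixing at the receiver
    return mixed


readings = {"CN13": 40, "CN14": 25, "CN21": 30, "CN22": 12, "CN31": 18}
links = [("CN13", "CN22"), ("CN21", "CN13"), ("CN14", "CN31")]
mixed = slice_and_mix(readings, links)
print(sum(mixed.values()) == sum(readings.values()))  # True
```

Since a cluster head only ever sees the mixed values of its own cluster, it cannot recover the original reading of a node that exchanged slices with another cluster.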
After the slicing and the mixing, the data $m_t^{(j)}$ is encrypted into $C_t^{(j)}$ according to formula (3) at each cluster node in cluster $j$. Here, $+$ is the addition on the elliptic curve, $\times$ is the scalar multiplication on the elliptic curve, and $r_t^{(j)}$ is random. Then, the encrypted data is transmitted to the cluster head, and the data are aggregated by the intermediate nodes.
In the aggregation of the $i$th branch in cluster $j$, $\sum_t m_t^{(j)}$ is the aggregated plaintext of branch $i$, $N_i^{(j)}$ is the number of the nodes in branch $i$, $\sum_t r_t^{(j)}$ is the aggregate of the random values, and $C_{i,\mathrm{agg}}^{(j)}$ is the ciphertext of the aggregation in branch $i$.
The cluster head in cluster $j$ receives the aggregation of each branch. Then, the cluster head decrypts $C_{i,\mathrm{agg}}^{(j)}$ of each branch using the private key to obtain the plaintext $M_i^{(j)}$, where $\hat{P}_i^{(j)} = q_2^{(j)} q_3^{(j)} \times P_i^{(j)}$. The cluster head judges whether the result of each branch is compromised according to the threshold $Th_{ch}$. If a branch is compromised, the locating mechanism is used to locate the compromised node. If not, the aggregation continues.
Here, $N^{(j)}$ is the number of the cluster nodes in cluster $j$, and $r$ is random.
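The encrypt, aggregate, and decrypt cycle of a branch can be illustrated with a toy analogue. This sketch is not the elliptic-curve scheme itself: it assumes a multiplicative subgroup of integers modulo a prime in place of the curve, insecurely small primes, and a ciphertext of the form $m \times P + 1 \times Q + r \times H$ (written multiplicatively), which is one reading consistent with the aggregated quantities listed above. How the threshold test is evaluated is not spelled out in the text, so comparing the decrypted count with the expected branch size is also an assumption. All names are hypothetical.

```python
import math
import random

Q1, Q2, Q3 = 1009, 101, 103  # toy private key; far too small to be secure
N = Q1 * Q2 * Q3


def is_prime(x):
    """Trial division, adequate for these toy sizes."""
    return x > 1 and all(x % d for d in range(2, math.isqrt(x) + 1))


def setup(seed=1):
    """Return (p, P, Q, H): a prime modulus and elements of order Q1, Q3, Q2."""
    rng = random.Random(seed)
    k = 1
    while not is_prime(k * N + 1):
        k += 1
    p = k * N + 1

    def order_n_element():
        while True:
            g = pow(rng.randrange(2, p - 1), (p - 1) // N, p)
            if all(pow(g, N // q, p) != 1 for q in (Q1, Q2, Q3)):
                return g

    P = pow(order_n_element(), Q2 * Q3, p)  # order Q1, carries the data
    Q = pow(order_n_element(), Q1 * Q2, p)  # order Q3, counts contributions
    H = pow(order_n_element(), Q1 * Q3, p)  # order Q2, random mask
    return p, P, Q, H


def encrypt(pub, m, rng):
    """Toy analogue of m x P + 1 x Q + r x H, written multiplicatively."""
    p, P, Q, H = pub
    r = rng.randrange(1, Q2)
    return pow(P, m, p) * Q * pow(H, r, p) % p


def aggregate(pub, ciphertexts):
    """Aggregation is the group operation applied to all ciphertexts."""
    acc = 1
    for c in ciphertexts:
        acc = acc * c % pub[0]
    return acc


def dlog(base, target, p, bound):
    """Brute-force discrete logarithm, feasible only because values are small."""
    x = 1
    for e in range(bound):
        if x == target:
            return e
        x = x * base % p
    raise ValueError("value out of range")


def decrypt(pub, c):
    """Recover (sum of data, number of contributions) from an aggregate."""
    p, P, Q, H = pub
    total = dlog(pow(P, Q2 * Q3, p), pow(c, Q2 * Q3, p), p, Q1)
    count = dlog(pow(Q, Q1 * Q2, p), pow(c, Q1 * Q2, p), p, Q3)
    return total, count


pub = setup()
rng = random.Random(7)
readings = [21, 35, 18]
branch = aggregate(pub, [encrypt(pub, m, rng) for m in readings])
print(decrypt(pub, branch))  # (74, 3)
```

In this toy setting, a ciphertext injected into the branch raises the decrypted count above the number of nodes the cluster head recorded for that branch, which is the kind of discrepancy an integrity check can detect.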

Intercluster Data Aggregation.
After the inner-cluster data aggregation, the encrypted data is transmitted to the base station, and the data are aggregated by the intermediate nodes. In the aggregation of the $i$th branch of the base station, $\sum_j m_j$ is the aggregated plaintext of branch $i$, $N_i$ is the number of the nodes in branch $i$, $\sum_j r_j$ is the aggregate of the random values, and $C_{i,\mathrm{agg}}$ is the ciphertext of the aggregation in branch $i$.
The base station receives the aggregation of each branch. Then, the base station decrypts $C_{i,\mathrm{agg}}$ of each branch using the private key to obtain the plaintext $M_i$, where $\hat{P}_i = q_2 q_3 \times P_i$.
The base station judges whether the result of each branch is compromised according to the threshold $Th_{ch}$. If a branch is compromised, the locating mechanism is used to locate the compromised node. If not, the aggregation continues.
The cluster head gets the plaintext of the aggregation result in the cluster, where $\hat{Q}_i = q_1 q_2 \times Q_i$.
Figure 2 shows the locating mechanism in a cluster. In the left part of Figure 2, the CH finds a branch that contains the compromised nodes, shown in red. So, this branch needs to be reconstructed. CH 1 and CH 4 are its two intermediate nodes. Therefore, this branch is divided into two new branches, in which CH 1 and CH 4 remain intermediate nodes but now belong to different branches. Then, these two branches transmit the data to the CH according to the rule described in inner-cluster data aggregation, and the CH checks their integrity. If a branch is still compromised, the only intermediate node in this branch is the compromised node.

A Case Study.
In this section, we give a detailed example of the SPPDA model covering initialization, key distribution, inner-cluster data aggregation, and intercluster data aggregation.
(1) Initialization. In Figure 3, there are 25 sensor nodes distributed in the monitored area, and the base station is located on the left of the monitored area. These nodes are divided into 5 clusters. Then, the inner-cluster data aggregation tree and the intercluster data aggregation tree are constructed. In the intercluster data aggregation tree, there are 2 branches from the BS, $BSB_1$ and $BSB_2$. $BSB_1$ consists of BS, $CH_1$, $CH_2$, and $CH_3$. $BSB_2$ consists of BS, $CH_4$, and $CH_5$. In each cluster, there are 4 CNs and 1 CH, and the cluster nodes are divided into 2 branches. Using the $i$th cluster as an example, the branches are $CB_{i1}$ and $CB_{i2}$. $CB_{i1}$ consists of $CH_i$, $CN_{i1}$, and $CN_{i2}$. $CB_{i2}$ consists of $CH_i$, $CN_{i3}$, and $CN_{i4}$. When the data aggregation trees are completed, each CH records the number of CNs in its cluster, and the BS records the number of CHs in the network.
(3) Inner-Cluster Data Aggregation. Firstly, the edge nodes are confirmed in each cluster by its CH. In this case, the edge nodes are $CN_{13}$, $CN_{14}$, $CN_{22}$, $CN_{24}$, $CN_{32}$, $CN_{33}$, $CN_{41}$, $CN_{42}$, $CN_{52}$, and $CN_{53}$. Secondly, each edge node generates a slice from its data. Then, each edge node sends its slice to a randomly chosen neighbor that belongs to a different cluster.

Table 1: The values of the major parameters.

Figure 4 shows the process of the slicing. The solid lines represent the inner-cluster data aggregation tree, and the dashed lines represent the flow of the slices. After slicing, the nodes which receive the slices add them to their data. In Table 2,

Chosen-Plaintext Attack.
In a chosen-plaintext attack, attackers can obtain some plaintexts and the corresponding ciphertexts. Attackers want to recover the secret key by analyzing these pairs so that other ciphertexts can be cracked rapidly with this secret key.
The SPPDA model uses elliptic curve encryption with three parameters, one of which is used to add a random disturbance. In this way, even identical plaintexts can be encrypted into different ciphertexts. So, no matter how many plaintext-ciphertext pairs the attackers obtain, they cannot recover the secret key by analyzing them.

Data Injection Attack.
In a data injection attack, the attackers send unauthorized data to the aggregation node. If the aggregation node aggregates this data, the result will differ from the real result, so the base station gets a faulty result.
The SPPDA model uses elliptic curve encryption, so a valid ciphertext must conform to the structure of the elliptic curve encryption. If the attackers send data that does not conform to this structure, the aggregation node can easily recognize and remove it.
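One simple structural test an aggregation node could apply is to check that a received value is a point on the agreed curve. The text does not state how the recognition is done, so the following is only a sketch under that assumption, using a short Weierstrass curve $y^2 = x^3 + ax + b$ over a small prime field. The curve parameters and function names are hypothetical.

```python
def on_curve(point, a, b, p):
    """True if point satisfies y^2 = x^3 + a*x + b (mod p).

    None stands for the point at infinity, which is accepted.
    """
    if point is None:
        return True
    x, y = point
    in_range = 0 <= x < p and 0 <= y < p
    return in_range and (y * y - (x * x * x + a * x + b)) % p == 0


def drop_malformed(points, a, b, p):
    """Keep only the received values that are valid curve points."""
    return [pt for pt in points if on_curve(pt, a, b, p)]


# Toy curve y^2 = x^3 + 2x + 3 over GF(97); (3, 6) and (3, 91) lie on it.
received = [(3, 6), (3, 7), (3, 91), (200, 1)]
print(drop_malformed(received, 2, 3, 97))  # [(3, 6), (3, 91)]
```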

Aggregation Node Compromised Attack.
In the node compromise attack model, attackers can compromise some aggregation nodes in the wearable sensor network. Then, attackers obtain the keys of these nodes and perform unauthorized aggregation, so the base station gets a faulty result.
The SPPDA model verifies the data integrity both at the cluster heads and at the base station. If an aggregation node in a cluster is compromised, the cluster head can recognize the fault in the branch that contains the compromised node. If a cluster head is compromised, the base station can recognize the fault in the branch that contains the compromised cluster head. Then, the cluster head or the base station uses the locating mechanism to locate the compromised node and remove it.

Performance Analysis
In this section, the computation overhead and the communication overhead of the SPPDA model are analyzed and compared with those of the IPHCDA model [13].

The Computation Overhead.
The computation overhead includes encryption, aggregation, and decryption. We denote the overhead of addition, scalar multiplication, MAC, XOR, and decryption as $T_{\mathrm{Add}}$, $T_{\mathrm{Mul}}$, $T_{\mathrm{MAC}}$, $T_{\oplus}$, and $T_{\log}$, respectively; $k$ is the number of clusters, and $N$ is the number of nodes in the wearable sensor network. In general, the computation overhead of the IPHCDA model is lower than that of the SPPDA model in encryption and decryption. The computation overhead of the SPPDA model is lower than that of the IPHCDA model in aggregation. But, there are two aspects not described in Table 7.
(1) The orders of the elliptic curves are not the same in the two models. The order in IPHCDA is larger than in SPPDA, so $T_{\mathrm{Add}}$, $T_{\mathrm{Mul}}$, and $T_{\log}$ in the IPHCDA model are larger. (2) The extra computation overhead in the SPPDA model is shared by the whole network, so the average overhead of each node is lower. So, the computation overheads of the two models are almost the same.

The Communication Overhead.
In this section, the communication overheads of the SPPDA model and the IPHCDA model are compared. It is assumed that the two models are used in the same network structure. Therefore, comparing the communication overhead amounts to comparing the length of the ciphertext.
It is assumed that $l$ is the length of each prime in both models, and the number of clusters in the network is $k$. So the length of ciphertext in the IPHCDA model is $(k + 1)l$, and the length of ciphertext in the SPPDA model is $3l$. In general case,

Conclusion
In this paper, we present a new secure privacy-preserving data aggregation model, which adopts a mixed data aggregation structure of tree and cluster. The proposed model verifies the data integrity both at the cluster heads and at the base station. Meanwhile, the model gives a mechanism to locate the compromised nodes. Lastly, the detailed analysis shows that this model is robust to many attacks and has lower communication overhead.

Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.

Figure 4: The slices of the edge nodes.
Locating Mechanism.
The locating mechanism is used to locate the compromised nodes among the intermediate nodes. It works as follows. We assume that the numbers of leaf nodes and intermediate nodes in a branch are $n_l$ and $n_a$, respectively, so that $n_l \geq n_a$. The branch which does not pass the integrity verification is reconstructed into $n_a$ branches, each containing exactly one intermediate node. The intermediate nodes of the new branches are the same as those of the old branch. The data integrity is then verified at the root node. If a branch does not pass the verification, the intermediate node in this branch is a compromised node, and the locating mechanism ends.
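The reconstruction step can be sketched as follows. The sketch assumes that leaf nodes are assigned to the new branches in round-robin order, which the text does not specify, and that the integrity verification at the root is available as a callback. The names `locate_compromised` and `verify` are hypothetical.

```python
def locate_compromised(intermediates, leaves, verify):
    """Split a failed branch so each new branch has one intermediate node.

    intermediates: intermediate (aggregation) nodes of the failed branch.
    leaves: leaf nodes of the failed branch; must be at least as many.
    verify: hypothetical callback (intermediate, leaf_list) -> bool standing
            for re-running aggregation and the integrity check at the root.
    Returns the intermediate nodes whose new branch fails verification.
    """
    if len(leaves) < len(intermediates):
        raise ValueError("needs at least as many leaf nodes as intermediate nodes")
    branches = {node: [] for node in intermediates}
    for index, leaf in enumerate(leaves):
        # Round-robin assignment gives every intermediate node a leaf.
        branches[intermediates[index % len(intermediates)]].append(leaf)
    return [node for node, members in branches.items() if not verify(node, members)]


compromised = {"B"}
result = locate_compromised(
    ["A", "B"], ["L1", "L2", "L3"], lambda node, members: node not in compromised
)
print(result)  # ['B']
```

The requirement that leaf nodes are at least as numerous as intermediate nodes is what guarantees that every new branch still has data to aggregate.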

A ciphertext-only attack is a basic attack in wearable sensors. With this attack, attackers can only try to recover the plaintext by analyzing the ciphertext. The SPPDA model uses elliptic curve cryptography, which is an asymmetric encryption model. Its security is based on the intractability of factoring the product of large prime numbers. So the SPPDA model can resist this attack well as long as suitable prime numbers are used.

Table 7: The computation overhead of IPHCDA and SPPDA.

Table 7 shows the computation overhead of the IPHCDA model and the SPPDA model. In the encryption operation, the IPHCDA model needs two $T_{\mathrm{Mul}}$ and one $T_{\mathrm{Add}}$ in each node, while the SPPDA model needs three $T_{\mathrm{Mul}}$ and two $T_{\mathrm{Add}}$. In the aggregation operation, the IPHCDA model needs $(N - 1)$ times $T_{\mathrm{Add}}$ together with additional $T_{\mathrm{MAC}}$ and $T_{\oplus}$ operations, while the SPPDA model only needs $(N - 1)$ times $T_{\mathrm{Add}}$. The number of XOR operations is decided by the structure of the aggregation tree; it is no less than 1 and no more than $N - 1$. In the decryption operation, the IPHCDA model needs $k$ times $T_{\log}$, while the SPPDA model needs 2 times $T_{\log}$.

Table 8: The length of ciphertext in two models ($l = 256$, unit is bit).

$l = 256$ is safe enough for a ciphertext, and Table 8 shows the comparison of the length of ciphertext in the two models when $l = 256$. In Table 8, the length of ciphertext increases with $k$ in the IPHCDA model, while the length of ciphertext in the SPPDA model stays constant at 768 as $k$ increases. So, when $k > 2$, the length of ciphertext in the IPHCDA model is larger than that in the SPPDA model, which means that the communication overhead of the IPHCDA model is larger. In practice, a cluster-based network usually consists of many clusters. Therefore, the SPPDA model has lower communication overhead.
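The comparison can be reproduced with a few lines of arithmetic. The sketch reads the two stated ciphertext lengths as $(k + 1)l$ for IPHCDA and $3l$ for SPPDA, which agrees with the constant 768 bits quoted for $l = 256$. The function name `ciphertext_bits` is hypothetical.

```python
def ciphertext_bits(k, l=256):
    """Ciphertext length in bits for k clusters and primes of l bits."""
    return {"IPHCDA": (k + 1) * l, "SPPDA": 3 * l}


for k in (1, 2, 3, 5, 10):
    lengths = ciphertext_bits(k)
    print(k, lengths["IPHCDA"], lengths["SPPDA"])
```

The two lengths coincide at $k = 2$, and the IPHCDA ciphertext grows by $l$ bits for every additional cluster beyond that.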