Secure and Efficient Cluster-Based Range Query Processing in Wireless Sensor Networks

In wireless sensor networks, preserving privacy has become increasingly important and has attracted growing attention. Protecting data and sensor privacy while collecting and computing query results is a challenge. In cluster-based sensor networks, when a user queries sensitive data, adversaries can monitor the originating node or obtain the data at a cluster node. To deal with this problem, we propose a secure and efficient scheme for cluster-based query processing in wireless sensor networks. To preserve the location privacy of sensors, we use an anonymity method to confuse adversaries. To protect the sensitive data, we use the prefix membership verification method to prevent adversaries from obtaining sensitive messages collected by sensor nodes. We also analyze the security and communication cost. The results show that our scheme can efficiently protect privacy in query processing.


Introduction
Wireless sensor networks have been widely deployed in various applications, such as monitoring the environment, collecting temperature data, and gathering battlefield information. Each sensor node transmits sensed data to a base station for further processing. In some applications, clustering has been extensively studied [1] and used to organize sensor nodes, and it is considered a useful approach. Nodes are grouped into clusters such that each sensor node sends data to the cluster head of its own cluster. Many clustering applications aim at enhancing energy efficiency and extending the network lifetime in wireless sensor networks.
In wireless sensor networks, when sensor nodes collect information about our daily life, we should pay attention to protecting data privacy and security. For instance, a user wants to query sensitive data from a certain sensor node according to his interests. The sensor network may leak private information about the user's interests to an adversary who can obtain the content of the queried data. Meanwhile, the adversary can monitor the frequency of queries to analyze the user's preferences and find the related sensor nodes. Then the adversary may attack and compromise the related nodes. A compromised node may respond to a query and send fake data to the user, which conflicts with the privacy requirement. So query processing brings serious security challenges.
When users monitor events or analyze sensed data, data query becomes an important operation in wireless sensor networks. Recently, many existing privacy techniques have been employed in sensor network scenarios. For example, a target region transformation technique [2], range query [3][4][5][6], and top-k query [7,8] have been well addressed. However, these schemes are not suitable for cluster-based query processing in wireless sensor networks. Moreover, many techniques do not consider the computing power, energy, and capacity of sensors, which are the limiting factors in wireless sensor networks. Because resources are limited, it is important to make a trade-off between privacy preservation and communication overhead.
Based on the above discussion, in this paper, we propose a novel cluster-based privacy-preserving query processing scheme for wireless sensor networks. We consider the privacy issue when processing data queries in wireless sensor networks. If a user wants to query information from the cluster-based sensor network, we use the anonymity method and the prefix membership verification scheme [9,10] to protect the sensitive data against adversaries. When a cluster head receives a query message, the cluster head randomly chooses several cluster members, including the real queried node. So, it is unlikely that the adversary can monitor the real frequency of queries in a cluster. Therefore, the adversary cannot learn the user's interests by analyzing the frequency of queries or find the location of the real source node. Meanwhile, cluster members encode their sensed data and send them to their cluster head. The cluster heads can correctly process data queries over encoded data without knowing their real values, and an adversary cannot learn the query results at the cluster heads. In our scheme, we strike a balance between data confidentiality and query efficiency.
The rest of the paper is organized as follows. Section 2 gives the related work and the previously proposed techniques for data query. In Section 3, we describe the system model and the security model. Then, we present our secure and efficient cluster-based query processing scheme in Section 4. In Section 5, we present the security analysis and performance analysis. Finally, we conclude in Section 6.

Related Work
Protecting the privacy of the querying region in wireless sensor networks has drawn attention recently [2]. In [2], a querying region transformation technique is proposed to obfuscate the target region of the query according to predefined transformation functions.
The transformation function maps one region into m regions so that the target region cannot be distinguished from the other uninteresting regions. The transformation functions include uniform, randomized, and hybrid functions.
A secure and efficient range query processing scheme, called SafeQ, is proposed in [6]. It uses prefix membership verification and neighborhood chains to encode both data and queries such that a storage node can correctly process encoded queries over encoded data without knowing their values. The prefix membership verification converts the verification of whether a number is in a range to several verifications of whether two numbers are equal. The neighborhood chains allow a sink to verify whether the result of a query contains exactly the data items that satisfy the query. The SafeQ scheme can preserve privacy and integrity for processing range queries in two-tiered sensor networks.
Privacy-preserving range query has been widely studied in two-tiered wireless sensor networks, and many range query schemes have been proposed to protect the privacy of range queries. CSRQ [3] employs an encoding mechanism and an encrypted constraint chain to preserve data privacy and query result integrity. In [11], Zhang et al. provided an efficient secure range query protocol. In their scheme, different sensor nodes have different hash functions to encode data items for the protection of data privacy, and the correlation among data is used for verification of the result.
In [12,13], two optimized schemes based on the bucketing technique are proposed; they verify the completeness of query results while reducing the communication overhead between sensors and storage nodes. In their scheme, each sensor node broadcasts a bit map to the nearby sensors, which indicates which buckets contain data. In each sensor node, the collected data items and the received bit maps are encrypted together. The sink can verify the completeness of the query result for a sensor by examining the bit maps. However, compromised storage nodes can estimate the values of data items from the bucketing technique that is used to achieve data privacy.
Privacy-preserving max/min query schemes in two-tiered sensor networks are proposed in [14][15][16][17], which use the prefix membership verification scheme to privately compute the maximum or minimum data item. However, these schemes are not suitable for cluster-based sensor networks, because power and storage are limited in cluster heads.

Network and Adversary Models
3.1. Network Model. Sensor networks consist of a number of different types of sensor nodes that are deployed in an area to monitor the environment or collect data and send information to the sink. Nodes are organized into clusters. A cluster head is selected in each cluster to receive and query data from cluster members. In each cluster, every sensor sends data to its cluster head. The sink, which has abundant storage, energy, and computation resources, collects the data.
In this paper, we assume that sensor nodes are evenly deployed in the sensor network and do not move after being deployed. All of the sensors have roughly the same capabilities, power sources, and expected lifetimes. The users access the sensor network through the sink. The sink translates a query from a user into multiple queries, which are sent to the cluster heads. The cluster heads process the queries and return the query results to the sink. The sink combines all results into a final query result and sends the final result back to the user. That is, when a user makes a query request, the sink sends the query request to each cluster head, and the cluster heads collect all results and send them back to the sink. The results are forwarded through the routing strategy adopted by the sensor network.

Adversary Model.
For various kinds of wireless sensor networks, we assume that an adversary is a motivated and funded attacker whose objective is to learn sensitive data. The adversary has unbounded energy resources, adequate computation capability, and sufficient memory for data storage. The adversary can use the leaked sensitive data to threaten the sensor network, for example, a health monitoring network. For a user's query, the adversary tries to generate a fake message and send it back to the user.
Meanwhile, the adversary wants to learn the user's interests and the frequency of queries in clusters. He wants to find the location information of queried nodes.
The adversary may stay near the cluster to monitor and eavesdrop constantly. When the adversary monitors a message in a cluster, he will know the location of a sensor node. If the frequency of transmitted messages is high, the adversary will infer that a certain sensor node is important to the user. When the adversary compromises the sensor node, the compromised node will send fake data to the user.
Journal of Electrical and Computer Engineering

Secure and Efficient Cluster-Based Query Processing Scheme
In this section, we propose a scheme for privacy-preserving query processing in cluster-based sensor networks. Each cluster head collects the data from the sensor nodes in its cluster.
To preserve privacy, sensor nodes encrypt or encode their collected data, for example, with the DES algorithm, so that the adversary cannot obtain the content of transmitted data.

4.1. The Basic Idea. To preserve privacy in query processing, we propose a secure and efficient cluster-based query processing scheme for wireless sensor networks. After the sensor network is deployed, the cluster heads are randomly chosen. Then the cluster heads broadcast their join messages. When a node first receives a join message from a cluster head, it replies to the cluster head and joins the cluster. The cluster head records the sensor's ID. Meanwhile, the sink records the IDs of all cluster heads.
However, if a cluster head has little remaining energy, it randomly selects one of its members as the new cluster head in the cluster. The new cluster head records the IDs of all members in the cluster. Then, the sink replaces the ID of the previous cluster head with the ID of the new cluster head.
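The head-replacement rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `Cluster`, `rotate_head`, and `sink_registry`, the numeric energy `threshold`, and the choice to keep the old head as an ordinary member are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cluster:
    head_id: int
    member_ids: set = field(default_factory=set)


def rotate_head(cluster, sink_registry, energy, threshold, rng=random):
    """Replace the cluster head when its remaining energy is below threshold.

    energy maps node ID -> remaining energy; threshold is an assumed cutoff.
    """
    if energy[cluster.head_id] >= threshold or not cluster.member_ids:
        return cluster.head_id  # the current head keeps its role
    old_head = cluster.head_id
    # The old head randomly selects one of its members as the new head.
    new_head = rng.choice(sorted(cluster.member_ids))
    # Assumption: the old head stays in the cluster as an ordinary member.
    cluster.member_ids.remove(new_head)
    cluster.member_ids.add(old_head)
    cluster.head_id = new_head
    # The sink replaces the previous head's ID with the new head's ID.
    sink_registry.discard(old_head)
    sink_registry.add(new_head)
    return new_head
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible in simulations.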
When a user wants to obtain the value of a sensor node $s_i$, he sends a query to the sink. The sink sends the query message to the cluster head whose cluster includes the sensor node $s_i$.
The cluster head randomly selects several cluster members, including the real queried node $s_i$, and obtains the sensed data from them. This prevents the adversary from monitoring the real frequency of queries in a cluster. We assume that each sensor $s_i$ shares a secret key $k_i$ with the sink. A sensor $s_i$ encrypts its $n$ sensed data items $d_1, d_2, \ldots, d_n$ using key $k_i$ in time slot $t$, the result of which is denoted as $(d_1)_{k_i}, (d_2)_{k_i}, \ldots, (d_n)_{k_i}$. The sensor also encodes its data items, and the sink encodes the query range $[a, b]$; a data item $d_j$ is in the range $[a, b]$ if and only if $C(E(d_j), G([a, b]))$ is true, where $E$ denotes the encoding of the data, $G$ the encoding of the range, and $C$ a secret comparison method. Then, the sink decides whether $(d_j)_{k_i}$ should be included in the query result. Meanwhile, given $E(d_j)$ and $(d_j)_{k_i}$, it is infeasible for the cluster head to compute $d_j$ ($1 \le j \le n$). This condition guarantees query privacy. Figure 1 illustrates the basic idea of the cluster-based query processing scheme.

Prefix Membership Verification.
We protect privacy in query processing by using the prefix membership verification scheme, which is first introduced in [8] and later formalized in [9]. In the prefix membership verification scheme, the key idea is to convert the verification of whether a number is in a range to several verifications of whether two numbers are equal. A $k$-prefix has the form $\{0, 1\}^k \{*\}^{w-k}$, that is, $k$ leading bits that are each 0 or 1, followed by $w - k$ *s. For instance, 101* is a 3-prefix and it denotes the range [1010, 1011].
The prefix family $F(x)$ of a $w$-bit binary number $x$ consists of the prefixes that cover $x$. A range $[d_1, d_2]$ is converted to a set of prefixes $S([d_1, d_2])$ whose union equals the range; the number of prefixes in $S([d_1, d_2])$ is at most $2w - 2$ [18], where $d_1$ and $d_2$ are two numbers of $w$ bits. For example, $S([9, 15]) = \{1001, 101*, 11**\}$. We compute the prefix family $F(x)$ of number $x$ and translate the range $[d_1, d_2]$ into $S([d_1, d_2])$; then $x \in [d_1, d_2]$ if and only if $F(x) \cap S([d_1, d_2]) \neq \emptyset$. To check whether $F(x) \cap S([d_1, d_2]) \neq \emptyset$, we use operations that verify whether two numbers are equal.
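The following Python sketch illustrates prefix membership verification on small numbers. It is a hedged illustration rather than the paper's code: the function names are hypothetical, and the numericalization rule used here (append a 1 after the fixed bits and replace every * with 0, giving a $(w+1)$-bit number) is the convention commonly used in the prefix membership verification literature, assumed here because this section does not spell it out.

```python
def prefix_family(x, w):
    """Return F(x): the w + 1 prefixes that cover the w-bit number x."""
    bits = format(x, "0{}b".format(w))
    return {bits[:k] + "*" * (w - k) for k in range(w, -1, -1)}


def range_to_prefixes(lo, hi, w):
    """Return S([lo, hi]): a minimal set of prefixes whose union is [lo, hi]."""
    prefixes = set()
    while lo <= hi:
        # Largest aligned power-of-two block that starts at lo ...
        size = lo & -lo if lo else 1 << w
        # ... shrunk until it fits inside what is left of the range.
        while size > hi - lo + 1:
            size >>= 1
        k = w - size.bit_length() + 1  # number of fixed leading bits
        bits = format(lo, "0{}b".format(w))
        prefixes.add(bits[:k] + "*" * (w - k))
        lo += size
    return prefixes


def numericalize(prefix):
    """Map a prefix to a unique (w + 1)-bit number (assumed convention)."""
    fixed = prefix.rstrip("*")
    stars = len(prefix) - len(fixed)
    return int(fixed + "1" + "0" * stars, 2)


def in_range(x, lo, hi, w):
    """x is in [lo, hi] iff N(F(x)) and N(S([lo, hi])) share a number."""
    fx = {numericalize(p) for p in prefix_family(x, w)}
    s = {numericalize(p) for p in range_to_prefixes(lo, hi, w)}
    return bool(fx & s)
```

With `w = 4`, `range_to_prefixes(9, 15, 4)` reproduces the example set {1001, 101*, 11**}, and `in_range` reduces the range test to equality checks between numbers, which is what allows the comparison to be carried out later on hashed values.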

Data Collection.
To protect sensitive data, sensor nodes send the sensed data to cluster heads and the sink in a secure way. We assume that $b_1$ and $b_2$ denote the lower bound and the upper bound, respectively, the values of which are known to both the sensors and the sink. We also assume that sensor $s_i$ collects data items $d_j$ ($1 \le j \le n$) at a time slot $t$, and each data item $d_j$ is in the range $[b_1, b_2]$. When a sensor node $s_i$ collects data, it sends the sensitive data by the following steps:
(1) Sort the $n$ data items, $b_1$, and $b_2$ in ascending order. We assume $b_1 \le d_1 \le \cdots \le d_n \le b_2$.
(2) Convert the ranges between the sorted values into prefixes.
(3) Numericalize all prefixes.
(4) Compute the keyed-hash message authentication code (HMAC) [6,20] of each numericalized prefix using key $g$, which is shared by all nodes and the sink. An HMAC function using key $g$ is denoted as $HMAC_g(\cdot)$.
(5) Encrypt every data item $d_j$ to $(d_j)_{k_i}$ using key $k_i$.
(6) Sensor $s_i$ sends the resulting packet, which carries the HMAC values and the encrypted data items, to its cluster head (CH).
Because the HMAC function has the one-wayness and collision resistance properties, and the data items are encrypted, the cluster head cannot obtain the real values of the data items.
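As a rough illustration of the HMAC step, the Python sketch below hashes numericalized prefixes under a shared key and tests membership by set intersection on the digests only. The assumptions are explicit: SHA-256 as the underlying hash, the byte-string key `g`, the helper names, and the hand-computed numericalized prefixes for $w = 4$ (a 1 appended after the fixed bits, then 0s) are all illustrative choices, and the encryption of the data items is not shown.

```python
import hashlib
import hmac


def hmac_g(key, number):
    """HMAC of one numericalized prefix under key g (SHA-256 is an assumption)."""
    return hmac.new(key, str(number).encode(), hashlib.sha256).digest()


def encode(key, numbers):
    """Apply HMAC_g to every numericalized prefix in a set."""
    return {hmac_g(key, n) for n in numbers}


g = b"shared-key-g"  # hypothetical key shared by all nodes and the sink

# Numericalized prefixes written out by hand for w = 4.
# S([9, 15]) = {1001, 101*, 11**} -> {10011, 10110, 11100}
s_9_15 = {0b10011, 0b10110, 0b11100}
# F(10) = {1010, 101*, 10**, 1***, ****}
f_10 = {0b10101, 0b10110, 0b10100, 0b11000, 0b10000}
# F(8) = {1000, 100*, 10**, 1***, ****}
f_8 = {0b10001, 0b10010, 0b10100, 0b11000, 0b10000}

sensor_side = encode(g, s_9_15)                 # HMAC values carried in a packet
match_10 = bool(encode(g, f_10) & sensor_side)  # 10 lies in [9, 15]
match_8 = bool(encode(g, f_8) & sensor_side)    # 8 does not
```

A party without the key `g` sees only fixed-length digests, yet equal numericalized prefixes still produce equal digests, so the intersection test works without exposing the underlying values.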

Filter and Query Processing.
When a cluster head receives the collected packets from the members of its cluster, it filters the packets by sensor ID. In the query phase, the cluster head randomly selects several cluster members, including the real queried node $s_i$, and obtains the sensed data from them. This prevents the adversary from monitoring the real frequency of queries in a cluster. Therefore, in the submission phase, the cluster head needs to filter out the useless packets and keep the real packet. Then, the cluster head transmits the real packet to the sink.
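The random selection and the subsequent filtering at the cluster head can be sketched in Python as follows. This is a minimal sketch under assumptions: the function names are hypothetical, packets are modeled as a dictionary keyed by sensor ID, and `n_r` stands for the number of nodes the cluster head chooses to query.

```python
import random


def select_query_set(member_ids, queried_id, n_r, rng=random):
    """Pick n_r members at random, always including the real queried node."""
    decoys = [m for m in member_ids if m != queried_id]
    chosen = rng.sample(decoys, n_r - 1) + [queried_id]
    rng.shuffle(chosen)  # avoid revealing the real node by its position
    return chosen


def filter_packets(packets, queried_id):
    """Drop decoy packets and keep the one from the real queried node."""
    return packets[queried_id]
```

An eavesdropper near the cluster sees `n_r` nodes answering each query, so the observed per-node query frequency no longer identifies the node the user is actually interested in. A larger `n_r` hides the queried node better but costs more transmissions.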
At the sink, the query range $[a, b]$ is first converted: the sink computes the prefix families $F(a)$ and $F(b)$. After numericalizing all prefixes as $N(F(a))$ and $N(F(b))$, the sink applies $HMAC_g$ to each numericalized prefix, which gives $HMAC_g(N(F(a)))$ and $HMAC_g(N(F(b)))$. When the sink receives a packet from a cluster head, it processes the packet based on the query range $[a, b]$ using the following theorem [21].
Theorem 1. Given $n$ numbers sorted in ascending order … that the following two conditions hold:
(1) $HMAC_g(N(F(a))) \cap HMAC_g(N(S([\,\ldots\,])))$ …
According to Theorem 1, the sink selects the smallest $n_1$ and the largest $n_2$ …
Based on the aforementioned description, Algorithm 1 shows the secure and efficient cluster-based query processing scheme. When a user wants to check whether the sensed data of a certain node is in the range $[a, b]$ at a time slot $t$, the user sends a query message to the sink. The sink then relays the message to the cluster head whose cluster includes the queried node. The cluster head randomly chooses several nodes, including the queried node. In the collection and submission phase, when a node collects the sensed data, it processes the data using the PMV and HMAC schemes. Then, the node sends the secure packet to the cluster head. After the cluster head filters out the useless packets, it sends the useful packet to the sink. Finally, the sink processes the packet and sends the final result to the user.

Performance Analysis
To protect privacy, we propose a secure and efficient query processing scheme that prevents an adversary from obtaining the sensitive data or finding the user's interests and the location of sensor nodes in cluster-based sensor networks. In this section, we present the privacy analysis and the communication overhead analysis. The following analysis shows that our scheme provides better network security with low communication overhead.

Privacy Analysis.
For privacy of collected data, according to the data collection phase, sensor nodes convert the collected data by using encryption and the HMAC scheme. So, the submitted information is not plaintext but encrypted and HMAC data. The HMAC function has the one-wayness and collision resistance properties. Moreover, each sensor node shares its secret key only with the sink and encrypts sensitive data with that key. Therefore, it is computationally infeasible for cluster heads to obtain the value of $d_i$, and it is difficult for a cluster head to break the privacy of the encrypted and HMAC data. So, our scheme can efficiently protect collected data items.
For privacy of the query result, the sink obtains the query result by comparing the HMAC data items. Given only the HMAC data items and the encrypted data, it is difficult for the adversary to compute the values of the query result without the keys. So, we can preserve the query result, which is securely transmitted to the user.
For privacy of the user's interests and location privacy of sensors in clusters, our scheme can efficiently prevent an adversary from monitoring the user's interests and finding the location information of sensors. The adversary also cannot use the content to trace the routing. We assume that an adversary monitors a local area with the intention of finding the interests of the user. We assume that each cluster has $M$ members. The adversary wants to identify a set $D_T$ of nodes, drawn from the $M$ cluster members, which represents the set of possible locations in the local area.
There is a close relationship between the adversary's analysis of query frequency and the location privacy. When the adversary is uncertain about the query frequency, the location information remains protected. In the eavesdropping area, the adversary needs to select the nodes for his analysis. We assume that the possible sensor nodes in $D_T$ include the queried nodes, which send data to the cluster head. If the size of $D_T$ is very large, the adversary will find it difficult to analyze the user's interests, which helps preserve private information.
Let $D_P$ be the set of the protected nodes. We use an information-theoretic metric, called entropy [22], to measure the privacy protection provided by our scheme. The entropy of identifying the queried node in the wireless sensor network is defined as
$$H = -\sum_{i=1}^{|D_T|} P_i \log_2 P_i, \qquad (1)$$
where $P_i$ is the probability that node $i$ is the queried node and $|D_T|$ is the number of candidate nodes. The entropy represents the adversary's uncertainty about the user's interests and the location of sensors in a wireless sensor network. When the adversary believes that all nodes in a cluster have the same probability of being the queried node, the entropy takes its maximum value. Let $D_T^*$ be the set of all nodes in a cluster, with $|D_T^*| = M$. The size of $D_T$ influences the level of privacy; the resulting entropy is given by equation (3). Figure 3 shows the relationship between the level of privacy and the number of nodes in a cluster. When $M$ increases, the level of privacy ($c$) is higher. This is due to the increased number of nodes in a cluster: the probability that the adversary finds the protected queried nodes decreases, so the level of privacy ($c$) increases. For the same $M$, when $M$ is greater than 20, the level of privacy ($c$) increases as the number of protected queried nodes ($r$) increases.
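To make the entropy metric concrete, the short Python sketch below evaluates the Shannon entropy of an adversary's belief over candidate nodes and its maximum for a cluster of $M$ equally likely nodes. It illustrates only the general entropy measure; it does not reproduce the paper's privacy level $c$, and the function names are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy, in bits, of the adversary's belief over candidate nodes."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def max_entropy(m):
    """Entropy when all m cluster members are equally likely to be the queried node."""
    return entropy([1.0 / m] * m)
```

For a uniform belief over `m` nodes the value equals `log2(m)`, so it grows with the cluster size, which matches the qualitative trend described for larger $M$.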

Energy Consumption Analysis.
In cluster-based sensor networks, sensor nodes have limited energy resources. In this section, we discuss the energy consumption of sensor nodes in our scheme. In each phase, the total energy consumption includes the communication cost and the computation energy consumption. We assume that the energy consumed by transmitting and receiving a data item is $e_t$ and $e_r$, respectively, and that a cluster head randomly chooses $N_r$ nodes to query.
In the data collection phase, sensor nodes incur extra computation overhead to preserve the privacy of sensitive data. Given a range [d … We assume that each HMAC data item is $z_H$ bits and each encrypted data item is $z_D$ bits. Let $H_{hop}$ be the number of hops between a sensor node and a cluster head. In our scheme, $H_{hop} = 1$. In a cluster, the energy consumption $E_{dc}$ is given by equation (4).
Algorithm 1 (fragment, data collection steps):
(10) selected_nodes = SelectNodesRandom(query_node_id);
(11) for node in selected_nodes do
(12) d_i = CollectData();
(13) Sort …
Compute prefixes and numericalize all prefixes as …
Compute the HMAC as HMAC_g(N(S([b …
In the query processing phase, the sink node takes the range $[a, b]$ and computes the prefix families $F(a)$ and $F(b)$. For each $w$-bit value, there are $w + 1$ HMAC data items. So, the sink performs at most $2(n + 1)(2w - 2)(w + 1)$ comparisons. The energy consumption $E_{qp}$ is
$$E_{qp} = \left[2(n + 1)(2w - 2)(w + 1) + e_t + e_r\right] \cdot H_{hop} = 2(n + 1)(2w - 2)(w + 1) + e_t + e_r. \qquad (5)$$
Therefore, the total energy consumption $E_{total}$ is given by equation (6). According to (3) and (6), Figure 4 shows the total energy expended in the system as the number of prefix bits increases from 8 to 32 and the level of privacy increases from 1 to 45, for the scenario where each HMAC data item and each encrypted data item are 256 bits. We assume that each sensor collects 100 data items at each time slot, that the energy consumed by transmitting and by receiving a data item is 1, and that each cluster head includes 100 sensor nodes. The figure shows that, for the same number of prefix bits, a higher level of privacy increases the energy consumption of the whole sensor network.
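The query-processing cost in (5) can be evaluated directly. The Python sketch below is an illustration with hypothetical function names; it computes the comparison bound and $E_{qp}$, and it counts each comparison as one energy unit, as the form of the formula suggests. It does not cover $E_{dc}$ or $E_{total}$.

```python
def max_comparisons(n, w):
    """Upper bound on sink comparisons: 2(n + 1)(2w - 2)(w + 1)."""
    return 2 * (n + 1) * (2 * w - 2) * (w + 1)


def query_processing_energy(n, w, e_t, e_r, h_hop=1):
    """E_qp = [2(n + 1)(2w - 2)(w + 1) + e_t + e_r] * H_hop."""
    return (max_comparisons(n, w) + e_t + e_r) * h_hop
```

For the scenario described above (100 data items per time slot, unit transmit and receive costs, one hop), calling `query_processing_energy(100, w, 1, 1)` for `w` between 8 and 32 shows how quickly the bound grows with the prefix length, since it is quadratic in `w`.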

Conclusions
Wireless sensor networks have been widely deployed in many applications and have drawn increasing attention. Preserving the privacy of sensitive data in cluster-based query processing is an important problem in wireless sensor networks. In this paper, we propose a secure and efficient scheme to protect query processing in cluster-based sensor networks. To preserve privacy, sensed data items are encrypted to prevent cluster heads from obtaining the content of the data. We use the prefix membership verification method to compute the query result without plaintext data. Meanwhile, we use an anonymity method to confuse adversaries and prevent them from analyzing the user's interests and finding the location of the queried node. Then, we perform the privacy analysis and the energy consumption analysis.
Moreover, $s_i$ sends the message that includes the encrypted data $(d_j)_{k_i}$ and the encoded data $E(d_1, d_2, \ldots, d_n)$ to its cluster head. The cluster head transmits the message to the sink. When the user wants to perform the query $\{[a, b]\}$, the sink encodes the range $[a, b]$ as $G([a, b])$. Then, the sink applies a secret comparing method $C(E(d_1, d_2, \ldots, d_n), G([a, b]))$ for query processing over encrypted and encoded data. A data item $d$ is in the range $[a, b]$ if and only if $C(E(d), G([a, b]))$ is true.
$|D_T|$ is the number of nodes about which the adversary is uncertain, and $\sum_{i=1}^{|D_T|} P_i = 1$. Therefore, the probability $P_i$ of any sensor node in $D_T$ being a queried node can be estimated by $|D_P|/|D_T|$. We denote the size of $D_T$ as $R$ ($|D_T| = R$) and let $r$ be the size of the set of protected queried nodes ($|D_P| = r$). The privacy level $c$ is then defined by equation (2).
Algorithm 1 (fragment, closing steps):
… HMAC_g(N(F(a))), HMAC_g(N(F(b)));
(23) GetQueryData(packet, HMAC_g(N(F(a))), HMAC_g(N(F(b))));
(24) end if
ALGORITHM 1: Secure and efficient cluster-based query processing.

Figure 3: The different number of nodes in a cluster.

Figure 4: The relationship between energy consumption and the level of privacy.