Data aggregation is an essential operation for reducing energy consumption in large-scale wireless sensor networks (WSNs). A compromised node may forge an aggregation result and mislead the base station into trusting a false reading. An efficient and secure aggregation scheme is critical in WSN applications because of their stringent resource constraints. In this paper, we propose a method to build a representative-based aggregation tree in a WSN such that the sensing data are aggregated along the route from the leaf cells to the root of the tree. In scenarios with large-scale, high-density deployments of sensor nodes, the representative-based aggregation tree can greatly reduce the data transmission overhead through directed aggregation and cell-by-cell communication. It also provides security services, including integrity, freshness, and authentication, via a detection mechanism in the cells.
Wireless sensor networks (WSNs) have been used in many promising applications such as habitat monitoring, battlefield surveillance, and target tracking. A large number of tiny sensors collect measurement data and send them to a processing center, usually called the base station or sink node. However, the communication between the sensors and the processing center relies on multihop short-range radio. WSNs also suffer from limited energy lifetime, slow computation, small memory, and limited communication capability. Data aggregation can greatly reduce the communication cost by eliminating redundant data in WSNs. It is known that aggregated traffic, modeled as fractal time series with complex characteristic [
On the other hand, the sensors, and even the aggregators, are vulnerable to attacks, especially if they are not equipped with tamper-resistant hardware. When a sensor or an aggregator is compromised, it is easy for the adversary to inject bogus data into the WSN and change the aggregation results. Some methods have been proposed to solve this problem. The works [
In this paper, we propose a method to build the representative-based aggregation tree in the WSN on the basis of the work in [
The rest of this paper is organized as follows. Section
We assume that the dimensions of the large deployment area are known in advance and that the sensor nodes are uniformly distributed in this area. A grid structure is used to divide the target terrain into small nonoverlapping cells of equal area, as shown in Figure
Network Model [
We assume that each node is aware of the dimensions and the location of the cell to which it belongs. This is a reasonable assumption, since most current manufacturers support sensors equipped with a localization system. It follows that each sensor node can determine which cells are its neighbor cells.
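The cell membership and neighbor test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `cell_of` and `neighbor_cells` are hypothetical, the cells are assumed to be squares of side `cell_size`, and the 8-cell neighborhood is an assumption, since the exact neighbor relation depends on the radio range.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (column, row) index of a grid cell


def cell_of(x: float, y: float, cell_size: float) -> Cell:
    """Map a node position to the index of the grid cell that contains it."""
    return (int(x // cell_size), int(y // cell_size))


def neighbor_cells(cell: Cell, cols: int, rows: int) -> List[Cell]:
    """Return the cells adjacent to `cell`, clipped to a cols x rows grid.

    Assumption: a neighbor is any of the 8 surrounding cells.
    """
    cx, cy = cell
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the cell itself
            nx, ny = cx + dx, cy + dy
            if 0 <= nx < cols and 0 <= ny < rows:
                result.append((nx, ny))
    return result
```

A node at position (25.0, 7.5) in a grid with 10-unit cells would place itself in cell (2, 0) and could then list its neighbor cells without exchanging any messages.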
In our model, each cell has a cell representative, which is selected based on its reputation, remaining power, and so forth. A monitoring mechanism similar to Watchdog [
Cells covered by radio range.
The base station is responsible for broadcasting the initial query to the monitoring area, processing received answers for these queries, and deriving meaningful information that reflects the events in the target field [
The following notations are used throughout the paper.
hop_count: The count of the cells on the route to the base station.
In a large-scale target terrain, data aggregation may occur in any corner of the network. The aggregation also proceeds gradually up to the querier. Directed forwarding and aggregation along a steady skeleton therefore have clear advantages in eliminating duplicated sensing data and reducing energy consumption. However, building such a skeleton in a hop-by-hop manner may not be a good choice when the sensor nodes are large in scale and high in density, because it would lead to a deep and complex aggregation skeleton.
In our scheme, the queries from the base station are spread along a cell-by-cell route, and we build an aggregation tree whose vertices are cells. The aggregated data can be delivered directionally to the destination along nonoverlapping cells, which avoids duplicated aggregation. The aggregation operations are conducted at each intermediate cell in the tree if necessary. A representative sensor in each cell acts on behalf of the whole cell, which includes forwarding and aggregating the sensing data of its own cell and the data received from the neighbor cells. The other sensor nodes in the same cell monitor the behavior of their representative by listening to the communication between their representative and the representatives of the neighbor cells.
As the cheating detection mechanism is not the emphasis of our discussion, we build up the tree on the basis of the work in [
The bootstrap phase occurs in a short duration of time immediately after the network deployment. It is short enough to assume that no attacks are possible during this phase [
In this phase, each sensor node in the cell
After that, each node in the cell
At the end of this phase, each sensor node deletes
To enhance the accuracy of the aggregated data without trimming the abnormal and bogus reading, the cheating detection mechanism based on the reputation proposed in [
Since the local and intercell keys have been set up in the network after the bootstrap phase, the behavior of each node, including the cell representative, is monitored by all the other nodes in the same cell. As soon as the reading of the cell departs from the judgments of t nodes, the representative is responsible for computing a new cell reading. Each node maintains a reputation table that records the numbers of positive and negative ratings of the behavior of the other nodes in the cell. As soon as the reputation of the representative falls below a certain threshold, the revocation mechanism is triggered to select a new representative based on the reputation records.
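The reputation table and the threshold-triggered revocation can be illustrated with the sketch below. It is not the cited reputation mechanism: the class `ReputationTable`, the function `next_representative`, and the scoring rule (p + 1) / (p + n + 2) are assumptions made only for illustration, and criteria such as remaining power are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReputationTable:
    """One node's record of positive/negative ratings for nodes in its cell."""
    positive: Dict[str, int] = field(default_factory=dict)
    negative: Dict[str, int] = field(default_factory=dict)

    def rate(self, node_id: str, good: bool) -> None:
        """Record one positive or negative observation of `node_id`."""
        table = self.positive if good else self.negative
        table[node_id] = table.get(node_id, 0) + 1

    def score(self, node_id: str) -> float:
        """Assumed scoring rule: (p + 1) / (p + n + 2); unrated nodes get 0.5."""
        p = self.positive.get(node_id, 0)
        n = self.negative.get(node_id, 0)
        return (p + 1) / (p + n + 2)


def next_representative(table: ReputationTable, current: str,
                        candidates: List[str], threshold: float) -> str:
    """Keep `current` unless its score is below `threshold`; otherwise
    pick the best-rated other candidate (hypothetical selection rule)."""
    if table.score(current) >= threshold:
        return current
    others = [c for c in candidates if c != current]
    if not others:
        return current  # nobody else can take over
    return max(others, key=table.score)
```

With three negative ratings the representative's score drops to 0.2, so any threshold above that value would hand the role to the best-rated remaining node.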
Before building RAT, we introduce the packet formats in the whole working phase.
The packets have the following two formats:
Now, we propose a distributed algorithm to build RAT along the route of neighbor cells. The distributed algorithm builds the tree from the base station and includes the following steps.
The base station locally broadcasts an invitation message to all of its neighbor cells, indicating that they should become its children. Since the base station is the root of the tree, it has no parent and its hop count is zero. The invitation message from the base station has the following format:
A node
Once the node
The node
It is possible for a node to receive more than one invitation message. The node takes the first invitation message as the active invitation, because the first invitation normally carries the minimal hop count value. Once a node has joined the tree, an invitation received later is recorded for future use if its hop count value is not larger than the node's current hop count value.
The parent node records its children by collecting the join messages. A cell is a leaf cell if it does not receive a join message from any cell announcing itself as its child.
Repeat Steps
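The steps above can be simulated centrally with the following sketch. It is an illustration under stated assumptions rather than the distributed protocol itself: the function `build_rat` is hypothetical, invitations are assumed to arrive in breadth-first order (so the first invitation a cell hears comes from a closest inviter), and an invitation is assumed to carry the sender's hop count, with the receiver setting its own hop count to that value plus one.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional


def build_rat(neighbors: Dict[Hashable, List[Hashable]], root: Hashable):
    """Build a cell-level aggregation tree rooted at the base station cell.

    `neighbors` maps every cell to the list of its neighbor cells.
    Returns (parent, hop_count, children, backups, leaves).
    """
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    hop_count: Dict[Hashable, int] = {root: 0}
    children = {cell: [] for cell in neighbors}
    backups = {cell: [] for cell in neighbors}  # later invitations kept for future use

    queue = deque([root])
    while queue:
        sender = queue.popleft()
        for cell in neighbors[sender]:  # sender broadcasts an invitation
            if cell not in parent:
                # First invitation received: join the tree as sender's child.
                parent[cell] = sender
                hop_count[cell] = hop_count[sender] + 1
                children[sender].append(cell)
                queue.append(cell)
            elif (cell != root and sender != parent[cell]
                  and hop_count[sender] <= hop_count[cell]):
                # Later invitation whose hop count is not larger: record it.
                backups[cell].append(sender)

    # A cell is a leaf if no cell joined it as a child.
    leaves = [c for c in neighbors if c in parent and not children[c]]
    return parent, hop_count, children, backups, leaves
```

In a real deployment each representative would run only its own part of this loop in reaction to received messages; the central queue merely stands in for message delivery order.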
The representative-based aggregation tree.
An aggregation process begins when
The data aggregation process includes the following two phases.
In the process of query spread, the intermediate cell propagates the query
The representative
When a leaf cell
If a node x is selected to report the sensing data in the last phase, it should report its reading to
When an intermediate cell receives the messages from its leaf, it verifies the
The data aggregation process also can be illustrated by Figure
The security requirements of data integrity, freshness, and authentication are achieved for the aggregation data in our scheme, since nodes share intercell keys with the neighbor cells. As for the query message and the communication within the cell, the nodes in the same cell as the sender can authenticate a message in place of the receiver, since the nodes in each cell share a local cell key. In fact, each monitoring node can check only a random selection of the query messages, which lowers the energy consumption and prolongs the lifetime of the cell. When the adversary is strong or the application is critical, confidentiality can be achieved by encrypting the sensing data or the aggregation data with the intercell or local keys.
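One common way to obtain integrity, authentication, and freshness from a shared key is a keyed message authentication code bound to the identifier of the current query. The sketch below illustrates that idea only; it is not the packet format of this paper. The field layout, the use of HMAC-SHA256, and the names `make_tag` and `verify` are assumptions.

```python
import hashlib
import hmac


def make_tag(key: bytes, query_id: int, sender: str, payload: bytes) -> bytes:
    """Compute a MAC over the query identifier, sender, and payload.

    Assumption: `key` is the intercell (or local cell) key shared after
    the bootstrap phase, and `query_id` changes with every query.
    """
    message = query_id.to_bytes(8, "big") + sender.encode() + b"|" + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, current_query_id: int, query_id: int,
           sender: str, payload: bytes, tag: bytes) -> bool:
    """Accept a message only if it is fresh and its MAC is valid."""
    if query_id != current_query_id:
        return False  # stale or replayed answer to an older query
    expected = make_tag(key, query_id, sender, payload)
    return hmac.compare_digest(expected, tag)  # constant-time comparison
```

A monitoring node in the sender's cell could run the same `verify` check on overheard messages, which matches the idea that the sender's cell mates authenticate traffic in place of the receiver.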
Since a cell in the RAT communicates only with its neighbor cells, the transmission distance is constant for each communication, and the energy cost of data transmission is mainly determined by the amount of data transmitted. We therefore discuss the data transmission volume of one response to a query in the RAT. For one response to a query, two data transmissions are required in each cell. In the first, the sensor node reports the sensing data to the representative of the cell. In the second, the representative of the cell reports the aggregation data up to its parent in the tree.
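The two-transmissions-per-cell count can be made concrete with a small calculation. This sketch is not the closed-form expression of the paper. It assumes a fixed message size `size` for both the intra-cell report and the upward report, excludes the base station cell, and compares against a hypothetical baseline in which every reading is relayed cell by cell without aggregation.

```python
from typing import Dict, Hashable


def volume_with_aggregation(hop_count: Dict[Hashable, int], size: int) -> int:
    """Total data sent for one response when every cell aggregates.

    Each non-root cell sends one intra-cell report and one upward report.
    """
    cells = [c for c, h in hop_count.items() if h > 0]
    return 2 * size * len(cells)


def volume_without_aggregation(hop_count: Dict[Hashable, int], size: int) -> int:
    """Hypothetical baseline: each reading is reported inside its cell and
    then relayed unchanged over every cell hop to the base station."""
    return sum(size * (1 + h) for h in hop_count.values() if h > 0)
```

For a tree with hop counts {BS: 0, A: 1, B: 1, C: 2} and 4-byte messages, the first function gives 24 bytes and the baseline gives 28 bytes; the gap widens as the tree gets deeper.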
For the data aggregation function with fixed output size, such as min/max, the energy cost on data transmission with any aggregation tree is
For some aggregation functions, the size of the return value is not fixed. It is a function of the total size of input data. We assume that such aggregation functions have fixed compression ratio of
Without losing generality, we still assume that
Compared with the aggregation tree built on the hop-by-hop nodes in [
The results discussed above are assumed to be time invariant within a given interval of time, say
Note that the traffic payload has complicated dynamics, which are strongly related to the selected time scale, as can be seen from [
In the future, we shall study the statistics of the present algorithm from the viewpoint of nonlinear time series, which is a challenging problem.
In this paper, we have proposed a method of establishing a representative-based aggregation tree in a WSN, where the network is divided into equal and nonoverlapping cells. In scenarios with large-scale, high-density deployments of sensor nodes, the representative-based aggregation tree can greatly reduce the data transmission overhead through directed, cell-by-cell aggregation and forwarding. We have given a quantitative analysis of data transmission in the representative-based aggregation tree. At the same time, the monitoring mechanism in the cells prevents the injection of bogus information and forged aggregation values. A problem for future work is how to analyze the aggregated traffic in WSNs, including the traffic in the RAT, from the aspect of fractal time series, in order to better understand its dynamics and nonlinearity.
This work was supported by the National High Technology Research and Development Program (863) of China under Grant no. 2009AA01Z418 and National Natural Science Foundation of China under Grant no. 60573125, no. 60873264 and no. 60903188.