Minimum Cost Data Aggregation for Wireless Sensor Networks Computing Functions of Sensed Data

We consider a problem of minimum cost (energy) data aggregation in wireless sensor networks computing certain functions of sensed data. We use in-network aggregation, in which data can be combined at intermediate nodes en route to the sink. We consider two types of functions: first the summation type, which includes sum, mean, and weighted sum, and second the extreme type, which includes max and min. For both types of functions, however, the problem turns out to be NP-hard. We first show that, for sum and mean, there exist algorithms which can approximate the optimal cost by a factor logarithmic in the number of sources. For weighted sum we obtain a similar result for Gaussian sources. Next we reveal that the problem for extreme-type functions is intrinsically different from that for summation-type functions. We then propose a novel algorithm based on the crucial tradeoff in reducing costs between local aggregation of flows and finding a low-cost path to the sink: the algorithm is shown to empirically find the best tradeoff point. We argue that the algorithm is applicable to many other similar types of problems. Simulation results show that significant cost savings can be achieved by the proposed algorithm.


Introduction
Motivation. In this paper we consider the problem of minimum cost (energy) data aggregation in wireless sensor networks (WSN) where the aggregated data is to be reported to a single sink. A common objective of a WSN is to retrieve a certain summary of the sensed data instead of the entire set of data. The relevant summary is defined as a certain function applied to a set of measured data [1]. Specifically, we are given a function f(⋅) such that, for a set of measurement data X_1, ..., X_n, the goal of the sink is to retrieve f(X_1, X_2, ..., X_n). Examples of f(⋅) are mean, max, min, and so forth. When the mean function is used, f(X_1, ..., X_n) = (1/n)∑_{i=1}^n X_i. For applications such as "alarm" systems, one can use max as f(⋅), for example, f(X_1, ..., X_n) = max_{i=1,...,n}[X_i], where the X_i can be temperature values in forest-fire monitoring systems or the structural stress values measured in a building. We will refer to f(⋅) as a summary function throughout this paper. Certain types of f(⋅) allow us to combine data at the intermediate nodes en route to the sink. Such combining techniques are commonly referred to as in-network aggregation [2-4]. By using in-network aggregation one can potentially save communication costs by reducing the amount of traffic [5-7]. For instance, in applications such as wireless multimedia sensor networks (WMSN), where the transmitted multimedia data has a far greater volume than that in typical WSNs, the in-network aggregation technique is crucial for saving energy and extending network lifetime [8, 9]. While in-network aggregation offers many benefits, it poses significant challenges for network design, for example, designing routing algorithms so as to minimize costs such as energy expenditure and delay. In particular, we show that it is crucial to take into account how the summary function f(⋅) affects the statistical properties of the sensed data.
Objectives. In this paper we study the minimum cost aggregation problem for several types of f(⋅). The performance of in-network aggregation relies heavily on the properties of the function f(⋅). To be specific, let us briefly look at the problem formulation. Consider the single-sink aggregation problem where we define the cost function as follows. Let E denote the set of links in the network. We would like to minimize

∑_{e∈E} w_e c_e, (1)

where w_e represents the weight associated with link e and c_e represents the average number of bits transmitted over e. Note that objectives similar to (1) have been considered in [10-14] as well. The most relevant objective associated with (1) is the energy consumption. To see this, define the weight w_e := γ_e d_e^α, where d_e is the distance between the nodes connected by Link e, α is the path loss exponent, and γ_e is the related channel parameter. Hence (1) is proportional to the total transmitted energy consumed throughout the data aggregation. Note that in [13, 14] the authors consider the same energy cost function. We refer to c_e as the aggregation cost function (we will use the notation φ to denote the cost function in general, whereas φ_e is used to denote the cost function specifically on Link e). Note that c_e depends on the source measurements aggregated on e, and also on f(⋅), the summary function applied to the measurements. The work in [15] also studies an aggregation problem in sensor networks computing summary functions, assuming that all the packets generated in the network have the same size. However, the amount of information generated at intermediate nodes may vary, since a summary of data can be statistically different from the original data, which is our key observation. Let us take an example. Consider the network in Figure 1 where Nodes 1 and 2 are the source nodes, and the node in shaded color represents the sink. The sink wants to receive a summary of information from Nodes 1 and 2.
The sensor readings generated at Nodes 1 and 2 are represented by the random variables (RVs) X_1 and X_2, respectively. Since Node 1 is a "leaf" node, Node 1 will simply transmit the raw reading X_1 to Node 2. Node 2 will combine X_1 with its own data, X_2, by computing the summary function f(X_1, X_2), which is then transmitted to the sink. We define the aggregation cost function c as follows. Suppose the sensor information to be transmitted on Edge e is the random variable X. The average number of bits to be transmitted on e, or c_e, is defined as c_e = H(X) (we temporarily ignore communication overheads incurred in addition to the sensor information, e.g., the packet header size; we will take such overheads into account later when we formally define c), where H(⋅) denotes the entropy function. Note that the entropy function has also been adopted as a cost function in [10, 12], and throughout this paper we will define c in terms of H(⋅). The average numbers of bits transmitted on Edges 1 and 2 are, respectively, given by c_1 = H(X_1) and c_2 = H(f(X_1, X_2)).
Suppose f(⋅) is given by sum. Since H(f(X_1, X_2)) = H(X_1 + X_2) ≠ H(X_1), the costs incurred at Edges 1 and 2, that is, c_1 and c_2, are different. If we had used another type of f(⋅), such as max, we would have c_2 = H(max(X_1, X_2)), which would incur a different cost from the case where f(⋅) was sum. In many cases we will assume symmetric sources; that is, the cost depends only on the number of sensor readings to which f(⋅) is applied. In those cases we will treat the cost as a function φ: Z_+ → R_+; that is, φ(m) = H(f(X_1, ..., X_m)) (we will also examine the cases of asymmetric sources). We will show that f(⋅) determines the properties of φ(⋅) such as convexity and monotonicity, and the structure of the aggregation problem heavily depends on those properties. Hence the aggregation scheme must be designed to capture key aspects of the aggregation cost function under the given summary function. The abovementioned links among summary functions, cost functions, and optimal aggregation strategies have not been previously well studied, as we will see in Section 2 through reviewing related work.
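To make the dependence on f(⋅) concrete, here is a minimal numeric sketch (our own illustration, not from the paper) for two i.i.d. standard Gaussian sources: the differential entropy of the sum exceeds that of a single reading by exactly (1/2) ln 2 nats, so Edge 2 carries more bits under sum than Edge 1 carries for the raw reading.

```python
import math

def h_gauss(var):
    # Differential entropy (in nats) of a Gaussian RV with the given variance.
    return 0.5 * math.log(2 * math.pi * math.e * var)

# Two i.i.d. N(0, 1) sources: X1 + X2 has variance 2.
c1 = h_gauss(1.0)   # entropy term driving the cost on Edge 1: h(X1)
c2 = h_gauss(2.0)   # entropy term driving the cost on Edge 2: h(X1 + X2)
print(c2 - c1)      # = 0.5 * ln 2 ≈ 0.347 nats
```

Had f(⋅) been max instead of sum, c_2 would be governed by h(max(X_1, X_2)), a different value, which is exactly the point of the example.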

Contributions.
In this paper we investigate the minimum energy aggregation problem for several widely used summary functions. We consider two types of f(⋅). The first type is called the summation type, which involves sums of measurements: specifically sum, mean, and weighted sum. The second type is called the extreme type, which is related to the extreme statistics of the data: specifically max and min. We will use the entropy function as the measure of information rate. We show that, when f(⋅) is sum or mean, and if the source data is i.i.d., φ is indeed concave and increasing, irrespective of the distribution of the source data. This implies that one can use well-known algorithms such as the Hierarchical Matching (HM) algorithm [16] in order to approximate the optimal cost. When f(⋅) is weighted sum, however, it is unclear how to associate the cost function with a flow aggregation problem. Nonetheless we prove that, if the source data consists of independent Gaussian random variables, there exists an efficient algorithm for the problem of aggregating a weighted sum of data with arbitrary weights.
Next we consider extreme-type summary functions such as max. We will show that for certain distributions of source data, φ can be convex and decreasing in the (nonzero) number of aggregated measurements. Note that the single-sink aggregation problems for concave/increasing [16-20] or convex/increasing cost functions [21, 22] have been widely studied; however, a convex and decreasing φ has not been well studied yet. We propose a novel algorithm which effectively captures such properties of φ. We begin by observing that there are two aspects of cost reduction, as follows. Since φ is convex and decreasing, φ decreases faster when the number of aggregated data is smaller. The intuition is that it pays to locally aggregate data among nearby sources in the early stages of aggregation, that is, when the number of measurements aggregated at sensors is small. This leads us to find a low-cost local clustering of sources, which is a "microscopic" aspect of cost reduction. Meanwhile we need to simultaneously find a low-cost route to the sink, which must take the global structure of the network into account and thus is a "macroscopic" aspect of cost reduction. These are conflicting aspects, and a good tradeoff point between them should be sought. To that end we propose the Hierarchical Cover and Steiner Tree (HCST) algorithm. The algorithm consists of multiple stages and is designed to empirically find the best tradeoff point over the stages. We show by simulation that the algorithm can significantly reduce cost compared to baseline schemes such as a greedy heuristic using shortest path routing, or the HM algorithm.
Our results show that the summary function f(⋅) can significantly impact the design of aggregation schemes. However, there are many choices for f(⋅): suppose, for example, we would like to compute an L_p norm of the vector of measurement data. Note that the sum and max functions which we study in this paper are in fact related in this way: if the measurement data is always positive, then the sum function is simply the L_1 norm and max is the L_∞ norm of a data vector. One could ask: what are good aggregation strategies if we take f(⋅) as a different L_p norm, say the L_2 norm? We leave such questions as future work.
Paper Organization. We briefly review related work in Section 2. Section 3 introduces the model and problem formulation. Sections 4 and 5 discuss the optimal routing problem for summation- and extreme-type summary functions, respectively. Simulation results are presented in Section 6. Section 7 concludes the paper.

Related Work
In general the single-sink aggregation problem to minimize (1) is NP-hard [23], and a substantial amount of research has been devoted to designing approximation algorithms depending on certain properties of φ. In our case it is important to note that such properties of φ are determined by the choice of f(⋅). Let us briefly review the related work on the single-sink aggregation problem for two types of φ. Most research on the single-sink aggregation problem has focused on the case where φ is concave and increasing. Due to the concavity of φ, the link costs associated with the amount of aggregated data exhibit economies of scale; that is, the marginal cost of adding a flow at a link is cheaper when the current number of aggregated flows at the link is greater. Buy-at-bulk network design [23, 24] is based on this property of φ. A number of approximation algorithms have been proposed, for example, [17-19]. When φ is known in advance, a constant factor approximation to the optimal cost is possible [20, 25]. Even when φ is unknown but is concave and increasing, Goel and Estrin [16] have proposed a simple randomized algorithm called Hierarchical Matching (HM). The algorithm computes minimum weight matchings of the source nodes hierarchically over stages, and outputs a tree for aggregation. The HM algorithm can approximate the optimal cost by a factor logarithmic in the number of sources [16]. Nonuniform variants of this problem, in which φ differs among the links, have also been studied [26, 27], and a polylogarithmic approximation to the optimal cost is shown to be achievable there.
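To give a feel for the HM idea, the following sketch (our own illustration; it substitutes a greedy nearest-neighbor pairing for the true minimum-weight perfect matching, so it carries none of the approximation guarantees of [16]) matches active nodes in pairs over stages, keeps a random endpoint of each pair as that pair's representative, and finally wires the last survivor to the sink.

```python
import math
import random

def greedy_matching(points):
    """Pair up an even-sized set of 2D points greedily by nearest distance
    (a cheap stand-in for the minimum-weight perfect matching HM uses)."""
    left = list(points)
    pairs = []
    while left:
        p = left.pop()
        q = min(left, key=lambda r: math.dist(p, r))
        left.remove(q)
        pairs.append((p, q))
    return pairs

def hierarchical_matching(sources, sink, rng=random.Random(0)):
    """Build one aggregation tree: match active nodes in pairs stage by
    stage, keep a random endpoint of each pair as the representative,
    and connect the final survivor to the sink."""
    edges = []
    active = list(sources)
    while len(active) > 1:
        if len(active) % 2 == 1:
            # Odd count: route the node nearest the sink straight to the sink.
            p = min(active, key=lambda r: math.dist(r, sink))
            edges.append((p, sink))
            active.remove(p)
        new_active = []
        for p, q in greedy_matching(active):
            edges.append((p, q))
            new_active.append(p if rng.random() < 0.5 else q)
        active = new_active
    if active and active[0] != sink:
        edges.append((active[0], sink))
    return edges

sources = [(0, 0), (1, 0), (0, 1), (2, 2), (3, 3)]
sink = (5, 5)
edges = hierarchical_matching(sources, sink)
print(edges)  # every source is connected to the sink through the matched pairs
```

The stage structure is the part that matters: each stage halves the number of active nodes, which is where the log(n) factor in the approximation ratio comes from.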
The case where φ is convex and increasing in the number of aggregated measurements has been studied in [21, 22]. Here φ exhibits diseconomies of scale; that is, the marginal cost of routing a flow at a link is more expensive when a greater number of flows are aggregated at the link. Such a phenomenon can be observed in many applications, such as speed scaling of microprocessors modeled by φ(x) = cx^α, where x is the clock speed, c ≥ 0 and α ≥ 1 are constants, and φ(x) is the energy consumption at the processor. Notably the authors show that the problem can intrinsically differ from that for a concave and increasing φ. For example, the authors show that constant-factor approximation algorithms do not exist for certain convex and increasing φ [21]. They nevertheless proposed a constant factor approximation algorithm for the case φ(x) = cx^α. These results show that the single-sink aggregation problem crucially depends on certain properties of φ such as convexity. However, none of the above works deal with a convex and decreasing φ, which we study in the sequel.
There have been many studies on combining intermediate data in conjunction with routing for efficient retrieval of the complete sensor readings. Scaling laws for achievable rates under joint source coding and routing are studied in [28]. The work [11] studies the problem of minimizing the flow costs under distributed source coding. They show that when φ(x) is linear in x, first applying Slepian-Wolf coding at the sources and then routing the coded information via a shortest path tree from the sources to the sink is optimal. In [10] a single-input coding model was adopted in which the coding of information among the nodes can be done only in pairs, and joint coding of the source data from more than two nodes is not allowed. Assuming the reduction in packet size is a linear function of the correlation coefficient between each pair of nodes, they proposed a minimum-energy routing algorithm. The impact of spatial correlation on routing has been explored in [12]. They showed that, assuming the correlation decays over distance, it pays to form clusters of nearby nodes and aggregate data at the clusterheads. The aggregated information is then routed from the clusterheads to the sink. The algorithm is shown to perform well for various correlation models. The tradeoff between the integrity of aggregated information and energy consumption has been studied in [29]. Further works on in-network aggregation combined with routing include [30, 31], which propose efficient protocols for routing extreme values among the sensed data. A scheme using spatially adaptive aggregation to mitigate traffic congestion was proposed in [32].
The above works aim at retrieving the entire set of data, instead of a summary, subject to certain degrees of data integrity. In our case, we design energy-efficient aggregation schemes to compute the summary function f(⋅) of the sensor readings. Also, in the abovementioned works, in-network aggregation reduces cost mainly by removing correlation among the data set. In our work, by contrast, we focus on losslessly retrieving a summary of statistically independent sensor readings. We assume the independence of sensor readings because we would like to decouple the cost savings obtained by removing correlation from the savings obtained by applying the summary function in association with aggregation strategies; we focus on the latter. Moreover, the assumption of independence among the readings represents the "worst" case in terms of cost savings, since one cannot reduce the energy cost by removing correlations in the sensor readings. In fact, the independence assumption can be valid in certain cases. For example, consider a large sensor network where the sensed data is spatially correlated and the correlation decays quickly over distance. If the source nodes are sparsely deployed and thus tend to be far apart from one another, the correlation among their data can be very weak. Such sparse node placement is motivated by cost efficiency: sparse placement of nodes enables us to reap as much information as possible from a fixed number of sensor devices, assuming that the network senses a homogeneous field and the measure of information is given by the joint entropy function.

Model
3.1. Preliminaries. We are given an undirected graph G = (V, E) where V = {1, 2, ..., N} and E ⊆ V × V denote the set of vertices and edges, respectively. For u, v ∈ V, (u, v) ∈ E denotes the (undirected) edge connecting nodes u and v. With each edge in E we associate a weight defined by w: E → R_+. A weight captures the cost of transmitting a unit amount of data between two nodes, for example, the expenditure of transmission energy needed to compensate for path loss. The set S ⊂ V denotes the set of source nodes, that is, the nodes which generate measurement data to be reported to the sink. Also define n := |S|, where |⋅| denotes the cardinality of a set. For a source node s ∈ S, its measured data is modeled by an RV denoted by X_s. We assume that the X_s's are independent and identically distributed among the sources. The measured data is to be aggregated at the sink node denoted by t ∈ V. The nodes which are not source nodes act as relays in the aggregation process. For simplicity we will assume that any node in the network transmits data at most once during the aggregation process. Such an assumption has been made in other works such as [15]. Thus the routes for aggregation constitute a tree whose root is given by t. We refer to such a tree as an aggregation tree. The aggregation process is performed as follows. The sources initiate transmissions. An intermediate node waits for all the data from the sources which are its descendants to arrive. Next, the node computes the summary function of the aggregated data, which is then relayed to the next hop.
In this paper a summary function is defined to be a nonnegative function, denoted by f(⋅), which is a divisible function. Divisible functions are a class of summary functions which can be computed in a divide-and-conquer manner [1]. They are defined as follows: given n data samples, consider a partition of the samples into sets of size k and n − k denoted by {x_1, x_2, ..., x_k} and {x_{k+1}, ..., x_n}, respectively. If f(⋅) is divisible, f(f(x_1, ..., x_k), f(x_{k+1}, ..., x_n)) = f(x_1, x_2, ..., x_n) holds for any n and k. Examples of divisible functions are sum, max, and min. In particular, when f(⋅) is divisible, the aggregation can be performed in a divide-and-conquer manner as follows. Suppose a set of data samples is aggregated at a node. If the node is a source, it applies f(⋅) to the collected samples and its own data. If the node is simply a relay, it applies f(⋅) to the aggregated data samples to obtain a summary of the samples, and the summary of its aggregated data is transmitted to the next hop.
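For instance, the divisibility of max can be checked directly: applying max to the summaries of any two-way split of the samples gives the same value as applying it to all samples at once. A small self-contained check (our own illustration):

```python
def f_max(*args):
    # max as a summary function; accepts raw samples or partial summaries alike
    return max(args)

data = [3, 7, 2, 9, 5]
for k in range(1, len(data)):
    left = f_max(*data[:k])    # summary of the first k samples
    right = f_max(*data[k:])   # summary of the remaining n - k samples
    # divisibility: f(f(x_1..x_k), f(x_{k+1}..x_n)) = f(x_1..x_n)
    assert f_max(left, right) == f_max(*data)
print("max is divisible for every split point")
```

Mean fails this test (the mean of two partial means is not the overall mean unless the parts are equal in size), which is why Section 4 handles mean by a reduction to sum rather than directly.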
Abusing notation for the sake of simplicity, we let the function f(⋅) take a set, a vector, or their combination as its argument. For example, if f(⋅) is sum, f(x, y) = x + y, f({x, y}) = x + y, and also f({x, y}, z) = x + y + z. For some A ⊆ S, we define X_A as the set of RVs representing the measurements from the nodes in A, that is, X_A := {X_s | s ∈ A}.

3.2. Problem Formulation. We define the problem of minimizing communication costs as follows. There exists a sink to which the data is to be aggregated. Our goal is to find a minimum-cost aggregation tree T = (V_T, E_T) ⊆ G rooted at the sink. We would like to solve the following aggregation problem:

(P) minimize ∑_{e∈E_T} w_e c_e, (5)

where c_e represents the average number of bits communicated over Edge e. Note that the objective of (5) has been considered in the works [10, 12] as well. We call c_e the aggregation cost function, which we define as follows.
We will use the entropy function H(⋅) as our measure of information rate, similar to the works [13, 14]. We assume that the average number of bits to represent a random sensor measurement X is given by H(X). The precise definition of the entropy function H(X) depends on the nature of X: if X is a discrete RV, H(X) denotes the usual Shannon entropy. If X is a continuous RV, H(X) is implicitly defined to be H(X̂), where X̂ is a discrete RV obtained by applying uniform scalar quantization to X with some quantization step size, say 2^{−γ} for some integer γ > 0. If the quantization precision is sufficiently high, it is known [33] that H(X̂) ≈ γ + h(X), where h(⋅) denotes the differential entropy of continuous RVs. Note that a similar approximation has been made in defining the information rates for continuous RVs in [13, 14]. Hence in this paper, we will assume that a continuous RV X incurs a cost of γ + h(X) bits, where γ > 0 is a sufficiently large parameter, and we denote this cost by H(X) := γ + h(X).
In addition, the measured data is transmitted as packets in the network. Hence for each packet transmission there is an overhead of metadata, for example, the packet header. For any measurement X, no matter how small H(X), there is always the overhead of transmitting such metadata in practice. We will assume the header length is fixed to β > 0 bits throughout this paper. Hence the average number of bits required to send measurement information X per transmission over a link is given by β + H(X). For a given aggregation tree T = (V_T, E_T), let P(s) ⊂ E_T denote the path from a source s ∈ S to the sink. For a given Edge e ∈ E_T, let S(e) ⊆ S denote the set of source nodes whose aggregated measurements are transmitted over e, that is, S(e) = {s ∈ S | e ∈ P(s)}. The information to be communicated over Edge e is the function f(⋅) applied to the set of measurement values from S(e), that is, f(X_{S(e)}). Hence we define the aggregation cost function as follows:

c_e := β + H(f(X_{S(e)})). (7)

We would like to solve (P) using the definition of c_e given by (7). In the following sections we investigate several widely used summary functions and the associated optimal aggregation problems.

Aggregation Schemes for Summation-Type Summary Functions
We consider the summary functions of sum, mean, and weighted sum.
4.1. sum and mean. We first discuss the case where f(⋅) is sum.
We have that f(X_1, ..., X_m) = ∑_{i=1}^m X_i. Clearly sum is a divisible function. Thus the aggregation process is as follows: a node simply applies the sum function to the aggregated data, and relays the aggregated information to the next hop. When the source data is i.i.d., we will show that there exists a randomized algorithm which finds an aggregation tree whose expected cost is within a factor of (log(n) + 1) of the optimal cost of (5).

Proposition 1. Suppose the X_s's are i.i.d. For any distribution of X_s, there exists an algorithm yielding a mean cost within a factor of {log(n) + 1} of the optimal cost of (P).
Goel and Estrin [16] studied a single-sink data aggregation problem as follows. Each source generates a unit flow which needs to be routed to a sink, where the flows are aggregated through a tree. Their objective is to minimize the following cost function:

∑_{e} w_e g(n_e), (9)

where w_e is the weight on Edge e, n_e is the number of flows on Edge e, and g: R_+ → R_+ is a function that maps the total size of a flow to its cost. They proposed an algorithm to minimize (9) when g is a canonical aggregation function, defined as follows.

Definition 2 (see [16]). A function g: R_+ → R_+ is a canonical aggregation function (CAF) if (i) g(0) = 0, (ii) g is increasing, and (iii) g is concave.
Their algorithm, called Hierarchical Matching (HM) [16], guarantees a mean cost within a factor of log(n) + 1 of the optimal irrespective of g, provided that g is a CAF. As mentioned previously, since the X_s's are i.i.d., c_e depends only on m_e := |S(e)|. Specifically we will define φ as follows:

φ(m) := β + H(∑_{i=1}^m X_i). (10)

We will show that φ(⋅) is a CAF by showing that it satisfies the three properties of Definition 2. Note that this implies that the HM algorithm can be used to approximately solve (P), since (9) and the objective of (P) are identical.
Proof of Proposition 1. For the first property, it trivially holds that φ(0) = 0. For the second property, for any two independent RVs X_1 and X_2, it is known that H(X_1 + X_2) ≥ H(X_1), that is, summing independent RVs never decreases entropy [33]; this implies φ(2) ≥ φ(1) and, more generally, that φ(m) is increasing in m. For the third property, consider the following. It is shown in [34] that the entropy of the sum of independent RVs is a submodular set function. That is, the following holds for independent RVs X_1, X_2, and X_3 [34, Theorem I]:

H(X_1 + X_3) + H(X_2 + X_3) ≥ H(X_1 + X_2 + X_3) + H(X_3). (11)

Now consider m + 2 sensor measurements X_1, ..., X_{m+2}, and make the substitutions X_1 := X_{m+1}, X_2 := X_{m+2}, and X_3 := ∑_{i=1}^m X_i in (11). We have that

H(∑_{i=1}^{m+1} X_i) + H(X_{m+2} + ∑_{i=1}^m X_i) ≥ H(∑_{i=1}^{m+2} X_i) + H(∑_{i=1}^m X_i). (12)

If we apply the definition of φ given by (10) to (12), the following holds due to symmetry:

φ(m + 1) + φ(m + 1) ≥ φ(m + 2) + φ(m). (13)

Hence φ(m + 2) − φ(m + 1) ≤ φ(m + 1) − φ(m); that is, the slope is decreasing in m, which implies that φ(⋅) is concave on the domain of integers. Thus φ(⋅) satisfies all the properties of Definition 2 and is a CAF. This implies that, by using the HM algorithm, one can achieve an expected cost within a factor of 1 + log(n) of the optimal cost of (P).
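The concavity conclusion can be sanity-checked in the Gaussian case, where everything is explicit: for i.i.d. N(0, σ²) readings, h(∑_{i=1}^m X_i) = (1/2) ln(2πe m σ²), so the slope in m is (1/2) ln((m + 1)/m), which is positive and shrinking. A quick numeric check (our own illustration, entropies in nats):

```python
import math

def h_sum(m, var=1.0):
    # Differential entropy (nats) of the sum of m i.i.d. N(0, var) readings,
    # which is N(0, m * var).
    return 0.5 * math.log(2 * math.pi * math.e * m * var)

slopes = [h_sum(m + 1) - h_sum(m) for m in range(1, 10)]
# Increasing: every slope is positive. Concave: slopes shrink as m grows.
assert all(s > 0 for s in slopes)
assert all(slopes[i + 1] < slopes[i] for i in range(len(slopes) - 1))
print("entropy of the Gaussian sum is increasing and concave in m")
```

This is only a special case, of course; the proof above establishes the property for any i.i.d. source distribution via the submodularity inequality (11).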

Journal of Sensors
Next we consider mean as the summary function. Note that mean, as well as the weighted sum considered in the next section, is not a divisible function in general. We will nevertheless show that the problem for those summary functions can be reduced to the sum problem as follows. Suppose every source node is aware of the total number of sources, that is, n. In our scheme every source simply scales its measurement by n^{−1} prior to transmission; that is, Source s transmits n^{−1}X_s, and the scaled measurements are then aggregated in the same way as in the sum problem. The average number of bits transmitted over Edge e can be written as β + H(∑_{s∈S(e)} (1/n)X_s). Since the (1/n)X_s's are i.i.d., for the minimum cost aggregation problem for mean we can use the same algorithm as that used for sum, for example, the HM algorithm.
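The reduction of mean to sum is just pre-scaling at the sources; a short sketch (our own illustration) of the arithmetic:

```python
n = 4
readings = [2.0, 6.0, 4.0, 8.0]        # X_s at each of the n source nodes
scaled = [x / n for x in readings]     # each source transmits X_s / n instead
# Aggregating the scaled readings with plain sum yields exactly the mean.
assert sum(scaled) == sum(readings) / n
print(sum(scaled))  # 5.0, the mean of the original readings
```

Note the scaling also shifts every differential entropy by the same constant (h((1/n)X) = h(X) − log n), so the scaled sources remain i.i.d. and the CAF argument for sum carries over unchanged.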

4.2. weighted sum.
Next we consider the case where f(⋅) is weighted sum, as follows. We assign arbitrary weights w_s, s ∈ S, to the source nodes. The goal of the sink is to compute ∑_{s∈S} w_s X_s. Our method of aggregation is similar to that for the case of mean; that is, Source s scales its measurement by w_s and then transmits w_s X_s, where the aggregation process is the same as that for sum. However, the effective source data w_s X_s seen by the network is no longer i.i.d., unless the w_s's are identical for all s ∈ S. The aggregation cost function is given by

c_e = β + H(∑_{s∈S(e)} w_s X_s). (14)

The difficulty is that it is hard to associate a "flow" with the source data w_s X_s due to the asymmetry; that is, the problem is no longer a flow optimization. Moreover, it is easily seen that (14) is not a CAF in general. Thus we restrict our attention to a specific distribution of X_s. We will show that, if the X_s's are independent Gaussian RVs, the problem for weighted sum is indeed a single-sink aggregation problem with concave costs, and there exist algorithms similar to the HM algorithm which have a good approximation ratio. Specifically we prove that our problem is equivalent to the single-sink aggregation/flow optimization problem with nonuniform source demands.
Proposition 3. Suppose the X_s's are independent Gaussian RVs. Then (P) for weighted sum is equivalent to a single-sink flow aggregation problem with nonuniform source demands and a concave, increasing cost function.

Proof. Consider the information communicated over Edge e, denoted by Y := ∑_{s∈S(e)} w_s X_s. Since the X_s's are independent Gaussian RVs, Y is also Gaussian with variance σ_e^2, where σ_e^2 := ∑_{s∈S(e)} w_s^2 σ_s^2 and σ_s^2 denotes the variance of X_s. Thus the differential entropy of Y is given by

h(Y) = (1/2) log(2πe σ_e^2). (16)

We observe from (16) that we can treat w_s^2 σ_s^2 as the "flow" generated by Source s, where the sum of the flows at Edge e incurs the entropy cost as in (16). Specifically we make the following definitions:

q_s := w_s^2 σ_s^2, q_* := min_{s∈S} q_s, g(x) := β + γ + (1/2) log(2πe x).

Here q_s represents the (unsplittable) flow demand generated by Source s, and q_* denotes the minimum demand. Hence under a flow routing scheme, the total amount of flow at Link e is given by ∑_{s∈S(e)} q_s. Then from (16), the associated communication cost incurred at Link e is given by g(∑_{s∈S(e)} q_s) bits; that is, g(⋅) represents the information rate of a flow aggregated at Link e. Unlike the previously defined cost functions, g is no longer a function of the number of sources on a link, but instead a function of the amount of flow on that link. Finally we define the aggregation cost function φ in terms of g as in (20) in order to meet the concavity condition for φ:

φ(x) := g(x), x ≥ q_*. (20)

φ is essentially identical to g, and the proof is complete if one can show that φ(x) is concave and increasing for all x ≥ q_*; since the logarithm is concave and increasing, this holds provided γ is chosen large enough that

g(q_*) ≥ 0. (21)
Hence under the condition (21), φ is an increasing concave function of the total flow aggregated on a link. In that case we can use the algorithm proposed by Meyerson et al. [19], which essentially extends the HM algorithm to problems with nonuniform source flow demands, and can approximate the optimal cost by a factor of log(n) + 1 on average.
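A small sketch (our own illustration; the constants β and γ are arbitrary placeholder values) of this flow view for Gaussian sources: each source contributes the demand q_s = w_s² σ_s², and a link carrying a set of demands costs g(x) = β + γ + (1/2) ln(2πe x) with x the total demand, which grows slowly, that is, concavely, as more flow is aggregated.

```python
import math

def edge_bits(demands, beta=8.0, gamma=16.0):
    """g(x) = beta + gamma + 0.5*ln(2*pi*e*x), evaluated at the total
    flow x = sum of the demands q_s = w_s**2 * sigma_s**2 on the link."""
    x = sum(demands)
    return beta + gamma + 0.5 * math.log(2 * math.pi * math.e * x)

# Demands q_s for three sources with (weight, variance) pairs of our choosing.
q = [w**2 * var for w, var in [(1.0, 1.0), (2.0, 0.5), (0.5, 4.0)]]  # [1, 2, 1]
cost_two = edge_bits(q[:2])
cost_all = edge_bits(q)
assert cost_all > cost_two  # g is increasing in the aggregated flow
# Economies of scale: adding q[2] to an existing aggregate costs far less
# than carrying q[2] on a separate link of its own.
assert cost_all - cost_two < edge_bits(q[2:])
```

This is exactly the concave, increasing cost structure that the nonuniform-demand extension of HM [19] is designed for.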
In summary, the key question was whether (P) can be cast as a flow aggregation problem when f(⋅) is weighted sum. In general, it is difficult to make such an association due to the asymmetry; however, we revealed that such a formulation is possible for independent Gaussian sources.

Discussions.
Note that some properties of the X_s's, such as the submodularity relation in (11) used to show that φ is a CAF, rely heavily on the independence of the X_s's. When the X_s's are correlated, we can find examples of φ which are not CAFs for the summary function sum, as follows. Let X_1 and X_2 be jointly Gaussian with the same marginal given by N(0, 1) and E[X_1 X_2] = ρ. Then X_1 + X_2 is distributed according to N(0, 2(1 + ρ)); thus we have that, if ρ < −0.5, then

H(X_1 + X_2) < H(X_1). (22)

Thus the entropy function does not satisfy the second condition of Definition 2, that is, the increasing property of a CAF. Hence for arbitrarily correlated sources, presumably few meaningful statements can be made about optimal aggregation problems, even for simple summary functions such as sum.
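The counterexample is easy to verify numerically (our own check, entropies in nats): for ρ < −0.5 the variance of X_1 + X_2 drops below 1, so its differential entropy falls below that of a single reading.

```python
import math

def h_gauss(var):
    # Differential entropy (nats) of a Gaussian RV with the given variance.
    return 0.5 * math.log(2 * math.pi * math.e * var)

rho = -0.8                # correlation coefficient with rho < -0.5
var_sum = 2 * (1 + rho)   # Var(X1 + X2) for the standard jointly Gaussian pair
# Aggregating two readings produces FEWER bits than one reading alone,
# violating the increasing property required of a CAF.
assert h_gauss(var_sum) < h_gauss(1.0)
print(h_gauss(var_sum), "<", h_gauss(1.0))
```

Strongly negative correlation makes the readings cancel in the sum, which is precisely why entropy can shrink under aggregation here.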
The discussion so far enables us to deal with more general objective functions extended from (P). Consider a function ψ: R_+ → R_+ which is concave and increasing. We now define the communication overhead on an edge as the function ψ of the average number of bits transmitted over the edge. Namely, we consider the following extension of (P):

(P′) minimize ∑_{e∈E_T} w_e ψ(c_e).

Consider (P′) for the summary function sum with i.i.d. sources, and for weighted sum with independent Gaussian sources. Note that the composition of two concave and increasing functions is also concave and increasing [35]. Thus ψ(c_e) is a concave and increasing function of the amount of flow at an edge, and thus is a CAF. Hence the HM algorithm can be used to approximate (P′).

Aggregation Schemes for Extreme-Type Summary Functions

5.1. Case Study.
In this section we consider summary functions regarding the extreme statistics of the measurements, that is, max or min. We will first investigate the entropy of the extreme statistics of a set of RVs. Consider m measurements denoted by X_i, i = 1, ..., m. Since max_{1≤i≤m} X_i = −{min_{1≤i≤m}(−X_i)}, we will focus only on max without loss of generality. It is easily seen that the max function is divisible; thus the aggregation process is similar to that for sum: a node simply applies the max function to the aggregated data. For example, suppose a node receives data given by x_1, ..., x_m. The node simply computes max_{i=1,...,m} x_i and forwards it to the next hop.
For extreme-type summary functions, we will show that φ is in general not a CAF. In particular we consider several cases of practical importance.
Case 2 (extreme data retrieval problem). We consider the problem of extreme data retrieval, defined as follows. Assume that a source node s ∈ S measures some physical quantity which is distributed according to a continuous RV X_s. We assume the X_s's are independent but not necessarily identically distributed. Suppose that with some probability X_s is equal to a large number, which indicates an "abnormal" event. An important application of sensor networks is to detect the maximum abnormality among the measurements. The abnormality is defined as how far a sensor's measurement has deviated from its usual statistics, as follows. Let us denote the cumulative distribution function (CDF) of X_s by F_s(⋅), that is, P(X_s ≤ x) = F_s(x), x ∈ R. Consider realizations of X_1, ..., X_n given by x_1, ..., x_n. We quantify the abnormality at Source s in terms of how unlikely the measurement x_s is: specifically, the goal of the sink is to retrieve min_{s∈S}[P(X_s > x_s)] or, alternatively, max_{s∈S} F_s(x_s); thus the abnormality of x_s is defined by F_s(x_s). Let U_s = F_s(X_s).
We will assume that the nodes transmit and aggregate Y_i instead of X_i, and the goal of the sink is to retrieve max_{i∈S} Y_i.
Note that since Y_i = F_i(X_i) is an RV evaluated at its own distribution function, one can show that the Y_i's are i.i.d. RVs uniformly distributed on [0, 1]. Thus the problem reduces to an optimal aggregation problem retrieving the max of i.i.d. uniform RVs.
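This reduction rests on the probability integral transform: F_i(X_i) is Uniform(0, 1) whatever the (continuous) distribution of X_i. The following sketch checks this numerically; the Gaussian parameters and sample sizes are illustrative choices, not values from our experiments.

```python
import math
import random

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal RV, expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

random.seed(1)
# Independent but non-identical Gaussian sources X_i (hypothetical params).
params = [(0.0, 1.0), (5.0, 2.0), (-3.0, 0.5)]

# Y_i = F_i(X_i) should be Uniform(0,1) regardless of (mu, sigma).
for mu, s in params:
    xs = [random.gauss(mu, s) for _ in range(20000)]
    ys = [gaussian_cdf(x, mu, s) for x in xs]
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    assert abs(mean - 0.5) < 0.02        # Uniform(0,1) mean is 1/2
    assert abs(var - 1.0 / 12.0) < 0.02  # Uniform(0,1) variance is 1/12
```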
We will show that the φ associated with the extreme data retrieval problem is a convex and decreasing function when the number of aggregated measurements is greater than or equal to 2. Suppose the data aggregated at a node is given by Y_1, ..., Y_m and define Z_m := max_{i=1,...,m} Y_i. As previously, we assume that the node requires on average φ(m) = q + b + h(Z_m) bits to transmit Z_m.

Proposition 4. Consider the extreme data retrieval problem. The aggregation cost function φ(m) is convex and decreasing for m ≥ 2.
Proof. Since Z_m is the maximum of m i.i.d. uniform RVs, the CDF of Z_m, denoted by F_{Z_m}(⋅), is given by F_{Z_m}(z) = z^m, z ∈ [0, 1]. Thus the probability density function (pdf) of Z_m, denoted by f_{Z_m}, is given by f_{Z_m}(z) = m z^{m−1}. Computing h(Z_m) yields h(Z_m) = −∫_0^1 m z^{m−1} ln(m z^{m−1}) dz = (m − 1)/m − ln m (in nats). Thus φ(m) = q + b + (m − 1)/m − ln m. By regarding m as a continuous variable, we have that, for m ≥ 1, φ′(m) = 1/m² − 1/m ≤ 0 and φ″(m) = (m − 2)/m³. Clearly φ(m) is decreasing for m ≥ 1, and since its second-order derivative is nonnegative for m ≥ 2, φ(m) is convex for m ≥ 2. On the right of Figure 2 the plot of h(Z_m) is shown. Note that h(Z_m) is strictly convex for m ≥ 2, but overall appears to be approximately convex. Note that h(Z_m) is nonpositive; thus one could select a sufficiently large q such that φ(m) = q + b + h(Z_m) ≥ 0 for all 1 ≤ m ≤ k.
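The closed form h(Z_m) = (m − 1)/m − ln m and the claimed shape of φ(m) can be checked numerically, as in the following sketch; the values of q and b are illustrative, not those used in our simulations.

```python
import math

def h_closed(m):
    # Differential entropy (nats) of the max of m i.i.d. Uniform(0,1) RVs:
    # pdf f(z) = m z^(m-1)  =>  h = (m-1)/m - ln m.
    return 1.0 - 1.0 / m - math.log(m)

def h_numeric(m, n=200000):
    # Midpoint-rule evaluation of -∫_0^1 f(z) ln f(z) dz.
    total = 0.0
    for i in range(n):
        z = (i + 0.5) / n
        f = m * z ** (m - 1)
        total -= f * math.log(f) / n
    return total

q, b = 4.0, 3.0  # illustrative header / quantization bit counts
phi = lambda m: q + b + h_closed(m)

for m in range(1, 8):
    assert abs(h_closed(m) - h_numeric(m)) < 1e-3
for m in range(1, 20):
    assert phi(m + 1) < phi(m)  # phi decreasing for m >= 1
for m in range(3, 20):
    # Centered second difference; the stencil [m-1, m+1] lies in [2, inf),
    # where phi'' = (m-2)/m^3 >= 0, so phi is convex there.
    assert phi(m + 1) - 2 * phi(m) + phi(m - 1) >= 0
```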
In general, for a convex and decreasing φ, (P) is clearly NP-hard, since the problem contains the Steiner tree problem as a special case. In the following section we present a novel algorithm which captures key properties of a convex and decreasing φ. Later we show by simulation that the algorithm effectively achieves low cost.

Algorithm for Convex and Decreasing
Aggregation Cost Functions

5.2.1. Motivation. Before we describe our algorithm we present the motivation behind it. An important observation for data aggregation problems was made in [25] for a concave and increasing φ. They proposed a "hub-and-spoke" model for the so-called facility location problem. The idea is that when φ is concave and increasing, one should first aggregate flows at some "hubs," then route the aggregated flow from the hubs to the sink at minimum cost; this is done by building an approximately optimal Steiner tree in which the hubs (facility locations) are the Steiner nodes. The rationale is that, once multiple flows are aggregated at hubs, the cost of routing them collectively to the sink is cheaper than routing the sources' flows separately, due to the concavity of φ. We observe two aspects in such hub-and-spoke schemes. Firstly, by local aggregation of flows at hubs we greedily reduce costs based on local information, which we view as the microscopic approach to cost reduction. Secondly, by building an approximately optimal Steiner tree with respect to the hubs and the sink, we take the global network structure into account, which can thus be seen as the macroscopic aspect of cost reduction. Hence there exists a tradeoff between the microscopic and macroscopic aspects of cost reduction. A similar observation on such a tradeoff was made in [12]. However, our key question is: how do we achieve an optimal tradeoff between those aspects for a convex and decreasing φ? Consider the three examples of aggregation cost functions denoted by φ_1, φ_2, and φ_3, which are decreasing and convex for m ≥ 1, as shown in Figure 3. In the case of φ_1, we see that φ_1 is flat for m ≥ 1; that is, the average number of bits communicated over a link is constant irrespective of the number of flows passed through it. Thus, the minimum cost routing problem reduces to a Steiner tree problem, in which case a completely "macroscopic" solution is optimal. In the case of φ_2, we see that φ_2 decreases slowly in m. Thus, the more flows merge at a link, the fewer bits it takes to transmit the merged information. Suppose we use the hub-and-spoke scheme to aggregate flows in a local manner. The amount of aggregated flow at a hub is at least 2; note, however, that φ_2 is approximately "flat" for m ≥ 2. This implies that, once two or more flows are aggregated, the benefit from further local flow aggregation is negligible. Hence the optimal routing problem from the hubs to the sink approximately reduces to the Steiner tree problem! Thus one could expect that local aggregation (the microscopic approach) followed by an optimal Steiner tree construction (the macroscopic approach) would yield a good solution. Now let us consider φ_3. The overall rate of decrease of φ_3 is higher than that of φ_2. It appears that when the number of aggregated flows is sufficiently high, for example, m greater than 6, φ_3 becomes effectively "flat." This suggests that one should keep aggregating flows until a sufficient amount of flow, say 6, is aggregated, that is, apply the microscopic cost reduction multiple times in a hierarchical manner, and then build an optimal Steiner tree with respect to the aggregated sources, that is, apply the macroscopic reduction. The example provides us with some insights. Since φ(m) is convex and decreasing, the marginal benefit of local aggregation is large for small m but decreases with increasing m. In other words, when m is small, that is, in the early stages of the overall aggregation process, one should focus on low-cost local aggregation in order to benefit from the high rate of decrease of φ(m) for small m. Meanwhile, once a large number of flows are aggregated, it pays to perform macroscopic cost reduction from there on by building optimal Steiner trees, since φ becomes more "flat" with increasing m. This suggests that there exists a tradeoff point at which the microscopic and macroscopic reductions are optimally balanced. Unfortunately, it is difficult to know such a tradeoff point in advance. The proposed algorithm not only exploits both the microscopic and macroscopic aspects of cost reduction for a convex and decreasing φ, but also empirically searches for the optimal tradeoff point. Details are presented in the following section.

5.2.2.
Outline. An outline of the proposed algorithm is as follows. The algorithm consists of multiple stages. A hub-and-spoke problem (or facility location problem) is approximately solved at each stage. The flows from source nodes are merged at the hubs. The hubs at the present stage become the source nodes in the next stage; that is, the flows are merged hierarchically. Instead of solving a complex facility location problem, we find a minimum weight edge cover (MWEC) on the source nodes at each stage as a simple approximation. The rationale is that we would like to cluster sources for local aggregation at low cost, and by definition the MWEC incurs low cost in doing so. The MWEC consists of multiple connected components, each of which is a tree. For each connected component we select a source as a hub and call it a center node (details on the selection of center nodes are provided later). The flows in that component are aggregated at the center node.
At each stage, once the center nodes are determined, we build an approximately optimal Steiner tree with respect to the center nodes and the sink. We use the algorithm in [36] for the Steiner tree construction. Their algorithm provides the best known ρ-approximation for the Steiner tree problem, where ρ ≈ 1.39.
Each stage outputs an aggregation tree. The output tree at Stage k is the union of the paths from all the hierarchical aggregations found up to Stage k and the Steiner tree built at Stage k. Namely, the output tree at Stage k is a combination of k consecutive hierarchical aggregations (microscopic cost reduction) and a Steiner tree with respect to the sink and the Stage-k hubs (macroscopic cost reduction).
Hence, over the stages, the algorithm progressively changes the balance between the microscopic and macroscopic aspects of cost reduction in the output trees. Roughly speaking, the output trees from later stages are more biased towards the microscopic aspect. After the stages are over, we pick the tree with the minimum cost among the output trees. As a result, the algorithm empirically searches for the point of the "best" balance between the two aspects of cost reduction over the stages. Hence one could expect that our algorithm will work well for any convex and decreasing φ.

Algorithm Description.
We present a formal description of the proposed algorithm followed by an explanation of further details. For a given aggregation tree T ⊆ G, let C(T) denote the total energy cost associated with T, as in the objective of (P).
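The objective C(T) sums, over the tree edges, the edge weight times the per-unit-flow cost φ(f_e). A minimal sketch of this evaluation follows; the toy tree, its weights, and the choice of q and b are hypothetical, and φ is the extreme-data-retrieval cost from Section 5.

```python
import math

def tree_cost(edges, flows, phi):
    """Total energy cost C(T) = sum over tree edges of w_e * phi(f_e),
    where f_e is the number of unit flows aggregated on edge e."""
    return sum(w * phi(flows[e]) for e, w in edges.items())

# Aggregation cost phi(m) = q + b + h(Z_m), with h(Z_m) the entropy (nats)
# of the max of m i.i.d. Uniform(0,1) RVs; q, b are illustrative.
q, b = 4.0, 3.0
phi = lambda m: q + b + (1.0 - 1.0 / m - math.log(m))

# A toy tree: sink r, sources a, b, c; edge -> weight and edge -> flow count.
edges = {("a", "x"): 2.0, ("b", "x"): 1.0, ("x", "r"): 3.0, ("c", "r"): 2.0}
flows = {("a", "x"): 1, ("b", "x"): 1, ("x", "r"): 2, ("c", "r"): 1}
cost = tree_cost(edges, flows, phi)
```

Note that the edge ("x", "r") carries two merged flows, so it is charged φ(2) < φ(1) per unit weight; this is exactly the saving that in-network aggregation buys.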

Hierarchical Cover and Steiner Tree (HCST) Algorithm
Begin Algorithm (1) (Metric completion of G) If G is not a complete graph, perform a metric completion of G to yield a complete graph. Namely, if there exists any pair of vertices without an edge, create an edge between the pair and assign the edge a weight equal to the distance between the pair. The distance is measured as the sum of the weights on a shortest path between the pair.
(4) (Initial output is a Steiner tree) Jump to Step 7.
(5) (Minimum weight edge cover) Let us denote the subgraph of G induced by S_{k−1} by G_k. Find a minimum-weight edge cover M_k in G_k. Let H_k = (S_{k−1}, M_k) be the subgraph of G induced by the cover.
(6) (Node selection) Suppose H_k has ν connected components, and denote the jth connected component of H_k by C_j = (V_j, E_j) for 1 ≤ j ≤ ν. For each C_j, select a node with the maximum degree (ties are broken arbitrarily), say c_j, which is called a center node. C_j is a tree, and c_j becomes the root of C_j. All the flows in C_j are aggregated at c_j such that every node transmits data to its parent node after the data from its child nodes has been aggregated at the node. The total flow at c_j is updated accordingly (it becomes the sum of the flows of the nodes in V_j). Remove all the noncenter nodes from S_{k−1}, and let S_k be the resulting set of source nodes.
(7) (Steiner tree construction) Build a ρ-approximate Steiner tree ST_k with respect to the source nodes in S_k and the sink, using the algorithm in [36].
(8) (Merging trees) If k > 0, merge all the MWECs found up to the present stage and the Steiner tree found in Step 7; that is, let T_k be the union of M_1, ..., M_k and ST_k. If k = 0, T_0 ← ST_0. We call T_k the output tree of Stage k. (10) (Tree selection) The final output is the tree T_{k*} with k* = argmin_k C(T_k), that is, the minimum cost tree among the output trees from all the stages. End Algorithm
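Step 1 of the algorithm above can be sketched with an all-pairs shortest-path computation; Floyd–Warshall is one standard way to do this, and the path-graph example below is purely illustrative.

```python
def metric_completion(n, weights):
    """Step 1: all-pairs shortest-path distances (Floyd-Warshall) turn a
    connected weighted graph into a complete graph whose edge weights
    satisfy the triangle inequality."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in weights.items():  # undirected edges
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Path graph 0-1-2: the completion adds the edge (0,2) with weight 1+2=3.
d = metric_completion(3, {(0, 1): 1.0, (1, 2): 2.0})
```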
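For intuition on Step 5, the following sketch computes a minimum weight edge cover by exhaustive search on a tiny graph; in practice one would use the reduction of MWEC to minimum weight perfect matching [37], and the graph below is a hypothetical example.

```python
from itertools import combinations

def min_weight_edge_cover(nodes, weights):
    """Cheapest edge subset touching every node (brute force; only
    feasible for very small graphs, shown for illustration)."""
    edges = list(weights)
    best, best_w = None, float("inf")
    # An optimal cover never needs more than |nodes| edges.
    for r in range(1, len(nodes) + 1):
        for sub in combinations(edges, r):
            covered = {u for e in sub for u in e}
            if covered == set(nodes):
                w = sum(weights[e] for e in sub)
                if w < best_w:
                    best, best_w = sub, w
    return best, best_w

nodes = [0, 1, 2, 3]
weights = {(0, 1): 1.0, (1, 2): 4.0, (2, 3): 1.0, (0, 3): 5.0, (1, 3): 2.0}
cover, cost = min_weight_edge_cover(nodes, weights)
# The two cheap disjoint edges (0,1) and (2,3) cover all four nodes.
```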
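The component and center-node selection of Step 6 might be sketched as follows; the traversal and the example node labels are illustrative choices.

```python
def center_nodes(cover_edges):
    """Step 6 sketch: split the cover into connected components and pick a
    maximum-degree node of each component as its center node."""
    adj = {}
    for u, v in cover_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, centers = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = [], [start]  # collect one connected component
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        # Ties broken arbitrarily; max() keeps the first maximum found.
        centers.append(max(comp, key=lambda u: len(adj[u])))
    return centers

# Star 1-2, 1-3 plus the pair 4-5: centers are node 1 and one of {4, 5}.
cs = center_nodes([(1, 2), (1, 3), (4, 5)])
```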

Comments.
We explain the details of several steps of the algorithm. In Step 3 the flow variables f_i, i ∈ S, associated with the source nodes are initialized; we track the amount of flow throughout the algorithm. In Step 6 it is natural to select a node with the maximum degree as the center node, since such a node is literally a "hub." When solving the hub-and-spoke problem at each stage, we choose to solve the MWEC problem, whereas in [25] the load-balanced facility location problem is solved. An advantage of solving the MWEC problem is that it is considerably simpler than load-balanced facility location problems, since an MWEC problem can be reduced to a minimum weight perfect matching problem [37]. Note that the algorithm in [25] solves the hub-and-spoke problem only once; that is, its output is analogous to the output tree from Stage 1 of our algorithm. Meanwhile, the HM algorithm solves minimum weight perfect matching at each stage in order to locally aggregate flows at low cost. The HM algorithm solves the matching problem hierarchically until all the flows are aggregated to a single source, and its final output is the union of those matchings. Thus its final output is analogous to that from the final stage of our algorithm. In other words, the outputs of the abovementioned algorithms correspond to those from intermediate stages of our algorithm. The HIERARCHY algorithm proposed in [20] hierarchically constructs Steiner trees and solves load-balanced facility location problems, however in a way which heavily relies on the concave and increasing property of φ. Thus that algorithm may not be suitable for a convex and decreasing φ.

Performance Analysis.
In this section we analyze the performance of the HCST algorithm. For a set E of weighted edges, let ‖E‖ denote the sum of its edge weights, that is, Σ_{e∈E} w_e.
For a given source set Σ, let E_st(Σ) denote the edge set of the optimal Steiner tree associated with Σ.
Proposition 5. For a given network graph G = (V, E), the cost achieved by the HCST algorithm is higher than that of the optimal algorithm by a factor of at most β, defined in (32), where I (I ≤ log_2 k) denotes the stage at which the HCST algorithm terminates. ρ ≈ 1.39 denotes the approximation ratio for the Steiner tree problem, and γ_k ∈ [0, 1] is the ratio of the sums of edge weights between the MWEC M_k at Stage k of the HCST algorithm and the Steiner tree associated with the source set S, that is, γ_k = ‖M_k‖ / ‖E_st(S)‖. Also, α is defined in (34) in terms of the edge weights of G, where w_[i] denotes the ith smallest of the edge weights of G. Note that the second summation term of (32) is defined to be 0 if k = 0.
Proof. Denote the optimal cost by OPT. We first find a lower bound on OPT. Let E* denote the set of edges of the optimal aggregation tree. Let us sort the edge flows of E* in increasing order and denote them by f_i, that is, 0 < f_1 ≤ f_2 ≤ ⋯ ≤ f_L, where E* has L edges. There are at least k nonzero flows since there are k sources; hence f_1 > 0 and L ≥ k hold.
Let us denote the weight of the edge that carries flow f_i by v_i. For real numbers a and b, let a ∧ b := min(a, b). We then obtain a chain of inequalities in which (36) follows from Jensen's inequality, due to the convexity of φ, and (37) follows from the definition of Steiner trees. Considering that φ is decreasing, we would like to make the argument of φ in (37) as large as possible in order to find a lower bound on OPT. Hence we would like to maximize the quantity Ψ(v_1, ..., v_L) defined in (38), where v_i, i = 1, ..., L, are chosen from the edge weights of G. For the purpose of maximizing (38), we will assume (39). We first observe that Ψ(⋅) is decreasing in v_1, ..., v_{⌊L/2⌋}, since if i ≤ ⌊L/2⌋, we have (40). Hence Ψ(⋅) can be maximized over v_1, ..., v_{⌊L/2⌋} by choosing the ⌊L/2⌋ smallest weights from the edge weights of G, that is, by letting v_i = w_[i] for i = 1, ..., ⌊L/2⌋. Next we derive an upper bound for Ψ(w_[1], ..., w_[⌊L/2⌋], v_{⌊L/2⌋+1}, ..., v_L) as in (41) and (42). For inequality (42), we used the fact that (41) is increasing in its remaining arguments. Since the cost of the HCST algorithm is min_{k=0,...,I} C(T_k), the proposition is proved.
An interpretation of the ratio β in (32) is as follows: the first term in the bracket of β represents a bound on the macroscopic cost associated with the Steiner tree approximation. The second term in the bracket of β is a bound on the cost associated with the hierarchical aggregation of flows, that is, the microscopic cost reduction. Clearly we have ‖M_1‖ ≥ ‖M_2‖ ≥ ⋯, due to S_1 ⊇ S_2 ⊇ ⋯; thus γ_1, γ_2, ... is a decreasing sequence with 0 ≤ γ_k ≤ 1 for all k. The progressive cost reduction due to hierarchical flow aggregation is reflected in γ_1, γ_2, .... As in (32), β is the minimum of I + 1 numbers, each of which contains a weighted sum of φ(⋅) with a different combination of weights γ_k. Hence β represents the empirical minimum over different degrees of tradeoff between microscopic and macroscopic cost reduction.
Next we discuss the constant α in (34). Firstly, observe that α ≤ k; the first summation in the numerator of (34) is at most k Σ_{i=1}^{⌊L/2⌋} v_i, in which case the first term of (34) is at most k. Note that a naive upper bound for Ψ(v_1, ..., v_L) is simply k, yielding the lower bound OPT ≥ ‖E_st(S)‖φ(k); however, our bound (43) improves upon this, since φ(α) ≥ φ(k).
β can be numerically computed for a given graph, and in the next section we provide numerical examples of β. We also apply the HCST algorithm to a specific graph as an example.

Illustrating Examples.
In this section we consider a simple convex and decreasing φ. As previously, the packet header length is q bits, and we assume that the maximum packet size is 10 times the header length, that is, 10q. We accordingly consider a φ(m) which is convex and decreasing for m ≥ 1, of the form given in (50). Clearly q < φ(m) ≤ 10q holds for m ≥ 1.
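Since (50) is not reproduced here, the sketch below uses stand-in cost functions with the stated properties (convex, decreasing, and q < φ(m) ≤ 10q for m ≥ 1): a harmonic decay and an exponential decay with rate θ, as described for Figures 4 and 5. The concrete forms φ(m) = q + 9q/m and φ(m) = q + 9q·exp(−θ(m − 1)) are assumptions for illustration only.

```python
import math

q = 1.0  # header length in bits (illustrative)

# Hypothetical stand-ins for (50): "harmonic" decays like 1/m,
# "exp" like exp(-theta(m-1)); both satisfy q < phi(m) <= 10q.
phi_har = lambda m: q + 9.0 * q / m
theta = 0.2
phi_exp = lambda m: q + 9.0 * q * math.exp(-theta * (m - 1))

for phi in (phi_har, phi_exp):
    assert phi(1) == 10.0 * q  # maximum packet size at m = 1
    for m in range(1, 50):
        assert q < phi(m) <= 10.0 * q
        assert phi(m + 1) < phi(m)                       # decreasing
    for m in range(2, 50):
        assert phi(m + 1) - 2 * phi(m) + phi(m - 1) > 0  # convex
```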
Figures 4 and 5 show numerical examples of the performance bound β. β is computed and averaged over randomly generated graphs of uniformly distributed nodes in a square area. In Figure 4, the network size N is fixed to 200, and β is plotted against the number of source nodes k. We consider two types of cost functions: the curve labelled "harmonic" represents the cost function (50), in which φ(⋅) decreases as a harmonic sequence. The curve labelled "exp" corresponds to the case where the term m^{−1} in (50) is replaced by exp(−θ(m − 1)), where the parameter θ > 0 controls the decay rate of the cost function. We set θ = 0.2 in this example. In addition, we compare β with a simple analytical bound: suppose we build a ρ-approximate Steiner tree based on S. The cost under that tree is at most ρ‖E_st(S)‖φ(1). By combining that cost with (43), we obtain a simple approximation ratio of ρφ(1)/φ(α) for the approximately optimal Steiner tree. In Figure 4, the plots of such bounds based on the ρ-approximate Steiner tree are added for both harmonic and exponential cost functions, labelled "Steiner(har)" and "Steiner(exp)," respectively. We observe that β provides improved bounds compared to those based on the ρ-approximate Steiner tree. In Figure 5, β is plotted against varying N under the aforementioned harmonic and exponential cost functions, where we fix k to 10. In Figures 4 and 5, we observe that β eventually becomes nearly constant, or at most increases very slowly, even as the system size grows. Hence we conclude that β provides an approximation ratio which remains effectively constant irrespective of the system size. Next we present an example of the application of the HCST algorithm to a specific graph. An example of G is given in Figure 6(a).
G consists of N = 10 nodes, where Node 1 is the sink. There are four source nodes, S = {2, 3, 4, 5}, depicted in a shaded color. Each source generates 1 unit of data. We again consider the convex and decreasing φ(⋅) given by (50), and assume q = 1. Figure 6(b) shows the output of Stage 0, T_0, which is an approximately optimal Steiner tree. Figure 7 shows the MWECs over the stages. Figure 7(a) shows the metric completion of the subgraph induced by S.
Thus the final output of HCST is T_2, with a final cost of 275.5. Note that in this example the Shortest Path Tree (SPT) heuristic incurs an energy cost of 374.
Next consider a φ(m) that is constant for m ≥ 1. Assume that the algorithm yields the same T_0, T_1, and T_2 as in the previous case. Since φ is constant for m ≥ 1, the problem reduces to the Steiner tree problem; thus one would expect T_0 to perform the best, since T_0 is intended to be an approximately optimal Steiner tree. The energy costs are given by C(T_0) = 37, C(T_1) = 44, and C(T_2) = 43; thus the HCST algorithm indeed outputs T_0 as the best solution with cost 37, whereas the SPT heuristic yields an energy cost of 41. This demonstrates that our algorithm can effectively deal with various types of convex and decreasing aggregation cost functions. In the following section we evaluate the performance of the HCST algorithm by simulation.

Simulation
In our simulation we randomly generate G as follows. The node locations are generated independently and uniformly on a unit square. We define G as the Delaunay graph induced by the node locations. An example of G is depicted in Figure 10 for N = 20. As previously, it is assumed that the average number of bits required to transmit the aggregated information f(x_1, ..., x_m) is approximately q + b + h(f(X_1, ..., X_m)), where we set the header length q to 1 and the number of quantization bits b to 3. The edge weights, which represent the energy consumption per transmitted bit, are randomly selected from {1, ..., 10}. In our simulation two types of sources are considered. The first type, called the uniform type, is associated with the extreme data retrieval problem; that is, the Y_i are i.i.d. uniform on [0, 1]. The second type, called the Gaussian type, is associated with retrieving the maximum of Gaussian source data where X_i ∼ N(0, 1). The summary function f(⋅) is the max function.
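To build intuition for the energy savings reported below, consider a hypothetical line network of m uniform-type sources with unit-weight edges, aggregating the max toward a sink at one end. The comparison against raw forwarding is a simplified sketch with the same q and b as the simulation, not a reproduction of the simulated Delaunay setup.

```python
import math

q, b = 1.0, 3.0  # header and quantization bits, as in the simulation
# Bits per transmission when m uniform-type flows are merged:
# phi(m) = q + b + h(Z_m), with h(Z_m) = (m-1)/m - ln m (nats).
phi = lambda m: q + b + (1.0 - 1.0 / m - math.log(m))

def cost_aggregated(m):
    # Sources s_1 - s_2 - ... - s_m - sink on a line with unit edge costs;
    # the edge leaving s_i carries the max of i merged flows.
    return sum(phi(i) for i in range(1, m + 1))

def cost_raw(m):
    # Without aggregation, the edge leaving s_i forwards i separate
    # (q + b)-bit packets.
    return sum(i * (q + b) for i in range(1, m + 1))

for m in (2, 4, 8):
    assert cost_aggregated(m) < cost_raw(m)
```

For m = 8 the aggregated chain costs roughly 27 units against 144 for raw forwarding, which mirrors why all aggregation-aware schemes beat plain forwarding in the simulations.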
We compare the performance of the HCST algorithm with the HM algorithm [16] and the SPT heuristic. Figure 11 shows the average energy consumption of the algorithms when we fix the number of sources to 8 and vary N. The energy cost shown on the left (resp. right) of Figure 11 is associated with sources of the uniform (resp. Gaussian) type. We observe that the HCST algorithm achieves lower energy costs than the SPT heuristic for both types of sources. The energy savings achieved by the HCST algorithm range over 35-38% for uniform type sources and 24-25% for Gaussian type sources. Compared to the HM algorithm, our algorithm reduces the energy consumption by 20-21% and 14-15% for uniform and Gaussian type sources, respectively. The HM algorithm focuses on microscopic cost reduction, which may be effective for concave and increasing cost functions but not for convex and decreasing ones. Comparing the SPT heuristic and the HCST algorithm, we observe that the difference in mean energy consumption between the algorithms slightly increases with N. This can be interpreted as follows: in larger networks there is more room for improvement by HCST; for example, there are more choices of Steiner nodes and more ways to merge sources at low cost by MWEC. Thus the performance gain of the HCST algorithm relative to the SPT heuristic is expected to grow with N, as shown in the simulation.
Figure 12 shows the mean energy costs with varying N, where we scale the number of sources in proportion to N. Specifically, in the simulation we let k = N/5; that is, one out of five nodes is a source node. In the figure we see that the HCST algorithm again outperforms the SPT heuristic. The relative energy savings by the HCST algorithm range over 19-41% for uniform type sources and 14-27% for Gaussian type sources.
Relative to the HM algorithm, the HCST algorithm saves energy costs by 20-23% and 14-17% for uniform and Gaussian type sources, respectively. The difference in energy cost between the algorithms increases with N, similar to the case of a fixed number of sources; however, the rate of increase is higher in the case of a varying number of sources. This can be explained as follows. When we increase the network size, the number of sources also increases proportionally. As the network grows, by the previous argument that there is more room for improvement by HCST, its relative gain increases with the network size. In addition, since the number of sources grows, the total number of stages at the end of the HCST algorithm also increases. Since HCST chooses the best tree from the intermediate output trees collected over the stages, a large number of stages implies that we can choose the final output tree from a large pool of trees having various degrees of tradeoff between the microscopic and macroscopic aspects of cost reduction. Thus the abundance of source nodes enables us to choose an aggregation tree with a "refined" tradeoff, which is crucial for a convex and decreasing φ. This explains the enhanced performance of HCST with an increasing number of sources. Hence we conclude from the simulation that the HCST algorithm improves performance for various proportions of source nodes in the network.

Figure 1 :
Figure 1: An example of computing and communicating a summary.

Figure 2 :
Figure 2: The differential entropy of the maximum of a set of i.i.d. RVs distributed according to an RV X. On the left, X ∼ N(0, 1), and on the right, X is uniform on [0, 1].

Figure 3 :
Figure 3: Aggregation cost functions which are convex and decreasing for m ≥ 1.

Figure 4 :
Figure 4: Performance bounds under varying number of sources.

Figure 5 :
Figure 5: Performance bounds under varying network sizes.
Figure 7(b) shows the MWEC at Stage 1. Nodes 4 and 5 became the center nodes, as emphasized in the figure.
Figure 7(c) shows the MWEC and the center node at Stage 2.

Figure 8 (
Figure 8(a) shows the full paths of the MWEC at Stage 1, that is, the one in Figure 7(b), in G. By building an approximately optimal Steiner tree ST_1 associated with {1, 4, 5} and taking the union of ST_1 and M_1 as in Step 8, we get T_1 as in Figure 8(b). Similarly, Figure 9 demonstrates Stage 2 of the algorithm. The full paths for the MWEC from Figure 7(c) in G are shown in Figure 9(a). Note that Node 4 is selected as the center node, and the output from Stage 2, T_2, is shown in Figure 9(b). Let us compare the energy costs from all the stages. For T_0, a total of three flows pass through the link between Nodes 1 and 3, while the flow on the other links is simply 1. Thus, the cost of T_0 from Stage 0 is given by

Figure 10 :
Figure 10: An example of a randomly generated G for simulation.

Figure 11 :
Figure 11: Energy cost associated with a fixed number of sources.
because, over all possible permutations σ(1), σ(2), ..., σ(L) of {1, 2, ..., L}, we chose σ and the largest possible weights w_[|E|], w_[|E|−1], ... for v_{⌊L/2⌋+1}, v_{⌊L/2⌋+2}, ..., in order to maximize Σ_{i=⌊L/2⌋+1} v_i. From (42), we obtain Ψ(v_1, ..., v_L) ≤ α. Hence from (37), we obtain the lower bound OPT ≥ ‖E_st(S)‖φ(α) in (43). Now let us consider the cost of the output tree at Stage k of the HCST algorithm, C(T_k). Recall that in the HCST algorithm, S_k denotes the source set at Stage k, and T_k denotes the output tree at Stage k. The cost of T_k is divided into (i) the cost incurred by the hierarchical MWECs M_1, ..., M_k, and (ii) the cost of the ρ-approximate Steiner tree ST_k associated with S_k; this decomposition gives (44). Let f_e denote the amount of flow at Edge e under the HCST algorithm. Note that the amount of flow in the network at Stage k is at least 2^{k−1}, since the flows are agglomerated through MWECs at every stage. Since φ(⋅) is decreasing, the first summation of (44) can be bounded accordingly. Also, S_k ⊆ S; specifically, the Steiner tree for S is a tree that spans S_k, hence by definition the sum of edge weights of E_st(S_k) is no more than that of the Steiner tree associated with S.