Energy-Efficient β-Approximate Skylines Processing in Wireless Sensor Networks

Because the first priority of query processing in wireless sensor networks is to save the limited energy of sensor nodes, and because in many sensing applications a part of the skyline result is enough to meet the user's requirement, calculating the exact skyline is comparatively energy-inefficient. Therefore, this paper proposes a new approximate skyline query, the β-approximate skyline query, whose error is limited by a guaranteed bound. To reduce the communication cost of evaluating β-approximate skyline queries, we also propose an energy-efficient processing algorithm based on mapping and filtering strategies, named Actual Approximate Skyline (AAS). In addition, an extended algorithm named Hypothetical Approximate Skyline (HAS), which replaces the real tuples with hypothetical ones, is proposed to further reduce the communication cost. Extensive experiments on synthetic data demonstrate the efficiency and effectiveness of the proposed approaches under various experimental settings.


Introduction
Wireless sensor networks (WSNs), which integrate sensor technology, embedded computing, networks, wireless communication, and distributed information processing, are widely used in military and civil fields [1,2], such as object tracking, nuclear reactor control, fire detection, and battlefield surveillance. A WSN is a wireless network consisting of spatially distributed autonomous sensor nodes, which are densely deployed either inside or close to the phenomenon and cooperatively monitor physical or environmental conditions, such as temperature, sound, vibration, humidity, and pressure, at different locations. Sensor nodes are generally cheap, resource-constrained, unreliable, and battery powered; moreover, most WSNs work in unattended, hard-to-reach environments, where it is impossible or at least very difficult to change batteries. Therefore, applications over WSNs need a scalable, energy-efficient, and fault-tolerant method to manage the tremendous amount of data generated by sensors and to minimize power consumption so as to prolong the lifetime of WSNs.
As an important operator for multicriteria decision making and data mining, the skyline query [3] has been well studied in the database literature. Given a dataset D, the skyline Skyline(D) of D returns all tuples that are not dominated by any other tuple in D. Here, a tuple t_i dominates another tuple t_j if t_i is no worse than t_j in every dimension and better than t_j in at least one dimension. The terms "worse" and "better" can stand for any preference judgment. Without loss of generality, we assume in this paper that smaller values are preferred. As illustrated in Figure 1(a), t_1, t_2, t_6, t_7, and t_11 are the skyline tuples.
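The dominance test and the skyline definition above can be illustrated with a minimal Python sketch. The function names `dominates` and `skyline` and the sample data are illustrative assumptions, not taken from the paper, and the quadratic scan is only the simplest correct formulation, not one of the cited algorithms.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and better somewhere (smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-dimensional readings, smaller values preferred.
data = [(1, 9), (2, 6), (4, 7), (5, 3), (8, 8), (9, 1)]
print(skyline(data))  # [(1, 9), (2, 6), (5, 3), (9, 1)]
```

Here (4, 7) and (8, 8) are dropped because (2, 6) is no worse than either of them in both dimensions and strictly better in at least one.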
In many sensing applications, a part of the skyline result rather than the whole one is enough for users to make decisions. For example, an administrator is mainly concerned with the nodes with heavier traffic and lower battery in order to find out how the whole WSN works and to take actions that prolong its lifespan. In this application, the administrator needs only a rough picture of the relationship between traffic and power consumption, while evaluating a skyline query in WSNs is costly in terms of energy consumption; it is therefore unnecessary to compute the exact skyline, and an approximate one that shows the general shape of the skyline is enough. Approximate skylines have already been studied in the traditional database literature. However, the existing solutions cannot be applied to the sensor environment directly because of the unique characteristics of WSNs. In WSNs, energy is the precious resource and wireless communication is its main consumer [4], so the main challenge of approximate skyline processing in WSNs is how to minimize the communication cost. Although an approximate skyline that returns a subset of the results over WSNs was studied in [5], its definition may not represent the exact skyline well, as it does not guarantee an error bound. Therefore, this paper proposes a new approximate skyline definition that returns a subset of the skyline within a certain error bound, together with the corresponding energy-efficient query processing algorithms. The contributions of this paper are summarized as follows.
(i) An error-bounded β-approximate skyline is proposed, together with the Actual Approximate Skyline (AAS) algorithm, which is based on mapping and filtering and reduces the communication cost among sensor nodes when evaluating the β-approximate skyline in WSNs.
(ii) The Hypothetical Approximate Skyline (HAS) algorithm, which is not only limited by the error bound but also replaces several real tuples with hypothetical ones, is proposed to further improve the query processing efficiency.
(iii) Extensive experimental studies using synthetic data show that the proposed approaches can significantly reduce the communication cost among sensor nodes and save energy during the evaluation of approximate skyline queries in WSNs.
The rest of the paper is organized as follows. Section 2 briefly reviews the previous related work. The proposed Actual Approximate Skyline (AAS) and Hypothetical Approximate Skyline (HAS) algorithms are introduced in Sections 3 and 4, respectively. The experimental evaluation results showing the effectiveness and energy efficiency of the proposed approximate skyline algorithms are reported in Section 5. Finally, Section 6 concludes this paper.

Related Work
The skyline operator was first introduced in [3], where two algorithms based on block nested loops (BNL) and divide-and-conquer (D&C), respectively, were proposed. As a variant of BNL, a sort-filter-skyline (SFS) algorithm, which improves the performance by presorting the dataset according to some monotone scoring function, was proposed in [6]. Two progressive processing algorithms, Bitmap and Index, were proposed in [7]; both of them can obtain the skyline result set without scanning the whole dataset. A nearest neighbor (NN) approach was investigated in [8], which processes the skyline query progressively by using the result of the nearest neighbor query to partition the data space recursively. Papadias et al. [9] proposed a branch-and-bound skyline (BBS) algorithm that takes advantage of the R-tree to improve the performance of NN. Besides the original skyline definition, many skyline variants have been proposed. Jin et al. [10] suggested the concept of the thick skyline, which offers more results for users to choose from. An approximate skyline algorithm based on BBS was proposed in [11]. Chan et al. [12] relaxed the idea of dominance to k-dominance. A novel metric called skyline frequency, which compares and ranks the interestingness of data points, was considered in [13]. Lin et al. [14] studied the problem of selecting k skyline points so that the number of points dominated by at least one of these k skyline points is maximized. Benouaret et al. [15,16] introduced two new concepts based on an extension of the (Pareto) dominance relationship, called the α-dominant skyline [15] and the β-dominant skyline [16], to tackle multicriteria service selection, and then proposed an efficient and flexible Web service selection framework that implements these skyline variants [16].
All the methods introduced above apply only to centralized database systems. In distributed database systems, and especially in WSNs, the situation is more complex: not only is the data stored in a distributed manner, but the nodes are also battery powered. As a result, energy efficiency should be the first priority of any algorithm designed for WSNs, considering the limited battery power of the sensor nodes.
Various skyline processing algorithms in WSNs have been studied recently. Chen et al. [17] presented a hierarchical threshold-based approach to minimize the transmission traffic in WSNs. Xin et al. [18] presented a filter-based approach that employs two types of filters (tuple filter and grid filter) within each sensor to reduce the cost of transmission traffic in WSNs. In [19], they also presented an energy-efficient approach that uses the mapped skyline as the filter. In [20], filter-based distributed algorithms for skyline evaluation and maintenance were studied to maximize the network lifetime. In [21], the dataset is partitioned into several disjoint subsets, and skyline points can be found progressively by examining each subsequent subset. Multiple skyline queries over WSNs were discussed in [22], where an energy-efficient multiskyline evaluation (EMSE) algorithm that reduces the transmission cost with two optimization mechanisms was investigated. Shen et al. [23] studied position-based two-dimensional skyline queries in WSNs and proposed the Ring-Skyline algorithm, which calculates the skyline for each ring in order of distance, from near to far. Su et al. [24] proposed a data-centric algorithm named Skysensor that uses a cluster-based architecture; Skysensor reduces the energy consumption of each query by letting several skyline queries started by different sensors share the same data gathering process. Pan et al. [5] investigated an approximate skyline (AS) algorithm that computes an approximate skyline result set by making only some of the sensor nodes transmit their sensor data back; the approximate skyline result set is just a subset of the exact skyline set. This approach can reduce the energy cost efficiently, but it does not consider the error bound between the approximate skyline and the exact skyline, which may make the skyline result one-sided and inadequate. In this paper, we propose a new definition of approximate skyline that takes both the inclusion relationship and the error bound into consideration, and we call it the β-approximate skyline. It can always show the general distribution of the exact skyline, which compensates for the weakness of the approximate skyline in [5].

Approximate Skyline in Wireless Sensor Networks
In this section, the network routing structure and the definition of the β-approximate skyline are first introduced in Section 3.1. Then, the preliminaries that form the foundation of our proposed approaches are presented in Section 3.2. Finally, the details of the Actual Approximate Skyline (AAS) algorithm are described in Section 3.3. The Notations section summarizes the notations used throughout the paper.

Problem Statement.
The tree-based routing structure is established to tackle the β-approximate skyline in WSNs. It constructs a spanning tree with the base station as the root.
The construction process is as follows. All nodes set their own level to infinity. The base station, whose level is usually set to zero, broadcasts a message with its own id and level to construct the routing tree. Any node that hears the message compares its own level with the level in the message; if the former is bigger, the node replaces its level with the latter plus one and chooses the sender as its parent. Each of these nodes then replaces the id and level in the message with its own and rebroadcasts the routing message to its neighbors. The routing tree is constructed step by step in this way. The base station initiates this construction process periodically, so the network topology is rebuilt periodically and the structure can easily adapt to the movement, addition, or deletion of nodes.
Although the approximate skyline algorithm proposed by Pan et al. [5] can reduce the communication cost efficiently in WSNs, it may seriously affect the precision of the skyline result, so that the result cannot show the general distribution of the exact skyline very well. In practical applications, whether a query result is "good" or "bad" depends on the precision of the approximate skyline. Hence, besides requiring that the approximate skyline be a subset of the exact one, the distance between them should also be taken into consideration, to make sure that the approximate skyline shows the general distribution of the exact skyline well. Consequently, a new definition of approximate skyline named the β-approximate skyline is proposed and given as Definition 1.
Definition 1. Given a dataset D with skyline S, if there is a dataset S̃ such that S̃ is a subset of S (S̃ ⊆ S) and the distance Dis(S̃, S) between the surfaces of S̃ and S satisfies Dis(S̃, S) ≤ β, then S̃ is a β-approximate skyline of D.
Here, the surface is the shape of the region dominated by at least one element of a set of skyline tuples. The surface is composed of the zigzag plane of the skyline, and the distance between two skyline surfaces is the maximum of all distances between a tuple on one skyline surface and the other surface (i.e., the point-to-surface distance). As Figure 1 illustrates, the dominating region of a β-approximate skyline is almost the same as that of the exact skyline: the distance between the exact skyline and the approximate one is no more than β.
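The routing-tree construction described in the Problem Statement can be sketched as a breadth-first flood from the base station. This is a simplified, centralized simulation under stated assumptions: the function name `build_routing_tree`, the adjacency-list input, and the update test (a node adopts a sender only when that lowers its level) are illustrative choices, not the paper's exact message protocol.

```python
from collections import deque
from math import inf
from typing import Dict, List, Optional, Tuple


def build_routing_tree(
    neighbors: Dict[int, List[int]], base: int
) -> Tuple[Dict[int, float], Dict[int, Optional[int]]]:
    """Assign a level and a parent to every node by flooding from the base station."""
    level: Dict[int, float] = {node: inf for node in neighbors}  # all levels start at infinity
    parent: Dict[int, Optional[int]] = {node: None for node in neighbors}
    level[base] = 0  # the base station is the root
    queue = deque([base])
    while queue:
        sender = queue.popleft()  # this node "broadcasts" its id and level
        for node in neighbors[sender]:
            if level[node] > level[sender] + 1:
                level[node] = level[sender] + 1  # sender's level plus one
                parent[node] = sender            # the sender becomes the parent
                queue.append(node)               # the node rebroadcasts later
    return level, parent


# Hypothetical four-node topology; node 0 is the base station.
topology = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
levels, parents = build_routing_tree(topology, base=0)
print(levels)   # {0: 0, 1: 1, 2: 1, 3: 2}
print(parents)  # {0: None, 1: 0, 2: 0, 3: 1}
```

Rerunning the function on an updated topology corresponds to the periodic reconstruction that lets the tree adapt to moved, added, or removed nodes.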

Preliminaries.
First, the definition of the mapped skyline is introduced.
Using a regular grid, the data space can be partitioned into many cells. For every dimension, assume that the extent of a grid cell is β; then every dimension is divided into segments of width β, and every tuple is mapped to the integer coordinates of the cell that contains it.
Definition 2. Given a dataset D, the skyline Skyline(D′) of its mapped dataset D′ is defined as the mapped skyline of D.
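The grid mapping can be sketched as follows, assuming that a value is mapped by flooring its offset from the lower bound divided by β. The names `cells_per_dimension` and `map_tuple`, the bounds, and the clamping of values that sit exactly on the upper bound are assumptions made for this illustration.

```python
import math
from typing import List, Sequence, Tuple


def cells_per_dimension(
    lower: Sequence[float], upper: Sequence[float], beta: float
) -> List[int]:
    """Number of grid segments of width beta on each dimension."""
    return [math.ceil((hi - lo) / beta) for lo, hi in zip(lower, upper)]


def map_tuple(
    t: Sequence[float], lower: Sequence[float], upper: Sequence[float], beta: float
) -> Tuple[int, ...]:
    """Map a real-valued tuple to the integer coordinates of its grid cell."""
    counts = cells_per_dimension(lower, upper, beta)
    # Assumption: a value equal to the upper bound is clamped into the last cell.
    return tuple(
        min(math.floor((v - lo) / beta), r - 1)
        for v, lo, r in zip(t, lower, counts)
    )


lower, upper, beta = (0.0, 0.0), (1.0, 1.0), 0.25
print(cells_per_dimension(lower, upper, beta))             # [4, 4]
print(math.prod(cells_per_dimension(lower, upper, beta)))  # 16
print(map_tuple((0.3, 0.8), lower, upper, beta))           # (1, 3)
print(map_tuple((1.0, 1.0), lower, upper, beta))           # (3, 3)
```

A smaller β yields more cells and therefore a finer approximation, at the price of more distinct mapped tuples.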
Next, we introduce how to calculate the approximate skyline from the mapped skyline.
Lemma 3. If t_i ⪰ t_j, then M(t_i) = M(t_j) or M(t_i) ⪰ M(t_j).
Proof. If t_i and t_j belong to the same cell of the grid of extent β, then M(t_i) = M(t_j). Otherwise, M(t_i) ≠ M(t_j); since t_i is no worse than t_j in all dimensions and better than t_j in at least one dimension, the mapping function yields that M(t_i) is no worse than M(t_j) in all dimensions and better than M(t_j) in at least one dimension; therefore, M(t_i) ⪰ M(t_j). The proof is completed.
According to Lemma 3, we obtain Theorem 4, which is the foundation of the AAS algorithm.
Theorem 4. Given a dataset D, let D̃ be the set of tuples of D whose mapped tuples belong to Skyline(D′). Then Skyline(D̃) is a β-approximate skyline of D.
Proof. According to Definition 1, two aspects need to be proved.
(2) Dis(Skyline(D̃), Skyline(D)) ≤ β. By the mapping function, the distance between t and M(t) in each dimension is no more than β; therefore, the distance between Skyline(D) and Skyline(D′), or rather D̃, is no more than β. In the calculation of Skyline(D̃), the distance between the surfaces of the two skyline datasets cannot exceed β if we only test the dominance relationship among tuples that have the same mapped tuple. So Dis(Skyline(D̃), Skyline(D)) ≤ β.
In conclusion, Skyline(D̃) is a β-approximate skyline of dataset D.

Actual Approximate Skyline Algorithm.
According to Theorem 4, the calculation of the β-approximate skyline of dataset D, namely, ApprSkyline_β(D), consists of three steps. First, calculate which tuples in D′ belong to Skyline(D′). Then, find all the tuples whose mapped tuple M(t) is in Skyline(D′) and keep them in D̃. Finally, evaluate Skyline(D̃) as the β-approximate skyline of D.
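The three steps above can be sketched on a single dataset as follows. This is a minimal illustration under assumptions: the helper names (`dominates`, `skyline`, `map_tuple`, `approx_skyline`), the floor-based mapping, and the sample data are hypothetical, and the skylines are computed with a naive quadratic scan rather than any optimized method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Smaller is better: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Naive skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def map_tuple(t: Sequence[float], lower: Sequence[float], beta: float) -> Tuple[int, ...]:
    """Assumed mapping: integer cell coordinates on a grid of extent beta."""
    return tuple(math.floor((v - lo) / beta) for v, lo in zip(t, lower))


def approx_skyline(data: List[Point], lower: Sequence[float], beta: float) -> List[Point]:
    mapped = {t: map_tuple(t, lower, beta) for t in data}
    # Step 1: skyline of the mapped dataset.
    mapped_sky = set(skyline(list(set(mapped.values()))))
    # Step 2: keep the real tuples whose mapped tuple is in the mapped skyline.
    candidates = [t for t in data if mapped[t] in mapped_sky]
    # Step 3: skyline of the surviving real tuples.
    return skyline(candidates)


data = [(0.1, 0.9), (0.15, 0.8), (0.3, 0.4), (0.35, 0.45),
        (0.6, 0.3), (0.9, 0.1), (0.7, 0.7)]
result = approx_skyline(data, lower=(0.0, 0.0), beta=0.25)
print(result)  # [(0.1, 0.9), (0.15, 0.8), (0.3, 0.4), (0.9, 0.1)]
```

In this example (0.6, 0.3) belongs to the exact skyline but is filtered out because its cell (2, 1) is dominated by cell (1, 1); the returned set is therefore a proper subset of the exact skyline.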
In-network computation [4] is used to calculate the β-approximate skyline in WSNs; that is, local query results are computed inside the network whenever possible. First, the leaf sensor nodes in the routing tree calculate their local approximate skylines and send them to their parent nodes. Then, each intermediate sensor node merges its local sensing data with the β-approximate skyline results sent by its children and sends the merged intermediate result to its parent. Finally, the base station obtains the global β-approximate skyline of the WSN. The query processing in each sensor node (including leaf nodes, intermediate nodes, and the base station) is shown as Algorithm 1. First, we merge the β-approximate skyline S̃ of each child node into the dataset D of the current node and then get the mapped dataset D′ of D using the mapping function.
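The bottom-up evaluation over the routing tree can be simulated with a short recursive sketch. The names `evaluate_node`, `children`, and `readings`, the floor-based mapping, and the sample tree are assumptions for illustration; a real deployment would exchange radio messages instead of making recursive calls.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    return [p for p in points if not any(dominates(q, p) for q in points)]


def map_tuple(t: Sequence[float], lower: Sequence[float], beta: float) -> Tuple[int, ...]:
    return tuple(math.floor((v - lo) / beta) for v, lo in zip(t, lower))


def approx_skyline(data: List[Point], lower: Sequence[float], beta: float) -> List[Point]:
    mapped = {t: map_tuple(t, lower, beta) for t in data}
    mapped_sky = set(skyline(list(set(mapped.values()))))
    return skyline([t for t in data if mapped[t] in mapped_sky])


def evaluate_node(node: int, children: Dict[int, List[int]],
                  readings: Dict[int, List[Point]],
                  lower: Sequence[float], beta: float) -> List[Point]:
    """Merge local readings with the children's results, then filter locally."""
    merged = list(readings.get(node, []))
    for child in children.get(node, []):
        merged.extend(evaluate_node(child, children, readings, lower, beta))
    return approx_skyline(merged, lower, beta)  # this is what the node sends upward


# Hypothetical routing tree rooted at base station 0.
children = {0: [1, 2], 1: [3]}
readings = {
    0: [(0.7, 0.7)],
    1: [(0.3, 0.4), (0.35, 0.45)],
    2: [(0.6, 0.3), (0.9, 0.1)],
    3: [(0.1, 0.9), (0.15, 0.8)],
}
final = evaluate_node(0, children, readings, lower=(0.0, 0.0), beta=0.25)
print(sorted(final))  # [(0.1, 0.9), (0.15, 0.8), (0.3, 0.4), (0.9, 0.1)]
```

Node 1 forwards three tuples instead of four, because (0.35, 0.45) is already dominated locally; filtering at every hop is what saves transmissions.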

Hypothetical Approximate Skyline
Using the AAS algorithm to calculate the approximate skyline in WSNs reduces the communication cost significantly compared with calculating the exact one, so it can efficiently extend the lifespan of WSNs. In some sensing applications, however, users only need to know the rough distribution of the skyline rather than the real skyline tuple values. Consequently, we maintain the error bound of the β-approximate skyline but relax the restriction that the approximate skyline should be a subset of the exact skyline.
To this end, we use the expected tuple to replace all the tuples in each cell that contains part of the β-approximate skyline; we thus obtain the β-hypothetical skyline, denoted ApprSkyline^H_β(D), which is based on the expected tuples, also called hypothetical tuples in this paper. As Figure 2 shows, both the β-hypothetical skyline and the β-approximate skyline present the general distribution of the exact skyline very well, while the β-hypothetical skyline contains fewer tuples.
To calculate the β-hypothetical skyline, we first evaluate the mapped skyline over the whole WSN. Then, at the base station, we compute the hypothetical tuples corresponding to all the mapped tuples in the mapped skyline; these hypothetical tuples make up the hypothetical skyline result of the WSN. In this way, the distance between the hypothetical skyline and the exact skyline, namely, Dis(ApprSkyline^H_β(D), Skyline(D)), cannot exceed β, and the quantity of data transferred in the WSN is reduced significantly, since the communication cost of mapped tuples is far less than that of real tuples. The reasons are that there are fewer mapped tuples than real ones and that the values of mapped tuples are integers, whereas the values of real tuples are real numbers. The advantage of transmitting integer data has already been demonstrated in [19].
To get the hypothetical tuples, that is, the expected tuples, we can use the method of [11], which works as follows: using the equation () = 1/( ×  + 1), the expected value of a skyline tuple  in a unit cell can be calculated; then we can get the expected tuple using the equation x =  + ⌊( − )/⌋ ×  + /( ×  + 1).
The Hypothetical Approximate Skyline (HAS) algorithm, which produces the β-hypothetical skyline result, is shown as Algorithm 2. Each node first gets its own mapped dataset D′ and then merges the local mapped skyline S′ of each child node into D′. After that, the node computes the mapped skyline S′ of D′. Finally, the base station aggregates all the mapped skylines, determines the final mapped skyline, and then calculates the β-hypothetical skyline result from it using the equations mentioned above.
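The HAS flow (only integer cells travel up the tree, and the base station turns the final cells into hypothetical tuples) can be sketched as below. Important assumption: the paper's expected-value equation is not reproduced here; the parameter `offset_fraction`, which places the hypothetical tuple at a chosen fraction of the cell extent (0.5, the cell centre, by default), is a stand-in introduced only for this illustration, as are all function and variable names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]
Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: list) -> list:
    return [p for p in points if not any(dominates(q, p) for q in points)]


def map_tuple(t: Sequence[float], lower: Sequence[float], beta: float) -> Cell:
    return tuple(math.floor((v - lo) / beta) for v, lo in zip(t, lower))


def mapped_skyline_at(node: int, children: Dict[int, List[int]],
                      readings: Dict[int, List[Point]],
                      lower: Sequence[float], beta: float) -> List[Cell]:
    """Each node maps its readings, merges its children's cells, and keeps the skyline cells."""
    cells = {map_tuple(t, lower, beta) for t in readings.get(node, [])}
    for child in children.get(node, []):
        cells |= set(mapped_skyline_at(child, children, readings, lower, beta))
    return skyline(sorted(cells))  # only integer tuples are sent upward


def hypothetical_skyline(cells: List[Cell], lower: Sequence[float], beta: float,
                         offset_fraction: float = 0.5) -> List[Point]:
    """Base station only: turn each skyline cell into one representative tuple.

    offset_fraction is a hypothetical stand-in for the expected-value formula.
    """
    return [tuple(lo + (c + offset_fraction) * beta for c, lo in zip(cell, lower))
            for cell in cells]


children = {0: [1, 2], 1: [3]}
readings = {
    0: [(0.7, 0.7)],
    1: [(0.3, 0.4), (0.35, 0.45)],
    2: [(0.6, 0.3), (0.9, 0.1)],
    3: [(0.1, 0.9), (0.15, 0.8)],
}
cells = mapped_skyline_at(0, children, readings, lower=(0.0, 0.0), beta=0.25)
print(cells)  # [(0, 3), (1, 1), (3, 0)]
print(hypothetical_skyline(cells, lower=(0.0, 0.0), beta=0.25))
# [(0.125, 0.875), (0.375, 0.375), (0.875, 0.125)]
```

Each hypothetical tuple lies inside the cell of a mapped skyline tuple, so it stays within one cell extent of the real tuples it replaces, and the result holds three tuples where the AAS sketch on the same data would return four.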

Performance Evaluation
In this section, we present simulation results that compare the performance of our algorithms on independent, correlated, and anticorrelated data [3]. The results for correlated data are omitted, since they are similar to those for independent data. In order to show that the approximate skyline algorithms are more energy-efficient than the exact skyline algorithm, we also take TAG into consideration. The algorithms we compare are therefore (i) TAG: a Tiny AGgregation service for ad hoc sensor networks [4]; (ii) AAS: the Actual Approximate Skyline algorithm; (iii) HAS: the Hypothetical Approximate Skyline algorithm.

Experimental Settings.
We have developed a simulator in Java to evaluate the performance of our proposed algorithms; its parameters are the error bound, number of nodes, dimensionality, and cardinality. In our experiments, we place N sensor nodes at random in an area of √N × √N units, so that each node occupies one unit of space on average, and the communication radius of the nodes is set to 2√2 units. Meanwhile, the capacity of a packet transmitted in the network is limited to 48 bytes [6]. Table 1 presents the parameters we investigate along with their default values and ranges. In every experiment, we vary one parameter and keep the others at their default values. All the simulations are run on a PC with a 2.8 GHz CPU and 512 MB of memory. The performance metrics of our experiments include communication cost (number of messages) and result quality (relative error).
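The deployment described above can be reproduced with a small sketch. The paper's simulator is written in Java; this Python version is an independent illustration, and the names `deploy`, `neighbor_lists`, and `messages_needed`, the seed, and the way a payload is split into packets are assumptions rather than details taken from the paper.

```python
import math
import random
from typing import Dict, List, Tuple

Position = Tuple[float, float]


def deploy(n: int, seed: int = 0) -> List[Position]:
    """Place n nodes uniformly at random in a sqrt(n) x sqrt(n) area."""
    rng = random.Random(seed)
    side = math.sqrt(n)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def neighbor_lists(positions: List[Position],
                   radius: float = 2 * math.sqrt(2)) -> Dict[int, List[int]]:
    """Nodes within the communication radius of each other are neighbors."""
    return {
        i: [j for j, q in enumerate(positions)
            if j != i and math.dist(p, q) <= radius]
        for i, p in enumerate(positions)
    }


def messages_needed(payload_bytes: int, packet_capacity: int = 48) -> int:
    """Number of packets needed when one packet carries at most 48 bytes."""
    return math.ceil(payload_bytes / packet_capacity)


positions = deploy(100)
links = neighbor_lists(positions)
print(len(positions), messages_needed(100))  # 100 3
```

The resulting neighbor lists can serve as input for a routing-tree construction, and `messages_needed` gives one simple way to convert a payload size into the message count used as the communication-cost metric.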

Experimental Results.
As shown in Figures 3(a) and 3(b), the smaller the error bound, the higher the communication cost, since the number of mapped tuples increases as the error bound decreases. However, the growth of the communication cost slows down once the cost reaches a certain value, and this limit can be regarded as a tendency towards the cost of TAG. In addition, the communication cost of HAS is always lower than that of AAS, because transmitting integer tuples is more energy-efficient than transmitting tuples consisting of real numbers.
In Figures 4(a) and 4(b), we find that the relative error decreases as the error bound becomes smaller, since the number of skyline tuples calculated by our algorithms increases as the error bound decreases, which makes the skyline more precise. Moreover, the relative error of HAS is always higher than that of AAS, because the result of AAS consists of real skyline tuples whereas the result of HAS is made up of hypothetical ones. The relative error is controlled well by setting the error bound to an appropriate value, and when the error bound reaches some critical value such as 1/2^12, the relative error is close to zero. The convergence is good, and we set the default error bound to 1/2^12 in all experiments.
As shown in Figures 5(a) and 5(b), the communication cost increases as the number of nodes increases. TAG has the highest communication cost and HAS the lowest among the three algorithms. The reason is that TAG calculates the exact skyline result, while HAS transmits integer tuples in the network and AAS transmits tuples consisting of real numbers.
In Figures 6(a) and 6(b), since the distance between the approximate skyline and the exact skyline is controlled well by the mapping function, changing the number of sensors only makes the relative error fluctuate, and the influence is not very significant. Moreover, the relative error of AAS fluctuates less than that of HAS.
Figures 7(a) and 7(b) show that the communication cost increases with the dimensionality, because the probability of a tuple being dominated decreases as the dimensionality increases, which makes the number of skyline tuples to be transmitted larger. When the dimensionality increases, the relative error generally increases in Figures 8(a) and 8(b), since the ratio of the region dominated by both the approximate skyline and the exact skyline becomes smaller as the dimensionality increases.
As shown in Figures 9(a) and 9(b), the communication cost of TAG is larger than that of the other two algorithms. The communication cost does not vary much with the cardinality for any of the three algorithms, because the error bound is constant in this experiment. In Figures 10(a) and 10(b), we can see that when the cardinality changes, the relative errors of both approximate algorithms change only slightly. The reason is similar to the one given above for a varying number of nodes.

Conclusions
In most WSNs, energy is a critical resource and is mainly consumed by wireless communication, so minimizing the communication cost in WSNs is an essential problem. In this paper, we presented a comprehensive study of approximate skyline queries in WSNs. First, we proposed the definition of the β-approximate skyline, which requires not only that its tuples be members of the exact skyline but also that the distance between the approximate skyline and the exact skyline be no more than β. Then, the AAS algorithm, which uses mapping and filtering, was proposed to reduce the communication cost of evaluating the β-approximate skyline. Moreover, we extended the definition of the β-approximate skyline to the β-hypothetical skyline and proposed the HAS algorithm, whose skyline result consists of hypothetical tuples instead of real ones, to further reduce the communication cost. Our experimental results show that both AAS and HAS are energy-efficient in evaluating approximate skylines in WSNs.
r[i]: Mapping range on dimension i.

Figure 1: Examples of skyline and approximate skyline.

Figure 1(b) illustrates the approximate skyline corresponding to the dataset in Figure 1(a); compared with the exact skyline in Figure 1(a), its dominating region is almost the same.
With a grid of extent β, dimension i is divided into r[i] = ⌈(u[i] − l[i])/β⌉ segments, and in total there are ∏_{i=1}^{d} r[i] cells in the data space. In other words, the division is equivalent to using the mapping function M(v) = ⌊(v − l[i])/β⌋ to map every value v ∈ [l[i], u[i]] on dimension i, for all the tuples, to an integer in the range [0, r[i] − 1]. The mapped dataset D′ is obtained from the original dataset D in this way. The definition of the mapped skyline is given as Definition 2.

Figure 3: Number of messages versus error bound.

Figure 5: Number of messages versus number of nodes.
) and 8(b) since ratio of the region dominated by both

Figure 6: Relative error versus number of nodes.

Figure 7: Number of messages versus dimensionality.

Figure 9: Number of messages versus cardinality.

Notations
D: Dataset
D′: Mapped dataset
N: Number of sensor nodes
d: Dimensionality of D
t: A tuple in D
M(t): The mapped tuple of t
t_i ⪰ t_j: t_i dominates t_j
t_i ≻ t_j: t_i strictly dominates t_j
S = Skyline(D): The skyline operator
S̃ = ApprSkyline_β(D): The β-approximate skyline operator
Dis(S̃, S): Distance between the surfaces of S̃ and S
l[i]: Lower bound on dimension i
u[i]: Upper bound on dimension i
r[i]: Mapping range on dimension i

Algorithm 1:
for local β-approximate skyline S̃ of each child node do
    D = D + S̃;
D′ = M(D, β);
for each tuple t in D do
    t′ = M(t, β);
    if t′ is not dominated by any other tuples in D′ then
        D̃ = V(, , );