Distributed clustering is widely used in ad hoc deployed
wireless networks. Distributed clustering algorithms like
DMAC, HEED, MEDIC, ANTCLUST-based, and EDCR produce
well-distributed Cluster Heads (CHs) using dependent thinning
techniques where a node’s decision to be a CH depends on the
decision of its neighbors. An analytical technique to determine
the cluster density of this class of algorithms is proposed. This
information is required to set the algorithm parameters before
a wireless network is deployed. Simulation results are presented
in order to verify the analytical findings.
1. Introduction
Distributed clustering is a robust technique used to organize ad hoc deployed wireless nodes into a communication network [1]. It is widely adopted in energy-constrained ad hoc deployed wireless sensor networks (WSNs) [2]. Distributed clustering algorithms used in ad hoc deployed wireless networks can be broadly categorized into two classes. The first class consists of independent randomized cluster head (CH) selection algorithms; that is, a node decides to be a CH independently of the decisions of its neighboring nodes. Algorithms such as LEACH [3], LEACH-D [4], SEP [5], and EDAC [6] fall into this category. These algorithms do not produce well-distributed CHs [7]; they may select two or more adjoining nodes as CHs. Furthermore, for these algorithms the actual number of CHs obtained after deployment varies considerably from the theoretical expected number [8]. The second class consists of distributed clustering algorithms like DMAC [1], HEED [9], ANTCLUST-based [10], MEDIC [11], EDCR [12], and its derivatives [13]. In these algorithms, a node's decision to be a CH also depends on its neighbors' decisions. This ensures that no two CHs appear in each other's neighborhood and that every node either has at least one CH in its neighborhood or is itself a CH. Because they produce well-distributed clusters using dependent decision making, they are referred to as the Dependent Thinning Distributed Clustering (DTDC) class of algorithms. We note that the CH selection process of the DTDC class of algorithms resembles the reverse price auction, sometimes known as the Dutch auction [11] method.
Irrespective of the distributed clustering algorithm used in ad hoc deployed wireless network applications, the expected number of clusters, denoted by E[k], is an important parameter required at the planning stage of the network. For example, consider a WSN where data is collected periodically and aggregated at the CH, then communicated to the base station (BS). The application may expect E[k] clusters, each with an expected number of nodes, denoted by E[n], in the given deployment area A. This requirement is derived from the level of reliability expected from the collected data; that is, the reliability is directly connected to the redundancy of the nodes within a cluster [14]. As another example, an ad hoc deployed wireless network application may be required to produce an optimal number of clusters E[k] in order to minimize the energy cost of communication and maximize the network lifetime [15, 16]. In both examples, the WSN parameters should be set appropriately at the initial deployment stage so that, when in operation, the desired number of clusters E[k] is achieved to meet the design objective. In the first example, this objective is increased reliability, whereas in the second example it is to maximize the lifetime.
To illustrate the importance of knowing the WSN design parameters, let us look at a known example, the LEACH algorithm from the first category. LEACH uses a parameter p which represents the expected proportion of nodes to be CHs. That is, a node becomes a CH with probability p, independent of the decisions of its neighbors. When an independent randomized clustering algorithm like LEACH is applied to an ad hoc deployed network where N nodes are uniform-randomly deployed in a given area, the expected number of clusters can be found using the expression E[k]=pN [3]. According to [17], the node distribution of such a system is considered to be a 2-D Poisson point process with intensity λ=N/A, and the resultant CHs would also be distributed as a 2-D Poisson point process with intensity (i.e., CH density) λc=pλ. We see that by setting the WSN parameters p and N we can achieve a desired E[k]. That is, such analytical expressions play an important part in achieving the desired E[k].
To the authors' knowledge, no such analysis exists to determine the CH distribution and density (λc) of the DTDC class of algorithms. However, Bettstetter [18] has presented an empirical formula for the CH density of the DMAC algorithm using simulation results. As it is an empirical formula, it cannot be generalized. In this paper we present an analytical expression for the CH density of the DTDC class of algorithms in order to address this gap.
In what follows, we will first establish that DTDC algorithms such as HEED, ANTCLUST-based, DMAC, MEDIC, and EDCR fall into one common category in terms of their CH distribution. Then, we will determine the probability distribution of the cluster area of the DTDC class of algorithms. Subsequently, the distribution of the cluster area will be used to derive the cluster density. Furthermore, we will consider the boundary (or border) effect due to the finite geographical area in which the nodes are distributed and modify the expressions to accommodate it. The proposed analytical results will confirm that the empirical results derived using simulations by Bettstetter in [18] are indeed accurate.
The rest of the paper is organized as follows. Section 2 presents the nomenclature. Section 3 provides a mathematical model to express the CH selection and distribution common to all DTDC algorithms. In Section 4, this model is used to identify a probability distribution of the cluster area of the DTDC class of algorithms. In Section 5, these results are used to find the cluster density and the expected number of clusters in a rectangular and a circular deployment area. Simulation results presented in Section 6 establish that the analytical findings are in line with the actual values presented in the existing literature. Section 7 presents the conclusion.
2. Nomenclature
Table 1 gives the notations used in what follows. Some are extracted from [12].
Table 1: Brief definitions of some notations used in the paper.

| Notation | Definition |
|---|---|
| $\lambda$ | Intensity of the sensor node distribution point process. |
| $\alpha$ | Random variable denoting the cluster area. |
| $R$ | CH message broadcasting distance (using an omnidirectional antenna). |
| $T_i$ | Time at which node $i$ is supposed to broadcast its CH candidacy, provided that it has not heard a similar message from a neighbor. |
| $\mathcal{S}$ | Set of all $N$ sensor nodes deployed in the sensor bed. |
| $\mathcal{H}$ | Set of all CHs at a given moment. |
| $\lvert n_i - n_j \rvert$ | Euclidean distance between any two nodes $n_i$ and $n_j$. |
| $y$ | Random variable denoting the probability that a randomly selected node is a CH at a given moment. |
| $E[k]$ | Expected number of clusters, where $k$ represents the number of clusters at any given moment. |
| $E[n]$ | Expected number of nodes in a given cluster. |
| $\lambda_c$ | CH density. |
3. Preliminaries
This section presents the background necessary to find the CH density of DTDC class algorithms, covering HEED, DMAC, ANTCLUST-based, MEDIC, and EDCR. As mentioned before, these algorithms produce well-distributed CHs by making a node’s decision to be a CH depend on the decisions of other nodes in its neighborhood.
We assume that there are N uniform-randomly distributed nodes in a given deployment area A, resulting in a 2-D Poisson point distribution of intensity λ, where λ=N/A [17]. Furthermore, we assume that all clusters are well populated; that is, each cluster consists of a large number of less reliable, low-cost nodes which work collaboratively to achieve reliable results. Hence, $\alpha\lambda \gg 1$, where α is a random variable denoting the cluster area. According to the DTDC class of algorithms, the area covered by a CH candidacy message is given by $\pi R^2$, where R represents the maximum distance a CH candidacy announcement message would reach. Since $\alpha < \pi R^2$,
$$\pi R^2 \lambda \gg 1. \tag{1}$$
The following common features exist in the DTDC class of algorithms.
The DTDC class of algorithms does not allow two CHs to be within a distance R of each other. Furthermore, it ensures that every node is either discovered by a CH (i.e., there is a CH within a distance R of a regular node) or is itself a CH.
Each node calculates a time Ti at which it will broadcast the CH candidacy announcement, provided it has not heard a similar message from a neighbor by this time. The calculation of Ti is algorithm specific. However, all algorithms ensure that Ti is inversely proportional to the fitness of a node to be a CH. For example, in the EDCR algorithm, Ti is inversely proportional to the relative residual energy level of a node [12]. As such, the node with the highest fitness to become a CH has the lowest Ti, so it announces its CH candidacy first and becomes the CH for that neighborhood.
All the algorithms use a random component for tiebreaking. Hence, when all nodes are equally fit to be CHs, Ti is purely random. This is true for EDCR, HEED, MEDIC and ANTCLUST algorithms at the initial deployment stage since all nodes have equal energy.
The above features of the DTDC class of algorithms reaffirm that the selected CHs represent a dependent thinning point process on the original 2-D Poisson point process. Let $\mathcal{S}$ represent the set of all deployed nodes, where $\mathcal{S} \subset \mathbb{R}^2$ with $|\mathcal{S}| = N$. The clustering process yields a random set $\mathcal{H} \subseteq \mathcal{S}$ of secondary points, which are the CHs, with the property that $|h_i - h_j| > R$, where $h_i, h_j \in \mathcal{H}$ and $i \neq j$. Note that $\mathcal{S} \setminus \mathcal{H}$ contains the regular (non-CH) member nodes. For any node $m_k \in \mathcal{S} \setminus \mathcal{H}$ we have $|m_k - h_i| < R$ and $T_{m_k} > T_{h_i}$ for at least one CH node $h_i \in \mathcal{H}$. Further, it should be noted that $m_k$ is a member of the cluster with CH $h_i$ when $|m_k - h_i| < |m_k - h_j| < R$ for all $i \neq j$.
According to [19], the aforementioned dependent thinning point process follows a Matérn Type III process when Ti is a purely random value. Hence, we can conclude that the CH distribution of dependent thinning algorithms like HEED, ANTCLUST, DMAC, MEDIC, and EDCR immediately after deployment resembles a Matérn Type III point process.
Example 1.
Figure 1 gives a simplified description of Matérn Type III process applied to 3 random nodes a, b, and c with |a-b|=0.30, |b-c|=0.23, |a-c|=0.53, R=0.4, Ta=0.27, Tb=0.52, and Tc=0.78.
Figure 1: Simple example of Matérn Type III repulsive point process.
According to this illustration, since Ta<Tb, a eliminates b; since b is eliminated, even though Tb<Tc, c would not be eliminated; hence, nodes a and c will be elected as CHs. Even though the above description clearly indicates that the DTDC class of algorithms resemble a Matérn Type III process, we cannot find the resultant CH density (or expected number of clusters) using this information. As Bertil Matérn has shown in [20], the point distribution of Matérn Type III-dependent thinning process is mathematically intractable.
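The election in Example 1 can be reproduced with a few lines of Python. This is only a minimal sketch of a Matérn Type III style thinning under stated assumptions: the function name `elect_cluster_heads` and the collinear coordinates are hypothetical choices made here to match the given distances, and the sketch ignores message delays and any algorithm-specific fitness weighting.

```python
import math

def elect_cluster_heads(positions, times, R):
    """Return the names of the elected CHs in order of election.

    Nodes are processed in increasing order of their timer T_i. A node
    announces CH candidacy only if no already elected CH lies within R.
    """
    heads = []
    for name in sorted(times, key=times.get):
        if all(math.dist(positions[name], positions[h]) > R for h in heads):
            heads.append(name)
    return heads

# Assumed collinear placement giving |a-b| = 0.30, |b-c| = 0.23, |a-c| = 0.53.
positions = {"a": (0.0, 0.0), "b": (0.30, 0.0), "c": (0.53, 0.0)}
times = {"a": 0.27, "b": 0.52, "c": 0.78}

print(elect_cluster_heads(positions, times, R=0.4))  # ['a', 'c']
```

Node b is silenced by a, and because b never becomes a CH it cannot silence c, which is the distinguishing property of the Type III process.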
Based on this background, we will derive the CH density of the class of DTDC algorithms by finding the probability density function (p.d.f) of α for practical cases satisfying (1) in the next section.
4. Probability Density Function of Cluster Area
Based on our analysis we observe that the probability of α depends on the following two scenarios.
For a given cluster area, there are no uncovered nodes (uncovered node means a node that has not heard from a neighboring CH almost at the end of a new CH candidacy announcement time interval) in its cluster neighborhood.
The chance of having no such uncovered nodes.
Let $P_B$ be the probability that no uncovered nodes exist in a given cluster neighborhood. Then the conditional probability $P_{A\mid B}$ is the probability of a given cluster area, given that no uncovered nodes exist in the cluster neighborhood. Based on these facts, we find the probability $P_{A\cdot B}$ of a resultant cluster area α when no uncovered nodes exist. One finds that
$$P_{A\cdot B}(\delta_1 \le \alpha \le \delta_2) = P_{A\mid B}(\delta_1 \le \alpha \le \delta_2) \cdot P_B(\delta_1 \le \alpha \le \delta_2), \tag{2}$$
where $0 \le \delta_1 < \delta_2 \le \pi R^2$.
We use Figures 2, 3, and 4 to explain (2). Please note that the radius of each disk is R in all the figures.
Figure 2: Smallest possible cluster size is $\sqrt{3}R^2/2$.
Figure 3: Largest possible close-packed cluster size is $3\sqrt{3}R^2/2$.
Figure 4: A cluster area of more than $3\sqrt{3}R^2/2$ creates an uncovered region with area $A_u$ (shaded in gray).
According to the class of DTDC algorithms, smallest possible cluster area size would result whenever a given CH’s neighboring CHs sit on the perimeter of its CH broadcasting coverage disc of radius R since no two CHs could be selected within each other’s CH broadcasting range R. This situation is shown in Figure 2.
Hence, we can write
$$P_{A\mid B}\left(0 < \alpha < \frac{\sqrt{3}R^2}{2}\right) = 0. \tag{3}$$
In other words, Figure 2 shows the highest possible CH density (number of CHs in a given unit area). According to the DTDC class of algorithms, we can expect cluster area sizes between the smallest of $\sqrt{3}R^2/2$ and the largest of $\pi R^2$, provided that there are no uncovered nodes in the cluster neighborhood. Therefore, we can write
$$0 < P_{A\mid B}(\delta_3 \le \alpha \le \delta_4) \le 1, \tag{4}$$
where $\sqrt{3}R^2/2 \le \delta_3 < \delta_4 \le \pi R^2$.
Further, when we have close-packed clusters (smallest as shown in Figure 2 and largest as shown in Figure 3), there cannot be any uncovered areas. In contrast, when the cluster area $\alpha > 3\sqrt{3}R^2/2$, there can be uncovered nodes in its neighborhood, since there can be uncovered neighboring regions as shown in Figure 4.
$P_B(\alpha)$ represents the probability that there are no uncovered nodes in the neighborhood of a given cluster (with area α). This can be expressed by
$$P_B(\alpha) = P(n = 0 \mid \lambda A_u) = e^{-\lambda A_u}, \tag{5}$$
where $A_u$ is any uncovered area formed by the cluster setup as shown in Figure 4. We can show that the neighboring clusters are close packed when the cluster area $\alpha \le 3\sqrt{3}R^2/2$. In other words, there is no uncovered area, resulting in $A_u = 0$ for $\alpha \le 3\sqrt{3}R^2/2$. As a result, the probability that there are no uncovered nodes is given by
$$P_B\left(\alpha \le \frac{3\sqrt{3}R^2}{2}\right) = 1. \tag{6}$$
According to (5), $P_B(\alpha)$ is an exponentially decaying function when $\alpha > 3\sqrt{3}R^2/2$. Now let us consider Figure 5. This is a special case of Figures 3 and 4 where nodes 0 and 6 are placed a distance 2R apart. According to Figure 5, there is a chance for a node to be in the uncovered area $A_u$ shaded in gray. The cluster area α of Figure 5 can be expressed as
$$\alpha = \frac{3\sqrt{3}R^2}{2} + \left(\frac{\pi}{6} - \frac{\sqrt{3}}{4}\right)R^2 = \frac{3\sqrt{3}R^2}{2}\,(1 + 0.0349). \tag{7}$$
Figure 5: Proof of $P_B(\alpha > 3\sqrt{3}R^2/2) \to 0$.
This is only 3.49% bigger than the size of the cluster area shown in Figure 3. The uncovered area $A_u$ of Figure 5 is
$$A_u = 2\left(\Delta_{p,q,r} + \Delta_{p,q,c_0} + \Delta_{p,r,c_1} + \Delta_{q,r,c_6} - S_{p,q,c_0} - S_{p,r,c_1} - S_{q,r,c_6}\right), \tag{8}$$
where, in general, $\Delta_{x,y,z}$ represents the area of a triangle $\{x,y,z\}$, and $S_{x,y,z}$ represents the area of a sector $\{x,y,z\}$. Since $p = (-0.5R, -0.8660R)$, $q = (0, -R)$, $r = (-0.5446R, -1.1613R)$, $c_0 = (0, 0)$, $c_1 = (-1.5R, -0.8660R)$, and $c_6 = (0, -2R)$, we can derive $A_u = 0.094R^2$.
We have shown in (1) that $\lambda\pi R^2 \gg 1$. Therefore, if we consider a WSN with 100 nodes in a given node neighborhood, then $\lambda\pi R^2 = 100$ and the resultant $P_B = P(n = 0 \mid \lambda A_u) = 0.0502$. On the other hand, when the neighborhood contains 200 nodes, this is further reduced to $P_B = P(n = 0 \mid \lambda A_u) = 0.0025$. Hence, we can conclude that
$$P_B\left(\alpha > \frac{3\sqrt{3}R^2}{2}\right) \longrightarrow 0, \tag{9}$$
where $\lambda\pi R^2 \gg 1$.
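The two quoted probabilities follow directly from $P_B = e^{-\lambda A_u}$ with $A_u = 0.094R^2$ and $\lambda = M/(\pi R^2)$, where $M$ is the number of nodes in the neighborhood. A minimal Python check is sketched below; the names `A_U_COEFF` and `prob_no_uncovered` are introduced here only for illustration.

```python
import math

A_U_COEFF = 0.094  # uncovered area A_u = 0.094 R^2 from the Figure 5 geometry

def prob_no_uncovered(M):
    """P_B = exp(-lambda * A_u) with lambda = M / (pi R^2); R cancels out."""
    return math.exp(-A_U_COEFF * M / math.pi)

for M in (100, 200):
    print(M, round(prob_no_uncovered(M), 4))
# 100 0.0502
# 200 0.0025
```

Because the exponent grows linearly with $M$, every additional 100 nodes in the neighborhood multiplies $P_B$ by roughly 0.05.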
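Therefore, we can approximate that
$$P_B\left(\alpha \le \frac{3\sqrt{3}R^2}{2}\right) = 1, \qquad P_B\left(\alpha > \frac{3\sqrt{3}R^2}{2}\right) = 0 \tag{10}$$
provided that $\lambda\pi R^2 \gg 1$. Hence, once we combine (2), (3), (4), and (10), we obtain that
$$P_{A\cdot B}\left(\alpha < \frac{\sqrt{3}R^2}{2} \ \text{or}\ \alpha > \frac{3\sqrt{3}R^2}{2}\right) = 0. \tag{11}$$
Therefore,
$$P_{A\cdot B}\left(\frac{\sqrt{3}R^2}{2} \le \alpha \le \frac{3\sqrt{3}R^2}{2}\right) = 1. \tag{12}$$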
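The resultant cluster areas α of the DTDC class of algorithms have an equal chance of falling anywhere in the interval $[\sqrt{3}R^2/2,\ 3\sqrt{3}R^2/2]$, because all nodes have an equal chance of getting the lowest Ti when they have equal fitness to be a CH. This results in a uniform cluster area p.d.f. $p_{A\cdot B}(\alpha)$. Hence,
$$p_{A\cdot B}(\alpha) = \begin{cases} \dfrac{1}{\sqrt{3}R^2}, & \dfrac{\sqrt{3}R^2}{2} \le \alpha \le \dfrac{3\sqrt{3}R^2}{2}, \\[2mm] 0, & \text{otherwise}, \end{cases} \tag{13}$$
provided that $\pi R^2\lambda \gg 1$.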
Thus far we have derived the p.d.f. of the cluster area α. This result will be used in deriving the expected cluster density in the subsequent section.
5. Derivation of Expected Cluster Density
In this section, we will derive the expected cluster density (or CH density as each cluster is served by one and only one CH) for the class of DTDC algorithms.
Let us define y as the probability that a randomly chosen node is a CH. Thus,
$$y = \frac{\text{Number of CHs in a given area}}{\text{Number of all nodes in the same given area}} = \frac{1}{\text{Number of nodes in a random cluster}} = \frac{1}{\alpha\lambda}. \tag{14}$$
We note that when $z = f(x)$ and x is a random variable with a p.d.f. of $p_X(x)$, then the p.d.f. of z is given by
$$p_Z(z) = \frac{p_X\left(f^{-1}(z)\right)}{\left|f'\left(f^{-1}(z)\right)\right|}. \tag{15}$$
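We can write the p.d.f. of the random variable y, $p_Y(y)$, using (13) as
$$p_Y(y) = \begin{cases} \dfrac{1}{\sqrt{3}\,y^2\lambda R^2}, & \dfrac{2}{3\sqrt{3}\lambda R^2} \le y \le \dfrac{2}{\sqrt{3}\lambda R^2}, \\[2mm] 0, & \text{otherwise}. \end{cases} \tag{16}$$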
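For readers who want the intermediate step, the p.d.f. of y can be obtained by applying the change-of-variable rule to $y = f(\alpha) = 1/(\alpha\lambda)$ with the uniform cluster area p.d.f. The following worked derivation is a sketch of that step, written with $p_A$ as shorthand for the uniform p.d.f. of α:

```latex
\begin{align*}
f(\alpha) &= \frac{1}{\alpha\lambda}, \qquad
f^{-1}(y) = \frac{1}{y\lambda}, \qquad
f'(\alpha) = -\frac{1}{\alpha^{2}\lambda}, \\
\left|f'\!\left(f^{-1}(y)\right)\right| &= \frac{(y\lambda)^{2}}{\lambda} = y^{2}\lambda, \\
p_Y(y) &= \frac{p_A\!\left(1/(y\lambda)\right)}{y^{2}\lambda}
        = \frac{1}{\sqrt{3}R^{2}}\cdot\frac{1}{y^{2}\lambda}
        = \frac{1}{\sqrt{3}\,y^{2}\lambda R^{2}} .
\end{align*}
```

The support follows from mapping the end points of the cluster area interval: $\alpha = 3\sqrt{3}R^2/2$ gives the lower limit $y = 2/(3\sqrt{3}\lambda R^2)$, and $\alpha = \sqrt{3}R^2/2$ gives the upper limit $y = 2/(\sqrt{3}\lambda R^2)$.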
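According to (14), $y = k/N$, where k is the total number of CHs at a given moment and N is the total number of nodes. Hence $E[y]$, the expected probability that a given node is a CH, can be given as
$$E[y] = E\left[\frac{k}{N}\right] = \frac{E[k]}{N} = \frac{\lambda_c}{\lambda}, \tag{17}$$
where $\lambda_c$ is the CH density. So we have
$$\lambda_c = E[y]\,\lambda, \tag{18}$$
$$E[y] = \int_{-\infty}^{\infty} y\,p_Y(y)\,dy = \int_{2/(3\sqrt{3}\lambda R^2)}^{2/(\sqrt{3}\lambda R^2)} y\,\frac{1}{\sqrt{3}\,y^2\lambda R^2}\,dy = \frac{1}{\left(\sqrt{3}/(\pi\ln 3)\right)\left(\pi\lambda R^2\right)} = \frac{1}{0.5018\left(\pi\lambda R^2\right)}. \tag{19}$$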
According to (19), we can expect a 0.5018 fraction of nodes belonging to a given CH’s broadcasting range R neighborhood to join its cluster.
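The integral behind the 0.5018 constant is elementary; the steps are spelled out below as a sketch for completeness, using only the p.d.f. of y and its limits as stated above.

```latex
\begin{align*}
E[y] &= \frac{1}{\sqrt{3}\lambda R^{2}}
        \int_{2/(3\sqrt{3}\lambda R^{2})}^{2/(\sqrt{3}\lambda R^{2})} \frac{dy}{y}
      = \frac{1}{\sqrt{3}\lambda R^{2}}
        \ln\!\left(\frac{2/(\sqrt{3}\lambda R^{2})}{2/(3\sqrt{3}\lambda R^{2})}\right)
      = \frac{\ln 3}{\sqrt{3}\lambda R^{2}}, \\
\frac{\ln 3}{\sqrt{3}\lambda R^{2}}
     &= \frac{1}{\dfrac{\sqrt{3}}{\pi\ln 3}\,\pi\lambda R^{2}},
\qquad \frac{\sqrt{3}}{\pi\ln 3} \approx 0.5018 .
\end{align*}
```

Since $1/E[y]$ is the expected number of nodes per cluster and $\pi\lambda R^2$ is the expected number of nodes within range R, their ratio is the 0.5018 fraction quoted above.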
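Further, using (18) and (19), we can show that
$$\lambda_c = \frac{\lambda}{\left(\sqrt{3}/(\pi\ln 3)\right)\pi\lambda R^2} = \frac{\ln 3}{\sqrt{3}R^2} = \frac{1}{0.5018\,\pi R^2}. \tag{20}$$
Hence, we can conclude that the expected CH density $\lambda_c$ is independent of the node density provided that $\pi R^2\lambda \gg 1$.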
Observation 1.
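The result obtained in (20) matches the empirical formula proposed by Bettstetter in [18], where $\lambda_c = \lambda/(1 + \mu/2)$ and $\mu = \pi R^2\lambda$. When $\pi R^2\lambda \gg 1$, the empirical formula proposed by Bettstetter reduces to
$$\lambda_c = \frac{\lambda}{0.5\mu} = \frac{1}{0.5\,\pi R^2}. \tag{21}$$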
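The agreement in Observation 1 can also be checked numerically. The sketch below compares the analytical density $\ln 3/(\sqrt{3}R^2)$ with the empirical expression $\lambda/(1+\mu/2)$ for increasing $\mu = \pi R^2\lambda$; the function names and the value R = 20 are illustrative assumptions, not part of the original analysis.

```python
import math

# Constant appearing in the analytical result: sqrt(3) / (pi * ln 3).
C = math.sqrt(3) / (math.pi * math.log(3))

def ch_density_analytical(R):
    """Analytical CH density ln(3) / (sqrt(3) R^2), independent of lambda."""
    return math.log(3) / (math.sqrt(3) * R ** 2)

def ch_density_empirical(lam, R):
    """Empirical formula lambda / (1 + mu/2) with mu = pi R^2 lambda."""
    mu = math.pi * R ** 2 * lam
    return lam / (1.0 + mu / 2.0)

def density_ratio(M, R=20.0):
    """Empirical over analytical density when M = pi R^2 lambda."""
    lam = M / (math.pi * R ** 2)
    return ch_density_empirical(lam, R) / ch_density_analytical(R)

print(round(C, 4))
for M in (10, 50, 200, 1000):
    print(M, round(density_ratio(M), 4))
```

Algebraically the ratio equals $2C\mu/(2+\mu)$, so it does not depend on R and approaches $2C \approx 1.004$ as μ grows, while for small μ the two expressions differ noticeably, which is consistent with the well-populated-cluster assumption.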
In the analysis thus far we have ignored the influence of the boundary of the node deployment region. In what follows, we analyze this boundary effect. The CHs closest to the boundary do not have any neighboring CHs beyond the boundary; that is, nodes at the boundary have a higher isolation probability even though all the nodes are uniformly distributed within the deployed area. Hence, CHs are more likely to be found at the boundary. This was observed and confirmed in [18].
We can use (17) and (19) to derive the expected number of clusters E[k] to be formed, assuming that the boundary effect does not exist. In other words, we have relaxed the reality that there can be more CHs close to the boundary compared to the rest of the area. Thus,
$$E[k] = N\,E[y] = \frac{N}{\left(\sqrt{3}/(\pi\ln 3)\right)M}, \tag{22}$$
where $M = \pi R^2\lambda$ is the expected number of nodes in any given CH's broadcasting range R. That is, in (22), we have not considered the boundary effect. In what follows, we will derive M considering the boundary effect for frequently considered node deployment region shapes, namely, a rectangular region and a circular region. Subsequently, we will use these results to obtain E[k] accounting for the boundary effect.
5.1. Boundary Effect on E[k] due to a Rectangular Deployment Area
We derive M for a rectangular region with dimensions $a \times b$ and N ad hoc deployed nodes. For this scenario, the probability $P_0$ that two uniformly distributed nodes are within CH candidacy broadcasting range R of each other is given by the integral
$$P_0 = \int_0^R f_S(s)\,ds, \tag{23}$$
where $f_S(s)$ is the p.d.f. of the distance S between two nodes that are independently and uniformly distributed (at random) in a rectangular area of size $a \times b$, where $a \ge b > R$. According to [21], $f_S(s)$ is given by
$$f_S(s) = \frac{4s}{a^2b^2}\left(\frac{\pi ab}{2} - as - bs + \frac{1}{2}s^2\right) \quad \text{for } 0 \le s \le b. \tag{24}$$
Further, when there are $N\,(\gg 1)$ uniformly distributed nodes in the deployment region, we can expect M nodes in a given CH neighborhood of radius R, where M is given by
$$M = N P_0. \tag{25}$$
Hence, using (23)–(25), we can derive
$$M = \frac{N R^2 \pi}{ab}\left(1 - \frac{4R}{3\pi ab}(a+b) + \frac{R^2}{2\pi ab}\right). \tag{26}$$
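The closed form for M comes from integrating the distance p.d.f. term by term over $0 \le s \le R$, which is valid because $R < b$. A sketch of the intermediate steps:

```latex
\begin{align*}
P_0 &= \int_0^R \frac{4s}{a^{2}b^{2}}
       \left(\frac{\pi ab}{2} - (a+b)s + \frac{s^{2}}{2}\right) ds \\
    &= \frac{4}{a^{2}b^{2}}
       \left(\frac{\pi ab\,R^{2}}{4} - \frac{(a+b)R^{3}}{3} + \frac{R^{4}}{8}\right) \\
    &= \frac{\pi R^{2}}{ab}
       \left(1 - \frac{4R}{3\pi ab}(a+b) + \frac{R^{2}}{2\pi ab}\right),
\end{align*}
```

and multiplying by N gives M. The leading factor $\pi R^2/(ab)$ is the boundary-free value, and the bracket is the correction for neighborhoods that are clipped by the edges of the rectangle.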
Therefore, when (26) is used with (22), we can derive the expected number of CHs. Thus,
$$E[k] = \frac{ab\ln 3}{\sqrt{3}R^2\left(1 - \dfrac{4R}{3\pi ab}(a+b) + \dfrac{R^2}{2\pi ab}\right)}. \tag{27}$$
As we have already discussed, deriving the CH candidacy broadcasting range R for a desired E[k] is a salient requirement in most applications. Hence, rearranging (27), we obtain
$$\frac{\sqrt{3}}{2\pi ab}R^4 - \frac{4\sqrt{3}(a+b)}{3\pi ab}R^3 + \sqrt{3}R^2 - \frac{ab\ln 3}{E[k]} = 0. \tag{28}$$
By solving (28), we can derive R for a given ad hoc network setup in a rectangular deployment region with the desired number of clusters E[k], provided that $\pi R^2\lambda \gg 1$.
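One way to solve for R numerically is sketched below. It is a minimal illustration, not the authors' procedure: it evaluates the rectangular-region expression for E[k] and finds R by bisection on $0 < R \le b$, where E[k] decreases monotonically with R. The function names and the tolerance are assumptions made for this example.

```python
import math

def expected_clusters_rect(R, a, b):
    """E[k] for an a x b rectangle with the boundary correction (a >= b > R)."""
    correction = 1 - 4 * R * (a + b) / (3 * math.pi * a * b) + R ** 2 / (2 * math.pi * a * b)
    return a * b * math.log(3) / (math.sqrt(3) * R ** 2 * correction)

def solve_range_rect(a, b, k, tol=1e-9):
    """Broadcast range R that yields k expected clusters, found by bisection."""
    lo, hi = 1e-9, min(a, b)  # the distance p.d.f. used here holds for s <= b
    if expected_clusters_rect(hi, a, b) > k:
        raise ValueError("requested E[k] is too small for R <= min(a, b)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_clusters_rect(mid, a, b) > k:
            lo = mid  # too many clusters, so the range must grow
        else:
            hi = mid
    return (lo + hi) / 2

print(round(solve_range_rect(100, 100, 20), 2))  # 19.42
print(round(solve_range_rect(150, 100, 30), 2))  # 19.11
```

Bisection is used instead of a general quartic solver because only the single root in $(0, b]$ is physically meaningful, and the monotonicity of E[k] on that interval guarantees it is found.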
5.2. Boundary Effect on E[k] due to a Circular Deployment Region
Let us now derive E[k] for a circular deployment region. We follow the same approach as in the rectangular deployment region case. Let us assume that the ad hoc deployed wireless network consists of N uniform-randomly deployed nodes in a circular deployment region of radius r, resulting in $\lambda = N/(\pi r^2)$.
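The expected number of neighboring nodes M in a given CH’s candidacy broadcasting range R, for a circular deployment area with radius r, is also given by (25). Note that the $P_0$ given in (23) is still applicable. However, $f_S(s)$, that is, the p.d.f. of the distance S between two nodes that are independently and uniformly distributed (at random) in a circular area with radius r, is given by
$$f_S(s) = \frac{4s}{\pi r^2}\left(\cos^{-1}\left(\frac{s}{2r}\right) - \frac{s}{2r}\sqrt{1 - \left(\frac{s}{2r}\right)^2}\right), \quad \text{for } 0 \le s \le 2r, \tag{29}$$
according to [22]. Hence, we can write the E[k] of a given circular area with radius r as
$$E[k] = \frac{\pi^2\ln 3}{2\sqrt{3}\,D(R/r)}, \tag{30}$$
where
$$D(R/r) = 4\left(\frac{R}{2r}\right)^2\cos^{-1}\left(\frac{R}{2r}\right) - 3\left(\frac{R}{2r}\right)\left[1 - \left(\frac{R}{2r}\right)^2\right]^{1/2} + 2\left(\frac{R}{2r}\right)\left[1 - \left(\frac{R}{2r}\right)^2\right]^{3/2} + \sin^{-1}\left(\frac{R}{2r}\right). \tag{31}$$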
Thus, we can determine R for a given circular deployment area with radius r for an expected number of clusters E[k] by solving the reordered (30).
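As with the rectangular case, R can be found numerically. The following sketch evaluates D(R/r) and the circular-region E[k] and then bisects on $0 < R \le 2r$, where E[k] decreases with R. The function names, the lower bracket, and the tolerance are illustrative assumptions.

```python
import math

def D(x):
    """D(R/r) for the circular region, with x = R/r and u = R/(2r)."""
    u = x / 2.0
    root = math.sqrt(1.0 - u ** 2)
    return 4 * u ** 2 * math.acos(u) - 3 * u * root + 2 * u * root ** 3 + math.asin(u)

def expected_clusters_circ(R, r):
    """E[k] for a circular deployment region of radius r."""
    return math.pi ** 2 * math.log(3) / (2 * math.sqrt(3) * D(R / r))

def solve_range_circ(r, k, tol=1e-9):
    """Broadcast range R that yields k expected clusters, found by bisection."""
    lo, hi = 1e-6 * r, 2.0 * r
    if expected_clusters_circ(hi, r) > k:
        raise ValueError("requested E[k] is too small for R <= 2r")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_clusters_circ(mid, r) > k:
            lo = mid  # too many clusters, so the range must grow
        else:
            hi = mid
    return (lo + hi) / 2

print(solve_range_circ(200, 20))  # close to 68.2 for r = 200 and E[k] = 20
```

Because D depends only on the ratio R/r, the solution scales linearly with the region radius: multiplying r by 1.5 multiplies the required R by 1.5 for the same E[k].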
Note 1.
We derived E[k] assuming that Ti is a random variable. This is true only when all nodes have equal fitness to be a CH; that is, when the residual energies of all the nodes are the same. This holds for the HEED, ANTCLUST, and EDCR algorithms during initial deployment under the assumption that the sensors are ideal. However, in subsequent rounds, Ti is weighted by each node’s residual energy level at the beginning of cluster formation. That is, the node with the highest residual energy becomes the CH in a given neighborhood. We know that the node closest to a CH spends the minimum energy in communication. As a result, it would be the highest-energy node in that neighborhood at the beginning of the subsequent CH selection phase. Hence, in a subsequent round, the CHs would be the nodes closest to the previous CHs. Thus, we can expect, on average, the same number of clusters to be formed in subsequent reclustering rounds. As a result, (28) remains valid for all subsequent rounds.
In this section, we presented an analytical technique to find the Cluster/CH density of DTDC class of algorithms. Further, we derived the expected number of clusters in a finite area considering the boundary effect. In what follows, we compare the analytical results with simulation experiment results.
6. Simulation Results
In this section, the proposed analytical method for determining the cluster density and the expected number of clusters for the DTDC class of algorithms is evaluated using MATLAB simulations. It is already established that the proposed analytical results match the empirical results derived using the DMAC algorithm in [18]. For comparison, the simulation results for the HEED, ANTCLUST, and EDCR algorithms are presented as well. The results are presented based on the following design scenarios.
Scenario 1: Design requirement of 20 clusters, each with 15 nodes, monitoring a square area of 100 × 100 m². That is, 300 nodes should be deployed in this region. According to (28), the computed broadcasting distance is R = 19.42 m to achieve the 20-cluster requirement.
Scenario 2: Design requirement of 30 clusters, each with 20 nodes, monitoring a rectangular area of 150 × 100 m². That is, 600 nodes should be deployed in this region. According to (28), the computed broadcasting distance is R = 19.11 m to achieve the 30-cluster requirement.
Scenario 3: Design requirement of 20 clusters, each with 20 nodes, monitoring a circular area with radius 200 m. That is, 400 nodes should be deployed in this region. According to (30), the computed broadcasting distance is R = 68.24 m to achieve the 20-cluster requirement.
The simulation results for the scenarios described above are given in Table 2. H1, A1, and E1 denote the results of the HEED, ANTCLUST, and EDCR algorithms, respectively, for scenario 1 (square area). Similarly, H2, A2, and E2 represent the results for scenario 2 (rectangular area), and H3, A3, and E3 represent the results for scenario 3 (circular area). Note that E[k] denotes the desired number of clusters in each case. The average and standard deviation (AV ± SD) of the actual number of clusters (E[k]A), obtained via a large number of different random node deployment simulations corresponding to each scenario, have been tabulated. The E[k]A tabulated in column “Beginning” corresponds to the cluster formation results at the initial deployment stage with a fresh set of homogeneous energy nodes, column “End” corresponds to the average number of clusters close to the end of life of the sensor bed (we used 95% nodes alive as the lifetime measurement [12]), and column “Middle” corresponds to the average number of clusters at a position halfway between the “Beginning” and “End” scenarios. Further, the cumulative average of these three cases is presented in the column “Overall”.
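Table 2: Applicability of the proposed technique equally among HEED, ANTCLUST, and EDCR algorithms. The last four columns give the average number of clusters, E[k]A (AV ± SD).

| Case | E[k] | Beginning | Middle | End | Overall |
|---|---|---|---|---|---|
| H1 | 20 | 20.4 ± 0.5 | 19.3 ± 1.2 | 18.8 ± 1.1 | 19.5 ± 1.2 |
| A1 | 20 | 19.6 ± 0.9 | 19.4 ± 1.8 | 18.8 ± 1.5 | 19.3 ± 1.4 |
| E1 | 20 | 19.6 ± 1.5 | 20.3 ± 0.9 | 20.4 ± 0.8 | 20.1 ± 1.1 |
| H2 | 30 | 29.1 ± 1.6 | 28.3 ± 0.8 | 29.9 ± 0.8 | 29.1 ± 1.3 |
| A2 | 30 | 29.7 ± 1.4 | 30.1 ± 1.6 | 28.4 ± 2.0 | 29.4 ± 1.8 |
| E2 | 30 | 29.3 ± 2.0 | 30.3 ± 1.0 | 30.2 ± 1.6 | 29.9 ± 1.6 |
| H3 | 20 | 18.9 ± 1.9 | 20.0 ± 2.2 | 19.5 ± 1.1 | 19.5 ± 1.8 |
| A3 | 20 | 19.3 ± 1.1 | 19.6 ± 0.9 | 18.3 ± 1.7 | 19.1 ± 1.4 |
| E3 | 20 | 19.1 ± 0.7 | 19.7 ± 1.4 | 17.9 ± 1.2 | 18.9 ± 1.3 |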
The results given in Table 2 show us that the analytical estimation for R based on E[k] cluster requirement is indeed valid as only a minimal variation of E[k] is seen in all simulation results. These results (based on HEED, ANTCLUST and EDCR algorithms) and independent simulation results of DMAC algorithm (and its corresponding empirical formula) given in [18] affirm the validity and applicability of the proposed analytical technique in determining the cluster density and the expected number of clusters of DTDC class of algorithms.
As can be seen from Table 2, all major algorithms in the DTDC class respond in a similar manner. Hence, without loss of generality, the EDCR algorithm can be selected from this class for further analysis. For this analysis we use 15 different hypothetical node deployment requirements (cases). Together they cover square, rectangular, and circular deployment regions, different expected numbers of clusters for a given deployment region, and different expected numbers of nodes per cluster for a given E[k] requirement. These requirements are listed in Table 3. The case number links each requirement to the corresponding test results in Table 4. The column under the heading “Area” presents the dimensions of the node deployment region (a × b for a rectangular region and πr² for a circular region), while the remaining columns give the expected number of clusters E[k], the expected number of nodes in a cluster E[n], and the total number of nodes to be deployed in the region, N (= E[k] × E[n]). The last column presents the R calculated for each case using either (28) or (30), depending on the shape of the region.
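Table 3: Network deployment requirements.

| Case | Area (m²) | E[k] | E[n] | N | R (m) |
|---|---|---|---|---|---|
| 1 | 100 × 100 | 20 | 20 | 400 | 19.42 |
| 2 | 100 × 100 | 30 | 20 | 600 | 15.57 |
| 3 | 150 × 100 | 30 | 20 | 600 | 19.11 |
| 4 | 125 × 100 | 40 | 25 | 1000 | 14.93 |
| 5 | 200 × 200 | 40 | 20 | 800 | 26.70 |
| 6 | 200 × 200 | 30 | 40 | 1200 | 31.15 |
| 7 | 200 × 200 | 30 | 25 | 750 | 31.15 |
| 8 | 200 × 200 | 30 | 20 | 600 | 31.15 |
| 9 | 200 × 200 | 30 | 15 | 450 | 31.15 |
| 10 | 200 × 200 | 30 | 10 | 300 | 31.15 |
| 11 | 200 × 200 | 10 | 30 | 300 | 57.42 |
| 12 | π × 200² | 40 | 15 | 600 | 47.04 |
| 13 | π × 200² | 20 | 15 | 300 | 68.24 |
| 14 | π × 300² | 20 | 20 | 400 | 102.37 |
| 15 | π × 300² | 30 | 30 | 900 | 82.24 |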
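Table 4: Comparison of expected and actual number of clusters of the EDCR algorithm. The last four columns give the average number of clusters, E[k]A (AV ± SD).

| Case | E[k] | Beginning | Middle | End | Overall |
|---|---|---|---|---|---|
| 1 | 20 | 19.6 ± 1.5 | 20.3 ± 0.9 | 20.4 ± 0.8 | 20.1 ± 1.1 |
| 2 | 30 | 30.2 ± 1.6 | 29.5 ± 1.1 | 30.7 ± 1.3 | 30.1 ± 1.4 |
| 3 | 30 | 29.3 ± 2.0 | 30.3 ± 1.0 | 30.2 ± 1.6 | 29.9 ± 1.6 |
| 4 | 40 | 39.8 ± 2.0 | 39.6 ± 1.1 | 41.6 ± 1.1 | 40.3 ± 1.7 |
| 5 | 40 | 39.2 ± 1.9 | 41.0 ± 0.9 | 40.8 ± 1.4 | 40.3 ± 1.6 |
| 6 | 30 | 30.9 ± 1.6 | 31.1 ± 1.7 | 32.1 ± 1.2 | 31.3 ± 1.5 |
| 7 | 30 | 30.4 ± 1.8 | 30.9 ± 1.6 | 31.0 ± 1.7 | 30.8 ± 1.7 |
| 8 | 30 | 29.8 ± 1.5 | 30.3 ± 1.1 | 30.5 ± 1.5 | 30.2 ± 1.4 |
| 9 | 30 | 28.8 ± 1.2 | 28.3 ± 1.7 | 28.5 ± 1.5 | 28.5 ± 1.4 |
| 10 | 30 | 27.3 ± 1.4 | 28.0 ± 2.8 | 28.3 ± 1.4 | 27.8 ± 2.0 |
| 11 | 10 | 9.8 ± 0.9 | 11.0 ± 0.8 | 10.5 ± 0.9 | 10.4 ± 1.0 |
| 12 | 40 | 38.2 ± 1.6 | 39.6 ± 1.5 | 39.5 ± 1.7 | 39.1 ± 1.7 |
| 13 | 20 | 19.1 ± 0.7 | 19.7 ± 1.4 | 17.9 ± 1.2 | 18.9 ± 1.3 |
| 14 | 20 | 20.5 ± 1.5 | 20.7 ± 0.9 | 18.9 ± 1.6 | 20.0 ± 1.5 |
| 15 | 30 | 30.2 ± 1.1 | 33.1 ± 1.1 | 32.5 ± 1.0 | 31.9 ± 1.6 |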
Table 4 shows the simulation results for the deployment requirements listed in Table 3. It presents the average and standard deviation (AV ± SD) of the actual number of clusters observed over a large number of different random node deployments corresponding to each case. The results indicate that the proposed analytical technique for estimating R for a desired number of clusters E[k] is indeed an accurate method to realize the actual number of clusters. Furthermore, there is minimal variation from E[k] irrespective of the shape of the deployment region (rectangular, square, or circular), the desired number of clusters, and the expected member population in each cluster, provided that all clusters are well populated.
The simulation results presented thus far show the applicability of the proposed analytical technique in estimating the expected number of clusters of the DTDC class of algorithms, provided that each cluster is well populated, that is, $\pi R^2\lambda \gg 1$. In order to identify a minimum threshold for $\pi R^2\lambda$, or for the expected number of nodes in a cluster E[n], for a given application requirement, we observe the behavior of curves representing the average number of actual clusters, E[k]A, versus the node density λ for different CH broadcasting ranges R. Figures 6 and 7 present these curves (E[k]A versus λ) for the EDCR algorithm applied to a square deployment region of size 200 × 200 m² and a circular deployment region with radius 100 m, respectively. Both graphs consist of E[k]A versus λ curves for R = 25, 30, 35, 40, and 45. The expected number of clusters E[k], calculated using (28) for Figure 6 and (30) for Figure 7, is plotted as a vertical dotted line for each R.
Figure 6: “E[k]A versus λ” for different R: 200 × 200 m² square deployment area.
Figure 7: “E[k]A versus λ” for different R: 100 m radius circular deployment area.
Figures 6 and 7 indicate that all the E[k]A versus λ curves asymptotically approach the expected number of clusters, E[k]. The vertical solid error bars marked on each E[k] line show the 5% (short) and 10% (long) levels below E[k], which occur at $\pi R^2\lambda = 30$ and $\pi R^2\lambda = 20$, respectively. It was identified in Section 5 that a 0.5018 fraction of the M nodes in any given CH’s broadcasting range R neighborhood joins its cluster, so $\pi R^2\lambda = 20$ corresponds to E[n] ≈ 0.5018 × 20 ≈ 10. Therefore, the proposed analytical technique can be used to determine the CH candidacy broadcasting range R of the DTDC class of algorithms with a maximum error of 10% for a required expected number of clusters E[k], when the expected number of nodes in a cluster, E[n], is more than 10. The number of nodes in a cluster is well above this figure in most practical applications.
The simulation results presented above, together with the empirical formula derived from simulation experiments in [18], affirm the accuracy of the proposed analytical method in determining R for a given expected number of clusters E[k] of the DTDC class of algorithms at the network planning stage.
7. Conclusion
Distributed clustering is a popular technique for organizing ad hoc deployed wireless networks, including WSNs. We found that clustering algorithms like DMAC, HEED, ANTCLUST, MEDIC, and EDCR can be categorized into the class of DTDC algorithms based on the common underlying Dutch auction principle in CH selection, which results in a similar CH distribution. In this research, we have provided an analytical framework which can be used to derive the cluster density, λc, for a given deployment requirement where each cluster is assumed to be well populated. Furthermore, the analysis framework has been extended to include the effects of the boundary resulting from a finite deployment region when computing the expected number of clusters. The proposed analytical technique was verified via simulation experiments, and the results were presented. Further, the empirical formula proposed by Bettstetter in [18] independently verifies the accuracy of the proposed technique and vice versa. The authors feel that this analytical framework can be extended in future research to derive λc for any generic situation given by the Matérn Type III dependent thinning point process [20].
References
[1] S. Basagni, "Distributed clustering for ad hoc networks," in Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Networks (I-SPAN '99), pp. 310–315, Perth/Fremantle, WA, Australia, 1999.
[2] O. Younis, M. Krunz, and S. Ramasubramanian, "Node clustering in wireless sensor networks: recent developments and deployment challenges," vol. 20, no. 3, pp. 20–25, 2006. doi:10.1109/MNET.2006.1637928
[3] W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, "An application-specific protocol architecture for wireless microsensor networks," vol. 1, no. 4, pp. 660–670, 2002. doi:10.1109/TWC.2002.804190
[4] M. J. Handy, M. Haase, and D. Timmermann, "Low energy adaptive clustering hierarchy with deterministic cluster-head selection," in Proceedings of the 4th International Workshop on Mobile and Wireless Communications Network, pp. 368–372, World Scientific Publishing, 2002.
[5] G. Smaragdakis, I. Matta, and A. Bestavros, "SEP: a stable election protocol for clustered heterogenous wireless sensor networks," in Proceedings of the International Workshop on SANPA, pp. 1–11, Boston, Mass, USA, 2004.
[6] Y. Wang, Q. Zhao, and D. Zheng, "Energy-driven adaptive clustering data collection protocol in wireless sensor networks," in Proceedings of the International Conference on Intelligent Mechatronics and Automation, pp. 599–604, 2004.
[7] P. Hebden and A. R. Pearce, "Distributed asynchronous clustering for self-organisation of wireless sensor networks," in Proceedings of the 4th International Conference on Intelligent Sensing and Information Processing (ICISIP '06), pp. 37–42, 2006. doi:10.1109/ICISIP.2006.4286056
[8] Q. Wang, H. Hassanein, and G. Takahara, "Stochastic modeling of distributed, dynamic, randomized clustering protocols for wireless sensor networks," in Proceedings of the International Conference on Parallel Processing Workshops (ICPPW '04), pp. 456–463, 2004. doi:10.1109/ICPPW.2004.1328055
[9] O. Younis and S. Fahmy, "HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks," vol. 3, no. 4, pp. 366–379, 2004. doi:10.1109/TMC.2004.41
[10] J. Kamimura, N. Wakamiya, and M. Murata, "Energy-efficient clustering method for data gathering in sensor networks," in Proceedings of the 1st Workshop on Broadband Advanced Sensor Networks, San Jose, Calif, USA, 2004.
[11] L. Zhao and Q. Liang, "Medium-contention based energy-efficient distributed clustering (MEDIC) for wireless sensor networks," vol. 3, no. 4, pp. 347–369, 2007. doi:10.1080/15501320701260709
[12] S. Gamwarige and C. Kulasekere, "An energy efficient distributed clustering algorithm for ad hoc deployed wireless sensor networks in building monitoring applications," pp. 11–27, 2009.
[13] S. Gamwarige and C. Kulasekere, "A cluster based energy balancing strategy to improve wireless sensor network lifetime," in Proceedings of the 2nd International Conference on Industrial and Information Systems (ICIIS '07), pp. 403–408, 2007. doi:10.1109/ICIINFS.2007.4579211
[14] T. Sun, L.-J. Chen, C.-C. Han, and M. Gerla, "Improving data reliability via exploiting redundancy in sensor networks," Tech. Rep. CSD-TR no. 040037, UCLA Computer Science Department, Los Angeles, Calif, USA, 2004.
[15] L.-C. Wang, C.-W. Wang, and C.-M. Liu, "Optimal number of clusters in dense wireless sensor networks: a cross-layer approach," vol. 58, no. 2, pp. 966–976, 2009. doi:10.1109/TVT.2008.928637
[16] T. J. Chan, C. M. Chen, Y. F. Huang, J. Y. Lin, and T. R. Chen, "Optimal cluster number selection in ad-hoc wireless sensor networks," vol. 7, no. 8, pp. 837–846, 2008.
[17] V. P. Mhatre, C. Rosenberg, D. Kofman, R. Mazumdar, and N. Shroff, "A minimum cost heterogeneous sensor network with a lifetime constraint," vol. 4, no. 1, pp. 4–14, 2005. doi:10.1109/TMC.2005.2
[18] C. Bettstetter, "The cluster density of a distributed clustering algorithm in ad hoc networks," in Proceedings of the IEEE International Conference on Communications, pp. 4336–4340, June 2004.
[19] B. Matérn, vol. 36, 2nd edition, Springer, New York, NY, USA, 1986.
[20] M. L. Huber and R. L. Wolpert, "Likelihood-based inference for Matern Type III repulsive point processes," Under Invited Revision in Applied Probability Trust, 2009, http://ftp.stat.duke.edu/WorkingPapers/08-27.pdf
[21] B. Gosh, "Random distances within a rectangle and between two rectangles," vol. 43, no. 1, pp. 17–24, 1951.
[22] H. C. Tuckwell, 2nd edition, Chapman & Hall, London, UK, 1995.