GAECH: Genetic Algorithm Based Energy Efficient Clustering Hierarchy in Wireless Sensor Networks

Clustering is a major factor in determining the lifetime of a Wireless Sensor Network (WSN). The parameters chosen for clustering should be appropriate to form clusters according to the needs of the application. Some of the well-known clustering techniques in WSN are designed only to reduce overall energy consumption in the network and increase the network lifetime. These algorithms achieve increased lifetime, but at the cost of overloading individual sensor nodes. Load balancing among the nodes in the network is equally important in achieving increased lifetime. First Node Die (FND), Half Node Die (HND), and Last Node Die (LND) are the different metrics for analysing the lifetime of the network. In this paper, a new clustering algorithm, Genetic Algorithm based Energy efficient Clustering Hierarchy (GAECH), is proposed to increase FND, HND, and LND with a novel fitness function. The fitness function in GAECH forms well-balanced clusters by considering the core parameters of a cluster, which in turn increases both the stability period and the lifetime of the network. The experimental results indicate that GAECH outperforms LEACH, GCA, and EAERP on FND, HND, and LND.


Introduction
The rapid development in microelectromechanical systems (MEMS) led to the development of miniature sensor nodes [1]. A Wireless Sensor Network (WSN) is the interconnection of a large number of these small sensor nodes. WSNs play a vital role in environmental monitoring, traffic monitoring, disaster prevention, and national border surveillance [1]. An individual sensor node generates data by sensing its surroundings and sends it to the central base station (BS). Each node is equipped with a battery, and these batteries are mostly not rechargeable. The communication activities of a sensor node consume more energy than its sensing and computation activities [2]. If the battery of a node is drained, the node becomes useless and is called a dead node.
All the sensor nodes are able to transmit their generated data directly to the BS, but this leads to more energy consumption and affects the lifetime of the network [3]. To reduce the overall energy consumption of the network, nearby nodes or nodes having the same characteristics are grouped together to form clusters. A cluster head (CH) is elected among the nodes to manage the cluster activities. The responsibilities of a CH are collecting the data from its member nodes, aggregating the collected data, and transmitting the aggregated data to the BS. The CH node is not involved in sensing activities like the other nodes. Compared with member (non-CH) nodes, a CH node usually has to spend more energy because of its data reception, aggregation, and transmission to the BS [4]. Various such clustering algorithms in WSN are discussed in detail in [5].
Earlier works such as [6,7] concentrated more on efficient data gathering mechanisms in WSN than on the clustering process. Later, clustering performed in a distributed way proved to be more suitable for WSN [8]. As a further improvement, reducing the communication distance within the clustering framework gained attention [9]. Reducing intracluster and intercluster communication in the clustering architecture, along with a distributed clustering approach, also gathered attention [10]. Instead of forming dynamic clusters, a fixed grid structure has also been used, but it is again another form of clustering architecture [11].
In most of the existing clustering algorithms, the overall energy consumption of the network is reduced, but this comes at the cost of uneven energy consumption among the sensor nodes. In certain cases, when we try to balance the energy spent among the nodes, the overall energy consumption may rise. Various computational intelligence techniques have been applied to increase the lifetime of WSNs [12]. To achieve a trade-off between overall energy consumption and energy balancing in the network, a new clustering algorithm, Genetic Algorithm based Energy efficient Clustering Hierarchy (GAECH), is proposed in this paper. Although many clustering algorithms using a genetic approach [13,14] have been proposed, these protocols fail to increase the stability period of the network. The period until the First Node Die (FND) event in the network is referred to as the stability period. FND is an important parameter in deciding the reliability of the network. GAECH achieves both increased lifetime and an increased stability period of the WSN using an enhanced fitness function.
The paper is organized as follows. Section 2 makes a brief survey of some of the clustering algorithms with their pros and cons. Section 3 describes basic terminologies used in genetic algorithms. Section 4 discusses the proposed GAECH algorithm with its fitness parameters. Section 5 deals with the simulation setup environment for various clustering algorithms and Section 6 discusses the results of the experiments.

Related Work
LEACH [15] was the pioneering clustering protocol in WSN. It forms clusters in a distributed manner. Setup and steady state are the two phases in LEACH. In the setup phase, each node decides for itself whether or not to become a CH. This decision is influenced by factors such as the number of times the node has been elected as head, the current round, and the percentage of allowed CHs in the network. The elected CH nodes then advertise their status to the other nodes. Each non-CH node joins the nearest elected CH, which is determined from the received signal strength indicator (RSSI). Following this, a TDMA schedule is created for the newly formed clusters by the respective heads. In the steady state phase, data generated by a member node is forwarded to its CH in its allotted time slot. The probabilistic election of CHs can elect unsuitable nodes as CHs, which ultimately reduces the overall lifetime of the network.
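The self-election step described above is usually stated in terms of the LEACH threshold T(n) = p / (1 - p (r mod 1/p)) for nodes that have not served as CH in the current cycle, and 0 otherwise. The following Python sketch illustrates it; the function names and the `recent_chs` bookkeeping are illustrative assumptions, not part of this paper.

```python
import random

def leach_threshold(p, r, was_ch_recently):
    """LEACH threshold T(n) for desired CH fraction p and current round r.

    Nodes that already served as CH in the current cycle get threshold 0.
    """
    if was_ch_recently:
        return 0.0
    period = round(1 / p)  # length of one rotation cycle, in rounds
    return p / (1 - p * (r % period))

def elect_cluster_heads(node_ids, p, r, recent_chs, rng):
    """Each node draws a uniform number and becomes CH if it falls below T(n)."""
    heads = []
    for n in node_ids:
        if rng.random() < leach_threshold(p, r, n in recent_chs):
            heads.append(n)
    return heads

# Hypothetical usage: 100 nodes, 5% CHs, first round of a cycle
heads = elect_cluster_heads(range(100), 0.05, 0, set(), random.Random(1))
```

Because the draw is purely probabilistic, nothing in this step checks residual energy or position, which is the weakness noted above.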
In the HEED [16] protocol, the remaining energy of the sensor nodes is the most important parameter for stochastic selection of CHs. Node degree or average distance to neighbours is used to decide the CH when there is a tie between two sensor nodes. HEED provides better performance than LEACH due to its consideration of energy levels during CH election.
GCA [17], the genetic clustering algorithm, achieves increased lifetime through two parameters. The first parameter is the total transmission distance within a cluster, calculated by adding the distances of the individual member nodes to their CH. The second parameter is the total number of CHs in the network. Since CH nodes spend more energy than member nodes, reducing the number of CHs considerably increases the lifetime of the network. Equation (1) shows the fitness function of GCA; the weight in (1) is set based on the application requirement.
EAERP [18] is another centralized evolutionary computing algorithm for WSN. The fitness function of EAERP is given in (2). In the cluster formation phase, the initial population is evaluated with the given fitness function until the termination condition is met. In the association phase, the best phenotype is used to select the CH nodes and form the clusters. The fitness function includes both intracluster and intercluster communication energies, where $E_{Tx}$ represents the transmission energy from one node to another, $E_{Rx}$ is the reception energy, $E_{DA}$ is the aggregation energy, and nc is the number of clusters; the function is evaluated for each individual solution.
The existing genetic clustering algorithms share a common limitation: they improve the overall lifetime of the network but do not consider its stability period, because energy is consumed unevenly among the nodes. The main reasons behind this uneven energy consumption are as follows:
(i) The CH designation is not properly rotated among the nodes.
(ii) Some clusters in the network accommodate more nodes than others, which leads to uneven energy consumption among the CH nodes even though the overall energy consumption may be reduced.
(iii) The CH nodes are not properly distributed across the network.
In the proposed algorithm GAECH, the fitness function is designed in such a way that the above-mentioned factors are considered during the formation of a cluster and electing a CH.

Overview of Genetic Algorithm
Genetic algorithm (GA) is a metaheuristic optimization technique which has produced many fruitful results in the engineering field. It is a structured yet randomized search technique which primarily works with three genetic operators: selection, crossover, and mutation [19,20]. The main genetic algorithm terms are as follows.
Chromosomes. The initial possible solutions to the problem are called chromosomes. All the chromosomes should have the same length, and the elements in them are called genes or alleles.
Fitness Function. The fitness function is used to evaluate the fitness values of the chromosomes, and chromosomes with higher values produce more offspring than others. In this paper, the fitness value is the sum of various parameters in the given proportion.

GAECH Algorithm
In GAECH, the fitness function is enhanced compared to the previous algorithms. The parameters included in the fitness function aim to form more balanced clusters and also to reduce the overall energy consumption. Unlike earlier algorithms, the fitness function is made up of four components: (i) total energy consumption for a single data collection round, (ii) Standard Deviation in energy consumption between clusters, (iii) CH dispersion, and (iv) CH energy consumption.

Total Energy Consumption for Single Data Collection Round.
The overall purpose of all the existing algorithms is to reduce energy consumption in the network, so it is taken directly into account in calculating the fitness of the chromosomes. Total energy consumption is the sum of the intracluster and intercluster energy consumptions.
The intracluster energy is
$$E_{\text{intra}} = \sum_{i \in CM} \left( E_{Tx}(i, CH) + E_{Rx} + E_{DA} \right) \quad (3)$$
The intercluster energy is
$$E_{\text{inter}} = \sum_{j \in CH} E_{Tx}(j, BS) \quad (4)$$
The total energy consumption is
$$E_{\text{total}} = E_{\text{intra}} + E_{\text{inter}} \quad (5)$$
In (3), $E_{Tx}(i, CH)$ represents the energy consumed in transmitting from the $i$th node to its corresponding CH node, $E_{Rx}$ is the reception energy spent at the CH, and $E_{DA}$ is the energy consumed by data aggregation in the CH. In (4), $E_{Tx}(j, BS)$ is the transmission energy from the $j$th CH to the BS. The sums in (3) and (4) run over the cluster members (CM) and the cluster heads (CH), respectively. Equation (5) gives the total energy per communication round.
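The intracluster, intercluster, and total energy terms described above can be sketched in Python as follows. This is an illustration only: the data layout (`members_by_ch`, `positions`) and the idea of passing the transmission cost as a function `e_tx` are assumptions made here, not details taken from the paper.

```python
import math

def round_energy(members_by_ch, positions, bs, e_tx, e_rx, e_da):
    """Return (intra, inter, total) energy for one data collection round.

    members_by_ch: dict mapping a CH id to the list of its member ids
    positions:     dict mapping a node id to its (x, y) coordinates
    bs:            (x, y) coordinates of the base station
    e_tx:          function distance -> transmission energy (assumed interface)
    e_rx, e_da:    reception and aggregation energy per received packet
    """
    intra = 0.0
    inter = 0.0
    for ch, members in members_by_ch.items():
        for m in members:
            d = math.dist(positions[m], positions[ch])
            # member transmits; the CH receives and aggregates
            intra += e_tx(d) + e_rx + e_da
        # each CH forwards its aggregated data to the BS
        inter += e_tx(math.dist(positions[ch], bs))
    return intra, inter, intra + inter
```

With a toy cost `e_tx = lambda d: d`, a single CH at (0, 0) with one member at (3, 4), a BS at (0, 10), `e_rx = 1`, and `e_da = 2`, the function returns 8 for the intracluster part, 10 for the intercluster part, and 18 in total.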

Standard Deviation in Energy Consumption between Clusters.
In most of the existing algorithms, the overall energy is reduced, but even energy consumption among clusters is not achieved. The overall energy reduction may burden one or more individual clusters with a higher energy cost. This leads to the premature death of certain nodes and reduces the overall stability of the network. The Standard Deviation (SD) is a metric which measures the deviation in energy consumption between the existing clusters in the network. A smaller SD value in (7) represents a more stable network. Equation (6) gives the mean cluster energy consumption on which (7) is based.
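As a small illustration of this component, the following Python sketch computes the mean cluster energy and the standard deviation around it. The use of the population form (dividing by the number of clusters) is an assumption; the text does not state which form the authors use.

```python
import math

def cluster_energy_sd(cluster_energies):
    """Standard deviation of per-cluster energy consumption.

    Assumes the population form (divide by the number of clusters).
    """
    n = len(cluster_energies)
    mu = sum(cluster_energies) / n  # mean energy per cluster
    return math.sqrt(sum((e - mu) ** 2 for e in cluster_energies) / n)
```

Perfectly balanced clusters, for example `[3.0, 3.0, 3.0]`, give an SD of 0, while the spread-out set `[2, 4, 4, 4, 5, 5, 7, 9]` gives 2.0.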

CH Dispersion.
A large distance between CHs is desirable since it considerably reduces intercluster interference. A proper distribution of CHs also ensures good connectivity and a shared transmission load in the network:
$$D = \min_{i, j \in C,\; i \neq j} d(CH_i, CH_j) \quad (8)$$
The minimum distance between any pair of CHs is taken in (8), and it represents how far the CH nodes are scattered in the network area. $C$ is the set of all CH nodes in the current cluster setup.
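The minimum pairwise CH distance described above is straightforward to compute. The sketch below assumes CH positions are given as (x, y) tuples and that at least two CHs exist.

```python
import math
from itertools import combinations

def ch_dispersion(ch_positions):
    """Smallest Euclidean distance between any two cluster heads."""
    return min(math.dist(a, b) for a, b in combinations(ch_positions, 2))
```

For CHs at (0, 0), (3, 4), and (10, 0), the pairwise distances are 5, 10, and about 8.06, so the dispersion is 5.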

CH Energy Consumption.
Compared with non-CH nodes, CH nodes spend more energy on their activities. They have to stay awake at all times to receive data from their members, and they also have to aggregate the data and send it to the BS. Equation (9) represents the total energy consumed by all the CH nodes in the current cluster setup, and (10) represents the energy consumption of an individual CH node.
The fitness function of GAECH combines the four components above as a weighted sum. The constant coefficients $w_1$, $w_2$, $w_3$, and $w_4$ are the individual weight values of each parameter. The weight values may be varied under different conditions and based on the application requirement. GAECH is divided into two working phases: the cluster formation phase and the data collection phase.
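As a rough illustration of the weighted combination described above, the sketch below sums four component scores with their weights. How each component is scaled and oriented (for example, whether energy terms and dispersion are inverted so that a higher score is better) is not recoverable from the text, so the function assumes the caller has already prepared comparable scores; the name `weighted_fitness` is hypothetical.

```python
def weighted_fitness(components, weights):
    """Weighted sum of four pre-scaled fitness components.

    components: (total_energy, sd, dispersion, ch_energy) scores, assumed to be
                already normalised and oriented by the caller
    weights:    four weights, one per component
    """
    if len(components) != 4 or len(weights) != 4:
        raise ValueError("expected four components and four weights")
    return sum(w * c for w, c in zip(weights, components))
```

The choice to keep normalisation outside the function is deliberate: it keeps the sketch neutral about details the text does not specify.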
Cluster Formation Phase. As in other centralized WSN clustering algorithms, in the cluster formation phase the GAECH algorithm runs at the BS, which holds the location details of all the nodes in the network. The genetic operators are applied to the random initial population until the termination condition is met. At the end, the best fit chromosome in the population represents the new cluster architecture. The value "1" in the chromosome represents a node designated as CH and "0" represents a node designated as cluster member (CM). The BS then directly informs the nodes of their newly assigned CH or CM roles.
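To make the chromosome encoding concrete, the following Python sketch turns a chromosome into a cluster layout. The rule that each member joins its nearest CH is an assumption made for illustration; the text does not state how members are assigned in GAECH. The value -1 for dead nodes follows the encoding given in the experimental setup.

```python
import math

def decode_chromosome(chromosome, positions):
    """Map each CH index to the list of member indices assigned to it.

    chromosome: list with 1 = CH, 0 = member, -1 = dead node
    positions:  list of (x, y) coordinates, one per node
    Assumption: every member joins the geographically nearest CH.
    """
    heads = [i for i, g in enumerate(chromosome) if g == 1]
    clusters = {h: [] for h in heads}
    if not heads:
        return clusters
    for i, g in enumerate(chromosome):
        if g == 0:
            nearest = min(heads, key=lambda h: math.dist(positions[i], positions[h]))
            clusters[nearest].append(i)
    return clusters
```

For example, the chromosome `[1, 0, 0, 1, -1]` with nodes at x = 0, 1, 9, 10 on a line (and a dead fifth node) yields `{0: [1], 3: [2]}`.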
Data Collection Phase. In the data collection phase, each CH node generates a TDMA schedule for its members. A member node reports its data to its CH only during its allotted time slot. In other time slots, it may enter the sleep state, but the CH nodes always stay awake in order to receive the data from their members. The data received from the member nodes is aggregated at the CH and sent to the BS at the end of each communication round.
Algorithm 1 illustrates the GAECH algorithm. The elitism genetic operator reproduces the best fit chromosomes of the previous generation without any changes. Single point crossover means that only one point is randomly selected in the parent chromosomes, dividing each chromosome into two substrings. These substrings of the parent chromosomes are combined to form the new set of chromosomes.

Experimental Setup
The existing algorithms LEACH, GCA, and EAERP and the proposed algorithm GAECH are implemented in MATLAB. All the algorithms are tested in 20 different network topologies to ensure proper results. Since the location of the BS has a considerable effect on energy consumption, the algorithms are tested in three scenarios that differ in the location of the BS. In Scenario 1, the BS is located at the middle of the network; in Scenario 2, at one corner of the network; and in Scenario 3, outside the network. Figures 1, 2, and 3 show the three scenarios, respectively. The blue spots in Figures 1, 2, and 3 represent the sensor nodes and the red spot represents the BS. The transmission and reception energy equations are the same as in [15]:
$$E_{Tx}(k, d) = \begin{cases} k E_{elec} + k \varepsilon_{fs} d^{2}, & d < d_0 \\ k E_{elec} + k \varepsilon_{mp} d^{4}, & d \geq d_0 \end{cases}$$
$$E_{Rx}(k) = k E_{elec}$$
$E_{Tx}$ and $E_{Rx}$ denote the transmission and reception energy. $k$ denotes the number of bits to be transmitted, and $E_{elec}$ is the electronics energy per bit. $\varepsilon_{fs}$ denotes the energy dissipation in free space and $\varepsilon_{mp}$ is the energy dissipation during multipath propagation. $d$ represents the distance between two nodes, and $d_0$ is the threshold distance that determines whether the free space or the multipath propagation model is followed. The parameter values are given in Table 1.
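The radio energy model of [15] can be written as a short Python sketch. The numeric defaults below are values commonly used in the LEACH literature and are placeholders only; they are not taken from Table 1, whose contents are not reproduced here. The threshold `D0` is derived as the distance at which the two branches coincide, which is also an assumption about how it is set.

```python
import math

# Placeholder constants (assumed, not from Table 1)
E_ELEC = 50e-9       # electronics energy, J/bit
EPS_FS = 10e-12      # free-space amplifier energy, J/bit/m^2
EPS_MP = 0.0013e-12  # multipath amplifier energy, J/bit/m^4
D0 = math.sqrt(EPS_FS / EPS_MP)  # threshold distance, about 87.7 m

def tx_energy(k_bits, d, e_elec=E_ELEC, eps_fs=EPS_FS, eps_mp=EPS_MP, d0=D0):
    """Energy to transmit k_bits over distance d (first-order radio model)."""
    if d < d0:
        return k_bits * e_elec + k_bits * eps_fs * d ** 2  # free space
    return k_bits * e_elec + k_bits * eps_mp * d ** 4      # multipath

def rx_energy(k_bits, e_elec=E_ELEC):
    """Energy to receive k_bits."""
    return k_bits * e_elec
```

The fourth-power term is why the BS position matters so much: with these placeholder values, sending 4000 bits over 100 m costs roughly three and a half times as much as sending them over 10 m.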
From the experimental analysis over 10 different network topologies, it is found that a CH percentage between 5 and 10 gives optimal energy conservation in the network. When the percentage of CHs is less than 5, each CH consumes too much energy because it serves a larger number of cluster members. When the percentage of CHs is higher than 10, the cluster formation activities, such as CH election broadcasts, membership requests, and approvals from the CH, lead to higher energy consumption among the nodes. In Figure 4, the overall energy consumption of the network in joules is shown for a varying number of CHs. In GAECH, the initial random population is generated with 5 to 10 CHs. Cluster head nodes are represented by "1", member nodes by "0", and dead sensor nodes by "−1". The selection technique used for GCA, EAERP, and GAECH is elitism. The crossover rate is 0.6 and the mutation rate is 0.03. The initial random population consists of 20 chromosomes, and all the genetic algorithms run for 20 generations. Since the number of generations is fixed, the convergence rate of GAECH is the same for all runs. The weight used in GCA is set to 0.5, and the weight values of GAECH are $w_1$ = 0.5, $w_2$ = 0.30, $w_3$ = 0.1, and $w_4$ = 0.1. These weight values have been chosen since they give better results than other combinations. A higher value of $w_1$ may reduce the overall energy consumption in the network but leads to an earlier First Node Die (FND). Each coefficient value represents the importance level of its component in the fitness function.
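The initial population described above (20 chromosomes, each with 5 to 10 CHs, dead nodes marked as −1) could be generated along the following lines. The function names, the `alive` flag list, and the seeding are illustrative assumptions.

```python
import random

def random_chromosome(alive, rng, min_ch=5, max_ch=10):
    """Build one chromosome: 1 = CH, 0 = member, -1 = dead node.

    alive: list of booleans, one per node
    """
    alive_ids = [i for i, a in enumerate(alive) if a]
    # never ask for more CHs than there are living nodes
    n_ch = min(rng.randint(min_ch, max_ch), len(alive_ids))
    heads = set(rng.sample(alive_ids, n_ch))
    return [(-1 if not a else 1 if i in heads else 0) for i, a in enumerate(alive)]

def initial_population(alive, size=20, seed=0):
    """Generate `size` random chromosomes for the given set of living nodes."""
    rng = random.Random(seed)
    return [random_chromosome(alive, rng) for _ in range(size)]
```

Because dead nodes are fixed at −1 in every chromosome, a later single point crossover between two such chromosomes leaves the dead positions unchanged.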

Simulation Results
In the various network configurations, 100 sensor nodes are randomly deployed in a 100 × 100 meter field. In these configurations, three scenarios based on the location of the BS are verified and the corresponding results are discussed in detail. All four algorithms are tested and their lifetime is evaluated. The percentage of dead nodes and the corresponding communication rounds are given in Tables 2, 3, and 4.

Scenario 1.
In this scenario, the BS is located at the middle of the network, that is, at coordinates (50, 50). In terms of FND, GAECH improves network lifetime by 6.92% over EAERP, 14.64% over GCA, and 14.70% over LEACH. In terms of HND, GAECH is 5.75% higher than EAERP, 12.5% higher than GCA, and 13.4% higher than LEACH. In terms of LND, GAECH performs 5% better than EAERP, 12% better than GCA, and 13% better than LEACH. Figure 5 shows the number of dead nodes against communication rounds. Table 2 lists the number of communication rounds at every 10% of dead nodes.
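For readers who want to reproduce this kind of comparison, the sketch below derives FND, HND, and LND from the rounds at which individual nodes die and computes a relative improvement. The exact formula behind the reported percentages is not stated in the text, so `improvement_pct` (relative gain over a baseline) is an assumption, and all numbers in the usage line are hypothetical.

```python
def lifetime_metrics(death_rounds):
    """Return (FND, HND, LND) from the round in which each node died."""
    ordered = sorted(death_rounds)
    n = len(ordered)
    fnd = ordered[0]                 # first node dies
    hnd = ordered[(n + 1) // 2 - 1]  # at least half of the nodes are dead
    lnd = ordered[-1]                # last node dies
    return fnd, hnd, lnd

def improvement_pct(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent (assumed formula)."""
    return 100.0 * (new - baseline) / baseline

# Hypothetical usage with made-up death rounds for four nodes
fnd, hnd, lnd = lifetime_metrics([300, 100, 400, 200])
```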

Scenario 2.
Since the BS is located at the corner of the network, the communication distance is reduced for some CHs but may increase for other, more distant CHs. In terms of FND, GAECH is 9.52% better than EAERP, 13.44% better than GCA, and 12.18% better than LEACH in increasing network lifetime. In terms of HND, GAECH is 6.67% better than EAERP, 10.64% better than GCA, and 10.12% better than LEACH. In terms of LND, GAECH performs 6.8% better than EAERP, 10.56% better than GCA, and 10% better than LEACH. Figure 6 shows the number of dead nodes against communication rounds. Table 3 lists the number of communication rounds at every 10% of dead nodes.

Scenario 3.
Compared with the other two scenarios, since the BS is located outside the network, the number of communication rounds is relatively low. In terms of FND, GAECH shows a 12.03% improvement in network lifetime compared to EAERP, 20.70% compared to GCA, and 21.11% compared to LEACH. In terms of HND, GAECH is 7.32% better than EAERP, 13.27% better than GCA, and 15.87% better than LEACH. In terms of LND, GAECH performs 7.78% better than EAERP, 13.11% better than GCA, and 15.50% better than LEACH. Figure 7 shows the number of dead nodes against communication rounds. Table 4 lists the number of communication rounds at every 10% of dead nodes.
Figure 8 depicts the stability period, that is, the occurrence of FND, for the LEACH, GCA, EAERP, and GAECH algorithms in all three scenarios. Since the stability period is an important metric in deciding the reliability of the network, it is shown separately in Figure 8. In the best case, GAECH achieves 285 more communication rounds than LEACH in Scenario 1, 263 more than GCA in Scenario 2, and 370 more than LEACH in Scenario 3.
Figure 9 displays the average energy consumed by the network in a single communication round. LEACH is found to consume more energy than the other algorithms because of its random, probabilistic election of CHs. The genetic clustering algorithms GCA and EAERP show reduced energy consumption compared to LEACH, but the unequal energy consumption between clusters still affects their performance. GAECH achieves lower energy consumption than the other two genetic algorithms by properly balancing the energy load in each cluster.
Figure 10 shows the average energy consumption of a CH node. Due to uneven load between clusters, LEACH, GCA, and EAERP have higher energy consumption at their CHs. GAECH balances the energy consumption among its CHs through its novel fitness function and therefore has a lower average CH energy consumption than the compared algorithms.

Conclusion and Future Work
The proposed algorithm GAECH increases the lifetime of the WSN while balancing energy consumption. The experimental results have shown that GAECH performs better than its counterparts GCA, EAERP, and LEACH in three different network scenarios. The improvement is most distinct when the BS is located far away from the network, which is the case in many real-time applications. In the best case, when the BS is located outside the network, GAECH extends the period until FND by 21.11% compared with LEACH. GAECH is also found to conserve energy by balancing energy consumption among the clusters. Depending upon the application need, the weight coefficients in the fitness function of GAECH can be changed to obtain better results. Future research will consider other parameters, such as node degree and the residual energy of a node, in the fitness function.

Algorithm 1: GAECH algorithm.
(1) Generate random initial population
(2) Evaluate the initial solutions using the fitness function
(3) Repeat until the termination condition is true
    (a) Apply elitism selection operator
    (b) Apply single point crossover with rate 0.6
    (c) Apply mutation with the given probability, 0.03
(4) Update the population with the new offspring
(5) end
(6) Select the best fit chromosome and form the clusters accordingly
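The loop of Algorithm 1 can be sketched in Python as follows. This is not the authors' MATLAB implementation. Several details are assumptions made to obtain a runnable example: the number of elite chromosomes (`n_elite`), drawing parents from the better half of the population, and treating a higher fitness value as better. The `fitness` callable is supplied by the caller.

```python
import random

def single_point_crossover(a, b, rng):
    """Cut both parents at one random point and swap the tails."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, rng, rate):
    """Flip 0 <-> 1 with probability `rate`; dead nodes (-1) are left untouched."""
    return [1 - g if g in (0, 1) and rng.random() < rate else g for g in chrom]

def run_ga(population, fitness, generations=20, pc=0.6, pm=0.03, n_elite=2, seed=0):
    """Evolve the population and return the best chromosome found."""
    rng = random.Random(seed)
    pop = [list(c) for c in population]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:n_elite]  # elitism: best chromosomes survive unchanged
        parents = ranked[: max(2, len(ranked) // 2)]
        while len(next_pop) < len(pop):
            p1, p2 = rng.sample(parents, 2)
            if rng.random() < pc:
                c1, c2 = single_point_crossover(p1, p2, rng)
            else:
                c1, c2 = list(p1), list(p2)
            next_pop.extend([mutate(c1, rng, pm), mutate(c2, rng, pm)])
        pop = next_pop[: len(population)]
    return max(pop, key=fitness)
```

Because the elite chromosomes are copied unchanged, the best fitness in the population never decreases from one generation to the next.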

Figure 4: Average energy consumption per communication round.
Selection. It is the basic genetic operator which reproduces the chromosomes with higher fitness values to the next generation. Various selection techniques exist, such as roulette wheel, rank, steady state, and elitism. Depending on the application requirement, any one selection technique can be used.
Crossover. Crossover selects two parent chromosomes, makes them swap part of their genetic information with each other, and produces the next generation of chromosomes:
Chromosome 1: ...100000 | 001000...
Chromosome 2: ...000100 | 000001...
Offspring 1: ...100000 | 000001...
Offspring 2: ...000100 | 001000...
Mutation. After crossover, the mutation operator may be applied to the chromosomes. It prevents the GA from premature convergence by searching for solutions in a whole new region instead of only around the current better ones:
...10001000... → (mutation) → ...00010001...
Compared with the selection and crossover operators, mutation is used with a lower probability, since it may drastically change the fitness value of a particular solution [19]. Advanced genetic operators are not used in the proposed algorithm, since they may increase its complexity. Only the basic genetic operators of selection, crossover, and mutation are used in GAECH.
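The crossover and mutation examples above can be reproduced with a few lines of Python. Chromosomes are written as bit strings here purely for readability, and the helper names are illustrative.

```python
def crossover_at(parent1, parent2, point):
    """Swap the tails of two equal-length bit strings at a fixed cut point."""
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def flip_bits(chromosome, positions):
    """Return a copy of the bit string with the given positions inverted."""
    bits = list(chromosome)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

# The two parents from the text, cut after the sixth gene
offspring1, offspring2 = crossover_at("100000001000", "000100000001", 6)

# The mutation example from the text corresponds to flipping genes 0, 3, 4, and 7
mutated = flip_bits("10001000", [0, 3, 4, 7])
```

Here `offspring1` is `"100000000001"`, `offspring2` is `"000100001000"`, and `mutated` is `"00010001"`, matching the strings shown in the text.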

Table 1: Simulation parameters and values.

Table 2: Percentage of dead nodes and number of communication rounds in Scenario 1.

Table 3: Percentage of dead nodes and number of communication rounds in Scenario 2.

Table 4: Percentage of dead nodes and number of communication rounds in Scenario 3.