Metaheuristic Load-Balancing-Based Clustering Technique in Wireless Sensor Networks

University of Petroleum & Energy Studies (UPES), Department of Cybernetics, School of Computer Science, Energy Acres Building Bidholi, Dehradun-248007, Uttarakhand, India School of Mines, Kazi Nazrul University, Asansol, West Bengal, India Department of Electronics & Communication Engineering, School of Engineering, Sister Nivedita University, DG 1/2, New Town, Action Area 1, Kolkata, West Bengal, India Physics Department, Bidhan Chandra College, Asansol, 713303 West Bengal, India Department of Electronics & Communication Engineering, Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India


Introduction
A wireless sensor network (WSN) comprises a large number of tiny devices capable of sensing their surroundings, processing the collected data as per the application, and communicating the processed field information to a centralized base station (BS) [1]. However, the sensor nodes deployed (either randomly or deterministically) in the sensing field suffer from several constraints: they are limited in processing ability, storage, power, and other allied resources [2]. Among these, limited power is the most severe, because a node drained of its energy stops functioning, and frequent recharging or replacement cannot be facilitated, especially in remote applications of WSN like habitat monitoring, environmental monitoring, industrial monitoring, and military surveillance systems [3,4].
Typically, transmission and route allocation consume most of the nodes' energy and are largely responsible for the power drainage of the sensor nodes. Thus, energy-efficient network layer operations have been targeted by researchers for many years. Routing is the main functionality of the network layer, and hence, the design of energy-efficient routing protocols has consistently attracted the attention of the community. In this context, clustering has evolved as a very significant tool that not only eases the task of routing and distributes the load evenly within a cluster but also, through the use of data aggregation, saves a substantial amount of the nodes' energy, which can then be spent on other significant network operations.
Clustering has been defined as the grouping of nodes based on some common attributes. In a clustering-based architecture, the network nodes are partitioned into groups termed clusters. Within each cluster, a node is designated as cluster head (CH), which carries out the more energy-intensive tasks such as data aggregation and long-distance communication to the sink on behalf of the entire cluster. The rest of the nodes, called cluster members, perform the basic tasks of sensing and short-distance communication to the CH [5]. To effectively improve the WSN performance, balancing the clusters is a prerequisite. Thus, the formation of clusters in the WSN can be seen as an optimization problem involving multiple variables, such as nodes' proximity, nodes' residual energy, and the size of the tentative clusters. The optimization techniques for such problems can be classified into two major categories: heuristic and metaheuristic.
The primary motivation behind this work is to pursue the problem of clustering through metaheuristic algorithms. As mentioned above, the formation of balanced clusters for energy-efficient network operation requires adequate consideration of various parameters such as nodes' proximity and cluster size, so optimization techniques are well suited to finding a suitable solution. With balanced clusters and rotation of the cluster head's role among the nodes over the network rounds, the foremost goal of network lifetime improvement can be achieved effectively. In this paper, a novel energy-efficient clustering protocol, Metaheuristic Load-Balancing-Based Clustering Technique (MLBCT), is proposed for wireless sensor networks based on the idea of differential evolution, a metaheuristic technique. The proposed scheme defines a suitable fitness function to formulate the balanced network partitioning. Once the clusters are finalized, the scheme freezes them and enables CH role rotation among the cluster members. To prove the scheme's efficacy, an extensive set of simulations demonstrates the improved network lifetime and network energy consumption. The rest of the paper is organized into five sections. Section 2 outlines the literature review of the existing works in the same context to identify the technical gaps. Section 3 presents the adopted network model, an introductory discussion on differential evolution, and the terminology to be used throughout the work. Section 4 describes the proposed scheme, detailing each of its constituent phases. Section 5 discusses the performance in detail to confirm the efficacy of the MLBCT, and finally, Section 6 concludes the work and mentions its future scope.

Literature Review
As mentioned in Section 1, the optimization techniques can be majorly categorized as heuristic and metaheuristic schemes. Heuristic techniques utilize the complete set of particulars of a given problem and, being greedy in nature, generate solutions that might get trapped into local maxima/minima instead of producing the global maxima/minima.
On the other hand, metaheuristic techniques, also termed guided random search algorithms, are problem-independent and are designed to approach the optimal solution without getting stuck in local maxima/minima. Metaheuristic algorithms compute the solution by thoroughly exploring and exploiting the available search space over multiple iterations. The general working of the metaheuristic techniques is summarized in Figure 1.
A metaheuristic scheme starts with a randomly selected set of solution vectors that improve over the iterations. Once the application-specific parameters such as scaling factor and crossover rate are defined, the fitness of the current solution set is evaluated through a carefully designed fitness function. Then, the counter that keeps track of the iterations is initialized. Afterward, a selection is made from the population, and the selected vectors undergo a variation phase (mutation/crossover). The updated vectors are again evaluated for their current fitness, and through a survivor function, a greedy selection strategy, the population for the next generation is finalized. The process of updating the set of solutions is repeated for a predefined number of iterations, and at last, the most recent population is selected as the final solution. An intelligently and carefully designed fitness function plays the most significant role in obtaining improved offspring in metaheuristic techniques.
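The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch of the general scheme, not any specific algorithm from the literature: the function name `metaheuristic_search`, the parameters `step` and `seed`, and the simple random perturbation used as the variation phase are all assumptions made for the example, and a higher fitness value is assumed to be better.

```python
import random


def metaheuristic_search(fitness, dim, pop_size=20, iterations=100, step=0.1, seed=0):
    """Generic guided random search: initialise, vary, evaluate, select survivors."""
    rng = random.Random(seed)
    # Random initial population of candidate solutions in [0, 1]^dim.
    population = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]
    for _ in range(iterations):
        for i in range(pop_size):
            # Variation phase (hypothetical placeholder for mutation/crossover):
            # perturb each component and keep it inside [0, 1].
            candidate = [min(1.0, max(0.0, x + rng.uniform(-step, step)))
                         for x in population[i]]
            cand_score = fitness(candidate)
            # Survivor selection: greedy, keep the fitter of parent and offspring.
            if cand_score >= scores[i]:
                population[i], scores[i] = candidate, cand_score
    # Report the fittest member of the most recent population.
    best = max(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]
```

Because the survivor selection is greedy, the best fitness in the population can never decrease from one iteration to the next, which is the property the prose above relies on.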
Here, we present a brief review of such schemes based on the approaches known as heuristic and metaheuristic.
2.1. Heuristic Schemes. In one work [6], the authors proposed the most popular clustering-based routing protocol, Low-Energy Adaptive Clustering Hierarchy (LEACH), for wireless sensor networks, which features a probabilistic selection of cluster heads. It implements localized coordination for various network operations and randomized rotation of the role of the cluster heads for load balancing among the nodes. However, since the selection of cluster heads does not take the residual energy of the nodes into account, nodes with low residual energy might suffer from early death if frequently selected as cluster heads. In another work [7], the authors of LEACH proposed an extension of [6] that requires the nodes to send their location and energy status to the base station, which selects the cluster heads in a centralized manner and forms appropriate clusters by applying the simulated annealing algorithm.
In [8], the authors proposed a chain-based scheme in which, instead of forming multiple clusters, the nodes were provisioned to develop chains in a way that each sensor could exchange data with its neighbor nodes. At last, the chain leader concludes the entire data flow and forwards it to the base station. Although the scheme proved to be more energy-efficient than LEACH, the significant delay in delivery and the need for dynamic topological adjustments appeared as its major issues.
In [9], the authors proposed a static clustering scheme that eradicated the energy costing of the dynamic cluster formation in every round of the network operation as in LEACH, etc. In this scheme, distance-based clustering is executed via the base station. Once the clusters are decided, two important parameters-residual energy of the nodes and the nodes' spatial distribution-are considered to select cluster heads. However, the scheme only targeted energy consumption minimization.
In one scheme [10], the authors proposed a centralized scheme that treated coverage in the sensing field as equally important as the energy efficiency. The scheme starts with the distance-based clustering as in the [9]. It selects the cluster heads based on the weighted mean of the contribution factor of the nodes, where the contribution factor is defined as the ratio of the node's residual energy to that of the native grid in the sensing field. The main objective of the scheme is to assure network-wide coverage for the maximum network operation time.
In [11], the authors proposed a LEACH-based clustering protocol that mainly targets the energy efficiency and the fault tolerance in the network. To improve the network lifetime, the network nodes are provisioned to send their data to their respective cluster heads only when the current data is distinct from the previous data. At the end of every network round, noncluster head nodes forward their current energy status to the respective cluster heads to get classified as faulty (nodes with lower residual energy level) and live nodes (nodes with sufficient residual energy). The identification of faulty nodes facilitates the fault tolerance in the network.
In [12], the authors proposed a Fault-Tolerant Clustering-based Multipath algorithm (FTCM) to address the problems of energy efficiency and fault tolerance in the wireless sensor networks. The scheme calls the hybrid energy-efficient distributed clustering (HEED) [13] scheme to partition the network into an appropriate number of clusters. It also appoints a backup CH (BCH) for a cluster head to improve the fault tolerance. The BCH consistently monitors the performance of CH and keeps a copy of CH's data until delivered to the base station. In case of any mishap at the CH end, the BCH can instantly transmit data to the base station without asking the member nodes to send their data again. In addition to the regular responsibilities of CH, the CH is also responsible for the removal of the majority of faulty nodes via hypothesis testing and majority voting. The proposed scheme enables three paths to transfer data from the source node to the base station based on the parameters-residual energy of the nodes, number of hops, propagation speed, and path reliability.
In [14], the authors proposed a clustering-based Hierarchical Fault-Management Framework (HFMF) to address energy management and fault management jointly. For the minimization of energy consumption, the sleep/active method is used. For the management of faults, that is, faults' detection and recovery, backup CH (BCH) is appointed along with every CH to take care of acting CH in the event of its malfunctioning or failure. Later by measuring the data correlation among the cluster members, nodes are grouped virtually to further achieve the energy and fault

Metaheuristic Schemes.
A wide variety of metaheuristic techniques such as genetic algorithm (GA), genetic programming (GP), evolutionary programming (EP), evolution strategies (ES), differential evolution (DE), particle swarm optimization (PSO), ant colony optimization (ACO), and teaching-learning-based optimization (TLBO) exist in the literature. Such metaheuristic techniques with the virtue of being problem-independent have already imparted a lot in almost every field of engineering like [15]. In the context of wireless sensor networks, some contributions are noticed especially for the selection of cluster heads and the effective formulation of the clusters like in [16][17][18][19][20][21][22][23][24][25].
Due to its simplicity, robustness, and fast convergence, differential evolution has proved its worth over the algorithms like GA and PSO [26]. Several contributions have already been proposed based on this outstanding differential evolution technique in search of suitable clusters of the nodes in WSN. This subsection discusses some of the prime contributions in this regard as follows: In one work [27], a differential evolution-based routing scheme, DE-LEACH, is proposed for environmental monitoring wireless sensor networks. DE-LEACH applies the fast and straightforward converging search technique of differential evolution to produce the clusters by considering the nodes' residual energy status and spatial distribution. The scheme consists of four phases: partitioning initial clusters, collecting status information of the nodes within the clusters through the auxiliary cluster heads, determining optimized cluster heads with differential evolution, and forming optimized clusters. The phases are to be executed in every round of the network operation. The scheme outperforms the traditional LEACH, and LEACH-C [7]. However, the nodes are burdened with heavy computational responsibilities.
In another work [28], a differential evolution-based clustering algorithm (DECA) is proposed, which provisions specialized nodes enriched with an additional amount of initial energy to act as cluster heads. These specialized nodes are called relay nodes or gateways. In DECA, besides providing a suitable fitness function (to measure the health of the tentative clusters), a new local improvement phase has also been proposed that carefully prevents early death of the gateways. DECA utilizes the DE/best/1/bin scheme for the differential evolution. In addition to a novel scheme for the vector representation, a fitness function is designed by considering the standard deviation of the lifetime of gateways and the average cluster distance. The scheme outperforms the traditional differential evolution and genetic algorithm-based schemes [29][30][31] in terms of network lifetime; however, it gives only a little attention to cluster balancing via its local improvement phase.
A hybrid differential evolution and simulated annealing (DESA) scheme for the improvement of network lifetime in wireless sensor networks is proposed in [32]. The scheme utilizes a hybrid of differential evolution and simulated annealing for local and global optimal solutions, respectively. There are four phases in the scheme: population vector initialization, mutation, crossover, and selection, as in the traditional differential evolution. However, instead of using a random selection of population vectors, a more effective "opposite point method" [33] is used for the initialization of population vectors. The mutation scheme is decided randomly at run time based on a chosen threshold value (here, 0.5): a random number belonging to (0, 1) is drawn, and if it is below the threshold, the mutation scheme is DE/rand/1; otherwise, it is DE/target-to-best/1. The fitness function is designed by considering the ratio of nodes' energy to that of the respective clusters. For crossover, a blending rate based on the Gaussian distribution is used. The scheme outperforms the traditional differential evolution scheme in terms of network lifetime, energy consumption, throughput, etc.; however, it converges slowly.
In [34], the authors proposed Multiobjective Load-Balancing Clustering (MLBC) which is a multiobjective optimization technique that addresses two significant problems in WSN-energy efficiency and reliability. It utilizes the Multiobjective Particle Swarm Optimization (MOPSO). MLBC targets energy efficiency by appropriately considering the average residual energy of the cluster heads and reliability by reducing the intercluster communication cost among the nodes in a cluster. It also provisions the load balancing via shuffling the roles of the next-hop node and CH in every iteration. However, it considers only the average residual energy of cluster heads in formulating the objective function for energy efficiency.
In a scheme [35], efficient energy consumption in wireless sensor networks using an improved differential evolution algorithm is highlighted. The scheme is an improvement of [28], in which the mutation strategy has been updated to accommodate the target vector along with the prior best and two random population vectors. Also, the fitness function has been upgraded to accommodate the total energy of the gateways and nodes in addition to the existing network lifetime standard deviation component. However, nothing has been mentioned concerning the load balancing among the clusters.
In one work [36], the authors proposed a hybrid metaheuristic clustering algorithm that exploits the best of the Artificial Bee Colony and differential evolution optimization techniques. In their proposed Artificial Bee Colony (ABC) with differential evolution (DE) scheme, known as the ABC-DE-based clustering scheme, the objective function is designed by taking into account three network parameters (average intracluster distance, average energy of cluster heads, and data transmission delay) to ensure load-balanced cluster heads. In addition to this, an ABC-based metaheuristic algorithm has also been proposed to facilitate the dynamic repositioning of the mobile sink within the cluster-based network to achieve further energy efficiency.
In [37], the authors have addressed the problem of energy optimization in an Internet-of-Things-based WSN (IoT-based WSN). To this end, a hybrid of the Whale Optimization Algorithm (WOA) and simulated annealing (SA) metaheuristic algorithms has been employed to select the most suitable cluster heads in their respective clusters. For choosing the most appropriate cluster heads, the fitness function of the proposed scheme considers a set of five node-specific parameters: residual energy, load, delay, distance, and temperature. The fitness function ensures that the node with the highest residual energy but the least load, delay, distance, and temperature is selected as the cluster head in every network round. In one work [38], the authors proposed an Artificial Intelligence-(AI-) based quorum system to address the issue of energy conservation in wireless sensor networks. The primary motivation behind the proposed AI-based scheme was to speed up the neighbor discovery process in order to minimize the network latency. Moreover, the scheme facilitates a quorum-based grid system that allows a substantial increase in the number of nodes in the quorum without mandating an increase in the number of quorums, thereby reducing the effective network delay. In addition, the feature of weighted load balancing reduces the network energy consumption to improve the network lifetime. Through various experiments, the authors have established that their proposed scheme outperforms the state-of-the-art quorum algorithms in terms of latency, coverage, energy efficiency, and network lifetime.
In [39], the authors proposed a genetic algorithm-(GA-) inspired clustering-based approach to address the problem of node's localization in wireless sensor networks. To find the accurate position of unknown nodes with respect to the anchors or known nodes, the authors used the Euclidean distance objective function in their proposed scheme. Through various simulation results, the supremacy of the GA-based localization scheme with an extended clustering approach has been established over the state-of-the-art schemes like Centroid and Distance Vector-Hop (DV-Hop) in terms of improved location accuracy.
In a scheme [40], the author proposed a genetic algorithm-based energy-efficient clustering scheme which addressed the localization problems in wireless sensor networks. The authors utilized parameters like node's residual energy, distance estimation, and coverage connection in the formulation of fitness function for their proposed scheme, Energy-Efficient Clustering in Genetic Algorithm Localization (EECGL). Through various experimentation, the authors have shown that EECGL approximates the unknown node's location with the least localization error and extends the effective network lifetime by minimizing the overall network energy consumption.
In a work [41], the authors proposed a metaheuristic energy-efficient clustering technique inspired by Brain Storm Optimization (BSO). BSO is a swarm-based metaheuristic technique that exploits the human brainstorming process in search of the best possible solutions. In their proposed scheme, Energy-Efficient Clustering-Brain Storm Optimization (EEC-BSO), the authors have focused on deciding energy-efficient clusters in a way that nodes not participating in the information transmission process are sent to sleep mode, minimizing the overall network energy consumption. In the formulation of such clusters, the fitness function is designed by considering parameters like node's residual energy, coverage, and packet data rate. Moreover, EEC-BSO has been shown to outperform state-of-the-art schemes such as LEACH, LEACH-Centralized, Energy-Efficient Clustering Scheme (EECS), and LEACH-BSO in terms of reduced energy consumption, improved coverage, and data packet rate.
In [42], a differential evolution-based clustering routing protocol (DEBCRP) is proposed for wireless sensor networks. DEBCRP is a base station-dependent scheme that applies the DE/best/1/bin scheme to partition the network into clusters. The fitness function devised by the authors considers the nodes' residual energies with respect to the probable cluster heads and the distance between the nodes and the cluster heads for the formulation of clusters. At last, to communicate the data from the sensing field to the base station, a PEGASIS-like [8] chain of the cluster heads is formed. DEBCRP is reported to outperform S-DE [43] in terms of network lifetime. However, no adequate consideration is given to the formulation of load-balanced clusters, which is key to network lifetime improvement. Also, the PEGASIS-like chain of cluster heads suffers from problems similar to those in [8], for example, delayed communication; moreover, since data from one CH is aggregated with that of the others on the way to the sink, some inaccuracy might be introduced in the information being sent to the base station.
From the aforementioned analysis, it can be concluded that despite being the most important factor for the formulation of clusters in the network, cluster balancing has been addressed the least. Thus, the work presented here serves the following objectives:
(i) Balanced cluster formulation to contribute effectively towards the enhancement of network lifetime
(ii) Adaptable clustering solution to perform consistently well in any network configuration

Preliminaries
This section describes the network model for the scheme. In addition to this, it also discusses the basics of the differential evolution metaheuristic technique and the entire set of notations used throughout the work.
3.1. Network Model. MLBCT assumes a wireless sensor network with the following characteristics:
(1) All the sensor nodes are deployed randomly across the sensing field and are static; that is, nodes once deployed cannot change their location
(2) The sensor nodes are homogeneous and equipped with a definite amount of initial energy
(3) The sensor nodes are equipped with power control features to vary the transmission power as and when needed
(4) The base station is also static and can be placed at any point in the network
(5) The continuous data flow model is used here to define the working mode of the sensor nodes
3.2. Differential Evolution: An Overview. Differential evolution has evolved as a prevalent stochastic metaheuristic technique for multimodal optimization over a continuous search space. Similar to the general scheme of metaheuristic techniques discussed in Section 2, it starts with the definition of the initial parameters, where the values of the scaling factor and crossover rate are defined along with the randomized set of initial solutions (initial population) and the number of iterations. Here, each solution vector (equivalently known as chromosome or genome), termed a target vector, undergoes the mutation phase followed by the recombination. This mutation followed by the recombination is nothing but the variation phase of Figure 1. As depicted in Figure 2, the target vector, once it passes through the mutation phase, becomes the donor/mutant vector. After the recombination or crossover phase, the donor vector is known as the trial vector.
In the differential evolution scheme, the next-generation solutions are obtained only after all the trial vectors have been generated, in contrast to particle swarm optimization and teaching-learning-based optimization [44,45]. In other words, the greedy selection towards the next-generation solution is performed between the pair of target and trial vectors once all the target vectors have been converted into trial vectors. A variety of mutation strategies exist, such as random, best, and target-to-best, along with two types of crossover techniques: binomial and exponential. The binomial and exponential crossovers can be defined as follows. In the binomial crossover,

$$u_j = \begin{cases} v_j, & \text{if } r \le C_P \text{ or } j = \delta, \\ x_j, & \text{otherwise,} \end{cases}$$

where $C_P$ is the crossover probability, $\delta$ is the randomly selected variable location from the set $\{1, 2, 3, \ldots, |\text{decision variables}|\}$, $r$ is a random number between 0 and 1, $u_j$ refers to the $j$th variable of the trial vector, $v_j$ refers to the $j$th variable of the donor/mutant vector, and $x_j$ refers to the $j$th variable of the target vector.
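A minimal Python sketch of the binomial crossover, following the symbol definitions above, is given here for illustration. The function name `binomial_crossover` and the `rng` argument are assumptions of this sketch, and indices are zero-based rather than one-based.

```python
import random


def binomial_crossover(target, donor, cp, rng=random):
    """Build a trial vector from a target vector x and a donor vector v."""
    n = len(target)
    # delta: randomly selected position that is always taken from the donor,
    # so the trial vector differs from the target in at least one variable.
    delta = rng.randrange(n)
    trial = []
    for j in range(n):
        r = rng.random()  # fresh random number in [0, 1) for every variable
        if r <= cp or j == delta:
            trial.append(donor[j])   # u_j = v_j
        else:
            trial.append(target[j])  # u_j = x_j
    return trial
```

With `cp` close to 0, the trial vector inherits almost everything from the target except position `delta`; with `cp = 1`, it is a copy of the donor.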

Exponential Crossover.
In the exponential crossover, first, the $n$th variable from the donor vector is copied into the trial vector. Afterward, every subsequent variable from the donor vector is copied into the trial vector as long as $r \le C_P$. Once $r > C_P$, the remaining variables are copied from the target vector into the trial vector. Based on the adopted mutation strategy and crossover type, various schemes have been proposed for differential evolution, and to discriminate among them, a standard notation, DE/x/y/z, is used. Here, DE refers to differential evolution, x denotes the mutation strategy, y denotes the number of difference vectors used in the mutation operation, and z refers to the crossover scheme selected. Some of the variants of the DE schemes are listed in Table 1.
Here, in Table 1, $V$ is the donor vector, $F$ is the scaling factor such that $F \in (0, 2)$, $X_{best}$ is the target vector with the best fitness value, $X_i$ is the $i$th target vector, and $X_{r_j}$ is the $j$th target vector chosen randomly, where $j \in [1, N]$, $N$ being the number of target vectors in the population. Once the trial vectors are generated for all the target vectors of the current generation, say $G$, the offspring are chosen based on the fitness values of the corresponding pairs of target and trial vectors, i.e., $\langle X_{i,G}, U_{i,G} \rangle$ for $i \in [1, N]$, as follows:

$$X_{i,G+1} = \begin{cases} U_{i,G}, & \text{if } f(U_{i,G}) \ge f(X_{i,G}), \\ X_{i,G}, & \text{otherwise,} \end{cases}$$

where $f(\cdot)$ denotes the fitness value, with a higher value indicating a fitter vector.
Figure 2: Vector transformation in differential evolution.
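The exponential crossover and the greedy survivor selection described above may be sketched as follows. This is a hedged illustration: the function names are hypothetical, the starting position is drawn at random, the copy run is assumed to wrap around the end of the vector (the text does not state this), and a higher fitness value is assumed to be better.

```python
import random


def exponential_crossover(target, donor, cp, rng=random):
    """Copy a contiguous run of donor variables into a copy of the target."""
    n = len(target)
    trial = list(target)
    j = rng.randrange(n)  # starting position (assumed to be chosen at random)
    copied = 0
    while True:
        trial[j] = donor[j]   # the first variable is always taken from the donor
        copied += 1
        j = (j + 1) % n       # assumption: the run wraps around circularly
        # Stop once every variable is copied or as soon as r > cp.
        if copied >= n or rng.random() > cp:
            break
    return trial


def greedy_selection(targets, trials, fitness):
    """Pairwise survivor selection between each target and its trial vector."""
    return [u if fitness(u) >= fitness(x) else x
            for x, u in zip(targets, trials)]
```

Note that `greedy_selection` is applied only after every target vector has produced its trial vector, matching the generation-wise update of differential evolution described in the text.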

Wireless Communications and Mobile Computing
The main objective of the present work is to formulate the balanced clusters within the network for the even distribution of load among the nodes. To ensure this, it is attempted that the clusters are equipped with an almost similar count of member nodes situated close to one another. Also, the clusters are left with an approximately equal amount of residual energy at the end of every network round.

Proposed Scheme: Metaheuristic Load-Balancing-Based Clustering Technique (MLBCT)
This section describes the proposed scheme, Metaheuristic Load-Balancing-Based Clustering Technique (MLBCT) in wireless sensor network. The MLBCT is a base station-(BS-) assisted scheme which calls the BS for the differential evolution-based cluster formation. Once the optimized and balanced clusters come into existence, it hands over the responsibility of further network operations to the network nodes.
The scheme starts with a bootstrapping phase in which all the nodes are assigned unique IDs; the nodes, in turn, communicate their IDs and location information to the BS. The BS then applies differential evolution with a well-established fitness function (detailed below) and formulates the balanced clusters. The selected cluster heads are then informed of their specific roles and their members' information by the base station. The selected cluster heads then provide their IDs to the respective members along with the TDMA schedules. Afterward, the overall network operation is divided into rounds, where each round consists of the steady-state phase and the responsible node selection phase. In the steady-state phase, cluster members send their data to their respective cluster heads, which aggregate the received data and forward it to the base station. In the responsible node selection phase, the current cluster head of a cluster selects a node randomly to act as head for the next round and broadcasts this choice within the cluster. The entire workflow is summarized in Figure 3 and is detailed in the subsequent subsections and algorithm as follows:
4.1. Bootstrapping. In bootstrapping, differential evolution is applied by the base station to divide the entire network into $k$ balanced clusters, where $k$ is a user-defined parameter. It starts with the deployed nodes sharing node-specific information such as identity, residual energy, and location with the base station. Based on the information received, the BS performs the following to determine the required partitioning.
4.1.1. Generation of the Random Population. The population vectors are generated as per [28]. Each population vector is chosen in such a way that it indicates the assignment of every network node to one of the cluster heads. The notation adopted to represent the $i$th population vector of the $G$th generation is as follows:

$$\vec{X}_{i,G} = \left[ x_{1,i,G}, x_{2,i,G}, x_{3,i,G}, \ldots, x_{N,i,G} \right], \qquad (3)$$

where $x_{1,i,G}, x_{2,i,G}, x_{3,i,G}, \ldots, x_{N,i,G}$ are random numbers between 0 and 1. $x_{j,i,G}$ denotes the assignment of the node $s_j$ to one of the cluster heads, say $k$, as given by equations (4) and (5). Here, the length of the population vectors is fixed and determined by the number of nodes deployed in the field.
Thus, corresponding to every population vector, say $\vec{X}_{i,G}$, $y_k \in \Theta$ is assigned to the node $x_j$ in the $i$th vector of the $G$th generation as per equations (4) and (5).
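To make the vector representation concrete, the following Python sketch generates a random population and decodes one vector into a cluster assignment. The mapping used in `decode` (splitting the unit interval into $k$ equal bins) is only an assumption standing in for equations (4) and (5), and both function names are hypothetical.

```python
import random


def init_population(pop_size, n_nodes, rng):
    """One vector per individual; one random gene in [0, 1) per sensor node."""
    return [[rng.random() for _ in range(n_nodes)] for _ in range(pop_size)]


def decode(vector, k):
    """Map each gene to a cluster index in 0..k-1.

    Assumption: the unit interval is split into k equal bins; this is an
    illustrative stand-in, not necessarily the mapping used by the scheme.
    """
    return [min(k - 1, int(x * k)) for x in vector]


rng = random.Random(42)
population = init_population(pop_size=10, n_nodes=50, rng=rng)
assignment = decode(population[0], k=5)  # cluster index for each of the 50 nodes
```

Whatever the exact mapping, the key design point is the one stated in the text: the vector length equals the number of deployed nodes, so every vector encodes a complete partitioning of the network.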

Fitness Function.
It can be easily intuited that if the clusters are balanced in the clustered network architecture, they might have an almost similar level of residual energy and a similar count of member nodes. With this conception, to meet our primary objective of network partitioning into some balanced clusters, nodes' residual energy and cluster size have been taken as the decision parameters. In addition to this, nodes' proximity has also been taken into account, ensuring the reduced energy consumption in intracluster communication.
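As a rough illustration of how the three decision parameters named above (residual energy, cluster size, and proximity) could be combined into one score, consider the following Python sketch. It is not the authors' fitness function: the product form, the centroid-based proximity measure, the smoothing term `eps`, and all names are assumptions made for this example; only the idea that a lower spread and a lower intracluster distance should give a higher fitness is taken from the text.

```python
import math
from statistics import pstdev


def cluster_fitness(assignment, energy, position, k, eps=1e-6, K=1.0):
    """Score a partitioning: higher is better (balanced and compact clusters)."""
    members = [[] for _ in range(k)]
    for node, c in enumerate(assignment):
        members[c].append(node)
    # Average cluster energy (ACE) per cluster; empty clusters count as 0.
    ace = [sum(energy[n] for n in m) / len(m) if m else 0.0 for m in members]
    sizes = [len(m) for m in members]
    sigma_ce = pstdev(ace)    # spread of average cluster energy
    sigma_cs = pstdev(sizes)  # spread of cluster sizes
    # Proximity (assumed measure): mean distance of nodes to their cluster centroid.
    total = 0.0
    for m in members:
        if not m:
            continue
        cx = sum(position[n][0] for n in m) / len(m)
        cy = sum(position[n][1] for n in m) / len(m)
        total += sum(math.dist(position[n], (cx, cy)) for n in m)
    proximity = total / len(assignment)
    # Assumed combination: inverse of the product; eps avoids division by zero.
    return K / ((sigma_ce + eps) * (sigma_cs + eps) * (proximity + eps))
```

For four nodes at (0, 0), (0, 1), (10, 0), and (10, 1) with equal energy, the assignment `[0, 0, 1, 1]` scores higher than `[0, 1, 0, 1]` (same balance, larger intracluster distance) and higher than `[0, 0, 0, 1]` (unequal cluster sizes).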

A suitable fitness function contributes the most to the convergence of differential evolution. Thus, the fitness function has been derived in such a way that it characterizes all the aforementioned requirements as follows:
(i) Standard deviation of average cluster energy. If the clusters have been formed in an optimized way, ensuring that the entire network energy is distributed evenly across the clusters formed in the network, each cluster is supposed to have an almost similar level of residual energy. In other words, in terms of average cluster energy (ACE), each cluster should have approximately the same amount of energy, and hence, the standard deviation $\sigma_{CE}$ of the ACE values is taken over the $k$ clusters, where $k$ is the number of clusters. The lower the value of $\sigma_{CE}$, the higher the value of fitness, i.e.,

$$\text{Fitness} \propto \frac{1}{\sigma_{CE}}. \qquad (8)$$

(ii) Standard deviation of average cluster size. The balanced clusters must have an approximately equal number of members. In other words, in terms of average cluster size (AvgCS), each cluster should have almost the same count of cluster members.
With this, the standard deviation and the fitness value accord to equations (9) and (10), respectively,
where $k$ is the number of clusters. Again, the lower the value of $\sigma_{CS}$, the higher the value of fitness, i.e.,

$$\text{Fitness} \propto \frac{1}{\sigma_{CS}}. \qquad (10)$$

(iii) Nodes' proximity within the cluster. This metric ensures that, when deciding on the nodes to be a part of a cluster, the node located at a shorter distance from the other members gets priority. The central idea behind this metric is to reduce the cost of communication within the cluster. The lower the value of this metric, the higher the value of fitness. More illustratively, From equations (8), (10), and (11), we can have the following: i.e.,

Wireless Communications and Mobile Computing
where $K$ is the proportionality constant, which can be set as $K = 1$ without loss of generality. This yields the fitness function of equation (15).

4.1.3. Mutation Strategy. As in [28,42], the DE/best/1/bin scheme is adopted in this work, which refers to the application of the DE/best/1 mutation strategy. As depicted in Figure 2, each target vector of the population (say, of size $P$) goes through this scheme to be transformed into a donor vector. From Table 1, the mutation expression for the selected strategy is

$$\vec{V}_{i,G} = \vec{X}_{best,G} + F\left(\vec{X}_{r_1,G} - \vec{X}_{r_2,G}\right) \qquad (16)$$

where $\vec{X}_{best,G}$ is the best vector and $\vec{X}_{r_1,G}$, $\vec{X}_{r_2,G}$ are any two randomly selected vectors from the $G$-th generation of the population, such that $i$, $r_1$, and $r_2$ are three mutually distinct integers $\in [1, P]$, with $r_1$ and $r_2$ chosen at random. $F$ is the scaling factor, which may take any value in $(0, 2)$.

From equation (3), the components of the vectors in equation (16), namely $\vec{X}_{best,G}$, $\vec{X}_{r_1,G}$, and $\vec{X}_{r_2,G}$, are random values $\in (0, 1)$. In order to ensure that the components of the donor vector $\vec{V}_{i,G}$ are also values $\in (0, 1)$, a few amendments are introduced as in [28]; the same amendments govern the computation of each component $v_{j,i,G}$ of $\vec{V}_{i,G}$.

4.1.4. Crossover Scheme. The binomial and exponential crossover schemes are already described in Section 3. A binomial crossover scheme is used in this work to convert the donor vector into the trial vector.
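The mutation and crossover steps can be sketched in Python as follows. The DE/best/1 donor and the binomial crossover follow their standard textbook definitions; the clamping used to keep donor components inside (0, 1) is only an assumed stand-in for the amendments of [28], and the function names are hypothetical.

```python
import random

def mutate_best_1(population, best, i, F, rng, eps=1e-9):
    """DE/best/1 donor: best + F * (x_r1 - x_r2), with i, r1, r2 distinct."""
    candidates = [idx for idx in range(len(population)) if idx != i]
    r1, r2 = rng.sample(candidates, 2)
    donor = []
    for b, a, c in zip(best, population[r1], population[r2]):
        v = b + F * (a - c)
        # Assumed repair step: clamp into (0, 1); the amendments of [28] may differ.
        donor.append(min(max(v, eps), 1.0 - eps))
    return donor

def binomial_crossover(target, donor, Cr, rng):
    """Take each gene from the donor with probability Cr; force one donor gene."""
    j_rand = rng.randrange(len(target))
    return [d if (rng.random() < Cr or j == j_rand) else t
            for j, (t, d) in enumerate(zip(target, donor))]

rng = random.Random(1)
pop = [[rng.random() for _ in range(6)] for _ in range(5)]
best = pop[0]  # placeholder: in practice, the fittest vector of the generation
donor = mutate_best_1(pop, best, i=2, F=0.8, rng=rng)
trial = binomial_crossover(pop[2], donor, Cr=0.9, rng=rng)
```

Forcing one gene from the donor (index `j_rand`) guarantees that the trial vector differs from its target in at least one position whenever the donor does.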

4.1.5. Selection or Offspring Generation. Once all the trial vectors are generated following the above-mentioned steps, the next generation is obtained by comparing the fitness values of each corresponding pair of target and trial vectors, the fitter of the two being retained.

4.1.6. Complexity Analysis. Throughout the proposed scheme, the fitness function is evaluated $N_P + N_P \times T$ times, where $N_P$ refers to the size of the population and $T$ refers to the number of iterations, known a priori. Moreover, exploring the solution space in search of the most optimal solution is a continuous process in a metaheuristic scheme. For this reason, even in the best case, the complexity of the fitness function will be $O(n^2)$, as each newly generated solution has to be compared with its predecessor in terms of its fitness value. Similarly, the complexity of the fitness function in the worst case will be $O(n^2)$ due to successive fitness value computation and comparison. Thus, the average-case complexity for the fitness function can also be concluded as $O(n^2)$.
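The fitness criteria of Section 4.1.2 and the selection step can be sketched together in Python. This is a sketch under stated assumptions, not the scheme's exact formulation: the three terms are combined as the reciprocal of their product with $K = 1$, proximity is taken as the summed distance of members to their cluster centroid, and population standard deviations are used. The exact forms are those of equations (8) to (15). All names and sample values are hypothetical.

```python
import math
from statistics import pstdev

def fitness(assignment, energy, positions, k, eps=1e-9):
    """Higher is better: equal cluster energy, equal cluster size, tight clusters."""
    clusters = {c: [] for c in range(1, k + 1)}
    for node, head in enumerate(assignment):
        clusters[head].append(node)
    sizes = [len(m) for m in clusters.values()]
    # Average cluster energy (ACE); an empty cluster contributes 0.
    ace = [sum(energy[n] for n in m) / len(m) if m else 0.0
           for m in clusters.values()]
    spread = 0.0
    for members in clusters.values():
        if not members:
            continue
        cx = sum(positions[n][0] for n in members) / len(members)
        cy = sum(positions[n][1] for n in members) / len(members)
        spread += sum(math.dist(positions[n], (cx, cy)) for n in members)
    # Assumed combination (K = 1): reciprocal of the product of the three terms.
    return 1.0 / ((pstdev(ace) + eps) * (pstdev(sizes) + eps) * (spread + eps))

def select(targets, trials, score):
    """Greedy DE selection: keep the trial only if it is at least as fit."""
    return [u if score(u) >= score(x) else x for x, u in zip(targets, trials)]

energy = [0.9, 0.7, 0.8, 0.6, 0.9, 0.7, 0.8, 0.6]           # residual energy (J)
positions = [(0, 0), (1, 0), (0, 1), (1, 1),
             (9, 9), (10, 9), (9, 10), (10, 10)]              # node coordinates (m)
balanced = [1, 1, 1, 1, 2, 2, 2, 2]
skewed = [1, 1, 1, 1, 1, 1, 1, 2]
score = lambda a: fitness(a, energy, positions, k=2)
next_gen = select([skewed], [balanced], score)               # keeps the balanced one
```

For brevity the sketch scores decoded assignments directly; in the scheme itself the comparison is between a target vector and its trial vector, each decoded before evaluation.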
As explained at the beginning of this section, once the clusters are formed and the members are notified of their respective initial heads, further network operations can be divided into two phases: the steady-state phase and the responsible node selection phase.

Steady-State Phase.
This phase refers to the data transmission in which cluster members send their data to their respective cluster heads in the designated time slots. After receiving the data from its members, each cluster head aggregates the collected data and forwards it to the base station on behalf of its entire cluster.

Responsible Node Selection Phase.
After executing the steady-state phase, a cluster head in its respective cluster selects a node randomly as the head for the next round and communicates the same to its members. The members note the same and communicate their data to that newly selected cluster head in the upcoming round accordingly. The process is carried out in each of the clusters in the network.
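The random hand-over of the cluster-head role can be sketched in a few lines of Python. The sketch assumes that every member of a cluster, including the current head, is an equally likely candidate; the text does not state whether the current head is excluded. The membership lists and the function name are hypothetical.

```python
import random

def next_round_heads(clusters, rng):
    """For each cluster, pick one member uniformly at random as the next head."""
    return {cid: rng.choice(members) for cid, members in clusters.items()}

clusters = {1: [0, 3, 5], 2: [1, 2, 4, 6]}  # hypothetical membership lists
heads = next_round_heads(clusters, random.Random(3))
```

Because only the head changes and the membership lists stay fixed, the cluster sizes established during cluster formation are preserved from round to round.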

Performance Analysis
This section deals with the various experimental processes conducted throughout the work and analyses the obtained results thoroughly.

Experimental Environment.
In conducting the experiments, different network configurations with varying node densities have been examined. Specifically, experiments have been performed with 50, 100, 150, and 200 nodes in an area of 100 × 100 m² with two different sink placements: one at the center of the sensing field (50 m, 50 m) and another beyond the network, at (50 m, 150 m). An instance of clustering with 50 nodes and with 5 and 10 cluster heads, respectively, is demonstrated in Figure 4. The base station is situated at (50 m, 150 m) in this exemplary instance.
An extensive set of experiments has been performed for the proposed scheme using MATLAB. Mainly, the experiments have been performed to

(1) Prove the efficacy of the proposed fitness function. In this set of experiments, the proposed fitness function as in equation (15) has been tested for the quality of the clusters being produced. It has been verified that the proposed fitness function yields balanced clusters in terms of cluster size. The clusters generated as per equation (15) have been compared with the clusters produced by the fitness function given in [42] under two different clustering scenarios, in which the network is divided into 5 clusters and 10 clusters, respectively.

(2) Prove the supremacy of the proposed scheme, MLBCT, in terms of network lifetime and network stability. In the second set of experiments, the performance of MLBCT is compared to that of DEBCRP [42] and improved differential evolution-LEACH (ImDE-LEACH) [46], mainly in terms of network lifetime and network stability with respect to the number of alive nodes in the network, network energy consumption, average residual energy per network node over the network rounds, and data packets delivered to the base station under variable network configurations. Moreover, for the sake of experimentation, the performance of LEACH [6] has also been recorded in the same context as that of MLBCT, DEBCRP, and ImDE-LEACH.

Simulation Parameters.
To compare the performance of the proposed scheme, MLBCT, with that of DEBCRP and ImDE-LEACH, simulation parameters have been adopted here as listed in Table 2. However, to prove the scalability and adaptability of the proposed scheme, the performance has also been tested under variable network configurations.
In addition to the parameters listed in Table 2, the following performance criteria have been used for the evaluation of the schemes:

(i) Network lifetime: the network lifetime is generally measured as the time when the first node dies, or when the last node dies, in the network [28][29][30][31]42]. In this work, both definitions have been considered to demonstrate the supremacy of MLBCT over DEBCRP and ImDE-LEACH.

(ii) Network stability: network stability refers to how smoothly the network operations are going on. It can be measured in terms of the rate of the network energy consumption and the average residual energy per network node. The lower the rate of energy consumption, the more stable the network is, resulting in improved network lifetime. Similarly, the higher the value of average residual energy per network node, the more stable and durable the network is.

To further compare the performance of the schemes MLBCT, DEBCRP, and ImDE-LEACH, packet delivery at the base station can also be considered as a criterion. The success in this regard can be judged by the higher number of data packets successfully delivered to the base station.
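The two lifetime definitions above can be read off a per-round record of alive nodes, as in the following Python sketch. Measuring time in network rounds is an assumption made here for illustration, and the function names and the sample trace are hypothetical.

```python
def first_node_death(alive_per_round, total_nodes):
    """1-based round at which the alive count first drops below total_nodes."""
    for rnd, alive in enumerate(alive_per_round, start=1):
        if alive < total_nodes:
            return rnd
    return None  # no node died within the recorded rounds

def last_node_death(alive_per_round):
    """1-based round at which no node is alive any more."""
    for rnd, alive in enumerate(alive_per_round, start=1):
        if alive == 0:
            return rnd
    return None  # at least one node survived the recorded rounds

alive = [5, 5, 4, 4, 2, 0]            # hypothetical trace for a 5-node network
fnd = first_node_death(alive, 5)      # round 3
lnd = last_node_death(alive)          # round 6
```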

Results and Discussion.
As stated in point 1 of Section 5.1, the suitability of the proposed fitness function, equation (15), is demonstrated in the first set of experiments. Since the scheme is a metaheuristic one, a suitable fitness function contributes substantially to deciding the best possible clusters. The main objective of this work is to form clusters that are balanced in the sense that they have an almost similar count of member nodes and that the member nodes are located close to one another, so as to minimize intracluster communication.
In this experimentation, variable node counts as in Table 2 have been considered for two instances of clustering such as 5 clusters and 10 clusters as shown in Figure 5.
The success of the fitness proposal mentioned above is evident in Figure 5. When implemented in the scheme DEBCRP, the proposed fitness function has been found more effective in producing balanced clusters. In other words, clusters are obtained with an approximately similar count of member nodes, leading to an even distribution of load throughout the network nodes. In Figure 5(a), the efficacy of the proposed scheme is demonstrated with five clusters being formed in the network, whereas Figure 5(b) presents the same while partitioning the network into 10 clusters. It can be easily observed from the figure that the member counts recorded in the clusters do not vary to the extent that they do in DEBCRP over the network rounds. Also, it has been verified that the scheme for the fitness evaluation of the clusters works invariably well irrespective of the node density present in the network.

Input:
* N: No. of randomly deployed sensor nodes.
* f_fn(): Fitness function.
* F: Mutation/scaling factor.
* T: No. of iterations.
* k: No. of user-specified clusters.
* C_r: Crossover rate.
◊ BEGIN %% BOOTSTRAPPING PHASE %%
◊ for i ⟵ 1 : N
◊ Status Transmission(Node i ⟶ BS)
◊ end for
◊ Random population generation (P), where each vector (X_i) refers to the complete assignment of all the nodes to the k cluster heads
◊ for i ⟵ 1 : size(P)
where u_j^l is the l-th component of U_j defined as follows:

Statistical Analysis.
Statistical analysis is performed to further explain the efficacy of the proposed fitness function (MLBCT-fitness), as in equation (15), in producing balanced clusters. This is done by finding the standard deviation of average cluster size, $\sigma_{CS}$, following equation (9), along with the confidence interval. The standard deviation measures how far the clusters being produced deviate from the ideal distribution of the nodes among the specified number of clusters. The ideal distribution refers to clusters with $(N/k)$ nodes each if $N$ nodes are to be distributed among $k$ clusters.
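As a small worked illustration of this measure, the following Python sketch computes the deviation of a set of cluster sizes from the ideal size $N/k$. It assumes the population form of the standard deviation (division by $k$); the exact form is the one given in equation (9). The function name and the sample sizes are hypothetical.

```python
import math

def cluster_size_deviation(sizes, total_nodes):
    """Root-mean-square deviation of cluster sizes from the ideal size N/k."""
    k = len(sizes)
    ideal = total_nodes / k
    return math.sqrt(sum((s - ideal) ** 2 for s in sizes) / k)

perfect = cluster_size_deviation([20, 20, 20, 20, 20], 100)   # 0.0
uneven = cluster_size_deviation([10, 30, 20, 25, 15], 100)    # sqrt(50), about 7.07
```

For the uneven case the squared deviations from 20 are 100, 100, 0, 25, and 25, whose mean is 50.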
For this very purpose, as explained above, the proposed fitness function is fitted into the scheme of DEBCRP, and the performance of the modified scheme is compared with that of DEBCRP with respect to the formation of clusters. This is achieved by recording the cluster sizes in both cases until the first node dies. Afterward, the standard deviations of the average cluster size are measured in both cases: with DEBCRP's own fitness function ($\sigma_{D-Fitness}$) and with the MLBCT-fitness function ($\sigma_{M-Fitness}$).

the application of the DEBCRP-fitness function for all the node deployments under both the specified requirements of 5 clusters and 10 clusters. This also justifies the efficacy of the scheme. A second statistical measure, the confidence interval, gives the range within which the node count of a cluster is expected to lie at a given confidence level. In this case, the confidence intervals with confidence levels of 95% and 99% are measured for both clustering scenarios with variable node counts. Table 3 shows the efficacy of the MLBCT-fitness function over the fitness function used in DEBCRP in every tested network configuration. For example, when 100 nodes are deployed to be distributed among 5 clusters, ideally, each cluster should have 20 nodes. Here, the proposed fitness function ensures that each cluster has a node count in the range [18.8245, 21.1755] with 95% confidence and in the range [18.4526, 21.5474] with 99% confidence, whereas the fitness function of DEBCRP yields the ranges [15.2210, 24.7790] and [13.7093, 26.2907] with 95% and 99% confidence, respectively. The node count in each cluster is thus much closer to the ideal node count (20 here) with the MLBCT-fitness function than with the DEBCRP-fitness function. The consistency of the MLBCT-fitness function in forming balanced clusters can be seen in Table 3.
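A confidence interval of this kind can be computed as in the following Python sketch. It assumes a normal approximation, that is, mean ± z × standard error; the text does not state how the intervals of Table 3 were obtained. The standard error of 0.6 is a hypothetical value chosen for illustration, and under that assumption the resulting intervals come out close to those quoted above for the MLBCT-fitness function.

```python
from statistics import NormalDist

def confidence_interval(mean, std_error, level):
    """Two-sided normal-approximation interval: mean +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%, 2.576 for 99%
    half_width = z * std_error
    return (mean - half_width, mean + half_width)

ci95 = confidence_interval(20.0, 0.6, 0.95)  # roughly (18.82, 21.18)
ci99 = confidence_interval(20.0, 0.6, 0.99)  # roughly (18.45, 21.55)
```

A narrower interval around the ideal count of 20 indicates that cluster sizes fluctuate less from round to round.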

Experimental Analysis.
In this second set of experiments, as stated in point 2 of Section 5.1, MLBCT is compared to DEBCRP, ImDE-LEACH, and LEACH with respect to three metrics (network lifetime, network energy consumption rate, and average residual energy per network node) under two different network configurations, WSN#1 and WSN#2. In WSN#1, the sink is placed at the center of the 100 × 100 m² sensing field, at (50 m, 50 m), whereas in WSN#2, the sink is located outside the sensing field at (50 m, 150 m). Moreover, to validate the adaptability of the scheme, simulations have been conducted with variable node deployments of 50, 100, 150, and 200 nodes.

(1) Network Lifetime. As mentioned earlier in this section, the network lifetime can be defined as the time when the first node dies in the network or the time when the last node dies in the network. In Figures 7 and 8, both strategies have been followed separately. On the other hand, if the network lifetime is taken as the time when the last node dies, that is, LND (last node death), Figures 7(a) and 7(b) describe the outcomes of the experiments conducted in this regard with the variable number of nodes as above, namely 50, 100, 150, and 200.
In WSN#1 (Figure 8

Moreover, the comparative performance of the schemes MLBCT, DEBCRP, ImDE-LEACH, and LEACH with respect to the nodes' death rate can also be observed from Figure 9. Figure 9(a) describes the performance of MLBCT against that of DEBCRP, ImDE-LEACH, and LEACH for a variable node population under the first network scenario, WSN#1. Similarly, Figure 9(b) describes the same for WSN#2. It is evident from Figure 9 that, irrespective of the network configuration and the nodes' population in the sensing field, MLBCT performs consistently well: the nodes' death rate is low in MLBCT, and hence, the number of alive nodes is higher at any point of network operation in MLBCT than in DEBCRP, ImDE-LEACH, and LEACH. Thus, it can be concluded that MLBCT outperforms DEBCRP, ImDE-LEACH, and LEACH in terms of the first performance criterion, network lifetime.
(2) Network Energy Consumption. From Figure 10, it can be concluded that at any point of the network operation, the energy consumption in MLBCT is less than that in DEBCRP, ImDE-LEACH, and LEACH in both of the implemented scenarios, that is, in WSN#1 (Figure 10(a)) and WSN#2 (Figure 10(b)). Moreover, to demonstrate the consistency in the performance, variable counts of sensor nodes have been deployed here too.
(3) Average Residual Energy/Node. In this next set of experiments, the performance of MLBCT is measured against the schemes DEBCRP, ImDE-LEACH, and LEACH in terms of the average residual energy that a network node has at any point in the network operation. It can be observed that the nodes always hold a larger amount of residual energy when operated with MLBCT than with DEBCRP, ImDE-LEACH, and LEACH (Figure 11). This holds not only in WSN#1 (Figure 11(a)) but also in WSN#2 (Figure 11(b)): the average residual energy of a network node is higher at any point in network operation when MLBCT is used. This shows that a network utilizing MLBCT saves energy and keeps its resources intact for future usage, which is a desired criterion for sensor networks.
(4) Data Packet Delivery at Base Station. In the final set of experiments, the performance of MLBCT is compared against that of DEBCRP, ImDE-LEACH, and LEACH with respect to the number of data packets delivered to the base station. The predominance of the proposed scheme, MLBCT, can be read for both network scenarios, WSN#1 and WSN#2, in Figures 12(a) and 12(b), respectively. For 50, 100, 150, and 200 nodes, MLBCT delivers 915, 969, 1221, and 1054 data packets to the base station, respectively. In contrast, DEBCRP results in 800, 755, 685, and 700 data packets, ImDE-LEACH results in 550, 650, 650, and 645 data packets, and LEACH results in 416, 477, 533, and …

Based on the outcomes of the various simulations conducted so far, it can be concluded that MLBCT outperforms DEBCRP, ImDE-LEACH, and LEACH in terms of the chosen criteria of network lifetime, network stability, average residual energy, and data packet delivery.

Conclusion and Future Works
In this work, a Metaheuristic Load-Balancing-Based Clustering Technique (MLBCT) has been proposed for wireless sensor networks. To achieve the prime objective of load-balanced clusters, a fitness function has been proposed that yields clusters balanced in terms of their size and energy and keeps the members in close proximity to one another, reducing the cost of intracluster communication. Through an extensive set of simulations and experiments, the supremacy of the proposed scheme, MLBCT, has been demonstrated over the existing schemes DEBCRP and ImDE-LEACH in terms of improved network lifetime and network stability, average residual energy, and data packet delivery.
Statistical analysis also justifies and supports the feasibility of the scheme. Moreover, the scheme's adaptability and scalability have also been established by varying the network configuration with the different number of nodes and different placement of the base station.
As a future extension of this work, a heterogeneous wireless sensor network (HWSN) would be investigated to devise a clustering-based scheme, driven by metaheuristic techniques, that consistently contributes to the network operations without being affected by the heterogeneity present in the network.

Data Availability
Extensive analysis, method, and result data has been fully provided.

Conflicts of Interest
The authors declare that they have no competing interests.