An Energy Balancing Strategy Based on Hilbert Curve and Genetic Algorithm for Wireless Sensor Networks

1 Innovative Information Industry Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
2 Fujian Provincial Key Lab of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
3 Swinburne University of Technology, Melbourne, VIC, Australia
4 Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, Ostrava, Czech Republic


Introduction
Wireless sensor networks (WSNs) are spatially distributed autonomous sensors used to monitor physical or environmental conditions, such as pressure, sound, and temperature. WSNs are composed of common sensor nodes and sink nodes [1,2]; the common sensor nodes cooperatively pass their data through the network to a sink node. The development of wireless sensor networks was originally motivated by military applications such as remote sensing or data collection in dangerous or remote environments [3]. Today, these networks are used in many industrial and consumer applications and have become part of daily life. WSNs are built of a few to several hundred or even thousands of nodes, where each node can connect with one or more sensors. Each sensor node is equipped with several parts, namely, a transceiver, a sensing device, and an energy source. These sensor nodes differ in size and cost, which results in corresponding constraints on resources such as energy, memory, and computational speed [4-7]. Their energy source is usually a battery, which is undesirable or infeasible to replace or recharge [8-10]. Therefore, network lifespan becomes a vital concern in the construction of a WSN [11].
However, unbalanced energy consumption between inner nodes (the nodes close to the sink node) and outer nodes (the nodes far away from the sink node) always occurs and is uncontrolled in two-tier network structures. Sink nodes, the only nodes that control and operate as processing centers, collect all the valuable packets from the sensor nodes via a predefined routing path. The inner nodes not only transfer their own sensed data but also pass on data from outer nodes. Thus, inner nodes consume more energy than outer nodes. The more energy a node uses, the earlier it depletes its battery. The worst case arises when the depleted node is the only communication link between outer nodes and the sink node. In this network structure, if even a few inner nodes die, many outer nodes will be affected. In this situation, several service sites, which have part of the functions of a sink node, become necessary, and the sensor nodes then send their data to the nearest service site instead of the sink node. This also decreases the workload on inner nodes and extends the lifespan of the overall network. This paper focuses on developing a method to determine the optimal number of service sites for a given network. The cost of deploying and constructing a service site is much greater than that of a common sensor; the network should therefore contain the minimum number of service sites necessary to satisfy the full coverage demand.
Given n nodes with specified distances, k centers must be constructed for groups of nodes in such a way as to minimize the maximum distance between nodes and their centers. This is the k-center problem. The goal of this paper is to minimize the number of service sites in a wireless sensor network, thus reducing the construction cost that service sites add to a three-tier network. More importantly, this three-tier network must satisfy the full coverage requirement. The number of service sites is considered to be k in the k-center problem. However, k is not known in advance. One of the most popular methods for resolving the k-center problem is the farthest first method [12]; although this method guarantees a 2-approximation solution, it is not perfect. This paper proposes a new scheme, HHSG, to solve the service site problem. The name "HHSG" is an integrated abbreviation of "Huffman coding," "Hilbert curve," "Sudoku puzzle," and "genetic algorithm," because the concepts behind these four classical terms are utilized in the proposed scheme. Furthermore, several other methods are simulated and applied to wireless sensor networks.
The remainder of this paper is structured as follows: Section 2 reviews background work on Hilbert curves, the k-center problem, and wireless sensor networks and also describes related work on basic genetic algorithms, Sudoku, and Huffman codes. Section 3 describes the HHSG process in detail. Experimental results and an analysis against other methods are given in Section 4. Conclusions are offered in Section 5.

Related Works
Wireless sensor networks have been widely used in a vast variety of fields. Driven by advances in microelectromechanical systems technology and low-cost networking, there has been rapid development and use of wireless sensor networks in recent years [13,14]. These sensor networks carry the promise of significantly improving and expanding the quality of care across a wide range of applications, which include air pollution monitoring, medicine and public health, and natural disaster prevention. Although a general two-tier network is considered to be a flat network and has a very simple structure, it has an inherent disadvantage in terms of balancing the workload of its sensor nodes. When inner nodes deplete their batteries, they die and disconnect from their outer nodes, interrupting the routing path from the outer nodes to a sink node. As a result, many nodes that still have sufficient energy to function will be removed from the network, and their information will no longer be forwarded to a sink node. Alternatively, a hierarchical network is a network in which all sensor nodes are clustered through some specific technique according to given protocols [15,16]. Hierarchical networks facilitate equalized power consumption.

Genetic Algorithms.
Genetic algorithms are a family of computational models inspired by natural evolution [17-20]. In a genetic algorithm, a population of candidate solutions to a problem is evolved toward better solutions. Each candidate solution, which is expressed as a binary string of 0s and 1s, is a chromosome with a set of attributes which can be mutated and modified. The basic genetic algorithm usually starts by generating several random chromosome solutions, then evaluates each chromosome and stores the ones with better fitness values. The algorithm approaches an optimal solution by randomly mutating and altering a predefined number of genes to generate a new solution, which is used in the next iteration. Commonly, the algorithm stops when it reaches a predefined number of iterations or a time limit or when a satisfactory solution is found. Genetic algorithms are widely used in many applications and are also combined with other methods to generate new optimal solutions [21,22].

Space-Filling Curve.
A space-filling curve is a single one-dimensional curve that tours an entire space of two or more dimensions and recursively fills up all points as the number of iterations approaches infinity [23,24]. Because Giuseppe Peano (1858-1932) was the first to discover one of the filling curve constructions, space-filling curves in two-dimensional planes are sometimes called Peano curves. Some of the most celebrated are the Hilbert curve and the Sierpiński curve [23]. Space-filling curves are used in many fields. In 2014, Yan and Mostofi [24] scheduled a data collection path for mobile robots using space-filling curves; their goal was to minimize the total energy consumption, including the communication cost between the robot and sensors and the motion cost of the robot. In [25], the problem of how mobile sinks should move is addressed. A good strategy for the moving trajectory of mobile sinks can reduce data loss and delivery delay, increase network lifetime, and enable better handling of sparse networks. A dynamic Hilbert curve is used to design a trajectory for a mobile sink while achieving efficient network coverage. The dynamic curve order varies with node densities in a network. Simulation results show the effectiveness of network coverage and scalability.
For Hilbert curves, if there is a point within the unit square with coordinates (x, y), d is the distance along the curve from the start until it reaches that point. Points on the curve that have nearby d values will also have nearby (x, y) coordinates. The basic level one (also called first order) Hilbert trajectory is a 2 × 2 grid. The method of recursively constructing a Hilbert filling curve is as follows: the network field is divided into four small grid cells, and the one-level Hilbert curve is the line passing through the centers of those four grid cells in a specific order. To derive a two-level curve, each small grid cell is replaced with a one-level curve, which may be appropriately rotated and reflected. In general, an m-level curve is derived from an (m − 1)-level curve. Intuitively, the higher the level of the curve is, the more accurate its localization precision will be. However, this means that more space is needed for recording the positions, at greater cost. Figure 1 shows one-, two-, and three-order Hilbert curves. There are 4 points in a one-level Hilbert curve, 16 points in a two-level Hilbert curve, and 64 points in a three-level Hilbert curve. The relationship between the level of a Hilbert curve and its number of points is $N = 4^{m}$, where $m$ is the Hilbert level and $N$ is the number of points.
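The mapping from a distance d along the curve to grid coordinates can be sketched with the widely used iterative Hilbert index conversion. The function names `d2xy` and `hilbert_points` and the orientation (the curve starts at cell (0, 0)) are assumptions of this sketch; Figure 1 may orient the curve differently.

```python
def d2xy(side, d):
    """Map distance d along a Hilbert curve to a cell (x, y) of a side x side grid.

    side must be a power of two (side = 2**level).
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or reflect the sub-curve so consecutive pieces connect.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_points(level):
    """Return the 4**level cells visited by a level-`level` Hilbert curve, in order."""
    side = 2 ** level
    return [d2xy(side, d) for d in range(side * side)]


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, len(hilbert_points(level)))  # number of points is 4**level
```

Consecutive distances always land in adjacent cells, which is the locality property used later in the scheme. The converse does not hold: in the one-level curve of this sketch, cells (0, 0) and (1, 0) are neighbors in the plane but sit at d = 0 and d = 3.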

Sudoku.
Sudoku is a logic-based number-placement puzzle which consists of an n × n grid of cells partitioned into n blocks of n cells each. The numeric values 1 to n appear exactly once in each row and column of the grid and in each block [26,27]. Given a 9 × 9 grid, the goal is to fill this grid with the digits one to nine only. The rule is that each row, each column, and each of the nine 3 × 3 subgrids which compose the big grid should contain all of the digits from one to nine. Although the 9 × 9 grid is by far the most commonly used, many other variations exist. The grid could be 4 × 4 with 2 × 2 regions or 16 × 16 with 4 × 4 regions.

The k-Center Problem.
One of the well-known fundamental facility location problems [28] is the k-center problem, which is known to be NP-hard [29]. The basic k-center problem starts from a given graph with n vertices, in which k facilities must be placed so as to minimize the maximum distance from any vertex to the facility to which it is assigned. Several algorithms that achieve a 2-approximation have been proposed for it. An algorithm is called a ρ-approximation algorithm if it always outputs, in polynomial time, a value that is no more than ρ times the optimal for a minimization problem. Because the k-center problem is widely used, several of its variants have also been extensively explored. For example, special constraints on the positions of the centers have been added to the problem. In 2015, Du et al. [30] explored an incremental version in which all the centers must lie on the boundary of a convex polygon. In the same year, Liang et al. [31] addressed the constraint of vertices with internal connectedness, where it is guaranteed that any two nodes in one set are linked by an internal path. This variant of the classic k-center problem is called the connected k-center (CkC) problem. In [32], the authors presented a solution for the k-center problem and studied its generalizations. They also noted that the dominating set problem is another specific form of the k-center problem. Chechik and Peleg [33] studied another constrained version, the capacitated k-center problem, examined fault tolerance under the simultaneous failure of one or more centers, and proposed methods to address the problem.
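As an illustration of a 2-approximation for the k-center problem, the farthest first traversal mentioned in the Introduction can be sketched as follows. This is a minimal sketch for points in the Euclidean plane; the function name `farthest_first`, the choice of the first center, and the sample coordinates are assumptions made for illustration only.

```python
import math


def farthest_first(points, k, start=0):
    """Greedy farthest first traversal for the k-center problem.

    Returns the indices of the chosen centers and the resulting radius,
    i.e. the largest distance from any point to its nearest center.
    """
    centers = [start]
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(p, points[start]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return centers, max(dist)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
    print(farthest_first(pts, k=2))
```

Each new center is the point currently farthest from all chosen centers, so the number of distance computations grows with the product of n and k.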

HHSG Scheme Implementation
The flowchart of the HHSG scheme is shown in Figure 2. The process of this scheme focuses on selecting k service sites out of n sensor nodes. Every sensor node is assigned two kinds of serial number. One is the node number (NID), which is unique and ranges from 1 to n. The other is a Huffman code. More than one node can have the same Huffman code.
The process runs as follows. First, encode the sensor nodes using Huffman codes and define the level of the Hilbert filling curve used. Second, pick the appropriate size and order of Sudoku according to the communication radius and network scale, randomly select a digit from the Sudoku grid, and record the positions of that digit and encode them as Huffman codes. Third, mark the nodes whose Huffman codes are the same as or similar to those of the recorded positions, and initialize a chromosome using the marked nodes. Fourth, repeat steps 2 to 3 until chromosome initialization is complete. Fifth, find the best solution by executing the mutation or crossover operation, and output the outcome.

Encoding and Defining Curve Order.
The Huffman code used here is a prefix-free code: the bit string representing one point is never a prefix of the bit string representing any other point. As shown in Figure 3, each position at a distance d, or a multiple of d, along the curve from the red starting point (Figure 3(b)) corresponds to a Huffman code, and this code represents one or more real sensor nodes in the wireless sensor network. Using the locality property of the curve, a six-bit string suffices to represent a three-level Hilbert curve. The encoding process is given below.
Divide the field into four small grid cells, and assign a two-bit binary number to each; the order runs from upper left to lower left and then lower right to upper right, with values of 00, 01, 10, and 11, respectively. This coding order also strictly applies to the inner subgrid cells. As shown in Figure 3(a), binary 10 is in red on the lower right, and all the sensor nodes located in this quarter of the area are prefixed with the two-bit code 10. Then, this quarter area is also divided into four small grid cells, and the binary number order is exactly the same as that of the bigger area. Binary 10 is in blue and covers 1/16th of the total area. Any node in this area has 10 in the second part of its Huffman code. As shown in Figure 3(b), the codes run from 0 at the red starting point to 47 at the blue point. The binary string for 0 at the red starting point is "00 00 00." The encoding process for the blue point is described here in detail. The first 2-bit binary string is 10, as the point appears in the lower-right quadrant. The second 2-bit binary string corresponds to the upper-right 1/16th area, which is 11. The point is located in the lower-left 1/64th area, which results in a 01 suffix. Assemble the binary string 10 11 01, which is 47 in decimal numerals. Finally, the Hilbert curve gives every node a Huffman code [34-36]. The Huffman code of a node varies according to the order of the Hilbert curve. The largest code number is 63 for a three-order curve and 1023 for a five-order curve. Assume a 100-node network with a five-order Hilbert curve. Fewer than ten percent of the codes are actually used by sensors, which is a great waste. Similarly, for a 600-node network with a three-order curve, more than ten sensor nodes can have the same Huffman code. Figure 4 shows the configurations of nodes and their Huffman codes for varied orders of the Hilbert curve.
In Figure 4, 100 to 400 nodes are randomly scattered in a 200 × 200 unit area, and the nodes are coded in Huffman code with four- or five-order Hilbert curves.
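The quadrant-by-quadrant encoding described above can be sketched as follows. The sketch applies the stated order (upper left 00, lower left 01, lower right 10, upper right 11) identically at every level, without the rotations of a true Hilbert curve. The function name `encode`, the Cartesian convention (y grows upward, origin at the lower-left corner), and the sample coordinates are assumptions of this sketch.

```python
# (is_right, is_upper) -> two-bit code, following the order stated in the text
QUADRANT_BITS = {(0, 1): "00", (0, 0): "01", (1, 0): "10", (1, 1): "11"}


def encode(x, y, size, level):
    """Return the space-separated bit string of point (x, y) in a size x size field."""
    bits = []
    x0 = y0 = 0.0          # lower-left corner of the current cell
    half = size / 2
    for _ in range(level):
        right = int(x >= x0 + half)
        upper = int(y >= y0 + half)
        bits.append(QUADRANT_BITS[(right, upper)])
        x0 += right * half
        y0 += upper * half
        half /= 2
    return " ".join(bits)


if __name__ == "__main__":
    code = encode(6.5, 2.5, size=8, level=3)   # lower right, then upper right, then lower left
    print(code, int(code.replace(" ", ""), 2))
    print(encode(0.5, 7.5, size=8, level=3))   # upper-left corner cell
```

Under this strict, non-rotated order, the string 10 11 01 reads as 45 in plain binary, whereas 47 corresponds to 10 11 11; the labels in Figure 3(b) presumably follow the positions along the actual Hilbert curve.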
A filling curve has a locality property, which means that any two points that are close in one-dimensional space are mapped to two points that are close in the original two-dimensional (or higher-dimensional) space; the converse is not always true. There are points whose coordinates are close but whose distances along the curve are far apart, which means that two close nodes may not be close on the curve. Thus, if one selects nodes from this curve directly to initialize chromosomes, there may be neighboring nodes in one chromosome. This is why the Sudoku is needed.

Sudoku Size and Order.
This section demonstrates how the size and order of the Sudoku are chosen. The size of a Sudoku is the number of grid cells in one row or one column. The order of a Sudoku gives the level value of a block constructed from several Sudoku of the same size. A single- or multiple-order Sudoku that can sketch the network appropriately is desired, whereby each grid cell covers the right number of sensor nodes. It is not appropriate to use a single-order 9 × 9 grid Sudoku in a 600-node, 200 × 200 unit network with a ten-unit sensing radius. If the sensor nodes are deployed randomly and uniformly, the practical number of service sites used will be more than 100. However, only nine positions can be chosen at one time to generate the service site candidates. So before choosing the size and order of the Sudoku, the number of service sites must be calculated. A one-order and a two-order 9 × 9 grid Sudoku resolution are shown in Figure 5.
Each digit randomly chosen from the Sudoku is labeled as a target digit. Those nodes near a position of the target digit are potential sites. One target digit in a 9 × 9 grid Sudoku generates a solution set with nine potential sites. In the experiment, more than one target digit is usually used, and a two-order 9 × 9 grid Sudoku generates 36 potential sites. In the same way, one target digit generates 64 potential sites in a two-order 16 × 16 grid Sudoku. Potential sites are used in the genetic algorithm initialization, but not all the chromosomes are generated from potential sites directly.
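The counts above (9, 36, and 64 positions) can be reproduced with a short sketch. It builds one solved Sudoku from a standard shifting pattern and tiles it order × order times; both the pattern and the tiling of identical copies are assumptions made for illustration, as are the names `solved_sudoku` and `target_positions`. The text does not specify which Sudoku solutions are used in the experiment.

```python
def solved_sudoku(base=3):
    """Return one valid solved Sudoku of side base*base built from a shifting pattern."""
    side = base * base
    return [[(base * (r % base) + r // base + c) % side + 1 for c in range(side)]
            for r in range(side)]


def target_positions(grid, digit, order=1):
    """Return the (row, col) cells holding `digit` in an order x order tiling of `grid`."""
    side = len(grid)
    cells = [(r, c) for r in range(side) for c in range(side) if grid[r][c] == digit]
    return [(r + i * side, c + j * side)
            for i in range(order) for j in range(order) for (r, c) in cells]


if __name__ == "__main__":
    grid9 = solved_sudoku(3)
    print(len(target_positions(grid9, digit=5)))             # one-order 9 x 9
    print(len(target_positions(grid9, digit=5, order=2)))    # two-order 9 x 9
    print(len(target_positions(solved_sudoku(4), digit=5, order=2)))  # two-order 16 x 16
```

Because a digit appears exactly once in every row, column, and block, the selected positions are spread across the grid rather than clustered, which is the property the scheme relies on when seeding chromosomes.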

Initialization and Evaluation.
An n-bit binary string with values 0 and 1 represents the structure of a chromosome. The bit order corresponds exactly to the NID order of the sensor nodes, and n is the number of sensor nodes in the network. This n-dimensional string exactly expresses the relationship between service sites and common sensor nodes. In this string, value 1 represents a service site, and value 0 represents a common node. As shown in Figure 6, NID ranges from 1 to n, and some of the value 1 bits come from potential sites. Some extra value 1 bits are also added randomly. Each chromosome obtains a fitness value based on the fitness function. The chromosome with the best fitness value is stored as the best solution found so far.

Fitness Function.
A well-constructed fitness function may substantially increase the chance of finding a solution.
This section presents a new fitness function which includes four parameters. The first is the number of service sites in the field, which is also the number of 1 values in one chromosome. The field is divided into [width/radius × length/radius] cells, and the second parameter is the number of cells in which those service sites are located; here width and length are the size of the field, and radius is the communication range of the sensor nodes. The fitness function is shown as (1). The first distance function in (1) gives the distance between a sensor node and the sink node. The second distance function gives the distance between a sensor node and its closest service site. The two sets used in (1) are the set of sensor nodes (of size n) and the set of service sites, respectively. The three coefficients in (1) are constants.
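One ingredient of the fitness function, the number of cells occupied by service sites, can be sketched as follows. This is not the fitness function itself, whose equation (1) is not reproduced here. The function names, the use of the ceiling for the bracket notation, and the sample coordinates are assumptions of this sketch.

```python
import math


def total_cells(width, length, radius):
    """Number of radius-sized cells in the field (bracket read as a ceiling here)."""
    return math.ceil(width / radius) * math.ceil(length / radius)


def occupied_cells(sites, radius):
    """Number of distinct cells that contain at least one service site."""
    return len({(int(x // radius), int(y // radius)) for x, y in sites})


if __name__ == "__main__":
    sites = [(5, 5), (7, 8), (25, 5)]
    print(total_cells(200, 200, 20), occupied_cells(sites, 20))
```

Two service sites that fall in the same cell are counted once, so the count rewards chromosomes whose service sites are spread across the field.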

Crossover and Mutation Process.
Let the mother and the father be two parent chromosomes, each an n-bit string; the crossover operation combines them into a child chromosome. The mutation process works by inverting a bit value in the chromosome with a small probability. Here, the mutation rate is set to a constant 0.02. The crossover and mutation processes are shown in Figures 7 and 8, respectively.
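The two operators can be sketched on bit lists as follows. The exact crossover rule of the scheme is not recoverable from the text, so single-point crossover is used here purely as an assumption; the bit-flip mutation with rate 0.02 follows the description above. The names `crossover` and `mutate` are hypothetical.

```python
import random


def crossover(mother, father, rng=random):
    """Single-point crossover (an assumed variant): head of mother, tail of father."""
    point = rng.randrange(1, len(mother))
    return mother[:point] + father[point:]


def mutate(chromosome, rate=0.02, rng=random):
    """Invert each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]


if __name__ == "__main__":
    rng = random.Random(1)
    mother = [1, 0, 0, 1, 0, 0, 1, 0]
    father = [0, 1, 0, 0, 0, 1, 0, 1]
    child = mutate(crossover(mother, father, rng), rate=0.02, rng=rng)
    print(child)
```

A child produced this way may leave some common nodes uncovered, which is why a repair step is needed before evaluation.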

Remedy Process.
As mentioned above, each chromosome represents a candidate solution for service sites versus common nodes in this model. This model should satisfy validity and feasibility demands. In other words, the 1-bit values representing service sites should be able to cover all 0-bit values representing common nodes. If not, this individual must be repaired. The following method is used in this experiment to revise incorrect chromosomes. First, generate the chromosome again if the problem occurs in the initialization step. Second, change uncovered bits to value 1. Third, list the uncovered value 0 bits, change one of them to 1 each time, and remove all other bits dominated by it from the list. Repeat the process until the list is empty.
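The list-based repair described above can be sketched as follows. The sketch assumes that a node is covered when it lies within the communication radius of some service site and that the first node in the list is the one promoted each time; the text only says that one of them is changed. The names `uncovered` and `repair` and the sample coordinates are hypothetical.

```python
import math


def uncovered(chromosome, nodes, radius):
    """Indices of common nodes (bit 0) that no service site (bit 1) covers."""
    sites = [nodes[i] for i, bit in enumerate(chromosome) if bit]
    return [i for i, bit in enumerate(chromosome)
            if not bit and all(math.dist(nodes[i], s) > radius for s in sites)]


def repair(chromosome, nodes, radius):
    """Promote uncovered nodes to service sites until every node is covered."""
    chromosome = list(chromosome)
    pending = uncovered(chromosome, nodes, radius)
    while pending:
        new_site = pending[0]          # assumed choice: first uncovered node
        chromosome[new_site] = 1
        # Drop every listed node that the new service site now dominates.
        pending = [i for i in pending[1:]
                   if math.dist(nodes[i], nodes[new_site]) > radius]
    return chromosome


if __name__ == "__main__":
    nodes = [(0, 0), (1, 0), (10, 0), (11, 0), (30, 0)]
    fixed = repair([1, 0, 0, 0, 0], nodes, radius=2)
    print(fixed, uncovered(fixed, nodes, radius=2))
```

Each pass adds exactly one service site and shrinks the list, so the loop terminates after at most as many passes as there were uncovered nodes.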

Experiment Results
The simulation environment is a 200 × 200 (unit) area, with 100, 200, 300, and 400 sensor nodes scattered randomly in the network with a communication radius of 40, 30, 25, and 20 units, respectively. Six methods are implemented: FF, HL, HD, DO, GA, and the HHSG scheme, where FF is the farthest first traversal, HL and HD are the Harel and Koren [37] methods, DO is a heuristic algorithm solving the minimum dominating set problem [38], GA is the original genetic algorithm itself, and HHSG is the proposed scheme. Table 1 shows the final values for the parameters used in this experiment.
DCT records the number of distance computations performed during a method's operation. The distance could be node to node or node to service site. Table 2 lists the DCT values for FF, HL, and HD, the three nonevolutionary algorithms. The values are computed by a simple equation, and the practical values may be smaller due to pruning strategies used in the methods. As shown in Table 2, HL has the lowest DCT value. Here, the numbers of times for HL, FF, and HD are set as three baselines, one per method, to be used later.

Influence of the Hilbert and Sudoku.
In the experiment, HHSG was executed on a 200-node network with different Hilbert curve and Sudoku parameters. In Table 3, the consecutive numbers show the Hilbert curve order, Sudoku size, and Sudoku order parameters used in the experiment. For example, 5-16-1 means that the test operates on a five-order Hilbert curve with a one-order 16 × 16 grid Sudoku. Column one lists the DCT level. HHSG may produce different results with different parameters. With a five-order Hilbert curve, the results are clearly better than in the other two cases.

Comparison of GA and HHSG.
GA and HHSG belong to a larger class of evolutionary algorithms. They may generate high-quality solutions given enough iterations for optimization. To compare their rates of evolution further, the following tests were made. In Table 4, GA and HHSG were run in 100-node to 400-node networks with the same number of iterations. In the 100-node network, HHSG used only 13 service sites to cover the field, two fewer than GA, and its superiority is obvious in a 400-node network.
To test the stability of the HHSG and GA methods, the two methods were run sixteen times in the same situation, differing only in the random numbers used. This study lists the standard deviation (SD), best value (best), average (AG), and worst value (worst) for comparison in Table 5. The standard deviation values stay below one for the HHSG method, whereas the GA method reaches seven in a 400-node network, which illustrates the stability of the HHSG scheme. From the table, it can be seen that the worst result of HHSG is still better than that of the GA method in 200-node, 300-node, and 400-node networks.

Overall Evaluation.
Tables 6 and 7 list results for the number of service sites and fitness values obtained by the six different methods. In the experiment, the larger the fitness value the better, and the lower the number of service sites the better. Although the number of service sites obtained by HL is equal to that of HHSG, the HL fitness value in a 400-node network is lower. The HHSG scheme is better than HL in the other network sizes overall. The other methods yield worse results overall than HHSG in both number of service sites and fitness values. In Figure 9, HHSG1 plots the fitness value evolution process for a 100-node network using the HHSG scheme, HHSG represents the same for a 200-node network, and the processes for the other schemes are labeled accordingly. The superiority of HHSG is clear. The network plots show the simulated network after service sites are added. Sensor nodes are randomly scattered in the field, with the sink node deployed in the center of the area. The sink node is shown in the 100-node and 200-node networks, but it is too complex to plot all the lines between service sites and the sink node in the 300-node and 400-node networks.

Conclusion
Energy load balancing is critical to extending the lifespan of wireless sensor networks, in addition to ensuring continued functionality and avoiding communication interruptions caused by dead nodes. Wireless sensor networks usually consist of hundreds or even thousands of sensor nodes scattered randomly in adverse, remote, or dangerous environments, with only nonrechargeable, nonreplaceable batteries to power each node. Thus, energy conservation for individual nodes is important, but equally important is the energy efficiency of the overall network. Traditional two-tier networks with one sink are vulnerable to energy holes, which cut off many nodes from the sink when one inner node dies. This paper therefore proposes a solution, HHSG, to minimize the construction cost of a three-tier network and take full advantage of node energy. A Hilbert curve is scheduled for different sized networks, Huffman codes are assigned to nodes, and chromosomes for a genetic algorithm are initialized using a Sudoku puzzle. Furthermore, five other methods are tested in the experiment for performance comparison with the proposed method. The experiment lists the relationship between Hilbert order and sensor nodes' Huffman codes and the convergence of results under varied Hilbert and Sudoku orders. It also compares the service site performance of the other five methods with that of the HHSG algorithm. Importantly, this paper lists the standard deviation (SD), best value (best), average (AG), and worst value (worst) of each method in order to compare the stability and benefits of each method. The standard deviation values stay below one for the HHSG method, which illustrates the stability of the algorithm. Simulation results show the superior performance of the proposed method, which builds a stable three-tier network using fewer service sites than other methods. In terms of the costs of the proposed scheme, because HHSG is a centralized algorithm, one cost is the communication overhead of collecting global information before executing the algorithm; however, this cost is common to all centralized algorithms for this problem. Other possible costs are computation time and memory space. However, centralized algorithms are usually executed on a resource-rich machine, so computing power and memory space are not the most important considerations when solving the problem with a centralized algorithm.

Parameters.
The order and size of the Hilbert curve and Sudoku play an important role in generating potential sites. Therefore, different parameters used in the experiment produce diverse results.

Table 1: The values for the parameters.

Table 2: DCT values (number of times for distance computation) with different methods.

Table 3: The effects of Hilbert order, Sudoku size, and Sudoku order to HHSG.

Table 4: Comparison of DCT and number of service sites.

Table 6: The final results (value of fitness function) in six methods.

Table 7: The final results (number of times for distance computation) in six methods.