A Distributed Unequal Clustering Routing Protocol Based on the Improved Sine Cosine Algorithm for WSN

To balance overall energy consumption and improve the energy efficiency of wireless sensor networks (WSNs), a distributed energy-balanced unequal clustering routing protocol based on the improved sine cosine algorithm (DUCISCA) is proposed. Firstly, DUCISCA adopts a time-based cluster head competition algorithm. In this algorithm, the broadcast waiting time depends on the residual energy of the candidate cluster head (CCH), its distance to the base station (BS), and its number of neighbour nodes, which effectively reduces the overhead of nodes. Secondly, a competition radius that considers the distance from the node to the BS and the residual energy of the node is proposed. It balances the energy consumption of nodes in different locations to avoid the "hot spot" problem. Thirdly, the energy of the cluster head (CH), the number of neighbour nodes, and the distance from the ordinary node to the CHs are taken into account to obtain a better clustering result. Finally, to speed up convergence and improve the ability to escape local optima, an improved sine cosine algorithm (ISCA) based on Latin hypercube sampling (LHS) and adaptive mutation is proposed. ISCA adopts three improvement strategies: the diversity of the population is enhanced through LHS population initialization; an adaptive weight strategy is introduced to accelerate convergence; and the population is disturbed by Gaussian mutation or Levy flight to escape local optima. The standard deviation of the cluster heads' residual energy in intercluster communication is taken as the objective function to search for an energy-balanced intercluster data forwarding path based on ISCA. Compared with EEUC, DEBUC, I-EEUC, and M-DEBUC, the simulation results show that DUCISCA can effectively balance the overall network energy consumption and prolong the network lifetime.

mostly adopt multihop routing in order to save energy. In this situation, the CHs near the BS undertake more data forwarding tasks, so they tend to run out of energy earlier than other nodes. This leads to coverage holes and data loss, which is known as the "hot spot" problem.
To address the problems mentioned above, a distributed unequal clustering routing protocol based on the improved sine cosine algorithm (DUCISCA) is proposed in this paper. It self-organizes the network through unequal clustering and intercluster multihop routing. DUCISCA is a distributed competition algorithm in which CHs are selected by competition. The competition range of a node varies with its distance from the BS and its residual energy, which effectively balances the energy consumption of nodes in different locations. An improved sine-cosine algorithm (ISCA) based on Latin hypercube sampling and adaptive mutation is proposed and applied to intercluster multihop routing to obtain a near-optimal multihop path and thereby balance the energy consumption of the network.
Our contributions are summarized as follows:
(1) We adopt a time-based broadcast technique to select CHs. The waiting time depends on the residual energy of the CCHs, the distance to the BS, and the number of neighbour nodes, which effectively reduces the overhead of nodes.
(2) We present a cluster formation scheme that considers energy, distance, and the number of neighbour nodes to obtain a better clustering result.
(3) We propose an improved sine-cosine algorithm (ISCA) based on Latin hypercube sampling and adaptive mutation, which achieves good results on 6 standard test functions. We apply ISCA to intercluster multihop routing to obtain a near-optimal multihop path and thereby balance the energy consumption of the network.
The rest of this paper is organized as follows. Some well-known routing protocols proposed for WSNs are summarized in "Related Works". "System Model" describes the network model and energy model used in this work. "DUCISCA Protocol" describes the proposed unequal clustering routing protocol DUCISCA in detail. "Simulation Experiments and Analysis" presents the simulations and performance evaluation of DUCISCA. Finally, "Conclusion" summarizes this paper.

Related Works
Earlier clustering routing protocols usually adopted uniform clustering [4], which divides the whole network into clusters of equal size, so that the number of members and the cluster radius are approximately equal in each cluster. Heinzelman et al. [5] proposed the low-energy adaptive clustering hierarchy (LEACH), a clustering-based protocol that uses randomized rotation of CHs to distribute the energy load evenly among the sensors in the network. Younis and Fahmy [6] presented a hybrid energy-efficient distributed clustering approach (HEED) that periodically selects CHs according to a hybrid of the node residual energy and a secondary parameter (node proximity to its neighbours or node degree). Thakkar and Kotecha [7] proposed a coverage routing protocol based on LEACH (CVLEACH) that distributes CHs uniformly by creating nonoverlapping cluster regions using the overhearing properties of the sensor nodes, which prolongs the lifetime of the WSN. In uniform clustering routing protocols, CH selection is an important challenge that directly affects network performance [8]. A protocol for analyzing the energy-delay trade-off (EDIT) [9] was proposed, which can better balance nodes' energy consumption and reduce transmission delay. Thakkar and Kotecha [10] presented a new CH election protocol inspired by Bollinger Bands; their simulation results show a significant improvement in network lifetime in comparison with other decentralized and ant-based algorithms. Daniel et al. [11] developed the tunicate swarm butterfly optimization algorithm (TSBOA) for selecting CHs to accomplish effective data transmission between the sensor nodes. Although these protocols can balance energy consumption, the overall energy consumption of the network is still very high because of the single-hop routing method.
Therefore, to reduce the energy consumption of CHs, researchers have proposed many multihop routing protocols, such as EM-LEACH [12], IEE-LEACH [13], and DMEERP [14]. Multihop routing can reduce the energy consumption of CHs far away from the BS, but more data forwarding tasks fall on CHs closer to the BS, resulting in excessive energy consumption and premature failure. This causes the "hot spot" problem and shortens the network lifetime.
From the above analysis, it can be seen that the energy consumption among clusters is unbalanced whether the uniform clustering protocols use single-hop or multihop routing. Many researchers have therefore proposed unequal clustering routing protocols. In unequal clustering, the cluster size varies in proportion to the distance to the BS. Energy conservation and eliminating the hot spot problem are the most common objectives of unequal clustering [15]. UCS [16] was the first to propose the idea of unequal clustering to balance the energy consumption of CHs. EEUC [17] partitions the nodes into clusters of unequal size, and clusters closer to the BS have smaller sizes than those farther away from the BS. For intercluster communication, EEUC adopts an energy-aware multihop routing scheme that considers the residual energy of CHs to balance the energy consumption of the network. I-EEUC [18] improves the CH selection algorithm of EEUC. Its CH election adopts a timed broadcast mechanism to select CHs reasonably, which prolongs the network lifetime compared with EEUC. Jiang et al. [19] proposed a distributed energy-balanced unequal clustering routing protocol (DEBUC). DEBUC adopts a time-based competitive algorithm to select CHs. For intercluster routing, it takes into account the residual energy of nodes and the cost of intracluster and intercluster communication.
The simulation results show that it has better performance than LEACH, EEUC, and I-EEUC. Based on DEBUC, M-DEBUC [20] improves the candidate CH election method, competition radius, and waiting time. The simulation results show that it has better performance than DEBUC.

System Model
3.1. Network Model. Let us consider a sensor network consisting of N sensor nodes uniformly deployed over a two-dimensional (2D) square area. We denote the i-th sensor by $S_i$ and the corresponding sensor node set by $S = \{S_1, S_2, \dots, S_N\}$, where $|S| = N$. We make the following assumptions about the sensor network model [21]:
(1) There is a base station with unlimited energy located far away from the square sensing field. Sensors and the base station are all stationary after deployment.
(2) All nodes are homogeneous and have the same capabilities. Each node is assigned a unique identifier (ID).
(3) All nodes have similar capabilities and equal status and can act as cluster head nodes or ordinary nodes.
(4) Nodes can use power control to vary the transmission power, which depends on the distance to the receiver.
(5) Data fusion technology is adopted to reduce the amount of data transmitted.

Energy Model.
The radio model in this work is the same as that described in [14]. With a reasonable signal-to-noise ratio guaranteed, the energy consumed by a node to send data is shown in (1).

$$E_{Tx}(n, d) = \begin{cases} n E_{elec} + n \varepsilon_{fs} d^{2}, & d < d_0 \\ n E_{elec} + n \varepsilon_{mp} d^{4}, & d \ge d_0 \end{cases} \qquad (1)$$

where n is the number of bits transmitted, d is the transmission distance, $E_{elec}$ is the energy consumption for sending or receiving 1 bit of data, $\varepsilon_{fs}$ is the amplifier energy coefficient in the free-space model, $\varepsilon_{mp}$ is the amplifier energy coefficient in the multipath fading model, and $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$ is the distance threshold.
The energy consumed by a node to receive n bits of data is

$$E_{Rx}(n) = n E_{elec} \qquad (2)$$

Because the data differ greatly between clusters, data fusion between clusters is not considered in this simulation. The data fusion model within a cluster assumes that the CH receives n bits of data from each member node and compresses them into n bits of data regardless of the number of nodes in the cluster. The energy consumed by the CH to fuse n bits of data is

$$E_{F}(n) = n E_{DA} \qquad (3)$$

where $E_{DA}$ (nJ/bit) is the energy consumption for fusing 1 bit of data. In our simulation, the communication energy model parameters are set to $E_{elec} = 50$ nJ/bit, $\varepsilon_{fs} = 10$ pJ/bit/m$^2$, $\varepsilon_{mp} = 0.0013$ pJ/bit/m$^4$, and $d_0 = 87$ m.
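The radio model above can be illustrated with a short Python sketch. It assumes the standard first-order radio model that the text describes (free-space loss below $d_0$, multipath loss at or above it) and uses the parameter values quoted for the simulation. The function names are ours, and the value of $E_{DA}$ is left as an argument because it is not stated in this section.

```python
import math

# Parameter values quoted in the text, converted to joules.
E_ELEC = 50e-9        # J/bit, electronics energy for sending or receiving 1 bit
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier coefficient
D0 = math.sqrt(EPS_FS / EPS_MP)  # distance threshold, about 87.7 m


def tx_energy(n_bits, d):
    """Energy (J) to transmit n_bits over distance d (m)."""
    if d < D0:
        return n_bits * (E_ELEC + EPS_FS * d ** 2)
    return n_bits * (E_ELEC + EPS_MP * d ** 4)


def rx_energy(n_bits):
    """Energy (J) to receive n_bits."""
    return n_bits * E_ELEC


def fusion_energy(n_bits, e_da):
    """Energy (J) for a CH to fuse n_bits; e_da (J/bit) must be supplied by the caller."""
    return n_bits * e_da
```

For example, `tx_energy(4000, 50.0)` evaluates the free-space branch, since 50 m is below the threshold. The two branches meet at $d = d_0$ by construction of the threshold.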

DUCISCA Protocol
The DUCISCA protocol operates in rounds, and each round includes three stages: cluster formation, multihop routing between clusters, and data transmission. Firstly, clusters of different sizes are formed. Secondly, multihop routing between clusters is established according to the clustering results. Finally, the network enters the stable stage to complete data transmission. Figure 1 is the basic schematic diagram of DUCISCA.

[22]. Therefore, it is not necessary for all nodes to be CCHs. A threshold T is used to control the proportion of CHs; in this paper, T is set to 0.4 according to [17]. Each node $S_i$ calculates the value of μ. If μ < T, node $S_i$ becomes a CCH; otherwise, node $S_i$ becomes an ordinary node and goes into a dormant state until the final cluster head election is completed. The value of μ is calculated by (4).
where $\mu_0$ is a random number uniformly distributed in (0, 1), $RE_i$ denotes the residual energy of node $S_i$, and $E_{avg}$ is the average residual energy of all alive nodes in the network. $E_{avg}$ is obtained as follows: the data packets transmitted by the nodes include their own residual energy $RE_i$, and after the BS receives them, it calculates the average residual energy $E_{avg}$ of all alive nodes. The larger $RE_i$ is, the smaller μ is, and therefore the greater the probability that $S_i$ becomes a CCH.

Calculation of Unequal Competition Radius.
In DEBUC, the calculation of the competition radius only considers the distance from the CCH to the BS and ignores energy. This may allow low-energy nodes to become CHs; after a cluster is formed, its members further increase the energy consumption of such CHs. Therefore, a new competition radius that considers both distance and energy is proposed. The improved competition radius for CCH $v_i$ is calculated in (5).
where $d_{max}$ and $d_{min}$ are the maximum and minimum distances from the alive nodes to the BS, respectively, $d(v_i, BS)$ denotes the distance from $v_i$ to the BS, $RE_{v_i}$ is the residual energy of $v_i$, $w_1$ and $w_2$ are both constants in (0, 1), and $w_1 + w_2 = 1$.

Final Cluster Head (FCH) Election
Definition 1. In DUCISCA, given the maximum communication radius $R^0_{comp}$, the neighbour node set $Nbr_i$ of $S_i$ is computed as (6).
Definition 2. The degree of $S_i$ is calculated as (7).
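Definitions 1 and 2 can be sketched in Python as follows. This is a minimal illustration that assumes the neighbour set of a node contains every other node within Euclidean distance $R^0_{comp}$ (inclusive) and that the degree is the size of that set. The function names and the inclusive comparison are our assumptions, not taken from (6) and (7).

```python
import math


def neighbour_sets(positions, r0_comp):
    """Return, for each node i, the set of indices j != i with d(i, j) <= r0_comp."""
    nbrs = []
    for i, (xi, yi) in enumerate(positions):
        nbrs.append({
            j for j, (xj, yj) in enumerate(positions)
            if j != i and math.hypot(xi - xj, yi - yj) <= r0_comp
        })
    return nbrs


def degrees(positions, r0_comp):
    """Node degree, assumed here to be the number of neighbours."""
    return [len(s) for s in neighbour_sets(positions, r0_comp)]
```

The quadratic loop is adequate for networks of a few hundred nodes; a spatial index would be preferable for much larger deployments.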
CCH $v_i$ broadcasts a competition message with radius $R^0_{comp}$. The message contains the ID of $v_i$, its competition radius $v_i.R_{comp}$, and its residual energy $RE_i$. CCH $v_i$ establishes its neighbour CCH set $NT_i$ based on the election messages it receives.
In the CH competition stage, DEBUC adopts a time-based broadcast mechanism to calculate the waiting time. It only considers the energy of the nodes, which easily causes the "hot spot" problem. Therefore, this paper presents an improved waiting time that takes into account energy, distance, and node degree. The improved waiting time for CCH $v_i$ is shown in (9).
where k is a random number uniformly distributed in (0.9, 1) that reduces the probability of broadcast message conflicts; $T_{CH}$ is the predefined maximum duration of the CH competition; $E_0$ is the initial energy of $v_i$; and $c_1$, $c_2$, and $c_3$ are the weight factors of energy, distance, and degree, respectively, with $c_1 + c_2 + c_3 = 1$. Using a time-based broadcast mechanism reduces the number of control messages broadcast and received when CCHs compete to become FCHs. The BS broadcasts the CH selection message CH_SEL_MSG to synchronize the time of each node. After receiving CH_SEL_MSG, each node starts its own clock and listens to the messages sent by other nodes.
If $RE_i < E_{NT_i}$ ($E_{NT_i}$ is the average residual energy of the neighbour nodes of $v_i$), $v_i$ gives up the FCH competition; otherwise, its waiting time is calculated according to (9). It can be seen from (9) that CCHs with more residual energy, a shorter distance to the BS, and fewer neighbour nodes have a higher probability of being selected as FCHs.
The pseudo code of CH selection algorithm is shown as Algorithm 1.
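The timer-driven competition can be sketched as below. This is not Algorithm 1 itself: it takes the waiting times as given inputs rather than computing them from (9), and it assumes, as in other time-based unequal clustering protocols, that a CCH whose timer expires first announces itself as an FCH and that every still-waiting CCH within the announcer's competition radius then quits. The dictionary keys and the function name are hypothetical.

```python
import math


def elect_final_chs(cchs):
    """Simulate timer-based FCH election.

    cchs: list of dicts with keys 'id', 'pos' (x, y), 'radius', and 'wait'.
    Returns the ids of the final CHs in the order in which they announce.
    """
    final_ids = []
    quit_ids = set()
    # Timers expire in order of increasing waiting time.
    for cand in sorted(cchs, key=lambda c: c["wait"]):
        if cand["id"] in quit_ids:
            continue  # this CCH heard an earlier announcement and withdrew
        final_ids.append(cand["id"])
        # Assumption: the announcement reaches CCHs within the winner's radius.
        for other in cchs:
            if other["id"] == cand["id"] or other["id"] in final_ids:
                continue
            if math.dist(cand["pos"], other["pos"]) <= cand["radius"]:
                quit_ids.add(other["id"])
    return final_ids
```

Because each CCH either announces once or stays silent, the number of control messages in this stage is bounded by the number of FCHs, which is the overhead reduction the text refers to.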

Cluster Formation.
After the CH selection is completed, the ordinary nodes leave the sleep state and the CHs broadcast the CH_ADV_MSG message in the network area. In DEBUC, ordinary nodes join the closest cluster according to the strength of the received signal, which easily causes uneven energy consumption. How to select the best cluster to join is therefore a key issue in unequal clustering [23]. To solve this problem, a fitness function is defined for each CH, which jointly considers the energy and degree of the CH and the distance between the ordinary node and the CH. The fitness function is calculated as shown in (10).
where $E_{res_i}$ is the residual energy of $CH_i$, $E_0$ is the initial energy, $degree_i$ is the degree of $CH_i$, N is the total number of sensor nodes, $d_{to\_i}$ is the distance from the ordinary node to $CH_i$, $d_{to\_BS}$ is the distance from the ordinary node to the BS, and $d_1$, $d_2$, and $d_3$ are the weight factors. The process of cluster formation is shown as Algorithm 2. Figure 2 is an example of unequal clustering in DUCISCA, in which the BS is located at (100, 250). As can be seen from this figure, the clusters differ in size, which achieves the effect of unequal clustering.

Intercluster Multihop Routing.
In order to balance the energy consumption between clusters, ISCA is used to implement multihop routing between clusters. Each CH adopts ISCA to select its relay node from the list of neighbour CHs. The CH transmits the data collected from the members of the cluster to the relay node. Once the intercluster multihop routing is established, the CH sends the data to the BS along the optimal path.

Algorithm 1: CH selection. Input: the locations of the BS and of each node in the network, and the residual energy and maximum cluster radius of each node. Output: FCHs.

The sine cosine algorithm (SCA) proceeds as follows. Firstly, the locations of N search agents are randomly initialized in the search space. Secondly, the individual fitness values are calculated based on the objective function. Finally, the current optimal individual locations are selected and saved. In each iteration of the algorithm, each individual updates its position according to (11).
where t is the current iteration, $x_{id}^{t}$ is the position of the i-th solution in the d-th dimension at the t-th iteration, and $P_{d}^{t}$ is the position of the global optimal solution in the d-th dimension at the t-th iteration. There are four main parameters in (11). $r_1 = 2(1 - t/T)$ (T is the maximum number of iterations) is the sine-cosine amplitude adjustment factor, which determines the direction of the next iteration of the i-th individual; $r_2 \in (0, 2\pi)$, $r_3 \in (0, 2)$, and $r_4 \in (0, 1)$ are random numbers, where $r_2$ determines the distance for the next iteration of the i-th individual, $r_3$ is the weight factor of the global optimal individual, and $r_4$ is the discriminant coefficient.
(1) LHS population initialization. Latin hypercube sampling divides the interval [0, 1] into N equally spaced nonoverlapping subintervals and samples each subinterval independently with equal probability, so as to ensure that the sampling points are evenly distributed over the whole interval.
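The two building blocks described here, LHS initialization and the sine-cosine position update, can be sketched with NumPy. The update below follows the standard SCA rule (sine branch when $r_4 < 0.5$, cosine branch otherwise), which we assume is what (11) denotes; the function names and the 0.5 switch are not taken from the source.

```python
import numpy as np


def lhs_init(n_agents, dim, lower, upper, rng):
    """Latin hypercube initialization: one sample per subinterval in every dimension."""
    # Row k starts in stratum [k/n, (k+1)/n) for each dimension.
    strata = (np.arange(n_agents)[:, None] + rng.random((n_agents, dim))) / n_agents
    # Shuffle each column independently so strata are paired at random across dimensions.
    for d in range(dim):
        strata[:, d] = rng.permutation(strata[:, d])
    return lower + strata * (upper - lower)


def sca_step(x, best, t, t_max, rng):
    """One standard SCA position update for the whole population x (n_agents x dim)."""
    r1 = 2.0 * (1.0 - t / t_max)                      # amplitude, decays linearly to 0
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=x.shape)
    r3 = rng.uniform(0.0, 2.0, size=x.shape)
    r4 = rng.random(x.shape)
    dist = np.abs(r3 * best - x)
    return np.where(r4 < 0.5,
                    x + r1 * np.sin(r2) * dist,
                    x + r1 * np.cos(r2) * dist)
```

A typical use would be `pop = lhs_init(30, 8, 0.0, 1.0, np.random.default_rng(0))` followed by repeated calls to `sca_step` with the best individual found so far. In practice the result would also be clipped to the search bounds, which is omitted here.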
(2) Adaptive weight strategy. Inspired by the particle swarm optimization (PSO) algorithm [26], this paper adds a weight factor to the original position update formula to speed up the convergence of the algorithm. The new position update formula is shown in (12).

Journal of Sensors
where $w(t)$ is the weight factor. Its calculation is shown in (13).
where t is the current iteration and Max_iter is the maximum iteration.
(3) Disturbance strategy. In order to strengthen the local search ability, Gaussian mutation and a Levy flight strategy are introduced so that individuals trapped at local extreme points can escape and continue the search.
Gaussian mutation [27] is based on the normal distribution, a continuous probability distribution, and has good local exploitation ability. Its mutation formula is given in (14). Levy flight [28] obeys the Levy distribution; please refer to [28] for the specific mathematical model. Its mutation formula is given in (15). In these formulas, x is the original position, $mutation(x)$ is the position after Gaussian mutation or Levy flight, ⊗ denotes element-wise multiplication, $randn(1)$ is a random number obeying the standard normal distribution, and $Levy(\lambda)$ is a random vector obeying the Levy flight.

Figure 3: The flow chart of DUCISCA protocol.
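The two disturbance operators can be sketched as follows. The exact formulas are not reproduced in this text, so the sketch assumes commonly used forms that match the symbols defined above: $x \cdot (1 + randn(1))$ for Gaussian mutation and $x + x \otimes Levy(\lambda)$ for Levy flight, with Levy steps drawn by Mantegna's algorithm. All function names are ours.

```python
import math
import numpy as np


def gaussian_mutation(x, rng):
    """Assumed form: scale the whole position by (1 + one standard normal draw)."""
    return x * (1.0 + rng.standard_normal())


def mantegna_sigma(lam):
    """Scale of the numerator normal variable in Mantegna's algorithm."""
    num = math.gamma(1.0 + lam) * math.sin(math.pi * lam / 2.0)
    den = math.gamma((1.0 + lam) / 2.0) * lam * 2.0 ** ((lam - 1.0) / 2.0)
    return (num / den) ** (1.0 / lam)


def levy_step(dim, lam, rng):
    """Draw a dim-dimensional Levy-distributed step with index lam (Mantegna)."""
    u = rng.normal(0.0, mantegna_sigma(lam), size=dim)
    v = rng.normal(0.0, 1.0, size=dim)
    return u / np.abs(v) ** (1.0 / lam)


def levy_mutation(x, lam, rng):
    """Assumed form: perturb each coordinate in proportion to a Levy step."""
    return x + x * levy_step(x.size, lam, rng)
```

Gaussian mutation makes small moves around the current position, whereas the heavy tail of the Levy step occasionally produces long jumps, which is what lets a stagnating individual leave a local optimum.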

Performance of ISCA.
In order to verify the performance of ISCA, this paper compares ISCA with BOA [29], WOA [30], SSA [31], and SCA. The population size is set to N = 30 and the maximum number of iterations to Max_iter = 500. Each algorithm is run 30 times independently, and the worst value, optimal value, mean, and standard deviation over these 30 runs are taken as evaluation indexes. This paper takes the first six 50-dimensional test functions in [32] as examples. The theoretical optimal function values are all 0. Table 1 shows the optimal value, the worst value, the mean value, and the standard deviation of the results obtained from each algorithm over 30 runs. The function optimization experiments with the five algorithms on 6 benchmark functions with different features show that the optimization accuracy and stability of ISCA are significantly improved.

Routing Based on ISCA.
Suppose there are eight CHs in the network, and the set of CHs is $\{CH_1, CH_2, \dots, CH_8\}$. N individuals are randomly generated in the range (0, 1), in which the dimension of each individual is the number of CHs, and each individual represents a multihop path.
$CH_i$ takes δ times its competition radius as the broadcast range. Other prior CHs and the BS that receive this message become the neighbour CHs of $CH_i$. $CH_d$ selects the $n_d$-th CH in its neighbouring CH set as the final relay node. The value of $n_d$ is calculated as shown in (16).
where $x_{id}$ is the d-th dimensional value of the i-th individual, which denotes the probability that $CH_d$ selects the relay node, and $NeiNum(CH_d)$ indicates the number of neighbouring CHs of $CH_d$. In order to reduce and balance the energy consumption of CHs and prolong the network lifetime, this paper takes the standard deviation of the CHs' residual energy as the objective function of ISCA. The specific expression is shown in (17).
where m denotes the number of CHs, $E_{res}(i)$ denotes the residual energy of $CH_i$, and $\bar{E}$ denotes the mean value of the CHs' residual energy. The residual energy of each CH is obtained by subtracting the energy consumed in intercluster communication, which is calculated as (18).
where $E_{bef}(i)$ denotes the residual energy of $CH_i$ before intercluster communication, $PackNum(i)$ denotes the number of

The flow chart of the implementation of the DUCISCA protocol is shown in Figure 3.
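The encoding and the objective can be sketched as follows. Since (16) to (18) are not reproduced in this text, the sketch makes several labelled assumptions: the relay index is obtained by rounding $x_{id} \cdot NeiNum(CH_d)$ up, the neighbour lists only contain the BS or CHs nearer to it (so routes cannot loop), each CH forwards one fused packet of its own plus everything relayed through it, and the population form of the standard deviation is used. The names `BS`, `tx_cost`, and `rx_cost` are hypothetical.

```python
import math
import statistics

BS = -1  # hypothetical marker for the base station in neighbour lists


def decode_relays(individual, neighbours):
    """Map each x_d in (0, 1) to one entry of CH d's neighbour list (assumed ceil rule)."""
    relays = []
    for x_d, nei in zip(individual, neighbours):
        n_d = min(len(nei), max(1, math.ceil(x_d * len(nei))))
        relays.append(nei[n_d - 1])
    return relays


def forwarded_packets(relays):
    """Packets each CH transmits: its own plus those relayed through it."""
    m = len(relays)
    packets = [1] * m
    for src in range(m):
        hop, steps = relays[src], 0
        while hop != BS:
            packets[hop] += 1
            hop, steps = relays[hop], steps + 1
            if steps > m:
                raise ValueError("routing loop detected")
    return packets


def objective(individual, neighbours, e_before, tx_cost, rx_cost):
    """Standard deviation of CH residual energy after one round of forwarding."""
    relays = decode_relays(individual, neighbours)
    packets = forwarded_packets(relays)
    residual = []
    for i, relay in enumerate(relays):
        # tx_cost(i, relay): energy per packet sent from CH i to its relay.
        spent = packets[i] * tx_cost(i, relay) + (packets[i] - 1) * rx_cost
        residual.append(e_before[i] - spent)
    return statistics.pstdev(residual)
```

A lower objective value means the forwarding load is spread more evenly, so an optimizer such as ISCA minimizing this function is steered away from paths that drain a few CHs near the BS.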

Simulation Parameters.
To evaluate the performance of DUCISCA, the EEUC, DEBUC, I-EEUC, M-DEBUC, and DUCISCA protocols are simulated under the same conditions using MATLAB 2018a, and several performance metrics are compared. Each simulation runs for 1000 rounds, and each node transmits one data packet per round. The network parameters are shown in Table 2.

CHs' Energy Consumption.
A comparative analysis of the CHs' energy consumption after 20 rounds of network clustering is shown in Figure 4. The average CH energy consumption of I-EEUC, EEUC, DEBUC, M-DEBUC, and DUCISCA is 0.2077 J, 0.1156 J, 0.1021 J, 0.1030 J, and 0.0980 J, respectively. The CHs' energy consumption of DUCISCA is lower than that of the other protocols, and the energy consumption of each CH is balanced in each round. The energy consumption of CHs is related to the number of packets forwarded and the forwarding distance. DUCISCA optimizes the CH selection strategy and selects high-quality nodes as CHs, and the number of CHs does not change much. In addition, when intercluster forwarding paths are established, ISCA searches for near-ideal multihop paths and data packets are forwarded accordingly, so the energy consumption of intercluster communication is reduced.

Network Lifetime.
Network lifetime is one of the important indicators for evaluating network performance [33]. Table 3 shows the first node death (FDN) and half node death (HDN) times for the five algorithms in Figure 5. If HDN is used as the criterion of network lifetime, the five algorithms have network lifetimes of 535, 633, 603, 637, and 668 rounds, respectively. Compared with EEUC, I-EEUC optimizes the CH selection algorithm for more balanced energy consumption. Compared with I-EEUC, DEBUC not only optimizes the rules of CH selection but also takes into account the residual energy of nodes and the energy consumption within and between clusters in multihop routing, which further balances the network energy consumption. Compared with DEBUC, M-DEBUC improves the competition radius and waiting time, so it consumes less energy than DEBUC. Compared with M-DEBUC, the proposed DUCISCA considers node energy, distance to the BS, and node degree when CCHs compete to become FCHs, which is more reasonable than the CH selection of M-DEBUC, so its FDN occurs later. It can be seen from Figure 5 that DUCISCA balances the energy consumption between clusters better, so its nodes die slightly faster in the later rounds than those of the other algorithms. Figure 6 compares the number of packets received by the BS for the five protocols. The BS receives more packets under DUCISCA than under the compared protocols, including M-DEBUC. This is because the improved CH selection and multihop routing of DUCISCA extend the lifetime of the network. The improved multihop routing in particular lets more CHs send data to the BS indirectly over multiple hops, which increases the amount of data received by the BS. Before DUCISCA reaches its HDN, most nodes of the compared protocols have died and there are coverage holes in the monitoring area, so it is difficult for their nodes to send data to the BS. Figure 7 compares the variation of the network average residual energy with the number of rounds for the five protocols.
Because EEUC only considers energy in routing, its intercluster routing is unreasonable, resulting in less network average residual energy. DEBUC uses a time-based broadcast mechanism to select CHs, which effectively reduces control overhead. Its multihop routing takes into account factors such as CH energy, distance, and node degree, so the average residual energy of the network is higher than that of EEUC. DUCISCA improves the CH selection algorithm and the competition radius calculation, which makes the distribution of clusters more reasonable. Moreover, in order to save and balance energy, DUCISCA adopts intercluster multihop routing based on ISCA, which effectively reduces the energy consumption of data transmission. Hence, the energy consumption of DUCISCA is lower than that of the other four protocols.
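The lifetime indicators used in this section can be computed from a simulation log as sketched below. The sketch assumes that FDN is the round in which the first node dies and that HDN is the round in which the number of dead nodes first reaches half of the network (rounded up); the function name and input format are ours.

```python
import math


def lifetime_metrics(death_rounds, n_nodes):
    """Return (FDN, HDN) given the death round of every node that has died.

    death_rounds: iterable of rounds, one entry per dead node.
    n_nodes: total number of deployed nodes.
    A metric is None if the corresponding event has not happened yet.
    """
    ordered = sorted(death_rounds)
    fdn = ordered[0] if ordered else None
    half = math.ceil(n_nodes / 2)
    hdn = ordered[half - 1] if len(ordered) >= half else None
    return fdn, hdn
```

For a run in which nodes died in rounds 300, 120, 500, and 450 out of 6 deployed nodes, the call `lifetime_metrics([300, 120, 500, 450], 6)` gives an FDN of 120 and an HDN of 450.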

Conclusions
The major contribution of this article is a distributed unequal clustering routing protocol based on ISCA (DUCISCA) that enhances WSN performance in terms of energy efficiency and lifetime. In the CH competition stage, the protocol improves the competition radius by considering the residual energy of the node and its distance to the BS. This balances the energy consumption of nodes in different locations to avoid the "hot spot" problem and is a precondition for unequal clustering. At the same time, the protocol adopts a time-based broadcast mechanism. The waiting time depends on the residual energy of CCHs, the distance to the BS, and the number of neighbour nodes, which effectively reduces the overhead of nodes. After the cluster head selection is completed, the protocol enters the cluster formation stage. In order to produce clusters that balance network energy consumption, a rule for joining clusters is proposed, which takes into account the energy of CHs, the degree of nodes, and the distance between ordinary nodes and CHs. Finally, the standard deviation of the CHs' residual energy in intercluster communication is taken as the objective function to search for an energy-balanced intercluster data forwarding path based on ISCA. Compared with EEUC, DEBUC, I-EEUC, and M-DEBUC, DUCISCA performs better in terms of nodes' energy consumption, network lifetime, throughput, and network average residual energy.
In future work, we will further optimize the DUCISCA protocol according to the experimental results to improve the energy saving effect of the protocol.

Data Availability
Please contact the corresponding author to obtain the underlying data if necessary.

Conflicts of Interest
The authors declare that they have no conflicts of interest.