Nonuniform Clustering of Wireless Sensor Network Node Positioning Anomaly Detection and Calibration

To detect and correct node localization anomalies in wireless sensor networks, a hierarchical nonuniform clustering algorithm is proposed. This paper designs a centroid iterative maximum likelihood estimation location algorithm based on nonuniformity analysis, selects the nonuniformity analysis algorithm, gives the flowchart of the node location algorithm, and simulates the distribution of nodes with MATLAB. First, the algorithm divides the nodes in the network into different network levels according to the number of hops required to reach the sink node. According to the average residual energy of nodes in each layer, the sink node selects the nodes with higher residual energy in each layer of the network as candidate cluster heads and selects a certain number of nodes with lower residual energy as additional candidate cluster heads. Then, at each level, the final cluster heads are elected from the candidate cluster heads. Finally, by controlling the communication range between cluster head and cluster members, clusters of different sizes are formed, and clusters at levels closer to the sink node have a smaller scale. By simulating the improved centroid iterative algorithm, the values of the optimal iteration parameters α and η are obtained. Based on the analysis of the positioning errors of the improved centroid iterative algorithm and the maximum likelihood estimation algorithm, the value of the algorithm conversion factor is selected. To address abnormal nodes that may occur in the process of ranging, a hybrid node location algorithm is further proposed. The algorithm uses the $\ell_{2,1}$ norm to smooth the structured anomalies in the ranging information and achieves accurate positioning while detecting node anomalies. Experimental results show that the algorithm can accurately determine the uniformity of distribution, achieve good positioning in complex environments, and detect abnormal nodes well. The hybrid node location algorithm is also extended to the node location problem in large-scale scenes, where it achieves good positioning results.


Introduction
With the gradual development of sensors in the direction of integration, miniaturization, and networking, wireless sensor network technology was born, bringing a revolution in the field of information perception [1]. It consists of a large number of wireless sensor nodes deployed in a certain area. These sensor nodes are responsible for collecting, processing, compressing data, transferring data packets from other nodes, and sending the data packets, so as to realize the monitoring of the designated objects in the deployment area of the entire network [2]. The wireless sensor network integrates the functions of information collection, information processing, and information transmission. It can perceive various environmental data and target information in real time and use the access network to deliver it to the observer [3]. Node location is also the basic supporting information in many wireless sensor network protocols and algorithm designs. Therefore, how to achieve more precise positioning of nodes is one of the key technologies of wireless sensor networks [4].
Wireless sensor networks have obvious potential advantages and broad application prospects in the acquisition and transmission of information, but their large number of resource-constrained sensor nodes poses major challenges in areas such as network design and information processing [5]. At present, most of the corresponding theoretical results are out of touch with practical applications, and they perform poorly when applied in practice [6]. In particular, there is a lack of innovative research on the system as a whole, so there is considerable room for research and development. As wireless sensor technology evolves, the application of mobile nodes continues to develop and expand. Mobile anchor nodes can increase the coverage of network nodes and provide more location reference information for unknown nodes. Mobile unknown nodes are mainly used in the fields of mineral development, environmental protection, and logistics production. In mobile wireless sensor networks, the positioning of mobile unknown nodes also plays a vital role and is the foundation and core of the entire wireless sensor network [7]. Compared with static node positioning, the movement of mobile nodes increases the uncertainty of the network: the topology of the entire network changes constantly, and the positioning process becomes more difficult. It also raises the requirements on the battery capacity, computing power, and fault tolerance of sensor nodes [8]. Therefore, high-performance mobile node positioning technology is both a research hotspot and a difficult problem in wireless sensor networks and has important research significance.
This paper presents the judgment method of nonuniformity analysis and the distribution diagram of the distribution points based on the self-deviation value, designs a hybrid positioning algorithm process based on nonuniformity analysis, and trains the iteration termination parameters of the improved centroid algorithm to obtain the best values of α and β with respect to time complexity. We estimate the positioning error of the positioning algorithm in the wheat field and select the algorithm conversion factor. By comparing the positioning error of the maximum likelihood estimation method with the hybrid positioning error based on nonuniformity analysis, it is concluded that the latter is better suited to the positioning of wireless sensor network nodes in the wheat field environment. To deal with inaccurate and incomplete ranging information in actual application environments, we propose a novel anomaly-aware wireless sensor network node location algorithm. The hybrid node location algorithm proposed in this paper is divided into two steps. Based on the inferred complete and accurate Euclidean distance matrix (EDM), the classical multidimensional scaling (MDS) method can be used to estimate the positions of all unknown nodes. In addition, the hybrid node positioning algorithm can be extended to large-scale positioning schemes.

Related Work
As one of the supporting technologies of wireless sensor networks, the problem of node location has been studied by scholars at home and abroad [9]. At the same time, some scholars have given relatively complete overviews of positioning technology [10,11]. In recent years, with the introduction of rigidity theory, positioning technology has made great progress in theory and algorithm research [12]. Many systems and algorithms have been developed to solve the problem of wireless sensor network node positioning [13]. Wireless sensor network node positioning algorithms can be divided into range-based positioning algorithms and range-free positioning algorithms. Range-based positioning algorithms can achieve more accurate positioning, but their calculation and communication overhead is large, and certain hardware support is required. Relevant scholars proposed a positioning algorithm based on hop-distance correction particle swarm optimization in a randomly distributed network topology environment [14]. The algorithm adds a beacon node to weigh the error of its average per-hop distance and then positions the node to be located. The improved particle swarm optimization algorithm can improve the accuracy of positioning and the stability of the algorithm, but the additional computation during the optimization process greatly shortens the survival time of the node. Relevant scholars have proposed a three-dimensional multidimensional calibration algorithm based on a new distance estimation, defined a dissimilarity matrix to represent the distance between nodes in the network, and determined the unknown node coordinates in the local coordinate system [15]. The unknown node coordinates in the local coordinate system are then transformed into the global coordinate system. Related scholars combined maximum likelihood estimation and the weighted centroid algorithm and proposed a hybrid algorithm [16].
The coordinates of unknown nodes are roughly estimated by the maximum likelihood estimation method, and the weighted centroid algorithm is used to optimize the positioning results. After a long period of experimental time, the environmental state has changed a lot, and these changes have a huge impact on the characteristics of the wireless signal transmission channel. As the parameters change, using this model to calculate the relationship between the distance between nodes and the RSSI value will produce a nonnegligible error [17].
Relevant scholars communicate through the hardware equipment installed on the research node and propose a scheduling algorithm that uses multicast technology to meet information requests of different priorities [18]. Multicast saves time and bandwidth resources and gives better performance. However, this method relies on receiving messages in response to hardware device requests, so it is passive and cannot publish messages in real time. Researchers propose a two-phase AFL positioning method for WSN independent of reference nodes [19]. During the simulation process, all nodes in the network cooperate to complete positioning. It can ensure the flexibility of the node and improve the positioning accuracy of the node to a certain extent, but because of the cyclic positioning, the energy consumption of the node is too large for practical applications. The Map Stitching algorithm proposed by related scholars is a method based on graph theory, which pieces together the obtained partial graphs into a global graph to get the positions of all nodes [20]. However, because there are many overlapping parts in the local maps, the local positioning errors accumulate, resulting in low positioning accuracy. On the basis of the received signal strength, relevant scholars use maximum likelihood theory to establish a realistic probability positioning model based on the received signal measurement [21]. To handle the highly nonlinear characteristics of the model, which are not easy to solve, a sensor network communication characteristic algorithm is designed using optimization methods, and the convergence of the algorithm is proved by a random process to further obtain the optimal solution. However, the complexity of the algorithm is greatly affected by the prediction operation before positioning [22].

Nonuniform Clustering Method for Wireless Sensor Network with Local Cluster Reconstruction Mechanism
3.1. Structure Diagram of Wireless Sensor Node. The wireless sensor network node has relatively weak processing capabilities, communication capabilities, and storage capabilities, so the sensor node mainly includes sensor module, processor module, wireless communication module, and power management module. The design of the node needs to be considered in terms of volume, price, and energy supply, and there are certain restrictions on the performance of all aspects. Its structure is shown in Figure 1. Suppose that N sensor nodes are deployed in a rectangular monitoring area to periodically send data to the sink node. Each node has a unique identification number (ID).
When a node sends $l$ bits of data to another node at distance $d$, the transmission energy it consumes is
$$E_{Tx}(l,d) = E_{Tx\text{-}elec}(l) + E_{Tx\text{-}amp}(l,d) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^{2}, & d < d_0, \\ l E_{elec} + l \varepsilon_{mp} d^{4}, & d \ge d_0. \end{cases}$$
Here, $E_{Tx\text{-}elec}$ is the loss of the transmitting circuit, and $E_{elec}$ is the energy lost by the transmitting circuit per bit. $E_{Tx\text{-}amp}$ is the power amplification loss, for which different models are adopted according to whether the transmission distance $d$ exceeds the threshold $d_0$: when $d < d_0$, the free space model is adopted; when $d > d_0$, the multipath attenuation model is adopted. $\varepsilon_{fs}$ and $\varepsilon_{mp}$ are the energy required for power amplification in these two models, respectively. When a node receives $l$ bits of data, the energy it consumes is
$$E_{Rx}(l) = l E_{elec}.$$
In addition, the energy consumed per bit when a node merges data is $E_{DA}$.
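The energy model described above matches the widely used first-order radio model. The following is a minimal sketch of it, assuming path-loss exponents of 2 (free space) and 4 (multipath), a threshold $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$, and illustrative parameter values; none of these numbers are taken from this paper.

```python
import math

# Illustrative parameter values (assumptions, not taken from the paper).
E_ELEC = 50e-9        # J/bit, circuit energy per bit
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier coefficient
D0 = math.sqrt(EPS_FS / EPS_MP)  # assumed threshold where both models agree


def tx_energy(bits, d):
    """Energy (J) to transmit `bits` bits over a distance of d metres."""
    if d < D0:
        amp = EPS_FS * d ** 2   # free space model
    else:
        amp = EPS_MP * d ** 4   # multipath attenuation model
    return bits * (E_ELEC + amp)


def rx_energy(bits):
    """Energy (J) to receive `bits` bits."""
    return bits * E_ELEC
```

With these assumed values the threshold is roughly 88 m, so a 10 m link falls under the free space model and a 100 m link under the multipath model.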
3.2. Nonuniform Clustering Algorithm. One round of this protocol consists of an establishment phase and a stabilization phase. At the beginning of the establishment phase of each round, a method similar to the flooding algorithm is used to enable all nodes to obtain basic local information. Each node only needs to forward the flag message once to obtain its approximate location information and neighbor node information. Suppose that node $i$ saves a data set $S_{i,n}$ used to record the information of neighbor nodes, mainly including the ID of each neighbor node $j$ and the distance $d_{ij}$ between node $i$ and node $j$. In addition, $H_i$ denotes the number of hops required from node $i$ to the sink node.
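The hop count $H_i$ that defines the network level can be illustrated with a breadth-first search from the sink node, which is what single-forward flooding amounts to in an idealized loss-free network. This is a sketch under that assumption; the function name `hop_levels` and the adjacency-dictionary input are hypothetical.

```python
from collections import deque


def hop_levels(neighbors, sink):
    """Return {node: hop count to sink} using breadth-first search.

    neighbors: {node: iterable of neighbor nodes} (assumed symmetric links).
    """
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in hops:          # first arrival uses the fewest hops
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops
```

Nodes sharing the same hop count then belong to the same network level.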
According to the feedback information of the nodes, the sink node first counts the total number of nodes $N_l$ in level $l$ and calculates the average remaining energy $AE_l$ of the level. The sink node sets each node whose remaining energy is greater than the average remaining energy of its layer as a cluster head candidate of that layer. The sink node records the IDs of these nodes and counts their total number $N_{l,h}$. Then, following the "exploration" strategy, the sink node randomly selects $N_{l,e}$ nodes among the remaining nodes in each layer, that is, the nodes whose remaining energy is lower than the average remaining energy of the layer, to become additional candidate cluster heads.
Here, $P_l$ represents the proportion of additional candidate cluster heads in this layer. The number of additional candidate nodes is inversely proportional to the distance from the level to the sink node. After determining all the candidate cluster heads, the sink node broadcasts the IDs of the candidate nodes to the network through the "CANDIDATE_ID" message.
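The candidate selection for a single level can be sketched as follows. The sketch assumes that the number of additional candidates is the proportion $P_l$ applied to the below-average nodes and rounded down, and that nodes exactly at the average count as below average; both choices, and the name `select_candidates`, are illustrative assumptions rather than the paper's definition.

```python
import random


def select_candidates(energy, p_l, rng=random):
    """Pick cluster head candidates for one network level.

    energy: {node_id: residual energy} for the nodes of this level.
    p_l: assumed fraction of below-average nodes promoted as extra candidates.
    Returns (above-average candidates, additional random candidates).
    """
    avg = sum(energy.values()) / len(energy)
    strong = sorted(n for n, e in energy.items() if e > avg)
    weak = sorted(n for n, e in energy.items() if e <= avg)
    n_extra = int(p_l * len(weak))      # rounded down (assumption)
    extra = rng.sample(weak, n_extra)   # the "exploration" draw
    return strong, sorted(extra)
```

Passing a seeded `random.Random` instance as `rng` makes the random draw repeatable in simulation.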
To place the selected cluster head as close to the central position as possible and thereby reduce the overall data transmission energy consumption, this paper proposes a weight-based candidate cluster head election method, the core of which is to consider several node attributes together when computing the election weight. If node $i$ is selected as a cluster head candidate node, its election weight for cluster head is denoted $W_{i,C}$, where $N_{i,n}$ is the number of neighbor nodes of node $i$, $RE_i$ is the remaining energy of node $i$, and $A$, $B$, and $C$ are weighting factors, which can be changed according to the specific application background.

3.3. Local Cluster Reconstruction Mechanism. The cluster head rotation mechanism of reselecting cluster heads regularly can balance the energy load of each node. However, reclustering the entire network requires the transmission of a large number of control messages, which brings considerable additional communication energy consumption. Because nodes in different regions of the network consume energy at different rates, the need for reclustering also differs between regions. In the adopted multihop data transmission mode, cluster heads close to the sink node consume energy faster than cluster heads far away. Therefore, cluster heads in areas close to the sink node need to be replaced more often, and their replacement rate should be higher than in other areas. To effectively reduce the energy consumption caused by cluster reconstruction and to better alleviate the "hot zone" problem, this paper proposes a hierarchy-based local cluster reconstruction mechanism.
Similar to LEACH, this algorithm also periodically reselects cluster heads in the entire network in units of rounds, but the time setting of each round of this algorithm will be longer than LEACH. In the stable phase, each network layer performs a local reselection of cluster heads within the layer with different frequencies.
Let $l_{max}$ be the total number of network levels and $T_R$ be the running time of each round. $T_0$ is the minimum time interval for performing cluster reconstruction, and $T_l$ is the time interval for performing local cluster reconstruction on the $l$th layer. At the beginning of each round, the sink node starts a timer $t_r$ to record the running time of the network. When $t_r = T_R$, the sink node resends a "HELLO" message to make the network enter a new round and start the reclustering of the entire network.
When the network enters the stable phase, each level starts its own timer $t_l$. When $t_l = T_l$, the $l$th layer starts its local cluster reconstruction operation and resets $t_l = 0$ to restart timing. First, all cluster heads in layer $l$ send a "RE_CLUSTERING" message to their neighbor nodes, announcing the start of cluster head reselection. After receiving the message, each neighbor node calculates its election weight $W_{i,r}$ for this cluster head reselection and sends it to the cluster head through the "RCH_COMPETE" message. In the weight calculation, $\theta'$ and $\delta'$ are weighting factors, which can be changed according to the specific application background.

Design of Hybrid Node Positioning Algorithm Based on Nonuniformity Analysis
4.1. Analysis of Node Nonuniformity. Let $X_n$ be $n$ nodes in the unit $m$-dimensional space $C^m$. When the area occupied by a single node is $v = 1/n$, the nonuniformity of the $n$ nodes in $C^m$ is good, and the optimal point radius is denoted $R_{st}(n, -m)$. The self-deviation value $D$ is the uniformity measure defined in terms of the maximum cavity and the minimum cavity. The value range of $D$ is $[0, 1)$. The larger the value of $D$, the more uniform the distribution of points. When the nodes are concentrated at the same point, the value of $D$ is 0.

4.2. Selection of Hybrid Positioning Algorithm Based on Nonuniformity Analysis. After evaluating the nonuniformity of the nodes, this paper first selects the centroid iteration algorithm to calculate the position based on the deviation of the distribution points and improves the algorithm so that it suits nonuniformity-based positioning and has lower complexity. The difference between the calculated result and the actual position of the node is introduced into the centroid iteration-maximum likelihood estimation positioning algorithm.
After the network nodes are arranged, the anchor node broadcasts its position to its neighbor nodes by flooding. At the same time, it forwards the position information transmitted by other anchor nodes, and then, the external network sends positioning instructions and related parameter values to it, which is very important to the nodes in the network. Figure 2 shows the flow of hybrid positioning algorithm based on node nonuniformity analysis.

4.3. Calculation of Distance between Nodes. The algorithm in Figure 2 first establishes a prediction model of signal loss in the environment. After the unknown node receives the signal transmitted by an anchor node in the connected area, the signal strength is measured. We calculate the channel loss value from the signal strength value and then substitute it into the OP-1-R model formula together with the transmitting antenna height $h_T$, the receiving antenna height $h_R$, the signal wavelength $\lambda$, the signal frequency $f$, and the average height $H$ of the wheat plants, which are transmitted from the external network.

4.4. Centroid Iteration Algorithm Process. The centroid iteration algorithm is an improvement of the centroid positioning algorithm. Since centroid iteration is a hybrid algorithm without ranging, its accuracy largely depends on the nonuniformity of points in the sensing field, and it is not sensitive to distance errors between nodes. Therefore, in theory, it is well suited to calculating the node position within a certain nonuniformity range while keeping the error within the ideal range.
The algorithm first calculates the distance between the centroid of the connected nodes and the unknown node in the sensing field, then replaces the connected node farthest from the unknown node with the centroid node to shrink the plane enclosed by the connected nodes in the sensing field, and repeats this process over multiple iterations. The algorithm principle is as follows.
Assuming that the coordinates of the unknown node O are $(X, Y)$, the $N$ anchor nodes connected to node O are $S_n$, where the coordinates of $S_n$ are $(X_n, Y_n)$.
Figure 2: Nonuniformity analysis centroid iterative maximum likelihood estimation location algorithm.
In the plane enclosed by the $N$ anchor nodes, the coordinates of the centroid $O_1$ are $(X_{O1}, Y_{O1})$, calculated as follows:
$$X_{O1} = \frac{1}{N}\sum_{n=1}^{N} X_n, \qquad Y_{O1} = \frac{1}{N}\sum_{n=1}^{N} Y_n.$$
To prevent the iterative algorithm from falling into an infinite loop, or from improving accuracy only at a very high cost, it is necessary to set an end condition suited to the node uniformity evaluation, so that the algorithm approaches the real coordinates within a reasonable number of iterations.
Since the centroid iterative algorithm in this algorithm runs under the condition of good nonuniformity, there is no need to use the APIT algorithm to judge the positional relationship between point O and the connected anchor nodes, which reduces the complexity of the centroid iterative algorithm. In this paper, the iteration termination conditions are set based on the nonuniformity relationship, where β is the number of iterations, and β and α are fixed values.
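The replace-the-farthest-node iteration described in this subsection can be sketched as follows. Because the termination condition is not reproduced here, the sketch assumes a simple stand-in: stop after at most `beta` iterations or once the centroid moves less than `alpha`. The callable `dist_to_unknown` is also hypothetical; in a simulation it can use the true position, while in the field it would come from signal-strength-based estimates.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def centroid_iteration(anchors, dist_to_unknown, alpha=0.5, beta=20):
    """Iteratively shrink the anchor polygon around the unknown node.

    anchors: list of (x, y) anchor coordinates.
    dist_to_unknown: callable returning the (estimated) distance from a
        point to the unknown node.
    alpha, beta: assumed movement tolerance and iteration cap.
    """
    pts = list(anchors)
    est = centroid(pts)
    for _ in range(beta):
        # Replace the point farthest from the unknown node with the centroid.
        far = max(range(len(pts)), key=lambda i: dist_to_unknown(pts[i]))
        pts[far] = est
        new_est = centroid(pts)
        moved = math.dist(est, new_est)
        est = new_est
        if moved < alpha:
            break
    return est
```

Every estimate is an average of anchors and earlier centroids, so it always stays inside the region enclosed by the original anchors.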

4.5. Centroid Algorithm Iteration Termination Parameter.
From the description of the iterative algorithm in the previous section, it can be seen that the setting of the iterative termination condition greatly affects the complexity of the centroid iterative algorithm. Therefore, this paper uses MATLAB to simulate the algorithm and select the parameters α and η by observing the number of iterations of the algorithm.
Suppose the point $(45, 45)$ is the true coordinate of the unknown node. The points are arranged according to different nonuniformity deviation values, and the values of the parameters β and α are analyzed, as shown in Table 1.

4.6. Maximum Likelihood Estimation Positioning Algorithm. The maximum likelihood estimation positioning algorithm is a fully range-based algorithm. Because this type of algorithm does not need additional hardware modules as support, it is a low-power and inexpensive wireless sensor network node positioning method. It can also generally provide higher accuracy in field positioning than range-free algorithms. However, as an algorithm based on RSSI signal strength measurement, it must take into account the various interferences affecting the wireless signal in the experimental environment, which is also an important reason for establishing the channel loss model. The principle of the algorithm is described as follows.
The relationship between the anchor nodes and the unknown node is
$$(X_n - X)^2 + (Y_n - Y)^2 = d_n^2, \quad n = 1, 2, \ldots, N,$$
where $(X_n, Y_n)$ are the anchor node coordinates, $(X, Y)$ are the unknown node coordinates, and $d_n$ is the estimated distance between the anchor node and the unknown node. Solving this system yields the coordinate values of the unknown node.
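One common way to solve the anchor-distance relationship is to subtract the last anchor's equation $(X_N - X)^2 + (Y_N - Y)^2 = d_N^2$ from the others, which gives a linear system that can be solved by least squares. The sketch below assumes this standard linearization; the name `mle_position` is hypothetical, and this form is not confirmed to be the exact one used in the paper.

```python
def mle_position(anchors, dists):
    """Least-squares position from anchors and range estimates.

    anchors: list of (x, y), at least three and not collinear.
    dists: estimated distances from each anchor to the unknown node.
    """
    (x_last, y_last), d_last = anchors[-1], dists[-1]
    rows, rhs = [], []
    for (xn, yn), dn in zip(anchors[:-1], dists[:-1]):
        # 2(xn - xN) X + 2(yn - yN) Y = xn^2 - xN^2 + yn^2 - yN^2 + dN^2 - dn^2
        rows.append((2 * (xn - x_last), 2 * (yn - y_last)))
        rhs.append(xn**2 - x_last**2 + yn**2 - y_last**2 + d_last**2 - dn**2)
    # Normal equations (A^T A) p = A^T b, written out for the 2x2 case.
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * b for a, b in rows)
    s22 = sum(b * b for _, b in rows)
    t1 = sum(a * r for (a, _), r in zip(rows, rhs))
    t2 = sum(b * r for (_, b), r in zip(rows, rhs))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear or too few")
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)
```

With exact distances the true position is recovered; with noisy RSSI-derived distances the result is the least-squares compromise across all anchors.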

5.1. Experimental Site Setting.
To ensure normal communication between nodes, the abscissa and ordinate are set to 100 m as the basic values of node positioning in this wireless sensor network. In the perception field, we set O as the origin of coordinates, and $(O, a)$ and $(O, b)$ are the abscissa and ordinate of the perception area, respectively. The anchor nodes are arranged according to the different D values of the nonuniformity analysis. The unknown node is placed in the center of the anchor nodes.
After the anchor nodes perform flooding broadcast, the unknown node receives each anchor node's coordinates in the sensing area and the received signal strength relative to that anchor. During the operation of the network, each anchor node sends its own coordinates to neighboring nodes in flooding mode. When the unknown node receives the anchor node data, it measures the signal strength, and then the anchor node position and received signal strength are transmitted through the gateway node.
The intermediate nodes are unknown nodes, the nodes on the right are gateway nodes, and the rest are anchor nodes; the location model is shown in Figure 3.
In the experiment, anchor node counts of 5, 11, 16, and 21 were selected. With 5 and 11 anchor nodes, 20 experiments were carried out as the self-deviation value varied; with 15 anchor nodes, 15 experiments were carried out as the self-deviation value varied; and with 21 anchor nodes, 5 positioning experiments were performed. This paper uses DD = 1/D as the abscissa to better show the relationship between the error value and the nonuniformity.
In this experimental site, according to the nonuniformity of the nodes, the position of the unknown node in the coordinate system is calculated through the received signal strength and anchor node coordinates, so as to verify the effectiveness of the algorithm.

5.2. Node Location without Noise and Abnormality.
This experiment assumes that all the obtained ranging information is accurate. Figure 4 shows the variation of the average positioning error of WSN nodes when the EDM is sampled at different ratios; the horizontal axis represents the sampling rate, and the vertical axis represents the positioning error of the WSN nodes. The positioning error of the hybrid node positioning algorithm is the smallest of the four. When the sampling rate reaches around 0.6, the node positioning error of our algorithm and the three comparison algorithms reaches a very low level. As the sampling rate changes from 0.1 to 1, the positioning errors of the four algorithms are basically less than 5%. This shows that the algorithm proposed in this paper and the existing node location algorithms can achieve high-precision node location under ideal conditions.

5.3. Node Location under Complex Noise. This experiment examines the influence of complex noise on the performance of the four node localization algorithms. Therefore, we add only complex noise to the EDM. In this application scenario, the SVT-based algorithm and the OptSpace-based algorithm have relatively large positioning errors across the different sampling rates, which indicates that these two methods cannot handle complex noise well. When the sampling rate reaches 0.9, the positioning error of the hybrid node positioning algorithm reaches 3%. Taken together, the hybrid node location algorithm in this paper has the best performance under complex noise. The experimental results of the four algorithms, with the EDM sampled at different proportions, are shown in Figure 5.

5.4. Positioning under Abnormal Node. The purpose of this experiment is to test the performance of the algorithm in detecting node anomalies in a noise-free environment. In this case, only anomalies are added to the EDM. As shown in Figure 6, when there are anomalies in the EDM, the positioning errors of the four algorithms remain within an acceptable range. However, compared with the other three methods, the hybrid node location algorithm achieves better performance at the same sampling rate. In addition, the hybrid node location algorithm can also detect abnormal nodes. Therefore, when some sensor nodes are abnormal, the hybrid node location algorithm has the best performance: it can accurately locate unknown nodes and detect abnormalities. Figure 6 also shows that the hybrid node location algorithm has the highest anomaly recognition accuracy.

5.5. Node Location under Complex Noise and Anomalies.
In order to evaluate the performance of hybrid node localization algorithm in the application environment where complex noise and anomaly coexist, this experiment adds complex noise and anomaly to EDM to simulate the impact of complex practical application scenarios on node localization.
Compared with the other three algorithms, the hybrid node location algorithm proposed in this paper not only has the lowest location error but also has the highest recognition accuracy of abnormal nodes at the same sampling rate. In short, our hybrid node location algorithm is robust to complex application scenarios and can accurately detect faulty nodes.
The experimental results, including the anomaly recognition accuracy in scenario 4, are shown in Figure 7.
5.6. Large-Scale Scene Positioning. In actual application scenarios, when the size of the positioning scene is much larger than the ranging length of the sensor node, only a small amount of ranging information between nodes can be collected. The sampled EDM based on distance measurement will therefore be very sparse, which degrades the performance of the method proposed in this paper. To solve this problem, this paper proposes a large-scale positioning method. The experimental results are shown in Figure 8; they confirm that the extended large-scale scene hybrid node positioning algorithm achieves better positioning results.
Suppose there is a 200 × 200 rectangular area (in unit lengths), divided into three parts I, II, and III from left to right. To verify the feasibility of the large-scale scene location method, this paper designs an experiment in a scene where complex noise and anomalies coexist. In this experiment, the EDM sampling rate of the positioning area is set to 0.5.
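The sparsity effect in large scenes can be illustrated by building the observed EDM under a limited ranging radius and measuring the fraction of node pairs that are actually observed. This is an illustrative sketch only: the function name `observed_edm`, the hard cutoff at `radius`, and the convention of storing squared distances are assumptions, not details from the paper.

```python
import math


def observed_edm(points, radius):
    """Build a partially observed EDM under a ranging radius.

    Returns (matrix, rate): squared distances for pairs within `radius`,
    None for unobserved pairs, and the fraction of distinct pairs observed.
    """
    n = len(points)
    edm = [[None] * n for _ in range(n)]
    seen = 0
    for i in range(n):
        edm[i][i] = 0.0
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d <= radius:             # pair is within ranging length
                edm[i][j] = edm[j][i] = d * d
                seen += 1
    pairs = n * (n - 1) // 2
    return edm, (seen / pairs if pairs else 1.0)
```

As the scene grows relative to `radius`, the returned rate drops, which corresponds to the sparse-EDM situation that motivates the large-scale extension.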

Conclusion
The nonuniform clustering routing protocol based on hierarchical wireless sensor networks divides the network into different levels according to the number of hops a node requires to reach the sink node; nodes with below-average residual energy also have a chance to become candidate cluster heads. When generating the final cluster heads, the remaining energy of the node, the number of neighboring nodes, and the distance to the neighboring nodes are considered together. This helps ensure that the final cluster head node is located as close to the center of the cluster as possible. According to the distance between the level and the sink node, different levels reselect cluster heads within their own scope at different frequencies. According to the self-deviation value D, the error of the centroid iterative positioning algorithm based on the self-deviation value distribution is obtained. By analyzing the positioning error of the maximum likelihood estimation method based on the self-deviation value, the algorithm conversion value Φ is obtained. This paper designs the corresponding optimization algorithm for these two models and obtains the distance information between all pairs of nodes. When the monitoring area is relatively large, the relative distance of many node pairs may exceed the range limit of the sensor, resulting in an overly sparse EDM, which in turn reduces the accuracy of the positioning algorithm in this paper. This paper extends the hybrid node location algorithm and proposes a solution that can locate nodes effectively in this scenario. The simulation experiment results show the effectiveness of the algorithm in this paper.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.