A Distributed Anchor Node Selection Algorithm Based on Error Analysis for Trilateration Localization

This paper proposes a distributed anchor node selection algorithm based on error analysis for trilateration localization (EATL). The influence of distance measurement error on localization accuracy is discussed from two aspects: condition number of triangle formed by the three anchor nodes and the relative position between the unknown node and the three anchor nodes. Based on the error analysis, three principles for optimizing the selection of anchor nodes are given and then an algorithm for selecting anchor nodes on the ring is proposed.


Introduction
In a distributed sensor network, most applications, such as target tracking and environmental monitoring [1], require the geographical information of the sensor nodes. Estimation of node position is therefore a fundamental requirement in distributed sensor networks. One possible solution is to install a GPS receiver (or a similar system, such as the BeiDou Navigation Satellite System) on each sensor node, but this scheme is limited by the characteristics of the distributed sensor network itself. On the one hand, the cost of a node with a GPS receiver is two orders of magnitude higher than that of an ordinary node [2]. On the other hand, depending on the application, sensor nodes are often deployed indoors, among city buildings, or even in forests, where satellite signals are easily affected by factors such as multipath interference and occlusion [3]. In such environments the localization accuracy is poor, possibly to the point of making the system unusable.
Cooperative localization [4,5] is a new approach to achieving high-accuracy positioning in GPS-denied environments. The basic idea is to use additional information to assist node localization, such as the distance information obtained from communication between sensor nodes or, in a dynamic network, the relative velocity information obtained from Doppler shift measurements. In large-scale distributed networks, because of limited communication capability, nodes can interact only with their neighbors, and all nodes form a connected multihop network. Depending on the application scenario, the node coordinates can be expressed in a standard system, such as latitude and longitude, or in a relative coordinate system. On the one hand, deploying a large number of anchor nodes is expensive. On the other hand, in some applications, such as battlefield environments, anchor nodes can be deployed only around the network. This does not guarantee that every unknown node is adjacent to anchor nodes and obtains enough localization information. Consider a two-dimensional network as shown in Figure 1: anchor nodes are arranged around the network, and unknown nodes are arranged inside it. Initially, the anchor nodes broadcast their coordinate information. Because of the limited communication radius, only a few unknown nodes can obtain enough localization information to estimate their own coordinates. An unknown node that has completed its own localization then acts as an anchor node to help locate other unknown nodes. Through this kind of information exchange between nodes, cooperative localization of the whole network is completed.
Localization techniques for distributed sensor networks can be divided into two categories: range-based and range-free localization algorithms. This paper considers only range-based localization algorithms, which fall into three main categories: Trilateration-based algorithms [6], Maximum Likelihood-based algorithms [7], and Multidimensional Scaling (MDS)-based algorithms. In a Trilateration-based algorithm, an unknown node measures the distances to three neighbor anchor nodes and then uses this information to estimate its location. The specific mathematical model is given in the next section. When there are more than three neighbor anchor nodes, the unknown node must choose three of them to estimate its location. A Maximum Likelihood-based algorithm differs in this respect: when the unknown node has more than three neighbor anchor nodes, it uses the information of all of them to construct an overdetermined system of equations and finds the least-squares solution. MDS algorithms based on distance measurement are divided into two main categories. The first is classical MDS [8,9], which provides relative coordinates for a seemingly nonconvex localization problem using only singular value decomposition. Classical MDS requires the complete Euclidean distance matrix, which is often very difficult to obtain. The second category constructs a stress function [10-12] and uses the SMACOF algorithm [13] to minimize it and obtain the relative coordinate estimate. SMACOF is an iterative method, which leads to high computational complexity. Although Trilateration-based and Maximum Likelihood-based algorithms need a certain number of anchor nodes, their computational complexity is low and they are suitable for cooperative localization of large-scale distributed networks.
The main distance measurement techniques used by range-based localization algorithms are Received Signal Strength Indication (RSSI) [14], Time-of-Arrival (TOA) [15], and Ultrawideband (UWB) [16]. There is a tradeoff between device cost and ranging accuracy: RSSI is cheap but its measurement accuracy may be low, TOA requires time synchronization, and UWB can achieve accuracies on the order of centimeters, but at the expense of high device and energy costs. It should be noted that even in idealized setups with no obstacles or other external factors, relatively small errors from noisy sensor measurements can induce much larger errors in the node position estimate [17]. In Trilateration-based algorithms, the effect of range measurement error on localization accuracy is mainly related to the selection of anchor nodes. In [6], an anchor node selection scheme based on the minimum condition number is proposed to improve the localization accuracy. However, this scheme considers only the influence of the distribution of the three anchor nodes on localization accuracy. In fact, the relative position between the unknown node and the three anchor nodes also has a great impact on localization accuracy. In Trilateration-based algorithms, once an unknown node has estimated its location, it becomes an anchor node that helps locate other unknown nodes. This process produces iterative error: the position error of a newly located node propagates to the nodes that later use it as an anchor, and the accumulated error increases rapidly if the anchor nodes are not selected properly. Therefore, the selection of anchor nodes has an important influence on localization accuracy.
The main contributions of this paper are as follows:
(1) The influence of distance measurement error on localization accuracy is discussed from two aspects: the condition number of the triangle formed by the three anchor nodes and the relative position between the unknown node and the three anchor nodes.
(2) Based on the error analysis, an anchor node selection scheme (EATL) is proposed, which can effectively improve the localization accuracy.
The rest of the paper is organized as follows. The mathematical model of Trilateration localization is provided in Section 2 and the error analysis is provided in Section 3. Based on the analysis in Section 3, an anchor node selection algorithm is given in Section 4. Simulation analysis is provided in Section 5, while Section 6 concludes the paper.

Mathematical Model of Trilateration Localization
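We use $P(x, y)$ to represent the unknown node, as shown in Figure 2; the three anchor nodes within the communication radius of $P$ are $A_1(x_1, y_1)$, $A_2(x_2, y_2)$, and $A_3(x_3, y_3)$. The exact, noise-free Euclidean distances between $P$ and $A_1$, $A_2$, and $A_3$ are $d_1$, $d_2$, and $d_3$, respectively. The coordinates of $P$ can be obtained by solving (1).
$$\begin{cases}(x-x_1)^2+(y-y_1)^2=d_1^2\\ (x-x_2)^2+(y-y_2)^2=d_2^2\\ (x-x_3)^2+(y-y_3)^2=d_3^2\end{cases}\tag{1}$$
We get (2) by using the third formula minus the first formula and the third formula minus the second formula in (1) and rearranging.
$$\begin{cases}2(x_3-x_1)x+2(y_3-y_1)y=d_1^2-d_3^2+x_3^2-x_1^2+y_3^2-y_1^2\\ 2(x_3-x_2)x+2(y_3-y_2)y=d_2^2-d_3^2+x_3^2-x_2^2+y_3^2-y_2^2\end{cases}\tag{2}$$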
In practice, the measured distances are modeled as noisy versions of the actual node distances. For example, if the actual distance between anchor node $A_2$ and $P$ is $d_2$, the measured distance is $\tilde{d}_2 = d_2 + \Delta d_2$, where $\Delta d_2$ denotes the distance measurement error. In this case, the three circles no longer meet at a single point but intersect to form an area, such as the shaded area shown in Figure 3. In Figure 4, the three circles intersect with one another. From elementary geometry, we know that the three common chords of the circle pairs intersect at one point. Solving (2) actually yields the coordinates of this intersection point.
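The linearized model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `trilaterate`, the anchor layout, and the test position are hypothetical, and the rows follow the "third circle minus first/second circle" construction, solved with Cramer's rule.

```python
import math

def trilaterate(anchors, dists):
    """Solve the linearized trilateration system for one unknown node.

    anchors: three (x, y) tuples; dists: the three distances to the unknown node.
    Each row is obtained by subtracting one circle equation from the third one.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    # Coefficient matrix A and right-hand side b of A @ [x, y] = b.
    a11, a12 = 2 * (x3 - x1), 2 * (y3 - y1)
    a21, a22 = 2 * (x3 - x2), 2 * (y3 - y2)
    b1 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    b2 = d2**2 - d3**2 + x3**2 - x2**2 + y3**2 - y2**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchor nodes are collinear; the system is singular")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# Hypothetical noise-free example: the true position is (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_pos = (3.0, 4.0)
dists = [math.dist(true_pos, a) for a in anchors]
print(trilaterate(anchors, dists))
```

With exact distances the solver recovers the true position up to rounding; replacing `dists` with noisy values shows how the estimate moves with the measurement error.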

Error Analysis
When the distance measurement error $\Delta d$ exists, the coordinates of the unknown node obtained by solving (2) will also be inaccurate. Even if $\Delta d$ is small, the coordinates of $P$ obtained from an inappropriate anchor node combination may have large errors. See the example in Table 1, where $(x_i, y_i)$ are the coordinates of the three anchor nodes, the third column is the actual distance between the unknown node and each anchor node, and the fourth column is the measured distance with errors.
Based on the actual distances, we get the exact coordinates of $P$ as (790, 157) by solving (2). From the measured distances, however, the solution is (760, 402). It can be seen from Table 1 that the differences between the actual and measured distances are negligible, yet the solutions differ substantially: by 30 m in the abscissa and by 245 m in the ordinate. Therefore, it is very important to study which factors affect the solution of (2). To simplify the analysis, we rewrite (2) as
$$AX = b, \tag{3}$$
where $A$ is the coefficient matrix, $X = (x, y)^T$ denotes the coordinates of $P$, and $b$ is a column vector. We have
$$A = 2\begin{bmatrix} x_3-x_1 & y_3-y_1\\ x_3-x_2 & y_3-y_2\end{bmatrix},\qquad b=\begin{bmatrix} d_1^2-d_3^2+x_3^2-x_1^2+y_3^2-y_1^2\\ d_2^2-d_3^2+x_3^2-x_2^2+y_3^2-y_2^2\end{bmatrix}. \tag{4}$$
$A$ is completely described by the coordinates of the three anchor nodes and can also be understood as the distribution of the three anchor nodes in the two-dimensional space. The elements of $b$ are completely determined by the coordinates of the three anchor nodes and the distances from the unknown node to the anchor nodes; that is, the relative position of the unknown node and the three anchor nodes has a decisive influence on $b$. In this paper, the error analysis is therefore made from two aspects: the distribution of the three anchor nodes, and the relative position of the unknown node and the three anchor nodes. In Section 3.1.1 we discuss the situation in which only the distance measurement error exists; that is, we assume that the coordinates of the three anchor nodes are exact. In Section 3.1.2 we discuss the situation in which both the anchor node position error and the distance measurement error exist. We assume that the distance measurement error $\Delta d$ is a random variable that obeys a uniform distribution, $\Delta d \sim U(-\delta, \delta)$.

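The Influence of the Distribution of Anchor Nodes on Localization Accuracy
3.1.1. The Influence of Distance Measurement Error. In this paper, we always assume that the matrix norm $\|\cdot\|$ is the 2-norm; that is, $\|A\|_2 = \sqrt{\lambda_{\max}(A^TA)}$. According to the properties of the matrix norm, we have $\|AX\| \le \|A\|\cdot\|X\|$.
When the distance measurement error exists, the error is mainly reflected in the column vector $b$ in (3), which becomes $b + \Delta b$, and, thus, (3) can be rewritten as
$$A(X+\Delta X) = b + \Delta b. \tag{6}$$
Using $AX = b$, it follows that
$$A\,\Delta X = \Delta b,$$
where $\Delta X$ is the error of the solution and
$$\Delta b = \begin{bmatrix} 2d_1\Delta d_1 - 2d_3\Delta d_3 + (\Delta d_1)^2 - (\Delta d_3)^2\\ 2d_2\Delta d_2 - 2d_3\Delta d_3 + (\Delta d_2)^2 - (\Delta d_3)^2\end{bmatrix}.$$
Here we assume that the matrix $A$ is nonsingular, so that $\Delta X = A^{-1}\Delta b$. Since $\|\Delta X\| \le \|A^{-1}\|\cdot\|\Delta b\|$ and $\|b\| \le \|A\|\cdot\|X\|$, we can obtain the following.
$$\frac{\|\Delta X\|}{\|X\|} \le \|A^{-1}\|\cdot\|A\|\cdot\frac{\|\Delta b\|}{\|b\|}. \tag{9}$$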
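As a quick numerical illustration of this kind of bound, the sketch below checks that the relative solution error does not exceed the condition number times the relative right-hand-side error for one hypothetical, nearly singular 2x2 system. The matrix, solution, and perturbation values are assumptions chosen for illustration, not data from the paper, and the helper names are hypothetical.

```python
import math

def solve_2x2(m, v):
    """Solve m @ x = v for a nonsingular 2x2 matrix using Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((v[0] * d - b * v[1]) / det, (a * v[1] - c * v[0]) / det)

def cond_2x2(m):
    """2-norm condition number s_max / s_min of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = m
    # Entries of the symmetric matrix m^T m = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    # Largest eigenvalue of m^T m, i.e. the squared largest singular value.
    s_max_sq = (p + r) / 2 + math.hypot((p - r) / 2, q)
    # s_max * s_min = |det(m)|, so s_max / s_min = s_max^2 / |det(m)|.
    return s_max_sq / abs(a * d - b * c)

# Hypothetical coefficient matrix with nearly parallel rows (nearly collinear anchors).
A = [[200.0, 2.0], [100.0, 1.2]]
X = (30.0, 40.0)
b = (A[0][0] * X[0] + A[0][1] * X[1], A[1][0] * X[0] + A[1][1] * X[1])
delta_b = (0.5, -0.5)                 # small perturbation of the right-hand side
delta_X = solve_2x2(A, delta_b)       # resulting error of the solution

relative_error = math.hypot(*delta_X) / math.hypot(*X)
bound = cond_2x2(A) * math.hypot(*delta_b) / math.hypot(*b)
print(relative_error, bound, relative_error <= bound)
```

The perturbation of `b` is tiny relative to `b` itself, yet the relative error of the solution is several percent, which is the amplification that a large condition number permits.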

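The Influence of Distance Measurement Error and Anchor Nodes Position Error. When both the anchor node position error and the distance measurement error exist, the error is reflected in both the coefficient matrix $A$ and the column vector $b$. Equation (6) should be rewritten as follows.
$$(A+\Delta A)(X+\Delta X) = b+\Delta b.$$
Here $\Delta A$ is expressed as follows.
$$\Delta A = 2\begin{bmatrix}\Delta x_3-\Delta x_1 & \Delta y_3-\Delta y_1\\ \Delta x_3-\Delta x_2 & \Delta y_3-\Delta y_2\end{bmatrix},$$
where $(\Delta x_i, \Delta y_i)$ is the position error of anchor node $A_i$.
We assume that $\|\Delta A\|$ is small and satisfies $\|A^{-1}\|\cdot\|\Delta A\| < 1$. If $\|\Delta A\|$ is very large, the coordinate error of the anchor node itself is very large. Substituting (3) into the perturbed system yields
$$\Delta X = A^{-1}\left(\Delta b - \Delta A\,X - \Delta A\,\Delta X\right).$$
Computing the norm, we have
$$\|\Delta X\| \le \|A^{-1}\|\left(\|\Delta b\| + \|\Delta A\|\,\|X\| + \|\Delta A\|\,\|\Delta X\|\right).$$
And then
$$\left(1-\|A^{-1}\|\,\|\Delta A\|\right)\|\Delta X\| \le \|A^{-1}\|\left(\|\Delta b\| + \|\Delta A\|\,\|X\|\right).$$
So we have the following.
$$\frac{\|\Delta X\|}{\|X\|} \le \frac{\|A^{-1}\|}{1-\|A^{-1}\|\,\|\Delta A\|}\left(\frac{\|\Delta b\|}{\|X\|}+\|\Delta A\|\right).$$
From this inequality and $\|b\| \le \|A\|\cdot\|X\|$, we obtain the following.
$$\frac{\|\Delta X\|}{\|X\|}\le\frac{\|A^{-1}\|\,\|A\|}{1-\|A^{-1}\|\,\|\Delta A\|}\left(\frac{\|\Delta b\|}{\|b\|}+\frac{\|\Delta A\|}{\|A\|}\right).\tag{17}$$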

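The Influence of $\mathrm{cond}(A)$ on Localization Accuracy. From (9) and (17), the upper bound of $\|\Delta X\|/\|X\|$ depends on $\|A^{-1}\|\cdot\|A\|$, which is the condition number $\mathrm{cond}(A)$ of $A$. In (1), we use the third formula minus the first and the second formula, respectively, and then get the coefficient matrix
$$A = 2\begin{bmatrix}x_3-x_1 & y_3-y_1\\ x_3-x_2 & y_3-y_2\end{bmatrix}.$$
Similarly, we can also use the first formula minus the second and the third formula, respectively. Then we will get
$$A' = 2\begin{bmatrix}x_1-x_2 & y_1-y_2\\ x_1-x_3 & y_1-y_3\end{bmatrix},$$
and likewise a third matrix when the second formula is used as the reference. The sum of the condition numbers of these coefficient matrices, which we refer to as the condition number of the triangle formed by the three anchor nodes, indicates the degree of collinearity of the three anchor nodes. From the theory of ill-conditioned matrices, the greater the condition number of $A$ is, the more sensitive the linear system with coefficient matrix $A$ is to $\Delta A$ and $\Delta b$. When the three anchor nodes form an equilateral triangle, the sum of the condition numbers attains its minimal value, 5.1963.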
But in most cases, it is difficult to find three anchor nodes that form exactly an equilateral triangle. To illustrate the influence of the condition number on localization accuracy, we perform the following experiment. As shown in Figure 5, 600 unknown nodes are randomly deployed in a two-dimensional area of size 100 × 100. The coordinates of anchor nodes $A_1$ and $A_2$ are (25, 25) and (75, 25), respectively. The x-coordinate of $A_3$ is 50, and its y-coordinate starts at 27 and is increased in steps of 0.5 m for a total of 100 samples, so that we can observe the influence of the condition number of the triangle formed by the three anchor nodes on localization accuracy. We assume that each unknown node can communicate directly with any anchor node and that the distance measurement error $\Delta d$ obeys the uniform distribution $U(-2, 2)$.
The MAE of the location estimates is given by
$$\Delta p_i = \sqrt{(\hat{x}_i-x_i)^2+(\hat{y}_i-y_i)^2},\qquad \overline{\Delta p}=\frac{1}{N}\sum_{i=1}^{N}\Delta p_i,$$
where $N$ is the number of unknown nodes, $(x_i, y_i)$ are the exact coordinates of unknown node $i$, and $(\hat{x}_i, \hat{y}_i)$ are the estimated coordinates. $\Delta p_i$ denotes the distance between the estimated position and the real position of unknown node $i$, and $\overline{\Delta p}$ denotes the MAE.
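The MAE described above, the mean over all unknown nodes of the Euclidean distance between the estimated and the true position, can be computed as in the following minimal sketch. The function names and the two sample nodes are hypothetical.

```python
import math

def position_errors(true_positions, estimated_positions):
    """Euclidean distance between each true position and its estimate."""
    return [math.dist(t, e) for t, e in zip(true_positions, estimated_positions)]

def mean_absolute_error(true_positions, estimated_positions):
    """Mean of the per-node position errors."""
    errors = position_errors(true_positions, estimated_positions)
    return sum(errors) / len(errors)

# Hypothetical example with two unknown nodes (errors 5 and 2).
true_positions = [(0.0, 0.0), (10.0, 10.0)]
estimated_positions = [(3.0, 4.0), (10.0, 12.0)]
print(mean_absolute_error(true_positions, estimated_positions))  # 3.5
```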
To give a dimensionless form, we define a normalized error $\Delta_r$ that expresses the localization error relative to the distance measurement errors $\Delta d_{ij}$, where $\Delta d_{ij}$ denotes the distance measurement error between unknown node $i$ and anchor node $j$. The relationship between $\Delta_r$ and the condition number is shown in Figure 6. As Figure 6 shows, $\Delta_r$ tends to increase with the condition number, but the relationship is not strictly monotonic. This shows that $\Delta_r$ is not completely determined by the condition number of the triangle formed by the anchor nodes. When the condition number is less than 18, $\Delta_r$ is less than 3.6. In the course of the computer simulations, we find that the condition number of the triangle formed by the three anchor nodes is a function of the interior angles $(\theta_1, \theta_2, \theta_3)$ of the triangle. From this relationship, when the condition number is 18, the minimum interior angle is 13°.
When the shape of a triangle is determined, its condition number is also determined. As shown in Figure 7, the condition number of the isosceles right triangle formed by $A_1$, $A_2$, and $A_3$ is 6.2361, and the condition number of the isosceles right triangle formed by $B_1$, $B_2$, and $B_3$ is also 6.2361.
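The quoted values are consistent with taking the condition number of a triangle to be the sum of the 2-norm condition numbers of the three coefficient matrices obtained by using each anchor in turn as the reference. The sketch below computes that sum under this assumption; the function names and vertex coordinates are hypothetical, and only the Python standard library is used.

```python
import math

def cond_2x2(m):
    """2-norm condition number of a 2x2 matrix (infinite if singular)."""
    (a, b), (c, d) = m
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    s_max_sq = (p + r) / 2 + math.hypot((p - r) / 2, q)
    det = abs(a * d - b * c)
    # s_max / s_min = s_max^2 / |det| because s_max * s_min = |det|.
    return math.inf if det == 0 else s_max_sq / det

def triangle_condition_number(p1, p2, p3):
    """Sum of the condition numbers over the three choices of reference anchor."""
    points = [p1, p2, p3]
    total = 0.0
    for k in range(3):
        ref = points[k]
        others = [points[(k + 1) % 3], points[(k + 2) % 3]]
        rows = [[2 * (ref[0] - o[0]), 2 * (ref[1] - o[1])] for o in others]
        total += cond_2x2(rows)
    return total

equilateral = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
right_isosceles = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(round(triangle_condition_number(*equilateral), 4))
print(round(triangle_condition_number(*right_isosceles), 4))
```

Under this reading the equilateral triangle gives $3\sqrt{3} \approx 5.196$ and the isosceles right triangle gives $4 + \sqrt{5} \approx 6.2361$. Scaling or moving a triangle leaves the result unchanged, which is why two triangles of the same shape share one condition number.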
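Although the condition numbers of the two triangles are the same, their influences on the error $\Delta X$ are different. From $\Delta X = A^{-1}\Delta b$, each component of $\Delta X$ is a combination of the elements of $\Delta b$ weighted by the elements of $A^{-1}$. Because the elements of $A^{-1}$ scale inversely with the size of the triangle, a smaller triangle of the same shape has larger elements in $A^{-1}$. For the data in Table 1, $A^{-1}$ contains large elements, and large elements in $A^{-1}$ lead to a large error in the location estimate.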

Influence of the Relative Positions of the Unknown Nodes and Anchor Nodes on Localization Accuracy.
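In this section, we only discuss the situation in Section 3.1.1; that is, the coordinates of the anchor nodes are precise, and only the distance measurement error exists. According to the expression of $\Delta b$ in Section 3.1.2, when $\|\Delta A\|$ is small, the elements of $\Delta b$ are mainly affected by the relative position of the unknown node and the three anchor nodes, which is similar to the case of $\Delta b$ considered here. When estimating the coordinates of $P$, there are two groups of anchor nodes, $A_1$, $A_2$, $A_3$ and $B_1$, $B_2$, $B_3$, as shown in Figure 8. The triangle $B_1B_2B_3$ is an expansion of the triangle $A_1A_2A_3$. We assume that line $B_1B_2$ is $k$ times as long as line $A_1A_2$.
For a fixed distance measurement error $\Delta d$, we have $\Delta X = A^{-1}\Delta b$ for triangle $A_1A_2A_3$ and $\Delta X_k = A_k^{-1}\Delta b_k$ for triangle $B_1B_2B_3$, where $A_k$ and $\Delta b_k$ are the coefficient matrix and the right-hand-side error of the expanded triangle. Since the expansion scales every side by $k$, $A_k = kA$ and therefore $A_k^{-1} = A^{-1}/k$.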
In fact, when the positions of the three anchor nodes are determined (that is, matrix $A^{-1}$ is determined), the elements of $\Delta b$ are mainly determined by the relative positions of the unknown node and the three anchor nodes. In geometry, the Fermat point of a triangle, also called the Torricelli point or Fermat-Torricelli point, is the point for which the total distance to the three vertices of the triangle is the minimum possible. Therefore $d_1$, $d_2$, and $d_3$ must satisfy $d_1 + d_2 + d_3 \ge d_{\min}$, where $d_{\min}$ is this minimum total distance.
Since the value of $(\Delta d_1)^2 - (\Delta d_3)^2$ is small, we neglect it and assume
$$\frac{\Delta b_1}{2} \approx d_1\Delta d_1 - d_3\Delta d_3,$$
and in the same way we have the corresponding approximation for $\Delta b_2/2$. From these we evaluate $\|\Delta b_1/2\|_2^2$ and $\|\Delta b_2/2\|_2^2$ and obtain (29), which depends on the distances $d_1$, $d_2$, and $d_3$.
To get the minimum value of (29), the problem can be transformed into the optimization problem (30), whose constraints include $d_1 + d_2 + d_3 \ge d_{\min}$.
The objective function of (30) is convex, and the inequality constraints are all linear, so the Kuhn-Tucker (K-T) point of (30) must be the global optimal solution. The K-T point is $d_1 = d_2 = d_3 = d_{\min}/3$.
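To make the K-T argument concrete, the following worked sketch assumes that the objective is the sum of squared distances $d_1^2 + d_2^2 + d_3^2$. This specific objective is an assumption made for illustration; it is one convex function that leads to the stated solution, and it may differ in constant factors from the objective used in (30).

```latex
\begin{aligned}
&\min_{d_1,d_2,d_3}\; f = d_1^2 + d_2^2 + d_3^2
  \quad \text{s.t.}\quad d_{\min} - (d_1 + d_2 + d_3) \le 0,\\
&L = d_1^2 + d_2^2 + d_3^2 + \lambda\,\bigl(d_{\min} - d_1 - d_2 - d_3\bigr),
  \qquad \lambda \ge 0,\\
&\frac{\partial L}{\partial d_i} = 2d_i - \lambda = 0
  \;\Rightarrow\; d_i = \frac{\lambda}{2},\quad i = 1, 2, 3,\\
&\lambda\,\bigl(d_{\min} - d_1 - d_2 - d_3\bigr) = 0,\ \lambda > 0
  \;\Rightarrow\; d_1 + d_2 + d_3 = d_{\min},\\
&\Rightarrow\; d_1 = d_2 = d_3 = \frac{d_{\min}}{3}.
\end{aligned}
```

Here $\lambda > 0$ because $\lambda = 0$ would force $d_1 = d_2 = d_3 = 0$, which violates the constraint whenever $d_{\min} > 0$.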
According to the above analysis, the distances between the three anchor nodes and the unknown node should be as similar as possible.

Design Principle.
The algorithm presented in this paper (EATL) abides by the following three principles:
(1) The minimum internal angle of the triangle formed by the three anchor nodes should be larger than 13°.
(2) The shortest edge of the triangle formed by the three anchor nodes should be as long as possible.
(3) The distances between the three anchor nodes and the unknown node should be as similar as possible.
Based on the above principles, we select the anchor nodes on a ring centered on the unknown node. As shown in Figure 9, $P$ is the unknown node, the nodes marked in red are anchor nodes, $R$ is the communication radius, and $r$ is the inner radius of the ring.
We only select anchor nodes on the ring shown in Figure 9. On the one hand, this reduces the complexity of the algorithm. In Trilateration algorithms, if the unknown node selects the optimal combination among all neighbor anchor nodes, $C_n^3$ calculations will be performed, where $n$ is the average number of neighbor anchor nodes. If the unknown node only selects anchor nodes on the ring, then $n$ is reduced to $n' = n\cdot(R^2 - r^2)/R^2$. On the other hand, it ensures that the distances between the unknown node and the three anchor nodes are as similar as possible, which satisfies principle (3).
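The saving from restricting the search to the ring can be estimated with a short calculation. In this sketch the average neighbor count of 20 is a hypothetical value, the radii are chosen so that the inner radius is 0.6 times the communication radius, and anchors are assumed to be uniformly distributed over the communication disc.

```python
import math

def ring_fraction(inner_radius, comm_radius):
    """Fraction of the communication disc's area covered by the ring."""
    return (comm_radius**2 - inner_radius**2) / comm_radius**2

n = 20                                   # hypothetical average number of neighbor anchors
comm_radius, inner_radius = 200.0, 120.0  # inner radius = 0.6 * communication radius
n_ring = n * ring_fraction(inner_radius, comm_radius)

# Number of 3-anchor combinations before and after restricting to the ring.
print(math.comb(n, 3), math.comb(round(n_ring), 3))
```

With these values the ring covers 64% of the disc, so about 13 of the 20 neighbors remain and the number of candidate triples drops from 1140 to 286.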
The six anchor nodes marked with black circles in Figure 9 are available to $P$. Thus, the 6 anchor nodes give $C_6^3 = 20$ combinations. After narrowing down the selection range of anchor nodes, we select 3 of these 6 anchor nodes according to principles (1) and (2).
For principle (1), the minimum internal angle of the triangle formed by the three anchor nodes should be larger than 13°. This principle reduces the collinearity of the three anchor nodes. To guarantee principle (2), we set a threshold $\mathit{thresh}$ on the shortest side length, as shown in Table 2. Among all combinations satisfying the threshold, we select the one with the largest shortest side length, as shown in Table 2.
It should be noted that the inner radius $r$ and the threshold $\mathit{thresh}$ are determined in Section 5 through simulation experiments, where $r = 0.6R$ and $\mathit{thresh} = 0.8R$.
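The three principles can be combined into a small selection routine. The following is a minimal sketch rather than the flowchart of Figure 10: the function names, the argument layout, and the sample anchor positions are hypothetical, while the defaults of 0.6 and 0.8 times the communication radius and the 13° limit follow the values stated above.

```python
import math
from itertools import combinations

MIN_ANGLE_DEG = 13.0  # principle (1)

def min_interior_angle(p1, p2, p3):
    """Smallest interior angle (degrees) of a non-degenerate triangle."""
    a, b, c = math.dist(p2, p3), math.dist(p1, p3), math.dist(p1, p2)
    angles = []
    for opposite, s1, s2 in ((a, b, c), (b, a, c), (c, a, b)):
        cos_value = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_value)))))
    return min(angles)

def select_anchors(anchors, dists, comm_radius, inner_radius=None, thresh=None):
    """Pick three anchors on the ring; return None if no triple qualifies.

    anchors: list of (x, y); dists: measured distance from the unknown node
    to each anchor, in the same order.
    """
    inner_radius = 0.6 * comm_radius if inner_radius is None else inner_radius
    thresh = 0.8 * comm_radius if thresh is None else thresh
    # Principle (3): keep only anchors on the ring, so distances are similar.
    ring = [a for a, d in zip(anchors, dists) if inner_radius <= d <= comm_radius]
    best, best_side = None, -1.0
    for triple in combinations(ring, 3):
        shortest = min(math.dist(p, q) for p, q in combinations(triple, 2))
        if shortest < thresh:                      # principle (2): side threshold
            continue
        if min_interior_angle(*triple) <= MIN_ANGLE_DEG:   # principle (1)
            continue
        if shortest > best_side:                   # keep the largest shortest side
            best, best_side = triple, shortest
    return best

# Hypothetical layout: unknown node at the origin, communication radius 200.
unknown = (0.0, 0.0)
anchors = [(150.0, 0.0), (-75.0, 130.0), (-75.0, -130.0), (50.0, 0.0), (140.0, 20.0)]
dists = [math.dist(unknown, a) for a in anchors]
chosen = select_anchors(anchors, dists, comm_radius=200.0)
print(chosen)
```

In this layout the anchor at (50, 0) lies inside the inner radius and is ignored, every triple containing both (150, 0) and (140, 20) fails the side threshold, and the nearly equilateral triple is returned. Returning `None` corresponds to the node entering the waiting state.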

Symbol Description.
The main parameters and variables used in the algorithm are shown in Table 2. Table 2 lists the ring shown in Figure 9; the neighbor anchor node set of the unknown node on the ring, which records the distance and location information of the anchor nodes; the number of neighbor anchor nodes of the unknown node; and the localization flag of the unknown node, whose initial value is 0 and which is updated to 1 after localization.

Algorithm Procedure.
In the beginning, the initial anchor nodes broadcast their location information, and the unknown nodes collect the information of their neighbor anchor nodes and measure the distances. Each unknown node then starts executing the anchor node selection algorithm. The algorithm flowchart is shown in Figure 10. After completing the localization procedure, the unknown node sets its localization flag to 1 and becomes an anchor node. Then, it broadcasts its own location information. If the unknown node fails to complete the localization procedure, for example, because no anchor node information is collected or the anchor node information does not satisfy the requirements of the flowchart, it enters the waiting state and waits until it receives enough information to complete the localization procedure.
In a relatively sparse network, some unknown nodes may wait for a long time to find anchor node information that satisfies the requirements of the flowchart. Therefore, depending on the application, a waiting time threshold may be set. When the waiting time threshold is reached and the localization procedure is still not completed, the conditions in the flowchart can be relaxed appropriately; for example, the value of $r$ can be reduced. This may, of course, reduce the localization accuracy.

Simulation Setup.
Our experiments are run on various network topologies in Matlab R2017a. The 200 unknown nodes are placed randomly with a uniform distribution within a 1000 × 1000 square area. The anchor nodes are placed (a) randomly with a uniform distribution around the square area, as shown in Figure 11(a), or (b) randomly with a uniform distribution within the square area, as shown in Figure 11(b). To observe the influence of average network connectivity on localization accuracy, we vary the communication radius $R$ between 200 and 300. We assume that $\Delta d \sim U(-1\%R, 1\%R)$. The performance of the different algorithms is compared using the mean absolute error (MAE) of the location estimates. We also calculate error bars, defined by the standard deviation of the localization errors, to compare the stability of the algorithms.
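The error bars can be computed from the per-node localization errors as in the sketch below. It assumes the population form of the standard deviation (division by the number of nodes); the normalization actually used in the experiments is not recoverable from the text, and the sample error values are hypothetical.

```python
import math

def error_std(errors):
    """Population standard deviation of the per-node localization errors."""
    mean = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))

# Hypothetical per-node errors with mean 5 and standard deviation 2.
errors = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sum(errors) / len(errors), error_std(errors))
```

The standard library function `statistics.pstdev` gives the same result; `statistics.stdev` would give the sample form instead.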
The performance of the proposed algorithm (EATL) is compared with that of the Maximum Likelihood-based (ML) localization algorithm and the minimum condition number-based (FMMC) localization algorithm [6]. In the ML approach, the unknown node to be located requires a minimum of seven neighbor anchor nodes. When comparing the performance of the three algorithms, we mainly simulate the network topology shown in Figure 11(a). In this type of topology, the anchor nodes are placed around the network, so the unknown nodes inside require more iterations to complete localization, which may result in greater iteration error. In many application scenarios, such as the battlefield environment, the anchor nodes can only be placed randomly around the network, so it is more practical to simulate the topology in Figure 11(a).

$r$ and $\mathit{thresh}$. We set the number of anchor nodes to 75, $R = 200$, and $\Delta d \sim U(-2, 2)$. The network topology is shown in Figure 11(a). We vary $r$ from $0.5R$ to $0.75R$ in steps of $0.05R$ (6 values). Similarly, $\mathit{thresh}$ is varied from $0.6R$ to $0.8R$ in steps of $0.05R$ (5 values). Thus, there are a total of 30 combinations of $r$ and $\mathit{thresh}$. We run the experiment to obtain the MAE for every combination. The variation of the MAE with $r$ and $\mathit{thresh}$ is shown in Figure 12. As $r$ and $\mathit{thresh}$ increase, that is, as the constraints become more stringent, the MAE generally trends downward. For example, when $r = 0.5R$ and $\mathit{thresh} = 0.6R$, the MAE is 19.47; when $r = 0.6R$ and $\mathit{thresh} = 0.8R$, the MAE is 3.62. However, as $r$ increases, the area in which unknown nodes select anchor nodes shrinks, which leads to a long waiting time for the unknown nodes to be located, and the entire network requires more iterations. When $r = 0.7R$, it takes more than 10 iterations to complete the localization of the entire network. An increase in the number of iterations means the accumulation of iteration errors. Figure 12 also shows that once $r$ reaches $0.65R$, a further increase of $r$ does not significantly reduce the MAE, while the localization time of the entire network increases significantly. Balancing localization time against localization accuracy, in the following simulations we take $r = 0.6R$ and $\mathit{thresh} = 0.8R$ unless otherwise stated.

MAE Performance.
We set the number of anchor nodes to 75 and $R = 200$ to observe how the MAE of the three algorithms varies with the number of iterations. The network topology is shown in Figure 11(a).
As shown in Figure 13, the MAE of all three algorithms tends to increase with the number of iterations, which is due to the accumulation of errors. According to the error bars of each iteration, the standard deviation of the proposed algorithm is smaller than that of ML and FMMC; that is, the proposed algorithm is more stable.
Overall, the ML algorithm requires 4 iterations to complete the localization and its MAE is 5.07. The FMMC algorithm requires 5 iterations and its MAE is 4.12. The proposed algorithm requires 6 iterations and its MAE is 3.62. Compared with ML and FMMC, the MAE of the proposed algorithm is lower by 28.6% and 12.1%, respectively.
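The two percentages follow directly from the reported MAE values, taking each baseline's MAE as the reference:

```latex
\frac{5.07 - 3.62}{5.07} = \frac{1.45}{5.07} \approx 0.286,
\qquad
\frac{4.12 - 3.62}{4.12} = \frac{0.50}{4.12} \approx 0.121
```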

Impact of Network Connectivity.
We set the number of anchor nodes to 75, and the values of $R$ are 200, 225, 250, 275, and 300, respectively. The network topology is shown in Figure 11(a). Figure 14(a) shows that the network connectivity increases with the communication radius of the nodes. From Figure 14(b), we can see that, for every communication radius tested, the MAE and standard deviation of the proposed algorithm are lower than those of the ML and FMMC algorithms. This shows that the proposed algorithm has good stability and scalability.

Impact of Different Number of Anchor Nodes. We set $R = 200$, and the number of anchor nodes takes the values 20, 25, 30, 35, 40, 45, and 50, respectively. The network topology is shown in Figure 11(b). From Figure 15, as the number of anchor nodes increases, the MAE and standard deviation are gradually reduced. When the number of anchor nodes exceeds 30, a further increase in the number of anchor nodes has no obvious effect on the localization accuracy of the network. When the number of anchor nodes in the network is less than 20, the process takes too long and the localization accuracy is low.

Conclusion
This paper proposes an anchor selection algorithm based on error analysis, starting from an example of an ill-conditioned linear system to show that selecting the right combination of anchor nodes makes a big difference in localization accuracy. The influence of distance measurement error on localization accuracy is discussed from two aspects: the condition number of the triangle formed by the three anchor nodes and the relative position between the unknown node and the three anchor nodes. Then an algorithm for selecting anchor nodes on a ring is proposed. The values of $r$ and $\mathit{thresh}$ are given through simulation experiments. Simulation also shows that the proposed algorithm performs better than the ML and FMMC algorithms in both MAE and standard deviation.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

College) for his valuable comments during the revision of the paper.