Mean Shift-Based Multisource Localization Method in Wireless Binary Sensor Network

Source localization is a major research topic in wireless sensor networks and has attracted considerable attention for a long time. In recent years, the wireless binary sensor network (WBSN) has been widely used for source localization due to its high energy efficiency. This paper presents a novel WBSN-based method for multiple source localization. Firstly, a Neyman-Pearson criterion-based sensing model that takes false alarms into account is used to identify the alarmed nodes. Secondly, mean shift and hierarchical clustering are performed on the alarmed nodes to obtain the cluster centers, which serve as the initial locations of the signal sources. Finally, voting matrices are constructed to decide the location of each acoustic source and improve the localization accuracy. The simulation results demonstrate that the proposed method outperforms some traditional methods in accuracy and efficiency.


Introduction
A wireless sensor network (WSN) is a distributed, self-organizing data acquisition network that integrates wireless communication, data capture, and information processing [1]. A WSN mainly consists of a large number of stationary or mobile sensor nodes that form a network in a self-organized or multihop manner [2]. Because it is flexible, inexpensive, and effective, it plays a significant role in industrial and civil applications [3][4][5][6]. Recently, the WSN has become a focus of both academia and industry. Location information is of great significance in many fields, such as intelligent house systems, mobile localization services, and forest fire monitoring. Position estimation has therefore become an important application of WSN [7].
Recently, source localization has been studied extensively and solved in various ways [8,9]. According to the measurement approach, these methods can mostly be categorized into four types: angle of arrival (AOA)-based [10], time of arrival (TOA)-based [11], time difference of arrival (TDOA)-based [12,13], and energy-based [14][15]. AOA-based methods require an antenna array to estimate the angles between the signal sources and the nodes. Both TOA-based and TDOA-based methods depend on a high-precision clock to obtain accurate measurements. These three types of methods are therefore regarded as expensive approaches with demanding hardware requirements [16,17]. The energy-based methods use measurements of received signal strength and an energy decay model to estimate the source location. In contrast, they need only simple hardware, which can be easily realized in practical applications. Therefore, they are considered attractive and have received considerable research attention. In this paper, we focus on the energy-based source localization methods.
Numerous energy-based source localization methods have been proposed. The source localization problem is usually formulated as a maximum likelihood (ML) problem and solved with an optimization algorithm. In [18], Sheng and Hu first presented a model in which the signal energy attenuation is a function of the source-to-sensor distance. Based on this model, the acoustic source localization problem was formulated as an ML estimation problem. A multiresolution search algorithm and an expectation-maximization (EM) iterative algorithm were employed to solve this nonlinear optimization problem. The Cramér-Rao bound (CRB) of the location estimate was used to analyze the influence of sensor placement on the accuracy of the source location estimate. In [19], an optimization to ML (OML) algorithm was proposed, which provides better estimation performance than the traditional ML methods. In [20], an alternating projection (AP) approach was presented to solve the ML estimation problem with lower computational complexity. In [21], Dranka and Coelho designed an effective error estimation model to estimate the source location by taking into account the relationship between the sources and noise samples. In [22], Lu et al. formulated the energy-based multiple-source localization problem as an ML estimation problem. Two algorithms, alternating projection and expectation-maximization, were introduced to solve this multiple-source localization problem. However, these energy-based methods rely on precise measurements between the signal sources and the location-known nodes to estimate the source location. All nodes need to carry out complex calculations to obtain accurate distance measurements, and a large amount of measured data is transmitted from the sensors to the fusion center to complete the final source location estimate.
This process results in high computational complexity and high energy cost. Moreover, when communication in the network is constrained, these operations can hardly be realized [23]. To overcome this problem, binary sensors have been widely used in WSNs. A binary sensor makes a binary decision on whether a signal is present according to its measurements. Unlike other sensors, a binary sensor only sends its ID to the fusion center when a signal is sensed and remains silent otherwise. Binary sensors are therefore a desirable solution when communication and energy are constrained in the network.
To achieve high localization accuracy, a large number of source localization methods have been presented for the wireless binary sensor network (WBSN). Earlier works mainly focus on the single source localization problem. In [24], the authors employed the ML method to estimate a source location in WBSN. In [25], a subtract on negative add on positive (SNAP) algorithm is presented, which achieves almost the same localization performance as the ML estimator with lower computational complexity. In [26], according to the spatial topology of WBSN, the authors design an effective wake-up scheme that activates a series of nodes to collaborate on estimating the location of the source. In recent years, multiple source localization has received considerable research attention, and more and more methods for the multisource localization problem have been proposed. An improved version of the SNAP algorithm (ISNAP) [27] is proposed for estimating the locations of multiple sources. A fuzzy C-means (FCM)-based multisource localization approach is presented in [28]. In this method, the FCM algorithm is first used to estimate the initial locations of multiple sources, and a likelihood matrix is then constructed to improve the localization accuracy for each source. In [29], the authors formulate the multisource localization problem as an ML problem and employ a self-adaptive particle swarm optimization to solve it. In [30], Wang et al. use the affinity propagation (AP) algorithm to obtain the cluster centers of the alarmed nodes. These centers are then merged into the final estimated locations of multiple sources. However, these methods are designed under idealized assumptions and do not fully consider false alarms, which may lead to poor localization results.
In this paper, we investigate the multiple source localization problem in WBSN based on the mean shift [31] cluster analysis algorithm. Firstly, the Neyman-Pearson criterion-based sensing model [32] is applied to judge whether a node is alarmed. Then, we employ the mean shift and hierarchical clustering algorithms to obtain the cluster centers of the alarmed nodes as the initial locations of the signal sources. Finally, voting matrices, which can be regarded as decision schemes, are constructed to estimate the final location of each signal source. Simulation results demonstrate that the proposed method achieves desirable localization results.
The paper is structured as follows. In Section 2, the system model, the sensing model, and the mean shift method are introduced. Section 3 describes the proposed method. Simulation results are given in Section 4. Finally, the paper is concluded in Section 5.

Energy Attenuation Model.
In this part, we introduce the acoustic energy attenuation model for the multiple source localization in WBSN. This model is constructed based on the following assumptions.
(1) There are $N$ acoustic sensor nodes with known coordinates $(x_n, y_n)$, $n = 1, \cdots, N$, and $K$ acoustic signal sources with unknown coordinates $(x_k, y_k)$, $k = 1, \cdots, K$, in a region. These acoustic sensor nodes and acoustic signal sources are uniformly deployed. The acoustic signal is emitted by each source and received by some sensor nodes, with the known energy intensity as the prior information.
(2) The acoustic signal propagation from every acoustic source is consistent in all directions.
(3) The energy intensity from each acoustic source is inversely proportional to the distance between this signal source and the sensor nodes.
The strength of the k-th acoustic source measured at 1 m away is expressed as $I_k(t)$, $k = 1, \cdots, K$. The sensor node $n$ receives the signal from all acoustic sources, and the relevant signal strength during time interval $t$ is denoted as $Z_n$. The acoustic signal attenuation model [28][29] can be formulated as
where $S_n$ is the signal strength received by the n-th sensor node from all acoustic sources, and $d_{n,k} = \sqrt{(x_n - x_k)^2 + (y_n - y_k)^2}$ is the distance between the sensor node $n$ and the acoustic source $k$. $\nu_n(t)$ denotes the measurement noise, which is modeled as additive white Gaussian noise with zero mean, i.e., $\nu_n(t) \sim \mathcal{N}(0, \varphi_n^2)$. The parameter $\xi \in \mathbb{R}^+$ is the environment factor, which can be determined according to the practical environment.
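As a rough illustration of the attenuation model described above, the following Python sketch computes the noisy strength received at one node. It assumes that each source contributes its 1 m strength divided by the distance raised to the environment factor, and that the contributions add up; the function name, the default `xi = 2.0`, and the clamping of distances below 1 m are illustrative assumptions rather than part of the model in this paper.

```python
import math
import random


def received_strength(node, sources, strengths, xi=2.0, noise_std=0.0, rng=random):
    """Noisy signal strength at `node` from all sources (illustrative sketch).

    node      -- (x, y) coordinates of the sensor node
    sources   -- list of (x, y) source coordinates
    strengths -- source strengths measured at 1 m, one per source
    xi        -- environment (decay) factor, assumed value
    noise_std -- standard deviation of the additive Gaussian noise
    """
    total = 0.0
    for (sx, sy), strength in zip(sources, strengths):
        distance = math.hypot(node[0] - sx, node[1] - sy)
        # Strengths are referenced to 1 m, so distances are clamped there
        # to avoid a division blow-up (an assumption of this sketch).
        distance = max(distance, 1.0)
        total += strength / distance ** xi
    # Zero-mean additive Gaussian measurement noise.
    noise = rng.gauss(0.0, noise_std) if noise_std > 0.0 else 0.0
    return total + noise
```

For example, a node at the origin with sources of strength 100 at (10, 0) and strength 50 at (0, 5) receives 100/10^2 + 50/5^2 in the noise-free case.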
In practice, the expected measurement can be obtained by calculating the mean of all energy measurements in a fixed time interval $T = M/f_s$. The average energy $y_n(t)$ measured during the fixed time interval $[t - T/2, t + T/2]$ can be modeled as
where $M$ stands for the number of sample points utilized for estimating the acoustic energy intensity and $f_s$ stands for the sampling frequency.
Generally, in small-scale applications, the intensity of the acoustic signal and the energy emitted from every source are assumed to be stable during a short time interval. Therefore, the signal propagation delay does not need to be considered. A more concise acoustic energy model is in the following form:
where $H_1$ indicates that the sensor node can receive a signal from the sources and $H_0$ indicates that there is no signal. $g_n$ is the gain factor of the n-th sensor node. In this paper, we set $g_n = 1$. $\varepsilon_n(t)$ is the measurement noise, which obeys a Gaussian distribution with zero mean and variance $\sigma_n^2$, i.e., $\varepsilon_n(t) \sim \mathcal{N}(0, \sigma_n^2)$.

Neyman-Pearson Model.
When a sensor senses the acoustic signal, it will alarm with a high probability if the signal source is within the sensing region of the sensor node. Similarly, if the signal source is outside the sensing region of the sensor node, the sensor node will remain silent. Therefore, the sensing model, which reflects the sensing characteristics of the sensors, plays a significant role in WBSN. Many sensing models have been proposed, among which the disk model is one of the most commonly used because of its analytical simplicity. The disk model is a binary sensing model which assumes that the sensing region of a sensor is a circular area centered at the sensor. A signal within the sensing radius of a sensor is sensed with probability 1, while a signal outside this circle is sensed with probability 0. The sensing probability of a signal source $i$ by sensor $n$ can be defined by $P(d_{i,n}) = 1$ if $d_{i,n} \le R$ and $P(d_{i,n}) = 0$ otherwise, where $R$ denotes a sensor's sensing radius and $d_{i,n}$ is the Euclidean distance between signal source $i$ and sensor $n$.
However, the disk model has certain limitations in practical applications due to its unrealistic assumption. Compared with the disk model, the Elfes model can represent the relationship between the signal attenuation and the sensor's sensing capability. The sensing probability of a signal source $i$ by sensor $n$ can be expressed as
$$P(d_{i,n}) = \begin{cases} 1, & d_{i,n} \le d_{T1} \\ e^{-\lambda (d_{i,n} - d_{T1})^{\beta}}, & d_{T1} < d_{i,n} < d_{T2} \\ 0, & d_{i,n} \ge d_{T2} \end{cases}$$
where $d_{T1}$, $d_{T2}$, $\lambda$, and $\beta$ are the parameters associated with the physical properties of the sensor, and $d_{i,n}$ is the Euclidean distance between signal source $i$ and sensor $n$.
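The two sensing models can be compared with a short Python sketch. It assumes the commonly used form of the Elfes model, in which the sensing probability is 1 up to the inner distance, decays exponentially between the inner and outer distances, and is 0 beyond the outer distance; the function and parameter names are illustrative.

```python
import math


def disk_probability(distance, radius):
    """Disk model: certain detection inside the sensing radius, none outside."""
    return 1.0 if distance <= radius else 0.0


def elfes_probability(distance, d_t1, d_t2, lam, beta):
    """Elfes model in its commonly used piecewise form (assumed here)."""
    if distance <= d_t1:
        return 1.0  # close enough for certain detection
    if distance >= d_t2:
        return 0.0  # too far away to be sensed
    # Exponential decay of the sensing capability between d_t1 and d_t2.
    return math.exp(-lam * (distance - d_t1) ** beta)
```

With `d_t1 = 5`, `d_t2 = 20`, `lam = 0.1`, and `beta = 1`, a source at distance 15 is sensed with probability exp(-1), whereas the disk model can only return 0 or 1.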
In this paper, we utilize a sensing model based on the Neyman-Pearson criterion [29] to determine whether a sensor node is alarmed or not. This sensing model takes into account the false alarm rate and the signal characteristics, which makes it more realistic than the disk model and the Elfes model. According to equation (3), the received intensity of the acoustic signal $y_n$ at the n-th sensor node is as follows:
The probability density functions of $y_n$ under the two conditions $H_1$ and $H_0$ are defined as follows:
According to the Neyman-Pearson criterion, the likelihood ratio can be given by
We set the parameter $\eta$ as a threshold. If the likelihood ratio $\Lambda(y_n) \ge \eta$, the condition $H_1$ is accepted. Otherwise, the condition $H_0$ is accepted. Therefore, according to equation (8), we can obtain
The above formulas can be summarized in the following form:
Hence, the following equations can be obtained:
Let $\sigma_1^2 = \sigma_n^2 \left( \sum_{k=1}^{K} S_{n,k} / d_{n,k}^2 \right)^2$. The false alarm rate can be defined by
where $\Phi(\cdot)$ stands for the cumulative distribution function of the standard normal distribution. The sensing probability $P_D$ is expressed as
It is assumed that the false alarm rate $P_F$ is equal to $\alpha$. Therefore, we can obtain the following equations:
Finally, the joint sensing probability can be defined by
Figure 1 illustrates the variation of the sensing probability $P_D$ of a sensor node with the distance for different false alarm rates $\alpha$. The sensing probability $P_D$ varies between 0 and 1. A value of $P_D$ close to 1 means that the sensor node is near the sources. The sensing probability degrades gradually as the distance between the sensor and the source increases. For a fixed distance, the higher the false alarm rate, the higher the detection probability of the sensor node.
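The trend described for Figure 1 can be reproduced with a textbook Neyman-Pearson detector. The sketch below is not the exact sensing model of this paper: it assumes a test between pure Gaussian noise and a known mean signal level in the same noise, with the mean level decaying as the source strength divided by the distance to the power of an assumed decay factor. All names and default values are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and std 1


def detection_probability(mean_signal, sigma, alpha):
    """P_D of a Neyman-Pearson test of N(0, sigma^2) against N(mean_signal, sigma^2).

    alpha is the prescribed false alarm rate, 0 < alpha < 1.
    """
    # Threshold chosen so that P(y > tau | H0) equals alpha.
    tau = sigma * _STD_NORMAL.inv_cdf(1.0 - alpha)
    # Probability that y exceeds tau when the signal is present (H1).
    return 1.0 - _STD_NORMAL.cdf((tau - mean_signal) / sigma)


def mean_signal_at(distance, source_strength=100.0, xi=2.0):
    """Mean received level at a given distance (assumed power-law decay)."""
    return source_strength / distance ** xi
```

Under these assumptions, the detection probability equals the false alarm rate when no signal is present, falls as the distance grows, and rises with the false alarm rate at a fixed distance, which matches the behavior described above.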

Mean Shift Method.
The mean shift method is one of the most classical clustering techniques and has wide applications due to its effectiveness and practicability. In this paper, it is utilized to cluster the alarmed nodes, and the obtained cluster centers are regarded as the initial locations of the signal sources. It is assumed that there are $S$ alarmed nodes with known coordinates $\gamma_n = (x_n, y_n)^T$, $n = 1, \cdots, S$. For a random initial position $\gamma$, the weighted mean of the positions $\gamma_n$ can be obtained as follows:
$$V(\gamma) = \frac{\sum_{\gamma_n \in N(\gamma)} K_M(\gamma_n - \gamma)\, \gamma_n}{\sum_{\gamma_n \in N(\gamma)} K_M(\gamma_n - \gamma)}$$
where $N(\gamma)$ denotes the neighborhood of the initial position $\gamma$ and $K_M(x)$ denotes the kernel function, which is nonnegative and takes the following forms:
The kernel function plays a significant role in reestimating the mean by assigning weights to the neighborhood data. The final weighted mean estimates are obtained by iterating the computation of $V(\gamma)$ until the convergence condition is met. In addition, the random initial estimates need to be set appropriately according to the practical application in order to obtain desirable results.
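The iteration described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a Gaussian kernel with a hypothetical `bandwidth` parameter and lets the kernel weights play the role of the neighborhood (distant points receive negligible weight); the stopping tolerance and iteration cap are also assumptions.

```python
import math


def mean_shift_point(start, points, bandwidth, tol=1e-6, max_iter=200):
    """Shift `start` to a local density mode of `points` (illustrative sketch)."""
    x, y = start
    for _ in range(max_iter):
        weight_sum = sum_x = sum_y = 0.0
        for px, py in points:
            # Gaussian kernel weight of this point relative to the current estimate.
            w = math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2.0 * bandwidth ** 2))
            weight_sum += w
            sum_x += w * px
            sum_y += w * py
        if weight_sum == 0.0:
            break  # no point close enough to contribute; keep the current estimate
        new_x, new_y = sum_x / weight_sum, sum_y / weight_sum
        # Stop when the weighted mean no longer moves noticeably.
        if math.hypot(new_x - x, new_y - y) < tol:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)
```

Running this from several initial estimates and grouping the converged positions that coincide gives the candidate cluster centers; the number of distinct centers depends on the bandwidth.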

Proposed Method
In WBSN, the number of sensor nodes and signal sources, the coordinates of the sensor nodes, and the received energy intensity of each sensor node are important prior information. Each sensor node can be influenced by one or more signal sources at the same time. Generally, a node has a higher alarm probability when it is close to the signal sources. Figure 2 shows the relationship between the alarmed nodes and the signal sources. In this figure, $S_1$ and $S_2$ stand for the signal sources, and the solid and hollow circles stand for the alarmed and nonalarmed nodes, respectively. The alarmed nodes $N_1$ and $N_3$ are strongly influenced by the signal sources $S_1$ and $S_2$, respectively. Both sources strongly influence the alarmed node $N_2$ simultaneously. The node $N_4$ is wrongly alarmed because of factors such as measurement noise and hardware problems. False alarms are a serious challenge for multiple source localization. To localize the sources accurately, a criterion is needed to determine whether a node is alarmed or not. In some previous works, this binary decision depends on the received energy intensity of each sensor node: if the received energy intensity is above a threshold, the sensor node is regarded as an alarmed node; otherwise, it is a nonalarmed node. However, false alarms cannot be effectively restrained with this criterion [27]. In this paper, we utilize a Neyman-Pearson criterion-based sensing model, which takes the false alarm rate into account, to design the criterion. With a constant threshold $\phi$, a sensor node is regarded as an alarmed node when its sensing probability satisfies $P_D \ge \phi$. The false alarm rate $\alpha$ needs to be tuned according to the practical environment to restrain the false alarms.
The alarmed nodes transmit their IDs and location information to the fusion center, which uses this information to calculate the source locations. Although the alarmed nodes have been decided, it is unknown which source each alarmed node belongs to. In this paper, we employ the mean shift method to cluster the alarmed nodes and describe the relationship between them. The mean shift method decides the cluster centers according to the density of the alarmed nodes. The falsely alarmed nodes are usually far away from the areas that contain numerous correctly alarmed nodes. Therefore, the mean shift method can effectively overcome the influence of the false alarms.
We assume that the number of alarmed nodes is $S$ and that their coordinates are $\gamma_n = (x_n, y_n)^T$, $n = 1, \cdots, S$. The weighted means of the positions $\gamma_n$ for the $l$ initial estimates $\gamma_s = (x_s, y_s)^T$, $s = 1, \cdots, l$, can be obtained as follows:
$$V(\gamma_s) = \frac{\sum_{\gamma_n \in N(\gamma_s)} K_M(\gamma_n - \gamma_s)\, \gamma_n}{\sum_{\gamma_n \in N(\gamma_s)} K_M(\gamma_n - \gamma_s)}$$
Finally, $q$ cluster centers $V(\gamma_s)$ are obtained through an iterative process. Since the number of cluster centers produced by the mean shift method is not fixed in advance, we adjust the parameters appropriately to ensure that the number of cluster centers $q$ is larger than or equal to the known number of sources $K$. If $q$ is larger than $K$, we employ the hierarchical clustering algorithm [23] to merge the cluster centers until exactly $K$ centers remain. Otherwise, no operation is carried out. The new cluster centers $V'(\gamma_s)$ are regarded as the initial locations of the signal sources for subsequent processing.
Figure 1: The sensing probability of nodes for different false alarm rates.
Figure 2: The relationship between the alarm nodes and the signal sources.
The subordination degree between the n-th alarmed node and the k-th cluster can be obtained as follows:
where $\|\gamma_n - V'_k\|_2$ stands for the distance between the n-th alarmed node and the k-th cluster center. Hence, $\sum_{k=1}^{K} \mu(n, k) = 1$ holds for the n-th alarmed node. In order to further reduce the influence of the false alarms on localization accuracy, we construct a voting scheme to improve the initial location results for each signal source. This voting scheme consists of the following steps.
Step 1. The sensing area is divided into a grid $\psi$ with $G \times G$ cells and grid resolution $\tau$. For example, a 200 × 200 square area with $G = 20$ has a grid resolution of $\tau = 10$. The resolution needs to be set properly to avoid high computational complexity. Each cell can be regarded as a point. We define $C(i, j)$, $i, j = 1, \cdots, G$, as the centers of these cells in matrix form. Based on these definitions, the $G \times G$ voting matrices $V_m^k$, $k = 1, \cdots, K$, one for each signal source, are designed in the following step.
Step 2. All $K$ voting matrices are initialized to zero matrices. For the k-th signal source, its voting matrix can be defined by
$$V_m^k(i, j) = \sum_{n=1}^{S} \mu(n, k)\, b_n(i, j)$$
where $b_n(i, j)$ is used to measure the probability that the signal source is located at the cell center $C(i, j)$. It is defined in the following form:
$$b_n(i, j) = \begin{cases} 1, & P_D \ge \phi \ \text{and} \ \|C(i, j) - \gamma_n\| \le R_n \\ 0, & \text{otherwise} \end{cases}$$
where $\|C(i, j) - \gamma_n\|$ stands for the distance between the n-th sensor node and the cell center $C(i, j)$, and $R_n$ stands for the sensor's sensing radius. If the center $C(i, j)$ lies within the sensing range of many alarmed nodes, it is the location of the signal source with a high probability. Figure 3 shows an example of the voting matrix. There are three alarmed nodes with the same square sensing radius. When the conditions $P_D \ge \phi$ and $\|C(i, j) - \gamma_n\| \le R_n$ are met, the $\mu(n, k)$ values of the alarmed nodes are added to the corresponding elements of the voting matrix $V_m^k$ for the k-th signal source.
Step 3. We can obtain one or more centers $C(i_{\max}, j_{\max})$ corresponding to the elements of the voting matrix $V_m^k$ which have the maximum value. Let $\varphi(x, y)$ denote the positions of the centers $C(i^k_{\max}, j^k_{\max})$. The average of $\varphi(x, y)$ and the cluster center $V'_k$ is regarded as the final estimated location of the k-th signal source.
Step 4. The same strategy is carried out until all K source localization results are obtained.
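Steps 1 to 3 can be sketched for a single source as follows. The sketch assumes a square area with its origin at a corner, takes the subordination degrees of the alarmed nodes as given inputs, uses one common sensing radius, and reads Step 3 as averaging the mean of the maximum-vote cell centers with the cluster center. These choices and all names are illustrative assumptions.

```python
import math


def refine_source_location(alarmed_nodes, memberships, radius, cluster_center, side, g):
    """Voting-based refinement of one source location (illustrative sketch).

    alarmed_nodes  -- list of (x, y) coordinates of the alarmed nodes
    memberships    -- subordination degree of each alarmed node to this source
    radius         -- common sensing radius of the nodes (assumed equal)
    cluster_center -- initial location (x, y) from the clustering stage
    side, g        -- side length of the square area and number of cells per side
    """
    tau = side / g  # grid resolution
    centers = [[((i + 0.5) * tau, (j + 0.5) * tau) for j in range(g)] for i in range(g)]
    votes = [[0.0] * g for _ in range(g)]  # voting matrix initialized to zero
    for (nx, ny), mu in zip(alarmed_nodes, memberships):
        for i in range(g):
            for j in range(g):
                cx, cy = centers[i][j]
                # A cell inside the sensing range of the node receives its vote.
                if math.hypot(cx - nx, cy - ny) <= radius:
                    votes[i][j] += mu
    best = max(max(row) for row in votes)
    winners = [centers[i][j] for i in range(g) for j in range(g)
               if math.isclose(votes[i][j], best)]
    # Mean position of the maximum-vote cells.
    phi_x = sum(c[0] for c in winners) / len(winners)
    phi_y = sum(c[1] for c in winners) / len(winners)
    # Final estimate: average of the voting result and the cluster center.
    return ((phi_x + cluster_center[0]) / 2.0, (phi_y + cluster_center[1]) / 2.0)
```

Calling this function once per source, with the subordination degrees for that source, corresponds to Step 4. The cost grows with the number of cells times the number of alarmed nodes, which is why the grid resolution should not be chosen too fine.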

Performance Evaluation
In this section, we evaluate the proposed algorithm through simulation experiments. The proposed mean shift-based multiple source localization (MS-MSL) algorithm is compared with the self-adaptive particle swarm optimization (SAPSO) algorithm [26] and the AP-based multiple source localization (AP-MSL) algorithm [27]. We assume a WBSN monitoring area of 100 m × 100 m, in which $N$ sensor nodes and $K$ signal sources are randomly deployed. All default parameters in the following simulation are shown in Table 1. The simulation results of the three algorithms are all obtained through Monte Carlo experiments. The average location error is used to evaluate the localization performance of these algorithms, which is defined as follows:
where $(\hat{x}_{ki}, \hat{y}_{ki})$ indicates the estimated position of the k-th signal source and $(x_k, y_k)$ denotes the real location of the k-th source.
Figure 4 shows the impact of the false alarm rate $\alpha$ on the average location error. As shown in Figure 4, the localization accuracy of the SAPSO and AP-MSL algorithms is similar to that of the proposed algorithm when $\alpha$ is low but drops sharply as $\alpha$ increases, which shows that these two algorithms are sensitive to false alarms. The average location error of all three localization algorithms rises sharply when $\alpha$ increases. Because false alarms are explicitly considered in our localization algorithm, the proposed MS-MSL algorithm performs better than the other algorithms in terms of location accuracy.
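The average location error used in this section can be computed along the following lines. The sketch assumes that the estimates of each Monte Carlo run are already matched to the true sources by index and that the error is the Euclidean distance averaged over all sources and runs; the exact normalization and the function name are assumptions.

```python
import math


def average_location_error(runs, true_sources):
    """Mean Euclidean error over all sources and Monte Carlo runs (sketch).

    runs         -- list of runs, each a list of estimated (x, y) positions
    true_sources -- list of true (x, y) positions, in the same order
    """
    total = 0.0
    count = 0
    for estimates in runs:
        for (ex, ey), (tx, ty) in zip(estimates, true_sources):
            total += math.hypot(ex - tx, ey - ty)
            count += 1
    return total / count
```

For two runs with errors of 5 and 0 in the first run and 0 and 3 in the second, the function returns 2.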
The average location error varies with the standard deviation of the measurement noise, as shown in Figure 5. The average location errors rise sharply as the standard deviation of the measurement noise $\sigma_n$ increases. Compared with the other two algorithms, the proposed MS-MSL algorithm has the lowest localization error under the same standard deviation. We also discuss the relationship between the signal source energy $S_k$ and the average location error. The results in Figure 6 demonstrate that $S_k$ has a great impact on the average location error. The average location errors of the three algorithms all rise sharply when $S_k$ increases. The reason is that the sources cause more and more false alarms as $S_k$ increases, which leads to a high average location error. The proposed algorithm always performs better than the other two algorithms. Figure 7 shows the performance of the localization algorithms as the number of sensor nodes in the area increases. The number of nodes clearly has an important impact on the three positioning algorithms: all of them achieve higher location accuracy as the number of nodes increases. The SAPSO algorithm has the worst performance. The MS-MSL algorithm and the AP-MSL algorithm have similar location performance when a large number of nodes are deployed. Overall, the proposed MS-MSL algorithm outperforms the other two algorithms and always has the highest location accuracy.

Conclusions
In this paper, we presented a novel method for multiple source localization in a wireless binary sensor network. Firstly, we utilize a Neyman-Pearson criterion-based sensing model that takes false alarms into account to decide the alarmed nodes. Secondly, the mean shift algorithm is adopted to estimate the cluster centers of the alarmed nodes. Thirdly, the hierarchical clustering algorithm is employed to merge

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflict of interest.