A Survey on Wireless Transmitter Localization Using Signal Strength Measurements

Knowledge of deployed transmitters' (Tx) locations in a wireless network improves many aspects of network management. Operators and building administrators are interested in locating unknown Txs for optimizing new Tx placement, detecting and removing unauthorized Txs, selecting the nearest Tx to offload traffic onto it, and constructing radio maps for indoor and outdoor navigation. This survey provides a comprehensive review of existing algorithms that estimate the location of a wireless Tx given a set of observations with the received signal strength indication. Algorithms that require the observations to be location-tagged are suitable for outdoor mapping or small-scale indoor mapping, while algorithms that allow most observations to be unlocated trade off some accuracy to enable large-scale crowdsourcing. This article presents an empirical evaluation of the algorithms using numerical simulations and real-world Bluetooth Low Energy data.


1. Introduction
Locating the wireless transmitters (Tx) in the network provides mobile network operators with important and relevant information for a wide range of purposes, including finding rogue and nonfunctional access points (AP), planning and operating the communication networks, and estimating the radio frequency propagation properties of an area. Tx location determination is also used when constructing radio maps for localization services.
Every operator aims at providing good coverage so that subscribers in most locations can access the network. Competition between operators in providing the subscribers with continuous and uninterrupted data usage prompts them to find unknown Txs that mainly belong to their competitors. Based on the knowledge of the deployed Tx locations, operators decide optimal places for installing new infrastructure within the area or steering the beam directions. An unknown Tx can be a WLAN (wireless local area network) AP with unlicensed spectrum or a femtocell AP whose spectrum is licensed to the operator. These Txs may be managed by individuals or groups or the operator itself.
The operators offload users from their 3G or 4G cells to adjacent small cells or indoor femtocells when the traffic becomes heavy [1]. Knowing Txs' locations and coverage areas helps the operators to identify which cells are nearby.
Locating unknown Txs helps the administrators secure the network when security loopholes are detected or when there are intruders that breach the area managed by the administrators [2]. Also, when administrators update their network infrastructure within an authorized area, a map of existing Tx locations helps to determine optimal locations for new Txs.
Moreover, knowledge of Tx locations assists navigation in environments where GNSS (Global Navigation Satellite System) navigation is not feasible, such as indoors. Indoor navigation requires detailed knowledge of the network topology of the building, and unmanaged Txs can also be used provided that their locations are estimated. In many indoor localization studies [3][4][5], it is assumed that Tx locations are known a priori. This assumption is usually only valid for Txs that belong to the owner of the infrastructure.
This survey provides the reader with a comprehensive review of methods for locating wireless Txs using a set of measurements of the received signal strength (RSS). Most of the presented methods can be applied to different types of wireless networks, such as WLAN, Bluetooth Low Energy (BLE), and cellular networks. Figure 1 shows examples of an outdoor cellular base station and an indoor BLE Tx and RSS measurement sets collected in the respective areas. RSS information is available in reception reports of most wireless networks' receivers (Rx) without any special hardware or software modifications [6].
This article categorizes the methods based on two criteria: measurement type and reference location requirement. The measurement type can be the actual RSS from a Tx or just the connectivity, that is, whether the Tx can be sensed or not. Some of the reviewed methods rely on located measurements; that is, they assume that every observation includes accurate information about the location of the measurement. Some methods assume that most of the observations are unlocated, lacking the location information. The former are more accurate but are costly to implement, while the latter are especially suitable for crowdsourcing. We evaluate methods that use located measurements through numerical simulations and real-world BLE data.
The structure of this article is as follows: firstly, the methods are presented in detail, connectivity-only methods first in Section 2, then RSS based methods that use measurements with known locations in Section 3, and finally RSS based methods that do not require all measurements to be located in Section 4. Secondly, experimental results are presented in Section 5, along with a table that summarizes the basic practical properties of each method. Finally, Section 6 presents the conclusions.

2. Connectivity Based Methods with Located Observations
Connectivity based Tx localization algorithms assume that the closer one is to the Tx, the higher is the probability of observing the Tx when listening with a Rx device. The observations consist of tuples $(p_i, \mathrm{ID}_i)$, where $p_i$ is the reference position of the $i$th measurement and $\mathrm{ID}_i$ is the set of Tx identifiers observed at the $i$th measurement. Connectivity actually means that the RSS exceeds the receiver's sensitivity threshold [7]. Thus, connectivity based methods in fact rely on a very coarsely quantized RSS.
The simplest connectivity based Tx localization algorithm is the (unweighted) centroid algorithm that was proposed for the localization of a wireless sensor network's nodes by Bulusu et al. [8]. The centroid algorithm has also been proposed and tested at least in [9][10][11][12][13]. The estimate of the location of the $j$th Tx is the mean of the measurement locations
$$\hat{m}_j = \frac{1}{\#\{i : j \in \mathrm{ID}_i\}} \sum_{i : j \in \mathrm{ID}_i} p_i = \frac{\sum_{i=1}^{N} \llbracket j \in \mathrm{ID}_i \rrbracket \, p_i}{\sum_{i=1}^{N} \llbracket j \in \mathrm{ID}_i \rrbracket}, \tag{1}$$
where $\#A$ is the number of elements in set $A$, $a \in A$ means that $a$ is an element of the set $A$, $N$ is the total number of measurements, and $\llbracket \cdot \rrbracket$ is the indicator function
$$\llbracket C \rrbracket = \begin{cases} 1, & \text{if } C \text{ is true}, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$
The centroid location is the solution of the optimization problem
$$\hat{m}_j = \arg\min_{m_j} \sum_{i=1}^{N} \llbracket j \in \mathrm{ID}_i \rrbracket \, \|p_i - m_j\|^2 \tag{3}$$
because (3) can be expressed as the weighted least squares problem
$$\hat{m}_j = \arg\min_{m_j} \sum_{i=1}^{N} w_i \, (p_i - I m_j)^T (p_i - I m_j), \tag{4}$$
where $w_i = \llbracket j \in \mathrm{ID}_i \rrbracket$ and $I$ is the identity matrix, and (1) follows from (4) by the weighted linear least squares formula. Typically only the measurements where the $j$th Tx has been observed are used in the estimation of $m_j$; that is, the information of not observing the $j$th Tx in a measurement is omitted. In this sense, the centroid also has a probabilistic interpretation as the maximum likelihood solution to the measurement model
$$P(j \in \mathrm{ID}_i \mid p_i, m_j) \propto \exp\!\left(-\tfrac{1}{2} (p_i - m_j)^T \Sigma^{-1} (p_i - m_j)\right), \tag{5}$$
where the positive-definite matrix $\Sigma$ is a constant that does not affect the solution and $P$ denotes probability. Koski et al. [12] also estimate the coverage area parameter matrix $\Sigma$ for the purpose of online mobile Rx positioning.
The algorithm of Piché [14] can be considered a robustified version of the centroid algorithm. This work accounts for outlier measurements in the observation set, caused for example by an erroneous reference position, by relying on the Student's $t$-distribution, which gives a higher probability for occasionally receiving the signal far from the Tx:
$$P(j \in \mathrm{ID}_i \mid p_i, m_j) = c \left(1 + \tfrac{1}{\nu} (p_i - m_j)^T \Sigma^{-1} (p_i - m_j)\right)^{-\frac{\nu + 2}{2}}, \tag{6}$$
where $c$ is a constant that does not affect the solution and $\nu \in \mathbb{R}_+$ is the degrees of freedom model parameter; the closer $\nu$ is to zero, the more robust the algorithm is. Based on the Student's $t$ model, Piché [14] uses an EM (expectation-maximization) algorithm to solve the maximum a posteriori values of the centroid and the coverage area matrix. Figure 2 shows a localization scenario where four out of the 50 measurements are outliers. In this scenario the robust centroid's Tx location estimates are significantly closer to the true location than the conventional centroid algorithm's. Notice that both [12,14] mainly concentrate on online positioning and do not explicitly assume that the Tx is actually located at $m_j$; the estimate rather models the center point of the Tx's coverage area.
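As a minimal sketch of the unweighted centroid estimate (1), assuming the observations are stored as pairs of a 2-dimensional reference position and a set of observed Tx identifiers (the container layout, the function name `centroid`, and the identifiers `tx_a` and `tx_b` are illustrative assumptions, not taken from the cited works):

```python
from typing import Collection, Hashable, Sequence, Tuple

Point = Tuple[float, float]
# One observation: (reference position p_i, set of observed identifiers ID_i).
Observation = Tuple[Point, Collection[Hashable]]


def centroid(observations: Sequence[Observation], tx_id: Hashable) -> Point:
    """Mean of the reference positions of the measurements that observed tx_id."""
    xs, ys = [], []
    for (x, y), ids in observations:
        if tx_id in ids:  # indicator function: 1 if j is in ID_i, else 0
            xs.append(x)
            ys.append(y)
    if not xs:
        raise ValueError(f"Tx {tx_id!r} was never observed")
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Hypothetical toy data: four located measurements and two Txs.
obs = [
    ((0.0, 0.0), {"tx_a"}),
    ((4.0, 0.0), {"tx_a", "tx_b"}),
    ((4.0, 6.0), {"tx_b"}),
    ((2.0, 3.0), {"tx_a"}),
]
print(centroid(obs, "tx_a"))  # (2.0, 1.0)
print(centroid(obs, "tx_b"))  # (4.0, 3.0)
```

Measurements that did not observe the Tx are simply skipped, which mirrors the remark that the information of not observing the Tx is omitted.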
The connectivity based methods rest on two assumptions: the Tx's antenna is omnidirectional, and measurements are collected uniformly in the whole reception area [8,10,19]. As Bulusu et al. [8] point out, the performance of the centroid algorithm is highly dependent on the data. Some studies [10,19] report that the Tx position estimate will be biased towards areas with the highest measurement densities. A possible solution to this problem is to model the thoroughness of the data collection in each location, which would also introduce information on where the Tx is not hearable; this information has been used for mobile Rx localization [7]. Another approach is gridding, that is, clustering the observations in a regular grid so that each grid point represents all the measurements in its vicinity, which can partly mitigate the problem of uneven measurement distribution. Algorithms for detecting insufficient data collection and automatically proposing new measurement locations have been proposed [25].
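The gridding idea can be illustrated with a short sketch. It assumes square grid cells and uses the mean of the measurements in a cell as the cell's representative; both choices, as well as the function names, are assumptions made for illustration rather than the procedure of any cited work.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def grid_representatives(points: Sequence[Point], cell_size: float) -> List[Point]:
    """Replace all points that fall in the same square cell by their mean."""
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))
    reps = []
    for members in cells.values():
        k = len(members)
        reps.append((sum(p[0] for p in members) / k,
                     sum(p[1] for p in members) / k))
    return reps


def mean_point(points: Sequence[Point]) -> Point:
    """Plain centroid of a list of points."""
    k = len(points)
    return (sum(p[0] for p in points) / k, sum(p[1] for p in points) / k)


# Four measurements crowded into one cell and a single measurement far away.
points = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75), (10.5, 0.5)]
print(mean_point(points))                             # (2.5, 0.5)
print(mean_point(grid_representatives(points, 1.0)))  # (5.5, 0.5)
```

In this toy case the plain centroid is pulled towards the densely measured cell, while the gridded centroid weights the two visited cells equally.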
The centroid method is straightforward to understand and implement. The basic centroid algorithm is computationally light, and the robustified version is still computationally feasible for most purposes even though it is a constant factor heavier than the basic centroid. Furthermore, the centroid algorithms have a small number of tunable configuration parameters, which might be advantageous if there is little prior information on the Tx locations and signal propagation models, and there is no risk of overconfident RSS models.
One important property of a method is whether the Tx position estimate can be updated when new observations appear without needing to access all the old observations. For all the presented connectivity based methods, updateability can be achieved with a very low cost; only the point estimate and the number of samples used need to be stored in the database.
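For the unweighted centroid, this updateability can be sketched as a running mean that stores only the current estimate and the sample count. The class name and fields below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningCentroid:
    """Centroid estimate that can be updated without the old observations."""
    x: float = 0.0
    y: float = 0.0
    count: int = 0

    def update(self, px: float, py: float) -> None:
        """Fold one new measurement location into the stored mean."""
        self.count += 1
        self.x += (px - self.x) / self.count
        self.y += (py - self.y) / self.count


est = RunningCentroid()
for px, py in [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]:
    est.update(px, py)
print(est)  # RunningCentroid(x=2.0, y=1.0, count=3)
```

The result equals the batch mean of the three locations, although only two coordinates and one counter are kept between updates.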

3. RSS Based Methods with Located Observations
This section reviews methods that estimate the Tx location using observations that consist of tuples $(p_i, \mathrm{ID}_i, r_i)$, where $p_i$ is the reference location, $\mathrm{ID}_i$ is the list of Tx identifiers, and $r_i$ is the vector of corresponding RSSs.
The RSS is negatively correlated with the distance between the Tx and Rx. Attenuation of the signal strength (path loss, PL) is due to both free space propagation loss governed by the Friis equation and losses generated by various obstructions in the environment [26, Ch. 4]. Accurate modeling of these obstructions is in most practical cases infeasible, so simplifying probabilistic models are commonly used. The conventional probabilistic model in both outdoor and indoor environments is the log-normal shadowing model [26, Ch. 4]
$$r(d) = r^{(0)} - 10\, n \log_{10}\!\left(\frac{d}{d^{(0)}}\right) + \varepsilon, \tag{7}$$
where $r(d)$ is the RSS in dBm (dB referenced to milliwatt) at distance $d$ from the Tx, $d^{(0)}$ is a reference distance, typically 1 m, $r^{(0)} := r(d^{(0)})$ is the RSS at the reference distance, $n$ is the PL exponent parameter, and $\varepsilon \sim \mathcal{N}(0, \sigma^2)$ is a normally distributed shadowing term with variance $\sigma^2$. The environment dependent PL parameters $n$ and $\sigma^2$ are usually estimated for a certain environment or for a certain Tx based on data, while the transmission power dependent parameter $r^{(0)}$ can be assumed to be known based on the Tx properties [26, Ch. 4]. A typical assumption is that the shadowing term $\varepsilon$ is a statistically independent random variable for each measurement, while another possible approach is to assume spatially correlated shadowing; see, for example, the Gaussian process based algorithm [27]. For localization purposes, it is usually adequate to use distances in 2-dimensional Cartesian coordinates; in the localization of a cellular base station tower, for example, the Tx's altitude affects the distance in the proximity of the Tx, but as most data typically comes from farther distances, this effect can be neglected [16].
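The log-normal shadowing model (7) can be sketched in a few lines. The default parameter values (an RSS of -70 dBm at 1 m, a PL exponent of 2, and a shadowing standard deviation of 6 dB) are the ones used in the simulations of Section 5; the function names are assumptions.

```python
import math
import random
from typing import Optional


def expected_rss(d: float, r0: float = -70.0, n: float = 2.0, d0: float = 1.0) -> float:
    """Mean RSS in dBm at distance d (same unit as d0) without shadowing."""
    return r0 - 10.0 * n * math.log10(d / d0)


def simulate_rss(d: float, sigma: float = 6.0,
                 rng: Optional[random.Random] = None, **pl_params: float) -> float:
    """One RSS draw: mean path loss plus zero-mean Gaussian shadowing in dB."""
    rng = rng or random.Random()
    return expected_rss(d, **pl_params) + rng.gauss(0.0, sigma)


print(expected_rss(1.0))    # -70.0
print(expected_rss(10.0))   # -90.0
print(expected_rss(100.0))  # -110.0
```

With a PL exponent of 2, every tenfold increase in distance lowers the mean RSS by 20 dB, which is why distant measurements carry little information about the exact distance once shadowing of several dB is added.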
The signal shadowing consists of the so-called small-scale and large-scale shadowing components, and typically the PL models describe the average of both statistically, as accurate analysis of the multipath propagation patterns that cause the small-scale fading is not feasible in large systems. See further discussion in [28, Ch. 7.2]. Currently most wireless communication networks transmit continuous waveforms, and optimization for impulse signals is out of the scope of this article. It should be noted that in the case of most WLANs, for example, the mapping from the reported RSS indicator to the actual RSS in dBm is unknown. This problem can be circumvented, for example, by using RSS ratios [29] or RSS histograms [30]. The Rx can have one or multiple antennas, and in the latter case the Rx device can either report all the measurements separately or combine them into a single RSS measurement.

3.1. Closed Form Solutions.
A commonly proposed closed form solution for the Tx's position using RSS measurements is the weighted centroid algorithm that was proposed for the localization of wireless sensor nodes by Blumenthal et al. [31]. It has been proposed for WLAN Tx localization, for example, by [9,11]. In the weighted centroid approach, the estimate of the $j$th Tx's location is
$$\hat{m}_j = \frac{\sum_{i : j \in \mathrm{ID}_i} \varphi(r_{i,j}) \, p_i}{\sum_{i : j \in \mathrm{ID}_i} \varphi(r_{i,j})}, \tag{8}$$
where $\varphi$ is a weighting function that depends on the RSS and $r_{i,j}$ is the RSS of the signal transmitted by the $j$th Tx and received at the location $p_i$. Usually the weights are chosen so that the stronger the RSS, the greater the weight. The standard weighting methods are the distance based weighting [31] and the RSS based weighting [11]. In both weighting methods, $g \in \mathbb{R}_+$ is a free parameter and $r_{\min}$ is the signal detection threshold, that is, the lowest possible RSS. These weighting methods are compared for wireless node localization in [11], where it is found that the two methods have equal average performance. The weighted centroid location is the solution of the optimization problem
$$\hat{m}_j = \arg\min_{m_j} \sum_{i : j \in \mathrm{ID}_i} \varphi(r_{i,j}) \, \|p_i - m_j\|^2 \tag{11}$$
because (11) can be expressed as the weighted least squares problem (4) by setting $w_i = \varphi(r_{i,j})$, and (8) follows from (4) by the linear least squares formula. A corresponding probabilistic interpretation is that the weighting function value $\varphi(r_{i,j})$ in the measurements where the $j$th Tx is observed follows the exponential distribution whose scale parameter is inversely proportional to the squared Tx-Rx distance. By the change of variables formula for PDFs (probability density functions) this gives
$$p(r_{i,j} \mid m_j) = c \, \|p_i - m_j\|^2 \exp\!\left(-c \, \|p_i - m_j\|^2 \, \varphi(r_{i,j})\right) |\varphi'(r_{i,j})|,$$
where $c \in \mathbb{R}_+$ is a constant that does not affect the solution. To obtain the objective function of (11) for the maximum likelihood solution, the Rx-Tx distance $\|p_i - m_j\|$ should appear only in the exponent, so it is removed from the normalization constant by modeling the probability of the RSS exceeding the signal detection threshold $r_{\min}$ to be inversely proportional to the squared Rx-Tx distance for distances exceeding a limit.
The comments regarding the unweighted centroid are applicable to the weighted version as well, although modeling of the RSS somewhat reduces the sensitivity to uneven data density. Figure 3 shows a simulated example where the Tx's actual location is not in the middle of its coverage area. In such a case the weighted centroid algorithm outperforms the unweighted centroid due to the RSS measurement information.
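A minimal sketch of the weighted centroid (8) follows. The exact weighting functions of [31] and [11] are not reproduced in the text above, so the sketch assumes a hypothetical RSS based weight equal to the RSS margin above the detection threshold raised to a free exponent; this form is an illustration only and should not be read as the weighting of the cited works.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def weighted_centroid(located_rss: Sequence[Tuple[Point, float]],
                      r_min: float = -120.0, g: float = 1.0) -> Point:
    """Weighted mean of measurement locations.

    The weight (r - r_min) ** g is an assumed illustrative choice: it is zero
    at the detection threshold r_min and grows with the RSS.
    """
    sw = sx = sy = 0.0
    for (x, y), r in located_rss:
        w = max(r - r_min, 0.0) ** g
        sw += w
        sx += w * x
        sy += w * y
    if sw == 0.0:
        raise ValueError("all weights are zero")
    return (sx / sw, sy / sw)


# Two measurements of one Tx: a strong one at the origin and a weak one 10 m east.
data = [((0.0, 0.0), -60.0), ((10.0, 0.0), -100.0)]
print(weighted_centroid(data, g=1.0))  # (2.5, 0.0)
print(weighted_centroid(data, g=2.0))  # (1.0, 0.0)
```

Increasing the exponent concentrates the estimate on the strongest measurements, which is the role the free parameter plays in the tuning described in Section 5.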
Another RSS based closed form solution is proposed by Koo and Cha [15]. Earlier similar formulas have been proposed for wireless sensor networks in [32]. The same formulas are used in [33] for distance measurement based wireless transmitter positioning without the estimation of the signal propagation parameter. Instead of the log-normal shadowing model (7), Koo and Cha [15] use a different, nonlogarithmic PL model (14) in which the RSS is normally distributed; here $\mathcal{N}(x; \mu, \Sigma)$ is the PDF of the (possibly multivariate) normal distribution with mean $\mu$ and covariance matrix $\Sigma$ evaluated at $x$, and $\alpha_j$ and $\beta_j$ are the parameters of this nonlogarithmic PL model, which are not directly related to the PL parameters $r^{(0)}$ and $n$ in (7). (The notation is simplified from [15].) Thus, the difference of two conditionally independent RSSs is also normally distributed. Given the flat prior for the Tx position $m_j$ and the improper prior $p(1/\beta_j) \propto |1/\beta_j|^{-1}$, the posterior of $(m_j, 1/\beta_j)$ is thus given by the standard linear least squares (LLS) formulas. In these formulas, each RSS measurement is used only once to avoid correlations between RSS differences. A strength of this LLS method is the existence of closed form formulas; the method thus has rather low and predictable computational cost, and convergence is not an issue. Addition of prior information on the Tx location is also straightforward. However, if the actual RSS follows the log-normal shadowing model (7), the approximation (14) can be crude.

3.2. Iterative Methods.
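Maximizing the likelihood of the Tx position and possibly some model parameters $\theta$ using the model (7) leads to the nonlinear least squares (LS) problem
$$(\hat{m}_j, \hat{\theta}) = \arg\min_{m_j, \theta} \sum_{i : j \in \mathrm{ID}_i} \left(r_{i,j} - h_i(m_j, \theta)\right)^2, \tag{19}$$
where
$$h_i(m_j, \theta) = r^{(0)} - 10\, n \log_{10}\!\left(\frac{\|p_i - m_j\|}{d^{(0)}}\right) \tag{20}$$
is the model function of one measurement. This optimization problem can be solved using various nonlinear LS methods that are typically iterative algorithms [34]. The general form of the nonlinear LS problem is
$$\hat{x} = \arg\min_{x} \|y - f(x)\|^2, \tag{21}$$
where $y$ is the measurement vector, $f$ is a known nonlinear function, and $\|\cdot\|$ is the Euclidean norm. Many solution methods are based on differentiation, either on the first-order derivative (gradient, Jacobian), such as the steepest descent and Gauss-Newton (GN) methods, or on the second-order derivative (Hessian matrix), such as the Newton method [34]. To the authors' knowledge, the second-order information has not been used in problem (19) because of the difficulty of analytical differentiation. Given an initial point $x_0$, a GN iteration is
$$x_{k+1} = x_k + (J_k^T J_k)^{-1} J_k^T \left(y - f(x_k)\right), \tag{22}$$
where $J_k$ is the Jacobian matrix of the function $f$ evaluated at $x_k$.
The GN method has been applied to problem (19) in [16,17], for example. In this case the model function $f$ is the function $h$ whose $i$th element is
$$h_i(x) = r^{(0)} - 10\, n \log_{10}\!\left(\frac{\|p_i - m_j\|}{d^{(0)}}\right), \quad x = \begin{bmatrix} m_j^T & r^{(0)} & n \end{bmatrix}^T, \tag{23}$$
and the $i$th row of its Jacobian matrix $J$ is
$$[J]_{i,:} = \begin{bmatrix} \dfrac{10\, n}{\ln 10} \dfrac{(p_i - m_j)^T}{\|p_i - m_j\|^2} & 1 & -10 \log_{10}\!\left(\dfrac{\|p_i - m_j\|}{d^{(0)}}\right) \end{bmatrix}. \tag{24}$$
If the parameters $r^{(0)}$ and $n$ are known for a certain environment, the corresponding columns can be left out from the matrix (24). The GN algorithm can sometimes diverge. A less divergence-prone GN version is the Levenberg-Marquardt (LM) algorithm used for Tx localization in [18]. Alternatively, the divergence can be addressed by using an additional line search algorithm that ensures decrease of the objective function value as in [16].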
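The GN iteration for the Tx location can be sketched as follows, under the simplifying assumption that the PL parameters are known, so that only the two location columns of the Jacobian are needed and the normal equations reduce to a 2-by-2 system. The function names, the receiver layout, and the stopping rule are illustrative assumptions; the sketch contains no divergence safeguard such as LM damping or a line search.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def model_rss(m: Point, p: Point, r0: float, n: float) -> float:
    """Noise-free RSS predicted by the log-distance model, with d0 = 1 m."""
    return r0 - 10.0 * n * math.log10(math.hypot(p[0] - m[0], p[1] - m[1]))


def gauss_newton(points: Sequence[Point], rss: Sequence[float], m0: Point,
                 r0: float = -70.0, n: float = 2.0,
                 max_iter: int = 100, tol: float = 1e-6) -> Point:
    """Plain Gauss-Newton for the Tx location with known PL parameters."""
    mx, my = m0
    c = 10.0 * n / math.log(10.0)
    for _ in range(max_iter):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (px, py), r in zip(points, rss):
            dx, dy = px - mx, py - my
            d2 = dx * dx + dy * dy
            if d2 < 1e-12:
                continue  # Jacobian is undefined at a measurement point
            res = r - model_rss((mx, my), (px, py), r0, n)
            jx, jy = c * dx / d2, c * dy / d2  # derivative of h_i w.r.t. m
            a11 += jx * jx
            a12 += jx * jy
            a22 += jy * jy
            b1 += jx * res
            b2 += jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-15:
            break  # normal equations are singular; give up
        sx = (a22 * b1 - a12 * b2) / det
        sy = (a11 * b2 - a12 * b1) / det
        mx, my = mx + sx, my + sy
        if math.hypot(sx, sy) < tol:
            break
    return (mx, my)


# Hypothetical noise-free test case: eight receivers around a Tx at (4, 6).
receivers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0),
             (5.0, 0.0), (0.0, 5.0), (10.0, 5.0), (5.0, 10.0)]
true_tx = (4.0, 6.0)
measured = [model_rss(true_tx, p, -70.0, 2.0) for p in receivers]
estimate = gauss_newton(receivers, measured, m0=(5.0, 5.0))
print(estimate)
```

With noise-free data and an initial point at the centroid of the receivers, the iteration settles at the true location; with noisy data or a poor initial point, the safeguards mentioned above become relevant.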
As pointed out in [35], if the posterior covariance matrix is approximated by the covariance matrix of the linearized model, the estimate can be updated when new measurements are obtained. Including a Gaussian prior distribution $\mathcal{N}(\mu, \Sigma)$ for $x$ keeps the problem as a nonlinear LS problem for which a GN iteration is [35]
$$x_{k+1} = x_k + \left(\Sigma^{-1} + J_k^T R^{-1} J_k\right)^{-1} \left(J_k^T R^{-1} \left(y - f(x_k)\right) + \Sigma^{-1} (\mu - x_k)\right), \tag{25}$$
where $R = \sigma^2 \cdot I$ is the measurement noise covariance matrix, and $\sigma^2$ is the shadowing variance in (7). This iteration enables approximative updating of the estimate without storing the old observations by using the covariance matrix update
$$\Sigma^{+} = \left(\Sigma^{-1} + J^T R^{-1} J\right)^{-1}, \tag{26}$$
where $J$ is the Jacobian matrix evaluated at the final iterate. Notice that if there is enough knowledge of the PL parameters, the Tx location estimate can be outside the observation area, as illustrated by Figure 4.
The GN converges to a local minimum, so the choice of the initial point $x_0$ is important. Proposed choices of $x_0$ in Tx localization are the location of the strongest observation [16], the centroid of all observations [17], or the result of a grid-type algorithm, which is discussed in Section 3.3. A drawback of GN and LM is that if there are several separate areas of strong measurements, the computed Tx location estimate depends strongly on the initial point, so that different strong areas are not compared.
Due to the assumption of normally distributed shadowing, the GN algorithm can be sensitive to outlier measurements, where the RSS differs significantly from the value predicted by the PL model. Outlier removal procedures for tackling this issue have been proposed at least in [36].

3.3. Monte Carlo and Grid Methods.
This section discusses methods that are based on explicit evaluation of the Tx location's PDF at several points of the location space. In grid methods prespecified evaluation points are used, while Monte Carlo (MC) algorithms are based on pseudorandom evaluation points.
Importance sampling is a basic form of MC sampling. Kim et al. [10] use a method where the MC samples $m_j^{(k)}$ of the location of one Tx are generated from a prespecified prior distribution and then given weights $w_j^{(k)}$ based on the training measurements and known PL parameters. Kim et al. do not explain their weighting method, but the formula based on the model (7) is
$$w_j^{(k)} \propto \prod_{i : j \in \mathrm{ID}_i} \mathcal{N}\!\left(r_{i,j};\ r^{(0)} - 10\, n \log_{10}\!\left(\frac{\|p_i - m_j^{(k)}\|}{d^{(0)}}\right),\ \sigma^2\right). \tag{27}$$
The MC estimate of the posterior mean is then the weighted average of the samples
$$\hat{m}_j = \frac{\sum_{k=1}^{N_s} w_j^{(k)} m_j^{(k)}}{\sum_{k=1}^{N_s} w_j^{(k)}}, \tag{28}$$
where $N_s$ is the number of MC samples. This method can also be called a particle filter with the static state model as in [10], since the weights can be updated recursively. A drawback is that importance sampling suffers from sample impoverishment in static state estimation [37, Ch. 3.4]: all weight will over time concentrate on a few samples, and there will be little variability because of the lack of dynamics. This problem can to some extent be overcome by using resampling techniques such as the resample-move algorithm or Markov chain Monte Carlo techniques [37, Ch. 3.4].
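The importance sampling estimate, with weights and weighted average as in (27) and (28), can be sketched as below. The uniform prior over a rectangle, the shadowing standard deviation of 2 dB, the receiver layout, and the function names are assumptions chosen to keep the example small; the weights are handled in the log domain to avoid numerical underflow.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, float]


def importance_sampling(points: Sequence[Point], rss: Sequence[float],
                        bounds: Tuple[float, float, float, float],
                        n_samples: int = 5000, r0: float = -70.0,
                        n: float = 2.0, sigma: float = 2.0,
                        seed: int = 0) -> Point:
    """Posterior-mean estimate of the Tx location from weighted prior samples."""
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = bounds
    samples, log_w = [], []
    for _ in range(n_samples):
        mx, my = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        ll = 0.0  # Gaussian log-likelihood up to an additive constant
        for (px, py), r in zip(points, rss):
            d = max(math.hypot(px - mx, py - my), 1e-6)
            res = r - (r0 - 10.0 * n * math.log10(d))
            ll -= 0.5 * (res / sigma) ** 2
        samples.append((mx, my))
        log_w.append(ll)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]  # rescaled, unnormalized weights
    total = sum(w)
    est_x = sum(wi * s[0] for wi, s in zip(w, samples)) / total
    est_y = sum(wi * s[1] for wi, s in zip(w, samples)) / total
    return (est_x, est_y)


# Hypothetical noise-free data: eight receivers around a Tx at (4, 6).
receivers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0),
             (5.0, 0.0), (0.0, 5.0), (10.0, 5.0), (5.0, 10.0)]
true_tx = (4.0, 6.0)
measured = [-70.0 - 20.0 * math.log10(math.hypot(p[0] - true_tx[0], p[1] - true_tx[1]))
            for p in receivers]
mc_estimate = importance_sampling(receivers, measured, bounds=(0.0, 10.0, 0.0, 10.0))
print(mc_estimate)
```

Because the samples are drawn once from the prior and never moved, most of them receive negligible weight, which is a small-scale view of the sample impoverishment issue noted above.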
For some models a solution to sample impoverishment is Rao-Blackwellization [37, Ch. 3.4], which is proposed for simultaneous mobile Rx and static Tx localization by Bruno and Robertson [21]. They do online Tx localization using a Rao-Blackwellized particle filter (RBPF) so that the training measurements' locations are obtained by inertial positioning. Thus, the distribution of the Rx locations is obtained by MC sampling. The distribution of each Tx location is approximated by a Gaussian mixture for each MC sample and each value of the PL parameter $r^{(0)}$. The PL exponent $n$ is assumed to be known. This gives a recursive algorithm for joint estimation of the Rx and Tx locations. This solution is suitable for cases where the locations of the training measurements are imprecise but form a time series that can be filtered.
Han et al. [19] propose a grid method, where a plane $r = f(p)$ is fitted to the 3-dimensional position-RSS space $(p, r)$ for each grid point. The direction of the gradient of $f$ is then considered as an estimate of the Tx's direction, and the Tx location estimate is defined as the point that minimizes the mean square error of the directions of the grid points. Han et al. use a dense grid method for the minimization, but they also suggest that more efficient optimization tools could be used.
Some authors exploit the fact that the PL parameters appear linearly in the measurement model given the Tx location. Thus, the PL parameters can be fitted analytically to each point of a set of candidate Tx locations. Shrestha et al. [20] make a linear least squares fit of the PL parameters for every measurement, assuming that the Tx is located in the considered measurement location. The Tx estimate is chosen to be the measurement location that minimizes the mean square error of the PL parameter fit. Dependence on the measurement density can be reduced by using a regular grid as the set of candidate points. This will make the algorithm more flexible but also increase the computational complexity. Achtzehn et al. [38] propose a genetic algorithm, but its details are left unexplained. If the PL parameters $r^{(0)}$, $n$, and $\sigma^2$ are assumed known, the Tx location's likelihood can simply be evaluated at each grid point [39]. A grid can also assist the GN or LM algorithm so that each grid point gives an initial point to the iterative algorithm [18].
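The candidate-wise PL parameter fit can be sketched as follows: for each candidate Tx location the model is linear in the reference RSS and the PL exponent, so both are obtained by simple linear regression, and the candidate with the smallest residual sum of squares is selected. The regular 1-meter candidate grid, the receiver layout, and all names are assumptions for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Fit = Tuple[float, float, float]  # (sum of squared errors, r0, n)


def fit_path_loss(candidate: Point, points: Sequence[Point],
                  rss: Sequence[float]) -> Optional[Fit]:
    """Least-squares fit of r = r0 + n * x with x = -10 log10(d), d0 = 1 m."""
    xs = []
    for px, py in points:
        d = math.hypot(px - candidate[0], py - candidate[1])
        if d == 0.0:
            return None  # candidate coincides with a measurement point
        xs.append(-10.0 * math.log10(d))
    k = len(xs)
    x_mean, y_mean = sum(xs) / k, sum(rss) / k
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        return None  # all measurements equidistant; slope is not identifiable
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, rss))
    n_hat = sxy / sxx
    r0_hat = y_mean - n_hat * x_mean
    sse = sum((y - r0_hat - n_hat * x) ** 2 for x, y in zip(xs, rss))
    return (sse, r0_hat, n_hat)


def grid_fit(points: Sequence[Point], rss: Sequence[float],
             candidates: Sequence[Point]) -> Tuple[Point, Fit]:
    """Return the candidate with the smallest fit error and its PL parameters."""
    best = None
    for c in candidates:
        fit = fit_path_loss(c, points, rss)
        if fit is not None and (best is None or fit[0] < best[1][0]):
            best = (c, fit)
    if best is None:
        raise ValueError("no candidate could be fitted")
    return best


# Hypothetical noise-free data: Tx at (4, 6), r0 = -70 dBm, n = 2.
receivers = [(0.5, 0.5), (9.5, 0.5), (0.5, 9.5), (9.5, 9.5),
             (5.5, 0.5), (0.5, 5.5), (9.5, 5.5), (5.5, 9.5)]
measured = [-70.0 - 20.0 * math.log10(math.hypot(px - 4.0, py - 6.0))
            for px, py in receivers]
grid = [(float(gx), float(gy)) for gx in range(11) for gy in range(11)]
best_location, (best_sse, r0_hat, n_hat) = grid_fit(receivers, measured, grid)
print(best_location)  # (4.0, 6.0)
```

Using the measurement locations themselves as the candidate set would give the measurement point-wise variant described above; the regular grid removes the dependence on where the measurements happen to lie at the cost of more candidates to evaluate.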
Grid algorithms can achieve arbitrary modeling accuracy, but the computational complexity will increase rapidly along with the state dimension and grid density. Furthermore, optimal values of critical parameters such as grid density and grid size may vary in different subregions in large-scale systems.

4. Tx Localization with Unlocated Observations
All methods presented thus far rely on a set of measurements with reference locations assumed known accurately or as a probability distribution. However, this assumption is not always realistic, especially in indoor environments, where accurate GNSS services are unavailable and manual entry of reference locations is too laborious, especially for data collected by crowdsourcing. This section reviews algorithms where the Txs' locations relative to other Txs are estimated using unlocated observations and the undirected graph created by connecting Txs that appear in a common observation. The basic assumption is that the more frequently two Txs are observed in the same measurement location, the closer to each other they are probably located. It is also possible to use the RSS: if two Txs' signals are strong in a location, the Txs are probably close to each other. The locations in global coordinates, that is, the correct scaling and rotation of the radio map, are obtained by adding some measurements with reference locations: it is assumed that when a Tx is observed (with a high RSS) in a located measurement, the Tx is probably close to this measurement's location. The principle is illustrated in Figure 5.
Koo and Cha [22] propose multidimensional scaling (MDS). The RSSs in a measurement with more than one Tx determine the dissimilarity between the observed Txs, and the MDS finds the 2-dimensional Tx locations whose mutual distances best agree with the dissimilarity matrix. In [22] the dissimilarity of the mobile Rx and the Tx is defined to be a certain decreasing function of the RSS, and the dissimilarity of two Txs is the smallest sum of the Rx-Tx dissimilarities observed in the same training measurement. The dissimilarities of Txs that are not connected by a common measurement are determined through the other dissimilarities by using a graph construction. Since the dissimilarities are not simple functions of distance and contain noise, the Tx localization is a nonmetric MDS problem, for which iterative algorithms exist [40]. If some reference locations are available, the relative MDS location estimates are transformed to global coordinates by an optimal scaling, rotation, and translation given by Procrustes analysis [22]. A drawback of this algorithm is that if two Txs are located close to each other but the closest training measurement location is far from both, the Koo-Cha dissimilarity will overestimate the distance between the Txs, because the dissimilarity corresponds to the distance via the closest training measurement location. Furthermore, the most natural choice for the mapping from the RSS to dissimilarity would be the exponential relation derived from the log-normal shadowing model (7), which is different from the choice of [22].
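The final alignment step, mapping relative location estimates to global coordinates by scaling, rotation, and translation, can be sketched with a least-squares similarity fit in 2-D. The sketch represents points as complex numbers and additionally tries a mirrored configuration, since relative coordinates are also ambiguous up to reflection; this formulation and all names are assumptions, not the procedure of [22].

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_similarity(rel: Sequence[Point], ref: Sequence[Point]) -> Tuple[complex, complex]:
    """Least-squares a, b such that a * z + b approximates w (rotation, scale, shift)."""
    z = [complex(x, y) for x, y in rel]
    w = [complex(x, y) for x, y in ref]
    k = len(z)
    zc, wc = sum(z) / k, sum(w) / k
    denom = sum(abs(zi - zc) ** 2 for zi in z)
    a = sum((wi - wc) * (zi - zc).conjugate() for zi, wi in zip(z, w)) / denom
    return a, wc - a * zc


def align(rel_all: Sequence[Point], rel_ref: Sequence[Point],
          glob_ref: Sequence[Point]) -> List[Point]:
    """Map all relative points to global coordinates using the reference pairs."""
    best = None
    for mirror in (False, True):
        sign = -1.0 if mirror else 1.0
        flipped = [(x, sign * y) for x, y in rel_ref]
        a, b = fit_similarity(flipped, glob_ref)
        err = sum(abs(a * complex(x, y) + b - complex(gx, gy)) ** 2
                  for (x, y), (gx, gy) in zip(flipped, glob_ref))
        if best is None or err < best[0]:
            best = (err, a, b, sign)
    _, a, b, sign = best
    out = []
    for x, y in rel_all:
        v = a * complex(x, sign * y) + b
        out.append((v.real, v.imag))
    return out


# Hypothetical example: global frame = relative frame scaled by 2,
# rotated by 90 degrees, and shifted by (1, 1).
relative = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
global_refs = [(1.0, 1.0), (1.0, 3.0), (-1.0, 1.0)]  # known for the first three
aligned = align(relative, relative[:3], global_refs)
print(aligned[3])  # approximately (-1.0, 3.0)
```

Three located references suffice here because the relative configuration is exact; with noisy relative estimates, more references spread over the area would make the fit less sensitive to individual errors.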

Raitoharju et al. [23] propose several algorithms that use unlocated data. Based on their tests, they recommend a closed form solution called access point least squares (APLS). The APLS is based on a model (29) in which the Txs $j$ and $k$ are observed in the same measurement, the mean location of the Tx $j$'s located measurements is $\bar{p}_j$, and two constants appear whose values do not affect the solution if no prior distribution is used for the Tx locations. This results in a linear Gaussian measurement model whose solution is the standard linear least squares formula. Raitoharju et al. [23] also propose that the accuracy can be improved at the cost of increased running time by applying a Gauss-Newton method (22), where the log-normal shadowing model with fixed PL parameter values is used so that both Tx locations and mobile Rx locations are unknown. The GN algorithm is more accurate than the APLS due to modeling of the RSS, but multimodality of the posterior distribution can cause convergence to nonglobal extrema [24].
Chintalapudi et al. [24] present a method that relies on a genetic algorithm for finding initial points for iterative optimization methods. In the first phase, all initial points are generated randomly; genetic algorithms are thus Monte Carlo algorithms. The initial points are then treated in a manner that depends on the objective function value (fitness) of the local maxima given by the iterative optimization method for each initial point. The initial points with high fitness are retained, while the initial points with low fitness are replaced by newly generated values, perturbed with added random noise, or mixed by random convex combinations. This cycle is iterated until the solution stops improving. Chintalapudi et al. estimate the mobile user location $p_i$, the Tx location $m_j$, and the PL parameters $r^{(0)}$ and $n$ jointly for each $i$th measurement and $j$th Tx. Chintalapudi et al. use a fitness function that is based on the mean absolute error, but the standard least squares approach of (19) can also be used for more standard modeling and a wider range of optimization methods. The genetic algorithm is capable of finding the global maximum with a much higher probability than a single gradient descent algorithm. The disadvantage is the increased computational burden. Chintalapudi et al. discuss criteria to select a subset of Txs and training data so that computational requirements are somewhat reduced without losing accuracy significantly.

5. Tests
5.1. Simulations. We implemented 11 Tx localization methods with Matlab. We simulated 100 Txs with 250 measurements for each. We generated the measurement points from bivariate normal distributions whose covariance matrices were generated separately for each Tx from the Wishart distribution with three degrees of freedom, $\mathcal{W}((20\,\mathrm{m})^2 \cdot I, 3)$. Each measurement point was then assigned a RSS value generated from the distribution
$$r_{i,j} \sim \mathcal{N}\!\left(-70 - 20 \log_{10}\!\left(\frac{\|p_i - m_j\|}{1\,\mathrm{m}}\right),\ 6^2\right); \tag{30}$$
that is, the used PL parameters are $r^{(0)} = -70$, $n = 2$, and $\sigma = 6$, which are approximately in line with the values $r^{(0)} = -70.39$, $n = 1.32$, and $\sigma = 5.85$ given in [41]. Each Tx localization method that uses measurements with known reference locations is then applied to each measurement set. The parameter values used in the tests were the following. In the robust centroid algorithm the number of EM iterations was five, and the number of degrees of freedom was $\nu = 4$. In the weighted centroid, we set $r_{\min} = -120$ dBm. We optimized the parameter $g$ with a Monte Carlo simulation using 10,000 replications, and the median Tx positioning error as a function of $g$ is shown in Figure 6. Based on this, we set the parameter value to 0.07 for the distance based and 5 for the RSS based weighted centroid. The GN iteration was terminated when the change in the Tx location between two successive iterations was less than 1 mm or after 1000 iterations. The importance sampling used 5000 Monte Carlo samples. In the RSS gradient method the gradients were fitted for each point of the regular grid with 1-meter spacing so that the grid squares that did not have any measurements were removed. The window size of the gradient fitting was chosen according to the advice given in [19]: the window size was increased until at least 30% of the grid points had at least three measurements to fit the gradient. The grid-point-wise PL parameter fitting method was based on a regular grid using 0.75-meter spacing and the square around the strongest RSS measurement with side length 60 m.
The Tx localization error distributions are illustrated in Figure 7. In these boxplots, the asterisks show the maximum and minimum error of the method, and the box levels are 5%, 25%, 50%, 75%, and 95% error quantiles. In the left subplot, the measurement locations are generated from the bivariate normal distributions. In the right subplot, the measurements whose east coordinates are greater than those of the Tx are removed; this test is done to study the robustness of the methods to training data distributions that are not symmetric with respect to the Tx location. Some of the algorithms can be given prior information on the PL parameters. Note that this kind of prior information is not always available in real-world scenarios. The red boxes in Figure 7 show the error distributions when the PL exponent is given the prior $n \sim \mathcal{N}(2, 0.5^2)$. With the importance sampling method, estimation without prior means using a prior with a large variance.
Figure 7: Tx localization error distributions with simulated data. On the left, the measurements come from a point-symmetric bivariate normal distribution, while on the right, the measurements east of the Tx are removed. The red boxes correspond to methods that use prior information of the PL parameters.
Figure 7 shows that when the measurement data distribution is point-symmetric, Gauss-Newton (GN) and grid-point-wise fit (grid-fit) are the most accurate methods. The importance sampling method is very close in accuracy and is flexible, for example, for extensions to non-Gaussian models, but it requires a good prior distribution to produce an efficient importance distribution. The accuracy of the measurement point-wise fit (meas-fit) is limited by the measurement point density and by whether the measured area covers the true Tx location. The gradient method performs well with point-symmetric measurement sets but suffers dramatically when the measurements of an area are removed. A possible reason is that the method is based solely on the measurement geometry; it does not use the logarithmic shape of the propagation model. That is, in the west-east direction there will mainly be arrows pointing east, and this can deteriorate the accuracy in the west-east direction. The linear least squares (LLS) method of [15] suffers from approximating the logarithmic PL model with a linear one; the method seems to fit the linear PL model by overweighting the weak RSSs, which are the majority, and therefore the RSS peak location estimate is biased.
The centroid algorithms that do not use RSSs perform well in accuracy with point-symmetric data distributions. The error is typically slightly higher than that of the GN, but the overall performance can be regarded as competitive considering the simplicity and computational ease of the centroid methods. The centroid methods are robust against deviations from the logarithmic PL model, but especially the nonweighted centroids are sensitive to asymmetric data sets. However, the weighted centroid still has accuracy slightly lower than but comparable with that of the GN. The robust centroid is less accurate than the distance-weighted centroid but slightly more accurate than the nonweighted centroid due to the non-Gaussian coverage area.
In some cases the distribution of RSS is not a function of the distance only; there can, for example, be several RSS peaks, that is, areas governed by strong RSS measurements. These can be due to uneven terrain topology, reflective building materials, or unmapped strong RSS areas, for example. Figure 8 shows the Tx localization error distributions when 20% of the training measurements are generated from a normal distribution with covariance $5^2 \cdot I$ centered at a random point close to the true Tx location. For each measurement point we then generated a first RSS value from the model (30) and a second RSS value from the same model using the random point as the Tx location. We then set the actual RSS measurement to 0.7 times the first value plus 0.3 times the second value. Figure 8 shows that the methods that perform best in the case of a unimodal RSS distribution, that is, weighted centroid and GN, have some large Tx localization errors with a bimodal RSS distribution. Weighted centroid and GN tend to choose one RSS peak: the weighted centroid based on the strongest measurements, and the GN solution based on the initial guess given to the algorithm. Centroid, importance sampling, and point-wise fitting methods give more weight to the whole RSS distribution and do not converge into nonglobal local extrema. Thus, the weighted centroid and GN have median accuracy close to the other methods, but they may require some heuristics to cope with cases with multiple RSS peaks.
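The construction of the bimodal test data can be sketched as follows. The sketch assumes a log-distance PL model with Gaussian shadowing, independent shadowing draws for the two RSS values, and a second peak drawn from a normal distribution around the true Tx; how close the peak is to the Tx is not specified above, so `peak_offset_std` is a hypothetical parameter, as are the function names.

```python
import numpy as np


def log_distance_rss(pos, tx, rng, rss_ref=-70.0, pl_exp=2.0, sigma=6.0):
    """Assumed log-distance PL model with Gaussian shadowing (RSS in dBm)."""
    dist = np.maximum(np.linalg.norm(pos - tx, axis=1), 1e-3)  # avoid log10(0)
    return rss_ref - 10.0 * pl_exp * np.log10(dist) + rng.normal(0.0, sigma, len(pos))


def make_bimodal(pos, tx, rng, frac=0.2, peak_std=5.0, peak_offset_std=10.0):
    """Move a fraction of the points near a second peak and mix two RSS models."""
    pos = np.array(pos, dtype=float)  # work on a copy
    # Random point close to the true Tx (the offset scale is an assumption).
    peak = tx + rng.normal(0.0, peak_offset_std, size=2)
    n_peak = int(round(frac * len(pos)))
    idx = rng.choice(len(pos), size=n_peak, replace=False)
    # 20% of the measurement locations come from N(peak, 5^2 * I).
    pos[idx] = rng.normal(peak, peak_std, size=(n_peak, 2))
    rss_1 = log_distance_rss(pos, tx, rng)    # model with the true Tx location
    rss_2 = log_distance_rss(pos, peak, rng)  # same model with the peak as Tx
    return pos, 0.7 * rss_1 + 0.3 * rss_2, peak
```

The resulting RSS field has its global maximum near the true Tx and a weaker secondary maximum near the random point, which is the situation in which methods that lock onto a single peak can fail.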

Real Bluetooth Low Energy Data.
We installed 82 Bluetooth Low Energy (BLE) Txs in a building on the campus of Tampere University of Technology. The ground truths of the Tx locations were measured relative to some map objects using a measurement tape. Furthermore, we collected measurements of the received BLE signal strengths using a Samsung tablet running Android. The true location related to each RSS measurement was obtained manually by clicking an indoor map figure at each turn and interpolating between the turns. Floor estimation was assumed perfect, so only training data collected on the true floor of each Tx was used. The locations of the Txs and the training measurements are shown in Figure 9.
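The interpolation of reference locations between the clicked turns can be sketched as follows. The sketch assumes that each click and each RSS measurement carries a timestamp and that the collector walks at a constant speed along a straight line between consecutive turns; the function and argument names are hypothetical.

```python
import numpy as np


def interpolate_track(click_times, click_xy, meas_times):
    """Linearly interpolate map coordinates at the RSS measurement times."""
    click_times = np.asarray(click_times, dtype=float)  # must be increasing
    click_xy = np.asarray(click_xy, dtype=float)        # shape (n_clicks, 2)
    x = np.interp(meas_times, click_times, click_xy[:, 0])
    y = np.interp(meas_times, click_times, click_xy[:, 1])
    return np.column_stack([x, y])


# Turns clicked at t = 0, 10, and 20 s; RSS measured at t = 5 and 15 s.
ref = interpolate_track([0, 10, 20], [[0, 0], [10, 0], [10, 5]], [5, 15])
```

Measurements taken before the first click or after the last one are clamped to the nearest clicked point by `np.interp`, so such measurements would have to be discarded or handled separately.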
Figure 10 shows the Tx localization error distributions for the real data test. The results mostly resemble the simulation results with the non-point-symmetric measurement point distribution in Section 5.1. The root-mean-square errors (RMSE) of the methods are given in Table 1.
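For reference, the RMSE of a method over a set of Txs can be computed as in the following sketch, where the estimated and true Tx locations are assumed to be given as arrays of two-dimensional coordinates; the function name is hypothetical.

```python
import numpy as np


def localization_rmse(estimates, truths):
    """Root-mean-square of the Euclidean Tx localization errors."""
    diff = np.asarray(estimates, dtype=float) - np.asarray(truths, dtype=float)
    err = np.linalg.norm(diff, axis=1)  # one error per Tx
    return float(np.sqrt(np.mean(err**2)))
```

Because the errors are squared before averaging, the RMSE is more sensitive to a few large errors than the median shown in the boxplots.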

Concluding Remarks
This paper reviews and tests mathematical models and methods for wireless transmitter localization based on received signal strength information. Empirical comparison results using simulated and real-world data are provided. The key features of each presented method are summarized in Table 2. Note that the column accuracy refers to how accurately the method can be adapted to the assumed signal model, such as the path loss model; the real-world localization error can depend on the details of the scenario. Updateability means that an algorithm for recursive updating without storing the entire training database has been proposed. The methods can be categorized based on what information they use: RSS or only connectivity, with or without known reference position. The methods that require reference positions are suitable for so-called wardriving, that is, outdoor network surveying where GNSS provides reference positions, or for small-scale indoor mapping. The unlocated methods trade off some accuracy to enable large-scale crowdsourcing even in GNSS-less environments. Computational efficiency and ease of updating the estimate without storing large training databases are crucial in large-scale applications.

Figure 1: An outdoor cellular base station (a) and an indoor BLE Tx (b) with RSS measurement sets. Red color indicates a strong RSS.

Figure 3: When the Tx is not located in the middle of its coverage area, RSS-based algorithms such as the distance-weighted centroid can outperform the unweighted centroid.

Figure 4: Centroid algorithms can suffer from biased sampling more than PL model based nonlinear methods such as the GN. The colored dots are measurement locations in a 2-dimensional map. A simulated example.

Figure 5: Unlocated measurements connect several Txs and thus give information on how close each Tx is to other Txs. Located measurements give information on the observed Txs' locations in global coordinates. Combining both measurement types gives an optimal estimate of the Tx location coordinates.

Figure 6: Optimization of the weighted centroid's weighting parameter for the distance-based (black) and RSS-based (grey) weighted centroid algorithms. The values chosen for the further tests were 0.07 for the distance-based and 5 for the RSS-based algorithm.

Figure 9: Measurement and Tx locations in the test data set.

Figure 10: Tx localization error distributions with real BLE data.

Table 1: Tx localization RMSEs with real BLE data.