Recent years have witnessed growing interest in WLAN fingerprint-based methods for indoor localization because of their cost-effectiveness and availability compared with other localization systems. In such a system, the received signal strength (RSS) values from the access points (APs) are measured at each reference point (RP) in the offline phase and stored as fingerprints. However, signal strength variations across diverse devices are a major problem in this system, especially in crowdsourcing-based localization systems. In this paper, the device diversity problem and its adverse effects are analyzed first. Then, the intrinsic relationship between RSS values collected by different devices is mined using the linear regression (LR) algorithm. Based on this analysis, the LR algorithm is proposed to create a unique radio map in the offline phase and to precisely estimate the user’s location in the online phase. After the LR algorithm is applied in crowdsourcing systems, the device diversity problem is effectively solved. Finally, we verify the LR algorithm through a theoretical study of the probability of error detection. Experimental results in a typical office building show that the proposed method achieves higher reliability and localization accuracy.

In recent years, people’s daily lives have become more and more convenient owing to the development of “5G”, smart cities, and the Internet of Things [

In outdoor environments, the Global Navigation Satellite System (GNSS) is the most prominent positioning technology and provides precise positioning. It has achieved great success and brought great convenience to people’s lives [

Since the RADAR system was proposed in [

Typically, the RSS fingerprint-based WLAN indoor localization system contains two phases: the offline phase and the online phase [

The structure of fingerprinting localization systems shows that RSS values are the foundation of positioning. In most existing experimental systems, researchers build the radio map with one mobile device in the offline phase and compute the user’s location with the same device in the online phase. In an actual localization system, however, this assumption is generally invalid: the mobile devices used by different users differ in both the offline phase and the online phase. In the offline phase, to reduce the labor and time costs of radio map construction, the crowdsourcing method has been proposed in the indoor localization domain, which brings in a variety of distinct mobile devices [

Although establishing a radio map for each device yields the highest positioning accuracy, this method is obviously impractical because of the large number of devices. In [

The main contributions of this paper are as follows.

The linear relationship between the RSS data collected by different devices is proved. We obtain the RSS points by comparing the RSS vectors collected by different devices, and we calculate the slope and the intercept of the straight line determined by any two points. Since all the slopes and intercepts are equal, all the RSS points lie on the same line. Therefore, the relationship between the RSS data collected by different devices is linear.

The fast least trimmed squares (FAST-LTS) algorithm is proposed to eliminate the device diversity problem. Since the relationship between RSS values collected by different devices is linear, all the RSS values can be mapped into the same signal space using the linear regression algorithm. Because outliers appear frequently in the collected RSS values and seriously degrade the performance of the linear least squares (LLS) algorithm, the FAST-LTS algorithm is used in this paper. Simulation results verify the effectiveness of the proposed algorithm, and all the RSS data are mapped into the same signal space.

We derive the probability of error detection for all fingerprints in the radio map. The derived formula shows that the probability of error detection is greater when two fingerprints are closer. Hence, the fingerprints in the set of candidate nearest neighbor fingerprints contribute most of the error and need to be handled carefully.

The rest of the paper is organized as follows. The related works are discussed in Section

To handle the device diversity problem, establishing a radio map for each device yields the highest positioning accuracy. However, this method is obviously impractical because of the large number of devices [

The device diversity problem was first discussed in [

Since collecting labeled RSS values for each mobile device is a labor-intensive and time-consuming process, a semisupervised method is proposed in [

A latent feature space and a regression function are learned at the beginning, and then the signal spaces of all devices are mapped to the latent feature space by the regression function. Accordingly, the differences between different devices have been significantly reduced, and the positioning accuracy has been greatly improved.

In [

Kjærgaard et al. utilized hyperbolic location fingerprinting (HLF) to solve the device diversity problem [

In [

In [

In [

For the RSS fingerprint-based localization method, the localization accuracy greatly depends on the mapping relation between the fingerprint and its corresponding coordinates stored in the radio map. In the offline phase, the localization area is divided into a discrete grid with

Typical WLAN indoor localization system.

In the online phase, the user collects the RSS value

In an actual localization system, the mobile devices used by different users are distinct from each other. Because different mobile devices are equipped with WLAN signal receivers of different performance, they may have different signal sensing capacities and yield different RSS values. To illustrate, we used five different mobile devices to collect RSS values from a single AP at a particular location and plotted the histogram in Figure

RSS values collected by five different devices at the same time and location from the same APs.

We define

By learning

Next, we will explore the relationship between different RSS values collected by different mobile devices. The signal processing diagram of the mobile device is shown in Figure

Signal processing diagram of the mobile device.

We suppose that the transmit power of the AP is

Assume that

Figure

Schematic diagram of relation between data points.

The linear equation determined by two points

The slope and intercept of the equation are

Substituting the values of Eq. (

As can be seen from Eq. (

Assume that the RSS vectors

In practice, the instability of the AP transmit power and the complexity of the indoor electromagnetic environment make the RSS values collected by the mobile device unstable at different times.

Substituting the values of Eq. (

When the indoor environment approaches an ideal environment, which means the difference of AP transmitting power at different times

Similarly, we can get

As a result, the linear relationship between

In an actual indoor environment, the working status of the AP is unstable, and the electromagnetic environment in the room is very complicated. Therefore, the difference

Figure

The slope and intercept between RSS values collected by different devices. (a) Slope diagram. (b) Intercept diagram.
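The pairwise slope and intercept check described in this section can be illustrated with a minimal sketch. The function name `pairwise_lines` and the sample RSS values are hypothetical and are not taken from the experiments; the second device is assumed to read exactly 0.9 times the first device minus 8 dB, so every pair of points should give the same slope and intercept.

```python
from itertools import combinations

def pairwise_lines(xs, ys):
    """Slope and intercept of the line through every pair of points with distinct x."""
    lines = []
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical pair has no finite slope
        k = (y2 - y1) / (x2 - x1)
        lines.append((k, y1 - k * x1))
    return lines

# Hypothetical RSS values (dBm) from two devices at the same four locations.
device_a = [-40.0, -50.0, -60.0, -70.0]
device_b = [-44.0, -53.0, -62.0, -71.0]  # assumed to follow 0.9 * device_a - 8

lines = pairwise_lines(device_a, device_b)
for slope, intercept in lines:
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

With noise-free data, all pairs agree; with measured data, the slopes and intercepts would only cluster around common values, which is why a regression fit is needed.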

In Figure

Linear correlation between RSS values for different devices.

As a result, we apply the linear regression (LR) method as the mapping function in this paper. The LR model is defined by specifying how the signal space of localization device

In this section, a preprocessing procedure is first used to stabilize the acquired RSS values. Then, the RSS values collected by an unknown device are labeled automatically with a rough location estimate using a correlation ratio computed from the Pearson product-moment correlation coefficient. Finally, the linear regression (LR) algorithm is proposed as the mapping function to solve the device diversity problem, and the fast least trimmed squares (FAST-LTS) algorithm is applied to the LR method to make it more robust.
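As a rough illustration of the automatic labeling step, the sketch below picks the radio map fingerprint with the highest Pearson correlation coefficient with an online RSS vector. It uses the coefficient directly rather than the correlation ratio defined in this paper, and the names `pearson`, `rough_label`, the RP identifiers, and all RSS values are hypothetical. The Pearson coefficient is unchanged by a positive linear transform, which is why it can match fingerprints across devices before any mapping has been learned.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant vector carries no correlation information
    return cov / (sx * sy)

def rough_label(online, radio_map):
    """Return the RP identifier whose fingerprint correlates best with the online vector."""
    return max(radio_map, key=lambda rp: pearson(online, radio_map[rp]))

# Hypothetical radio map: RP identifier -> RSS vector (dBm) over four APs.
radio_map = {
    "RP1": [-40.0, -70.0, -60.0, -80.0],
    "RP2": [-75.0, -45.0, -80.0, -55.0],
    "RP3": [-60.0, -60.0, -45.0, -70.0],
}
# Online vector from another device, assumed to read 0.9 * RP2 - 8.
online = [-75.5, -48.5, -80.0, -57.5]

label = rough_label(online, radio_map)
print(label)
```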

The first step in our work is to mitigate the RSS fluctuations caused by the complexity of the indoor environment. Typically, when building the radio map, we measure a large number of RSS values at each RP to eliminate the noise. Let

Preprocessing of RSS values using trimmed average. (a) Traditional average. (b) Trimmed average.
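A minimal sketch of the trimmed average is given below. The 10% trim fraction, the function name `trimmed_mean`, and the sample readings are assumptions for illustration only; the sketch also falls back to the plain average when too few samples remain after trimming, which is one possible way to handle very small sample sets.

```python
def trimmed_mean(samples, trim_fraction=0.1):
    """Average the samples after dropping the lowest and highest trim_fraction of them."""
    if not samples:
        raise ValueError("at least one RSS sample is required")
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)  # number of samples dropped at each end
    # Fall back to all samples when trimming would leave nothing (assumed behavior).
    kept = ordered[k:len(ordered) - k] if len(ordered) > 2 * k else ordered
    return sum(kept) / len(kept)

# Hypothetical readings (dBm) at one RP; -95 stands for a deep fade.
readings = [-62, -61, -63, -62, -60, -61, -62, -63, -61, -95]
plain = sum(readings) / len(readings)
robust = trimmed_mean(readings, trim_fraction=0.1)
print(plain, robust)
```

In this example, the single deep fade pulls the traditional average several dB below the bulk of the readings, while the trimmed average stays close to it.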

Figure

There is also a special case when calculating the trimmed average. Consider the situation in which only one sample reports a valid reading

After completing the preprocessing of RSS data, we use the linear regression method as the mapping function.
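The idea of a linear mapping function can be sketched with an ordinary least squares fit. This is an illustration under assumptions, not the exact procedure of this paper: the names `fit_line` and `map_rss`, the direction of the mapping (test device into reference device space), and the RSS values are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y ~ a * x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx

def map_rss(values, a, b):
    """Apply the linear mapping to a list of RSS values."""
    return [a * v + b for v in values]

# Hypothetical paired RSS values (dBm) measured at the same locations.
reference_device = [-40.0, -50.0, -60.0, -70.0, -80.0]
test_device = [-44.0, -53.0, -62.0, -71.0, -80.0]  # assumed 0.9 * reference - 8

a, b = fit_line(test_device, reference_device)
mapped = map_rss(test_device, a, b)
print(a, b, mapped)
```

Once the coefficients are known, any RSS vector from the test device can be shifted into the signal space of the reference device before fingerprint matching.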

Based on the mapping function in Eq.(

When we get the online RSS values

The fingerprints in

In Eq. (

Assume that the amount of the nearest neighbors in _{,} and

Given the

Compute

Compute the residuals

Sort the absolute values of these residuals,

Arrange the absolute values of the residuals in ascending order, let

Compute

Repeating

In this section, we analyze the probability of error detection in the crowdsourcing indoor localization system. In the offline training phase, the radio map

Using the proposed LR method, the online RSS vector

In this paper, the KNN algorithm is used to estimate the location of the online RSS vector

In the KNN algorithm, we choose the fingerprints with the smallest Euclidean distance as the nearest neighbor of

If the nearest neighbor of the online data

Obviously, in Eq. (

Substituting the values of Eq. (

When a localization error occurs, then

By the triangle inequality,

From Eq. (

Because

In Eq. (

From Eq. (

Then, we can get

At last, the linear regression coefficient between

Substituting the linear regression coefficients into Eq. (

Using Eq. (

For the right side of the inequality in Eq. (

Using the Cauchy-Schwarz inequality,

Therefore,

For the left side of the inequality in Eq. (

Since

Using Eq. (

Assume that the actual nearest neighbor of the online data

Each of the standard normal distribution function

The effectiveness of the proposed LR method is studied and analyzed through experiments and simulations in this section. Figure

Floor plan for indoor localization, where the area colored in yellow is used for testing.

Illustration of the reference point distribution in the area of interest for localization.

To verify the LR method, several RSS values are collected by the test devices and are used to find the candidate fingerprints in the radio map at the beginning. Based on the candidate fingerprints and the measured data, the linear regression coefficients are calculated. Then, the signal space of the radio map and the online RSS values can be mapped to the same signal space, and we can obtain the accurate localization result.
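The last step of this procedure, matching a mapped online vector against the radio map, can be sketched with a plain KNN estimator. The function name `knn_locate`, the coordinates, the RSS values, and the device coefficients (slope 1.0, intercept -10 dB) are hypothetical and chosen so that the unmapped vector is matched to the wrong RP.

```python
import math

def knn_locate(online, radio_map, k=1):
    """Average the coordinates of the k fingerprints closest to the online RSS vector."""
    ranked = sorted(radio_map, key=lambda entry: math.dist(online, entry[1]))
    nearest = ranked[:k]
    x = sum(pos[0] for pos, _ in nearest) / len(nearest)
    y = sum(pos[1] for pos, _ in nearest) / len(nearest)
    return x, y

# Hypothetical radio map: ((x, y) in meters, RSS vector in dBm).
radio_map = [
    ((0.0, 0.0), [-50.0, -60.0, -70.0]),
    ((3.0, 0.0), [-60.0, -70.0, -80.0]),
    ((0.0, 3.0), [-45.0, -75.0, -65.0]),
]

# Online vector taken at (0, 0) by a device assumed to read 1.0 * rss - 10.
a, b = 1.0, -10.0
raw_online = [-60.0, -70.0, -80.0]
mapped_online = [(v - b) / a for v in raw_online]  # invert the assumed device model

print(knn_locate(raw_online, radio_map))     # matches the wrong RP
print(knn_locate(mapped_online, radio_map))  # matches the true RP
```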

Next, we take the (Lenovo, Huawei) pair as an example to illustrate the effectiveness of the LR algorithm. For comparison, the linear regression coefficients are calculated by both the LTS algorithm and the LLS algorithm, and the linear regression functions are shown in Figure

Linear correlation between Huawei and Lenovo.
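The difference between the LLS and LTS fits can be reproduced on synthetic data with the sketch below. It is a simplified version of the FAST-LTS idea (random two-point starts, each refined by refitting on the points with the smallest residuals), not the full algorithm; the function names, the subset size `h`, the number of starts, and the synthetic data with two injected outliers are all assumptions for illustration.

```python
import random

def ols_line(x, y):
    """Ordinary least squares fit of y ~ a * x + b; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return a, my - a * mx

def lts_line(x, y, h=None, n_starts=50, n_csteps=20, seed=0):
    """Simplified least trimmed squares: random two-point starts refined by C-steps."""
    n = len(x)
    h = h if h is not None else n // 2 + 1  # assumed subset size
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue  # a vertical pair cannot define a slope
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        for _ in range(n_csteps):
            # C-step: keep the h points with the smallest absolute residuals, then refit.
            order = sorted(range(n), key=lambda t: abs(y[t] - (a * x[t] + b)))
            subset = order[:h]
            try:
                new_a, new_b = ols_line([x[t] for t in subset], [y[t] for t in subset])
            except ValueError:
                break
            if (new_a, new_b) == (a, b):
                break  # converged
            a, b = new_a, new_b
        # Objective: sum of the h smallest squared residuals.
        cost = sum(sorted((y[t] - (a * x[t] + b)) ** 2 for t in range(n))[:h])
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("no valid starting pair found")
    return best[1], best[2]

# Synthetic pair of devices: y = 0.9 * x - 8, with two injected outliers.
x = [float(v) for v in range(-90, -30, 5)]
y = [0.9 * v - 8.0 for v in x]
y[2] += 25.0
y[9] -= 30.0

a_lls, b_lls = ols_line(x, y)
a_lts, b_lts = lts_line(x, y)
print("LLS:", a_lls, b_lls)
print("LTS:", a_lts, b_lts)
```

On this synthetic data, the two outliers pull the LLS slope well away from 0.9, while the trimmed fit ignores them because more than `h` points lie on the true line.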

To demonstrate the effect of the linear regression algorithm more intuitively, the RSS values before and after applying the LTS algorithm are compared in Figure

Comparison of signal distributions.

After applying the LTS algorithm, the RSS values in the radio map are transformed, and the KNN algorithm (

Comparison of the proposed algorithm with other algorithms.

In Eq. (

Localization performance for different numbers of candidate nearest neighbors.

The probability of error detection in Eq. (

Probability of error detection.
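The qualitative trend, that closer fingerprints are confused more often, can be illustrated with a simplified two-fingerprint model. This sketch is not the formula derived in this paper: it assumes that the online vector equals the true fingerprint plus independent zero-mean Gaussian noise with standard deviation `sigma` on each AP, in which case the probability that the other fingerprint is nearer equals Q(d / (2 sigma)), where d is the Euclidean distance between the two fingerprints. The function name and all values are hypothetical.

```python
import math

def pairwise_error_probability(f_true, f_other, sigma):
    """P(online vector is closer to f_other than to f_true) under i.i.d. Gaussian noise."""
    d = math.dist(f_true, f_other)
    z = d / (2.0 * sigma)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # Gaussian tail function Q(z)

# Hypothetical fingerprints (dBm) and noise level (dB).
f_true = [-50.0, -60.0, -70.0]
f_near = [-52.0, -61.0, -71.0]
f_far = [-70.0, -45.0, -85.0]
sigma = 4.0

p_near = pairwise_error_probability(f_true, f_near, sigma)
p_far = pairwise_error_probability(f_true, f_far, sigma)
print(p_near, p_far)
```

Under this model the nearby fingerprint is confused with the true one far more often than the distant one, which is consistent with treating the candidate nearest neighbors as the dominant source of error.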

In this paper, the linear regression (LR) method is proposed to overcome the device diversity problem in the RSS fingerprint-based WLAN indoor localization system using crowdsourced data. The intuition behind this technique is that the RSS values collected by different devices have a linear relationship. The Pearson correlation coefficient is first used to label the RSS values with a rough location estimate, and the regression coefficients are then calculated by the LTS algorithm. With the LR algorithm, the RSS values collected by distinct devices can be shifted into the same signal space, and the device diversity problem can be solved. We validated the proposed algorithm through a theoretical study of the probability of error detection. Furthermore, we tested the proposed method in a typical office environment, and the experimental results demonstrate that it leads to significant improvements in localization accuracy.

The radio map data used to support the findings of this study were supplied by Liye Zhang under license and so cannot be made freely available. Requests for access to these data should be made to Liye Zhang (

The authors declare that they have no conflicts of interest.

Liye Zhang conceived the study. Liye Zhang and Xiaoliang Meng performed the analysis and experiments. Liye Zhang, Xiaoliang Meng, and Chao Fang reviewed and edited this paper.

This paper is supported by the Shandong Provincial Natural Science Foundation, China (grant number ZR2019BF022) and the National Natural Science Foundation of China (grant number 62001272). Reference [