Support Vector Regression Based Indoor Location in IEEE 802.11 Environments

The wide spread of 802.11-based wireless technology creates a good opportunity for indoor positioning systems. In this paper, we present a new 802.11-based indoor positioning method using support vector regression (SVR), which consists of an offline training stage and an online location stage. The model that describes the relation between the position and the received signal strength (RSS) of the mobile device is established at the offline training stage by SVR, and at the online location stage the exact position is determined by this model. Due to the complex indoor environment, RSS is vulnerable and changeable. To address this issue, data filtering rules obtained through statistical analysis are applied at the offline training stage to improve the quality of the training samples and thus the quality of the prediction model. At the online location stage, k-times continuous measurement is utilized to obtain high quality RSS input, which guarantees consistency with the training samples and improves the position accuracy of mobile devices. Performance evaluation shows that the proposed method has a higher positioning accuracy than the probabilistic and neural network methods, while its demand for storage capacity and computing power is low.


Introduction
Location awareness is one of the keys to the success of pervasive computing. All kinds of pervasive applications, such as behavior recognition, intelligent medication, and intelligent buildings, must obtain accurate position information of the mobile devices/users to provide accurate and timely services for these devices/users [1][2][3].
Nowadays, GPS [4] dominates the market of location systems. However, GPS cannot function well in indoor environments because its radio signal suffers from the influence of buildings, walls, and so forth. In order to provide indoor positioning service, many researchers have proposed a variety of indoor positioning systems based on different technologies. The IEEE 802.11-based positioning system is one of the most successful because 802.11-compliant hardware is inexpensive and already available in many buildings. Existing research results have shown impressive position accuracy.
Many IEEE 802.11-based indoor positioning systems utilize a so-called fingerprinting approach, which uses a two-stage mechanism: an offline training phase and an online position determination phase. During the offline phase, the signal strength distributions collected from the wireless access points (APs) at predefined reference points in the operation area are stored together with their physical coordinates in a database. During the online position determination phase, mobile devices sample the signal strength of the APs in their communication range and search for similar patterns in the database. The closest match is selected and its coordinates are returned to the mobile device as the estimated position.
IEEE 802.11 uses the industrial, scientific, and medical (ISM) band at 2.4 GHz or 5 GHz, and the predominant substandards are the 2.4 GHz based 802.11b/g/n. Water absorbs radio energy at 2.4 GHz, which is why the human body, which contains water, can disturb the transmission of these radio signals. Moreover, due to the complex indoor environment, wireless channel congestion, obstructions, and the limited communication range of nodes, the received signal strength (RSS) is vulnerable and changeable in practical applications. The accuracy of positioning algorithms suffers seriously from this vulnerable and changeable RSS information. In addition, most of the existing positioning methods require mobile devices to store the fingerprints, which demands higher storage capacity and computing power from mobile devices in large-scale applications.
In this paper, we present a novel IEEE 802.11-based indoor positioning method using support vector regression (SVR), which also consists of an offline training stage and an online location stage. The accurate position prediction model is obtained at the offline training stage by SVR, and at the online location stage the exact position is determined according to the received signal strength (RSS) of the mobile devices. To address the vulnerable and changeable RSS information, data filtering rules obtained through statistical analysis are applied at the offline training stage to improve the quality of the training samples and thus the quality of the prediction model. At the online location stage, k-times continuous measurement is utilized to obtain high quality input, that is, the online received signal strength, which guarantees consistency with the training samples and improves the position accuracy of mobile devices. Performance evaluation and comprehensive analysis are carried out through intensive experiments. The results show that the proposed method has a higher positioning accuracy than the probabilistic positioning method and the neural network positioning method, while its demand for the storage capacity and computing power of the mobile devices is low.
The remainder of this paper is structured as follows. A summary of related work is presented in the following section. Section 3 describes our positioning method in detail, including SVR-based location modeling, offline training sample filtering, online k-times continuous measuring, and input preprocessing. The performance evaluation and analysis are given in Section 4. Finally, Section 5 draws the conclusions.

Existing Position Methods in IEEE 802.11-Based Indoor Positioning Systems.
The fingerprinting approach is the most commonly used location method in 802.11-based indoor positioning systems. A variety of methods have been proposed to find the closest match in the fingerprints, including k-nearest neighbor (KNN) [5,6], the smallest vertex polygon method [7], the probabilistic method [8], neural networks [9], and decision trees [10].
In the KNN method proposed by Bahl and Padmanabhan [5], fingerprints are collected by deploying the mobile devices at each predefined reference point in the operation area. Then the k nearest positions are found based on the least squares method during the online positioning phase. Finally, the location of the target is determined as the average of these k nearest positions. To improve the accuracy of the KNN positioning method, Bhasker et al. [6] considered the users' feedback during the online location period. The smallest M-vertex polygon method [7] is similar to the KNN method; the two are the same during the offline training phase. During the online positioning period, for each of the M dimensions of the signal vector collected by real-time detection, the smallest M-vertex polygon method takes the location unit whose signal strength value is closest to the measured value as a candidate vertex, selects the polygon with the smallest perimeter among all of the M-vertex polygons, and averages these M positions as the predicted result.
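As a minimal sketch of the KNN idea described above (not the implementation of [5]), the following Python code averages the coordinates of the k reference points whose stored RSS vectors are closest to the observed vector. The radio map, the function name `knn_locate`, and all RSS values are hypothetical and chosen only for illustration.

```python
import math

def knn_locate(fingerprints, observed, k=3):
    """Estimate a position as the mean of the k reference points whose
    stored RSS vectors are closest (Euclidean distance) to `observed`."""
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[1], observed))
    nearest = ranked[:k]
    xs = [pos[0] for pos, _ in nearest]
    ys = [pos[1] for pos, _ in nearest]
    return (sum(xs) / len(nearest), sum(ys) / len(nearest))

# Hypothetical radio map: ((x, y) in meters, [RSS from AP1, AP2, AP3] in dBm)
radio_map = [
    ((0.0, 0.0), [-40, -70, -80]),
    ((3.0, 0.0), [-55, -60, -75]),
    ((0.0, 3.0), [-60, -75, -60]),
    ((3.0, 3.0), [-70, -62, -58]),
]

# The two closest fingerprints are the first two entries of the radio map.
estimate = knn_locate(radio_map, [-42, -69, -79], k=2)
```

Note that every query scans the whole radio map, which is the storage and search cost that motivates a compact regression model.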
The probabilistic method [8] divides the entire positioning scene into a series of discrete areas, $L_1, L_2, \ldots, L_q$. The RSS distribution of each area is obtained through training during the offline phase (usually stored as a histogram or as a Gaussian model). During the online positioning phase, the observed signal strength vector $S$ is obtained, and the location is determined by the following rule: if $P(L_i \mid S) > P(L_j \mid S)$ for all $j \neq i$, where $P(L_i \mid S)$ is the probability that the mobile node is located in $L_i$ given that the signal vector $S$ is observed, then choose $L_i$ as the location of the mobile node. According to the principle of the Bayesian posterior distribution, $P(L_i \mid S)$ can be calculated as
$$P(L_i \mid S) = \frac{P(S \mid L_i)\,P(L_i)}{P(S)},$$
where $P(L_i)$ is the probability of the mobile node being in location $L_i$, which in practical applications is usually assumed to follow a uniform distribution over the areas; $P(S \mid L_i)$ can be calculated from the RSS distribution of each area stored during the offline phase; and $P(S) = \sum_{j=1}^{q} P(S \mid L_j)\,P(L_j)$. However, this method can only obtain discrete position information. Therefore, the continuous position of the mobile device can only be obtained by interpolation.
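The Bayesian rule above can be sketched as follows. This is an illustrative example rather than the method of [8]: it assumes a uniform prior, an independent Gaussian RSS model per AP, and hypothetical area names and parameters.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posterior(observed, area_models):
    """Return P(area | observed) for every area, assuming a uniform prior
    and independent Gaussian RSS per AP (both are assumptions)."""
    prior = 1.0 / len(area_models)
    joint = {}
    for area, params in area_models.items():
        likelihood = 1.0
        for rss, (mean, std) in zip(observed, params):
            likelihood *= gaussian_pdf(rss, mean, std)
        joint[area] = likelihood * prior
    evidence = sum(joint.values())  # P(S), the normalizing constant
    return {area: value / evidence for area, value in joint.items()}

# Hypothetical offline models: per area, (mean, std) of the RSS from each AP
models = {
    "L1": [(-45.0, 4.0), (-70.0, 5.0)],
    "L2": [(-65.0, 4.0), (-50.0, 5.0)],
}
post = posterior([-47.0, -68.0], models)
best_area = max(post, key=post.get)  # area with the highest posterior
```

The result is always one of the discrete areas, which illustrates why interpolation is needed to obtain a continuous position.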
Mehmood et al. proposed an indoor positioning system based on an artificial neural network [9], which also includes an offline phase and an online phase. During the offline training phase, the neural network is trained on ⟨coordinates of the location cell, RSS⟩ data. During the online phase, the RSS information collected by the mobile device is taken as the input to the trained neural network, and the positioning result is the output of this network. Compared with the probabilistic method, this method has a higher positioning accuracy; the experimental results show that the average error distance decreased from 2.54 meters to 1.43 meters.
Yim [10] builds a decision tree with training data during the offline phase and determines a user's location by referring to the tree. This method is more efficient than the commonly used probabilistic, neural network, and KNN methods. Moghtadaiee and Dempstera [11] also used the fingerprinting method to implement indoor location with FM radio signals. Oussalah et al. [12] proposed a multivariable fuzzy inference system for fingerprinting location, which makes use of k-nearest neighbor classification in signal space and determines the target's location as a weighted combination of the nearest fingerprints. The weights are determined using a Takagi-Sugeno fuzzy controller.
For the methods proposed in the literature [5][6][7][8][10][11][12], the mobile devices need to store the entire set of fingerprints, which puts a large burden on mobile devices with limited storage capacity. At the same time, these methods take much time to search the entire set of fingerprints during the online phase. The method based on neural networks [9] has a large computational overhead, and the location algorithm is executed by the server, not the mobile device, so it leads to transmission delay and may make the server a bottleneck. It also raises concerns about leaking user privacy. For most of these methods, the average distance errors are about 2-3 m, and only a few of them reduce this value to 1-2 m.
Talvitie et al. [13] considered an incomplete fingerprint database with realistic coverage gaps and utilized interpolation and extrapolation methods for recovering the missing fingerprint data. The idea of improving the quality of fingerprinting data is similar to our method. However, the circumstances are different.

Support Vector Machine Based Positioning Method.
Learning methods based on kernel functions [14,15] have made tremendous achievements in areas such as data classification, regression estimation, and function approximation. The support vector machine (SVM) is a new type of machine learning method among them, which has good features such as higher fitting accuracy, fewer parameters, and global optimality. It is generally believed that data has better separability when it is mapped into a high-dimensional feature space, which suits support vector classification, support vector regression, and other kernel function learning methods [16]. The support vector machine includes support vector classification (SVC) and support vector regression (SVR). SVR is used to handle function regression problems. It has been successfully applied to identification systems and nonlinear forecast systems and has achieved good results.
Battiti [17,18] made a comparative analysis of various location methods based on statistical learning theory (the results show that the method based on the support vector machine performs best) and recommended the use of support vector classification for location. Nguyen et al. [19] proposed the use of the support vector machine method for location in ad hoc networks. This approach assumes that all nodes can receive the signals of the others, that there are some anchor nodes in the network whose positions are known, and that the location of the nodes is divided into a number of categories. The data collected by the anchor nodes is taken as the support vector machine training dataset, and the classification model is then obtained and used to deduce the location during the online phase. Tran and Nguyen [20] studied wireless sensor network position estimation based on the support vector machine, analyzed the upper bound of the positioning error, and proposed a modified version of mass-spring optimization to further improve the location estimation. Feng et al. [16] did similar work and proposed a hierarchical support vector machine (H-SVM) scheme for large-scale sensor network location, together with an analysis of the average error and variance, as well as the probability distribution of the positioning error. All these studies [16,17,19,20] addressed the location problem in ad hoc and sensor networks based on support vector classification. The scenarios are different from indoor location applications. Moreover, the location results correspond to discrete classes, so the location accuracy is closely related to the method and degree of discretization. For example, the method proposed by Tran takes the midpoint of the predicted class and the adjacent class as the location result.
Wu et al. [21] investigated location estimation for Global System for Mobile (GSM) communication based on support vector regression. Using support vector regression, they studied the location estimation problem with missing values by providing theoretical and empirical analysis of existing and novel kernels. A novel synthetic experiment was designed to compare the performance of different location estimation approaches. The proposed support vector regression approach shows promising performance, especially in terrains with local variations in environmental factors. However, it targets the cellular communication system, not the 802.11-based wireless local area network.
Space partition methods [22,23] were presented to improve the location accuracy by dividing the physical space of the target region into small areas according to the RSS features. For each divided area, a corresponding SVM model was trained and applied in the location to ease the effects of the variable signal. However, even in a small area, the signal may still change for reasons such as people's movement or network congestion.
Like space partition, Deng et al. [24] also clustered the whole radio map into a series of sub-radio maps by k-means clustering. After that, kernel direct discriminant analysis (KDDA) is used to implement nonlinear discriminative feature extraction of RSS. Then, the relationship between the extracted features and physical locations is established by SVR. The focus is to design feature extraction methods that can characterize and capture the nonlinear RSS patterns well.
Our focus is to design appropriate data filtering procedures for the training dataset, based on the features of RSS information in the IEEE 802.11 environment, to eliminate abnormal data and obtain a high quality location model. A preprocessing procedure for the input data is also included to get more stable input. The ultimate goal is to achieve higher location accuracy.

SVR-Based Location Method
3.1. Location Process. The core of the SVR-based location method is to identify the relationship between the RSS information collected by the mobile device and its geographical location based on support vector regression. The framework of the method is shown in Figure 1. The mobile device obtains its location through the RSS information collected from 802.11 APs deployed at predefined locations. The entire location process consists of two stages, the offline training stage and the online location stage. The offline training stage proceeds as follows.
(i) Raw Data Collection. The mobile device is placed at each predeployed reference point to collect the RSS information from the APs.
(ii) Raw Data Filtering. Due to the complex indoor environment, wireless channel congestion, obstructions, and the limited communication range of nodes, the received signal strength (RSS) is vulnerable and changeable. To address these issues, data filtering rules obtained through statistical analysis are applied to improve the quality of the training samples and thus the quality of the location model.
(iii) Training the Data Using Support Vector Regression. In this stage, all the collected data is stored on a central server, and data filtering and training are also done on this server. After the location model is established by training on the filtered data with the support vector regression algorithm, the mobile devices get this model from the server.

Online Location.
During the online location phase, mobile devices collect the RSS information from APs and use it as the input to the location model established in the offline training period to get the real location. The process is as follows.
(i) k-Times Continuous Measurement to Collect RSS Information. The RSS information collected by the mobile device is vulnerable, changeable, and inconsistent even at the same position. To get accurate RSS information, mobile devices execute $k$ continuous measurements and get $k$ RSS values at each location during the online phase.
(ii) Input Data Preprocessing. Input data preprocessing analyzes the $k$ RSS values obtained through k-times continuous measurement, eliminates the effects of environmental disturbance on the RSS values, and determines the final RSS value that describes the relationship more accurately.
(iii) Location. The RSS information after preprocessing is taken as the input to the position prediction model trained during the offline phase, and the final location is calculated. In this stage, mobile devices do not need to exchange information with the central server, because our SVR model is small enough to be stored in the mobile device. Unlike the common fingerprint-based methods, there is no need to search all the reference points to match the location.
Offline data filtering and online data preprocessing can effectively reduce the impact of transient variations of RSS values on the location accuracy. However, when major or permanent changes happen, for example, when the location of an AP or the indoor layout changes, the position prediction model needs to be retrained with the latest collected data. This retraining can be triggered by location error analysis: if the location error exceeds a predefined threshold, retraining begins.

Support Vector Regression Based Prediction Model.
As shown in Figure 1, the position prediction model describes the relationship between the physical position of the mobile device and the RSS information received from each of the 802.11 APs. Assume that there are $n$ 802.11 APs, and $x_s = \{\mathrm{mac}_1(s), \mathrm{mac}_2(s), \ldots, \mathrm{mac}_n(s)\}$ denotes the RSS information received by the mobile device at the physical location $s$ for each measurement, where $\mathrm{mac}_i(s)$ is the RSS information received by the mobile device at the physical position $s$ from the $i$th 802.11 AP. $y_s$ denotes the coordinates of the mobile device at the physical position $s$. Given a training dataset $D := \{(x_i, y_i)\}_{i=1}^{N}$ ($x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$), the goal of the function regression is to find the mapping $f : \mathbb{R}^n \to \mathbb{R}$ such that $f(x_i) \approx y_i$. This mapping relationship is nonlinear. For nonlinear problems like this, the support vector machine method [21,25] maps the original data $x$ into a higher-dimensional feature space by utilizing a nonlinear function $\Phi(x)$ and then performs linear regression in the feature space:
$$f(x) = w \cdot \Phi(x) + b, \tag{3}$$
where $\Phi(x)$ denotes a nonlinear mapping function from the input space $x$ to a high-dimensional feature space, and $w$ and $b$ are the support vector weight and the bias, respectively. The nonlinear regression model is turned into a linear regression model after the input vector $x$ is mapped into the feature space. Parameters $w$ and $b$ are calculated by minimizing the following regularized risk function with regularization constant $C$:
$$R(w, b) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} L_\varepsilon\bigl(y_i, f(x_i)\bigr), \tag{4}$$
where $L_\varepsilon$ is the $\varepsilon$-insensitive loss function
$$L_\varepsilon\bigl(y, f(x)\bigr) = \begin{cases} 0, & |y - f(x)| \le \varepsilon, \\ |y - f(x)| - \varepsilon, & \text{otherwise.} \end{cases} \tag{5}$$
The value of the loss function is 0 when the prediction error is less than $\varepsilon$; otherwise, a linear penalty is applied.
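To make the loss and the regularized risk concrete, here is a small Python sketch. It assumes the identity feature map (so the model is linear in the RSS vector) and a risk of the form 0.5·‖w‖² + C·Σ loss; the function names and sample values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear penalty outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def regularized_risk(w, b, samples, C, epsilon):
    """0.5 * ||w||^2 + C * sum of epsilon-insensitive losses.

    Uses the identity feature map phi(x) = x, an assumption made only
    to keep the illustration short.
    """
    norm_sq = sum(wi * wi for wi in w)
    total_loss = 0.0
    for x, y in samples:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_loss += epsilon_insensitive_loss(y, pred, epsilon)
    return 0.5 * norm_sq + C * total_loss

# Hypothetical samples: (feature vector, target coordinate)
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 4.0)]
risk = regularized_risk([2.0, 3.0], 0.0, samples, C=10.0, epsilon=0.5)
```

Only the second sample falls outside the tube here, so it alone contributes to the penalty term.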
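By introducing the positive slack variables $\xi_i$ and $\xi_i^*$, the minimization of (4) is equivalent to minimizing the following constrained risk function:
$$\min_{w,\,b,\,\xi,\,\xi^*} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \tag{6}$$
subject to the constraints
$$y_i - w \cdot \Phi(x_i) - b \le \varepsilon + \xi_i, \qquad w \cdot \Phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0, \tag{7}$$
where $C$ is a regularization constant, controlling a compromise between maximizing the margin and minimizing the number of training set errors, and $\xi_i$ and $\xi_i^*$ represent upper and lower constraints on the outputs of the model. This constrained optimization problem can be expressed through the following Lagrangian function:
$$L = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) - \sum_{i=1}^{N} \alpha_i \bigl(\varepsilon + \xi_i - y_i + w \cdot \Phi(x_i) + b\bigr) - \sum_{i=1}^{N} \alpha_i^* \bigl(\varepsilon + \xi_i^* + y_i - w \cdot \Phi(x_i) - b\bigr) - \sum_{i=1}^{N} (t_i \xi_i + t_i^* \xi_i^*), \tag{8}$$
where $\alpha_i$, $\alpha_i^*$, $t_i$, and $t_i^*$ are the Lagrange multipliers. The optimum of $L(w, b, \xi_i, \xi_i^*, \alpha_i, \alpha_i^*, t_i, t_i^*)$ must satisfy the following conditions:
$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\,\Phi(x_i) = 0, \qquad \frac{\partial L}{\partial b} = \sum_{i=1}^{N} (\alpha_i^* - \alpha_i) = 0, \qquad \frac{\partial L}{\partial \xi_i} = C - \alpha_i - t_i = 0, \qquad \frac{\partial L}{\partial \xi_i^*} = C - \alpha_i^* - t_i^* = 0. \tag{9}$$
The above problem can be transformed into a dual optimization problem by substituting (9) into (8). The following dual objective is obtained:
$$W(\alpha, \alpha^*) = \sum_{i=1}^{N} y_i (\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,K(x_i, x_j). \tag{10}$$
The solution can be obtained by maximizing (10) subject to a new set of constraints:
$$\sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i \le C, \qquad 0 \le \alpha_i^* \le C. \tag{11}$$
With the Lagrange multipliers $\alpha_i$ and $\alpha_i^*$, the estimated output can be represented by
$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^*)\,K(x, x_i) + b, \tag{12}$$
where $K(x, x_i) = \Phi(x) \cdot \Phi(x_i)$ is called the kernel function. According to the Karush-Kuhn-Tucker (KKT) conditions for solving the quadratic programming problem, only some of the coefficients $\alpha_i - \alpha_i^*$ are nonzero, and the corresponding training vectors are the support vectors.
[Figure labels: mapping vectors $\Phi(x_1), \ldots, \Phi(x_t)$; support vectors obtained by training, $x_1, \ldots, x_t$; real-time RSS information $x$ at the unknown point.]
Commonly used kernel functions include the following:
(i) linear kernel, $K(x, x_i) = x \cdot x_i$;
(ii) polynomial kernel, $K(x, x_i) = (a\,x \cdot x_i + c)^d$ (where $d$ is the degree of the polynomial and $a$ and $c$ are constants);
(iii) Gaussian RBF kernel, $K(x, x_i) = \exp(-\gamma \|x - x_i\|^2)$ (where $\gamma$ is a user-specified parameter and $\|\cdot\|$ denotes the distance between two input vectors);
(iv) sigmoid kernel, $K(x, x_i) = \tanh(a\,x \cdot x_i + c)$ (where $a$ and $c$ are constants).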
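Once training has produced the support vectors and their coefficients, prediction reduces to a weighted sum of kernel evaluations plus the bias. The sketch below assumes a Gaussian RBF kernel of the form exp(-gamma·‖u − v‖²); the support vectors, coefficients, bias, and gamma are hypothetical values, not a trained model.

```python
import math

def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Kernel expansion of the SVR output.

    dual_coefs[i] stands for the difference of the two Lagrange
    multipliers belonging to support vector i.
    """
    total = 0.0
    for sv, coef in zip(support_vectors, dual_coefs):
        total += coef * rbf_kernel(x, sv, gamma)
    return total + bias

# Hypothetical trained model for one coordinate, with two APs
support_vectors = [[-40.0, -70.0], [-60.0, -50.0]]
dual_coefs = [1.5, -0.5]
prediction = svr_predict([-40.0, -70.0], support_vectors, dual_coefs,
                         bias=2.0, gamma=0.01)
```

Because only the support vectors and a handful of scalars are needed, such a model can be far smaller than a full fingerprint database. Predicting a two-dimensional position would use one such model per coordinate, which is an assumption of this sketch.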

Data Filtering during the Offline Phase.
The quality of the training sample determines the quality of the final position prediction model when using the support vector regression algorithm: the better the quality of the training sample, the better the final model. Likewise, the more precise the input of the model, the more accurate the location results. In the wireless indoor environment, signal transmission is influenced by many factors, and the RSS information received by the mobile device varies due to the complex transmission environment. Even at a fixed point, the RSS information collected by the mobile device changes, which may compromise the quality of the training sample and the model input and affect the final position accuracy. Therefore, in order to obtain high quality training samples, the raw data needs to be filtered at the offline phase, and the input RSS information must also be preprocessed to obtain a more accurate input at the online phase. We discuss the raw data filtering rules in this section and the input RSS information preprocessing in the next section.
There are many existing methods to ease the effects of abnormal data. For example, averaging is a simple and commonly used method, which gives a measure that is more robust in the presence of outliers. Raw data filtering can be done through simple averaging of the RSS value groups. In a wireless environment, reliability is always a concern: packets containing RSS information are vulnerable to loss, which leads to lower RSS values. Based on this intuition, the rule of removing the least values eliminates a certain number of the lowest values from an RSS value group.
However, 802.11 radio transmissions have their own characteristics. To get better results, filtering rules should be designed according to these characteristics. The probability distribution of the RSS information collected by the mobile device at a fixed point from the APs is shown in Figure 3; through an analysis of the collected data, we found that it is consistent with the log-normal distribution law described in [26,27].
Assume
$$X_i = -\mathrm{mac}_i(s).$$
Then, the probability density function of $X_i$ is as follows:
$$f(x) = \frac{1}{x\,\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0,$$
where $\mathrm{mac}_i(s)$ denotes the RSS information received by the mobile device at the physical location $s$ from the $i$th 802.11 AP, $\mu$ is the logarithmic mean of $X_i$, and $\sigma$ is the logarithmic standard deviation of $X_i$.
Then, the expected value of $X_i$ is
$$E(X_i) = e^{\mu + \sigma^2/2}.$$
Based on the probability distribution, we can set a positive threshold $\delta$: $\mathrm{mac}_i(s)$ is considered a valid value if and only if $0 < E(X_i) - \delta < -\mathrm{mac}_i(s) < E(X_i) + \delta$. If the value is beyond this range, too high or too low, it is considered an abnormal signal value. For each measurement value $x = -\mathrm{mac}_i(s)$, the probability of a valid value is
$$\int_{E(X_i) - \delta}^{E(X_i) + \delta} f(x)\,dx.$$
In addition to the abnormal RSS values, due to network congestion and lossy wireless links, the packet containing the RSS information from an AP may be lost, and the mobile device may get a record with $\mathrm{mac}_i(s) = 0$. Even in a good transmission environment, high network load can cause packet dropping. Therefore, we need to distinguish the situation where packet loss makes $\mathrm{mac}_i(s) = 0$ from the situation where the positional relationship between the mobile device and the corresponding 802.11 AP makes the signal extremely weak. We propose utilizing multiple measurements to distinguish between these two cases.
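The validity test and the probability of a valid value can be computed directly from the log-normal parameters. The following sketch uses the standard closed forms for the log-normal mean and CDF; the parameter values (a median of 60, i.e. about -60 dBm, with a log standard deviation of 0.05 and a tolerance of 5) are hypothetical.

```python
import math

def lognormal_mean(mu, sigma):
    """Expected value exp(mu + sigma^2 / 2) of a log-normal variable."""
    return math.exp(mu + 0.5 * sigma ** 2)

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) for a log-normal variable with parameters mu, sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

def prob_valid(mu, sigma, delta):
    """Probability that X lies within delta of its expected value."""
    mean = lognormal_mean(mu, sigma)
    return (lognormal_cdf(mean + delta, mu, sigma)
            - lognormal_cdf(mean - delta, mu, sigma))

def is_valid(rss, mu, sigma, delta):
    """Check 0 < E(X) - delta < -rss < E(X) + delta for one reading."""
    mean = lognormal_mean(mu, sigma)
    return 0 < mean - delta < -rss < mean + delta

mu, sigma, delta = math.log(60.0), 0.05, 5.0   # hypothetical values
p_valid = prob_valid(mu, sigma, delta)
```

With these assumed parameters, a reading of -62 dBm passes the test while -80 dBm is flagged as abnormal.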
Assuming that the measurements of the device are independent of each other, for each reference point, the probability that, in each measurement, the mobile device cannot receive the RSS information from the $i$th AP at this position is
$$p_i = \frac{m - m_i}{m},$$
where $m$ is the total number of measurements that the mobile device made at the reference point and $m_i$ is the total number of measurements in which the mobile device received the RSS information from the $i$th AP at the reference point.
If $p_i$ is very high, it suggests that the mobile device is far from the $i$th AP and the received signal strength itself is low, or that the communication between the mobile device and the $i$th AP is interfered with (for example, by barriers), so that the RSS information is unstable. In such cases, even multiple measurements cannot obtain a valid signal strength value. If $p_i$ is very low, it suggests that the device is within the communication range of the $i$th AP and is able to receive a normal signal strength value. In this situation, $\mathrm{mac}_i(s) = 0$ is caused by the loss of the packet containing the signal strength information, for reasons such as network congestion. In order to improve the accuracy of the location prediction model, these data records should be removed from the training sample. Therefore, we define a valid measurement threshold $\theta$. If $p_i \le \theta$, we believe that the mobile device at the reference point can communicate effectively with the $i$th AP and that most measurements obtain valid values; if $p_i > \theta$, we believe that the communication between the mobile device and the $i$th AP is subject to greater interference, the mobile device cannot get the signal strength information, and most measurements are invalid. Based on the above analysis, we define the corresponding data filtering rules for the offline period as follows:
(1) For each record collected by the mobile device at the reference point from the APs, if $p_i \le \theta$ for $i = 1, 2, 3, \ldots, n$ (where $\theta$ is the valid measurement threshold value) and $\mathrm{mac}_i(s) = 0$, then the record is considered invalid and is removed; if $p_i > \theta$ for $i = 1, 2, 3, \ldots, n$ and $\mathrm{mac}_i(s) \neq 0$, it is also considered invalid and is removed as well.
(2) For each record collected by the mobile device at the reference point from the APs, if $p_i \le \theta$ for $i = 1, 2, 3, \ldots, n$ and $-\mathrm{mac}_i(s) < E(X_i) - \delta$ or $-\mathrm{mac}_i(s) > E(X_i) + \delta$, the record is considered invalid and is removed.
The first case of the first rule describes the situation that the packet containing the normal signal strength is lost due to the network congestion; the record should be removed; the second case describes the situation where a small amount of unreliable records generated when the device and AP are in the margin of their communication range should be filtered out.The second rule describes the situation where the abnormal signal should be filtered out when the communication between the mobile device and the APs was disturbed.
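One possible reading of the two filtering rules is sketched below. It makes several assumptions that are not stated in the text: a record is dropped as soon as any single AP violates a rule, a missing reading is encoded as 0, and the expected value of the negated RSS is estimated by the sample mean of the non-zero readings instead of a fitted log-normal model. The threshold names `theta` and `delta` and the data are hypothetical.

```python
def missing_probability(records, ap):
    """Fraction of records at one reference point with no RSS from `ap`."""
    missing = sum(1 for record in records if record[ap] == 0)
    return missing / len(records)

def filter_records(records, theta, delta):
    """Apply the two offline filtering rules to the records of one point."""
    n_aps = len(records[0])
    p = [missing_probability(records, i) for i in range(n_aps)]
    expected = []
    for i in range(n_aps):
        values = [-record[i] for record in records if record[i] != 0]
        # Sample mean of -RSS, used here as a stand-in for E(X_i).
        expected.append(sum(values) / len(values) if values else None)
    kept = []
    for record in records:
        valid = True
        for i, rss in enumerate(record):
            if p[i] <= theta:
                if rss == 0:
                    valid = False   # rule 1, first case: lost packet
                elif abs(-rss - expected[i]) > delta:
                    valid = False   # rule 2: abnormal signal value
            elif rss != 0:
                valid = False       # rule 1, second case: unreliable AP
            if not valid:
                break
        if valid:
            kept.append(record)
    return kept

# Hypothetical records at one reference point: [RSS from AP1, RSS from AP2]
records = [[-60, 0], [-61, 0], [-59, 0], [0, 0], [-90, 0],
           [-60, -88], [-62, 0], [-60, 0], [-60, 0], [-61, 0]]
kept = filter_records(records, theta=0.2, delta=5.0)
```

In this example AP1 is usually heard and AP2 is almost never heard, so the record with a lost AP1 packet, the record with the -90 outlier, and the record containing a stray AP2 reading are removed.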

Input Data Preprocessing during Online Phase.
The unstable and abnormal data are removed from the training sample during the offline period, and an accurate position prediction model is obtained. To get position information with high accuracy, the input to the model during the online phase should also be of high accuracy; unstable and abnormal RSS data should be avoided too. For this purpose, k-times continuous measurement is proposed to collect the RSS information. During the online period, the mobile device first performs $k$ continuous measurements to ensure that valid RSS information can be acquired; then the input data preprocessing method removes the abnormal and disturbed data from the obtained data to get high quality input that truly reflects the relationship between the mobile device and the 802.11 APs.
If simple rules such as averaging and removing the least values are used in the data filtering, the same rules can also be used in the data preprocessing. For the more sophisticated data filtering rules discussed in the previous subsection, more complex preprocessing rules are needed. Under this circumstance, to obtain valid RSS information, we must set an appropriate value of $k$. Assume
$$X_i = -\mathrm{mac}_i(s_u),$$
where $\mathrm{mac}_i(s_u)$ denotes the RSS information received by the mobile device at the physical location $s_u$ from the $i$th 802.11 AP.
Based on the above analysis, $X_i$ follows a log-normal distribution. Let $P_s(E(X_i) - \delta < X_i < E(X_i) + \delta)$ denote the probability that $X_i$ is in the range $(E(X_i) - \delta, E(X_i) + \delta)$ (where $E(X_i)$ denotes the expected value of $X_i$ and $E(X_i) - \delta > 0$). When the situation where the signal strength cannot be properly received by the mobile device is also considered, let $P(E(X_i) - \delta < X_i < E(X_i) + \delta)$ denote the probability that the RSS value received by the mobile device in each measurement is in the range $(E(X_i) - \delta, E(X_i) + \delta)$ (where $E(X_i) - \delta > 0$); then
$$P(E(X_i) - \delta < X_i < E(X_i) + \delta) = (1 - p_i)\,P_s(E(X_i) - \delta < X_i < E(X_i) + \delta),$$
where $p_i$ denotes the probability that, in each measurement, the mobile device cannot receive the RSS information from the $i$th AP at this position.
In order to ensure that the mobile device is able to obtain valid RSS information after the k-times continuous measurements during the online period, the following condition must be satisfied:
$$1 - \bigl(1 - P(E(X_i) - \delta < X_i < E(X_i) + \delta)\bigr)^k \ge \rho, \tag{24}$$
where $P(\cdot)$ is the probability that a single measurement yields a valid value and $\rho$ is a constant approaching 1. Then,
$$k \ge \frac{\ln(1 - \rho)}{\ln\bigl(1 - P(E(X_i) - \delta < X_i < E(X_i) + \delta)\bigr)}. \tag{25}$$
In the real system, the value of $k$ must be greater than or equal to the value derived from formula (25). The larger the value of $k$, the more valid RSS information the mobile device obtains. At the same time, a larger $k$ also means that the mobile device takes more time to measure, which leads to longer delay and degrades the user experience. The scope of application may also be restricted; for example, the moving speed of the user is limited. Formula (25) gives a lower limit of $k$. The actual value of $k$ can be determined based on the indoor wireless environment and the needs of users. In general, the better the wireless transmission environment, the lighter the wireless network load, the lower the packet loss, and the lower the probability of signal abnormalities, the smaller $k$ can be. On the other hand, if the users' requirements on location accuracy are low, $k$ can also be smaller. From formula (25), it can be seen that the lower limit of $k$ is directly related to the parameters $\delta$ and $\rho$: $\delta$ reflects the range of the users' tolerance to signal strength changes, and $\rho$ reflects the reliability of the assurance. The optimal $k$ should be determined by the users' needs and the real environment.
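A lower limit for the number of measurements can be computed as follows. The sketch assumes the requirement is that at least one of k independent measurements is valid with probability at least `rho`, and that the per-measurement probability of a valid value is the product of the reception probability and the in-range probability. The function name and the numbers are hypothetical.

```python
import math

def min_measurements(p_in_range, p_missing, rho):
    """Smallest integer k with 1 - (1 - P)^k >= rho.

    P = (1 - p_missing) * p_in_range is the assumed probability that a
    single measurement is received and falls inside the valid range.
    """
    p_valid = (1.0 - p_missing) * p_in_range
    if not 0.0 < p_valid < 1.0 or not 0.0 < rho < 1.0:
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - rho) / math.log(1.0 - p_valid))

# Hypothetical environment: 10% packet loss, 90% of readings in range
k_min = min_measurements(p_in_range=0.9, p_missing=0.1, rho=0.999)
```

With these assumed values, four measurements leave a failure probability of about 0.0013, slightly above the allowed 0.001, so the bound rounds up to five.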
After the value  was determined, the mobile device gets the valid RSS information through -times continuous measurements during the online period.Then, this information is preprocessed as follows. Let where  is a valid measurement threshold value, mac  (  )  is a valid measuring value, and  is the total number of the valid values corresponding to the -times continuous measurements from the th AP.
After preprocessing, we obtain valid and true RSS information that is consistent with the offline training samples. It serves as the input to the position prediction model, which greatly improves the location accuracy.
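A minimal sketch of such an online preprocessing step is given below. The exact rule used in the paper is not reproduced here. The sketch assumes that a reading is valid when it was received and is at least `threshold`, and that the valid readings from one AP are averaged into a single input value. The function name, the threshold, and the sample readings are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def preprocess_rss(
    readings: Sequence[Optional[float]], threshold: float
) -> Optional[float]:
    """Combine k consecutive RSS readings from one AP into one input value.

    Assumptions (not taken from the paper): a reading is valid when it was
    received (not None) and is at least `threshold`; valid readings are
    averaged; None is returned when no reading is valid.
    """
    valid = [r for r in readings if r is not None and r >= threshold]
    if not valid:
        return None
    return mean(valid)


if __name__ == "__main__":
    # Eight hypothetical readings in dBm; None marks a missed measurement.
    k_readings = [-61.0, -60.0, None, -85.0, -62.0, -59.0, None, -58.0]
    print(preprocess_rss(k_readings, threshold=-80.0))  # -60.0
```

In this example the two missed measurements and the abnormally low reading of -85 dBm are discarded, so the input passed to the prediction model is the mean of the five remaining readings.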

4.1. Simulation Environment.
The performance of this positioning method is evaluated through simulation. The data comes from the real deployment and experiments of an 802.11-based wireless positioning system, as shown in Figure 4. The grid of reference spots in the operation area includes 130 spots with a spacing of 1.5 meters. In Figure 4, the blue spots represent the reference points, the yellow spots represent the locations of the APs, and the red spots represent the test positions during the online period. The entire system covers an area of 221 square meters, and 25 802.11 APs are deployed in the entire region. Not all the APs are used in our experiments: the default number of used APs is 17, and we also vary this number to study its effect on the accuracy. At each reference spot, 110 signal strength samples are collected. Each measurement takes 250 milliseconds.
The implementation of SVR is based on LIBSVM; the two SVR parameters are fixed at empirically favorable values of 1,000 and 0.01. To measure the positioning error, let $(x_p, y_p)$ denote the coordinates predicted by the position prediction model and $(x_a, y_a)$ denote the actual coordinates; the location error distance is calculated as

$$d = \sqrt{(x_p - x_a)^2 + (y_p - y_a)^2}.$$

In the experiment, the training samples are filtered according to the corresponding filtering rules, and the $k$-times continuous measurement method is used to collect RSS information during the online period. The experimental parameters are described in Table 1.
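The error metric and the summary statistics reported in the following sections (average error distance and the CDF of the error distance) can be sketched as follows. This is an illustration only: the function names and the sample coordinates are hypothetical, and the error distance is taken to be the Euclidean distance between the predicted and actual 2-D coordinates.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def error_distance(predicted: Point, actual: Point) -> float:
    """Euclidean distance between a predicted and an actual 2-D position."""
    return math.hypot(predicted[0] - actual[0], predicted[1] - actual[1])


def mean_error(predicted: Sequence[Point], actual: Sequence[Point]) -> float:
    """Average error distance over paired predicted/actual positions."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(error_distance(p, a) for p, a in zip(predicted, actual))
    return total / len(predicted)


def empirical_cdf(errors: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Fraction of error distances that are <= each threshold."""
    if not errors:
        raise ValueError("errors must not be empty")
    return [sum(e <= t for e in errors) / len(errors) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical predicted and actual test positions, in meters.
    predicted = [(3.0, 4.0), (0.0, 1.0)]
    actual = [(0.0, 0.0), (0.0, 0.0)]
    errors = [error_distance(p, a) for p, a in zip(predicted, actual)]
    print(mean_error(predicted, actual))            # 3.0
    print(empirical_cdf(errors, [1.0, 2.0, 5.0]))   # [0.5, 0.5, 1.0]
```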

4.2. The Impacts of the Data Quality on the Location Method.
Our method improves the quality of the training samples and the input data through the filtering operation during the offline period and the input data preprocessing during the online period. To analyze how data quality influences the accuracy of the SVR-based positioning method, we compare four experiments. In experiment 1, the training sample is not filtered during the offline period, and the mobile device collects only 1 sample at the unknown point during the online period. In experiment 2, the training sample is again not filtered, but the mobile device takes 8 consecutive measurements at the unknown point during the online period, and the records are then preprocessed. In experiment 3, the training sample is filtered during the offline period, but the mobile device collects only 1 sample at the unknown point during the online period. In experiment 4, the training sample is filtered during the offline period, the mobile device takes 8 consecutive measurements at the unknown point during the online period, and the records are then preprocessed. Figure 5 shows the cumulative distribution function (CDF) of the error distance for these four scenarios. The average error distance is shown in Table 2.
Experiment 4 gives the best positioning accuracy, with an average error distance of 0.68 m, less than 1 m. Experiment 1 ranks second, with an average error distance of 2.25 m, followed by experiment 2 with 2.70 m. Experiment 3 gives the worst results, with an average error distance of 10.37 m. Therefore, combining the filtering operation during the offline period with the $k$-times continuous measurement during the online period can effectively improve the position accuracy. The results of experiment 1 indicate that the original SVR, without training data filtering and input preprocessing, has good features such as high fitting accuracy, few parameters, and global optimality, and can establish a good nonlinear mapping between RSSI and location. To further improve the performance, better

Table 1: The experimental parameters.

| Parameter name | Default value | Description |
| --- | --- | --- |
| Training sample size | 5000 | The default size of the training sample is 5000, while the value is varied in the simulation to reveal the relation between the training sample size and the performance. |
| The number of APs | 17 | The default number of APs is 17, while the value is varied in the simulation to reveal the relation between the number of APs and the performance. |
| Kernel function | RBF kernel | The default kernel function is the RBF kernel, while different kernel functions are used in the simulation to reveal their impact on the performance. |
| The value of $k$ | 8 | The default value of $k$ is 8, while the value is varied in the simulation to reveal its impact on the performance. |
| Reliability constant | 0.9 | The default value of the reliability constant is 0.9, while the value is varied in the simulation to reveal its impact on the performance. |
| $\theta$ | 4 | The default value of $\theta$ is 4, while the value is varied in the simulation to reveal its impact on the performance. |
| Test sample size | 1000 | |

Better model combined with worse input leads to the worst result because our model does not search all the reference points and the results depend on the input. Also, worse model combined with better input cannot lead to better results. Since the size of the training data is huge, the percentage of abnormal data is small and its effect on the model is relatively small. At the online stage, however, sampling abnormal data would make the input completely wrong. This is the reason why the results of experiment 3 are worse than those of experiment 2.

4.3. Performance Comparison.
To demonstrate the effectiveness of the location method proposed in this paper (denoted as Preprocessing SVR), we compare it with the probabilistic method [26] (denoted as Probabilistic Model), the probabilistic method in which the fingerprinting data has been filtered and the input has been preprocessed (denoted as Preprocessing Probabilistic Model), the SVR method with no filtering and no preprocessing (denoted as No Preprocessing SVR), the ANN method [9] (denoted as No Preprocessing ANN), and the ANN method with the same offline filtering and online preprocessing as the Preprocessing SVR method (denoted as Preprocessing ANN). For each method, the CDF of the final location error distance is shown in Figure 6. The average values are shown in Table 3, from which we can see that the Preprocessing SVR method (our proposed method) has the best positioning accuracy among all these methods. The performance of the Preprocessing ANN method is close: its average error distance is 0.886 m, against 0.68 m for the Preprocessing SVR method. Without filtering and preprocessing, the average error distance reaches 2.25 m for the No Preprocessing SVR method and 2.26 m for the No Preprocessing ANN method.
Therefore, effective data filtering and preprocessing can greatly improve the accuracy of positioning methods based on machine learning, and the SVR methods outperform the ANN methods. Compared with the SVR model, the ANN model is larger and its computing complexity is higher; therefore, the ANN methods cannot be implemented on the device side. The mobile devices have to send the samples to a server, where the location results are generated and sent back to the devices. ANN methods implemented on the server side thus cause extra delay and extra network load.
The average error distance of the Probabilistic Model is 2.857 m. With preprocessing and filtering, it decreases to 2.19 m, which shows that good data quality can improve the performance of the Probabilistic Model. But it still cannot compete with the SVR method. Unlike the SVR and ANN methods, the Probabilistic Model has no sophisticated training process that produces a prediction model; locations are determined through a searching and matching process. SVR and ANN can capture more accurate mapping relations between the RSSI information and locations. To the best of our knowledge, SVR and ANN usually perform better than the Probabilistic Model, as reported in the existing literature.

4.5. The Impact of Kernel Function.
According to the support vector regression theory, kernel functions that satisfy Mercer's theorem replace the dot product in the high-dimensional feature space with a function evaluation in the low-dimensional input space [16], which avoids the "curse of dimensionality" problem. The location prediction models obtained with different kernel functions perform differently. In the experiments, we tested the linear kernel, the polynomial kernel, the RBF kernel, and the sigmoid kernel. The results are shown in Figure 8; the location prediction model using the RBF kernel has the highest accuracy, with an average error distance of 0.68 m. The polynomial kernel ($\gamma = 1/17$, $r = 0$, and $d = 3$) gives an average error distance of 1.15 m, the linear kernel 2.44 m, and the sigmoid kernel ($\gamma = 1/17$, $r = 0$) 2.46 m. In our experiments, we tried different parameters for all the kernel functions; for each kernel function, the result presented here is the best one obtained.
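For reference, the four kernel families compared above can be written out directly. The sketch below uses the kernel forms documented for LIBSVM (linear, polynomial, RBF, and sigmoid, with parameters gamma, coef0, and degree). The function names and the sample RSS vectors are hypothetical, and the snippet only evaluates the kernels; it does not train a model.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u: Vector, v: Vector) -> float:
    return dot(u, v)


def polynomial_kernel(u: Vector, v: Vector, gamma: float,
                      coef0: float = 0.0, degree: int = 3) -> float:
    # (gamma * u.v + coef0) ** degree
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u: Vector, v: Vector, gamma: float) -> float:
    # exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u: Vector, v: Vector, gamma: float,
                   coef0: float = 0.0) -> float:
    # tanh(gamma * u.v + coef0)
    return math.tanh(gamma * dot(u, v) + coef0)


if __name__ == "__main__":
    # Two hypothetical RSS vectors (dBm) over 17 APs, so gamma = 1/17.
    a = [-60.0 - i for i in range(17)]
    b = [-62.0 - i for i in range(17)]
    gamma = 1.0 / 17.0
    print(linear_kernel(a, b))
    print(polynomial_kernel(a, b, gamma))
    print(rbf_kernel(a, b, gamma))
    print(sigmoid_kernel(a, b, gamma))
```

Only the RBF kernel depends on the distance between the two RSS vectors rather than on their dot product, so nearby fingerprints always receive a similarity close to 1 regardless of the absolute signal level.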

4.6. The Impact of the Number of APs.
In general, a larger number of APs in a region leads to higher location accuracy. In the previous sections, the number of APs is 17; here we vary the number of APs from 1 to 20 to study its impact on the location accuracy. The APs used for location are randomly selected from all the APs, and the same experiment as above is repeated. The results are shown in Figure 9. They are consistent with our original prediction that the more APs are used, the higher the accuracy of the system is; the minimal average error distance is achieved with 20 APs. However, the APs are not equally important to the location: the accuracy does not increase smoothly with the number of APs, and some APs may influence the positioning accuracy more than others. Therefore, how to deploy the appropriate number of APs in the right places needs further research. Moreover, the proposed location method already obtains a high positioning accuracy once the number of APs reaches a certain value, and there is no significant improvement after the number of APs exceeds 8.
4.7. Determination of the $k$ Value.
As described before, the value of the parameter $k$ has an important impact on the positioning accuracy of the system, and it is closely related to the tolerance $\theta$, the reliability constant, and the deployment environment. To obtain a good positioning accuracy and meet the users' needs at the same time, $k$ should be set according to the wireless transmission environment, the time delay requirements of users, and the required positioning accuracy. To obtain the relation between the value of $k$ and the location prediction model, we vary $k$ from 1 to 12 in steps of 1, with the corresponding values of $\theta$ and the reliability constant. The experimental results are shown in Figure 10. Overall, the positioning accuracy of the model increases with $k$. The values of $\theta$ and the reliability constant define the training sample filtering standard: the more stringent the standard, the larger the value of $k$ required for the accuracy of the positioning model to converge and, at the same time, the better the accuracy. Conversely, the looser the standard, the smaller the value of $k$ required for convergence and the lower the location accuracy.

4.8. Resource Consumption of the Mobile Device.
Because mobile devices have limited storage space and computing power, reducing the storage and computing demands of a positioning method is also very important. To reduce the demand for data storage capacity of the mobile devices, the literature [28] proposed an algorithm for 802.11-based positioning systems. The experimental results for this algorithm show that the mobile terminal only needs to store 12 Kbytes of data, which greatly reduces the demand for storage capacity. However, the mobile devices need to communicate with the data storage center and update the "fingerprint" data synchronously, which greatly increases their communication load. For the method proposed in this paper, with a training sample size of 5000, the Gaussian (RBF) kernel function, and 17 APs, the size of the obtained position prediction model is only 81 Kbytes. This is far smaller than the entire "fingerprint" data (6110 Kbytes), which greatly reduces the storage load of the mobile device. The ANN method [9], because of its computational complexity, generally provides location-based services in server-client mode, which raises the problem of protecting users' privacy. For our positioning method, the main computation is completed during the offline period, and the location of the mobile device can be calculated with constant time complexity during the online period; thus the requirements on the computing power of the mobile device are not high.
4.9. The Impacts of Filtering/Preprocessing Rules.
Good data quality leads to accurate location results. Filtering/preprocessing rules decide the quality of the training data and the input data. In our paper, the particular preprocessing/filtering rules are devised according to the characteristics of the 802.11 radio environment. To demonstrate the effectiveness of these rules, we compared them with two other filtering/preprocessing methods mentioned in our paper: averaging and removing the least values. The results are shown in Figure 11. With averaging, only a very slight improvement is achieved compared with the No Preprocessing SVR; the two curves are almost identical. The reason is that unreliable wireless transmission produces abnormally low RSS values, which have a huge impact on the location results and are not removed by averaging. Removing the least values performs better than averaging because it can remove some of these spurious low RSS values. Our approach performs the best, which indicates that custom-made preprocessing/filtering approaches are necessary to get high accuracy.
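The difference between the two baseline rules can be seen on a small example. The sketch below is illustrative only: the function names and readings are hypothetical, and the number of lowest values that the "removing the least values" rule discards is not stated in the text, so it is exposed as the assumed parameter `n_drop`.

```python
from statistics import mean
from typing import Sequence


def average_rss(readings: Sequence[float]) -> float:
    """Baseline 1: plain average of all readings from one AP."""
    return mean(readings)


def drop_lowest_then_average(readings: Sequence[float], n_drop: int = 1) -> float:
    """Baseline 2: discard the n_drop lowest readings, then average.

    n_drop is an assumed parameter; the source does not specify it.
    """
    if not 0 <= n_drop < len(readings):
        raise ValueError("n_drop must be in [0, len(readings))")
    kept = sorted(readings)[n_drop:]
    return mean(kept)


if __name__ == "__main__":
    # Three consistent readings and one abnormally low one (dBm).
    readings = [-60.0, -61.0, -59.0, -90.0]
    print(average_rss(readings))               # -67.5
    print(drop_lowest_then_average(readings))  # -60.0
```

A single abnormal reading of -90 dBm pulls the plain average down by 7.5 dB, whereas dropping the lowest value recovers the level of the consistent readings.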
We also compare our approach with KDDA [24]. According to the existing literature [24], the mean error is 1.46 m in the experiments, which is higher than that of Preprocessing SVR and lower than that of the original SVR. Of course, the comparison may be unfair because the environments and setups are different. Therefore, we implemented our own version of KDDA without location clustering and used it as the filtering/preprocessing rule in the location process. As shown in Figure 11, KDDA performs better than averaging and removing the least values, which indicates that mapping the RSS signal into a high-dimensional kernel space can help in extracting the signal features. However, our Preprocessing SVR still performs better because its filtering/preprocessing rules improve the data quality significantly by removing the abnormal data samples.

Conclusions and Future Work
Location awareness, as a part of the context-aware computing paradigm, is one of the keys to the success of ubiquitous and pervasive computing. In this paper, we present a new 802.11-based indoor positioning method using support vector regression. Due to the complex indoor environment, wireless channel congestion, obstructions, and the limited communication range of nodes, the received signal strength (RSS) is vulnerable and changeable. To address these issues, corresponding data filtering rules and a $k$-times continuous measurement method are proposed. The results show that the proposed method has a higher positioning accuracy than the probabilistic positioning method and the neural network positioning method. At the same time, its demand for the storage capacity and computing power of the mobile devices is low.
Future research will focus on the relationship between the positions of the APs and the positioning accuracy of the location method. Given a certain number of APs, how to make full use of them to improve the positioning accuracy and reduce the cost of the system is crucial to real applications.

Figure 1: The framework of the location algorithm.

Figure 2: The location prediction model obtained by SVR.

Figure 3: The distribution of the RSS information.

Figure 4: The deployment of the positioning system.

Figure 5: Relation between the data quality and error distance.

4.4. The Impact of Training Sample Size.
The size of the training sample is a critical parameter of the positioning algorithm. It determines the lower bound of the time needed to collect the training data. To analyze the impact of the training sample size on the performance, we vary it between 1000 and 6000. The corresponding average error distance is shown in Figure 7. When the training sample size is less than 2000, the average error distance decreases rapidly as the size increases. Between 2000 and 4000, the average error distance decreases significantly more slowly. When the training sample size is greater than 4000, the average error distance remains almost constant as the size increases.

Figure 7: Training sample size versus average error distance.

Figure 8: CDF of the error distance of different kernel functions.

Figure 10: The value of  versus average error distance.

Figure 11: CDF of the error distance versus filtering/preprocessing rules.

Table 2: Error distances in different experiments.

Table 3: Comparison of the error distances of the different location methods.