Analyzing Machine Learning Models with Gaussian Process for the Indoor Positioning System

Recently, there has been growing interest in improving the efficiency and accuracy of Indoor Positioning Systems (IPS). The Received Signal Strength-(RSS-) based fingerprinting technique is essential for indoor localization. However, it is challenging to estimate the indoor position from RSS measurements in a complex indoor environment. This paper evaluates three machine learning approaches and Gaussian Process (GP) regression with three different kernels to obtain the best indoor positioning model. A hyperparameter tuning technique is used to select the optimum parameter set for each model. Experiments are carried out with RSS data from seven access points (APs). Results show that GP with a Rational Quadratic kernel and the eXtreme Gradient Tree Boosting (XGBoost) model have the best positioning accuracy compared with the other models. Moreover, the XGBoost model can achieve high positioning accuracy with a smaller training size and fewer access points.


Introduction
Wireless indoor positioning is attracting considerable attention due to the increasing demand for indoor location-based services. Examples of such services include guiding clients through a large building or helping mobile robots with indoor navigation and localization [1]. The global positioning system (GPS) has been used for outdoor positioning in the last few decades, but its positioning accuracy is limited in indoor environments; GPS signals are also weak indoors, so GPS is not appropriate for indoor positioning. Generally, IPSs are classified into two types, namely, radiofrequency-based systems and infrared-based systems. The radiofrequency-based system utilizes signal strength information at multiple base stations to provide user location services [2]. The infrared-based system uses sensor networks to collect infrared signals and deduces the infrared client's location by checking the location information of different sensors [3]. As the coverage range of infrared-based clients is up to 10 meters while that of radiofrequency-based clients is up to 50 meters, radiofrequency has become the most commonly used technique for indoor positioning.
Estimating the indoor position with the radiofrequency technique is still challenging, as signals vary due to the motion of the portable unit and the dynamics of the changing environment [4]. Moreover, the traditional geometric approach, which deduces the location from angle and distance estimates to different signal transmitters, is problematic, as the transmitted signal might be distorted by reflections and refraction in the indoor environment [5]. Machine learning approaches can avoid the complexity of determining an appropriate propagation model required by traditional geometric approaches and adapt well to local variations of the indoor environment [6]. Thus, we use machine learning approaches to construct an empirical model of the distribution of Received Signal Strength (RSS) in an indoor environment. The model can then determine the indoor position from the RSS information at that position.
The model-based positioning system involves offline and online phases. During the offline phase, the RSS readings from different APs are collected, and a machine learning model that captures the indoor environment's complex radiofrequency profile is trained with the RSS training samples [7]. During the online phase, the client's position is determined from the signal strength and the trained model. However, there is no state-of-the-art work that evaluates the model performance of different algorithms, and no guidelines on the size of training samples and the number of APs are provided for training the models.
In this paper, we compare three machine learning models, namely, Support Vector Regression (SVR), Random Forest (RF), and eXtreme Gradient Tree Boosting (XGBoost), with Gaussian Process Regression (GPR) to find the best model for indoor positioning. Each model is trained with the optimum parameter set obtained from a hyperparameter tuning procedure. Moreover, the GPR is trained with three kernels, namely, the Radial Basis Function (RBF) kernel, the Matérn kernel, and the Rational Quadratic (RQ) kernel, and evaluated with the average error and standard deviation. The prediction results are evaluated with different sizes of training samples and numbers of APs. Results show that the XGBoost model outperforms all the other models and related work in positioning accuracy. Moreover, the XGBoost model can achieve high positioning accuracy with a smaller training size and fewer APs. We design experiments whose results show the optimal number of access points and the size of RSS data for the optimal model. This paper is organized as follows. Section 2 summarizes the related work that constructs models for indoor positioning. Section 3 introduces the background of the machine learning approaches as well as the kernel functions for GPR. Sections 4 and 5 describe the procedure and experimental results for indoor positioning with the different approaches. Section 6 concludes the paper and outlines future work.

Related Work
A great deal of previous research has focused on improving indoor positioning accuracy with machine learning approaches. Brunato evaluated the k-nearest-neighbor approach for indoor positioning with wireless signals from several access points [8], achieving an average uncertainty of two meters. Battiti et al. compared a neural network-(NN-) based model and a k-nearest-neighbor model for locating a mobile terminal in a wireless LAN environment [9]. Results show that the NN model performs better than the k-nearest-neighbor model and achieves an average error of 1.8 meters. Wu et al. compared different kernel functions of support vector regression to estimate locations with GSM signals [6]. Their results show that the SVR models have better positioning performance than NN models. As SVR has the best prediction performance in the existing work, we select SVR as the baseline model against which to evaluate the other three machine learning approaches and the GPR approach with different kernels.
Besides machine learning approaches, Gaussian process regression has also been applied to improve indoor positioning accuracy. Schwaighofer et al. built Gaussian process models with the Matérn kernel function to solve the localization problem in cellular networks [5]. Bekkali et al. compared kernel functions for GPR and developed a location sensing system based on RSS data [7]. Alfakih et al. proposed a Gaussian Mixture Model to approximate the distribution of RSS for indoor localization [10]; their approach reaches a mean error of 1.6 meters. Less work has been done to compare GPR with traditional machine learning approaches. Our work assesses the positioning performance of the different models and experiments on the size of training samples and the number of APs for the optimum model.

Machine Learning Models and Gaussian Process Regression
In the past decade, machine learning has played a fundamental role in artificial intelligence areas such as lithology classification, signal processing, and medical image analysis [11–13]. More recently, there has been extensive research on supervised learning to predict or classify unseen outcomes from existing patterns. Given a set of data points {x_1, x_2, ..., x_n} associated with a set of labels {y_1, y_2, ..., y_n}, supervised learning builds a regressor or classifier to predict or classify the unseen y from x. Here each x_i is a feature vector and each y_i is the labeled value. A model h_θ is built with supervised learning, and for a given input x_i the predicted value is h_θ(x_i). The training process of supervised learning minimizes the difference between the predicted value h_θ(x_i) and the actual value y_i with a loss function L(h_θ(x_i), y_i). The model performance of supervised learning is usually assessed by the accumulated loss Σ_{i=1}^{M} L(h_θ(x_i), y_i) over M evaluation samples.
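The averaged-loss objective above can be made concrete with a toy squared-error example; the linear model h_θ(x) = θ·x and the data below are hypothetical illustrations, not the models used in the paper.

```python
import numpy as np

def empirical_risk(theta, X, y):
    """Average squared-error loss (1/M) * sum of L(h_theta(x_i), y_i)
    for a toy linear model h_theta(x) = theta * x."""
    predictions = theta * X
    return np.mean((predictions - y) ** 2)

X = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # labels generated by theta = 2

risk_good = empirical_risk(2.0, X, y)   # this theta fits the data exactly
risk_bad = empirical_risk(1.0, X, y)    # this theta underestimates every label
```

Training amounts to searching for the θ that drives this empirical risk down.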

Support Vector Regression.
The support vector machine (SVM) model is usually used to construct a hyperplane that separates a high-dimensional feature space and distinguishes data from different classes [14]. Drucker et al. proposed the support vector regression (SVR) algorithm, which applies a soft margin of tolerance ξ in SVM to approximate and predict values [15]. ξ defines the soft margin allowed for the model. The weights w of the model are calculated given that the model function h_θ(x) deviates by at most ε from the target y for all i. The goal of SVR is to minimize the objective in equation (1), where C is the penalty parameter of the error term:

min_{w, ξ, ξ*}  (1/2)‖w‖² + C Σ_{i=1}^{n} (ξ_i + ξ_i*),  subject to |y_i − h_θ(x_i)| ≤ ε + ξ_i,  ξ_i, ξ_i* ≥ 0.     (1)

SVR uses a linear hyperplane to separate the data and predict the values. However, in some cases, the distribution of the data is nonlinear. Thus, kernel functions map the nonlinearly separable feature space to a linearly separable one [16]. Equation (2) shows the Radial Basis Function (RBF) kernel for the SVR model, where σ defines the standard deviation of the data and ‖x − y‖² is the squared Euclidean distance between feature vectors x and y:

K(x, y) = exp(−‖x − y‖² / (2σ²)).     (2)

Random Forest.
In supervised learning, decision trees are commonly used as classification models to classify data with different features. Classification and Regression Trees (CART) [17] are usually used as the algorithm to build the decision tree. However, using one single tree to classify or predict data might cause high variance. Thus, ensemble methods were proposed to construct a set of tree-based classifiers and combine these classifiers' decisions with different weighting algorithms [18]. The Random Forest (RF) algorithm is one of the ensemble methods; it builds several regression trees and averages the final predictions of the individual regression trees [19]. Algorithm 1 shows the procedure of the RF algorithm. Given the feature space and its corresponding labels, the RF algorithm takes a random sample from the data and constructs a CART tree with randomly selected features. During the procedure, N trees are built to generate the forest. During the test phase, test data are fed into the forest, and each CART tree predicts a value ŷ_i based on its tree structure. The final output of the forest is the average of all the predicted values. During the training process, the number of trees and the tree parameters must be determined to obtain the best parameter set for the RF model.
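Algorithm 1's build-N-trees-and-average procedure corresponds to a standard random-forest regressor. A minimal sketch with scikit-learn follows; the RSS-shaped data (200 samples from 7 APs, one position coordinate as target) are synthetic placeholders, not the paper's data set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical fingerprint data: 200 samples of RSS (dBm) from 7 APs;
# the target is one position coordinate in metres.
X = rng.uniform(-90, -30, size=(200, 7))
y = 0.05 * X.sum(axis=1) + rng.normal(0.0, 0.1, size=200)

# N = 50 CART trees are built on bootstrap samples with random feature
# subsets; the forest's prediction is the average of the tree outputs,
# matching step (3) of Algorithm 1.
forest = RandomForestRegressor(n_estimators=50, max_depth=10, random_state=0)
forest.fit(X, y)
predictions = forest.predict(X[:5])
```

The `n_estimators` and `max_depth` values mirror the kind of parameters tuned later in the paper.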

Boosting Approaches.
Besides SVR and RF, boosting is also useful in supervised learning to reduce the bias and variance of a model by constructing strong models from weak models step by step [20]. In each step, the model's weakness is identified from the data pattern, and the weak model is then altered to fit that pattern. In recent years, there has been a greater focus on eXtreme Gradient Tree Boosting (XGBoost) models [21]. Friedman et al. proposed using gradient descent in the boosting approach to minimize the loss function Σ_{i=1}^{n} L(y_i, h(x_i)) [22] and refined the boosting model with regression trees in [23]. In their approach, a first-order Taylor expansion of the loss function is used to approximate the regression tree learning. Chen and Guestrin proposed using a higher-order approximation to obtain a better regression tree structure [21].
The XGBoost algorithm works as in Algorithm 2. The model is initialized with a constant function f_0(x) = a_0 that minimizes the loss Σ_{i=1}^{n} L(y_i, a_0). In each boosting step, the multipliers p_ik and q_ik are calculated as the first-order and higher-order Taylor expansions of the loss function L(y_i, f(x_i)) and are used to calculate the leaf weights that build the regression tree structure. Then the current model f_k(x) is obtained by updating the previous model f_{k−1}(x) with the shrunk base model ρ a_k h_k(x). Finally, the weak models are combined to generate the strong model f(x). In XGBoost, the number of boosting iterations and the structure of the regression trees affect the performance of the model. Thus, these parameters are tuned with cross-validation to get the best XGBoost model.
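As a sketch of the stagewise update f_k(x) = f_{k−1}(x) + ρ a_k h_k(x), the snippet below uses scikit-learn's GradientBoostingRegressor as a stand-in; the paper's experiments use the XGBoost library (whose `XGBRegressor` exposes an analogous interface), and the data here are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-90, -30, size=(300, 7))   # synthetic RSS-like features
y = np.sin(X[:, 0] / 15.0) + 0.02 * X[:, 1]

# Each of the N boosting iterations fits a small regression tree to the
# gradient of the loss and adds it with shrinkage; learning_rate plays
# the role of the shrinkage factor rho in Algorithm 2.
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=3, random_state=0)
model.fit(X, y)

# train_score_[k] is the training loss after stage k, so it decreases
# as weak models accumulate.
final_loss, first_loss = model.train_score_[-1], model.train_score_[0]
```

The decreasing per-stage training loss is exactly the stagewise strengthening described above.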

Gaussian Process Regression.
Gaussian process (GP) is a distribution over functions with a continuous domain, such as time or space [24]. In recent years, the Gaussian process has been used in many areas such as image thresholding, spatial data interpolation, and simulation metamodeling. A GP g(x) is usually parameterized by a mean function μ(x) and a covariance function K(x, x′), formalized in equations (3) and (4):

μ(x) = E[g(x)],     (3)

K(x, x′) = E[(g(x) − μ(x))(g(x′) − μ(x′))].     (4)

Given a set of data points {x_1, x_2, ..., x_n} associated with a set of labels {y_1, y_2, ..., y_n}, each label y_i can be seen as arising from a Gaussian noise model:

y_i = g(x_i) + ε_i,  g ∼ GP(μ(x), K(x, x′)),  ε_i ∼ N(0, σ²),     (5)

where GP(μ(x), K(x, x′)) defines the stochastic map from each data point to its label and ε_i is the measurement noise, assumed Gaussian with standard deviation σ. Given the training data x with corresponding labels y, as well as the test data x* with corresponding labels y* from the same distribution, the joint distribution satisfies

[y; y*] ∼ N(0, [K(x, x) + σ²I, K(x, x*); K(x*, x), K(x*, x*)]),     (6)

where K(x, x) is the covariance matrix of the training data points x, K(x*, x) is the covariance matrix between the test data points and the training points, and K(x*, x*) is the covariance matrix between the test points.

Then, the conditional distribution of y* can be formalized as equation (7):

y* | y, x, x* ∼ N(K(x*, x)[K(x, x) + σ²I]⁻¹ y,  K(x*, x*) − K(x*, x)[K(x, x) + σ²I]⁻¹ K(x, x*)).     (7)

Maximum likelihood estimation (MLE) has been used in statistical models given prior knowledge of the data distribution [25]. Thus, given the training data points x with labels y, the estimate of y* for targets x* can be obtained by maximizing the joint log-likelihood log p(y* | y, x, x*) in equation (7).
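The posterior in equation (7) can be computed directly with a few lines of linear algebra. The sketch below assumes an RBF covariance and synthetic one-dimensional data; the length scale and noise level are illustrative choices.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.5):
    """RBF covariance K(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(30, 1))              # training inputs
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.05, 30)  # noisy labels
X_star = np.array([[0.0], [0.4]])                     # test inputs
sigma2 = 0.05 ** 2                                    # noise variance

K = rbf_kernel(X, X) + sigma2 * np.eye(len(X))        # K(x, x) + sigma^2 I
K_star = rbf_kernel(X_star, X)                        # K(x*, x)

# Posterior mean and covariance of y*, as in equation (7).
mean = K_star @ np.linalg.solve(K, y)
cov = rbf_kernel(X_star, X_star) - K_star @ np.linalg.solve(K, K_star.T)
```

With dense, low-noise training data, the posterior mean at a test point tracks the underlying function closely.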

Mathematical Problems in Engineering
In GPR, covariance functions are also essential for the performance of GPR models. This paper mainly evaluates three covariance functions, namely, the Radial Basis Function (RBF) kernel, the Matérn kernel, and the Rational Quadratic kernel.
The RBF kernel is a stationary kernel parameterized by a scale parameter l that defines the covariance function's length scale; equation (2) shows its kernel function. The Matérn kernel adds a parameter ν that controls the smoothness of the resulting function, given in equation (9):

k(x, x′) = (2^{1−ν} / Γ(ν)) (√(2ν) ‖x − x′‖ / l)^ν K_ν(√(2ν) ‖x − x′‖ / l),     (9)

where K_ν is a modified Bessel function. Equation (10) shows the Rational Quadratic kernel, which can be seen as a mixture of RBF kernels with different length scales; the parameter α controls the mixture of length scales:

k(x, x′) = (1 + ‖x − x′‖² / (2αl²))^{−α}.     (10)
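The three kernels can be compared side by side with scikit-learn's GaussianProcessRegressor; the one-dimensional data and initial kernel settings below are illustrative only, not the paper's configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 1))
y = np.sin(3.0 * X[:, 0])

# The three covariance functions evaluated in the paper; the length
# scales (and nu, alpha) are starting values that the fit optimizes.
kernels = {
    "RBF": RBF(length_scale=1.0),
    "Matern": Matern(length_scale=1.0, nu=1.5),
    "RationalQuadratic": RationalQuadratic(length_scale=1.0, alpha=1.0),
}
r2 = {}
for name, kernel in kernels.items():
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-4, random_state=0)
    gpr.fit(X, y)
    r2[name] = gpr.score(X, y)   # training R^2, just to compare fits
```

In practice one would compare held-out distance errors, as the paper does, rather than training R².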

Experiment with Offline Training
In this paper, we use the RSS-based modeling technique that explores the relationship between a specific location and its corresponding RSS. Figure 1 shows the procedure that builds the indoor positioning model by comparing the performance of different machine learning models. In the offline phase, RSS data from several APs are collected as the training data set. There are two steps to train the offline RSS-based model. In the first step, cross-validation (CV) is used to test whether the model is suitable for the given data; CV can be used for feature selection and hyperparameter tuning. With 5-fold CV, the training data is split into five folds. During the training process, the model is trained on four folds and tested on the remaining fold. The training procedure is repeated five times to calculate the average accuracy of the model with a specific parameter set. After we obtain the model with the optimum parameter set, the second step of the offline phase trains the model with the full RSS data. We then obtain the final model that maps the RSS to its corresponding position in the building. Later, in the online phase, we use the generated model for indoor positioning.
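The 5-fold procedure described above can be sketched with scikit-learn's cross_val_score; the RSS data and the SVR settings here are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Hypothetical RSS features from 7 APs and one target coordinate.
X = rng.uniform(-90, -30, size=(100, 7))
y = 0.05 * X.mean(axis=1)

# 5-fold CV: the model is trained on four folds and tested on the
# held-out fold, five times; the fold scores are then averaged.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(SVR(kernel="rbf", C=10.0, gamma=0.01), X, y, cv=cv)
mean_score = scores.mean()
```

Repeating this for each candidate parameter set yields the averaged score used to pick the optimum parameters.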

Data Collection.
To construct the fingerprinting database and evaluate the machine learning models, we collect RSS data in an indoor environment whose floor plan is shown in Figure 2. In the building, we place seven access points, whose locations are marked in Figure 2.

(1) Given: the training data x_i with labels y_i and the number of trees N
(2) For each of the N trees:
(a) Construct a CART tree CART(x) with randomly selected data x ∈ x_i and randomly selected features
(b) Get the prediction ŷ of each CART tree and add the CART tree to the forest F = F ∪ CART(x)
(3) Predict the final result ŷ from the forest: ŷ = Σ_{i=1}^{N} ŷ_i / N
ALGORITHM 1: Random Forest algorithm.
(1) Given: (x_1, y_1), ..., (x_n, y_n), where x_i ∈ X and y_i ∈ ℝ, and the number of iterations N
(2) Initialize the model with the constant f_0(x) = a_0 minimizing Σ_{i=1}^{n} L(y_i, a_0)
(3) For k = 1, ..., N:
(a) Compute the multipliers p_ik and q_ik from the Taylor expansions of the loss L(y_i, f_{k−1}(x_i))
(b) Learn the regression tree structure h_k(x)
(c) Determine the leaf weight for the learnt structure with p_ik and q_ik
(d) Update current model f_k(x) with previous model f_{k−1}(x) and the constrained ρ a_k h_k(x)
(4) Output the strong model f(x) = f_N(x)
ALGORITHM 2: XGBoost algorithm.

The training set's size could be adjusted based on the model performance, which is discussed in the following section. A further 200 RSS samples are collected during the day, with people moving and the environment changing; these are used to evaluate the model performance.

Hyperparameter Tuning.
As is shown in Section 3, the machine learning models require hyperparameter tuning to get the best model that fits the data. Table 1 shows the parameters requiring tuning for each machine learning model. Tuning is a process that uses a performance metric to rank the regressors with different parameters and select the optimum parameter set for each specific model [11]. In this paper, we use the distance error as the performance metric to tune the parameters. Given the predicted coordinates of the location as (x̂, ŷ) and the true coordinates of the location as (x, y), the Euclidean distance error is calculated as

d = √((x̂ − x)² + (ŷ − y)²).

Underfitting and overfitting often affect model performance. In this paper, we use the validation curve with 5-fold cross-validation to show the balanced trade-off between the bias and variance of the model. In the validation curve, the training score is higher than the validation score, as the model fits the training data better than the test data. While the validation score is still increasing, the model is underfitting; when the validation score starts to decrease, the model is overfitting. Thus, validation curves can be used to select the best parameter of a model from a range of values. Results show that nonlinear models have better prediction accuracy than linear models, which is expected, as the distribution of RSS over distance is not linear. Table 1 shows the optimal parameter settings for each model, which we use to train the different models.
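The Euclidean distance-error metric used for tuning can be implemented in a few lines; the coordinates below are illustrative values.

```python
import numpy as np

def mean_distance_error(true_xy, pred_xy):
    """Average Euclidean distance sqrt((x_hat - x)^2 + (y_hat - y)^2)
    between predicted and true coordinates, in metres."""
    diff = np.asarray(pred_xy, dtype=float) - np.asarray(true_xy, dtype=float)
    return float(np.mean(np.sqrt((diff ** 2).sum(axis=1))))

true_xy = np.array([[0.0, 0.0], [3.0, 4.0]])
pred_xy = np.array([[0.0, 0.0], [0.0, 0.0]])
error = mean_distance_error(true_xy, pred_xy)  # (0 + 5) / 2 = 2.5
```

Wrapping this function as a scorer lets the cross-validation machinery rank parameter sets by positioning error rather than by a generic regression score.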
Figure 3 shows the tuning process that determines the optimum values for the penalty parameter C and the kernel coefficient γ for the SVR with RBF and linear kernels. Results show that the RBF kernel has better prediction accuracy than the linear kernel in SVR. This is expected, as the distribution of RSS over distance is not linear, so linear models cannot describe it correctly. Moreover, the selection of the coefficient γ of the SVR with RBF kernel is critical to the performance of the model. The validation curve shows that when γ is 0.01, the SVR has the best performance in predicting the position.
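The validation-curve search over the SVR kernel coefficient can be sketched with scikit-learn's validation_curve; the data and the γ grid below are illustrative, not the paper's actual sweep.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-90, -30, size=(120, 7))   # synthetic RSS-like features
y = np.sin(X[:, 0] / 15.0)

# Score the SVR over a range of gamma values with 5-fold CV; the best
# gamma is where the mean validation score peaks.
gammas = [1e-4, 1e-3, 1e-2, 1e-1]
train_scores, valid_scores = validation_curve(
    SVR(kernel="rbf", C=10.0), X, y,
    param_name="gamma", param_range=gammas, cv=5)
best_gamma = gammas[int(valid_scores.mean(axis=1).argmax())]
```

Plotting the mean train and validation scores against γ reproduces the underfitting/overfitting picture described above.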

Figure 4 shows the tuning process that determines the optimum value for the number of trees in the random forest as well as the structure of the individual trees. The validation curve shows that the maximum depth of the trees affects the performance of the RF model: when the maximum depth of an individual tree reaches 10, the model achieves its best performance. The number of trees and the other parameters concerning the tree structure do not affect the prediction accuracy much.
Figure 5 shows the tuning process that determines the optimum values for the number of boosting iterations and the learning rate of the AdaBoost model. Results show that a higher learning rate leads to better model performance. While the number of iterations has little impact on prediction accuracy, 300 boosting iterations can be used to train the model to reduce the training time.
Figure 6 shows the tuning process that determines the optimum values for the number of boosting iterations, the learning rate, and the individual tree structure of the XGBoost model. To avoid overfitting, we also tune the subsample parameter that controls the ratio of training data sampled before growing trees. The validation curve suggests that a higher learning rate and more boosting iterations yield better model performance. The individual tree structure and the ratio of training data have less impact on prediction accuracy.

Kernel Selection for Gaussian Process Regression.
Besides the typical machine learning models, we also analyze GPR with different kernels for the indoor positioning problem. Table 2 shows the distance error with a confidence interval for the different kernels with length scale bounds. In statistics, the factor 1.96 is used in constructing 95% confidence intervals [26]; we calculate the confidence interval by multiplying the standard deviation by 1.96. Overall, the three kernels have similar distance errors. However, the confidence intervals differ greatly: the RBF and Matérn kernels have 95% confidence intervals of 4.4 m and 8.74 m, respectively, while the Rational Quadratic kernel has a 0.72 m confidence interval. The Rational Quadratic kernel is therefore the most stable choice for the GPR algorithm. Thus, we select it as the kernel of the GPR model to compare with the other machine learning models.
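The 95% interval computation is a one-liner; the per-sample distance errors below are hypothetical values, not the paper's measurements.

```python
import numpy as np

# Hypothetical per-sample distance errors (metres) for one kernel.
errors = np.array([1.2, 0.8, 1.5, 1.1, 0.9])

# 95% confidence half-width: 1.96 times the standard deviation of the
# errors; the interval is centred on the mean distance error.
half_width = 1.96 * errors.std()
interval = (errors.mean() - half_width, errors.mean() + half_width)
```

A narrower interval at the same mean error indicates a more stable model, which is the basis for preferring the Rational Quadratic kernel here.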

Model Evaluation and Experiment Results
In the previous section, we trained the machine learning models with 799 RSS samples. In this section, we evaluate the trained models with different training sizes and numbers of APs. With the increase of the training size, GPR achieves better performance, while it remains slightly weaker than the XGBoost model. The number of APs determines the number of features. Results show that the distance error decreases gradually for the SVR model. The graph also shows a sharp drop in the distance error over the first three APs for the XGBoost, RF, and GPR models; the distance error of the three models then reaches a steady stage. This trend indicates that only three APs are required to determine the indoor position; more APs are not helpful, as the indoor positioning accuracy does not improve with additional APs. Overall, XGBoost still has the best performance among the XGBoost, RF, and GPR models.

Conclusion and Future Work
In this paper, we evaluate different machine learning approaches for indoor positioning with RSS data. The models include SVR, RF, XGBoost, and GPR with three different kernels. Hyperparameter tuning is used to select the optimum parameter set for each model. Then the performance of the different models is evaluated using the Euclidean distance error between the predicted and real coordinates. Results show that XGBoost has the best performance compared with all the other machine learning models. Also, 600 samples are enough for the RSS training size, as the distance error does not change dramatically after the training size reaches 600. Results also reveal that 3 APs are enough for indoor positioning, as the distance error does not decrease with more APs. Indoor position estimation is usually challenging for robots with only built-in sensors, as accumulated errors are introduced into the localization process when the robot moves around. With our proposed XGBoost model and RSS signals, the robot can estimate its position without accumulated error. Thus, future work can further decrease the positioning error by using the extended Kalman filter localization algorithm to fuse the built-in sensor data with the RSS data.

Figure 1: Indoor positioning modeling procedure with offline phase and online phase.

Figure 2: Indoor floor plan with access points marked by red pentagrams.

Figure 3: Hyperparameter tuning for SVR with linear and RBF kernels. (a) γ of RBF kernel. (b) C of RBF kernel. (c) γ of linear kernel. (d) C of linear kernel.

Figure 7(b) reveals the impact of the number of APs on the different machine learning models. In the training process, we use the RSS collected from the different APs as features to train the model.

Figure 7: Features that affect model performance of indoor positioning. (a) Impact of the number of RSS samples. (b) Impact of the number of access points.

Table 1: Hyperparameter tuning for different machine learning models.
XGBoost also outperforms the SVR with RBF kernel. However, the XGBoost model and the GPR with Rational Quadratic kernel have similar performance concerning the distance error. In this section, we evaluate the impact of the size of training samples and the number of APs to find the model with high indoor positioning accuracy that requires fewer resources, such as training samples and APs.
Results reveal a gradual decrease in distance error with increasing training size for all machine learning models. At all stages, XGBoost has the lowest distance error compared with all the other models. The RF model has similar performance with a slightly higher distance error.

Table 2: Distance error with confidence interval for different Gaussian process regression kernels.