Random Forest Based Coarse Locating and KPCA Feature Extraction for Indoor Positioning System

With the rapid development of mobile terminals, positioning techniques based on the fingerprinting method have drawn attention from many researchers and even from world-famous companies. To overcome some shortcomings of existing fingerprinting systems and further improve system performance, we first propose a coarse positioning method based on random forest, which is able to customize several subregions and to classify a test point to its region with outstanding accuracy compared with some typical clustering algorithms. Second, based on mathematical analysis, the kernel principal component analysis algorithm is applied to radio map processing, which may provide better robustness and adaptability than linear feature extraction methods and manifold learning techniques. We build both a theoretical model and a real environment to verify feasibility and reliability. The experimental results show that the proposed indoor positioning system achieves 99% coarse locating accuracy and improves fine positioning accuracy by 15% on average in a strongly noisy environment compared with some typical fingerprinting-based methods.


Introduction
With the rapid development of mobile terminals and wireless network techniques, location based services (LBS) have become unprecedentedly popular in recent years. World-renowned research institutions have devoted great attention and effort to both indoor positioning and related business applications, such as the cooperation between Alibaba and AutoNavi and the competition between Google and Baidu. Many emerging indoor positioning systems based on ultrasound, infrared, and radio frequency have been proposed recently [1,2]. The fingerprinting method, as one of the most convenient techniques, has become more popular than other systems in civilian applications because it can use the existing WLAN infrastructure and offers wide coverage [3,4].
Firstly, in order to avoid tremendous computational complexity and reduce the error margin of a large fingerprinting dataset, or radio map, a clustering algorithm is normally implemented in indoor positioning systems to separate the database into several subradio maps and thereby boost performance. However, conventional methods, for example, k-means, fuzzy c-means (FCM), and affinity propagation [5,6], cluster the radio map only on the basis of received signal strength (RSS) values in signal space and do not take the coordinates in physical space into account. Besides, the reference points (RPs) of certain areas (e.g., a confidential room) within a radio map might need to be grouped together so that the indoor positioning service of that area is provided only to authorized people. Under these circumstances, conventional methods may not suffice because the range of a cluster depends mostly on the algorithm rather than on the administrator.
To solve these issues, we propose a coarse locating method based on random forest (RF) [7] for accurately classifying a test point to its subarea, while the number of subregions and their ranges can be customized by the system administrator without the constraints presented above. Moreover, the coarse locating scheme can also work in a three-dimensional scenario to discriminate between floors, or serve as the basis for positioning on a size-reduced radio map. In addition, compared with some typical machine learning techniques, such as artificial neural network (ANN) [8] and support vector machine (SVM) [9][10][11], the proposed RF based coarse positioning method shows better performance in terms of classification accuracy and training time complexity.
Secondly, deploying a feature extraction algorithm on the radio map can decrease the noise and improve the positioning accuracy at the expense of increased computational complexity [12]. Some algorithms have been successfully introduced into fingerprinting systems. For instance, an indoor positioning system based on linear discriminant analysis (LDA or MDA) is presented in [8]. In parallel, principal component analysis (PCA) for indoor positioning is deployed in [13,14]. However, PCA and LDA are linear feature extraction methods, and it is hard to prove that linear methods are suitable for processing datasets with nonlinear features, such as the radio map. Some typical manifold learning methods, for example, local linear embedding (LLE), isometric feature mapping (ISOMAP), and local discriminant embedding (LDE), can be deployed in the positioning system as well [15,16]. However, LLE is a local approach and requires dense data on the manifold to ensure precise extraction, which seems impractical in some indoor areas. ISOMAP is capable of representing the dataset globally, but it relies on the hypothesis that the high-dimensional dataset is isometric to a convex subset of Euclidean space. The LDE algorithm shows comparable performance in terms of feature extraction and efficiency of dimensionality reduction according to our earlier work [15,17]. However, the shortcomings of a local method still exist in LDE, in addition to the unsettled optimal number of neighbors and the most appropriate clustering method within LDE.
In this paper, a kernel PCA approach based on the PCA method [13,18] is deployed as the feature extraction technique for the indoor positioning system. It takes advantage of kernel methods to map the nonlinear dataset into a higher-dimensional space, where the dataset can reveal linear structure and become separable. Moreover, although manifold learning differs from the kernel method, the two are closely related. Essentially, ISOMAP, LLE, and LTSA (local tangent space alignment) [16] can be expressed or explained by kernel PCA with different kernel matrices [19]. Therefore, the kernel PCA algorithm could potentially be an effective method given an adaptive kernel function and adjusted parameters.
The remainder of this paper is organized as follows. In Section 2, we describe the conventional fingerprinting method for positioning. On this basis, in Section 3, we propose a new indoor positioning system, followed by the theoretical analysis of the RF based coarse locating method and the kernel PCA feature extraction method. In Section 4, we verify the proposed system in both a simulated model and a real environment; comparisons with related methods are also presented in that section. Section 5 concludes the paper.

Fingerprinting Method for Indoor Positioning
This section gives a detailed introduction to the traditional fingerprinting positioning system in an indoor environment. At the beginning, RSS values from all available access points (APs) are sampled by an administrator with a mobile terminal on a grid of reference points evenly distributed over the region of interest. A radio map can thereby be built from both the RSS values and the location coordinates. The current location of any customer can then be estimated from the RSS readings of their mobile device by matching the received values against the radio map. Thus, radio map building and location matching are the two major procedures of traditional fingerprinting methods.

Building Up Radio Map.
As presented above, the RSS values are sampled and recorded at known spots using a mobile device. However, the height and direction of the device antenna may affect the sampled signals, which, to some extent, influences the accuracy of a positioning system (communication states, the type of WLAN card, and the driver version may also have an impact). Therefore, as a compromise, we consider only the holding-in-hand situation, in which the user holds the mobile device while using LBS; the terminal antenna is then normally placed at a height of 1.2 m and its direction is kept consistent while no data is being transmitted by any AP, and the RSS readings at each reference point are averaged over four directions.
Let the RSS values sampled from AP $j$, $j = 1, 2, \ldots, N$, at RP $i$, $i = 1, 2, \ldots, M$, be $\varphi_{i,j}(\tau)$, $\tau = 1, 2, \ldots, \Theta$, where $\Theta$ is the total number of discrete time samples and $N$ and $M$ are the numbers of available APs and RPs, respectively. The actual RSS reading of AP $j$ at RP $i$ is taken as the average of the time samples and represented by

$$\varphi_{i,j} = \frac{1}{\Theta} \sum_{\tau=1}^{\Theta} \varphi_{i,j}(\tau). \tag{1}$$

Therefore, the RSS part of the radio map can be denoted by

$$\Phi = \begin{bmatrix} \varphi_{1,1} & \varphi_{1,2} & \cdots & \varphi_{1,N} \\ \varphi_{2,1} & \varphi_{2,2} & \cdots & \varphi_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_{M,1} & \varphi_{M,2} & \cdots & \varphi_{M,N} \end{bmatrix}, \tag{2}$$

and each row of $\Phi$ (a row vector of the matrix), denoted by $\varphi_i = [\varphi_{i,1}, \varphi_{i,2}, \varphi_{i,3}, \ldots, \varphi_{i,N}]$, $i = 1, 2, \ldots, M$, stands for the RSS values of one RP. The radio map can then be denoted by $(\varphi_i, c_i)$, $i = 1, 2, \ldots, M$, $\varphi_i \in \mathbb{R}^N$, where $c_i = (x_i, y_i)$ is the location coordinates of RP $i$.
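The averaging step that turns raw time samples into radio map rows can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_radio_map`, the nested-list layout of the samples, and the numeric values are all assumptions made for the example.

```python
from statistics import mean

def build_radio_map(samples, coords):
    """Average the time samples of every (RP, AP) pair.

    samples[i][j] is the list of RSS readings (dBm) of AP j taken at RP i,
    and coords[i] is the (x, y) location of RP i. Both layouts are
    assumptions made for this sketch.
    """
    radio_map = []
    for rp_samples, xy in zip(samples, coords):
        # One averaged RSS value per AP forms the RSS vector of this RP.
        rss_vector = [mean(ap_readings) for ap_readings in rp_samples]
        radio_map.append((rss_vector, xy))
    return radio_map

# Two RPs, two APs, three time samples each (made-up values).
samples = [
    [[-40, -42, -44], [-60, -60, -63]],
    [[-55, -55, -55], [-48, -50, -52]],
]
coords = [(0.0, 0.0), (0.5, 0.0)]
radio_map = build_radio_map(samples, coords)
```

Each entry of `radio_map` pairs an averaged RSS vector with the coordinates of its reference point, which is the form used by the matching step.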

Location Matching Based on Radio Map.
Many approaches can be deployed for the location matching process: probabilistic methods such as Bayesian inference, deterministic methods such as $k$-nearest neighbors, and machine learning methods such as ANN [8] and support vector regression [12]. To concentrate on the two proposed methods and to respect the limitations of mobile computing platforms, we simply adopt the weighted $k$-nearest neighbors (WKNN) method for location matching in the proposed positioning system because of its effectiveness and low computational complexity. A brief analysis of WKNN is presented below. After receiving signals from all available APs at a test point (TP), the RSS readings are matched against the most similar points in the radio map. The WKNN method measures the similarity between the TP and each RP $i$ by calculating their distance

$$D_q(i) = \left( \sum_{j=1}^{N} \left| \varphi_{i,j} - \varphi_{j,\mathrm{test}} \right|^{q} \right)^{1/q}, \tag{3}$$

where $\varphi_{i,j}$ is the averaged RSS value of AP $j$ at RP $i$, $N$ is the number of APs, $\varphi_{j,\mathrm{test}}$ is the RSS value received from AP $j$ at the TP, and $D_q$ is the Euclidean distance when $q$ equals 2. The $K$ RPs with the minimum distances are then taken, with normalized weights $w_k$, to estimate the location of the TP according to the coordinate information of the radio map.
Finally, the estimated location of the TP is calculated by

$$\hat{c} = \sum_{k=1}^{K} w_k c_k, \qquad \sum_{k=1}^{K} w_k = 1, \tag{4}$$

where $c_k$ is the location coordinates of the $k$th selected RP.
It is obvious that the dimensionality of a radio map depends on both the number of RPs and the number of deployed APs. Therefore, when positioning over a large area with many RPs and numerous APs, the radio map grows considerably and the computational burden increases sharply. Besides, if some APs break down, the fingerprinting system may be severely degraded because of the missing dimensions, or may even stop working.
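The WKNN matching described above can be illustrated with a short sketch. The text only requires the weights to be normalized; the inverse-distance weighting used here, the function name `wknn_locate`, and the small constant `eps` are assumptions made for the example.

```python
def wknn_locate(radio_map, rss_test, k=4, q=2, eps=1e-9):
    """Estimate a location from the k closest RPs in signal space.

    radio_map is a list of (rss_vector, (x, y)) pairs. Inverse-distance
    weights are an assumption; the text only asks for normalized weights.
    """
    scored = []
    for rss, xy in radio_map:
        # Minkowski distance of order q (Euclidean when q == 2).
        dist = sum(abs(a - b) ** q for a, b in zip(rss, rss_test)) ** (1.0 / q)
        scored.append((dist, xy))
    nearest = sorted(scored, key=lambda item: item[0])[:k]
    raw = [1.0 / (dist + eps) for dist, _ in nearest]  # eps avoids division by zero
    total = sum(raw)
    weights = [w / total for w in raw]                  # weights sum to one
    x = sum(w * xy[0] for w, (_, xy) in zip(weights, nearest))
    y = sum(w * xy[1] for w, (_, xy) in zip(weights, nearest))
    return x, y

# Three RPs and a test point that is equally close to the first two (made-up values).
demo_map = [([-40, -60], (0.0, 0.0)), ([-60, -40], (2.0, 0.0)), ([-80, -80], (0.0, 2.0))]
estimate = wknn_locate(demo_map, [-50, -50], k=2)
```

With equal distances to the two nearest RPs, the estimate falls midway between them, which is a quick sanity check for the weighting.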

Proposed Indoor Positioning System and the Deployed Methods Analysis
We design the new positioning system with two phases, namely, the offline stage and the online stage. The structure of the proposed system is illustrated in Figure 1.
In the offline stage, the radio map of the positioning area is first established as presented in Section 2. It is then divided into several subradio maps in line with the supervisor's demands. After that, the random forest classifiers (models) for coarse locating are generated by training on the labeled subradio maps. Meanwhile, the kernel PCA method is applied to extract the effective features of each subradio map. The extracted subradio maps (in feature space) of each subregion and the corresponding transfer matrices obtained in this step are stored together with the RF models for the online positioning procedure.
In the online stage, a group of RSS values is first received by a mobile terminal. The terminal is then directly located to a subregion by applying the RF classifiers; this process is referred to as the coarse locating procedure. After that, the sampled RSS values are projected into feature space by the transfer matrix of that subregion so that they can be matched against the radio map in feature space. Thereafter, the online matching algorithm (WKNN) is employed for precise position estimation, and finally the proposed positioning system outputs the estimated location coordinates.
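The offline step of dividing the radio map into administrator-defined subradio maps can be sketched as below. The text does not say how subregions are specified; describing each one by an axis-aligned rectangle, as well as the names `label_reference_points` and `regions`, are hypothetical choices made for illustration.

```python
def label_reference_points(radio_map, regions):
    """Split a radio map into subradio maps using rectangular regions.

    radio_map is a list of (rss_vector, (x, y)) pairs and regions maps a
    label to (x_min, y_min, x_max, y_max). Rectangles are a hypothetical
    way for an administrator to describe subregions.
    """
    sub_maps = {label: [] for label in regions}
    for rss, (x, y) in radio_map:
        for label, (x0, y0, x1, y1) in regions.items():
            if x0 <= x < x1 and y0 <= y < y1:
                sub_maps[label].append((rss, (x, y)))
                break  # each RP is assigned to at most one subregion
    return sub_maps

# Made-up radio map with three RPs and two administrator-defined regions.
rp_map = [([-40, -70], (1.0, 1.0)), ([-65, -45], (6.0, 1.0)), ([-50, -50], (1.5, 2.0))]
regions = {"lab": (0.0, 0.0, 5.0, 5.0), "corridor": (5.0, 0.0, 10.0, 5.0)}
sub_maps = label_reference_points(rp_map, regions)
```

The resulting labels can serve as the class targets for training the coarse locating classifier, and each sublist can be processed separately for feature extraction. Reference points that fall outside every rectangle are simply left unassigned in this sketch.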
Initialization: set the number of classes $C$; the number of RPs $M$; the number of features $N$ (i.e., the number of APs); and the number of features $n$ considered at a node of a decision tree, where $n < N$.
FOR each decision tree
    Select a subset of the radio map dataset randomly (with replacement) with known class labels (i.e., randomly select $m$ RPs with their class labels, where $m < M$). The rest of the radio map is reserved to test the error rate.
    FOR each node of the tree
        Select $n$ features randomly to form the criterion at the node.
        Calculate the best split accordingly.

Algorithm 1
Moreover, unlike some positioning systems that upload the measured RSS to a central server for computing and estimating the location [8], the proposed system is capable of running independently on a mobile terminal without the aid of any central computer. Furthermore, the computational complexity and the resource limitations of the mobile terminal are taken into account in the proposed system. The main computing cost (including training the RF classifiers and the KPCA transfer matrices) is therefore carried by a powerful computer in the offline stage so as to reduce the online computational complexity.

Coarse Locating by Random Forest.
Random forest is essentially an ensemble classifier consisting of many decision trees [7]. The output of the RF classifier is the class that occurs most frequently among the outputs of the decision trees. The training of each decision tree is a recursive process in which the input dataset is split into subsets, and the process keeps running until the samples at each tree node share the same output target. The random forest classifier takes the number of decision trees as a parameter and assigns weights according to their contribution; these weights are created within the ensemble forest classifier without the traditional tree pruning process (which simply removes the parts of a tree that contribute little).
Random forest classification is adopted in the proposed locating system for the coarse positioning process. Its bagged ensemble classifier has the advantages of mitigating the overfitting problem and of being able to process large-scale datasets, which meets the requirements of the locating system. The procedure for training the random forest classifiers is briefly given in Algorithm 1.
After the random forest classifiers are trained, the generated classification model can be used directly in the proposed positioning system for coarse locating. For instance, we feed the RSS values of a test point into the classifiers, and the system output indicates the class (subregion) to which the test point is assigned.
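To make the coarse locating step concrete, the following is a deliberately small random forest written in plain Python. It is a sketch under stated assumptions rather than the classifier used in the experiments: Gini impurity is assumed as the split criterion, the reserved part of the radio map is not used to estimate the error rate, and all function names, region labels, and RSS values are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Best (feature, threshold) among the candidate features, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
                if score < best_score:
                    best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, n_node_features, rng):
    """Grow an unpruned tree; leaves are class labels (must not be tuples)."""
    if len(set(labels)) == 1:
        return labels[0]                                 # pure leaf
    features = rng.sample(range(len(rows[0])), n_node_features)
    split = best_split(rows, labels, features)
    if split is None:
        return Counter(labels).most_common(1)[0][0]      # fall back to majority leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], n_node_features, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], n_node_features, rng))

def train_forest(rows, labels, n_trees=25, n_node_features=1, sample_size=None, seed=0):
    """Train each tree on a random subset of RPs drawn with replacement."""
    rng = random.Random(seed)
    size = sample_size or len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(size)]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                n_node_features, rng))
    return forest

def predict(forest, rss):
    """Return the class that most trees vote for."""
    votes = []
    for node in forest:
        while isinstance(node, tuple):                   # descend to a leaf
            f, t, left, right = node
            node = left if rss[f] <= t else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]

# Made-up RSS vectors from two APs for two subregions "A" and "B".
rows = [[-40, -80], [-42, -82], [-44, -85], [-41, -81], [-43, -83], [-45, -84],
        [-80, -40], [-82, -42], [-85, -44], [-81, -41], [-83, -43], [-84, -45]]
labels = ["A"] * 6 + ["B"] * 6
forest = train_forest(rows, labels)
```

Calling `predict(forest, rss_vector)` on the RSS values of a test point returns the label of the subregion chosen by majority vote, which corresponds to the coarse locating output described above.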

Feature Extraction by KPCA.
The kernel PCA (KPCA) method is responsible for the feature extraction procedure in the proposed system. Extracting precise features of the radio map can reduce interference and enhance positioning performance. A brief introduction to the KPCA algorithm is given below.
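Consistent with the representation of the radio map defined for the fingerprinting method, the RSS values of subregion $s$ can be denoted by $\Phi_s = \{\varphi_1, \varphi_2, \ldots, \varphi_{M_s}\}$ (concerning the RSS part only), where $M_s$ is the number of RPs belonging to subregion $s$ and each vector of $\Phi_s$ needs to be centered in advance. We then define a nonlinear mapping $\psi : \mathbb{R}^N \to F$, where $\mathbb{R}^N$ is the original space of RSS samples and $F$ is the high-dimensional feature space in which the RSS vectors could be linearly separable. The covariance matrix of the RSS vectors in feature space and its eigenvectors can be expressed as

$$C = \frac{1}{M_s} \sum_{i=1}^{M_s} \psi(\varphi_i)\, \psi(\varphi_i)^{T}, \qquad V = \sum_{i=1}^{M_s} \alpha_i\, \psi(\varphi_i), \tag{5}$$

where $\alpha_i$ are the weights of each $\psi(\varphi_i)$. According to the eigen-decomposition equation $\lambda V = CV$, where $\lambda$ and $V$ are an eigenvalue and the corresponding eigenvector of $C$, respectively, we can substitute (5) and obtain

$$M_s \lambda_k \alpha^{k} = K \alpha^{k}, \qquad K_{ij} = \psi(\varphi_i)^{T} \psi(\varphi_j), \tag{6}$$

where $K$ is the kernel matrix and $\alpha_i^{k}$ is the $i$th element of $\alpha^{k}$, $i = 1, 2, \ldots, M_s$. Therefore, the projection of an RSS sample $\varphi$ on the $k$th dimension of the feature space can be calculated by

$$y_k(\varphi) = \frac{1}{\Delta_k} \sum_{i=1}^{M_s} \alpha_i^{k}\, \psi(\varphi_i)^{T} \psi(\varphi), \tag{7}$$

where $\Delta_k$ is a normalization factor that can be derived from requiring the corresponding eigenvector $V^{k}$ to have unit norm, $(V^{k})^{T} V^{k} = 1$. Taking the $d$ largest eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_d$ and the corresponding $d$ eigenvectors $\alpha^{1}, \alpha^{2}, \ldots, \alpha^{d}$, $d \ll N$, we can extract $d$ features (dimensions) from each RP and TP vector. Denoting by

$$A_s = \left[ \frac{\alpha^{1}}{\Delta_1}, \frac{\alpha^{2}}{\Delta_2}, \ldots, \frac{\alpha^{d}}{\Delta_d} \right]$$

the transfer matrix of subregion $s$, a TP of this region in feature space can finally be calculated by

$$y_{\mathrm{test}} = A_s^{T} k_{\mathrm{test}}, \qquad k_{\mathrm{test}} = \left[ \psi(\varphi_1)^{T} \psi(\varphi_{\mathrm{test}}), \ldots, \psi(\varphi_{M_s})^{T} \psi(\varphi_{\mathrm{test}}) \right]^{T}. \tag{8}$$

Similarly, this procedure also applies to the feature extraction of the subradio maps in the offline phase. The WKNN algorithm is then deployed to match the extracted TP against the subradio map in feature space, and finally the estimated location is obtained.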
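A compact NumPy sketch of the KPCA procedure is given below. The kernel function is not specified in this section, so the Gaussian (RBF) kernel, its parameter `gamma`, the explicit centering of the kernel matrix in feature space, and the names `fit_kpca` and `transform` are all assumptions made for illustration; the RSS values are randomly generated.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kpca(rss, d, gamma=1e-3):
    """Compute the transfer matrix of one subregion for d features."""
    m = rss.shape[0]
    k = rbf_kernel(rss, rss, gamma)
    one = np.full((m, m), 1.0 / m)
    k_centered = k - one @ k - k @ one + one @ k @ one   # centre in feature space
    eigval, eigvec = np.linalg.eigh(k_centered)            # ascending eigenvalues
    order = np.argsort(eigval)[::-1][:d]                   # keep the d largest
    # Dividing by sqrt(eigenvalue) gives unit-norm axes in feature space.
    transfer = eigvec[:, order] / np.sqrt(eigval[order])
    return {"rss": rss, "k": k, "transfer": transfer, "gamma": gamma}

def transform(model, rss_new):
    """Project RSS vectors (rows of rss_new) into the feature space."""
    k_new = rbf_kernel(rss_new, model["rss"], model["gamma"])
    k = model["k"]
    # Centre the new kernel rows consistently with the training kernel.
    k_new = (k_new - k_new.mean(axis=1, keepdims=True)
             - k.mean(axis=0)[None, :] + k.mean())
    return k_new @ model["transfer"]

# Made-up subradio map: 12 RPs, 5 APs, RSS between -90 and -40 dBm.
rng = np.random.default_rng(0)
rss = rng.uniform(-90.0, -40.0, size=(12, 5))
model = fit_kpca(rss, d=3)
features = transform(model, rss)
```

The same `transform` call handles both the offline extraction of a subradio map and the online projection of a single test point, so the matching step can operate on `d`-dimensional vectors in either case.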

System Simulation and Implementation
In this section, the proposed indoor positioning system is evaluated on a simulation model and in a real environment, respectively. Both the RF based coarse locating accuracy and the KPCA feature extraction performance are examined and compared in detail.

Simulated Indoor Positioning Environment.
In order to validate the proposed methods for indoor positioning, a system simulation is implemented first. Figure 2 illustrates the floor plan of a research center, where 27 APs are deployed evenly in different rooms. Based on this plan, we build an indoor propagation model using finite difference time domain (FDTD) and ray-tracing (the related theoretical analysis is presented in our earlier work [20]). Figure 3 illustrates the RSS distribution of AP18 for one sample with an interval of 0.5 m, where RSS values are represented by gradient colors, the AP is labeled with a blue triangle, and a small amount of random noise is added. The radio map is then taken as the average of 100 samples at each reference point from all APs.
In the simulated ideal environment, the number of nearest neighbors $K$ is set to 4 because it has only a slight effect on accuracy, and the number of features $d$ is set to 8 to ensure that the contribution rate of the eigenvalues is over 95%. According to Figure 4, the WKNN fingerprinting method achieves 93% confidence probability for a positioning error distance within 2 meters and ranks top. Among the feature extraction methods for positioning in [8,13,15], the PCA based system performs better and comes close to the WKNN fingerprinting system. The proposed KPCA method ranks second among the feature extraction methods in the ideal environment.
However, when we increase the ambient noise (in particular, adding noise to one of the APs to emulate an abnormal state), as shown in Figure 5, the proposed KPCA positioning system outperforms the others with 82% confidence probability for an error distance within 2 meters, which reveals the better robustness of the proposed system in a noisy situation.
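The rule of keeping enough features for the eigenvalue contribution rate to exceed 95% can be written as a short helper. This is only an illustrative sketch: the function name `smallest_d` and the eigenvalues below are made up, and the exact rule used in the experiments may differ.

```python
def smallest_d(eigenvalues, target=0.95):
    """Smallest number of leading eigenvalues whose share reaches target."""
    ordered = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(ordered)
    running = 0.0
    for d, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return d
    return len(ordered)

# Made-up eigenvalues: the first three account for about 97% of the total.
d = smallest_d([6.0, 3.0, 0.7, 0.2, 0.1])
```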
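The confidence probability quoted in the comparisons is the fraction of test points whose positioning error stays within a given distance. A minimal sketch of that metric follows; the function name and the coordinates are hypothetical.

```python
import math

def confidence_probability(estimates, truths, max_error=2.0):
    """Fraction of test points located within max_error metres."""
    errors = [math.dist(e, t) for e, t in zip(estimates, truths)]
    return sum(err <= max_error for err in errors) / len(errors)

# Made-up true and estimated locations of four test points (metres).
truths = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 0.0)]
estimates = [(0.5, 0.5), (1.0, 4.0), (5.0, 6.5), (2.0, 0.0)]
prob = confidence_probability(estimates, truths)
```

Evaluating the same function over a range of `max_error` values yields the cumulative error curve commonly used to compare positioning systems.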

Real Indoor Positioning Environment.
The system simulation demonstrates in theory the feasibility and robustness of the proposed positioning system. However, it is worth emphasizing that the theoretical model assumes a relatively ideal indoor environment, although random noise has been added (it has little effect because we take the average RSS value and the noise approaches zero on average). Factors such as interference in the ISM band, the movement of people, and electrical devices may also affect the RSS values. Therefore, in order to validate the theoretical model and test the proposed system in practice, we sample the RSS readings 100 times in each direction at each RP (north, south, east, and west, 400 times in all, at a sampling rate of two samples per second) with a mobile terminal and record the location coordinates in the real office environment shown in Figure 6. Owing to barriers and authorization limits, the sampled area is the corridor (49.4 m × 14.1 m), in which 828 equally distributed (meshed) location spots serve as the experimental RPs. In addition, all RSS values below −70 dBm are blocked off because of their instability and fluctuation, and the average of the 400 discrete samples is taken as the RSS value of an RP.
A comparison between the RSS measured in the real environment and the RSS simulated by the FDTD model is presented in Figure 7. The result shows that the RSS values estimated by the simulated model are reasonable, and therefore the proposed methods could be reliable in the given environment.
We first verify the proposed random forest coarse positioning method (the different coarse positioning methods are all precise and close to each other in the simulated environment, so we test them only in the real environment). According to Table 1, the fuzzy c-means and k-means methods are deployed for comparison, where the clusters are generated automatically by the algorithms and the coarse classification accuracy is 82.1% and 87.6%, respectively.
For the SVM method (optimized by a genetic algorithm) [10,11] and the proposed RF method, the subregions can be defined freely. Therefore, the RPs within each region are assigned according to the corners or partitions of the building, and the same regions are given to both methods. As a result, according to Table 2, the 98.9% coarse locating accuracy of the proposed RF method ranks first among the four approaches, in addition to its capability of customizing regions.
For the fine positioning performance, according to Figure 8, the accuracy of the WKNN fingerprinting method is roughly the same as in the simulated strongly noisy environment (compare Figure 5). Under the same condition that $K$ is set to 4 and $d$ is 8, the proposed system based on KPCA still ranks top with 87% confidence probability for a positioning error within 2 meters, followed by the positioning systems based on the PCA, LDE, and LDA algorithms, respectively [8,13,17]. The result verifies the robustness and effectiveness of the proposed system. Moreover, the feasibility and reliability of the proposed system have been validated on Android OS on a smart terminal, as shown in Figure 9.
In conclusion, the traditional fingerprinting method is theoretically able to provide the best accuracy in an ideal environment. Any form of feature extraction causes information loss, which inevitably harms the positioning performance. However, ideal conditions rarely exist in a real environment. The abilities to suppress noise and to extract precise features of the target radio map are therefore the key points in evaluating locating systems. The proposed positioning system is able to provide nearly 99% coarse locating accuracy with customized ranges, and its KPCA-based feature extraction method may provide outstanding robustness and raise the confidence probability by 15 percentage points in a real noisy environment compared with the traditional fingerprinting method (87% and 72%, resp., for an error distance within 2 meters).
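The preprocessing of the measured readings can be sketched as follows. How weak readings are "blocked off" is not detailed in the text; this example assumes that readings below the threshold are discarded before averaging and that an AP with no surviving reading is treated as unavailable. The function name and the sample values are hypothetical.

```python
from statistics import mean

RSS_FLOOR_DBM = -70  # threshold quoted in the text

def average_rss(readings_by_direction, floor=RSS_FLOOR_DBM):
    """Average the readings of one AP at one RP over all directions.

    Readings below the floor are discarded (an assumed interpretation of
    "blocked off"); None is returned when no reading survives.
    """
    kept = [r for readings in readings_by_direction.values()
            for r in readings if r >= floor]
    return mean(kept) if kept else None

# Made-up readings (dBm) of one AP at one RP, two samples per direction.
readings = {
    "north": [-60, -62],
    "south": [-64, -75],
    "east": [-58, -90],
    "west": [-66, -68],
}
rss_value = average_rss(readings)
```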

Conclusion
In this paper, we propose an indoor positioning system that combines a random forest method for coarse localization with the KPCA algorithm for feature extraction. The structure of the proposed system is designed for mobile computation: the offline stage carries out most of the data processing on a powerful central server, and all the processed subradio maps and generated functions (classifiers and matrices) created in the offline phase are stored and transmitted to the mobile terminal, where they are applied independently in the online stage for real-time positioning. Our future work will also focus on dynamic indoor positioning.

Figure 1: The structure of the proposed indoor positioning system.

Figure 2: Floor plan of a research center and the APs deployment.

Figure 3: Distributed RSS received from AP18 with one sampling based on FDTD propagation model with sampling interval of 0.5 m.

Figure 5: Positioning systems comparison based on different feature extraction algorithms in simulated strong noise environment; KPCA-based method performs best.

Figure 6: Distributed average RSS value of 400 samples received from AP23 by a mobile terminal in real environment with sampling interval of 0.5 m.

Figure 8: Positioning systems comparison based on different feature extraction algorithms in real environment; KPCA-based method performs best.

Figure 9: The proposed indoor positioning system running on an LG Nexus.

Table 1: Performance of FCM and k-means for classification.

Table 2: Performance of SVM and RF for classification.