Indoor handset localization in an urban apartment setting is studied using GSM trace-mobile measurements. Nearest-Neighbor, Support Vector Machine, Multilayer Perceptron, and Gaussian Process classifiers are compared. The linear Support Vector Machine provides a mean room-classification accuracy of almost 98% when all GSM carriers are used. To our knowledge, ours is the first study to use fingerprints containing all GSM carriers, as well as the first to suggest that GSM can deliver very high localization performance.
Location-based services for cellular telephone networks are today very much in the public eye [
The database correlation method [
As for indoor radio-based localization, most studies that have appeared in the literature have involved WiFi networks, describing “corridor waveguide” scenarios in the workplace and obtaining performance which, though interesting, still leaves room for improvement [
In this article, we present tests of indoor GSM localization using scans containing large numbers of carriers—up to the full GSM band. To keep working with such large numbers of carriers tractable, we propose to build a mathematical model mapping fingerprints to position using machine-learning techniques, in this case Support Vector Machines (SVM) and Multilayer Perceptrons (MLP), the latter often also referred to as neural networks. For problems of such high input dimensionality, we demonstrate the superiority of the machine-learning approach over more traditional classifiers based on Euclidean (K-Nearest-Neighbor) and Mahalanobis (Gaussian process) distances. Our results show that in an urban apartment setting, the room in which a handset is located can be identified with nearly 98% accuracy when the full set of GSM carriers is included. To our knowledge, this study, which is an extension of that described in [
The structure of the article is as follows. The data sets used in our study are presented in Section
The TEMS [
Schematic of apartment layout.
Ten of the carriers were found to contain no energy and were removed from the study. As the BSICs of the remaining 488 proved unreadable in many instances, a decision was made to exclude the BSICs entirely from the subsequent analysis, despite the possibility this engenders of confusing carriers at the same frequency in separate cellular motifs. The data set contained a total of 241 scans—approximately 48 scans per class, where a class is defined here simply as the index of the room within the apartment, indicated in Figure
The relatively small size of our dataset is a reflection of the difficult, time-consuming nature of obtaining labeled scan data—a point to which we will return later. Its high dimensionality (488 carriers) also limits the complexity of the classifiers which may be applied. To deal with these issues, signal strength-based carrier selection was initially carried out so as to define the four fingerprint types defined below. Further dimensionality reduction of any fingerprint can be obtained by a subsequent application of Principal Component Analysis (PCA).
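As an illustration of this second reduction step, the sketch below applies PCA to a synthetic stand-in for the scan matrix (the RSS values are random placeholders, not the measured data); retaining enough components to explain 95% of the variance is an assumed threshold, not a setting taken from this study.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Placeholder for the dataset: 241 fingerprints x 488 carrier RSS values (dBm).
X = rng.normal(-80.0, 10.0, size=(241, 488))

# Keep enough principal components to explain 95% of the RSS variance;
# the resulting dimensionality is bounded by min(n_samples, n_features).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)
```

With only 241 scans, the reduced dimensionality can never exceed 241, which is already a substantial reduction from 488 inputs.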
Three vectors are used in defining the fingerprints:
These seven carrier fingerprints,
This fingerprint, defined as
The
All of the active carriers’ RSS values are included in the fingerprint, that is, no selection is in fact made.
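The contrast between signal-strength-based selection and no selection at all can be sketched as follows. The RSS matrix here is a random placeholder, and ranking carriers by mean RSS over all scans is one plausible reading of the selection procedure described above, not a verbatim reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(1)
# Placeholder RSS matrix: rows are scans, columns are the 488 active carriers (dBm).
rss = rng.uniform(-110.0, -50.0, size=(241, 488))

# "35 Best Overall"-style selection: the 35 carriers with highest mean RSS
# across every scan in the dataset.
mean_rss = rss.mean(axis=0)
best_35 = np.argsort(mean_rss)[-35:]
fingerprint_35 = rss[:, best_35]

# "All 488": the full scan is the fingerprint, with no selection made.
fingerprint_all = rss
print(fingerprint_35.shape, fingerprint_all.shape)
```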
Four types of classifier were tested:
A 2-class SVM classifier [
In the case of a linear SVM, the kernel function is just the scalar product
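In standard SVM notation, with support-vector coefficients \(\alpha_i\), training labels \(y_i \in \{-1, +1\}\), and bias \(b\), the two-class decision function and the linear kernel it reduces to read:

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
K(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{\top}\mathbf{x}' .
```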
Architecture combining five one-versus-rest SVM classifiers to predict the class of an RSS vector from one of the carrier sets.
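The five one-versus-rest classifiers of this architecture can be sketched with scikit-learn's `OneVsRestClassifier`: one linear SVM is trained per room, and the room whose classifier returns the largest decision value wins. The data below are random placeholders, and the regularization constant `C = 1.0` is an assumption, not a value from this study.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
# Placeholder training set: 488-carrier fingerprints with room labels 1..5.
X = rng.normal(size=(200, 488))
y = rng.integers(1, 6, size=200)

# Five one-versus-rest linear SVMs; prediction picks the room with the
# largest decision-function output.
clf = OneVsRestClassifier(LinearSVC(C=1.0, dual=False)).fit(X, y)
pred = clf.predict(X)
print(pred[:5])
```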
A multilayer Perceptron is a multivariate, nonlinear, scalar or vector function, which is a combination of parameterized elementary nonlinear functions called neurons [
The parameters of the multilayer Perceptron are estimated from the available training data by minimizing the least squared cost function
A multilayer Perceptron with a single output.
In a two-class (
In the present study, two strategies were compared for multiclass classification with multilayer Perceptrons. All functions
In either case, an example described by
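The softmax multiclass strategy can be sketched with scikit-learn's `MLPClassifier`: a single hidden layer of sigmoid neurons feeds a softmax output layer that estimates the posterior probability of each of the five rooms. Note that this library minimizes a cross-entropy cost rather than the least-squares cost mentioned above, and the hidden-layer size of 8 is an assumption; the data are random placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 35))    # e.g. "35 Best Overall"-sized fingerprints
y = rng.integers(1, 6, size=200)  # room labels 1..5

# One hidden layer of sigmoid ("logistic") neurons; the softmax output layer
# yields one posterior probability per room, summing to 1 for each scan.
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation='logistic',
                    max_iter=2000, random_state=0).fit(X, y)
proba = mlp.predict_proba(X)
print(proba.shape)
```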
As a first step,
As in the case of
To be more precise, let
We define the localization performance of a given classifier as the average of the validation scores obtained over our ten random splits, expressed as a percentage of correctly identified locations. The standard deviation over the ten splits is also calculated. The results are shown in Table
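This evaluation protocol, averaging validation accuracy over ten random train/validation splits, can be sketched as below. The data are random placeholders, and the 80/20 split fraction is an assumption (the excerpt does not state the split size used).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
# Placeholder dataset: 241 scans of 488 carriers, room labels 1..5.
X = rng.normal(size=(241, 488))
y = rng.integers(1, 6, size=241)

scores = []
for seed in range(10):
    # One random stratified split per repetition.
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    clf = LinearSVC(dual=False).fit(X_tr, y_tr)
    scores.append(100.0 * clf.score(X_va, y_va))  # percent correct

# Localization performance: mean and standard deviation over the ten splits.
print(f"{np.mean(scores):.1f} +/- {np.std(scores):.1f}")
```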
Percentage of correct radio fingerprint classifications on the 4 carrier sets described in the text. Figures quoted are averages and standard deviations over 10 randomly selected validation sets. All classifiers achieve their best performance when all 488 carriers are included. The most effective classifier for this case is the linear SVM.
| Classifier | Variant | Current Top 7 (7 carriers) | Top 7/Memory (36–40 carriers) | 35 Best Overall (35 carriers) | All 488 (488 carriers) |
|---|---|---|---|---|---|
| Linear SVM | | 71.3 ± 7.2 | 84.6 ± 3.6 | 90.4 ± 3.5 | 97.8 |
| Gauss. SVM | w/o PCA | 72.2 ± 3.6 | 89.2 ± 2.9 | 93.2 ± 3.4 | — |
| Gauss. SVM | w/ PCA | 71.8 ± 3.2 | 85.6 ± 5.3 | 92.0 ± 3.0 | 96.4 ± 1.5 |
| Linear Perceptron | | 66.9 ± 4.1 | 73.2 ± 5.1 | 79.7 ± 5.1 | 94.4 ± 2.6 |
| MLP (one versus all) | w/o PCA | 66.9 ± 7.1 | 87.2 ± 3.3 | 91.8 ± 3.4 | — |
| MLP (one versus all) | w/ PCA | 68.1 ± 3.4 | 87.5 ± 4.5 | 89.6 ± 2.5 | 95.7 ± 2.1 |
| MLP (multiclass, sigmoids) | w/o PCA | 56.8 ± 7.1 | 80.4 ± 12.9 | 92.6 ± 3.2 | — |
| MLP (multiclass, sigmoids) | w/ PCA | 66.4 ± 5.7 | 85.1 ± 9.5 | 89.4 ± 3.6 | 96.1 ± 1.1 |
| MLP (multiclass, softmax) | w/o PCA | 64.3 ± 7.5 | 85.7 | 91.2 | — |
| MLP (multiclass, softmax) | w/ PCA | 67.7 ± 5.7 | 88.2 | 90.4 | 96.6 |
| K-NN (best K) | | 59.3 (K = 5) | 85.1 (K = 26) | 93.3 (K = 20) | 94.9 (K = 20) |
| 1-NN | | 58.1 ± 5.2 | 74.7 ± 3.7 | 86.0 | 87.2 |
| GP ( | | 78.8 ± 3.7 | — | | |
A few preliminary remarks about the table are in order. First, when the
The table shows that the performance of all classifiers tested improves as more carriers are added to the fingerprint, but that very good performance—for example, our best result of 97.8% in the case of the linear SVM—is only obtained on the
A more detailed look at our conclusion is given in Table
Confusion Matrices for 35 Best Overall and All 488 carrier sets, using a Linear SVM classifier. Figures quoted are in percent. Using the full number of carriers tightens up the diagonal to give individual room classification efficiencies near 100%.
Confusion matrix, 35 Best Overall (entries in percent; rows: predicted class, columns: true class):

| Pred. class | True 1 | True 2 | True 3 | True 4 | True 5 |
|---|---|---|---|---|---|
| 1 | 95 | 5.3 | | | 3.3 |
| 2 | 1.4 | 93.3 | 3.6 | | |
| 3 | 0.7 | 1.3 | 77.9 | 11.4 | |
| 4 | | | 16.4 | 87.9 | 0.7 |
| 5 | 2.9 | | 2.1 | 0.7 | 96 |
Confusion matrix, All 488 (entries in percent; rows: predicted class, columns: true class):

| Pred. class | True 1 | True 2 | True 3 | True 4 | True 5 |
|---|---|---|---|---|---|
| 1 | 100 | | 0.7 | | |
| 2 | | 100 | | | |
| 3 | | | 91.4 | 1.4 | 1.3 |
| 4 | | | 5.7 | 98.6 | |
| 5 | | | 2.2 | | 98.7 |
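A column-normalized confusion matrix of this kind is straightforward to compute; the sketch below uses scikit-learn on a small set of placeholder labels (not the study's data). Each column is scaled to percent so that, as in the tables above, a column sums to 100 and the diagonal gives the per-room classification efficiency.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder true and predicted room labels for one validation split.
y_true = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
y_pred = np.array([1, 1, 2, 2, 3, 4, 4, 4, 5, 5])

# sklearn returns rows = true class, columns = predicted class; transposing
# gives rows = predicted, columns = true, matching the layout used here.
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4, 5]).T
cm_pct = 100.0 * cm / cm.sum(axis=0, keepdims=True)
print(cm_pct)
```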
We believe this study, which is an extension of that presented in [
Acquiring datasets and labeling scans are tedious, time-consuming activities. To address this issue, two independent solutions are currently being investigated. First, experiments with semisupervised classification techniques using kernel methods (see, e.g., [