Full-Band GSM Fingerprints for Indoor Localization Using a Machine Learning Approach

Indoor handset localization in an urban apartment setting is studied using GSM trace mobile measurements. Nearest-neighbor, Support Vector Machine, Multilayer Perceptron, and Gaussian Process classifiers are compared. The linear Support Vector Machine provides a mean room classification accuracy of almost 98% when all GSM carriers are used. To our knowledge, ours is the first study to use fingerprints containing all GSM carriers, as well as the first to suggest that GSM can support very high-performance indoor localization.


Introduction
Location-based services for cellular telephone networks are today very much in the public eye [1]. Global Positioning System (GPS) receivers integrated into cellular handsets can provide very accurate positioning information; however, few mobiles are so equipped at present, and GPS furthermore performs poorly in the indoor and urban canyon environments which are prevalent in wireless networks. For these reasons, the study of localization techniques based upon the radio networks themselves is also a very active area. Most commercially installed systems still rely on cell-ID, in which the mobile station's position is reported as that of the serving base station. Although improvement is possible using triangulation, time of arrival, and the like, the accuracy of such methods is in practice compromised by the path loss and multipath characteristics inherent in the radio channel [2].
The database correlation method [3] makes it possible to overcome channel effects to a certain extent. In this method, a mobile is localized by comparing one of the regularly emitted Received Signal Strength (RSS) measurements to a position-labelled database of such measurements, which are often called fingerprints. Existing localization services implemented in some GSM networks rely on Network Measurement Reports (NMR), which are part of the GSM standard and contain the RSS and Base Station Identity Code (BSIC) of the serving cell and the six strongest neighboring cells. The resulting 7-component vector allows a localization precision of some tens of meters in outdoor environments (see, e.g., [4,5]).
As for indoor radio-based localization, most studies which have appeared in the literature have involved WiFi networks, describing "corridor waveguide" scenarios in the workplace, and obtaining performance which, though interesting, can still be improved [6][7][8]. Another approach, using the household power lines as an antenna, appears in [9]. The notion of using GSM or CDMA networks for localization in indoor environments, particularly in domestic settings, is still somewhat new (see, e.g., [10,11]). The basic idea is that inside a building, the RSS of the external base stations will be strongly correlated with a mobile's exact position, due, for example, to the varying absorption of electromagnetic energy by different building materials and to the exact placement of doors and windows. There has also been evidence that including more than the standard 7 carriers of the NMR fingerprint is advantageous in indoor GSM localization [10,12].
In this article, we present tests of indoor GSM localization using scans containing large numbers of carriers, up to the full GSM band. In order to keep the handling of such large numbers of carriers tractable, we propose to create a mathematical model mapping fingerprints to position using machine-learning techniques, in this case Support Vector Machines (SVM) and Multilayer Perceptrons (MLP), the latter often also referred to as neural networks. We demonstrate the superiority of the machine learning approach, for problems with such high input dimensionality, over more traditional classifiers based on Euclidean (K-Nearest Neighbor) and Mahalanobis (Gaussian process) distances. Our results show that in an urban apartment setting, the room in which a handset is located can be identified with nearly 98% accuracy when the full set of GSM carriers is included. To our knowledge, this study, which is an extension of that described in [12], is the first to use fingerprints of all carriers in the GSM band, and the first to demonstrate very good performance on indoor localization using GSM.
The structure of the article is as follows. The data sets used in our study are presented in Section 2, while a discussion of preprocessing and of the classifiers tested is given in Section 3. Our results are discussed in Section 4, while our conclusions, as well as some perspectives for the future, are outlined in the final section.

Data Sets
The TEMS [14] trace mobile system was used to take twice-daily scans of the entire set of 498 GSM carriers in 5 rooms of a 5th-floor (top floor) apartment in Paris, France. Both the RSS and the BSIC, where readable, were requested for each carrier in the scans. The layout of the apartment is shown in Figure 1. Acquisitions could be made anywhere within a room; however, in practice, the scans were recorded in those areas where the necessary laptop and cellphone could be conveniently set down and accessed. An exhaustive coverage of all rooms was thus not assured.

Data Analysis
3.1. Preprocessing. Ten of the carriers were found to contain no energy and were removed from the study. As the BSICs of the remaining 488 proved unreadable in many instances, a decision was made to exclude the BSICs entirely from the subsequent analysis, despite the possibility this creates of confusing carriers at the same frequency in separate cellular motifs. The data set contained a total of 241 scans, or approximately 48 scans per class, where a class is defined here simply as the index of the room within the apartment, as indicated in Figure 1. To obtain a measure of the statistical significance of our classification results, cross-validation was performed with ten independent, randomly selected splits of our data, each containing 169 training examples and 72 validation examples. In a given split, the training and validation examples were uniformly distributed over time during the one-month acquisition period.
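The splitting procedure can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `random_splits`, the seed, and the use of Python's `random` module are assumptions, and only the counts (241 scans, 169 training examples, 72 validation examples, ten splits) are taken from the text.

```python
import random

def random_splits(n_scans=241, n_train=169, n_splits=10, seed=0):
    """Return n_splits independent (train_idx, valid_idx) pairs of scan indices.

    Each split is obtained by shuffling all scan indices and cutting the
    shuffled list into a training part and a validation part.
    """
    rng = random.Random(seed)  # the seed is an arbitrary choice, for repeatability
    splits = []
    for _ in range(n_splits):
        order = list(range(n_scans))
        rng.shuffle(order)
        train_idx = sorted(order[:n_train])
        valid_idx = sorted(order[n_train:])
        splits.append((train_idx, valid_idx))
    return splits

splits = random_splits()
```

If the scans are indexed in order of acquisition, a plain shuffle spreads both parts over the acquisition period only on average; a uniform spread over time is not enforced in this sketch.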

Dimensionality Reduction and Fingerprint Types.
The relatively small size of our dataset is a reflection of the difficult, time-consuming nature of obtaining labeled scan data, a point to which we will return later. Its high dimensionality (488 carriers) also limits the complexity of the classifiers which may be applied. To deal with these issues, signal strength-based carrier selection was initially carried out so as to obtain the four fingerprint types defined below. Further dimensionality reduction of any fingerprint can be obtained by a subsequent application of Principal Component Analysis (PCA).
Three vectors are used in defining the fingerprints; in their definitions, $\mathbb{1}$ is the so-called indicator function, and $\langle\cdot\rangle_j$ represents the mean over the index $j$. The first, $g^7_j$, contains the indices of the 7 strongest carriers, $i$, in example $j$. The vector $G^7$, composed of the indices of the carriers which were among the strongest 7 in at least one scan of the training set, contains between 36 and 40 such "good" carriers, depending upon the random split used. The third vector, $G^{35}$, consists of the indices of the 35 carriers which were the strongest on average over the whole training set. The fingerprints may then be defined as follows.
(1) Current Top 7. These seven-carrier fingerprints, RSS($g^7_j$), are meant to mimic standard "top 7" NMRs, which were not present in our scans. Indeed, NMRs are only logged during a communication, while our scans were obtained in idle mode. Validation set fingerprints can in fact contain fewer than 7 elements if certain carriers were not represented in the training set. For classifiers requiring fixed labeling of input vectors, such as K-NN, SVM, and MLP, the seven RSS($g^7_j$) values are entered at the corresponding positions in a vector whose length is that of $G^7$, and the rest of the elements are set to zero.
(2) Top 7 with Memory. This fingerprint, defined as RSS($G^7$), includes the values of all of the 36-40 "good" carriers; it is thus "wider" than the Current Top 7 fingerprint defined above.
(3) 35 Best Overall. The 35 Best Overall fingerprint, of length 35, is defined as RSS($G^{35}$). It thus gives another way of assessing the "goodness" of a carrier, by the size of its average RSS value over the whole training set.
(4) All 488. All of the active carriers' RSS values are included in the fingerprint, that is, no selection is in fact made.
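The construction of the index sets and of the first three fingerprint types can be sketched as below. This is an illustration under stated assumptions, not the code used in the study: a scan is taken to be a plain list of RSS values indexed by carrier, all function names are hypothetical, and the toy example uses 5 carriers with the 2 strongest and 3 best on average in place of 7 and 35.

```python
def top_k_indices(values, k):
    """Indices of the k largest entries of values, strongest first."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]

def build_index_sets(train_scans, k_top=7, k_best=35):
    """Return (G7, G35) as sorted index lists, computed on training scans only."""
    n_carriers = len(train_scans[0])
    g7 = set()
    for scan in train_scans:
        g7.update(top_k_indices(scan, k_top))  # in the top k_top of at least one scan
    means = [sum(scan[i] for scan in train_scans) / len(train_scans)
             for i in range(n_carriers)]
    g35 = top_k_indices(means, k_best)         # strongest on average
    return sorted(g7), sorted(g35)

def current_top7(scan, g7, k_top=7):
    """Current Top 7: the scan's strongest carriers placed at their positions in G7.

    Positions of G7 not among the scan's strongest carriers are set to zero;
    strongest carriers that are absent from G7 are dropped.
    """
    strongest = set(top_k_indices(scan, k_top))
    return [scan[i] if i in strongest else 0.0 for i in g7]

def top7_with_memory(scan, g7):
    """Top 7 with Memory: RSS of every carrier in G7."""
    return [scan[i] for i in g7]

def best_overall(scan, g35):
    """35 Best Overall: RSS of every carrier in G35."""
    return [scan[i] for i in g35]

# Toy data (made up for illustration): 3 training scans over 5 carriers, in dBm.
train = [[-60, -70, -80, -90, -100],
         [-95, -65, -85, -75, -100],
         [-62, -72, -99, -98, -97]]
G7, G35 = build_index_sets(train, k_top=2, k_best=3)
valid_scan = [-90, -61, -59, -88, -70]
fp_current = current_top7(valid_scan, G7, k_top=2)
fp_memory = top7_with_memory(valid_scan, G7)
fp_best = best_overall(valid_scan, G35)
```

In the toy validation scan, the strongest carrier (index 2) never appeared among the strongest carriers of a training scan, so the Current Top 7 fingerprint retains a single nonzero value; this is the situation in which a validation fingerprint holds fewer elements than expected.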

Classifiers. Four types of classifier were tested:
(1) Support Vector Machines (SVM). A 2-class SVM classifier [15] finds the separating surface which maximizes the distance (or "margin") between that surface and the data points on either side of it. The SVM can be linear and operate directly upon the data, or first map the data onto a higher-dimensional space using a non-linear transformation, before finding the maximum margin surface. The SVM decision rule takes the sign of
$$f(x) = \sum_{i=1}^{N_s} \alpha_i \, y_i \, K(s_i, x) + b,$$
with $x$ the RSS vector to be localized, $N_s$ the number of support vectors $s_i$ (training vectors which are on the boundary of the optimal margin), and $y_i = \pm 1$ the class label of the vector $s_i$. $K(\cdot)$ here is the selected kernel, and $b$ as well as the $\alpha_i$ are parameters determined in the search for the optimal separating surface. For large, well-behaved data sets, the SVM rule approximates the Bayes decision rule [15].
In the case of a linear SVM, the kernel function is just the scalar product $K(s_i, x) = s_i \cdot x$. The standard Gaussian kernel was adopted in our tests of non-linear SVMs, where the variance, $\sigma^2$, is optimized in the cross-validation stage. Since a "soft margin" approach was used (i.e., some training examples were allowed to lie within the margin), a regularization parameter controlling the complexity of the separating surface [15] was also estimated by cross-validation. For $m$ classes, it is traditional, using the "conventional recipe" [16], to construct $m$ binary, one-versus-rest classifiers, and take as the output class that of the classifier having the largest output value before thresholding. This procedure is illustrated for the case of $m = 5$ in Figure 2. The Spider SVM package [17] was used in all of our analyses.
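The two-class decision rule (the sign of a kernel expansion over the support vectors plus the offset b) and its one-versus-rest combination can be sketched as follows. This is not the Spider package and does not train anything: support vectors, labels, coefficients, and offsets are assumed to be already available, every function name is hypothetical, and the Gaussian kernel is written in one common parameterization, which is an assumption.

```python
import math

def linear_kernel(s, x):
    """Scalar product s . x."""
    return sum(si * xi for si, xi in zip(s, x))

def gaussian_kernel(s, x, sigma2=1.0):
    """Gaussian kernel exp(-|s - x|^2 / (2 sigma2)); this exact form is an assumption."""
    d2 = sum((si - xi) ** 2 for si, xi in zip(s, x))
    return math.exp(-d2 / (2.0 * sigma2))

def svm_output(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Output before thresholding: sum_i alpha_i * y_i * K(s_i, x) + b."""
    return sum(a * y * kernel(s, x)
               for s, y, a in zip(support_vectors, labels, alphas)) + b

def one_versus_rest(x, machines, kernel=linear_kernel):
    """Pick the class whose binary machine has the largest unthresholded output.

    machines is a list of (support_vectors, labels, alphas, b) tuples,
    one per class.
    """
    outputs = [svm_output(x, *m, kernel=kernel) for m in machines]
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return best, outputs

# Hand-made toy machines for three classes in two dimensions (not trained).
machines = [
    ([[1.0, 0.0]], [+1], [1.0], 0.0),   # output = x[0]
    ([[0.0, 1.0]], [+1], [1.0], 0.0),   # output = x[1]
    ([[1.0, 1.0]], [-1], [1.0], 0.5),   # output = -(x[0] + x[1]) + 0.5
]
predicted, outputs = one_versus_rest([0.2, 0.9], machines)
```

Taking the largest output before thresholding, rather than the signs, avoids ties when several binary machines (or none) claim the example.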
(2) Multilayer Perceptron (MLP). A multilayer Perceptron is a multivariate, nonlinear, scalar or vector function, which is a combination of parameterized elementary nonlinear functions called neurons [18]. A neuron is usually a function of the form $f = \tanh(\theta \cdot x)$, where $\theta$ is the vector of parameters of the neuron and $x$ is the vector of variables. A single-output multilayer Perceptron $g(x)$ is a combination of $N_h$ "hidden" neurons $f_i$ ($i = 1$ to $N_h$) and of a constant equal to 1. Denoting by $\Theta_1$ the vector of parameters of the linear combination (of size $N_h + 1$), by $\Theta_2$ the $(N + 1, N_h)$ matrix whose elements are the parameters of the "hidden" neurons, and by $f$ the vector (of size $N_h + 1$) of functions computed by the $N_h$ hidden neurons with an additional component equal to 1, the multilayer Perceptron function is of the form
$$g(x) = h\left(\Theta_1 \cdot f(\Theta_2 x)\right),$$
in which $h$ denotes the function applied at the output. Multilayer Perceptrons are frequently described pictorially as shown in Figure 3.
The parameters of the multilayer Perceptron are estimated from the available training data by minimizing the least squares cost function
$$J = \sum_{k} \left(y_k - g(x_k)\right)^2$$
with respect to all parameters, where $x_k$ is the vector of variables pertaining to example $k$ and $y_k$ is the measured value of the quantity of interest for example $k$. In the present study, the gradient of the cost function was computed by a computationally efficient algorithm known as "backpropagation", and the optimization of the cost function was performed by the conjugate gradient algorithm with Powell-Beale restarts [19].
International Journal of Navigation and Observation
Figure 2: Architecture combining five one-versus-rest SVM classifiers to predict the class of an RSS vector from one of the carrier sets.
Figure 3: A multilayer Perceptron with a single output (diagram labels: Variables, Hidden neurons).
In a two-class (A, B) classification problem, $y_k = +1$ for all examples of class A and $y_k = -1$ for all examples of the other class. After training, an unknown example described by vector $x$ is assigned to class A if $\mathrm{sgn}(g(x)) = +1$, and to class B otherwise. In the present study, function $h$ was taken identical to function $f$. For a $c$-class problem, example $k$, belonging to class $i$ ($1 \le i \le c$), is assigned a vector $y_k$, of dimension $c$, that encodes the class in a 1-out-of-$c$ code: all components are equal to $-1$, except component $i$, which is equal to $+1$. The number of output neurons is equal to the number of classes, so that the output of the multilayer Perceptron is a vector $g(x)$ of dimension $c$. The cost function that is optimized during training is
$$J = \sum_{k} \left\lVert y_k - g(x_k) \right\rVert^2.$$
In the present study, two strategies were compared for multiclass classification with multilayer Perceptrons.
(i) All functions $h$ were taken identical to $f$ (sigmoid functions), so that the output vector of the multilayer Perceptron was
$$g(x) = f\left(\Theta_1 f(\Theta_2 x)\right),$$
where $\Theta_1$ is the $(c, N_h + 1)$ matrix of the parameters of the output neurons.
(ii) Output $i$ ($1 \le i \le c$) of the multilayer Perceptron was computed as
$$g_i(x) = \frac{\exp\left[\left(\Theta_1 f(\Theta_2 x)\right)_i\right]}{\sum_{l=1}^{c} \exp\left[\left(\Theta_1 f(\Theta_2 x)\right)_l\right]}.$$
In either case, an example described by $x$ was assigned to the class $j$ such that
$$j = \arg\max_{1 \le i \le c} g_i(x).$$
In the second case, the components of vector $g$ belong to [0, 1] and sum to 1, so that they can be interpreted safely as estimates of the posterior probabilities of the classes given the observed vector $x$.
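A forward pass through such a network, with both output strategies, can be sketched as follows. This is an illustration only: training (backpropagation and conjugate gradient) is not shown, the function names are hypothetical, and the storage convention (one parameter vector per hidden neuron, with the last entry multiplying the constant 1) is an assumption about how the matrices are laid out.

```python
import math

def hidden_layer(x, theta2):
    """Outputs of the tanh hidden neurons, followed by the constant 1.

    theta2 holds one parameter vector per hidden neuron, of length N + 1;
    the last entry multiplies a constant input equal to 1.
    """
    xb = list(x) + [1.0]
    return [math.tanh(sum(t * v for t, v in zip(theta, xb)))
            for theta in theta2] + [1.0]

def potentials(x, theta1, theta2):
    """Linear combinations of the hidden-layer vector, one per output neuron."""
    f = hidden_layer(x, theta2)
    return [sum(t * v for t, v in zip(row, f)) for row in theta1]

def mlp_tanh_outputs(x, theta1, theta2):
    """Strategy (i): tanh output neurons."""
    return [math.tanh(v) for v in potentials(x, theta1, theta2)]

def mlp_softmax_outputs(x, theta1, theta2):
    """Strategy (ii): softmax outputs, in [0, 1] and summing to 1."""
    v = potentials(x, theta1, theta2)
    m = max(v)                                # subtract the max for numerical stability
    e = [math.exp(vi - m) for vi in v]
    total = sum(e)
    return [ei / total for ei in e]

def predict(outputs):
    """Index of the largest output."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

# Hand-picked parameters (not trained): 2 variables, 2 hidden neurons, 3 classes.
theta2 = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
theta1 = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.1]]
x = [0.5, -1.0]
out_tanh = mlp_tanh_outputs(x, theta1, theta2)
out_softmax = mlp_softmax_outputs(x, theta1, theta2)
```

For a given set of parameters, both strategies select the same class, since tanh and the softmax normalization both preserve the ordering of the output potentials; they differ in the cost minimized during training and in whether the outputs can be read as probabilities.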
(3) K-Nearest Neighbor (K-NN). As a first step, K-NN ranks the training vectors according to their RSS-space Euclidean distances from a test vector to be localized. The predicted class of this test vector is then the class most often represented in the K "nearest" vectors according to the defined metric. The K parameter is chosen empirically, to optimize performance. When a single best neighbor is used, K = 1, and the classifier is called 1-NN.
Footnotes to Table 1:
Footnote 1: SVM and K-NN can have < 7 carriers if some did not show up in the training set.
Footnote 2: Small training set size precludes training a nonlinear classifier due to Cover's theorem [13].
Footnote 3: Best result obtained using the first 4 principal components.
Footnote 4: Gaussian process is equivalent to 1-NN for fixed input vector length.
(4) Gaussian Process (GP). As in the case of K-NN, GP starts by comparing the test RSS vector to be localized to every vector in the training set. The probability, $P_1$, that the compared vectors correspond to measurements at the same geographical position is assumed to be Gaussian in the Euclidean RSS distance between the two vectors, using a fixed variance $\sigma^2$ which is determined empirically. If a carrier appears in one of the compared vectors, but not in the other, GP presumes that the missing value was below the reception threshold in the vector lacking it. A penalty term probability, $P_p$, is introduced, in which the missing RSS value is replaced by an estimate of the reception threshold, taken to be the smallest RSS in the vector which is missing the carrier. The overall GP probability, $P$, is the product of $P_1$ and $P_p$.
To be more precise, let A and B be the sets of indices of carriers contained in a training set vector and a test set vector, respectively. We define the set of common carriers as C = A ∩ B, and the noncommon carrier sets as D = A − C and E = B − C, for the training and test sets, respectively. In the resulting expression for $P$, $\mathrm{RSS}^A_i$ is the signal strength of the $i$th carrier of set A, and the order of each root normalizes the probability to the number of carriers in the corresponding term. GP is actually the only classifier tested which is able to handle missing carriers in a natural way. When input vectors are of fixed length (a requirement for SVM, MLP, and K-NN) and all variables must be represented, GP is equivalent to a 1-NN classifier. As a caveat, however, since we do not use the BSIC information, carriers with the same index can in some cases belong to different cellular motifs, which would penalize the GP method.
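The K-NN rule of item (3) can be sketched as below; with K = 1 the same function gives the 1-NN classifier to which GP reduces for fixed-length input vectors. This is an illustration with hypothetical names and made-up toy data, and the tie-breaking behavior (the class met first among the nearest neighbors wins a tied vote) is a choice of this sketch, not something specified in the text.

```python
import math
from collections import Counter

def knn_predict(x, train_vectors, train_labels, k=1):
    """Class most often represented among the k training vectors nearest to x.

    Distances are Euclidean in RSS space. In a tied vote, the class that
    appears first among the nearest neighbors is returned.
    """
    ranked = sorted((math.dist(x, v), label)
                    for v, label in zip(train_vectors, train_labels))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy fingerprints over two carriers (values in dBm), two rooms.
train_vectors = [[-60, -80], [-62, -79], [-90, -55], [-88, -57], [-89, -50]]
train_labels = [1, 1, 2, 2, 2]
room_a = knn_predict([-61, -78], train_vectors, train_labels, k=1)
room_b = knn_predict([-61, -78], train_vectors, train_labels, k=3)
room_c = knn_predict([-87, -54], train_vectors, train_labels, k=3)
```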

Results
We define the localization performance of a given classifier as the average of the validation scores obtained over our ten random splits, expressed as a percentage of correctly identified locations. The standard deviation over the ten splits is also calculated. The results are shown in Table 1.
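The performance figure defined above can be computed as in the following sketch. The function name is hypothetical, the use of the sample (n − 1) standard deviation is an assumption since the text does not specify the estimator, and the scores shown are placeholders for illustration, not results from the study.

```python
import statistics

def summarize(scores):
    """Mean and sample standard deviation of per-split validation scores (percent)."""
    return statistics.mean(scores), statistics.stdev(scores)

# Placeholder scores for illustration only; these are not results from the study.
example_scores = [90.0, 100.0, 95.0, 95.0]
mean_score, std_score = summarize(example_scores)
```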
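A few preliminary remarks about the table are in order. First, when the All 488 fingerprint is used, it is not meaningful to apply a non-linear classifier to the data. This is because of Cover's theorem [13], which states that the examples of a training set are always linearly separable when the number of input variables exceeds the number of examples. The corresponding table entries are thus left blank (footnote 2 in the table). Secondly, dimensionality reduction by principal component analysis is known to often make examples no longer linearly separable, giving poor performance with linear classifiers (the absence of linear separability of the training examples was verified using the Ho-Kashyap algorithm [20]). For this reason, linear classifiers are not applied in those cases where PCA is used. A few further details are explained in the remaining footnotes of Table 1.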
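The property invoked in the first remark (a training set is linearly separable when the number of input variables exceeds the number of examples) can be illustrated numerically. The sketch below is a toy demonstration and not part of the study: it draws random inputs and random labels, with sizes and seed chosen arbitrarily, and uses the classical perceptron learning rule, which is guaranteed to stop once a separating hyperplane exists.

```python
import random

def perceptron(X, y, max_epochs=1000):
    """Classical perceptron rule; returns weights separating (X, y), or None."""
    w = [0.0] * len(X[0])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]  # correct the mistake
                errors += 1
        if errors == 0:        # a full pass without mistakes: separation found
            return w
    return None

rng = random.Random(0)         # arbitrary seed
n_examples, n_variables = 20, 40
X = [[rng.gauss(0.0, 1.0) for _ in range(n_variables)] for _ in range(n_examples)]
y = [rng.choice([-1, 1]) for _ in range(n_examples)]   # labels carry no information
w = perceptron(X, y)
separated = w is not None and all(
    yi * sum(wj * xj for wj, xj in zip(w, xi)) > 0 for xi, yi in zip(X, y))
```

Even though the labels are assigned at random, a separating hyperplane is found, which is why separability of the training set says nothing about generalization in this regime and a nonlinear classifier brings no benefit.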
The table shows that the performance of all classifiers tested improves as more carriers are added to the fingerprint, but that very good performance (for example, our best result of 97.8% in the case of the linear SVM) is only obtained on the All 488 carrier fingerprint. The implication is that indoor position can indeed be deduced from the RSS of GSM cell towers, but that commonly used 7-carrier NMRs and even "wide" fingerprints are insufficient: high performance requires fingerprints of very high dimensionality. It is reassuring to see that this conclusion is supported by all the classifiers tested, including a simple K-NN, even if the best results are obtained with the SVM and MLP machine learning techniques. MLP performance appears slightly worse than that of linear SVMs, within the statistics of our sample, with the best MLP performance, 96.6%, obtained on a multiclass MLP with the softmax output function applied to All 488 carriers, after an input dimensionality reduction.
A more detailed look at our conclusion is given in Table 2, where the confusion matrices for the linear SVM classifier on the 35 Best Overall and All 488 fingerprints appear. The table shows once again that the ability to sharply discriminate between rooms comes only with the inclusion of the full GSM carrier set. The deviation of our global result from 100% is in fact dominated by the confusion between class 3 and class 4, which appears to be the most difficult case.
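A confusion matrix expressed in percent can be computed as in the sketch below. The function name is hypothetical, the convention (rows are true rooms, columns are predicted rooms, each row normalized to 100%) is an assumption about how such a table is laid out, and the labels are made up for illustration rather than taken from the study.

```python
def confusion_matrix_percent(true_labels, predicted_labels, n_classes):
    """Row-normalized confusion matrix in percent.

    Entry [t - 1][p - 1] is the percentage of examples of true class t
    that were assigned to class p; classes are numbered from 1.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t - 1][p - 1] += 1
    percent = []
    for row in counts:
        total = sum(row)
        percent.append([100.0 * c / total if total else 0.0 for c in row])
    return percent

# Made-up labels for illustration: one example of room 3 is mistaken for room 4.
true_rooms = [1, 1, 2, 2, 3, 3, 3, 3]
predicted_rooms = [1, 1, 2, 2, 3, 3, 4, 3]
cm = confusion_matrix_percent(true_rooms, predicted_rooms, n_classes=4)
```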

Conclusions and Perspectives
We believe this study, which is an extension of that presented in [12], to be the first to include the full set of GSM carriers in RSS fingerprints for localization. Although confirmation with more extensive databases will be required, our results strongly suggest that high-performance room-level localization is possible through the use of such fingerprints. The fact that good performance is obtained irrespective of the machine learning technique used (MLPs or SVMs) is a further confirmation that the useful information for localization is obtained by taking into account many GSM carriers, including those which may be rather weak. Finally, it is interesting to note that our result is robust against time-dependent effects (network modifications, propagation channel changes, meteorological effects, and so forth), as our dataset was acquired over a period of one month.
Acquiring datasets and labeling scans is a tedious and time-consuming activity. To address this issue, two independent solutions are currently being investigated. First, experiments with semisupervised classification techniques using kernel methods (see, e.g., [21]) are being carried out, which will make it possible to take advantage of the unlabeled scans during the training procedure. The second approach entails the design and construction, in our laboratory, of a set of ten autonomous scanning devices which will allow the acquisition of large datasets simultaneously in different rooms, labeled with very little human intervention. These devices will also enable us to test the efficiency of our approach when implemented using mixed datasets of scans acquired both indoors and in nearby outdoor areas. For larger outdoor areas, preliminary results indicate that a regression approach using x-y coordinates seems more suitable than the room-by-room classification used here for indoor localization.

Table 1: Percentage of correct radio fingerprint classifications on the 4 carrier sets described in the text. Figures quoted are averages and standard deviations over 10 randomly selected validation sets. All classifiers achieve their best performance when all 488 carriers are included. The most effective classifier for this case is the linear SVM.
Note on blank entries: Cover's theorem [13] states that the examples of a training set are always linearly separable when the number of input variables exceeds the number of examples; the corresponding table entries are thus left blank (footnote 2 in the table). Conversely, dimensionality reduction by principal component analysis is known to often make examples no longer linearly separable, giving poor performance with linear classifiers (the absence of linear separability of the training examples was verified using the Ho-Kashyap algorithm [20]).

Table 2: Confusion matrices for the 35 Best Overall and All 488 carrier sets, using a linear SVM classifier. Figures quoted are in percent. Using the full number of carriers tightens up the diagonal to give individual room classification efficiencies near 100%.