Unlike accurate indoor localisation systems based on WiFi, Bluetooth, or infrared technologies, systems based on GSM rely on a stable external infrastructure that remains usable even in an emergency. This paper presents an accurate GSM indoor localisation system that achieves a median error of 4.39 metres in horizontal coordinates and up to 64 percent accuracy in floor prediction (in 84 percent of cases the predicted floor is off by no more than one floor). The test and reference measurements were made inside an irregularly shaped six-floor academic building whose dimensions are around 50 metres by 70 metres. The localisation algorithm uses GSM signal readings from the 7 strongest cells, the maximum reported under the GSM standard (or fewer, if fewer than 7 are available). We estimate the location by a three-step method. First, we propose a point localisation solution (i.e., localisation based on only one measurement). Then, by applying central tendency filters and a Multilayer Perceptron, we build a localisation system that uses a sequence of estimations of current and past locations. We also discuss major accuracy factors such as the number of observed signals and the types of spaces in the building.
Outdoor localisation is, today, a part of our life. However, such useful localisation methods as the Global Positioning System (GPS) fail inside buildings. Popular alternatives use Received Signal Strength (RSS) from wireless networks that are accessible indoors. Measuring the strength of WiFi signals from multiple Access Points in various locations, we create a map of fingerprints. In the localisation process one’s position can be found by comparison of current signal strengths with the created map.
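The fingerprint comparison described above can be illustrated with a minimal sketch. It assumes a plain nearest-neighbour match in signal space, which is only the simplest form of fingerprinting and not the method developed in this paper; the names `rss_distance`, `locate`, and `fingerprint_map`, the transmitter identifiers, and the placeholder value for unseen transmitters are all hypothetical.

```python
import math

# Assumption: a transmitter missing from a reading is treated as a very weak signal.
MISSING_DBM = -113.0

def rss_distance(a, b):
    """Euclidean distance between two fingerprints given as {transmitter_id: rss_dbm}."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, MISSING_DBM) - b.get(k, MISSING_DBM)) ** 2 for k in keys))

def locate(fingerprint_map, current):
    """Return the mapped position whose stored fingerprint is closest to the current reading."""
    return min(fingerprint_map, key=lambda pos: rss_distance(fingerprint_map[pos], current))

# Toy map: position (x, y) in metres -> stored fingerprint.
fingerprint_map = {
    (0.0, 0.0): {"ap1": -40, "ap2": -80},
    (10.0, 0.0): {"ap1": -60, "ap2": -60},
    (20.0, 0.0): {"ap1": -80, "ap2": -40},
}
estimate = locate(fingerprint_map, {"ap1": -58, "ap2": -63})
print(estimate)  # (10.0, 0.0)
```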
Using WiFi signals inside buildings with many different Access Points whose range covers the whole building allows a very accurate localisation solution to be built. However, this method may fail in buildings with a poor network infrastructure. Moreover, in the case of an emergency such as a fire in the building, local infrastructure may be damaged and the localisation system will fail.
An alternative is a localisation system based on an outside infrastructure. We present an indoor localisation system based on Global System for Mobile Communications (GSM) signals. The proposed localisation system can be implemented on most Android mobile phones, although there are exceptions: our test showed that the Samsung Galaxy S III returns limited data and cannot be used. We tested the system in a six-floor academic building for three-dimensional localisation of objects. The paper presents the following results: a horizontal localisation solution with a median error of 4.39 metres and a floor detection solution with an accuracy of 64 percent.
The remainder of this work is organised as follows: Section
The data for the statistical models were collected using three different mobile phones, Sony Ericsson E15i, Sony Ericsson MT11i, and HTC Desire, all of them running Android OS 2.1 (or newer). The measurements were taken with the phones held horizontally by hand at around 1 metre above the floor. All mobile phones were attached to a GSM (2G) mobile network during the experiment. It should be noted that in parallel with collecting the data on mobile phones, the Gateway Mobile Location Centre (GMLC) based location of the terminal was estimated, which could result in more frequent cell reselection requests sent to a mobile station (compared with a terminal operating in a purely idle mode of operation). This results in a more complete representation of possible serving cells in a given location, although none of our models takes direct advantage of that fact, relying only on vectors of Received Signal Strengths.
The data were gathered in all of the publicly accessible areas of a six-floor academic building (including the ground floor). The building has an irregular shape; its outer dimensions are around 50 by 70 metres, and its height is 24 metres. A 3D sketch of the building is shown in Figure
Overall presentation of the acquired measurements within a 3D sketch of the building. The green areas represent the rooms and sections of the building that were surveyed, and the vertical bars denote the individual measurement locations.
For the purpose of training and testing the statistical models for localisation prediction, we gathered the data in three independent series of measurements (each series took place in a different week). In order to have accurate coordinate information inside the building, we defined a 0.75-by-0.75-metre grid and assigned each point a unique identifier (denoted by POI). All measurements were taken at the points of the defined grid and were labelled by the corresponding POIs. A single measurement (fingerprint) is a real-valued vector
The data were gathered by three people and each series of measurements took 9 days on average. Each series of measurements was taken in around 1200 POIs. To make the final results more independent from spatiotemporal RSSI fluctuations, at each POI the measurement was taken 40 times. This resulted in ca. 48000 fingerprints per series (1200 POIs × 40 measurements).
The first and the second series were taken in the same set of POIs, ordered in general in a 1.5-by-1.5-metre grid (denoted
the value of the
the floor number,
one of the four directions in which the phone was oriented (parallel to one of the horizontal axes); it should be noted that the orientation information has not been passed to the prediction model but was used in the generation of artificial paths.
In the whole building we observed 39 different BTS identifiers. RSSI values for up to 7 BTSs were registered in a single measurement, which corresponds to the GSM standard. In particular, according to the Radio Resource Control Protocol [
Descriptive statistics of the gathered data for each floor.
Floor  # BTS  Samples 


# area  



 
0  3  5  6  14%  8.02  9.45  8 
1  4  5  6  14%  6.91  9.62  12 
2  4  5  6  34%  8.77  15.09  22 
3  3  4  5  12%  7.06  13.92  9 
4  3  5  6  14%  9.99  12.87  12 
5  5  6  7  11%  8.89  14.16  12 
Although the relevant standards define the permitted range, the mobile phones used in the experiment reported RSSI values outside it. In particular, RSSI levels ranging from −113 dBm to −11 dBm were observed, while 3GPP specifications define the maximum value as −51 dBm (which is to be interpreted as “−51 dBm or greater”). Such measurements form 0.4 percent of the data set. Moreover, unknown RSSI values were reported by the mobile terminal in 0.7 percent of the measurements. To sum up, after data set validation, a total of 683 258 registered RSSI values for GSM BTSs were used for the purpose of this work. A histogram showing the distribution of the valid RSSI values registered in our database is shown in Figure
Distribution of observed RSSI values.
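The range check implied by the validation step can be sketched as follows. The exact validation procedure is not spelled out in the text, so this is an assumption: readings outside the range from −113 dBm to −51 dBm, as well as unknown readings (represented here as `None`), are simply dropped. The function names and BTS identifiers are hypothetical.

```python
RSSI_MIN_DBM = -113  # sensitivity limit quoted in the text
RSSI_MAX_DBM = -51   # maximum value defined by the 3GPP specifications

def is_valid_rssi(value):
    """True for a known reading inside the permitted range."""
    return value is not None and RSSI_MIN_DBM <= value <= RSSI_MAX_DBM

def clean_fingerprint(readings):
    """Keep only valid readings from a {bts_id: rssi_dbm_or_None} mapping."""
    return {bts: rssi for bts, rssi in readings.items() if is_valid_rssi(rssi)}

raw = {"bts_a": -75, "bts_b": -11, "bts_c": None, "bts_d": -113}
print(clean_fingerprint(raw))  # {'bts_a': -75, 'bts_d': -113}
```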
The measurements focused on the halls and corridors of the building, with the addition of several lecture rooms and computer laboratories, as these are the most often visited locations within the building. Figure
The data described in this section were also used in the analysis presented in [
Although the basic prediction models process only a single fingerprint, much better results (especially in terms of determining the floor) could be achieved with a multipoint method (i.e., taking into account the past measurements). In each series of measurements, at every POI we took 10 fingerprints in each of the four main directions parallel with the coordinate axes of the building. We used one of the fingerprints from a given point to construct artificial paths between two randomly chosen points from the grid of defined and measured points.
We generated 500 paths for the
The topology of the building was not taken into account during the construction of the neighbourhood, except for information about the positions of the stairwells, which was used for changing the floor.
The paths for the
Choose at random a start point from the
Choose at random an end point from the
Find the shortest path in the graph of points (a change of floor is possible only near the main staircase).
Choose a fingerprint gathered in the direction in which the path is traversed, having RSSI for at least two BTSs.
In the presented generator, if the start and end points are from two different floors, the algorithm searches for the shortest path from the start point to the staircase, then inserts the predefined points between the start and end point floors, and then searches for the shortest path from the staircase to the end point.
The shortest path was found using the greedy
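The path generation steps above can be sketched on a toy grid. This is a simplified illustration under stated assumptions: breadth-first search is used as a stand-in for the search procedure of the original generator, floor changes are modelled as a direct link between stair points on adjacent floors, and the last step (choosing a fingerprint gathered in the direction of travel) is omitted. All names and the toy building are hypothetical.

```python
import random
from collections import deque

def neighbours(point, points, stairs):
    """Adjacent grid points on the same floor, plus floor changes at stair positions."""
    x, y, floor = point
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (x + dx, y + dy, floor)
        if q in points:
            yield q
    if (x, y) in stairs:  # a change of floor is possible only at the staircase
        for df in (1, -1):
            q = (x, y, floor + df)
            if q in points:
                yield q

def shortest_path(start, end, points, stairs):
    """Breadth-first search; returns the list of points from start to end, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == end:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for q in neighbours(current, points, stairs):
            if q not in previous:
                previous[q] = current
                queue.append(q)
    return None

def random_path(points, stairs, rng):
    """Pick random start and end points and connect them by a shortest path."""
    ordered = sorted(points)
    return shortest_path(rng.choice(ordered), rng.choice(ordered), points, stairs)

# Toy building: two floors, each a 3-by-3 grid, with a staircase at (0, 0).
points = {(x, y, f) for x in range(3) for y in range(3) for f in range(2)}
stairs = {(0, 0)}
path = shortest_path((2, 2, 0), (2, 2, 1), points, stairs)
print(len(path))  # 10 points: to the stairs, up one floor, then to the end point
```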
Our task was the detection of the current location of the tracked object. The location is described as a triple
Figure
Localisation schema for point
The estimation of the coordinate where the fingerprint
Two estimations are
The final estimation for a single fingerprint is the function
The localisation schema (Figure
Three functions
We compared several boosting algorithms to estimate the localisation separately for the floor and the coordinates. Our test included AdaBoost, Bagging, and Least Squares Boosting (LSBoost). AdaBoost is a classification method [
AdaBoost was implemented as the AdaBoostM2 algorithm, where weighted pseudoloss is calculated for
Bagging (bootstrap aggregation) generates many bootstrap replicas of the data set and grows a weak learner, such as a decision or regression tree, on each replica. To find the predicted response of a trained ensemble, the algorithm takes an average of predictions from individual trees.
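The bagging idea can be illustrated with a deliberately small sketch. It is not the implementation used in the experiments: the weak learner here is a depth-1 regression tree (a stump) on a single feature, and the toy data, function names, and parameters are assumptions made for the example.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Depth-1 regression tree: choose the threshold that minimises squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # no usable split: predict the mean everywhere
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_bagging(xs, ys, n_trees=25, seed=0):
    """Grow one stump per bootstrap replica (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_bagging(stumps, x):
    """Average the predictions of the individual trees."""
    return mean([predict_stump(s, x) for s in stumps])

# Toy data: one RSSI value (dBm) against one coordinate (m).
rssi = [-100, -95, -90, -70, -65, -60]
coord = [2.0, 3.0, 2.5, 20.0, 21.0, 19.0]
model = fit_bagging(rssi, coord)
```

Averaging over replicas smooths the prediction of any single tree, which is why the ensemble response is less sensitive to individual noisy fingerprints.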
LPBoost performs multiclass classification by attempting to maximise the minimal difference between the predicted soft classification score for the true class and the largest score for the false classes in the training set. This operation should improve generalisation ability [
This part describes Step
The mode filter calculates the most popular result among the floor estimations for a single fingerprint:
The filter is parameterised by the coefficient
To avoid a strong influence of the estimations for the farthest points, estimation is repeated
Equation (
In the modified formula, each estimation is replicated additionally
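A minimal sketch of the mode filter and its weighted variant follows. The exact formulas are not reproduced here, so the details are assumptions: replicating an estimation several times is modelled as adding a weight to its vote, and ties are broken in favour of the most recent estimation. The function names are hypothetical.

```python
from collections import Counter

def mode_filter(floor_estimates):
    """Most frequent floor in a sequence (oldest first); ties go to the most recent."""
    counts = Counter(floor_estimates)
    best = max(counts.values())
    for floor in reversed(floor_estimates):
        if counts[floor] == best:
            return floor

def weighted_mode_filter(floor_estimates, weights):
    """Each estimation votes with its weight, as if it were replicated that many times."""
    totals = {}
    for floor, weight in zip(floor_estimates, weights):
        totals[floor] = totals.get(floor, 0) + weight
    best = max(totals.values())
    for floor in reversed(floor_estimates):
        if totals[floor] == best:
            return floor

estimates = [2, 3, 3, 2, 2, 1]      # floor estimations for six consecutive fingerprints
bts_counts = [1, 6, 7, 2, 1, 3]     # example weights: number of visible BTSs
print(mode_filter(estimates))                       # 2
print(weighted_mode_filter(estimates, bts_counts))  # 3
```

In this example the unweighted mode picks floor 2, while the weighted variant picks floor 3 because the two estimations for floor 3 carry more weight.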
The average filter has the same role as the mode filter but works with continuous data. The
The element
Similarly to the mode filter, we introduce the second form of the filter that includes knowledge about the quality of the estimation:
The weighted average uses weights
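The two forms of the average filter can be sketched as below. The original formulas are not reproduced in this text, so this assumes the standard arithmetic mean and the standard normalised weighted mean; the function names and example values are hypothetical.

```python
def average_filter(values):
    """Plain mean of the coordinate estimations in the sequence."""
    return sum(values) / len(values)

def weighted_average_filter(values, weights):
    """Weighted mean; estimations with larger weights contribute more."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total

x_estimates = [10.0, 12.0, 20.0]  # estimated x coordinate for three fingerprints (m)
weights = [6, 6, 2]               # example weights, e.g. the number of visible BTSs
print(average_filter(x_estimates))                    # 14.0
print(weighted_average_filter(x_estimates, weights))  # about 12.29
```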
Unlike the commonly used complex methods, such as Kalman’s filter [
After the estimation for the fingerprints sequence we had the
To use all that available knowledge, we decided to aggregate all the estimations. For that task, we used a Multilayer Perceptron.
We created two independent MLP models. The first model detected the floor and the second model detected the horizontal coordinates.
All inputs and outputs of the first network were binary. Therefore, all the features were represented by a set of neurons and each neuron defined one of the possible values of the feature.
Only one neuron from the set could be active at any one time. If the number of floors is given by the number
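The binary representation described above corresponds to what is usually called one-hot encoding. The sketch below shows one way to build such an input vector from a sequence of floor estimations; the ordering of the input neurons and the function names are assumptions made for illustration.

```python
def one_hot(floor, n_floors):
    """Binary vector with a single active position for the given floor."""
    if not 0 <= floor < n_floors:
        raise ValueError("floor out of range")
    return [1 if i == floor else 0 for i in range(n_floors)]

def encode_sequence(floor_estimates, n_floors):
    """Concatenate the one-hot vectors of all floor estimations in the sequence."""
    vector = []
    for floor in floor_estimates:
        vector.extend(one_hot(floor, n_floors))
    return vector

print(one_hot(2, 6))                 # [0, 0, 1, 0, 0, 0]
print(encode_sequence([0, 2], 3))    # [1, 0, 0, 0, 0, 1]
```

The input width depends on the number of floors, which is consistent with the later remark that a trained network cannot be reused for a building with a different number of floors.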
The model for coordinates is much simpler. All the inputs and the output are continuous. The input neurons represent estimated values
The location estimated by the MLP should be more accurate than that of a pure tendency filter. However, the network is fitted to the problem described by the learning set. The learning process requires two sets: a learning set, which is used to fit the network, and a validation set, which controls the learning process and prevents overfitting. Although this division supports generalisation, the resulting network cannot be used on a testing set with a different number of floors, whereas a mode filter works on any data.
This section presents the results obtained in the building described in Section
Data for the tests were divided into the learning set, the validation set, and the testing set created from the first series of measurements
We tested four classification boosting algorithms to estimate the current floor of the tracked object. Our test included AdaBoost, Bagging Classification, Bagging Regression with discretisation of the results, and Least Squares Boosting.
Table
Floor detection. Results for the testing series
Method  Accuracy [%]  Average error [floor] 
AdaBoost  33  1.50 
Bagging Classification  56  0.83 
Bagging Regression  43  0.81 
Least Squares Boosting  29  1.07 
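The two measures in the table, and the discretisation used to turn a regression output into a floor number, can be sketched as follows. The rounding rule (round half up, then clip to the valid floor range) is an assumption, since the discretisation is not specified in detail; the function names and sample values are hypothetical.

```python
import math

def discretise_floor(value, n_floors):
    """Round a continuous floor estimate to the nearest floor and clip it to the building."""
    rounded = int(math.floor(value + 0.5))
    return min(n_floors - 1, max(0, rounded))

def floor_metrics(predicted, actual):
    """Return (accuracy, mean absolute error in floors)."""
    n = len(actual)
    accuracy = sum(p == a for p, a in zip(predicted, actual)) / n
    mean_error = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
    return accuracy, mean_error

raw_outputs = [0.4, 1.6, 2.5, 5.7, -0.3]   # regression outputs
true_floors = [0, 2, 2, 5, 1]
predicted = [discretise_floor(v, 6) for v in raw_outputs]
print(predicted)                              # [0, 2, 3, 5, 0]
print(floor_metrics(predicted, true_floors))  # accuracy 0.6, mean error 0.4
```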
Estimation of horizontal position is a regression task. We checked two regression models to solve this problem: the Bagging method and the LS Boosting method. The results are given in Table
Horizontal error analysis for testing series
Method  Mean [m]  Median [m]  80th percentile [m] 
LS Boosting  10.22  9.15  14.75 
Bagging  8.15  6.76  12.16 
The Bagging method obtained the better results. Its mean error may be acceptable for indoor localisation, but gross errors were very frequent: the location estimates for 20 percent of the test fingerprints were off by more than 12 m horizontally.
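The statistics reported in the table can be computed as in the sketch below. It assumes that the horizontal error is the Euclidean distance between the estimated and true (x, y) positions and that the percentile uses the nearest-rank definition; neither detail is stated explicitly in the text, and the names and sample positions are hypothetical.

```python
import math
from statistics import mean, median

def horizontal_error(estimated, true):
    """Euclidean distance in the horizontal plane, in metres."""
    return math.hypot(estimated[0] - true[0], estimated[1] - true[1])

def percentile(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

estimates = [(3.0, 4.0), (0.0, 0.0), (6.0, 8.0), (1.0, 0.0), (0.0, 2.0)]
truths = [(0.0, 0.0)] * 5
errors = [horizontal_error(e, t) for e, t in zip(estimates, truths)]
print(mean(errors), median(errors), percentile(errors, 80))  # 3.6 2.0 5.0
```

The 80th percentile is the value that 80 percent of the errors do not exceed, so it gives a direct handle on how frequent gross errors are.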
Let us consider a sequence
The aim of this section is to describe the experiments for computing the location of the point where
The defined sequence of fingerprints of length
We introduced formulas (
We expected to have a better accuracy when more BTSs were visible. We say that a particular BTS is visible for a given fingerprint when the device reported a signal from that BTS. It is worth recalling that commonly used devices report up to 7 signals (usually the strongest ones) and the sensitivity is −113 dBm (signals that are weaker are not reported). Figure
Results with respect to the number of visible BTSs. Panels: floor detection; coordinates approximation.
For instance, when 7 BTSs are visible, the accuracy is up to 66 percent, while with only 1 visible BTS it is as low as 34 percent. This suggests that, when localisation is based on a history of readings in the later steps of our algorithm, more weight should be given to fingerprints with more visible BTSs.
Similarly to the floor estimation case, we investigated the accuracy of the Bagging method with respect to the density of the infrastructure. The results are presented in Figure
We expected the same relationship between the error and the number of signals as in the case of floor detection. Indeed, the accuracy increases as more BTSs become visible. However, there is a difference between the horizontal and vertical cases, as we can see in Figure
To sum up, the results presented in Figures
To evaluate the influence of the signal weights used in the weighted models, we compared three sets of weights. The first set contained equal weights. Therefore, the estimations were calculated according to (
The second set contained weights given by the number of observed BTSs. According to the results obtained during the estimation for a single fingerprint, the number of observed BTSs is correlated with the accuracy obtained. The estimations were calculated according to (
The third set contained random weights from the domain
Estimations for the floor and coordinates were made separately using different tendency filters.
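The three sets of weights compared above can be generated as in the following sketch. Fingerprints are represented as `{bts_id: rssi}` mappings, which is an assumption, as is the interval [0, 1) used for the random weights, since the actual domain is not reproduced in this text. The function names are hypothetical.

```python
import random

def equal_weights(fingerprints):
    """First set: every estimation has the same weight."""
    return [1.0 for _ in fingerprints]

def bts_count_weights(fingerprints):
    """Second set: weight equals the number of BTSs observed in the fingerprint."""
    return [float(len(fp)) for fp in fingerprints]

def random_weights(fingerprints, rng):
    """Third set: random weights (assumed here to be drawn from [0, 1))."""
    return [rng.random() for _ in fingerprints]

sequence = [{"bts_a": -70}, {"bts_a": -72, "bts_b": -80, "bts_c": -95}]
print(equal_weights(sequence))      # [1.0, 1.0]
print(bts_count_weights(sequence))  # [1.0, 3.0]
```

The random set serves as a control: if the BTS-count weights help, they should outperform both the equal and the random weights.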
In this section we perform the analysis of the localisation solution described above for the problem of floor prediction. The localisation is based on
The localisation model was trained using the
Floor estimation obtained by mode filters. Panels: learning set; validation set.
In all cases we see that the weighted mode with weights corresponding to the visible BTSs brings the best results. For the validation series and the testing series, the weighted methods provide a substantial improvement when compared to localisation based on a single fingerprint (path length
What is interesting is that for
Figure
After the learning process and the validation, the obtained model was tested on the
Accuracy for floor estimation. Panels: mode; weighted mode.
Similarly to the previous section, we perform the analysis of the localisation solution for the problem of horizontal localisation (the estimation of
The median horizontal error presented in Figures
Error for localisation obtained by average filters. Panels: learning set; validation set.
As before, for the learning set we obtain worse results when
Figures
Horizontal error cumulative distribution. Panels: equal weights; signal weights.
The aggregation process is carried out separately for the floor detection and the coordinates approximation because of the different input for the neural network. The differences lie in both the different data and the different structure of the input layer.
The MLP can be used in several ways in the floor detection task. First, the network can be used to estimate the current floor on the basis of single estimations. Second, the MLP can aggregate the results of the previous steps. Third, both inputs can be merged to create a new model.
In our test the MLP achieves accuracies of 59.54, 64.21, and 55.37 percent for the estimation on the basis of single estimations, the aggregation, and the merged model, respectively. This shows that the proposed solution works better than neural network modelling alone. It also suggests that the results obtained by the weighted model differ strongly from those obtained by the model based on single estimations, so the merged model cannot work properly.
Figure
MLP accuracy for individual floors.
Floor  Accuracy [%]  Mean error [floor] 

0  82.83  0.29 
1  63.87  0.52 
2  75.51  0.38 
3  40.11  0.87 
4  44.85  1.21 
5  50.57  1.33 



Accuracy for floor estimation obtained by MLP.
The network gives good results for the first three floors. For the upper floors, results are worse, but the main source of errors is the third floor, where accuracy is about 40 percent. Additionally, for the 4th and 5th floors, the mean error is over one floor.
The MLP for the horizontal coordinates approximation was tested, as in the floor detection task, on three different inputs: separate estimation, aggregation, and the merged model.
In our test, the MLP yields mean errors of 6.80, 5.22, and 5.18 metres for estimation on the basis of single estimations, aggregation, and the merged model, respectively.
The errors decrease as the path length increases. However, the length was fixed at 6 in the previous tests. The mean error, the median error, and the gross error are lower than those for the weighted average filter.
Figure
MLP point localisation errors for individual floor.
Floor  Mean  Median  80% 

0  6.24  5.35  8.86 
1  4.58  3.85  6.73 
2  5.09  4.41  7.29 
3  4.71  4.32  6.79 
4  4.36  3.80  5.66 
5  6.58  5.53  10.45 




Horizontal error cumulative distribution for MLP.
In this part, we discuss the major factors that may influence the accuracy of the localisation algorithm. In order to do that, we will look at the first step of our algorithm (see Section
Horizontal distance error cumulative distribution.
In Sections
Mean errors analysis with respect to the number of visible BTSs.
Visible BTSs  1  2  3  4  5  6  7 

Distribution  9%  2%  9%  18%  22%  25%  15% 
Effectiveness (floor)  27%  63%  47%  60%  63%  64%  70% 
Floor error  1.33  0.61  0.88  0.77  0.80  0.77  0.74 
Error for 
6.49  6.45  4.29  3.24  2.92  3.04  2.99 
Error for 
5.94  5.38  5.96  6.06  4.90  4.34  3.87 
Horizontal error [m]  9.74  9.49  8.08  7.53  6.30  5.89  5.38 


Results under the assumption that the floor’s prediction was correct  
Error for 
7.05  7.60  3.26  2.95  2.51  2.76  2.68 
Error for 
6.05  5.39  5.70  5.38  3.69  3.49  3.72 
Horizontal error [m]  10.49  10.49  7.05  6.70  5.03  5.01  5.08 
In this section, we analyse errors with respect to the floor number. It is not a natural accuracy factor, although the following table shows that the errors vary with the floor number. We shall attempt to find the reason for this phenomenon.
Table
Mean errors analysis with respect to the floor number.
Floor  0  1  2  3  4  5 

Distribution  13%  40%  11%  13%  11%  0% 
Effectiveness (floor)  68%  68%  69%  30%  47%  43% 
Floor error  1.33  0.61  0.88  0.77  0.80  0.77 
Error for 
6.49  6.45  4.29  3.24  2.92  3.04 
Error for 
5.94  5.38  5.96  6.06  4.90  4.34 
Horizontal error [m]  7.93  5.72  6.80  6.94  5.96  8.22 


Results under the assumption that the floor’s prediction was correct  
Error for 
7.05  7.60  3.26  2.95  2.51  2.76 
Error for 
6.05  5.39  5.70  5.38  3.69  3.49 
Horizontal error [m]  6.27  4.76  6.22  6.53  4.66  6.28 
Firstly, let us look at the effectiveness of the floor’s localisation as this is the most important issue in the localisation problem.
Let us observe that there is a significant difference in effectiveness for floor numbers 0 to 2, where our algorithm works better, and for floor numbers 3 to 5, where the algorithm obtains worse results. It is not surprising that the best effectiveness is obtained for floor 2, since this is where the most fingerprints were taken when collecting learning data. However, we have no simple explanation as to why the effectiveness for the other floors is so different despite their being equally represented in the collection of fingerprints gathered for learning purposes.
Let us now look at Figure
Analysis with respect to the floor number. Panels: efficiency of the floor’s prediction; horizontal distance error.
In this section, we analyse the effectiveness of the floor prediction as well as the mean horizontal distance error for the different space types in the building. We found that the more regularly shaped spaces covered by the measurement points (POIs) were mostly covered by good POIs. Here, by regularity we mean that there is no significant difference between width and length. On the other hand, long corridors where fingerprints were taken only along the corridor (and not in the adjacent rooms) were commonly covered by bad POIs. It should be mentioned that the POIs where fingerprints were taken for learning purposes (not testing) cover the same regions as the POIs used for testing purposes. This leads us to the conjecture that the shape of the area covered by the POIs where fingerprints are taken is important. In other words, the more compact the shape, the better the results. This would explain why the effectiveness is so different on floors 3 to 5. Using visualisation methods, we are able to find the areas of the building for which the localisation algorithm should be improved. Such maps also show that the good and bad regions are not randomly distributed. They actually give us a reasonable partition into easily identified and somewhat natural sectors of the building.
Beside the accuracy factors discussed in the sections above, there are several other factors that influence the accuracy. All this is well covered in [
Signal strengths for BTSs. Panels: signals for floors 0 to 5, each shown for series 1, 2, and 3.
The results obtained are compared with the results of the random algorithm as well as results presented in other works.
Reference levels for the described algorithm were defined by the following algorithm. A set consisting of (
The mean error obtained for floor identification was 80 percent. The mean errors for coordinate estimation were 8.97 and 12.82 metres for
Position finding methods and their accuracy have become important issues in research over recent years. Outdoor solutions are mainly based on GPS/AGPS combined with a GSM approach. Regrettably, they are not suitable for indoor environments—because of significant loss in the GPS signal strength. Therefore, many different methods for positioning in indoor situations have been proposed: [
Almost all the existing indoor positioning algorithms are based on fingerprinting. In this section, we concentrate on methods based on widely available GSM networks. This approach does not require additional installation and maintenance, and thus it has been shown to be promising for locating mobile terminals inside buildings. The main limiting factors here are multipath and fading. The power level of the signal as it is received from Access Points at a fixed location may change due to the user’s orientation or environmental changes. We focus above all on the accuracy of position approximation, measured in metres, and the requirements (e.g., input data) for the methods that have been developed.
An accurate GSM indoor localisation system in multistorey buildings is proposed in [
In an indoor positioning system presented in [
The effectiveness of GSM-based localisation methods depends on the number of available and examined GSM carriers. The work [
The problem of choosing a subset of relevant GSM carriers providing a good distinction between rooms is also examined in [
Floor classification itself is the subject of research [
As we verified before, the effectiveness of 3-dimensional localisation may be improved when the phases of recognition of the floor, and the estimation of 2-dimensional coordinates within a given (single) floor, are separated. Floor detection can be significantly enhanced by using additional sensors embedded in mobile devices. The work [
The paper [
In all the works described, the complexity of the proposed solutions is not given. The authors state that the algorithms run in a short time or even in real time. Only in [
To sum up, the results for our solution (a median error of 4.39 metres for horizontal localisation in a multistorey building and 64 percent accuracy for floor detection) are comparable to, or even better than, those obtained in the reviewed works that are based only on received GSM signal strength. The results reported in the literature for methods using additional sensors in a mobile system suggest a promising direction for improving the accuracy of our localisation system.
This work describes and analyses a localisation solution in a six-floor academic building. The localisation was based on the signals from Global System for Mobile Communications Base Transceiver Stations. The localised objects were common mobile phones. We demonstrated that, for the localisation based on a single fingerprint, we obtained a horizontal median error of about 6.5 metres and accuracy of floor detection of about 56%. We proposed a three-step method that uses not only a single fingerprint but also the preceding ones. This method improved the accuracy. We obtained a horizontal median error of around 4.4 metres and floor detection accuracy of over 64%. Table
Summary of the localisation results of individual steps of the localisation process.
Stage  Horizontal error: Mean [m]  Horizontal error: Median [m]  Horizontal error: 80% [m]  Floor error [%] 
Estimation for single fingerprint  6.75  5.66  9.86  42.93 
Estimation for fingerprints sequence  5.32  4.67  7.51  37.24 
MLP aggregation  5.18  4.39  7.47  35.79 
The localisation error for coordinates obtained in our work is similar to the results obtained in other works that used additional information such as WLAN signals or accelerometer data. A median error below 5 metres is acceptable in indoor localisation. However, the accuracy of the floor detection algorithm is relatively low. This problem can be addressed using WLAN signals: our tests showed that, by combining GSM and WLAN signals, the current floor can be detected with an accuracy of over 90 percent. This work focuses on GSM signals, and a pure GSM signal seems insufficient to identify the floor in a multistorey building.
In future work we want to collect additional data from the same and other buildings to examine long-term differences in the signal map, collect additional data such as accelerometer readings, and apply the proposed methods to various buildings.
The authors declare that they have no competing interests.
The authors would like to thank Le Dinh Tung, Łukasz Wrzesiński, and Bogusław Zaręba for making the measurements and Orange Labs Poland for providing the cell phones and the application for the Android OS. The research is supported by the National Centre for Research and Development, Grant no. PBS2/B3/24/2014, application no. 208921.