Wi-Fi Fingerprint-Based Indoor Mobile User Localization Using Deep Learning

In recent years, deep learning has been applied to Wi-Fi fingerprint-based localization and has achieved remarkable performance, which is expected to satisfy the increasing requirements of indoor location-based services (LBS). In this paper, we propose a Wi-Fi fingerprint-based indoor mobile user localization method that integrates a stacked improved sparse autoencoder (SISAE) and a recurrent neural network (RNN). We improve the sparse autoencoder by adding an activity penalty term to its loss function to control the neuron outputs in the hidden layer. The encoders of three improved sparse autoencoders are stacked to obtain high-level feature representations of received signal strength (RSS) vectors, and an SISAE is constructed for localization by adding a logistic regression layer as the output layer to the stacked encoders. Meanwhile, using the previous location coordinates computed by the trained SISAE as extra inputs, an RNN is employed to compute more accurate current location coordinates for mobile users. The experimental results demonstrate that the proposed SISAE-RNN reduces the mean error for mobile user localization to 1.60 m.


Introduction
With the development of fifth-generation (5G) networks, the key technologies of 5G are paving the way for applications of the Internet of Things (IoT) [1][2][3], for they can greatly improve network connectivity [1], spectrum efficiency [3], and so on. Since location-based service (LBS) is essential to the IoT and people's demand for LBS has increased rapidly, LBS has attracted extensive attention [4,5]. Although satellite-based navigation and positioning systems like the global positioning system (GPS) and the BeiDou navigation satellite system (BDS) can meet most LBS requirements outdoors, these outdoor localization and navigation systems are not suitable for indoor applications owing to the signal attenuation caused by the blockage of buildings [6,7]. Meanwhile, people spend most of their time indoors. Therefore, indoor localization has been extensively researched in the past years because of its application and commercial potential [8].
So far, people have developed various indoor localization methods using infrared, ultrasound, radio frequency identification (RFID), ZigBee, Bluetooth, ultra-wideband (UWB), and Wi-Fi [9][10][11][12]. Among them, the indoor localization method using Wi-Fi has become a research hotspot because of its low cost, widely deployed infrastructure, and so on. Compared with Wi-Fi localization using time of arrival (TOA), time difference of arrival (TDOA), and angle of arrival (AOA) [12][13][14], Wi-Fi fingerprinting localization using received signal strength (RSS) has been favored because it does not require extra hardware and performs satisfactorily in non-line-of-sight (NLOS) environments.
Generally, a basic Wi-Fi fingerprint-based localization method can be divided into two phases: the offline phase and the online phase [15]. In the offline phase, RSS vectors from deployed Wi-Fi access points (APs) are recorded as fingerprints at a number of location-known reference points (RPs) to establish a fingerprint database, namely, a radio map. In the online phase, after an online RSS vector is measured by a user's terminal device, similarities between the online RSS vector and the RSS vectors in the radio map can be calculated to select RPs for localization, or the online RSS vector is input into a machine learning-based localization algorithm trained with the radio map in the offline phase to calculate the user's location coordinates. However, indoor radio propagation is time-varying and easily affected by multipath effects, shadowing effects, and environmental dynamics, which can degrade the performance of Wi-Fi fingerprinting localization, let alone localization for mobile users with only limited RSS data available. Therefore, achieving high localization accuracy remains a serious challenge for Wi-Fi fingerprint-based indoor mobile user localization.
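The online matching step described above can be sketched as a nearest-neighbor search over the radio map. The following is only a minimal illustration, not the paper's algorithm; the array layout (one row per RP, one column per AP) and the Euclidean similarity measure are assumptions:

```python
import numpy as np

def knn_locate(online_rss, radio_map_rss, rp_coords, k=3):
    """Estimate a user's location by averaging the coordinates of the
    k reference points whose stored RSS vectors are closest (in
    Euclidean distance) to the online measurement."""
    dists = np.linalg.norm(radio_map_rss - online_rss, axis=1)
    nearest = np.argsort(dists)[:k]
    return rp_coords[nearest].mean(axis=0)
```

Weighted variants (WKNN) weight the k neighbors by inverse RSS distance instead of averaging them uniformly.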
Recently, various deep learning algorithms have been proposed, and some of them, such as the deep belief network (DBN) [16], convolutional neural network (CNN) [17], and stacked autoencoder (SAE) [18], have been applied in Wi-Fi fingerprinting localization for performance improvement. As an unsupervised deep learning algorithm, a stacked sparse autoencoder (SSAE) can learn high-level feature representations of data efficiently [19]. Because the dimensions of the hidden layers of the SSAE can be greater than the dimension of its input layer, it can be applied to learn RSS features in scenarios with only a few deployed APs. Moreover, as a supervised deep learning algorithm, a recurrent neural network (RNN) is powerful for processing sequentially correlated data and is able to improve the localization accuracy for mobile users. Therefore, we propose a Wi-Fi fingerprint-based indoor mobile user localization method that integrates a stacked improved sparse autoencoder (SISAE) and an RNN to achieve high localization accuracy. The main contributions of this study can be summarized as follows:

(1) We propose a Wi-Fi fingerprint-based indoor mobile user localization method using deep learning. We use the stacked encoders of three improved sparse autoencoders to obtain high-level feature representations of RSS vectors. Then, we add an output layer to the stacked encoders to construct an SISAE for localization. In addition, to exploit the spatial correlation among the locations of mobile users, we integrate the SISAE with an RNN to calculate more accurate location coordinates.

(2) We propose an SISAE fingerprinting algorithm. We first improve the sparse autoencoder by adding an activity penalty term to its loss function to control the neuron outputs in the hidden layer and stack the encoders of three improved sparse autoencoders. Then, we add a logistic regression layer as the output layer to the stacked encoders to construct an SISAE, which takes the locations of RPs as its expected outputs for supervised fine-tuning. The proposed SISAE fingerprinting algorithm achieves a superior localization performance.

(3) We propose an RNN tracking algorithm to improve the localization accuracy for mobile users. We construct the RNN by using the same network structure as well as the weights and biases of the trained SISAE and by adding the previous ground-truth location coordinates of a trajectory as extra inputs for the RNN training. In the online phase, the previous location coordinates computed by the SISAE are used as the extra inputs of the RNN for current localization, and therefore, the localization performance can be improved.

The remainder of this paper is organized as follows. Section 2 summarizes the related works on fingerprinting localization using deep learning. Section 3 describes our proposed localization method that integrates the SISAE and RNN. Section 4 presents the experimental setup, parameter settings, and experimental results and analyses. Finally, we conclude this work in Section 5.

Related Works
In the last two decades, many well-known Wi-Fi fingerprint-based indoor localization systems have been developed, such as RADAR [20], Nibble [21], and Horus [22]. At the same time, many fingerprinting localization algorithms have been presented, for example, K-nearest neighbors (KNN) [23], maximum likelihood probability [24], and neural networks [25]. Because deep learning algorithms outperform many traditional machine learning algorithms, researchers have focused their attention on deep learning-based fingerprinting and tracking algorithms, as discussed below.

Deep Learning-Based Fingerprinting Algorithms

Some researchers used deep reinforcement learning (DRL) algorithms for fingerprinting localization. Li et al. [26] proposed an unsupervised wireless localization method based on DRL. The method extracted landmark data from unlabeled RSS automatically and eased the demand for retraining the DRL. Mohammadi et al. [27] proposed a semisupervised model using a variational autoencoder and DRL and applied the model to indoor localization in a smart city scenario. Meanwhile, some researchers utilized deep learning approaches for RSS feature extraction. Le et al. [16] proposed a DBN constructed by stacked restricted Boltzmann machines for fingerprint feature extraction. The network could reduce the workload of fingerprint collection while maintaining the localization accuracy. Shao et al. [17] combined Wi-Fi and magnetic fingerprints to generate a new image and utilized convolutional windows to extract the Wi-Fi features automatically. Hsieh et al. [28] presented a deep neural network (DNN) implemented with a one-dimensional CNN, which was used to extract hierarchical features and to reduce the network complexity. Moreover, some researchers preferred autoencoders (AEs) for RSS feature extraction. Khatab et al. [29] combined an AE and a deep extreme learning machine to improve the localization performance by using the AE to extract high-level feature representations of RSS data. Song et al. [18] employed an SAE for dimension reduction of RSS data and combined it with a one-dimensional CNN to improve the localization performance. In [30], a time-reversal fingerprinting localization method with RSS calibration was proposed. The method used an amplitude-AE and a phase-AE to calibrate the online RSS measurements. Zhang et al. [31] first utilized a DNN structure trained by a stacked denoising autoencoder for coarse localization.
After the DNN, a hidden Markov model was used for fine localization. However, these algorithms based on undercomplete AEs were applied for dimension reduction of RSS features [29] and may not be suitable for scenarios with only a few deployed APs.

Deep Learning-Based Tracking Algorithms

Bai et al. [32] proposed a real-time localization system that consisted of two RNNs. The first RNN computed locations using the historical observation information, and the second RNN refined the locations with the historical computed locations. In [33], an RNN solution for localization using the correlation among the RSS measurements in a trajectory was proposed, and different RNN structures were compared and analyzed by the authors. Reference [34] proposed a novel RNN for indoor tracking. The RNN estimated the orientation and velocity using the inertial measurement units of terminal devices and learned and predicted a human movement model. Zhang et al. [35] embedded a particle filter algorithm into an RNN to realize real-time map matching. They also employed a CNN to learn map information for particle weight computation. Chen et al. [36] proposed a deep long short-term memory (LSTM) algorithm to learn high-level representations of the extracted RSS features and conducted experiments in two different scenarios to verify the effectiveness of the algorithm. Tariq et al. [37] tested and analyzed several neural networks for localization and concluded that the LSTM could achieve a comparable performance with the lowest processing effort and fewest network parameters. Among these deep learning-based tracking algorithms, researchers concentrated on the spatially correlated locations of mobile users to improve the localization performance.
In this study, we propose an SISAE fingerprinting algorithm whose hidden layer dimensions can be set freely, without being limited by the input dimension. Also, based on the trained SISAE, we add the previous location coordinates of mobile users as extra inputs to construct an RNN. We integrate the SISAE with the RNN to improve the mobile user localization accuracy. To the best of our knowledge, neither the SSAE nor the SISAE has been used for fingerprinting localization before, and the integration of an SISAE and an RNN has not been proposed for mobile user localization.

Proposed Localization Method
Similar to the basic Wi-Fi fingerprinting localization method, our proposed localization method has two phases: the offline phase and the online phase, as shown in Figure 1. In the offline phase, we first collect the RSS vectors at selected RPs to establish the fingerprint database 1, that is, the radio map. We train three improved sparse autoencoders with the RSS vectors of RPs and stack their encoders to obtain high-level feature representations of RSS vectors. Also, we add a logistic regression layer with two neurons, which takes the location coordinates of RPs as its expected outputs, to the stacked encoders to construct an SISAE for fingerprinting localization, and then we fine-tune the SISAE using the back propagation (BP) algorithm. Meanwhile, we consecutively collect extra RSS vectors along a trajectory with known locations to establish the fingerprint database 2 for the RNN training. In this process, we construct the RNN by using the same network structure as well as the weights and biases of the trained SISAE and by adding the previous ground-truth location coordinates of the trajectory as extra inputs. Then, we train the weights of the extra inputs with the fingerprint database 2. In the online phase, after a mobile user's terminal device measures an RSS vector, the RSS vector is input into the SISAE to compute the current location coordinates of the mobile user, which can be used as the extra inputs of the RNN to calculate the following location coordinates of the mobile user. Also, with the RSS vector and the previous location coordinates calculated by the SISAE, more accurate current location coordinates of the mobile user can be calculated by the RNN.

SISAE Fingerprinting Algorithm
3.1.1. Unsupervised Feature Learning. In order to improve the localization accuracy, traditional AEs are usually used for feature extraction and dimension reduction of RSS data, which forces the network to learn compressed representations of the inputs [19]. However, when only a few APs are deployed in a localization area, the dimension of the RSS vectors is limited. Thus, the dimension reduction of traditional AEs may directly affect the learning ability and network stability. To solve this problem and improve the network performance, we use an SISAE to increase the number of neurons in the hidden layers.

Regarding the principle of a traditional AE, it is a neural network with a single hidden layer and can be divided into two parts: an encoder function $f(x)$ and a decoder function $g(x)$. The outputs of the AE are expected to approximate its inputs through training, which can be denoted by $g[f(x)] \approx x$. The $j$th neuron output $h_j$ in the hidden layer and the $i$th neuron output $\hat{x}_i$ in the output layer of the AE can be computed by

$$h_j = f\Big(\sum_i w_{ij} x_i + b_j\Big), \qquad \hat{x}_i = g\Big(\sum_j w_{ij}^{T} h_j + b'_i\Big),$$

where $x_i$ is the $i$th input of the AE, $w_{ij}$ is the weight between the $i$th input of the AE and the $j$th neuron in the hidden layer, $w_{ij}^{T}$, the transposition of $w_{ij}$, is the weight between the $j$th neuron in the hidden layer and the $i$th output of the AE, and $b_j$ and $b'_i$ are the corresponding biases. The BP algorithm is employed to update the weights and biases of the AE to decrease the loss function $J_{AE}$ between the inputs and outputs, which is calculated by

$$J_{AE} = \frac{1}{n} \sum_{k=1}^{n} \lVert \hat{x}_k - x_k \rVert_2^2,$$

where $\hat{x}_k$ is the $k$th output vector, $x_k$ is the $k$th expected output vector, which is also the $k$th input vector, $n$ is the number of input (or output) vectors, and $\lVert \cdot \rVert_2$ is the $l_2$-norm.
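The encoder/decoder pass with tied weights can be sketched as follows; this is an illustrative sketch, and the sigmoid activation and array shapes are assumptions rather than the paper's exact choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ae_forward(x, w, b_enc, b_dec):
    """Tied-weight autoencoder pass: h = f(W x + b_enc), x_hat = g(W^T h + b_dec).
    w has shape (hidden_dim, input_dim); x is a single input vector."""
    h = sigmoid(w @ x + b_enc)        # encoder output
    x_hat = sigmoid(w.T @ h + b_dec)  # reconstruction
    return h, x_hat
```

Training drives `x_hat` toward `x` by minimizing the reconstruction loss over all input vectors.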
A sparse autoencoder, which has a sparsity constraint on its hidden layer, is an improved AE. The sparse autoencoder is able to learn sparse representations of the RSS data, and the dimension of its hidden layer can be increased even beyond the dimension of the input layer. The structure of a sparse autoencoder is shown in Figure 2. Usually, we define a neuron in the hidden layer as "activated" if its output is close to 1 and as "inactivated" if its output is close to 0. The sparsity constraint can be interpreted as requiring the neurons in the hidden layer to be "inactivated" most of the time.
Firstly, a sparsity penalty term is added to the loss function of the AE to enforce the sparsity constraint. We calculate the average activation $p'_j$ of the $j$th neuron in the hidden layer over all $n$ input vectors of a sparse autoencoder by

$$p'_j = \frac{1}{n} \sum_{k=1}^{n} h_{kj},$$

where $h_{kj}$ is the $j$th neuron output in the hidden layer for the $k$th input vector.

A sparsity constant $p$ close to 0 is used as a criterion for the sparsity constraint. The sparsity penalty term $J_{KL}$ based on the Kullback-Leibler (KL) divergence is calculated by

$$J_{KL} = \sum_{j} \Big[ p \ln \frac{p}{p'_j} + (1-p) \ln \frac{1-p}{1-p'_j} \Big].$$

In addition, a weight decay term $W$ is added to the loss function to prevent overfitting, which can be calculated by

$$W = \sum_{i} \sum_{j} w_{ij}^2.$$

So the loss function $J_{LOSS}$ of the sparse autoencoder is calculated by

$$J_{LOSS} = J_{AE} + \alpha J_{KL} + \beta W,$$

where $\alpha$ is a weighted factor to control the contribution of the sparsity penalty term and $\beta$ is a weighted factor to control the contribution of the weight decay term.
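The three loss terms above can be computed directly from a batch of inputs, reconstructions, and hidden activations. This is a sketch under stated assumptions: the 1/n scaling of the reconstruction term and the default values of `p`, `alpha`, and `beta` follow this paper's notation, not a library API:

```python
import numpy as np

def sparse_ae_loss(x, x_hat, h, weights, p=0.05, alpha=10.0, beta=0.001):
    """Sparse-autoencoder loss: reconstruction error + alpha * KL sparsity
    penalty + beta * weight decay.
    x, x_hat: (n, d) inputs and reconstructions; h: (n, m) hidden outputs;
    weights: list of weight matrices included in the decay term."""
    n = x.shape[0]
    j_ae = np.sum(np.linalg.norm(x_hat - x, axis=1) ** 2) / n
    p_hat = h.mean(axis=0)                       # average activation per hidden neuron
    p_hat = np.clip(p_hat, 1e-8, 1 - 1e-8)       # numerical safety for the log terms
    j_kl = np.sum(p * np.log(p / p_hat) + (1 - p) * np.log((1 - p) / (1 - p_hat)))
    w_decay = sum(np.sum(w ** 2) for w in weights)
    return j_ae + alpha * j_kl + beta * w_decay
```

When every hidden neuron's average activation equals the sparsity constant `p`, the KL term vanishes and only the reconstruction and decay terms remain.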
In this paper, we improve the sparse autoencoder by adding an activity penalty term $A$ to the loss function $J_{LOSS}$ to control the neuron outputs in the hidden layer. With the activity penalty term, the loss function $J'_{LOSS}$ of the improved sparse autoencoder is calculated by

$$J'_{LOSS} = J_{LOSS} + \lambda A,$$

Figure 2: The structure of a sparse autoencoder.
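The paper's exact expression for the activity penalty term $A$ is not reproduced in this excerpt. As a loudly labeled placeholder, the sketch below assumes a mean squared-activation penalty, which is one common way to penalize hidden-layer activity; the paper's own definition of $A$ may differ:

```python
import numpy as np

def improved_loss(j_loss, h, lam=0.1):
    """Add an activity penalty to a precomputed sparse-AE loss.
    ASSUMPTION: A is taken as the mean squared hidden activation over the
    batch; the paper defines its own expression for A, which may differ.
    h: (n, m) hidden-layer outputs; lam: the weighted factor lambda."""
    a = np.sum(h ** 2) / h.shape[0]
    return j_loss + lam * a
```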

Wireless Communications and Mobile Computing
where $\lambda$ is a weighted factor to control the contribution of the activity penalty term.

An SISAE can be composed of the encoders of several improved sparse autoencoders and an output layer. The encoder outputs of the first trained improved sparse autoencoder are taken as the encoder inputs of the second improved sparse autoencoder for its training, and the remaining improved sparse autoencoders are trained in the same way. Then, the encoders of these improved sparse autoencoders are stacked to learn high-level feature representations of data efficiently.
In this study, we normalize the RSS data of each AP to [-1, 1] to improve the network performance. As shown in Figure 3, we construct an SISAE with three improved sparse autoencoders: improved sparse autoencoders 1, 2, and 3. We first train improved sparse autoencoder 1 by taking the normalized RSS vectors of RPs as its inputs. Then, we train improved sparse autoencoder 2 by taking the neuron outputs in the hidden layer of the trained improved sparse autoencoder 1 as its inputs, and improved sparse autoencoder 3 is trained in the same way. Finally, the SISAE is constructed by using the weights and biases of the encoders of the trained improved sparse autoencoders 1, 2, and 3 as the weights and biases of its hidden layers 1, 2, and 3, respectively, and by adding a logistic regression layer with two neurons as the output layer to the stacked encoders for the supervised fine-tuning.
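The per-AP normalization to [-1, 1] can be done with a simple min-max scaling. The column-wise layout (one column per AP) is an assumption of this sketch:

```python
import numpy as np

def normalize_rss(rss):
    """Min-max scale each AP's RSS readings (one column per AP) into [-1, 1].
    Note: a constant column would cause division by zero; real data needs a
    guard for APs with no variation."""
    lo = rss.min(axis=0)
    hi = rss.max(axis=0)
    return 2.0 * (rss - lo) / (hi - lo) - 1.0
```

In the online phase, the same per-AP minima and maxima computed offline would be reused so that online vectors share the training scale.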

Supervised Fine-Tuning.
We take the RSS vectors of RPs as the inputs and the location coordinates of RPs as the expected outputs of the SISAE for its supervised fine-tuning. The $o$th neuron output $y_o$ in the output layer of the SISAE is calculated by

$$y_o = f\Big(\sum_j w^{(4)}_{jo} h^{(3)}_j + b^{(4)}_o\Big),$$

where $h^{(3)}_j$ is the $j$th neuron output in hidden layer 3, $w^{(4)}_{jo}$ and $b^{(4)}_o$ are the weights and biases of the output layer, and $f$ is the activation function of the output layer. Then, we use the BP algorithm to fine-tune the weights and biases of the SISAE to decrease the loss function $J_{SISAE}$, which is calculated by

$$J_{SISAE} = \frac{1}{n} \sum_{k=1}^{n} \lVert y_k - L_k \rVert_2^2,$$

where $y_k$ is the estimated location vector in the output layer computed by the SISAE using the $k$th input RSS vector, and $L_k$ is the ground-truth location vector of the $k$th input RSS vector. In addition, we add a dropout layer after hidden layer 3 to prevent overfitting in the supervised fine-tuning of the SISAE.
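The fine-tuning objective, a mean squared Euclidean error between estimated and ground-truth coordinates, can be sketched as follows; the 1/n scaling is assumed from the paper's notation:

```python
import numpy as np

def j_sisae(pred, truth):
    """Mean squared Euclidean distance between estimated and ground-truth
    2-D location vectors; pred and truth both have shape (n, 2)."""
    return np.sum(np.linalg.norm(pred - truth, axis=1) ** 2) / pred.shape[0]
```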

RNN Tracking Algorithm.
Considering the spatial correlation of mobile user locations within a short time slot, we propose an RNN structure for mobile user tracking, which uses the same network structure and parameters as the trained SISAE. To construct the RNN, we add an extra input vector $s_{t-1}$, representing the previous location coordinates at time $t-1$, to hidden layer 3 at time $t$ and initialize a weight matrix $u$ for the extra input vector. In the offline phase, the ground-truth location vector at time $t-1$ is used as $s_{t-1}$ for the RNN training. In the online phase, the location vector of a mobile user calculated by the SISAE at time $t-1$ is used as $s_{t-1}$ for the mobile user tracking at time $t$. Particularly, when a mobile user enters the localization area for the first time, because there are no previous location coordinates of the user, the location coordinates computed by the SISAE are taken as the final location coordinates. After adding the extra input vector $s_{t-1}$, the $j$th neuron output $h^{(3)}_{t,j}$ in hidden layer 3 at time $t$ can be calculated by

$$h^{(3)}_{t,j} = f\Big(\sum_i w^{(3)}_{ij} h^{(2)}_{t,i} + \sum_m u_{mj} s_{t-1,m} + b^{(3)}_j\Big),$$

where $h^{(2)}_{t,i}$ is the $i$th neuron output in hidden layer 2 at time $t$ and $u_{mj}$ is the weight between the $m$th extra input and the $j$th neuron in hidden layer 3. The $o$th neuron output in the output layer of the RNN at time $t$ is denoted $y_{t,o}$ and can be calculated using (9). The weights and biases of the trained SISAE in the RNN structure are kept fixed during the RNN training. In addition, we use the BP algorithm to optimize the weight matrix $u$ for the extra input vector of the RNN and add a dropout layer after hidden layer 3 to prevent overfitting in the RNN training phase. The proposed SISAE-RNN structure is shown in Figure 4.
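One step of hidden layer 3 with the extra previous-location input can be sketched as below. The ReLU activation follows the supervised phases described later; the matrix shapes are assumptions of this sketch:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rnn_hidden3_step(h2_t, s_prev, w3, b3, u):
    """Hidden layer 3 at time t with the extra input:
    h3_t = f(W3 h2_t + U s_{t-1} + b3).
    Shapes: h2_t (d2,), s_prev (2,) previous coordinates,
    w3 (d3, d2), b3 (d3,), u (d3, 2)."""
    return relu(w3 @ h2_t + u @ s_prev + b3)
```

In online tracking, `s_prev` would be the coordinates the SISAE computed at the previous time step; only `u` is trained, with `w3` and `b3` frozen from the SISAE.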

Experimental Setup.
We collected all the experimental data in a real office environment. As shown in Figure 5, the experimental environment is a rectangular area with a size of 51.6 m × 20.4 m. We deployed 7 TP-LINK TL-WR845N APs, shown in Figure 6(a), in the experimental environment at a height of 2.2 m. We selected 116 RPs in the corridor, Room 616, and Room 620 and collected 120 RSS vectors at each RP to establish the fingerprint database 1, that is, the radio map. Meanwhile, in Room 616, we selected 19 locations in a training trajectory at an interval of 0.6 m and collected 60 RSS vectors at each location to establish the fingerprint database 2. Similarly, from a location near the elevator in the corridor to a location in Room 620, we selected 90 testing points (TPs) in a testing trajectory at an interval of 0.6 m and collected the RSS vectors at a sampling rate of 1 RSS vector per second to simulate a scenario in which a mobile user moves along the trajectory at a speed of 0.6 m/s. We also collected RSS vectors 60 times at each TP to test the proposed localization method. We measured all the RSS vectors using a Meizu M2 smartphone, shown in Figure 6(b), which was placed on a tripod at a height of 1.2 m. We installed an Android application developed by us on the smartphone, and its sampling rate was 1 RSS vector per second as mentioned above.

Parameter Settings.
Considering that the parameter settings could directly affect the network performance, we tested the proposed method multiple times with different parameters and obtained the optimized parameters. In the unsupervised feature learning of the SISAE, we set the sparsity constants of the three improved sparse autoencoders to 0.04, 0.01, and 0.02, respectively. We set the weighted factor $\alpha$ of the sparsity penalty term to 10, $\beta$ of the weight decay term to 0.001, and $\lambda$ of the activity penalty term to 0.1. Meanwhile, we initialized the weight matrix $w^{(i)}$, $i = 1, 2, 3$, of each improved sparse autoencoder using a normal distribution with a mean of 0 and a standard deviation of 0.1 and set each element of the bias vector $b^{(i)}$, $i = 1, 2, 3$, of each improved sparse autoencoder to 0.1. We set the numbers of neurons in hidden layers 1, 2, and 3 to 200, 400, and 800, respectively. In addition, we set the learning rate of each improved sparse autoencoder and the number of iterative epochs to 0.001 and 1000, respectively.
In the supervised fine-tuning of the SISAE, we initialized the weight matrix $w^{(4)}$ of the output layer of the SISAE using a normal distribution with a mean of 0 and a standard deviation of 1 and set each element of the bias vector $b^{(4)}$ of the output layer to 0.1. We set the learning rate, dropout rate, and number of iterative epochs to 0.005, 0.2, and 200, respectively.
In the supervised training of the RNN, we initialized the weight matrix u for the extra inputs of the RNN using a normal distribution with a mean of -0.01 and a standard deviation of 0.01. We set the learning rate, dropout rate, and number of iterative epochs equal to 0.001, 0.2, and 200, respectively.
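For reference, the hyperparameters reported above can be collected in one place. The values are taken directly from this section; the dictionary layout is only an organizational choice:

```python
# Hyperparameters reported in this section (SISAE pretraining,
# supervised fine-tuning, and RNN training).
PARAMS = {
    "sparsity_constants": (0.04, 0.01, 0.02),  # improved sparse autoencoders 1-3
    "alpha": 10,       # sparsity penalty weight
    "beta": 0.001,     # weight decay weight
    "lambda": 0.1,     # activity penalty weight
    "hidden_neurons": (200, 400, 800),
    "pretrain": {"lr": 0.001, "epochs": 1000},
    "fine_tune": {"lr": 0.005, "dropout": 0.2, "epochs": 200},
    "rnn": {"lr": 0.001, "dropout": 0.2, "epochs": 200},
}
```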
Besides, we used the rectified linear unit (ReLU) as the activation function for the encoder and Tanh as the activation function for the decoder of each improved sparse autoencoder in the unsupervised feature learning phase of the SISAE. We also used the ReLU as the activation function for each hidden layer in the supervised training phases of the SISAE and RNN because the ReLU activation function can achieve a faster training speed and better performance than the Sigmoid activation function. At the same time, we selected the Adam optimizer for the BP algorithm to optimize the weights and biases of each improved sparse autoencoder and also to optimize the weights and biases of the SISAE and RNN.

As shown in Figure 7, the cumulative probabilities of these algorithms within a localization error of 1 m are all relatively low. The cumulative probabilities of the SAE, SSAE, and SISAE with feature learning within a localization error of 1 m are obviously higher than that of the DNN, which proves the effectiveness of the RSS feature learning.

We also take one trajectory as an example and compare the SISAE-RNN with the WKNN, DNN-RNN, and SAE-RNN. Although the difference among these algorithms in Figure 9(d) is not obvious, the mean error of the SISAE-RNN is 1.47 m, which is the minimum among these algorithms. These experimental results for the single trajectory with 90 testing RSS vectors are in accordance with the results computed with all 5400 testing RSS vectors.

Conclusions and Future Works
In this study, we propose a Wi-Fi fingerprint-based indoor mobile user localization method that integrates an SISAE and an RNN. We first propose an SISAE fingerprinting algorithm whose hidden layer dimensions can be set freely. We improve the sparse autoencoder by adding an activity penalty term to its loss function to control the neuron outputs in the hidden layer. We stack the encoders of three improved sparse autoencoders and add a logistic regression layer as the output layer to the stacked encoders to construct the SISAE for fingerprinting localization. Moreover, we construct an RNN by using the same network structure and parameters as the trained SISAE and by adding the previous location coordinates computed by the SISAE as extra inputs to the RNN to compute more accurate current location coordinates. The experimental results show that our proposed SISAE-RNN is able to reduce the mean error to 1.60 m. The SISAE-RNN has a superior localization performance compared with other Wi-Fi fingerprint-based localization algorithms and can also be used in scenarios with only a few deployed APs, which demonstrates the validity and generality of our proposed method.
In the future, we intend to continue improving deep learning-based fingerprinting and tracking algorithms, including the acceleration of unsupervised feature learning, the optimization of deep network structures, and the integration of fingerprinting and tracking algorithms.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.