Evaluation of a High-Accuracy Indoor-Positioning System with Wi-Fi Time of Flight (ToF) and Deep Learning



Introduction
The goal of 5G and IoT applications is to improve people's daily lives by transforming a variety of things from conventional to intelligent [1,2]. Through efficient packet radio access and adjustable bandwidth, 5G will offer better data speeds and reduced latency [3]. Among the enabling techniques, deep learning is considered one of the most promising for tackling tremendous high-dimensional data [4]. Many of those applications require location-related information to deliver their services. The majority of location-based services (LBSs) for outdoor environments are possible due to Global Navigation Satellite Systems (GNSSs) [5][6][7] such as the Global Positioning System (GPS) [8]. The state of the art of outdoor positioning technologies can nowadays be considered mature [9] and sufficient in terms of fulfilling the related service requirements, such as QoS [10] and user mobility [11], which depend on the network architecture used [12]. Unfortunately, that is not the case for indoor scenarios. Localizing an object or route indoors using GPS is usually not feasible due to the loss of the signal emitted by its satellites [13]. The complexity of indoor environments, with walls and various objects, contributes to this phenomenon. Given the numerous potential applications that could be enabled, e.g., indoor wayfinding, asset tracking, and crowd monitoring, it is unfortunate that there is no standardized solution for indoor positioning systems (IPSs) yet [5].
However, the topic of indoor positioning solutions has gained substantial interest in industry and academia [14][15][16][17]. The current landscape of connectivity technologies underlying IPSs mainly consists of Bluetooth, Wi-Fi, Zigbee, RFID (radio frequency identification), and UWB (ultra-wideband). Each of these technologies has its own characteristics in enabling an IPS, which are listed in Table 1.
When the users are people, Bluetooth and Wi-Fi are typically preferred over the other technologies, as both are available in most smartphones nowadays. Regarding the deployment cost, Wi-Fi can be preferred over Bluetooth, as Wi-Fi access points (APs) are more commonly deployed in indoor facilities than Bluetooth beacons. This makes deploying a Wi-Fi-based IPS cheaper than a Bluetooth-based IPS, since there is no need to install new infrastructure in the area. UWB, for its part, leads in terms of accuracy (see Table 1).
However, UWB is not yet common in smartphones, and its deployment cost is high [22]. This makes a UWB-based IPS less practical than either a Wi-Fi-based IPS or a Bluetooth-based IPS. Although the building blocks for making an IPS are available, realizing a cheap, practical, and highly accurate IPS remains a challenge [23][24][25][26]. This paper proposes DeepIndoor, a Wi-Fi-based IPS utilizing the time of flight of Wi-Fi signals and a deep learning approach. DeepIndoor leverages the advantages of a Wi-Fi-based IPS in terms of practicality and low deployment cost and combines them with a deep learning approach to improve its accuracy. To work with deep learning, it uses a data-driven location inference technique, i.e., location fingerprinting.
The location fingerprinting technique can be considered more robust than classic geometrical approaches (such as multilateration), since it does not rely on line-of-sight (LOS) communication to make a good estimate. In location fingerprinting, a location is estimated based on its fingerprint (or set of features), which in this case is a set of Wi-Fi times of flight (ToFs). For that task, we propose using a fully connected deep neural network (FCDNN) model as the positioning engine. The model is given a location fingerprint as its input and produces the estimated location coordinates as its output. The successful applications of deep learning in various domains [27], together with computing resources becoming cheaper and more available, encourage us to apply it in this domain. Our major contributions are as follows: (i) We design a cheap, practical, and highly accurate IPS using Wi-Fi ToF and a deep learning approach. (ii) We conduct extensive experiments to evaluate different scenarios of available AP pairs and to optimize the performance of the WKNN algorithm and our positioning engine, DeepIndoor, on a publicly available dataset, which can be accessed at [28]. (iii) We detail the structure and configuration of our positioning engine to encourage its application in other testbeds, or with features other than Wi-Fi ToF, in future developments.
The rest of this article is organized as follows: Section 2 presents previous related works, Section 3 details our system model, Section 4 covers our experiment settings, Section 5 shows our results and findings, and Section 6 provides the conclusions of this research.

Related Works
An indoor positioning system (IPS) can be radio-frequency-based or non-radio-frequency-based. In radio-frequency-based systems such as Wi-Fi, the localization parameters are either distance based or direction based [29]. Distance-based parameters are in turn signal based (RSSI and CSI) or time based (ToF and RTT). The ToF localization parameter is the time difference between the time of departure at the AP and the time of arrival at the user. Its disadvantages are that time synchronization is needed between the APs and the user and that its cost is higher. Its strengths include great resistance to multipath effects and high localization accuracy. The RSSI (received signal strength indicator) localization parameter derives distance from the power loss, i.e., the signal strength attenuation between the APs and the user. Its weakness is that it is prone to noise, multipath effects, and NLoS conditions; its strength is that it is easy to implement and needs neither time synchronization nor additional hardware. Hence, if the priority is high accuracy, ToF can be preferred over RSSI, whereas RSSI might be preferred over ToF if the priority is low cost. To estimate the user position, a positioning algorithm is needed to process the localization parameter.
There are range-based methods, such as trilateration, and range-free methods, such as fingerprinting, for utilizing the localization parameter. For both methods, estimating the user position in a 2D space requires measurements from at least three APs [30], and at least four APs are needed for a 3D space.
Ma et al. [31] proposed a novel positioning algorithm to improve the positioning results of Wi-Fi RTT ranging. They also explained the characteristics of Wi-Fi fine time measurement (FTM). The proposed approach achieved a localization error of 1.20 m for static and 1.31 m for dynamic positioning.
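To illustrate why a range-based method needs at least three APs in 2D, the following is a minimal sketch of linearized least-squares trilateration. It is not taken from the cited works; the function name `trilaterate_2d` and the anchor layout are hypothetical, and the ranges are assumed to be noise-free.

```python
import numpy as np

def trilaterate_2d(anchors, distances):
    """Estimate (x, y) from >= 3 anchor positions and measured ranges.

    Subtracting the first range equation from the others removes the
    quadratic terms and leaves a linear system A @ [x, y] = b.
    """
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1)
         - np.sum(anchors[0] ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Hypothetical layout: three APs and a client at (1, 1).
aps = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
true_position = np.array([1.0, 1.0])
ranges = [np.linalg.norm(true_position - np.array(ap)) for ap in aps]
estimate = trilaterate_2d(aps, ranges)
```

With only two APs the linear system has a single equation for two unknowns, which is why a third AP is required for a unique 2D solution.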
Zhou et al. [32] proposed a novel indoor-positioning system algorithm with matrix completion and anchor selection. From the results, the proposed approach achieved a localization error of 1.52 m.
Fingerprinting algorithms follow either a deterministic approach, such as the Kalman filter, NN, KNN, WKNN, SVM, DT, PCA, and neural networks, or a probabilistic approach, such as the Gaussian distribution, particle filter, kernel method, hidden Markov model, and naive Bayes method. For example, using a fingerprinting algorithm and a Kalman filter, Giovanelli et al. [33] proposed a novel indoor-positioning system with ToF and RSSI data fusion. The mean RMS error of the data fusion is about 50% lower than [...] [37] presents an indoor-positioning system with Wi-Fi RTT and RSSI. Addressing the problems of signal fluctuations, interference from fingerprinting, multipath propagation errors, and NLOS transmissions, the proposed system achieved a localization error of 0.51 m and 0.59 m for office and lab environments, respectively.
Singh et al. [38] present an overview of machine learning-based indoor-positioning systems with Wi-Fi RSSI fingerprints. The survey covers ML-based Wi-Fi RSSI fingerprinting for indoor localization and compares the performance of the approaches. ML prediction models such as DT, SVM, KNN, ANN, MLP, CNN, RNN, and DQN are compared based on classification accuracy, positioning error, robustness, scalability, complexity, localization space, and database used. The authors also found that CNN [39] has high robustness, high scalability, and low complexity. From their summary table of indoor localization schemes, it can be concluded that KNN could increase the robustness and decrease the positioning error, while PCA could decrease the complexity and thus reduce the computational time. They also list the available open-source datasets.
Chin et al. [40] proposed MIMO-based indoor positioning with CSI data using an artificial neural network. They compare the performance of GCNN, CNN, and FCNN. The error distance is below 0.2 m in more than 90% of cases for the GCNN and in 75% of cases for the proposed CNN, whereas all error distances are above 0.4 m for the FCNN.

Location Fingerprinting.
Location fingerprinting is a location inference technique that utilizes location-dependent characteristics to infer the estimated location [41]. A fingerprint, in this context, is a set of characteristics or features that characterize a location. As this research utilizes Wi-Fi ToF, a fingerprint is a set of Wi-Fi ToFs. The location fingerprinting technique consists of two stages: (i) the offline stage and (ii) the online stage. In the offline stage, the features of various locations in the testbed are collected to build a reference database or map. This database contains various fingerprints with their respective location coordinates. In the online stage, the features of an unknown location are collected to create its fingerprint. The fingerprint of the unknown location is then compared with the fingerprints stored in the reference database to estimate where the unknown location is. A high-level view of the location fingerprinting technique is depicted in Figure 1.
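The two stages can be sketched as follows. This is only an illustrative nearest-match lookup, not the positioning engine proposed in this paper; the helper names `build_reference_db` and `locate` and all numeric values are hypothetical.

```python
import numpy as np

def build_reference_db(fingerprints, coordinates):
    """Offline stage: store fingerprints alongside their known coordinates."""
    return (np.asarray(fingerprints, dtype=float),
            np.asarray(coordinates, dtype=float))

def locate(db, query):
    """Online stage: return the coordinates of the closest stored fingerprint."""
    stored_fingerprints, stored_coordinates = db
    distances = np.linalg.norm(
        stored_fingerprints - np.asarray(query, dtype=float), axis=1)
    return stored_coordinates[np.argmin(distances)]

# Toy reference map (hypothetical): three locations, three ToF-based features each.
db = build_reference_db(
    [[10.0, 20.0, 30.0], [20.0, 10.0, 25.0], [30.0, 30.0, 10.0]],
    [[0.0, 0.0, 1.4], [5.0, 0.0, 1.4], [0.0, 5.0, 1.4]],
)
estimate = locate(db, [19.0, 11.0, 24.0])
```

The comparison step is where different positioning engines differ: a simple lookup picks the closest stored fingerprint, whereas a learned model maps the fingerprint to coordinates directly.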

Wi-Fi ToF.
Measuring the ToF of a Wi-Fi signal has been possible since the fine time measurement (FTM) protocol was introduced in the IEEE 802.11-2016 standard. The communication between a client device and an AP under the FTM protocol is shown in Figure 2.
The ToF between two devices is taken as half of the round-trip time (RTT). Thus, the ToF between a client device and an AP (as illustrated in Figure 2) can be calculated as follows:

$$\mathrm{ToF} = \frac{(t_4 - t_1) - (t_3 - t_2)}{2}, \tag{1}$$

where $t_1$, $t_2$, $t_3$, and $t_4$ are timestamps recorded on the local device denoting the time of arrival (ToA) or time of departure (ToD) of the corresponding message. This research uses a publicly available dataset (accessible at [28]) for its experiments. The dataset consists of records of Wi-Fi signals traveling from a transmitting ($T_x$) device to a receiving ($R_x$) device. The ToD and ToA of the corresponding Wi-Fi signal are available in each record. The ToF of each record can be formulated as follows:

$$\psi = \mathrm{ToF}_{T_x,R_x} = \mathrm{ToA}_{R_x} - \mathrm{ToD}_{T_x} + e, \tag{2}$$

where $\psi$ is the ToF-based feature used for the fingerprint, $\mathrm{ToF}_{T_x,R_x}$ is the ToF from $T_x$ to $R_x$, $\mathrm{ToA}_{R_x}$ is the ToA of the signal measured at $R_x$, $\mathrm{ToD}_{T_x}$ is the ToD of the signal measured at $T_x$, and $e$ is the measurement error of the path. Note that the value of $e$ is not provided in the dataset. However, we consider $e$ part of the characteristics we want to capture, since it represents the condition of the area.

Notes to Table 1: (1) The transmission range of RFID is below 100 m in free space [20]. (2) The deployment cost of an RFID-based IPS depends on the positioning algorithm used [21].
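As a worked illustration of taking the ToF as half of the RTT, the sketch below assumes the usual FTM convention in which t1 and t4 are recorded by one device and t2 and t3 by the other, so that the turnaround time (t3 - t2) is subtracted from the round trip. The timestamps are hypothetical and include a 1000 ns clock offset between the two devices to show that the offset cancels.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def tof_from_ftm(t1, t2, t3, t4):
    """Half of the round-trip time, excluding the turnaround time (t3 - t2)."""
    return ((t4 - t1) - (t3 - t2)) / 2.0

def tof_to_distance(tof_seconds):
    """Convert a time of flight in seconds to a distance in metres."""
    return tof_seconds * SPEED_OF_LIGHT_M_PER_S

# Hypothetical timestamps in nanoseconds. The second device's clock runs
# 1000 ns ahead, and the true one-way ToF is 33 ns.
tof_ns = tof_from_ftm(t1=0.0, t2=1033.0, t3=2033.0, t4=1066.0)
distance_m = tof_to_distance(tof_ns * 1e-9)
```

Because each difference uses timestamps from a single clock, the two devices do not need to be synchronized for this round-trip computation.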

FCDNN as a Positioning Engine.
The role of the FCDNN as the positioning engine is to estimate the position of the client device based on its location fingerprint.
In the input layer of the FCDNN, the number of neurons equals the number of features in the dataset, while the number of neurons in the output layer depends on whether a 2-D or 3-D coordinate space is used. In our case, there are 6 available features in the dataset and 10 APs in the area, and we use the 3-D Cartesian coordinate system to represent the client device location. However, we used only 5 features, omitting the column that holds the AP index, in order to evaluate the simulated scenarios. The structure of the FCDNN for this case is shown in Figure 3. The number of hidden layers and their neurons is not specified at this point (it is presented in another section).

Using the FCDNN.
To use the FCDNN, we feed it a location fingerprint as input. As shown in Figure 3, each element of the input is connected to one neuron of the input layer. Therefore, for the input $X = (x_1, \ldots, x_N) \in \mathbb{R}^N$, the activation value of each neuron in the input layer is as follows:

$$a^{[1]}_j = x_j, \tag{3}$$

where $x_j$ denotes the $j$th element of the input and $a^{[1]}_j$ denotes the activation value of the $j$th neuron in the input layer.
For the remaining layers, each neuron is connected to all the neurons in the previous layer (see Figure 3). The activation value of a neuron in the hidden and output layers is determined as follows:

$$a^{[l]}_j = \mathrm{ReLU}\left(\sum_{k=1}^{n^{[l-1]}} w^{[l]}_{jk}\, a^{[l-1]}_k + b^{[l]}_j\right), \quad l = 2, \ldots, L, \tag{4}$$

where $a^{[l]}_j$ is the activation value of the $j$th neuron in the $l$th layer, $w^{[l]}_{jk}$ is the weight of the link from $a^{[l-1]}_k$ to $a^{[l]}_j$, $b^{[l]}_j$ is the bias of the $j$th neuron in the $l$th layer, $n^{[l-1]}$ is the number of neurons in layer $l-1$, and $L$ is the number of layers. Notice that equation (4) uses the rectified linear unit (ReLU) function to calculate the activation value of the neurons.
The output of the FCDNN is generated from the values of the neurons in the output layer. As shown in Figure 3, each neuron in the output layer is connected to an output element. Therefore, the output of the FCDNN is calculated as follows:

$$Y = \left(a^{[L]}_1, \ldots, a^{[L]}_{n^{[L]}}\right), \tag{5}$$

where $Y$ denotes the output, which is the estimated location coordinates (or label), and $n^{[L]}$ denotes the number of neurons in the output layer.
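The layer-by-layer computation described above can be sketched in NumPy as follows. This is a minimal illustration, not the trained model: the hidden layer sizes, the random weights, and the function names are hypothetical, and ReLU is applied in every non-input layer as the text states.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, applied element-wise."""
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Propagate one fingerprint through a fully connected network."""
    a = np.asarray(x, dtype=float)   # input layer: activations equal the inputs
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)          # weighted sum of previous layer, then ReLU
    return a                         # output layer activations = estimated coordinates

# 5 input features and 3 output coordinates as in the text;
# the two hidden layers of 16 neurons are an arbitrary choice.
layer_sizes = [5, 16, 16, 3]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
y = forward(np.ones(5), weights, biases)

# Hand-checkable single-layer case: W @ [1, 2] + b = [3.5, -1.0] -> ReLU -> [3.5, 0.0]
small = forward([1.0, 2.0],
                [np.array([[1.0, 1.0], [-1.0, 0.0]])],
                [np.array([0.5, 0.0])])
```

Each weight matrix has one row per neuron in the current layer and one column per neuron in the previous layer, which is what makes the network fully connected.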
Optimizing the FCDNN.
The FCDNN is optimized by changing the model's parameters $\theta \in \mathbb{R}^d$, i.e., its weights and biases. This process aims to make the model better at its task. For that purpose, we use gradient descent to minimize the cost function $J(\theta)$ by updating each element of $\theta$ in the opposite direction of the gradient of the cost function $\nabla_\theta J(\theta)$ w.r.t. the elements of $\theta$ [41]. Since we treat the task as a regression problem, the model's cost is the root mean square error (RMSE):

$$J(\theta) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left\lVert Y^{(i)} - \hat{Y}^{(i)} \right\rVert_2^2}, \tag{6}$$

where $m$ is the number of samples, $d$ is the number of weights and biases, $Y^{(i)}$ is the true label of the $i$th sample, and $\hat{Y}^{(i)}$ is the estimated label of the $i$th sample. Note that $J(\theta)$ and $J(Y^{(i)}, \hat{Y}^{(i)})$ are treated as equivalent, since the estimated label depends on the model's parameters.
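A minimal sketch of an RMSE cost over coordinate labels is given below. It assumes the per-sample error is the squared Euclidean distance between the true and estimated coordinates; the function name and sample values are hypothetical.

```python
import numpy as np

def rmse_cost(y_true, y_pred):
    """Root of the mean squared Euclidean distance between labels and estimates."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    squared_distances = np.sum(error ** 2, axis=1)   # one value per sample
    return float(np.sqrt(np.mean(squared_distances)))

# Two hypothetical samples: one is off by 5 m (a 3-4-5 triangle), one is exact.
cost = rmse_cost([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                 [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]])
```

Because the errors are squared before averaging, a single large error raises the RMSE more than it raises a plain average of the distances.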
To update the parameters using gradient descent, we adopt the stochastic gradient descent (SGD) algorithm [41]. Additionally, minibatch SGD improves computing efficiency. We use estimates of the moments of the gradients to hasten convergence and to slow down the quick decay of the learning rates [42,43]. First, we calculate $g_{t,i}$, the gradient of the cost function w.r.t. the parameter $\theta^{(i)}$ at timestep $t$, as follows:

$$g_{t,i} = \frac{\partial J(\theta_t)}{\partial \theta^{(i)}}. \tag{7}$$

After obtaining $g_{t,i}$, we compute $m_t$ and $v_t$, which are the exponential moving averages of the gradient and the squared gradient, respectively:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \tag{8}$$

where $\beta_1$ and $\beta_2$ are the hyperparameters that control the exponential decay rates of the corresponding moving averages, and $m_t$ and $v_t$ are estimates of the first moment (the mean) and the second raw moment (the uncentered variance) of the gradient, respectively.
Since $m_t$ and $v_t$ are initialized as 0, bias correction is applied to both to counteract moment estimates that are biased towards zero at the initial timesteps. The bias-corrected values are calculated as follows:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \tag{9}$$

where $\hat{m}_t$ and $\hat{v}_t$ denote the bias-corrected first and second moment estimates, respectively. Finally, the model's parameters are updated as follows:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t, \tag{10}$$

where $\theta$ is the parameter vector that minimizes $J(\theta)$ and is to be estimated, each term of the cost is associated with the $i$th observation in the training dataset, $\epsilon$ is a small constant that avoids division by zero, and the hyperparameter $\eta$, or the learning rate, controls the step size of each iteration.

[Figure 3: Structure of the FCDNN, with an input layer, hidden layers, and an output layer.]
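The moving-average and bias-correction steps described above appear to correspond to what is commonly known as the Adam optimizer. The sketch below shows one such update on a toy cost; the default values for `beta1`, `beta2`, and `eps` are the commonly used ones and are assumptions here, not values reported in this paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One parameter update using bias-corrected moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2     # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # correct the bias towards zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem (hypothetical): minimize J(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([1.0])
m = np.zeros(1)
v = np.zeros(1)
first_step = None
for t in range(1, 2001):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
    if t == 1:
        first_step = theta.copy()
```

At the first timestep the bias correction makes the ratio of the two estimates equal to the sign of the gradient, so the first move has a magnitude close to the learning rate regardless of the gradient's scale.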

The Testbed Map.
The dataset used in this research is obtained from [28]. It consists of ToF measurements at 4410 locations in the given area. Before creating the fingerprint of each location, we check whether all of the APs are accessible at each of those 4410 locations, i.e., at which locations 1, 2, 3, or more AP pairs can be heard. We then define 3 scenarios based on the number of AP pairs per location: 1 AP pair, 2-10 AP pairs, and 1-10 AP pairs (the raw dataset), as shown in Table 2. In our experiment, the goal is to estimate the client device location based on the Wi-Fi ToFs between the client device and the available APs in the area. Table 3 illustrates the measurement data at a specific place: for every client location (X, Y, Z), there is ψ, the ToF-based feature between the Tx device (the user device) and the Rx device (an AP). These features are available as distances (m) in the dataset. A client location (X, Y, Z) may have 1, 2, 3, 4, 5, or more ToF-based features available from nearby APs. For example, in [...] Table 4, for scenario 1, the locations with 1 AP pair were filtered from the dataset, giving 1006 fingerprints. Thus, the training and testing datasets contain 805 and 201 fingerprints, respectively. For scenario 2, the locations with 2 up to 10 AP pairs were filtered, giving 2689 fingerprints, so the training and testing datasets contain 2151 and 538 fingerprints, respectively. For scenario 3, the locations with at least 1 and up to 10 AP pairs were filtered, giving 4410 fingerprints, so the training and testing datasets contain 3528 and 882 fingerprints, respectively. Note that rounding is applied. For these 3 scenarios, 2 algorithms, WKNN and the proposed FCDNN, were run to compare their performance in predicting user positions.
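The training and testing counts quoted above are consistent with an 80/20 split followed by rounding. The split ratio is inferred from the counts and is not stated explicitly in the text, so the sketch below treats it as an assumption; the function name is hypothetical.

```python
def split_counts(n_fingerprints, train_fraction=0.8):
    """Return (training, testing) counts; the 0.8 fraction is an inferred assumption."""
    n_train = round(n_fingerprints * train_fraction)
    return n_train, n_fingerprints - n_train

# Fingerprint totals for scenarios 1, 2, and 3 as given in the text.
scenario_counts = {name: split_counts(total)
                   for name, total in [("scenario 1", 1006),
                                       ("scenario 2", 2689),
                                       ("scenario 3", 4410)]}
```

Computing the testing count as the remainder guarantees that the two counts always add up to the scenario total.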

The Model's Hyperparameters.
The value of each hyperparameter of our model is detailed in Table 5. It lists the options for the batch size, number of epochs, number of hidden layers, number of neurons per hidden layer, and learning rate η.

Exploration of Model Structures.
The experiments started by training the WKNN and the proposed FCDNN models, where each model uses a different combination of AP pairs filtered from the dataset. To obtain their accuracy, 3 scenarios were simulated to predict user locations (X, Y, Z); the models are thus tested for accuracy as a function of the number of AP pairs. Note that there is a trade-off between accuracy and overfitting; therefore, we aim for the right balance between the two. The positioning error is calculated as the L2 norm between the estimated and the ground-truth position:

$$l\left(Y^{(i)}, \hat{Y}^{(i)}\right) = \left\lVert Y^{(i)} - \hat{Y}^{(i)} \right\rVert_2, \tag{11}$$

where $l(Y^{(i)}, \hat{Y}^{(i)})$ denotes the positioning error of the $i$th example.
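As a small worked example of the L2-norm positioning error, the sketch below computes the error per sample and its average. The coordinates are hypothetical and the function name is illustrative.

```python
import numpy as np

def positioning_errors(y_true, y_pred):
    """Euclidean distance between each ground-truth and estimated position."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.linalg.norm(diff, axis=1)

# Two hypothetical test points at a height of 1.4 m.
truth = [[1.0, 2.0, 1.4], [4.0, 6.0, 1.4]]
estimates = [[1.0, 2.0, 1.4], [1.0, 2.0, 1.4]]
errors = positioning_errors(truth, estimates)   # first is exact, second is 5 m off
mean_error = errors.mean()
```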

Exploration of Model Structures.
Weighted k-nearest neighbor: Figure 4 shows the positioning errors of X, Y, and Z for the WKNN algorithm. The more the true values are concentrated near the prediction line and follow it linearly, the more accurate the algorithm can be considered. Additionally, because every user's z location is the same (1.4 m), the WKNN algorithm can identify this distribution of user z positions and predicts that all user z positions lie at a single location with a value of almost 1.4 m. It may be inferred that the WKNN was able to discriminate between the distributions of the user x, y, and z positions: the user x and y positions have diverse distributions, whereas the user z positions all share the same value under a single distribution. Figures 5 and 6 show the WKNN loss for K = [1, 39]. The lowest values, a loss of 0.9424 m and an RMSE of 1.3635 m, are obtained in scenario 3 with K = 3, where the dataset contains locations with 1 up to 10 AP pairs. For scenarios 1 and 2, the loss and RMSE are lowest with K = 2. Furthermore, scenario 1 with K = 2 has the greatest values, a loss of 3.1136 m and an RMSE of 2.4771 m. Thus, for WKNN to achieve a lower loss and RMSE, both the 2-and-above and the 1 AP pair distributions are needed, as in scenario 3. In Figure 7, the positioning errors of X, Y, and Z for the proposed FCDNN algorithm are shown. Again, the more the true values are concentrated near the prediction line and follow it linearly, the more accurate the algorithm can be considered. Moreover, because every user's z location is the same (1.4 m), the FCDNN method predicts [...] pairs. The FCDNN has the lowest loss and RMSE, 0.1749 m and 0.5740 m, respectively, in scenario 3 (1-10 AP pairs, i.e., the raw dataset).
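For readers unfamiliar with the WKNN baseline, a minimal sketch follows. The text does not state which weighting scheme was used, so inverse-distance weighting is assumed here; the function name, the `eps` guard, and the toy data are hypothetical.

```python
import numpy as np

def wknn_predict(train_fingerprints, train_coordinates, query, k=3, eps=1e-9):
    """Average the coordinates of the k closest fingerprints, weighted by 1/distance."""
    train_fingerprints = np.asarray(train_fingerprints, dtype=float)
    train_coordinates = np.asarray(train_coordinates, dtype=float)
    distances = np.linalg.norm(
        train_fingerprints - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]           # indices of the k closest fingerprints
    weights = 1.0 / (distances[nearest] + eps)    # eps avoids division by zero
    weighted = weights[:, None] * train_coordinates[nearest]
    return weighted.sum(axis=0) / weights.sum()

# Toy data (hypothetical): one feature per fingerprint, every z fixed at 1.4 m.
fingerprints = [[0.0], [2.0], [10.0]]
coordinates = [[0.0, 0.0, 1.4], [2.0, 0.0, 1.4], [10.0, 0.0, 1.4]]
estimate = wknn_predict(fingerprints, coordinates, query=[1.0], k=2)
```

Since every training label has z = 1.4 m, any weighted average of neighbors also yields z = 1.4 m, which matches the behavior described for the z coordinate.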

Conclusion
The proposed IPS, DeepIndoor, which combines Wi-Fi ToF and a deep learning approach, achieves the goal of this research, namely, enabling the realization of a high-accuracy, cheap, and practical IPS. The high accuracy is made possible by the deep learning model, an FCDNN built for this purpose. DeepIndoor achieves an average positioning error of 0.1749 m and an RMSE of 0.5740 m. Realizing DeepIndoor is also cheap and practical, since it uses Wi-Fi as the underlying technology: the availability of Wi-Fi in most smartphones and the deployment of Wi-Fi networks in many indoor facilities contribute to these advantages. Furthermore, accuracy increases when a larger variety of AP pair distributions is available in the dataset used for training and testing.

Data Availability
The Wi-Fi ToF dataset used to support the findings of this study is available in the GitHub repository (https://github.com/intel/WiFi-Location-Core-PE-and-Measurement-Database).

Conflicts of Interest
The authors declare that they have no conflicts of interest.