Several methods can be used to locate objects or people indoors. Ultra-wideband (UWB) is a particularly promising indoor positioning technology because of its high accuracy, resistance to interference, and better penetration. This study aims to improve the accuracy of a UWB sensor-based indoor positioning system. To that end, the proposed system is trained using the K-means algorithm combined with the average silhouette method, which determines the optimal number of clusters for K-means from the value of the silhouette coefficient. Fuzzy c-means and mean shift algorithms are included for comparison. This paper also examines the impact of the Kalman filter, with the measured UWB test points used as its input in order to obtain a better position estimate. As a result, the average localization error is reduced by 43.26% (from 16.3442 cm to 9.2745 cm) when the K-means algorithm is combined with the Kalman filter, that is, when the Kalman-filtered UWB-measured test points are used as input to the proposed system.

With the expansion of information technology, indoor positioning technology has developed rapidly. Positioning methods are mainly divided into two categories: the location fingerprint positioning method and the trilateration algorithm [

One of the most important applications of the indoor positioning system is to achieve efficient manufacturing processes in industrial facilities where it is necessary to track products, objects, and machines. Such an environment is considered to be more complex compared to other regular indoor positioning scenarios in which large machines block the line of sight path and increase the reflections and multipath effects. Thus, in [

For the UWB systems to perform reliably in indoor areas, error mitigation techniques are applied based on the ranging error modelling methods [

Another feature that can benefit from the indoor positioning system is determining the position of assets within a network. The GPS is sufficient for an outdoor environment; however, the GPS is hard to apply in an indoor environment because of walls and obstacles. In [

There is a wide range of medical applications that can benefit from the indoor positioning functionality. Patients that suffer from dementia often show wandering behaviour because of memory loss or boredom. Such cases are considered hard to understand and manage. Yang et al. [

In [

A nonintrusive breathing monitoring system that benefits from the C-band sensing technique is proposed in [

Most of the works described above use UWB for indoor positioning because of its wide range of advantageous properties; in particular, it offers accuracy better than 30 cm. In our paper, a UWB development kit is used to run the experiment and to provide the dataset for this study. This kit provides accuracy better than 20 cm, and with the help of clustering algorithms, the accuracy improves to better than 10 cm (around 9 cm).

Regarding the machine learning methods employed in these references, the support vector machine is used in more than one study for classification. The methods proposed in our paper instead explore clustering, which groups data points with similar properties. Our paper presents the effect of clustering on the accuracy of the UWB indoor positioning system.

Because of its many advantages, UWB is an emerging and promising technology for indoor environments. However, a line-of-sight (LoS) blockage can degrade the location accuracy in two ways. First, the LoS blocking material, which has a high dielectric constant, introduces propagation delay. Second, it complicates the multipath structure of the propagation channel, which makes it difficult to estimate the ToA of the path signal [

A method is proposed to estimate the positions of a moving object instantaneously by combining the machine learning algorithm with the Kalman filter [

A method for using the multilateration with probabilistic RFID map-based technique is developed to determine the position of the unknown tag. The Kalman filter is also implemented to improve the estimation of the tag position. The application of this method can obtain the accurate estimation of position and accelerations as well [

In [

A detailed similarity analysis is presented in [

The issue of selecting the right cluster number is studied in [

The intelligent centroid localization (ICL) method is proposed in [

In this work, a dataset is used, collected from an active learning classroom (ALC), shown in Figure

Active learning classroom, measuring 7.35 m × 5.41 m and installation of the four anchors expressed as A0, A1, A2, and A3, tag expressed as Tag1, and the test points expressed as ✕.

While the active learning classroom, measuring 7.35 m × 5.41 m, is designed as a test bed for collecting data, a ceiling system, attached to the ceiling and the anchors (shown as A0, A1, A2, and A3 in Figure

As shown in Figure

A sensor from the Decawave MDEK1001 development kit, which can be configured as an anchor or a tag.

A special ceiling system shown in Figure

Ceiling installation of four anchors expressed as A0, A1, A2 and A3.

The proposed methods employed in this study are briefly described in the following sections. These methods are K-means, fuzzy c-means, and mean shift for clustering, the Kalman filter, and finally, the average silhouette method to initialize the optimal number of clusters.

K-means is considered to be one of the most important clustering algorithms. The K-means algorithm selects

(1) Set the cluster number

(2) Select

(3) Calculate the distance between the data points and the cluster centroids

(4) Assign each data point to the cluster whose centroid is closest to it

(5) Acquire new cluster centers by averaging the data points in each cluster

(6) Repeat Steps (3) to (5) until there is no change in the cluster centroids or the maximum number of iterations is reached
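The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the study: the function name `kmeans`, the random choice of initial centroids from the data, and the synthetic readings (in cm) are all assumptions made for the example.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on an (n, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Steps (1)-(2): pick k distinct data points as initial centroids (assumed scheme).
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # Step (3): distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Step (4): assign each point to its nearest centroid.
        labels = dists.argmin(axis=1)
        # Step (5): new centroid = mean of the points in the cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Step (6): stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical UWB readings (cm) scattered around two test points.
rng = np.random.default_rng(1)
readings = np.vstack([
    rng.normal([100.0, 100.0], 5.0, size=(50, 2)),
    rng.normal([400.0, 300.0], 5.0, size=(50, 2)),
])
centroids, labels = kmeans(readings, k=2)
```

Each returned centroid can then serve as the position estimate for the readings assigned to it.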

FCM is an algorithm for data clustering. Based on fuzzy set theory, it allows one piece of data to belong to two or more clusters, where fuzzy means “unclear” or “not defined” and C denotes “clustering.”

The advantages of this algorithm are its robust behaviour, its ability to model uncertain data, its applicability to multichannel data, and its straightforward implementation [

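The objective function of FCM is given by

$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\| x_i - c_j \right\|^{2}, \qquad 1 \le m < \infty,$$

where $u_{ij}$ refers to the membership degree of $x_i$ in the cluster $j$, $x_i$ refers to the $i^{th}$ measured data point, $c_j$ refers to the center of the cluster $j$, and $m$ is the fuzzifier.

The fuzzy partitioning process is carried out through the iterative optimization of this objective function, with the membership $u_{ij}$ and the cluster centers $c_j$ updated by [

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \dfrac{\left\| x_i - c_j \right\|}{\left\| x_i - c_k \right\|} \right)^{2/(m-1)}}, \qquad c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} x_i}{\sum_{i=1}^{N} u_{ij}^{m}}.$$

The iteration stops when [

$$\max_{ij} \left| u_{ij}^{(k+1)} - u_{ij}^{(k)} \right| < \varepsilon,$$

where $\varepsilon$ is a termination criterion between 0 and 1 and $k$ is the iteration step. The algorithm consists of the following steps:

(1) Initialize the $U = [u_{ij}]$ matrix, $U^{(0)}$

(2) Calculate the center vectors at step $k$, $C^{(k)} = [c_j]$, with $U^{(k)}$

(3) Update $U^{(k)}$ to $U^{(k+1)}$

(4) STOP if $\left\| U^{(k+1)} - U^{(k)} \right\| < \varepsilon$; otherwise, return to Step (2)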
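As a hedged illustration of the FCM iteration, the sketch below alternates the center update and the membership update until the memberships stop changing. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the tolerance `eps`, and the synthetic data are assumptions chosen for the example, not values reported in this study.

```python
import numpy as np

def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy c-means on an (n, d) array; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Initialize U randomly so that each row (one data point) sums to 1.
    u = rng.random((len(points), c))
    u /= u.sum(axis=1, keepdims=True)
    centers = np.zeros((c, points.shape[1]))
    for _ in range(max_iter):
        # Cluster centers are membership-weighted means of the data.
        w = u ** m
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        # Memberships follow from inverse distances to all centers.
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        # Stop when the largest membership change falls below eps.
        converged = np.max(np.abs(u_new - u)) < eps
        u = u_new
        if converged:
            break
    return centers, u

# Hypothetical UWB readings (cm) around two test points.
rng = np.random.default_rng(2)
readings = np.vstack([
    rng.normal([100.0, 100.0], 5.0, size=(50, 2)),
    rng.normal([400.0, 300.0], 5.0, size=(50, 2)),
])
centers, memberships = fuzzy_c_means(readings, c=2)
hard_labels = memberships.argmax(axis=1)  # defuzzify by taking the largest membership
```

Unlike K-means, every point carries a membership in every cluster, which is why each iteration involves more operations per point.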

The mean shift algorithm is based on the general idea that locally averaging data results in moving toward higher-density and, therefore, more typical regions [

The algorithm is used for a variety of purposes, including clustering analysis, image segmentation, object tracking, information fusion, edge detection, and filtering. A kernel function is used in the mean shift algorithm to compute the steps of the algorithm and to estimate the gradient orientation at each point [

The mean shift algorithm is very attractive because it is based on nonparametric kernel density estimates (KDE) in which the user does not need to define the number of clusters. The only parameter the user needs to specify is the scale of the clustering (bandwidth). In the mean shift clustering, the input of the algorithm is the data points and the bandwidth or scale. Call

for

repeat

until stop

end

connected components
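A minimal sketch of Gaussian mean shift along the lines of the loop above is given here. It is only an illustration under stated assumptions: the function name `mean_shift`, the bandwidth of 20 cm, the stopping tolerance, and the simple distance-based grouping of converged points (used in place of a full connected-components step) are all hypothetical choices.

```python
import numpy as np

def mean_shift(points, bandwidth, tol=1e-3, max_iter=300, merge_radius=None):
    """Gaussian mean shift; returns (cluster centers, labels)."""
    if merge_radius is None:
        merge_radius = bandwidth / 2.0  # assumed grouping threshold
    modes = points.astype(float).copy()
    for n in range(len(points)):
        x = modes[n]
        for _ in range(max_iter):
            # Gaussian kernel weight of every data point relative to x.
            w = np.exp(-0.5 * np.sum((points - x) ** 2, axis=1) / bandwidth ** 2)
            # Move x to the weighted local average (the mean shift step).
            x_new = (w @ points) / w.sum()
            moved = np.linalg.norm(x_new - x)
            x = x_new
            if moved < tol:
                break
        modes[n] = x
    # Group converged points that ended near the same density mode.
    centers = []
    labels = np.empty(len(points), dtype=int)
    for n, z in enumerate(modes):
        for j, center in enumerate(centers):
            if np.linalg.norm(z - center) < merge_radius:
                labels[n] = j
                break
        else:
            centers.append(z)
            labels[n] = len(centers) - 1
    return np.array(centers), labels

# Hypothetical UWB readings (cm) around two test points.
rng = np.random.default_rng(3)
readings = np.vstack([
    rng.normal([100.0, 100.0], 5.0, size=(40, 2)),
    rng.normal([400.0, 300.0], 5.0, size=(40, 2)),
])
centers, labels = mean_shift(readings, bandwidth=20.0)
```

Note that the number of clusters is never passed in; it emerges from the bandwidth.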

The results of the mean shift are carried over to kernels where each test point has its own weight and its own bandwidth. Gaussian kernels are used because they are easier to analyze and lead to simpler formulas.

The Kalman filter uses a series of data observed over time, which may contain inaccuracies such as noise, to estimate unknown variables with better accuracy. It has become a standard approach in optimal estimation because of its real-time operation, efficiency, speed, and strong resistance to interference. The Kalman filter is now applied in target tracking and navigation, for example, in tracking a maneuvering target and in GPS positioning [

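Input: measurement $z$ and the previous estimates $x_{est}$, $P_{est}$

Output: $y$

Step 1. Initialize $x_{est}$ and $P_{est}$

Step 2. Predict the state vector and the covariance:

$x_{prd} = A\,x_{est}$

$P_{prd} = A\,P_{est}\,A^{T} + Q$

Step 3. Estimation step:

$S = H\,P_{prd}\,H^{T} + R$

Step 4. Compute the Kalman gain factor:

klm_gain $= P_{prd}\,H^{T} S^{-1}$

Step 5. Correction based on observation:

$x_{est} = x_{prd} + \text{klm\_gain}\,(z - H\,x_{prd})$

$P_{est} = P_{prd} - \text{klm\_gain}\,H\,P_{prd}$

Step 6. Compute the output:

$y = H\,x_{est}$, where $x_{est}$ is the estimated state and $P_{est}$ its covariance.

Here, $A$ is the state transition matrix, $H$ is the measurement matrix, and $Q$ and $R$ are the process and measurement noise covariances.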
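The predict and correct steps can be sketched as follows. This is an illustrative example only: the symbols `A`, `H`, `Q`, and `R`, the constant-position model for a stationary tag, the noise values, and the synthetic measurements are assumptions made for the sketch and are not taken from the study.

```python
import numpy as np

def kalman_step(z, x_est, p_est, A, H, Q, R):
    """One Kalman filter cycle; returns the updated (x_est, p_est)."""
    # Predict the state vector and its covariance.
    x_prd = A @ x_est
    p_prd = A @ p_est @ A.T + Q
    # Innovation covariance and Kalman gain.
    S = H @ p_prd @ H.T + R
    klm_gain = p_prd @ H.T @ np.linalg.inv(S)
    # Correct the prediction with the new observation z.
    x_est = x_prd + klm_gain @ (z - H @ x_prd)
    p_est = p_prd - klm_gain @ H @ p_prd
    return x_est, p_est

# Hypothetical stationary tag at (250, 180) cm with 5 cm measurement noise.
rng = np.random.default_rng(0)
true_position = np.array([250.0, 180.0])
measurements = true_position + rng.normal(0.0, 5.0, size=(200, 2))

A = np.eye(2)            # assumed model: the tag does not move between samples
H = np.eye(2)            # the sensor measures the position directly
Q = 1e-3 * np.eye(2)     # small process noise (assumed)
R = 25.0 * np.eye(2)     # measurement noise variance, (5 cm)^2 (assumed)

x_est = measurements[0].copy()
p_est = R.copy()
filtered = []
for z in measurements:
    x_est, p_est = kalman_step(z, x_est, p_est, A, H, Q, R)
    filtered.append(x_est)
filtered = np.array(filtered)

raw_error = np.linalg.norm(measurements - true_position, axis=1).mean()
filtered_error = np.linalg.norm(filtered - true_position, axis=1).mean()
```

The filtered points, rather than the raw measurements, would then be passed on to the clustering stage.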

The average silhouette is a way of choosing the number of clusters by measuring the quality of the clustering; in other words, it determines how well each data point lies within its cluster. The silhouette ranges from −1 to +1, and a high value indicates good clustering. The higher the average silhouette coefficient (closer to 1 than to 0), the better the data points fit their clusters [ If $a_i$ is the average dissimilarity between the $i^{th}$ data point and all other points in its cluster and $b_i$ is the minimum average dissimilarity of the $i^{th}$ point to points in another cluster, then the silhouette of the $i^{th}$ data point is [

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$

The steps of the average silhouette are as follows:

Perform the clustering algorithm, such as K-means or fuzzy c-means for different values of

Calculate the average silhouette of observations for each

Consider the appropriate number of clusters based on the location of the maximum
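The three steps above can be sketched as follows, using K-means as the clustering algorithm. The helper names (`kmeans`, `best_kmeans`, `silhouette_mean`), the candidate range of 2 to 6 clusters, the use of several random restarts, and the synthetic data are assumptions made for this illustration.

```python
import numpy as np

def kmeans(points, k, rng, max_iter=100):
    """One K-means run; returns (labels, sum of squared errors)."""
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, float(np.sum((points - centroids[labels]) ** 2))

def best_kmeans(points, k, n_init=20, seed=0):
    """Labels of the lowest-error run among several random restarts."""
    rng = np.random.default_rng(seed)
    runs = [kmeans(points, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[1])[0]

def silhouette_mean(points, labels):
    """Average silhouette coefficient over all points."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # a singleton cluster is given a silhouette of 0
        # a: mean distance to the other points of the same cluster.
        a = dist[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())

# Hypothetical readings (cm) around three well-separated test points.
rng = np.random.default_rng(4)
points = np.vstack([
    rng.normal(center, 5.0, size=(30, 2))
    for center in ([100.0, 100.0], [500.0, 100.0], [300.0, 450.0])
])

# Cluster for each candidate number of clusters, then keep the maximum.
scores = {k: silhouette_mean(points, best_kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
```

The restarts are included because a single K-means run can settle in a poor local optimum, which would distort the silhouette comparison.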

Experiments are performed using the ALC dataset. Our goal is to improve the accuracy of the UWB indoor positioning system using machine learning methods. Accuracy is used as the performance metric for comparing the clustering methods. It is defined through the distance between the real location $p$ and the measured location $q$ of a given point, calculated using the Euclidean distance equation:

$$d(p, q) = \sqrt{\sum_{k} (p_k - q_k)^{2}}$$
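As a small worked example of this metric, the sketch below computes the average Euclidean error over a set of points and the relative error reduction between two averages. The function names and the sample coordinates are hypothetical; only the two average errors passed to `relative_reduction` are the values reported in this paper.

```python
import numpy as np

def average_location_error(real, measured):
    """Mean Euclidean distance between real and measured positions."""
    real = np.asarray(real, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.linalg.norm(real - measured, axis=1)))

def relative_reduction(before, after):
    """Error reduction in percent, relative to the initial error."""
    return 100.0 * (before - after) / before

# Hypothetical coordinates in cm: the individual errors are 5 cm and 12 cm.
real = [[100.0, 100.0], [200.0, 100.0]]
measured = [[103.0, 104.0], [200.0, 112.0]]
error = average_location_error(real, measured)

# Average errors reported in this paper (cm), before and after Kalman filtering.
reduction = relative_reduction(16.3442, 9.2745)
```

With the reported averages, the reduction evaluates to about 43.26%, which matches the figure quoted in the text.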

Proposed system flow chart.

The proposed system is applicable for K-means, FCM, and mean shift algorithms. The average silhouette method is used in order to define the optimal number of clusters in K-means and FCM algorithms for each test point by varying

The maximum average silhouette coefficient in K-means for the training set.

The maximum average silhouette coefficient in FCM for the training set.

The maximum average silhouette coefficient in K-means for the test set.

The maximum average silhouette coefficient in FCM for the test set.

Figure

The distribution of UWB test points over clusters (a) for the training set and (b) for the test set.

When it comes to the test set, the average silhouette method is also used to define the optimal number of clusters for K-means and FCM algorithms. The optimal distribution of the test set over clusters is shown in Figure

The average location error comparison for the training set is shown in Figure

The average error comparison (a) for the training set and (b) for the test set.

To acquire a better optimized result and improve the accuracy of the clustering algorithms, in the second simulation, the Kalman filter is applied on the ALC dataset first.

Filtering noisy signals is important because many sensors produce output that is too noisy to be used directly, and the Kalman filter makes it possible to take the uncertainty in the signal and state into account.

The same simulation is repeated, but instead of the raw UWB-measured test points, the Kalman-filtered UWB test points are now used as input.

Figures

The maximum average silhouette coefficient in K-means after applying the Kalman filter for the training set.

The maximum average silhouette coefficient in FCM after applying the Kalman filter for the training set.

The maximum average silhouette coefficient in K-means after applying the Kalman filter for the test set.

The maximum average silhouette coefficient in FCM after applying the Kalman filter for the test set.

The distribution of test points over clusters after applying the Kalman filter for the training set and test set is shown in Figures

The distribution of test points over clusters after applying the Kalman filter (a) for the training set and (b) for the test set.

The average error results after applying the Kalman filter for the test set.

As shown in Figure

The primary purpose of this study is to investigate the use of different clustering algorithms to improve the accuracy of the UWB indoor positioning system and to compare the performance of each algorithm. The highest accuracy is obtained with the K-means algorithm, so K-means is recommended for related studies on the basis of these results. One limitation of the K-means clustering algorithm is that the number of clusters must be set in advance, so it is difficult to predict the

The secondary purpose is to examine the impact of the Kalman filter on the accuracy. The raw UWB dataset is therefore fed to the Kalman filter first, and the Kalman-filtered UWB dataset is then used as input to the clustering algorithms. Combining the Kalman filter with K-means yields the highest accuracy obtained in this study. The Kalman filter should therefore be strongly considered when improving the accuracy of an indoor positioning system. The cost of combining the Kalman filter with any of the clustering algorithms, especially the computation time, should also be taken into account.

In this paper, three clustering algorithms are compared in terms of accuracy using the ALC dataset. The K-means algorithm outperforms the other methods, with the lowest average error (14.0864 cm) on the test set, especially when the average silhouette method is used to determine the optimal number of clusters. The mean shift algorithm has the highest average error (14.4748 cm) of the three, despite its advantages. These advantages stem from the nonparametric nature of the kernel density estimate (KDE): the user needs to set only one parameter, the bandwidth. This is often more convenient than selecting the number of clusters explicitly or using other methods, such as the average silhouette or elbow methods, to define it.

The FCM algorithm has an average error of 14.2743 cm, which is very close to the result obtained with the K-means algorithm. However, FCM tends to run more slowly than K-means because each data point is evaluated against every cluster, and each evaluation involves more operations. FCM performs a full inverse-distance weighting, whereas K-means needs only a distance calculation. Thus, K-means is simpler and computationally faster.

In [

Finally, the Kalman-filtered UWB data are used as input to the clustering algorithms for the training and test sets. The best result is obtained with the K-means algorithm, for which the average error is reduced by 43.26% (from 16.3442 cm to 9.2745 cm). The effect of the Kalman filter on the raw data shows that noise and interference can be suppressed in the signal, and clustering the filtered data is then considerably more effective and accurate. Based on the results obtained from the clustering algorithms, K-means is the most appropriate one for the indoor positioning system because of its simplicity, fast computation, and, above all, high accuracy. K-means is also recommended because it scales to large datasets. Future studies should consider advanced versions of K-means that select better initial centroids, since the gradient-descent nature of K-means makes the algorithm highly sensitive to the initial placement of the cluster centers.

The raw data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that they have no conflicts of interest.

This research was funded by Personal Research Project (BAP) grants received from Kadir Has University (Grant number: 2017-BAP-09).