A Novel Hierarchical Clustering Approach Based on Universal Gravitation

The target of clustering analysis is to group a set of data points into several clusters based on similarity or distance. The similarity or distance used in numerous traditional clustering algorithms is usually a scalar. Nevertheless, a vector, such as data gravitational force, contains more information than a scalar and can be applied in clustering analysis to improve clustering performance. Therefore, this paper proposes a three-stage hierarchical clustering approach called GHC, which takes advantage of the vector characteristic of data gravitational force inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed based on the top k data gravitations between each data point and its neighbors in the local region. Then the sparse graph is partitioned into many subgraphs by the gravitational influence coefficient. In the last stage, a satisfactory clustering result is obtained by iteratively merging these subgraphs using a new linkage criterion. To demonstrate the performance of the GHC algorithm, experiments on synthetic and real-world data sets are conducted, and the results show that the GHC algorithm achieves better performance than the existing clustering algorithms it is compared with.


Introduction
Clustering is one of the major unsupervised learning techniques and has been applied in many fields such as pattern recognition [1], image processing [2,3], community detection [4,5], bioinformatics [6,7], information retrieval [8,9], and so on. The main task of clustering is to partition a dataset into nonoverlapping clusters based on a suitable similarity metric so that the elements in the same cluster are similar, while elements from different clusters are dissimilar. A wide range of clustering methods has been proposed, and these methods are classified as partition-based, hierarchical, grid-based, density-based, model-based clustering, and so on.
K-means [10] and its successors [2,11] are typical partition-based clustering approaches. They require the number of clusters to be specified in advance. Each data point of the dataset is assigned to its closest cluster according to Euclidean distance. The centroids of the clusters are recalculated repeatedly until the elements are consistently assigned to the same clusters. At that point, the cluster centers have stabilized and no longer change. However, these approaches are not able to detect nonspherical clusters because an element is always assigned to the nearest center. Numerous studies have been done to overcome this drawback of K-means type algorithms, particularly by using density distribution. In density-based clustering, clusters of arbitrary shape are considered as dense regions separated by sparse regions in data space [12]. DBSCAN [13] is the most representative density-based clustering algorithm. It requires a density threshold to be specified, discards the points with densities lower than this threshold as noise, and assigns disconnected regions of high density to different clusters. DP [14] is a novel algorithm that efficiently discovers the centers of clusters by finding the density peaks. It assumes that cluster centers are surrounded by neighbors with lower local density and are at a relatively large distance from any points with a higher local density. Furthermore, hierarchical clustering is a significant method of cluster analysis which seeks to build a hierarchy of clusters. Hierarchical clustering algorithms can be divided into two categories: agglomerative and divisive algorithms. An agglomerative hierarchical clustering algorithm starts with every single element of a dataset as its own cluster. Then it merges the closest clusters according to a linkage criterion in each iteration until all elements form one cluster.
A divisive hierarchical clustering algorithm starts with the whole dataset considered as a single cluster, which is split into subclusters until every element forms its own cluster. The other differences among hierarchical clustering approaches are determined by the diverse choices of similarity criteria and linkage criteria. BIRCH is one of the most effective hierarchical clustering methods [15]. It constructs a tree data structure with the cluster centroids being read off the leaves; these centroids can either be the final cluster centroids or be provided as input to another clustering algorithm. In addition, there are many multistage hierarchical clustering algorithms, such as Chameleon [16], which is a representative approach and can detect clusters of arbitrary shape effectively. In the first stage, Chameleon uses a graph-partitioning algorithm to cluster the data items into several relatively small subclusters. In the second stage, it finds the genuine clusters by repeatedly combining these subclusters based on both their interconnectivity and their closeness. These classical clustering approaches usually utilize only one kind of internal evaluation function to determine clustering quality [17]. Many scholars have focused on the study of multiobjective clustering to overcome this defect of conventional clustering algorithms. Peng et al. [18] proposed fuzzy multiobjective clustering based on PSO to obtain well-separated, connected, and compact clusters. Saha and Maulik [19] proposed multiobjective clustering based on incremental learning for categorical data. Moreover, many clustering algorithms, such as DenPEHC [20], GHFHC [21], Muenc [22], and so on, were also put forward to improve clustering performance. Meanwhile, new gravity-based clustering approaches were also proposed, such as the LGC algorithm, which is discussed in Section 2.
It is an important task to design a new clustering algorithm because every algorithm has its own advantages and disadvantages. In conventional clustering algorithms, the similarity or distance is usually a scalar, which contains only partial information about the relation among data points. To obtain more information, a vector can be adopted to represent the similarity of two data points. Data gravitational force, which is analogous to the universal gravitational force, is therefore employed to cluster data points. We propose a novel hierarchical clustering approach based on a sparse gravitational graph in which each vertex denotes an object of the data set and each edge denotes that data gravitation exists between its two vertices. In the clustering process, a weighted graph is first constructed based on universal gravitation. Then the graph is divided into several subgraphs based on the gravitational influence coefficients between each vertex and its adjacent vertices. Finally, two subgraphs are merged iteratively based on a new linkage measure until the genuine clusters are found.
There are three highlights in this paper. First, the sparse gravitational graph is defined based on the data gravitation model, and a new measure is used to extract more valuable information between each vertex and its adjacent vertices in the sparse gravitational graph. Second, a new linkage measure which makes the best of the characteristics of data gravitation is proposed to merge the subclusters iteratively. Third, a novel three-stage gravity-based hierarchical clustering method named GHC is proposed. The GHC algorithm can detect clusters of arbitrary shape effectively and achieves excellent clustering performance on the synthetic and real-life data sets in this study. The remainder of the paper is organized as follows. In Section 2, the related work on gravity-based clustering is reviewed. The novel gravity-based hierarchical clustering (GHC) is proposed and analyzed in detail in Section 3. In Section 4, the experiments on the synthetic and real-world data sets are conducted and discussed. Finally, the conclusions are drawn in the last section.

Related Work of Gravity-Based Clustering
Using gravity theory in clustering is not a new idea. Numerous gravity-based clustering algorithms, which simulate the process of attraction and merging of objects under their gravitational forces, have been studied. Usually, these algorithms consider each data point as an object and assign a mass to it in feature space. Wright [23] proposed the first version of gravitational clustering, which updates the position of each data point at each iteration and aggregates the data points into clusters when they are close. Yung [24] employed the gravitational clustering approach to segment color images. Each pixel with a unit mass maps to a location (as a particle) in RGB space. The mass of a particle is the total number of pixels mapped to it. Gravity causes the particles to move in the space under constraint. The particles are clustered when they move to the same location in RGB space. Wang et al. [25] proposed two novel clustering approaches based on the local gravitation model. In this model, each data item is considered as an object with mass and is associated with a local resultant force (LRF) generated by its neighbors in the local region. The clustering process is realized by using the differences between the LRFs of the data points close to the cluster centers and those at the boundary of the clusters. Bahrololoum et al. [26] proposed another approach that finds the best positions of the cluster centroids by employing the law of gravity. In this approach, the data points and cluster centroids are considered as fixed celestial objects and movable objects, respectively. The celestial objects apply a gravitational force to the movable objects and change their positions in the feature space. The best cluster centroids are obtained when the sum of the forces on each centroid approaches zero. Mohammed Alswaitt et al. [27] proposed a modification of a gravity-based data clustering algorithm.
The modified algorithm adopts the dependence of the agent on velocity and an initialization step of centroid positions to balance the exploitation ability and the exploration ability of the gravity-based clustering approach. Besides, a series of approaches based on gravity theory and Newton's second law of motion was proposed by Gómez et al. [28], Kundu [29], and Sanchez et al. [30]. In these approaches, points of the same cluster move toward their cluster center. Inspired by the phenomena of gravitation and the black hole, Hatamlou [31] proposed a new heuristic optimization approach called the black hole algorithm. Other heuristic algorithms inspired by gravitational phenomena have been designed for clustering. For instance, a heuristic gravitational search algorithm (GSA) was proposed by Rashedi et al. [32] and was applied to solving the wind-hydro-thermal CO problem by Shukla and Singh [33]. Yin et al. [34] designed a hybrid data clustering algorithm based on GSA.
To the best of our knowledge, each data point of a dataset is considered a movable object with mass in most existing gravity-based clustering algorithms. Data points move around in feature space under the influence of the law of gravity and merge into several clusters when they move close enough to each other. In our approach, we establish the data gravitation model and utilize the relation between each data point and the neighbors which exert the largest gravitational forces on it to group a dataset into many subclusters. Then, the two subclusters with the largest resultant gravitational force are merged. To boost the effectiveness of clustering, we define the sparse gravitational graph based on the data gravitation model, which can be divided into many subgraphs based on the relation between each vertex and its adjacent vertices. Next, subgraphs are repeatedly merged to form larger subgraphs until the terminal condition is satisfied.

Data Gravitation Model
Newton's law of universal gravitation states that every point mass attracts every other point mass with a force acting along the line through the two points; the force is proportional to the product of their masses and inversely proportional to the square of the distance between them. The gravitational force can be calculated as follows:

$$\vec{P}_{ij} = \kappa \frac{\sigma_i \sigma_j}{\delta_{ij}^{2}} \hat{\delta}_{ij}, \tag{1}$$

where $\vec{P}_{ij}$ denotes the gravitational force exerted on point mass $i$ by point mass $j$, $\sigma_i$ and $\sigma_j$ are the masses of the two points, respectively, $\delta_{ij}$ is the distance between point mass $i$ and point mass $j$, $\hat{\delta}_{ij}$ is the unit vector from point mass $i$ to $j$, and $\kappa$ is the gravitational constant.
Similar to the gravitational force, it is assumed that data gravitation exists between any two data points in data space. The data gravitation can be given as follows:

$$\vec{F}_{ij} = \frac{m_i m_j}{d_{ij}^{2}} \hat{d}_{ij}, \tag{2}$$

where $\vec{F}_{ij}$ is the data gravitation exerted on data point $i$ by data point $j$, $m_i$ and $m_j$ are, respectively, the masses of the data points $i$ and $j$, $d_{ij}$ is the distance between the two data points, and $\hat{d}_{ij}$ is the unit vector from data point $i$ to $j$. The mass of data point $i$ can be defined by

$$m_i = \sum_{j=1}^{n} \chi(d_{ij} - c), \tag{3}$$

where

$$\chi(x) = \begin{cases} 1, & x \le 0, \\ 0, & x > 0. \end{cases}$$

In other words, $m_i$ equals the number of points whose distances to point $i$ do not exceed the cutoff distance $c$, with point $i$ itself included. In particular, the mass of a data point is equal to 1 when $c = 0$. Moreover, we assume that the gravitational forces exerted on a data point are the top $k$ gravitational forces between it and other data points. Therefore, the gravitational resultant force (GRF) of data point $i$ can be obtained as follows:

$$\vec{F}_i = \sum_{j \in \Omega_i} \vec{F}_{ij}, \tag{4}$$

where $\Omega_i$ is the set of neighboring data points which exert the top $k$ gravitational forces on data point $i$. The gravitational force between two data points changes with the cutoff distance $c$ because their masses depend on $c$. Consequently, the GRF of a data point also changes with $c$ according to equation (4).
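As a minimal sketch of the data gravitation model described above, the following Python code computes the masses, the pairwise data gravitation, and the GRF of a point. It assumes Euclidean distance, no gravitational constant, and that a point counts itself when its mass is computed (so that the mass is 1 when c = 0). The function names are illustrative and are not taken from the original implementation.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]

def masses(dist, c):
    """Mass of point i: number of points within cutoff distance c (itself included; assumption)."""
    return [sum(1 for d in row if d <= c) for row in dist]

def gravitation(points, dist, mass, i, j):
    """Force vector exerted on point i by point j: m_i * m_j / d_ij^2 along the unit vector i -> j."""
    d = dist[i][j]
    scale = mass[i] * mass[j] / d ** 3  # 1/d^2 for the magnitude, 1/d to normalise the direction
    return tuple(scale * (b - a) for a, b in zip(points[i], points[j]))

def top_k_neighbors(points, dist, mass, i, k):
    """Indices of the k points exerting the largest forces on point i (coincident points skipped)."""
    others = [j for j in range(len(points)) if j != i and dist[i][j] > 0]
    others.sort(key=lambda j: mass[i] * mass[j] / dist[i][j] ** 2, reverse=True)
    return others[:k]

def resultant_force(points, dist, mass, i, k):
    """Gravitational resultant force (GRF): vector sum of the top-k forces on point i."""
    total = [0.0] * len(points[i])
    for j in top_k_neighbors(points, dist, mass, i, k):
        for axis, component in enumerate(gravitation(points, dist, mass, i, j)):
            total[axis] += component
    return tuple(total)

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
dist = pairwise_distances(points)
mass = masses(dist, c=1.0)
grf = resultant_force(points, dist, mass, i=0, k=2)
print(mass, grf)
```

In this toy example, the GRF of the first point is dominated by its heavier and closer neighbor, which illustrates how both the masses and the distances shape the direction of the resultant force.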
For example, Figure 1 shows the gravitational force and GRF when $c$ is set to different values in a 2D data set. In Figure 1(a), the mass of each data point is 1 when $c = 1$. It can be noticed that the GRF of the data point $x_1$ is directed towards the data points $x_2$ and $x_4$, which indicates that $x_2$ and $x_4$ have more influence on $x_1$. In Figure 1(b), the masses of $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$ are 4, 1, 2, 3, and 3, respectively. The GRF of the data point $x_1$ is directed towards the data points $x_4$ and $x_5$, which indicates that $x_4$ and $x_5$ have more influence on $x_1$. The gravitational influence coefficient (GIC) is then introduced to represent the relationship between the GRF of a data point and the gravitational forces exerted on it by other data points. The GIC of data points $i$ and $j$ is defined as follows:

$$\mathrm{GIC}_i^j = \frac{\vec{F}_i \cdot \vec{F}_{ij}}{\lVert \vec{F}_i \rVert \, \lVert \vec{F}_{ij} \rVert}, \tag{5}$$

where $\vec{F}_i$ is the resultant force of data point $i$ and $\vec{F}_{ij}$ is the gravitational force exerted on data point $i$ by its neighboring data point $j$. $\mathrm{GIC}_i^j$ ranges from $-1$ to 1. The bigger $\mathrm{GIC}_i^j$ is, the more influence point $j$ has on data point $i$. Intuitively, the gravitational influence coefficient can be adopted to realize data cluster analysis. The data points $i$ and $j$ are grouped into a cluster if both $\mathrm{GIC}_i^j$ and $\mathrm{GIC}_j^i$ are bigger than a threshold. Otherwise, they are assigned to different clusters. In this way, a coarse clustering method can be obtained.
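The following sketch shows one way to evaluate the gravitational influence coefficient, under the assumption that it is the cosine of the angle between the resultant force on point i and the force exerted on i by point j; this reading is consistent with the stated range from −1 to 1. The function name `gic` and the zero-force convention are assumptions made for illustration.

```python
import math

def gic(resultant, force):
    """Cosine of the angle between the resultant force on i and the force exerted on i by j."""
    dot = sum(a * b for a, b in zip(resultant, force))
    norm = math.hypot(*resultant) * math.hypot(*force)
    # Convention assumed here: a vanishing force carries no directional influence.
    return dot / norm if norm > 0 else 0.0

f_i = (4.0, 0.5)              # resultant force on point i
print(gic(f_i, (4.0, 0.0)))   # neighbor pulling almost along the resultant
print(gic(f_i, (0.0, 0.5)))   # neighbor pulling nearly perpendicular to it
print(gic(f_i, (-1.0, 0.0)))  # neighbor pulling against it
```

A neighbor whose force is aligned with the resultant force obtains a coefficient close to 1, whereas a neighbor pulling in the opposite direction obtains a coefficient close to −1.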

The Proposed Gravity-Based Hierarchical Clustering Algorithm
Though the coarse clustering algorithm based on the data gravitation model can be employed to cluster a dataset, its clustering performance is not good. Therefore, a novel hierarchical clustering algorithm (GHC) is proposed based on the sparse gravitational graph, which makes the algorithm easy to implement and effective. The time complexity of the GHC algorithm is analyzed at the end of this section.

Sparse Gravitational Graph.
Let $X = \{x_1, x_2, \ldots, x_n\}$ denote a data set with $n$ data points, in which each data point [...] (2). The smaller the value of $k$, the sparser the graph. Figure 2 shows the different gravitational graphs of a dataset when various parameters are specified. Although the relationship between two vertices $x_i$ and $x_j$ can be described simply by the gravitational force between them, the vertex $x_i$ is influenced not only by the vertex $x_j$ but also by its other adjacent vertices in the sparse gravitational graph. Considering the influence of each vertex on its adjacent vertices, the gravitational influence coefficient can also be introduced into the gravitational graph to describe the influence between two vertices. Two vertices $i$ and $j$ can be grouped into the same cluster if both $\mathrm{GIC}_j^i$ and $\mathrm{GIC}_i^j$ are larger than the threshold $\theta$. The edges between two vertices in the same cluster are retained in the graph, while the edges whose vertices belong to different clusters are removed from the graph. The gravitational graph is thereby partitioned into many subgraphs. However, these subgraphs are not the final clustering results.

Gravity-Based Hierarchical Clustering Algorithm.
Though the gravitational graph can be partitioned into many subgraphs which denote different subclusters, the resulting clustering performance is poor. However, these subgraphs can be considered as intermediate results of clustering. Therefore, a new hierarchical clustering algorithm is proposed based on the intermediate results of partitioning the gravitational graph. The proposed clustering approach consists of the following three stages.
During the first stage, the data set is mapped into a sparse gravitational graph which is similar to the k-NN graph. First, the data set is preprocessed by using feature transformation and dimension reduction techniques. Then, the mass of each data point is calculated by equation (3), and the gravitational force between two vertices is computed by equation (2). The initial gravitational graph is constructed, in which the weights of the vertices and edges are the corresponding masses and forces. The procedure of constructing the sparse gravitational graph is presented in Algorithm 1.
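The first stage can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it skips the preprocessing step, assumes Euclidean distance, counts a point itself in its mass, stores only the magnitude of the force as the edge weight, and treats edges as undirected. The function name `build_sparse_gravitational_graph` is hypothetical.

```python
import math

def build_sparse_gravitational_graph(points, k, c):
    """Return (mass, edges): vertex weights and a dict {(i, j): force magnitude} with i < j."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    mass = [sum(1 for d in row if d <= c) for row in dist]
    edges = {}
    for i in range(n):
        # Magnitude of the gravitation between i and every other non-coincident point.
        strength = {j: mass[i] * mass[j] / dist[i][j] ** 2
                    for j in range(n) if j != i and dist[i][j] > 0}
        # Keep an edge to each of the k neighbors exerting the largest forces on i.
        for j in sorted(strength, key=strength.get, reverse=True)[:k]:
            edges[(min(i, j), max(i, j))] = strength[j]
    return mass, edges

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
mass, edges = build_sparse_gravitational_graph(points, k=1, c=1.0)
print(mass)
print(sorted(edges))
```

With k = 1, each point keeps only its strongest attraction, so the two well-separated groups of points are not connected to each other in the resulting graph.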
During the second stage, the gravitational graph is partitioned into many small connected subgraphs based on the gravitational influence coefficients among vertices. If both $\mathrm{GIC}_i^j$ and $\mathrm{GIC}_j^i$ are greater than the threshold $\theta$, the edge $(x_i, x_j)$ is retained in the graph. Otherwise, the edge is removed from the graph. The process of the second stage is described in Algorithm 2.
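A minimal sketch of the second stage is given below. It assumes that the coefficients have already been computed and are supplied as a dictionary in which the hypothetical key `(i, j)` holds the influence of vertex j on vertex i. Following the removal rule of Algorithm 2, an edge is dropped when either coefficient is below the threshold; the connected components of the remaining graph are then collected with a depth-first search.

```python
def partition_graph(n, edges, gic, theta):
    """Keep edge (i, j) only if neither GIC value is below theta, then return the components."""
    kept = [(i, j) for (i, j) in edges
            if gic[(i, j)] >= theta and gic[(j, i)] >= theta]
    adjacency = {v: set() for v in range(n)}
    for i, j in kept:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collecting every vertex reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for u in adjacency[v] - seen:
                seen.add(u)
                stack.append(u)
        components.append(sorted(component))
    return kept, components

edges = [(0, 1), (1, 2), (2, 3)]
gic = {(0, 1): 0.9, (1, 0): 0.8, (1, 2): 0.7, (2, 1): -0.2, (2, 3): 0.6, (3, 2): 0.95}
kept, components = partition_graph(4, edges, gic, theta=0.5)
print(kept, components)
```

In this example, the edge between vertices 1 and 2 is removed because the influence of vertex 1 on vertex 2 is negative, which splits the chain into two subgraphs.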
In the last stage, the genuine clusters are found by merging subgraphs iteratively. The core of merging the subgraphs is the definition of the linkage criterion between two clusters, which determines the similarity among the subgraphs. Common linkage criteria include complete linkage, single linkage, average linkage, centroid linkage, minimum energy linkage, graph degree linkage, and so on. Different from these criteria, a novel linkage measure is defined to determine the similarity of two subgraphs based on the vector property of gravitational forces. It is called the gravitational merging coefficient (GMC) and is obtained by combining the gravitational forces between two subgraphs. Mathematically, GMC is formulated by equation (6), where $C_i$ is the $i$th subgraph, $C_j$ is the $j$th subgraph, $N_i$ is the number of vertices in $C_i$, and $N_j$ is the number of vertices in $C_j$. In each iteration, the two subgraphs with the largest GMC are merged into a new subgraph. The clustering process terminates when the end conditions are met. The processing steps are presented in Algorithm 3. The overall procedure of GHC is presented in Algorithm 4.

Algorithm 1
Input: X: the data set. k: the number of data points with top k gravitational force. c: the cutoff distance used to determine the mass of each point.
Output: G: the sparse gravitational graph.
(1) Scale the data set X using a feature transformation technique;
(2) Calculate the Euclidean distance $d_{ij}$ between any two data points i and j in the data set X;
(3) Calculate the mass $m_i$ of any data point i in the data set X by equation (3);
(4) Calculate the data gravitational force $\vec{F}_{ij}$ between any two data points i and j in the data set X;
(5) Initialize the sparse gravitational graph G = (V, E), and set V = X and E = { };
(6) for each data point x in X do
(7)     Assign the mass of x as the weight of the corresponding vertex in V;
(8)     Select data points $y_1, y_2, \ldots, y_k$ with the top k data gravitation exerted on data point x;
(9)     for i = 1 to k do
(10)        Insert the edge $(x, y_i)$ into the set E;
(11)        Assign the data gravitational force of x and $y_i$ as the weight of the edge $(x, y_i)$;

To illustrate the GHC algorithm, the main clustering steps are shown in Figure 3. Figure 3(a) shows the first stage of GHC when $c = 0.8$ and $k = 3$. The artificial data set with 25 data points is mapped into a gravitational graph by Algorithm 1. Figure 3(b) shows the second stage of GHC when $\theta = 0.5$. The gravitational graph is partitioned into many subgraphs by Algorithm 2. Figures 3(c) and 3(d) show the third stage of GHC. The subgraphs with the highest GMC are merged by using Algorithm 3. Figure 3(c) shows the gravitational graph after six iterations. Figure 3(d) shows the clustering result after twelve iterations. The data set is grouped into two clusters correctly.
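The iterative merging of the third stage can be sketched as a generic agglomerative loop with a pluggable linkage function. Two assumptions should be stressed: the linkage used in the example is a hypothetical stand-in based on summed force vectors and is not the GMC of the paper, and the end condition is taken to be a target number of clusters. The names `merge_clusters` and `make_stand_in_linkage` are illustrative only.

```python
import math

def merge_clusters(clusters, linkage, target):
    """Repeatedly merge the pair of clusters with the largest linkage until `target` remain."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > target:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

def make_stand_in_linkage(points):
    """Hypothetical linkage (NOT the paper's GMC): size-normalised magnitude of the summed
    unit-mass gravitation vectors that cluster B exerts on cluster A. Assumes distinct points."""
    def linkage(cluster_a, cluster_b):
        total = [0.0] * len(points[0])
        for i in cluster_a:
            for j in cluster_b:
                d = math.dist(points[i], points[j])
                for axis in range(len(total)):
                    total[axis] += (points[j][axis] - points[i][axis]) / d ** 3
        return math.hypot(*total) / (len(cluster_a) * len(cluster_b))
    return linkage

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
subclusters = [[0, 1], [2], [3], [4, 5]]
merged = merge_clusters(subclusters, make_stand_in_linkage(points), target=2)
print(merged)
```

Because the linkage is passed in as a function, the same loop could be reused with any other criterion, including a vector-based measure normalised by the subgraph sizes.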

Complexity Analysis.
The time complexity can be defined as the sum of the complexities of the stages of the GHC algorithm. In the first stage, the mass of each data point is calculated, its neighbors with the top $k$ gravitational forces are found, and the gravitational graph is constructed. For a data set with $n$ data points, the time complexity of Algorithm 1 is $O(n^2 \log n)$. During the second stage, the GRF of each data point is calculated, and the gravitational graph is divided into subgraphs. Thus, [...] Therefore, the worst-case time complexity of the GHC algorithm is $O(n^2 \log n)$.

Algorithm 2
Input: G: the sparse gravitational graph. θ: the threshold of gravitational influence coefficient.
Output: G′: the separated gravitational graph.
(1) G′ = G;
(2) for each vertex v in the graph G′ do
(3)     Search the adjacent vertices $u_1, u_2, \ldots, u_t$ of the vertex v;
(4)     for i = 1 to t do
(5)         Calculate the $\mathrm{GIC}_v^{u_i}$ of the vertex $u_i$ for the vertex v by equation (5);
(6)         Calculate the $\mathrm{GIC}_{u_i}^{v}$ of the vertex v for the vertex $u_i$ by equation (5);
(7)         if $\mathrm{GIC}_v^{u_i} < \theta$ or $\mathrm{GIC}_{u_i}^{v} < \theta$ then
(8)             Remove the edge $(v, u_i)$ from the edge set E of G′;
(9)         end
(10)    end
(11) end
(12) return G′;

Algorithm 3
(1) Search out the connected subgraphs $G_1, G_2, \ldots, G_p$ in the gravitational graph G′;
(2) Calculate the GMC($G_i$, $G_j$) of the connected subgraphs $G_i$ and $G_j$ by equation (6);
(3) Select the connected subgraphs $G_s$ and $G_t$ which have the largest GMC($G_s$, $G_t$);
(4) for each vertex v in $G_s$ do
(5)     for each vertex u in $G_t$ do
(6)         if the edge (v, u) in G then
(7)             Insert the edge (v, u) into G′;

Performance Metrics.
In this study, four clustering performance metrics, namely Purity [27], Rand Index (RI) [35], F-measure [35], and Normalized Mutual Information (NMI) [36], are used to evaluate the performance of clustering algorithms. Given a dataset $X = \{x_1, x_2, \ldots, x_n\}$ with $p$ categories and $n$ data points, the set $P = \{P_1, P_2, \ldots, P_p\}$ denotes the real classes, in which $P_j$ $(1 \le j \le p)$ is a subset of $X$. The clustering result is the set $Q = \{Q_1, Q_2, \ldots, Q_q\}$, in which $Q_i$ $(1 \le i \le q)$ is also a subset of $X$. Purity is an external evaluation criterion of cluster quality. The purity of a cluster $Q_i$ with $n_i$ data points is defined as follows:

$$\mathrm{Purity}(Q_i) = \frac{1}{n_i} \max_{j} n_i^j,$$

where $n_i^j$ is the number of data points of the $j$th class that are assigned to the $i$th cluster. The overall purity of a clustering result is defined as

$$\mathrm{Purity} = \sum_{i=1}^{q} \frac{n_i}{n} \mathrm{Purity}(Q_i).$$

In general, a larger Purity denotes a better clustering result. Rand Index is calculated as follows:

$$\mathrm{RI} = \frac{a + b}{\binom{n}{2}},$$

where $\mathrm{RI} \in [0, 1]$, $a$ is the number of pairs of data items in $X$ that are in the same subset of $Q$ and in the same subset of $P$, and $b$ is the number of pairs of data items in $X$ that are in different subsets of $Q$ and in different subsets of $P$. F-measure is like RI except that true negatives are not taken into account; in its formula, $c$ denotes the number of pairs of data items in $X$ that are in different subsets of $Q$ but in the same subset of $P$. The normalized mutual information (NMI) is also adopted in this paper. A larger NMI denotes better clustering performance.
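For illustration, the overall Purity and the Rand Index as defined above can be computed from two label lists with the following sketch. The function names and the toy labels are assumptions; the pair-counting loop is quadratic in the number of data points and is meant for clarity rather than speed.

```python
from collections import Counter
from itertools import combinations

def purity(labels_true, labels_pred):
    """Overall purity: sum of the majority-class counts of all clusters, divided by n."""
    by_cluster = {}
    for t, p in zip(labels_true, labels_pred):
        by_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(counts.values()) for counts in by_cluster.values()) / len(labels_true)

def rand_index(labels_true, labels_pred):
    """(a + b) / C(n, 2): a = pairs together in both partitions, b = pairs apart in both."""
    agree = total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred  # counts both kinds of agreeing pairs
        total += 1
    return agree / total

truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 1, 1]
print(purity(truth, pred), rand_index(truth, pred))
```

In this example one point of the first class is placed in the wrong cluster, which gives a Purity of 5/6 and a Rand Index of 10/15.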

Parameter Settings.
To investigate the performance of GHC, experiments are performed on the synthetic datasets shown in Figure 4 and the real-life datasets tabulated in Table 1. Six well-known clustering algorithms, namely K-means [10], K-means++ [37], Spectral Clustering (SC) [38], DBSCAN [13], Birch [39], and LGC [25], are compared with the GHC algorithm. The well-tuned parameter settings of the GHC algorithm and the competing algorithms are tabulated for each data set in Table 2. For K-means, K-means++, and SC, the parameter $\tau$ is the number of classes in each data set. For the SC algorithm, the parameter $\sigma^2$ is sought from the set {0.01, 0.1, 0.5, 1, 1. ... To demonstrate the performance of the GHC algorithm, the three parameters $c$, $\theta$, and $k$ are set to 0.2, 0.1, and 6, respectively, for all synthetic data sets. For each real-world dataset, the triple $(c, \theta, k)$ with the best RI value is chosen. The tunable parameter $c$ is varied from $-1$ to 10 with an increment of 0.1. The parameter $\theta$ is chosen from $-1$ to 1 with an interval of 0.1. The parameter $k$ is chosen from the set {4, 5, 6, 10}. For all nondeterministic approaches, we run the algorithms 100 times on each data set and adopt the average of each performance criterion for evaluation. For all deterministic approaches, the performance metrics are obtained from a single run.
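The parameter selection described above amounts to an exhaustive grid search over the three parameters. The sketch below builds the grids as stated in the text and selects the triple with the highest score; the `toy_score` function is a placeholder assumption that stands in for running the clustering and computing its RI, and `grid_search` is an illustrative name.

```python
from itertools import product

def grid_search(score, c_values, theta_values, k_values):
    """Return the (c, theta, k) triple with the highest score, together with that score."""
    best = max(product(c_values, theta_values, k_values), key=lambda params: score(*params))
    return best, score(*best)

# Grids as described in the text; integer steps avoid floating-point drift.
c_values = [step / 10 for step in range(-10, 101)]     # -1.0 ... 10.0, increment 0.1
theta_values = [step / 10 for step in range(-10, 11)]  # -1.0 ... 1.0, increment 0.1
k_values = [4, 5, 6, 10]

def toy_score(c, theta, k):
    """Placeholder objective; in practice this would run the clustering and return its RI."""
    return -((c - 0.2) ** 2 + (theta - 0.1) ** 2 + (k - 6) ** 2)

best, value = grid_search(toy_score, c_values, theta_values, k_values)
print(best, value)
```

The grid contains 111 × 21 × 4 = 9324 triples, so each evaluation of the real objective should be kept as cheap as possible, for example by reusing the distance matrix across triples.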

Experiments on Synthetic Datasets.
In order to investigate the performance of the proposed approach, a series of experiments on the twelve synthetic datasets shown in Figure 4 is performed by using the proposed GHC and the other existing algorithms. The performance results are tabulated in Table 3. In Table 3, the first column denotes the dataset used, whereas the first row denotes the algorithms used. The values in the other fields of the table denote the performance on the corresponding dataset. Although the synthetic datasets are intuitively easy to cluster, not all the clustering algorithms achieve remarkable performance in this study. Overall, the GHC, DBSCAN, and LGC algorithms are more competitive than the other algorithms. The GHC algorithm obtains good clustering results on all synthetic datasets, while the DBSCAN and LGC algorithms perform worse on a few datasets.

Experiments on Real-Life Datasets.
In order to investigate the performance further, the proposed GHC algorithm and the other competing approaches are applied to the clustering problems on the real-world datasets tabulated in Table 1. For each real-life dataset, the well-tuned parameters of all algorithms are also tabulated in Table 2. The performance results are shown in Table 4. In Table 4, the first row denotes the algorithms, while the first column denotes the real-life datasets used in the experiments. The values in the other fields of this table denote the evaluation results for the GHC algorithm and the other existing algorithms on each dataset. The GHC algorithm obtains the best values of all evaluation criteria on the BTissue, Iris, and Wine datasets. On the other datasets, the evaluation results of the GHC algorithm are the best or close to the best for the four evaluation criteria. Overall, the GHC algorithm outperforms the other competing algorithms on these real-world datasets.

Discussions.
In this subsection, we mainly discuss the role and impact of the parameters on the performance of the GHC algorithm. There are three tunable parameters, $c$, $\theta$, and $k$, which must be determined for GHC. The parameter $c$ determines the masses of data points, which directly affects the gravitational forces and, as the forces vary, controls the structure of the gravitational graph. The second parameter $\theta$ controls the number of subgraphs into which the gravitational graph is partitioned. The third parameter $k$ determines the sparsity and connectivity of the gravitational graph. In the previous subsection, it can be noticed that the GHC algorithm performs better than the state-of-the-art algorithms on all synthetic datasets, even though these parameters are set to fixed values ($c = 0.2$, $\theta = 0.1$, $k = 6$) which may not be optimal.
To illustrate the impact of the parameters $c$, $\theta$, and $k$, we conduct a series of experiments on real-world datasets to analyze the influence of each parameter on the clustering performance of the GHC algorithm. The prior knowledge of the real-world datasets can be used to search for the optimal parameters with the best values of the evaluation metrics. Figure 5 shows how the values of the evaluation metrics Rand Index and Purity change when the parameter $c$ varies in a given interval. The parameter $c$ varies from 0 to 10 with an increment of 0.1 for all datasets except the SControl dataset, for which it varies from 0 to 250 with an increment of 5. It can be noticed that the performance values fluctuate within an interval as the parameter $c$ is increased. The evaluation result on each dataset converges to a fixed value when $c$ is beyond the interval. Because $c$ determines the mass of each data point by equation (3), the gravitational force between two data points differs significantly when $c$ is set to different values. In essence, the different distribution of the masses of the data points affects the gravitational forces between them and leads to different clustering performance. Figure 6 shows how the values of the evaluation metrics Rand Index and Purity with the best clustering performance change as the parameter $\theta$ takes different values from $-1$ to 1 with an increment of 0.1. From Figure 6, it can be noticed that the evaluation values generally increase as the threshold $\theta$ increases on most of the real-world datasets. The reason is that data points of different clusters are grouped into the same subcluster in the second stage of the GHC algorithm when the parameter $\theta$ is set to a low value. In contrast, the performance of the GHC algorithm is better when $\theta$ is set to a high value because the data points of different clusters are then separated correctly.
Figure 7 shows how the values of the evaluation metrics Rand Index and Purity with the best clustering performance change as the parameter $k$ takes different values. In general, the performance metric Purity decreases as the parameter $k$ increases. From Figure 7, there is a single peak at which the value of Rand Index reaches its maximum when $k$ changes from 1 to 20. From the above analysis, the GHC algorithm can achieve good performance when the three parameters are set to suitable values for each dataset.

Conclusions
In this paper, we propose a novel gravity-based clustering approach that makes full use of the vector properties of gravitational force. To some extent, the data gravitational force can be considered as a similarity measure which takes not only density but also distance into account. To illustrate the performance of the GHC algorithm, experiments with well-tuned parameters have been conducted on synthetic and real-life datasets in comparison with other well-known clustering algorithms. The experimental results show that the GHC algorithm is robust and achieves competitive performance. However, the time complexity of the GHC algorithm is high. This issue can be addressed in future work. Meanwhile, the GHC algorithm can be applied in more application fields.

Data Availability
The data used to support the findings of this study have been deposited in the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml) and the figshare database (https://doi.org/10.6084/m9.figshare.8187623.v1).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.