An Improved Algorithm Based on Fast Search and Find of Density Peak Clustering for High-Dimensional Data

The fast search and find of density peaks clustering algorithm (FDP) performs poorly on high-dimensional data. This problem occurs because the algorithm performs no feature selection: all features are evaluated with the same weight, without distinguishing informative features from uninformative ones, so the final clustering falls short of expectations. To address this problem, we propose a new method. We construct a random forest to calculate the importance value of every feature of the high-dimensional data, together with the mean importance. Features whose importance value is less than 10% of the mean value are removed, and the remaining important features form a new dataset. t-SNE is then applied to this dataset for dimension reduction, which gives better performance. The method thus uses t-SNE improved by the idea of random forest to reduce the dimension of the original data and combines it with an improved FDP to compose the new clustering method. In our experiments, the evaluation index NMI of the improved algorithm is 23% higher than that of the original FDP algorithm and 9.1% higher than that of other clustering algorithms (K-means, DBSCAN, and spectral clustering). Experiments on UCI datasets and wireless sensor networks verify that it performs well on high-dimensional datasets.


Introduction
Cluster analysis is an effective means of judging and analyzing problems in daily life for which we have no accurate classification standard. It has applications in many fields, such as finance, medicine, and image processing. A clustering algorithm divides elements into clusters according to calculated mathematical characteristics. Although many clustering algorithms have been studied and discussed, there is no agreement on the definition of clustering.

The first clustering algorithm to mention is K-means [1]. K-means is an efficient and concise algorithm that has been discussed by many researchers. It needs only the number of clusters to be set in order to return clustering results to the user. However, it cannot detect nonspherical clusters [2]. DBSCAN [3] is also a classic and effective algorithm, and a representative density-based clustering algorithm. It follows the spatial distribution of the data and the consistency of data density. It is robust [4] to outliers and can even detect them. Its authors define two parameters to constrain and control the generation of clusters, but as a result the output of DBSCAN is strongly affected by the values of these two parameters.

Spectral clustering [5] clusters data points by means of discrete mathematics. The algorithm constructs an undirected graph and analyzes its properties. The most ingenious and important part of the algorithm is the construction of the Laplacian matrix, which requires the similarity matrix. First, the similarity matrix is built in fully connected mode. Many kernel functions can measure the relationship between points; among them, the Gaussian kernel function performs best on average. Second, the adjacency matrix and degree matrix are calculated from the similarity matrix, and the Laplacian matrix is constructed from them. Finally, the eigenvectors of the Laplacian matrix are computed. Based on the eigenvalues and eigenvectors, other clustering algorithms can then complete the clustering task.

Balanced iterative reducing and clustering using hierarchies (BIRCH) [6] uses a tree structure to cluster quickly. BIRCH builds a clustering feature (CF) tree: by calculating similarity over the dataset, the most similar sample points are combined, and the process is iterated [7]. Fuzzy clustering [8] emerged with the development of the automatic control field. The theory of fuzzy mathematics introduced the concept of membership degree, which greatly promoted the development of fuzzy clustering. The main idea of the algorithm is to use the membership degree to handle points that cannot be assigned to a single cluster unambiguously.

Affinity propagation (AP) is a clustering algorithm that uses a voting mechanism and gives good results. Its idea can be illustrated by an example from daily life. Suppose there are many commodities and many customers. If a commodity is selected by customers, it must have its own strength; that is, it must be attractive enough. In addition, there is direct word of mouth among customers: one customer approves of the commodity and recommends it to the next customer, and so on. Here the commodity is like the cluster center to be selected, and the customers are the points attached to that center. In short, the core of the AP clustering algorithm is to calculate the attribution matrix and the attraction matrix.
Since FDP was introduced, many papers have addressed its problems. The FDP paper [9] proposes a method that uses the density distribution of data points to cluster efficiently and quickly. The algorithm selects the point with the highest density in a certain area as a cluster center. Each noncentral point is assigned to the class of its nearest point of higher density. Outliers are found and excluded from the analysis. Clusters are recognized regardless of their shape and of the dimension of the embedding space.
However, FDP cannot handle data with many features or high-dimensional data effectively, because nonmain features interfere with and degrade the performance of the algorithm. Researchers have made a series of improvements, among which dimension reduction is the first choice. A fast hybrid dimension reduction method [10] combines multiple feature extraction methods. The Fisher score and feature selection based on information gain are used to remove nonmain features. In this way, the interference of nonmain features is reduced, efficiency improves, and the clustering effect is significantly improved. Among the many ways to reduce dimension, principal component analysis (PCA) [11] is one of the most widely used. PCA maps the high-dimensional features to low-dimensional features that are mutually orthogonal; these new features are called the principal components. Because the low-dimensional features are reconstructed from the original high-dimensional features, most of the original information is kept while the dimension is reduced. We therefore compare our method with PCA in the experiments.
The purpose of studying clustering algorithms is to serve people's lives. The work in [12] draws on the concept of manifolds from topology. Manifold learning is currently a popular approach to dimension reduction, and we introduce it briefly. For example, a piece of paper can be seen as two-dimensional when it lies flat, but when it is crumpled into a ball, it can be regarded as three-dimensional. For any point on the paper, whether the paper is crumpled or flat, its position relative to its neighbors on the paper does not change. Manifold learning aims to reduce the dimension smoothly while keeping such features unchanged. At the same time, the index matrix of the discrete clustering optimization process is easily affected by noise. To address this, [12] proposed a new clustering method that combines local adaptive subspace learning and knowledge clustering to mine discriminant information adaptively.
To address the practical problems of nonlinear and high-dimensional sample data in complex system evaluation or prediction, the authors of [13] used a memetic algorithm model [14] to achieve dimensionality reduction, with good results.
Another paper proposed a clustering algorithm for high-dimensional stream data. This algorithm introduced dimension reduction into the framework of stream clustering. When new data arrives, the disordered new data must be processed in order to find the local projection space, so reducing the high dimension to a low dimension is necessary. The algorithm used unsupervised linear discriminant analysis (LDA) to process the data and find this local projection space. The resulting local subspace maximally separates the adjacent microclusters from the incoming point. Incoming points are then assigned to the microclusters in the projection space, and this method improves the assignment.
For high-dimensional data, we propose a dimension reduction method and combine it with FDP to improve the performance of FDP on high-dimensional data.
We now discuss the practical application of clustering, which brings us to the Internet of Things (IoT), a technology widely used in intelligent systems. As the sensing layer, a wireless sensor network is composed of many sensor nodes. The sensor is the most important part of the Internet of Things. Sensors are like the human sense organs that perceive the world, and they constantly provide information that facilitates users' production and daily life. Therefore, to obtain a stable network, the energy consumption of the sensors must be optimized first, so as to extend the life of each sensor in the network. A sensor in the network is like a data point in space, and each data point has a series of characteristics, such as relative distance and energy consumption. How to better control the energy consumption of wireless sensors is a difficult problem.
The work in [15] introduced the semantic relationships between sensors to improve the performance of the whole network. It proposed a new sensor ontology integration technology based on a mechanism called the debate mechanism (DM). The purpose of this method is to extract sensor ontology alignments [16,17].

Wireless Communications and Mobile Computing
In this way, the communication ability between wireless sensors is strengthened, and the performance of the whole network is improved. The global factor in the algorithm is calculated from the correctness factor, and the local factor is calculated through the debate mechanism. Combining the global factor and the local factor gives the judgment factor, which is used to achieve ontology alignment. By this means, the performance of the whole wireless sensor network can be improved. Inspired by this work, we finally apply the improved FDP to wireless sensor networks and run a comparative experiment to show its performance.

Related Work
Equations (1) and (2) are designed to evaluate local density, where ρ_i represents the local density. The distance between point i and point j is expressed by d_ij, which is generally the Euclidean distance. In Equation (1), the cutoff distance d_c is usually chosen so that the average number of neighbors of a point is 1%-2% of the total number of data points. Although the original text does not point this out specifically, d_c must be set by the user, it is difficult to set, and it has a great influence on the algorithm. Moreover, if the setting is not correct, the expected effect will not be obtained, and the poor result may be mistaken for the performance of the algorithm itself. The choice of d_c is therefore a popular research direction, and a good choice can effectively improve the performance of FDP. Second, δ_i is the minimum Euclidean distance from point i to any point with a greater density. The decision graph takes ρ_i and δ_i as the x-axis and y-axis, respectively. Sample points with both large ρ_i and large δ_i are selected as cluster centers in the decision graph. The user must manually inspect the generated decision graph and mark a region; the points in the selected region are the cluster centers. Each noncenter sample point is assigned to the cluster of its nearest point of higher density.
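The two FDP quantities described above can be sketched in a few lines. This is a minimal illustration that assumes the cutoff-kernel form of the local density (the number of points closer than d_c) and the usual convention that the densest point takes its largest distance as δ; the function name `fdp_rho_delta` and the toy points are hypothetical.

```python
import math

def fdp_rho_delta(points, d_c):
    """Return (rho, delta) per point, assuming the cutoff-kernel density."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # Distances to every point with strictly higher density.
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Assumed convention: a point with no denser neighbor gets its largest distance.
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta

# Toy data: a dense group, a pair, and one isolated point (illustrative values).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
rho, delta = fdp_rho_delta(points, d_c=0.5)
```

Plotting `rho` against `delta` gives the decision graph; points that are large on both axes are the candidate centers.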
2.2. Introduction to t-SNE. Among the many dimensionality reduction algorithms for high-dimensional data, stochastic neighbor embedding (SNE) [18] is a special one. Its idea is clever and simple, but it relies on a good deal of probability theory. It is closely related to the manifold learning mentioned above: it flattens a higher-dimensional space into a plane. The algorithm first describes the distribution around each point, so how to measure that distribution is key. There are many ways to measure it; here, we choose Euclidean distance first. Research shows that different measures have a great impact on the performance of the algorithm. Consider two high-dimensional data points x_i and x_j. The conditional probability p_{j|i} represents the degree to which x_j is a neighbor of x_i, and it is defined as follows.
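In the standard SNE formulation, which is assumed to be the form intended here, this conditional probability is a normalized Gaussian similarity centered on x_i:

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}
```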
Here, σ_i is a statistical constant: a Gaussian distribution is centered on x_i, and σ_i² is its variance.
The essence of dimension reduction is to preserve the nature of the data points and the relationships between them, so that the low-dimensional data keeps as many features as possible. Let y_i and y_j be the low-dimensional points corresponding to the high-dimensional x_i and x_j, respectively. This mapping establishes the dimensionality reduction process. The low-dimensional conditional probability q_{j|i} is defined in the same way as the high-dimensional one.
Two conditional probabilities are thus obtained, and how to use them is the key to this algorithm. If the high-dimensional and low-dimensional distributions are consistent, the two conditional probabilities are equal. The goal is therefore clear: when the two conditional probabilities are equal or differ very little, the dimension reduction is ideal. The authors of SNE introduced the Kullback-Leibler (KL) divergence to measure this difference.
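For reference, a sketch of the low-dimensional conditional probability and the KL cost in their standard SNE form, which we assume matches the definitions used here (the low-dimensional Gaussian has a fixed variance, so no σ appears):

```latex
q_{j|i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)},
\qquad
C = \sum_i \mathrm{KL}(P_i \,\Vert\, Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}
```

The cost is zero exactly when q_{j|i} = p_{j|i} for all pairs, which is the ideal case described above.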
The dimension reduction produced by SNE is poorly suited to clustering. Clustering algorithms cannot cluster the reduced data effectively, so such dimension reduction is of little use for clustering, because SNE pays more attention to local structure and ignores global structure. In addition, symmetric SNE suffers from the crowding problem.
The t-distribution and the Gaussian distribution perform similarly when there is no interfering data. When there are outliers in the sample, however, they differ: the fit of the Gaussian distribution is not as good as that of the t-distribution. The t-distribution keeps the internal structure of the original data unchanged, with a small variance, and it preserves and shows the characteristics of the original data better. The crowding problem can therefore be solved by using the t-distribution instead of the Gaussian distribution. Equation (6) shows the result after replacing the Gaussian distribution with the t-distribution.
The KL divergence is then minimized to find the optimal low-dimensional embedding.
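A sketch of these two steps in the standard t-SNE form, assumed here: the low-dimensional similarity uses a Student t-distribution with one degree of freedom, and the cost is the KL divergence between the symmetrized joint distributions, with p_{ij} = (p_{j|i} + p_{i|j}) / 2n for n data points.

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
```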
However, we find that the FDP algorithm cannot effectively cluster the data after t-SNE dimensionality reduction.
Figures 1-3 illustrate the problem. The data in Figure 1 has only six groups of features. After dimensionality reduction, each class is separated and the classes are not mixed together. Figure 2 has 7 groups of features. Here a small green patch appears near a large yellow block, and a small yellow patch appears near the green block. If a clustering algorithm is applied, the small green patch will be assigned to the yellow class and the small yellow patch to the green class, which produces errors in the final clustering result. Many other such cases also lead to errors in the final clustering results. Figure 3 has 10 groups of data. Points of the same color should appear in the same region, but the figure shows that many colors are mixed together and are not effectively separated. If a clustering algorithm is applied at this point, its accuracy will be reduced, and it may even give wrong results. In the next section, we propose an improvement.

Algorithm Improvement
3.1. Feature Extraction. Figures 1-3 show that the result of the algorithm becomes worse as the number of features increases: the performance of t-SNE [19] decreases as data features are added. Points are no longer placed at the right location, which inevitably degrades the performance of the clustering algorithm. We therefore decided to introduce random forest to improve the performance of t-SNE. The data in scientific research and production are very complex, but the main features that distinguish the data points from each other are sometimes few. For example, fruit can be distinguished by shape and color. Therefore, effectively selecting the main features can improve the performance of the t-SNE algorithm.

The feature extraction of random forest [20] is mainly based on the out-of-bag (OOB) principle. If a feature is important, then introducing a certain amount of noise into the data of this feature, and only this feature, should greatly change the performance of the random forest model. Conversely, if a feature is unimportant, the performance of the model will not change much. Data has many features, and we need to calculate the importance value of all of them. First, a random forest is established, with C4.5 decision trees. Then, for a given feature, the error on the out-of-bag data is calculated with the established random forest and recorded as errOOB1. Next, interference data (noise) is added to the same feature, and the error is calculated again as errOOB2. The importance of feature x is the average increase in out-of-bag error over the trees:

$$\mathrm{Importance}_x = \frac{1}{N} \sum \left(\mathrm{errOOB2} - \mathrm{errOOB1}\right)$$

where N is the number of trees and the sum runs over the trees. If adding random noise to a feature greatly reduces the out-of-bag accuracy, Importance_x is large, which means that the feature is a main feature with a high degree of importance. The feature selection process sorts the feature variables in descending order of Importance_x. A new feature set is obtained by determining the deletion ratio and eliminating the unimportant features from the current feature variables. After calculating the importance value of all the features, we calculate their average value. Features whose importance value is less than 10%-20% of the mean value are removed, and the remaining features are the main features. After dimensionality reduction of these features, the performance of FDP improves.

Figures 4 and 5 show the effect after extracting the main features. The improved t-SNE keeps each original class together instead of leaving part of a class behind, whereas t-SNE, PCA, and locally linear embedding (LLE) [21] cannot effectively separate the data, which affects the clustering effect. Figure 4 compares the improved t-SNE with t-SNE, PCA, and LLE, which are shown in (a), (b), (c), and (d), respectively. We use the wine dataset from the UCI repository, which has 3 categories and 13 features. The goal of this experiment is to reduce this data to 2 dimensions. We use the same color to represent the same class and use the four algorithms to reduce the original data to 2 dimensions. It can be seen that t-SNE, PCA, and LLE do not effectively place the same color in the same area. In t-SNE, the three colors are mixed together in the region from -200 to 200. The same is true of PCA in the region from -200 to 200, and of LLE. In the improved t-SNE, green, purple, and blue are effectively divided into three parts. In this way, the clustering algorithm can cluster more effectively.
In Figure 5, the improved t-SNE, t-SNE, PCA, and LLE are shown in (a), (b), (c), and (d), respectively. We use the digits dataset from the UCI repository, which is 64-dimensional. The goal of this experiment is to reduce this data to 2 dimensions. We use the same color to represent the same class and use the four algorithms to reduce the original data to 2 dimensions. First, all the colors in (c) are mixed up in a disorderly way: after dimensionality reduction, the classes are not separated effectively, which makes the subsequent clustering worse or even wrong. In (d), although each color lies roughly along a straight line, only pink and dark blue are clearly separated; the other colors are mixed together. The result in (b) is better than that in (c) and (d), but a large number of small patches are not placed in the corresponding position. For example, a small patch of light blue lies far from the main mass of light blue and closer to other colors, so the subsequent clustering algorithm cannot classify it correctly, which leads to errors in the result. The improved algorithm reduces the occurrence of both cases, so it improves the accuracy of the clustering algorithm after dimension reduction.
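The feature-screening rule of this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that per-tree out-of-bag errors before and after adding noise are already available, and the function names, feature names, and error values are all hypothetical.

```python
def oob_importance(err_oob1, err_oob2):
    """Average increase in OOB error over the trees for one feature."""
    n_trees = len(err_oob1)
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / n_trees

def select_main_features(importance, ratio=0.1):
    """Keep features whose importance is at least `ratio` times the mean importance."""
    mean_importance = sum(importance.values()) / len(importance)
    threshold = ratio * mean_importance
    return [name for name, value in importance.items() if value >= threshold]

# Hypothetical per-tree OOB errors (errOOB1, errOOB2) for three features.
oob_errors = {
    "color": ([0.10, 0.12, 0.08], [0.30, 0.35, 0.25]),
    "shape": ([0.10, 0.12, 0.08], [0.22, 0.20, 0.18]),
    "label_id": ([0.10, 0.12, 0.08], [0.10, 0.125, 0.08]),
}
importance = {name: oob_importance(e1, e2) for name, (e1, e2) in oob_errors.items()}
main_features = select_main_features(importance, ratio=0.1)
```

With `ratio=0.1`, the feature whose error barely changes under noise falls below 10% of the mean importance and is dropped; the remaining features would then be passed to t-SNE.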
3.2. Self-Adaptive Selection. We now turn to FDP. In further experiments, we found several problems and defects in the algorithm itself. Both the selection of d_c and the judgment of the decision graph affect the final clustering result. In this paper, we address the decision graph selection problem; we will improve the selection of d_c, which is also very meaningful, in follow-up work. In the decision graph step, the experimenter needs to decide the cluster centers manually. This greatly affects the clustering performance: when manual selection is introduced into the algorithm, its accuracy is greatly reduced, and its ease of use suffers if the algorithm is to be promoted and extended. Through a large number of experiments, we solve the problem of manually selecting the center points and improve the accuracy of the original algorithm.
To address the problem that center points cannot be selected effectively in FDP, we design a new judgment system based on the difference and the change rate, so that the algorithm selects the centers adaptively without manual inspection. We make full use of ρ_i and δ_i. First, ρ_i and δ_i are multiplied, and the product is denoted λ_i. The improved algorithm relies on the calculation of λ. We sort the λ values in descending order and index them from 1, so that λ_1 is the largest, and so on. We then analyze the sorted λ values mathematically. First, we use Equation (10) to obtain the difference Δ between two adjacent terms. Then, the calculated difference Δ is processed by Equation (11), whose main idea is to obtain the change rate θ of adjacent differences. These two values are the prerequisites for judging the core points. The pseudocode of the algorithm RCD-FDP describes this procedure.
Next, we calculate the arithmetic mean values of Δ and θ to get avg(Δ) and avg(θ), respectively. The problem is thus transformed into comparing Δ and θ with avg(Δ) and avg(θ). A series of experiments shows that when Δ is less than 10%-25% of avg(Δ) and θ is less than 50% of avg(θ), the judgment can stop and the center points are obtained. Table 1 is a manually constructed dataset. Due to the large amount of data, only the first 11 samples are shown in this paper. It can be seen that judging by hand cannot accurately determine the cluster centers. However, by introducing Δ, θ, avg(Δ), and avg(θ), the core points can be determined adaptively, which avoids the error caused by manual judgment. The pseudocode RCD-FDP shows the process of the algorithm.
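The adaptive selection rule can be sketched as below. Several details are assumptions made only for illustration, since the exact forms of Equations (10) and (11) are not restated here: Δ_k is taken as λ_k − λ_{k+1}, θ_k as the ratio Δ_{k+1}/Δ_k, and the scan stops at the first index where both thresholds are met, with all earlier points returned as centers. The function name and the toy values are hypothetical.

```python
def select_centers(rho, delta, delta_ratio=0.25, theta_ratio=0.5):
    """Pick cluster centers from lambda = rho * delta without a manual decision graph."""
    # Sort lambda in descending order, remembering the original point index.
    lam = sorted(((r * d, i) for i, (r, d) in enumerate(zip(rho, delta))), reverse=True)
    values = [v for v, _ in lam]
    if len(values) < 3:
        return [idx for _, idx in lam]
    # Assumed form of the difference: gap between adjacent sorted lambda values.
    diffs = [values[k] - values[k + 1] for k in range(len(values) - 1)]
    # Assumed form of the change rate: ratio of adjacent gaps (guarded against zero gaps).
    thetas = [diffs[k + 1] / max(diffs[k], 1e-12) for k in range(len(diffs) - 1)]
    avg_diff = sum(diffs) / len(diffs)
    avg_theta = sum(thetas) / len(thetas)
    for k in range(len(thetas)):
        # Stop once both the gap and its change rate fall below their thresholds.
        if diffs[k] < delta_ratio * avg_diff and thetas[k] < theta_ratio * avg_theta:
            return [idx for _, idx in lam[:k]]
    return [idx for _, idx in lam]

# Toy values: three points stand out with large rho * delta (illustrative only).
rho = [10, 8, 6, 2, 2, 1, 3, 1, 2, 1]
delta = [5.0, 6.0, 5.0, 1.0, 0.9, 1.7, 0.5, 1.4, 0.6, 1.0]
centers = select_centers(rho, delta)
```

Under these assumptions, the change-rate condition keeps the scan from stopping between two centers whose λ values happen to be close, because the gap that follows them is still large.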
Combining the improved t-SNE dimension reduction with RCD-FDP gives the algorithm that we call IT-RCD-FDP, which has an obvious effect on high-dimensional datasets.

Evaluation Criterion.
This section shows the performance of the improved algorithm through experimental comparison. The digits, wine, and heart disease datasets from the UCI repository are selected. On these datasets, the experiment uses commonly used evaluation criteria [22] to compare our algorithm with the original FDP algorithm, K-means, DBSCAN, and spectral clustering. Four evaluation criteria are used: accuracy (ACC), adjusted Rand index (ARI), normalized mutual information (NMI), and completeness.
Accuracy [23] is one of the most commonly used measures and is often used in binary classification. In clustering, the multiclass problem is transformed into a two-class problem by judging consistency. If the tested algorithm is perfect, the calculated value is 1. The accuracy of the evaluated algorithm is calculated by Equation (12).

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}. \quad (12)$$

The Rand index (RI) [24] is calculated by comparing the predicted labels with the existing ground-truth labels. For every pair of samples, it judges whether the two samples fall in the same cluster, and it counts the pairs placed in the same cluster and the pairs placed in different clusters. However, RI does not behave well for random labels, and the adjusted RI (ARI) [25] was proposed to solve this problem. Equations (13) and (14) show the calculation of RI and ARI. Given the ground-truth class assignments and the assignments predicted by the clustering algorithm for the same samples, mutual information (MI) [26] expresses the degree of agreement between the two in the form of joint probability. It measures the quality of the clustering result by comparing the joint probability distribution of the existing labels and the cluster labels. Normalized MI (NMI) [27] is obtained by normalizing MI with the arithmetic mean of the entropies of the two labelings. Equations (15), (16), and (17) show the calculation of MI and NMI.
Completeness [28,29] measures, given the ground truth, the extent to which all members of a class are assigned to the same cluster. Its idea can be illustrated by an example: there are three classes, and each class has a fixed set of students. When all the students go back to their own class, the value is largest. When the clustering algorithm guides the students back to class, errors occur, so that students who do not belong to a class appear in it, and the value decreases.
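Two of the criteria above can be sketched with the standard library. The sketch assumes the usual pair-counting definition of the Rand index and normalizes MI by the arithmetic mean of the two label entropies; the function names and label vectors are illustrative, and ARI and completeness are omitted for brevity.

```python
import math
from collections import Counter
from itertools import combinations

def rand_index(truth, pred):
    """Fraction of sample pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(truth, pred):
    n = len(truth)
    joint = Counter(zip(truth, pred))
    count_t, count_p = Counter(truth), Counter(pred)
    return sum((c / n) * math.log(c * n / (count_t[t] * count_p[p]))
               for (t, p), c in joint.items())

def nmi(truth, pred):
    """MI divided by the arithmetic mean of the two entropies."""
    denom = (entropy(truth) + entropy(pred)) / 2
    return mutual_information(truth, pred) / denom if denom > 0 else 1.0

truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # same partition, different label names
noisy = [0, 0, 1, 1, 1, 1]     # one sample placed in the wrong cluster
```

Both measures depend only on the partition, not on the label names, so `perfect` scores 1 even though its labels are swapped.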

Data and Comparison
Experiments. The experimental datasets are the digits dataset, the wine dataset, and the heart disease (Cleveland) dataset. The following experiments on UCI datasets show that the proposed algorithm performs well under the four evaluation indexes. The UCI wine dataset is 13-dimensional, the UCI heart disease dataset is 14-dimensional, and the UCI digits dataset is 64-dimensional. Using the above evaluation criteria, the proposed algorithm is compared with the original FDP, K-means, DBSCAN, and spectral clustering algorithms.
For each dataset, we use the idea of 10-fold cross-validation. Each dataset is divided into ten parts according to the number of samples, and the parts are labeled from 1 to 10. In the first experiment, the part labeled 1 is removed and the remaining data is used for testing, which gives the values of all clustering algorithms under the different evaluation indexes. Repeating this for each part gives 10 sets of results; for example, ACC of IT-RCD-FDP has 10 results. We remove the maximum and minimum values and then take the average of the rest as the final experimental result. For the K-means algorithm, the center points are selected randomly during initialization, which brings a certain amount of randomness. To eliminate this randomness, we select the first k samples as the centers in the experiment. The results after this processing are given in Tables 2-4. Table 2 shows the comparison results under the four evaluation indexes on the UCI digits dataset, where IT-RCD-FDP is our improved algorithm. The results show that its ACC, ARI, and completeness are the best. On the digits dataset, the NMI of IT-RCD-FDP is 7.2% higher than that of the other algorithms. Table 3 shows the corresponding comparison results on the UCI wine dataset. The results again show that ACC, ARI, and completeness are the best. On the wine dataset, the NMI of IT-RCD-FDP is 11% higher than that of the other algorithms. Across wine and digits, the NMI of IT-RCD-FDP is 23% higher than that of the original FDP. As can be seen from Table 4, although IT-RCD-FDP scores the highest on NMI, ARI, and completeness, the overall level is very low. This also shows that clustering algorithms are weak in the medical field. We hope to make progress on such scenarios in later work.
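The aggregation of the ten per-fold results described above (drop the maximum and the minimum, then average the rest) can be sketched as follows. The function name and the scores are hypothetical and are not taken from Tables 2-4.

```python
def trimmed_mean(scores):
    """Drop one maximum and one minimum value, then average the remainder."""
    if len(scores) < 3:
        raise ValueError("need at least three scores")
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)

# Hypothetical ACC values from the ten runs of one algorithm.
fold_acc = [0.81, 0.79, 0.85, 0.80, 0.82, 0.78, 0.90, 0.83, 0.60, 0.84]
final_acc = trimmed_mean(fold_acc)
```

Dropping the extremes keeps a single unusually good or bad split from dominating the reported value.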

4.3. Application. We now turn to applications. As mentioned above, clustering algorithms also have good applications in wireless sensor networks, because the wireless sensors distributed in space are just like the data points in a dataset: they all have their own characteristics and attributes. Managing and controlling these sensors requires understanding and analyzing them. If wireless sensors are regarded as data points, clustering can be used to study and analyze them. In wireless sensor networks, the selection of the cluster head is particularly important. The cluster head is just like the cluster center, so the cluster head selection problem is transformed into the cluster center selection problem, and the algorithm clusters according to the centers. Among the many algorithms, the low-energy adaptive clustering hierarchy (LEACH) algorithm [30] is one of the most reliable and efficient. Although LEACH performs very well, the cluster heads it selects can be too concentrated. To solve this problem, we introduce IT-RCD-FDP into LEACH. The IT-RCD-FDP algorithm is used to cluster all the sensors in a wireless sensor network. Because it is adaptive, the centers are selected automatically and the sensors are clustered around them, so the wireless sensor network is automatically divided into several clusters. Then, in each round, the sensor with the largest energy in each cluster is selected as the cluster head. In this way, we can solve the problem of LEACH. Figure 6 shows the performance of applying IT-RCD-FDP to LEACH.
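The per-round cluster head choice described above can be sketched as follows, assuming the cluster assignment has already been produced by the clustering step. The function name, sensor identifiers, and energy values are hypothetical.

```python
def select_cluster_heads(cluster_of, energy):
    """For each cluster, pick the sensor with the largest remaining energy."""
    heads = {}
    for sensor, cluster in cluster_of.items():
        best = heads.get(cluster)
        if best is None or energy[sensor] > energy[best]:
            heads[cluster] = sensor
    return heads

# Hypothetical cluster assignment and residual energy for five sensors.
cluster_of = {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 1}
energy = {"s1": 0.40, "s2": 0.75, "s3": 0.90, "s4": 0.20, "s5": 0.65}
heads = select_cluster_heads(cluster_of, energy)
```

Rerunning the selection each round with updated energy values rotates the cluster head role within each cluster, while the cluster layout itself stays fixed by the clustering step.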

Conclusions
This paper starts from two problems of FDP. First, the FDP algorithm has difficulty dealing with high-dimensional data. This paper introduces the improved t-SNE algorithm, which improves the ability of the original algorithm to process high-dimensional data. Second, the FDP algorithm cannot select centers adaptively. In this paper, the change rate and the difference are introduced so that the algorithm selects the centers adaptively. Finally, we apply the improved algorithm to WSN cluster head selection and achieve good results. IT-RCD-FDP offers a new way to handle high-dimensional data. Compared with the original algorithm, it finds a threshold point by a mathematical method and, by setting a range, replaces the original manual judgment with automatic operation. In this way, the accuracy of the algorithm can be greatly improved. Through the improvement of t-SNE, the result of FDP is more accurate.

Data Availability
Previously reported data were used to support this study and are available at http://archive.ics.uci.edu/ml. These prior studies are cited at relevant places within the text as references.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.