Spectral Clustering Algorithm Based on Improved Gaussian Kernel Function and Beetle Antennae Search with Damping Factor

The traditional spectral clustering algorithm has two problems. First, when a Gaussian kernel function is used to construct the similarity matrix, different scale parameters lead to different clustering results. Second, the K-means algorithm is often used in the clustering stage; it initializes the cluster centers randomly, which makes the results unstable. In this paper, an improved spectral clustering algorithm is proposed to solve these two problems. To construct the similarity matrix, we propose an improved Gaussian kernel function that is based on the distance information of some nearest neighbors and selects scale parameters adaptively. In the clustering stage, a beetle antennae search algorithm with damping factor is proposed to complete the clustering and overcome the instability of the clustering results. In the experiments, we use four artificial data sets and seven UCI data sets to verify the performance of our algorithm. In addition, four images from the BSDS500 image data set are segmented, and the results show that our algorithm outperforms the comparison algorithms in image segmentation.


Introduction
Clustering analysis is an important research problem in the field of data mining. The purpose of clustering is to divide the data set into different clusters according to the intrinsic structure and relationship between the data so that the similarity between data points within the same cluster is higher, and the similarity between data points in different clusters is lower. The main clustering methods include partitioning-based clustering, hierarchical clustering, density-based clustering, grid-based clustering, and graph theory-based clustering. Different clustering algorithms are also applied to different fields, such as image segmentation [1][2][3][4], text clustering [5,6], and community division [7][8][9].
Spectral clustering is a kind of clustering algorithm based on graph theory. By spectral graph partition theory [10], the clustering problem of the data set is transformed into the graph partition problem. In spectral clustering, each data point is regarded as the vertex of the graph, and the similarity between data points is regarded as the weight of the edge. By dividing the graph, the sum of the weight of the edge in the subgraph is as high as possible, and the sum of the weight of the edge between different subgraphs is as low as possible.
In 1973, Donath and Hoffman [10] first proposed the concept of graph partitioning based on the adjacency matrix, marking the formal birth of spectral clustering. In the same year, Fiedler [11] found that the two-way partition of an undirected graph is closely related to the eigenvector corresponding to the second smallest eigenvalue of its Laplacian matrix, which provided a new way to solve the graph partition problem. In 2000, Shi and Malik [12] put forward the normalized cut objective function, also known as the N-cut criterion, based on spectral theory. In 2001, Ding et al. [13] put forward the min-max cut criterion based on N-cut, which balances the two requirements of minimum division loss and maximum vertex number of subgraphs, making the division more inclined to produce balanced cut sets and avoiding the splitting off of small subgraphs with only a few vertices. In 2002, Ng, Jordan, and Weiss [14] proposed the NJW algorithm. Unlike two-way division, this algorithm is based on k-way division, and it is the most widely used spectral clustering algorithm so far. Despite the good development of spectral clustering, some problems remain in the algorithm itself, such as how to select the scale parameter in the Gaussian kernel function. In 2004, scholars [15] showed that the selection of the scale parameter affects the clustering results. To solve this problem, Zhang et al. [16] proposed a construction method for the similarity matrix based on local density, and Nataliani and Yang [17] proposed an energy Gaussian kernel function.
The beetle antennae search (BAS) algorithm is an optimization algorithm inspired by the foraging behavior of beetles, proposed by Jiang and Li [18] in 2017. By simulating the sensing function of the beetle's antennae and its random walking mechanism, the algorithm realizes an optimization process similar to the beetle's foraging. The moving direction of the beetle is determined by the smell of food: when the smell at the left antenna is stronger, it moves to the left; otherwise, it moves to the right. Through the random orientation mechanism and the variable step size mechanism, a beetle can search the global scope. Compared with other intelligent algorithms, BAS does not need gradient information or the specific form of the objective function and has the advantages of fast convergence and low requirements for parameters. It has therefore been applied in several fields. Wang and Liu [19] combined a back-propagation neural network with the BAS algorithm to predict storm disaster losses. Chen et al. [20] used a particle swarm optimization algorithm based on the BAS algorithm to solve a portfolio model. Wang and Chen [21] proposed the beetle swarm antennae search algorithm (BSAS).
The main contributions of this paper are as follows: (1) A construction method for the similarity matrix is proposed, which uses the distance information of some nearest neighbors to define the scale parameter σ and thus removes the influence of a manually specified σ on the results. (2) In the clustering stage, we use the proposed beetle antennae search algorithm with damping factor (DBAS) to complete the clustering. With this intelligent optimization algorithm, we overcome the impact that the random initialization of cluster centers has on the results when K-means is used in traditional spectral clustering. In addition, the damping factor suppresses the oscillation in the iterative process and improves the stability of the algorithm.
The rest of this paper is organized as follows. Section 2 reviews spectral clustering and the beetle antennae search algorithm. In Section 3, an improved spectral clustering algorithm based on the distance information of some nearest neighbors and the beetle antennae search algorithm with damping factor is proposed. Section 4 shows the performance of the algorithm through experimental analysis. The conclusion is presented in Section 5.

Spectral Clustering and Beetle Antennae Search Algorithm

Spectral Clustering.
The spectral clustering algorithm uses the eigenvectors of the Laplacian matrix corresponding to the data set to cluster. First, an undirected graph $G = (V, E)$ is constructed from the data points. Each vertex $v_i$ of the graph corresponds to a data point, and the weight $w_{ij}$ on an edge is the similarity between the two data points it connects. In general, a Gaussian kernel function is used to construct the similarity matrix $W$. Then, we obtain the degree matrix $D$, a diagonal matrix whose main diagonal elements $d_{ii} = \sum_{j=1}^{n} w_{ij}$ equal the corresponding row sums of the similarity matrix. There are usually three ways to construct the Laplacian matrix $L$: the unnormalized form $L = D - W$, the symmetric normalized form $L_{sym} = D^{-1/2} L D^{-1/2}$, and the random-walk normalized form $L_{rw} = D^{-1} L$. The eigenvectors $e_1, e_2, \ldots, e_k$ corresponding to the first $k$ eigenvalues of the Laplacian matrix are then calculated, and we set $U = [e_1\ e_2\ \ldots\ e_k]$. A new feature matrix $F$ is obtained by normalizing $U$. Each row of the feature matrix $F$ is regarded as a sample, and these samples are clustered to obtain a group of clusters $C_1, C_2, \ldots, C_k$. The NJW algorithm [14] is the most commonly used spectral clustering algorithm. The basic steps of the NJW algorithm are shown in Algorithm 1.
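As a minimal sketch of the graph construction described above, the following Python code builds a Gaussian similarity matrix, the degree entries $d_{ii}$, the unnormalized Laplacian $L = D - W$, and its symmetric normalized form for a toy data set. The function names, the toy points, and the value of `sigma` are illustrative assumptions and are not taken from the paper; the eigendecomposition step is left out to keep the example dependency-free.

```python
import math

def gaussian_similarity(points, sigma):
    """w_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for every pair of points."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return w

def degree_vector(w):
    """d_ii = sum_j w_ij, returned as the list of diagonal entries of D."""
    return [sum(row) for row in w]

def unnormalized_laplacian(w):
    """L = D - W."""
    d = degree_vector(w)
    n = len(w)
    return [[(d[i] if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]

def symmetric_normalized_laplacian(w):
    """L_sym = D^{-1/2} (D - W) D^{-1/2}."""
    d = degree_vector(w)
    lap = unnormalized_laplacian(w)
    n = len(w)
    return [[lap[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
            for i in range(n)]

# Toy data: two tight pairs of points that are far from each other.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
W = gaussian_similarity(points, sigma=1.0)
L = unnormalized_laplacian(W)
L_sym = symmetric_normalized_laplacian(W)
```

Each row of `L` sums to zero by construction, which is a quick sanity check on the degree matrix. In a full pipeline the eigenvectors of the chosen Laplacian would be computed with a numerical linear algebra library.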

Beetle Antennae Search Algorithm (BAS).
Based on the principle of the beetle's foraging, three simplifying assumptions are made: (1) The left and right antennae of the beetle are located on the two sides of the individual. (2) The ratio of the step length of each move to the distance between the two antennae is a fixed constant. (3) After each move, the direction of the head is random. Then, we can build an optimization model in which the beetle is simplified as an individual. For an optimization problem in $n$-dimensional space, $x_l$ denotes the coordinates of the left antenna of the individual, $x_r$ denotes the coordinates of the right antenna, and $x$ is the centroid coordinate. $D_0$ is the distance between the two antennae. Since the orientation of the individual is random after each move, the direction of the vector pointing from the right antenna to the left antenna is also random. It can be expressed by a normalized random vector $\vec{dir} = \vec{r} / \lVert \vec{r} \rVert$ with $\vec{r} = \mathrm{rands}(n, 1)$. Therefore, $x_l - x_r = D_0 \cdot \vec{dir}$, which, with $x$ as the midpoint of the two antennae, gives $x_l = x + D_0 \cdot \vec{dir} / 2$ and $x_r = x - D_0 \cdot \vec{dir} / 2$. Let $f_{left} = f(x_l)$ and $f_{right} = f(x_r)$ be the values of the objective function $f$ at the two antennae. If $f_{left}$ is less than $f_{right}$, the individual moves a distance $step$ in the direction of the left antenna; otherwise, it moves a distance $step$ in the direction of the right antenna.
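The model above can be sketched in a few lines of Python for a minimization problem. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names, the multiplicative step decay `eta`, the default parameter values, and the bookkeeping of the best visited position are all assumptions added for the example.

```python
import math
import random

def random_direction(n, rng):
    """Normalized random vector in n dimensions (the direction 'dir')."""
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]

def antennae(x, direction, d0):
    """Left and right antenna positions, d0 apart, centered on x."""
    x_left = [xi + d0 * di / 2.0 for xi, di in zip(x, direction)]
    x_right = [xi - d0 * di / 2.0 for xi, di in zip(x, direction)]
    return x_left, x_right

def bas_minimize(f, x0, step=1.0, eta=0.95, ratio=5.0, iterations=200, seed=0):
    """Basic BAS loop; returns the best position visited and its value."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for _ in range(iterations):
        direction = random_direction(len(x), rng)
        # The antenna distance is tied to the step length by a fixed ratio.
        x_left, x_right = antennae(x, direction, step / ratio)
        # Move one step toward the antenna with the smaller objective value.
        sign = 1.0 if f(x_left) < f(x_right) else -1.0
        x = [xi + sign * step * di for xi, di in zip(x, direction)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        step *= eta  # assumed multiplicative step decay
    return best_x, best_f

def sphere(v):
    """Simple convex test objective."""
    return sum(c * c for c in v)

best_x, best_f = bas_minimize(sphere, [3.0, -2.0])
```

Because each move follows a random direction, the objective value of the current position can increase from one iteration to the next, which is why the sketch records the best position separately.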

Improved Spectral Clustering Algorithm
In this section, we improve the Gaussian kernel function and the BAS algorithm, respectively. After using the new Gaussian kernel function to construct the similarity matrix, we use the spectral clustering procedure to obtain a new feature matrix and then cluster its rows with the improved BAS algorithm.
2 Computational Intelligence and Neuroscience

An Improved Gaussian Kernel Function.
In traditional spectral clustering, the similarity matrix is usually constructed with the Gaussian kernel function given in Step 1 of Algorithm 1, where $\sigma$ is the scale parameter; in general, $\sigma$ is selected manually. In 2004, scholars [15] showed that the selection of the scale parameter affects the clustering results. To solve this problem, this paper proposes a method of constructing the similarity matrix based on the distance information of some nearest neighbors (formula 2), in which the scale parameter $\sigma_i$ of point $i$ is the mean distance from point $i$ to its $g$ nearest points. Here $g$ is the ratio of the total number of samples to the square of the number of clusters, $g = N/k^2$, where $N$ is the total number of samples and $k$ is the number of clusters.
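The adaptive scale can be illustrated with a short Python sketch. The per-point scale (mean distance to the $g$ nearest points, with $g = N/k^2$) follows the description above. Two things are assumptions made only for this example: rounding $g$ down and clamping it to $[1, N - 1]$, and the kernel form $w_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / (\sigma_i \sigma_j))$, which is a common locally scaled kernel and may differ from the paper's formula 2. The function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def local_scales(points, k):
    """sigma_i = mean distance from point i to its g nearest points."""
    n = len(points)
    # Assumption: g = N / k^2 is rounded down and clamped to [1, N - 1].
    g = max(1, min(n - 1, n // (k * k)))
    sigmas = []
    for i in range(n):
        dists = sorted(euclidean(points[i], points[j])
                       for j in range(n) if j != i)
        sigmas.append(sum(dists[:g]) / g)
    return sigmas

def adaptive_similarity(points, k):
    """Assumed kernel: w_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)).

    Requires distinct points, otherwise a scale of zero can occur.
    """
    sigmas = local_scales(points, k)
    n = len(points)
    return [[math.exp(-euclidean(points[i], points[j]) ** 2
                      / (sigmas[i] * sigmas[j]))
             for j in range(n)] for i in range(n)]

# Toy 1-D data with increasing gaps between neighbors.
points = [(0.0,), (1.0,), (3.0,), (7.0,)]
sigmas = local_scales(points, k=2)      # g = 4 // 4 = 1
W = adaptive_similarity(points, k=2)
```

In this toy example the scales grow with the local spacing of the points, so sparse regions receive a wider kernel than dense regions without any manually chosen $\sigma$.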

Beetle Antennae Search Algorithm with Damping Factor (DBAS).
As mentioned in Section 2.2, the direction of the individual is random in each iteration. This causes oscillation during the iterations: the result of iteration $M + 1$ is often worse than that of iteration $M$. We propose adding a damping factor to the position update formula of the individual, so that the position is updated using the results of both the current iteration and the previous iteration. In the update formula, $x^{t-1}$ denotes the position in the $(t-1)$th iteration and $damp \in [0.5, 1)$ is the damping factor. We run the algorithm with and without the damping factor on the Iris data set. Figure 1 shows that adding the damping factor effectively suppresses the oscillation in the iterative process.
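The effect of a damping factor can be sketched as follows. The exact update formula is not reproduced in this text, so the blend used here, $x^{t} = damp \cdot x^{t-1} + (1 - damp) \cdot \hat{x}^{t}$ with $\hat{x}^{t}$ the undamped BAS move, is an assumption modeled on the usual damping scheme of iterative algorithms. The function names are hypothetical, and the direction is passed in explicitly so that the example is deterministic.

```python
def damped_update(x_prev, x_candidate, damp=0.5):
    """Blend the previous position with the new candidate position.

    Assumed form: x_t = damp * x_{t-1} + (1 - damp) * x_candidate.
    """
    if not 0.5 <= damp < 1.0:
        raise ValueError("damp is expected to lie in [0.5, 1)")
    return [damp * p + (1.0 - damp) * c for p, c in zip(x_prev, x_candidate)]

def dbas_step(f, x_prev, direction, step, d0, damp=0.5):
    """One damped BAS move along a given unit direction (minimization)."""
    x_left = [p + d0 * d / 2.0 for p, d in zip(x_prev, direction)]
    x_right = [p - d0 * d / 2.0 for p, d in zip(x_prev, direction)]
    # Undamped BAS candidate: one full step toward the better antenna.
    sign = 1.0 if f(x_left) < f(x_right) else -1.0
    candidate = [p + sign * step * d for p, d in zip(x_prev, direction)]
    return damped_update(x_prev, candidate, damp)

def sphere(v):
    """Simple convex test objective."""
    return sum(c * c for c in v)

# From (2, 0), the undamped move would land on (1, 0); damping halves it.
x_next = dbas_step(sphere, [2.0, 0.0], direction=[1.0, 0.0], step=1.0, d0=0.2)
```

Under this assumed form, a larger `damp` keeps the new position closer to the previous one, which limits how far a single unlucky random direction can push the individual.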

SC-DBAS Algorithm.
First, we use the Gaussian kernel function based on the distance information of some nearest neighbors (formula 2) to construct the similarity matrix and then calculate the corresponding degree matrix and Laplacian matrix. We select the eigenvectors corresponding to the $k$ smallest eigenvalues of the Laplacian matrix to construct an eigenvector matrix and then normalize it to obtain a new feature matrix. Each row of this matrix is regarded as a sample point. For this new data set, we randomly initialize a group of cluster centers as an individual and then use the DBAS algorithm to cluster. The flow of the SC-DBAS algorithm is given in Algorithm 2.
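To make the idea of "a group of cluster centers as an individual" concrete, the sketch below encodes $k$ centers as one flat position vector, which is the kind of object a BAS-style search can move around. The fitness used here, the within-cluster sum of squared distances, is an assumption chosen for illustration because the fitness definition is not reproduced in this text; the function names and the toy feature matrix are likewise hypothetical.

```python
def decode(individual, k, dim):
    """Split a flat position vector into k cluster centers of length dim."""
    return [individual[c * dim:(c + 1) * dim] for c in range(k)]

def sq_dist(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(rows, centers):
    """Label each row with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(row, centers[c]))
            for row in rows]

def fitness(individual, rows, k):
    """Assumed fitness: within-cluster sum of squared distances (lower is better)."""
    centers = decode(individual, k, len(rows[0]))
    return sum(min(sq_dist(row, c) for c in centers) for row in rows)

# Rows of a toy feature matrix F forming two well-separated groups.
F = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
individual = [1.0, 0.0, 0.0, 1.0]   # two 2-D centers, flattened
labels = assign(F, decode(individual, k=2, dim=2))
score = fitness(individual, F, k=2)
```

With this encoding, the left and right antennae are simply two perturbed copies of the flat vector, and the final clusters are read off by assigning each row to its nearest decoded center.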

Computational Complexity.
The computational complexity of the proposed algorithm can be calculated as follows. The SC-DBAS algorithm is divided into three parts: (1) constructing the similarity graph, which needs $O(n^2)$; (2) eigenvalue decomposition, which needs $O(n^3)$; and (3) clustering with the DBAS algorithm, which needs $O(nkl)$, where $n$ is the number of samples, $k$ is the number of cluster centers, and $l$ is the number of iterations. Since the $O(n^3)$ term dominates, the computational complexity of the proposed algorithm is $O(n^3)$.

Experimental Setting.
All the experiments are conducted on a computer with an Intel Core i5-3230M CPU and 8 GB of RAM. The experimental environment is Matlab 2016b. We compare the proposed algorithm with K-means, NJW [14], the MPSC algorithm [22], the PGSC algorithm [17], and the SC-NP algorithm [23] on four artificial data sets and seven UCI data sets. The proposed algorithm is also applied to images from the BSDS500 data set for image segmentation. In the image segmentation experiments, the comparison algorithms are K-means, NJW [14], the PGSC algorithm [17], and the SC-NP algorithm [23].
In the experiment, the parameters are set as follows: $step = 0.1$; step adjustment factor $eta = 0.95$; the ratio between $step$ and $D_0$ is 5; the number of iterations $n = 100$; and $damp = 0.5$. The information of the data sets is shown in Table 1.
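The parameter settings above imply a schedule for the step length and the antenna distance. The sketch below tabulates it under the assumption, common in BAS implementations but not stated explicitly here, that the step is multiplied by `eta` after every iteration and that $D_0 = step / 5$ throughout. The function name is hypothetical.

```python
def step_schedule(step=0.1, eta=0.95, ratio=5.0, iterations=100):
    """Yield (step, d0) per iteration, assuming step is multiplied by eta each time."""
    for _ in range(iterations):
        yield step, step / ratio
        step *= eta

schedule = list(step_schedule())
first_step, first_d0 = schedule[0]
last_step, last_d0 = schedule[-1]
```

Under these assumptions the search starts with a step of 0.1 and an antenna distance of 0.02, and the step has shrunk below 0.001 by the last of the 100 iterations, so late iterations only fine-tune the cluster centers.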

Evaluation Indicators.
In the experiment, we use four indicators to evaluate the clustering results: accuracy, ARI, F1 score, and time (s).
(1) Accuracy rate: the accuracy rate is the proportion of correctly clustered samples among all samples, where $V$ is the division label and $U$ is the real label.
(2) ARI: comparing the calculated result $V$ with the real label $U$ gives four cases for each pair of samples. SS contains sample pairs that belong to the same cluster in $V$ and the same cluster in $U$. SD contains sample pairs that belong to the same cluster in $V$ but not the same cluster in $U$. DS contains sample pairs that do not belong to the same cluster in $V$ but belong to the same cluster in $U$. DD contains sample pairs that do not belong to the same cluster in $V$ and do not belong to the same cluster in $U$. Let $a$, $b$, $c$, and $d$ denote the numbers of sample pairs in SS, SD, DS, and DD, respectively. A larger ARI value means that the clustering result is more consistent with the real labels.
(3) F1 score: the F1 score is one of the commonly used evaluation criteria in information retrieval. It is a weighted harmonic mean of precision and recall and is defined in terms of $a$, $b$, and $c$ given above.
(4) Time: in this paper, we use the average running time of each algorithm over 100 runs as the evaluation index.

Experimental Results of Artificial Data Sets.
Table 2 shows the experimental results of the six algorithms on the four artificial data sets. From Figure 2, we can see that our proposed algorithm divides data sets of various structures well.
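The pair-based indicators described under Evaluation Indicators can be computed with a short Python sketch. It counts the sample pairs in SS, SD, DS, and DD and derives ARI and F1 from them. The naming of the counts as `a`, `b`, `c`, `d`, the closed form $ARI = 2(ad - bc) / ((a + b)(b + d) + (a + c)(c + d))$ (algebraically equivalent to the usual adjusted Rand index), and the pairwise definitions precision $= a/(a+b)$ and recall $= a/(a+c)$ are assumptions of this example, since the paper's formulas are not reproduced in this text.

```python
from itertools import combinations

def pair_counts(v, u):
    """Count sample pairs: a=|SS|, b=|SD|, c=|DS|, d=|DD| (v: division, u: real)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(v)), 2):
        same_v, same_u = v[i] == v[j], u[i] == u[j]
        if same_v and same_u:
            a += 1
        elif same_v:
            b += 1
        elif same_u:
            c += 1
        else:
            d += 1
    return a, b, c, d

def adjusted_rand_index(v, u):
    """ARI from pair counts; returns 1.0 in the degenerate zero-denominator case."""
    a, b, c, d = pair_counts(v, u)
    denom = (a + b) * (b + d) + (a + c) * (c + d)
    return 2.0 * (a * d - b * c) / denom if denom else 1.0

def pairwise_f1(v, u):
    """Harmonic mean of pairwise precision a/(a+b) and recall a/(a+c)."""
    a, b, c, _ = pair_counts(v, u)
    if a == 0:
        return 0.0
    precision, recall = a / (a + b), a / (a + c)
    return 2.0 * precision * recall / (precision + recall)

U = [0, 0, 0, 1, 1, 1]   # real labels
V = [1, 1, 0, 0, 0, 0]   # division labels from a clustering run
counts = pair_counts(V, U)
ari = adjusted_rand_index(V, U)
f1 = pairwise_f1(V, U)
```

Both indicators depend only on which samples are grouped together, so they are unaffected by how the cluster labels are numbered.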
Step 1: use the Gaussian kernel function to construct the similarity matrix $W$, $w_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$.
Step 2: calculate the degree matrix $D$, $d_{ii} = \sum_{j=1}^{n} w_{ij}$.
Step 3: construct the Laplacian matrix $L$.
Step 4: calculate the eigenvectors corresponding to the first $k$ eigenvalues of $L$, and construct the feature matrix $U$.
Step 5: normalize the feature matrix $U$ to obtain a normalized matrix $Y$, which contains $n$ points in a space reduced to $k$ dimensions.
Step 6: treat each row of $Y$ as a point, and cluster the points with the K-means algorithm.
ALGORITHM 1: NJW algorithm.

Step 1: use the improved Gaussian kernel function (formula 2) to construct the similarity matrix.
Step 2: calculate the corresponding degree matrix.
Step 3: calculate the corresponding Laplacian matrix.
Step 4: calculate the eigenvectors $e_1, e_2, \ldots, e_k$ corresponding to the $k$ smallest eigenvalues of the Laplacian matrix, which form the eigenvector matrix $U$.
Step 5: normalize the matrix $U$ to obtain a new feature matrix $F$.
Step 6: treat each row of the feature matrix $F$ as a data point.
Step 7: randomly initialize a group of cluster centers as an individual.
Step 8: calculate the fitness of the right antenna, $f_{right}$, and of the left antenna, $f_{left}$, of the current individual.
Step 9: update the position of the individual with the position update formula that includes the damping factor.
Step 10: repeat steps 8 and 9 until the maximum number of iterations is reached.
Step 11: according to the cluster centers corresponding to the last individual position, obtain the clusters $C_1, C_2, \ldots, C_k$.
Output: $C_1, C_2, \ldots, C_k$.
ALGORITHM 2: SC-DBAS algorithm.

Table 3 and Figure 3 show the experimental results of the six algorithms on seven UCI data sets. By comparing the results, we can see that the algorithm proposed in this paper performs better than the other five algorithms and has a shorter running time.

Application of the SC-DBAS Algorithm to Image Segmentation.
Clustering-based image segmentation is based on the similarity between image pixels; through some clustering algorithms, the pixels are divided into different clusters so as to complete the segmentation of the original image.
In this section, we segment some images from the BSDS500 data set. For a 481 × 321 pixel image, treating each pixel as a data point would give 154,401 data points. Therefore, to reduce the number of data points, we first use the SLIC algorithm [24] to presegment the image (superpixel segmentation). Each superpixel is an oversegmented region and is treated as one data point. Then, the proposed algorithm is used to segment the image. In the experiment, the number of superpixels of each image is 200. The comparison algorithms used in the experiment are K-means, NJW [14], the PGSC algorithm [17], and the SC-NP algorithm [23]. The results are given in Figure 4.
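A short calculation shows why the superpixel presegmentation matters for a method whose similarity matrix is $n \times n$. The sketch assumes a dense matrix of 64-bit floating-point entries (8 bytes each); the function name is hypothetical.

```python
def dense_similarity_bytes(n_points, bytes_per_entry=8):
    """Memory needed for a dense n x n similarity matrix of 64-bit floats."""
    return n_points * n_points * bytes_per_entry

n_pixels = 481 * 321                              # one data point per pixel
pixel_level = dense_similarity_bytes(n_pixels)    # similarity matrix on raw pixels
superpixel_level = dense_similarity_bytes(200)    # similarity matrix on 200 superpixels
```

Under this assumption, a pixel-level similarity matrix would need roughly 190 GB, whereas the matrix for 200 superpixels needs 320 KB, before even considering the cubic cost of the eigenvalue decomposition.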
From the experimental results, we can see that our algorithm separates the object from the background better, whereas the other four comparison algorithms produce wrongly segmented regions. The segmentation accuracy results are shown in Table 4.

Conclusion
In this paper, an improved spectral clustering algorithm combined with an improved BAS algorithm is proposed. The proposed algorithm first improves the construction of the similarity matrix by using the distance information of some nearest neighbors of each point to calculate the corresponding scale parameters. In the clustering stage, we propose a BAS algorithm with damping factor, which overcomes the repeated oscillation of the original algorithm in the iterative process. The experimental results show that our algorithm performs better than the comparison algorithms on UCI data sets, artificial data sets, and image segmentation. However, in image segmentation, our results are affected by the quality of the superpixel segmentation. In future work, we will improve the algorithm so that it can segment an image directly without preprocessing, and we will use more real images and medical images to verify it.

Data Availability
The four artificial data sets, which were manually generated, can be obtained by contacting the author. The seven UCI data sets are widely used in the existing literature and come from the UCI Machine Learning Repository, available at http://archive.ics.uci.edu/ml/datasets.php. The four tested images are from the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500) of the Berkeley computer vision group, available at https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html.