An Improved Spectral Clustering Community Detection Algorithm Based on Probability Matrix

The similarity graphs of most spectral clustering algorithms carry lots of wrong community information. In this paper, we propose a probability matrix and a novel improved spectral clustering algorithm based on the probability matrix for community detection. First, the Markov chain is used to calculate the transition probability between nodes, and the probability matrix is constructed by the transition probability. Then, the similarity graph is constructed with the mean probability matrix. Finally, community detection is achieved by optimizing the NCut objective function. The proposed algorithm is compared with SC, WT, FG, FluidC, and SCRW on artificial networks and real networks. Experimental results show that the proposed algorithm can detect communities more accurately and has better clustering performance.


Introduction
With the development of information technology, the interactions among complex systems in biology, sociology, and other fields are becoming ever closer. It is of great theoretical significance and practical value to obtain relevant information from real complex systems. According to graph theory, most real complex systems whose internal entities have rich associations can be abstracted into complex networks, such as neural networks, power networks, and social networks. In addition to the small-world and scale-free properties, complex networks have an extremely important community structure [1]. A community is a mesoscopic structure in which nodes from the same community are closely connected to each other, while nodes from different communities are sparsely connected. It plays an important role in revealing the topological structure and functional features of complex networks. In recent years, community detection has been a popular research field for searching information, analysing function, and forecasting behaviour.
Community detection is a process of dividing a network into many clusters according to certain relationships among nodes. Moreover, community detection can classify nodes based on the topological structure of the network. It can reveal the hidden hierarchical structure of the real network and improve the performance and efficiency of storing, processing, and analysing network data. So far, there are many methods for community detection, such as spectral bisection algorithm [2], graph segmentation algorithm [3], heuristic algorithm [4], and objective optimization algorithm [5].

Related Work
Community detection is an important branch of complex networks. Among the traditional community detection algorithms, the most famous algorithm is the spectral analysis algorithm based on network topology, which is referred to as spectral clustering [2] in the following. Its main idea is eigendecomposing the similarity matrix of the network to obtain the main eigenvectors for finding communities. Not only is the spectral clustering algorithm applicable for a variety of data structures but also it utilizes dimensionality reduction to reduce computational complexity. Consequently, scholars began to research spectral clustering and optimize and expand on it.
Qin et al. [6] proposed a multisimilarity spectral method for clustering dynamic networks. It detects communities by bootstrapping the clustering of different similarity measures. Ulzii and Sanggil [7] designed an agglomerative spectral clustering method with conductance and edge weights. The most similar nodes are agglomerated based on eigenvector space and edge weights. Ding et al. [8] explored the equivalence relation between the nonnegative matrix factorization and spectral clustering and developed a semisupervised spectral clustering algorithm.
Spectral clustering typically constructs a similarity matrix from the Euclidean distance between nodes. However, the Euclidean distance may lose the hidden relationships among nodes. As a result, the similarity matrix does not contain complete community information, and the clustering performance is unsatisfactory. If the constructed similarity matrix approaches the ideal matrix, the spectral clustering algorithm will have better clustering performance. Hence, constructing a good similarity matrix is the key to the spectral clustering community detection algorithm.
Nataliani and Yang [9] proposed a new affinity matrix generation method based on the neighbour relation propagation principle. The method can increase the similarity of point pairs that should be in the same cluster, but the distance threshold is easily affected by outliers or noise points. Beauchemin [10] presented a method to build affinity matrices from a density estimator relying on K-means with a subbagging procedure. However, this method does not work well when manifold proximity exists. Zhang and You [11] developed an approach based on a random walk to process the similarity matrix. The pairwise similarity is related not only to the two points but also to their neighbours. However, the threshold of neighbouring nodes is set manually, and the clustering is not stable.
Although many community detection algorithms based on optimizing a similarity graph have been proposed, how to construct the similarity graph that can correctly reflect the community structure has not been solved. Consequently, this paper focuses on the transition probability between nodes to calculate the similarity, presents the concept of probability matrix, and proposes an improved spectral clustering community detection algorithm based on the probability matrix.

Constructing a Similarity Graph by Probability Matrix.
The similarity graph of spectral clustering is constructed by calculating the similarity between nodes. In this section, the similarity between nodes is calculated by the transition probability among nodes, and the related concepts of probability matrix and mean probability matrix are introduced. Then, the similarity graph is constructed based on the mean probability matrix.

Transition Probability.
A Markov chain is a stochastic process of variables with the Markov property, describing a sequence of states. The state changes over time, and the next state of the sequence depends on the current state [12]. The possibility of transition between states is called the transition probability.
Given a network N with n nodes and adjacency matrix W, the probability that node i reaches node j after one step is the 1st transition probability, which can be defined as

$$pr_{ij} = \frac{w_{ij}}{\sum_{k=1}^{n} w_{ik}}. \tag{1}$$

The 1st transition matrix Pr is the matrix composed of the entries $pr_{ij}$. The probability that node i reaches node j after l steps is the l-th transition probability, and the matrix formed by the l-th transition probabilities is called the l-th transition matrix $Pr_l$. According to the properties of the Markov chain, we get

$$Pr_l = Pr^{l}, \tag{2}$$

where $Pr^{l}$ denotes Pr multiplied by itself l times.
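As a small illustration of equations (1) and (2), the following Python sketch row-normalises an adjacency matrix and raises the result to a power. It assumes a plain list-of-lists matrix with no isolated nodes; the helper names (`transition_matrix`, `matmul`, `l_step_matrix`) and the three-node path graph are illustrative choices and are not taken from the paper.

```python
def transition_matrix(W):
    """Row-normalise an adjacency matrix W (list of rows) into Pr, as in equation (1)."""
    Pr = []
    for row in W:
        total = sum(row)
        if total == 0:
            # Assumption: every node has at least one edge, so each row sum is positive.
            raise ValueError("isolated node: row sum is zero")
        Pr.append([w / total for w in row])
    return Pr


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def l_step_matrix(Pr, l):
    """Return Pr multiplied by itself l times (l >= 1), as in equation (2)."""
    result = Pr
    for _ in range(l - 1):
        result = matmul(result, Pr)
    return result


# Hypothetical example: the path graph 0 - 1 - 2.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
Pr = transition_matrix(W)
Pr2 = l_step_matrix(Pr, 2)
```

Each row of `Pr` and `Pr2` sums to 1, since both are transition matrices of the same chain.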

Probability Matrix
Definition 1. Given a network N(V, E), let $p_{ij}$ be the transition probability from node i to node j. The probability matrix of N is the $|V| \times |V|$ matrix composed of the $p_{ij}$, denoted $P = (p_{ij})$.

The probability matrix describes the transition probability between nodes in the network. The 1st transition probability reflects the most direct relationship between a node and its adjacent nodes, but it lacks the hidden relationships with nonadjacent nodes. The multistep transition probability can include more neighbour nodes, reflecting the multiple complex associations among nodes. However, a multistep walk may fail to reach the adjacent nodes, which could weaken the relationship with the adjacent nodes. Consequently, we propose a method for constructing the probability matrix based on the accumulation of weighted multiorder transition matrices, and P can be defined as

$$P = \sum_{i=1}^{t} w_i \, Pr_i, \tag{3}$$

where i denotes that the current state of the Markov chain is at time i, t refers to the size of the Markov chain, called the time scale, $Pr_i$ is the i-th transition matrix, and $w_i$ represents the weight of the i-th transition matrix.
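The accumulation of weighted multiorder transition matrices described above can be sketched in Python as follows. The weight values, the example matrix, and the function names are hypothetical and chosen only for illustration; the sketch simply sums `weights[i-1]` times the i-th power of `Pr` for i = 1, ..., t.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def probability_matrix(Pr, weights):
    """Accumulate weights[i-1] * Pr^i for i = 1..t, where t = len(weights)."""
    n = len(Pr)
    P = [[0.0] * n for _ in range(n)]
    power = Pr  # current power of Pr, starting at Pr^1
    for step, w in enumerate(weights):
        if step > 0:
            power = matmul(power, Pr)
        for r in range(n):
            for c in range(n):
                P[r][c] += w * power[r][c]
    return P


# Transition matrix of the path graph 0 - 1 - 2 (illustrative).
Pr = [[0.0, 1.0, 0.0],
      [0.5, 0.0, 0.5],
      [0.0, 1.0, 0.0]]
weights = [0.5, 0.3, 0.2]  # hypothetical weights, decreasing and summing to 1
P = probability_matrix(Pr, weights)
```

In this toy case node 0 keeps a larger value towards its direct neighbour (node 1) than towards node 2, which it can only reach in two steps, while the two-step relationship is no longer zero.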

Mean Probability Matrix.
The time scale is the key to calculating the similarity of nodes, but the optimal time scale differs from network to network, so using a fixed time scale introduces errors. In order to reduce the influence of the parameters t and w, we propose the mean probability matrix, obtained by averaging P over different time scales.

Discrete Dynamics in Nature and Society

Definition 2. Given a network N(V, E) with probability matrix P and time scale t, the mean probability matrix is the $|V| \times |V|$ matrix composed of the average of $P_1, P_2, \ldots, P_t$. According to equations (2) and (3), the mean probability matrix, denoted $P_M$, is

$$P_M = \frac{1}{t}\sum_{i=1}^{t} P_i = \frac{1}{t}\sum_{i=1}^{t}\sum_{j=1}^{i} w_j \, Pr^{j}. \tag{4}$$

The time scale t of $P_M$ not only bounds the time scale of each $P_i$ but, more importantly, specifies how many probability matrices are averaged. Averaging over different $P_i$ smooths out the error caused by t and w, so different values of t do not cause a large error. As a result, the value of t can be chosen arbitrarily, but in order to reduce the computational complexity, we take t from the range [5, 13]. The weight $w_j$ again represents the weight of the j-th transition matrix. According to Definition 2, as i changes, the number of corresponding weights $w_j$ changes as well. To satisfy the constraint, $w_j$ is defined in terms of ws, where ws is a set of weights of size t that is set manually and satisfies $ws_1 > ws_2 \geq \ldots \geq ws_t$.
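The averaging in Definition 2 can be sketched as below. One point is an assumption made purely for illustration and is not the paper's definition of the weights: for time scale i, the sketch uses the first i entries of `ws` rescaled to sum to 1. The function names and example values are likewise hypothetical.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mean_probability_matrix(Pr, ws):
    """Average the probability matrices P_1..P_t, where t = len(ws).

    Assumption (not from the paper): P_i uses ws[0..i-1] rescaled to sum to 1.
    """
    t = len(ws)
    n = len(Pr)
    # powers[j] holds Pr^(j+1)
    powers = [Pr]
    for _ in range(t - 1):
        powers.append(matmul(powers[-1], Pr))
    PM = [[0.0] * n for _ in range(n)]
    for i in range(1, t + 1):
        norm = sum(ws[:i])
        for j in range(i):
            w = ws[j] / norm  # assumed normalisation of the weights
            for r in range(n):
                for c in range(n):
                    PM[r][c] += w * powers[j][r][c] / t
    return PM


Pr = [[0.0, 1.0, 0.0],
      [0.5, 0.0, 0.5],
      [0.0, 1.0, 0.0]]
ws = [0.5, 0.3, 0.2]  # hypothetical weights with ws[0] > ws[1] >= ws[2]
PM = mean_probability_matrix(Pr, ws)
```

Under this assumed normalisation every row of `PM` still sums to 1, and with a single weight (t = 1) the result reduces to `Pr` itself.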

Constructing the New Similarity Graph.
The similarity matrix $W_P$ is constructed from the mean probability matrix $P_M$. Given a network N with mean probability matrix $P_M$, the similarity between nodes i and j can be defined as

$$w_{Pij} = \begin{cases} p_{Mij}, & i \neq j, \\ 0, & i = j, \end{cases} \tag{6}$$

where $w_{Pij}$ denotes the (i, j)-th entry of $W_P$, and $p_{Mij}$ refers to the (i, j)-th entry of $P_M$. The similarity matrix of traditional spectral clustering is symmetric, which is convenient for calculating the Laplacian matrix L. Although $W_P$ is not symmetric, it has special properties and can also be used to construct L. The properties of $W_P$ are as follows:

(1) $L_W = D - W_P$, where D is the diagonal matrix with $d_{ii} = \sum_{j} w_{Pij}$, and the entries on the diagonal are positive.
(2) $W_P$ is a matrix with nonnegative entries, its diagonal entries are all 0, and no row is entirely 0.

To sum up, $L_W$ is a matrix whose diagonal elements are all positive and whose other elements are nonpositive. Then, we obtain that $L_W$ is invertible.
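For any vector f, $L_W$ satisfies

$$f^{T} L_W f = \sum_{i=1}^{|V|} d_{ii} f_i^{2} - \sum_{i=1}^{|V|}\sum_{j=1}^{|V|} w_{Pij} f_i f_j,$$

where $d_{ii} = \sum_{j} w_{Pij}$ is the i-th diagonal entry of D. As a result, $L_W$ is a Laplacian matrix, and $W_P$ can be used to construct the similarity graph of spectral clustering.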
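The construction of the Laplacian from a nonsymmetric similarity matrix can be checked numerically with a short sketch. It assumes that $W_P$ takes the off-diagonal entries of the mean probability matrix and has a zero diagonal (consistent with the stated property that its diagonal entries are all 0), and that the Laplacian is the degree matrix of row sums minus $W_P$. The 3 × 3 matrix `PM` and the vector `f` are made-up values.

```python
def similarity_from_mean(PM):
    """Copy the off-diagonal entries of PM and set the diagonal to 0 (assumed form of W_P)."""
    n = len(PM)
    return [[PM[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]


def laplacian(WP):
    """Return (L, d) with L = D - WP, where D is the diagonal matrix of row sums d."""
    n = len(WP)
    d = [sum(row) for row in WP]
    L = [[(d[i] if i == j else 0.0) - WP[i][j] for j in range(n)] for i in range(n)]
    return L, d


def quadratic_form(L, f):
    """Compute f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


# Hypothetical, nonsymmetric mean probability matrix.
PM = [[0.2, 0.5, 0.3],
      [0.4, 0.1, 0.5],
      [0.3, 0.6, 0.1]]
WP = similarity_from_mean(PM)
L, d = laplacian(WP)

f = [1.0, -2.0, 0.5]
lhs = quadratic_form(L, f)
rhs = (sum(d[i] * f[i] ** 2 for i in range(3))
       - sum(WP[i][j] * f[i] * f[j] for i in range(3) for j in range(3)))
```

Here `lhs` and `rhs` agree up to rounding, which is the quadratic-form identity evaluated on one example vector.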

NCut Objective Function.
Spectral clustering has many different objective functions. The purpose of the objective functions is to find a partition of the network such that the edges between different communities have lower weight and the edges within the same community have higher weight. In other words, nodes in different clusters are dissimilar from each other, and nodes within the same cluster are similar to each other. The more popular functions are RatioCut [13] and NCut [14]. RatioCut focuses on maximizing the number of nodes in the community, while NCut pays attention to maximizing the weights in the community. Given a network N(V, E), they can be defined as

$$\mathrm{RatioCut}(C_1, \ldots, C_K) = \sum_{k=1}^{K} \frac{W(C_k, \bar{C}_k)}{|C_k|}, \qquad \mathrm{NCut}(C_1, \ldots, C_K) = \sum_{k=1}^{K} \frac{W(C_k, \bar{C}_k)}{\mathrm{vol}(C_k)},$$

where $C_k$ denotes the set of nodes in community k, $C_a \cap C_b = \emptyset$ and $C_1 \cup C_2 \cup \cdots \cup C_K = V$ for $a \neq b$, $a, b \in \{1, 2, \ldots, K\}$, K represents the number of communities, $\bar{C}_k = V - C_k$ refers to the complement of $C_k$, $W(C_k, \bar{C}_k) = \sum_{i \in C_k,\, j \notin C_k} w_{Pij}$, $|C_k|$ is the number of nodes in $C_k$, and $\mathrm{vol}(C_k)$ is the sum of the weights of the edges attached to nodes in $C_k$, $\mathrm{vol}(C_k) = \sum_{i \in C_k}\sum_{j=1}^{|V|} w_{Pij}$.

A large number of nodes in a community does not mean that the weight within the community is high. In comparison, NCut is more consistent with the clustering strategy of spectral clustering. Therefore, we choose NCut as the objective function of the proposed algorithm. Combined with equation (6), the objective function can be optimized as

$$\min_{F} \ \mathrm{Tr}\left(F^{T} D^{-1/2} L_W D^{-1/2} F\right) \quad \text{s.t.} \quad F^{T} F = I,$$

where F is a matrix composed of the vectors f, and I is the identity matrix. F can be obtained by computing the eigenvectors of $D^{-1/2} \cdot L_W \cdot D^{-1/2}$ that correspond to the K smallest eigenvalues. However, a little information is lost in the dimension reduction, so F cannot fully indicate the attributes of nodes. Therefore, applying a traditional clustering algorithm, such as K-means, to F divides the network into K communities more accurately in the end.
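To make the two objective functions concrete, the sketch below evaluates them for a given partition. It assumes the usual forms, namely the cut weight of each community divided by its node count (RatioCut) or by its volume (NCut), summed over communities; the example graph (two triangles joined by one edge) and the function names are illustrative.

```python
def cut_weight(W, community):
    """Total weight of edges from nodes inside the community to nodes outside it."""
    inside = set(community)
    return sum(W[i][j] for i in inside for j in range(len(W)) if j not in inside)


def volume(W, community):
    """Sum of all edge weights attached to the nodes of the community (row sums)."""
    return sum(sum(W[i]) for i in community)


def ratio_cut(W, partition):
    return sum(cut_weight(W, c) / len(c) for c in partition)


def ncut(W, partition):
    return sum(cut_weight(W, c) / volume(W, c) for c in partition)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2 - 3.
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]

good = [[0, 1, 2], [3, 4, 5]]   # split along the bridge edge
bad = [[0, 1], [2, 3, 4, 5]]    # split through the first triangle

ncut_good, ncut_bad = ncut(W, good), ncut(W, bad)
rcut_good, rcut_bad = ratio_cut(W, good), ratio_cut(W, bad)
```

Both measures are smaller for the partition that separates the two triangles, which is the behaviour the objective functions are meant to reward.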

The Main Steps of the Algorithm.
The main steps of the improved spectral clustering algorithm are given in Algorithm 1.
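Steps (4) and (5) of Algorithm 1, which turn the similarity matrix into the normalized Laplacian, can be sketched as follows. The sketch assumes the degree matrix holds the row sums of the similarity matrix; the example matrix and the function name are hypothetical. The later steps (eigenvectors and K-means) would normally be delegated to a numerical library and are not shown.

```python
import math


def normalized_laplacian(WP):
    """Return D^{-1/2} (D - WP) D^{-1/2}, where D is the diagonal matrix of row sums of WP."""
    n = len(WP)
    d = [sum(row) for row in WP]
    if any(x <= 0 for x in d):
        # D^{-1/2} needs strictly positive diagonal entries.
        raise ValueError("every row of the similarity matrix must have a positive sum")
    Ln = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            l_ij = (d[i] if i == j else 0.0) - WP[i][j]  # entry of the unnormalised Laplacian
            Ln[i][j] = l_ij / math.sqrt(d[i] * d[j])
    return Ln


# Hypothetical nonsymmetric similarity matrix with a zero diagonal.
WP = [[0.0, 2.0, 0.0],
      [1.0, 0.0, 1.0],
      [0.0, 4.0, 0.0]]
Ln = normalized_laplacian(WP)
```

Because the similarity matrix has a zero diagonal, every diagonal entry of `Ln` equals 1, and the off-diagonal entries are the negated similarities rescaled by the square roots of the two row sums.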

Experiments and Analyses
The experimental data include artificially generated networks and real networks. On the one hand, we use the LFR benchmark [15] to generate networks and evaluate the quality of community detection by normalized mutual information (NMI) [16]. On the other hand, we adopt several real networks and take the modularity (Q) [17] as the evaluation index.
In order to show the performance of the improved spectral clustering algorithm (ISCP), ISCP is compared with SC [2], WT [18], FG [19], FluidC [20], and SCRW [11]. The experimental environment is an Intel i7-4710MQ CPU at 2.5 GHz with 8 GB of RAM. The software platform is PyCharm 2018.1.2 (Community Edition) on Windows 10 x64.

ALGORITHM 1: Improved spectral clustering algorithm based on the probability matrix.
Input: network N, adjacency matrix W, community number K, time scale t, and a set of weights ws
Output: K communities
(1) Compute the 1st transition matrix Pr according to equation (1)
(2) Compute the mean probability matrix $P_M$ according to equation (4)
(3) Construct the similarity matrix $W_P$ according to equation (6)
(4) Construct the unnormalised Laplacian matrix $L_W$ according to property 1 of $W_P$ in Section 3.2.1
(5) Construct the normalized Laplacian matrix $L_n$ with $L_n = D^{-1/2} \cdot L_W \cdot D^{-1/2}$
(6) Compute the first K eigenvectors of $L_n$, referred to as U
(7) Consider the rows of U as nodes, and use K-means to cluster them into K communities

The LFR benchmark networks are computer-generated networks, and networks with different features can be produced by adjusting some parameters. The experiments mainly use the mixing parameter μ (μ denotes the average rate of edges connected with other communities, 0 ≤ μ ≤ 1) and the network size N to evaluate performance. To guarantee consistency, the detailed descriptions and values of the other parameters are shown in Table 1. Figure 1 shows the performance of the six algorithms as μ varies. From Figure 1, the NMI trend of ISCP is smoother than that of the other algorithms, and the NMI of ISCP is significantly higher than that of the other five algorithms. According to [16], the larger the NMI is, the better the quality of community detection is. Overall, the clustering effect of ISCP is significantly better than that of the other five algorithms. In general, ISCP is more stable, and its convergence speed is faster.
Figure 2 shows the performance of the six algorithms on different network sizes N. As seen in Figure 2, the NMI of ISCP is higher than that of the other five algorithms, and it increases as the network size increases. When the network size reaches 5000 or more, its NMI tends to be stable and stays around 0.9. Therefore, whether the order of magnitude of the network size is 1000 or 10,000, the clustering performance of ISCP is better than that of the other five algorithms.
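For readers who want to reproduce this kind of comparison, the sketch below computes NMI between a detected partition and the ground truth. It assumes one common normalisation, twice the mutual information divided by the sum of the two entropies; whether this matches the exact variant used in the experiments is not stated here, and the function name and label vectors are illustrative.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2 * I(A; B) / (H(A) + H(B)), of two labelings."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    mi = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    if h_a + h_b == 0:
        # Both labelings put every node in one community: treat them as identical.
        return 1.0
    return 2 * mi / (h_a + h_b)


truth = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]   # same communities, labels swapped
unrelated = [0, 1, 0, 1]             # carries no information about `truth`

score_same = nmi(truth, same_up_to_renaming)
score_unrelated = nmi(truth, unrelated)
```

The score depends only on how nodes are grouped, not on the label names, so a relabelled copy of the ground truth scores 1 and a statistically independent partition scores 0.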

Real-World Networks.
The real-world networks have different topologies from the benchmark networks. To further evaluate the performance of the algorithms, experiments are run on 8 real-world networks. Moreover, it is necessary to normalize some real-world networks, such as eliminating self-loops and constructing a connected network. The detailed information of these networks is shown in Table 2.

The experiments use modularity Q to evaluate the clustering performance of the six algorithms. The range of Q is from −0.5 to 1. The larger Q is, the better the community detection performance will be. In practice, Q generally falls between about 0.3 and 0.7 [17]. Figure 3 shows the performance of the six algorithms for clustering real-world networks. As shown in Figure 3, the Q of ISCP is above 0.3 on almost all networks and is larger than the Q of the other algorithms. Although ISCP is not the best community detection algorithm for network 1 and network 7, its performance is very close to that of the best algorithm. Generally speaking, ISCP has excellent clustering performance and can cluster real-world networks more accurately.
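As an illustration of the evaluation index, the following sketch computes modularity for an undirected, unweighted network using the standard Newman form, in which each pair of nodes in the same community contributes the adjacency entry minus the product of the two degrees divided by twice the edge count. The example graph and the function name are illustrative and are not one of the 8 networks used in the experiments.

```python
def modularity(A, labels):
    """Modularity Q of a partition of an undirected graph with adjacency matrix A."""
    n = len(A)
    degrees = [sum(row) for row in A]
    two_m = sum(degrees)  # twice the number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += A[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2 - 3.
A = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]

q_two_triangles = modularity(A, [0, 0, 0, 1, 1, 1])
q_single = modularity(A, [0, 0, 0, 0, 0, 0])
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.357), while placing all nodes in one community gives Q = 0.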

Conclusions
Spectral clustering plays an important role in the field of community detection. It is an excellent community detection algorithm, but the traditional similarity graphs contain a lot of incorrect information about the community structure. As a result, the performance of community detection is poor. Hence, this paper presents the probability matrix and proposes an improved spectral clustering community detection algorithm, ISCP. A large number of experiments on benchmark networks and real-world networks show that ISCP is better than most traditional community detection algorithms and can cluster complex networks more accurately.
However, ISCP is costly in time and space. Given a network N with n nodes and time scale t, ISCP needs to multiply the transition probability matrix t times to construct the similarity matrix. Even with the fast power algorithm, the time complexity of the algorithm reaches $O(n^{3} \log_2 t)$. As the network grows, computing the similarity matrix takes more time and space. Moreover, ISCP is only applicable to nonoverlapping complex networks. So the next step is to research how to reduce the computational complexity of the algorithm and how to cluster overlapping networks.
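The fast power algorithm mentioned above is commonly implemented as exponentiation by squaring, which needs a number of matrix products proportional to $\log_2 t$ instead of t. The sketch below is a generic pure-Python version with dense list-of-lists matrices; the function names and the example matrix are illustrative, and each product still costs on the order of $n^{3}$ operations.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_pow(A, t):
    """Return A^t for t >= 0 using exponentiation by squaring."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # identity
    base = A
    while t > 0:
        if t & 1:                 # current binary digit of t is 1
            result = matmul(result, base)
        base = matmul(base, base)  # square for the next binary digit
        t >>= 1
    return result


# Hypothetical 3-node transition matrix.
Pr = [[0.0, 0.5, 0.5],
      [0.5, 0.0, 0.5],
      [0.25, 0.75, 0.0]]

fast = mat_pow(Pr, 5)

# Reference value by repeated multiplication (four products for the 5th power).
naive = Pr
for _ in range(4):
    naive = matmul(naive, Pr)
```

Both routes give the same matrix up to rounding; the saving from squaring grows with t.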
Data Availability
The data cannot be released for the time being. When the relevant research is finished, we will release detailed research results.