A Community Detection Algorithm Based on Topology Potential and Spectral Clustering

Community detection is of great value for understanding the inherent laws of complex networks and predicting their behavior. Spectral clustering algorithms have been successfully applied to community detection. However, these methods have two inadequacies: the input matrices they use cannot provide sufficient structural information for community detection, and they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. To solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix from the nodes' topology potential, which contains rich structural information about the network. In addition, the new algorithm can automatically obtain the optimal community number from the local maximum potential nodes. Experimental results show that the new algorithm performs well on artificial networks and real-world networks and outperforms other community detection methods.


Introduction
Most networks show community structure [1]. The results of community detection are meaningful for forecasting the behavior and evolution trend of complex networks [2]. For example, on the World Wide Web, community detection can be used to improve the performance of search engines; in social networks, it can be used to forecast information propagation among users [3]; in electronic commerce, it can be used to select potential users for advertising; and in bioengineering, it can be used to recognize the functions of proteins [4].
In recent years, many methods inspired by different paradigms have been put forward for community detection [5]. Among these efforts, spectral clustering has proven successful [6], because it is very simple to implement and can be solved by standard linear algebra methods.
Traditional spectral clustering methods are based on various input matrices, such as the adjacency matrix, the standard Laplacian matrix, and the normalized Laplacian matrix. The standard Laplacian matrix is defined as $L = D - A$, and the normalized Laplacian matrix is defined as $N = D^{-1}A$, where $A$ is the adjacency matrix and $D$ is a diagonal matrix whose element $D_{ii}$ is the degree of the $i$th node. Almost all of these matrices are constructed from the adjacency matrix and the diagonal degree matrix of the network. They can only reflect the local relationship between a node and its direct neighbors; as [6] pointed out, "the eigenvalues and eigenvectors of traditional input matrixes cannot provide sufficient structural information for community detection." As a result, the accuracy of community detection may decrease.
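As a small illustration of the two conventional input matrices, the following sketch builds the standard Laplacian (degree matrix minus adjacency matrix) and the normalized matrix (inverse degree matrix times adjacency matrix) for a toy graph. The four-node graph and all function names are assumptions made for this example only, and the sketch assumes an undirected graph with no isolated nodes.

```python
def degree_matrix(adj):
    """Diagonal matrix whose i-th diagonal entry is the degree of node i."""
    n = len(adj)
    return [[sum(adj[i]) if i == j else 0 for j in range(n)] for i in range(n)]


def standard_laplacian(adj):
    """Degree matrix minus adjacency matrix."""
    n = len(adj)
    deg = degree_matrix(adj)
    return [[deg[i][j] - adj[i][j] for j in range(n)] for i in range(n)]


def normalized_matrix(adj):
    """Inverse degree matrix times adjacency matrix: each row divided by its degree.

    Assumes every node has at least one neighbor (no division by zero).
    """
    return [[a / sum(row) for a in row] for row in adj]


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
ADJ = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
LAPLACIAN = standard_laplacian(ADJ)
NORMALIZED = normalized_matrix(ADJ)
```

Every row of the normalized matrix sums to 1, which is why its largest eigenvalue is 1; both matrices only encode direct neighbors, which is the limitation discussed above.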
What is more, the community number must be set in advance for the spectral clustering method based on the standard Laplacian matrix. The normalized Laplacian matrix can solve this problem to some extent, because it has nontrivial eigenvalues close to the largest eigenvalue 1. The eigenvector elements corresponding to these eigenvalues present a ladder distribution, and the proper number of communities can be estimated from the ladders. However, when the community structure of the network is not clear, the eigenvector elements do not show an obvious ladder distribution but an approximately continuous curve [7]. In this case, we cannot get the proper community number from the ladder distribution of eigenvector elements.
The Scientific World Journal
In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The algorithm constructs the normalized Laplacian matrix from the topology potential of network nodes. The topology potential of a node is the sum of the potential components produced by its neighbors at the position of this node. The topology potential describes the complicated interaction among nodes and contains rich structural information about the network, which is meaningful for community detection. In addition, the new algorithm can automatically obtain the optimal community number from the local maximum potential nodes, whether the community structure of the network is obvious or not. Experimental results show that the new algorithm can improve the accuracy of community detection and has significant adaptability.
This paper is organized as follows. Section 2 describes related works. Section 3 introduces the concept of topology potential. Section 4 presents the new community detection algorithm based on topology potential and spectral clustering. Section 5 reports the simulation experiments and results. Section 6 concludes the paper.

Related Works
Spectral clustering algorithms have been successfully applied to community detection. From the perspective of the input matrix, spectral clustering methods can be divided into those based on the adjacency matrix [8], the standard Laplacian matrix [9], the normalized Laplacian matrix [10], the modularity matrix [11], and the correlation matrix [12]. Reference [13] found that the normalized Laplacian matrix significantly outperforms the other matrices in identifying the community structure of networks.
In order to improve the performance of spectral clustering, many nontraditional spectral clustering algorithms have been proposed [6], such as complement-based spectral clustering [14], complex-eigenvector-based spectral clustering [15], semisupervised spectral clustering [16], and eigenspace-based spectral clustering [17]. Zarei and Samani [14] proposed a spectral method based on the network complement and the anticommunity concept, declaring that "the spectrum of matrixes corresponding to a network complement reveals the communities more accurately than that of a matrix corresponding to the network itself." Zarei et al. [15] also put forward a spectral method based on complex eigenvectors and found that the complex eigenvectors of network matrices showed better performance in community detection. Mavroeidis [16] proposed a semisupervised spectral clustering method, and its results showed that partial supervision can not only improve the quality of spectral clustering but also accelerate it. Ma et al. [17] presented an eigenspace-based spectral method for community detection, which can identify both overlapping and hierarchical communities without increasing the time complexity. All these methods try to integrate some additional topology structure information into the input matrices.
Besides the methods mentioned above, there are some other newly developed spectral methods for community detection. Gong et al. [18] proposed a spectral algorithm that uses multiple eigenvectors to identify the communities in networks; it performs better because more spectral information is used. Newman [19] found that, within the spectral approximations, community detection by modularity maximization, community detection by statistical inference, and normalized-cut graph partitioning are identical. Using large-deviation theory, Bo et al. [20] established a relationship between the hierarchical community structure of a network and its local mixing properties.
Recently, a novel theory, the topology potential theory, was introduced to complex networks for community detection [21]. Because of its inherent advantages in time complexity and performance, this theory has attracted plenty of attention. Gan et al. [21] put forward a topology-potential-based community detection algorithm, with which the community structure can be uncovered by "detecting all local high potential areas margined by low potential nodes." Han et al. [22] proposed an overlapping community detection algorithm, which divides networks into separate communities by "spreading outward from each local maximum potential node." Zhang et al. [23] proposed a variable-scale overlapping community identification method based on topology potential. In order to identify overlapping nodes, this method defines an identity uncertainty measure related to topology potential. These topology-potential-based methods show better performance in community detection; however, almost all of them share a weakness: they need additional strategies or parameters to determine the community attachment of nodes, such as the benefit function in [21] and the parameter in [23].
Unlike the above works, this paper puts forward a novel community detection algorithm that combines spectral clustering and topology potential, making the best use of their advantages and bypassing their disadvantages. The new algorithm constructs the normalized Laplacian matrix from the topology potential of network nodes. The topology potential contains rich structural information about the network, which is meaningful for community detection. Moreover, the new algorithm can automatically determine the optimal community number from the local maximum potential nodes, whether the community structure of the complex network is obvious or not.

Topology Potential
The topology potential field theory is an important branch of the field theory. People abstracted the classical field as a mathematical model to describe noncontact interaction between objects [24]. Any complex network has its relatively stable topology structure; nodes in the network are not isolated, and there exist relationships between nodes linked by edges. Therefore, the topology potential field theory was introduced into complex network to describe the interaction and association among network nodes [22].
Consider a network $G = (V, E)$, where $V = \{v_i \mid i = 1, \ldots, n\}$ is a set of nodes, $n$ is the total number of nodes, and $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is a set of edges. According to the topology potential field theory, the topology potential of any node $v_i$ is defined as follows:

$$\varphi(v_i) = \sum_{j=1}^{s} m(v_j) \, e^{-(d_{ij}/\sigma)^2}, \tag{1}$$

where $\varphi(v_i)$ is the topology potential of node $v_i$, $1 \le i \le n$; the node $v_j$ is a node within the influence scope of node $v_i$, and $s$ is the total number of nodes within the influence scope, $1 \le s \le n - 1$, $1 \le j \le s$; $d_{ij}$ is the number of hops between $v_i$ and $v_j$; $m(v_j)$ is the mass of node $v_j$, which is generally set to 1 so that the mass difference between nodes is ignored; and $\sigma$ is an impact factor used to control the influence scope of a node, whose maximum is $\lfloor 3\sigma/\sqrt{2} \rfloor$ hops.
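The definition in formula (1) can be sketched in a few lines: a breadth-first search gives the hop counts, and each node within the influence scope contributes a Gaussian-decayed term. This is a minimal sketch, not the authors' implementation; the six-node graph (two triangles joined by one edge), the unit mass, and the names `hop_distances` and `topology_potential` are assumptions made for illustration.

```python
import math
from collections import deque


def hop_distances(adj, source, max_hops):
    """Hop counts from source to every node at most max_hops away (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the influence scope
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def topology_potential(adj, sigma, mass=1.0):
    """Topology potential of every node, truncated at floor(3*sigma/sqrt(2)) hops."""
    scope = math.floor(3 * sigma / math.sqrt(2))
    potential = {}
    for v in adj:
        dist = hop_distances(adj, v, scope)
        potential[v] = sum(mass * math.exp(-(d / sigma) ** 2)
                           for u, d in dist.items() if u != v)
    return potential


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
PHI = topology_potential(ADJ, sigma=1.39)
```

In this toy graph the two bridge nodes receive the highest potential because they have the most nodes within two hops, which shows how the potential summarizes more than the direct neighborhood.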
The impact factor $\sigma$ affects the topology potential field and the influence scope of a node. If $\sigma$ is small, the interaction and association among nodes are weak, and when $\sigma \to 0$, there is no interaction or association at all. Conversely, if $\sigma$ is big, the interaction and association become strong, and in extreme conditions, all nodes associate with each other. Therefore, we need to select a suitable value of $\sigma$, so that the distribution of topology potential values reflects the structural characteristics of the network. Potential entropy has been introduced to evaluate the rationality of the topology potential value distribution [21].
As can be seen from formula (1), the topology potential of a node depends entirely on the topology structure of its surroundings and reflects the influence that other nodes have over it. The topology potential therefore contains rich structural information about the network, which offers a desirable remedy for the insufficient structural information in the traditional Laplacian matrix. If we construct the Laplacian matrix from the topology potential of network nodes, additional structural information can be provided for community detection. So, this paper puts forward a novel algorithm based on topology potential and spectral clustering to improve the performance of community detection, and Section 4 describes the new algorithm in detail.
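The text does not reproduce the potential entropy criterion, so the sketch below rests on an assumption: it uses the Shannon entropy of the normalized potentials (each potential divided by the sum of all potentials) and picks the impact factor that minimizes it over a candidate grid. The hop-distance matrix, the grid, and the function names are hypothetical and serve only to show the mechanics.

```python
import math


def potentials(hops, sigma):
    """Unit-mass topology potential of every node from a hop-distance matrix."""
    scope = math.floor(3 * sigma / math.sqrt(2))
    return [sum(math.exp(-(d / sigma) ** 2)
                for j, d in enumerate(row) if j != i and d <= scope)
            for i, row in enumerate(hops)]


def potential_entropy(phi):
    """Shannon entropy of the normalized potentials phi_i / Z, with Z = sum(phi)."""
    total = sum(phi)
    if total == 0:
        # Assumption: with no interaction at all, treat the field as uniform.
        return math.log(len(phi))
    return -sum(p / total * math.log(p / total) for p in phi if p > 0)


def select_sigma(hops, candidates):
    """Candidate impact factor with the smallest potential entropy."""
    return min(candidates, key=lambda s: potential_entropy(potentials(hops, s)))


# Hypothetical hop distances: two triangles {0, 1, 2} and {3, 4, 5} joined by 2-3.
HOPS = [
    [0, 1, 1, 2, 3, 3],
    [1, 0, 1, 2, 3, 3],
    [1, 1, 0, 1, 2, 2],
    [2, 2, 1, 0, 1, 1],
    [3, 3, 2, 1, 0, 1],
    [3, 3, 2, 1, 1, 0],
]
CANDIDATES = [0.5 + 0.1 * step for step in range(26)]  # 0.5, 0.6, ..., 3.0
BEST_SIGMA = select_sigma(HOPS, CANDIDATES)
```

A uniform potential field has the maximum entropy $\ln n$, so a lower entropy indicates a more uneven, and in this sense more informative, distribution of potential values.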

Community Detection Algorithm
In this section, we present a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm is described as follows.
Input: a network $G = (V, E)$.
Output: a community partition of $G$.
Algorithm Description:
(1) calculate the topology potential of each node with formula (1);
(2) search all local maximum potential nodes of $G$; suppose we find $k$ local maximum potential nodes;
(3) construct the potential component matrix $P$ and the topology potential matrix $\Phi$ of $G$;
(4) compute the normalized Laplacian matrix $N = \Phi^{-1}P$ of $G$;
(5) compute the first $k$ eigenvectors $u_1, \ldots, u_k$ of $N$, where $k$ is the total number of local maximum potential nodes;
(6) map all nodes in $G$ to $\mathbb{R}^k$ according to the eigenvectors $u_1, \ldots, u_k$;
(7) cluster the nodes in $\mathbb{R}^k$ with the $k$-means algorithm into communities $C_1, \ldots, C_k$.
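The steps above can be sketched end to end in plain Python. This is an illustrative sketch under several assumptions, not the authors' MATLAB program: unit node mass, a hypothetical nine-node graph made of two stars, local maximum potential nodes supplied by hand as `seeds`, eigenvectors obtained by subspace iteration on the symmetric form of the matrix, and a $k$-means step that starts from the seed nodes. All function names are invented for the example.

```python
import math


def hop_matrix(n, edges):
    """All-pairs hop counts via Floyd-Warshall (fine for small graphs)."""
    inf = float("inf")
    hops = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v in edges:
        hops[u][v] = hops[v][u] = 1
    for m in range(n):
        for i in range(n):
            for j in range(n):
                hops[i][j] = min(hops[i][j], hops[i][m] + hops[m][j])
    return hops


def potential_matrix(hops, sigma):
    """Unit-mass potential components, zero on the diagonal and outside the scope."""
    scope = math.floor(3 * sigma / math.sqrt(2))
    n = len(hops)
    return [[math.exp(-(hops[i][j] / sigma) ** 2)
             if i != j and hops[i][j] <= scope else 0.0
             for j in range(n)] for i in range(n)]


def leading_vectors(sym, k, iters=500):
    """Top-k eigenvectors of a symmetric matrix by subspace iteration."""
    n = len(sym)
    # The matrix passed in below has eigenvalues in [-1, 1]; iterating on
    # (S + I) / 2 makes them non-negative without changing their order.
    shifted = [[(sym[i][j] + (1.0 if i == j else 0.0)) / 2 for j in range(n)]
               for i in range(n)]
    vecs = [[math.cos((i + 1) * (c + 1)) for i in range(n)] for c in range(k)]
    for _ in range(iters):
        vecs = [[sum(row[j] * vec[j] for j in range(n)) for row in shifted]
                for vec in vecs]
        for c in range(k):  # Gram-Schmidt orthonormalization
            for prev in vecs[:c]:
                dot = sum(a * b for a, b in zip(vecs[c], prev))
                vecs[c] = [a - dot * b for a, b in zip(vecs[c], prev)]
            norm = math.sqrt(sum(a * a for a in vecs[c]))
            vecs[c] = [a / norm for a in vecs[c]]
    return vecs


def kmeans(points, seeds, iters=50):
    """Plain k-means; the initial centers are the points of the seed nodes."""
    centers = [list(points[s]) for s in seeds]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def detect_communities(n, edges, sigma, seeds):
    """Steps (1) and (3)-(7); assumes every node has a nonzero potential."""
    P = potential_matrix(hop_matrix(n, edges), sigma)
    phi = [sum(row) for row in P]  # diagonal of the topology potential matrix
    # Symmetric form Phi^{-1/2} P Phi^{-1/2}; it shares eigenvalues with Phi^{-1} P.
    sym = [[P[i][j] / math.sqrt(phi[i] * phi[j]) for j in range(n)] for i in range(n)]
    vecs = leading_vectors(sym, len(seeds))
    # Rescale to eigenvectors of Phi^{-1} P; each node becomes one k-dimensional point.
    points = [[vec[i] / math.sqrt(phi[i]) for vec in vecs] for i in range(n)]
    return kmeans(points, seeds)


# Hypothetical graph: star around node 0, star around node 4, joined by the edge 3-5.
EDGES = [(0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7), (4, 8), (3, 5)]
LABELS = detect_communities(9, EDGES, sigma=1.39, seeds=[0, 4])
```

Working with the symmetric form is a design choice of this sketch: it keeps the eigenvectors real and lets a simple orthogonal iteration stand in for a linear algebra library, at the cost of speed on anything but small graphs.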
Compared with the traditional spectral clustering method, the new algorithm constructs the normalized Laplacian matrix from the topology potential of nodes and can automatically obtain the optimal community number from the local maximum potential nodes.
The following part of the section will focus on the normalized Laplacian matrix construction and local maximum potential node search of the new community detection algorithm.

Normalized Laplacian Matrix Construction.
In order to add additional structural information about networks, the normalized Laplacian matrix is redefined as follows:

$$N = \Phi^{-1} P,$$

where the adjacency matrix $A$ used in the conventional normalized Laplacian matrix is replaced by the potential component matrix $P$ and the degree matrix $D$ by the topology potential matrix $\Phi$. The topology potential matrix $\Phi$ is an $n$-dimensional diagonal matrix whose diagonal element $\Phi_{i,i} = \varphi(v_i)$ is the topology potential of node $v_i$.
The potential component matrix $P$ is an $n \times n$ matrix whose element $p_{i,j}$ is the potential component produced by node $v_j$ at the position of node $v_i$, defined as follows:

$$p_{i,j} = m(v_j) \, e^{-(d_{ij}/\sigma)^2},$$

where $m(v_j)$ is the mass of node $v_j$; $d_{ij}$ is the number of hops between node $v_i$ and node $v_j$; and $\sigma$ is an impact factor used to control the influence scope of a node. If $i = j$, then $p_{i,j} = 0$; if node $v_j$ is out of the influence scope ($\lfloor 3\sigma/\sqrt{2} \rfloor$ hops) of node $v_i$, then $p_{i,j} = 0$.
Figure 1 shows a simple network model, which contains only six nodes. Here, we take the figure as an example to show the construction of the potential component matrix $P$ and the topology potential matrix $\Phi$. For this network, the selected optimal impact factor is $\sigma = 1.39$; thus, the influence scope of a node is $\lfloor 3\sigma/\sqrt{2} \rfloor = 2$. We can use formula (1) to get the topology potential of each node.
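For the impact factor of 1.39 quoted above, the possible values of a potential component can be worked out by hand. The short calculation below assumes unit node mass, as the text says is usual; the symbols $\sigma$ for the impact factor and $p(d)$ for the component at $d$ hops are notation chosen for this note.

```latex
\[
\frac{3\sigma}{\sqrt{2}} = \frac{3 \times 1.39}{1.4142} \approx 2.95
\quad\Longrightarrow\quad
\left\lfloor \frac{3\sigma}{\sqrt{2}} \right\rfloor = 2 ,
\]
\[
p(1) = e^{-(1/1.39)^2} \approx 0.596, \qquad
p(2) = e^{-(2/1.39)^2} \approx 0.126, \qquad
p(d) = 0 \ \text{for } d \ge 3 .
\]
```

The truncation discards little: the untruncated value at three hops would be $e^{-(3/1.39)^2} \approx 0.009$. A node with, say, three neighbors at one hop and two nodes at two hops would therefore have a topology potential of about $3 \times 0.596 + 2 \times 0.126 \approx 2.04$, and that value is also the corresponding diagonal entry of the topology potential matrix.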

Local Maximum Potential Node Search.
The hill-climbing method is a traditional algorithm for local maximum point search, but it may leave out some local maximum points, and its search performance is greatly influenced by the selection of the initial point. We present a new local maximum potential node search algorithm that takes the characteristics of local maximum potential nodes into account.
The key steps of the new search algorithm are as follows.
(1) All network nodes are initialized to "unvisited."
(2) Randomly choose an "unvisited" node $v$ and compare the topology potential of $v$ with that of its neighbors. If the topology potential of $v$ is higher than that of all its neighbors, then jump to step (3); otherwise, jump to step (4).
(3) Add $v$ to the local maximum potential node set $S$ and mark $v$ as well as all its neighbors "visited."
(4) Mark $v$ "visited," and mark the neighbors of $v$ with lower topology potential than $v$ "visited."
(5) Repeat steps (2), (3), and (4) until all nodes in the network are marked "visited."
(6) If there are two local maximum potential nodes in $S$ whose distance, that is, number of hops, is smaller than $\lfloor 3\sigma/\sqrt{2} \rfloor$, then delete the one with the smaller topology potential from $S$.
(7) Output the final local maximum potential node set $S$.
More details about the local maximum potential node search can be found in [24].
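The search steps above can be sketched as follows. The sketch is an interpretation rather than the authors' code: step (6) is implemented greedily, visiting candidates from the highest potential downward and keeping a candidate only if no already kept node lies closer than the influence scope. The nine-node graph, the unit mass, the fixed random seed, and the function names are assumptions for illustration.

```python
import math
import random
from collections import deque


def hops_from(adj, source):
    """Breadth-first hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def topology_potential(adj, sigma):
    """Unit-mass topology potential, truncated at the influence scope."""
    scope = math.floor(3 * sigma / math.sqrt(2))
    return {v: sum(math.exp(-(d / sigma) ** 2)
                   for d in hops_from(adj, v).values() if 0 < d <= scope)
            for v in adj}


def local_maximum_nodes(adj, phi, sigma, seed=0):
    """Local maximum potential nodes following steps (1) to (7)."""
    rng = random.Random(seed)
    unvisited = set(adj)                       # step (1)
    maxima = []
    while unvisited:                           # step (5)
        v = rng.choice(sorted(unvisited))      # step (2)
        if all(phi[v] > phi[w] for w in adj[v]):
            maxima.append(v)                   # step (3)
            unvisited -= {v, *adj[v]}
        else:                                  # step (4)
            unvisited.discard(v)
            unvisited -= {w for w in adj[v] if phi[w] < phi[v]}
    scope = math.floor(3 * sigma / math.sqrt(2))
    kept = []                                  # step (6), strongest first
    for v in sorted(maxima, key=lambda node: phi[node], reverse=True):
        dist = hops_from(adj, v)
        if all(dist.get(u, math.inf) >= scope for u in kept):
            kept.append(v)
    return kept                                # step (7)


# Hypothetical graph: star around node 0, star around node 4, joined by the edge 3-5.
ADJ = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 5],
       4: [5, 6, 7, 8], 5: [3, 4], 6: [4], 7: [4], 8: [4]}
PHI = topology_potential(ADJ, sigma=1.39)
CENTERS = local_maximum_nodes(ADJ, PHI, sigma=1.39)
```

The marking rules never hide a genuine local maximum: a node is marked "visited" without being examined only when one of its neighbors has a strictly higher potential, so the result does not depend on the random visiting order. Here the two star centers are returned, which gives a community number of 2.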

Simulation Experiments
In this section, a series of experiments is carried out to empirically evaluate the performance of the new algorithm. The simulation program was implemented in MATLAB. The experiment data include two kinds of complex networks: artificial networks and real-world networks. The artificial networks were generated by the ad hoc model [26] and the LFR Benchmark generator [27]. The LFR Benchmark is a network generator that produces networks with a power-law degree distribution and with implanted communities [27]. The real-world networks come from http://www-personal.umich.edu/~mejn/netdata/. The normalized mutual information (NMI) [28], a widely used measure, is calculated for the community partition produced by each algorithm.
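For readers who want to reproduce the evaluation measure, the sketch below computes NMI between two partitions given as label lists. The text does not state which normalization [28] uses; this sketch assumes the common form in community detection, twice the mutual information divided by the sum of the two entropies. The function name and the convention for two trivial partitions are choices made for this example.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI = 2 * I(A; B) / (H(A) + H(B)) for two label lists of equal length."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(c / n * math.log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    entropy_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # assumption: two single-community partitions count as identical
    return 2 * mutual / (entropy_a + entropy_b)


SAME_UP_TO_RENAMING = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
INDEPENDENT = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The measure is 1 when the two partitions agree up to a renaming of the communities and 0 when they are statistically independent.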

Ad Hoc Network.
The generated ad hoc network, with 128 nodes, is split into 4 communities containing 32 nodes each. The parameter $z_{out}$ is the average number of edges that link one node with nodes of other communities. As $z_{out}$ increases, the community structure of the ad hoc network gradually becomes ambiguous. In the experiment, we changed $z_{out}$ from 0 to 8 and observed the corresponding NMI produced by six methods: our algorithm, the traditional spectral method, the $k$-means based on diffusion distance (DD $k$-means) [26], the $k$-means based on dissimilarity index (DI $k$-means) [26], the Fast Newman algorithm [29], and the Extremal Optimization method [30]. The experiment results are shown in Figure 2, where the $y$-axis represents the value of NMI and each point represents an average over 30 simulation experiments. Compared with the other five methods, our algorithm is only slightly worse than the Extremal Optimization method for $6.4 < z_{out} < 7.1$. Our algorithm performs well on the ad hoc network, and the accuracy rate is more than 98% for $z_{out} < 5.5$.
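A benchmark of this kind can be generated with independent edge probabilities. The sketch below is an assumption-laden illustration, not the generator of [26]: it assumes the customary average degree of 16 for the 128-node benchmark, which the text does not state, and derives the within-community and between-community edge probabilities from it. The function name and parameters are hypothetical.

```python
import random


def ad_hoc_network(z_out, n_groups=4, group_size=32, avg_degree=16, seed=0):
    """Planted-partition graph whose nodes have about z_out external edges each.

    avg_degree=16 is an assumed value; the internal degree is avg_degree - z_out.
    Returns the edge list and the planted community of every node.
    """
    rng = random.Random(seed)
    n = n_groups * group_size
    membership = [v // group_size for v in range(n)]
    p_in = (avg_degree - z_out) / (group_size - 1)   # same-community pairs
    p_out = z_out / (n - group_size)                 # cross-community pairs
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if membership[u] == membership[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, membership


EDGES, MEMBERSHIP = ad_hoc_network(z_out=8)
```

At an external degree of 0 the four groups are disconnected from each other, and at 8 (half of the assumed average degree) a node has as many external as internal edges on average, which is where the planted structure becomes hard to recover.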

LFR Network.
In the generated LFR networks, the node degree and the community size follow power-law distributions. A mixing parameter $\mu$ is defined as the ratio between the external degree of a node with respect to its community and the total degree of the node [26], $0 \le \mu \le 1$. As $\mu$ increases, the community structure of the LFR network gradually becomes ambiguous. Several other parameters control the generated LFR networks: the number of nodes $n$, the average node degree $\langle k \rangle$, the maximum node degree $k_{\max}$, the minimum community size $c_{\min}$, and the maximum community size $c_{\max}$ [26].
The experiment results are shown in Figure 3, where the $y$-axis represents the value of NMI. Compared with the other six algorithms, our algorithm performs quite well, and its accuracy is only slightly worse than that of Clique Percolation, Louvain, and Infomap in the case of $0.25 < \mu < 0.45$. Because of the complexity of the topology potential distribution in the topology potential field, local maximum potential nodes may not necessarily be the real central nodes of communities in some cases, resulting in the split or merger of some actual communities and the fluctuation of the NMI value.
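The mixing parameter defined above can be measured on any graph with known communities. The following sketch computes it per node as external degree divided by total degree; the six-node graph, the community labels, and the function name are assumptions for illustration, and nodes are assumed to have at least one neighbor.

```python
def mixing_parameters(adj, membership):
    """Per-node ratio of external degree to total degree."""
    mixing = {}
    for v, neighbors in adj.items():
        external = sum(1 for w in neighbors if membership[w] != membership[v])
        mixing[v] = external / len(neighbors)
    return mixing


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
ADJ = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
MEMBERSHIP = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
MIXING = mixing_parameters(ADJ, MEMBERSHIP)
AVERAGE_MIXING = sum(MIXING.values()) / len(MIXING)
```

Only the two bridge nodes have a nonzero ratio here (one external edge out of three), so the network-wide average is small and the communities are easy to separate.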

American College Football Network.
The American College Football network [18] contains 115 teams, among which 616 games were played. In the network, nodes represent teams and edges represent games. All teams are organized into 12 conferences, each of which contains about 8 to 12 teams. These 12 conferences are Atlantic Coast, Big East, Big Ten, Big Twelve, Conference USA, Independents, Mid American, Mountain West, Pacific Ten, Southeastern, Sun Belt, and Western Athletic. We compared our algorithm with three other algorithms: the traditional spectral algorithm, the spectral algorithm based on modularity [18], and CMITP (community members identification based on topology potential) [22].
Firstly, we compared our algorithm with the traditional spectral algorithm. The latter cannot obtain the community number of the football network from the ladder distribution of eigenvector elements; therefore, we set its community number to the same value as in our method. Figures 4 and 5 show the community detection results of our algorithm and the traditional spectral algorithm, respectively. Each node represents a competing team and is labeled with the team's name. Teams in the same community are marked with the same color. For this network, the traditional spectral algorithm gets six correct communities: Mountain West, Atlantic Coast, Southeastern, Pacific Ten, Big Ten, and Conference USA. Compared with the traditional spectral algorithm, our algorithm gets three new correct communities: Big Twelve, Big East, and Mid American. For the Western Athletic conference, our algorithm gets 9 correct teams, with only 1 team missing. Both algorithms split the Sun Belt and Independents conferences.
Secondly, we compared our algorithm with the spectral algorithm based on modularity. Tables 1 and 2 show the results of our algorithm and the spectral algorithm based on modularity, respectively. The conference names are listed in the leftmost column, and the remaining columns represent the communities found by the two algorithms. Each found community consists of teams from one or more conferences, as indicated by the numbers in the corresponding column [18]. The spectral algorithm based on modularity divided this network into 10 communities, of which six are correctly detected: Atlantic Coast, Big East, Big Ten, Big Twelve, Mid American, and Pacific Ten. Compared with the spectral algorithm based on modularity, our algorithm found 11 communities and got one new correct community, Mountain West.
The CMITP method divided this network into 17 communities, and there are many overlapping nodes between communities, such as nodes "Hawaii" and "Nevada." Table 3 shows the community number and NMI of the four algorithms. Compared with the other three methods, our algorithm got the highest NMI, 0.9292.

The Influence of Impact Factor on Algorithm Performance.
The impact factor $\sigma$ affects the topology potential field and the influence scope of a node. With a different impact factor $\sigma$, the distribution of topology potential values will be different. These changes may bring about different community detection results. We take a real-world network, the Zachary karate club network, to analyze the influence of the impact factor $\sigma$ on algorithm performance. Figure 6 shows the NMI of our algorithm with different values of $\sigma$: if $\sigma \le 0.4716$, the NMI is 0; if $0.4716 < \sigma \le 1.66$, the NMI is 1; if $1.66 < \sigma \le 1.90$, the NMI is 0.8372; if $1.90 < \sigma \le 1.934$, the NMI is 0.6459; and if $\sigma > 1.934$, the NMI is 0.1701.
The analysis is as follows. The maximum influence scope of a node is $\lfloor 3\sigma/\sqrt{2} \rfloor$ hops. When $\sigma \le 0.4716$, the influence scope of a node is $\lfloor 3\sigma/\sqrt{2} \rfloor = 0$, which means that all nodes are isolated and have the same topology potential value. For the Zachary network, the optimal $\sigma$ is 1.02 according to formula (2). When $0.4716 < \sigma \le 1.66$, we can detect the accurate community structure, and the NMI is 1. But as $\sigma$ increases further, one node can associate with almost all the other nodes. In this case, the distribution of topology potential values cannot truly reflect the structural characteristics of the network; therefore, the community detection results are poor. In short, as long as the impact factor $\sigma$ is set near the optimal value, our algorithm obtains good results.
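The dependence of the influence scope on the impact factor can be tabulated directly from the floor expression used in the analysis above. The sketch below is only an illustration of that expression; the function name and the sample values are chosen for this example.

```python
import math


def influence_scope(sigma):
    """Maximum influence scope in hops: floor(3 * sigma / sqrt(2))."""
    return math.floor(3 * sigma / math.sqrt(2))


# Sample impact factors spanning the regimes discussed in the analysis.
SCOPES = {sigma: influence_scope(sigma) for sigma in (0.4, 0.5, 1.02, 1.39, 1.66, 1.95)}
```

The scope stays at 0 until the impact factor reaches roughly 0.47, which matches the regime in which every node is isolated; near the reported optimum of 1.02 a node interacts with nodes up to two hops away, and for larger impact factors the scope grows to three and then four hops, so that in a small network a node reaches most of the others.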

Conclusion
Identifying community structure is crucial for understanding complex networks. Recently, spectral clustering algorithms have been successfully applied to community detection. However, the traditional spectral clustering methods cannot provide sufficient structural information for community detection and cannot always obtain the community number from the ladder distribution of eigenvector elements. To address these inadequacies, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix from the topology potential of network nodes, which contains rich structural information about the network. In addition, the new algorithm can automatically determine the optimal community number from the local maximum potential nodes. Experiments on the ad hoc network, the LFR network, and the American College Football network showed that the new algorithm can improve the accuracy of community detection and has significant adaptability.