An Improved Topology-Potential-Based Community Detection Algorithm for Complex Network

Topology potential theory is a new community detection theory for complex networks, which divides a network into communities by spreading outward from each local maximum potential node. At present, almost all topology-potential-based community detection methods ignore the differences between nodes and assume that all nodes have the same mass. This assumption makes the topology potential calculation inaccurate and in turn lowers the precision of community detection. Inspired by the PageRank algorithm, this paper puts forward a novel mass calculation method for complex network nodes. A node's mass obtained by our method effectively reflects its importance and influence in the network: the more important the node is, the bigger its mass is. Simulation results show that, after taking node mass into consideration, the topology potential of each node is more accurate, the distribution of topology potential is more reasonable, and the results of community detection are more precise.


Introduction
Most complex networks show community structure, that is, groups of vertices with a higher density of edges within them and a lower density of edges between groups [1]. Identifying community structure is crucial for understanding the structural and functional properties of complex networks [2]. Many works inspired by different paradigms are devoted to the development of community detection [3]. Recently, topology potential theory was introduced to the complex network area for community detection [4]. Because of its inherent advantages, such as low time complexity and good performance, this novel theory has attracted considerable attention [5][6][7][8][9].
Gan et al. [4] used topology potential theory to describe the interaction and association among complex network nodes and put forward a community detection algorithm based on topology potential. The community structure can be uncovered by detecting all local high-potential areas bounded by low-potential nodes.
Han et al. [5] proposed an overlapping community detection algorithm based on topology potential. A complex network will be divided into separate communities by spreading outward from each local maximum potential node.
The algorithm claims that different nodes play different roles in complex network, such as seed node, overlapping node, and isolated node. Different community roles are identified during spreading process.
Zhang et al. [6] proposed a variable-scale overlapping community identification method based on topology potential. This method defines an identity uncertainty measure to identify overlapping nodes and uses a parameter to control community scale.
Topology potential calculation is the foundation and key step of the above topology-potential-based community detection methods. Consider a network $G = (V, E)$, where $V = \{v_i \mid i = 1, \ldots, n\}$ is the set of nodes, $n$ is the total number of nodes, $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is the set of edges, and $m = |E|$ is the total number of edges. The topology potential of any node $v_i$ can be computed as follows:
$$\varphi(v_i) = \sum_{j=1}^{n} m_j \, e^{-(d_{ij}/\sigma)^2} \qquad (1)$$
where $\varphi(v_i)$ is the topology potential of node $v_i$; $d_{ij}$ is the distance between node $v_i$ and node $v_j$; $m_j$ is the mass of node $v_j$; and $\sigma$ is the impact factor, which controls the number of hops over which a node exerts influence. The optimal impact factor can be obtained by using the method described in [4].
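As a rough illustration of formula (1), the following Python sketch computes topology potentials on an unweighted graph. It assumes the Gaussian form $\varphi(v_i) = \sum_j m_j e^{-(d_{ij}/\sigma)^2}$ with $d_{ij}$ measured in hops, includes the node itself in the sum ($d_{ii} = 0$), and treats unreachable nodes as contributing nothing. The function names, the adjacency-dictionary representation, and the toy path graph are assumptions made for this example and are not taken from [4].

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def topology_potential(adj, sigma, mass=None):
    """phi(v_i) = sum_j m_j * exp(-(d_ij / sigma)^2) over all reachable nodes j.

    mass=None reproduces the equal-mass assumption m_j = 1.
    """
    if mass is None:
        mass = {v: 1.0 for v in adj}
    phi = {}
    for i in adj:
        dist = bfs_distances(adj, i)
        phi[i] = sum(mass[j] * math.exp(-(d / sigma) ** 2) for j, d in dist.items())
    return phi


# Hypothetical toy graph: a path 0 - 1 - 2
adj = {0: [1], 1: [0, 2], 2: [1]}
phi = topology_potential(adj, sigma=1.0)
for node in sorted(phi):
    print(node, round(phi[node], 4))
```

With all masses equal to 1, the middle node of the path receives the highest potential because it is closest to the other two nodes.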
In formula (1), the node mass $m_j$ is an important parameter, which directly affects the value of $\varphi(v_i)$. However, almost all of the above topology-potential-based community detection methods ignore the difference between nodes and assume $m_j = 1$. This hypothesis is debatable, for the following two reasons.
On one hand, a node's mass reflects its inherent properties, such as importance and influence. Different nodes have different inherent properties. For example, in a social network, the importance of different people differs significantly, and public figures obviously have more influence than ordinary people.
On the other hand, formula (1) shows that the topology potential $\varphi(v_i)$ depends on the distance $d_{ij}$ and the mass $m_j$ (the impact factor $\sigma$ is a constant). If we suppose $m_j = 1$, the calculated topology potential will deviate from the actual value, and this deviation may affect the precision of community detection.
In order to solve the above problems, this paper puts forward a mass calculation method for complex network nodes, which is inspired by the PageRank algorithm [10]. Node mass calculated by this method effectively reflects the importance and influence of nodes in a complex network. The more important a node is, the bigger its mass is. Simulation results show that, after taking node mass into consideration, the topology potential of each node is more accurate, the distribution of topology potential is more reasonable, and the results of community detection are more precise.
This paper is organized as follows: Section 2 describes the node mass calculation method; Section 3 analyzes the influence of node mass on topology-potential-based community detection; and Section 4 concludes the paper.

Node Mass Calculation
A physical particle has an inherent mass, but how should the mass of a network node be measured? A node's mass should reflect its importance and influence in the complex network. The more important a node is, the bigger its mass should be. Inspired by the PageRank algorithm, this paper puts forward a mass calculation method for complex network nodes.
The PageRank algorithm has been successfully used by Google to evaluate the importance of web pages. Each web page is assigned a PR value to reflect its importance. The algorithm claims that the PR value of a web page can be measured by the number and importance of web pages linking to this page. Generally speaking, the more web pages link to this page, the more important it is. The contributions of these web pages are different: the more important these pages themselves are, the more contribution they make to this page.
Similarly, the importance of a network node can be measured by the number and importance of its neighbor nodes. The more neighbor nodes a node has, the more important it is. The more important its neighbors themselves are, the more important the node is.
Definition 1. Consider a network $G = (V, E)$, where $V = \{v_i \mid i = 1, \ldots, n\}$ is the set of nodes, $n$ is the total number of nodes, $E = \{(v_i, v_j) \mid v_i, v_j \in V\}$ is the set of edges, and $m = |E|$ is the total number of edges. The mass of any node $v_i$ is defined as follows:
$$m_i = (1 - d) + d \sum_{j : (v_i, v_j) \in E} \frac{m_j}{k_j}$$
where $k_j$ is the degree of node $v_j$, and $d$ is the damping factor, $0 < d < 1$.
Definition 1 shows that the value of the damping factor $d$ influences the distribution of node mass. The PageRank algorithm sets $d$ to 0.85 based on a large number of experiments and experience. A suitable damping factor is likewise needed in node mass calculation.
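The mass definition can be sketched as a simple fixed-point iteration. The sketch below assumes the PageRank-style recurrence $m_i = (1 - d) + d \sum_j m_j / k_j$, summed over the neighbors $v_j$ of $v_i$, and starts from mass 1 for every node; the function name, the stopping rule, and the star-shaped toy graph are hypothetical choices for illustration only.

```python
def node_mass(adj, d, iterations=1000, tol=1e-12):
    """Iterate m_i = (1 - d) + d * sum over neighbors j of m_j / k_j.

    adj maps each node to the list of its neighbors; k_j is len(adj[j]).
    """
    mass = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new_mass = {
            i: (1.0 - d) + d * sum(mass[j] / len(adj[j]) for j in adj[i])
            for i in adj
        }
        converged = max(abs(new_mass[v] - mass[v]) for v in adj) < tol
        mass = new_mass
        if converged:
            break
    return mass


# Hypothetical toy graph: a star with center 0 and leaves 1, 2, 3
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
equal = node_mass(adj, d=0.0)  # d = 0 removes all differences between nodes
mass = node_mass(adj, d=0.5)   # d = 0.5 is an arbitrary value for the example
for node in sorted(mass):
    print(node, round(mass[node], 4))
```

On this toy graph the center ends up heavier than the leaves, and with $d = 0$ every node keeps mass 1, which matches the behavior described for Table 1.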
This paper selected a representative social network, the Zachary network, to analyze the relationship between damping factor and node mass. The Zachary network is a karate club network with 34 members. This karate club finally split into two communities because of a conflict between its chairman and its coach. Table 1 shows the masses of nodes 1 to 7 for different damping factors.
As can be seen from Table 1, when $d$ is 0, the masses of the seven nodes are all 1, which means that there is no difference among these nodes. As $d$ increases, the mass difference between nodes gradually becomes apparent. When $d$ reaches 1, the mass difference reaches its maximum. Figure 1 shows the gap between the maximum mass and the minimum mass of the Zachary network nodes for different damping factors. As can be seen from Figure 1, the gap grows as $d$ increases, following an almost linear uptrend. In order to ensure a mass difference between nodes, highlight important nodes, and at the same time avoid extreme mass differences, this paper selects as the optimal value the $d$ that corresponds to the halfway position (B in Figure 1) between no mass difference (C in Figure 1) and the biggest mass difference (A in Figure 1). For the Zachary network, the corresponding optimal damping factor is 0.38 (D in Figure 1).
Node mass calculated by our method effectively reflects the importance and influence of nodes in a complex network. The more important a node is, the bigger its mass is. After taking node mass into consideration, the topology potential of each node will be more accurate, and the distribution of topology potential will be more reasonable. Because mass reflects the influence of a node in the whole complex network, the topology potential, which depends on the distance $d_{ij}$ and the mass $m_j$, has a global characteristic to some extent. This global characteristic is meaningful for community detection.
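The halfway rule for choosing the damping factor can be sketched in Python as follows. The sketch assumes that node mass follows the PageRank-style recurrence $m_i = (1 - d) + d \sum_j m_j / k_j$ over the neighbors $v_j$ of $v_i$, solved by fixed-point iteration from mass 1. The grid of candidate values, the function names, and the toy graph are hypothetical. One caveat of this simple iteration is that at $d = 1$ it may oscillate on bipartite graphs, so the toy graph deliberately contains a triangle.

```python
def node_mass(adj, d, iterations=1000, tol=1e-12):
    """Fixed-point iteration for m_i = (1 - d) + d * sum_j m_j / k_j."""
    mass = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new_mass = {
            i: (1.0 - d) + d * sum(mass[j] / len(adj[j]) for j in adj[i])
            for i in adj
        }
        converged = max(abs(new_mass[v] - mass[v]) for v in adj) < tol
        mass = new_mass
        if converged:
            break
    return mass


def mass_gap(adj, d):
    """Gap between the largest and the smallest node mass for a given d."""
    mass = node_mass(adj, d)
    return max(mass.values()) - min(mass.values())


def half_gap_damping(adj, steps=100):
    """Pick the grid value of d whose gap is closest to the halfway gap."""
    grid = [i / steps for i in range(steps + 1)]
    gaps = [mass_gap(adj, d) for d in grid]
    target = 0.5 * (gaps[0] + gaps[-1])  # halfway between d = 0 and d = 1
    best = min(range(len(grid)), key=lambda i: abs(gaps[i] - target))
    return grid[best], gaps[best], gaps[-1]


# Hypothetical toy graph: triangle 0-1-2 with a pendant node 3 attached to 0
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
d_opt, gap_opt, max_gap = half_gap_damping(adj)
print(d_opt, round(gap_opt, 4), round(max_gap, 4))
```

The value returned for this toy graph has no relation to the 0.38 reported for the Zachary network; it only shows how the halfway position can be located once the gap curve is available.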
The Scientific World Journal 3

Simulation Experiments
This section empirically analyzes the influence of node mass on three typical topology-potential-based community detection methods. These three methods come from [4], [5], and [6]. In this paper, they are called Gan, Han, and Zhang, respectively. The simulation program was implemented in the scientific computing software MATLAB in a Windows environment. The experimental data include two complex networks: one is a real-world network, the Dolphin social network, which comes from http://www-personal.umich.edu/~mejn/netdata/; the other is an artificial network generated by the LFR-Benchmark generator [11]. LFR-Benchmark is a network generator that produces networks with a power-law degree distribution and implanted communities.
For each network, there are two schemes: one is the "without mass" scheme, which ignores node difference and sets $m_j = 1$, and the other is the "with mass" scheme, which takes node mass into consideration, with node mass computed according to Definition 1. We analyzed the topology potential of nodes and the community detection results under these two schemes.
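The two schemes can be compared on a small example. The sketch below is written in Python rather than MATLAB and is not the authors' simulation program. It assumes the Gaussian potential $\varphi(v_i) = \sum_j m_j e^{-(d_{ij}/\sigma)^2}$ with hop distances and the PageRank-style mass recurrence $m_i = (1 - d) + d \sum_j m_j / k_j$; the values $\sigma = 1$ and $d = 0.5$, the function names, and the toy graph (two hubs joined through a bridge) are hypothetical.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first hop distances from source to all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_mass(adj, d, iterations=1000, tol=1e-12):
    """Fixed-point iteration for m_i = (1 - d) + d * sum_j m_j / k_j."""
    mass = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new_mass = {
            i: (1.0 - d) + d * sum(mass[j] / len(adj[j]) for j in adj[i])
            for i in adj
        }
        converged = max(abs(new_mass[v] - mass[v]) for v in adj) < tol
        mass = new_mass
        if converged:
            break
    return mass


def topology_potential(adj, sigma, mass):
    """phi(v_i) = sum_j m_j * exp(-(d_ij / sigma)^2) over reachable nodes j."""
    phi = {}
    for i in adj:
        dist = hop_distances(adj, i)
        phi[i] = sum(mass[j] * math.exp(-(h / sigma) ** 2) for j, h in dist.items())
    return phi


# Hypothetical toy graph: hubs 0 and 4, each with leaves, joined via edge 3-5
adj = {
    0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 5],
    4: [5, 6, 7], 5: [3, 4], 6: [4], 7: [4],
}
unit_mass = {v: 1.0 for v in adj}              # "without mass" scheme
mass = node_mass(adj, d=0.5)                   # "with mass" scheme
phi_without = topology_potential(adj, 1.0, unit_mass)
phi_with = topology_potential(adj, 1.0, mass)
for node in sorted(adj):
    print(node, round(phi_without[node], 4), round(phi_with[node], 4))
```

On this toy graph the hubs gain potential and the leaves lose potential once mass is taken into account, which is the kind of shift in the potential distribution that the two schemes are meant to expose.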

Artificial Complex Network.
The artificial complex network is generated by the LFR-Benchmark generator. The node number is 100, the edge number is 230, the average degree is 4.6, and the implanted community number is 2. The structure of the artificial complex network is shown in Figure 2. Table 2 shows the topology potential of nodes 1 to 20 under the two schemes. As seen from Table 2, the topology potential of the artificial network nodes changes noticeably after taking node mass into consideration. Table 3 shows the top 20 nodes with the biggest topology potential under the two schemes. As seen from Table 3, the ranking of the top 20 nodes changes from the fourth position onward after taking node mass into consideration. The change of node ranking implies a change of topology potential distribution, which may affect community detection results.

The Influences of Node Mass on Community Detection Results. The artificial complex network contains two communities: community 97 and community 99. The representative node of community 97 is node 97, and the representative node of community 99 is node 99.
(1) The Gan Method. The Gan method first identifies internal nodes and boundary nodes and then uses a defined benefit function to determine which community a boundary node belongs to. For the "without mass" scheme, the boundary nodes identified by the Gan method are {8, 11, 20, 25, 32, 39, 40, 41, 42, 43, 49, 67}, a total of 12. For the "with mass" scheme, the boundary nodes identified by the Gan method are {8, 11, 20, 25, 32, 39, 40, 42, 43, 49, 67}, a total of 11. After taking node mass into consideration, the number of boundary nodes is reduced from 12 to 11, which lightens the load of determining which community a boundary node belongs to. As can be seen from Figure 2, node 41 is clearly an internal node of community 97. If we do not take node mass into consideration, however, this node is mistakenly regarded as a boundary node.
(2) The Zhang Method. The Zhang method uses the same strategy as the Gan method to identify internal nodes and boundary nodes. The only difference is the way of determining which community a boundary node belongs to. Therefore, for the "without mass" scheme, the boundary nodes identified by the Zhang method are also {8, 11, 20, 25, 32, 39, 40, 41, 42, 43, 49, 67}, a total of 12. When we take node mass into consideration, the boundary nodes identified by the Zhang method are also {8, 11, 20, 25, 32, 39, 40, 42, 43, 49, 67}, a total of 11.
(3) The Han Method. The community detection results remain the same under the two schemes. The reason is as follows: the Han method uses topology potential only to find the local maximum topology potential nodes, that is, the representative nodes of communities, and then uses a strategy similar to modularity to determine which community each node belongs to. Whether or not we take node mass into consideration, the local maximum topology potential nodes do not change (they are always node 97 and node 99), and the network structure itself is unchanged; therefore, the community detection results remain the same.
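The reasoning for the Han method can be illustrated with a minimal Python sketch of the local-maximum step only. It assumes that a representative node is one whose topology potential is strictly greater than that of every neighbor, which is a simplification and may differ from the exact rule in [5]. The graph and both sets of potential values are illustrative toy numbers, not values from Table 2 or Table 3.

```python
def local_maximum_nodes(adj, phi):
    """Nodes whose potential is strictly greater than all of their neighbors'."""
    return sorted(
        v for v in adj
        if adj[v] and all(phi[v] > phi[u] for u in adj[v])
    )


# Hypothetical toy graph: hubs 0 and 4, each with leaves, joined via edge 3-5
adj = {
    0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 5],
    4: [5, 6, 7], 5: [3, 4], 6: [4], 7: [4],
}

# Illustrative potentials for the two schemes (toy values, not measured data)
phi_without = {0: 2.12, 1: 1.40, 2: 1.40, 3: 1.79,
               4: 2.12, 5: 1.79, 6: 1.40, 7: 1.40}
phi_with = {0: 2.44, 1: 1.33, 2: 1.33, 3: 1.97,
            4: 2.44, 5: 1.97, 6: 1.33, 7: 1.33}

seeds_without = local_maximum_nodes(adj, phi_without)
seeds_with = local_maximum_nodes(adj, phi_with)
print(seeds_without, seeds_with)
```

Although every potential value changes between the two schemes, the set of local maxima is the same in both, so any later step that depends only on these representative nodes and on the unchanged network structure produces the same communities.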

Dolphin Social Network.
The Dolphin social network describes the frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand. The structure of the Dolphin social network is shown in Figure 3. Table 4 shows the topology potential of nodes 1 to 20 under the two schemes. As seen from Table 4, the topology potential of the Dolphin network nodes changes noticeably after taking node mass into consideration. Table 5 shows the top 20 nodes with the biggest topology potential under the two schemes. As seen from Table 5, the ranking of the top 20 nodes changes from the fifth position onward after taking node mass into consideration. This change may affect community detection results.