Because the existing definitions of modularity each have defects, this paper defines a weighted modularity based on density and cohesion as a new evaluation measure. Since the proportion of overlapping nodes in a network is very low, repeated visits to nodes can be reduced by marking the vertices with overlapping attributes. We propose three test conditions for overlapping nodes and present a fast overlapping community detection algorithm with self-correcting ability, which is decomposed into two processes. Under the control of the overlapping attributes, the complexity of the algorithm tends to be approximately linear. We also give a new interpretation of the membership vector and use it to improve the bridgeness function, which evaluates the extent to which a node overlaps. Finally, we conduct experiments on three networks with well-known community structures, and the results verify the feasibility and effectiveness of our algorithm.
Community structure is an important field in complex networks research. In the traditional social network, Newman et al. discover the community structure [
In the exploration of community structure, the crisp division [
In this paper, we propose a fast overlapping community detection algorithm with self-correcting ability and make the following contributions. First, we introduce new features of modularity as an evaluation measure and explore the structural advantage of a new weighted modularity that combines cohesion and density. Second, we propose three test conditions for overlapping nodes and present a fast overlapping community detection algorithm with self-correcting ability, which consists of two processes. Under the control of the overlapping attributes, the complexity of our algorithm tends to be approximately linear. Third, we give a new interpretation of the membership vector and use it to improve the bridgeness function, which evaluates the extent to which a node overlaps. To evaluate feasibility and effectiveness, we apply our approach to three existing networks with well-known community structures.
The rest of the paper is organized as follows. In Section
An overlapping node is a vertex that belongs to more than one community. After analyzing well-known networks whose structures are also known, we propose three conditions for identifying an overlapping node. These conditions are not mutually exclusive: they are checked in order of priority from the first to the last, and some nodes satisfy several of them. If a node meets any one of the conditions, it is treated as an overlapping node.
This is the most common case. Referring to the view of Lázár et al. [
The community structure of Protein reaction network.
The common overlap.
Analysis of the known networks reveals that some core nodes of a community may reduce the overall modularity. Although they have many adjacent nodes, the gain in inner connections is smaller than the gain in outer connections, which decreases the modularity. However, the internal connection of the community is strengthened. In a sparse network, this appears as an increase in density together with a stable modularity, whose threshold value is 0.015 in Karate club network as shown in Figure
The community structure in Karate club network.
In some networks, the belonging factors of certain nodes are spread evenly over their communities, which means that the adjacent nodes are distributed almost equally among the communities. If the degree of the node is high, it may also satisfy the previous two cases, as Zds2 does in Figure
The average distribution of belonging factor should meet the condition in (
In (
The graph of maximum and minimum value in average distribution.
Modularity is the measurement of evaluation and is the preferred object function. Considering various definitions of the past types [
There is the defect in the actual modularity [
The fitness function is also unsuitable as a modularity measure, because it weakens the inner connection, so that some structures cannot be recognized. For example, the complete graph and the ring structure are shown in Figure
The complete graph
From the above discussion, in view of network density, we propose a new weighted modularity, which is composed of the density and cohesion.
The new weighted modularity is defined as (
The modularity of the whole network is the average over its communities. Research on several typical networks shows that the density of the whole network is low, varying between 0.10 and 0.30. For example, Karate club network [
Test on the parameter allocation reveals that when
The detailed communities information in networks.
Network  Community  Nodes  Inner links  Outer links  Modularity
Protein reaction network  Community 1  9  21  11  0.751
Protein reaction network  Community 2  7  16  11  0.748
Protein reaction network  Community 3  8  24  7  0.870
Karate club network  Community 4  18  35  10  0.746
Karate club network  Community 5  16  33  10  0.750
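The tabulated modularity values can be reproduced from the node and link counts under one plausible reading of the weighted modularity. The sketch below is an assumption, not the paper's stated definition: it takes the density as 2·inner/(n(n−1)), the cohesion as 2·inner/(2·inner + outer), and an allocation parameter of 0.2 on the density term. The function name and the parameter name `alpha` are hypothetical; the value 0.2 was chosen only because it matches the five tabulated values to three decimals.

```python
def weighted_modularity(n_nodes, inner_links, outer_links, alpha=0.2):
    """Weighted sum of density and cohesion (assumed form, see lead-in)."""
    # Density: fraction of all possible internal node pairs that are linked.
    density = 2 * inner_links / (n_nodes * (n_nodes - 1))
    # Cohesion: share of the members' total degree that falls on inner links.
    cohesion = 2 * inner_links / (2 * inner_links + outer_links)
    return alpha * density + (1 - alpha) * cohesion


# (nodes, inner links, outer links, modularity reported in the table)
communities = {
    "Community 1": (9, 21, 11, 0.751),
    "Community 2": (7, 16, 11, 0.748),
    "Community 3": (8, 24, 7, 0.870),
    "Community 4": (18, 35, 10, 0.746),
    "Community 5": (16, 33, 10, 0.750),
}

for name, (n, inner, outer, reported) in communities.items():
    value = round(weighted_modularity(n, inner, outer), 3)
    print(name, value, reported)
```

That all five rows agree suggests the table was computed with a weighting close to this one, but the exact definition should be taken from the paper's own equation.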
The allocation parameter test graph.
Set the allocation parameter
By analyzing the tests on the known networks, we obtain the minimum modularity value of a community structure. This threshold is only a rough criterion and is not the only condition. In addition, the impact of a joining node on the original network, namely, the smoothness of the modularity, needs to be considered. The modularity threshold is the last condition to be checked, and a detailed introduction is given in the next section.
The traditional community detection methods [
Referring to the process of local modularity [
The raw community detection algorithm consists of the following steps.
(1) Randomly pick a node whose isVisited attribute is false as the root of the community, and get the core of the original community.
(2) If the number of nodes in the community is greater than 3, build the community model and set the isVisited attribute of all original nodes to true. Otherwise, set the isVisited attribute of the root vertex to true and return to step 1.
(3) From the nodes of the current layer, collect the set of adjacent nodes whose isLocated attribute is false. If the set is empty, go to the next step; otherwise, go to step 5.
(4) If the number of nodes in the community is not less than 5, confirm the isLocated attributes of its nodes, output the original community, and then return to step 1. Otherwise, return to step 1 directly.
(5) Access each node in the adjacent node set in turn, and set its isVisited attribute to true.
(6) If the node meets the conditions, add it to the current community, update the community model, and put it into the next layer to be accessed. Otherwise, return to step 5.
(7) When all the nodes have been processed, return to step 3 with the node set of the next layer.
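The layer-by-layer expansion in steps 3 to 7 can be sketched as follows. This is a simplified illustration, not the paper's implementation: the extraction of the community core, the size checks of steps 2 and 4, and the community model are omitted, and the names `grow_community`, `can_join`, and `located` are hypothetical. The join test used here is only the link-rate condition with threshold 1/3; the other join conditions would be plugged in through the same hook.

```python
def grow_community(adj, root, can_join, located=frozenset()):
    """Expand a community outward from `root`, one layer at a time.

    adj      -- dict mapping each node to the set of its neighbours
    can_join -- hypothetical hook: can_join(node, community) -> bool
    located  -- nodes whose isLocated attribute is already true
    """
    community = {root}
    visited = {root}          # plays the role of isVisited = true
    layer = [root]
    while layer:
        # Step 3: unlocated neighbours of the current layer, outside the community.
        candidates = set()
        for u in layer:
            candidates |= adj[u]
        candidates -= community
        candidates -= located
        next_layer = []
        # Steps 5-6: visit each candidate and test the join conditions.
        for v in sorted(candidates):
            visited.add(v)
            if can_join(v, community):
                community.add(v)
                next_layer.append(v)
        # Step 7: continue with the nodes that joined in this round.
        layer = next_layer
    return community, visited


def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


# Toy graph (not from the paper): two 4-cliques joined by the edge 3-4.
edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]
edges += [(a, b) for a in range(4, 8) for b in range(a + 1, 8)]
edges.append((3, 4))
adj = build_adjacency(edges)


def link_rate_ok(node, community):
    # At least one third of the node's links lead into the community.
    return 3 * len(adj[node] & community) >= len(adj[node])


community, visited = grow_community(adj, 0, link_rate_ok)
print(sorted(community), sorted(visited))
```

The loop stops as soon as a round adds no node, so the number of rounds is bounded by the number of layers reachable from the root.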
Here, the conditions for a node to join the community are as follows. Assuming that the node joins the community, it must meet one of the following conditions: (1) it brings a gain in the new modularity; (2) it increases the density while the modularity remains stable; (3) the rate of its links to vertices in the community is not less than the threshold value (1/3); (4) the modularity is greater than the threshold and remains stable. Theoretical research shows that the random choice of the starting node does not affect the community structure of the network. In other words, every node must belong to a certain community [
In this process, the algorithm builds a community model for each community, which records detailed information such as the inner nodes, the inner edges, and the outer edges. When a new node joins the community, the community model is updated immediately, which avoids repeated computation and reduces the time complexity.
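A minimal sketch of such a community model is given below. The class and field names are illustrative assumptions; the point is only that a joining node changes the edge counts by an amount that depends on its own neighbours, so the update costs time proportional to the degree of the node rather than to the size of the community. The sketch assumes a simple undirected graph without self-loops.

```python
class CommunityModel:
    """Running totals for one community (names are illustrative)."""

    def __init__(self):
        self.nodes = set()
        self.inner_edges = 0
        self.outer_edges = 0

    def add_node(self, node, neighbours):
        """Update the totals when `node` joins; `neighbours` is its neighbour set."""
        inside = len(neighbours & self.nodes)
        outside = len(neighbours) - inside
        # Links to existing members were outer edges and now become inner edges.
        self.inner_edges += inside
        self.outer_edges += outside - inside
        self.nodes.add(node)


# Toy graph (not from the paper): a 4-clique on 0..3 plus the edge 3-4.
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3},
}

model = CommunityModel()
for v in (0, 1, 2, 3):
    model.add_node(v, adj[v])
print(len(model.nodes), model.inner_edges, model.outer_edges)
```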
Moreover, the number of expansions is bounded. From step 3 to step 7, even if some nodes are revisited each time, owing to the six-degree theory [
After discovering the community, the algorithm needs to confirm the isLocated attribute. Studies show that the threshold of the belonging factor of most nodes in a community is 2/3. In this paper, however, the threshold of the belonging factor is 3/4, and the rate of overlapping edges among all the adjacent edges must be less than 1/4. Our algorithm adopts this strict criterion to avoid missing possible overlapping nodes, such as node 31 in Karate club network, which is convenient for the error correction algorithm in the next step.
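The isLocated test can be sketched as follows, under two assumptions that the text does not spell out. First, the belonging factor of a node to a community is read as the fraction of its neighbours that lie in that community, which is consistent with the tabulated factors being multiples of 1/degree. Second, the rate of overlapping edges is read as the fraction of its neighbours that lie in some other community. The function names are hypothetical, and the toy node `n` only mimics the factors 0.75 and 0.50 reported for node 31.

```python
def belonging_factor(node, community, adj):
    """Fraction of the node's neighbours that lie in `community` (assumed reading)."""
    return len(adj[node] & community) / len(adj[node])


def is_located(node, home, others, adj, min_factor=0.75, max_foreign=0.25):
    """True if the node can be fixed in `home` under the strict criterion."""
    foreign = set().union(*others) if others else set()
    # Assumed reading of "rate of overlapping edges": links into other communities.
    foreign_rate = len(adj[node] & foreign) / len(adj[node])
    return belonging_factor(node, home, adj) >= min_factor and foreign_rate < max_foreign


# Toy data (not from the paper); only the adjacency rows that are queried are listed.
adj = {
    "n": {"a", "b", "c", "d"},   # three neighbours in A, two in B ("c" is shared)
    "p": {"a", "b"},             # all neighbours in A
}
A = {"a", "b", "c", "n", "p"}
B = {"c", "d", "n"}

print(belonging_factor("n", A, adj), belonging_factor("n", B, adj))
print(is_located("n", A, [B], adj), is_located("p", A, [B], adj))
```

Under this reading, a node with a belonging factor of exactly 3/4 is still left unlocated when a large share of its links lead into another community, so it is passed on to the error detection and correction stage.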
Studies on the known networks reveal that some nodes have been accessed (isVisited = true) in the raw detection stage but have not been assigned to any detected community. The reasons are as follows: (1) the modularity threshold is high; (2) some nodes are closely connected to each other and form a triangle-like structure, so that no individual vertex can meet the requirements. For example, nodes 25, 26, and 32 are unable to form an independent cluster and need to join the community together. It is therefore necessary to execute the redistribution algorithm for the unallocated nodes, which ensures that every node belongs to the cover.
The set of unallocated nodes is obtained as the difference between the initial node set and the set of allocated nodes. For each unallocated node, the process is described in Algorithm
[Algorithm listing for the redistribution of unallocated nodes, steps (1) to (23); the step text is not preserved.]
Owing to the complexity of networks, a chain effect may occur, in which adjacent nodes connected in sequence form a chain. In this case, the first node is allocated to the community, and the others are treated as isolated nodes, which do not participate in the subsequent community detection.
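The first part of the redistribution, obtaining the unallocated nodes as a set difference, can be sketched directly. The assignment rule that follows it in this sketch is a stand-in and not the paper's algorithm: each unallocated node is simply attached to the adjacent community with which it shares the most links, and a node with no link to any community is left isolated. The function name `redistribute` and the toy graph are hypothetical.

```python
def redistribute(adj, communities):
    """Return the unallocated nodes and a forced assignment for each of them.

    adj         -- dict mapping each node to the set of its neighbours
    communities -- list of node sets found by the raw detection
    """
    allocated = set().union(*communities) if communities else set()
    unallocated = set(adj) - allocated        # initial nodes minus allocated nodes
    assignment = {}
    for node in sorted(unallocated):
        links = [len(adj[node] & c) for c in communities]
        if links and max(links) > 0:
            # Stand-in rule: index of the community sharing the most links.
            assignment[node] = links.index(max(links))
        else:
            assignment[node] = None           # isolated node
    return unallocated, assignment


# Toy graph (not from the paper).
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1},
    3: {0, 1, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
    7: set(),
}
communities = [{0, 1, 2}, {4, 5, 6}]

unallocated, assignment = redistribute(adj, communities)
print(sorted(unallocated), assignment)
```

The sketch does not modify the communities while it runs, so the result does not depend on the order in which the unallocated nodes are processed; joining closely connected nodes together, as described for nodes 25, 26, and 32, would need an additional grouping step.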
The error detection and correction algorithm aims to recognize and check the overlapping nodes, which ensures the accuracy of the result. Here, the specific nodes are those whose isLocated attribute is false. In the initial community detection, nodes join a community at different times, and some core nodes are placed in the cluster first. Other nodes cannot be identified completely because the information about their adjacent nodes is still missing. In the subsequent redistribution process, the unallocated nodes lower the membership value of such nodes to the community, which may cause them to be wrongly labeled as members of another community. However, the isLocated attribute of a wrongly classified node is also false. Therefore, the nodes whose isLocated attribute is false form the minimum range on which the error correction algorithm has to be executed. Research on some known networks shows that the more evident the community structure is, the fewer specific nodes there are. The proportion is 11/34 in Karate club network, 5/21 in Protein reaction network, and 8/62 in Dolphins interaction network.
In our algorithm, the procedure for every such node is as follows.
(1) Get the list of adjacent communities. If any exist, go to the next step. Otherwise, go to step 5.
(2) Select an adjacent community, verify whether the node meets the conditions to join it (see the distribution algorithm), and then either join the community or continue.
(3) When all the adjacent communities have been tested, check whether the belonging factors follow the equal distribution. If they do, join each adjacent cluster. Otherwise, go to the next step.
(4) If the node joined its communities by equal distribution, return. Otherwise, continue.
(5) Get the set of communities to which the node belongs. If the count is 1, return. Otherwise, go to the next step.
(6) Choose an unverified community and recalculate the belonging factor of the node with respect to it. If it is greater than the threshold, return.
(7) Calculate the variation in modularity when the node is removed from the cluster. If it is positive, remove the node and return. Otherwise, return directly.
After the initial community detection, the membership information of the nodes is mostly complete. The experiments reveal that some overlapping nodes interact with each other, which can change the node properties. That is to say, a newly joined node may bring its unallocated adjacent nodes into the same community, and the allocated nodes may become overlapping nodes and expand the overlap region, as nodes 10 and 3 do. Therefore, the mutually linked nodes should be extracted, and the error detection and correction algorithm should be executed on them once again. This removes possible wrong divisions and makes the partition reasonable and stable.
Moreover, since the error detection and correction algorithm targets the specific nodes, extending it to all the nodes of the network makes it possible to check the validity of other community detection algorithms. If no node membership changes, their results are stable and accurate.
Community modularity is independent with the partition pattern, whether it is hard or crisp. But for the overlapping and nonoverlapping nodes, the roles are different in the network. It needs bridgeness [
The sum of node membership vertex meets the condition in (
For each node, bridgeness measures the degree to which it is shared among different communities. Nepusz et al. define it to be 0 when the node belongs to only one cluster and 1.0 when the node is shared equally by the communities it belongs to. On the basis of membership vector
The distribution of the belonging factors determines the status of a node in the network. The closer it is to the uniform distribution, the greater the effect is. However, the membership number is also a positive and significant element, and the degree of the node itself cannot be ignored. Nepusz et al. overlook these factors of the node itself. If two nodes both conform to the uniform distribution, the bridgeness of each is 1.0, so it cannot indicate their relative importance. Moreover, the node contribution of overlap can be positive in more than one community, and
Improved bridgeness is defined as follows, in which
As shown in (
In this section, we evaluate our algorithm on the community structures of three well-known networks, namely, Karate club network, Protein reaction network, and Dolphins interaction network. Karate club network [
Karate club network is the classic interpersonal relationship network, in which 34 members form 78 connections. Since disagreements between the members split the club into two factions, the network is divided into two distinct communities. In the traditional hard classification model, a node can belong to only one community. However, after applying our algorithm to this network, we found that three nodes meet the criteria of an overlapping node. Since overlap is an important characteristic of complex networks, the analysis of the network structure indicates that the overlap model agrees better with the actual situation.
During the run of our algorithm, the root node of the first community, node “1,” is chosen first; after three extensions no further nodes can join, and the original community is found. Then the root node “15” is selected; the expansion ends after three external extensions, which completes the discovery of the other community. The obvious overlapping nodes {“9”, “31”} are identified. The original community discovery process is described as follows:
Com1: “1”
Com2: “15”
For the forced distribution, the set of unallocated nodes is {“32”, “25”, “26”, “29”, “10”}, and none of the overlapping nodes is misclassified. After the initial community distribution, the community membership of 29 nodes is determined.
In the error detection and correction algorithm, the set of the detected nodes is {“34”, “3”, “32”, “9”, “31”, “28”, “29”, “26”, “26”, “25”, “20”, “10”}. Since the adjacent nodes have not been fully allocated during the initial distribution, most of these nodes lack adjacency information for reference, and isLocated = true cannot be determined for them. For the closely connected nodes {“3”, “9”, “10”}, the theoretical analysis has already pointed out that a wrong classification may change the node properties. The error detection and correction algorithm should therefore be performed again for such nodes, which eliminates the unreasonable factors and makes the classification stable.
Figure
The detailed information of overlapping nodes in Karate club network.
Overlapping node  Degree  Membership number  Belonging factors  Original bridgeness  Improved bridgeness
3  10  2  0.60, 0.50  0.86  0.40
9  5  2  0.60, 0.80  0.55  0.30
31  4  2  0.50, 0.75  0.65  0.25
The community structure found by our algorithm in Karate club network.
As seen from the membership values, in Karate club network the sum of the membership degrees of each overlapping node is greater than 1.0, and each node increases the modularity of the communities it belongs to. In addition, for node 3 and node 31, the sum of the absolute differences between the entries of the membership vector and the average distribution is less than or equal to a quarter, so these nodes also satisfy the condition of average distribution. This verifies the principle and the order of the conditions for determining overlapping nodes.
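The average distribution check described above can be written in a few lines. The sketch assumes that the average distribution of a node with k memberships assigns 1/k to each community and that the tolerance is the quarter mentioned in the text; the function name is hypothetical. The inputs are the belonging factors reported for the three overlapping nodes of Karate club network.

```python
def is_average_distribution(factors, tolerance=0.25):
    """Sum of |factor - 1/k| over the k memberships must not exceed the tolerance."""
    k = len(factors)
    deviation = sum(abs(f - 1 / k) for f in factors)
    return deviation <= tolerance


karate_overlaps = {
    "3": [0.60, 0.50],
    "9": [0.60, 0.80],
    "31": [0.50, 0.75],
}

for node, factors in karate_overlaps.items():
    print(node, is_average_distribution(factors))
```

Nodes 3 and 31 pass the check, whereas node 9 deviates by about 0.4 and qualifies as an overlapping node through the other conditions instead.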
Protein reaction network is built according to the metabolic reaction relationships between biological proteins and contains 21 nodes and 61 edges. It is a typical overlapping community network with an obvious community structure. Running our algorithm for finding the original communities identifies all of the communities and classifies all nodes correctly, including the highly overlapping nodes. The redistribution algorithm does not need to run, so our algorithm finds the overlapping communities quickly, which validates its efficiency.
Let us focus on the execution of our algorithm. The root node of each community and its subsequent extension process are described as follows:
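As a quick check of the density range of 0.10 to 0.30 quoted for the test networks, the density of an undirected simple graph can be computed from the node and edge counts given in this section (34 nodes and 78 edges for Karate club network, 21 nodes and 61 edges for Protein reaction network). The function name is illustrative, and the standard density formula 2m/(n(n−1)) is assumed to be the one intended.

```python
def network_density(n_nodes, n_edges):
    """Density of an undirected simple graph: edges over possible node pairs."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


karate_density = network_density(34, 78)
protein_density = network_density(21, 61)
print(round(karate_density, 3), round(protein_density, 3))
```

Both values fall inside the quoted range, with Protein reaction network close to its upper end.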
Com1: “cdc12”
Com2: “cph1”
Com3: “pph21”
Figure
The detailed information of overlapping nodes in Protein reaction network.
Overlapping node  Degree  Membership number  Belonging factors  Original bridgeness  Improved bridgeness
Zds2  9  2  0.56, 0.44  0.89  0.39
Zds1  10  2  0.30, 0.40, 0.40  0.88  0.53
The community structure found by our algorithm in Protein reaction network.
Protein reaction network also demonstrates the irrationality of the original bridgeness. Both the degree and the community membership number of Zds1 are greater than those of Zds2, yet the original bridgeness of Zds1 is still lower than that of Zds2, and the difference between the two values is very small. Our improved bridgeness avoids this defect and better reflects the position of the node in the network communities.
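The two original bridgeness values compared above can be reproduced with the bridgeness of Nepusz et al. in its commonly stated form, b = 1 − sqrt(c/(c−1) · Σ_j (u_j − 1/c)²), where c is the number of communities the node belongs to. The sketch rests on two assumptions: the formula is applied directly to the belonging factors without normalizing them to sum to 1, and the factors of Zds2 are taken as the exact fractions 5/9 and 4/9 implied by its degree of 9. The function name is hypothetical, and the improved bridgeness is not reconstructed here.

```python
import math


def nepusz_bridgeness(membership):
    """Original bridgeness for one node's membership values (assumed form)."""
    c = len(membership)
    if c < 2:
        return 0.0                      # a node in a single cluster is not a bridge
    spread = sum((u - 1 / c) ** 2 for u in membership)
    return 1 - math.sqrt(c / (c - 1) * spread)


zds2 = nepusz_bridgeness([5 / 9, 4 / 9])
zds1 = nepusz_bridgeness([0.30, 0.40, 0.40])
print(round(zds2, 2), round(zds1, 2))
```

The measure depends only on how evenly the memberships are spread, so neither the higher degree nor the larger number of memberships of Zds1 is rewarded, which is the defect discussed above.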
Dolphins interaction network is established according to the dolphin interaction information. Its community structure found by our algorithm is shown as Figure
Com1: 0
Com2: 13
The community structure found by our algorithm in Dolphins interaction network.
The shortest distance between the starting node 0 of community Com1 and node 58, the last node to join Com1, is only 3, which agrees with the small-world property of the network. The unallocated node set {“27”, “25”, “26”, “22”, “31”, “48”} is allocated correctly, and the specific nodes {“17”, “1”, “27”, “7”, “19”, “25”, “26”, “39”} become stable after the error detection and correction algorithm completes.
The detailed information of overlapping nodes is shown in Table
The detailed information of overlapping nodes in Dolphins interaction network.
Overlapping node  Degree  Membership number  Belonging factors  Original bridgeness  Improved bridgeness
19  4  2  0.50, 0.75  0.65  0.25
39  2  2  0.50, 0.50  1.00  0.14
7  5  2  0.60, 0.60  0.80  0.33
Other optimization-based community detection algorithms take the maximum of the objective function as the termination condition. The initial community detection algorithm proposed in this paper is different: it is based on extracting the features of overlapping nodes. Multiple conditions and thresholds are combined to find natural communities, covering the following situations: (1) the basic modularity increases; (2) the density increases while the modularity remains stable; (3) the rate of connection to the community reaches the threshold; (4) the modularity is greater than the specified threshold and remains stable; (5) the belonging factors conform to the uniform distribution. These conditions are checked in priority order, and the conditions with high computational complexity are checked last. This arrangement of the priorities avoids unnecessary calculation and reduces the complexity.
While our algorithm runs, controlling the isLocated attribute gradually shrinks the expansion space of the available adjacent nodes and cuts down repeated accesses to nodes, which improves the efficiency of our algorithm. In theory, a nonoverlapping node is visited only once. For a possible overlapping node, only the relevant adjacent communities need to be visited and verified. The attributes thus classify the nodes and effectively avoid repeated visits to and computations on irrelevant nodes.
In the process of discovering natural communities, establishing the community model and updating the community information in real time avoids recomputing the community information when nodes join the community. Most of the time is spent in finding the natural communities. The subsequent error detection and correction algorithm is a supplement; since it involves fewer nodes, its computational cost is lower than that of the community discovery process. The overall complexity is therefore approximately linear. The proportion of nodes involved in each stage of our algorithm is shown in Table
The proportion of overlapping nodes and nodes of each stage in the networks.
Network  Overlapping proportion  Raw community proportion  Redistribution proportion  Error detection proportion
Karate  0.09  0.85  0.15  0.32
Protein  0.10  1.00  0.00  0.24
Dolphins  0.05  0.90  0.10  0.13
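The tabulated proportions follow from node counts stated elsewhere in the text, as the short check below shows. Most counts are taken directly from the experiments (for example, 29 of the 34 Karate nodes are fixed by the raw detection and 11 are passed to error detection). The raw count of 56 for Dolphins is an assumption, derived as the 62 nodes minus the 6 redistributed nodes, and the dictionary keys are illustrative names.

```python
# Node counts per stage; "raw" for Dolphins is derived (62 - 6), not quoted.
stages = {
    "Karate": {"nodes": 34, "overlapping": 3, "raw": 29, "redistributed": 5, "checked": 11},
    "Protein": {"nodes": 21, "overlapping": 2, "raw": 21, "redistributed": 0, "checked": 5},
    "Dolphins": {"nodes": 62, "overlapping": 3, "raw": 56, "redistributed": 6, "checked": 8},
}


def proportions(counts):
    """Share of all nodes involved in each stage, rounded to two decimals."""
    n = counts["nodes"]
    return tuple(
        round(counts[key] / n, 2)
        for key in ("overlapping", "raw", "redistributed", "checked")
    )


for network, counts in stages.items():
    print(network, proportions(counts))
```

In all three networks the raw detection settles at least 85% of the nodes, and the error detection stage touches no more than about a third of them.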
Among the stages of our algorithm, the core is the raw community detection algorithm, which detects the main areas of the communities. It determines the number of partitions of the network and affects the efficiency of the algorithm. According to the data from the networks, the vast majority of the nodes in a community are nonoverlapping. Overlapping and nonoverlapping nodes can be distinguished by the isLocated attribute, which greatly reduces repeated visits to and calculations on irrelevant nodes and enables the algorithm to detect communities rapidly. In Figure
The proportion of the various types of nodes in networks.
In this paper, we first put forward new features of modularity and show the structural advantage of the new weighted modularity, which combines cohesion and density. Experiments on classical networks with well-known community structures explore the distribution of the parameter factor. In addition, based on the low proportion of overlap in the communities, we present a fast overlapping community detection algorithm with self-correction, which marks the nodes with the attributes isVisited and isLocated and consists of two stages: (1) the initial community detection algorithm and (2) the error detection and correction algorithm. We also propose an improved bridgeness function to evaluate the extent to which a node overlaps. The experimental results demonstrate that extracting the overlap features benefits the expansion step of the discovery algorithm. Although our algorithm is already effective, future work can extend it to more types of networks in order to determine appropriate parameters and to summarize how the parameters are distributed. In addition, the thresholds in the overlapping node conditions, such as those for modularity, stationarity, and close connection, are strict, which slightly enlarges the set of nodes handled by error detection and correction. Finding and extracting new features of overlapping nodes is the direction of our next step. Based on the experiments on the existing networks, we expect that our algorithm can be applied to large-scale networks in the future.
The authors declare that there is no conflict of interests regarding the publication of this paper.
This work was supported in part by Guangdong Natural Science Foundation (Grant no. S2013040012895), the Major Fundamental Research Project in the Science and Technology Plan of Shenzhen (Grant no. JCYJ20120613104215889, Grant no. JCYJ20130329102017840, and Grant no. JCYJ20130329102032059), and Natural Science Foundation of SZU (Grant no. 00035697). The authors wish to express their appreciation to the four anonymous reviewers and the journal editor for their valuable suggestions and comments that helped to greatly improve the paper.