Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion

With continued research in recent years on the physical meaning and numerical characteristics of community structure in complex networks, improving the effectiveness and efficiency of community mining algorithms has become an important topic in this area. This paper puts forward the concept of the microcommunity and obtains the final community mining results by fusing different microcommunities. The paper starts from the basic definition of the network community and applies Expansion to microcommunity clustering, which provides the prerequisites for microcommunity fusion. Test results on network data sets indicate that the proposed algorithm is more efficient and achieves higher solution quality than other similar algorithms.


Introduction
Network community structure is one of the most common and important topological properties of complex networks; its characteristic is that links within the same community are dense while links between different communities are sparse. Research on network community mining algorithms has important theoretical value for analyzing the topology of complex networks, understanding their function, finding their hidden patterns, and predicting their behavior, and it is widely applied to social networks, biological networks, and the World Wide Web. The literature [1][2][3][4] summarizes the research background, research significance, research status at home and abroad, and current main problems of complex network clustering methods.
Network community clustering algorithms can be divided into intelligent optimization algorithms, heuristic algorithms, and hybrids of the two. The idea of an intelligent optimization algorithm is to abstract the community clustering problem into the mathematical problem of finding an optimal solution; the candidate solution is updated whenever it improves the objective function. The idea of a heuristic algorithm is to compute the community to which each node belongs according to the rules of the algorithm [5][6][7].
In recent years, with further research on community structure in complex networks, efficient community clustering algorithms have continued to emerge. The multiobjective discrete particle swarm optimization (MODPSO) [8], one of the intelligent optimization algorithms, computes the optimal community clustering scheme by updating two objective functions: NRA and RC. This algorithm has better nonrandomness and executes efficiently. Research on heuristic algorithms also continues to develop; core node fusion algorithms based on the data field [9] and on betweenness centrality [10] have received widespread attention.
Radicchi et al. have given the characteristics of the network community structure [11]: links between nodes in the same community are dense, while links between nodes in different communities are sparse. For a network $G = (V, E)$, $V$ and $E$ represent the sets of nodes and edges in the network, $k_i$ represents the degree of node $i$ (the number of nodes connected with node $i$), and $A$ represents the adjacency matrix of the network $G$. For a community $C$, the definition of strong community is
$$k_i^{\mathrm{in}}(C) > k_i^{\mathrm{out}}(C), \quad \forall i \in C. \tag{1}$$
The definition of weak community is
$$\sum_{i \in C} k_i^{\mathrm{in}}(C) > \sum_{i \in C} k_i^{\mathrm{out}}(C). \tag{2}$$
According to this characteristic, the edges connected with a node $i$ can be split into two parts with respect to a community $C$: edges that connect $i$ to community $C$ and edges that do not. The numbers of these edges are $k_i^{\mathrm{in}}$ and $k_i^{\mathrm{out}}$, respectively, with $k_i = k_i^{\mathrm{in}} + k_i^{\mathrm{out}}$. We can determine the community to which a node belongs by comparing the two values. If $k_i^{\mathrm{in}} > k_i^{\mathrm{out}}$, node $i$ belongs to community $C$. Further analysis shows that if $k_i^{\mathrm{in}} \geq k_i/2$, we can determine that node $i$ belongs to this community.
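The strong and weak community conditions above can be checked directly on a small graph. The following is a minimal sketch, assuming an undirected simple graph stored as a dictionary of neighbor sets; the helper names (`build_graph`, `in_out_degree`) and the toy edge list are illustrative and do not come from the paper.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def in_out_degree(graph: Graph, community: Set[Hashable], node: Hashable) -> Tuple[int, int]:
    """Return (k_in, k_out): links of `node` inside and outside `community`."""
    k_in = sum(1 for nb in graph[node] if nb in community)
    return k_in, len(graph[node]) - k_in


def is_strong_community(graph: Graph, community: Set[Hashable]) -> bool:
    """Strong sense: every member has k_in > k_out."""
    return all(k_in > k_out
               for k_in, k_out in (in_out_degree(graph, community, v) for v in community))


def is_weak_community(graph: Graph, community: Set[Hashable]) -> bool:
    """Weak sense: the summed k_in exceeds the summed k_out."""
    pairs = [in_out_degree(graph, community, v) for v in community]
    return sum(p[0] for p in pairs) > sum(p[1] for p in pairs)


# Toy data (hypothetical): two triangles joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
G = build_graph(EDGES)
print(is_strong_community(G, {0, 1, 2}))     # True
print(is_weak_community(G, {0, 1, 2, 3}))    # True
print(is_strong_community(G, {0, 1, 2, 3}))  # False: node 3 has k_in=1, k_out=2
```

The last call shows why the strong definition is stricter: a single member with more outside links than inside links is enough to violate it, even when the community as a whole is densely connected.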
Newman and Girvan put forward the concept of modularity to measure the quality of network community clustering [12]. Many community clustering algorithms have adopted this concept as an index of clustering quality. The formula of modularity $Q$ is given as follows:
$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(i, j), \tag{3}$$
where $m$ represents the number of edges in the network, $A_{ij}$ is the adjacency matrix of the network, $k_i$ is the degree of node $i$, and $\delta(i, j) = 1$ represents that node $i$ and node $j$ belong to the same community while $\delta(i, j) = 0$ represents that node $i$ and node $j$ are not in the same community.
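As an illustration of the modularity formula, the sketch below evaluates the double sum literally for a small undirected simple graph. The function name, the edge-list input, and the toy data are assumptions made for this example; the quadratic loop mirrors the formula and is not meant to be efficient.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               labels: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity Q of a partition of an undirected simple graph.

    Assumes no self-loops and no duplicate edges.
    """
    edge_list = list(edges)
    m = len(edge_list)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    adjacent = set()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    nodes = list(degree)
    total = 0.0
    for i in nodes:
        for j in nodes:
            if labels[i] == labels[j]:  # delta(i, j) = 1
                a_ij = 1.0 if (i, j) in adjacent else 0.0
                total += a_ij - degree[i] * degree[j] / (2.0 * m)
    return total / (2.0 * m)


# Toy data (hypothetical): two triangles joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
TWO_GROUPS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ONE_GROUP = {v: "a" for v in range(6)}
print(round(modularity(EDGES, TWO_GROUPS), 4))  # 0.3571 (exactly 5/14)
print(round(modularity(EDGES, ONE_GROUP), 4))   # 0.0
```

Putting every node into one community always gives $Q = 0$, which is why a positive $Q$ is read as evidence of community structure.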
In the multiobjective particle swarm algorithm, the modularity $Q$ used as the objective function of the single-objective particle swarm algorithm is replaced with the modularity density $D$, and the search proceeds by updating the values of RA (Ratio Association) and RC (Ratio Cut). The formulas of RA, RC, and $D$ are as follows:
$$\mathrm{RA} = \sum_{i=1}^{k} \frac{L(V_i, V_i)}{|V_i|}, \tag{4}$$
$$\mathrm{RC} = \sum_{i=1}^{k} \frac{L(V_i, \overline{V_i})}{|V_i|}, \tag{5}$$
$$D = \mathrm{RA} - \mathrm{RC} = \sum_{i=1}^{k} \frac{L(V_i, V_i) - L(V_i, \overline{V_i})}{|V_i|}, \tag{6}$$
where $n$ denotes the number of nodes in the network, $k$ represents the number of divided communities in the network, $V_i$ is the $i$th community among the divided communities, $|V_i|$ is the number of nodes in the $i$th community, $\overline{V_i}$ represents the set of nodes which are not in the community $V_i$, $L(V_a, V_b) = \sum_{p \in V_a, q \in V_b} A_{pq}$, and $A$ is the adjacency matrix of the network. RA and RC are closely connected with the two measurement indexes (Conductance and Expansion) of network community clustering mentioned in [13]. Conductance denotes the ratio of the number of edges pointing outside the community to the total number of edge endpoints of the community. Expansion represents the number of edges per node that point outside the community.
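A short sketch of these quantities follows. It assumes the commonly used definitions $\mathrm{RA} = \sum_i L(V_i, V_i)/|V_i|$, $\mathrm{RC} = \sum_i L(V_i, \overline{V_i})/|V_i|$, and $D = \mathrm{RA} - \mathrm{RC}$, where $L(V_i, V_i)$ sums adjacency entries and therefore counts every internal edge twice. The function name and the toy data are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity_density(edges: Iterable[Tuple[Hashable, Hashable]],
                       labels: Dict[Hashable, Hashable]) -> Tuple[float, float, float]:
    """Return (RA, RC, D) for a partition of an undirected simple graph."""
    size = defaultdict(int)          # |V_c|
    for community in labels.values():
        size[community] += 1
    internal = defaultdict(int)      # L(V_c, V_c): adjacency sum, 2 per internal edge
    external = defaultdict(int)      # L(V_c, complement of V_c)
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        if cu == cv:
            internal[cu] += 2
        else:
            external[cu] += 1
            external[cv] += 1
    ra = sum(internal[c] / size[c] for c in size)
    rc = sum(external[c] / size[c] for c in size)
    return ra, rc, ra - rc


# Toy data (hypothetical): two triangles joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
LABELS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ra, rc, d = modularity_density(EDGES, LABELS)
print(round(ra, 4), round(rc, 4), round(d, 4))  # 4.0 0.6667 3.3333
```

Unlike modularity, $D$ normalizes by community size, so a partition with many small, poorly connected groups is penalized through large RC terms.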
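The formula of Conductance is
$$\mathrm{Conductance}(S) = \frac{c_S}{2 m_S + c_S}. \tag{7}$$
The formula of Expansion is
$$\mathrm{Expansion}(S) = \frac{c_S}{n_S}. \tag{8}$$
In the above formulas, $c_S$ represents the number of links on the boundary of $S$, $m_S$ denotes the number of links within the community $S$, and $n_S$ is the number of nodes in community $S$.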
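Both indexes can be computed in a single pass over the edge list. The sketch below assumes the usual forms $c_S / (2 m_S + c_S)$ for Conductance and $c_S / n_S$ for Expansion; the function name and the example graph are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def conductance_and_expansion(edges: Iterable[Tuple[Hashable, Hashable]],
                              community: Set[Hashable]) -> Tuple[float, float]:
    """Return (Conductance, Expansion) of `community` in an undirected graph."""
    m_s = 0  # links with both endpoints inside the community
    c_s = 0  # links on the boundary (exactly one endpoint inside)
    for u, v in edges:
        inside = (u in community) + (v in community)
        if inside == 2:
            m_s += 1
        elif inside == 1:
            c_s += 1
    n_s = len(community)
    volume = 2 * m_s + c_s
    conductance = c_s / volume if volume else 0.0
    expansion = c_s / n_s if n_s else 0.0
    return conductance, expansion


# Toy data (hypothetical): two triangles joined by the edge (2, 3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
cond, exp = conductance_and_expansion(EDGES, {0, 1, 2})
print(round(cond, 4), round(exp, 4))  # 0.1429 0.3333
```

Lower values of either index indicate a community that is better separated from the rest of the network.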
The algorithm of this paper adopts the divide-and-conquer strategy [14]. The nodes in the network are divided into microcommunities, each with a single node as its core. We obtain the community structure by fusing the microcommunities in random order [15]. After the core steps of the algorithm are finished, the final result of the community clustering is screened out via the modularity density index. The clustering of microcommunities and the whole process of the algorithm are described in detail in Section 2. Section 3 presents a simulation analysis of the experimental results of the algorithm.

Microcommunity Fusion
The algorithm in this paper constructs microcommunities according to the index Expansion during the process of community clustering. In the microcommunity fusion procedure, it merges microcommunities according to the definition of strong community.

Microcommunity.
The algorithm differs from other heuristic algorithms. It first divides the network into several basic "microcommunities" and then merges and fuses these "microcommunities" to obtain the final result of the network community clustering.
First, the nodes with the largest degrees in the network are selected as center nodes, one for each microcommunity. The smallest degree among the selected nodes is called the threshold. Testing different choices of the threshold shows that the experimental result is ideal when the threshold is larger than the average node degree and the center nodes account for about 20% (see Section 3.4) of the total number of nodes. The formula for the threshold Deg is
$$\mathrm{Deg} = d_{[n(1-p)]}, \tag{9}$$
where $\{d_1, d_2, \ldots, d_n\}$ is the array of node degrees in the network sorted in ascending order, $n$ is the number of elements in the array, that is, the number of nodes in the network, and the value of the proportion $p$ is 20% in this algorithm.
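The threshold rule can be sketched as follows. The symbol `p` for the center-node proportion, the 1-based position $[n(1-p)]$, and rounding that position up are assumptions of this example, since the text does not state the rounding convention; nodes whose degree is equal to or greater than the threshold become center nodes.

```python
import math
from typing import Dict, Hashable, Set, Tuple


def degree_threshold(degrees: Dict[Hashable, int], p: float = 0.2) -> int:
    """Degree found at 1-based position ceil(n * (1 - p)) of the ascending degree array."""
    if not degrees:
        raise ValueError("the network has no nodes")
    ordered = sorted(degrees.values())
    n = len(ordered)
    # round() guards against floating-point noise such as 49.600000000000001
    position = math.ceil(round(n * (1.0 - p), 9))
    position = min(max(position, 1), n)  # clamp to a valid 1-based index
    return ordered[position - 1]


def select_centers(degrees: Dict[Hashable, int], p: float = 0.2) -> Tuple[int, Set[Hashable]]:
    """Return the threshold and every node whose degree reaches it."""
    threshold = degree_threshold(degrees, p)
    return threshold, {v for v, k in degrees.items() if k >= threshold}


# Toy degree table (hypothetical data, n = 10).
DEGREES = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 3, "f": 3, "g": 4, "h": 5, "i": 6, "j": 7}
threshold, centers = select_centers(DEGREES, p=0.2)
print(threshold, sorted(centers))  # 5 ['h', 'i', 'j']
```

Because several nodes can share the threshold degree, the number of selected centers can exceed the nominal proportion, and different values of `p` can map to the same threshold.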
For each chosen center node, eligible nodes among its neighbors are selected to join the microcommunity. This algorithm uses the index Expansion summarized in [10] for the selection. Because Expansion is a nonlinear function, its value does not vary over the expected range when the number of nodes is small. Thus, this paper adjusts the way Expansion is computed by removing the center node and its connected edges from the calculation. A calculation example of Expansion is given in Figure 1.
The calculation process of Expansion with $v_9$ as the center node is as follows. At the initialization stage, all neighbor nodes of $v_9$ form a microcommunity with $v_9$ as its center node, and the value of Expansion is 4/6. The algorithm traverses each neighbor node and calculates the value of Expansion after removing that node from the microcommunity. If the value decreases, the node is removed. Otherwise, the algorithm proceeds to the next neighbor node. For this example, the change of the Expansion value for each node in the network diagram is given in Table 1. The first node traversed is selected randomly; here it is $v_3$. EXP is the value of Expansion before removing the node. EXP NEXT is the value of Expansion after removing the node.
According to the table, the nodes of the microcommunity with $v_9$ as center node are $v_3$, $v_6$, $v_7$, $v_8$, and $v_9$. Compared with the standard Expansion, the Expansion used to screen nodes out of the microcommunity imposes a stricter computational condition. This helps keep the structure of the microcommunity stable and makes the change of nodes more explicit during the process of microcommunity fusion.
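The pruning procedure can be sketched as below. This is one possible reading of the adjusted Expansion, not the paper's exact computation: the microcommunity members are the neighbors of the center, the center and its edges are ignored, and Expansion is the number of edges leaving the member set divided by the number of members. The function names, the fixed traversal order, and the toy graph are assumptions.

```python
from typing import Dict, Hashable, Iterable, Optional, Sequence, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def adjusted_expansion(graph: Graph, center: Hashable, members: Set[Hashable]) -> float:
    """Edges from members to nodes outside members, ignoring the center, per member."""
    excluded = members | {center}
    boundary = sum(1 for v in members for nb in graph[v] if nb not in excluded)
    return boundary / len(members)


def build_microcommunity(graph: Graph, center: Hashable,
                         order: Optional[Sequence[Hashable]] = None) -> Set[Hashable]:
    """Start from all neighbors of `center`; drop a neighbor when that lowers Expansion."""
    members = set(graph[center])
    for node in (order if order is not None else sorted(graph[center])):
        if node not in members or len(members) == 1:
            continue  # never empty the microcommunity
        current = adjusted_expansion(graph, center, members)
        reduced = members - {node}
        if adjusted_expansion(graph, center, reduced) < current:
            members = reduced
    return members | {center}


# Toy data (hypothetical): node 0 is the center; node 4 mostly links elsewhere.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (2, 3), (1, 3),
         (4, 5), (4, 6), (4, 7)]
G = build_graph(EDGES)
print(sorted(build_microcommunity(G, 0)))  # [0, 1, 2, 3]
```

In the toy graph the initial value is 3/4; removing node 4 lowers it to 0, so node 4 is dropped, while removing any of the tightly linked nodes 1, 2, or 3 would raise the value and they are kept.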

Algorithm Flow.
In this algorithm, each clustered center node is used as the core node of a microcommunity. The algorithm merges microcommunities by comparing how closely different core nodes are linked. Let $i$ and $j$ be core nodes: $k_i$ represents the degree of node $i$, $k_j$ represents the degree of node $j$, each of the two nodes has an assigned community number, $A$ is the adjacency matrix, so that $A_{ij}$ indicates whether the two nodes are linked, and the number of overlapping neighbor nodes of $i$ and $j$ is used as well. If the condition of formula (10) holds, the community numbers of the two core nodes are set equal, and all nodes of the microcommunity are updated synchronously. If the condition of formula (11) holds, the community numbers are likewise set equal, and all nodes of the microcommunity are updated synchronously.
After the basic fusion of microcommunities is complete, the nodes that have not yet been clustered are classified. For each such node, the proportion of its neighbor nodes belonging to each community number is checked, and the node is added to the community with the largest proportion. The merging operation follows the sequence in which the network nodes are visited, and the network nodes are searched three times in total. However, the relationships between nodes in complex networks are intricate. A node may appear among the neighbor nodes of more than one other node, and the overlap ratio may exceed 1/2. Therefore, if the nodes are searched and classified in a fixed order, unpredictable extreme cases can arise. For this reason, the search order in this paper is generated randomly. In the last step of the algorithm, the result that best fits the community clustering rules is screened out.
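The classification of leftover nodes can be sketched as a majority vote over already labeled neighbors, visited in random order. The function name, the single pass, the tie-breaking behavior of `Counter.most_common`, and the toy data are assumptions of this sketch rather than details given in the paper.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def assign_unclustered(graph: Graph, labels: Dict[Hashable, Hashable],
                       seed: Optional[int] = None) -> Dict[Hashable, Hashable]:
    """Give each unlabeled node the community held by most of its labeled neighbors."""
    rng = random.Random(seed)
    result = dict(labels)  # do not modify the caller's mapping
    pending = [v for v in graph if v not in result]
    rng.shuffle(pending)   # random visiting order, as in the text
    for node in pending:
        votes = Counter(result[nb] for nb in graph[node] if nb in result)
        if votes:  # nodes without labeled neighbors stay unlabeled in this pass
            result[node] = votes.most_common(1)[0][0]
    return result


# Toy data (hypothetical): two triangles joined by the edge (2, 3); node 2 is unlabeled.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
PARTIAL = {0: "A", 1: "A", 3: "B", 4: "B", 5: "B"}
print(assign_unclustered(G, PARTIAL, seed=1)[2])  # A
```

Node 2 has two neighbors in community A and one in community B, so it joins A regardless of the visiting order.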
The following are the specific steps of the algorithm.
Step 1. Group the nodes in the network into microcommunities and set the numbers of the center nodes as the microcommunity numbers according to formulas (8) and (9).
Step 2. Fuse the microcommunities in the order of a random sequence according to formulas (10) and (11).
Step 3. Add each community that has not been detected to its most closely linked community.
Step 4. Classify the nodes that have not yet been assigned to a community.
Step 5. After the communities are detected, compute the modularity density $D$ and save it together with the classification result.

Mathematical Problems in Engineering
Because different orders of merging network nodes can lead to different results, the clustering result with the largest value of the modularity density $D$, computed according to formula (6), is used as the final result of the community clustering.
The core steps of the algorithm, Steps 2 and 3, each search the network nodes once. The time complexity of the algorithm is $O(n^2)$.
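The selection of the best run over random merge orders can be sketched generically. Here `cluster_fn` stands in for the fusion and classification steps and `score_fn` for the modularity density; both are hypothetical callables supplied by the caller, and the default of 100 runs follows the number of random sequences mentioned in the conclusions.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def best_of_random_orders(items: Sequence[T],
                          cluster_fn: Callable[[List[T]], R],
                          score_fn: Callable[[R], float],
                          runs: int = 100,
                          seed: Optional[int] = None) -> Tuple[R, float]:
    """Cluster with `runs` random orders of `items`; keep the highest-scoring result."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    rng = random.Random(seed)
    best_result, best_score = None, float("-inf")
    for _ in range(runs):
        order = list(items)
        rng.shuffle(order)            # a fresh random merge order
        result = cluster_fn(order)    # placeholder for the fusion steps
        score = score_fn(result)      # placeholder for the modularity density
        if score > best_score:
            best_result, best_score = result, score
    return best_result, best_score


# Toy stand-ins (hypothetical): the "clustering" is the order itself and the
# score rewards orders that start with a large element.
result, score = best_of_random_orders([1, 2, 3], lambda order: list(order),
                                      lambda r: float(r[0]), runs=20, seed=0)
print(sorted(result), score == float(result[0]))  # [1, 2, 3] True
```

Passing a fixed `seed` makes the sequence of random orders reproducible, which is convenient when comparing parameter settings.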

Simulations and Analysis
The algorithm of this paper is written in Java. The hardware environment for running the program is an Intel(R) Core(TM) i5-4200U CPU at 1.60 GHz with 4 GB RAM. The software environment is the Microsoft Windows 8.1 operating system, JDK 1.7, and the Eclipse development environment.
To analyze the quality of network community clustering, this paper adopts the Normalized Mutual Information (NMI) index described in [16] to compare the actual clustering result with the clustering result of this algorithm. NMI is commonly used to estimate the similarity between the true clustering results and the detected ones. Two vectors, $A$ and $B$, are input for the comparison. The $i$th element of a vector represents the class of the $i$th node. The NMI $(A, B)$ is then defined as follows:
$$\mathrm{NMI}(A, B) = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} C_{ij} \log\left( \frac{C_{ij} N}{C_{i.} C_{.j}} \right)}{\sum_{i=1}^{C_A} C_{i.} \log\left( \frac{C_{i.}}{N} \right) + \sum_{j=1}^{C_B} C_{.j} \log\left( \frac{C_{.j}}{N} \right)}, \tag{12}$$
where $C_A$ ($C_B$) is the number of clusters in vector $A$ ($B$), $C$ is the confusion matrix formed from vector $A$ and vector $B$, $C_{ij}$ is the number of elements shared by the $i$th class of vector $A$ and the $j$th class of vector $B$, $C_{i.}$ ($C_{.j}$) is the sum of the elements of $C$ in row $i$ (column $j$), and $N$ is the number of nodes of the network. The value of NMI $(A, B)$ lies in the interval [0, 1]. If NMI $(A, B) = 1$, then $A = B$. If NMI $(A, B) = 0$, then $A$ and $B$ are totally different.
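The NMI index can be computed from the confusion matrix of the two label vectors. The sketch below follows the confusion-matrix form of the definition; the function name, the use of `collections.Counter` to hold the nonzero matrix entries, and the convention of returning 1.0 when both vectors contain a single class are assumptions of this example.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized mutual information of two label vectors of equal length."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label vectors must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))  # nonzero confusion-matrix entries C_ij
    rows = Counter(a)           # row sums C_i.
    cols = Counter(b)           # column sums C_.j
    numerator = -2.0 * sum(c * math.log(c * n / (rows[i] * cols[j]))
                           for (i, j), c in joint.items())
    denominator = (sum(r * math.log(r / n) for r in rows.values())
                   + sum(c * math.log(c / n) for c in cols.values()))
    if denominator == 0.0:
        return 1.0  # both vectors hold one class only (assumed convention)
    return numerator / denominator


# Toy label vectors (hypothetical data).
print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 6))       # 1.0, same partition, renamed classes
print(abs(round(nmi([0, 0, 1, 1], [0, 1, 0, 1]), 6)))  # 0.0, independent partitions
```

The first call illustrates that NMI depends only on how nodes are grouped, not on the labels chosen for the groups.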
This paper conducts tests on Dolphin Networks, Football Networks, Karate Networks, and other data sets. The experimental results show that the clustering result of the algorithm in this paper is better than that of other algorithms. At the same time, this algorithm has higher execution efficiency.

Experimental Data Analysis of Dolphin Networks.
The data set profile of Dolphin Networks [17] is shown in Table 2 and Figure 2.
Each node in the data set represents a bottlenose dolphin. By observing the living habits of these dolphins over a long period, the researchers found that the dolphins show a specific pattern of contact, and they constructed a social network containing 62 nodes. If two dolphins frequently do something together, there is an edge between the two corresponding nodes in the network.
The algorithm of this paper conducts community clustering on Dolphin Networks and obtains a maximum modularity density of $D = 4.326$. For this clustering result, the value of the modularity $Q$ is 0.374 and the value of NMI is 1.0. This result is the same as the actual community clustering. As already stated in Section 2.1, the threshold selected for this data set is the degree at position $[62 \times (1 - 20\%)]$ of the ascending degree array, which is 7; that is, every node whose degree is equal to or greater than 7 is chosen as a core node. During the investigation of the data set, we tested multiple values of the center-node proportion $p$. Figure 3 compares the proportion of real clustering results obtained for 10 groups of parameters, in which $p$ takes values from 0 to 90%. As shown in the diagram, when $p = 20\%$ and $p = 30\%$, the real clustering result occupies the largest proportion. The calculation shows that when $p = 30\%$, the threshold at position $[62 \times (1 - 30\%)]$ is also 7, the same as the former.

Experimental Data Analysis of Football Networks.
The data set profile of Football Networks [18] is shown in Table 3.
In the network, each node represents a university team that took part in the USA football season in 2000. An edge linking two nodes indicates that the two corresponding teams played at least one game against each other; it does not describe any other relationship between the two teams.
The actual community structure of Football Networks is given in Figure 4. The community clustering result obtained with the algorithm in this paper is shown in Figure 5. The modularity $Q$ of the actual community clustering of Football Networks is −0.0239, and the modularity density of the actual community clustering is −100.83. Clearly, the actual clustering of Football Networks does not fully comply with the rules of network community clustering. In Figure 4, none of the nodes of community 6 meets the basic rules of community clustering: the nodes of community 6 have no connection with each other, whereas the connections between nodes of community 6 and nodes of other communities are dense. In other communities, too, a few nodes have fewer connections with their own community; this is inevitable for communities with many nodes. Figure 6 gives the comparison of experimental results for different values of the center-node proportion $p$. Because of the irrationality of the real clustering in this data set, the diagram only shows the proportion of experimental results whose modularity density is larger than 1. The diagram shows that when $p$ takes a value between 20% and 50%, the proportion is large and the final degree threshold is the same, so we choose 20% in the experiment.

Experimental Data Analysis of Karate Networks.
This network is a classical data set in the field of social network analysis [19]. In the early 1970s, Zachary, a sociologist, spent two years observing the social relation network among 34 members of a karate club at an American university. The network consists of 34 nodes. An edge between two nodes indicates that the corresponding members are friends who contact each other frequently. The network attribute profile is shown in Table 4.
We apply the algorithm to conduct community clustering on Karate Networks. The maximum value of the modularity density obtained is $D = 2.826$, and NMI = 1.0. This result is the same as the actual community clustering. The topology of the Karate Networks community clustering is shown in Figure 7.
Figure 8 gives the comparison of experimental results for different values of the center-node proportion $p$ in this data set. When $p = 20\%$, the degree threshold is 6 and the real clustering result occupies the largest proportion.

Summary.
By comprehensively comparing the experimental results obtained with different values of the center-node proportion $p$ in different networks, we choose $p = 20\%$ as the final parameter, which gives good experimental results for most of the networks. In fact, a wide range of $p$ is acceptable, because one degree threshold may correspond to multiple values of $p$, and $p = 20\%$ is among the parameter values that yield most of the better experimental results.

Conclusions
There are many kinds of algorithms for community clustering in complex networks, and all of them have both advantages and drawbacks. Based on the degree of the nodes in the network and on Expansion, the algorithm proposed in this paper clusters the network into several microcommunities and obtains the final community structure by merging the microcommunities. The clustering is performed by generating 100 random sequences for merging the core nodes, and the result with the maximum value of the modularity density is selected as the final clustering result. Experiments show that the sieve method used in this paper can efficiently find a result that is very close to the community structure in real networks. Following the basic principle of network community clustering, the algorithm gives a better community structure in an efficient way.

Figure 1: Example of calculation process of Expansion.

Figure 3: Proportion of real clustering results calculated from different values of the center-node proportion $p$.

Figure 4: The actual community clustering of Football Networks.

Figure 5: Football Network based on Microcommunity Fusion algorithm.

Figure 6: Experimental comparison of each value of the center-node proportion $p$ on the Football Network data set.

Figure 7: Topology of the Karate Networks community clustering.

Figure 8: Experimental comparison of each value of the center-node proportion $p$ on the Karate Network data set.

Table 1: The selection of nodes in the microcommunity.

If $A_{ij} = 0$ for two core nodes $i$ and $j$, no action is taken for the two nodes. If $A_{ij} = 1$, the number of their overlapping neighbor nodes is calculated.

Table 2: Dolphin Networks data set properties.

Table 3: Football Networks data set properties.

Table 4: Karate Networks data set properties.