Heuristic Artificial Bee Colony Algorithm for Uncovering Community in Complex Networks

1 College of Computer Science and Technology, Jilin University, Changchun 130012, China
2 Computer & Electrical Engineering and Computer Science Department, Florida Atlantic University, Boca Raton, FL 33431, USA
3 School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, China
4 Department of Applied Mathematics, Changchun University of Science and Technology, Changchun 130022, China


Introduction
A community is a set of nodes that are densely interconnected and only sparsely linked to the rest of the network. In other words, the density of links inside a community is high, whereas links between communities are comparatively sparse. Figure 1 sketches such a community structure; in this case, there are three communities, denoted by the dashed circles.
In the real world, many complex systems can be represented as networks. For example, social networks are widely observed in our lives, because people naturally cluster into communities within their work environment, family, and friends [1]. Another example is the protein network [2]. In this network, proteins interact very frequently with each other because they belong to metastatic cells, which have higher motility and invasiveness than normal cells. Communities in protein networks correspond to functional groups. The web is a network of HTML pages interconnected by hyperlinks, and communities in web networks [3] are groups of pages with topical similarities. The Internet can be described with services and clients as nodes and the TCP or UDP connections between them as links; communities in the Internet are groups of applications with similar functions. Research on community structure in complex networks therefore has theoretical and practical value. It helps to analyze the topology of these networks, understand their functions, detect latent patterns, and forecast their behavior.
The rest of this paper is organized as follows. Related work is discussed in detail in Section 2. Section 3 describes HABC, including its overall framework, the initialization of the colony and nectar sources, and the heuristic search process. In Section 4, experiments are carried out to verify the proposed method, and Section 5 draws conclusions.
For detailed reviews and comparisons of community detection methods, one can refer to references [4][5][6][7][8]. The community detection problem is challenging in part because it is not well posed [9]. In essence, it is a node-clustering problem on complex networks. Various approaches have been developed to solve it in recent years, including hierarchical clustering, dynamic algorithms, and optimization. Most variants of the graph partitioning problem are NP-hard; that is, it is unlikely that a solution can be computed in time growing as a power of the graph size [4]. For example, community detection can be modeled as a modularity maximization problem. Unfortunately, modularity optimization has been proved to be NP-hard, so it is unlikely that an exact solution can be found in time polynomial in the size of the graph.
Hierarchical clustering is the first category of approaches to community detection. Hierarchical clustering algorithms build a dendrogram according to the similarity between groups of nodes. They then uncover the community structure by cutting the dendrogram into subdendrograms. These algorithms fall into two categories: agglomerative algorithms and divisive algorithms. Agglomerative algorithms mainly include EAGLE [10], CPM [11], MCC [12], and CDHC [13]. Divisive algorithms mainly include GN [14] by Girvan and Newman, BCA [15] by Tyler et al., and the local algorithm [16] by Radicchi et al. Hierarchical clustering has the advantage of not requiring prior knowledge of the community structure. However, these algorithms are inefficient at the step of cutting the dendrogram.
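To illustrate the dendrogram-and-cut idea, the following minimal Python sketch performs average-linkage agglomerative clustering. It measures node similarity as the Jaccard index of closed neighborhoods and then cuts the dendrogram at a chosen number of clusters. It is a generic illustration under these assumptions, not an implementation of EAGLE, CPM, MCC, or CDHC. The function names are hypothetical.

```python
def jaccard(adj, u, v):
    """Similarity of two nodes: Jaccard index of their closed neighborhoods."""
    a, b = adj[u] | {u}, adj[v] | {v}
    return len(a & b) / len(a | b)


def agglomerate(adj):
    """Average-linkage agglomerative clustering driven by node similarity.

    Starts from singleton clusters and repeatedly merges the most similar
    pair. Returns the dendrogram as the list of merged pairs, in merge order.
    """
    clusters = [frozenset([v]) for v in adj]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # average pairwise similarity between the two clusters
                sim = sum(jaccard(adj, u, v) for u in a for v in b) / (len(a) * len(b))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


def cut_dendrogram(adj, merges, k):
    """Replay the first n - k merges so that exactly k clusters remain."""
    clusters = {frozenset([v]) for v in adj}
    for a, b in merges[: len(adj) - k]:
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters
```

On two triangles joined by a single edge, cutting at k = 2 returns the two triangles. This sketch recomputes all pairwise similarities at every merge, so its cost grows roughly cubically with the number of nodes. That hints at why efficiency is a concern for hierarchical methods.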
Dynamic algorithms employ processes running on the network. Random walks [17][18][19] and label propagation are often used in dynamic algorithms to detect communities. A drawback of random walk algorithms is that the final partition depends strongly on the choice of parameters, so the most meaningful partition may not be obtained. Label propagation algorithms include LPA [20], CNP [21], -rank [22], COPRA [23], and BMLPA [24]. Label propagation runs in near-linear time. However, its results are not stable, because they depend on the random order in which nodes are updated and on random tie-breaking.
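A minimal sketch of asynchronous label propagation in the spirit of LPA [20] follows. It assumes an undirected graph stored as a dict mapping each node to its set of neighbors. The function name and the `max_iter` cap are illustrative choices, not details taken from [20]. The shuffled visiting order and the random tie-breaking are exactly where the instability of label propagation comes from.

```python
import random
from collections import Counter


def label_propagation(adj, seed=None, max_iter=100):
    """Asynchronous label propagation on an undirected graph.

    adj maps each node to the set of its neighbors. Every node starts with
    its own label; in each sweep, nodes are visited in random order and adopt
    a label carried by the largest number of their neighbors.
    Returns a dict node -> community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)            # a fresh random update order each sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue              # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[v] in best:
                continue              # already holds a most frequent label
            labels[v] = rng.choice(best)  # ties broken at random
            changed = True
        if not changed:
            break                     # stable: no node wants to switch
    return labels
```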
The methods in the third category produce communities by optimization [25]. In the community detection problem, neither the number of communities nor the size of each community is known in advance. In a biological network, for example, the community structure cannot be observed directly. Many graph clustering indexes have been proposed to evaluate the quality of a partition [26][27][28], and optimizing such indexes has become a very effective way to detect communities. Modularity, proposed by Newman and Girvan, is the most popular of these indexes [29]. Approaches to modularity optimization include the following:
- the hierarchical optimization of Blondel et al. [30];
- an extremal-optimization-based algorithm [31];
- optimization via mean field annealing [32].
However, the complexity status of modularity maximization had been unknown [33][34][35]. Brandes resolved it by showing that the problem is NP-complete in the strong sense [36]. Therefore, the study of modularity optimization has attracted attention recently. Genetic algorithms (GA) have been applied to discover communities [37][38][39][40]. GA converges rapidly and is broadly applicable. However, it suffers from premature convergence, and its convergence slows down in later iterations. Swarm-intelligence-based computation is an important approach to optimization problems; it mainly focuses on the collective behavior of decentralized, self-organized systems [41]. Ant colony optimization algorithms [42][43][44] have been applied to detect communities in complex networks. Our method, the Heuristic Artificial Bee Colony (HABC) algorithm, belongs to this category. The positive feedback of the bee colony's search accelerates the global optimization process of HABC.
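For reference, Newman-Girvan modularity can be written per community as $Q = \sum_c [L_c/m - (d_c/2m)^2]$. Here $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the total degree of its nodes. The sketch below computes this index for an unweighted, undirected graph stored as an adjacency dict (an assumed representation). It illustrates the index only; it is not one of the optimizers cited above.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    adj: dict node -> set of neighbors; labels: dict node -> community label.
    Uses Q = sum_c [ L_c/m - (d_c/(2m))**2 ].
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # number of edges
    if m == 0:
        return 0.0
    internal = {}   # community -> internal edge endpoints (each edge counted twice)
    degree = {}     # community -> total degree of its nodes
    for v, nbrs in adj.items():
        c = labels[v]
        degree[c] = degree.get(c, 0) + len(nbrs)
        internal[c] = internal.get(c, 0) + sum(1 for u in nbrs if labels[u] == c)
    # internal[c] / (2m) equals L_c / m because every internal edge was counted twice
    return sum(internal[c] / (2 * m) - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two triangles joined by one edge, splitting the graph into the two triangles gives $Q = 5/14 \approx 0.357$. Putting all nodes in one community gives $Q = 0$.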
In this paper, we apply the Artificial Bee Colony optimization algorithm to detect community structure; the resulting algorithm is named HABC. The original Artificial Bee Colony (ABC) algorithm uses random search [45], which makes the search time-consuming for community detection problems. Therefore, building on ABC, HABC introduces a new heuristic function into the search process. This function finds the locally optimal component according to neighboring communities. HABC consists of an initialization phase and three search processes. In the initialization phase, every nectar source obtains a simple community structure by means of a network dynamic algorithm. Employed bees and onlookers then use the heuristic function to guide the search.
The key contributions of this paper can be summarized as follows: (1) a community detection algorithm based on a Heuristic Artificial Bee Colony; (2) a redefinition of the search process of the Artificial Bee Colony algorithm for the community detection problem; (3) the use of the agglomerate probability of two neighboring communities as the heuristic function of the search process.

Heuristic Artificial Bee Colony Algorithm
Before introducing the proposed HABC, we briefly introduce some basic network concepts. A network can be modeled as a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. $A$ is the adjacency matrix of $G$: $A_{ij}$ is 1 if there is an edge from node $i$ to node $j$ and 0 otherwise. A community in a network is a group of nodes that are densely interconnected and sparsely linked to the rest of the network. Community structure can be represented by a set $C = \{C_1, \ldots, C_k\}$, where $k$ is the number of communities. Each $C_i$ is a community, that is, a set of nodes with the same community ID, and $C$ is regarded as a partition of the network.
Algorithm 1
Input: a complex network $G = (V, E)$
Output: the set of communities $C$
(1) Initialize parameters: NNS, MCN, limit;
(2) NS, EB, OB ← Initialization();
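To make this notation concrete, here is a minimal Python sketch. It assumes nodes are numbered 0 to n - 1 and that a partition is stored as a list of community IDs indexed by node. The helper names `adjacency_matrix` and `communities_from_ids` are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Build the symmetric 0/1 adjacency matrix A of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A


def communities_from_ids(cid):
    """Group nodes by community ID, where cid[i] is the ID of node i.

    Returns the partition C = {C_1, ..., C_k} as a list of node sets.
    """
    groups = {}
    for node, c in enumerate(cid):
        groups.setdefault(c, set()).add(node)
    return list(groups.values())
```

For example, the ID vector `[7, 7, 3, 3]` describes the partition {{0, 1}, {2, 3}`}`. Only equality of IDs matters, not their values.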

Framework of HABC.
The Artificial Bee Colony (ABC) algorithm [45] simulates the foraging behavior of a honey bee swarm to solve optimization problems. The model consists of two essential components: nectar sources and the bee colony. A nectar source represents a possible solution to the optimization problem, and its nectar amount corresponds to the quality of that solution. The colony consists of three groups of bees: employed bees, onlookers, and scouts. Each employed bee is associated with a particular nectar source, which it exploits. Onlookers choose nectar sources on the basis of the information shared by employed bees. Scouts search for a new nectar source when a nectar source is abandoned. The search of ABC is thus driven by cooperation among the bees and by their conversion between roles. A flowchart of ABC is shown in Figure 2.
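The following Python sketch shows the standard ABC loop on a continuous test problem: minimizing the sphere function. It follows the usual formulation:
- a neighbor search $v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj})$ with $k \neq i$ and $\phi_{ij}$ uniform in $[-1, 1]$;
- greedy selection;
- fitness-proportional choice by the onlookers;
- a scout that re-seeds the most stale source once its trial counter exceeds `limit`.

The function name and default settings are illustrative assumptions. HABC replaces this continuous neighbor search with community-ID updates.

```python
import random


def abc_minimize(f, dim, lower, upper, sn=10, limit=20, mcn=300, seed=0):
    """Minimal standard ABC for minimizing f over the box [lower, upper]^dim.

    sn    -- number of nectar sources (= employed bees = onlookers)
    limit -- failed trials after which a source is abandoned
    mcn   -- maximum cycle number
    """
    rng = random.Random(seed)

    def random_source():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def fitness(value):
        # usual ABC transform of an objective value into a positive fitness
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    xs = [random_source() for _ in range(sn)]
    vals = [f(x) for x in xs]
    trials = [0] * sn
    best_i = min(range(sn), key=vals.__getitem__)
    best_x, best_val = list(xs[best_i]), vals[best_i]

    def try_neighbor(i):
        # v_ij = x_ij + phi_ij * (x_ij - x_kj), with k != i and phi_ij ~ U[-1, 1]
        k = rng.choice([s for s in range(sn) if s != i])
        j = rng.randrange(dim)
        v = list(xs[i])
        v[j] = min(max(v[j] + rng.uniform(-1, 1) * (xs[i][j] - xs[k][j]), lower), upper)
        fv = f(v)
        if fv < vals[i]:              # greedy selection: keep the better source
            xs[i], vals[i], trials[i] = v, fv, 0
        else:
            trials[i] += 1

    for _ in range(mcn):
        for i in range(sn):           # employed bee phase
            try_neighbor(i)
        fits = [fitness(v) for v in vals]
        for _ in range(sn):           # onlooker phase: roulette-wheel selection
            try_neighbor(rng.choices(range(sn), weights=fits)[0])
        for i in range(sn):           # memorize the best source found so far
            if vals[i] < best_val:
                best_x, best_val = list(xs[i]), vals[i]
        stale = max(range(sn), key=trials.__getitem__)
        if trials[stale] > limit:     # scout phase: abandon and re-seed
            xs[stale] = random_source()
            vals[stale] = f(xs[stale])
            trials[stale] = 0
    return best_x, best_val
```

Usage: `abc_minimize(lambda x: sum(t * t for t in x), dim=2, lower=-5.0, upper=5.0)` returns a point close to the origin together with its objective value.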
Based on ABC, the proposed HABC introduces a new heuristic function into the search process. The framework of HABC is shown in Algorithm 1. HABC consists of three processes: parameter setting, initialization, and search.
- First, the following parameters are initialized: the number of nectar sources (NNS), the maximum cycle number (MCN), and the limit for abandoning a nectar source.
- Second, the initialization of HABC generates the nectar source set NS, the employed bee set EB, and the onlooker bee set OB. A nectar source represents a solution of the community detection problem, and the number of employed bees equals the number of nectar sources.
- Third, the search processes of the employed bees, the onlookers, and the scouts are repeated until MCN is reached.

The initialization has two stages. In the first stage, all maximal complete subgraphs (cliques) of the complex network are extracted by the Bron-Kerbosch algorithm [46]. The nodes in the same complete subgraph are assigned a common community ID that is unique to that subgraph. Each of the other nodes is randomly assigned its own unique community ID. This yields a nectar source with a simple community structure. In the second stage, the nectar sources are generated by repeatedly running LPA [20] on the nectar source obtained in the first stage. Each nectar source thus gets a simple community structure, and the nectar source set is diverse.
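A minimal sketch of the two-stage initialization follows, assuming an adjacency-dict graph. Several details are assumptions not fixed by the text:
- which cliques are used (here, maximal cliques with at least three still-unassigned nodes, largest first);
- how overlapping cliques are resolved (the first clique to claim a node wins);
- the reading of NLP as the number of label-propagation sweeps applied to each nectar source.

The employed and onlooker bee sets are omitted, and all function names are hypothetical.

```python
import random
from collections import Counter


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: list every maximal clique as a set."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def stage_one_vector(adj, min_size=3):
    """Stage 1 (assumed details): one shared ID per clique, a fresh ID for
    every other node. Larger cliques claim their nodes first."""
    cid, next_id = {}, 0
    for clique in sorted(maximal_cliques(adj), key=len, reverse=True):
        free = [v for v in clique if v not in cid]
        if len(free) >= min_size:
            for v in free:
                cid[v] = next_id
            next_id += 1
    for v in adj:
        if v not in cid:
            cid[v] = next_id
            next_id += 1
    return cid


def lpa_sweeps(adj, cid, sweeps, rng):
    """Stage 2: a fixed number of asynchronous label-propagation sweeps."""
    labels = dict(cid)
    nodes = list(adj)
    for _ in range(sweeps):
        rng.shuffle(nodes)
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[v] not in best:
                labels[v] = rng.choice(best)
    return labels


def initialize(adj, nns, nlp, seed=0):
    """Return NNS nectar sources (dicts node -> community ID)."""
    rng = random.Random(seed)
    base = stage_one_vector(adj)
    return [lpa_sweeps(adj, base, nlp, rng) for _ in range(nns)]
```

On two 4-cliques joined by a single edge, stage 1 already separates the two cliques, and the LPA sweeps leave that partition unchanged. On less clear-cut graphs, different random sweep orders yield different, and hence diverse, nectar sources.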

Heuristic Searching Process.
The nectar amount of a nectar source corresponds to the quality (fitness) of the associated community solution. In HABC, the fitness function (1) is defined in terms of a partition $P$ of the network and its conductance $\Phi(P)$ [17]. The conductance of a network reflects how easily diffusion occurs among different communities. It is given by (2) in terms of $\mathrm{vol}(C)$, the volume of community $C$ defined by (3), and $\mathrm{in\,vol}(C)$, the inward volume of $C$ defined by (4). The conductance of the network is the average departure probability over all communities, so it reflects the diffusion capacity among them. Its value lies between 0 and 1, and a lower value indicates a stronger community structure.
The standard ABC algorithm uses (5) to generate a new nectar source position from an old one during the search:
$$v_{ij} = x_{ij} + \phi_{ij}\,(x_{ij} - x_{kj}), \quad (5)$$
where:
- $x_i$ is the current nectar source and $v_i$ is the new nectar source;
- $x_k$ is a nectar source near $x_i$, whose index $k$ is chosen at random with $k \neq i$;
- $j$ is a randomly chosen component index;
- $\phi_{ij}$ is a random number uniformly distributed in $[-1, 1]$.

The search is thus carried out by optimizing the components of the nectar source vector.
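Equations (2) to (4) are not reproduced here, so the sketch below adopts one plausible reading of the prose as an explicit assumption:
- $\mathrm{vol}(C)$ is the sum of the degrees of the nodes in $C$;
- $\mathrm{in\,vol}(C)$ counts the edge endpoints that stay inside $C$;
- the network conductance is the average, over communities, of the departure probability $1 - \mathrm{in\,vol}(C)/\mathrm{vol}(C)$.

The function names are hypothetical.

```python
def volume(adj, community):
    """vol(C): sum of the degrees of the nodes in C (assumed form of eq. (3))."""
    return sum(len(adj[v]) for v in community)


def inward_volume(adj, community):
    """in vol(C): edge endpoints that stay inside C (assumed form of eq. (4)).

    Equals the sum of A_ij over i, j in C, so each internal edge counts twice.
    """
    return sum(1 for v in community for u in adj[v] if u in community)


def network_conductance(adj, communities):
    """Average departure probability 1 - in_vol(C)/vol(C) over all communities
    (assumed reading of eq. (2)); lower values mean better-separated communities."""
    probs = []
    for c in communities:
        vol = volume(adj, c)
        probs.append(0.0 if vol == 0 else 1 - inward_volume(adj, c) / vol)
    return sum(probs) / len(probs)
```

Consider two triangles joined by one edge. The natural split into the two triangles gives 1/7, while the split into {0, 1}, {2, 3}, {4, 5} gives about 0.56.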
In HABC, each component $CID_i$ of a nectar source is the community ID of node $i$. The employed bees and onlookers optimize the components of a nectar source according to the community IDs of each component's neighbors. The search function of HABC is therefore adapted as
$$CID_i = \mathrm{rand}(NCI_1, \ldots, NCI_m), \quad (6)$$
where $NCI_1, \ldots, NCI_m$ are the community IDs of the neighbors of node $i$. In other words, in the search process of HABC, each component is updated to one of its neighbors' community IDs.
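Search function (6) can be sketched in a few lines, assuming a nectar source is stored as a dict from node to community ID. Node $i$ takes a community ID drawn uniformly from the IDs of its neighbors. The helper names are hypothetical. The update returns a copy so that the old source can still be compared with the new one.

```python
import random


def neighbor_community_ids(adj, cid, v):
    """Distinct community IDs found among the neighbors of node v."""
    return sorted({cid[u] for u in adj[v]})


def random_neighbor_update(adj, cid, v, rng):
    """Eq. (6): copy the nectar source and give node v a community ID drawn
    uniformly from its neighbors' community IDs."""
    new = dict(cid)
    ids = neighbor_community_ids(adj, cid, v)
    if ids:
        new[v] = rng.choice(ids)
    return new
```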
The conductance of the network, which is the objective function of HABC, is the average departure probability of all communities. The average agglomerate probability of two adjacent communities can therefore serve as the heuristic function of the search process; it is defined by (7). Equation (7) can be converted into the agglomerate probability of a node $v$ and a community, given by (8). There, $v$ is a node of the network and $C(v)$ is the community containing $v$. Applying (8) as the heuristic function in the employed bee search (6) yields the new search function (9). In (9), the set of neighbor community IDs of node $v$ is computed from the communities of its neighbors, and $C_L$ is the community with community ID $L$.
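Equations (7) to (9) are not reproduced here. The following sketch therefore substitutes a hypothetical stand-in for the node-to-community agglomerate probability (8): the fraction of node $v$'s edges that end in community $C_L$. It then moves $v$ to the neighbor community with the highest score, breaking ties at random. This only illustrates how a heuristic can replace the uniform choice of (6); it is not the paper's formula.

```python
import random


def attachment_probability(adj, cid, v, label):
    """Hypothetical stand-in for eq. (8): fraction of v's edges that end in
    community C_label. NOT the paper's formula."""
    if not adj[v]:
        return 0.0
    return sum(1 for u in adj[v] if cid[u] == label) / len(adj[v])


def heuristic_update(adj, cid, v, rng):
    """Heuristic search step in the spirit of eq. (9): among the neighbor
    community IDs of v, move v to the one with the highest (assumed)
    attachment probability; ties are broken at random."""
    ids = sorted({cid[u] for u in adj[v]})
    new = dict(cid)
    if not ids:
        return new
    scores = {lab: attachment_probability(adj, cid, v, lab) for lab in ids}
    best = max(scores.values())
    new[v] = rng.choice([lab for lab in ids if scores[lab] == best])
    return new
```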
Each nectar source has exactly one employed bee, which is sent to the nectar source it is associated with. Every component of the nectar source is optimized during the employed bee search, given as Algorithm 3. A component of the nectar source is selected at random and updated according to (9). This is repeated until every component has been updated.
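The employed bee pass can be sketched as follows, under these assumptions:
- every node is visited once, in random order;
- the candidate community is the most common neighbor community, a stand-in for (9);
- following ABC's greedy selection, a move is kept only if a user-supplied objective does not get worse.

The text does not state whether HABC applies exactly this acceptance rule. The function name is hypothetical.

```python
import random
from collections import Counter


def employed_bee_pass(adj, source, objective, rng):
    """One employed-bee pass over a nectar source (dict node -> community ID).

    Every node (component) is visited once, in random order. The candidate
    community is the most common one among the node's neighbors, a simple
    stand-in for the heuristic of eq. (9). A move is kept only if
    `objective` (higher is better) does not decrease, as in ABC's greedy
    selection. Returns the new source and whether the objective improved.
    """
    current = dict(source)
    start = score = objective(current)
    order = list(adj)
    rng.shuffle(order)
    for v in order:
        if not adj[v]:
            continue
        counts = Counter(current[u] for u in adj[v])
        top = max(counts.values())
        label = rng.choice([lab for lab, c in counts.items() if c == top])
        if label == current[v]:
            continue
        candidate = dict(current)
        candidate[v] = label
        new_score = objective(candidate)
        if new_score >= score:
            current, score = candidate, new_score
    return current, score > start
```

The returned flag can drive the trial counter that decides when a nectar source is abandoned.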
The onlooker search is similar to the employed bee search, except that onlookers select a nectar source according to the probability computed by (10). This probability encodes the information about the nectar sources shared by the employed bees. The onlooker search is what distinguishes HABC from other evolutionary algorithms: it is a greedy search that makes HABC converge quickly. The onlooker bee search is given as Algorithm 4.
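Equation (10) is not reproduced here. The sketch assumes the standard ABC rule $p_i = \mathrm{fit}_i / \sum_n \mathrm{fit}_n$ and implements the onlookers' roulette-wheel choice with `random.Random.choices`. The function names are hypothetical.

```python
import random


def selection_probabilities(fitnesses):
    """Standard ABC onlooker probabilities p_i = fit_i / sum_n fit_n
    (assumed form of eq. (10)); fitness values must be non-negative."""
    total = sum(fitnesses)
    if total == 0:
        return [1 / len(fitnesses)] * len(fitnesses)
    return [f / total for f in fitnesses]


def choose_sources(fitnesses, n_onlookers, rng):
    """Roulette-wheel choice of the nectar source each onlooker exploits."""
    probs = selection_probabilities(fitnesses)
    return rng.choices(range(len(fitnesses)), weights=probs, k=n_onlookers)
```

Sources with higher fitness attract more onlookers, which is what makes this phase greedy.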
An employed bee whose nectar source has been exhausted, that is, not improved within limit trials, becomes a scout. The scout searches the area surrounding the hive for a new nectar source. In line 4 of Algorithm 2, the community ID vector of the nectar source is created. The scout runs LPA PN times on this vector to produce a new nectar source. The scout bee search is given as Algorithm 5.
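The scout phase can be sketched as follows. The sketch assumes that each nectar source carries a trial counter, reset on improvement and incremented otherwise as in standard ABC. It also assumes that PN is the number of asynchronous LPA sweeps run on the stage-1 community ID vector. Classic ABC abandons at most one source per cycle, but this sketch replaces every source whose counter exceeds `limit`. That choice, and the function name, are assumptions.

```python
import random
from collections import Counter


def scout_phase(adj, sources, trials, limit, base_vector, pn, rng):
    """Replace every exhausted nectar source (trials > limit) by running
    pn asynchronous label-propagation sweeps on the stage-1 vector."""
    for i, t in enumerate(trials):
        if t <= limit:
            continue
        labels = dict(base_vector)     # start again from the stage-1 vector
        nodes = list(adj)
        for _ in range(pn):
            rng.shuffle(nodes)
            for v in nodes:
                if not adj[v]:
                    continue
                counts = Counter(labels[u] for u in adj[v])
                top = max(counts.values())
                best = [lab for lab, c in counts.items() if c == top]
                if labels[v] not in best:
                    labels[v] = rng.choice(best)
        sources[i] = labels            # the scout's new nectar source
        trials[i] = 0
    return sources, trials
```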

Experiment and Analysis
To test the performance of the proposed HABC, we carry out experiments on both real and synthetic networks. EAGLE is a classic hierarchical clustering community detection algorithm [10]. COPRA is a representative method in the class of dynamic algorithms [23]. GCE is a local optimization algorithm [25], and HABC combines local optimization with a global objective. Therefore, EAGLE, COPRA, and GCE are selected as the compared methods.
Mathematical Problems in Engineering
Algorithm 2
Input: a complex network $G = (V, E)$, the number of nectar sources NNS, the number of label propagations NLP
Output: the set of nectar sources NS, the set of employed bees EB, the set of onlooker bees OB
(1) create all complete subgraphs of the complex network $G$;
(2) the nodes in the same subgraph are assigned a unique community ID;
(3) the other nodes are assigned different community IDs;
(4) the vector $X = \{CID_1, CID_2, \ldots, CID_n\}$ is generated, where $CID_i$ is the community ID of node $i$;
We use NMI [26] to measure the similarity between the community structure uncovered by each method and the true community structure. NMI is defined as (11) and is based on a confusion matrix $N$. The rows of $N$ correspond to the "real" communities and its columns to the "detected" communities. $N_{ij}$ is simply the number of nodes in real community $i$ that appear in detected community $j$. The remaining symbols in (11) are:
- $c_A$, the number of real communities, and $c_B$, the number of detected communities;
- $N_{i\cdot}$, the sum over row $i$ of $N$, and $N_{\cdot j}$, the sum over column $j$;
- $n$, the total number of nodes.

$$\mathrm{NMI}(A, B) = \frac{-2\sum_{i=1}^{c_A}\sum_{j=1}^{c_B} N_{ij}\log\left(\frac{N_{ij}\,n}{N_{i\cdot}N_{\cdot j}}\right)}{\sum_{i=1}^{c_A} N_{i\cdot}\log\left(\frac{N_{i\cdot}}{n}\right)+\sum_{j=1}^{c_B} N_{\cdot j}\log\left(\frac{N_{\cdot j}}{n}\right)} \quad (11)$$
The value of NMI lies between 0 and 1, and a higher NMI indicates that the detected communities match the true communities more closely.
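Equation (11) translates directly into code. The sketch below builds the confusion matrix $N$ implicitly with `collections.Counter`. It assumes both partitions are given as label sequences over the same node order. It uses natural logarithms, since the base cancels in the ratio. Returning 1 when both partitions consist of a single community is a convention of this sketch.

```python
import math
from collections import Counter


def nmi(real, found):
    """Normalized mutual information of two partitions, as in eq. (11).

    real, found: sequences giving the community label of each node,
    in the same node order.
    """
    n = len(real)
    N = Counter(zip(real, found))   # N_ij: nodes in real i and found j
    row = Counter(real)             # N_i.
    col = Counter(found)            # N_.j
    num = -2 * sum(nij * math.log(nij * n / (row[i] * col[j]))
                   for (i, j), nij in N.items())
    den = (sum(r * math.log(r / n) for r in row.values())
           + sum(c * math.log(c / n) for c in col.values()))
    return num / den if den != 0 else 1.0
```

Identical partitions score 1 regardless of how the labels are named. A detected partition that is statistically independent of the real one scores 0.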

Artificial Networks.
We apply HABC to the LFR benchmark [47], which generates networks with heterogeneous distributions of vertex degree and community size. Several parameters control the generated networks:
- the number of nodes $n$;
- the average vertex degree $\langle k\rangle$ and the maximum vertex degree $k_{\max}$;
- the mixing ratio $\mu$;
- the exponent $\gamma$ of the power-law distribution of vertex degrees;
- the exponent $\beta$ of the power-law distribution of community sizes;
- the minimum community size $c_{\min}$ and the maximum community size $c_{\max}$.

In our experiment, we generate six network sets, whose parameters are shown in Table 1. The default configuration of the LFR benchmark is $\langle k\rangle = 15$ and $k_{\max} = 50$. In Figure 3, each graph compares the four methods on one network set, generated by varying the mixing ratio $\mu$ ($0 \le \mu \le 1$). When the community structure is evident ($0.1 \le \mu \le 0.3$), HABC and the other three methods all detect the community structure accurately. In this range, however, the higher NMI values of HABC show that it outperforms the other three algorithms. As can be seen in Figure 3(d), the NMI value of HABC is also higher than those of the other algorithms when $0.4 \le \mu \le 0.5$. In contrast, the performance of CPM, EAGLE, and GCE declines markedly when $\mu > 0.4$.
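The following helps with interpreting $\mu$. In the LFR benchmark, each node shares a fraction $1-\mu$ of its links with its own community and a fraction $\mu$ with the rest of the network. The hypothetical helper below measures the empirical counterpart of that quantity for a given partition. It does not generate LFR graphs.

```python
def mixing_ratio(adj, cid):
    """Empirical mixing ratio: average over nodes (with at least one edge)
    of the fraction of their edges that leave their own community."""
    fracs = [sum(1 for u in nbrs if cid[u] != cid[v]) / len(nbrs)
             for v, nbrs in adj.items() if nbrs]
    return sum(fracs) / len(fracs) if fracs else 0.0
```

For two triangles joined by one edge and split into the two triangles, only the two bridge endpoints have an external link (1/3 of their edges each). The empirical mixing ratio is therefore (1/3 + 1/3)/6 = 1/9.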
To examine the accuracy of the algorithms further, we compare the number of ground-truth communities in each benchmark with the number of communities detected by the four algorithms. In Figure 4, cyan represents the number of ground-truth communities in each network set. Black, red, green, and blue represent the number of correctly uncovered communities for EAGLE, CPM, GCE, and HABC, respectively. As $\mu$ (labeled Mix) increases, the accuracy of all four algorithms declines. Figures 3 and 4 together reveal an inherent relationship between the accuracy of the algorithms and the number of uncovered communities. The closer the number of uncovered communities is to the number of ground-truth communities, the higher the NMI value.

Real World Networks.
HABC is further tested on several real-world networks, listed in Table 2. These are Zachary's karate club network [48], the dolphin network [49], the American college football network [14], and the network of books on US politics (Polbooks) [50]. For these networks, NMI cannot be used to evaluate the four algorithms, because their true community structures are unknown. Table 3 compares the values obtained by HABC, EAGLE, GCE, and CPM; HABC outperforms EAGLE, GCE, and CPM.

Conclusion
In this paper, the Heuristic Artificial Bee Colony (HABC) algorithm is proposed for community detection. Inspired by the intelligent behavior of bee swarms, HABC contains four important stages: initialization, employed bee search, onlooker search, and scout search. Combining LPA and the Bron-Kerbosch algorithm, the initialization stage generates the nectar sources, each of which is assigned an employed bee. Each nectar source represents a solution to the community detection problem, and it is then explored by the employed bee and the onlookers.
During exploration, a newly defined heuristic function guides the search for nectar sources. A nectar source is abandoned when it is exhausted, and a scout is sent to discover new nectar sources. Experiments were carried out on artificial and real networks, and the performance of HABC has been analyzed in detail. Simulation results on six artificial network sets show that HABC achieves better community detection quality, with higher NMI values than EAGLE, CPM, and GCE. The experiments on real networks likewise show that HABC outperforms the compared methods in community detection quality.