Local Community Detection in Complex Networks Based on Maximum Cliques Extension

Detecting local community structure in complex networks is an appealing problem that has attracted increasing attention in various domains. However, most current local community detection algorithms are sensitive to the state of the source node and cannot effectively identify the multiple communities linked by overlapping nodes. We propose a novel local community detection algorithm based on maximum clique extension, called LCD-MC. The proposed method first finds the set of all maximum cliques containing the source node and initializes them as the starting local communities; then, it extends each unclassified local community by greedy optimization until a certain objective is satisfied; finally, the expected local communities are obtained once all maximum cliques have been assigned to a community. An empirical evaluation on both synthetic and real datasets demonstrates that our algorithm outperforms some of the state-of-the-art approaches.


Introduction
In recent years, more and more research has focused on large complex networks, such as social networks, protein interaction networks, citation networks, and the WWW. Extensive research has indicated that community structure universally exists in complex networks and that the connections between nodes within a community are denser than those between communities. Meanwhile, the nodes of a community often have similar attributes or play a similar role. Therefore, community detection has become one of the basic tasks of complex network analysis and is of both theoretical and practical importance.
Research on community detection has mostly focused on detecting all community structures in a whole network from a global viewpoint [1][2][3][4][5]. However, complex networks in many real applications are extremely large. For example, the friend relation networks on Facebook and Twitter contain hundreds of millions of nodes [6], and detecting the community structures in such huge networks costs tremendous time and space. In addition, as the nodes and links of many complex networks evolve dynamically [7,8], it is often hard to acquire the complete network information, which further increases the difficulty of global community detection. Therefore, many scholars have begun to focus on local community detection in complex networks.
Different from global community detection, which partitions the whole network, local community detection only seeks the community structure in which a designated node (source node) is located. The network is essentially divided into two parts, namely, the community where the designated node is located and the rest. Furthermore, the local community where the node is located is closely connected internally but only loosely connected with the outside. Local community detection need not know all information about the network in advance. It starts from a node, gradually extends from it, and acquires the local information around the current community during the extension process. The representative algorithms for local community detection include [9][10][11][12][13].
However, most available local community detection algorithms have two restrictions. First, they start directly from the source node, repeatedly select the best node from the candidates by a greedy algorithm, and add it to the local community until the detected community satisfies all specified conditions; this makes it easy to deviate from the real local community and thus reduces the accuracy of local community detection. Second, this procedure yields only a single local community structure, so when the source node is an overlapped (hub) node connecting multiple communities, it cannot obtain all the local communities.
To solve the above two problems, we propose a local community detection algorithm based on maximum cliques extension (LCD-MC), which consists of finding maximum cliques and extending local communities. Its main advantages are as follows.
(i) Instead of taking the source node directly as input, all maximum cliques containing the source node are found and used as the start of local community extension, which increases the stability of local community detection.
(ii) Flexibility in identifying the local communities that involve overlapped nodes is achieved by extending, one by one, all maximum cliques satisfying certain conditions.
(iii) The experimental results on both synthetic and real networks demonstrate that, compared with the state-of-the-art local community detection algorithms, LCD-MC obtains better local community quality and can effectively identify the multiple local community structures connected with an overlapped node.

Related Work
Let a complex network be $G = (V, E)$, where $V$ represents the node set, $E$ the edge set, and $n$ and $m$ the number of nodes and the number of edges in the network, respectively. Different from global algorithms, which divide $G$ into a number of closely connected community structures, local community detection designates a node $v_0$ and explores the community structures closely related to $v_0$.
Clauset first proposed the formal definition of local community detection [9]. Assume that we know a community structure $D$ (initially, $D$ contains only the node $v_0$) composed of some nodes; the set $S$ consists of the nodes that are connected with nodes in $D$ but do not belong to $D$. The process of local community detection is to continuously select nodes from $S$ and add them to the current community $D$ until the predefined local modularity $R$ reaches its maximum value. To define the objective function $R$, Clauset also defined the boundary $B$ of community $D$, that is, the set of nodes in $D$ that have at least one neighbor in $S$, as shown in Figure 1.
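The three node sets in this definition can be computed directly from an adjacency structure. The following is a minimal sketch, assuming the network is stored as a dictionary that maps each node to the set of its neighbors; the function name `shell_and_boundary` and the toy graph are illustrative assumptions, not part of the original algorithm description.

```python
def shell_and_boundary(adj, D):
    """Return (S, B) for a known community D in an adjacency-set graph."""
    D = set(D)
    # S: nodes outside D that are adjacent to at least one node of D.
    S = {w for v in D for w in adj[v]} - D
    # B: nodes of D that have at least one neighbor in S.
    B = {v for v in D if adj[v] & S}
    return S, B


# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
S, B = shell_and_boundary(adj, {0, 1, 2})
print("S =", S, "B =", B)
```

For the triangle community in this toy graph, only node 3 lies in the shell and only node 2 lies on the boundary, so an expansion step would consider node 3 alone.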
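Assume that $D$, $S$, and $B$ are given; $R$ is defined by
$$R = \frac{\sum_{ij} B_{ij}\,\delta(i, j)}{\sum_{ij} B_{ij}}, \quad (1)$$
where $B_{ij} = 1$ if $(v_i, v_j) \in E$ and either node $v_i$ or node $v_j$ exists in $B$, and $B_{ij} = 0$ otherwise; $\delta(i, j) = 1$ if either node $v_i$ or node $v_j$ exists in $B$ and the other node exists in $D$, and $\delta(i, j) = 0$ otherwise. The LWP algorithm [10] replaces $R$ with the local modularity $M$, defined by
$$M = \frac{\sum_{i<j} A_{ij}\,\theta(i, j)}{\sum_{i<j} A_{ij}\,\lambda(i, j)}, \quad (2)$$
where $A$ is the adjacency matrix of $G$; $\theta(i, j) = 1$ if both node $v_i$ and node $v_j$ exist in community $D$, and $\theta(i, j) = 0$ otherwise; and $\lambda(i, j) = 1$ means that only one of the two nodes, either $v_i$ or $v_j$, exists in community $D$. In other words, $M$ is the number of edges inside $D$ divided by the number of edges leaving $D$. In addition to using a different objective function, the LWP algorithm includes both addition and deletion operations, so that nodes that increase the $M$ value can be added to or deleted from community $D$. Besides, the LWP algorithm need not predefine the size of community $D$.
LMF [11] is a local-global community detection algorithm, which proposes a fitness function, as shown in the following:
$$f_D = \frac{k_{in}^{D}}{\left(k_{in}^{D} + k_{ex}^{D}\right)^{\alpha}}, \quad (3)$$
where $k_{in}^{D}$ and $k_{ex}^{D}$ refer to the sums of the internal node degrees and external node degrees of community $D$, respectively, and $\alpha$ is a resolution parameter used for controlling the size of the local community. This algorithm is similar to the LWP algorithm in that, for the given $\alpha$, it drives the fitness function $f$ to a local maximum by addition and deletion.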
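To make the three objective functions concrete, the sketch below evaluates the local modularity $R$, the local modularity $M$, and the fitness $f$ for one candidate community. It assumes an adjacency-set dictionary as the graph representation, counts each undirected edge once, and uses a hypothetical function name `objective_values`; the fallback values for communities without outgoing edges are conventions chosen here, not taken from the cited papers.

```python
def objective_values(adj, D, alpha=1.0):
    """Return (R, M, f) for community D; alpha is the resolution parameter."""
    D = set(D)
    S = {w for v in D for w in adj[v]} - D          # shell of D
    B = {v for v in D if adj[v] & S}                # boundary of D
    e_in = sum(1 for v in D for w in adj[v] if w in D) // 2
    e_out = sum(1 for v in D for w in adj[v] if w not in D)

    # R: among edges with an endpoint in B, the share that stays inside D.
    inside = {frozenset((v, w)) for v in B for w in adj[v] if w in D}
    touching_b = len(inside) + e_out   # every outgoing edge starts in B
    R = len(inside) / touching_b if touching_b else 1.0  # convention: empty B

    # M: internal edges over edges that cross the community border.
    M = e_in / e_out if e_out else float("inf")

    # f: internal degree sum over the total degree sum raised to alpha.
    k_in, k_ex = 2 * e_in, e_out
    f = k_in / (k_in + k_ex) ** alpha if k_in + k_ex else 0.0
    return R, M, f


# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
R, M, f = objective_values(adj, {0, 1, 2})
print(R, M, f)
```

For the triangle community, two of the three edges touching the boundary node stay inside, so $R = 2/3$; there are three internal edges and one outgoing edge, so $M = 3$ and, with $\alpha = 1$, $f = 6/7$.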
Wu et al. [12] proposed a local community detection algorithm based on link similarity (LS). The algorithm first defines the similarity between a single node and a local community and then carries out local community detection in decreasing order of the calculated similarity values. In addition, the search process of this algorithm is composed of greedy clustering, optimization, and trimming.
Chen et al. [13] proposed a local community detection algorithm based on the local degree center node (LMD). Although the objective of local community detection is to find the community structure where the given node is located, the authors argued that, for some given nodes, detection starting directly from $v_0$ may not obtain ideal results. Therefore, to increase the robustness of local community detection, instead of starting the search from the given node, LMD first finds the local degree center node nearest to the given node and then extends local communities from this node with $R$, $M$, and $f$ as objective functions. Here, a local degree center node is a node whose degree is greater than or equal to that of all its neighbor nodes.
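The notion of a local degree center node can be illustrated with a short search routine. The sketch below is an assumption about how "nearest" might be realized, namely a breadth-first search that returns the first node whose degree is at least that of all its neighbors; it is not claimed to be the procedure of [13], and the function name and toy graph are hypothetical.

```python
from collections import deque


def nearest_local_degree_center(adj, v0):
    """Breadth-first search from v0 for the closest local degree center node."""
    seen = {v0}
    queue = deque([v0])
    while queue:
        v = queue.popleft()
        # A local degree center has degree >= the degree of every neighbor.
        if all(len(adj[v]) >= len(adj[w]) for w in adj[v]):
            return v
        for w in sorted(adj[v]):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return v0  # not reached for a finite graph; kept as a safe fallback


# Hypothetical toy graph: a path 0-1-2 whose end node 2 also links to 3 and 4.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
center = nearest_local_degree_center(adj, 0)
print(center)
```

In this toy graph the search started at node 0 passes node 1 (degree 2, but with a neighbor of degree 3) and stops at node 2.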

Algorithm
In this section, we introduce the proposed local community detection algorithm based on maximum cliques extension (LCD-MC), which is mainly composed of two parts, namely, algorithm FindMC for finding the maximum cliques of a node and algorithm LCD for local community extension corresponding to the maximum cliques.

FindMC Algorithm for Finding Maximum Cliques
Definition 1. Given an undirected graph $G = (V, E)$, if $C \subseteq V$ and, for any two distinct nodes $u, v \in C$, $(u, v) \in E$, then $C$ is called a complete subgraph of $G$.
Definition 2. A complete subgraph $C$ of $G$ is a maximum clique of $G$ if and only if $C$ is not included in any larger complete subgraph of $G$.
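Definitions 1 and 2 translate into two simple membership tests. The sketch below assumes an adjacency-set dictionary without self-loops; the function names and the toy graph are illustrative only.

```python
from itertools import combinations


def is_complete(adj, C):
    """Definition 1: every pair of distinct nodes in C is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(C, 2))


def is_maximum_clique(adj, C):
    """Definition 2: C is complete and no outside node is adjacent to all of C."""
    C = set(C)
    if not is_complete(adj, C):
        return False
    return not any(C <= adj[w] for w in set(adj) - C)


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(is_complete(adj, {0, 1}), is_maximum_clique(adj, {0, 1}))
print(is_maximum_clique(adj, {0, 1, 2}), is_maximum_clique(adj, {2, 3}))
```

The pair {0, 1} is complete but can still be enlarged by node 2, whereas {0, 1, 2} and {2, 3} cannot be enlarged and therefore satisfy Definition 2.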
In FindMC, we mainly adopt the idea of the Bron-Kerbosch algorithm [19] to find the maximum cliques containing the given source node $v_0$. It mainly uses three sets, $R$, $P$, and $X$ (here $R$ denotes a node set, not the local modularity of Section 2), whose functions are as follows: (1) $R$ stores the already acquired nodes forming the current clique structure; (2) $P$ stores all the candidate nodes edge-connected with the nodes in $R$, which can be used to extend the clique structures already found and will be added to $R$; (3) $X$ stores the candidate nodes in $P$ that have already been used. If a certain node in $P$ can join the nodes in $R$ to form a larger clique structure, this node is added to both set $R$ and set $X$ but deleted from set $P$.
The FindMC algorithm starts from node $v_0$ and recursively constructs a search tree as the nodes in set $P$ are successively added to set $R$. In this search tree, each internal node corresponds to a state, that is, a candidate clique structure, while a leaf node represents a corresponding maximum clique.
Algorithm 1 gives the pseudocode of the FindMC algorithm for finding maximum cliques. In the initialization phase, FindMC stores only node $v_0$ in set $R$ and only the neighbours of $v_0$ in set $P$. This is because the nodes in any maximum clique containing $v_0$ can only be neighbours of $v_0$, so any search outside them would be wasted (line 01). In addition, to protect set $P$ while it is being updated, the nodes in $P$ are copied to nodeList for extension, where they are arranged in degree-decreasing order. Because a node with a larger degree is more likely to form a maximum clique, such nodes are given priority so as to improve the efficiency of the algorithm (lines 02∼03). After the initialization is completed, FindMC, in its second phase, executes the conventional Bron-Kerbosch algorithm recursively on sets $R$, $P$, and $X$ and stores the acquired maximum cliques in MCS (maximum clique set).
Algorithm 2 is the conventional Bron-Kerbosch algorithm. In this algorithm, if both sets $P$ and $X$ are empty, then the nodes in $R$ satisfy the condition for a maximum clique and are added to the maximum clique set MCS (lines 01-03). Otherwise, it selects a pivot node $u$ from sets $P$ and $X$ and makes a recursive call for each node $v$ in $P$ that the pivot $u$ does not rule out (in the standard pivoting rule, the nodes of $P$ that are not neighbours of $u$). When all recursive calls for $v$ have returned, $v$ is deleted from set $P$ and added to set $X$ (lines 04-09).
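The two phases described above can be sketched compactly. The code below is a minimal illustration rather than a transcription of Algorithms 1 and 2: it assumes an adjacency-set dictionary, uses the textbook pivoting rule (recursing only on candidates that are not neighbours of the pivot), and orders candidates by decreasing degree. The name `find_mc` and the toy graph, which is built around the three cliques of node 2 quoted in the evaluation, are assumptions made for this example.

```python
def find_mc(adj, v0):
    """All cliques containing v0 that cannot be enlarged (Bron-Kerbosch, pivoting)."""
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            cliques.append(frozenset(R))   # R cannot be extended any further
            return
        # Pivot: the node of P or X with the most neighbours among the candidates.
        u = max(P | X, key=lambda w: len(adj[w] & P))
        # Visit candidates not adjacent to the pivot, larger degree first.
        for v in sorted(P - adj[u], key=lambda w: -len(adj[w])):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}   # v has been fully explored
            X = X | {v}

    # Initialization: only v0 in R and only the neighbours of v0 as candidates.
    expand({v0}, set(adj[v0]), set())
    return cliques


# Hypothetical neighbourhood of node 2, consistent with the cliques named in the text.
edges = [(2, 79), (2, 93), (79, 93), (2, 99), (79, 99), (2, 83), (2, 94), (83, 94)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

mcs = find_mc(adj, 2)
print(sorted(sorted(c) for c in mcs))
```

Because the candidate set is restricted to the neighbours of the source node from the start, the search never leaves the subgraph induced by the source node and its neighbours.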

LCD Algorithm for Extending Local Community Structure.
A maximum clique is a very closely connected node set; however, the requirement of full connection is too strict for the definition of community structure. Therefore, the second step of LCD-MC is to further extend the local community structure according to the MCS obtained in Algorithm 1, as shown in Algorithm 3 (LCD). The LCD algorithm carries out the following operations for each unclassified maximum clique (no node in such clique has been allocated to any local community): (1) initialization, which adds each node in the current clique to the local community LC and adds all nodes connected with but not belonging to LC to set $U$ (lines 04-08); (2) extension of the local community LC, which selects the node $u$ from $U$ that yields the greatest increase of the objective function, adds it to LC, and updates the corresponding nodes in $U$, until no node can increase the value of the objective function (lines 09-21); (3) finally, return of the required local community set LCS, each community of which contains the initial node $v_0$ (line 23).
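The extension loop can be sketched as follows. This is a simplified illustration under several assumptions: the objective is the edge ratio $M$ (internal edges over outgoing edges), a clique counts as already classified when it is fully contained in a community found earlier (the reading suggested by the worked example with node 2 in the evaluation), larger cliques are processed first, and the objective is recomputed from scratch at every step for clarity. The names `m_value` and `lcd` and the toy graph are hypothetical.

```python
from itertools import combinations


def m_value(adj, LC):
    """Edge ratio M: edges inside LC divided by edges leaving LC."""
    e_in = sum(1 for v in LC for w in adj[v] if w in LC) // 2
    e_out = sum(1 for v in LC for w in adj[v] if w not in LC)
    return e_in / e_out if e_out else float("inf")


def lcd(adj, cliques):
    """Greedily extend each unclassified clique into a local community."""
    communities = []
    for clique in sorted(cliques, key=len, reverse=True):
        if any(clique <= c for c in communities):
            continue                      # clique already absorbed by a community
        LC = set(clique)
        while True:
            U = {w for v in LC for w in adj[v]} - LC   # candidate nodes
            current = m_value(adj, LC)
            best, best_gain = None, 0.0
            for u in sorted(U):
                gain = m_value(adj, LC | {u}) - current
                if gain > best_gain:
                    best, best_gain = u, gain
            if best is None:
                break                     # no candidate increases the objective
            LC.add(best)
        communities.append(LC)
    return communities


# Hypothetical graph: two 4-cliques sharing node 0, plus node 7 tied to nodes 1 and 2.
edges = (list(combinations([0, 1, 2, 3], 2)) + list(combinations([0, 4, 5, 6], 2))
         + [(1, 7), (2, 7)])
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

lcs = lcd(adj, [frozenset({0, 1, 2, 3}), frozenset({0, 4, 5, 6})])
print(lcs)
```

Starting from the two cliques of the overlapping node 0, the first community absorbs node 7 and the second stays as it is, so both communities around node 0 are recovered.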
On line 12 of Algorithm 3, the function CalculateDeltaValue() calculates the incremental value of the objective function after node $u$ is added to the current local community. Here, the objective function can be any objective function mentioned in Section 2. To improve the efficiency of the algorithm, the incremental value can be calculated from the change that the addition of node $u$ would cause, without recomputing the objective function from scratch. Take the objective function $M$ in (2) as an example. Assume that the number of edges in the current local community LC is $e_{in}$ and the number of edges between LC and $U$ is $e_{out}$, that $x$ represents the increase in the number of edges in LC due to the addition of $u$, and that $y$ represents the number of edges connecting nodes in LC with $U$ after the addition of $u$ into LC; then, the delta value of $M$ can be computed from $e_{in}$, $e_{out}$, $x$, and $y$ alone.
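An incremental update of this kind can be sketched as below. The formula is a reconstruction under stated assumptions, not the paper's own expression: $M$ is taken as $e_{in}/e_{out}$, $x$ is the number of edges between $u$ and LC, and $y$ is read as the number of edges that $u$ itself contributes to the new border, so that the border size after the addition is $e_{out} - x + y$. If $y$ is instead read as the total number of border edges after the addition, the denominator $e_{out} - x + y$ is simply replaced by $y$. The function names and the toy graph are hypothetical.

```python
from itertools import combinations


def delta_m(e_in, e_out, x, y):
    """Change of M = e_in / e_out when a node with x inner and y outer edges joins.

    Assumes e_out > 0 and e_out - x + y > 0 (the community still has a border).
    """
    return (e_in + x) / (e_out - x + y) - e_in / e_out


def edge_counts(adj, LC):
    """Count edges inside LC and edges leaving LC by direct enumeration."""
    e_in = sum(1 for v in LC for w in adj[v] if w in LC) // 2
    e_out = sum(1 for v in LC for w in adj[v] if w not in LC)
    return e_in, e_out


# Hypothetical graph: a 4-clique 0..3, node 7 tied to 1 and 2, and a path 0-4-5.
edges = list(combinations([0, 1, 2, 3], 2)) + [(1, 7), (2, 7), (0, 4), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

LC = {0, 1, 2, 3}
e_in, e_out = edge_counts(adj, LC)
for u in (7, 4):
    x, y = len(adj[u] & LC), len(adj[u] - LC)
    new_in, new_out = edge_counts(adj, LC | {u})
    # The incremental value should match the difference of two full evaluations.
    print(u, delta_m(e_in, e_out, x, y), new_in / new_out - e_in / e_out)
```

The incremental form needs only the neighbourhood of the candidate node, whereas a full evaluation walks over every node of the community.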

Evaluation
In this section, we compare the LCD-MC algorithm proposed in this paper with several representative local community detection algorithms. We conduct all the experiments on a Pentium Core2 Duo 2.8 GHz PC with 2 GBytes of main memory, running Windows 7. We implement our algorithm in C#, using Microsoft Visual Studio 2008.

Experimental Setup.
We compared LCD-MC with several representative local community detection algorithms, including Clauset [9], LWP [10], LS [12], and LMD [13]. We first compared the quality of the local communities found by these algorithms. As these algorithms, except LCD-MC, cannot identify local communities connected by overlapping nodes, we selected LFR benchmarks [21,22] with nonoverlapping structure and several labeled real networks as experimental data; their information is shown in Tables 1 and 2. The parameters in Table 1 are as follows: $N$, the number of nodes; $k$, the average degree; $maxk$, the maximum degree; $minc$, the minimum community size; $maxc$, the maximum community size; and $mu$, the mixing parameter, that is, the probability that a node is connected with nodes of external communities. It should be pointed out that, to find a single local community structure, in the second step of LCD-MC the maximum clique with the largest number of nodes was selected for extension.
To evaluate the quality of the local communities generated by various methods, we adopt F-Measure score (FM) and normalized mutual information (NMI) [23] as the evaluation indexes.
FM is a commonly used measure for community detection algorithms. Assume $T$ is the set of node pairs $(i, j)$ where nodes $i$ and $j$ belong to the same class in the ground truth, and $K$ is the set of node pairs that belong to the same community generated by an algorithm. Then FM is computed from both the precision and the recall:
$$\mathrm{FM} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}, \quad (5)$$
where precision and recall are written as
$$\mathrm{precision} = \frac{|T \cap K|}{|K|}, \qquad \mathrm{recall} = \frac{|T \cap K|}{|T|}. \quad (6)$$
NMI is another widely used criterion for measuring the performance of community detection algorithms; formally, it is defined by (7).
According to the experimental results, compared with the LMD algorithm, which starts the detection from the local degree center node, the LCD-MC algorithm, which starts from the maximum clique of the node, is more effective.
(3) As a whole, LCD-MC achieves the best results on all four evaluation indexes. On both small-community networks (S1 and S3) with a mixing parameter mu smaller than 0.6 and large-community networks (S2 and S4) with a mixing parameter mu smaller than 0.5, LCD-MC finds the local community structure of each node almost fully correctly. On a highly mixed network, for example, with a mu of 0.8 or 0.9, neither LCD-MC nor the other algorithms obtain ideal results. This is consistent with the fact that the community structure of such a network is not distinct.
To sum up, LCD-MC can find local communities with better quality on synthetic networks compared with the other representative local community detection algorithms.
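The evaluation indexes used in this comparison can be computed along the following lines. The sketch assumes that both the ground truth and the detected result are given as dictionaries from node to label, computes precision, recall, and FM over node pairs, and computes NMI from the confusion matrix in the usual normalized form; the function names and the six-node example are assumptions made for illustration, not data from the experiments.

```python
from collections import Counter
from itertools import combinations
from math import log


def same_group_pairs(labels):
    """Set of unordered node pairs that share a label."""
    return {frozenset((a, b)) for a, b in combinations(labels, 2)
            if labels[a] == labels[b]}


def f_measure(truth, found):
    """Pairwise precision, recall, and their harmonic mean FM."""
    T, K = same_group_pairs(truth), same_group_pairs(found)
    hit = len(T & K)
    precision = hit / len(K) if K else 0.0
    recall = hit / len(T) if T else 0.0
    total = precision + recall
    fm = 2 * precision * recall / total if total else 0.0
    return precision, recall, fm


def nmi(truth, found):
    """Normalized mutual information from the class/cluster confusion matrix."""
    n = len(truth)
    joint = Counter((truth[v], found[v]) for v in truth)
    rows, cols = Counter(truth.values()), Counter(found.values())
    num = -2 * sum(c * log(c * n / (rows[i] * cols[j]))
                   for (i, j), c in joint.items())
    den = (sum(c * log(c / n) for c in rows.values())
           + sum(c * log(c / n) for c in cols.values()))
    return num / den if den else 1.0   # convention when both sides are one group


# Hypothetical six-node example with two true classes.
truth = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
found = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}
print(f_measure(truth, found), nmi(truth, found))
```

In the example, node 3 is assigned to the wrong cluster, which lowers both precision and recall; a perfect assignment gives FM = 1 and NMI = 1.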

Real Networks.
To further verify the performance of LCD-MC, we compare it with the other algorithms on real networks and show the comparison results in Table 3. The bold digits mark the maximum value of local community quality among the algorithms for each evaluation index. Of these indexes, precision and recall each reflect only one aspect of algorithm performance, while FM and NMI take algorithm performance into comprehensive consideration.
The configuration parameters of the LFR synthetic network are $N = 100$, $k = 10$, $maxk = 20$, $minc = 20$, $maxc = 30$, $mu = 0.1$, $on = 4$, and $om = 2$, where $on$ represents the number of overlapping nodes and $om$ represents the number of memberships of the overlapping nodes. The corresponding network layout is shown in Figure 6. Table 4 shows the corresponding community distribution, in which nodes 2, 28, 37, and 56 are overlapped nodes, each connecting two communities.
As the Clauset, LWP, LS, and LMD algorithms can obtain only one local community from each node, we selected Clauset as their representative for comparison with the LCD-MC algorithm. The results of local community detection on the network depicted in Figure 6 by Clauset and LCD-MC are shown in Tables 5 and 6, respectively. It can be seen that Clauset found only one local community from each overlapped node, because the Clauset algorithm can only extend a single current local community according to the objective function $R$. LCD-MC effectively found two local communities from each overlapped node. This is because, in its initialization, LCD-MC uses maximum cliques as candidate local communities instead of a single node. Take node 2 as an example. In the initialization, its maximum clique set included {2, 79, 93}, {2, 79, 99}, and {2, 83, 94}. Of them, the maximum clique {2, 79, 93} was used as the initial local community for extension and, as a result, community 4 was found, as shown in Table 4. At this stage, the maximum clique {2, 79, 99} was already included in this local community, but the maximum clique {2, 83, 94} was still outside it. Therefore, LCD-MC started from {2, 83, 94} to continue the search for a new local community and thus found two local community structures connected with node 2. Hence, LCD-MC has the ability to find multiple local communities connected by overlapped nodes.
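The bookkeeping in the node 2 example, deciding which maximum cliques still need to be extended, can be sketched as a containment test. The three cliques are those quoted above; the membership listed for the first community contains only the nodes named in the text and is a placeholder, since the full contents of community 4 are given in Table 4 and are not reproduced here. The function name is hypothetical.

```python
def unclassified(cliques, communities):
    """Cliques that are not fully contained in any community found so far."""
    return [c for c in cliques if not any(c <= com for com in communities)]


cliques = [frozenset({2, 79, 93}), frozenset({2, 79, 99}), frozenset({2, 83, 94})]

# Placeholder: only the members of the first community that the text mentions.
first_community = {2, 79, 93, 99}

remaining = unclassified(cliques, [first_community])
print(remaining)
```

Only {2, 83, 94} remains, so it becomes the seed of the second local community of node 2.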

Conclusion
In this paper, we propose a novel local community detection algorithm for large complex networks based on maximum cliques extension (LCD-MC). This algorithm first adopts the idea of the Bron-Kerbosch algorithm to find, by recursion, all maximum cliques containing the source node in the network; it then takes an arbitrary maximum clique satisfying certain conditions as the initial local community and, by continuously exploring the neighbor area around the current local community, keeps adding conforming nodes to the local community structure until no conforming node remains. The main characteristic of LCD-MC is that it starts the extension from the maximum cliques of a given node instead of starting directly from the given node. In this way, it avoids the deviation of community extension and can identify multiple local community structures connected by an overlapped node. We compared LCD-MC with some representative algorithms such as Clauset, LWP, LS, and LMD on various synthetic and real networks. The experimental results demonstrate that LCD-MC can find local communities with better quality on both synthetic and real networks. Moreover, the experimental results on a synthetic network with known overlapped community structures indicate that LCD-MC is able to identify the multiple local communities in which an overlapped node is located.
The larger the $R$ value is, the better the detected local community structure will be. Initially, $D = \{v_0\}$, and Clauset used a greedy optimization algorithm for the local modularity $R$ to find the local community structure where the designated $v_0$ is located. In the definition of $R$, $B_{ij} = 1$ if $(v_i, v_j) \in E$ and either node $v_i$ or node $v_j$ exists in $B$; otherwise, $B_{ij} = 0$. When either node $v_i$ or node $v_j$ exists in $B$ and the other node exists in $D$, then $\delta(i, j) = 1$; otherwise, $\delta(i, j) = 0$. Similar to the Clauset algorithm, Luo et al. proposed the LWP algorithm [10], in which $R$ is replaced by a new local modularity $M$.
Time Complexity.
Let us analyze the time complexity of LCD-MC for its two steps in turn. The worst-case time complexity of the Bron-Kerbosch algorithm is $O(3^{n/3})$ [20]; however, since LCD-MC only needs to find the cliques containing the initial node, it essentially runs the Bron-Kerbosch algorithm on the subgraph formed by the initial node and its neighbors. Hence, the worst-case time complexity of FindMC, the first step of LCD-MC, is $O(3^{d/3})$, where $d$ is the degree of the initial node, and that of LCD, the second step of LCD-MC, is $O(|D|^2)$, where $D$ is the detected local community. It should be noted that both $d$ in FindMC and $|D|$ in LCD are far smaller than the overall size of the network. Therefore, LCD-MC achieves satisfactory time efficiency.
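To give a feeling for the gap between the two bounds, the following worked arithmetic compares them for one set of sizes. The values of $n$ and $d$ are hypothetical and chosen only for illustration; they are not taken from the experiments.

```latex
% Hypothetical sizes: a network with n = 300 nodes and a source node of degree d = 30.
\[
3^{n/3} = 3^{100} \approx 5.15 \times 10^{47},
\qquad
3^{d/3} = 3^{10} = 59049 .
\]
```

Both figures are worst-case bounds, so the actual running time on sparse neighbourhoods is typically far below them.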

Table 1: Information of benchmark networks.
Of them, Clauset was the first to propose local community detection, and, therefore, we take it as the basic algorithm.

Table 2: Information of real networks.
$$\mathrm{NMI} = \frac{-2 \sum_{i=1}^{C_A} \sum_{j=1}^{C_B} N_{ij} \log\left(\frac{N_{ij}\, n}{N_{i\cdot} N_{\cdot j}}\right)}{\sum_{i=1}^{C_A} N_{i\cdot} \log\left(\frac{N_{i\cdot}}{n}\right) + \sum_{j=1}^{C_B} N_{\cdot j} \log\left(\frac{N_{\cdot j}}{n}\right)}, \quad (7)$$
where $N$ is the confusion matrix, $N_{ij}$ is the number of nodes both in the $i$th class and the $j$th cluster, $C_A$ and $C_B$ are the numbers of classes and clusters, respectively, $N_{i\cdot}$ and $N_{\cdot j}$ are the numbers of nodes in the $i$th class and the $j$th cluster, respectively, and $n$ is the total number of nodes.
(2) Comparison of LCD-MC and LMD with Clauset. It can be seen that, in terms of the four indexes, both algorithms achieve better results than Clauset to various extents, indicating that starting local community detection directly from the initial node does impose certain restrictions.

Table 3: Results on real networks.

Table 4: The real communities of the synthetic network.

Table 5: Local community result of Clauset on the synthetic network.

Table 6: Local community result of LCD-MC on the synthetic network.