To discover local community structure more effectively, this paper proposes a new local community detection algorithm based on the minimal cluster. Most local community detection algorithms start from a single node. Because a single node attracts other nodes less strongly than a group of nodes does, the community expansion in our algorithm starts not from the initial node alone but from a node cluster that contains the initial node and whose nodes are relatively densely connected with each other. The algorithm has two phases: it first detects the minimal cluster and then finds the local community by expanding from the minimal cluster. Experimental results show that the local communities detected by our algorithm are of higher quality than those detected by the other algorithms on both real and simulated networks.

Community detection in complex networks is an active research field. Recently, a large number of algorithms for studying the global structure of networks have been proposed, such as the modularity optimization algorithms [

Most local community detection algorithms are based on the above-mentioned process. Local community detection is defined as finding the local community structure from one or more nodes, but most of the existing local community detection algorithms, including Clauset [

The problem of local community detection was proposed by Clauset [

Definition of local community.

Local community detection starts from a preselected source node. It adds the node meeting the conditions in

At present, many local community detection algorithms have been proposed. We introduce two representative local community detection algorithms.

The definition of local community modularity is as follows:

The local community detection process of the Clauset algorithm is similar to that of a web crawler. First, the Clauset algorithm starts from an initial node

Given an undirected and unweighted graph

In the incremental step, the node selected from

The complexity of these two algorithms is

Generally, a network can be described by a graph

It is a set of nodes connected directly to a single node or a community.

For node

For community

The number of shared neighbors for nodes

Minimal cluster detection is the key step of the algorithm. The minimal cluster is the set of nodes most closely connected to the initial node. We introduce a method proposed in [

(1)

(2) for

(3) if

(4) Let

(5) end if

(6) end for

(7) return
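As a rough illustration of minimal cluster detection, the following Python sketch assumes that the minimal cluster consists of the initial node, the neighbor that shares the most neighbors with it, and those shared neighbors. This selection rule, the adjacency-set graph representation, and the names `build_graph`, `shared_neighbors`, and `minimal_cluster` are assumptions made for illustration, not the exact procedure of the algorithm.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected, unweighted graph


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build adjacency sets from an undirected edge list."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shared_neighbors(graph: Graph, u: int, v: int) -> Set[int]:
    """Nodes adjacent to both u and v."""
    return graph[u] & graph[v]


def minimal_cluster(graph: Graph, source: int) -> Set[int]:
    """Assumed rule: source, its closest neighbor, and their shared neighbors."""
    best, best_count = None, -1
    # One pass over the neighbors of the source node.
    # Sorting only makes tie-breaking deterministic (labels assumed sortable).
    for u in sorted(graph[source]):
        count = len(shared_neighbors(graph, source, u))
        if count > best_count:
            best, best_count = u, count
    if best is None:  # the source has no neighbors
        return {source}
    return {source, best} | shared_neighbors(graph, source, best)


# Toy graph: a triangle 0-1-2 with a tail 0-3-4-5.
toy = build_graph([(0, 1), (0, 2), (1, 2), (0, 3), (3, 4), (4, 5)])
cluster = minimal_cluster(toy, 0)
```

On the toy graph, node 0 shares one neighbor with node 1 and none with node 3, so the sketch returns the triangle as the starting cluster.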

In the network

The process of finding the minimal cluster is illustrated by an example shown in Figure

The discovery of minimal cluster.

First of all, we use Algorithm

(01) Let

(02) Calculate

(03) While

(04) foreach

(05) if Δ

(06) Let

(07) End if

(08) End for

(09) Update

(10) Until no node can be added into LC

(11) Return LC
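The expansion loop can be sketched in Python as follows. The sketch assumes that the quality measure is the ratio of internal to external edges (in the style of the LWP measure) and that a neighbor node is added whenever it raises that ratio. The measure, the graph representation, and the names `quality` and `expand` are illustrative assumptions; any other local quality function could be substituted.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected, unweighted graph


def quality(graph: Graph, community: Set[int]) -> float:
    """Assumed metric: number of internal edges divided by number of external edges."""
    internal = external = 0
    for u in community:
        for v in graph[u]:
            if v in community:
                internal += 1  # each internal edge is counted from both ends
            else:
                external += 1
    internal //= 2
    return float("inf") if external == 0 else internal / external


def expand(graph: Graph, seed: Iterable[int]) -> Set[int]:
    """Greedily add neighbor nodes that increase the quality of the community."""
    community = set(seed)
    improved = True
    while improved:  # stop when a full pass adds no node
        improved = False
        neighbors = {v for u in community for v in graph[u]} - community
        current = quality(graph, community)
        for v in sorted(neighbors):
            if quality(graph, community | {v}) > current:
                community.add(v)
                current = quality(graph, community)
                improved = True
    return community


# Toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the edge 3-4.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
toy: Graph = {}
for a, b in edges:
    toy.setdefault(a, set()).add(b)
    toy.setdefault(b, set()).add(a)

local_community = expand(toy, {0, 1, 2})
```

Starting from the seed {0, 1, 2}, adding node 3 raises the ratio from 3/3 to 6/1, while adding node 4 afterwards would lower it to 7/3, so the expansion stops at the first clique.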

In the algorithm, we still use

The complexity of the NewLCD algorithm is almost the same as that of the Clauset algorithm. NewLCD spends extra time finding the minimal cluster, which is linear in the degree of the initial node

In this section, the NewLCD algorithm is compared with several representative local community detection algorithms, namely, LWP, LS, and Clauset, to verify its performance. The experimental environment is as follows: Intel (R) Core (TM) i5-2400 CPU @ 3.10 GHz; 2 GB memory; operating system: Windows 7; programming language: C#.Net.

LFR benchmark networks and three real network datasets are used in the experiments.

(1) LFR benchmark networks [

LFR benchmark network information.

Network ID | Number of nodes | Average degree | Maximum degree | Minimum community size | Maximum community size | mu |
---|---|---|---|---|---|---|
B1 | 1000 | 20 | 50 | 10 | 50 | 0.1~0.9 |
B2 | 1000 | 20 | 50 | 20 | 100 | 0.1~0.9 |
B3 | 5000 | 20 | 50 | 10 | 50 | 0.1~0.9 |
B4 | 5000 | 20 | 50 | 20 | 100 | 0.1~0.9 |

(2) We choose three real networks including Zachary’s Karate club network (Karate), American college Football network (Football), and American political books network (Polbooks). The detailed information is shown in Table

Real network information.

Network ID | Name | Number of nodes | Number of edges | Reference |
---|---|---|---|---|
R1 | Karate | 34 | 78 | [ |
R2 | Football | 115 | 613 | [ |
R3 | Polbooks | 105 | 441 | [ |

Because the synthetic networks are large, 50 representative nodes are randomly selected from each group as initial nodes, and the experimental results are averaged to give the final result. Figures

Comparison of B1.

Comparison of B2.

Comparison of B3.

Comparison of B4.

(1) The LS and LWP algorithms have higher Precision than the Clauset algorithm, but their Recall is lower. They cannot achieve both high Precision and high Recall, so their overall performance may not exceed that of the benchmark Clauset algorithm.

(2) All three indicators of the NewLCD algorithm are significantly higher than those of the Clauset algorithm, which shows that the initial state indeed affects the results of local community detection and that starting from the minimal cluster is better than starting from a single node.

(3) Overall, the NewLCD algorithm performs best. On the four groups of networks, when the parameter mu is less than 0.5, NewLCD can recover almost all of the local community to which each node belongs. In highly mixed networks, when mu is greater than 0.8, NewLCD performs poorly, as do the other algorithms, mainly because the community structure of the network is not obvious.

In summary, the NewLCD algorithm detects better local communities on the artificial networks than the other three local community detection algorithms.

In order to further verify the effectiveness of NewLCD algorithm, we compare it with three other algorithms on three real networks (Karate, Football, and Polbooks). These three networks are often used to verify the effectiveness of algorithms on complex networks. The experimental results are shown in Table

The comparison of algorithms on the real networks.

Dataset | Evaluation criteria | Clauset | LS | LWP | NewLCD |
---|---|---|---|---|---|
Karate | Precision | 0.927 | | 0.884 | 0.934 |
Karate | Recall | 0.526 | 0.329 | 0.529 | |
Karate | F-score | 0.671 | 0.494 | 0.662 | |
Football | Precision | 0.803 | | 0.680 | 0.880 |
Football | Recall | 0.878 | 0.732 | 0.712 | |
Football | F-score | 0.839 | 0.824 | 0.696 | |
Polbooks | Precision | 0.741 | 0.879 | 0.770 | |
Polbooks | Recall | 0.442 | 0.182 | 0.477 | |
Polbooks | F-score | 0.554 | 0.301 | 0.589 | |
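The third value reported for each dataset in the table above matches the harmonic mean of Precision and Recall (for Clauset on Karate, 2 × 0.927 × 0.526 / (0.927 + 0.526) ≈ 0.671). The following Python sketch shows how the three indicators could be computed for one detected community, assuming the standard set-based definitions of Precision and Recall. The function names `f_score` and `precision_recall_f` are illustrative.

```python
from typing import Iterable, Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(detected: Iterable[int], truth: Iterable[int]) -> Tuple[float, float, float]:
    """Compare a detected local community with the true community of the initial node."""
    detected_set, truth_set = set(detected), set(truth)
    hits = len(detected_set & truth_set)  # correctly detected nodes
    precision = hits / len(detected_set) if detected_set else 0.0
    recall = hits / len(truth_set) if truth_set else 0.0
    return precision, recall, f_score(precision, recall)


# Check against the Clauset columns of the table (Karate, Football, Polbooks).
karate_f = round(f_score(0.927, 0.526), 3)
football_f = round(f_score(0.803, 0.878), 3)
polbooks_f = round(f_score(0.741, 0.442), 3)
```

With the Clauset values from the table, the three rounded results are 0.671, 0.839, and 0.554.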

The Karate network is a classic social network in sociology. It reflects the relationships between managers and trainees in a Karate club at an American university. The club's administrator and instructor disagreed on whether to raise the club fee, and as a result the club split into two independent smaller clubs. Because the structure of the Karate network is simple and reflects the real world, many community detection algorithms use it as a standard experimental dataset to verify the quality of detected communities. To further verify the effectiveness of the algorithm, we conduct an additional experiment on Karate. Figure

The real community structure of Karate.

The result of NewLCD algorithm on Karate.

The result of Clauset algorithm on Karate.

The contrast results of each node in Karate.

This paper proposes NewLCD, a new local community detection algorithm based on the minimal cluster. The algorithm consists of two parts. The first part finds the initial minimal cluster for local community expansion. The second part adds nodes from the neighbor node set that meet the local community condition to the local community. We compare the proposed algorithm with three other local community detection algorithms on real and artificial networks. The experimental results show that the proposed algorithm finds the local community structure more effectively than the other algorithms.

The authors declare that there is no conflict of interests regarding the publication of this paper. They declare that they do not have any commercial or associative interest that represents a conflict of interests in connection with the work submitted.

This work was supported by the National Natural Science Foundation of China (no. 61572505, no. 51404258, and no. 61402482), the National High Technology Research and Development Program of China (no. 2012AA011004), China Postdoctoral Science Foundation (no. 2015T80555), and Jiangsu Planned Projects for Postdoctoral Research Funds (no. 1501012A).