Incremental Density-Based Link Clustering Algorithm for Community Detection in Dynamic Networks

Community detection in complex networks has become a research hotspot in recent years. However, most existing community detection algorithms are designed for static networks, in which the connections between nodes do not change. In this paper, we propose an incremental density-based link clustering algorithm for community detection in dynamic networks, iDBLINK. This algorithm is an extended version of DBLINK, which was proposed in our previous work. It updates the local link community structure at the current moment through the change of similarity between edges at adjacent moments, covering the creation, growth, merging, deletion, contraction, and division of link communities. Extensive experimental results demonstrate that iDBLINK not only has good time efficiency but also maintains high-quality community detection performance while the network topology is changing.


Introduction
In the real world, many systems can be abstracted as networks, such as the Internet, interpersonal relationship networks, disease transmission networks, and scientist cooperation networks. A large number of studies have shown that community structure exists in complex networks. Nodes are connected closely within a community, while the connections between communities are relatively sparse. Community structure plays an increasingly important role in complex networks. It can help us understand the function of complex networks, find their underlying laws, and predict their behavior [1]. For example, in social networks a community represents a group with the same interests and hobbies; on the Web, pages have more links within the same community; in a literature network, publications within the same community have related research topics. In short, studying the community structure of complex networks has important theoretical significance and practical value.
At present, great progress has been achieved in research on community detection in complex networks, and many representative algorithms have been put forward. Hierarchy centered algorithms use the idea of hierarchical clustering, so that the original complex network is built into a hierarchical community structure. These algorithms include clustering and classification methods, whose representative algorithms are GN [2] and P-SNCD [3]. Node centered algorithms, represented by the CPM [4] algorithm, impose strict requirements on every node of the network in order to carry out community detection. Link centered algorithms include LINK [5] and DBLINK [6]; the DBLINK algorithm was proposed in our previous work. The framework in [7] gives the basis for multidimensional network analysis, and the method in [8] also uses a partition of the links of a network in order to uncover its community structure. Group centered algorithms define what a community is and then find all the communities in the network that meet this definition. Typical algorithms include LFM [9], DOCA [10], and OSLOM [11]. Network centered algorithms use a certain standard (such as modularity [12]) to divide the entire network and take the best partition to represent the community structure; CNM [13] and AdClust [14] are good examples. Other algorithms include label propagation algorithms [15,16], algorithms based on information dissemination [17,18], and so on.
Most existing community detection algorithms are aimed at static networks, in which the connections between the nodes do not change. However, the networks in many practical applications are dynamic. Take microblogs and other social networks as an example: from a macroscopic perspective, the number of users continues to grow, increasing the scale of the network; from a microscopic perspective, users can add new friends or delete existing friend relationships. These changes alter the community structure of the entire network, for instance through the formation of new communities, the disappearance of old communities, the merging or division of communities, and the growth or shrinkage of community size. Although LabelRankT [19], QCA [20], DENGRAPH [21], AFOCS [22], and other incremental community detection algorithms have been proposed in succession, community detection in dynamic networks still faces enormous challenges due to rapid and unpredictable changes.
This paper proposes an incremental density-based link clustering algorithm for community detection in dynamic networks, iDBLINK, which is in essence an extension of the DBLINK algorithm. It updates the affected local community structure according to the different changes in network topology. The advantages of iDBLINK lie mainly in the following aspects:
(1) iDBLINK can update the link community structure of the current network quickly and efficiently, using the changes in similarity between edges together with the link community structure at the previous moment.
(2) iDBLINK inherits the advantage of the DBLINK algorithm, which can effectively identify overlapping community structure. It only needs to consider the changes of the disjoint link communities, while the overlapping node communities follow naturally from the link communities.
(3) Experimental results on simulated and real networks show that DBLINK can find overlapping community structure of higher quality than some other representative community detection algorithms, while iDBLINK is better suited to the dynamic network environment.

Static Network Community Detection DBLINK
Usually, a complex network can be represented by a graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the node set and $E = \{e_1, e_2, \ldots, e_m\}$ is the edge set; $n$ and $m$ are the numbers of nodes and edges in the network, respectively. An edge $e = (u, v)$ connects the two nodes $u$ and $v$.
Definition 1. If two edges share a common node, then there is similarity between them. The similarity between edge $e = (u, v)$ and edge $f = (u, w)$ is defined in terms of $\Gamma(v)$ and $\Gamma(w)$, where $\Gamma(v)$ is the node set consisting of the node $v$ itself and all its neighbor nodes, so $\Gamma(v) = \{x \mid (v, x) \in E\} \cup \{v\}$.
Definition 2. $N(e)$ is the neighbor of $e = (u, v)$: the collection of edges connected with node $u$ or node $v$, not including edge $e$ itself. Here $N(v)$ and $N(u)$ denote the neighbor node sets of node $v$ and node $u$.
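Definitions 1 and 2 can be illustrated with a short Python sketch. It assumes, purely for illustration, that the similarity is the Jaccard ratio of the inclusive neighborhoods Γ of the two non-shared endpoints, a common choice in link clustering; the function names (`build_adjacency`, `edge_similarity`, `edge_neighbors`) are hypothetical, and the graph is taken to be simple and undirected.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Map each node to the set of its neighbor nodes N(v)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def inclusive_neighbors(adj, v):
    """Gamma(v): node v together with all of its neighbors."""
    return adj[v] | {v}

def edge_similarity(adj, e1, e2):
    """Similarity of two edges that share exactly one node.

    ASSUMPTION: Jaccard ratio of Gamma over the two non-shared endpoints.
    Returns 0.0 when the edges do not share exactly one node.
    """
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0
    (a,) = set(e1) - shared
    (b,) = set(e2) - shared
    ga, gb = inclusive_neighbors(adj, a), inclusive_neighbors(adj, b)
    return len(ga & gb) / len(ga | gb)

def edge_neighbors(adj, e):
    """N(e): every edge incident to an endpoint of e, excluding e itself."""
    u, v = e
    incident = {frozenset((u, w)) for w in adj[u]}
    incident |= {frozenset((v, w)) for w in adj[v]}
    incident.discard(frozenset(e))
    return incident

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_adjacency(edges)
print(edge_similarity(adj, (0, 1), (0, 2)))  # 0.75
print(sorted(tuple(sorted(f)) for f in edge_neighbors(adj, (0, 1))))  # [(0, 2), (1, 2)]
```

In the example, edges (0, 1) and (0, 2) share node 0, so the ratio compares Γ(1) = {0, 1, 2} with Γ(2) = {0, 1, 2, 3}.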
Density connection has symmetry. If edges $p$ and $q$ are density connected, so are edges $q$ and $p$.
Definition 10. Given parameters $\varepsilon$ and $\mu$, a link community LC is a nonempty edge subset which satisfies the following two conditions: (1) connection: any two edges in LC are connected; that is, $\forall p, q \in \mathrm{LC}$: $\mathrm{Connect}(p, q)$; (2) maximum: $\forall p, q \in E$: $p \in \mathrm{LC} \wedge \mathrm{Reach}(p, q) \Rightarrow q \in \mathrm{LC}$.
According to Definition 10, the first step of the DBLINK algorithm is to find all the link communities in a network that satisfy the connection and maximum conditions. An edge that does not belong to any link community is called an isolated edge. Any two edges within a link community are density connected, and this relation is symmetric; therefore, given parameters $\varepsilon$ and $\mu$, DBLINK can start from any edge when detecting link communities, and the result is uniquely determined.
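The first step of DBLINK might be sketched as follows. This is not the authors' implementation: it assumes a Jaccard edge similarity, an ε neighborhood that contains the edge itself plus every adjacent edge with similarity at least ε, and a core edge defined by an ε neighborhood of at least μ edges, in the manner of density-based clustering. The name `dblink_sketch` is hypothetical and the graph is assumed simple and undirected.

```python
from collections import defaultdict, deque
from itertools import combinations

def dblink_sketch(edge_list, eps, mu):
    """Cluster edges into link communities; returns {edge: label}.

    Edges missing from the result are isolated edges.
    """
    edges = [frozenset(e) for e in edge_list]
    adj = defaultdict(set)       # node -> neighbor nodes
    incident = defaultdict(set)  # node -> incident edges
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
        incident[u].add(e)
        incident[v].add(e)

    def similarity(e1, e2):
        # ASSUMPTION: Jaccard ratio of the inclusive neighborhoods
        # of the two non-shared endpoints.
        (a,) = e1 - e2
        (b,) = e2 - e1
        ga, gb = adj[a] | {a}, adj[b] | {b}
        return len(ga & gb) / len(ga | gb)

    def eps_neighborhood(e):
        # ASSUMPTION: the edge itself counts as a member.
        near = {e}
        for node in e:
            for f in incident[node]:
                if f != e and similarity(e, f) >= eps:
                    near.add(f)
        return near

    hood = {e: eps_neighborhood(e) for e in edges}
    core = {e for e in edges if len(hood[e]) >= mu}

    labels, next_label = {}, 0
    for seed in edges:
        if seed not in core or seed in labels:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:  # expand the community through core edges
            e = queue.popleft()
            for f in hood[e]:
                if f not in labels:
                    labels[f] = next_label
                    if f in core:
                        queue.append(f)
        next_label += 1
    return labels

# Two 4-cliques sharing node 3, plus a pendant edge (6, 7).
edges = list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2)) + [(6, 7)]
labels = dblink_sketch(edges, eps=0.5, mu=4)
print(len(labels), frozenset((6, 7)) in labels)
```

On this toy graph the sketch places the six edges of each clique in their own link community and leaves the pendant edge (6, 7) isolated.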
Definition 11. The node community $C$ consists of all nodes connected by the edges in the corresponding link community LC.
Definition 12. If two edges connected to the same node $v$ belong to different link communities, then node $v$ is called an overlapping node.
The second step of the DBLINK algorithm is to convert the link community set LCS to the corresponding node community set CS based on Definitions 11 and 12. As shown in Figure 1, we obtain two link communities given parameters $\varepsilon = 0.5$ and $\mu = 4$, and edge (1, 7) is an isolated edge which does not belong to any link community. As a result, the final node communities are {0, 1, 2, 3, 4, 5} and {5, 6, 7, 8, 9}, where node 5 is an overlapping node. The extensive experiments reported in [6] show that the DBLINK algorithm not only achieves good time efficiency but also finds overlapping community structure of high quality.

Dynamic Community Detection iDBLINK
An incremental density-based link clustering algorithm, iDBLINK, is proposed for community detection in dynamic networks in order to effectively track and analyze the community structure of temporal dynamic networks. This algorithm is an extended version of DBLINK which updates the current community structure according to the community structure at the previous moment and the changes in network topology.

Problem Description.
It can be assumed that a complex network is described by an undirected and unweighted graph $G = (V, E)$, where $V$ is the node set and $E$ is the edge set. $C = \{C_1, C_2, \ldots, C_k\}$ denotes the community structure of the network, and every $C_i \in C$ is a nonempty subset of $V$.
$G_0 = (V_0, E_0)$ represents the original network, $G_t = (V_t, E_t)$ denotes the network snapshot at time $t$, $\Delta V_t$ and $\Delta E_t$ refer to the sets of nodes and edges, respectively, joined or deleted at time $t$, and $\Delta G_t = (\Delta V_t, \Delta E_t)$ describes the change of the whole network. The network snapshot at time $t + 1$ is $G_{t+1} = G_t \cup \Delta G_t$. A dynamic network model $\mathcal{G}$ is a sequence of network snapshots at different times; thus $\mathcal{G} = \{G_0, G_1, G_2, \ldots\}$.
Given a dynamic network model $\mathcal{G} = \{G_0, G_1, G_2, \ldots, G_T\}$, $G_0$ represents the original network and $G_1, G_2, \ldots, G_T$ are the network snapshots obtained according to $\Delta G_1, \Delta G_2, \ldots, \Delta G_T$. Incremental community detection aims to design an adaptive algorithm that updates the current network community structure from the network snapshot at the previous moment and the changes, and that also tracks the evolution of the network, as shown in Figure 2.
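The snapshot model above can be sketched in a few lines of Python. The representation of a change as a dictionary with separate `added_*` and `removed_*` entries is an assumption made for illustration (the text only speaks of nodes and edges joined or deleted), and `apply_change` and `snapshots` are hypothetical names.

```python
def apply_change(nodes, edges, delta):
    """Return the next snapshot (V, E) given the current one and a change record."""
    nodes = set(nodes)
    edges = {frozenset(e) for e in edges}
    for e in delta.get("removed_edges", []):
        edges.discard(frozenset(e))
    for v in delta.get("removed_nodes", []):
        nodes.discard(v)
        edges = {e for e in edges if v not in e}  # drop edges of a deleted node
    nodes.update(delta.get("added_nodes", []))
    for e in delta.get("added_edges", []):
        edges.add(frozenset(e))
        nodes.update(e)  # endpoints of a new edge join the node set
    return nodes, edges

def snapshots(nodes0, edges0, deltas):
    """Yield G_0, G_1, ... by applying each change in turn."""
    nodes, edges = set(nodes0), {frozenset(e) for e in edges0}
    yield nodes, edges
    for delta in deltas:
        nodes, edges = apply_change(nodes, edges, delta)
        yield nodes, edges

deltas = [
    {"added_edges": [(2, 3)]},
    {"removed_edges": [(0, 1)], "added_nodes": [4]},
]
series = list(snapshots({0, 1, 2}, [(0, 1), (1, 2)], deltas))
for t, (nodes, edges) in enumerate(series):
    print(t, sorted(nodes), sorted(tuple(sorted(e)) for e in edges))
```

An incremental algorithm would consume each change record directly instead of re-reading the whole snapshot, which is the point of the problem statement.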
The task in the first phase of incremental community detection is to cluster the original network $G_0$, so as to find the community structure and lay the foundation for community updating on the later network snapshots. We select DBLINK as the community detection algorithm for this initial stage. This algorithm has high time efficiency and the ability to find high-quality overlapping community structure, as the experiments demonstrate.

Algorithm Framework.
Figure 3 shows the iDBLINK algorithm framework, from which it can be seen that iDBLINK focuses on the change of link communities. This has two advantages: first, it reduces the number of factors to be considered, because link communities are nonoverlapping; second, changes to the overlapping node communities follow naturally from changes to the nonoverlapping link communities.

Network Topology Changes.
iDBLINK is in essence an extension of DBLINK, and the deciding factor of link community structure is the similarity between edges; therefore, we need to translate the changes of network topology into changes of similarity. For instance, a strengthening of the similarity between two edges may bring one edge into the $\varepsilon$ neighborhood of the other and make the latter a core edge, resulting in the growth of a link community or the merging of multiple link communities.
Assume that edge $e = (u, v)$ is a change of the network topology; namely, edge $e$ is to be added or deleted. The changes of the similarity between network edges caused by edge $e$ are then divided into the following situations:
(1) The similarity between edge $e$ itself and its neighbors will change. When edge $e$ is newly added, the similarity between it and its neighbors is created from scratch; when edge $e$ is deleted, that similarity is cancelled.
(3) If edge $e_1 = (u, v_1)$ and edge $e_2 = (u, v_2)$ are two edges other than $e$ that are both connected to node $u$, the similarity between $e_1$ and $e_2$ will not change. This is because $\Gamma(v_1)$ and $\Gamma(v_2)$, which determine the similarity between $e_1$ and $e_2$, do not change. The same holds for two edges connected to node $v$.
(4) If two edges $e_1$ and $e_2$ are connected to neither node $u$ nor node $v$, the similarity between them also does not change, because $\Gamma(v_1)$ and $\Gamma(v_2)$ are unchanged.
In conclusion, if edge $e$ is newly added or removed, the similarity between $e$ and its neighbors needs to be created or cancelled. In addition, for any other edge $f$ connected to node $u$ or $v$, the similarity between $f$ and the edges associated with $f$ may also change. In the actual calculation, some unnecessary repeated computation can be avoided by exploiting the symmetry of similarity.
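The effect of a single edge change on the similarities can be checked by brute force on a toy graph. The sketch below again assumes a Jaccard similarity over the inclusive neighborhoods of the non-shared endpoints; `pair_similarities`, `changed_pairs`, and `pair` are hypothetical helpers. An incremental algorithm would avoid this full recomputation; the comparison merely shows which pairs are affected.

```python
from collections import defaultdict
from itertools import combinations

def pair(e1, e2):
    """Canonical key for an unordered pair of undirected edges."""
    return frozenset((frozenset(e1), frozenset(e2)))

def pair_similarities(edge_list):
    """Similarity of every pair of edges sharing a node (ASSUMED Jaccard form)."""
    edges = [frozenset(e) for e in edge_list]
    adj, incident = defaultdict(set), defaultdict(set)
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
        incident[u].add(e)
        incident[v].add(e)
    sims = {}
    for inc in incident.values():
        for e1, e2 in combinations(inc, 2):
            (a,) = e1 - e2  # non-shared endpoint of e1
            (b,) = e2 - e1  # non-shared endpoint of e2
            ga, gb = adj[a] | {a}, adj[b] | {b}
            sims[frozenset((e1, e2))] = len(ga & gb) / len(ga | gb)
    return sims

def changed_pairs(edges_before, edges_after):
    """Edge pairs whose similarity was created, cancelled, or altered."""
    before = pair_similarities(edges_before)
    after = pair_similarities(edges_after)
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}

before = [(0, 2), (0, 3), (2, 3), (1, 4), (4, 5), (5, 6)]
after = before + [(0, 1)]          # add edge e = (u, v) with u = 0, v = 1
changed = changed_pairs(before, after)
print(pair((0, 1), (0, 2)) in changed)  # True: similarity with the new edge is created
print(pair((0, 2), (0, 3)) in changed)  # False: both edges meet at u
print(pair((4, 5), (5, 6)) in changed)  # False: neither edge touches u or v
```

Pairs such as (1, 4) and (4, 5), where a non-shared endpoint is u or v, do change, which is why the edges around both endpoints have to be revisited.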
We first analyze intuitively how a change of similarity between two edges affects the link community structure. Assume that the similarity of two edges $p$ and $q$ is $S^{(t)}(p, q) < \varepsilon$ at moment $t$ and that their connection strengthens at moment $t + 1$, so that $S^{(t)}(p, q) < S^{(t+1)}(p, q)$. If $S^{(t+1)}(p, q) \ge \varepsilon$, one or both of edges $p$ and $q$ may become core edges and their $\varepsilon$ neighborhoods will change, with one of three effects: (1) an existing community grows; (2) a new community forms; (3) a bridge is built between two existing nonoverlapping communities, which are merged. Conversely, if the similarity between the two edges decreases, so that $S^{(t)}(p, q) > S^{(t+1)}(p, q)$ and $S^{(t+1)}(p, q) < \varepsilon$, the $\varepsilon$ neighborhoods of the two edges contract. The two edges may no longer be density connected, which causes the division or disappearance of communities.
We divide the changes of link communities into positive changes and negative changes. Positive changes are the growth, merging, or creation of link communities; negative changes are the division, contraction, or disappearance of link communities.

Update Positive Changes.
We assume that during the two adjacent moments $t$ and $t + 1$, the similarity of edge $p$ and edge $q$ satisfies $S^{(t)}(p, q) < \varepsilon$ and $S^{(t+1)}(p, q) \ge \varepsilon$; namely, the similarity between edges $p$ and $q$ is strengthened. This may lead to the circumstances shown in Figure 4.
(1) New Core Edge. At least one of edge $p$ and edge $q$ becomes a core edge. iDBLINK constructs a new neighborhood relation from the new core edge $p$ (or $q$) and its neighbors and updates accordingly by examining whether edge $p$ or $q$ has already been assigned a community label.
(2) New Boundary Edge. A former noise edge $p$ becomes a boundary edge; namely, it becomes directly density reachable from a core edge $q$. iDBLINK then assigns edge $p$ to the community of edge $q$.
(3) New Noise Edge. If the addition of a new edge meets neither of the above two cases, the link community structure remains unchanged.
From Figure 4, we can see that the corresponding link communities are updated only when a new core edge or a new boundary edge is produced. The following mainly concerns the creation, growth, and merging of communities.
(1) Creation of New Link Communities. When a noise edge (one not belonging to any existing link community) attains core status, a new link community comes into being, and iDBLINK assigns a new community label to this core edge and its $\varepsilon$ neighborhood.
(2) Growth of Link Communities. If edge $p$ acquires a new $\varepsilon$ neighborhood at moment $t + 1$ and was already allocated a link community label at moment $t$, then the $\varepsilon$ neighborhood of edge $p$ is absorbed by the link community with this label, so that the link community grows.
(3) Merging of Link Communities. Suppose edge $p$ becomes a core edge at moment $t + 1$ and the edges in its $\varepsilon$ neighborhood belong to different link communities. iDBLINK merges these link communities and assigns them a new community label. It is worth mentioning that the generation of a new core edge can merge only a limited number of link communities.
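The three positive operations can be sketched as a single label update. This is a simplified illustration rather than the iDBLINK procedure itself: the function `apply_positive_change` and its arguments are hypothetical, the ε neighborhood of the new core edge is taken as given, and every labelled edge in that neighborhood is treated as a reason to join its community, which glosses over the distinction between core and boundary neighbors and over any further expansion through newly reachable core edges.

```python
def apply_positive_change(labels, core_edge, neighborhood, next_label):
    """Update link-community labels after `core_edge` gained core status.

    labels       : dict edge -> community label (noise edges are absent)
    neighborhood : edges in the eps neighborhood of core_edge (assumed given)
    next_label   : first unused community label
    Returns (event, next_label); event is 'creation', 'growth' or 'merging'.
    """
    members = set(neighborhood) | {core_edge}
    found = {labels[e] for e in members if e in labels}
    if not found:
        event, target = "creation", next_label
        next_label += 1
    elif len(found) == 1:
        event, target = "growth", next(iter(found))
    else:
        # The text assigns a new label to the merged community.
        event, target = "merging", next_label
        next_label += 1
        for e, lab in list(labels.items()):
            if lab in found:
                labels[e] = target
    for e in members:
        labels[e] = target
    return event, next_label

# Edges are represented by short identifiers for brevity.
labels = {"a": 0, "b": 0, "c": 1}
first = apply_positive_change(labels, "x", {"y"}, 2)
second = apply_positive_change(labels, "z", {"a"}, first[1])
third = apply_positive_change(labels, "w", {"a", "c"}, second[1])
print(first, second, third)
print(labels)
```

Relabelling a merged community touches all of its edges, so a practical implementation would probably keep a label-to-edges index or a union-find structure instead of scanning `labels`.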

Update Negative Changes.
We assume that during the two adjacent moments $t$ and $t + 1$, the similarity of edge $p$ and edge $q$ satisfies $S^{(t)}(p, q) \ge \varepsilon$ and $S^{(t+1)}(p, q) < \varepsilon$; namely, the similarity between edges $p$ and $q$ is weakened. This may lead to the circumstances shown in Figure 5.
(1) Losing the Core Edge. Edge $p$ is a core edge at moment $t$, while at moment $t + 1$ it loses the core attribute because of its lower similarity with edge $q$; namely, $|N_\varepsilon(p)| < \mu$.
(2) Losing Connection with the Core Edge. Edge $p$ is a boundary edge of core edge $q$ at moment $t$, but it is no longer directly density reachable from $q$ at moment $t + 1$.
(3) Losing Connection between Two Core Edges. Edge $p$ and edge $q$ are both core edges at moments $t$ and $t + 1$; they are density connected at moment $t$ but not at moment $t + 1$.
(4) Losing Connecting Relationship. Two connected edges $p$ and $q$ are not core edges at moment $t$, and they lose their connection at moment $t + 1$. In this case the link communities do not change.
For the first three of these conditions, iDBLINK recalculates the $\varepsilon$ neighborhood of the affected core edges and examines the distribution of link communities within that neighborhood. There are three types of corresponding operations, as follows.
(1) Deletion of Link Communities. Assume that edge $p$ loses its core attribute and that no other core edge exists in its $\varepsilon$ neighborhood; then the link community to which edge $p$ belonged is deleted. The original members of the link community are set to noise or boundary status according to the specific situation.
(2) Contraction of Link Communities. Suppose edge $p$ is in the $\varepsilon$ neighborhood of core edge $q$ at moment $t$ and loses its connection with edge $q$ at moment $t + 1$, without changing the core attribute of edge $q$. Then edge $p$ is removed from the link community of edge $q$, and the link community becomes smaller; this is link community contraction.
(3) Division of Link Communities. Assume that edge $p$ and edge $q$ are both core edges at moments $t$ and $t + 1$ and are density connected at moment $t$. If they lose this relationship at moment $t + 1$, the link community is split.
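One way to picture the negative operations is to re-derive what is left of a single affected link community from the core edges and ε neighborhoods at moment t + 1. The sketch below does this with a breadth-first expansion restricted to the old members; it is an illustration under assumptions, not the authors' procedure, and `recheck_community` with its inputs `core` and `hood` is hypothetical.

```python
from collections import deque

def recheck_community(members, core, hood):
    """Re-derive the remains of one link community after a weakening.

    members : edges that carried the community's label at moment t
              (identifiers must be sortable, for a deterministic order)
    core    : edges that are core edges at moment t + 1
    hood    : dict edge -> eps neighborhood at moment t + 1
    Returns (event, parts) where parts are the surviving edge sets.
    """
    members = set(members)
    parts, assigned = [], set()
    for seed in sorted(members):
        if seed not in core or seed in assigned:
            continue
        part, queue = {seed}, deque([seed])
        while queue:  # expand only through core edges
            e = queue.popleft()
            for f in hood.get(e, ()):
                if f in members and f not in part:
                    part.add(f)
                    if f in core:
                        queue.append(f)
        assigned |= part
        parts.append(part)
    if not parts:
        event = "deletion"       # no core edge is left
    elif len(parts) > 1:
        event = "division"       # core edges are no longer density connected
    elif parts[0] != members:
        event = "contraction"    # some former members were dropped
    else:
        event = "unchanged"
    return event, parts

old = ["a", "b", "c", "d"]
split = recheck_community(old, {"a", "c"}, {"a": {"a", "b"}, "c": {"c", "d"}})
shrunk = recheck_community(old, {"a"}, {"a": {"a", "b", "c"}})
gone = recheck_community(old, set(), {})
print(split[0], shrunk[0], gone[0])  # division contraction deletion
```

Edges that end up in no surviving part correspond to the former members whose status has to be reset to noise or boundary.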

Node Community Update.
Although iDBLINK only updates link communities in the dynamic network, and the link community structure is nonoverlapping, the corresponding overlapping node communities follow naturally from the link communities. All nodes connected by edges within the same link community belong to the same node community, while nodes shared by different link communities are overlapping nodes. As shown in Figure 6, (a) presents the link communities and node communities in the network at moment $t - 1$. At moment $t$, an edge is added which leads to the formation of a new link community, and four yellow nodes naturally become overlapping nodes connecting two communities.
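The conversion from link communities to node communities is simple enough to sketch directly. The function name `node_view` and the toy labels are illustrative only; they do not reproduce the network of Figure 6.

```python
from collections import defaultdict

def node_view(link_labels):
    """Derive node communities and overlapping nodes from link-community labels.

    link_labels : dict mapping an edge (u, v) to its link-community label
    """
    comms = defaultdict(set)
    for (u, v), label in link_labels.items():
        comms[label].update((u, v))
    seen, overlapping = set(), set()
    for nodes in comms.values():
        overlapping |= nodes & seen  # nodes already claimed by another community
        seen |= nodes
    return dict(comms), overlapping

# Moment t - 1: one link community on a triangle.
before = {(0, 1): 0, (1, 2): 0, (0, 2): 0}
# Moment t: a second link community appears and touches node 2.
after = dict(before)
after.update({(2, 3): 1, (3, 4): 1, (2, 4): 1})

print(node_view(before))
print(node_view(after))
```

Because each edge carries exactly one label, no extra bookkeeping is needed for overlap: node 2 becomes an overlapping node as soon as edges with two different labels meet at it.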

Experimental Results
This section verifies the performance of the iDBLINK algorithm. We compare iDBLINK with the static community detection algorithms DBLINK, FOCS, and DENGRAPH and with the dynamic community detection algorithms AFOCS and DENGRAPH-IO. The experimental environment was as follows: an Intel(R) Core(TM)2 Duo 2.8 GHz PC with 2 GB of memory, the Windows 7 operating system, and the C#.Net programming language.

Experiment Settings.
We verify the performance of the iDBLINK algorithm with four groups of LFR simulation networks and two real networks. The simulation network configurations of each group are shown in Table 1.
$N$ is the number of nodes. $k$ is the average degree of nodes in the network and max $k$ is the maximum degree. min $c$ is the number of nodes in the smallest community and max $c$ is the number of nodes in the largest community. The mixing parameter mu represents the probability that a node is connected to nodes outside its community. The greater the value of mu, the more difficult community detection becomes. The community structure of an LFR benchmark network is known in advance, so we can evaluate the quality of the communities found by the different algorithms.
We selected Amazon and DBLP [23] as real networks; both are large complex networks with ground-truth community structure.
DBLP is a widely used scientist cooperation network. Scientists are the nodes, and an edge is established between two scientists if they have coauthored at least one article. The network contains 317080 nodes and 1049866 edges.
Amazon is a product co-purchasing network, where products are the nodes and a link is established between two products if they are frequently bought together. The network has 334863 nodes and 925872 edges.
The dynamic networks for both the LFR benchmark networks and the real networks are generated as follows: for each network, we first select 90% of the edges as the original network $G_0$, and, in addition to the original network, we generate ten networks $G_1, \ldots, G_{10}$ to make up the dynamic network. Each of these networks removes 5% of the edges of the entire network and adds the same number of new edges, so that the number of edges in the dynamic network stays constant.
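The generation procedure can be sketched as follows. The text does not say where the new edges come from; the sketch assumes they are drawn from the edges of the full network that are currently absent from the snapshot, and that selection is uniformly random. The function name, its parameters, and the seed are hypothetical.

```python
import random

def make_dynamic_network(all_edges, steps=10, init_frac=0.9, swap_frac=0.05, seed=0):
    """Return [E_0, E_1, ..., E_steps] as sets of edges.

    ASSUMPTION: added edges are sampled from the edges of the full network
    that are absent from the current snapshot.
    """
    rng = random.Random(seed)
    all_edges = sorted({tuple(sorted(e)) for e in all_edges})
    total = len(all_edges)
    current = set(rng.sample(all_edges, round(init_frac * total)))
    n_swap = round(swap_frac * total)  # 5% of the entire network's edges
    series = [set(current)]
    for _ in range(steps):
        absent = sorted(set(all_edges) - current)
        removed = rng.sample(sorted(current), n_swap)
        added = rng.sample(absent, n_swap)
        current = (current - set(removed)) | set(added)
        series.append(set(current))
    return series

# Toy "full network": a complete bipartite graph with 100 edges.
edges = [(i, j) for i in range(10) for j in range(10, 20)]
series = make_dynamic_network(edges)
print([len(s) for s in series])
```

With 90% of the edges present and 5% swapped per step, the pool of absent edges always holds 10% of the network, so the sampling never runs out of candidates.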

Evaluation Standards.
For the LFR benchmark networks, we adopt the normalized mutual information (NMI) given in [24] as the evaluation standard to compare all kinds of algorithms, since the true value of community structure is known.
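As a rough illustration of this evaluation standard, the sketch below computes an NMI for overlapping covers from binary community memberships, following the conditional-entropy steps set out in the remainder of this subsection. It implements only those steps; any further refinements specified in [24] are not reproduced, and the function names are hypothetical.

```python
from math import log2

def _h(p):
    """Entropy contribution -p * log2(p), with h(0) = 0."""
    return -p * log2(p) if p > 0 else 0.0

def _cond_entropy_norm(cover_x, cover_y, n):
    """Normalized conditional entropy of cover X given cover Y."""
    total = 0.0
    for cx in cover_x:
        hx = _h(len(cx) / n) + _h((n - len(cx)) / n)
        if hx == 0:
            continue  # an empty or all-node community carries no information
        best = hx     # H(X_k | Y_l) can never exceed H(X_k)
        for cy in cover_y:
            both = len(cx & cy)
            only_x, only_y = len(cx) - both, len(cy) - both
            neither = n - both - only_x - only_y
            h_joint = sum(_h(c / n) for c in (both, only_x, only_y, neither))
            h_y = _h(len(cy) / n) + _h((n - len(cy)) / n)
            best = min(best, h_joint - h_y)  # H(X_k | Y_l) = H(X_k, Y_l) - H(Y_l)
        total += best / hx
    return total / len(cover_x)

def overlapping_nmi(cover_x, cover_y, n):
    """NMI = 1 - [H(X|Y)_norm + H(Y|X)_norm] / 2 for two covers of n nodes."""
    cover_x = [set(c) for c in cover_x]
    cover_y = [set(c) for c in cover_y]
    return 1 - 0.5 * (_cond_entropy_norm(cover_x, cover_y, n)
                      + _cond_entropy_norm(cover_y, cover_x, n))

truth = [{0, 1, 2}, {3, 4, 5}]
found = [{0, 1, 2, 3}, {4, 5}]
print(overlapping_nmi(truth, truth, 6))  # identical covers give 1.0
print(overlapping_nmi(truth, found, 6))
```

Communities are plain sets of node identifiers, so a node may appear in several of them, which is what makes the measure usable for overlapping structure.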
We suppose that the real community set in the network is $C'$; the community affiliation of node $i$ can be represented as a binary vector $x_i$ of length $|C'|$. The value of $(x_i)_k$ is one or zero according to whether node $i$ belongs to the $k$th community. The $k$th element can be considered as a random variable $X_k$ with probability distribution $P(X_k = 1) = n_k/N$, $P(X_k = 0) = 1 - n_k/N$, where $n_k$ is the number of nodes in the $k$th community and $N$ is the number of all nodes in the network. Likewise, for the community set $C''$ found by the algorithm, $Y_l$ represents the probability distribution of nodes in the $l$th community. $H(X_k \mid Y_l) = H(X_k, Y_l) - H(Y_l)$ is defined as the conditional entropy of $X_k$ given $Y_l$, and from it the conditional entropy $H(X_k \mid Y)$ of $X_k$ given $Y$ (the collection of all $Y_l$) is

$$H(X_k \mid Y) = \min_{l \in \{1, \ldots, |C''|\}} H(X_k \mid Y_l).$$

The normalized conditional entropy of $X$ (the collection of all $X_k$) given $Y$ is

$$H(X \mid Y)_{\mathrm{norm}} = \frac{1}{|C'|} \sum_{k} \frac{H(X_k \mid Y)}{H(X_k)}.$$

Similarly, we can calculate the normalized conditional entropy $H(Y \mid X)_{\mathrm{norm}}$ of $Y$ given $X$. Formula (14) then gives the normalized mutual information $\mathrm{NMI}(X \mid Y)$ of the two community sets.

Experimental Results of Simulative Network.
We first compare the two static algorithms DBLINK and FOCS on the network. At time $t = 0$, the community detection results on the simulative networks show that the quality of the communities found by DBLINK is better than that of FOCS; the advantage is especially clear on the two networks with lower overlap. However, the time efficiency of DBLINK is slightly lower than that of FOCS.
Next, we focus on comparing the algorithms on dynamic networks in terms of time efficiency and community detection quality. FOCS and DBLINK each run community detection on the whole network at every moment, whereas iDBLINK and AFOCS are incremental community detection algorithms.
(1) In terms of efficiency, DBLINK and FOCS both run on the entire network, so their running time at each moment is substantially the same as on the network at time $t = 0$. By contrast, since iDBLINK and AFOCS update only the local community structure affected by the changes in network topology (10% of edges added or deleted), their running time is significantly less than that of the corresponding static algorithms. Moreover, although the time efficiency of DBLINK is slightly lower than that of FOCS, the gain in time efficiency of iDBLINK over DBLINK appears greater than that of AFOCS over FOCS.
(2) In terms of community detection quality, the results of AFOCS and FOCS do not always coincide, and in some cases the NMI values of AFOCS are even greater than those of FOCS, while iDBLINK yields almost the same NMI values as DBLINK at every moment. This shows that iDBLINK is indeed an effective incremental version of DBLINK. In most cases, especially on the two networks with lower overlap, the quality of the community structure detected by iDBLINK is superior to that of AFOCS.
In conclusion, the iDBLINK algorithm proposed in this paper not only retains the high-quality community detection performance of the DBLINK algorithm but also achieves very good time efficiency on dynamic networks.

Experimental Results of Real Networks.
We have further validated the performance of iDBLINK on the two real networks DBLP and Amazon. Both are very large complex networks. AFOCS and FOCS fail at the first step due to insufficient memory, so, as shown in Figures 11 and 12, we can only obtain the results of DENGRAPH-IO and iDBLINK. The results of iDBLINK are clearly better than those of DENGRAPH-IO and are very close to the community detection quality of the DBLINK algorithm. At the same time, the time efficiency of iDBLINK is far superior to that of DBLINK, which shows that iDBLINK is indeed an effective incremental version of DBLINK and can be used very effectively for community detection on dynamic networks.

Conclusion
This paper proposes iDBLINK, an incremental density-based link clustering algorithm for community detection in dynamic networks. The algorithm updates the corresponding community structure quickly and efficiently according to changes in network topology. It decomposes the change of the network into four cases: adding nodes, deleting nodes, adding edges, and deleting edges. For each case, the algorithm adjusts the link density in the network and, based on the connectedness used by density-based algorithms, updates the current community structure from the community structure at the previous moment. Because iDBLINK updates only the part of the community structure affected by the topology changes, it can be applied effectively to community detection in dynamic networks. In addition, iDBLINK keeps the advantage of the DBLINK algorithm, namely, finding overlapping community structure of high quality. Experimental results show that iDBLINK not only has good time efficiency but also maintains high-quality community detection performance while the network topology is changing.

Figure 4: Update link communities for positive changes.

Figure 5: Update link communities for negative changes.

The normalized mutual information of two community sets is calculated as

$$\mathrm{NMI}(X \mid Y) = 1 - \frac{1}{2}\left[H(X \mid Y)_{\mathrm{norm}} + H(Y \mid X)_{\mathrm{norm}}\right]. \tag{14}$$

Figures 7-10 show the comparison of six algorithms on the simulative networks. The abscissa represents the network at each moment, with zero referring to the original network; the ordinates show the community detection quality NMI (a) and the algorithm running time (b).