Community structure is one of the most important properties of social networks. In dynamic networks, two conflicting criteria need to be considered. One is the snapshot quality, which evaluates the quality of the community partition at the current time step. The other is the temporal cost, which evaluates the difference between communities at different time steps. In this paper, we propose a decomposition-based multiobjective community detection algorithm that optimizes these two objectives simultaneously to reveal community structure and its evolution in dynamic networks. It employs the framework of the multiobjective evolutionary algorithm based on decomposition to simultaneously optimize modularity and normalized mutual information, which quantitatively measure the quality of the community partition and the temporal cost, respectively. A local search strategy that exploits problem-specific knowledge is incorporated to improve the effectiveness of the algorithm. Experiments on computer-generated and real-world networks demonstrate that the proposed algorithm not only finds community structure and captures community evolution more accurately but is also more stable than the two compared algorithms.

Many real-world complex systems can be represented as complex networks. Networks can be modeled as graphs, where nodes (or vertices) represent objects and edges (or links) represent the interactions among these objects. The area of complex networks has attracted many researchers from different fields such as physics, mathematics, biology, and sociology. Besides distinctive properties that many networks seem to share, such as the small-world effect, right-skewed degree distributions, and network transitivity, community structure is another important property of complex networks [

In recent years, dynamic networks have attracted increasing interest because of their great potential for capturing natural and social phenomena over time [

Some recent studies have attempted to unify the processes of community extraction and evolution by using certain heuristics, such as regularizing temporal smoothness. This idea comes from a new kind of clustering concept called evolutionary clustering which has been proposed to capture the evolutionary process of clusters in temporal data [

Evolutionary clustering can be treated as a problem for evolutionary multiobjective optimization (EMO). Optimization problems with only one objective are called single-objective optimization problems, and those with more than one objective are called multiobjective optimization problems (MOPs). The main purpose of EMO is to solve multiobjective optimization problems by evolutionary computation, and it has become a hot topic in that area. By simultaneously optimizing two or more objectives, a multiobjective evolutionary algorithm (MOEA) obtains a set of solutions that takes all the objective functions into account. None of these solutions can be said to be better than the others; each corresponds to a tradeoff between the different objectives.
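The notion of a tradeoff set can be illustrated with a small sketch. The code below is only an illustration of Pareto dominance for maximized objectives, not part of the proposed algorithm; the function names and the (modularity, NMI) score pairs are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (modularity, NMI) scores of five candidate partitions.
candidates = [(0.40, 0.90), (0.55, 0.70), (0.50, 0.60), (0.30, 0.95), (0.55, 0.65)]
front = nondominated(candidates)
print(front)
```

Here the third and fifth candidates are dominated by the second one, so only three tradeoff solutions remain.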

Community detection in dynamic networks is a problem that can naturally be formulated with two contradictory objectives and consequently be solved by an MOEA. Nevertheless, how to make the best use of MOEAs to detect community structures in dynamic networks has not been fully investigated. Motivated by this, a decomposition-based MOEA for community detection in dynamic social networks (DYN-DMLS) is proposed. DYN-DMLS employs the framework of MOEA/D [

(a) This is the first application of the MOEA/D framework to detecting the community structure of dynamic networks. MOEA/D optimizes N scalar subproblems simultaneously instead of a single one. It has been shown to be effective in solving MOPs in many studies [

(b) Problem-specific genetic operators and a local search operator are designed for community detection in dynamic networks. The genetic operators use neighborhood information, which comes from the network topology, to enhance the performance of crossover and mutation. Because a gene mutates only among its neighboring alleles, each offspring is a meaningful code, and uniform crossover can therefore safely exchange genes between two meaningful parents. These operators significantly reduce unnecessary search effort. Moreover, label propagation [

(c) NMI and modularity work well as objective functions, as our experiments show. As mentioned in [

Experiments on computer-generated and real-world networks show the performance of our algorithm. Compared to the state-of-the-art algorithms, our algorithm has the ability to discover the community structure and its evolution more accurately.

The remainder of this paper is organized as follows: Section

Our work builds on MOEAs, so a brief introduction suffices here. Many well-known MOEAs have been proposed in recent years. For instance, NSGA-II [

Dynamic networks are the main topic of this paper. They can be analyzed from many aspects, for example, the tracing of communities, prediction, and evolution. In [

Existing methods for analyzing communities and their temporal evolution in dynamic networks can be divided into two classes. In the first class, communities and their evolution are studied separately: community structures are usually extracted independently at each time step, and their evolution is then examined in retrospect. In the second class, communities and their evolution are studied in a unified framework in which temporal smoothness is incorporated into the community analysis so that the detected community structure is more consistent over time.

The first method to detect dynamic network structures was proposed by Hopcroft et al. [

To overcome the fluctuation of two-stage approaches, the framework of evolutionary clustering was introduced by Chakrabarti et al. [

An evolutionary spectral clustering approach, proposed by Chi et al. [

DYN-MOGA [

Our DYN-DMLS is closely related to DYN-MOGA, so DYN-MOGA is included in the comparison. Because consensus clustering is a recent algorithm that represents the current level of this field, we also include it in the comparison.

A large body of research focuses on other properties of dynamic networks besides community structure. Ahmed and Karypis mined the evolution of conserved relational states from dynamic networks in a recent paper [

A dynamic network (DN) can be modeled as a sequence of graphs

Assuming

As mentioned above, community detection in dynamic networks is a problem which can naturally be formulated with two contradictory objectives. One objective is the community quality at the current time. The other objective is the temporal cost, which measures the distance between two community structures at consecutive time steps. In this paper, the framework of MOEA/D [

MOEA/D maintains a population

(i)

(ii)

(iii)

(iv)

(v) Uniform spread of weight vectors (

(vi) The maximum number of generations,

(1)

(2)

(3)

(4) If gen has reached gen_{max}, then go to (5). Otherwise, gen = gen + 1, go to (3).

(5)

To start MOEA/D, the decomposition of the original problem is needed. MOEA/D initially decomposes the MOP into

In order to solve each subproblem, the network at the first time step should provide a kind of community structure as the initialization to time smoothing. Because there is no history information at the first time step, the network can be clustered without time smoothing. Therefore, in Step 1 of Algorithm

Following our previous work in [

Illustration of the LAR. (a) A network modeled as a graph; (b) the LAR of one possible genotype; (c) the community structure of the genotype.

The locus-based adjacency encoding scheme has several major advantages for our task. Firstly, it is unnecessary to fix the number of communities in advance, as it is automatically determined in the decoding step, which is an important feature to address the real-world networks with no prior knowledge. Secondly, some standard crossover operators such as uniform, one-point, or two-point crossover can be employed in this representation, which effortlessly implements merging and splitting operations of communities on individuals and also maintains the remainder of the partitioning. Finally, the genetic representation contains all possibilities of connected subgraphs, which guarantees that a better community structure can be obtained by maximizing the modularity.
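The decoding step mentioned above can be sketched as follows. This is a minimal illustration, assuming 0-indexed nodes and a genotype in which gene i holds the index of one neighbor of node i; the function name `decode_lar` and the example genotype are hypothetical and are not taken from the figure. Communities are the connected components of the links (i, genotype[i]), found here with a small union-find.

```python
from typing import Dict, List


def decode_lar(genotype: List[int]) -> List[int]:
    """Return one community label per node: the connected components of the
    links (i, genotype[i]), relabeled 0, 1, 2, ... in order of appearance."""
    parent = list(range(len(genotype)))

    def find(x: int) -> int:
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genotype):
        # Merge the component of node j into the component of node i.
        parent[find(j)] = find(i)

    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(genotype))]


# Hypothetical genotype over 7 nodes: 0->1, 1->2, 2->0, 3->4, 4->3, 5->6, 6->5.
print(decode_lar([1, 2, 0, 4, 3, 6, 5]))
```

The number of communities (three in this example) is never given to the decoder; it falls out of the genotype, which is the first advantage listed above.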

As mentioned above, under the framework of temporal smoothness, we need two objective functions to quantitatively measure the quality of the communities and temporal cost, respectively. A quantitative definition, network modularity, proposed by Girvan and Newman [

The modularity criterion is based on the intuitive idea that a random graph does not exhibit cluster structure, whereas cluster structure, where present, is revealed by comparing the actual density of edges in a subgraph with the density one would expect in the subgraph if the vertices of the graph were attached regardless of community structure [

The second objective is NMI, which is a well-known entropy measure to evaluate how similar the community structure

In order to maintain the effective connections of the nodes in the child individual, uniform crossover is employed as the crossover operator in our method. Unlike one-point and two-point crossover, uniform crossover lets the parent chromosomes contribute at the gene level rather than the segment level and can generate any combination of alleles from the two parents [

Illustration of uniform crossover. A uniform crossover on parents

For the mutation, we adopt the neighbor-based mutation [

According to many studies [

Why can label propagation serve as the local search strategy here? Firstly, it is localized and fast because it uses neighborhood information. Given a partition, the interaction between members’ labels yields a membership adjustment that may lead to some improvement. Label propagation imitates the process of communication in the real world, which is consistent with the definition of community. Secondly, although label propagation has low stability, it can still contribute to the optimization process. Because local search is a trial for better solutions, failure is acceptable, and the effect of label propagation shows over the iterations as it is applied repeatedly. Thirdly, because MOEA/D works well in the global search, the input to the local search is usually already of high quality, which improves the stability of label propagation to some degree. The flow of local search is shown in Algorithm

Otherwise, select the

go to

find the label of community which most neighborhood nodes belong to.

And then select one from these nodes to replace the

by the corresponding value.

corresponding single-objective subproblem. If its value is greater than

that before local search, replace the current individual by the new one, go to

otherwise, go to

According to the feature of the MOEA/D, each subproblem is a single-objective optimization problem. Therefore, a better solution can be obtained by a local search procedure in optimizing corresponding single-objective problem.
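One step of such a neighbor-based local search can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' exact procedure: individuals use the locus-based adjacency representation with 0-indexed nodes, the helper names are hypothetical, and the number of intra-community edges stands in for the scalar subproblem value, which is not reproduced here.

```python
import random
from collections import Counter
from typing import Callable, Dict, List


def decode(genotype: List[int]) -> List[int]:
    """Community label per node: components of the links (i, genotype[i])."""
    parent = list(range(len(genotype)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in enumerate(genotype):
        parent[find(j)] = find(i)
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(genotype))]


def local_search_step(genotype: List[int], adjacency: Dict[int, List[int]], node: int,
                      objective: Callable[[List[int]], float],
                      rng: random.Random) -> List[int]:
    """Relink `node` to a neighbor carrying the majority label of its
    neighborhood; keep the change only if the objective value increases."""
    labels = decode(genotype)
    majority = Counter(labels[v] for v in adjacency[node]).most_common(1)[0][0]
    if labels[node] == majority:
        return genotype  # the node already agrees with most of its neighbors
    trial = list(genotype)
    trial[node] = rng.choice([v for v in adjacency[node] if labels[v] == majority])
    return trial if objective(trial) > objective(genotype) else genotype


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}


def intra_edges(genotype: List[int]) -> float:
    """Stand-in objective (assumption): number of edges inside communities."""
    labels = decode(genotype)
    return sum(1 for u in adjacency for v in adjacency[u]
               if u < v and labels[u] == labels[v])


start = [1, 0, 3, 4, 5, 3]  # node 2 is initially linked to node 3
improved = local_search_step(start, adjacency, 2, intra_edges, random.Random(0))
print(decode(start), decode(improved))
```

In the toy run, node 2 starts in the community of node 3 although two of its three neighbors belong to the other community; relinking it raises the stand-in objective, so the new individual is kept.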

In order to use the prior knowledge about relations between nodes, the local search strategy is based on the neighbor nodes. There is an obvious intuition that a node tends to be in the same community with most of its neighbors. In other words, if most of a node’s neighbors are in the

MOEA/D decomposes the original problem into several subproblems. Each subproblem is a single-objective optimization problem and provides one solution at the end of each time step. Subproblems exchange information within their neighborhoods. Although each subproblem is single-objective, the dominance relationship is still implied: a solution that is nondominated with respect to both its own subpopulation and its neighborhood is retained, and these solutions form the nondominated solution set. The decomposition of the original problem supports the diversity of the MOEA, while the dominance relationship within subpopulations and neighborhoods pushes the solution set toward the Pareto front. In theory, the front contains all the nondominated solutions, but in practice this is hard to achieve. Because optimizing modularity and NMI is NP-hard, we cannot tell whether the solutions generated by MOEA/D within a limited number of generations are optimal. MOEA/D can only approach the Pareto front.
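The decomposition can be made concrete with a small sketch for the two-objective case. The code below generates uniformly spread weight vectors and, for each subproblem, the indices of its closest weight vectors, which is how MOEA/D defines neighborhoods. The weighted-sum scalarization is used purely for illustration and is an assumption; MOEA/D also supports other scalarizations such as the Tchebycheff approach, and all names below are hypothetical.

```python
from typing import List, Sequence, Tuple


def uniform_weights(n: int) -> List[Tuple[float, float]]:
    """n evenly spaced weight vectors (w1, w2) with w1 + w2 = 1 (n >= 2)."""
    return [(i / (n - 1), 1 - i / (n - 1)) for i in range(n)]


def neighborhoods(weights: List[Tuple[float, float]], t: int) -> List[List[int]]:
    """For each weight vector, the indices of its t closest weight vectors
    by Euclidean distance (the vector itself comes first)."""
    result = []
    for w in weights:
        order = sorted(
            range(len(weights)),
            key=lambda j: (w[0] - weights[j][0]) ** 2 + (w[1] - weights[j][1]) ** 2,
        )
        result.append(order[:t])
    return result


def weighted_sum(objectives: Sequence[float], weight: Sequence[float]) -> float:
    """Scalar value of one subproblem (assumed weighted-sum form, maximized)."""
    return sum(w * f for w, f in zip(weight, objectives))


weights = uniform_weights(5)
neighbors = neighborhoods(weights, 3)
# Hypothetical (modularity, NMI) of one individual scored on the middle subproblem.
print(neighbors[2], weighted_sum((0.6, 0.8), weights[2]))
```

Each subproblem only mates with and updates the individuals indexed by its neighbor list, which is the information exchange described above.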

In this paper, modularity density [

In this section, we evaluate the effectiveness of the proposed decomposition-based multiobjective evolutionary algorithm with local search for community detection in dynamic networks (DYN-DMLS for short) on two synthetic networks and three real-world networks. The compared algorithms include DYN-MOGA [

As to the performance metric, in the case that we have the ground truth for each time step, we directly adopt a similarity measure, normalized mutual information (NMI) [

The experiments are performed on an Intel Core2 Duo CPU machine with 1.98 GHz and 1.99 GB RAM. The parameter settings are as follows. The population size

Lancichinetti and Fortunato’s consensus clustering needs a static algorithm as the base. In this study, we choose consensus clustering with label propagation method (LPM) [

Note that it is hard to determine which ground truth each consensus result corresponds to, because several time steps are processed together to produce one result representing the general state of the network structure during the time window. To make the comparison clear, we adopt a strategy for assigning a ground truth to each consensus clustering result: the consensus algorithm is run in a way similar to DYN-DMLS. In DYN-DMLS, the result at each time step is determined mainly by two factors, the current network and the previous one. Similarly, for the consensus clustering algorithm we took

In order to evaluate the ability of our approach to successfully detect the community structures for dynamic networks, we use benchmark datasets. Benchmark networks take

Two kinds of benchmarks are involved. The first is the GN benchmark [

The second is modified by Kim and Han [

The compared algorithms include DYN-DMEA, which is the version of DYN-DMLS without local search, DYN-MOGA, consensus clustering with OSLOM (consensus-OSLOM-2), and consensus clustering with LPM (consensus-LPM-2). As consensus clustering [

In this study, we generate the datasets under four different levels by setting

Figure

NMI results of the five algorithms on the SYN-FIX dataset with

Figure

The box plots to illustrate the distribution of NMI at each time step when

Figure

NMI results of the five algorithms on the SYN-FIX dataset with

Figure

The box plots to illustrate the distribution of NMI at each time step when

Figure

NMI results of the five algorithms on the SYN-FIX dataset with

Figure

The box plots to illustrate the distribution of NMI at each time step when

Figure

NMI results of the three algorithms on SYN-FIX dataset with

Figure

The box plot to illustrate the distribution of NMI at each time step when

In this study, we generate the SYN-VAR datasets under four different levels by setting

Figure

NMI results of the five algorithms on the SYN-VAR dataset with

Figure

The box plots to illustrate the distribution of NMI at each time step on SYN-VAR when

In SYN-VAR, consecutive time steps are less similar, which affects the MOEAs more than consensus clustering. The merging and splitting of communities directly lowers the value of the NMI objective. When this objective is low, the number of plausible solutions increases, which generally leads to a larger search space and therefore to large fluctuations in the box plots. For the consensus algorithm, the deviation between two consecutive time steps affects neither the separate static run at each time step nor the consensus process. In general, however, consensus clustering does not obtain a satisfactory result.

Figure

NMI results of the five algorithms on the SYN-VAR dataset with

NMI results of the five algorithms on the SYN-VAR dataset with

The box plots to illustrate the NMI distribution of DYN-DMLS at each time step on SYN-VAR when

Overall considering the experimental results on synthetic networks, we can conclude the following.

On most of the tested synthetic networks, DYN-DMLS performs best. Without the local search process, DYN-DMEA consistently scores slightly lower than DYN-DMLS, which indicates that the local search is effective. DYN-MOGA, in contrast, performs poorly. These results support the design of our method.

Consensus clustering is good at finding the part that a set of structures has in common, and its result may be meaningful, but under our evaluation metric its performance is not as good. Consensus clustering converges strongly: as the box plots show, repeated runs give the same result. Because of this strong convergence and the stable results of the static algorithms, the results may be coarse and have no chance of being improved. The most important question is which structure should be regarded as the corresponding ground truth for comparison, so evaluating consensus clustering on dynamic networks is difficult.

In this section, we present experimental studies on three real-life datasets: the football network dataset (

The football network dataset is the National Collegiate Athletic Association (NCAA) Football Division 1-A Schedule, which has been used by Newman and Girvan [

Note that the football dataset is dynamic. Although the ground truth is invariant, the network itself keeps changing because the edges between nodes change. That the division is the same at every timestamp is not given as prior knowledge; instead, it acts as a kind of strong time smoothing information. If the previous result is accurate enough, time smoothing contributes more to obtaining a satisfactory result for the current time step. This does not contradict the dynamic assumption; it is simply one possible situation.

Figure

For visual analysis, the communities found by our algorithm DYN-DMLS on the football network for the year 2009 are shown in Figure

The box plots to illustrate the NMI distribution of consensus-OSLOM-2 at each time step on SYN-VAR when

NMI results of the football dataset.

The communities found by DYN-DMLS on the football network for the year 2009.

As can be seen from Figure

The VAST dataset is a challenge task from IEEE VAST 2008, whose primary task is to characterize the Catalano/Vidro social network based on the cell phone call data provided and to characterize the temporal changes in the social structure over the 10-day period.

This dataset consists of information about 9834 calls between 400 cellphones over a 10-day period in June 2006 in the Isla Del Sueño. It includes records with the following fields: identifier for caller, identifier for receiver, time, duration, and call origination cell tower. In order to detect the communication patterns, we construct call graphs based on the call records. In order to evaluate our algorithm better, we convert the input social network and the corresponding dynamic graph into 5 snapshot graphs, where the graphs in every two days are aggregated into one snapshot graph and therefore we have 5 snapshot graphs over 10 days.
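The aggregation into two-day snapshot graphs can be sketched as follows. This is a minimal illustration, assuming each call record has been reduced to a hypothetical (caller, receiver, day) tuple with the day numbered from 1 to 10, and assuming undirected, unweighted call graphs; the actual record layout and any edge weighting are not specified here.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def build_snapshots(calls: Iterable[Tuple[int, int, int]],
                    days_per_snapshot: int = 2,
                    num_days: int = 10) -> List[Set[Edge]]:
    """Group call records into consecutive windows of `days_per_snapshot`
    days and return one undirected edge set per window."""
    count = (num_days + days_per_snapshot - 1) // days_per_snapshot
    snapshots: List[Set[Edge]] = [set() for _ in range(count)]
    for caller, receiver, day in calls:
        if caller == receiver:
            continue  # ignore self-calls
        index = (day - 1) // days_per_snapshot
        # Store each edge with the smaller identifier first so that
        # (a, b) and (b, a) map to the same undirected edge.
        snapshots[index].add((min(caller, receiver), max(caller, receiver)))
    return snapshots


# Hypothetical records: (caller id, receiver id, day of the 10-day period).
calls = [(1, 2, 1), (2, 1, 2), (3, 4, 3), (1, 3, 10)]
snapshots = build_snapshots(calls)
print(len(snapshots), snapshots[0], snapshots[1], snapshots[4])
```

With the default arguments, days 1-2 form the first snapshot, days 3-4 the second, and so on, giving the 5 snapshot graphs described above.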

Note that the dataset records phone calls over 10 days. A phone call is a temporary connection, so within a short time window the community structure can be unclear. Two members of the same community may call each other every day, but it is also common for them to exchange only a few calls in the 10 days, so the relationship cannot be observed at every moment. Time smoothing can handle this kind of hidden information. Because NMI is a statistical measure, a small amount of missing information does not affect the result much. Therefore, time smoothing makes sense at the macroscopic level.

No a priori knowledge about the ground truth of the cellphone network is available. Here, we only discover the community structure in the network to evaluate the performance of our algorithm, rather than performing the contest task that is the goal of Mini Challenge 3.

As a challenge task from IEEE VAST 2008, this dataset has been analyzed by many researchers. It has been confirmed that the structure of the cellphone network changes drastically from the 7th day to the 8th day [

The main community structure of VAST found by DYN-DMLS at time step 3 and at time step 5. (a) The main community structure at time step 3; (b) the main community structure at time step 5.

DBLP Coauthorship dataset is obtained from DBLP database, which has been described in [

The authors corresponding to the nodes in the snapshot graphs.

1 “P. S. Yu” | 2 “C. C. Aggarwal” | 3 “M. S. Chen” | 4 “W. Fan” | 5 “B. Gedik” | 6 “J. Han” |

7 “B. Liu” | 8 “L. Liu” | 9 “J. Pei” | 10 “H.X. Wang” | 11 “K. Wang” | 12 “K. L. Wu” |

13 “Y. Xu” | 14 “X. F. Yan” | 15 “Z. F Zhang” | 16 “J.Y. Wang” | 17 “C. C. Chen” | 18 “H. L. Chen” |

19 “M. C. Chen” | 20 “W. T. Chen” | 21 “Y. H. Chu” | 22 “K.T. Chuang” | 23 “J. M. Ho” | 24 “J.H. Hsiao” |

25 “C. M. Hsu” | 26 “J. W. Huang” | 27 “H. P. Hung” | 28 “K. H. Liu” | 29 “W. G. Teng” | 30 “C. Y. Tseng” |

31 “M. Y. Yeh” | 32 “K. Zhang” | 33 “L. Liu” | 34 “D. Cai” | 35 “C. Chen” | 36 “H. Gonzalez” |

37 “X. F. He” | 38 “S. K. Kim” | 39 “X. L. Li” | 40 “H.Y. Liu” | 41 “Q. Z. Mei” | 42 “Z. Shao” |

43 “D. Xin” | 44 “X. X. Yin” | 45 “C. X. Zhai” | 46 “F.D. Zhu” | 47 “N. J.” | 48 “X. l. Li” |

49 “J. Caverlee” | 50 “K. K. Chen” | 51 “A. Iyengar” | 52 “C. Pu” | 53 “A. Singh” | 54 “M. Srivatsa” |

55 “J. Yin” | 56 “A. W. C. Fu” | 57 “D. X. Jiang” | 58 “X.M. Lin” | 59 “Y. F. Tao” | 60 “R. C. W. Wong” |

61 “X. K. Xiao” | 62 “X. m. Lin” | 63 “X. F Meng” | 64 “C. Zaniolo” | 65 “B. C. M. Fung” | 66 “E. P. Lim” |

67 “H. W. Lauw” | 68 “J.N. K. Liu” | 69 “W. M. Ma” | 70 “R. She” |

In this experiment, we apply our DYN-DMLS algorithm to analyze this dynamic network. Firstly, we detect the communities in the snapshot graph at the first time step (2005-2006) without smooth evolution by employing the memetic community detection algorithm (Meme-Net) [

(a) The community partition on snapshot graph at first timestamp (2005-2006) without smooth evolution; (b) the community partition on snapshot graph at second timestamp (2007-2008) with smooth evolution; (c) the community partition on snapshot graph at third timestamp (2009-2010) with smooth evolution.

Then, the DYN-DMLS algorithm is employed to detect the communities in the snapshot graphs at the other two time steps. Using the solution of our algorithm, we analyze how some individual authors’ community membership changes over time. Figure

From Figure

In our approach based on evolutionary clustering, NMI between consecutive time steps is used to represent time smoothing. Time smoothing assumes that two consecutive time steps are structurally related: their community structures may not be identical, but they are at least more similar than two unrelated structures. In evolutionary clustering, this corresponds to the history cost (modularity corresponds to the snapshot quality). NMI measures the similarity: the higher the NMI, the more similar the two consecutive time steps and the smaller the history cost. In the multiobjective optimization process, a solution that has high NMI but low modularity is retained in the population and has a better chance of evolving into a proper solution than one with both low NMI and low modularity. Therefore, NMI can work as expected.
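The two objectives discussed here can be sketched in a few lines. The code below uses the standard Newman-Girvan form of modularity and the usual entropy-normalized form of NMI, NMI(A, B) = 2 I(A; B) / (H(A) + H(B)); it is an illustration under these assumptions rather than a reproduction of the paper's equations, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def modularity(edges: Sequence[Tuple[int, int]], labels: Sequence[int]) -> float:
    """Snapshot quality: sum over communities of l_c / m - (d_c / 2m)^2, where
    l_c is the number of internal edges, d_c the total degree, m = |edges|."""
    m = len(edges)
    inside: Counter = Counter()
    degree: Counter = Counter()
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            inside[labels[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def nmi(a: Sequence[int], b: Sequence[int]) -> float:
    """Similarity of two partitions of the same node set, in [0, 1]."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual = sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
                 for (x, y), c in joint.items())
    entropy_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual / (entropy_a + entropy_b)


# Toy snapshot at time t: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
previous = [0, 0, 0, 1, 1, 1]   # partition found at time t - 1
candidate = [1, 1, 1, 0, 0, 0]  # same grouping at time t, different label names
print(modularity(edges, candidate), nmi(previous, candidate))
```

Because NMI ignores label names, the candidate above has NMI equal to 1 with the previous partition, that is, zero history cost, while its modularity is 5/14.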

The experiment compares results of the two approaches on the benchmark network with

When

NMI results on the SYN-FIX with

Though the result of the first time step has an influence on the subsequent process, the influence is rather limited and diminishing. When calculating objective function NMI, the result is directly determined by the two consecutive time steps, the current one and the previous one. The

From another point of view, multiobjective optimization is a global stochastic search process, and each step may generate different results in different runs. Time symmetry may therefore be a property of the dynamic network, but it is hard to maintain in multiobjective optimization.

To support the above description, we test the dynamic network in an inverted time sequence and compare its result with the common one. Strictly speaking, it is not symmetric. The result is shown in Figure

NMI results of the two sequences on the SYN-FIX with

The detection of communities and the analysis of community evolution in dynamic networks with temporal smoothness is a new and challenging research problem with broad applications. In this paper, the two cost functions, the community quality function and the temporal cost function, are optimized simultaneously by a decomposition-based multiobjective evolutionary algorithm with local search. The method provides solutions representing the best tradeoff between the accuracy of the community structures obtained and the similarity between one time step and the previous one, without fixing a weight parameter in advance. In addition, a local search operator based on problem-specific knowledge is incorporated into our method, which improves its search ability, especially when the community structure changes more dramatically over time. Experiments on the SYN-VAR benchmark demonstrate that the proposed algorithm extracts communities and captures community evolution more accurately than the classic DYN-MOGA and the consensus clustering algorithm. In future work, we will extend our algorithm to handle large-scale real-life networks. Better local search strategies should also be studied and incorporated into our method to improve its performance further.

The authors declare that there is no conflict of interest regarding the publication of this paper.

This work was supported by the National Natural Science Foundation of China (Grant nos. 61273317, 61202176, and 61203303), the National Top Youth Talents Program of China, the Specialized Research Fund for the Doctoral Program of Higher Education (Grant no. 20130203110011), and the Fundamental Research Fund for the Central Universities (Grant nos. K50510020001 and K5051202053).