A Constrained Power Method for Community Detection in Complex Networks

For an undirected complex networkmade upwith vertices and edges, we developed a fast computing algorithm that divides vertices into different groups by maximizing the standard “modularity” measure of the resulting partitions. The algorithm is based on a simple constrained power method which maximizes a quadratic objective function while satisfying given linear constraints. We evaluated the performance of the algorithm and compared it with a number of state-of-the-art solutions. The new algorithm reported both high optimization quality and fast running speed, and thus it provided a practical tool for community detection and network structure analysis.


Introduction
Many real-world complex systems can be formulated as networks, with entities represented by vertices and relationships represented by edges.With wide important applications in world wide web, social networks, biological networks, communication networks, and so forth, research on networks has attracted significant recent work in different fields, including computing sciences, applied mathematics, theoretical physics, biological sciences, and social sciences, as well as engineering disciplines [1,2].
Among various aspects of networks that need to be investigated, we are particularly interested in the automatic detection of community structures therein.That is, "how many communities are there and what are the memberships" for each vertex [3].Such community structures are commonly believed to exist in networks: vertices tend to cluster in groups with dense edge connections while the edge connections are sparser between vertices from different groups.This phenomenon is analog to the human being's society where people fall into different groups based on mutual connection.The ability to find and analyze such groups has proved to be invaluable in understanding many important properties of real-world networks [1,2].
To automate the process of community detection in complex networks, a well-known technique is often adopted through maximizing the "modularity" measure proposed by Girvan and Newman [4].This model quantifies the quality of a particular vertex partition of a network with a single numeric value.People have shown via a variety of both simulated and empirical evaluations that larger modularity values are correlated with better vertex grouping qualities, and therefore maximizing the value provides a principled way in detecting the communities of a complex network.
Unfortunately, maximizing the modularity value for a given network is usually computationally hard and involves prohibitive complexity even for small-sized networks.Thus, approximation methods have to be sought to ensure the tractability of the solutions.In this paper, we propose a spectral relaxation-based method which reports both high optimization quality and fast running speed.
This paper is organized as follows.Firstly, we briefly review three topics which are essential to the development of our work: the modularity model, the power method, and the projected power method.Secondly, we propose a spectral method for fast modularity optimization based on the projected power method.Finally, we show our evaluation results with improved results, and draw the conclusion.

Modularity Model.
Modularity quantifies the quality of a given division of a network into communities.Good divisions, which have high modularity values, are those with dense edge connections between the vertices within a community but sparse connections between vertices in different communities.Consider an undirected graph = {, }, where  = {V 1 , . . ., V  } is a set of vertices and  is a set of edges connecting pairs of vertices.Let   give the number of edges (or weights) between two vertices V  and V  .We further denote   = ∑    as the degree of vertex V  and  = (1/2) ∑    as the number of edges in the graph.
For a candidate partition of the vertices into communities, the modularity is defined to be the portion of the edge connections within the same community minus the expected portion if the connections were distributed randomly.Assuming that the degree   associated with each vertex V  is preserved, under uniform random selection the expected number of edges between V  and V  is     /2.Thus, the observed number minus the expected number is   =   −     /2.Summing over all vertex pairs within the same group, the modularity, denoted by , is given by [4] where   denotes the group membership of V  and (  ,   ) is one if   =   and zero otherwise.The value of  lies in the range [−1, 1].It is positive when the observed connections within the same group exceed the expected, and negative otherwise.Given a larger than expected portion of connections, it is natural to infer the presence of an underlying group structure.One can search for community structures precisely by looking for graph divisions that have large modularity values.
Unlike most other clustering models, which require a prior setting of partition numbers or group sizes [5], maximizing the modularity score gives the partition number and the group size automatically without manual intervention.This measure also allows the possibility that no good division of a network exists, corresponding to the case that the modularity value is zero and cannot be increased by any division of vertices.

Power Method.
Given a matrix , the classical power method (also known as the Von Mises iteration) computes the dominant eigenvalue (the eigenvalue with the largest absolute value) and its associated dominant eigenvector very efficiently [6].Starting with a vector V 0 , which may be an approximation to the dominant eigenvector or a random vector, the method iteratively computes That is, at each iteration, the vector V  is multiplied by the matrix  and then normalized.The sequence of vectors V  ( = 0, 1, . ..) converges to the dominant eigenvector under certain assumptions which are often trivial to assume in practice.
When the matrix  is positive definite, all eigenvalues are positive.The dominant eigenvector is also the maximizer to a quadratic optimization problem max V V  V subject to : ‖V‖ = 1. (3) Applying the power method actually provides a fast solution to this optimization problem.

Projected Power Method.
For a positive definite matrix , we study a general constrained quadratic optimization problem Due to the existence of the linear constraints (expressed by V = ) on the solution vector, this problem cannot be solved by the standard power method any more.
In previous work, we developed a fast projected power method to solve this constrained optimization problem [7,8].Given the current estimate V  , we stretch it by multiplying with  and project it to the hyperplane V = .After renormalization, we have the next estimate update rule can be expressed as

A Spectral Relaxation Method.
Maximizing  has been proved NP-hard.The required computation for optimal solution grows exponentially with the size increase of the network [9].Therefore, researchers have sought approximate solutions to ensure the tractability when the network grows large [3,4,[10][11][12][13][14][15].In practice, a spectral relaxation method is widely used, which runs fast and obtains reasonable empirical optimization qualities [3].
The standard spectral relaxation method starts from a simple case where the vertices are divided into just two groups.One can define   = ±1 to indicate the group membership of V  , yielding where  is a modularity matrix with elements   (,  = 1, . . ., ) and  is a column vector with elements   ( = 1, . . ., ).
The vector  can be expressed as a linear combination of the normalized eigenvectors   of the modularity matrix , so that  = ∑      with   =    .Then, one obtains where  is the eigenvalue of  associated with the eigenvector   .Assume that the eigenvalues are labeled nonincreasingly,  1 ≥ ⋅ ⋅ ⋅ ≥   .Maximizing  requires the assignment vector  to concentrate as much weight as possible in the terms involving the leading (largest algebraic) eigenvalue.If  were unconstrained, this could be achieved by simply setting  proportional to the leading eigenvector  1 .But with the binary "±1" constraints,  cannot be chosen freely, which makes the optimization process difficult.
Fortunately, there is a convenient approximation heuristic available, which we refer to as conventional rounding.Ignoring the inconvenient fact that it is not possible to make  perfectly parallel to  1 , one simply divides the vertices into two groups according to the signs of each element of  1 .That is, where  1 denotes the th element of  1 .Although this heuristic is straightforward, it has often been found to give good results in practice.Now, the problem is to seek the leading eigenvector  1 .In fact, the modularity matrix  has a special structure that can be exploited to efficiently compute the leading eigenvector  1 via the classical power method very efficiently.As  is generally not positive definite, and we study the matrix + instead, where  is the identity matrix.As long as we choose a sufficiently large  > |  |, the matrix + is positive definite and has the same set of eigenvectors as .We compute the dominant eigenvector by the power method for this positive definite matrix, which turns out to be exactly the same as 's leading eigenvector.
This simple two-way partition method can be extended to multiway partitions recursively.That is, using successive twoway partitions that divide the networks into subnetworks, the division process is continued on each subnetwork until no further increases in  can be found.
This spectral method gives a broad partition of the network vertices.In practice, the grouping results need to be further refined by locally exchanging vertices between different groups using the Kernighan-Lin algorithm [16].Given two groups of vertices, the refinement proceeds as follows.Successively, find the vertex that, when moved to the other group, obtains the largest increase in , or the smallest decrease if no increase exists.Repeatedly, make such moves, ensuring that each vertex is moved only once.When all vertices have been moved, search all intermediate states to find the division that obtained the largest  value.Starting again from this state, repeat the exchange process, until no further improvement is possible for  value.
As reported in [3], this combination gives good results on many open networks and has become a standard community detection method for small-to medium-sized networks.
For even larger networks, unfortunately, this method is still not applicable.The major difficulty lies in the refinement process.Although computing the leading eigenvector (by power method) is very fast, exchange heuristics is much slower.The complexity of the refinement can often be ( 3 ) for a network with  vertices, which is not feasible for large networks.It is highly desired if we can find a spectral method that avoids the refinement process.

A Constrained Power
Method.Note that the conventional rounding strategy is based on the signs of the leading eigenvector elements, regardless of their magnitudes (absolute values).However, the elements with different magnitudes contribute differently to , and therefore affect the confidence in the rounding decisions.For example, if  1 has a large magnitude, then   will have a significant influence on    1  in the objective, and we would be more confident in inferring its rounded value.However, if the magnitude is small,   's contribution to the objective is less evident, and one would be less confident to make the rounding decision.In this latter case, we would like to postpone the rounding decision to a later phase.
Based on this idea, we propose a successive rounding method that only rounds variables with top magnitudes.That is, unlike conventional rounding that makes the entire partition decision in a single batch, we propose to recover more accurate community structure incrementally, using a strategy that we will refer to as iterative rounding.
At the beginning, we have the same problem, max    , as the conventional spectral method in Section 3.1.We simply use the power method and get the leading eigenvector.Then, rather than deploying conventional rounding, we only round the elements with sufficiently large magnitudes, or a portion of elements with the largest magnitudes, into binary decisions.
After the first iteration, we left a residual problem to solve.Now, suppose  = (  1  2 ), where  1 denotes the rounded elements that are held fixed (as either "+1" or "−1") and  2 contains the unrounded elements yet to be set.Our problem now becomes max where  1 denotes the th element of  1 and the index  traverses all elements that have been fixed.This is a constrained quadratic optimization problem that can be solved by our projected power method in Section 2.3.Actually, by exploiting the special structure of this problem, the iterative update rule of  2 can be further simplified as where  11 ,  12 ,  21 , and  22 are four submatrices of the modularity matrix .
We call the proposed method constrained power method.This update occurs in a similar manner as in the classical power method.When  21  1 = 0, the constrained power method reduces exactly to the power method.

Mathematical Problems in Engineering
Here, we omit the detailed derivation of the constrained power method from the projected power method.Instead, we can explain this update rule from a more intuitive viewpoint.
Our objective is to maximize ( with respect to  2 subject to certain normalization constraint.We apply gradient-based method to maximize  by iteratively updating  2 along the gradient direction and renormalizing it.The convergence is guaranteed and achieved when the gradient ∇ is parallel to the current  2 , or ∇ = 2 2 , where  is a scalar number.Then, we have  2 = ( 22  2 +  21  1 )/.We can force  = ‖ 22  2 +  21  1 ‖ considering the normalization constraint on  2 , which reveals exactly the same update rule as in our constrained power method.
Given the relaxed solution of  2 produced by the constrained power method, we again only round those elements in  2 that have large magnitude.After this partial rounding, we left another residual problem sharing the same structure.The constrained power method and iterative rounding are applied successively for each residual problem.The process is terminated once  2 becomes empty; hence, all vertex grouping decisions have been made.
As can be seen from our evaluation results, by applying this iterative rounding strategy, it is no longer needed to apply the time-consuming local exchange heuristic for refinement while still with comparable, if not improved, optimization quality.And the computational time is highly reduced.
Similar to the conventional spectral method, this iterative rounding-based spectral method can also be extended to multiway partitions trivially and we omit the discussion.

Evaluations
We systematically evaluated the proposed iterative roundingbased spectral method (IR) and compared its performance with the conventional rounding-based spectral method (CR), simulated annealing (SA) [15], the linear programming method (LP) [10], and a multilevel optimization method (ML) [14], which have achieved state-of-the-art optimization quality in different applications.In a series of experiments we observed the following: (i) when comparing spectral method with conventional rounding, our proposed method reports significantly improved optimization quality, while only with reasonable computational overhead; (ii) when comparing with other state-of-the-art methods, our proposed method reports comparable accuracy, but with significantly improved running speed.

Comparison with Conventional
Rounding.The evaluations were conducted on all fourteen networks contained in a standard benchmark collection from Newman's personal website.These benchmark networks cover a variety of application areas and are briefly described in Table 1.The size of To compare conventional rounding and iterative rounding-based spectral methods, we collected two-way partition results and multiway partition results (with and without exchange heuristics).In two-way partitions, we divided each network into two partitions and checked the  value obtained by CR and IR, respectively.In multiway partitions, we experimented the case where the two-way partitions are repeated until no further increase of  is possible.We recorded the results both with refinement by exchange heuristics and without refinement.
Table 2 lists the results of CR and IR on different networks.Each item has three  values (, , and ), where  is for two-way partitions,  is for multiway partitions, and  is for multiway partitions refined by exchange heuristics.For networks with over 10, 000 vertices, we were unable to obtain the results on our computer in the running time limit (ten hours), and thus were omitted.
From the results, we can see that the effectiveness of the new rounding strategy is fully demonstrated.Our proposed method often reports significantly improved results over the conventional rounding method.On two-way partitions, for small networks, both methods reported comparable  values and the improvement from iterative rounding is not large.On the smallest "karate" network with 34 vertices, the improvement is only slight, from 0.371 to 0.372.However, on larger networks, the improvements are much more significant.One example is on the "hepth" network with 7, 610 vertices, where the conventional rounding result is 0.034 while the iterative rounding achieves a modularity score of 0.455.
On multiway partitions, similar to the two-way partition case, the improvement is particularly significant on large networks.For example, on the "karate" network, the increase is only from 0.393 to 0.417, while on the larger "hepth" network, it becomes 0.739 to 0.829; and on the largest "internet" network, the improvement is from 0.419 to 0.620.
On refined multi-way partitions, both methods successfully detected the partition structure with  = 0.420 which is known to be optimal.On "adjnoun" network, the two methods obtained equivalent results after refinement.On all the other nine networks where the exchange heuristic was applicable, iterative rounding achieved improved  values.
Not like conventional rounding, iterative rounding benefits much less from exchange heuristics.On the other hand, on eight out of eleven networks, the iterative rounding's multiway results alone (without refinement) are better than the conventional rounding's results with refinement, further exhibiting the effectiveness of iterative rounding.For large networks, we can apply the iterative rounding-based spectral method, avoiding the expensive refinement process, and we can still expect to have good partition quality.

Comparison with Other
Methods.Besides conventional rounding-based spectral method, we also compared our method with a number of recent network partition algorithms that achieved state-of-the-art optimization quality.These algorithms include simulated annealing (SA) [15], linear programming relaxation (LP) [10], and multilevel local search (ML) [14].When applicable, we also list the solutions of a number of networks that are already known to be optimal in literature [10].
The results are listed in Table 3. Comparing with the optimal results, we can see that, out of six networks with known optimal values, our method successfully obtained five optimal values, with only one result on "dolphins" network slightly lower than the optimal value.The other three algorithms also had similar performances on these six small networks.
On large networks with more than 1, 000 vertices, we were unable to finish the experiments of simulated annealing and linear programming relaxation due to their prohibitive computational requirements and only compared the results with the multilevel optimization algorithm.On most of the networks, the new method reported improved results.Only on "netsci" network, the multilevel optimization method found better solutions than ours.

Running Time.
We recorded the running time of the two spectral methods (multiway without refinement) and the multilevel optimization method on networks with more than 1,000 vertices.The experiments were carried out under MAT-LAB environment running on an Intel Xeon workstation with 32 GB RAM.The results are shown in Figure 1.From these results, we can see that the two spectral methods run fast and both solved the largest network ("internet") partition problem within two minutes.Comparatively, iterative rounding is approximately 3 to 8 times slower than conventional rounding, given the additional time required to solve the residual problems.Considering the significantly improved quality of network partitions brought by iterative rounding, especially on large networks, this is an acceptable cost for practical applications.
When comparing with the multilevel optimization method, the spectral methods exhibited their significant improvement in running speed.On the largest "internet" network, both methods are more than ten times quicker.

Conclusion
Complex network analysis has attracted significant research attention recently.This paper investigates the community detection problem in complex network analysis.Our work proposed a fast iterative rounding-based spectral method to maximize the modularity value of network community partitions.Comparing with state-of-the-art solutions, the new method reported improved results in both optimization quality and running time.
The key to our work is the development of a constrained power method.The method is a special instance of the projected power method and optimizes a certain type of constrained quadratic functions effectively with guaranteed convergence.Besides community detection and modularity Mathematical Problems in Engineering maximization, we anticipate the method to be useful in a broader area of applications.
Our current work mainly focuses on the undirected networks.Actually, the proposed algorithm can be modified easily for community detection problem in directed and other networks [23].An empirical evaluation along this line will be part of our future work.
Considering the wide applications of the modularity model, the proposed method is also expected to be applied in other problems related to complex network analysis, for example, in high dimensional data analysis which was revealed by our most recent work [24,25].

Figure 1 :
Figure 1: Running time on different networks.

Table 1 :
Networks used in evaluations.

Table 2 :
Modularity values by spectral methods.

Table 3 :
Modularity values by different methods.