Beyond the Expanders

Expander graphs are widely used in communication problems and in the construction of error-correcting codes. In such graphs, information gets through very quickly. This is typically not true for social or biological networks, though we may find a partition of the vertices such that the induced subgraphs on the parts and the bipartite subgraphs between any pair of them exhibit regular behavior of information flow within or between the subsets. Implications between spectral and regularity properties are discussed.


Introduction
We want to go beyond the expander graphs that - for four decades - have played an important role in communication networks; for a summary, see e.g., Chung [8] and Hoory et al. [14]. Roughly speaking, the expansion property means that each subset of the graph's vertices has "many" neighbors (combinatorial view), and hence, information gets through such a graph very "quickly" (probabilistic view). We will not give exact definitions of expanders here, as those contain many parameters which are not used later. We rather refer to the spectral and random walk characterizations of such graphs, as discussed, among others, by Alon [1], and Meila and Shi [17]. The general framework of an edge-weighted graph will be used. Expanders have a spectral gap bounded away from zero, where - for a connected graph - this gap is defined as the minimum distance between the normalized Laplacian spectrum (apart from the trivial zero eigenvalue) and the endpoints of the interval [0, 2], the possible range of the spectrum. The larger the spectral gap, the more our graph resembles a random graph and exhibits quasi-random properties; e.g., the edge densities within any subset and between any two subsets of its vertices do not differ too much from what is expected, see the Expander Mixing Lemma 1 of Section 2. Quasi-random properties and the spectral gap of random graphs with given expected degrees are discussed in Chung and Graham [9], and Coja-Oghlan and Lanka [11]. However, the spectral gap appears not at the ends of the normalized Laplacian spectrum in the case of generalized random or generalized quasi-random graphs, which, in the presence of k ≥ 2 underlying clusters, have k eigenvalues (including the zero) separated from 1, while the bulk of the spectrum is located around 1, see e.g., [6].
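For concreteness, the spectral gap just defined can be computed directly from the normalized Laplacian spectrum. The sketch below is a plain numpy illustration (the helper name `normalized_laplacian_gap` and the choice of the complete graph K5 are ours, not from the text):

```python
import numpy as np

def normalized_laplacian_gap(W):
    """Spectral gap of a connected weighted graph: the minimum distance between
    the nontrivial normalized Laplacian spectrum and the endpoints of [0, 2]."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt   # L_D = I - D^{-1/2} W D^{-1/2}
    lam = np.sort(np.linalg.eigvalsh(L))
    nontrivial = lam[1:]                                # drop the trivial zero eigenvalue
    return min(nontrivial.min(), 2.0 - nontrivial.max())

# complete graph K5: the nontrivial eigenvalues are all n/(n-1) = 1.25,
# so the gap is min(1.25, 2 - 1.25) = 0.75
W = np.ones((5, 5)) - np.eye(5)
print(round(normalized_laplacian_gap(W), 6))   # 0.75
```

A large value of this gap is what the quasi-random properties above refer to.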
These structures are usual in social or biological networks having k clusters of vertices (belonging to social groups or similarly functioning enzymes) such that the edge density within the clusters and between any pair of the clusters is homogeneous. Our conjecture is that k so-called structural eigenvalues (separated from 1) in the normalized Laplacian spectrum are indications of such a structure, while the eigenvalues near 1 are responsible for the pairwise regularities. The clusters themselves can be recovered by applying the k-means algorithm to the vertex representatives obtained from the eigenvectors corresponding to the structural eigenvalues (apart from the zero). For the k = 2 case we will give an exact relation between the eigenvalue separation (of the non-trivial structural eigenvalue from the bulk of the spectrum) and the volume regularity of the cluster pair obtained by the k-means algorithm applied to the coordinates of the transformed eigenvector belonging to the non-trivial structural eigenvalue, see Theorem 1 of Section 3. To eliminate the trivial eigenvalue-eigenvector pair, we shall rather use the normalized modularity spectrum of [7], which plays an important role in finding the extrema of some penalized versions of the Newman-Girvan modularity introduced in [18]. Theorem 2 of Section 4 gives an estimate of the extent of volume regularity of the different cluster pairs in the k > 2 case, based on the spectral gap and the k-variance of the vertex representatives. In [10,16], the authors give algorithms - based on low-rank approximation - to find a regular partition when k is known and our graph comes from a generalized random graph model with k clusters. Without knowing k, there are constructions - like [13] - based on the refinement of partitions, leading to a very fine partition whose number of clusters depends merely on the constant governing the regularity of the cluster pairs.
By contrast, our purpose is to estimate the extent of regularity of the cluster pairs by means of spectral gaps and eigenvectors. The estimates given are relevant only in the presence of a large spectral gap (between the structural and the other eigenvalues) and special classification properties of the eigenvectors corresponding to the structural eigenvalues, see Theorem 2 of Section 4. In this case, the algorithm is straightforward via k-means clustering.

Preliminaries and statement of purpose
Let G = (V, W) be a graph on n vertices, where the n × n symmetric matrix W has nonnegative real entries and zero diagonal. Here w_ij is the similarity between vertices i and j, where 0 similarity means no connection/edge at all. A simple graph is a special case with 0-1 weights. Without loss of generality,

∑_{i=1}^n ∑_{j=1}^n w_ij = 1    (1)

will be supposed. Hence, W is a joint distribution whose marginals are the generalized vertex degrees d_i = ∑_{j=1}^n w_ij, collected in the main diagonal of the diagonal degree matrix D = diag(d), d = (d_1, …, d_n)^T. In [4,5] we investigated the spectral gap of the normalized Laplacian L_D = I − D^{−1/2} W D^{−1/2}, where I denotes the identity matrix of appropriate size. Suppose that our graph is connected (W is irreducible). Let 0 = λ_1 < λ_2 ≤ ⋯ ≤ λ_n ≤ 2 denote the eigenvalues of the symmetric normalized Laplacian L_D with corresponding unit-norm, pairwise orthogonal eigenvectors u_1, …, u_n. In the random walk setup, D^{−1}W is the transition matrix (its entry in the (i, j) position is the conditional probability of moving from vertex i to vertex j in one step, given that we are in i), a stochastic matrix with eigenvalues 1 − λ_i and corresponding eigenvectors D^{−1/2}u_i (i = 1, …, n). "Good" expanders have a λ_2 bounded away from zero, which also implies the separation of the isoperimetric number

h(G) = min_{X ⊂ V, Vol(X) ≤ 1/2} w(X, V∖X) / Vol(X),    (2)

where for X, Y ⊂ V, w(X, Y) = ∑_{i∈X} ∑_{j∈Y} w_ij is the weighted cut between X and Y, while Vol(U) = ∑_{i∈U} d_i is the volume of U ⊂ V. In view of (1), Vol(V) = 1; this is why the minimum is taken over vertex sets of volume at most 1/2. In [5], we proved that λ_2/2 ≤ h(G) ≤ √(2λ_2), while in the λ_2 ≤ 1 case the stronger upper estimate h(G) ≤ √(λ_2(2 − λ_2)) holds. (We remark that λ_2 ≤ n/(n−1) always holds.)
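The correspondence between the normalized Laplacian spectrum and the random walk transition matrix can be checked numerically; a minimal sketch, assuming an arbitrary small symmetric weight matrix generated for illustration:

```python
import numpy as np

# random symmetric weight matrix with zero diagonal, normalized so that (1) holds
rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = np.triu(A, 1) + np.triu(A, 1).T
W = W / W.sum()                                # total weight 1, so Vol(V) = 1

d = W.sum(axis=1)                              # generalized degrees
D_inv_sqrt = np.diag(d ** -0.5)
L = np.eye(6) - D_inv_sqrt @ W @ D_inv_sqrt    # normalized Laplacian L_D
P = np.diag(1.0 / d) @ W                       # random-walk transition matrix D^{-1} W

lam = np.sort(np.linalg.eigvalsh(L))
mu = np.sort(np.linalg.eigvals(P).real)

# the transition-matrix eigenvalues are exactly 1 - lambda_i
print(np.allclose(np.sort(1.0 - lam), mu))     # True
```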
If a network does not have a "large" λ_2 (compared to the natural lower bound), or equivalently - in view of the above inequalities - has a relatively "small" isoperimetric number, then the 2-partition of the vertices attaining the minimum in (2) indicates a bottleneck, or equivalently, a low-conductivity edge set between two disjoint vertex clusters: the random walk gets through between them with small probability, but - as some equivalent notions indicate - it is rapidly mixing within the clusters. To find the clusters, the coordinates of the transformed eigenvector D^{−1/2}u_2 will be used. In [4], we proved an upper estimate for the weighted 2-variance of this vector's coordinates. For a general 2 ≤ k ≤ n, the notion of k-variance - in the Analysis of Variance sense - is the following. The weighted k-variance of the k-dimensional vertex representatives x_1, …, x_n, comprising the row vectors of the n × k matrix X, is defined by

S_k²(X) = min_{P_k = (V_1, …, V_k) ∈ 𝒫_k} ∑_{a=1}^k ∑_{j∈V_a} d_j ‖x_j − c_a‖²,

where c_a = (1/Vol(V_a)) ∑_{j∈V_a} d_j x_j is the weighted center of cluster V_a (a = 1, …, k) and 𝒫_k denotes the set of k-partitions of the vertices. The above results were generalized to minimizing the normalized k-way cut of the k-partition P_k = (V_1, …, V_k) over the set of all possible k-partitions. Let

f_k(G) = min_{P_k ∈ 𝒫_k} ∑_{a=1}^k w(V_a, V∖V_a) / Vol(V_a)

be the minimum normalized k-way cut of the underlying weighted graph G = (V, W). In fact, f_2(G) is the symmetric version of the isoperimetric number and f_2(G) ≤ 2h(G). In [5] we proved lower and upper estimates for f_k(G) in terms of the k smallest normalized Laplacian eigenvalues, where the upper estimate is relevant only when the minimum k-variance of the vertex representatives is small enough, and the constant c appearing there depends on this minimum k-variance. The normalized Newman-Girvan modularity is defined in [7] as a penalized version of the Newman-Girvan modularity [18]: the normalized k-way modularity of a k-partition penalizes the within-cluster discrepancies by the cluster volumes, and its maximum over all k-partitions is the maximum normalized k-way Newman-Girvan modularity of the underlying weighted graph G = (V, W).
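The weighted k-variance admits a direct implementation. The sketch below (the helper name `weighted_k_variance` and the toy data are ours) brute-forces the minimum over all 2-partitions of four 1-dimensional representatives, following the definition above:

```python
import numpy as np
from itertools import product

def weighted_k_variance(X, d, labels, k):
    """Inner sum of S_k^2 for a fixed k-partition: sum of d_j * ||x_j - c_a||^2
    within each cluster, where c_a is the d-weighted center of cluster a."""
    total = 0.0
    for a in range(k):
        idx = labels == a
        if not idx.any():
            continue
        c = (d[idx, None] * X[idx]).sum(axis=0) / d[idx].sum()
        total += (d[idx] * ((X[idx] - c) ** 2).sum(axis=1)).sum()
    return total

# four 1-d representatives forming two tight groups, uniform weights
X = np.array([[-1.0], [-0.9], [1.0], [1.1]])
d = np.array([0.25, 0.25, 0.25, 0.25])

# S_2^2: minimize over all 2-partitions (feasible by brute force for tiny n)
best = min(weighted_k_variance(X, d, np.array(lab), 2)
           for lab in product([0, 1], repeat=4))
print(best < 0.01)   # True: the natural split {0,1} vs {2,3} gives a tiny 2-variance
```

For realistic n, the minimization is of course done by the k-means algorithm rather than by enumeration.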
For a given k, maximizing this modularity is equivalent to minimizing the normalized cut and can be solved by the same spectral technique. In fact, it is more convenient to use the spectral decomposition of the normalized modularity matrix B_D = D^{−1/2}(W − dd^T)D^{−1/2}, whose eigenvalues β_1 ≥ ⋯ ≥ β_n are the numbers 1 − λ_i with eigenvectors u_i (i = 2, …, n), together with the zero with corresponding unit-norm eigenvector √d. In [5,7], we also show that a spectral gap between λ_k and λ_{k+1} is an indication of k clusters with low inter-cluster connections; further, the intra-cluster connections (w_ij) between vertices i and j of the same cluster are higher than expected under the hypothesis of independence (in view of which the vertices are connected with probability d_i d_j). In the random walk framework, the random walk stays within the clusters with high probability. Conversely, minimizing the above modularity will result in clusters with high inter- and low intra-cluster connections. In [7], we proved that the existence of k "large" (significantly larger than 1) eigenvalues in the normalized Laplacian spectrum, or equivalently, of k negative eigenvalues (separated from 0) in the normalized modularity spectrum, is an indication of k clusters with the above property. In the random walk setup, the walk stays within the clusters with low probability. These two types of network structures are frequently called community and anti-community structure. These are the two extreme cases, when f_k(P_k, G) is either minimized or maximized, and the optimization gives k clusters with either strong intra-cluster and weak inter-cluster connections, or vice versa. Some networks exhibit a more general, still regular behavior: the vertices can be classified into k clusters such that the information flow within them and between any pair of them is homogeneous.
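The stated relation between the normalized modularity and Laplacian spectra can be verified numerically; a minimal sketch, assuming B_D = D^{−1/2}(W − dd^T)D^{−1/2} as above and an arbitrary small weighted graph:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 5))
W = np.triu(A, 1) + np.triu(A, 1).T
W = W / W.sum()                                # Vol(V) = 1
d = W.sum(axis=1)
Dis = np.diag(d ** -0.5)

B = Dis @ (W - np.outer(d, d)) @ Dis           # normalized modularity matrix B_D
L = np.eye(5) - Dis @ W @ Dis                  # normalized Laplacian L_D

beta = np.sort(np.linalg.eigvalsh(B))
lam = np.sort(np.linalg.eigvalsh(L))

# sqrt(d) is an eigenvector of B_D with eigenvalue 0 (this uses Vol(V) = 1) ...
print(np.allclose(B @ np.sqrt(d), 0.0))        # True
# ... and the rest of the modularity spectrum is {1 - lambda_i : i >= 2}
print(np.allclose(np.sort(np.append(1.0 - lam[1:], 0.0)), beta))   # True
```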
In terms of random walks, the walk stays within clusters or switches between clusters with probabilities characteristic of the cluster pair. That is, if the random walk moves from a vertex of cluster V_a to a vertex of cluster V_b, then the probability of doing so does not depend on the actual vertices, merely on their cluster memberships, a, b = 1, …, k.
In this context, we examined the following generalized random graph model, which corresponds to the ideal case: given the number of clusters k, the vertices of the graph independently belong to the clusters; further, conditioned on the cluster memberships, vertices i ∈ V_a and j ∈ V_b are connected with probability p_ab, independently of each other, 1 ≤ a, b ≤ k. Applying the results of [6] on the spectral characterization of some noisy random graphs, we are able to prove the following about the normalized modularity spectrum of a generalized random graph: there exists a positive number θ < 1, independent of n, such that there are exactly k − 1 so-called structural eigenvalues of B_D that are greater than θ − o(1), while all the others are o(1) in absolute value. Equivalently, L_D has k eigenvalues (including the zero) separated from 1.
The k = 1 case corresponds to quasi-random graphs, and the above characterization corresponds to the eigenvalue separation of such graphs, discussed in [9]. The authors also prove implications between the so-called quasi-random properties. For example, for dense graphs, "good" eigenvalue separation is equivalent to "low" discrepancy (of the induced subgraphs' densities from the overall edge density). For the k ≥ 2 case, generalized quasi-random graphs were introduced by Lovász and T. Sós [15]. These graphs are deterministic counterparts of generalized random graphs with the same spectral properties. In fact, the authors define so-called generalized quasi-random graph sequences by means of graph convergence, which also implies the convergence of spectra. Though the spectrum itself does not carry enough information about the cluster structure of the graph, together with some classification properties of the structural eigenvectors it does. We want to prove an implication between the spectral gap and the volume regularity of the cluster pairs, also using the structural eigenvectors. The notion of volume regularity was introduced by Alon et al. [2]. We shall use a slightly modified version of this notion. Our definition was inspired by the Expander Mixing Lemma, stated e.g., in [14] for regular graphs and in [8] for simple graphs in the context of quasi-random properties. Now we formulate it for edge-weighted graphs with a general degree sequence. We also include the proof as a preparation for the proof of Theorem 1 of Section 3.
Lemma 1 (Expander Mixing Lemma for Weighted Graphs) Let G = (V, W) be a weighted graph and suppose that Vol(V) = 1. Then for all X, Y ⊂ V:

|w(X, Y) − Vol(X)Vol(Y)| ≤ ‖B_D‖ √(Vol(X)Vol(Y)),

where ‖B_D‖ is the spectral norm of the normalized modularity matrix of G.
Proof Let 1_U ∈ ℝ^n denote the indicator vector of U ⊂ V. Further, x := D^{1/2}1_X and y := D^{1/2}1_Y. We use the spectral decomposition D^{−1/2}WD^{−1/2} = ∑_{i=1}^n ρ_i u_i u_i^T, where ρ_i (i = 1, …, n) are the eigenvalues of D^{−1/2}WD^{−1/2} and ρ_1 = 1 with corresponding unit-norm eigenvector u_1 = √d = D^{1/2}1. We remark that u_1 is also an eigenvector of B_D corresponding to the eigenvalue zero, hence ‖B_D‖ = max_{i≥2} |ρ_i|. Let x = ∑_{i=1}^n x_i u_i and y = ∑_{i=1}^n y_i u_i be the expansions of x and y in the orthonormal basis u_1, …, u_n, with coordinates x_i = x^T u_i and y_i = y^T u_i, respectively. Observe that x_1 = Vol(X), y_1 = Vol(Y), and

|w(X, Y) − Vol(X)Vol(Y)| = |x^T D^{−1/2}WD^{−1/2} y − x_1 y_1| = |∑_{i=2}^n ρ_i x_i y_i| ≤ max_{i≥2}|ρ_i| ∑_{i=2}^n |x_i| |y_i| ≤ ‖B_D‖ ‖x‖ ‖y‖ = ‖B_D‖ √(Vol(X)Vol(Y)),

where we also used the triangle and the Cauchy-Schwarz inequalities.
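Lemma 1 can be checked exhaustively on a small weighted graph; a sketch, assuming an arbitrary random symmetric W normalized so that Vol(V) = 1:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
n = 7
A = rng.random((n, n))
W = np.triu(A, 1) + np.triu(A, 1).T
W = W / W.sum()                                   # Vol(V) = 1
d = W.sum(axis=1)
Dis = np.diag(d ** -0.5)
B = Dis @ (W - np.outer(d, d)) @ Dis
spec_norm = np.abs(np.linalg.eigvalsh(B)).max()   # ||B_D||

# check |w(X,Y) - Vol(X)Vol(Y)| <= ||B_D|| sqrt(Vol(X)Vol(Y)) for every X, Y
subsets = [s for r in range(1, n + 1) for s in combinations(range(n), r)]
ok = True
for X in subsets:
    for Y in subsets:
        wXY = W[np.ix_(X, Y)].sum()
        volX, volY = d[list(X)].sum(), d[list(Y)].sum()
        ok &= abs(wXY - volX * volY) <= spec_norm * np.sqrt(volX * volY) + 1e-12
print(ok)   # True: the mixing bound holds for every pair of subsets
```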
We remark that the spectral gap of G is 1 − ‖B_D‖; hence - in view of Lemma 1 - the density between any two subsets of a "good" expander is near to what is expected. In the definition of volume regularity, on the contrary, the pair (A, B) consists of disjoint vertex subsets: the pair (A, B) is α-volume regular if for all X ⊂ A and Y ⊂ B

|w(X, Y) − ρ(A, B)Vol(X)Vol(Y)| ≤ α √(Vol(A)Vol(B)),    (10)

where ρ(A, B) = w(A, B)/(Vol(A)Vol(B)) is the relative inter-cluster density of (A, B); a "small" α indicates that the (A, B) pair is like a bipartite expander, see e.g., [8].
In the next section we shall prove the following statement for the k = 2 case: if one eigenvalue jumps out of the bulk of the normalized modularity spectrum, then clustering the coordinates of the corresponding transformed eigenvector into 2 parts (by minimizing the 2-variance of its coordinates) results in an α-volume regular partition of the vertices, where α depends on the spectral gap. We may go further: if k − 1 (so-called structural) eigenvalues jump out of the normalized modularity spectrum, then clustering the representatives of the vertices - obtained from the corresponding eigenvectors in the usual way - into k clusters results in α-volume regular pairs, where α depends on the spectral gap (between the structural eigenvalues and the bulk of the spectrum) and on the k-variance of the vertex representatives based on the eigenvectors corresponding to the structural eigenvalues. In Section 4, we give an estimate of α in the k ≥ 2 case; further, we extend the estimate to the clusters themselves.
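The k = 2 procedure just outlined - take the eigenvector of the structural eigenvalue of B_D, transform it by D^{−1/2}, and split its coordinates into two clusters - can be sketched on a planted two-cluster graph, where the minimum 2-variance split reduces to splitting by sign (the weights and sizes below are illustrative choices of ours):

```python
import numpy as np

# planted two-cluster weighted graph: strong intra-, weak inter-cluster weights
n1 = n2 = 20
W = np.zeros((40, 40))
W[:n1, :n1] = 1.0
W[n1:, n1:] = 1.0
W[:n1, n1:] = W[n1:, :n1] = 0.05
np.fill_diagonal(W, 0.0)
W = W / W.sum()                              # Vol(V) = 1

d = W.sum(axis=1)
Dis = np.diag(d ** -0.5)
B = Dis @ (W - np.outer(d, d)) @ Dis
vals, vecs = np.linalg.eigh(B)
u2 = vecs[:, np.argmax(np.abs(vals))]        # eigenvector of the structural eigenvalue
x = (d ** -0.5) * u2                         # transformed representatives D^{-1/2} u_2

labels = (x > 0).astype(int)                 # 1-d clustering with k = 2: split by sign
print(len(set(labels[:n1])) == 1 and len(set(labels[n1:])) == 1
      and labels[0] != labels[-1])           # True: the planted clusters are recovered
```

On this symmetric example the structural eigenvalue is well separated (0.9 against a bulk of ±0.05), so the sign pattern of D^{−1/2}u_2 recovers the planted partition exactly.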
3 Eigenvalue separation and volume regularity (k = 2 case)

Theorem 1 Let G = (V, W) be an edge-weighted graph on n vertices, with generalized degrees d_1, …, d_n and D = diag(d_1, …, d_n). Suppose that Vol(V) = 1. Let the eigenvalues of D^{−1/2}WD^{−1/2}, enumerated in decreasing absolute value, be 1 = ρ_1 > |ρ_2| ≥ |ρ_3| ≥ ⋯ ≥ |ρ_n|; set θ := |ρ_2| and ε := max_{i≥3} |ρ_i|, and suppose that ε < θ. Then the 2-partition (A, B) of the vertices obtained by minimizing the 2-variance of the coordinates of the transformed eigenvector D^{−1/2}u_2 is α-volume regular with α = O(√((1−θ)/(1−ε))).

Proof Using the spectral decomposition D^{−1/2}WD^{−1/2} = ∑_{i=1}^n ρ_i u_i u_i^T and the fact that u_1 = √d = D^{1/2}1, we can write (11) as (12), where x = ∑_{i=1}^n x_i u_i and y = ∑_{i=1}^n y_i u_i are the expansions of x and y in the orthonormal basis u_1, …, u_n, with coordinates x_i = x^T u_i and y_i = y^T u_i, respectively. First we prove that 1 − ρ is governed by ρ_2; more precisely, |1 − ρ| ≤ |ρ_2| + ε. Applying the arguments of Lemma 1 and the above formulas for the special A, B ⊂ V yields (13), where a := D^{1/2}1_A and b := D^{1/2}1_B, with expansions a = ∑_{i=1}^n a_i u_i and b = ∑_{i=1}^n b_i u_i in the orthonormal basis u_1, …, u_n. The separation of A and B is based on the vector D^{−1/2}u_2, which has both negative and positive coordinates, since u_2 is orthogonal to u_1, whose coordinates are all positive. In formulas, a + b = u_1, and hence, a_2 + b_2 = u_1^T u_2 = 0. (If it is the eigenvalue λ_2 of the normalized Laplacian that is farthest from 1, then the corresponding eigenvector, our u_2, is also called the "Fiedler vector", as the partition of the vertices into two loosely connected parts was based on the signs of its coordinates in the early paper of Fiedler [12].) If θ is much larger than ε, the first term in the last formula of (13) - apart from a term of O(|ε|) - dominates the sign of ρ − 1, which is therefore opposite to the sign of ρ_2. Therefore, we distinguish between two cases.
If the two largest absolute value eigenvalues of the normalized modularity matrix are of different sign, then we are able to find a gap of at least θ − ε between eigenvalues of the same sign.
Therefore, (12) can be estimated from above term by term. As for the second term, εx_1y_1 = ε Vol(X)Vol(Y), so it does not need further treatment.
Using the Cauchy-Schwarz inequality, the last term can be estimated from above by ε√(Vol(X)Vol(Y)), since x_1 = Vol(X), y_1 = Vol(Y) and ∑_{i=1}^n x_i² = ‖x‖² = Vol(X), ∑_{i=1}^n y_i² = ‖y‖² = Vol(Y). The first term is reminiscent of an equation for the coordinates of orthogonal vectors. Therefore, we project the vectors u_1, u_2 onto the subspace F = Span{a, b}. In fact, u_1 = a + b, and hence, u_1 ∈ F. The vector u_2 can be decomposed as in (18), where q is the component orthogonal to F. For the squared distance ‖q‖² between u_2 and F, we proved in [4] that it equals the weighted 2-variance S_2²(D^{−1/2}u_2), and in (15) we estimated it from above by (1−θ)/(1−ε). (In the ρ_2 = 1 − λ_n case a similar upper estimate works using (16).) Let s² denote this minimum 2-variance of the coordinates of D^{−1/2}u_2 (in both cases). In estimating the first term, the problem is that the pairwise orthogonal vectors u_1, u_2 and a, b are not in the same subspace of ℝ^n, as, in general, u_2 ∉ F. However, by an argument proved in [4], we can find orthogonal, unit-norm vectors ũ_1, ũ_2 ∈ F close to u_1, u_2 as quantified in (19), where, in view of u_1 ∈ F, ũ_1 = u_1. Let r := u_2 − ũ_2. Since ũ_1^T a, ũ_2^T a and ũ_1^T b, ũ_2^T b are coordinates of the orthogonal vectors a, b in the basis ũ_1, ũ_2, we may use (19) and the fact that ‖a‖² + ‖b‖² = 1. Going back to (18), we obtain similar estimates, which, in view of ‖q‖² = s², yield the required bound on the first term. Summarizing, the second and third terms in (17) are estimated from above by ε√(Vol(X)Vol(Y)) ≤ ε√(Vol(A)Vol(B)). Because ε < θ, an easy calculation shows that the total is less than √((1−θ)/(1−ε))·√(Vol(A)Vol(B)); therefore, the constant α of the (A, B) pair's regularity is O(√((1−θ)/(1−ε))).

Remark 1
The statement has relevance only if θ is much larger than ε. In this case, the spectral gap between the largest absolute value eigenvalue and the others in the normalized modularity spectrum indicates a regular 2-partition of the graph, which can be constructed from the eigenvector belonging to the structural eigenvalue.
The partition (V_1, …, V_k) of V is defined so that it minimizes the weighted k-variance of the vertex representatives obtained as the row vectors of the n × k matrix X of column vectors D^{−1/2}u_i, where u_i is the unit-norm eigenvector belonging to ρ_i (i = 1, …, k). With the notation s² = S_k²(X), the (V_i, V_j) pairs are 2(√2 s + ε)-volume regular (i ≠ j), and for the clusters V_i (i = 1, …, k) the following holds: for all X, Y ⊂ V_i we have

|w(X, Y) − ρ(V_i)Vol(X)Vol(Y)| ≤ 2(√2 s + ε)Vol(V_i),

where ρ(V_i) = w(V_i, V_i)/Vol²(V_i) is the relative intra-cluster density of V_i.
Proof Denoting by u_1, …, u_k the eigenvectors belonging to the so-called structural eigenvalues ρ_1, …, ρ_k, the representatives r_1, …, r_n of the vertices are the row vectors of the matrix X = (x_1, …, x_k), where x_i = D^{−1/2}u_i (i = 1, …, k); the trivial x_1 = 1 (belonging to ρ_1 = 1) can be omitted, see (5). The minimum k-variance S_k²(X) of the k-dimensional (actually, (k−1)-dimensional) representatives is as small as s². Suppose that the minimum k-variance is attained at the k-partition (V_1, …, V_k) of the vertices. By an easy analysis of variance argument of [5,6] it follows that s² equals the squared distance of Span{u_1, …, u_k} from F = Span{D^{1/2}z_1, …, D^{1/2}z_k}, with the so-called normalized partition vectors z_1, …, z_k of coordinates z_ji = 1/√Vol(V_i) if j ∈ V_i and 0 otherwise (i = 1, …, k). Note that the vectors D^{1/2}z_1, …, D^{1/2}z_k form an orthonormal system. By [4,5] we can find another orthonormal system v_1, …, v_k ∈ F close to u_1, …, u_k. With these vectors, we construct the k-rank approximation ∑_{i=1}^k ρ_i v_i v_i^T of the matrix D^{−1/2}WD^{−1/2}, whose accuracy (in spectral norm) is governed by the angles σ_i between u_i and v_i, for which sin(σ_i/2) = ½‖u_i − v_i‖ holds; therefore, the above difference can be estimated from above by √2 s + ε in spectral norm. Based on these considerations, and the fact that the cut norm is less than or equal to the spectral norm, the densities to be estimated in the defining formula (10) of volume regularity can be written in terms of stepwise constant vectors in the following way. The vectors y_i := D^{−1/2}v_i are stepwise constant on the partition (V_1, …, V_k), i = 1, …, k. The matrix ∑_{i=1}^k ρ_i y_i y_i^T is therefore a symmetric block-matrix of k × k blocks belonging to the above partition of the vertices. Let w̃_ab denote its entries in the (a, b) block (a, b = 1, …, k).
Using (21), the matrix W is approximated by the matrix of entries d_i w̃_ab d_j. Therefore, the entries of W - for i ∈ V_a, j ∈ V_b - can be decomposed as w_ij = d_i w̃_ab d_j + η_ij, where the cut norm and the spectral norm of the n × n symmetric error matrix E = (η_ij) are at most ‖D‖(√2 s + ε). But we will restrict the error matrix to V_a × V_b: its entries are the η_ij's for i ∈ V_a, j ∈ V_b, and zeros otherwise. Denoting the restricted matrix by E_ab, and the restricted diagonal matrices by D_a and D_b, respectively, a finer estimate holds in terms of ‖D_a‖ and ‖D_b‖. Consequently, for a, b = 1, …, k, the cuts w(X, Y) with X ⊂ V_a, Y ⊂ V_b deviate from w̃_ab Vol(X)Vol(Y) by at most 2(√2 s + ε)√(Vol(V_a)Vol(V_b)), which gives the required statement both in the a ≠ b and the a = b case.
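The k-rank approximation step of the proof can be illustrated in the exact (unperturbed) setting: truncating the spectral decomposition of D^{−1/2}WD^{−1/2} to its k largest absolute value eigenvalues leaves a spectral-norm error of exactly the (k+1)-st largest |ρ_i|. A numpy sketch (in the proof, the v_i are perturbed versions of the eigenvectors u_i, which costs the extra √2 s term):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.random((8, 8))
W = np.triu(A, 1) + np.triu(A, 1).T
W = W / W.sum()
d = W.sum(axis=1)
Dis = np.diag(d ** -0.5)
M = Dis @ W @ Dis                               # D^{-1/2} W D^{-1/2}

vals, vecs = np.linalg.eigh(M)
order = np.argsort(-np.abs(vals))               # enumerate in decreasing |rho_i|
k = 3
Mk = sum(vals[i] * np.outer(vecs[:, i], vecs[:, i]) for i in order[:k])

# the spectral-norm error of the k-rank truncation is the (k+1)-st largest |rho_i|
err = np.abs(np.linalg.eigvalsh(M - Mk)).max()
print(np.isclose(err, np.abs(vals[order[k]])))  # True
```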
Remark 2 In the k = 2 case, the estimate of Theorem 1 has the same order of magnitude as that of Theorem 2, since s² = O((1−θ)/(1−ε)). The statement has relevance only for an integer