Underestimated cost of targeted attacks on complex networks

The robustness of complex networks under targeted attacks is deeply connected to the resilience of complex systems, i.e., the ability to make appropriate responses to the attacks. In this article, we investigate the state-of-the-art targeted node attack algorithms and demonstrate that they become very inefficient when the cost of the attack is taken into consideration. We make the explicit assumption that the cost of removing a node is proportional to the number of adjacent links that are removed, i.e., higher-degree nodes have higher cost. Finally, for the case when it is possible to attack links, we propose a simple and efficient edge removal strategy named Hierarchical Power Iterative Normalized cut (HPI-Ncut). The results on real and artificial networks show that the HPI-Ncut algorithm outperforms all the node removal and link removal attack algorithms when the cost of the attack is taken into consideration. In addition, we show that on sparse networks, the complexity of this hierarchical power iteration edge removal algorithm is only $O(n\log^{2+\epsilon}(n))$.


Introduction
Resilience of complex networks refers to their ability to react to internal failures or external disturbances (attacks) on nodes or edges. The reaction is fundamentally connected to the robustness of the network structure 1 that represents the complex system, which is often characterized by the existence of a giant connected component (GCC). Robustness of the connected components under random failure of nodes or edges is described by classical percolation theory 2 . In network science, percolation is the simplest process showing a continuous phase transition, scale invariance, fractal structure, and universality, and it is described with just a single parameter, the probability of removing a node or edge. Network science studies have demonstrated that scale-free networks 3,4 are more robust than random networks 5,6 under random attacks but less robust under targeted attacks [7][8][9][10][11] . Recently, studies of network resilience have moved their focus to more realistic scenarios of interdependent networks 12 and different failure 13 and recovery 14,15 mechanisms.
Although the study of network robustness is mature, the majority of targeted attack strategies are still based on the heuristic identification of influential nodes 10,[16][17][18][19] with no performance guarantees for the optimality of the solution. Finding the minimal set of nodes whose removal maximally fragments the network is called the network dismantling problem 20,21 and belongs to the NP-hard class. Thus, no polynomial-time algorithm is known for it, and only recently have different state-of-the-art approximation algorithms [20][21][22][23][24] been proposed for this task. Although these methods [20][21][22][23][24] show promising results for network dismantling, we take one step back and analyze the implicit assumption these algorithms make: that the cost of a removal action is the same for all nodes, regardless of their centrality in the network. However, attacking a central node, e.g., a high-degree node in a socio-technical system, usually comes with an additional cost compared to the same action on a low-degree node. Therefore, it is more realistic to explicitly assume that the cost of an attack is proportional to the number of edges the attack strategy removes.
We investigated different state-of-the-art algorithms, and the results show that, with respect to this new concept of cost, most of them are very inefficient due to their high cost, and in most instances they perform even worse than the random removal strategy. To overcome this large cost, we propose an edge removal strategy named Hierarchical Power Iterative Normalized cut (HPI-Ncut) as one possible solution. Removing a node is equivalent to removing all edges of that node, and therefore every node removal action can be reproduced by an edge removal strategy, but the converse does not hold. To partition a network, node removal algorithms always remove all the edges connected to certain important nodes. However, this is unnecessary, because only some specific edges play a key role both in the importance of the nodes and in the connectivity of the network. In cases where link removal strategies are possible, our results show that the edge removal algorithm we use outperforms all the state-of-the-art targeted node attack algorithms. Finally, we compare the cost of the proposed edge removal strategy HPI-Ncut with two other edge removal strategies, which are based on edge betweenness centrality 16 and bridgeness centrality 25 .

Results
Many algorithms have been proposed to address the network fragmentation problem 8,10,19,22,26 from the node removal perspective. These algorithms mainly pay attention to minimizing the size of the giant connected component and assume that the cost is proportional to the number of removed nodes. However, the essence of removing a node is to remove all the edges connected to it. In this article, we make the explicit assumption that the cost of removing a node is proportional to the number of associated edges that have to be removed. This implies that nodes with higher degree have a higher associated removal cost.
In subsection 2.1, we introduce the empirical and artificial networks that are used in this paper. Then, in subsection 2.2, we quantify the cost of the state-of-the-art node removal strategies and show that in most cases such attacks are cost-inefficient. This result has important implications for real-world network fragmentation scenarios where the cost budget is limited. Finally, when it is possible to remove single edges (e.g., shielding a communication link, removing a power line, or cutting off a trading relationship), we use a spectral edge removal method and compare its cost with other strategies in subsections 2.3 and 2.4. The effect of edge removal as an immunization measure for spreading processes is shown in subsection 2.5.

Data sets
To evaluate the performance of the network dismantling (fragmentation) algorithms, both real networks and synthetic networks are used in this paper: (i) Political Blogs 27 is an undirected social network which was collected around the time of the U.S. presidential election in 2004. It is a relatively dense network with an average degree of 27.36. (ii) Petster-hamster is an undirected social network which contains friendships and family links between users of the website hamsterster.com. This network data set can be downloaded from KONECT 1 . (iii) Powergrid 28 is an undirected power grid network in which a node is either a generator, a transformer, or a substation, while a link represents a transmission line. This network data set can also be downloaded from KONECT 2 . (iv) Autonomous Systems is an undirected network from the University of Oregon Route Views Project 29 . This network data set can be downloaded from SNAP 3 . (v) The Erdős-Rényi (ER) network 30 is constructed with 2500 nodes, an average degree of 20, and a connection probability of 0.01. (vi) A scale-free (SF) network with size 10,000, exponent 2.5, and average degree 4.68. (vii) A scale-free (SF) network with size 10,000, exponent 3.5, and average degree 2.35. (viii) The stochastic block model (SBM) network with ten clusters is an undirected network with 4232 nodes and average degree 2.60. The basic properties of these networks are listed in table 1.
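The paper does not specify the exact generators used for the synthetic networks; the sketch below shows one way networks with the stated parameters could be produced with networkx. The seeds, the SBM block sizes, and the SBM in/out probabilities are illustrative assumptions, and the realized average degrees only approximate the values quoted above.

```python
# A minimal sketch (assumed generators, not the authors') of the synthetic networks.
import networkx as nx

# (v) Erdos-Renyi G(n, p); p is set from the stated average degree of 20.
n_er, k_er = 2500, 20
er = nx.gnp_random_graph(n_er, k_er / (n_er - 1), seed=42)

# (vi)/(vii) Scale-free networks via a power-law degree sequence and the
# configuration model; multi-edges and self-loops are dropped afterwards,
# so the realized average degree only approximates 4.68 / 2.35.
def scale_free(n, gamma, seed=42):
    seq = [max(1, int(round(x)))
           for x in nx.utils.powerlaw_sequence(n, gamma, seed=seed)]
    if sum(seq) % 2 == 1:        # the configuration model needs an even degree sum
        seq[0] += 1
    g = nx.Graph(nx.configuration_model(seq, seed=seed))
    g.remove_edges_from(nx.selfloop_edges(g))
    return g

sf_25 = scale_free(10000, 2.5)
sf_35 = scale_free(10000, 3.5)

# (viii) Stochastic block model with ten clusters and 4232 nodes; p_in/p_out
# here are guesses tuned to give an average degree near 2.60.
sizes = [423] * 9 + [425]
p_in, p_out = 0.005, 0.0001
probs = [[p_in if i == j else p_out for j in range(10)] for i in range(10)]
sbm = nx.stochastic_block_model(sizes, probs, seed=42)
```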

Cost-fragmentation inefficiency of the node targeting attack strategies
Let us define the function $f_D(x)$ as the size of the GCC for a fixed attack cost $x$ under strategy $D$. The cost $x \in [0, 1]$ is measured as the fraction of removed edges in the network. Now, for a fixed budget $x$, strategy $D$ is more efficient than strategy $L$ if and only if $f_D(x) < f_L(x)$, i.e., the size of the GCC is smaller when attacking with strategy $D$ than with strategy $L$ under the limited budget $x$.
One way to compare the attack performance of strategies is to plot the function $f_D(x)$ of the size of the GCC after the attack versus the cost, see fig. 1. Here we define the cost-fragmentation effectiveness (CFE) for strategy $D$ as the area under the curve of the size of the GCC versus the cost, which can be computed as the integral over all possible budgets: $F_D = \int_0^1 f_D(x)\,dx$. The smaller the CFE (i.e., the area under the curve), the better the attack effect.
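As an illustration of this definition, the following minimal sketch (not the evaluation code used in the paper) estimates a CFE score empirically with the trapezoidal rule. The `attack_order` argument stands for the ranked edge list produced by any strategy; the random order in the example corresponds to a simple bond percolation baseline.

```python
# Sketch: estimate F_D = integral_0^1 f_D(x) dx for a given edge removal order.
import random
import networkx as nx

def cfe(G, attack_order):
    """Area under the relative-GCC-size vs. relative-cost curve."""
    G = G.copy()
    n, m = G.number_of_nodes(), G.number_of_edges()
    xs, ys = [0.0], [1.0]
    for k, edge in enumerate(attack_order, start=1):
        G.remove_edge(*edge)
        xs.append(k / m)    # cost x: fraction of removed edges
        ys.append(max(len(c) for c in nx.connected_components(G)) / n)
    # trapezoidal rule over all sampled budgets
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

# Example: edges removed in random order (bond percolation) as a simple baseline.
G = nx.erdos_renyi_graph(500, 0.02, seed=0)
edges = list(G.edges())
random.seed(0)
random.shuffle(edges)
print("random-removal CFE ~", round(cfe(G, edges), 3))
```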
Taking into account the role of cost in targeted attacks, the results are highly counterintuitive: for a fixed budget, many networks are more fragile under the High Degree (HD) attack strategy than under the High Degree Adaptive (HDA) strategy, as shown in table 2 and table 3. Furthermore, the performance of the state-of-the-art node removal-based methods can become even worse than the naive random removal of nodes (site percolation) when we take the attack cost into account, as shown in fig. 1 and fig. 2. In addition, comparing the CFE of the site percolation and bond percolation methods in table 2, we find that bond percolation works better on networks with lower average degree, i.e., on the Powergrid, SF (γ = 3.5), and SBM networks; otherwise, it is better to choose site percolation.
In fact, networks have an intrinsic resilience to attacks that stems from their distinct structures. To avoid the interference of these architectural differences, we use the site percolation method as a baseline null model. The site percolation strategy removes nodes from a network uniformly at random, which reflects the intrinsic resilience of the attacked network to a certain extent. The cost-fragmentation effectiveness of site percolation is denoted by $F^* = \int_0^1 f^*(x)\,dx$.
On the whole, all node-centric strategies (HD 31 , HDA 31 , EGP 19 , CI 22 , CoreHD 20 and Min-sum 21 ) work distinctly better than the baseline on the three networks with lower average degree, i.e., the Powergrid, SF (γ = 3.5), and SBM networks. However, on the empirical Petster-hamster social network, the Political Blogs network, the Autonomous Systems network, and the SF (γ = 2.5) network, all node-centric strategies are comparable to, or even less cost-fragmentation efficient than, the baseline random model, according to the CFE score. In the last line of table 3, the average improvement over the different networks is computed, which reflects the overall CFE of each algorithm. These results suggest that state-of-the-art node-centric algorithms are rather inefficient in realistic settings, once the cost of fragmentation is taken into account.

The edge-removal problem
Table 3. The improvement of the CFE of each algorithm compared with the baseline, i.e., the site percolation method. The best performing algorithm in each column is emphasized in bold.

In network science, nodes represent entities in a system and edges represent the relationships or interactions between them. Both nodes and edges are fundamental parts of a network, and deleting a specific fraction of them leads to great changes in the structure and function of the network. The problem of network attack or fragmentation has received a huge amount of attention in the past decade 19,[32][33][34][35] . However, to the best of our knowledge, almost all of the attack strategies are node removal based, in which the node removal operation is carried out by removing all the edges connected to the node. In fact, to partition a network into small clusters, it is unnecessary to remove all the links of a node. We remove a node because we either suppose it has a high influence or it is a bridge between clusters; if we remove part of its links, its influence may be greatly reduced, or it may no longer be a bridge. From another perspective, edges play very different roles in real networks 36,37 . Some of them are crucial to diffusion processes, while others are irrelevant. Thus, if edge removal actions are applicable, an edge removal attack will be more accurate and efficient. The link fragmentation (attack) problem can be stated as follows: if we have a budget of x links that can be attacked or removed, which links should we pick? This is mathematically equivalent to asking how to partition a given network with a minimal separating set of edges. The objective function of the link attack takes the following general form 38 : $\mathrm{cut}(A_1, \cdots, A_k) = \frac{1}{2}\sum_{i=1}^{k} W(A_i, \bar{A}_i)$, where $A_1, \cdots, A_k$ are $k$ nonempty subsets from a partition of the original network, $\bar{A}_i$ is the complement of the node set $A_i$, and $W(A_i, \bar{A}_i)$ is the number of links between the two disjoint subsets $A_i$ and $\bar{A}_i$. In this paper, we apply a spectral strategy to the edge attack problem, which falls in the class of well-known spectral clustering and partitioning algorithms [39][40][41][42][43] . We use hierarchical partitioning with the Ncut objective function 40 combined with a power iteration procedure for the approximation of eigenvectors. The complete description of this HPI-Ncut edge removal strategy is presented in Section 3. The results show that the HPI-Ncut strategy greatly decreases the cost of the attack compared with the state-of-the-art node removal strategies. In the following subsections, we compare the HPI-Ncut algorithm with the random uniform attack strategy, edge betweenness, bridgeness, and some classical node removal strategies (see the definitions of these algorithms in Section 3).
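To make the objective concrete, the following minimal sketch evaluates $\frac{1}{2}\sum_i W(A_i, \bar{A}_i)$ for a given partition, i.e., counts the links that would have to be removed; the two-block partition of the karate club graph used in the example is purely illustrative.

```python
# Sketch: number of links crossing a partition (the cut objective above).
import networkx as nx

def cut_size(G, parts):
    """0.5 * sum_i W(A_i, complement of A_i) = number of links to remove."""
    total = 0
    for A in parts:
        A = set(A)
        total += sum(1 for u, v in G.edges() if (u in A) != (v in A))
    return total // 2   # every crossing edge is counted from both sides

G = nx.karate_club_graph()
A = {n for n, d in G.nodes(data=True) if d["club"] == "Mr. Hi"}
print("links to remove for this 2-partition:", cut_size(G, [A, set(G) - A]))
```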

Effectiveness of the HPI-Ncut algorithm
In the general case, each attack strategy generates a ranking list of all (or part of) the nodes or links of the network. After removing the nodes or links one after another, the size of the GCC of the residual network characterizes the effectiveness of each algorithm. The removal process stops when the size of the GCC is smaller than a given threshold (here we use 0.01). In this paper, to test the effectiveness of the spectral edge removal algorithm HPI-Ncut, we plot the size of the GCC versus the fraction of removed links, for both real networks (fig. 1 and fig. 3) and synthetic networks (fig. 2 and fig. 4), comparing with classical node removal algorithms (fig. 1 and fig. 2) and existing link evaluation methods (fig. 3 and fig. 4). The results show that the HPI-Ncut algorithm outperforms all the other attack algorithms. In fig. 1 and fig. 2, we compare the HPI-Ncut algorithm with several state-of-the-art node removal-based targeted attack algorithms. Fig. 1 (a) shows that all the node removal-based algorithms are better than the site percolation method on the Powergrid network; this is because the average degree of the Powergrid network is very low, only 2.67. This is also confirmed by the results in fig. 2 (c) and (d), in which the average degrees of the SF (γ = 3.5) and SBM networks are 2.35 and 2.60, respectively. The trends of the curves in fig. 1 and fig. 2 also show that the targeted attack algorithms work better on networks with lower average degree. Furthermore, apart from the HPI-Ncut algorithm, the other algorithms perform worse than the baseline method (site percolation): site percolation performs better until the proportion of removed links exceeds 0.7 on the SF (γ = 3.5) network and until it exceeds 0.2 on the SF (γ = 2.5) network. Site percolation on the SF (γ = 3.5) network presents an obvious phase transition phenomenon 31 compared with the result on the SF (γ = 2.5) network. In addition, in fig. 2 (a) and (d), the SBM network has an obvious cluster structure compared with the ER network, and the Min-Sum, CI, CoreHD, EGP, and site percolation algorithms perform better on the SBM network. Moreover, the error of the site percolation method on the ER network is larger than on the SBM network. This implies that the cluster structure of a network has a large influence on the performance of the attack strategies.
To conclude the results of fig. 1 and fig. 2, the state-of-the-art targeted node removal strategies incur a large cost for optimized targeted attacks. In contrast, the HPI-Ncut algorithm overwhelmingly outperforms all the node removal-based attack algorithms, whether on sparse or dense networks, and on networks with or without cluster structure.
In fig. 3 and fig. 4, we compare the HPI-Ncut algorithm with some existing link evaluation algorithms. First of all, the HPI-Ncut algorithm works better and is more stable than all the other algorithms. Secondly, comparing with the results of site and bond percolation in fig. 1 and fig. 2, we see that the bond percolation method outperforms the site percolation method only when the average degree of the network is low (see the results for the Powergrid, SF (γ = 3.5), and SBM networks); otherwise, site percolation is the better choice. Thirdly, in fig. 4 (b) and (c), we see that the bond percolation method performs better than the edge betweenness and bridgeness algorithms when the cost is limited on scale-free networks, i.e., when the proportion of removed links is smaller than 0.63 in fig. 4 (b) and smaller than 0.4 in fig. 4 (c).

Figure 5. The spreadability of the networks before and after the removal of 10% of the edges by the HPI-Ncut algorithm. The x-axis shows time units. $P_i$ is the number of infected entities and $P_r$ is the number of recovered entities in the network. In the SIR model, the infection rate β is 0.10, the recovery rate is 0.02, and the basic reproduction number is 5. All results are averages over 100 independent runs. It is worth noting that the size of the GCC of the Powergrid network is only 54 after removing 10% of the links with the HPI-Ncut algorithm.
To conclude, the HPI-Ncut algorithm overwhelmingly outperforms all the node removal-based attack algorithms and link evaluation algorithms, whether on sparse or dense networks, and on networks with or without cluster structure.

Spreading dynamics after spectral edge immunization/attack
To display the effect of the targeted attack by HPI-Ncut more intuitively, we studied the susceptible-infected-recovered (SIR) 44 epidemic spreading process on four real networks. We compared both the spreading speed and the spreading scope on these networks before and after targeted immunization by HPI-Ncut. The simulation results in fig. 5 show that, by simply removing 10% of the links, the function of the networks is profoundly affected by the HPI-Ncut immunization. The proportions of the GCC of the Political Blogs, Powergrid, Petster-hamster, and Autonomous Systems networks after the attack are 37% (449/1222), 1% (54/4941), 57% (1146/2000), and 37% (2387/6474), respectively. Thus, the spreading speeds are greatly delayed and the spreading scopes are tremendously shrunk on these networks.
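For concreteness, the sketch below runs a minimal discrete-time SIR process with the quoted rates (infection rate β = 0.10 per contact per step, recovery rate 0.02, so β/μ = 5). The synchronous update scheme, the single-seed initialization, and the test graph are illustrative assumptions, not the paper's simulation setup.

```python
# Sketch: discrete-time SIR spreading on a network.
import random
import networkx as nx

def sir(G, beta=0.10, mu=0.02, steps=200, rng=random):
    infected = {next(iter(G))}          # seed: an arbitrary node
    recovered = set()
    history = []
    for _ in range(steps):
        new_inf, new_rec = set(), set()
        for v in infected:
            for u in G.neighbors(v):    # each contact transmits w.p. beta
                if u not in infected and u not in recovered and rng.random() < beta:
                    new_inf.add(u)
            if rng.random() < mu:       # recovery w.p. mu per step
                new_rec.add(v)
        infected = (infected | new_inf) - new_rec
        recovered |= new_rec
        history.append((len(infected), len(recovered)))   # (P_i, P_r)
    return history

random.seed(1)
G = nx.barabasi_albert_graph(1000, 3, seed=1)
print("final number recovered:", sir(G)[-1][1])
```

Comparing the `history` curves on the original graph and on the graph with 10% of its edges removed reproduces, qualitatively, the before/after comparison of fig. 5.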

Existing attack strategies
In this subsection, we briefly introduce the state-of-the-art node removal attack algorithms and the edge evaluation methods used in this paper. The two edge evaluation methods, i.e., edge betweenness and bridgeness, are used to measure the importance or significance of links in spreading dynamics or in the structural connectivity of networks. We use them as comparable link removal-based attack algorithms in this paper. A minimal code sketch of several of these strategies is given after the list.
• Percolation method. In percolation theory 45 , a node of a network is usually called a 'site', while an edge is usually called a 'bond'. In the study of network attacks, percolation is a random uniform attack method which either removes nodes at random (site percolation) or removes edges at random (bond percolation).
• Equal graph partitioning (EGP) algorithm. The EGP algorithm 19 , which is based on the nested dissection 46 algorithm, can partition a network into two groups with an arbitrary size ratio. In every iteration, the EGP algorithm divides the target node set into three subsets: the first group, the second group, and the separating group. The separating group is made up of all the nodes that connect to both the first group and the second group. The separating group is then minimized by trying to move its nodes into the first or the second group. Finally, after removing all the nodes in the separating group, the original network is decomposed into two groups. In our implementation, we partition the network into two groups of approximately equal size.
• Collective Influence (CI) algorithm. The CI algorithm 22 attacks the network by mapping the integrity of a tree-like random network onto optimal percolation theory 47 to identify the minimal separating set. Specifically, the collective influence of a node is computed from the degrees of the neighbors belonging to the frontier of a ball of radius l around it. CI is an adaptive algorithm which iteratively removes the node with the highest CI value after recomputing the CI values of all the nodes in the residual network. In our implementation, we compute the CI values with l = 3.
• Min-Sum algorithm. The three-stage Min-Sum algorithm 20 consists of: (1) breaking all the cycles, which can be detected from the 2-core 17 of a network, with the Min-Sum message passing algorithm; (2) breaking all the trees larger than a threshold $C_1$; (3) greedily reinserting short cycles no larger than a threshold $C_2$, which ensures that the size of the GCC does not become too large. In our implementation, we set $C_1$ and $C_2$ to 0.5% and 1% of the size of the networks, respectively.
• CoreHD algorithm. Inspired by the Min-Sum algorithm, the CoreHD algorithm 21 iteratively deletes the node with the highest degree from the 2-core 17 of the residual network.
• Edge betweenness centrality 48 . Betweenness is a widely used centrality measure, defined for a node as the sum of the fractions of all-pairs shortest paths that pass through it. Edge betweenness, an extension of betweenness, is used to evaluate the importance of a link and is defined as the sum of the fractions of all-pairs shortest paths that pass through this link 49 .
• Bridgeness 25 . Bridgeness uses local information of the network topology to evaluate the significance of edges in maintaining network connectivity. The bridgeness of a link is determined by the sizes of the k-clique communities to which its two end points belong and the size of the k-clique community to which the link itself belongs.
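The following toy sketch makes the removal rules of three of the strategies above concrete (CoreHD, CI, and an adaptive edge betweenness attack). It is a simplified illustration assuming networkx, not the reference implementations, and it omits the message-passing and reinsertion machinery of Min-Sum.

```python
# Toy sketch of three removal rules from the list above.
import networkx as nx

def corehd_order(G):
    """CoreHD: repeatedly delete the highest-degree node of the current 2-core."""
    G = G.copy()
    removed = []
    core = nx.k_core(G, 2)
    while core.number_of_nodes() > 0:
        v = max(core.degree, key=lambda kv: kv[1])[0]
        G.remove_node(v)
        removed.append(v)
        core = nx.k_core(G, 2)
    return removed

def ci_value(G, v, l=3):
    """Collective influence: (k_v - 1) times the sum of (k_u - 1) over the
    frontier of the ball of radius l around v."""
    dist = nx.single_source_shortest_path_length(G, v, cutoff=l)
    return (G.degree(v) - 1) * sum(G.degree(u) - 1
                                   for u, d in dist.items() if d == l)

def edge_betweenness_order(G, n_edges):
    """Adaptive edge betweenness attack: recompute scores after every removal."""
    G = G.copy()
    order = []
    for _ in range(n_edges):
        eb = nx.edge_betweenness_centrality(G)
        e = max(eb, key=eb.get)
        G.remove_edge(*e)
        order.append(e)
    return order

G = nx.barabasi_albert_graph(300, 2, seed=7)
print("CoreHD removes", len(corehd_order(G)), "nodes to empty the 2-core")
print("highest-CI node:", max(G, key=lambda v: ci_value(G, v)))
print("first 5 betweenness attacks:", edge_betweenness_order(G, 5))
```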

Hierarchical Power Iterative Normalized cut (HPI-Ncut) edge removal strategy
Here we describe the hierarchical iterative algorithm for the edge removal strategy. The algorithm hierarchically applies the spectral bisection algorithm, which has the same objective function as the Normalized cut algorithm 40 , and uses the power iteration method to approximate the spectral bisection. We provide a proof of the exponential convergence and asymptotic upper bounds for the run-time complexity.
In order to explain our algorithm, we quickly recall the spectral bisection algorithm.
The spectral bisection algorithm
Input: Adjacency matrix $W$ of a network.
Output: A separating set of edges that partitions the network into two disconnected clusters $A$, $\bar{A}$.
1. Compute the eigenvector $v_2$, which corresponds to the second smallest eigenvalue of the normalized Laplacian matrix $L_w = D^{-\frac{1}{2}}(D - W)D^{-\frac{1}{2}}$, or some other vector $v$ for which $\frac{v^T L_w v}{v^T v}$ is close to minimal. We use the power iteration method to compute this vector, which is explained below.
2. Put all the nodes with $v_2(i) > 0$ into the first cluster $A$ and all the nodes with $v_2(i) \leq 0$ into the second cluster $\bar{A}$. All the edges between these two clusters form the separating set that partitions the network.
The clusters obtained by this method usually have very balanced sizes. If, however, it is important to get clusters of exactly the same size, one can put the $\frac{n}{2}$ nodes with the largest entries in $v_2$ into one cluster and the remaining nodes into the other.
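A compact sketch of the bisection step is given below. For brevity it computes $v_2$ with an exact sparse eigensolver (scipy's `eigsh`) rather than the power iteration described later; the paper's HPI variant substitutes the power iteration for this call.

```python
# Sketch: one spectral bisection step (exact eigensolver instead of power iteration).
import networkx as nx
from scipy.sparse.linalg import eigsh

def spectral_bisection(G):
    """Split G by the sign of the eigenvector of the second smallest
    eigenvalue of L_w = D^{-1/2} (D - W) D^{-1/2}; return both node sets
    and the separating edge set."""
    nodes = list(G)
    L = nx.normalized_laplacian_matrix(G, nodelist=nodes)
    _, vecs = eigsh(L, k=2, which="SA")     # two smallest eigenpairs
    v2 = vecs[:, 1]
    A = {nodes[i] for i, x in enumerate(v2) if x > 0}
    cut = [(u, v) for u, v in G.edges() if (u in A) != (v in A)]
    return A, set(G) - A, cut

G = nx.karate_club_graph()
A, B, cut = spectral_bisection(G)
print(len(A), "vs", len(B), "nodes;", len(cut), "edges in the separating set")
```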
Hierarchical Power Iterative Normalized cut (HPI-Ncut) algorithm
Input: Adjacency matrix of a network.
Output: Partition of the network into small groups.
1. Partition the GCC of the network into two disconnected clusters $A$ and $\bar{A}$ by applying the spectral bisection algorithm and removing all the links in the separating set.
2. If the budget for link removal has not been exhausted, and if the GCC is not yet small enough, partition $A$ and $\bar{A}$ with step 1, respectively.
The reason we cluster hierarchically is that it allows us to refine the fragmentation gradually. For example, if after partitioning the network into $2^k$ clusters we decide that the clusters should be smaller, we only have to partition each of the existing clusters into two new clusters, obtaining $2^{k+1}$ clusters. The links that were already attacked remain attacked, and we just need to attack some additional ones. If, however, we had used spectral clustering directly, the set of links to be attacked in order to partition the network into $2^{k+1}$ clusters might not contain the set of links that needed to be attacked for $2^k$ clusters.
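Putting the two steps together, a minimal driver for the hierarchical procedure could look as follows. It reuses `spectral_bisection` from the previous sketch; the budget accounting and the truncation of the last cut are illustrative simplifications.

```python
# Sketch: hierarchical edge removal driver for the two steps above.
import networkx as nx

def hpi_ncut_attack(G, budget, gcc_threshold=0.01):
    """Bisect the current GCC repeatedly until the link budget is spent
    or the GCC drops below gcc_threshold * n (the 0.01 used above)."""
    G = G.copy()
    n = G.number_of_nodes()
    removed = []
    while len(removed) < budget:
        gcc = max(nx.connected_components(G), key=len)
        if len(gcc) <= gcc_threshold * n:
            break
        _, _, cut = spectral_bisection(G.subgraph(gcc).copy())
        if not cut:                         # degenerate bisection: stop
            break
        cut = cut[: budget - len(removed)]  # respect the remaining budget
        G.remove_edges_from(cut)
        removed.extend(cut)
    return removed
```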

Power iteration method
Input: Adjacency matrix $W$ of a network.
Output: The eigenvector $v_2$, or some other vector $v$ for which $\frac{v^T L_w v}{v^T v}$ is close to $\lambda_2$.
1. Draw $v$ randomly with uniform distribution on the unit sphere.
2. Force $v$ to be perpendicular to the first eigenvector $v_1$ by subtracting the projection $(v_1^T v)\,v_1$.
3. Iterate $v \leftarrow \tilde{L}v / \|\tilde{L}v\|$ with $\tilde{L} = 2 \cdot I - L_w$ for $\eta(n)$ iterations (see appendix B).
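A sketch of this procedure, following the description in appendix B, is given below. The re-projection onto the complement of $v_1$ inside the loop is a standard numerical safeguard added here, not a step stated in the paper. In the bisection sketch above, the `eigsh` call can be replaced by this routine to obtain the power-iterative variant.

```python
# Sketch: power iteration on L~ = 2I - L_w for the second eigenvector.
import numpy as np
import scipy.sparse as sp
import networkx as nx

def power_iteration_v2(L, deg, eta, seed=0):
    """Approximate v_2 of the normalized Laplacian L; `deg` is the degree
    vector, so v_1 ~ D^{1/2} 1 is the known first eigenvector."""
    L = sp.csr_matrix(L)                    # accept sparse array or matrix
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    L_tilde = 2.0 * sp.identity(n, format="csr") - L
    v1 = np.sqrt(deg) / np.linalg.norm(np.sqrt(deg))
    v = rng.standard_normal(n)
    v -= (v1 @ v) * v1                      # step 2: make v perpendicular to v1
    v /= np.linalg.norm(v)
    for _ in range(eta):                    # step 3: v <- L~ v / ||L~ v||
        v = L_tilde @ v
        v -= (v1 @ v) * v1                  # re-project against round-off drift
        v /= np.linalg.norm(v)
    return v

G = nx.karate_club_graph()
L = nx.normalized_laplacian_matrix(G)
deg = np.array([d for _, d in G.degree()])
eta = int(np.ceil(np.log(len(G)) ** 1.1))   # eta(n) = log^{1+eps}(n), eps = 0.1
v2 = power_iteration_v2(L, deg, eta)
```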

Objective function of the spectral bisection algorithm
In appendix A, we show that the spectral bisection algorithm has the same objective function as the relaxed Ncut 40 algorithm: $\mathrm{Ncut}(A, \bar{A}) = \frac{W(A, \bar{A})}{\mathrm{vol}(A)} + \frac{W(A, \bar{A})}{\mathrm{vol}(\bar{A})}$ with $\mathrm{vol}(A) = \sum_{i \in A} D_{ii}$, where $A \subseteq V$ denotes the set of nodes in the first partition, $\bar{A}$ the set of nodes in the second partition, and $D_{ii}$ is the degree of node $i$.
The main reason we use this objective function is that it minimizes the number of links that are removed while keeping the total sum of node degrees in the two partitions $A$ and $\bar{A}$ approximately equal.
In appendix B, we show the exponential convergence of the power iteration method to the eigenvector associated with the second smallest eigenvalue of $L_w$.

Complexity
In appendix C, we show that the complexity of the spectral bisection algorithm is $O(\eta(n) \cdot n \cdot \bar{d})$ and the complexity of the hierarchical clustering algorithm is $O(\eta(n) \cdot n \cdot \bar{d} \cdot \log(n))$, where $\eta(n)$ is the number of iterations in the power iteration method and $\bar{d}$ is the average degree. The power iteration method converges with exponential speed in the number of iterations, and $\bar{d}$ is almost constant for large sparse networks. Hence we may expect asymptotically good results with $\eta(n) = \log^{1+\epsilon}(n)$ for any $\epsilon > 0$, giving the hierarchical spectral clustering algorithm a complexity of $O(n \cdot \log^{2+\epsilon}(n))$. In practice, we used $\epsilon = 0.1$, which gives a complexity of $O(n \cdot \log^{2.1}(n))$.

Conclusion
To summarize, we investigated several state-of-the-art targeted node attack algorithms and found that they are very inefficient when the cost of the attack is taken into consideration. The cost of removing a node is defined as the number of links that are removed in the attack process. We found highly counterintuitive results: the performance of the state-of-the-art node removal-based methods is even worse than that of the naive site percolation method under a limited cost. This demonstrates that the current state-of-the-art targeted node attack strategies underestimate the heterogeneity of the cost associated with nodes in complex networks.
Furthermore, in cases where link removal strategies are possible, we compared the performance of the node-centric strategies (HD 31 , HDA 31 , EGP 19 , CI 22 , CoreHD 20 and Min-sum 21 ) and edge removal strategies (the edge betweenness 48 and bridgeness 25 strategies) based on the cost of their attacks, measured in the same units, i.e., the fraction of removed links. Node removal-based algorithms always delete all the links attached to the removed nodes, which is not economical under a limited cost. To resolve this high-cost problem in network attacks, we proposed a hierarchical power iterative algorithm (HPI-Ncut) that partitions networks into small groups via edge removal and has the same objective function as the Ncut 40 spectral clustering algorithm. The results show that the HPI-Ncut algorithm outperforms all the node removal-based attack algorithms and link evaluation algorithms on all the networks. In addition, the total complexity of the HPI-Ncut algorithm is only $O(n \cdot \log^{2+\epsilon}(n))$.
Appendix B: Convergence of the power iteration method
$L_w$ is real and symmetric. Therefore it has real eigenvalues $\lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n$ corresponding to eigenvectors $v_1, \dots, v_n$ which form an orthonormal basis of $\mathbb{R}^n$. One can easily show that $\lambda_1 = 0$ and $\lambda_n \leq 2$. So in order to compute $v_2$ we consider the matrix $\tilde{L} = 2 \cdot I - L_w$, which has the same eigenvectors $v_1, \dots, v_n$ as $L_w$. Now the corresponding eigenvalues are $\tilde{\lambda}_1 = 2 \geq \dots \geq \tilde{\lambda}_n = 2 - \lambda_n \geq 0$, and in particular $v_1$ corresponds to the largest eigenvalue and $v_2$ to the second largest eigenvalue.
If $v$ is a random vector uniformly drawn from the unit sphere and we force it to be perpendicular to $v_1$, we can write $v = \psi_2 v_2 + \dots + \psi_n v_n$, and $\psi_2 \neq 0$ almost surely. Furthermore, $\tilde{L}v = \tilde{\lambda}_2 \psi_2 v_2 + \dots + \tilde{\lambda}_n \psi_n v_n$, and if we set $v^{(k)} = \tilde{L}^k v / \|\tilde{L}^k v\|$, then $v^{(k)}$ converges with exponential speed to some eigenvector of $L_w$ with eigenvalue $\lambda_2$, because for every $i$ with $\lambda_i > \lambda_2$ we have $\tilde{\lambda}_i / \tilde{\lambda}_2 < 1$ and therefore $(\tilde{\lambda}_i / \tilde{\lambda}_2)^k \to 0$. Consequently, the Rayleigh quotient $\frac{v^{(k)T} L_w v^{(k)}}{v^{(k)T} v^{(k)}}$ converges to $\lambda_2$ with exponential speed.

Appendix C: Complexity
The complexity of the spectral bisection algorithm is the same as the complexity of the power iteration method. The complexity of the power iteration method equals the number of iterations $\eta(n)$ times the complexity of multiplying $\tilde{L}$ by $v$, that is, $O(\eta(n) \cdot n \cdot \bar{d})$, where $\bar{d}$ is the average degree of the network, or equivalently $O(|E| \cdot \eta(n))$, where $|E|$ is the number of edges.
Assuming that the spectral bisection algorithm always produces clusters of equal size, the complexity of the hierarchical spectral clustering algorithm is given by the sum of:
• The complexity of applying spectral bisection once on the whole network → $O(\eta(n) \cdot n \cdot \bar{d})$.
• The complexity of applying it on each of the two clusters obtained from the first application of spectral bisection, each of size $\frac{n}{2}$.
• The complexity of applying it on each of the 4 clusters obtained from the previous step, each of size $\frac{n}{4}$.
• ...
• The complexity of applying it on each of the $\frac{n}{2} = 2^{\log_2(n)-1}$ clusters obtained from the previous step, each of size $\frac{n}{2^{\log_2(n)-1}} = 2$.
Summing over all $\log_2(n)$ levels gives $\sum_{k=0}^{\log_2(n)-1} 2^k \cdot O\!\left(\eta(n) \cdot \frac{n}{2^k} \cdot \bar{d}\right) = O(\eta(n) \cdot n \cdot \bar{d} \cdot \log(n))$,

where we have made the pessimistic assumption that the number of iterations and the average degree are at each step as large as they were in the beginning. The choice of the function $\eta(n)$ is somewhat involved. If the initial random choice of the vector $v$ is very unfortunate, many iterations may be needed to obtain a good approximation of the eigenvector $v_2$. In fact, if $\psi_2 = 0$, the algorithm would not converge to $v_2$ at all; however, this event has probability 0.
Another condition that might slow down the computation of $v_2$ is when some of the other eigenvalues $\lambda_i$, $i \geq 3$, are close to $\lambda_2$. In that case $\tilde{\lambda}_i / \tilde{\lambda}_2$ is close to 1, and one can see from equation (11) that the corresponding $v_i$ might have a large contribution to $v^{(k)}$ for a long time. However, when $\lambda_i$ is close to $\lambda_2$, this also implies that $\frac{v_i^T L_w v_i}{v_i^T v_i} = \lambda_i$ is close to $\lambda_2 = \frac{v_2^T L_w v_2}{v_2^T v_2}$, and therefore $v_i$ also provides a good partition of the network, since these are the quantities that are related to the cut size. Due to this fast convergence, one can expect asymptotically good partitions when $\eta(n) = \log^{1+\epsilon}(n)$ with $\epsilon > 0$, giving the hierarchical spectral clustering algorithm a complexity of $O(n \cdot \bar{d} \cdot \log^{2+\epsilon}(n))$ in general and $O(n \cdot \log^{2+\epsilon}(n))$ for sparse networks.