The Extrapolation-Accelerated Multilevel Aggregation Method in PageRank Computation

An accelerated multilevel aggregation method is presented for calculating the stationary probability vector of an irreducible stochastic matrix in PageRank computation, with a vector extrapolation method serving as the accelerator. We show how to periodically combine the extrapolation method with the multilevel aggregation method on the finest level to speed up the PageRank computation. Detailed numerical results illustrate the behavior of this method, and comparisons with typical existing methods are also made.


Introduction
The PageRank algorithm for assigning a rank of importance to web pages has been the key technique in web search [1]. The core of the PageRank computation is the principal eigenvector of a stochastic matrix representing the hyperlink structure of the web. Due to the large size of the web graph (over eight billion nodes [2]), computing PageRank faces a serious computational challenge, both in terms of the CPU and RAM resources required and in terms of the speed of updating; that is, as a new crawl is completed, its PageRank should soon be available for searching. Among the numerical methods for computing PageRank, the Power method is one of the standard choices for its stable and reliable performance [1], but its low rate of convergence is a serious drawback. Many acceleration techniques have been proposed to speed up the convergence of the Power method, including aggregation/disaggregation [3-5], vector extrapolation [6-10], multilevel methods [11-14], lumping [15, 16], Arnoldi-type methods [10, 17], and adaptive methods [4, 18]. Our work follows in this vein; that is, we seek a method to accelerate the convergence of the PageRank computation. In this paper, we introduce a multilevel aggregation method to accelerate the computation of PageRank, where the acceleration is performed by vector extrapolation, which combines previous iterates.
The remainder of this paper is organized as follows. Section 2 provides preliminaries and related work on PageRank and some acceleration techniques. Section 3 describes our main acceleration algorithm. Section 4 covers our numerical results and comparisons among the main PageRank algorithms. Section 5 contains conclusions and points out directions for future research.

Preliminaries and Related Works
2.1. Background on PageRank. PageRank [19] is "the heart of our software," as Google claims on its webpage, and continues to provide the basis for all of its web search tools. Link-based PageRank views the web as a directed graph, where each node represents a page and each edge from node i to node j represents a link from page i to page j; in this case one can view page j as "important." Page and Brin [1] made another basic assumption: the amount of importance distributed to j by i is proportional to the importance of i and inversely proportional to the number of pages i points to. This definition suggests an iterative fixed-point computation for determining the importance of every page on the web. Each page i has a fixed rank score x_i, forming the rank vector x. Page and Brin convert the computation of PageRank to the computation of the stationary distribution vector of the linear system x^T = x^T P, where P is the adjacency matrix of the web graph normalized so that each row sums to 1. For P to be a valid transition probability matrix, every node must have at least one outlink; that is, P should have no rows consisting of all zeros (pages without outlinks are called dangling nodes). Meanwhile, to guarantee the uniqueness of a unit-norm eigenvector corresponding to the eigenvalue 1, by the ergodic theorem for Markov chains [20], P should be aperiodic and irreducible. To resolve these problems, Page and Brin introduced a parameter c and replaced P by a new matrix P_c:

P_c = c(P + dv^T) + (1 - c)ev^T,

where e is the column vector with all entries equal to 1, v is the column vector representing a uniform probability distribution over all nodes, v = [1/n]_{n×1}, and d is the dangling indicator vector with d_i = 1 if page i is dangling and d_i = 0 otherwise, assuming the total number of pages is n. If one views a surfer as a random walker on the web, the constant c can be explained as follows: a web surfer visiting a page jumps to a random page on the web with probability (1 - c), rather than following an outlink. The matrix P_c is usually called the Google
matrix. Then, the PageRank vector x is the stationary distribution vector of P_c; that is,

x^T = x^T P_c.   (1)

Obviously, (1) can be rewritten as

Ax = 0,   (2)

where A = I - P_c^T, I is the identity matrix, and A is a singular M-matrix whose diagonal elements are the negated column sums of its off-diagonal elements. So what remains is to solve the linear system (1) or its homogeneous form (2), corresponding to an irreducible Markov chain.
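As an illustration, the construction of P_c and the fixed-point computation of its stationary vector can be sketched in Python. This is our own toy example under the definitions above; the 3-page matrix, function names, and tolerances are ours, not from the paper.

```python
import numpy as np

def google_matrix(P, c=0.85):
    """Form P_c = c(P + d v^T) + (1 - c) e v^T from a hyperlink matrix P
    whose nonzero rows are row-stochastic; d marks dangling (all-zero) rows."""
    n = P.shape[0]
    v = np.full(n, 1.0 / n)                  # uniform teleportation vector
    d = (P.sum(axis=1) == 0).astype(float)   # dangling-node indicator
    e = np.ones(n)
    return c * (P + np.outer(d, v)) + (1 - c) * np.outer(e, v)

def power_method(G, tol=1e-12, max_iter=10000):
    """Power iteration x^(k) = x^(k-1) G for the stationary vector."""
    n = G.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        x_new = x @ G
        x_new /= np.abs(x_new).sum()         # keep a probability vector
        if np.abs(x_new - x).sum() < tol:
            break
        x = x_new
    return x_new

# toy 3-page web: page 3 (index 2) is dangling
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
pi = power_method(google_matrix(P))
```

Every row of the resulting P_c sums to 1, so the iteration preserves probability vectors; the dangling row is repaired by the d v^T term before the teleportation mixing.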

2.2. The Vector Extrapolation Acceleration of PageRank. The Power method is the standard algorithm for PageRank: given the initial uniform distribution x^(0) of system (1), compute the successive iterates x^(k) = x^(k-1)P_c until convergence; the limit lim_{k→∞} x^(k) is exactly the PageRank vector. As mentioned in Section 1, although the Power method is stable and reliable [1], or even a fast iterative algorithm [21], accelerating the computation is still important, since every search engine crawls a huge number of pages and each matrix-vector multiplication per iteration is expensive, requiring considerable CPU and RAM resources; hence, the rate of convergence deteriorates as the number of pages grows. There are now many acceleration algorithms for the PageRank computation, among which the vector extrapolation methods are very popular and efficient. A detailed review of vector extrapolation methods can be found in the work of Kamvar et al., Smith et al., and Sidi [9, 22, 23]. To our knowledge, there are several kinds of extrapolation methods, such as quadratic extrapolation [9]; two polynomial-type methods, namely the minimal polynomial extrapolation (MPE) of Cabay and Jackson [24] and the reduced rank extrapolation (RRE) of Eddy [25]; and the three epsilon vector extrapolation methods of Wynn [26]. In recent years, many papers have discussed the application of vector extrapolation methods to computing the stationary probability vector of Markov chains and to web ranking problems; see [6, 8, 9, 22, 23, 27] for details. Numerical experiments in these papers suggest that the polynomial-type methods are in general more economical than the epsilon vector extrapolation methods in terms of computation time and storage requirements. For this reason, we consider a polynomial-type vector extrapolation, namely the generalization of the quadratic extrapolation method (GQE) proposed in [23]. Let us now briefly discuss the strategy of GQE. As it is
known, the starting point of an extrapolation method is to accelerate the convergence of the sequence {x_j} generated by a fixed-point iteration for the Markov chain, with initial guess x_0. Suppose that we have produced a sequence of iterates {x_j}, j ≥ 0. Then, at each outer iteration, let U be the matrix whose columns are the last m + 2 iterates, with the newest last, where m is called the window size as usual. It is evident that every column of U is a nonnegative vector of unit 1-norm. The problem to be solved is then transformed into obtaining a coefficient vector γ satisfying Σ_i γ_i = 1, which yields an updated probability vector as a linear combination of the stored iterates. That is to say, the current iterate can be expressed as a linear combination of some of the first eigenvectors, and, combined with the Power method, this combination converges to the principal eigenvector. GQE is derived in this light and can be given as follows [23].
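To make the polynomial-type strategy concrete, here is a minimal sketch of MPE, a close relative of GQE: the combination coefficients come from a least-squares problem on successive iterate differences and are normalized to sum to one. This is our own illustration under those assumptions, not the paper's Algorithm 1.

```python
import numpy as np

def mpe(X):
    """Minimal polynomial extrapolation (MPE) on a window of iterates.
    X has shape (n, m + 2) with columns x_0, ..., x_{m+1} produced by a
    fixed-point iteration.  Returns a combination of x_0, ..., x_m whose
    coefficients sum to one, renormalized as a probability vector."""
    U = np.diff(X, axis=1)                     # u_j = x_{j+1} - x_j
    m = U.shape[1] - 1
    # least squares: minimize || U[:, :m] c + u_m ||_2, then fix c_m = 1
    c, *_ = np.linalg.lstsq(U[:, :m], -U[:, m], rcond=None)
    c = np.append(c, 1.0)
    gamma = c / c.sum()                        # coefficients sum to one
    s = X[:, :m + 1] @ gamma
    return s / np.abs(s).sum()                 # normalization N(s)
```

For a 3-state chain, a window of four iterates (m = 2, the quadratic-extrapolation case) already recovers the stationary vector exactly, because the error of the iteration lies in a two-dimensional subspace annihilated by the degree-2 coefficient polynomial.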

Accelerated Aggregation Strategy in PageRank Computation
The core of the PageRank computation, as discussed earlier, is to solve the large sparse Markov chain problem (2). It is thus natural to use an especially useful class of methods, namely multilevel methods based on aggregation of Markov states, which have attracted considerable attention [11-14, 28, 29]. Isensee and Horton considered multilevel methods for the steady-state solution of continuous-time and discrete-time Markov chains in [13, 14], respectively. De Sterck et al. proposed a multilevel adaptive aggregation method for calculating the stationary probability vector of Markov matrices in [11]; as shown in their work, it is a special case of the adaptive smoothed aggregation [30] and adaptive algebraic multigrid methods [31] for sparse linear systems.
The central idea of multilevel aggregation method is to convert a large linear system to a smaller one by some aggregation strategies, and thus, the stationary state solution can be obtained in an efficient way.
However, aggregation/disaggregation cannot always resolve the algorithmic scalability problem, due to the poor approximation of A in problem (2) by unsmoothed intergrid operators, so a careful choice of aggregation strategy is crucial for the efficiency of the multilevel aggregation hierarchy. There are many aggregation methods, such as graph aggregation [32], neighborhood-based aggregation, and (double) pairwise aggregation [4, 13, 14]. In our study, we consider the neighborhood-based aggregation method as described in [12-14], since it results in well-balanced aggregates of approximately equal size and provides a more regular coarsening throughout the automatic coarsening process [12, 29, 33]. Our aggregation strategy is based on the problem matrix scaled by the current iterate, Ā = A diag(x), rather than on the original coefficient matrix A. We say node i is strongly connected to node j in the graph of Ā if

|ā_ij| ≥ θ max_{k≠i} |ā_ik|,

where θ is a strength-of-connection parameter; θ = 0.25 is used here. Let N_i denote the set of all points that are strongly connected to i in the graph of Ā, including node i itself. Then, we have the neighborhood-based aggregation algorithm as follows (see [12, 29] for more details).
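A minimal sketch of a neighborhood-based pass in this spirit follows. It is our simplified reading, not the paper's exact Algorithm 2; the threshold `theta` plays the role of the strength parameter θ above.

```python
import numpy as np

def neighborhood_aggregation(A, theta=0.25):
    """Partition the nodes of the graph of A into aggregates built from the
    strong neighborhoods N_i = {i} ∪ {j : |a_ij| >= theta * max_{k != i} |a_ik|}.
    Returns an aggregate index for every node."""
    n = A.shape[0]
    B = np.abs(np.asarray(A, dtype=float))
    np.fill_diagonal(B, 0.0)          # strength is measured off-diagonal
    nbrs = []
    for i in range(n):
        m = B[i].max()
        strong = set(np.nonzero(B[i] >= theta * m)[0]) if m > 0 else set()
        strong.add(i)                 # N_i includes node i itself
        nbrs.append(strong)
    agg = -np.ones(n, dtype=int)
    n_agg = 0
    # pass 1: an entirely unassigned neighborhood becomes a new aggregate
    for i in range(n):
        if all(agg[j] == -1 for j in nbrs[i]):
            for j in nbrs[i]:
                agg[j] = n_agg
            n_agg += 1
    # pass 2: leftover nodes join the aggregate of a strong neighbor
    for i in range(n):
        if agg[i] == -1:
            assigned = [j for j in nbrs[i] if agg[j] != -1]
            if assigned:
                agg[i] = agg[assigned[0]]
            else:
                agg[i] = n_agg
                n_agg += 1
    return agg
```

The aggregation matrix Q is then recovered by setting q_{i, agg[i]} = 1 and zeros elsewhere.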
Note that the aggregation matrix Q in Algorithm 2 has the following form: q_ij = 1 if node i belongs to the jth aggregate, and q_ij = 0 otherwise. From (8), we find that there is exactly one element q_ij = 1 in each row, whereas each column may contain several such elements, and the sum of the elements in the jth column equals the number of nodes combined into the jth aggregate.
The multilevel aggregation method, based on the neighborhood-based aggregation strategy, introduced by De Sterck et al. in [11,12,28,29] can be expressed as follows.
(2) Build Q according to the automatic coarsening process described below, and obtain the scaled problem matrix Ā ← A diag(x).
In Algorithm 3, A = A_1 is given in system (2), x is an initial guess vector, A_l denotes the level-l matrix, and x_l is the corresponding level-l vector, where l = 1, 2, ..., M, M is the number of levels, the finest level is 1, and the coarsest level is M. Q is the aggregation matrix generated by the aggregation method, Algorithm 2, and P_l and R_l, l = 1, 2, ..., M - 1, are the prolongation and restriction operators, respectively, which are created by the automatic coarsening process in step 2; the coarse-level matrices are then obtained from A_{l+1} = R_l × A_l × P_l. We take the weighted Jacobi method with weight ω, a variant of the Jacobi method, as the pre- and postsmoother, with ν_1 and ν_2 smoothing iterations, respectively. The matrix A in system (2) is split as

A = D - L - U,

where D is the diagonal part of A with d_ii > 0 for all i, and L and U are the negated strictly lower- and upper-triangular parts of A, respectively. The weighted Jacobi relaxation can then be written as

x ← (1 - ω)x + ωD^{-1}(L + U)x,

with weight ω ∈ (0, 1). Here, we use N(·) to denote the normalization operator defined by N(x) := x/‖x‖_1 for all x ≠ 0. Note that we carry out the normalization after each relaxation step in Algorithm 3 to ensure that the fine-level iterates x_l can be interpreted as approximations to the stationary probability vector.
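The weighted Jacobi relaxation and the normalization N(·) can be sketched as follows. This is a toy dense implementation of the update above (the paper's setting is, of course, sparse):

```python
import numpy as np

def weighted_jacobi(A, x, omega=0.7, sweeps=1):
    """Weighted Jacobi relaxation for A x = 0 with the splitting
    A = D - L - U, followed by the normalization N(x) = x / ||x||_1."""
    D = np.diag(A)                       # diagonal part, assumed positive
    off = A - np.diag(D)                 # equals -(L + U)
    for _ in range(sweeps):
        x = (1.0 - omega) * x + omega * (-(off @ x)) / D
        x = x / np.abs(x).sum()          # keep a probability vector
    return x
```

Because the iteration matrix (1 - ω)I + ωD^{-1}(L + U) is nonnegative with spectral radius 1 for a singular M-matrix, repeated normalized sweeps behave like a damped power iteration toward the stationary vector.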
It should be noted that the direct method on the coarsest level, used at step 4 of Algorithm 3, is based on the following theorem.
Theorem 4 (see [34, Theorem 4.16]). If A is an irreducible singular M-matrix, then each of its principal submatrices other than A itself is a nonsingular M-matrix.
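Theorem 4 is what makes a direct coarsest-level solve well posed: deleting the last row and column of an irreducible singular M-matrix leaves a nonsingular system. A sketch of such a solve (our own, for a dense toy matrix):

```python
import numpy as np

def coarsest_level_solve(A):
    """Direct solve of A x = 0 for an irreducible singular M-matrix A.
    By Theorem 4 the leading principal submatrix is nonsingular, so fix
    the last entry of x to 1, solve for the rest, and normalize."""
    n = A.shape[0]
    B = A[:n - 1, :n - 1]                # nonsingular principal submatrix
    b = -A[:n - 1, n - 1]                # move the last column to the rhs
    x = np.empty(n)
    x[:n - 1] = np.linalg.solve(B, b)
    x[n - 1] = 1.0
    return x / np.abs(x).sum()           # normalize to a probability vector
```

The last equation of A x = 0 is satisfied automatically because the columns of A sum to zero, so solving the first n - 1 equations suffices.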
If A_M is the coarsest-level operator, then we use the direct method presented in the coarsest-level algorithm below to solve the coarsest-level equation A_M x_M = 0.
Step 4. Compute x(1 : n-1) := A(1 : n-1, 1 : n-1) \ b, where b := -A(1 : n-1, n); let x(n) = 1; and obtain the coarsest-level solution x = x/‖x‖_1.

Now, let us introduce the main idea of this paper and the implementation details of the main algorithm. In fact, our idea derives from the excellent papers [12, 28, 29]. In that literature, De Sterck et al. studied several strategies to accelerate the convergence of the multilevel aggregation method, including the application of a smoothing technique to the interpolation and restriction operators in [28] and a recursively accelerated multilevel aggregation method that solves quadratic programming problems with inequality constraints in [29]. Of particular note, in [12] they introduced a top-level acceleration of adaptive algebraic multilevel methods by selecting a linear combination of old iterates that minimizes a functional over a subset of the set of probability vectors. Their acceleration strategy mainly covered the case of window size m = 2; for larger window sizes, such as m = 3 or 4, they used the active set method from Matlab's quadprog function [35]. With this in mind, it is of interest to look for a general acceleration method for any window size. In view of the effectiveness of vector extrapolation in improving the convergence of iterate sequences, we consider a new accelerated multilevel aggregation method, combining the multilevel aggregation method and the vector extrapolation method. Given the finest-level iterate sequence {x_k} produced by Algorithm 3 in the multilevel aggregation, the updated iterate x is generated as a linear combination of these iterates by Algorithm 1. We can thus present our accelerated multilevel aggregation method as follows.
Note that m in Algorithm 5 is the window size and tol is a prescribed tolerance. In particular, there is a difference in constructing the matrix U between our Algorithm 5 and the algorithm of [36]. The main reason, from Algorithm 1, is that GQE needs m + 2 vectors x_0, x_1, ..., x_{m+1} as its input. That is to say, when the window size is m = 2, four approximate probability vectors, given as the matrix U = [x_0, x_1, x_2, x_3] ∈ R^{n×4}, are required as the input of the GQE algorithm. In particular, the window size m = 2 corresponds to the quadratic extrapolation method presented in [9]. Hence, the matrix U is given in the form of step 4 of Algorithm 5. The efficiency of our accelerated multilevel aggregation algorithm will be shown in the next section, where its advantage over Matlab's quadprog function on a Markov chain problem and over other methods in PageRank computation will be demonstrated.

Numerical Results and Analysis
In this section, we report some numerical experiments to show the numerical behavior of extrapolation-accelerated multilevel aggregation methods.All the numerical results are obtained with a Matlab 7.0.1 implementation on a 2.93 GHz processor with 2 GB main memory.
For the sake of fairness, the initial guess vector is chosen as x_0 = e/‖e‖_1 for all the algorithms. All iterations are terminated when the residual 1-norm satisfies

‖Ax_k‖_1 / ‖Ax_0‖_1 ≤ tol,   (11)

where x_k is the current approximate solution and tol is a user-prescribed tolerance. For convenience, in all the following tables we abbreviate the Power method, the generalization of the quadratic extrapolation method, and the extrapolation-accelerated multilevel aggregation method as Power, GQE, and EAMA, respectively. We use "res" to denote the 1-norm residual, "lev" the number of levels in the multilevel aggregation, "ite" the number of iterations, and "CPU" the CPU time in seconds.
Example 1. As described in Section 3, the active set method from Matlab's quadprog function [12, 35] is usually used as the acceleration strategy in Markov chains and other problems. In this paper, we have suggested the extrapolation-acceleration strategy instead of Matlab's. This example compares the two methods in order to test the efficiency of our new method. The test problem is a birth-death chain with invariant birth and death rates, as shown in Figure 1 [37-39], which is often used in queuing theory, demographics, performance engineering, and biology [32, 40].

Setting μ = 0.96, the relaxation parameter ω = 0.7, and the strength-of-connection parameter θ = 0.25 in our experiment, we use Q-matlab to denote Matlab's quadprog-function acceleration method, which improves the cycles on the finest level, and n to denote the length of the Markov chain. C_op denotes the operator complexity of the last cycle, defined as the sum of the numbers of nonzero entries in the operators A_l on all levels l = 1, ..., M, divided by the number of nonzero entries in the finest-level operator A_1; that is, C_op = Σ_{l=1}^{M} nnz(A_l)/nnz(A_1), where nnz(A) is the number of nonzero entries in the matrix A. Numerical results for the EAMA and Q-matlab methods are reported in Table 1.
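The definition of C_op translates directly into code; a short sketch over a hypothetical list of level operators (dense here for simplicity):

```python
import numpy as np

def operator_complexity(ops):
    """C_op = sum_l nnz(A_l) / nnz(A_1) for a hierarchy ops = [A_1, ..., A_M]."""
    return sum(np.count_nonzero(A) for A in ops) / np.count_nonzero(ops[0])
```

A value close to 1 means the coarse levels add little storage and work on top of the finest-level operator.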
From Table 1, it is evident that our accelerated multilevel aggregation method EAMA outperforms Matlab's function Q-matlab on the test problem, from both the operator complexity and the iteration count perspective; moreover, our method becomes more efficient as the length of the Markov chain increases. For instance, when n = 1024 and the window size m = 2, 3, 4, the EAMA method cuts the number of iterations by up to 37.7%, 20%, and 35.4%, respectively, compared to the Q-matlab method.
Example 2. In this example, we consider the performance of EAMA in the PageRank computation. We take a typical website, "Hollions," as our numerical example; it is listed in Chapter 14 of [41] and contains 6012 web pages and 23875 hyperlinks. We compare the Power method, the generalization of the quadratic extrapolation method, and the extrapolation-accelerated multilevel aggregation method on this PageRank problem.
In Algorithm 5, we consider V-cycles; other cycling strategies, such as W-cycles (γ = 2), can be treated in a similar way. Some specific parameter settings are used. We use the weighted Jacobi method as the pre- and postsmoother in Algorithm 3, with relaxation parameter ω = 0.7. The coarsest-level equations are solved directly by the coarsest-level algorithm, and the strength-of-connection parameter is chosen as θ = 0.25. The numerical results are reported in Tables 2 and 3 below.
The residual analysis in Table 2 indicates that EAMA performs much better in PageRank computation than the other two methods, namely the classic Power method and GQE, and EAMA's advantage over them becomes more pronounced as the iteration count increases.
From Table 3, when an error tolerance is prescribed, no matter what value the window size takes, the accelerated multilevel aggregation method outperforms Power and GQE in terms of both iteration count and CPU time. For instance, when the window size is 3, the number of iterations for EAMA is about half that of GQE and less than one-third that of Power.
Example 3. The test matrix is the widely used CS-Stanford Web matrix, which is available from http://www.cise.ufl.edu/research/sparse/matrices/index.html. It contains 9914 nodes and 35,555 links. In this example, we run the Power method, the generalized quadratic extrapolation method, and our extrapolation-accelerated multilevel aggregation method. The convergence tolerance is tol = 10^-5, and the window size is selected as m = 3. Table 4 lists the results.
It is seen from Table 4 that the numerical behavior of the three algorithms strongly depends on the choice of the damping factor c in the PageRank computation. When c is high, say 0.95 or 0.99, the computing time and iteration count are sizable. However, when c is moderate, say 0.85, the computing time and iteration count are greatly reduced, to about one-seventh of those for c = 0.99. This confirms Page and Brin's point [1] that c = 0.85 is the most common choice. Besides, just as expected, the GQE method is superior to the Power method, while the new multilevel aggregation method (EAMA) performs best in terms of both computing time and iteration count. For instance, when c is 0.85, compared with the results of Power and GQE, the CPU time for EAMA is reduced by 36.4% and 30.9%, respectively, and the iteration count is reduced by 64.1% and 33.3%, respectively.

Example 4. This example examines the influence of the choice of the convergence tolerance on the algorithms. The test matrix is the Stanford-Berkeley Web matrix (available from http://www.stanford.edu/∼sdkamvar/research.html). It contains 683,446 pages and 7.6 million links. We run the Power method, the generalized quadratic extrapolation method, and our extrapolation-accelerated multilevel aggregation method on this problem and choose tol to be 10^-5, 10^-6, 10^-8, and 10^-10, respectively. The window size for the extrapolation procedure is m = 3, and c in PageRank is set to 0.85. Table 5 reports the results obtained.
It is easy to see from Table 5 that our new algorithm, EAMA, performs best in most cases, regardless of the convergence tolerance, in terms of both computing time and iteration count. For instance, when tol varies from 10^-6 to 10^-8, the iteration counts needed by the three algorithms (Power, GQE, and EAMA) increase by 24.1%, 18.0%, and 17.3%, respectively; nonetheless, the count for EAMA is still less than the corresponding count for GQE and far less than that for the Power method, about 74.2% of the latter. Much the same holds for the CPU time.

Conclusions
This paper presents an accelerated multilevel aggregation method for calculating the stationary probability vector of an irreducible stochastic matrix, the Google matrix, in PageRank computation. The method combines the vector extrapolation method with the multilevel aggregation method, where the neighborhood-based aggregation strategy [11, 12, 28, 29] is used and the vector extrapolation acceleration is carried out on the finest level. Our approach is inspired by De Sterck et al. [12], where the active set method from Matlab's quadprog function was used for window sizes greater than two. It is thus natural to seek a general acceleration method for any window size, and our EAMA is the result.
Although our method is effective, there are still many directions worth exploring. The Google matrix, for example, can be reordered according to the dangling and nondangling nodes of the matrix [16, 42-44], which reduces the computation of PageRank to the solution of a much smaller problem. And for the special linear system (2) in Markov chains, there may be even better options than the neighborhood-based aggregation strategy in the multilevel method. With such improvements, our algorithm will be worth watching; this will be part of our future work.

Table 1 :
Numerical comparison results for EAMA and Q-matlab for Example 1.

Table 2 :
Comparisons of the residual vectors for the Power, GQE, and EAMA methods when given the number of iterations.

Table 3 :
Comparisons of the iteration counts and CPU for the Power, GQE, and EAMA methods when the residual vector is 10 −5 .

Table 4 :
Numerical results of the three algorithms (with various ) on the CS-Stanford Web matrix.

Table 5 :
Numerical comparison results of the three algorithms (with various convergence tolerances) for Example 4.