We introduce a partition of the web pages that is particularly suited to PageRank problems in which the web link graph has a nested block structure. Based on this partition of the web pages into dangling nodes, common nodes, and general nodes, the hyperlink matrix can be reordered into a simpler block structure. Building on a parallel computation method, we then propose an algorithm for PageRank problems. In this algorithm, the dimension of the linear system is reduced, and the vector for the general nodes in each block can be computed separately in every iteration. Numerical experiments show that this approach speeds up the computation of PageRank.

The rapid growth of the World Wide Web has created a need for search tools. One of the best-known algorithms in web search is Google’s PageRank algorithm [

In this paper, we focus on Google’s PageRank algorithm and first introduce some notation. The web can be modeled as a directed graph with the web pages as the nodes and the hyperlinks as the directed edges. In the graph, if there is a link from page

If the web page

The problem is that if at least one node has zero outdegree, that is, no outlinks, then the Markov chain is absorbing, so a modification to
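As an illustration of the dangling-node modification discussed above, the following Python sketch runs the power method on a small hypothetical graph. It assumes the standard fix in which the probability mass on dangling rows is redistributed according to the personalization vector. The names `pagerank_power`, `alpha`, and `v`, the damping value 0.85, and the 4-node matrix are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def pagerank_power(H, alpha=0.85, v=None, tol=1e-10, max_iter=1000):
    """Power method for PageRank on a row-normalized hyperlink matrix H.

    Rows of H that are entirely zero are treated as dangling nodes.
    """
    n = H.shape[0]
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    dangling = H.sum(axis=1) == 0          # nodes with no outlinks
    pi = v.copy()
    for k in range(1, max_iter + 1):
        # Dangling mass and teleportation mass are both spread according to v,
        # so the full Google matrix never has to be formed explicitly.
        new = alpha * (pi @ H) + (alpha * pi[dangling].sum() + 1.0 - alpha) * v
        if np.abs(new - pi).sum() < tol:   # 1-norm of the update
            return new, k
        pi = new
    return pi, max_iter

# Hypothetical 4-page web; page 3 has no outlinks (dangling).
H = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1/3, 1/3, 0.0, 1/3],
    [0.0, 0.0, 0.0, 0.0],
])
pi, iters = pagerank_power(H)
print(pi, iters)
```

Working with the sparse matrix `H` plus a rank-one correction, rather than the dense Google matrix, is what keeps each iteration cheap on large graphs.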

Many methods are now available for computing the PageRank vector

Recently, the structure of the web link graph has attracted attention. Kamvar et al. in [

In this paper, we combine two ideas, the existence of dangling nodes and the block structure of the web, and exploit a new structure for the hyperlink matrix

Generally, the Google problem is to solve the eigenvector

Suppose that the matrix

Since the coefficient matrix

The rows in the matrix

Then, the coefficient matrix

(1) Partition the web nodes into dangling and nondangling nodes, so that the hyperlink matrix

(2) Solve

(3) Compute

(4)
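The four steps above can be sketched in Python as follows. The sketch assumes the usual dangling/nondangling reordering, in which the reordered hyperlink matrix has the block rows `[H11 H12]` and `[0 0]`, the nondangling part solves `pi1^T (I - alpha*H11) = v1^T`, and the dangling part is `pi2^T = alpha * pi1^T * H12 + v2^T`. The function name `reordered_pagerank`, the damping value, and the 5-node matrix are hypothetical. A direct dense solve is used here for brevity in place of whatever iterative solver the step may intend.

```python
import numpy as np

def reordered_pagerank(H, alpha=0.85, v=None):
    """PageRank via a dangling/nondangling partition of the nodes."""
    n = H.shape[0]
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)

    # Step 1: partition the nodes; zero rows of H are dangling.
    is_dangling = H.sum(axis=1) == 0
    nd = np.flatnonzero(~is_dangling)      # nondangling indices
    d = np.flatnonzero(is_dangling)        # dangling indices
    H11 = H[np.ix_(nd, nd)]                # nondangling -> nondangling
    H12 = H[np.ix_(nd, d)]                 # nondangling -> dangling

    # Step 2: solve pi1^T (I - alpha*H11) = v1^T (written in transposed form).
    A = np.eye(len(nd)) - alpha * H11
    pi1 = np.linalg.solve(A.T, v[nd])

    # Step 3: recover the dangling part without solving another system.
    pi2 = alpha * (pi1 @ H12) + v[d]

    # Step 4: reassemble in the original ordering and normalize to sum 1.
    pi = np.empty(n)
    pi[nd] = pi1
    pi[d] = pi2
    return pi / pi.sum()

# Hypothetical 5-page web; pages 1 and 4 are dangling.
H = np.array([
    [0.0, 0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1/3, 1/3, 0.0, 0.0, 1/3],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])
pi = reordered_pagerank(H)
print(pi)
```

The linear system in step 2 involves only the nondangling nodes, so its dimension shrinks by the number of dangling nodes.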

In this reordered PageRank Algorithm

(1) Partition the web nodes which form

blocks:

(2) Partition the given vector

according to the size of the

(3) Compute the limiting vector of

(a) Compute

(b) Solve for

(4) Compute

(5) Normalize

It is noted in [

For details of the web structure, see the experiments in [

Assume that a web link graph with dangling nodes removed has

If a node in a web link graph is neither a dangling node nor a common node, then we call it a general node. The nodes in a web link graph are thus divided into three classes: dangling nodes, common nodes, and general nodes. In particular, common nodes and general nodes are both nondangling nodes.
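The three-way classification can be sketched in Python. Since the formal definition of a common node is not reproduced here, the sketch assumes, purely for illustration, that a common node is a nondangling node with at least one link to or from a nondangling node in a different block. The function `classify_nodes`, the `block_of` labelling, and the 5-node graph are all hypothetical.

```python
import numpy as np

def classify_nodes(H, block_of):
    """Label each node 'dangling', 'common', or 'general'.

    H        : square link matrix (only the nonzero pattern is used)
    block_of : block index of each node (ignored for dangling nodes)

    ASSUMPTION: a common node is a nondangling node linked to or from a
    nondangling node of another block; this is an illustrative definition.
    """
    H = np.asarray(H)
    block_of = np.asarray(block_of)
    dangling = (H != 0).sum(axis=1) == 0   # no outlinks
    labels = []
    for i in range(H.shape[0]):
        if dangling[i]:
            labels.append("dangling")
            continue
        out_nbrs = np.flatnonzero(H[i] != 0)      # pages that i links to
        in_nbrs = np.flatnonzero(H[:, i] != 0)    # pages that link to i
        nbrs = np.concatenate([out_nbrs, in_nbrs])
        nbrs = nbrs[~dangling[nbrs]]              # blocks hold no dangling nodes
        crosses = np.any(block_of[nbrs] != block_of[i])
        labels.append("common" if crosses else "general")
    return labels

# Hypothetical graph: block 0 = {0, 1}, block 1 = {2, 3}, node 4 is dangling.
H = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],   # 1 -> 2 crosses from block 0 to block 1
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 1],   # 3 -> 4 points to a dangling node
    [0, 0, 0, 0, 0],
])
labels = classify_nodes(H, block_of=[0, 0, 1, 1, -1])
print(labels)
```

Under this assumed definition, the cross-block link from node 1 to node 2 makes both endpoints common nodes, while nodes 0 and 3 remain general.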

There is no dangling node in the blocks

In Figure

A separation of the common nodes for a web link graph which has four blocks.

The structure before the separation

The structure after the separation

Notice that the matrix in (

Notice that the matrix

As a result, the PageRank system in (

Some web link graphs are known to have a nested block structure. By the definition of a common node, the common nodes among the different blocks are then easy to find: it suffices to locate the nonzero entries in submatrices of

Note that there are no links among the new blocks

Since

In this section, we give an example to illustrate our algorithms.

For the dot plot graph of these three web link graphs, if there exists a link from node

There is a definite block structure to the web.

The individual blocks are much smaller than the entire web.

There are clear nested blocks.

For example, Figure

One of the three web link graphs, where the ratio of general nodes to common nodes is 7 : 3 in each subblock.

Then, in each experiment, we separate the nodes into dangling nodes, common nodes, and the rest (general nodes). The result of this process is a decomposition of the

A reordering of the submatrix

The web link graph of the submatrix

The web link graph of the submatrix

Based on the three experiment datasets, we compare Algorithm

Comparison of original PageRank, reordered PageRank and Algorithm

| Method | Metric | Dataset 1 | Dataset 2 | Dataset 3 |
|---|---|---|---|---|
| Reordered PageRank | Iterations | 81 | 80 | 72 |
| Reordered PageRank | Time (sec.) | 0.0377 | 0.0366 | 0.0400 |
| Original PageRank | Iterations | 20 | 25 | 31 |
| Original PageRank | Time (sec.) | 0.0292 | 0.0270 | 0.0305 |
| Algorithm | Iterations | 21 | 31 | 42 |
| Algorithm | Time (sec.) | 0.0165 | 0.0145 | 0.0187 |
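
To read the timings in the table as relative speedups, the short Python sketch below divides each baseline time by the time in the row labelled "Algorithm". The numbers are copied from the table. The dictionary layout and the helper name `speedups` are illustrative choices.

```python
# Running times in seconds, copied from the table (Datasets 1, 2, 3).
times = {
    "Reordered PageRank": [0.0377, 0.0366, 0.0400],
    "Original PageRank": [0.0292, 0.0270, 0.0305],
    "Algorithm": [0.0165, 0.0145, 0.0187],
}

def speedups(times, baseline, target="Algorithm"):
    """Ratio baseline_time / target_time for each dataset (larger is better)."""
    return [b / t for b, t in zip(times[baseline], times[target])]

for baseline in ("Original PageRank", "Reordered PageRank"):
    ratios = speedups(times, baseline)
    print(baseline, [round(r, 2) for r in ratios])
```

A ratio above 1 means the row labelled "Algorithm" took less time than the baseline on that dataset, even where its iteration count is higher.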

Comparison among the three algorithms which are run on three datasets.

Experiment 1 on dataset 1

Experiment 2 on dataset 2

Experiment 3 on dataset 3

It has been observed that the hyperlink graphs of some web pages have a nested block structure, as reported in [

The authors thank the referees and the editor for their helpful suggestions for revising this paper. This research is supported by the 973 Program (2013CB329404), NSFC (61370147, 61170311, and 61170309), the Chinese Universities Specialized Research Fund for the Doctoral Program (20110185110020), and the Sichuan Province Sci. & Tech. Research Project (2012GZX0080).