Link Prediction via Convex Nonnegative Matrix Factorization on Multiscale Blocks

Low-rank matrix approximations have been used for link prediction in networks, but they are usually global methods that do not exploit local information. Block structure is an important local feature of matrices: entries in the same block have similar values, which implies that links are more likely to be found within dense blocks. We use this insight to build a probabilistic latent variable model that finds missing links by combining convex nonnegative matrix factorization (CNMF) with block detection. Experiments show that this method gives better prediction accuracy than CNMF alone. Unlike the low-rank approximation methods previously used for link prediction, CNMF produces sparse solutions, which is in accord with the sparsity of most real complex networks. To scale to massive networks, we use the block information to map matrices onto distributed architectures and give a divide-and-conquer prediction method. Experiments show that it gives better results than the common neighbors method when the networks have a large number of missing links.


Introduction
As a fundamental problem in network research, link prediction attempts to estimate the likelihood of a relationship between two individuals from the observed links and the properties of the nodes. Research on this problem benefits a variety of fields. For example, researchers in different areas can efficiently find collaborators or assistants. Security agencies can focus their efforts more precisely on probable relationships in malicious networks. In social networks, people can find companions based on predictions from their surrounding networks.
The natural framework for link prediction is the similarity-based algorithm. Simple similarity-based measures, such as neighborhood-based measures (for example, the Adamic-Adar score [1] and common neighbors [2]), consider the local structure of the network. Recently, a considerable amount of work on community structure and scalable proximity estimation [3,4] has given good prediction accuracy. Path-based similarity measures, for example, Katz [5] and Rooted PageRank [6,7], focus on the global structure of the network; they are more effective but have high computational complexity. A newer measure based on neighbor communities performs well with low complexity [8]. Maximum likelihood methods, such as the hierarchical structure model [3] and the stochastic block model [9,10], presuppose some organizing principles of the network structure. Other algorithms, such as probabilistic relational models [11], probabilistic entity-relationship models [12], and stochastic relational models [13], learn the underlying structure from the observed network and then predict the missing links. Lichtenwalter et al. [14] designed a more localized, flow-based method for link prediction. Low-rank matrix approximations can also be used for link prediction [15][16][17]. Based on cluster low-rank approximation for massive graphs, Shin et al. proposed a multiscale link prediction method [18] that captures the global structure of the network and handles massive networks quickly.
To capture both the global structure and the clustering structure of a network, we consider low-rank approximations together with blocks in the network's adjacency matrix. Low-rank approximation algorithms are good at extracting the global information of a matrix. Meanwhile, block structure is an important feature of matrices, and links in the same block often have similar properties. Indeed, links are easy to find in dense blocks. Good block detection algorithms are error tolerant: they are unaffected by a few missing edges in a network. This suggests that the principle of block detection could be applied to edge prediction.
Theoretically, we propose a probabilistic latent variable model that combines block structure with low-rank matrix approximation. The model provides a framework for predicting links. First, any modularity clustering algorithm can be used to generate blocks; the only limit is computational complexity. Then, unlike the low-rank approximation algorithms already used for link prediction, we use a different low-rank approximation algorithm, convex nonnegative matrix factorization (CNMF) [19], to obtain predictions within the blocks. We use CNMF because its solutions are sparse, which is in accord with the sparsity of most real complex networks, so the predictions are more reliable. In small networks, we use K-means to detect the block structure of a network's adjacency matrix and average the prediction matrices over several values of K to obtain the final prediction. Experiments show that our method outperforms CNMF alone. For massive networks, applying CNMF directly is infeasible because of its high computational complexity. In this case, we use the block information to map matrices onto distributed architectures and give a divide-and-conquer prediction method that takes advantage of distributed computing.

Background
A network of $n$ nodes can be represented by an $n \times n$ adjacency matrix $X$. Here we set the diagonal entries to 1, which means each node has a link to itself. This adjacency matrix can be treated as an object-feature matrix. The reduced-rank method CNMF gives an approximation $X \approx XWG$, where $W$ is $n \times k$, $G$ is $k \times n$, and $k \ll n$. This approximation has an interesting property: if $X$ is sparse, the factors $W$ and $G$ both tend to be sparse. CNMF has a direct interpretation: the $n \times k$ matrix $F = XW$ is a convex combination of the columns of $X$, so we can interpret its columns as weighted sums of certain objects' coordinates (the coordinates of the $i$th object are given by the $i$th column of $X$). The columns of $F$ can therefore be treated as cluster centroids of objects, and $W$ weights the association between objects and clusters. Meanwhile, $G$ measures the strength of the relationship between clusters and features; that is, $G_{lj} = 1$ if cluster $l$ has feature $j$, and $G_{lj} = 0$ otherwise. So $(XWG)_{ij}$ measures the strength of the relationship between object $i$ and feature $j$, and can be used to predict a link between nodes $i$ and $j$.
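The interpretation above can be made concrete with a minimal NumPy sketch. It assumes the Convex-NMF multiplicative update rules of Ding, Li and Jordan, and it stores $G$ as a $k \times n$ matrix (the transpose of the $n \times k$ factor in their formulation), so the link scores are `X @ W @ G`. The function name `cnmf`, the random initialization, the iteration count, and the small `eps` guard are illustrative choices, not details taken from the text.

```python
import numpy as np


def cnmf(X, k, n_iter=300, seed=0, eps=1e-12):
    """Convex NMF, X ~= X @ W @ G, for a nonnegative m x n matrix X.

    W (n x k) mixes columns of X into k centroids F = X @ W; G (k x n) links
    clusters to features. Because X >= 0, the Gram matrix Y = X^T X has no
    negative part, so Ding et al.'s general updates reduce to the forms below.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    Y = X.T @ X
    W = rng.random((n, k)) + 0.1   # strictly positive start for multiplicative updates
    G = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        WtY = W.T @ Y
        G *= np.sqrt(WtY / (WtY @ W @ G + eps))
        W *= np.sqrt((Y @ G.T) / (Y @ W @ (G @ G.T) + eps))
    return W, G


# Two triangles joined by the edge 2-3; the diagonal is set to 1 as in the text.
X = np.eye(6)
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    X[a, b] = X[b, a] = 1.0
W, G = cnmf(X, k=2)
scores = X @ W @ G   # scores[i, j] serves as the link score for the pair (i, j)
```

Because every entry of $X^{T}X$ is nonnegative, only one term survives in each numerator and denominator of the general Convex-NMF update, which keeps the sketch short.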

A Probabilistic Latent Variable Model
Although the background above gives an intuitive interpretation of CNMF for link prediction, we still need a theoretical guarantee. Here we propose a probabilistic latent variable model which ensures that the probability of a link between two nodes can be expressed as a combination of CNMF and the block structure of a given adjacency matrix.
From a probabilistic point of view, the observed network data are a realization of an underlying probabilistic model, either because the data are themselves the result of a stochastic process or because the sampling is uncertain. We can think of the adjacency matrices of the network data $\mathcal{X} = \{X^{(t)}, t = 1, \dots, T\}$ as $T$ object-feature matrices for objects $\{o_i, i = 1, \dots, n\}$ and features $\{f_j, j = 1, \dots, m\}$. In this paper, $\mathcal{X}$ contains an adjacency matrix and its blocks found by clustering methods such as K-means. The joint occurrence probability of an object and a feature can be factorized as
$$P(o = i, f = j) = \sum_{t=1}^{T} P(o = i, f = j \mid s = t)\, P(s = t),$$
where $o$ is a variable for the index of objects, $f$ is a variable for the index of features, and $s$ is a variable for the index of sampling. $P(o = i, f = j \mid s = t)$ is the joint occurrence probability conditional on the observation $X^{(t)}$, and $P(s = t)$ is the prior probability that $X^{(t)}$ is observed. Objects in real data are often organized into modules or clusters, and the probability that an object has a feature depends on the groups to which they belong. These cluster memberships are unknown to us; in the language of statistical inference, they are latent variables. Assuming each cluster is a combination of objects, the joint occurrence probability can be factorized as
$$P(o = i, f = j) = \sum_{t=1}^{T} \sum_{l=1}^{k} P(o = i, c = l, s = t)\, P(f = j \mid c = l, s = t),$$
where $c$ is the variable for the index of clusters. Here we assume that the random variables $o$ and $f$ are conditionally independent given $c$.
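Define a random variable
$$\delta_{ij} = \begin{cases} 1, & \text{if the observed pair is } (o = i, f = j),\\ 0, & \text{otherwise.} \end{cases}$$
If we observe once, the expected value is $E[\delta_{ij}] = P(o = i, f = j)$. Let $N_{ij}$ be the random variable counting the occurrences of $(o = i, f = j)$ in $N = \sum_{t=1}^{T} \sum_{i,j} x^{(t)}_{ij}$ observations. Then the expected value is
$$E[N_{ij}] = N\, P(o = i, f = j) = N \sum_{t=1}^{T} \sum_{l=1}^{k} P(o = i, c = l, s = t)\, P(f = j \mid c = l, s = t).$$
For interpretability, we suppose that the joint occurrence probability of the $i$th object, the $l$th cluster, and the $t$th data sample is given by a combination of the data $x^{(t)}_{ip}$ as follows:
$$P(o = i, c = l, s = t) = \frac{1}{N} \sum_{p=1}^{m} x^{(t)}_{ip}\, w_{plt}, \tag{7}$$
where $w_{plt} \ge 0$ and
$$\sum_{l=1}^{k} w_{plt} = 1 \quad \text{for all } p, t. \tag{8}$$
Constraint (8) ensures that the probability defined by (7) is well defined. It also lets us interpret $P(o = i, c = l, s = t)$ as a weighted sum of the joint occurrence probabilities of objects, features, and data, given by
$$P(o = i, f = p, s = t) = \frac{x^{(t)}_{ip}}{N}.$$
Therefore, $E[N_{ij}]$ can be expressed as
$$E[N_{ij}] = \sum_{t=1}^{T} \sum_{l=1}^{k} \sum_{p=1}^{m} x^{(t)}_{ip}\, w_{plt}\, g_{jlt},$$
where $g_{jlt} = P(f = j \mid c = l, s = t)$. Our goal is to compute the two third-order tensors $W = (w_{plt})$ and $G = (g_{jlt})$.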
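If only the adjacency matrix is input, we can drop the sampling index, and the expected counts can be written in matrix form as
$$E[N] = XWG, \qquad W_{pl} = w_{pl}, \quad G_{lj} = g_{jl},$$
with $\sum_{l} W_{pl} = 1$ for every $p$ and $\sum_{j} G_{lj} = 1$ for every $l$. We now show that this factorization is equivalent to CNMF. For a CNMF solution $(\tilde{W}, \tilde{G})$, $\sum_{j} \tilde{G}_{lj} = 1$ does not hold in general, whereas $\sum_{p} \tilde{W}_{pl} = 1$ holds for every $l$. Let $D_{\tilde{G}}$ be the diagonal matrix containing the row sums of $\tilde{G}$. Then $X\tilde{W}\tilde{G} = X(\tilde{W}D_{\tilde{G}})(D_{\tilde{G}}^{-1}\tilde{G})$, the rows of $D_{\tilde{G}}^{-1}\tilde{G}$ sum to 1, and the matrix $\tilde{W}D_{\tilde{G}}$ approximately satisfies (8). So the rescaled factorization satisfies the constraints of the model. If blocks are also input, the model can be solved as
$$E[N_{ij}] = \sum_{t=1}^{T} \bigl(X^{(t)}\tilde{W}^{(t)}\tilde{G}^{(t)}\bigr)_{ij},$$
where $\tilde{W}^{(t)}$ and $\tilde{G}^{(t)}$ are CNMF solutions for $X^{(t)}$, and each block's product contributes only to the entries covered by that block. This can be proved in the same way as the case in which only the adjacency matrix is input.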
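The rescaling step can be checked numerically with a small sketch, assuming $G$ is stored as a $k \times n$ matrix so that its row sums give one value per cluster. The helper name `to_probabilistic` is hypothetical. It only demonstrates that the product is unchanged and that the rows of $D^{-1}G$ sum to 1; it does not verify the claim that $WD$ approximately satisfies the row-sum constraint, which depends on $(W, G)$ being an actual CNMF solution.

```python
import numpy as np


def to_probabilistic(W, G):
    """Rescale a CNMF solution so that X @ W @ G is unchanged.

    D = diag(row sums of G). G' = D^{-1} G has rows summing to 1, matching
    g = P(f = j | c = l); W' = W D absorbs the scale.
    """
    d = G.sum(axis=1)
    d = np.where(d > 0, d, 1.0)   # leave an all-zero cluster row unscaled
    return W * d, G / d[:, None]  # W * d multiplies column l of W by d[l]


rng = np.random.default_rng(1)
W = rng.random((5, 3))
G = rng.random((3, 5))
W2, G2 = to_probabilistic(W, G)
# Row sums of W D; the text argues these are close to 1 only for a real CNMF solution.
print(W2.sum(axis=1))
```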
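CNMF seeks a solution to $\min \|X - XWG\|^2$ over nonnegative $W$ and $G$; its multiplicative update rules decrease this objective monotonically and converge to a local minimum rather than a guaranteed global optimum. For an $m \times n$ matrix, the computational complexity of CNMF (the most time-consuming step in our method) is of order $O(n^2 m + t(2n^2 k + nk^2))$ for the $n \times k$ factor $W$ and of order $O(2tn^2 k + 2tnk^2)$ for the $k \times n$ factor $G$, where $t$ is the number of iterations [19].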

Algorithms
Given the network data, missing links are predicted by calculating (15) in three steps. First, partition the observed adjacency matrix into $K^2$ blocks using any modularity clustering algorithm. Second, approximate each block with CNMF to obtain a prediction matrix. Third, sum the corresponding entries of the prediction matrices over all values of $K$ to make the final prediction. For small networks we call our method K-CNMF, because we use K-means to partition the matrix. The diagram of K-CNMF for $K = 1, 2, 3, \dots$ is shown in Figure 1.
The purpose of the first step is to use structural information of the observed network at several scales. For small networks, the CNMF approximation can also be computed directly on the original matrix (the single block generated by K-means with $K = 1$) to use the global information. A simple interpretation of our method is that if an edge is predicted at many scales, it is a missing link with high probability. K-CNMF takes two parameters: the desired rank $k$ and the scale parameter $K$. Algorithm 1 shows the K-CNMF algorithm.
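The three steps of K-CNMF can be sketched as follows. This is an illustration under stated assumptions, not the authors' Algorithm 1: K-means is run on the rows of the observed adjacency matrix (a plain Lloyd iteration, written out to avoid extra dependencies), each of the resulting blocks is approximated with CNMF at rank `min(k, block dimensions)`, and the block predictions are summed over $K = 1, \dots, K_{\max}$. The names `k_cnmf`, `kmeans`, and `K_max` are hypothetical, and `cnmf` uses the same Convex-NMF updates assumed in the Background sketch.

```python
import numpy as np


def cnmf(X, k, n_iter=200, seed=0, eps=1e-12):
    """Convex NMF X ~= X W G (W: n x k, G: k x n) for a nonnegative X."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    Y = X.T @ X
    W = rng.random((n, k)) + 0.1
    G = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        WtY = W.T @ Y
        G *= np.sqrt(WtY / (WtY @ W @ G + eps))
        W *= np.sqrt((Y @ G.T) / (Y @ W @ (G @ G.T) + eps))
    return W, G


def kmeans(X, K, n_iter=50, seed=0):
    """Lloyd's k-means on the rows of X; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(K):
            if np.any(labels == c):   # an empty cluster keeps its old centre
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def k_cnmf(X, k, K_max):
    """Sum the block-wise CNMF predictions over the scales K = 1, ..., K_max."""
    n = X.shape[0]
    total = np.zeros((n, n))
    for K in range(1, K_max + 1):
        labels = kmeans(X, K)
        groups = [np.flatnonzero(labels == c) for c in np.unique(labels)]
        P = np.zeros((n, n))
        for rows in groups:           # up to K x K blocks: rows from one cluster,
            for cols in groups:       # columns from another
                B = X[np.ix_(rows, cols)]
                W, G = cnmf(B, min(k, *B.shape))
                P[np.ix_(rows, cols)] = B @ W @ G
        total += P
    return total


X = np.eye(8)
for a, b in [(0, 1), (0, 2), (1, 2), (1, 3), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]:
    X[a, b] = X[b, a] = 1.0
S = k_cnmf(X, k=2, K_max=3)
```

Summing rather than averaging over $K$ does not change the ranking of node pairs, so either choice gives the same AUC.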
When predicting links in massive networks, K-means is unsuitable for clustering such high-dimensional data. Meanwhile, the high computational complexity of CNMF makes it infeasible to apply directly to the large adjacency matrix. We therefore use a fast modularity clustering algorithm [20] to generate blocks. Based on the block structure, we give a divide-and-conquer CNMF algorithm to predict links, shown in Algorithm 2. The algorithm partitions the matrix into blocks small enough for CNMF to be applied directly, and then combines the predictions for the small blocks into the final prediction for the original matrix. To balance the CPU load, the blocks should have similar sizes; this is achieved by splitting large blocks and combining small blocks so that their sizes are close to a given threshold.
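The load-balancing rule (split large blocks, merge small ones until their sizes are near a threshold) can be sketched as a first-fit-decreasing packing. The text does not specify how blocks are split or merged; chunking a large community into consecutive pieces and the first-fit merge below are assumptions, and `balance_blocks` is a hypothetical name. The input communities would come from a modularity-based clustering such as [20].

```python
def balance_blocks(communities, threshold):
    """Split communities larger than `threshold` and merge small ones.

    communities: list of lists of node ids. Returns blocks whose sizes are
    at most `threshold`, with small pieces packed together where they fit.
    """
    pieces = []
    for com in communities:
        # split a large community into consecutive chunks of <= threshold nodes
        for start in range(0, len(com), threshold):
            pieces.append(list(com[start:start + threshold]))
    pieces.sort(key=len, reverse=True)
    blocks = []
    for piece in pieces:
        # first fit: add the piece to the first block that still has room
        for block in blocks:
            if len(block) + len(piece) <= threshold:
                block.extend(piece)
                break
        else:
            blocks.append(piece)
    return blocks


coms = [list(range(0, 12)), [12, 13], [14], [15, 16, 17], list(range(18, 23))]
blocks = balance_blocks(coms, threshold=5)
```

Each returned block indexes a submatrix small enough for CNMF, so the blocks could be dispatched to separate workers.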

Experiments and Comparison
In general, links between different nodes may have different weights, representing their relative importance in the network. In our experiments, we set all weights to one and obtain the original adjacency matrix of the network. The observed network is generated by randomly removing a fraction of links, called the missing edges, from the original network. We then use the two algorithms (K-CNMF and the divide-and-conquer algorithm) to obtain the probability of links between nodes, which serves as the link weight in the observed network.
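The construction of the observed network can be sketched as a uniform random split of the edge list. The function `split_edges` and its `seed` argument are illustrative; the text only states that a fraction of links is removed at random.

```python
import random


def split_edges(edges, fraction, seed=0):
    """Randomly remove `fraction` of the edges (the missing links).

    Returns (observed_edges, missing_edges); the original network is their union.
    """
    rng = random.Random(seed)
    edges = list(edges)
    n_missing = round(fraction * len(edges))
    missing = set(rng.sample(range(len(edges)), n_missing))
    observed = [e for i, e in enumerate(edges) if i not in missing]
    removed = [e for i, e in enumerate(edges) if i in missing]
    return observed, removed


edges = [(i, i + 1) for i in range(20)]
obs, miss = split_edges(edges, 0.25)
```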

Evaluation Method.
To measure the accuracy of link prediction methods, the main metric we use is the AUC [21], the area under the receiver operating characteristic (ROC) curve, which is widely used in the machine learning and social network communities. If we rank node pairs in decreasing order of score, the AUC is the probability that a randomly chosen missing link (an entry that is 0 in the observed matrix but nonzero in the original matrix) is ranked higher than a randomly chosen nonexistent link (an entry that is 0 in the original matrix). In practice, we perform $n$ independent comparisons. Each time, we randomly pick a missing link and a nonexistent link and compare their rankings. If the missing link ranks higher than the nonexistent one $n'$ times and they tie $n''$ times, the AUC value is
$$\mathrm{AUC} = \frac{n' + 0.5\, n''}{n}.$$
The fraction of missing links ranges from 0.05 to 0.95 in steps of 0.05.
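The sampled AUC estimate can be written in a few lines of standard-library Python. The score dictionary and the helper name `sampled_auc` are assumptions made for illustration: missing links are pairs removed from the original network, and nonexistent links are pairs absent from the original network.

```python
import random


def sampled_auc(score, missing, nonexistent, n_comparisons=10000, seed=0):
    """AUC = (n' + 0.5 n'') / n from n random (missing, nonexistent) pairs.

    score: dict mapping a node pair to its predicted score.
    missing, nonexistent: lists of node pairs.
    """
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n_comparisons):
        s_miss = score[rng.choice(missing)]
        s_none = score[rng.choice(nonexistent)]
        if s_miss > s_none:
            higher += 1          # contributes to n'
        elif s_miss == s_none:
            ties += 1            # contributes to n''
    return (higher + 0.5 * ties) / n_comparisons


score = {(0, 1): 0.9, (0, 2): 0.8, (1, 3): 0.1, (2, 3): 0.2}
auc = sampled_auc(score, [(0, 1), (0, 2)], [(1, 3), (2, 3)], n_comparisons=1000)
```

A predictor that scores every pair identically yields exactly 0.5, which is the chance level this metric is usually compared against.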

Methods Used to Compare.
We compare our algorithm with three prediction methods: Common Neighbors, Block Model, and Hierarchical Random Graphs.
(1) Common Neighbors (CN) [2]. If two nodes $x$ and $y$ have many common neighbors, they are more likely to be linked. This is measured by
$$s_{xy} = |\Gamma(x) \cap \Gamma(y)|,$$
where $\Gamma(x)$ is the set of neighbors of $x$.
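A minimal sketch of the CN baseline, assuming neighbor sets built without self-loops (the unit diagonal used as CNMF input would otherwise make two linked nodes count each other as common neighbors). The helper names are hypothetical.

```python
def neighbor_sets(n, edges):
    """Adjacency as a list of neighbour sets, without self-loops."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbors(adj, x, y):
    """CN score s_xy = |Gamma(x) & Gamma(y)|."""
    return len(adj[x] & adj[y])


def cn_scores(adj):
    """Score every unlinked pair x < y of the observed network."""
    n = len(adj)
    return {(x, y): common_neighbors(adj, x, y)
            for x in range(n) for y in range(x + 1, n) if y not in adj[x]}


adj = neighbor_sets(4, [(0, 1), (1, 2), (2, 3)])   # a path 0-1-2-3
scores = cn_scores(adj)
```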
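(2) Block Model (BM) [4]. In block models, nodes are partitioned into groups, and the connection probability of two nodes depends only on the groups they belong to. Given a partition $P$ of the network, $l^{O}_{\alpha\beta}$ is the number of edges in the observed network between nodes in groups $\alpha$ and $\beta$, and $r_{\alpha\beta}$ is the maximum possible number of links between $\alpha$ and $\beta$. The reliability of an individual link is
$$R_{ij} = \frac{1}{Z} \sum_{P \in \mathcal{P}} \frac{l^{O}_{\sigma_i\sigma_j} + 1}{r_{\sigma_i\sigma_j} + 2}\, \exp[-\mathcal{H}(P)],$$
where the sum is over partitions $P$ in the space $\mathcal{P}$ of all possible partitions of the network, $\sigma_i$ is node $i$'s group, $\mathcal{H}(P) = \sum_{\alpha \le \beta} \left[\ln(r_{\alpha\beta} + 1) + \ln\binom{r_{\alpha\beta}}{l^{O}_{\alpha\beta}}\right]$, and $Z = \sum_{P \in \mathcal{P}} \exp[-\mathcal{H}(P)]$.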
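For intuition, the reliability formula can be evaluated exactly on a toy graph by enumerating every partition. This is only feasible for a handful of nodes, since the number of partitions grows as the Bell numbers; BM as used in practice samples partitions with Metropolis Monte Carlo instead, which this sketch does not implement. The function names are hypothetical.

```python
import math
from itertools import combinations


def set_partitions(items):
    """Yield every partition of `items` as a list of groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):            # add `first` to an existing group
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                # or give it a group of its own


def bm_reliability(n, edges):
    """Exact R_ij for all pairs by enumerating every partition (tiny n only)."""
    E = {frozenset(e) for e in edges}         # undirected edges, no self-loops
    num = {pair: 0.0 for pair in combinations(range(n), 2)}
    Z = 0.0
    for part in set_partitions(list(range(n))):
        sigma = {v: g for g, group in enumerate(part) for v in group}
        size = [len(group) for group in part]

        def r(a, b):                          # maximum possible links between a, b
            return size[a] * (size[a] - 1) // 2 if a == b else size[a] * size[b]

        l = {}                                # observed links l^O between groups a <= b
        for e in E:
            a, b = sorted(sigma[v] for v in e)
            l[(a, b)] = l.get((a, b), 0) + 1
        H = sum(math.log(r(a, b) + 1) + math.log(math.comb(r(a, b), l.get((a, b), 0)))
                for a in range(len(part)) for b in range(a, len(part)))
        weight = math.exp(-H)
        Z += weight
        for i, j in num:
            a, b = sorted((sigma[i], sigma[j]))
            num[(i, j)] += weight * (l.get((a, b), 0) + 1) / (r(a, b) + 2)
    return {pair: value / Z for pair, value in num.items()}


R = bm_reliability(4, [(0, 1), (2, 3)])
```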
(3) Hierarchical Random Graphs (HRG) [3]. The hierarchical structure of a network can be represented by a dendrogram with $n$ leaves (the vertices of the given network) and $n - 1$ internal nodes. A probability $p_r$ is associated with each internal node $r$, and the connection probability of a pair of leaves equals $p_r$, where $r$ is the deepest common ancestor of the two leaves. HRG combines maximum likelihood estimation with Markov chain Monte Carlo to sample hierarchical structures from the observed network with probability proportional to their likelihood, and then calculates the connection probabilities $p_r$.
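The connection probabilities of a single, fixed dendrogram can be computed with the maximum-likelihood estimate $p_r = E_r / (L_r R_r)$, where $E_r$ is the number of observed edges whose deepest common ancestor is $r$, and $L_r$, $R_r$ are the numbers of leaves in the left and right subtrees of $r$ (the estimate used in Clauset, Moore and Newman's HRG model). This sketch skips the Markov chain Monte Carlo sampling over dendrograms, so it is not the full HRG predictor; the nested-tuple tree encoding and the name `hrg_probabilities` are illustrative.

```python
def hrg_probabilities(tree, edges):
    """Maximum-likelihood p_r for each internal node of a fixed dendrogram,
    returned as the connection probability of every leaf pair.

    tree: nested 2-tuples whose leaves are node ids, e.g. ((0, 1), (2, 3)).
    """
    E = {frozenset(e) for e in edges}
    prob = {}

    def leaves(t):
        return [t] if not isinstance(t, tuple) else leaves(t[0]) + leaves(t[1])

    def visit(t):
        if not isinstance(t, tuple):
            return
        left, right = leaves(t[0]), leaves(t[1])
        # E_r: observed edges whose deepest common ancestor is this node
        e_r = sum(frozenset((u, v)) in E for u in left for v in right)
        p_r = e_r / (len(left) * len(right))
        for u in left:
            for v in right:
                prob[frozenset((u, v))] = p_r
        visit(t[0])
        visit(t[1])

    visit(tree)
    return prob


p = hrg_probabilities(((0, 1), (2, 3)), [(0, 1), (2, 3), (1, 2)])
```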

Performance of 𝐾-CNMF.
We evaluate the performance of K-CNMF on the networks listed in Table 1: the social network of interactions between people in a karate club [22], the social network of frequent associations between dolphins [23], the air transportation network of the USA, the coappearance network of characters in the novel Les Miserables [24], and a network of hyperlinks between weblogs on US politics [25]. Each AUC value is averaged over 100 independent realizations. Communities are a basic structure of networks and are widely used to predict missing links; because it uses block structure, our combined method also depends on the communities of the networks. As Figures 2(a) and 2(b) show, K-CNMF ($k = 2$, $K = 3$) performs much better than CNMF ($k = 2$) alone on Karate and Les-Mis, because both networks have more than two communities. However, the improvement from block structure is small on PB with desired rank $k = 2$ (see Figure 2(c)), which has two main communities.
Does using more block information always bring more accuracy?
If the matrix is partitioned into blocks that are too small, K-CNMF has too many parameters relative to the observed data, and overfitting occurs. An overfitted model describes noise instead of the underlying relationship and generally has poor predictive performance. From the performance of K-CNMF ($k = 62$) with different $K$ on Dolphins (see Table 2), we can see that K-CNMF ($k = 62$, $K = 1$) already captures enough local information, and increasing $K$ causes overfitting.
Figure 3 compares K-CNMF with CN, BM, and HRG on Karate (K-CNMF parameters: $K = 3$, $k = 34$), Dolphins ($K = 1$, $k = 62$), Les-Mis ($K = 40$, $k = 50$), and US-Air ($K = 3$, $k = 300$). K-CNMF performs better than CN because it uses both local and global information. The performance of K-CNMF is comparable with that of BM and HRG, but K-CNMF is faster, as it does not need Monte Carlo sampling. For large networks, comparisons are made only between the divide-and-conquer algorithm and CN, because BM and HRG are not suitable for large networks. Figure 4 shows this comparison. The divide-and-conquer algorithm performs better than CN when the observed networks are very sparse, because many common neighbors are lost in the sparse case and CN relies only on this property. On the power network, our method is clearly better than CN, because the original network is sparse.
Figures 5(a) and 5(b) compare the AUC results of K-CNMF and the divide-and-conquer algorithm on Karate and Dolphins, respectively, with $k = 2$ and $K = 2$. There is no obvious pattern in how the choice of modularity clustering algorithm influences the AUC.

Conclusions
We have introduced a probabilistic latent variable model for finding missing edges that combines convex nonnegative matrix factorization with block structure detection. It is inspired by two properties of block structure in matrices: entries in the same block tend to be similar, and good block detection algorithms are tolerant of missing edges. To scale to massive networks, we use a fast modularity clustering algorithm to generate blocks and give a divide-and-conquer algorithm for predicting links. To balance the CPU load, we split large blocks and combine small blocks so that their sizes are close to a given threshold.
Since most applications of link prediction, such as personalized recommendation, face the problem of sparse data, in future work we plan to combine other sparse low-rank approximation algorithms with block detection methods to obtain effective link prediction algorithms for massive networks.