Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics

,


Introduction
Network structures have been studied for many years.First research in this area can be traced back to 1736 when Euler defined and solved the Seven Bridges problem of Königsberg [5].Since then, for a long time, networks have been mainly studied by mathematicians and this resulted in a very prominent research field known today as the graph theory.There was not much ground breaking development in the complex network research area until 1960s, when the Erdos-Renyi random graph model (ER-model) was introduced [15,16].This is the simplest model of complex network.Due to the fact that there was a lack of large real world data, most of the work had been done on theoretical analysis of phenomena existing in networked structures (e.g.phase transition).
Over the years data collection techniques have significantly improved our ability to store massive and heterogenous network data.During the time when ER-model was introduced, progress has also been made by sociologists in researching real world human relationships [51,36].A new wave of research was set off by Watts and Strogatz who published a paper about the smallworld effect in 1998 [54] and introduction of the scale-free network model by Barabasi and Albert one year later [4].
As the accessibility of database systems and Internet is growing, more and more real world network datasets are available.The available information about people and their activities is much richer and more complex than ever before.The complex network concept is an abstract form of various real-world networks, e.g.biological networks such as protein-protein interac-tion networks, metabolic networks [22,26], human networks and disease spread [47,53,9,11], scientific collaboration networks [21,41] and online social networks [1,13,17,23].
Link prediction in complex network is one of the popular research topics.Most of the researchers focus on the link prediction problem [31] which is very valuable for solving real world problems.Generally, the prediction problem is mainly studied from two angles: (i) network structure and (ii) attributes of nodes and connections.Structure refers to the way in which nodes that compose the network are interconnected.It reflects the information about network topology.Majority of the progress in the area of structure based prediction has been made by mathematicians and physicists.Some of the well-known structure based prediction methods are Common Neighbour, Jaccard's Index, Adamic/Adar Index, Katz, etc. (for a review of the methods please see [34]).
The link prediction problem also has been studied from the angle of the network attribute information.The attribute information refers to description of the features of nodes.Such information is difficult to show directly in the network graph.It can be for example done by labelling nodes, e.g. 1 depicts node that represents woman and 2 means that node represents man.The majority of attribute-based prediction methods follow a machine learning approach, i.e. they use classification-based methods to make predictions.Widely used methods include Decision Tree, Support Vector Machine(SVM), Naïve Bayes, etc. [45,33].In [32,55,20], authors report that the performance of link prediction improves when machine learning approaches are used.However, this is done using additional network information that is not always available.We would like to emphasize that in our work, we are interested in the methods that only require the basic network structure information and thus we do not include machine learning methods in our study.
However, although much effort has been made, there is still no prominent prediction method that could provide a satisfactory performance.Thus, there is still a huge research gap that needs to be addressed.

Research Motivation
In the realm of network prediction, many efforts have been done on exploring new prediction methods that could provide better performance.However, the methods presented in most of the studies only improve the prediction result significantly for the network used in the study.There is a lack of systematically research that would enable to reveal the reason why the methods are good predictors when it comes to some of the networks but very bad when other networks are considered.This paper addresses this problem, by exploring the correlation between network metrics and prediction accuracy of different methods.We expect that such approach will enable to find the reasons why methods performance vary on different networks.Apart from having a further understanding of the prediction methods, the study is also important as a theoretical base for developing new prediction methods.This could be relevant to many subjects.The prediction methods could help to find the relationships between proteins which might not be easily observed directly due to the interaction complexity .For example, new interactions can be inferred from the existing known interaction networks [10,2] which shows a much better performance than prediction purely by chance.Online market targeting might also benefit from the network prediction which has already been applied in real world industries.For example, Google and Amazon recommend customers the potential goods and services that they might be interested in which is a kind of link prediction that predict the link between customers and products.
Beyond that, analysis of the link prediction problem in a time series approach could help researchers gain a better understanding of the evolution of the networks.Many works have been done to study the dynamic of complex network [6,7,8].The achievement of network prediction analysis could help explain the mechanism of the network evolution.

Contributions
The main contribution of our study is that we look at the link prediction as a time series problem and systematically analysed the correlation between network metrics and methods accuracy.In addition, in our experiments, we also find that for some networks, most of the prediction methods could provide a good performance while for some other networks, most methods are relatively powerless.We name them 'prediction friendly' networks and 'prediction unfriendly' networks respectively.
The paper is structured as follows: Section 2 presents the prediction methods and performance metrics used our experiments.Section 3 presents how the dataset were selected and processed.In section 4 and 5 we introduce the experimental design and present obtained results.We conclude the paper in Section 6.

Link Prediction Problem
Link prediction problem has been extensively studied by members of the complex network community.David Liben-Nowell and Jon Kleinberg have formalised the link prediction problem in [31] in the following way: Let G(V, L) be a network within the time period of G[t, t 1 ] where V represents the set of nodes and L represents the set of links.For the next time period G(t 1 , t 2 ], the network might change.The link prediction focuses on how to predict the evolution of links, i.e. how L [t,t 1 ] will differ from L (t 1 ,t 2 ] . Researchers with background in physics and mathematics usually deal with the problem by focusing on the topology information of the networks.Researchers with machine learning and data mining background favour to solve the problem with considering the nodes' attribute information.There are three types of link prediction problems as shown in Fig 1 : we can consider (i) only adding links to the existing network, (ii) only removing links from the existing structure, and (iii) both, adding and removing links at the same time.

Adding Links
Adding links (Fig 1a) means that in the next time window a new link will be created between existing nodes.There can be one or more newly created links.

Removing Links
Removing links (Fig 1b) means that the link will disappear in the next time window.Similar to the situation when new links are added, one or more more link can be removed in one time step.

Adding and Removing Links
This problem is the combination of two previously described problems.It means that from one time window to another both appearance and disappearance of links can be predicted (Fig 1c).
In this research, we will only focus on the first type of link prediction problem which only aims at predicting the appearance of links.The main reason for this is that the vast majority of existing methods for real-world data focus on this problem, so it means that we have big enough base to perform correlation analysis.

Prediction Methods
We select and present a brief description of ten commonly used prediction methods that use topology information about networks in the prediction process.Throughout this section the symbols x, y denote nodes, N denotes number of nodes in the network, and k is the average degree.Γ(x) and Γ(y) denote the neighbour sets of these nodes, k x and k y denote the degree number of node x and y respectively.

Common Neighbours
This method is based on the assumption that two nodes with many common neighbours will be connected in the future.The more common neighbours two users have, the higher the probability that a relationship between them will emerge.As a basic and intuitive method, Common Neighbours approach is usually used as a baseline to judge the performance of other methods [34,31,14,13].The complexity of this method, as introduced in [35], is O(Nk 2 ).

Jaccard's Coefficient
The Jaccard's Coefficient, also known as Jaccard index or Jaccard similarity coefficient, is a statistic measure used for comparing similarity of sample sets.It is usually denoted as J(x, y) where x and y represent two different nodes in a network.In link prediction, all the neighbours of a node are treated as a set and the prediction is done by computing and ranking the similarity of the neighbour set of each node pair.This method is based on Common Neighbours method and its complexity is also O(Nk 2 ).The mathematical expression of this method is as follows [31]: Preferential Attachment Due to the assumption that the node with high degree is more likely to get new links [42], preferential attachment was introduced as a prediction method.The degree of both nodes in a pair needs to be considered for the prediction.Same as common neighbours, this is also a basic prediction method which is usually used as a baseline to measure the performance of other prediction methods.This method will calculate similarity score for each pair of nodes within the network rather than only the neighbour of nodes thus the complexity of Preferential Attachement is O(N 2 k 2 ).This method can be expressed as: Adamic/Adar Index It was initially designed to measure the relation between personal home pages.As shown in equation 4, the more friends z has, the lower score it will be assigned to.Thus, the common neighbour of a pair of nodes with few neighbours contributes more to the Adamic/Adar score (AA) value than this with large number of relationships.In real world social network, it can be interpreted as: if a common acquaintance of two people has more friends, then it is less likely that he will introduce the two people to each other than in the case when he has only few friends.It shows good results in predicting the friendship according to personal homepage and Wikipedia Collaboration Graph, but in the experiment of predicting author collaboration, it shows a poor accuracy prediction [1].It is another method that is based on common neighbour, the complexity is also the O(Nk 2 ).It is calculated as: Where z is a common neighbour of node x and node y.

Katz β
This method takes lengths of all paths between each pair of nodes into consideration [24].According to equation 5, the number of paths between node x and node y with length l (written as |paths l xy |) are calculated and then multiplied by a factor β l .By summing up all the results for a given two nodes with path length from 1 to ∞, a prediction score for the pair of nodes (x, y) is obtained.Katz is a prediction method based on the topology of whole network and thus its calculation is more complex than other methods in this section.The complexity is mainly determined by the matrix inversion operator, which is O(N 3 ) [35,18].
The parameter β, as shown in equation 5, is used to adjust the weight of path with different length.When an extremely small β is chosen, the longer paths will contribute less to the score in comparison to shorter ones so that the result will be close to the common neighbours.
It is one of the prediction methods that, as it will be shown in further sections, achieves high prediction accuracy in many experiments.

Cosine Similarity
The idea of this method is based on the dot product of two vectors.It is often used to compare documents in text mining [34].In network prediction problem, this method is expressed as: For each pair of nodes with common neighbours, this methods will perform a vector multiplication and thus the complexity is O(Nk 3 ).

Sørensen Index
This index [50] is designed for comparing the similarity of two samples and originally used to analysis plant sociology.The complexity of this method is O(Nk 2 ).It is defined as: Hub Promoted Index HPI is proposed for analysing metabolic networks as shown in [48].The property of this index is that the links adjacent to hubs are likely to obtain a higher similarity score.The complexity of the method is O(Nk 2 ).It is expressed as: Hub Depressed Index Approach that uses the idea of hub in totally different manner than HPI is Hub Depressed Index (HDI).It gives links adjacent to hub a lower score.Its complexity is same as Hub Promoted Index, O(Nk 2 ).It is defined as Leicht-Holme-Newman Index LHNI [30] was proposed to quantify the similarity of nodes in networks.It is based on the concept that two nodes are similar if their immediate neighbours in the network are themselves similar.As another common neighbour based method, its complexity is O(Nk 2 ).It is defined as: All of the methods presented in this section are following similar approach.The required input for each method is the adjacency matrix that represents a network in which there are only 0 and 1 (0 -when there is no link between two given nodes, 1 -when the links between two given nodes exists).The output of each method is a similarity matrix in which each element represents the similarity score of a pair of nodes within the network and it is calculated according to the equation used in a given method.

Prediction Performance Metrics
In order to measure the performance of a prediction method, we need to use historical network data.Link prediction is a time related activity, therefore, we should use time-stamped dataset and according to the time stamp, separate the data into two sets, G t,t 1 (V, L 1 ) as training set for prediction methods and G t 1 ,t 2 (V, L 2 ) as unknown future network for testing where t < t 1 < t 2 .Those two networks must consist of the same set of nodes V.The number of possible links that is denoted by U is |V| * (|V| − 1)/2.The link prediction method, in principle, provides a similarity score for each non existing links (U − L 1 ) and for most methods, a higher score means higher likelihood that the link will appear in the future.Final prediction is done by ordering this score list and selecting top N links with the highest score.
In our work, AUC is used for quantifying the accuracy of prediction method.It is the area under the receiver operating characteristic curve [19].In the context of network link prediction, AUC can be interpreted as the probability that a randomly chosen missing links (L 1 ∪ L 2 − L 1 ) is given a higher similarity score than a randomly chosen pair of unconnected links (U −(L 1 ∪ L 2 )) [12].The algorithmic implementation of AUC follows the approach in [34].It is calculated as Where n is the number of times that we randomly pick a pair of links from missing links set and unconnected links set; n is the number of times that the missing link got a higher score than unconnected link while n is the number of times when they are equal.The AUC value will be 0.5 if the score are generated from and independent and identical distribution.Thus, the degree to which the AUC exceeds 0.5 indicates how much better the predictions when compared to predict by chance.

Data Preparation
All six datasets used in experiments are real world social networks, five of them come from Koblenz Network Collection (KONECT [28]) and another from the Wrocław University of Technology.(see Table 1)

Dataset Selection
Datasets for the experiments have to meet certain requirements: (i) they have to represent data about users' interactions or any other type of activity that enables to define connections between users, and (ii) those activities have to be time stamped.As described in section 2, the link prediction problem is a time series problem that looks into the evolution of networks in time.Time-stamp is thus necessary.Table 1 shows the original dataset information that were selected based on these two criteria.

Data Processing
To make the data suitable for the experiments, first the preprocessing of datasets has been performed.It consists of the following three steps: 1. Select data samples.For each dataset, we first randomly select 6000 -8000 user records (8000 samples is selected due to the calculation capacity.As for some dense networks, 8000 nodes is also too big, so we choose 6000) from the original dataset as the sample user data.As UC Irvine Messages only contains 1899 users, so we leave it as it is.
The specific sample numbers are shown in Notes.This table shows the original information about the datasets used in the experiments. (a) The Email network among employees of Enron.Nodes in the network are individual employees and edges are individual emails [27]. (b) The wall posts from the Facebook New Orleans networks [52]. (c) The social network of Flickr users and their friendship connections.It is collected by taking a snapshot of the network on November 2nd, 2006 and record it daily until December 3rd, 2006, and then again daily between February 3rd, 2007 and May 18th, 2007 [39,40]. (d) The Email communication of Wrocław Univerisity of Technology [25]. (e) The network contains messages send between the users of an online community of students from the University of California, Irvine.A node represents a user.An edge represents sent message.Multiple edges denote multiple messages [46]. (f) The social network of YouTube users and their friendship connections between December 10th, 2006 and January 15th, 2007, and again daily between February 8th, 2007 and July 23rd, 2007 [37,38].After all, we get the train set G t,t 1 (V, L 1 ) and test set G t 1 ,t 2 (V, L 2 ) as described in section 2.2 where the both sets have same nodes V.

Experimental Design
In order to be able to apply all selected methods and taking into account the types of datasets available, the network is represented as a binary un-weighted network.This enables consistent and comprehensive review of the existing methods.
First, the prediction methods described in section 2.1 will be applied to each of the processed training sets to get the similarity matrix as the prediction result.The prediction results will be then evaluated using the testing set and the AUC for each method will be calculated.
For the implementation of those methods, we applied the toolbox that presented in [34] and all the experiments were implemented in Matlab.
As stated before, the main goal of the research is to explore the correlations between the accuracy of different prediction methods and network metrics.For the training set of each network, the network metrics are calculated with toolboxes provided by KONECT [28] and MIT Strategic Engineering research group.The metrics we calculate include:

Global Clustering Coefficient
It is defined in [43] as: 3 * number of triangles in the network Number of connected triples of vertices (12) It shows the transitivity of the network as a whole.The coefficient range is between 0 and 1.

Average Clustering Coefficient [54]
It is based on local clustering C l .For each of the vertex l, its local clustering coefficient can be calculated by: and then the ACC can be calculated as: where v is the number of nodes in a network.

Network Density
The ratio between existing links and all possible links given the node numbers.
Network Density = Number of Existing Links Number of all possible links (15) where Where v is the number of nodes in the network.
Gini Coefficient [29] In network study is defined as: where is the sorted list of degrees in the network and n is the number of nodes in a network.Its value is between 0 and 1, where 0 denotes total equality between degrees and 1 denotes dominance of single node.

Diameter
The longest of the shortest paths in the network.
Where d(i, j) is the shortest path between node i and j.

Average Shortest Path
The average number of the shortest paths between each pair of vertices.
Once the accuracy of prediction for each method and the metrics for each network are calculated, the correlation between them will be analysed.The Pearson's Coefficient [49] is used to measure the correlation between accuracy of network prediction method and selected network metrics.It is a widely used statistic The coefficient value is between −1 and 1 where −1 means that two variables are negatively linearly correlated and 1 means that they are positively linearly correlated.

Network Profiles
The values of network metrics for each of the extracted social network are presented in Table 5.As it is much easier to set up relationship between people in online social network than in real world network, the average shortest path in our experiments are all smaller than six, the number suggested by the six degrees of separation theory [44].The average shortest path of the six Notes.k is the average degree and v is the number of nodes in the network selected networks is 3.65.This reflects the small-world property of the networks.People are closer to each other in online social networks than in face-to-face networks.This phenomenon was also pointed out in [3] where authors established that the average shortest path of Twitter is 3.43.
The degree distributions of the six networks, shown in Fig 3, indicates that they are scale-free networks as the distributions follow the power law.
We also compared the GCC and ASP metrics of the real network with the theoretical metrics of random network and regular network that have same number of nodes and links.The analytical formulas for GCC and ASP in random and regular networks with a given number of nodes and links are given in Table 4.The results of calculations for each analysed network are presented in Table 3.
Fig 2 plots the metrics of six analysed networks and related theoretical networks respectively.It shows that the clustering coefficient of the analysed networks are all between random and regular networks.Meanwhile, the average shortest path of realworld networks are all very close to the random networks.This two phenomena indicate the small-world property of analysed structures.Taking into account both metrics and node degree distribution, it can be concluded that those networks are a combination of small-world and scale-free networks.

Prediction Results
The prediction results are summarised in Table 6.Katz method achieved the best average performance and the overall performance is ranked as: Katz > Preferential Attachment > Adamic-Adar > Common Neighbours > Cosine Similarity > Jaccard Index > Hub Depressed Index > Hub Promoted Index > Sørensen > Leicht-Holme-Newman Index.By comparing the variance of each method, we find that the Katz also provides the most stable prediction performance among those methods while Common Neighbours is the worst performing approach.Overall, we find that Katz and Preferential Attachment provide good prediction accuracy together with a relatively stability.
To study the prediction results from the perspective of each network please see Figure 4.The prediction results of different methods align on the vertical lines for each network respectively.From this figure, we find that for some networks, most of the prediction methods could provide a good prediction result.Such networks include Flickr, Enron and YouTube.We call this type of networks the 'prediction friendly' network.Apart from this type of network, there are also some networks for which most of the prediction approaches provide fairly low accuracy, such as Facebook, UC Irvine and PWr.Similarly, we call those network 'prediction unfriendly' networks.Please note that in the ex- periments, for both prediction friendly and unfriendly networks, Katz β always provide a good performance level.

Correlation between Prediction Accuracy and Network Metrics
Table 7 shows the Pearson's linear correlation coefficient of prediction accuracy and network metrics.The closer the absolute value to 1, the higher the correlation between analysed factors is. Figure 5 presents a heat-map plot to show the degree of linear relation between the two factors where we use the absolute value of Pearson's Coefficient.The brighter the colour in the heat-map is, the stronger a given network metric and the accuracy of prediction method are correlated.
In Figure 5, we can see that the Preferential Attachment and Gini Coefficient provides the highest correlation coefficient (0.94) which indicates that they generally follow a linear relationship.This is not a surprise.For a network with a high Gini Coefficient, there exist some nodes with dominant high degrees.It just reflects the phenomenon of "rich get richer" which is also the assumption of Preferential Attachment method.So we can say that preferential attachment could lead to a high Gini Coefficient and thus Preferential Attachment, on the other hand, could also describe how a network with high Gini Coefficient evolves by giving a better prediction result.
Cosine-GCC and Sor-GCC also provide a correlation coefficient above 0.8.We can draw the conclusion that Cosine Similarity and Sorensen Index method perform better in a network with higher GCC than it does in smaller GCC.
The Diameter and Average Shortest Path shows a negative linear relation to almost all of the prediction methods (excluding Katz and LHN where the negative correlation is weak).Both the Average Shortest Path and the Network Diameter reflect how easy it is to get from one node in a network to another one.Shorter path as well as smaller diameter means a higher probability that a pair of randomly picked nodes will be connected.Negative correlation between those two metrics and prediction accuracies of different methods means that most of the methods work well in the situations where networks feature short ASP and in consequence small Diameter.This is additionally supported by the fact that Global Clustering Coefficient is positively correlated with those of the prediction methods meaning that these methods work well with networks with high clustering It should be clear that the Pearson's coefficient does not indicate the accuracy of the method.For example, although the prediction method Katz does not show strong correlation to any of the network metrics, it still provides best result in our experiments.The reason can be found in Table 6, where it is shown that Katz always provides a high prediction accuracy regardless the tested network metrics.
The most important value of our correlation study lies in the variety of prediction methods used in the experiments.The prediction with methods combination could be a way to improve accuracy and this will be investigated in the future.The correlation between methods and network metrics could be used to determine the weight of different prediction methods in the combination process.

Prediction Friendly and Unfriendly Networks
Table 7 also shows the average correlation of network metrics and prediction accuracy.As we know the closer the absolute value of correlation to 1, the stronger the linear relation.Here we take 0.5 as a threshold for strong correlation.According to this, we find that there are four metrics strongly correlated with the prediction accuracy which includes GCC, ACC, Diameter and ASP.So it is reasonable to assume that these metrics could be used to classify the prediction friendly and unfriendly networks.We ranked each of the analysed networks according to the metrics that have strong correlation with prediction accuracy and based on this for each network we calculate the average ranking (Table 8).Top three ranked networks (with the small average ranks) are the prediction friendly networks and the other three are prediction unfriendly networks.It can be seen that the prediction friendly networks usually have large global and local clustering coefficient, a short average shortest path as well as small diameter.It suggests that networks with the structural profile similar to small-world network are easier to predict than networks similar to random structures.

Conclusions
In this research, we look into the correlation between ten prediction methods and different network metrics in six time-stamped social networks.The study of network metrics confirmed that the node degree distribution of real world social networks follows a power law distribution.We also found that the average shortest path of online social network is much smaller than six.This might be due to the fact that online relationships are much eas-ier to setup.The results of the prediction accuracy show that the best method among the tested ones is Katz β .It is also the most stable technique from all tested ones.Preferential Attachment is the second best method that also provides a good prediction accuracy.In addition, for some 'prediction friendly' networks, most of prediction methods could provide a good performance while for some others, called in here as 'prediction unfriendly' networks, most prediction methods are lack of power.Notes.The accuracy of selected prediction methods measured by AUC.The average performance and the variance for each methods are also listed. (a) In our experiment, we choose β = 0.0005 The Pearson correlation coefficient enabled to investigate the relationship between network metrics and prediction accuracy.Our research showed that some methods are highly correlated with certain network metrics (e.g.PA-Gini, Sor-GCC and Cosine-Gcc).
There are several further directions of the presented study.As discovered, for some networks, most prediction methods could provide a good performance which we name them as 'prediction friendly networks'.Similarly, we also find the existence of 'prediction unfriendly' networks.Section 5.4 explores the prediction friendly and unfriendly network classification according to the metrics ranking.The problem is that it does not provide an exact threshold that could be used to classify networks.It is out of scope of this research but is a very interesting topic for another study that we plan to conduct.
Based on the results of correlation between network metrics and the prediction accuracy, another possible work is to develop a new prediction approach which combine several, existing methods.We can also extend this research to many other networks, not only social ones, which might be good for finding some more general relations.

Fig. 4 .
Fig. 4. The AUC Prediction Results for Each Network

Fig. 5 .
Fig. 5. Heat-map of Network Metrics and Prediction Methods Correlation Notes.As for the Pearson Coefficient, both 1 and -1 stands for linear relationship (positive and negative), we use the absolute of correlation coefficient in this figure to indicate whether the two factors are linearly correlated.

Table 2 . 2 .
Split the data into training and testing sets.Prediction in a time series problem means the dataset should be divided into train and test sets based on time stamps available.As the dataset of Flickr and YouTube are collected by taking snapshot of the network which is different from other four datasets, we take the first day snapshot as the training set and the remaining data as the test set.The other four networks are split according to the time scale with a ratio approximate training time : test time = 80% : 20% as shown in Table2.
3. Extract connected network.Dividing data into trainingand testing sets can cause the isolation of some nodes or cliques.This, in turn, generates noise for measuring the accuracy of prediction methods as the methods we selected can not predict unconnected nodes.To eliminate the impact

Table 1 .
Original Dataset Information

Table 2 .
Dataset Details L 2 ) is obtained by extracting the network with all the nodes that exist in G t,t 1 (V, L 1 ) from the original test set obtained from step 2. For nodes existing in the final training set but not present in the original test set, we just keep and leave them isolated in the final test set as it is formed by link disappearing.
Notes.The time range of train and test set, the number of sample nodes selected from the original dataset and number of nodes in the giant component which are used as the final nodes set for the experiment are presented in table.ofthis noise, we extract the giant component from training dataset as our final training set G t,t 1 (V, L 1 ).The final test set G t 1 ,t 2 (V,

Table 3 .
Theoretical GCC & ASP of Random, Real and Regular Network

Table 4 .
Article number, page 6 of 12 Fei Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics Analytical formulas for GCC & ASP in random and regular networks

Table 5 .
Network Metrics Results Article number, page 8 of 12 Fei Gao et al.: Link Prediction Methods and Their Accuracy for Different Social Networks and Network Metrics

Table 7 .
Pearson Correlation of Prediction Methods Accuracy and Network MetricsNotes.This table shows the correlation between prediction methods accuracy and network metrics calculated with Pearson's linear correlation coefficient.The number within the range of [-1,1] where 1 is completely positive correlation, 0 is no correlation, and -1 is completely negative correlation.

Table 8 .
Metrics Rank of Networks