Predicting unknown social links has proven useful in a number of applications, and link prediction plays an important role in sociological study. Although a surge of link prediction approaches has emerged, most of them focus on positive link prediction and pay little attention to inferring unknown negative links. The inherent characteristics of negative relations pose great challenges to traditional link prediction: (1) there is very little negative interaction data; (2) negative links are much sparser than positive links; and (3) social data is often noisy, incomplete, and fast-evolving. This paper addresses this novel problem by leveraging only structural information and proposes the UN-PNMF framework, based on projective nonnegative matrix factorization, which incorporates network embedding and the embedding of user properties into negative link prediction. Experiments on real-world datasets corroborate its effectiveness.
Funding: National Natural Science Foundation of China (61472160, 61872161); National Key Technology Research and Development Program of China (2014BAH29F03); National Natural Science Youth Fund (61602057); Jilin Province Science and Technology Development Plan Project (2018101328JC); Project of Excellent Young Talents Fund of Jilin Provincial Science and Technology Department (20170520059JH).
1. Introduction
With the increasing popularity of smart devices and online applications, online behavior has become an effective reflection of people’s lives. Social media has permeated every aspect of daily life, and people’s behaviors and lifestyles are gradually moving to the Internet. Huge volumes of opinion-rich, user-generated data are produced on social media at an unprecedented rate, which facilitates the mining of valuable information [1]. Meanwhile, large-scale social networks have formed among online users. These networks give rise to a wide variety of relations between users, such as friendships in Facebook (https://www.facebook.com/), follower relations in Twitter (https://twitter.com/), and trust and distrust relations in Epinions (http://www.epinions.com/) and Slashdot (http://slashdot.org/). These relations can generally be classified into two categories: positive relations and negative ones. Detecting relationships among users is significant in social network research: it not only helps online applications but also reveals social problems through the mining of social relations, and it supports sociological study.
Signed network analysis has attracted increasing attention in recent years. However, most current research on signed networks concentrates on inferring positive links, for example, predicting trust relationships between users in social networks [2, 3] or discovering citation relationships between citing and cited references in citation networks [4]. Trust, which indicates from whom users should accept information and with whom they should share it, plays an important role in helping online users collect reliable information [2]. Recent years have witnessed many trust-related online applications, such as trust-aware recommendation systems [5–7], finding high-quality user-generated content [8, 9], and viral marketing [2]. In short, studying the positive links in signed networks can uncover potentially valuable knowledge. Negative links, the counterpart of positive links in signed networks, have not received the same degree of attention and are ignored in many studies. In general, discovering distrust relationships can help users avoid fake information and Internet fraud. But inferring unknown negative links is worth far more than that: a small number of negative links can significantly improve positive link prediction, and they can also improve the performance of recommendation systems in social media [10, 11]. Furthermore, negative emotions are a latent factor in inducing conflicts. For instance, a crisis of trust between husband and wife can lead to a marriage crisis; a credibility gap between employees and employers can undermine the unity and harmony of a company and harm its interests; and a crisis of trust between citizens and governments can give rise to social unrest.
Negative link prediction is therefore significant: it can not only make many online applications more effective, but also help to detect instability and resolve conflicts ahead of time.
Research on negative links in signed social networks is a nontrivial task with many difficulties and challenges. First, all links in signed social networks, positive or negative, are very sparse. Negative links are even sparser and more of them are missing, so signed social networks contain a lot of noisy data. Second, there is little negative interaction data between users in social networks. Whether they are rating items or interacting online, users tend to give favorable reviews or “Likes”. Tang et al. [2] investigated two product review sites, Epinions and Ciao, both of which use a 5-star system to rate items, and found that the majority of ratings are 4 or 5 stars. This result is consistent with the observation above that users are likely to give positive ratings to items and to other users. Lastly, with the dramatic rise in the number of Internet users, social network data is growing explosively, and as users’ participation changes, the social network structure evolves quickly, which makes negative link prediction increasingly difficult. Given these difficulties, this paper proposes to predict negative links using only the network topology, without relying on any interaction data between users. To address the sparsity issue, this paper represents each node in a unified low-dimensional space through projective nonnegative matrix factorization, which alleviates the problems caused by sparsity. Moreover, this study identifies a latent factor of user pairs that contributes to the formation of social network links. Finally, it casts the new negative link prediction problem as an unsupervised learning framework, UN-PNMF. The main contributions are summarized as follows:
The negative network structure is embedded into a low-dimensional vector space through projective nonnegative matrix factorization.
Using only the positive links of the signed network, the activity degree and influence degree of users are exploited for negative link prediction by embedding these properties of user pairs.
An unsupervised framework, UN-PNMF, is proposed, which simultaneously embeds the network structure and a latent factor of user pairs into a low-dimensional vector space.
The proposed UN-PNMF framework is evaluated on real-world social media datasets to understand its effectiveness and mechanisms.
The rest of the paper is organized as follows. Section 2 briefly reviews related work. Section 3 describes the datasets. Section 4 defines the problem. Section 5 discusses embedding the negative network structure into a low-dimensional vector space through projective nonnegative matrix factorization. Section 6 investigates a latent factor of user pairs that helps predict negative links in social networks. Section 7 proposes the UN-PNMF framework and details its algorithm. Section 8 presents experimental results and observations. Section 9 concludes the paper and discusses future directions in this field.
2. Related Work
This section briefly reviews work related to variants of link prediction in signed social networks. Most existing methods can be roughly divided into three categories: supervised, unsupervised, and semisupervised learning.
Supervised methods for predicting unknown links in social networks involve two critical steps: first constructing features from available data sources, and then training a binary classifier on these features. Lichtenwalter et al. [12] show several advantages of supervised methods in link prediction, such as superior performance, adaptation to different domains, and variance reduction [11]. Wang et al. [13] introduce a new objective function for signed network embedding guided by extended structural balance theory and propose a deep learning framework, SiNE, whose learned embeddings significantly improve link prediction performance. Leskovec et al. [14] first introduce status theory and balance theory to predict positive or negative links in signed networks. Zolfaghar et al. [15] develop a framework of several trust-inducing factors and then investigate a C5.0 decision tree and a neural network to predict trust and distrust relations. Tang et al. [16] use users’ interaction data to predict distrust. Wang et al. [17] combine Dempster-Shafer theory with a neural network to predict trust and distrust; Dempster-Shafer theory allows evidence from different sources to be combined into a belief function that takes all the available evidence into account.
Unsupervised learning is the task of inferring a function that describes hidden structure in unlabeled data. Unsupervised link prediction mostly uses matrix factorization methods, which can alleviate the sparsity of links in social networks. Tang et al. [2] investigate homophily in trust prediction and formulate trust prediction as an optimization problem that incorporates homophily. Tang et al. [11] propose the NeLp framework, which exploits only positive links and content-centric interactions to predict negative links. Wang et al. [3] verify status theory in trust relations, compute users’ status with PageRank, and exploit status theory for trust prediction within a framework based on low-rank matrix factorization. Oh et al. [18] propose a probability-based trust prediction model based on trust information transfer, where the transferred information includes both explicit and implicit information. Wu et al. [19] propose an approach that predicts unknown links in social networking services by jointly modeling users’ consumption behaviors and social behaviors.
Embedding network data into a low-dimensional vector space has shown promising performance for link prediction in signed networks. Most semisupervised link prediction methods have two key steps: first, learn a network representation in which each node is represented by a low-dimensional vector; second, apply general machine learning techniques to these vectors. Wang et al. [20] propose the Structural Deep Network Embedding (SDNE) framework; to capture the highly nonlinear network structure, they design a semisupervised deep model, and the embeddings learned by SDNE significantly improve link prediction performance.
3. Data Analysis
This paper studies two publicly available datasets, Epinions and Slashdot, both of which include positive and negative links. Epinions is a product review site and Slashdot is a technology news site; on both, users can decide to trust or distrust other users. Accordingly, the datasets are signed graphs: trust relations are treated as positive links and distrust relations as negative links, and the trust and distrust networks are regarded as the positive and negative networks, respectively. First, users whose total degree (in-degree plus out-degree) in the positive network is less than three are filtered out. Then, users who have only negative links are deleted, so that the datasets contain sufficient information about the positive network. Key statistics of the datasets are shown in Table 1.
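The two filtering steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors’ preprocessing code: the edge-list input format, the hypothetical helper name `filter_users`, and the choice to apply each step once (rather than repeating until nothing changes) are all assumptions.

```python
from collections import Counter


def filter_users(pos_links, neg_links, min_degree=3):
    """Sketch of the dataset filtering (hypothetical helper, not from the paper).

    pos_links, neg_links: lists of directed (source, target) pairs.
    Step 1 keeps users whose in-degree + out-degree in the positive network
    is at least min_degree. Step 2 drops users left with only negative links.
    Each step is applied once (assumption).
    """
    degree = Counter()
    for u, v in pos_links:
        degree[u] += 1  # contributes to u's out-degree
        degree[v] += 1  # contributes to v's in-degree
    kept = {u for u, d in degree.items() if d >= min_degree}

    # Positive links whose endpoints both survive step 1.
    pos = [(u, v) for u, v in pos_links if u in kept and v in kept]

    # Step 2: users with no remaining positive link are treated as
    # negative-only and removed together with their negative links.
    has_pos = {user for link in pos for user in link}
    neg = [(u, v) for u, v in neg_links if u in has_pos and v in has_pos]
    return pos, neg


pos_links = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("b", "c"), ("d", "a")]
neg_links = [("c", "b"), ("d", "b"), ("e", "a")]
pos, neg = filter_users(pos_links, neg_links)
print(len(pos), neg)
```

In this toy input, user d has a positive degree of 1 and user e appears only in a negative link, so both are removed along with their links.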
Table 1: Statistics of the datasets.
Statistic | Epinions | Slashdot
# of users | 7323 | 54052
# of positive links | 118739 | 404153
# of negative links | 19629 | 112307
Positive network density | 0.0022 | 0.00014
Negative network density | 0.00037 | 0.00004
As Table 1 shows, links are very sparse in both datasets, and negative links account for only a small proportion of all links. The negative network densities of Epinions and Slashdot are 0.00037 and 0.00004, respectively. On average, an Epinions user has 16.21 positive links but only 2.68 negative links. Negative link prediction is therefore more difficult and complicated than positive link prediction.
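The density and per-user figures above can be re-derived from the raw counts in Table 1. The short check below assumes that density is the number of links divided by the n(n−1) possible directed user pairs; the paper does not state its exact definition, but this assumption reproduces the rounded values in Table 1.

```python
# Raw counts copied from Table 1.
STATS = {
    "Epinions": {"users": 7323, "pos": 118739, "neg": 19629},
    "Slashdot": {"users": 54052, "pos": 404153, "neg": 112307},
}


def density(num_links, num_users):
    """Links divided by the n(n-1) possible directed pairs (assumed definition)."""
    return num_links / (num_users * (num_users - 1))


summary = {}
for name, s in STATS.items():
    n = s["users"]
    summary[name] = {
        "pos_density": density(s["pos"], n),
        "neg_density": density(s["neg"], n),
        "pos_per_user": s["pos"] / n,  # average positive links per user
        "neg_per_user": s["neg"] / n,  # average negative links per user
    }
    print(name, {key: round(value, 5) for key, value in summary[name].items()})
```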
4. Problem Statement
This paper uses bold uppercase letters for matrices (e.g., $\mathbf{A}$), bold lowercase letters for vectors (e.g., $\mathbf{a}$), and normal lowercase letters for scalars (e.g., $a$). The $i$-th row of a matrix $\mathbf{A}$ is denoted $\mathbf{A}(i,:)$, its $j$-th column $\mathbf{A}(:,j)$, its $(i,j)$-th entry $\mathbf{A}_{i,j}$, its transpose $\mathbf{A}^T$, and, if $\mathbf{A}$ is square, its trace $\mathrm{tr}(\mathbf{A})$.
Let $U=\{\mu_1,\mu_2,\ldots,\mu_n\}$ be the set of users, where $n$ is the number of users in the signed social network. A signed network can be decomposed into a positive network component $G_p(U,E_p)$ and a negative network component $G_n(U,E_n)$, where $E_p$ and $E_n$ are the sets of positive and negative links, respectively. $G_p \in \mathbb{R}^{n\times n}$ is the matrix representation of the positive network, where $G_p(i,j)=1$ if there is a positive link from $\mu_i$ to $\mu_j$ and $G_p(i,j)=0$ otherwise. $G_n \in \mathbb{R}^{n\times n}$ is the matrix representation of the negative network, where $G_n(i,j)=1$ if there is a negative link from $\mu_i$ to $\mu_j$ and $G_n(i,j)=0$ otherwise. This paper also analyzes some properties of user pairs in the positive network: $H \in \mathbb{R}^{n\times n}$ denotes the property matrix, where $H(i,j)$ is the property value of $\mu_i$ with respect to $\mu_j$. Because social networks evolve quickly, this study splits each dataset into 6 timestamps, $t \in \{t_5,t_6,t_7,t_8,t_9,t_{10}\}$, where $t_q$ contains the first $q \times 10$ percent of negative links in chronological order. $G_p^t(U,E_p^t)$, $G_n^t(U,E_n^t)$, and $H^t$ denote the positive network, the negative network, and the property matrix at time $t$, respectively.
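A minimal sketch of the chronological split, assuming each negative link carries a timestamp and that snapshot $t_q$ keeps the earliest $\lfloor q \cdot n_{neg} / 10 \rfloor$ negative links. The input format, the hypothetical helper name `negative_snapshots`, and the rounding are assumptions, since the paper does not specify them; only negative links are sliced here.

```python
def negative_snapshots(timed_neg_links, qs=range(5, 11)):
    """Map each q to the earliest q*10 percent of negative links (hypothetical helper).

    timed_neg_links: list of (timestamp, source, target) tuples.
    The cutoff is rounded down with integer division (assumption).
    """
    ordered = sorted(timed_neg_links)  # chronological order (ties broken by user ids)
    total = len(ordered)
    return {
        q: [(u, v) for _, u, v in ordered[: total * q // 10]]
        for q in qs
    }


# Twenty toy negative links supplied in reverse chronological order.
links = [(day, f"u{day}", f"v{day}") for day in reversed(range(20))]
snapshots = negative_snapshots(links)
print({q: len(edges) for q, edges in snapshots.items()})
```

Because each snapshot is a prefix of the same chronological ordering, every earlier snapshot is contained in every later one.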
With the aforementioned notations and definitions, the problem of negative link prediction in social media is formally defined as follows.
Given the positive network $G_p^t(U,E_p^t)$, the negative network $G_n^t(U,E_n^t)$, and the pairwise property matrix $H^t$ at time $t$, we aim to develop a predictor $g$ that predicts the negative network $G_n^{t'}(U,E_n^{t'})$ at time $t'$:
$$g: (G_p^t, G_n^t, H^t) \longrightarrow G_n^{t'} \tag{1}$$
Figure 1 illustrates the negative link prediction model and the relations among $G_p^t$, $H^t$, $G_n^t$, and $G_n^{t'}$.
Figure 1: Illustration of the unsupervised negative link prediction model.
5. Low-Rank Matrix Factorization for Negative Network
Data sparsity and large amounts of noisy data are the main difficulties in research on signed social networks. The number of users is huge and growing rapidly, but the links among them are very limited. Worse still, negative links are far fewer than positive links, so the negative network is much sparser and it is difficult to extract feature information from it directly. Because network representation methods perform well on sparse networks, this paper predicts negative links through network representation. Projective nonnegative matrix factorization [21] is employed to embed the negative network: the factorization reduces the dimension of each node’s representation vector and, at the same time, alleviates data sparsity. Our goal is to find a low-rank representation $W \in \mathbb{R}^{n\times k}$ with $k \ll n$ through a matrix factorization model. The low-dimensional vectors should preserve as much useful information as possible while eliminating a considerable amount of noisy data. We aim to predict new negative links from the existing social relationships (both positive and negative links) at time $t$. Therefore, for the negative network $G_n^t(U,E_n^t)$ with adjacency matrix $G_n^t$, this section embeds $G_n^t$ into a low-dimensional vector space by projective nonnegative matrix factorization, which amounts to solving the following optimization problem:
$$\min_{W} \left\| G_n^t - W W^T G_n^t \right\|_F^2 \tag{2}$$
where $\|\cdot\|_F$ is the Frobenius norm and $W \in \mathbb{R}^{n\times k}$ is the low-dimensional representation of the negative network. The dimension $k$ of this representation is a key factor in its quality; how to choose $k$ is examined in detail in later sections.
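To avoid overfitting, a smoothness regularization term on $W$ is added to (2):
$$\min_{W} \left\| G_n^t - W W^T G_n^t \right\|_F^2 + \alpha \|W\|_F^2 \tag{3}$$
where the nonnegative parameter $\alpha$ controls the capacity of $W$. A nonnegativity constraint is then imposed on $W$ in (3):
$$\min_{W} \left\| G_n^t - W W^T G_n^t \right\|_F^2 + \alpha \|W\|_F^2 \quad \text{s.t. } W \ge 0 \tag{4}$$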
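To make objective (4) concrete, the sketch below evaluates it for a given nonnegative $W$, using plain Python lists as matrices (no NumPy, to stay self-contained). The helper names are hypothetical; $W W^T G_n^t$ is computed as $W(W^T G_n^t)$ so that no $n \times n$ intermediate is formed.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def fro2(A):
    """Squared Frobenius norm."""
    return sum(x * x for row in A for x in row)


def pnmf_objective(G, W, alpha):
    """Objective (4): ||G - W W^T G||_F^2 + alpha * ||W||_F^2 for W >= 0."""
    if any(x < 0 for row in W for x in row):
        raise ValueError("W must be nonnegative")
    R = matmul(W, matmul(transpose(W), G))  # projection W (W^T G)
    residual = [[g - r for g, r in zip(g_row, r_row)] for g_row, r_row in zip(G, R)]
    return fro2(residual) + alpha * fro2(W)


G = [[0, 1], [1, 0]]  # toy negative adjacency matrix
print(pnmf_objective(G, [[1, 0], [0, 1]], alpha=0.5))  # W = I reconstructs G exactly
print(pnmf_objective(G, [[0], [0]], alpha=0.5))        # W = 0 loses all of G
```

With $k = n$, $W = I$ reproduces $G_n^t$ exactly and only the regularizer remains; the interesting regime is $k \ll n$, where the projection must compress $G_n^t$.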
Many optimization methods, such as gradient descent, can be applied to the optimization problem above. Solving this model reduces the dimensionality of the input data through $W$ and, with a suitable dimension for $W$, also effectively reduces Gaussian noise. The main principle of the model is to obtain a projection operator that projects prior knowledge of user pairs into a new feature space while minimizing the difference between the projected matrix and the adjacency matrix of the negative network. Owing to this flexibility, the model can incorporate prior knowledge such as latent factors of user pairs, which are introduced in the next section.
6. Latent Factor Representation Learning
This section investigates the structural information of the positive social network. By studying the positive network topology, we identify a latent factor that contributes to the formation of new negative links in the future.
In this study, given the positive network, negative link prediction amounts to discovering, among the tremendous number of unlinked user pairs, those that are likely to establish negative links. We therefore need to find out what kinds of user pairs tend to establish negative links in the future, so our problem reduces to a clustering problem. To investigate which latent factor contributes to the establishment of a new negative link between two users, we focus on the in-degree and out-degree of each user in the positive network. In a trust network, a large in-degree indicates that many users trust the user, who therefore has great influence or authority in the network and is likely to influence still more users. For example, the more influential a Twitter user is, the more followers he has; likewise, on some social media, celebrities have far more fans than ordinary users because they are more influential. Meanwhile, the out-degree of a user reflects his activity: a large out-degree in the trust network indicates that he communicates with many users, that is, he likes interacting with others. Thus, if out-degree($\mu_i$) > out-degree($\mu_j$), $\mu_i$ is considered more active than $\mu_j$. In [22], researchers regard distrust as a low level of trust. In other words, the initial attitude toward distrusted users is trust, and it turns into distrust because of different notions or values during later association or communication. Negative link prediction therefore aims to discover user pairs that are more likely to interact with each other.
Considering all user pairs in the network, it is essential to combine the activity degree of one user with the influence degree of another when investigating the social links between them. For convenience, this latent factor is denoted In_Out. Extending to signed social networks raises a question: if out-degree($\mu_i$) > out-degree($\mu_m$) and in-degree($\mu_j$) > in-degree($\mu_n$), is $\mu_i$ more likely to establish a link with $\mu_j$ than $\mu_m$ is with $\mu_n$? To answer this question, this paper defines a latent factor matrix $\mathrm{In\_Out}^t$, where the superscript $t$ refers to time:
$$\mathrm{In\_Out}^t = \begin{bmatrix} \zeta_{1,1} & \zeta_{1,2} & \cdots & \zeta_{1,n} \\ \zeta_{2,1} & \zeta_{2,2} & \cdots & \zeta_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ \zeta_{n,1} & \zeta_{n,2} & \cdots & \zeta_{n,n} \end{bmatrix} \tag{5}$$
where $\zeta_{i,j}$ is calculated as
$$\eta_{i,j} = \text{out-degree}(i) \cdot \text{in-degree}(j) \tag{6}$$
$$\zeta_{i,j} = \frac{\eta_{i,j}}{d_i} \tag{7}$$
$$d_i = \sum_{j=1}^{n} \eta_{i,j} \tag{8}$$
Here out-degree($i$) and in-degree($j$) are $\mu_i$’s out-degree and $\mu_j$’s in-degree in the positive network $G_p^t(U,E_p^t)$, respectively. Based on the correspondence between $G_n^t$ and $\mathrm{In\_Out}^t$, two vectors $s_n$ and $s_r$ are constructed: for any users $\mu_i$ and $\mu_j$, if $G_n^t(i,j) = 1$, the value $\mathrm{In\_Out}^t(i,j)$ is stored in $s_n$; otherwise, it is stored in $s_r$.
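Equations (5)-(8) translate directly into code. The sketch below is a hypothetical helper operating on integer user indices $0,\dots,n-1$. How rows with $d_i = 0$ (users with no positive out-links) should be handled is not specified in the paper; this sketch assumes they are left as zeros.

```python
def in_out_matrix(pos_links, n):
    """Latent factor matrix In_Out^t from Eqs. (5)-(8) (illustrative sketch).

    pos_links: directed (i, j) index pairs of positive links at time t.
    Rows with d_i = 0 are set to zero, an assumption since Eq. (7) is
    undefined there.
    """
    out_deg = [0] * n
    in_deg = [0] * n
    for i, j in pos_links:
        out_deg[i] += 1
        in_deg[j] += 1

    M = []
    for i in range(n):
        eta = [out_deg[i] * in_deg[j] for j in range(n)]  # Eq. (6)
        d_i = sum(eta)                                    # Eq. (8)
        M.append([e / d_i if d_i else 0.0 for e in eta])  # Eq. (7)
    return M


M = in_out_matrix([(0, 1), (0, 2), (1, 2)], n=3)
for row in M:
    print([round(x, 3) for x in row])
```

Because $\eta_{i,j}$ is the product of a row term and a column term, the normalization in (7)-(8) makes every row with a nonzero out-degree equal to $\text{in-degree}(j) / \sum_{m} \text{in-degree}(m)$, as the two identical nonzero rows in this toy example show.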
To further verify that the latent factor In_Out contributes to the formation of new negative links in social networks, a two-sample t-test is conducted on $s_n$ and $s_r$, with null hypothesis $H_0$: $\bar{s}_n = \bar{s}_r$ and alternative hypothesis $H_1$: $\bar{s}_n > \bar{s}_r$, where $\bar{s}$ denotes the mean of a vector. For both datasets, the null hypothesis is rejected at significance level $\alpha = 0.01$, with p-values of 4.39e-16 and 6.68e-43, respectively. The t-test thus supports a positive answer to the question above: in signed social networks, In_Out differs significantly between user pairs with a negative link and user pairs without any link, and user pairs with high In_Out values are more likely to establish links in the future. Since the positive links are given in our problem, user pairs with a high In_Out value are, compared with pairs without links, more likely to establish a new negative link.
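The sketch below builds $s_n$ and $s_r$ as described above and computes the pooled-variance (Student) two-sample t statistic for $H_1$: $\bar{s}_n > \bar{s}_r$, using only the standard library. The paper does not say which t-test variant was used, and skipping self-pairs ($i = j$) is an assumption. The one-sided p-value requires the t distribution; in recent SciPy versions, `scipy.stats.ttest_ind(sn, sr, alternative="greater")` returns both the statistic and that p-value.

```python
import statistics


def split_scores(M, G_neg):
    """Build s_n (pairs with a negative link) and s_r (all other pairs).

    M: In_Out^t as a list of rows; G_neg: 0/1 negative adjacency matrix.
    Self-pairs (i == j) are skipped (assumption).
    """
    sn, sr = [], []
    for i, row in enumerate(M):
        for j, value in enumerate(row):
            if i == j:
                continue
            (sn if G_neg[i][j] == 1 else sr).append(value)
    return sn, sr


def pooled_t_statistic(sn, sr):
    """Student's two-sample t statistic (equal variances) for mean(sn) - mean(sr)."""
    n1, n2 = len(sn), len(sr)
    pooled_var = ((n1 - 1) * statistics.variance(sn)
                  + (n2 - 1) * statistics.variance(sr)) / (n1 + n2 - 2)
    std_err = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return (statistics.mean(sn) - statistics.mean(sr)) / std_err


M = [[0.0, 0.5, 0.5], [0.2, 0.0, 0.8], [0.1, 0.9, 0.0]]  # toy In_Out^t
G_neg = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]                # toy G_n^t
sn, sr = split_scores(M, G_neg)
print(sn, sr, round(pooled_t_statistic(sn, sr), 3))
```

A large positive statistic favors $H_1$; the real test uses all user pairs of each dataset rather than this three-user toy.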
Synthesizing the analysis above, the latent factor In_Out plays an important role in the formation of negative links in signed social networks. Social networks contain a great number of users, but most of the time each user has established links with only a limited number of others. As social science has observed, when a person communicates with others over different periods of time, the people associated with him change, but the number of people he communicates with stays essentially constant or within a range. Therefore, extracting features from all user pairs in a social network not only requires a large amount of storage but also causes much valuable information to be drowned in noisy data, which is a major difficulty in social media data mining. In short, this paper embeds the latent factors of user pairs into a low-dimensional vector space to eliminate excessive noise. In particular, social media contains many favorable reviews without practical significance, so a denoising method is necessary. Our goal is therefore to find a low-rank representation $P \in \mathbb{R}^{n\times h}$ with $h \ll n$, where $P$ is an effective representation of the matrix In_Out. This section again adopts projective nonnegative matrix factorization, and the resulting projection operator is $P$. The low-dimensional matrix $P$ retains only the valuable information about potential links among users in the negative network. Each row of $P \in \mathbb{R}^{n\times h}$ is the low-dimensional representation of a node in the network, where $h$ is the dimension of the representation vector. An appropriate choice of $h$ not only represents In_Out effectively but also reduces the dimension and eliminates a large amount of noise. Embedding the latent factor In_Out of user pairs into $P$ can thus be formulated as the following optimization problem:
$$\min_{P} \left\| G_n^t - P P^T \mathrm{In\_Out}^t \right\|_F^2 \tag{9}$$
This objective is similar to (2), and the processing procedure is the same as in the previous analysis. Due to space limitations, the objective function of this submodule is given directly:
$$\min_{P} \left\| G_n^t - P P^T \mathrm{In\_Out}^t \right\|_F^2 + \beta \|P\|_F^2 \quad \text{s.t. } P \ge 0 \tag{10}$$
where the nonnegative parameter $\beta$ controls the capacity of $P$, and a nonnegativity constraint is imposed on $P$. Because $G_n^t$ and $\mathrm{In\_Out}^t$ are nonnegative, (10) can be optimized with the same nonnegativity-preserving procedure as (4); note, however, that (10) is a fourth-order function of $P$ and is therefore not convex in general. In signed social networks, node representation via embedding latent factors of user pairs provides a new approach to link prediction. Equation (10) yields a flexible, unified model for embedding latent factors: researchers may discover more latent factors of user pairs in later studies, and this model can embed such factors into a low-dimensional space conveniently. This paper mines several latent factors that contribute to the formation of social links, but only In_Out has a significant effect on the formation of negative links; due to space limitations, the other latent factors are not enumerated here. Although this section only analyzes representation learning for In_Out, this does not limit the scalability and efficiency of the model, which can flexibly accommodate different latent factors in further study.
7. Modeling Unsupervised Learning for Negative Link Prediction
7.1. The Proposed Framework: UN-PNMF
Section 5 introduces representation learning for the negative network structure using projective nonnegative matrix factorization. Matrix factorization can reduce sparsity while preserving the useful information of the first-order proximity of negative networks [23–26], and it also retains valuable information about potential negative links. Since sparsity is an inherent difficulty in social networks and matrix factorization methods are widely used to address it, projective nonnegative matrix factorization is well suited to the negative link prediction problem. Section 6 represents the latent factor matrix In_Out by a low-rank matrix; In_Out plays the role of $H^t$ in (1). Each element of In_Out represents the potential for a negative link to form between the corresponding users. However, the number of user pairs grows quadratically with the number of users, so the latent factor matrix In_Out quickly becomes too large to store on ordinary hardware. Besides, In_Out contains much noisy data, since the number of links (positive and negative) that each user establishes with others is always limited. Therefore, this study embeds In_Out into a low-dimensional matrix through projective nonnegative matrix factorization; the resulting low-dimensional matrix filters out much of the noise and greatly decreases the space complexity of the algorithm.
In this paper, the negative network structure and the latent factor of user pairs are not treated independently. We intend them to complement each other, because combining the structure of the negative network with the latent factor of each user pair identifies more precisely the user pairs that will establish negative links in the future. Combining the two analyses above, this section proposes the UN-PNMF framework based on projective nonnegative matrix factorization, which solves the following optimization problem:
$$\min_{W,P} F = \left\| G_n^t - W W^T G_n^t \right\|_F^2 + \lambda \left\| G_n^t - P P^T \mathrm{In\_Out}^t \right\|_F^2 + \alpha \|W\|_F^2 + \beta \|P\|_F^2 \quad \text{s.t. } P \ge 0,\ W \ge 0 \tag{11}$$
(11) is simply the sum of (4) and (10), where $\lambda$ is a parameter that controls the contribution of the latent factor In_Out. However, this formulation yields two separate low-dimensional representations, one for the network structure and one for the latent information of users, which runs counter to our original intention of embedding the network structure and In_Out into a single low-dimensional projection operator. Therefore, (11) is rewritten as
$$\min_{W} F = \left\| G_n^t - W W^T G_n^t \right\|_F^2 + \lambda \left\| G_n^t - W W^T \mathrm{In\_Out}^t \right\|_F^2 + \alpha \|W\|_F^2 \quad \text{s.t. } W \ge 0 \tag{12}$$
With this adjustment, UN-PNMF saves storage space and improves computational efficiency: solving (11) requires iterating over two matrix variables, $P$ and $W$, whereas solving (12) requires only one, $W$. The later experiments also show that (12) is more efficient than (11) for negative link prediction, so (12) is adopted as the objective function. Expanding the Frobenius norms and removing the constant terms, (12) can be rewritten as
$$F = \mathrm{tr}\!\left(-2 (G_n^t)^T W W^T G_n^t + (G_n^t)^T W W^T W W^T G_n^t\right) + \lambda\, \mathrm{tr}\!\left(-2 (G_n^t)^T W W^T \mathrm{In\_Out}^t + (\mathrm{In\_Out}^t)^T W W^T W W^T \mathrm{In\_Out}^t\right) + \alpha\, \mathrm{tr}(W W^T) \tag{13}$$
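The step from (12) to (13) drops only the constant $(1+\lambda)\,\mathrm{tr}\big((G_n^t)^T G_n^t\big)$. The self-contained check below evaluates both expressions on small hypothetical matrices and confirms that their difference equals that constant for different choices of $W$; the matrix helpers are illustrative and not part of the paper.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def fro2(A):
    return sum(x * x for row in A for x in row)


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def objective_12(G, M, W, lam, alpha):
    """Eq. (12) evaluated directly from the Frobenius norms (M stands for In_Out^t)."""
    P = matmul(W, transpose(W))  # W W^T
    return (fro2(sub(G, matmul(P, G)))
            + lam * fro2(sub(G, matmul(P, M)))
            + alpha * fro2(W))


def objective_13(G, M, W, lam, alpha):
    """Eq. (13): the trace form of (12) with constant terms removed."""
    P = matmul(W, transpose(W))  # W W^T
    PP = matmul(P, P)            # W W^T W W^T
    Gt, Mt = transpose(G), transpose(M)
    term1 = -2 * trace(matmul(Gt, matmul(P, G))) + trace(matmul(Gt, matmul(PP, G)))
    term2 = -2 * trace(matmul(Gt, matmul(P, M))) + trace(matmul(Mt, matmul(PP, M)))
    return term1 + lam * term2 + alpha * trace(P)


G = [[0, 1], [1, 0]]          # toy G_n^t
M = [[0.2, 0.8], [0.5, 0.5]]  # toy In_Out^t
lam, alpha = 0.5, 0.1
for W in ([[0.3], [0.7]], [[1.0], [0.1]]):
    gap = objective_12(G, M, W, lam, alpha) - objective_13(G, M, W, lam, alpha)
    print(round(gap, 10))     # equals (1 + lam) * ||G||_F^2 for every W
```

Since the gap does not depend on $W$, minimizing (13) and minimizing (12) give the same solution.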
As (13) shows, this objective function has only one unknown variable, $W$, whereas traditional matrix factorization methods have two, for which it is difficult to find optimal solutions simultaneously. For (12), $W$ can be updated according to the following rule:
$$W_{i,j} \leftarrow W_{i,j} \frac{A_{i,j}}{B_{i,j}} \tag{14}$$
where $A$ and $B$ are defined as
$$\begin{aligned} A &= 2 G_n^t (G_n^t)^T W + \lambda \left( G_n^t (\mathrm{In\_Out}^t)^T + \mathrm{In\_Out}^t (G_n^t)^T \right) W \\ B &= G_n^t (G_n^t)^T W W^T W + W W^T G_n^t (G_n^t)^T W + \alpha W \\ &\quad + \lambda W W^T \mathrm{In\_Out}^t (\mathrm{In\_Out}^t)^T W + \lambda\, \mathrm{In\_Out}^t (\mathrm{In\_Out}^t)^T W W^T W \end{aligned} \tag{15}$$
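A direct, unoptimized transcription of update rule (14), with $A$ and $B$ from (15), again using plain Python lists. The small constant `eps` added to $B$ to avoid division by zero is an assumption not stated in the paper, and `update_W` is a hypothetical name. Because $G_n^t$, $\mathrm{In\_Out}^t$, and the initial $W$ are nonnegative, every factor in the update is nonnegative, so $W$ stays nonnegative.

```python
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(*mats):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def update_W(G, M, W, lam, alpha, eps=1e-12):
    """One application of rule (14): W_ij <- W_ij * A_ij / B_ij (M stands for In_Out^t)."""
    Wt = transpose(W)
    S = matmul(G, transpose(G))                                # G G^T
    T = matmul(M, transpose(M))                                # M M^T
    C = add(matmul(G, transpose(M)), matmul(M, transpose(G)))  # G M^T + M G^T
    WtW = matmul(Wt, W)                                        # W^T W (k x k)
    SW, TW = matmul(S, W), matmul(T, W)
    A = add(scale(2, SW), scale(lam, matmul(C, W)))
    B = add(matmul(SW, WtW),                          # G G^T W W^T W
            matmul(W, matmul(Wt, SW)),                # W W^T G G^T W
            scale(alpha, W),                          # alpha W
            scale(lam, matmul(W, matmul(Wt, TW))),    # lam W W^T M M^T W
            scale(lam, matmul(TW, WtW)))              # lam M M^T W W^T W
    return [[w * a / (b + eps) for w, a, b in zip(w_row, a_row, b_row)]
            for w_row, a_row, b_row in zip(W, A, B)]


random.seed(0)
G = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]                      # toy G_n^t
M = [[0.1, 0.6, 0.3], [0.2, 0.2, 0.6], [0.5, 0.25, 0.25]]  # toy In_Out^t
W = [[random.random() for _ in range(2)] for _ in range(3)]  # n = 3, k = 2
for _ in range(50):
    W = update_W(G, M, W, lam=0.5, alpha=0.1)
print(all(x >= 0 for row in W for x in row))
```

In the usage lines, the entries of $W$ remain nonnegative after every update, matching the argument given for (14).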
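To show that the final $W$ is an optimal solution of the objective function, we next verify the correctness of the update rule (14) by showing that its fixed points satisfy the KKT condition. The Lagrangian function of (12) can be written as
$$L(F) = \mathrm{tr}\!\left(-2 (G_n^t)^T W W^T G_n^t + (G_n^t)^T W W^T W W^T G_n^t\right) + \lambda\, \mathrm{tr}\!\left(-2 (G_n^t)^T W W^T \mathrm{In\_Out}^t + (\mathrm{In\_Out}^t)^T W W^T W W^T \mathrm{In\_Out}^t\right) + \alpha\, \mathrm{tr}(W W^T) - \mathrm{tr}(\Lambda W^T) \tag{16}$$
where $\Lambda \in \mathbb{R}^{n\times k}$ is the Lagrangian multiplier for the constraint $W \ge 0$. Taking the derivative of (16) with respect to $W$ gives
$$\frac{\partial L(F)}{\partial W} = 2\Big(-2 G_n^t (G_n^t)^T W + G_n^t (G_n^t)^T W W^T W + W W^T G_n^t (G_n^t)^T W - \lambda\left(G_n^t (\mathrm{In\_Out}^t)^T + \mathrm{In\_Out}^t (G_n^t)^T\right) W + \lambda W W^T \mathrm{In\_Out}^t (\mathrm{In\_Out}^t)^T W + \lambda\, \mathrm{In\_Out}^t (\mathrm{In\_Out}^t)^T W W^T W + \alpha W\Big) - \Lambda \tag{17}$$
Comparing the bracketed term with (15) shows that it equals $B - A$, so (17) can be written compactly as $2(B - A) - \Lambda$.
The KKT complementary slackness condition is
$$W_{i,j} \Lambda_{i,j} = 0, \quad \forall i \in [1,n],\ j \in [1,k] \tag{18}$$
Setting $\partial L(F)/\partial W = 0$ gives
$$2(B - A) = \Lambda \tag{19}$$
Substituting (19) into (18) yields
$$2(B - A) \odot W = 0 \tag{20}$$
where $\odot$ is the Hadamard product, i.e., $(X \odot Y)_{i,j} = X_{i,j} \times Y_{i,j}$. At a fixed point of (14), each entry satisfies either $W_{i,j} = 0$ or $A_{i,j} = B_{i,j}$; in both cases $W_{i,j}(B_{i,j} - A_{i,j}) = 0$, which is exactly (20), so the update rule (14) satisfies the KKT condition. Furthermore, since $G_n^t$ and $\mathrm{In\_Out}^t$ are nonnegative, $A$ and $B$ are nonnegative whenever $W$ is, so $W$ remains nonnegative throughout the updates when it is initialized with nonnegative entries. Because the objective function has only one independent variable, $W$, it is easy to verify that the update rule (14) is guaranteed to converge.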
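The identity used in (17)-(20), namely that the gradient of (12) equals $2(B - A)$, can be spot-checked numerically. The sketch below compares $2(B - A)$ with central finite differences of objective (12) on small hypothetical matrices; it is a sanity check of the derivation on toy inputs, not a proof, and all helper names are illustrative.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(*mats):
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*mats)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def fro2(A):
    return sum(x * x for row in A for x in row)


def objective(G, M, W, lam, alpha):
    """Objective (12) without the constraint; M stands for In_Out^t."""
    P = matmul(W, transpose(W))
    return (fro2(sub(G, matmul(P, G))) + lam * fro2(sub(G, matmul(P, M)))
            + alpha * fro2(W))


def a_and_b(G, M, W, lam, alpha):
    """Matrices A and B from (15)."""
    Wt = transpose(W)
    S, T = matmul(G, transpose(G)), matmul(M, transpose(M))
    C = add(matmul(G, transpose(M)), matmul(M, transpose(G)))
    WtW = matmul(Wt, W)
    SW, TW = matmul(S, W), matmul(T, W)
    A = add(scale(2, SW), scale(lam, matmul(C, W)))
    B = add(matmul(SW, WtW), matmul(W, matmul(Wt, SW)), scale(alpha, W),
            scale(lam, matmul(W, matmul(Wt, TW))), scale(lam, matmul(TW, WtW)))
    return A, B


def finite_difference_grad(G, M, W, lam, alpha, h=1e-6):
    """Central differences of objective (12) with respect to each W_ij."""
    grad = []
    for i in range(len(W)):
        row = []
        for j in range(len(W[0])):
            plus = [r[:] for r in W]
            minus = [r[:] for r in W]
            plus[i][j] += h
            minus[i][j] -= h
            diff = objective(G, M, plus, lam, alpha) - objective(G, M, minus, lam, alpha)
            row.append(diff / (2 * h))
        grad.append(row)
    return grad


G = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]                      # toy G_n^t
M = [[0.1, 0.6, 0.3], [0.2, 0.2, 0.6], [0.5, 0.25, 0.25]]  # toy In_Out^t
W = [[0.4, 0.1], [0.3, 0.8], [0.6, 0.2]]
A, B = a_and_b(G, M, W, 0.5, 0.1)
analytic = scale(2, sub(B, A))  # claimed gradient 2(B - A)
numeric = finite_difference_grad(G, M, W, 0.5, 0.1)
max_err = max(abs(a - b) for ra, rb in zip(analytic, numeric) for a, b in zip(ra, rb))
print(f"max |2(B - A) - finite difference| = {max_err:.2e}")
```

Because the gradient is $2(B - A)$, the ratio $A_{i,j}/B_{i,j}$ in (14) exceeds 1 exactly where the gradient is negative and falls below 1 where it is positive.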
7.2. UN-PNMF Algorithm
Algorithm 1 shows the detailed procedure of the proposed UN-PNMF framework. Its inputs are the adjacency matrices of the negative and positive networks at time $t$ (the latter is used to build $\mathrm{In\_Out}^t$) and the hyperparameters $\alpha$, $\lambda$, and $\sigma$. Line (1) constructs the latent factor matrix of user pairs, $\mathrm{In\_Out}^t$. Lines (3) to (11) update $W$ iteratively until convergence; in practice, Algorithm 1 stops when it reaches a predefined maximum number of iterations or when the objective function value changes very little. After the optimal $W$ is obtained, the final predicted matrix $\mathrm{P\_G}^{t'}$ is computed in line (12), where $\mathrm{P\_G}^{t'}(i,j)$ indicates the likelihood that $\mu_i$ and $\mu_j$ establish a negative link. The parameter $\sigma$ balances network topology against the In_Out of user pairs in predicting negative links; it is set to 0.7 for Epinions and 0.2 for Slashdot.
Algorithm 1: The framework UN-PNMF for negative link prediction.
Input: $G_n^t$, $G_p^t$, α, λ
Output: Ranked list of user pairs
(1) Construct the latent factor matrix $\mathrm{In\_Out}^t$
(2) Initialize W randomly
(3) while not converged do
(4) Set $A = 2G_n^t(G_n^t)^T W + \lambda\big(G_n^t(\mathrm{In\_Out}^t)^T + \mathrm{In\_Out}^t(G_n^t)^T\big)W$
(5) Set $B = G_n^t(G_n^t)^T W W^T W + W W^T G_n^t(G_n^t)^T W + \alpha W + \lambda W W^T \mathrm{In\_Out}^t(\mathrm{In\_Out}^t)^T W + \lambda\,\mathrm{In\_Out}^t(\mathrm{In\_Out}^t)^T W W^T W$
(6) for i = 1:n do
(7) for j = 1:k do
(8) Update $W(i, j) \leftarrow W(i, j)\,A(i, j)/B(i, j)$
(9) end for
(10) end for
(11) end while
(12) Set $P\_G^{t'}(i, j) = \big(W W^T G_n^t + \sigma\,W W^T \mathrm{In\_Out}^t\big)(i, j)$
(13) Rank pairs of users according to $P\_G^{t'}$ in descending order
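A compact NumPy sketch of Algorithm 1 follows. It is an illustration under stated assumptions, not the authors' released implementation: the latent factor matrix $\mathrm{In\_Out}^t$ is taken as a given nonnegative input because line (1) is not detailed here (so $G_p^t$ is not used), the element-wise loop of lines (6) to (10) is vectorized, the 1/√n scaling of the random start is an assumption, and the stopping parameters `max_iter` and `tol`, as well as the small `eps` guarding the division, are hypothetical defaults.

```python
import numpy as np

def un_pnmf(G, X, k, alpha=0.5, lam=20.0, sigma=0.7,
            max_iter=200, tol=1e-6, seed=0, eps=1e-12):
    """Illustrative sketch of Algorithm 1 (UN-PNMF); not the authors' code.

    G: n x n nonnegative adjacency matrix of the negative network G_n^t.
    X: n x n nonnegative latent factor matrix In_Out^t, assumed precomputed.
    Returns W, the score matrix P_G^{t'}, and all (i, j) pairs ranked by score.
    """
    n = G.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.random((n, k)) / np.sqrt(n)       # line (2): random nonnegative start (scaling assumed)
    GGt, XXt = G @ G.T, X @ X.T               # constant across iterations
    cross = G @ X.T + X @ G.T

    def objective(W):
        P = W @ W.T
        return (np.linalg.norm(G - P @ G) ** 2
                + lam * np.linalg.norm(G - P @ X) ** 2
                + alpha * np.linalg.norm(W) ** 2)

    prev = objective(W)
    for _ in range(max_iter):                 # lines (3)-(11)
        WWt = W @ W.T
        A = 2 * GGt @ W + lam * cross @ W     # line (4)
        B = (GGt @ WWt @ W + WWt @ GGt @ W + alpha * W
             + lam * WWt @ XXt @ W + lam * XXt @ WWt @ W)   # line (5)
        W = W * A / (B + eps)                 # lines (6)-(10), all entries at once
        cur = objective(W)
        if abs(prev - cur) <= tol * max(prev, 1.0):   # little change: stop early
            break
        prev = cur

    WWt = W @ W.T
    P = WWt @ G + sigma * WWt @ X             # line (12)
    order = np.argsort(-P, axis=None)         # line (13): descending scores
    pairs = np.column_stack(np.unravel_index(order, P.shape))
    return W, P, pairs

rng = np.random.default_rng(1)
G_demo = (rng.random((20, 20)) < 0.1).astype(float)   # toy negative network
X_demo = rng.random((20, 20))                           # toy In_Out matrix
W, P, pairs = un_pnmf(G_demo, X_demo, k=6, sigma=0.2)  # Slashdot-like settings
print(pairs[:5])                                        # five highest-scoring pairs
```

The call in the example mirrors the Slashdot settings given in the text (α = 0.5, λ = 20, k = 6, σ = 0.2). How self pairs or already-linked pairs should be treated in the ranking is not specified, so the sketch simply ranks all n² pairs.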
For this algorithm, the iterations that update W account for most of the running time, and these updates may limit the applicability of the framework, so their time complexity is analyzed here. First consider $A = 2G_n^t(G_n^t)^T W + \lambda\big(G_n^t(\mathrm{In\_Out}^t)^T + \mathrm{In\_Out}^t(G_n^t)^T\big)W$. Because $G_n^t$ is very sparse, the time complexities of $G_n^t(G_n^t)^T W$ and $\mathrm{In\_Out}^t(G_n^t)^T W$ are $O(nk)$ and $O(n^2k)$, respectively, and $G_n^t(\mathrm{In\_Out}^t)^T$ can be computed in $O(n^2)$; therefore, the time complexity of A is $O(n^2k)$. For B, the term $G_n^t(G_n^t)^T W W^T W$ can be computed either as $\big(G_n^t(G_n^t)^T\big)(W W^T)W$ or with a different grouping of the same product; the former costs $O(n^2k^2)$, while the latter takes $O(n^3k^2)$, so the former is more efficient. Similarly, $\mathrm{In\_Out}^t(\mathrm{In\_Out}^t)^T(W W^T)W$ can be computed in $O(n^2k^2)$, so B can be computed in $O(n^2k^2)$. Since $k \ll n$, the overall time complexity of Algorithm 1 is $\#\text{iterations} \times O(n^2)$. We also measured the running time of each iteration that updates W: about 1.699 s on Epinions and 1.689 s on Slashdot.
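Why the evaluation order matters can be illustrated with a small NumPy check. This is a generic illustration of matrix-product reassociation on a random toy matrix, not the specific grouping analyzed above: regrouping the first term of B changes which intermediate matrices are formed, but not the result.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 300, 6
G = (rng.random((n, n)) < 0.01).astype(float)   # toy sparse-like 0/1 matrix
W = rng.random((n, k))

# Grouping that materializes two n x n intermediates: G G^T and W W^T.
left = (G @ G.T) @ (W @ W.T) @ W

# Right-to-left grouping: one k x k intermediate, then n x k at every step.
WtW = W.T @ W                  # k x k
right = G @ (G.T @ (W @ WtW))  # n x k throughout

print(left.shape, right.shape, np.allclose(left, right))  # (300, 6) (300, 6) True
```

Both expressions agree up to floating-point rounding; the second one never stores an n × n matrix, which is the kind of consideration that drives the choice of grouping in a complexity analysis.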
8. Experiments
This section conducts extensive experiments on real-world dynamic networks to evaluate the effectiveness of the proposed framework for negative link prediction, compares different prediction methods, and analyzes the impact of parameters. The data and code used in this paper are available from the corresponding author upon request.
8.1. Experiment Setting
Figure 2 illustrates the experimental setting of the datasets. Each dataset is divided into 6 timestamps, t ∈ {t5, t6, t7, t8, t9, t10}. $G_n^t$ is used as the old negative network, and $G_n^{t'}$ (t′ > t) is the negative network containing the new or missing links to be predicted, with t varied over {t5, t6, t7, t8, t9}.
Separation of the dataset.
This paper follows a common evaluation metric for unsupervised link prediction [2] to evaluate negative link prediction. Let $N = f(G_n^{t'} - G_n^t)$, where $f(M)$ counts the nonzero elements of matrix M, that is, the number of new negative links. Each negative link predictor ranks user pairs in descending order of confidence and takes the first N pairs as the set of predicted negative links. The corresponding entries of $P\_G^{t'}$ are set to 1, and all other entries of $P\_G^{t'}$ are set to 0. The prediction accuracy (PA) is then computed as

$$
C = g\big(P\_G^{t'} + G_n^{t'} - G_n^t\big),
\tag{21}
$$

$$
\mathrm{PA} = \frac{C}{N},
\tag{22}
$$

where $g(M)$ counts the elements of M that are equal to 2. With binary matrices, an entry of $P\_G^{t'} + G_n^{t'} - G_n^t$ equals 2 when a predicted pair (value 1 in $P\_G^{t'}$) coincides with a new negative link (value 1 in $G_n^{t'} - G_n^t$), so C is the number of correctly predicted new negative links.
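The metric translates directly into code. The following NumPy sketch (function and variable names are illustrative) takes a score matrix such as $P\_G^{t'}$ from line (12) of Algorithm 1, keeps the top N pairs, and evaluates (21) and (22). Returning 0 when N = 0 is an added convention, since the ratio is undefined in that case.

```python
import numpy as np

def prediction_accuracy(scores, G_old, G_new):
    """PA of (21)-(22) for binary adjacency matrices G_old (time t) and G_new (time t')."""
    diff = G_new - G_old
    N = np.count_nonzero(diff)                  # N = f(G_n^{t'} - G_n^t)
    if N == 0:
        return 0.0                              # added convention: nothing to predict
    top = np.argsort(-scores, axis=None)[:N]    # the N most confident pairs
    P = np.zeros(scores.shape)
    P.flat[top] = 1.0                           # predicted pairs set to 1, rest 0
    C = np.count_nonzero(P + diff == 2)         # C = g(P + G_new - G_old)
    return C / N

G_old = np.zeros((4, 4)); G_old[0, 1] = 1             # one existing negative link
G_new = G_old.copy(); G_new[2, 3] = G_new[1, 2] = 1   # two new negative links
scores = np.zeros((4, 4))
scores[2, 3], scores[0, 1], scores[1, 2] = 0.9, 0.8, 0.1
print(prediction_accuracy(scores, G_old, G_new))      # 0.5
```

In the toy example, N = 2 and the two top-ranked pairs are (2, 3) and (0, 1). Only (2, 3) is a new link; (0, 1) was already linked at time t and gives a sum of 1, not 2, so PA = 1/2.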
8.2. Comparison of Different Predictors
To evaluate the effectiveness of the proposed framework UN-PNMF, this paper compares it with the following baseline methods:
Random: it randomly predicts negative links between pairs of users in the signed social network.
MF: it is a representative traditional matrix factorization method, which factorizes the matrix representation of negative links [27].
hTrust: it is an unsupervised framework that exploits the homophily effect for positive link prediction in social media [2].
PMF: it learns a low-rank representation based on projective matrix factorization, as shown in (4).
UN-PNMF_1: it is a variant of the proposed method that embeds the network structure representation and the latent factor In_Out separately, as shown in (11).
triNMF: it predicts social links based on nonnegative matrix factorization [2].
For all baseline methods, this paper uses the implementations released by the original authors. This study does not compare the proposed framework with the methods in [13, 17, 20], for two reasons: first, these methods are supervised or semisupervised; second, they use extra sources such as users' interaction data. Although hTrust is designed for trust prediction, Rotter [22] suggests that distrust is a low level of trust; moreover, the essence of our problem is to discover, among a huge number of unlinked user pairs, those that will establish links. The comparison with hTrust is therefore meaningful. For this comparison, the neighbors' similarity of each user pair is computed to replace the homophily of each user pair.

The parameters of UN-PNMF are set empirically to {α = 0.5, λ = 20, k = 5} in Epinions and {α = 0.5, λ = 20, k = 6} in Slashdot; their effect is discussed in Section 8.3. The experiments use a random sample of 2000 users. Eight groups of experiments (four input matrices on each of the two datasets) are designed to evaluate the effectiveness of the different methods. The old negative link matrix $G_n^t$ is one of $G_n^5$, $G_n^6$, $G_n^7$, and $G_n^8$, and the matrix to be predicted, $G_n^{t'}$, ranges over every later timestamp up to $G_n^{10}$; e.g., if $G_n^t = G_n^5$, then $G_n^{t'} \in \{G_n^6, G_n^7, G_n^8, G_n^9, G_n^{10}\}$; if $G_n^t = G_n^6$, then $G_n^{t'} \in \{G_n^7, G_n^8, G_n^9, G_n^{10}\}$; and so on. To ensure reliable results, each experiment is repeated 5 times and the average performance is reported. The comparison results of the unsupervised link prediction algorithms on Slashdot and Epinions are shown in Figure 3.
Performance comparison for different negative link predictors. (Panels: Slashdot with t = 5, 6, 7, 8; Epinions with t = 5, 6, 7, 8.)
From Figure 3, we make the following observations:
The proposed UN-PNMF framework outperforms the baseline methods in almost all settings. With $G_n^t = G_n^5$, the prediction accuracy of UN-PNMF in predicting $G_n^6$ reaches its maximum value of 16.71% in Slashdot, whereas it reaches only 14.06% in Epinions. The average prediction accuracy of UN-PNMF is approximately 3.7% higher than that of hTrust. This suggests that the similarity between two users has only a limited effect on negative link prediction in social networks: many user pairs with high similarity are not linked, and, conversely, the absence of a link between two users does not imply that they are dissimilar.
UN-PNMF_1 is a variant of UN-PNMF. Apart from UN-PNMF, UN-PNMF_1 performs better than the other predictors. The average prediction accuracy of UN-PNMF is approximately 1.36% higher than that of UN-PNMF_1, which shows that jointly embedding the network structure and the latent factor In_Out is more effective than embedding them separately. It also indicates that the interplay between the negative network structure and the latent factor of user pairs helps predict negative links in social media.
UN-PNMF consistently performs better than PMF in predicting negative links, which indicates that embedding the latent factor In_Out helps identify node pairs with negative links. The average prediction accuracy of UN-PNMF is approximately 3.06% higher than that of PMF, confirming that embedding In_Out improves negative link prediction.
MF, UN-PNMF_1, PMF, hTrust, and UN-PNMF all perform much better than Random, which indicates that modeling the properties of negative links improves performance significantly. However, the performance in predicting long-term negative links is not very good: with $G_n^t = G_n^5$, the prediction accuracy of UN-PNMF in predicting $G_n^{10}$ reaches only 9.15% in Slashdot and 9.41% in Epinions. Therefore, UN-PNMF does not capture the evolving characteristics of node pairs in dynamic networks very well and is not effective in predicting long-term negative links.
To explore the impact of different input data on UN-PNMF, eight groups of experiments are designed in which the other parameters are fixed and $G_n^t$ takes the values $G_n^5$, $G_n^6$, $G_n^7$, and $G_n^8$. The results obtained with the different inputs are combined in Figure 4 and Table 2.
Performance comparison for different input $G_n^t$.
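| Input t | Slashdot, t′=6 | Slashdot, t′=7 | Slashdot, t′=8 | Slashdot, t′=9 | Slashdot, t′=10 | Epinions, t′=6 | Epinions, t′=7 | Epinions, t′=8 | Epinions, t′=9 | Epinions, t′=10 |
|---|---|---|---|---|---|---|---|---|---|---|
| t = 5 | 0.1671 | 0.1408 | 0.1194 | 0.1022 | 0.0915 | 0.1406 | 0.1165 | 0.1032 | 0.0993 | 0.0941 |
| t = 6 | – | 0.1558 | 0.1211 | 0.1080 | 0.0958 | – | 0.1429 | 0.1199 | 0.1073 | 0.1025 |
| t = 7 | – | – | 0.1549 | 0.1127 | 0.0901 | – | – | 0.1357 | 0.1126 | 0.0992 |
| t = 8 | – | – | – | 0.1127 | 0.0873 | – | – | – | 0.1336 | 0.0921 |

Table 2: Performance comparison for different input $G_n^t$ (prediction accuracy of UN-PNMF).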
First, the performance of the proposed UN-PNMF framework generally decreases as t increases, and UN-PNMF achieves its best performance when the input $G_n^t$ is $G_n^5$ in Slashdot and $G_n^6$ in Epinions. In general, more old negative links should provide more information for learning to predict new negative links, so the predictor would be expected to perform better; the experimental results, however, contradict this expectation. A likely reason is that, as more negative links are added to the input matrix, fewer negative links remain to be predicted and the sparsity becomes more severe, so inferring new negative links becomes increasingly difficult. Second, short-term prediction accuracy is much better than long-term prediction accuracy. For example, with input $G_n^5$, the accuracy of UN-PNMF in Slashdot is 16.71% when predicting $G_n^6$ but only 9.15% when predicting $G_n^{10}$, a difference of 7.56%; accordingly, each group of results in Figure 4 shows a descending trend. Because social networks evolve quickly and the interaction data among users change from hour to hour, current data become less informative about the more distant future, which lowers the prediction accuracy for long-term negative links.
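The comparisons in this paragraph can be re-derived from Table 2. The short Python sketch below copies the table's values (each row lists t′ = t + 1, ..., 10, as read off the table), then recomputes the input timestamp with the highest accuracy for each dataset and the short-term versus long-term gap for input $G_n^5$ in Slashdot. The helper names are illustrative.

```python
from typing import Dict, Tuple

# Prediction accuracy of UN-PNMF from Table 2; each row lists t' = t+1, ..., 10.
TABLE2 = {
    "Slashdot": {5: [0.1671, 0.1408, 0.1194, 0.1022, 0.0915],
                 6: [0.1558, 0.1211, 0.1080, 0.0958],
                 7: [0.1549, 0.1127, 0.0901],
                 8: [0.1127, 0.0873]},
    "Epinions": {5: [0.1406, 0.1165, 0.1032, 0.0993, 0.0941],
                 6: [0.1429, 0.1199, 0.1073, 0.1025],
                 7: [0.1357, 0.1126, 0.0992],
                 8: [0.1336, 0.0921]},
}

def accuracy(dataset: str) -> Dict[Tuple[int, int], float]:
    """Map (t, t') to the reported accuracy for one dataset."""
    return {(t, t + 1 + i): v
            for t, row in TABLE2[dataset].items()
            for i, v in enumerate(row)}

def best_input(dataset: str) -> int:
    """Input timestamp t whose prediction achieves the highest accuracy."""
    (t, _), _ = max(accuracy(dataset).items(), key=lambda kv: kv[1])
    return t

slashdot = accuracy("Slashdot")
gap = slashdot[(5, 6)] - slashdot[(5, 10)]   # short-term minus long-term accuracy
print(best_input("Slashdot"), best_input("Epinions"), round(gap, 4))  # 5 6 0.0756
```

The output matches the text: the best input is $G_n^5$ for Slashdot and $G_n^6$ for Epinions, and the Slashdot gap between predicting $G_n^6$ and $G_n^{10}$ from $G_n^5$ is 0.0756, i.e., 7.56%.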
8.3. Parameter Settings
This section investigates how different parameter values affect the UN-PNMF framework, since appropriate parameter values can noticeably improve the performance of machine learning algorithms. UN-PNMF has three parameters: the dimension k of the representation vectors, the regularization coefficient α, and the parameter λ. These parameters are important and need to be tuned. The regularization coefficient α generally ranges from 0 to 1 and is empirically set to 0.5. This part explores the impact of k on negative link prediction. A value of k that is too large not only fails to reduce sparsity but may also retain noise, whereas a value that is too small inevitably loses useful information, so finding an appropriate k is important for UN-PNMF. Eight groups of experiments compare the performance of UN-PNMF with different k, where k is varied over {3, 5, 6, 8, 10, 20, 50, 100} and the input $G_n^t$ is fixed to $G_n^5$. The results are shown in Figure 5. Since the selection process of k is similar on the two datasets and space is limited, the Slashdot dataset is taken as the example.
Effect of different dimensions k.
In general, as k increases, the performance first increases, reaches a peak, and then degrades; this pattern can be used to determine the optimal value of k for UN-PNMF in practice. Figure 5 shows that:
When k increases from 3 to 6, the performance improves considerably, which shows that, as k increases, the low-dimensional representation W captures more useful information and helps UN-PNMF predict negative links.
UN-PNMF achieves its best performance at k = 6, which suggests that W then contains a near-optimal amount of information.
From k = 6 to k = 100, the performance decreases rapidly. This can be explained by the fact that, as k increases, W retains more and more noise, which harms the effectiveness of UN-PNMF.
Negative link prediction in social networks therefore faces two difficulties: the sparsity of links and the noise in latent factors. UN-PNMF achieves its best performance only with an appropriate dimension k.
Parameter λ controls the contribution of the latent factor In_Out to the formation of negative links; by setting an appropriate value of λ, UN-PNMF controls the influence of In_Out on predicting new negative links. As a social network develops, the formation of negative links may be influenced by multiple latent factors as well as by the network structure. Because research on this problem is still limited, this paper introduces only the latent factor In_Out; as research deepens, more effective latent factors may be found, so setting the controlling parameters appropriately is crucial to the UN-PNMF framework. If λ is too large, UN-PNMF exaggerates the effect of the latent factor on the formation of negative links and misleads the prediction. For example, some unlinked user pairs that are far apart in the network topology would be wrongly predicted to establish negative links only because the framework exaggerates the effect of In_Out. If λ is too small, UN-PNMF neglects the effect of In_Out on the establishment of negative links between two users. To find an appropriate value of λ, eight groups of experiments compare the performance of UN-PNMF with λ varied over {0.1, 0.5, 1, 5, 10, 20, 50, 100}. The results are shown in Figure 6; again, the Slashdot dataset is taken as the example. When λ increases from 0.1 to 20, the performance improves considerably, which shows that, as λ increases, In_Out helps predict new or missing negative links. UN-PNMF achieves its best performance at λ = 20, where the effect of In_Out is at a near-optimal level. From λ = 20 to λ = 100, the performance decreases rapidly, which can be explained by the effect of In_Out being greatly exaggerated as λ grows.
Accordingly, λ = 20 is chosen as the most suitable value in the UN-PNMF framework.
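The one-parameter-at-a-time procedure used above for k and λ (vary one parameter over its grid, keep the others fixed, and keep the value at the peak) can be sketched as follows. The scoring function is a hypothetical placeholder: in practice it would run UN-PNMF with the given parameters and return PA; the toy version here is constructed to peak at k = 6 and λ = 20 only so that the example is self-contained.

```python
import math
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]

def sweep(score: Callable[[Params], float], name: str,
          grid: List[float], fixed: Params) -> Tuple[float, Dict[float, float]]:
    """Vary one parameter over `grid`, keep the others at `fixed`, return the best value."""
    results = {value: score({**fixed, name: value}) for value in grid}
    best = max(results, key=results.get)
    return best, results

K_GRID = [3, 5, 6, 8, 10, 20, 50, 100]
LAMBDA_GRID = [0.1, 0.5, 1, 5, 10, 20, 50, 100]

def toy_score(p: Params) -> float:
    """Hypothetical stand-in for running UN-PNMF and measuring PA; peaks at k=6, lam=20."""
    return -abs(math.log(p["k"] / 6)) - abs(math.log(p["lam"] / 20))

defaults = {"k": 6, "lam": 20.0, "alpha": 0.5}
best_k, _ = sweep(toy_score, "k", K_GRID, defaults)
best_lam, _ = sweep(toy_score, "lam", LAMBDA_GRID, defaults)
print(best_k, best_lam)   # 6 20
```

A full grid over (k, λ) jointly would be more thorough but costs |K_GRID| × |LAMBDA_GRID| runs; the one-at-a-time sweep matches the procedure described in this section and needs only the sum of the grid sizes.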
Performance of UN-PNMF with varying λ.
This section also investigates the convergence of the UN-PNMF framework. For illustration, Figure 7 plots the value of the objective function on the two datasets. The objective value decreases continuously and then stabilizes, which shows that Algorithm 1 usually converges to a stable value.
Convergence of UN-PNMF.
9. Conclusion
This paper studies negative link prediction based on network embedding, using only the topology of the social network and no interaction data. First, it learns a low-dimensional vector representation of the negative social network through projective nonnegative matrix factorization. Second, it identifies the latent factor In_Out, which contributes to the formation of negative links in social media, and embeds the latent factor matrix In_Out with a low-dimensional projection operator. Finally, the network structure and the latent factor matrix are jointly embedded into the same low-dimensional space, yielding the unsupervised framework UN-PNMF. Extensive experiments on two real-world datasets, Epinions and Slashdot, show that UN-PNMF consistently outperforms the other negative link prediction methods.
Negative links are very sparse, which makes negative link prediction very difficult. However, studying negative links is of great significance for the development of social networks and the discovery of social problems, so negative link prediction deserves further study. Several interesting directions remain to be investigated. With in-depth analysis of social networks, more valuable latent factors of user pairs may be found. Future work can also combine matrix factorization with supervised methods to build a semisupervised method for predicting negative links. Finally, a flexible and unified framework for social networks could be proposed that handles not only link prediction but also other tasks, such as node classification, community detection, and recommendation.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China [Grant no. 61472160], the National Key Technology Research and Development Program of China [Grant no. 2014BAH29F03], the National Natural Science Foundation of China [Grant no. 61872161], the National Natural Science Youth Fund [Grant no. 61602057], the Jilin Province Science and Technology Development Plan Project [Grant no. 2018101328JC], and the Project of Excellent Young Talents Fund of Jilin Provincial Science and Technology Department [Grant no. 20170520059JH].
[1] K. Cheng, J. Li, J. Tang, and H. Liu, "Unsupervised sentiment analysis with signed social networks," in Proceedings of the 31st AAAI Conference on Artificial Intelligence, pp. 3429–3435, San Francisco, Calif, USA, 2017.
[2] J. Tang, H. Gao, X. Hu, and H. Liu, "Exploiting homophily effect for trust prediction," in Proceedings of the 6th ACM International Conference on Web Search and Data Mining, pp. 53–62, Rome, Italy, 2013, doi:10.1145/2433396.2433405.
[3] Y. Wang, X. Wang, and J. Tang, "Modeling status theory in trust prediction," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 1875–1881, Austin, Tex, USA, 2015.
[4] A. Garcia-Duran and M. Niepert, "Learning graph representations with embedding propagation," in Proceedings of the 31st Conference on Neural Information Processing Systems, pp. 1–12, Long Beach, Calif, USA, 2017.
[5] Y. H. Koura, Y. Zhang, and H. Liu, "Competitive interaction model for online social networks' users' data forwarding at a subnet."
[6] H. Ma, D. Zhou, C. Liu, and I. King, "What your images reveal: exploiting visual contents for point-of-interest recommendation," in Proceedings of the 26th International World Wide Web Conference Committee, pp. 391–400, Perth, Australia, 2017.
[7] L. Wu and H. Liu, "Tracing fake-news footprints: characterizing social media messages by how they propagate," in Proceedings of the 11th ACM International Conference on Web Search and Data Mining, pp. 637–645, Los Angeles, Calif, USA, 2018, doi:10.1145/3159652.3159677.
[8] X. Wang, Y. Liu, and Y. Nan, "A stable-matching-based user linking method with user preference order."
[9] Y. Hu, A. John, F. Wang, and S. Kambhampati, "ET-LDA: joint topic modeling for aligning events and their twitter feedback," in Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 59–65, Toronto, Ontario, Canada, 2012.
[10] C. Huang, M. Liu, H. Gong, and F. Xu, "Season-aware attraction recommendation method with dual-trust enhancement."
[11] J. Tang, S. Chang, C. Aggarwal, and H. Liu, "Negative link prediction in social media," in Proceedings of the 8th ACM International Conference on Web Search and Data Mining, pp. 87–96, Shanghai, China, 2015, doi:10.1145/2684822.2685295.
[12] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, "New perspectives and methods in link prediction," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 243–252, Washington, DC, USA, 2010, doi:10.1145/1835804.1835837.
[13] S. Wang, J. Tang, C. Aggarwal, Y. Chang, and H. Liu, "Signed network embedding in social media," in Proceedings of the 17th SIAM International Conference on Data Mining, pp. 327–335, Houston, Tex, USA, 2017.
[14] J. Leskovec, D. Huttenlocher, and J. Kleinberg, "Predicting positive and negative links in online social networks," in Proceedings of the 19th International Conference on World Wide Web, pp. 641–650, Raleigh, NC, USA, 2010, doi:10.1145/1772690.1772756.
[15] K. Zolfaghar and A. Aghaie, "A syntactical approach for interpersonal trust prediction in social web applications: combining contextual and structural data."
[16] J. Tang, X. Hu, Y. Chang, and H. Liu, "Predictability of distrust with interaction data," in Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 181–190, Shanghai, China, 2014, doi:10.1145/2661829.2661988.
[17] X. Wang, Y. Wang, and H. Sun, "Exploring the combination of Dempster-Shafer theory and neural network for predicting trust and distrust."
[18] H. Oh, J. Kim, S. Kim, and K. Lee, "A probability-based trust prediction model using trust message passing," in Proceedings of the 22nd International Conference on World Wide Web Companion, pp. 161–162, Rio de Janeiro, Brazil, 2013.
[19] L. Wu, Y. Ge, Q. Liu, E. Chen, B. Long, and Z. Huang, "Modeling users' preferences and social links in social networking services: a joint-evolving perspective," in Proceedings of the 30th AAAI Conference on Artificial Intelligence, pp. 279–286, Phoenix, Ariz, USA, 2016.
[20] D. Wang, P. Cui, and W. Zhu, "Structural deep network embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1225–1234, San Francisco, Calif, USA, 2016.
[21] Z. Yang and E. Oja, "Projective nonnegative matrix factorization with α-divergence," in Proceedings of the 19th International Conference on Artificial Neural Networks: Part I, pp. 20–29, 2009.
[22] J. B. Rotter, "Interpersonal trust, trustworthiness, and gullibility."
[23] X. Chen, L. Cao, C. Li, Z. Xu, and J. Lai, "Ensemble network architecture for deep reinforcement learning."
[24] J. Tang, C. Aggarwal, and H. Liu, "Node classification in signed social networks," in Proceedings of the SIAM International Conference on Data Mining, pp. 54–62, Fla, USA, 2016.
[25] B. M. Brentan, E. Campbell, G. L. Meirelles, E. Luvizotto, and J. Izquierdo, "Social network community detection for DMA creation: criteria analysis through multilevel optimization."
[26] L. Guo, W. Zuo, T. Peng, and L. Yue, "Text matching and categorization: mining implicit semantic knowledge from tree-shape structures."
[27] S. Zhu, K. Yu, Y. Chi, and Y. Gong, "Combining content and link for classification using matrix factorization," in Proceedings of SIGIR, pp. 487–494, Amsterdam, The Netherlands, 2007.