Sign Inference for Dynamic Signed Networks via Dictionary Learning

Mobile online social networks (mOSNs) are a burgeoning research area. However, most existing work on mOSNs deals with static network structures and simply encodes whether relationships among entities exist or not. In contrast, relationships in signed mOSNs can be positive or negative and may change over time and across locations. In this paper, applying certain global characteristics of social balance, we aim to infer the unknown relationships in dynamic signed mOSNs and formulate this sign inference problem as a low-rank matrix estimation problem. Specifically, motivated by the Singular Value Thresholding (SVT) algorithm, a compact dictionary is selected from the observed dataset. Based on this compact dictionary, the relationships in the dynamic signed mOSNs are estimated by solving the formulated problem. Furthermore, the estimation accuracy is improved by employing a dictionary self-updating mechanism.


Introduction
Over the past few years, a number of mobile applications that allow users to enjoy networking have emerged. Correspondingly, there has been a proliferation of mobile online social networks (mOSNs). With the ubiquitous use of mobile devices and rapid shifts in technology, it is worthwhile to investigate mOSNs from a privacy or security standpoint [1,2]. Related applications, such as online authentication and recommendation, are also extensive. In this context, research on mobile online networks in which two opposite kinds of relationships can occur has become common; people form links not only to indicate friendship, support, or approval but also to signify disapproval or distrust of the opinions of others. It is natural to model such networks as signed networks, where the sign of a link weight can be either positive or negative, representing the status of a relationship. Analogous to traditional social network analysis, the relationships in signed mOSNs can be represented as a graph, where nodes denote the objects (e.g., people or mobile terminals) and signed edges denote the relationships or links (e.g., a communication made between two people). The link structure of the resulting graph can be exploited to detect underlying groups of objects, predict missing links, and handle many other tasks [3-17].
One of the most fundamental theories applicable to signed social networks is social structural balance [5,6,16]. Structural balance corresponds to the possibility of exactly dividing the signed graph into two adversarial subcommunities such that all edges within each subcommunity have positive weights while all edges joining agents of different subcommunities have negative weights. Obviously, graphs with nonnegative weights are a special case of structural balance, in which one of the two subcommunities is empty. Since the assumption that structural balance exists in a real signed network might be too extreme, a concept called weak structural balance further generalizes structural balance by considering partitions of the signed graph into multiple adversarial subcommunities [7].
Structural balance and weak structural balance have been shown to be valid tools for analyzing signed networks. For instance, the sign inference problem, which aims to infer the unknown relationship between two objects, can be solved by mining balance information of signed networks from local and global perspectives [8-10, 12-17]. Building on such inference, in this paper a dictionary selection method is designed by reducing the size of an overcomplete feature set extracted from the training dataset. Also, a dictionary self-updating mechanism is introduced to improve the accuracy of the inference.
The key contributions of this paper are as follows.
(i) A dictionary selection approach based on group sparsity is designed to generate a feature set of minimal size, which increases computational efficiency. Specifically, the observation tensor is treated as the raw material for feature extraction.
(ii) The sign inference problem for weakly balanced mOSNs is formulated as a low-rank matrix reconstruction from the selected dictionary. Under certain mild conditions, a low-rank matrix reconstruction algorithm is applied to solve the sign inference problem, and it turns out to be considerably more accurate and efficient than other inference methods in the literature. A dictionary self-updating mechanism is also introduced to adapt to the dynamic characteristics of the network and improve the inference accuracy.
The rest of this paper is organized as follows. In Section 2, we build the model of the dynamic signed network. Some basics of balance theory are also reviewed for the sake of completeness. In Section 3, we first extract the initial candidate feature pool from the observation tensor and propose a dictionary selection approach. Then we propose our low-rank matrix reconstruction method to solve the sign inference problem. The implementation details of the dictionary self-updating procedure are also given. In Section 4, we conduct numerical experiments that demonstrate the validity of our network model for sign inference and justify the performance of our methods. Finally, we present our conclusions in Section 5.

Background and Preliminaries
2.1. Dynamic Signed Network Structure. Formally, a dynamic undirected signed network is represented as a dynamic graph G = (V, E), where V is the vertex set of size N and E is the edge set varying over time. A network snapshot denoted by S_t = (V, E_t, Ā^(t)) presents the connections of G observed at time t. Here, E_t is a subset of E and Ā^(t) ∈ {−1, 0, 1}^{N×N} is the adjacency matrix of S_t, with the signed weights

Ā^(t)_{ij} = 1 if i and j have a positive relationship, −1 if i and j have a negative relationship, and 0 if the relationship between i and j is unknown, (1)

for t = t_0, t_0 + 1, ..., t_0 + T − 1, respectively. Correspondingly, we let A ∈ {−1, 1}^{N×N×T} denote the three-dimensional tensor that contains the relationship information between all pairs of entities in G. Thus, the observation tensor Ā consisting of the series of network snapshots can be represented as

Ā = P_Ω(A), (2)

where Ω is the index set of the observed entries. Let P_Ω be the orthogonal projection operator onto the span of tensors vanishing outside Ω, so that the (i, j, t)th component of P_Ω(X) is equal to X_{ijt} when (i, j, t) ∈ Ω and zero otherwise. Then we have P_Ω(A) = Ā (shown in Figure 1) and P_{Ω_t}(A^(t)) = Ā^(t) for each time slice t, where the Ω_t are pairwise disjoint and ∪_t Ω_t = Ω.
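The projection P_Ω is simply an entrywise mask. A minimal NumPy sketch (the tensor sizes and sampling rate below are illustrative, not values from the paper):

```python
import numpy as np

def project_omega(X, mask):
    """P_Omega: keep the entries of X indexed by Omega (mask True), zero elsewhere."""
    return np.where(mask, X, 0)

# Toy example: N = 3 entities, T = 2 snapshots.
rng = np.random.default_rng(0)
A = rng.choice([-1, 1], size=(3, 3, 2))    # underlying complete tensor
mask = rng.random((3, 3, 2)) < 0.6         # observed index set Omega
A_obs = project_omega(A, mask)             # observation tensor P_Omega(A)
```

Observed entries survive unchanged; everything outside Ω is zeroed, which matches the {−1, 0, 1} encoding of unknown relationships.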
While the above kind of signed network is called homogeneous, that is, the relationships of the network are between the same kind of entities, a signed network can also be heterogeneous. In a heterogeneous signed network, there can be more than one kind of entity, and relationships between the same or different kinds of entities can be positive or negative, as in YouTube with its two kinds of entities, users and videos. Moreover, the three-dimensional network adjacency tensor can take on additional dimensions (e.g., a spatial dimension) to adapt to a wider range of scenarios. In this paper, we mainly focus our attention on three-dimensional homogeneous signed networks.

2.2. Weak Structural Balance.
Structural balance theory was first formulated by Heider [18] in order to understand the structure of a network of individuals whose mutual relationships are characterized in terms of friendship and hostility. Formally, a triad is considered balanced if the product of the signs in the triad is positive; that is, it contains an even number of negative edges. This agrees with principles such as "a friend of my friend is more likely to be my friend" and "an enemy of my friend is more likely to be my enemy" [6]. The configurations of balanced and unbalanced triads are shown in Figure 2. One possible weakness of this theory is that the defined balance relationships might be too strict. From this perspective, weak structural balance is proposed as a way of relaxing the assumption that "the enemy of my enemy is my friend" [7]; equivalently, the case that "the enemy of my enemy is my enemy" is permitted. Therefore, the local structure of weak balance posits that only triads with exactly two positive edges are implausible and that all other kinds of triads are permissible (also illustrated in Figure 2).
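The two local balance notions reduce to one-line predicates on a triad's edge signs; a small sketch (the function names are ours):

```python
def strongly_balanced_triad(s1, s2, s3):
    """A triad is balanced iff the product of its edge signs is positive
    (equivalently, it contains an even number of negative edges)."""
    return s1 * s2 * s3 > 0

def weakly_balanced_triad(s1, s2, s3):
    """Weak balance forbids only triads with exactly two positive edges."""
    return (s1, s2, s3).count(1) != 2
```

Note that the all-negative triad ("the enemy of my enemy is my enemy") fails strong balance but passes weak balance, exactly the relaxation described above.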
The formal definition of weakly balanced networks is as follows.
Definition 1 (weakly balanced networks [7]). A (possibly incomplete) network is weakly balanced if and only if it is possible to obtain a weakly balanced complete network by filling in the missing edges of its adjacency matrix. Furthermore, in terms of patterns of global structure, a complete network is weakly balanced if and only if the vertex set can be divided into k clusters, k ≥ 1, such that all edges within clusters are positive and all edges between clusters are negative.
A body of literature discusses approaches to clustering and sign prediction in signed networks. Ideas derived from local balance of signed networks can be successfully used to yield algorithms for sign inference [9,10]. Meanwhile, several works analyze social interrelations from the global perspective of structural balance [8, 13-15, 17]. In particular, it is shown in [8] that the adjacency matrix of weakly balanced networks has a low-rank structure, and sign prediction methods based on low-rank modeling were proposed as well.
Theorem 2 (low-rank structure of signed networks [8]). The adjacency matrix A ∈ {1, −1}^{N×N} of a complete k-weakly balanced network has rank 1 if k ≤ 2 and rank k for all k > 2. Actually, since the global viewpoint of weak balance stated in Definition 1 obeys the clustering characteristics presented in Theorem 2, for A there exists an invertible matrix P such that

PAP^T has the blocks 1_{s_1}, 1_{s_2}, ..., 1_{s_k} on its primary diagonal, (3)

where each 1_{s_i} on the primary diagonal is an s_i-order square matrix whose entries are all 1 (Σ_{i=1}^k s_i = N) and all other entries of PAP^T are −1. The s_i-order square block indicates the ith cluster.
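Theorem 2 is easy to check numerically. A small sketch that builds the block-structured sign matrix of Definition 1 (the cluster sizes are illustrative):

```python
import numpy as np

def weakly_balanced_adjacency(cluster_sizes):
    """Complete k-weakly balanced adjacency: +1 within clusters, -1 across."""
    labels = np.repeat(np.arange(len(cluster_sizes)), cluster_sizes)
    return np.where(labels[:, None] == labels[None, :], 1, -1)

A2 = weakly_balanced_adjacency([4, 6])        # k = 2: rank 1
A4 = weakly_balanced_adjacency([3, 4, 5, 8])  # k = 4: rank 4
```

For k = 2 the rows of the two clusters are negatives of each other, hence rank 1; for k > 2 the k distinct row patterns are linearly independent, hence rank k.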

Sign Inference via Dictionary Learning
In this section, we focus on a solution of the sign inference problem that estimates connection statuses via dictionary learning. As preparation, we propose a large-scale dictionary selection method to generate the dictionary used for inference. Assume that we are given a (usually incomplete) network observation tensor Ā sampled from an underlying dynamic weakly balanced complete network G with adjacency tensor A.
As described in Section 1, it is reasonable to suppose that, in practice, most relationships between entities are stable over a long period of time, and consequently the change in the scale of each subcommunity is limited. Apparently, this implies a strong dependence retained among the observed data. Combining these assumptions with the low-rank characteristic of weakly balanced complete networks, we extract an initial feature pool from the observation tensor Ā and propose a dictionary selection method to compress the scale of the feature pool in Section 3.1. The corresponding algorithm is presented in Section 3.2. With the trained dictionary, we propose our sign inference approach and dictionary updating mechanism in Section 3.3, which are also inspired by the low-rank characteristic of weakly balanced complete networks. The method we propose to handle the dictionary selection is motivated by the Singular Value Thresholding (SVT) algorithm, a simple and efficient algorithm for nuclear norm minimization problems proposed by Cai et al. [20]. Our basic idea is to obtain the optimal solution of the trace norm minimization problem by solving its dual problem, whose objective function can be shown to be continuously differentiable with Lipschitz continuous gradient. Specifically, we prove that the optimal solution of the primal problem can be readily obtained from the optimal solution of the dual problem. We first provide a brief review of the standard SVT algorithm.
Consider the problem

min_X τ‖X‖_* + (1/2)‖X‖_F^2,  subject to  P_Ω(X) = P_Ω(M). (6)

Cai et al. [20] give a theoretical analysis showing that, as τ → ∞, the optimal solution of problem (6) converges to that of the standard problem

min_X ‖X‖_*,  subject to  P_Ω(X) = P_Ω(M). (7)

Given τ > 0, the SVT algorithm operates as a linearized Bregman iteration scheme. Furthermore, by defining the Lagrangian function of problem (6) as

L(X, Y) = τ‖X‖_* + (1/2)‖X‖_F^2 + ⟨Y, P_Ω(M) − P_Ω(X)⟩, (8)

where Y is the Lagrangian dual variable, we can derive its dual function as

g(Y) = min_X L(X, Y). (9)

Cai et al. show that SVT in fact optimizes the dual function g(Y) via the gradient ascent method.
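For reference, a compact sketch of the SVT iteration: singular value shrinkage followed by a dual ascent step on the observed entries. The threshold, step size, and iteration count below are illustrative choices, not tuned values from [20]:

```python
import numpy as np

def svt_complete(M, mask, tau, delta=1.2, iters=500):
    """SVT sketch: X_k = shrink(Y_{k-1}, tau); Y_k = Y_{k-1} + delta * P_Omega(M - X_k)."""
    Y = np.zeros_like(M, dtype=float)
    X = Y
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt      # singular value shrinkage
        Y = Y + delta * np.where(mask, M - X, 0.0)   # gradient ascent on the dual
    return X

# Complete a rank-1 sign matrix (a 2-weakly balanced network) from ~70% of its entries.
v = np.ones(10); v[5:] = -1
M = np.outer(v, v)
rng = np.random.default_rng(1)
mask = rng.random(M.shape) < 0.7
X_hat = svt_complete(M, mask, tau=50.0)
```

Because the matrix is rank 1 and well sampled, the shrinkage iteration recovers it to within a small relative error.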
3.1. Large-Scale Dictionary Selection. In this subsection, we address how to select the dictionary given an initial candidate feature pool. To this end, we first extract an initial candidate feature pool from the observation tensor Ā, which is sampled from A.
Since Ā consists of the observed adjacency matrices Ā^(t) (t = t_0, t_0 + 1, ..., t_0 + T − 1), each matrix Ā^(t) retains the information of A^(t) to some extent. Thus, we reserve the group of Ā^(t) with relatively higher sample rates to extract features. We use the singular value decomposition (SVD) to express each Ā^(t) as a series of orthogonal bases in Hilbert space; that is,

Ā^(t) = Σ_i σ_i^(t) u_i^(t) v_i^(t)T, (10)

where u_i^(t) and v_i^(t) are singular vectors of Ā^(t) with singular value σ_i^(t), 1 ≤ i ≤ r_t. Without loss of generality, we sort the singular values of Ā^(t) in descending order and set

U_i^(t) = u_i^(t) v_i^(t)T. (11)

Then, owing to the low-rank property of the weakly balanced complete adjacency matrix, we keep the group of U_i^(t) corresponding to the r_t largest σ_i^(t) as the features. By this procedure, we extract an initial candidate feature pool {U_i^(t) : t_0 ≤ t ≤ t_0 + T − 1, 1 ≤ i ≤ r_t}, where each matrix U_i^(t) ∈ R^{N×N} denotes a feature. Equivalently, we can discuss the vectorized pool Φ = [vec(U_1), vec(U_2), ..., vec(U_n)]. Owing to the massive size of the initial feature pool Φ, we hope to find an optimal subset forming the dictionary Ψ = [vec(U_1), vec(U_2), ..., vec(U_d)] such that the set Φ can be well reconstructed by Ψ while the size of Ψ is as small as possible. To achieve this goal, we select Ψ such that the remaining features in Φ can be well reconstructed from it. Analogous to the optimization problem in [21], the basic problem is formulated as follows:

min_X ‖X‖_{2,1},  subject to  Φ = ΦX, (12)

where Φ ∈ R^{m×n} (m = N^2), X ∈ R^{n×n}, and ‖X‖_{2,1} = Σ_{i=1}^n ‖X_{i·}‖_2. Apparently, ‖X‖_{2,1} enforces group sparsity on the variable X, and the optimal solution usually contains zero rows. This means that not all features in Φ need to be selected to reconstruct any data sample.
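The feature-pool construction above amounts to keeping the top rank-1 SVD terms of each snapshot and vectorizing them into columns of Φ. A sketch under our reading of the construction (the number of terms kept per snapshot, r, is a parameter; the singular values are left out of the atoms and absorbed by the sparse coefficients):

```python
import numpy as np

def feature_pool(snapshots, r=2):
    """Keep the top-r rank-1 SVD terms u_i v_i^T of each snapshot as features
    and vectorize them into the columns of the candidate pool Phi."""
    cols = []
    for A_t in snapshots:
        U, s, Vt = np.linalg.svd(A_t, full_matrices=False)
        for i in range(r):
            cols.append(np.outer(U[:, i], Vt[i, :]).ravel())
    return np.column_stack(cols)   # Phi in R^{N^2 x n}

# Three toy 6x6 sign-matrix snapshots.
snapshots = [np.sign(np.random.default_rng(t).standard_normal((6, 6))) for t in range(3)]
Phi = feature_pool(snapshots, r=2)
```

Each column is the vectorization of a unit-norm rank-1 matrix, so Φ has N² rows and one column per retained singular triplet.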
Motivated by SVT, we have the following equivalent of problem (12):

min_X τ‖X‖_{2,1} + (1/2)‖X‖_F^2,  subject to  Φ = ΦX. (13)

The Lagrangian function of problem (13) is defined as

L(X, Y) = τ‖X‖_{2,1} + (1/2)‖X‖_F^2 + ⟨Y, Φ − ΦX⟩, (14)

and its dual function is

g(Y) = min_X L(X, Y). (15)

We first examine the properties of the dual function g(Y) and then show how to obtain the optimal solution of problem (13) from its dual optimum directly. As the mixed norm ‖X‖_{2,1} is not differentiable, it is difficult to optimize the dual function g(Y) directly. However, we can obtain a useful property of the dual function g(Y) as follows.
Theorem 3. For all τ ≥ 0, the dual function g(Y) is continuously differentiable, with a Lipschitz continuous gradient whose constant is at most ρ. Furthermore, the primal optimum X̂ of problem (13) is given by

X̂ = D_τ(Φ^T Ŷ), (16)

once the dual optimum Ŷ of problem (13) is obtained, where D_τ denotes the row-wise soft-thresholding operator [D_τ(Z)]_{i·} = max(0, 1 − τ/‖Z_{i·}‖_2) Z_{i·}.
The proof of Theorem 3 is based on the following results.

Lemma 4.
For each τ ≥ 0 and Z ∈ R^{n×n}, one has

argmin_X { τ‖X‖_{2,1} + (1/2)‖X − Z‖_F^2 } = D_τ(Z),  [D_τ(Z)]_{i·} = max(0, 1 − τ/‖Z_{i·}‖_2) Z_{i·}. (17)

As a matter of fact, considering the optimization problem

min_x τ‖x‖_2 + (1/2)‖x − z‖_2^2, (18)

it is easy to show that the unique solution admits a closed form called the soft-thresholding operator, following the terminology introduced by Donoho and Johnstone [22]; it can be written as

x̂ = max(0, 1 − τ/‖z‖_2) z. (19)

Thus, from a generalized row-wise view, one has Lemma 4. Also, the following nonexpansiveness result can be deduced from the properties of the Moreau-Yosida regularization [23]:

‖D_τ(Z_1) − D_τ(Z_2)‖_F ≤ ‖Z_1 − Z_2‖_F. (20)

Proof of Theorem 3. By Lemma 4, the minimizer of the Lagrangian over X is

X(Y) = D_τ(Φ^T Y), (21)

so that g(Y) = L(X(Y), Y). The gradient of g(Y) can then be obtained as

∇g(Y) = Φ − Φ D_τ(Φ^T Y). (22)

It follows that, for any Y_1, Y_2,

‖∇g(Y_1) − ∇g(Y_2)‖_F = ‖Φ (D_τ(Φ^T Y_2) − D_τ(Φ^T Y_1))‖_F ≤ ρ ‖Y_1 − Y_2‖_F, (23)

where the inequality follows from (20) and ρ = sup_{‖Z‖_F = 1} ‖ΦΦ^T Z‖_F. When the dual optimum Ŷ is obtained, by using the result of (21), we get X̂ = D_τ(Φ^T Ŷ). This concludes the proof.
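The row-wise soft-thresholding operator in Lemma 4 has a direct implementation; a minimal sketch:

```python
import numpy as np

def row_soft_threshold(Z, tau):
    """Row-wise (group) soft-thresholding: the proximal operator of
    tau * ||X||_{2,1} + (1/2) ||X - Z||_F^2. Each row is shrunk toward
    zero by tau in l2 norm; rows with norm below tau are zeroed."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * Z
```

Rows whose l2 norm exceeds τ are uniformly shrunk; the rest vanish, which is exactly the group-sparsity pattern exploited by the dictionary selection.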
Since g(Y) is the dual function of problem (13), −g(Y) is convex. Thus, for any Y_1, Y_2,

g(Y_2) ≤ g(Y_1) + ⟨∇g(Y_1), Y_2 − Y_1⟩.

It is also easy to show that −g(Y) belongs to the class S^{1,1}_{0,ρ}(R^{m×n}) and that its generalized Hessian satisfies

0 ⪯ ∇^2(−g(Y)) ⪯ ρI,

where I is the identity matrix of compatible dimension. Therefore, we can solve problem (13) by minimizing the objective function −g(Y); that is,

Ŷ = argmin_Y −g(Y). (29)

The dictionary Ψ is then selected from the optimal solution Ŷ; that is, the ith column of Φ is chosen as an atom of Ψ if ‖X̂_{i·}‖_2 ≠ 0, where X̂ = D_τ(Φ^T Ŷ). The optimization algorithm is presented in the next subsection.

3.2. Optimization Methods.
In this subsection, we develop an efficient optimization algorithm to solve the dual problem (29). Because the objective function is continuously differentiable with Lipschitz continuous gradient, it is feasible to use gradient-based optimization methods, given their simplicity and low per-iteration complexity. However, classical gradient-based methods for functions with Lipschitz continuous gradient converge at a rate of O(1/k), where k is the number of iterations [19]. This is too slow, especially when dealing with large-scale datasets. Note that Nesterov showed in his work [24] that an accelerated gradient algorithm can be constructed such that O(1/k^2), the lower bound on the convergence rate for gradient-based methods [25], is achieved when minimizing unconstrained smooth functions. With this in mind, in the following we propose an accelerated thresholding algorithm to solve these smooth convex optimization problems using Nesterov's method with an adaptive line search scheme [19,26].
We recall Nesterov's method with an adaptive line search scheme as follows. Take the unconstrained smooth convex minimization problem min_{y∈R^n} f(y), for instance, where f(y) belongs to S^{1,1}_{μ,L}(R^n), μ ≥ 0, and L < +∞. Nesterov's method for this problem utilizes two sequences, {y_k} and {s_k}, y_k, s_k ∈ R^n. The search point s_k satisfies

s_k = y_k + β_k (y_k − y_{k−1}), (30)

where β_k is a tuning parameter. The approximate solution y_{k+1} can be computed as a gradient step from s_k:

y_{k+1} = s_k − (1/L_k) ∇f(s_k), (31)

where 1/L_k is the step size. Starting from an initial point y_0, s_k and y_{k+1} can be computed recursively according to (30) and (31), arriving at the optimal solution ŷ. Although Nesterov's method has been shown to be a very powerful optimization technique for the class S^{1,1}_{μ,L}(R^n) [19], how to choose β_k and 1/L_k in each iteration is a critical issue. When they are set properly, the sequence {y_k} converges to the optimum ŷ at a certain convergence rate. As a well-known scheme for setting β_k and L_k, Nesterov's constant scheme assumes β_k and L_k to be constant [19], while Nemirovski's line search scheme requires L_k to increase monotonically, with β_k independent of L_k [27]. Both settings result in slow convergence.
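A minimal sketch of the two-sequence scheme (30)-(31), using the standard momentum schedule and a fixed 1/L step (the adaptive line search of [26] would replace the fixed step below):

```python
import numpy as np

def nesterov(grad, y0, L, iters=1000):
    """Two-sequence accelerated gradient: search point s_k = y_k + beta_k (y_k - y_{k-1})
    as in (30), then a gradient step with step size 1/L as in (31)."""
    y_prev = y0.copy()
    y = y0.copy()
    t = 1.0
    for _ in range(iters):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next
        s = y + beta * (y - y_prev)      # (30): momentum search point
        y_prev, y = y, s - grad(s) / L   # (31): gradient step from s_k
        t = t_next
    return y

# Minimize the smooth convex test function f(y) = 0.5 * ||A y - b||^2.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, 3.0])
grad = lambda y: A.T @ (A @ y - b)
L = np.linalg.norm(A.T @ A, 2)           # Lipschitz constant of the gradient
y_hat = nesterov(grad, np.zeros(2), L)
```

On this quadratic the minimizer is y = (1, 3), and the O(1/k^2) rate makes the iterate converge to it far faster than plain gradient descent would.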
To overcome this drawback, an adaptive line search scheme for Nesterov's method is proposed in [26]. Under the assumption that μ, the lower bound of the strong convexity parameter, is known in advance, this scheme is built upon the estimate sequence [19] defined as follows.
The estimate sequence defined in Definition 6 has the following important property.
We further specify the estimate sequence of [19] as

φ_{k+1}(y) = (1 − α_k) φ_k(y) + α_k [ f(s_k) + ⟨∇f(s_k), y − s_k⟩ + (μ/2) ‖y − s_k‖^2 ],

where the sequences {α_k}, {v_k}, and {φ_k} satisfy the recursions given in [19]. Then Algorithm 2 in [26] is obtained by modifying Nemirovski's line search scheme with the adaptive parameters of this sequence, which satisfy Theorem 7.
Note that Theorem 3 indicates that the objective function satisfies the conditions for using Nesterov's method with an adaptive line search scheme. Therefore, we directly extend Algorithm 2 in [26] to the high-dimensional scenario to solve (29). The complete procedure is summarized in Algorithm 1.
In Algorithm 1, the while loop from Step 4 to Step 13 is designed to choose a proper step size satisfying the condition in Step 8. As the Lipschitz constant of the gradient of g(Y) is ρ, L_k is upper bounded by 2ρ, since the condition in Step 8 always holds when L_k ≥ ρ [27]. In Step 14, we initialize L_{k+1} = h(L_k), an adaptive update permitted by the condition in Step 8 [26]. Apparently, when L_k is large, L_{k+1} can be adjusted to prevent the step size 1/L_k from becoming too small, which would slow down the convergence rate.

3.3. Sign Inference and Dictionary Update Mechanism.
This subsection details how to use the dictionary to solve the sign inference problem. Actually, this problem bears similarity to the sign prediction problem in static signed networks or in unsigned networks that vary periodically [3,8,11,12].
In this paper, we intend to infer the unknown relationship between a pair of entities i and j based on partial relationship observations of the entire dynamic network at time t_0 + T. We expect to accomplish this task with the help of the dictionary constructed from the relationship data for times t_0 through t_0 + T − 1. As aforementioned, there exists strong dependence between the connection status at time t_0 + T and the historical relationship dataset of the dynamic network. We formulate the sign inference problem as the sparse recovery problem

min_x ‖x‖_0,  subject to  [Ψx]_Ω = y_Ω, (35)

where Ψ is the dictionary, y is the invertible vectorization of the matrix Ā^(t_0+T) observed at time t_0 + T, that is, y = vec(Ā^(t_0+T)), and Ω here indexes the entries observed at that time. Because A^(t_0+T) = Σ_i σ_i^(t_0+T) U_i^(t_0+T) by the SVD, the complete adjacency matrix admits a sparse representation over the dictionary; after the reconstruction Ψx̂ is reshaped into a matrix, its entries are rounded to signs to ensure that the elements coincide with the value setting of relationships.
Furthermore, assume that we are given a sequence of input samples Y = [y^(t_0+T), y^(t_0+T+1), ..., y^(T̄)], where y^(t) = vec(Ā^(t)), t_0 + T ≤ t ≤ T̄. The task of sign inference then becomes reconstructing the complete adjacency matrices A^(t) one by one. Since A^(t) may contain some features that are not included in the dictionary, it is necessary to add these features to the dictionary to increase the accuracy of the inference. However, the inferred matrix is not exactly the original matrix, and consequently the unobserved relationships are not truly known. In contrast, the observed adjacency matrix Ā^(t) retains all existing relationships. For this reason, we extract the features only from Ā^(t) rather than from the optimal solution of (35). We apply the extraction approach of Section 3.1 and add the complementary features to the dictionary. Note that this operation continuously increases the size of the dictionary as samples keep arriving for inference; the dictionary selection approach proposed in Section 3.2 is applied to compact the dictionary once its size exceeds a predetermined bound.
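The inference step is thus a sparse coding problem over Ψ followed by rounding to signs. The paper solves it with BAOMP; the sketch below uses plain orthogonal matching pursuit to illustrate the same pipeline (the dictionary atoms, observation indices, and sparsity level are illustrative constructions of ours):

```python
import numpy as np

def omp(Psi, y, k):
    """Plain orthogonal matching pursuit: greedily select k atoms to fit y."""
    residual, support = y.astype(float).copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Psi.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Psi[:, support], y, rcond=None)
        residual = y - Psi[:, support] @ coef
    x = np.zeros(Psi.shape[1])
    x[support] = coef
    return x

def infer_signs(Psi, y, obs_idx, k=1):
    """Fit coefficients on the observed entries only, reconstruct the full
    vector over the dictionary, and round to {-1, +1}."""
    x = omp(Psi[obs_idx, :], y[obs_idx], k)
    return np.sign(Psi @ x)

# Three candidate rank-1 sign patterns, vectorized into dictionary atoms.
v1 = np.array([1, 1, 1, -1, -1, -1.0])
v2 = np.array([1, -1, 1, -1, 1, -1.0])
v3 = np.array([1, 1, -1, -1, 1, 1.0])
Psi = np.column_stack([np.outer(v, v).ravel() for v in (v1, v2, v3)])
y = np.outer(v1, v1).ravel()                          # true complete network, vectorized
recovered = infer_signs(Psi, y, np.arange(18), k=1)   # observe only 18 of 36 entries
```

With only half the entries observed, the greedy fit still picks the correct atom and the sign rounding reproduces the full relationship matrix.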

Numerical Experiment
In this section, we perform experiments on synthetic networks and show that our low-rank model and dictionary learning method outperform other methods on the task of sign inference for dynamic signed networks. To ensure that our results are reliable, we conduct all experiments 20 times and average the results over all trials.
To construct synthetic networks, we first consider a weakly balanced complete network G whose adjacency tensor is A. The slice of A at time t is an adjacency matrix A^(t) in the form of (3). In addition, only a few patterns of A^(t) exist in A. The observation tensor Ā is formed by sampling entries from A. Concretely, we let the adjacency tensor A of G consist of 50 matrices of size 250 × 250 with a complete 4-weakly balanced structure. For the network G, four clusters are generated randomly. The size of each cluster is larger than 20, and the sizes sum to 250. We further assume that only part of the network relationships is observed, via uniform sampling with probability p ∈ (0, 1). This results in pN^2 entries being randomly sampled from each A^(t), where p is the fraction of observed entries. We choose a set of matrices whose loss rates range from 0.05 to 0.55 and apply the approach proposed in Section 3.2 to select the dictionary Ψ.
With the dictionary Ψ and a given observed matrix Ā^(t) at time t ≥ t_0 + T, the task of sign inference is accomplished by solving (35). We use BAOMP (backtracking-based adaptive orthogonal matching pursuit) to estimate the complete matrix A^(t) and compare the performance of our approach with two state-of-the-art methods for the sign inference problem, alternating least squares (ALS) [29] and singular value projection (SVP) [30]. Different from the accuracy defined by the relative error on the observed set in [8], we utilize the similarity measure |⟨A^(t), Â^(t)⟩| / (‖A^(t)‖_F ‖Â^(t)‖_F) between the original matrix A^(t) and the recovered matrix Â^(t). We vary the loss rate of the original matrix A^(t) from 0.5 to 0.999 and plot the inference accuracy in Figure 3 (loss rates: 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99, 0.995, and 0.999). Apparently, dictionary learning outperforms ALS and SVP. To present our results more clearly, we also use a visual representation in which white pixels represent 1 and black pixels represent −1. Figure 4 shows one example of sign inference; we find that the relationships and the clusters can almost always be accurately estimated by our inference approach.
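The similarity measure used here is the normalized absolute inner product between the original and recovered matrices; a one-line sketch:

```python
import numpy as np

def similarity(A, A_hat):
    """|<A, A_hat>| / (||A||_F * ||A_hat||_F): equals 1 when the matrices agree
    up to a nonzero scalar, and 0 when they are orthogonal as vectors."""
    return abs(np.sum(A * A_hat)) / (np.linalg.norm(A) * np.linalg.norm(A_hat))
```

Because it is scale invariant, the measure rewards recovering the correct sign pattern rather than exact entry values.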

Conclusion
In this paper, we establish a low-rank tensor model for dynamic weakly balanced signed networks. With this model, we first extract the feature pool and propose an approach to extract a compact dictionary from the pool. To improve the performance of the selection approach, we derive the corresponding dual problem and introduce an accelerated thresholding algorithm to solve it. Consequently, the optimal solution of the primal problem can be readily obtained from the optimum of the dual problem. In addition, combined with the compact dictionary generation method and a dictionary self-updating mechanism, a sign inference approach is provided for estimating the unknown relationships in dynamic signed networks.

Figure 1: Illustration of the adjacency tensor; the cube units symbolize the relationship data: (a) the adjacency tensor of the observed network and (b) the adjacency tensor of the underlying complete network.

Figure 3: Accuracy of sign inference algorithms on synthetic datasets. In general, we can see that dictionary learning outperforms ALS and SVP.
Figure 4: A visual example of sign inference (white pixels represent 1; black pixels represent −1).