Mining Important Nodes in Directed Weighted Complex Networks

In complex networks, mining important nodes has long been a concern of scholars. In recent years, research has focused on mining important nodes in undirected unweighted complex networks, but most of these methods are not applicable to directed weighted complex networks. Therefore, this paper proposes a Two-Way-PageRank method, based on PageRank, for mining important nodes in directed weighted complex networks. We mainly consider the frequency of contact between nodes and the length of time of contact between nodes. We also consider simultaneously where a node's links come from (in-degree) and where they go (out-degree). We give performance indicators of node importance. Through numerical examples, we analyze the impact of variations in some parameters on these indicators. Finally, we verify the accuracy and validity of the method with empirical network data.


Introduction
Complex networks are composed of nodes and edges between nodes [1]. However, in most cases nodes differ in importance; that is, different nodes have different weights. In real networks, identifying important nodes is crucial for understanding and controlling the whole network [2][3][4][5].
Currently, most algorithms for mining important nodes focus on undirected unweighted complex networks. For example, degree centrality [6] holds that the more neighbors a node has, the more important it is. The information index [7] depends on the amount of information passing along propagation paths. Kitsak et al. [8] proposed k-shell decomposition. Many other measures, such as closeness centrality [9], subgraph centrality [10], eigenvector centrality [11], and cumulative nomination [12], have also been proposed to evaluate the importance of nodes in networks. Ren and Lv [13], Wang and Zhang [14], and He et al. [15] have written excellent surveys of these measures. Further, Sun and Luo [16] and Liu et al. [17] reviewed the history of methods for mining important nodes in complex networks and summarized the research results. Most of this work focuses on undirected unweighted complex networks. However, undirected unweighted networks reflect only the connections between nodes and the topology of the network. They cannot describe the direction and intensity of interactions between nodes: when a network is abstracted into a simple undirected unweighted network, much information that would help the analysis is lost. Some scholars [18][19][20][21] have therefore begun to study directed weighted networks. Among them, Hu [22] proposed DWNodeRank, a PageRank-based method for evaluating the importance of nodes in directed weighted complex networks. Chen [23] improved the DWNodeRank algorithm and proposed the B-DWNodeRank algorithm. Reference [24] analyzed the structural characteristics of weighted complex networks, considering the influence of edge weights on nodes. Since many factors are involved in identifying influential nodes, the problem can be viewed as a multiattribute decision making (MADM) problem [25,26]. Many MADM methods, such as fuzzy sets [27] and evidence theory [28,29], are widely used to rank nodes in complex networks [30]. The author gave a new definition of the importance degree of weighted nodes.
These studies did not consider the out-degree of a node. In addition, PageRank [31], developed by Google founders Brin and Page at Stanford University, holds that the most important pages on the Internet are those with the most links leading to them. In other words, the importance of a web page is determined by its inbound links (in-degree) rather than its outbound links (out-degree). In fact, the importance of a node depends on both its in-degree and its out-degree. For example, the importance of a school is determined by its appeal to students (its in-degree) and by its students' employment (its out-degree). Similarly, the importance of a person depends on how many people he or she can attract and whom he or she pays attention to. Another major feature of this paper is that we mainly use the frequency of contact and the length of contact between nodes as the weight of an edge.
This paper proposes the Two-Way-PageRank method based on PageRank. We first analyze two factors that affect the importance of nodes and give the definition and expression of node importance. We then give an algorithm for computing it. Subsequently, the effects of some parameters on the results are analyzed through numerical simulation. Finally, conclusions are given.

Preliminaries
A directed weighted network (DWN) is a tuple $DWN = (V, E, W)$. $V = \{v_1, v_2, \ldots, v_n\}$ is a finite set whose elements are called nodes; that is, $v_i$ represents node $i$. $E = \{e_{ij}\} \subseteq V \times V$ is a set of ordered pairs of nodes whose elements are called edges, and $|E|$ is the number of edges ($|\cdot|$ denotes the cardinality of a set). The indices $i, j$ run from 1 to $n$, where $n$ is the size of the network. In a directed network, edges are ordered pairs of nodes, so $\langle v_i, v_j \rangle \in E$ represents an edge from $v_i$ to $v_j$. $W = \{w_{ij}\}$ is the set of edge weights, where $w_{ij}$ is the weight of edge $e_{ij}$. DWN is called a weighted network if $w_{ij}$ can be any real number greater than 0. In this paper, we consider a directed weighted network, and $w_{ij}$ is defined as described below.
Because DWN is a directed network, each node $v_i$ has a weighted out-degree $s_{\text{out}}(v_i)$ and a weighted in-degree $s_{\text{in}}(v_i)$. $s_{\text{out}}(v_i)$ is the sum of the weights of the edges that point out from $v_i$; similarly, $s_{\text{in}}(v_i)$ is the sum of the weights of the edges that point to $v_i$. They are given by

$$s_{\text{out}}(v_i) = \sum_{v_j \in \Gamma_{\text{out}}(v_i)} w_{ij}, \qquad s_{\text{in}}(v_i) = \sum_{v_j \in \Gamma_{\text{in}}(v_i)} w_{ji},$$

where $\Gamma_{\text{out}}(v_i)$ is the set of nodes that $v_i$ points to and $\Gamma_{\text{in}}(v_i)$ is the set of nodes that point to $v_i$. Generally speaking, the strength of a relationship depends primarily on two factors: the frequency of contact and the length of contact (intimacy). In different practical networks, $w_{ij}$ has different meanings. In this paper, it represents the strength of the relationship between $v_i$ and $v_j$. If $v_i$ and $v_j$ represent persons, $w_{ij}$ measures the closeness between them. The higher the frequency of contact and the longer the contact (intimacy), the closer their relationship. These two factors are calculated as follows, based on [32].
(a) Frequency Factor. It depends on how often $v_i$ points to $v_j$, that is, the number of times that $v_i$ takes the initiative to meet with $v_j$:

$$FF_{ij} = \frac{f_{ij}}{\sum_{k=1}^{n} f_{kj}}.$$

Here, $FF_{ij}$ is the frequency factor of $v_i$ pointing to $v_j$, $f_{ij}$ is the number of times that $v_i$ takes the initiative to meet with $v_j$, and $\sum_{k} f_{kj}$ is the number of times that all nodes take the initiative to meet with $v_j$.
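(b) Length of Contact (Intimacy). This is the duration of contact, which depends on how long $v_i$ takes the initiative to contact $v_j$:

$$FI_{ij} = \frac{l_{ij}}{\sum_{k=1}^{n} l_{kj}},$$

where $FI_{ij}$ is the length-of-contact factor of $v_i$ pointing to $v_j$, $l_{ij}$ is the contact time that $v_i$ takes the initiative to spend with $v_j$, and $\sum_{k} l_{kj}$ is the contact time that all nodes take the initiative to spend with $v_j$. The edge weight then combines the two factors:

$$w_{ij} = \alpha\, FF_{ij} + \beta\, FI_{ij},$$

with variable parameters $\alpha, \beta$ satisfying $0 < \alpha, \beta < 1$ and $\alpha + \beta = 1$.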
Note that, in general, $w_{ij} \neq w_{ji}$.
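To make these definitions concrete, the following Python sketch computes $w_{ij} = \alpha FF_{ij} + \beta FI_{ij}$ from contact counts and contact durations, and then the weighted out- and in-degrees. For illustration only, it assumes the observations are stored in dictionaries keyed by ordered node pairs `(i, j)`. The function names `edge_weights` and `weighted_degrees` and the sample data are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def edge_weights(freq, length, alpha=0.5):
    """Return {(i, j): w_ij} with w_ij = alpha * FF_ij + beta * FI_ij.

    freq[(i, j)]   -- number of times node i takes the initiative to meet node j
    length[(i, j)] -- total contact time that node i initiates with node j
    Both dictionaries are assumed to have the same keys (the directed edges).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    beta = 1.0 - alpha  # enforces alpha + beta = 1
    # Denominators of FF and FI: totals over all initiators for each target j.
    freq_to, length_to = defaultdict(float), defaultdict(float)
    for (i, j), f in freq.items():
        freq_to[j] += f
        length_to[j] += length[(i, j)]
    return {
        (i, j): alpha * freq[(i, j)] / freq_to[j]
        + beta * length[(i, j)] / length_to[j]
        for (i, j) in freq
    }


def weighted_degrees(weights):
    """Return (s_out, s_in): summed weights of outgoing and incoming edges per node."""
    s_out, s_in = defaultdict(float), defaultdict(float)
    for (i, j), w in weights.items():
        s_out[i] += w
        s_in[j] += w
    return dict(s_out), dict(s_in)


# Hypothetical observations: nodes 0 and 1 initiate contact with node 2,
# and node 2 initiates contact with node 0.
freq = {(0, 2): 3, (1, 2): 1, (2, 0): 2}
length = {(0, 2): 10.0, (1, 2): 30.0, (2, 0): 5.0}
w = edge_weights(freq, length, alpha=0.75)
s_out, s_in = weighted_degrees(w)
print(w)      # {(0, 2): 0.625, (1, 2): 0.375, (2, 0): 1.0}
print(s_out)  # {0: 0.625, 1: 0.375, 2: 1.0}
print(s_in)   # {2: 1.0, 0: 1.0}
```

$FF_{ij}$ and $FI_{ij}$ each sum to 1 over the initiators $v_i$ of a given target $v_j$. Under this reading, the weights entering any node that has incoming edges therefore sum to $\alpha + \beta = 1$, as `s_in` shows. The sample also illustrates $w_{ij} \neq w_{ji}$: here $w_{02} = 0.625$ but $w_{20} = 1.0$.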

The Two-Way-PageRank for Mining Important Nodes in Directed Weighted Complex Networks
Indeed, the importance of a node depends on both its in-degree (where its links come from) and its out-degree (where its links go).
Hence, we consider the importance of a node from its weighted in-degree and its weighted out-degree simultaneously.

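The Definition of the Importance of $v_i$ in Directed Weighted Complex Networks.
Let $I(v)$ be the importance of node $v$ in a directed weighted complex network. Suppose that the nodes which point to $v_j$ are $u_1, u_2, \ldots, u_r$ (including $v$). The sum of the weighted in-degrees of $v_j$ is then

$$s_{\text{in}}(v_j) = \sum_{k=1}^{r} w_{u_k v_j},$$

and $v$ obtains the following share of importance from $v_j$:

$$\frac{w_{v v_j}}{s_{\text{in}}(v_j)}\, I_1(v_j).$$

Likewise, suppose that the nodes to which $v_j$ points are $u_1, u_2, \ldots, u_r$ (including $v$). The sum of the weighted out-degrees of $v_j$ is defined by

$$s_{\text{out}}(v_j) = \sum_{k=1}^{r} w_{v_j u_k},$$

and $v$ obtains the following share of importance from $v_j$:

$$\frac{w_{v_j v}}{s_{\text{out}}(v_j)}\, I_2(v_j).$$

Summing these contributions over all such $v_j$ and adding a random jump term gives

$$I_1(v) = \frac{\mu}{n} + \lambda \sum_{v_j \in \Gamma_{\text{out}}(v)} \frac{w_{v v_j}}{s_{\text{in}}(v_j)}\, I_1(v_j), \tag{9a}$$

$$I_2(v) = \frac{\mu}{n} + \lambda \sum_{v_j \in \Gamma_{\text{in}}(v)} \frac{w_{v_j v}}{s_{\text{out}}(v_j)}\, I_2(v_j), \tag{9b}$$

where $I_1(v_j)$ and $I_2(v_j)$ are the two importance components of $v_j$, and $\lambda, \mu$ ($0 < \lambda, \mu < 1$, $\lambda + \mu = 1$) are random jump factors. In (9a), $v_j$ is the ending node of an edge that starts from $v$; in (9b), $v_j$ is a node that points to $v$. $w_{v v_j}$ is the weight of edge $e_{v v_j}$, and $w_{v_j v}$ is the weight of edge $e_{v_j v}$. The importance of $v$ combines both components, $I(v) = I_1(v) + I_2(v)$.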
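The node-level form of (9a) and (9b) can be read directly as one synchronous update. The sketch below is a hypothetical Python helper, `sweep`, that operates on a dictionary of edge weights. It assumes every node has at least one incoming and one outgoing edge, so that $s_{\text{in}}$ and $s_{\text{out}}$ are nonzero. Its default $\lambda = 0.85$ is the value conventionally used for PageRank, not a value given in this paper.

```python
def sweep(weights, nodes, I1, I2, lam=0.85):
    """One synchronous update of (9a) and (9b), written node by node.

    weights -- dict {(i, j): w_ij} of directed edge weights
    I1, I2  -- dicts with the current values of the two importance components
    lam     -- damping factor lambda; mu = 1 - lam is the random-jump share
    """
    mu = 1.0 - lam
    n = len(nodes)
    s_out = {v: 0.0 for v in nodes}
    s_in = {v: 0.0 for v in nodes}
    for (i, j), w in weights.items():
        s_out[i] += w
        s_in[j] += w
    new1 = {v: mu / n for v in nodes}  # random-jump term mu / n
    new2 = {v: mu / n for v in nodes}
    for (i, j), w in weights.items():
        # (9a): i receives from j, the end node of its out-edge i -> j,
        # the share w_ij / s_in(j) of I1(j).
        new1[i] += lam * w / s_in[j] * I1[j]
        # (9b): j receives from i, a node pointing into j,
        # the share w_ij / s_out(i) of I2(i).
        new2[j] += lam * w / s_out[i] * I2[i]
    return new1, new2


# Hypothetical network in which every node has in- and out-edges.
nodes = [0, 1, 2]
weights = {(0, 1): 1.0, (1, 2): 2.0, (2, 0): 1.0, (0, 2): 1.0}
I1 = {v: 1 / 3 for v in nodes}
I2 = {v: 1 / 3 for v in nodes}
for _ in range(200):
    I1, I2 = sweep(weights, nodes, I1, I2)
print(I1, I2)
```

Repeating the sweep drives both components toward the fixed point of (9a) and (9b). Because $\lambda + \mu = 1$, each sweep keeps each component summing to 1 when every node has incoming and outgoing edges.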

The Algorithm of the Two-Way-PageRank.
Let $M = (m_{ij})_{n \times n}$ be the adjacency matrix of the directed weighted complex network DWN, whose element $m_{ij}$ is the weight of the edge from $v_i$ to $v_j$ and 0 otherwise. By convention, $m_{ii} = 0$. Then

$$m_{ij} = \begin{cases} w_{ij}, & \langle v_i, v_j \rangle \in E, \\ 0, & \text{otherwise}. \end{cases}$$

We first normalize $M$ by dividing each element by the sum of the elements in its row. This gives the probability transition matrix $P = (p_{ij})_{n \times n}$:

$$p_{ij} = \frac{m_{ij}}{\sum_{k=1}^{n} m_{ik}},$$

with $p_{ij}$ the transition probability from $v_i$ to $v_j$. Obviously, every element of $P$ is nonnegative and each row sums to 1, $\sum_{j=1}^{n} p_{ij} = 1$, so $P$ is a stochastic matrix. Transposing $P$ gives $P^{T}$. We transpose $P$ because we consider the weighted in-degrees of the node.
In addition, we normalize $M$ by dividing each element by the sum of the elements in its column. This gives the probability transition matrix $Q = (q_{ij})_{n \times n}$:

$$q_{ij} = \frac{m_{ij}}{\sum_{k=1}^{n} m_{kj}},$$

where $q_{ij}$ is the transition probability associated with the edge from $v_i$ to $v_j$. It is not difficult to see that every element of $Q$ is nonnegative and that each column sums to 1, $\sum_{i=1}^{n} q_{ij} = 1$; likewise, $Q$ is a stochastic matrix. We normalize $M$ by columns because we consider the weighted out-degrees of the node. Then, according to (9a) and (9b), the matrices $H_1$ and $H_2$ can be written explicitly as

$$H_1 = \lambda Q + \frac{\mu}{n}\,\mathbf{e}\mathbf{e}^{T}, \qquad H_2 = \lambda P^{T} + \frac{\mu}{n}\,\mathbf{e}\mathbf{e}^{T},$$

in which $\mathbf{e} = (1, 1, \ldots, 1)^{T}$ is the $n \times 1$ vector of ones, so that $\mathbf{e}\mathbf{e}^{T}$ is the $n \times n$ matrix whose elements are all 1. It is not difficult to see that $H_1$ and $H_2$ are irreducible (column-)stochastic matrices with eigenvalue 1. The eigenvectors associated with eigenvalue 1 are the stationary distributions of $H_1$ and $H_2$.
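The matrix construction can be sketched in plain Python as follows. The helper name `transition_matrices`, the list-of-lists representation of $M$, and the default $\lambda = 0.85$ are assumptions for illustration. The function rejects networks in which a node lacks incoming or outgoing edges, because the row or column normalization would then divide by zero.

```python
def transition_matrices(M, lam=0.85):
    """Return P (row-normalised), Q (column-normalised), P^T, H1 and H2.

    M is an n x n list of lists with M[i][j] = w_ij (0 if there is no edge).
    """
    n = len(M)
    mu = 1.0 - lam
    row_sums = [sum(row) for row in M]                             # s_out(v_i)
    col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]  # s_in(v_j)
    if any(s == 0 for s in row_sums + col_sums):
        raise ValueError("every node needs an outgoing and an incoming edge")
    P = [[M[i][j] / row_sums[i] for j in range(n)] for i in range(n)]
    Q = [[M[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    PT = [list(col) for col in zip(*P)]  # transpose of P
    # H = lam * (stochastic matrix) + (mu / n) * e e^T, with e e^T all ones
    H1 = [[lam * Q[i][j] + mu / n for j in range(n)] for i in range(n)]
    H2 = [[lam * PT[i][j] + mu / n for j in range(n)] for i in range(n)]
    return P, Q, PT, H1, H2


# Hypothetical 3-node network: 0->1, 0->2, 1->2 (weight 2), 2->0.
M = [[0, 1, 1], [0, 0, 2], [1, 0, 0]]
P, Q, PT, H1, H2 = transition_matrices(M)
```

Every column of `H1` and `H2` sums to $\lambda + \mu = 1$, and all their entries are positive. Positive entries are what make the two matrices irreducible.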
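We can use the power iteration method to compute the stationary distributions of $H_1$ and $H_2$. The iterative formulas are

$$\pi_1^{(k+1)} = H_1\,\pi_1^{(k)}, \qquad \pi_2^{(k+1)} = H_2\,\pi_2^{(k)}, \qquad k = 0, 1, 2, \ldots$$

The initial importance values based on the weighted out-degree and the weighted in-degree are set to $\pi_1^{(0)} = x$ and $\pi_2^{(0)} = y$, respectively, where $x = (x_1, x_2, \ldots, x_n)^{T}$ and $y = (y_1, y_2, \ldots, y_n)^{T}$. For simplicity, each initial value is the ratio of the weighted out-degree (in-degree) of $v_i$ ($i = 1, 2, \ldots, n$) to the sum of the weighted out-degrees (in-degrees) of all nodes in the network:

$$x_i = \frac{s_{\text{out}}(v_i)}{\sum_{k=1}^{n} s_{\text{out}}(v_k)}, \qquad y_i = \frac{s_{\text{in}}(v_i)}{\sum_{k=1}^{n} s_{\text{in}}(v_k)}.$$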
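Given a precision $\varepsilon > 0$, the iteration stops when

$$\left\|\pi_1^{(k+1)} - \pi_1^{(k)}\right\| < \varepsilon \quad \text{and} \quad \left\|\pi_2^{(k+1)} - \pi_2^{(k)}\right\| < \varepsilon.$$

At this point we take the approximations $\pi_1 = \pi_1^{(k+1)}$ and $\pi_2 = \pi_2^{(k+1)}$. Finally, we compute $I = \pi_1 + \pi_2$ and rank the elements of $I$ from largest to smallest, which gives the order of importance of the nodes.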
In practice, $\varepsilon$ is chosen so that $\varepsilon \ll 1$.
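The iteration, the stopping rule, and the final ranking can be sketched as below. The text does not specify the norm, so the sketch assumes the $L_1$ norm for $\|\cdot\|$. It advances both chains together and stops only when both changes fall below $\varepsilon$. The helper names are hypothetical. The $2 \times 2$ matrices in the example are arbitrary column-stochastic matrices chosen for illustration, not matrices derived from a network.

```python
def initial_vectors(M):
    """x_i = s_out(v_i) / total weight and y_i = s_in(v_i) / total weight."""
    n = len(M)
    s_out = [sum(M[i]) for i in range(n)]
    s_in = [sum(M[i][j] for i in range(n)) for j in range(n)]
    total = sum(s_out)  # equals sum(s_in): each edge weight is counted once
    return [s / total for s in s_out], [s / total for s in s_in]


def _matvec(H, v):
    return [sum(h * x for h, x in zip(row, v)) for row in H]


def _l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def two_way_power_iteration(H1, H2, x0, y0, eps=1e-10, max_iter=1000):
    """Iterate pi1 <- H1 pi1 and pi2 <- H2 pi2 until both L1 changes are below eps."""
    pi1, pi2 = list(x0), list(y0)
    for _ in range(max_iter):
        new1, new2 = _matvec(H1, pi1), _matvec(H2, pi2)
        done = _l1(new1, pi1) < eps and _l1(new2, pi2) < eps
        pi1, pi2 = new1, new2
        if done:
            return pi1, pi2
    raise RuntimeError("power iteration did not converge within max_iter steps")


def rank_nodes(pi1, pi2):
    """Return I = pi1 + pi2 and the node indices ordered from largest to smallest I."""
    I = [a + b for a, b in zip(pi1, pi2)]
    return I, sorted(range(len(I)), key=lambda i: I[i], reverse=True)


# Illustration with arbitrary 2 x 2 column-stochastic matrices.
x0, y0 = initial_vectors([[0, 1], [3, 0]])  # x0 = [0.25, 0.75], y0 = [0.75, 0.25]
pi1, pi2 = two_way_power_iteration([[0.5, 0.5], [0.5, 0.5]],
                                   [[0.9, 0.2], [0.1, 0.8]], x0, y0)
I, order = rank_nodes(pi1, pi2)
print(order)  # [0, 1]
```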
The steps of the algorithm for mining important nodes in directed weighted complex networks are as follows.
Step 1. Compute the edge weights $w_{ij}$ and construct the adjacency matrix $M$ of the directed weighted complex network.
Step 2. Normalize $M$ by rows and by columns to obtain the probability transition matrices $P$ and $Q$.
Step 3. Transpose $P$ to obtain $P^{T}$.
Step 4. Calculate the matrices $H_1$ and $H_2$ according to (9a) and (9b).
Step 5. Compute the stationary distributions of $H_1$ and $H_2$ with the power iteration method, that is, solve $\pi_1 = H_1 \pi_1$ and $\pi_2 = H_2 \pi_2$.
Step 6. Compute $I = \pi_1 + \pi_2$ and rank its elements from largest to smallest; this is the order of importance of the nodes.
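Putting Steps 2 to 6 together gives the compact end-to-end sketch below. It takes an already weighted adjacency matrix $M$ (Step 1 is assumed done), uses the $L_1$ norm in the stopping rule, defaults to $\lambda = 0.85$, and assumes every node has at least one incoming and one outgoing edge. The function name `two_way_pagerank` and the three-node example are hypothetical.

```python
def two_way_pagerank(M, lam=0.85, eps=1e-10, max_iter=1000):
    """Steps 2-6 on a weighted adjacency matrix M (M[i][j] = w_ij, 0 if no edge).

    Returns the importance vector I = pi1 + pi2 and the node indices ranked by I.
    """
    n = len(M)
    mu = 1.0 - lam
    row = [sum(M[i]) for i in range(n)]                       # s_out(v_i)
    col = [sum(M[i][j] for i in range(n)) for j in range(n)]  # s_in(v_j)
    if any(s == 0 for s in row + col):
        raise ValueError("every node needs an outgoing and an incoming edge")
    # Steps 2-4: H1 = lam * Q + (mu/n) ee^T and H2 = lam * P^T + (mu/n) ee^T.
    H1 = [[lam * M[i][j] / col[j] + mu / n for j in range(n)] for i in range(n)]
    H2 = [[lam * M[j][i] / row[j] + mu / n for j in range(n)] for i in range(n)]
    # Step 5: power iteration from the normalised weighted out-/in-degrees.
    total = sum(row)
    pi1 = [s / total for s in row]
    pi2 = [s / total for s in col]
    for _ in range(max_iter):
        new1 = [sum(H1[i][j] * pi1[j] for j in range(n)) for i in range(n)]
        new2 = [sum(H2[i][j] * pi2[j] for j in range(n)) for i in range(n)]
        done = (sum(abs(a - b) for a, b in zip(new1, pi1)) < eps
                and sum(abs(a - b) for a, b in zip(new2, pi2)) < eps)
        pi1, pi2 = new1, new2
        if done:
            break
    else:
        raise RuntimeError("power iteration did not converge within max_iter steps")
    # Step 6: combine the two components and rank from largest to smallest.
    I = [a + b for a, b in zip(pi1, pi2)]
    ranking = sorted(range(n), key=lambda i: I[i], reverse=True)
    return I, ranking


# Hypothetical 3-node network in which every edge touches node 0.
M = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
I, ranking = two_way_pagerank(M)
print(ranking[0])  # 0
```

In the example, every edge of the network touches node 0, so it ranks first in both $\pi_1$ and $\pi_2$.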

Experiment Simulation
In this section, we apply the method to the directed weighted network shown in Figure 1. Its adjacency matrix is denoted $M$.
The importance values depend on both the in-degree and the out-degree of a node, so considering only the in-degree or only the out-degree is clearly not enough. When the in-degree and out-degree of nodes are considered simultaneously, important nodes are identified better. This example shows that the Two-Way-PageRank method can mine important nodes effectively.

Conclusions and Discussion
Recently, research on complex networks has shown that some real networks contain important nodes. Such nodes play an important role in the actual network and can control the entire network. Different physical quantities have been used to define the important nodes of complex networks. However, existing studies of important nodes have mainly focused on undirected unweighted complex networks, and such analyses do not accurately reflect the actual information in the networks. In this paper, we therefore addressed the problem of mining important nodes in directed weighted complex networks by constructing a novel Two-Way-PageRank analysis method. We have presented a quantifiable metric and shown how it can be used to analyze the relative importance of nodes in terms of their contributions to overall network connectivity. Numerical examples of real directed weighted complex networks show that the importance of a node cannot be well characterized when only its in-degree or only its out-degree is considered. The proposed Two-Way-PageRank analysis method can reveal the importance of nodes in directed weighted complex networks such as infectious disease networks and social networks. These results not only deepen our understanding of the interplay between network topology and dynamical processes but also have implications in all areas where ranking plays a role, from social networks to marketing.
Our algorithm has been verified on small networks. In future work, we will build a real data set to further verify the algorithm. In addition, two questions remain open: what is the relationship between the accuracy of the results and the number of iterations, and how can important nodes be mined in directed weighted dynamic complex networks? We hope to address these problems more systematically in future work.

Figure 1: An example directed weighted network consisting of 13 nodes and 24 edges. The figure shows the connection relationships of the directed weighted network.

Figure 2: The values of the importance index of nodes when the parameter equals 0.2.

Figure 3: The values of the importance index of nodes when the parameter equals 0.45.

Table 1: The values of the importance of nodes.

Table 2: The ranks of the importance of nodes.