Key Node Discovery Algorithm Based on Multiple Relationships and Multiple Features in Social Networks

The key nodes play important roles in the processes of information propagation and opinion evolution in social networks. Previous work has rarely incorporated multiple relationships and multiple features into key node discovery algorithms at the same time. Based on the relational networks in a social network, including the forwarding, replying, and mentioning networks, this paper first proposes an algorithm of the overlapping user relational network to extract different relational networks with the same nodes. By integrating these relational networks, a multirelationship network is established. Subsequently, a key node discovery (KND) algorithm is presented on the basis of the shortest path, degree centrality, and random walk features in the multirelationship network. The advantages of the proposed KND algorithm are demonstrated with the SIR propagation model and the normalized discounted cumulative gain on the multirelationship network and single-relation networks. The experimental results show that the proposed KND method for finding key nodes is superior to other baseline methods on different networks.


Introduction
With the rapid development of social networks (e.g., Facebook, Twitter, and Sina Weibo), they have become the main platforms for people to obtain, spread, and exchange information. In social networks, how can we spread information quickly? How can we effectively control the speed of virus diffusion? How can we efficiently limit the reach of rumor propagation? How can we correctly control and guide the evolution of public opinion? In these practical application scenarios, key nodes play important roles in the structures and functions of networks [1][2][3]. In recent decades, scholars have mainly focused on single-relation networks [4,5], that is, networks consisting of the same type of nodes and only one type of relationship between nodes. Traditional single-feature mining includes degree centrality and its variants [6]. Chen et al. [7] proposed a degree discount centrality algorithm for effective influence maximization. Sheikhahmadi et al. [8] proposed a degree distance centrality algorithm; their experiments showed that it outperformed other measures, including high degree and betweenness, on eight large-scale networks. Wang et al. [9] proposed a degree punishment method to select spreaders and adopted the SIR (Susceptible-Infected-Recovered) model to assess its performance. Combining the number of neighbors, the influence of neighbors, and the clustering coefficient, Chen et al. [10] proposed the ClusterRank algorithm; their experiments showed that it significantly outperforms degree centrality and k-core decomposition. Besides, there are many other methods to find the key nodes in networks, such as the feature vector method [11], shortest path increment [12], spreading-influence-related centrality [13], PageRank [14], LeaderRank [15], HITS [16], k-shell centrality [17], and improved k-shell algorithms [18][19][20][21].
These results show that key node identification has been very successful in single-relation networks.
However, in real social networks, users often participate in different social activities in various ways and form various connections with different users [4][22][23][24]. These different interactive relationships form multirelationship networks [25]. In multirelationship networks, there are many types of relationships between nodes, and the types of nodes may also differ. Multirelationship networks therefore carry more information than single-relation networks and express the diversity of relationships in social networks more effectively. Battiston et al. [25] first proposed a symbolic representation of multirelationship networks. Boccaletti et al. [26] further enhanced the understanding of multirelationship networks. Al-Garadi et al. [27] applied node identification methods for single-relation networks to multirelationship networks; however, they mined only a single feature of the networks and failed to identify the key nodes accurately. Chen et al. [13,28] integrated multiple centrality measures into a hierarchical ranking method to identify the key nodes in complex social networks. Wang et al. [29] proposed a new metric for measuring key nodes in multilayer networks and further verified it on single-layer networks, multirelationship networks, and aggregated networks. Li et al. [30] proposed a key node identification method for multilayer networks based on evidence theory; their method has high computational complexity and is not applicable to large social networks. Pedroche et al. [31,32] extended some algorithms for single-layer networks to two-layer networks. Fu et al. [33] used representation learning to learn the global and local structural features of networks, and their method represented the characteristics of nodes well. Singh et al. [34] proposed a new multirelationship network aggregation method to identify key nodes.
Their experiments showed that the method has a clear advantage for influence maximization across multiple social networks. Huang et al. [35] treated each community as a node through an extended neighbor strategy, transformed the multilayer network into a single-layer network, and then proposed an algorithm to find the key nodes in monolayer or multilayer networks.
Thus, key node discovery is beneficial for understanding the structural properties of multirelationship networks.
At present, there is no unified concept of key node discovery in multirelationship networks [36][37][38][39][40]. Existing methods of key node discovery for multirelationship networks have not fully exploited the importance of the relationships between different layers or the importance of the edges between layers. Therefore, key node discovery algorithms for multirelationship networks still need further study.
In this paper, we design a novel algorithm for identifying the top-K key nodes in social networks. First, we propose an algorithm of overlapping user relational networks to establish different relational networks in social networks. Subsequently, we construct a multirelationship network based on these relational networks. Then, we build a node influence measure based on multiple relationships and multiple features to rank influential nodes. On the multirelationship network, we propose a key node discovery (KND) algorithm to obtain the key nodes. Comparison experiments are conducted to verify the effectiveness of the KND method on social networks. The main contributions of this paper are as follows:
(1) An algorithm of overlapping user relational networks is proposed. This algorithm is able to extract different relational networks with the same nodes.
(2) An algorithm for establishing a multirelationship network is proposed based on the different relational networks. The multirelationship network fully integrates the multiple relationships and multiple features in social networks.
(3) A key node discovery (KND) algorithm with multiple relationships and multiple features is proposed. In this method, a novel node influence measure is built on the multirelationship network. Based on the node influence measure and the algorithms of overlapping user relational networks and multirelationship networks, the key node discovery algorithm is designed to find the key nodes on the multirelationship network.
Comparison experiments on different datasets verify that the performance of the proposed KND algorithm is better than that of the baseline methods. In terms of the normalized discounted cumulative gain (NDCG), the proposed KND algorithm obtains the best overall NDCG scores on different networks. These results show that the proposed KND algorithm can accurately find the top-K nodes in social networks. The rest of the paper is organized as follows. Section 2 presents related definitions and the proposed algorithms. Section 3 explains the experimental setup and analyzes the results of comparison experiments on three datasets. Section 4 draws conclusions and outlines future directions.

Key Node
… $V_i = \{v_1, v_2, v_3, \dots, v_n\}$ represents the set of users in the ith relational social network; $E_i = \{e_1, e_2, e_3, \dots, e_n\}$ represents the set of edges between nodes in the ith relational network; $R_i = (r_1, r_2, r_3, \dots, r_l)$ represents the set of relationships among users at each layer in the social network, with values in [0, 1]; and $W_i = \{W_{1i}, W_{2i}, W_{3i}, \dots, W_{ni}\}$ represents the set of weights in the ith relational social network.
Integrated with the $l$ relational networks, the multirelationship network … and $W_i = \{W_{1i}, W_{2i}, W_{3i}, \dots, W_{li}\}$. Specifically, E represents the set of interlayer edges connecting the ath and bth relational networks; $V_a$ and $V_b$ represent the nodes of the ath and bth relational networks, respectively; and W is the set of weights of all user pairs in different relationships. To better measure the weights of the relationships in the social network, their values are normalized in the following. Figure 1 shows an example of a social network and its multirelationship network. In Figure 1(a), the social network has three relational layers, each with 10 nodes, and the dotted lines represent the edges between layers.
Each layer represents one type of relationship: the first layer is a forwarding network, whose edges (blue lines) represent the relationships between users; the second layer is a replying network, whose relationships between users are represented by red lines; and the third layer is a mentioning network, whose relationships between users are represented by green lines. Assume that the different relational networks are connected through users who participate in two or more relationships. Then all the relational networks are aggregated into the multirelationship network, as shown in Figure 1.
Definition 1 (Neighbors) [34]. The neighbors of a node u are denoted by $N(u) = \{v \mid (u, v) \in E\}$.
Definition 2 (Degree centrality) [34]. For a node $u \in G$, the degree centrality is the number of links incident upon node u; that is, $D(u) = |N(u)|$.
Definition 3 (Shortest path of a node) [12]. Let $d_{iv}$ represent the distance from node i to node v. Then the sum of the distances from node i to the other nodes in graph G, denoted by C(i), is called the shortest path of node i; that is, $C(i) = \sum_{v \in G, v \neq i} d_{iv}$.
Definition 4 (Shortest path in the graph G) [12]. The shortest path in graph G is the sum of the shortest paths of all nodes of G; that is, $C(G) = \sum_{i \in G} C(i)$.
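Definitions 1 to 4 translate directly into breadth-first search on an unweighted graph. The following Python sketch assumes an undirected graph stored as an adjacency dict `{node: [neighbours]}` (our representation, not the paper's) and, because the text does not say how unreachable nodes are handled, simply skips them when summing distances.

```python
from collections import deque


def neighbors(adj, u):
    """Definition 1: N(u), the set of nodes adjacent to u."""
    return set(adj[u])


def degree(adj, u):
    """Definition 2: D(u) = |N(u)|."""
    return len(neighbors(adj, u))


def bfs_distances(adj, source):
    """Hop distances d_{source,v} from source to every reachable node v."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_shortest_path(adj, i):
    """Definition 3: C(i) = sum of d_{iv} over the other nodes v.

    Assumption: unreachable nodes are skipped (not specified in the text).
    """
    return sum(d for v, d in bfs_distances(adj, i).items() if v != i)


def graph_shortest_path(adj):
    """Definition 4: C(G) = sum of C(i) over all nodes i of G."""
    return sum(node_shortest_path(adj, i) for i in adj)
```

On the path 0-1-2-3, for example, C(0) = 1 + 2 + 3 = 6, C(1) = 4, and the graph total C(G) is 20.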

Overlapping User Relational Networks.
In social networks, users actively participate in different relational networks; these users are called overlapping users. The subgraph induced by the overlapping users in the ith relational network is called the ith overlapping user relational network, for 1 ≤ i ≤ l. The network aggregated from all overlapping user relational networks is called the multirelationship network. In the following, for convenience, the ith overlapping user relational network (1 ≤ i ≤ l) is simply called the ith relational network.
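A minimal sketch of extracting overlapping user relational networks is shown below. It assumes each relational network is given as an edge list and reads "overlapping users" as users who appear in every relational network, which is one plausible reading that yields layers sharing the same node set; the paper's own algorithm is not reproduced here, and the function names are hypothetical.

```python
def overlapping_users(layers):
    """Users appearing in every relational layer (our reading of 'overlapping users').

    layers: {layer_name: [(u, v), ...]} edge lists
    """
    node_sets = [{x for edge in edges for x in edge} for edges in layers.values()]
    return set.intersection(*node_sets) if node_sets else set()


def overlapping_user_networks(layers):
    """Induce each layer on the overlapping users, so all layers share one node set."""
    keep = overlapping_users(layers)
    induced = {
        name: [(u, v) for u, v in edges if u in keep and v in keep]
        for name, edges in layers.items()
    }
    return keep, induced
```

Edges touching a user who is missing from any layer are dropped, so every returned layer is an induced subgraph on the same user set.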
In the kth relationship network (1 ≤ k ≤ l), the weight of the edge between users i and j is defined as … Then, the algorithm of overlapping user relational networks that are abstracted from a social network with …
Mathematical Problems in Engineering
Based on the definition of the multirelationship network, we can further establish a multirelationship network with edge weights by aggregating the l relational networks. For example, Figure 2 gives four relational networks: following, forwarding, replying, and mentioning networks. Figure 3 illustrates the process of aggregating the four relational networks into the multirelationship network. The multirelationship network can be established by Algorithm 2.
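Because the edge-weight definition above did not survive extraction, the following is only a hypothetical sketch of the aggregation step: each layer's raw interaction counts are scaled into (0, 1] by the layer maximum, multiplied by an assumed per-relationship importance r_k, and summed over layers on undirected user pairs. The scaling rule and the `relation_weight` parameter are our assumptions; substitute the paper's weight formula where it is available.

```python
from collections import defaultdict


def normalize_layer(weights):
    """Scale one layer's raw interaction counts into (0, 1] by the layer maximum (assumption)."""
    top = max(weights.values())
    return {edge: w / top for edge, w in weights.items()}


def aggregate_layers(layers, relation_weight):
    """Merge relational layers into one weighted, undirected multirelationship network.

    layers:          {layer_name: {(u, v): raw interaction count}}
    relation_weight: {layer_name: r_k in [0, 1]}  (hypothetical per-relationship importance)
    Returns {(u, v): aggregated weight} with u <= v.
    """
    merged = defaultdict(float)
    for name, weights in layers.items():
        if not weights:  # skip empty layers
            continue
        for (u, v), w in normalize_layer(weights).items():
            key = (u, v) if u <= v else (v, u)  # undirected edge key
            merged[key] += relation_weight[name] * w
    return dict(merged)
```

A pair that interacts in several relationships accumulates weight from each layer, which is the intuition behind aggregating the relational networks.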

Key Node Discovery of the Multirelationship Network.
Let G\i denote the graph obtained by removing node i from G. Based on the definition of the shortest path of a node, the shortest path increment of node i, denoted by $SPIS_i$, is defined as follows: … Let $DC_i$ and $PR_i$ denote the degree centrality [6] and PageRank value [14] of node i in graph G, respectively, and let Sd(·) denote the 0-1 normalization method. Then we have … where $D_i$ stands for the degree of node i and a is the damping coefficient, generally set to 0.85. Since a multirelationship network is aggregated from different relational social networks through the overlapping users, the node influence score of a user i in the network can be calculated by combining the degree centrality, PageRank value, and shortest path increment as follows: … where α, β, and c are parameters and α + β + c = 1.
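The degree and PageRank features can be sketched as below. We assume the standard PageRank form $PR_i = (1-a)/n + a\sum_{j \in N(i)} PR_j / D_j$ on an undirected graph with damping a = 0.85, and take $DC_i = Sd(D_i)$; both forms are assumptions, since the paper's own equations are missing here. `sd` is a plain min-max implementation of Sd(·).

```python
def sd(scores):
    """Sd(.): 0-1 (min-max) normalization of a {node: score} dict."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: map everything to 0
        return {node: 0.0 for node in scores}
    return {node: (s - lo) / (hi - lo) for node, s in scores.items()}


def degree_centrality(adj):
    """DC_i taken as Sd(D_i), with D_i the number of neighbours of node i (assumed form)."""
    return sd({node: len(nbrs) for node, nbrs in adj.items()})


def pagerank(adj, a=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PR_i = (1 - a)/n + a * sum_{j in N(i)} PR_j / D_j.

    Undirected graph: node j passes its rank equally to its D_j neighbours.
    Isolated nodes (D_j = 0) spread their rank uniformly so the scores sum to 1.
    """
    n = len(adj)
    pr = {node: 1.0 / n for node in adj}
    for _ in range(max_iter):
        dangling = sum(pr[j] for j in adj if not adj[j])
        new = {
            i: (1 - a) / n + a * (sum(pr[j] / len(adj[j]) for j in adj[i]) + dangling / n)
            for i in adj
        }
        converged = max(abs(new[i] - pr[i]) for i in adj) < tol
        pr = new
        if converged:
            break
    return pr
```

On a star with one centre and three leaves, the centre receives the largest PageRank (about 0.48) and a degree centrality of 1.0, while each leaf gets 0.0 after normalization.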
The shortest path increment is based on global features, the degree centrality on local features, and PageRank on random walk features. Thus, the node influence score combines local, global, and random walk features.
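Combining the three features can be sketched as follows. Two parts are assumptions: the SPIS formula is not recoverable from the text, so the hypothetical `spis` uses a stand-in, the increase in total pairwise distance C(G\i) - C(G) with unreachable pairs charged a penalty equal to the node count; and the weight-to-feature mapping (α to DC, β to PR, c to SPIS, following the order in which the features are listed) is our guess. All three features are min-max normalized before mixing.

```python
from collections import deque


def _distance_total(adj, removed=None):
    """Sum of hop distances over ordered node pairs of G (or of G with `removed` deleted).

    Hypothetical convention: an unreachable pair counts as distance len(adj).
    """
    nodes = [u for u in adj if u != removed]
    penalty = len(adj)
    total = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v != removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.get(t, penalty) for t in nodes if t != s)
    return total


def spis(adj):
    """Stand-in for SPIS_i: C(G without i) - C(G) under the convention above."""
    base = _distance_total(adj)
    return {i: _distance_total(adj, removed=i) - base for i in adj}


def _minmax(scores):
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def influence(dc, pr, spis_raw, alpha=0.4, beta=0.2, c=0.4):
    """INF(i) = alpha*Sd(DC) + beta*Sd(PR) + c*Sd(SPIS); the mapping of weights is assumed."""
    if abs(alpha + beta + c - 1.0) > 1e-9:
        raise ValueError("alpha + beta + c must equal 1")
    d, p, s = _minmax(dc), _minmax(pr), _minmax(spis_raw)
    return {i: alpha * d[i] + beta * p[i] + c * s[i] for i in dc}
```

The default weights 0.4, 0.2, and 0.4 are the values reported for the experiments; on a star graph, removing the centre disconnects the leaves, so the centre gets the largest stand-in SPIS and the highest combined score.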
In the multirelationship network, based on the weight of edge ij in the kth relationship network (1 ≤ k ≤ l), the node weight of node i, denoted by NW(i), is defined as follows: … Considering the edges of each relational network and the structural characteristics of the different relational networks, the final node influence score of node i, denoted by $INF_{final}(i)$, is obtained by aggregating the node weight and the node influence score in the multirelationship network: …
where α + β + c = 1. Combining the algorithms for the overlapping user relational network and the multirelationship network, the key node discovery algorithm based on multiple relationships and multiple features of social networks (KND algorithm) is stated in Algorithm 3.
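Since the NW(i) and $INF_{final}(i)$ equations are also missing, the sketch below is hypothetical except for its last step: NW(i) is taken as the total normalized edge weight incident to i across all layers, $INF_{final}(i)$ as INF(i) scaled by NW(i)/max NW, and the KND output as the K highest-scoring nodes. Only the top-K selection is implied by the text; the other two forms are placeholders.

```python
from collections import defaultdict


def node_weights(layers):
    """Hypothetical NW(i): total normalized weight of edges touching i, summed over layers.

    layers: {layer_name: {(u, v): weight in [0, 1]}}
    """
    nw = defaultdict(float)
    for weights in layers.values():
        for (u, v), w in weights.items():
            nw[u] += w
            nw[v] += w
    return dict(nw)


def final_scores(inf, nw):
    """Hypothetical INF_final(i) = INF(i) * NW(i) / max NW (product aggregation assumed)."""
    top = max(nw.values())
    return {i: inf[i] * nw.get(i, 0.0) / top for i in inf}


def top_k(scores, k):
    """Top-K nodes by score; ties broken by node label for reproducibility."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]
```

The same `top_k` helper also turns any baseline score (DC, PR, SPIS, eigenvector) into a key node list, so all methods are compared on identical selection rules.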

Experiments
In this section, we present the experimental datasets, the baseline methods, and the evaluation experiments of the KND algorithm. … networks include two LFR synthetic networks and the karate club network. The multirelationship network is the Higgs Twitter network, which includes following, forwarding, replying, and mentioning relational networks. The statistics of the two groups of datasets are shown in Tables 1 and 2. … [41]. Since the LFR networks follow power-law distributions, they can be used to simulate real networks. An LFR network with M nodes is denoted by LFR-M.
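The Higgs Twitter relational networks are distributed on SNAP as per-relationship edge-list files. As we understand the SNAP description, each line holds a source user, a target user, and, for the interaction networks, an interaction count; treat this format as an assumption and check it against the downloaded files. A minimal loader sketch, with hypothetical function names:

```python
import gzip


def parse_edgelist(lines):
    """Parse whitespace-separated 'userA userB [weight]' lines into {(userA, userB): weight}.

    Missing weights default to 1.0; repeated pairs are summed; '#' lines are skipped.
    """
    edges = {}
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        u, v = int(parts[0]), int(parts[1])
        w = float(parts[2]) if len(parts) > 2 else 1.0
        edges[(u, v)] = edges.get((u, v), 0.0) + w
    return edges


def load_layer(path):
    """Read one gzip-compressed edge-list file (path to whichever file was downloaded)."""
    with gzip.open(path, "rt") as fh:
        return parse_edgelist(fh)
```

Loading one file per relationship gives a `{layer_name: {(u, v): weight}}` mapping, the shape used for the relational networks before they are aggregated.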

Higgs Twitter Network (http://snap.stanford.edu/data/higgs-twitter.html).
The tweets contained news about the discovery of a new particle with the features of the Higgs boson, posted before, on, and after July 4, 2012. This dataset has four types of …
ALGORITHM 2: Establishing multirelationship network.

Baseline Methods.
To evaluate the accuracy of the KND algorithm, six baseline methods are selected as follows:
(1) PageRank (PR) [14]: All nodes are initially given the same score. Each score is then repeatedly updated by the PR iteration formula. When the iteration converges to a stable state, the top-K nodes with the highest scores are selected as the key nodes.
(2) Degree centrality (DC) [11]: It computes the degree centrality of each node in the network. Then the top-K nodes with the highest centrality are selected as the key nodes.
(3) SPIS [12]: It computes the shortest path increments of all nodes in the network. Then the top-K nodes with the highest shortest path increments are selected as the key nodes.
(4) MCIM [42]: This method evaluates a node with four metrics, considering both the overlapping influence and the influence between the node and its neighbors. Then the nodes with the top-K evaluation values are selected as the key nodes.
(5) Eigenvector [17]: The importance of a node depends on both the number of its neighbors and their importance, so both the topology and the properties of the node are considered. Then the nodes with the top-K eigenvector scores are selected as the key nodes.
(6) Random [43]: It randomly selects K nodes as the key nodes.
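Two of the simpler baselines can be sketched in a few lines. The eigenvector sketch uses power iteration on A + I, a standard shift that leaves the leading eigenvector unchanged but avoids oscillation on bipartite graphs such as stars; the Random baseline draws K distinct nodes with a seeded generator. Both are our illustrations, not the implementations used in the experiments.

```python
import math
import random


def eigenvector_centrality(adj, tol=1e-10, max_iter=1000):
    """Power iteration on (A + I) for an undirected graph given as {node: [neighbours]}."""
    x = {u: 1.0 for u in adj}
    for _ in range(max_iter):
        new = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}  # (A + I) x
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {u: val / norm for u, val in new.items()}  # unit Euclidean norm
        if max(abs(new[u] - x[u]) for u in adj) < tol:
            return new
        x = new
    return x


def random_baseline(adj, k, seed=0):
    """Random baseline: K distinct nodes drawn uniformly (seeded for reproducibility)."""
    return random.Random(seed).sample(sorted(adj), k)
```

On a star with three leaves, the centre converges to 1/√2 and each leaf to 1/√6, reflecting that the centre's importance comes from having many neighbours.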

Evaluation Experiments of KND Algorithm.
The SIR (Susceptible-Infected-Removed) model [9] is adopted to compare the transmission ability of the KND algorithm and the baseline methods. A susceptible node (S) becomes infected with probability η, and an infected node (I) recovers and becomes immune to the information with probability ξ. The propagation process stops when there are no infected nodes in the network or a preset number of iterations is reached. Because the datasets differ in size, we choose η for each network according to [44]: $\eta_{min} \approx \langle k\rangle / \langle k^2\rangle$, where $\langle k\rangle$ is the average degree of the network and $\langle k^2\rangle$ is the mean of the squared degrees of all nodes. For convenient comparison, we set ξ = 0.01 and $\eta = \eta_{min} + 0.02$. The values of η for each network are shown in Table 3. Moreover, the parameters α, β, and c are set to 0.4, 0.2, and 0.4, respectively. Since all experiments are probabilistic, each experiment is repeated 50 times and the average is taken as the final result to reduce the effect of random error. For the Higgs Twitter network, to reduce the complexity of the experiments, we consider only three important relationships: replying, mentioning, and forwarding. These three relational networks are aggregated into the Higgs multirelationship network by Algorithm 2. Under the above parameters, we conduct comparison experiments for the KND, DC, PR, Eigenvector, MCIM, SPIS, and Random methods on the karate club, LFR-500, LFR-1000, replying, mentioning, forwarding, and Higgs multirelationship networks. The more nodes that are eventually infected, the stronger the transmission ability of the initial infected nodes. Given the same number of initial infected nodes, a method that yields more infected nodes performs better.
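The SIR protocol described above can be sketched as a synchronous simulation. Assumptions: each infected node independently tries to infect each susceptible neighbour with probability η per step, recovery is checked for the nodes infected at the start of the step, and the reported spread is the number of nodes ever infected, averaged over 50 runs. `eta_min` computes the $\langle k\rangle/\langle k^2\rangle$ estimate, to which the experiments add 0.02; all function names are hypothetical.

```python
import random


def eta_min(adj):
    """Threshold estimate <k>/<k^2> from the degree sequence."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    k1 = sum(degrees) / len(degrees)
    k2 = sum(d * d for d in degrees) / len(degrees)
    return k1 / k2


def sir_once(adj, seeds, eta, xi, rng, max_steps=1000):
    """One synchronous SIR run; returns how many nodes were ever infected."""
    infected = set(seeds)
    ever_infected = set(seeds)  # infected or recovered; all other nodes are susceptible
    for _ in range(max_steps):
        if not infected:
            break
        newly = {v for u in infected for v in adj[u]
                 if v not in ever_infected and rng.random() < eta}
        recovered = {u for u in infected if rng.random() < xi}
        infected = (infected - recovered) | newly
        ever_infected |= newly
    return len(ever_infected)


def average_spread(adj, seeds, eta, xi=0.01, runs=50, seed=0, max_steps=1000):
    """Average final infected count over `runs` independent simulations."""
    rng = random.Random(seed)
    return sum(sir_once(adj, seeds, eta, xi, rng, max_steps) for _ in range(runs)) / runs
```

For a given seed set, `average_spread(adj, seeds, eta_min(adj) + 0.02)` then gives the averaged spread under ξ = 0.01; comparing this value across the seed sets chosen by each method reproduces the style of comparison described above.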
As Figure 4 shows, on the karate club network the total number of infected nodes stays almost the same, close to 34 (the size of the network), regardless of the initial number of infected nodes. This is because the karate club network is very small, so the initial infected nodes easily infect the other nodes. The performances of the KND, DC, PR, Eigenvector, MCIM, and SPIS methods are almost the same and better than that of the Random method on the karate club network.
On the LFR-500 and LFR-1000 networks, the total number of infected nodes increases with the initial number of infected nodes for the KND and baseline methods; the KND method performs best and the Random method performs worst. Because the two networks differ in size, at the same initial number of infected nodes, the total number of infected nodes on LFR-500 is larger than that on LFR-1000. This indicates that, for the same initial number of infected nodes, the transmission ability is weaker in networks with more nodes.
On the replying, mentioning, and forwarding networks, the KND method performs better than all baseline methods, and the Random method performs worst. On the replying and mentioning networks, the total number of infected nodes rises steadily as the initial number of infected nodes increases. On the forwarding network, however, the total number of infected nodes rises as the initial number of infected nodes grows from 5 to 10, stays at about 280 from 10 to 20, declines from 20 to 25, and then increases slowly. Therefore, the KND method has the best overall performance on the three single-relation networks.
On the Higgs multirelationship network, the total numbers of infected nodes rise continuously. When the initial number of infected nodes is between 5 and 10, the MCIM method performs best and the KND method second; from about 10 onward, the KND method performs best; and the Eigenvector method performs worst throughout. When the initial numbers of infected nodes are the same, the total number of infected nodes on the Higgs multirelationship network is lower than those on the replying, mentioning, and forwarding networks. This implies that the transmission ability on single-relation networks is stronger than that on the multirelationship network. This is because real social networks contain multiple relationships and features, which reduces the transmission capacity to some extent. Therefore, the overall performance of the KND method is superior to that of the baseline methods on both single-relationship and multirelationship networks, and the multirelationship network represents social networks better. In summary, compared with all baseline methods, the proposed KND method can effectively discover the key nodes in social networks.

Evaluation Indicator and Its Analysis.
To evaluate the ranking quality of the key nodes (top-K nodes) found by all methods, the normalized discounted cumulative gain (NDCG) [45] is adopted. Suppose that NDCG@n denotes the normalized discounted cumulative gain of the first n nodes, $\mathrm{DCG}_n$ represents the discounted cumulative gain of these nodes, and $\mathrm{IDCG}_n$ represents the maximum $\mathrm{DCG}_n$ in the ideal case. Then
$$\mathrm{NDCG@}n = \frac{\mathrm{DCG}_n}{\mathrm{IDCG}_n},$$
where $\mathrm{DCG}_n = \sum_{i=1}^{n} Rl(i)/\log(i+1)$ and Rl(i) represents the relevance of node i to the final result.
Figure 5 shows the values of NDCG@n on different networks. As shown in Figure 5(a), on the karate club network, when n = 10 and 30, the KND, PR, and MCIM methods obtain the largest NDCG scores. When n = 10, 20, and 30, the Random method is the worst. When n = 20, the NDCG score of the MCIM method is the highest, and that of the KND method is very close to it. When n = 30, the NDCG scores of all methods differ only slightly because the network is very small. It can be concluded that the advantage of the KND method is not obvious on small networks.
As shown in Figure 5(b), on the LFR-500 network, when n = 10, 20, and 30, the KND method is the best and the Random method is the worst, while the NDCG scores of the KND, DC, PR, Eigenvector, and MCIM methods differ only subtly. These results show that the KND method performs best on this network. As shown in Figure 5(c), on the LFR-1000 network, when n = 10, 20, and 30, the NDCG scores of the KND method are the largest and those of the Random method are the smallest. When n = 10, the NDCG scores of the PR, MCIM, and SPIS methods differ only slightly. When n = 30, the DC, Eigenvector, and SPIS methods obtain almost the same NDCG@30. These results show that the KND method achieves the best performance on the LFR-1000 network.
As shown in Figure 5(d), on the replying network, when n = 10, the NDCG score of the KND method equals that of the MCIM method, which is the highest. When n = 20, the DC method obtains the largest NDCG@20, and the NDCG score of the KND method is only slightly lower. When n = 30, the KND and MCIM methods obtain the highest NDCG@30. When n = 10, 20, and 30, the NDCG scores of the Random method are the lowest. These results imply that the overall performance of the KND method is superior to that of the other methods on the replying network.
As shown in Figure 5(e), on the mentioning network, when n = 10, the SPIS method obtains the largest NDCG@10 and the KND method the second largest. When n = 20, the NDCG score of the MCIM method is the largest, slightly higher than that of the KND method. When n = 30, the NDCG scores of the KND and MCIM methods are almost the same and the largest. When n = 10, 20, and 30, the Random method performs worst. These results imply that, on the mentioning network, the KND method performs better as n increases.
As shown in Figure 5(f), on the forwarding network, when n = 10, 20, and 30, the NDCG scores of the KND, DC, PR, MCIM, and SPIS methods differ only slightly; the KND method performs best and the Random method performs worst. Thus, the KND method achieves the best performance on the forwarding network.
As shown in Figure 5(g), on the Higgs multirelationship network, when n = 10, 20, and 30, the NDCG score of the KND method far exceeds those of all baseline methods. Thus, the KND method achieves the best performance on the Higgs multirelationship network. Therefore, the overall ranking quality of the KND method is superior to that of all baseline methods on both the single-relationship networks and the multirelationship network; in particular, on the multirelationship network it far exceeds all baseline methods. We conclude that the KND method is very helpful for finding the key nodes in social networks.
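The NDCG@n scores discussed above can be computed as below. The sketch uses base-2 logarithms; because the same logarithm appears in both DCG_n and IDCG_n, the base cancels in their ratio and does not affect NDCG@n. How the relevance Rl(i) of each node is obtained (for example, from its SIR spread) is an assumption left to the caller, and the function names are hypothetical.

```python
import math


def dcg(gains):
    """DCG_n = sum_{i=1}^{n} Rl(i) / log2(i + 1) over an already-truncated gain list."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg_at_n(ranking, relevance, n):
    """NDCG@n = DCG_n / IDCG_n for a method's ranked node list.

    relevance: {node: Rl(node)}; nodes missing from it count as 0.
    """
    gains = [relevance.get(node, 0.0) for node in ranking[:n]]
    ideal = sorted(relevance.values(), reverse=True)[:n]  # best possible ordering
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0
```

A ranking that lists nodes in decreasing relevance scores exactly 1.0; reversing a three-node ranking with relevances 3, 2, 1 drops the score to about 0.79.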

Conclusion
In social networks, a user typically has multiple relationships and features. Considering that different relationships constitute different relational networks, this paper proposed an algorithm of overlapping user relational networks to find different relational networks consisting of the same nodes in social networks, together with an algorithm for establishing the multirelationship network from these relational networks. Then, we proposed the key node discovery algorithm with multiple relationships and multiple features to find the top-K nodes in social networks. Experiments comparing the KND algorithm with six baseline methods on the karate club, LFR-500, LFR-1000, replying, mentioning, forwarding, and multirelationship networks show that the KND algorithm obtains the best performance on the multirelationship network. Key node discovery methods that combine multiple relationships, multiple features, and user attributes are a worthwhile direction for future research.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.