An Algorithm for Critical Nodes Problem in Social Networks Based on Owen Value

Discovering critical nodes in social networks has many important applications. To find critical nodes while accounting for the community structure that is widespread in social networks, we obtain each node's marginal contribution by means of the Owen value and then give a method for solving the critical node problem. We validate the feasibility and effectiveness of our method on two synthetic datasets and six real datasets. Moreover, the result obtained by applying our method to a terrorist network is in line with the actual situation.


Introduction
The spread, diffusion, and cascading of information are basic processes taking place in networks. Suppose we plan to introduce a new product; we can exploit the network effect known as "word-of-mouth" or "viral marketing." That is, we find a few influential individuals and let them recommend the product to their friends so that the resulting cascade spreads as widely as possible through the population. Choosing these influential individuals is called the critical node problem (CNP). An effective solution to this problem has important practical value [1]. For example, we can quickly find the leaders in a criminal relationship network; in a power network, we can protect important circuit breakers and power units to prevent large-scale blackouts caused by cascading failures; in a disease network, we can isolate the source of a disease and block its spread and diffusion; in a rumor network, we can discover the initiators and avoid a butterfly effect.
This paper gives a solution to the CNP which assigns a marginal contribution to every node in a community of a social network using the solution and coalition concepts of cooperative games. We then sort all nodes by their contributions and obtain the critical nodes according to certain rules. The rest of the paper is organized as follows. Section 2 introduces two basic diffusion models and related background knowledge. An algorithm based on the Owen value is presented in Section 3 and validated in Section 4. We conclude the paper in Section 5.

Diffusion Models.
The models for information propagation on networks have been widely studied [2][3][4]. We consider two basic models in this paper: the independent cascade model (ICM) and the linear threshold model (LTM) [5]. Some necessary definitions and hypotheses are first given.
A network is modeled as a graph G = (V, E), with the vertices in V modeling the individuals in the network and the edges in E modeling the relationships between individuals, where |V| = n and |E| = m. A vertex has two states, active and inactive, indicating whether or not a product or idea has been accepted by the individual. We assume that a vertex can only change from inactive to active and not vice versa; an inactive vertex can be activated by its active neighbors and an active vertex can activate its inactive neighbors; the increment of activated vertices represents the dissemination of information.

The Scientific World Journal

ICM.
In this model, a propagation probability p(u, v) is given for each edge (u, v) ∈ E; that is, vertex v is activated by u with probability p(u, v). When an initial set A0 of active vertices is given, the diffusion process unfolds according to the following randomized rule. When a vertex u becomes active at time-step t, it has a single chance to activate each inactive neighbor v, succeeding with probability p(u, v). If u succeeds, v becomes active at time-step t + 1. If v has multiple parent vertices that become active at time-step t for the first time, their activation attempts are sequenced in an arbitrary order. Whether or not u succeeds, it cannot make any further attempts to activate v in subsequent steps. The process runs until no more activations are possible.
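As a concrete illustration, the ICM rule above can be sketched in Python. This is a minimal sketch, not the paper's implementation; the adjacency-list representation and the uniform probability `p` are simplifying assumptions, since the model allows a distinct p(u, v) per edge.

```python
import random

def simulate_icm(adj, seed_set, p=0.05, rng=None):
    """One Monte-Carlo run of the independent cascade model.

    adj: dict mapping each vertex to a list of its neighbors.
    seed_set: the initially active vertices A0.
    p: uniform propagation probability (a simplifying assumption).
    Returns the set of vertices active when diffusion stops.
    """
    rng = rng or random.Random()
    active = set(seed_set)
    frontier = list(seed_set)          # vertices activated in the last step
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                # Each newly active u gets exactly one chance to activate v.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

With p = 1 every reachable vertex becomes active; with p = 0 only the seed set does, which matches the model's boundary cases.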

LTM.
In this model, vertex v is influenced by each neighbor u according to a weight w(v, u) such that Σ_u w(v, u) ≤ 1. Each vertex v has a predefined threshold θ(v) ∈ [0, 1], chosen uniformly at random. When an initial set A0 of active vertices is given, the diffusion process unfolds according to the following randomized rule. All vertices active at time-step t remain active at time-step t + 1. An inactive vertex v becomes active when the aggregate weight of its active neighbors satisfies Σ_{active u} w(v, u) ≥ θ(v). The process runs until no more activations are possible.
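The LTM rule can likewise be sketched in Python (an illustrative sketch, not the paper's implementation; the `in_neighbors` and `weights` dictionaries are our own representation):

```python
import random

def simulate_ltm(in_neighbors, weights, seed_set, rng=None):
    """One Monte-Carlo run of the linear threshold model.

    in_neighbors: dict mapping v to the list of neighbors that influence v.
    weights: dict mapping (v, u) to the weight w(v, u), with
             sum over u of w(v, u) <= 1 for every v.
    Thresholds theta(v) are drawn uniformly at random, as in the model.
    Returns the set of vertices active when diffusion stops.
    """
    rng = rng or random.Random()
    theta = {v: rng.random() for v in in_neighbors}
    active = set(seed_set)
    changed = True
    while changed:
        changed = False
        for v in in_neighbors:
            if v in active:
                continue
            # v activates once the aggregate weight of its active
            # in-neighbors reaches its threshold theta(v).
            total = sum(weights[(v, u)] for u in in_neighbors[v] if u in active)
            if total >= theta[v]:
                active.add(v)
                changed = True
    return active
```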
The difference between the two models is that in ICM each activation attempt is independent of the attempts by all other active individuals, while in LTM each inactive individual is influenced by the aggregated weight of all its active neighbors.

Problem Description.
Given G = (V, E), an influence spread model M, and a positive integer k, the critical node problem (CNP) is to find the k vertices which maximize the extent of spread initiated by these vertices under the model M. That is,

S* = arg max_{S ⊆ V, |S| = k} σ(S),

where σ(S) denotes the expected size of the set of vertices activated by an initial set S, and S ranges over all sets of k vertices in V.

Related Works.
Many statistical properties for social network analysis have been presented in complex network theory, such as degree, clustering coefficient, and betweenness [6]. The PageRank algorithm [3] and the HITS algorithm [4] use eigenvector centrality to rank web pages; White and Smyth compute a node's relative importance by Markov centrality [7]; Shetty and Adibi discover the important nodes in an email network through graph entropy [5]; Li et al. identify influential bloggers by artificial neural networks [8]; Li et al. find effectors by a tree structure based on directed graphs and dynamic programming [9]. Some scholars mine critical nodes in social networks based on specific network information [10][11][12][13][14][15]. Domingos and Richardson first studied the CNP as an algorithmic problem [16,17]. Kempe et al. formulated it as a discrete optimization problem, proved that it is NP-hard, and presented a greedy approximation algorithm (see Algorithm 1) which approximates the optimum within a factor of (1 − 1/e) [18,19].
A key problem is how to compute the value of σ(S) in Algorithm 1. Currently there is no efficient method for obtaining its exact value. However, we can use the Monte-Carlo method to simulate the influence spread process and obtain approximate results with high probability. Assuming that the spread process is simulated R times to estimate σ(S ∪ {v}) for every vertex v, and that each simulation takes O(m) time, the overall complexity of this greedy algorithm is O(knRm). The method's efficiency severely restricts its scalability. Leskovec et al. proposed an optimization referred to as cost-effective lazy forward (CELF), which speeds up the greedy algorithm [20]. Chen et al. did not improve the greedy algorithm itself but focused on computing the mutual influence between individuals in the local network structure. They presented a degree discount algorithm for ICM, which achieves influence spread almost matching the greedy algorithm in less than one-millionth of its running time [21]. After that, they designed a maximum influence arborescence model for the general ICM, restricting computations to the local influence regions of nodes [22], and a local directed acyclic graph algorithm for the general LTM [23] to solve the CNP. They also studied the CNP in social networks where negative opinions may emerge and propagate [23].
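Since Algorithm 1 itself is not reproduced in this text, the hill-climbing scheme of Kempe et al. can be sketched as follows, with a Monte-Carlo estimate of σ(S) under ICM. The helper names `sigma` and `greedy_cnp` and the parameters `p` and `runs` are illustrative, not the paper's.

```python
import random

def sigma(adj, seeds, p, runs, rng):
    """Monte-Carlo estimate of the expected spread sigma(S) under ICM."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, []):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs

def greedy_cnp(adj, k, p=0.05, runs=100, rng=None):
    """Greedy hill-climbing: repeatedly add the vertex with the largest
    estimated marginal gain sigma(S ∪ {v}) − sigma(S)."""
    rng = rng or random.Random()
    seeds = set()
    for _ in range(k):
        base = sigma(adj, seeds, p, runs, rng)
        best, best_gain = None, float("-inf")
        for v in adj:
            if v in seeds:
                continue
            gain = sigma(adj, seeds | {v}, p, runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.add(best)
    return seeds
```

The k·n calls to `sigma`, each costing `runs` simulations, are exactly what makes the plain greedy algorithm expensive and motivates CELF and the heuristics discussed above.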

Cooperative Game and Owen Value.
Given a finite set of players N, a cooperative game with transferable utility is a pair (N, v) with characteristic function v : 2^N → R and v(∅) = 0. If a payoff vector x = (x_1, . . . , x_n) satisfies x_i ≥ v({i}) for all i ∈ N and Σ_{i=1}^{n} x_i = v(N), then it is called an allocation of (N, v). A solution of a cooperative game is an allocation rule, and the payoff allocated to each player measures the negotiation strength of that player in the game. Shapley presented a solution concept which singles out a unique allocation scheme among the solutions with different properties; that is, it assigns each player's payoff according to the importance of that player to the game [24]. The Shapley value of player i in the game (N, v) is

φ_i(v) = Σ_{S ⊆ N\{i}} (s! (n − s − 1)! / n!) [v(S ∪ {i}) − v(S)],

where s = |S| and n = |N|. However, the Shapley value does not consider the impact of coalition structure, and Owen extended it [25]. Each union obtains its payoff from the game between the unions, and this payoff is then allocated by an internal game among the members of the union; both payoffs are computed by the Shapley value.
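As a small worked illustration, the Shapley value of a toy game can be computed by averaging marginal contributions over all player orderings, which is equivalent to the formula above (an illustrative sketch, feasible only for very small N):

```python
from itertools import permutations
from math import factorial

def shapley_exact(players, v):
    """Exact Shapley values by averaging the marginal contribution
    v(C ∪ {i}) − v(C) of each player i over all |N|! orderings."""
    n = len(players)
    phi = {i: 0.0 for i in players}
    for order in permutations(players):
        coalition = set()
        for i in order:
            phi[i] += v(coalition | {i}) - v(coalition)
            coalition.add(i)
    for i in players:
        phi[i] /= factorial(n)
    return phi
```

For example, in the two-player game with v({1}) = 1, v({2}) = 3, and v({1, 2}) = 6, the allocation is φ_1 = 2 and φ_2 = 4, which sums to v(N) = 6 as an allocation must.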

The Critical Node Discovery Algorithm Based on Owen Value.
The idea of the greedy algorithm is to find the node with the greatest influence in each iteration based on the diffusion model of the social network; in essence, the node with the maximum marginal contribution is chosen. Because community structure is prevalent in social networks [26], we consider both a community's influence on information diffusion and every node's influence within its community. We take the nodes of the social network as the players of a cooperative game and information diffusion as coalition formation. Thus, we can define an appropriate cooperative game that maps the information diffusion in the social network and identify the critical nodes through the nodes' marginal contributions.
A network is a list of which pairs of players are linked to each other. The network structure is a key determinant of the level of productivity or utility of the society of players involved. A network game consists of a set of players and a value function; the value function assigns a real value to each possible network on the players. An allocation rule is a way to allocate the real value generated by a set of players and has to take into account the marginal value of each player. If we take the value function as the characteristic function, then a network game can be seen as a cooperative game with transferable utility [27].
We define the cooperative game (N, v), where N is the set of nodes in the social network and the characteristic function is v : 2^N → R, with 2^N the set of all subsets of N. For each S ⊆ N, if all the nodes in S are initially activated, then v(S) represents the expected number of active nodes at the end of the diffusion process; that is, v(S) = σ(S).
So, we can use the Owen value to obtain the marginal contribution of every node. Because the Owen value can be seen as a two-step procedure in which the Shapley value is applied twice, we first compute a node's Shapley value.
Given a node i ∈ N and a subset S ⊆ N such that i ∉ S, the marginal contribution of node i is v(S ∪ {i}) − v(S), for all S ⊆ N \ {i}. Consider the set Ψ of all possible permutations on N; let ψ ∈ Ψ, and define C_i(ψ) to be the set of all nodes appearing before node i in the permutation ψ. Then the average marginal contribution of node i to the given coalitional game is

φ_i(v) = (1/n!) Σ_{ψ ∈ Ψ} [v(C_i(ψ) ∪ {i}) − v(C_i(ψ))].

Note that this method must work with n! permutations, and its computational complexity is O((n/e)^n) [28]. Therefore, we give an approximate method for computing the Shapley value. Randomly generate a set Ψ' of t permutations; let ψ ∈ Ψ' and let ψ(j) denote the j-th node in the permutation. The number of nodes activated by running the diffusion model when node ψ(1) is activated is the contribution of ψ(1). Next, we consider node ψ(2): if ψ(2) has already become active after ψ(1) was activated, then the contribution of ψ(2) is 0; otherwise, the contribution of ψ(2) is the number of nodes newly activated by ψ(2). In the same way we obtain the contributions of ψ(3), . . . , ψ(n). For each ψ ∈ Ψ', repeat the above process R times. Then the average contribution of each node in the diffusion process can be calculated. We obtain the top-k nodes sorted by influence and ensure that they are not adjacent to each other. See Algorithms 2 and 3.
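The permutation-sampling procedure just described can be sketched as follows for ICM. This is an illustrative reading of the text, not the paper's Algorithm 2: it samples `t` permutations and, for brevity, runs a single diffusion per permutation rather than repeating each one R times.

```python
import random

def approx_shapley_icm(adj, t=50, p=0.05, rng=None):
    """Approximate diffusion-game Shapley values by sampling t random
    permutations of the nodes (illustrative sketch)."""
    rng = rng or random.Random()
    nodes = list(adj)
    contrib = {u: 0.0 for u in adj}
    for _ in range(t):
        rng.shuffle(nodes)
        active = set()
        for u in nodes:
            if u in active:
                continue            # already activated earlier: contribution 0
            # Activate u and spread over the still-inactive part of the graph;
            # u's contribution is the number of nodes newly activated.
            newly = {u}
            frontier = [u]
            active.add(u)
            while frontier:
                nxt = []
                for x in frontier:
                    for y in adj.get(x, []):
                        if y not in active and rng.random() < p:
                            active.add(y)
                            newly.add(y)
                            nxt.append(y)
                frontier = nxt
            contrib[u] += len(newly)
    return {u: c / t for u, c in contrib.items()}
```

Each sampled permutation costs one pass over the graph, so the estimate is far cheaper than enumerating all n! orderings at the price of sampling error that shrinks as t grows.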
According to the description in Section 3.1, we give the calculation method of the Owen value. Guimerà et al. classify node roles in communities according to the within-module degree z and the participation coefficient P, dividing them into seven categories, Role = {R1, R2, R3, R4, R5, R6, R7} [29]. We consider two of these roles: non-hub connector nodes (z < 2.5 and 0.62 < P ≤ 0.80, R3) and connector hubs (z ≥ 2.5 and 0.30 < P ≤ 0.75, R6). Most of the nodes belonging to these two roles connect to other communities.
We use the CNM algorithm [30] to divide the network G = (V, E) into communities C = {C_1, C_2, . . . , C_c}, with C_j = (V_j, E_j) for j = 1, . . . , c, and assign a role role(v_h) ∈ Role to every node v_h ∈ V_j in its community. Restricting attention to the roles {R3, R6}, we define the community game network G' = (V', E') over the nodes with these roles. We obtain the Shapley value of every node in this network and take the sum of the Shapley values of all nodes in the same community as that community's payoff. Then we treat each community as a separate network and calculate the Shapley value of every node within the community. The node's Owen value is assigned according to the node's normalized Shapley value within its community and the community's payoff. So, we can get the critical nodes by Algorithm 3.
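The two-step allocation just described can be sketched as follows. This is an illustrative sketch of the redistribution step only, not the paper's Algorithm 3; it assumes the community payoffs (from the game between communities) and the within-community Shapley values have already been computed by the procedures above.

```python
def owen_value(communities, shapley_within, community_payoff):
    """Owen-style two-step allocation: each community's payoff is split
    among its members in proportion to their within-community Shapley
    values (illustrative sketch; argument names are our own).

    communities: dict community_id -> list of member nodes.
    shapley_within: dict node -> Shapley value inside its own community.
    community_payoff: dict community_id -> payoff of that community.
    """
    owen = {}
    for cid, members in communities.items():
        total = sum(shapley_within[u] for u in members)
        for u in members:
            # Normalized within-community share; split evenly if all zero.
            share = shapley_within[u] / total if total > 0 else 1.0 / len(members)
            owen[u] = community_payoff[cid] * share
    return owen
```

Because the normalized shares in each community sum to 1, the community payoffs are exactly redistributed, so the node values sum to the total payoff of the game between communities.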

The Computational Complexity.
We consider the computational complexity of finding the top-k nodes based on the Shapley value. In Algorithm 2, we compute the marginal contribution of every node; its complexity is O(tR(n + m)). The cost of ranking the contributions and selecting k nodes is O(n log(n) + kn). Therefore, the overall running time of the Shapley value-based algorithm is O(tR(n + m) + n log(n) + kn). Because it is reasonable to assume that k < n for real-world graphs, and because the Owen value can be seen as a two-step procedure in which the Shapley value is applied twice, the computational complexity of the Owen value-based algorithm for the top-k nodes is O(P(n)), where P is a polynomial in n.

Evaluation
We validate our method on two synthetic datasets and six real network datasets. All experiments are executed on a PC with a 3.2 GHz CPU, 4 GB of memory, and Windows 7. The development tools are MATLAB 2009 and Microsoft Visual Studio 2010. We compare our method (the Ov algorithm) with the Shapley value-based algorithm (the Sv algorithm), the greedy algorithm, and the degree-heuristic algorithm (the degree algorithm). The greedy algorithm serves as a benchmark for measuring the other algorithms; the degree algorithm selects the nodes with the greatest degree from the network as the initial set. To obtain the accurate influence of every algorithm, we use the average number of activated nodes after running ICM and LTM 10000 times for every initial set. In ICM, the propagation probability is set to 0.05; in LTM, the edge weight of a node is the reciprocal of the node's degree. The size of the initial set is varied from 1 to 20. The number of sampled permutations t used in the Ov and Sv algorithms is shown in Table 1. In the experiments we only discuss the case of ICM because we obtained the same conclusions for ICM and LTM.

Performance Comparison in Synthetic Networks.
We consider the influence of community structure on the Ov algorithm. We use the BA model [31] and the forest fire (FF) model [32] to generate two synthetic datasets with 5000 nodes. The BA model produces power-law degree distributions through two important features, growth and preferential attachment; it generates a scale-free network whose power-law exponent is 3 and whose community structure is not obvious. The forest fire model generates a network with a power-law degree distribution, densification, shrinking diameter, and obvious community structure.
We compute the Shapley value and Owen value of every node in the BA and FF datasets, respectively, and obtain the initial set and the number of activated nodes after running ICM. The process is repeated 100 times, and the average number of nodes activated by initial sets of different sizes is drawn in Figure 1. The Sv algorithm performs almost the same as the Ov algorithm on the BA dataset, where the community structure is not obvious (Figure 1(a)). In contrast, the Ov algorithm is significantly better than the Sv algorithm on the FF dataset, where the community structure is obvious (Figure 1(b)). In Figure 1, the number of nodes activated by the initial set from the Ov algorithm is almost the same as for the greedy algorithm.

Performance Comparison in Real Networks.
Among the real datasets, the AS dataset [37], with 11456 nodes and 32759 edges, is from the topology of autonomous systems on the Internet, and the PG dataset [38], with 4941 nodes and 6594 edges, is from the North American power grid. We use the greedy, Ov, Sv, and degree algorithms to find the initial sets from the six real datasets. Figure 2 shows the number of nodes activated by initial sets of different sizes under ICM. From Figure 2, we see that whether on networks with obvious community structure (Figures 2(a), 2(b), 2(c), and 2(d)) or on ones without (Figures 2(e) and 2(f)), the accuracy of the Ov algorithm is similar to that of the greedy algorithm, and sometimes even better (Figure 2(b)). Compared with the Sv and degree algorithms, the Ov algorithm has a large advantage.

The Time Efficiency Comparison.
We discuss our method's time efficiency. We generate three datasets, FF(0.35, 0.2) (sparse graph), FF(0.37, 0.32) (densifying graph), and FF(0.38, 0.35) (dense graph), each with 1000 nodes, using the FF model with three groups of parameters. We then find the top-20 critical nodes on these datasets with the Ov and greedy algorithms and plot the running times in Figure 3. The Ov algorithm obtains a speedup of dozens of times over the greedy algorithm on these datasets.

Krebs studied the terrorist network of the events of September 11, 2001. Figure 4 and Table 2 show the trusted contacts among 19 hijackers [39]. Table 2 gives the names and airlines of these hijackers, and the squares in Figure 4 denote the critical nodes. We use the Ov algorithm to analyze the dataset and find the sequence of critical nodes in which the first four people are Nawaf Alhamzi, Ziad Jarrah, Abdulaziz Alomari, and Mohald Alshehri. These four people were on four different planes, which shows that the Ov algorithm is effective to some extent.

Conclusions
To solve the CNP, this paper presents a method based on the Owen value from cooperative game theory, which takes into account the community structure that is widespread in social networks. We validate the proposed method on two synthetic datasets, and the results show that our method is more suitable for networks with community structure. Compared with other algorithms on six real datasets, our method is more effective. How to further improve the time efficiency of the Ov algorithm remains to be studied.

Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.