The influence maximization problem aims to identify the most influential individuals so as to help develop effective viral marketing strategies over social networks. Previous studies mainly focus on designing efficient algorithms or heuristics for a static social network. In practice, real-world social networks keep evolving over time, and recomputing from scratch on the changed network inevitably leads to a long running time. In this paper, we propose an incremental approach, IncInf, which can efficiently locate the top
The increasing popularity of online social networks has promoted the diffusion of information, opinions, and new-product adoption and has provided great opportunities for intelligent viral marketing. To benefit most from the word-of-mouth effect, influence maximization (IM) is a fundamental problem that aims to identify a small set of influential individuals so as to develop effective viral marketing strategies that maximize the influence over a given social network [
Existing research on influence maximization focuses mainly on developing effective and efficient algorithms for a given static social network. Although one could possibly run any of the static influence maximization methods, such as [
Unfortunately, the rapidly and unpredictably changing topology of a dynamic social network poses several challenges for the reidentification of influential users. On one hand, the interconnections between nodes in real-world social graphs are rather complicated; as a result, even one small change in topology may affect the influence spreads of a large number of nodes, not to mention the massive changes in large-scale social networks. It is therefore very difficult to efficiently compute the changes in influence spread for all nodes after the evolution. On the other hand, since large-scale social networks contain a great number of nodes, effectively limiting the range of potentially influential nodes and minimizing the amount of computation is a very challenging problem.
To address these challenges, we investigate the dynamic characteristics exhibited during the evolution of real-world social networks. Through tests on three real-world dataset traces, Facebook, NetHEPT, and Flickr, we observe that, first, the growth of a social network is mainly based on the preferential attachment principle [
First, we design an efficient approach to quantitatively analyze the influence spread changes from network topology evolution by adopting the idea of localization. A tunable parameter is provided for tradeoff between efficiency and effectiveness.
Second, we propose a pruning strategy that effectively narrows the search space to only those nodes that experience major increases or have high degrees, based on the changes of influence spread and the previous top
Third, we conduct extensive experiments on three dynamic real-world social networks. Compared with the state-of-the-art static algorithm, IncInf achieves remarkable speedup in execution time while providing matching influence spread. Moreover, IncInf scales better to extraordinarily large-scale networks.
A preliminary version of this paper appears in [
The remainder of this paper is organized as follows. In Section
Influence maximization on static networks has attracted a great deal of attention. The hill-climbing greedy algorithm proposed by Kempe et al. suffers from low efficiency, and many efficient algorithms have been proposed recently to address this problem. Leskovec et al. [
The influence maximization problem on dynamic social networks still remains largely unexplored to date. Habiba et al. [
In this section, we illustrate the definition of social network and the influence diffusion model that we will use throughout the paper and then give the problem definition of influence maximization in evolving networks.
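To make the diffusion model concrete, the following is a minimal sketch of a Monte Carlo estimate of influence spread under the independent cascade (IC) model referred to in this paper. The graph representation (a dict mapping each node to its out-neighbors and edge probabilities) and the names `simulate_ic` and `estimate_spread` are assumptions made for illustration, not the paper's notation or implementation.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one IC cascade and return the number of activated nodes.

    graph: dict node -> dict neighbor -> propagation probability (assumed layout).
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # Each newly active node gets a single chance to activate
                # each inactive out-neighbor, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average the cascade size over `runs` independent simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(runs)) / runs
```

Estimating spread this way is simple but expensive, which is one reason recomputing it from scratch after every network change is costly.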
This paper distinguishes itself from previous works by considering the dynamic nature of online social networks. Real-world social networks are not static but keep evolving gradually over time. The evolution of large social networks has raised new sets of questions; among them, one interesting yet challenging problem is how to quickly identify the top
To solve such a problem, we define an evolving network
In this section, we study some patterns of social network evolution. The numbers of nodes and edges are first investigated in Section
Nodes and edges are the basic elements of the social network topology. In this subsection, we use the numbers of nodes and edges to examine the growth of users and interconnections over time. Figure
Number of nodes and edges per month of the Facebook dataset.
Understanding the pattern of network topology evolution is of primary importance for designing efficient influence maximization algorithms for evolving social networks. In this subsection, we further investigate the degree distribution of nodes and the preferential attachment rule [
Degree distribution and preferential attachment on Facebook.
Degree distribution
Preferential attachment
We also study the preferential attachment rule or, in other words, the “rich-get-richer” rule [
Examining the relation between the influence and the degree of a node can help us understand the effect of degree changes on the influence spread of nodes. For this reason, we run the static MixGreedy algorithm [
The relation between the influence spread and the degree in Facebook.
In this section, we present the detailed design of IncInf, an incremental approach to solve the influence maximization problem on dynamic social networks. The main idea of IncInf is to make full use of the valuable information that is inherent in the network structural evolution and the previous influential nodes so as to substantially narrow the search space of influential nodes. In this way, IncInf can significantly reduce the computational complexity and improve the efficiency. Figure
IncInf design.
The evolution of social network, when reflected into its underlying graph, can be summarized into six categories, which are inserting or removing a node, introducing or deleting an edge, and increasing or decreasing the influence probability of an edge. We denote the six types of topology change as
Details of six types of basic operation.
Operation  Description  Impact on influence spread
  Add a new node  The influence spread of
  Delete an existing node  The influence spread of
  Introduce a new edge  The influence spread of all the nodes that can reach
  Remove an existing edge  The influence spread of all the nodes that can reach
  Increase  The influence spread of all the nodes that can reach
  Reduce  The influence spread of all the nodes that can reach
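As a rough illustration of the six basic operations listed in the table, the sketch below applies each kind of change to a small adjacency-map graph. The representation (a dict mapping each node to its out-neighbors and edge probabilities) and all function names are assumptions made for illustration; they are not the paper's notation.

```python
def add_node(graph, u):
    """Insert an isolated node (no effect if it already exists)."""
    graph.setdefault(u, {})


def remove_node(graph, u):
    """Delete a node together with its outgoing and incoming edges."""
    graph.pop(u, None)
    for neighbors in graph.values():
        neighbors.pop(u, None)


def add_edge(graph, u, v, p):
    """Introduce edge (u, v) with influence probability p."""
    add_node(graph, u)
    add_node(graph, v)
    graph[u][v] = p


def remove_edge(graph, u, v):
    """Remove edge (u, v) if it exists."""
    graph.get(u, {}).pop(v, None)


def change_probability(graph, u, v, delta):
    """Increase (delta > 0) or reduce (delta < 0) the probability of an
    existing edge (u, v), clamped to the interval [0, 1]."""
    graph[u][v] = min(1.0, max(0.0, graph[u][v] + delta))
```

Note that the two probability operations share one helper here, distinguished only by the sign of `delta`; this is a design convenience of the sketch.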
It should be noted that only after the
As discussed above, whenever an edge
The main idea of localization is to use the local region of each node to approximate its overall influence spread. In particular, we use the maximum influence path to approximate the influence spread from node
Similarly, in our proposal, we localize the impact of topology changes on influence spread into local regions and thus reduce the amount of computation. Among six types of topology change,
Consider the case when a new edge
The first case is when the probability of maximum influence path from
The second case is when the probability of maximum influence path from
We treat the network dynamics from
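The localization idea used in this section, approximating a node's influence through maximum influence paths restricted to a local region, can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: the threshold name `theta`, the function names, and the use of the sum of path probabilities as a local spread score are all assumptions.

```python
import heapq


def max_influence_paths(graph, source, theta):
    """Return {node: probability of the maximum influence path from source},
    keeping only nodes whose path probability is at least theta.

    graph: dict node -> dict neighbor -> probability (assumed layout).
    Path probability is the product of edge probabilities, so a
    Dijkstra-style search that always expands the most probable node works.
    Node labels must be mutually comparable (e.g., all strings) for heap ties.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap on probability via negation
    while heap:
        neg_p, u = heapq.heappop(heap)
        p = -neg_p
        if p < best.get(u, 0.0):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            q = p * w
            # Localization: ignore paths weaker than theta.
            if q >= theta and q > best.get(v, 0.0):
                best[v] = q
                heapq.heappush(heap, (-q, v))
    return best


def local_spread(graph, source, theta):
    """Crude local score (an assumption of this sketch): the sum of the
    maximum influence path probabilities inside the local region."""
    return sum(max_influence_paths(graph, source, theta).values())
```

A larger `theta` shrinks the local region and speeds up the computation at the cost of accuracy, which mirrors the efficiency and effectiveness tradeoff controlled by the tunable localization parameter.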
Inspired by the observations of Section
From the preferential attachment rule, we know that the influence spread changes of high-degree nodes should be much greater than those of ordinary nodes. Moreover, according to the power-law distribution, such high-degree nodes account for only a small fraction of all nodes. Consequently, we can pick out only the nodes that experience major increases or have high degrees, because these nodes have great potential to become the top
In the
In most cases, the influential nodes will attract a great number of new nodes and establish new links. Thus, their influence spreads will increase drastically. In such a case, it is impossible for the nodes whose influence spread changes are smaller than those of the influential nodes to become the most influential nodes in
In the
Although the influence spread of a previous influential node rarely decreases during the evolution, we consider this case for completeness. Here, qualification 1 alone is not selective enough: the number of nodes satisfying it is relatively large, which would lead to a large amount of computation. Meanwhile, in practice, a node with a small degree has only a very low probability of becoming an influential node. To keep only the most promising nodes, we therefore refine the requirement and additionally require a large degree and a large increase in influence spread. Consequently, the search space is tightly circumscribed and the computational cost is greatly reduced.
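The two pruning cases described above can be sketched roughly as follows. The exact qualifications and thresholds are not reproduced here, so this is an assumption-laden illustration: the function name `select_candidates`, the parameters `min_degree` and `min_increase`, and the choice to always keep the previous seed as a candidate are all hypothetical.

```python
def select_candidates(delta, degree, prev_seed, min_degree, min_increase):
    """Return the set of nodes worth re-evaluating for one seed position.

    delta:  dict node -> change in (approximate) influence spread.
    degree: dict node -> current degree.
    prev_seed: the node that held this seed position before the evolution.
    min_degree, min_increase: hypothetical thresholds for the second case.
    """
    base = delta[prev_seed]
    if base >= 0:
        # Case 1: the previous seed's spread grew. A node whose change is
        # smaller than the seed's change is treated as unable to overtake it.
        return {v for v, d in delta.items() if d >= base}
    # Case 2 (rare): the previous seed's spread shrank. The first condition
    # alone admits too many nodes, so additionally require a large degree
    # and a large increase.
    candidates = {
        v
        for v, d in delta.items()
        if d >= base and degree[v] >= min_degree and d >= min_increase
    }
    candidates.add(prev_seed)  # assumption: the old seed stays a candidate
    return candidates
```

Only the returned candidates would then have their actual influence spread recomputed, which is where the savings over a full recalculation come from.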
After the potential nodes are selected, we calculate the actual influence spread of these nodes in
In this section, we present the experimental results of our algorithm on identifying top
We choose three real-world social networks: Facebook social network, NetHEPT citation network, and Flickr social network (Table
Facebook: this dataset is the friendship network of the New Orleans regional network on Facebook, spanning from September 2006 to January 2009 [
NetHEPT: this is an academic citation network [
Flickr: this dataset [
Summary information of the realworld social networks.
Datasets  Nodes (initial)  Nodes (final)  Nodes (growth)  Edges (initial)  Edges (final)  Edges (growth)
Facebook  12,364  61,096  394%  73,912  905,665  1125%
NetHEPT  5,802  29,555  409%  57,765  352,807  511%
Flickr  1,620,392  2,570,535  58.6%  17,034,807  33,140,018  94.5%
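The growth columns in the table are consistent with the relative increase (final - initial) / initial. The short check below recomputes them from the initial and final counts; the helper name `growth_percent` is introduced here only for illustration.

```python
def growth_percent(initial, final):
    """Relative growth from the initial to the final count, in percent."""
    return 100.0 * (final - initial) / initial


# (initial, final) counts of nodes and edges, copied from the table.
datasets = {
    "Facebook": ((12_364, 61_096), (73_912, 905_665)),
    "NetHEPT": ((5_802, 29_555), (57_765, 352_807)),
    "Flickr": ((1_620_392, 2_570_535), (17_034_807, 33_140_018)),
}

for name, ((n0, n1), (e0, e1)) in datasets.items():
    print(
        f"{name}: nodes +{growth_percent(n0, n1):.1f}%, "
        f"edges +{growth_percent(e0, e1):.1f}%"
    )
```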
We compare our algorithm with five static algorithms:
The propagation probability of the IC model is selected randomly from 0.1, 0.01, and 0.001 for each network snapshot. The parameters of the evaluated algorithms are set as suggested by their authors. For IMM, the parameters
In this subsection, the efficiency of our proposed algorithm is studied and compared with corresponding static algorithms, MixGreedy and MIA, through experiments on the Facebook, NetHEPT, and Flickr datasets. The experiments are conducted on a PC with Intel Core i7 920 CPU @2.67 GHz and 6 GB RAM. The running times of four algorithms are measured by selecting 50 seeds from the whole dataset.
The time costs of different algorithms are illustrated in Figure
The time costs of different algorithms on three realworld datasets.
We also test the effect of our pruning strategy. Here we take the Facebook dataset as an example; the results on other datasets are similar and thus are omitted. Different from other experiments, we recorded the Facebook graph from September 2006 to October 2007 (14 months) as snapshot
In this subsection, we study the influence spread of the top
Figure
The influence spread of different algorithms on three datasets.
The effect of pruning strategy on the Facebook dataset.
We note that the reason IncInf has slightly lower influence spread is mainly twofold. First, IncInf restricts the influence to local regions to speed up the computation of influence spread changes, which costs some effectiveness. Second, a pruning strategy is designed to narrow down the search space based on the influence spread changes and previous top
First, we study how effectively the localization parameter
The experimental results are shown in Figure
The effect of tuning of
NetHEPT
Then, we will evaluate the sensitivity of pruning threshold
The effect of tuning of
In terms of influence spread, with the increase of
Experimental results demonstrate that our proposed IncInf algorithm significantly reduces the execution time of state-of-the-art static influence maximization algorithms while maintaining satisfactory accuracy in terms of influence spread. Although IncInf is faster, it has a few limitations that leave room for further improvement.
First, IncInf directly depends on previous information of top
In this paper, we consider the influence maximization problem in evolving social networks and propose an incremental algorithm, IncInf, to efficiently identify top
There are several future directions for this research. First, IncInf has great potential to fit into modern parallel computing frameworks. This is because IncInf restricts the computation of influence spread changes to local regions, which could ease the partitioning of the social graph for parallel computation. Moreover, the proposed pruning strategy could be effectively performed in parallel. Second, our current IncInf algorithm is derived from the basic IC model. We believe that the concept of incremental computation for influence maximization could be extended to other influence diffusion models, such as the classic LT model. Third, although there have been a few studies [
The authors declare that they have no conflicts of interest.
This research was supported by NSFC under Grant no. 61402511.