Identifying Important Nodes in Complex Networks Based on Multiattribute Evaluation

Assessing and measuring the importance of nodes in a complex network are of great theoretical and practical significance for improving the robustness of real systems and for designing efficient system structures. The classical local centrality measures of important nodes take only the number of a node's neighbors into consideration and ignore the topological relations and interactions among neighbors. Because of the complexity of the algorithms themselves, global centrality measures cannot be applied to the analysis of large-scale complex networks. The k-shell decomposition method regards the core nodes located in the center of the network as the most important nodes, but it considers only the residual degree and neglects the interaction and topological structure between a node and its neighbors. In order to identify important nodes in a network efficiently and accurately, this paper proposes a local centrality measurement method based on the topological structure and interaction characteristics of nodes and their neighbors. Building on the k-shell decomposition method, the proposed method introduces two properties, structural holes and degree centrality, and synthetically considers the network location information, topological structure, and scale characteristics of nodes and their neighbors, as well as the interaction between their different core layers. In this paper, selective attacks are carried out on four real networks, and we compare the average descending ratio of network efficiency of our approach with that of seven other indices. The experimental results show that our approach is valid and feasible.


Introduction
In recent years, research on node importance ranking has attracted increasing attention, not only because of its theoretical significance but also because of its extensive practical value [1,2]. In complex networks, identifying the most important nodes can help us effectively prevent network attacks [3], obstruct the spread of computer viruses [4], prevent epidemics of infectious diseases in a population [5], inhibit the spread of rumors in society [6], and guide the dissemination of information in social networks [7,8]. Commonly used centrality measures include degree centrality [9], closeness centrality [10], betweenness centrality [11], and Katz centrality [12], but these indices are highly dependent on the topology of the network.
Kitsak et al. [13] found in social network research that a node with a high betweenness or degree value is not necessarily the most important node. The k-shell decomposition method decomposes a network into hierarchically ordered shells by recursively pruning the nodes with degree lower than or equal to k. It can determine the location of nodes in the network, and the nodes in the core layer are considered a highly important node set [14]. Owing to its low computational complexity, the k-shell decomposition method is widely used to mine and analyze important nodes in biological networks, scientific collaboration networks, friendship networks, communication networks, and so on. However, this approach has some limitations [1]. It considers only the influence of the residual degree during decomposition, so its ranking results are too coarse-grained to distinguish many nodes. It is also unsuitable for trees, regular networks, and BA networks [15]. The k-shell decomposition method has been extended and improved by many scholars. Zeng et al. [16] evaluated the residual degree and the exhausted degree simultaneously and proposed the mixed degree decomposition. Liu et al. [17] jointly considered the k-shell value of the target nodes and their distance from the nodes with the maximum k-shell value, which overcomes the problem that node importance cannot be measured accurately when a large number of nodes share the same coreness value. Garas et al. [18] extended the k-shell decomposition method to weighted networks. Liu et al. [19,20] found that the core obtained by k-shell decomposition is not necessarily the true core, because the network contains small groups whose members are very closely connected to one another. Based on the definition of entropy in information theory, they proposed the connection entropy to measure the diversity of the connections of network shells.
Identification of important nodes in complex networks has important theoretical significance for the structure, propagation, and synchronization of complex networks. It also has great practical value for understanding and controlling the spread of information, diseases, and rumors, and for marketing new products. In order to identify important nodes in complex networks efficiently and accurately, this paper combines the local environment of a node, its location, and its influence on network function to describe node importance. The main contributions of this paper are as follows. (1) We propose the outward links diversity assessment index, which solves the problem that node importance cannot be measured accurately when a large number of nodes share the same coreness value after k-shell decomposition. The index considers not only the position of nodes in the network but also the interaction between neighbor nodes in different core layers.
(2) We put forward an importance index based on multiattribute evaluation and node deletion. It considers not only the topological structure of a node and its neighborhood and the interaction characteristics between nodes, but it can also find important nodes in core positions as well as key nodes in structural hole positions. (3) We carry out deliberate attack simulation experiments on real data sets, in which node importance is quantified by the descending ratio of network efficiency before and after the attack. The experimental results show that the proposed method performs better in identifying important nodes in complex networks and is well suited for large-scale quantitative analysis of important nodes.
The remainder of this paper is structured as follows. Section 2 proposes and describes our method. Section 3 briefly reviews seven typical centrality indices for comparative analysis and, using the monotonicity index and the descending ratio of network efficiency, verifies that the proposed method outperforms these seven indices. Section 4 summarizes the paper and discusses future research directions.

Method
Consider an undirected network $G = (V, \mathcal{E})$ with $N$ nodes and $M$ edges. Given the adjacency matrix $A = (a_{ij})_{N \times N}$ of the network $G$, the degree of node $i$ can be expressed as
$$k_i = \sum_{j=1}^{N} a_{ij},$$
and the sum of the adjacency degrees of node $i$ is defined as
$$\sum_{j \in \Gamma(i)} k_j,$$
where $\Gamma(i)$ denotes the neighbor set of node $i$. Degree centrality considers only the neighborhood information of nodes and ignores the topological relations between neighbors and the location of nodes in the network. Therefore, degree centrality cannot reflect the interaction between neighbor nodes, and its results are not accurate enough.
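A minimal Python sketch of these two quantities, assuming an unweighted, symmetric 0/1 adjacency matrix stored as a list of rows; the function names `degrees` and `neighbor_degree_sums` are illustrative and not taken from the paper.

```python
from typing import List


def degrees(a: List[List[int]]) -> List[int]:
    """k_i = sum_j a_ij for a symmetric 0/1 adjacency matrix."""
    return [sum(row) for row in a]


def neighbor_degree_sums(a: List[List[int]]) -> List[int]:
    """Sum of k_j over the neighbor set Gamma(i) of each node i."""
    k = degrees(a)
    n = len(a)
    return [sum(k[j] for j in range(n) if a[i][j]) for i in range(n)]


# Star with center 0 and three leaves.
A = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(degrees(A))               # [3, 1, 1, 1]
print(neighbor_degree_sums(A))  # [3, 3, 3, 3]
```

The example shows why degree alone is coarse: the three leaves are indistinguishable by either quantity.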
The k-shell decomposition method can determine the position of nodes in the network, but it considers only the residual degree during decomposition, which makes the ranking results too coarse-grained and the nodes hard to distinguish. In an intentional attack simulation, if a node in the innermost core after k-shell decomposition has a tightly knit local structure and few outward links, its role is easily taken over by other nodes when it is deleted. That is, such a node rarely interacts with other nodes, and deleting it cannot paralyze the system, so its importance is lower. As shown in Figure 1, node B, which is in the innermost layer, has no outward links. Clearly, when node B is deliberately attacked, other nodes in the same layer can replace it, and the system is not paralyzed. To overcome the problem that node importance cannot be measured accurately when a large number of nodes share the same coreness value after k-shell decomposition, we propose the outward links diversity assessment index.
Each node is assigned an index $k_s$ representing its coreness. The outward links diversity index is built from the maximum coreness value in the network, the number of links from node $i$ to nodes with coreness value $k_s$, the number of nodes with coreness equal to $k_s$, and a normalized coreness value of node $i$'s neighbors in the $k_s$ layer. It is not difficult to see that the index considers not only the proportion of links between node $i$ and its neighbors in the $k_s$ layer but also their core attributes. In other words, it fully accounts for the network locations of nodes as well as their interactions with neighbors in different layers.
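The ingredients of the outward links diversity index can be computed with a standard k-shell decomposition. The following is a hedged Python sketch under our own assumptions: graphs are dictionaries mapping each node to its set of neighbors, and the hypothetical helpers `k_shell` and `shell_link_profile` return each node's coreness, its number of links into each shell, and the size of each shell. Because the index's combining formula is not reproduced here, the sketch stops at these inputs rather than guessing it.

```python
from collections import Counter
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def k_shell(adj: Graph) -> Dict[Hashable, int]:
    """Coreness k_s of every node by recursive pruning.

    In round k, nodes whose residual degree is <= k are removed
    (repeatedly, since removals lower neighbors' degrees) and get k_s = k.
    """
    residual = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    coreness: Dict[Hashable, int] = {}
    k = 0
    while remaining:
        stack = [v for v in remaining if residual[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already pruned through another neighbor
            remaining.discard(v)
            coreness[v] = k
            for u in adj[v]:
                if u in remaining:
                    residual[u] -= 1
                    if residual[u] <= k:
                        stack.append(u)
        k += 1
    return coreness


def shell_link_profile(adj: Graph, coreness: Dict[Hashable, int]) -> Tuple[Dict[Hashable, Counter], Counter]:
    """Per node: number of links into each shell; globally: number of nodes per shell."""
    links = {v: Counter(coreness[u] for u in adj[v]) for v in adj}
    sizes = Counter(coreness.values())
    return links, sizes


# Triangle {0, 1, 2} plus a pendant node 3 attached to node 0.
G = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
ks = k_shell(G)
links, sizes = shell_link_profile(G, ks)
print(sorted(ks.items()))        # [(0, 2), (1, 2), (2, 2), (3, 1)]
print(sorted(links[0].items()))  # [(1, 1), (2, 2)]
print(sorted(sizes.items()))     # [(1, 1), (2, 3)]
```

Node 0 shares its shell with nodes 1 and 2 but, unlike them, also links outward to shell 1; this is exactly the distinction the coreness value alone cannot make.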
Beyond the number of multilevel neighborhood nodes and the interaction between nodes at different coreness levels, the topological relations between nodes and their neighbors must also be considered. That is, node importance ranking cannot focus only on the core nodes of the network, nor can it ignore nodes located at structural holes. Burt [22,23] proposed the network constraint coefficient to measure the constraint a node experiences in forming structural holes. The larger the network constraint coefficient, the fewer neighbors a node has and the more closely connected those neighbors are. Such a node is at a competitive disadvantage because it lacks easy access to new relational resources. Conversely, the smaller the network constraint coefficient, the greater the chance of forming structural holes and the easier it is to obtain new relational resources. From the perspective of complex networks, the network constraint coefficient uses local network attributes to evaluate node importance: the smaller the constraint coefficient, the more important the node. The network constraint coefficient can be defined as
$$C_i = \sum_{j \in \Gamma(i)} \Big( p_{ij} + \sum_{m \neq i, j} p_{im}\, p_{mj} \Big)^2,$$
where $p_{ij}$ denotes the weight proportion of node $j$ among all the adjacencies of node $i$ (for an unweighted network, $p_{ij} = a_{ij}/k_i$), and node $m$ is a common neighbor of nodes $i$ and $j$. The value of $\sum_{m} p_{im} p_{mj}$ is determined by the number of common neighbors $m$ of nodes $i$ and $j$: the tighter the connection, the more closed triangles they form, the larger this value, and the less chance there is to form structural holes. Hence the calculation of $C_i$ takes into account both the degree of the node and the topological information of its neighbors. The larger the network constraint coefficient, the fewer structural holes are formed and the less important the node is.
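A minimal Python sketch of Burt's constraint for an unweighted, undirected graph, taking $p_{ij} = 1/k_i$ for neighbors $j$; storing the graph as a dictionary of neighbor sets and the name `burt_constraint` are our assumptions. In the usage example, the center of a star, which bridges otherwise unconnected leaves, gets a low constraint, while members of a closed triangle get a high one.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def burt_constraint(adj: Graph) -> Dict[Hashable, float]:
    """C_i = sum_{j in Gamma(i)} (p_ij + sum_m p_im p_mj)^2 for an unweighted, undirected graph."""
    def p(i: Hashable, j: Hashable) -> float:
        # Share of i's ties invested in j: 1/k_i if j is a neighbor, else 0.
        return 1.0 / len(adj[i]) if j in adj[i] else 0.0

    constraint = {}
    for i in adj:
        total = 0.0
        for j in adj[i]:
            # Only common neighbors m of i and j give a nonzero p_im * p_mj.
            indirect = sum(p(i, m) * p(m, j) for m in adj[i] & adj[j])
            total += (p(i, j) + indirect) ** 2
        constraint[i] = total  # isolated nodes simply get 0.0 here
    return constraint


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(round(burt_constraint(star)[0], 4))  # 0.3333 (broker between unconnected leaves)
print(burt_constraint(star)[1])            # 1.0 (leaf tied to a single neighbor)
print(burt_constraint(triangle)[0])        # 1.125 (closed triangle, no structural hole)
```

The constraint is not really defined for isolated nodes; returning 0.0 for them is only a convenience of this sketch.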
In this paper, we propose a new local centrality metric, labeled VKC, based on the interaction between nodes in different coreness layers, the number of multilevel neighborhood nodes, and the constraint on a node forming structural holes.
In Eq. (8), a tunable parameter adjusts the influence of the constraint coefficient; its value range is $[1, \langle k \rangle]$, where $\langle k \rangle$ represents the average degree of network nodes, and here it is set to one. It is easy to see from Eq. (8) that the VKC index comprehensively considers the neighborhood size of nodes, their topological structure, their network location, and the interaction between nodes.
Taking nodes A, B, C, and D in the innermost core of Figure 1 as an example, the calculation of the outward links diversity index shows that node A has a higher diversity of outward links and node B has the lowest, which is consistent with Figure 1. We then calculate the network constraint of the four nodes; by analogy, $C_C$ and $C_D$, each the sum of four neighbor terms, both equal 0.5898. The constraint calculation shows that node A has the minimum constraint value and node B the maximum, which indicates that node A readily spans structural holes, an advantage for spreading information. The VKC values of the four nodes are $VKC_A = 4.0160$, $VKC_B = 0.9767$, and $VKC_C = VKC_D = 1.4764$, respectively. It is not

Experimental Studies
In this section, we conduct selective attack simulation experiments on four real networks. First, we use the monotonicity M of the ranking list to measure whether each index can clearly distinguish between nodes, and then compare the VKC index with representative indices in measuring the importance of single nodes. Finally, we deliberately attack a certain percentage of top-ranked nodes and compare the effect of each index on network robustness to verify the validity and applicability of the VKC index.

Network Efficiency.
Invulnerability is an important topic in complex network research and has produced many results. Networks with different structures show different invulnerability to different types of attacks. For example, scale-free networks are highly robust to random attacks but vulnerable to deliberate attacks: the failure of the top 5% to 10% most important nodes will paralyze the whole network [24]. Therefore, we consider the connectivity of the network before and after node deletion, and the importance of a node is equated with the damage its deletion does to the network. The worse the network connectivity becomes after deleting a node, the more important the node is, and the closer a method's ranking is to the actual ranking, the more accurate the method is [21]. Network efficiency [25-28] is used to measure the effect of removing nodes: the better the connectivity of the network, the higher the network efficiency. When a node is attacked and removed, all edges connected to it are removed at the same time, which may interrupt some paths between other nodes, lengthen the shortest paths between some pairs of nodes, increase the average path length of the entire network, and thus degrade network connectivity. Network efficiency is expressed as
$$E = \frac{1}{N(N-1)} \sum_{i \neq j} \frac{1}{d_{ij}},$$
where $d_{ij}$ represents the shortest path length between nodes $i$ and $j$ (with $1/d_{ij} = 0$ when $j$ cannot be reached from $i$), and $N$ represents the number of network nodes. The value of $E$ lies within [0, 1]. $E = 1$ indicates the best possible connectivity, whereas $E = 0$ indicates that the network consists of isolated nodes. $E$ is normalized by its largest possible value $N(N-1)$, attained by a fully connected graph with $N(N-1)/2$ edges.
In this paper, we select a proportion $p \in [0, 1]$ of the most important nodes of the network for deliberate attack simulations and calculate the descending ratio of network efficiency before and after the attack to quantify the accuracy of each index. Let $E_0$ be the network efficiency before the attack and $E$ the network efficiency after deleting the selected proportion of important nodes. The descending ratio of network efficiency $e$ is expressed as
$$e = \frac{E_0 - E}{E_0}.$$
The range of $e$ is [0, 1]. $e = 1$ means that the network efficiency drops to zero after the attack, that is, the network consists of isolated nodes; $e = 0$ indicates that the efficiency of the entire network is unchanged by the attack. It can be seen from Eq. (10) that the higher the value of $e$, the worse the network efficiency after selectively deleting important nodes and the more accurately the importance of these nodes has been identified.
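A minimal Python sketch of $E$ and $e$, using breadth-first search for the shortest paths of an unweighted graph. Two choices are our assumptions rather than details given in the text: the top $\mathrm{round}(pN)$ nodes of a precomputed ranking are removed, and the post-attack efficiency is still normalized by the original $N(N-1)$, which keeps $e$ within [0, 1] as stated above. The helper names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set

Graph = Dict[Hashable, Set[Hashable]]


def network_efficiency(adj: Graph, n_norm: Optional[int] = None) -> float:
    """E = sum_{i != j} 1/d_ij / (N(N-1)); unreachable pairs add 0. N defaults to len(adj)."""
    n = len(adj) if n_norm is None else n_norm
    if n < 2:
        return 0.0
    total = 0.0
    for source in adj:
        # Breadth-first search: hop distances from `source` in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


def remove_nodes(adj: Graph, victims: Set[Hashable]) -> Graph:
    """Delete the given nodes together with all their incident edges."""
    return {v: adj[v] - victims for v in adj if v not in victims}


def descending_ratio(adj: Graph, ranking: List[Hashable], p: float) -> float:
    """e = (E0 - E) / E0 after removing the top round(p * N) nodes of `ranking`."""
    n = len(adj)
    e0 = network_efficiency(adj)
    attacked = remove_nodes(adj, set(ranking[: int(round(p * n))]))
    # Assumption: keep the original N(N-1) normalization so that 0 <= e <= 1.
    return (e0 - network_efficiency(attacked, n_norm=n)) / e0


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(network_efficiency(star))                      # 0.7
print(descending_ratio(star, [0, 1, 2, 3, 4], 0.2))  # 1.0 (hub removed)
```

Removing the star's hub isolates every leaf, so $e = 1$; removing a single leaf instead gives $e \approx 0.357$. Renormalizing by the smaller post-attack $N$ would let $e$ become negative in the leaf case, which is why the original normalization is kept here.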

Datasets.
Because different types of networks have different topological properties, we selected four real, publicly available network data sets for analysis and comparison. Zachary's karate club [29] is a social network of friendships among 34 members of a karate club at a US university in the 1970s. The dolphin social network [30] is an undirected social network of frequent associations among 62 dolphins in a community living off Doubtful Sound, New Zealand. Books about US politics is a network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon; the network was compiled by Krebs and is unpublished but can be found on Krebs' website. The neural network [31,32] represents the neural network of C. elegans. The basic structural properties and the initial network efficiency $E_0$ of these four networks are given in Table 1. As can be seen from Table 1, in the neural network the maximum degree of a node differs greatly from the maximum k-shell value. The degree assortativity coefficients of all four networks are less than zero, which indicates that nodes with larger degree tend to connect to nodes with smaller degree. All four networks have small-world features.

Contrast Centrality Indices.
Here we briefly review the definitions of the seven centrality indices compared in this work. The k-shell decomposition method can determine the location of nodes in the network: nodes are assigned $k_s$ values according to their residual degree, which is obtained by successively pruning nodes whose degree is less than or equal to the $k_s$ value of the current layer [13].
The clustering coefficient [31,33] represents the degree of node aggregation in the network. Centola [34] found that behavior spreads faster in highly clustered networks, so the importance of nodes in propagation is related to their clustering. Bae and Kim [35] proposed a new importance measure, coreness centrality, which reflects a node's influence by summing the $k_s$ values of its neighbors. The neighborhood coreness is labeled $C_{nc}(i)$, and $C_{nc+}(i)$ denotes the extended neighborhood coreness, which is the second-order neighborhood coreness. Liu et al. [36] proposed a weighted degree centrality (labeled $W_{DC}$) to measure the spreading influence of nodes, using a tuning parameter to balance the degree against the ability to spread outward; the extended weighted degree centrality is labeled $EW_{DC}$. This method performs well in most experiments, and here the tuning parameter is set to the absolute value of the degree assortativity coefficient $r$. Based on the classical gravity formula, Ma et al. [37] put forward a gravity model that takes the $k_s$ value of node $i$ as its mass and the shortest distance between two nodes as their distance to evaluate node importance; it is labeled $G$, and $G_+(i)$ is the extended gravity index that also considers the nearest neighborhood of node $i$. Chen et al. [38] proposed a semilocal centrality measure (labeled $SL$) as a tradeoff between degree centrality, which is cheap but of limited relevance, and time-consuming global measures; it considers both the nearest and the next-nearest neighbors. The above indices are described in Table 2.
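For readers who want to reproduce the comparison, the following Python sketch implements four of the contrast indices as they are usually defined in the cited works: the neighborhood coreness $C_{nc}$ and $C_{nc+}$, the gravity indices $G$ and $G_+$, and the semilocal centrality $SL$. The coreness values are passed in (for example, from a k-shell decomposition), the gravity sum is truncated at a radius of 3 hops, which is our assumption, and the weighted degree centralities are omitted because their formula is not given here. All function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def bfs_distances(adj: Graph, source: Hashable, max_depth: int) -> Dict[Hashable, int]:
    """Hop distances from `source` to every node within `max_depth` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_depth:
            continue  # do not expand beyond the truncation depth
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def neighborhood_coreness(adj: Graph, ks: Dict[Hashable, int]) -> Tuple[dict, dict]:
    """C_nc(i): sum of neighbors' k_s; C_nc+(i): sum of neighbors' C_nc."""
    cnc = {i: sum(ks[j] for j in adj[i]) for i in adj}
    cnc_plus = {i: sum(cnc[j] for j in adj[i]) for i in adj}
    return cnc, cnc_plus


def gravity(adj: Graph, ks: Dict[Hashable, int], radius: int = 3) -> Tuple[dict, dict]:
    """G(i) = sum over j within `radius` hops of k_s(i) k_s(j) / d_ij^2; G+(i) sums G over neighbors."""
    g = {}
    for i in adj:
        dist = bfs_distances(adj, i, radius)
        g[i] = sum(ks[i] * ks[j] / d ** 2 for j, d in dist.items() if j != i)
    g_plus = {i: sum(g[j] for j in adj[i]) for i in adj}
    return g, g_plus


def semilocal(adj: Graph) -> Dict[Hashable, int]:
    """SL(v) = sum_{u in Gamma(v)} Q(u), Q(u) = sum_{w in Gamma(u)} N(w), N(w) = nodes within 2 hops."""
    n2 = {w: len(bfs_distances(adj, w, 2)) - 1 for w in adj}
    q = {u: sum(n2[w] for w in adj[u]) for u in adj}
    return {v: sum(q[u] for u in adj[v]) for v in adj}


# Triangle {0, 1, 2} plus pendant node 3; coreness values from k-shell decomposition.
G = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
ks = {0: 2, 1: 2, 2: 2, 3: 1}
cnc, cnc_plus = neighborhood_coreness(G, ks)
g, g_plus = gravity(G, ks)
sl = semilocal(G)
print(cnc_plus[0], g[0], g_plus[0], sl[0])  # 10 10.0 20.0 15
```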

Experiment Results
The higher the resolution of an importance evaluation index, the more easily differences between nodes can be distinguished. To quantify the resolution of the different indices, we adopt the monotonicity $M$ of the ranking list $I$ [35]:
$$M(I) = \left[ 1 - \frac{\sum_{i \in I} N_i (N_i - 1)}{N_p (N_p - 1)} \right]^2,$$
where $N_p$ represents the number of nodes in the selected top proportion $p$, and $N_i$ represents the number of nodes with the same index value $i$. $M(I) = 1$ indicates that the ranking method is completely monotonic, with each node assigned a different index value. Conversely, $M(I) = 0$ indicates that all nodes have the same index value, so their importance cannot be distinguished at all. Table 3 lists the resolution $M$ of ,  + ,   ,   ,  + , , and  when $p$ is about 25%. The $M$ value of k-shell is zero in all four networks, indicating that the $k_s$ value fails to distinguish the importance of these nodes. Table 3 also shows that the $M$ values of ,   , and   are one in all four networks; that is, ,   , and   distinguish between nodes better.
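The monotonicity can be computed directly from the index values of the $N_p$ selected nodes. A minimal Python sketch (the function name is hypothetical); returning 1.0 when fewer than two values are given is our own convention, since the formula is undefined there.

```python
from collections import Counter
from typing import Hashable, Sequence


def monotonicity(values: Sequence[Hashable]) -> float:
    """M(I) = [1 - sum_i N_i (N_i - 1) / (N_p (N_p - 1))]^2 for the N_p selected index values."""
    n_p = len(values)
    if n_p < 2:
        return 1.0  # our convention: the formula is undefined for fewer than two nodes
    tied_pairs = sum(c * (c - 1) for c in Counter(values).values())
    return (1.0 - tied_pairs / (n_p * (n_p - 1))) ** 2


print(monotonicity([5, 4, 3, 2]))  # 1.0 (all values distinct)
print(monotonicity([2, 2, 2, 2]))  # 0.0 (all values tied)
```

Ties are detected by exact equality, so index values computed in floating point may need rounding first.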
First, we select four indices with higher resolution,   ,  + , , and  + , and compare them with VKC to analyze the importance of single nodes. It can be seen from Figure 2(a) that node importance does not increase with the  + index value: the most important nodes are mainly distributed in the middle range of the  + index values, which is especially obvious in the dolphin social network and the books about US politics network. The VKC index shows better positive synchronization than  + and can find the most important nodes earlier. In Figures 2(b) and 2(c), the values of the VKC index, the   index, and the  + index follow a monotonically increasing curve, and the color value of each node decreases as the index value increases; that is, these three indices show better forward synchronization and can find the most important nodes in different networks. Figure 2(d) shows that the most important nodes appear in the middle range of the  index values, especially in the karate network and the dolphin social network. In other words, the  index cannot identify the most important nodes as early as the VKC index. These results show that the VKC index is significantly superior to the other four indices in identifying the top-ranked important nodes, and its consistent behavior across networks supports the rationality and universality of the proposed model.
Identifying a set of important nodes in a complex network plays a crucial role in understanding the structure and function of the network. Next, we select the top 25% of nodes ranked by each of the eight indices and carry out deliberate attack simulations on the four networks. The results are shown in Figures 3(a)-3(d). When some method, such as k-shell, assigns the same value to several nodes, the result of each simulation changes depending on which of the tied nodes are selected. Therefore, we carry out n random simulation runs and take the arithmetic mean of the descending ratio of network efficiency, where n is the number of nodes that share the same value as the selected node.
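The tie-averaging procedure can be sketched in Python as follows. Assumptions: ties are broken by shuffling before a stable sort, n is read as the size of the tied group at the selection cut-off, and `evaluate` stands for any score of the attacked node set, for example the descending ratio $e$. All names are hypothetical.

```python
import random
from typing import Callable, Dict, Hashable, List, Set


def ranking_with_random_ties(scores: Dict[Hashable, float], rng: random.Random) -> List[Hashable]:
    """Nodes by descending score; nodes with equal scores come out in random order."""
    nodes = list(scores)
    rng.shuffle(nodes)                                  # random order first ...
    nodes.sort(key=lambda v: scores[v], reverse=True)  # ... a stable sort keeps it within ties
    return nodes


def boundary_tie_size(scores: Dict[Hashable, float], count: int) -> int:
    """How many nodes share the score of the last node inside the top `count`."""
    cutoff = sorted(scores.values(), reverse=True)[count - 1]
    return sum(1 for s in scores.values() if s == cutoff)


def mean_over_tie_breaks(scores: Dict[Hashable, float], count: int,
                         evaluate: Callable[[Set[Hashable]], float], seed: int = 0) -> float:
    """Average evaluate(top nodes) over n random tie-breaks, n = boundary_tie_size(...)."""
    rng = random.Random(seed)
    runs = boundary_tie_size(scores, count)
    total = 0.0
    for _ in range(runs):
        top = set(ranking_with_random_ties(scores, rng)[:count])
        total += evaluate(top)
    return total / runs


scores = {"a": 3, "b": 2, "c": 2, "d": 1}
print(boundary_tie_size(scores, 2))                           # 2
print(mean_over_tie_breaks(scores, 2, lambda top: len(top)))  # 2.0
```

Python's `list.sort` is stable even with `reverse=True`, so the random order produced by the shuffle survives only among nodes with equal scores.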
In Figure 3(a), when p = 2.94%, ,   ,  + , and k-shell find important nodes earlier than the other three indices. However, the k-shell index performs weakly afterwards, and its effect in discovering important nodes is not obvious. The two curves of  and   are close to each other, and at p = 17.65% their descending ratios of network efficiency are 91.59% and 89.51%, respectively. The two curves then begin to separate, and VKC performs relatively better in the later stage. A sudden increase in the slope of an index's curve indicates that the index may have identified a node of higher importance; for example, the sudden rise of e in the  curve at p = 5.88% indicates that this index may have identified a top important node.  and  perform relatively poorly. In Figure 3(b),  and  + show good overall synchronization: when p = 24.19%, the corresponding e values of  and  + are 74.13% and 73.97%, respectively. The e of   is 53.14% at both p = 17.74% and p = 19.35%, which indicates that this index is unstable in identifying important nodes. k-shell and  perform relatively poorly. In Figure 3(c), the curves of all indices follow roughly the same trend, and  and k-shell behave relatively poorly. In Figure 3(d), ,  + , and  identify important nodes relatively poorly compared with the other five indices. The performance of ,   , and  + is relatively stable, which is consistent with the M values in Table 3; that is, nodes receive distinct index values and the ranking results are more accurate. Figures 3(a)-3(d) show that, across networks of different scale and structural properties, VKC is more stable than the other seven indices and measures node importance more accurately, because VKC is based on a multiattribute evaluation combining degree centrality, structural holes, and the k-shell decomposition method, while the other seven indices are mainly based on single-attribute evaluation.
Table 4 shows the average increasing ratio of e achieved by the VKC index relative to each of the other seven indices, after attacking ten different proportions of nodes. In the karate network, the ability to identify important nodes, from strongest to weakest on average, is ordered as VKC,   ,  + ,   , k-shell,  + , , and . In the dolphin social network, the order is VKC, $G_+$, $EW_{DC}$, $C_{nc+}$, $SL$, $W_{DC}$, k-shell, and $C$. In the books about US politics network, the order is VKC, $G_+$,   ,   , ,  + , k-shell, and . In the neural network, the order is VKC,  + ,   ,   , k-shell,  + , , and . These eight indices have their own advantages and disadvantages in different networks, but VKC has the best overall effect and versatility in distinguishing the most important nodes.
When the network structure changes under deliberate attacks, the ranking of node importance changes accordingly. The method proposed in this paper is also suitable for identifying important nodes in dynamic networks; that is, the VKC values of the nodes are recalculated after each round of attacks.
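The recalculation scheme can be sketched generically in Python. The scoring function is a parameter; degree is used below only as a stand-in, since implementing VKC itself would require the full definition of Eq. (8). The graph representation and function names are our assumptions.

```python
from typing import Callable, Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]
Scorer = Callable[[Graph], Dict[Hashable, float]]


def degree_scores(adj: Graph) -> Dict[Hashable, float]:
    """Stand-in importance score (plain degree); swap in a VKC implementation here."""
    return {v: float(len(nbrs)) for v, nbrs in adj.items()}


def adaptive_attack(adj: Graph, count: int, score: Scorer) -> List[Hashable]:
    """Remove `count` nodes one by one, rescoring the damaged graph after each removal."""
    current = {v: set(nbrs) for v, nbrs in adj.items()}  # copy; the input is left untouched
    removed: List[Hashable] = []
    for _ in range(min(count, len(current))):
        scores = score(current)               # recompute importance on the current graph
        target = max(scores, key=scores.get)  # most important remaining node
        removed.append(target)
        for u in current.pop(target):         # delete the node and its incident edges
            current[u].discard(target)
    return removed


# Hub 0 with leaves 1-4; node 4 also links to 5 and 6.
G = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0, 5, 6}, 5: {4}, 6: {4}}
print(adaptive_attack(G, 2, degree_scores))  # [0, 4]
```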

Conclusions
Accurate assessment and measurement of node importance are of great significance for improving the robustness of real systems and for designing system structures. On the one hand, accurately assessing node importance makes it possible to protect the important nodes and thus improve the reliability and survivability of the entire network. On the other hand, the entire network can also be destroyed by deliberately attacking these important nodes. The method presented in this paper takes the local characteristics of nodes and their neighbor nodes into consideration: it not only finds the important nodes in core positions but also identifies the key nodes located at structural holes. From a theoretical perspective, it thereby overcomes shortcomings of the k-shell decomposition method such as its coarse-grained ranking, its reliance on the residual degree alone, and its poor applicability to BA networks. We select the top 25% of nodes in the four real-world networks, calculate the monotonicity M, and find that VKC can distinguish the differences between important nodes. In this paper, we analyze and compare VKC with  + ,   ,  + , and , respectively, after removing single nodes, and then make a comparative analysis of the average descending ratio of network efficiency by ,  + ,

Figure 2: (a)-(d) Experimental results of VKC versus  + ,   ,  + , and  in four different complex networks. The horizontal axis represents the VKC index value, and the vertical axis represents the value of  + ,   ,  + , and , respectively. The color represents the network efficiency E after deleting one node; the smaller the value, the greater the actual influence of the node.

Figure 3: (a)-(d) Descending ratio of network efficiency e after deleting nodes at different proportions p.

Table 1: Structural properties and network efficiency of the real networks studied in this work. $N$ and $M$ are the numbers of nodes and edges, respectively. $\langle k \rangle$ is the average degree and $k_{\max}$ is the maximum degree. $k_s^{\max}$ is the maximum k-shell value. $\langle d \rangle$ is the average path length. $r$ is the degree assortativity and $C$ is the clustering coefficient. $E_0$ is the network efficiency of the initial network.

Table 3: The monotonicity of these seven indices.