An Entropy-Based Self-Adaptive Node Importance Evaluation Method for Complex Networks

Identifying important nodes in complex networks is essential in disease transmission control, network attack protection, and valuable information detection. Many evaluation indicators, such as degree centrality, betweenness centrality, and closeness centrality, have been proposed to identify important nodes. Some researchers assign a different weight to each indicator and combine the indicators to obtain the final evaluation results. However, the weights are usually assigned subjectively based on the researcher's experience, which may lead to inaccurate results. In this paper, we propose an entropy-based self-adaptive node importance evaluation method to evaluate node importance objectively. Firstly, based on complex network theory, we select four indicators that reflect different characteristics of the network structure. Secondly, we calculate the weights of the indicators based on information entropy theory. Finally, the node importance is obtained as the weighted average of the indicators. The experimental results show that our method performs better than the existing methods.


Introduction
Complex networks play an important role in our daily life. People communicate with each other and stay in touch with old friends through online social networks. The Internet connects the whole world, so information now spreads faster and wider than before. Electricity companies build their own networks to supply electricity for production and daily life. Police officers cooperate through their internal networks to catch criminals. In all of these settings, identifying the important nodes is a critical issue. For example, if the important nodes are isolated during disease transmission, the epidemic can be controlled effectively. In traffic networks, congestion can be eased by diverting traffic flow at certain important nodes. However, evaluating node importance is not an easy task, especially when nodes have layered, complicated relationships instead of a flat hierarchy. A typical example is the mobile edge computing network, in which smaller edge clouds connect with each other while also following arrangements from larger edge clouds, and mobile users may connect to and receive services from both [1].
To solve this problem, many researchers have proposed different methods to identify important nodes in complex networks [2]. Considering the local information of nodes and their neighbors, degree centrality [3], semilocal centrality [4], and k-shell decomposition [5] have been proposed to characterize node importance. Because these indicators consider only local information, their computational complexity is low. However, they cannot accurately reflect the characteristics of the whole network, and they do not take into account the layered relationships among nodes, such as those in a mobile edge computing network. To capture more global characteristics of the network, closeness centrality [6], betweenness centrality [7], and other indicators have been proposed. These indicators consider paths and information flows and therefore reflect more of the overall network structure, but their computational complexity is high. To reduce the time complexity, some researchers have proposed methods that divide networks into several parts, such as community-based methods [8] and cluster-based methods [9]. Besides the aforesaid methods, researchers have also proposed methods that consider not only the number of neighbors but also the importance of neighbors [10][11][12][13]. These methods usually iterate step by step until the value of each node becomes steady.
From the above discussion, we can see that most of the indicators listed above reflect only one characteristic of the network and cannot produce a comprehensive evaluation. To make up for this, several indicators can be combined with different weights to obtain a better evaluation result. However, the weights are usually selected based on the researchers' subjective experience rather than a sound scientific basis, which is likely to lead to inaccurate evaluation results.
In this paper, we propose an information entropy theory-based self-adaptive node importance evaluation method (EBSAM), which evaluates node importance objectively by adaptively assigning weights to different indicators. Firstly, we select four indicators to evaluate node importance separately based on complex network theory. Secondly, the weight of each indicator is calculated based on information entropy theory. Finally, the four indicators are combined to indicate the node importance. In this way, EBSAM combines the strengths of several existing node importance indicators and computes their weights to create a more comprehensive indicator. To study the effectiveness of our method, we conducted experiments on three different networks and compared the results with those of other methods. The results show that the proposed method performs better than the existing methods and improves the evaluation accuracy. The rest of the paper is organized as follows. Section 2 gives a brief overview of the related work. Section 3 presents the problem definition and other basic preliminaries. In Section 4, we propose the entropy-based self-adaptive evaluation method and explain its technical details. The comparative simulation experiments and the result analysis are given in Section 5. Section 6 concludes this paper and points out future work.

Related Work
Researchers have proposed many methods to evaluate the node importance from different perspectives. In this section, we look into some of the most recent and important research works done on node importance evaluation in complex networks.
Liu et al. [14] proposed a node ranking method based on the importance of lines. Firstly, the method calculates the importance of the lines between nodes from their topological properties, and the contribution of each node to the line importance is recorded. The final ranking result combines the node degree and the node's contribution to the line importance. Important bridge nodes can be identified with low computational complexity. The method performs better than single local centrality measures, but it still does not consider enough global information for more accurate evaluation. Hu et al. [15] applied the Locally Linear Embedding (LLE) algorithm [16], a nonlinear dimensionality reduction technique often used in machine learning, to node importance evaluation. The input of the algorithm is a matrix built from several centrality measures computed for the nodes in the network. However, due to the limitations of LLE, this algorithm places some requirements on the distribution of the input data. Xu et al. [17] proposed a comprehensive node importance evaluation approach that classifies nodes into several types according to their functions in the network and applies different measures to evaluate the importance of each type. Taking power transmission grids as an example, the paper divides nodes into three types: power supply nodes, connection nodes, and terminal load nodes. For each type, the ranking result is obtained from different centrality measures according to the nodes' function in the network. Although this method can evaluate node importance precisely, it is only applicable when the nodes can be divided into distinct types; for networks in which node functionality is hard to distinguish, it performs poorly. Zhang et al. [18] proposed a node importance evaluation method that combines betweenness centrality and closeness centrality. They believe that two factors determine the importance of a node: its location in the network and the contribution of its neighboring nodes. Betweenness centrality captures the location of a node, and closeness centrality determines the contribution of neighboring nodes. The final node importance is the sum of the two factors. Ping et al. [19] argue that importance contributions from both adjacent and nonadjacent nodes have an important impact on node importance. They divide nodes into layers according to their distance from the evaluated node and define two parameters to indicate the dependence strength between two nodes: the importance correlation parameter denotes the probability of contribution from one node to another, and the strength correlation parameter reflects the impact of the layer on the dependence strength. The final result combines the importance of the evaluated node with the contributions of the other nodes in the network. The above methods mainly exploit either local or global information to evaluate node importance.
Yu et al. [20] evaluate node importance considering both closeness centrality and node degree. The global importance of nodes is represented by closeness centrality, and the local importance is characterized by the importance contribution between adjacent nodes. Therefore, both local and global attributes are considered in the evaluation. Hu et al. [21] proposed a method that combines the k-shell decomposition algorithm with community centrality. The method considers not only the local information of the node but also the community structure it belongs to. The final result is a combination of these two indicators, with a different weight assigned to each. However, the weights are set based on personal experience of the network structure, so the evaluation result is very subjective. Zhang et al. [22] proposed an algorithm that combines betweenness centrality and Katz centrality. The method considers both local and global node importance. It overcomes the limitation of betweenness centrality, which considers only shortest paths, as well as the local-optimum limitation of Katz centrality. However, the weights of the two indicators are selected by running many experiments on the dataset with different weight values, and determining weights through such trial and error is inefficient. Yang and Xie [23] proposed a node importance evaluation method based on multiobjective decision-making. They select several representative indicators and calculate their weights with the Analytic Hierarchy Process. Each node in the network is regarded as a solution, and the indicators of each node are regarded as the properties of that solution.
The evaluation result is obtained by calculating how close each node in the network is to the ideal solution. Because the weights of the indicators are calculated with the Analytic Hierarchy Process, the accuracy depends heavily on the researchers' personal experience. Similarly, Liu et al. [24] proposed a multiattribute ranking method for node importance evaluation in complex networks. They also select four representative indicators and assign their weights with the Analytic Hierarchy Process. The final result is obtained with the Technique for Order Preference by Similarity to Ideal Object (TOPSIS). The method is similar to that of [23]; the two differ in the choice of representative indicators. In all of the above methods, accuracy depends heavily on the researchers' personal experience. Therefore, how to assign appropriate weights to different indicators in different networks objectively and adaptively is still an open problem, which we address in this paper.

The Topology of Complex Networks.
Complex networks can be modelled as undirected and unweighted networks. We define an undirected and unweighted network as $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ denotes the set of nodes in the complex network and $E = \{e_{ij} = (v_i, v_j) \mid i = 1, \ldots, n;\ j = 1, \ldots, n\}$ denotes the set of edges. $n$ is the total number of nodes in the network.

The Definition of Node Importance Indicators.
There are two different types of methods for evaluating network node importance. The first type considers only local node information, that is, only the node itself and the number of its neighbors. The second type considers the hierarchical structure of a network and the position of each node in it, that is, the global information of a node. To combine their respective advantages and evaluate node importance effectively, we adopt two local-information-related indicators and two global-information-related indicators. Degree centrality and improved K-shell decomposition reflect the local information of a node, whereas closeness centrality and betweenness centrality reflect its global information.
Degree Centrality.
Degree centrality (DC) [3] is defined as the ratio of the number of edges directly connected to a node to the maximum possible number of such edges, $n - 1$:
$$DC_i = \frac{d_i}{n - 1}, \qquad (1)$$
where $d_i$ is the number of edges directly connected to node $v_i$ and $n$ is the total number of nodes in the network. A larger value of $DC_i$ indicates that node $v_i$ has more neighbors. Therefore, $v_i$ can influence more nodes in the network and is more important.
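A minimal Python sketch of degree centrality follows. It assumes the common normalization $DC_i = d_i/(n-1)$ and stores the graph as a dictionary mapping each node to the set of its neighbours; this representation and the helper name `degree_centrality` are ours, not the authors'.

```python
def degree_centrality(adj):
    """Return DC_i = d_i / (n - 1) for every node of an undirected graph.

    `adj` maps each node to the set of its neighbours (assumed representation).
    """
    n = len(adj)
    if n < 2:  # a lone node has no possible neighbours
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}


# Star network: hub 0 linked to leaves 1-4
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
dc = degree_centrality(star)
print(dc[0], dc[1])  # 1.0 0.25
```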

Closeness Centrality.
Closeness centrality (CC) [6] is based on the average distance from node $v_i$ to all other nodes in the network. Suppose $l_{ij}$ denotes the length of the shortest path from the source node $v_i$ to the destination node $v_j$. The average shortest distance from node $v_i$ to all other nodes in the complex network is
$$s_i = \frac{1}{n - 1} \sum_{j=1, j \neq i}^{n} l_{ij}. \qquad (2)$$
The smaller $s_i$ is, the more important $v_i$ is. The closeness centrality $CC_i$ of node $v_i$ is defined as the reciprocal of $s_i$:
$$CC_i = \frac{1}{s_i}. \qquad (3)$$
If there is no path between $v_i$ and $v_j$, $CC_i$ is set to 0. A larger value of $CC_i$ indicates that node $v_i$ is closer to the centre of the network; in other words, the position of $v_i$ in the network is more important.
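Because the network is unweighted, the shortest-path lengths $l_{ij}$ can be obtained by breadth-first search. The sketch below is an illustration under two assumptions: the adjacency-dictionary representation (node to set of neighbours), and a literal reading of the rule that $CC_i = 0$ whenever some node is unreachable from $v_i$. The helper names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (hop counts) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness_centrality(adj):
    """CC_i = 1 / s_i, with s_i the mean shortest distance from v_i to the other n - 1 nodes."""
    n = len(adj)
    cc = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        if n < 2 or len(dist) < n:
            cc[v] = 0.0  # some node is unreachable from v (or no other node exists)
            continue
        s = sum(dist.values()) / (n - 1)  # dist[v] is 0, so this sums over j != i
        cc[v] = 1.0 / s
    return cc


path = {0: {1}, 1: {0, 2}, 2: {1}}
print(closeness_centrality(path))  # the middle node 1 gets the largest value
```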

Betweenness Centrality.
Betweenness centrality (BC) [7] represents the importance of a node in data transmission. Suppose $v_s$ and $v_t$ are two nodes in the network. The betweenness centrality of node $v_i$ is defined as follows:
$$BC_i = \sum_{s \neq i \neq t} \frac{g_{s,t}^{i}}{g_{s,t}}, \qquad (4)$$
where $g_{s,t}$ denotes the number of shortest paths from $v_s$ to $v_t$, and $g_{s,t}^{i}$ denotes the number of those shortest paths that pass through node $v_i$. A larger $BC_i$ indicates that more shortest paths pass through node $v_i$. Therefore, $v_i$ is more important in the data transmission process.
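Enumerating shortest paths pair by pair is costly. One standard way to compute $BC_i$ on an unweighted graph is Brandes' algorithm, which the source does not mention and which we use here only as an illustration. The sketch assumes the adjacency-dictionary representation, counts each unordered pair $\{v_s, v_t\}$ once, and applies no normalization, since the source does not state one.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph; no normalisation."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors of v on shortest s-v paths
        sigma = {v: 0 for v in adj}      # number of shortest s-v paths (g_{s,v})
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if dist[w] < 0:          # first visit to w
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}    # dependency of source s on each node
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {v: b / 2 for v, b in bc.items()}


print(betweenness_centrality({0: {1}, 1: {0, 2}, 2: {1}}))  # {0: 0.0, 1: 1.0, 2: 0.0}
```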

Improved K-Shell Decomposition.
K-shell decomposition [5] is employed to identify the position of a node. The schematic diagram of K-shell decomposition is illustrated in Figure 1(a). Firstly, remove all nodes whose degree is 1 from the network and set their $K_s$ value to 1. Repeat this operation until the degree of every remaining node is larger than 1. Then set $K_s = 2, 3, \ldots$ in turn and continue the removal operation until all nodes have been removed from the network. The larger $K_s$ is, the more important the node is in the network. As the definition shows, K-shell decomposition assigns the same value to all nodes when the network is a star network or a tree network. To overcome this limitation, improved K-shell decomposition (IKs) was proposed by Liu et al. in [24]. The calculation process of improved K-shell decomposition is illustrated in Figure 1(b). Firstly, IKs is initialized to 1. Then, all nodes whose current degree is the minimum are removed from the network, and IKs is increased by 1. This operation is repeated until all nodes have been removed from the network. The improved K-shell decomposition overcomes the limitation of K-shell decomposition and reflects the characteristics of the network structure more precisely.
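The improved K-shell peeling can be sketched as follows. This is our reading of the description above, not code from [24]: in each round, every node whose degree in the remaining subgraph is minimal is removed and receives the current IKs value, which then increases by 1. The function name is hypothetical.

```python
def improved_k_shell(adj):
    """Return {node: IKs}; nodes removed in round r (1-based) receive IKs = r."""
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    iks = {}
    level = 1
    while remaining:
        min_degree = min(degree[v] for v in remaining)
        batch = [v for v in remaining if degree[v] == min_degree]
        for v in batch:          # remove the whole batch first ...
            iks[v] = level
            remaining.discard(v)
        for v in batch:          # ... then update degrees in the remaining subgraph
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
        level += 1
    return iks


path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(improved_k_shell(path5).items()))  # [(0, 1), (1, 2), (2, 3), (3, 2), (4, 1)]
```

On this five-node path (a tree), classic K-shell decomposition would give every node $K_s = 1$, whereas the peeling above separates the centre from the ends.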

Our Proposed Method
We illustrate the technical details of our entropy-based self-adaptive node importance evaluation method in this section.

Attribute Matrix of Nodes.
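The nodes in a complex network are denoted by $V = \{v_1, v_2, \ldots, v_n\}$. The indicators chosen to evaluate node importance are defined as $I = \{I_1, I_2, \ldots, I_k\}$, where $k$ is the total number of indicators. In our method, $I = \{DC, CC, BC, IKs\}$, and the attributes of node $v_i$ can be expressed as $(a_{i1}, a_{i2}, a_{i3}, a_{i4})$. Therefore, the attribute matrix $P$ is defined as follows:
$$P = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \vdots & \vdots & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & a_{n4} \end{pmatrix}. \qquad (5)$$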
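For illustration, the attribute matrix $P$ can be assembled from per-indicator results as below. The column order DC, CC, BC, IKs follows $I$; the input values are made up for the example, and `attribute_matrix` is a hypothetical helper.

```python
INDICATORS = ("DC", "CC", "BC", "IKs")  # column j = 1..4 of P


def attribute_matrix(nodes, values):
    """Row i of P is (a_i1, ..., a_i4) for nodes[i]; values[name][node] holds one indicator."""
    return [[float(values[name][v]) for name in INDICATORS] for v in nodes]


# Made-up indicator values for two nodes, for illustration only
values = {
    "DC": {"a": 1.0, "b": 0.5},
    "CC": {"a": 1.0, "b": 0.6},
    "BC": {"a": 2.0, "b": 0.0},
    "IKs": {"a": 2, "b": 1},
}
P = attribute_matrix(["a", "b"], values)
print(P)  # [[1.0, 1.0, 2.0, 2.0], [0.5, 0.6, 0.0, 1.0]]
```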

Data Normalization.
The values of different indicators can vary in different ranges. For example, the value of DC is a decimal number in [0, 1], while the value of IKs is an integer no smaller than 1. The data should therefore be normalized before they are combined, so that all indicators are measured on a uniform scale. Common normalization methods include decimal scaling, Gaussian normalization, zero-mean normalization, and min-max normalization. We employ min-max normalization on the attribute matrix, defined as follows:
$$b_{ij} = \frac{a_{ij} - \min_{1 \le l \le n} a_{lj}}{\max_{1 \le l \le n} a_{lj} - \min_{1 \le l \le n} a_{lj}}. \qquad (6)$$
The normalized attribute matrix $R$ is as follows:
$$R = \begin{pmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ \vdots & \vdots & \vdots & \vdots \\ b_{n1} & b_{n2} & b_{n3} & b_{n4} \end{pmatrix}. \qquad (7)$$

Weights Calculation.
Introduced by Claude E. Shannon in 1948, entropy is a measure of the unpredictability and uncertainty of information [25,26]. For example, the entropy is zero when we toss a two-headed coin, because there is a 100% chance of getting heads. The entropy reaches its maximum when we toss a fair coin: since tails are as likely as heads, there is no way to predict the next outcome. A smaller value of entropy therefore indicates less uncertainty; in the entropy weight setting, an indicator whose values have smaller entropy carries more useful information for distinguishing between nodes [27][28][29][30][31]. In a multiattribute decision-making problem, a larger weight should be assigned to an attribute with more useful information than to an attribute with greater uncertainty. By analyzing the probability distribution of the original data, the entropy can be obtained objectively.
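The coin example can be checked numerically. The sketch below computes Shannon entropy in bits (base-2 logarithm), which differs from the natural logarithm used for the indicator entropies only by a constant factor; `shannon_entropy` is a hypothetical helper.

```python
import math


def shannon_entropy(probs):
    """H = -sum p * log2(p) in bits; outcomes with p = 0 contribute nothing."""
    h = 0.0
    for p in probs:
        if p > 0:
            h -= p * math.log2(p)
    return h


print(shannon_entropy([1.0, 0.0]))  # two-headed coin: 0.0
print(shannon_entropy([0.5, 0.5]))  # fair coin: 1.0, the maximum for two outcomes
print(shannon_entropy([0.9, 0.1]))  # biased coin: somewhere in between
```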
Calculating the weight of each attribute from entropy is more reasonable than setting it subjectively. In this paper, node importance is determined by four indicators whose weights are obtained from entropy theory. Suppose the weights of the indicators are expressed as $W = \{w_1, w_2, w_3, w_4\}$. According to Shannon entropy theory, the entropy of each indicator can be calculated as follows:
$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{b_{ij}}{\sum_{l=1}^{n} b_{lj}},$$
where $b_{ij}$ is the normalized $j$th indicator value of node $v_i$, $p_{ij}$ is its share of the column total (with $p_{ij} \ln p_{ij} = 0$ when $p_{ij} = 0$), and $e_j$ ($j = 1, 2, 3, 4$) is the entropy of the $j$th indicator. As mentioned above, the larger the entropy is, the less useful information the indicator contains, and therefore the smaller its weight should be. The weight of each indicator is calculated by the following:
$$w_j = \frac{1 - e_j}{4 - \sum_{m=1}^{4} e_m}.$$
We now illustrate the relationship between entropy and weight by taking the campus network of Beijing University of Posts and Telecommunications (BUPT) as an example. The topology of the BUPT campus network is illustrated in Figure 2, where the dots denote the main nodes of the network. The relationship between entropy and weight is illustrated in Figure 3. As we can see, the larger the entropy is, the smaller the weight is. The smaller the entropy is, the more useful information an indicator provides; therefore, the indicator with smaller entropy has a larger weight.
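A minimal sketch of the min-max normalization and the entropy weighting, written for an arbitrary number $k$ of indicators (the text uses $k = 4$, so the denominator $k - \sum e$ becomes $4 - \sum e$). Two edge cases are not covered by the source and are handled here by assumption: a constant column is normalized to all zeros, and an all-zero column is given entropy 1 (hence weight 0). At least two nodes are required so that $\ln n > 0$. The matrix in the usage lines is a toy example, not data from the paper.

```python
import math


def min_max_normalize(P):
    """Column-wise min-max normalisation of an n x k matrix given as a list of rows."""
    n, k = len(P), len(P[0])
    R = [[0.0] * k for _ in range(n)]
    for j in range(k):
        column = [row[j] for row in P]
        lo, hi = min(column), max(column)
        for i in range(n):
            # Assumption: a constant column (hi == lo) is mapped to all zeros.
            R[i][j] = (P[i][j] - lo) / (hi - lo) if hi > lo else 0.0
    return R


def entropy_weights(R):
    """Return (entropies e_j, weights w_j) for the columns of the normalised matrix R."""
    n, k = len(R), len(R[0])
    if n < 2:
        raise ValueError("need at least two nodes so that ln n > 0")
    entropies = []
    for j in range(k):
        col_sum = sum(row[j] for row in R)
        if col_sum == 0:
            entropies.append(1.0)  # assumption: an all-zero column carries no information
            continue
        e = 0.0
        for row in R:
            p = row[j] / col_sum  # p_ij, the node's share of the column total
            if p > 0:             # convention: 0 * ln 0 = 0
                e -= p * math.log(p)
        entropies.append(e / math.log(n))
    denominator = k - sum(entropies)  # equals the sum of (1 - e_j)
    if denominator <= 0:
        raise ValueError("no indicator distinguishes between the nodes")
    return entropies, [(1 - e) / denominator for e in entropies]


P = [[1, 10.0], [2, 10.5], [3, 30.0]]  # toy data: 3 nodes, 2 indicators
entropies, weights = entropy_weights(min_max_normalize(P))
print(weights)  # the second, more concentrated column receives the larger weight
```

The second column is dominated by one node, so its entropy is lower and its weight higher, which is the behaviour described for Figure 3.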

Node Importance Ranking.
The node importance is calculated as follows:
$$s_i = \sum_{j=1}^{4} w_j b_{ij}.$$
The larger $s_i$ ($i = 1, 2, \ldots, n$) is, the more important the node is. The general node importance calculation and node ranking steps for complex networks are shown in Algorithm 1. Firstly, we determine the indicators $\{DC, CC, BC, IKs\}$ and calculate the values of the four indicators for all nodes in the complex network. Secondly, we construct the attribute matrix $P$ based on equation (5). Thirdly, we calculate the normalized attribute matrix $R$ based on equations (6) and (7). Fourthly, we calculate the entropy $e_j$ of each indicator based on Shannon entropy theory, as described in the weights calculation; the weights $w_j$, the node scores $s_i$, and the final ranking then follow as in Algorithm 1.
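The final scoring and ranking step can be sketched as follows, assuming the weights and the normalized matrix come from the previous steps. Ties keep their input order because Python's `sorted` is stable, also with `reverse=True`. `rank_nodes` is a hypothetical helper.

```python
def rank_nodes(nodes, R, weights):
    """Score s_i = sum_j w_j * b_ij for each node and sort nodes by descending score."""
    scores = {v: sum(w * b for w, b in zip(weights, row)) for v, row in zip(nodes, R)}
    ranked = sorted(nodes, key=lambda v: scores[v], reverse=True)
    return ranked, scores


nodes = ["a", "b", "c"]
R = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]  # toy normalised matrix
ranked, scores = rank_nodes(nodes, R, [0.25, 0.75])
print(ranked)  # ['c', 'b', 'a']
```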

Experiments
We conducted experiments on three real networks and compared the results of our method with those of the random selection method (Random) and the TOPSIS-RE method in [24]. The experimental results show that our method performs better.

Experiment Setup.
The selected networks are the campus network of Beijing University of Posts and Telecommunications (BUPT), the Shanxi Water Network, and the Shanxi Railway Network. We first demonstrate the effectiveness of our method on the BUPT campus network, and then present the experimental results on the Shanxi Water Network and the Shanxi Railway Network to see how the proposed method works in more complicated cases. The experiments are conducted on a PC with an Intel Core i5-3470 3.2 GHz CPU and 4 GB of RAM.
TOPSIS-RE builds on the Technique for Order Preference by Similarity to Ideal Object (TOPSIS) to evaluate node importance. The core idea of TOPSIS-RE is to construct a positive ideal object and a negative ideal object from the original data: the positive ideal object is built from the maximum value of each indicator, and the negative ideal object from the minimum value of each indicator. All methods are implemented with the network analysis software Cytoscape and the Java programming language.

Algorithm 1: Node importance ranking algorithm.
Input: the normalized attribute matrix R
Output: the ranking result
for each I_j in I do
    sum_j = 0;
    …
        ensum_j = ensum_j − p_ij ln p_ij;
    end
    e_j = ensum_j / ln n;
    esum = esum + e_j;
end
for each I_j in I do
    w_j = (1 − e_j) / (4 − esum);
end
for each v_i in V do
    s_i = Σ_{j=1}^{4} w_j b_ij;
end
Rank the node list based on s_i;
return the ranked node list;
Table 1: Node importance ranking results (top 16 nodes) on the BUPT campus network.
Method      Ranking results
EBSAM       8, 31, 11, 3, 18, 22, 7, 20, 21, 4, 9, 10, 14, 24, 26, 28
TOPSIS-RE   8, 18, 31, 3, 11, 7, 21, 20, 9, 10, 4, 14, 24, 26, 28, 22
Random      19, 12, 23, 25, 30, 32, 13, 34, 16, 5, 27, 21, 9, 20, 6

In the experiments, all nodes are ranked by node importance. Then, the nodes are removed one by one from the networks according to the ranking results, and the Number of Connected Components (NCC) is employed to evaluate the effectiveness of the methods. A connected component of an undirected network is a subgraph in which any two nodes are connected to each other by a path. After one or more nodes are removed from a network, the network may be divided into several disconnected subgraphs: any node inside a subgraph is reachable from the other nodes in the same subgraph, and there is no path between two nodes belonging to different subgraphs. NCC is the number of these disconnected subgraphs and thus reflects the connectivity of the network. The robustness of a network can also be measured by the size of the largest connected component after a fraction of the nodes has been removed [32][33][34]. A larger value of NCC means that the network is divided into more disconnected subgraphs, which indicates that the removed nodes are more important with respect to network connectivity. Therefore, a larger value of NCC indicates better performance: a node is considered more important if its removal increases the number of connected components more.
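The NCC evaluation can be sketched as below: nodes are removed cumulatively in ranked order, and the connected components of the remaining graph are counted after each removal with an iterative depth-first search. The adjacency-dictionary representation and the helper names are our assumptions.

```python
def count_components(adj, removed):
    """Number of connected components among the nodes not in `removed`."""
    seen = set(removed)  # removed nodes are treated as already visited
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]  # iterative depth-first search from an unvisited node
        seen.add(start)
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def ncc_curve(adj, ranking):
    """NCC after removing the first 1, 2, ..., len(ranking) nodes of `ranking`."""
    removed = set()
    curve = []
    for v in ranking:
        removed.add(v)
        curve.append(count_components(adj, removed))
    return curve


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(ncc_curve(star, [0, 1, 2, 3, 4]))  # [4, 3, 2, 1, 0]
print(ncc_curve(star, [1, 0, 2, 3, 4]))  # [1, 3, 2, 1, 0]
```

On a star network, removing the hub first immediately splits the graph into one component per leaf, so a ranking that puts the hub first scores higher under NCC.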

Experimental Results on BUPT Campus Network.
The topological structure of the BUPT campus network is illustrated in Figure 2. The number in each node serves only as an identifier that distinguishes the node from the others. The node importance ranking results of EBSAM, TOPSIS-RE, and a random selection algorithm (Random) are illustrated in Table 1. We list only the top 16 nodes because the rankings of the remaining nodes are the same for EBSAM and TOPSIS-RE. Following the ranking result, we remove the nodes from the network one by one until all nodes have been removed, and we calculate the number of connected components after each removal. The removal process of EBSAM is shown in Figure 4, which shows the topological structure of the network after every four nodes have been removed. The NCC of the EBSAM, TOPSIS-RE, and Random methods versus the number of nodes attacked is shown in Figure 5. As we can see, the number of connected components under the Random method is much smaller than under the other two methods; therefore, Random is less effective at destroying the network by attacking important nodes. We can also see that the number of connected components under EBSAM is larger than under TOPSIS-RE, so the network connectivity is damaged more severely with EBSAM. Attacking the network based on the ranking result of EBSAM is thus more effective than attacking it based on TOPSIS-RE. We attribute this to the fact that our method obtains the weights of the four indicators objectively and adaptively rather than assigning fixed values subjectively.

Experimental Results on Shanxi Water Network.
As shown in Figure 6, the Shanxi Water Network plays a vital role in normal production and daily life. The green line in Figure 6 denotes the water supply network. The Shanxi Water Network helps meet the water demand of North China, and its topological structure is shown in Figure 7. As shown in Figure 7, the Shanxi Water Network is composed of 82 nodes. The experimental result is shown in Figure 8. As we can see, the connectivity of the network is destroyed after the top 50 nodes have been attacked, and the NCC of our method is larger than those of the two compared methods. Therefore, EBSAM performs better than the compared methods.

Experimental Results on Shanxi Railway Network.
Finally, we conduct an experiment on the Shanxi Railway Network. As shown in Figure 9, the Shanxi Railway Network is part of the transportation network in Shanxi and provides great convenience for travel and commodity trading. The topological structure of the Shanxi Railway Network is shown in Figure 10.
The experimental result is shown in Figure 11. The network comes close to breaking down after the top 60 nodes have been attacked. The NCC of the Shanxi Railway Network rises the most with our method. Therefore, EBSAM performs better than the other methods.

Conclusions and Future Work
In this paper, we proposed an entropy theory-based self-adaptive node importance evaluation method for complex networks. Firstly, we select four centrality measures that reflect different characteristics of a node as node importance evaluation indicators. Then, we combine them with appropriate weights calculated by an entropy theory-based algorithm. The algorithm is highly adaptive and can therefore be applied to many different kinds of networks. In traditional methods, the weights are selected based on the subjective experience of the researchers rather than a sound scientific basis, which can lead to inaccurate evaluation results. The proposed method instead uses entropy theory to calculate the weight of each indicator. A smaller value of entropy indicates that the corresponding attribute contains more useful information, and in a multiattribute decision-making problem, a larger weight should be assigned to an attribute with more useful information than to an attribute with greater uncertainty. With this algorithm, we can therefore assign proper weights to different attributes. The experimental results on three real-world complex networks show that our method performs better than the compared methods. Our ongoing research will focus on investigating the effectiveness of our method in more complex environments.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.