Link prediction in complex networks estimates the likelihood that a link will form between two nodes that are not yet connected, based on the known network structure and attributes. It can be applied in various fields, such as friend recommendation in social networks and prediction of protein-protein interactions in biology. In social networks, however, link prediction may raise concerns about privacy and security: using link prediction algorithms, criminals can predict the friends of an account user and may go on to discover private information such as addresses and bank accounts. It is therefore urgent to develop a strategy that prevents identification by link prediction algorithms and protects privacy by perturbing the network structure at a low cost, including changing and adding edges. This article focuses on how network structural preference perturbation through edge deletion influences link prediction. Extensive experiments on several real networks show that edges between large-degree and small-degree nodes, and edges between pairs of medium-degree nodes, have the most significant impact on the quality of link prediction.
Complex networks play an important role in modeling and analyzing complex systems such as the social system, biological system, and information system [
Link prediction in a complex network covers both links that exist but are unknown and links that are yet to be built in a given network. Based on the structure and attributes of a public network, link prediction estimates the likelihood of a link forming between two nodes that have not been connected yet [
Diagram of link prediction.
Researchers have made considerable efforts to enhance the accuracy of link prediction. Many link prediction algorithms are based on the similarity between nodes, assuming that the more similar two nodes are, the more likely it is that a link exists between them. The indices describing the similarity between nodes can be roughly classified into local, global, and quasi-local indices. Local indices are the most commonly used among the methods based on node similarity, owing to their simplicity and their suitability for very large networks. Other link prediction algorithms are based on maximum likelihood estimation, and they perform better on networks with a distinct hierarchical structure, such as a grassland food chain [
Link prediction has proven useful in biological networks. In social networks, however, it may raise concerns about privacy and security, because our data are valuable not only to enterprises and public entities but also to an increasing number of cybercriminals who conduct network analysis for malicious purposes. Using a link prediction algorithm, cybercriminals can accurately predict the friends of a social account user and even identify the owner of that account from his or her relationships. If they dig further, they may find the name, age, address, bank account, and other private information of the entity behind a social account.
Given the above, it is urgent to improve privacy protection. However, there is currently little in-depth research on how to evade identification by link prediction algorithms by concealing, changing, or adding edges, that is, by perturbing the network structure at a small cost. Based on the perturbation of the adjacency matrix, Lu et al. [
Based on the previous research, this article focuses on how network structural preference perturbation through edge deletion influences link prediction. Extensive experiments on several real networks show that edges between large-degree and small-degree nodes, and edges between pairs of medium-degree nodes, have the most significant impact on the performance of link prediction. In real-world networks, nodes do not choose their connections uniformly; there is an obvious preference, which leads to a certain correlation between connected nodes. Based on this connection correlation, the concepts of homogeneity and heterogeneity were put forward to distinguish the connection preferences of nodes. The heterogeneity of a complex network is therefore a measure of how uniformly its nodes are distributed. If nodes tend to connect to similar nodes, they form a homogeneous network; if high-degree and low-degree nodes connect with a certain probability, they form a heterogeneous network.
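One common way to quantify this connection preference is the Pearson correlation between the degrees found at the two ends of each edge (the degree assortativity coefficient). The following is a minimal sketch of that calculation, not a procedure taken from this article; the function name `degree_assortativity` and the edge-list input format are assumptions made for illustration. Positive values indicate that nodes tend to connect to nodes of similar degree, and negative values indicate that high-degree nodes tend to connect to low-degree nodes.

```python
from collections import defaultdict


def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over all undirected edges.

    `edges` is a list of (u, v) pairs with no duplicates or self-loops
    (an assumed input format). The result is undefined (division by
    zero) when every node has the same degree.
    """
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1

    # Count every undirected edge in both directions so that the
    # measure is symmetric in the two endpoints.
    xs, ys = [], []
    for u, v in edges:
        xs.extend([deg[u], deg[v]])
        ys.extend([deg[v], deg[u]])

    n = len(xs)
    mean = sum(xs) / n  # xs and ys share the same mean by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


if __name__ == "__main__":
    # A star graph: one hub connected only to leaves.
    star = [(0, 1), (0, 2), (0, 3)]
    print(degree_assortativity(star))
```

In the star graph every edge joins the hub to a leaf, so the sketch returns the most strongly negative correlation possible.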
In this section, we first introduce some basic terminology used in this article and then give formal definitions based on it. Next, we present the method of network structural preference perturbation. Finally, we give the pseudocode of this method.
A complex network: a given biological or social network can be modeled as a graph,
Link prediction: in a given network
Network structural preference perturbation: in a given network
The method of network structural preference perturbation consists of one or more operations on the edges of a network: adding, changing, and deleting. This article focuses on deletion and tries to identify the properties of the edges that most strongly influence the performance of link prediction. For a given network
When a training network has been divided from the original one, we start to apply perturbation on it. For any edge denoted by
In this formula,
After acquiring the deletion value of every edge, we set up a parameter of proportion
The pseudocode of the proposed method is as follows. (see Algorithm
Input: adjacency matrix of the original network
Initialize the value matrix:
Calculate the value of each element
Randomly choose a position
Output: the adjacency matrix after the perturbation
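The overall flow of the steps above (assign a value to every edge, then delete a chosen proportion of edges according to those values) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the scoring function `example_edge_value`, its parameter `alpha`, and the use of value-proportional sampling without replacement are all hypothetical choices, and any other function of the two endpoint degrees can be passed in as `value_fn`.

```python
import random
from collections import defaultdict


def degrees(edges):
    """Return a dict mapping each node to its degree."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def example_edge_value(k_u, k_v, alpha=1.0):
    """Hypothetical deletion value (k_u * k_v) ** alpha.

    This is an assumed placeholder, NOT the formula used in the article.
    """
    return (k_u * k_v) ** alpha


def perturb_by_deletion(edges, proportion, value_fn=example_edge_value, seed=None):
    """Delete round(proportion * |E|) edges from an undirected edge list.

    Edges are sampled without replacement, with probability proportional
    to value_fn(k_u, k_v). Degrees are taken from the unperturbed network
    and are not updated during deletion (an assumption of this sketch).
    """
    rng = random.Random(seed)
    deg = degrees(edges)
    remaining = list(edges)
    weights = [value_fn(deg[u], deg[v]) for u, v in remaining]
    n_delete = round(proportion * len(remaining))
    for _ in range(n_delete):
        # Weighted draw of one index among the edges still present.
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        remaining.pop(idx)
        weights.pop(idx)
    return remaining


if __name__ == "__main__":
    toy = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
    print(perturb_by_deletion(toy, proportion=0.5, seed=1))
```

Passing the scoring rule as a parameter keeps the deletion loop independent of any particular preference, so different degree-based preferences can be compared with the same code.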
We have experimented on four real networks, whose statistics are shown in Table
Statistical features of four real networks, including the number of nodes
Networks | |||||||
---|---|---|---|---|---|---|---|
Jazz [ | 198 | 2,742 | 27.697 | 2.235 | 0.618 | 0.020 | 1.396 |
Macaques [ | 62 | 1,187 | 38.290 | 1.380 | 0.667 | | 1.039 |
Metabolic [ | 453 | 2,025 | 8.940 | 2.664 | 0.647 | | 4.485 |
Neural [ | 297 | 2,148 | 14.465 | 2.455 | 0.292 | | 1.801 |
Resource allocation (RA) is
Adamic-Adar (AA) index is
Common neighbor (CN) is
Preferential attachment (PA) is
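As a reminder of how these four local indices are usually computed, the sketch below uses their standard textbook definitions for a node pair (x, y) with neighbor sets Γ(x) and Γ(y) and node degree k: CN counts the common neighbors, AA sums 1/log k(z) over common neighbors z, RA sums 1/k(z) over common neighbors z, and PA is the product k(x)·k(y). The function names and the adjacency-dictionary representation are assumptions made for illustration.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build a node -> set-of-neighbors mapping from an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def similarity(adj, x, y, index="CN"):
    """Score the node pair (x, y), x != y, with one of CN, AA, RA, PA."""
    nbrs_x = adj.get(x, set())
    nbrs_y = adj.get(y, set())
    common = nbrs_x & nbrs_y
    if index == "CN":
        return float(len(common))
    if index == "AA":
        # A common neighbor of two distinct nodes has degree >= 2,
        # so the logarithm is strictly positive.
        return sum(1.0 / math.log(len(adj[z])) for z in common)
    if index == "RA":
        return sum(1.0 / len(adj[z]) for z in common)
    if index == "PA":
        return float(len(nbrs_x) * len(nbrs_y))
    raise ValueError(f"unknown index: {index}")


if __name__ == "__main__":
    adj = build_adjacency([(1, 2), (1, 3), (2, 3), (3, 4)])
    for name in ("CN", "AA", "RA", "PA"):
        print(name, similarity(adj, 1, 4, name))
```

In the toy graph, nodes 1 and 4 share the single common neighbor 3, which has degree 3, so the RA and AA scores both down-weight that neighbor relative to CN.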
We here choose precision as the index of the performance of link prediction. For a given group of edges that has not been observed, precision is defined as the ratio of successfully predicted edges to the top
For a given network, the train set and test set will be divided by the ratio of
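The evaluation procedure described above (hide part of the edges as a test set, rank the unobserved node pairs by a similarity score computed on the training network, and count how many top-ranked pairs are hidden edges) can be sketched as follows. The names `split_edges` and `precision_at_l`, and the parameters `test_fraction` and `top_l`, are hypothetical; the split ratio and the number of top-ranked pairs are left to the caller rather than fixed to the values used in the experiments.

```python
import random
from itertools import combinations


def split_edges(edges, test_fraction, seed=None):
    """Randomly split an edge list into (train_edges, test_edges)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def precision_at_l(train_edges, test_edges, score_fn, top_l):
    """Fraction of the top_l ranked unobserved pairs that are test edges.

    score_fn(x, y) should be computed from the training network only.
    top_l must be a positive integer. Ties keep the order in which the
    candidate pairs were generated (an arbitrary choice in this sketch).
    """
    all_edges = list(train_edges) + list(test_edges)
    nodes = sorted({node for edge in all_edges for node in edge})
    observed = {frozenset(e) for e in train_edges}
    hidden = {frozenset(e) for e in test_edges}

    # Every node pair without an observed (training) edge is a candidate.
    candidates = [p for p in combinations(nodes, 2) if frozenset(p) not in observed]
    candidates.sort(key=lambda p: score_fn(*p), reverse=True)

    hits = sum(1 for p in candidates[:top_l] if frozenset(p) in hidden)
    return hits / top_l
```

Enumerating all node pairs is quadratic in the number of nodes, which is acceptable for networks of a few hundred nodes but would need a sparser candidate set for much larger graphs.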
First, the experiment supposes that
Precision under different
Analysis of the reason why perturbation effect reaches its peak when
Then,
Precision under the condition of various
In order to better demonstrate the influence network structural preference perturbation has on link prediction, we have also tested the precision calculated through four algorithms on four data sets, under the condition that
Precision under the condition of various
In this article, we analyze the influence of network structural preference perturbation through edge deletion on link prediction. Using an interactive criterion based on node degree, we first compute a perturbation value for every edge in a given network. We then delete edges selected according to their perturbation values, and repeat this procedure until a certain proportion of edges has been perturbed. After that, we perform link prediction on the networks before and after perturbation using four methods (RA, AA, CN, and PA) and compare how the type of connection and the deletion ratio influence the performance of link prediction.
Extensive experiments on several real networks indicate that edges between large-degree and small-degree nodes, and edges between pairs of medium-degree nodes, have the most significant influence on the performance of link prediction. By deleting specific links in the network, we can counter the threat that link prediction poses to privacy. This strategy can protect privacy in social networks and is also worth applying in other fields. For example, in the design of computer communication topologies, minimizing the connections between large-degree and small-degree nodes and between pairs of medium-degree nodes can resist topology estimation and thus better protect the network; in counter-terrorism, more attention should be paid to the connections between leader nodes and leaf nodes, which often indicate a vulnerability in a terrorist group's communication links.
The data can be obtained upon request to the corresponding author.
The authors declare that they have no conflicts of interest.
This work was partially supported by the National Natural Science Foundation of China (Grant no. 61903266), China Postdoctoral Science Foundation (Grant no. 2018M631073), China Postdoctoral Science Special Foundation (Grant no. 2019T120829), Fundamental Research Funds for the Central Universities, and Sichuan Science and Technology Program (No. 20YYJC4001).