Link Prediction and Node Classification Based on Multitask Graph Autoencoder

The goal of network representation learning is to extract deep-level abstractions from data features, which can also be viewed as transforming high-dimensional data into low-dimensional features. Learning the mapping function between the two vector spaces is an essential problem. In this paper, we propose a new similarity index based on traditional machine learning, which integrates the concepts of common neighbors, local paths, and preferential attachment. Furthermore, to apply link prediction methods to node classification, we establish an architecture named the multitask graph autoencoder. Specifically, building on structural deep network embedding, the architecture designs a framework of high-order loss functions that calculates node similarity from multiple angles, so that the model can make up for the deficiency of the second-order loss function. Through parameter fine-tuning, the high-order loss function is introduced into the optimized autoencoder. Experiments show that the framework is generally applicable to the majority of classical similarity indexes.


Introduction
With the explosive growth of network data, mainstream network representation learning algorithms increasingly struggle to adapt to intricate data types. A variety of approaches have been proposed to address privacy [1] and security [2] issues. The network is the carrier of sophisticated relationships between data. Taking social networks as an example, large websites such as Twitter and Facebook have developed over a long time and now have millions of online users. The scale of user information is enormous, and the network structure is rather intricate. Thus, a mass of relationships between online users is worth exploring. By capturing the structural characteristics of real-world networks, researchers can deal efficiently with multiple data analysis tasks, such as community detection [3], link prediction [4,5], and node classification [6]. The emergence of network representation learning [7,8] is therefore of vital significance to social network analysis.
In link prediction based on classical similarity indexes, the CN [9] index counts the common neighbors of a node pair to predict potential links. The AA [10] index penalizes highly connected common neighbors, so that less-connected common neighbors contribute more. The Jaccard [11] index measures similarity as the ratio of the number of common neighbors to the size of the union of the two neighbor sets. The LP [12] index adds the influence of third-order local paths to the algorithm. The Katz [13] index improves the prediction accuracy over the LP index by extending the local path to the global path.
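The indexes listed above can be written compactly in matrix form. The following is a minimal NumPy sketch, assuming a simple undirected graph given as a dense 0/1 adjacency matrix; the function name `similarity_indices` and the default values of `alpha` and `beta` are illustrative choices, not taken from the paper.

```python
import numpy as np

def similarity_indices(A, alpha=0.01, beta=0.01):
    """Return CN, AA, Jaccard, LP and Katz score matrices (illustrative sketch)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = A.sum(axis=1)                      # node degrees
    cn = A @ A                             # (A^2)[x, y] = number of common neighbors
    # AA: each common neighbor z contributes 1 / log(k_z). A degree-1 node can
    # never be a common neighbor, so its weight is set to 0 (avoids 1 / log(1)).
    w = np.zeros(n)
    w[k > 1] = 1.0 / np.log(k[k > 1])
    aa = A @ np.diag(w) @ A
    # Jaccard: |common neighbors| / |union of the two neighbor sets|
    union = k[:, None] + k[None, :] - cn
    jaccard = np.divide(cn, union, out=np.zeros_like(cn), where=union > 0)
    # LP: second-order paths plus attenuated third-order paths
    lp = cn + alpha * (A @ A @ A)
    # Katz: sum over all path lengths l >= 1 of beta^l A^l = (I - beta A)^-1 - I,
    # which converges only if beta < 1 / (largest eigenvalue of A).
    katz = np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)
    return {"CN": cn, "AA": aa, "Jaccard": jaccard, "LP": lp, "Katz": katz}

# Usage: a 4-cycle in which nodes 0 and 1 share the neighbors 2 and 3.
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]])
scores = similarity_indices(A)
```

Dense matrix products are used here for readability; on large networks a sparse representation would be the more practical choice.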
Motivated by natural language processing [14], many network representation learning algorithms based on the continuous bag-of-words model and random walks have gradually appeared. Essentially, network representation learning is a mapping technique in which each node is uniquely represented by a low-dimensional vector. By measuring the similarities between embedding vectors, these latent representations are likely to reveal potential correlations between the entities denoted by nodes. Specifically, the low-dimensional space can visualize potential links in the complex network that are otherwise hard to observe. Network representation learning is not only broadly employed to handle sophisticated social network tasks but can also be parallelized to reduce computation time.
Perozzi et al. [15] utilized the random walk mechanism to traverse the network nodes in a depth-first manner. Given the initial node and the walk length, the algorithm samples a neighbor at random as the next node to visit and thereby builds node sequences of the specified length, which express the co-occurrence relation between nodes. The sampled sequences are then fed into the skip-gram model for training, and the neighborhood structure of discrete nodes is represented by vectors. Struc2vec [16] redefines node similarity from the perspective of spatial structure. The algorithm constructs a weighted hierarchical graph by computing the distances between node pairs in different layers. Eventually, it leverages the generated sequences of structurally similar nodes to learn network representations. Tang et al. [17] use gradient descent to optimize the first-order proximity and the second-order proximity separately. During training, Tang et al. apply negative sampling [18] to decrease the time complexity.
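The sampling step described above can be sketched as follows. This is a simplified illustration of uniform random walks and of the (center, context) pairs a skip-gram model would train on, not the reference implementation of any of the cited methods; `random_walks`, `cooccurrence_pairs`, and their parameters are hypothetical names.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks; adj maps each node to a list of neighbors."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)                 # visit the start nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:          # isolated node: the walk ends early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

def cooccurrence_pairs(walk, window):
    """Return (center, context) pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs

# Usage on a three-node path graph 0 - 1 - 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
walks = random_walks(adj, walk_length=5, walks_per_node=2)
pairs = cooccurrence_pairs([0, 1, 2], window=1)
```

The resulting pairs play the role of word and context pairs in the language-model analogy.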
Here, the contributions of our paper are demonstrated as follows: (1) We propose a new link prediction algorithm of mixed local neighbor and path, namely, MLNP. (2) For the deficiency of loss functions in structural deep network embedding (SDNE), our work establishes an architecture of multitask graph autoencoder (MTGAE), which designs a framework of high-order loss function from the perspective of capturing the similarity information. (3) We confirm the universal effectiveness of the loss function framework on different datasets. The specific model flow chart is shown in Figure 1.

Related Works
2.1. Autoencoder. As a special form of feedforward neural network, an autoencoder [19][20][21][22] is often used for dimensionality reduction and feature learning in graph embedding. Let the $N \times N$ adjacency matrix representing a graph be the input and $x_i \in \mathbb{R}^N$ be an adjacency vector comprising the local neighborhood structure information of node $i$. The autoencoder consists of two components: the encoder $g(x_i): \mathbb{R}^N \to \mathbb{R}^D$ and the decoder $f(y_i): \mathbb{R}^D \to \mathbb{R}^N$. Specifically, the encoder maps the adjacency vector to the low-dimensional embedding space through several nonlinear functions and obtains an approximate representation vector by compressing the graph-structured data. Then, we decode the embedding vector into the reconstruction vector $\hat{x}_i$. During the backward pass, the reconstruction error between the input and the output is minimized by iteratively adjusting the weight matrices. The representation vectors of the latent space for the different layers are computed as follows:

$$y_i^{(1)} = \sigma\left(W^{(1)} x_i + b^{(1)}\right), \qquad y_i^{(k)} = \sigma\left(W^{(k)} y_i^{(k-1)} + b^{(k)}\right), \quad k = 2, \ldots, K, \tag{1}$$

where $W^{(k)}$ is the weight matrix of the $k$th layer, $y_i^{(k-1)}$ is the $(k-1)$th layer latent vector, $b^{(k)}$ is the bias of the $k$th layer, and $\sigma(\cdot)$ denotes the sigmoid nonlinear activation function.
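The layer-wise mapping can be illustrated with a small forward pass. The sketch below assumes randomly initialized, untrained weights and toy layer sizes chosen only for the example; `init_layers` and `forward` are hypothetical helper names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, seed=0):
    """Create (W, b) pairs for consecutive layer sizes, e.g. [N, 4, D]."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """Apply y^(k) = sigmoid(W^(k) y^(k-1) + b^(k)) layer by layer."""
    y = x
    for W, b in layers:
        y = sigmoid(W @ y + b)
    return y

N, D = 6, 2                                   # toy input and embedding sizes
encoder = init_layers([N, 4, D], seed=0)      # g: R^N -> R^D
decoder = init_layers([D, 4, N], seed=1)      # f: R^D -> R^N
x_i = np.array([0, 1, 1, 0, 0, 1], dtype=float)   # adjacency vector of one node
y_i = forward(x_i, encoder)                   # embedding vector
x_hat = forward(y_i, decoder)                 # reconstruction vector
recon_error = float(np.sum((x_hat - x_i) ** 2))
```

Training would adjust the weights to reduce `recon_error`; that step is omitted here.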

Structural Deep Network Embedding.
In 2016, Wang et al. [23] put forward a structural deep network embedding model that preserves network structure in two respects. The first-order proximity captures local structure features of the network by judging whether nodes are linked by a direct edge [24], which can be thought of as the supervised component. Meanwhile, the second-order proximity preserves global structure features by observing the differences between the neighborhood structures of nodes, which can be regarded as the unsupervised component. The two concepts of proximity describe the characteristics of the network structure from complementary viewpoints. The SDNE model assigns weights to the first-order and second-order proximity loss functions and optimizes them iteratively. The SDNE architecture is shown in Figure 2.
The first-order loss function drives the embedding vectors $y_i^{(K)}$ and $y_j^{(K)}$ of adjacent nodes to be close in the embedding space. The objective function is calculated as follows:

$$\mathrm{Loss}_{1st} = \sum_{i,j=1}^{N} s_{i,j} \left\| y_i^{(K)} - y_j^{(K)} \right\|_2^2 = 2\,\mathrm{tr}\left(Y^{T} L Y\right),$$

where $\mathrm{tr}(\cdot)$ denotes the matrix trace, $s_{i,j}$ is the element of the adjacency matrix, $L$ is the Laplacian matrix, and $Y$ is the encoded vector matrix of the hidden layer. Intuitively, the second-order proximity compares the neighborhood structure of node pairs, and the proximity is computed as follows:

$$\mathrm{Loss}_{2nd} = \sum_{i=1}^{N} \left\| (\hat{x}_i - x_i) \odot b_i \right\|_2^2,$$

where $\hat{x}_i - x_i$ is the reconstruction error, $\odot$ denotes the Hadamard product, and $b_i$ is a penalty coefficient. Owing to the sparsity of the network, the zero elements in the adjacency matrix far outnumber the nonzero elements. If the adjacency matrix is fed directly into SDNE, it is simpler to reconstruct the zero elements. However, this is not in accordance with our expectations, and a reasonable solution is to impose a higher penalty coefficient $\beta$ on the reconstruction error of nonzero elements. The ultimate goal of the SDNE model is to jointly optimize the proximity loss functions, and the integral loss function is

$$\mathrm{Loss}_{mix} = \alpha\,\mathrm{Loss}_{1st} + \mathrm{Loss}_{2nd} + \nu\,\mathrm{Loss}_{reg},$$

where $\mathrm{Loss}_{reg}$ denotes the regularization term to avoid overfitting. Because of the robustness of the sparse network, the overall optimization performance is hardly affected by variations of the parameters $\alpha$ and $\beta$.

Algorithm 1: Mixed local neighbor and path.
Input: edge list
Output: similarity matrix, AUC score
1: Input adjacency matrix $A$
2: Divide all edges into the training set and probe set
3: Construct the third-order path matrix with attenuation parameter, $\alpha A^{3}$
4: Construct the optimized common neighbor matrix $S^{MAARA}$
5: Construct the matrix based on preferential attachment, $S^{PA} = k_x \times k_y$
6: Calculate the similarity matrix that incorporates multiple methods, $S^{MLNP} = S^{MAARA} \odot S^{PA} + \alpha A^{3}$
7: Calculate the AUC score of the MLNP index

Algorithm 2: Multitask graph autoencoder.
Input: the network $G = (V, E)$ with adjacency matrix $A$, node labels, the parameters $\alpha$, $\beta$, $\gamma$, $\nu$
Output: network representation $Y$ and updated parameter $\theta$
1: Apply Adam optimizer and ReLU activation function
2: Construct the similarity matrix $M$
3: $X = A$
4: Repeat
5: Based on $X$, apply Equation (1) to obtain $\hat{X}$ and $Y = Y^{K}$
6-7: Compute $\mathrm{Loss}_{2nd}$ and $\mathrm{Loss}_{high\text{-}order}$
8: $\mathrm{Loss}_{1st} = 2\,\mathrm{tr}(Y^{T} L Y)$
9: $\mathrm{Loss}_{mix} = \alpha\,\mathrm{Loss}_{1st} + \mathrm{Loss}_{2nd} + \mathrm{Loss}_{high\text{-}order} + \nu\,\mathrm{Loss}_{reg}$
10: Use $\partial \mathrm{Loss}_{mix} / \partial \theta$ to backpropagate through the whole network to obtain the parameter $\theta$
11: Until converge
12: Obtain the network representations $Y = Y^{K}$

Wireless Communications and Mobile Computing
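The two SDNE proximity losses can be sketched in NumPy. This is a minimal illustration, assuming a symmetric adjacency matrix `S`, the unnormalized Laplacian `L = D - S`, and a penalty matrix whose entries are `beta` on nonzero elements and 1 elsewhere; all function names are hypothetical.

```python
import numpy as np

def first_order_loss(S, Y):
    """2 * tr(Y^T L Y), with L = D - S assumed to be the unnormalized Laplacian."""
    L = np.diag(S.sum(axis=1)) - S
    return 2.0 * np.trace(Y.T @ L @ Y)

def pairwise_first_order(S, Y):
    """Equivalent pairwise form: sum_ij s_ij * ||y_i - y_j||^2 (symmetric S)."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sum(S * np.sum(diff ** 2, axis=2))

def second_order_loss(X, X_hat, beta):
    """Reconstruction error, with nonzero input entries penalized by beta."""
    B = np.where(X > 0, beta, 1.0)
    return np.sum(((X_hat - X) * B) ** 2)

def sdne_loss(loss_1st, loss_2nd, loss_reg, alpha, nu):
    """Joint objective: alpha * Loss_1st + Loss_2nd + nu * Loss_reg."""
    return alpha * loss_1st + loss_2nd + nu * loss_reg

# Usage: check numerically that the trace form matches the pairwise form.
rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, 2, size=(5, 5)), k=1).astype(float)
S = upper + upper.T                       # random symmetric 0/1 adjacency matrix
Y = rng.normal(size=(5, 3))               # toy embeddings
trace_form = first_order_loss(S, Y)
pair_form = pairwise_first_order(S, Y)
```

The trace form avoids the explicit double loop over node pairs, which is why it is the one usually implemented.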

Proposed Link Prediction Algorithm
In this paper, we propose the MLNP link prediction algorithm, which integrates the methods of common neighbors, high-order paths, and preferential attachment. We adjust the structural factors of the LP index by weighing prediction accuracy against computational efficiency. In the matrix form used in step 6 of Algorithm 1, the calculation method is

$$S^{MLNP} = S^{MAARA} \odot S^{PA} + \alpha A^{3}, \qquad S^{PA}_{xy} = k_x \times k_y,$$

where $A$ is the adjacency matrix, $\alpha$ is the attenuation parameter, and $k_x$ and $k_y$ denote the degrees of the node pair. More importantly, $\Gamma(\cdot)$ denotes the set of neighbor nodes, so that $\Gamma(x) \cap \Gamma(y)$ is the set of common neighbors. By utilizing the MAARA matrix based on the AA index and RA [25] index, we highlight the importance of highly influential nodes. Specifically, the algorithm enhances the contribution to similarity of nodes with higher degree centrality and weakens the contribution of nodes with lower degree centrality. We distinguish common neighbors with different degree centralities to reflect the correlations between node pairs more accurately. The node similarity is calculated as follows:

According to the theory of preferential attachment [12], the probability of a potential link between the central node and other neighbor nodes is directly proportional to the degree centrality of the central node. Furthermore, the likelihood of a link connecting the nodes $v_x$ and $v_y$ is also directly proportional to $k_x \times k_y$. To summarize, the Hadamard product of the reconstructed MAARA matrix and the PA matrix compresses the local neighborhood information, so that we can take into account the properties of the nodes themselves as well as the number and influence of their common neighbors.
The above method structurally optimizes the common neighbor index and explains its superiority at the theoretical level. Following the idea of the global path, the weight of a high-order path decays as the number of intermediate nodes in the path increases. The number of second-order paths between two nodes equals their number of common neighbors, which has already been accounted for, so the third-order path carries the highest weight among the remaining paths. Hence, our work introduces the third-order path factor, combined with the above matrices, into the final similarity matrix so as to produce a substantial boost in prediction accuracy. The basic algorithm procedure is shown in Algorithm 1.
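The combination step of Algorithm 1 can be sketched as below. The definition of the MAARA matrix is not reproduced in this sketch, so the local common-neighbor matrix is passed in by the caller; in the usage example the plain CN matrix `A @ A` is used purely as a stand-in and is not the authors' MAARA weighting. The function name `mlnp_similarity` is hypothetical.

```python
import numpy as np

def mlnp_similarity(A, S_local, alpha=0.01):
    """Combine a local common-neighbor matrix, the PA matrix and third-order paths.

    S_local is supplied by the caller (ASSUMPTION: any common-neighbor style
    matrix; the paper uses its MAARA matrix here).
    """
    A = np.asarray(A, dtype=float)
    S_local = np.asarray(S_local, dtype=float)
    k = A.sum(axis=1)                     # node degrees
    S_pa = np.outer(k, k)                 # preferential attachment: k_x * k_y
    third_order = A @ A @ A               # number of length-3 paths per node pair
    return S_local * S_pa + alpha * third_order   # '*' is the Hadamard product

# Usage on the path graph 0 - 1 - 2 - 3, with CN as a stand-in for MAARA.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
S = mlnp_similarity(A, S_local=A @ A, alpha=0.01)
```

In this toy graph, nodes 0 and 3 share no neighbor, so their score comes only from the single third-order path between them, scaled by `alpha`.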

Multitask Graph Autoencoder
4.1. High-Order Loss Function. The deficiency in the second-order proximity of the SDNE model is explained as follows: when a penalty coefficient β is imposed on nonzero elements, the only criterion for measuring similarity is whether an edge exists between a pair of nodes. In fact, the properties of common neighbors, the length of paths, and even the attenuation parameters all affect the computed similarities. The adjacency matrix only describes the observed condition, while the similarity matrix reveals the hidden structural similarity of the network. For instance, two individuals who have more common friends are more likely to establish a friendship, even though they were not acquainted with each other before. In network topology, we can directly observe the explicit links but may ignore the potential links. Thus, relying on the adjacency matrix alone is one-sided, and seeking the potential links inferred by the algorithm is the key to improving the capability of our model. The high-order proximity and second-order proximity are complementary in that they penalize matrix elements according to the hidden similarity and the explicit similarity of the network structure, respectively. By using the backpropagation algorithm, we iteratively minimize the introduced high-order loss function. In detail, the reconstructed high-order loss function is defined as follows: where M is the similarity matrix and γ is the adjustment parameter. Parameter γ directly controls the fluctuation range of similarity and constrains the reconstruction weight. We believe that γ should be in the same range as β (1 to 20); otherwise, the different loss functions will be extremely imbalanced. Our model has the advantage of addressing the tasks of link prediction and semisupervised node classification at the same time. Specifically, we borrow the idea of link prediction, taking the output similarity matrix as an intermediate product, and then we input the processed vector matrix into a stacked autoencoder.
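To make the role of the similarity matrix concrete, here is a hedged sketch of a similarity-weighted reconstruction loss and of the combined objective from step 9 of Algorithm 2. The exact form of the high-order loss is an assumption made for illustration: the reconstruction error is weighted element-wise by `gamma * M`, by analogy with the penalty matrix of the second-order loss. The names `high_order_loss` and `mixed_loss` are hypothetical.

```python
import numpy as np

def high_order_loss(X, X_hat, M, gamma):
    """ASSUMED form: reconstruction error weighted element-wise by gamma * M."""
    return np.sum(((X_hat - X) * (gamma * M)) ** 2)

def mixed_loss(loss_1st, loss_2nd, loss_high, loss_reg, alpha, nu):
    """Step 9 of Algorithm 2: alpha*Loss_1st + Loss_2nd + Loss_high-order + nu*Loss_reg."""
    return alpha * loss_1st + loss_2nd + loss_high + nu * loss_reg

# Usage with a toy two-node graph and a toy similarity matrix.
X = np.array([[0.0, 1.0],
              [1.0, 0.0]])                # adjacency matrix used as input
X_hat = np.full((2, 2), 0.5)              # a poor reconstruction
M = np.array([[0.0, 2.0],
              [2.0, 0.0]])                # similarity scores, e.g. from a link predictor
loss_high = high_order_loss(X, X_hat, M, gamma=1.0)
total = mixed_loss(1.0, 2.0, loss_high, 4.0, alpha=0.5, nu=0.25)
```

Under this assumed form, node pairs with larger similarity scores are reconstructed more strictly, which is one way to read the statement that γ constrains the reconstruction weight.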

Optimization of Autoencoder.
In our experiment, we use the Keras [26] module to implement two encoder layers and two decoder layers on the CPU-enabled TensorFlow [27] backend. The layer dimensionality of our model architecture is fixed at N-256-128-256-N. Because we abandon the deep belief network [28] structure for parameter pretraining, the SGD optimizer and sigmoid activation function applied by the original SDNE algorithm may cause training to stall. Instead, our architecture applies the Adam [29] algorithm with a fixed learning rate and the ReLU [30] activation function for optimization.
The Adam optimizer has the characteristics of inertia retention (momentum) and environmental perception (per-parameter adaptation). Each update direction is a linear weighting of the current gradient and the direction used in the previous round of gradient descent. The advantage of its adaptive learning rate lies in handling the network sparsity problem effectively. Compared with the SGD optimizer, which easily converges to a local optimum or becomes trapped at a saddle point, the Adam optimizer is recognized for accelerating convergence and maintaining convergence stability. However, the adaptive learning rate algorithm of the Adam optimizer performs worse in the fields of object recognition and syntax component analysis. In a deep neural network (DNN), the gradient of the sigmoid activation function is very small at positions far from 0. During the backpropagation phase, information loss caused by vanishing gradients may occur, and computing the partial derivative, which involves division, may increase the time complexity of the algorithm. The ReLU activation function, however, can effectively alleviate this vanishing gradient issue and improves computational efficiency. To summarize, the complete algorithm is shown in Algorithm 2.
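The update rule and the gradient behavior discussed above can be sketched in a few lines. This is the standard Adam update with its commonly used default hyperparameters, applied to a toy quadratic, not the training code of the paper; `adam_step`, `sigmoid_grad`, and `relu_grad` are illustrative names.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad         # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)                       # at most 0.25, tiny far from 0

def relu_grad(z):
    return (np.asarray(z) > 0).astype(float)   # exactly 1 for every positive input

# Usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.1)
```

The two gradient helpers show the contrast drawn in the text: the sigmoid gradient shrinks quickly away from 0, while the ReLU gradient stays at 1 on the positive side.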

Datasets and Evaluation Metrics.
For link prediction, we evaluate our MLNP algorithm on five classical graph-structured datasets. The basic information of the datasets is as follows.
NS [31] is a collaboration network of scientists who have published papers on the topic of complex networks. A link is present if two scientists have coauthored a paper. PB [32] is an American political blog network that documents the links between blogs extracted from websites. PPI [33] is a protein-protein interaction network. The nodes denote proteins, and the links indicate interactions between pairs of proteins. USAir [34] is an aviation network of the United States in which each node corresponds to an airport. A direct air route between two airports means that there is a connection between the corresponding nodes. Router is a router [35] network on the Internet, where nodes denote routers and edges directly connect two routers that exchange packets through optical fiber or other means.
Our work adopts the most widely used AUC score to evaluate our proposed MLNP algorithm on link prediction tasks. It can be interpreted as the likelihood that a randomly selected test link scores higher than a randomly selected nonexistent link. In contrast to the precision@k evaluation indicator, the AUC score measures the prediction accuracy overall. It is defined as follows:

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}.$$

Among $n$ independent comparisons, $n'$ denotes the number of times the missing link scores higher than the nonexistent link, while $n''$ denotes the number of times the two have the same score.
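The sampling procedure behind this score can be sketched as follows, assuming the standard definition AUC = (n' + 0.5 n'') / n and a score lookup keyed by node pair. The function name `auc_score` and its arguments are hypothetical.

```python
import random

def auc_score(scores, probe_edges, nonexistent_edges, n=10000, seed=0):
    """Estimate AUC from n random (missing link, nonexistent link) comparisons."""
    rng = random.Random(seed)
    n_higher = 0      # n':  the missing link scores higher
    n_equal = 0       # n'': both links have the same score
    for _ in range(n):
        s_missing = scores[rng.choice(probe_edges)]
        s_absent = scores[rng.choice(nonexistent_edges)]
        if s_missing > s_absent:
            n_higher += 1
        elif s_missing == s_absent:
            n_equal += 1
    return (n_higher + 0.5 * n_equal) / n

# Usage: probe links that always outscore nonexistent links give an AUC of 1.
scores = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.1, (1, 3): 0.2}
auc = auc_score(scores, probe_edges=[(0, 1), (2, 3)],
                nonexistent_edges=[(0, 2), (1, 3)])
```

A score of 0.5 corresponds to random guessing, so values well above 0.5 indicate a useful index.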

Result Analysis.
To guarantee a more fine-grained comparison, we empirically choose 90% of the links at random as the training set, and the remaining 10% of the links constitute the probe set for prediction. We summarize the link prediction results for the five datasets in Figure 3.
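The random split described above can be sketched as a simple shuffle and cut. This is an illustration only; the helper name `split_edges` is hypothetical, and the sketch does not check whether the training graph stays connected.

```python
import random

def split_edges(edges, train_fraction=0.9, seed=0):
    """Shuffle the edge list and split it into a training set and a probe set."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Usage: 20 edges of a path graph yield 18 training links and 2 probe links.
edges = [(i, i + 1) for i in range(20)]
train, probe = split_edges(edges)
```

The similarity matrix is then computed from the training links only, and the probe links are used solely for scoring.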
In comparison with other strong baselines [36], the experimental results show that the MLNP algorithm achieves the best AUC performance on three datasets {NS, PB, PPI}, where it is 1.24%, 0.33%, and 0.3% higher than the best baseline, respectively. Although the prediction accuracy of our method is 0.24% lower than that of the RA index on the USAir dataset and 2.78% lower than that of the Katz index on the Router dataset, it remains competitive with the rest of the similarity indexes.
On small-scale datasets, we observe that our method outperforms all other baselines, even exceeding the Katz index based on the global path. Notably, the prediction accuracy reaches 99.7% on the NS dataset. However, the algorithm achieves a lower AUC than the RA index and the Katz index on large-scale datasets. The possible reasons are twofold. Firstly, as the diameter and average path length of the network increase, capturing only local information, as the MLNP algorithm does, is far from enough. Secondly, the Katz index adequately preserves the global structure by traversing the network. The experiments reveal that the MLNP algorithm is quite effective at optimizing the original similarity index. We attribute the efficacy of our approach to the integration of multiple methods.

Node Classification Experiments
6.1. Datasets and Evaluation Metrics. We select two air transportation networks, Europe-flight and Brazil-flight, to assess the quality of the representations. Specifically, each dataset comprises nodes, links, and node labels: the Europe-flight network has 399 nodes and 5995 links, and the Brazil-flight network has 131 nodes and 1074 links. Both datasets divide node labels into four categories, and the detailed statistics of the network attributes are listed in Table 1.
To ensure that the adopted similarity theory can traverse the network locally and globally, we calculate the degree distribution of nodes as well. According to the simulation results, although the network has fewer nodes, its structure information is, on the contrary, more intact owing to the relatively high link density and average node degree. The exact degree distributions of the datasets are shown in Figure 4.
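Empirically, we employ the popular F1-measure indicator [37] to evaluate the quality of the graph embedding representations, and the calculation method is defined as follows:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$

Micro-F1 computes this value from counts pooled over all classes, whereas Macro-F1 averages the per-class F1 values.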
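For single-label multiclass predictions, the two averaged variants of the F1-measure can be computed from per-class counts. The sketch below uses the equivalent form F1 = 2TP / (2TP + FP + FN); the function name `f1_scores` is hypothetical.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1        # predicted class p gains a false positive
            fn[t] += 1        # true class t gains a false negative

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

# Usage: three classes with one misclassified node.
micro, macro = f1_scores([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

In this single-label setting Micro-F1 coincides with accuracy, while Macro-F1 gives every class equal weight regardless of its size.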

Loss Convergence Comparison of Optimizers.
To check the loss convergence of the Adam optimizer and SGD optimizer, we apply the control variable method to perform 100 iterative training epochs on the premise of consistent model parameters. The simulation experiments of loss convergence are shown in Figure 5.
In this experiment, the results show that the architecture combined with the Adam optimizer converges more quickly and more stably. Under the same conditions, the Adam optimizer performs better than the SGD optimizer.
Table 2. Specifically, the weight parameters of first-order and second-order proximity should remain strictly constant. To counter overfitting, the autoencoder applies regularization to limit the weights in the fully connected neural network.
The feature learning of network structure is insufficient when we train on fewer nodes. Considering the chance results that may appear, we do not sample 10% or 20% of the observed links in the networks for training. Instead, as the training percentage increases from 30% to 90%, we calculate the mean value of 10 experiments at each setting to compare the performances between 2.42% and 2.25%, respectively, and enhances the average Macro-F1 by 2.54% and 2.21%, respectively. When the training percentage reaches 90%, the algorithm learns the network representations most completely, and the improvement in node classification accuracy peaks at 5%-6%. It can be seen that, whatever proportion of the data is used for training, both the Micro-F1 and Macro-F1 of our algorithm are generally higher than those of the related algorithms. We find that our algorithm

Horizontal Contrast of Loss Function Framework.
In order to verify the universal validity of the high-order loss function framework, we separately adopt the same processing method as the MLNP index for the CN index, RA index, and Katz index. In two different networks, the horizontal contrasts of our experiments are shown in Figures 9 and 10.
The results reveal that no matter which similarity index we introduce into the framework of the high-order loss function, the MTGAE model is superior to the SDNE model except for a couple of special cases on the two datasets. Only when we randomly sample 80% of the links in the Europe-flight network and 50% of the links in the Brazil-flight network for training does the SDNE model perform slightly better than one or two of the other models. The underlying cause is the particularity of the datasets. Moreover, when we convert the MLNP index and the Katz index to high-order loss functions, the improvement margin of node classification is more apparent. The detailed results are shown in Table 3. We choose the MTGAE model with the best prediction accuracy to display the specific improvement margin compared with the SDNE model. Hence, the experimental results demonstrate that the introduced framework of the high-order loss function is generally effective in boosting the accuracy of node classification.

Conclusions
In this paper, we put forward the MLNP similarity algorithm, which integrates multiple similarity theories. In addition, we establish the MTGAE architecture, which introduces the high-order loss function into an optimized autoencoder by preprocessing the similarity index. The main innovation of the MTGAE model is that it applies link prediction methods to the field of node classification. Specifically, the MLNP index of link prediction is used as an intermediate product to construct the high-order loss function. The above algorithms perform well in both link prediction and node classification. Furthermore, our work applies different similarity matrices as high-order loss functions to verify the universal validity of the framework. The results demonstrate that our framework of the high-order loss function adapts to the majority of popular similarity indexes.
With the continuous development of deep learning, numerous deep models that use side information of nodes and edges keep emerging. However, static models can no longer satisfy the needs of a broad range of practical applications, and researchers have gradually turned their attention to dynamic graph embedding models. Although some algorithms have been put forward to address dynamic networks, efficient methods for handling multidimensional features are still lacking. The dynamic network is increasingly becoming a significant research object. Embedding the features of nodes and edges into the autoencoder architecture and building dynamic evolution models are becoming significant research directions for extending graph embedding technologies. In the future, models that address network representation learning problems have broad application prospects in areas such as recommender systems [38] and mobile computing [39].

Data Availability
The data used to support the findings of this study are publicly available.

Conflicts of Interest
The authors declare that they have no conflicts of interest.