Deep Learning with Graph Convolutional Networks: An Overview and Latest Applications in Computational Intelligence

Many complex communication network problems can be abstracted as graph-based optimization problems and solved by GCN, thus overcoming the limitations of traditional methods. This survey briefly describes the definition of graph-based machine learning, introduces different types of graph networks, summarizes the application of GCN in various research fields, analyzes the research status, and gives the future research direction.


Introduction
AI has drawn the attention of the whole industry in recent years as a frontier field of scientific research and has progressively become a new engine for social and economic development [1]. AI has been widely practiced and implemented in NLP, computer vision, intelligent robots, data mining, cognition and reasoning, and other areas. Today's network data traffic is increasing dramatically due to the rapid expansion of smart devices such as smartphones, smart automobiles, and smart homes. Simultaneously, technologies such as edge computing, virtualization, and network slicing broaden network services and improve user experience, but they also create a more complicated network environment. The efficient management of a large number of intelligent devices and the optimization of resource allocation in large-scale and complex network environments have emerged as critical challenges for future network growth. AI, as the future network's brain, is employed for network optimization and decision-making. At the same time, increasing the computational capability of network nodes adds bones and muscles to the network, allowing intelligent network calculations to be performed [2]. In addition, the image processing field is advancing rapidly because of the different high-dimensional and complex images generated from different data sources. Deep learning applications have attracted great attention in the field of image processing due to their very high prediction accuracy in recognition applications, which is bound to improve the performance of existing image processing systems and open up new application fields [3][4][5].
The coordinated development of algorithms and computing power will enable future applications to enter a new era of intelligence. Graph data is a typical kind of non-Euclidean data with complex correlations and interobject dependencies [6]. Traditional graph theory methods are difficult to adapt to the complex graph problems of future networks. Therefore, finding algorithms that handle complex graph data to guide the resource allocation, management, and scheduling of communication networks has become an important scientific problem in future networks. As an emerging technology in the field of AI in recent years, GNN has opened up a new space for processing complex graph-structured data. With the help of artificial intelligence technologies such as deep learning and reinforcement learning, GCN can quickly mine topological information and complex features in graph structures and has solved many major problems in the fields of computer vision, recommendation systems, and knowledge graphs [7,8]. Therefore, combining GCN with the latest advances is an important way to solve real-world problems efficiently and effectively.
1.1. Significance of this survey. CNN has grown dramatically in recent years, attracting worldwide attention due to its remarkable modeling ability. In comparison to older methods, the introduction of CNN has resulted in significant advances in image processing and natural language processing, in tasks such as machine translation, image recognition, and speech recognition [9]. Traditional convolutional neural networks, on the other hand, can only deal with data in Euclidean space (such as images, text, and speech), and the data in these domains is translation invariant. Because of translation invariance, we can build a globally shared convolution kernel in the input data space, which allows us to define a convolutional neural network. Using image data as an example, an image can be represented as a set of regularly distributed pixels in Euclidean space, and translation invariance means that a local structure of the same size can be obtained with any pixel as the center [10]. Based on this, the CNN models local connections by learning convolution kernels shared across all pixels and thereby develops meaningful hidden layer representations for images. Figure 1 depicts the distinction between non-Euclidean and Euclidean space.
Although traditional CNNs bring improvements in the text and image domains, they can only handle Euclidean data. At the same time, graph data, a kind of non-Euclidean data, has gradually attracted attention due to its ubiquity. Graph data can naturally express real-life data structures such as transportation networks, the World Wide Web, and social networks. Unlike image and text data, the local structure of each node in graph data is different, so translation invariance is no longer satisfied [11]. The lack of translation invariance makes it challenging to define a CNN on graph data. In recent years, due to the ubiquity of graph data, researchers have begun to focus on how to construct deep-learning models on graphs. By combining the CNN's ability to model local structures with the ubiquitous node dependencies on graphs, the GCN has become the most active and important algorithm. Recently, some articles have emerged that explore and summarize deep learning on graphs, but for the most important branch, GCN, an in-depth discussion and summary of its modeling methods and applications is still needed. In this article, we therefore organize and summarize the development history and future trends of GCN, together with applications developed recently in all fields of science. The challenges faced in the construction of GCN mainly come from the following aspects:
(i) Graph data are non-Euclidean data: Graph data does not satisfy translation invariance, which means that each node has a different local structure. The basic operators of traditional convolutional neural networks, convolution and pooling, rely on translation invariance. Defining convolution and pooling operators on graph data is therefore a difficult task.
(ii) Different characteristics of graph data: Graph data can be used to represent a wide range of real-world applications, such as social networks, citation networks, and political relationship networks, each of which has its own characteristics. For instance, some graphs carry signs or other indicators on their links to express positive and negative tendencies. GCNs are more difficult to design because they have to model this wide range of graph characteristics.
(iii) Graph data with large scale: In practical applications, graphs such as user-commodity networks and the user networks of social platforms can contain millions or even tens of millions of nodes. Building a graph convolutional neural network on such large-scale graphs within acceptable time and space constraints is a major challenge.
The current survey's primary focus is GCN, including its variants and the most recent GCN trends. We specifically cover the most recent works that use the GCN in various fields of science. We concentrate on works published between the years 2000 and 2022. We used the PRISMA (preferred reporting items for systematic reviews and meta-analyses) framework guidelines to select GCN-related publications. Papers were found through a variety of publishers and libraries, including Springer, IEEE, MDPI, Hindawi, Wiley, Elsevier, and the ACM library. Articles were searched with the following terms: "graph convolutional networks," "graph networks," "GCN," "Graph attention network," "Attention-based Graph," "GAT," "GATnet," and "Graph Query." Only articles in the English language within the selected time period were searched. The main contributions of this survey are three-fold as follows:
(1) We provide a thorough analysis of GCN deep learning techniques, including variants and advancements in GCN, applications, current trends in various fields of study, performance measures, and so on.
(2) A hierarchical and structural review of recent improvements in deep learning-based GCN techniques is offered, and the benefits and limitations of each component for an effective GCN solution are examined.
(3) To offer informed advice to the general public, we discuss the obstacles and unresolved concerns, as well as new trends and future directions.
Figure 2 gives the complete structure of our survey. This paper first introduces the basic model of GNN and several important graph neural networks; second, it introduces the specific application methods of GNN in various fields of research such as NLP and computer vision; in the conclusion, it discusses the current research status and gives the future research direction.

Graph Neural Network (GNN) and Its Variant
GNN was first proposed by Gori et al. [12], and Scarselli et al. [13] elaborated on this model in more detail. The GNN proposed by Gori et al. [12] draws on research results in the field of neural networks and can directly process graph-structured data; its core is the local transfer function and the local output function. The local transfer function generates the state vector of a node, which contains the neighborhood information of the node. The transfer function $f$ is shared among all nodes and updates the state vector $h_v$ of node $v$ according to its input neighborhood:

$$h_v = f\left(x_v,\, x^{e}_{vu},\, h_u,\, x_u\right), \quad u \in N(v),$$

where $x_v$ is the feature of node $v$, $x^{e}_{vu}$ is the feature of the edge connecting node $v$ and its neighbor node $u$, $h_u$ is the state of the neighbor node $u$, $x_u$ is the feature of the neighbor node $u$, and $N(v)$ is the set of neighbors of $v$. The local output function $g$ generates a new representation of the node:

$$o_v = g\left(h_v,\, x_v\right).$$

Stacking the local transfer function and the local output function over all nodes constitutes a GNN model, which eventually reaches a stable state through iteration. The early graph neural network has great limitations: its efficiency is low, its computational cost is high, and it is difficult for the node features to affect the state after multiple updates. In recent years, in order to process graph-structured data more efficiently, new graph neural networks and application studies have been proposed one after another.

Graph Convolutional Networks (GCNs).
GCN introduces the convolution operation into graph structures and is one of the most important GNNs at present. According to the feature extraction method, GCNs can be divided into spectral-domain GCNs and spatial-domain GCNs. Spectral-domain graph convolutional networks are derived from graph signal processing: a filter is introduced to define graph convolution, which can be understood as removing noise from the input signal through a filter to obtain its classification result.
Based on spectral graph theory, Bruna et al. first proposed a convolution layer function to define the spectral-domain GCN [14]. Kipf and Welling [15] first proposed the concept of semisupervised GCN based on the spectral domain (structure shown in Figure 3). The spectral-domain graph convolution is defined as the product of the signal and the filter function:

$$g_\theta \star x = U g_\theta U^{T} x,$$

where $g_\theta$ is the filter function, $x$ is the signal on the nodes of the graph, and $U$ is the matrix of eigenvectors of the normalized Laplacian matrix of the graph. $g_\theta$ can be understood as a function of the eigenvalues of the graph Laplacian matrix, namely, $g_\theta(\Lambda)$, where $\Lambda$ is the diagonal matrix composed of the eigenvalues of the graph Laplacian matrix and $\theta$ is the function parameter. In order to reduce the computational complexity, $g_\theta(\Lambda)$ can be approximated by a truncated Chebyshev expansion:

$$g_{\theta'} \star x \approx \sum_{k=0}^{K} \theta'_k\, T_k(\tilde{L})\, x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_N, \qquad L = I_N - D^{-1/2} A D^{-1/2},$$

where $T_k$ is the $k$-th order Chebyshev polynomial, $\theta'$ is the vector of Chebyshev coefficients, $L$ is the graph Laplacian matrix, $\lambda_{\max}$ is the largest eigenvalue of $L$, $I_N$ is the identity matrix, $D$ is the diagonal degree matrix, and $A$ is the adjacency matrix. When limiting $K = 1$, taking $\lambda_{\max} \approx 2$, and sharing a single parameter $\theta = \theta'_0 = -\theta'_1$, the convolutional layer can be simplified to

$$g_\theta \star x \approx \theta \left(I_N + D^{-1/2} A D^{-1/2}\right) x.$$

Then, the convolutional layer formula of the graph convolutional network is as follows:

$$H^{(l+1)} = \sigma\!\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right),$$

where $\tilde{A} = A + I_N$ is the adjacency matrix with added self-loops, $\tilde{D}$ is the degree matrix of $\tilde{A}$, $H^{(l)}$ is the matrix of node representations at layer $l$, $\sigma(\cdot)$ is the nonlinear activation function, and $W^{(l)}$ is the weight matrix of the $l$-th graph convolution layer.
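The layer-wise propagation rule can be illustrated with a small pure-Python sketch. It is only a toy example under stated assumptions: the helper names (`matmul`, `normalized_adjacency`, `gcn_layer`), the three-node path graph, the feature matrix, the weight matrix, and the choice of ReLU as $\sigma$ are all hypothetical and are not taken from any cited implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalized_adjacency(adj):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for an undirected adjacency matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # self-loops guarantee deg >= 1
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_hat H W), with ReLU as an assumed sigma."""
    a_hat = normalized_adjacency(adj)
    z = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, v) for v in row] for row in z]

# Toy path graph 0 - 1 - 2 with two input features and one output feature.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
H0 = [[1.0, 0.0],
      [0.0, 1.0],
      [1.0, 1.0]]
W0 = [[1.0],
      [-1.0]]          # hypothetical weights, fixed by hand instead of learned
H1 = gcn_layer(A, H0, W0)
```

Each row of `H1` mixes a node's own features with those of its neighbors, weighted by the symmetric degree normalization; in a trained network `W0` would be learned by gradient descent rather than set by hand.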
After the concept of GCN was proposed, new spectral-domain graph convolutional network models were proposed, such as AGCN [16] and AGC [17]. However, the GCN based on the spectral domain cannot handle directed graphs and has poor scalability, while the GCN based on the spatial domain is more flexible and general. Spatial domain-based graph convolutional networks define graph convolutions according to the spatial relationships of nodes. NN4G [17] is the earliest proposed GCN based on the spatial domain; it realizes graph convolution by directly accumulating the feature information of a node's neighborhood. The message passing neural network (MPNN) proposed by Gilmer et al. [18] can be regarded as a general framework for GCN based on the spatial domain. MPNN decomposes the spatial-domain convolution into two processes, information transfer and state update, and it takes the feature of node $v$ as the initial hidden state, namely,

$$h_v^{(0)} = x_v,$$

where $x_v$ is the feature of node $v$. The hidden state update formula of MPNN is as follows:

$$h_v^{(l)} = U_l\!\left(h_v^{(l-1)},\ \sum_{u \in N(v)} M_l\!\left(h_v^{(l-1)},\, h_u^{(l-1)},\, x^{e}_{vu}\right)\right),$$

where $l$ is the layer index, $U_l(\cdot)$ is the update function, $M_l(\cdot)$ is the information transfer function, $N(v)$ is the set of neighbors of $v$, and $x^{e}_{vu}$ is the feature of the edge between $v$ and $u$. After obtaining the hidden representation of all nodes in the graph, the representation of the entire graph can be generated by the readout function

$$h_G = R\!\left(\left\{ h_v^{(L)} \mid v \in G \right\}\right),$$

where $R(\cdot)$ is the readout function and $L$ is the number of layers. By defining different forms of the update function, information transfer function, and readout function, MPNN can represent a variety of spatial domain-based graph convolutional networks. Typical spatial domain-based graph convolutional networks also include PATCHY-SAN [19], GraphSage [20], and Diffusion CNN [21].
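The two-phase structure of MPNN (information transfer, then state update, followed by a readout) can be sketched as follows. This is a minimal illustration rather than the formulation of Gilmer et al.: the function names, the identity-style message function, the averaging update, the sum readout, and the omission of edge features are all assumptions made to keep the example short.

```python
def mpnn_layer(neighbors, hidden, message_fn, update_fn):
    """One round: sum messages from neighbors, then update every node state."""
    new_hidden = {}
    for v, h_v in hidden.items():
        aggregated = [0.0] * len(h_v)
        for u in neighbors[v]:
            msg = message_fn(h_v, hidden[u])
            aggregated = [a + m for a, m in zip(aggregated, msg)]
        new_hidden[v] = update_fn(h_v, aggregated)
    return new_hidden

def readout(hidden):
    """Sum-pool all node states into a single graph-level vector."""
    dim = len(next(iter(hidden.values())))
    return [sum(h[d] for h in hidden.values()) for d in range(dim)]

def message(h_v, h_u):
    """Hypothetical message function M: pass the neighbor state unchanged."""
    return list(h_u)

def update(h_v, m_v):
    """Hypothetical update function U: equal mix of old state and message."""
    return [0.5 * a + 0.5 * b for a, b in zip(h_v, m_v)]

# Path graph 0 - 1 - 2 with one scalar feature per node.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0], 1: [2.0], 2: [3.0]}

h = dict(features)                 # initial hidden state h^(0) = x
for _ in range(2):                 # two message passing layers
    h = mpnn_layer(neighbors, h, message, update)
graph_vector = readout(h)
```

Swapping in different `message`, `update`, and `readout` functions changes the model without touching `mpnn_layer`, which is the sense in which MPNN acts as a general framework.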

[Figure labels: communication and networking; hyperspectral image processing and prediction; computer vision; transport and traffic system; bioinformatics.]

[Figure labels: input layer; hidden layer; output layer.]

Graph Attention Network (GAT).
GAT introduces an attention mechanism on the basis of GCN, which enables the model to focus on the information most relevant to the current task and thereby improves its performance. In spectral domain-based GCN, the filter function depends on the Laplacian matrix, which is derived from the graph structure; as a result, a model trained on a specific graph cannot be directly applied to other graph structures.
To solve this problem, Veličković et al. [22] proposed a new type of graph neural network structure, namely, GAT; Figure 4 shows the attention mechanism of GAT. GAT computes a weighted average of the neighborhood features of each node in the graph, with weights that reflect the importance of each neighbor. The graph attention layer is the key structure with which GAT realizes the attention mechanism. The graph attention layer takes the features of the nodes in the graph as input and outputs another set of higher-level node features that may have a different cardinality. The graph attention layer converts input to output through the attention coefficients obtained by the attention mechanism $a$. The attention coefficient represents the importance of node $j$ to node $i$:

$$e_{ij} = a\!\left(W x_i,\, W x_j\right),$$

where $W$ is the weight matrix applied to all nodes, representing the relationship between input features and output features, and $x_i$ and $x_j$ are the features of node $i$ and node $j$, respectively. The model introduces the attention mechanism into the graph structure by computing attention coefficients only between a node and its neighbors, without requiring the structural information of the whole graph. To simplify operations and facilitate comparison, the attention coefficients are normalized over the neighborhood $N_i$ of node $i$ and used to generate the output features as follows:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})}, \qquad x'_i = \sigma\!\left(\sum_{j \in N_i} \alpha_{ij} W x_j\right),$$

where $\sigma(\cdot)$ is the nonlinear activation function and $\alpha_{ij}$ is the normalized attention coefficient. GAT also introduces a multihead attention mechanism similar to that of the transformer architecture, which can be computed in parallel over adjacent node pairs and stabilizes the learning process. The complexity of the GAT method is low; it only attends to adjacent nodes without needing information about the whole graph, and the model does not need to be retrained when it is applied to a new graph structure.
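A single-head graph attention layer can be sketched in a few lines of pure Python. The sketch assumes the commonly used single-layer attention mechanism, in which $a$ is a vector applied to the concatenation of $W x_i$ and $W x_j$ followed by a LeakyReLU; it also assumes that every node attends to itself and that $\sigma$ is `tanh`. The function names, graph, features, `W`, and `a` are hypothetical values chosen for illustration.

```python
import math

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in w]

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def gat_layer(neighbors, feats, w, a):
    """Single-head attention layer; returns (output features, attention weights)."""
    wx = {i: matvec(w, x) for i, x in feats.items()}
    outputs, attention = {}, {}
    for i in feats:
        nbrs = [i] + list(neighbors[i])   # assumption: a node also attends to itself
        scores = [leaky_relu(sum(ak * ck for ak, ck in zip(a, wx[i] + wx[j])))
                  for j in nbrs]
        top = max(scores)                 # subtract the max for a stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(nbrs, alphas))
        mixed = [sum(al * wx[j][d] for al, j in zip(alphas, nbrs))
                 for d in range(len(wx[i]))]
        outputs[i] = [math.tanh(v) for v in mixed]   # sigma assumed to be tanh
    return outputs, attention

neighbors = {0: [1, 2], 1: [0], 2: [0]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical weight matrix (identity for readability)
a = [0.0, 0.0, 1.0, -1.0]         # hypothetical attention vector
outputs, attention = gat_layer(neighbors, feats, W, a)
```

Because the coefficients depend only on the features of a node and its neighbors, the same `W` and `a` can be reused on a different graph, which is the property that distinguishes GAT from Laplacian-dependent spectral filters.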
For complex graph structures, some studies have proposed new graph attention networks, such as heterogeneous GAT [23], multirelational GAT [24], and spectral GAT [25]. These models can be used in more complex and informative networks to achieve better results.

Graph Autoencoder.
GAE is an unsupervised learning framework that can convert graph structures into low-dimensional vectors and reconstruct graph structures from the encoded information; it is often used for graph embedding (GE) and graph structure generation [26]. Graph embedding is a graph representation learning (GRL) method that aims to map graph-structured data into low-dimensional dense vectors while preserving node information. Graph embedding enables graph-structured data to be applied more efficiently to traditional machine learning algorithms to achieve better results in tasks such as recommendation and classification. Typical methods include random walk-based graph embeddings, such as DeepWalk [27] and Node2Vec [28] (Figure 5 shows the difference between DeepWalk and Node2Vec), and graph embeddings based on matrix decomposition, such as singular value decomposition (SVD), locally linear embedding (LLE), and non-negative matrix factorization (NMF). Compared with graph embeddings based on random walks and matrix factorization, graph autoencoders can be applied to highly nonlinear graph structures, preserving the nonlinear structure and complex features of graphs. In 2014, Tian et al. [29] applied autoencoders to graph data for the first time; they took the adjacency matrix of the graph or its variants as the original node features and generated the nonlinear embedding of the graph, that is, the low-dimensional node representation, by stacking sparse autoencoders (SAE). Structural deep network embedding (SDNE) [30] is an important graph autoencoder model that also adopts the stacked autoencoder structure. It preserves the local and global network structure of the graph through the first-order and second-order similarity between nodes, respectively, and generates graph embedding vectors with multilayer nonlinear functions.
The hidden layer expression of SDNE is as follows:

$$h_v^{(1)} = \sigma\!\left(W^{(1)} x_v + b^{(1)}\right), \qquad h_v^{(l)} = \sigma\!\left(W^{(l)} h_v^{(l-1)} + b^{(l)}\right), \quad l = 2, \dots, K,$$

where $x_v$ is the feature of node $v$, $W^{(l)}$ is the weight matrix of the $l$-th layer, $b^{(l)}$ is the bias of the $l$-th layer, and $K$ is the number of encoder layers. After the final hidden layer output is obtained, the output representation $x'_v$ can be obtained by inverting the calculation process of the encoder. SDNE contains two loss functions. The first loss function adopts the idea of Laplacian eigenmaps to preserve the first-order similarity:

$$L_{1st} = \sum_{i,j} s_{i,j} \left\| h_i^{(K)} - h_j^{(K)} \right\|_2^2,$$

where $s_{i,j}$ represents the connection relationship of the nodes in the graph, with $s_{i,j} > 0$ if and only if node $i$ is connected with node $j$. The second loss function is used to maintain the second-order similarity, and a penalty vector $b_i$ is introduced to impose a larger penalty on the reconstruction error of nonzero elements than on that of zero elements:

$$L_{2nd} = \sum_{i} \left\| \left(x'_i - x_i\right) \odot b_i \right\|_2^2,$$

where $\odot$ represents the Hadamard product. To maintain both first- and second-order similarity, the joint loss of SDNE is as follows:

$$L = L_{2nd} + \alpha L_{1st} + \nu L_{reg},$$

where $\alpha$ and $\nu$ are weighting coefficients and $L_{reg}$ is an L2-norm regularization term, which is used to prevent overfitting. Another type of graph autoencoder uses variational autoencoders (VAE) [31] to implement graph embedding; the VAE is an important generative model and can improve the generalization ability of the model. VGAE [32] applies the variational autoencoder to the graph structure, and its inference model, namely, the encoder, utilizes a 2-layer GCN structure:

$$q(Z \mid X, A) = \prod_{i} q(z_i \mid X, A), \qquad q(z_i \mid X, A) = \mathcal{N}\!\left(z_i \mid \mu_i,\ \mathrm{diag}(\sigma_i^2)\right),$$

$$\mu = \mathrm{GCN}_\mu(X, A), \qquad \log \sigma = \mathrm{GCN}_\sigma(X, A),$$

where $\mu$ is the mean matrix of the encoder, $\log \sigma$ is the matrix of log standard deviations, $X$ is the feature matrix, $A$ is the adjacency matrix, and $z_i$ is the random latent variable of node $i$. The generative model of VGAE, the decoder, is given by the inner product of the latent variables:

$$p(A \mid Z) = \prod_{i} \prod_{j} p\!\left(A_{ij} \mid z_i, z_j\right), \qquad p\!\left(A_{ij} = 1 \mid z_i, z_j\right) = \mathrm{sigmoid}\!\left(z_i^{T} z_j\right).$$
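The VGAE sampling and decoding steps can be illustrated with a short sketch. It assumes that the encoder outputs (the mean and log standard deviation of each node) are already available, so no GCN is run here; the function names and the numeric values of `mu` and `log_sigma` are hypothetical.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def sample_latent(mu, log_sigma, rng):
    """Reparameterization: z = mu + exp(log_sigma) * eps with eps ~ N(0, 1)."""
    return [[m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu_i, ls_i)]
            for mu_i, ls_i in zip(mu, log_sigma)]

def inner_product_decoder(z):
    """Edge probabilities p(A_ij = 1) = sigmoid(z_i . z_j)."""
    n = len(z)
    return [[sigmoid(sum(a * b for a, b in zip(z[i], z[j]))) for j in range(n)]
            for i in range(n)]

# Hypothetical encoder outputs for three nodes with 2-D latent variables.
mu = [[2.0, 0.0],
      [1.5, 0.5],
      [-2.0, 0.0]]
log_sigma = [[-20.0, -20.0]] * 3   # tiny variance, so samples stay close to mu

Z = sample_latent(mu, log_sigma, random.Random(0))
P = inner_product_decoder(Z)
```

Nodes 0 and 1 have similar latent vectors, so the decoder assigns their edge a high probability, while nodes 0 and 2 point in opposite directions and receive a low one.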

Other Graph Neural Networks.
In addition to graph convolutional networks and graph attention networks, commonly used graph neural networks also include gated graph neural networks (GGNNs) and spatial-temporal graph neural networks (STGNNs). The gated graph neural network is an improvement on the traditional graph neural network architecture. By introducing the gated recurrent unit (GRU) into the graph neural network, the performance of the model in the long-term propagation of information is improved. The gated graph sequence neural network proposed by Ruiz et al. [33] introduces the gated recurrent unit into the information propagation process, unrolls the iteration for a fixed number of steps, and no longer needs parameter constraints to ensure convergence. In addition to this model, gated graph neural network models also include GAAN [34] and others. The spatiotemporal graph [30] is a graph structure that depicts the interaction between entities in the spatial and temporal dimensions. It has three basic elements: nodes, spatiotemporal edges, and temporal edges. The feature matrix in the high-dimensional feature space changes with time. The spatiotemporal graph neural network can learn the hidden patterns in the spatiotemporal graph and obtain the feature information of the time domain and the spatial domain in the graph structure at the same time. Spatiotemporal graph neural networks can be divided into methods based on recurrent neural networks (RNNs) and methods based on convolutional neural networks (CNNs). RNN-based STGNNs capture spatiotemporal correlations through graph convolution. Compared with the RNN-based methods, CNN-based STGNNs, such as CGCN, process the spatiotemporal graph in a nonrecursive way, which allows parallel computation and avoids the problem of exploding or vanishing gradients.

Graph Convolutional Neural Network Spectral Method.
The lack of translation invariance on graphs makes it difficult to define convolutional neural networks in the node domain. The spectral method uses the convolution theorem to define graph convolution in the spectral domain. We first give some background on the convolution theorem.
(1) Graph signal processing. Convolution theorem: the Fourier transform of the convolution of two signals is equal to the product of their Fourier transforms [35]:

$$F(f * g) = F(f) \cdot F(g),$$

where $f$ and $g$ represent the two original signals, $F(f)$ represents the Fourier transform of $f$, $\cdot$ represents the product operator, and $*$ represents the convolution operator. Applying the inverse Fourier transform to both sides, we get

$$f * g = F^{-1}\!\left(F(f) \cdot F(g)\right),$$

where $F^{-1}(f)$ represents the inverse Fourier transform of the signal $f$. Using the convolution theorem, we can multiply the signals in the spectral space and then use the inverse Fourier transform to convert the result back to the original space. This realizes graph convolution while avoiding the difficulty of defining convolution directly on graph data, which does not satisfy translation invariance. The definition of the Fourier transform on the graph, which we give in the following, depends on the eigenvectors of the Laplacian matrix of the graph.
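Before moving to graphs, the convolution theorem can be checked numerically in the ordinary Euclidean setting. The sketch below assumes discrete, periodic 1-D signals, so the transform is the discrete Fourier transform and the convolution is circular; the function names and signal values are arbitrary illustrations.

```python
import cmath

def dft(signal):
    """Discrete Fourier transform of a finite 1-D signal."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolution(f, g):
    """Direct circular convolution (f * g)[t] = sum_s f[s] g[(t - s) mod n]."""
    n = len(f)
    return [sum(f[s] * g[(t - s) % n] for s in range(n)) for t in range(n)]

f = [1.0, 2.0, 0.0, -1.0]
g = [0.5, 0.0, 0.25, 0.0]

direct = circular_convolution(f, g)
# Multiply in the spectral domain, then transform back.
via_spectrum = [v.real for v in idft([a * b for a, b in zip(dft(f), dft(g))])]
```

Both routes give the same signal up to floating-point error. On a graph there is no shift operation with which to define `circular_convolution` directly, so only the spectral route remains available.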
Taking the eigenvectors as a set of bases in the spectral space, the Fourier transform of the signal $x$ on the graph is as follows:

$$\hat{x} = U^{T} x,$$

where $x$ refers to the original representation of the signal in the node domain, $\hat{x}$ refers to the representation of the signal in the spectral domain, and $U^{T}$ represents the transpose of the eigenvector matrix, which is used for the Fourier transform. The inverse Fourier transform of the signal $\hat{x}$ is as follows:

$$x = U \hat{x}.$$

Using the Fourier transform and inverse transform on the graph, we can implement the graph convolution operator based on the convolution theorem as follows:

$$x \star_G y = U\!\left(\left(U^{T} x\right) \odot \left(U^{T} y\right)\right),$$

where $\star_G$ represents the graph convolution operator, $x$ and $y$ represent signals in the node domain of the graph, and $\odot$ represents the Hadamard product, that is, the multiplication of the corresponding elements of two vectors. If we replace the vector $U^{T} y$ with a diagonal matrix $g_\theta$, the Hadamard product can be written as a matrix multiplication:

$$x \star_G y = U g_\theta U^{T} x.$$
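The graph Fourier transform and the resulting convolution operator can be tried on a tiny graph. The sketch below uses the 4-cycle, for which the Laplacian eigenvectors are known in closed form and are hard-coded; a practical implementation would obtain them from a numerical eigendecomposition. The unnormalized Laplacian $L = D - A$, the function names, and the test signal are assumptions of this example.

```python
import math

def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

# Unnormalized Laplacian L = D - A of the 4-cycle 0 - 1 - 2 - 3 - 0.
L = [[2.0, -1.0, 0.0, -1.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [-1.0, 0.0, -1.0, 2.0]]
r = 1.0 / math.sqrt(2.0)
# Columns of U are orthonormal eigenvectors of L (hard-coded for this graph).
U = [[0.5, r, 0.0, 0.5],
     [0.5, 0.0, r, -0.5],
     [0.5, -r, 0.0, 0.5],
     [0.5, 0.0, -r, -0.5]]
eigenvalues = [0.0, 2.0, 2.0, 4.0]
U_T = transpose(U)

def graph_fourier(x):
    """x_hat = U^T x."""
    return matvec(U_T, x)

def inverse_graph_fourier(x_hat):
    """x = U x_hat."""
    return matvec(U, x_hat)

def graph_convolution(x, y):
    """x *_G y = U((U^T x) (Hadamard) (U^T y))."""
    product = [a * b for a, b in zip(graph_fourier(x), graph_fourier(y))]
    return inverse_graph_fourier(product)

def spectral_filter(x, g):
    """Apply U diag(g(lambda_1), ..., g(lambda_n)) U^T to x."""
    x_hat = graph_fourier(x)
    return inverse_graph_fourier([g(lam) * c for lam, c in zip(eigenvalues, x_hat)])

x = [1.0, 0.0, 0.0, 0.0]
round_trip = inverse_graph_fourier(graph_fourier(x))
filtered = spectral_filter(x, lambda lam: lam)   # g(lambda) = lambda reproduces L x
```

Choosing the filter $g(\lambda) = \lambda$ recovers multiplication by the Laplacian itself, which gives a quick sanity check that the diagonal matrix $g_\theta$ acts on eigenvalues as described.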
(2) Graph convolutional neural network based on the convolution theorem. The spectral convolutional neural network (Spectral CNN) [36] is the earliest method to construct a convolutional neural network on the graph. This method uses the convolution theorem to define the graph convolution operator at each layer, learns the convolution kernels by backpropagating gradients under the guidance of the loss function, and stacks multiple layers to form a neural network. The structure of the $m$-th layer of the spectral convolutional neural network is as follows:

$$X_j^{m+1} = h\!\left(U \sum_{i=1}^{p} F_{i,j}^{m}\, U^{T} X_i^{m}\right), \quad j = 1, \dots, q,$$

where $p$ and $q$ are the dimensions of the input feature and output feature, respectively, $X_i^{m} \in R^{n}$ represents the $i$-th input feature of the nodes in the $m$-th layer on the graph, $F_{i,j}^{m}$ represents the convolution kernel in the spectral space, and $h$ represents the nonlinear activation function. In the spectral convolutional neural network, such a layer transforms the features from $p$ dimensions to $q$ dimensions, and, based on the convolution theorem, graph convolution is realized by learning the convolution kernel.
The spectral convolutional neural network applies the convolution kernel to the input signal in the spectral space, uses the convolution theorem to realize graph convolution and thereby aggregate information between nodes, applies the nonlinear activation function to the aggregation result, and stacks multiple layers to form a neural network. However, the locality of the spectral convolutional neural network is not guaranteed; that is, the nodes involved in information aggregation are not necessarily adjacent nodes.
The original intention of modeling a GCN is to use the graph structure to describe the information aggregation of adjacent nodes, and the spectral convolutional neural network introduced previously does not satisfy locality. Recently, the graph wavelet neural network (GWNN) [37] proposed using the wavelet transform instead of the Fourier transform to realize the convolution theorem.
The wavelet neural network points out that, similar to the Fourier transform, the wavelet transform also defines a method to transform the signal from the node domain to the spectral domain. Here, we use $\Psi_s = (\psi_{s1}, \psi_{s2}, \cdots, \psi_{sn})$ to represent the basis of the wavelet transform, where $\psi_{si}$ represents the energy diffusion from the $i$-th node, which describes the local structure of the $i$-th node. The definition of the wavelet basis depends on the eigenvectors of the Laplacian matrix, namely,

$$\Psi_s = U G_s U^{T}, \qquad G_s = \mathrm{diag}\!\left(g(s\lambda_1), \dots, g(s\lambda_n)\right),$$

where $G_s$ is a diagonal matrix whose diagonal elements are obtained by applying the function $g$ to the eigenvalues $\lambda_i$. Different $g$ functions endow the wavelet basis with different properties. In the wavelet neural network, the authors use the heat kernel function, $g(s\lambda_i) = e^{-s\lambda_i}$. Taking $\Psi_s$ as the basis of the spectral space, the wavelet transform of a signal $x$ is $\hat{x} = \Psi_s^{-1} x$ and the inverse transform is $x = \Psi_s \hat{x}$, where the transformation matrix $\Psi_s^{-1}$ is obtained by replacing $g(s\lambda_i)$ with $g(-s\lambda_i)$. Compared with the Fourier transform, the basis of the wavelet transform has several good properties: (1) the basis of the wavelet transform can be obtained by Chebyshev polynomial approximation, avoiding the high cost of the eigendecomposition of the Laplacian matrix; (2) the basis of the wavelet transform has locality; and (3) the locality of the wavelet basis makes the wavelet transform matrix very sparse, which greatly reduces the computational complexity of $\Psi_s^{-1} x$ and makes the calculation process more efficient. The hyperparameter $s$ represents the range of heat diffusion, and it can be adjusted flexibly to adapt to different task scenarios.
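The heat-kernel wavelet basis can be examined on the same kind of toy graph. This is a sketch under explicit assumptions: the kernel is taken to be $g(s\lambda) = e^{-s\lambda}$, the graph is a 4-cycle whose Laplacian eigenvectors are hard-coded, and the basis is built from the exact eigendecomposition rather than the Chebyshev approximation mentioned in the text. All names and the value of `s` are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

r = 1.0 / math.sqrt(2.0)
# Orthonormal Laplacian eigenvectors (columns) of the 4-cycle 0 - 1 - 2 - 3 - 0.
U = [[0.5, r, 0.0, 0.5],
     [0.5, 0.0, r, -0.5],
     [0.5, -r, 0.0, 0.5],
     [0.5, 0.0, -r, -0.5]]
eigenvalues = [0.0, 2.0, 2.0, 4.0]

def wavelet_matrix(s):
    """Psi_s = U diag(exp(-s * lambda_i)) U^T; passing -s yields the inverse."""
    n = len(U)
    g = [math.exp(-s * lam) for lam in eigenvalues]
    return [[sum(U[i][k] * g[k] * U[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

s = 0.5                        # assumed diffusion range
psi = wavelet_matrix(s)
psi_inv = wavelet_matrix(-s)   # g(s * lambda) replaced by g(-s * lambda)
identity = matmul(psi, psi_inv)

# Row 0 of psi describes heat spreading from node 0: largest at the node itself,
# smaller at its two neighbors (nodes 1 and 3), smallest at the opposite node 2.
wavelet_at_node_0 = psi[0]
```

On this small dense example the entries only decay with distance rather than vanish; the sparsity mentioned in the text arises on larger graphs with small `s`, where entries far from the center node become negligible.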
Using the wavelet transform on the graph to replace the Fourier transform, the $m$-th layer structure of the wavelet neural network is defined as follows:

$$X_j^{m+1} = h\!\left(\Psi_s \sum_{i=1}^{p} F_{i,j}^{m}\, \Psi_s^{-1} X_i^{m}\right), \quad j = 1, \dots, q.$$

Compared with the spectral convolutional neural network, the wavelet neural network replaces the Fourier transform with the wavelet transform; that is, it replaces $U$ and $U^{T}$ with $\Psi_s$ and $\Psi_s^{-1}$. Under such a set of wavelet bases, the graph convolutional neural network satisfies locality, and its computational complexity is greatly reduced because the wavelet basis is sparse and can be computed quickly. In addition to wavelet neural networks, other works are also dedicated to achieving locality and accelerated computation for graph convolutional neural networks. Unlike wavelet neural networks, which replace the basis, these works achieve locality by parameterizing the convolution kernels, which also reduces parameter complexity and computational complexity.
GraphHeat [38] analysed the previous spectral methods from the perspective of filters and pointed out that spectral convolutional neural networks are nonparametric filters, while Chebyshev networks and first-order graph convolutional neural networks are high-pass filters. However, this is inconsistent with the smoothness prior of the graph semisupervised learning task. Based on this, the graph heat kernel network uses the heat kernel function to parameterize the convolution kernel and thereby implements a low-pass filter.

Spatial Method of Graph Convolutional Neural Network.
In contrast to the previous methods, which all start from the convolution theorem to define graph convolution in the spectral domain, the spatial method starts from the node domain and defines an aggregation function that aggregates each central node with its adjacent nodes. Defining a general framework draws attention to the fundamental problems of graph convolutional networks and provides a platform for a comparative analysis of previously published work in the field. Two recent papers aim to define such a general framework for graph convolutional networks. In particular, hybrid convolutional networks (MoNet) [20] concentrate on the lack of translation invariance on graphs and map the local structure of each node through a mapping function that is applied to every node. Message passing neural networks (MPNNs) [18], on the other hand, are based on the aggregation of information propagated between nodes and propose a framework by defining a general form of the aggregation function. The lack of translation invariance is what makes graph convolutional neural networks difficult to define. When applied to a graph, hybrid convolutional networks define a coordinate system and represent the relationship between nodes as a low-dimensional vector in this coordinate system. At the same time, the hybrid convolutional network defines a set of weight functions whose parameters can be learned during training. Each weight function acts on all adjacent nodes centered on a node: its input is the relationship between nodes (a low-dimensional vector), and its output is a scalar value.
With this set of weight functions, MoNet obtains a vector representation of the same size for each node as follows:

$$D_j(x) f = \sum_{y \in N(x)} w_j(u(x, y)) \, f(y), \quad j = 1, \ldots, J,$$

where $D_j(x) f$ denotes the $j$th component of the representation of node $x$, $N(x)$ represents the set of adjacent nodes of $x$, $f(y)$ represents the value of the signal $f$ at node $y$, $u(x, y)$ represents the low-dimensional vector that describes the relationship between nodes $x$ and $y$ under the coordinate system $u$, $w_j$ represents the $j$th weight function, and $J$ represents the number of weight functions. This step gives each node a $J$-dimensional representation that integrates the local structure information of the node. MoNet then defines the shared convolution kernel on this $J$-dimensional representation.

Different from MoNet, the MPNN points out that the core of graph convolution is to define the aggregation function between nodes. Based on the aggregation function, each node can be represented as the superposition of information from its surrounding nodes and itself. Therefore, this model proposes a general framework for graph convolutional networks by defining a general aggregation function. Message passing is divided into two steps. First, the aggregation function is applied to each node and its adjacent nodes to obtain the local structural representation of the node; then, the update function is applied to the node itself and its local structural representation to obtain the new representation of the node:

$$m_x^{t+1} = \sum_{y \in N(x)} M_t\left(h_x^t, h_y^t, e_{x,y}\right),$$

$$h_x^{t+1} = U_t\left(h_x^t, m_x^{t+1}\right),$$

where $h_x^t$ represents the hidden layer representation of node $x$ in the $t$th step, $e_{x,y}$ represents the edge feature of nodes $x$ and $y$, $M_t$ represents the aggregation function in the $t$th step, $m_x^{t+1}$ represents the local structural representation obtained by node $x$ after applying the aggregation function, and $U_t$ represents the update function of the $t$th step.
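The two-step scheme described above (aggregate, then update) can be sketched in a few lines. This is a minimal illustration rather than the implementation of [18]: the concrete message function (neighbor state scaled by a scalar edge feature) and update function (tanh of the sum) are assumptions chosen only to keep the example small.

```python
import math

def mpnn_step(hidden, neighbors, edge_feat, message_fn, update_fn):
    """One message-passing step: aggregate messages, then update every node."""
    new_hidden = {}
    for x, h_x in hidden.items():
        m_x = [0.0] * len(h_x)
        for y in neighbors[x]:
            # Message from neighbor y to node x, summed into m_x
            msg = message_fn(h_x, hidden[y], edge_feat[(x, y)])
            m_x = [a + b for a, b in zip(m_x, msg)]
        # Update uses the old state of x and the aggregated message
        new_hidden[x] = update_fn(h_x, m_x)
    return new_hidden

def scaled_neighbor_message(h_x, h_y, e_xy):
    """Hypothetical M_t: the neighbor state scaled by a scalar edge feature."""
    return [e_xy * v for v in h_y]

def tanh_update(h_x, m_x):
    """Hypothetical U_t: squash the sum of the old state and the message."""
    return [math.tanh(a + b) for a, b in zip(h_x, m_x)]

# Path graph a - b - c with 2-dimensional hidden states
hidden = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
edge_feat = {("a", "b"): 0.5, ("b", "a"): 0.5, ("b", "c"): 0.5, ("c", "b"): 0.5}

updated = mpnn_step(hidden, neighbors, edge_feat,
                    scaled_neighbor_message, tanh_update)
print(updated["b"])
```

Because `message_fn` and `update_fn` are passed in as arguments, different spatial methods can be expressed by swapping these two functions while the propagation loop stays the same.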
By using the aggregation function and update function above to define each layer of the neural network, each node can continuously update itself with information from itself and its neighboring nodes, and thereby obtain a new representation that depends on its local structure. Methods of this kind no longer rely on the Laplacian matrix but instead design neural networks that learn the aggregation function within the spatial framework. Aggregation functions learned in this way can be tailored to specific tasks and graph structures, resulting in greater adaptability and flexibility. The different graph-based methods are summarized in Table 1.

GCN in Communication Network.
The graph neural network methods described previously can deal with complex communication network problems and have been applied to network function virtualization, wireless network resource allocation, network modeling, and performance analysis. Software-defined networking (SDN) and network functions virtualization (NFV) have been research hotspots in the field of communication networks in recent years. SDN separates the control plane from the forwarding plane of the network, which makes it possible to obtain the topology and resource information of the entire network. NFV uses virtualization technology to separate network functions from traditional hardware devices, which improves the flexibility of network configuration. GNNs can be used to solve problems in SDN and NFV that require exploring graph structures, such as dynamic resource allocation, service function chain (SFC) establishment, and virtual network embedding (VNE). Rafiq et al. proposed a supervised learning method for SFC traffic prediction. The method uses a GNN to map the input historical traffic to the output predicted traffic and adjusts the resource allocation accordingly. The graph neural network of this model trains two functions: the transition function of each node and the output function. The transition function of node n takes as input the features of n, the features of all adjacent edges, and the features and states of all adjacent nodes, and outputs the state of node n. The output function computes the output of each node from its state and features [39]. Li et al. [40] used a GNN to predict NFV resource requirements so as to obtain advance information about upcoming requests and improve the effectiveness of SFC reconstruction algorithms based on deep reinforcement learning. Network traffic migration is also an important branch of dynamic resource allocation. Sun et al. [41] proposed a method for NFV network traffic migration using a GNN and deep reinforcement learning. The method maps the input network topology to the output network topology after migration, which is used to realize the expansion, reduction, and load balancing of network traffic. The essence of the SFC dynamic resource allocation problem is the transformation of the topology structure; the optimization goal is the total end-to-end delay, and there are no complex constraints, so it is well suited to a GNN-based solution.
The virtual network embedding problem is similar to the SFC establishment problem, but the network requests and resource constraints are more complicated. The VNE problem is divided into node mapping and link mapping. Existing GNN-based methods for the VNE problem mainly focus on node mapping. Habibi et al. [42] proposed using a graph autoencoder (GAE) to assist VNE physical node classification. The input of the model is the adjacency matrix and the resource feature matrix, and a model that can reconstruct the network topology is trained through the graph neural network. Yan et al. [43] proposed combining GCN with deep reinforcement learning to complete the node classification task. This method uses actor-critic reinforcement learning, in which GCN is used to extract physical node features, the features extracted from physical nodes and virtual network requests are fused through feed-forward neural networks (FF), and finally the probability of mapping nodes is obtained. In fact, for the SFC establishment and VNE problems of large-scale complex networks, considering the complexity of node and link resources and optimization objectives, graph neural networks are powerful tools for extracting topological information, with the potential to provide faster and more optimized solutions.

Table 1: Different graph-based methods.

| Method | Description |
|---|---|
| GAE | GAE is an unsupervised learning framework that converts graph structures into low-dimensional vectors and reconstructs graph structures from the encoded information; it is often used for graph embedding (GE) and graph structure generation [26]. Graph embedding is a graph representation learning (GRL) method that aims to map graph-structured data into low-dimensional dense vectors while preserving node information. Graph embedding enables graph-structured data to be applied more efficiently to traditional machine learning algorithms to achieve better results in tasks such as recommendation and classification. |
| GAT | GAT introduces an attention mechanism on the basis of GCN, which enables the model to focus on the information most relevant to the current task, thereby improving the performance of the model. In spectral domain-based GCN, the filter function depends on the Laplacian matrix, which is derived from the graph structure, so a model trained on a specific graph cannot be directly applied to other graph structures. |
| GCN | GCN introduces the convolution operation into graph structures and is one of the most important GNNs at present. According to the feature extraction method, it can be divided into spectral-domain GCN and spatial-domain GCN. |
| STGNN | RNN-based STGNN captures spatiotemporal correlations through graph convolution. Compared with the RNN-based method, CNN-based STGNN (such as CGCN) processes the spatiotemporal graph in a nonrecursive way, which allows parallel computation and avoids the problem of gradient explosion or gradient vanishing. |

With the rapid development and application of technologies such as 5G, the Internet of Things, and edge computing, the problem of resource allocation in wireless networks has become increasingly important. Through the effective allocation of resources, various optimization goals can be achieved in different application scenarios, and the utilization rate of network resources can be improved. The wireless power control problem is how to determine the transmit power of each transmitter so that the network can achieve the overall optimal signal-to-noise ratio. Its basic model is an optimization problem with constraints.
The optimization target is the weighted sum of the signal-to-interference-plus-noise ratio (SINR), and the constraint is the transmit power of the base station or equipment. Shen et al. [44] proposed representing the multiuser wireless channel as a complete graph and used a GNN to solve the power control problem. Each node of the complete graph is a transceiver pair, and the node features include the direct channel state and the weight; each link of the graph is an interference channel, and the link feature is the interference channel state. The method trains the transfer function and output function through GCN to output the optimal transmit power of each transmitter. Considering the presence of both base stations and users in practical problems, Guo and Yang [45] proposed a method to solve the power control problem in heterogeneous networks. The nodes of this model include two kinds of heterogeneous nodes: base stations and users. Heterogeneous nodes use different transfer functions and use parameter sharing to obtain the output results. The wireless power control problem does not have an intuitive graph structure, so it is necessary to transform the problem into a graph structure through modeling and then use the GNN model to solve it.
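The modeling step described above, turning a channel matrix into a complete graph, can be sketched as follows. The sketch assumes a channel gain matrix `gain` in which `gain[j][k]` is the gain from transmitter `j` to receiver `k`; the function names are hypothetical. As a further assumption, the objective is written in the common weighted sum-rate form, which applies $\log_2(1 + \mathrm{SINR})$ to each pair before weighting.

```python
import math

def sinr(power, gain, noise, k):
    """SINR of transceiver pair k; gain[j][k] is the gain from transmitter j to receiver k."""
    signal = gain[k][k] * power[k]
    interference = sum(gain[j][k] * power[j] for j in range(len(power)) if j != k)
    return signal / (interference + noise)

def weighted_sum_rate(power, gain, weights, noise):
    """Assumed objective: weighted sum of log2(1 + SINR_k) over all pairs."""
    return sum(w * math.log2(1.0 + sinr(power, gain, noise, k))
               for k, w in enumerate(weights))

def build_interference_graph(gain, weights):
    """Nodes are transceiver pairs; every ordered pair of distinct nodes is linked."""
    n = len(weights)
    # Node feature: (direct channel state, weight)
    node_feat = {k: (gain[k][k], weights[k]) for k in range(n)}
    # Edge feature: interference channel state from transmitter j to receiver k
    edge_feat = {(j, k): gain[j][k] for j in range(n) for k in range(n) if j != k}
    return node_feat, edge_feat

gain = [[1.0, 0.1],
        [0.2, 1.0]]
weights = [1.0, 2.0]
power = [1.0, 1.0]   # candidate transmit powers (the quantity a GNN would output)
noise = 0.1

node_feat, edge_feat = build_interference_graph(gain, weights)
objective = weighted_sum_rate(power, gain, weights, noise)
print(node_feat, edge_feat, objective)
```

A GNN-based solver would take `node_feat` and `edge_feat` as input and output a power vector; the objective function above could then serve as the quantity to maximize during training.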
Nakashima et al. [46] used a GCN combined with deep reinforcement learning to extract the features of the channel vector with topological information and then generate the channel deployment strategy. This method can perform channel allocation in densely deployed wireless local area networks, thereby improving system throughput. Yan et al. [43] proposed an energy-saving topology control algorithm based on GCN, which uses GCN to imitate the maximum spanning tree algorithm for link prediction and introduces new edges into the topology according to the probability graph, thereby optimizing the wireless network in 5G and B5G environments.
Network modeling and performance analysis are fundamental to realizing an efficient communication network. As mentioned previously, GNNs can be used for resource optimization in wired and wireless networks, where various resources are allocated to devices through optimization strategies. Therefore, an efficient network model is urgently needed to evaluate the quality of resource allocation. Rusek et al. [47] proposed RouteNet, which uses a GNN to accurately evaluate the end-to-end delay and packet loss of network paths. RouteNet takes the network topology, traffic matrix, and end-to-end paths as input and outputs performance evaluation indicators (delay, jitter, packet loss, etc.) according to the network state. RouteNet contains a multilayer message-passing neural network, uses an RNN as the transfer function, compresses the link and path information into hidden state vectors, and finally obtains the evaluation index value of each path through the output function. RouteNet is used for the following two example problems: (1) routing optimization based on network delay and packet loss and (2) network topology upgrade with budget constraints.
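The exchange of information between link states and path states described above can be sketched structurally. The sketch is heavily simplified and untrained: hidden states are scalars, the recurrent cell is a toy tanh cell with fixed weights standing in for a learned RNN, and the names and initial values are hypothetical. It shows only the order of the updates, not the model of [47].

```python
import math

def rnn_cell(h, x, w_h=0.5, w_x=0.5):
    """Toy scalar recurrent cell (a stand-in for a learned RNN cell)."""
    return math.tanh(w_h * h + w_x * x)

def message_passing_round(link_state, path_state, paths):
    """Paths read their links in order, then links absorb the messages they received."""
    messages = {link: 0.0 for link in link_state}
    new_path_state = {}
    for path, links in paths.items():
        h = path_state[path]
        for link in links:
            h = rnn_cell(h, link_state[link])
            messages[link] += h          # message from this path to the link
        new_path_state[path] = h
    new_link_state = {link: rnn_cell(link_state[link], messages[link])
                      for link in link_state}
    return new_link_state, new_path_state

# Hypothetical inputs: link states initialized from capacity, path states from traffic
link_state = {"A-B": 1.0, "B-C": 0.5}
path_state = {"p1": 0.2, "p2": 0.4}
paths = {"p1": ["A-B", "B-C"], "p2": ["B-C"]}

links_1, paths_1 = message_passing_round(link_state, path_state, paths)

# Several rounds let information spread between paths that share links
links_t, paths_t = links_1, paths_1
for _ in range(2):
    links_t, paths_t = message_passing_round(links_t, paths_t, paths)
print(paths_t)
```

In a trained model, an output function would map each final path state to a performance estimate such as delay; that readout is omitted here.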
Routing is a long-standing and core optimization problem in the field of communication networks, and artificial intelligence algorithms have been applied to it. Geyer et al. [48] proposed using a GNN to learn a distributed routing algorithm. This method abstracts each router interface as a node in the topology and uses the GNN to train the hidden node information so that each node has a local representation of the graph topology. This method is a rare distributed-oriented GNN application. Secure network communication relies heavily on encrypted network traffic, which helps protect sensitive data and maintain its integrity. However, encryption obscures the characteristics of the data, makes it more difficult to identify malicious traffic, and shields such activity from detection. Consequently, encryption alone cannot guarantee fundamental information security, and it is also important to watch for suspicious activity by monitoring traffic. Traffic classification methods based on statistical features and on graphs are currently the most widely used, but the limitations of these two approaches make them unreliable for detecting malicious traffic that encrypts its contents. The former does not consider the external connections between network flows at all, while the latter has the opposite limitation. A GCN model called GCN-ETA, proposed in [49], considers both the statistical features (internal information) of network flows and the structural information (external connections) between them to identify malicious traffic. GCN-ETA has two parts: an improved GCN feature extractor and a decision tree classifier. Modifying the traditional GCN makes it possible to improve the effectiveness and speed of detecting malicious encrypted traffic, and this can serve as a model for applying GCN in similar scenarios [50]. The design of poisoning-resistant graph neural networks is extremely difficult, and several attempts have been made in the past.
Existing research attempts to reduce the negative impact of adversarial edges using only the poisoned graph, which is suboptimal because it fails to distinguish between adversarial and normal edges. Tang et al. developed PA-GNN, which relies on a penalized aggregation mechanism that directly restricts the negative impact of adversarial edges by assigning them lower attention coefficients [51]. Pan et al. proposed a traffic classification method using GCN and LSTM, which requires few labeled samples for classification and achieves better accuracy by using GCN [52].

GCN in Medical Imaging.
Chest computed tomography (CT) scans for coronavirus disease 2019 (COVID-19) are typically drawn from multiple datasets gathered at various medical centers, with images sampled using a variety of acquisition protocols. While combining datasets from multiple sites increases sample size, it is hampered by intercenter heterogeneity, which makes comparisons difficult. To address this issue, Song et al. [53] proposed an augmented multicenter graph convolutional network (AM-GCN) for diagnosing COVID-19. AM-GCN extracts features from the initial CT scans using a 3-D CNN, which is supplemented by a ghost module and a multitask framework to improve the network's performance. The extracted features are then used to construct a multicenter graph that takes into account intercenter heterogeneity as well as the disease status of the training samples. In addition, AM-GCN employs an augmentation mechanism to increase the number of training samples, resulting in an augmented multicenter graph. This method achieved a mean accuracy of 97.76% on 2223 COVID-19 subjects and 2221 normal controls from seven medical centers.
Given the high cost of exhaustively annotating 3D data, a more sustainable approach is to develop diagnosis algorithms using only patient-level labels. Chen et al. [54] proposed the Instance Importance-aware GCN (I²GCN) for multi-instance learning (MIL), motivated by the fact that 2D slices of 3D data exhibit explicit diagnostic efficacy. More precisely, this study begins by calculating the importance of each slice (instance) for diagnosis using a preliminary MIL classifier, which is then used to promote the refined diagnosis branch. In the refined diagnosis branch, an instance importance-aware graph convolutional layer (I²GCLayer) is designed to exploit complementary features in both importance-based and feature-based topologies. Additionally, to address the deficient supervision of 3D datasets, an importance-based subgraph augmentation (SGA) technique was proposed to effectively regularize framework training.
Zhu et al. [55] developed the Interpretable Dynamic GCN (IDGCN) to enhance the performance of personalized Alzheimer's disease diagnosis and to generate interpretable results. This is accomplished by incorporating interpretable feature learning and dynamic graph learning into a GCN architecture. More precisely, interpretable feature learning ensures that diagnosis results are interpretable, and preclassification ensures that the selected features are classification-oriented. Additionally, by adjusting the similar and dissimilar correlations of all objects, dynamic graph learning dynamically updates the graph structure for GCN to produce superior diagnosis results. Thus, by optimizing feature learning, graph learning, and the GCN simultaneously, the proposed method not only generates reliable personalized diagnoses but also provides interpretability for the diagnosis results. Similarly, Jiang et al. [56] proposed a hierarchical GCN framework (called hi-GCN) to learn the graph feature embedding while considering the network topology information and the associations between subjects at the same time. Dementia is a term used to describe progressive brain syndromes that affect memory, thinking, behavior, and emotion. A dementia patient's ability to carry out everyday tasks may deteriorate, and patients become increasingly dependent on their caregivers. As a result, spotting the early signs of cognitive decline and alerting caregivers and doctors would be beneficial. Arifoglu et al. [57] used GCN to recognize activities and flag abnormal behavior related to dementia. Figure 6 shows a standardized approach implemented by different studies for the classification of medical images. For medication recommendation and lab test imputation, Mao et al. [58] developed MedGCN, a machine learning framework based on MedGraph. The framework can be applied to a wide range of medical procedures.
MedGCN builds a graph to associate four different types of medical entities, namely patients, encounters, lab tests, and medications, and then uses a graph neural network to learn node embeddings for medication recommendation and lab test imputation. Shi et al. [59] proposed a new method called the cell-graph convolutional neural network (CGC-Net), which converts each large histology image into a graph in which each node represents a nucleus within the original image and cellular interactions are denoted as edges between these nodes based on node similarity. To improve the algorithm's performance, CGC-Net employs nuclear appearance features in addition to the spatial location of the nodes. Zhang et al. [60] proposed the BDR-CNN-GCN model, which uses batch normalization with CNN and GCN to obtain an accurate classification of breast disease. Yin et al. [61] created a novel multi-instance deep learning method for building a robust classifier by treating multiple 2D ultrasound images of each individual subject as multiple instances of the same bag. Convolutional neural networks (CNNs) are used in this method to learn instance-level features from 2D US kidney images, and GCNs are used to further optimize the instance-level features by exploring potential correlations among instances of the same bag. This study also uses fully connected neural networks (FCNs) to learn bag-level features using gated attention-based MIL pooling. Table 2 gives a detailed comparison of different approaches using graphs.
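The gated attention-based MIL pooling mentioned above can be sketched as follows. The sketch assumes the standard gated attention form, in which each instance feature $h_k$ receives a score $w^T(\tanh(V h_k) \odot \sigma(U h_k))$ that is normalized with a softmax over the bag. The matrices `V`, `U` and the vector `w` are fixed toy values here, whereas in practice they would be learned; this is not the exact network of [61].

```python
import math

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_attention_pool(instances, V, U, w):
    """Pool instance features into one bag feature with gated attention weights."""
    scores = []
    for h in instances:
        tanh_branch = [math.tanh(x) for x in matvec(V, h)]
        gate_branch = [sigmoid(x) for x in matvec(U, h)]
        scores.append(sum(wi * t * g for wi, t, g in zip(w, tanh_branch, gate_branch)))
    # Softmax over the instances of the bag (shifted for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attention, instances)) for d in range(dim)]
    return bag, attention

# One bag with three 2-dimensional instance features (e.g., one per 2D image)
V = [[1.0, 0.0], [0.0, 1.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, -1.0]
instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

bag_feature, attention = gated_attention_pool(instances, V, U, w)
print(attention, bag_feature)
```

The attention weights sum to one, so the bag-level feature is a convex combination of the instance features, and the weights indicate which instances contribute most to the bag-level decision.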

GCN for Recommendation and Prediction.
In today's web platforms and applications, recommender systems are widely used as important tools to alleviate information overload and improve user experience. Taking more user preferences into account when making recommendations is currently a hot topic. Although real-world information systems often choose "click" or "purchase" as the optimization target, there are also various other types of user behavior, such as view and add-to-cart: users can view an item, add it to a cart, and ultimately purchase it. To create a more precise recommender system, data on a user's diverse behaviors is crucial. In earlier work, researchers would typically use a default value (i.e., "other") to represent a missing attribute, which resulted in suboptimal performance. To address this issue, Liu et al. proposed an attribute-aware attentive graph convolution network (A2-GCN) that is both fast and accurate [20]. In particular, A2-GCN begins by constructing a graph in which the nodes represent users, items, and attributes.
Following that, A2-GCN uses the graph convolution network to characterize the complex interactions among the participants. To learn the node representation, this model also employs a message-passing strategy that aggregates the messages passed from directly linked nodes of the other types (e.g., a user or an attribute) (see Figure 2). Guo et al. used a similar approach and developed a domain-aware GCN (DA-GCN) model, which links users and items in each domain as a graph [67]. Shehnepoor et al. used GCN for recommendation combined with fraudster detection based on user rating profiles and proposed the HIR-RNN algorithm [68]. This algorithm performs two tasks, i.e., prediction of user ratings and fraudster detection based on user behavior.
Knowledge graphs (KGs), when combined with a recommendation system, help to provide explainable recommendations. Ma et al. proposed knowledge-aware reasoning with graph convolution network (KR-GCN), which integrates user-item interactions and knowledge graphs into a heterogeneous graph on which the GCN operates [69]. Social data are important for recommending products, and Yu et al. proposed an enhanced social recommendation system based on GCN, which addresses the problems of limited neighbors, noisy social relationships, and heterogeneous neighbors [70]. This model uses an autoencoder to augment the data by encoding the high-order and complex connectivity patterns [71]. To extract the indirect relationships between users and items, Liu et al. proposed a Hamming similarity model named Hamming spatial graph convolutional networks (HS-GCNs). Xiao et al. proposed a GCN model for recommendation using a deep graph neural network named the DeepFM graph convolutional network (DFM-GCN) [72]. DFM-GCN mainly focuses on the problems of cold start and data sparseness, which it addresses by capturing the interactive information between nodes and representing items as vector nodes in the GCN. Users can generate many types of interaction data, but traditional studies on recommender systems tend to focus on just one type of user behavior (the optimization target, for example, purchasing) and neglect the others (e.g., view, click, add-to-cart, and so on). Well-structured information can be derived from heterogeneous multi-relational data, and this information can be used to make excellent recommendations. However, early attempts to leverage these heterogeneous data fail to capture the high-hop structure of user-item interactions, so they do not make full use of the data and may deliver only limited recommendation performance.
Graph heterogeneous collaborative filtering (GHCF) explores the high-hop heterogeneous user-item interactions; it takes advantage of the graph convolutional network (GCN) and further improves it to jointly embed the representations of both nodes (users and items) and relations for multirelational prediction. The data sparsity issue is further addressed by Tang et al., who developed the multigraph collaborative filtering (DMGCF) model to mine and reuse side information. This method generates multiple graphs with a dynamic evolution mechanism to simulate side information for better performance, especially when side information is unavailable [73].
Monti et al. [74] combined a multigraph convolutional neural network with a recurrent neural network, in which the multigraph convolutional neural network was used to extract locally stationary features and the recurrent neural network was used to diffuse the score values and reconstruct the matrix. Zhang et al. [75] modeled the recommender system as a link prediction problem on graphs and proposed a graph autoencoder framework based on differentiable message propagation to model the bipartite graph of the recommender system, achieving good results on datasets including social networks. Yang et al. [76] applied graph convolutional neural networks to recommender systems and proposed MultiSage, a data-efficient graph convolutional neural network algorithm that generates embedded representations for commodity nodes. These representations contain graph structure and node feature information. Compared with the traditional graph convolution method, it proposes an efficient random walk strategy to model convolution, designs a new training strategy, and successfully applies graph convolutional neural networks to a super-large-scale recommendation system with 1 billion nodes. Wang et al. [77] proposed the RippleNet framework, which introduces knowledge graph information to improve the performance of the recommender system. Liao et al. [78] proposed the SocialLGN framework, which includes three parts: user modeling, commodity modeling, and scoring prediction. Using the attention mechanism, the user's interaction information and the user's social network information are effectively modeled. The graph convolutional neural network is considered able to model the structural attributes and node feature information of the graph well, and the recommendation task can be regarded either as a matrix completion problem or as a link prediction problem on a bipartite graph (users and items).
Compared with traditional methods, graph convolutional neural networks can better utilize the user attribute and product attribute information that is ubiquitous in recommender systems, which is why they have attracted widespread attention in recommender system tasks. Table 3 highlights the latest progress in recommendation and prediction using graph-based methods.
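The view of recommendation as link prediction on a user-item bipartite graph can be sketched with one round of simplified graph convolution. The sketch follows a LightGCN-style simplification (symmetric degree normalization, no feature transformation or nonlinearity, averaging of the layer outputs) and is not the method of any single work cited above; the embeddings are fixed toy values rather than learned parameters.

```python
import math

def propagate(user_emb, item_emb, interactions):
    """One propagation step over the bipartite graph with symmetric normalization."""
    user_deg = {u: 0 for u in user_emb}
    item_deg = {i: 0 for i in item_emb}
    for u, i in interactions:
        user_deg[u] += 1
        item_deg[i] += 1
    dim = len(next(iter(user_emb.values())))
    new_user = {u: [0.0] * dim for u in user_emb}
    new_item = {i: [0.0] * dim for i in item_emb}
    for u, i in interactions:
        norm = 1.0 / math.sqrt(user_deg[u] * item_deg[i])
        for d in range(dim):
            new_user[u][d] += norm * item_emb[i][d]
            new_item[i][d] += norm * user_emb[u][d]
    return new_user, new_item

def combine(layer0, layer1):
    """Average the initial and the propagated embeddings."""
    return {k: [(a + b) / 2.0 for a, b in zip(layer0[k], layer1[k])] for k in layer0}

def score(user_vec, item_vec):
    """Predicted preference (link score) as an inner product."""
    return sum(a * b for a, b in zip(user_vec, item_vec))

user_emb = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
item_emb = {"i1": [1.0, 0.0], "i2": [0.5, 0.5], "i3": [0.0, 1.0]}
interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u2", "i3")]

user_l1, item_l1 = propagate(user_emb, item_emb, interactions)
users = combine(user_emb, user_l1)
items = combine(item_emb, item_l1)

seen = score(users["u1"], items["i1"])    # item u1 already interacted with
unseen = score(users["u1"], items["i3"])  # candidate link to predict
print(seen, unseen)
```

Before propagation, the inner product between `u1` and `i3` is zero; after one round, `u1` receives information through the shared item `i2`, so the candidate link obtains a positive score.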

GCN for Hyperspectral Data.
Hyperspectral techniques have been greatly improved by the rapid growth of optics and spectroscopy. A considerable quantity of important information can be captured using hyperspectral images, which are images that contain many contiguous bands. Hyperspectral imaging has been used in a variety of disciplines over the past few decades, including military target identification, vegetation monitoring, and disaster prevention and control [4]. Various algorithms have been proposed so far for categorizing the pixels of a hyperspectral image into specific land-cover categories. The early-stage approaches rely heavily on traditional pattern recognition methods such as K-nearest neighbor classifiers and linear classifiers. K-nearest neighbor has been frequently employed among these traditional methods due to its simplicity in both theory and practice. With high-dimensional hyperspectral data, the support vector machine (SVM) works stably and satisfactorily. The aforementioned approaches, however, are all based on handcrafted spectral-spatial features that rely largely on professional skill and are extremely empirical. Deep learning is being used to overcome this flaw. Mou et al. were the first to use a recurrent neural network (RNN) to classify hyperspectral images [81]. The convolutional neural network (CNN) has recently emerged as a potent method for hyperspectral image classification, and Lou et al. developed a novel high-performance HSI classification algorithm based on CNN. Figure 7 shows the basic implementation method used in different studies.
Convolutional neural networks (CNNs) have been shown to be excellent at representing and classifying hyperspectral images. However, in traditional CNN models, convolution can only be performed on regular square image regions with fixed sizes and weights, so these models cannot generically adapt to distinct local regions with varying object distributions and geometric appearances. Their classification performance therefore needs improvement, especially at class boundaries [82]. To address this shortcoming, Luo et al. [83] proposed using the graph convolutional network (GCN) for hyperspectral image classification, as it can perform convolution on arbitrarily structured non-Euclidean data and is applicable to irregular image regions represented by spatial graph information. Mou et al. [84] proposed a graph-based semisupervised network called nonlocal-GCN. Unlike existing CNNs and RNNs, which take pixels or patches of a hyperspectral image as input, this network takes the entire image (including both labeled and unlabeled data) into account. More specifically, a nonlocal graph is first computed. To extract features from this graph representation, a pair of graph convolutional layers is used. Finally, the network's semisupervised learning is accomplished by employing a cross-entropy error over all labeled instances. Ding et al. [32] adopted graphSAGE for feature extraction in local regions of the graph, which helps to obtain more accurate and effective information about nodes. Using the MSAGE-CAL based attention method with graphSAGE improves the classification accuracy of HSI.
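The semisupervised scheme described above for nonlocal-GCN (a pair of graph convolutional layers followed by a cross-entropy error over the labeled nodes only) can be sketched as a forward pass. The sketch assumes the widely used propagation rule with the renormalized adjacency matrix $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$; the graph, features, and weights are toy values, and the construction of the nonlocal graph itself is not shown.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def normalize_adjacency(adj):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, with degrees taken after adding self-loops."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def gcn_forward(adj, x, w0, w1):
    """Two graph convolutional layers: softmax(A_hat ReLU(A_hat X W0) W1)."""
    a_hat = normalize_adjacency(adj)
    hidden = [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, x), w0)]
    logits = matmul(matmul(a_hat, hidden), w1)
    return [softmax(row) for row in logits]

def masked_cross_entropy(probs, labels):
    """labels maps node index -> class; unlabeled nodes are simply absent."""
    return -sum(math.log(probs[i][c]) for i, c in labels.items()) / len(labels)

# Four pixels (nodes) on a path graph, two spectral features, two classes
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
w0 = [[1.0, 0.0], [0.0, 1.0]]
w1 = [[2.0, -2.0], [-2.0, 2.0]]
labels = {0: 0, 3: 1}          # only two of the four nodes are labeled

probs = gcn_forward(adj, x, w0, w1)
loss = masked_cross_entropy(probs, labels)
print(probs, loss)
```

The unlabeled nodes still take part in the propagation through `a_hat`, which is how labeled and unlabeled data are used together; training would adjust `w0` and `w1` to reduce `loss`.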
Guo et al. [79] found that GCN models are shallow and that their feature extraction is not effective. To solve this issue, DGU-HSI was proposed. DGU-HSI constructs two separate graphs, one for spatial data and one for spectral data, and extracts features from both simultaneously. Once the features are extracted, graph U-Nets are used to fuse them for classification. Yang et al. [85] used a similar approach with spectral (Se-GCN) and spatial (Sa-GCN) data to develop an adaptive cross-attention-driven spatial-spectral graph convolutional network (ACSS-GCN). This model is improved by using an attention mechanism in both the spectral and the spatial blocks. Qu et al. [86] were the first to use GCN for change detection in HSI data and proposed the dual-branch difference amplification GCN (D2AGCN), which is highly efficient when few samples are available. The dual-branch structure can effectively extract sufficient difference features to facilitate the detection of changed areas.
The methods described previously use GCN to investigate large-range spatial relations in HSI, whereas local spatial information is more important when training samples are limited. S2RGANet (spectral-spatial residual graph attention network), a novel method for HSI classification, has been developed to address this shortcoming. The spectral residual modules in S2RGANet are designed to extract spectral discriminative features, while graph attention convolutions are introduced to explore the local geometric structure. In contrast to existing GNNs, which are designed to learn large-range spatial relations between samples in HSI, the proposed graph convolutions are designed to capture the distribution pattern of land cover in a given local patch of ground [86]. Sha et al. [87] used GAT for HSI classification by assigning different weights to different nodes according to their attention coefficients during the convolution process. Table 4 highlights the latest progress in hyperspectral data classification using graph-based methods.
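The attention coefficients used by GAT-style convolutions can be sketched as follows. The sketch assumes the standard single-head formulation, in which the score of neighbor $j$ for node $i$ is $\mathrm{LeakyReLU}(a^T [W h_i \,\|\, W h_j])$, normalized with a softmax over the neighborhood of $i$ (including $i$ itself). The weight matrix and attention vector are fixed toy values rather than learned parameters, and the example is not the network of [87].

```python
import math

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v

def gat_layer(features, neighbors, weight, attn_vec):
    """Single-head graph attention: returns new node features and the coefficients."""
    z = {node: matvec(weight, h) for node, h in features.items()}
    output, attention = {}, {}
    for i in features:
        nbrs = [i] + list(neighbors[i])   # assumption: each node also attends to itself
        # z[i] + z[j] is list concatenation, i.e. [W h_i || W h_j]
        scores = [leaky_relu(sum(a * v for a, v in zip(attn_vec, z[i] + z[j])))
                  for j in nbrs]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(nbrs, alphas))
        output[i] = [sum(al * z[j][d] for al, j in zip(alphas, nbrs))
                     for d in range(len(z[i]))]
    return output, attention

features = {"a": [2.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
weight = [[1.0, 0.0], [0.0, 1.0]]
attn_vec = [0.0, 0.0, 1.0, 0.0]   # toy choice: score depends on the neighbor's first feature

new_features, attention = gat_layer(features, neighbors, weight, attn_vec)
print(attention["b"])
```

With this toy attention vector, node `b` gives the largest coefficient to neighbor `a`, whose first feature is largest, which illustrates how different neighbors receive different weights during the convolution.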

GCN for Computer Vision.
With the development of science and technology, image processing has come into wide use. It covers a broad range of tasks: image information can be optimized, and image recognition, detection, data encoding, enhancement, restoration, and other work can be completed. It not only conveys the information that people need but also penetrates all aspects of work and production. For example, transportation, agriculture, communication technology, and aviation all require the support of image processing technology. AI plays a vital role in image processing and helps to perform tasks such as image segmentation, change detection, denoising, image enhancement, and 3D image processing efficiently. After traditional approaches such as SVM and KNN, CNNs found a wide range of applications in image processing, but research is currently shifting towards GCN because of its better results and lower computational complexity. GCN research in image processing covers remote sensing images, medical images, 3D images, and more, along with different image denoising and image enhancement techniques.
Saha et al. [90] used GCN to develop a change detection (CD) mechanism for remote sensing images. This semisupervised CD method encodes multitemporal images as a graph via multiscale parcel segmentation that effectively captures the spatial and spectral aspects of the multitemporal images. Ismail et al. [91] proposed the BLDNet algorithm for assessing building damage caused by disasters such as earthquakes. The model is based on a Siamese CNN combined with a GCN trained in a semisupervised manner to predict the damage caused by earthquakes. In recent years, deep learning-based image denoising methods have outperformed traditional denoising techniques. To train a convolutional neural network to infer clean images, most deep learning-based image denoising methods use small cropped patches. In practice, however, real-world noisy images tend to be of high resolution rather than small cropped patches, and vanilla training strategies ignore the cross-patch contextual dependency in the whole image. Li et al. [92] combined a cross-patch GCN with a CNN to denoise images and reported a denoising accuracy of 95%. Shen et al. [93] extended the denoising work using GCNs and proposed a novel approach, GCN-Denoiser, a feature-preserving mesh denoising method that performs graph convolution operations in the dual space of mesh triangles.
Remote sensing (RS) image semantic segmentation, as a fundamental task of GIS, serves as the foundation for other RS research and applications such as natural resource protection, land cover mapping, and land-use change detection. Despite receiving significant attention over the last decade, semantic segmentation of high-resolution RS images remains difficult due to the structural complexity of RS images, which leads to interclass similarity and intraclass variability. Ouyang et al. [94] proposed a DSSN-GCN framework that combines deep semantic segmentation with GCN. In this framework, an attention residual U-shaped network (AttResUNet) is used as a feature extractor, graph nodes are given by superpixels, and the graph weights are calculated from both spectral and spatial information. Kim et al. [95] proposed the Split-GCN model, which outlines objects using similar node features and highlights them in a specified region. This model consists of two parts: an encoder (feature extraction network) to extract the boundary information of an object and a decoder (novel graph composition network) to capture the shape of an object. The model uses a polygon-based approach that detects the object boundary at uniformly spaced points.
Computer vision is a long-running research topic because it can perceive and recognize the world without the aid of humans by gathering data from sensors. Reverse engineering, intelligent surveillance, and remote sensing all rely on target recognition as a critical component of their respective applications. Unmanned systems and augmented reality are examples of practical application scenarios where three-dimensional (3D) object recognition is more relevant than two-dimensional (2D) target recognition. Zhan et al. [96] proposed a 3D point cloud model named minimum bounding box oversegmentation GCN (MBBOS-GCN). This model uses a minimum bounding box algorithm, and the farthest point sampling (FPS) algorithm samples within each small region to reduce sampling randomness; the model achieves an accuracy of more than 90% for the segmentation of 3D objects. Wang et al. [97] used GCN models for activity recognition in 3D space and proposed a spatial-temporal graph convolutional network (ST-GCN) model. In this method, semantically close point data are treated as neighbors, and a graph is created from the motion capture data as follows: the intrabody edges between skeleton areas are defined based on the natural connections in human bodies; the interframe edges connect the same skeleton areas between consecutive frames. For 3D shape analysis, Wei et al. [98] proposed a model named View-GCN, which analyzes 3D shapes based on a graph representation of multiple views in flexible view configurations. First, the model is flexible with respect to view configurations, e.g., cameras located on circles, at the corners of a dodecahedron, or even at irregular positions around objects. Second, by using a view-graph representation, it can take advantage of GCN to aggregate multiview features while considering the relations between graph nodes. Table 5 highlights some latest innovations for image processing using graphs.
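The two edge types described for the spatial-temporal skeleton graph can be sketched as follows. This is a minimal illustration of the construction rule stated above, not the code of [97]. The function name `spatiotemporal_edges`, the five-joint toy skeleton, and the use of joints as the "skeleton areas" are assumptions.

```python
def spatiotemporal_edges(bones, num_joints, num_frames):
    """Build undirected edges over (frame, joint) nodes of a skeleton sequence."""
    edges = set()
    for t in range(num_frames):
        # Intrabody edges: natural connections between joints within frame t
        for i, j in bones:
            edges.add(((t, i), (t, j)))
        # Interframe edges: the same joint in two consecutive frames
        if t + 1 < num_frames:
            for j in range(num_joints):
                edges.add(((t, j), (t + 1, j)))
    return edges

# Hypothetical toy skeleton: joint 1 is the torso, connected to four other joints
bones = [(0, 1), (1, 2), (1, 3), (1, 4)]
edges = spatiotemporal_edges(bones, num_joints=5, num_frames=3)
```

With 4 bones, 5 joints, and 3 frames, the graph has 4 x 3 intrabody edges and 5 x 2 interframe edges. A graph convolution over this edge set mixes information across both the body and time.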

3.6. GCN for Transport and Traffic System.
Traffic prediction is also one of the tasks in which graph convolutional neural networks are widely used. Its purpose is to predict future traffic speed given the historical traffic speed and the route map. In the traffic prediction problem, nodes represent sensors placed on the road, and edges represent the physical distances between node pairs. Each node contains a temporal feature. Compared with traditional graph analysis problems, the traffic prediction problem includes both temporal and spatial modeling, and how to use the graph convolutional neural network to better model the road network brings both opportunities and challenges. Li et al. [101] proposed a diffusion convolutional recurrent neural network (DCRNN) for traffic forecasting. This model regards traffic flow as a diffusion process on a directed graph, uses diffusion convolution to model the graph-structured data, and uses recurrent neural networks to model temporal dependencies. It achieves a 12%-15% improvement on two large-scale road network traffic datasets. Cui et al. [102] proposed a traffic graph convolutional long short-term memory network (TGC-LSTM) to learn road networks and time-varying traffic patterns. It defines the graph convolutional neural network on the physical road network topology. The experimental results show that the method can effectively capture the complex spatiotemporal dependencies existing in the vehicle traffic network. Zhang et al. [1] proposed a graph gated recurrent unit (GGRU) to solve the traffic flow prediction problem. They applied the graph gated recurrent unit to the encoder-decoder model of the recurrent neural network and evaluated it on the Los Angeles highway data set. Zhang et al. [38] proposed a new deep learning framework, a space-time graph convolutional neural network (STGCN), to solve the problem of time series prediction in the traffic field. This framework first formulates the problem on a graph and builds the model with convolutional structures, which achieves significant improvements over traditional machine learning methods in short-term and mid- to long-term traffic prediction due to better utilization of topology.
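The idea of treating traffic as diffusion on a directed graph can be made concrete with a small sketch of a K-step bidirectional diffusion convolution, in which the signal is propagated along the out-degree-normalized and in-degree-normalized transition matrices and the steps are combined with filter coefficients. This is an illustration under stated assumptions, not the DCRNN implementation of [101]: the coefficients `theta_fwd` and `theta_bwd` are fixed here rather than learned, the signal has a single feature per node, and the toy graph is hypothetical with every node having at least one incoming and one outgoing edge.

```python
def transition(W):
    """Row-normalize a weighted adjacency matrix into random-walk probabilities."""
    return [[w / sum(row) for w in row] for row in W]

def matvec(P, x):
    return [sum(p * v for p, v in zip(row, x)) for row in P]

def diffusion_conv(x, W, theta_fwd, theta_bwd):
    """K-step bidirectional diffusion of a node signal x on a directed graph W."""
    P_fwd = transition(W)                                # D_O^{-1} W
    P_bwd = transition([list(col) for col in zip(*W)])   # D_I^{-1} W^T
    out = [0.0] * len(x)
    xf, xb = list(x), list(x)
    for tf, tb in zip(theta_fwd, theta_bwd):
        # Add the contribution of the current diffusion step in both directions
        out = [o + tf * f + tb * b for o, f, b in zip(out, xf, xb)]
        # Advance both random walks by one step
        xf, xb = matvec(P_fwd, xf), matvec(P_bwd, xb)
    return out

# Hypothetical directed, weighted sensor graph and one speed reading per sensor
W = [[0, 1, 1], [0, 0, 2], [3, 0, 0]]
x = [60.0, 30.0, 45.0]
y = diffusion_conv(x, W, theta_fwd=[0.5, 0.3], theta_bwd=[0.1, 0.1])
```

Because both transition matrices are row-stochastic, a constant signal is only rescaled by the sum of the coefficients, which gives a quick sanity check on the sketch.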
Zhu et al. [103] developed AST-GCN for traffic forecasting in intelligent transportation systems. This model treats external factors as dynamic and static attributes and designs an attribute-augmented unit to encode and integrate those factors into the spatiotemporal graph convolution model for traffic speed prediction. In another approach, Zhu et al. [104] proposed a BRB-based RNN-GCN model for traffic flow prediction, which addresses issues of existing traffic flow prediction models such as saturation or speed. In traffic prediction scenarios, how to handle spatiotemporal dependence is an important research direction. Since the graph convolutional neural network provides a way to model graph data, combining it with time series models such as recurrent neural networks gives a good solution to the traffic forecasting problem [70]. However, more fine-grained modeling of spatiotemporal data remains a focus of future research.

GCN for NLP.
Graph convolutional neural networks have a large number of applications in the field of natural language processing. In this field, the more common graph data are knowledge graphs, syntactic dependency graphs, abstract meaning representation graphs, word co-occurrence graphs, and graphs constructed by other methods. Entity relation extraction (RE) is a method of encoding the meaning of a sentence as a rooted directed graph [105]. Sun et al. [106] applied graph convolutional neural networks to dependency syntax trees for machine translation tasks between English and German and for one further English language pair. Zhou et al. [107] used a graph convolutional neural network for event extraction, where the graph is again a dependency syntax tree. Table 6 provides a further description of the methods. In addition to the previous graphs, word co-occurrence networks have also been applied to text classification tasks, where nodes are non-stop words and edges are word co-occurrence relationships in a given window. Defferrard et al. [112] proposed a formulation of convolutional neural networks in the context of spectral graph theory, which provides the necessary mathematical background and an efficient numerical scheme to design fast localized convolutional filters on graphs. Reference [113] used a weighting approach with GCN for the categorization of text. Pal et al. [114] used graph convolutional neural networks for text classification tasks on the Reuters dataset. Yao et al. [115] applied a graph convolutional neural network to a text classification task by constructing a co-word network and a document relation network, and achieved the best results without using external knowledge or word representations.
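The word co-occurrence graph described above (non-stop words as nodes, co-occurrence within a window as edges) can be sketched as follows. The tiny stop-word list, the window size, and the function name `cooccurrence_graph` are assumptions for illustration; the cited works may weight edges differently.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "on", "and", "is"}  # hypothetical tiny stop list

def cooccurrence_graph(tokens, window=3):
    """Count undirected co-occurrence edges between non-stop words.

    Two words are linked whenever they fall inside the same sliding window
    of `window` consecutive non-stop words.
    """
    words = [t.lower() for t in tokens if t.lower() not in STOP_WORDS]
    edges = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1 : i + window]:
            if v != w:
                edges[tuple(sorted((w, v)))] += 1
    return edges

text = "the graph convolution of a graph signal is a convolution on the graph"
edges = cooccurrence_graph(text.split())
```

Here stop words are removed before the window is applied, which is one possible design choice. Applying the window to the raw token sequence and filtering afterwards would give a sparser graph.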
A large number of studies have shown that the results of various natural language processing tasks have been improved to a certain extent after using the graph convolutional neural network model [116]. The use of graph structure enables the complex semantic relationships between objects to be mined effectively. Compared with the traditional sequential modeling used in natural language processing, graph convolutional neural networks can mine nonlinear, complex semantic relationships.

GCN for Bioinformatics.
In addition to the traditional modeling of graph data, graph convolutional neural networks have also received a lot of attention from researchers in fields such as biochemistry. Compared with traditional graph data research, in biochemistry a chemical structure or a protein is usually regarded as a graph, in which the nodes are smaller molecules and the edges represent bonds or interactions. Figure 8 shows the molecular graph of a medicine, where the nodes are carbon, hydrogen, and oxygen atoms, and the edges are chemical bonds. Researchers focus on the chemical function of a graph; that is, the object of study is no longer the nodes in the graph but the entire graph itself.
Intracellular interactions are the focus of most methods for determining gene-gene interactions from expression data. High-throughput spatial expression data enable methods that can infer such interactions both between cells and within cells. To accomplish this, Yuan et al. [117] developed graph convolutional neural networks for genes (GCNG). GCNG uses supervised training to combine spatial information with expression data. It improves on prior methods for analyzing spatial transcriptomics data and can propose new extracellular interacting gene pairs. Downstream analyses, such as functional gene assignment, can make use of the GCNG output. One of the primary goals of genomic medicine is to identify the genetic variations in a patient that are responsible for their clinical phenotypes and to determine their relationship to those phenotypes. When only genotype information is taken into consideration in prioritizing genomic variants, a few hundred potential variants are identified on average. Narrowing these down further to identify the disease genes responsible for the observed clinical phenotypes is still a significant challenge, especially for rare diseases. Motivated by the recent progress in spectral graph convolutions, Rao et al. [118] developed the graph convolution-based technique HANRD (Heterogeneous Association Network for Rare Diseases) to infer new phenotype-gene associations from an initial set of associations.
Predicting chemical compounds is one of the fundamental tasks in bioinformatics and cheminformatics because it contributes to various applications in metabolic engineering and drug discovery. Harada et al. [119] proposed a new graph convolutional neural network architecture called a dual graph convolutional network that learns compound representations from both the compound graphs and the intercompound network in an end-to-end manner. For the prediction of DNA-protein binding, Zhang et al. [120] built a sequence k-mer graph for the whole dataset based on k-mer co-occurrence and k-mer sequence relationships and then learned a DNA graph convolutional network (DNA-GCN) for the whole dataset. It has not yet been thoroughly investigated whether advanced graph network methods can be used to identify functional protein complexes from protein-protein interaction networks (PPIs) at a high level. To improve the detection of protein complexes, Zaki et al. [121] proposed a variety of graph convolutional network (GCN) methods. The protein complex detection problem is first formulated as a node classification problem. Once the model is trained and a complex affiliation matrix is in place, the model uses it to group the nodes (proteins). In addition, a multiclass GCN feature extractor and a mean shift clustering algorithm are used to extract the nodes' features and perform clustering. Further applications are listed in Table 7. Appropriate gene prioritization is critical for genome-based diagnostics of a variety of genetic diseases. However, it is a difficult task that must rely on a limited and noisy understanding of genes, diseases, and their associations. While several computational methods have been developed for the task of disease gene prioritization, their performance is largely constrained by manually crafted features, network topology, or predefined data fusion rules [127].
Li et al. [50] define convolutional neural networks directly on graphs. The neural network model takes molecules of any size or shape as input and learns molecular fingerprints end-to-end. The model can better support the design of molecules with specific functions. Torng et al. [128] used a graph convolutional neural network to encode atoms, bonds, and distances, which makes better use of the information in the graph structure and provides a new paradigm for ligand-based virtual screening. Gilmer et al. [129] proposed message passing neural networks (MPNNs) to predict the chemical properties of a given molecule. Zitnik et al. [130] used graph convolutional neural networks to model the side effects of multiple drugs. The method first constructs multimodal graphs of protein-protein interactions, drug-protein target interactions, and drug-drug interactions. In the graph, each side effect is treated as a different type of edge. Furthermore, the modeling of drug side effects is transformed into a link prediction problem, which provides a new research idea for further study of pharmacology. Xiao et al. [131] and Reau et al. [132] proposed applying graph convolutional neural networks to protein interaction prediction. In this task, proteins are chains of amino acid residues that fold into three-dimensional structures that give them biochemical functions. Proteins exert their functions through complex networks of interactions with other proteins. You et al. [133] proposed the graph convolutional policy network (GCPN), a model based on general graph convolution and reinforcement learning to generate target graphs. In this model, the hidden state of each node is obtained by means of message propagation, and then a policy π is generated.

Table 6:
[109] applied graph convolutional neural networks to reading comprehension, and the abstract meaning graph is applied to tasks such as text generation and relation extraction.
GCN with LSTM: The task of semantic role labeling (SRL) is, given a sentence, to identify the predicate and the corresponding object in the sentence. Marcheggiani et al. [110] proposed applying a graph convolutional neural network to the syntactic dependency graph and stacking it with a long short-term memory network.
Dictionary-based learning using ST-GCN: Ye et al. [111] developed short text GCN (ST-GCN), in which different topic models for short text are employed and a short-text graph is built based on word co-occurrence, document-word relations, and text topic information.

Prospects for Future Research Directions and Limitations of GCN
Although graph convolutional networks have succeeded in recent years, there are still some unsolved problems and directions worthy of further research.

Deep Network Structure.
After stacking a large number of network layers, the traditional deep learning model has achieved remarkable results in many problems due to its powerful representation ability [38]. However, a graph convolutional neural network achieves its best performance after stacking only a small number of layers, and adding further graph convolutional layers makes the results worse. Because graph convolution aggregates the features of neighbor nodes, when the network stacks multiple layers, the features of different nodes become too smooth and lack discrimination. The experimental results of GCN show that, when the number of network layers exceeds two, the performance of GCN on the semisupervised node classification problem decreases as the number of layers increases [134]. At the same time, as layers continue to be stacked, all nodes eventually learn the same representation. Whether graph neural networks need a deep structure at all, and whether a deep network structure can be designed that avoids excessive smoothing, are urgent research problems.
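The oversmoothing effect can be reproduced with a small experiment. The sketch below repeatedly averages each node with its neighbors, which is a weight-free, row-normalized stand-in for stacked graph convolution layers (the learned weights and nonlinearities of a real GCN are omitted as a simplifying assumption), and tracks how far apart the node representations remain. The six-node path graph and the feature values are hypothetical.

```python
def mean_aggregate(X, neighbors):
    """One weight-free propagation step: average each node with its neighbors."""
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [X[i]] + [X[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def spread(X):
    """Largest per-feature gap between any two node representations."""
    return max(max(col) - min(col) for col in zip(*X))

# Hypothetical toy graph: a path of six nodes with two features per node
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
X = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 4.0], [2.0, -2.0], [0.5, 0.5]]

spreads = {}
for layer in range(257):
    if layer in (0, 2, 8, 256):
        spreads[layer] = spread(X)  # record the spread after `layer` steps
    X = mean_aggregate(X, neighbors)
```

The recorded spread shrinks as more propagation steps are applied and is close to zero after many steps, meaning that the node representations have become nearly indistinguishable.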

Multiscale on-Graph Tasks.
Graph mining tasks can be divided into node-level problems, graph- and subgraph-level problems, and signal-level problems according to their main objects. The critical point of node-level tasks is to learn efficient representations for each node, while learning representations for whole graphs is the key to graph-level tasks [135]. The key to signal-level tasks is to learn effective representations for different graph signals while the network structure remains unchanged. At present, most graph convolutional neural networks are designed for node-level tasks, and less attention is paid to graph-level and signal-level tasks.

Dynamically Changing Graph Data.
In practical scenarios, the network is often dynamic. This dynamism includes node and edge features that change constantly over time and a network structure that changes constantly (new edges appear, nodes join the network, and nodes and edges disappear from the network) [136]. Taking the dynamics of the network into account is also a trend in graph mining algorithms. Current graph convolutional neural networks are designed for static networks, so designing a graph convolutional neural network that can model the dynamic changes of the network is also an important future direction.

The Complex Nature of Graph Data.
In practical scenarios, networks often have complex structural characteristics. Examples include multiple node types, complex features on the edges, and the community structure of the network. Although many works have proposed solutions, each of them is designed for one particular characteristic [137]. Whether a network can be designed to model several complex characteristics simultaneously is also a question worthy of discussion. GCN employs mean pooling. As a result, it cannot distinguish aggregation over, say, the two different multisets (a, b) and (a, a, b, b). Mean pooling produces the same result for both multisets, so it is not injective. Because of mean pooling, GCN is unable to distinguish a node receiving messages from two other nodes from a node receiving messages from four other nodes, so this structural difference is lost [133, 138-140].
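The multiset example can be checked directly. In the sketch below, the concrete feature vectors chosen for a and b are assumptions, and the sum aggregator is included only as a contrast to mean pooling; it is not part of the GCN described above.

```python
def mean_pool(messages):
    """Average the incoming neighbor messages feature by feature."""
    return [sum(col) / len(messages) for col in zip(*messages)]

def sum_pool(messages):
    """Add up the incoming neighbor messages feature by feature."""
    return [sum(col) for col in zip(*messages)]

a, b = [1.0, 0.0], [0.0, 1.0]   # hypothetical neighbor feature vectors
two_neighbors = [a, b]           # multiset (a, b)
four_neighbors = [a, a, b, b]    # multiset (a, a, b, b)

# Mean pooling maps both multisets to [0.5, 0.5]
same_under_mean = mean_pool(two_neighbors) == mean_pool(four_neighbors)
# Sum pooling yields [1.0, 1.0] versus [2.0, 2.0]
same_under_sum = sum_pool(two_neighbors) == sum_pool(four_neighbors)
```

Under mean pooling the two neighborhoods are indistinguishable, while a sum keeps them apart because it retains the number of incoming messages.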

Adversarial Attacks on Graph Neural Networks.
Neural networks shine in various tasks but still have instability problems. For example, adding a small amount of noise to a picture does not change its category to the human eye, yet the neural network may classify it as a different type. Designing a targeted sample that causes a machine learning model to make a misjudgment is called an adversarial attack. In the field of GNN, constructing adversarial samples using node features and network structure, and designing graph neural networks that can defend against adversarial attacks, are both important directions for future development.

Conclusion
Graphs are a powerful and rich structured data type with strengths and challenges that differ greatly from those of images and text. In this study, we have outlined some of the milestones that researchers have reached in developing neural network-based models that process graphs. We have gone over some of the key design decisions that must be made when employing these architectures, and hopefully, the GNN playground can provide some insight into the empirical outcomes of these decisions. The recent success of GNNs opens the door to a wide range of new problems, and we are excited to see what the field will bring. The key points of this survey include the following: Supervised learning is mostly used for traffic/resource/index prediction, node classification, and other issues; reinforcement learning is mostly used for path selection, topology transformation/mapping, and other issues. (iii) The existing application goals mainly focus on node-level tasks. The output features are mostly the features of nodes or the overall indicators of the network and are rarely used for link-level tasks. (iv) Almost all existing applications are based on centralized learning, and it is necessary to obtain the information of all nodes before learning.

Table 7:
[122] proposes a novel graph convolutional network-based method for disease gene prioritization, CADA, based on the systematic embedding of the heterogeneous network formed by genes and diseases, as well as their unique characteristics.
HPOFiller: Liu et al. [123] proposed HPOFiller, based on GCN, for the human phenotype ontology. HPOFiller's central idea is to repeatedly run its two GCN modules over the three networks in order to refine the embeddings. Extensive empirical evidence, including cross-validation and temporal validation, demonstrates that HPOFiller significantly outperforms other state-of-the-art methods. The ablation study demonstrates that batch normalization makes the greatest contribution to performance.
Graph Attention Model (GAM): Lee et al. [124] proposed a graph attention model (GAM), which adaptively selects some "information nodes" and collects the information of the entire graph for the problem of graph classification.
Such et al. [125] directly defined the filter as a polynomial function of the graph adjacency matrix and proposed a Graph-CNNs model, which can handle heterogeneous and homogeneous graph data. On the molecular classification problem, it shows the best experimental results.
MolGAN: Cao et al. [126] proposed MolGAN, which designs molecular structures containing specific chemical properties through generative adversarial networks combined with graph convolutional neural networks.