Multipath Cross Graph Convolution for Knowledge Representation Learning

Most embedding-based entity prediction methods do not train on local core relationships, which weakens end-to-end training. To address this problem, we propose an end-to-end knowledge graph embedding representation method that combines local graph convolution with global cross learning, called the TransC graph convolutional network (TransC-GCN). Firstly, multiple local semantic spaces are divided according to the largest neighbor. Secondly, a translation model is used to map the local entities and relationships into a cross vector, which serves as the input of the GCN. Thirdly, through training and learning of local semantic relations, the best entities and strongest relations are found. The optimal entity-relation combination ranking is obtained by evaluating the posterior loss function based on the mutual information entropy. Experiments show that the proposed method obtains local entity feature information more accurately through the convolution operation of the lightweight convolutional neural network. In addition, the maximum pooling operation helps to capture the strong signal in the local features, thereby avoiding globally redundant features. Compared with the mainstream triple prediction baseline models, the proposed algorithm effectively reduces the computational complexity while achieving strong robustness. It also increases the inference accuracy of entities and relations by 8.1% and 4.4%, respectively. In short, the new method not only effectively extracts the local node and relationship features of the knowledge graph but also satisfies the requirements of multilayer penetration and relationship derivation of a knowledge graph.


Introduction
With the increasing construction of giant knowledge graphs, graph neural networks (GNNs), graph convolutional networks (GCNs) [1], and other neural networks that originally performed well on graphs struggle to scale, and computing the adjacency matrix of the full graph has become a problem. In many task scenarios, the entities in the graph have close and important relationships with surrounding entities [2][3][4] and may have nothing to do with entities beyond a few steps. For example, the TransE [5] series mostly considers only the direct relationship between entities. However, the rich and complex multistep relationships between entities in the knowledge graph are of great value for improving the quality of knowledge graph embedding. In a sense, the value of an entity lies in its interaction with other entities. This relationship can be quantitative or qualitative.
In other words, the same entity has relatively stable characteristic attributes in a fixed scene, and the relationship path is a necessary supplement to the information of the entity. The multilayer type of the entity mapped by its relationship is significant in knowledge representation learning or logical reasoning [6]. Moreover, various works have also been developed that support entity and relationship prediction [7][8][9]. For example, Hogan [10] replaced entities with canonical labels for skolemising existential nodes. Zhao et al. [11] proposed an effective method of using local relationships in entity type prediction.
A large number of knowledge graph cases show that a node often has a strong semantic relationship with a small number of adjacent nodes. This differs from GCN, which tries to learn valuable information from the giant knowledge graph as a whole, like "seeing flowers in the fog" or "finding a needle in the sea". Through the training and learning of the model, we should enable it to accurately classify the types of entities and to predict and judge the attributes of the classified entities. An example is shown in Figure 1: <LeBron Raymone James, player number, ?>.
The simple dot product of feature vectors and linear classification calculations can cause certain feature loss, which makes entity classification and attribute prediction ineffective. As such, we propose a lightweight GCN for nonlinear cross learning of local knowledge graphs. Global iteration will certainly improve the coverage and accuracy, but often at the expense of the computational efficiency of the algorithm. At the same time, studies of the knowledge graph structure and baseline models of knowledge representation, such as TransE, TransR [12], and PTransE [13], consider 1-3 step relationships and show that the reasoning performance of the algorithm can be improved [14].
To sum up, PTransE has a significant impact on relational path embedding. It integrates multistep relational paths into knowledge representation learning, realizes information reasoning at the relational level, and improves the performance of knowledge graph completion. For path selection, a resource allocation algorithm is used. Although this algorithm is feasible for quickly obtaining an effective relationship path, it tends to concentrate resources on the entities reached in the first step. If the path through the first-step entity is selected, the relationship path will be fitted to the first step.
The main reason is that the co-occurrence relationship of global node features is ignored. Thus, a new local convolution and global crossover model named TransC-GCN is proposed in this study. It not only effectively extracts the local node and relationship features of the knowledge graph but also considers the needs of multilayer penetration and relationship derivation of the knowledge graph.
We are committed to embedding giant graphs and high-dimensional entities into low-dimensional entity type relationships. Figure 3 presents an example of learning and crossing through multiple local semantic spaces with similar classes, finally achieving two goals: (1) entity type judgment and classification and (2) entity name prediction.
Based on the research goals, we consider further adding local knowledge relationship path features on the basis of PTransE. At the same time, we combine multiple local crosses to realize the combination of local deep learning of relationship paths and global fusion representation. Through the graph convolution learning of local entity relationships, not only can the hidden entities and relationship features in the local graph be discovered but also the local knowledge representation effect can be improved. The continuous improvement of local knowledge representation ability will improve the overall knowledge learning performance and strengthen the knowledge reasoning ability. The small-scale local graph convolution application can avoid global overfitting while shortening the long calculation time.
In order to achieve the above goals, we need to address two challenges: (1) Partial Division Problem. If the part is too small, it will increase the amount of graph convolution calculation. Conversely, if the part is too large, the local features would be too coarse. Therefore, partial division is the primary challenge.
(2) Cross Loss Function. The iterative application of the local GCN improves the learning effect of local features by defining the loss function and constraining the optimization during the local crossover process. This is key to ensuring the quality of the model.
Our contributions can be summarized as follows: (1) Using graph convolution combined with the out degree and in degree of the knowledge graph to iteratively calculate the local range, which can prevent the local division from being too large, we limit the local calculation to a certain threshold range related to the global graph structure.
(2) The joint loss function is constructed from the knowledge prior probability, the posterior probability, and the local cross entropy, and is calculated by normalization.

GCN Full Graph Reasoning.
Kipf et al. [15] introduced spectral GCNs for semisupervised classification of graph structure data and applied convolution operations to calculate new feature vectors for each node from its neighborhood information. The drawback is that GCN needs to load the entire graph to train on the information and requires the training data to be unified with the verification data. Because GCN combines the features of nearby nodes, it depends on the structure of the graph, which limits the generalization ability of the trained model on other graph structures. Ermis et al. [16] argue that predicting the link relationship between nodes in a graph helps to better study the entire graph network. For example, Wu et al. [17] used GCN to express the relationship between users and projects in the user-project structure diagram. Wang et al. [18] used GCN in KGs to improve the recommendation effect, at the risk of overfitting and GCN performance degradation due to the lack of regularization. Wang et al. [19] realized the alignment of cross-language knowledge graphs through graph convolutional networks.
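To make the full-graph dependency concrete, the following is a minimal sketch of one GCN propagation step in the widely used form ReLU(D^-1/2 (A + I) D^-1/2 X W). It is written in plain Python for illustration only; the helper names `matmul` and `gcn_layer` and the toy graph are assumptions of this sketch, not part of the cited work. Note that the whole adjacency matrix must be available before a single layer can be evaluated.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, feats, weight):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W). Hypothetical helper."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Inverse square root of each node degree (self-loop included).
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    # Symmetrically normalised adjacency matrix.
    a_norm = [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
              for i in range(n)]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    hidden = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]

# Toy graph (assumed for illustration): a path 0 - 1 - 2, one feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [2.0], [3.0]]
weight = [[1.0]]
print(gcn_layer(adj, feats, weight))
```

Because `a_norm` is built from the complete adjacency matrix, the cost grows with the size of the whole graph, which is the limitation that motivates local processing.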

Neighbor Sampling Learning.
Graph Sample and Aggregate (GraphSAGE) is the most representative method in terms of uniform sampling of neighbor nodes and local node aggregation, as shown in Figure 4. By training a function that aggregates the neighbors of the sampled subgraph nodes, GCN is extended to inductive learning, thereby generalizing to unseen nodes [18].
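The sampling-then-aggregation idea can be sketched as follows. This is a simplified illustration, assuming a mean aggregator and omitting the learned weight matrix and nonlinearity that a full GraphSAGE layer would apply; the function names and the toy graph are hypothetical.

```python
import random

def sample_neighbors(graph, node, k, rng):
    """Uniformly sample up to k neighbours of `node` without replacement."""
    nbrs = graph.get(node, [])
    if len(nbrs) <= k:
        return list(nbrs)
    return rng.sample(nbrs, k)

def mean_aggregate(graph, feats, node, k, rng):
    """Concatenate a node's features with the mean of sampled neighbour features."""
    nbrs = sample_neighbors(graph, node, k, rng)
    dim = len(feats[node])
    if nbrs:
        mean = [sum(feats[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
    else:
        # Isolated node: fall back to a zero neighbourhood vector.
        mean = [0.0] * dim
    return feats[node] + mean

# Toy adjacency lists and features (assumed for illustration).
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
feats = {"a": [1.0], "b": [2.0], "c": [4.0]}
print(mean_aggregate(graph, feats, "a", k=5, rng=random.Random(0)))
```

Since the aggregation only touches a bounded number of sampled neighbours, the same function can be applied to a node that was not present during training, which is what makes the approach inductive.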
GAT [20] uses the attention distribution metric of neighbor nodes to weight and aggregate the local implicit information of the adjacency matrix. The local graph embedding representation of the central node is composed of the feature representation of the central node and that of neighbor nodes. Through the splicing of node vectors, the feature representation of the center node can be iteratively updated, and then the feature representation of all nodes on the graph can be updated. In essence, GAT uses the feature aggregation function of the attention weight of neighbor nodes instead of the normalized function of GCN.
Unlike GCN, GAT allows implicitly assigning different importance to neighbors of the same node, and the learned attention helps the interpretability of the model.
Computational Intelligence and Neuroscience
The operation of GAT is point-by-point, and it is unnecessary to visit the global graph structure in advance. Therefore, it is suitable for inductive tasks. Important nodes in the graph and relations between nodes help to filter the noise between the neighbors of nodes and improve the interpretability of model results.
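The attention-weighted aggregation described above can be illustrated with a small sketch. It assumes the common GAT formulation in which a score is computed from the concatenation of the center and neighbor features, passed through LeakyReLU, and normalized with softmax; the learned linear transform is omitted for brevity, and all names are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU with a small negative slope."""
    return x if x > 0 else slope * x

def attention_aggregate(h_center, h_neighbors, attn_vec):
    """Return (attention weights, weighted sum of neighbour features)."""
    # Score each neighbour from the concatenation [h_center || h_neighbor].
    scores = [
        leaky_relu(sum(a * x for a, x in zip(attn_vec, h_center + h_n)))
        for h_n in h_neighbors
    ]
    # Softmax over the neighbours, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(h_center)
    out = [sum(al * h_n[d] for al, h_n in zip(alphas, h_neighbors))
           for d in range(dim)]
    return alphas, out

# Toy features and attention vector (assumed for illustration).
alphas, out = attention_aggregate([1.0], [[2.0], [4.0]], attn_vec=[0.5, 1.0])
print(alphas, out)
```

Only the center node and its neighbors are read, so no global graph structure is required, in contrast to the degree-based normalization of GCN.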

Embedding.
A wide range of knowledge graph embedding techniques has been proposed. Based on the idea of TransE, Trans(D,R) [21] defines a projection matrix $M_r$ for the relationship $r$ of each triplet <h, r, t> from the perspective of the relationship difference, and the head and tail entities are projected into the corresponding relational space. Then, TransE is used for translation. However, the head entity and the tail entity share $M_r$ in the same triple, so there is no distinction between the head entity and the tail entity.
For example, in <LeBron Raymone James, work for, Los Angeles Lakers>, LeBron Raymone James is a person's name, whereas Los Angeles Lakers represents a collective. According to the above methods, <James Cameron, director, Titanic> and <James Cameron, director, Avatar> are two triples with the same head entity and relationship, so their tail entities satisfy Titanic ≈ Avatar. Obviously, this is inconsistent with the facts. To solve this problem, TransD considers the difference between the head and tail entities. Similar to TransR, the head and tail entities are respectively projected into the relation $r$ space; then, $M_{rh} h$ and $M_{rt} t$ are obtained. CrossE [22] uses a relational interaction matrix C to generate the interaction vector of the head entity and the relationship and then uses the vectors of these two interaction representations to predict the tail entity.
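The Titanic and Avatar problem can be reproduced with a few lines of code. The sketch below uses the standard TransE distance ||h + r - t|| and the TransD projection (r_p e_p^T + I) e for equal entity and relation dimensions; the two-dimensional vectors are made-up values chosen only to illustrate the effect.

```python
import math

def transe_score(h, r, t):
    """TransE distance ||h + r - t||_2; a lower value means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def transd_project(e, e_p, r_p):
    """TransD projection (r_p e_p^T + I) e, assuming equal dimensions."""
    dot = sum(a * b for a, b in zip(e_p, e))
    return [dot * rp + ei for rp, ei in zip(r_p, e)]

# Made-up embeddings (assumed for illustration only).
cameron = [1.0, 0.0]
director = [0.5, 0.25]
titanic = [1.5, 0.25]   # equals cameron + director
avatar = [1.5, 1.25]    # a different tail for the same head and relation

print(transe_score(cameron, director, titanic))
print(transe_score(cameron, director, avatar))
```

With a single translation vector per relation, both tails can only reach a zero score if their embeddings coincide. Entity-specific projections such as `transd_project` give the head and tail different images in the relation space, which relaxes this constraint.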

Algorithm Model
In this section, we propose an end-to-end knowledge graph convolutional cross embedding representation method (TransC-GCN), as shown in Figure 5. Firstly, multiple local semantic spaces are divided according to the largest neighbor, and then a translation model is used to map the local entities and relationships into a cross vector, which is used as the input of GCN. Through training and learning of local semantic relations, the best entity and the strongest relationship are found. Finally, the optimal entity relationship combination output is evaluated through the posterior loss function based on the mutual information entropy. The framework is mainly composed of 4 parts: (1) partial knowledge graph learning partitioning and embedding representation; (2) performing GCN coding on the partial graph and using the combined node relationship of the partial graph as input; (3) cross aggregating multiple partial knowledge graphs along with key relationships into reasoning nodes; and (4) sorting prediction nodes for multiple relationship paths.

Subgraph Division.
There are often complex relationships between nodes in a local knowledge graph. Therefore, following the minibatch method, a set of local subgraphs is built around central nodes. In addition, all nodes in the q-order subgraph of B are presampled and stored during the traversal. Features and labels are propagated only within the local graph. So, we define the subgraph division as follows.
Assume $|E| = n$ and $|R| = m$. Define its incidence matrix $A \in \mathbb{R}^{n \times m}$ as follows:
Definition 2. The number of nodes is $n < \infty$. Use an $n \times n$ matrix to represent the adjacency matrix of the graph $G$, which is defined as $A(G) = (a_{i,j})$, where $n$ is the order of the graph. The set of eigenvalues of $A(G)$ is called the spectrum of the graph.
where $D(e_i^{(c)}, e_j^{(c)})$ denotes the Euclidean distance between the center node $i$ and the neighbor node $j$ and $d^{\circ}$ is the i . The schematic diagram of the process is shown in Figure 6. Algorithm 1 shows the details of the calculation process.
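Since the listing of Algorithm 1 is not reproduced here, the following is only a rough sketch of how a local subgraph around a center node might be collected. It assumes a breadth-first expansion up to q hops in which each node keeps at most `max_neighbors` neighbors ranked by Euclidean distance in the embedding space. The function names, the cap parameter, and the toy data are all hypothetical and should not be read as the authors' exact procedure.

```python
import math
from collections import deque

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def local_subgraph(graph, emb, center, q, max_neighbors):
    """Collect the nodes within q hops of `center`, expanding at most
    `max_neighbors` closest neighbours (by embedding distance) per node."""
    visited = {center}
    frontier = deque([(center, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == q:
            continue  # Do not expand beyond the q-th order.
        # Rank the neighbours of this node by distance to the node itself.
        ranked = sorted(graph.get(node, []),
                        key=lambda n: euclidean(emb[node], emb[n]))
        for nbr in ranked[:max_neighbors]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return visited

# Toy graph and embeddings (assumed for illustration).
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
emb = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [5.0, 0.0], "d": [1.0, 1.0]}
print(sorted(local_subgraph(graph, emb, "a", q=2, max_neighbors=2)))
```

The two parameters expose the trade-off named in the challenges: a small q or cap yields many tiny subgraphs, while large values make each local graph approach the full graph.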

Local Relation Graph Convolutional Coding.
GCN can obtain the local information more accurately. The maximum pooling operation helps to capture the strong signal in the local features, thereby avoiding noise interference from globally redundant features. Average pooling is used for neurons. The neighbor nodes connected to the central node are embedded as the input of one neural network layer. Average pooling is used to eliminate the sparseness or overfitting problems of local nodes and relationship features. So, we use one fully connected neural network layer and maximum pooling. Then, the output vector is multiplied by the feature vector of the central node to obtain a local embedding representation.
Here, $A_m$ is the connection matrix for label $m$, $m \in \{1, 2, \ldots, M\}$, and $D_m^{-1/2}$ is the inverse square root of the degree matrix corresponding to $A_m$; $h$ is the weight matrix between hidden layers, and $b_s$ and $\epsilon^{\circ}$ are the biases. The nonlinear activation function $\sigma$ is ReLU. Figure 7 shows the partial relationship diagram of the convolutional coding process.
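The local coding step (one fully connected layer, maximum pooling over the neighbors, then multiplication with the center node features) can be sketched as follows. The text does not specify the kind of product, so an element-wise product is assumed here; the weights, biases, and function names are illustrative only.

```python
def dense_relu(x, weight, bias):
    """Fully connected layer followed by ReLU: max(0, W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weight, bias)]

def local_embedding(center, neighbors, weight, bias):
    """Encode neighbours, max-pool across them, and combine with the centre node."""
    hidden = [dense_relu(n, weight, bias) for n in neighbors]
    # Max pooling keeps the strongest signal per dimension across neighbours.
    pooled = [max(col) for col in zip(*hidden)]
    # Element-wise product with the centre features (an assumption of this sketch).
    return [p * c for p, c in zip(pooled, center)]

# Identity weights and toy features (assumed for illustration).
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
neighbors = [[1.0, -2.0], [0.5, 3.0]]
print(local_embedding([2.0, 0.5], neighbors, weight, bias))
```

Replacing `max(col)` with the column mean would give average pooling, which makes it easy to compare the two pooling choices discussed above on the same inputs.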

Graph Relational Cross Matrix.
In order to eliminate the overfitting of some important relationship nodes caused by the division of local graphs, multiple heads can be used to calculate C subgraph branches in parallel.
Then, all the subgraph branches can be defined by the local cross matrix:
Definition 4. The graph cross matrix scaling factor is defined by the matrix mutual interference parameter [23], which measures the mutual interference between different column vectors of the local cross matrix. Specifically, the similarity relationship between the local graph structures can be found:
From (9) and (10), the graph cross matrix of the fused local graph structure is given by $C = \rho C_{\alpha}$.
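One common way to quantify interference between the columns of a matrix is the mutual coherence, that is, the largest absolute cosine similarity between two distinct columns. Whether this is exactly the parameter of [23] is an assumption of this sketch; the function name and the sample columns are hypothetical, and all columns are assumed to be nonzero.

```python
import math

def mutual_coherence(columns):
    """Largest absolute cosine similarity between two distinct nonzero columns."""
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            norm_i = math.sqrt(sum(a * a for a in columns[i]))
            norm_j = math.sqrt(sum(b * b for b in columns[j]))
            best = max(best, abs(dot) / (norm_i * norm_j))
    return best

# Each inner list is one column vector of a local cross matrix (toy values).
print(mutual_coherence([[1.0, 0.0], [0.0, 1.0]]))   # orthogonal columns
print(mutual_coherence([[1.0, 0.0], [2.0, 0.0]]))   # parallel columns
```

A value near 0 indicates that the local graph branches carry nearly independent information, while a value near 1 indicates strongly overlapping branches.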

Dynamic Node Prediction.
The normalized attention coefficient is used to sum the weighted features as the preliminary output feature of each node:
Local subgraph prediction node output:

Triple Knowledge Embedding Loss Function.
The cross-correlation score of entities and relationships within a triplet is used as the objective of knowledge representation optimization.
Here, the formal scoring function is given by

Cross Convolution Loss Function.
A graph convolution operator is applied inside each local knowledge graph, and multiple local graphs are cross fused based on the information divergence. The local knowledge graph features learned by the local convolution model are fed into the divergence fusion cross model. At the same time, we use the gradient optimization algorithms AdaGrad and stochastic gradient descent (SGD) to construct the training modules. The estimated difference of partial subgraph output is defined as follows:
The loss function of the global cross training error rate for the supervised training optimization model is shown as follows:

Joint Loss Function.
In the training process, to effectively supervise the feature loss caused by local convolution of the knowledge graph and the partial filtering loss when the divergence fusion crosses, the two tasks in the model are combined for training. This can improve the end-to-end training of the model. In summary, the joint loss function is written as
where $c \in [0, 1]$ is a hyperparameter that adjusts the balance between coding and partial crossover and $\theta^{\circ} = \{\theta^{\circ}_1, \theta^{\circ}_2, \theta^{\circ}_3\}$ is the adjustment parameter.
Algorithm 1: Subgraph generation algorithm.
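The exact form of the joint loss is not reproduced in this text. As a rough illustration of how a hyperparameter $c \in [0, 1]$ can balance two training tasks, the sketch below assumes a simple convex combination of a local coding loss and a cross-fusion loss. The function name, the argument names, and the convex form itself are assumptions of this sketch, and the adjustment parameter $\theta^{\circ}$ is left out.

```python
def joint_loss(coding_loss, cross_loss, c):
    """Convex combination c * coding + (1 - c) * cross (assumed form)."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return c * coding_loss + (1.0 - c) * cross_loss

# Toy loss values (assumed for illustration).
for c in (0.0, 0.5, 1.0):
    print(c, joint_loss(coding_loss=2.0, cross_loss=4.0, c=c))
```

Setting c to 1 trains only the local coding task and setting it to 0 trains only the cross-fusion task, so intermediate values trade one off against the other within a single end-to-end objective.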

Experiments
To further verify the effectiveness of TransC-GCN, we adopt the same experimental setting as Bordes et al. [24] for entity prediction. As evaluation indices, the mean reciprocal rank (MRR) and the hits-at-10 rate (HITS@10) of the predicted entities are considered.
Following Shimaoka et al. [25], we use pretrained word vectors as the initialization and optimize the parameters with the Adam optimizer [26]. We use TransC-GCN to generate the representation vector of the triple. To avoid overfitting, we add dropout to the neurons of GCN and randomly inactivate neurons of the vector iteration. Furthermore, we compare our model with multiple baseline models under different parameter settings on the two tasks of entity type classification and entity attribute prediction. Moreover, we compare it with the baseline model [27], where the ConvKB (https://github.com/daiquocnguyen/ConvKB) and ConnectE-E2T + TRT (https://github.com/Adam1679/ConnectE) programs were run.

Datasets.
To make the comparison more pertinent, we use the datasets extracted from the text relationship [28] as our research object. The characteristics of the three datasets are shown in Table 1. The visualization of the dataset intuitively shows that the number of entities and relationships and the degree of association have a great impact on the local knowledge graph. Figure 8 shows that, among the three datasets, the entities and relationships of YAGO43kET are dense and rich. In contrast, WN18 has sparse entities and lacks relationships.

Evaluation Index.
In order to verify TransC-GCN, we refer to two typical evaluation methods [31]. Formally, the mean reciprocal rank (MRR) is defined as
$$\mathrm{MRR} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{\mathrm{Rank}_i},$$
where $N$ is the triplet number for the training dataset and $\mathrm{Rank}_i$ is the rank assigned to the $i$th correct classification entity. The hit rate Hits@K ($K = 1, 3, 10$) measures the ability, in traversal training, to obtain the correct triple entity prediction classification within $K$ replacements.
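Both metrics are straightforward to compute once the rank of each correct entity is known. The sketch below uses the standard definitions (MRR as the mean of 1/rank, Hits@K as the fraction of ranks not exceeding K); the helper `rank_of_correct` assumes that a lower score means a better candidate, which is a convention chosen for this example.

```python
def rank_of_correct(scores, correct_index):
    """1-based rank of the correct candidate when lower scores are better."""
    target = scores[correct_index]
    return 1 + sum(1 for s in scores if s < target)

def mean_reciprocal_rank(ranks):
    """Mean of 1 / rank over all test cases."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test cases whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy ranks for three test triples (assumed for illustration).
ranks = [1, 2, 4]
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, 1), hits_at_k(ranks, 3), hits_at_k(ranks, 10))
```

MRR rewards placing the correct entity near the top of the list, whereas Hits@K only checks whether it appears among the first K candidates.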

Model Parameters.
We conducted experiments on the training sets and the validation sets, as shown in Table 2. We explored different combinations of parameter settings. For each dataset, the setting with the best average ranking score on the corresponding validation set was selected.

Classification Prediction.
Comparisons with the tail entity and head entity prediction baseline models of the Bilinear, MLP, and Trans series are shown in Table 3. WN18RR, with dense entities and poor relationships, and YAGO43kET, with dense entities and rich relationships, are selected. For relationship prediction verification, FB15kET, with sparse entities and rich relationships, is used. As shown in Figures 9 and 10, TransC-GCN is better than the compared entity prediction models in both recall and quality indicators. The aggregation and intersection of key paths on the local knowledge graph can effectively improve the efficiency and quality of node prediction. The prediction index results of the head entity and tail entity of the triple completion are close to each other, which shows that the aggregation of critical paths based on local graphs has strong applicability for node prediction.
Figure 11 shows that TransC-GCN also performs well in predicting the recall rate and quality of relationship evaluation. It not only considers the relationship path but also, more importantly, learns the characteristics of the knowledge graph through GCN, which can eliminate, to a certain extent, the random prediction of the model probability caused by the lack of triple entities or relationships. This provides richer necessary information for relationship path and entity prediction.
In Figure 12, two different training optimization methods, TransC-GCN_SGD and TransC-GCN_AdaGrad, are compared on the same dataset. We can observe from Table 4 and Figure 13 that the number of entities and the number of relationships have a significant impact on the convolution and crossover of local knowledge graphs. In addition, the comparison result shows that maximum pooling is indeed better than average pooling in solving feature redundancy. In a graph with sparse entities and a lack of relationships, there is a serious data sparsity problem, which leads to the long-tail distribution of entities and relationships.
However, we take advantage of the critical path in the local graph, and the structure of the global graph is used.

Case Study.
We have shown that TransC-GCN can handle large-scale knowledge graphs and entity-relationship representation learning. In Figure 14, we provide an example of cross inference about local relationship paths. The core entity Savannah James is found through local cross learning. Two weak-strength relations (pay attention or like) are obtained by reasoning. We can find that Savannah James likes the 23 athletic undergarment bra of Zoe Saldana.
Figure 13: Comparison of baseline models of entity type prediction accuracy (FB15kET (%), YAGO43kET (%)).
Because Savannah James likes Zoe Saldana, she appreciated Avatar. Maybe she likes the athletic undergarment bra, which is of the brand Zoe Saldana. Of course, Savannah James must pay more attention to the player number 23, so she may want a signed 23 athletic undergarment bra of Zoe Saldana.

Conclusion and Future Work
In this paper, we propose a TransC-GCN method based on local convolution and global crossover for knowledge graph completion. ① This is the first time that local GCN and TransC are combined for knowledge graph representation learning. ② TransC-GCN can not only divide the huge knowledge graph into several local knowledge graphs and use convolutional neural network coding for more intelligent and subtle local knowledge graph feature learning with strong semantic relations but also realize the convolution of adjacent nodes and relationship features. ③ Pooling and filtering data noise provide a new and efficient method for node relationship prediction and classification. ④ TransC-GCN considers the information value of nearby local knowledge graphs. We propose a parallel method of cross fusion of local knowledge graphs based on divergence, which combines local knowledge graphs and global knowledge graphs more flexibly in representation learning. In a number of challenging baseline model test comparisons, TransC-GCN shows excellent performance in entity reasoning accuracy and generalization ability and is lightweight. However, there are deficiencies in entity diversity learning: (1) When AdaGrad is adopted to start training, the squares of the gradients are accumulated, which causes the effective learning rate of GCN to decrease prematurely and excessively. (2) After inactivating neurons with small gradients, ReLU is used as the activation function, resulting in a loss of diversity. For future work, we will try to combine the pattern sequence ordering and the graph context to optimize TransC-GCN. Meanwhile, based on TransC-GCN, we intend to propose a new scene recommendation algorithm based on the combination of graph embedding and collaborative filtering.
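The AdaGrad limitation mentioned above follows directly from its update rule, in which the step size for a parameter is divided by the square root of the running sum of its squared gradients. The sketch below tracks that effective step size for a single parameter; the function name, the base learning rate, and the constant gradient sequence are illustrative assumptions.

```python
import math

def adagrad_effective_rates(grads, lr=0.1, eps=1e-8):
    """Effective step size lr / (sqrt(G_t) + eps) after each gradient, where
    G_t is the running sum of squared gradients of one parameter."""
    accum = 0.0
    rates = []
    for g in grads:
        accum += g * g  # The accumulator never decreases.
        rates.append(lr / (math.sqrt(accum) + eps))
    return rates

# Constant gradients of magnitude 1 (assumed for illustration).
for step, rate in enumerate(adagrad_effective_rates([1.0] * 5), start=1):
    print(step, rate)
```

Because the accumulator only grows, the effective step size shrinks monotonically even when the gradients stay informative, which is consistent with the premature learning rate decay noted in deficiency (1).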

Data Availability
Some or all data, models, and codes generated or used during the study are available from the corresponding author upon request.