Representation Learning Method with Semantic Propagation on Text-Augmented Knowledge Graphs

Knowledge graph representation learning aims to provide accurate entity and relation representations for tasks such as intelligent question answering and recommendation systems. Existing representation learning methods, which only consider triples, are not sufficiently accurate, so some methods use external auxiliary information such as text, type, and time to improve performance. However, they often encode this information independently, which makes it challenging to fully integrate this information with the knowledge graph at a semantic level. In this study, we propose a method called SP-TAG, which realizes semantic propagation on text-augmented knowledge graphs. Specifically, SP-TAG constructs a text-augmented knowledge graph by extracting named entities from text descriptions and connecting them with the corresponding entities. Then, SP-TAG uses a graph convolutional network to propagate semantic information between the entities and the new named entities so that the text and triple structure are fully integrated. The results of experiments on multiple benchmark datasets show that SP-TAG attains competitive performance. When the number of training samples is limited, SP-TAG maintains its high performance, verifying the importance of text augmentation and semantic propagation.


Introduction
Knowledge graphs (KGs) are structured graph databases, usually large in scale, with many entities, relations, and triples. KGs are useful for intelligent search [1], recommendation systems [2,3], intelligent question answering [4,5], and other applications. Common KGs include Freebase [6], YAGO [7], and WordNet [8]. Although these KGs are huge in scale and cover a wide range of information, they still face problems such as missing data and incomplete semantics, limiting the performance of subsequent applications. Therefore, semantically completing the missing facts in KGs is an important task.
KG representation learning (KGRL) is an effective and practical approach to predicting missing facts, and in recent years it has become an important research direction in the KG field. KGRL embeds the entities and relations of a KG into vectors and then predicts potential triples through vector computation.
In early KGRL methods such as TransE [9] and TransH [10], researchers embedded entities and relations using only triples; however, the ability of these methods to represent complex relations is limited. Hence, researchers have proposed various improvements, such as RotatE [11] and ConvE [12]. However, these methods may be difficult to train when the number of triples involving specific entities or relations in the KG is limited: the model may not fully learn the features of the KG elements from limited training samples, which affects the training effect.
To solve this problem, many methods integrate external auxiliary information such as type, time, text, or images into the model. They usually encode the auxiliary information or expand triples into "quadruples," endowing entities with rich information to compensate for the lack of training samples. Text corpora contain particularly rich semantic information, and researchers have proposed a variety of text-based methods, such as NTN [13], DKRL [14], ConMask [15], and TEGER [16]. These methods integrate the semantic information within text into entity representations to improve representation performance but retain the following shortcomings: (1) Text information does provide a semantic supplement to entities, but existing methods tend to encode the text information independently of the triple structure, which does not combine the semantic and triple structure information well. Moreover, the sparsity of KGs, which makes it difficult to share semantic information between related entities, remains a problem.
(2) Entities in the same triple tend to be semantically associated. Modelling this association information can capture features at the entity-pair level, benefiting representation learning. However, existing methods inadequately represent such associations. (3) These methods generally use traditional word embeddings to encode text information, so each token has only one representation even in different contexts. This lack of contextual semantic information makes them insufficient for encoding text description sequences.
In this study, we propose SP-TAG, a KGRL method with semantic propagation on a text-augmented knowledge graph. SP-TAG embeds the triple structure and text description information in the same semantic space. Named entity recognition (NER) [17], as well as relation classification (RC) [18,19], is an important task in the process of constructing KGs from unstructured text. Inspired by this process, for each entity in the KG, we first use NER to extract the entities mentioned in its text description (called text entities, as opposed to the existing entities in the knowledge graph, i.e., original entities). The text entities are then connected to the original entities in the KG to construct a text-augmented KG, which contains the information from both the triples and the text. We then use a pretrained language model to obtain the initial feature embeddings of the nodes in the text-augmented KG, which incorporates semantic information into the model. Next, a graph convolutional network (GCN) [20] is used to propagate the semantic features of the entities so that the text-based representation contains semantic information from other nodes. Finally, we jointly learn the structure- and text-based representations of the entities in the same vector space using a gate mechanism.
The main contributions of this study are as follows: (1) The proposed SP-TAG extracts text entity nodes from the text descriptions of entities via NER to construct a text-augmented KG, which increases the average number of edges around each entity and reduces sparsity. (2) SP-TAG uses a pretrained language model to initialize the semantic features of the nodes. It also uses a GCN to propagate semantic features among entities to better integrate structural and textual information, thereby improving the semantic associations between entities. SP-TAG can be combined with existing KGRL methods to improve their performance. (3) We conducted experiments on multiple benchmark datasets to compare SP-TAG with methods that only consider triples or that integrate text information. The results also show that SP-TAG is more reliable with few training samples because of its augmentation and propagation characteristics.

Related Works
This section reviews three types of KGRL methods: methods that use only triples, methods that integrate text information, and the key technique used in our method, i.e., GCNs.

Methods Based on Triples.
These methods can be divided into methods based on translation, rotation, and neural networks (Figure 1). A classic translation method is TransE [9]. For each triple, TransE considers the relation r to be a translation operation from head entity h to tail entity t in vector space and uses a distance function to calculate the score, which in turn measures the confidence that the triple is true. The score function is as follows:

f(h, r, t) = ||h + r − t||,

where a smaller distance indicates a more plausible triple. TransE cannot effectively deal with complex relations such as 1-to-N, N-to-1, and N-to-N relations. TransH [10], TransR [21], and TransD [22] employ different projection strategies to improve the representation ability of this approach and handle such complex relations. TransH [10] maps the head and tail entities to different relation hyperplanes for further calculation; TransR [21] embeds entities and relations into entity and relation spaces, respectively; and TransD [22] constructs a dynamic mapping matrix to reduce the number of parameters of TransR. The concept in rotation methods originates from Euler's formula, e^{iθ} = cos θ + i sin θ, which is used to embed entities and relations into complex space. In RotatE [11], for each triple, t is obtained from h through a rotation operation of relation r, which greatly improves its ability to represent symmetry relations such as "classmates." This method also measures the confidence of the triple by calculating the distance, as follows:

f(h, r, t) = ||h ∘ r − t||,

where ∘ denotes the element-wise (Hadamard) product in complex space and each element of r has unit modulus. QuatE [23] improves on RotatE by extending the complex space to quaternion space. MRotatE [24] combines both entity and relation rotations to further improve its ability to represent complex relations.
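As a concrete illustration, the two distance-based score functions above can be sketched in NumPy. This is a toy version with hand-picked vectors; real models use learned embeddings and an L1 or L2 norm chosen as a hyperparameter.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE distance ||h + r - t||: lower means the triple is more plausible."""
    return np.linalg.norm(h + r - t, ord=1)

def rotate_score(h, r_phase, t):
    """RotatE distance ||h * r - t|| in complex space.

    h and t are complex embedding vectors; r_phase holds the rotation angles,
    so r = exp(i * r_phase) has unit modulus and acts as a pure rotation.
    """
    r = np.exp(1j * r_phase)
    return np.linalg.norm(h * r - t, ord=1)

h = np.array([0.2, 0.5])
r = np.array([0.1, -0.3])
t = h + r                       # a "perfect" triple: translation holds exactly
print(transe_score(h, r, t))    # → 0.0
```

A zero-phase relation leaves a complex embedding unchanged, so `rotate_score(x, zeros, x)` is likewise zero; larger distances indicate less plausible triples.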
Many KGRL models are based on neural networks. Typical examples include models that extract the deep features of triples using convolutional neural networks (CNNs), such as ConvE [12] and ConvKB [25]; models that learn long-distance KG relation dependencies using recurrent neural networks (RNNs), such as RSN [26]; and models that generate trajectory sequences by traversing KGs using generative adversarial networks (GANs), such as GRL [27]. These models perform well on real datasets, but their geometric interpretation is not as clear as that of translation- and rotation-based methods. Because a neural network is a "black box" [28], its interpretability is limited. Triple-based models perform reasonably well, but they often require sufficient training samples and are thus susceptible to sparsity in the KG. Moreover, such models do not fully utilize auxiliary information, making it difficult to accurately represent entity and relation semantics.

Methods Integrating Text Information.
In some KGs, entities have related text description information that does not exist in the triples, complementing the representation of the entities. To better utilize this information, existing methods usually encode it as a vector (referred to as the text-based representation here) for joint training with the triple-based entity representation (referred to as the structure-based representation here). Researchers originally used one-hot or co-occurrence matrices to encode text, but as the size of the text increases, the computational complexity and number of parameters rapidly increase, so this approach is not suitable for large corpora. The word embedding method Word2Vec [29] maps words into lower-dimensional dense vectors that retain text features and greatly reduce the computational complexity. In recent years, with the development of deep learning, CNNs [30], RNNs, and BERT [31] have often been used to learn and extract deep semantic features from text. BERT is a pretrained model based on a multilayer transformer architecture. It captures the contextual information of text sequences and represents semantics accurately and efficiently.
Models that integrate text information include DKRL [14] and Joint [32], which use a continuous bag of words and a CNN for text encoding; STKRL [33], which uses an RNN to obtain text sequence information; EDGE [34] and AATE [35], which are based on bidirectional long short-term memory (LSTM) networks; and TA-ConvKB [36], which uses a bidirectional LSTM with attention to encode the text. The text- and structure-based representations are generally combined when calculating the score functions. DKRL fuses the two representations using interleaving operations. Joint and AATE fuse the representations using combination mechanisms such as a gate structure and weight parameters. TA-ConvKB uses an LSTM to combine the representations. Pretrain-KGE [37] utilizes BERT to obtain textual representations of entities and relations, introducing semantic information into the triples.
Some models do not adopt the idea of a joint representation. For instance, ConMask [15] allocates attention by calculating the semantic similarity between each word of the head entity description and the relation, fuses the word representations of the head entity description with the relation representation to extract features, matches tail entities, and ranks entities according to similarity. ConMask does not exploit the structural information of triples. OWE [38] establishes a transformation matrix between the structure and text representation spaces and converts the text-based representation into a structure-based representation to calculate scores. KG-BERT [39] treats triple reasoning as a sequence prediction problem: by inputting the entity descriptions and relation into the transformer, KG-BERT determines whether the triple is correct according to the sequence output. These models increase their representation ability by integrating text information, but the text encoding process is often independent of the structure of the KG. It is difficult to fully integrate the text information and the KG structure, and thus the utilization of the semantics in the text remains insufficient. TEGER [16] uses TF-IDF to extract keywords from text descriptions and connects them to the KG to expand it. Then, TEGER follows the traditional TransE method to obtain the embeddings of entities and relations. It strengthens the connection between the text information and the knowledge graph and improves the link prediction results.

Graph Convolutional Network (GCN).
A graph is composed of nodes and edges; it does not have the regular grid structure of a matrix, and hence traditional CNN methods cannot be used directly for feature extraction. To extract rich information from graphs, the GCN [20] was proposed. Similar to a CNN, a GCN is essentially an aggregation of operations on neighborhood information and can be divided into three steps: the nodes send their information to their neighbor nodes, aggregate the information from their neighbors, and perform a nonlinear transform on the aggregated information. However, a GCN cannot be directly used for KGs because it ignores edge information, i.e., relations. The relational graph convolutional network (R-GCN) [40] introduces a neighbor node aggregation model based on a GCN that considers edge types. SACN [41] divides the entire graph into multiple subgraphs that each contain only one relation and then applies the GCN to each subgraph separately. TransGCN [42] combines the translation model and GCN to learn the representations of entities and relations and integrates the triple scoring function into the model. Models based on GCNs usually propagate information among the nodes in graph data to obtain more complex node neighbor features, which gives them strong potential in the field of KGRL.

Proposed Method
In this study, a KG is denoted as K = (E, R, T), where E, R, and T denote the sets of entities, relations, and triples, respectively. Each triple is defined as (h, r, t), where h, r, and t refer to the head entity, relation, and tail entity, respectively. Its vector embeddings are denoted by the bold symbols h, r, and t. The set of text entities extracted from the text descriptions is denoted as E_ner, and the text-augmented KG is K_TAG. Subscripts s and d represent the structure-based and text-based representations of an entity, respectively, e.g., h_s, t_s, h_d, and t_d.
As noted in Section 2, most existing methods encode text independently of the triple structural information.
This makes these methods retain only the semantic information in the text while ignoring the connection between the text and the knowledge graph structure. A few methods [16,32] connect keywords from the text to the KG, realizing a fusion of text and structure. However, on the one hand, these methods do not take into account the heterogeneous features of knowledge graphs during embedding; on the other hand, they obtain node representations using only the triple/graph structure (e.g., via TransE or graph neural networks) without preserving the semantics of the text information.
In response, we propose a representation learning method with semantic propagation on a text-augmented KG, called SP-TAG, which combines the text and the structure and introduces rich semantic information. SP-TAG consists of three parts: text-augmented KG construction, feature initialization and semantic propagation, and joint embedding. SP-TAG is illustrated in Figure 2, in which the upper-left part is the original KG K, where the blue squares represent the entities, and the bottom is the corresponding K_TAG, where the green and red squares represent the entities and text entities, respectively. The rounded rectangles with colored circles represent the embeddings of the corresponding elements.

Text-Augmented KG Construction.
We construct a text-augmented KG by extracting named entities from each entity's text description and connecting them to the KG. An entity e in a KG usually has a corresponding text description containing related information such as associated entities and attributes. We focus on these keywords in the text description and extract them.
Existing TF-IDF methods do not consider the specific content, number, or parts of speech of keywords, so inappropriate keywords can introduce noise. The nodes in the knowledge graph are entities with clear meanings, so we use named entity recognition to extract similarly meaningful entity nodes (called text entities) from the text. At the same time, considering the heterogeneous character of the knowledge graph, we want different edge types to be selected, according to the types of the text entities, when the extracted text entities are connected to the knowledge graph.
We use the open-source natural language processing tool spaCy (https://spacy.io/) to perform the NER operations. This popular tool has achieved good results on several natural language processing evaluation tasks and handles the preliminary text processing, allowing us to focus on knowledge graph representation learning. Following spaCy's settings, we select 11 common entity types and extract text entities of these types from the text description of each original entity. The 11 types are described in Table 1.
Usually, there is an explicit or latent semantic correlation between the text entities and the original KG entities. Therefore, we use two directed edges (one in each direction) to connect them, supporting two-way semantic propagation. For these new connections, we do not assign specific relation semantics; the edges are distinguished only by the text entity types.
When a text entity is connected to the KG, the number of nodes around the original entity e increases, and the resulting KG is called a text-augmented KG (K_TAG). For example, in Figure 3, "/m/03ftmg" and "/m/013bd1" are entities in the KG, and the five terms in the dashed rectangles are named (text) entities. Here, two entities not directly connected in the KG acquire a common text entity node as a neighbor.
Therefore, semantic information can be propagated between them using a GCN. The left entity in the figure is the screenwriter "Anthony Horowitz," whereas the right entity is the actor "David Suchet," and the middle text entity "Agatha Christie's Poirot" happens to be the name of the TV series associated with them both. In this way, the potential associations between entities expressed in the text can be mined and added to the KG, augmenting their semantic associations.
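The construction step above can be sketched as follows. The NER output format and the edge-label scheme (`has_<TYPE>`/`rev_<TYPE>`) are our own illustrative assumptions, not the paper's exact implementation; the essential points are the two directed edges per connection and the labels depending only on the text entity type.

```python
# Hypothetical sketch of text-augmented KG edge construction.
# ner_output maps each original entity to its extracted (text entity, type) pairs.

def build_tag_edges(ner_output):
    """Connect each text entity to its source entity with two directed
    edges whose labels depend only on the text-entity type."""
    edges = []
    for entity, mentions in ner_output.items():
        for text_entity, etype in mentions:
            edges.append((entity, f"has_{etype}", text_entity))  # entity -> text entity
            edges.append((text_entity, f"rev_{etype}", entity))  # text entity -> entity
    return edges

ner_output = {
    "/m/03ftmg": [("Agatha Christie's Poirot", "WORK_OF_ART")],
    "/m/013bd1": [("Agatha Christie's Poirot", "WORK_OF_ART")],
}
edges = build_tag_edges(ner_output)
# "Agatha Christie's Poirot" now links the two original entities in both directions.
```

After this step, the two original entities, previously unconnected in the KG, are two hops apart via the shared text entity.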
Note that the number of text entities around each entity may differ, and the number of entities in the original KG connected to a given text entity may also differ. For example, "Agatha Christie's Poirot" connects two nodes, whereas all the other text entities connect to only one node.
In SP-TAG, text entities connected with only one entity are removed from K_TAG. Text entities are extracted to connect two entities in the KG that may have semantic associations but no explicit triple, so that we can propagate semantics between them. If a text entity does not establish a connection between two entities, it does not represent additional necessary information, and retaining it would only increase the complexity of the model. Moreover, some text entities, such as country and place names, connect too many entities. The relationships between these entities may not have practical significance, resulting in unnecessary semantics. For example, it is not informative for the entity "parrot" to share semantics with "Silicon Valley" via "United States." To address both aspects, we define a threshold k in SP-TAG: a text entity and its connections are preserved only when the number of entities connected to it is greater than 1 and less than k. We carry out experiments on the parameter k; details are provided in Section 5.2.
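The threshold rule (keep a text entity only when 1 < degree < k) can be sketched as a simple filter; the data layout and entity names here are illustrative.

```python
from collections import Counter

def filter_text_entities(connections, k):
    """Keep a text entity only if it links more than 1 and fewer than k
    original entities, mirroring the SP-TAG threshold rule."""
    degree = Counter(te for _, te in connections)
    keep = {te for te, d in degree.items() if 1 < d < k}
    return [(e, te) for e, te in connections if te in keep]

connections = [
    ("screenwriter", "Agatha Christie's Poirot"),
    ("actor", "Agatha Christie's Poirot"),
    ("parrot", "United States"),
    ("Silicon Valley", "United States"),
    ("other1", "United States"),
    ("other2", "United States"),
    ("lonely", "One-off Entity"),
]
kept = filter_text_entities(connections, k=4)
# Only "Agatha Christie's Poirot" (degree 2) survives with k = 4:
# "United States" (degree 4) and "One-off Entity" (degree 1) are dropped.
```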
SP-TAG learns the text-based representation of entities from K_TAG. In contrast to existing models that integrate text information, SP-TAG augments the existing KG by extracting named entity nodes. This both alleviates the sparsity of the KG and lays a foundation for the subsequent joint learning of text- and structure-based representations.

Feature Initialization and Semantic Propagation.
SP-TAG adopts existing classical KGRL methods such as TransE and RotatE to initialize the structure-based representation. This representation, which carries reliable structural semantics, is directly derived from the triples in the KG and is key to predicting and reasoning about missing elements in triples.
SP-TAG initializes the text-based representations of the entities and text entities in K_TAG. Because each entity has an explicit text description, we use BERT to directly encode the entity description. That is, for the textual description S(e) of each entity, there is a text-based representation e_d = BERT(S(e)) preserving the semantics of the original description. Since a text entity will learn semantic features along with the entities in the subsequent semantic propagation, to ensure that the representations of text entities and entities are in the same semantic space, SP-TAG directly inputs the text entity name into BERT to obtain its text-based representation, that is, e_ner = BERT(name(e_ner)).
After initialization, we perform semantic propagation. During the construction of K_TAG, text entities are connected only to original entities, and no edges are generated between text entities; therefore, direct semantic propagation occurs only between entities and text entities and between entities and entities. Hence, we focus on semantic propagation between entities that do not have explicit triples in K but are connected by text entities in K_TAG.
We assume that the correlation between entities in the knowledge graph decreases as the distance between them increases; it is meaningless to propagate semantic information from one entity to distant entities, and the propagation process itself introduces information attenuation. Therefore, a threshold is needed to limit the propagation distance, as well as to reduce computation and simplify the model. Based on the literature and our experiments, we set this threshold to 2.
As shown in Figure 4, there are three main situations: (1) entity to entity, (2) entity to entity to entity, and (3) entity to text entity to entity. Situation 1 realizes close-range semantic propagation between entities with a one-hop connection, situation 2 realizes longer-distance semantic propagation between entities with a two-hop connection, and situation 3 realizes distant semantic propagation between entities connected by a text entity.
Semantic propagation is realized using a simple GCN. The graph convolution operator is expressed as follows:

h_i^(l+1) = σ( Σ_{j ∈ N_i} (1 / c_ij) W^(l) h_j^(l) ),

where h_i^(l) is the feature vector for node i in the lth neural network layer, N_i is the set of neighbors of node i, c_ij is the normalization constant for edge (v_i, v_j), W^(l) is a layer-specific weight matrix, and σ is the activation function.
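The propagation step can be sketched with a dense adjacency matrix; this toy NumPy version uses mean aggregation (c_ij = |N_i|) and a ReLU nonlinearity, whereas real implementations use sparse operations.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: each node averages its neighbours'
    features, applies a shared weight matrix W, then a ReLU."""
    deg = A.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                 # avoid division by zero for isolated nodes
    agg = (A @ H) / deg                 # mean over neighbours: (1/c_ij) * sum
    return np.maximum(agg @ W, 0.0)     # ReLU nonlinearity

# Toy graph: nodes 0 and 2 are not directly connected but share
# neighbour 1 (e.g. a common text entity node).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
W = np.eye(2)
H1 = gcn_layer(H, A, W)
H2 = gcn_layer(H1, A, W)
# After two layers, node 2's representation contains node 0's feature:
# semantics have propagated through the shared neighbour.
```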
Because a KG is essentially a heterogeneous graph, we also considered using the more suitable R-GCN for semantic propagation on K_TAG. R-GCN is a variant of the GCN that introduces a specific weight parameter for each relation. Therefore, during semantic propagation, SP-TAG dynamically learns the appropriate weight parameters for different relations to obtain the semantics from neighbor nodes more accurately. We compare the results of the GCN and R-GCN in the ablation study. The overall update process of R-GCN is as follows:

h_i^(l+1) = σ( Σ_{r ∈ R} Σ_{j ∈ N_i^r} (1 / c_ij^r) W_r^(l) h_j^(l) + W_0^(l) h_i^(l) ),

where h_i^(l) is the feature vector for node i in the lth neural network layer, N_i^r is the set of neighbors of node i under relation r, c_ij^r is the normalization constant for edge (v_i, v_j) with relation r, W_r^(l) is a layer-specific weight matrix for relation r, W_0^(l) is the self-loop weight, and σ is an activation function. Both the GCN and R-GCN can transform the dimensions of the embedding, so the output of the BERT encoder can be transformed to the required dimensions. The initialized representations e_d and e_ner are the input of the GCN. After semantic propagation in the network, the output contains the information of the original entity or text entity itself, together with information incorporated from the surrounding neighbor nodes.
In contrast to existing models that integrate text information, SP-TAG uses the pretrained model BERT to better express the semantic information of the nodes in a text-augmented KG. With GCN-based semantic propagation, adjacent entities and text entities can share semantics.
In the following experiments, we also compare the results with and without GCN-based semantic propagation. Details are provided in Section 5.3.

Joint Embedding.
To preserve both the structural semantic information of the triples and the textual semantic information of the entity descriptions, SP-TAG adopts a gate mechanism [32] to combine the two parts and obtain the final representation of the entity. When the two representations are combined, the weights in each dimension of the vector are automatically learned without manual parameter setting. Moreover, the gate vectors learned for different entities are different, and the entire model can be trained in an end-to-end manner. For entity e, the expression for combining its structure- and text-based representations is as follows:

e = g_e ⊙ e_s + (1 − g_e) ⊙ e_d,

where e_s and e_d are the structure- and text-based representations of the entity, respectively, g_e is the gate that balances the two representations, and ⊙ is element-wise multiplication.
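A minimal sketch of this gate mechanism, assuming a sigmoid activation on a per-entity gate vector (as described next) and using NumPy in place of a deep learning framework:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_combine(e_s, e_d, g_tilde):
    """Combine structure- and text-based representations with a learned
    per-dimension gate: e = g * e_s + (1 - g) * e_d, g = sigmoid(g_tilde)."""
    g = sigmoid(g_tilde)
    return g * e_s + (1.0 - g) * e_d

e_s = np.array([1.0, 1.0])   # structure-based representation
e_d = np.array([0.0, 0.0])   # text-based representation
# A large positive gate logit lets the structure-based part dominate a
# dimension; a large negative one lets the text-based part dominate it.
e = gated_combine(e_s, e_d, np.array([10.0, -10.0]))
```

In training, `g_tilde` would be a learnable parameter per entity, optimized jointly with the embeddings; here it is fixed for illustration.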
To constrain the value of each element in g_e to [0, 1], we use a sigmoid function as activation, i.e.,

g_e = σ(g̃_e),

where g̃_e is a real-valued vector specific to entity e that is initialized and optimized simultaneously with e_s and e_d. The representations of the head entity, relation, and tail entity are then

h = g_h ⊙ h_s + (1 − g_h) ⊙ h_d,  t = g_t ⊙ t_s + (1 − g_t) ⊙ t_d,

with the relation representation r taken from the structure-based embedding, since relations have no text descriptions. In KGRL research, the construction of negative samples is an important aspect of training, and the quality of their construction affects model performance. For example, during training, many triples are easily judged to be wrong by the model, and sampling these triples does not provide new information for training. To obtain high-quality negative samples, Sun et al. [11] proposed a self-adversarial sampling method that performs dynamic sampling according to the representations of entities and relations, so that new negative samples contain new information. The method samples negative triples according to the probability distribution

p(h'_j, r, t'_j | {(h_i, r_i, t_i)}) = exp(α f(h'_j, t'_j)) / Σ_i exp(α f(h'_i, t'_i)),

where α is the sampling rate and f(h'_i, t'_i) is the score of the triple (higher for more plausible triples, i.e., the negative of the distance).
This probability is also introduced into the loss function as the weight of each negative triple. The overall loss function of the model is as follows:

L = −log σ(c − d(h, t)) − Σ_i p(h'_i, r, t'_i) log σ(d(h'_i, t'_i) − c),

where c is the margin hyperparameter, σ is the sigmoid function, d(·) is the distance defined by the score function (e.g., d = ||h + r − t|| for TransE), and (h'_i, r, t'_i) is the ith negative sample. The model is trained to minimize the loss function, optimizing the parameters and improving the quality of the entity and relation embeddings.
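The self-adversarial weighting and loss can be sketched as follows. This is an illustrative NumPy version under the conventions above (score = negative distance); variable names are our own, and a real implementation would operate on batched tensors with gradients.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adversarial_loss(pos_dist, neg_dists, margin, alpha):
    """Self-adversarial loss sketch: harder negatives (smaller distance,
    hence higher score) receive larger softmax weights p_i."""
    neg_dists = np.asarray(neg_dists)
    neg_scores = -neg_dists                     # score f = -distance
    w = np.exp(alpha * neg_scores)
    p = w / w.sum()                             # sampling weights p_i (softmax)
    pos_term = -np.log(sigmoid(margin - pos_dist))
    neg_term = -(p * np.log(sigmoid(neg_dists - margin))).sum()
    return pos_term + neg_term

loss = self_adversarial_loss(pos_dist=0.5, neg_dists=[5.0, 9.0, 12.0],
                             margin=6.0, alpha=1.0)
# The hardest negative (distance 5.0, closest to the positive region)
# receives the largest weight and dominates the negative term.
```

Note that a harder set of negatives yields a larger loss, which is what pushes the model to separate plausible and implausible triples by the margin c.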

Results and Discussion
We evaluated the model performance using a typical link prediction task. We also performed a hyperparameter analysis, an ablation study, and a verification of semantic augmentation with a small number of training samples.

Datasets.
We used the datasets FB15K and WN18 [45], from the two real-world KGs Freebase and WordNet (Table 2). In addition to the triples, they include a specific text description of each entity. FB15K and WN18 have been widely used; however, several researchers have argued that many inverse relations in FB15K and WN18 cause data leakage. Therefore, these inverse relations were removed, yielding FB15K-237 [46] and WN18RR [12]. For the remaining parameters, the sampling rate was chosen from α ∈ {0.5, 1.0} and the learning rate was set to λ = 0.00005. When using BERT to encode text, the maximum truncation length for entity descriptions was set to 100 words. We use Adam [47] to optimize the parameters.

Link Prediction.
We evaluated the performance of the method by predicting the missing head or tail entities in triples. We tested SP-TAG with two representative methods, TransE and RotatE, and compared it with existing baselines. Tables 3 and 4 yield the following observations: (1) Compared with the triple-based methods, methods integrating text information generally perform better with respect to MR and HITS@10, whereas for HITS@1, HITS@3, and MRR, they are also competitive. This reveals that text information effectively improves the lower bound of the entity link prediction ranking and improves average prediction performance.
(2) Compared with the original models TransE and RotatE, the SP-TAG-based models perform significantly better, demonstrating that text information brings more semantic features to entities. When the triple structure information is insufficient, text is a powerful complement that improves the performance of representation learning. Though TransE is an early method, SP-TAG-TransE still achieves excellent results, comparable with recent models, demonstrating that classical models can achieve very good results through KG augmentation and semantic propagation. Taking WN18RR as an example, the MR of the SP-TAG-TransE model (1423) is still substantially better than those of recent models (>2000), and the difference with respect to SP-TAG-RotatE is quite small; these two models obtain the top two results.
(3) SP-TAG-RotatE achieved the best MR, HITS@3, and HITS@10 results of all methods on FB15K-237, WN18, and WN18RR, and its MRR and HITS@1 were among the top results. On FB15K-237, the improvement obtained by SP-TAG is not as obvious as on WN18RR. However, considering the performance of the original method, the improvement is substantial, bringing it up to an average level. We believe that because of the more complex scale and structure of FB15K-237, its triple structure contains more information than that of WN18RR, so the augmentation and semantic propagation effects are less obvious. (4) Because FB15K-237 and WN18RR have no inverse relations, link prediction on these datasets is more difficult. On these datasets, SP-TAG's performance metrics decrease less and are more stable than those of Pretrain-KGE. The MR of Pretrain-RotatE worsened from 125 on WN18 to 2138 on WN18RR, whereas the MR of SP-TAG-RotatE only worsened from 72 to 942. On FB15K and FB15K-237, they performed comparably. Hence, when the complexity of the dataset increases, SP-TAG has better stability and adaptability because of the augmentation of the KG and semantic propagation.

Parameter Analysis.
To analyse the impact of semantic propagation on K_TAG, SP-TAG-RotatE is used to analyse the hyperparameter k. Figure 5 shows the distribution of the number of original entities connected to each text entity in K_TAG. The abscissa is the number of connected entities, and the ordinate is the number of corresponding text entities.
In each K_TAG, many text entities connect to only one entity: more than 1,500 such text entities exist in WN18RR (14.9% of all text entities) and more than 60,000 in FB15K-237 (30% of all text entities). As mentioned in the previous section, retaining such text entities does not propagate semantics between entities and increases overall complexity. The small histograms on the right show that the number of text entities is roughly inversely proportional to the number of connected entities. The descriptions of entities in FB15K-237 are longer and more detailed than those in WN18RR, so more text entities can be extracted to augment the KG, and we therefore tend to use a smaller k for FB15K-237 and a larger k for WN18RR. When choosing the value of k, we set k ∈ {2, 3, 4} for FB15K-237 and k ∈ {2, 4, 8, 12} for WN18RR. Figure 6 lists the specific numbers of text entities and new triples in K_TAG under different k, where the abscissa is the hyperparameter k and the two histograms represent the number of new text entities and the number of new triples. Table 5 further compares the numbers of nodes, edges, and average edges per node of K and the corresponding K_TAG for different values of k. After the augmentation, the number of edges (i.e., the number of triples) increases to a certain extent. For example, the average number of edges per node in WN18RR increased by 6.6%, and that in FB15K-237 increased by 28.5%. Therefore, the entities are associated more tightly, and semantics are fully propagated on K_TAG. Figure 7 and Table 6 show the results of link prediction for FB15K-237 and WN18RR when k varies. Of the five metrics, MR is the most sensitive to the hyperparameter k. On FB15K-237, as k increases, the link prediction metrics slightly decrease. On WN18RR, as k gradually increases from 2 to 8, the link prediction performance improves, but when k further increases to 12, the results become worse. This is due to the overpropagation of semantics.
The addition of too many text entities induces noise, as described before. Hence, it is important to choose the number of text nodes carefully when constructing K_TAG. The remaining main hyperparameters are the dimension d and the margin c. For SP-TAG-RotatE, we used a grid search to select the optimal parameters (FB15K-237: c = 12, d = 800; WN18RR: c = 12, d = 200). Figure 8 and Tables 7 and 8 present the results when one of the parameters is fixed and the other is adjusted (the remaining unrelated parameters are fixed). The results reveal the following: (1) For both datasets, the effect of c on the results is obvious. When c is well chosen, the performance of the model significantly improves (WN18RR, c = 12); however, when the value is not appropriate, the performance decreases (FB15K-237, c = 18).

Computational Intelligence and Neuroscience
(2) On both datasets, as the dimension increases, the performance of the model improves, indicating that higher-dimensional vectors are important for fully expressing the features of entities and relations. The appropriate dimension is typically related to dataset complexity: in terms of scale and content, FB15K-237 is more complex than WN18RR, and hence a higher dimension is required to represent the features of its entities and relations. (3) The effect of the parameter values is more pronounced on WN18RR. For simple datasets, finding appropriate parameters can effectively improve the performance of the model, whereas for more complex datasets, additional factors must be considered.
To explore the influence of hyperparameters on datasets derived from the same KG but with different distributions, we also used SP-TAG-RotatE to compare the results on WN18 and WN18RR for different values of d and c. Figure 9 reveals that as the dimension increases, MR significantly improves, whereas the other four metrics improve slightly. With the increase in c, MRR and HITS@1 significantly improve, and HITS@3 and HITS@10 improve slightly. The MRR and HITS@1 results in Table 9 show that c still has a significant impact on this dataset. Unlike on WN18RR, on WN18, d is an important factor. In addition, parameter adjustment has a greater impact on WN18 (MRR increased by 18.8% from 0.796 to 0.946, and MR decreased by 76.1% from 301 to 72), whereas WN18RR was relatively less sensitive (MRR increased by 5.6% from 0.445 to 0.47, and MR decreased by 18.8% from 1160 to 942). As mentioned above, WN18RR was created by removing a large number of inverse relations from WN18, making WN18RR more difficult; in other words, WN18 contains more data and more information, making it simpler. Consistent with the previous experimental results, the simpler the dataset, the greater the impact of parameter tuning on the results. Because the data distributions differ, the hyperparameters affect the two datasets differently.
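The grid search over the margin c and the dimension d used above can be sketched as follows. Here `train_and_eval` is a hypothetical callback standing in for one full training run of SP-TAG-RotatE per configuration that returns the validation MRR; it is not the paper's code.

```python
import itertools

def grid_search(train_and_eval, margins, dims):
    """Return the (c, d) pair with the highest validation MRR,
    trying every combination of margin and dimension."""
    best_cfg, best_mrr = None, float("-inf")
    for c, d in itertools.product(margins, dims):
        mrr = train_and_eval(c, d)
        if mrr > best_mrr:
            best_cfg, best_mrr = (c, d), mrr
    return best_cfg, best_mrr
```

In practice each call to `train_and_eval` is a complete training run, so the grid is kept small (a handful of candidate values per hyperparameter, as above).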

Ablation Study.
We evaluated the effect of GCN and R-GCN in semantic propagation, as well as the importance of semantic propagation itself, by replacing the GCN with a linear transform.
The linear transform is implemented as multiplication by a matrix M of size p × q, where p is the embedding dimension and q is the output dimension of the BERT encoder. Finally, we demonstrate the importance of textual information and the gate mechanism by replacing the BERT text initialization vector with a random vector and the gate vector with a constant. Table 10 reveals that on both FB15K-237 and WN18RR, when the dimensions are the same, using GCN for semantic propagation is not as effective as using R-GCN, indicating that setting a different weight matrix for each relation to distinguish their semantics improves the results. Moreover, when R-GCN is used instead of the linear transform, the model has significant advantages in MRR, HITS@1, and HITS@3, demonstrating that semantic propagation further improves the top-ranked prediction results. Table 11 reveals that on both datasets, the gate mechanism helps balance the text- and structure-based representations, and the semantic information introduced by BERT effectively improves the representation learning performance for entities.
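The gate mechanism evaluated in this ablation can be sketched as an element-wise convex combination of the structure-based and text-based embeddings. The exact parameterization of the gate in SP-TAG is not reproduced here, so this is a generic sketch; replacing the learned gate logit with a constant corresponds to the ablated variant.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_struct, h_text, gate_logit):
    """Fuse the two embeddings as h = g * h_struct + (1 - g) * h_text,
    with g = sigmoid(gate_logit). In SP-TAG the gate is learned; setting
    gate_logit to a constant removes the learned balance (the ablation)."""
    g = sigmoid(gate_logit)
    return g * h_struct + (1.0 - g) * h_text
```

With a zero logit the two sources are averaged; a large positive logit recovers the purely structural embedding.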

Verifying Semantic Augmentation with Few Training Samples.
To further illustrate how SP-TAG more fully utilizes the semantic information in text, we reduced the number of training samples in the WN18RR dataset. The total number of entities remained the same, but the number of triples was reduced by 60%.
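A reduced training set with these properties can be produced as below. The paper does not describe its sampling procedure, so this greedy sketch merely enforces the two stated constraints: every entity still appears in at least one kept triple, and roughly 40% of the triples survive.

```python
import random

def subsample_triples(triples, frac=0.4, seed=0):
    """Keep about `frac` of the (head, relation, tail) triples while
    guaranteeing that every entity remains covered (an assumed procedure,
    not the paper's)."""
    rng = random.Random(seed)
    shuffled = triples[:]
    rng.shuffle(shuffled)
    kept, covered = [], set()
    # First pass: keep any triple touching a not-yet-covered entity,
    # so no entity disappears from the reduced training set.
    for h, r, t in shuffled:
        if h not in covered or t not in covered:
            kept.append((h, r, t))
            covered.update((h, t))
    # Second pass: top up to the target fraction with leftover triples.
    target = max(len(kept), int(len(triples) * frac))
    kept_set = set(kept)
    for tr in shuffled:
        if len(kept) >= target:
            break
        if tr not in kept_set:
            kept.append(tr)
    return kept
```

Note that the entity-coverage pass takes priority, so on sparse graphs the kept fraction may exceed the target.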

To make the comparison more convincing, the DKRL method (BERT + TransE), which also integrates textual information, was evaluated to highlight the importance of the text-augmented KG and semantic propagation when the number of training samples is limited. The results in Table 12 demonstrate that DKRL and SP-TAG, which incorporate text information, perform significantly better in link prediction than TransE, which uses only triples. When there are fewer training samples, these models obtain more entity feature information from the text, effectively compensating for the lack of triple training samples. Compared with DKRL, SP-TAG performs clearly better on all three metrics, and its MR is very close to the results obtained by some methods on the complete training set (e.g., Pretrain-RotatE).
These results further demonstrate that SP-TAG connects entities in the KG more closely by introducing text entities and thus better achieves semantic propagation between related entities.
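For reference, the metrics reported throughout (MR, MRR, and HITS@k) are standard functions of the filtered rank of the correct entity in each test query; a minimal sketch:

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR, and HITS@k from a list of ranks, where rank 1
    means the correct entity was scored highest for that query."""
    n = len(ranks)
    mr = sum(ranks) / n                       # mean rank: lower is better
    mrr = sum(1.0 / r for r in ranks) / n     # mean reciprocal rank: higher is better
    hits = {k: sum(r <= k for r in ranks) / n for k in ks}
    return mr, mrr, hits
```

MR is dominated by the worst-ranked queries, which is why it is the metric most sensitive to hyperparameter changes in the experiments above, while MRR and HITS@1 emphasize the top-ranked predictions.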

Conclusion and Future Work
To address the insufficient utilization of text semantic information in existing methods, we proposed SP-TAG, which performs semantic propagation on a text-augmented KG to fully integrate text semantics with structural semantics and further improve the utilization of text information. The experimental analysis on multiple benchmark datasets demonstrated that SP-TAG can effectively improve link prediction performance, especially when the number of training samples is limited. Our experimental results demonstrate the feasibility of the approach and indicate the significance of continuing research in this direction. In the future, the following improvements could be considered: (1) When building a text-augmented KG, the entities and text entities could be aligned to make the KG more streamlined and accurate. (2) During semantic propagation, an attention mechanism could be incorporated to obtain different representations of entities for different relations. (3) Entities that do not appear in the training set could be represented using text information to achieve zero-shot prediction or open KG prediction.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.