Multiview Translation Learning for Knowledge Graph Embedding

Recently, knowledge graph embedding methods have attracted numerous researchers' interest due to their outstanding effectiveness and robustness in knowledge representation. However, there are still some limitations in the existing methods. On the one hand, translation-based representation models focus on conceiving translation principles to represent knowledge from a global perspective, while they fail to learn various types of relational facts discriminatively. This tends to cause entity congestion for complex relational facts in the embedding space, reducing the precision of the representation vectors associated with entities. On the other hand, parallel subgraphs extracted from the original graph can be used to learn local relational facts discriminatively. However, the subgraph extraction may damage relational facts of the original knowledge graph to some degree. Thus, previous methods are unable to learn local and global knowledge representations uniformly. To that end, we propose a multiview translation learning model, named MvTransE, which learns relational facts from global-view and local-view perspectives, respectively. Specifically, we first construct multiple parallel subgraphs from an original knowledge graph by considering entity semantic and structural features simultaneously. Then, we embed the original graph and the constructed subgraphs into the corresponding global and local feature spaces. Finally, we propose a multiview fusion strategy to integrate the multiview representations of relational facts. Extensive experiments on four public datasets demonstrate the superiority of our model in knowledge graph representation tasks compared to state-of-the-art methods.


Introduction
Knowledge graphs [1] are a kind of directed graph consisting of entities as nodes and relations between entities as edges. Each relational fact of a knowledge graph is stored as a triplet (head, relation, tail), abbr. (h, r, t), where h and t represent the head and tail entities, respectively, and r is a relationship from h to t. With the advent of the big data era, the scale of knowledge graphs continues to grow, and diverse large-scale knowledge graphs (e.g., WordNet [2] and Freebase [3]) have appeared. Despite their large scale, current knowledge graphs are still far from complete. For example, 75% of people in Freebase lack nationality information and 71% lack a birthplace [4]. Therefore, it is necessary to design approaches that automatically complete or infer the missing relational facts of existing knowledge graphs.
Recently, embedding-based approaches have shown strong feasibility and robustness in knowledge graph completion; they project the entities and relations of knowledge graphs into a dense, continuous, and low-dimensional vector space. Among the existing approaches, translation-based approaches have attracted numerous researchers' interest due to their outstanding effectiveness and robustness in knowledge representation. The first translation-based method, named TransE [5], was proposed by Bordes et al. For each triplet (h, r, t), TransE treats a relation r as a translation operation from h to t in a vector space. If (h, r, t) holds, the translation principle h + r ≈ t should be satisfied in the vector space, where h, r, and t are the vector representations of h, r, and t. TransE is a simple yet effective translation model for 1-to-1 simple relational facts, where each head entity connects to only one tail entity via a specific relation, and it achieves state-of-the-art performance on link prediction. However, the translation principle is too rigid to deal with complex relational facts, including 1-to-N, N-to-1, and N-to-N facts. Technically, it may cause spatial congestion of entities when many head entities (or tail entities) are projected to nearly the same point of the vector space.
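The translation principle h + r ≈ t can be made concrete with a small sketch. The following is a minimal, hypothetical NumPy illustration of the TransE distance score (lower means the triplet is more plausible); the vectors are toy values, not learned embeddings.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """Distance-based score for a triplet: lower means more plausible.

    Implements f_r(h, t) = ||h + r - t|| under the L1 or L2 norm,
    following the TransE translation principle h + r ≈ t.
    """
    return np.linalg.norm(h + r - t, ord=norm)

# Toy vectors: if the triplet holds, h + r should land near t.
h = np.array([0.2, 0.5, 0.1])
r = np.array([0.3, -0.1, 0.4])
t = np.array([0.5, 0.4, 0.5])
assert transe_score(h, r, t) < transe_score(t, r, h)
```

Here the forward triplet scores 0 (h + r coincides exactly with t), while the reversed one scores much worse, which is exactly the asymmetry the translation principle encodes.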
To eliminate the weakness of TransE in representing complex relational facts, a series of improved models were proposed, such as TransH [6], TransR [7], TransD [8], and TranSparse [9]. Essentially, the above methods focus on designing various translation principles to learn complex relational facts more precisely. However, they still embed a complete knowledge graph into a single vector space from a global perspective and fail to learn the various types of relational facts discriminatively.
That is, each entity and relation of the knowledge graph is learned as a unique representation vector in a single space. Hence, there are still vector congestions in the translation spaces. For a real-world example, the head entities of the triplets (Obama, President, the United States) and (Trump, President, the United States), i.e., Obama and Trump, are projected closely in the vector space due to their shared social status, although they are quite different from other perspectives.
To solve this problem, puTransE [10] splits knowledge graphs into multiple parallel spaces in the form of subgraphs from a local perspective and achieves the spatial sparsification of complex relational facts. Specifically in puTransE, entities and relations have respective feature representations in different parallel spaces.
This approach improves the ability to learn complex relational facts by avoiding the spatial congestion of entities in complex relational facts. However, puTransE still has two shortcomings. First, puTransE imposes excessive sparseness on simple relational facts during parallel space generation. Consequently, it can hardly learn complete vector representations of simple relational facts within a single subspace.
This is because puTransE performs spatial sparsification not only on complex relational facts but also on simple ones. Second, puTransE randomly selects local knowledge to construct multiple parallel spaces, which is prone to impairing relational facts of the original knowledge graph. For example, a golden triplet in the original graph may not be composable from the entities and relations of any single parallel space.
In summary, all of the above methods embed the knowledge graph from a single perspective, i.e., either the local view or the global view. Thus, they fail to learn local and global knowledge representations uniformly. To that end, we borrow the idea of multiview learning [11, 12] to propose a multiview translation learning model, named MvTransE, which embeds relational facts from the global view and the local view concurrently. In detail, we first generate multiple parallel subgraphs from semantic and structural perspectives of entities to accurately capture local-view knowledge of the knowledge graph. Then, the original knowledge graph and the generated parallel subgraphs are embedded into global-view and local-view spaces, respectively. Finally, we propose a multiview fusion strategy to integrate the multiview representations of relational facts. We outline the main contributions of this paper as follows:
(1) We incorporate the idea of multiview learning into our model MvTransE, which can precisely learn relational facts from both global and local views.
(2) Our model extracts local knowledge from semantic and structural perspectives to construct multiple parallel subgraphs, solving the entity spatial congestion problem by learning local-view representations of relational facts.
(3) MvTransE applies a multiview fusion strategy to combine global-view and local-view representations of the knowledge graph, which effectively overcomes the missing of relational facts in parallel spaces.
(4) Extensive experimental results demonstrate that our method outperforms state-of-the-art models in two knowledge graph completion tasks.

Related Work
Since the appearance of knowledge graphs, numerous researchers have studied various methods to represent the relational facts of graphs. Initially, embedding-based models such as Structured Embedding (SE) [13], Semantic Matching Energy (SME) [14, 15], the Latent Factor Model (LFM) [16], and the Neural Tensor Network (NTN) [17] achieved considerable performance in knowledge representation but fail to cope with large-scale graphs due to their computational complexity. Recently, translation-based models have attracted much attention due to their effective and robust representation abilities. TransE [5] is the first translation-based method, which treats a relation r as a translation from h to t for a triplet (h, r, t). Hence, TransE defines the scoring function as f_r(h, t) = ‖h + r − t‖_{l1/l2}, where ‖·‖_{l1/l2} denotes the l1-norm or l2-norm. During model training, if a triplet (h, r, t) holds, the translation principle h + r ≈ t should be satisfied in the vector space; this process is illustrated in Figure 1. That is, TransE keeps the translation vector (h + r) close to the tail vector t. TransE achieves remarkable performance in representing simple relational facts, i.e., 1-to-1 triplets. However, it has limitations in dealing with complex relational facts, including 1-to-N, N-to-1, and N-to-N triplets, due to its rigid translation principle.
TransH [6] tries to solve the problem of TransE by allowing an entity to have distinct representations when it is involved in different relations. Specifically, for each triplet (h, r, t), TransH projects h and t onto a relation-specific hyperplane to obtain the projected vectors h⊥ = h − w_r^T h w_r and t⊥ = t − w_r^T t w_r. The scoring function is defined as f_r(h, t) = ‖h⊥ + r − t⊥‖_{l1/l2}. TransR/CTransR [7] models entities and relations in different vector spaces, i.e., the entity vector space and the relation vector space. For each relation r, it sets a projection matrix M_r to map entity vectors from the entity space to the relation space, i.e., h_r = hM_r and t_r = tM_r. Its scoring function is f_r(h, t) = ‖h_r + r − t_r‖_{l1/l2}. TransD [8] considers the diversity of entities and relations simultaneously. It uses the product of two vectors of an entity-relation pair to replace a projection matrix, i.e., M_rh = r_p h_p and M_rt = r_p t_p. TransD is more extensible and can be applied to large-scale knowledge graphs. Its scoring function is f_r(h, t) = ‖hM_rh + r − tM_rt‖_{l1/l2}. TranSparse [9] considers the heterogeneity and imbalance of entities and relations in a knowledge graph, which are generally ignored by previous works. TranSparse constructs adaptive sparse matrices M_r^h(θ_r^h) and M_r^t(θ_r^t), instead of dense projection matrices, to concurrently prevent the overfitting of simple relational facts and the underfitting of complex relational facts. Its scoring function is f_r(h, t) = ‖M_r^h(θ_r^h)h + r − M_r^t(θ_r^t)t‖_{l1/l2}. FT [18] and DT [19] design flexible translation principles and dynamic translation principles, respectively. To some extent, they improve the ability to handle complex relational facts. Essentially, these methods focus on elaborating various translation principles to learn complex relational facts more accurately.
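To make the projection-based variants concrete, here is a small illustrative sketch (not the authors' code) of the TransH hyperplane projection and the TransR matrix mapping, using NumPy; `w_r` is assumed to be a unit normal vector of the relation-specific hyperplane.

```python
import numpy as np

def transh_project(e, w_r):
    """TransH projection onto the hyperplane with unit normal w_r:
    e_perp = e - (w_r^T e) w_r, removing e's component along w_r."""
    return e - np.dot(w_r, e) * w_r

def transr_project(e, M_r):
    """TransR mapping of entity vector e into relation space via M_r:
    e_r = e M_r."""
    return e @ M_r

# After TransH projection, the component along w_r is zero.
w = np.array([1.0, 0.0, 0.0])   # unit hyperplane normal
e = np.array([0.7, 0.2, 0.5])
assert abs(np.dot(w, transh_project(e, w))) < 1e-12
```

The difference between the two families is visible here: TransH stays in one space and only removes a relation-specific direction, while TransR applies a full linear map into a separate relation space.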
TransAt [17] and a GAN-based framework [20] use attention mechanisms and generative adversarial networks to improve model performance, respectively. However, all of the above methods embed the complete knowledge graph into a uniform vector space from a single perspective, failing to solve the space congestion problem thoroughly.
PuTransE [21] is an online and robust improvement of TransE that addresses the hyperparameter sensitivity problem and the spatial congestion of entities and relations, and also supports dynamic knowledge graphs. It adopts multiple parallel spaces to learn the vectors of entities and relations, thus avoiding spatial congestion for complex relational facts. Therefore, puTransE achieves state-of-the-art performance on the link prediction task. However, puTransE still has two weaknesses that limit its performance. First, puTransE causes excessive sparseness of simple relational facts during random parallel space generation, since its knowledge extraction covers not only complex relational facts but also sparse simple ones. Thus, it probably cannot learn complete vector representations of simple relational facts from a single subgraph. Second, puTransE randomly selects local knowledge to construct multiple parallel spaces, which is prone to impairing the original facts of the knowledge graph. For example, a golden triplet, i.e., a positive sample, in the original graph may not be composable from the entities and relations of any parallel space. This situation decreases the relational fact prediction accuracy of puTransE.

Our Method
In this section, we introduce the details of MvTransE, which embeds relational facts from the global view and the local view, respectively. The workflow of MvTransE mainly consists of three steps. The first step (Section 3.1) generates multiple parallel subgraphs so as to accurately extract particular local relational facts of the knowledge graph. The second step (Section 3.2) discriminatively embeds the knowledge graph into multiple parallel spaces to acquire multiview representations of relational facts. The last step (Section 3.3) integrates the multiple versions of knowledge representations, i.e., fuses the local-view and global-view representations of entities and relations. Figure 2 presents the multiview knowledge learning process of MvTransE.

Subgraph Generation.
The subgraph generation step aims to extract local relational facts from different perspectives, so as to solve the spatial congestion of entities by sparsely embedding entities and relations into different parallel vector spaces in the subsequent graph embedding step. Therefore, we construct multiple parallel subgraphs based on different relations of the knowledge graph. Each subgraph mainly contains the local relational facts selected for a specific relation.
First, we define the symbols used in the subgraph generation process. We define a knowledge graph as G = (E, R, T), where E and R denote the entity set and the relation set of graph G, respectively, and T ⊆ E × R × E represents the triplet set of G. G_sub is the final generated subgraph set, G_i ∈ G_sub is a subgraph, and E_i and R_i represent the entity set and the relation set of G_i, respectively. Algorithm 1 details the subgraph generation, which mainly consists of the two steps given in the following.

Semantics-Related Entity Selection.
To accurately learn local relational facts, we first select relation-relevant entities for a subgraph to ensure the semantic consistency of its knowledge as much as possible. That is, entities in a subgraph should be semantically related to each other through a specific relation. We randomly select a relation r from the relation set R and then generate the entity set E_r, which consists of the extracted entities connected by r. Since E_r is generated via relation r, r is deemed the semantic center of the current subgraph G_i.

Structure-Related Subgraph Expansion.
In order to learn latent knowledge associated with r more comprehensively, we need to expand each subgraph according to the local graph structure of the entities in E_r.
This step ensures that a generated subgraph contains the semantic and structural features of local relational facts simultaneously. Specifically, we first randomly select an entity e_i from E_r as a starting entity to expand the subgraph G_i. We then randomly select a triplet whose head or tail entity is the starting entity e_i and add its head or tail entity to the subgraph G_i. Consequently, we can capture the local structure information of E_r within G by repeating the above two operations n_s times. Owing to the randomly selected entities and triplets in the subgraph expansion, each subgraph expanded from E_r may include different relational facts, which makes the generated subgraphs slightly different from each other in terms of semantics and structure. Thereafter, with respect to learning local relational facts discriminatively, these random operations ensure that MvTransE can learn local knowledge representations from multiple perspectives.
Besides, to make each subgraph focus on particular relational facts, we control the scale of subgraphs to avoid extracting excessive irrelevant facts. We set the hyperparameter n_t to control the number of triplets in a subgraph, n_s to control the expansion speed of each subgraph, τ to control the maximum number of iterations of the triplet selection, and n to control the number of generated subgraphs.
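The two-step generation procedure described above can be sketched roughly as follows. This is a hypothetical reimplementation of Algorithm 1, not the authors' code; triples are (head, relation, tail) tuples, and the hyperparameter names n, n_s, n_t, and tau mirror those in the text.

```python
import random

def generate_subgraphs(triples, n=2, n_s=3, n_t=10, tau=50, seed=0):
    """Sketch of the two-step subgraph generation (hypothetical).

    For each of the n subgraphs: (1) pick a relation r as the semantic
    center and collect all triplets using r; (2) expand by random walks
    of n_s steps from entities already in the subgraph, adding adjacent
    triplets, until n_t triplets are collected or tau selection
    attempts are exhausted.
    """
    rng = random.Random(seed)
    relations = sorted({r for _, r, _ in triples})
    subgraphs = []
    for _ in range(n):
        r = rng.choice(relations)
        sub = {t for t in triples if t[1] == r}          # step 1: semantic core
        entities = {h for h, _, _ in sub} | {t for _, _, t in sub}
        attempts = 0
        while len(sub) < n_t and attempts < tau:          # step 2: expansion
            attempts += 1
            start = rng.choice(sorted(entities))
            for _ in range(n_s):
                adj = [t for t in triples
                       if start in (t[0], t[2]) and t not in sub]
                if not adj:
                    break
                tri = rng.choice(adj)
                sub.add(tri)
                entities.update((tri[0], tri[2]))
                start = tri[2] if tri[0] == start else tri[0]
        subgraphs.append(sub)
    return subgraphs
```

Because both the starting entity and the next triplet are drawn at random, two subgraphs built around the same relation can still differ, which is exactly the diversity of local views the text describes.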

Graph Embedding.
The goal of this step is to obtain the global-view and local-view representations of entities and relations. Hence, we embed the original knowledge graph G and the subgraphs in G_sub to learn global knowledge and local knowledge, respectively. In each vector space, we define the following scoring function f_r(h, t) to translate each triplet (h, r, t):

f_r(h, t) = ‖h + r − t‖_{l1/l2},

where h, r, and t are the vector representations of h, r, and t, and l1/l2 denotes the l1-norm or l2-norm distance. In MvTransE, we use the margin-based loss function as the optimization target in each vector space, which is defined as follows:

L = Σ_{ψ∈Ψ} Σ_{(h,r,t)∈T} Σ_{(h′,r,t′)∈T′} max(0, γ + f_r(h, t) − f_r(h′, t′)),

where Ψ is the set of embedded vector spaces, T is the set of positive triplets in a graph, T′ is the set of negative triplets generated by randomly replacing the head (or tail) of each positive triplet (h, r, t) ∈ T, and γ is a fixed margin for distinguishing positive and negative triplets. We use stochastic gradient descent (SGD) [22] to minimize the loss function. Algorithm 2 presents the multiview graph embedding process, which embeds the knowledge graph and each subgraph into its own vector space.
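The margin-based objective within a single space can be sketched as follows; this is an illustrative NumPy fragment, assuming the positive and negative scores have already been computed pairwise with f_r.

```python
import numpy as np

def margin_loss(pos_scores, neg_scores, gamma=1.0):
    """Margin-based ranking loss: sum over corresponding
    positive/corrupted triplet pairs of max(0, gamma + f(pos) - f(neg)).

    A pair contributes nothing once the positive triplet scores at
    least `gamma` below its corrupted counterpart.
    """
    pos = np.asarray(pos_scores)
    neg = np.asarray(neg_scores)
    return np.maximum(0.0, gamma + pos - neg).sum()

# A positive triplet scoring well below its corruption incurs no loss.
assert margin_loss([0.1], [2.0], gamma=1.0) == 0.0
assert abs(margin_loss([1.0], [1.5], gamma=1.0) - 0.5) < 1e-12
```

In training, SGD would push positive scores down and negative scores up until every pair clears the margin, at which point its gradient vanishes.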
Thus, we obtain n + 1 vector spaces, including one global-view vector space and n local-view vector spaces. The global-view vector space captures the global-view representation of all entities and relations of the original knowledge graph; the local-view vector spaces discriminatively learn local-view representations of entities and relations regarding complex relational facts from different semantic and structural perspectives.

Figure 2: A multiview knowledge learning process of the MvTransE model. The blue arrow indicates a subgraph generation operation for extracting local relational facts from the knowledge graph; the white arrow indicates a graph embedding process projecting the original knowledge graph and the generated subgraphs into a global-view vector space and several local-view vector spaces, respectively. The colored coordinates represent different vector spaces. The solid circles represent entity vectors in a vector space whose color matches that of the space coordinate. The four-angle stars represent relation vectors, and their colors correspond to the colors of relations in the knowledge graph.

Multiview Fusion Strategy.
In this section, we propose a multiview fusion strategy that adopts an adaptive selection principle to integrate the knowledge representations of the global-view vector space and the local-view vector spaces. For each testing triplet (h, r, t), we define a scoring estimation function S_r(h, t) to calculate the distance score in each vector space and then dynamically select the final representation of the triplet according to the minimum score. The scoring estimation function is defined as follows:

S_r(h, t) = min_{Δ∈ψ} ‖h_Δ + r_Δ − t_Δ‖_{l1/l2},

where Δ is a vector space in ψ that contains h, r, and t; h_Δ, r_Δ, and t_Δ are the vectors of h, r, and t in the vector space Δ.
Since each parallel vector space generally contains local relational facts related to a particular relation, our model can subtly solve the spatial congestion problem. Additionally, MvTransE constructs a global-view vector space containing the complete knowledge representations of the knowledge graph, so that any testing triplet can find a knowledge representation in at least one space. Thus, our model significantly improves the performance of learning simple relational knowledge.
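The adaptive selection principle can be sketched as a minimum over the candidate spaces. This is a hypothetical illustration, not the paper's implementation: `spaces` maps each space to its embedding dictionary, and a local space simply omits entities and relations it does not cover, while the global space contains everything, so at least one score always exists.

```python
import numpy as np

def multiview_score(h, r, t, spaces, norm=1):
    """Adaptive selection: score (h, r, t) in every vector space that
    contains all three elements and keep the minimum distance.

    `spaces` maps a space id to an embedding dict {name: vector}.
    """
    scores = []
    for emb in spaces.values():
        if h in emb and r in emb and t in emb:
            scores.append(np.linalg.norm(emb[h] + emb[r] - emb[t], ord=norm))
    return min(scores)
```

A local space that models the triplet's relation well will typically supply the smallest distance; when no local space covers the triplet, the global space still provides a fallback score.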

Experiments
In this section, we study the performance of our model in link prediction and triplet classification tasks under four public datasets, i.e., WN18, WN18RR, WN11, and FB15K-237.

Datasets. WordNet [2] is a large knowledge graph of
English vocabulary that is widely used in graph embedding works. In WordNet, a set of synonyms representing a basic vocabulary concept is taken as an entity, and various semantic relations are established between these synonym sets. In the following experiments, we use three public subsets of WordNet, i.e., WN18, WN18RR, and WN11. WN18 contains 18 relations and 40943 entities. WN18RR is a modified version of WN18 introduced by Dettmers et al. [23], which removes reversible relational facts to avoid the information leakage problem in representation tasks. WN11 consists of 11 relations and 38696 entities. Freebase is a large collaborative knowledge graph storing general facts of the real world. We use a subset of Freebase, i.e., FB15k-237 [21], which consists of 237 relations and 14541 entities in total. Table 1 presents the statistics of the above datasets.

Link Prediction.
Link prediction aims to predict the missing head entity h or tail entity t of a test triplet (h, r, t). In this experiment, we take the entity h (or t) missing from a test triplet as the correct entity, and all other entities are considered candidate entities. First, we construct candidate triplets by replacing h (or t) of the test triplet. Then, the link prediction score of each triplet is calculated by the scoring function of our model. Finally, the candidate entities and the correct entity are sorted in ascending order of their prediction scores. We adopt the two metrics used in [5] to evaluate our model: the average rank of each correct entity (Mean Rank) and the proportion of correct entities ranked in the top 10 (Hits@10). A good prediction performance should achieve a high Hits@10 and a low Mean Rank.
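The two metrics can be computed directly from the 1-indexed ranks of the correct entities; the following is a small illustrative helper, not the paper's evaluation code.

```python
def rank_metrics(ranks, k=10):
    """Mean Rank and Hits@k from the ranks of correct entities
    (1-indexed, sorted ascending by prediction score)."""
    mean_rank = sum(ranks) / len(ranks)
    hits_at_k = sum(r <= k for r in ranks) / len(ranks)
    return mean_rank, hits_at_k

# Four test triplets whose correct entities ranked 1st, 3rd, 12th, 40th.
mr, hits = rank_metrics([1, 3, 12, 40], k=10)
assert mr == 14.0 and hits == 0.5
```

Note how one badly ranked entity (rank 40) dominates Mean Rank while leaving Hits@10 untouched, which is why the two metrics together give a fuller picture.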
Note that some candidate triplets may already exist in the knowledge graph, so they should also be considered correct triplets. The scores of these candidate triplets are likely to be lower than that of the test triplet. Therefore, we should filter out candidate triplets that already appear in the train, validation, or test sets. We denote the evaluation setting by "Filt" if we filter out these candidate triplets before testing, and by "Raw" otherwise. We compare MvTransE with several state-of-the-art methods on the link prediction task on the WN18, WN18RR, and FB15K-237 datasets. On WN18, MvTransE is compared to RESCAL [24], SE, SME, LFM, TransE, TransH, TransR/CTransR, puTransE, and TransAt in Table 2. In Table 3, we compare MvTransE with three competitive methods, DistMult [25], ComplEx [26], and ConvE [23], on the WN18RR and FB15K-237 datasets, neither of which has the information leakage problem found in WN18. We directly use the results reported in their published papers or in [7] owing to the same experimental settings. For MvTransE, our experimental settings are as follows.

Table 1: Statistics of the datasets.

Dataset    #Ent   #Rel  #Train  #Valid  #Test
WN18       40943  18    141442  5000    5000
WN11       38696  11    112581  2609    10544
WN18RR     40943  11    86835   3034    3134
FB15K-237  14541  237   272115  17535   20466

Table 2 presents the corresponding experimental results of three model settings: (1) MvTransE (Global-view) denotes the prediction results using the global-view representation.
(2) MvTransE (Local-view) denotes the prediction results using all parallel local-view representations. (3) MvTransE (Multiview) denotes the prediction results of the integrated multiview representation derived from the multiview fusion strategy. As seen in Table 2, the multiview representation yields better results than the global-view and local-view representations. The results prove that our idea of representing knowledge from multiple perspectives is effective. In detail, our method substantially outperforms state-of-the-art methods on the Mean Rank metric. On Hits@10, our method is also superior to all baseline methods, achieving the best performance under the "Raw" setting and the same best performance as TransAt under the "Filt" setting. Table 3 presents the results of MvTransE (Multiview) on WN18RR and FB15K-237 to further illustrate the merits of our method. MvTransE (Multiview) markedly outperforms all methods on WN18RR, achieving state-of-the-art performance on both metrics. On FB15K-237, MvTransE (Multiview) achieves the best performance among all methods on the Mean Rank metric, performing the same as ConvE on the Hits@10 metric. In particular, our model achieves excellent performance on all three datasets on the Mean Rank metric, which evaluates the overall quality of the learned knowledge representations.
This is because MvTransE learns knowledge representations from multiple perspectives and dynamically fuses these representations into an optimal combination.
To further explain the above observation, we present the prediction results of our method regarding all types of relational facts of WN18. Table 4 lists the type distribution of triplets in WN18 based on four relation categories. Table 5 presents the experimental results of three model settings on each relation category. Specifically, the global-view setting outperforms the local-view setting in predicting 1-to-1, 1-to-N head, and N-to-1 tail on both metrics, which actually fall into the simple relational facts learning category. On the contrary, the local-view setting exhibits superior performance in predicting complex relational facts, i.e., including N-to-N, N-to-1 head, and 1-to-N tail facts learning category. Clearly, by combining the advantages of global-view and local-view settings, the multiview method achieves the best performance in learning simple and complex relational facts.

Triplet Classification.
The triplet classification task aims to determine whether a given triplet (h, r, t) is correct or not. In this experiment, we adopt WN11 to verify the effectiveness of our method. Following the experimental setting of previous work [5], we set a classification threshold δ_r for each relation r. To maximize the classification accuracy, we optimize δ_r on the validation set. Given a test triplet (h, r, t), if its score is lower than δ_r, it is classified as a positive sample, and otherwise as a negative sample. We choose SE, SME, the Single Layer Model (SLM) [17], LFM, NTN, TransE, TransH, and puTransE as baselines. We use the results reported in [7] directly since the dataset is the same. For MvTransE, our experimental settings are as follows.
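The per-relation threshold search can be sketched as a simple scan over candidate values drawn from the validation scores; this is an illustrative helper, not the authors' implementation.

```python
def best_threshold(pos_scores, neg_scores):
    """Pick the threshold delta_r that maximizes validation accuracy,
    where a triplet is classified positive iff its score is below
    delta_r. Candidates are taken from the observed scores themselves.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best, best_acc = None, -1.0
    for c in candidates:
        delta = c + 1e-9   # classify scores <= c as positive
        acc = (sum(s < delta for s in pos_scores)
               + sum(s >= delta for s in neg_scores)) \
              / (len(pos_scores) + len(neg_scores))
        if acc > best_acc:
            best, best_acc = delta, acc
    return best, best_acc
```

With perfectly separated validation scores, e.g. positives {0.2, 0.5} and negatives {0.9, 1.1}, the scan settles on a threshold just above 0.5 with accuracy 1.0; the same delta_r is then applied to the test triplets of that relation.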

Subgraph Generation Setup.
We set the number of subgraphs n = 5000, the length of random walks n_s ∈ [50, 300], and the size of subgraphs n_t ∈ [200, 1000].
The experimental results of the triplet classification are shown in Figure 3. Clearly, MvTransE achieves the best performance among all baseline methods. Compared with the translation-based methods TransE and TransH, our method learns complex relational facts more subtly by constructing subgraphs from the original knowledge graph. On the other hand, our method outperforms the recent competitor puTransE thanks to our knowledge fusion strategy, which leverages an adaptive selection principle to reasonably integrate the global-view and local-view knowledge representations. Therefore, MvTransE is more suitable for embedding large and complex knowledge graphs, and it has great advantages in knowledge graph completion tasks due to its multiview knowledge learning and fusion methods.

Conclusion and Future Work
In this work, we propose a multiview translation learning model, named MvTransE, which represents graph relational facts from the global view and the local view, respectively. MvTransE achieves state-of-the-art performance by solving the entity spatial congestion problem and the relational fact impairment problem. Extensive experiments demonstrate that MvTransE outperforms state-of-the-art models on the link prediction and triplet classification tasks. In the future, we will focus on the subgraph construction scheme to learn local relational facts more efficiently.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.