Confidence-Aware Embedding for Knowledge Graph Entity Typing



Introduction
Knowledge graphs (KGs) consist of a huge number of triples, each of which is formally denoted as (head entity, relation, tail entity), or (e_h, r, e_t). KGs are effective, well-structured relational databases for knowledge acquisition. Besides the triples, KGs usually contain a great number of entity type instances in the form of (entity, entity type), denoted by (e, τ) [1], which indicate that an entity e is of a certain entity type τ. For example, the entity "Tom Hanks" is an instance of the type "actor." As an essential part of KGs, entity type instances have been widely used in NLP tasks such as entity linking [2], relation extraction [3], and question answering (QA) [4]. For instance, a KG-driven QA system could utilize the entity type information to answer the query "Is Tom Hanks an actor?" In recent years, many KGs have been built from semistructured data or free text, such as Freebase [5], YAGO [6], Knowledge Vault [7], and Google Knowledge Graph [7].
However, the automatic construction of large-scale knowledge graphs inevitably brings noises into KGs due to limited human supervision. For example, the open fine-grained entity typing system [8] achieves only 58.8% accuracy, which reaffirms the existence of entity type noises in KG construction. Thus, the nonnegligible entity type noise problem severely impedes the efficient use of KGs [9].
In this work, we focus on dealing with entity type noises in existing KGs by learning entity type embeddings, which encode all the entities and entity types into a latent vector space. Since the learning approach depends on the reliability of the existing (entity, entity type) tuples, it is crucial to consider the entity type noises when learning embeddings. Several models have been proposed for KG entity type embedding learning [10][11][12]. However, most of these methods unreasonably assume that all the existing entity type instances in KGs are true, which may lead to errors in downstream tasks that are sensitive to entity types. To address this issue, we propose ConfE, a novel confidence-aware embedding framework for entity type learning that takes the entity type noises into consideration. Figure 1 shows a simple illustration of ConfE, which learns entity type embeddings with tuple confidence on noisy KGs. Entity type noises are expected to be detected by ConfE and ignored when learning entity type embeddings.
Specifically, we build two different spaces, an entity space and an entity type space, for learning the embeddings of entities and entity types, since they are different objects in the (e, τ) tuple. We utilize a unique "rdf:type" relation matrix M to specify the interaction of their embeddings, that is, e⊤Mτ, and incorporate the tuple confidence C(e, τ) as well. To make the tuple confidence more universal, we only utilize the internal structural information of KGs, which makes the problem more challenging. Accordingly, we propose two kinds of tuple confidence that consider the local tuple and the global triple structural information in KGs, respectively. Extensive experimental results on two tasks, entity type noise detection and entity type prediction, show that our model achieves the best performance, which demonstrates the effectiveness of ConfE in learning better entity type embeddings in a noisy scenario.
The main contributions are summarized as follows: (i) We propose ConfE, a novel confidence-aware embedding model that encodes the (entity, entity type) tuples to calculate the similarity of an entity and an entity type and takes the tuple confidence into consideration. (ii) We build two distinct tuple confidences according to the local and global structural information in existing KGs. Their overall confidence is utilized in the final energy function for learning better embeddings. (iii) We conduct two experimental tasks, entity type noise detection and entity type prediction, on two public benchmark datasets (i.e., FB15kET and YAGO43kET) to verify the effectiveness of our model and the confidence-aware framework.

Related Work

KG Noise Detection.
In recent years, research on KG noise detection, also known as KG refinement [13], has attracted wide attention. The noise issue can be roughly classified into two classes, that is, false relationships between entities (head entity, relationship, tail entity) and false entity type instances (entity, entity type). Most of the existing research concentrates on dealing with noisy triple facts in KGs [9, 14-21]. For example, Jiang et al. [15] present a Markov logic-based system for cleaning an extracted knowledge base. Melo and Paulheim [17] propose an error detection method that relies on path and type features used by a classifier for every relation in the graph, exploiting local feature selection. Neil et al. [18] introduce a regularized attention mechanism to GCNNs that not only improves performance on clean datasets but also favorably accommodates noise in KGs. Liang et al. [19] propose a method for graph-based wrong IsA relation detection in a large-scale lexical taxonomy. Pujara et al. [9] propose to improve the quality of knowledge graphs by removing errors and adding missing facts. Xie et al. [20] propose a confidence-aware knowledge representation learning framework that detects possible noises in KGs while simultaneously learning knowledge representations with confidence. Zhao et al. [21] propose a trustiness-aware method for KG noise detection. Despite their success, these methods focus on detecting triple fact noises, so their goals differ from ours. There are a few models dealing with entity typing noises [22]. However, they mainly rely on association rule mining [23] or heuristic link-based type inference [24] and are therefore limited in their capability of generalization. Recently, Ren et al. [25] propose a heterogeneous partial-label embedding model for label noise detection. Tempelmeier et al. [26] propose an approach to predict the missing categories for particular entities that are obtained from noisy and sparse Web markup. Despite their success, their goals are different from KG entity type noise detection, and none of them consider the confidence of the entity type tuples. In this work, we concentrate on knowledge graph entity type noise detection and learn better entity type embeddings with confidence in a noisy scenario. The models for KG noise detection are summarized in Table 1.

KG Embedding.
Recently, KG embedding has become a hot topic in the AI and NLP research fields [27]. Most of the existing embedding models concentrate on learning the (head entity, relationship, tail entity) triples, such as SE [28], NTN [29], TransE [30], TransH [31], TransR [32], TransG [33], ComplEx [34], SSP [35], ProjE [36], ConvE [37], KBGAT [38], CapsE [39], and ConvKB [40], and pay less attention to embedding the (entity, entity type) tuples. Neelakantan and Chang [10] propose a method to infer missing entity type instances, where they embed the (e, τ) tuple by e⊤τ. However, they also use external information from Wikipedia besides the information within the existing KG. Moon et al. [11] propose an approach for entity type embedding (ETE), in which they build the energy function as ‖e − τ‖_{ℓ1}. Despite their success, these models lack sufficient modeling capability due to their structural simplicity. In this work, we introduce an embedding model with better expressive capability, which considers the structural information of both the (entity, entity type) tuples and the (head entity, relationship, tail entity) triples in KGs.

Methodology
To detect possible entity type noises in KGs and learn better entity type representations, we introduce a novel concept, tuple confidence, for each (entity, entity type) tuple. Tuple confidence describes the correctness and significance of a tuple, which can be measured according to local tuple and global triple information. The novelty of this work is to model the confidence of entity type instances for typing noise detection and to propose an embedding method for modeling tuples.
In the following, we first present the confidence-aware embedding learning framework and then describe the embedding model and the methods for calculating the tuple confidences.

Confidence-Aware Embedding Learning Framework.
We intend to detect entity type noises and learn better entity type embeddings that take tuple confidence into consideration. Our ConfE model should concentrate more on those tuples with higher confidence. Similar to [20], we formally design the energy function over the tuples (e, τ) as follows:

$$E(H) = \sum_{(e, \tau) \in H} G(e, \tau) \cdot C(e, \tau),$$

where H denotes the set of all (e, τ) tuples in KGs. The energy function consists of two parts: (i) G(e, τ) = e⊤Mτ denotes the model score of the tuple (more details are included in Section 3.2), in which an asymmetric matrix M specifies the interaction of the latent representations of the entity and the entity type. A higher G(e, τ) indicates better interaction between the latent embeddings of the entity and the entity type in the tuple. (ii) Different from conventional methods, we introduce tuple confidence into the framework. C(e, τ) stands for the overall tuple confidence of the tuple (Section 3.3). A higher tuple confidence C(e, τ) implies that the corresponding tuple is more credible and thus should be given more weight during learning. Tuple confidence can be calculated both during and after KG construction from different aspects, including internal knowledge in the KG (such as topological information) and external information (such as textual data). To make our tuple confidence more universal and flexible, we only consider the KG structural information. Accordingly, we propose local and global tuple confidences that are learned iteratively during model training.
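To make the role of the two parts concrete, the following minimal sketch weights each model score by its tuple confidence. It assumes that the energy is the sum of the products G(e, τ) · C(e, τ) over all tuples, in the style of [20]; the function name and the toy values are hypothetical and only serve as illustration.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (entity, entity type)


def confidence_weighted_energy(scores: Dict[Pair, float],
                               confidence: Dict[Pair, float]) -> float:
    """Sum G(e, tau) * C(e, tau) over all tuples (assumed product form)."""
    return sum(scores[pair] * confidence[pair] for pair in scores)


# Hypothetical toy values: one plausible tuple and one likely-noisy tuple.
scores = {("Tom Hanks", "actor"): 2.0, ("Tom Hanks", "river"): 1.0}
confidence = {("Tom Hanks", "actor"): 1.0, ("Tom Hanks", "river"): 0.25}

energy = confidence_weighted_energy(scores, confidence)
print(energy)
```

In this sketch the low-confidence tuple contributes only a quarter of its score, which is the intended effect of concentrating on tuples with higher confidence.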

Model Optimization.
Following [30], we utilize the margin-based ranking loss function to train our model ConfE. The main idea is that each tuple in the training set, H^(i) = (e^(i), τ^(i)), should receive a higher score than a corrupt tuple in which the entity or the entity type is replaced with a random one. The ranking loss function is defined as follows:

$$L = \sum_{(e, \tau) \in H} \sum_{(e', \tau') \in H'} \max\left(0, \gamma_1 + G(e', \tau') - G(e, \tau)\right) \cdot C(e, \tau),$$

where G(e, τ) and G(e′, τ′) represent the model scores of the positive tuple and the negative tuple, respectively. The tuple confidence C(e, τ) makes our algorithm learn more from those convincing tuples with higher confidence. γ_1 is a margin hyperparameter for separating positive instances from negative ones. H′ is a set of corrupt tuples built in the following way:

$$H' = \{(e', \tau) \mid e' \in E\} \cup \{(e, \tau') \mid \tau' \in T\}.$$

Note that we do not replace both entity and entity type with a random one at the same time.

Table 1: Models for KG noise detection.

Models               Data                            Method                         Goal
Jiang et al. [15]    Structured information in KG    Markov logic-based             Triple fact noises
PaTyBRED [17]        Path and type features in KG    Classifier                     Triple fact noises
Neil et al. [18]     Structured information in KG    Attention mechanism + GCNNs    Triple fact noises
Liang et al. [19]    Large-scale lexical taxonomy    Graph-based method             IsA relation detection
Pujara et al. [9]    Structured information in KG    Embedding techniques           Triple fact noises
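The training signal described in this section can be sketched as follows. The sketch assumes a hinge form max(0, margin + negative score − positive score) weighted by the tuple confidence, and a corruption routine that replaces either the entity or the entity type, never both. The helper names, the 50/50 choice between the two corruption modes, and the toy vocabularies are assumptions, and corrupt tuples are not checked against other true tuples.

```python
import random
from typing import Sequence, Tuple

Pair = Tuple[str, str]  # (entity, entity type)


def corrupt(pair: Pair, entities: Sequence[str], types: Sequence[str],
            rng: random.Random) -> Pair:
    """Replace either the entity or the type with a different random one."""
    e, tau = pair
    if rng.random() < 0.5:  # assumed: both corruption modes equally likely
        return (rng.choice([x for x in entities if x != e]), tau)
    return (e, rng.choice([t for t in types if t != tau]))


def weighted_hinge(pos_score: float, neg_score: float,
                   conf: float, margin: float = 1.0) -> float:
    """Confidence-weighted margin ranking loss for one positive/negative pair."""
    return conf * max(0.0, margin + neg_score - pos_score)


entities = ["Tom Hanks", "Paris", "Danube"]
types = ["actor", "city", "river"]
rng = random.Random(0)

negative = corrupt(("Tom Hanks", "actor"), entities, types, rng)
satisfied = weighted_hinge(pos_score=2.0, neg_score=0.5, conf=0.5)
violated = weighted_hinge(pos_score=0.5, neg_score=1.0, conf=0.5)
print(negative, satisfied, violated)
```

A tuple whose margin is already satisfied contributes no loss, while a violating tuple contributes in proportion to its confidence, so suspected noise has less influence on the gradients.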

Embedding Model.
We introduce the embedding model of an (e, τ) tuple in this section. Similar to [11], we treat the (e, τ) tuples as triple facts that share a unique relationship "rdf:type", for example, (Tom Hanks, rdf:type, actor). Accordingly, we assign to the "rdf:type" relationship an asymmetric matrix M that specifies the interaction of the latent representations of the entity and the entity type, which is inspired by the previous embedding model RESCAL [41]. Formally, the embedding model of a given (e, τ) tuple is designed as follows:

$$G(e, \tau) = \mathbf{e}^{\top} \mathbf{M} \boldsymbol{\tau},$$

where e ∈ E and τ ∈ T, and E and T are the sets of entities and entity types, respectively. Different from the conventional methods that encode entities and types into a common space, we build two distinct latent vector spaces for them, that is, an entity space and an entity type space, since entities and entity types are different objects in KGs. e ∈ R^κ stands for the representation of an entity e in the entity space, and τ ∈ R^ℓ is the representation of a type τ in the entity type space. M ∈ R^{κ×ℓ} denotes the asymmetric matrix. Since the representations of entity types capture the common knowledge of all their entities, they usually have fewer parameters, that is, ℓ < κ. The model score is expected to be higher for a positive tuple and lower for a negative one.
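As an illustration of the bilinear score G(e, τ) = e⊤Mτ with separate entity and type spaces, the following sketch uses plain Python lists. The dimensions (κ = 3, ℓ = 2) and all numeric values are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_score(e: Vector, M: Matrix, tau: Vector) -> float:
    """G(e, tau) = e^T M tau with e in R^k, M in R^(k x l), tau in R^l."""
    if len(M) != len(e) or any(len(row) != len(tau) for row in M):
        raise ValueError("M must have shape len(e) x len(tau)")
    return sum(e_i * sum(m_ij * t_j for m_ij, t_j in zip(row, tau))
               for e_i, row in zip(e, M))


e = [1.0, 0.0, 2.0]        # entity embedding, k = 3
tau = [0.5, -1.0]          # entity type embedding, l = 2 (l < k)
M = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0]]           # "rdf:type" interaction matrix, 3 x 2

g = bilinear_score(e, M, tau)
print(g)
```

Because M is rectangular, entities and types do not need to share a dimensionality, which is what allows the type space to be smaller than the entity space.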

Tuple Confidence.
In this section, we introduce the detailed methods for calculating the tuple confidence, which consists of two parts: (i) the local tuple confidence LC(e, τ), which only considers the structural information inside a tuple, and (ii) the global tuple confidence GC(e, τ), which considers the global triple information in KGs.

Local Tuple Confidence.
We first introduce the local tuple confidence LC(e, τ), which only concentrates on the inside of a tuple. We assume that the better a tuple fits the interaction assumption, the more convincing this tuple should be considered. The basic idea is that the model score of a positive tuple should be higher than that of a negative one. We believe that the larger the margin by which the score of a tuple exceeds that of a negative tuple, the more convincing the tuple should be considered in training. To measure the local tuple confidence during training, we first judge the current conformity of each tuple with the interaction assumption. Inspired by the margin-based training strategy, we directly utilize it to represent the local tuple quality Q_lt(e, τ) as follows:

$$Q_{lt}(e, \tau) = -\left(\gamma_1 + G(e', \tau') - G(e, \tau)\right),$$

where γ_1 is the margin hyperparameter of the ranking loss and (e′, τ′) is a corrupt tuple. A higher Q_lt(e, τ) usually indicates a better tuple judged by the interaction assumption. Hence, the local tuple confidence LC(e, τ) changes with its corresponding tuple quality Q_lt(e, τ), which is formally built as follows:

$$LC(e, \tau) = \begin{cases} \alpha \cdot LC(e, \tau), & Q_{lt}(e, \tau) \le 0, \\ LC(e, \tau) + \beta, & Q_{lt}(e, \tau) > 0. \end{cases}$$

We assume all given tuples are true and set LC(e, τ) = 1 at the beginning, and the value is continuously updated during training. α ∈ (0, 1) and β > 0 are hyperparameters that control the speed at which LC(e, τ) decreases and increases, respectively. If Q_lt(e, τ) ≤ 0, the interaction between the entity and the entity type performs poorly, and thus the local tuple confidence should decrease; otherwise, it should increase. The local tuple confidence LC(e, τ) decreases at a geometric rate and increases by a constant addition. This punishes violations of the interaction rule by those tuples that are more likely to be noises, so that they receive smaller confidences.
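The update rule described above (geometric decrease when the quality is not positive, constant increase otherwise) can be sketched as follows. The quality is taken here as positive score minus negative score minus the margin, and the cap at 1.0 is an assumption added so that the confidence stays in (0, 1]; the function names and numeric values are hypothetical.

```python
from typing import List


def local_quality(pos_score: float, neg_score: float, margin: float) -> float:
    """Assumed quality: positive when the margin over the negative is met."""
    return pos_score - neg_score - margin


def update_confidence(conf: float, quality: float,
                      alpha: float, beta: float) -> float:
    """Shrink by factor alpha if quality <= 0, otherwise add beta (capped)."""
    if not 0.0 < alpha < 1.0 or beta <= 0.0:
        raise ValueError("require 0 < alpha < 1 and beta > 0")
    if quality <= 0.0:
        return alpha * conf
    return min(1.0, conf + beta)  # assumed cap at 1.0


conf = 1.0  # every tuple starts as fully trusted
history: List[float] = []
for pos, neg in [(0.2, 0.9), (0.4, 0.5), (2.0, 0.1)]:
    q = local_quality(pos, neg, margin=1.0)
    conf = update_confidence(conf, q, alpha=0.5, beta=0.1)
    history.append(conf)
print(history)
```

Two violations halve the confidence twice, while a single conforming step restores only a small constant amount, so a tuple must conform repeatedly to regain trust.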

Global Tuple Confidence.
Despite the success of LC, it only concentrates on the inside of tuples, ignoring valuable triple facts in KGs. We observe that the relational triple information is also helpful for judging tuple quality. Inspired by the work in [1], we first build the entity type triple (head type, relationship, tail type) by replacing both the head entity and the tail entity with their corresponding entity types, that is, (e_h, r, e_t) ⟶ (τ_h, r, τ_t), using the two entity type tuples (e_h, τ_h) and (e_t, τ_t). The main idea is that a significant premise for a triple to hold is that the entity types of its head and tail obey its relationship. Accordingly, we utilize the translating assumption [30] to model the entity type triples, that is, I(τ_h, r, τ_t) = ‖τ_h + r − τ_t‖. We believe that the better an entity type triple fits the translation assumption, the more convincing the corresponding entity type tuple should be considered. Therefore, we calculate the global triple quality Q_gt(e, τ) of an entity type tuple (e, τ) from the scores I of its entity type triples, where τ ∈ {τ | (e_h, r, e_t) ∈ D, (e, τ) ∈ H}, D denotes the set of positive triple facts in KGs, τ′ is a random negative entity type, and γ_2 > 0 is a hyperparameter. Hence, the global tuple confidence GC(e, τ) can be learned during training as follows:

$$GC(e, \tau) = \begin{cases} \alpha \cdot GC(e, \tau), & Q_{gt}(e, \tau) \le 0, \\ GC(e, \tau) + \beta, & Q_{gt}(e, \tau) > 0. \end{cases}$$

Here, the iterative learning process of GC(e, τ) is similar to that of LC(e, τ). We assume all tuples are true and set GC(e, τ) = 1 at the beginning, and the values are continuously updated during training. α ∈ (0, 1) and β > 0 are hyperparameters that control the speed at which GC(e, τ) ∈ (0, 1] is updated.
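The two building blocks of the global confidence, namely lifting a relational triple to entity type triples and scoring them with the translating assumption, can be sketched as follows. The L1 norm, the relation name "starred_in", and the toy embeddings are assumptions; the sketch does not attempt to reproduce the exact form of the global triple quality.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Triple = Tuple[str, str, str]


def type_triples(triple: Triple, types_of: Dict[str, List[str]]) -> List[Triple]:
    """Expand (head, relation, tail) into (head type, relation, tail type)."""
    head, rel, tail = triple
    return [(th, rel, tt)
            for th in types_of.get(head, [])
            for tt in types_of.get(tail, [])]


def translation_score(tau_h: Vector, r: Vector, tau_t: Vector) -> float:
    """I = ||tau_h + r - tau_t||, here with the L1 norm (assumed)."""
    return sum(abs(a + b - c) for a, b, c in zip(tau_h, r, tau_t))


types_of = {"Tom Hanks": ["actor", "person"], "Forrest Gump": ["film"]}
lifted = type_triples(("Tom Hanks", "starred_in", "Forrest Gump"), types_of)

emb = {"actor": [1.0, 0.0], "film": [1.0, 1.0], "river": [-1.0, 2.0]}
rel = [0.0, 1.0]  # hypothetical embedding of "starred_in"

good = translation_score(emb["actor"], rel, emb["film"])
bad = translation_score(emb["actor"], rel, emb["river"])
print(lifted, good, bad)
```

A lower translation score means the type triple fits the relation better, so a type that yields consistently large scores against the types of neighboring entities is a candidate for reduced global confidence.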

Overall Tuple Confidence.
Finally, we build the overall tuple confidence for the confidence-aware energy function. The overall tuple confidence consists of the following two parts: (i) the local tuple confidence LC(e, τ) and (ii) the global tuple confidence GC(e, τ). It is formally designed as a trade-off between the two, where λ ∈ (0, 1) is the trade-off parameter.
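One simple way to realize such a trade-off is a convex combination of the two confidences. The sketch below assumes this form, with λ weighting the local confidence and 1 − λ weighting the global one; which part λ multiplies is an assumption, as are the function name and the example values.

```python
def overall_confidence(lc: float, gc: float, lam: float) -> float:
    """Assumed convex combination: lam * LC + (1 - lam) * GC."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    return lam * lc + (1.0 - lam) * gc


# Hypothetical values: locally trusted tuple with weaker global support.
c = overall_confidence(lc=1.0, gc=0.5, lam=0.6)
print(c)
```

Under this assumption the overall confidence stays between the local and global values, so neither signal alone can drive a tuple's weight to an extreme.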

Experiments
In this section, we evaluate the effectiveness of ConfE on entity type noise detection and entity type prediction.

Entity Type Noise Detection.
In this experiment, we conduct entity type noise detection, that is, detecting possible noisy entity types according to their tuple scores.
Evaluation protocol: We consider the model score G(e, τ) = e⊤Mτ for each entity type tuple. Similar to [29], we rank all (entity, entity type) tuples in the noisy training set by their scores in descending order. Therefore, the tuples ranked lower are more likely to be noisy ones. We utilize precision/recall curves to demonstrate the effectiveness of our model.
Experimental results: Figures 2 and 3 show the performance of all models on entity type noise detection, from which we can find the following: (i) On FB15kET and YAGO43kET, our ConfE model achieves the best performance under different noise rates, which confirms that ConfE can effectively detect entity type noises in KGs. As the recall increases, the improvement of ConfE over the baselines becomes smaller, which reaffirms that the noises greatly impede entity type noise detection. (ii) ConfE seems to improve more significantly on FB15kET than on YAGO43kET. Considering that there are only 37 relations in YAGO43kET compared with 1345 in FB15kET, the sparseness of relationships harms the effectiveness of the type-relation-type training set. Such sparseness causes a relation to be connected to too many entity types, so that the embedding of the relation may not be capable of accurately describing its connection with different entity types. The results also verify the effectiveness and robustness of our model in both scenarios.
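The evaluation protocol above can be sketched as follows: tuples are examined from the lowest score upward (equivalently, from the bottom of the descending ranking), and precision and recall are computed at each cutoff against a set of known noisy tuples. The function name and the toy scores are hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (entity, entity type)


def noise_pr_curve(scored: List[Tuple[Pair, float]],
                   noisy: Set[Pair]) -> List[Tuple[float, float]]:
    """Return (precision, recall) after flagging the k lowest-scored tuples."""
    if not noisy:
        raise ValueError("need at least one known noisy tuple")
    ranked = sorted(scored, key=lambda item: item[1])  # most suspicious first
    curve = []
    hits = 0
    for k, (pair, _score) in enumerate(ranked, start=1):
        if pair in noisy:
            hits += 1
        curve.append((hits / k, hits / len(noisy)))
    return curve


scored = [(("Tom Hanks", "actor"), 2.0),
          (("Tom Hanks", "river"), -1.0),
          (("Paris", "city"), 0.5),
          (("Paris", "actor"), 1.5)]
noisy = {("Tom Hanks", "river"), ("Paris", "actor")}

curve = noise_pr_curve(scored, noisy)
print(curve)
```

In this toy case the lowest-scored tuple is noisy, so the first point has precision 1.0, while the second noisy tuple is only reached at the third cutoff, which lowers precision as recall reaches 1.0.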

Entity Type Prediction.
This task aims to verify the effectiveness of the ConfE model in entity type prediction, that is, completing the missing entity type tuple (entity, entity type = ?).
Evaluation protocol: For each test tuple, we first remove its entity type and fill the resulting vacancy with all the entity types in turn to form candidate tuples. Second, we compute the score of each candidate tuple based on the function G(e, τ) and rank the candidates in descending order. Then, we obtain the rank of the original tuple. Finally, we use (1) the mean reciprocal rank (MRR) and (2) the proportion of correct entity types ranked in the top 10 (HITS@10 (%)) as evaluation metrics for comparison. We follow the method utilized in [30] to define the evaluation settings of "Raw" and "Filter". The metrics are computed as

$$\mathrm{MRR} = \frac{1}{|C|} \sum_{i=1}^{|C|} \frac{1}{\mathrm{rank}_i}, \qquad \mathrm{HITS@10} = \frac{1}{|C|} \sum_{i=1}^{|C|} \mathbb{1}\left[\mathrm{rank}_i \le 10\right],$$

where C is the collection of all testing (entity, entity type) tuples and rank_i is the rank of the true candidate tuple for the i-th pair.
Experimental results: Tables 3 and 4 show the results of all models on entity type prediction, from which we can observe the following: (i) ConfE consistently and significantly performs better than the baselines on the FB15kET noisy testing datasets with all evaluation metrics. This reaffirms the quality of the knowledge embeddings in our ConfE model, which is helpful for both KG entity type prediction and entity type noise detection. (ii) On the YAGO43kET noisy testing datasets, ConfE outperforms the baselines on MRR in the "Raw" setting. Compared with HITS@10, MRR places more importance on the average ranking of the original tuple. We conjecture that although ConfE may not be as good as the baselines on HITS@10, it still has considerable advantages in improving the average prediction accuracy. (iii) In the "Filter" setting, ConfE performs better on HITS@10 and has a comparable performance on MRR, which confirms its capability of entity type prediction. Moreover, our model has stronger adaptability in large-scale data modeling than other state-of-the-art models.
The best scores are given in bold, and the second-best ones are given in italics.
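The ranking-based evaluation can be sketched as follows. The rank of the true type is computed from the candidate scores; in the "Filter" setting, other types that are known to be correct for the same entity are skipped, following the convention of [30]. The function names, the optimistic handling of ties, and the toy scores are assumptions.

```python
from typing import Dict, FrozenSet, List


def rank_of_true(scores: Dict[str, float], true_type: str,
                 known_types: FrozenSet[str] = frozenset()) -> int:
    """1-based rank of true_type by descending score.

    Candidates in known_types (other correct types of the entity) are
    ignored, which gives the "Filter" setting; an empty set gives "Raw".
    """
    target = scores[true_type]
    better = sum(1 for t, s in scores.items()
                 if t != true_type and t not in known_types and s > target)
    return better + 1


def mrr(ranks: List[int]) -> float:
    """Mean reciprocal rank over all test tuples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks: List[int], k: int = 10) -> float:
    """Percentage of test tuples whose true type is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


scores = {"actor": 0.90, "person": 0.95, "producer": 0.92, "river": 0.10}
raw_rank = rank_of_true(scores, "actor")
filtered_rank = rank_of_true(scores, "actor", frozenset({"person", "producer"}))

ranks = [1, 3, 12]  # hypothetical ranks of three test tuples
print(raw_rank, filtered_rank, mrr(ranks), hits_at(ranks))
```

The example shows why the "Filter" setting usually reports better numbers: other valid types of the same entity no longer push the tested type down the ranking.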

Conclusion and Future Work
We propose a novel confidence-aware embedding framework (ConfE) for KG entity typing on noisy knowledge graphs, which takes the (entity, entity type) tuple confidence into consideration. Specifically, we build a bilinear embedding model for the (entity, entity type) tuples. Moreover, we calculate the tuple confidence by considering the internal structural information in KGs. We evaluate our model on two tasks, entity type noise detection and entity type prediction. Empirical results on FB15kET and YAGO43kET demonstrate the effectiveness of the proposed ConfE model in entity type noise detection. An interesting direction for future work is to detect noises in entity type instances and entity type triples simultaneously.

Data Availability
The data are available upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
Yu Zhao and Jiayue Hou contributed equally.