Integrating Cross-lingual Ontologies through Co-Evolutionary Algorithm

To support collaboration among intelligent applications, it is necessary to integrate various ontologies that are developed and maintained by different organizations. One of the challenges is that different ontologies in the same application domain might use different languages to describe the same concept, which yields the cross-lingual heterogeneity problem, i.e., how to map two identical entities in different languages. To address this problem, this work proposes a problem-specific co-evolutionary algorithm (CoEA)-based matching technique. In particular, we first propose a parallel aggregating framework to aggregate different similarity measures (SMs) and then construct a continuous optimization model to define the problem of cross-lingual ontology integration. To better trade off the algorithm's exploitation and exploration, we use two competitive subpopulations to execute the exploitation and exploration, respectively. The experiment utilizes OAEI's Multifarm track for testing purposes, and the experimental results show that CoEA is able to effectively integrate various cross-lingual ontologies.


Introduction
With the rapid development of the Semantic Web [1,2], various ontologies have been developed in diverse domains to annotate data. To support collaboration among intelligent applications, it is necessary to integrate ontologies that are developed and maintained by different organizations. One of the challenges is that different ontologies in the same application domain might use different languages to describe the same concept, which yields the cross-lingual heterogeneity problem. To address this issue, we need to map identical entities expressed in different languages, which is the so-called cross-lingual ontology matching [3]. When matching two ontologies, it is important to use similarity measures (SMs) to distinguish heterogeneous entities [4], and usually, different SMs should be aggregated so that their advantages and disadvantages complement each other, which helps improve the confidence of the result. Currently, the parallel framework offers such merits as flexibility in tuning the various weights and relative independence of the different matchers' executing processes. Before aggregating the SMs, their corresponding similarity matrices are first calculated independently. A similarity matrix's rows and columns correspond, respectively, to the two ontologies' entities, and its elements are the corresponding entity pairs' similarity values. Then, these matrices are aggregated into one matrix using the aggregating weights. Finally, a threshold is used to filter out the correspondences with low similarity values.
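The parallel aggregating framework described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names and the use of plain nested lists for matrices are our own assumptions.

```python
# Illustrative sketch of the parallel aggregating framework:
# each SM produces its own similarity matrix; a weighted sum combines
# them, and a threshold filters out low-confidence correspondences.

def aggregate(matrices, weights):
    """Element-wise weighted sum of same-shaped similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    agg = [[0.0] * cols for _ in range(rows)]
    for m, w in zip(matrices, weights):
        for i in range(rows):
            for j in range(cols):
                agg[i][j] += w * m[i][j]
    return agg

def filter_alignment(agg, threshold):
    """Keep only correspondences whose aggregated similarity reaches the threshold."""
    return [(i, j, v) for i, row in enumerate(agg)
            for j, v in enumerate(row) if v >= threshold]
```

For example, aggregating two 2x2 matrices with equal weights and filtering at 0.8 keeps only the entity pairs on which both measures strongly agree.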
Since optimizing a cross-lingual ontology alignment is a complex task, metaheuristic approaches, such as the evolutionary algorithm (EA) [5], have become popular methods for determining high-quality alignments. Inspired by the success of metaheuristic approaches in the cross-lingual ontology matching domain, this work proposes a co-evolutionary algorithm (CoEA) to determine the cross-lingual ontology alignment. CoEA uses a compact mechanism and a co-evolutionary mechanism to overcome two shortcomings of the classic EA, i.e., high computational cost and premature convergence. In particular, we make the following contributions: a parallel aggregating framework for matching cross-lingual ontologies is presented; a continuous optimization model is built to define the cross-lingual ontology matching problem; and a problem-specific CoEA is proposed to optimize the cross-lingual ontology alignment's quality, which uses two competitive subpopulations to trade off the algorithm's exploitation and exploration. The rest of this study is organized as follows. After defining the cross-lingual ontology matching problem (Section 2), three kinds of SMs are introduced (Section 3), then CoEA is presented (Section 4), and the experimental results are shown (Section 5). Finally, the conclusion is drawn (Section 6).

Cross-Lingual Ontology Integrating Problem
An ontology consists of concepts, properties, and axioms, and an ontology alignment is a set of mappings between two heterogeneous ontologies. A mapping is a 3-tuple (c1, c2, simValue), where c1 and c2 are, respectively, two ontologies' entities, and simValue ∈ [0, 1] is their similarity [6,7]. Given an alignment A and a reference alignment A_ref, A's quality can be measured with the f-measure [8]:

precision = |A ∩ A_ref| / |A|, recall = |A ∩ A_ref| / |A_ref|, f-measure = 2 × precision × recall / (precision + recall),

where | · | is the cardinality of a set. The cross-lingual ontology integrating problem is modelled as a continuous optimization problem. In particular, its objective is to maximize the f-measure of the cross-lingual ontology alignment, and its decision variable is X = (x1, x2, ..., xn)^T, where xi ∈ [0, 1] is the ith aggregating weight and the weights sum to 1.
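The f-measure objective can be sketched in a few lines, assuming an alignment is represented as a set of entity-pair tuples; the function name is ours.

```python
def f_measure(alignment, reference):
    """Harmonic mean of precision and recall of an alignment against a
    reference alignment, both given as sets of (entity1, entity2) pairs."""
    correct = len(alignment & reference)  # correspondences found in both
    if correct == 0:
        return 0.0
    precision = correct / len(alignment)
    recall = correct / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For instance, an alignment that finds two of three reference mappings and adds one wrong mapping has precision = recall = 2/3, so its f-measure is also 2/3.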

Similarity Measure
Generally, there are three broad categories of SM, i.e., syntactic SM, linguistic SM, and taxonomy SM [9-11]. Syntactic SM calculates two strings' similarity by measuring their edit distance. Levenshtein distance [12] is one of the popular syntactic SMs, whose normalized similarity is defined as follows:

sim(s1, s2) = 1 − d(s1, s2) / max(|s1|, |s2|),

where |s| is the number of characters of string s and d(·) is the two strings' edit distance. Linguistic SM measures two words' similarity with an electronic dictionary, such as WordNet [13,14], which determines the linguistic similarity value of two entities' labels label1 and label2. Taxonomy SM uses the contexts of two concepts c1 and c2 to determine their similarity [15,16], where the contexts consist of c1 and c2's superclasses super1 and super2 and their ith and jth subclasses subi and subj, respectively. In this study, by referring to Chen et al.'s work [17], we first use Babelnet Translate to translate the terminologies in various languages into English and then use the above three SMs to calculate their similarity values. All the similarity measures are executed in parallel to determine their corresponding similarity matrices, which store the similarity values of all correspondences. After that, CoEA is used to determine the aggregating weights for the similarity matrices in this parallel framework.
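The normalized Levenshtein similarity can be sketched as follows, using the standard dynamic-programming edit distance; the function names are illustrative.

```python
def levenshtein(s1, s2):
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (c1 != c2)))  # substitution
        prev = curr
    return prev[-1]

def syntactic_sim(l1, l2):
    """Normalized Levenshtein similarity in [0, 1]."""
    if not l1 and not l2:
        return 1.0
    return 1.0 - levenshtein(l1, l2) / max(len(l1), len(l2))
```

Identical labels score 1.0, while labels differing by one edit out of five characters score 0.8.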

Compact Encoding.
CoEA uses a binary coding mechanism [18], which helps reduce the algorithm's computational complexity. Given a set of cut points C′ = {c′1, c′2, . . ., c′n} in [0, 1], we first sort it in ascending order as C = {c1, c2, . . ., cn}, and then we obtain the corresponding weight set through the following equation:

w1 = c1, wi = ci − ci−1 (i = 2, . . ., n), wn+1 = 1 − cn.

In this way, n cut points yield n + 1 aggregating weights that sum to 1.
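The cut-point encoding can be sketched as follows: sort the cut points and take the gaps between consecutive points, with 0 and 1 as the outer boundaries. The function name is ours.

```python
def cut_points_to_weights(cuts):
    """Map n cut points in [0, 1] to n + 1 aggregating weights that sum
    to 1, as the gaps between consecutive sorted cut points."""
    boundaries = [0.0] + sorted(cuts) + [1.0]
    return [boundaries[k + 1] - boundaries[k]
            for k in range(len(boundaries) - 1)]
```

For example, the cut points {0.7, 0.2} yield the three weights (0.2, 0.5, 0.3).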
This work uses two probability vectors (PVs) to describe the gene distributions of the two competitive subpopulations, i.e., PV_better and PV_worse. In particular, PV_better describes the subpopulation whose elite solution owns the higher fitness value, while PV_worse describes the one whose elite solution's fitness is lower. Each PV's elements are real numbers in [0, 1], which represent the probability of the corresponding gene bit being 1. With a PV, we are able to generate various solutions with similar gene distributions.
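As a minimal sketch of the compact mechanism (the function names, and the simple learning-rate update rule, are our assumptions rather than the paper's exact scheme), sampling a solution from a PV and shifting a PV toward an elite solution might look like:

```python
import random

def sample_from_pv(pv):
    """Generate a binary solution: bit i is 1 with probability pv[i]."""
    return [1 if random.random() < p else 0 for p in pv]

def update_pv(pv, elite, rate=0.1):
    """Shift each probability toward the corresponding gene of an elite
    solution, so future samples resemble the elite's gene distribution."""
    return [p + rate * (bit - p) for p, bit in zip(pv, elite)]
```

A degenerate PV of 0s and 1s always reproduces the same solution, while intermediate probabilities yield the varied-but-similar solutions the text describes.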

Exploration and Exploitation.
We apply different strategies to the subpopulations corresponding to PV_better and PV_worse. The former mainly focuses on the exploitation operator, while the latter uses the exploration strategy. For the sake of clarity, we show these two strategies in Algorithms 1 and 2.
Here, we introduce the exponential crossover operator (EC) [19] to implement CoEA's exploration and exploitation operators. Compared with the traditional crossover operator, EC generates the offspring by inheriting a complete sequence of genes from the parents, which is more exploitative. Given two solutions, EC randomly copies a certain number of sequential bits' values from the first one to the second one. In the exploitation operator, we use EC to mix a newly generated solution solution_new and the elite solution solution_elite of the better subpopulation, while in the exploration operator, we first mix two newly generated solutions solution_p and solution_q to obtain an intermediate individual and then mix it with solution_new, which approximates the evolutionary operator of the differential evolution algorithm (DE) [20] and ensures the algorithm's exploration.
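A sketch of EC on binary vectors is given below, following the DE-style exponential crossover: a run of consecutive bits (wrapping around) is copied from the donor into a copy of the target, with the run length growing geometrically under a crossover rate. The function and parameter names are ours.

```python
import random

def exponential_crossover(donor, target, cr=0.9):
    """Copy a contiguous (wrapping) run of genes from donor into a copy
    of target; the run starts at a random position and keeps extending
    while a uniform draw stays below the crossover rate cr."""
    n = len(target)
    child = list(target)
    start = random.randrange(n)
    length = 1
    while length < n and random.random() < cr:
        length += 1
    for k in range(length):
        child[(start + k) % n] = donor[(start + k) % n]
    return child
```

With cr = 0 exactly one gene is inherited from the donor, and with cr = 1 the child becomes a full copy of the donor, so cr tunes how exploitative the mixing is.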

Experimental Configuration.
In the experiment, the Ontology Alignment Evaluation Initiative (OAEI)'s Multifarm track, which includes 45 ontology pairs in different languages, is used to test CoEA's performance, and Table 1 gives a brief description of the testing cases.

Experimental Results.
In the experiment, we compare CoEA with EA- and DE-based cross-lingual ontology matching techniques (CEA and CDE, respectively); the results shown in the tables are the mean values of 30 independent runs. In particular, Table 2 compares CoEA with CEA and CDE in terms of f-measure, and Table 3 carries out a T-test [22] among the three cross-lingual ontology matching techniques.
As can be seen from Table 2, CoEA's results are much better than those of the CEA- and CDE-based ontology matching techniques. CoEA makes use of two different evolving strategies, which helps ensure its robustness. From Table 3, we can see that CoEA outperforms the other two cross-lingual ontology matching techniques at the 5% significance level.

Conclusion
To integrate ontologies in different languages, this work proposes a CoEA-based cross-lingual ontology matching technique. We first propose a parallel framework for aggregating different SMs and then construct a continuous optimization model to define the problem of cross-lingual ontology integration. To solve this problem while better trading off the algorithm's exploitation and exploration, we propose a CoEA with two competitive subpopulations. The experiment utilizes OAEI's Multifarm track for testing, and the experimental results show that CoEA is able to effectively match cross-lingual ontologies.
In the future, we are interested in further improving the solution's quality by introducing a specific domain knowledge base to distinguish heterogeneous entity pairs. Moreover, when the scale of the ontologies becomes large, the algorithm's search space will grow accordingly, which is a challenge for CoEA's efficiency. A feasible way to face this challenge would be semantic pruning or ontology partitioning techniques [23, 24], which can be used to improve CoEA's efficiency when solving large-scale cross-lingual ontology matching problems.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.