In recent years, ontologies have been widely used in various domains such as medical record annotation, medical knowledge representation and sharing, clinical guideline management, and medical decision-making. To enable cooperation between intelligent applications based on biomedical ontologies, it is crucial to establish correspondences between the heterogeneous biomedical concepts in different ontologies, a task known as biomedical ontology matching. Although evolutionary algorithms (EAs) are among the state-of-the-art methodologies for matching heterogeneous ontologies, their huge memory consumption, long runtime, and the biased improvement of their solutions prevent them from efficiently matching biomedical ontologies. To overcome these shortcomings, we propose a compact coevolutionary algorithm to efficiently match biomedical ontologies. In particular, a compact EA with a local search strategy reduces memory consumption and runtime, and three subswarms with different optimization objectives help one another to avoid the biased improvement of the solution. In the experiments, two well-known test cases provided by the Ontology Alignment Evaluation Initiative (OAEI 2017), i.e., the anatomy track and the large biomed track, are used to evaluate our approach's performance. The experimental results show the effectiveness of our proposal.
Ontologies provide a shared and common vocabulary for representing a domain of knowledge [
Recently, evolutionary algorithms (EAs) have become one of the state-of-the-art methodologies for matching heterogeneous ontologies [
The rest of the paper is organized as follows: Section
Because the ontology matching process is complex and time-consuming, EA-based methods are a promising methodology for obtaining ontology alignments; indeed, they have already been applied to the ontology alignment problem with acceptable results [
The Coevolutionary Algorithm [
Unlike the techniques mentioned above, this work proposes a compact coevolutionary algorithm for matching biomedical ontologies, which combines the advantages of the compact EA and the coevolutionary algorithm to reduce memory consumption and runtime and to overcome the biased improvement of solutions.
In this work, an ontology is defined as a quadruple
An alignment
The concept similarity measure is the foundation of biomedical ontology matching [
In this work,
Moreover, the similarity value of two profile elements is calculated by N-gram distance [
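The exact N-gram formula is not reproduced in this section, so the sketch below assumes a common Dice-style character N-gram similarity, 2 × |common N-grams| / (|N-grams(a)| + |N-grams(b)|), with N = 3; the matching distance would then be one minus this value. The function names `ngrams` and `ngram_similarity` are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import List


def ngrams(text: str, n: int = 3) -> List[str]:
    """Split a lowercased string into overlapping character n-grams."""
    s = text.lower()
    if len(s) < n:  # very short strings count as a single gram
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Assumed Dice-style n-gram similarity in [0, 1]."""
    ga, gb = Counter(ngrams(a, n)), Counter(ngrams(b, n))
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    common = sum((ga & gb).values())  # multiset intersection of n-grams
    return 2.0 * common / total
```

For example, "myocardium" and "myocardial" each yield 8 trigrams, 6 of which are shared, so `ngram_similarity("myocardium", "myocardial")` returns 2 × 6 / 16 = 0.75.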
In this work, we assume that, in the golden alignment, each concept in one ontology is matched with only one concept in the other ontology and vice versa. Two rough alignment evaluations, i.e.,
And,
In most instances, it requires considering both
Given two biomedical ontologies
One of the shortcomings of
Model-based optimization using probabilistic modeling of the search space is one of the areas in which research on the compact evolutionary algorithm (CEA) has advanced considerably in recent years. In each generation, CEA updates the probability vector (PV), a probabilistic model describing the univariate statistics of the best solutions, and then uses it to generate new candidate solutions. Because the PV, instead of a population of solutions, is used to simulate the behavior of a classic EA, far fewer individuals need to be stored in memory. Thus, CEA can significantly reduce memory consumption [
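To make the PV mechanism concrete, the following is a minimal sketch of the classic compact GA on bit strings, not the authors' full CEA (it has no local search). The PV starts at 0.5 everywhere; each generation two individuals are sampled, they compete, and every PV entry where the two disagree is shifted toward the winner. Using the update rate of 0.1 from the parameter settings as the shift size is an assumption; in the classic formulation the step is 1/n for a virtual population of size n. Memory grows with the chromosome length only, not with a population size.

```python
import random
from typing import Callable, List

Bits = List[int]


def sample(pv: List[float], rng: random.Random) -> Bits:
    """Draw one binary individual; bit i is 1 with probability pv[i]."""
    return [1 if rng.random() < p else 0 for p in pv]


def compact_ga(fitness: Callable[[Bits], float], n_bits: int,
               step: float = 0.1, max_gen: int = 3000, seed: int = 0) -> Bits:
    """Minimal compact GA: a probability vector replaces the population."""
    rng = random.Random(seed)
    pv = [0.5] * n_bits                 # the only "population" kept in memory
    elite = sample(pv, rng)
    for _ in range(max_gen):
        a, b = sample(pv, rng), sample(pv, rng)
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        if fitness(winner) > fitness(elite):
            elite = winner
        for i in range(n_bits):
            if winner[i] != loser[i]:   # only disagreeing bits carry information
                delta = step if winner[i] == 1 else -step
                pv[i] = min(1.0, max(0.0, pv[i] + delta))
        # Stop once every entry is (numerically) 0 or 1: sampling is then deterministic.
        if all(p < 1e-9 or p > 1 - 1e-9 for p in pv):
            break
    return elite
```

On the toy OneMax problem, `compact_ga(sum, n_bits=20)` searches for a 20-bit string with as many ones as possible while storing only 20 floating-point numbers plus the elite.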
Next, the three main components of CEA, i.e., the chromosome-encoding mechanism, the probability vector, and the local search strategy, are presented in turn.
Given a gene segment
while if end while
This procedure is similar to two-point crossover, in which the first cut point is randomly selected from
In this work, we use three PVs to represent the subswarms for maximizing
Take updating Generate a new individual [ if( for if( if( else if ( for( if((random(0, 1) < Generate an individual Generate while((random(0, 1) < if(( end While [winner, loser] = compete( if(( For For
if ( stop and the elite with best else gen = gen+1; go to Step 2; end if

In the evolving process, we first update
In this work, we exploit the Anatomy (
The compact coevolutionary algorithm uses the following parameters, which were determined empirically as a trade-off setting that achieves the highest average alignment quality on all the test datasets used: Numerical accuracy = 0.01; Update rate = 0.1; Crossover probability = 0.6; Mutation probability = 0.03; Mutation rate = 0.05; Maximum generation = 3000.
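The loop below is a minimal sketch of how several PV-based subswarms might coevolve, reusing the update rate (0.1) and maximum generation (3000) listed above as defaults; the crossover, mutation, and numerical-accuracy settings and the local search are omitted. Both the cooperation scheme (a shared elite that pulls each PV toward it) and the three toy objectives (precision, recall, and F-measure against a made-up reference) are assumptions for illustration, not the paper's exact design or objectives.

```python
import random
from typing import Callable, List, Sequence

Bits = List[int]
Objective = Callable[[Bits], float]


def sample(pv: Sequence[float], rng: random.Random) -> Bits:
    """Draw one binary individual; bit i is 1 with probability pv[i]."""
    return [1 if rng.random() < p else 0 for p in pv]


def update(pv: List[float], winner: Bits, loser: Bits, rate: float) -> None:
    """Shift PV entries toward the winner wherever the two individuals disagree."""
    for i, (w_bit, l_bit) in enumerate(zip(winner, loser)):
        if w_bit != l_bit:
            pv[i] = min(1.0, max(0.0, pv[i] + (rate if w_bit else -rate)))


def coevolve(objectives: Sequence[Objective], global_obj: Objective, n_bits: int,
             rate: float = 0.1, max_gen: int = 3000, seed: int = 0) -> Bits:
    """One PV per subswarm; each subswarm maximizes its own objective."""
    rng = random.Random(seed)
    pvs = [[0.5] * n_bits for _ in objectives]
    elite = sample(pvs[0], rng)                  # shared elite (assumed cooperation)
    for _ in range(max_gen):
        for pv, f in zip(pvs, objectives):
            a, b = sample(pv, rng), sample(pv, rng)
            winner, loser = (a, b) if f(a) >= f(b) else (b, a)
            update(pv, winner, loser, rate)      # learn from the local duel
            # Assumed cooperation: the subswarm's winner challenges the shared
            # elite under the global objective; if the elite holds, this PV is
            # also shifted toward the elite.
            if global_obj(winner) > global_obj(elite):
                elite = winner
            else:
                update(pv, elite, winner, rate)
    return elite


# Hypothetical toy problem: bit i keeps candidate correspondence i,
# and REFERENCE marks which candidates are actually correct.
REFERENCE = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]


def hits(x: Bits) -> int:
    return sum(a & b for a, b in zip(x, REFERENCE))


def precision(x: Bits) -> float:
    return hits(x) / sum(x) if sum(x) else 0.0


def recall(x: Bits) -> float:
    return hits(x) / sum(REFERENCE)


def f_measure(x: Bits) -> float:
    p, r = precision(x), recall(x)
    return 2 * p * r / (p + r) if p + r else 0.0


best = coevolve([precision, recall, f_measure], f_measure, len(REFERENCE), max_gen=300)
```

A precision-only search tends to keep very few correspondences and a recall-only search tends to keep all of them; judging duels against the shared elite under a combined objective is one simple way to stop any single subswarm from drifting toward such a one-sided solution.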
To compare the quality of our proposal with that of the OAEI 2017 participants (
Comparison of our approach with the participants in OAEI 2017 on the Anatomy track.
System | Recall | Precision | F-measure | Runtime (s) |
---|---|---|---|---|
AML | 0.93 | 0.95 | 0.94 | 37 |
| | 0.92 | 0.94 | 0.93 | 70 |
POMap | 0.90 | 0.94 | 0.93 | 808 |
LogMapBio | 0.89 | 0.88 | 0.89 | 820 |
XMap | 0.86 | 0.92 | 0.89 | 37 |
LogMap | 0.84 | 0.91 | 0.88 | 22 |
KEPLER | 0.74 | 0.95 | 0.83 | 234 |
LogMapLite | 0.72 | 0.96 | 0.82 | 19 |
SANOM | 0.77 | 0.89 | 0.82 | 295 |
Wiki2 | 0.73 | 0.88 | 0.80 | 2204 |
ALIN | 0.33 | 0.99 | 0.50 | 836 |
EA | 0.76 | 0.88 | 0.78 | 22 |
Our approach | 0.94 | 0.97 | 0.95 | 34 |
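Assuming, as is standard in OAEI, that R, P, and F denote recall, precision, and F-measure, the sketch below shows how these columns are computed when an alignment is compared with a reference alignment, with F as the harmonic mean of recall and precision. The type alias `Correspondence` and the function names are illustrative.

```python
from typing import Set, Tuple

Correspondence = Tuple[str, str]  # (source concept, target concept)


def f_from_rp(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision) if recall + precision else 0.0


def evaluate(found: Set[Correspondence],
             reference: Set[Correspondence]) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) of an alignment against a reference."""
    correct = len(found & reference)
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(found) if found else 0.0
    return recall, precision, f_from_rp(recall, precision)
```

For instance, the AML row gives `f_from_rp(0.93, 0.95)` ≈ 0.9399, which rounds to the reported 0.94, and our approach's row gives `f_from_rp(0.94, 0.97)` ≈ 0.9548, which rounds to 0.95.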
Comparison of our approach with the participants in OAEI 2017 on the Large Biomed track.
System | Recall | Precision | F-measure | Runtime (s) |
---|---|---|---|---|
| | | | | |
XMap | 0.85 | 0.88 | 0.87 | 130 |
AML | 0.87 | 0.84 | 0.86 | 77 |
| | 0.89 | 0.82 | 0.85 | 279 |
LogMap | 0.81 | 0.86 | 0.83 | 92 |
LogMapBio | 0.83 | 0.82 | 0.83 | 1552 |
LogMapLite | 0.82 | 0.67 | 0.74 | 10 |
Tool1 | 0.74 | 0.69 | 0.71 | 1650 |
PBIL | 0.76 | 0.88 | 0.78 | 22 |
Our approach | 0.87 | 0.89 | 0.88 | 72 |
| | | | | |
XMap | 0.84 | 0.77 | 0.81 | 625 |
YAM-BIO | 0.73 | 0.89 | 0.80 | 468 |
AML | 0.69 | 0.88 | 0.77 | 177 |
LogMap | 0.65 | 0.84 | 0.73 | 477 |
LogMapBio | 0.65 | 0.81 | 0.72 | 2951 |
LogMapLite | 0.21 | 0.85 | 0.34 | 18 |
Tool1 | 0.13 | 0.87 | 0.23 | 2140 |
PBIL | 0.72 | 0.74 | 0.72 | 147 |
Our approach | 0.81 | 0.84 | 0.82 | 183 |
| | | | | |
AML | 0.67 | 0.90 | 0.77 | 312 |
YAM-BIO | 0.70 | 0.83 | 0.76 | 490 |
LogMapBio | 0.64 | 0.84 | 0.73 | 4728 |
LogMap | 0.60 | 0.87 | 0.71 | 652 |
LogMapLite | 0.57 | 0.80 | 0.66 | 22 |
XMap | 0.55 | 0.82 | 0.66 | 563 |
Tool1 | 0.22 | 0.81 | 0.34 | 1105 |
PBIL | 0.64 | 0.81 | 0.71 | 304 |
Our approach | 0.73 | 0.88 | 0.79 | 326 |
As can be seen from Table
In particular, PBIL works with a single PV, whereas our approach uses three PVs that cooperate with one another during the evolutionary process to improve solution quality. The experimental results show that, although our approach requires more runtime than PBIL on every Large Biomed task, it achieves higher recall and higher precision than PBIL on all of them, which indicates that our approach effectively overcomes the biased improvement of solutions in PBIL.
In this work, to overcome the drawbacks of traditional EA-based ontology matching techniques, we propose, for the first time, a compact coevolutionary algorithm to efficiently match biomedical ontologies. In our approach, three PVs are utilized to characterize three subswarms that take as objectives maximizing
The data used to support the findings of this study have not been made available because of the protection of technical privacy and confidentiality.
The authors declare that they have no conflicts of interest.
This work is supported by the National Natural Science Foundation of China (Nos. 61503082 and 61403121), Natural Science Foundation of Fujian Province (No. 2016J05145), Scientific Research Startup Foundation of Fujian University of Technology (No. GY-Z15007), Scientific Research Development Foundation of Fujian University of Technology (No. GY-Z17162), and Fujian Province Outstanding Young Scientific Researcher Training Project (No. GY-Z160149).