Using Compact Coevolutionary Algorithm for Matching Biomedical Ontologies

In recent years, ontologies have been widely used in various domains such as medical records annotation, medical knowledge representation and sharing, clinical guideline management, and medical decision-making. To enable cooperation between intelligent applications based on biomedical ontologies, it is crucial to establish correspondences between the heterogeneous biomedical concepts in different ontologies, a task known as biomedical ontology matching. Although Evolutionary Algorithms (EAs) are among the state-of-the-art methodologies for matching heterogeneous ontologies, their huge memory consumption, long runtime, and the bias improvement of the solutions prevent them from efficiently matching biomedical ontologies. To overcome these shortcomings, we propose a compact Coevolutionary Algorithm to efficiently match biomedical ontologies. In particular, a compact EA with a local search strategy reduces memory consumption and runtime, and three subswarms with different optimization objectives help one another to avoid the solution's bias improvement. In the experiment, two well-known test cases provided by the Ontology Alignment Evaluation Initiative (OAEI 2017), i.e., the anatomy track and the large biomed track, are used to test our approach's performance. The experimental results show the effectiveness of our proposal.


Introduction
Ontologies provide a shared and common vocabulary for representing a domain of knowledge [1]. In recent years, ontologies have been widely used in various domains such as medical records annotation [2], medical knowledge representation and sharing, clinical guidelines management [3], and medical decision-making [4]. However, most biomedical ontologies are developed independently by different experts, who might define one entity with different names or in different ways, which causes the problem of ontology heterogeneity. For example, to describe the muscles that surround and power the human heart, the National Cancer Institute's thesaurus and ontology (NCI) [5] uses the name "Myocardium," whereas the Foundation Model of Anatomy (FMA) [6] uses "Cardiac Muscle Tissue." To enable cooperation between intelligent applications based on biomedical ontologies, it is crucial to establish correspondences between the heterogeneous biomedical concepts in different ontologies, a task known as biomedical ontology matching.
Evolutionary Algorithms (EAs) have recently become one of the state-of-the-art methodologies for matching heterogeneous ontologies [7]. However, huge memory consumption, long runtime, and the bias improvement of the solutions prevent EA-based ontology matching techniques from efficiently matching biomedical ontologies. Thus, besides the quality of the alignments, the main memory consumption and runtime needed by the ontology matcher are of prime importance when matching biomedical ontologies. In this paper, we propose to use the compact EA [8], which uses a probabilistic representation of the population, to reduce the memory consumption of the classic EA. Then, we introduce a local search strategy into its evolving process to balance exploration and exploitation and to reduce the runtime needed. On this basis, we further propose a compact Coevolutionary Algorithm, which uses three subswarms with different objectives that help one another to avoid the solution's bias improvement caused by the traditional metric f-measure [9]. The rest of the paper is organized as follows: Section 2 describes the related works; Section 3 gives some basic concepts of ontology, ontology alignment, and the similarity measures; Section 4 presents the optimal model of the problem and the details of the compact Coevolutionary Algorithm for matching biomedical ontologies; Section 5 gives the experimental results and relevant analysis; finally, Section 6 draws the conclusions.

Evolutionary Algorithm-Based Ontology Matching Technique.
Due to the complex and time-consuming nature of the ontology matching process, EA-based methods offer a good methodology for obtaining ontology alignments and have already been applied to the ontology alignment problem with acceptable results [10]. Unlike other EA-based approaches [11][12][13], which model the ontology alignment process as a metamatching problem, i.e., how to determine the most appropriate weight configuration in the ontology matching process in order to obtain a satisfactory alignment, this work treats the ontology matching problem as a global entity matching problem. Genetic Algorithm-Based Ontology Matching (GAOM) [14] is the representative system: it uses chromosomes to describe the potential alignments between two ontologies and a Genetic Algorithm (GA) to determine the optimal alignment. MapPSO and MapEVO [15], which exploit Particle Swarm Optimization (PSO) [16] and Evolutionary Programming (EP) [17], respectively, also adopt this idea. Acampora et al. [18] designed a Memetic Algorithm (MA) that introduces a local search process to improve the performance of the EA. More recently, Xue et al. [19,20] used the compact EA and the compact Population-Based Incremental Learning Algorithm (PBIL), respectively, to reduce memory consumption without sacrificing the solution's quality. The compact EA and compact PBIL represent the population as a probability vector (PV) over the set of solutions and are operationally equivalent to the order-one behaviour of the simple EA with uniform crossover. In this way, far fewer solutions must be stored in memory, which significantly reduces memory consumption.

Coevolutionary Algorithm.
The Coevolutionary Algorithm [21] makes multiple swarms evolve simultaneously and communicate with one another to improve the search performance. Currently, distributed coevolution is the most popular coevolving process; it shares the search information among multiple swarms through a population migration strategy. During the searching process, different swarms have their own evolving strategies and configurations. Tan et al. [22] proposed to decompose the problem's solution vector into multiple swarms that evolve simultaneously. Mu and Liu [23] presented an M-elite Coevolutionary Algorithm that applies different elite strategies in the coevolving process. The elite-centered swarm has the highest priority, and the other swarms implement the cooperative coevolving process. In [24], a parallel evolving mechanism was designed by dividing the population into three swarms that evolve independently. However, all the swarms use the same evolving strategy, and each swarm's evolving process is relatively independent, which decreases the algorithm's exploration and exploitation ability. More recently, Wang et al. [25] proposed a two-elite strategy that makes use of the differences between two elites to guide the whole evolving process.
Unlike all the techniques mentioned above, in this work we propose a compact Coevolutionary Algorithm to match biomedical ontologies, which combines the advantages of the compact EA and the Coevolutionary Algorithm to reduce memory consumption and runtime and to overcome the bias improvement of solutions.

Ontology, Ontology Alignment, and Ontology Matching Process.
In this work, an ontology is defined as a quadruple $O = (C, P, I, A)$, where (i) $C$ is the class set, i.e., the set of concepts that populate the domain of interest, (ii) $P$ is the property set, i.e., the set of relations between the concepts of the domain, (iii) $I$ is the instance set, i.e., the set of objects in the real world representing the instances of a concept, and (iv) $A$ is the axiom set, i.e., the statements that say what is true about the modeled domain.
An alignment $A$ between two ontologies $O_1$ and $O_2$ is defined as a set of correspondences, and each correspondence is a triple $(e_1, e_2, n)$, where $e_1$ and $e_2$ are the entities in $O_1$ and $O_2$, respectively, and $n \in [0, 1]$ is a confidence value holding for the correspondence between them. In this work, the relation existing between two ontology entities is the equivalence ($\equiv$). The ontology matching process can be defined as a function $\theta(O_1, O_2, p, r)$ [26], where $p$ is the parameter set and $r$ is the resource set. The ontology matching process returns a new alignment $A_N$ between ontologies $O_1$ and $O_2$.

Concept Similarity Measure.
Concept similarity measure is the foundation of biomedical ontology matching [27]. In this work, we use an asymmetrical concept similarity measure to calculate the similarity values of biomedical concepts. First, for each biomedical concept, we construct a profile by collecting the label, comment, and property information (such as label, domain, and range) from the concept itself and all its direct descendants. Then, the similarity of two biomedical concepts $c_1$ and $c_2$ is measured based on the similarity of their profiles $p_1$ and $p_2$, which can be calculated by the following two asymmetrical measures:

$$sim_1(p_1, p_2) = \frac{|p_1 \cap p_2|}{|p_1|}, \qquad sim_2(p_1, p_2) = \frac{|p_1 \cap p_2|}{|p_2|},$$

where $|p_1|$ and $|p_2|$ are the cardinalities of the profiles $p_1$ and $p_2$, respectively, and $|p_1 \cap p_2|$ is the number of identical elements in $p_1$ and $p_2$. The similarity value of $c_1$ and $c_2$ is equal to $(sim_1(p_1, p_2) + sim_2(p_1, p_2))/2$ when $|sim_1(p_1, p_2) - sim_2(p_1, p_2)| \le \delta$, and 0 otherwise. Here, $\delta$ is the threshold that measures the extent of the semantic equivalence between $sim_1(p_1, p_2)$ and $sim_2(p_1, p_2)$. When the similarity value between two profile elements is above the threshold, they are identified as semantically similar. Generally, $\delta$ should be set relatively small to reflect that $sim_1(p_1, p_2)$ and $sim_2(p_1, p_2)$ differ little when the concepts $c_1$ and $c_2$ are semantically equivalent. However, if $\delta$ is too small, we would miss many semantically equivalent terms. Therefore, the suggested domain of $\delta$ is $[0.01, 0.10]$. In this work, to obtain a suitable $\delta$, we conducted a pre-experiment on the benchmark by varying the value of $\delta$ in its suggested domain and found that the semantic equivalence performs well when $\delta$ is set to 0.06.
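The profile-based measure can be illustrated with a minimal Python sketch. It assumes that the two asymmetrical measures are the profile overlap divided by each profile's cardinality, that profile elements are compared by exact identity, and that an empty profile matches nothing; the function name `profile_similarity` is hypothetical.

```python
def profile_similarity(p1: set, p2: set, delta: float = 0.06) -> float:
    """Similarity of two concept profiles (sets of profile elements).

    Assumed measures: sim1 = |p1 & p2| / |p1| and sim2 = |p1 & p2| / |p2|.
    The result is their mean when they differ by at most delta, else 0.
    """
    if not p1 or not p2:
        return 0.0  # assumption: an empty profile matches nothing
    common = len(p1 & p2)  # identical elements only (a simplification)
    sim1 = common / len(p1)
    sim2 = common / len(p2)
    if abs(sim1 - sim2) <= delta:
        return (sim1 + sim2) / 2
    return 0.0
```

With the default threshold of 0.06, two profiles of equal size that share three of four elements score 0.75, whereas a small profile fully contained in a much larger one scores 0 because the two directional measures disagree.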
Moreover, the similarity value of two profile elements is calculated by combining the N-gram distance [28], which is the best-performing string-based similarity measure for the biomedical ontology matching problem, with a linguistic measure, which calculates a synonymy-based distance through the Unified Medical Language System (UMLS) [29]. Given two words $w_1$ and $w_2$, their similarity $sim_2(w_1, w_2)$ is equal to 1 when the two words are synonymous, and to $N\text{-}gram(w_1, w_2)$ otherwise.
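A small Python sketch of this element-level measure follows. The exact N-gram formula is not given here, so the sketch assumes a Dice coefficient over character trigrams; the `SYNONYMS` table is a hypothetical stand-in for a UMLS lookup, seeded with the NCI/FMA pair from the Introduction.

```python
def ngrams(word: str, n: int = 3) -> set:
    """Character n-grams of a lowercased word (the word itself if too short)."""
    w = word.lower()
    if len(w) < n:
        return {w}
    return {w[i:i + n] for i in range(len(w) - n + 1)}


def ngram_similarity(w1: str, w2: str, n: int = 3) -> float:
    """Assumed N-gram measure: Dice coefficient over character n-grams."""
    g1, g2 = ngrams(w1, n), ngrams(w2, n)
    return 2 * len(g1 & g2) / (len(g1) + len(g2))


# Hypothetical stand-in for a UMLS synonym lookup.
SYNONYMS = {frozenset({"myocardium", "cardiac muscle tissue"})}


def word_similarity(w1: str, w2: str) -> float:
    """1 for identical or synonymous terms, otherwise the N-gram similarity."""
    a, b = w1.lower(), w2.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return ngram_similarity(a, b)
```

The synonym check comes first so that lexically unrelated but synonymous terms, such as "Myocardium" and "Cardiac Muscle Tissue", still receive the maximum score.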

Rough Alignment Evaluations.
In this work, we suppose that, in the golden alignment, one concept in an ontology is matched with only one concept in the other ontology and vice versa. Two rough alignment evaluations, i.e., MatchCoverage and MatchRatio, are used to measure the alignment's quality. MatchCoverage approximates recall [9]: it is the fraction of concepts that occur in at least one correspondence of the resulting alignment, relative to the total number of concepts in the ontology. MatchRatio approximates precision [9]: it is calculated as the ratio between the number of found correspondences and the number of matched concepts. In most instances, both MatchCoverage and MatchRatio must be considered to measure the alignment's quality. Following the most common combining function, f-measure [9], we define MatchFmeasure as the corresponding combination of MatchCoverage and MatchRatio.
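One possible formalization of these three evaluations is sketched below in Python. The exact formulas are assumptions: MatchCoverage is taken as the number of matched concepts in both ontologies over the total number of concepts, MatchRatio as the number of matched concepts over twice the number of correspondences, and MatchFmeasure as their harmonic mean. The function name `match_metrics` is hypothetical.

```python
def match_metrics(alignment, n1: int, n2: int):
    """Return (MatchCoverage, MatchRatio, MatchFmeasure) under assumed formulas.

    alignment: iterable of (i, j) pairs linking concept i of the first
    ontology to concept j of the second; n1, n2: numbers of concepts.
    """
    pairs = set(alignment)
    if not pairs:
        return 0.0, 0.0, 0.0
    matched1 = {i for i, _ in pairs}  # matched concepts in ontology 1
    matched2 = {j for _, j in pairs}  # matched concepts in ontology 2
    matched = len(matched1) + len(matched2)
    coverage = matched / (n1 + n2)        # assumed MatchCoverage
    ratio = matched / (2 * len(pairs))    # assumed MatchRatio
    f = 2 * coverage * ratio / (coverage + ratio)  # harmonic mean
    return coverage, ratio, f
```

Under the one-to-one assumption, a strictly one-to-one alignment reaches a MatchRatio of 1, and every additional correspondence that reuses an already matched concept lowers it.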

The Optimal Model for Ontology Entity Matching Problem.
Given two biomedical ontologies $O_1$ and $O_2$, we take maximizing MatchFmeasure as the goal, and the optimal model for the ontology entity matching problem can be defined as follows:

$$\max\ \mathrm{MatchFmeasure}(X),$$

where the decision variable $X$ represents an alignment between $O_1$ and $O_2$, $x_i$ represents the $i$th correspondence between the $i$th concept in $O_1$ and the $x_i$th concept in $O_2$, $|O_1|$ and $|O_2|$ are the cardinalities of the concept sets in $O_1$ and $O_2$, respectively, and $x_{|O_1|+1} \in [0, 1]$ is the threshold used to filter the final alignment.
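To make the role of the decision variable concrete, the following Python sketch turns a vector $X$ into a filtered alignment. It assumes that an index of 0 means "unmapped", that correspondences are kept when their similarity is at least the threshold, and that a precomputed similarity matrix `sim` is available; the function name `decode_alignment` is hypothetical.

```python
def decode_alignment(x, sim):
    """Turn a decision vector into a list of correspondences.

    x: |O1| target indexes (0 = unmapped, j = j-th concept of O2) followed
       by one threshold in [0, 1].
    sim: sim[i][j] is the similarity of concept i+1 in O1 and concept j+1 in O2.
    Returns (source index, target index, similarity) triples, 1-based.
    """
    *targets, threshold = x
    alignment = []
    for i, j in enumerate(targets):
        if j == 0:
            continue  # assumption: index 0 encodes "no correspondence"
        score = sim[i][j - 1]
        if score >= threshold:  # assumption: keep values at or above the threshold
            alignment.append((i + 1, j, score))
    return alignment
```

For a two-by-two similarity matrix `[[0.9, 0.1], [0.2, 0.4]]`, the vector `[1, 2, 0.5]` keeps only the first correspondence, because the second has a similarity of 0.4, which is below the threshold.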
One of the shortcomings of MatchFmeasure is that its improvement says nothing about whether MatchCoverage and MatchRatio are both improved. In other words, no matter how large a measured improvement in MatchFmeasure is, it can still depend almost entirely on the improvement of one of the individual metrics [30]. To overcome this bias improvement, we propose a compact Coevolutionary Algorithm, which has three PVs that characterize subswarms aiming at maximizing MatchCoverage, MatchRatio, and MatchFmeasure, respectively. Through the cooperation of the three PVs, we aim to ensure the simultaneous improvement of MatchCoverage and MatchRatio during the evolving process.
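A short numeric illustration of this bias, as a Python sketch: it assumes that MatchFmeasure is the harmonic mean of the two metrics, and the values used are purely illustrative, not experimental results.

```python
def f_measure(coverage: float, ratio: float) -> float:
    """Harmonic mean of two metrics (assumed form of MatchFmeasure)."""
    total = coverage + ratio
    return 2 * coverage * ratio / total if total else 0.0


def improves_both(before, after) -> bool:
    """True only if neither metric got worse."""
    return all(a >= b for b, a in zip(before, after))


# Illustrative (MatchCoverage, MatchRatio) values for two candidate alignments.
before = (0.50, 0.90)
after = (0.80, 0.70)

f_before = f_measure(*before)  # about 0.643
f_after = f_measure(*after)    # about 0.747
biased = f_after > f_before and not improves_both(before, after)
```

Here the combined score rises even though the second metric drops from 0.90 to 0.70, which is the situation the three cooperating PVs are meant to avoid.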

Compact Evolutionary Algorithm.
Model-based optimization using probabilistic modeling of the search space is one of the areas where research on the Compact Evolutionary Algorithm (CEA) has advanced considerably in recent years. In each generation, CEA updates the probability vector (PV), which is a probabilistic model describing the univariate statistics of the best solutions, and then uses it to generate new candidate solutions. By employing the PV, instead of a population of solutions, to simulate the behavior of the classic EA, far fewer individuals need to be stored in memory. Thus, CEA can significantly reduce memory consumption [31]. In order to further improve CEA's performance, we introduce a local search strategy into CEA's evolving process. This marriage between global search and local search helps to reduce the possibility of premature convergence and to increase the convergence speed.
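The idea of replacing a population with a PV can be sketched in Python as follows. This is a generic compact genetic algorithm with a persistent elite, not necessarily the exact update rule used in this work; the names `sample`, `update_pv`, `compact_ga`, and the parameter `virtual_pop` are hypothetical.

```python
import random


def sample(pv, rng):
    """Draw one binary individual from the probability vector."""
    return [1 if rng.random() < p else 0 for p in pv]


def update_pv(pv, winner, loser, step):
    """Shift each probability toward the winner where the two individuals differ."""
    new_pv = []
    for p, w, l in zip(pv, winner, loser):
        if w != l:
            p = p + step if w == 1 else p - step
        new_pv.append(min(1.0, max(0.0, p)))  # keep probabilities in [0, 1]
    return new_pv


def compact_ga(fitness, length, virtual_pop=50, generations=2000, seed=0):
    """Maximize `fitness` while storing only one PV and one elite individual."""
    rng = random.Random(seed)
    pv = [0.5] * length  # every bit starts undecided
    elite = sample(pv, rng)
    for _ in range(generations):
        candidate = sample(pv, rng)
        if fitness(candidate) > fitness(elite):
            winner, loser = candidate, elite
        else:
            winner, loser = elite, candidate
        pv = update_pv(pv, winner, loser, 1.0 / virtual_pop)
        elite = winner
    return elite, pv
```

Only the PV, the elite, and one candidate are held in memory at any time, which is where the saving over a population-based EA comes from. For example, `compact_ga(sum, 16)` searches for a 16-bit string with as many ones as possible.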
Next, the three main components of CEA, i.e., the chromosome-encoding mechanism, the probability vector, and the local search strategy, are presented in turn.
(1) Chromosome-Encoding Mechanism: in this work, the genes are encoded through the binary coding mechanism and can be divided into two parts. The first part stands for the correspondences in the alignment, and the second part stands for a threshold. Given the total numbers $n_1$ and $n_2$ of concepts in the two biomedical ontologies, the first part of a chromosome (or PV) consists of $n_1$ gene segments, and the binary code length (BCL) of each gene segment is equal to $\log_2(n_2) + 0.5$, which ensures that each gene segment can represent any target ontology class's index. The second part of a chromosome (or PV) has only one gene segment, whose BCL is equal to $\log_2(1/numAccuracy) + 0.5$, which ensures that this gene segment can represent any threshold value under the numerical accuracy $numAccuracy$. Thus, the total length of the chromosome (or PV) is equal to $n_1 \times (\log_2(n_2) + 0.5) + (\log_2(1/numAccuracy) + 0.5)$.
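The sizing of such a chromosome can be sketched in Python. Note the assumptions: instead of the rounding expression above, the sketch uses the smallest number of bits that can hold every index from 0 to $n_2$ (with 0 reserved for "unmapped") and every threshold step, and it reads a gene segment least-significant bit first. All function names are hypothetical.

```python
def bits_for_index(n2: int) -> int:
    """Bits needed to represent every index 0..n2 (assumption: 0 = unmapped)."""
    return max(1, n2.bit_length())


def bits_for_threshold(num_accuracy: float) -> int:
    """Bits needed to represent every threshold step 0..1/num_accuracy."""
    levels = round(1 / num_accuracy)
    return max(1, levels.bit_length())


def chromosome_length(n1: int, n2: int, num_accuracy: float) -> int:
    """n1 index segments followed by a single threshold segment."""
    return n1 * bits_for_index(n2) + bits_for_threshold(num_accuracy)


def decode_segment(bits) -> int:
    """Read a gene segment as a binary number, least-significant bit first."""
    return sum(bit << position for position, bit in enumerate(bits))
```

For instance, matching 3 source concepts against 5 target concepts with a numerical accuracy of 0.01 needs 3 segments of 3 bits plus one 7-bit threshold segment, i.e., 16 bits in total.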
Given a gene segment $geneSeg = \{geneBit_1, geneBit_2, \cdots, geneBit_n\}$, where $geneBit_i$ is the $i$th gene bit value of the gene segment, we decode it to obtain a decimal number whose value is equal to $\sum_{i=1}^{n} 2^{i-1} \cdot geneBit_i$. In particular, with respect to the decoding results of the first part, the decimal numbers obtained represent the indexes of the target classes, where 0 means that the source concept is not mapped to any class of the target ontology. With regard to the decoding result of the second part, the decimal number obtained is multiplied by the threshold's numerical accuracy. Last but not least, if a decimal number $d$ obtained is larger than $u$, we replace it with $u/d$.

This procedure is similar to the two-point crossover, where the first cut point is randomly selected from $\{1, 2, \cdots, len\}$, and the second point is determined such that $L$ consecutive genes (counted in a circular manner) are taken from $ind_{new}$. Since $ind_{new}$ and $ind_{elite}$ are both generated through the PV, most of their gene bit values are the same. Therefore, even when $p_c$ is large, $ind_{neighbor}$ only mutates a few gene bit values of $ind_{elite}$. In this sense, this variation operator can be considered fairly exploitative.
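The variation operator described in the last paragraph can be sketched in Python. This is a minimal sketch under stated assumptions: the run of copied genes starts at a random position, always contains at least one gene, and is extended with probability $p_c$ at each step, wrapping around the end of the chromosome. The function name `local_neighbor` is hypothetical.

```python
import random


def local_neighbor(ind_elite, ind_new, p_c, rng):
    """Copy a circular run of consecutive genes from ind_new into a copy of ind_elite."""
    length = len(ind_elite)
    ind_neighbor = list(ind_elite)  # the elite itself is left untouched
    i = rng.randrange(length)       # first cut point
    copied = 0
    while True:
        ind_neighbor[i] = ind_new[i]
        i = (i + 1) % length        # positions are counted circularly
        copied += 1
        # Stop once every gene has been copied or the extension test fails.
        if copied >= length or rng.random() >= p_c:
            break
    return ind_neighbor
```

Because both parents are sampled from the same PV, most copied genes already agree with the elite, so the neighbor usually differs from it in only a few positions even for long runs.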

Pseudocode of Compact Coevolutionary Algorithm.
In this work, we use three PVs to represent the subswarms for maximizing MatchRatio, MatchCoverage, and MatchFmeasure, respectively. In particular, each PV represents the population that consists of the solutions of its corresponding representative subproblem and of this subproblem's neighbor subproblems. These PVs help each other in the process of determining three representative solutions. Here, we mark the three representative subproblems of maximizing MatchRatio, maximizing MatchCoverage, and maximizing MatchFmeasure with the symbols $P_{mr}$, $P_{mc}$, and $P_{mf}$, respectively, and the three PVs for solving $P_{mr}$, $P_{mc}$, and $P_{mf}$ with the symbols $PV_{mr}$, $PV_{mc}$, and $PV_{mf}$, respectively. We present the pseudocode of the compact Coevolutionary Algorithm in Algorithm 2.

ALGORITHM 1
(1) ind_neighbor = ind_elite.copy();
(2) generate i = round(rand(0, len));
…

ALGORITHM 2
Input: … (v) p_c: crossover probability; (vi) p_m: mutation probability; (vii) MR: mutation rate.
Output: the solution with the best MatchFmeasure.
Step 1. Initialization:
Step 1.1. Set the generation gen = 0;
Step 1.2. Set the neighbor subproblem of P_mr and P_mc as P_mf and the neighbor subproblems of P_mf as P_mr and P_mc;
Step 1.3. Initialize PV_mr, PV_mc, and PV_mf by setting all the probabilities inside to 0.5;
Step 1.4. Use PV_mr, PV_mc, and PV_mf to generate the elites, which are marked with the symbols elite_mr, elite_mc, and elite_mf for P_mr, P_mc, and P_mf, respectively.
Step 2. Evolving process:
Step 2.1. Update PV_mr, PV_mc, and PV_mf, respectively. Take updating PV_mr as an example; the procedures for updating PV_mc and PV_mf are similar:
…
(29) elite_mr = ind_new;
Step 2.2. Update PV_mr, PV_mc, and PV_mf mutually. For P_mr (or P_mc), PV_temp is generated by applying the p_c-based uniform crossover operator [32] to PV_mr (or PV_mc) and its neighbor subproblem's probability vector PV_mf. Then, generate an individual a through PV_temp and try to update PV_mr (or PV_mc) and PV_mf through the competition with elite_mr (or elite_mc) and elite_mf. For PV_mf, PV_temp is generated by applying the uniform crossover operator between PV_mr and PV_mc, which are its neighbor subproblems' PVs. Then, generate an individual a through PV_temp and try to update PV_mf through the competition with elite_mf.
Step 3. Stopping criteria:
(30) if (maxGen is reached)
(31) stop and output the elite with the best MatchFmeasure;
(32) else
(33) gen = gen + 1;
(34) go to Step 2;
(35) end if

In the evolving process, we first update $PV_{mr}$, $PV_{mc}$, and $PV_{mf}$ individually (Step 2.1), which is equivalent to updating the solutions of $P_{mr}$, $P_{mc}$, and $P_{mf}$. Then, we update $PV_{mr}$, $PV_{mc}$, and $PV_{mf}$ mutually (Step 2.2), which is equivalent to updating the solutions of $P_{mr}$, $P_{mc}$, and $P_{mf}$ through their shared neighbor subproblems' solutions, i.e., using the information of one PV to help its neighbor PVs.

Results and Analysis
In order to compare the quality of our proposal with the participants of OAEI 2017 (http://oaei.ontologymatching.org/2017/results/index.html) and the Population-Based Incremental Learning Algorithm (PBIL) [20], which is a state-of-the-art compact EA-based ontology matching technique, we evaluate the obtained alignments with traditional recall, precision, and f-measure. The results of PBIL and of our approach in Table 1 and Table 2 are the mean values over thirty independent executions. The symbols P, R, and F in the tables stand for precision, recall, and f-measure, respectively.
As can be seen from Table 1, our approach's f-measure outperforms all the competitors, and our approach's runtime ranks 4th. In Table 2, our approach's f-measure is the highest in task 1, task 2, and task 3. In terms of running time, our approach ranks 3rd in task 1 and task 2 and 4th in task 3. In both tracks, our approach outperforms AML, which is the top ontology matcher and was developed primarily for biomedical ontology matching, in all tasks in terms of f-measure, and the runtime of our approach is also very close to or less than that of AML. The experimental results show that the cooperation among three swarms with different objectives can effectively overcome the bias improvements and improve the quality of biomedical ontology alignments.
In particular, PBIL works with one PV, whereas our approach uses three PVs that cooperate with each other during the evolving process to improve the solution's quality. As can be seen from the experimental results, although our approach takes only a little more runtime than PBIL, the quality of our results is much better than that of PBIL in terms of both recall and precision, which shows that our approach can effectively overcome the bias improvement of solutions in PBIL.

Conclusion
In this work, in order to overcome the drawbacks of traditional EA-based ontology matching techniques, we propose, for the first time, a compact Coevolutionary Algorithm to efficiently match biomedical ontologies. In our approach, three PVs are used to characterize three subswarms that take maximizing MatchCoverage, MatchRatio, and MatchFmeasure as their objectives, respectively. In each generation, the PVs are first updated with the CEA paradigm and then help each other to search for better solutions in the search space. In the experiment, OAEI 2017's Anatomy track and Large Biomed track are used to test our approach's performance, and the results show that our approach can efficiently determine better ontology alignments than state-of-the-art biomedical ontology matching techniques.

Data Availability
The data used to support the findings of this study have not been made available because of the protection of technical privacy and confidentiality.

Conflicts of Interest
The authors declare that they have no conflicts of interest.