Optimizing Biomedical Ontology Alignment through a Compact Multiobjective Particle Swarm Optimization Algorithm Driven by Knee Solution



Introduction
Decision-making requires finding an optimum solution to a decision problem in the process of identifying and evaluating alternatives. Nowadays, most real-world decision problems consist of two or more incommensurable objectives to be optimized simultaneously, the so-called multiobjective optimization problems (MOPs) [1]. In general, an MOP does not have one optimum solution but a set of solutions, the so-called Pareto optimal solutions, which are superior to the others in terms of one or more objectives. In many situations, a decision maker (DM) prefers only one single optimum solution in the Pareto front (PF) [2], and therefore, the optimization and the decision process are often combined. One family of multiobjective methods is driven by the PF's knee solution, which is the one for which a small improvement in one of its objectives results in a large deterioration in another. Often, the PF's knee solution is logically preferred by the DM if there are no user-specific or problem-specific preferences [3].
In this context, the biomedical ontology matching problem in the Semantic Web (SW) domain is investigated. Although biomedical ontologies are extensively used in the biomedical domain to model biomedical knowledge, they are developed and maintained by different communities, and the same biomedical knowledge could be described with different terminologies or in different contexts. To bridge the semantic gaps between two biomedical ontologies and support their communication, it is necessary to find their identical concept mappings, which is the so-called biomedical ontology matching. Since biomedical ontologies often own large-scale concepts with rich semantic meanings, it is difficult to find a perfect alignment that could meet all of the DM's requirements, and usually, the matching process needs to trade off two conflicting objectives, i.e., the alignment's recall and precision. To this end, in this work, the biomedical ontology matching problem is defined as an MOP. Although there are many studies on single-objective approaches for addressing the ontology matching problem [4][5][6][7][8], the research on multiobjective ontology matching techniques is still in its infancy [9]. Inspired by the success of the particle swarm optimization algorithm (PSO) in the ontology matching domain [10], the large-scale discrete optimization domain [11][12][13], and biomedical engineering [14], in this work, a compact multiobjective particle swarm optimization algorithm driven by knee solution (CMPSO-K) is further proposed to address this problem. In particular, CMPSO-K uses a compact multiobjective evolutionary mechanism to efficiently optimize the alignment's quality and a max-min approach to determine the PF's knee solution.
The contributions made in this paper are as follows: (1) a discrete multiobjective optimal model for the biomedical ontology matching problem is constructed; (2) a hybrid biomedical concept similarity metric is proposed, which can effectively calculate the similarity value of two biomedical ontology concepts; and (3) a CMPSO-K is proposed to optimize the biomedical ontology alignment, which takes into consideration both the DM's preference and the algorithm's performance. The rest of the paper is organized as follows: Section 2 overviews the SIA-based ontology matching techniques; Section 3 defines the biomedical ontology matching problem and presents the similarity measure on biomedical concepts; Section 4 presents the CMPSO-K-based biomedical ontology matching technique in detail; Section 5 shows the experimental results; and finally, Section 6 draws the conclusion and presents the future work.

Swarm Intelligence Algorithm-Based Ontology Matching Technique
The first generation of swarm intelligence algorithm- (SIA-) based matchers is dedicated to optimizing the way in which various ontology matchers' alignments are aggregated. The very first matching system is Genetics for Ontology ALignments (GOAL) [7], which uses an evolutionary algorithm (EA) to optimize the aggregating weight set of different ontology matchers. Later, Alexandru-Lucian and Iftene [15] further used EA to optimize one more parameter, which filters unauthentic concept mappings out of the final alignment. Acampora et al. [4] introduced a local search process into EA's evolving process to improve the algorithm's performance. Xue and Wang [16] used a new metric as the fitness function to guide the algorithm's search direction. Their approach can address the holistic matching problem and determine a universal weight configuration for matching several pairs of ontologies at a time. He et al. [6] proposed to utilize the artificial bee colony algorithm (ABC) to optimize all the parameters in the matching process, whose results are better than those of the EA-based matchers. More recently, Xue et al. [17] proposed a new approach that uses NSGA-III [18] to combine various similarity measures without tuning the aggregating parameters. However, when the number of similarity measures becomes huge, e.g., more than 50 similarity measures, this approach could be inefficient. The above matchers need to calculate all the matchers' alignments and store them in the main memory before aggregating them, which requires huge memory consumption.
The second generation of SIA-based matchers tries to directly find an entity correspondence set that is close to the golden alignment. GAOM (genetic algorithm-based ontology matching) [8] regards two ontologies as two discrete concept sets and employs EA to determine the optimal entity mapping set. Alves et al. utilized a memetic algorithm (MA), which combines EA with a local search strategy, to execute the instance-based ontology matching process [5]. They first matched the instances and then propagated the instance pair's similarity value to the corresponding concepts. MapPSO [10] uses PSO to determine the optimal entity correspondence set. In particular, MapPSO introduces a new quality measure on the ontology alignment, which depends on the statistical results on the alignment. More recently, Chu et al. [19] first modeled the ontologies in the vector space so that two entities' similarity value can be calculated through the cosine function, and then EA was used to determine the optimal alignments. Xue [20] proposed a new similarity metric for measuring the biomedical concepts' similarity value and then used the firefly algorithm (FA) to optimize the biomedical ontology alignment. However, this proposal suffers from premature convergence when matching large-scale ontologies. Our proposal also belongs to this category, but different from the existing work, the biomedical ontology matching problem is regarded here as a discrete MOP, and a multiobjective SIA is presented to address it, which takes into consideration the DM's preference.

Biomedical Ontology Matching Problem.
Ontology matching aims at determining the identical entity mappings, the set of which is the so-called ontology alignment. In the past, the ontology alignment's quality was often measured by f-measure [21], which is defined as follows: where $R$ and $A$ are, respectively, the alignments determined by the domain expert and the ontology matcher. Recall and precision, respectively, measure $A$'s completeness and soundness, f-measure is their harmonic mean, and $\alpha \in [0, 1]$ is the weight to trade off recall and precision. However, f-measure requires domain experts to provide $R$, which limits its application in real practice. Supposing a golden biomedical ontology alignment's cardinality is 1:1, the larger the concept mapping set and the higher the found correspondences' mean similarity value, the better the alignment quality. Accordingly, three approximate measures, i.e., recall', precision', and f-measure', are used to, respectively, approximate the original recall, precision, and f-measure [22]: where $|C_1|$ and $|C_2|$ are the two ontologies' concept scales, $|A|$ is $A$'s concept mapping number, and $simValue_i$ is the $i$-th concept mapping's similarity value. Finally, the biomedical ontology matching problem's discrete multiobjective optimal model is defined as follows: where $|C_1|$ and $|C_2|$ are, respectively, the concept numbers of the two biomedical ontologies, $x_i$ denotes the $i$-th concept correspondence, i.e., the $i$-th source concept is mapped to the $x_i$-th target concept, and the two objective functions $r$ and $p$, respectively, maximize recall' and precision' of the alignment corresponding to $X$.
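To make the roles of R, A, and α concrete, the following minimal sketch computes recall, precision, and a weighted f-measure for alignments represented as sets of (source, target) pairs. Recall and precision follow their standard definitions; the weighted harmonic-mean form recall·precision / (α·recall + (1 − α)·precision) is an assumption about which side α weights, and the function name and toy alignments are hypothetical.

```python
def alignment_quality(reference, found, alpha=0.5):
    """Return (recall, precision, f_measure) of `found` against `reference`.

    Both arguments are sets of (source_concept, target_concept) pairs.
    """
    correct = len(reference & found)
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(found) if found else 0.0
    # Assumed weighted harmonic mean; alpha = 0.5 gives the balanced F1.
    denom = alpha * recall + (1 - alpha) * precision
    f_measure = recall * precision / denom if denom > 0 else 0.0
    return recall, precision, f_measure


# Hypothetical toy alignments, not taken from the OAEI tracks.
R = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
A = {("a1", "b1"), ("a2", "b2"), ("a5", "b9")}
recall, precision, f_measure = alignment_quality(R, A)
```

Because the reference alignment R must be supplied by an expert, such a function can only be used for evaluation, which is the motivation for the approximate measures used as optimization objectives.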

Similarity Measure on Biomedical Concept.
In this work, a hybrid similarity measure is used to enhance the confidence of the calculated similarity value, which takes into consideration the concept's syntactic, linguistic, and context information. First, for each biomedical concept, the information (the label, comment, and property name) from itself and its context concepts is put into its own separate profile. Then, two biomedical concepts' similarity value is calculated according to the following equation: where p 1

Compact Multiobjective Swarm Optimization Algorithm Driven by Knee Solution
This work proposes a CMPSO-K for solving the biomedical ontology matching problem, which approximates the population-based PSO's evolving process through a probability vector (PV) [25]. Next, the objective decomposition approach, the encoding mechanism, and the knee solution determination are presented, respectively, and finally, the details of CMPSO-K are shown through the pseudo-code.

Objective Decomposition.
In this work, the weighted sum approach is used to transform an MOP into a set of subproblems and solve them simultaneously.
where $N$ is the number of decomposed problems. The two objectives of our problem are to maximize recall' and precision', and the $i$-th subproblem's objective is defined as follows: where $\alpha_1 = 0$ and α
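As a rough illustration of the decomposition, the sketch below builds N evenly spaced weights and scalarizes one (recall', precision') pair with each of them. The even spacing of the weights and the choice that α multiplies recall' (rather than precision') are assumptions, as are the function names; for N = 3 the weights coincide with the values 0, 0.5, and 1 used by the three subproblems.

```python
def make_weights(n):
    """Evenly spaced weights in [0, 1]; n = 3 gives 0, 0.5, 1."""
    if n == 1:
        return [0.5]
    return [i / (n - 1) for i in range(n)]


def weighted_sum(recall_approx, precision_approx, alpha):
    """Scalar objective of one subproblem (assumed form)."""
    return alpha * recall_approx + (1 - alpha) * precision_approx


weights = make_weights(3)
# Scalarize one hypothetical objective vector under every subproblem.
scores = [weighted_sum(0.8, 0.6, a) for a in weights]
```

Each weight defines one scalar subproblem, so a solution that is best for one weight is generally not best for the others.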

Encoding Mechanism.
In this work, the Gray coding, a binary encoding mechanism, is used to encode an alignment. Figure 1 shows an example of the encoding mechanism, in which the source concept "uterine gland" with index 8 is mapped to the target concept "Uterine Gland" with index 6, whose Gray code is 110. In particular, Gray code 000 means that a source concept is not mapped to any target concept. Moreover, a PV is utilized to characterize a population for solving one decomposed subproblem, whose element number is equal to that of an individual, and each element in PV represents the probability of being 0. Therefore, various individuals can be generated with the binary code through a PV. Figure 2 shows an example of generating an individual through PV. By comparing each of PV's elements with a random number, the new individual's element value can be determined. When all of PV's elements are close to 1 or 0, the algorithm tends to converge. At the end of each generation, PV is updated according to the best individual by increasing (or decreasing) its element value if the corresponding element of that individual is 1 (or 0).

Discrete Dynamics in Nature and Society
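The encoding and the PV mechanics can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard reflected binary Gray code, three bits per source concept, and it reads each PV element as the probability that the bit is 1, which is the reading consistent with the update rule (increase the element when the best individual's bit is 1). All function names are hypothetical.

```python
import random


def gray_encode(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Inverse of gray_encode."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def sample_individual(pv, rng):
    """Draw one bit per PV element; pv[i] is read as P(bit i == 1)."""
    return [1 if rng.random() < p else 0 for p in pv]


def update_pv(pv, best, step):
    """Shift every element toward the matching bit of `best`, clamped to [0, 1]."""
    return [min(1.0, p + step) if bit else max(0.0, p - step)
            for p, bit in zip(pv, best)]


rng = random.Random(0)          # fixed seed, only for reproducibility
pv = [0.5] * 6                  # two source concepts, 3 bits each
individual = sample_individual(pv, rng)
pv = update_pv(pv, individual, step=0.1)
codes = [format(gray_encode(i), "03b") for i in range(8)]
```

Under this standard reflected code, index 6 encodes as 101 and the bit string 110 decodes to index 4, so the example in Figure 1 may rely on a different bit convention.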

Crossover Operator.
The crossover operator generates one child individual by mixing the information of two parent individuals, and it is carried out according to the crossover probability. In this work, the uniform crossover operator is used, whose pseudo-code is shown in Algorithm 1. Given two parent solutions $ind_1$ and $ind_2$, each gene bit of their offspring $ind_{new}$ takes the corresponding gene bit value of $ind_1$ (or $ind_2$) when a random number rand(0, 1) is larger than (or smaller than) the crossover rate $p_{cr}$.

Determination of Knee Solution.
Figure 3 shows an example of the knee solution and the user preferred region. As can be seen from the figure, the knee solution, a subproblem's elite solution, and the PF form a user preferred region for that subproblem. In each generation, the newly generated solution tends to move toward that region to meet the DM's requirement. Thus, the knee solution's determination is of utmost importance for the algorithm's search performance. In this work, for those solutions in the PF, a max-min approach is used to determine the knee solution. In particular, suppose $ind_1$ and $ind_2$ are two solutions in the PF, and $recall'_i$ and $precision'_i$, $i = 1, 2$, are, respectively, their recall' and precision'; they can be compared according to the following formula:
On this basis, the best solution can be selected from the PF solutions, and this procedure is shown in Algorithm 2.
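The selection loop of Algorithm 2 can be sketched as follows. The comparison formula is not reproduced here, so the sketch assumes the common max-min reading: the value min_i of a solution is the smaller of its recall' and precision', and the knee solution is the one with the largest such value. The function name and the sample front are hypothetical.

```python
def knee_solution(pareto_front):
    """Pick the solution whose worse objective is the largest (max-min).

    `pareto_front` is a list of (recall_approx, precision_approx) tuples.
    """
    best, best_min = None, float("-inf")
    for solution in pareto_front:
        worst = min(solution)          # assumed meaning of min_i
        if worst > best_min:           # mirrors "if min_i > max" in Algorithm 2
            best_min = worst
            best = solution
    return best


# Hypothetical Pareto front, for illustration only.
front = [(0.95, 0.40), (0.80, 0.70), (0.60, 0.85), (0.30, 0.98)]
knee = knee_solution(front)
```

The extreme solutions of the front score poorly under this rule because one of their objectives is always low, which is why the rule favors a balanced solution when the DM states no preference.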

Pseudo-Code of Compact Multiobjective Particle Swarm Optimization Algorithm Driven by Knee Solution.
The pseudo-code of CMPSO-K is presented in Algorithm 3. CMPSO-K first divides the problem into three subproblems that, respectively, maximize $f_{\alpha=0}$, $f_{\alpha=0.5}$, and $f_{\alpha=1}$. First, three PVs and local best individuals are, respectively, initialized for the three subproblems, and then the knee solution (or global best individual) is initialized by first applying the nondominated sorting algorithm [26] to the population and then determining the knee solution in the PF. In each generation, CMPSO-K tries to solve each subproblem by approximating PSO's position updating strategy, i.e., it crosses an individual over with the local best individual and the global best individual to obtain a new individual and then uses the new one to update the local best individual and the PV. After solving each subproblem, the population of the current generation is formed by the 3 local best individuals and 6 individuals in total generated through $PV_{\alpha=0}$, $PV_{\alpha=0.5}$, and $PV_{\alpha=1}$, respectively, and then the nondominated sorting algorithm is used to determine its PF and the current generation's knee solution $ind'_{elite}$. By comparing $ind'_{elite}$ with the historical knee solution $ind_{elite}$ through the max-min approach, the latter can be updated. Finally, when the generation reaches the maximum generation maxGen, the algorithm terminates and returns $ind_{elite}$.
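One generation for a single subproblem can be sketched as below. This is a simplified illustration under stated assumptions rather than Algorithm 3 itself: the fitness is a toy function (number of 1 bits), the global best is a fixed stand-in for the knee solution, the PV element is read as the probability of a 1 bit, and the PV is moved toward the local best after it has been updated. All names are hypothetical.

```python
import random


def sample(pv, rng):
    """Generate an individual from the probability vector."""
    return [1 if rng.random() < p else 0 for p in pv]


def uniform_crossover(ind1, ind2, p_cr, rng):
    """Copy a bit from ind1 when the draw exceeds p_cr, otherwise from ind2."""
    return [a if rng.random() > p_cr else b for a, b in zip(ind1, ind2)]


def update_pv(pv, best, step):
    """Move each element toward the corresponding bit of `best`."""
    return [min(1.0, p + step) if bit else max(0.0, p - step)
            for p, bit in zip(pv, best)]


def subproblem_step(pv, local_best, global_best, fitness, p_cr, step, rng):
    """One generation for a single subproblem; returns (pv, local_best)."""
    ind = sample(pv, rng)
    ind = uniform_crossover(ind, local_best, p_cr, rng)
    ind = uniform_crossover(ind, global_best, p_cr, rng)
    if fitness(ind) > fitness(local_best):
        local_best = ind
    return update_pv(pv, local_best, step), local_best


rng = random.Random(1)
fitness = sum                       # toy objective: number of 1 bits
pv = [0.5] * 8
local_best = sample(pv, rng)
global_best = [1] * 8               # stand-in for the knee solution
history = []
for _ in range(50):
    pv, local_best = subproblem_step(pv, local_best, global_best,
                                     fitness, p_cr=0.6, step=0.1, rng=rng)
    history.append(fitness(local_best))
```

Only one PV and one local best individual are kept per subproblem, which is where the memory saving relative to a full swarm comes from.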

Experiment
To test CMPSO-K's performance, the experiment exploits three biomedical tracks in the Ontology Alignment Evaluation Initiative (OAEI), i.e., the anatomy track, the large biomed track, and the disease and phenotype track. The testing cases in these tracks are all practical ontologies that are widely used in the biomedical domain and openly accessible, and they have lots of overlapping information with different representations. OAEI provides the reference alignments for each track to test a matcher's performance, and Table 1 briefly describes the ontologies in these tracks.
In terms of the alignment's quality, the EA-based matcher [8], ABC-based matcher [6], PSO-based matcher [10], and OAEI's participants are compared in Tables 2 and 3; in terms of memory consumption and converging speed, the CMPSO-K-based matcher and the other SIA-based matchers are compared in Figures 4 and 5. All the SIAs' results are the mean values of 30 independent executions. The configurations of EA, ABC, and PSO follow their original publications, and CMPSO-K uses the following configuration: number of decomposed problems: $N = 3$; maximum generation: $maxGen = 3000$; crossover probability: $p_{cr} = 0.6$; and step length for updating PV: $step = 0.1$. This configuration was determined empirically and represents a trade-off setting that obtains the highest average results on all testing cases.
Input: PF solution set IND_pf = {ind_1, ind_2, ...}
Output: a knee solution ind_elite
(1) …
(4) if min_i > max then
(5)   max = min_i;
(6)   ind_elite = ind_i;
(7) end if
(8) end for
(9) return ind_elite
ALGORITHM 2: Determination of knee solution.

First, Friedman's test [32] is used to figure out whether the matchers present any difference, and Holm's test [33] is then used to determine whether one matcher statistically outperforms the others. In Friedman's test, the null
In Holm's test, the z value is the testing statistic for comparing the i-th and j-th matchers, which is used for finding the p value, i.e., the corresponding probability from the table of the normal distribution. The p value is then compared with α = 0.05, which is an appropriate level of significance. According to Table 3, it is possible to state that our approach statistically outperforms the other biomedical ontology matchers on f-measure at the 5% significance level. Since the multiobjective evolving mechanism can better trade off the two objectives and the knee solution can effectively guide the algorithm's search direction, CMPSO-K's solutions are much better than those of the other SIAs. Figures 4 and 5, respectively, compare CMPSO-K with the other SIAs on memory consumption and converging speed. As can be seen from the figures, CMPSO-K can significantly improve the converging speed and reduce the memory consumption, which shows the effectiveness of the compact encoding mechanism and the compact evolutionary operators. To sum up, the CMPSO-K-based ontology matching technique can efficiently optimize the biomedical ontology alignments.
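The step from z values to decisions can be illustrated with a short sketch. It assumes a two-sided p value from the standard normal distribution and the usual step-down form of Holm's procedure; the z statistics are hypothetical and are not the values behind Table 3.

```python
import math


def two_sided_p(z):
    """Two-sided p value of a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure; returns one reject flag per hypothesis."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    reject = [False] * len(p_values)
    m = len(p_values)
    for rank, i in enumerate(order):
        # The smallest p value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break                   # all larger p values are retained too
    return reject


# Hypothetical z statistics for three pairwise comparisons.
z_values = [3.1, 2.2, 1.0]
p_values = [two_sided_p(z) for z in z_values]
decisions = holm(p_values)
```

The thresholds become less strict as hypotheses are rejected, which makes the procedure less conservative than comparing every p value with α divided by the number of comparisons.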

Conclusion and Future Work
To optimize the biomedical ontology alignment's quality, in this paper, a discrete multiobjective optimal model is constructed, a hybrid similarity metric is presented to distinguish heterogeneous biomedical concepts, and a CMPSO-K-based ontology matching technique is then proposed for addressing the problem. Compared with most existing SIA-based ontology matching techniques, CMPSO-K takes into consideration both the algorithm's performance and the DM's preference. Accordingly, three methods are proposed to achieve this goal: (1) the compact encoding mechanism for saving memory consumption and runtime; (2) the multiobjective decomposition and evolutionary mechanism for trading off different objectives; and (3) the max-min strategy for determining the knee solution and guiding the algorithm's search direction. Our work presents a novel compact multiobjective evolutionary framework that can improve the efficiency of the current SIA-based ontology matching techniques.
In the experiment, the f-measure values obtained by CMPSO-K outperform those of all the other competitors, which shows that CMPSO-K can effectively optimize the ontology alignments. In particular, the alignment quality of CMPSO-K is better than that of EA, ABC, and PSO, which shows that CMPSO-K's multiobjective evolutionary mechanism driven by the knee solution can effectively trade off the two objectives and find a better solution. Moreover, since the compact encoding mechanism uses PVs to represent the swarms and the compact evolutionary operators simplify the population-based PSO's evolving process, CMPSO-K can significantly reduce the memory consumption and runtime. Since no single similarity measure can effectively distinguish all heterogeneous concepts in every situation, it is necessary to aggregate several similarity measures to improve the result's precision. We utilize a hybrid similarity measure that combines three kinds of similarity measures to calculate the entity similarity value, and therefore, CMPSO-K's results are significantly higher than those of other systems that only take into consideration one or two categories of similarity measure, such as DOME, POMAP++, and LogMap. However, AML applies so many similarity measures that they lead to conflicting results, which decreases its recall value. Thus, how many similarity measures should be selected and combined to ensure the quality of the alignment will be one direction of our future work. Last but not least, CMPSO-K takes into consideration the DM's preference and utilizes the knee solution to guide the algorithm's search direction, which can effectively trade off the two objectives and find better solutions.
To further improve the matching process's efficiency, in the future, we will be interested in developing a biomedical ontology partitioning technique to split two ontologies into disjoint ontology segments so that the large-scale problem can be converted into several small-scale segment matching problems. After that, parallel computation can also be utilized to match the similar segments and improve the matching process's performance. With respect to the similarity measure, we would like to develop an adaptive biomedical concept similarity measure framework, which can instantiate an effective similarity metric according to two biomedical ontologies' heterogeneous characteristics.

Data Availability
The data used to support this study can be found at http://oaei.ontologymatching.org.

Conflicts of Interest
The authors declare that they have no conflicts of interest in the work.