Sensor Ontology Metamatching with Heterogeneity Measures

The heterogeneity problem among different sensor ontologies hinders the interaction of information. Ontology matching is an effective method to address this problem by determining the heterogeneous concept pairs. In the matching process, the similarity measure serves as the kernel technique, which calculates the similarity value of two concepts. Since no single similarity measure is effective in every context, several measures are usually combined to enhance the result's confidence. How to find suitable aggregating weights for various similarity measures, i.e., the ontology metamatching problem, is an open challenge. This paper proposes a novel ontology metamatching approach to improve the quality of sensor ontology alignments, which utilizes the heterogeneity features of two ontologies to tune the aggregating weight set. In particular, three ontology heterogeneity measures are first proposed to evaluate the heterogeneity values in terms of syntax, linguistics, and structure, respectively, and then a semiautomatic learning approach is presented to construct the conversion functions that map any two ontologies' heterogeneity values to the weights for aggregating the similarity measures. To the best of our knowledge, this is the first time that heterogeneity features are proposed and used to solve the sensor ontology metamatching problem. The effectiveness of the proposal is verified by comparing it with state-of-the-art ontology matching techniques on the Ontology Alignment Evaluation Initiative (OAEI) testing cases and two pairs of real sensor ontologies.


Introduction
A sensor network is composed of various sensors. To realize information integration and communication between multiple sensor networks, the semantic sensor web was created, which combines the semantic web and the sensor network. In the semantic sensor web, sensor data is annotated with metadata characterized by a sensor ontology to increase interoperability [1][2][3]. A sensor ontology establishes a consensus on the meaning of concepts in a certain sensor field and provides rich domain knowledge and semantic vocabulary for the interaction between application systems in the semantic sensor web. In recent years, many sensor ontologies have been developed, such as SensorOntology2009 (https://www.w3.org/2005/Incubator/ssn/wiki/SensorOntology2009), IoT-Lite (https://www.w3.org/Submission/2015/SUBM-iot-lite-20151126/), original SSN (Semantic Sensor Network) (https://www.w3.org/2005/Incubator/ssn/wiki/SSN#Sensor), new SSN (https://www.w3.org/ns/ssn/), and SOSA (Sensor, Observation, Sample, and Actuator) (https://www.w3.org/ns/sosa/). These ontologies can represent the function, performance, and usage conditions of sensors, which can provide different data for different purposes and contexts [4,5]. However, in a distributed and open environment, the people who construct sensor ontologies might have diverse preferences, knowledge backgrounds, and styles of understanding knowledge, so one concept could be defined with different names or granularities, and the context of a concept could also differ. These lead to the problem of ontology heterogeneity [6,7]. For example, the concept "personal computer" in one ontology may be defined as "PC" in another ontology, which results in syntax heterogeneity. To support the semantic interaction between intelligent systems based on sensor ontologies, we need to determine the correspondences between the heterogeneous concepts in these ontologies, which is the so-called sensor ontology matching [8].
During the matching process, it is important to identify identical concepts by calculating their similarity value, and thus, the similarity measure becomes the kernel technique that directly affects the alignment's quality. At present, similarity measures can be mainly divided into three categories, i.e., syntax-based, linguistics-based, and structure-based measures [9]. Although these methods have been extensively studied and used, it is still a challenge to find an effective similarity measure for a specific matching task. With the emergence of various similarity measures, combining multiple similarity measures is a feasible way to overcome the limitations of a single one [10][11][12]. However, how to find suitable aggregating weights to combine them becomes another problem, i.e., the ontology metamatching problem [13]. Traditional metamatching methods include the Max, Min, and average approaches [14][15][16], which take the maximum, minimum, and average, respectively, of the similarity values produced by the various similarity measures but ignore the heterogeneity features of concepts. This paper proposes a novel ontology metamatching approach to face this challenge, which uses the heterogeneity features of two ontologies to semiautomatically tune the weights for aggregating different similarity measures.
In particular, three ontology heterogeneity measures are first proposed to evaluate the heterogeneity values in terms of syntax, linguistics, and structure, respectively. Then, using these values together with the expert-provided aggregating weights of the corresponding similarity measures, conversion functions are trained that convert any two ontologies' heterogeneity values to the weights for aggregating the similarity measures. The training process of the conversion function is shown in Figure 1, where $O_r$ and $O_i$ represent the $i$th pair of ontologies for training ($O_r$ is the reference ontology and $O_i$ is a target ontology), and $W_i$, $W_i'$, and $W_i''$ are the $i$th set of weights for aggregating the syntax-based, linguistics-based, and structure-based similarity measures, respectively, which are given by experts and are the most effective for matching the $i$th pair of ontologies. The conversion function converts a heterogeneity value into the corresponding aggregating weight, e.g., the syntax heterogeneity value into the aggregating weight for the syntax-based similarity measure.
In the rest of the paper, Section 2 defines the related concepts of sensor ontology and the evaluation function; Section 3 describes in detail the ontology heterogeneity measures and the semiautomatic learning method; Section 4 shows the experimental results; and Section 5 summarizes the work and presents future work.

Preliminaries
2.1. Sensor Ontology and Entity Correspondence. A sensor ontology is a 3-tuple [17] $O = (C, P, I)$, where $C$ is a nonempty set of classes, $P$ is a nonempty set of properties, and $I$ is a set of instances (which can be empty). In general, classes, properties (datatype properties and object properties), and instances are called entities.
Sensor ontology matching utilizes the information inside the sensor ontologies and other knowledge bases outside them to determine the similar entities in two sensor ontologies [18]. The resulting set of entity mappings is called a sensor ontology alignment, and each entity correspondence inside it is defined as follows: an entity correspondence is a 4-tuple [19] $(e, e', n, r)$, where $e$ and $e'$ are two entities from the two sensor ontologies, respectively; $n$ is the similarity value of $e$ and $e'$ computed by a similarity function; and $r$ is the semantic relationship between $e$ and $e'$, which is the equivalence relationship in this work [20].

Similarity Measure.
A similarity measure is a function that takes as input two entities' information $d$ and $d'$ and returns their similarity value $v = g(d, d') \in [0, 1]$. Next, we introduce three types of similarity measures in detail, i.e., syntax-based, linguistics-based, and structure-based.
2.2.1. Syntax-Based Similarity Measure. A syntax-based similarity measure utilizes the syntax information of two strings to calculate their similarity value. There are many similarity measures based on syntax, such as Jaro-Winkler distance [21], SMOA distance [22], and N-gram [23]. According to [23], the N-gram similarity measure shows superior performance when matching concepts, and the best effect is achieved when $N = 3$. Therefore, this paper uses N-gram as the representative of the syntax-based similarity measures. N-gram is defined as follows:

$$\text{N-gram}(s_1, s_2) = \frac{2 \times comm(s_1, s_2)}{N_{s_1} + N_{s_2}}$$

where $s_1$ and $s_2$ are the strings to be matched, $N_{s_1}$ and $N_{s_2}$ are the numbers of substrings of length $N = 3$ in the two strings, respectively, and $comm(s_1, s_2)$ is the number of substrings common to the two strings.
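The N-gram measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the common Dice-style form 2 × comm / (N_s1 + N_s2), counts common substrings as a multiset intersection, lowercases the inputs, and returns 0 for distinct strings shorter than N. All function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(s: str, n: int = 3) -> List[str]:
    """Return all contiguous substrings of length n (empty if s is shorter than n)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_similarity(s1: str, s2: str, n: int = 3) -> float:
    """Assumed form: 2 * comm(s1, s2) / (N_s1 + N_s2), with values in [0, 1]."""
    a, b = ngrams(s1.lower(), n), ngrams(s2.lower(), n)
    if not a or not b:
        # Assumption: strings too short to yield any n-gram match only if identical.
        return 1.0 if s1.lower() == s2.lower() else 0.0
    # Multiset intersection counts each shared substring as often as it occurs in both.
    common = sum((Counter(a) & Counter(b)).values())
    return 2.0 * common / (len(a) + len(b))


print(ngram_similarity("sensor", "sensor"))
print(ngram_similarity("sensor", "sensing"))
print(ngram_similarity("PC", "personal computer"))
```

The last call illustrates the syntax heterogeneity mentioned in the introduction: "PC" and "personal computer" share no 3-character substring, so a purely syntax-based measure cannot relate them.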

Linguistics-Based Similarity Measure. A linguistics-based similarity measure calculates two words' similarity value by using a background knowledge base, such as WordNet [24]. WordNet is an electronic lexical database that groups words into sets of synonyms and defines the hypernym-hyponym relations between them. It is commonly used to measure the similarity between concepts. The Wu and Palmer similarity measure used in this paper uses the depth of the two concepts in WordNet and the depth of their nearest common parent concept to calculate the similarity [25]. Because it accounts for how the connection strength between concepts changes with depth, its measurement results are more accurate. Given two words $c_1$ and $c_2$, the Wu and Palmer similarity measure is defined as follows:

$$\text{Wup}(c_1, c_2) = \frac{2 \times depth(LCA(c_1, c_2))}{depth(c_1) + depth(c_2)}$$

where $LCA(c_1, c_2)$ is the closest common parent concept of $c_1$ and $c_2$ in the WordNet hierarchy, and $depth(LCA(c_1, c_2))$, $depth(c_1)$, and $depth(c_2)$ represent the depths of the common parent concept, $c_1$, and $c_2$ in the hierarchy, respectively.
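To make the Wu and Palmer computation concrete, the following sketch evaluates it on a small hand-written hierarchy. The hierarchy, the convention that the root has depth 1, and all function names are assumptions for illustration only; a real system would query WordNet instead of the toy `PARENT` table.

```python
# Hypothetical toy is-a hierarchy (child -> parent); it is NOT taken from WordNet.
PARENT = {
    "entity": None,
    "device": "entity",
    "sensor": "device",
    "actuator": "device",
    "thermometer": "sensor",
    "barometer": "sensor",
}


def path_to_root(concept):
    """Return the list of concepts from `concept` up to the root, inclusive."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth in the hierarchy, with the root at depth 1 (an assumed convention)."""
    return len(path_to_root(concept))


def lca(c1, c2):
    """Closest common parent: the first ancestor of c2 that is also an ancestor of c1."""
    ancestors = set(path_to_root(c1))
    for node in path_to_root(c2):
        if node in ancestors:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(c1, c2):
    """2 * depth(LCA) / (depth(c1) + depth(c2))."""
    return 2.0 * depth(lca(c1, c2)) / (depth(c1) + depth(c2))


print(wu_palmer("thermometer", "barometer"))  # siblings under "sensor"
print(wu_palmer("thermometer", "actuator"))   # related only through "device"
```

Concepts whose common parent lies deeper in the hierarchy receive a higher score, which is the behavior the definition is designed to capture.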

Structure-Based Similarity Measure. A structure-based similarity measure utilizes two concepts' parent-child concepts defined in their ontologies to calculate the similarity value. As a classic structure-based similarity measure, SimRank [26] works on the assumption that if two concepts are related to similar concepts, then the two concepts are similar. The relationship graph between concepts is expressed as $G = (V, E)$, where each node in $V$ represents an object and each edge in $E$ represents a relationship between objects. The SimRank similarity measure is defined as follows:

$$s(a, b) = \begin{cases} 1, & a = b, \\ \dfrac{C}{|I(a)||I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} s(I_i(a), I_j(b)), & a \neq b, \end{cases}$$

where $a$ and $b$ are two concepts from two ontologies, respectively, $I_i(a)$ represents the $i$th concept that points to $a$ in the relationship graph, and $|I(a)|$ represents the number of concepts pointing to $a$ in the relationship graph. When $a = b$, the similarity value between concepts is 1; otherwise, the similarity value of $a$ and $b$ is the average of the similarity values of all combinations of the concepts pointing to $a$ and the concepts pointing to $b$, scaled by the constant $C$. The damping coefficient $C$ is generally set to 0.8. When the number of concepts pointing to $a$ or $b$ is 0, the similarity value between $a$ and $b$ is 0.
As a recursive algorithm, SimRank has the following iterative process. Let $R_k(a, b)$ be the similarity of $(a, b)$ in the $k$th iteration, initialized as

$$R_0(a, b) = \begin{cases} 1, & a = b, \\ 0, & a \neq b. \end{cases}$$

Wireless Communications and Mobile Computing
When $a = b$, the initial similarity value between concepts is 1; otherwise, the initial similarity value between concepts is 0. Starting from this initialization, the iteration for $a \neq b$ is

$$R_{k+1}(a, b) = \frac{C}{|I(a)||I(b)|} \sum_{i=1}^{|I(a)|} \sum_{j=1}^{|I(b)|} R_k(I_i(a), I_j(b)).$$

When $a = b$, $R_{k+1}(a, b) = 1$. The similarity of each pair $(a, b)$ is updated according to the values of the previous iteration, and $R_{k+1}(a, b)$ converges as $k$ increases. When $k \to \infty$, it can be considered that $R_k(a, b) = s(a, b)$. Usually, it has the best effect when $k \approx 5$.
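The iteration above can be sketched as follows. This is a minimal, unoptimized illustration rather than the authors' implementation: it assumes a single relationship graph given as a dictionary of in-neighbors (here a hypothetical parent-to-child graph), whereas the paper compares concepts drawn from two ontologies. The names `simrank` and `in_neighbors` are illustrative.

```python
from itertools import product


def simrank(in_neighbors, C=0.8, iterations=5):
    """Iterate R_{k+1}(a, b) from R_0; returns a dict keyed by (a, b) pairs."""
    nodes = list(in_neighbors)
    # R_0: 1 on the diagonal, 0 elsewhere.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # No concept points to a or to b: similarity is 0.
                new[(a, b)] = 0.0
                continue
            total = sum(sim[(x, y)] for x in ia for y in ib)
            new[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new
    return sim


# Hypothetical graph: each key lists the concepts that point to it.
graph = {
    "Device": [],
    "Sensor": ["Device"],
    "Actuator": ["Device"],
    "Thermometer": ["Sensor"],
    "Valve": ["Actuator"],
}
scores = simrank(graph)
print(scores[("Sensor", "Actuator")])
print(scores[("Thermometer", "Valve")])
```

In this toy graph, "Sensor" and "Actuator" share the same parent and therefore score C, while "Thermometer" and "Valve" inherit a damped share of that similarity one level further down.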

Evaluation Metrics on Alignment's Quality.
Traditionally, the quality of a sensor ontology alignment is evaluated by recall, precision, and f-measure [27,28]. Recall is the proportion of the reference matching pairs that are correctly found. When the recall is 1, all correct matching pairs have been found. However, this does not mean that there are no wrong matching pairs among the found matching pairs, so the accuracy also needs to be evaluated. Precision is the proportion of the found matching pairs that are correct. To evaluate the matching quality comprehensively, the harmonic mean of precision and recall (i.e., f-measure) is used. The precision, recall, and f-measure are defined as follows:

$$recall = \frac{|R \cap A|}{|R|}, \qquad precision = \frac{|R \cap A|}{|A|}, \qquad f\text{-}measure = \frac{2 \times precision \times recall}{precision + recall}$$

where $R$ is a reference alignment and $A$ is an alignment.
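As a small worked example of these metrics, the sketch below treats alignments as sets of entity pairs. The function name `evaluate` and the sample pairs are hypothetical, and returning 0 for empty inputs is an assumed convention.

```python
def evaluate(reference, alignment):
    """Return (precision, recall, f_measure) for two sets of (entity, entity) pairs."""
    correct = len(reference & alignment)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference alignment R and found alignment A.
R = {("Sensor", "Sensor"), ("Stimulus", "SensorInput"),
     ("Observation", "Observation"), ("Platform", "Platform")}
A = {("Sensor", "Sensor"), ("Observation", "Observation"),
     ("Platform", "Property")}

p, r, f = evaluate(R, A)
print(p, r, f)
```

Here two of the three found pairs are correct and two of the four reference pairs are recovered, so precision is 2/3, recall is 1/2, and the f-measure is 4/7.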

Ontology Heterogeneity Measure and Conversion Function
In this work, we aim to semiautomatically tune the aggregating weights for various similarity measures by taking into consideration two ontologies' heterogeneity features. Firstly, we construct an ontology hierarchy graph for each ontology, whose nodes are the ontology's concepts and whose edges are the relationships between the corresponding concepts [29]. Secondly, the concepts in each ontology are ranked in descending order according to their corresponding nodes' out-in-degrees [30] in the ontology hierarchy graph. After that, the top third of the ranked concepts are selected as the representative concepts, which are further used to measure the heterogeneity value of two ontologies. On this basis, the representative datatype properties and object properties can also be determined. Then, given the expert-provided aggregating weights of the similarity measures, we learn conversion functions that map any two ontologies' heterogeneity values to the weights for aggregating the corresponding similarity measures.

Ontology Heterogeneity Measure.
To the best of our knowledge, this is the first work to quantify the degree of ontology heterogeneity features. To reduce the computational complexity, it is necessary to select representative concepts from the ontology for the heterogeneity calculation. In this work, we first calculate the sum of each node's out-degree and in-degree, which reflects how closely the node is connected to its surrounding nodes and thus the importance of the node. Then, the top third of the ranked concepts are selected as the representative concepts (i.e., classes). The datatype properties and object properties related to the representative classes are chosen as the representative ones.
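The selection of representative concepts can be sketched as below. This is an illustrative reading of the description, not the authors' code: it assumes the hierarchy graph is given as a list of directed edges, ranks nodes by out-degree plus in-degree, breaks ties alphabetically (an assumption), and keeps the top third rounded down with a minimum of one. Concepts that appear in no edge are ignored in this sketch.

```python
def representative_concepts(edges):
    """Rank concepts by out-degree + in-degree and return the top third."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1  # contributes to out-degree
        degree[target] = degree.get(target, 0) + 1  # contributes to in-degree
    # Descending by degree; alphabetical tie-break is an assumption.
    ranked = sorted(degree, key=lambda concept: (-degree[concept], concept))
    k = max(1, len(ranked) // 3)
    return ranked[:k]


# Hypothetical hierarchy graph with seven concepts.
edges = [
    ("Thermometer", "Sensor"),
    ("Barometer", "Sensor"),
    ("Sensor", "Device"),
    ("Actuator", "Device"),
    ("Device", "System"),
    ("Sensor", "Observation"),
]
print(representative_concepts(edges))
```

With seven concepts, the top third contains two of them; "Sensor" (degree 4) and "Device" (degree 3) are the most connected nodes in this toy graph.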
Inspired by traditional metrics such as recall and precision, the heterogeneity value in this work is calculated as the ratio of the heterogeneous part to the overall part. Supposing that the alignment's cardinality is one to one and that $E_1$ and $E_2$ are the two entity sets to be matched, the heterogeneity value is calculated by formula (7), where $|E_1|$ and $|E_2|$ are the cardinalities of $E_1$ and $E_2$, respectively, and $|E_{mapped}|$ is the number of identical entities in the two entity sets. At present, there are three kinds of ontology heterogeneity, and each of them should be quantified on the entities related to that kind of heterogeneity. The choice of the entity sets $E_1$ and $E_2$ therefore differs between heterogeneity measures:
(1) Syntax Heterogeneity Measure. It is inspired by the syntax-based similarity measure N-gram, which directly measures the entities in the ontology. The quantification of the syntax heterogeneity feature therefore also focuses on the entities in the ontology. $E_1$ and $E_2$ represent the representative entity sets of the two ontologies to be matched, respectively.
(2) Linguistics Heterogeneity Measure. It is inspired by the linguistics-based similarity measure Wu and Palmer, which measures the synonyms of the entities to be matched in WordNet, so the quantification of linguistics heterogeneity also focuses on the WordNet synonyms of the entities in the ontology. $E_1$ and $E_2$ represent the sets of representative entities of the two ontologies to be matched together with their synonyms in WordNet, respectively.
According to the heterogeneity feature being measured, different entities are selected as the input of the heterogeneity method (formula (7)), which quantifies the degree of each heterogeneity feature in a more targeted way.

Conversion Function Learning. The "heterogeneity value-weight" conversion function is constructed to explore the functional relationship between the aggregating weights of various similarity measures and the heterogeneity features that cause the heterogeneity between ontologies, with the aim of giving more effective aggregating weights to the similarity measures. After determining the three kinds of heterogeneity values between two ontologies, we need to convert them to the aggregating weights for the different similarity measures, so a conversion function has to be constructed. As shown in Figure 1, the learning data of the conversion function includes the heterogeneity value and the corresponding aggregating weight (for N-gram, Wu and Palmer, or SimRank) given by experts.
This work uses the data analysis software Origin2020 (https://www.originlab.com) to construct the conversion function. The fitting algorithm behind the tool is least squares. Compared to Lagrange interpolating polynomials and cubic spline interpolation, the least-squares algorithm is more suitable for forecasting. Furthermore, we observe that the "heterogeneity value-weight" fitting data follows an exponential function, and the least-squares algorithm is capable of fitting an exponential function.
Due to the different heterogeneity features of various entities in ontology, not all the three types of similarity measures are needed for each kind of entity.
(1) For class. The learning data for fitting can be defined as $(X_i^j, Q_i^j)$, where $X_i$ ($i = 1, 2, 3$) represents one of the three kinds of heterogeneity value and $Q_i$ is the aggregating weight of the similarity measure of the same kind; for example, if $X_i$ is the syntax heterogeneity value, then $Q_i$ is the aggregating weight of the syntax-based similarity measure N-gram. The index $j = 1, 2, 3, \ldots$ is the serial number of the data used for fitting
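As a rough illustration of fitting a "heterogeneity value-weight" function, the sketch below fits the exponential model $y(x) = a \times b^x + c$ (the model named in the experimental configuration) by least squares. It is not the Origin2020 procedure: as an assumption made to keep the example dependency-free, it searches a fixed grid of candidate $b$ values and solves for $a$ and $c$ in closed form for each one. The training points are synthetic, not the paper's expert-provided data, and the function name is hypothetical.

```python
def fit_exponential(xs, ys):
    """Least-squares fit of y = a * b**x + c; returns (a, b, c, r_squared)."""
    n = len(xs)
    mean_y = sum(ys) / n
    sst = sum((y - mean_y) ** 2 for y in ys)
    best = None
    for k in range(1, 1000):
        if k == 100:
            continue  # b = 1 makes b**x constant, so a and c cannot be separated
        b = k / 100  # assumed search grid: b in 0.01 .. 9.99
        us = [b ** x for x in xs]
        mean_u = sum(us) / n
        var_u = sum((u - mean_u) ** 2 for u in us)
        if var_u == 0:
            continue
        # For a fixed b the model is linear in (a, c): ordinary linear regression on u.
        a = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys)) / var_u
        c = mean_y - a * mean_u
        sse = sum((a * u + c - y) ** 2 for u, y in zip(us, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("need at least two distinct x values")
    sse, a, b, c = best
    r_squared = 1.0 - sse / sst if sst > 0 else 1.0
    return a, b, c, r_squared


# Synthetic (heterogeneity value, weight) pairs generated from a = 0.5, b = 0.1, c = 0.2.
xs = [i / 10 for i in range(11)]
ys = [0.5 * 0.1 ** x + 0.2 for x in xs]
a, b, c, r2 = fit_exponential(xs, ys)
print(a, b, c, r2)
```

The returned R-square plays the same role as the goodness-of-fit value the paper reports for each conversion function; a general-purpose nonlinear least-squares routine would avoid the grid restriction on $b$.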

Ontology Metamatching
In this section, we describe the proposed ontology metamatching technique in detail. The entities in an ontology can be divided into three types, i.e., class, datatype property, and object property, and they all carry information such as ID, label, and comment. Next, we take class as an example to illustrate the matching process. Given a pair of ontologies to be matched:
(1) Select the representative classes from the two ontologies, calculate the three kinds of heterogeneity values, and substitute them into the corresponding "heterogeneity value-weight" functions to obtain the aggregating weights. In particular, the $i$th aggregating weights are denoted as $W_i$.
(2) Construct the similarity matrices on ID and label using the different similarity measures, which are denoted as $M_i^{ID}$ and $M_i^{label}$, respectively. Since a comment has no structural relationship and is usually a sentence that SimRank and Wu and Palmer cannot measure effectively, we utilize N-gram to measure the similarity of comments.
(3) Multiply the similarity matrices of ID and label by their corresponding aggregating weights and add these weighted matrices into a comprehensive similarity matrix.
(4) Use a threshold to filter the comprehensive similarity matrix: first, filter the ID matching pairs; second, for those entities that are not yet matched, filter by their labels; third, for those entities that are still not matched, filter by their comments. In each step, the mappings whose similarity values are above the threshold are put into the final alignment.
For the sake of clarity, the flowchart of the process of determining the final ontology alignment is shown in Figure 2.
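Steps (3) and (4) can be sketched as a weighted sum of similarity matrices followed by threshold filtering. This is a simplified illustration under several assumptions: it shows a single filtering stage instead of the ID, label, and comment cascade, it enforces the one-to-one cardinality greedily from the highest score down, and the weights and matrix values are invented for the example. The default threshold of 0.68 is the one reported in the experimental configuration.

```python
def aggregate(matrices, weights):
    """Weighted element-wise sum of equally sized similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]


def extract_alignment(matrix, threshold=0.68):
    """Greedy one-to-one selection of cells whose value is above the threshold."""
    candidates = sorted(
        ((value, i, j) for i, row in enumerate(matrix)
         for j, value in enumerate(row) if value > threshold),
        reverse=True,
    )
    used_rows, used_cols, alignment = set(), set(), []
    for value, i, j in candidates:
        if i not in used_rows and j not in used_cols:
            alignment.append((i, j, value))
            used_rows.add(i)
            used_cols.add(j)
    return alignment


# Hypothetical 2 x 2 similarity matrices for two classes per ontology.
syntax = [[0.9, 0.1], [0.2, 0.8]]
linguistics = [[0.8, 0.3], [0.1, 0.6]]
structure = [[1.0, 0.0], [0.0, 0.5]]
weights = [0.7, 0.2, 0.1]  # illustrative values standing in for conversion-function outputs

combined = aggregate([syntax, linguistics, structure], weights)
print(extract_alignment(combined))
```

In the full process, entities left unmatched after one stage would be passed to the next stage (label, then comment) with the same thresholding logic.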

Dataset and Experimental Configuration.
In the experiment, we use the testing cases provided by the Ontology Alignment Evaluation Initiative (OAEI) (http://oaei.ontologymatching.org). A brief description of OAEI's testing cases is shown in Table 1. In particular, each case contains two ontologies to be matched and one reference alignment for evaluating the quality of the alignment.

Table 1: Brief description of the testing cases.
ID | Brief description
201-210 | The concept language features of the ontology pairs to be matched are different, but their structure is the same.
221-247 | The concept structure of the ontology pairs to be matched is different, but their language features are the same.
248-262 | The ontology pairs to be matched in the cases have different lexical, linguistic, and structure features.
Real sensor ontologies
Original SSN | Description of sensors and observations and related concepts.
New SSN | Addressing changes in scope and audience, shortcomings of the original SSN, and new technical developments.
SOSA | As a lightweight core for SSN.

SOSA provides a lightweight core for SSN and aims at broadening the target audience and application areas that can make use of Semantic Web ontologies (https://www.w3.org/TR/vocab-ssn/). As classic sensor ontologies, the SSN series of sensor ontologies is widely used in satellite imagery, large-scale scientific monitoring, industrial and household infrastructures, social sensing, citizen science, observation-driven ontology engineering, and the Web of Things. They greatly promote information exchange in various fields.
The relevant configuration of the experiment is as follows: the threshold of the similarity matrix is 0.68, and the model of the conversion function is the exponential function $y(x) = a \times b^x + c$. In this experiment, using the benchmark, we compare our approach with OAEI's participants and three classic metamatching methods, i.e., the maximum-based, minimum-based, and average-based metamatching approaches. Finally, we apply the method of this paper to two pairs of real sensor ontologies to verify its robustness.

Experimental Results and Analysis.
As shown in Figures 3-5, seven conversion functions are obtained: class has three kinds of conversion functions, while datatype property and object property each have two kinds because they lack structural features. According to the R-square values, each conversion function effectively reflects the mapping relation between heterogeneity value and aggregating weight. From the conversion functions, we can see that the aggregating weight of the syntax-based similarity measure N-gram takes the largest share among the similarity measures for all three kinds of entities, while the others account for a small part. In addition, the aggregating weight of the syntax-based N-gram decreases exponentially as its heterogeneity value increases for all three kinds of entities; on the contrary, the aggregating weights of the linguistics-based similarity measure Wu and Palmer and the structure-based similarity measure SimRank increase exponentially as their heterogeneity values increase.
In Table 2, edna [32] is the baseline method, and AML [33], the LogMap series [34], and XMap [35] are OAEI participants. In terms of the comprehensive evaluation index f-measure, our method scores 0.82, which is much higher than the other state-of-the-art methods. Since AML uses more than ten similarity measures, its precision is slightly higher than that of our method, but multiple similarity measures can suppress one another, which yields low recall and high computational complexity in tuning these similarity measures and debugging the results. Our approach's recall is 0.74, which is higher than that of the other state-of-the-art methods.
As shown in Tables 3-5, among all 45 testing cases, our method achieves better results than the other competitors in 41 testing cases and equal results in the other 4 testing cases, which further shows the effectiveness of our method. For a more intuitive comparison, a bar graph of the f-measure for the four methods is drawn in Figure 6.
In the comparisons with both the state-of-the-art methods in OAEI and the three classic metamatching methods, our method achieves ideal results. This is because the other methods ignore the effect of heterogeneity features, which makes them unable to identify more matching pairs effectively, whereas our approach incorporates the heterogeneity features into the process of obtaining aggregating weights.
When matching real sensor ontologies, new SSN is treated as the source ontology, and the other two real sensor ontologies are treated as the target ontologies. It can be seen from Table 6 that our approach obtains results that are very close to the golden alignment. When matching the new SSN and the original SSN, the recall obtained by our approach is 0.97; some correct correspondences are missed because we assume that the alignment's cardinality is one to one, i.e., one concept in an ontology can only be mapped to one concept of the other ontology and vice versa. However, the cardinality could be one to many in some practical tasks, e.g., the concept "Stimulus" in the new SSN is mapped to the concept "Stimulus" in the original SSN, which is asserted to be the same as another class, "Sensor Input." In our approach, according to the filtering sequence of step 4 in Figure 2, if "Stimulus" in the new SSN is first mapped to "Stimulus" in the original SSN by ID, it no longer participates in the filtering process (although the comment of "Stimulus" in the new SSN is the same as the comment of "Sensor Input" in the original SSN), which decreases our approach's recall value.

Conclusion and Future Work
Addressing the ontology metamatching problem requires determining suitable aggregating weights of several similarity measures for matching ontologies. Considering that the heterogeneity features of ontologies have an important influence on the aggregating weights of similarity measures, this paper first proposes three heterogeneity measures to quantify the ontology heterogeneity features and then constructs the functional relationship between each heterogeneity feature and the aggregating weights of the various similarity measures. The effectiveness of the proposal is verified by comparing it with the state-of-the-art matching techniques in OAEI and other classic metamatching methods. The robustness of the method is demonstrated by matching two pairs of real sensor ontologies.
In the future, we are interested in further improving this approach to deal with matching tasks in which the alignment's cardinality is many to many. We are also interested in addressing the large-scale matching problem, where two sensor ontologies might contain tens of thousands of concepts and instances. In addition, when the number of similarity measures is large, e.g., more than 50, the selection of suitable ones is a critical issue that we need to address. Finally, an efficient user interaction strategy should be used to further reduce the interaction time and the user's error rate.

Data Availability
The data used to support this study can be found at http://oaei.ontologymatching.org.