Semantic Integration of Sensor Knowledge on Artificial Internet of Things

Department of Information Technology, Concord University College, Fujian Normal University, Fuzhou 350118, China
Fujian Key Lab for Automotive Electronics and Electric Drive, Fujian University of Technology, Fuzhou 350118, China
Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin 541004, China
Intelligent Information Processing Research Center, Fujian University of Technology, Fuzhou 350118, China


Introduction
The Internet of Things (IoT) [1] consists of interconnected things with built-in sensors, and the Artificial IoT (AIoT) [2] further integrates Artificial Intelligence (AI) with IoT to create sensor networks that can communicate and process data. To implement communication and co-operation among intelligent systems on AIoT, it is necessary to annotate sensor data with semantic meanings to overcome the heterogeneity problem among different sensors, which requires the utilization of sensor ontologies [3]. A sensor ontology formally models the knowledge on AIoT by defining the concepts, the properties describing a concept, and the relationships between concepts. Since sensor ontologies are regarded as the solution to data heterogeneity on AIoT, many sensor ontologies [4] have been developed in recent years. However, due to human subjectivity, the overlapping information in these ontologies can be defined with different terminologies and contexts, yielding the ontology heterogeneity problem. Therefore, before using them, it is necessary to integrate their knowledge by finding the correspondences between their concepts. Ontology matching can bring sensor ontologies into mutual agreement by automatically determining identical concept correspondences (i.e., an ontology alignment), and it is regarded as an effective technique to address the ontology heterogeneity problem.
Due to the high computational complexity of the matching process, Swarm Intelligence (SI) algorithms have become a popular methodology for integrating heterogeneous ontologies [5][6][7][8][9]. Martinez-Gil and Montes [10] propose Genetics for Ontology Alignments (GOAL), which first generates a similarity matrix for each similarity measure and then uses the Genetic Algorithm (GA) to optimize the weights for aggregating these matrices. The aggregating weights determined by GOAL can be reused to match ontologies with similar heterogeneous features. Ginsca and Iftene [11] optimize not only the parameters in the matching process but also the threshold in the alignment filtering process. Acampora et al. [12] try to improve GA's convergence speed as well as the solution's quality by introducing a local search strategy. Xue and Wang [13] propose a new metric to approximately measure the alignment's f-measure [14], and on this basis, utilize a hybrid GA to execute instance-level matching in the Linked Open Data cloud (LOD). More recently, He et al. [15] propose an Artificial Bee Colony (ABC) based matching technique to aggregate different similarity measures, which can improve the alignment's quality. These SI-based matching techniques need to first store the similarity matrices determined by the similarity measures, which sharply increases the computational complexity. To this end, Genetic Algorithm based Ontology Matching (GAOM) [16] models ontology matching as a bipartite graph matching process and tries to use GA to directly determine a high-quality alignment. Since instance information can effectively improve the alignment's precision, Alves et al. [17] first propose an instance-based similarity measure and then utilize a hybrid GA to determine the optimal mappings.
MapPSO [18] models ontology matching as a bipartite graph matching problem and proposes to use the Particle Swarm Optimization algorithm (PSO) [19] to address it. MapPSO utilizes the statistical information of the alignment to approximately evaluate its quality and guide the algorithm's search direction, which can automatically determine high-quality alignments. For dynamic applications on the Semantic Sensor Web (SSW), it is necessary to integrate the sensor ontologies online, and thus, besides the quality of the alignments, the matching efficiency is also of prime importance. Inspired by the success of compact SI in various applications [20][21][22][23], this work proposes a compact PSO (cPSO) to integrate the sensor knowledge in AIoT. Our proposal uses a probabilistic representation of the population to execute the optimizing process, which simulates the population behaviour: it extensively explores the decision space at the beginning of the optimization process, then progressively focuses the search on the most promising genotypes and narrows the search radius. Thus, a run of cPSO requires far less memory than the standard PSO. In particular, we formally define the sensor ontology matching problem and propose a problem-specific cPSO to effectively address it and integrate the sensor knowledge involved.
The rest of the paper is organized as follows: Section 2 presents the concept similarity measures and the mathematical model of the sensor ontology matching problem; Section 3 gives the details of cPSO; Section 4 shows the experimental results; and finally, Section 5 draws the conclusions.

Preliminaries
2.1. Concept Similarity Measure. A Concept Similarity Measure (CSM) is a function that takes as input two concepts' information and outputs a real number in [0, 1] representing their similarity value. In general, there are three kinds of CSMs, i.e., string-based CSMs, linguistic-based CSMs, and structure-based CSMs. In particular, a string-based CSM takes as input two concepts' labels and compares their syntactic information; a linguistic-based CSM also compares two concepts' labels, but it uses an external digital dictionary such as WordNet [24] to calculate their similarity value; and a structure-based CSM calculates the similarity value of two concepts based on their direct super-concepts and subconcepts.
Given two concepts c_1 and c_2, we first remove the meaningless words (such as stop words) from their labels and convert them into two token sets T_1 and T_2. The string-based similarity value is then calculated as follows:

sim_s(c_1, c_2) = min(|T_1 ∩ T_2| / |T_1|, |T_1 ∩ T_2| / |T_2|),

where |T_1| and |T_2| are, respectively, the cardinalities of T_1 and T_2. The first ratio indicates the overlap fraction of T_1 with respect to T_2, the second one that of T_2 with respect to T_1, and the minimum of the two is selected as the string-based similarity value. The linguistic-based similarity value is defined as follows:

sim_l(c_1, c_2) = ( Σ_{i=1..|T_1|} Σ_{j=1..|T_2|} sim′(T_{1,i}, T_{2,j}) ) / max(|T_1|, |T_2|),

where T_{1,i} denotes the ith token in T_1 and T_{2,j} the jth token in T_2, and sim′(T_{1,i}, T_{2,j}) = 1 if the two tokens are synonymous in WordNet, otherwise 0. Finally, supposing superC_1 and superC_2 are, respectively, the super-concept sets of c_1 and c_2, and subC_1 and subC_2 are, respectively, the direct subconcept sets of c_1 and c_2, the structure-based similarity value is defined as follows:

sim_t(c_1, c_2) = (1/2) × [ ( Σ_{i,j} sim′(superC_{1,i}, superC_{2,j}) ) / max(|superC_1|, |superC_2|) + ( Σ_{i,j} sim′(subC_{1,i}, subC_{2,j}) ) / max(|subC_1|, |subC_2|) ],

where superC_{1,i} denotes the ith super-class of c_1 and superC_{2,j} the jth super-class of c_2, and subC_{1,i} is the ith direct subclass of c_1 and subC_{2,j} the jth direct subclass of c_2.
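As a minimal sketch of the first two CSMs described above, the following assumes token sets are given as lists and stands in a precomputed synonym relation for the WordNet lookup; the helper names and the exact overlap formulas are assumptions reconstructed from the description, not the authors' implementation.

```python
def sim_string(tokens1, tokens2):
    """String-based CSM sketch: minimum overlap fraction of two token sets."""
    t1, t2 = set(tokens1), set(tokens2)
    if not t1 or not t2:
        return 0.0
    inter = len(t1 & t2)
    return min(inter / len(t1), inter / len(t2))

def sim_linguistic(tokens1, tokens2, synonyms):
    """Linguistic-based CSM sketch: a token pair scores 1 when identical or
    synonymous; `synonyms` is a hypothetical stand-in for a WordNet query."""
    if not tokens1 or not tokens2:
        return 0.0
    hits = sum(1 for a in tokens1 for b in tokens2
               if a == b or (a, b) in synonyms or (b, a) in synonyms)
    return hits / max(len(tokens1), len(tokens2))
```

For instance, `sim_string(["temperature", "sensor"], ["sensor"])` yields 0.5, since only one of the first label's two tokens overlaps.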
Since no single CSM can ensure effectiveness in all contexts, i.e., distinguish all the heterogeneous concepts, they are usually combined to enhance the result's confidence. Due to its flexibility, the weighted average strategy has become a popular way of aggregating CSMs, which is defined as follows:

sim(c_1, c_2) = w_1 × sim_s(c_1, c_2) + w_2 × sim_l(c_1, c_2) + w_3 × sim_t(c_1, c_2),

where sim_s, sim_l, and sim_t are the string-based, linguistic-based, and structure-based similarity values, and w_i ∈ [0, 1], i = 1, 2, 3, with Σ w_i = 1.
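The weighted average aggregation can be sketched as follows; normalizing by the weight sum is a defensive assumption so that weights not summing exactly to 1 still yield a value in [0, 1].

```python
def aggregate(sims, weights):
    """Weighted-average aggregation of CSM values.

    `sims` holds the individual similarity values and `weights` the
    aggregating weights in [0, 1]; the result is normalized by the
    weight sum for safety."""
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, sims)) / total
```

For example, `aggregate([1.0, 0.5, 0.0], [0.5, 0.5, 0.0])` returns 0.75.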

Alignment Extraction.
Each aggregating weight set corresponds to a unique aggregated CSM, which can be further used to construct a similarity matrix M whose element m ij is the similarity value between the ith concept of one ontology and the jth concept of the other.
On this basis, we can extract an alignment with cardinality 1 : 1 (one concept from the source ontology is mapped to at most one concept from the target ontology and vice versa) from M according to the following steps: (1) select the largest element in M; (2) if it exceeds the similarity threshold, add the corresponding concept pair to the alignment and remove its row and column from M; (3) repeat until no remaining element exceeds the threshold. In the accompanying example, the last correspondence's similarity value is low and thus regarded as unauthentic, so the final alignment consists of the top five correspondences.
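The greedy 1 : 1 extraction steps above can be sketched as follows; representing M as a list of rows and returning (row, column, similarity) triples are assumptions made for illustration.

```python
def extract_alignment(M, threshold):
    """Greedy 1:1 extraction: repeatedly take the largest remaining
    similarity value, record the correspondence, and discard its row
    and column, until no value exceeds the threshold."""
    alignment = []
    used_rows, used_cols = set(), set()
    # Visit matrix cells in decreasing order of similarity.
    cells = sorted(((M[i][j], i, j) for i in range(len(M))
                    for j in range(len(M[0]))), reverse=True)
    for sim, i, j in cells:
        if sim < threshold:
            break  # all remaining values are below the threshold
        if i not in used_rows and j not in used_cols:
            alignment.append((i, j, sim))
            used_rows.add(i)
            used_cols.add(j)
    return alignment
```

For a 2 × 2 matrix [[0.9, 0.1], [0.2, 0.8]] with threshold 0.5, this yields the two diagonal correspondences and filters out the low-similarity pairs.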

Sensor Ontology Matching Problem.
Since the quality of an alignment is directly proportional to the mean similarity value of all the correspondences found and to the cardinality of the alignment, we utilize the following equation to calculate an alignment A's quality:

f(A) = ( Σ_{i=1..|A|} sim_i / |A| ) × ( 2|A| / (|O_1| + |O_2|) ),

where |O_1|, |O_2|, and |A| are, respectively, the cardinalities of the two ontologies O_1 and O_2 and of A, and sim_i is the ith correspondence's similarity value.
On this basis, the mathematical model of the sensor ontology matching problem can be defined as follows:

max F(W)
s.t. W = (w_1, w_2, ⋯, w_n)^T,
     w_i ∈ [0, 1], i = 1, 2, ⋯, n,
     Σ w_i = 1,

where w_i, i = 1, 2, ⋯, represents the ith similarity measure's aggregating weight, and F(W) calculates the quality of the alignment corresponding to the aggregating weight set W.

Compact Particle Swarm Optimization Algorithm
PSO is inspired by the behaviour of birds, where each bird (particle) memorizes the best position it has visited and moves toward a leading bird (elite particle) with some degree of randomization. This procedure can be described with the following update sequence for the ith particle in generation t:

v_i^{t+1} = Φ_1 · v_i^t + Φ_2 · (particle_{i-lb}^t − particle_i^t) + Φ_3 · (particle_{i-gb}^t − particle_i^t), (7)

particle_i^{t+1} = particle_i^t + v_i^{t+1}, (8)

where v_i^t is the velocity, i.e., a perturbation vector, particle_i^t is the ith particle's position in the current generation, particle_{i-lb}^t is the best position it has visited in its history, particle_{i-gb}^t is the best position found by all the particles, and Φ_i, i = 1, 2, 3, is the weight vector. Eqs. (7) and (8) indicate that PSO updates each particle by exchanging its gene values with both the local best particle and the global best particle to find a better position. Clearly, the original PSO is a population-based SI algorithm, and in this work, we further propose a compact version of PSO to improve the algorithm's performance.
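A classic continuous PSO step per Eqs. (7) and (8) can be sketched as follows; the randomized pulls toward the local and global best positions and the coefficient values are illustrative assumptions, not the paper's exact settings.

```python
import random

def pso_step(x, v, x_lb, x_gb, phi1=0.7, phi2=1.4, phi3=1.4):
    """One PSO update: inertia on the old velocity plus random pulls
    toward the particle's local best (x_lb) and the global best (x_gb)."""
    new_v = [phi1 * vi
             + phi2 * random.random() * (lb - xi)
             + phi3 * random.random() * (gb - xi)
             for xi, vi, lb, gb in zip(x, v, x_lb, x_gb)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v
```

A particle already sitting at both best positions with zero velocity stays put, which matches the fixed-point behaviour of the update equations.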

Encoding and Decoding
Mechanism. This work uses a binary encoding mechanism, i.e., Gray code, and each particle's gene values are divided into two parts: one encodes the weight set for aggregating the similarity measures, and the other encodes the similarity threshold for filtering out correspondences with low similarity values. Concerning the characteristics of the weights in Section 2.1, we normalize them when decoding. We utilize one Probability Vector (PV) to represent a population, whose number of elements equals the length of a particle. Each element of PV represents the probability that the corresponding gene bit of a particle is 1. We can use PV to generate various binary particles by sampling each dimension according to its probability, and when each dimension's value is close to 1 or 0, the algorithm is about to converge. In addition, PV should be updated in each generation to move toward the elite, so that the particles generated in the next generation are closer to the elite. Figure 2 shows an example of generating a particle through PV. Given a PV (0.1, 0.3, 0.5, 0.9)^T, we generate four random numbers in [0, 1], e.g., 0.2, 0.4, 0.6, and 0.1, and determine a new particle by comparing them with PV's elements accordingly. To be specific, since 0.2 > 0.1, 0.4 > 0.3, 0.6 > 0.5, and 0.1 < 0.9, the newly generated particle is 0001. In each generation, PV's elements are updated according to the best particle found: if the bit value of the elite particle is 1 (0), the corresponding element of PV is increased (decreased) by step, which makes the newly generated particles closer to the elite particle. For example, given a PV (0.1, 0.3, 0.5, 0.9)^T, an elite particle 1110, and the step length step = 0.1, since the first-bit value of the elite particle is 1, we increase the first element of PV by step, which makes the first-bit value of a newly generated particle more likely to be 1 (the same as the elite particle's first-bit value).
Therefore, after updating all elements of PV, the newly generated particles would be closer to the elite particle in terms of each bit value. When all elements of PV are 1 or 0, the newly generated particles will be the same and the algorithm converges.
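The PV-based sampling and update steps described above can be sketched as follows; clamping the probabilities to [0, 1] is a defensive assumption.

```python
import random

def sample_particle(pv):
    """Draw a binary particle from PV: bit i is 1 when a uniform random
    number in [0, 1) falls below PV[i]."""
    return [1 if random.random() < p else 0 for p in pv]

def update_pv(pv, elite, step):
    """Move PV toward the elite particle: raise (lower) element i by
    `step` when the elite's bit i is 1 (0), clamped to [0, 1]."""
    return [min(1.0, p + step) if bit == 1 else max(0.0, p - step)
            for p, bit in zip(pv, elite)]
```

With the paper's example PV (0.1, 0.3, 0.5, 0.9), elite 1110, and step 0.1, the updated PV is (0.2, 0.4, 0.6, 0.8), so subsequent samples drift toward 1110.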

Crossover
Operator. Given two particles particle_1 and particle_2, the crossover operator generates one offspring by exchanging their gene values. In this work, we generate the offspring particle particle_off by copying a sequential gene fragment of particle_1 into the corresponding gene bits of particle_2, so that particle_off inherits a sequential gene fragment from both particle_1 and particle_2. For the sake of clarity, the pseudo-code of the crossover operator is shown in Algorithm 1.
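This fragment-copying crossover can be sketched as follows; choosing the fragment boundaries uniformly at random is an assumption, since Algorithm 1's exact boundary selection is not reproduced here.

```python
import random

def crossover(p1, p2):
    """Copy a contiguous gene fragment of p1 into the corresponding
    positions of p2, so the offspring inherits one sequential fragment
    from each parent."""
    n = len(p1)
    i, j = sorted(random.sample(range(n + 1), 2))  # fragment [i, j), nonempty
    return p2[:i] + p1[i:j] + p2[j:]
```

Crossing an all-ones particle with an all-zeros particle therefore yields an offspring containing exactly one contiguous run of ones.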

The Pseudo-Code of Compact Particle Swarm Optimization Algorithm
The pseudo-code of cPSO is presented in Algorithm 2. cPSO first initializes the probability vector PV by setting all its elements to 0.5, which is then used to initialize the local best particle particle_lb and the global best particle particle_gb. In each generation, cPSO first tries to update particle_lb through a crossover between it and a newly generated particle, and then particle_gb by exchanging its gene values with particle_lb. Finally, cPSO updates PV according to particle_gb to move the newly generated particles toward it.
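The loop of Algorithm 2 can be sketched on an arbitrary bit-string fitness function as follows; the compete step is realized as a plain fitness comparison, the default step length 1/length mirrors the pseudo-code's PV update, and all names here are illustrative.

```python
import random

def cpso(num_bits, max_gen, fitness, step=None):
    """Compact PSO sketch: a single probability vector PV replaces the
    population; the local/global best particles are refined by crossover
    and PV drifts toward the current global best each generation."""
    step = step if step is not None else 1.0 / num_bits
    pv = [0.5] * num_bits
    sample = lambda: [1 if random.random() < p else 0 for p in pv]
    def crossover(a, b):
        i, j = sorted(random.sample(range(num_bits + 1), 2))
        return b[:i] + a[i:j] + b[j:]
    lb = gb = sample()
    for _ in range(max_gen):
        # Update the local best via crossover with a fresh sample.
        new = crossover(sample(), lb)
        if fitness(new) > fitness(lb):
            lb = new
        # Update the global best via crossover with the local best.
        new = crossover(lb, gb)
        if fitness(new) > fitness(gb):
            gb = new
        # Drift PV toward the global best, clamped to [0, 1].
        pv = [min(1.0, p + step) if bit else max(0.0, p - step)
              for p, bit in zip(pv, gb)]
    return gb
```

Running `cpso(8, 200, sum)` maximizes the number of one-bits on a toy problem; in the paper's setting the fitness would instead decode the bits into aggregating weights and a threshold and evaluate the resulting alignment's quality.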

Experimental Setup.
In the experiment, we use the Ontology Alignment Evaluation Initiative (OAEI)'s Conference track (http://oaei.ontologymatching.org/2019/conference/index.html) and two pairs of real sensor ontologies to test cPSO's performance. The experiment compares cPSO with four state-of-the-art sensor ontology matching techniques, i.e., ASMOV [25], CODI [26], SOBOM [27], and FuzzyAlign [28], on all testing cases in terms of f-measure. We empirically set cPSO's crossover probability to 0.8 and the maximum generation to 3000, and cPSO's results in the tables are the average of thirty independent runs. Table 1 briefly describes the ontologies in these testing cases.

Statistical Comparison.
We utilize two popular statistical testing methods, i.e., Friedman's Test (FT) [29] and Holm's Test (HT) [30], to compare the competitors' performance. In particular, FT checks whether there are differences among the competitors, and HT is further used to find whether one competitor statistically outperforms the others. First, we need to reject FT's null hypothesis that all competitors' performances are the same. To this end, the computed value X²_r must be equal to or greater than the tabled critical chi-square value at the specified level of significance α = 0.05. In this work, since we are comparing 5 matchers, the critical value X²_{0.05} for 4 degrees of freedom is 9.488.
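As a sketch, the Friedman statistic for the no-ties case can be computed directly from per-task ranks as follows; the score layout (one row per testing case, one column per matcher, higher is better) is an assumption for illustration.

```python
def friedman_stat(scores):
    """Friedman chi-square over N tasks (rows) and k matchers (columns),
    assuming no ties: X^2_r = 12N/(k(k+1)) * sum(rbar_j^2) - 3N(k+1),
    where rbar_j is matcher j's mean rank (rank 1 = best score)."""
    n, k = len(scores), len(scores[0])
    mean_ranks = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])  # best first
        for rank, j in enumerate(order, start=1):
            mean_ranks[j] += rank / n
    return (12 * n / (k * (k + 1)) * sum(r * r for r in mean_ranks)
            - 3 * n * (k + 1))
```

For example, with 10 tasks and 5 matchers where the ranking is identical on every task, the statistic is 40.0, well above the critical value 9.488 quoted above.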
In Table 2, since X²_r = 85.09 is greater than 9.488, the null hypothesis is rejected, and HT is then carried out. Since cPSO ranks with the lowest value, it is set as the control matcher to be compared with the others. HT's z value is the testing statistic for comparing the ith and jth competitors, which is used to find the p value, i.e., the corresponding probability from the table of the normal distribution. The p value is then compared with α/(k − i); if it is smaller, the corresponding null hypothesis is rejected, i.e., the control matcher statistically outperforms that competitor.

Conclusion
AIoT aims at creating a sensor network that can communicate and process data, which can be technically implemented by using sensor ontologies to annotate sensor data with semantic meanings. To support the co-operations among ontology-based AIoT applications, it is necessary to integrate these sensor ontologies by finding the alignment between them. In this work, a novel matching framework is proposed, which aggregates three kinds of CSMs and an alignment extraction approach to determine the ontology alignment. We propose a compact PSO and use it to optimize the aggregating weights for the CSMs and a threshold for filtering the alignment, which ensures the quality of the results. The experimental results show that our proposal can effectively match different sensor ontologies and determine high-quality alignments.

Algorithm 2: Compact Particle Swarm Optimization algorithm.
1: ** Initialization **
2: set generation t = 0;
3: set all elements in PV as 0.5;
4: generate one particle through PV to initialize the local best particle particle_lb and the global best particle particle_gb;
5: ** Evolution **
6: while t < maxGen do
7:   generate a new particle particle_new through PV;
8:   *** Update Particle ***
9:   particle_new = crossover(particle_new, particle_lb);
10:  [winner, loser] = compete(particle_new, particle_lb);
11:  if winner == particle_new then
12:    particle_lb = particle_new;
13:  end if
14:  particle_new = crossover(particle_lb, particle_gb);
15:  [winner, loser] = compete(particle_new, particle_gb);
16:  if winner == particle_new then
17:    particle_gb = particle_new;
18:  end if
19:  **** Update PV ****
20:  for i = 0; i < PV.length; i++ do
21:    if particle_gb,i == 1 then
22:      PV_i = PV_i + 1/particle_gb.length;
23:    else
24:      PV_i = PV_i − 1/particle_gb.length;
25:    end if
26:  end for
27:  t = t + 1;
28: end while
29: return;
In the future, we will further improve cPSO to match large-scale sensor ontologies and to address the problem of Instance Coreference Resolution (ICR) in the sensor network domain, which requires matching large-scale sensor instances in the Linked Open Data cloud (LOD). We also want to extend cPSO to match ontologies in specific domains such as the biomedical and geographical domains. Particular strategies and techniques need to be proposed to improve the alignment's precision and recall, because these matching tasks require specific background knowledge bases and complex forms of alignment.

Data Availability
The data used to support this study can be found in http://oaei.ontologymatching.org.

Conflicts of Interest
The authors declare that they have no conflicts of interest in the work.