Gradient Learning Algorithms for Ontology Computing

Gradient learning models have attracted considerable attention because of their promising applications in statistics, dimensionality reduction, and other fields. In this paper, we propose a new gradient learning model for ontology similarity measuring and ontology mapping in the multidividing setting. The sample error in this setting is estimated by means of the hypothesis space and the ontology dividing operator. Finally, two experiments, on plant science and humanoid robotics data, verify the efficiency of the new computation model for ontology similarity measure and ontology mapping applications in the multidividing setting.


Introduction and Motivations
The term "ontology" originates from philosophy, where it describes the natural connection of things and the inherent hidden connections of their components. In information and computer science, an ontology is a model for knowledge storage and representation, and it has been widely applied in knowledge management, machine learning, information systems, image retrieval, information retrieval search extension, collaboration, and intelligent information integration. In the past decade, as an effective concept semantic model and a powerful analysis tool, ontology has also been widely applied in pharmacology, biology, medical science, geographic information systems, and social sciences (e.g., see Hu et al. [1], Lambrix and Edberg [2], Mork and Bernstein [3], Fonseca et al. [4], and Bouzeghoub and Elbyed [5]).
The structure of an ontology can be expressed as a simple graph. Each concept, object, or element in the ontology corresponds to a vertex, and each (directed or undirected) edge of the ontology graph represents a relationship (or potential link) between two concepts (objects or elements). Let $O$ be an ontology and $G$ a simple graph corresponding to $O$. The essence of ontology engineering applications is to obtain a similarity function that computes the similarities between ontology vertices. These similarities represent the intrinsic links between vertices in the ontology graph. The goal of ontology mapping is to obtain such a similarity function for vertices from different ontologies; the mapping serves as a bridge between the ontologies and reveals potential associations between their objects or elements. Specifically, the ontology similarity function $\mathrm{Sim}: V \times V \to \mathbb{R}^{+} \cup \{0\}$ is a semipositive score function which maps each pair of vertices to a nonnegative real number.
Example 1. Ontology technologies have been widely used in humanoid robotics in recent years. Different bionic robots have different structures. Each bionic robot, or each component of a bionic robot, can be represented as an ontology. Each vertex in the ontology stands for a part or a construction, and an edge between two vertices represents a direct physical link between these constructs or an intrinsic link between the functions of these parts. Thus, the similarity calculation between vertices in the same ontology allows us to find the degree of association and the potential link between different constructs in a bionic robot. Similarity calculation between two different ontologies (i.e., ontology mapping building) allows us to understand the potential association between components or parts of two biomimetic robots.

Computational Intelligence and Neuroscience
Example 2. In information retrieval, ontology concepts are often used in query expansion. Suppose the user queries the information related to concept $A$. If we manually set a parameter $M > 0$, the ontology algorithm finds all concepts $B$ satisfying $\mathrm{Sim}(A, B) > M$. The information related to these concepts $B$ is then returned to the user as the query expansion for concept $A$.
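The query expansion rule of Example 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `expand_query`, the dictionary layout, and the similarity values are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple


def expand_query(query: str,
                 similarity: Dict[Tuple[str, str], float],
                 threshold: float) -> List[str]:
    """Return every concept B with Sim(query, B) > threshold, most similar first."""
    hits = [(score, b)
            for (a, b), score in similarity.items()
            if a == query and b != query and score > threshold]
    # Sort by decreasing similarity; break ties alphabetically.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [b for _, b in hits]


# Hypothetical similarity scores, invented for illustration only.
SIM = {
    ("leaf", "leaf blade"): 0.9,
    ("leaf", "petiole"): 0.7,
    ("leaf", "root"): 0.2,
    ("root", "leaf"): 0.2,
}

expansion = expand_query("leaf", SIM, threshold=0.5)
print(expansion)
```

Raising the threshold shrinks the expansion, so the parameter trades recall for precision.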
Very recently, ontology technologies have been employed in a variety of applications. Ma et al. [6] presented a technology for stable semantic measurement based on graph derivation representation. Li et al. [7] proposed an ontology representation method for online shopping customer knowledge in enterprise information. Santodomingo et al. [8] proposed an innovative ontology matching system that finds complex correspondences by processing expert knowledge from external domain ontologies and by using novel matching techniques. Pizzuti et al. [9] described the main features of the food ontology and some examples of its application for traceability purposes. Lasierra et al. [10] argued that ontologies can be used in designing an architecture for monitoring patients at home.
Traditional methods for ontology similarity computation are heuristic and based on pairwise similarity calculation. Such a model has high computational complexity, is not intuitive, and requires the selection of many parameters. One example of a traditional ontology similarity computation method is
$$\mathrm{Sim}(A, B) = \alpha_1 \mathrm{Sim}_{\mathrm{name}}(A, B) + \alpha_2 \mathrm{Sim}_{\mathrm{instance}}(A, B) + \alpha_3 \mathrm{Sim}_{\mathrm{attribute}}(A, B) + \alpha_4 \mathrm{Sim}_{\mathrm{structure}}(A, B),$$
where $A$ and $B$ are two vertices corresponding to two concepts; $0 \le \alpha_1, \alpha_2, \alpha_3, \alpha_4 \le 1$ and $\sum_{i=1}^{4} \alpha_i = 1$; $\mathrm{Sim}_{\mathrm{name}}$, $\mathrm{Sim}_{\mathrm{instance}}$, $\mathrm{Sim}_{\mathrm{attribute}}$, and $\mathrm{Sim}_{\mathrm{structure}}$ are functions of name similarity, instance similarity, attribute similarity, and structure similarity, respectively. These similarity functions are determined directly by experts on the basis of their experience. Hence, this model has the following deficiencies: (i) many parameters rely heavily on the experts; (ii) the computational complexity is high, so the model is inapplicable to ontologies with a large number of vertices; (iii) pairwise similarities fail to reflect the ontology structure intuitively.
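To make the traditional weighted model concrete, the following Python sketch combines four component similarities with weights that lie in [0, 1] and sum to 1. The function name and all numeric values are hypothetical, chosen only to show how strongly the result depends on expert-chosen parameters.

```python
from typing import Sequence


def combined_similarity(components: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Weighted sum of name, instance, attribute, and structure similarities."""
    if len(components) != len(weights):
        raise ValueError("one weight is required per component similarity")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weight must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("the weights must sum to 1")
    return sum(w * s for w, s in zip(weights, components))


# Hypothetical expert-chosen values: (name, instance, attribute, structure).
score = combined_similarity([0.8, 0.5, 0.6, 0.9], [0.4, 0.2, 0.2, 0.2])
print(round(score, 4))
```

Every pair of concepts needs its own four component values, which is the source of the cost noted in deficiency (ii).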
Thus, a more advanced way to deal with ontology similarity computation is to use an ontology learning algorithm which yields an ontology function $f: V \to \mathbb{R}$. By virtue of the ontology function, the ontology graph is mapped onto the real line. The similarity between two concepts can then be measured by comparing the difference between their corresponding real numbers. The essence of this algorithm is dimensionality reduction. In order to associate the ontology function with ontology applications, for a vertex $v$ we use a vector to express all its information (including its name, instance, attribute, and structure, together with the semantic information of the corresponding concept, which is contained in the name and attribute components of the vector). To simplify the representation, we slightly abuse the notation and use $v$ to denote both the ontology vertex and its corresponding vector. The vector is mapped to a real number by the ontology function $f: V \to \mathbb{R}$, so the ontology function is a dimensionality reduction operator which maps multidimensional vectors to real numbers.
There are several effective methods for obtaining an efficient ontology similarity measure or ontology mapping algorithm in terms of an ontology function. Wang et al. [11] considered ontology similarity calculation in terms of ranking learning technology. Huang et al. [12] proposed a fast ontology algorithm in order to cut the time complexity of ontology applications. Gao and Liang [13] presented an ontology optimizing model in which the ontology function is determined by virtue of the NDCG measure, and it was successfully applied in physics education. Since most ontology structures are trees, Lan et al. [14] explored the learning theory approach for ontology similarity calculation and ontology mapping in the specific setting where the ontology graph has no cycle.
In the multidividing ontology setting, all vertices in an ontology graph or multiontology graph are divided into $k$ parts corresponding to the $k$ classes of rates. The rate values of all classes are determined by experts. In this way, a vertex in rate $a$ has a larger score than any vertex in rate $b$ (if $1 \le a < b \le k$) under the multidividing ontology function $f: V \to \mathbb{R}$. Finally, the similarity between two ontology vertices corresponding to two concepts (or elements) is judged by the difference of the two real numbers to which they correspond. Hence, the multidividing ontology setting is suitable for obtaining a score ontology function for an ontology application if the ontology is drawn as a cycle-free structure. Gao and Xu [15] studied the uniform stability of the multidividing ontology algorithm and obtained generalization bounds for stable multidividing ontology algorithms.
In the ontology learning algorithms described above, the optimal ontology function, or the strategy for solving for it, is obtained by gradient calculation. Specifically, the ontology gradient learning algorithm obtains the ontology function vector $\vec{f} = (f_1, f_2, \ldots, f_n)$ which maps each vertex to a real number (the value $f_i$ corresponds to vertex $v_i$). In this sense, the quality of the gradient calculation strategy determines the merits of the ontology algorithm. In this paper, we propose an ontology gradient learning algorithm for ontology similarity measuring and ontology mapping in the multidividing setting. The rest of the paper is organized as follows: the notations and the ontology gradient computing model are presented in Section 2; the detailed description of the new ontology algorithms is given in Section 3; in Section 4, we obtain some theoretical results concerning the sample error and convergence rate; in Section 5, two simulation experiments on plant science and humanoid robotics are designed to test the efficiency of our gradient computation based ontology algorithm, and the results reveal that our algorithm has a high precision ratio for plant and humanoid robotics applications.

The Gradient Computation Model for Ontology in Multidividing Setting
In order to combine machine learning technology with the ontology framework, the relevant information for each vertex in the ontology graph is represented as an $n$-dimensional vector. Hence the vertex set $V$ is a subset of $\mathbb{R}^n$ (the vertex space or input space for the ontology). Assume that $V$ is compact. In supervised learning, let $Y = \mathbb{R}$ be the label set for $V$. Denote by $\rho$ a probability measure on $Z = V \times Y$. Let $\rho_V$ and $\rho(\cdot \mid v)$ be the marginal distribution on $V$ and the conditional distribution at $v \in V$, respectively. The ontology function $f_\rho: V \to \mathbb{R}$ associated with $\rho$ is described as
$$f_\rho(v) = \int_Y y \, d\rho(y \mid v), \quad v \in V.$$
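Then, the gradient of the ontology function is the vector of ontology functions
$$\nabla f_\rho = \left( \frac{\partial f_\rho}{\partial v^1}, \frac{\partial f_\rho}{\partial v^2}, \ldots, \frac{\partial f_\rho}{\partial v^n} \right)^T.$$
Let $\mathbf{z} = \{(v_i, y_i)\}_{i=1}^{m}$ be a random sample independently drawn according to $\rho$ in the standard ontology setting. The purpose of standard ontology gradient learning is to learn $\nabla f_\rho$ from the sample set $\mathbf{z}$. From the perspective of statistical learning theory, the gradient learning algorithm is based on the Taylor expansion
$$f_\rho(v_j) \approx f_\rho(v_i) + \nabla f_\rho(v_i) \cdot (v_j - v_i) \quad \text{for } v_j \text{ close to } v_i.$$
Using an unknown ontology function vector $\vec{f} = (f_1, f_2, \ldots, f_n)$ to replace $\nabla f_\rho$, the standard least-square ontology learning algorithm is denoted as
$$\vec{f}_{\mathbf{z},\lambda} = \arg\min_{\vec{f} \in H_K^n} \left\{ \frac{1}{m^2} \sum_{i,j=1}^{m} w_{i,j}^{(s)} \left( y_i - y_j + \vec{f}(v_i) \cdot (v_j - v_i) \right)^2 + \lambda \|\vec{f}\|_K^2 \right\}, \tag{4}$$
where $w_{i,j}^{(s)}$ is the weight assigned to the pair $(v_i, v_j)$, and $\lambda$ and $s$ are two positive constants to control the smoothness of the ontology function. Here $K: V \times V \to \mathbb{R}$ is a positive semidefinite, continuous, and symmetric kernel (i.e., a Mercer kernel) and $H_K$ is the reproducing kernel Hilbert space (RKHS for short) associated with the Mercer kernel $K$. The notation $H_K^n$ in (4) is the $n$-fold hypothesis space of $H_K$, consisting of vectors of ontology functions $\vec{f} = (f_1, f_2, \ldots, f_n)^T$ with each $f_\ell \in H_K$ and norm $\|\vec{f}\|_K = \left( \sum_{\ell=1}^{n} \|f_\ell\|_K^2 \right)^{1/2}$.
By the representer theorem in statistical learning theory, the ontology algorithm (4) can be implemented by solving a linear system for the coefficients $\{c_i\}_{i=1}^{m}$ of $\vec{f}_{\mathbf{z},\lambda} = \sum_{i=1}^{m} c_i K_{v_i}$ with $c_i \in \mathbb{R}^n$; hence the coefficient matrix of the linear system has size $mn \times mn$. This size becomes huge if the sample set is large. The standard approximation ontology algorithm allows us to solve linear systems with coefficient matrices of smaller sizes. The gradient learning model for the ontology algorithm in the standard setting is determined as follows:
$$\vec{f}_{t+1}^{\mathbf{z}} = \vec{f}_t^{\mathbf{z}} - \eta_t \left[ \frac{1}{m^2} \sum_{i,j=1}^{m} w_{i,j}^{(s)} \left( y_i - y_j + \vec{f}_t^{\mathbf{z}}(v_i) \cdot (v_j - v_i) \right) (v_j - v_i) K_{v_i} + \lambda_t \vec{f}_t^{\mathbf{z}} \right], \tag{5}$$
where the sample set $\mathbf{z} \in Z^m$, $\vec{f}_1^{\mathbf{z}} = 0$, $t \in \mathbb{N}$, $\{\eta_t\}$ is the sequence of step sizes, and $\{\lambda_t\}$ is the sequence of balance parameters.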
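The standard-setting iteration can be illustrated with a deliberately simplified Python sketch. The assumptions are ours, not the paper's: the hypothesis space is restricted to constant vectors (equivalent to a constant kernel $K \equiv 1$), the weight is an unnormalized Gaussian of the distance between two vertices, and the step size and balance parameter are held constant. Under these assumptions each step moves the estimate against the weighted gradient of the squared Taylor residuals.

```python
import math
from typing import List, Sequence


def learn_constant_gradient(points: Sequence[Sequence[float]],
                            labels: Sequence[float],
                            s: float = 1.0,
                            eta: float = 1.0,
                            lam: float = 0.0,
                            steps: int = 200) -> List[float]:
    """Gradient descent estimate of a constant gradient vector.

    Simplifying assumptions (not from the paper): constant kernel,
    unnormalized Gaussian weights, fixed step size eta and balance lam.
    """
    m, n = len(points), len(points[0])
    g = [0.0] * n  # the iteration starts from zero
    for _ in range(steps):
        direction = [0.0] * n
        for i in range(m):
            for j in range(m):
                d = [points[j][c] - points[i][c] for c in range(n)]
                w = math.exp(-sum(x * x for x in d) / (2.0 * s * s))
                # Taylor residual: y_i - y_j + g . (v_j - v_i)
                residual = labels[i] - labels[j] + sum(
                    gc * dc for gc, dc in zip(g, d))
                for c in range(n):
                    direction[c] += w * residual * d[c]
        g = [g[c] - eta * (direction[c] / (m * m) + lam * g[c])
             for c in range(n)]
    return g


# Toy data: labels follow y = 2*v1 - 3*v2, so the true gradient is (2, -3).
grid = [0.0, 0.5, 1.0]
points = [[a, b] for a in grid for b in grid]
labels = [2.0 * a - 3.0 * b for a, b in points]
estimate = learn_constant_gradient(points, labels)
print([round(x, 6) for x in estimate])
```

Each step costs a double loop over the sample and needs no linear solve, which is the motivation for preferring the iterative scheme when the sample is large.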
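For the multidividing ontology setting, the vertices in the ontology sample set are divided into $k$ rates: the sample is $\mathbf{z} = (\mathbf{z}^1, \mathbf{z}^2, \ldots, \mathbf{z}^k)$ with $\mathbf{z}^a = \{(v_i^a, y_i^a)\}_{i=1}^{m_a}$, where $y_i^a$ is the label of $v_i^a$ for $1 \le a \le k$ and $1 \le i \le m_a$. Hence, (4) becomes the multidividing least-square model (6), in which the pairs of sample vertices are taken from different rates $a < b$. The gradient computation model for ontology applications in the multidividing setting, which corresponds to (5), is the iteration (7). In (6) and (7), $w^{(s)}$ denotes the weight function.
We emphasize that our algorithm in the multidividing setting is different from that of Wu et al. [16]. First, in [16] the label $y$ of an ontology vertex $v$ is used to present its class information, that is, $y \in \{1, \ldots, k\}$, while in our setting $y \in \mathbb{R}$.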

Second, the computation model in [16] relies heavily on a convex loss function, while our algorithm depends on the weight function $w$.

Description of Ontology Algorithms via Gradient Learning
The gradient learning ontology algorithm proposed above can be used for ontology concept similarity measurement and ontology mapping. The basic idea is the following: via the ontology gradient computation model, the ontology graph is mapped onto the real line. The similarity between two concepts can then be measured by comparing the difference between their corresponding real numbers.
Algorithm 3 (gradient calculating based ontology similarity measure algorithm). For $v \in V(G)$ and an optimal ontology function $f$ determined by gradient calculating, we use one of the following methods to obtain the similar vertices and return the outcome to the users.
Method 1. Choose a parameter $M$ and return the set $\{u \in V(G) : |f(u) - f(v)| \le M\}$.
Method 2. Choose an integer $N$ and return the closest $N$ concepts on the value list in $V(G)$.
Clearly, Method 1 looks fairer, but Method 2 can control the number of vertices returned to the users.
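The two retrieval methods can be sketched as follows in Python, given the real-valued scores produced by an ontology function. Method 1 is assumed here to be a threshold on the score difference $|f(u) - f(v)|$; the function names and the score values are hypothetical.

```python
from typing import Dict, List


def _distance_key(scores: Dict[str, float], v: str):
    # Order candidates by score distance to v, then by name for stable ties.
    return lambda u: (abs(scores[u] - scores[v]), u)


def similar_by_threshold(scores: Dict[str, float], v: str, M: float) -> List[str]:
    """Method 1 (assumed form): all u with |f(u) - f(v)| <= M."""
    hits = [u for u in scores if u != v and abs(scores[u] - scores[v]) <= M]
    return sorted(hits, key=_distance_key(scores, v))


def similar_top_n(scores: Dict[str, float], v: str, N: int) -> List[str]:
    """Method 2: the N vertices whose scores are closest to f(v)."""
    others = sorted((u for u in scores if u != v), key=_distance_key(scores, v))
    return others[:N]


# Hypothetical values of the ontology function f on four plant concepts.
f = {"leaf": 0.90, "petiole": 0.85, "stem": 0.60, "root": 0.10}

print(similar_by_threshold(f, "leaf", M=0.1))
print(similar_top_n(f, "leaf", N=2))
```

The threshold variant may return any number of vertices, including none, whereas the top-N variant always returns exactly N when enough vertices exist.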
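Algorithm 4 (gradient calculating based ontology mapping algorithm). Let $G_1, G_2, \ldots, G_d$ be ontology graphs corresponding to ontologies $O_1, O_2, \ldots, O_d$, and let $G$ denote the graph formed by all of them. For $v \in V(G_i)$ ($1 \le i \le d$) and an optimal ontology function $f$ determined by gradient calculating, we use one of the following methods to obtain the similar vertices and return the outcome to the users.
Method 1. Choose a parameter $M$ and return the set $\{u \in V(G - G_i) : |f(u) - f(v)| \le M\}$.
Method 2. Choose an integer $N$ and return the closest $N$ concepts on the value list in $V(G - G_i)$.
Again, Method 1 looks fairer, and Method 2 can control the number of vertices returned to the users.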
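For ontology mapping, the same score comparison is applied, but candidates are assumed to come only from the other ontology graphs. The Python sketch below shows the top-N variant; the graph names, vertex names, and scores are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple


def map_vertex(graph_scores: Dict[str, Dict[str, float]],
               graph: str, v: str, N: int) -> List[Tuple[str, str]]:
    """Return the N vertices from the other graphs whose scores are closest to f(v)."""
    target = graph_scores[graph][v]
    candidates = [(abs(score - target), other, u)
                  for other, scores in graph_scores.items()
                  if other != graph  # never map a vertex into its own ontology
                  for u, score in scores.items()]
    candidates.sort()
    return [(other, u) for _, other, u in candidates[:N]]


# Hypothetical ontology function values on two small robotics ontologies.
SCORES = {
    "O2": {"hip joint": 0.80, "knee joint": 0.55, "foot": 0.20},
    "O3": {"hip frame": 0.78, "thigh frame": 0.60, "shank frame": 0.35},
}

mapping = map_vertex(SCORES, "O2", "knee joint", N=2)
print(mapping)
```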

Theoretical Analysis
In this section, we give a theoretical analysis of our proposed multidividing ontology algorithm. Let $\kappa = \sup_{v \in V} \sqrt{K(v, v)}$ and $\mathrm{Diam}(V) = \sup_{v, v' \in V} |v - v'|$. We divide this section into two parts: first, some useful lemmas are prepared; then, the main results of our paper concerning approximation are presented. Our error analysis depends on integral operators and gradient learning; further details on these techniques can be found in Mukherjee and Wu [18], Mukherjee et al. [19], Yao et al. [20], and Rosasco et al. [21].

Preliminary Results.
Let the sequence $\{\vec{f}_t\}_{t \in \mathbb{N}}$ be the noise-free limit of the sequence (7), which is determined by $\vec{f}_1 = 0$ and the corresponding recursion for $\vec{f}_{t+1}$. Our error analysis for proving the main results (Theorems 12 and 13 in the next subsection) consists of two parts: sample error and approximation error. The main task in this subsection is to estimate the sample error $\|\vec{f}_t^{\mathbf{z}} - \vec{f}_t\|$ by means of a McDiarmid-Bernstein-type probability inequality and the multidividing sampling operator.
For each $1 \le a \le k$, a multidividing ontology sampling operator is associated with the sample vertices of rate $a$, and its adjoint maps real vectors into the hypothesis space. Let us express (7) by virtue of the multidividing ontology sampling operator. For each pair $(a, b)$ with $1 \le a < b \le k$, we single out one summation from (7) and rewrite it in terms of the sampling operator and its adjoint. This confirms the following representation for the sequence $\{\vec{f}_t^{\mathbf{z}}\}$. For simplicity, an empty operator product $\prod_{q=t+1}^{t}$ is taken to be the identity operator in what follows.

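Lemma 5. If $\{\vec{f}_t^{\mathbf{z}}\}$ is defined by (7), then $\vec{f}_{t+1}^{\mathbf{z}}$ admits an explicit representation in terms of the multidividing ontology sampling operators and the operator products above.
We should discuss the convergence of the multidividing ontology operator to the integral operator from $H_K^n$ to $H_K^n$, defined for $\vec{f} \in H_K^n$.
Lemma 6. Let $\mathbf{z} = \{\mathbf{z}^1, \mathbf{z}^2, \ldots, \mathbf{z}^k\}$ be a multidividing sample set independently drawn according to a probability distribution on $Z$. Let $(H, \|\cdot\|)$ be a Hilbert space and suppose that $F$ is a measurable map into $H$. If there is a nonnegative $\widetilde{M}$ such that $\|F(\mathbf{z}) - E_v(F(\mathbf{z}))\| \le \widetilde{M}$ for each $v \in \mathbf{z}$ and almost every $\mathbf{z} \in Z_1 \times Z_2 \times \cdots \times Z_k$, then for every $\varepsilon > 0$ a McDiarmid-Bernstein-type tail bound holds for the deviation of $F(\mathbf{z})$ from its expectation, and for any $0 < \delta < 1$ the corresponding estimate holds with confidence $1 - \delta$.
By applying Lemma 6 to this Hilbert space, we obtain the following lemma.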
In order to estimate the difference between $\vec{f}_t^{\mathbf{z}}$ and $\vec{f}_t$, the convergence of the sample-based function to the ontology function defined by (55) is studied.
The sample error $\|\vec{f}_t^{\mathbf{z}} - \vec{f}_t\|_{H_K^n}$ is stated in the following conclusion.

By virtue of the assumptions on the parameters, the first estimate holds for any $\mathbf{z} \in Z_1$. Now, we consider a subset $Z_2$ with measure at least $1 - \delta$ such that (27) is established for any $\mathbf{z} \in Z_2$. In view of (26), a second estimate follows for each $\mathbf{z} \in Z_2$. Changing the order of summation and applying (45) and (46), we obtain the required bound for any $\mathbf{z} \in Z_1 \cap Z_2$. The measure of the subset $Z_1 \cap Z_2$ is at least $1 - 2\delta$. The desired conclusion follows after replacing $\delta$ by $\delta/2$.
The following result is Theorem 4 in Dong and Zhou [23]; it also holds in the multidividing setting, and we skip the detailed proof.
Theorem 11. Let $\{\eta_t, \lambda_t\}_{t \in \mathbb{N}}$ be determined by (53). Then, we deduce a bound on $\|\vec{f}_t - \vec{f}^{*}\|_{H_K^n}$.

Main Results.
The first main result in our paper implies that $\{\vec{f}_t^{\mathbf{z}}\}$ is a good approximation of the noise-free limit for the ontology function (6), as a solution of (8), which we refer to as the multidividing ontology function $\vec{f}^{*}$.
The second main result in our paper follows from Theorem 10 and the techniques developed in [23].
Proof. Under the stated assumptions, together with (56) and (57), a first estimate follows directly. Furthermore, Proposition 15 in Mukherjee and Zhou [22] yields a second estimate, with a constant that depends only on the parameters of the problem. Theorem 10 and these estimates reveal that the desired bound holds with confidence $1 - \delta$. The learning rate (58) is determined according to the selection of the parameters.

Experiments
To show the effectiveness of our new ontology algorithms, two experiments concerning ontology measure and ontology mapping are designed below.

Ontology Similarity Measure Experiment on Plant Data.
In the first experiment, we use the plant "PO" ontology $O_1$, which was constructed on the website http://www.plantontology.org/. The structure of $O_1$ is presented in Figure 1. $P@N$ (precision ratio; see Craswell and Hawking [24]) is used to measure the quality of the experiment data. Here, the four parameters of the algorithm are set to 2, 3, 1, and 0.1, respectively. We first let experts in the plant field give the closest $N$ concepts for every vertex on the ontology graph, and then we obtain the first $N$ concepts for every vertex on the ontology graph by Algorithm 3 and compute the precision ratio. Specifically, for a vertex $v$ and a given integer $N > 0$, let $\mathrm{Sim}_v^{N,\mathrm{expert}}$ be the set of the $N$ vertices most similar to $v$ as determined by the experts, and let $\mathrm{Sim}_v^{N,\mathrm{algorithm}}$ be the set of the first $N$ vertices returned by the algorithm. Then the precision ratio for vertex $v$ is denoted by
$$\mathrm{Pre}_v^N = \frac{\left| \mathrm{Sim}_v^{N,\mathrm{algorithm}} \cap \mathrm{Sim}_v^{N,\mathrm{expert}} \right|}{N}.$$
The $P@N$ average precision ratio for ontology graph $G$ is then stated as
$$P@N_G = \frac{\sum_{v \in V(G)} \mathrm{Pre}_v^N}{|V(G)|}.$$
At the same time, we apply the ontology methods in [11-13] to the "PO" ontology. We calculate the average precision ratio of these three algorithms and compare the results with Algorithm 3 proposed in our paper; part of the data is given in Table 1. When $N$ = 3, 5, or 10, the precision ratio of our gradient computation based algorithm is higher than the precision ratios of the algorithms proposed in [11-13]. In particular, as $N$ increases, the precision ratios of our algorithm increase markedly. Therefore, the gradient learning based ontology Algorithm 3 described in our paper is superior to the methods proposed in [11-13].

Table 1
| | P@3 average precision ratio | P@5 average precision ratio | P@10 average precision ratio |
|---|---|---|---|
| Algorithm 3 in our paper | 0.5042 | 0.6216 | 0.7853 |
| Algorithm in [11] | 0.4549 | 0.5117 | 0.5859 |
| Algorithm in [12] | 0.4282 | 0.4849 | 0.5632 |
| Algorithm in [13] | 0 . . . | | |
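The precision ratio used in the experiments can be computed with a short Python sketch. It assumes that the precision ratio of a vertex is the size of the overlap between the expert's top-N list and the algorithm's top-N list, divided by N, and that the average is taken over all vertices. The function names and the toy lists are hypothetical and are not the experimental data.

```python
from typing import Dict, List


def precision_at_n(algorithm: List[str], expert: List[str], n: int) -> float:
    """Share of the algorithm's first n vertices that the expert also lists."""
    return len(set(algorithm[:n]) & set(expert[:n])) / n


def average_precision_at_n(algorithm_lists: Dict[str, List[str]],
                           expert_lists: Dict[str, List[str]],
                           n: int) -> float:
    """Mean of the per-vertex precision ratios over all vertices."""
    total = sum(precision_at_n(algorithm_lists[v], expert_lists[v], n)
                for v in expert_lists)
    return total / len(expert_lists)


# Toy lists for two vertices, invented for illustration only.
EXPERT = {"a": ["b", "c", "d"], "b": ["a", "c", "d"]}
ALGO = {"a": ["b", "d", "e"], "b": ["c", "e", "f"]}

p3 = average_precision_at_n(ALGO, EXPERT, 3)
print(round(p3, 4))
```

Here vertex "a" scores 2/3 and vertex "b" scores 1/3, so the average is one half.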

Ontology Mapping Experiment on Humanoid Robotics Data.
For the second experiment, we use the "humanoid robotics" ontologies $O_2$ and $O_3$. The structures of $O_2$ and $O_3$ are shown in Figures 2 and 3, respectively. The ontology $O_2$ presents the leg joint structure of a bionic walking device for a six-legged robot, while the ontology $O_3$ presents the exoskeleton frame of a robot with wearable and power-assisted lower extremities. In this experiment, the four parameters of the algorithm are set to 2, 4, 1, and 0.05, respectively. The goal of this experiment is to give an ontology mapping between $O_2$ and $O_3$. We again use the $P@N$ precision ratio to measure the quality of the experiment. We also apply the ontology algorithms in [12, 13, 17] to the "humanoid robotics" ontologies and compare the precision ratios obtained by these three methods. Some results are given in Table 2. Taking $N$ = 1, 3, or 5, the precision ratio of our gradient computation based ontology mapping algorithm is higher than the precision ratios of the algorithms proposed in [12, 13, 17]. In particular, as $N$ increases, the precision ratios of our algorithm increase markedly. Therefore, the gradient learning based ontology Algorithm 4 described in our paper is superior to the methods proposed in [12, 13, 17].

Table 2
| | P@1 average precision ratio | P@3 average precision ratio | P@5 average precision ratio |
|---|---|---|---|
| Algorithm 4 in our paper | 0.4444 | 0.5185 | 0.6111 |
| Algorithm in [17] | 0.2778 | 0.4815 | 0.5444 |
| Algorithm in [12] | 0.2222 | 0.4074 | 0.4889 |
| Algorithm in [13] | 0 | | |

Conclusions
As a model for structured data representation and storage, ontology has been widely used in various fields and has proved to be highly efficient. The core of an ontology algorithm is to obtain the similarity measure between vertices on the ontology graph. One learning approach is to map each vertex to a real number, so that the similarity is judged by the difference between the real numbers to which the vertices correspond. In this paper, we propose a gradient learning model for ontology applications in the multidividing setting. The sample error and approximation properties are given in our paper. These results support the gradient computation based ontology algorithm from the theoretical point of view. The new technology contributes to the state of the art for applications, and the results achieved in our paper illustrate the promising application prospects of the multidividing ontology algorithm.

Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.