As a greater number of Web Services are made available today, automatic discovery is recognized as an important task. To promote the automation of service discovery, different semantic languages have been created that allow describing the functionality of services in a machine interpretable form using Semantic Web technologies. The problem is that users do not have intimate knowledge about semantic Web service languages and related toolkits. In this paper, we propose a discovery framework that enables semantic Web service discovery based on keywords written in natural language. We describe a novel approach for automatic discovery of semantic Web services which employs Natural Language Processing techniques to match a user request, expressed in natural language, with a semantic Web service description. Additionally, we present an efficient semantic matching technique to compute the semantic distance between ontological concepts.
The number of Web services on offer keeps growing, and this trend is expected to continue. Consequently, there is increasing demand for a framework that automatically discovers the services most relevant to user requirements.
The widespread adoption of Web services is enabled by a set of flexible and extensible XML-based standards such as WSDL [
The Semantic Web [
Many approaches for automatic Web service discovery have been proposed, as discussed in Section
Moreover, the service requestor may not be aware of all the knowledge that constitutes the domain ontology, and in particular of all the terms related to the service request. As a result, many services relevant to the request may not be considered in the discovery process.
Another limitation of some proposed frameworks lies in their semantic matching approach. Both the service provider and the service requester use domain ontologies to build semantic service description files, and the semantic matchmaker uses the domain ontologies from both sides to determine their degree of semantic match. Most proposed approaches assume that the service provider and the service requester use the same domain ontology to describe service capabilities, which does not hold in real-world scenarios. To overcome this ontology heterogeneity, ontology mapping techniques are needed to reconcile the differences between these ontologies and support interoperability.
In order to address the cited limitations of existing approaches, we first propose a discovery framework based on a user query expressed in natural language. Then, we perform query preprocessing using Natural Language Processing (NLP) techniques in order to extract keywords from the user query. Compared with formal queries, keyword-based queries have many advantages. They offer a simple syntax in terms of a list of keyword phrases and open vocabularies wherein the users can use their own words to express their information requirement. Also, keyword-based search is more familiar to the user due to its widespread usage (e.g., Search Engines, UDDI registries).
However, creating a semantic Web service discovery engine using a keyword-based approach can be a complex task, and several questions must be answered. How can the most relevant information be extracted from a semantic Web service description? How can keywords from the user query be matched with textual information from a semantic Web service description? How can English words be mapped to ontological concepts in order to perform semantic matching?
Second, our framework makes no assumptions about the description language of the advertised Web service: a published Web service may be described in WSDL or in any semantic Web service description language such as OWL-S or WSMO. Finally, to overcome the ontology heterogeneity problem, our framework employs Natural Language Processing (NLP) techniques to extract senses from user keywords and Web service descriptions. It also contains a mapping module which converts English terms present in WordNet [
The remainder of the paper is structured as follows. We present the related work in Section
Many research efforts have been made to provide discovery frameworks for Web services. They are generally divided into syntactic-based approaches and semantic-based approaches. The major differences between these two approaches are summarized in Table
Syntactic versus semantic approaches for Web services discovery.
 | Syntactic-based approaches for WS discovery | Semantic-based approaches for WS discovery |
Matchmaking technique | (i) A simple keyword-based search | (i) Exploit the semantic representation of concepts describing a Web Service and their relations in an ontology |
Advantages | (i) Simple and widely used technique. | (i) Minimize the manual discovery and usage of Web service by allowing software agents to automatically and dynamically discover WSs |
Disadvantages | (i) Do not allow retrieval of Web Services with similar functionality | (i) More complex technique |
The semantic-based approaches utilize semantic description for Web services to automate the discovery process and employ the Semantic Web techniques. GODO [
Sycara et al. introduced LARKS [
METEOR-S discovery [
In this section, we describe the concepts and methodologies used in our framework. We first present the relevant Semantic Web technologies. Then, we briefly describe the Natural Language Processing (NLP) techniques that our approach applies to a user query written in natural language and to Web service descriptions before performing semantic matchmaking. Finally, we give an overview of the WordNet and SUMO projects.
An ontology is an explicit, shared specification of a conceptualization of a particular domain. It plays a vital role in the Semantic Web: it captures the semantics of a domain by means of knowledge representation primitives, enabling a machine to understand the relationships between concepts in that domain.
Because an ontology contains relations and axioms, it supports reasoning; the semantics of a concept can therefore be expressed through its relationships with other concepts, attributes, and instances. More generally, an ontology is a detailed description of a conceptualization of the world. A domain ontology is a detailed description of the hierarchy of concepts in a field: it abstracts and conceptualizes the objects, relationships, and classes of that field and expresses them as a vocabulary whose terms are the concepts of the domain. In practice, domain ontologies are built for specific fields (e.g., travel ontology, communication ontology, and medical ontology).
A number of ontology description languages have been proposed to address the semantic heterogeneity among Web resources and services. Among them, OWL is considered a major technology for the future implementation of the Semantic Web: because it is based on XML, OWL information can be easily exchanged between computers using different operating systems and application languages.
The Web Ontology Language (OWL) [
Traditional Web services are described using XML-based standards and published into a specific registry standard.
WSDL [
UDDI [
Semantic Web services are services that have been enriched with machine-interpretable semantics. Semantic description aims to enhance the integration and Web service discovery by utilizing the machine readable constructs of the representation.
Several standards have been proposed for creating semantic Web services. Each has its own strengths and suits specific situations. Some of the popular languages are described as follows.
OWL-S [
WSMO [
Ontologies: provide the terminology used by other WSMO elements.
Web service descriptions: describe the functional and behavioral aspects of a Web service.
Goals: represent user desires.
Mediators: aim to automatically handle interoperability problems between different WSMO elements.
Current WSDL standard operates at the syntactic level and lacks the semantic expressivity needed to represent the requirements and capabilities of Web Services [
Natural Language processing (NLP) [
In our work, we employ the following NLP techniques.
Word splitting: the process of parsing concatenated text (i.e., text that contains no spaces or other word separators) to infer where word breaks exist.
Stemming: the process of reducing inflected (or sometimes derived) words to their stem, base, or root form. For example, a stemming algorithm reduces the words “fishing”, “fished”, “fish,” and “fisher” to the root word, “fish”.
Part Of Speech (POS) tagging: the process of marking up the words in a text (corpus) as corresponding to a particular part of speech, based on both their definition and their context. A POS tagger enables the identification of words as nouns, verbs, adjectives, adverbs, and so forth.
Word Sense Disambiguation (WSD): the process of identifying which sense (i.e., meaning) of a word is used in a sentence, when the word has multiple meanings (polysemy).
WordNet [
WordNet is of interest not only because it is a vast repository of lexical data, but also because it is so widely used. It has been leveraged for automated sense disambiguation, term expansion in IR systems, and the construction of structured representations of document content. In fact, WordNet is so popular that it is almost considered a de facto standard in the NLP community.
SUMO (Suggested Upper Merged Ontology) [
The SUMO was created by merging publicly available ontological content into a single, comprehensive, and cohesive structure.
In this section, we present our discovery framework presented in Figure
Automatic Web service discovery framework architecture.
Sequence diagram for use case “discovery of Web services.”
Our discovery process aims to enable efficient search for appropriate Web services according to a user query. In our proposed discovery framework, we suppose that there is a set of Web services described in WSDL, OWL-S, or WSMO and published by service providers in a Web service registry. These descriptions are parsed by our system in order to extract all the information elements that are useful for the matchmaking process. NLP techniques are then applied to the extracted information to find the words needed in the next steps. As words can have different senses, sense disambiguation is performed. A WordNet/SUMO mapping is then carried out to map each word to its corresponding concept in the SUMO ontology. The final process in the framework is semantic matchmaking, which is based on calculating the semantic distance between concepts defined in the ontology.
From the service requester's point of view, our system offers a simple graphical query interface to facilitate the discovery process. The framework takes as input a query expressed in natural language, from which useful keywords are extracted. Consequently, the overall architecture and the implemented technologies are transparent to the user. The user query is preprocessed with the same processes as the service descriptions so that the two can be matched. Finally, the concepts mapped to the senses disambiguated from the search query are matched with the concepts mapped to the senses disambiguated from the Web service description.
As presented in Sections
In the case of an OWL-S or WSMO Web service, the service reader extracts the names and nonfunctional descriptions of elements such as the capabilities (inputs/outputs), conditions, and effects of the Web service. After extracting concepts from those elements, the service reader searches for their nonfunctional descriptions in the relevant ontology, which is retrieved from the ontologies database.
In the case of a WSDL description, the service reader extracts operation parameters (all terms under the <element name> and <documentation> tags).
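The WSDL case can be illustrated with a minimal sketch, assuming a WSDL 1.1 document whose types are declared with XML Schema. The function name `extract_wsdl_terms` and the sample document are hypothetical and only serve the illustration; the actual service reader of the framework may work differently.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for XML Schema and WSDL 1.1.
XSD_NS = "http://www.w3.org/2001/XMLSchema"
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def extract_wsdl_terms(wsdl_text):
    """Return (element names, documentation strings) found in a WSDL document."""
    root = ET.fromstring(wsdl_text)
    # Names declared on <xsd:element name="..."> nodes.
    names = [
        el.get("name")
        for el in root.iter(f"{{{XSD_NS}}}element")
        if el.get("name")
    ]
    # Free text inside <documentation> nodes, with whitespace normalized.
    docs = [
        " ".join(el.text.split())
        for el in root.iter(f"{{{WSDL_NS}}}documentation")
        if el.text and el.text.strip()
    ]
    return names, docs


# Hypothetical sample document, not taken from a real registry.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema" name="HotelBooking">
  <documentation>Books a hotel room in a given city.</documentation>
  <types>
    <xsd:schema>
      <xsd:element name="CityName" type="xsd:string"/>
      <xsd:element name="RoomPrice" type="xsd:decimal"/>
    </xsd:schema>
  </types>
</definitions>"""

names, docs = extract_wsdl_terms(SAMPLE_WSDL)
print(names)
print(docs)
```

The extracted names are typically camel-case identifiers, so they still need to be split and normalized before they can be matched against query keywords.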
Before extracting words from a Web service description, the description has to be parsed. Different languages can represent different syntaxes and therefore different parsers are needed. For example, for WSMO a WSML parser like WSMO4J [
Web service descriptions must be preprocessed in order to transform the extracted elements into useful words that can be processed later. The user query must also be preprocessed to extract useful keywords from a query written in natural language. Several NLP techniques are used for preprocessing.
First, word segmentation is performed where needed to split a string into its component words. White space is a good approximation of a word delimiter. In the case of element names, simply splitting at each case transition is enough, since in most cases they are written in camel case (e.g., TravelCheckingService). To find useful words for WSD, each word in the extracted sentences must be tagged with the right Part-of-Speech (PoS), such as noun, verb, or adjective. Markup and punctuation are then removed, and uppercase characters are converted to lowercase. Second, all stop words are removed from the extracted elements. Finally, stemming is applied to reduce the remaining words to their roots.
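The preprocessing steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the stop-word list is a tiny hand-picked sample, `naive_stem` is a crude suffix stripper standing in for a real stemming algorithm, and PoS tagging is omitted because it requires a trained tagger. All function names are hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "to", "and", "is"}


def split_words(text):
    """Split on camel-case transitions, then keep alphabetic runs only."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return re.findall(r"[A-Za-z]+", spaced)


def naive_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Segment, lowercase, drop stop words, and stem."""
    words = [w.lower() for w in split_words(text)]
    return [naive_stem(w) for w in words if w not in STOP_WORDS]


print(preprocess("TravelCheckingService"))
print(preprocess("Find a hotel booking service in London"))
```

Applying the same function to both element names and user queries keeps the two sides comparable, which is the reason the framework preprocesses them in the same way.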
The Word Sense Disambiguation module establishes the context of words received from the preprocessor by extracting relevant senses. This will result in a set of senses, each representing a single meaning of a word. In general terms, WSD involves the association of a given word in service description or in user request with a definition or meaning (
In our approach, we use a variant of the SSI algorithm [
The similarity function (
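To give an intuition for what a disambiguation step does, the sketch below uses a simplified gloss-overlap heuristic in the spirit of the Lesk algorithm. It is not the SSI variant used in our approach, and the sense inventory `SENSES` is a hand-written toy example rather than WordNet data; the sense identifiers and the function name are hypothetical.

```python
# Toy sense inventory: sense id -> gloss. Hand-written for illustration only.
SENSES = {
    "book": {
        "book#noun": "a written work printed on pages bound together",
        "book#verb": "arrange and reserve a room seat or ticket in advance",
    },
}


def disambiguate(word, context_words, inventory):
    """Pick the sense whose gloss shares the most words with the context."""
    context = set(context_words) - {word}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in inventory.get(word, {}).items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense  # None when the word has no entry in the inventory


print(disambiguate("book", ["book", "hotel", "room", "reserve"], SENSES))
```

In this example the context words "room" and "reserve" appear in the gloss of the verb sense, so that sense is selected.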
The mappings between WordNet and SUMO can be regarded as a natural language index to SUMO. They provide a tool that lets the user enter English terms and returns the SUMO concepts associated with those terms via WordNet synsets. The WordNet/SUMO mapping module thus makes it possible to assign the structured meanings of SUMO to free text: every sense extracted by the WSD module is matched to the equivalent concept in the SUMO ontology. Semantic matchmaking can then be applied between the concepts derived from the user query and those derived from the service description.
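Conceptually, the mapping module behaves like a lookup table from disambiguated senses to SUMO concepts. The following minimal sketch assumes such a table is already loaded in memory; the entries in `SENSE_TO_SUMO` are illustrative placeholders and are not copied from the actual WordNet/SUMO mapping files.

```python
# Illustrative placeholder entries: sense id -> SUMO concept name.
SENSE_TO_SUMO = {
    "hotel#noun": "Hotel",
    "book#verb": "Reserving",
    "city#noun": "City",
}


def map_to_sumo(senses, table):
    """Return (mapped senses, senses with no known concept)."""
    mapped, unmapped = {}, []
    for sense in senses:
        if sense in table:
            mapped[sense] = table[sense]
        else:
            unmapped.append(sense)
    return mapped, unmapped


mapped, unmapped = map_to_sumo(["hotel#noun", "spa#noun"], SENSE_TO_SUMO)
print(mapped)
print(unmapped)
```

Returning the unmapped senses separately makes it explicit which query or description terms cannot take part in the later semantic matchmaking.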
A basic step toward semantic matchmaking is to calculate the semantic distance between concepts that are defined in an ontology. In the semantic matchmaking module, we utilize a novel edge-based approach to measure the semantic distance between two ontological concepts which is presented in details in our previous work [
Equation (
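For readers unfamiliar with edge-based measures, the sketch below shows the most basic form of the idea: counting the edges on the path between two concepts through their closest common ancestor in an is-a hierarchy. It is plain edge counting, not the measure defined in our previous work, and the `PARENT` hierarchy is a hypothetical toy taxonomy rather than the actual structure of SUMO.

```python
# Hypothetical toy is-a hierarchy: child -> parent.
PARENT = {
    "Hotel": "Building",
    "Restaurant": "Building",
    "Building": "Artifact",
    "Vehicle": "Artifact",
    "Artifact": "Entity",
}


def ancestors(concept, parent):
    """Return the chain from a concept up to the root, concept included."""
    chain = [concept]
    while concept in parent:
        concept = parent[concept]
        chain.append(concept)
    return chain


def edge_distance(a, b, parent):
    """Count edges from a to b through their closest common ancestor."""
    depth_from_a = {c: i for i, c in enumerate(ancestors(a, parent))}
    for steps_from_b, c in enumerate(ancestors(b, parent)):
        if c in depth_from_a:
            return depth_from_a[c] + steps_from_b
    return None  # the two concepts share no ancestor


print(edge_distance("Hotel", "Restaurant", PARENT))
print(edge_distance("Hotel", "Vehicle", PARENT))
```

A smaller distance indicates more closely related concepts; in a matchmaker, such distances would be aggregated over the concepts of the query and of each service description to rank candidate services.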
The work proposed in this paper provides an approach for automatic discovery of Web services.
We stress that, since users often have little knowledge of Web service technologies and implementation details, a discovery framework that takes a user query expressed in natural language as input is needed. Our proposed framework provides a discovery mechanism that enables Web service discovery based on keywords written in natural language, with no constraints on the Web service description language used. We presented a novel approach that takes advantage of the simplicity of keyword-based search and of emerging Semantic Web technologies to automate the discovery of Web services.
Some of our work in progress is aimed at extending our approach to service discovery, to support service invocation and workflow composition.