Automating Mashup Service Recommendation via Semantic and Structural Features

The growing number of physical objects connected to the Internet makes it possible for smart things to access all kinds of cloud services. Mashup has been an effective way to rapidly develop IoT (Internet of Things) applications. As mashup services develop, it remains a big challenge to bridge the semantic gap between user expectations and application functionality. This paper proposes a mashup service recommendation approach that merges semantic features from API descriptions with structural features from the mashup-API network. To validate our approach, large-scale experiments are conducted on a real-world, publicly accessible service repository, ProgrammableWeb. The results show the effectiveness of our proposed approach.


Introduction
The Internet of Things (IoT) was first introduced to the community in 1999 for supply chain management. We are now entering the post-cloud era, in which large numbers of smart things will access all kinds of cloud services, and the capabilities of smart things can be enhanced by interacting with other functional entities through the interfaces of cloud services. Since the functionality of an individual service is too simple to satisfy the complex requirements of users, people prefer to combine several cloud services in an IoT application. In recent years, mashup technology has gained great attention because it supports the development of IoT applications by composing existing cloud services in the form of web APIs.
Several platforms, including Seekda, Google, and ProgrammableWeb [1], have enabled vibrant software ecosystems between service providers and developers, where developers search for services using one or more query items such as keywords and categories. However, keyword-based service recommendation mechanisms usually have low accuracy [2]. NLP techniques are increasingly applied in the software engineering domain and have been shown to be useful in requirements engineering [3], the usability of API documents [4], and other areas [5]. Thus, semantic queries may yield more accurate results.
Most API descriptions and tags are plain text. Existing approaches therefore use similar semantic metrics to capture service similarity, such as the similarity of service descriptions and tags. Indeed, most similarity metrics proposed to quantify the similarity of service descriptions and tags are based on the semantic information in the text. Some works utilize structural metrics to quantify structural similarity, where the metrics rely on topological information extracted from methods, attributes, classes, and their couplings [6][7][8]. However, very few works have introduced structural similarity to guide service ranking.
In this study, we present AMSRSSF (automating mashup service recommendation via semantic and structural features), a framework that utilizes NLP and graph-embedding techniques to recommend services for developers. AMSRSSF takes as input the developers' personalized requirements and structural semantics and determines which services can be recommended. Furthermore, we show that the structural semantics can be generated from a two-mode graph, which describes mashups, web APIs, and their relations. We evaluate AMSRSSF against a dataset of description documents including 10050 web-based services and 7155 mashups. Our experiments demonstrate that our approach can rank mashup services efficiently and that it performs better than mashup service ranking approaches based on semantic features alone.
This paper makes the following main contributions: (1) we propose a mashup service recommendation approach that merges semantic features from API descriptions with structural features from the mashup-API network; (2) we conduct comprehensive experiments on a real-world dataset, demonstrating the effectiveness of our approach.
The remainder of this paper is organized as follows: Section 2 gives a motivating scenario and presents our approach. Section 3 describes the experiments in detail. Section 4 discusses the related works, and Section 5 concludes the paper with future work directions.

Automating Service Recommendation via Semantic Features and Structural Features
In this section, we first present a scenario to illustrate the motivation of our work in Section 2.1. Then, we discuss the issues of mashup service recommendation in Section 2.2. Next, we propose a mashup service recommendation approach via semantic and structural features in Sections 2.3-2.8.

Motivation.
In Internet scenarios, more and more structure is present between data objects. A typical example is a knowledge graph that consists of a global relationship diagram of user behavior data and items with many attributes. Structural and semantic similarities characterize different aspects of service similarity. Indeed, the two similarities are orthogonal, which motivates our work. We expect that better service recommendation approaches can be obtained by using structural features. In particular, we propose a simple but effective approach, AMSRSSF, to rank mashup services by using semantic and structural features. First, it applies a two-mode graph to formally describe mashups, web APIs, and their relations. Second, we quantify the structural similarity between every pair of mashup services based on the two-mode graph. The structural similarity results from the structural context in which web APIs are used by mashup services. Finally, we introduce an algorithm for merging embedding vectors that considers only the pairwise similarities between mashup services to rank them effectively. It is therefore valuable to introduce structural features and to improve the accuracy of recommendation based on structural semantics.
Several issues need to be addressed: (1) How can semantics be extracted from natural language API descriptions? (2) How can the structural semantics of the mashup-mashup network be generated? (3) How can multiple embedding vectors be merged for better accuracy? (4) How should experiments be designed for performance evaluation?
Problem Description.
Our work considers the task of recommendation via representation learning as follows: given a requirement $q$ described by multidimensional information that includes semantic and structural features, the corresponding recommended services are obtained through a similarity calculation defined in terms of the representation learning model. Our model consists in learning a function $S(\cdot)$ that scores requirement-service pairs $(q, t)$. Hence, finding the top-ranked answer $t(q)$ to a requirement $q$ is directly carried out by

$$t(q) = \arg\max_{t'} S(q, t'),$$

where $t'$ ranges over the candidate services. To handle multiple recommended services, we present the results as a ranked list rather than taking only the top service.
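The ranked-list formulation can be illustrated with a minimal sketch. It assumes that requirements and services are already embedded as plain lists of floats; the dot-product scorer and the toy candidates are hypothetical stand-ins for the learned function $S(\cdot)$, not the model used in this paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rank_services(
    requirement: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    score: Callable[[Sequence[float], Sequence[float]], float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (service, score) pairs for a requirement."""
    scored = [(name, score(requirement, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical scoring function: plain dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy candidate embeddings (made up for illustration).
candidates = {"maps": [1.0, 0.0], "photos": [0.0, 1.0], "geo-photos": [0.7, 0.7]}
ranked = rank_services([1.0, 0.2], candidates, dot, k=2)
```

Taking only `ranked[0]` corresponds to the arg max; returning the first `k` entries gives the ranked list.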

Overview of Our Framework.
In this section, we present an overview of our approach. As shown in Figure 1, our approach mainly consists of four components: semantic extraction based on the NLP pipeline, structural embedding generation of the mashup-mashup network, merging of multiple embeddings, and recommendation based on the fusion embedding. More specifically, the descriptions of web-based services are crawled from ProgrammableWeb.
We first preprocess the natural language API descriptions using the NLP pipeline and extract the text semantic embedding. We then generate the structural embedding of the mashup-mashup network. Next, we merge the semantic embedding and the structural embedding. Finally, the developers give multidimensional information about their requirements, which is parsed into semantic and structural features and passed to the similarity calculation for ranking (Figure 2).

Semantic Extraction Based on the NLP Pipeline.
This section preprocesses the sentences in API descriptions using an NLP pipeline built on NLTK [9], which follows the high-level flow in Figure 3.
For purposes of illustration, we use the following sample: "The Twitter microblogging service includes two RESTful APIs. The Twitter REST API methods allow developers to access core Twitter data." The steps are as follows.

EOS (End of Sentence) Detection.
Breaking the texts into paragraphs might be important. However, it is unlikely to help EOS detection, which marks the boundaries of sentences in a text and breaks the text into sentences. We utilize sentences to represent logical units of thought, and they tend to give a predictable syntax for further analysis.

[Figure 3: NLP pipeline: end of sentence (EOS) detection, tokenization, chunking, named entity extraction.]
Parsing the sample with NLTK yields the following result: ["The Twitter micro-blogging service includes two RESTful APIs. The Twitter REST API methods allow developers to access core Twitter data."] A period "." is used to mark the end of a sentence. However, the period has other legal usages, such as decimals (periods between numbers), the ellipsis "...", and shorthand notations such as "Mr." and "Dr.". We overcome this difficulty by looking up abbreviations in WordNet and by using regular expressions to detect decimals that contain the period character.
Moreover, there are instances where an enumeration list is used to describe data types, such as the following: "This service where you can obtain the following data: (a) abc . . ., (b) xyz . . .." Such text is easy for a human to understand, but it is arduous for a machine to find the appropriate boundaries. An improvement over breaking on the character "." is to leverage syntax information to detect appropriate boundaries: (1) placements of tabs; (2) bullet points (numbers, characters, roman numerals, and symbols); (3) delimiters such as ":". We further improve the EOS detection using the following patterns: (1) we remove the leading and trailing "*" and "-" characters in a sentence; (2) we consider the following characters as sentence separators: "-", "-", " ", " §", "￥", "◇", "}", "|", "∼", and "★" . . .; (3) for an enumeration sentence, we split the sentence into short ones for each enumerated item.
At this point, sentence detection has been performed for arbitrary text.
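The handling of abbreviations, decimals, and ellipses can be sketched with the standard library alone. This is a simplified illustration rather than the NLTK-based pipeline used in this paper: the abbreviation set is a small hypothetical list standing in for the WordNet lookup, and enumeration splitting is not covered.

```python
import re

# Hypothetical, illustrative abbreviation list (the paper looks these up in WordNet).
ABBREVIATIONS = {"mr", "mrs", "dr", "e.g", "i.e", "etc"}

# A boundary candidate is ., ! or ? followed by whitespace and not preceded by
# another period. Decimals such as "4.5" never match because the period is
# followed by a digit; the last dot of "..." is skipped by the lookbehind.
BOUNDARY = re.compile(r"(?<!\.)[.!?](?=\s)")


def split_sentences(text: str) -> list:
    """Split text into sentences, skipping periods that end an abbreviation."""
    sentences, start = [], 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        words = text[start:end].split()
        last_word = words[-1].rstrip(".!?").lower() if words else ""
        if last_word in ABBREVIATIONS:
            continue  # "Dr." or "Mr." does not end a sentence
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


sample = (
    "Dr. Smith rated the API 4.5 stars. "
    "The Twitter REST API methods allow developers to access core Twitter data."
)
sentences = split_sentences(sample)
```

A sentence that genuinely ends with an abbreviation would be merged with the next one under this rule, which is one reason a trained tokenizer is preferable in practice.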
After POS tagging is finished, we can chunk nouns as parts of noun phrases and then try to deduce the types of entities (e.g., people, places, and organizations).

Chunking.
Chunking means analyzing the tagged tokens of a sentence and identifying logical concepts, such as "Pandora Internet radio" and "Google Maps." We annotate such phrases as single lexical units, such as a noun phrase or a verb phrase. State-of-the-art approaches achieve around 97% accuracy in classifying phrases and clauses over well-written news articles. We develop chunking with more complex grammars to replace the default one in NLTK.
In addition, abbreviations usually lead to incorrect parsing of a sentence; for example, "Application Programming Interfaces (APIs)" is treated as a single lexical unit. We resolve this by looking up abbreviations in WordNet [10].
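The idea of grouping tagged tokens into lexical units can be sketched as follows. This is a deliberately simple stand-in, not the chunking grammar developed in this paper and not NLTK's chunker: it assumes Penn Treebank POS tags that have been assigned by hand, and it only groups maximal runs of adjectives and nouns into noun phrases.

```python
# Penn Treebank tags treated as parts of a noun phrase (an assumption of this sketch).
NOUN_PHRASE_TAGS = {"JJ", "NN", "NNS", "NNP", "NNPS"}


def chunk_noun_phrases(tagged_tokens):
    """Group maximal runs of adjective/noun tokens into noun-phrase chunks."""
    chunks, current = [], []
    for token, tag in tagged_tokens:
        if tag in NOUN_PHRASE_TAGS:
            current.append(token)
        else:
            if current:
                chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


# Hand-tagged toy sentence (tags are illustrative).
tagged = [
    ("Pandora", "NNP"), ("Internet", "NNP"), ("radio", "NN"),
    ("uses", "VBZ"),
    ("Google", "NNP"), ("Maps", "NNPS"),
]
chunks = chunk_noun_phrases(tagged)
```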

Semantic Embedding Generation.
In order to convert named entities into feature vectors and utilize them in later models, we use word2vec to encode the named entities from API descriptions.
word2vec is a powerful method for extracting meaningful grammatical and semantic features, and it transforms a word into a fixed-length vector. Each named entity from an API description can be reduced to an embedding vector $V_{text}$, and the semantic embedding of the API description can then be represented as a fixed-length vector computed from these vectors, where $N_m$ is the number of named entities of the API description.
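One plausible way to obtain a fixed-length description vector from the named-entity vectors is a component-wise arithmetic mean. The sketch below assumes this aggregation; the toy vectors are made up, whereas in practice they would come from a trained word2vec model.

```python
def average_embedding(entity_vectors):
    """Average a non-empty list of equal-length vectors component-wise."""
    if not entity_vectors:
        raise ValueError("need at least one named-entity vector")
    n = len(entity_vectors)  # plays the role of N_m
    dim = len(entity_vectors[0])
    return [sum(vec[d] for vec in entity_vectors) / n for d in range(dim)]


# Made-up 3-dimensional vectors for two named entities.
toy_vectors = {"Twitter": [1.0, 0.0, 2.0], "REST API": [0.0, 2.0, 2.0]}
v_semantic = average_embedding(list(toy_vectors.values()))
```

Averaging keeps the output dimension independent of the number of named entities, so descriptions of different lengths remain comparable.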

Two-Mode Graph of Mashup Services and Web APIs.
The first step of structural embedding generation is to generate the mashup-mashup network. The network can be generated from mashup behavior sequences. Information such as the APIs shared between mashups can also be used to establish edges between mashups, so as to generate the structural graph. The embedding vectors based on this graph are called structural embedding vectors.
In this article, two networks are used to represent all relationships between mashup services and APIs: one is the network between APIs and mashup services, and the other is the network between mashup services.
Definition 1. MAN (mashup-API network) is a two-mode network, where the nodes represent the APIs and the mashup services, and the edges represent the relationships between them.
Definition 2. MMN (mashup-mashup network) is a weighted network between mashups, where each node represents a mashup service and the edges represent the relationships between them; the weight of an edge represents the relevancy between its two mashups. MMN can be represented as a triple:

$$\mathrm{MMN} = (N, E, W),$$

where $N$ is the node set of all mashup services, $E$ is the set of all edges in the MMN, and $W$ is the edge weight matrix. For example, if $Node_i$ and $Node_j$ are linked, then $W_{ij}$ is defined as the number of APIs shared by the two mashup services.
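The projection from the MAN to the MMN can be sketched in a few lines. The mashup and API names below are illustrative only, and the dictionary-of-pairs representation is an assumption of this sketch; the weight rule (number of shared APIs) follows Definition 2.

```python
from itertools import combinations


def build_mmn(mashup_apis):
    """Return {(mashup_a, mashup_b): shared_api_count} for linked mashup pairs."""
    weights = {}
    for a, b in combinations(sorted(mashup_apis), 2):
        shared = len(set(mashup_apis[a]) & set(mashup_apis[b]))
        if shared > 0:  # two mashups are linked only if they share an API
            weights[(a, b)] = shared
    return weights


# Toy MAN: each mashup maps to the set of APIs it uses.
man = {
    "m1": {"Twitter", "Google Maps"},
    "m2": {"Google Maps", "Flickr", "Twitter"},
    "m3": {"Flickr"},
}
mmn = build_mmn(man)
```

Here `m1` and `m3` share no API, so no edge is created between them.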

Structural Embedding Generation of the Mashup-Mashup Network.
Structural embedding is designed to represent a network in a low-dimensional space while retaining as much information as possible. By using structural embedding, each node in the network can be represented as a vector. The traditional sequence embedding is inadequate for network structure. We therefore conduct random walks on the graph to generate a large number of node sequences, which are input into word2vec as training samples to obtain the structural embedding. Figure 4 illustrates the random walk process graphically.
As shown in Figure 4, the random walk process first selects a random mashup as the starting point and then produces a sequence. The hop probability of the random walk needs to be formally defined, that is, the probability of traversing $v_j$ after reaching node $v_i$. If the graph is an undirected graph, then the hop probability from node $v_i$ to node $v_j$ is defined as follows:

$$P(v_j \mid v_i) = \begin{cases} \dfrac{w_{ij}}{\sum_{e_{ik} \in N(v_i)} w_{ik}}, & e_{ij} \in N(v_i), \\ 0, & \text{otherwise}, \end{cases}$$

where $N(v_i)$ is the set of all outgoing edges of node $v_i$ and $w_{ij}$ is the weight of the edge from node $v_i$ to node $v_j$. We further trade off the homogeneity and structural equivalence of the network embedding by adjusting the weights of the random walk.
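The weighted random walk can be sketched as below. The sketch assumes that the hop probability is the edge weight normalized over all outgoing edges of the current node; the adjacency-dictionary layout, the toy weights, and the fixed seed are illustrative choices.

```python
import random


def hop_probabilities(graph, node):
    """Normalize the outgoing edge weights of `node` into probabilities."""
    neighbors = graph[node]
    total = sum(neighbors.values())
    return {nbr: w / total for nbr, w in neighbors.items()}


def random_walk(graph, start, length, rng):
    """Generate one weighted random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: no outgoing edges
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        # random.choices samples proportionally to the (unnormalized) weights.
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


# Toy weighted MMN as an adjacency dictionary (weights = shared API counts).
graph = {
    "m1": {"m2": 2, "m3": 1},
    "m2": {"m1": 2, "m3": 1},
    "m3": {"m1": 1, "m2": 1},
}
probs = hop_probabilities(graph, "m1")
walk = random_walk(graph, "m1", 5, random.Random(0))
```

Repeating `random_walk` from many starting mashups yields the node sequences that are fed to word2vec.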
Specifically, the homogeneity of the network means that the embeddings of nodes that are close to each other should be as similar as possible. As shown in Figure 5, the embedding of node u should be close to those of the nodes connected to it, namely s1, s2, s3, and s4. That is the embodiment of homogeneity.
Structural equivalence means that the embeddings of nodes with similar structures should be as close as possible. In Figure 4, both node u and node s6 are the central nodes of their respective local networks, which are similar in structure. Their embeddings should be similar, which is the embodiment of structural equivalence.
In order to express the homogeneity of the network, the random walk needs to be inclined toward breadth-first search (BFS). The generated sequences then contain more homogeneous information because BFS prefers to access directly connected nodes. On the other hand, depth-first search (DFS) can grasp the overall structure of the network through multiple hops. It is therefore important to trade off the tendencies toward BFS and DFS. Formally, following Node2vec [11], suppose that the walk has just moved from node $t$ to node $v$; the hop probability from node $v$ to a next node $x$ is enhanced as follows:

$$\pi_{vx} = \lambda_{pq}(t, x) \cdot P(x \mid v),$$

where $\lambda_{pq}(t, x)$ is defined as follows:

$$\lambda_{pq}(t, x) = \begin{cases} \dfrac{1}{p_r}, & d_{tx} = 0, \\ 1, & d_{tx} = 1, \\ \dfrac{1}{p_{io}}, & d_{tx} = 2, \end{cases}$$

where $d_{tx}$ refers to the distance from node $t$ to node $x$, and the parameters $p_r$ and $p_{io}$ jointly control the tendency of the random walk. Parameter $p_r$ is known as the return parameter, which controls the probability of revisiting the node that was just visited. Parameter $p_{io}$ is called the in-out parameter, which controls the outward or inward tendency. The random walk tends to access nodes close to node $t$ if $p_{io} > 1$ and tends to access nodes far from node $t$ if $p_{io} < 1$. Our approach enhances Node2vec [11] by replacing the edge weight $w_{ij}$ with the hop probability $P(x \mid v)$. Figure 6 shows the hop probability of our approach while jumping from node $t$ to node $v$ in the next step.
The homogeneity and structural equivalence of the network embodied by the hop probability above can be intuitively explained in a recommendation system. Items with the same homogeneity are likely to be items of the same category and attributes, while items with the same structure are items with similar trends or structural attributes. Both are very important features in recommendation systems.
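The effect of the return and in-out parameters can be illustrated with a small sketch that follows the Node2vec [11] bias rule: the weight of the next hop is scaled by $1/p_r$ when it returns to the previous node, by 1 when the next node is also a neighbor of the previous node, and by $1/p_{io}$ otherwise. The function below uses raw edge weights and leaves the result unnormalized; since the normalizing constant is the same for all neighbors of the current node, this is assumed to give the same distribution after normalization. The graph and parameter values are illustrative.

```python
def biased_weights(graph, prev, cur, p_r, p_io):
    """Unnormalized second-order weights for stepping from `cur`, having come from `prev`."""
    weights = {}
    for nxt, w in graph[cur].items():
        if nxt == prev:            # distance 0 from prev: return step
            bias = 1.0 / p_r
        elif nxt in graph[prev]:   # distance 1 from prev: stay nearby (BFS-like)
            bias = 1.0
        else:                      # distance 2 from prev: move outward (DFS-like)
            bias = 1.0 / p_io
        weights[nxt] = bias * w
    return weights


# Toy undirected graph: t-v, t-x1, v-x1, v-x2, all with weight 1.
graph = {
    "t": {"v": 1, "x1": 1},
    "v": {"t": 1, "x1": 1, "x2": 1},
    "x1": {"t": 1, "v": 1},
    "x2": {"v": 1},
}
weights = biased_weights(graph, prev="t", cur="v", p_r=2.0, p_io=0.5)
```

With `p_io = 0.5` the outward node `x2` receives the largest weight, so the walk drifts away from `t`; choosing `p_io > 1` would instead keep it close.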
Next, we encode the above node sequences into structural embeddings. Our model, named sequence2vector, is derived from skip-gram: the central node sequence is used to predict its context, and the data structure can be represented by a triplet $(s_{t-1}, s_t, s_{t+1})$, where $s_t$ is the input and $s_{t-1}$ and $s_{t+1}$ are the outputs. The neural network structure of the sequence2vector model is the common encoder-decoder architecture using the GRU model. Word vectors are therefore utilized while training the sequence vectors, and the output of the encoder is the embedding vector of the last word of the node sequence. In the formulas of the encoding stage, $h_t$ represents the output of the hidden layer at time $t$. Taking $s_{t+1}$ as an example, in the formula of the decoding stage, $C_r$, $C_z$, and $C$ are used as the vector biases for the reset gate, update gate, and hidden layer, respectively. Sequence2vector can also be combined into a subsequent deep learning network to retain different features of objects (Algorithm 1).
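The triplet data structure used for training can be sketched as follows. The sketch assumes that the $s_t$ are consecutive node sequences in an ordered collection of walks, so that each interior sequence is paired with its predecessor and successor; the GRU encoder-decoder itself is not shown, and the walks are toy data.

```python
def sequence_triplets(sequences):
    """Return (previous, current, next) triples for every interior position."""
    triplets = []
    for i in range(1, len(sequences) - 1):
        triplets.append((sequences[i - 1], sequences[i], sequences[i + 1]))
    return triplets


# Toy ordered collection of node sequences (random walks over mashups).
walks = [["m1", "m2"], ["m2", "m3", "m1"], ["m1", "m3"], ["m3", "m2"]]
triplets = sequence_triplets(walks)

# For each triple, the middle element is the encoder input and the two
# outer elements are the decoder targets.
encoder_inputs = [current for _, current, _ in triplets]
```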

Embedding Fusion.
From the above, we have obtained the semantic embedding of the API description and the structural embedding generated from mashup behavior sequences. It is then important to integrate the multiple embedding vectors to form the final embedding of a mashup. The simplest approach is to add an average pooling layer to the deep neural network to average the different kinds of embedding. On this basis, we add a weight to each embedding, as shown in Figure 7. The hidden representation layer is the layer that performs the weighted average operation on the different kinds of embedding. After the weighted average embedding vector is obtained, it is directly input into the softmax layer. In this way, the weights $\alpha_i$ ($i = 0, 1$) can be trained. In the actual model, we adopt $e^{\alpha_i}$ instead of $\alpha_i$ as the weight of the corresponding embedding, which prevents the weight from being 0 and has good mathematical properties during gradient descent.
Finally, the semantic and structural embeddings are input into the model to generate the final fusion embedding.
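The weighted average with exponential weights can be sketched as below. The sketch assumes that the weights $e^{\alpha_i}$ are normalized by their sum; the $\alpha_i$ are fixed by hand here, whereas in the model they are trained through the softmax layer.

```python
import math


def fuse_embeddings(embeddings, alphas):
    """Weighted average of equal-length embeddings with weights exp(alpha_i)."""
    weights = [math.exp(a) for a in alphas]  # always strictly positive
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, embeddings)) / total
        for d in range(dim)
    ]


# Toy 2-dimensional semantic and structural embeddings.
semantic = [1.0, 0.0]
structural = [0.0, 1.0]

fused_equal = fuse_embeddings([semantic, structural], [0.0, 0.0])
fused_skewed = fuse_embeddings([semantic, structural], [math.log(3.0), 0.0])
```

Because `exp` is positive for every real input, neither embedding can be switched off entirely, which matches the stated motivation for using $e^{\alpha_i}$.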

Inferring Semantic-Related Services.
In this section, we utilize the fusion embeddings obtained above to calculate the similarity between mashups based on the Manhattan distance $\|\cdot\|_1$ between their embedding vectors. Finally, we recommend the top-k mashups to developers.
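A similarity based on the Manhattan distance, followed by top-k selection, can be sketched as follows. Mapping the distance to a similarity through $\exp(-\|a - b\|_1)$ is an assumption of this sketch, chosen because it is monotone in the distance and bounded in $(0, 1]$; the embeddings are toy values.

```python
import math


def manhattan(a, b):
    """L1 (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity(a, b):
    """Map the L1 distance into (0, 1]; identical vectors score 1.0."""
    return math.exp(-manhattan(a, b))


def top_k(query, mashups, k):
    """Return the names of the k mashups most similar to the query embedding."""
    ranked = sorted(
        mashups, key=lambda name: similarity(query, mashups[name]), reverse=True
    )
    return ranked[:k]


# Toy fusion embeddings.
mashups = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.2, 0.1]}
best = top_k([0.0, 0.1], mashups, k=2)
```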

Experiments
In this section, we conduct experiments to compare our approach with state-of-the-art methods. Our experiments are intended to address the following questions: (1) Does structural embedding improve the effectiveness with respect to semantic similarities? (2) What is the performance of the top-k rankings of our embedding models?

Data Set Description.
We adopt a real-world dataset, ProgrammableWeb, which was crawled from all services registered in the publicly available repository until Nov. 25, 2015. The dataset contains description documents including 12250 web-based services and 7875 mashups, which have attributes such as name, tags, summary, and description.

Metrics.
In statistical classification [12], Precision is defined as the ratio of the number of true positives to the total number of items reported to be true, and Recall is defined as the ratio of the number of true positives to the total number of items that are true. F-score is defined as the weighted harmonic mean of Precision and Recall. Accuracy is defined as the ratio of the sum of true positives and true negatives to the total number of items. Higher values of precision, recall, F-score, and accuracy indicate higher quality in identifying the semantic relationship. Based on the total numbers of TPs, FPs, TNs, and FNs, we computed the precision, recall, and F-score as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \text{F-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where TP denotes true positives, FP denotes false positives, FN denotes false negatives, and TN denotes true negatives. MAP (mean average precision) is the average of the precisions computed at the position of each relevant document in the ranked sequence, whose equation is as follows:

$$\mathrm{MAP} = \frac{1}{|Q_R|} \sum_{q \in Q_R} AP(q),$$

where $AP$ is the average precision and $Q_R$ denotes the relevant documents.
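These classification and ranking metrics can be computed with a few lines of code. The sketch below uses the balanced F-score (equal weight on precision and recall), and its average precision assumes that every relevant item appears somewhere in the ranked list; the function names and toy inputs are illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-score) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_precision(ranked_relevance):
    """ranked_relevance: list of booleans, True where the ranked item is relevant."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Average the per-query average precision over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
map_score = mean_average_precision([[True, False, True], [False, True]])
```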
We also utilize the normalized discounted cumulative gain (NDCG) [13] metric, which is a popular metric for evaluating ranking results. Given the ideal rankings (used as ground truth) and the derived rankings, the NDCG value of the top-k ranked services can be calculated by the following formula:

$$\mathrm{NDCG}_k = \frac{\mathrm{DCG}_k}{\mathrm{IDCG}_k},$$

where $\mathrm{DCG}_k$ and $\mathrm{IDCG}_k$ denote the discounted cumulative gain (DCG) values of the derived rankings and the ideal rankings of the top-k services, respectively. The value of $\mathrm{DCG}_k$ can be calculated by the following formula:

$$\mathrm{DCG}_k = rel_1 + \sum_{i=2}^{k} \frac{rel_i}{\log_2 i},$$

where $rel_i$ is the score of the service at position $i$ of the rankings. The premise of DCG is that a high-quality service appearing lower in a ranking list should be penalized: its score is reduced logarithmically in proportion to its position, that is, divided by $\log_2 i$.
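A worked sketch of NDCG is given below. It uses the discount described in the text, where the score at position $i \geq 2$ is divided by $\log_2 i$ and the first position is not discounted; it also assumes that the ideal ranking is obtained by sorting the same relevance scores in descending order. The scores are toy values.

```python
import math


def dcg(scores, k):
    """DCG_k = rel_1 + sum over i = 2..k of rel_i / log2(i)."""
    total = 0.0
    for i, rel in enumerate(scores[:k], start=1):
        total += rel if i == 1 else rel / math.log2(i)
    return total


def ndcg(derived_scores, k):
    """Normalize the DCG of the derived ranking by the DCG of the ideal ranking."""
    ideal = sorted(derived_scores, reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(derived_scores, k) / ideal_dcg if ideal_dcg else 0.0


perfect = ndcg([3, 2, 1], k=3)   # already in ideal order
reversed_ = ndcg([1, 2, 3], k=3)  # best service ranked last
```

A ranking that places the highest-scored service last is penalized, so `reversed_` is strictly below `perfect`.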

Baseline Approaches.
In this section, to show the effectiveness of our approach, we compare it with the following approaches:
(1) SimpleGraph [14]: this approach extracts features of subgraphs of question-answer pairs and utilizes a logistic regression classifier to predict the probability that a candidate answer is correct.
(2) AvgPara: the semantics for a given question in this model are directly averaged to predict the answers.
(3) DataAugment: this approach employs semantics for data augmentation during training, utilizing the question-answer pairs and semantics to automatically generate new training samples.
(4) SepPara [15]: this approach is the semantic scoring model, which is trained on classification data without considering question-answer pairs.
(5) Para4QA [15]: this approach learns semantics by training the semantic scoring and QA models end-to-end on question-answer pairs.

Evaluation Protocols.
In this section, we describe the experiments designed to evaluate our approach. We created a set of (q, t, l) triples, where q is an API description, t is a mashup-mashup network, and l is a label (class of mashups). We manually tagged each pair with a label l. This resulted in a set of approximately 20 human judgments. The union of them composes set T1, which is considered as the baseline to evaluate the precision of mashup recommendation. Five authors also picked out 30 API descriptions of mashups; the top-20 rankings were manually extracted by at least two authors after analyzing and augmenting them, and they are considered as the baseline to evaluate the rankings of mashup recommendation. The union of them composes set T2.

Training Details.
We utilized an encoder-decoder architecture using the GRU model with 2 layers, with 800 cells at each layer and 800-dimensional word embeddings. The complete training details are given below:
(1) We used word2vec to initialize the word embedding matrix. Out-of-vocabulary words were replaced with a special unknown symbol. We also augmented API descriptions with start-of-sequence and end-of-sequence symbols. Word vectors for these special symbols were updated during training. Model hyperparameters were validated on set T1. Each of the structural and semantic features is represented by a vector of 64 dimensions. The dropout rate was selected from {0.2, 0.3, 0.4}.
(2) Parameters were randomly initialized from a uniform distribution U(−0.08, 0.08). The learning rate was 0.7. After 5 epochs, we began halving the learning rate every half epoch. We trained our models for a total of 20 epochs. The batch size was set to 150. To alleviate the exploding gradient problem, the gradient norm was clipped to 5.
Table 2 presents the results of the ranking experiments. Our approach consistently obtains the best prediction accuracy (largest NDCG values) under all the experimental settings, since it considers the fusion embedding of semantic and structural features, which improves the accuracy of the similarity calculation.

Threats to Validity.
The threats to external validity are related to the representativeness of the dataset. To evaluate the quality of semantic extraction and service recommendation, we applied our approach to a real-world dataset, ProgrammableWeb, which is suitable for our study due to its public availability and activeness. This threat has been partially mitigated by the fact that we collected description documents including 12250 web-based APIs and 7875 mashups. Furthermore, we applied our approach to the top-10 subjects of them as well.
The threats to internal validity of our study are related to the results of the pairwise comparison on the criteria (see Section 3.5). There is a possibility that the results are subjective, as people may be biased. This may affect the weights $\alpha_i$ assigned to the semantic and structural embeddings. This threat has been partially mitigated by the fact that five authors independently elicited 10 service description documents from each of the top-10 subjects to evaluate the quality of semantic relationship extraction and query, where we manually derived the top-10 rankings as the baseline in our evaluations.

Related Works and Discussion
Our approach touches some areas such as recommendation, DL artifacts, and requirements engineering. We next discuss relevant works pertinent to our proposed approach in these domains.
Most recommendation approaches focus on keyword-based searching, where the keywords are usually tokenized from the interface of web-based services, such as input, output, precondition, and postcondition. For example, Junghans et al. give an oriented functionalities and requests formalization method [16]. OWLS-MX is proposed to calculate the similarity of concepts in OWL-S for recommendation [17]. Chen et al. give a clustering approach named WTCluster that uses WSDL documents and service tags [18]. Mueller et al. give an approach based on nature-inspired swarm intelligence [19].
Generally, most ranking approaches ranked services by utilizing the text in service profiles (mashup descriptions, API descriptions, WSDL documents, tags, etc.) to calculate the semantic similarity between every pair of services. In contrast, our proposed approach utilized structural similarities to guide the ranking activities, which can reduce false positives and reach higher performance.

Conclusion and Future Work
This paper introduces a mashup service recommendation approach that merges semantic features from API descriptions with structural features from the mashup-API network. After encoding semantic and structural features, our approach significantly outperforms previous works.
In spite of these promising results, some exciting challenges remain, especially in order to scale up this model to more dimensions. Furthermore, many more modes have to be carried out to encode the complex semantics into the embedding space.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.

Mathematical Problems in Engineering