HyOASAM: A Hybrid Open API Selection Approach for Mashup Development

At present, Mashup development has attracted much attention in the field of software engineering. The focus of this article is the use of existing open APIs to meet the needs of Mashup developers. Therefore, how to select the most appropriate open API for a specific user requirement is a crucial problem to be solved. We propose a Hybrid Open API Selection Approach for Mashup development (HyOASAM), which consists of two basic approaches: one is a user-story-driven open API discovery approach, and the other is a multidimensional-information-matrix- (MIM-) based open API recommendation approach. The open API discovery approach introduces user stories from agile development to capture Mashup requirements. First, it extracts three components from user stories; then, it extracts three corresponding properties from open API descriptions. Next, similarity calculation is performed on the two sets of data. The open API recommendation approach first uses the MIM to store open APIs, Mashups, and the invoking relationships between them. Second, it enters the matrix obtained in the previous step into a factorization machine model to calculate the association scores between the Mashups and the open APIs, and TOP-N open API lists for creating the Mashup are obtained. Finally, experimental comparison and analysis are carried out on the PWeb dataset. The experimental results show that our approach achieves significant improvements.


Introduction
Unlike object-oriented software engineering [1,2], service-oriented software engineering (SOSE) is used to design, develop, and maintain software systems [3] that follow the principles of service-oriented architecture (SOA) [4]. Open APIs are the basis of SOSE [5]. Mashup development is a novel, increasingly popular SOSE practice for building multiservice applications by integrating single-function open APIs. A Mashup is a temporary combination of web applications that allows users to create entirely new APIs using content retrieved from external data sources [6]. As Mashup application developers face the explosive growth of open APIs on the Internet, they often suffer from API information overload.
At present, there are a large number of open APIs on the Internet with similar descriptions but different functionalities and qualities, which undoubtedly complicates Mashup application developers' decisions. In addition, the unstructured description documents of open APIs increase the difficulty of semantic extraction. All of the above problems make it increasingly difficult for developers to select appropriate high-quality open APIs to build Mashup applications. Therefore, a major challenge in Mashup application development has emerged [7]: how to effectively and efficiently select the most appropriate open APIs from a large number of available resources to match the needs of Mashup developers.
In response to the above challenge, a number of open API selection approaches have been proposed, including keyword-based discovery approaches [8,9], topic-model-based discovery approaches [10,11], content-based recommendation approaches [12,13], and QoS-based recommendation approaches [14,15]. Yet, there are some problems with these studies: (1) Most established approaches use non-natural-language (NL) text, such as WSDL, to describe Mashup requirements [16], which is not user friendly, and the existing techniques do not model such text well. (2) Some users do not even know how to describe their Mashup requirements exactly [17], which makes keyword-based API search difficult. Therefore, a comprehensive open API selection approach should not only search or discover APIs based on keywords but also actively recommend APIs to developers based on their preferences. Yet, current approaches separate open API discovery from open API recommendation, leading to inefficient results.
To overcome the above problems, we propose a Hybrid Open API Selection Approach for Mashup development (HyOASAM), which consists of two basic approaches: one is a user-story-driven open API discovery approach, and the other is a multidimensional-information-matrix- (MIM-) based open API recommendation approach. The user-story-driven open API discovery approach tackles the first problem. In agile development, user stories are used to capture and describe rapidly changing user requirements. By introducing user stories into Mashup development, Mashup developers can easily capture the role, aim, and motivation of a Mashup and then describe them with NL-based user stories. The MIM-based open API recommendation approach tackles the second problem. We make use of the historical information of Mashup developers (such as search history and access history) to profile each developer, elicit their preferences, and then recommend the most suitable open APIs to them. The open API discovery approach consists of three steps: (1) extracting three components from user stories, (2) extracting three corresponding properties from open API descriptions, and (3) calculating the similarity between the two sets of data. The open API recommendation approach consists of two steps: (1) storing open APIs, Mashups, and the invoking relationships between them in the MIM and (2) calculating the association scores between the Mashup and the open APIs using a factorization machine (FM) model to recommend TOP-N open APIs. Our approach performs attribute extraction well for both long and short texts, and deeper associations can be mined through the FM.
In summary, the main contribution of this paper is HyOASAM, a hybrid open API selection approach that combines user-story-driven open API discovery with MIM-based open API recommendation. The rest of the paper is organized as follows: Section 2 reviews previously established open API discovery and recommendation approaches; Section 3 presents the proposed HyOASAM in detail; Section 4 compares HyOASAM with other approaches; Section 5 draws the conclusions.

Related Work
In this work, we treat open APIs and services as synonyms and use the two concepts interchangeably.

Open APIs Discovery.
Open API discovery is the efficient and accurate retrieval, from a service database, of a set of open APIs or services that meet the needs of users, based on the demand statements entered by users [18]. Generally, discovery approaches can be categorized into the following two main classes.

Syntax-Level Discovery Approaches.
Syntax-level open API discovery is the earliest proposed discovery technique. It matches through several keywords and the grammatical features of the service interface, and the matching mechanism is relatively simple. Typical approaches are as follows.
Massimo and Erl [8] proposed a solution based on DAML-S (DARPA Agent Markup Language for Services) to perform semantic matching between user requirements and service descriptions. Mateos et al. [19] used metaprogramming and other related technologies to develop a set of tools for text mining and service processing of WSDL documents. Paliwal et al. [20] proposed clustering services based on service descriptions and service registration information in UDDI and then used latent semantic indexing (LSI) to achieve matching. Elgazzar et al. [21] improved the accuracy of service discovery by searching key information such as content, service types, messages, and ports in WSDL documents and used the Quality Threshold (QT) clustering algorithm to group services based on this key information.
In general, syntax-level open API discovery approaches are relatively simple to implement and easy to maintain. However, such approaches cannot capture deep semantics. For example, polysemy is a common problem, which inevitably leads to low precision.

Semantic-Level Discovery Approaches.
Ontology was used at the early stage to resolve the heterogeneity of syntax-level service descriptions, strengthening the semantic description of service functions and behaviors. The matching algorithms in semantic-level service discovery rely on logical deduction and reasoning. The continuous development of artificial intelligence has made service discovery algorithms smarter, faster, and more accurate.
Ke et al. [22] transformed user requirements and service description documents into ontology trees; calculated the conceptual similarity, attribute similarity, and structural similarity of corresponding nodes in a hierarchical and classified manner; and thereby effectively avoided complex reasoning. Huang et al. [23] proposed a semantic similarity calculation approach based on ontology distance calculation, in which the AGNES clustering algorithm clusters semantic service sets to improve the efficiency of service discovery. Klusch [24] used service profiles to select services described by OWL-S at the semantic level. Wei et al. [25] proposed a customizable SAWSDL service matcher that extends XQuery with various similarity measures to support multiple matching strategies for different application requests.
Most of the above open API discovery approaches cannot meet the dynamically changing needs of Mashup developers when providing service choices. Moreover, most Mashup developers' requirement descriptions are inaccurate and fail to describe their real needs well, which greatly affects the results of service discovery. In traditional topic-based data extraction models, such as LDA, TF-IDF, HDP, and other topic models, topic extraction generalizes the user's overall needs, so the extracted data deviates from the user's actual needs; that is, such models cannot describe user preferences and needs.

Open APIs Recommendation.
With the rapid increase of open APIs, some APIs that would interest Mashup developers are difficult to find because of their small number of visits. On the other hand, developers often lack reasonable and effective requirement description skills. In such cases, open API discovery cannot be applied appropriately. Open API recommendation tackles this problem and helps maintain the ecology of the service platform. At present, the existing recommendation research can be roughly divided into the following three categories.

Recommendation Based on Functional Characteristics.
The functional characteristics are mainly extracted from service description documents, and the most similar services are recommended by measuring the similarity between the description documents. For example, Cao et al. [26] used a topic model to characterize Mashups, services, and the invoking relationships between them. By integrating the popularity of the service into the model, they predicted links between Mashups and services and then recommended appropriate services for Mashup developers. Most service recommendation approaches based on functional characteristics adopt traditional topic models or keywords for similarity calculation. However, traditional topic models must specify the number of topics in advance when extracting topic vectors, which directly impacts the recommendation results.

Recommendation Based on Quality of Service (QoS).
QoS refers to the nonfunctional features of a service, such as the user's history of invoking services or the quality of the services. Zheng et al. [27] used collaborative filtering over user historical behaviors to estimate the quality of services. Huang et al. [28] proposed a Mashup component recommendation approach that establishes relationships between Mashup components through a generic hierarchical graph model and guides users in selecting components from a large-scale Mashup component library. Xu et al. [29] proposed a social-aware approach, in which a coupling matrix model stores the multidimensional social network among potential users, topics, Mashups, and services, and relationships are predicted from the existing relational networks. However, these approaches suffer from matrix sparsity, which affects recommendation accuracy.

Recommendation Based on Hybrid Characteristics.
Such approaches take into consideration not only the functional characteristics of the service but also the QoS; by combining the two, the accuracy of service recommendation is improved. For example, Gao et al. [30] proposed a manifold ranking framework in which, based on the similarity between Mashups and the heterogeneous relationships between Mashups and services, a manifold ranking algorithm is applied to recommend services. Li et al. [31] proposed an approach that integrates multidimensional information: HDP is used to extract the topic vectors of services and Mashups, which are then used to calculate the similarity between Mashups, the similarity between services, the popularity of services, and the co-occurrence of services. They then used the FM model to score services and recommend the N highest-rated services to Mashup developers. Xia et al. [32] proposed a new class-aware service clustering and distributed approach: first, services are clustered by an extended K-means clustering algorithm, and then service rankings are predicted through a distributed machine learning framework. At present, service recommendation approaches based on hybrid characteristics are a research hotspot due to their high precision, and HyOASAM takes advantage of such approaches.
From Table 1, we can see that the established open API discovery approaches suffer from the randomness of user demand descriptions and open API description texts, which leads to unsatisfactory results, whereas open API recommendation approaches are unable to fully mine hidden information. Besides, taking either the discovery approaches or the recommendation approaches alone can hardly meet the real needs of developers; a hybrid manner is more accurate and comprehensive.

HyOASAM Approach
If a Mashup developer enters a user story such as "as a user, I want to upload and edit photos online so that I can process photos on the server," the user-story-driven open API discovery approach is applied to calculate the similarity between the user story and the open API descriptions. Figure 2 shows the overall framework of the open API discovery approach.

Requirements Components Extraction (Step 1)
Definitions. Developer requirement descriptions are often too casual, so we use user stories to describe open API requirement components [34], for example, "as a user, I want to upload and edit photos online so that I can process photos on the server." Requirement components are the key information of open API requirements and are detailed as follows:
(1) Role (ro) = <noun, adj>: the role that carries out the functionality. The noun is the role, and the adj is a modifier of the noun. In the above example, the role is <user, null>.
(2) Aim (ai) = <verb, do, io, adj>: the aim that developers want to achieve. The verb is the act of the role, do is the direct object, io is the indirect object, and adj is the extension of the corresponding noun. In the above example, the aim is <{upload, edit}, photo, null, online>.
(3) Motivation (mo) = <verb, do, io, adj>: the developers' purpose; its components are the same as those of ai. In the above example, the motivation is <process, photo, server, null>.

Table 1: Comparison of established open API selection approaches (advantage; limitation).
- DAML-S-based matching [8] (open API discovery): the earliest approach, with services described by the DARPA Agent Markup Language; the randomness of user demand descriptions and service description texts leads to unsatisfactory results.
- Service discovery based on text mining [19]: combines text mining and metaprogramming techniques; unable to mine deep relationships.
- Web service discovery based on an ontology [20,22,24]: addresses the issue of nonexplicit service description semantics that match a specific service request; the semantic extension is not enough.
- Web service discovery based on WSDL document clustering [21]: narrows the search space and improves results; each feature is not assigned its own weight.
- Web service discovery based on hierarchical clustering [23]: the vector space model improves accuracy and efficiency; it does not take semantics into consideration.
- Web service discovery based on SAWSDL-iMatcher [25]: multiple matching strategy extensions via XQuery can effectively aggregate similar values; the approach is only useful in one specific domain.
- Open API recommendation based on a topic model [26,31]: the document probability distribution is obtained, and distance is used to calculate semantic distance; the topic model is not well trained.
- Web service recommendation based on collaborative filtering [27]: requires no specialized domain knowledge and can be easily modeled; cannot mine hidden information.
- Model-based recommendation [28]: the generic hierarchical graph model improves efficiency and effectiveness; it cannot synthesize multiple constraints for more personalized recommendation.
- Social-aware recommendation [29]: can predict unobserved relationships; matrix sparsity affects accuracy.
- Manifold-learning-based recommendation [30]: the manifold sorting algorithm enables better clustering; it cannot handle dynamically added services.
- Combining machine learning and distributed recommendation [32]: more accurate prediction; it ignores QoS.
- HyOASAM (open API discovery and recommendation): handles random description text and makes accurate recommendations for unclear user needs; the modeling process is somewhat more complicated than other approaches.
Definition 1 (requirements components). rc = <ro, ai, mo>. A requirements component represents developers' actual needs and is composed of role, aim, and motivation.
In the process of requirements components extraction, we tag each word in the user story, because polysemy would otherwise affect the final result. Then, we extract requirement components based on grammatical dependencies: we use the Stanford Parser [35] to parse the text and extract the Stanford Dependency (SD) set. For example, when the Mashup developer enters the requirement "as a user, I want to upload and edit photos online so that I can process photos on the server," the final extraction result of the requirement components after Step 1 is: <user, null>, <{upload, edit}, photo, null, online>, <process, photo, server, null>.
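The extraction step above can be sketched as follows. This is a minimal illustration that assumes the dependency triples have already been produced by a parser such as the Stanford Parser; the relation labels and the slot-filling rules below are simplified stand-ins for the paper's actual rule set, not its exact implementation.

```python
# Sketch: filling the <ro, ai, mo> slots of a requirements component from
# pre-parsed dependency triples (relation, head, dependent). The relation
# names ("role", "aim_verb", ...) are hypothetical labels for illustration.

def extract_components(deps):
    """deps: list of (relation, head, dependent) triples for one user story."""
    role = {"noun": None, "adj": None}
    aim = {"verb": [], "do": None, "io": None, "adj": None}
    motivation = {"verb": None, "do": None, "io": None, "adj": None}
    for rel, head, dep in deps:
        if rel == "role":            # "as a user" -> role noun
            role["noun"] = dep
        elif rel == "aim_verb":      # verbs under "I want to ..."
            aim["verb"].append(dep)
        elif rel == "aim_dobj":      # direct object of the aim verb
            aim["do"] = dep
        elif rel == "aim_advmod":    # adverbial modifier ("online")
            aim["adj"] = dep
        elif rel == "mot_verb":      # verb under "so that I can ..."
            motivation["verb"] = dep
        elif rel == "mot_dobj":
            motivation["do"] = dep
        elif rel == "mot_iobj":      # indirect object ("server")
            motivation["io"] = dep
    return role, aim, motivation

# Hand-built dependencies for "as a user, I want to upload and edit photos
# online so that I can process photos on the server".
deps = [
    ("role", "as", "user"),
    ("aim_verb", "want", "upload"), ("aim_verb", "want", "edit"),
    ("aim_dobj", "upload", "photo"), ("aim_advmod", "upload", "online"),
    ("mot_verb", "can", "process"), ("mot_dobj", "process", "photo"),
    ("mot_iobj", "process", "server"),
]
ro, ai, mo = extract_components(deps)
print(ro, ai, mo)
```

The output reproduces the components listed above: role <user, null>, aim <{upload, edit}, photo, null, online>, and motivation <process, photo, server, null>.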

Open API Properties Extraction (Step 2). Next, we extract open API properties from the open API description text. The open API description text is generally written by the API developer to describe the API's function and to help developers understand the API and how to use it. Currently, open API description texts are written in NL, for example, "customers can use the service to edit photos and video over the Internet." Open API properties contain the following: (1) Agent (ag) = <noun, adj>: the subject that the open API serves. In the above example, the agent is <customer, null>. (2) Activity: the verb and direct object describing what the open API does; in the above example, the activity is <edit, {photo, video}>. (3) Scenario: the io and adj describing where or how the activity takes place; in the above example, the scenario is <Internet, over>.

Similarity Calculation (Step 3).
This section presents the similarity formula between the user story q and the open API s based on the requirements components and the open API properties. The overall formula is as follows:

sim(q, s) = a × usim(u_q, u_s) + b × asim(a_q, a_s) + c × gsim(g_q, g_s),  (1)

where usim(u_q, u_s) represents the similarity between the role component u_q in user story q (e.g., <user, null>) and the agent property u_s in open API description text s (e.g., <customer, null>); asim(a_q, a_s) represents the similarity between the verb and do parts of the aim and motivation components a_q in user story q (e.g., <{upload, edit}, photo>, <process, photo>) and the verb and do parts of the activity properties a_s in open API description text s (e.g., <edit, {photo, video}>); and gsim(g_q, g_s) represents the similarity between the io and adj parts of the aim and motivation components g_q in user story q (e.g., <null, online>, <server, null>) and the scenario properties g_s (io and adj) in open API description text s (e.g., <Internet, over>). The parameters a, b, and c represent the weights of the three terms in (1), and a + b + c = 1. The specific formulas are as follows:

usim(u_q, u_s) = (1/K) Σ_{k=1}^{K} u_k,  (2)

where K is the number of words in u_q and J is the number of words in u_s; equation (3) for gsim has the same form as (2). In (4), we first vectorize each word with Word2Vec [36] and then calculate the word similarity as the cosine of the word vectors, where w_k^q denotes the word vector corresponding to the k-th word of u_q:

sim(w_k^q, w_j^s) = (w_k^q · w_j^s) / (‖w_k^q‖ ‖w_j^s‖).  (4)

We normalize the similarities of each row to calculate the weight α_{k,j}:

α_{k,j} = sim(w_k^q, w_j^s) / Σ_{j'=1}^{J} sim(w_k^q, w_{j'}^s).  (5)

We use the sum of the weights multiplied by the corresponding similarities as the similarity u_k of each row of u:

u_k = Σ_{j=1}^{J} α_{k,j} × sim(w_k^q, w_j^s).  (6)

The formula for asim(a_q, a_s) is as follows:

asim(a_q, a_s) = (1/n) Σ_{i=1}^{n} max_{1≤j≤m} masim(a_i^q, a_j^s),  (7)

where n is the number of verbs in the aim and motivation of u and m is the number of verbs in the activities of s; masim(a_i^q, a_j^s) represents the similarity between an element in a_q and an element in a_s:

masim(a_q, a_s) = w_1 × sim(V_q, V_s) + w_2 × (1/I) Σ_{i=1}^{I} max_{1≤j≤J} sim(N_i^q, N_j^s).  (9)

In (9), V_q is a verb in the aim or motivation, V_s is a verb in the activity, I and J represent the numbers of nouns in a_q and a_s, respectively, N_i^q is the i-th noun in a_q, N_j^s is the j-th noun in a_s, and w_1 and w_2 are the weights of verbs and nouns, respectively. Using (4), sim(V_q, V_s) and sim(N_i^q, N_j^s) are calculated as similarities between words.
For example, when calculating the similarity between requirements and the open API description text in Step 3, usim(u_q, u_s) and gsim(g_q, g_s) are calculated in the same way. Here, we take gsim(g_q, g_s) as an example. At this time, g_q is <null, online>, <server, null>, and g_s is <Internet, over>.
First, establish the similarity matrix between the words of g_q and the words of g_s:

  sim(online, Internet)  sim(online, over)
  sim(server, Internet)  sim(server, over)

Each entry, for example sim(online, Internet), is calculated using Word2Vec similarity. After obtaining the similarity matrix, calculate the weight α_{k,j} for every similarity of each row; the weight indicates the importance of each element within its row. Then linearly combine the weights with the corresponding elements to get the similarity of each row, and finally take the average of the row values as the final gsim similarity.
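The row-weighted computation just described can be sketched in a few lines of NumPy. The tiny 2-D "word vectors" below are illustrative stand-ins for trained Word2Vec embeddings, chosen only so the example runs self-contained.

```python
import numpy as np

# Sketch of gsim: build a word-similarity matrix between the words of g_q and
# g_s, normalize each row into weights alpha_{k,j}, take the weighted sum per
# row, and average the rows. The toy vectors stand in for Word2Vec embeddings.
vecs = {
    "online": np.array([0.9, 0.1]), "server": np.array([0.2, 0.8]),
    "Internet": np.array([0.8, 0.3]), "over": np.array([0.1, 0.9]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def gsim(q_words, s_words):
    sim = np.array([[cos(vecs[q], vecs[s]) for s in s_words] for q in q_words])
    alpha = sim / sim.sum(axis=1, keepdims=True)   # row-normalized weights (5)
    row_sims = (alpha * sim).sum(axis=1)           # weighted sum per row (6)
    return float(row_sims.mean())                  # average over rows (2)/(3)

score = gsim(["online", "server"], ["Internet", "over"])
print(round(score, 3))
```

With these toy vectors the matching pairs (online/Internet, server/over) dominate their rows, so the score is high; with real embeddings the same mechanics apply.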
Compared with the calculation of gsim, asim needs to calculate the similarity between verbs and between nouns and then linearly combine them with weights. Specifically, first calculate the similarity masim(a_q, a_s) between an element in a_q and an element in a_s. Here a_q is composed of the verb and do parts of the aim and motivation, that is, <{upload, edit}, photo> and <process, photo>, and a_s is composed of the verb and do parts of the activity, that is, <edit, {photo, video}>. Because a_q has three verbs and a_s has only one, n = 3 and m = 1. First calculate the similarity between <upload, photo> and <edit, {photo, video}>: the verb similarity between upload and edit is calculated with Word2Vec. For the noun similarity, because <upload, photo> has only one noun and <edit, {photo, video}> has two, I = 1 and J = 2; calculate sim(photo, photo) and sim(photo, video), take the maximum similarity of each noun k in a_q over all nouns in a_s as the similarity of k to the a_s nouns, and then take the average over all nouns in a_q as the noun similarity between a_q and a_s. Finally, linearly combine the verb weight with the verb similarity and the noun weight with the noun similarity to obtain the final masim(a_q, a_s). Similarly, calculate the similarities of <edit, photo> and <process, photo> with <edit, {photo, video}>. Take the maximum similarity of each element in a_q over all elements in a_s as that element's similarity to a_s (since m = 1, this maximum is the single value). Finally, with n = 3, average the similarities of all elements in a_q to a_s to obtain the final asim.
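The asim/masim walkthrough above can be sketched as follows. The word-similarity lookup table is a hypothetical stand-in for Word2Vec similarities, and the equal verb/noun weights w1 = w2 = 0.5 are an illustrative choice, not the paper's tuned values.

```python
# Sketch of asim/masim: each a_q element keeps its best-matching a_s element;
# within a pair, verb similarity (weight w1) and average best-match noun
# similarity (weight w2) are combined linearly, and element scores averaged.
SIM = {  # hypothetical Word2Vec similarities for the running example
    ("upload", "edit"): 0.6, ("edit", "edit"): 1.0, ("process", "edit"): 0.5,
    ("photo", "photo"): 1.0, ("photo", "video"): 0.7,
}

def wsim(a, b):
    return SIM.get((a, b), SIM.get((b, a), 0.0))

def masim(verb_q, nouns_q, verb_s, nouns_s, w1=0.5, w2=0.5):
    verb_part = wsim(verb_q, verb_s)
    # each noun in a_q keeps only its best match in a_s, then average
    noun_part = sum(max(wsim(nq, ns) for ns in nouns_s)
                    for nq in nouns_q) / len(nouns_q)
    return w1 * verb_part + w2 * noun_part

def asim(a_q, a_s):
    # a_q, a_s: lists of (verb, nouns); each a_q element takes its best match
    return sum(max(masim(vq, nq, vs, ns) for vs, ns in a_s)
               for vq, nq in a_q) / len(a_q)

a_q = [("upload", ["photo"]), ("edit", ["photo"]), ("process", ["photo"])]
a_s = [("edit", ["photo", "video"])]
print(round(asim(a_q, a_s), 3))  # (0.8 + 1.0 + 0.75) / 3 = 0.85
```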
Linearly combining usim, asim, and gsim yields the final similarity between the user story and the open API description text. As shown in Figure 3, the MIM-based open API recommendation approach is generally divided into two steps:

MIM-Based Open API Recommendation
Step 1: The Similar Open APIs Matrix Construction. Open API properties extraction: the open API properties, i.e., the agent, activity, and scenario, are selected as the functional characteristics for open API recommendation. The extraction process is the same as in Section 3.1 and will not be repeated here.
Similarity calculation: the similarity calculation is the same as in Section 3.1.3 and will not be repeated here. A similar open APIs matrix can be obtained by repeated application of the above similarity approach. (1) Mashup description text extraction: the Mashup description text records the detailed information of a Mashup; it is written in NL and in any format, for example, "customers can use the service to upload and edit photos over the Internet." We randomly extract 10,000 Mashup application descriptions from the application library and extract SDs from the application descriptions. Compared with open APIs, Mashups lack user description information.
Therefore, we only extract the activity (named the Mashup activity) and the scenario from the Mashup description text: (1) Mashup activity (ma): the activity provided by the Mashup, which in the example refers to "upload and edit photos." (2) Mashup scenario (ms): the scenario of the Mashup, which in the example refers to "over the Internet." We have identified 10 SDs and classified them into six scenarios to extract the Mashup properties, as shown in Table 2.
Similarity calculation: suppose there are two Mashup descriptions, "customers can use the service to upload and edit photos over the Internet" and "this Mashup allows users to watch pictures on the server or Internet." We calculate the similarity between Mashups through the Mashup description texts and the number of identical open APIs invoked by the Mashups. The specific formula for the similarity of two Mashups is:

sim(M_1, M_2) = u × sim_ma(M_1, M_2) + v × sim_ms(M_1, M_2) + w × sim_se(M_1, M_2),  (10)

where sim_ma(M_1, M_2) represents the similarity of the activities of the two Mashups (e.g., <{upload, edit}, photo> and <watch, picture>), sim_ms(M_1, M_2) represents the similarity of the scenarios of the two Mashups (e.g., <Internet> and <{server, Internet}>), and sim_se(M_1, M_2) represents the similarity of the open APIs invoked by the Mashups. The formula of sim_ma(M_1, M_2) is as follows:

sim_ma(M_1, M_2) = (1/n) Σ_{i=1}^{n} max_{1≤j≤m} masim(a_i^1, a_j^2),  (11)

where n is the number of verbs in the ma of M_1 and m is the number of verbs in the ma of M_2; masim(a_i^1, a_j^2) represents the similarity between an element in the ma of M_1 and an element in the ma of M_2:

Mathematical Problems in Engineering
In (11), V_1 is a verb in the ma of M_1, V_2 is a verb in the ma of M_2, I and J represent the numbers of nouns in the ma of M_1 and the ma of M_2, respectively, N_i^1 is the i-th noun in the ma of M_1, N_j^2 is the j-th noun in the ma of M_2, and w_1 and w_2 are the weights of verbs and nouns, respectively. Using (4), sim(V_1, V_2) and sim(N_i^1, N_j^2) are calculated as similarities between words. The formula of sim_ms(M_1, M_2) is as follows:

sim_ms(M_1, M_2) = (1/k) Σ_{i=1}^{k} max_{1≤j'≤j} sim(M_{1i}, M_{2j'}),  (12)

where k and j respectively represent the numbers of words contained in the ma or ms of the two Mashups M_1 and M_2, M_{11}, ..., M_{1k} represent the words of the ma or ms of M_1, and M_{21}, ..., M_{2j} represent the words of the ma or ms of M_2. The two groups of words are compared pairwise to form a k × j similarity matrix; the maximum of each row is taken as the similarity of the corresponding word, yielding a 1 × k matrix whose entries are averaged. We adopt the Jaccard similarity idea to calculate the similarity of the open APIs invoked by the Mashups:

sim_se(a_i, a_j) = |a_i ∩ a_j| / |a_i ∪ a_j|,  (13)

where a_i and a_j represent different open APIs, |a_i ∩ a_j| represents the total number of times a_i and a_j were invoked by the same Mashup, and |a_i ∪ a_j| represents the sum of the numbers of times a_i and a_j were invoked in the past.
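The co-invocation similarity can be sketched as follows. This is a set-based Jaccard variant with hypothetical Mashup and API names: it counts distinct Mashups rather than invocation times, which is a simplification of the count-based definition above.

```python
# Sketch of sim_se: Jaccard similarity over the sets of Mashups that
# historically invoked each open API. The history and names are illustrative.
history = {
    "M1": {"photo_edit", "storage"},
    "M2": {"photo_edit", "map"},
    "M3": {"storage", "map"},
}

def invoked_by(api):
    """Set of Mashups that invoked the given open API."""
    return {m for m, apis in history.items() if api in apis}

def sim_se(api_a, api_b):
    a, b = invoked_by(api_a), invoked_by(api_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

print(sim_se("photo_edit", "storage"))  # shared by M1 only, out of M1, M2, M3
```

Here photo_edit and storage are co-invoked by one of the three Mashups that invoked either, giving a similarity of 1/3.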

FM Model Score Prediction (Step 2). The FM can learn the feature interactions between Mashups and open APIs, so as to calculate the correlation information between them. The specific formula of the FM is as follows:

ŷ(x) = w_0 + Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} ⟨v_i, v_j⟩ x_i x_j, with ⟨v_i, v_j⟩ = Σ_{f=1}^{k} v_{i,f} v_{j,f},  (15)

where n is the number of features, w_0 is the initial bias term, w_i is the weight of the i-th feature, x_i x_j represents the interaction between the paired feature variables, v_{i,f} and v_{j,f} represent hidden factors between Mashup x_i and open API x_j in the factorization model, and k is the factorization matrix dimension. Figure 4 shows an example of the FM model's input and output. The training data consists of two parts: in this work, we use the MIM matrix as input data and a score value Y as output data. Finally, the FM calculates the target value Y between the Mashup and the open API, and recommendations are offered to Mashup developers by sorting Y.
In the training set, if the active open API is historically invoked by the Mashup, the value in the vector Y is 1; otherwise it is 0. In the test set, the value in vector Y represents the calculated score of the active open API relative to the Mashup. Finally, the final set of recommended open APIs is obtained by ranking the predicted scores.
In the FM model, the model parameters w_0, w, and v are learned from the training examples. In order to obtain the optimal parameter model, a loss function needs to be defined; we define it as the sum of squared errors between the predicted and actual scores over the training examples:

loss = Σ_{(x, y)} (ŷ(x) − y)².  (16)

The input of the FM is a vector; for example, the final MIM vector [0, 1, 0, 1, 0, 0.4, 0, 0.7, 0, 0.3, 0.5, 0, 3] is entered into the trained FM to get the final score.
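The FM score prediction can be sketched as follows. The weights and factor matrix are random stand-ins for trained parameters; the point of the sketch is the standard O(kn) identity that avoids the explicit pairwise sum.

```python
import numpy as np

# Sketch of FM prediction: y = w0 + sum_i w_i x_i + sum_{i<j} <v_i,v_j> x_i x_j,
# using the identity sum_{i<j} <v_i,v_j> x_i x_j
#   = 0.5 * sum_f ((sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2).
rng = np.random.default_rng(0)
n, k = 13, 4                      # feature count (MIM row length), factor dim
w0 = 0.1                          # bias (stand-in for a trained value)
w = rng.normal(scale=0.1, size=n)         # linear weights (stand-ins)
V = rng.normal(scale=0.1, size=(n, k))    # factor matrix (stand-in)

def fm_predict(x):
    linear = w0 + w @ x
    s = V.T @ x                   # sum_i v_{i,f} x_i for each factor f
    s2 = (V ** 2).T @ (x ** 2)    # sum_i v_{i,f}^2 x_i^2
    return float(linear + 0.5 * np.sum(s ** 2 - s2))

# The example MIM vector from the text, scored by the (untrained) model.
x = np.array([0, 1, 0, 1, 0, 0.4, 0, 0.7, 0, 0.3, 0.5, 0, 3], dtype=float)
print(fm_predict(x))
```

Ranking candidate open APIs by this score over their MIM vectors yields the TOP-N list.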

Experimental Results and Analysis
We conducted a series of experiments on the user-story-driven open API discovery approach and the MIM-based open API recommendation approach to evaluate the effectiveness of HyOASAM [37]. In order to evaluate its effectiveness, we established a standard set. We appointed four graduate students with extensive experience in Mashup development to build the standard set. The four students built four different standard open API lists based on their practical development experience. In the end, we used precision as the criterion. Because the standard sets differ, the final results also differ, and we take the average of the four results as the final standard of the experiment.

4.2.
The Experimental Analysis for Open API Discovery. Three requirement components are extracted from each user story. Table 3 shows the result; we select the five different user story domains in the table as the experimental open API requirement texts.

Metric Selection.
We use precision to evaluate the effectiveness of user-story-driven open API discovery. The precision formula is as follows:

precision = |S_A ∩ S_M| / |S_A|,

where S_A represents the requirements components extracted by HyOASAM and S_M represents the manually extracted open API properties.

Parameter Selection.
In (1), a represents the weight of the user role, b represents the weight of the function, and c represents the weight of the motivation. We compared the open API sets obtained from user stories of three different domains against the three standard open API sets. In Figure 5, it can be seen that the parameters a = 0.2, b = 0.6, and c = 0.2 in most cases yield better precision than other parameter settings. This shows that the function is the main factor in the overall similarity.

Comparative Experiment.
We compared the user-story-driven open API discovery approach (USDOAD) with other established open API discovery approaches [38,39].
The two established approaches are as follows: (1) Open API discovery approach based on the vector space model (VSMOAD): we used the VSM to vectorize the preprocessed user story u = {u_1, u_2, u_3, u_4, ..., u_i} and the open API description text s = {s_1, s_2, s_3, ..., s_i}, where i is the size of the corpus vocabulary, and then used cosine similarity:

Sim(u, s) = (u · s) / (‖u‖ ‖s‖).  (20)

(2) Open API discovery approach based on LDA (LDAOAD): LDA is a topic model that represents each document in the document set as a probability distribution over topics, so we used LDA to extract the topic distribution vectors of the user story u and the open API description text s and then used the enhanced cosine similarity:

Sim(a, u) = Σ_{i∈I} (r_{a,i} − r̄_a)(r_{u,i} − r̄_u) / ( sqrt(Σ_{i∈I} (r_{a,i} − r̄_a)²) × sqrt(Σ_{i∈I} (r_{u,i} − r̄_u)²) ).  (21)

Figure 6 shows that our approach is significantly better than the VSMOAD and LDAOAD approaches, but the precision between TOP-20 and TOP-25 drops significantly.
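The enhanced (mean-centered) cosine similarity in (21) can be sketched as follows; the topic-distribution vectors are illustrative.

```python
import math

# Sketch of the enhanced cosine similarity: each vector is centered on its
# own mean before the cosine is taken, as in (21).
def enhanced_cosine(r_a, r_u):
    ma = sum(r_a) / len(r_a)
    mu = sum(r_u) / len(r_u)
    num = sum((a - ma) * (u - mu) for a, u in zip(r_a, r_u))
    den = (math.sqrt(sum((a - ma) ** 2 for a in r_a)) *
           math.sqrt(sum((u - mu) ** 2 for u in r_u)))
    return num / den if den else 0.0

# Two illustrative 3-topic distributions.
print(round(enhanced_cosine([0.6, 0.3, 0.1], [0.5, 0.4, 0.1]), 3))
```

Unlike the plain cosine in (20), the mean-centering makes the measure insensitive to a uniform offset across topics.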
The recommendation results are evaluated with recall, precision, and F-measure:

Recall = |R(A_i) ∩ RM(A_i)| / |R(A_i)|,
Precision = |R(A_i) ∩ RM(A_i)| / |RM(A_i)|,

where R(A_i) represents the open APIs actually invoked by the target Mashup and RM(A_i) represents the open APIs recommended by our approach. F-measure is the harmonic mean of the recall and the precision:

F-measure = 2 × Precision × Recall / (Precision + Recall).

The relationship between the three metrics and the performance of the recommendation algorithm is roughly positive: the larger the recall, precision, and F-measure are, the better the performance of the recommendation approach; otherwise, it performs poorly.
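The three metrics can be computed as follows; the actual and recommended API sets are illustrative.

```python
# Sketch of the evaluation metrics over the actually invoked set R(A_i) and
# the recommended set RM(A_i).
def metrics(actual, recommended):
    hit = len(actual & recommended)
    recall = hit / len(actual) if actual else 0.0
    precision = hit / len(recommended) if recommended else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return recall, precision, f

R = {"maps", "photo_edit", "storage", "payment"}       # actually invoked
RM = {"maps", "photo_edit", "weather", "sms", "storage"}  # TOP-5 recommended
print(metrics(R, RM))  # 3 hits: recall 3/4, precision 3/5, F-measure 2/3
```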

Parameter Selection.
The similarity calculation formula proposed in this paper for Mashups that invoke open APIs contains three parameters, u, v, and w, which correspond to the function of the Mashup, the application scenario, and the similarity of the invoked open APIs, with u + v + w = 1. They directly affect the construction of the MIM, which in turn influences the effect of the FM model. Figure 7 shows the effect of five groups of values of u, v, and w on the recommendation results. It can be seen from Figure 7 that when u, v, and w are 0.6, 0.2, and 0.2, the recommendation performance is better than in the other groups, so our parameters are configured as u = 0.6, v = 0.2, and w = 0.2. We compared the MIM-based approach with three established approaches:
(1) TF-IDF: This approach starts from the similarity between the active open API description document and the target Mashup description document. The evaluation results are calculated in terms of recall, precision, and F-measure [42]. The comparison shows that our approach achieves the highest accuracy on all three metrics.
In Figure 10, our approach achieves better recall than the other three approaches, and recall increases as N increases. In Figure 11, although precision decreases as N increases, our approach is still the best. As shown in Figure 12, the average F-measure of the MIM-based approach is 2.21% higher than LDA-FM, 4.60% higher than E-LDA, and 15.81% higher than TF-IDF. In all cases, TF-IDF has the worst performance: it uses only the frequency of word occurrences to vectorize words, regardless of the underlying semantic relevance behind them. The MIM-based approach, LDA-FM, and E-LDA reveal the semantic relevance of open API and Mashup description documents, so they calculate the similarities with higher accuracy. With HyOASAM, Mashup developers can dynamically get a list of open APIs that matches their requirements and select the open APIs they want. The experiments show that HyOASAM improves both precision and recall. In the future, we will consider employing word embeddings and attention models in the NLP techniques, so that the semantic relationships between words can be fully extracted.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.