Service Composition Recommendation Method Based on Recurrent Neural Network and Naive Bayes

Due to a lack of domain and interface knowledge, it is difficult for users to create service processes suited to their needs. Thus, this paper puts forward a new service composition recommendation method. The method consists of two steps. The first step is service component recommendation based on a recurrent neural network (RNN): when a user selects a service component, the RNN algorithm recommends other matching services to the user, aiding the completion of a service composition. The second step is service composition recommendation based on Naive Bayes: when the user completes a service composition, a Bayesian classifier models the user's interests, taking their diversity into account, and other service compositions that satisfy those interests are recommended to the user. Experiments show that the proposed method can accurately recommend relevant service components and service compositions to users.


Introduction
With the rapid development of Web 2.0, users increasingly participate in the creation of web content. However, it is becoming more and more difficult to meet users' complex needs with a single service. Thus, users have begun to combine different services into their own service compositions [1][2][3]. A service composition arranges several services in a certain logical order to form an integrated application. For example, IFTTT (If This Then That) has been used to customize smog alerts sent by SMS. IFTTT started a new trend, shifting users from creating content to creating service compositions. Traditional service systems, however, are too complicated and scale poorly, which makes it difficult for users to combine services. Therefore, lightweight Web APIs have become the future direction of service composition, owing to their easy access, extensibility, and ease of development. User-oriented lightweight service composition allows users to drag and drop service components on a lightweight service composition platform to generate a new service sequence, thereby fulfilling their individual needs. Generally speaking, lightweight service composition platforms support graphical components encapsulated by third parties, such as RSS/Atom feeds, web services, and various programming APIs (Google Maps, Flickr). Users can create service compositions through a visual interface without programming skills. Both industry and academia have shown great interest in this user-oriented lightweight service composition method.
Although users have embraced service composition tools, they still need strategic guidance when combining lightweight services [4]. This guidance includes initial user guidance and user interest extraction. For initial user guidance, at the beginning of service selection, when a user selects a service component, other service components that are strongly associated with the selected service should be added to the recommendation list, because they may match the user's still-unknown interests. For user interest extraction, because user interests are diverse, the user's current interest should be modelled from the user's selections, and other service compositions matching that interest should be recommended to the user.
For these reasons, this paper puts forward a service composition recommendation method based on a recurrent neural network and Naive Bayes. The method is divided into two stages: (1) When a user's initial interests are unknown, the method uses the RNN algorithm to recommend the n service components most correlated with the user's selection. (2) When the user completes a service composition, a Bayesian classifier models the user's diverse interests, and other service compositions suited to those interests are recommended to the user. In this method, the RNN algorithm recommends related services to users, which helps alleviate the problem of service mismatch. The Naive Bayes algorithm provides users with other service compositions that satisfy their interests. This both accommodates the diversity of user interests and promotes the reuse of high-quality service compositions in the template library. Experiments show that the proposed method can accurately recommend service components and service compositions to users.

Related Works
Earlier research mainly used topic models, for instance Latent Dirichlet Allocation (LDA) [5], to obtain latent topics that improve recommendation accuracy. However, training a topic model is time-consuming. Subsequently, matrix factorization was widely applied in service recommendation [6]. Because matrix factorization is not suitable for general prediction tasks, a general predictor named Factorization Machines (FM) [7] was proposed. Building on Factorization Machines, Cao et al. [8] proposed a Self-Organizing Map-based functionality clustering algorithm and a Deep Factorization Machine-based quality prediction algorithm to recommend API services. In addition, to address the sparsity of historical interactions, Cao et al. [9] used topic models to extract the relationships between mashups and to model their latent topics. Although these methods produced satisfactory results, traditional service recommendation approaches usually overlook the dynamic nature of usage patterns. Therefore, Bai et al. [10] suggested incorporating both textual content and historical usage to build latent vector models for service recommendation. To address the cold-start problem, Ma et al. [11] proposed learning the interaction information between candidate services and mashups from content and history; a multilayer perceptron then predicts the rank of candidate services from the resulting interaction vectors. Using users' historical service access records, Gao et al. [12] applied a PLSA-based semantic analysis model to capture user interests and recommend services that match user preferences.
In recent years, some researchers have begun to study service recommendation from the perspective of Quality of Service (QoS) [9, 13–19]. Focusing on network resource consumption, Zhou et al. [13] used integer nonlinear programming to model microservice mashup problems and designed an approximation algorithm for the resulting NP-hard problem. Xia et al. [14] proposed determining each service's virtual cost from the service's attributes and the user's preference, and recommending the service composition with the least total virtual cost. In terms of service function, Almarimi et al. [20] balanced the objectives of maximizing service co-usage, functional diversity, and functional matching, and applied the nondominated sorting genetic algorithm to extract an optimal set of services for creating a mashup. Shin et al. [21] proposed a service composition method based on functional semantics, and Shi et al. [22] employed a Long Short-Term Memory (LSTM) based method with a functional attention mechanism and a contextual attention mechanism to recommend services. In terms of semantic relevance, Ge et al. [23] suggested using existing service compositions and semantic associations to expand the scope of service recommendation. Duan et al. [24] integrated the probabilistic matrix factorization (PMF) model and the probabilistic latent semantic indexing (PLSI) model to recommend services, using PLSI to model user access records. By mining historical execution trajectories, He et al. [25] discovered latent behavior patterns based on context and user characteristics, and established context-related and preference-related probability models of user activity selection. These models support the construction and recommendation of optimized personalized mashups.
In summary, two kinds of recommendation are needed. First, when a user clicks on the service "Flickr" after having clicked "Facebook," the system should recommend other services linked to the sequence Facebook -> Flickr. Second, after the user finishes a service process, the diversity of user interests makes it necessary to recommend other service processes that may be of interest. However, most previous schemes address only one of these needs, which weakens the user experience. In addition, service component recommendation based on association rules ignores the sequential order of service components and therefore has relatively low recommendation accuracy. Service composition recommendation based on QoS mainly focuses on users' nonfunctional needs, whereas in the mobile Internet and 5G era users pay more attention to their functional requirements. Therefore, this paper proposes a service composition recommendation method based on the RNN and Naive Bayes. The RNN captures the sequential order of service components, and Naive Bayes identifies users' potential interests from component functions and provides users with high-quality service processes from the template library.

Service Component Recommendation Based on Recurrent Neural Network.
In this section, service compositions are first fed to the RNN for training. Training is divided into two phases: forward propagation and back propagation. First, the error losses of the output layers at different time steps are obtained through forward propagation. Then, from the cross-entropy error losses, the weight increments ∇U, ∇V, ∇W are calculated through back propagation. Finally, the weights U, V, W are updated by gradient descent.

Preprocessing of Service Process Call Records.
To train a suitable RNN, the service process call records must be preprocessed into input data and predefined output data. For each service composition in the training set, the last service component is deleted and the remaining components are inserted into the list x_data as one list element. Likewise, the first service component is deleted and the remaining components are inserted into the list y_data as the corresponding list element. For example, given the two service compositions Facebook -> Flickr -> GoogleMaps and Time -> Weather -> Text, where -> denotes the link between consecutive service components, x_data = [["Facebook", "Flickr"], ["Time", "Weather"]] and y_data = [["Flickr", "GoogleMaps"], ["Weather", "Text"]]. Here x_data[j] is the input data of forward propagation, and y_data[j] is the predefined output data of back propagation. Before training, x_data[j] and y_data[j] are converted to one-hot vectors: if there are L services in the dictionary and a service is at position k, the service is represented as an L-dimensional vector whose kth dimension is 1 and whose remaining dimensions are all 0.
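The preprocessing step can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the helper names build_pairs, build_dictionary, and one_hot are hypothetical, and the dictionary is assumed to assign positions in first-seen order.

```python
from typing import Dict, List, Tuple

def build_pairs(compositions: List[List[str]]) -> Tuple[List[List[str]], List[List[str]]]:
    """Drop the last component to form inputs and the first component to form targets."""
    x_data = [comp[:-1] for comp in compositions]
    y_data = [comp[1:] for comp in compositions]
    return x_data, y_data

def build_dictionary(compositions: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct service a position in the dictionary (first-seen order)."""
    vocab: Dict[str, int] = {}
    for comp in compositions:
        for service in comp:
            if service not in vocab:
                vocab[service] = len(vocab)
    return vocab

def one_hot(service: str, vocab: Dict[str, int]) -> List[int]:
    """L-dimensional vector with 1 at the service's position and 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[service]] = 1
    return vec

compositions = [["Facebook", "Flickr", "GoogleMaps"], ["Time", "Weather", "Text"]]
x_data, y_data = build_pairs(compositions)
vocab = build_dictionary(compositions)
print(x_data)                    # [['Facebook', 'Flickr'], ['Time', 'Weather']]
print(y_data)                    # [['Flickr', 'GoogleMaps'], ['Weather', 'Text']]
print(one_hot("Flickr", vocab))  # [0, 1, 0, 0, 0, 0]
```

Each pair (x_data[j][t], y_data[j][t]) then means "given this component, the next one should be that component", which is exactly the supervision signal the RNN is trained on.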

Forward Propagation.
The forward propagation process of the RNN is shown in Figure 1. Here U is the weight matrix between the input layer and the hidden layer, V is the weight matrix between the hidden layer and the output layer, and W is the weight matrix connecting the hidden layers at adjacent time steps.
At time $t$, $x_t$ is the input value and $s_t$ is the hidden-layer state, which depends on the input $x_t$ and the previous hidden state $s_{t-1}$: $s_t = f(Ux_t + Ws_{t-1})$, where $f$ is the hidden-layer activation function; in this paper, $f = \tanh$. $\hat{y}_t$ is the output value at time $t$: $\hat{y}_t = g(Vs_t)$, where $g$ is the output-layer activation function; in this paper, $g = \mathrm{softmax}$.
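A minimal NumPy sketch of this forward pass is shown below. It assumes a zero initial hidden state $s_0$, an arbitrary hidden size, and small random weights; the function name forward and these settings are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift for numerical stability; result is unchanged
    return e / e.sum()

def forward(xs, U, V, W):
    """Compute s_t = tanh(U x_t + W s_{t-1}) and y_hat_t = softmax(V s_t) for each step.

    xs: list of one-hot input vectors of length L.
    Returns the list of hidden states and the list of output distributions.
    """
    s_prev = np.zeros(W.shape[0])  # assumed zero initial state s_0
    states, outputs = [], []
    for x_t in xs:
        s_t = np.tanh(U @ x_t + W @ s_prev)
        states.append(s_t)
        outputs.append(softmax(V @ s_t))
        s_prev = s_t
    return states, outputs

L, H = 6, 8  # dictionary size; the hidden size H is an illustrative choice
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(H, L))  # input -> hidden
W = rng.normal(scale=0.1, size=(H, H))  # hidden at t-1 -> hidden at t
V = rng.normal(scale=0.1, size=(L, H))  # hidden -> output
xs = [np.eye(L)[0], np.eye(L)[1]]       # one-hot "Facebook", then "Flickr"
states, outputs = forward(xs, U, V, W)
print([round(float(p.sum()), 6) for p in outputs])  # each output is a distribution
```

Because $g$ is softmax, each $\hat{y}_t$ is a probability distribution over the L services in the dictionary, which is what later allows the top-ranked services to be recommended.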

Back Propagation.
The RNN applies back propagation as follows: it sums the error losses of the output layers at all time steps to obtain the total error loss E, computes the gradient of E with respect to each weight U, V, W to obtain the increments ∇U, ∇V, ∇W, and finally updates each weight U, V, W by gradient descent.
(1) Error Loss Function. For each time step $t$, there is an error loss $e_t$ between the RNN output $\hat{y}_t$ and the predefined output $y_t$. Using cross entropy as the error loss function, the total error loss is $E = \sum_{t=1}^{N} e_t$ with $e_t = -\sum_{i=1}^{L} y_t(i)\ln \hat{y}_t(i)$, where $N$ is the length of x_data[j] (equivalently, of y_data[j]), $L$ is the length of the one-hot vector, and $x_t \in$ x_data[j], $y_t \in$ y_data[j].
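(2) Gradient Calculation. Because $V$ does not depend on previous states, $\nabla V$ is relatively easy to obtain. $\nabla W$ and $\nabla U$, however, require the chain rule through time. Define the error variation of the output layer as $\delta^o_t = \partial e_t / \partial \hat{y}_t$ and the error variation of the hidden layer recursively as
$$\delta^h_t = \frac{\partial e_t}{\partial \hat{y}_t}\frac{\partial \hat{y}_t}{\partial s_t} + \delta^h_{t+1}\frac{\partial s_{t+1}}{\partial s_t}, \qquad \delta^h_{N+1} = 0,$$
so that $\delta^h_t$ collects the loss at step $t$ together with all later losses that depend on $s_t$. Then $\nabla U$, $\nabla V$, $\nabla W$ can be expressed as
$$\nabla V = \sum_{t=1}^{N} \delta^o_t \frac{\partial \hat{y}_t}{\partial V}, \qquad \nabla W = \sum_{t=1}^{N} \delta^h_t \frac{\partial s_t}{\partial W}, \qquad \nabla U = \sum_{t=1}^{N} \delta^h_t \frac{\partial s_t}{\partial U},$$
where $\partial s_t/\partial W$ and $\partial s_t/\partial U$ are taken with $s_{t-1}$ held fixed.
Figure 1: The forward propagation process of RNN.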

Scientific Programming
In the RNN, back propagation proceeds from the last time step to the first: at each time step $t = N, \ldots, 1$, the contribution of that step is added to the weight increments ∇U, ∇V, ∇W.
(3) Weight Update. When training on a service composition is completed, the RNN uses gradient descent to update U, V, W along the negative gradient direction:
$$U \leftarrow U - lr \cdot \nabla U, \qquad V \leftarrow V - lr \cdot \nabla V, \qquad W \leftarrow W - lr \cdot \nabla W.$$
Here the initial U, V, W are randomly generated, and lr is the step length of the gradient descent method.
After U, V, W have been updated, the loop is repeated until the total error loss E falls below a threshold. The trained weights U, V, W are then used to predict outputs from the input data.
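The following NumPy sketch puts forward propagation, back propagation through time, and the gradient-descent update together on the two toy compositions used earlier. It assumes a zero initial state, one update per composition, and hypothetical values for the hidden size, lr, and the stopping threshold; it illustrates the procedure rather than reproducing the authors' implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def train_step(xs, ys, U, V, W, lr=0.1):
    """Train on one composition: forward pass, BPTT, then one gradient-descent update.

    xs, ys: lists of one-hot vectors (inputs x_t and targets y_t).
    U, V, W are updated in place; the total loss E = sum_t e_t is returned.
    """
    H = W.shape[0]
    s = {-1: np.zeros(H)}  # s[-1] is the assumed zero initial state
    y_hat = {}
    E = 0.0
    # Forward propagation: s_t = tanh(U x_t + W s_{t-1}), y_hat_t = softmax(V s_t)
    for t, x_t in enumerate(xs):
        s[t] = np.tanh(U @ x_t + W @ s[t - 1])
        y_hat[t] = softmax(V @ s[t])
        E -= float(np.sum(ys[t] * np.log(y_hat[t])))  # cross-entropy e_t
    dU, dV, dW = np.zeros_like(U), np.zeros_like(V), np.zeros_like(W)
    carry = np.zeros(H)  # gradient reaching s_t from time step t + 1
    # Back propagation through time, from the last step to the first
    for t in reversed(range(len(xs))):
        d_out = y_hat[t] - ys[t]         # de_t / d(V s_t) for softmax + cross entropy
        dV += np.outer(d_out, s[t])
        d_s = V.T @ d_out + carry        # dE / ds_t: local loss plus later losses
        d_pre = d_s * (1.0 - s[t] ** 2)  # back through tanh
        dU += np.outer(d_pre, xs[t])
        dW += np.outer(d_pre, s[t - 1])
        carry = W.T @ d_pre              # passed on to step t - 1
    # Gradient descent along the negative gradient direction
    U -= lr * dU
    V -= lr * dV
    W -= lr * dW
    return E

# Toy data; positions follow Facebook=0, Flickr=1, GoogleMaps=2, Time=3, Weather=4, Text=5
L, H = 6, 8                                  # hidden size H is an illustrative choice
eye = np.eye(L)
data = [([0, 1], [1, 2]), ([3, 4], [4, 5])]  # (x_data[j], y_data[j]) as positions
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(H, L))
V = rng.normal(scale=0.1, size=(L, H))
W = rng.normal(scale=0.1, size=(H, H))
threshold = 0.05                             # hypothetical stopping threshold on E
losses = []
for epoch in range(2000):
    E = sum(train_step([eye[i] for i in x], [eye[i] for i in y], U, V, W) for x, y in data)
    losses.append(E)
    if E < threshold:
        break
print(f"epochs run: {len(losses)}, final loss: {losses[-1]:.4f}")
```

The line computing d_s is where the chain rule through time appears: the hidden-state gradient at step t combines the loss at t with the gradient carried back from step t + 1, which is how the error variation of the hidden layer accumulates contributions from later time steps.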

Service Components Recommendation.
When a user selects a service component, that component is fed to the RNN, which uses the weights U, V, W obtained in Section 3.1.3 to compute the probabilities of the following services. The top n predicted services are added to the recommendation list and presented to the user.
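A sketch of this recommendation step follows, assuming trained weights (random ones stand in here) and the tanh/softmax network described above. The name recommend_next is hypothetical, and excluding services the user has already selected is an assumption of this sketch rather than something the text specifies.

```python
import numpy as np

def recommend_next(selected, vocab, U, V, W, n=5):
    """Return up to n services ranked by the RNN's predicted next-step probability.

    selected: services chosen so far, in order; vocab: service -> dictionary position.
    Already-selected services are excluded (an assumption of this sketch).
    """
    index_to_service = {i: s for s, i in vocab.items()}
    s = np.zeros(W.shape[0])  # assumed zero initial hidden state
    for service in selected:
        x = np.zeros(len(vocab))
        x[vocab[service]] = 1.0  # one-hot input
        s = np.tanh(U @ x + W @ s)
    logits = V @ s
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()  # softmax distribution over the next service
    ranked = [index_to_service[int(i)] for i in np.argsort(-probs)]
    return [svc for svc in ranked if svc not in selected][:n]

vocab = {"Facebook": 0, "Flickr": 1, "GoogleMaps": 2, "Time": 3, "Weather": 4, "Text": 5}
rng = np.random.default_rng(1)
H = 8  # illustrative hidden size
U = rng.normal(scale=0.1, size=(H, len(vocab)))
W = rng.normal(scale=0.1, size=(H, H))
V = rng.normal(scale=0.1, size=(len(vocab), H))
print(recommend_next(["Facebook", "Flickr"], vocab, U, V, W, n=3))
```

Feeding the whole selected sequence, not only the last component, is what lets the hidden state carry the context Facebook -> Flickr into the prediction.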

Service Composition Recommendation Based on Naive Bayes.
The service components selected by a user are first reduced using information gain, and the Naive Bayes classifier then extracts the user's interests from the reduced service component set. Finally, similar service compositions are recommended to the user according to these interests. A Bayesian classifier can quickly and efficiently identify a user's interest from the few service components the user has clicked, and compositions with similar interests in the template library can be matched directly against the components the user has clicked.

Information Gain.
After a user finishes a service composition, the user's interests must be determined from that composition. To reduce interference from noncritical service components, the information gain algorithm is used to reduce the service component set: the gain value IG(s) of each service component in the composition is calculated, the components are sorted by IG(s), and the first n components form the reduced service component set. The process is as follows: (1) The entropy of each service component $SC_j$ in the service composition $SC$, $H(SC_j|SC)$, is calculated.

Here $P(c_i|SC_j)$ represents the probability of the service component $SC_j$ belonging to interest $c_i$, and $P(\overline{c_i}|SC_j)$ denotes the probability of $SC_j$ not belonging to interest $c_i$. $n(SC_j|c_i)$ is the number of service compositions in interest $c_i$ that include $SC_j$, and $n(\overline{SC_j}|c_i)$ is the number of service compositions in interest $c_i$ that exclude $SC_j$. $n(c_i)$ is the number of service compositions in interest $c_i$, and $P(c_i)$ is the proportion of all service compositions that belong to interest $c_i$. (4) The service components are sorted by their information gain value, and the first n service components form the reduced service component set.
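The sketch below assumes the standard information-gain definition used for feature selection, IG = H(C) - P(present) H(C | present) - P(absent) H(C | absent), computed from whether a component appears in each labelled composition. Treat this as an assumption about the exact form; the function names and the toy interest labels photo and weather are illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (base 2) of a label distribution given as counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(component: str, labelled: List[Tuple[List[str], str]]) -> float:
    """IG of a component w.r.t. interest labels, based on its presence or absence."""
    all_labels = Counter(label for _, label in labelled)
    with_c = Counter(label for comp, label in labelled if component in comp)
    without_c = Counter(label for comp, label in labelled if component not in comp)
    n = len(labelled)
    n_with = sum(with_c.values())
    h_cond = 0.0
    if n_with:
        h_cond += n_with / n * entropy(with_c.values())
    if n - n_with:
        h_cond += (n - n_with) / n * entropy(without_c.values())
    return entropy(all_labels.values()) - h_cond

def reduce_components(composition: List[str], labelled, n: int) -> List[str]:
    """Keep the n components of the composition with the highest information gain."""
    ranked = sorted(set(composition), key=lambda sc: information_gain(sc, labelled), reverse=True)
    return ranked[:n]

labelled = [
    (["Flickr", "GoogleMaps"], "photo"),
    (["Flickr", "Facebook"], "photo"),
    (["Time", "Weather"], "weather"),
    (["Weather", "Text"], "weather"),
]
print(reduce_components(["Time", "Weather", "Text"], labelled, 1))  # ['Weather']
```

"Weather" gets the maximum gain of 1 bit because it appears in every weather composition and no photo composition, so it alone determines the interest; "Time" and "Text" appear in only one composition each and carry less information. The log base only rescales the gains and does not change the ranking.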

User Interest Modeling.
According to the reduced service component set, the Naive Bayes classifier is used to determine the user interests. The process is as follows:
(1) For the reduced service component set obtained in Section 3.2.1, the Naive Bayes classifier computes the probability $P(c_i|SC)$ that the set belongs to each interest category $c_i$, where $SC$ denotes the sequence of reduced service components $(SC_1, SC_2, \ldots, SC_n)$. By Bayes' formula, and assuming that $SC_1, SC_2, \ldots, SC_n$ are conditionally independent given $c_i$:
$$P(c_i|SC) = P(c_i|SC_1, SC_2, \ldots, SC_n) \propto P(SC_1, SC_2, \ldots, SC_n|c_i)\,P(c_i), \qquad P(SC_1, SC_2, \ldots, SC_n|c_i) = \prod_{j=1}^{n} P(SC_j|c_i). \quad (6)$$
(2) From formula (6), $P(c_i|SC) \propto P(c_i)\prod_{j=1}^{n} P(SC_j|c_i)$. This paper selects the interest category with the highest probability as the user interest:
$$c^{*} = \arg\max_{c_i} P(c_i)\prod_{j=1}^{n} P(SC_j|c_i). \quad (7)$$
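A minimal Python sketch of the interest decision in formula (7) follows. The text does not state how $P(SC_j|c_i)$ is estimated, so this sketch assumes a multinomial Naive Bayes with Laplace smoothing (alpha = 1). It sums logarithms instead of multiplying probabilities, which avoids numerical underflow without changing the argmax. The function names and toy data are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labelled):
    """Collect interest counts n(c_i) and per-interest component counts."""
    class_counts = Counter(label for _, label in labelled)
    comp_counts = defaultdict(Counter)
    vocab = set()
    for comp, label in labelled:
        comp_counts[label].update(comp)
        vocab.update(comp)
    return class_counts, comp_counts, vocab

def classify(reduced, model, alpha=1.0):
    """Return argmax over c_i of log P(c_i) + sum_j log P(SC_j | c_i) (Laplace-smoothed)."""
    class_counts, comp_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        denom = sum(comp_counts[c].values()) + alpha * len(vocab)
        score = math.log(n_c / total)  # log P(c_i)
        for sc in reduced:
            score += math.log((comp_counts[c][sc] + alpha) / denom)  # log P(SC_j | c_i)
        if score > best_score:
            best, best_score = c, score
    return best

labelled = [
    (["Flickr", "GoogleMaps"], "photo"),
    (["Flickr", "Facebook"], "photo"),
    (["Time", "Weather"], "weather"),
    (["Weather", "Text"], "weather"),
]
model = train_naive_bayes(labelled)
print(classify(["Weather", "Text"], model))  # weather
print(classify(["Flickr"], model))           # photo
```

Smoothing keeps a component that never co-occurred with an interest from forcing that interest's probability to zero, which matters when the reduced set contains rare components.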

Service Compositions Recommendation.
Once the user interest has been determined by the Naive Bayes classifier, the N-gram distance is used to compute the distance between service compositions, and service compositions are recommended to the user in order of decreasing similarity. The process is as follows: (1) From the service composition data set, select the service compositions consistent with the user interest. (2) Use the N-gram distance to compute the distance between each selected service composition and the reduced service component set, and recommend the service compositions most similar to the reduced set, as shown in the following formula:
$$D(SC_p, SC_q) = |GN(SC_p)| + |GN(SC_q)| - 2\,|GN(SC_p) \cap GN(SC_q)|. \quad (8)$$
Here, $|GN(SC_p)|$ and $|GN(SC_q)|$ denote the numbers of service components in the service compositions $SC_p$ and $SC_q$, and $|GN(SC_p) \cap GN(SC_q)|$ is the number of service components the two compositions share. (3) The similarity between two service compositions increases as their distance decreases, so compositions are recommended in ascending order of distance.
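The three steps can be sketched in Python as below. The sketch assumes the common N-gram distance |G(p)| + |G(q)| - 2|G(p) ∩ G(q)|, with N-grams taken over the component sequence and counted as a set; with n = 1 the count is simply the number of distinct service components. The function names and the toy data set are hypothetical.

```python
from typing import List, Set, Tuple

def ngrams(seq: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Distinct N-grams of a component sequence (n = 1 gives the distinct components)."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def ngram_distance(p: List[str], q: List[str], n: int = 1) -> int:
    """|G(p)| + |G(q)| - 2|G(p) & G(q)|; a smaller distance means more similar."""
    gp, gq = ngrams(p, n), ngrams(q, n)
    return len(gp) + len(gq) - 2 * len(gp & gq)

def recommend_compositions(reduced, interest, dataset, k=3, n=1):
    """Step (1): keep compositions labelled with the user's interest.
    Step (2): rank them by N-gram distance to the reduced component set.
    Step (3): return the k closest, i.e. most similar, compositions."""
    candidates = [comp for comp, label in dataset if label == interest]
    candidates.sort(key=lambda comp: ngram_distance(reduced, comp, n))
    return candidates[:k]

dataset = [
    (["Time", "Weather", "Text"], "weather"),
    (["Flickr", "GoogleMaps"], "photo"),
    (["Weather", "Text"], "weather"),
    (["Time", "Text"], "weather"),
]
print(recommend_compositions(["Weather", "Text"], "weather", dataset, k=2))
# [['Weather', 'Text'], ['Time', 'Weather', 'Text']]
```

Setting n = 2 would compare consecutive component pairs instead, so two compositions would count as close only if they link the same components in the same order.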

Experiments
The experiments are designed to verify the effectiveness of the RNN and Naive Bayes. Section 4.1 describes the data sets used by the methods of Sections 3.1 and 3.2. Section 4.2 evaluates the link prediction performance of the RNN, including the number of training iterations, the precision, and the training time compared with the traditional Apriori and N-gram algorithms.

Dataset.
This paper uses service process call records and the service composition data set from the ProgrammableWeb website. The service process call records cover 20,035 users. To improve the precision of the experiments, records of inactive users are eliminated: users who call service processes fewer than 3 times are regarded as inactive, which leaves 11,730 service process call records for the experiments.
The service composition data set includes 13,082 service processes, labelled with 24 classification categories.

The Number of Iterations.
This paper adopts the free-running mode for training. The training results are shown in Figure 2, and the mean loss is defined in formula (9), where E is the loss of each iteration round. As the number of iterations increases, the mean loss of each epoch gradually decreases, and the RNN converges when the number of iterations reaches 2000.

Algorithm Comparison.
This section compares the RNN algorithm with the traditional Apriori algorithm and the N-gram algorithm. Apriori is a common association rule algorithm in data mining that is widely used in recommendation systems. N-gram is also used in recommendation systems, and it can effectively reduce the recommendation space by learning the context. The comparison results demonstrate the feasibility of the RNN algorithm.
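For reference, an N-gram next-component recommender of the kind used as a baseline can be sketched as follows. The paper does not give its N-gram configuration, so the context length n = 2 (a bigram model), the function names, and the toy compositions are assumptions.

```python
from collections import Counter, defaultdict

def train_ngram(compositions, n=2):
    """Count which component follows each context of n - 1 preceding components."""
    table = defaultdict(Counter)
    for comp in compositions:
        for i in range(n - 1, len(comp)):
            table[tuple(comp[i - n + 1:i])][comp[i]] += 1
    return table

def recommend_ngram(table, selected, n=2, top_p=5):
    """Return up to top_p most frequent successors of the last n - 1 selected components."""
    context = tuple(selected[max(0, len(selected) - (n - 1)):])
    return [svc for svc, _ in table.get(context, Counter()).most_common(top_p)]

compositions = [
    ["Facebook", "Flickr", "GoogleMaps"],
    ["Facebook", "Flickr", "Text"],
    ["Flickr", "GoogleMaps"],
]
table = train_ngram(compositions, n=2)
print(recommend_ngram(table, ["Facebook", "Flickr"]))  # ['GoogleMaps', 'Text']
```

Because the model conditions only on the last n - 1 components, anything earlier in the sequence is ignored. This is the Markov limitation that the comparison below attributes to the N-gram baseline, whereas the RNN's hidden state can in principle carry the whole selected sequence.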
(1) Comparison of the Recommendation Precision among RNN(1), Apriori(1), and N-gram(1). RNN(1), Apriori(1), and N-gram(1) denote the RNN, Apriori, and N-gram recommendation algorithms after the user has called one service component.
As shown in Figure 3, the ordinate is the recommendation precision of service components, defined in formula (10). Here, $L(Rec(sc_1 \ldots sc_i))$ is the number of service components recommended for the called component sequence $sc_1 \ldots sc_i$, $sc_{i+1}$ is the component actually linked after $sc_1 \ldots sc_i$, and $\cap$ denotes intersection, so $L(Rec(sc_1 \ldots sc_i) \cap sc_{i+1})$ equals 0 or 1. The abscissa Top-P is the number of service components to be recommended. In practice, because of the predefined threshold $T = 0.42$, $L(Rec(sc_1 \ldots sc_i)) \le$ Top-P. The precision of RNN(1) is higher than those of Apriori(1) and N-gram(1). RNN(1) performs best when Top-P is 5, where its precision is 0.41, compared with 0.17 for Apriori(1) and 0.24 for N-gram(1). This is because the RNN and the N-gram learn the linked relationships between service components through training, whereas Apriori learns only the correlations between service components and cannot capture their linked order. Moreover, the N-gram is constrained by its Markov assumption, so the RNN learns context better than the N-gram.
(2) Comparison of the Recommendation Precision among RNN(2), Apriori(2), and N-gram(2). As shown in Figure 4, RNN(2), Apriori(2), and N-gram(2) denote the RNN, Apriori, and N-gram recommendation algorithms after the user has called two components. When the user initially selects more than one service component, RNN(2) still outperforms Apriori(2) and N-gram(2): when Top-P is 5, the recommendation precision of RNN(2) is 0.85, that of Apriori(2) is 0.65, and that of N-gram(2) is 0.79. At this setting, RNN(2), Apriori(2), and N-gram(2) all achieve higher precision than RNN(1), Apriori(1), and N-gram(1). This is mainly because a longer initial component sequence leaves fewer related candidate components to follow it, which raises the recommendation precision.
(3) Comparison of Training Time. As shown in Figure 5, when the training data is small, Apriori(2) and N-gram(2) train faster than RNN(2). As the training data grows, however, the amount of data processed by Apriori(2) and N-gram(2) increases exponentially, and RNN(2) trains faster than both. When the data density is 80%, RNN(2) takes 720 minutes to train, while Apriori(2) takes 1065 minutes and N-gram(2) takes 1123 minutes.

The Classification Performance of Naive Bayes.
As shown in Figure 6, precision%(classification) is the precision of classification prediction on the service composition data set: the predicted label is compared with the real label to obtain the classification precision of the algorithm. The classification precision increases as the amount of training data increases. When 80% of the data is used for training and 20% for testing, the classification precision of Naive Bayes reaches 89.1%. Figure 7 analyzes the recommendation performance of the N-gram distance. As the length of the recommendation list increases, the recommendation precision first increases and then decreases; it peaks at 21.3% when the list length is 13.

Conclusions
To better assist users in their decision-making, this paper proposes a service composition recommendation method based on the RNN and Naive Bayes. This method makes the following contributions: (1) Unlike traditional algorithms, it uses context learning to reduce the recommendation space and provides users with more accurate service-linked components. (2) To accommodate the diversity of user interests, it uses interest modeling to recommend other service processes that meet users' current interests, which effectively promotes the reuse of the template library.
It is worth noting, however, that the interest modeling of Naive Bayes does not take semantic similarity into consideration. Future research will therefore consider using semantic analysis to model user interests.
Data Availability
The data included in this paper are available without any restriction.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.