
We propose a Markov decision process model for solving the Web service composition (WSC)
problem. Iterative policy evaluation, value iteration, and policy iteration algorithms are used to
experimentally validate our approach, with artificial and real data. The experimental results
show the reliability of the model and the methods employed, with policy iteration performing best: it requires the fewest iterations to estimate an optimal policy with the highest Quality of Service attributes. Our experimental work shows that a WSC problem involving a set of 100,000 individual Web services, where a valid composition requires the selection of 1,000 services from the available set, can be solved in the worst case in less than 200 seconds, using an Intel Core i5 computer with 6 GB RAM. Moreover, a real WSC problem involving only 7 individual Web services requires less than 0.08 seconds, using the same computational power. Finally, a comparison with two popular reinforcement learning
algorithms, sarsa and Q-learning, is also presented.

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network, with an interface described in a machine-processable format called the Web Services Description Language (WSDL) [

When a Web service is requested, all available Web services descriptions must be matched with the requested description, so that an appropriate service with the desired functionality can be found. However, since the number of available Web services is continuously growing year by year, finding the best match is not a trivial problem anymore, especially if we take into account that the matching criteria must consider not only the desired functionality, but also other attributes such as execution cost, security, performance, and so forth.

If individual Web services are not able to meet complex requirements, they can be combined to create composite services [

Some approaches to solve the WSC problem have focused on different graph-based algorithms [

The use of methods based on Markov decision processes (MDPs) for the composition problem is certainly not new. In [

Solutions based on reinforcement learning are also relevant. For instance, in [

The goal of automatic WSC is to determine a sequence of Web services that can be combined to satisfy a set of predefined QoS constraints. For problems where we need to find the sequence of actions maximizing an overall performance function, MDPs are among the most robust mathematical tools available. Therefore, in this paper we propose an MDP model to solve the WSC problem. To show the reliability of our model, we conducted experiments with three of the most studied algorithms: policy iteration, iterative policy evaluation, and value iteration. Although all three algorithms provided good solutions, policy iteration required the minimum number of iterations to converge to the optimal solutions. We also compared these three algorithms against sarsa and Q-learning, two popular reinforcement learning algorithms.

This paper is structured as follows. Section

The WSC problem can be abstracted as the problem of selecting a sequence of actions in such a way that we maximize an overall evaluation function. Sequential decision problems of this kind can be defined and solved in an MDP framework. An MDP is a tuple (S, A, P, R, γ), where S is a finite set of states, A is a finite set of actions, P(s′ | s, a) is the probability of reaching state s′ when action a is executed in state s, R(s, a) is the reward received for executing action a in state s, and γ ∈ [0, 1) is a discount factor.

The MDP dynamics is the following. An agent in state s ∈ S executes an action a ∈ A; the environment then returns a reward R(s, a) and takes the agent to a new state s′ with probability P(s′ | s, a).

As the agent goes through states, it collects a reward at each timestep.

The reward at timestep t is weighted by the discount factor γ^t, so that immediate rewards count more than rewards received far in the future.

A policy is defined as a function π : S → A that specifies which action to execute in each state. The value function of a policy π, denoted V^π(s), is the expected sum of discounted rewards obtained when starting in state s and following π thereafter.

The optimal value function is defined as V*(s) = max_π V^π(s), for every state s ∈ S.

This function gives the best possible expected sum of discounted rewards that can be obtained using any policy

The optimal value function satisfies the Bellman optimality equation, V*(s) = max_a Σ_{s′} P(s′ | s, a)[R(s, a) + γV*(s′)].

When the state transition probabilities are known, dynamic programming can be used to solve the Bellman optimality equation.
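As a concrete illustration of this dynamic programming approach, the following is a minimal value iteration sketch in Python. The array-based MDP layout (state rewards R[s], transition tensor P[a, s, s′]) is our own assumption for exposition, not the paper's code.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Compute optimal state values of a finite MDP by value iteration.

    P: array of shape (A, S, S); P[a, s, s'] is the transition probability.
    R: array of shape (S,); immediate reward associated with each state.
    Returns the optimal values and the corresponding greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = expected return of taking action a in state s
        Q = R + gamma * (P @ V)          # shape (A, S)
        V_new = Q.max(axis=0)            # greedy Bellman backup over actions
        if np.max(np.abs(V_new - V)) < theta:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

For example, on a hypothetical two-state MDP where action 1 moves the agent to a rewarding absorbing state, the returned greedy policy selects action 1 in the non-rewarding state.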


Iterative policy evaluation proceeds as follows.

(1) initialize V(s) arbitrarily, for all states s

(2) repeat

(3) Δ ← 0

(4) for each state s: v ← V(s)

(5) V(s) ← Σ_{s′} P(s′ | s, π(s))[R(s, π(s)) + γV(s′)]

(6) Δ ← max(Δ, |v − V(s)|)

(7) until Δ < θ (a small positive threshold)

Value iteration proceeds as follows.

(1) initialize V(s) arbitrarily, for all states s

(2) repeat

(3) Δ ← 0

(4) for each state s: v ← V(s)

(5) V(s) ← max_a Σ_{s′} P(s′ | s, a)[R(s, a) + γV(s′)]

(6) Δ ← max(Δ, |v − V(s)|)

(7) until Δ < θ (a small positive threshold)

(8) output the greedy policy π(s) = arg max_a Σ_{s′} P(s′ | s, a)[R(s, a) + γV(s′)]
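Policy iteration, the third dynamic programming algorithm used in this work, alternates exact policy evaluation with greedy policy improvement. The sketch below is illustrative (the tabular layout with state rewards R[s] and transition tensor P[a, s, s′] is our own assumption), not the authors' implementation.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite MDP.

    P: array of shape (A, S, S); P[a, s, s'] is the transition probability.
    R: array of shape (S,); immediate reward associated with each state.
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R exactly
        P_pi = P[policy, np.arange(n_states)]            # (S, S)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
        # Policy improvement: act greedily with respect to V
        new_policy = (R + gamma * (P @ V)).argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return V, policy                             # policy is stable
        policy = new_policy
```

Because each evaluation step solves the linear system exactly, policy iteration typically needs very few improvement sweeps, which is consistent with its low iteration counts reported in the experiments.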

The last two algorithms are known to usually converge faster than the first one. Moreover, policy iteration and value iteration are standard algorithms for solving MDPs, and there is currently no universal agreement over which algorithm is better [

In this section we define the MDP model used to represent and solve the Web service composition problem by means of dynamic programming algorithms.

We begin by describing the WSC problem in more detail. Individual Web services can be categorized into classes by their functionality, input data, and output data. Given

Now, we are ready to introduce our model. We define a Web service composition problem as an MDP

Formally, we say that

For example, if the current state represents the composition
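To make the model concrete, here is a small Python sketch of a layered composition instance in the spirit of the model above. The QoS values and the reward weighting are illustrative assumptions; the service names follow the real scenario described later in the paper.

```python
# Hypothetical layered WSC instance: 2 classes (layers). Each candidate
# service carries (availability, execution_time_seconds); both values
# below are made up for illustration only.
layers = [
    {"NOAA": (0.99, 0.8), "GlobalWeather": (0.95, 0.5), "WeatherChannel": (0.90, 0.3)},
    {"ConvertTemperature": (0.98, 0.2), "TempConvert": (0.97, 0.4)},
]

def qos_reward(availability, exec_time, w_avail=1.0, w_time=0.5):
    # Reward: maximize availability, penalize execution time
    # (the weights are assumptions, not the paper's actual formula).
    return w_avail * availability - w_time * exec_time

# In this layered graph every service of layer i can be followed by any
# service of layer i + 1, so total reward is additive across layers and
# the optimal composition picks, per class, the highest-reward service;
# this is the same policy an MDP solver recovers on this graph.
composition = [max(layer, key=lambda s: qos_reward(*layer[s])) for layer in layers]
```

The per-layer greedy shortcut holds only because transitions are unconstrained between consecutive layers; with inter-service compatibility constraints, the full MDP machinery is required.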

In this section we provide the results of our experimental comparison using two scenarios, one real and one artificial. The experiments presented in this section were performed by running the policy iteration, iterative policy evaluation, and value iteration algorithms on an Intel Core i5 2.5 GHz processor with 6 GB RAM, under a 64-bit Windows 8.1 operating system.

The WSC problem considered as our first experimental scenario consists of 2 classes of Web services. One class is about weather services that can be used to obtain the current temperature in a city. The other class is about Web services that can be used to convert temperatures from one metric unit to another, for example, from Fahrenheit to Celsius. In the class of weather services we considered 3 different Web services.

National Oceanic and Atmospheric Administration (NOAA) Web service, available at

GlobalWeather Web service, available at

Weather channel Web service, available at

In the class of metric units conversion services we considered 4 different Web services.

A simple calculator Web service such as the one available at

ConvertTemperature Web service, available at

TemperatureConversions Web service, available at

TempConvert Web service, available at

We obtained the QoS attribute values of all 7 Web services using a Java program designed to measure each attribute.

In order to obtain representative QoS values for the Web services, we took many measurements over several days, at different times of day. From the values recorded for each parameter and measurement, we then computed the average value of each QoS parameter.
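The averaging step can be sketched as follows; the measurement formulas below are common textbook definitions of availability, execution time, and throughput, and are assumptions on our part, since the original formulas are not reproduced here.

```python
from statistics import mean

def summarize_qos(samples):
    """Average repeated QoS measurements for one Web service.

    `samples` is a list of (succeeded, response_time_seconds) pairs
    collected at different times of day.
    """
    # Availability: fraction of calls that succeeded.
    availability = sum(ok for ok, _ in samples) / len(samples)
    # Execution time: mean round-trip time of successful calls.
    ok_times = [t for ok, t in samples if ok]
    avg_time = mean(ok_times) if ok_times else float("inf")
    # Throughput: requests served per second, from the average time.
    throughput = 1.0 / avg_time if ok_times else 0.0
    return {"availability": availability,
            "execution_time": avg_time,
            "throughput": throughput}
```

A usage example: four probes, one of which failed, yield an availability of 0.75 and an average execution time over the three successful calls.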

Once we had gathered the QoS attribute information, we used all 3 dynamic programming algorithms to learn the best composite Web service. With 7 Web services belonging to 2 different classes, there are 12 possible compositions. All these possibilities are represented by the graph illustrated in Figure

Graph for the real scenario with 2 classes of Web services. The first class contains 3 Web services and the second class contains 4 Web services. Each class is illustrated as a layer of nodes.

The graph of the real scenario illustrates each class of Web services as a layer. In this graph, each node represents an individual Web service. Node

Results with the real Web services scenario are plotted in Figure

Learning times for the real scenario.

As our second scenario to test all 3 dynamic programming algorithms, we simulated data for three QoS attributes: availability, execution time, and throughput. We created a maximum of 100,000 individual Web services, classified into hypothetical classes (layers) of 100 Web services each. We assumed that every Web service in a class

Graph for an artificially generated Web composition problem with a maximum of 1,000 selected nodes. Each node is selected out of 100 possible individual Web services belonging to the same class (layer).

As in the first scenario, node

Results of this second set of experiments are shown in Figures

Learning times with

Learning times with

Learning times with

Each layer in the graph represents 100 Web services belonging to the same class. Therefore, when the number of nodes to be selected for a valid Web service composition is 1,000, we are really solving a problem with 100 × 1,000 = 100,000 Web services. We can see from the learning curves that the time needed to solve the MDP problem increases as the number of nodes is increased. Again, all 3 algorithms found the optimal solution, but policy iteration found it in less time. The best performances of the algorithms were obtained for
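An artificial scenario of this shape can be generated along the following lines; the sampling ranges for the three QoS attributes are our own assumptions, not the distributions used in the original experiments.

```python
import random

def make_scenario(n_layers=1000, services_per_layer=100, seed=42):
    """Generate random QoS data for a layered WSC problem.

    Each layer is a service class; each service gets (availability,
    execution time in seconds, throughput in requests/second) values
    drawn from illustrative ranges.
    """
    rng = random.Random(seed)
    return [[{"availability": rng.uniform(0.9, 1.0),
              "execution_time": rng.uniform(0.05, 2.0),
              "throughput": rng.uniform(1.0, 50.0)}
             for _ in range(services_per_layer)]
            for _ in range(n_layers)]

# A small instance for illustration: 10 layers of 100 services each,
# i.e. 1,000 individual Web services in total.
scenario = make_scenario(n_layers=10)
total_services = sum(len(layer) for layer in scenario)
```

Scaling `n_layers` to 1,000 reproduces the 100 × 1,000 = 100,000-service setting described above.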

In some related works [

Sarsa [

(1) initialize Q(s, a) arbitrarily, for all states s and actions a

(2) repeat (for each episode)

(3) initialize s

(4) choose a from s using the policy derived from Q (e.g., ε-greedy)

(5) repeat (for each step of the episode)

(6) take action a; observe reward r and next state s′

(7) choose a′ from s′ using the policy derived from Q (e.g., ε-greedy)

(8) Q(s, a) ← Q(s, a) + α[r + γQ(s′, a′) − Q(s, a)]

(9) s ← s′; a ← a′

(10) until s is terminal

(11) until the desired number of episodes has been run

If the policy is such that each action is executed infinitely often in every state, every state is visited infinitely often, and the policy is greedy with respect to the current action-value function in the limit, then by decaying the exploration parameter ε appropriately, sarsa converges to the optimal action-value function and policy.
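A compact tabular implementation of the sarsa pseudocode above can be sketched as follows; the environment interface (`step(s, a)` returning reward, next state, and a termination flag) is an assumption for illustration, not the authors' code.

```python
import random

def sarsa(n_states, n_actions, step, alpha=0.1, gamma=0.9,
          epsilon=0.1, episodes=2000, seed=0):
    """Tabular sarsa. `step(s, a)` returns (reward, next_state, done)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]

    def eps_greedy(s):
        if rng.random() < epsilon:
            return rng.randrange(n_actions)       # explore
        return max(range(n_actions), key=lambda a: Q[s][a])  # exploit

    for _ in range(episodes):
        s, a = 0, eps_greedy(0)
        done = False
        while not done:
            r, s2, done = step(s, a)
            a2 = eps_greedy(s2)
            # On-policy update: bootstrap on the action actually chosen next
            target = r + gamma * (0.0 if done else Q[s2][a2])
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
    return Q
```

On a hypothetical one-step task where only action 1 is rewarded, the learned action values correctly prefer action 1 in the start state.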

(1) initialize Q(s, a) arbitrarily, for all states s and actions a

(2) repeat (for each episode)

(3) initialize s

(4) repeat (for each step of the episode)

(5) choose a from s using the policy derived from Q (e.g., ε-greedy)

(6) take action a; observe reward r and next state s′

(7) Q(s, a) ← Q(s, a) + α[r + γ max_{a′} Q(s′, a′) − Q(s, a)]

(8) s ← s′

(9) until s is terminal

(10) until the desired number of episodes has been run

If, in the limit, the action-values of all state-action pairs are updated infinitely often, then with a decaying learning rate α, Q-learning converges to the optimal action-value function.
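The Q-learning pseudocode above differs from sarsa only in its off-policy update, which bootstraps on the best next action rather than the one actually taken. A minimal tabular sketch, again with an assumed `step(s, a)` environment interface:

```python
import random

def q_learning(n_states, n_actions, step, alpha=0.1, gamma=0.9,
               epsilon=0.1, episodes=2000, seed=0):
    """Tabular Q-learning. `step(s, a)` returns (reward, next_state, done)."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)      # explore
            else:
                a = max(range(n_actions), key=lambda a_: Q[s][a_])  # exploit
            r, s2, done = step(s, a)
            # Off-policy update: bootstrap on the best next action
            best_next = 0.0 if done else max(Q[s2])
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s2
    return Q
```

On a hypothetical two-step chain where the reward is only obtained by taking action 1 in the second state, the value of that action approaches 1 and propagates back to the start state through the max bootstrap.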

We have implemented sarsa and Q-learning and compared their learning times with those of the dynamic programming algorithms.

Learning times required for a real scenario of Web service composition, plotted in logarithmic scale. Reinforcement learning methods required more than two orders of magnitude more time than dynamic programming methods.

Additionally, we ran experiments with a second artificially created scenario, with 3 layers of 20 Web services each. Once more, reinforcement learning methods required much more time than the dynamic programming algorithms. Logarithmic time curves are given in Figure

Learning times required for a simulated scenario with 3 layers of 20 Web services. Curves plotted in logarithmic scale show that reinforcement learning methods required ten times more time than dynamic programming algorithms to handle the same kind of problem.

Dynamic programming methods converge faster than reinforcement learning methods simply because dynamic programming methods update every state value at each iteration, whereas reinforcement learning methods only update the values of the states they happen to visit, as determined by their exploration policy (ε-greedy in our experiments).

Furthermore, in terms of the deployment of an automatic Web service composition system, it is worth mentioning that the gathering of QoS information can be performed at specific time intervals by a dedicated module of such a system. Once this information, which is fundamental for the evaluation of the reward function, has been gathered, there is no need to explore the state space of Web services as reinforcement learning methods do. We can simply run a dynamic programming algorithm to estimate the value function of the Web services and then compute the optimal composition of Web services.

In this paper we have proposed an MDP model to address the Web service composition problem. We used three dynamic programming algorithms, namely, iterative policy evaluation, value iteration, and policy iteration, to show the reliability of our approach. Experiments were conducted with both artificially created data and a set of real data involving seven publicly available Web services.

Our experimental results show that policy iteration is the best one in terms of the minimum number of iterations needed to estimate an optimal policy. The optimal policy indicates the sequence of combined individual Web services making up a composite Web service with the highest evaluation of their QoS attributes.

Although some approaches using reinforcement learning have also been proposed, we argue that dynamic programming methods are better suited for the Web service composition problem than reinforcement learning methods. The reason is that reinforcement learning methods such as sarsa and Q-learning must explore the state space of Web services through repeated interaction, updating only the states they visit, and therefore need considerably more time to converge.

None of the related works proposing the use of MDP-based methods to solve the Web service composition problem have provided a comparison study involving the five algorithms that we have analyzed in this work: iterative policy evaluation, value iteration, policy iteration, sarsa, and Q-learning.

Future research on this topic must address real Web services composition involving more nodes. Another interesting subject that deserves to be further investigated is the design of complex reward functions capable of handling an increasing number of QoS factors.

The authors declare that there is no conflict of interests regarding the publication of this paper.

The authors would like to thank the Secretaria de Educacion of Mexico for the partial support through Grant PIFI-2013-31MSU0098J-14.