Scheduling Multilevel Deadline-Constrained Scientific Workflows on Clouds Based on Cost Optimization

This paper presents a cost optimization model for scheduling scientific workflows on IaaS clouds such as Amazon EC2 or RackSpace. We assume multiple IaaS clouds with heterogeneous virtual machine instances, with a limited number of instances per cloud and hourly billing. Input and output data are stored on a cloud object store such as Amazon S3. Applications are scientific workflows modeled as DAGs as in the Pegasus Workflow Management System. We assume that tasks in the workflows are grouped into levels of identical tasks. Our model is specified using mathematical programming languages (AMPL and CMPL) and allows us to minimize the cost of workflow execution under deadline constraints. We present results obtained using our model and the benchmark workflows representing real scientific applications in a variety of domains. The data used for evaluation come from the synthetic workflows and from general purpose cloud benchmarks, as well as from the data measured in our own experiments with Montage, an astronomical application, executed on the Amazon EC2 cloud. We indicate how this model can be used for scenarios that require resource planning for scientific workflows and their ensembles.


Introduction
Today, science requires processing of large amounts of data and use of hosted services for compute-intensive tasks [1]. Cloud services are used not only to provide resources, but also for hosting scientific datasets, as in the case of AWS public datasets [2]. Scientific applications that run on these clouds often have the structure of workflows or workflow ensembles that are groups of interrelated workflows [3]. Infrastructure as a service (IaaS) cloud providers offer services where virtual machine instances differ in performance and price [4]. Planning computational experiments requires optimization decisions that take into account both execution time and resource cost.
Research presented in this paper can be seen as a step towards developing a "cloud resource calculator" for scientific applications in the hosted science model [1]. Specifically, we address the cost optimization problem of large-scale scientific workflows running on multiple heterogeneous clouds, using mathematical modeling with AMPL [5] and CMPL [6], and mixed integer programming. This approach allows us to describe the model mathematically and use a set of available optimization solvers. On the other hand, an attempt to apply this method to the general problem of scheduling large-scale workflows on heterogeneous cloud resources would be impractical due to the problem complexity and therefore simplified models need to be analyzed. In our previous work [7], we used a similar technique to solve the problem where the application consists of tasks that either are identical or vary in size within a small range. As observed in [8,9], large-scale scientific workflows often consist of multiple parallel stages or levels, each of which has the structure of a set of tasks; that is, the tasks in each level are similar and independent of each other. In the case of large workflows, when the number of tasks in the level is high, it becomes more practical to optimize the execution of the whole level instead of looking

Related Work in Cloud Workflow Scheduling
Our work is related to heuristic algorithms for workflow scheduling on IaaS clouds. In [12], the model assumes that infrastructure is provided by only one provider. The cloud-targeted autoscaling solution [10] considers dynamic and unpredictable workloads containing workflows. In [13], a multiobjective list-based method for workflow scheduling (MOHEFT) is proposed and evaluated. The solution presented in [14] focuses on a cloud bursting scenario, where a private cloud is combined with a public one, and the goal is to minimize the cost while maintaining the workflow deadline. Our work is different from these approaches in two aspects. First, in our infrastructure model we assume multiple heterogeneous clouds with object storage attached to them, instead of individual machines with peer-to-peer data transfers between them. Moreover, rather than scheduling each task individually, our method proposes a global optimization of placement of workflow tasks and data. The deadline-constrained cost optimization of scientific workloads on heterogeneous IaaS described in [15] addresses multiple providers and data transfers between them, where the application is a set of tasks. The global cost minimization problem on clouds addressed in [16] focuses on data transfer costs and does not address workflows. Other approaches presented in [17,18] consider unpredictable dynamic workloads on IaaS clouds and optimize the objectives, such as cost, runtime, or utility function, by autoscaling the resource pool at runtime.
Pipelined workflows consisting of stages are addressed in [19]. The processing model is a data flow and multiple instances of the same workflow are executed on the same set of cloud resources, whereas in our approach we focus on cost optimization instead of meeting the QoS constraints.
The integer linear programming (ILP) method is applied to scheduling workflows on hybrid clouds in [20]. The objective is to minimize monetary cost under a deadline constraint. The scheduler uses varying discretization of the schedule timeline to reduce the complexity of the problem so that the employed CPLEX solver can find acceptable solutions within a 10-minute limit. The evaluation, however, is performed on the Montage and random fork-join workflows of 30 tasks with randomly chosen runtimes, while we focus on larger scale workflows and we address the complexity by grouping tasks into levels.

Methodology Based on Mathematical Optimization
The core of our methodology (see Figure 1) is to use mathematical modeling languages that can be coupled with a set of solvers dedicated to linear, nonlinear, or mixed integer programming problems. As modeling languages we use AMPL [5], as it is one of the most advanced modeling languages, and CMPL [6], as its open source alternative. These languages provide interfaces to a wide set of solvers, both commercial, such as CPLEX [21], and open source, such as CBC [22]. The mathematical programming approach enables us to formally define the optimization problem. AMPL (a mathematical programming language) and CMPL (COIN mathematical programming language) are algebraic mathematical modeling languages that resemble traditional mathematical notation to describe variables, objectives, and constraints. Algebraic modeling languages allow expressing a wide range of optimization problems: linear, nonlinear, and integer. AMPL offers the more advanced feature set, while CMPL is easier to use in open source projects.
AMPL or CMPL enables us to separate model definition and instance-specific data, usually into three files: model, data, and calling script. The model file defines the abstract optimization model: sets and parameters, objective and constraints. The data file populates the sets and parameters with the numbers for the particular instance of the problem. Both model and data files are loaded from the calling script, which may do some pre- or postprocessing. In addition, it is possible to import and export data and results into some external format such as YAML for analysis or integration with external programs.
The input to the solver has to be prepared in the form of a problem description. We separate the problem into an application model (in this case the leveled workflows) and an infrastructure model (cloud consisting of compute sites running virtual machines and object storage such as Amazon S3). In addition, a scheduling model has to be defined, specifying how to calculate the objective and constraints using the application and infrastructure models. The challenge in the scheduling model is that it has to be developed to allow the solver to find a solution in a reasonable amount of time, so it must incorporate appropriate assumptions, constraints, and approximations. We discuss these assumptions in detail in Section 5.
The scheduling problems that we deal with in this paper are formulated as mixed integer programming (MIP) problems. This class of optimization problems has linear objective and constraints, while some or all of the variables are integer-valued. Such problems are solved using a branch-and-bound approach that uses a linear solver to solve subproblems. Moreover, the solvers can relax the integrality of the variables in order to estimate the solution, since no integer solution can be better than the solution of the same problem in the continuous domain. The difference between the best integer solution found and the noninteger bound can be used to estimate the accuracy of the solution and to reduce the search time (see Section 7.1).
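The bounding argument can be illustrated on a toy problem. The following Python sketch is not part of the paper's models: the function names, the single covering constraint, and the numbers are assumptions chosen for illustration. It selects whole instances of two hypothetical types to cover a demand of tasks per hour and compares the continuous relaxation with the integer optimum found by enumeration.

```python
from itertools import product
import math


def relaxation_bound(demand, types):
    """Optimal cost when instance counts may be fractional.

    With a single covering constraint, the relaxed optimum buys only the
    type with the lowest price per unit of capacity.
    """
    return demand * min(price / capacity for capacity, price in types)


def integer_optimum(demand, types):
    """Optimal cost with whole instances only, found by enumeration."""
    # No optimal solution needs more instances of a type than would cover
    # the whole demand with that type alone.
    limits = [math.ceil(demand / capacity) for capacity, _ in types]
    best = math.inf
    for counts in product(*(range(limit + 1) for limit in limits)):
        capacity = sum(n * cap for n, (cap, _) in zip(counts, types))
        if capacity >= demand:
            cost = sum(n * price for n, (_, price) in zip(counts, types))
            best = min(best, cost)
    return best


# (tasks per hour, price per hour); illustrative values only
types = [(4, 1.0), (7, 2.0)]
bound = relaxation_bound(10, types)
best = integer_optimum(10, types)
```

In this example the relaxed problem yields a lower cost than any integer solution, so the relaxed value serves as a lower bound on what the solver can still hope to find.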
In this paper, we describe two alternative scheduling models: for workflows with fine-grained and coarse-grained tasks. This is motivated by the observation [11] that the granularity of the tasks in the workflows has significant influence on the results of the optimization. The best results can be obtained when the average runtime of the tasks is similar to the billing cycle of the cloud provider, such as 1 hour on Amazon EC2. To address this issue, we developed another scheduling model for fine-grained tasks and deadlines shorter than one hour, which corresponds to the real characteristics of the Montage workflow.
The scheduling models have to be provided with the actual values of parameters, consisting of the application data and infrastructure data. To evaluate our models, we use two sources of application data: synthetic workflows obtained from the workflow generator gallery [23] and real data obtained from our recent benchmarks performed on Amazon EC2. As infrastructure parameters, we use two sources: CloudHarmony benchmarks [24] that publish CPU performance of selected cloud providers and our own application-specific benchmark results. For research presented in this paper, we selected the Montage workflow and EC2 cloud as an example of a real workflow and infrastructure.
In the following sections, we describe the models and datasets used in more detail.

Application and Infrastructure Models
In this paper we focus on large-scale scientific workflows [23]. Examples of such workflows come from a wide variety of domains including bioinformatics (Epigenomics [25], SIPHT [26]), astronomy (Montage [27]), earthquake science (CyberShake [28]), and physics (LIGO [29]). Such workflows typically consist of a large number of computationally intensive tasks, processing large amounts of data.
We assume that each workflow may be represented with a directed acyclic graph (DAG) where nodes in the graph represent computational tasks, and the edges represent data- or control-flow dependencies between the tasks. Each task has a set of input and output files. We assume that the task and file sizes are known in advance.
Based on the characteristics of large-scale workflows, we assume that a workflow is divided into several levels that can be executed sequentially and that tasks within one level do not depend on each other (see Figure 2). Each level represents a set of tasks that can be partitioned into several groups (A, B, etc.) that share computational cost and input/output size. We assume that only one task group is executed on a specific cloud instance. This forbids instance sharing between multiple levels, which means that each application may need its own specific VM template.
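The division of a DAG into levels can be sketched as follows. The paper does not state how levels are derived, so this Python sketch assumes a simple depth-based rule (a task's level is one more than the deepest level among its parents); the function names and the example graph are hypothetical.

```python
from collections import defaultdict


def assign_levels(parents: dict[str, list[str]]) -> dict[str, int]:
    """Map each task to its level: 1 + the deepest level among its parents."""
    levels: dict[str, int] = {}

    def level_of(task: str) -> int:
        if task not in levels:
            levels[task] = 1 + max(
                (level_of(p) for p in parents[task]), default=0
            )
        return levels[task]

    for task in parents:
        level_of(task)
    return levels


def group_by_level(levels: dict[str, int]) -> dict[int, list[str]]:
    """Collect tasks per level; tasks within a level are mutually independent."""
    grouped = defaultdict(list)
    for task, level in levels.items():
        grouped[level].append(task)
    return {level: sorted(tasks) for level, tasks in sorted(grouped.items())}


# Hypothetical workflow: two parallel tasks, one merge task, two parallel tasks
dag = {"a1": [], "a2": [], "b1": ["a1", "a2"], "c1": ["b1"], "c2": ["b1"]}
workflow_levels = group_by_level(assign_levels(dag))
```

Because a task always sits one level below its deepest parent, no two tasks in the same level can depend on each other, which is the property the scheduling models rely on.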
Similar to what is in [7], we assume multiple heterogeneous cloud IaaS infrastructures such as Amazon EC2, RackSpace, or ElasticHosts. Clouds have heterogeneous virtual machine instance types, with limits on the number of instances per cloud, for example, 20 for EC2 and 15 for RackSpace. Input and output data are stored on a cloud object store such as Amazon S3 or RackSpace CloudFiles. In our model, all virtual machine instances are billed per hour of usage, and there are fees associated with data transfer in/out of the cloud. In the application model, we also assume that there is a small constant cost of execution of a single task, which can correspond, for example, to the cost of a request to the queuing system such as Amazon SQS. The model allows us to include a private cloud where costs are set to 0.
For evaluation, we use synthetic workflows that were generated using historical data from real applications [23], as well as the data from our own measurements. The synthetic workflows were generated using code developed in [30], with task runtimes based on distributions gathered from running real workflows. The experimental data come from execution of the Montage workflow on Amazon EC2 using the HyperFlow workflow management system [31].

Formulation of the Scheduling Problem
In this section we give the mathematical formulation of the models, beginning with application and infrastructure models, and then describe the scheduling models for coarse-grained and fine-grained workflows. We have intentionally decided to present the problem in a form which is different from the routine way of stating mathematical programming problems. The main reason was to make it easily understood by researchers engaged in workflow execution optimization.
To perform optimization of the total cost of the workflow execution, a mixed integer programming (MIP) problem is formulated and implemented using a mathematical programming language. First, we implemented the optimization model using AMPL [5] and solved it with the CPLEX solver; then we ported it to the open source CMPL [6] and solved it with the CBC solver. Both systems require specifying input datasets and variables to define the search space, as well as constraints and an objective function to be optimized.

Application and Infrastructure Model
Input Data. The formulation requires a number of input sets to represent the infrastructure model, in a way similar to the approach presented in [7]. The infrastructure is described with several sets, two of which enable one to describe the locality between compute and storage resources. This is an important aspect, since the cloud providers typically charge for the cost of data transfer out of a cloud site, while the transfers within the site are free.
Each instance type $i \in I$ is described with a set of parameters. Additionally, we need to provide data transfer rates $r_{i,s}$ between a given storage site $s$ and instance $i$ in MiB per second.
Our application model is different from that in [7] because it is designed for workflow scheduling where tasks are grouped into levels. This fact is described with the following characteristics:
(i) $L$: a set of levels the workflow is divided into,
(ii) $G$: a set of task groups (A, B, etc., in Figure 2); tasks in groups have the same computational cost and input/output size,
(iii) $G_l \subset G$: a set of task groups belonging to a level $l \in L$,
(iv) $A^{tot}_g$: number of tasks in a group $g \in G$,
(v) $t^{x}_g$: execution time in hours of a single task in a group $g$ on a machine with the processor performance of 1 CloudHarmony Compute Unit (CCU) [32],
(vi) $d^{in}_g$, $d^{out}_g$: data size for input and output of a task in group $g$ in MiB,
(vii) $p^{R}$: price per task for a queuing service, such as Amazon SQS,
(viii) $t^{D}$: total time allowed for completing workflow (deadline).
The application model assumes that the estimated execution time $t^{x}_g$ is known in advance; that is, it is obtained using benchmarks or other estimation methods [33], such as regression or performance modelling. When using general purpose cloud benchmarks, such as CloudHarmony [24], which provide processor performance measured in CCU, $t^{x}_g$ depends only on the task group $g$, since we assume that the actual task execution time on a specific instance is inversely proportional to the processing speed of the instance expressed in the number of CCU. As this is not always the case, since different tasks may have different processing speeds on different instances, it is also possible to provide execution time predictions at instance level: $t^{x}_{g,i}$. The scheduling model can use such data if it is available. In Section 6.2 we provide an example of such a dataset for the Montage workflow on Amazon EC2.
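The two ways of estimating task runtime can be sketched as a single helper. This Python fragment is a minimal illustration under the stated proportionality assumption; the function and parameter names are hypothetical and do not come from the published model files.

```python
from typing import Optional


def task_runtime_hours(
    t_x_hours: float,
    ccu: float,
    measured_hours: Optional[float] = None,
) -> float:
    """Estimate the runtime of one task on a given instance type.

    t_x_hours      runtime on a reference machine of 1 CCU
    ccu            benchmarked performance of the instance type in CCU
    measured_hours optional instance-level prediction that, when present,
                   replaces the CCU-based scaling
    """
    if measured_hours is not None:
        return measured_hours
    # Runtime is assumed inversely proportional to the instance speed.
    return t_x_hours / ccu


scaled = task_runtime_hours(2.0, 4.0)
overridden = task_runtime_hours(2.0, 4.0, measured_hours=0.8)
```

A task that needs 2 hours on a 1 CCU machine is estimated at half an hour on a 4 CCU instance unless a measured value says otherwise.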

Scheduling Model for Coarse-Grained Workflows.
In this model, we schedule groups of tasks of the same type divided into levels. We do not schedule individual tasks as in [34] to keep the MIP problem small, as one of the requirements is that optimization time is shorter than the workflow execution time. The coarse-grained workflows are such workflows where task execution times are on the order of one hour. This is important, as we assume the hourly billing cycle of the cloud, so the model has to optimize the task assignment in such a way that the hourly slots of allocated resources (VM instances) are as fully utilized as possible.
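The effect of hourly billing on slot utilization can be shown with a simplified calculation. The sketch below is an assumption-laden illustration rather than the scheduling model itself: it considers one level of identical tasks, runs tasks sequentially on identical instances, and charges every instance for the same number of started hours. All names are hypothetical.

```python
import math


def level_cost(n_tasks, task_hours, deadline_hours, price_per_hour):
    """Return (cost, utilization) for one level run on identical instances."""
    tasks_per_instance = min(n_tasks, math.floor(deadline_hours / task_hours))
    if tasks_per_instance == 0:
        raise ValueError("a single task does not fit into the deadline")
    instances = math.ceil(n_tasks / tasks_per_instance)
    # Hourly billing: every started hour of every instance is paid in full.
    billed_hours = instances * math.ceil(tasks_per_instance * task_hours)
    used_hours = n_tasks * task_hours
    return billed_hours * price_per_hour, used_hours / billed_hours


relaxed_deadline = level_cost(8, 0.75, 3, 0.5)  # four tasks fill three hours
tight_deadline = level_cost(8, 0.75, 1, 0.5)    # one task per billed hour
```

With a 3-hour deadline the billed hours are fully used, whereas a 1-hour deadline leaves a quarter of every billed hour idle and raises the cost.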
To keep this model in the MIP class, we had to take a different approach than in [7] and schedule each virtual machine instance separately. A drawback of this approach is that we need to increase the number of decision variables. We have also divided the search space by storage providers, solving the problem separately for each storage and selecting the best result. Additionally, the deadline becomes a variable with an upper bound, as a shorter deadline may actually give a cheaper solution (see Figure 5 and its discussion).
Auxiliary Parameters. Based on the input parameters, in the scheduling model we derive a set of precomputed parameters that are used for expressing objectives and constraints. The transfer time is computed based on the input and output data size and the transfer rate between an instance and the storage. The time for processing a task is a sum of computing and data transfer time. The cost of data transfer is a sum of cost of input and output data, both including the transfer fees at the source and destination cloud site. The indexing of instances is introduced to distinguish between individual instances of a given type; for example, all m1.small instances are numbered 0, 1, 2, ....
Variables. The model uses the following decision variables, where $k$ denotes the index of an individual instance of type $i$:
(i) $N_{g,i,k}$: whether the instance of type $i$ with index $k$ is launched to process task group $g$ (binary);
(ii) $H_{g,i,k}$: for how many hours the instance of index $k$ is launched (integer);
(iii) $T_{g,i,k}$: how many tasks of $g$ are processed on that instance (integer);
(iv) $D^{t}_l$: actual computation time for level $l$ (real);
(v) $D_l$: maximal number of hours (deadline) that instances are allowed to run at level $l$ (integer).
The variables defined in this way allow the solver to search over the space of possible assignments of instances to task groups ($N_{g,i,k}$) with a varying number $H_{g,i,k}$ of hours each instance is launched and number $T_{g,i,k}$ of tasks processed on these instances. The deadline is divided into subdeadlines for each workflow level $l$, while the actual computation time $D^{t}_l$ can be shorter than the deadline $D_l$.
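The auxiliary parameters described in prose above can be sketched as small helper functions. This Python fragment follows the verbal definitions only; the function names, the argument order, and the per-MiB fee parameters are assumptions made for illustration and may differ from the published model files.

```python
SECONDS_PER_HOUR = 3600.0


def transfer_time_hours(d_in_mib, d_out_mib, rate_mib_per_s):
    """Time to move the input and output of one task, in hours."""
    return (d_in_mib + d_out_mib) / rate_mib_per_s / SECONDS_PER_HOUR


def processing_time_hours(compute_hours, d_in_mib, d_out_mib, rate_mib_per_s):
    """Time for processing a task: computing time plus data transfer time."""
    return compute_hours + transfer_time_hours(d_in_mib, d_out_mib, rate_mib_per_s)


def transfer_cost(d_in_mib, d_out_mib,
                  fee_storage_out, fee_instance_in,
                  fee_instance_out, fee_storage_in):
    """Cost of moving one task's data; fees are per MiB.

    Input pays the fee for leaving the storage and for entering the instance
    site; output pays the fee for leaving the instance site and for entering
    the storage. Transfers within one site use fees of 0.
    """
    input_cost = d_in_mib * (fee_storage_out + fee_instance_in)
    output_cost = d_out_mib * (fee_instance_out + fee_storage_in)
    return input_cost + output_cost


one_hour = transfer_time_hours(1800, 1800, 1.0)
total_time = processing_time_hours(0.5, 1800, 1800, 1.0)
remote_cost = transfer_cost(100, 200, 0.5, 0.0, 0.25, 0.0)
local_cost = transfer_cost(100, 200, 0.0, 0.0, 0.0, 0.0)
```

Since these values depend only on input data, they can be computed once before the solver starts, which keeps the objective and constraints linear in the decision variables.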
Objective. The scheduling problem is represented as a cost minimization problem. The cost of running a single task includes the cost of the computing time of the instance (1), the cost of transfer of input data (2), that of output data (3), and the request price (4). The objective function $C^{tot}$ represents the total cost, which is a sum of task costs computed over all the task groups, all the instance types, and the individual instances. To properly implement the assumptions we impose on the application, infrastructure, and scheduling models, the following constraints have to be introduced.
(1) $\sum_{l \in L} D_l \le t^{D}$ ensures that the sum of subdeadlines $D_l$ of all levels $l$ is not greater than the workflow deadline $t^{D}$, that is, that the workflow finishes within the given deadline.
The scheduling model presented above shows its advantages if the workflow tasks are about one hour long or longer, and the deadline exceeds one hour. For fine-grained workflows, such as Montage, where most task execution times are on the order of seconds and the whole workflow may be finished within an hour, the model can be simplified.

Scheduling Model for Fine-Grained Workflows.
When scheduling workflows with many short tasks and with deadlines shorter than the cloud billing cycle (one hour), we do not need to use the variable $H_{g,i,k}$ that counts the number of hours the instance is running. Thus we can assume that each level completes its work in one hour. This assumption reduces the number of decision variables, making the MIP problem faster to solve. We also add an assumption that only one instance type may be used for each task type, which also reduces the search space.
In addition to these assumptions, we changed the way the data transfer time is computed. Since for short tasks the data access latency is important, in addition to the transfer rate $r_{i,s}$ we also provide the latency parameter $t^{lat}_{i,s}$. The actual values come from linear regression of experimental data, where we ran the Montage workflow and measured transfers to and from Amazon S3. In the fine-grained scheduling model, we also use execution time predictions at instance level: $t^{x}_{g,i}$. The execution time is normalized by the number of CPU cores present on the VM if there are enough tasks to be processed in parallel. The modifications mentioned in this paragraph may also be applied to the coarse-grained model if needed.
These modifications change how the auxiliary parameters, in particular the transfer time, are computed. The remaining part of the model has the following form.
Variables. Variables are similar to the ones in the coarse-grained model, but the problem has fewer dimensions, since there is no need to use $H_{g,i,k}$ and to distinguish instances by index $k$:
(i) $A_{g,i}$ tells if instances of type $i$ are used to process task group $g$ (binary);
(ii) $N_{g,i}$ tells how many instances of type $i$ are launched to process task group $g$ (integer);
(iii) $T_{g,i}$ tells how many tasks in group $g$ are processed on instances of type $i$ (integer);
(iv) $D^{t}_l$ tells actual computation time for level $l$ (real).
Objective. The cost function $C^{tot}$ is computed in a similar way, by summing the costs of all the task groups over all of the instances, taking into account the task assignment $T_{g,i}$.
Constraints. The constraints are as follows:
(1) $\sum_{l \in L} D^{t}_l \le t^{D}$ ensures that the workflow finishes before the given deadline;
(2) $\forall g \in G, i \in I: A_{g,i} \le N_{g,i} \le A_{g,i} \cdot n^{Imax}_i$ ensures that the number of active instances $N_{g,i}$ is consistent with the binary variable $A_{g,i}$ and does not exceed the instance limit $n^{Imax}_i$;
(3) $\forall g \in G, i \in I: N_{g,i} \le T_{g,i} \le A_{g,i} \cdot A^{tot}_g$ ensures that there are no empty instances and that the number of assigned tasks does not exceed the total number of tasks.
This scheduling model yields reasonable results only for the cases when it is actually possible to complete all the workflow tasks before the deadline. If not, the solver will not find any solution.
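The structure of the fine-grained model can be illustrated with a brute-force search for a single task group. The Python sketch below is a simplified stand-in for the MIP formulation, not the published scheduler: it assumes one level, sequential tasks on each instance, deadlines of at most one hour (so each instance is billed for exactly one hour), and hypothetical instance types with times given in whole seconds.

```python
import math


def cheapest_plan(n_tasks, deadline_s, request_price, instance_types):
    """Pick one instance type and an instance count for a single task group.

    instance_types maps a name to (unit_time_s, price_per_hour, max_instances),
    where unit_time_s is the time one instance needs per task, including
    data transfer. Returns (cost, type, instances, level_time_s) or None
    when no assignment meets the deadline.
    """
    best = None
    for name, (unit_time_s, price_per_hour, max_instances) in instance_types.items():
        for n in range(1, max_instances + 1):
            # The slowest instance processes ceil(n_tasks / n) tasks in sequence.
            level_time_s = math.ceil(n_tasks / n) * unit_time_s
            if level_time_s <= deadline_s:
                cost = n * price_per_hour + n_tasks * request_price
                if best is None or cost < best[0]:
                    best = (cost, name, n, level_time_s)
                break  # more instances of this type would only cost more
    return best


# Hypothetical instance types: (seconds per task, price per hour, limit)
types = {"small": (36, 0.125, 20), "large": (9, 0.75, 20)}
half_hour = cheapest_plan(1000, 1800, 0.0, types)
quarter_hour = cheapest_plan(1000, 900, 0.0, types)
one_minute = cheapest_plan(1000, 60, 0.0, types)
```

The last case returns no plan, which mirrors the behaviour of the solver when the deadline cannot be met within the instance limits.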
The optimization models introduced in this section were implemented using CMPL and AMPL, so that they effectively act as workflow schedulers. The source code of the schedulers is available as an online supplement (https://github.com/kfigiela/optimization-models/tree/ppam-extended/workflows). The public repository on GitHub includes the model files, the data, and the scripts we used to run the solvers.

Application and Infrastructure Data Used for Evaluation
To perform optimization we need to provide the optimization models defined in the previous section with data describing an application and an infrastructure. First, we used the generic infrastructure benchmarks obtained from CloudHarmony and the application data from the workflow generator gallery. Next, we performed our own experiments using the Montage workflows on Amazon EC2, which provided both an application-specific performance benchmark of cloud resources and the real application data. The data gathered during experiments are inputs for the scheduler.

Data for Coarse-Grained Scheduler.
To evaluate the coarse-grained scheduler on realistic data, we used CloudHarmony [24] benchmarks to parameterize the infrastructure model, and we used the workflow generator gallery workflows [23] as test applications. In the infrastructure model we assumed that we had 4 public cloud providers (Amazon EC2, RackSpace, GoGrid, and ElasticHosts) and a private cloud with 0 costs. The infrastructure had two storage sites: S3, which is local to EC2, and CloudFiles, which is local to RackSpace, so data transfers between local virtual machines and storage sites are free. We used the first generation of CloudHarmony CPU benchmarks described in [24]. CloudHarmony CPU benchmarks use the CloudHarmony Compute Unit (CCU) as a unit for measuring CPU performance. It is calculated based on a set of general-purpose CPU benchmarks [32]. First generation benchmarks were calibrated relative to Amazon's m1.small instance and are now deprecated in favor of new benchmarks that are calibrated to nonvirtualized hardware. The new benchmark is compared to our benchmark data in Figure 3. Actual datasets are provided as an online supplement (https://github.com/kfigiela/optimization-models/tree/ppam-extended/workflows). We tested the coarse-grained scheduler with all of the applications from the gallery: Montage, CyberShake, Epigenomics, LIGO, and SIPHT for all available workflow sizes (from 50 to 1000 tasks per workflow, and up to 5000 tasks in the case of the SIPHT workflow). We varied the deadline from 1 to 30 hours with 1-hour increments. We solved the problem for two cases, depending on whether the data are stored on S3 or on CloudFiles.

Data for Fine-Grained Scheduler.
Cloud benchmarks, such as CloudHarmony [24], are based on a set of general-purpose benchmarks that do not necessarily represent the scientific applications that are to be scheduled. To find out how they may differ, we ran the Montage workflow on several Amazon EC2 instance types. The workflow of 12700 tasks processing 8.5 GiB of photos rendered a mosaic of an 8 × 8 degree region at the Orion Nebula from the 2MASS survey.
Usually, benchmarks take into account the fact that instances provide multiple virtual cores that speed up multithreaded applications, but this has no impact on single-threaded ones. Montage workflow tasks are single-threaded and therefore in our experiment the number of execution threads running in parallel was equal to the number of virtual cores. We used the HyperFlow workflow engine [31] to drive workflow execution. In the experiment, we used an EBS (elastic block storage) volume for data storage instead of S3 (simple storage service); however, we measured the transfer times to and from S3 separately. EBS is different from S3 as it provides block level access (i.e., a filesystem) to the data volume, while S3 is an object store available as a service through a REST API.
The data we gathered in experiments may be used to calculate an application-specific performance metric of the instance (ECU-like). In Figure 3 we compare our results with CloudHarmony benchmarks. It shows that, for the tasks forming the parallel levels of the Montage workflow (such as mProjectPP [27]), the performance of the instances is proportional to the generic CPU benchmark. On the other hand, for the levels that are not parallel (e.g., mBgModel), there is no difference between the cheaper m3.large and more expensive instance types (e.g., c3.8xlarge). Those instance types are deployed on the same generation of hardware, so their performance for single-threaded applications is very similar. Additionally, as a reference we show the instance performance provided by Amazon in ECU (EC2 Compute Units). The observation from this evaluation is that the benchmarks from CloudHarmony give a better approximation to the task performance than the generic ECU value. Moreover, it is important to distinguish between parallel and sequential workflow levels when selecting the virtual machine instance type. The dataset obtained in this experiment was used for evaluation of the fine-grained scheduling model in Section 7.2.

Evaluation of Optimization Models
In this section, we present the results of optimization, obtained by applying our schedulers to the application and infrastructure data. First, we show the results of using the coarse-grained scheduler applied to the generic CloudHarmony datasets. Next, we present the results of the fine-grained scheduler applied to the dataset obtained from our experiments with the Montage workflow on EC2.

Results for Coarse-Grained Scheduling.
Figure 4 shows the cost of execution of the Epigenomics application with two workflows of sizes 400 and 500 tasks as a function of deadline. For longer deadlines (over 6 hours), the private cloud instances and the cheapest RackSpace instances are used so the cost is low when using CloudFiles. For shorter deadlines, the cost grows rapidly, since we reach the limit of instances per cloud and additional instances must be spawned on a different provider, thus making the transfer costs higher. This effect is amplified in Figure 4(a), which differs from Figure 4(b) not only by the number of tasks, but also by the data size of the most data-intensive level. This means that the transfer costs are growing more rapidly, so it becomes more economical to store the data on Amazon S3, which is local to EC2, the cloud providing the more powerful instances required for short deadlines.
One interesting feature of our scheduler is that for longer deadlines it enables finding the cost-optimal solutions that have shorter workflow completion time than the requested deadline. This effect can be observed in Figure 5 and is caused by the fact that for long deadlines the simple solution is to run the application on a set of the least expensive machines.
Figures 6(a) to 7(b) show results obtained for CyberShake, LIGO, Montage, and SIPHT workflows. These workflows have relatively small execution time, so even for short deadlines the scheduler is able to schedule tasks on the cheapest instances on a single cloud, thus resulting in flat characteristics.
To investigate how the scheduler behaves for workflows with the same structure, but with much longer runtimes of tasks, we ran the optimization for the Montage workflow with tasks 1000x longer. This corresponds to the scenario where tasks are on the order of hours instead of seconds. The results in Figure 8 show how the cost increases very steeply with shorter deadlines, illustrating the trade-off between time and cost. The difference between Figures 7(a) and 8 illustrates that the scheduler is more useful for workflows whose tasks are of a granularity that is similar to the granularity of the (hourly) billing cycle of cloud providers. Additionally, Figure 8 shows how the optimal cost depends on available clouds.
The runtime of the optimization algorithm for workflows with up to 1000 tasks ranges from a few seconds up to 4 minutes using the CPLEX [21] solver running on a server with four 16-core 2.3 GHz AMD Opteron processors (model 6276), with CPLEX limited artificially to use only 32 cores. Figure 9(a) shows that the time becomes much higher for shorter deadlines and increases slowly for very long deadlines. This is correlated with the size of the search space: the longer the deadline, the larger the search space, while for shorter deadlines the problem has a very small set of acceptable solutions. The problem becomes more severe for bigger and more complex workflows like SIPHT as optimization time becomes very high (Figure 9(b)).
Figure 10 illustrates how the optimization time depends on the MIP gap solver setting. The relative MIP gap is the relative difference between the best integer solution found by the solver and the possible optimal noninteger solution. The MIP gap value instructs the solver to stop when an integer feasible solution has been proved to be within a given percent of optimality [21]. Applying a relative MIP gap of 1% or 5% instead of the default 0.01% shortens optimization time by orders of magnitude. Increasing the MIP gap to 5% did not decrease the quality of the result noticeably: the minimum cost obtained for the gap of 5% was higher only by 3.63% in the worst case.
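The stopping rule controlled by the MIP gap can be written down in a few lines. The Python sketch below uses one common definition of the relative gap, with a small constant in the denominator to avoid division by zero; the function names and the numeric values are illustrative assumptions and are not taken from the experiments.

```python
def relative_mip_gap(best_integer: float, best_bound: float) -> float:
    """Relative distance between the incumbent and the relaxation bound."""
    return abs(best_integer - best_bound) / (abs(best_integer) + 1e-10)


def should_stop(best_integer: float, best_bound: float, tolerance: float) -> bool:
    """Stop the search once the incumbent is provably within the tolerance."""
    return relative_mip_gap(best_integer, best_bound) <= tolerance


# Illustrative values: incumbent cost 103, relaxation bound 100
gap = relative_mip_gap(103.0, 100.0)
stop_at_5_percent = should_stop(103.0, 100.0, 0.05)
stop_at_1_percent = should_stop(103.0, 100.0, 0.01)
stop_at_default = should_stop(103.0, 100.0, 0.0001)
```

A looser tolerance lets the search end as soon as a reasonably good incumbent is known, which is where the large savings in optimization time come from.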

Results for Fine-Grained Workflows and Short Deadlines.
We performed optimization for deadlines ranging from 13 to 60 minutes, using the Amazon EC2 cloud, with S3 or local storage. When assuming that the storage is local, we fix the network transfer time $t^{net}_{g,i,s}$ at 0, which may represent, for example, a very fast NFS storage when transfer times are negligible. The results shown in Figure 11 have a similar character to those we got in [7] and to the ones obtained using the coarse-grained scheduler and task runtimes artificially expanded (Figure 8). This observation leads to the conclusion that the granularity of the workflow tasks versus the granularity of the billing cycle of the cloud provider plays an important role in scheduling. In our case, we had to define two separate schedulers to address this issue. The problem, however, may be more complex when we assume more cloud providers with different billing cycles, such as hourly, 5-minute, or per-minute billing. This may be an interesting subject for further research.

Conclusions and Future Work
In this paper, we presented schedulers using cost optimization for scientific workflows executing on multiple heterogeneous clouds. The models, formulated in AMPL and CMPL, allow us to find the optimal assignment of workflow tasks, grouped into levels, to cloud instances. We validated our models with a set of synthetic benchmark workflows as well as with data from a real astronomy workflow, and we observed that they gave useful solutions in a reasonable amount of computing time.
Based on our experiments with the execution of the Montage workflow on the Amazon EC2 cloud and its characteristics, we developed separate scheduling models dedicated to coarse-grained workflows and to fine-grained workflows with short deadlines. We also compared general-purpose cloud benchmarks, such as CloudHarmony, with our own measurements. The results underline the importance of application-specific cloud benchmarking, since general-purpose benchmarks can serve only as a rough approximation of the actual application performance. The observed relations between the granularity of the tasks and the performance of the optimization models show the influence of the cloud billing cycle on cost-optimizing workflow scheduling.
By solving the models for multiple deadlines, we can produce trade-off plots showing how the cost depends on the deadline. We believe that such plots are a step towards a scientific cloud workflow calculator, supporting resource management decisions for both end users and workflow-as-a-service providers.
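The idea of a cost versus deadline trade-off curve can be sketched as follows. This is a brute-force stand-in for a single level of identical tasks, not the AMPL/CMPL mixed integer programming models used in this paper; the instance types, prices, task runtimes, instance limit, and the restriction to a single instance type per solution are all hypothetical simplifications.

```python
import math

# Hypothetical instance types: (name, price in USD per hour, runtime of one task in minutes).
INSTANCE_TYPES = [("small", 0.10, 60), ("large", 0.40, 15)]
MAX_INSTANCES = 20  # assumed per-type instance limit


def min_cost(n_tasks: int, deadline_min: int):
    """Cheapest single-type allocation finishing all tasks by the deadline, or None."""
    best = None
    for _name, price, task_min in INSTANCE_TYPES:
        tasks_per_instance = deadline_min // task_min
        for n in range(1, MAX_INSTANCES + 1):
            if n * tasks_per_instance < n_tasks:
                continue  # n instances of this type cannot meet the deadline
            busy_min = math.ceil(n_tasks / n) * task_min  # most loaded instance
            hours = -(-busy_min // 60)  # hourly billing: round up to full hours
            cost = n * hours * price
            if best is None or cost < best:
                best = cost
    return best


# Solve the toy problem for several deadlines to obtain points of the trade-off curve.
for deadline_min in (60, 120, 300, 600):
    print(f"deadline={deadline_min:>3} min  minimum cost={min_cost(100, deadline_min)}")
```

In this toy setting the shortest deadline is infeasible and the minimum cost never increases as the deadline is relaxed, which is the qualitative shape of the trade-off plots discussed above.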
In the future, we plan to apply this model to the problem of provisioning cloud resources for workflow ensembles [3], where the optimization of cost can drive the workflow admission decisions.We also plan to refine the models to better support smaller workflows by reusing instances between levels, to fine-tune the model, and to test different solver configurations to reduce the computing time, as well as to apply the optimization models to the problem of dynamic workflow scheduling in order to better handle the uncertainties in the infrastructure and the application.

Figure 3: Amazon instance benchmarks for different tasks compared to the generic CloudHarmony benchmark and Amazon ECU. Data was normalized to the m1.small instance type having a relative performance of 1.0.

Figure 4: Result of coarse-grained scheduling for the Epigenomics application.

Figure 5: Ratio of the actual completion time to the deadline for the Epigenomics workflow with 500 tasks.

Figure 7: Optimal cost found by the scheduler for Montage and SIPHT applications.

Figure 10: Solver runtime with different relative MIP gap (in percent), showing the relation between accuracy and runtime of the solver for the coarse-grained scheduler for a Montage workflow of 500 tasks, with runtimes artificially multiplied by 1000, for different cloud infrastructures.
(i) a fee (in US dollars) for running an instance of a given type for one hour, (ii) ccu: performance of an instance of that type in CloudHarmony Compute Units (CCU), price for nonlocal data transfer to and from an instance of that type in US dollars per MiB (1 MiB = 1024 ⋅ 1024 bytes), (v) max: upper limit of the number of instances of that type, equal to the instance limit of the provider of that instance type.