Integrated data placement and task assignment for scientific workflows in clouds

Abstract : We consider the problem of optimizing the execution of data-intensive scientific workflows in the Cloud, under the following scenario. The tasks of a workflow communicate through files: the output of one task is used by another task as an input file, and if the two tasks are assigned to different execution sites, a file transfer is necessary. Each output file is to be stored at a site. Each execution site is to be assigned a certain percentage of the files and tasks. These percentages, called target weights, are predetermined and reflect either user preferences or the storage capacity and computing power of the sites. The aim is to place the data files at the execution sites and assign the tasks to them so as to reduce the cost of file transfers while complying with the target weights. To this end, we model the workflow as a hypergraph and, using a hypergraph-partitioning-based formulation, propose a heuristic that generates the data placement and task assignment schemes simultaneously. We report simulation results on a number of real-life and synthetically generated scientific workflows. Our results show that the proposed heuristic is fast and finds mappings and assignments that reduce file transfers while respecting the target weights.
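The objective described in the abstract can be illustrated with a minimal sketch. It is not the heuristic of the paper: it only scores a given placement and assignment. It assumes one common hypergraph-partitioning cost, where each file is a net containing the file itself and the tasks that read or write it, and the net costs its size times the number of sites, other than the file's storage site, that host one of those tasks. The names `transfer_cost`, `site_shares`, and the toy workflow are hypothetical.

```python
from collections import Counter


def transfer_cost(files, file_site, task_site):
    """Total size-weighted transfers for a given placement and assignment.

    files maps a file name to (size, tasks that read or write the file).
    A file costs its size once per remote site hosting one of its tasks
    (an assumed cost model, not taken from the paper).
    """
    cost = 0
    for name, (size, tasks) in files.items():
        home = file_site[name]
        remote_sites = {task_site[t] for t in tasks} - {home}
        cost += size * len(remote_sites)
    return cost


def site_shares(assignment, weights):
    """Fraction of the total weight that ends up on each site."""
    total = sum(weights.values())
    load = Counter()
    for item, site in assignment.items():
        load[site] += weights[item]
    return {site: load[site] / total for site in load}


# Hypothetical toy workflow: t1 writes f1; t2 and t3 read f1; t3 writes f2.
files = {"f1": (10, ["t1", "t2", "t3"]), "f2": (4, ["t3"])}
task_site = {"t1": "A", "t2": "A", "t3": "B"}
file_site = {"f1": "A", "f2": "B"}
task_work = {"t1": 1, "t2": 1, "t3": 2}  # assumed computational weights

cost = transfer_cost(files, file_site, task_site)
shares = site_shares(task_site, task_work)
print(cost, shares)
```

Comparing `shares` with the target weights of the sites shows how well an assignment complies with them; a partitioner would search for the placement and assignment that keep `cost` low subject to that compliance.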
Document type :
Conference papers

https://hal.inria.fr/hal-00786551
Contributor : Equipe Roma
Submitted on : Friday, February 8, 2013 - 8:15:34 PM
Last modification on : Sunday, September 29, 2019 - 9:58:17 PM

Citation

Umit Catalyurek, Kamer Kaya, Bora Uçar. Integrated data placement and task assignment for scientific workflows in clouds. Proceedings of the fourth international workshop on Data-intensive distributed computing, Jun 2011, New York, NY, USA. pp. 45-54, ⟨10.1145/1996014.1996022⟩. ⟨hal-00786551⟩
