Integrated data placement and task assignment for scientific workflows in clouds

Abstract: We consider the problem of optimizing the execution of data-intensive scientific workflows in the cloud, under the following scenario. The tasks of a workflow communicate through files: the output of one task is used by another task as an input file, and if the two tasks are assigned to different execution sites, a file transfer is necessary. Each output file must be stored at a site. Each execution site is to be assigned a certain percentage of the files and tasks. These percentages, called target weights, are predetermined and reflect either user preferences or the storage capacity and computing power of the sites. The aim is to place the data files at, and assign the tasks to, the execution sites so as to reduce the cost associated with file transfers while complying with the target weights. To do this, we model the workflow as a hypergraph and, using a hypergraph-partitioning-based formulation, propose a heuristic that generates data placement and task assignment schemes simultaneously. We report simulation results on a number of real-life and synthetically generated scientific workflows. Our results show that the proposed heuristic is fast and can find mappings and assignments that reduce file transfers while respecting the target weights.
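To make the objective in the abstract concrete, the following is a minimal sketch of how a candidate placement could be scored. It is not the heuristic from the paper. It assumes one plausible cost model: each file forms a net containing the file and every task that produces or consumes it, and the file is transferred once to each site that hosts such a task but does not store the file (a connectivity-minus-one cost weighted by file size). All names (`transfer_cost`, `site_shares`, the toy workflow) are hypothetical and chosen only for illustration.

```python
def transfer_cost(file_size, producer, consumers, file_site, task_site):
    """Sum, over all files, of file size times the number of sites that
    need the file but do not store it (assumed cost model)."""
    cost = 0
    for f, size in file_size.items():
        tasks = list(consumers.get(f, []))
        if f in producer:
            tasks.append(producer[f])
        # Sites hosting a task that touches f, excluding f's storage site.
        remote_sites = {task_site[t] for t in tasks}
        remote_sites.discard(file_site[f])
        cost += size * len(remote_sites)
    return cost


def site_shares(assignment, weight, num_sites):
    """Fraction of the total weight assigned to each site, to be compared
    against the target weights."""
    total = sum(weight.values())
    load = [0.0] * num_sites
    for item, site in assignment.items():
        load[site] += weight[item]
    return [l / total for l in load]


# Toy workflow (hypothetical): t1 reads a and writes b; t2 and t3 read b;
# t2 writes c.
file_size = {"a": 10, "b": 4, "c": 1}
producer = {"b": "t1", "c": "t2"}
consumers = {"a": ["t1"], "b": ["t2", "t3"], "c": []}

# One candidate scheme over two sites, numbered 0 and 1.
task_site = {"t1": 0, "t2": 0, "t3": 1}
file_site = {"a": 0, "b": 0, "c": 1}

cost = transfer_cost(file_size, producer, consumers, file_site, task_site)
task_shares = site_shares(task_site, {"t1": 1, "t2": 1, "t3": 1}, 2)
file_shares = site_shares(file_site, file_size, 2)
print(cost, task_shares, file_shares)
```

In this toy scheme, file b must be sent to site 1 for t3, and file c is produced on site 0 but stored on site 1, so both contribute to the cost. A partitioner would search for task and file assignments that lower this cost while keeping the task and file shares close to the target weights.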
Document type:
Conference paper
Proceedings of the Fourth International Workshop on Data-intensive Distributed Computing, Jun 2011, New York, NY, USA. ACM, pp. 45-54, 2011, 〈10.1145/1996014.1996022〉

https://hal.inria.fr/hal-00786551
Contributor: Equipe Roma
Submitted on: Friday, February 8, 2013 - 20:15:34
Last modified on: Friday, April 20, 2018 - 15:44:24

Citation

Umit Catalyurek, Kamer Kaya, Bora Uçar. Integrated data placement and task assignment for scientific workflows in clouds. Proceedings of the Fourth International Workshop on Data-intensive Distributed Computing, Jun 2011, New York, NY, USA. ACM, pp. 45-54, 2011, 〈10.1145/1996014.1996022〉. 〈hal-00786551〉
