An Algebraic Approach for Data-Centric Scientific Workflows

Abstract: Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data intensive, thus requiring execution on large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad hoc and labor-intensive, which makes it hard to exploit optimization opportunities. To address this problem, we propose an algebraic approach (inspired by relational algebra) and a parallel execution model that enable automatic optimization of scientific workflows. We conducted a thorough validation of our approach using both a real oil exploitation application and synthetic data scenarios. The experiments were run in Chiron, a data-centric scientific workflow engine implemented to support our algebraic approach. Our experiments demonstrate performance improvements of up to 226% compared to an ad hoc workflow implementation.
Document type:
Journal article
Proceedings of the VLDB Endowment (PVLDB), VLDB Endowment, 2011, 4 (11), pp. 1328-1339
Contributor: Patrick Valduriez
Submitted: Friday, 11 November 2011, 23:18:24
Last modified: Wednesday, 21 November 2018, 19:26:08


HAL Id: hal-00640431, version 1



Eduardo Ogasawara, Daniel De Oliveira, Patrick Valduriez, Daniel Dias, Fabio Porto, et al. An Algebraic Approach for Data-Centric Scientific Workflows. Proceedings of the VLDB Endowment (PVLDB), VLDB Endowment, 2011, 4 (11), pp. 1328-1339. 〈hal-00640431〉


