An Algebraic Approach for Data-Centric Scientific Workflows

Abstract: Scientific workflows have emerged as a basic abstraction for structuring and executing scientific experiments in computational environments. In many situations, these workflows are computationally and data intensive, thus requiring execution in large-scale parallel computers. However, parallelization of scientific workflows remains low-level, ad hoc and labor-intensive, which makes it hard to exploit optimization opportunities. To address this problem, we propose an algebraic approach (inspired by relational algebra) and a parallel execution model that enable automatic optimization of scientific workflows. We conducted a thorough validation of our approach using both a real oil exploitation application and synthetic data scenarios. The experiments were run in Chiron, a data-centric scientific workflow engine implemented to support our algebraic approach. Our experiments demonstrate performance improvements of up to 226% compared to an ad hoc workflow implementation.
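The central idea of the abstract is to treat workflow activities as operators over relations so that a relational-style optimizer can rewrite the execution plan. A minimal sketch of that idea follows. Everything in it is an assumption for illustration: relations are modeled as Python lists of dicts, `map_op`, `filter_op` and `simulate` are hypothetical names rather than Chiron's API, and the rewrite shown is ordinary relational filter pushdown, not necessarily an optimization Chiron performs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Illustrative model (assumption): a relation is a list of tuples, each tuple
# a dict mapping attribute names to values.
Row = Dict[str, object]
Relation = List[Row]


def map_op(activity: Callable[[Row], Row], rel: Relation, workers: int = 4) -> Relation:
    """Hypothetical Map-style operator: one output tuple per input tuple.

    Each invocation depends only on its own input tuple, so the calls can
    run in parallel; pool.map preserves input order in the result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(activity, rel))


def filter_op(pred: Callable[[Row], bool], rel: Relation) -> Relation:
    """Hypothetical Filter-style operator: keep only tuples satisfying pred."""
    return [row for row in rel if pred(row)]


# Hypothetical activity standing in for an expensive simulation program.
def simulate(row: Row) -> Row:
    return {**row, "result": row["depth"] * 2}


inputs: Relation = [{"id": i, "depth": d} for i, d in enumerate([100, 250, 400, 50])]

# Plan A: run the activity on every tuple, then filter the outputs.
plan_a = filter_op(lambda r: r["depth"] > 90, map_op(simulate, inputs))
# Plan B: the predicate reads only an input attribute, so it can be pushed
# below the Map; the activity then runs on fewer tuples.
plan_b = map_op(simulate, filter_op(lambda r: r["depth"] > 90, inputs))

assert plan_a == plan_b
print([r["result"] for r in plan_b])  # → [200, 500, 800]
```

Both plans return the same relation, but Plan B invokes the activity three times instead of four. Equivalences of this kind are what an algebraic representation lets an engine discover automatically rather than leaving them to hand-written, ad hoc parallel scripts.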
Document type: Journal article

Cited literature: 14 references

https://hal.inria.fr/hal-00640431
Contributor: Patrick Valduriez
Submitted on: Wednesday, November 13, 2019 - 9:57:21 PM
Last modification on: Thursday, November 14, 2019 - 12:08:33 PM

Identifiers

  • HAL Id: hal-00640431, version 1


Citation

Eduardo Ogasawara, Daniel de Oliveira, Patrick Valduriez, Jonas Dias, Fabio Porto, et al. An Algebraic Approach for Data-Centric Scientific Workflows. Proceedings of the VLDB Endowment (PVLDB), VLDB Endowment, 2011, 4 (11), pp. 1328-1339. ⟨hal-00640431⟩
