Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud - Archive ouverte HAL
Conference Papers

Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud

Abstract

Many scientific experiments are now carried out using scientific workflows, which are becoming more and more data-intensive and complex. We consider the efficient execution of such workflows in the cloud. Since it is common for workflow users to reuse other workflows or data generated by other workflows, a promising approach for efficient workflow execution is to cache intermediate data and exploit it to avoid task re-execution. In this paper, we propose an adaptive caching solution for data-intensive workflows in the cloud. Our solution is based on a new scientific workflow management architecture that automatically manages the storage and reuse of intermediate data and adapts to the variations in task execution times and output data size. We evaluated our solution by implementing it in the OpenAlea system and performing extensive experiments on real data with a data-intensive application in plant phenotyping. The results show that adaptive caching can yield major performance gains, e.g., up to 120.16% with 6 workflow re-executions.
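
To make the idea in the abstract concrete, the following is a minimal sketch of caching intermediate task outputs and deciding adaptively whether an output is worth keeping. It is not the architecture described in the paper and does not use the OpenAlea API. The class `AdaptiveCache`, the `min_seconds_per_mb` threshold, and the rule "cache when execution time per megabyte of output is high enough" are all assumptions chosen for illustration, and the sketch assumes task inputs are JSON-serializable.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class AdaptiveCache:
    """Toy in-memory cache for task outputs (hypothetical design, not the paper's)."""

    def __init__(self, min_seconds_per_mb: float = 0.01) -> None:
        # Assumed policy parameter: minimum compute time per MB of output
        # for an output to be considered worth storing.
        self.min_seconds_per_mb = min_seconds_per_mb
        self.store: Dict[str, Any] = {}

    @staticmethod
    def key(task_name: str, inputs: Dict[str, Any]) -> str:
        # Identify an intermediate result by the task and its exact inputs.
        payload = json.dumps([task_name, inputs], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def should_cache(self, exec_seconds: float, output_bytes: int) -> bool:
        # Favor outputs that are expensive to recompute and cheap to store.
        size_mb = max(output_bytes / 1e6, 1e-9)
        return exec_seconds / size_mb >= self.min_seconds_per_mb

    def run(
        self,
        task_name: str,
        func: Callable[..., Any],
        inputs: Dict[str, Any],
        size_of: Callable[[Any], int],
    ) -> Tuple[Any, bool]:
        """Return (result, cache_hit); re-execute the task only on a miss."""
        k = self.key(task_name, inputs)
        if k in self.store:
            return self.store[k], True
        start = time.perf_counter()
        result = func(**inputs)
        elapsed = time.perf_counter() - start
        # Adapt to the observed execution time and output size of this run.
        if self.should_cache(elapsed, size_of(result)):
            self.store[k] = result
        return result, False
```

In this sketch a second call to `run` with the same task name and inputs returns the stored output without invoking `func` again, provided the first run passed the `should_cache` test. A real system would also need persistent storage and a way to estimate transfer costs, which are left out here.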
Main file
DEXA_2019.pdf (2.02 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-02174445, version 1 (05-07-2019)

Cite

Gaëtan Heidsieck, Daniel de Oliveira, Esther Pacitti, Christophe Pradal, Francois Tardieu, et al. Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud. DEXA 2019 - 30th International Conference on Database and Expert Systems Applications, Aug 2019, Linz, Austria. pp.452-466, ⟨10.1007/978-3-030-27618-8_33⟩. ⟨hal-02174445⟩