
Multisite Management of Data-intensive Scientific Workflows in the Cloud

Ji Liu 1, 2, *
* Corresponding author
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Current solutions for the parallel execution of scientific workflows are designed for static computing and storage resources in a grid environment. They have been extended to handle more elastic resources in a cloud, but only at a single site. Our analysis [1] of current techniques for scientific workflow parallelization and execution shows that there is considerable room for improvement in the following directions:
1. Data staging: existing techniques mainly start scientific workflow execution only after gathering all the related data in a shared-disk file system at one data center, which is time consuming.
2. Architecture: scientific workflow management systems (SWfMSs) are generally centralized, with a master node that manages all optimization and scheduling processes and is therefore both a single point of failure and a performance bottleneck.
3. Task scheduling and data location: most SWfMSs do not take data location into account during task scheduling, which makes reading and writing data inefficient.
4. Multisite: novel task and data scheduling approaches are required to use resources in a multisite cloud.
In the rest of this paper, we define the problem more precisely and introduce our approach to addressing it.
Document type : Conference papers

Contributor : David Gross-Amblard
Submitted on : Tuesday, June 30, 2015 - 3:34:32 PM
Last modification on : Friday, October 22, 2021 - 3:07:16 PM
Long-term archiving on : Tuesday, April 25, 2017 - 8:25:12 PM


Publisher files allowed on an open archive


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License


  • HAL Id : hal-01169960, version 1



Ji Liu. Multisite Management of Data-intensive Scientific Workflows in the Cloud. BDA: Gestion de Données — Principes, Technologies et Applications, Oct 2014, Autrans, France. pp.28-30. ⟨hal-01169960⟩


