Multisite Management of Data-intensive Scientific Workflows in the Cloud

Ji Liu 1, 2, *
* Corresponding author
2 ZENITH - Scientific Data Management
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : Current solutions for the parallel execution of scientific workflows are designed for static computing and storage resources in a grid environment. They have been extended to handle more elastic resources in a cloud, but only within a single site. Our analysis [1] of current techniques for scientific workflow parallelization and execution shows considerable room for improvement in the following directions:
1. Data staging: existing techniques mainly start scientific workflow execution only after gathering all the related data in a shared-disk file system at one data center, which is time-consuming.
2. Architecture: scientific workflow management systems (SWfMSs) are generally centralized, with a master node that manages all optimization and scheduling processes and is therefore a single point of failure and a performance bottleneck.
3. Task scheduling and data location: most SWfMSs do not take data location into account during task scheduling, which makes reading and writing data inefficient.
4. Multisite: novel task and data scheduling approaches are required to utilize resources in a multisite cloud.
In the rest of this paper, we define the problem more precisely and introduce our approach to addressing it.
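The third direction above, taking data location into account during task scheduling, can be illustrated with a small sketch. This is not the scheduling approach of the paper, which the abstract does not describe; it is a hypothetical greedy rule that places a task at the site where the least input data would have to be transferred. The names `Task`, `mb_to_move`, `pick_site`, the site labels, and the file sizes are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    # Hypothetical layout: input file name -> (site holding the file, size in MB)
    inputs: dict


def mb_to_move(task, site):
    """Total input size (MB) that is not already stored at `site`."""
    return sum(size for location, size in task.inputs.values() if location != site)


def pick_site(task, sites):
    """Greedy rule: choose the site that minimises the data to transfer.

    Ties are broken by site name so the result is deterministic.
    """
    return min(sites, key=lambda s: (mb_to_move(task, s), s))


# Illustrative data only; none of these values come from the paper.
SITES = ["site_a", "site_b", "site_c"]
task = Task(
    name="align",
    inputs={
        "reads.dat": ("site_b", 800),
        "ref.dat": ("site_a", 50),
        "params.cfg": ("site_c", 1),
    },
)

chosen = pick_site(task, SITES)
print(chosen, mb_to_move(task, chosen))
```

In this toy case, gathering everything at `site_a` before execution would move 801 MB, whereas running the task where its largest input already resides moves 51 MB. A real multisite scheduler would also have to weigh compute capacity, inter-site bandwidth, and load balance, which this sketch ignores.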
Document type :
Conference papers

https://hal.inria.fr/hal-01169960
Contributor : David Gross-Amblard
Submitted on : Tuesday, June 30, 2015 - 3:34:32 PM
Last modification on : Tuesday, November 20, 2018 - 12:54:36 PM
Long-term archiving on: Tuesday, April 25, 2017 - 8:25:12 PM

File

bda2014-actes-phd-5-pp28-30.pd...
Publisher files allowed on an open archive

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id : hal-01169960, version 1

Citation

Ji Liu. Multisite Management of Data-intensive Scientific Workflows in the Cloud. BDA: Gestion de Données — Principes, Technologies et Applications, Oct 2014, Autrans, France. pp.28-30. ⟨hal-01169960⟩
