Adaptive File Management for Scientific Workflows on the Azure Cloud

Radu Tudoran 1, Alexandru Costan 1, Rad Ramin Rezai 2, Goetz Brasche 2, Gabriel Antoniu 1
1 KerData - Scalable Storage for Clouds and Beyond
IRISA-D1 - SYSTÈMES LARGE ÉCHELLE, Inria Rennes – Bretagne Atlantique
2 Cloud Team
EMIC - European Microsoft Innovation Center
Abstract : Scientific workflows typically communicate data between tasks using files. Currently, on public clouds, this is achieved by using the cloud storage services, which are unable to exploit the workflow semantics and are subject to low throughput and high latencies. To overcome these limitations, we propose an alternative leveraging data locality through direct file transfers between the compute nodes. We rely on the observation that workflows generate a set of common data access patterns that our solution exploits in conjunction with context information to self-adapt, choose the most adequate transfer protocol and expose the data layout within the virtual machines to the workflow engines. This file management system was integrated within the Microsoft Generic Worker workflow engine and was validated using synthetic benchmarks and a real-life application on the Azure cloud. The results show it can bring significant performance gains: up to 5x file transfer speedup compared to solutions based on standard cloud storage and over 25% application timespan reduction compared to Hadoop on Azure.
Document type :
Conference papers

Cited literature [22 references]

https://hal.inria.fr/hal-00926748
Contributor : Radu Tudoran
Submitted on : Friday, January 10, 2014 - 10:57:26 AM
Last modification on : Thursday, November 15, 2018 - 11:57:44 AM
Long-term archiving on : Thursday, April 10, 2014 - 10:26:54 PM

File

bare_conf.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00926748, version 1

Citation

Radu Tudoran, Alexandru Costan, Rad Ramin Rezai, Goetz Brasche, Gabriel Antoniu. Adaptive File Management for Scientific Workflows on the Azure Cloud. IEEE Big Data, Oct 2013, Santa Clara, United States. pp.273 - 281. ⟨hal-00926748⟩
