Conference papers

DataSteward: Using Dedicated Compute Nodes for Scalable Data Management on Public Clouds

Radu Tudoran 1 Alexandru Costan 1 Gabriel Antoniu 1 
1 KerData - Scalable Storage for Clouds and Beyond
Inria Rennes – Bretagne Atlantique , IRISA-D1 - SYSTÈMES LARGE ÉCHELLE
Abstract : A large spectrum of scientific applications, some generating data volumes exceeding petabytes, are currently being ported to clouds to build on their inherent elasticity and scalability. One of the critical needs in dealing with this "data deluge" is efficient, scalable and reliable storage. However, the storage services offered by cloud providers suffer from high latencies, trading performance for availability. One alternative is to federate the local virtual disks of the compute nodes into a globally shared storage used for large intermediate or checkpoint data. This collocated storage supports high throughput, but it can be very intrusive and is subject to failures that can stop the host node and degrade application performance. To deal with these limitations we propose DataSteward, a data management system that provides a higher degree of reliability while remaining non-intrusive through the use of dedicated compute nodes. DataSteward harnesses the storage space of a set of dedicated VMs, selected using a topology-aware clustering algorithm, and has a lifetime dependent on the deployment lifetime. To capitalize on this separation, we introduce a set of scientific data processing services on top of the storage layer, which can overlap with the executing applications. We performed extensive experiments on hundreds of cores in the Azure cloud: compared to state-of-the-art node selection algorithms, we show up to 20% higher throughput, which improves the overall performance of a real-life scientific application by up to 45%.
Document type : Conference papers

Contributor : Radu Tudoran
Submitted on : Sunday, January 12, 2014 - 4:37:31 PM
Last modification on : Thursday, January 20, 2022 - 4:20:16 PM
Long-term archiving on : Saturday, April 8, 2017 - 2:26:17 PM


Files produced by the author(s)



Radu Tudoran, Alexandru Costan, Gabriel Antoniu. DataSteward: Using Dedicated Compute Nodes for Scalable Data Management on Public Clouds. TrustCom 2013 - 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, Jul 2013, Melbourne, Australia. pp. 1057-1064, ⟨10.1109/TrustCom.2013.129⟩. ⟨hal-00927283⟩


