Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds

Leonardo Bautista Gomez (1, 2), Bogdan Nicolae (1, 3, *), Naoya Maruyama (1), Franck Cappello (1, 3), Satoshi Matsuoka (1)
* Corresponding author
3 GRAND-LARGE - Global parallel and distributed computing
LRI - Laboratoire de Recherche en Informatique, LIFL - Laboratoire d'Informatique Fondamentale de Lille, UP11 - Université Paris-Sud - Paris 11, Inria Saclay - Ile de France, CNRS - Centre National de la Recherche Scientifique : UMR8623
Abstract: With growing interest among mainstream users in running HPC applications, Infrastructure-as-a-Service (IaaS) cloud platforms offer a viable alternative to acquiring and maintaining expensive hardware, which is often beyond the financial means of such users. One critical need of HPC applications is efficient, scalable, and persistent storage. Unfortunately, the storage options offered by cloud providers are not standardized and typically use a different access model. In this context, the local disks of the compute nodes can be used to save large data sets, such as the data generated by Checkpoint-Restart (CR). This local storage offers high throughput and scalability, but it must be combined with persistency techniques such as block replication or erasure codes. One of the main challenges such techniques face is minimizing the overhead in performance and I/O resource utilization (i.e., storage space and bandwidth) while still guaranteeing high reliability of the saved data. This paper introduces a novel persistency technique that leverages Reed-Solomon (RS) encoding to save data reliably. Compared to traditional approaches that rely on block replication, we demonstrate about 50% higher throughput while reducing network bandwidth and storage utilization by a factor of 2 for the same targeted reliability level. These results are supported both by modeling and by real-life experiments on hundreds of nodes.
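To make the storage trade-off between block replication and erasure coding concrete, the following is a minimal sketch of the standard textbook arithmetic, not the model used in the paper. The parameters (3 replicas, 4 data blocks plus 2 parity blocks, a per-node failure probability of 0.01) and all function names are illustrative assumptions, and node failures are assumed to be independent.

```python
from math import comb


def replication_overhead(copies: int) -> float:
    """Bytes stored per byte of user data when keeping `copies` full replicas."""
    return float(copies)


def rs_overhead(k: int, m: int) -> float:
    """Bytes stored per byte of user data with k data blocks and m parity blocks."""
    return (k + m) / k


def replication_loss_prob(copies: int, p: float) -> float:
    """Data is lost only if every node holding a replica fails (independent failures)."""
    return p ** copies


def rs_loss_prob(k: int, m: int, p: float) -> float:
    """Data is lost if more than m of the k + m nodes in the group fail."""
    n = k + m
    return sum(comb(n, f) * p**f * (1 - p) ** (n - f) for f in range(m + 1, n + 1))


if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration only.
    p = 0.01
    copies, k, m = 3, 4, 2

    print("replication overhead:", replication_overhead(copies))
    print("RS overhead:         ", rs_overhead(k, m))
    print("replication loss probability:", replication_loss_prob(copies, p))
    print("RS loss probability:         ", rs_loss_prob(k, m, p))
```

Under these assumed parameters both schemes tolerate two simultaneous node failures, and the erasure-coded layout stores half as much data as the replicated one. Tolerating the same number of failures is not the same as having an identical loss probability, because the erasure-coded group spans more nodes. Matching a targeted reliability level therefore means choosing k and m with a model such as this one.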
Document type:
Conference paper
Euro-Par '12: 18th International Euro-Par Conference on Parallel Processing, Aug 2012, Rhodes, Greece. pp.313-324, 2012, 〈10.1007/978-3-642-32820-6_32〉

Cited literature: 23 references

https://hal.inria.fr/hal-00703119
Contributor: Bogdan Nicolae
Submitted on: Thursday, May 31, 2012 - 21:12:57
Last modified on: Thursday, April 5, 2018 - 12:30:12
Document(s) archived on: Saturday, September 1, 2012 - 02:31:24

File

1569568223.pdf
Files produced by the author(s)

Citation

Leonardo Bautista Gomez, Bogdan Nicolae, Naoya Maruyama, Franck Cappello, Satoshi Matsuoka. Scalable Reed-Solomon-based Reliable Local Storage for HPC Applications on IaaS Clouds. Euro-Par '12: 18th International Euro-Par Conference on Parallel Processing, Aug 2012, Rhodes, Greece. pp.313-324, 2012, 〈10.1007/978-3-642-32820-6_32〉. 〈hal-00703119〉
