Leveraging naturally distributed data redundancy to reduce collective I/O replication overhead

Bogdan Nicolae 1, *
* Corresponding author
Abstract: Dumping large amounts of related data simultaneously to local storage devices instead of a parallel file system is a frequent I/O pattern of HPC applications running at large scale. Since local storage resources are prone to failures and have limited potential to serve multiple requests in parallel, techniques such as replication are often used to enable resilience and high availability. However, replication introduces overhead, both in the network traffic needed to distribute replicas and in the extra storage space required. To reduce this overhead, state-of-the-art techniques often apply redundancy elimination (e.g., compression or deduplication) before replication, ignoring the natural redundancy that is already present. By contrast, this paper proposes a novel scheme that treats redundancy elimination and replication as a single co-optimized phase: remotely duplicated data is detected and directly leveraged to maintain a desired replication factor by keeping only as many replicas as needed and adding more if necessary. In this context, we introduce a series of high-performance algorithms specifically designed to operate under tight and controllable constraints at large scale. We show how this idea can be applied in practice and demonstrate its viability for two real-life HPC applications.
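The core idea of the abstract, counting copies that already exist on other nodes toward the replication factor, can be illustrated with a small sketch. This is not the paper's algorithm: it is a centralized toy planner under assumed simplifications (fixed chunks, SHA-256 fingerprints, a global view of all nodes), and the names `fingerprint`, `plan_replication`, `node_chunks`, and `k` are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint used to detect identical chunks (assumed: SHA-256)."""
    return hashlib.sha256(chunk).hexdigest()


def plan_replication(node_chunks, k):
    """Plan replicas so that every distinct chunk ends up on exactly k nodes.

    node_chunks: dict mapping node id -> list of chunks (bytes) held locally.
    k: desired replication factor (hypothetical parameter name).
    Returns a dict: fingerprint -> (nodes that keep their copy,
                                    nodes that must receive a new copy).
    """
    nodes = sorted(node_chunks)
    if k > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")

    # Step 1: find which nodes already hold each distinct chunk.
    holders = defaultdict(set)
    for node, chunks in node_chunks.items():
        for chunk in chunks:
            holders[fingerprint(chunk)].add(node)

    # Step 2: keep at most k natural copies, add copies only if fewer exist.
    plan = {}
    for fp, owners in holders.items():
        keep = sorted(owners)[:k]
        missing = k - len(keep)
        candidates = [n for n in nodes if n not in owners]
        plan[fp] = (keep, candidates[:missing])
    return plan


if __name__ == "__main__":
    data = {0: [b"A", b"B"], 1: [b"A", b"C"], 2: [b"A"]}
    for fp, (keep, add) in plan_replication(data, k=2).items():
        print(fp[:8], "keep on", keep, "send to", add)
```

In this toy input, chunk `A` already exists on three nodes, so no copy of it needs to cross the network; only `B` and `C` require one transfer each. A real system at scale would need a distributed way to detect duplicates and balance load, which this sketch does not attempt.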
Document type:
Conference paper
IPDPS '15: 29th IEEE International Parallel and Distributed Processing Symposium, May 2015, Hyderabad, India. 〈10.1109/IPDPS.2015.82〉

Cited literature: 30 references

https://hal.inria.fr/hal-01115700
Contributor: Bogdan Nicolae
Submitted on: Wednesday, February 11, 2015 - 15:46:59
Last modified on: Saturday, November 5, 2016 - 21:31:49
Document(s) archived on: Thursday, May 28, 2015 - 10:00:42

File

paper.pdf
Files produced by the author(s)

Citation

Bogdan Nicolae. Leveraging naturally distributed data redundancy to reduce collective I/O replication overhead. IPDPS '15: 29th IEEE International Parallel and Distributed Processing Symposium, May 2015, Hyderabad, India. 〈10.1109/IPDPS.2015.82〉. 〈hal-01115700〉
