Leveraging naturally distributed data redundancy to reduce collective I/O replication overhead

Bogdan Nicolae 1, *
* Corresponding author
Abstract : Dumping large amounts of related data simultaneously to local storage devices instead of a parallel file system is a frequent I/O pattern of HPC applications running at large scale. Since local storage resources are prone to failures and have limited potential to serve multiple requests in parallel, techniques such as replication are often used to enable resilience and high availability. However, replication introduces overhead, both in the network traffic necessary to distribute replicas and in extra storage space requirements. To reduce this overhead, state-of-the-art techniques often apply redundancy elimination (e.g., compression or deduplication) before replication, ignoring the natural redundancy that is already present. By contrast, this paper proposes a novel scheme that treats redundancy elimination and replication as a single co-optimized phase: remotely duplicated data is detected and directly leveraged to maintain a desired replication factor by keeping only as many replicas as needed and adding more if necessary. In this context, we introduce a series of high-performance algorithms specifically designed to operate under tight and controllable constraints at large scale. We present how this idea can be leveraged in practice and demonstrate its viability for two real-life HPC applications.
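
The core idea in the abstract (reuse copies that already exist on other nodes, and create new replicas only when fewer than the desired number exist) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `fingerprint` and `plan_replication`, the use of SHA-1 fingerprints, the centralized view of all nodes, and the "lowest node id first" placement policy are all assumptions made purely for illustration.

```python
import hashlib
from collections import defaultdict


def fingerprint(chunk: bytes) -> str:
    """Content hash used to detect identical chunks (assumed: SHA-1)."""
    return hashlib.sha1(chunk).hexdigest()


def plan_replication(node_chunks, k):
    """Hypothetical planner: node_chunks maps node id -> list of chunks.

    Returns (keep, send), both keyed by fingerprint:
      keep[fp] = nodes that retain their naturally present copy
      send[fp] = nodes that must receive an extra copy to reach k replicas
    Assumes k does not exceed the number of nodes.
    """
    nodes = sorted(node_chunks)
    holders = defaultdict(list)
    for node in nodes:
        # A set avoids counting the same chunk twice on a single node.
        for fp in {fingerprint(c) for c in node_chunks[node]}:
            holders[fp].append(node)

    keep, send = {}, {}
    for fp, owners in holders.items():
        keep[fp] = owners[:k]              # keep only as many copies as needed
        missing = k - len(keep[fp])        # add more only if necessary
        send[fp] = [n for n in nodes if n not in owners][:max(missing, 0)]
    return keep, send


if __name__ == "__main__":
    data = {0: [b"A", b"B"], 1: [b"A", b"C"], 2: [b"A"]}
    keep, send = plan_replication(data, k=2)
    extra = sum(len(targets) for targets in send.values())
    print("extra chunk transfers:", extra)
```

In this toy input, chunk `A` already exists on three nodes, so it needs no network transfer at all; only `B` and `C` require one extra copy each. A real implementation at scale would have to detect duplicates collectively without a global view, which this sketch does not attempt.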
Cited literature [30 references]

https://hal.inria.fr/hal-01115700
Contributor : Bogdan Nicolae
Submitted on : Wednesday, February 11, 2015 - 3:46:59 PM
Last modification on : Saturday, November 5, 2016 - 9:31:49 PM
Long-term archiving on : Thursday, May 28, 2015 - 10:00:42 AM

File

paper.pdf
Files produced by the author(s)

Citation

Bogdan Nicolae. Leveraging naturally distributed data redundancy to reduce collective I/O replication overhead. IPDPS '15: 29th IEEE International Parallel and Distributed Processing Symposium, May 2015, Hyderabad, India. ⟨10.1109/IPDPS.2015.82⟩. ⟨hal-01115700⟩
