Is it time to revisit Erasure Coding in Data-intensive clusters?

Jad Darrous (1), Shadi Ibrahim (2, 3), Christian Pérez (1)
(1) AVALON - Algorithms and Software Architectures for Distributed and HPC Platforms
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
(2) STACK - Software Stack for Massively Geo-Distributed Infrastructures
Inria Rennes – Bretagne Atlantique, LS2N - Laboratoire des Sciences du Numérique de Nantes
Abstract: Data-intensive clusters rely heavily on distributed storage systems to accommodate the unprecedented growth of data. The Hadoop Distributed File System (HDFS) is the primary storage system for data analytics frameworks such as Spark and Hadoop. Traditionally, HDFS uses replication to ensure data availability and to allow locality-aware task execution of data-intensive applications. Recently, erasure coding (EC) has emerged as an alternative to replication in storage systems, owing to the continuous reduction in its computation overhead. In this work, we conduct an extensive experimental study to understand the performance of data-intensive applications under replication and EC. We use representative benchmarks on the Grid'5000 testbed to evaluate how analytic workloads, data persistency, failures, the back-end storage devices, and the network configuration affect their performance. Our study sheds light not only on the potential benefits of erasure coding in data-intensive clusters but also on the aspects that may help to realize it effectively.
Document type :
Conference papers

https://hal.inria.fr/hal-02263116
Contributor: Jad Darrous
Submitted on: Friday, August 2, 2019 - 8:50:19 PM
Last modification on: Monday, December 2, 2019 - 11:43:50 AM
Long-term archiving on: Wednesday, January 8, 2020 - 7:25:05 PM

File

MASCOTS CR.pdf
Files produced by the author(s)

Citation

Jad Darrous, Shadi Ibrahim, Christian Pérez. Is it time to revisit Erasure Coding in Data-intensive clusters? MASCOTS 2019 - 27th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Oct 2019, Rennes, France. pp.165-178, ⟨10.1109/MASCOTS.2019.00026⟩. ⟨hal-02263116⟩
