Is it time to revisit Erasure Coding in Data-intensive clusters?

Jad Darrous; Shadi Ibrahim; Christian Pérez

doi:10.1109/MASCOTS.2019.00026

Conference paper, Year: 2019

Jad Darrous

Role: Author
PersonId: 174216
IdHAL: jad-darrous
ORCID: 0000-0003-4573-4529

Algorithms and Software Architectures for Distributed and HPC Platforms

Shadi Ibrahim

Role: Author
PersonId: 13360
IdHAL: shadi-ibrahim

Software Stack for Massively Geo-Distributed Infrastructures

Département Automatique, Productique et Informatique

Christian Pérez

Role: Author
PersonId: 3022
IdHAL: chperez
IdRef: 094180962

Algorithms and Software Architectures for Distributed and HPC Platforms

Abstract

Data-intensive clusters rely heavily on distributed storage systems to accommodate the unprecedented growth of data. The Hadoop Distributed File System (HDFS) is the primary storage for data analytics frameworks such as Spark and Hadoop. Traditionally, HDFS uses replication to ensure data availability and to allow locality-aware task execution of data-intensive applications. Recently, erasure coding (EC) has emerged as an alternative to replication in storage systems, owing to the continuous reduction in its computation overhead. In this work, we conduct an extensive experimental study to understand the performance of data-intensive applications under replication and EC. We use representative benchmarks on the Grid'5000 testbed to evaluate how analytic workloads, data persistency, failures, the back-end storage devices, and the network configuration affect their performance. Our study sheds light not only on the potential benefits of erasure coding in data-intensive clusters but also on the aspects that may help to realize it effectively.

Keywords

Hadoop; Erasure codes; Data-intensive clusters; MapReduce; Experimental evaluation

Domains

Computer Science [cs]; Performance and reliability [cs.PF]

Main file

MASCOTS CR.pdf (892 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02263116

Submitted on: Friday, August 2, 2019, 20:50:19

Last modified on: Thursday, February 15, 2024, 03:30:57

Long-term archiving on: Wednesday, January 8, 2020, 19:25:05

Dates and versions

hal-02263116, version 1 (02-08-2019)

Identifiers

HAL Id: hal-02263116, version 1
DOI: 10.1109/MASCOTS.2019.00026

Cite

Jad Darrous, Shadi Ibrahim, Christian Pérez. Is it time to revisit Erasure Coding in Data-intensive clusters?. MASCOTS 2019 - 27th IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Oct 2019, Rennes, France. pp.165-178, ⟨10.1109/MASCOTS.2019.00026⟩. ⟨hal-02263116⟩

Collections

ENS-LYON UNIV-NANTES INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA UNIV-LYON1 EC-NANTES IRISA GRID5000 UNAM INRIA2 UR1-MATH-STIC LS2N UR1-UFR-ISTIC LS2N-STACK UNIV-RENNES IMT-ATLANTIQUE UDL SILECS ANR UR1-MATH-NUM NANTES-UNIVERSITE
