Handling Data-skew Effects in Join Operations using MapReduce

Mohamad Al Hajj Hassan; Mostafa Bamha; Frédéric Loulergue

Conference paper, Year: 2014



Mohamad Al Hajj Hassan

Role: Author
PersonId: 904857

Laboratoire d'Informatique Fondamentale d'Orléans

Mostafa Bamha

Role: Author
PersonId: 834021

Laboratoire d'Informatique Fondamentale d'Orléans

Frédéric Loulergue

Role: Author
PersonId: 2199
IdHAL: frederic-loulergue
ORCID: 0000-0001-9301-7829
IdRef: 096178558

PaMDA

Abstract

For over a decade, MapReduce has been a prominent programming model for handling vast amounts of raw data in large-scale systems. The model provides scalability, reliability, and availability with reasonable query processing times. However, these large-scale systems still face challenges: data skew, task imbalance, high disk I/O, and redistribution costs can have disastrous effects on performance. In this paper, we introduce the MRFA-Join algorithm, a new frequency-adaptive algorithm based on the MapReduce programming model and a randomised key redistribution approach for join processing of large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing properties during all stages of join computation. These performance properties have been confirmed by a series of experiments.
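The abstract does not spell out the steps of MRFA-Join, so the following is only a generic sketch of the two ideas it names: using key frequencies to detect skew, and redistributing skewed keys at random instead of by hash. It is not the authors' algorithm. The function name `skew_aware_join`, the `threshold` parameter, and the choice to replicate the matching records of the second relation to every reducer are all assumptions made for illustration; reducers are simulated in-process as plain Python lists.

```python
import random
from collections import Counter, defaultdict


def skew_aware_join(r, s, num_reducers=4, threshold=3, seed=0):
    """Equi-join two lists of (key, value) pairs with simulated reducers.

    Hypothetical illustration only: keys that occur more than `threshold`
    times in `r` are treated as skewed ("hot").
    """
    rng = random.Random(seed)

    # Phase 1: histogram of join keys in r (the frequency information).
    freq_r = Counter(k for k, _ in r)
    hot = {k for k, count in freq_r.items() if count > threshold}

    # Phase 2 (map side): route every record to one or more reducers.
    r_buckets = [defaultdict(list) for _ in range(num_reducers)]
    s_buckets = [defaultdict(list) for _ in range(num_reducers)]
    for k, v in r:
        # Hot keys are scattered at random; other keys are hash-partitioned.
        dest = rng.randrange(num_reducers) if k in hot else hash(k) % num_reducers
        r_buckets[dest][k].append(v)
    for k, v in s:
        # Records of s matching a hot key are replicated to every reducer,
        # so each scattered record of r still meets all of its partners.
        dests = range(num_reducers) if k in hot else [hash(k) % num_reducers]
        for d in dests:
            s_buckets[d][k].append(v)

    # Phase 3 (reduce side): each reducer joins its local buckets.
    result, loads = [], []
    for rb, sb in zip(r_buckets, s_buckets):
        local = [(k, a, b) for k, vals in rb.items()
                 for a in vals for b in sb.get(k, [])]
        loads.append(len(local))
        result.extend(local)
    return result, loads
```

Calling `skew_aware_join(r, s)` returns the joined triples together with the number of output tuples each simulated reducer produced, which makes it easy to compare the load spread against plain hash partitioning. With pure hashing, every record of a hot key lands on a single reducer; here those records are spread across all of them, at the cost of replicating the matching records of `s`.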

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]; Databases [cs.DB]

Contributor: Frédéric Loulergue

https://inria.hal.science/hal-00958116

Submitted on: Tuesday, March 11, 2014, 16:43:48

Last modified on: Saturday, June 25, 2022, 10:12:30

Dates and versions

hal-00958116, version 1 (11-03-2014)

Identifiers

HAL Id: hal-00958116, version 1

Cite

Mohamad Al Hajj Hassan, Mostafa Bamha, Frédéric Loulergue. Handling Data-skew Effects in Join Operations using MapReduce. ICCS, 2014, Cairns, Australia. ⟨hal-00958116⟩


Collections

UNIV-ORLEANS MSL MSL-THESE

