Handling Data-skew Effects in Join Operations using MapReduce

Abstract: For over a decade, MapReduce has been a prominent programming model for handling vast amounts of raw data in large-scale systems. This model provides scalability, reliability and availability with reasonable query processing times. However, these large-scale systems still face challenges: data skew, task imbalance, and high disk I/O and redistribution costs can severely degrade performance. In this paper, we introduce the MRFA-Join algorithm, a new frequency-adaptive algorithm based on the MapReduce programming model and a randomised key redistribution approach for join processing of large-scale datasets. A cost analysis of this algorithm shows that our approach is insensitive to data skew and ensures perfect balancing during all stages of join computation. These performance properties have been confirmed by a series of experiments.
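The abstract only names the ingredients of MRFA-Join (key frequencies and a randomised redistribution of frequent keys), so the following is a generic, single-process Python sketch of that family of skew-handling techniques, closer to a classic fragment-replicate skewed join than to the authors' algorithm. It assumes relations are lists of `(key, value)` pairs; the function names, the `threshold` parameter and the rule for spreading a frequent key over consecutive reducers are hypothetical choices made for illustration. Rows of a frequent key in `R` are sent to one reducer of a small group at random, while the matching `S` rows are replicated to every reducer of that group, so each pair still meets exactly once.

```python
import random
import zlib
from collections import Counter, defaultdict


def base_reducer(key, num_reducers):
    """Deterministic hash partitioning of a key onto a reducer index."""
    return zlib.crc32(repr(key).encode()) % num_reducers


def skew_aware_join(R, S, num_reducers, threshold, seed=0):
    """Equi-join R and S (lists of (key, value) pairs) on key.

    Hypothetical sketch, not MRFA-Join itself: a key whose frequency in R
    exceeds `threshold` is spread over several consecutive reducers. Each of
    its R rows goes to one of them at random; its S rows go to all of them.
    Returns the joined (key, r_value, s_value) rows and per-reducer loads.
    """
    rng = random.Random(seed)
    freq_r = Counter(k for k, _ in R)  # key frequencies in R
    # Number of reducers each key is spread over (1 for non-frequent keys);
    # -(-c // threshold) is ceil(c / threshold).
    spread = {
        k: min(num_reducers, -(-c // threshold)) if c > threshold else 1
        for k, c in freq_r.items()
    }

    # One (R-part, S-part) pair of key -> values maps per reducer.
    buckets = [(defaultdict(list), defaultdict(list)) for _ in range(num_reducers)]
    for k, v in R:
        # Randomised redistribution: pick one reducer in the key's group.
        offset = rng.randrange(spread[k])
        target = (base_reducer(k, num_reducers) + offset) % num_reducers
        buckets[target][0][k].append(v)
    for k, w in S:
        # Replicate S rows to every reducer in the key's group.
        for i in range(spread.get(k, 1)):
            target = (base_reducer(k, num_reducers) + i) % num_reducers
            buckets[target][1][k].append(w)

    # Reduce phase: a local hash join inside each reducer.
    result, loads = [], []
    for r_part, s_part in buckets:
        loads.append(sum(map(len, r_part.values())) + sum(map(len, s_part.values())))
        for k, vs in r_part.items():
            for v in vs:
                for w in s_part.get(k, []):
                    result.append((k, v, w))
    return result, loads


if __name__ == "__main__":
    # One heavily skewed key ("hot") plus 100 ordinary keys.
    R = [("hot", i) for i in range(1000)] + [(f"k{i}", i) for i in range(100)]
    S = [("hot", "x"), ("hot", "y")] + [(f"k{i}", -i) for i in range(100)]
    rows, loads = skew_aware_join(R, S, num_reducers=4, threshold=100)
    print(len(rows), loads)
```

With plain hash partitioning, all 1000 `"hot"` rows of `R` would land on a single reducer; here they are split across four reducers at the cost of replicating the two matching `S` rows four times, which is the usual trade-off such redistribution schemes make between balance and communication.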
Document type:
Conference paper
ICCS, 2014, Cairns, Australia. Elsevier, 2014, Procedia Computer Science

https://hal.inria.fr/hal-00958116
Contributor: Frédéric Loulergue
Submitted on: Tuesday, 11 March 2014, 16:43:48
Last modified on: Tuesday, 28 October 2014, 18:20:47

Identifiers

  • HAL Id: hal-00958116, version 1

Citation

Mohamad Al Hajj Hassan, Mostafa Bamha, Frédéric Loulergue. Handling Data-skew Effects in Join Operations using MapReduce. ICCS, 2014, Cairns, Australia. Elsevier, 2014, Procedia Computer Science. 〈hal-00958116〉