Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce

Olivier Beaumont 1 Thomas Lambert 1 Loris Marchal 2, 3 Bastien Thomas 4
1 Realopt - Reformulations based algorithms for Combinatorial Optimization
LaBRI - Laboratoire Bordelais de Recherche en Informatique, IMB - Institut de Mathématiques de Bordeaux, Inria Bordeaux - Sud-Ouest
2 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract : MapReduce is a well-know framework for distributing data-processing computations onto parallel clusters. In MapReduce, a large computation is broken into small tasks that run in parallel on multiple machines, and scales easily to very large clusters of inexpensive commodity computers. Before the Map phase, the original dataset is split into data chunks that are replicated (a constant number of times, usually 3) and distributed randomly onto computing nodes. During the Map phase, local tasks (i.e., tasks whose data chunks are stored locally) are assigned in priority when processors request tasks. In this paper, we provide the first complete theoretical analysis of data locality in the Map phase of MapReduce, and more generally, for bag-of-tasks applications that behave like MapReduce. We prove that if tasks are homogeneous (in terms of processing time), as soon as the replication factor is larger than 2, FindAssignment, a matching based algorithm, achieves a quasi-perfect makespan (i.e., optimal up to an additive constant of one step) using a sophisticated matching algorithm. Above result is proved with high probability when the number of tasks becomes arbitrarily large, and we therefore complement theoretical results with simulations that corroborate them even for small number of tasks. We also show that the matching-based approach leads to an improvement of data locality during the Map phase and therefore decreases the amount of communications needed to achieve perfect makespan, compared to the classical MapReduce greedy approach.
Type de document :
Rapport
[Research Report] RR-8968, Inria - Research Centre Grenoble – Rhône-Alpes; Inria Bordeaux Sud-Ouest. 2017
Liste complète des métadonnées

Littérature citée [34 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/hal-01386539
Contributeur : Thomas Lambert <>
Soumis le : mardi 24 octobre 2017 - 17:47:37
Dernière modification le : vendredi 20 avril 2018 - 15:44:27

Fichier

paperRRinria (1).pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-01386539, version 5

Citation

Olivier Beaumont, Thomas Lambert, Loris Marchal, Bastien Thomas. Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce. [Research Report] RR-8968, Inria - Research Centre Grenoble – Rhône-Alpes; Inria Bordeaux Sud-Ouest. 2017. 〈hal-01386539v5〉

Partager

Métriques

Consultations de la notice

171

Téléchargements de fichiers

60