Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce

Olivier Beaumont; Thomas Lambert; Loris Marchal; Bastien Thomas

Report (Research Report) Year: 2017


Olivier Beaumont

Role: Author
PersonId : 181224
IdHAL : olivier-beaumont
ORCID : 0000-0003-2741-6228
IdRef : 124577083

Reformulations based algorithms for Combinatorial Optimization

Thomas Lambert

Role: Author
PersonId : 967090
IdHAL : thomas-lambert

Reformulations based algorithms for Combinatorial Optimization

Loris Marchal

Role: Author
PersonId : 170697
IdHAL : loris-marchal
ORCID : 0000-0002-5519-9913
IdRef : 112112986

Optimisation des ressources : modèles, algorithmes et ordonnancement

Laboratoire de l'Informatique du Parallélisme

Bastien Thomas

Role: Author
PersonId : 991304

École normale supérieure - Rennes

Abstract

MapReduce is a well-known framework for distributing data-processing computations onto parallel clusters. In MapReduce, a large computation is broken into small tasks that run in parallel on multiple machines, and the framework scales easily to very large clusters of inexpensive commodity computers. Before the Map phase, the original dataset is split into data chunks that are replicated (a constant number of times, usually 3) and distributed randomly onto computing nodes. During the Map phase, local tasks (i.e., tasks whose data chunks are stored locally) are assigned in priority when processors request tasks. In this paper, we provide the first complete theoretical analysis of data locality in the Map phase of MapReduce and, more generally, for bag-of-tasks applications that behave like MapReduce. We prove that if tasks are homogeneous (in terms of processing time), then as soon as the replication factor is larger than 2, FindAssignment, a matching-based algorithm, achieves a quasi-perfect makespan (i.e., optimal up to an additive constant of one step). This result holds with high probability as the number of tasks becomes arbitrarily large; we therefore complement the theoretical results with simulations that corroborate them even for small numbers of tasks. We also show that the matching-based approach improves data locality during the Map phase and therefore decreases the amount of communication needed to achieve a perfect makespan, compared to the classical MapReduce greedy approach.
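To make the setting of the abstract concrete, the following Python sketch places replicated chunks at random, then compares a greedy local-first policy with a maximum assignment of local tasks under a per-processor capacity. It is only an illustration under stated assumptions: it is not the report's FindAssignment algorithm, whose details are not given on this page, but a plain augmenting-path bipartite matching with capacities. All function names (`place_chunks`, `greedy_nonlocal_count`, `max_local_tasks`) and the parameter values are hypothetical.

```python
import random


def place_chunks(n_tasks, n_procs, replication, rng):
    """Store each task's chunk on `replication` distinct random processors."""
    return [rng.sample(range(n_procs), replication) for _ in range(n_tasks)]


def greedy_nonlocal_count(replicas, n_procs, rng):
    """Greedy baseline (an assumed model of the classical policy): in each
    step every processor takes a pending local task if it has one,
    otherwise a random pending task, which is counted as non-local."""
    local = [[] for _ in range(n_procs)]
    for task, holders in enumerate(replicas):
        for proc in holders:
            local[proc].append(task)
    pending = set(range(len(replicas)))
    nonlocal_tasks = 0
    while pending:
        for proc in range(n_procs):
            if not pending:
                break
            # Drop local tasks that another replica holder already ran.
            while local[proc] and local[proc][-1] not in pending:
                local[proc].pop()
            if local[proc]:
                task = local[proc].pop()
            else:
                task = rng.choice(sorted(pending))
                nonlocal_tasks += 1
            pending.remove(task)
    return nonlocal_tasks


def max_local_tasks(replicas, n_procs, capacity):
    """Maximum number of tasks that can run locally when each processor
    runs at most `capacity` tasks (augmenting-path bipartite matching)."""
    assigned = [[] for _ in range(n_procs)]

    def try_assign(task, seen):
        for proc in replicas[task]:
            if proc in seen:
                continue
            seen.add(proc)
            if len(assigned[proc]) < capacity:
                assigned[proc].append(task)
                return True
            # Processor is full: try to move one of its tasks elsewhere.
            for i, other in enumerate(assigned[proc]):
                if try_assign(other, seen):
                    assigned[proc][i] = task
                    return True
        return False

    return sum(try_assign(task, set()) for task in range(len(replicas)))


if __name__ == "__main__":
    rng = random.Random(1)
    n_tasks, n_procs, replication = 200, 20, 3  # illustrative values only
    replicas = place_chunks(n_tasks, n_procs, replication, rng)
    capacity = -(-n_tasks // n_procs)  # ceil(n_tasks / n_procs) steps
    matched = max_local_tasks(replicas, n_procs, capacity)
    print("non-local tasks, matching:", n_tasks - matched)
    print("non-local tasks, greedy:  ",
          greedy_nonlocal_count(replicas, n_procs, rng))
```

In this model, the tasks that greedy runs locally always form a feasible capacity-bounded assignment, so the matching never leaves more tasks non-local than greedy does. How close the matching comes to zero non-local tasks for a given replication factor is the question the report analyses.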

Keywords

Resource Allocation and Scheduling; Analysis of Randomized Algorithms; Matchings; MapReduce; Balls-into-bins

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

paperRRinria (1).pdf (1.52 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01386539

Submitted on: Tuesday, October 24, 2017, 17:47:37

Last modified on: Thursday, April 4, 2024, 03:07:49

Dates and versions

hal-01386539 , version 1 (24-10-2016)

hal-01386539 , version 2 (02-11-2016)

hal-01386539 , version 3 (10-02-2017)

hal-01386539 , version 4 (10-02-2017)

hal-01386539 , version 5 (24-10-2017)

Identifiers

HAL Id : hal-01386539 , version 5

Cite

Olivier Beaumont, Thomas Lambert, Loris Marchal, Bastien Thomas. Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce. [Research Report] RR-8968, Inria - Research Centre Grenoble – Rhône-Alpes; Inria Bordeaux Sud-Ouest. 2017. ⟨hal-01386539v5⟩

Collections

ENS-LYON CNRS INRIA UNIV-LYON1 INRIA-RRRT IMB INRIA2 LARA UNIV-RENNES UDL ANR
