Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Rapport (Rapport De Recherche) Année : 2017

Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce

Résumé

MapReduce is a well-know framework for distributing data-processing computations onto parallel clusters. In MapReduce, a large computation is broken into small tasks that run in parallel on multiple machines, and scales easily to very large clusters of inexpensive commodity computers. Before the Map phase, the original dataset is split into data chunks that are replicated (a constant number of times, usually 3) and distributed randomly onto computing nodes. During the Map phase, local tasks (i.e., tasks whose data chunks are stored locally) are assigned in priority when processors request tasks. In this paper, we provide the first complete theoretical analysis of data locality in the Map phase of MapReduce, and more generally, for bag-of-tasks applications that behave like MapReduce. We prove that if tasks are homogeneous (in terms of processing time), as soon as the replication factor is larger than 2, FindAssignment, a matching based algorithm, achieves a quasi-perfect makespan (i.e., optimal up to an additive constant of one step) using a sophisticated matching algorithm. Above result is proved with high probability when the number of tasks becomes arbitrarily large, and we therefore complement theoretical results with simulations that corroborate them even for small number of tasks. We also show that the matching-based approach leads to an improvement of data locality during the Map phase and therefore decreases the amount of communications needed to achieve perfect makespan, compared to the classical MapReduce greedy approach.
Fichier principal
Vignette du fichier
RR-8968.pdf (1.01 Mo) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01386539 , version 1 (24-10-2016)
hal-01386539 , version 2 (02-11-2016)
hal-01386539 , version 3 (10-02-2017)
hal-01386539 , version 4 (10-02-2017)
hal-01386539 , version 5 (24-10-2017)

Identifiants

  • HAL Id : hal-01386539 , version 4

Citer

Olivier Beaumont, Thomas Lambert, Loris Marchal, Bastien Thomas. Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce. [Research Report] RR-8968, Inria - Research Centre Grenoble – Rhône-Alpes; Inria Bordeaux Sud-Ouest. 2017. ⟨hal-01386539v4⟩
565 Consultations
412 Téléchargements

Partager

Gmail Facebook X LinkedIn More