Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce

Olivier Beaumont 1 Thomas Lambert 1 Loris Marchal 2, 3 Bastien Thomas 4
1 Realopt - Reformulations based algorithms for Combinatorial Optimization
LaBRI - Laboratoire Bordelais de Recherche en Informatique, IMB - Institut de Mathématiques de Bordeaux, Inria Bordeaux - Sud-Ouest
2 ROMA - Optimisation des ressources : modèles, algorithmes et ordonnancement
Inria Grenoble - Rhône-Alpes, LIP - Laboratoire de l'Informatique du Parallélisme
Abstract : MapReduce is a well-known framework for distributing data-processing computations onto parallel clusters. In MapReduce, a large computation is broken into small tasks that run in parallel on multiple machines, and it scales easily to very large clusters of inexpensive commodity computers. Before the Map phase, the original dataset is split into data chunks that are replicated (a constant number of times, usually 3) and distributed randomly onto computing nodes. During the Map phase, local tasks (i.e., tasks whose data chunks are stored locally) are assigned in priority when processors request tasks. In this paper, we provide the first complete theoretical analysis of data locality in the Map phase of MapReduce and, more generally, for bag-of-tasks applications that behave like MapReduce. We prove that if tasks are homogeneous (in terms of processing time), then as soon as the replication factor is larger than 2, FindAssignment, a matching-based algorithm, achieves a quasi-perfect makespan (i.e., optimal up to an additive constant of one step). This result is proved with high probability when the number of tasks becomes arbitrarily large, and we therefore complement the theoretical results with simulations that corroborate them even for small numbers of tasks. We also show that the matching-based approach improves data locality during the Map phase and therefore decreases the amount of communication needed to achieve a perfect makespan, compared to the classical MapReduce greedy approach.
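
To make the setting of the abstract concrete, the following is a minimal sketch, not the report's FindAssignment algorithm. It assumes homogeneous unit-time tasks, places each chunk on `replication` distinct random processors, and then uses a textbook augmenting-path bipartite matching to assign every task to a processor that holds its chunk, with at most `capacity` tasks per processor (that is, a makespan of `capacity` steps). The names `place_chunks` and `local_assignment` are hypothetical and chosen for illustration only.

```python
import random


def place_chunks(n_tasks, n_procs, replication, seed=0):
    """Place each task's chunk on `replication` distinct random processors."""
    rng = random.Random(seed)
    return [rng.sample(range(n_procs), replication) for _ in range(n_tasks)]


def local_assignment(replicas, n_procs, capacity):
    """Assign tasks to processors holding their chunk, at most `capacity` each.

    Returns a list `owner` where owner[t] is the processor running task t
    locally, or None if t could not be placed locally within the capacity.
    Illustrative augmenting-path matching, not the report's algorithm.
    """
    assigned = [[] for _ in range(n_procs)]  # tasks currently on each processor
    owner = [None] * len(replicas)

    def try_assign(task, visited):
        for proc in replicas[task]:
            if proc in visited:
                continue
            visited.add(proc)
            # Free slot: take it directly.
            if len(assigned[proc]) < capacity:
                assigned[proc].append(task)
                owner[task] = proc
                return True
            # Processor is full: try to move one of its tasks elsewhere.
            for i, other in enumerate(assigned[proc]):
                if try_assign(other, visited):
                    assigned[proc][i] = task
                    owner[task] = proc
                    return True
        return False

    for task in range(len(replicas)):
        try_assign(task, set())
    return owner
```

With `n_tasks` tasks on `n_procs` processors, the best possible makespan is `ceil(n_tasks / n_procs)` steps. Calling `local_assignment` with that value as `capacity`, and then with one more step, gives a small-scale way to observe whether every task can run locally, which is the kind of behaviour the simulations mentioned in the abstract examine.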

https://hal.inria.fr/hal-01386539
Contributor : Thomas Lambert
Submitted on : Tuesday, October 24, 2017 - 5:47:37 PM
Last modification on : Friday, December 14, 2018 - 2:39:35 PM

Identifiers

  • HAL Id : hal-01386539, version 5

Citation

Olivier Beaumont, Thomas Lambert, Loris Marchal, Bastien Thomas. Matching-Based Assignement Strategies for Improving Data Locality of Map Tasks in MapReduce. [Research Report] RR-8968, Inria - Research Centre Grenoble – Rhône-Alpes; Inria Bordeaux Sud-Ouest. 2017. ⟨hal-01386539v5⟩