Matching-Based Allocation Strategies for Improving Data Locality of Map Tasks in MapReduce - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Pré-Publication, Document De Travail Année : 2016

Matching-Based Allocation Strategies for Improving Data Locality of Map Tasks in MapReduce

Résumé

MapReduce is a well-know framework for distributing data-processing computations on parallel clusters. In MapReduce, a large computation is broken into small tasks that run in parallel on multiple machines, and scales easily to very large clusters of inexpensive commodity computers. Before the Map phase, the original dataset is first split into chunks, that are replicated (a constant number of times, usually 3) and distributed onto the computing nodes. During the Map phase, nodes request tasks and are allocated first tasks associated to local chunks (if any). Communications take place when requesting nodes do not hold any local chunk anymore. In this paper, we provide the first complete theoretical data locality analysis of the Map phase of MapReduce, and more generally, for bag-of-tasks applications that behaves like MapReduce. We show that if tasks are homogeneous (in term of processing time), once the chunks have been replicated randomly on resources with a replication factor larger than 2, it is possible to find a priority mechanism for tasks that achieves a quasi-perfect number of communications using a sophisticated matching algorithm. In the more realistic case of heterogeneous processing times, we prove using an actual trace of a MapReduce server that this priority mechanism enables to complete the Map phase with significantly fewer communications, even on realistic distributions of task durations.
Fichier principal
Vignette du fichier
paperRR.pdf (653.93 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-01386539 , version 1 (24-10-2016)
hal-01386539 , version 2 (02-11-2016)
hal-01386539 , version 3 (10-02-2017)
hal-01386539 , version 4 (10-02-2017)
hal-01386539 , version 5 (24-10-2017)

Identifiants

  • HAL Id : hal-01386539 , version 1

Citer

Olivier Beaumont, Thomas Lambert, Loris Marchal, Bastien Thomas. Matching-Based Allocation Strategies for Improving Data Locality of Map Tasks in MapReduce. 2016. ⟨hal-01386539v1⟩
564 Consultations
412 Téléchargements

Partager

Gmail Facebook X LinkedIn More