Self-configuration of the Number of Concurrently Running MapReduce Jobs in a Hadoop Cluster

There is a trade-off between the number of concurrently running MapReduce jobs and their corresponding map and reduce tasks within a node in a Hadoop cluster. Leaving this trade-off statically configured to a single value can significantly degrade job response times and leave resources underused. To overcome this problem, we propose an approach based on a feedback control loop that dynamically adjusts the Hadoop resource manager configuration according to the current state of the cluster. A preliminary assessment based on workloads synthesized from real-world traces shows that system performance can be improved by about 30% compared to the default Hadoop setup.
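The feedback loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which setting is adjusted or which control law is used, so everything here is an assumption made for illustration: the `ClusterState` fields, the `job_share` knob (a fraction of cluster resources reserved for admitting jobs), the thresholds, and the step size are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical cluster snapshot; the field names are illustrative only."""
    memory_utilization: float  # fraction of cluster memory in use, 0.0 to 1.0
    pending_jobs: int          # jobs waiting to be admitted


def next_job_share(current: float, state: ClusterState,
                   low: float = 0.6, high: float = 0.9,
                   step: float = 0.05,
                   floor: float = 0.05, ceiling: float = 0.5) -> float:
    """Return the adjusted share of resources reserved for admitting jobs.

    Raising the share lets more jobs run concurrently; lowering it leaves
    more room for the map and reduce tasks of jobs that already run.
    All thresholds are assumed values, not figures from the paper.
    """
    if state.memory_utilization < low and state.pending_jobs > 0:
        proposed = current + step   # idle capacity and a backlog: admit more jobs
    elif state.memory_utilization > high:
        proposed = current - step   # saturated: favor tasks of running jobs
    else:
        proposed = current          # inside the dead band: keep the setting
    # Clamp to the allowed range and round away floating-point noise.
    return round(min(ceiling, max(floor, proposed)), 2)


def run_loop(initial: float, snapshots: list) -> list:
    """Apply the controller to a sequence of snapshots, one per control period."""
    share = initial
    history = []
    for state in snapshots:
        share = next_job_share(share, state)
        history.append(share)
    return history


if __name__ == "__main__":
    trace = [ClusterState(0.4, 3), ClusterState(0.5, 2),
             ClusterState(0.95, 0), ClusterState(0.8, 1)]
    print(run_loop(0.10, trace))  # prints [0.15, 0.2, 0.15, 0.15]
```

In a real deployment the snapshot would come from the resource manager's metrics and the returned value would be written back to its configuration; both steps are left out because the abstract does not describe them.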

Keywords

Hadoop Cluster MapReduce Performance Optimization

Domains

Computer Science [cs] / Information Retrieval [cs.IR]

Main file

icac15-paper.pdf (264.52 KB)

Origin: Files produced by the author(s)

Contributor: Bo ZHANG

https://inria.hal.science/hal-01143157

Submitted on: Wednesday, 6 May 2015, 11:24:44

Last modified on: Monday, 15 April 2024, 11:25:23

Long-term archiving on: Wednesday, 19 April 2017, 18:26:20

Dates and versions

hal-01143157, version 1 (6 May 2015)

Identifiers

HAL Id: hal-01143157, version 1

Cite

Bo Zhang, Filip Křikava, Romain Rouvoy, Lionel Seinturier. Self-configuration of the Number of Concurrently Running MapReduce Jobs in a Hadoop Cluster. ICAC 2015, Jul 2015, Grenoble, France. pp.149-150. ⟨hal-01143157⟩


Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SPIRALS UNIV-LILLE
