Scheduling/Data Management Heuristics

Frédéric Desprez; Sylvain Gault; Frédéric Suter

Preprint, Working Paper. Year: 2012



Frédéric Desprez

Role: Author
PersonId: 6600
IdHAL: frederic-desprez
IdRef: 034430563

Algorithms and Software Architectures for Distributed and HPC Platforms

Sylvain Gault

Role: Author
PersonId: 933451

Algorithms and Software Architectures for Distributed and HPC Platforms

Frédéric Suter

Role: Author
PersonId: 739871
IdHAL: frederic-suter
ORCID: 0000-0003-1902-1955
IdRef: 078831962

Centre de Calcul de l'IN2P3

Algorithms and Software Architectures for Distributed and HPC Platforms

Abstract

The volume of data produced by scientific applications is increasing rapidly. Some applications are expected to produce several petabytes per year. Processing this amount of data requires the computing power of several hundred or several thousand machines used at the same time. One of the biggest challenges is then how to program these machines so that they collaborate on the same computation. One answer, proposed by Google, is the MapReduce paradigm. MapReduce has the advantage of being quite simple for the user to program, and it handles on its own the repetitive or complex tasks such as data transfers between nodes, task scheduling, and node failures. These automated tasks must be handled in an optimized way for the framework to be fast and scalable. This report presents our first studies towards efficient scheduling of MapReduce operations. More specifically, we focus on scheduling the data transfers together with the tasks. We present an interesting piece of related work on this topic and our algorithm, which improves on its results.

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]

Main file

MapReduce-D3.1.pdf (839.84 KB)

Origin: Files produced by the author(s)

Contributor: Sylvain Gault

https://inria.hal.science/hal-00759546

Submitted on: Monday, December 3, 2012, 15:34:07

Last modified on: Thursday, April 11, 2024, 13:18:11

Long-term archiving on: Monday, March 4, 2013, 03:00:09

Dates and versions

hal-00759546, version 1 (03-12-2012)

Identifiers

HAL Id: hal-00759546, version 1

Cite

Frédéric Desprez, Sylvain Gault, Frédéric Suter. Scheduling/Data Management Heuristics. 2012. ⟨hal-00759546⟩


Collections

IN2P3 ENS-LYON CNRS INRIA UNIV-LYON1 INRIA2 UDL ANR CC-IN2P3

