hal-00684866, version 1
Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures
Gabriel Antoniu (1), Julien Bigot (d, 2), Christophe Blanchet (3), Luc Bougé (a, 1), François Briant (4), Franck Cappello (5, 6, 7), Alexandru Costan (1), Frédéric Desprez (d, 2), Gilles Fedak (d, 2), Sylvain Gault (d, 2), Kate Keahey (b, 8), Bogdan Nicolae (d, 5, 6), Christian Pérez (d, 2), Anthony Simonet (d, 2), Frédéric Suter (c, 9), Bing Tang (d, 2), Raphael Terreux (3)
1st International IBM Cloud Academy Conference - ICA CON 2012 (2012)
Abstract: As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, the frameworks that currently support it still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions in which there is room for progress: storage efficiency under massively concurrent data access, scheduling, volatility, and fault tolerance. We frame this discussion in the context of the ongoing trend toward tighter integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach that aims to overcome the current limitations of existing Map-Reduce frameworks in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bioinformatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids.
- a – École normale supérieure de Cachan - ENS Cachan
- b – Argonne National Laboratory
- c – CNRS
- d – INRIA
- 1 : KerData (INRIA - IRISA)
- INRIA – CNRS : UMR6074 – École normale supérieure de Cachan - ENS Cachan – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
- 2 : AVALON (LIP Lyon / Inria Grenoble Rhône-Alpes)
- CNRS : UMR5668 – INRIA – École Normale Supérieure - Lyon – Laboratoire d'informatique du Parallélisme – Université Claude Bernard - Lyon I
- 3 : Institut de biologie et chimie des protéines [Lyon] (IBCP)
- CNRS : UMR5086 – Université Claude Bernard - Lyon I
- 4 : IBM PSSC Montpellier - Innovation Lab.
- IBM PSSC Montpellier
- 5 : GRAND-LARGE (INRIA Saclay - Ile de France)
- INRIA – CNRS : UMR8623 – Université Paris XI - Paris Sud
- 6 : Joint Laboratory for Petascale Computing [Illinois] (JLPC)
- University of Illinois at Urbana-Champaign – INRIA
- 7 : Laboratoire de Recherche en Informatique (LRI)
- CNRS : UMR8623 – Université Paris XI - Paris Sud
- 8 : Argonne National Laboratory (ANL)
- US Department of Energy – University of Chicago
- 9 : Centre de Calcul de l'inst. national de phy. nucléaire et de phy. des particules (CC IN2P3)
- CNRS : USR6402 – IN2P3
- Domain: Computer Science/Parallel, Distributed and Shared Computing
- Keywords: MapReduce – cloud computing – data-intensive computing – hybrid infrastructures – BlobSeer – BitDew – Nimbus – HLCM – Grid'5000
- http://hal.inria.fr/hal-00684866
- oai:hal.inria.fr:hal-00684866
- Contributor: Gabriel Antoniu
- Submitted on: Friday, April 20, 2012, 11:43:30
- Last modified on: Thursday, September 6, 2012, 16:27:27