
hal-00684866, version 1

Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures

Gabriel Antoniu (http://www.irisa.fr/kerdata/doku.php?id=people:gabriel.antoniu) 1, Julien Bigot d2, Christophe Blanchet 3, Luc Bougé a1, François Briant 4, Franck Cappello 567, Alexandru Costan 1, Frédéric Desprez d2, Gilles Fedak d2, Sylvain Gault d2, Kate Keahey b8, Bogdan Nicolae d56, Christian Pérez d2, Anthony Simonet d2, Frédéric Suter c9, Bing Tang d2, Raphael Terreux 3

1st International IBM Cloud Academy Conference - ICA CON 2012 (2012)

Abstract: As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, the frameworks that support it today still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions in which there is room for progress: storage efficiency under massively concurrent data access, scheduling, volatility, and fault tolerance. We place our discussion in the perspective of the current evolution toward an increasing integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach that aims to overcome the current limitations of existing Map-Reduce frameworks in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bioinformatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids.

  • a –  École normale supérieure de Cachan - ENS Cachan
  • b –  Argonne National Laboratory
  • c –  CNRS
  • d –  INRIA
  • 1 :  KerData (INRIA - IRISA)
  • INRIA – CNRS : UMR6074 – École normale supérieure de Cachan - ENS Cachan – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
  • 2 :  AVALON (LIP Lyon / Inria Grenoble Rhône-Alpes)
  • CNRS : UMR5668 – INRIA – École Normale Supérieure - Lyon – Laboratoire d'informatique du Parallélisme – Université Claude Bernard - Lyon I
  • 3 :  Institut de biologie et chimie des protéines [Lyon] (IBCP)
  • CNRS : UMR5086 – Université Claude Bernard - Lyon I
  • 4 :  IBM PSSC Montpellier - Innovation Lab.
  • IBM PSSC Montpellier
  • 5 :  GRAND-LARGE (INRIA Saclay - Ile de France)
  • INRIA – CNRS : UMR8623 – Université Paris XI - Paris Sud
  • 6 :  Joint Laboratory for Petascale Computing [Illinois] (JLPC)
  • University of Illinois at Urbana-Champaign – INRIA
  • 7 :  Laboratoire de Recherche en Informatique (LRI)
  • CNRS : UMR8623 – Université Paris XI - Paris Sud
  • 8 :  Argonne National Laboratory (ANL)
  • US Department of Energy – University of Chicago
  • 9 :  Centre de Calcul de l'Institut National de Physique Nucléaire et de Physique des Particules (CC IN2P3)
  • CNRS : USR6402 – IN2P3
  • Domain: Computer Science/Parallel, Distributed, and Shared Computing
  • Keywords: MapReduce – cloud computing – data-intensive computing – hybrid infrastructures – BlobSeer – BitDew – Nimbus – HLCM – Grid'5000
 
  • oai:hal.inria.fr:hal-00684866
  • Submitted on: Friday, 20 April 2012, 11:43:30
  • Last modified on: Thursday, 6 September 2012, 16:27:27