hal-00684866, version 1
Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures

1st International IBM Cloud Academy Conference - ICA CON 2012 (2012)
Abstract: As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, today's frameworks that support it still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions where there is room for progress: storage efficiency under massive data access concurrency, scheduling, volatility, and fault tolerance. We place our discussion in the perspective of the current evolution towards an increasing integration of large-scale distributed platforms (clouds, cloud federations, enterprise desktop grids, etc.). We propose an approach that aims to overcome the limitations of existing Map-Reduce frameworks in order to achieve scalable, concurrency-optimized, fault-tolerant Map-Reduce data processing on hybrid infrastructures. This approach will be evaluated with real-life bio-informatics applications on existing Nimbus-powered cloud testbeds interconnected with desktop grids.
- a – École normale supérieure de Cachan - ENS Cachan
- b – Argonne National Laboratory
- c – CNRS
- d – INRIA
- 1: INRIA – CNRS : UMR6074 – École normale supérieure de Cachan - ENS Cachan – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
- 2: CNRS : UMR5668 – INRIA – École Normale Supérieure - Lyon – Laboratoire d'informatique du Parallélisme – Université Claude Bernard - Lyon I
- 3: CNRS : UMR5086 – Université Claude Bernard - Lyon I
- 4: IBM PSSC Montpellier
- 5: INRIA – CNRS : UMR8623 – Université Paris XI - Paris Sud
- 6: University of Illinois at Urbana-Champaign – INRIA
- 7: CNRS : UMR8623 – Université Paris XI - Paris Sud
- 8: US Department of Energy – University of Chicago
- 9: CNRS : USR6402 – IN2P3
- Domain: Computer science / Parallel, distributed, and shared computing
- Keywords: MapReduce – cloud computing – data-intensive computing – hybrid infrastructures – BlobSeer – BitDew – Nimbus – HLCM – Grid'5000
- http://hal.inria.fr/hal-00684866
- oai:hal.inria.fr:hal-00684866
- Contributor:
- Submitted on: Friday 20 April 2012, 11:43:30
- Last modified on: Thursday 6 September 2012, 16:27:27


