Benchmarking SQL on MapReduce systems using large astronomy databases

Abstract: In the era of big data, with unprecedented volumes of digital information being collected or produced across many application domains, managing and querying large data repositories is becoming increasingly difficult. In the framework of the PetaSky project (http://com.isima.fr/Petasky), we focus on the problem of managing scientific data in the field of cosmology. The data we consider are those of the LSST project (http://www.lsst.org/). The overall size of the database that will be produced is expected to exceed 60 PB [28]. To evaluate the performance of existing SQL-on-MapReduce data management systems, we conducted extensive experiments using data and queries from the area of cosmology. The goal of this work is to report on the ability of such systems to support large-scale declarative queries. We mainly investigated the impact of data partitioning, indexing, and compression on query execution performance.
Document type:
Journal article
Distributed and Parallel Databases, Springer, 2015, 〈10.1007/s10619-014-7172-8〉
Cited literature [31 references]

https://hal.inria.fr/hal-01221665
Contributor: Amin Mesmoudi
Submitted on: Wednesday, 28 October 2015 - 13:32:26
Last modified on: Thursday, 19 April 2018 - 14:38:05
Document(s) archived on: Friday, 28 April 2017 - 05:49:48

File

bench_sql_mapr.pdf
Files produced by the author(s)

Citation

Amin Mesmoudi, Mohand-Saïd Hacid, Farouk Toumani. Benchmarking SQL on MapReduce systems using large astronomy databases. Distributed and Parallel Databases, Springer, 2015, 〈10.1007/s10619-014-7172-8〉. 〈hal-01221665〉

Metrics

Record views

412

File downloads

351