Intentional Data Placement Policy for Improving OLAP Cube Construction on Hadoop Clusters

Billel Arres 1, 2; Nadia Kabachi 1, 2; Omar Boussaid 1, 2
2 SID
ERIC - Equipe de Recherche en Ingénierie des Connaissances
Abstract: In the recent past, we have witnessed dramatic increases in the volume of data in virtually every area: business, science, and daily life, to name a few. The Hadoop framework, an open source project based on the MapReduce paradigm, is a popular choice for big data analytics. However, the performance gained from Hadoop's features is currently limited by its default block placement policy, which does not take any data characteristics into account. Indeed, the efficiency of many operations, including indexing, grouping, aggregation and joins, can be improved by careful data placement. In our work, we propose a data warehouse partitioning strategy to improve query performance. We investigate the performance gain for OLAP cube construction with and without data organization on a Hadoop cluster, varying the number of nodes and the data warehouse size. Our experiments suggest that good data placement on a cluster during the implementation of the data warehouse can significantly improve OLAP cube construction and querying performance. As a next step, we will extend the experiments to study the effects of other configuration parameters on data collocation in the context of parallel data warehousing, such as partition size, replication factor and OLAP query complexity. We also plan to study an intelligent system for data warehouse placement on clusters by integrating Multi-Agent Systems (MAS) and intelligent agents into the process.
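To make the idea of placement-aware joins and aggregation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and does not use Hadoop or HDFS: the "nodes" are in-memory lists, and every name in it (node_for, place, local_cube, build_cube, the store_id, region and amount fields) is a hypothetical choice made for illustration. The assumption is that fact and dimension rows sharing a join key are assigned to the same node, so each node can join and aggregate its own rows before the small partial results are merged.

```python
from collections import defaultdict


def node_for(key, num_nodes):
    """Map an integer join key to a node index (hypothetical placement rule)."""
    return key % num_nodes


def place(rows, key_field, num_nodes):
    """Group rows by the node that their join key maps to."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field], num_nodes)].append(row)
    return nodes


def local_cube(fact_part, dim_part):
    """Join and aggregate using only the rows stored on one node."""
    region_of = {d["store_id"]: d["region"] for d in dim_part}
    totals = defaultdict(int)
    for f in fact_part:
        # The lookup succeeds locally because both tables were placed
        # with the same rule on the same key.
        totals[region_of[f["store_id"]]] += f["amount"]
    return dict(totals)


def build_cube(facts, dims, num_nodes):
    """Place both tables by store_id, aggregate per node, merge the partials."""
    fact_nodes = place(facts, "store_id", num_nodes)
    dim_nodes = place(dims, "store_id", num_nodes)
    cube = defaultdict(int)
    for n in range(num_nodes):
        for region, total in local_cube(fact_nodes[n], dim_nodes[n]).items():
            cube[region] += total
    return dict(cube)
```

The total per region does not depend on the number of nodes; what changes is that no fact row has to be moved to another node to find its matching dimension row, which is the kind of saving a placement policy aims for.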
Document type:
Conference paper
BDA 2014 : Gestion de données - principes, technologies et applications, Oct 2014, Autrans, France

https://hal.inria.fr/hal-01169934
Contributor: David Gross-Amblard
Submitted on: Tuesday, 30 June 2015 - 15:04:37
Last modified on: Wednesday, 13 January 2016 - 10:08:02
Document(s) archived on: Tuesday, 25 April 2017 - 20:25:55

File

bda2014-actes-phd-2-pp21-23.pd...
Publisher files allowed on an open archive

License

Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id: hal-01169934, version 1

Citation

Billel Arres, Nadia Kabachi, Omar Boussaid. Intentional Data Placement Policy for Improving OLAP Cube Construction on Hadoop Clusters. BDA 2014 : Gestion de données - principes, technologies et applications, Oct 2014, Autrans, France. 〈hal-01169934〉
