Web Data Indexing in the Cloud: Efficiency and Cost Reductions

Jesús Camacho-Rodríguez (1, 2), Dario Colazzo (2, 1), Ioana Manolescu (2, 1)
2 OAK - Database optimizations and architectures for complex large data
CNRS - Centre National de la Recherche Scientifique : UMR8623, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique
Abstract: An increasing part of the world's data is either shared through the Web or directly produced through and for Web platforms, in particular using structured formats like XML or JSON. Cloud platforms are interesting candidates for handling large data repositories, due to their elastic scaling properties. Popular commercial clouds provide a variety of sub-systems and primitives for storing data in specific formats (files, key-value pairs, etc.) as well as dedicated sub-systems for running and coordinating execution within the cloud. We propose an architecture for warehousing large-scale Web data, in particular XML, in a commercial cloud platform, specifically Amazon Web Services. Since cloud users bear monetary costs directly connected to their consumption of cloud resources, we focus on indexing content in the cloud. We study the applicability of several indexing strategies, and show that they not only reduce query evaluation time but also, importantly, reduce the monetary costs associated with the exploitation of the cloud-based warehouse. Our architecture can be easily adapted to similar cloud-based complex data warehousing settings, carrying over the benefits of access path selection in the cloud.
Document type:
Conference paper
EDBT - International Conference on Extending Database Technology, Mar 2013, Genoa, Italy. 2013

Cited literature [24 references]

https://hal.inria.fr/hal-00803597
Contributor: Jesús Camacho-Rodríguez
Submitted on: Friday, 22 March 2013 - 12:53:11
Last modified on: Monday, 28 May 2018 - 14:38:02
Document(s) archived on: Sunday, 2 April 2017 - 18:21:50

File

edbt2013.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00803597, version 1

Citation

Jesús Camacho-Rodríguez, Dario Colazzo, Ioana Manolescu. Web Data Indexing in the Cloud: Efficiency and Cost Reductions. EDBT - International Conference on Extending Database Technology, Mar 2013, Genoa, Italy. 2013. 〈hal-00803597〉

Metrics

Record views

376

File downloads

197