Conference paper. Year: 2012

Distributed High-Dimensional Index Creation using Hadoop, HDFS and C++

Gylfi Þór Guðmundsson
Laurent Amsaleg
Björn Þór Jónsson

Abstract

This paper presents an initial study in which the creation of a high-dimensional index is parallelized and distributed using the Hadoop framework. Early experimental results show substantial performance gains, even though the Hadoop framework is only loosely coupled to the C++-based index creation. Two main lessons can be drawn from this work: (i) it is key to invest the time, energy and manpower needed to re-implement the code in Java in order to benefit from all the features of Hadoop (although our results are already impressive, even better performance gains can be expected once the index creation is re-implemented in Java); and (ii) special care must be taken to account for the network topology, so that message exchanges do not become the new bottleneck once parallelism has removed the CPU bottleneck and HDFS the I/O bottleneck.
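The abstract does not say how the C++ code is attached to Hadoop or which index structure is built, so the following is only a minimal sketch under two loudly stated assumptions: that the index is cluster-based, with each descriptor assigned to its nearest representative, and that the coupling follows the Hadoop Streaming convention of a mapper that reads records line by line and writes `key<TAB>value` lines. All names (`Descriptor`, `nearest_representative`, `run_mapper`) are hypothetical and do not come from the paper.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type: one high-dimensional descriptor.
using Descriptor = std::vector<float>;

// Squared Euclidean distance between two descriptors of equal dimension.
float squared_distance(const Descriptor& a, const Descriptor& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Index of the representative closest to `d`, found by a linear scan.
std::size_t nearest_representative(const Descriptor& d,
                                   const std::vector<Descriptor>& reps) {
    assert(!reps.empty());
    std::size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < reps.size(); ++i) {
        const float dist = squared_distance(d, reps[i]);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}

// Assumed map step: each input line holds one whitespace-separated
// descriptor; the output is "<cluster id>\t<original line>", so that a
// later reduce step could gather all descriptors of one cluster.
void run_mapper(std::istream& in, std::ostream& out,
                const std::vector<Descriptor>& reps) {
    assert(!reps.empty());
    const std::size_t dim = reps.front().size();
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Descriptor d;
        float value;
        while (fields >> value) d.push_back(value);
        if (d.size() != dim) continue;  // skip blank or malformed lines
        out << nearest_representative(d, reps) << '\t' << line << '\n';
    }
}
```

In a streaming-style deployment, a small `main` would load the representatives and call `run_mapper(std::cin, std::cout, reps)`. Because each input line is processed independently, many copies of such a mapper can run in parallel on different HDFS blocks, which is the kind of CPU and I/O parallelism the abstract refers to.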
Main file
decp.pdf (77.08 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00764434, version 1 (13-12-2012)

Identifiers

HAL Id: hal-00764434
DOI: 10.1109/CBMI.2012.6269848

Cite

Gylfi Þór Guðmundsson, Laurent Amsaleg, Björn Þór Jónsson. Distributed High-Dimensional Index Creation using Hadoop, HDFS and C++. CBMI - 10th Workshop on Content-Based Multimedia Indexing, Jun 2012, Annecy, France. ⟨10.1109/CBMI.2012.6269848⟩. ⟨hal-00764434⟩