
Distributed High-Dimensional Index Creation using Hadoop, HDFS and C++

Gylfi Þór Guðmundsson 1, Laurent Amsaleg 1, Björn Þór Jónsson 2
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This paper presents an initial study in which the creation of a high-dimensional index is parallelized and distributed using the Hadoop framework. Early experimental results show substantial performance gains, even though the Hadoop framework is only loosely coupled to the C++-based index creation. Two main lessons can be drawn from this work: (i) it is key to invest the time, energy and manpower needed to re-implement the code in Java in order to benefit from all the features of Hadoop (although our results are already impressive, even better performance gains will be observed if the index creation is re-implemented in Java); and (ii) special care must be taken to account for the network topology, so that message exchanges do not become the new bottleneck once parallelism has fixed the CPU bottleneck and HDFS the I/O bottleneck.
Document type: Conference papers

Cited literature: 14 references
Contributor: Laurent Amsaleg
Submitted on: Thursday, December 13, 2012 - 9:19:59 AM
Last modification on: Thursday, January 20, 2022 - 4:18:39 PM
Long-term archiving on: Thursday, March 14, 2013 - 3:46:21 AM





Gylfi Þór Guðmundsson, Laurent Amsaleg, Björn Þór Jónsson. Distributed High-Dimensional Index Creation using Hadoop, HDFS and C++. CBMI - 10th Workshop on Content-Based Multimedia Indexing, Jun 2012, Annecy, France. ⟨10.1109/CBMI.2012.6269848⟩. ⟨hal-00764434⟩


