Distributed High-Dimensional Index Creation using Hadoop, HDFS and C++

Gylfi Þór Guðmundsson 1 Laurent Amsaleg 1 Björn Þór Jónsson 2
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This paper presents an initial study in which the creation of a high-dimensional index is parallelized and distributed using the Hadoop framework. Early experimental results show substantial performance gains, even though the Hadoop framework is only loosely coupled to the C++-based index creation. Two main lessons can be drawn from this work. (i) It is key to invest the time, energy and manpower needed to re-implement the code in Java in order to benefit from all the features of Hadoop; although our results are already impressive, even better performance gains will be observed if the index creation is re-implemented in Java. (ii) Special care must be taken to account for the network topology, so that message exchanges do not become the new bottleneck once parallelism removes the CPU bottleneck and HDFS removes the I/O bottleneck.
Document type: Conference paper

Cited literature: 14 references

https://hal.inria.fr/hal-00764434
Contributor: Laurent Amsaleg
Submitted on: Thursday, December 13, 2012, 9:19:59 AM
Last modification on: Friday, November 16, 2018, 1:21:49 AM
Long-term archiving on: Thursday, March 14, 2013, 3:46:21 AM

File: decp.pdf (produced by the author(s))

Citation

Gylfi Þór Guðmundsson, Laurent Amsaleg, Björn Þór Jónsson. Distributed High-Dimensional Index Creation using Hadoop, HDFS and C++. CBMI - 10th Workshop on Content-Based Multimedia Indexing, Jun 2012, Annecy, France. ⟨10.1109/CBMI.2012.6269848⟩. ⟨hal-00764434⟩