Indexing and Searching 100M Images with Map-Reduce

Diana Moise [1], Denis Shestakov [1], Gylfi Thór Gudmundsson [1], Laurent Amsaleg [1]
[1] TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Most researchers working on high-dimensional indexing agree on the following three trends: (i) the size of the multimedia collections to index is now reaching millions if not billions of items, (ii) the computers we use every day now come with multiple cores, and (iii) hardware is becoming more available, thanks to easier access to Grids and/or Clouds. This paper shows how the Map-Reduce paradigm can be applied to indexing algorithms and demonstrates that great scalability can be achieved using Hadoop, a popular Map-Reduce-based framework. Dramatic performance improvements are not, however, guaranteed a priori: such frameworks are rigid, they severely constrain the possible access patterns to data, and scarce resources such as RAM have to be shared. Furthermore, algorithms require major redesign and may have to settle for sub-optimal behavior. The benefits, however, are many: simplicity for programmers, automatic distribution, fault tolerance, failure detection with automatic re-runs and, last but not least, scalability. We share our experience of adapting a clustering-based high-dimensional indexing algorithm to the Map-Reduce model, and of testing it at large scale with Hadoop as we index 30 billion SIFT descriptors. We foresee that lessons drawn from our work could minimize the time, effort and energy invested by other researchers and practitioners working in similar directions.
Document type : Conference papers
Cited literature : 23 references

https://hal.inria.fr/hal-00796475
Contributor : Laurent Amsaleg
Submitted on : Monday, March 4, 2013 - 11:39:25 AM
Last modification on : Friday, November 16, 2018 - 1:21:45 AM
Long-term archiving on : Wednesday, June 5, 2013 - 3:56:43 AM

File

icmr115-moise.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00796475, version 1

Citation

Diana Moise, Denis Shestakov, Gylfi Thór Gudmundsson, Laurent Amsaleg. Indexing and Searching 100M Images with Map-Reduce. ACM International Conference on Multimedia Retrieval, Apr 2013, Dallas, United States. ⟨hal-00796475⟩

Metrics

Record views : 587
File downloads : 7365