A Large-Scale Performance Study of Cluster-Based High-Dimensional Indexing

Gylfi Thór Gudmundsson (1), Björn Þór Jónsson (2), Laurent Amsaleg (1)
(1) TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: High-dimensional clustering is used by some content-based image retrieval systems to partition the data into groups (clusters), which are then indexed to accelerate query processing. Recently, the Cluster Pruning approach was proposed as a very simple way to produce such clusters efficiently and effectively. The original evaluation of the algorithm was performed at a rather small scale in a text indexing context. Its simplicity and performance motivated us to study its behavior in an image indexing context at a much larger scale. We experiment with two collections of state-of-the-art 72-dimensional local descriptors, the larger of which contains 189 million descriptors. This paper summarizes the results of this study. It shows that while the basic algorithm works fairly well, three extensions can dramatically improve its performance and scalability, accelerating both query processing and cluster construction. These extensions make Cluster Pruning a promising basis for building large-scale systems that require a clustering algorithm.
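To make the clustering step described in the abstract concrete, here is a minimal Python sketch of the basic Cluster Pruning idea as it is commonly described. About √n randomly chosen points become cluster leaders, and every other point is assigned to its nearest leader. At query time, only the clusters of the leader(s) nearest to the query are scanned, so the result is approximate. The function names, the squared Euclidean distance, and the `clusters_to_search` parameter are illustrative assumptions. The sketch does not include the three extensions studied in the paper or any disk-based storage layout.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_index(points, seed=0):
    """Hypothetical helper: basic Cluster Pruning construction.

    Picks about sqrt(n) random points as leaders and assigns every
    point to its nearest leader. Returns {leader_index: [point indices]}.
    """
    rng = random.Random(seed)
    n = len(points)
    k = max(1, int(math.sqrt(n)))
    leader_ids = rng.sample(range(n), k)
    leader_set = set(leader_ids)
    # Each leader starts in its own cluster, so no cluster is ever empty.
    clusters = {lid: [lid] for lid in leader_ids}
    for i, p in enumerate(points):
        if i in leader_set:
            continue
        nearest = min(leader_ids, key=lambda lid: sq_dist(p, points[lid]))
        clusters[nearest].append(i)
    return clusters


def search(points, clusters, query, clusters_to_search=1):
    """Approximate nearest-neighbor search over the clustered index.

    Ranks leaders by distance to the query, scans only the members of the
    `clusters_to_search` closest clusters, and returns the index of the
    closest point found among them.
    """
    ranked = sorted(clusters, key=lambda lid: sq_dist(query, points[lid]))
    candidates = [i for lid in ranked[:clusters_to_search] for i in clusters[lid]]
    return min(candidates, key=lambda i: sq_dist(query, points[i]))
```

Scanning a single cluster reads only about √n points per query, which is where the speedup comes from. Raising `clusters_to_search` trades speed for a higher chance of finding the true nearest neighbor. Scanning every cluster degenerates to an exhaustive search.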
Document type: Conference paper

https://hal.inria.fr/inria-00560972
Contributor: Patrick Gros
Submitted on: Monday, January 31, 2011 - 1:43:45 PM
Last modification on: Friday, November 16, 2018 - 1:22:08 AM

Citation

Gylfi Thór Gudmundsson, Björn Þór Jónsson, Laurent Amsaleg. A Large-Scale Performance Study of Cluster-Based High-Dimensional Indexing. 18th ACM International Conference on Multimedia - Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval, ACM, Oct 2010, Florence, Italy. ⟨10.1145/1878137.1878145⟩. ⟨inria-00560972⟩
