Reports (Research report)

A Large-Scale Performance Study of Cluster-Based High-Dimensional Indexing

Gylfi Thór Gudmundsson 1 Björn Thór Jónsson 1 Laurent Amsaleg 2 
2 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : High-dimensional clustering is used by some content-based image retrieval systems to partition the data into groups; the groups (clusters) are then indexed to accelerate query processing. Recently, the Cluster Pruning approach was proposed as a very simple way to produce such clusters efficiently and effectively. While the original evaluation of the algorithm was performed in a text indexing context at a rather small scale, its simplicity and performance motivated us to study its behavior in an image indexing context at a much larger scale. We experiment with two collections of 72-dimensional state-of-the-art local descriptors, the larger of which contains 189 million descriptors. This paper summarizes the results of this study and shows that, while the basic algorithm works fairly well, three extensions can dramatically improve its performance and scalability by accelerating both query processing and the construction of clusters. This makes Cluster Pruning a promising basis for building large-scale systems that require a clustering algorithm.
Contributor : Laurent Amsaleg
Submitted on : Tuesday, December 7, 2010 - 7:00:06 AM
Last modification on : Wednesday, October 26, 2022 - 8:16:46 AM
Long-term archiving on : Thursday, December 1, 2016 - 7:32:27 AM


  • HAL Id : inria-00489816, version 1


Gylfi Thór Gudmundsson, Björn Thór Jónsson, Laurent Amsaleg. A Large-Scale Performance Study of Cluster-Based High-Dimensional Indexing. [Research Report] RR-7307, INRIA. 2010. ⟨inria-00489816⟩


