Conference papers

Web-scale image clustering revisited

Abstract: Large scale duplicate detection, clustering and mining of documents or images has been conventionally treated with seed detection via hashing, followed by seed growing heuristics using fast search. Principled clustering methods, especially kernelized and spectral ones, have higher complexity and are difficult to scale above millions. Under the assumption of documents or images embedded in Euclidean space, we revisit recent advances in approximate k-means variants, and borrow their best ingredients to introduce a new one, inverted-quantized k-means (IQ-means). Key underlying concepts are quantization of data points and multi-index based inverted search from centroids to cells. Its quantization is a form of hashing and analogous to seed detection, while its updates are analogous to seed growing, yet principled in the sense of distortion minimization. We further design a dynamic variant that is able to determine the number of clusters k in a single run at nearly zero additional cost. Combined with powerful deep learned representations, we achieve clustering of a 100 million image collection on a single machine in less than one hour.
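The quantize-then-cluster idea in the abstract can be illustrated with a minimal sketch. This is not the authors' IQ-means: it assumes a uniform grid in place of a learned quantizer, and it assigns every cell to its nearest centroid exhaustively instead of using multi-index inverted search from centroids to cells. The function names `quantize_to_cells` and `weighted_kmeans` are hypothetical and chosen only for illustration.

```python
import numpy as np

def quantize_to_cells(points, grid_size):
    """Map points to cells of a uniform grid; return per-cell means and counts."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    idx = np.minimum(((points - lo) / span * grid_size).astype(int), grid_size - 1)
    # Flatten the per-dimension cell index into a single key per point.
    keys = np.ravel_multi_index(tuple(idx.T), (grid_size,) * points.shape[1])
    _, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
    sums = np.zeros((len(counts), points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate points per non-empty cell
    return sums / counts[:, None], counts

def weighted_kmeans(reps, weights, k, iters=20, seed=0):
    """Lloyd iterations over cell representatives, weighted by cell population."""
    rng = np.random.default_rng(seed)
    centroids = reps[rng.choice(len(reps), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((reps[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            mask = assign == j
            if mask.any():  # leave empty clusters where they are
                centroids[j] = np.average(reps[mask], axis=0, weights=weights[mask])
    d2 = ((reps[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)
```

Because the iterations run over non-empty cells rather than individual points, the cost per iteration in this sketch depends on the number of occupied cells, not on the size of the collection.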

Cited literature: 39 references

Contributor: Ioannis Emiris
Submitted on: Wednesday, January 23, 2019 - 12:21:19 PM
Last modification on: Friday, April 8, 2022 - 4:08:03 PM
Long-term archiving on: Wednesday, April 24, 2019 - 1:39:21 PM


  • HAL Id: hal-01990662, version 1


Ioannis Z. Emiris, Yannis Avrithis, Yannis Kalantidis, Evangelos Anagnostopoulos. Web-scale image clustering revisited. ICCV 2015 - International Conference on Computer Vision, Dec 2015, Santiago, Chile. ⟨hal-01990662⟩


