Large-Scale High-Dimensional Clustering with Fast Sketching

Antoine Chatalic 1 Rémi Gribonval 1 Nicolas Keriven 1
1 PANAMA - Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio
Inria Rennes – Bretagne Atlantique , IRISA-D5 - SIGNAUX ET IMAGES NUMÉRIQUES, ROBOTIQUE
Abstract : In this paper, we address the problem of high-dimensional k-means clustering in a large-scale setting, i.e. for datasets that comprise a large number of items. Sketching techniques have already been used to deal with this “large-scale” issue, by compressing the whole dataset into a single vector of random nonlinear generalized moments from which the k centroids are then retrieved efficiently. However , this approach usually scales quadratically with the dimension; to cope with high-dimensional datasets, we show how to use fast structured random matrices to compute the sketching operator efficiently. This yields significant speed-ups and memory savings for high-dimensional data, while the clustering results are shown to be much more stable, both on artificial and real datasets.
Complete list of metadatas

Cited literature [33 references]  Display  Hide  Download

https://hal.inria.fr/hal-01701121
Contributor : Antoine Chatalic <>
Submitted on : Monday, February 5, 2018 - 3:52:59 PM
Last modification on : Friday, September 13, 2019 - 9:50:02 AM
Long-term archiving on : Saturday, May 5, 2018 - 2:26:49 AM

File

final_with_reviews.pdf
Files produced by the author(s)

Identifiers

Citation

Antoine Chatalic, Rémi Gribonval, Nicolas Keriven. Large-Scale High-Dimensional Clustering with Fast Sketching. ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Canada. pp.4714-4718, ⟨10.1109/ICASSP.2018.8461328⟩. ⟨hal-01701121⟩

Share

Metrics

Record views

1345

Files downloads

533