Conference paper

Large-Scale High-Dimensional Clustering with Fast Sketching

Antoine Chatalic, Rémi Gribonval, Nicolas Keriven

Abstract

In this paper, we address the problem of high-dimensional k-means clustering in a large-scale setting, i.e., for datasets that comprise a large number of items. Sketching techniques have already been used to deal with this “large-scale” issue by compressing the whole dataset into a single vector of random nonlinear generalized moments, from which the k centroids are then retrieved efficiently. However, this approach usually scales quadratically with the dimension; to cope with high-dimensional datasets, we show how to use fast structured random matrices to compute the sketching operator efficiently. This yields significant speed-ups and memory savings for high-dimensional data, while the clustering results are shown to be much more stable, on both artificial and real datasets.
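The sketching step described in the abstract can be illustrated with a minimal sketch, under several assumptions that the abstract itself does not state. The generalized moments are taken here to be averaged complex exponentials (random Fourier features), and the structured matrix is built from blocks of three rounds of random sign flips followed by a Walsh-Hadamard transform, which is one common construction and not necessarily the one used in the paper. The names `fwht`, `make_blocks`, `project` and `sketch` are hypothetical, the single `scale` parameter stands in for a proper frequency distribution, and the dimension is assumed to be a power of two (zero-padding would be needed otherwise). Each block costs O(d log d) per point instead of the O(d²) of a dense d × d matrix.

```python
import cmath
import math
import random


def fwht(vec):
    """Fast Walsh-Hadamard transform (unnormalized); length must be a power of 2."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def make_blocks(dim, n_blocks, rng):
    """Draw three random +/-1 diagonals per block (assumed layout)."""
    return [
        [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(3)]
        for _ in range(n_blocks)
    ]


def project(x, block, scale):
    """Apply scale * (H D3)(H D2)(H D1) x, with H normalized to be orthogonal."""
    norm = 1.0 / math.sqrt(len(x))
    y = list(x)
    for signs in block:
        flipped = [s * v for s, v in zip(signs, y)]
        y = [norm * v for v in fwht(flipped)]
    return [scale * v for v in y]


def sketch(data, blocks, scale=1.0):
    """Average exp(-i * w_j . x) over the dataset, one moment per frequency w_j."""
    dim = len(data[0])
    total = [0j] * (dim * len(blocks))
    for x in data:
        j = 0
        for block in blocks:
            for p in project(x, block, scale):
                total[j] += cmath.exp(-1j * p)
                j += 1
    return [t / len(data) for t in total]


rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]
blocks = make_blocks(dim=8, n_blocks=2, rng=rng)
z = sketch(data, blocks, scale=0.5)
print(len(z))  # 16 complex moments, independent of the number of points
```

The sketch is a plain average over the data, so it can be computed in one pass, in parallel over chunks, or updated as new points arrive. Only the sign diagonals need to be stored, rather than a dense matrix of frequencies. Recovering the k centroids from `z` is a separate step that is not shown here.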
Main file
final_with_reviews.pdf (160.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01701121, version 1 (05-02-2018)

Cite

Antoine Chatalic, Rémi Gribonval, Nicolas Keriven. Large-Scale High-Dimensional Clustering with Fast Sketching. ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Canada. pp.4714-4718, ⟨10.1109/ICASSP.2018.8461328⟩. ⟨hal-01701121⟩
