Large-Scale High-Dimensional Clustering with Fast Sketching - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2018

Large-Scale High-Dimensional Clustering with Fast Sketching

Abstract

In this paper, we address the problem of high-dimensional k-means clustering in a large-scale setting, i.e., for datasets that comprise a large number of items. Sketching techniques have already been used to deal with this “large-scale” issue, by compressing the whole dataset into a single vector of random nonlinear generalized moments from which the k centroids are then retrieved efficiently. However, this approach usually scales quadratically with the dimension; to cope with high-dimensional datasets, we show how to use fast structured random matrices to compute the sketching operator efficiently. This yields significant speed-ups and memory savings for high-dimensional data, while the clustering results are shown to be much more stable, both on artificial and real datasets.
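To make the sketching idea concrete, here is a minimal NumPy sketch under stated assumptions, not the authors' implementation. We assume the sketch is the empirical average of complex exponentials exp(i ω_jᵀ x) over the dataset (random Fourier moments, as in compressive k-means). The dense version draws a Gaussian d × m frequency matrix. The "fast" version swaps in a hypothetical structured construction in the spirit of structured orthogonal random features: blocks H D3 H D2 H D1 (orthonormal Walsh-Hadamard transform H, random sign diagonals D_k) whose rows are rescaled by chi-distributed radii so that frequency norms match the Gaussian case. The paper's exact structured matrices, frequency distribution and centroid-recovery step are not reproduced. All function names are hypothetical. The code assumes d is a power of two (zero-pad the data otherwise) and m is a multiple of d.

```python
import numpy as np


def fwht(a):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    The last axis length d must be a power of two; cost is O(d log d) per vector.
    """
    a = np.array(a, dtype=float, copy=True)
    d = a.shape[-1]
    h = 1
    while h < d:
        # Group the last axis as (blocks, 2, h) and apply all butterflies at once.
        a = a.reshape(*a.shape[:-1], d // (2 * h), 2, h)
        x = a[..., 0, :].copy()
        y = a[..., 1, :]
        a[..., 0, :] = x + y
        a[..., 1, :] = x - y
        a = a.reshape(*a.shape[:-3], d)
        h *= 2
    return a / np.sqrt(d)


def dense_frequencies(d, m, sigma, rng):
    # Dense d x m matrix with i.i.d. N(0, sigma^-2) entries: O(d m) memory.
    return rng.standard_normal((d, m)) / sigma


def sketch_dense(X, Omega):
    # z = (1/n) sum_j exp(i Omega^T x_j); O(n m d) time.
    return np.exp(1j * (X @ Omega)).mean(axis=0)


def structured_params(d, m, sigma, rng):
    # Hypothetical structured operator: m/d blocks of H D3 H D2 H D1,
    # stored as 3 sign vectors per block plus one radius per frequency: O(m) memory.
    assert d & (d - 1) == 0, "d must be a power of two (zero-pad X otherwise)"
    assert m % d == 0, "m must be a multiple of d"
    n_blocks = m // d
    signs = rng.choice([-1.0, 1.0], size=(n_blocks, 3, d))
    # Chi(d)-distributed radii: same norm distribution as N(0, sigma^-2 I_d) vectors.
    radii = np.sqrt(rng.chisquare(d, size=(n_blocks, d))) / sigma
    return signs, radii


def apply_structured(X, signs, radii):
    # Computes X @ Omega for the implicit structured Omega in O(n m log d) time.
    outs = []
    for b in range(signs.shape[0]):
        Y = X
        for k in range(3):
            Y = fwht(Y * signs[b, k])  # applies H D_{k+1}
        outs.append(Y * radii[b])
    return np.concatenate(outs, axis=1)


def sketch_structured(X, signs, radii):
    # Same nonlinear moments as sketch_dense, with the fast linear step.
    return np.exp(1j * apply_structured(X, signs, radii)).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m, sigma = 1000, 64, 256, 1.0
    X = rng.standard_normal((n, d))
    z_dense = sketch_dense(X, dense_frequencies(d, m, sigma, rng))
    signs, radii = structured_params(d, m, sigma, rng)
    z_fast = sketch_structured(X, signs, radii)
    print(z_dense.shape, z_fast.shape)  # (256,) (256,)
```

Both routines compress the n × d dataset into one m-dimensional complex vector. In this sketch, the dense variant needs O(nmd) time and stores the full d × m matrix. The structured variant needs roughly O(nm log d) time and stores only O(m) signs and radii. This illustrates the kind of speed and memory gain the abstract describes for large d.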
Main file
final_with_reviews.pdf (160.52 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01701121, version 1 (5 February 2018)

Cite

Antoine Chatalic, Rémi Gribonval, Nicolas Keriven. Large-Scale High-Dimensional Clustering with Fast Sketching. ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Canada. pp.4714-4718, ⟨10.1109/ICASSP.2018.8461328⟩. ⟨hal-01701121⟩