Persistence-Based Clustering in Riemannian Manifolds

Frédéric Chazal 1 Leonidas J. Guibas 2 Steve Oudot 1 Primoz Skraba 1, *
* Auteur correspondant
1 GEOMETRICA - Geometric computing
CRISAM - Inria Sophia Antipolis - Méditerranée , Inria Saclay - Ile de France
Abstract : We present a novel clustering algorithm that combines a mode-seeking phase with a cluster merging phase. While mode detection is performed by a standard graph-based hill-climbing scheme, the novelty of our approach resides in its use of {\em topological persistence} theory to guide the merges between clusters. An interesting feature of our algorithm is to provide additional feedback in the form of a finite set of points in the plane, called a {\em persistence diagram}, which provably reflects the prominence of each of the modes of the density. Such feedback is an invaluable tool in practice, as it enables the user to determine a set of parameter values that will make the algorithm compute a relevant clustering on the next run. In terms of generality, our approach requires the sole knowledge of (approximate) pairwise distances between the data points, as well as of rough estimates of the density at these points. It is therefore virtually applicable in any arbitrary metric space. In the meantime, its complexity remains reasonable: although the size of the input distance matrix may be up to quadratic in the number of data points, a careful implementation only uses a linear amount of main memory and barely takes more time to run than the one spent reading the input. Taking advantage of recent advances in topological persistence theory, we are able to give a theoretically sound notion of what the {\em correct} number $k$ of clusters is, and to prove that under mild sampling conditions and a relevant choice of parameters (made possible in practice by the persistence diagram) our clustering scheme computes a set of $k$ clusters whose spatial locations are bound to the ones of the basins of attraction of the peaks of the density. These guarantess hold in a large variety of contexts, including when data points are distributed along some unknown Riemannian manifold.
Type de document :
Rapport
[Research Report] RR-6968, INRIA. 2009, 47 p
Liste complète des métadonnées

Littérature citée [34 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/inria-00389390
Contributeur : Primoz Skraba <>
Soumis le : lundi 22 juin 2009 - 23:32:43
Dernière modification le : vendredi 23 février 2018 - 14:20:06
Document(s) archivé(s) le : lundi 15 octobre 2012 - 11:20:42

Fichier

RR-6968.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : inria-00389390, version 1

Collections

Citation

Frédéric Chazal, Leonidas J. Guibas, Steve Oudot, Primoz Skraba. Persistence-Based Clustering in Riemannian Manifolds. [Research Report] RR-6968, INRIA. 2009, 47 p. 〈inria-00389390〉

Partager

Métriques

Consultations de la notice

593

Téléchargements de fichiers

829