Consistent Minimization of Clustering Objective Functions

Ulrike Von Luxburg 1 Sébastien Bubeck 2 Stefanie Jegelka 1 Michael Kaufmann 3
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal. As an alternative, we suggest the paradigm of "nearest neighbor clustering". Instead of selecting the best out of all partitions of the sample, it only considers partitions in some restricted function class. Using tools from statistical learning theory, we prove that nearest neighbor clustering is statistically consistent. Moreover, its worst-case complexity is polynomial by construction, and it can be implemented with small average-case complexity using branch and bound.
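The restricted-class idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the restricted class consists of partitions that are constant on the nearest-neighbor cells of `m` randomly chosen seed points, it uses the within-cluster sum of squared distances as a stand-in quality measure, and it enumerates candidates by brute force rather than branch and bound. All function names and parameters here are hypothetical.

```python
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(points, labels, k):
    """Stand-in quality measure (an assumption): within-cluster sum of squares."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        dim = len(members[0])
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(sq_dist(p, center) for p in members)
    return total


def nearest_neighbor_clustering(points, k, m, rng=None):
    """Search only partitions that are constant on nearest-seed cells.

    points: list of tuples; k: number of clusters; m: number of seeds.
    Returns (labels, cost) for the best candidate found.
    """
    rng = rng or random.Random(0)
    seeds = rng.sample(points, m)  # assumption: seeds drawn uniformly at random
    # Each point inherits the index of its nearest seed (its "cell").
    cell = [min(range(m), key=lambda j: sq_dist(p, seeds[j])) for p in points]

    best_labels, best_cost = None, float("inf")
    # Brute force over all k**m labelings of the seeds (not branch and bound).
    for seed_labels in itertools.product(range(k), repeat=m):
        if len(set(seed_labels)) < k:
            continue  # require every cluster label to be used by some seed
        labels = [seed_labels[c] for c in cell]
        cost = within_cluster_ss(points, labels, k)
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels, best_cost
```

The search space has `k**m` candidates instead of all partitions of the `n` sample points, so if `m` grows only logarithmically in `n`, the enumeration stays polynomial in `n`. How `m` should be chosen in practice is left open in this sketch.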
Document type:
Conference paper
Neural Information Processing Systems, Dec 2007, Vancouver, Canada. 2007

https://hal.inria.fr/inria-00185777
Contributor: Sébastien Bubeck
Submitted on: Wednesday, November 7, 2007 - 09:37:15
Last modified on: Thursday, January 11, 2018 - 06:22:13
Document(s) archived on: Monday, September 24, 2012 - 14:56:04

Identifiers

  • HAL Id: inria-00185777, version 1

Citation

Ulrike Von Luxburg, Sébastien Bubeck, Stefanie Jegelka, Michael Kaufmann. Consistent Minimization of Clustering Objective Functions. Neural Information Processing Systems, Dec 2007, Vancouver, Canada. 2007. 〈inria-00185777〉
