Conference papers

Revisiting clustering as matrix factorisation on the Stiefel manifold

Stephane Chretien 1 Benjamin Guedj 2, 3, 4
2 MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille
Abstract : This paper studies clustering for possibly high-dimensional data (e.g. images, time series, gene expression data, and many other settings), and rephrases it as low-rank matrix estimation in the PAC-Bayesian framework. Our approach leverages the well-known Burer-Monteiro factorisation strategy from large-scale optimisation, in the context of low-rank estimation. Moreover, our Burer-Monteiro factors are shown to lie on a Stiefel manifold. We propose a new generalised Bayesian estimator for this problem and prove novel prediction bounds for clustering. We also devise a componentwise Langevin sampler on the Stiefel manifold to compute this estimator.
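
Illustration (not taken from the paper): one standard way to see why clustering factors can sit on a Stiefel manifold is the normalised cluster indicator matrix. Under the assumption that each of the K clusters is non-empty, the n x K matrix U with entries 1/sqrt(n_k) for points in cluster k, and 0 elsewhere, has orthonormal columns, so U^T U is the K x K identity. The sketch below checks this numerically in plain Python; the function names and the toy labels are hypothetical, and the paper's own construction may differ.

```python
import math


def normalized_indicator(labels, k):
    """Build the n x k matrix U with U[i][c] = 1/sqrt(n_c) if labels[i] == c, else 0."""
    counts = [0] * k
    for c in labels:
        counts[c] += 1
    if min(counts) == 0:
        raise ValueError("every cluster must be non-empty")
    return [
        [1.0 / math.sqrt(counts[c]) if labels[i] == c else 0.0 for c in range(k)]
        for i in range(len(labels))
    ]


def gram(u):
    """Compute U^T U as a k x k list of lists."""
    k = len(u[0])
    return [[sum(row[a] * row[b] for row in u) for b in range(k)] for a in range(k)]


def is_orthonormal(u, tol=1e-12):
    """Check that U^T U equals the identity up to a small tolerance."""
    g = gram(u)
    return all(
        abs(g[a][b] - (1.0 if a == b else 0.0)) <= tol
        for a in range(len(g))
        for b in range(len(g))
    )


# Toy assignment of 6 points to 3 clusters (hypothetical data).
labels = [0, 0, 1, 2, 2, 2]
U = normalized_indicator(labels, 3)
print(is_orthonormal(U))
```

The product U U^T is then an n x n matrix of rank K, which is the kind of low-rank object a Burer-Monteiro style factorisation represents through its factor U.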

https://hal.inria.fr/hal-02064396
Contributor : Benjamin Guedj
Submitted on : Monday, March 11, 2019 - 7:00:29 PM
Last modification on : Tuesday, February 2, 2021 - 3:31:17 AM
Long-term archiving on : Wednesday, June 12, 2019 - 6:26:17 PM

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02064396, version 1
  • ARXIV : 1903.04479

Citation

Stephane Chretien, Benjamin Guedj. Revisiting clustering as matrix factorisation on the Stiefel manifold. LOD 2020 - the Sixth International Conference on Machine Learning, Optimisation and Data Science, Jul 2020, Siena, Italy. ⟨hal-02064396⟩
