Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly

Benjamin Guedj 1, Le Li 2
1 MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille
Abstract: When confronted with massive data streams, summarizing data with dimension reduction methods such as PCA raises theoretical and algorithmic pitfalls. Principal curves act as a nonlinear generalization of PCA, and the present paper proposes a novel algorithm to automatically and sequentially learn principal curves from data streams. We show that our procedure is supported by regret bounds with optimal sublinear remainder terms. A greedy local search implementation that incorporates both sleeping experts and multi-armed bandit ingredients is presented, along with its regret bound and performance on a toy example and seismic data.
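To make the sequential setting concrete, the following is a minimal sketch of a generic exponentially weighted forecaster choosing among a small, fixed set of candidate polygonal lines as points arrive one at a time. It is not the algorithm of the paper, which relies on a greedy local search with sleeping experts and bandit feedback. The function names, the learning rate `eta`, the candidate set, and the choice of squared distance to the nearest segment as the loss are all assumptions made for illustration.

```python
import math
import random


def point_to_segment_sq(p, a, b):
    """Squared Euclidean distance from point p to the segment [a, b] in the plane."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Projection parameter of p onto the line, clamped to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2


def curve_loss(p, vertices):
    """Assumed loss: squared distance from p to the closest segment of a polygonal line."""
    return min(
        point_to_segment_sq(p, vertices[i], vertices[i + 1])
        for i in range(len(vertices) - 1)
    )


def exponential_weights(stream, candidates, eta=1.0, seed=0):
    """Pick one candidate per round, before seeing the point, then update all losses."""
    rng = random.Random(seed)
    cum_losses = [0.0] * len(candidates)
    picks = []
    for p in stream:
        # Weights depend only on past losses; subtracting the minimum avoids underflow.
        best = min(cum_losses)
        weights = [math.exp(-eta * (c - best)) for c in cum_losses]
        picks.append(rng.choices(range(len(candidates)), weights=weights)[0])
        # Full-information update (hypothetical simplification, not bandit feedback).
        for i, cand in enumerate(candidates):
            cum_losses[i] += curve_loss(p, cand)
    return picks, cum_losses


if __name__ == "__main__":
    stream = [(i / 10, i / 10) for i in range(11)]  # points on the diagonal y = x
    candidates = [
        [(0.0, 0.0), (1.0, 0.0)],  # horizontal segment
        [(0.0, 0.0), (1.0, 1.0)],  # diagonal segment
    ]
    picks, cum_losses = exponential_weights(stream, candidates, eta=5.0)
    print(picks, cum_losses)
```

In this toy setting the regret of the forecaster is its own cumulative loss minus the smallest entry of `cum_losses`; a sublinear regret bound means this gap grows more slowly than the number of observed points.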
Document type:
Preprint, Working paper
2018

Cited literature [29 references]

https://hal.inria.fr/hal-01796011
Contributor: Benjamin Guedj
Submitted on: Friday, May 18, 2018 - 21:54:16
Last modified on: Wednesday, November 14, 2018 - 14:40:11
Document(s) archived on: Monday, September 24, 2018 - 11:51:19

File

main-pcurves.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01796011, version 1

Citation

Benjamin Guedj, Le Li. Sequential Learning of Principal Curves: Summarizing Data Streams on the Fly. 2018. 〈hal-01796011〉
