Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

Cédric Hartland 1, 2 Nicolas Baskiotis 1, 2 Sylvain Gelly 1, 2 Michèle Sebag 1, 2 Olivier Teytaud 1, 2
1 TANC - Algorithmic number theory for cryptology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France, X - École polytechnique, CNRS - Centre National de la Recherche Scientifique : UMR7161
Abstract : Motivated by realtime website optimization, this paper is about online learning in abruptly changing environments. Two extensions of the UCBT algorithm are combined in order to handle dynamic multi-armed bandits, and specifically to cope with fast variations in the rewards. Firstly, a change point detection test based on Page-Hinkley statistics is used to overcome the limitations due to the UCBT inertia. Secondly, a controlled forgetting strategy dubbed Meta-Bandit is proposed to take care of the Exploration vs Exploitation trade-off when the PH test is triggered. Extensive empirical validation shows significant improvements compared to the baseline algorithms. The paper also investigates the sensitivity of the proposed algorithm with respect to the number of available options.
Liste complète des métadonnées

Littérature citée [13 références]  Voir  Masquer  Télécharger

https://hal.inria.fr/inria-00164033
Contributeur : Cédric Hartland <>
Soumis le : lundi 5 novembre 2007 - 10:18:05
Dernière modification le : jeudi 10 mai 2018 - 02:06:00
Document(s) archivé(s) le : lundi 24 septembre 2012 - 11:15:50

Fichier

cap07.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : inria-00164033, version 1

Collections

Citation

Cédric Hartland, Nicolas Baskiotis, Sylvain Gelly, Michèle Sebag, Olivier Teytaud. Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments. CAp, cepadues, 2007, pp.237-250. 〈inria-00164033〉

Partager

Métriques

Consultations de la notice

565

Téléchargements de fichiers

871