Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

Motivated by real-time website optimization, this paper addresses online learning in abruptly changing environments. Two extensions of the UCBT algorithm are combined to handle dynamic multi-armed bandits, and specifically to cope with fast variations in the rewards. First, a change-point detection test based on Page-Hinkley (PH) statistics is used to overcome the limitations due to the inertia of UCBT. Second, a controlled forgetting strategy, dubbed Meta-Bandit, is proposed to manage the exploration vs. exploitation trade-off when the PH test is triggered. Extensive empirical validation shows significant improvements over the baseline algorithms. The paper also investigates the sensitivity of the proposed algorithm to the number of available options.
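
To make the change-point test mentioned in the abstract concrete, here is a minimal Python sketch of a textbook Page-Hinkley test that watches for a drop in the mean of a reward stream. It is an illustration only and not necessarily the exact variant used in the paper: the class name PageHinkley, the helper first_alarm, and the default values of delta (tolerated drift) and threshold (often written lambda) are assumptions chosen for this example.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for a decrease in the mean of a stream.

    delta and threshold are illustrative defaults, not values from the paper.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed)
        self.threshold = threshold  # alarm threshold, often called lambda (assumed)
        self.reset()

    def reset(self):
        """Forget all past observations, e.g. after an alarm."""
        self.n = 0          # number of observations seen
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation m_t
        self.cum_max = 0.0  # running maximum M_t of m_t

    def update(self, x):
        """Consume one observation; return True if a drop is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # m_t accumulates the deviation from the running mean, plus delta.
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        # The statistic PH_t = M_t - m_t grows when rewards fall below the mean.
        return self.cum_max - self.cum > self.threshold


def first_alarm(rewards, delta=0.005, threshold=5.0):
    """Return the 0-based index of the first alarm, or None if none is raised."""
    detector = PageHinkley(delta, threshold)
    for t, r in enumerate(rewards):
        if detector.update(r):
            return t
    return None
```

In a bandit setting, one plausible use would be to feed update() with the rewards of the currently best arm and call reset() once an alarm is raised. How the alarm is then handled, which is the role the abstract assigns to the Meta-Bandit, is not sketched here.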

Keywords

online learning; meta-bandits; UCB; dynamic environments

Domains

Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Statistics [math.ST]; Statistics Theory [stat.TH]

Main file

cap07.pdf (154.15 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00164033

Submitted on: Monday, November 5, 2007, 10:18:05

Last modified on: Friday, April 19, 2024, 14:36:39

Long-term archiving on: Monday, September 24, 2012, 11:15:50

Dates and versions

inria-00164033, version 1 (2007-11-05)

Identifiers

HAL Id: inria-00164033, version 1

Cite

Cédric Hartland, Nicolas Baskiotis, Sylvain Gelly, Michèle Sebag, Olivier Teytaud. Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments. CAp 2007 : 9è Conférence francophone sur l'apprentissage automatique, Jul 2007, Grenoble, France. pp.237-250. ⟨inria-00164033⟩
