Conference Papers, Year: 2007

Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

Abstract

Motivated by real-time website optimization, this paper addresses online learning in abruptly changing environments. Two extensions of the UCBT algorithm are combined to handle dynamic multi-armed bandits, and specifically to cope with fast variations in the rewards. First, a change point detection test based on Page-Hinkley (PH) statistics is used to overcome the limitations due to the inertia of UCBT. Second, a controlled forgetting strategy dubbed Meta-Bandit is proposed to handle the exploration vs. exploitation trade-off when the PH test is triggered. Extensive empirical validation shows significant improvements over the baseline algorithms. The paper also investigates the sensitivity of the proposed algorithm to the number of available options.
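
To illustrate the change point detection step mentioned in the abstract, the following is a minimal sketch of a Page-Hinkley test for a drop in the mean of a reward stream. It uses a common textbook form of the statistic and is not taken from the paper; the class name `PageHinkley`, the helper `first_alarm`, and the default values of `delta` and `threshold` are assumptions chosen for illustration only.

```python
class PageHinkley:
    """Sketch of a Page-Hinkley test for a downward shift in the mean.

    Hypothetical helper for illustration; parameter defaults are assumed,
    not values reported in the paper.
    """

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated change magnitude (assumed value)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed value)
        self.reset()

    def reset(self):
        self.n = 0          # number of observations seen so far
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_max = 0.0  # largest cumulative deviation seen so far

    def update(self, x):
        """Feed one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        # A large gap between the running maximum and the current
        # cumulative sum indicates that the mean has dropped.
        return self.cum_max - self.cum > self.threshold


def first_alarm(rewards, delta=0.005, threshold=5.0):
    """Return the index of the first alarm, or None if no alarm is raised."""
    detector = PageHinkley(delta, threshold)
    for t, r in enumerate(rewards):
        if detector.update(r):
            return t
    return None


if __name__ == "__main__":
    # Rewards of 1.0 for 200 steps, then an abrupt drop to 0.0.
    stream = [1.0] * 200 + [0.0] * 200
    print("first alarm at step:", first_alarm(stream))
```

In a bandit setting, one such detector would typically be attached to the reward stream of the currently preferred arm, and an alarm would trigger whatever forgetting or restart strategy the learner uses. A larger `threshold` lowers the false alarm rate at the cost of a longer detection delay.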
Main file: cap07.pdf (154.15 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00164033, version 1 (05-11-2007)

Identifiers

  • HAL Id: inria-00164033, version 1

Cite

Cédric Hartland, Nicolas Baskiotis, Sylvain Gelly, Michèle Sebag, Olivier Teytaud. Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments. CAp 2007 : 9è Conférence francophone sur l'apprentissage automatique, Jul 2007, Grenoble, France. pp.237-250. ⟨inria-00164033⟩