Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments

Motivated by real-time website optimization, this paper addresses online learning in abruptly changing environments. Two extensions of the UCBT algorithm are combined to handle dynamic multi-armed bandits, and specifically to cope with fast variations in the rewards. First, a change point detection test based on Page-Hinkley (PH) statistics is used to overcome the limitations caused by the inertia of UCBT. Second, a controlled forgetting strategy dubbed Meta-Bandit is proposed to handle the exploration vs. exploitation trade-off when the PH test is triggered. Extensive empirical validation shows significant improvements over the baseline algorithms. The paper also investigates the sensitivity of the proposed algorithm to the number of available options.
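
To make the change point detection step concrete, here is a minimal Python sketch of a Page-Hinkley test that watches a reward stream for a drop in its mean. It follows the common textbook form of the test and is not necessarily the exact variant used in the paper. The class name `PageHinkley`, the helper `first_alarm`, the parameter values `delta=0.005` and `threshold=5.0`, and the synthetic reward stream are all assumptions made for illustration.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for a drop in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        # Both defaults are illustrative assumptions, not values from the paper.
        self.delta = delta          # tolerance for small fluctuations
        self.threshold = threshold  # alarm level (often written lambda)
        self.reset()

    def reset(self):
        self.n = 0          # number of observations seen
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation m_t
        self.cum_max = 0.0  # running maximum M_t of m_t

    def update(self, x):
        """Consume one observation; return True if a drop is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean + self.delta
        self.cum_max = max(self.cum_max, self.cum)
        # The statistic M_t - m_t grows when recent values fall below the mean.
        return self.cum_max - self.cum > self.threshold


def first_alarm(stream, detector):
    """Return the 0-based index of the first alarm, or None if none fires."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical reward stream: mean 1.0 for 200 steps, then an abrupt drop to 0.0.
rewards = [1.0] * 200 + [0.0] * 50
alarm_index = first_alarm(rewards, PageHinkley())
print("first alarm at step", alarm_index)
```

A larger `threshold` lowers the false alarm rate at the cost of a longer detection delay, and `delta` sets how much fluctuation is tolerated before deviations start to accumulate. What the learner does once the alarm fires (the Meta-Bandit step in the paper) is not sketched here, since the abstract does not specify it.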

Keywords

online learning; meta bandits; UCB; dynamic environments

Domains

Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Statistics [math.ST]; Statistics Theory [stat.TH]

Main file

cap07.pdf (154.15 KB)

Origin: Files produced by the author(s)

Contributor: Cédric Hartland

https://inria.hal.science/inria-00164033

Submitted on: Monday, November 5, 2007, 10:18:05 AM

Last modification on: Friday, April 19, 2024, 2:36:39 PM

Long-term archiving on: Monday, September 24, 2012, 11:15:50 AM

Dates and versions

inria-00164033, version 1 (05-11-2007)

Identifiers

HAL Id: inria-00164033, version 1

Cite

Cédric Hartland, Nicolas Baskiotis, Sylvain Gelly, Michèle Sebag, Olivier Teytaud. Change Point Detection and Meta-Bandits for Online Learning in Dynamic Environments. CAp 2007 : 9è Conférence francophone sur l'apprentissage automatique, Jul 2007, Grenoble, France. pp.237-250. ⟨inria-00164033⟩

Collections

X EC-PARIS CNRS INRIA LIX X-LIX X-DEP-INFO PARISTECH UMR8623 INRIA2 UNIV-PARIS-SACLAY
