Adaptive Bandits: Towards the best history-dependent strategy

Odalric-Ambrym Maillard; Rémi Munos

Report (Technical Report), Year: 2011

Odalric-Ambrym Maillard

Role: Author
PersonId: 5563
IdHAL: odalric-ambrym-maillard
ORCID: 0000-0001-7935-7026
IdRef: 158055594

Sequential Learning

Rémi Munos

Role: Author

Sequential Learning

Abstract

We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Theta of constraints based on equivalence classes on the common history (the information shared by the player and the opponent), which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model theta* in Theta. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t. the best strategy among all mappings from classes to actions (i.e. the best history-class-based strategy) for the best model in Theta. This makes it possible to model opponents (case 1) or strategies (case 2) that handle finite memory, periodicity, standard stochastic bandits and other situations. When Theta = {theta}, i.e. only one model is considered, we derive tractable algorithms achieving a tight regret (at time T) bounded by ~O(sqrt(TAC)), where C is the number of classes of theta. When many models are available, all known algorithms achieving a regret of order O(sqrt(T)) are unfortunately intractable and scale poorly with the number of models |Theta|. Our contribution here is to provide tractable algorithms with regret bounded by T^{2/3} C^{1/3} log(|Theta|)^{1/2}.
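To make the single-model setting (Theta = {theta}) more concrete, here is a minimal sketch of a history-class-based player. It is an illustration under stated assumptions, not the report's algorithm: it runs one independent, standard Exp3 learner per equivalence class, which is one natural way to reach a bound of the ~O(sqrt(TAC)) shape if A is read as the number of actions (the abstract does not define A). The names `Exp3`, `ClassBasedExp3`, `last_arm_class` and `simulate`, the exploration rate `gamma=0.1`, and the toy opponent are all hypothetical choices made for this example.

```python
import math
import random


class Exp3:
    """Standard Exp3 for rewards in [0, 1] with a fixed exploration rate."""

    def __init__(self, n_arms, gamma, rng):
        self.n_arms = n_arms
        self.gamma = gamma
        self.rng = rng
        self.log_weights = [0.0] * n_arms  # log-domain weights avoid overflow

    def probabilities(self):
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.n_arms for wi in w]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # Importance-weighted reward estimate for the arm that was played.
        self.log_weights[arm] += self.gamma * (reward / prob) / self.n_arms


class ClassBasedExp3:
    """One Exp3 learner per history class (illustrative assumption)."""

    def __init__(self, n_arms, class_of, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.class_of = class_of  # maps the common history to a hashable class label
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.learners = {}
        self.history = []  # list of (arm, reward) pairs
        self._pending = None

    def act(self):
        c = self.class_of(self.history)
        if c not in self.learners:
            self.learners[c] = Exp3(self.n_arms, self.gamma, self.rng)
        arm, prob = self.learners[c].select()
        self._pending = (c, arm, prob)
        return arm

    def observe(self, reward):
        c, arm, prob = self._pending
        self.learners[c].update(arm, prob, reward)
        self.history.append((arm, reward))


def last_arm_class(history):
    """Hypothetical finite-memory model: the class is the last arm played."""
    return history[-1][0] if history else None


def simulate(n_rounds=5000, seed=0):
    """Toy adaptive opponent: reward 1 only when the player switches arms."""
    player = ClassBasedExp3(n_arms=2, class_of=last_arm_class, gamma=0.1, seed=seed)
    total, last_arm = 0.0, None
    for _ in range(n_rounds):
        arm = player.act()
        reward = 1.0 if last_arm is not None and arm != last_arm else 0.0
        player.observe(reward)
        total += reward
        last_arm = arm
    return total / n_rounds, len(player.learners)


if __name__ == "__main__":
    avg_reward, n_classes = simulate()
    print(f"average reward: {avg_reward:.3f} over {n_classes} classes")
```

Against this toy opponent, no fixed arm earns any reward, while the best history-dependent strategy alternates arms. The per-class learners can discover that alternation because each class (the last arm played) gets its own action distribution.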

Domains

Statistics [math.ST] Theory [stat.TH] Machine Learning [cs.LG]

Main file

AdaptiveBandits_HAL.pdf (325.33 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/inria-00574999

Submitted on: Wednesday, March 9, 2011, 13:57:08

Last modified on: Monday, April 22, 2024, 10:25:54

Long-term archiving on: Tuesday, November 6, 2012, 15:56:07

Dates and versions

inria-00574999, version 1 (09-03-2011)

Identifiers

HAL Id: inria-00574999, version 1

Cite

Odalric-Ambrym Maillard, Rémi Munos. Adaptive Bandits: Towards the best history-dependent strategy. [Technical Report] 2011, pp.14. ⟨inria-00574999⟩

Collections

UNIV-LILLE3 CNRS INRIA LAGIS INRIA2 LARA

427 views

196 downloads
