Latent Bandits.

Abstract : We consider a multi-armed bandit problem where the reward distributions are indexed by two sets --one for arms, one for type-- and can be partitioned into a small number of clusters according to the type. First, we consider the setting where all reward distributions are known and all types have the same underlying cluster, the type's identity is, however, unknown. Second, we study the case where types may come from different classes, which is significantly more challenging. Finally, we tackle the case where the reward distributions are completely unknown. In each setting, we introduce specific algorithms and derive non-trivial regret performance. Numerical experiments show that, in the most challenging agnostic case, the proposed algorithm achieves excellent performance in several difficult scenarios.
Type de document :
Autre publication
Extended version of the paper accepted to ICML 2014 (paper and supplementary material). 2014
Liste complète des métadonnées

https://hal.inria.fr/hal-00926281
Contributeur : Odalric-Ambrym Maillard <>
Soumis le : jeudi 9 janvier 2014 - 13:03:01
Dernière modification le : lundi 13 janvier 2014 - 13:18:14
Document(s) archivé(s) le : jeudi 10 avril 2014 - 15:05:19

Fichier

icml_cr_Arxiv.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : hal-00926281, version 1

Citation

Odalric-Ambrym Maillard, Shie Mannor. Latent Bandits.. Extended version of the paper accepted to ICML 2014 (paper and supplementary material). 2014. <hal-00926281>

Partager

Métriques

Consultations de
la notice

254

Téléchargements du document

229