Trading off rewards and errors in multi-armed bandits

Akram Erraqabi 1, 2 Alessandro Lazaric 1 Michal Valko 1 Emma Brunskill 3 Yun-En Liu 3
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract: In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to gain accurate estimates of the rewards of all arms. While these objectives are contrasting, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and teaching strategies (small estimation errors) but, at the same time, the system needs to avoid giving a bad experience to the players, who may leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm whose performance is provably close to the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.
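To make the tension between rewards and errors concrete, the Python sketch below scores a fixed allocation of pulls over the arms by a weighted combination of average reward and average estimation error. It is an illustration only: it is not the ForcingBalance algorithm and not necessarily the objective formalized in the paper. The function names, the `weight` parameter, and the choice of variance divided by number of pulls as the error measure are all assumptions made for this example.

```python
# Illustrative sketch (assumed formulation, not taken from the paper).
# An "allocation" is a list of strictly positive fractions, one per arm,
# that sum to 1 and say how often each arm is pulled.

def average_reward(allocation, means):
    """Expected reward per pull when arms are pulled in these proportions."""
    return sum(frac * mu for frac, mu in zip(allocation, means))

def average_estimation_error(allocation, variances, n_pulls):
    """Mean over arms of the variance of each arm's empirical mean.

    An arm pulled frac * n_pulls times has an empirical mean whose
    variance is sigma^2 / (frac * n_pulls).
    """
    errors = [var / (frac * n_pulls) for frac, var in zip(allocation, variances)]
    return sum(errors) / len(errors)

def tradeoff(allocation, means, variances, n_pulls, weight):
    """Hypothetical score: weight = 1 cares only about reward,
    weight = 0 cares only about small estimation errors."""
    reward = average_reward(allocation, means)
    error = average_estimation_error(allocation, variances, n_pulls)
    return weight * reward - (1 - weight) * error

# Two Bernoulli arms with means 0.9 and 0.5 (variance = mean * (1 - mean)).
means = [0.9, 0.5]
variances = [m * (1 - m) for m in means]
n_pulls = 100

greedy = [0.99, 0.01]   # almost always pulls the better arm
uniform = [0.5, 0.5]    # explores both arms equally

for name, alloc in [("greedy", greedy), ("uniform", uniform)]:
    print(name,
          "reward:", average_reward(alloc, means),
          "error:", average_estimation_error(alloc, variances, n_pulls))
```

In this toy setting the near-greedy allocation earns the higher reward but leaves the weaker arm poorly estimated, while the uniform allocation does the opposite. Which allocation scores better depends on the assumed `weight`.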
Document type:
Conference paper
International Conference on Artificial Intelligence and Statistics, 2017, Fort Lauderdale, United States

https://hal.inria.fr/hal-01482765
Contributor: Michal Valko
Submitted on: Monday, March 6, 2017 - 11:53:46
Last modified on: Monday, April 30, 2018 - 14:38:02
Document(s) archived on: Wednesday, June 7, 2017 - 12:16:07

File

erraqabi2017trading.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01482765, version 1

Citation

Akram Erraqabi, Alessandro Lazaric, Michal Valko, Emma Brunskill, Yun-En Liu. Trading off rewards and errors in multi-armed bandits. International Conference on Artificial Intelligence and Statistics, 2017, Fort Lauderdale, United States. 〈hal-01482765〉
