Rewards and errors in multi-arm bandits for interactive education

Conference paper, Year: 2016

Abstract

In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to gain accurate estimates of the rewards of all arms. While these objectives are contrasting, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and teaching strategies (small estimation errors) but, at the same time, the system needs to avoid giving a bad experience to the players, who may leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm whose performance is provably close to the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.
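The abstract describes trading off cumulative reward against estimation error across all arms, but does not spell out the ForcingBalance algorithm itself. As an illustration only (not the paper's method), the sketch below uses a simple forced-sampling strategy: every arm is pulled a minimum number of times (keeping all estimation errors small), after which the learner plays greedily on empirical means (collecting reward). All names, the noise model, and the `min_pulls` parameter are assumptions for this example.

```python
import random

def forced_sampling_bandit(means, horizon, min_pulls=10, seed=0):
    """Illustrative reward/error tradeoff (NOT the paper's ForcingBalance):
    pull each arm until it has min_pulls samples, then play greedily."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k     # pulls per arm
    sums = [0.0] * k     # cumulative observed reward per arm
    total = 0.0
    for _ in range(horizon):
        under_sampled = [i for i in range(k) if counts[i] < min_pulls]
        if under_sampled:
            # Forcing phase: keep estimation error small for every arm.
            arm = min(under_sampled, key=lambda i: counts[i])
        else:
            # Exploitation phase: maximize reward via the empirical best arm.
            arm = max(range(k), key=lambda i: sums[i] / counts[i])
        reward = means[arm] + rng.gauss(0, 0.1)  # assumed Gaussian noise
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    estimates = [sums[i] / counts[i] for i in range(k)]
    return total / horizon, estimates, counts

avg_reward, estimates, counts = forced_sampling_bandit([0.2, 0.5, 0.8], horizon=1000)
print(avg_reward, counts)
```

Raising `min_pulls` lowers the estimation error on every arm at the cost of average reward, which is the tension the paper's tradeoff formalizes.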
Main file: erraqabi2016rewards.pdf (368.28 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01482764, version 1 (03-03-2017)

Identifiers

  • HAL Id: hal-01482764, version 1

Cite

Akram Erraqabi, Alessandro Lazaric, Michal Valko, Emma Brunskill, Yun-En Liu. Rewards and errors in multi-arm bandits for interactive education. Challenges in Machine Learning: Gaming and Education workshop at Neural Information Processing Systems, 2016, Barcelona, Spain. ⟨hal-01482764⟩