Rewards and errors in multi-arm bandits for interactive education

Akram Erraqabi; Alessandro Lazaric; Michal Valko; Emma Brunskill; Yun-En Liu

Conference paper, year: 2016


Akram Erraqabi

Role: Author
PersonId: 982698

Sequential Learning

Université de Montréal

Alessandro Lazaric

Role: Author
PersonId: 851
IdHAL: alessandro-lazaric
ORCID: 0000-0002-8970-413X
IdRef: 188701486

Sequential Learning

Michal Valko

Role: Author
PersonId: 284
IdHAL: michal
IdRef: 22360934X

Sequential Learning

Emma Brunskill

Role: Author

Computer Science Department - Carnegie Mellon University

Yun-En Liu

Role: Author

Computer Science Department - Carnegie Mellon University

Abstract

In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to gain accurate estimates of the rewards of all arms. While these objectives are contrasting, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and teaching strategies (small estimation errors) but, at the same time, the system needs to avoid giving a bad experience to the players, who may leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm whose performance is provably close to the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.
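To make the tension between the two objectives concrete, the following is a minimal sketch, not the paper's formalization and not the ForcingBalance algorithm. It assumes a fixed allocation that gives each arm a constant fraction of the pulls, and it measures error as the mean squared error of each arm's sample mean. The arm values, the `Arm` class, and both helper functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    mean: float      # expected reward of the arm (illustrative value)
    variance: float  # variance of the arm's reward (illustrative value)


def expected_reward(arms, allocation):
    """Expected reward per pull when arm i receives fraction allocation[i]."""
    return sum(lam * arm.mean for arm, lam in zip(arms, allocation))


def mean_estimation_error(arms, allocation, n_pulls):
    """Average MSE of the sample means: variance / (pulls given to the arm).

    Every fraction in `allocation` must be strictly positive.
    """
    total = sum(arm.variance / (lam * n_pulls) for arm, lam in zip(arms, allocation))
    return total / len(arms)


# Hypothetical three-arm problem and two fixed allocations.
arms = [Arm(0.9, 0.09), Arm(0.5, 0.25), Arm(0.2, 0.16)]
n_pulls = 1000
uniform = [1 / 3, 1 / 3, 1 / 3]   # favors accurate estimates of all arms
greedy = [0.98, 0.01, 0.01]       # favors the arm with the highest mean

for name, allocation in [("uniform", uniform), ("greedy", greedy)]:
    reward = expected_reward(arms, allocation)
    error = mean_estimation_error(arms, allocation, n_pulls)
    print(f"{name}: reward per pull = {reward:.3f}, mean estimation error = {error:.5f}")
```

Under these assumptions, the reward-focused allocation earns more per pull but estimates the rarely pulled arms far less accurately, while the uniform allocation does the opposite. This is the kind of tradeoff the abstract describes.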

Domains

Machine Learning [stat.ML]

Main file

erraqabi2016rewards.pdf (368.28 KB)

Origin: Files produced by the author(s)

Contributor: Michal Valko

https://inria.hal.science/hal-01482764

Submitted on: Friday, March 3, 2017, 18:40:11

Last modified on: Wednesday, January 24, 2024, 09:54:23

Long-term archiving on: Tuesday, June 6, 2017, 12:09:55

Dates and versions

hal-01482764, version 1 (03-03-2017)

Identifiers

HAL Id: hal-01482764, version 1

Cite

Akram Erraqabi, Alessandro Lazaric, Michal Valko, Emma Brunskill, Yun-En Liu. Rewards and errors in multi-arm bandits for interactive education. Challenges in Machine Learning: Gaming and Education workshop at Neural Information Processing Systems, 2016, Barcelona, Spain. ⟨hal-01482764⟩


Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE

