Practical Evaluation and Optimization of Contextual Bandit Algorithms

Abstract: We study and empirically optimize contextual bandit learning, exploration, and problem encodings across 500+ datasets, creating a reference for practitioners and discovering or reinforcing a number of natural open problems for researchers. Across these experiments we show that minimizing the amount of exploration is a key design goal for practical performance. Remarkably, many problems can be solved purely via the implicit exploration imposed by the diversity of contexts. For practitioners, we introduce a number of practical improvements to common exploration algorithms, including Bootstrap Thompson sampling, Online Cover, and $\epsilon$-greedy. We also detail a new form of reduction to regression for learning from exploration data. Overall, this is a thorough study and review of contextual bandit methodology.
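The abstract mentions $\epsilon$-greedy among the exploration algorithms studied. As a rough illustration of the underlying idea (not the paper's implementation), the following sketch runs $\epsilon$-greedy on a toy tabular problem with two contexts and two arms; the reward table, context distribution, and function name are invented for this example.

```python
import random

def run_epsilon_greedy(T=2000, epsilon=0.05, seed=0):
    """Toy epsilon-greedy contextual bandit on a 2-context, 2-arm problem.

    Illustrative only: the true reward table below is made up.
    Returns the average observed reward and the learned reward estimates.
    """
    rng = random.Random(seed)
    n_contexts, n_arms = 2, 2
    # Hypothetical expected rewards: the best arm depends on the context.
    true_reward = [[0.8, 0.2],
                   [0.3, 0.9]]
    counts = [[0] * n_arms for _ in range(n_contexts)]
    means = [[0.0] * n_arms for _ in range(n_contexts)]
    total = 0.0
    for _ in range(T):
        x = rng.randrange(n_contexts)            # observe a context
        if rng.random() < epsilon:               # explore uniformly
            a = rng.randrange(n_arms)
        else:                                    # exploit current estimates
            a = max(range(n_arms), key=lambda k: means[x][k])
        r = 1.0 if rng.random() < true_reward[x][a] else 0.0
        counts[x][a] += 1                        # incremental mean update
        means[x][a] += (r - means[x][a]) / counts[x][a]
        total += r
    return total / T, means
```

Even this crude scheme learns the context-dependent best arm, which echoes the paper's observation that only a small amount of explicit exploration is often needed in practice.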
Document type: Preprint, Working paper
2018

Cited literature [41 references]

https://hal.inria.fr/hal-01708310
Contributor: Alberto Bietti
Submitted on: Tuesday, February 13, 2018 - 15:08:17
Last modified on: Thursday, February 15, 2018 - 11:19:35

File

practical_cb.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01708310, version 1
  • arXiv: 1802.04064

Collections

UGA | LJK | INRIA

Citation

Alberto Bietti, Alekh Agarwal, John Langford. Practical Evaluation and Optimization of Contextual Bandit Algorithms. 2018. 〈hal-01708310〉
