A Contextual Bandit Bake-off

Abstract: Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to compare and empirically optimize contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works the best overall. A surprisingly close second is a simple greedy baseline that only explores implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014) which tends to be more conservative but robust to problem specification by design. Along the way, we also evaluate and improve several internal components of contextual bandit algorithm design. Overall, this is a thorough study and review of contextual bandit methodology.
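The abstract's "simple greedy baseline" refers to a policy that always picks the action whose estimated reward is highest, with no explicit exploration; learning progress comes only from the natural diversity of observed contexts. The sketch below is an illustrative simulation of that idea (it is not the paper's implementation): one linear least-squares model per action, updated only on the action actually played. All names, dimensions, and the simulated reward model are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions, d = 3, 5
# Hypothetical ground-truth reward weights per action (simulation only).
true_w = rng.normal(size=(n_actions, d))

# Greedy baseline: a per-action ridge regression estimate; the played
# action is always the argmax of predicted reward. Any "exploration"
# comes implicitly from the diversity of the sampled contexts.
A = np.stack([np.eye(d) for _ in range(n_actions)])  # per-action Gram matrices
b = np.zeros((n_actions, d))                         # per-action reward-weighted sums

total_reward = 0.0
T = 2000
for t in range(T):
    x = rng.normal(size=d)                  # observed context
    w_hat = np.linalg.solve(A, b)           # batched ridge estimates, (n_actions, d)
    a = int(np.argmax(w_hat @ x))           # greedy action choice
    r = true_w[a] @ x + 0.1 * rng.normal()  # bandit feedback: reward of chosen action only
    A[a] += np.outer(x, x)                  # rank-one update of the chosen arm's model
    b[a] += r * x
    total_reward += r

avg_reward = total_reward / T
```

Only the chosen arm's statistics are updated at each round, which is exactly what makes greedy fragile in theory: a systematically underestimated arm may never be tried again unless context diversity happens to favor it.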
Document type: Preprint, working paper
2018

Cited literature: 33 references

https://hal.inria.fr/hal-01708310
Contributor: Alberto Bietti
Submitted on: Wednesday, May 30, 2018 - 23:12:52
Last modified on: Tuesday, June 12, 2018 - 12:10:00
Archived on: Friday, August 31, 2018 - 18:57:49

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01708310, version 2
  • arXiv: 1802.04064

Citation

Alberto Bietti, Alekh Agarwal, John Langford. A Contextual Bandit Bake-off. 2018. 〈hal-01708310v2〉
