A Contextual Bandit Bake-off

Alberto Bietti; Alekh Agarwal; John Langford

Journal article. Journal of Machine Learning Research. Year: 2021

Alberto Bietti

Role: Author
PersonId: 989882

Microsoft Research - Inria Joint Centre

New York University [New York]

Apprentissage de modèles à partir de données massives (Learning models from massive data)

Alekh Agarwal

Role: Author
PersonId: 1028181

Microsoft Research

John Langford

Role: Author

Microsoft Research

Abstract

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems. Despite multiple recent successes on statistically and computationally efficient methods, the practical behavior of these algorithms is still poorly understood. We leverage the availability of large numbers of supervised learning datasets to compare and empirically optimize contextual bandit algorithms, focusing on practical methods that learn by relying on optimization oracles from supervised learning. We find that a recent method (Foster et al., 2018) using optimism under uncertainty works the best overall. A surprisingly close second is a simple greedy baseline that only explores implicitly through the diversity of contexts, followed by a variant of Online Cover (Agarwal et al., 2014) which tends to be more conservative but robust to problem specification by design. Along the way, we also evaluate and improve several internal components of contextual bandit algorithm design. Overall, this is a thorough study and review of contextual bandit methodology.
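The evaluation setup described in the abstract (turning supervised datasets into simulated bandit problems and running a greedy learner on them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic dataset, the names `make_dataset`, `bandit_loss`, `GreedyLearner` and `run_greedy`, the per-action linear regressors, and the learning rate. The idea shown is only that the learner sees the loss of the action it played, never the true label, and that the greedy strategy adds no explicit exploration.

```python
import random


def make_dataset(n, num_actions, dim, seed=0):
    """Hypothetical synthetic multiclass data: the label is the index of the
    largest of the first `num_actions` features."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = max(range(num_actions), key=lambda a: x[a])
        data.append((x, y))
    return data


def bandit_loss(action, label):
    """Simulated bandit feedback: 0/1 loss of the chosen action only."""
    return 0.0 if action == label else 1.0


class GreedyLearner:
    """One online linear loss regressor per action; always plays the argmin."""

    def __init__(self, num_actions, dim, lr=0.05):
        self.lr = lr
        # One weight per feature plus a trailing bias weight, per action.
        self.weights = [[0.0] * (dim + 1) for _ in range(num_actions)]

    def predict_loss(self, action, x):
        w = self.weights[action]
        # zip stops after the feature weights; the bias w[-1] is added separately.
        return w[-1] + sum(wi * xi for wi, xi in zip(w, x))

    def act(self, x):
        # Greedy choice with no explicit exploration; ties go to the lowest index.
        return min(range(len(self.weights)), key=lambda a: self.predict_loss(a, x))

    def update(self, x, action, loss):
        # One squared-loss SGD step on the regressor of the played action only.
        err = self.predict_loss(action, x) - loss
        w = self.weights[action]
        for i, xi in enumerate(x):
            w[i] -= self.lr * err * xi
        w[-1] -= self.lr * err


def run_greedy(data, num_actions, dim):
    """Return the average loss incurred online while learning."""
    learner = GreedyLearner(num_actions, dim)
    total = 0.0
    for x, y in data:
        a = learner.act(x)
        loss = bandit_loss(a, y)  # the label y itself is never shown to the learner
        learner.update(x, a, loss)
        total += loss
    return total / len(data)


if __name__ == "__main__":
    data = make_dataset(n=2000, num_actions=3, dim=5, seed=0)
    print("average online loss:", run_greedy(data, num_actions=3, dim=5))
```

In this sketch any exploration comes only from the variety of contexts and from the zero initialization of the regressors. The methods compared in the paper rely on supervised learning oracles and are considerably more elaborate than this toy learner.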

Keywords

Contextual bandits; Online learning; Evaluation

Domains

Machine Learning [stat.ML]; Machine Learning [cs.LG]

Main file

18-863.pdf (868.92 KB)

Origin: Files produced by the author(s)

Contributor: Alberto Bietti

https://inria.hal.science/hal-01708310

Submitted on: Thursday, June 24, 2021, 19:42:42

Last modified on: Thursday, April 4, 2024, 21:10:03

Dates and versions

hal-01708310, version 1 (2018-02-13)

hal-01708310, version 2 (2018-05-30)

hal-01708310, version 3 (2018-12-26)

hal-01708310, version 4 (2021-06-24)

Identifiers

HAL Id: hal-01708310, version 4
arXiv: 1802.04064

Cite

Alberto Bietti, Alekh Agarwal, John Langford. A Contextual Bandit Bake-off. Journal of Machine Learning Research, 2021, 22 (133), pp.1-49. ⟨hal-01708310v4⟩

Collections

UGA CNRS INRIA LJK LJK_GI INRIA2 LJK-GI-THOTH
