# Practical Evaluation and Optimization of Contextual Bandit Algorithms

Thoth - Learning models from massive data
LJK - Laboratoire Jean Kuntzmann, Inria Grenoble - Rhône-Alpes
Abstract : We study and empirically optimize contextual bandit learning, exploration, and problem encodings across 500+ datasets, creating a reference for practitioners and discovering or reinforcing a number of natural open problems for researchers. Across these experiments we show that minimizing the amount of exploration is a key design goal for practical performance. Remarkably, many problems can be solved purely via the implicit exploration imposed by the diversity of contexts. For practitioners, we introduce a number of practical improvements to common exploration algorithms including Bootstrap Thompson sampling, Online Cover, and $\epsilon$-greedy. We also detail a new form of reduction to regression for learning from exploration data. Overall, this is a thorough study and review of contextual bandit methodology.
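
To make the exploration setting named in the abstract concrete, here is a minimal sketch of $\epsilon$-greedy contextual bandit learning with one linear reward model per action. It is not the authors' implementation and does not include their proposed improvements or their reduction to regression. The class name `EpsilonGreedyBandit`, the helper `simulate`, the toy reward (1 when the chosen action matches the active one-hot feature, 0 otherwise), and all parameter values are assumptions made for illustration only.

```python
import random


class EpsilonGreedyBandit:
    """Illustrative epsilon-greedy learner with one linear reward model per action."""

    def __init__(self, n_actions, n_features, epsilon=0.1, lr=0.5, seed=0):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        self.weights = [[0.0] * n_features for _ in range(n_actions)]

    def predict(self, action, x):
        # Estimated reward of `action` in context `x` (plain dot product).
        return sum(w * xi for w, xi in zip(self.weights[action], x))

    def action_probabilities(self, x):
        # Spread epsilon uniformly over all actions; the greedy action
        # receives the remaining 1 - epsilon on top of its uniform share.
        scores = [self.predict(a, x) for a in range(self.n_actions)]
        greedy = max(range(self.n_actions), key=scores.__getitem__)
        probs = [self.epsilon / self.n_actions] * self.n_actions
        probs[greedy] += 1.0 - self.epsilon
        return probs

    def choose(self, x):
        # Sample an action and return it with the probability it was chosen,
        # which is what one would log alongside the observed reward.
        probs = self.action_probabilities(x)
        action = self.rng.choices(range(self.n_actions), weights=probs)[0]
        return action, probs[action]

    def update(self, x, action, reward):
        # One squared-loss SGD step on the model of the chosen action only,
        # since the reward of the other actions is never observed.
        error = self.predict(action, x) - reward
        w = self.weights[action]
        for i, xi in enumerate(x):
            w[i] -= self.lr * error * xi


def simulate(n_rounds=2000, epsilon=0.1, seed=0):
    """Toy problem (assumed): 2 one-hot contexts, 2 actions, reward 1 on a match."""
    env_rng = random.Random(seed + 1)
    bandit = EpsilonGreedyBandit(2, 2, epsilon=epsilon, seed=seed)
    rewards = []
    for _ in range(n_rounds):
        context_id = env_rng.randrange(2)
        x = [1.0 if i == context_id else 0.0 for i in range(2)]
        action, _prob = bandit.choose(x)
        reward = 1.0 if action == context_id else 0.0
        bandit.update(x, action, reward)
        rewards.append(reward)
    return bandit, rewards


if __name__ == "__main__":
    bandit, rewards = simulate()
    print("mean reward over last 500 rounds:", sum(rewards[-500:]) / 500)
```

Only the chosen action's model is updated because bandit feedback reveals the reward of that action alone. The probability returned by `choose` is not used by this sketch, but it is the quantity one would need to log in order to reuse the interaction data later.
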
Document type :
Preprints, Working Papers, ...

Cited literature [30 references]

https://hal.inria.fr/hal-01708310
Contributor : Alberto Bietti
Submitted on : Tuesday, February 13, 2018 - 3:08:17 PM
Last modification on : Tuesday, June 12, 2018 - 12:10:00 PM
Long-term archiving on: Sunday, May 6, 2018 - 12:01:00 AM

### File

practical_cb.pdf
Files produced by the author(s)

### Identifiers

• HAL Id : hal-01708310, version 1
• ARXIV : 1802.04064

### Citation

Alberto Bietti, Alekh Agarwal, John Langford. Practical Evaluation and Optimization of Contextual Bandit Algorithms. 2018. ⟨hal-01708310v1⟩
