Linear Thompson Sampling Revisited

Marc Abeille 1, 2 Alessandro Lazaric 1, 2
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract: We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$, as in previous results, the proof sheds new light on how TS works. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated with optimistic parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.
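The randomized scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a finite arm set, Gaussian perturbations, a regularized least-squares estimate, and a constant confidence radius `beta`, where the paper's analysis would call for a time-dependent radius. The names `linear_ts`, `lam`, `noise`, and `beta` are hypothetical. The perturbation is inflated by $\sqrt{d}$ to mirror the oversampling that the abstract associates with the extra $\sqrt{d}$ regret factor.

```python
import math
import random


def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_lower_transposed(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_ts(arms, theta_star, horizon, lam=1.0, noise=0.1, beta=1.0, seed=0):
    """Run a simplified linear Thompson sampling loop on a simulated problem.

    Assumptions (not from the source): finite arm set, Gaussian reward noise,
    constant radius `beta`. Returns the final estimate and cumulative regret.
    """
    rng = random.Random(seed)
    d = len(theta_star)
    V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]  # design matrix
    b = [0.0] * d                                                       # sum of reward * arm
    best = max(dot(x, theta_star) for x in arms)
    theta_hat, regret = [0.0] * d, 0.0
    for _ in range(horizon):
        L = cholesky(V)
        # Regularized least-squares estimate: theta_hat = V^{-1} b.
        theta_hat = solve_lower_transposed(L, solve_lower(L, b))
        # L^{-T} eta has covariance V^{-1}; inflate it by beta * sqrt(d).
        eta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        pert = solve_lower_transposed(L, eta)
        theta_tilde = [t + beta * math.sqrt(d) * p for t, p in zip(theta_hat, pert)]
        # Play the arm that is optimal for the sampled parameter.
        x = max(arms, key=lambda a: dot(a, theta_tilde))
        reward = dot(x, theta_star) + rng.gauss(0.0, noise)
        for i in range(d):
            b[i] += reward * x[i]
            for j in range(d):
                V[i][j] += x[i] * x[j]
        regret += best - dot(x, theta_star)
    return theta_hat, regret


if __name__ == "__main__":
    arms = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    estimate, regret = linear_ts(arms, theta_star=[1.0, 0.2], horizon=300)
    print("estimate:", estimate, "cumulative regret:", regret)
```

A UCB-like variant would replace the random perturbation with a deterministic maximization over the confidence ellipsoid. The sketch keeps the randomized step so that each round only has to maximize a linear function over the arms.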
Document type:
Conference paper
AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Apr 2017, Fort Lauderdale, United States. 2017

https://hal.inria.fr/hal-01493561
Contributor: Alessandro Lazaric
Submitted on: Tuesday, March 21, 2017 - 17:37:20
Last modified on: Tuesday, July 3, 2018 - 11:37:07
Document(s) archived on: Thursday, June 22, 2017 - 14:16:20

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01493561, version 1

Citation

Marc Abeille, Alessandro Lazaric. Linear Thompson Sampling Revisited. AISTATS 2017 - 20th International Conference on Artificial Intelligence and Statistics, Apr 2017, Fort Lauderdale, United States. 2017. 〈hal-01493561〉
