Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, vol. 27, no. 4, pp. 1054-1078, 1995.
Minimax policies for adversarial and stochastic bandits, COLT, pp. 217-226, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00834882
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol. 47, no. 2-3, pp. 235-256, 2002.
DOI : 10.1023/A:1013689704352
The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol. 32, no. 1, pp. 48-77, 2002.
DOI : 10.1137/S0097539701398375
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.130.158
Prior-free and prior-dependent regret bounds for Thompson Sampling, 2014 48th Annual Conference on Information Sciences and Systems (CISS), pp. 638-646, 2014.
DOI : 10.1109/CISS.2014.6814158
URL : http://arxiv.org/abs/1304.5758
Optimal adaptive policies for sequential allocation problems, Advances in Applied Mathematics, vol. 17, no. 2, pp. 122-142, 1996.
Gilles Stoltz, et al. Kullback-Leibler upper confidence bounds for optimal sequential allocation, The Annals of Statistics, pp. 1516-1541, 2013.
Prediction, Learning, and Games, 2006.
DOI : 10.1017/CBO9780511546921
Anytime optimal algorithms in stochastic multi-armed bandits, Proceedings of the 33rd International Conference on Machine Learning, pp. 1587-1595, 2016.
The KL-UCB algorithm for bounded stochastic bandits and beyond, COLT, pp. 359-376, 2011.
On explore-then-commit strategies, Advances in Neural Information Processing Systems, pp. 784-792, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01322906
Explore first, exploit next: The true shape of regret in bandit problems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01276324
On Bayesian index policies for sequential resource allocation, arXiv preprint, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01251606
On Bayesian upper confidence bounds for bandit problems, AISTATS, pp. 592-600, 2012.
Thompson sampling for 1-dimensional exponential family bandits, Advances in Neural Information Processing Systems, pp. 1448-1456, 2013.
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol. 6, no. 1, pp. 4-22, 1985.
Optimally confident UCB: Improved regret for finite-armed bandits, arXiv preprint, 2015.
A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences, Proceedings of the 24th Annual Conference on Learning Theory, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00574987
The missing factor in Hoeffding's inequalities, Annales de l'IHP Probabilités et Statistiques, pp. 689-702, 1995.