P. L. Bartlett, V. Gabillon, and M. Valko, A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption, Algorithmic Learning Theory, 2019.
URL : https://hal.archives-ouvertes.fr/hal-01885368

D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming, Athena Scientific, 1996.

S. Bubeck and R. Munos, Open-loop optimistic planning, Conference on Learning Theory, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943119

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, X-armed bandits, Journal of Machine Learning Research, vol.12, pp.1587-1627, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00450235

L. Buşoniu and R. Munos, Optimistic planning for Markov decision processes, International Conference on Artificial Intelligence and Statistics, 2012.

A. Carpentier and A. Locatelli, Tight (lower) bounds for the fixed budget best-arm identification bandit problem, Conference on Learning Theory, 2016.

P. Coquelin and R. Munos, Bandit algorithms for tree search, Uncertainty in Artificial Intelligence, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

R. Coulom, Efficient selectivity and backup operators in Monte-Carlo tree search, Computers and Games, vol.4630, pp.72-83, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00116992

Z. Feldman and C. Domshlak, Simple regret optimization in online planning for Markov decision processes, Journal of Artificial Intelligence Research, 2014.

V. Gabillon, M. Ghavamzadeh, and A. Lazaric, Best arm identification: A unified approach to fixed budget and fixed confidence, Neural Information Processing Systems, 2012.

S. Gelly, Y. Wang, R. Munos, and O. Teytaud, Modification of UCT with patterns in Monte-Carlo Go, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266

J. Grill, M. Valko, and R. Munos, Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Neural Information Processing Systems, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01389107

A. Hoorfar and M. Hassani, Inequalities on the Lambert W function and hyperpower function, Journal of Inequalities in Pure and Applied Mathematics, vol.9, issue.2, pp.5-9, 2008.

J. Hren and R. Munos, Optimistic planning of deterministic systems, European Workshop on Reinforcement Learning, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00830182

E. Kaufmann and W. M. Koolen, Monte-Carlo tree search by best-arm identification, Neural Information Processing Systems, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01535907

L. Kocsis and C. Szepesvári, Bandit-based Monte-Carlo planning, European Conference on Machine Learning, 2006.

E. Leurent and O. Maillard, , 2019.

R. Munos, Optimistic optimization of deterministic functions without the knowledge of its smoothness, Neural Information Processing Systems, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00830143

R. Munos, From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning, Foundations and Trends in Machine Learning, vol.7, pp.1-130, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00747575

F. Orabona and D. Pál, Scale-free online learning, Theoretical Computer Science, 2018.

F. Orabona and T. Tommasi, Training deep networks without learning rates through coin betting, Neural Information Processing Systems, 2017.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

S. Ross, P. Mineiro, and J. Langford, Normalized online learning, Uncertainty in Artificial Intelligence, 2013.

D. Shah, Q. Xie, and Z. Xu, On reinforcement learning using Monte-Carlo tree search with supervised learning: Non-asymptotic analysis, 2019.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.

B. Szörényi, G. Kedenburg, and R. Munos, Optimistic planning in Markov decision processes using a generative model, Neural Information Processing Systems, 2014.

M. Valko, A. Carpentier, and R. Munos, Stochastic simultaneous optimistic optimization, International Conference on Machine Learning, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00789606