P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

M. G. Bellemare, W. Dabney, and R. Munos, A distributional perspective on reinforcement learning, arXiv preprint, 2017.

S. Bubeck and R. Munos, Open loop optimistic planning, COLT, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943119

R. De Maesschalck, D. Jouan-Rimbaud, and D. L. Massart, The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems, pp.1-18, 2000.

M. Enzenberger, M. Müller, B. Arneson, and R. Segal, Fuego - An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search, IEEE Transactions on Computational Intelligence and AI in Games, vol.2, issue.4, pp.259-270, 2010.
DOI : 10.1109/TCIAIG.2010.2083662

M. Heusner, UCT for Pac-Man, 2011.

L. P. Kaelbling, M. L. Littman, and A. R. Cassandra, Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, issue.1-2, pp.99-134, 1998.
DOI : 10.1016/S0004-3702(98)00023-X

T. Keller and M. Helmert, Trial-based heuristic tree search for finite horizon MDPs, ICAPS, 2013.

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, ECML, pp.282-293, 2006.
DOI : 10.1007/11871842_29

D. Perez, P. Rohlfshagen, and S. M. Lucas, Monte Carlo tree search: Long-term versus short-term planning, Computational Intelligence and Games (CIG), 2012 IEEE Conference on, pp.219-226, 2012.

D. Perez, P. Rohlfshagen, and S. M. Lucas, The physical travelling salesman problem: WCCI 2012 competition, 2012 IEEE Congress on Evolutionary Computation, pp.1-8, 2012.
DOI : 10.1109/CEC.2012.6256440

M. L. Puterman, Markov decision processes: discrete stochastic dynamic programming, 2014.

E. Rachelson and M. G. Lagoudakis, On the locality of action domination in sequential decision making, ISAIM, 2010.

D. Silver and J. Veness, Monte-Carlo planning in large POMDPs, Advances in Neural Information Processing Systems, pp.2164-2172, 2010.

D. Silver, R. S. Sutton, and M. Müller, Sample-based learning and search with permanent and transient memories, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.968-975, 2008.
DOI : 10.1145/1390156.1390278

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.
DOI : 10.1038/nature16961

R. S. Sutton, Dyna, an integrated architecture for learning, planning, and reacting, ACM SIGART Bulletin, vol.2, issue.4, pp.160-163, 1991.

R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, 1998.

A. Weinstein and M. L. Littman, Bandit-based planning and learning in continuous-action Markov decision processes, ICAPS, 2012.