P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol. 47, no. 2-3, pp. 235-256, 2002.
DOI : 10.1023/A:1013689704352

D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, 2001.

S. Bubeck and R. Munos, Open loop optimistic planning, Conference on Learning Theory (COLT), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943119

S. Bubeck and N. Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends in Machine Learning, vol. 5, no. 1, pp. 1-122, 2012.

L. Buşoniu and R. Munos, Optimistic planning for Markov decision processes, Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS-12), pp. 182-189, 2012.

E. F. Camacho and C. Bordons, Model Predictive Control, Springer, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00683813

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, Cambridge University Press, 2006.
DOI : 10.1017/CBO9780511546921

R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proceedings of the International Conference on Computers and Games (CG 2006), 2006.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992

E. Even-Dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for reinforcement learning, Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), pp. 162-169, 2003.

V. Gabillon, M. Ghavamzadeh, and A. Lazaric, Best arm identification: A unified approach to fixed budget and fixed confidence, Advances in Neural Information Processing Systems 25 (NIPS), pp. 3221-3229, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00747005

S. Gelly, Y. Wang, R. Munos, and O. Teytaud, Modification of UCT with Patterns in Monte-Carlo Go, Technical Report, INRIA, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266

E. A. Hansen and S. Zilberstein, A heuristic search algorithm for Markov decision problems, Proceedings of the Bar-Ilan Symposium on the Foundation of Artificial Intelligence, pp. 23-25.

J.-F. Hren and R. Munos, Optimistic Planning of Deterministic Systems, Recent Advances in Reinforcement Learning: European Workshop on Reinforcement Learning (EWRL 2008), pp. 151-164, 2008.
DOI : 10.1007/978-3-540-89722-4_12
URL : https://hal.archives-ouvertes.fr/hal-00830182

M. Kearns, Y. Mansour, and A. Y. Ng, A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Machine Learning, vol. 49, no. 2-3, pp. 193-208, 2002.

G. Kedenburg, R. Fonteneau, and R. Munos, Aggregating optimistic planning trees for solving Markov decision processes, Advances in Neural Information Processing Systems 26 (NIPS), pp. 2382-2390, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00923681

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proceedings of the European Conference on Machine Learning (ECML 2006), LNCS 4212, pp. 282-293, 2006.
DOI : 10.1007/11871842_29

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol. 6, no. 1, pp. 4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

R. Munos, From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Foundations and Trends in Machine Learning, vol. 7, no. 1, pp. 1-129, 2014.
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575

N. J. Nilsson, Principles of Artificial Intelligence, 1980.
DOI : 10.1007/978-3-662-09438-9

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, 1994.

T. J. Walsh, S. Goschin, and M. L. Littman, Integrating sample-based planning and model-based reinforcement learning, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10), pp. 612-617, 2010.