P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

D. P. Bertsekas, Neuro-dynamic programming, Encyclopedia of Optimization, pp.2555-2560, 2009.
DOI : 10.1007/0-306-48332-7_333

G. Chaslot, C. Fiter, J. Hoock, A. Rimmel, and O. Teytaud, Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search, Advances in Computer Games, 2009.
DOI : 10.1007/978-3-642-12993-3_1

URL : https://hal.archives-ouvertes.fr/inria-00386477

F. de Mesmay, A. Rimmel, Y. Voronenko, and M. Püschel, Bandit-based optimization on graphs with application to library performance tuning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553468

URL : https://hal.archives-ouvertes.fr/inria-00379523

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, European Conference on Machine Learning, pp.282-293, 2006.
DOI : 10.1007/11871842_29

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

J. Pearl, Heuristics. Intelligent search strategies for computer problem solving, 1984.

P. Rolet, M. Sebag, and O. Teytaud, Optimal active learning through billiards and upper confidence trees in continuous domains, Proceedings of the European Conference on Machine Learning, 2009.

E. A. Sherstov and P. Stone, Function Approximation via Tile Coding: Automating Parameter Choice, Lecture Notes in Artificial Intelligence, pp.194-205, 2005.
DOI : 10.1007/11527862_14

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.78.6631

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, pp.9-44, 1988.
DOI : 10.1007/BF00115009

F. Teytaud and O. Teytaud, Creating an Upper-Confidence-Tree Program for Havannah, Advances in Computer Games 12, 2009.
DOI : 10.1007/978-3-642-12993-3_7

URL : https://hal.archives-ouvertes.fr/inria-00380539