B. Arneson, R. Hayward, and P. Henderson, MoHex wins Hex tournament, ICGA Journal, pp.114-116, 2009.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

R. Bellman, Dynamic Programming, 1957.

A. Couëtoux, J. Hoock, N. Sokolovska, O. Teytaud, and N. Bonnard, Continuous Upper Confidence Trees, LION'11: Proceedings of the 5th International Conference on Learning and Intelligent Optimization, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00835352

R. Coulom, Computing Elo ratings of move patterns in the game of Go, Computer Games Workshop, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00149859

F. de Mesmay, A. Rimmel, Y. Voronenko, and M. Püschel, Bandit-based optimization on graphs with application to library performance tuning, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.729-736, 2009.
DOI : 10.1145/1553374.1553468
URL : https://hal.archives-ouvertes.fr/inria-00379523

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, 15th European Conference on Machine Learning (ECML), pp.282-293, 2006.
DOI : 10.1007/11871842_29

C. Lee, M. Wang, G. Chaslot, J. Hoock, A. Rimmel et al., The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments, IEEE Transactions on Computational Intelligence and AI in Games, 2009.

P. Massé, Les Réserves et la Régulation de l'Avenir dans la vie Economique, 1946.

H. Nakhost and M. Müller, Monte-Carlo exploration for deterministic planning, IJCAI, pp.1766-1771, 2009.

P. Rolet, M. Sebag, and O. Teytaud, Optimal robust expensive optimization is tractable, Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO '09, 2009.
DOI : 10.1145/1569901.1570255
URL : https://hal.archives-ouvertes.fr/inria-00374910

R. Sutton and A. G. Barto, Reinforcement Learning, 1998.
DOI : 10.1007/978-1-4615-3618-5
URL : https://hal.archives-ouvertes.fr/hal-00764281

F. Teytaud and O. Teytaud, Creating an Upper-Confidence-Tree Program for Havannah, Advances in Computer Games (ACG 12), 2009.
DOI : 10.1007/978-3-642-12993-3_7
URL : https://hal.archives-ouvertes.fr/inria-00380539

B. Tuffin, On the use of low discrepancy sequences in Monte Carlo methods, Monte Carlo Methods and Applications, vol.2, issue.4, 1996.
DOI : 10.1515/mcma.1996.2.4.295

Y. Wang, J. Audibert, and R. Munos, Algorithms for infinitely many-armed bandits, Advances in Neural Information Processing Systems, 2008.