R. Agrawal, The Continuum-Armed Bandit Problem, SIAM Journal on Control and Optimization, vol.33, issue.6, pp.1926-1951, 1995.
DOI : 10.1137/S0363012992237273

P. Auer, N. Cesa-Bianchi, and C. Gentile, Adaptive and Self-Confident On-Line Learning Algorithms, Journal of Computer and System Sciences, vol.64, issue.1, 2001.
DOI : 10.1006/jcss.2001.1795

A. Barto, S. Bradtke, and S. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence, vol.72, issue.1-2, 1993.
DOI : 10.1016/0004-3702(94)00011-O

R. Bellman, Dynamic Programming, 1957.

D. A. Berry, R. W. Chen, A. Zame, D. C. Heath, and L. A. Shepp, Bandit problems with infinitely many arms, The Annals of Statistics, vol.25, issue.5, pp.2103-2116, 1997.
DOI : 10.1214/aos/1069362389

D. Bertsekas, Dynamic Programming and Optimal Control, vols I and II, 1995.

B. Bruegmann, Monte Carlo Go, 1993.

T. Cazenave and B. Helmstetter, Combining tactical search and Monte-Carlo in the game of Go, IEEE CIG, pp.171-175, 2005.

P. Coquelin and R. Munos, Bandit algorithms for tree search, Proceedings of UAI'07, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proceedings of the 5th International Conference on Computers and Games, 2006.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992

R. Coulom, Computing Elo ratings of move patterns in the game of Go, Computer Games Workshop, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00149859

V. Dani and T. P. Hayes, Robbing the bandit, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms, SODA '06, pp.937-943, 2006.
DOI : 10.1145/1109557.1109660

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003

Z. Hussain, P. Auer, N. Cesa-Bianchi, L. Newnham, and J. Shawe-Taylor, Exploration vs. exploitation challenge, 2006.

L. Kocsis and C. Szepesvari, Reduced-variance payoff estimation in adversarial bandit problems, Proceedings of the ECML-2005 Workshop on Reinforcement Learning in Non-Stationary Environments, 2005.

L. Kocsis and C. Szepesvari, Bandit-based Monte-Carlo planning, p.6, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

L. Kocsis and C. Szepesvari, Discounted-UCB, 2nd Pascal-Challenge Workshop, 2006.

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

W. Powell, Approximate Dynamic Programming, 2007.

Y. Wang and S. Gelly, Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007 IEEE Symposium on Computational Intelligence and Games, pp.175-182, 2007.
DOI : 10.1109/CIG.2007.368095