R. Bellman, Dynamic Programming, 1957.

D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, Online optimization in X-armed bandits, Advances in Neural Information Processing Systems 22, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00329797

G. Chaslot, M. Winands, J. Uiterwijk, H. van den Herik, and B. Bouzy, Progressive Strategies for Monte-Carlo Tree Search, Proceedings of the 10th Joint Conference on Information Sciences, pp. 655-661, 2007.

A. Couëtoux, J. Hoock, N. Sokolovska, O. Teytaud, and N. Bonnard, Continuous Upper Confidence Trees, LION'11: Proceedings of the 5th International Conference on Learning and Intelligent OptimizatioN, pages TBA, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00835352

R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992

R. Coulom, Computing Elo ratings of move patterns in the game of Go, Computer Games Workshop, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00149859

S. Gelly and D. Silver, Combining online and offline knowledge in UCT, Proceedings of the 24th International Conference on Machine Learning, ICML '07, pp. 273-280, 2007.
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003