P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol. 47, no. 2-3, pp. 235-256, 2002.
DOI: 10.1023/A:1013689704352

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

P.-A. Coquelin and R. Munos, Bandit algorithms for tree search, in Uncertainty in Artificial Intelligence (UAI), 2007.

S. Gelly, Y. Wang, R. Munos, and O. Teytaud, Modification of UCT with patterns in Monte-Carlo Go, Technical report, INRIA, 2006.
URL: https://hal.archives-ouvertes.fr/inria-00117266

M. Kearns, Y. Mansour, and A. Y. Ng, A sparse sampling algorithm for near-optimal planning in large Markovian decision processes, Machine Learning, vol. 49, pp. 193-208, 2002.

L. Kocsis and C. Szepesvári, Bandit based Monte-Carlo planning, in European Conference on Machine Learning (ECML), 2006.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol. 6, 1985.

L. Péret and F. Garcia, On-line search for solving large Markov decision processes, in Proceedings of the 16th European Conference on Artificial Intelligence (ECAI), 2004.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, 1994.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol. 58, 1952.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.