J. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits, proceedings of the Annual Conference on Learning Theory (COLT), 2009.
URL : https://hal.archives-ouvertes.fr/hal-00834882

P. Auer, N. Cesa-bianchi, Y. Freund, and R. E. Schapire, Gambling in a rigged casino: The adversarial multi-armed bandit problem, Proceedings of IEEE 36th Annual Foundations of Computer Science, pp.322-331, 1995.
DOI : 10.1109/SFCS.1995.492488

B. Bouzy and M. Métivier, Multi-agent learning experiments on repeated matrix games, ICML, pp.119-126, 2010.

R. Coulom, Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proceedings of the 5th International Conference on Computers and Games, pp.72-83, 2006.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992

M. D. Grigoriadis and L. G. Khachiyan, A sublinear-time randomized approximation algorithm for matrix games, Operations Research Letters, vol.18, issue.2, pp.53-58, 1995.
DOI : 10.1016/0167-6377(95)00032-0

L. Kocsis and C. Szepesvari, Bandit Based Monte-Carlo Planning, 15th European Conference on Machine Learning (ECML), pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

T. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
URL : http://doi.org/10.1016/0196-8858(85)90002-8

C. Lee, M. Wang, G. Chaslot, J. Hoock, A. Rimmel et al., The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments, IEEE Transactions on Computational Intelligence and AI in games, 2009.

O. Madani, S. Hanks, and A. Condon, On the undecidability of probabilistic planning and related stochastic optimization problems, Artificial Intelligence, vol.147, issue.1-2, pp.5-34, 2003.
DOI : 10.1016/S0004-3702(02)00378-8

M. Mundhenk, J. Goldsmith, C. Lusena, and E. Allender, Complexity of finite-horizon Markov decision process problems, Journal of the ACM, vol.47, issue.4, pp.681-720, 2000.
DOI : 10.1145/347476.347480

C. H. Papadimitriou and J. N. Tsitsiklis, The Complexity of Markov Decision Processes, Mathematics of Operations Research, vol.12, issue.3, pp.441-450, 1987.
DOI : 10.1287/moor.12.3.441

J. Rintanen, Complexity of Planning with Partial Observability, Proceedings of ICAPS'03 Workshop on Planning under Uncertainty and Incomplete Information, 2003.

O. Teytaud, Decidability and complexity in partially observable antagonist coevolution, Proceedings of Dagstuhl's seminar 10361, 2010.