P. Auer, R. Ortner, and C. Szepesvári, Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 20th Conference on Learning Theory, pp.454-468, 2007.
DOI : 10.1007/978-3-540-72927-3_33

E. Cope, Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces, 2004.

P. Coquelin and R. Munos, Bandit algorithms for tree search, Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

J. L. Doob, Stochastic Processes, 1953.

S. Gelly, Y. Wang, R. Munos, and O. Teytaud, Modification of UCT with patterns in Monte-Carlo Go, 2006.
URL : https://hal.archives-ouvertes.fr/inria-00117266

R. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, 18th Advances in Neural Information Processing Systems, 2004.

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandits in metric spaces, Proceedings of the fortieth annual ACM symposium on Theory of computing, STOC '08, 2008.
DOI : 10.1145/1374376.1374475

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proceedings of the 15th European Conference on Machine Learning, pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296