J.-Y. Audibert, R. Munos, and C. Szepesvári, Use of variance estimation in the multi-armed bandit problem, NIPS 2006 Workshop on On-line Trading of Exploration and Exploitation, 2006.

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2-3, pp.235-256, 2002.

P. Auer, N. Cesa-Bianchi, and C. Gentile, Adaptive and self-confident on-line learning algorithms, Machine Learning Journal, 2001.

D. Berry, R. Chen, A. Zame, D. Heath, and L. Shepp, Bandit problems with infinitely many arms, Ann. Statist., vol.25, issue.5, pp.2103-2116, 1997.

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, 2006.

V. Dani and T. Hayes, Robbing the bandit: less regret in online geometric optimization against an adaptive adversary, SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.937-943, 2006.

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1997.

C. Hartland, S. Gelly, N. Baskiotis, O. Teytaud, and M. Sebag, Multi-armed bandits, dynamic environments and meta-bandits, NIPS Workshop on On-line Trading of Exploration and Exploitation, 2006.

Z. Hussain, P. Auer, N. Cesa-Bianchi, L. Newnham, and J. Shawe-Taylor, Exploration vs. exploitation challenge, 2006.

L. Kocsis and C. Szepesvári, Reduced-variance payoff estimation in adversarial bandit problems, Proceedings of the ECML-2005 Workshop on Reinforcement Learning in Non-Stationary Environments, 2005.

L. Kocsis and C. Szepesvári, Bandit-based Monte-Carlo planning, p.6, 2006.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, pp.4-22, 1985.

Y. Wang and S. Gelly, Modification of UCT with patterns in Monte-Carlo Go, Proceedings of ADPRL'07, 2007.