Use of variance estimation in the multi-armed bandit problem, NIPS 2006 Workshop on On-line Trading of Exploration and Exploitation, 2006. ,
Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.23, pp.235-256, 2002. ,
Adaptive and self-confident on-line learning algorithms, Machine Learning Journal, 2001. ,
Bandit problems with infinitely many arms, Ann. Statist, vol.25, issue.5, pp.2103-2116, 1997. ,
Prediction, learning, and games, 2006. ,
Robbing the bandit: less regret in online geometric optimization against an adaptive adversary, SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, pp.937-943, 2006. ,
A probabilistic Theory of Pattern Recognition, 1997. ,
Multiarmed bandits, dynamic environments and meta-bandits, NIPS Workshop " online trading of exploration and exploitation, 2006. ,
Exploration vs. exploitation challenge, 2006. ,
Reduced-variance payoff estimation in adversarial bandit problems, Proceedings of the ECML-2005 Workshop on Reinforcement Learning in Non-Stationary Environments. CAp, 2005. ,
Bandit-based monte-carlo planning, p.6, 2006. ,
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, pp.4-22, 1985. ,
Modification of uct with patterns in monte-carlo go, Proceedings of ADPRL'07, 2007. ,