The Continuum-Armed Bandit Problem, SIAM Journal on Control and Optimization, vol.33, issue.6, pp.1926-1951, 1995.
DOI : 10.1137/S0363012992237273
Adaptive and Self-Confident On-Line Learning Algorithms, Journal of Computer and System Sciences, vol.64, issue.1, 2001.
DOI : 10.1006/jcss.2001.1795
Learning to act using real-time dynamic programming, Artificial Intelligence, vol.72, issue.1-2, 1993.
DOI : 10.1016/0004-3702(94)00011-O
Dynamic Programming, 1957.
Bandit problems with infinitely many arms, The Annals of Statistics, vol.25, issue.5, pp.2103-2116, 1997.
DOI : 10.1214/aos/1069362389
Dynamic Programming and Optimal Control, vols I and II, 1995.
Monte Carlo Go, 1993.
Combining tactical search and Monte-Carlo in the game of Go, IEEE CIG, pp.171-175, 2005.
Bandit algorithms for tree search, Proceedings of UAI'07, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proceedings of the 5th International Conference on Computers and Games, 2006.
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992
Computing Elo ratings of move patterns in the game of Go, Computer Games Workshop, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00149859
Robbing the bandit, Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm, SODA '06, pp.937-943, 2006.
DOI : 10.1145/1109557.1109660
Combining online and offline knowledge in UCT, Proceedings of the 24th international conference on Machine learning, ICML '07, pp.273-280, 2007.
DOI : 10.1145/1273496.1273531
URL : https://hal.archives-ouvertes.fr/inria-00164003
Exploration vs. exploitation challenge, 2006.
Reduced-variance payoff estimation in adversarial bandit problems, Proceedings of the ECML-2005 Workshop on Reinforcement Learning in Non-Stationary Environments, 2005.
Bandit-based Monte-Carlo planning, p.6, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296
Discounted-UCB, 2nd Pascal-Challenge Workshop, 2006.
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8
Approximate Dynamic Programming, 2007.
Modifications of UCT and sequence-like simulations for Monte-Carlo Go, 2007 IEEE Symposium on Computational Intelligence and Games, pp.175-182, 2007.
DOI : 10.1109/CIG.2007.368095