Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002. ,
DOI : 10.1023/A:1013689704352
Dynamic Programming and Optimal Control, Athena Scientific, 2001. ,
Open loop optimistic planning, Conference on Learning Theory, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00943119
Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Machine Learning, pp.1-122, 2012. ,
Optimistic planning for markov decision processes, Proceedings 15th International Conference on Artificial Intelligence and Statistics (AISTATS-12), pp.182-189, 2012. ,
Model Predictive Control, 2004. ,
URL : https://hal.archives-ouvertes.fr/hal-00683813
Prediction, Learning, and Games, 2006. ,
DOI : 10.1017/CBO9780511546921
Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Proceedings Computers and Games, 2006. ,
DOI : 10.1007/978-3-540-75538-8_7
URL : https://hal.archives-ouvertes.fr/inria-00116992
Action elimination and stopping conditions for reinforcement learning, Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), pp.162-169, 2003. ,
Best arm identification: A unified approach to fixed budget and fixed confidence, NIPS, pp.3221-3229, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00747005
Modification of UCT with Patterns in Monte-Carlo Go, 2006. ,
URL : https://hal.archives-ouvertes.fr/inria-00117266
A heuristic search algorithm for Markov decision problems, Proceedings Bar-Ilan Symposium on the Foundation of Artificial Intelligence, pp.23-25 ,
Optimistic Planning of Deterministic Systems, Recent Advances in Reinforcement Learning European Workshop on Reinforcement Learning, pp.151-164, 2008. ,
DOI : 10.1007/978-3-540-89722-4_12
URL : https://hal.archives-ouvertes.fr/hal-00830182
A sparse sampling algorithm for near-optimal planning in large Markovian decision processes, Machine Learning, pp.193-208, 2002. ,
Aggregating optimistic planning trees for solving markov decision processes, Advances in Neural Information Processing Systems 26, pp.2382-2390, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00923681
Bandit Based Monte-Carlo Planning, In: ECML-06. Number 4212 in LNCS, pp.282-293, 2006. ,
DOI : 10.1007/11871842_29
Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985. ,
DOI : 10.1016/0196-8858(85)90002-8
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Machine Learning, pp.1-129, 2014. ,
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575
Principles of Artificial Intelligence, 1980. ,
DOI : 10.1007/978-3-662-09438-9
Markov Decision Processes ? Discrete Stochastic Dynamic Programming, 1994. ,
Integrating sample-based planning and model-based reinforcement learning, Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, pp.612-617, 2010. ,