A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption, Algorithmic Learning Theory, 2019.
URL: https://hal.archives-ouvertes.fr/hal-01885368
Neuro-dynamic programming, Athena Scientific, 1996.
Open-loop optimistic planning, Conference on Learning Theory, 2010.
URL: https://hal.archives-ouvertes.fr/hal-00943119
X-armed bandits, Journal of Machine Learning Research, vol.12, pp.1587-1627, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00450235
Optimistic planning for Markov decision processes, International Conference on Artificial Intelligence and Statistics, 2012.
Tight (lower) bounds for the fixed budget best-arm identification bandit problem, Conference on Learning Theory, 2016.
Bandit algorithms for tree search, Uncertainty in Artificial Intelligence, 2007.
URL: https://hal.archives-ouvertes.fr/inria-00150207
Efficient selectivity and backup operators in Monte-Carlo tree search, Computers and Games, vol.4630, pp.72-83, 2007.
URL: https://hal.archives-ouvertes.fr/inria-00116992
Simple regret optimization in online planning for Markov decision processes, Journal of Artificial Intelligence Research, 2014.
Best-arm identification: A unified approach to fixed budget and fixed confidence, Neural Information Processing Systems, 2012.
Modification of UCT with patterns in Monte-Carlo Go, 2006.
URL: https://hal.archives-ouvertes.fr/inria-00117266
Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Neural Information Processing Systems, 2016.
URL: https://hal.archives-ouvertes.fr/hal-01389107
Inequalities on the Lambert W function and hyperpower function, Journal of Inequalities in Pure and Applied Mathematics, vol.9, issue.2, pp.5-9, 2008.
Optimistic planning of deterministic systems, European Workshop on Reinforcement Learning, 2008.
URL: https://hal.archives-ouvertes.fr/hal-00830182
Monte-Carlo tree search by best-arm identification, Neural Information Processing Systems, 2017.
URL: https://hal.archives-ouvertes.fr/hal-01535907
Bandit-based Monte-Carlo planning, European Conference on Machine Learning, 2006.
Optimistic optimization of deterministic functions without the knowledge of its smoothness, Neural Information Processing Systems, 2011.
URL: https://hal.archives-ouvertes.fr/hal-00830143
From bandits to Monte-Carlo tree search: The optimistic principle applied to optimization and planning, Foundations and Trends in Machine Learning, vol.7, pp.1-130, 2014.
URL: https://hal.archives-ouvertes.fr/hal-00747575
Scale-free online learning, Theoretical Computer Science, 2018.
Training deep networks without learning rates through coin betting, Neural Information Processing Systems, 2017.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
Normalized online learning, Uncertainty in Artificial Intelligence, 2013.
On reinforcement learning using Monte-Carlo tree search with supervised learning: Non-asymptotic analysis, 2019.
Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.
Optimistic planning in Markov decision processes using a generative model, Neural Information Processing Systems, 2014.
Stochastic simultaneous optimistic optimization, International Conference on Machine Learning, 2013.
URL: https://hal.archives-ouvertes.fr/hal-00789606