Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352
A distributional perspective on reinforcement learning, arXiv preprint, 2017.
Open loop optimistic planning, COLT, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00943119
The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems, pp.1-18, 2000.
Fuego - An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search, IEEE Transactions on Computational Intelligence and AI in Games, vol.2, issue.4, pp.259-270, 2010.
DOI : 10.1109/TCIAIG.2010.2083662
, , 2011.
Planning and acting in partially observable stochastic domains, Artificial Intelligence, vol.101, issue.1-2, pp.99-134, 1998.
DOI : 10.1016/S0004-3702(98)00023-X
Trial-based heuristic tree search for finite horizon MDPs, ICAPS, 2013.
Bandit Based Monte-Carlo Planning, ECML, pp.282-293, 2006.
DOI : 10.1007/11871842_29
Monte Carlo tree search: Long-term versus short-term planning, 2012 IEEE Conference on Computational Intelligence and Games (CIG), pp.219-226, 2012.
The physical travelling salesman problem: WCCI 2012 competition, 2012 IEEE Congress on Evolutionary Computation, pp.1-8, 2012.
DOI : 10.1109/CEC.2012.6256440
Markov decision processes: Discrete stochastic dynamic programming, 2014.
On the locality of action domination in sequential decision making, ISAIM, 2010.
Monte-Carlo planning in large POMDPs, Advances in Neural Information Processing Systems, pp.2164-2172, 2010.
Sample-based learning and search with permanent and transient memories, Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp.968-975, 2008.
DOI : 10.1145/1390156.1390278
Mastering the game of Go with deep neural networks and tree search, Nature, vol.529, issue.7587, pp.484-489, 2016.
DOI : 10.1038/nature16961
Dyna, an integrated architecture for learning, planning, and reacting, ACM SIGART Bulletin, vol.2, issue.4, pp.160-163, 1991.
Reinforcement learning: An introduction, 1998.
Bandit-based planning and learning in continuous-action Markov decision processes, ICAPS, 2012.