Online learning in Markov decision processes with adversarially chosen transition probability distributions, Advances in Neural Information Processing Systems 26, pp. 2508–2516, 2013.
Improved Rates for the Stochastic Continuum-Armed Bandit Problem, COLT, pp. 454–468, 2007.
DOI : 10.1007/978-3-540-72927-3_33
Regret Bounds for Reinforcement Learning with Policy Advice, ECML/PKDD, pp. 97–112, 2013.
DOI : 10.1007/978-3-642-40988-2_7
URL : https://hal.archives-ouvertes.fr/hal-00924021
Reinforcement learning in POMDPs via direct gradient ascent, ICML, pp. 41–48, 2000.
X-armed bandits, Journal of Machine Learning Research, vol. 12, pp. 1655–1695, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00450235
Lipschitz Bandits without the Lipschitz Constant, ALT, pp. 144–158, 2011.
DOI : 10.1007/978-3-642-24412-4_14
URL : https://hal.archives-ouvertes.fr/hal-00595692
Adaptive-tree bandits, arXiv preprint arXiv:1302, 2013.
DOI : 10.3150/14-bej644
Regret and Convergence Bounds for a Class of Continuum-Armed Bandit Problems, IEEE Transactions on Automatic Control, vol. 54, issue 6, pp. 1243–1253, 2009.
DOI : 10.1109/TAC.2009.2019797
High dimensional Gaussian process bandits, Neural Information Processing Systems (NIPS), 2013.
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol. 11, pp. 1563–1600, 2010.
Multi-armed bandits in metric spaces, Proceedings of the Fortieth Annual ACM Symposium on Theory of Computing, STOC '08, pp. 681–690, 2008.
DOI : 10.1145/1374376.1374475
Bandits and experts in metric spaces, 2013.
Policy search for motor primitives in robotics, Machine Learning, pp. 171–203, 2011.
The sample-complexity of general reinforcement learning, Proceedings of the Thirtieth International Conference on Machine Learning (ICML), 2013.
Markov chains and mixing times, 2006.
DOI : 10.1090/mbk/058
Empirical Bernstein bounds and sample variance penalization, arXiv preprint, 2009.
Optimistic optimization of a deterministic function without the knowledge of its smoothness, NIPS, pp. 783–791, 2011.
From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Machine Learning, 2013.
DOI : 10.1561/2200000038
URL : https://hal.archives-ouvertes.fr/hal-00747575
Online regret bounds for undiscounted continuous reinforcement learning, Advances in Neural Information Processing Systems 25, pp. 1772–1780, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00765441
Policy search: Any local optimum enjoys a global performance guarantee, arXiv preprint, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00829548
Contextual bandits with similarity information, CoRR, abs/0907, 2009.
Multi-armed bandits on implicit metric spaces, Advances in Neural Information Processing Systems, pp. 1602–1610, 2011.