Regret bounds for the adaptive control of linear quadratic systems, Proceedings of the 24th Annual Conference on Learning Theory, vol.19, pp.9-11, 2011. ,
Improved Algorithms for Linear Stochastic Bandits, Advances in Neural Information Processing Systems (NIPS), pp.2312-2320, 2011. ,
Minimax Regret Bounds for Reinforcement Learning, 2017. ,
Learning to act using real-time dynamic programming, Artificial intelligence, vol.72, issue.1-2, pp.81-138, 1995. ,
Concentration inequalities: A nonasymptotic theory of independence, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00794821
Optimal adaptive policies for Markov decision processes, Mathematics of Operations Research, vol.22, issue.1, pp.222-255, 1997. ,
Online learning in kernelized markov decision processes, of Proceedings of Machine Learning Research, vol.89, pp.16-18, 2019. ,
Minimal exploration in structured stochastic bandits, NIPS, pp.1763-1771, 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-02395029
Tight regret bounds for model-based reinforcement learning with greedy policies, Advances in Neural Information Processing Systems, pp.12203-12213, 2019. ,
Efficient Regression in Metric Spaces via Approximate Lipschitz Extension, IEEE Transactions on Information Theory, vol.63, issue.8, pp.4838-4849, 2017. ,
Near-optimal Regret Bounds for Reinforcement Learning, Journal of Machine Learning Research, vol.99, pp.1563-1600, 2010. ,
, Is Q-learning Provably Efficient? (NeurIPS), 2018.
Provably Efficient Reinforcement Learning with Linear Function Approximation, pp.1-28, 2019. ,
, Metric State Spaces. Icml, pp.306-312, 2003.
Near-optimal reinforcement learning in polynomial time, Machine learning, vol.49, issue.2-3, pp.209-232, 2002. ,
Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning, Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015. ,
URL : https://hal.archives-ouvertes.fr/hal-01165966
The samplecomplexity of general reinforcement learning, ICML (3), vol.28, pp.28-36, 2013. ,
Exploration in structured reinforcement learning, NeurIPS, pp.8888-8896, 2018. ,
Kernel-based reinforcement learning, Machine Learning, vol.49, pp.161-178, 2002. ,
Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, 2013. ,
URL : https://hal.archives-ouvertes.fr/hal-00765441
more) efficient reinforcement learning via posterior sampling, Advances in Neural Information Processing Systems, pp.3003-3011, 2013. ,
PAC optimal exploration in continuous space markov decision processes, AAAI, 2013. ,
, Self-normalized processes: Limit theory and Statistical Applications, 2008.
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994. ,