Y. Abbasi-Yadkori and C. Szepesvári, Regret bounds for the adaptive control of linear quadratic systems, Proceedings of the 24th Annual Conference on Learning Theory (COLT), vol. 19, pp. 1-26, 2011.

Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári, Improved Algorithms for Linear Stochastic Bandits, Advances in Neural Information Processing Systems (NIPS), pp. 2312-2320, 2011.

M. G. Azar, I. Osband, and R. Munos, Minimax Regret Bounds for Reinforcement Learning, Proceedings of the 34th International Conference on Machine Learning (ICML), vol. 70, pp. 263-272, 2017.

A. G. Barto, S. J. Bradtke, and S. P. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence, vol. 72, no. 1-2, pp. 81-138, 1995.

S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence, Oxford University Press, 2013. URL: https://hal.archives-ouvertes.fr/hal-00794821

A. N. Burnetas and M. N. Katehakis, Optimal adaptive policies for Markov decision processes, Mathematics of Operations Research, vol. 22, no. 1, pp. 222-255, 1997.

S. R. Chowdhury and A. Gopalan, Online learning in kernelized Markov decision processes, Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, vol. 89, pp. 3197-3205, 2019.

R. Combes, S. Magureanu, and A. Proutière, Minimal exploration in structured stochastic bandits, Advances in Neural Information Processing Systems (NIPS), pp. 1763-1771, 2017. URL: https://hal.archives-ouvertes.fr/hal-02395029

Y. Efroni, N. Merlis, M. Ghavamzadeh, and S. Mannor, Tight regret bounds for model-based reinforcement learning with greedy policies, Advances in Neural Information Processing Systems (NeurIPS), pp. 12203-12213, 2019.

L. A. Gottlieb, A. Kontorovich, and R. Krauthgamer, Efficient Regression in Metric Spaces via Approximate Lipschitz Extension, IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4838-4849, 2017.

T. Jaksch, R. Ortner, and P. Auer, Near-optimal Regret Bounds for Reinforcement Learning, Journal of Machine Learning Research, vol. 11, pp. 1563-1600, 2010.

C. Jin, Z. Allen-Zhu, S. Bubeck, and M. I. Jordan, Is Q-learning Provably Efficient?, Advances in Neural Information Processing Systems (NeurIPS), 2018.

C. Jin, Z. Yang, Z. Wang, and M. I. Jordan, Provably Efficient Reinforcement Learning with Linear Function Approximation, arXiv preprint arXiv:1907.05388, pp. 1-28, 2019.

S. M. Kakade, M. J. Kearns, and J. Langford, Exploration in Metric State Spaces, Proceedings of the 20th International Conference on Machine Learning (ICML), pp. 306-312, 2003.

M. Kearns and S. Singh, Near-optimal reinforcement learning in polynomial time, Machine Learning, vol. 49, no. 2-3, pp. 209-232, 2002.

K. Lakshmanan, R. Ortner, and D. Ryabko, Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning, Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015. URL: https://hal.archives-ouvertes.fr/hal-01165966

T. Lattimore, M. Hutter, and P. Sunehag, The sample-complexity of general reinforcement learning, Proceedings of the 30th International Conference on Machine Learning (ICML), vol. 28, pp. 28-36, 2013.

J. Ok, A. Proutière, and D. Tranos, Exploration in structured reinforcement learning, Advances in Neural Information Processing Systems (NeurIPS), pp. 8888-8896, 2018.

D. Ormoneit and Ś. Sen, Kernel-based reinforcement learning, Machine Learning, vol. 49, pp. 161-178, 2002.

R. Ortner and D. Ryabko, Online Regret Bounds for Undiscounted Continuous Reinforcement Learning, Advances in Neural Information Processing Systems (NIPS), 2012. URL: https://hal.archives-ouvertes.fr/hal-00765441

I. Osband, D. Russo, and B. Van Roy, (More) Efficient Reinforcement Learning via Posterior Sampling, Advances in Neural Information Processing Systems (NIPS), pp. 3003-3011, 2013.

J. Pazis and R. Parr, PAC optimal exploration in continuous space Markov decision processes, Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2013.

V. H. de la Peña, T. L. Lai, and Q.-M. Shao, Self-Normalized Processes: Limit Theory and Statistical Applications, Springer, 2008.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.