A. Antos, R. Munos, and C. Szepesvári, Fitted Q-iteration in continuous action-space MDPs, Proceedings of the 21st Annual Conference on Neural Information Processing Systems, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00185311

P. L. Bartlett and A. Tewari, REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009.

R. Bellman, Dynamic Programming, 1957.

D. P. Bertsekas, Dynamic Programming and Optimal Control, volume I, Athena Scientific, 2007.

D. P. Bertsekas, Dynamic Programming and Optimal Control, volume II, Athena Scientific, 2007.

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.

N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games, 2006.
DOI : 10.1017/CBO9780511546921

E. Even-Dar and Y. Mansour, Learning Rates for Q-Learning, Journal of Machine Learning Research, vol.5, pp.1-25, 2003.
DOI : 10.1007/3-540-44581-1_39

E. Even-Dar, S. Mannor, and Y. Mansour, PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 15th Annual Conference on Computational Learning Theory, pp.255-270, 2002.
DOI : 10.1007/3-540-45435-7_18

W. Feller, An Introduction to Probability Theory and Its Applications, 1968.

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.

M. Kearns and S. Singh, Finite-sample convergence rates for Q-learning and indirect algorithms, Advances in Neural Information Processing Systems 12, pp.996-1002, 1999.

S. Koenig and R. G. Simmons, Complexity analysis of real-time reinforcement learning, Proceedings of the Eleventh National Conference on Artificial Intelligence, 1993.

S. Mannor and J. N. Tsitsiklis, The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004.

R. Munos and C. Szepesvári, Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008.
URL : https://hal.archives-ouvertes.fr/inria-00120882

J. Peng and R. J. Williams, Incremental multi-step Q-learning, Machine Learning, pp.283-290, 1996.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

A. L. Strehl, L. Li, and M. L. Littman, Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research, vol.10, pp.2413-2444, 2009.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

Cs. Szepesvári, The asymptotic convergence-rate of Q-learning, Advances in Neural Information Processing Systems 10, 1997.

Cs. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, 2010.

I. Szita and C. Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds, Proceedings of the 27th International Conference on Machine Learning, pp.1031-1038, 2010.

H. van Hasselt, Double Q-learning, Advances in Neural Information Processing Systems 23, pp.2613-2621, 2010.

C. J. Watkins, Learning from Delayed Rewards, PhD thesis, King's College, Cambridge, 1989.