J. Asmuth, L. Li, M. L. Littman, A. Nouri, and D. Wingate, A Bayesian sampling approach to exploration in reinforcement learning, Proc. of UAI, 2009.

R. Bellman, The theory of dynamic programming, Bulletin of the American Mathematical Society, vol.60, issue.6, pp.503-516, 1954.
DOI : 10.1090/S0002-9904-1954-09848-8

R. I. Brafman and M. Tennenholtz, R-max -a general polynomial time algorithm for near-optimal reinforcement learning, JMLR, vol.3, pp.213-231, 2003.

C. Dimitrakakis, Tree Exploration for Bayesian RL Exploration, 2008 International Conference on Computational Intelligence for Modelling Control & Automation, 2008.
DOI : 10.1109/CIMCA.2008.32

M. Duff, Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes, 2002.

M. Kearns and S. Singh, Near-optimal reinforcement learning in polynomial time, Machine Learning, pp.260-268, 1998.

J. Kolter and A. Ng, Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553441

P. Poupart, N. Vlassis, J. Hoey, and K. Regan, An analytic solution to discrete Bayesian reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, 2006.
DOI : 10.1145/1143844.1143932

M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887

J. Sorg, S. Singh, and R. Lewis, Variance-based rewards for approximate Bayesian reinforcement learning, Proc. of UAI, 2010.

A. L. Strehl and M. L. Littman, A theoretical analysis of Model-Based Interval Estimation, Proceedings of the 22nd international conference on Machine learning , ICML '05, 2005.
DOI : 10.1145/1102351.1102459

A. L. Strehl, L. Li, and M. L. Littman, Reinforcement learning in finite MDPs: PAC analysis, JMLR, vol.10, pp.2413-2444, 2009.

J. A. Malcolm and . Strens, A Bayesian framework for reinforcement learning, Proc. of ICML, 2000.

R. Sutton and A. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

I. Szita and C. Szepesvári, Model-based reinforcement learning with nearly tight exploration complexity bounds, Proc. of ICML, 2010.

L. G. Valiant, A theory of the learnable, Proc. of STOC, 1984.

T. J. Walsh, I. Szita, C. Diuk, and M. L. Littman, Exploring compact reinforcement-learning representations with linear regression, Proc. of UAI, 2009.