M. Lopes, T. Lang, M. Toussaint, and P. Oudeyer, Exploration in model-based reinforcement learning by empirically estimating learning progress, Neural Information Processing Systems, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00755248

R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, 1998.

R. I. Brafman and M. Tennenholtz, R-max : a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, pp.213-231, 2002.

J. Z. Kolter and A. Y. Ng, Near-Bayesian exploration in polynomial time, Proceedings of the International Conference on Machine Learning (ICML), pp.513-520, 2009.

S. H. Lim and P. Auer, Autonomous exploration for navigating in MDPs, Conference Proceedings, 2012.

J. Schmidhuber, Curious model-building control systems, Proceedings of the 1991 IEEE International Joint Conference on Neural Networks, pp.1458-1463, 1991.
DOI : 10.1109/IJCNN.1991.170605

P. Oudeyer, F. Kaplan, and V. V. Hafner, Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 2007.
DOI : 10.1109/TEVC.2006.890271

A. Baranès and P. Oudeyer, R-IAC: Robust Intrinsically Motivated Exploration and Active Learning, IEEE Transactions on Autonomous Mental Development, 2009.
DOI : 10.1109/TAMD.2009.2037513

P. Auer, N. Cesa-Bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, International Conference on Machine Learning, 2002.

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, European Conference on Machine Learning, 2006.
DOI : 10.1007/11871842_29

A. L. Strehl and M. L. Littman, An analysis of model-based interval estimation for Markov decision processes, Journal of Computer and System Sciences, 2008.

O. Maillard, R. Munos, and G. Stoltz, A finite-time analysis of multiarmed bandits problems with Kullback-Leibler divergence, COLT, 2011.

D. Auger, A. Couëtoux, and O. Teytaud, Continuous Upper Confidence Trees with Polynomial Exploration - Consistency, ECML/PKDD, 2013.
DOI : 10.1007/978-3-642-40988-2_13
URL : https://hal.archives-ouvertes.fr/hal-00835352