P. L. Bartlett and A. Tewari, REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, UAI, pp.35-42, 2009.

E. Brunskill, Bayes-optimal reinforcement learning for discrete uncertainty domains, Abstract. Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012.

N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth, How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997.
DOI : 10.1145/258128.258179

C. Diuk, L. Li, and B. R. Leffler, The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning, ICML, 2009.

K. Dyagilev, S. Mannor, and N. Shimkin, Efficient Reinforcement Learning in Parameterized Models: Discrete Parameter Case, European Workshop on Reinforcement Learning, 2008.
DOI : 10.1007/978-3-540-89722-4_4

F. Fernández and M. M. Veloso, Probabilistic policy reuse in a reinforcement learning agent, Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems, AAMAS '06, pp.720-727, 2006.
DOI : 10.1145/1160633.1160762

T. Jaksch, R. Ortner, and P. Auer, Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010.

O. Maillard, P. Nguyen, R. Ortner, and D. Ryabko, Optimal regret bounds for selecting the state representation in reinforcement learning, ICML, pp.543-551, 2013.

R. Ortner, D. Ryabko, P. Auer, and R. Munos, Regret bounds for restless Markov bandits, ALT, pp.214-228, 2012.
DOI : 10.1007/978-3-642-34106-9_19

URL : https://hal.archives-ouvertes.fr/hal-00765450

P. Poupart, N. Vlassis, J. Hoey, and K. Regan, An analytic solution to discrete Bayesian reinforcement learning, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143932

D. Pucci de Farias and N. Megiddo, Exploration-exploitation tradeoffs for experts algorithms in reactive environments, Advances in Neural Information Processing Systems 17, pp.409-416, 2004.

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

E. Talvitie and S. Singh, An experts algorithm for transfer learning, IJCAI, 2007.

C. Tekin and M. Liu, Online Learning of Rested and Restless Bandits, IEEE Transactions on Information Theory, vol.58, issue.8, pp.5588-5611, 2012.
DOI : 10.1109/TIT.2012.2198613