A POMDP extension with belief-dependent rewards, Advances in Neural Information Processing Systems 23 (NIPS-10), 2010.
A Bayesian sampling approach to exploration in reinforcement learning, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI'09), 2009.
The theory of dynamic programming, Bulletin of the American Mathematical Society, vol.60, issue.6, pp.503-516, 1954.
DOI : 10.1090/S0002-9904-1954-09848-8
R-max - a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, vol.3, pp.213-231, 2003.
An intrinsic reward mechanism for efficient exploration, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pp.833-840, 2006.
Tree Exploration for Bayesian RL Exploration, 2008 International Conference on Computational Intelligence for Modelling Control & Automation, pp.1029-1034, 2008.
DOI : 10.1109/CIMCA.2008.32
URL : http://arxiv.org/abs/0902.0392
Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes, 2002.
Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society, vol.41, issue.2, pp.148-177, 1979.
DOI : 10.1002/9780470980033
Active Learning of Dynamic Bayesian Networks in Markov Decision Processes, Proceedings of the 7th International Conference on Abstraction, Reformulation, and Approximation, pp.273-284, 2007.
DOI : 10.1007/978-3-540-73580-9_22
Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553441
Policy invariance under reward transformations: Theory and application to reward shaping, Proceedings of the Sixteenth International Conference on Machine Learning, pp.278-287, 1999.
An analytic solution to discrete Bayesian reinforcement learning, Proceedings of the 23rd International Conference on Machine Learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143932
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887
Probabilistic distance measures of the Dirichlet and Beta distributions, Pattern Recognition, vol.41, issue.2, pp.637-645, 2008.
DOI : 10.1016/j.patcog.2007.06.023
Coastal navigation with mobile robots, Advances in Neural Information Processing Systems 12, pp.1043-1049, 1999.
Variance-based rewards for approximate Bayesian reinforcement learning, Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 2010.
A Bayesian framework for reinforcement learning, Proceedings of the International Conference on Machine Learning (ICML'00), pp.943-950, 2000.
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
Reinforcement learning algorithms for MDPs - a survey, 2009.