or perhaps other myopic techniques could produce better results. In particular, applying optimistic techniques to the value function, such as BOSS, does not have the same impact on belief-dependent rewards, because these rewards evolve during execution.
A POMDP extension with belief-dependent rewards, Advances in Neural Information Processing Systems, 2010.
A Bayesian sampling approach to exploration in reinforcement learning, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI'09), 2009.
The theory of dynamic programming, Bulletin of the American Mathematical Society, vol.60, issue.6, pp.503-516, 1954.
DOI : 10.1090/S0002-9904-1954-09848-8
R-max - a general polynomial time algorithm for near-optimal reinforcement learning, Journal of Machine Learning Research, vol.3, pp.213-231, 2003.
An intrinsic reward mechanism for efficient exploration, Proceedings of the 23rd international conference on Machine learning, ICML '06, pp.833-840, 2006.
DOI : 10.1145/1143844.1143949
Tree Exploration for Bayesian RL Exploration, 2008 International Conference on Computational Intelligence for Modelling Control & Automation, pp.1029-1034, 2008.
DOI : 10.1109/CIMCA.2008.32
URL : http://arxiv.org/abs/0902.0392
Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes, 2002.
Bandit processes and dynamic allocation indices, Journal of the Royal Statistical Society, vol.41, issue.2, pp.148-177, 1979.
DOI : 10.1002/9780470980033
Active Learning of Dynamic Bayesian Networks in Markov Decision Processes, Proceedings of the 7th International Conference on Abstraction, Reformulation, and Approximation, SARA'07, pp.273-284, 2007.
DOI : 10.1007/978-3-540-73580-9_22
Near-Bayesian exploration in polynomial time, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009.
DOI : 10.1145/1553374.1553441
Policy invariance under reward transformations: Theory and application to reward shaping, Proceedings of the Sixteenth International Conference on Machine Learning, pp.278-287, 1999.
An analytic solution to discrete Bayesian reinforcement learning, Proceedings of the 23rd international conference on Machine learning, ICML '06, 2006.
DOI : 10.1145/1143844.1143932
Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
DOI : 10.1002/9780470316887
Probabilistic distance measures of the Dirichlet and Beta distributions, Pattern Recognition, vol.41, issue.2, pp.637-645, 2008.
DOI : 10.1016/j.patcog.2007.06.023
Coastal navigation with mobile robots, Advances in Neural Information Processing Systems 12, pp.1043-1049, 1999.
Variance-based rewards for approximate Bayesian reinforcement learning, Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, 2010.
A Bayesian framework for reinforcement learning, Proceedings of the International Conference on Machine Learning (ICML'00), pp.943-950, 2000.
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
Reinforcement Learning Algorithms for MDPs - A Survey, 2009.