Fitted Q-iteration in continuous action-space MDPs, Proceedings of the 21st Annual Conference on Neural Information Processing Systems, 2007. ,
URL : https://hal.archives-ouvertes.fr/inria-00185311
REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, 2009. ,
Dynamic Programming, 1957. ,
Dynamic Programming and Optimal Control, volume I, Athena Scientific, 2007. ,
Dynamic Programming and Optimal Control, volume II, Athena Scientific, 2007. ,
Neuro-Dynamic Programming, Athena Scientific, 1996. ,
Prediction, Learning, and Games, 2006. ,
DOI : 10.1017/CBO9780511546921
Learning Rates for Q-Learning, Journal of Machine Learning Research, vol.5, pp.1-25, 2003. ,
DOI : 10.1007/3-540-44581-1_39
PAC Bounds for Multi-armed Bandit and Markov Decision Processes, 15th Annual Conference on Computational Learning Theory, pp.255-270, 2002. ,
DOI : 10.1007/3-540-45435-7_18
An Introduction to Probability Theory and Its Applications, 1968. ,
Near-optimal regret bounds for reinforcement learning, Journal of Machine Learning Research, vol.11, pp.1563-1600, 2010. ,
Finite-sample convergence rates for Q-learning and indirect algorithms, Advances in Neural Information Processing Systems 12, pp.996-1002, 1999. ,
Complexity analysis of real-time reinforcement learning, Proceedings of the Eleventh National Conference on Artificial Intelligence, 1993. ,
The sample complexity of exploration in the multi-armed bandit problem, Journal of Machine Learning Research, vol.5, pp.623-648, 2004. ,
Finite-time bounds for fitted value iteration, Journal of Machine Learning Research, vol.9, pp.815-857, 2008. ,
URL : https://hal.archives-ouvertes.fr/inria-00120882
Incremental multi-step Q-learning, Machine Learning, pp.283-290, 1996. ,
Markov Decision Processes, Discrete Stochastic Dynamic Programming, 1994. ,
Reinforcement learning in finite MDPs: PAC analysis, Journal of Machine Learning Research, vol.10, pp.2413-2444, 2009. ,
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
The asymptotic convergence-rate of Q-learning, Advances in Neural Information Processing Systems 10, 1997. ,
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, 2010. ,
Model-based reinforcement learning with nearly tight exploration complexity bounds, Proceedings of the 27th International Conference on Machine Learning, pp.1031-1038, 2010. ,
Double Q-learning, Advances in Neural Information Processing Systems 23, pp.2613-2621, 2010. ,
Learning from Delayed Rewards, Kings College, 1989. ,