PEGASUS: A policy search method for large MDPs and POMDPs, Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, pp.406-415, 2000. ,
Learning from scarce experience, ICML, pp.498-505, 2002. ,
Policy-gradient methods for planning, Advances in Neural Information Processing Systems 18, pp.9-16, 2006. ,
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998. ,
DOI : 10.1109/TNN.1998.712192
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, issue.2, pp.1107-1149, 2003. ,
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, COLT-19, pp.574-588, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path, Machine Learning, pp.16-18, 2007. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2004. ,
Generalization in reinforcement learning: Safely approximating the value function, NIPS-7, pp.369-376, 1995. ,
Stable Function Approximation in Dynamic Programming, Proc. of ICML 20, pp.261-268, 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50040-2
Kernel-based reinforcement learning, Machine Learning, pp.161-178, 2002. ,
Neural Fitted Q Iteration ??? First Experiences with a Data Efficient Neural Reinforcement Learning Method, 16th European Conference on Machine Learning, pp.317-328, 2005. ,
DOI : 10.1007/11564096_32
Batch reinforcement learning in a complex domain, Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems , AAMAS '07, p.10, 2007. ,
DOI : 10.1145/1329125.1329241
Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.330-337, 2007. ,
DOI : 10.1109/ADPRL.2007.368207
URL : https://hal.archives-ouvertes.fr/inria-00124833
Stochastic Optimal Control (The Discrete Time Case), 1978. ,
An introduction to support vector machines (and other kernel-based learning methods), 2000. ,
DOI : 10.1017/CBO9780511801389
Fat-Shattering and the Learnability of Real-Valued Functions, Journal of Computer and System Sciences, vol.52, issue.3, pp.434-452, 1996. ,
DOI : 10.1006/jcss.1996.0033