Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1996.
Technical update: Least-squares temporal difference learning, Machine Learning, pp.233-246, 2002.
Linear least-squares algorithms for temporal difference learning, Machine Learning, pp.33-57, 1996.
Bias-variance error bounds for temporal difference updates, Proceedings of the 13th Annual Conference on Computational Learning Theory, pp.142-147, 2000.
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Least-Squares Methods in Reinforcement Learning for Control, SETN'02: Proceedings of the Second Hellenic Conference on AI, pp.249-260, 2002.
DOI : 10.1007/3-540-46014-4_23
Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems, pp.79-110, 2003.
Optimality of reinforcement learning algorithms with linear function approximation, NIPS, pp.1555-1562, 2002.
Fast gradient-descent methods for temporal-difference learning with linear function approximation, ICML'09: Proceedings of the 26th Annual International Conference on Machine Learning, pp.993-1000, 2009.
Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192
Finite time bounds for sampling based fitted value iteration, Proceedings of the 22nd International Conference on Machine Learning, ICML '05, pp.880-887, 2005.
DOI : 10.1145/1102351.1102462
Building Controllers for Tetris, ICGA Journal, vol.32, issue.1, pp.3-11, 2009.
DOI : 10.3233/ICG-2009-32102
URL : https://hal.archives-ouvertes.fr/inria-00418954
Performance bound for Approximate Optimistic Policy Iteration.
URL : https://hal.archives-ouvertes.fr/inria-00480952
Convergence Results for Some Temporal Difference Methods Based on Least Squares, IEEE Trans. Automatic Control, vol.54, pp.1515-1531, 2009.