On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50
Neuro-Dynamic Programming, 1996.
DOI : 10.1007/0-306-48332-7_333
Technical update: Least-squares temporal difference learning, Machine Learning, vol.49, issue.2-3, 2002.
A large-deviation inequality for vector-valued martingales, 2005.
Finite-sample analysis of least-squares policy iteration, Journal of Machine Learning Research, vol.13, pp.3041-3074, 2012.
URL : https://hal.archives-ouvertes.fr/inria-00528596
Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems: Theory and Applications, vol.13, pp.79-110, 2002.
Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view, ICML, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00537403
On the use of non-stationary policies for stationary infinite-horizon Markov decision processes, Advances in Neural Information Processing Systems (NIPS), 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809
Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, 1997.
DOI : 10.1109/9.580874
Rates of convergence for empirical processes of stationary mixing sequences, The Annals of Probability, vol.22, issue.1, pp.94-116, 1994.
Convergence of least-squares temporal difference methods under general conditions, ICML, 2010.