On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995.
DOI : 10.1057/jors.1995.50
Policy search by dynamic programming, NIPS, 2003.
Neuro-Dynamic Programming, Athena Scientific, 1996.
DOI : 10.1007/0-306-48332-7_333
Error propagation for approximate policy and value iteration (extended version), NIPS, 2010.
Conservative and Greedy Approaches to Classification-based Policy Iteration, AAAI, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00772610
Approximately optimal approximate reinforcement learning, ICML, 2002.
Reinforcement Learning as Classification: Leveraging Modern Classifiers, ICML, 2003.
Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003.
Analysis of a Classification-based Policy Iteration Algorithm, ICML, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00482065
Error Bounds for Approximate Policy Iteration, ICML, 2003.
Performance Bounds in Lp norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00124685
Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris, Journal of Machine Learning Research, vol.14, pp.1175-1221, 2013.
URL : https://hal.archives-ouvertes.fr/inria-00185271
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes, NIPS, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809
Approximate Modified Policy Iteration, ICML, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882