Fitted Q-iteration in continuous action-space MDPs, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00185311
Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp.330-337, 2007.
DOI : 10.1109/ADPRL.2007.368207
URL : https://hal.archives-ouvertes.fr/inria-00124833
Neuro-dynamic programming, 1996.
DOI : 10.1007/0-306-48332-7_333
Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming, Mathematics of Operations Research, vol.37, issue.1, pp.66-94, 2012.
DOI : 10.1287/moor.1110.0532
(Approximate) iterated successive approximations algorithm for sequential decision processes, Annals of Operations Research, vol.3, issue.3, pp.1-12, 2012.
DOI : 10.1007/s10479-012-1073-x
Error propagation for approximate policy and value iteration (extended version), NIPS, 2010.
On the Sample Complexity of Reinforcement Learning, 2003.
Error bounds for approximate policy iteration, International Conference on Machine Learning (ICML), pp.560-567, 2003.
Performance Bounds in $L_p$-norm for Approximate Value Iteration, SIAM Journal on Control and Optimization, vol.46, issue.2, pp.541-561, 2007.
DOI : 10.1137/040614384
Markov decision processes: Discrete stochastic dynamic programming, 1994.
DOI : 10.1002/9780470316887
Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, Management Science, vol.24, issue.11, pp.1127-1137, 1978.
DOI : 10.1287/mnsc.24.11.1127
On the use of non-stationary policies for stationary infinite-horizon Markov decision processes, Advances in Neural Information Processing Systems 25, pp.1835-1843, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758809
Performance bound for approximate optimistic policy iteration, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00480952
Approximate modified policy iteration, Proceedings of the 29th International Conference on Machine Learning (ICML-12), pp.1207-1214, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00758882
An upper bound on the loss from approximate optimal-value functions, Machine Learning, pp.16-19, 1994.
DOI : 10.1007/BF00993308