Learning Near-optimal Policies with Bellman-residual Minimization based Fitted Policy Iteration and a Single Sample Path, COLT, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00830201
On the Generation of Markov Decision Processes, Journal of the Operational Research Society, vol.46, issue.3, pp.354-361, 1995. ,
DOI : 10.1057/jors.1995.50
Residual Algorithms: Reinforcement Learning with Function Approximation, ICML, 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50013-X
Temporal differences-based policy iteration and applications in neuro-dynamic programming, 1996. ,
Neuro-Dynamic Programming, Athena Scientific, 1996. ,
Projected equation methods for approximate solution of large linear systems, Journal of Computational and Applied Mathematics, vol.227, issue.1, pp.27-50, 2009. ,
DOI : 10.1016/j.cam.2008.07.037
The tradeoffs of large scale learning, Optimization for Machine Learning, pp.351-368, 2011. ,
Linear Least-Squares algorithms for temporal difference learning, Machine Learning, pp.1-3, 1996. ,
A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning, Discrete Event Dynamic Systems, vol.22, issue.1,2,3, pp.207-239, 2006. ,
DOI : 10.1007/s10626-006-8134-8
Algorithms and Representations for Reinforcement Learning, 2005. ,
Eligibility traces through colored noises, International Congress on Ultra Modern Telecommunications and Control Systems, 2010. ,
DOI : 10.1109/ICUMT.2010.5676597
URL : https://hal.archives-ouvertes.fr/hal-00553910
Kalman Temporal Differences, JAIR, vol.39, pp.483-532, 2010. ,
URL : https://hal.archives-ouvertes.fr/hal-00351297
Algorithmic Survey of Parametric Value Function Approximation, IEEE Transactions on Neural Networks and Learning Systems, 2013. ,
DOI : 10.1109/TNNLS.2013.2247418
URL : https://hal.archives-ouvertes.fr/hal-00869725
Bias-Variance Error Bounds for Temporal Difference Updates, COLT, 2000. ,
The Fixed Points of Off-Policy TD, Neural Information Processing Systems (NIPS), 2011. ,
GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, Conference on Artificial General Intelligence, 2010. ,
Error Bounds for Approximate Policy Iteration, ICML, 2003. ,
Least Squares Policy Evaluation Algorithms with Linear Function Approximation, DEDS, vol.13, pp.79-110, 2003. ,
Eligibility Traces for Off-Policy Policy Evaluation, ICML, 2000. ,
Off-Policy Temporal-Difference Learning with Function Approximation, Proceedings of the 18th International Conference on Machine Learning, 2001. ,
Combining importance sampling and temporal difference control variates to simulate Markov Chains, ACM Transactions on Modeling and Computer Simulation, vol.14, issue.1, pp.1-30, 2004. ,
DOI : 10.1145/974734.974735
Stochastic Simulation, 1987. ,
DOI : 10.1002/9780470316726
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view, ICML, 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00537403
Recursive Least-Squares Learning with Eligibility Traces, European Workshop on Reinforcement Learning (EWRL 11), 2011. ,
DOI : 10.1007/978-3-642-29946-9_14
URL : https://hal.archives-ouvertes.fr/hal-00644511
Optimality of Reinforcement Learning Algorithms with Linear Function Approximation, NIPS, 2002. ,
Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning), 1998. ,
DOI : 10.1007/978-1-4615-3618-5
Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 2009. ,
DOI : 10.1145/1553374.1553501
An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, vol.42, issue.5, pp.674-690, 1997. ,
DOI : 10.1109/9.580874