Least-squares policy iteration, Journal of Machine Learning Research, vol.4, pp.1107-1149, 2003. ,
Learning near-optimal policies with fitted policy iteration and a single sample path: approximate iterative policy evaluation, 2006. ,
Stochastic Optimal Control (The Discrete Time Case), 1978. ,
Toward a modern theory of adaptive networks: Expectation and prediction., Proc. of the Ninth Annual Conference of Cognitive Science Society, 1987. ,
DOI : 10.1037/0033-295X.88.2.135
Error bounds for approximate policy iteration, 19th International Conference on Machine Learning, pp.560-567, 2003. ,
Markov Chains and Stochastic Stability, 1993. ,
Neural Network Learning: Theoretical Foundations, 1999. ,
DOI : 10.1017/CBO9780511624216
Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, pp.94-116, 1994. ,
Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension, Journal of Combinatorial Theory, Series A, vol.69, issue.2, pp.217-232, 1995. ,
DOI : 10.1016/0097-3165(95)90052-7
Some studies in machine learning using the game of checkers, IBM Journal on Research and Development, pp.210-229, 1959. ,
Functional approximation and dynamic programming, Math. Tables and other Aids Comp, pp.247-251, 1959. ,
Neuro-Dynamic Programming, Athena Scientific, 1996. ,
Reinforcement learning: An introduction, 1998. ,
Stable Function Approximation in Dynamic Programming, Proceedings of the Twelfth International Conference on Machine Learning, pp.261-268, 1995. ,
DOI : 10.1016/B978-1-55860-377-6.50040-2
Feature-based methods for large scale dynamic programming, Machine Learning, pp.59-94, 1996. ,
Max-norm projections for factored mdps, Proceedings of the International Joint Conference on Artificial Intelligence, 2001. ,
Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, vol.6, pp.503-556, 2005. ,
Efficient value function approximation using regression trees, Proceedings of the IJCAI Workshop on Statistical Machine Learning for Large-Scale Optimization, 1999. ,
DOI : 10.1111/biom.12207
Batch value function approximation via support vectors, Advances in Neural Information Processing Systems 14, 2002. ,
Finite time bounds for sampling based fitted value iteration, ICML'2005, 2005. ,
Nonparametric time series prediction through adaptive model selection, Machine Learning, pp.5-34, 2000. ,