
Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory

Andras Antos (1), Csaba Szepesvari (2), Rémi Munos (3)
(3) SEQUEL - Sequential Learning: LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract: We consider batch reinforcement learning in continuous-space, expected total discounted-reward Markovian Decision Problems, where the training data consist of a single trajectory of some fixed behaviour policy. The algorithm studied is policy iteration in which, in successive iterations, the action-value functions of the intermediate policies are obtained by approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is the introduction of new smoothness constraints, which significantly extends the scope of previous results.
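The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the five-state chain, the uniformly random behaviour policy, the tabular function class (for which least-squares fitting reduces to a per-pair mean), and every name below (`step`, `collect_trajectory`, `fit`, `evaluate_policy`, `fitted_policy_iteration`) are assumptions made for illustration. The paper itself treats continuous state spaces and general function sets.

```python
import random

GAMMA = 0.9                      # discount factor (assumed value)
N_STATES, ACTIONS = 5, (0, 1)    # toy chain; action 0 = left, 1 = right

def step(s, a):
    """Hypothetical toy dynamics: deterministic chain, reward 1 on reaching the right end."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

def collect_trajectory(length, seed=0):
    """A single trajectory of a fixed behaviour policy (here: uniformly random actions)."""
    rng = random.Random(seed)
    s, data = 0, []
    for _ in range(length):
        a = rng.choice(ACTIONS)
        s_next, r = step(s, a)
        data.append((s, a, r, s_next))
        s = s_next
    return data

def fit(samples):
    """Least-squares fit over a tabular function class: the mean target per (state, action)."""
    sums, counts = {}, {}
    for key, y in samples:
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def evaluate_policy(data, policy, n_sweeps):
    """Approximate value iteration for the action-value function of `policy`."""
    q = {}
    for _ in range(n_sweeps):
        # Regression targets: r + gamma * Q(next state, action the evaluated policy takes there).
        targets = [((s, a), r + GAMMA * q.get((s2, policy[s2]), 0.0))
                   for s, a, r, s2 in data]
        q = fit(targets)
    return q

def fitted_policy_iteration(data, n_iterations=10, n_sweeps=50):
    """Alternate fitted policy evaluation and greedy improvement on the same batch of data."""
    policy = [0] * N_STATES      # start from "always left"
    for _ in range(n_iterations):
        q = evaluate_policy(data, policy, n_sweeps)
        policy = [max(ACTIONS, key=lambda a: q.get((s, a), 0.0))
                  for s in range(N_STATES)]
    return policy

policy = fitted_policy_iteration(collect_trajectory(2000))
print(policy)
```

On this toy chain the greedy policy should settle on moving right in every state. Replacing `fit` with a regressor over a richer function set is where the approximation power and capacity mentioned in the abstract would enter.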
Document type: Conference papers

Contributor: Rémi Munos
Submitted on: Tuesday, January 16, 2007 - 1:34:47 PM
Last modification on: Thursday, January 20, 2022 - 4:12:36 PM
Long-term archiving on: Friday, September 21, 2012 - 10:15:57 AM


Files produced by the author(s)


HAL Id: inria-00124833, version 1



Andras Antos, Csaba Szepesvari, Rémi Munos. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007, Hawaii, United States. pp.2007. ⟨inria-00124833⟩


