Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory

Andras Antos; Csaba Szepesvari; Rémi Munos

Conference Papers, Year : 2007


Andras Antos, Computer and Automation Research Institute [Budapest]

Csaba Szepesvari, Department of Computing Science [Edmonton]

Rémi Munos, Sequential Learning

Abstract

We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems when the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is policy iteration where, in successive iterations, the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced, thereby significantly extending the scope of previous results.
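The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy dynamics, the reward, the uniform behaviour policy, the piecewise-constant function set, and all names (`phi`, `collect_trajectory`, `greedy`, `fitted_policy_iteration`, `N_BINS`, `GAMMA`) are assumptions made only for illustration. The outer loop performs policy improvement; the inner loop evaluates the current policy by fitted value iteration, that is, repeated least-squares regression on a single trajectory.

```python
import numpy as np

GAMMA = 0.9      # discount factor (assumed value)
N_ACTIONS = 2    # finite action set (assumption)
N_BINS = 10      # state aggregation over [0, 1] (assumed function set)


def phi(states, actions):
    """One-hot features: one indicator per (state bin, action) pair."""
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=int)
    bins = np.minimum((states * N_BINS).astype(int), N_BINS - 1)
    out = np.zeros((len(states), N_ACTIONS * N_BINS))
    out[np.arange(len(states)), actions * N_BINS + bins] = 1.0
    return out


def collect_trajectory(n_steps, rng):
    """Single trajectory of a uniform random behaviour policy on a toy MDP.

    Hypothetical dynamics: the state lives in [0, 1]; action 1 drifts right,
    action 0 drifts left; the reward equals the next state.
    """
    x = 0.5
    X, A, R, Xn = [], [], [], []
    for _ in range(n_steps):
        a = int(rng.integers(N_ACTIONS))
        drift = 0.1 if a == 1 else -0.1
        x_next = float(np.clip(x + drift + 0.05 * rng.standard_normal(), 0.0, 1.0))
        X.append(x); A.append(a); R.append(x_next); Xn.append(x_next)
        x = x_next
    return np.array(X), np.array(A), np.array(R), np.array(Xn)


def greedy(w, states):
    """Greedy action with respect to Q(x, a) = phi(x, a) @ w."""
    states = np.asarray(states, dtype=float)
    q = np.stack(
        [phi(states, np.full(len(states), a)) @ w for a in range(N_ACTIONS)],
        axis=1,
    )
    return q.argmax(axis=1)


def fitted_policy_iteration(data, n_policy_iters=5, n_value_iters=50):
    X, A, R, Xn = data
    F = phi(X, A)
    w = np.zeros(F.shape[1])
    for _ in range(n_policy_iters):
        # Policy improvement: act greedily with respect to the current Q.
        F_next = phi(Xn, greedy(w, Xn))
        # Policy evaluation: fitted value iteration for the fixed policy.
        w_eval = w.copy()
        for _ in range(n_value_iters):
            targets = R + GAMMA * (F_next @ w_eval)
            w_eval, *_ = np.linalg.lstsq(F, targets, rcond=None)
        w = w_eval
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = fitted_policy_iteration(collect_trajectory(2000, rng))
    print(greedy(w, np.linspace(0.05, 0.95, 10)))
```

State aggregation is chosen here because least squares on one-hot features reduces to averaging the regression targets within each cell, which keeps the inner iteration stable; richer function sets, as considered in the paper, would need the conditions that the bounds are stated under.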

Keywords

Reinforcement Learning; Markov decision process; policy iteration; statistical learning; function approximation

Domains

Machine Learning [cs.LG]

Main file

sapi_adprl_final.pdf (166.42 KB)

Origin : Files produced by the author(s)

Contributor : Rémi Munos

https://inria.hal.science/inria-00124833

Submitted on : Tuesday, January 16, 2007, 1:34:47 PM

Last modification on : Friday, May 17, 2024, 4:32:06 PM

Long-term archiving on : Friday, September 21, 2012, 10:15:57 AM

Dates and versions

inria-00124833 , version 1 (16-01-2007)

Identifiers

HAL Id : inria-00124833 , version 1

Cite

Andras Antos, Csaba Szepesvari, Rémi Munos. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007, Hawaii, United States. pp.2007. ⟨inria-00124833⟩


Collections

UNIV-LILLE3 CNRS INRIA LAGIS INRIA2

