Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Andras Antos; Csaba Szepesvari; Rémi Munos

Conference paper. Year: 2006


Andras Antos

Role: Author

Computer and Automation Research Institute [Budapest]

Csaba Szepesvari

Role: Author

Computer and Automation Research Institute [Budapest]

Rémi Munos

Role: Author
PersonId : 836863

Sequential Learning

Centre de Mathématiques Appliquées - Ecole Polytechnique

Abstract

We consider batch reinforcement learning in continuous-space Markovian Decision Problems with the expected total discounted-reward criterion. As opposed to previous theoretical work, we consider the case where the training data consists of a single sample path (trajectory) of some behaviour policy. In particular, we do not assume access to a generative model of the environment. The algorithm studied is policy iteration in which, in successive iterations, the Q-functions of the intermediate policies are obtained by minimizing a novel Bellman-residual type error. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance, where the bound depends on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used.
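
To make the setting in the abstract concrete, here is a minimal sketch of fitted policy iteration driven by a single trajectory. It is an illustration under stated assumptions, not the authors' method: the five-state chain, the `step` dynamics, the one-hot features, and all function names are hypothetical, and the loss is the plain empirical squared Bellman residual rather than the novel error studied in the paper. The plain residual is used here only because the toy transitions are deterministic; with stochastic transitions it would be a biased estimate.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): a 5-state deterministic chain.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9


def step(state, action):
    """Action 1 moves right, action 0 moves left; reward 1 on reaching the last state."""
    nxt = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def sample_path(length, rng):
    """A single trajectory of a uniformly random behaviour policy, with no resets."""
    path, state = [], 0
    for _ in range(length):
        action = int(rng.integers(N_ACTIONS))
        nxt, reward = step(state, action)
        path.append((state, action, reward, nxt))
        state = nxt
    return path


def features(state, action):
    """One-hot features, so that Q(x, a) = w[x * N_ACTIONS + a]."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[state * N_ACTIONS + action] = 1.0
    return phi


def evaluate_policy(path, policy):
    """Fit Q for `policy` by least squares on the empirical Bellman residual.

    Minimizes sum_t (Q(x_t, a_t) - r_t - GAMMA * Q(x_{t+1}, policy(x_{t+1})))^2
    over linear Q, which reduces to an ordinary least-squares problem.
    """
    A = np.array([features(x, a) - GAMMA * features(y, policy[y])
                  for x, a, _, y in path])
    r = np.array([rew for _, _, rew, _ in path])
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w.reshape(N_STATES, N_ACTIONS)


def fitted_policy_iteration(path, n_iterations=10):
    """Alternate residual-based policy evaluation and greedy improvement."""
    policy = np.zeros(N_STATES, dtype=int)  # start from "always left"
    for _ in range(n_iterations):
        q = evaluate_policy(path, policy)
        policy = q.argmax(axis=1)
    return policy, q


path = sample_path(2000, np.random.default_rng(0))
policy, q = fitted_policy_iteration(path)
print(policy)  # greedy action per state after the final iteration
```

The same batch of transitions is reused in every iteration, which mirrors the batch setting: no new data are drawn and no generative model is queried once the trajectory has been collected.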

Domains

Machine Learning [cs.LG]

Main file

antos-colt06.pdf (466.11 KB)

Origin: Publisher files allowed on an open archive

Contributor: Rémi Munos

https://inria.hal.science/inria-00117130

Submitted on: Thursday 30 November 2006, 13:11:01

Last modified on: Friday 24 March 2023, 14:52:48

Long-term archiving on: Tuesday 6 April 2010, 23:40:56

Dates and versions

inria-00117130, version 1 (30-11-2006)

Identifiers

HAL Id: inria-00117130, version 1

Cite

Andras Antos, Csaba Szepesvari, Rémi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Conference On Learning Theory, Jun 2006, Pittsburgh, USA. ⟨inria-00117130⟩

Collections

X UNIV-LILLE3 CNRS INRIA X-CMAP X-DEP-MATHA LAGIS CMAP UVSQ INRIA2

142 Views

1554 Downloads
