Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

Andras Antos (1), Csaba Szepesvari (1), Rémi Munos (2,3)
(2) SEQUEL - Sequential Learning: LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: We consider batch reinforcement learning problems in continuous-space, expected total discounted-reward Markovian Decision Problems. In contrast to previous theoretical work, we consider the case where the training data consists of a single sample path (trajectory) of some behaviour policy. In particular, we do not assume access to a generative model of the environment. The algorithm studied is policy iteration in which, in successive iterations, the Q-functions of the intermediate policies are obtained by minimizing a novel Bellman-residual-type error. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance, where the bound depends on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used.
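For orientation, here is a minimal sketch of the kind of criterion being minimized at each policy-evaluation step: the standard squared Bellman residual of a Q-function for a fixed policy π with discount factor γ. The notation below is generic (function class F, behaviour distribution ν) and is not taken verbatim from the paper; the paper's contribution is a modified Bellman-residual criterion that remains estimable from a single sample path, which this sketch does not reproduce.

% Standard (unmodified) Bellman-residual objective for evaluating a policy pi;
% the paper minimizes a modified variant of this criterion over a function class F.
\[
  (T^{\pi} Q)(x,a) \;=\; r(x,a) \;+\; \gamma \int Q\bigl(y, \pi(y)\bigr)\, P(\mathrm{d}y \mid x,a),
\]
\[
  \hat{Q}^{\pi} \;\in\; \operatorname*{arg\,min}_{Q \in \mathcal{F}}
      \bigl\| Q - T^{\pi} Q \bigr\|_{\nu}^{2},
\]
% where nu denotes the distribution induced by the behaviour policy that generated
% the single trajectory, and the norm is estimated from the observed transitions.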
Document type:
Conference paper
Conference on Learning Theory (COLT), June 2006, Pittsburgh, USA

https://hal.inria.fr/inria-00117130
Contributor: Rémi Munos
Submitted on: Thursday, November 30, 2006 - 13:11:01
Last modified on: Thursday, May 10, 2018 - 02:05:44
Long-term archiving on: Tuesday, April 6, 2010 - 23:40:56

File

antos-colt06.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : inria-00117130, version 1

Citation

Andras Antos, Csaba Szepesvari, Rémi Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Conference on Learning Theory (COLT), June 2006, Pittsburgh, USA. 〈inria-00117130〉
