Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory

Andras Antos 1, Csaba Szepesvari 2, Rémi Munos 3
3 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal, Inria Lille - Nord Europe
Abstract: We consider batch reinforcement learning in continuous-space, expected total discounted-reward Markovian Decision Problems when the training data consists of a single trajectory of some fixed behaviour policy. The algorithm studied is policy iteration in which, at each iteration, the action-value function of the intermediate policy is obtained by approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is the introduction of new smoothness constraints, which significantly extends the scope of previous results.
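The loop structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 1-D task, the one-hot state-aggregation features, the ridge term and every name below (`phi`, `sample_trajectory`, `fitted_policy_iteration`, `greedy_action`, `N_BINS`) are assumptions made for illustration. The outer loop improves the policy greedily; the inner loop evaluates the current policy by repeated least-squares fits (approximate value iteration) on one trajectory of a random behaviour policy.

```python
import numpy as np

N_BINS, N_ACTIONS = 10, 2  # illustrative sizes, not taken from the paper


def phi(s, a):
    """One-hot feature for (state bin, action); states live in [0, 1]."""
    f = np.zeros(N_BINS * N_ACTIONS)
    f[a * N_BINS + min(int(s * N_BINS), N_BINS - 1)] = 1.0
    return f


def sample_trajectory(n_steps, rng):
    """One trajectory of a uniformly random behaviour policy on a toy task.

    Hypothetical dynamics: action 1 moves right by 0.1, action 0 moves left,
    with small Gaussian noise; the reward equals the next state's position.
    """
    S, A, R, S_next = [], [], [], []
    s = 0.5
    for _ in range(n_steps):
        a = int(rng.integers(N_ACTIONS))
        step = 0.1 if a == 1 else -0.1
        s_next = float(np.clip(s + step + 0.02 * rng.standard_normal(), 0.0, 1.0))
        S.append(s)
        A.append(a)
        R.append(s_next)
        S_next.append(s_next)
        s = s_next
    return np.array(S), np.array(A), np.array(R), np.array(S_next)


def fitted_policy_iteration(S, A, R, S_next, gamma=0.9,
                            n_policy_iters=5, n_value_iters=50, ridge=1e-6):
    """Policy iteration whose evaluation step is fitted value iteration."""
    n = len(S)
    Phi = np.array([phi(s, a) for s, a in zip(S, A)])
    # Phi_next[b, i] holds the features of (next state i, action b).
    Phi_next = np.array([[phi(s, b) for s in S_next] for b in range(N_ACTIONS)])
    G = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])  # regularised Gram matrix
    w_policy = np.zeros(Phi.shape[1])  # weights defining the current policy
    for _ in range(n_policy_iters):
        # Policy improvement: greedy action at every observed next state.
        greedy = np.argmax(Phi_next @ w_policy, axis=0)
        Phi_pi = Phi_next[greedy, np.arange(n)]
        # Policy evaluation: iterate regression onto Bellman targets.
        w = np.zeros_like(w_policy)
        for _ in range(n_value_iters):
            targets = R + gamma * (Phi_pi @ w)
            w = np.linalg.solve(G, Phi.T @ targets)
        w_policy = w
    return w_policy


def greedy_action(w, s):
    """Greedy action at state s under the fitted action-value weights w."""
    return int(np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)]))
```

One-hot features are chosen here because the least-squares fit then reduces to per-cell averaging, which keeps the inner iteration stable; richer function sets would need the kind of analysis the paper provides.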
Document type:
Conference paper
IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007, Hawaii, United States. pp.2007, 2007

Cited literature [16 references]

https://hal.inria.fr/inria-00124833
Contributor: Rémi Munos
Submitted on: Tuesday, 16 January 2007 - 13:34:47
Last modified on: Thursday, 11 January 2018 - 01:49:33
Document(s) archived on: Friday, 21 September 2012 - 10:15:57

Files

sapi_adprl_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00124833, version 1

Citation

Andras Antos, Csaba Szepesvari, Rémi Munos. Value-Iteration Based Fitted Policy Iteration: Learning with a Single Trajectory. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007, Hawaii, United States. pp.2007, 2007. 〈inria-00124833〉

Metrics

Record views

264

File downloads

212