Finite Time Bounds for Sampling-Based Fitted Value Iteration - Inria - French National Institute for Research in Digital Science and Technology
Report (Research Report) Year: 2006

Finite Time Bounds for Sampling-Based Fitted Value Iteration

Rémi Munos
  • Role: Author
  • PersonId : 836863

Abstract

In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) for solving large or infinite state-space Markovian decision problems (MDPs) with a generative model. Unlike most previous results, our theoretical guarantees apply to a large class of regressors (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning). The bounds derived on the performance of sampling-based FVI make explicit the dependence of the loss on the MDP's controllability and smoothness properties, on the MDP's relation to the regressor employed, and on several other algorithmic choices. We discuss the relation of our results to previous results and illustrate some tradeoffs by means of a computer experiment.
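As a rough illustration of what the abstract calls sampling-based FVI with a generative model, the following is a minimal sketch and not the report's own algorithm or experiment. Everything specific in it is an assumption made for illustration: the toy one-dimensional MDP on [0, 1], the `sample_transitions` generative model, the polynomial least-squares regressor (standing in for the broader class of regressors the abstract mentions), and all parameter values. Each iteration draws fresh base states, estimates the Bellman backup at each of them by Monte Carlo sampling from the generative model, and fits a regressor to the resulting targets.

```python
import numpy as np

GAMMA = 0.8            # discount factor (assumed value)
ACTIONS = (-0.1, 0.1)  # hypothetical actions: step left or step right


def sample_transitions(rng, xs, a, m):
    """Hypothetical generative model: m next states and rewards per state in xs."""
    noise = rng.normal(0.0, 0.02, size=(xs.size, m))
    ys = np.clip(xs[:, None] + a + noise, 0.0, 1.0)
    rs = -np.abs(ys - 0.5)  # reward is highest near the centre of [0, 1]
    return ys, rs


def fitted_value_iteration(n_states=200, m_samples=10, n_iters=30, degree=4, seed=0):
    """Return polynomial coefficients (highest power first) of the fitted value function."""
    rng = np.random.default_rng(seed)
    coeffs = np.zeros(degree + 1)  # start from V_0 = 0
    for _ in range(n_iters):
        # Draw base states from a sampling distribution (uniform here, by assumption).
        xs = rng.uniform(0.0, 1.0, size=n_states)
        q_estimates = []
        for a in ACTIONS:
            ys, rs = sample_transitions(rng, xs, a, m_samples)
            # Monte Carlo estimate of the backed-up value of action a at each base state.
            q_estimates.append((rs + GAMMA * np.polyval(coeffs, ys)).mean(axis=1))
        targets = np.max(q_estimates, axis=0)
        # Regression step: least-squares polynomial fit to the sampled targets.
        coeffs = np.polyfit(xs, targets, degree)
    return coeffs


if __name__ == "__main__":
    coeffs = fitted_value_iteration()
    print(np.polyval(coeffs, np.array([0.0, 0.5, 1.0])))
```

In this toy setup the number of base states, the number of next-state samples per action, the number of iterations, and the regressor's capacity (here the polynomial degree) are the kind of algorithmic choices whose effect on the loss the abstract says the bounds make explicit.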
Main file
savi_1.1.pdf (552.55 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00120882, version 1 (18-12-2006)
inria-00120882, version 2 (19-01-2007)
inria-00120882, version 3 (09-01-2008)
inria-00120882, version 4 (05-03-2008)

Identifiers

  • HAL Id: inria-00120882, version 1

Cite

Rémi Munos, Csaba Szepesvari. Finite Time Bounds for Sampling-Based Fitted Value Iteration. [Research Report] 2006, pp.47. ⟨inria-00120882v1⟩
386 views
510 downloads
