Finite Time Bounds for Sampling-Based Fitted Value Iteration

Rémi Munos 1 Csaba Szepesvari 2
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence rate results obtained allow us to show that both versions of FVI are well behaving in the sense that by using a sufficiently large number of samples for a large class of MDPs, arbitrary good performance can be achieved with high probability. An important feature of our proof technique is that it permits the study of weighted Lp-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds show a dependence on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation capabilities of the function space, the inherent Bellman residual, which reflects how well the function space is ``aligned'' with the essential characteristics of the dynamics of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings.
Type de document :
Rapport
[Research Report] 2007, pp.46
Liste complète des métadonnées

https://hal.inria.fr/inria-00120882
Contributeur : Rémi Munos <>
Soumis le : mercredi 5 mars 2008 - 16:47:28
Dernière modification le : jeudi 11 janvier 2018 - 06:22:13
Document(s) archivé(s) le : vendredi 25 novembre 2016 - 21:25:37

Fichiers

savi_1.5.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : inria-00120882, version 4

Collections

Citation

Rémi Munos, Csaba Szepesvari. Finite Time Bounds for Sampling-Based Fitted Value Iteration. [Research Report] 2007, pp.46. 〈inria-00120882v4〉

Partager

Métriques

Consultations de la notice

608

Téléchargements de fichiers

220