Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation

Rémi Munos

Journal article, Journal of Machine Learning Research, 2006

Affiliations: Sequential Learning; Centre de Mathématiques Appliquées - Ecole Polytechnique

Abstract

We study a variance reduction technique for Monte Carlo estimation of functionals in Markov chains. The method is based on designing sequential control variates using successive approximations of the function of interest V. Regular Monte Carlo estimates have a variance of O(1/N), where N is the number of sample trajectories of the Markov chain. Here, we obtain a geometric variance reduction O(ρ^N) (with ρ<1) up to a threshold that depends on the approximation error V-AV, where A is an approximation operator linear in the values. Thus, if V belongs to the right approximation space (i.e. AV=V), the variance decreases geometrically to zero. An immediate application is value function estimation in Markov chains, which may be used for policy evaluation in a policy iteration algorithm for solving Markov Decision Processes. Another important domain, for which variance reduction is highly needed, is gradient estimation, that is, computing the sensitivity ∂_α V of the performance measure V with respect to some parameter α of the transition probabilities. For example, in parametric policy optimization, an estimate of the policy gradient is required to perform a gradient optimization method. We show that, using two approximations, one for the value function and one for the gradient, a geometric variance reduction is also achieved, up to a threshold that depends on the approximation errors of both representations.
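The abstract describes the method only at a high level. The Python sketch below illustrates the idea of sequential control variates on a small discounted Markov chain, under assumptions that are ours rather than the paper's: the transition matrix P is known (so (PV)(x) can be computed exactly), rewards are discounted by a factor gamma, trajectories are truncated at a fixed horizon, and the approximation operator A is either the identity (tabular case) or a least-squares projection onto the columns of a hypothetical feature matrix Phi. At each iteration the current approximation V_k acts as a control variate: only the Bellman residual r + gamma·P·V_k − V_k is simulated, so the Monte Carlo noise shrinks together with the error of V_k. The names `estimate_value` and `sample_next` are illustrative, not taken from the paper.

```python
import numpy as np


def sample_next(states, cdf, rng):
    """Draw one transition for each chain in `states` by inverse-CDF sampling."""
    u = rng.random(states.shape[0])
    # The number of CDF entries strictly below u is the index of the next state.
    return (u[:, None] > cdf[states]).sum(axis=1)


def estimate_value(P, r, gamma, Phi=None, n_iters=12, n_traj=100, horizon=20, seed=0):
    """Illustrative sketch: estimate V = E[sum_t gamma^t r(x_t)] with
    sequential control variates (assumed setting, not the paper's code).

    P:   known (n, n) transition matrix (assumption of this sketch).
    r:   (n,) reward vector.
    Phi: optional (n, d) feature matrix; A is the least-squares projection
         onto span(Phi), or the identity when Phi is None.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    cdf = np.cumsum(P, axis=1)
    cdf[:, -1] = 1.0  # guard against round-off so every u in [0, 1) maps to a state
    starts = np.repeat(np.arange(n), n_traj)  # n_traj chains per start state
    V_hat = np.zeros(n)
    history = []
    for _ in range(n_iters):
        # Bellman residual of the current approximation; it is zero when V_hat == V.
        residual = r + gamma * (P @ V_hat) - V_hat
        states = starts.copy()
        acc = np.zeros(states.shape[0])
        discount = 1.0
        for _ in range(horizon):
            acc += discount * residual[states]
            discount *= gamma
            states = sample_next(states, cdf, rng)
        # V_hat plus a Monte Carlo estimate of the correction V - V_hat.
        target = V_hat + acc.reshape(n, n_traj).mean(axis=1)
        if Phi is None:
            V_hat = target  # A = identity (tabular case)
        else:
            w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
            V_hat = Phi @ w  # A = orthogonal projection onto span(Phi)
        history.append(V_hat.copy())
    return V_hat, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, gamma = 5, 0.5
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)
    r = rng.random(n)
    V_true = np.linalg.solve(np.eye(n) - gamma * P, r)
    _, history = estimate_value(P, r, gamma)
    for k, V_k in enumerate(history):
        print(k, np.max(np.abs(V_k - V_true)))
```

With A equal to the identity, AV = V holds trivially, so in this sketch the printed errors should shrink roughly geometrically from one iteration to the next instead of at the O(1/√N) rate of plain Monte Carlo. The horizon truncation does not prevent convergence here: its bias is proportional to the current error and is multiplied by gamma^horizon at each iteration. If Phi cannot represent V exactly, the error should instead level off at a floor tied to V - AV, in line with the threshold described in the abstract.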

Domains

Machine Learning [cs.LG]

Main file

fast_mc_jmlr.pdf (108.03 KB)

Origin: Publisher files authorized for deposit in an open archive

https://inria.hal.science/inria-00117153

Submitted on: Thursday 30 November 2006, 11:39:42

Last modified on: Friday 24 March 2023, 14:52:48

Long-term archiving on: Tuesday 6 April 2010, 23:41:14

Dates and versions

inria-00117153 , version 1 (30-11-2006)

Identifiers

HAL Id: inria-00117153, version 1

Cite

Rémi Munos. Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation. Journal of Machine Learning Research, 2006, 7, pp.413-427. ⟨inria-00117153⟩

Collections

X UNIV-LILLE3 CNRS INRIA X-CMAP X-DEP-MATHA LAGIS CMAP UVSQ INRIA2
