Journal articles

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation

Rémi Munos 1, 2
1 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : We study a variance reduction technique for Monte Carlo estimation of functionals in Markov chains. The method designs sequential control variates from successive approximations of the function of interest V. Regular Monte Carlo estimates have a variance of O(1/N), where N is the number of sample trajectories of the Markov chain. Here, we obtain a geometric variance reduction O(ρ^N) (with ρ < 1) up to a threshold that depends on the approximation error V − AV, where A is an approximation operator that is linear in the values. Thus, if V belongs to the right approximation space (i.e. AV = V), the variance decreases geometrically to zero. An immediate application is value function estimation in Markov chains, which may be used for policy evaluation in a policy iteration algorithm for solving Markov Decision Processes. Another important domain where variance reduction is especially needed is gradient estimation, that is, computing the sensitivity ∂V/∂α of the performance measure V with respect to some parameter α of the transition probabilities. For example, in parametric policy optimization, an estimate of the policy gradient is required to run a gradient-based optimization method. We show that, using two approximations, one for the value function and one for the gradient, a geometric variance reduction is also achieved, up to a threshold that depends on the approximation errors of both representations.
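The idea of sequential control variates can be illustrated with a minimal tabular sketch. It is not the paper's exact estimator, and it rests on several assumptions not stated in the abstract. The chain has K transient states and one absorbing state. The substochastic transition matrix P is known, so that P V_n can be computed exactly. V(x) is the expected total reward collected before absorption. The approximation operator A is the identity (a tabular space, so AV = V). The function names `residual_sum` and `sequential_control_variates` are hypothetical. At each iteration, the trajectories only need to estimate the correction V − V_n, whose sampling variance shrinks with the current error, so the error contracts geometrically.

```python
import numpy as np


def residual_sum(P, d, x0, rng):
    """Sum of d(x_t) along one trajectory started at x0, until absorption.

    P is substochastic over the K transient states; the missing row mass
    1 - sum_y P[x, y] is the probability of jumping to absorbing state K.
    """
    K = P.shape[0]
    total, x = 0.0, x0
    while x != K:
        total += d[x]
        probs = np.append(P[x], 1.0 - P[x].sum())
        x = rng.choice(K + 1, p=probs)
    return total


def sequential_control_variates(P, r, n_iters, n_traj, seed=0):
    """Hypothetical tabular sketch; returns the estimates V_1, ..., V_n."""
    rng = np.random.default_rng(seed)
    K = len(r)
    V_n = np.zeros(K)  # V_0 = 0, so the first iteration is plain Monte Carlo
    history = []
    for _ in range(n_iters):
        # One-step residual d = r + P V_n - V_n (assumes P is known).
        # Along a trajectory, E[sum_t d(x_t) | x_0 = x] = V(x) - V_n(x).
        d = r + P @ V_n - V_n
        correction = np.array([
            np.mean([residual_sum(P, d, x, rng) for _ in range(n_traj)])
            for x in range(K)
        ])
        # Tabular case: A is the identity, so the update is exact in form.
        V_n = V_n + correction
        history.append(V_n.copy())
    return history


# Toy chain: every row leaves probability 0.3 of terminating.
P = np.array([[0.2, 0.3, 0.2],
              [0.1, 0.4, 0.2],
              [0.3, 0.1, 0.3]])
r = np.array([1.0, 2.0, 0.5])
V_true = np.linalg.solve(np.eye(3) - P, r)  # V = (I - P)^{-1} r

history = sequential_control_variates(P, r, n_iters=10, n_traj=500)
errors = [np.max(np.abs(V - V_true)) for V in history]
print("plain MC error:", errors[0], "after 10 iterations:", errors[-1])
```

The first iteration uses V_0 = 0, so its error is that of ordinary Monte Carlo with the same number of trajectories. Later iterations sample sums of d = (I − P)(V − V_n), whose spread is proportional to the current error, which is why the error decreases geometrically instead of like 1/√N. If A were a non-trivial linear projection (for example onto a few features), V_n would be replaced by A applied to the updated values. The decrease would then be expected to stall at a level tied to V − AV, as the abstract states.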
Document type : Journal articles

Cited literature : 17 references

Contributor : Rémi Munos
Submitted on : Thursday, November 30, 2006 - 11:39:42 AM
Last modification on : Thursday, January 20, 2022 - 4:16:39 PM
Long-term archiving on : Tuesday, April 6, 2010 - 11:41:14 PM


Publisher files allowed on an open archive


  • HAL Id : inria-00117153, version 1



Rémi Munos. Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation. Journal of Machine Learning Research, Microtome Publishing, 2006, 7, pp.413-427. ⟨inria-00117153⟩


