On a Variance-Reduction Correction for the Temporal-Difference Learning in the Stochastic Continuous Setting - Inria - Institut national de recherche en sciences et technologies du numérique
Preprint, Working Paper. Year: 2022

On a Variance-Reduction Correction for the Temporal-Difference Learning in the Stochastic Continuous Setting

Abstract

We consider the problem of policy evaluation for continuous-time processes using the temporal-difference learning algorithm. More precisely, from the time discretization of a stochastic differential equation, we intend to learn the continuous value function using TD(0). First, we show that the standard TD(0) algorithm is doomed to fail when the time step tends to zero because of the stochastic part of the dynamics. Then, we propose an additive zero-mean correction to the temporal difference that makes it robust with respect to vanishing time steps. We propose two algorithms: the first is model-based, since it requires knowledge of the drift function of the dynamics; the second is model-free. We prove the convergence of the model-based algorithm to the continuous-time solution under a linear-parametrization assumption in two different regimes: the first with a convex regularization of the problem, and the second using the Polyak-Juditsky averaging method with constant step size and without regularization. The convergence rate obtained in the latter regime is comparable with the state of the art for the simpler problem of linear regression using stochastic gradient descent methods. From a totally different perspective, our method may be applied to solve second-order elliptic equations in non-divergence form using machine learning.
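The variance issue described in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a toy one-dimensional Ornstein-Uhlenbeck process, a quadratic reward, and one plausible model-based form of the zero-mean correction (the gradient of the value function times the noise increment recovered from the known drift). The function name `td_increment_std` and all parameter values are hypothetical.

```python
import numpy as np


def td_increment_std(dt, n_samples=200_000, seed=0,
                     theta=1.0, sigma=1.0, rho=1.0, x=1.0):
    """Std of rescaled TD(0) increments at a fixed state x, naive vs corrected.

    Toy setting (all choices are assumptions for illustration):
      dynamics  dX = -theta * X dt + sigma dW
      reward    r(x) = x**2, discount rate rho
    The exact value function is then V(x) = a * x**2 + c, which solves
      rho * V = r + b * V' + (sigma**2 / 2) * V''.
    """
    a = 1.0 / (rho + 2.0 * theta)
    c = sigma**2 * a / rho

    def V(y):
        return a * y**2 + c

    def dV(y):
        return 2.0 * a * y

    rng = np.random.default_rng(seed)
    drift = -theta * x
    noise = sigma * np.sqrt(dt) * rng.standard_normal(n_samples)
    x_next = x + drift * dt + noise  # one Euler-Maruyama step from x

    # Standard TD(0) increment evaluated with the exact value function.
    delta = x**2 * dt + np.exp(-rho * dt) * V(x_next) - V(x)

    # Assumed model-based correction: subtract V'(x) times the noise
    # increment, which has zero mean and is computable when the drift is known.
    correction = dV(x) * (x_next - x - drift * dt)
    delta_corrected = delta - correction

    # Rescale by dt, as a continuous-time limit would require.
    return float(np.std(delta / dt)), float(np.std(delta_corrected / dt))


if __name__ == "__main__":
    for dt in (1e-2, 1e-3, 1e-4):
        naive, corrected = td_increment_std(dt)
        print(f"dt={dt:g}  naive std={naive:.3f}  corrected std={corrected:.3f}")
```

In this toy setting the standard deviation of the naive rescaled increment grows roughly like 1/sqrt(dt), while the corrected one stays bounded as dt shrinks, which is the qualitative behavior the abstract attributes to the correction.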
Main file
VR_version2.pdf (319.36 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03574645 , version 1 (15-02-2022)
hal-03574645 , version 2 (10-06-2022)
hal-03574645 , version 3 (05-06-2023)

Cite

Ziad Kobeissi, Francis Bach. On a Variance-Reduction Correction for the Temporal-Difference Learning in the Stochastic Continuous Setting. 2022. ⟨hal-03574645v2⟩