On a Variance-Reduction Correction for the Temporal-Difference Learning in the Stochastic Continuous Setting

Ziad Kobeissi; Francis Bach

Preprint, Working Paper. Year: 2022


Ziad Kobeissi

Role: Author
PersonId : 1126035

Institut Louis Bachelier

Statistical Machine Learning and Parsimony

Francis Bach

Role: Author
PersonId : 863086

Statistical Machine Learning and Parsimony

Département d'informatique - ENS Paris

Université Paris Sciences et Lettres

Abstract

We consider the problem of policy evaluation for continuous-time processes using the temporal-difference learning algorithm. More precisely, from the time discretization of a stochastic differential equation, we aim to learn the continuous value function using TD(0). First, we show that the standard TD(0) algorithm is doomed to fail when the time step tends to zero because of the stochastic part of the dynamics. Then, we propose an additive zero-mean correction to the temporal difference that makes it robust to vanishing time steps. We propose two algorithms: the first is model-based, since it requires knowledge of the drift function of the dynamics; the second is model-free. We prove the convergence of the model-based algorithm to the continuous-time solution under a linear-parametrization assumption in two different regimes: the first with a convex regularization of the problem; the second using the Polyak-Juditsky averaging method with constant step size and without regularization. The convergence rate obtained in the latter regime is comparable with the state of the art for the simpler problem of linear regression using stochastic gradient descent methods. From a totally different perspective, our method may be applied to solve second-order elliptic equations in non-divergence form using machine learning.
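The variance issue described in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a one-dimensional Ornstein-Uhlenbeck process with drift b(x) = -x, a quadratic reward r(x) = x^2, an Euler-Maruyama discretization, and a discount factor exp(-rho * dt), all chosen only because the value function is then known in closed form. The correction subtracts the gradient of the value function times the noise increment, which has zero mean and requires the drift; this is one plausible reading of a model-based zero-mean correction and may differ from the authors' exact formulation. All function names are hypothetical.

```python
import math
import random
import statistics

# Illustrative constants (assumptions, not values from the paper).
RHO, SIGMA = 1.0, 1.0
# For dX = -X dt + SIGMA dW and r(x) = x^2, V(x) = A*x^2 + C solves
# RHO*V = r + b*V' + (SIGMA^2 / 2)*V''.
A = 1.0 / (RHO + 2.0)
C = SIGMA**2 * A / RHO


def drift(x):
    return -x


def reward(x):
    return x * x


def value(x):
    return A * x * x + C


def value_grad(x):
    return 2.0 * A * x


def td_samples(x, dt, n, rng):
    """Sample rescaled TD errors delta/dt at state x, with and without correction."""
    gamma = math.exp(-RHO * dt)
    plain, corrected = [], []
    for _ in range(n):
        # One Euler-Maruyama step of the SDE.
        x_next = x + dt * drift(x) + math.sqrt(dt) * SIGMA * rng.gauss(0.0, 1.0)
        delta = dt * reward(x) + gamma * value(x_next) - value(x)
        # Noise increment: zero-mean given x, computable only if the drift is known.
        noise_increment = x_next - x - dt * drift(x)
        delta_corrected = delta - gamma * value_grad(x) * noise_increment
        plain.append(delta / dt)
        corrected.append(delta_corrected / dt)
    return plain, corrected


def variances(dt, n=20000, seed=0):
    """Return (variance of plain TD/dt, variance of corrected TD/dt) at x = 1."""
    rng = random.Random(seed)
    plain, corrected = td_samples(1.0, dt, n, rng)
    return statistics.pvariance(plain), statistics.pvariance(corrected)


if __name__ == "__main__":
    for dt in (1e-1, 1e-2, 1e-3):
        v_plain, v_corr = variances(dt)
        print(f"dt={dt:g}  plain={v_plain:.3f}  corrected={v_corr:.3f}")
```

In this toy setting the variance of the uncorrected rescaled TD error grows roughly like 1/dt, whereas the corrected one stays of order one as dt shrinks, which is the qualitative behavior the abstract attributes to the stochastic part of the dynamics.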

Subject areas

Optimization and Control [math.OC]; Functional Analysis [math.FA]; Partial Differential Equations [math.AP]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

VR_version2.pdf (319.36 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-03574645

Submitted on: Friday, 10 June 2022, 15:46:58

Last modified on: Friday, 19 April 2024, 16:18:56

Dates and versions

hal-03574645, version 1 (15-02-2022)

hal-03574645, version 2 (10-06-2022)

hal-03574645, version 3 (05-06-2023)

Identifiers

HAL Id: hal-03574645, version 2
arXiv: 2202.07960

Cite

Ziad Kobeissi, Francis Bach. On a Variance-Reduction Correction for the Temporal-Difference Learning in the Stochastic Continuous Setting. 2022. ⟨hal-03574645v2⟩


