Preprint / working paper, 2022

On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting

Abstract

This paper deals with solving continuous-time, continuous-state, continuous-action optimization problems in stochastic settings with reinforcement-learning algorithms, focusing on the policy-evaluation step. We prove that standard learning algorithms based on the discretized temporal difference are doomed to fail as the time discretization tends to zero, because the stochastic part of the dynamics makes the variance of the discretized temporal-difference update blow up. We propose a variance-reduction correction of the temporal difference, leading to new learning algorithms that are stable with respect to vanishing time steps. This allows us to give theoretical guarantees that our algorithms converge to the solutions of the continuous stochastic optimization problems.
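
To make the variance issue concrete, here is a minimal sketch (not the paper's estimator; the dynamics, reward, parameters theta, sigma, rho, and the exact form of the correction are illustrative assumptions). It evaluates the known value function of a one-dimensional Ornstein-Uhlenbeck process and compares the variance of the rescaled temporal-difference error with and without a control-variate correction that subtracts the leading Brownian noise term sigma * V'(x) * dW:

```python
import numpy as np

# Illustrative sketch: for the Ornstein-Uhlenbeck dynamics
# dX = -theta * X dt + sigma dW, with reward r(x) = x^2 and discount rho,
# the value function is known in closed form, so we can measure the
# variance of the rescaled TD error directly.
rng = np.random.default_rng(0)
theta, sigma, rho = 1.0, 1.0, 0.5

a = 1.0 / (rho + 2.0 * theta)                 # V(x) = a * x^2 + c
c = (sigma**2 / (2.0 * theta)) * (1.0 / rho - a)

def V(x):   # exact value function for this linear-quadratic problem
    return a * x**2 + c

def dV(x):  # its derivative, used as a control variate below
    return 2.0 * a * x

x0 = 1.0
n_samples = 100_000
for dt in (1e-1, 1e-2, 1e-3):
    dW = rng.normal(0.0, np.sqrt(dt), n_samples)
    x1 = x0 - theta * x0 * dt + sigma * dW    # Euler-Maruyama step
    # Standard discretized TD error, rescaled by the time step dt.
    td = (x0**2 * dt + np.exp(-rho * dt) * V(x1) - V(x0)) / dt
    # Corrected TD error: subtract the zero-mean martingale increment
    # sigma * V'(x0) * dW, which carries the O(sqrt(dt)) noise.
    td_corr = td - dV(x0) * sigma * dW / dt
    print(f"dt={dt:.0e}  Var(plain)={td.var():.2f}  Var(corrected)={td_corr.var():.2f}")
```

As dt shrinks, the variance of the plain rescaled temporal difference grows like 1/dt, while the corrected version stays bounded, which illustrates the instability the abstract describes and why a variance-reduction correction restores stability as the time step vanishes.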
Main file: var_red_arxiv.pdf (334.96 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03574645 , version 1 (15-02-2022)
hal-03574645 , version 2 (10-06-2022)
hal-03574645 , version 3 (05-06-2023)

Identifiers

Cite

Ziad Kobeissi, Francis Bach. On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting. 2022. ⟨hal-03574645v1⟩
112 views
142 downloads
