Preprint / working paper, 2022

On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting

Abstract

This paper deals with solving continuous-time, continuous-state, continuous-action optimization problems in stochastic settings with reinforcement-learning algorithms, focusing on the policy-evaluation step. We prove that standard learning algorithms based on the discretized temporal difference are doomed to fail as the time discretization tends to zero, because the stochastic part of the dynamics makes the variance of the discretized temporal-difference update blow up. We propose a variance-reduction correction of the temporal difference, leading to new learning algorithms that are stable with respect to vanishing time steps. This allows us to give theoretical guarantees that our algorithms converge to the solutions of the continuous stochastic optimization problems.
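
To make the variance issue concrete, here is a minimal sketch (not the paper's estimator; the dynamics, reward, parameters theta, sigma, rho, and the exact form of the correction are illustrative assumptions). It evaluates the known value function of a one-dimensional Ornstein-Uhlenbeck process and compares the variance of the rescaled temporal-difference error with and without a control-variate correction that subtracts the leading Brownian noise term sigma * V'(x) * dW:

```python
import numpy as np

# Illustrative sketch: for the Ornstein-Uhlenbeck dynamics
# dX = -theta * X dt + sigma dW, with reward r(x) = x^2 and discount rho,
# the value function is known in closed form, so we can measure the
# variance of the rescaled TD error directly.
rng = np.random.default_rng(0)
theta, sigma, rho = 1.0, 1.0, 0.5

a = 1.0 / (rho + 2.0 * theta)                 # V(x) = a * x^2 + c
c = (sigma**2 / (2.0 * theta)) * (1.0 / rho - a)

def V(x):   # exact value function for this linear-quadratic problem
    return a * x**2 + c

def dV(x):  # its derivative, used as a control variate below
    return 2.0 * a * x

x0 = 1.0
n_samples = 100_000
for dt in (1e-1, 1e-2, 1e-3):
    dW = rng.normal(0.0, np.sqrt(dt), n_samples)
    x1 = x0 - theta * x0 * dt + sigma * dW    # Euler-Maruyama step
    # Standard discretized TD error, rescaled by the time step dt.
    td = (x0**2 * dt + np.exp(-rho * dt) * V(x1) - V(x0)) / dt
    # Corrected TD error: subtract the zero-mean martingale increment
    # sigma * V'(x0) * dW, which carries the O(sqrt(dt)) noise.
    td_corr = td - dV(x0) * sigma * dW / dt
    print(f"dt={dt:.0e}  Var(plain)={td.var():.2f}  Var(corrected)={td_corr.var():.2f}")
```

As dt shrinks, the variance of the plain rescaled temporal difference grows like 1/dt, while the corrected version stays bounded, which illustrates the instability the abstract describes and why a variance-reduction correction restores stability as the time step vanishes.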
Main file: var_red_arxiv.pdf (334.96 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03574645 , version 1 (15-02-2022)
hal-03574645 , version 2 (10-06-2022)
hal-03574645 , version 3 (05-06-2023)

Identifiers

Cite

Ziad Kobeissi, Francis Bach. On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting. 2022. ⟨hal-03574645v1⟩
112 views
142 downloads
