Preprint / Working Paper, Year: 2023

Temporal Difference Learning with Continuous Time and State in the Stochastic Setting

Abstract

We consider the problem of continuous-time policy evaluation: learning, from observations, the value function associated with an uncontrolled continuous-time stochastic dynamics and a reward function. We propose two original variants of the well-known TD(0) method that use vanishing time steps; one is model-free and the other is model-based. For both methods, we prove theoretical convergence rates, which we then verify through numerical simulations. Alternatively, these methods can be interpreted as novel reinforcement learning approaches for approximating solutions of linear PDEs (partial differential equations) or linear BSDEs (backward stochastic differential equations).
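To make the setting concrete, here is a minimal sketch of a model-free TD(0) update with a small time step h, in the spirit of the abstract above. The dynamics (an Ornstein-Uhlenbeck process), the reward r, the discount rate rho, the feature map phi, and the step-size schedule are all illustrative assumptions made for this sketch, not the paper's actual construction or its rates.

```python
import numpy as np

# Illustrative model-free TD(0) sketch with a small time step h, for
# evaluating the discounted value of an uncontrolled diffusion. Every
# modeling choice below (OU dynamics, reward, features, step sizes) is
# an assumption for this sketch, not the paper's construction.

rng = np.random.default_rng(0)

h = 1e-2       # time step (the paper studies the vanishing-step regime h -> 0)
rho = 1.0      # discount rate in V(x) = E[ int_0^inf e^{-rho t} r(X_t) dt | X_0 = x ]
sigma = 0.5    # diffusion coefficient of dX_t = -X_t dt + sigma dW_t

def r(x):      # reward function (arbitrary smooth choice)
    return np.cos(x)

def phi(x):    # features for the linear approximation V(x) ~ theta @ phi(x)
    return np.array([1.0, x, x * x])

theta = np.zeros(3)
x = rng.normal()
for k in range(200_000):
    # one Euler-Maruyama step of the uncontrolled dynamics
    x_next = x - x * h + sigma * np.sqrt(h) * rng.normal()
    # temporal-difference error accumulated over one step of length h
    delta = r(x) * h + np.exp(-rho * h) * (theta @ phi(x_next)) - theta @ phi(x)
    # semi-gradient TD(0) update with a slowly decaying step size
    alpha = 1.0 / (1.0 + 1e-3 * k)
    theta += alpha * delta * phi(x)
    x = x_next

print("fitted coefficients theta:", theta)
```

A model-based variant would additionally exploit knowledge of the drift and diffusion coefficients rather than relying only on sampled transitions; the paper's contribution is the convergence analysis of such schemes as the time step vanishes.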
Main file: v4_june2023.pdf (624.25 KB)
Origin: Files produced by the author(s)
License: CC BY - Attribution

Dates and versions

hal-03574645 , version 1 (15-02-2022)
hal-03574645 , version 2 (10-06-2022)
hal-03574645 , version 3 (05-06-2023)

License

CC BY - Attribution

Identifiers

HAL Id: hal-03574645

Cite

Ziad Kobeissi, Francis Bach. Temporal Difference Learning with Continuous Time and State in the Stochastic Setting. 2023. ⟨hal-03574645v3⟩
105 Views
134 Downloads

