Using Confounded Data in Latent Model-Based Reinforcement Learning - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article. Transactions on Machine Learning Research Journal. Year: 2023

Using Confounded Data in Latent Model-Based Reinforcement Learning

Maxime Gasse
  • Role: Author
  • PersonId: 1338059
Damien Grasset
Guillaume Gaudron
  • Role: Author
  • PersonId: 1119292
Pierre-Yves Oudeyer

Abstract

In the presence of confounding, naively using off-the-shelf offline reinforcement learning (RL) algorithms leads to sub-optimal behaviour. In this work, we propose a safe method to exploit confounded offline data in model-based RL, which improves the sample-efficiency of an interactive agent that collects and learns from online, unconfounded data. First, we import ideas from the well-established framework of do-calculus to express model-based RL as a causal inference problem, thus bridging the gap between the fields of RL and causality. Then, we propose a generic method for learning a causal transition model from offline and online data, which captures and corrects the confounding effect using a hidden latent variable. We demonstrate that our method is correct and efficient, in the sense that it attains better generalization guarantees thanks to the confounded offline data (in the asymptotic case), regardless of the confounding effect (the offline expert's behaviour). We showcase our method on a series of synthetic experiments, which demonstrate that a) using confounded offline data naively degrades the sample-efficiency of an RL agent collecting and learning from online data; b) using confounded offline data correctly improves its sample-efficiency.
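The gap between naive and causal use of confounded data can be illustrated with a small numerical sketch. This is not the authors' method or code: it is a hypothetical two-action toy problem, and every name and probability in it (`P_U`, `PI_EXPERT`, `P_Y`) is an assumption chosen for illustration. A hidden variable `u` influences both the expert's action and the outcome, so the observational estimate p(y | a) computed from expert data differs from the interventional quantity p(y | do(a)) that an agent acting on its own actually faces.

```python
# Toy confounded decision problem. All numbers are illustrative assumptions,
# not values taken from the paper.
P_U = [0.5, 0.5]                      # p(u): hidden confounder, u in {0, 1}
PI_EXPERT = [[0.9, 0.1], [0.1, 0.9]]  # PI_EXPERT[u][a] = p(a | u), expert sees u
P_Y = [[0.5, 0.9], [0.2, 0.8]]        # P_Y[a][u] = p(y = 1 | a, u)


def observational(a):
    """p(y = 1 | a), as a naive learner would estimate it from expert data."""
    joint = [P_U[u] * PI_EXPERT[u][a] for u in range(2)]  # p(u, a)
    p_a = sum(joint)                                      # p(a)
    # Weight outcomes by p(u | a): the expert's choice leaks information on u.
    return sum(joint[u] / p_a * P_Y[a][u] for u in range(2))


def interventional(a):
    """p(y = 1 | do(a)): the action is set independently of u."""
    # Weight outcomes by the marginal p(u), since do(a) cuts the u -> a link.
    return sum(P_U[u] * P_Y[a][u] for u in range(2))


def best_action(value_fn):
    """Greedy action under a given value estimate."""
    return max(range(2), key=value_fn)


if __name__ == "__main__":
    for a in range(2):
        print(f"a={a}: observational={observational(a):.2f}, "
              f"interventional={interventional(a):.2f}")
    print("naive choice:", best_action(observational))
    print("causal choice:", best_action(interventional))
```

In this toy setting action 0 is better for every value of `u`, yet the observational estimate ranks action 1 higher, because the expert tends to pick action 1 exactly when `u` is favourable. A learner that treats the expert data as if it were interventional would therefore prefer the wrong action.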
Main file
tmlrGasse23.pdf (880.49 KB)
Origin: Files produced by the author(s)
License: CC BY - Attribution

Dates and versions

hal-04404106 , version 1 (18-01-2024)

License

Attribution

Identifiers

  • HAL Id : hal-04404106 , version 1

Cite

Maxime Gasse, Damien Grasset, Guillaume Gaudron, Pierre-Yves Oudeyer. Using Confounded Data in Latent Model-Based Reinforcement Learning. Transactions on Machine Learning Research Journal, 2023. ⟨hal-04404106⟩
15 views
16 downloads
