Using Confounded Data in Latent Model-Based Reinforcement Learning

Maxime Gasse; Damien Grasset; Guillaume Gaudron; Pierre-Yves Oudeyer

Journal article, Transactions on Machine Learning Research Journal. Year: 2023


Maxime Gasse

Role: Author
PersonId: 1338059

ServiceNow Research

Damien Grasset

Role: Author
PersonId: 1119291

IRT Saint Exupéry - Institut de Recherche Technologique

Guillaume Gaudron

Role: Author
PersonId: 1119292

Ubisoft

Pierre-Yves Oudeyer

Role: Author
PersonId: 6675
IdHAL: pyoudeyer
ORCID: 0000-0002-9404-7613
IdRef: 081674481

Flowing Epigenetic Robots and Systems

Abstract

In the presence of confounding, naively using off-the-shelf offline reinforcement learning (RL) algorithms leads to sub-optimal behaviour. In this work, we propose a safe method to exploit confounded offline data in model-based RL, which improves the sample-efficiency of an interactive agent that collects and learns from online, unconfounded data. First, we import ideas from the well-established framework of do-calculus to express model-based RL as a causal inference problem, thus bridging the gap between the fields of RL and causality. Then, we propose a generic method for learning a causal transition model from offline and online data, which captures and corrects the confounding effect using a hidden latent variable. We demonstrate that our method is correct and efficient, in the sense that it attains better generalization guarantees thanks to the confounded offline data (in the asymptotic case), regardless of the confounding effect (the offline expert's behaviour). We showcase our method on a series of synthetic experiments, which demonstrate that a) using confounded offline data naively degrades the sample-efficiency of an RL agent collecting and learning from online data; b) using confounded offline data correctly improves its sample-efficiency.
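The gap between naive and causal use of confounded data can be illustrated with a toy example. The sketch below is not the paper's method or environment: it is a hypothetical one-step problem with a binary hidden confounder `z` that the offline expert observes, and every probability in it is an assumption chosen for illustration. It compares the observational reward estimate p(r=1 | a), which is what a naive offline learner would fit, with the interventional quantity p(r=1 | do(a)), which is what an agent acting online actually experiences.

```python
# Toy confounded one-step problem. All numbers are illustrative assumptions,
# not values taken from the paper.

# Prior over the hidden confounder z (seen by the expert, not logged).
P_Z = {0: 0.5, 1: 0.5}

# Expert policy p(a | z): the expert mostly picks a = z.
P_A_GIVEN_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}

# Reward model p(r = 1 | z, a): action 0 is better than action 1 for every z.
P_R1_GIVEN_ZA = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.9, (1, 1): 0.8}


def observational(a):
    """p(r=1 | a) in the offline data: z is weighted by p(z | a)."""
    joint = {z: P_Z[z] * P_A_GIVEN_Z[z][a] for z in P_Z}
    p_a = sum(joint.values())
    return sum(joint[z] / p_a * P_R1_GIVEN_ZA[(z, a)] for z in P_Z)


def interventional(a):
    """p(r=1 | do(a)): z is weighted by its prior p(z), ignoring the expert."""
    return sum(P_Z[z] * P_R1_GIVEN_ZA[(z, a)] for z in P_Z)


if __name__ == "__main__":
    for a in (0, 1):
        print(f"a={a}: observational={observational(a):.2f}, "
              f"interventional={interventional(a):.2f}")
    print("naive choice:", max((0, 1), key=observational))
    print("causal choice:", max((0, 1), key=interventional))
```

In this toy setting the observational estimate favours action 1 because the expert tends to choose it when `z` is favourable, whereas the interventional estimate favours action 0. Computing the interventional quantity here requires knowing the distribution of `z`; the abstract describes recovering that kind of information through a learned latent variable instead.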

Keywords

Reinforcement learning; RL; causal learning; confounded data; model-based RL

Domains

Computer Science [cs]

Main file

tmlrGasse23.pdf (880.49 KB)

Origin: Files produced by the author(s)
License: CC BY - Attribution


https://inria.hal.science/hal-04404106

Submitted on: Thursday, 18 January 2024, 18:47:43

Last modified on: Tuesday, 23 January 2024, 03:51:14

Dates and versions

hal-04404106, version 1 (18-01-2024)

License

Attribution

Identifiers

HAL Id: hal-04404106, version 1

Cite

Maxime Gasse, Damien Grasset, Guillaume Gaudron, Pierre-Yves Oudeyer. Using Confounded Data in Latent Model-Based Reinforcement Learning. Transactions on Machine Learning Research Journal, 2023. ⟨hal-04404106⟩


Collections

INRIA INRIA2 IRT_SAINT-EXUPERY

15 views

16 downloads
