Causal Reinforcement Learning using Observational and Interventional Data

Maxime Gasse; Damien Grasset; Guillaume Gaudron; Pierre-Yves Oudeyer

Preprint, Working Paper. Year: 2021


Maxime Gasse

Role: Author
PersonId : 1119290

École Polytechnique de Montréal

Damien Grasset

Role: Author
PersonId : 1119291

IRT Saint Exupéry - Institut de Recherche Technologique

Guillaume Gaudron

Role: Author
PersonId : 1119292

Ubisoft

Pierre-Yves Oudeyer

Role: Author
PersonId : 6675
IdHAL : pyoudeyer
ORCID : 0000-0002-9404-7613
IdRef : 081674481

Flowing Epigenetic Robots and Systems

Abstract

Efficiently learning a causal model of the environment is a key challenge for model-based RL agents operating in POMDPs. We consider a scenario where the learning agent can collect online experiences through direct interaction with the environment (interventional data), but also has access to a large collection of offline experiences, obtained by observing another agent interacting with the environment (observational data). A key ingredient that makes this situation non-trivial is that we allow the observed agent to act based on hidden information, which is not observed by the learning agent. We then ask the following questions: can the online and offline experiences be safely combined for learning a causal model? And can we expect the offline experiences to improve the agent's performance? To answer these questions, we import ideas from the well-established causal framework of do-calculus, and we express model-based reinforcement learning as a causal inference problem. Then, we propose a general yet simple methodology for leveraging offline data during learning. In a nutshell, the method relies on learning a latent-based causal transition model that explains both the interventional and observational regimes, and then using the recovered latent variable to infer the standard POMDP transition model via deconfounding. We prove that our method is correct and efficient, in the sense that it attains better generalization guarantees thanks to the offline data (in the asymptotic case), and we illustrate its effectiveness empirically on synthetic toy problems. Our contribution aims at bridging the gap between the fields of reinforcement learning and causality.
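To make the role of the hidden information concrete, the following is a minimal single-step sketch, not the method of the paper. It assumes a hypothetical binary hidden context `z` seen only by the observed agent, a binary action `a`, and a binary outcome `y`; every probability below is made up for illustration. It computes exactly what the offline data would suggest, P(y=1 | a), and what the learning agent actually faces when it acts without seeing `z`, P(y=1 | do(a)).

```python
# Toy single-step example. All names and numbers are illustrative assumptions,
# not taken from the paper.
# z: hidden binary context, visible to the observed agent only
# a: binary action, y: binary outcome

P_Z = {0: 0.5, 1: 0.5}                  # prior over the hidden context
PI_OBS = {0: {0: 0.9, 1: 0.1},          # observed agent's policy P(a | z)
          1: {0: 0.1, 1: 0.9}}
P_Y1 = {(0, 0): 0.8, (0, 1): 0.2,       # P(y=1 | z, a), keyed by (z, a)
        (1, 0): 0.2, (1, 1): 0.8}


def observational(a):
    """P(y=1 | a) under the observed agent: z is weighted by P(z | a)."""
    joint = {z: P_Z[z] * PI_OBS[z][a] for z in P_Z}   # P(z, a)
    p_a = sum(joint.values())                         # P(a)
    return sum(joint[z] / p_a * P_Y1[(z, a)] for z in P_Z)


def interventional(a):
    """P(y=1 | do(a)): a is set independently of z, so z keeps its prior."""
    return sum(P_Z[z] * P_Y1[(z, a)] for z in P_Z)


for a in (0, 1):
    print(a, round(observational(a), 4), round(interventional(a), 4))
```

In this toy setting the observed agent tends to pick the action that matches the hidden context, so the offline data make each action look better than it is for an agent that cannot see `z`. Using the offline conditional directly as a transition model would therefore be biased. Averaging over the latent variable with its prior, as in `interventional`, is the standard adjustment that removes this bias once a latent model is available.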

Domains

Computer Science [cs]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

2106.14421.pdf (936.44 KB)

Origin: Files produced by the author(s)

Contributor: Pierre-Yves Oudeyer

https://inria.hal.science/hal-03465488

Submitted on: Friday, December 3, 2021, 17:37:37

Last modified on: Wednesday, March 15, 2023, 08:50:07

Long-term archiving on: Friday, March 4, 2022, 19:32:58

Dates and versions

hal-03465488, version 1 (2021-12-03)

Identifiers

HAL Id: hal-03465488, version 1

Cite

Maxime Gasse, Damien Grasset, Guillaume Gaudron, Pierre-Yves Oudeyer. Causal Reinforcement Learning using Observational and Interventional Data. 2021. ⟨hal-03465488⟩


Collections

ENSTA INRIA INRIA2 IRT_SAINT-EXUPERY

52 views

99 downloads
