Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping

Lina Mezghani; Sainbayar Sukhbaatar; Piotr Bojanowski; Alessandro Lazaric; Karteek Alahari

Conference paper. Year: 2022



Lina Mezghani

Role: Author

Apprentissage de modèles à partir de données massives

Meta AI

Sainbayar Sukhbaatar

Role: Author

Meta AI

Piotr Bojanowski

Role: Author

Meta AI

Alessandro Lazaric

Role: Author
PersonId : 1105515

Meta AI

Karteek Alahari

Role: Author
PersonId : 19670
IdHAL : karteek
ORCID : 0000-0002-1838-5936
IdRef : 196283892

Apprentissage de modèles à partir de données massives

Abstract

Developing agents that can execute multiple skills by learning from pre-collected datasets is an important problem in robotics, where online interaction with the environment is extremely time-consuming. Moreover, manually designing reward functions for every single desired skill is prohibitive. Prior works targeted these challenges by learning goal-conditioned policies from offline datasets without manually specified rewards, through hindsight relabeling. These methods suffer from the issue of sparsity of rewards, and fail at long-horizon tasks. In this work, we propose a novel self-supervised learning phase on the pre-collected dataset to understand the structure and the dynamics of the model, and shape a dense reward function for learning policies offline. We evaluate our method on three continuous control tasks, and show that our model significantly outperforms existing approaches, especially on tasks that involve long-term planning.
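As a rough illustration of the reward-sparsity issue mentioned in the abstract, the sketch below relabels a toy trajectory in hindsight and compares a sparse goal-reaching reward with a dense one. Everything here is an assumption made for illustration: the function names, the threshold `eps`, the state-only transitions (actions are omitted), and in particular the dense reward, which is a plain negative Euclidean distance standing in for the shaped reward that the paper obtains from its self-supervised learning phase (not described on this page).

```python
import math
import random


def sparse_reward(state, goal, eps=0.1):
    """Return 1.0 only when the state lies within eps of the goal, else 0.0."""
    return 1.0 if math.dist(state, goal) <= eps else 0.0


def dense_reward(state, goal):
    """Hypothetical shaped reward: negative Euclidean distance to the goal.

    This is a stand-in, not the learned reward from the paper.
    """
    return -math.dist(state, goal)


def relabel(trajectory, reward_fn, rng):
    """Hindsight relabeling over a list of states.

    For each transition (s, s_next), pick a goal among the states reached
    later in the same trajectory and recompute the reward for that goal.
    """
    relabeled = []
    for t in range(len(trajectory) - 1):
        s, s_next = trajectory[t], trajectory[t + 1]
        goal = trajectory[rng.randrange(t + 1, len(trajectory))]
        relabeled.append((s, s_next, goal, reward_fn(s_next, goal)))
    return relabeled


# Toy trajectory: six states moving along the x-axis in unit steps.
trajectory = [(float(i), 0.0) for i in range(6)]

# Identical seeds so both variants sample the same hindsight goals.
sparse = relabel(trajectory, sparse_reward, random.Random(0))
dense = relabel(trajectory, dense_reward, random.Random(0))

for (_, s_next, goal, r_sparse), (_, _, _, r_dense) in zip(sparse, dense):
    print(f"next={s_next} goal={goal} sparse={r_sparse} dense={r_dense:.1f}")
```

With the sparse reward, a transition carries signal only when its next state coincides with the sampled goal, whereas the distance-based reward grades every transition by how close it gets to the goal. This is the kind of difference that matters when goals are many steps away.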

Keywords

Offline RL; Self-Supervised Learning; Goal-Conditioned RL

Domains

Computer Vision and Pattern Recognition [cs.CV]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

offline_reward_shaping_CoRL22.pdf (2.31 MB)

Origin: Files produced by the author(s)

Contributor: Karteek Alahari

https://inria.hal.science/hal-03869706

Submitted on: Thursday, November 24, 2022, 13:54:48

Last modified on: Thursday, April 4, 2024, 21:28:45

Long-term archiving on: Saturday, February 25, 2023, 19:37:09

Dates and versions

hal-03869706, version 1 (24-11-2022)

Identifiers

HAL Id: hal-03869706, version 1

Cite

Lina Mezghani, Sainbayar Sukhbaatar, Piotr Bojanowski, Alessandro Lazaric, Karteek Alahari. Learning Goal-Conditioned Policies Offline with Self-Supervised Reward Shaping. CoRL 2022 - Conference on Robot Learning, Dec 2022, Auckland, New Zealand. pp. 1-15. ⟨hal-03869706⟩


Collections

UGA CNRS INRIA LJK LJK_GI INRIA2 LJK-GI-THOTH ANR

67 views

133 downloads
