PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

Shizhe Chen; Ricardo Garcia; Cordelia Schmid; Ivan Laptev

Conference paper. Year: 2023

Shizhe Chen

Role: Author

Models of visual object recognition and scene understanding

Ricardo Garcia

Role: Author

Models of visual object recognition and scene understanding

Cordelia Schmid

Role: Author

Models of visual object recognition and scene understanding

Ivan Laptev

Role: Author

Models of visual object recognition and scene understanding

Abstract

The ability of robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose PolarNet, a 3D point cloud-based policy for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data-efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.

Keywords

Robotic manipulation; 3D point clouds; Language-guided policy

Domains

Robotics [cs.RO]; Artificial Intelligence [cs.AI]; Computer Vision and Pattern Recognition [cs.CV]; Machine Learning [stat.ML]

Main file

2309.15596.pdf (14.87 MB)

Origin: Files produced by the author(s)

Contributor: Ricardo Garcia

https://hal.science/hal-04221153

Submitted on: Thursday, September 28, 2023, 12:43:40

Last modified on: Friday, April 19, 2024, 16:18:58

Long-term archiving on: Friday, December 29, 2023, 18:56:24

Dates and versions

hal-04221153, version 1 (28-09-2023)

License

Attribution

Identifiers

HAL Id: hal-04221153, version 1
arXiv: 2309.15596

Cite

Shizhe Chen, Ricardo Garcia, Cordelia Schmid, Ivan Laptev. PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation. 7th Conference on Robot Learning (CoRL 2023), Nov 2023, Atlanta, GA, United States. ⟨hal-04221153⟩

Collections

ENS-PARIS CNRS INRIA INRIA2 GENCI PSL ANR PRAIRIE-IA
