LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation

Florent Bartoccioni; Éloi Zablocki; Andrei Bursuc; Patrick Pérez; Matthieu Cord; Karteek Alahari

Conference paper, Year: 2022

Affiliations

Florent Bartoccioni (Author): Apprentissage de modèles à partir de données massives (Learning models from massive data); Valeo.ai

Éloi Zablocki (Author): Valeo.ai

Andrei Bursuc (Author): Valeo.ai

Patrick Pérez (Author): Valeo.ai

Matthieu Cord (Author): Valeo.ai; Sorbonne Université

Karteek Alahari (Author): Apprentissage de modèles à partir de données massives (Learning models from massive data)
PersonId: 19670
IdHAL: karteek
ORCID: 0000-0002-1838-5936
IdRef: 196283892

Abstract

Recent works in autonomous driving have widely adopted the bird's-eye-view (BEV) semantic map as an intermediate representation of the world. Online prediction of these BEV maps involves non-trivial operations such as extracting data from multiple cameras, then fusing and projecting it into a common top-view grid. This is usually done either with error-prone geometric operations (e.g., homography or back-projection from monocular depth estimation) or with expensive direct dense mapping between image pixels and BEV pixels (e.g., with an MLP or attention). In this work, we present 'LaRa', an efficient encoder-decoder, transformer-based model for vehicle semantic segmentation from multiple cameras. Our approach uses a system of cross-attention to aggregate information from multiple sensors into a compact, yet rich, collection of latent representations. These latent representations, after being processed by a series of self-attention blocks, are then reprojected into the BEV space with a second cross-attention. We demonstrate that our model outperforms the best previous transformer-based works on nuScenes. The code and trained models are available at https://github.com/valeoai/LaRa.
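The three-stage pipeline described in the abstract (cross-attention from latents to all camera features, self-attention over the latents, cross-attention from BEV queries to the latents) can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is in the repository linked above). It assumes single-head attention without learned projections, MLPs or normalization, and uses random arrays in place of image features, learned latents and BEV queries. The names `lara_like_forward` and `attention_score_counts` and all sizes are hypothetical. The counting helper shows why the latent bottleneck is cheaper than direct dense pixel-to-BEV attention: its cost grows with n_latents * (n_pixels + n_bev) instead of n_pixels * n_bev.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections; an assumption)."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)    # (n_q, n_k)
    return softmax(scores, axis=-1) @ values  # (n_q, d)


def lara_like_forward(image_tokens, latents, bev_queries, n_self_blocks=2):
    """Hypothetical sketch of the latent bottleneck described in the abstract.

    image_tokens: (n_cams * h * w, d) features of all cameras, flattened together
    latents:      (n_latents, d) stand-in for a learned latent array
    bev_queries:  (bev_h * bev_w, d) stand-in for one query per BEV cell
    """
    # 1) Encoder: latents cross-attend to the tokens of every camera at once,
    #    aggregating multi-camera information into a compact set of vectors.
    z = latents + attention(latents, image_tokens, image_tokens)
    # 2) Processing: a series of self-attention blocks acting on the latents only.
    for _ in range(n_self_blocks):
        z = z + attention(z, z, z)
    # 3) Decoder: each BEV query cross-attends to the processed latents.
    return bev_queries + attention(bev_queries, z, z)


def attention_score_counts(n_pixels, n_bev, n_latents, n_self_blocks):
    """Number of attention scores: dense pixel-to-BEV vs. the latent route."""
    dense = n_pixels * n_bev
    latent = n_latents * (n_pixels + n_bev) + n_self_blocks * n_latents ** 2
    return dense, latent


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_cams, h, w, d = 6, 8, 16, 32     # toy feature-map sizes (assumed)
    n_latents, bev_h, bev_w = 64, 25, 25
    tokens = rng.standard_normal((n_cams * h * w, d))
    latents = rng.standard_normal((n_latents, d))
    queries = rng.standard_normal((bev_h * bev_w, d))
    bev = lara_like_forward(tokens, latents, queries)
    print(bev.shape)
    print(attention_score_counts(n_cams * h * w, bev_h * bev_w, n_latents, 2))
```

With these toy sizes (768 image tokens, 625 BEV cells, 64 latents), dense attention would compute 480,000 scores, whereas the latent route computes 97,344. In a real model a per-cell classifier would then map each BEV vector to vehicle/background logits; that head is omitted here.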

Keywords

bird's eye view, semantic segmentation, encoder-decoder, transformers

Domains

Computer Science [cs]

Main file

LaRa_clean_arxiv.pdf (8.92 MB)

Origin: files produced by the author(s)

Contributor: THOTH Team

https://hal.science/hal-03875582

Submitted on: Monday, 28 November 2022, 14:44:10

Last modified on: Saturday, 27 April 2024, 03:12:18

Dates and versions

hal-03875582 , version 1 (28-11-2022)

Identifiers

HAL Id: hal-03875582, version 1
arXiv: 2206.13294

Cite

Florent Bartoccioni, Éloi Zablocki, Andrei Bursuc, Patrick Pérez, Matthieu Cord, et al. LaRa: Latents and Rays for Multi-Camera Bird's-Eye-View Semantic Segmentation. CoRL 2022 - Conference on Robot Learning, Dec 2022, Auckland, New Zealand. ⟨hal-03875582⟩

Collections

UGA CNRS INRIA INSMI ISIR LJK LJK_GI INRIA2 GENCI LJK-GI-THOTH SORBONNE-UNIVERSITE SU-SCIENCES ANR ISIR_MLIA
