NetVLAD: CNN architecture for weakly supervised place recognition

Relja Arandjelovic; Petr Gronat; Akihiko Torii; Tomas Pajdla; Josef Sivic

Conference paper, Year: 2016

Relja Arandjelovic

Role: Author
PersonId : 3839
IdHAL : reljaarandjelovic
ORCID : 0000-0002-9232-4023
IdRef : 253131251

Models of visual object recognition and scene understanding

Laboratoire d'informatique de l'école normale supérieure

Petr Gronat

Role: Author

Models of visual object recognition and scene understanding

Laboratoire d'informatique de l'école normale supérieure

Akihiko Torii

Role: Author

Tokyo Institute of Technology [Tokyo]

Tomas Pajdla

Role: Author
PersonId : 966475

Department of Cybernetics [Prague]

Josef Sivic

Role: Author

Models of visual object recognition and scene understanding

Laboratoire d'informatique de l'école normale supérieure

Abstract

We tackle the problem of large-scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn parameters of the architecture in an end-to-end manner from images depicting the same places over time, downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture obtains a large improvement in performance over non-learnt image representations and significantly outperforms off-the-shelf CNN descriptors on two challenging place recognition benchmarks, and that it outperforms current state-of-the-art compact image representations on standard image retrieval benchmarks.
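
The abstract does not spell out how the generalized VLAD layer aggregates local descriptors, so the following is only an illustrative sketch under stated assumptions: each local descriptor is softly assigned to a set of cluster centres through a softmax over linear scores, the weighted residuals to each centre are summed, and the result is normalized per cluster and then as a whole. The function name `soft_vlad` and the parameters `centers`, `weights`, and `biases` are hypothetical names chosen for this example, not identifiers from the paper or its code. Because every step is a sum, product, or softmax, such a layer is differentiable, which is consistent with the abstract's remark that it can be trained via backpropagation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def soft_vlad(descriptors, centers, weights, biases):
    """Aggregate N local D-dim descriptors into one vector of length K*D.

    Assumed formulation (not stated in the abstract): descriptor x is
    softly assigned to cluster k with weight softmax_k(w_k . x + b_k),
    and the weighted residuals (x - c_k) are summed per cluster.
    """
    K, D = len(centers), len(centers[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        scores = [
            sum(w_j * x_j for w_j, x_j in zip(weights[k], x)) + biases[k]
            for k in range(K)
        ]
        assign = softmax(scores)
        for k in range(K):
            for j in range(D):
                vlad[k][j] += assign[k] * (x[j] - centers[k][j])
    # Normalize each cluster's residual sum, then the flattened vector.
    vlad = [l2_normalize(row) for row in vlad]
    flat = [v for row in vlad for v in row]
    return l2_normalize(flat)
```

In a trainable setting, `centers`, `weights`, and `biases` would be the learned parameters and `descriptors` would come from the last convolutional feature map; here they are plain Python lists so the sketch stays self-contained.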

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

cvpr16_place.pdf (4.5 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01242052

Submitted on: Monday, 23 May 2016, 17:38:16

Last modified on: Friday, 19 April 2024, 16:18:55

Dates and versions

hal-01242052, version 1 (12-12-2015)

hal-01242052, version 2 (10-03-2016)

hal-01242052, version 3 (23-05-2016)

Identifiers

HAL Id: hal-01242052, version 3
arXiv: 1511.07247

Cite

Relja Arandjelovic, Petr Gronat, Akihiko Torii, Tomas Pajdla, Josef Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. CVPR 2016 - 29th IEEE Conference on Computer Vision and Pattern Recognition, Jun 2016, Las Vegas, United States. ⟨hal-01242052v3⟩

Collections

ENS-PARIS CNRS INRIA INRIA2 PSL

551 views

1040 downloads
