Local Features and Visual Words Emerge in Activations

Oriane Siméoni 1 Yannis Avrithis 1 Ondřej Chum 2
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
Inria Rennes – Bretagne Atlantique , IRISA-D6 - MEDIA ET INTERACTIONS
2 VRG - Visual Recognition Group [Prague]
CTU/FEE - Faculty of electrical engineering [Prague]
Abstract : We propose a novel method of deep spatial matching (DSM) for image retrieval. Initial ranking is based on image descriptors extracted from convolutional neural network activations by global pooling, as in recent state-of-the-art work. However, the same sparse 3D activation tensor is also approximated by a collection of local features. These local features are then robustly matched to approximate the optimal alignment of the tensors. This happens without any network modification, additional layers or training. No local feature detection happens on the original image. No local feature descriptors and no visual vocabulary are needed throughout the whole process. We experimentally show that the proposed method achieves state-of-the-art performance on standard benchmarks across different network architectures and different global pooling methods. The highest gain in performance is achieved when diffusion on the nearest-neighbor graph of global descriptors is initiated from spatially verified images.
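The two views of the activation tensor described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `global_max_pool` and `local_features` are hypothetical, max pooling stands in for whichever global pooling method is used, and taking the single strongest location per channel is an assumed simplification of local feature extraction. The robust matching step is not shown.

```python
import math


def global_max_pool(tensor):
    """Pool a tensor indexed as tensor[channel][y][x] into one value per
    channel (spatial max), then L2-normalize the resulting descriptor."""
    desc = [max(max(row) for row in channel) for channel in tensor]
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]


def local_features(tensor):
    """Assumed simplification: for every channel with a nonzero response,
    keep its strongest location as (channel, y, x, strength). The channel
    index plays the role of a visual word, so no vocabulary is built."""
    feats = []
    for c, channel in enumerate(tensor):
        strength, y, x = max(
            (v, y, x)
            for y, row in enumerate(channel)
            for x, v in enumerate(row)
        )
        if strength > 0:
            feats.append((c, y, x, strength))
    return feats


def cosine(a, b):
    """Dot product of two L2-normalized descriptors."""
    return sum(p * q for p, q in zip(a, b))


# Toy sparse activation tensors: 3 channels, 2 x 3 spatial grid.
query = [
    [[0.0, 0.0, 0.0], [0.0, 3.0, 0.0]],
    [[0.0, 0.0, 4.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # silent channel, yields no feature
]
database = [
    [[3.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 4.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
]

q_desc = global_max_pool(query)
d_desc = global_max_pool(database)
similarity = cosine(q_desc, d_desc)
q_feats = local_features(query)
d_feats = local_features(database)
```

In this toy case the two global descriptors are identical even though the activations sit at different positions, while the local features keep those positions. That positional information is what a spatial matching stage would have to work with after the initial ranking.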
Document type :
Preprints, Working Papers, ...

Cited literature [43 references]

https://hal.inria.fr/hal-02370209
Contributor : Yannis Avrithis
Submitted on : Tuesday, November 19, 2019 - 12:52:48 PM
Last modification on : Thursday, November 21, 2019 - 1:21:05 AM

File

1905.06358.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02370209, version 1
  • ARXIV : 1905.06358

Citation

Oriane Siméoni, Yannis Avrithis, Ondřej Chum. Local Features and Visual Words Emerge in Activations. 2019. ⟨hal-02370209⟩
