Journal article, International Journal of Computer Vision, Year: 2013

Image Classification with the Fisher Vector: Theory and Practice

Florent Perronnin
  • Role: Author
  • PersonId: 928545
Thomas Mensink
  • Role: Author
  • PersonId: 940630

Abstract

A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual-words (BoV) representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call the Fisher Vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
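To make the idea of "deviation from a Gaussian mixture model" concrete, the following is a minimal pure-Python sketch of a Fisher Vector encoder. It assumes a GMM with diagonal covariances whose parameters are already trained, and it uses the commonly cited gradient statistics with respect to the means and standard deviations, averaged over the descriptors, followed by signed square-root and L2 normalization. The function name `fisher_vector`, its argument layout, and the toy values are illustrative assumptions, not the authors' code.

```python
import math

def fisher_vector(descriptors, weights, means, sigmas):
    """Encode D-dimensional descriptors as a 2*K*D vector (illustrative sketch).

    weights[k], means[k][d] and sigmas[k][d] describe a K-component GMM with
    diagonal covariances; sigmas are standard deviations (an assumption here).
    """
    n, k_count, dim = len(descriptors), len(weights), len(means[0])
    grad_mu = [[0.0] * dim for _ in range(k_count)]
    grad_sigma = [[0.0] * dim for _ in range(k_count)]

    for x in descriptors:
        # Log of w_k * N(x | mu_k, diag(sigma_k^2)) for every component.
        log_p = []
        for k in range(k_count):
            s = math.log(weights[k])
            for d in range(dim):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                s -= 0.5 * z * z + math.log(sigmas[k][d]) + 0.5 * math.log(2 * math.pi)
            log_p.append(s)
        # Soft assignments (posteriors), computed stably by shifting by the max.
        top = max(log_p)
        unnorm = [math.exp(v - top) for v in log_p]
        total = sum(unnorm)
        for k in range(k_count):
            gamma = unnorm[k] / total
            for d in range(dim):
                z = (x[d] - means[k][d]) / sigmas[k][d]
                grad_mu[k][d] += gamma * z                  # first-order deviation
                grad_sigma[k][d] += gamma * (z * z - 1.0)   # second-order deviation

    fv = []
    for k in range(k_count):
        fv.extend(g / (n * math.sqrt(weights[k])) for g in grad_mu[k])
    for k in range(k_count):
        fv.extend(g / (n * math.sqrt(2.0 * weights[k])) for g in grad_sigma[k])

    # Signed square root, then L2 normalization.
    fv = [math.copysign(math.sqrt(abs(v)), v) for v in fv]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv

# Toy usage: a 2-component GMM in 2 dimensions and three made-up descriptors.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
descriptors = [[0.5, 0.0], [1.0, 0.0], [0.5, 0.0]]
fv = fisher_vector(descriptors, weights, means, sigmas)
```

The resulting vector has 2·K·D entries regardless of how many descriptors the image contains, which is what allows it to be fed directly to a linear classifier. In practice the GMM would be fit on a large sample of descriptors, and the loops would be vectorized.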
Main file
journal.pdf (411.11 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00830491 , version 1 (05-06-2013)
hal-00830491 , version 2 (12-06-2013)

Cite

Jorge Sanchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek. Image Classification with the Fisher Vector: Theory and Practice. International Journal of Computer Vision, 2013, 105 (3), pp.222-245. ⟨10.1007/s11263-013-0636-x⟩. ⟨hal-00830491v2⟩
