Image Classification with the Fisher Vector: Theory and Practice

Jorge Sanchez 1,*, Florent Perronnin 2,*, Thomas Mensink 2,3,*, Jakob Verbeek 3,*
* Corresponding author
3 LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract : A standard approach to describing an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual-words (BOV) representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a "universal" generative Gaussian mixture model. This representation, which we call the Fisher Vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
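The abstract describes the encoding only in words (deviation of patches from a Gaussian mixture model), so the following is a minimal sketch rather than the report's implementation. It assumes a diagonal-covariance GMM and the commonly used gradient statistics with respect to the means and standard deviations; the function name `fisher_vector` and its parameter names are hypothetical, and any normalization or compression step is left out.

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Sketch of a Fisher Vector for one image (assumed diagonal-covariance GMM).

    X: (N, D) local patch descriptors.
    weights: (K,) mixture weights; means, sigmas: (K, D) per-Gaussian parameters.
    Returns a vector of length 2 * K * D.
    """
    N, D = X.shape
    # Deviation of each descriptor from each Gaussian, scaled by sigma: (N, K, D)
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log-density of each descriptor under each diagonal Gaussian: (N, K)
    log_prob = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)[None, :]
                - 0.5 * D * np.log(2.0 * np.pi))
    # Soft assignments (posteriors), computed stably in the log domain
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Average gradient statistics w.r.t. means and standard deviations: (K, D) each
    g_mu = np.einsum('nk,nkd->kd', gamma, diff) / (N * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum('nk,nkd->kd', gamma, diff ** 2 - 1.0)
               / (N * np.sqrt(2.0 * weights)[:, None]))
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])

# Usage with made-up data: 100 descriptors of dimension 8, a 4-component GMM.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
fv_demo = fisher_vector(
    X_demo,
    weights=np.full(4, 0.25),
    means=rng.normal(size=(4, 8)),
    sigmas=np.ones((4, 8)),
)
```

With K Gaussians and D-dimensional descriptors the signature has 2KD entries, which is why such a representation is high dimensional and pairs naturally with linear classifiers.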
Document type : Reports

https://hal.inria.fr/hal-00779493
Contributor : Jakob Verbeek
Submitted on : Wednesday, June 12, 2013 - 3:40:49 PM
Last modification on : Friday, August 2, 2019 - 3:24:01 PM
Long-term archiving on : Tuesday, April 4, 2017 - 8:52:17 PM

Files

RR-8209.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00779493, version 3

Citation

Jorge Sanchez, Florent Perronnin, Thomas Mensink, Jakob Verbeek. Image Classification with the Fisher Vector: Theory and Practice. [Research Report] RR-8209, INRIA. 2013. ⟨hal-00779493v3⟩
