Image categorization using Fisher kernels of non-iid image models

Ramazan Gokberk Cinbis (1), Jakob Verbeek (1), Cordelia Schmid (1)
(1) LEAR - Learning and recognition in vision
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
Abstract: The bag-of-words (BoW) model treats an image as an unordered set of local regions and represents it by a visual word histogram. Implicitly, regions are assumed to be independently and identically distributed (iid), which is a poor assumption from a modeling perspective. We introduce non-iid models by treating the parameters of BoW models as latent variables which are integrated out, rendering all local regions dependent. Using the Fisher kernel, we encode an image by the gradient of the data log-likelihood w.r.t. the hyper-parameters that control priors on the model parameters. Our representation naturally involves discounting transformations similar to taking square-roots, providing an explanation of why such transformations have proven successful. Using variational inference, we extend the basic model to include Gaussian mixtures over local descriptors, and latent topic models to capture the co-occurrence structure of visual words, both improving performance. Our models yield state-of-the-art categorization performance using linear classifiers, without non-linear transformations such as taking square-roots of features and without (approximate) explicit embeddings of non-linear kernels.
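The encoding described in the abstract can be illustrated with a minimal sketch. It assumes the simplest case: a multinomial BoW model whose parameters have a Dirichlet prior and are integrated out, so that the hyper-parameters are the Dirichlet parameters. The abstract does not spell out this choice of prior, and the function names (`digamma`, `polya_log_likelihood`, `fisher_score`) are hypothetical. The sketch omits the Fisher information normalization as well as the Gaussian mixture and topic model extensions, so it is not the authors' implementation.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Shift x upward until the asymptotic expansion is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result


def polya_log_likelihood(counts, alpha):
    """Log-likelihood of visual word counts with the multinomial integrated out.

    Assumes a Dirichlet(alpha) prior. The multinomial coefficient is dropped
    because it does not depend on alpha.
    """
    total_alpha = sum(alpha)
    total_count = sum(counts)
    ll = math.lgamma(total_alpha) - math.lgamma(total_count + total_alpha)
    for n_k, a_k in zip(counts, alpha):
        ll += math.lgamma(n_k + a_k) - math.lgamma(a_k)
    return ll


def fisher_score(counts, alpha):
    """Gradient of polya_log_likelihood with respect to each alpha_k."""
    total_alpha = sum(alpha)
    total_count = sum(counts)
    shared = digamma(total_alpha) - digamma(total_count + total_alpha)
    return [shared + digamma(n_k + a_k) - digamma(a_k)
            for n_k, a_k in zip(counts, alpha)]


# Hypothetical histogram over four visual words, with a symmetric prior.
counts = [0, 1, 10, 11]
alpha = [0.5, 0.5, 0.5, 0.5]
score = fisher_score(counts, alpha)
print(score)
```

Under these assumptions each component depends on the count n_k through digamma(n_k + alpha_k), which grows roughly logarithmically. Going from 0 to 1 occurrences of a visual word therefore changes the representation much more than going from 10 to 11, which is a discounting effect comparable to taking square-roots of histogram entries.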
Document type: Conference paper
CVPR 2012 - IEEE Conference on Computer Vision & Pattern Recognition, Jun 2012, Providence, United States. IEEE, pp.2184-2191, 2012, 〈10.1109/CVPR.2012.6247926〉

Cited literature: 21 references


https://hal.inria.fr/hal-00685943
Contributor: Thoth Team
Submitted on: Friday, April 6, 2012 - 14:08:50
Last modified on: Wednesday, April 11, 2018 - 01:58:20
Document(s) archived on: Saturday, July 7, 2012 - 02:35:22

Files

paper_final.pdf
Files produced by the author(s)

Citation

Ramazan Gokberk Cinbis, Jakob Verbeek, Cordelia Schmid. Image categorization using Fisher kernels of non-iid image models. CVPR 2012 - IEEE Conference on Computer Vision & Pattern Recognition, Jun 2012, Providence, United States. IEEE, pp.2184-2191, 2012, 〈10.1109/CVPR.2012.6247926〉. 〈hal-00685943〉

Metrics

Record views: 1313

File downloads: 4320