Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets

James Philbin; Josef Sivic; Andrew Zisserman

Journal article, International Journal of Computer Vision, Year: 2011


James Philbin

Role: Author

Visual Geometry Group

Josef Sivic

Role: Author

Models of visual object recognition and scene understanding

Andrew Zisserman

Role: Author

Visual Geometry Group

Abstract

Given a large-scale collection of images, our aim is to efficiently associate images which contain the same entity, for example a building or object, and to discover the significant entities. To achieve this, we introduce the Geometric Latent Dirichlet Allocation (gLDA) model for unsupervised discovery of particular objects in unordered image collections. This explicitly represents images as mixtures of particular objects or facades, and builds rich latent topic models which incorporate the identity and locations of visual words specific to the topic in a geometrically consistent way. Applying standard inference techniques to this model enables images likely to contain the same object to be probabilistically grouped and ranked. Additionally, to reduce the computational cost of applying the gLDA model to large datasets, we propose a scalable method that first computes a matching graph over all the images in a dataset. This matching graph connects images that contain the same object, and rough image groups can be mined from this graph using standard clustering techniques. The gLDA model can then be applied to generate a more nuanced representation of the data. We also discuss how "hub images" (images representative of an object or landmark) can easily be extracted from our matching graph representation. We evaluate our techniques on the publicly available Oxford buildings dataset (5K images) and show examples of automatically mined objects. The methods are evaluated quantitatively on this dataset using a ground truth labeling for a number of Oxford landmarks. To demonstrate the scalability of the matching graph method, we show qualitative results on two larger datasets of images taken of the Statue of Liberty (37K images) and Rome (1M+ images).
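The matching-graph stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that pairwise match scores are already available (here, hypothetical counts of geometrically verified feature matches), uses connected components as one example of a "standard clustering technique", and picks the highest-degree image in each group as a stand-in for a "hub image". The image names, scores, threshold, and function names are all invented for illustration.

```python
def build_matching_graph(pair_scores, threshold):
    """Build an undirected graph with an edge between two images
    whose (assumed) match score reaches the threshold."""
    graph = {}
    for (a, b), score in pair_scores.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group images by connected component (iterative depth-first search)."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


def hub_image(graph, component):
    """Pick the image with the most matches in its group.
    Ties are broken alphabetically so the result is deterministic."""
    return min(component, key=lambda img: (-len(graph[img]), img))


# Hypothetical scores: number of verified matches between image pairs.
pair_scores = {
    ("radcliffe_1", "radcliffe_2"): 40,
    ("radcliffe_1", "radcliffe_3"): 35,
    ("radcliffe_2", "radcliffe_3"): 5,
    ("bodleian_1", "bodleian_2"): 22,
    ("radcliffe_3", "bodleian_1"): 3,
}

graph = build_matching_graph(pair_scores, threshold=20)
groups = connected_components(graph)
hubs = [hub_image(graph, group) for group in groups]

for group, hub in zip(groups, hubs):
    print(f"group={group} hub={hub}")
```

In this toy example the weak links (scores 5 and 3) fall below the threshold, so the two landmarks end up in separate groups. Per the abstract, a model such as gLDA would then be applied to refine these rough groups; that step is not sketched here.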

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

Philbin_ijcv2011.pdf (7.09 MB)

Origin: Files produced by the author(s)

Contributor: Suha Kwak

https://inria.hal.science/hal-01064717

Submitted on: Wednesday, 17 September 2014, 10:43:09

Last modified on: Friday, 19 April 2024, 16:18:57

Long-term archiving on: Thursday, 18 December 2014, 10:20:19

Dates and versions

hal-01064717, version 1 (17-09-2014)

Identifiers

HAL Id: hal-01064717, version 1

Cite

James Philbin, Josef Sivic, Andrew Zisserman. Geometric Latent Dirichlet Allocation on a Matching Graph for Large-scale Image Datasets. International Journal of Computer Vision, 2011. ⟨hal-01064717⟩

Collections

ENS-PARIS CNRS INRIA INRIA2 PSL
