Distances and weighting schemes for bag of visual words image retrieval

Pierre Tirilly; Vincent Claveau; Patrick Gros

doi:10.1145/1743384.1743438

Conference paper, year: 2010


Pierre Tirilly

Role: Author
PersonId : 5896
IdHAL : pierretirilly
ORCID : 0000-0003-2675-8023
IdRef : 133245497

Multimedia content-based indexing

Vincent Claveau

Role: Corresponding author
PersonId : 5270
IdHAL : vincent-claveau
ORCID : 0000-0002-3459-0550
IdRef : 075988216

Multimedia content-based indexing

Patrick Gros

Role: Author
PersonId : 894
IdHAL : patrick-gros
IdRef : 075986604

Multimedia content-based indexing

Abstract

Current text retrieval techniques make it possible to index and retrieve text documents very efficiently and with good accuracy. Image retrieval, by contrast, is still very coarse and does not yield satisfactory results. Computer vision researchers therefore try to draw on contributions from text retrieval to enhance their retrieval systems. In particular, Sivic and Zisserman, with their Video Google framework [1], propose a description of images similar to standard text descriptors: images are described by elementary image parts, called visual words. They can thus perform image retrieval using the standard Vector Space Model (VSM) of Information Retrieval (IR) and benefit from classical IR techniques such as inverted files. Among the available text retrieval techniques, automatically finding the most relevant words to describe a document has been studied intensively for texts, but not for images. In this paper, we explore the use of term weighting techniques and classical distances from text retrieval in the case of images. The weights considered are standard VSM weights, weights derived from probabilistic models of IR, and new weighting schemes that we propose. Our experiments, performed on several datasets, show that no weighting scheme improves retrieval on every dataset, but also that choosing weights that fit the properties of the dataset can improve precision and mean average precision (MAP) by up to 10 percent. This study provides insights into the semantic and statistical differences between textual and visual words, and into the way visual word-based image retrieval systems can be optimized. It also shows some limits of the bag of visual words model, and the relation between Minkowski distances and local weighting. Finally, this study questions some experimental habits commonly found in the literature (the choice of the L1 distance, the use of TF*IDF weights, and evaluation on a single dataset).
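
To make the retrieval setting described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each image is already represented as a histogram of visual word counts, applies a textbook TF*IDF weighting (one common variant; the abstract does not specify the exact formulas studied in the paper), and ranks images by a Minkowski distance of order p, where p = 1 gives the L1 distance. The function names `tf_idf`, `minkowski`, and `rank`, and the toy histograms, are hypothetical and chosen for illustration only.

```python
import math


def tf_idf(histograms):
    """Weight raw visual-word counts with a textbook TF*IDF variant (an assumption)."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images that contain each visual word.
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    # Inverse document frequency; words that never occur get weight 0.
    idf = [math.log(n_images / d) if d > 0 else 0.0 for d in df]
    weighted = []
    for h in histograms:
        total = sum(h)
        # Local weight: count normalized by the number of visual words in the image.
        weighted.append(
            [(c / total if total else 0.0) * idf[w] for w, c in enumerate(h)]
        )
    return weighted


def minkowski(u, v, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def rank(query_index, vectors, p=1):
    """Return the indices of all other images, closest to the query first."""
    q = vectors[query_index]
    others = [i for i in range(len(vectors)) if i != query_index]
    return sorted(others, key=lambda i: minkowski(q, vectors[i], p))


# Toy collection: 4 images described over a vocabulary of 4 visual words.
histograms = [
    [4, 0, 1, 2],
    [3, 0, 2, 2],
    [0, 5, 1, 0],
    [0, 4, 0, 1],
]
vectors = tf_idf(histograms)
print(rank(0, vectors, p=1))  # ranking under the L1 distance
print(rank(0, vectors, p=2))  # ranking under the L2 (Euclidean) distance
```

Swapping the body of `tf_idf` or the value of `p` is enough to compare weighting schemes and distances on the same collection, which is the kind of comparison the abstract argues should be run on more than one dataset.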

Domains

Information retrieval [cs.IR]

Contributor: Patrick Gros

https://inria.hal.science/inria-00523975

Submitted on: Wednesday, October 6, 2010, 17:22:37

Last modified on: Friday, March 24, 2023, 14:52:53

Dates and versions

inria-00523975 , version 1 (06-10-2010)

Identifiers

HAL Id : inria-00523975 , version 1
DOI : 10.1145/1743384.1743438

Cite

Pierre Tirilly, Vincent Claveau, Patrick Gros. Distances and weighting schemes for bag of visual words image retrieval. ACM International Conference on Multimedia Information Retrieval, ACM, Mar 2010, Philadelphia, Pennsylvania, United States. pp.323-332, ⟨10.1145/1743384.1743438⟩. ⟨inria-00523975⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D6 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM
