Conference papers

Distances and weighting schemes for bag of visual words image retrieval

Pierre Tirilly 1 Vincent Claveau 1, * Patrick Gros 1 
* Corresponding author
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Current text retrieval techniques make it possible to index and retrieve text documents very efficiently and with good accuracy. Image retrieval, by contrast, is still very coarse and does not yield satisfactory results. Computer vision researchers therefore try to benefit from text retrieval contributions to enhance their retrieval systems. In particular, Sivic and Zisserman, with their Video Google framework [1], propose a description of images similar to standard text descriptors: images are described by elementary image parts, called visual words. They can thus perform image retrieval using the standard Vector Space Model (VSM) of Information Retrieval (IR) and benefit from classical IR techniques such as inverted files. Among available text retrieval techniques, automatically finding the most relevant words to describe a document has been intensively studied for texts, but not for images. In this paper, we explore the use of term weighting techniques and classical distances from text retrieval in the case of images. The weights considered are standard VSM weights, weights derived from probabilistic models of IR, and new weighting schemes that we propose. Our experiments, performed on several datasets, show that no weighting scheme improves retrieval on every dataset, but also that choosing weights that fit the properties of the dataset can improve precision and MAP by up to 10 percent. This study provides insights into the semantic and statistical differences between textual and visual words, and into the way visual word-based image retrieval systems can be optimized. It also shows some limits of the bag of visual words model, and the relation between Minkowski distances and local weighting. Finally, this study questions some experimental habits commonly found in the literature (choice of the L1 distance, TF*IDF weights, and evaluation using one dataset only).
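As a rough illustration of the kind of pipeline the abstract refers to, the sketch below applies a plain TF*IDF weighting to visual-word count histograms and ranks images by a Minkowski distance. It is not the authors' implementation: the function names, the particular TF and IDF formulas, and the toy counts are all assumptions made for the example, and the paper itself compares several other weighting schemes and distances.

```python
import math


def tf_idf_weights(histograms):
    """Weight raw visual-word counts with one common TF*IDF variant (assumed here).

    histograms[i][j] is the number of times visual word j occurs in image i.
    """
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: number of images that contain each visual word.
    df = [sum(1 for h in histograms if h[j] > 0) for j in range(n_words)]
    # IDF = log(N / df); a word that never occurs gets weight 0.
    idf = [math.log(n_images / d) if d > 0 else 0.0 for d in df]
    weighted = []
    for h in histograms:
        total = sum(h)
        # TF = count normalised by the number of visual words in the image.
        tf = [c / total if total > 0 else 0.0 for c in h]
        weighted.append([t * w for t, w in zip(tf, idf)])
    return weighted


def minkowski_distance(u, v, p=1.0):
    """Minkowski distance of order p (p=1 is L1, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def rank_images(query, database, p=1.0):
    """Return database indices sorted from closest to farthest from the query."""
    distances = [minkowski_distance(query, d, p) for d in database]
    return sorted(range(len(database)), key=lambda i: distances[i])


# Toy data (hypothetical): 3 images described over a vocabulary of 3 visual words.
counts = [[3, 0, 1], [2, 0, 2], [0, 4, 0]]
vectors = tf_idf_weights(counts)
ranking = rank_images(vectors[0], vectors, p=1.0)
```

Changing `p` in `rank_images` switches between the L1 distance and other Minkowski distances, which makes it easy to compare choices on a given dataset rather than assuming one by default.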
Document type :
Conference papers
Contributor : Patrick Gros
Submitted on : Wednesday, October 6, 2010 - 5:22:37 PM
Last modification on : Thursday, January 20, 2022 - 4:18:09 PM



Pierre Tirilly, Vincent Claveau, Patrick Gros. Distances and weighting schemes for bag of visual words image retrieval. ACM International Conference on Multimedia Information Retrieval, ACM, Mar 2010, Philadelphia, Pennsylvania, United States. pp.323-332, ⟨10.1145/1743384.1743438⟩. ⟨inria-00523975⟩


