
New methods for image classification, image retrieval and semantic correspondence

Abstract: The problem of image representation is at the heart of computer vision. The choice of features extracted from an image depends on the task under study: large image retrieval databases demand a compressed global vector representing each image, whereas semantic segmentation requires a clustering map of the image's pixels. Machine learning techniques are the main tool for building these representations. In this manuscript, we address the learning of visual features for three distinct problems: image retrieval, semantic correspondence and image classification.

First, we study the dependency of a Fisher vector representation on the Gaussian mixture model (GMM) whose components serve as its codewords. We introduce the use of multiple GMMs for different backgrounds, e.g. different scene categories, and analyze the performance of these representations for object classification as well as the impact of scene category as a latent variable.

Our second approach proposes an extension to the exemplar SVM (ESVM) feature encoding pipeline. We first show that, by replacing the hinge loss with the square loss in the ESVM cost function, similar image retrieval results can be obtained at a fraction of the computational cost. We call this model the square-loss exemplar machine, or SLEM. Second, we introduce a kernelized SLEM variant that benefits from the same computational advantages but displays improved performance. We present experiments that establish the performance and efficiency of our methods using a large array of base feature representations and standard image retrieval datasets.

Finally, we propose a deep neural network for the problem of establishing semantic correspondence. We employ object proposal boxes as the elements to be matched and construct an architecture that simultaneously learns the appearance representation and geometric consistency. We propose new geometric consistency scores tailored to the network's architecture.
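To make the idea of a geometric consistency score concrete, the following is a minimal sketch of a generic Hough-voting rescoring of proposal matches. It is a swapped-in, classic technique and not the scores proposed in the thesis; the function names, the box format `(x1, y1, x2, y2)` and the `bandwidth` parameter are all hypothetical. Each candidate match votes for the offset between its two box centers, weighted by its appearance similarity, and matches whose offset agrees with many other confident matches are promoted.

```python
import numpy as np


def box_centers(boxes):
    """Centers of boxes given as (x1, y1, x2, y2) rows (hypothetical format)."""
    b = np.asarray(boxes, dtype=float)
    return np.stack([(b[:, 0] + b[:, 2]) / 2, (b[:, 1] + b[:, 3]) / 2], axis=1)


def hough_rescore(sim, boxes_a, boxes_b, bandwidth=10.0):
    """Rescore proposal matches by agreement of their center offsets.

    sim: (na, nb) array of non-negative appearance similarities between
    proposals of image A and proposals of image B. Each candidate match
    (i, j) votes for the offset from center i to center j, weighted by
    sim[i, j]. A match is rescored as its appearance similarity times the
    Gaussian-smoothed vote mass at its own offset.
    """
    sim = np.asarray(sim, dtype=float)
    ca, cb = box_centers(boxes_a), box_centers(boxes_b)
    # Offset of every candidate match, flattened to (na * nb, 2).
    offsets = (cb[None, :, :] - ca[:, None, :]).reshape(-1, 2)
    weights = sim.reshape(-1)
    # Pairwise squared distances between all candidate offsets.
    d2 = np.sum((offsets[:, None, :] - offsets[None, :, :]) ** 2, axis=-1)
    kernel = np.exp(-d2 / (2.0 * bandwidth ** 2))
    votes = kernel @ weights  # smoothed, similarity-weighted vote mass
    return sim * votes.reshape(sim.shape)
```

For instance, if three proposals in A have true matches in B that are all shifted by the same translation, and a distractor in B looks slightly more similar to the first proposal, the rescoring promotes the three translation-consistent matches. The pairwise kernel is quadratic in the number of candidate matches, so with hundreds of proposals per image a practical version would instead accumulate votes in discretized offset bins.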
The correspondence model is trained on image pairs obtained from the keypoints of a benchmark dataset and evaluated on several standard datasets, where it outperforms both recent deep learning architectures and previous methods based on hand-crafted features. We conclude the thesis by highlighting our contributions and suggesting possible directions for future research.
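The computational saving claimed for SLEM follows from the square loss admitting a closed-form solution, whereas a hinge-loss ESVM needs a separate iterative SVM solve for every exemplar. The sketch below illustrates this under stated assumptions: the class name, the exemplar weight `alpha`, the regularizer `lam` and the exact weighting of the cost are hypothetical choices, not necessarily the formulation used in the thesis, and the kernelized variant is not shown. Setting the gradients of the weighted square-loss objective to zero shows that the encoding is proportional to the regularized inverse covariance of the negatives applied to the centered exemplar, so the only costly step is shared by all exemplars.

```python
import numpy as np


class SquareLossExemplarEncoder:
    """Hypothetical square-loss exemplar encoder (illustrative sketch).

    For an exemplar x0 and fixed negatives x_1..x_n, it returns the w minimizing
        alpha * (1 - w.x0 - b)**2
        + (1 - alpha) / n * sum_i (-1 - w.x_i - b)**2
        + lam * ||w||**2
    over (w, b). Setting the gradients to zero gives the closed form
        u = (Sigma + lam / (1 - alpha) * I)^{-1} (x0 - mu)
        w = 2 * alpha * u / (1 + alpha * (x0 - mu).u)
    where mu and Sigma are the mean and (biased) covariance of the negatives.
    """

    def __init__(self, negatives, alpha=0.5, lam=1e-2):
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie in (0, 1)")
        X = np.asarray(negatives, dtype=float)
        self.alpha = alpha
        self.mu = X.mean(axis=0)
        Xc = X - self.mu
        sigma = Xc.T @ Xc / X.shape[0]
        reg = lam / (1.0 - alpha)
        # The expensive step, done once and shared by every exemplar.
        # (An explicit inverse keeps the sketch short; a Cholesky
        # factorization would be numerically preferable.)
        self.precision = np.linalg.inv(sigma + reg * np.eye(X.shape[1]))

    def encode(self, x0):
        """Encode one exemplar with a single matrix-vector product."""
        d = np.asarray(x0, dtype=float) - self.mu
        u = self.precision @ d
        return 2.0 * self.alpha * u / (1.0 + self.alpha * (d @ u))
```

In use, one would build the encoder once from a pool of negative descriptors and then call `encode` on each database or query descriptor; each call costs one d-by-d matrix-vector product instead of one SVM training run.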
Contributor: Abes Star
Submitted on: Tuesday, July 17, 2018 - 11:54:06 AM
Last modification on: Thursday, July 1, 2021 - 5:58:03 PM
Long-term archiving on: Thursday, October 18, 2018 - 1:53:36 PM


Version validated by the jury (STAR)


HAL Id: tel-01676893, version 2



Rafael Sampaio de Rezende. New methods for image classification, image retrieval and semantic correspondence. Computer Vision and Pattern Recognition [cs.CV]. Université Paris sciences et lettres, 2017. English. ⟨NNT : 2017PSLEE068⟩. ⟨tel-01676893v2⟩


