Convolutional neural networks: towards less supervision for visual recognition

Abstract : This thesis investigates convolutional neural networks for visual recognition. Recent convolutional neural networks have demonstrated excellent performance on a variety of recognition tasks, but they typically require large amounts of manually annotated training data to perform well. This data is often costly to annotate and may introduce unwanted biases. In this thesis we investigate different ways to reduce the amount and complexity of the required training supervision. In our first contribution, we propose a transfer learning approach with a convolutional neural network for object classification. We first learn mid-level features on the large ImageNet dataset during a pre-training phase, then use the learned parameters to initialize another network designed for a smaller-scale task, where less training data is available. We show, first, that the image representations can be efficiently transferred to other visual recognition tasks, and second, that these representations lead to higher performance when more data is used for pre-training. We demonstrate that the proposed approach outperforms the state of the art on the Pascal VOC image classification task. In our second contribution, we investigate weakly supervised learning for object recognition. We use the fact that, for classification, convolutional neural networks tend to base their decisions on the most distinctive parts of objects. This allows us to build a network that can predict the location of objects from a weakly annotated dataset indicating only the presence or absence of objects, not their location in images. We demonstrate that our approach improves the state of the art on the Pascal VOC image classification task, performing on par with methods requiring full object-level supervision.
In our third contribution, we look at possible paths for progress in unsupervised learning with neural networks. We study the recent Generative Adversarial Networks; these architectures learn distributions of images and generate new samples, but evaluating which learned model is better than the others is difficult. We propose a two-sample test method for this evaluation problem, allowing us to perform a first level of model selection. We investigate possible links between Generative Adversarial Networks and concepts related to causality, and propose a two-sample test method for the task of causal discovery that outperforms the state of the art. Finally, building on a recent connection with optimal transport, we investigate what these generative algorithms are learning from unlabeled data.
Contributor : Maxime Oquab
Submitted on : Thursday, May 31, 2018 - 11:00:05 AM
Last modification on : Wednesday, January 30, 2019 - 11:07:49 AM
Long-term archiving on: Saturday, September 1, 2018 - 1:36:22 PM


Oquab PhD Thesis.pdf
Files produced by the author(s)


  • HAL Id : tel-01803967, version 1



Maxime Oquab. Convolutional neural networks: towards less supervision for visual recognition. Computer Science [cs]. Ecole Normale Supérieure (ENS); ED 386 : École doctorale de sciences mathématiques de Paris centre, UPMC, 2018. English. ⟨tel-01803967⟩


