
Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

Maxime Oquab 1, 2, 3 Léon Bottou 4 Ivan Laptev 2, 3 Josef Sivic 2, 3
2 WILLOW - Models of visual object recognition and scene understanding
DI-ENS - Département d'informatique de l'École normale supérieure, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique : UMR8548
Abstract : Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the large-scale visual recognition challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations, as opposed to the hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents the application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute a mid-level image representation for images in the PASCAL VOC dataset. We show that, despite differences in image statistics and tasks between the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
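
The general idea of reusing trained layers as a fixed feature extractor and training only a new classifier on a small target set can be illustrated with a toy sketch. This is not the authors' architecture or training procedure: the "frozen" weights below are random stand-ins rather than layers trained on ImageNet, the data is synthetic, and every name (`make_frozen_layer`, `extract_features`, `train_head`, `predict`) is hypothetical.

```python
import math
import random


def make_frozen_layer(n_in, n_out, seed=0):
    """Stand-in for transferred layers: fixed weights that are never updated.
    (Assumption: random values replace weights learned on a source dataset.)"""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def extract_features(frozen_w, x):
    """Forward pass through the frozen layer followed by a ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in frozen_w]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(features, labels, epochs=200, lr=0.1):
    """Train only the new logistic-regression layer on top of fixed features."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(w, b, f):
    """Binary decision from the trained head."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0


# Small synthetic "target task" with few labelled samples.
rng = random.Random(1)
inputs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
labels = [1 if x[0] + x[1] > 0 else 0 for x in inputs]

frozen = make_frozen_layer(n_in=2, n_out=8)
snapshot = [row[:] for row in frozen]  # copy, to show the layer stays fixed

feats = [extract_features(frozen, x) for x in inputs]
w, b = train_head(feats, labels)
preds = [predict(w, b, f) for f in feats]

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print("training accuracy of the new head:", accuracy)
```

Only `w` and `b` change during training; the frozen weights are read but never written, which is what keeps the number of parameters estimated from the small target set low.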
Document type :
Conference papers

Contributor : Josef Sivic
Submitted on : Saturday, September 13, 2014 - 12:39:59 PM
Last modification on : Tuesday, September 22, 2020 - 3:48:03 AM
Long-term archiving on : Sunday, December 14, 2014 - 10:21:31 AM




  • HAL Id : hal-00911179, version 2



Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic. Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition, Jun 2014, Columbus, OH, United States. ⟨hal-00911179v2⟩


