Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks

Maxime Oquab; Léon Bottou; Ivan Laptev; Josef Sivic

Conference paper, Year: 2013

Maxime Oquab

Role: Author
PersonId : 949102

Microsoft Research - Inria Joint Centre

Models of visual object recognition and scene understanding

Laboratoire d'informatique de l'école normale supérieure

Léon Bottou

Role: Author
PersonId : 920968

Microsoft Research New York City

Ivan Laptev

Role: Author
PersonId : 865349

Models of visual object recognition and scene understanding

Laboratoire d'informatique de l'école normale supérieure

Josef Sivic

Role: Author
PersonId : 945630

Models of visual object recognition and scene understanding

Laboratoire d'informatique de l'école normale supérieure

Abstract

Convolutional neural networks (CNNs) have recently shown outstanding image classification performance in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC2012). The success of CNNs is attributed to their ability to learn rich mid-level image representations, as opposed to the hand-designed low-level features used in other image classification methods. Learning CNNs, however, amounts to estimating millions of parameters and requires a very large number of annotated image samples. This property currently prevents the application of CNNs to problems with limited training data. In this work we show how image representations learned with CNNs on large-scale annotated datasets can be efficiently transferred to other visual recognition tasks with a limited amount of training data. We design a method to reuse layers trained on the ImageNet dataset to compute a mid-level image representation for images in the PASCAL VOC dataset. We show that despite differences in image statistics and tasks in the two datasets, the transferred representation leads to significantly improved results for object and action classification, outperforming the current state of the art on the PASCAL VOC 2007 and 2012 datasets. We also show promising results for object and action localization.
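The transfer idea described in the abstract (reuse layers trained on a large source dataset, then train only new layers on a small target dataset) can be illustrated with a toy sketch. This is not the authors' network or training procedure: the fixed matrix `W_FROZEN` is a hypothetical stand-in for pretrained layers, the single logistic output unit is an assumed stand-in for the newly trained layers, and the six 2-D points stand in for a small target dataset.

```python
import math

# Hypothetical stand-in for layers pretrained on a large source dataset.
# These weights are never updated during target-task training.
W_FROZEN = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def frozen_features(x):
    """Mid-level representation: fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_FROZEN]


def score(features, v, b):
    """Linear score of the new output unit on top of the frozen features."""
    return sum(vi * fi for vi, fi in zip(v, features)) + b


def train_new_layer(samples, labels, epochs=200, lr=0.1):
    """Fit only the new logistic unit (v, b) with plain SGD."""
    # Features are computed once, since the frozen layers never change.
    feats = [frozen_features(x) for x in samples]
    v = [0.0] * len(W_FROZEN)
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-score(f, v, b)))
            grad = p - y  # gradient of the log loss w.r.t. the score
            v = [vi - lr * grad * fi for vi, fi in zip(v, f)]
            b -= lr * grad
    return v, b


def predict(x, v, b):
    """Binary decision of the new unit for input x."""
    return 1 if score(frozen_features(x), v, b) > 0 else 0


# Toy "target task" with few labelled samples: label is 1 when x[0] > 0.
SAMPLES = [(1.0, 0.5), (2.0, -1.0), (0.5, 1.0),
           (-1.0, 0.5), (-2.0, -1.0), (-0.5, 1.0)]
LABELS = [1, 1, 1, 0, 0, 0]

v, b = train_new_layer(SAMPLES, LABELS)
```

Only `v` and `b` are learned here, so the number of parameters estimated from the small target set stays tiny, which is the motivation the abstract gives for transferring a representation instead of training a full network from scratch.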

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

oquab14.pdf (1.38 MB)

Origin: Files produced by the author(s)

Contributor: Josef Sivic

https://inria.hal.science/hal-00911179

Submitted on: Saturday, September 13, 2014, 12:39:59

Last modified on: Friday, April 19, 2024, 16:18:55

Long-term archiving on: Sunday, December 14, 2014, 10:21:31

Dates and versions

hal-00911179, version 1 (28-11-2013)

hal-00911179, version 2 (13-09-2014)

Identifiers

HAL Id: hal-00911179, version 2

Cite

Maxime Oquab, Léon Bottou, Ivan Laptev, Josef Sivic. Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition, Jun 2014, Columbus, OH, United States. ⟨hal-00911179v2⟩

Collections

ENS-PARIS UNIV-RENNES1 CNRS INRIA IRISA INRIA2 PSL UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

4702 views

7583 downloads
