Weakly supervised methods for learning actions and objects

Alessandro Prest

Thesis, Year: 2012

Alessandro Prest

Role: Author
PersonId : 879018

Learning and recognition in vision

Eidgenössische Technische Hochschule - Swiss Federal Institute of Technology [Zürich]

Abstract

Modern Computer Vision systems learn visual concepts through examples (i.e. images) which have been manually annotated by humans. While this paradigm has allowed the field to progress tremendously in the last decade, it has now become one of its major bottlenecks. Teaching a new visual concept requires an expensive human annotation effort, which prevents systems from scaling from the few dozen visual concepts that work today to thousands. The exponential growth of visual data available on the web represents an invaluable resource for visual learning algorithms and calls for new methods able to exploit this information to learn visual concepts without major human annotation effort.

As a first contribution, we introduce an approach for learning human actions as interactions between persons and objects in realistic images. By exploiting the spatial structure of human-object interactions, we are able to learn action models automatically from a set of still images annotated only with the action label (weakly supervised). Extensive experimental evaluation demonstrates that our weakly supervised approach achieves the same performance as popular fully supervised methods despite using substantially less supervision.

In the second part of this thesis we extend this reasoning to human-object interactions in realistic video and feature-length movies. Popular methods represent actions with low-level features such as image gradients or optical flow. In our approach, by contrast, interactions are modeled as the trajectory of the object with respect to the person's position, providing a rich and natural description of actions. Our interaction descriptor is an informative cue on its own and is complementary to traditional low-level features.

Finally, in the third part we propose an approach for learning object detectors from real-world web videos (i.e. YouTube). As opposed to the standard paradigm of learning from still images annotated with bounding boxes, we propose a technique to learn from videos known only to contain objects of a target class. We demonstrate that learning detectors from video alone already delivers good performance while requiring much less supervision than training from images annotated with bounding boxes. We additionally show that training from a combination of weakly annotated videos and fully annotated still images improves over training from still images alone.
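
The idea of describing an interaction as the trajectory of the object with respect to the person's position can be illustrated with a small sketch. This is not the descriptor used in the thesis: the box format, the function names (`box_center`, `relative_trajectory`) and the choice to normalize by person height are assumptions made only for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def relative_trajectory(
    person_boxes: List[Box], object_boxes: List[Box]
) -> List[Tuple[float, float]]:
    """Object position relative to the person, one entry per frame.

    Hypothetical illustration: each offset is the object center minus the
    person center, divided by the person's height so that the trajectory
    does not depend on how large the person appears in the frame.
    """
    if len(person_boxes) != len(object_boxes):
        raise ValueError("need one person box and one object box per frame")

    trajectory = []
    for person, obj in zip(person_boxes, object_boxes):
        px, py = box_center(person)
        ox, oy = box_center(obj)
        height = person[3] - person[1]
        if height <= 0:
            raise ValueError("person box must have positive height")
        trajectory.append(((ox - px) / height, (oy - py) / height))
    return trajectory


# Two frames: the person stays still while the object moves.
person_track = [(0.0, 0.0, 2.0, 4.0), (0.0, 0.0, 2.0, 4.0)]
object_track = [(2.0, 2.0, 4.0, 4.0), (0.0, 0.0, 2.0, 2.0)]
trajectory = relative_trajectory(person_track, object_track)
```

Expressing the object position in the person's frame of reference means the resulting sequence stays the same if both boxes are shifted together, which is one plausible reason such a cue could complement appearance features like image gradients or optical flow.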

Keywords

computer vision, weakly supervised learning

Domains

Computer Science, Machine Learning [stat.ML]

Main file

dissertation.pdf (30.35 MB)

https://theses.hal.science/tel-00758797

Submitted on: Thursday, 29 November 2012, 12:52:27

Last modified on: Saturday, 27 April 2024, 03:13:22

Long-term archiving on: Saturday, 17 December 2016, 17:44:44

Dates and versions

tel-00758797, version 1 (29-11-2012)

Identifiers

HAL Id: tel-00758797, version 1

Cite

Alessandro Prest. Weakly supervised methods for learning actions and objects. Computer science. Eidgenössische Technische Hochschule Zürich (ETHZ), 2012. English. ⟨NNT : ⟩. ⟨tel-00758797⟩

Collections

UGA CNRS INRIA INSMI LJK LJK_GI LJK_GI_LEAR INRIA2 RISC_THESE_HDR

654 views

3024 downloads
