Learning to parse pictures of people

Rémi Ronfard; Cordelia Schmid; William Triggs

Conference paper, Year: 2002

Rémi Ronfard

Role: Author
PersonId: 5568
IdHAL: remironfard
ORCID: 0000-0003-4830-5690
IdRef: 031277896

Modeling, localization, recognition and interpretation in computer vision

Cordelia Schmid

Role: Author
PersonId: 831154

Modeling, localization, recognition and interpretation in computer vision

William Triggs

Role: Author
PersonId: 741773
IdHAL: bill-triggs
ORCID: 0000-0003-4116-6296
IdRef: 068974116

Modeling, localization, recognition and interpretation in computer vision

Abstract

Detecting people in images is a key problem for video indexing, browsing and retrieval. The main difficulties are the large appearance variations caused by action, clothing, illumination, viewpoint and scale. Our goal is to find people in static video frames using learned models of both the appearance of body parts (head, limbs, hands) and the geometry of their assemblies. We build on Forsyth & Fleck's general ‘body plan’ methodology and Felzenszwalb & Huttenlocher's dynamic programming approach for efficiently assembling candidate parts into ‘pictorial structures’. However, we replace the rather simple part detectors used in these works with dedicated detectors learned for each body part using Support Vector Machines (SVMs) or Relevance Vector Machines (RVMs). We are not aware of any previous work using SVMs to learn articulated body plans, although SVMs have been used to detect both whole pedestrians and combinations of rigidly positioned subimages (typically upper body, arms and legs) in street scenes, under a wide range of illumination, pose and clothing variations. RVMs are SVM-like classifiers that offer a well-founded probabilistic interpretation and improved sparsity for reduced computation. We demonstrate their benefits experimentally in a series of results showing great promise for learning detectors in more general situations.
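The assembly step described in the abstract can be illustrated with a minimal sketch of dynamic programming over a tree of body parts. This is not the authors' implementation: the part names, candidate coordinates, scores, expected offsets, the squared-deviation geometry cost and the `weight` parameter are all hypothetical, chosen only to show how one candidate per part can be selected so that detector scores and pairwise geometry are jointly optimal. In the paper the scores would come from the learned SVM or RVM part detectors and the geometry from a learned model.

```python
from typing import Dict, List, Tuple

# A candidate detection: (x, y, detector_score). Values are illustrative only.
Candidate = Tuple[float, float, float]


def pair_cost(parent: Candidate, child: Candidate,
              offset: Tuple[float, float], weight: float = 1.0) -> float:
    """Assumed geometry cost: squared deviation of the child from its
    expected position relative to the parent."""
    dx = child[0] - (parent[0] + offset[0])
    dy = child[1] - (parent[1] + offset[1])
    return weight * (dx * dx + dy * dy)


def best_assembly(candidates: Dict[str, List[Candidate]],
                  children: Dict[str, List[str]],
                  offsets: Dict[Tuple[str, str], Tuple[float, float]],
                  root: str) -> Tuple[float, Dict[str, int]]:
    """Pick one candidate per part on a tree, minimising
    (negated detector scores + pairwise geometry costs)."""
    best_cost: Dict[str, List[float]] = {}
    best_child: Dict[Tuple[str, str], Dict[int, int]] = {}

    def solve(part: str) -> None:
        # Solve the subtrees first (leaves to root).
        for c in children.get(part, []):
            solve(c)
        costs = []
        for i, cand in enumerate(candidates[part]):
            total = -cand[2]  # a higher detector score means a lower cost
            for c in children.get(part, []):
                options = [best_cost[c][j] + pair_cost(cand, cc, offsets[(part, c)])
                           for j, cc in enumerate(candidates[c])]
                j_best = min(range(len(options)), key=options.__getitem__)
                best_child.setdefault((part, c), {})[i] = j_best
                total += options[j_best]
            costs.append(total)
        best_cost[part] = costs

    solve(root)

    # Backtrack from the best root candidate to recover the chosen parts.
    root_idx = min(range(len(best_cost[root])), key=best_cost[root].__getitem__)
    choice: Dict[str, int] = {}
    stack = [(root, root_idx)]
    while stack:
        part, i = stack.pop()
        choice[part] = i
        for c in children.get(part, []):
            stack.append((c, best_child[(part, c)][i]))
    return best_cost[root][root_idx], choice


# Hypothetical toy data: two candidates per part, torso as the root.
candidates = {
    "torso": [(50.0, 50.0, 0.9), (10.0, 10.0, 0.8)],
    "head": [(50.0, 30.0, 0.7), (12.0, 80.0, 0.95)],
    "arm": [(70.0, 50.0, 0.6), (0.0, 0.0, 0.9)],
}
children = {"torso": ["head", "arm"]}
offsets = {("torso", "head"): (0.0, -20.0), ("torso", "arm"): (20.0, 0.0)}

cost, choice = best_assembly(candidates, children, offsets, "torso")
print(cost, choice)
```

In this toy example the geometrically consistent candidates win even though some isolated detections have higher scores, which is the point of combining appearance and geometry. The inner search here is a naive scan over all parent/child candidate pairs; it is meant for clarity, not efficiency.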

Keywords

kernel methods; object recognition; image and video indexing; grouping and segmentation; statistical pattern recognition

Domains

Computer Vision and Pattern Recognition [cs.CV]

Main file

RST02.pdf (384.73 KB)

rst2002.png (319.59 KB)

Origin: Files produced by the author(s)

Format: Figure, Image

https://inria.hal.science/inria-00545109

Submitted on: Thursday, December 9, 2010, 16:05:50

Last modified on: Thursday, April 4, 2024, 21:06:50

Long-term archiving on: Thursday, March 10, 2011, 13:53:50

Dates and versions

inria-00545109, version 1 (09-12-2010)

Identifiers

HAL Id: inria-00545109, version 1

Cite

Rémi Ronfard, Cordelia Schmid, William Triggs. Learning to parse pictures of people. European Conference on Computer Vision, 2002, Copenhagen, Denmark. pp. 700-714. ⟨inria-00545109⟩

Collections

UNIV-RENNES1 UGA IMAG CNRS INRIA IRISA INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

759 views

386 downloads
