Learning Methods for Recovering 3D Human Pose from Monocular Images

Ankur Agarwal; Bill Triggs

Report (Research Report) Year: 2004

Ankur Agarwal

Role: Author

Learning and recognition in vision

Bill Triggs

Role: Author
PersonId: 741773
IdHAL: bill-triggs
ORCID: 0000-0003-4116-6296
IdRef: 068974116

Learning and recognition in vision

Abstract

We describe a learning-based method for recovering 3D human body pose from single images and monocular image sequences. Our approach requires neither an explicit body model nor prior labelling of body parts in the image. Instead, it recovers pose by direct nonlinear regression against shape descriptor vectors extracted automatically from image silhouettes. For robustness against local silhouette segmentation errors, silhouette shape is encoded by histogram-of-shape-contexts descriptors. We evaluate several different regression methods: ridge regression, Relevance Vector Machine (RVM) regression and Support Vector Machine (SVM) regression over both linear and kernel bases. The RVMs provide much sparser regressors without compromising performance, and kernel bases give a small but worthwhile improvement in performance. Loss of depth and limb labelling information often makes the recovery of 3D pose from single silhouettes ambiguous. We propose two solutions to this: the first embeds the method in a tracking framework, using dynamics from the previous state estimate to disambiguate the pose; the second uses a mixture-of-regressors framework to return multiple solutions for each silhouette. We show that the resulting system tracks long sequences stably, and is also capable of accurately reconstructing 3D human pose from single images, giving multiple possible solutions in ambiguous cases. For realism and good generalization over a wide range of viewpoints, we train the regressors on images resynthesized from real human motion capture data. The method is demonstrated on a 54-parameter full body pose model, both quantitatively on independent but similar test data, and qualitatively on real image sequences. Mean angular errors of 4-5 degrees are obtained, a factor of 3 better than the current state of the art for the much simpler upper body problem.
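The core idea of the abstract, regressing a pose vector directly on a silhouette descriptor vector over a kernel basis, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the report's implementation: the toy 2D "descriptors" stand in for histogram-of-shape-contexts vectors, the 2D "poses" stand in for the 54-parameter pose, the function names and the values of `lam` and `beta` are hypothetical, and the solver uses the standard kernel ridge regression form, which may differ from the regularizer used in the report.

```python
import math

def gaussian_kernel(x, z, beta=2.0):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return math.exp(-beta * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | B], copied so the inputs are not modified.
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, len(M[r])):
                M[r][c] -= f * M[col][c]
    # Back substitution, one right-hand-side column at a time.
    m = len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for j in range(m):
            s = M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = s / M[i][i]
    return X

def fit_kernel_ridge(descriptors, poses, lam=1e-6, beta=2.0):
    """Return weights W solving (K + lam * I) W = Y (standard kernel ridge form)."""
    n = len(descriptors)
    K = [[gaussian_kernel(descriptors[i], descriptors[j], beta)
          + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(K, poses)

def predict_pose(x, descriptors, weights, beta=2.0):
    """Predicted pose: sum over training examples of weight_i * k(x, x_i)."""
    k = [gaussian_kernel(x, z, beta) for z in descriptors]
    m = len(weights[0])
    return [sum(k[i] * weights[i][j] for i in range(len(k))) for j in range(m)]

# Toy training set (hypothetical): 2D descriptors mapped to 2D "poses".
descriptors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
poses = [[10.0 * x0, 20.0 * x1 - 5.0] for x0, x1 in descriptors]

weights = fit_kernel_ridge(descriptors, poses)
estimate = predict_pose([0.25, 0.75], descriptors, weights)
```

With a very small `lam` the regressor nearly interpolates the training poses; a larger value trades training accuracy for smoothness. The sparse RVM regressors mentioned in the abstract would instead keep only a subset of the kernel basis functions, which this sketch does not attempt.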

Keywords

COMPUTER VISION; HUMAN MOTION ESTIMATION; MACHINE LEARNING; MULTIVARIATE REGRESSION; RELEVANCE VECTOR MACHINE

Domains

Other [cs.OH]

Main file

RR-5333.pdf (1.63 MB)

multsols.png (11.88 KB)

Format: Figure, Image

https://inria.hal.science/inria-00070668

Submitted on: Friday, 19 May 2006, 21:09:55

Last modified on: Thursday, 4 April 2024, 18:24:35

Long-term archiving on: Monday, 17 September 2012, 16:01:21

Dates and versions

inria-00070668, version 1 (19-05-2006)

Identifiers

HAL Id: inria-00070668, version 1

Cite

Ankur Agarwal, Bill Triggs. Learning Methods for Recovering 3D Human Pose from Monocular Images. [Research Report] RR-5333, INRIA. 2004, pp.25. ⟨inria-00070668⟩

Collections

UGA IMAG CNRS INRIA INRIA-RRRT INRIA2 LARA

225 views

328 downloads
