A comprehensive system for facial animation of generic 3D head models driven by speech.

Lucas Terissi; Mauricio Cerda; Juan C. Gomez; Nancy Hitschfeld-Kahler; Bernard Girau

doi:10.1186/1687-4722-2013-5

Journal article in EURASIP Journal on Audio, Speech, and Music Processing, Year: 2013

Lucas Terissi

Role: Author
PersonId : 864904

Laboratory for System Dynamics and Signal Processing

Mauricio Cerda

Role: Author
PersonId : 954904

Laboratory for Scientific Image Analysis

Juan C. Gomez

Role: Author
PersonId : 882784

Laboratory for System Dynamics and Signal Processing

Nancy Hitschfeld-Kahler

Role: Author
PersonId : 843076

Computer Science Department [Santiago]

Bernard Girau

Role: Author
PersonId : 830272
IdHAL : bernard-girau
ORCID : 0009-0006-6037-6220

Neuromimetic intelligence

Abstract

A comprehensive system for facial animation of generic 3D head models driven by speech is presented in this article. In the training stage, audio-visual information is extracted from audio-visual training data and then used to compute the parameters of a single joint audio-visual hidden Markov model (AV-HMM). In contrast to most methods in the literature, the proposed approach does not require segmentation or classification stages for the audio-visual data, which avoids the error propagation associated with these procedures. The trained AV-HMM provides a compact representation of the audio-visual data without the need for phoneme (or word) segmentation, which makes it adaptable to different languages. Visual features are estimated from the speech signal by inverting the AV-HMM. The estimated visual speech features are used to animate a simple face model. The animation of a more complex head model is then obtained by automatically mapping the deformation of the simple model onto it, using a small number of control points for the interpolation. The proposed algorithm allows the animation of 3D head models of arbitrary complexity through a simple setup procedure. The resulting animation is evaluated in terms of the intelligibility of visual speech through perceptual tests, showing promising performance. The computational complexity of the proposed system is analyzed, showing that a real-time implementation is feasible.
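
The control-point mapping mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which interpolation scheme the authors use, so the Gaussian radial basis function (RBF) interpolation below is an assumption chosen for illustration, and the names `rbf_weights`, `deform_mesh`, `sigma`, and `reg`, as well as the toy data, are hypothetical. The idea shown is only this: displacements known at a few control points are spread smoothly over all vertices of a denser mesh.

```python
import numpy as np


def rbf_weights(control_rest, control_disp, sigma=1.0, reg=1e-9):
    """Solve for Gaussian RBF weights that reproduce the control-point displacements.

    Assumption: Gaussian RBFs stand in for the (unspecified) interpolation scheme.
    """
    # Pairwise squared distances between control points, shape (K, K).
    diff = control_rest[:, None, :] - control_rest[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    # A small ridge term keeps the linear system well conditioned.
    kernel += reg * np.eye(len(control_rest))
    return np.linalg.solve(kernel, control_disp)  # shape (K, 3)


def deform_mesh(vertices, control_rest, control_disp, sigma=1.0):
    """Displace every vertex of the dense mesh by the interpolated field."""
    weights = rbf_weights(control_rest, control_disp, sigma)
    # Squared distances from each vertex to each control point, shape (N, K).
    diff = vertices[:, None, :] - control_rest[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    return vertices + kernel @ weights


# Hypothetical toy data: four control points and their displacements,
# as they might be read off a simple face model for one animation frame.
control_rest = np.array(
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
)
control_disp = np.array(
    [[0.0, 0.0, 0.0], [0.0, -0.1, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.05]]
)

# "Complex" mesh: the control points themselves plus one extra interior vertex.
vertices = np.vstack([control_rest, [[0.5, 0.5, 0.0]]])
deformed = deform_mesh(vertices, control_rest, control_disp, sigma=0.5)
```

In this sketch the vertices that coincide with control points move by the prescribed displacements (up to the small ridge term), while the interior vertex receives a smooth blend. The width `sigma` controls how far each control point's influence reaches and would need tuning for a real head model.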

Domains

Neural networks [cs.NE]

Contributor: Bernard Girau

https://inria.hal.science/hal-00974678

Submitted on: Monday, April 7, 2014, 12:26:22

Last modified on: Friday, October 27, 2023, 16:14:06

Dates and versions

hal-00974678, version 1 (07-04-2014)

Identifiers

HAL Id: hal-00974678, version 1
DOI: 10.1186/1687-4722-2013-5

Cite

Lucas Terissi, Mauricio Cerda, Juan C. Gomez, Nancy Hitschfeld-Kahler, Bernard Girau. A comprehensive system for facial animation of generic 3D head models driven by speech. EURASIP Journal on Audio, Speech, and Music Processing, 2013, ⟨10.1186/1687-4722-2013-5⟩. ⟨hal-00974678⟩

Collections

CNRS INRIA INRIA-CHILE UNIV-LORRAINE INRIA2 LORIA LORIA-AIS

88 views

0 downloads
