Towards a True Acoustic-Visual Speech Synthesis

Asterios Toutios 1, Utpala Musti 1, Slim Ouni 1 *, Vincent Colotte 1, Brigitte Wrobel-Dautcourt 2, Marie-Odile Berger 2
* Corresponding author
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
2 MAGRIT - Visual Augmentation of Complex Environments
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: This paper presents an initial bimodal acoustic-visual synthesis system able to generate concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of this approach, since both the synthesized speech signal and the face animation are of good quality.
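The unit selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the costs or the search procedure, so everything here is an assumption made for illustration: the `Unit` fields, the Euclidean distances at the unit boundaries, the `w_visual` weight, and the Viterbi-style dynamic programming search (a common choice in concatenative synthesis, not confirmed as the authors' method).

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Unit:
    """Hypothetical bimodal diphone candidate; all fields are illustrative."""
    name: str
    target_cost: float              # mismatch with the requested target
    acoustic_start: Sequence[float]  # acoustic features at the left boundary
    acoustic_end: Sequence[float]    # acoustic features at the right boundary
    visual_start: Sequence[float]    # visual (face) features at the left boundary
    visual_end: Sequence[float]      # visual (face) features at the right boundary


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def join_cost(prev: Unit, nxt: Unit, w_visual: float = 1.0) -> float:
    """Acoustic join cost augmented with a weighted visual join cost."""
    acoustic = _dist(prev.acoustic_end, nxt.acoustic_start)
    visual = _dist(prev.visual_end, nxt.visual_start)
    return acoustic + w_visual * visual


def select_units(candidates: List[List[Unit]], w_visual: float = 1.0) -> List[Unit]:
    """Pick one candidate per position, minimizing target plus join costs."""
    if not candidates or any(not c for c in candidates):
        raise ValueError("every position needs at least one candidate")
    # best[j]: cheapest cost of any path ending in candidate j of the current position
    best = [u.target_cost for u in candidates[0]]
    back: List[List[int]] = []
    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for u in candidates[t]:
            costs = [best[i] + join_cost(p, u, w_visual)
                     for i, p in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + u.target_cost)
            pointers.append(i_min)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path back from the last position.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]
```

With `w_visual` set to 0 the sketch reduces to acoustic-only selection, which makes it easy to see how the visual term changes the chosen sequence when two candidates join equally well acoustically but differ at the visual boundary.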
Document type:
Conference paper
9th International Conference on Auditory-Visual Speech Processing - AVSP2010, Sep 2010, Hakone, Kanagawa, Japan. pp.POS1-8, 2010, AVSP2010

Cited literature: 16 references

https://hal.inria.fr/inria-00526782
Contributor: Slim Ouni
Submitted on: Friday, 15 October 2010 - 17:08:01
Last modified on: Thursday, 11 January 2018 - 06:20:14
Document(s) archived on: Monday, 17 January 2011 - 10:55:05

File

AVSP10-AT.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00526782, version 1

Citation

Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, et al. Towards a True Acoustic-Visual Speech Synthesis. 9th International Conference on Auditory-Visual Speech Processing - AVSP2010, Sep 2010, Hakone, Kanagawa, Japan. pp.POS1-8, 2010, AVSP2010. 〈inria-00526782〉
