inria-00526782, version 1
Towards a True Acoustic-Visual Speech Synthesis
9th International Conference on Auditory-Visual Speech Processing - AVSP2010 (2010) POS1-8
Abstract: This paper presents an initial bimodal acoustic-visual synthesis system able to generate concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. The visual information is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on the classic target and join costs used in acoustic-only synthesis, augmented with a visual join cost. Preliminary results indicate the benefits of this approach: both the synthesized speech signal and the face animation are of good quality.
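The abstract describes unit selection driven by target and join costs, with the acoustic join cost augmented by a visual one. The record does not give the features, distance measures, weights, or search procedure, so the following is only a generic sketch of that idea under stated assumptions: each candidate diphone carries hypothetical acoustic and visual feature vectors at its boundaries plus a precomputed target cost, join costs are weighted Euclidean distances at the concatenation point, and the lowest-cost sequence is found with a standard Viterbi search. All names (`Unit`, `join_cost`, `select_units`, `w_visual`, etc.) are illustrative, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Unit:
    """Hypothetical bimodal diphone unit (assumed boundary features, not the paper's)."""
    label: str
    acoustic_start: tuple[float, ...]
    acoustic_end: tuple[float, ...]
    visual_start: tuple[float, ...]
    visual_end: tuple[float, ...]
    target_cost: float  # assumed precomputed mismatch with the target specification


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def join_cost(prev: Unit, nxt: Unit, w_acoustic: float = 1.0, w_visual: float = 1.0) -> float:
    # Acoustic join cost augmented with a visual join cost at the concatenation point.
    return (w_acoustic * euclidean(prev.acoustic_end, nxt.acoustic_start)
            + w_visual * euclidean(prev.visual_end, nxt.visual_start))


def select_units(candidates: Sequence[Sequence[Unit]], w_target: float = 1.0,
                 w_acoustic: float = 1.0, w_visual: float = 1.0):
    """Viterbi search over one candidate list per target diphone."""
    # best[i][j]: lowest total cost of a sequence ending at candidate j of target i.
    best = [[w_target * u.target_cost for u in candidates[0]]]
    back = [[0] * len(candidates[0])]
    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [best[i - 1][k] + join_cost(p, u, w_acoustic, w_visual)
                     for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + w_target * u.target_cost)
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        if i > 0:
            j = back[i][j]
    path.reverse()
    return path, total
```

With `w_visual=0.0` the search reduces to acoustic-only selection; raising it penalizes candidates whose face configuration does not match at the joint, which can change the selected sequence even when the acoustic join is perfect.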
- a – Université Nancy II
- b – INRIA
- c – Université Henri Poincaré - Nancy I
- 1: INRIA – CNRS: UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- 2: CNRS: UMR7503 – INRIA – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- Domain: Life Sciences/Other
Computer Science/Signal and Image Processing
Computer Science/Multimedia
Computer Science/Human-Computer Interaction
Engineering Sciences/Signal and Image Processing
- Keywords: audiovisual speech synthesis – talking head – bimodal unit concatenation – diphones
- http://hal.inria.fr/inria-00526782
- oai:hal.inria.fr:inria-00526782
- Contributor:
- Submitted on: Friday, 15 October 2010, 17:08:01
- Last modified on: Monday, 25 October 2010, 09:25:22


