Towards a True Acoustic-Visual Speech Synthesis

Asterios Toutios; Utpala Musti; Slim Ouni; Vincent Colotte; Brigitte Wrobel-Dautcourt; Marie-Odile Berger

Conference paper, year: 2010



Asterios Toutios

Role: Author
PersonId : 855198

Analysis, perception and recognition of speech

Utpala Musti

Role: Author
PersonId : 880717

Analysis, perception and recognition of speech

Slim Ouni

Role: Corresponding author
PersonId : 1158
IdHAL : slim-ouni
ORCID : 0000-0001-5286-7368


Analysis, perception and recognition of speech

Vincent Colotte

Role: Author
PersonId : 16268
IdHAL : vincent-colotte
IdRef : 070401683

Analysis, perception and recognition of speech

Brigitte Wrobel-Dautcourt

Role: Author
PersonId : 830676

Visual Augmentation of Complex Environments

Marie-Odile Berger

Role: Author
PersonId : 830601

Visual Augmentation of Complex Environments

Abstract

This paper presents an initial bimodal acoustic-visual synthesis system able to generate concurrently the speech signal and a 3D animation of the speaker's face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. The latter is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on classic target and join costs from acoustic-only synthesis, which are augmented with a visual join cost. Preliminary results indicate the benefits of this approach, since both the synthesized speech signal and the face animation are of good quality.
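The unit-selection step described in the abstract can be sketched as a Viterbi search. Each candidate bimodal diphone contributes a target cost. Each pair of consecutive candidates contributes an acoustic join cost plus a visual join cost. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. Several elements are hypothetical: the `BimodalUnit` structure, the use of Euclidean distances on boundary feature vectors (for example, 3D face-marker positions from stereovision for the visual part), and the weights `w_target`, `w_acoustic` and `w_visual`.

```python
import math
from dataclasses import dataclass


@dataclass
class BimodalUnit:
    """Hypothetical candidate diphone carrying both acoustic and visual data."""
    name: str
    target_features: list[float]  # compared against the target specification
    acoustic_start: list[float]   # acoustic features at the left boundary
    acoustic_end: list[float]     # acoustic features at the right boundary
    visual_start: list[float]     # face-marker coordinates at the left boundary
    visual_end: list[float]       # face-marker coordinates at the right boundary


def select_units(targets, candidates, w_target=1.0, w_acoustic=1.0, w_visual=1.0):
    """Viterbi search minimising target + acoustic join + visual join costs.

    targets[i] is an (assumed) feature vector for position i;
    candidates[i] is the list of BimodalUnit candidates for position i.
    Returns (selected units, total cost).
    """
    n = len(targets)
    # cost[i][j]: best cumulative cost of a path ending in candidates[i][j]
    cost = [[w_target * math.dist(targets[0], u.target_features) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, n):
        row_cost, row_back = [], []
        for u in candidates[i]:
            target_cost = w_target * math.dist(targets[i], u.target_features)
            best_k, best = None, math.inf
            for k, prev in enumerate(candidates[i - 1]):
                # Classic acoustic join cost augmented with a visual join cost
                join_cost = (w_acoustic * math.dist(prev.acoustic_end, u.acoustic_start)
                             + w_visual * math.dist(prev.visual_end, u.visual_start))
                c = cost[i - 1][k] + join_cost
                if c < best:
                    best_k, best = k, c
            row_cost.append(best + target_cost)
            row_back.append(best_k)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest final candidate
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = []
    for i in range(n - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


if __name__ == "__main__":
    p0 = BimodalUnit("p0", [0.0], [0.0], [0.0], [0.0, 0.0], [0.0, 0.0])
    a = BimodalUnit("a", [0.0], [0.0], [0.0], [5.0, 0.0], [0.0, 0.0])  # good target, bad visual join
    b = BimodalUnit("b", [1.0], [0.0], [0.0], [0.0, 0.0], [0.0, 0.0])  # worse target, smooth visual join
    for wv in (0.0, 1.0):
        path, total = select_units([[0.0], [0.0]], [[p0], [a, b]], w_visual=wv)
        print([u.name for u in path], total)
```

Setting `w_visual=0.0` reduces the search to acoustic-only selection. In the toy example, raising it switches the choice to the unit whose face configuration joins smoothly, at the price of a slightly worse target match.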

Keywords

audiovisual speech synthesis; talking head; bimodal unit concatenation; diphones

Domains

Other [q-bio.OT]; Signal and Image Processing [eess.SP]; Multimedia [cs.MM]; Human-Computer Interaction [cs.HC]

Main file

AVSP10-AT.pdf (935.4 KB)

Origin: Files produced by the author(s)

Contributor: Slim Ouni

https://inria.hal.science/inria-00526782

Submitted on: Friday, 15 October 2010, 17:08:01

Last modified on: Thursday, 15 February 2024, 03:31:55

Long-term archiving on: Monday, 17 January 2011, 10:55:05

Dates and versions

inria-00526782, version 1 (15-10-2010)

Identifiers

HAL Id: inria-00526782, version 1

Cite

Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, et al. Towards a True Acoustic-Visual Speech Synthesis. 9th International Conference on Auditory-Visual Speech Processing - AVSP2010, Sep 2010, Hakone, Kanagawa, Japan. pp. POS1-8. ⟨inria-00526782⟩


Collections

UNIV-RENNES1 CNRS INRIA IRISA UNIV-LORRAINE INRIA2 LORIA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR UR1-MATH-NUM

