Acoustic-visual synthesis technique using bimodal unit-selection

Slim Ouni 1, Vincent Colotte 1, Utpala Musti 1, Asterios Toutios 1, Brigitte Wrobel-Dautcourt 2, Marie-Odile Berger 2, Caroline Lavecchia 1
1 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
2 MAGRIT - Visual Augmentation of Complex Environments
Inria Nancy - Grand Est, LORIA - ALGO - Department of Algorithms, Computation, Image and Geometry
Abstract: This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker's outer face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. In the visual domain, we mainly focus on the dynamics of the face rather than on rendering. The proposed technique overcomes the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. The different synthesis steps are similar to typical concatenative speech synthesis but are generalized to the acoustic-visual domain. The bimodal synthesis was evaluated using perceptual and subjective evaluations. The overall outcome of the evaluation indicates that the proposed bimodal acoustic-visual synthesis technique provides intelligible speech in both acoustic and visual channels.
Document type:
Journal article
EURASIP Journal on Audio, Speech, and Music Processing, SpringerOpen, 2013, 〈http://asmp.eurasipjournals.com/content/2013/1/16〉. 〈10.1186/1687-4722-2013-16〉

https://hal.inria.fr/hal-00835854
Contributor: Slim Ouni
Submitted on: Thursday, June 20, 2013 - 00:47:19
Last modified on: Thursday, January 11, 2018 - 06:25:24

Citation

Slim Ouni, Vincent Colotte, Utpala Musti, Asterios Toutios, Brigitte Wrobel-Dautcourt, et al. Acoustic-visual synthesis technique using bimodal unit-selection. EURASIP Journal on Audio, Speech, and Music Processing, SpringerOpen, 2013, 〈http://asmp.eurasipjournals.com/content/2013/1/16〉. 〈10.1186/1687-4722-2013-16〉. 〈hal-00835854〉
