Inversion from Audiovisual Speech to Articulatory Information by Exploiting Multimodal Data

Abstract: We present an inversion framework to identify speech production properties from audiovisual information. Our system is built on a multimodal articulatory dataset comprising ultrasound, X-ray, and magnetic resonance images, as well as audio and stereovisual recordings of the speaker. Visual information is captured via stereovision, while the vocal tract state is represented by a properly trained articulatory model. Inversion is based on an adaptive piecewise linear approximation of the audiovisual-to-articulation mapping. The presented system can recover the hidden vocal tract shapes and may serve as a basis for a more widely applicable inversion setup.
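The abstract's "adaptive piecewise linear approximation of the audiovisual-to-articulation mapping" can be read as fitting several local linear maps over regions of the audiovisual feature space. The sketch below is only an illustration of that general idea, not the authors' implementation: the region partitioning (k-means), the number of regions, and the class/parameter names (`PiecewiseLinearInverter`, `n_regions`) are all assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code): piecewise linear regression
# from audiovisual features to articulatory parameters. The feature space is
# partitioned with k-means; one linear map is fitted per region, and at
# inference the map of the region nearest to the input is applied.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

class PiecewiseLinearInverter:
    def __init__(self, n_regions=8):
        self.n_regions = n_regions  # number of local linear regions (assumed value)
        self.kmeans = KMeans(n_clusters=n_regions, n_init=10, random_state=0)
        self.maps = []

    def fit(self, av_features, articulatory_params):
        """av_features: (N, D_av) audiovisual vectors; articulatory_params: (N, D_art)."""
        labels = self.kmeans.fit_predict(av_features)
        self.maps = []
        for r in range(self.n_regions):
            idx = labels == r
            reg = LinearRegression()
            reg.fit(av_features[idx], articulatory_params[idx])  # local linear map
            self.maps.append(reg)
        return self

    def predict(self, av_features):
        labels = self.kmeans.predict(av_features)
        d_art = self.maps[0].coef_.shape[0]
        out = np.empty((av_features.shape[0], d_art))
        for r in range(self.n_regions):
            idx = labels == r
            if np.any(idx):
                out[idx] = self.maps[r].predict(av_features[idx])
        return out
```

In practice an adaptive scheme would also choose the partitioning from the data rather than fixing it in advance; the fixed k-means step here only stands in for that region selection.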
Document type: Conference paper
8th International Seminar On Speech Production - ISSP'08, December 2008, Strasbourg, France.

https://hal.inria.fr/inria-00327031
Contributor: Michael Aron
Submitted on: Tuesday, January 6, 2009 - 14:16:36
Last modified on: Thursday, January 11, 2018 - 06:20:14
Document(s) archived on: Thursday, June 3, 2010 - 22:24:11

File

KatsamanisRoussosMaragosAronBe...
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00327031, version 1

Citation

Athanassios Katsamanis, Anastasios Roussos, Petros Maragos, Michael Aron, Marie-Odile Berger. Inversion from Audiovisual Speech to Articulatory Information by Exploiting Multimodal Data. 8th International Seminar On Speech Production - ISSP'08, Dec 2008, Strasbourg, France. 2008. 〈inria-00327031〉

Metrics

Record views: 255
File downloads: 75