Inversion from Audiovisual Speech to Articulatory Information by Exploiting Multimodal Data
Abstract
We present an inversion framework to identify speech production properties from audiovisual information. Our system is built on a multimodal articulatory dataset comprising ultrasound, X-ray, and magnetic resonance images, as well as audio and stereovisual recordings of the speaker. Visual information is captured via stereovision, while the vocal tract state is represented by a properly trained articulatory model. Inversion is based on an adaptive piecewise linear approximation of the audiovisual-to-articulation mapping. The presented system can recover the hidden vocal tract shapes and may serve as a basis for a more widely applicable inversion setup.
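The core idea of a piecewise linear approximation of the audiovisual-to-articulation mapping can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data is synthetic, the dimensions and the k-means partitioning are assumptions, and the paper's adaptive scheme is replaced by a fixed number of regions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (illustrative only): audiovisual feature
# vectors paired with articulatory model parameters.
n, d_av, d_art, k = 600, 6, 4, 3
X = rng.normal(size=(n, d_av))          # audiovisual features
W_true = rng.normal(size=(d_av, d_art))
Y = np.tanh(X) @ W_true                 # articulatory targets (nonlinear map)

# 1) Partition the audiovisual space with a few k-means iterations.
centers = X[rng.choice(n, k, replace=False)]
for _ in range(20):
    labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])

# 2) Fit one affine map per region via least squares.
maps = []
for j in range(k):
    Xj = np.hstack([X[labels == j], np.ones((np.sum(labels == j), 1))])
    Wj, *_ = np.linalg.lstsq(Xj, Y[labels == j], rcond=None)
    maps.append(Wj)

def invert(x_av):
    """Map an audiovisual feature vector to articulatory parameters
    using the affine map of its nearest region."""
    j = np.argmin(((centers - x_av) ** 2).sum(-1))
    return np.hstack([x_av, 1.0]) @ maps[j]

# Compare against a single global affine fit: the piecewise model
# cannot do worse on the training data, since each regional fit is
# optimal on its own subset.
Xa = np.hstack([X, np.ones((n, 1))])
Wg, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
err_global = np.mean((Xa @ Wg - Y) ** 2)
err_pw = np.mean((np.array([invert(x) for x in X]) - Y) ** 2)
```

Partitioning the input space and fitting local linear maps lets a collection of simple regressors approximate a nonlinear mapping; the adaptive variant in the paper additionally chooses how the space is divided.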
Main file
KatsamanisRoussosMaragosAronBerger_AVInversionMultimodalArtData_issp2008.pdf (696.35 Ko)
Origin: Files produced by the author(s)