Conference papers

Inversion from Audiovisual Speech to Articulatory Information by Exploiting Multimodal Data

Abstract: We present an inversion framework to identify speech production properties from audiovisual information. Our system is built on a multimodal articulatory dataset comprising ultrasound, X-ray, and magnetic resonance images as well as audio and stereovisual recordings of the speaker. Visual information is captured via stereovision, while the vocal tract state is represented by a properly trained articulatory model. Inversion is based on an adaptive piecewise linear approximation of the audiovisual-to-articulation mapping. The presented system can recover the hidden vocal tract shapes and may serve as a basis for a more widely applicable inversion setup.
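The piecewise linear inversion described in the abstract can be illustrated with a minimal sketch: partition the audiovisual feature space into regions, fit one affine map per region, and invert a new observation with the map of its nearest region. All dimensions, the synthetic data, and the plain k-means partitioning below are illustrative assumptions, not the paper's actual adaptive scheme or corpus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: audiovisual features (e.g. spectral coefficients
# plus stereovision lip coordinates) and articulatory model parameters.
AV_DIM, ART_DIM, N_CLUSTERS = 12, 6, 4

# Synthetic training pairs standing in for the multimodal corpus.
av = rng.normal(size=(500, AV_DIM))
art = av @ rng.normal(size=(AV_DIM, ART_DIM)) \
      + 0.01 * rng.normal(size=(500, ART_DIM))

# 1. Partition the audiovisual space (simple k-means stands in for the
#    adaptive partitioning the abstract refers to).
centers = av[rng.choice(len(av), N_CLUSTERS, replace=False)]
for _ in range(20):
    labels = np.argmin(((av[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([av[labels == k].mean(0) if np.any(labels == k)
                        else centers[k] for k in range(N_CLUSTERS)])

# 2. Fit one affine map per region by least squares (bias term appended).
maps = []
for k in range(N_CLUSTERS):
    X = np.hstack([av[labels == k], np.ones((np.sum(labels == k), 1))])
    W, *_ = np.linalg.lstsq(X, art[labels == k], rcond=None)
    maps.append(W)

def invert(observation):
    """Map one audiovisual feature vector to articulatory parameters."""
    k = np.argmin(((observation - centers) ** 2).sum(-1))
    return np.append(observation, 1.0) @ maps[k]

print(invert(av[0]).shape)  # one articulatory parameter vector, (ART_DIM,)
```

Because each region's map is estimated independently, the approximation can follow a nonlinear mapping region by region while keeping each local inversion a cheap matrix product.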

https://hal.inria.fr/inria-00327031
Contributor: Michael Aron
Submitted on: Tuesday, January 6, 2009 - 2:16:36 PM
Last modification on: Tuesday, May 18, 2021 - 3:42:01 PM
Long-term archiving on: Thursday, June 3, 2010 - 10:24:11 PM

File

KatsamanisRoussosMaragosAronBe...
Files produced by the author(s)

Identifiers

  • HAL Id: inria-00327031, version 1


Citation

Athanassios Katsamanis, Anastasios Roussos, Petros Maragos, Michael Aron, Marie-Odile Berger. Inversion from Audiovisual Speech to Articulatory Information by Exploiting Multimodal Data. 8th International Seminar On Speech Production - ISSP'08, Dec 2008, Strasbourg, France. ⟨inria-00327031⟩

Metrics

Record views: 305
File downloads: 292