Phoneme-to-Articulatory mapping using bidirectional gated RNN

Théo Biasutto-Lervat 1, Slim Ouni 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Deriving articulatory dynamics from the acoustic speech signal has been addressed in several speech production studies. In this paper, we investigate whether articulatory dynamics can be predicted from phonetic information alone, without the acoustic speech signal. Such input may be considered acoustically poor, since it probably carries no explicit coarticulation information, but we expect the phonetic sequence to provide compact yet rich knowledge. Motivated by the recent success of deep learning techniques in acoustic-to-articulatory inversion, we experimented with bidirectional gated recurrent neural network architectures. We trained these models on an electromagnetic articulography (EMA) corpus and obtained good performance, similar to state-of-the-art articulatory inversion from line spectral frequency (LSF) features, while using only the phoneme labels and durations.
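
The mapping described in the abstract can be illustrated with a minimal, untrained sketch. It is not the authors' model. It assumes that each (phoneme, duration) pair is expanded into frame-level one-hot vectors and that the concatenated forward and backward GRU states are projected linearly onto articulator coordinates. The phoneme inventory, the 10 ms frame shift, the layer sizes, the random weights and all function names are hypothetical, and bias terms are omitted for brevity.

```python
import math
import random

PHONEMES = ["sil", "b", "a"]   # hypothetical toy inventory
FRAME_MS = 10                  # assumed frame shift, not taken from the paper

def expand_to_frames(segments):
    """Repeat a one-hot phoneme vector for every frame the phoneme spans."""
    frames = []
    for label, dur_ms in segments:
        one_hot = [1.0 if p == label else 0.0 for p in PHONEMES]
        frames.extend([one_hot] * max(1, round(dur_ms / FRAME_MS)))
    return frames

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_gru(n_in, n_hid, rng):
    # One (input, recurrent) weight pair each for the update gate,
    # the reset gate and the candidate state.
    return [(rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng))
            for _ in range(3)]

def gru_step(params, x, h):
    (Wz, Uz), (Wr, Ur), (Wc, Uc) = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]
    gated = [ri * hi for ri, hi in zip(r, h)]
    c = [math.tanh(a + b) for a, b in zip(matvec(Wc, x), matvec(Uc, gated))]
    # Interpolate between the previous state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]

def run_gru(params, frames, n_hid):
    h, states = [0.0] * n_hid, []
    for x in frames:
        h = gru_step(params, x, h)
        states.append(h)
    return states

def bidirectional_gru(fwd, bwd, frames, n_hid):
    forward = run_gru(fwd, frames, n_hid)
    # Run the second GRU over the reversed sequence, then restore time order.
    backward = run_gru(bwd, frames[::-1], n_hid)[::-1]
    return [f + b for f, b in zip(forward, backward)]  # concatenate per frame

rng = random.Random(0)
N_HID, N_OUT = 4, 2            # N_OUT: e.g. x/y of one hypothetical EMA sensor
segments = [("sil", 30), ("b", 50), ("a", 80)]  # (phoneme, duration in ms)
frames = expand_to_frames(segments)
fwd = init_gru(len(PHONEMES), N_HID, rng)
bwd = init_gru(len(PHONEMES), N_HID, rng)
W_out = rand_matrix(N_OUT, 2 * N_HID, rng)
states = bidirectional_gru(fwd, bwd, frames, N_HID)
trajectory = [matvec(W_out, s) for s in states]
print(len(frames), len(trajectory), len(trajectory[0]))  # 16 16 2
```

Because the backward pass reads the sequence from the end, the state at each frame depends on upcoming phonemes as well as past ones, which is one plausible way for such a model to capture coarticulation from labels and durations alone.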

https://hal.inria.fr/hal-01862587
Contributor : Slim Ouni
Submitted on : Monday, August 27, 2018 - 2:40:29 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Document(s) archived on : Wednesday, November 28, 2018 - 4:15:47 PM

File

1202_Paper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01862587, version 1

Citation

Théo Biasutto-Lervat, Slim Ouni. Phoneme-to-Articulatory mapping using bidirectional gated RNN. Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Sep 2018, Hyderabad, India. ⟨hal-01862587⟩
