Conference papers

Modeling Labial Coarticulation with Bidirectional Gated Recurrent Networks and Transfer Learning

Théo Biasutto--Lervat 1 Sara Dahmani 1 Slim Ouni 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In this study, we investigate how to learn labial coarticulation in order to generate a sparse representation of the face from speech. To do so, we experiment with a sequential deep learning model, bidirectional gated recurrent networks, which have achieved good results on the articulatory inversion problem and should therefore be able to handle coarticulation effects. As acquiring audiovisual corpora is expensive and time-consuming, we designed our solution to counteract the lack of data. First, we used phonetic information (phoneme labels and their respective durations) as input to ensure speaker independence; second, we experimented with pretraining strategies to reach acceptable performance. We demonstrate how a careful initialization of the last layers of the network can greatly ease training and help to handle coarticulation effects. This initialization relies on dimensionality reduction strategies, which make it possible to inject knowledge of a useful latent representation of the visual data into the network. We focused on two data-driven tools (PCA and an autoencoder) and one hand-crafted latent space from the animation community, blendshape decomposition. We trained and evaluated the model on a corpus consisting of 4 hours of French speech, and obtained an average RMSE close to 1.3 mm.
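The abstract does not spell out how the last layers are initialized, so the following is only a minimal sketch of one plausible reading: a linear output layer whose weights are set to a PCA basis of the visual data and whose bias is set to the data mean, so that the earlier layers only need to predict low-dimensional PCA codes. The function names (`pca_output_layer`, `encode`, `decode`, `rmse`) and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def pca_output_layer(visual_frames, n_components):
    """Return (weights, bias) for a linear layer that decodes PCA codes.

    visual_frames: array of shape (n_frames, n_features).
    """
    mean = visual_frames.mean(axis=0)
    centered = visual_frames - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    weights = vt[:n_components]  # shape (n_components, n_features)
    return weights, mean

def encode(visual_frames, weights, bias):
    """Project frames onto the PCA basis (targets for the earlier layers)."""
    return (visual_frames - bias) @ weights.T

def decode(codes, weights, bias):
    """Forward pass of the PCA-initialized linear output layer."""
    return codes @ weights + bias

def rmse(pred, target):
    """Root-mean-square error over all frames and coordinates."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Synthetic stand-in for visual data (hypothetical, not the paper's corpus):
# 500 frames of 12 coordinates lying close to a 3-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 12))
frames = latent @ mixing + 0.01 * rng.normal(size=(500, 12))

weights, bias = pca_output_layer(frames, n_components=3)
codes = encode(frames, weights, bias)
recon = decode(codes, weights, bias)
error = rmse(recon, frames)
```

In a full model, `weights` and `bias` would be copied into the final dense layer of the recurrent network before training; whether that layer is then frozen or fine-tuned is a design choice this sketch leaves open.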

Cited literature [38 references]
Contributor : Slim Ouni
Submitted on : Saturday, July 6, 2019 - 11:46:16 AM
Last modification on : Tuesday, November 9, 2021 - 4:03:54 AM




  • HAL Id : hal-02175780, version 1



Théo Biasutto--Lervat, Sara Dahmani, Slim Ouni. Modeling Labial Coarticulation with Bidirectional Gated Recurrent Networks and Transfer Learning. INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Sep 2019, Graz, Austria. ⟨hal-02175780⟩


