Autoencoder-Based Tongue Shape Estimation During Continuous Speech - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2022

Autoencoder-Based Tongue Shape Estimation During Continuous Speech

Abstract

Vocal tract shape estimation is a necessary step for articulatory speech synthesis. However, the literature on the topic is scarce, and most current methods fail to satisfy many of the physical constraints of speech production. This study proposes an alternative approach that addresses specific issues encountered in previous work, especially those related to critical articulators. We present an autoencoder-based method for tongue shape estimation during continuous speech. An autoencoder is trained to learn an encoding of the data and serves as an auxiliary network for the principal one, which maps phonemes to tongue shapes. Instead of predicting the exact points of the target curve, the principal network learns to predict the curve's main components, i.e., the autoencoder's latent representation. We show how this approach makes it possible to impose constraints on critical articulators, control the tongue shape through the latent space, and generate smooth output without relying on any postprocessing.
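
The two-stage structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the paper uses neural networks, whereas the code below swaps in a linear autoencoder (computed with an SVD) and a least-squares phoneme-to-latent map so that it runs with NumPy alone. The synthetic contours, the dimensions, and the names `encode`, `decode`, and `predict_shape` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 200 tongue contours of 50 samples each,
# built from 3 smooth modes whose weights depend on the phoneme label.
n_phonemes, n_points, latent_dim, n_samples = 5, 50, 3, 200
t = np.linspace(0.0, np.pi, n_points)
modes = np.stack([np.sin(t), np.sin(2 * t), np.cos(t)])       # (3, n_points)
phoneme_weights = rng.normal(size=(n_phonemes, latent_dim))   # (n_phonemes, 3)
labels = rng.integers(0, n_phonemes, size=n_samples)
curves = phoneme_weights[labels] @ modes
curves += 0.01 * rng.normal(size=curves.shape)                # measurement noise

# Stage 1 (auxiliary model): a linear autoencoder, used here as a stand-in
# for the neural autoencoder. It learns a low-dimensional code for the curves.
mean = curves.mean(axis=0)
_, _, vt = np.linalg.svd(curves - mean, full_matrices=False)
components = vt[:latent_dim]                                  # (latent_dim, n_points)

def encode(x):
    """Project curves onto the learned latent components."""
    return (x - mean) @ components.T

def decode(z):
    """Reconstruct curves from latent codes."""
    return z @ components + mean

# Stage 2 (principal model): map phonemes to latent codes, not to curve points.
# A least-squares fit on one-hot phoneme labels stands in for the neural network.
one_hot = np.eye(n_phonemes)[labels]
latent_targets = encode(curves)
weights, *_ = np.linalg.lstsq(one_hot, latent_targets, rcond=None)

def predict_shape(phoneme_id):
    """Predict a latent code for the phoneme, then decode it into a curve."""
    return decode(weights[phoneme_id])

reconstruction_mse = float(np.mean((decode(encode(curves)) - curves) ** 2))
shape = predict_shape(0)
```

Because every predicted curve is decoded from a few latent coordinates, it stays within the span of the learned components, which is one way to see why no separate smoothing step is needed in this toy setting. Shifting a single latent coordinate before calling `decode` changes the shape along one learned component, a simple analogue of controlling the tongue shape through the latent space.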
Main file
INTERSPEECH_2022_Autoencoder_Tongue_Shape_RibeiroLaprie.pdf (364.94 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03798790, version 1 (05-10-2022)

Identifiers

  • HAL Id: hal-03798790, version 1

Cite

Vinicius Ribeiro, Yves Laprie. Autoencoder-Based Tongue Shape Estimation During Continuous Speech. 23rd INTERSPEECH Conference on "Human and Humanizing Speech Technology", Sep 2022, Incheon, South Korea. ⟨hal-03798790⟩
