Autoencoder-Based Tongue Shape Estimation During Continuous Speech

Vinicius Ribeiro; Yves Laprie

Conference paper, Year: 2022

Vinicius Ribeiro

Role: Author
PersonId: 825550
IdHAL: vinicius-de-paulo-souza-ribeiro
ORCID: 0000-0001-5897-5765

Speech Modeling for Facilitating Oral-Based Communication

Yves Laprie

Role: Author
PersonId: 6696
IdHAL: yves-laprie
ORCID: 0000-0002-2379-6481
IdRef: 060274387

Speech Modeling for Facilitating Oral-Based Communication

Abstract

Vocal tract shape estimation is a necessary step for articulatory speech synthesis. However, the literature on the topic is scarce, and most current methods fail to satisfy many of the physical constraints of speech production. This study proposes an alternative approach that addresses specific issues in previous work, especially those related to critical articulators. We present an autoencoder-based method for tongue shape estimation during continuous speech. An autoencoder is trained to learn an encoding of the data and serves as an auxiliary network for the principal network, which maps phonemes to tongue shapes. Instead of predicting the exact points of the target curve, the principal network learns to predict the curve's main components, i.e., the autoencoder's latent representation. We show that this approach allows imposing constraints on critical articulators, controlling the tongue shape through the latent space, and generating a smooth output without relying on any postprocessing method.
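
As an illustration of the idea in the abstract, the Python sketch below predicts a latent code from a phoneme label and decodes it into a curve. It is not the authors' implementation. The contours are synthetic, the neural autoencoder is replaced by a linear one fitted in closed form with an SVD (equivalent to PCA), and the phoneme-to-latent model is a least-squares map from one-hot labels. All names (`encode`, `decode`, `predict_shape`, `N_PHONEMES`, `LATENT_DIM`) and all sizes are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 phoneme classes, 50 points per contour, 4 latent dims.
N_PHONEMES, N_POINTS, LATENT_DIM = 5, 50, 4
n_samples = 200

# Synthetic stand-in for tongue contours (NOT real articulatory data).
# Each contour is stored as [x_1..x_P, y_1..y_P].
labels = np.arange(n_samples) % N_PHONEMES
t = np.linspace(0.0, 1.0, N_POINTS)
prototypes = np.stack([
    np.concatenate([t, np.sin(np.pi * t * (1.0 + 0.3 * k))])
    for k in range(N_PHONEMES)
])
curves = prototypes[labels] + 0.01 * rng.standard_normal((n_samples, 2 * N_POINTS))

# 1) Auxiliary model: a linear autoencoder fitted via SVD (PCA),
#    swapped in here for the neural autoencoder of the paper.
mean = curves.mean(axis=0)
_, _, vt = np.linalg.svd(curves - mean, full_matrices=False)
components = vt[:LATENT_DIM]          # shape (LATENT_DIM, 2 * N_POINTS)

def encode(x):
    """Project contours onto the latent space."""
    return (x - mean) @ components.T

def decode(z):
    """Map latent codes back to contour points."""
    return z @ components + mean

# 2) Principal model: one-hot phoneme -> latent code, fitted by least squares.
#    The regression target is the latent code, not the contour points.
one_hot = np.eye(N_PHONEMES)[labels]
weights, *_ = np.linalg.lstsq(one_hot, encode(curves), rcond=None)

def predict_shape(phoneme_id):
    """Predict a latent code for a phoneme, then decode it into a contour."""
    z = np.eye(N_PHONEMES)[phoneme_id] @ weights
    return decode(z).reshape(2, N_POINTS)   # row 0: x, row 1: y

shape = predict_shape(2)

# Latent-space control: nudge one latent coordinate and decode again.
z = encode(curves[:1])
z[0, 0] += 0.5
moved = decode(z)
```

Because the decoder only produces curves that lie in the span of the learned components, the output inherits the smoothness of the training contours, which is the intuition behind avoiding a separate postprocessing step. A nonlinear autoencoder would play the same role with a richer latent space.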

Keywords

autoencoder, tongue shape estimation, speech processing, phonetics

Domains

Computation and Language [cs.CL]

Main file

INTERSPEECH_2022_Autoencoder_Tongue_Shape_RibeiroLaprie.pdf (364.94 KB)

Origin: Files produced by the author(s)

Contributor: Yves Laprie

https://inria.hal.science/hal-03798790

Submitted on: Wednesday, October 5, 2022, 14:30:08

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Friday, January 6, 2023, 19:25:38

Dates and versions

hal-03798790, version 1 (5 October 2022)

Identifiers

HAL Id: hal-03798790, version 1

Cite

Vinicius Ribeiro, Yves Laprie. Autoencoder-Based Tongue Shape Estimation During Continuous Speech. 23rd INTERSPEECH Conference on "Human and Humanizing Speech Technology", Sep 2022, Incheon, South Korea. ⟨hal-03798790⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD
