Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Conference paper, Year: 2023

Abstract

Recent work has shown that high-quality speech can be resynthesized not from text but from low-bitrate discrete units learned in a self-supervised fashion, which can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalizations). Adoption of these methods is still limited because most speech synthesis datasets consist of read speech, which severely limits spontaneity and expressivity. Here, we introduce EXPRESSO, a high-quality expressive speech dataset for textless speech synthesis that includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles. We illustrate the challenges and potential of this dataset with an expressive resynthesis benchmark in which the task is to encode the input in low-bitrate units and resynthesize it in a target voice while preserving content and style. We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders, and explore tradeoffs between quality, bitrate, and invariance to speaker and style. The dataset, evaluation metrics, and baseline models are open source.
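
To make the notion of "low-bitrate discrete units" concrete, the following is a minimal sketch of two common ways to estimate the bitrate of a unit stream: a uniform upper bound (frame rate times log2 of the codebook size) and an empirical estimate based on unigram entropy. The function names, the 50 Hz frame rate, the 512-unit codebook, and the toy unit sequence are all assumptions chosen for illustration; they are not taken from the paper and may not match its bitrate definition.

```python
import math
from collections import Counter
from typing import Sequence


def uniform_bitrate(frame_rate_hz: float, vocab_size: int) -> float:
    """Upper bound in bits/s: every frame costs log2(vocab_size) bits."""
    return frame_rate_hz * math.log2(vocab_size)


def entropy_bitrate(units: Sequence[int], frame_rate_hz: float) -> float:
    """Empirical estimate in bits/s from the unigram entropy of the units."""
    counts = Counter(units)
    total = len(units)
    # Shannon entropy (bits per unit) of the observed unit distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return frame_rate_hz * entropy


# Hypothetical values for illustration only (not results from the paper).
toy_units = [3, 3, 7, 7, 7, 1, 3, 0]
print(uniform_bitrate(50.0, 512))        # 450.0
print(entropy_bitrate(toy_units, 50.0))  # below the uniform bound for 8 symbols
```

The entropy-based figure never exceeds the uniform bound for the same codebook, which is one reason the two are often reported side by side when comparing encoders.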
Main file
expresso_arxiv.pdf (111.3 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04208441, version 1 (21-09-2023)

Cite

Tu Anh Nguyen, Wei-Ning Hsu, Antony d'Avirro, Bowen Shi, Itai Gat, et al. Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. INTERSPEECH 2023 - 24th Annual Conference of the International Speech Communication Association, Aug 2023, Dublin, Ireland. pp. 4823-4827, ⟨10.21437/Interspeech.2023-1905⟩. ⟨hal-04208441⟩