Journal Article — Transactions of the Association for Computational Linguistics, 2023

Generative Spoken Dialogue Language Modeling

Abstract

We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It combines recent work on unsupervised spoken unit discovery with a dual-tower transformer architecture with cross-attention, trained on 2,000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model generates speech, laughter, and other paralinguistic signals in the two channels simultaneously and reproduces more naturalistic and fluid turn-taking than a text-based cascaded model.
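To illustrate the dual-tower idea described in the abstract, the sketch below shows a minimal PyTorch layout in which each channel's tower self-attends over its own discrete speech units and cross-attends to the other speaker's channel. This is an illustrative assumption-based sketch, not the authors' implementation: the class names, layer sizes, unit vocabulary, and the weight sharing between towers are hypothetical, and the duration/pitch prediction and vocoder stages of the full system are omitted.

```python
import torch
import torch.nn as nn

class CrossAttentionLayer(nn.Module):
    """One tower layer: causal self-attention over its own channel,
    then cross-attention over the other speaker's channel."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, other):
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.norms[0](x)
        x = x + self.self_attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        h = self.norms[1](x)
        x = x + self.cross_attn(h, other, other, need_weights=False)[0]
        return x + self.ffn(self.norms[2](x))

class DualTowerLM(nn.Module):
    """Two weight-shared towers, one per audio channel, each predicting the
    next discrete unit of its own channel while attending to the other."""
    def __init__(self, n_units=500, d_model=512, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(n_units, d_model)
        self.layers = nn.ModuleList(CrossAttentionLayer(d_model) for _ in range(n_layers))
        self.head = nn.Linear(d_model, n_units)

    def forward(self, units_a, units_b):
        a, b = self.embed(units_a), self.embed(units_b)
        for layer in self.layers:
            a, b = layer(a, b), layer(b, a)  # each tower cross-attends to the other
        return self.head(a), self.head(b)

# Toy usage: two parallel streams of discrete units (e.g., codes from a
# self-supervised speech encoder), one per speaker channel.
model = DualTowerLM()
units_a = torch.randint(0, 500, (1, 128))
units_b = torch.randint(0, 500, (1, 128))
logits_a, logits_b = model(units_a, units_b)
```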
Main file

2023.tacl-1.15.pdf (785.59 KB)
Origin: Publisher files allowed on an open archive
License: CC BY - Attribution

Dates and versions

hal-03985368 , version 1 (13-02-2023)
hal-03985368 , version 2 (23-01-2024)

License

Attribution (CC BY)

Identifiers

Cite

Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, et al. Generative Spoken Dialogue Language Modeling. Transactions of the Association for Computational Linguistics, 2023, 11, pp. 250-266. ⟨10.1162/tacl_a_00545⟩. ⟨hal-03985368v2⟩