Journal Article — Transactions of the Association for Computational Linguistics, 2023

Generative Spoken Dialogue Language Modeling

Abstract

We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It combines recent work on unsupervised spoken unit discovery with a dual-tower transformer architecture with cross-attention, trained on 2,000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model generates speech, laughter, and other paralinguistic signals in the two channels simultaneously and reproduces more naturalistic and fluid turn-taking than a text-based cascaded model.
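To illustrate the dual-tower idea described in the abstract, the sketch below shows a minimal PyTorch layout in which each channel's tower self-attends over its own discrete speech units and cross-attends to the other speaker's channel. This is an illustrative assumption-based sketch, not the authors' implementation: the class names, layer sizes, unit vocabulary, and the weight sharing between towers are hypothetical, and the duration/pitch prediction and vocoder stages of the full system are omitted.

```python
import torch
import torch.nn as nn

class CrossAttentionLayer(nn.Module):
    """One tower layer: causal self-attention over its own channel,
    then cross-attention over the other speaker's channel."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, other):
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
        h = self.norms[0](x)
        x = x + self.self_attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        h = self.norms[1](x)
        x = x + self.cross_attn(h, other, other, need_weights=False)[0]
        return x + self.ffn(self.norms[2](x))

class DualTowerLM(nn.Module):
    """Two weight-shared towers, one per audio channel, each predicting the
    next discrete unit of its own channel while attending to the other."""
    def __init__(self, n_units=500, d_model=512, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(n_units, d_model)
        self.layers = nn.ModuleList(CrossAttentionLayer(d_model) for _ in range(n_layers))
        self.head = nn.Linear(d_model, n_units)

    def forward(self, units_a, units_b):
        a, b = self.embed(units_a), self.embed(units_b)
        for layer in self.layers:
            a, b = layer(a, b), layer(b, a)  # each tower cross-attends to the other
        return self.head(a), self.head(b)

# Toy usage: two parallel streams of discrete units (e.g., codes from a
# self-supervised speech encoder), one per speaker channel.
model = DualTowerLM()
units_a = torch.randint(0, 500, (1, 128))
units_b = torch.randint(0, 500, (1, 128))
logits_a, logits_b = model(units_a, units_b)
```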
Main file

2023.tacl-1.15.pdf (785.59 KB)
Origin: Publisher files allowed on an open archive
License: CC BY - Attribution

Dates and versions

hal-03985368 , version 1 (13-02-2023)
hal-03985368 , version 2 (23-01-2024)

License

Attribution (CC BY)

Identifiers

Cite

Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, et al. Generative Spoken Dialogue Language Modeling. Transactions of the Association for Computational Linguistics, 2023, 11, pp. 250-266. ⟨10.1162/tacl_a_00545⟩. ⟨hal-03985368v2⟩