Generative Spoken Dialogue Language Modeling

Tu Anh Nguyen; Eugene Kharitonov; Jade Copet; Yossi Adi; Wei-Ning Hsu; Ali Elkahky; Paden Tomasello; Robin Algayres; Benoît Sagot; Abdelrahman Mohamed; Emmanuel Dupoux

Conference paper, Year: 2023

Tu Anh Nguyen

Role: Author
PersonId: 1284418
IdHAL: ntuanh

Automatic Language Modelling and ANAlysis & Computational Humanities

Meta AI Research [Paris]

Eugene Kharitonov

Role: Author

Meta AI

Jade Copet

Role: Author

Meta AI

Yossi Adi

Role: Author

Meta AI

Wei-Ning Hsu

Role: Author

Meta AI

Ali Elkahky

Role: Author

Meta AI

Paden Tomasello

Role: Author

Meta AI

Robin Algayres

Role: Author

Meta AI

Benoît Sagot

Role: Author
PersonId: 1461
IdHAL: bsagot
ORCID: 0000-0002-0107-8526
IdRef: 177454229

Automatic Language Modelling and ANAlysis & Computational Humanities

Abdelrahman Mohamed

Role: Author

Meta AI

Emmanuel Dupoux

Role: Author

Meta AI

Département d'Etudes Cognitives - ENS Paris

Laboratoire de sciences cognitives et psycholinguistique

École des hautes études en sciences sociales

Apprentissage machine et développement cognitif

Abstract

We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It builds on recent work on unsupervised spoken unit discovery, coupled with a dual-tower transformer architecture with cross-attention, trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. We show that our model is able to generate speech, laughter, and other paralinguistic signals in the two channels simultaneously, and that it reproduces more naturalistic and fluid turn-taking than a text-based cascaded model.

Domains

Linguistics

Main file

2203.16502.pdf (2.28 MB)

Origin: Files produced by the author(s)

Contributor: Sabrina Zermani

https://inria.hal.science/hal-03985368

Submitted on: Monday, February 13, 2023, 11:24:30

Last modified on: Friday, April 19, 2024, 16:18:55

Long-term archiving on: Sunday, May 14, 2023, 19:23:04

Dates and versions

hal-03985368, version 1 (13-02-2023)

hal-03985368, version 2 (23-01-2024)

License

Attribution

Identifiers

HAL Id: hal-03985368, version 1

Cite

Tu Anh Nguyen, Eugene Kharitonov, Jade Copet, Yossi Adi, Wei-Ning Hsu, et al. Generative Spoken Dialogue Language Modeling. SLT-2022 - IEEE Spoken Language Technology Workshop, Jan 2023, Doha, Qatar. ⟨hal-03985368v1⟩
