Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

Ewan Dunbar; Nicolas Hamilakis; Emmanuel Dupoux

doi:10.1109/jstsp.2022.3206084

Journal article. IEEE Journal of Selected Topics in Signal Processing. Year: 2022

Ewan Dunbar

Role: Author
PersonId : 1143247
ORCID : 0000-0001-9603-953X
IdRef : 243352808

University of Toronto

Apprentissage machine et développement cognitif

Nicolas Hamilakis

Role: Author

École normale supérieure - Paris

Apprentissage machine et développement cognitif

Emmanuel Dupoux

Role: Author

École des hautes études en sciences sociales

Meta AI

Apprentissage machine et développement cognitif

Abstract

Recent progress in self-supervised or unsupervised machine learning has opened the possibility of building a full speech processing system from raw audio without using any textual representations or expert labels such as phonemes, dictionaries or parse trees. The contribution of the Zero Resource Speech Challenge series since 2015 has been to break down this long-term objective into four well-defined tasks (Acoustic Unit Discovery, Spoken Term Discovery, Discrete Resynthesis, and Spoken Language Modeling) and introduce associated metrics and benchmarks enabling model comparison and cumulative progress. We present an overview of the six editions of this challenge series since 2015, discuss the lessons learned, and outline the areas which need more work or give puzzling results.

Keywords

Textless speech processing; Unsupervised and self-supervised learning; Representation learning

Domains

Computation and Language [cs.CL]; Artificial Intelligence [cs.AI]; Linguistics

Main file

JSTSP3206084 (1).pdf (10.34 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-03789716

Submitted on: Tuesday, 27 September 2022, 15:58:14

Last modified on: Friday, 19 April 2024, 16:18:57

Long-term archiving on: Wednesday, 28 December 2022, 19:39:43

Dates and versions

hal-03789716, version 1 (27-09-2022)

Identifiers

HAL Id: hal-03789716, version 1
DOI: 10.1109/jstsp.2022.3206084

Cite

Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux. Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge. IEEE Journal of Selected Topics in Signal Processing, In press, ⟨10.1109/jstsp.2022.3206084⟩. ⟨hal-03789716⟩

Collections

ENS-PARIS CNRS INRIA EHESS LSCP DEC INRIA2 PSL ANR PRAIRIE-IA
