Conference paper, 2020

Unsupervised pretraining transfers well across languages

Abstract

Cross-lingual and multilingual training of Automatic Speech Recognition (ASR) has been extensively investigated in the supervised setting. This assumes the existence of a parallel corpus of speech and orthographic transcriptions. Recently, contrastive predictive coding (CPC) algorithms have been proposed to pretrain ASR systems with unlabelled data. In this work, we investigate whether unsupervised pretraining transfers well across languages. We show that a slight modification of CPC pretraining extracts features that transfer well to other languages, matching or even outperforming supervised pretraining. This shows the potential of unsupervised methods for languages with few linguistic resources.
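For readers unfamiliar with CPC, the sketch below illustrates the general idea of a CPC-style (InfoNCE) contrastive objective: a context vector produced by an autoregressive network is trained to predict future latent frames, scored against negatives drawn from other time steps. This is a minimal PyTorch sketch under assumed dimensions, prediction horizon, and negative sampling; it is not the authors' exact setup nor the modification proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPCLoss(nn.Module):
    """Score predictions of future latent frames against negatives taken
    from other time steps of the same batch (InfoNCE objective)."""
    def __init__(self, context_dim=256, latent_dim=256, horizon=12):
        super().__init__()
        # One linear predictor per future step k = 1..horizon (illustrative).
        self.predictors = nn.ModuleList(
            [nn.Linear(context_dim, latent_dim) for _ in range(horizon)]
        )

    def forward(self, context, latents):
        # context: (batch, time, context_dim) from an autoregressive network
        # latents: (batch, time, latent_dim) from the feature encoder
        batch, time, _ = latents.shape
        losses = []
        for k, predictor in enumerate(self.predictors, start=1):
            if time <= k:
                break
            pred = predictor(context[:, :time - k])   # (B, T-k, D)
            target = latents[:, k:]                   # (B, T-k, D)
            # Dot-product scores of every prediction against every target
            # frame; the diagonal entry (same position) is the positive.
            logits = torch.einsum('btd,bsd->bts', pred, target)
            labels = torch.arange(time - k, device=latents.device)
            labels = labels.unsqueeze(0).expand(batch, -1).reshape(-1)
            losses.append(F.cross_entropy(logits.reshape(-1, time - k), labels))
        return torch.stack(losses).mean()


# Example usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    B, T, D = 4, 128, 256
    loss_fn = CPCLoss(context_dim=D, latent_dim=D, horizon=12)
    loss = loss_fn(torch.randn(B, T, D), torch.randn(B, T, D))
    print(float(loss))
```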
Main file: 2002.02848.pdf (277.9 KB). Origin: files produced by the author(s).

Dates and versions

hal-02959418, version 1 (06-10-2020)

Cite

Morgane Rivière, Armand Joulin, Pierre-Emmanuel Mazaré, Emmanuel Dupoux. Unsupervised pretraining transfers well across languages. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2020, Barcelona / Virtual, Spain. pp.7414-7418, ⟨10.1109/ICASSP40776.2020.9054548⟩. ⟨hal-02959418⟩