Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training? - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 2023

Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?

Abstract

The research community has produced many successful self-supervised speech representation learning methods over the past few years. Discrete units have been utilized in various self-supervised learning frameworks, such as VQ-VAE [1], wav2vec 2.0 [2], HuBERT [3], and Wav2Seq [4]. This paper studies the impact of altering the granularity and improving the quality of these discrete acoustic units for pre-training encoder-only and encoder-decoder models. We systematically study the current proposals of using Byte-Pair Encoding (BPE) and new extensions that use cluster smoothing and Brown clustering. The quality of learned units is studied intrinsically using zero-speech metrics and on the downstream speech recognition (ASR) task. Our results suggest that longer-range units are helpful for encoder-decoder pre-training; however, encoder-only masked-prediction models cannot yet benefit from self-supervised word-like targets.
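For context, the BPE step the abstract refers to can be sketched as follows: treat each utterance's frame-level cluster IDs as a token sequence and iteratively merge the most frequent adjacent pair into a new, coarser unit. The sketch below is illustrative only, not the paper's implementation; the function name and details (e.g. integer unit IDs, no deduplication of repeated frames) are assumptions.

```python
from collections import Counter

def bpe_merges(sequences, num_merges):
    """Learn BPE merges over sequences of discrete unit IDs.

    Illustrative sketch: each merge replaces the most frequent
    adjacent pair of units with a fresh, coarser unit ID.
    """
    seqs = [list(s) for s in sequences]
    merges = []
    next_id = max(u for s in seqs for u in s) + 1
    for _ in range(num_merges):
        # Count adjacent pairs across all sequences.
        pairs = Counter()
        for s in seqs:
            for a, b in zip(s, s[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = pairs.most_common(1)[0][0]
        merges.append((best, next_id))
        # Replace every occurrence of the best pair with the new unit.
        for i, s in enumerate(seqs):
            out, j = [], 0
            while j < len(s):
                if j < len(s) - 1 and (s[j], s[j + 1]) == best:
                    out.append(next_id)
                    j += 2
                else:
                    out.append(s[j])
                    j += 1
            seqs[i] = out
        next_id += 1
    return merges, seqs

# Example: the pair (1, 2) is most frequent, so it becomes unit 4.
merges, coarse = bpe_merges([[1, 2, 1, 2, 3], [1, 2, 3]], num_merges=1)
```

Applied to cluster sequences (e.g. HuBERT k-means units), each merge yields longer-range targets of the kind the paper evaluates for pre-training.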
Main file: Do_Coarser_Units_Benefit_Cluster_Prediction-Based_Speech_Pre-Training.pdf (860.59 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-04208427 , version 1 (15-09-2023)

Identifiers

Cite

Ali Elkahky, Wei-Ning Hsu, Paden Tomasello, Tu Anh Nguyen, Robin Algayres, et al.. Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), Jun 2023, Ixia-Ialyssos, Greece. ⟨10.1109/ICASSP49357.2023.10096788⟩. ⟨hal-04208427⟩
37 Views
69 Downloads
