
Evaluating the reliability of acoustic speech embeddings

Abstract : Speech embeddings are fixed-size acoustic representations of variable-length speech sequences. They are increasingly used for a variety of tasks ranging from information retrieval to unsupervised term discovery and speech segmentation. However, there is currently no clear methodology to compare or optimize the quality of these embeddings in a task-neutral way. Here, we systematically compare two popular metrics, ABX discrimination and Mean Average Precision (MAP), on 5 languages across 17 embedding methods, ranging from supervised to fully unsupervised, and using different loss functions (autoencoders, correspondence autoencoders, siamese). Then we use ABX and MAP to predict performance on a new downstream task: the unsupervised estimation of the frequencies of speech segments in a given corpus. We find that overall, ABX and MAP correlate with one another and with frequency estimation. However, substantial discrepancies appear in the fine-grained distinctions across languages and/or embedding methods. This makes it unrealistic at present to propose a task-independent silver-bullet method for computing the intrinsic quality of speech embeddings. There is a need for more detailed analysis of the metrics currently used to evaluate such embeddings.
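To make the two metrics named in the abstract concrete, here is a minimal sketch of how an ABX error rate and a Mean Average Precision might be computed over a small set of labelled embeddings. It is an illustration under stated assumptions, not the evaluation protocol of the paper: the use of cosine distance, the function names `abx_error` and `mean_average_precision`, and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def cosine_distances(X):
    """Pairwise cosine distance matrix for the row vectors of X (assumed metric)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - Xn @ Xn.T

def abx_error(X, labels):
    """Fraction of (A, B, X) triplets, with X sharing A's label and B not,
    in which X is farther from A than from B. Ties count as half an error."""
    D = cosine_distances(X)
    labels = np.asarray(labels)
    n = len(labels)
    errors, total = 0.0, 0
    for a in range(n):
        for x in range(n):
            if x == a or labels[x] != labels[a]:
                continue
            for b in range(n):
                if labels[b] == labels[a]:
                    continue
                total += 1
                if D[a, x] > D[b, x]:
                    errors += 1.0
                elif D[a, x] == D[b, x]:
                    errors += 0.5
    return errors / total

def mean_average_precision(X, labels):
    """Retrieval-style MAP: each item queries all others, ranked by distance;
    items with the same label as the query are the relevant ones."""
    D = cosine_distances(X)
    labels = np.asarray(labels)
    n = len(labels)
    average_precisions = []
    for q in range(n):
        others = np.array([i for i in range(n) if i != q])
        order = others[np.argsort(D[q, others], kind="stable")]
        relevant = (labels[order] == labels[q]).astype(float)
        if relevant.sum() == 0:
            continue  # query has no same-label item to retrieve
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(order) + 1)
        average_precisions.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(average_precisions))

# Toy data (hypothetical): two well-separated classes of 2-D "embeddings".
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["a", "a", "b", "b"]
print(abx_error(X, labels))               # 0.0
print(mean_average_precision(X, labels))  # 1.0
```

On this cleanly separated toy set both metrics agree (zero ABX error, perfect MAP). The abstract's point is that on real embeddings the two can rank methods differently at a fine-grained level, which a sketch like this can help explore by swapping in other embeddings and labels.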
Document type :
Conference papers

Cited literature [35 references]
Contributor : Benoît Sagot
Submitted on : Sunday, October 25, 2020 - 12:58:25 PM
Last modification on : Thursday, October 27, 2022 - 4:02:50 AM
Long-term archiving on : Tuesday, January 26, 2021 - 6:06:27 PM


Files produced by the author(s)


  • HAL Id : hal-02977539, version 1


Robin Algayres, Mohamed Salah Zaiem, Benoît Sagot, Emmanuel Dupoux. Evaluating the reliability of acoustic speech embeddings. INTERSPEECH 2020 - Annual Conference of the International Speech Communication Association, Oct 2020, Shanghai / Virtual, China. ⟨hal-02977539⟩


