
Evaluating the reliability of acoustic speech embeddings

Abstract : Speech embeddings are fixed-size acoustic representations of variable-length speech sequences. They are increasingly used for a variety of tasks ranging from information retrieval to unsupervised term discovery and speech segmentation. However, there is currently no clear methodology to compare or optimize the quality of these embeddings in a task-neutral way. Here, we systematically compare two popular metrics, ABX discrimination and Mean Average Precision (MAP), on 5 languages across 17 embedding methods, ranging from supervised to fully unsupervised, and using different loss functions (autoencoders, correspondence autoencoders, siamese). Then we use the ABX and MAP to predict performance on a new downstream task: the unsupervised estimation of the frequencies of speech segments in a given corpus. We find that overall, ABX and MAP correlate with one another and with frequency estimation. However, substantial discrepancies appear in the fine-grained distinctions across languages and/or embedding methods. This makes it unrealistic at present to propose a task-independent silver bullet method for computing the intrinsic quality of speech embeddings. There is a need for more detailed analysis of the metrics currently used to evaluate such embeddings.
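As a rough illustration of the two metrics named in the abstract, the sketch below computes an ABX error rate and a Mean Average Precision over a handful of labelled embeddings. It is a simplified reading of both metrics, not the evaluation protocol of the paper: the cosine distance, the function names (`abx_error`, `mean_average_precision`), the tie handling, and the toy vectors are all assumptions made for this example.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed distance choice)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def abx_error(embeddings, labels):
    """Fraction of (A, B, X) triplets in which X is not closer to A than to B.

    A and X share a label, B has a different label. Ties count as half an
    error (an assumption of this sketch).
    """
    errors, total = 0.0, 0
    n = len(embeddings)
    for a in range(n):
        for x in range(n):
            if a == x or labels[a] != labels[x]:
                continue
            for b in range(n):
                if labels[b] == labels[a]:
                    continue
                d_ax = cosine_distance(embeddings[a], embeddings[x])
                d_bx = cosine_distance(embeddings[b], embeddings[x])
                if d_ax > d_bx:
                    errors += 1.0
                elif d_ax == d_bx:
                    errors += 0.5
                total += 1
    return errors / total


def mean_average_precision(embeddings, labels):
    """Mean over queries of the average precision of same-label retrieval."""
    ap_values = []
    n = len(embeddings)
    for q in range(n):
        others = [i for i in range(n) if i != q]
        n_relevant = sum(1 for i in others if labels[i] == labels[q])
        if n_relevant == 0:
            continue  # a query with no same-label item has no defined AP
        ranked = sorted(
            others, key=lambda i: cosine_distance(embeddings[q], embeddings[i])
        )
        hits, precision_sum = 0, 0.0
        for rank, i in enumerate(ranked, start=1):
            if labels[i] == labels[q]:
                hits += 1
                precision_sum += hits / rank
        ap_values.append(precision_sum / n_relevant)
    return sum(ap_values) / len(ap_values)


# Toy data (hypothetical): two well-separated word types.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["cat", "cat", "dog", "dog"]
print(abx_error(embeddings, labels))               # lower is better
print(mean_average_precision(embeddings, labels))  # higher is better
```

Both functions take the same inputs, so the two scores can be computed side by side for the same set of embeddings and then compared across embedding methods.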
Document type :
Conference papers

Cited literature [35 references]

https://hal.inria.fr/hal-02977539
Contributor : Benoît Sagot
Submitted on : Sunday, October 25, 2020 - 12:58:25 PM
Last modification on : Tuesday, October 27, 2020 - 11:02:41 AM

File

Thu-3-2-6.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02977539, version 1

Citation

Robin Algayres, Mohamed Zaiem, Benoît Sagot, Emmanuel Dupoux. Evaluating the reliability of acoustic speech embeddings. INTERSPEECH 2020 - Annual Conference of the International Speech Communication Association, Oct 2020, Shanghai / Virtual, China. ⟨hal-02977539⟩
