Conference papers

Limitations of weak labels for embedding and tagging

Nicolas Turpault (1), Romain Serizel (1), Emmanuel Vincent (1)
(1) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Many datasets and approaches in ambient sound analysis use weakly labeled data. Weak labels are employed because annotating every data sample with a strong label is too expensive. Yet their impact on performance, compared with strong labels, remains unclear. Indeed, weak labels must often be dealt with at the same time as other challenges, namely multiple labels per sample, unbalanced classes, and/or overlapping events. In this paper, we formulate a supervised learning problem that involves weak labels. We create a dataset that focuses on the difference between strong and weak labels rather than on these other challenges. We investigate the impact of weak labels when training an embedding or an end-to-end classifier. Different experimental scenarios are discussed to provide insights into which applications are most sensitive to weakly labeled data.

https://hal.inria.fr/hal-02467401
Contributor : Nicolas Turpault
Submitted on : Monday, December 7, 2020 - 11:06:08 AM
Last modification on : Monday, December 14, 2020 - 5:52:12 PM

Files

icassp2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02467401, version 4
  • ARXIV : 2002.01687

Citation

Nicolas Turpault, Romain Serizel, Emmanuel Vincent. Limitations of weak labels for embedding and tagging. ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02467401v4⟩
