Conference papers

Limitations of weak labels for embedding and tagging

Nicolas Turpault 1 Romain Serizel 1 Emmanuel Vincent 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : While many datasets and approaches in ambient sound analysis use weakly labeled data, the impact of weak labels on performance, compared with strong labels, remains unclear. Weakly labeled data is usually used because annotating all data with strong labels is too expensive, and for some use cases strong labels are not guaranteed to give better results. Moreover, weak labels are usually mixed with various other challenges such as multilabels, unbalanced classes, and overlapping events. In this paper, we formulate a supervised problem which involves weak labels. We create a dataset that focuses on the difference between strong and weak labels. We investigate the impact of weak labels when training an embedding or an end-to-end classifier. Different experimental scenarios are discussed to give insights into which types of applications are most sensitive to weakly labeled data.

Contributor : Nicolas Turpault
Submitted on : Thursday, April 30, 2020 - 8:07:25 PM
Last modification on : Tuesday, May 5, 2020 - 1:34:24 AM




  • HAL Id : hal-02467401, version 3
  • ARXIV : 2002.01687



Nicolas Turpault, Romain Serizel, Emmanuel Vincent. Limitations of weak labels for embedding and tagging. ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02467401v3⟩


