
Limitations of weak labels for embedding and tagging

Nicolas Turpault¹, Romain Serizel¹, Emmanuel Vincent¹
¹ MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication, Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : While many datasets and approaches in ambient sound analysis use weakly labeled data, the impact of weak labels on performance, compared to strong labels, remains unclear. Weakly labeled data is usually used because annotating all data with strong labels is too expensive, and because for some use cases strong labels are not guaranteed to give better results. Moreover, weak labels are usually entangled with various other challenges, such as multi-label annotations, unbalanced classes, and overlapping events. In this paper, we formulate a supervised problem that involves weak labels. We create a dataset that focuses on the difference between strong and weak labels. We investigate the impact of weak labels when training an embedding or an end-to-end classifier. Different experimental scenarios are discussed to give insights into which types of applications are most sensitive to weakly labeled data.
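To make the strong/weak distinction concrete: a strong label gives each event's onset and offset within a clip, while a weak label only records which classes occur somewhere in the clip. The sketch below is illustrative only and is not the authors' dataset or code. The `Event` type, the frame hop, and the helper names `strong_to_frames`, `weak_label` and `clip_from_frames` are hypothetical. It shows that a weak label amounts to max-pooling the frame-level (strong) targets over time, so clips with different event timings can share the same weak label.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical strong annotation: one sound event with its time boundaries."""
    label: str
    onset: float   # seconds
    offset: float  # seconds


def strong_to_frames(events, classes, clip_duration, frame_hop):
    """Frame-level target: frames[t][c] = 1 if class c is active during frame t."""
    n_frames = int(round(clip_duration / frame_hop))
    frames = [[0] * len(classes) for _ in range(n_frames)]
    for ev in events:
        c = classes.index(ev.label)
        start = math.floor(ev.onset / frame_hop)
        stop = min(n_frames, math.ceil(ev.offset / frame_hop))
        for t in range(start, stop):
            frames[t][c] = 1
    return frames


def weak_label(events, classes):
    """Clip-level target: 1 if the class occurs anywhere in the clip, timing discarded."""
    present = {ev.label for ev in events}
    return [1 if c in present else 0 for c in classes]


def clip_from_frames(frames):
    """Max-pool frame-level targets over time, one value per class."""
    return [max(column) for column in zip(*frames)]


classes = ["dog", "speech", "alarm"]
events = [Event("dog", 0.5, 1.5), Event("speech", 1.0, 3.0)]
frames = strong_to_frames(events, classes, clip_duration=4.0, frame_hop=0.5)
print(weak_label(events, classes))   # [1, 1, 0]
print(clip_from_frames(frames))      # [1, 1, 0]
```

Moving the dog bark to the end of the clip changes the frame-level target but leaves the weak label `[1, 1, 0]` unchanged. A model trained only on weak labels never sees this timing information.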

Cited literature : 30 references

https://hal.inria.fr/hal-02467401
Contributor : Nicolas Turpault
Submitted on : Friday, February 7, 2020 - 10:08:58 AM
Last modification on : Wednesday, March 18, 2020 - 3:32:53 PM

Files

icassp2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02467401, version 2
  • ARXIV : 2002.01687

Citation

Nicolas Turpault, Romain Serizel, Emmanuel Vincent. Limitations of weak labels for embedding and tagging. ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02467401v2⟩


Metrics

Record views : 65
Files downloads : 142