Preprints, Working Papers, ... Year: 2022

Identify ambiguous tasks combining crowdsourced labels by weighting Areas Under the Margin

Abstract

In supervised learning, for instance in image classification, modern massive datasets are commonly labeled by a crowd of workers. The labels obtained in this crowdsourcing setting are then aggregated for training. The aggregation step generally leverages a per-worker trust score. Yet, such worker-centric approaches discard each task's ambiguity. Some intrinsically ambiguous tasks might even fool expert workers, which could eventually be harmful to the learning step. In a standard supervised learning setting, with one label per task, the Area Under the Margin (AUM) is tailored to identify mislabeled data. We adapt the AUM to identify ambiguous tasks in crowdsourced learning scenarios, introducing the Weighted AUM (WAUM). The WAUM is an average of AUMs weighted by task-dependent scores. We show that the WAUM can help discard ambiguous tasks from the training set, leading to better generalization or calibration performance. We report improvements over existing strategies for learning from a crowd, both in simulated settings and on the CIFAR-10H, LabelMe and Music crowdsourced datasets.
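To make the quantity described above more concrete, here is a minimal sketch (Python with NumPy) of how an AUM and a task-level WAUM could be computed. It assumes the standard AUM margin (assigned-label logit minus the largest other logit, averaged over training epochs); the `trust` weights below are purely hypothetical placeholders for the task-dependent scores used in the paper, not the authors' actual weighting scheme.

```python
import numpy as np

def aum(margins):
    """Area Under the Margin: average over training epochs of the margin
    (logit of the assigned label minus the largest other logit)."""
    return float(np.mean(margins))

def waum(per_worker_margins, weights):
    """Weighted AUM for one task: weighted average, over the workers who
    labeled the task, of the AUM computed with each worker's label.
    `weights` are task-dependent trust scores (hypothetical values here)."""
    aums = np.array([aum(m) for m in per_worker_margins])
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * aums) / np.sum(w))

# Toy example: 3 workers labeled the same task; margins recorded over 4 epochs.
margins_per_worker = [
    [0.8, 1.1, 1.3, 1.5],    # label the network fits increasingly well
    [0.2, 0.1, -0.1, -0.3],  # label the network drifts away from
    [-0.5, -0.6, -0.9, -1.0],
]
trust = [0.7, 0.2, 0.1]      # hypothetical task-dependent trust scores
print(waum(margins_per_worker, trust))
```

A low WAUM then flags a task whose crowdsourced labels the network struggles to fit, i.e., a candidate ambiguous task to prune from the training set.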

Dates and versions

hal-03812716, version 1 (12-10-2022)


Cite

Tanguy Lefort, Benjamin Charlier, Alexis Joly, Joseph Salmon. Identify ambiguous tasks combining crowdsourced labels by weighting Areas Under the Margin. 2022. ⟨hal-03812716⟩