Conference paper. Year: 2021

Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution

Abstract

This paper presents a novel image dataset with high intrinsic ambiguity and a long-tailed distribution, built from the database of the Pl@ntNet citizen observatory. It consists of 306,146 plant images covering 1,081 species. We highlight two particular features of the dataset, inherent to the way the images are acquired and to the intrinsic diversity of plant morphology: (i) the dataset has a strong class imbalance, i.e., a few species account for most of the images, and (ii) many species are visually similar, making identification difficult even for the expert eye. These two characteristics make the dataset well suited to the evaluation of set-valued classification methods and algorithms. We therefore recommend two set-valued evaluation metrics for the dataset (macro-average top-k accuracy and macro-average average-k accuracy), and we provide baseline results established by training deep neural networks with the cross-entropy loss.
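The two recommended metrics can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the function names, the toy scores, and the handling of ties are assumptions made for illustration. In particular, average-k accuracy is read here as thresholding all scores with one global threshold chosen so that the predicted sets contain k classes on average, which is one common formulation and may differ in detail from the paper's definition. Macro-averaging computes the accuracy separately for each class present in the labels and then takes the unweighted mean, so rare species weigh as much as frequent ones.

```python
from collections import defaultdict


def macro_topk_accuracy(scores, labels, k):
    """Unweighted mean over classes of the per-class top-k accuracy."""
    hits, counts = defaultdict(int), defaultdict(int)
    for row, y in zip(scores, labels):
        # Indices of the k highest-scoring classes for this example.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        counts[y] += 1
        hits[y] += int(y in topk)
    # Only classes that appear in `labels` contribute to the mean.
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


def macro_averagek_accuracy(scores, labels, k):
    """Macro-average accuracy of sets built with one global score threshold.

    Assumption: the threshold is the (k * n)-th largest score overall, so the
    sets hold k classes on average (possibly more when scores are tied).
    """
    n = len(scores)
    flat = sorted((s for row in scores for s in row), reverse=True)
    threshold = flat[k * n - 1]
    hits, counts = defaultdict(int), defaultdict(int)
    for row, y in zip(scores, labels):
        counts[y] += 1
        # The true class is in the predicted set if its score passes the threshold.
        hits[y] += int(row[y] >= threshold)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)


# Hypothetical toy data: class 0 is frequent, class 1 is rare and always ranked second.
scores = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.6, 0.3, 0.1]]
labels = [0, 0, 0, 1]
print(macro_topk_accuracy(scores, labels, 1))      # 0.5
print(macro_topk_accuracy(scores, labels, 2))      # 1.0
print(macro_averagek_accuracy(scores, labels, 1))  # 0.5
```

On this toy data the plain top-1 accuracy would be 0.75, while the macro-average is 0.5 because the single rare-class example is missed. This is the effect that macro-averaging is meant to expose on a long-tailed dataset.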
Main file: supplementary.pdf (3.4 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03474556, version 1 (10 December 2021)
hal-03474556, version 2 (9 February 2022)

License

Attribution

Cite

Camille Garcin, Alexis Joly, Pierre Bonnet, Jean-Christophe Lombardo, Antoine Affouard, et al. Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution. NeurIPS 2021 - 35th Conference on Neural Information Processing Systems, Dec 2021, Virtual Conference, France. ⟨10.5281/zenodo.5645731⟩. ⟨hal-03474556v2⟩