On Tail Decay Rate Estimation of Loss Function Distributions - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article · Journal of Machine Learning Research · Year: 2022

On Tail Decay Rate Estimation of Loss Function Distributions

Abstract

The study of loss function distributions is critical to characterizing a model's behaviour on a given machine learning problem. For example, while the quality of a model is commonly determined by the average loss assessed on a testing set, this quantity does not reflect the existence of the true mean of the loss distribution. Indeed, the finiteness of the statistical moments of the loss distribution is related to the thickness of its tails, which are generally unknown. Since typical cross-validation schemes determine a family of testing loss distributions conditioned on the training samples, the total loss distribution must be recovered by marginalizing over the space of training sets. As we show in this work, the finiteness of the sampling procedure negatively affects the reliability and efficiency of classical tail estimation methods from Extreme Value Theory, such as the Peaks-Over-Threshold approach. In this work, we tackle this issue by developing a novel general theory for estimating the tails of marginal distributions when there is large variability between the locations of the individual conditional distributions underlying the marginal. To this end, we demonstrate that, under some regularity conditions, the shape parameter of the marginal distribution is the maximum tail shape parameter of the family of conditional distributions. We term this estimation approach cross-tail estimation (CTE). We test cross-tail estimation in a series of experiments on simulated and real data, showing the improved robustness and quality of tail estimation as compared to classical approaches, and providing evidence for the relationship between model performance and loss distribution tail thickness.
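
The maximum rule stated in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it is a naive reading of the rule that estimates the tail shape of each conditional sample separately and keeps the largest value. Several choices are assumptions made only for illustration: the conditional losses are simulated as shifted Pareto samples, the Pickands estimator is swapped in for a Peaks-Over-Threshold fit because it needs only the standard library and is invariant to location shifts, and the names `pickands_shape`, `simulate_conditional_losses`, and `cross_tail_estimate`, as well as the shapes, locations, sample size, and `k`, are hypothetical.

```python
import math
import random


def pickands_shape(sample, k):
    """Pickands estimate of the tail shape parameter from upper order statistics.

    Only differences of order statistics are used, so shifting the whole
    sample by a constant location leaves the estimate unchanged.
    """
    xs = sorted(sample, reverse=True)
    if k < 1 or 4 * k > len(xs):
        raise ValueError("need 1 <= k and 4 * k <= len(sample)")
    # k-th, 2k-th and 4k-th largest observations
    top, mid, low = xs[k - 1], xs[2 * k - 1], xs[4 * k - 1]
    return math.log((top - mid) / (mid - low)) / math.log(2.0)


def simulate_conditional_losses(rng, shape, location, n):
    """Toy conditional loss sample (assumption): shifted Pareto with tail shape `shape`."""
    # 1 - rng.random() lies in (0, 1], so the power is always finite.
    return [location + (1.0 - rng.random()) ** (-shape) for _ in range(n)]


def cross_tail_estimate(conditional_samples, k):
    """Estimate each conditional tail separately and return the maximum estimate."""
    per_sample = [pickands_shape(s, k) for s in conditional_samples]
    return max(per_sample), per_sample


rng = random.Random(0)
true_shapes = [0.1, 0.3, 0.7]    # hypothetical conditional tail shapes
locations = [50.0, 25.0, 0.0]    # widely separated conditional locations
samples = [
    simulate_conditional_losses(rng, shape, loc, 40_000)
    for shape, loc in zip(true_shapes, locations)
]

cte_shape, per_sample = cross_tail_estimate(samples, k=2_500)

# Classical alternative: pool all conditional samples and estimate once.
pooled = [x for s in samples for x in s]
pooled_shape = pickands_shape(pooled, k=2_500)

print("per-conditional estimates:", [round(v, 3) for v in per_sample])
print("cross-tail (max) estimate:", round(cte_shape, 3))
print("pooled-sample estimate:", round(pooled_shape, 3))
```

In this toy configuration the heaviest-tailed conditional has the smallest location, so the upper order statistics of the pooled sample come mostly from the thin-tailed conditional with the largest location, and the pooled estimate is expected to fall well below the maximum of the per-conditional estimates. On real loss data the choice of `k` (equivalently, the threshold) matters and would need to be tuned.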
Main file
Tails_Presub_JMLR__Final_Submission_.pdf (2.9 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03911884, version 1 (23-12-2022)
hal-03911884, version 2 (23-12-2023)

Identifiers

  • HAL Id: hal-03911884, version 1

Cite

Etrit Haxholli, Marco Lorenzi. On Tail Decay Rate Estimation of Loss Function Distributions. Journal of Machine Learning Research, In press. ⟨hal-03911884v1⟩