Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier - Inria - Institut national de recherche en sciences et technologies du numérique
Preprint, Working Paper. Year: 2015

Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

Abstract

The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the $k$-nearest neighbors ($k$NN) rule in the context of binary classification. We focus on the leave-$p$-out cross-validation (L$p$O) estimator used to assess the performance of the $k$NN classifier. Remarkably, this L$p$O estimator can be computed efficiently in this context using closed-form formulas derived by \cite{CelisseMaryHuard11}. We describe a general strategy to derive moment and exponential concentration inequalities for the L$p$O estimator applied to the $k$NN classifier. Such results are obtained first by exploiting the connection between the L$p$O estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L$1$O estimator. Another important contribution is the derivation of new quantifications of the discrepancy between the L$p$O estimator and the classification error/risk of the $k$NN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.
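To make the object of study concrete, the following is a minimal brute-force sketch of the L$p$O risk estimator for a $k$NN classifier with binary labels. It is not the closed-form computation referred to in the abstract: it simply enumerates every subset of $p$ held-out points, which is the costly procedure that closed-form formulas avoid. The function names `knn_predict` and `lpo_risk`, the Euclidean distance, and the tie-breaking rule are assumptions made for illustration only.

```python
from itertools import combinations
import math


def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote among its k nearest
    training points (Euclidean distance). Assumption: a tied vote
    predicts 0, and distance ties keep the training order."""
    neighbors = sorted(train, key=lambda point: math.dist(point[0], x))[:k]
    votes_for_one = sum(label for _, label in neighbors)
    return 1 if votes_for_one > k / 2 else 0


def lpo_risk(data, k, p):
    """Brute-force leave-p-out estimate of the kNN misclassification risk.

    data is a list of (features, label) pairs, with features a tuple of
    floats and label in {0, 1}. Every subset of p points is held out once;
    the classifier is built on the remaining n - p points and evaluated on
    the held-out ones. The result is the average error over all splits.
    """
    n = len(data)
    if not 1 <= p <= n - k:
        raise ValueError("need 1 <= p <= n - k so that k neighbors remain")
    total_errors = 0
    n_splits = 0
    for test_idx in combinations(range(n), p):
        held_out = set(test_idx)
        train = [data[i] for i in range(n) if i not in held_out]
        total_errors += sum(
            knn_predict(train, data[i][0], k) != data[i][1] for i in test_idx
        )
        n_splits += 1
    return total_errors / (p * n_splits)


# Toy data: two well-separated clusters on the real line.
toy = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
       ((5.0,), 1), ((5.1,), 1), ((5.2,), 1)]
print(lpo_risk(toy, k=1, p=2))
```

The number of splits is the binomial coefficient $\binom{n}{p}$, so this enumeration is only practical for very small samples.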
Main file
knn_celisse_maryhuard.pdf (562.45 KB)

Dates and versions

hal-01185092, version 1 (19-08-2015)
hal-01185092, version 2 (12-10-2017)

License

Copyright (All rights reserved)

Cite

Alain Celisse, Tristan Mary-Huard. Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier. 2015. ⟨hal-01185092v2⟩
610 views
792 downloads
