
Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

Alain Celisse 1, 2, * Tristan Mary-Huard 3
* Corresponding author
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the $k$-nearest neighbors ($k$NN) rule in the context of binary classification. Here we focus on the leave-$p$-out cross-validation (L$p$O) used to assess the performance of the $k$NN classifier. Remarkably, this L$p$O estimator can be efficiently computed in this context using closed-form formulas derived by \cite{CelisseMaryHuard11}. We describe a general strategy to derive moment and exponential concentration inequalities for the L$p$O estimator applied to the $k$NN classifier. Such results are obtained first by exploiting the connection between the L$p$O estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L$1$O estimator. Another important contribution is the derivation of new quantifications of the discrepancy between the L$p$O estimator and the classification error/risk of the $k$NN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.
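For orientation, the L$p$O estimator named in the abstract can be sketched by brute force: hold out every subset of $p$ points, train the $k$NN rule on the rest, and average the resulting test errors. The Python sketch below is only an illustration under stated assumptions. It enumerates all splits naively (cost growing like $\binom{n}{p}$), so it is not the closed-form computation of \cite{CelisseMaryHuard11}; the function names `knn_predict` and `lpo_error`, the Euclidean distance, and the toy data layout are hypothetical choices, not taken from the paper.

```python
from collections import Counter
from itertools import combinations
import math


def knn_predict(train, query, k):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (point, label) pairs, where each point is a tuple
    of floats. Euclidean distance is an assumption made for this sketch.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def lpo_error(data, k, p):
    """Naive leave-p-out estimate of the kNN classification error.

    Averages the misclassification rate on the held-out points over all
    subsets of size p. Illustrative brute force, not the closed-form formula.
    """
    n = len(data)
    if not 1 <= p <= n - k:
        raise ValueError("need 1 <= p <= n - k so that k neighbors remain")
    total_errors = 0
    n_splits = 0
    for held_out in combinations(range(n), p):
        held = set(held_out)
        train = [data[i] for i in range(n) if i not in held]
        total_errors += sum(
            knn_predict(train, data[i][0], k) != data[i][1] for i in held_out
        )
        n_splits += 1
    return total_errors / (p * n_splits)
```

With `p = 1` this reduces to the usual leave-one-out estimate (the L$1$O estimator in the abstract). Using an odd `k` with binary labels avoids ties in the majority vote.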
Document type :
Preprints, Working Papers, ...

https://hal.inria.fr/hal-01185092
Contributor : Alain Celisse
Submitted on : Thursday, October 12, 2017 - 12:51:27 PM
Last modification on : Wednesday, March 23, 2022 - 3:51:06 PM

Licence

Copyright

Identifiers

  • HAL Id : hal-01185092, version 2
  • ARXIV : 1508.04905

Citation

Alain Celisse, Tristan Mary-Huard. Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier. 2015. ⟨hal-01185092v2⟩
