Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

Alain Celisse 1, 2, *, Tristan Mary-Huard 3
* Corresponding author
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, METRICS - Evaluation des technologies de santé et des pratiques médicales - ULR 2694, Polytech Lille - École polytechnique universitaire de Lille, Université de Lille, Sciences et Technologies
Abstract : The present work aims at deriving theoretical guarantees on the behavior of some cross-validation procedures applied to the $k$-nearest neighbors ($k$NN) rule in the context of binary classification. Here we focus on the leave-$p$-out cross-validation (L$p$O) used to assess the performance of the $k$NN classifier. Remarkably, this L$p$O estimator can be efficiently computed in this context using closed-form formulas derived by Celisse and Mary-Huard (2011). We describe a general strategy to derive moment and exponential concentration inequalities for the L$p$O estimator applied to the $k$NN classifier. Such results are obtained first by exploiting the connection between the L$p$O estimator and U-statistics, and second by making intensive use of the generalized Efron-Stein inequality applied to the L$1$O estimator. Another important contribution is the derivation of new quantifications of the discrepancy between the L$p$O estimator and the classification error/risk of the $k$NN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.
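For readers unfamiliar with the L$p$O estimator, the sketch below shows what it computes for a $k$NN classifier by brute-force enumeration of all held-out subsets of size $p$. It is only an illustration of the definition, not the closed-form formulas mentioned in the abstract, and its cost grows combinatorially with $n$ and $p$. The function names (`knn_predict`, `lpo_error`), the one-dimensional features, and the tie-breaking rules are assumptions made for this example.

```python
from itertools import combinations


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points.

    `train` is a list of (feature, label) pairs with scalar features and
    labels in {0, 1}. Distance ties are broken by position in `train` and
    vote ties go to label 0; both rules are arbitrary choices for this sketch.
    """
    order = sorted(range(len(train)), key=lambda i: (abs(train[i][0] - x), i))
    votes = sum(train[i][1] for i in order[:k])
    return 1 if 2 * votes > k else 0


def lpo_error(data, k, p):
    """Brute-force leave-p-out estimate of the kNN misclassification rate.

    For every subset of p points, train on the remaining n - p points,
    compute the error rate on the p held-out points, and average these
    rates over all subsets. Requires k <= n - p.
    """
    n = len(data)
    if not 1 <= p < n or k > n - p:
        raise ValueError("need 1 <= p < n and k <= n - p")
    total, count = 0.0, 0
    for held_out in combinations(range(n), p):
        held = set(held_out)
        train = [data[i] for i in range(n) if i not in held]
        errors = sum(
            knn_predict(train, data[i][0], k) != data[i][1] for i in held_out
        )
        total += errors / p
        count += 1
    return total / count


# Toy sample (hypothetical data): the last point carries a label that
# disagrees with its neighbourhood.
sample = [(0.0, 0), (1.0, 0), (2.2, 1), (3.0, 1), (0.4, 1)]
loo = lpo_error(sample, k=1, p=1)   # leave-one-out is the case p = 1
l2o = lpo_error(sample, k=1, p=2)   # leave-two-out
```

Setting `p=1` recovers the usual leave-one-out estimator, which is the L$1$O case referred to in the abstract.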
Document type :
Preprints, Working Papers, ...
Contributor : Alain Celisse
Submitted on : Thursday, October 12, 2017 - 12:51:27 PM
Last modification on : Friday, November 27, 2020 - 2:18:02 PM




  • HAL Id : hal-01185092, version 2
  • ARXIV : 1508.04905



Alain Celisse, Tristan Mary-Huard. Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier. 2015. ⟨hal-01185092v2⟩


