Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier

Alain Celisse 1, 2, * Tristan Mary-Huard 3
* Auteur correspondant
1 MODAL - MOdel for Data Analysis and Learning
Inria Lille - Nord Europe, LPP - Laboratoire Paul Painlevé - UMR 8524, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille, Université de Lille 1, IUT’A
Abstract : The present work aims at deriving theoretical guaranties on the behavior of some cross-validation procedures applied to the $k$-nearest neighbors ($k$NN) rule in the context of binary classification. Here we focus on the leave-$p$-out cross-validation (L$p$O) used to assess the performance of the $k$NN classifier. Remarkably this L$p$O estimator can be efficiently computed in this context using closed-form formulas derived by \cite{CelisseMaryHuard11}. We describe a general strategy to derive moment and exponential concentration inequalities for the L$p$O estimator applied to the $k$NN classifier. Such results are obtained first by exploiting the connection between the L$p$O estimator and U-statistics, and second by making an intensive use of the generalized Efron-Stein inequality applied to the L$1$O estimator. One other important contribution is made by deriving new quantifications of the discrepancy between the L$p$O estimator and the classification error/risk of the $k$NN classifier. The optimality of these bounds is discussed by means of several lower bounds as well as simulation experiments.
Type de document :
Pré-publication, Document de travail
2015
Liste complète des métadonnées

https://hal.inria.fr/hal-01185092
Contributeur : Alain Celisse <>
Soumis le : jeudi 12 octobre 2017 - 12:51:27
Dernière modification le : jeudi 11 janvier 2018 - 06:23:19

Licence


Copyright (Tous droits réservés)

Identifiants

  • HAL Id : hal-01185092, version 2
  • ARXIV : 1508.04905

Citation

Alain Celisse, Tristan Mary-Huard. Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier . 2015. 〈hal-01185092v2〉

Partager

Métriques

Consultations de la notice

60

Téléchargements de fichiers

16