Preprint, Working Paper. Year: 2015

New upper bounds on cross-validation for the k-Nearest Neighbor classification rule

Abstract

The present work addresses binary classification using the k-nearest neighbors (kNN) classifier. Among its assets, it is an intuitive majority-vote classification rule and it adapts to spatial inhomogeneity, which is particularly relevant in high-dimensional settings where no a priori partitioning of the space seems realistic. However, the performance of the kNN classifier depends crucially on the number k of neighbors considered. To calibrate the parameter k, cross-validation procedures such as V-fold or leave-one-out are usually used. On the one hand, these procedures can become highly time-consuming. On the other hand, few theoretical guarantees exist on their performance. Recently, [11] derived closed-form formulas for the leave-p-out estimator of the kNN classifier performance. Such formulas now make it possible to perform cross-validation efficiently. The purpose of the present article is twofold. First, we provide a new strategy to derive bounds on moments of the leave-p-out estimator used to assess the performance of the kNN classifier. This new strategy exploits the link between leave-p-out and U-statistics as well as the generalized Efron-Stein inequality. Second, these moment upper bounds are used to settle a new exponential concentration inequality for
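To make the calibration of k by cross-validation concrete, the following is a minimal sketch of a brute-force leave-p-out estimate of the kNN misclassification rate, assuming binary labels in {0, 1} and the Euclidean distance. It is not the closed-form formula of [11]: it enumerates every test subset of size p, which is exactly the cost that makes the naive procedure time-consuming. The function names `knn_predict`, `leave_p_out_error`, and `select_k`, as well as the tie-breaking rule, are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

import numpy as np


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (labels in {0, 1})."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    # Assumption: a tied vote (possible for even k) is resolved as label 0.
    return int(2 * y_train[nearest].sum() > k)


def leave_p_out_error(X, y, k, p=1):
    """Brute-force leave-p-out estimate of the kNN misclassification rate.

    Averages the test error over all C(n, p) ways of holding out p points,
    so it is only practical for small n and p.
    """
    n = len(y)
    if k > n - p:
        raise ValueError("k must not exceed the training size n - p")
    split_errors = []
    for test_idx in combinations(range(n), p):
        test = list(test_idx)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        mistakes = [knn_predict(X_tr, y_tr, X[i], k) != y[i] for i in test]
        split_errors.append(np.mean(mistakes))
    return float(np.mean(split_errors))


def select_k(X, y, candidate_ks, p=1):
    """Return the candidate k with the smallest leave-p-out error, plus all errors."""
    errors = {k: leave_p_out_error(X, y, k, p) for k in candidate_ks}
    best_k = min(errors, key=errors.get)  # first minimizer if several tie
    return best_k, errors
```

With `p=1` this reduces to leave-one-out. A call such as `select_k(X, y, [1, 3, 5])` on a small labeled sample returns the selected k together with the estimated error of each candidate.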
Main file
KnnCelisseMaryHuard_HAL.pdf (407.02 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01185092, version 1 (19-08-2015)
hal-01185092, version 2 (12-10-2017)

License

Copyright (All rights reserved)

Cite

Alain Celisse, Tristan Mary-Huard. New upper bounds on cross-validation for the k-Nearest Neighbor classification rule. 2015. ⟨hal-01185092v1⟩