Influence functions for CART

Avner Bar Hen (1), Servane Gey (1), Jean-Michel Poggi (2, 3)
3 SELECT - Model selection in statistical learning
Inria Saclay - Ile de France, LMO - Laboratoire de Mathématiques d'Orsay, CNRS - Centre National de la Recherche Scientifique : UMR
Abstract: This paper deals with measuring the influence of observations on the results obtained with CART classification trees. To define the influence of individuals on the analysis, we use influence functions to propose general criteria for measuring the sensitivity of the CART analysis and its robustness. The proposals, based on jackknife trees, are organized along two lines: influence on predictions and influence on partitions. In addition, the analysis is extended to the pruned sequences of CART trees to produce a CART-specific notion of influence. A numerical example, the well-known spam dataset, is presented to illustrate the notions developed throughout the paper. Finally, a real dataset relating the administrative classification of cities surrounding Paris, France, to the characteristics of their tax revenue distributions is analyzed using the new influence-based tools.
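As a rough illustration of the "influence on predictions" idea, the sketch below refits a tree on each leave-one-out (jackknife) sample and counts how many predictions on the full sample differ from those of the reference tree. It is a minimal sketch under stated assumptions, not the authors' implementation: a depth-1 tree (a stump) split by Gini impurity stands in for a full CART tree, the count of changed predictions is only one plausible reading of the criterion named in the abstract, and the names `fit_stump`, `predict` and `prediction_influence` are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most frequent label; ties go to the smallest label for determinism."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(lab for lab, c in counts.items() if c == top)


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y):
    """Fit a depth-1 classification tree by minimizing weighted Gini impurity.

    Assumption: a stump stands in for a full CART tree.
    Returns (feature index, threshold, left label, right label).
    """
    n = len(y)
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:
        # No split is possible (all rows identical): predict the majority class.
        lab = majority(y)
        return (0, float("inf"), lab, lab)
    return best[1:]


def predict(stump, row):
    """Predict the label of one observation with a fitted stump."""
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


def prediction_influence(X, y):
    """For each observation i, count the predictions on the full sample that
    change when the tree is refit without observation i (jackknife tree)."""
    reference = fit_stump(X, y)
    ref_pred = [predict(reference, row) for row in X]
    influence = []
    for i in range(len(y)):
        jack = fit_stump(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        changed = sum(predict(jack, row) != p for row, p in zip(X, ref_pred))
        influence.append(changed)
    return influence


# Toy data (hypothetical): one feature, two well-separated classes.
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
print(prediction_influence(X, y))
```

Observations with a large count are those whose removal moves the fitted tree the most. With a real CART implementation, the same loop would apply to full or pruned trees instead of stumps.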
Document type: Preprint, working paper
Preprint HAL. 2014

https://hal.inria.fr/hal-00944098
Contributor: Erwan Le Pennec
Submitted on: Monday, February 10, 2014 - 15:43:16
Last modified on: Thursday, February 9, 2017 - 15:53:22

File

cart.influence.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00944098, version 1

Citation

Avner Bar Hen, Servane Gey, Jean-Michel Poggi. Influence functions for CART. Preprint HAL. 2014. <hal-00944098>
