Reducing the Number of Binary Splits, in Decision Tree Induction, by means of an Hierarchical Classification

Israël-César Lerman 1 Joaquim Da Costa 2
1 REPCO - Knowledge Representation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract: The main problem considered in this paper is the binarization of categorical (nominal) attributes having a very large number of values (20^4 in our application). A small number of relevant binary attributes is derived from each initial attribute. The key idea is to group the values of an attribute by means of a hierarchical classification method, where the similarity between values is defined with respect to the classes to be predicted. The proposed solution is independent of the number of these classes and can be applied to various situations. A specific use of the resulting classification tree greatly reduces the number of binary splits of the attribute value set that have to be retained. For complexity reasons, the hierarchical classification method is combined with a formal decomposition and recomposition of the attribute value set. The ARCADE method that we have developed is essentially a hybridization of the well-known CART method with the reduction method outlined above. The application of ARCADE to the protein secondary structure prediction problem demonstrates the validity of our approach.
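The abstract does not specify the similarity index or exactly how splits are read off the classification tree. The following Python is therefore only a hypothetical sketch of the general idea. It is not the ARCADE procedure.

The sketch makes three assumptions:
- the dissimilarity between two values is the L1 distance between their class distributions;
- the hierarchy is built by naive average-linkage agglomeration;
- each non-root node of the resulting binary tree yields one candidate split (that node's values versus the rest).

The names `class_profile`, `agglomerate` and `tree_splits`, and the class counts, are invented for illustration. For k values, an exhaustive search must consider 2^(k-1) - 1 binary splits. This tree yields at most 2k - 3.

```python
# Hypothetical sketch: contrasts exhaustive binary splits of a categorical
# attribute with the splits read off a hierarchical classification of its values.
from typing import Dict, FrozenSet, List, Set

Profile = List[float]


def exhaustive_split_count(k: int) -> int:
    """Distinct binary splits {S, complement} of a k-value set, both sides non-empty."""
    return 2 ** (k - 1) - 1


def class_profile(counts: List[int]) -> Profile:
    """Turn per-class counts observed for one attribute value into a distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def l1(p: Profile, q: Profile) -> float:
    """Stand-in dissimilarity between two class distributions (an assumption)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def agglomerate(profiles: Dict[str, Profile]) -> List[FrozenSet[str]]:
    """Naive average-linkage clustering; returns clusters in merge order (root last)."""
    clusters = [frozenset([v]) for v in profiles]
    merges: List[FrozenSet[str]] = []

    def dist(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        # Average pairwise L1 distance between the members of two clusters.
        return sum(l1(profiles[x], profiles[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] | clusters[j]
        merges.append(merged)
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return merges


def tree_splits(profiles: Dict[str, Profile]) -> Set[FrozenSet[FrozenSet[str]]]:
    """One split per non-root node (node vs. rest); at most 2k - 3 distinct splits."""
    values = frozenset(profiles)
    # Leaves plus every internal node except the root (the last merge).
    nodes = [frozenset([v]) for v in profiles] + agglomerate(profiles)[:-1]
    return {frozenset([node, values - node]) for node in nodes}


# Hypothetical counts over three classes (e.g. helix, strand, coil) for 4 values.
profiles = {
    "A": class_profile([8, 1, 1]),
    "B": class_profile([7, 2, 1]),
    "C": class_profile([1, 8, 1]),
    "D": class_profile([1, 1, 8]),
}

print(exhaustive_split_count(len(profiles)))  # 7
print(len(tree_splits(profiles)))  # 5
```

Consider the application's k = 20^4 = 160000 values. The exhaustive count is 2^159999 - 1, whereas a binary hierarchy yields at most 2k - 3 = 319997 node-versus-rest splits. Building the hierarchy naively over that many values is itself costly. This is consistent with the abstract's mention of decomposing and recomposing the value set.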
Document type: [Research Report] RR-3312, INRIA. 1997
Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, 24 May 2006, 12:39:38
Last modified on: Friday, 1 June 2018, 10:28:01
Document(s) archived on: Sunday, 4 April 2010, 21:53:04



HAL Id: inria-00073377, version 1


Israël-César Lerman, Joaquim Da Costa. Reducing the Number of Binary Splits, in Decision Tree Induction, by means of an Hierarchical Classification. [Research Report] RR-3312, INRIA. 1997. 〈inria-00073377〉


