
Reducing the Number of Binary Splits, in Decision Tree Induction, by means of an Hierarchical Classification

Israël-César Lerman 1 Joaquim da Costa 2
1 REPCO - Knowledge Representation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract : The main problem considered in this paper is the binarization of categorical (nominal) attributes that have a very large number of values (20^4 in our application). A small number of relevant binary attributes is derived from each initial attribute. The key idea is to group the values of an attribute by means of a hierarchical classification method, where the similarity between values is defined with respect to the classes to be predicted. The proposed solution is independent of the number of these classes and can be applied to various situations. A specific use of the resulting classification tree very significantly reduces the number of binary splits of the attribute value set that have to be retained. For complexity reasons, the hierarchical classification method is in fact combined with a formal decomposition and recomposition of the attribute value set. The ARCADE method that we have set up is essentially a powerful hybridization of the well-known CART method with the reduction method outlined above. The application of ARCADE to the protein secondary structure prediction problem demonstrates the validity of our approach.
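The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the ARCADE method itself: the abstract does not specify the similarity measure or the decomposition and recomposition step, so the sketch assumes an L1 distance between per-value class-frequency profiles with average linkage, and the function names `class_profiles` and `candidate_splits` are hypothetical. Each cluster formed during agglomeration is kept as one candidate binary split (cluster versus the remaining values), which gives at most n - 2 candidates for n values instead of the 2^(n-1) - 1 possible binary partitions.

```python
from collections import Counter, defaultdict


def class_profiles(values, labels):
    """Map each attribute value to its vector of class frequencies."""
    classes = sorted(set(labels))
    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[v][y] += 1
    return {
        v: [c[k] / sum(c.values()) for k in classes]
        for v, c in counts.items()
    }


def candidate_splits(values, labels):
    """Agglomerate attribute values bottom-up and return every merged
    cluster (except the full value set) as a candidate binary split."""
    profiles = class_profiles(values, labels)
    # Each cluster maps a frozenset of values to its member profiles.
    clusters = {frozenset([v]): [p] for v, p in profiles.items()}
    all_values = frozenset(profiles)
    splits = []

    def dist(a, b):
        # Assumed similarity: average-linkage L1 distance between
        # class-frequency profiles (a stand-in, not the paper's measure).
        total = sum(
            sum(abs(x - y) for x, y in zip(p, q))
            for p in clusters[a]
            for q in clusters[b]
        )
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        keys = sorted(clusters, key=sorted)  # deterministic tie-breaking
        pairs = ((a, b) for i, a in enumerate(keys) for b in keys[i + 1:])
        a, b = min(pairs, key=lambda ab: dist(*ab))
        merged = a | b
        clusters[merged] = clusters.pop(a) + clusters.pop(b)
        if merged != all_values:
            splits.append(merged)  # split: merged vs. all other values
    return splits
```

A decision tree learner in the style of CART could then evaluate only these candidate subsets at each node rather than enumerating every binary partition of the value set.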

Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 12:39:38 PM
Last modification on : Friday, April 22, 2022 - 11:42:04 AM
Long-term archiving on : Sunday, April 4, 2010 - 9:53:04 PM


  • HAL Id : inria-00073377, version 1


Israël-César Lerman, Joaquim da Costa. Reducing the Number of Binary Splits, in Decision Tree Induction, by means of an Hierarchical Classification. [Research Report] RR-3312, INRIA. 1997. ⟨inria-00073377⟩


