
Reducing the Number of Binary Splits, in Decision Tree Induction, by means of an Hierarchical Classification

Israël-César Lerman 1 Joaquim da Costa 2
1 REPCO - Knowledge Representation
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, INRIA Rennes
Abstract : The main problem considered in this paper is the binarization of categorical (nominal) attributes having a very large number of values (20^4 in our application). A small number of relevant binary attributes is derived from each initial attribute. The key idea is to group the values of an attribute by means of a hierarchical classification method, where the similarity between values is defined with respect to the classes to be predicted. The proposed solution is independent of the number of these classes and can be applied in various situations. A specific use of the resulting classification tree very significantly reduces the number of binary splits of the attribute value set that have to be retained. For complexity reasons, the hierarchical classification method is in fact combined with a formal decomposition and recomposition of the attribute value set. The ARCADE method that we have developed is essentially a hybridization of the well-known CART method with the reduction method outlined above. The application of ARCADE to the protein secondary structure prediction problem demonstrates the validity of our approach.
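
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the ARCADE method itself: the abstract does not specify the similarity measure or the decomposition and recomposition step, so the sketch assumes an L1 distance between class-frequency profiles and a plain agglomerative merge. The names `profile`, `l1` and `hierarchical_splits`, and the toy counts, are hypothetical. Each node of the resulting tree yields one candidate binary split (a group of values versus its complement), so an attribute with n values produces at most n - 2 candidates instead of the 2^(n-1) - 1 possible binary splits.

```python
from itertools import combinations


def profile(counts):
    """Normalize a vector of class counts into class frequencies.
    Assumes every attribute value was observed at least once."""
    total = sum(counts)
    return [c / total for c in counts]


def l1(p, q):
    """L1 distance between two class-frequency profiles (an assumed choice)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def hierarchical_splits(class_counts):
    """class_counts maps each attribute value to its list of per-class counts.
    Returns candidate binary splits, each given as one side (a frozenset of
    values); the other side is the complement in the full value set."""
    all_values = frozenset(class_counts)
    clusters = {frozenset([v]): list(c) for v, c in class_counts.items()}
    splits = []
    while len(clusters) > 2:
        # Merge the two groups whose pooled class profiles are closest.
        a, b = min(
            combinations(sorted(clusters, key=sorted), 2),
            key=lambda pair: l1(profile(clusters[pair[0]]),
                                profile(clusters[pair[1]])),
        )
        pooled = [x + y for x, y in zip(clusters.pop(a), clusters.pop(b))]
        clusters[a | b] = pooled
        # A group and its complement describe the same split: keep one form.
        side = a | b
        canon = min(side, all_values - side, key=lambda s: (len(s), sorted(s)))
        if canon not in splits:
            splits.append(canon)
    return splits


# Toy data: four attribute values, two classes to be predicted.
counts = {"A": [9, 1], "B": [8, 2], "C": [1, 9], "D": [2, 8]}
print(hierarchical_splits(counts))
```

In this toy case the values A and B are grouped together, as are C and D, so the tree retains a single split ({A, B} versus {C, D}) out of the 7 possible binary splits of a four-value set.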
Document type : Reports

https://hal.inria.fr/inria-00073377
Contributor : Rapport de Recherche Inria
Submitted on : Wednesday, May 24, 2006 - 12:39:38 PM
Last modification on : Friday, February 12, 2021 - 3:33:13 AM
Long-term archiving on : Sunday, April 4, 2010 - 9:53:04 PM

Identifiers

  • HAL Id : inria-00073377, version 1

Citation

Israël-César Lerman, Joaquim da Costa. Reducing the Number of Binary Splits, in Decision Tree Induction, by means of an Hierarchical Classification. [Research Report] RR-3312, INRIA. 1997. ⟨inria-00073377⟩
