Predicting Problem Difficulty for Genetic Programming Applied to Data Classification

Leonardo Trujillo 1 Yuliana Martinez 1 Edgar Galvan-Lopez 2 Pierrick Legrand 3, 4
2 School of Computer Science and Electronic Engineering
CSEE - School of Computer Science and Electronic Engineering [Essex]
4 ALEA - Advanced Learning Evolutionary Algorithms
Inria Bordeaux - Sud-Ouest, UB - Université de Bordeaux, CNRS - Centre National de la Recherche Scientifique : UMR5251
Abstract: During the development of applied systems, an important problem that must be addressed is that of choosing the correct tools for a given domain or scenario. The genetic programming (GP) community has addressed this general task by attempting to determine the intrinsic difficulty that a problem poses for a GP search. This paper presents an approach to predict the performance of GP applied to data classification, one of the most common problems in computer science. The novelty of the proposal is to extract statistical descriptors and complexity descriptors of the problem data, and from these estimate the expected performance of a GP classifier. We derive two types of predictive models: linear regression models and symbolic regression models evolved with GP. The experimental results, validated on synthetic and real-world problems, show that both approaches provide good estimates of classifier performance. In conclusion, this paper shows that it is possible to accurately predict the expected performance of a GP classifier using a set of descriptors that characterize the problem data.
Document type:
Conference paper
Natalio Krasnogor, University of Nottingham, UK. GECCO 2011, Jul 2011, Dublin, Ireland. ACM New York, NY, USA ©2011, pp.1355-1362, 2011, 〈10.1145/2001576.2001759〉

https://hal.inria.fr/hal-00643358
Contributor: Pierrick Legrand
Submitted on: Monday, November 21, 2011 - 21:22:15
Last modified on: Thursday, January 11, 2018 - 06:22:36
Document(s) archived on: Wednesday, February 22, 2012 - 02:31:08

File

Gecco_2011.pdf
Files produced by the author(s)

Citation

Leonardo Trujillo, Yuliana Martinez, Edgar Galvan-Lopez, Pierrick Legrand. Predicting Problem Difficulty for Genetic Programming Applied to Data Classification. Natalio Krasnogor, University of Nottingham, UK. GECCO 2011, Jul 2011, Dublin, Ireland. ACM New York, NY, USA ©2011, pp.1355-1362, 2011, 〈10.1145/2001576.2001759〉. 〈hal-00643358〉

Metrics

Record views

327

File downloads

290