Journal article, Expert Systems with Applications, Year: 2019

A data sampling and attribute selection strategy for improving decision tree construction

Abstract

Decision trees are an efficient means of building classification models owing to the compressibility, simplicity and ease of interpretation of their results. However, the construction phase often produces large trees that are affected by uncertainties in the data (particularity, noise and residual variation). Combining attribute selection with data sampling is one of the most promising research directions for overcoming these construction problems, but the search space composed of all possible combinations of training-sample subsets and attribute subsets is extremely large. In this paper, we present a novel approach that generates an optimized decision tree by selecting an optimal couple of training-sample and attribute subsets for training. Because this search space is extremely large, we use particle swarm optimization to make the search for an “optimal” solution tractable. The selected solution helps avoid the over-fitting and complexity problems encountered in the construction phase of decision trees. We conducted an extensive experimental evaluation on 22 datasets from the UCI Machine Learning Repository. The results show that the proposed approach outperforms state-of-the-art classical as well as evolutionary decision tree construction methods in terms of simplicity, accuracy, and F-measure. We further evaluate our approach on a real-world engineering application: condition monitoring of rotating machinery under severe non-stationary conditions. The results show that the proposed approach made it possible to optimize the use of instantaneous angular speed to diagnose gear defects.
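
To make the idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. With n training samples and m attributes there are (2^n - 1)(2^m - 1) non-empty couples of subsets, so the sketch encodes a couple as a bit vector of length n + m and searches it with a binary particle swarm. Everything specific here is an assumption made for illustration: the binary PSO variant with a sigmoid transfer function, the PSO parameters, the tiny depth-limited Gini tree used as the learner, the synthetic data, and the fitness (validation accuracy minus a small penalty on tree size). The abstract does not state which of these the paper uses.

```python
import math
import random

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def build_tree(rows, labels, attrs, depth=3):
    """Tiny depth-limited Gini tree that may only split on indices in attrs."""
    majority = max(set(labels), key=labels.count)
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for a in attrs:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({r[a] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, a, t)
    if best is None:
        return majority
    _, a, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[a] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[a] > t]
    return (a, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], attrs, depth - 1),
            build_tree([r for r, _ in hi], [y for _, y in hi], attrs, depth - 1))

def predict(tree, row):
    while isinstance(tree, tuple):
        a, t, left, right = tree
        tree = left if row[a] <= t else right
    return tree

def size(tree):
    """Number of nodes (internal nodes plus leaves)."""
    return 1 + size(tree[2]) + size(tree[3]) if isinstance(tree, tuple) else 1

def make_fitness(train, valid, alpha=0.01):
    """Assumed fitness: validation accuracy minus alpha * tree size."""
    (X, y), (Xv, yv) = train, valid
    n, m = len(X), len(X[0])
    def fitness(bits):
        idx = [i for i in range(n) if bits[i]]           # selected samples
        attrs = [a for a in range(m) if bits[n + a]]     # selected attributes
        if not idx or not attrs:
            return float("-inf")                         # empty couple is invalid
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], attrs)
        acc = sum(predict(tree, r) == t for r, t in zip(Xv, yv)) / len(yv)
        return acc - alpha * size(tree)
    return fitness

def binary_pso(fitness, dim, particles=12, iters=25, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: real-valued velocities, bits resampled through a sigmoid."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(particles)]
    vs = [[0.0] * dim for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pfit = [fitness(x) for x in xs]
    g = max(range(particles), key=pfit.__getitem__)
    gbest, gfit = pbest[g][:], pfit[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                v = (w * vs[i][d]
                     + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                     + c2 * rng.random() * (gbest[d] - xs[i][d]))
                vs[i][d] = max(-4.0, min(4.0, v))        # clamp the velocity
                prob_one = 1.0 / (1.0 + math.exp(-vs[i][d]))
                xs[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(xs[i])
            if f > pfit[i]:
                pbest[i], pfit[i] = xs[i][:], f
                if f > gfit:
                    gbest, gfit = xs[i][:], f
    return gbest, gfit

def make_data(n, rng):
    """Synthetic data: attributes 0 and 1 are informative, 2 and 3 are noise."""
    rows, labels = [], []
    for _ in range(n):
        r = [rng.random() for _ in range(4)]
        y = int(r[0] + r[1] > 1.0)
        if rng.random() < 0.1:                           # 10% label noise
            y = 1 - y
        rows.append(r)
        labels.append(y)
    return rows, labels

rng = random.Random(42)
train, valid = make_data(40, rng), make_data(40, rng)
fitness = make_fitness(train, valid)
dim = len(train[0]) + len(train[0][0])                   # n samples + m attributes
best_bits, best_fit = binary_pso(fitness, dim)
baseline_fit = fitness([1] * dim)                        # all samples, all attributes
print("samples kept:", sum(best_bits[:len(train[0])]),
      "attributes kept:", [a for a in range(4) if best_bits[len(train[0]) + a]])
print("fitness of selected couple:", best_fit, "baseline:", baseline_fit)
```

The printed baseline is the same fitness evaluated with every sample and every attribute selected, which gives a reference point for judging the couple the swarm returns. In a real evaluation the final tree would be scored on a held-out test set that the search never sees, since the validation set used inside the fitness is itself fitted by the search.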
No file deposited

Dates and versions

hal-03025844, version 1 (26-11-2020)

Cite

Nour El Islem Karabadji, Ilyes Khelf, Hassina Seridi, Sabeur Aridhi, Didier Rémond, et al. A data sampling and attribute selection strategy for improving decision tree construction. Expert Systems with Applications, 2019, 129, pp.84-96. ⟨10.1016/j.eswa.2019.03.052⟩. ⟨hal-03025844⟩