Journal articles

A data sampling and attribute selection strategy for improving decision tree construction

Abstract: Decision trees are an efficient means of building classification models because of the compressibility, simplicity, and ease of interpretation of their results. However, the construction phase often produces large trees that are affected by many uncertainties in the data (particularity, noise, and residual variation). Combining attribute selection and data sampling is one of the most promising research directions for overcoming these construction problems, but the search space composed of all possible combinations of subsets of training samples and attributes is extremely large. In this paper, we present a novel approach that generates an optimized decision tree by selecting an optimal couple of training-sample and attribute subsets for training. We use particle swarm optimization to make the search for an “optimal” solution tractable. The selected solution helps avoid the over-fitting and complexity problems encountered in the construction phase of decision trees. We conducted an extensive experimental evaluation on 22 datasets from the UCI Machine Learning Repository. The results show that the proposed approach outperforms state-of-the-art classical as well as evolutionary decision tree construction methods in terms of simplicity, accuracy, and F-measure. We further evaluate our approach on a real-world engineering application: condition monitoring of rotating machinery under severe non-stationary conditions. The results show that the proposed approach made it possible to optimize the use of instantaneous angular speed to diagnose gear defects.
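The idea of searching jointly over training samples and attributes can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the particle encoding, the fitness function, or the tree learner, so everything below is an assumption made for illustration. The sketch assumes a bit-mask encoding (one bit per training sample followed by one bit per attribute), a standard sigmoid-transfer binary PSO, a toy tree learner on 0/1 features that splits on misclassification error, and a fitness equal to validation accuracy minus a small tree-size penalty. All names (`build_tree`, `make_fitness`, `binary_pso`, `size_penalty`) and the synthetic data are hypothetical.

```python
import math
import random

def build_tree(rows, labels, attrs, depth=0, max_depth=4):
    """Grow a tiny tree on 0/1 features. A leaf is a class label (0 or 1);
    an internal node is a tuple (attribute, subtree_if_0, subtree_if_1)."""
    majority = int(2 * sum(labels) >= len(labels))
    if depth == max_depth or len(set(labels)) <= 1 or not attrs:
        return majority
    def split_errors(a):
        # Training errors if each side of the split predicts its majority class.
        err = 0
        for side in (0, 1):
            ys = [y for r, y in zip(rows, labels) if r[a] == side]
            err += min(sum(ys), len(ys) - sum(ys))
        return err
    best = min(attrs, key=split_errors)
    parts = [[(r, y) for r, y in zip(rows, labels) if r[best] == s] for s in (0, 1)]
    if not parts[0] or not parts[1]:
        return majority  # degenerate split: stop here
    rest = [a for a in attrs if a != best]
    kids = [build_tree([r for r, _ in p], [y for _, y in p], rest, depth + 1, max_depth)
            for p in parts]
    return (best, kids[0], kids[1])

def predict(tree, row):
    while isinstance(tree, tuple):
        tree = tree[2] if row[tree[0]] else tree[1]
    return tree

def tree_size(tree):
    return 1 + tree_size(tree[1]) + tree_size(tree[2]) if isinstance(tree, tuple) else 1

def make_fitness(train, valid, n_attrs, size_penalty=0.01):
    """Assumed fitness: validation accuracy minus a penalty per tree node."""
    (xt, yt), (xv, yv) = train, valid
    n = len(xt)
    def fitness(mask):
        idx = [i for i in range(n) if mask[i]]               # selected samples
        attrs = [a for a in range(n_attrs) if mask[n + a]]   # selected attributes
        if not idx or not attrs:
            return 0.0
        tree = build_tree([xt[i] for i in idx], [yt[i] for i in idx], attrs)
        acc = sum(predict(tree, r) == y for r, y in zip(xv, yv)) / len(xv)
        return acc - size_penalty * tree_size(tree)
    return fitness

def binary_pso(fitness, n_bits, n_particles=12, n_iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize fitness over bit lists using a sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-4.0, min(4.0, v))  # clamp velocity to keep the sigmoid responsive
                vel[i][d] = v
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

def make_data(n, n_attrs, noise, rng):
    rows = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(n)]
    labels = [r[0] ^ (rng.random() < noise) for r in rows]  # class = attribute 0, with label noise
    return rows, labels

# Synthetic demo: noisy training labels, clean validation labels.
rng = random.Random(42)
N_TRAIN, N_ATTRS = 40, 6
train = make_data(N_TRAIN, N_ATTRS, noise=0.15, rng=rng)
valid = make_data(60, N_ATTRS, noise=0.0, rng=rng)
fitness = make_fitness(train, valid, N_ATTRS)
best_mask, best_fit = binary_pso(fitness, N_TRAIN + N_ATTRS)
kept_attrs = [a for a in range(N_ATTRS) if best_mask[N_TRAIN + a]]
print(sum(best_mask[:N_TRAIN]), kept_attrs, round(best_fit, 3))
```

The size penalty is what pushes the search toward smaller trees. With a mask of 46 bits the space already holds 2^46 candidate couples, which shows why exhaustive search is impractical even on small datasets.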

https://hal.inria.fr/hal-03025844
Contributor: Sabeur Aridhi
Submitted on: Thursday, November 26, 2020 - 2:23:17 PM
Last modification on: Tuesday, December 1, 2020 - 3:22:24 AM

Citation

Nour El Islem Karabadji, Ilyes Khelf, Hassina Seridi, Sabeur Aridhi, Didier Rémond, et al. A data sampling and attribute selection strategy for improving decision tree construction. Expert Systems with Applications, Elsevier, 2019, 129, pp.84-96. ⟨10.1016/j.eswa.2019.03.052⟩. ⟨hal-03025844⟩
