
A Comparative study of sample selection methods for classification

Abstract: Sampling of large datasets for data mining is important for at least two reasons. The processing of large amounts of data results in increased computational complexity. The cost of this additional complexity may not be justifiable. On the other hand, the use of small samples results in fast and efficient computation for data mining algorithms. Statistical methods for obtaining sufficient samples from datasets for classification problems are discussed in this paper. Results are presented for an empirical study based on the use of sequential random sampling and sample evaluation using univariate hypothesis testing and an information theoretic measure. Comparisons are made between theoretical and empirical estimates.
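
As an illustration only, the general idea of sequential random sampling with an information-theoretic stopping check might be sketched as follows. This is not the authors' procedure: the function names, the starting size, the step, the threshold, and the choice of Kullback-Leibler divergence on the class distribution are all assumptions made for the sketch.

```python
import math
import random
from collections import Counter


def class_distribution(labels, classes):
    """Relative frequency of each class in `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return {c: counts.get(c, 0) / n for c in classes}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits; eps guards against division by zero and log(0)."""
    return sum(
        p[c] * math.log2((p[c] + eps) / (q[c] + eps))
        for c in p
        if p[c] > 0
    )


def sequential_sample(labels, start=50, step=50, threshold=0.01, seed=0):
    """Grow a random sample until its class distribution is close to the
    full dataset's (hypothetical criterion: KL divergence <= threshold).

    Returns the selected row indices and the final divergence.
    """
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)  # a prefix of a random permutation is a simple random sample

    classes = sorted(set(labels))
    full = class_distribution(labels, classes)

    n = min(start, len(labels))
    while True:
        sample = [labels[i] for i in order[:n]]
        div = kl_divergence(full, class_distribution(sample, classes))
        # Stop when the sample is close enough, or when the data is exhausted.
        if div <= threshold or n == len(labels):
            return order[:n], div
        n = min(n + step, len(labels))
```

For example, `sequential_sample(["a"] * 700 + ["b"] * 300)` returns a list of row indices and the divergence reached at the stopping point. A univariate hypothesis test on each feature could be substituted for the divergence check under the same loop structure.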
Document type: Journal articles

Cited literature: 22 references

https://hal.inria.fr/hal-01262348
Contributor: Coordination Episciences Iam
Submitted on: Tuesday, January 26, 2016 - 4:05:10 PM
Last modification on: Sunday, November 22, 2020 - 12:52:02 PM
Long-term archiving on: Wednesday, April 27, 2016 - 1:20:57 PM

File

arima00606.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01262348, version 1

Citation

Patricia E.N. Lutu, Andries P. Engelbrecht. A Comparative study of sample selection methods for classification. Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées, INRIA, 2007, 6, pp. 69-85. ⟨hal-01262348⟩
