A scalable and effective rough set theory-based approach for big data pre-processing

Zaineb Chelly Dagdia; Christine Zarges; Gaël Beck; Mustapha Lebbah

doi:10.1007/s10115-020-01467-y

Journal article, Knowledge and Information Systems (KAIS), Year: 2020



Zaineb Chelly Dagdia

Role: Author
PersonId: 737013
IdHAL: zaineb-chelly-dagdia
ORCID: 0000-0002-2551-6586

Speech Modeling for Facilitating Oral-Based Communication

Aberystwyth University

Institut Supérieur de Gestion de Tunis [Tunis]

Christine Zarges

Role: Author

Aberystwyth University

Gaël Beck

Role: Author

Université Paris 13

Mustapha Lebbah

Role: Author
PersonId: 735055
IdHAL: mustapha-lebbah
ORCID: 0000-0001-7245-6371
IdRef: 144970759

Université Paris 13

Abstract

A major challenge in the knowledge discovery process is performing data pre-processing, specifically feature selection, on data sets that are both large and high-dimensional. A variety of techniques have been proposed in the literature to address this challenge, with varying degrees of success: most of them need additional information about the input data for thresholding, require noise levels to be specified, or rely on feature ranking procedures. To overcome these limitations, rough set theory (RST) can be used to discover dependencies within the data and reduce the number of attributes in an input data set, using the data alone and requiring no supplementary information. However, RST reaches its limits on massive data sets because it is computationally expensive. In this paper, we propose a scalable and effective rough set theory-based approach for large-scale data pre-processing, specifically for feature selection, under the Spark framework. In our detailed experiments, data sets with up to 10,000 attributes have been considered, revealing that our proposed solution achieves a good speedup and performs its feature selection task well without sacrificing performance, making it relevant to big data.
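To make the notion of attribute dependency concrete, the following is a minimal single-machine sketch of the classical rough-set dependency degree (the fraction of objects whose equivalence class is consistent with the decision attribute) together with a greedy forward selection in the style of QuickReduct. It is not the distributed Spark algorithm proposed in the paper. The function names `partition`, `dependency`, and `quick_reduct` and the toy table `sample` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, cond_attrs, decision):
    """Dependency degree: size of the positive region divided by the universe size."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, cond_attrs):
        # A block belongs to the positive region when all of its rows
        # share a single decision value.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows, cond_attrs, decision):
    """Greedily add attributes until the dependency of the full set is reached."""
    target = dependency(rows, cond_attrs, decision)
    selected = []
    best = dependency(rows, selected, decision)
    while best < target:
        candidates = [a for a in cond_attrs if a not in selected]
        scored = [(dependency(rows, selected + [a], decision), a) for a in candidates]
        best, attr = max(scored, key=lambda s: s[0])
        selected.append(attr)
    return selected


# Hypothetical toy decision table: attribute "b" alone determines "d".
sample = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]

if __name__ == "__main__":
    print(dependency(sample, ["b"], "d"))            # 1.0
    print(quick_reduct(sample, ["a", "b", "c"], "d"))  # ['b']
```

Each candidate evaluation rebuilds a partition of the whole table, and the greedy loop repeats this for every remaining attribute, which gives a feel for why the computation becomes costly as the number of attributes and rows grows.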

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]; Information Retrieval [cs.IR]

Main file

ChellyDagdia2020_Article_AScalableAndEffectiveRoughSetT.pdf (2.03 MB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-02880626

Submitted on: Thursday, June 25, 2020, 10:26:00

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Wednesday, September 23, 2020, 15:47:25

Dates and versions

hal-02880626, version 1 (25-06-2020)

Identifiers

HAL Id: hal-02880626, version 1
DOI: 10.1007/s10115-020-01467-y

Cite

Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah. A scalable and effective rough set theory-based approach for big data pre-processing. Knowledge and Information Systems (KAIS), 2020, ⟨10.1007/s10115-020-01467-y⟩. ⟨hal-02880626⟩


Collections

UNIV-PARIS13 CNRS INRIA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD USPC SORBONNE-PARIS-NORD SILECS

