[Figure legend: Original; CVAttributeEval-threshold0, -num3205, -num4117, -num5566, -num6171; CfsSubsetEval-BestFirstGreedy; ChiSquaredAttributeEval-num3205, -num4117, -num5566, -num6171; ConsistencySubsetEval-BestFirstGreedy; CorrelationAttributeEval-num3205, -num4117, -num5566, -num6171; GainRatioAttributeEval-threshold0, -num3205, -num4117, -num5566, -num6171; InfoGainAttributeEval-threshold0, -num3205, -num4117, -num5566, -num6171]



Publisher's Note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Her research interests include different aspects of artificial intelligence. She writes on machine learning, data mining and data analytics, evolutionary algorithms and artificial immune systems, big data, and uncertainty theories. [...] the ACM-W Award, the Marie Skłodowska-Curie Individual European Fellowship and the Best Reviewer Award, 2013.

Her current main research interests include machine learning, optimization and heuristic search methods, 2011.