A. Agresti, Categorical Data Analysis, Wiley Series in Probability and Statistics, 2002.

H. Akaike, Information theory and an extension of the maximum likelihood principle, Second International Symposium on Information Theory, pp.267-281, 1973.

C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos, Local causal and Markov blanket induction for causal discovery and feature selection for classification part I: Algorithms and empirical evaluation, Journal of Machine Learning Research, vol.11, pp.171-234, 2010.

C. F. Aliferis, I. Tsamardinos, and A. Statnikov, HITON: a novel Markov blanket algorithm for optimal variable selection, AMIA Annual Symposium Proceedings, vol.2003, p.21, 2003.

L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific Journal of Mathematics, vol.16, issue.1, pp.1-3, 1966.
DOI : 10.2140/pjm.1966.16.1
URL : http://msp.org/pjm/1966/16-1/pjm-v16-n1-p01-s.pdf

B. J. Becker and M. Wu, The Synthesis of Regression Slopes in Meta-Analysis, Statistical Science, vol.22, issue.3, pp.414-429, 2007.
DOI : 10.1214/07-STS243

D. Bertsimas, A. King, and R. Mazumder, Best subset selection via a modern optimization lens, The Annals of Statistics, pp.813-852, 2016.
DOI : 10.1214/15-aos1388
URL : http://arxiv.org/pdf/1507.03133

V. Bolón-Canedo, N. Sánchez-Maroño, and A. Alonso-Betanzos, Feature Selection for High-Dimensional Data, 2015.

G. Borboudakis and I. Tsamardinos, Forward-backward selection with early dropping, 2017.

J. K. Bradley, A. Kyrola, D. Bickson, and C. Guestrin, Parallel coordinate descent for l1-regularized loss minimization, Proceedings of the 28th International Conference on Machine Learning, ICML 2011, pp.321-328, 2011.

G. Brown, A. Pocock, M. Zhao, and M. Luján, Conditional likelihood maximisation: A unifying framework for information theoretic feature selection, Journal of Machine Learning Research, vol.13, pp.27-66, 2012.

O. Canela-Xandri, A. Law, A. Gray, J. A. Woolliams, and A. Tenesa, A new tool called DISSECT for analysing large genomic data sets using a Big Data approach, Nature Communications, vol.157, 2015.
DOI : 10.1038/nature04226

C. C. Chang, C. C. Chow, L. C. Tellier, S. Vattikuti, S. M. Purcell et al., Second-generation PLINK: rising to the challenge of larger and richer datasets, GigaScience, vol.30, issue.1, p.7, 2015.
DOI : 10.1093/bioinformatics/btu495

F. S. Collins and H. Varmus, A New Initiative on Precision Medicine, New England Journal of Medicine, vol.372, issue.9, pp.793-795, 2015.
DOI : 10.1056/NEJMp1500523
URL : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5101938/pdf

International HapMap Consortium, A haplotype map of the human genome, Nature, vol.437, issue.7063, pp.1299-1320, 2005.
DOI : 10.1086/429864

J. Dougherty, R. Kohavi, and M. Sahami, Supervised and Unsupervised Discretization of Continuous Features, Machine Learning: Proceedings of the Twelfth International Conference, pp.194-202, 1995.
DOI : 10.1016/B978-1-55860-377-6.50032-3
URL : http://ai.stanford.edu/~ronnyk/disc.pdf

B. Efron and R. J. Tibshirani, An introduction to the bootstrap, 1994.
DOI : 10.1007/978-1-4899-4541-9

R. F. Engle, Wald, likelihood ratio, and Lagrange multiplier tests in econometrics, Handbook of Econometrics, pp.775-826, 1984.
DOI : 10.1016/s1573-4412(84)02005-5

J. Fan, Y. Feng, and Y. Wu, High-dimensional variable selection for Cox's proportional hazards model, Borrowing Strength: Theory Powering Applications - A Festschrift for Lawrence D. Brown, pp.70-86, 2010.
DOI : 10.1214/10-IMSCOLL606

R. Fisher, Statistical methods for research workers, 1932.

R. V. Foutz and R. C. Srivastava, The performance of the likelihood ratio test when the model is incorrect, The Annals of Statistics, pp.1183-1194, 1977.

J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning, Springer Series in Statistics, 2001.

S. van de Geer, P. Bühlmann, and J. Schelldorfer, Estimation for high-dimensional linear mixed-effects models using l1-penalization, Scandinavian Journal of Statistics, vol.38, issue.2, pp.197-214, 2011.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

F. Harrell, Regression Modeling Strategies, 2001.

L. V. Hedges and J. L. Vevea, Fixed- and random-effects models in meta-analysis., Psychological Methods, vol.3, issue.4, p.486, 1998.
DOI : 10.1037/1082-989X.3.4.486

D. W. Hosmer Jr., S. Lemeshow, and R. X. Sturdivant, Introduction to the Logistic Regression Model, 2013.
DOI : 10.1002/0471722146.ch1

S. Ivanoff, F. Picard, and V. Rivoirard, Adaptive lasso and group-lasso for functional Poisson regression, Journal of Machine Learning Research, vol.17, issue.1, pp.1903-1948, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01097914

G. H. John, R. Kohavi, and K. Pfleger, Irrelevant Features and the Subset Selection Problem, Machine Learning: Proceedings of the Eleventh International Conference, pp.121-129, 1994.
DOI : 10.1016/B978-1-55860-335-6.50023-4
URL : ftp://robotics.stanford.edu/pub/gjohn/papers/relevance.ps

R. Kerber, ChiMerge: Discretization of numeric attributes, Proceedings of the Tenth National Conference on Artificial Intelligence, pp.123-128, 1992.

P. Konda, A. Kumar, C. Ré, and V. Sashikanth, Feature selection in enterprise analytics, Proceedings of the VLDB Endowment, vol.6, issue.12, pp.1306-1309, 2013.
DOI : 10.14778/2536274.2536302

M. H. Kutner, C. J. Nachtsheim, J. Neter, and W. Li, Applied Linear Statistical Models, 2004.

V. Lagani, G. Athineou, A. Farcomeni, M. Tsagris, and I. Tsamardinos, Feature Selection with the R Package MXM: Discovering Statistically Equivalent Feature Subsets, Journal of Statistical Software, vol.80, issue.7, 2016.
DOI : 10.18637/jss.v080.i07
URL : https://doi.org/10.18637/jss.v080.i07

V. Lagani, G. Kortas, and I. Tsamardinos, Biomarker signature identification in "omics" data with multi-class outcome, Computational and Structural Biotechnology Journal, vol.6, issue.7, pp.1-7, 2013.
DOI : 10.5936/csbj.201303004
URL : https://doi.org/10.5936/csbj.201303004

V. Lagani and I. Tsamardinos, Structure-based variable selection for survival data, Bioinformatics, vol.26, issue.15, pp.1887-1894, 2010.
DOI : 10.1177/0962280209105024
URL : https://academic.oup.com/bioinformatics/article-pdf/26/15/1887/16893123/btq261.pdf

S. Lee, J. K. Kim, X. Zheng, Q. Ho, G. A. Gibson et al., On model parallelization and scheduling strategies for distributed machine learning, Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, pp.2834-2842, 2014.

J. Li, K. Cheng, S. Wang, F. Morstatter, R. P. Trevino et al., Feature Selection, ACM Computing Surveys, vol.50, issue.6, 2016.
DOI : 10.1109/ICDM.2017.78
URL : https://hal.archives-ouvertes.fr/hal-01361171

Q. Li, S. Qiu, S. Ji, P. M. Thompson, J. Ye et al., Parallel Lasso Screening for Big Data Optimization, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, pp.1705-1714, 2016.
DOI : 10.1145/1015330.1015332

T. M. Loughin, A systematic comparison of methods for combining p-values from independent tests, Computational Statistics & Data Analysis, vol.47, issue.3, pp.467-485, 2004.
DOI : 10.1016/j.csda.2003.11.020

D. Margaritis, Toward provably correct feature selection in arbitrary domains, Advances in Neural Information Processing Systems, pp.1240-1248, 2009.

D. Margaritis and S. Thrun, Bayesian network induction via local neighborhoods, Advances in Neural Information Processing Systems 12, pp.505-511, 2000.

L. Meier, S. van de Geer, and P. Bühlmann, The group lasso for logistic regression, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, 2008.
DOI : 10.1093/oxfordjournals.pan.a004868
URL : http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2007.00627.x/pdf

N. Meinshausen and P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, pp.1436-1462, 2006.

X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman et al., MLlib: Machine learning in Apache Spark, Journal of Machine Learning Research, vol.17, issue.1, pp.1235-1241, 2016.

A. Miller, Subset selection in regression, 2002.

T. P. Minka, A comparison of numerical optimizers for logistic regression, 2003.

J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1988.

J. Pearl, Causality: Models, Reasoning, and Inference, 2000.

J. Pearl and T. S. Verma, A Theory of Inferred Causation, 1991.

P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein, A simulation study of the number of events per variable in logistic regression analysis, Journal of Clinical Epidemiology, vol.49, issue.12, pp.1373-1379, 1996.
DOI : 10.1016/S0895-4356(96)00236-3

J. M. Peña, R. Nilsson, J. Björkegren, and J. Tegnér, Towards scalable and data efficient learning of Markov boundaries, International Journal of Approximate Reasoning, vol.45, issue.2, pp.211-232, 2007.
DOI : 10.1016/j.ijar.2006.06.008

H. Peng, F. Long, and C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.27, issue.8, pp.1226-1238, 2005.
DOI : 10.1109/TPAMI.2005.159

S. Ramírez-Gallego, H. Mouriño-Talín, D. Martínez-Rego, V. Bolón-Canedo, J. M. Benítez et al., An Information Theory-Based Feature Selection Framework for Big Data Under Apache Spark, IEEE Transactions on Systems, Man, and Cybernetics: Systems, issue.99, pp.1-13, 2017.
DOI : 10.1109/TSMC.2017.2670926

T. Richardson and P. Spirtes, Ancestral graph Markov models, The Annals of Statistics, vol.30, issue.4, pp.962-1030, 2002.
DOI : 10.1214/aos/1031689015
URL : http://www.hss.cmu.edu/philosophy/spirtes/annals2002.pdf

T. Sato, Y. Takano, R. Miyashiro, and A. Yoshise, Feature subset selection for logistic regression via mixed integer optimization, Computational Optimization and Applications, vol.30, issue.3, pp.865-880, 2016.
DOI : 10.1016/j.patrec.2008.11.012
URL : https://tsukuba.repo.nii.ac.jp/?action=repository_action_common_download&item_id=32772&item_no=1&attribute_id=17&file_no=2

G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, pp.461-464, 1978.

S. T. Sherry, M. Ward, M. Kholodov, J. Baker, L. Phan et al., dbSNP: the NCBI database of genetic variation, Nucleic Acids Research, vol.29, issue.1, pp.308-311, 2001.
DOI : 10.1093/nar/29.1.308

S. Singh, J. Kubica, S. Larsen, and D. Sorokina, Parallel Large Scale Feature Selection for Logistic Regression, Proceedings of the 2009 SIAM International Conference on Data Mining, pp.1172-1183, 2009.
DOI : 10.1137/1.9781611972795.100
URL : http://www.cs.cmu.edu/%7Edaria/papers/fslr.pdf

P. Spirtes, C. N. Glymour, and R. Scheines, Causation, prediction, and search, 2000.
DOI : 10.1007/978-1-4612-2748-9

A. Statnikov, N. I. Lytkin, J. Lemeire, and C. F. Aliferis, Algorithms for discovery of multiple Markov boundaries, Journal of Machine Learning Research, vol.14, issue.Feb, pp.499-566, 2013.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288, 1996.
DOI : 10.1111/j.1467-9868.2011.00771.x

I. Tsamardinos and C. F. Aliferis, Towards principled feature selection: relevancy, filters and wrappers, Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.

I. Tsamardinos, C. F. Aliferis, and A. Statnikov, Time and sample efficient discovery of Markov blankets and direct causal relations, Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pp.673-678, 2003.
DOI : 10.1145/956750.956838
URL : http://discover1.mc.vanderbilt.edu/discover/public/Publications/MMMB.pdf

I. Tsamardinos, C. F. Aliferis, and A. R. Statnikov, Algorithms for large scale Markov blanket discovery, FLAIRS Conference, 2003.

I. Tsamardinos and A. P. Mariglis, Multi-Source Causal Analysis: Learning Bayesian Networks from Multiple Datasets, IFIP International Conference on Artificial Intelligence Applications and Innovations, pp.479-490, 2009.
DOI : 10.1007/978-1-4419-0221-4_56

T. Verma and J. Pearl, Causal Networks: Semantics and Expressiveness, Proceedings of the 4th Workshop on Uncertainty in Artificial Intelligence, pp.352-359, 1988.

E. Vittinghoff and C. E. McCulloch, Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression, American Journal of Epidemiology, vol.165, issue.6, pp.710-718, 2007.
DOI : 10.1093/aje/kwk052

Q. H. Vuong, Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses, Econometrica, vol.57, issue.2, pp.307-333, 1989.
DOI : 10.2307/1912557
URL : https://authors.library.caltech.edu/81424/1/sswp605.pdf

S. Weisberg, Applied linear regression, 2005.
DOI : 10.1002/0471704091

W. J. Welch, Algorithmic complexity: three NP-hard problems in computational statistics, Journal of Statistical Computation and Simulation, vol.20, issue.1, pp.17-25, 1982.
DOI : 10.1137/1020067

H. White, Maximum Likelihood Estimation of Misspecified Models, Econometrica, vol.50, issue.1, pp.1-25, 1982.
DOI : 10.2307/1912526

S. S. Wilks, The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, The Annals of Mathematical Statistics, vol.9, issue.1, pp.60-62, 1938.
DOI : 10.1214/aoms/1177732360

E. P. Xing, Q. Ho, P. Xie, and D. Wei, Strategies and Principles of Distributed Machine Learning on Big Data, Engineering, vol.2, issue.2, pp.179-195, 2016.
DOI : 10.1016/J.ENG.2016.02.008

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, Spark: Cluster computing with working sets, HotCloud, 2010.

Y. Zhai, Y. Ong, and I. W. Tsang, The Emerging "Big Dimensionality", IEEE Computational Intelligence Magazine, vol.9, issue.3, pp.14-26, 2014.
DOI : 10.1109/MCI.2014.2326099

K. Zhang, J. Peters, D. Janzing, and B. Schölkopf, Kernel-based conditional independence test and application in causal discovery, Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, pp.804-813, 2011.

Z. Zhao, R. Zhang, J. Cox, D. Duling, and W. Sarle, Massively parallel feature selection: an approach based on variance preservation, Machine Learning, pp.195-220, 2013.
DOI : 10.1007/s10994-013-5373-4
URL : http://www.cs.bris.ac.uk/~flach/ECMLPKDD2012papers/1125533.pdf

Z. Peng, M. Yan, and W. Yin, Parallel and distributed sparse optimization, Proceedings of the Asilomar Conference on Signals, Systems and Computers, 2013.