A. Anandkumar, D. Hsu, A. Javanmard, and S. Kakade, Learning linear Bayesian networks with latent variables, International Conference on Machine Learning, pp.249-257, 2013.

B. Aragam, J. Gu, and Q. Zhou, Learning large-scale Bayesian networks with the sparsebn package, 2017.

B. Aragam and Q. Zhou, Concave penalized estimation of sparse Gaussian Bayesian networks, Journal of Machine Learning Research, vol.16, pp.2273-2328, 2015.

O. Banerjee, L. El Ghaoui, and A. d'Aspremont, Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data, Journal of Machine Learning Research, vol.9, pp.485-516, 2008.

A. R. Barron and T. M. Cover, Minimum complexity density estimation, IEEE Transactions on Information Theory, vol.37, issue.4, pp.1034-1054, 1991.

M. Bartlett and J. Cussens, Integer linear programming for the Bayesian network structure learning problem, Artificial Intelligence, vol.244, pp.258-271, 2017.

M. Better, F. Glover, and M. Laguna, Advances in analytics: Integrating dynamic data mining with simulation optimization, IBM Journal of Research and Development, vol.51, issue.3, pp.477-487, 2007.

B. Bothorel, O. Goudet, and B. Duval, Inférence de graphes de régulation génétique. Application au génome de la plante Arabidopsis thaliana [Inference of gene regulatory graphs: application to the genome of the plant Arabidopsis thaliana], 2019.

G. Brown, A. Pocock, M. Zhao, and M. Luján, Conditional likelihood maximisation: a unifying framework for information theoretic feature selection, Journal of Machine Learning Research, vol.13, pp.27-66, 2012.

K. Budhathoki and J. Vreeken, Causal inference by stochastic complexity, 2017.

P. Bühlmann, J. Peters, and J. Ernest, CAM: Causal additive models, high-dimensional order search and penalized regression, The Annals of Statistics, vol.42, issue.6, pp.2526-2556, 2014.

D. M. Chickering, Optimal structure identification with greedy search, Journal of Machine Learning Research, vol.3, pp.507-554, 2002.

D. M. Chickering, D. Heckerman, and C. Meek, Large-sample learning of Bayesian networks is NP-hard, Journal of Machine Learning Research, vol.5, pp.1287-1330, 2004.

K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01433235

D. Colombo and M. H. Maathuis, Order-independent constraint-based causal structure learning, Journal of Machine Learning Research, vol.15, issue.1, pp.3741-3782, 2014.

D. Colombo, M. H. Maathuis, M. Kalisch, and T. S. Richardson, Learning high-dimensional directed acyclic graphs with latent and selection variables, The Annals of Statistics, pp.294-321, 2012.

G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems (MCSS), vol.2, pp.303-314, 1989.

P. Daniusis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel et al., Inferring deterministic causal relations, 2012.

T. G. Dietterich, Ensemble methods in machine learning, International Workshop on Multiple Classifier Systems, pp.1-15, 2000.

G. Doquet and M. Sebag, Agnostic feature selection, 2019.

M. Drton and M. H. Maathuis, Structure learning in graphical modeling, Annual Review of Statistics and Its Application, 2016.

R. Edwards, Fourier analysis on groups, 1964.

M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum et al., Efficient and robust automated machine learning, Advances in Neural Information Processing Systems, vol.28, pp.2962-2970, 2015.

J. A. Fonollosa, Conditional distribution variability measures for causality detection, 2016.

P. Forré and J. M. Mooij, Constraint-based causal discovery for non-linear structural causal models with cycles and latent confounders, 2018.

J. Friedman, T. Hastie, and R. Tibshirani, Sparse inverse covariance estimation with the graphical lasso, Biostatistics, vol.9, issue.3, pp.432-441, 2008.

Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle et al., Domain-adversarial training of neural networks, The Journal of Machine Learning Research, vol.17, issue.1, pp.2096-2030, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01624607

X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp.249-256, 2010.

F. Glover and E. Taillard, A user's guide to tabu search, Annals of Operations Research, vol.41, issue.1, pp.1-28, 1993.

A. S. Goldberger, Reverse regression and salary discrimination, Journal of Human Resources, 1984.

D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro et al., Google Vizier: A service for black-box optimization, Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.1487-1495, 2017.

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley et al., Generative adversarial nets, Neural Information Processing Systems (NIPS), pp.2672-2680, 2014.

A. Gordon, E. Eban, O. Nachum, B. Chen, H. Wu et al., MorphNet: Fast & simple resource-constrained structure learning of deep networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1586-1595, 2018.

O. Goudet, D. Kalainathan, P. Caillou, I. Guyon, D. Lopez-Paz et al., Learning functional causal models with generative neural networks, Explainable and Interpretable Models in Computer Vision and Machine Learning, pp.39-80, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01649153

A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, A kernel method for the two-sample-problem, Advances in Neural Information Processing Systems, vol.19, p.513, 2007.

A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf, Measuring statistical dependence with Hilbert-Schmidt norms, International Conference on Algorithmic Learning Theory, pp.63-77, 2005.

A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf, Kernel methods for measuring independence, Journal of Machine Learning Research, vol.6, pp.2075-2129, 2005.

P. D. Grünwald and P. M. Vitányi, Algorithmic information theory, Handbook of the Philosophy of Information, pp.281-320, 2008.

E. J. Gumbel, Statistical theory of extreme values and some practical applications, NBS Applied Mathematics Series, p.33, 1954.

I. Guyon, Chalearn cause effect pairs challenge, 2013.

I. Guyon, Chalearn fast causation coefficient challenge, 2014.

I. Guyon, A. Statnikov, and B. Bakir Batu, Cause Effect Pairs in Machine Learning, 2019.
URL : https://hal.archives-ouvertes.fr/hal-02433195

K. Hara, D. Saitoh, and H. Shouno, Analysis of dropout learning regarded as ensemble learning, International Conference on Artificial Neural Networks, pp.72-79, 2016.

A. Hauser and P. Bühlmann, Characterization and greedy learning of interventional Markov equivalence classes of directed acyclic graphs, Journal of Machine Learning Research, vol.13, pp.2409-2464, 2012.

K. He, X. Zhang, S. Ren, and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision, pp.1026-1034, 2015.

G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, 2012.

P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf, Nonlinear causal discovery with additive noise models, Neural Information Processing Systems (NIPS), pp.689-696, 2009.

P. O. Hoyer, S. Shimizu, and A. J. Kerminen, Estimation of linear, non-Gaussian causal models in the presence of confounding latent variables, 2006.

P. O. Hoyer, S. Shimizu, A. J. Kerminen, and M. Palviainen, Estimation of causal effects using linear non-Gaussian causal models with hidden variables, International Journal of Approximate Reasoning, vol.49, issue.2, pp.362-378, 2008.

A. Hyvärinen and P. Pajunen, Nonlinear independent component analysis: Existence and uniqueness results, Neural Networks, vol.12, issue.3, pp.429-439, 1999.

G. W. Imbens and D. B. Rubin, Causal inference in statistics, social, and biomedical sciences, 2015.

S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, International Conference on Machine Learning, pp.448-456, 2015.

A. Irrthum, L. Wehenkel, and P. Geurts, Inferring regulatory networks from expression data using tree-based methods, PLoS ONE, vol.5, issue.9, p.12776, 2010.

E. Jang, S. Gu, and B. Poole, Categorical reparameterization with Gumbel-softmax, 2016.

D. Janzing, J. Mooij, K. Zhang, J. Lemeire, J. Zscheischler et al., Information-geometric approach to inferring causal directions, Artificial Intelligence, vol.182, pp.1-31, 2012.

D. Janzing and B. Schölkopf, Causal inference using the algorithmic Markov condition, IEEE Transactions on Information Theory, vol.56, issue.10, pp.5168-5194, 2010.

A. E. Johnson, T. J. Pollard, L. Shen, L. H. Lehman, M. Feng et al., MIMIC-III, a freely accessible critical care database, Scientific Data, vol.3, p.160035, 2016.

D. Kalainathan, O. Goudet, P. Caillou, and M. Sebag, Portraits de travailleurs: Comprendre la qualité de vie au travail [Portraits of workers: Understanding quality of life at work], Notes de la Fabrique, 2018.

D. Kalainathan, O. Goudet, I. Guyon, D. Lopez-paz, and M. Sebag, Structural agnostic modelling: Adversarial learning of causal graphs, 2019.

M. Kalisch and A. Hauser, , 2018.

M. Kalisch, M. Mächler, D. Colombo, M. H. Maathuis, and P. Bühlmann, Causal inference using graphical models with the R package pcalg, Journal of Statistical Software, vol.47, issue.11, pp.1-26, 2012.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Neural Information Processing Systems (NIPS), 2012.

S. Lachapelle, P. Brouillard, T. Deleu, and S. Lacoste-Julien, Gradient-based neural DAG learning, 2019.

S. L. Lauritzen, Graphical models, vol.17, 1996.

J. Lemeire and K. Steenhaut, Inference of graphical causal models: Representing the meaningful information of probability distributions, Causality: Objectives and Assessment, pp.107-120, 2010.

C. Li, W. Chang, Y. Cheng, Y. Yang, and B. Póczos, MMD GAN: Towards deeper understanding of moment matching network, Advances in Neural Information Processing Systems, pp.2203-2213, 2017.

M. Li and P. Vitányi, An introduction to Kolmogorov complexity and its applications, 2013.

Y. Li, K. Swersky, and R. S. Zemel, Generative moment matching networks, ICML, pp.1718-1727, 2015.

D. Lopez-Paz, From dependence to causation, 2016.

D. Lopez-Paz, K. Muandet, B. Schölkopf, and I. O. Tolstikhin, Towards a learning theory of cause-effect inference, ICML, pp.1452-1461, 2015.

D. Lopez-Paz, R. Nishihara, S. Chintala, B. Schölkopf, and L. Bottou, Discovering causal signals in images, 2016.

D. Lopez-Paz and M. Oquab, Revisiting classifier two-sample tests, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01862834

C. Louizos, M. Welling, and D. P. Kingma, Learning sparse neural networks through L0 regularization, 2017.

C. J. Maddison, A. Mnih, and Y. W. Teh, The concrete distribution: A continuous relaxation of discrete random variables, 2016.

D. Marbach, T. Schaffter, D. Floreano, R. J. Prill, and G. Stolovitzky, The DREAM4 in-silico network challenge, 2009.

A. Marx and J. Vreeken, Telling cause from effect using MDL-based local and global regression, 2017 IEEE International Conference on Data Mining (ICDM), pp.307-316, 2017.

P. Mendes, W. Sha, and K. Ye, Artificial gene networks for objective comparison of analysis algorithms, Bioinformatics, vol.19, issue.2, pp.122-129, 2003.

P. Meyer and G. Bontempi, Information-theoretic gene selection in expression data, Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, pp.399-420, 2013.

J. Mitrovic, D. Sejdinovic, and Y. W. Teh, Causal inference via kernel deviance measures, 2018.

O. Mogren, C-RNN-GAN: Continuous recurrent neural networks with adversarial training, 2016.

J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf, Distinguishing cause from effect using observational data: methods and benchmarks, Journal of Machine Learning Research, vol.17, issue.32, pp.1-102, 2016.

P. Nandy, A. Hauser, and M. H. Maathuis, High-dimensional consistency in score-based and hybrid structure learning, 2015.

M. Nauta, Temporal causal discovery and structure learning with attention-based convolutional neural networks, 2018.

X. Nguyen, M. J. Wainwright, and M. I. Jordan, Estimating divergence functionals and the likelihood ratio by convex risk minimization, IEEE Transactions on Information Theory, vol.56, issue.11, pp.5847-5861, 2010.

S. Nowozin, B. Cseke, and R. Tomioka, f-GAN: Training generative neural samplers using variational divergence minimization, Advances in Neural Information Processing Systems, pp.271-279, 2016.

J. M. Ogarrio, P. Spirtes, and J. Ramsey, A hybrid causal search algorithm for latent variable models, Conference on Probabilistic Graphical Models, pp.368-379, 2016.

A. Paszke, S. Gross, and S. Chintala, Automatic differentiation in PyTorch, 2017.

J. Pearl, Causality: models, reasoning and inference, Econometric Theory, vol.19, p.46, 2003.

J. Pearl, Causality, 2009.

J. Pearl and T. Verma, A formal theory of inductive causation, 1991.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

J. Peters and P. Bühlmann, Structural intervention distance (SID) for evaluating causal graphs, 2013.

J. Peters, D. Janzing, and B. Schölkopf, Elements of Causal Inference: Foundations and Learning Algorithms, 2017.

J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf, Causal discovery with continuous additive noise models, The Journal of Machine Learning Research, vol.15, issue.1, pp.2009-2053, 2014.

A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial networks, 2015.

R. Raina, A. Madhavan, and A. Y. Ng, Large-scale deep unsupervised learning using graphics processors, Proceedings of the 26th Annual International Conference on Machine Learning, pp.873-880, 2009.

J. Ramsey, M. Glymour, R. Sanchez-Romero, and C. Glymour, A million variables and more: the fast greedy equivalence search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images, International Journal of Data Science and Analytics, vol.3, issue.2, pp.121-129, 2017.

J. D. Ramsey, Scaling up greedy causal search for continuous variables, 2015.

P. R. Rosenbaum and D. B. Rubin, The central role of the propensity score in observational studies for causal effects, Biometrika, vol.70, issue.1, pp.41-55, 1983.

F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, issue.6, p.386, 1958.

J. Runge, Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information, 2017.

K. Sachs, O. Perez, D. Pe'er, D. A. Lauffenburger, and G. P. Nolan, Causal protein-signaling networks derived from multiparameter single-cell data, Science, vol.308, issue.5721, pp.523-529, 2005.

S. Salehkaleybar, A. Ghassami, N. Kiyavash, and K. Zhang, Learning linear non-Gaussian causal models in the presence of latent variables, 2019.

M. Scanagatta, C. P. de Campos, G. Corani, and M. Zaffalon, Learning Bayesian networks with thousands of variables, Advances in Neural Information Processing Systems, pp.1864-1872, 2015.

T. Schaffter, D. Marbach, and D. Floreano, GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods, Bioinformatics, vol.27, issue.16, pp.2263-2270, 2011.

M. Scutari, Learning Bayesian networks with the bnlearn R package, 2009.

M. Scutari, Package 'bnlearn', 2018.

E. Sgouritsa, D. Janzing, P. Hennig, and B. Schölkopf, Inference of cause and effect with unsupervised inverse regression, AISTATS, 2015.

S. S. Shen-Orr, R. Milo, S. Mangan, and U. Alon, Network motifs in the transcriptional regulation network of Escherichia coli, Nature Genetics, vol.31, issue.1, p.64, 2002.

S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. Kerminen, A linear non-Gaussian acyclic model for causal discovery, Journal of Machine Learning Research, vol.7, pp.2003-2030, 2006.

D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, 2016.

P. Spirtes, C. Glymour, and R. Scheines, Causation, prediction and search, Lecture Notes in Statistics, 1993.

P. Spirtes, C. N. Glymour, and R. Scheines, Causation, prediction, and search, 2000.

P. Spirtes, C. Meek, and T. Richardson, An algorithm for causal inference in the presence of latent variables and selection bias, 1999.

P. Spirtes and K. Zhang, Causal discovery and inference: concepts and recent methodological advances, Applied Informatics, vol.3, 2016.

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, vol.15, issue.1, pp.1929-1958, 2014.

A. Statnikov, M. Henaff, N. I. Lytkin, and C. F. Aliferis, New methods for separating causes from effects in genomics data, BMC Genomics, vol.13, issue.8, p.22, 2012.

O. Stegle, D. Janzing, K. Zhang, J. M. Mooij, and B. Schölkopf, Probabilistic latent variable models for distinguishing between cause and effect, Neural Information Processing Systems (NIPS), pp.1687-1695, 2010.

E. V. Strobl, K. Zhang, and S. Visweswaran, Approximate kernel-based conditional independence tests for fast non-parametric causal discovery, 2017.

R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), pp.267-288, 1996.

I. Tsamardinos, C. F. Aliferis, A. R. Statnikov, and E. Statnikov, Algorithms for large scale Markov blanket discovery, FLAIRS Conference, vol.2, pp.376-380, 2003.

I. Tsamardinos, L. E. Brown, and C. F. Aliferis, The max-min hill-climbing Bayesian network structure learning algorithm, Machine Learning, vol.65, issue.1, pp.31-78, 2006.

S. van de Geer and P. Bühlmann, ℓ0-penalized maximum likelihood for sparse directed acyclic graphs, The Annals of Statistics, vol.41, issue.2, pp.536-567, 2013.

T. Van den Bulcke, K. Van Leemput, B. Naudts, P. van Remortel, H. Ma et al., SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms, BMC Bioinformatics, vol.7, issue.1, p.43, 2006.

T. Verma and J. Pearl, Equivalence and synthesis of causal models, Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI '90, pp.255-270, 1991.

Y. Wang and D. M. Blei, The blessings of multiple causes, 2018.

M. Welling, Intelligence per kilowatt-hour, 2018.

A. Yale, S. Dash, R. Dutta, A. Pavao, D. Kalainathan et al., Privacy preserving synthetic health data, 2018.
URL : https://hal.archives-ouvertes.fr/hal-02160496

M. Yamada, W. Jitkrittum, L. Sigal, E. P. Xing, and M. Sugiyama, High-dimensional feature selection by feature-wise kernelized lasso, Neural Computation, vol.26, issue.1, pp.185-207, 2014.

K. Yu, L. Liu, and J. Li, A unified view of causal and non-causal feature selection, 2018.

Y. Yu, J. Chen, T. Gao, and M. Yu, DAG-GNN: DAG structure learning with graph neural networks, 2019.

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.68, issue.1, pp.49-67, 2006.

R. Zaag, J. P. Tamby, C. Guichard, Z. Tariq, G. Rigaill et al., GEM2Net: from gene expression modeling to -omics networks, a new CATdb module to investigate Arabidopsis thaliana genes involved in stress response, Nucleic Acids Research, vol.43, issue.D1, pp.1010-1017, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01137554

J. Zhang, On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias, Artificial Intelligence, vol.172, pp.1873-1896, 2008.

K. Zhang and A. Hyvärinen, On the identifiability of the post-nonlinear causal model, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp.647-655, 2009.

K. Zhang and A. Hyvärinen, Distinguishing causes from effects using nonlinear acyclic causal models, Causality: Objectives and Assessment, pp.157-164, 2010.

K. Zhang, J. Peters, D. Janzing, and B. Schölkopf, Kernel-based conditional independence test and application in causal discovery, 2012.

X. Zheng, B. Aragam, P. Ravikumar, and E. P. Xing, DAGs with NO TEARS: continuous optimization for structure learning, NeurIPS, pp.9492-9503, 2018.

X. Zheng, B. Aragam, P. Ravikumar, and E. P. Xing, DAGs with NO TEARS: Smooth optimization for structure learning, 2018.