D. Moitinho de Almeida, Pattern-based causal feature extraction, Cause-effect Pairs, 2018.

F. Arntzenius, Reichenbach's common cause principle, 2010.

P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe, Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.

S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira et al., A theory of learning from different domains, Machine Learning, vol.79, pp.151-175, 2010.

C. Bishop, Pattern Recognition and Machine Learning, 2006.

B. Boser, I. Guyon, and V. Vapnik, Pattern recognition system using support vectors, US Patent 5,649,068, 1997.

S. Boucheron, O. Bousquet, and G. Lugosi, Theory of classification: A survey of some recent advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005.
URL : https://hal.archives-ouvertes.fr/hal-00017923

L. Breiman, Random forests, Machine Learning, vol.45, pp.5-32, 2001.

T. Chen and C. Guestrin, XGBoost: A scalable tree boosting system, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.785-794, 2016.

D. M. Chickering, Optimal structure identification with greedy search, JMLR, 2002.

P. Daniušis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel, K. Zhang, and B. Schölkopf, Inferring deterministic causal relations, 2012.

J. Fonollosa, Conditional distribution variability measures for causality detection, arXiv.

Y. Freund and R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, vol.55, pp.119-139, 1997.

J. H. Friedman, Greedy function approximation: a gradient boosting machine, Annals of Statistics, pp.1189-1232, 2001.

Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle et al., Domain-adversarial training of neural networks, Journal of Machine Learning Research, vol.17, issue.59, pp.1-35, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01624607

P. Geurts, D. Ernst, and L. Wehenkel, Extremely randomized trees, Machine Learning, vol.63, issue.1, pp.3-42, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00341932

I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2016.

I. Guyon, ChaLearn cause effect pairs challenge, 2013.

I. Guyon, ChaLearn fast causation coefficient challenge, Codalab platform, ChaLearn, 2014.

P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf, Nonlinear causal discovery with additive noise models, NIPS, 2009.

D. Janzing and B. Schölkopf, Causal inference using the algorithmic Markov condition, IEEE Transactions on Information Theory, vol.56, issue.10, pp.5168-5194, 2010.

D. Janzing, J. Mooij, K. Zhang, J. Lemeire, J. Zscheischler, P. Daniušis, B. Steudel, and B. Schölkopf, Information-geometric approach to inferring causal directions, Artificial Intelligence, vol.182, pp.1-31, 2012.

D. Kalainathan, O. Goudet, I. Guyon, D. Lopez-Paz, M. Sebag et al., Structural Agnostic Model, Causal Discovery and Penalized Adversarial Learning, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01864239

H. Kim and Y. Teh, Scaling up the Automatic Statistician: Scalable structure discovery using Gaussian processes, AISTATS, vol.84, pp.575-584, 2018.

D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014.

T. O. Kvalseth, Entropy and correlation: Some comments, IEEE Transactions on Systems, Man, and Cybernetics, vol.17, issue.3, pp.517-519, 1987.

J. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani, Automatic construction and natural-language description of nonparametric regression models, AAAI, 2014.

D. Lopez-Paz, K. Muandet, B. Schölkopf, and I. O. Tolstikhin, Towards a learning theory of cause-effect inference, 2015.

D. Lopez-Paz, R. Nishihara, S. Chintala, B. Schölkopf, and L. Bottou, Discovering causal signals in images, CVPR, 2017.

L. van der Maaten and G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research, vol.9, pp.2579-2605, 2008.

J. Mitrovic, D. Sejdinovic, and Y. Teh, Causal inference via kernel deviance measures, 2018.

J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf, Distinguishing cause from effect using observational data: methods and benchmarks, 2016.

K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf, Kernel mean embedding of distributions: A review and beyond, Foundations and Trends in Machine Learning, vol.10, issue.1-2, pp.1-141, 2017.

J. Pearl, Causality, 2009.

A. Rahimi and B. Recht, Random features for large-scale kernel machines, Advances in Neural Information Processing Systems, pp.1177-1184, 2008.

A. Rahimi and B. Recht, Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning, Advances in Neural Information Processing Systems, pp.1313-1320, 2009.

R. Raina, A. Madhavan, and A. Ng, Large-scale deep unsupervised learning using graphics processors, Proceedings of the 26th Annual International Conference on Machine Learning, pp.873-880, 2009.

W. Rudin, Fourier Analysis on Groups, 1962.

S. Samothrakis, D. Perez, and S. Lucas, Training gradient boosting machines using curve-fitting and information-theoretic features for causal direction detection, 2013.

T. Schaffter, D. Marbach, and D. Floreano, GeneNetWeaver: In silico benchmark generation and performance profiling of network inference methods, Bioinformatics, vol.27, issue.16, pp.2263-2270, 2011.

R. E. Schapire, A brief introduction to boosting, 1999.

R. Scheines, An introduction to causal inference, 1997.

B. Schölkopf, A. Smola, and K. Müller, Kernel principal component analysis, International Conference on Artificial Neural Networks, pp.583-588, 1997.

R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, 2017.

K. Singh, G. Gupta, L. Vig, G. Shroff, and P. Agarwal, Deep convolutional neural networks for pairwise causality, 2017.

P. Spirtes, C. N. Glymour, R. Scheines, D. Heckerman et al., Causation, Prediction, and Search, 2000.

I. Steinwart and A. Christmann, Support Vector Machines, Information Science and Statistics, 2008.

C. Uhler, G. Raskutti, P. Bühlmann, and B. Yu, Geometry of the faithfulness assumption in causal inference, The Annals of Statistics, pp.436-463, 2013.

V. N. Vapnik, Statistical learning theory, Adaptive and learning systems for signal processing, communications and control series, 1998.

N. X. Vinh, J. Epps, and J. Bailey, Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance, Journal of Machine Learning Research, vol.11, pp.2837-2854, 2010.

J. Zhang and P. Spirtes, Strong faithfulness and uniform consistency in causal inference, Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence, p.632

K. Zhang and A. Hyvärinen, On the identifiability of the post-nonlinear causal model, 2009.