Bibliography

P. Algoet, Universal schemes for prediction, gambling and portfolio selection. The Annals of Probability, pp.901-941, 1992.

S. M. Ali and S. D. Silvey, A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society. Series B (Methodological), pp.131-142, 1966.

P. Assouad, Plongements lipschitziens dans R^n, Bull. Soc. Math. France, vol.111, issue.4, pp.429-448, 1983.
DOI : 10.24033/bsmf.1997

URL : http://archive.numdam.org/article/BSMF_1983__111__429_0.pdf

B. Aslan and G. Zech, New test for the multivariate two-sample problem based on the concept of minimum energy, Journal of Statistical Computation and Simulation, vol.1, issue.2, pp.109-119, 2005.
DOI : 10.2307/2289012

R. E. Bellman, Adaptive control processes: a guided tour, 1961.
DOI : 10.1515/9781400874668

L. Baringhaus and C. Franz, On a new multivariate two-sample test, Journal of Multivariate Analysis, vol.88, issue.1, pp.190-206, 2004.
DOI : 10.1016/S0047-259X(03)00079-4

URL : http://doi.org/10.1016/s0047-259x(03)00079-4

G. Biau and L. Györfi, On the asymptotic properties of a nonparametric l1-test statistic of homogeneity. Information Theory, IEEE Transactions on, vol.51, issue.11, pp.3965-3973, 2004.

P. Bickel, A distribution free version of the Smirnov two sample test in the p-variate case. The Annals of Mathematical Statistics, pp.1-23, 1969.

P. Billingsley, Convergence of probability measures, 2013.

C. E. Bonferroni, Teoria statistica delle classi e calcolo delle probabilità, Libreria internazionale Seeber, 1936.

L. Breiman, The individual ergodic theorem of information theory. The Annals of Mathematical Statistics, pp.809-811, 1957.

L. Breiman, Random forests, Machine Learning, pp.5-32, 2001.

P. Bubenik, Statistical topological data analysis using persistence landscapes, J. of Machine Learning Research, vol.16, pp.77-102, 2015.

Y. Benjamini and D. Yekutieli, The control of the false discovery rate in multiple testing under dependency, The Annals of Statistics, vol.29, issue.4, pp.1165-1188, 2001.

F. Chazal, L. J. Guibas, S. Y. Oudot, and P. Skraba, Persistence-based clustering in Riemannian manifolds, ACM SoCG, pp.97-106, 2011.

J. A. Cuesta-Albertos, R. Fraiman, and T. Ransford, A Sharp Form of the Cramér-Wold Theorem, Journal of Theoretical Probability, vol.3, issue.4, pp.201-209, 2007.
DOI : 10.1007/s10959-007-0060-7

G. Casella and R. Berger, Statistical Inference, Biometrics, vol.49, issue.1, 2001.
DOI : 10.2307/2532634

F. Cazals and D. Cohen-Steiner, Reconstructing 3D compact sets, Computational Geometry, vol.45, issue.1-2, pp.1-13, 2011.
DOI : 10.1016/j.comgeo.2011.07.005

URL : https://hal.archives-ouvertes.fr/hal-00849819

K. Chaudhuri and S. Dasgupta, Rates of convergence for nearest neighbor classification, Advances in Neural Information Processing Systems, pp.3437-3445, 2014.

F. Cazals and T. Dreyfus, SBL, the Structural Bioinformatics Library, 2015.

S. Clémençon, M. Depecker, and N. Vayatis, AUC optimization and the two-sample problem, Advances in Neural Information Processing Systems, vol.22, pp.360-368, 2009.

CGAL, Computational Geometry Algorithms Library.

D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature space analysis. Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol.24, issue.5, pp.603-619, 2002.

I. Csiszár, Eine informationstheoretische ungleichung und ihre anwendung auf den beweis der ergodizitat von markoffschen ketten, Magyar. Tud. Akad. Mat. Kutató Int. Közl, vol.8, pp.85-108, 1963.

O. Chapelle, B. Schölkopf, and A. Zien, Semi-supervised learning, 2006.

T. M. Cover and J. A. Thomas, Elements of Information Theory.

A. P. Dawid, Present position and potential developments: Some personal views: Statistical theory: The prequential approach, Journal of the Royal Statistical Society. Series A (General), pp.278-292, 1984.

M. Depecker, Méthodes d'apprentissage statistique pour le scoring, 2010.

L. Devroye, A course in density estimation, 1987.

S. Dasgupta and Y. Freund, Random projection trees and low dimensional manifolds, Proceedings of the 40th annual ACM symposium on Theory of computing, pp.537-546, 2008.

C. Daridon, S. Fleischer, P. Shen, S. Ries, A. Lhéritier et al., Cytokine production by B cells in autoimmunity, in preparation, 2015.

L. Devroye, L. Györfi, and G. Lugosi, A probabilistic theory of pattern recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5

M. Denil, D. Matheson, and N. de Freitas, Consistency of online random forests, Proceedings of The 30th International Conference on Machine Learning, pp.1256-1264, 2013.

M. Dwass, Modified randomization tests for nonparametric hypotheses, The Annals of Mathematical Statistics, vol.28, issue.1, pp.181-187, 1957.

H. Ding and J. Xu, FPTAS for Minimizing Earth Mover's Distance under Rigid Transformations, Algorithms - ESA 2013, pp.397-408, 2013.
DOI : 10.1007/978-3-642-40450-4_34

M. D. Ernst, Permutation methods: a basis for exact inference, Statistical Science, vol.19, issue.4, pp.676-685, 2004.

T. A. L. van Erven, When data compression and statistics disagree: two frequentist challenges for the minimum description length principle, 2010.

H. Edelsbrunner and J. Harer, Computational topology: an introduction.
DOI : 10.1090/mbk/069

D. M. Endres and J. E. Schindelin, A new metric for probability distributions, IEEE Transactions on Information Theory, 2003.

G. Fasano and A. Franceschini, A multidimensional version of the Kolmogorov-Smirnov test, Monthly Notices of the Royal Astronomical Society, vol.225, issue.1, pp.155-170, 1987.
DOI : 10.1093/mnras/225.1.155

R. A. Fisher, On the interpretation of χ² from contingency tables, and the calculation of P, Journal of the Royal Statistical Society, pp.87-94, 1922.

A. T. Fomenko and T. L. Kunii, Topological Modeling for Visualization, 1997.
DOI : 10.1007/978-4-431-66956-2

P. Flajolet, Approximate counting: A detailed analysis, BIT, vol.21, issue.1, pp.113-134, 1985.
DOI : 10.1007/BF01934993

M. Fromont, B. Laurent, M. Lerasle, and P. Reynaud-Bouret, Kernels based tests with non-asymptotic bootstrap approaches for two-sample problem, JMLR: Workshop and Conference Proceedings, pp.23-24, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00913879

R. Fortet and E. Mourier, Convergence de la répartition empirique vers la répartition théorique, Annales scientifiques de l'École normale supérieure, vol.70, issue.3, pp.267-285, 1953.
DOI : 10.24033/asens.1013

M. P. Fay and M. A. Proschan, Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules, Statistics Surveys, vol.4, issue.0, pp.1-39, 2009.
DOI : 10.1214/09-SS051

J. Friedman and L. C. Rafsky, Multivariate generalizations of the wald-wolfowitz and smirnov two-sample tests. The Annals of Statistics, pp.697-717, 1979.

R. Filipovych, S. M. Resnick, and C. Davatzikos, Semi-supervised cluster analysis of imaging data, NeuroImage, vol.54, issue.3, pp.2185-2197, 2011.
DOI : 10.1016/j.neuroimage.2010.09.074

J. Friedman, On multivariate goodness-of-fit and two-sample testing, Proceedings of Phystat2003, 2004.
DOI : 10.2172/826696

J. H. Friedman, S. Steppel, and J. Tukey, A nonparametric procedure for comparing multivariate point sets. Number 153. Stanford Linear Accelerator Center Computation Research Group Technical Memo, 1973.

A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, A kernel method for the two-sample-problem, Advances in neural information processing systems, pp.513-520, 2006.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, A kernel two-sample test, The Journal of Machine Learning Research, vol.13, issue.1, pp.723-773, 2012.

M. P. Gessaman, A Consistent Nonparametric Multivariate Density Estimator Based on Statistically Equivalent Blocks, The Annals of Mathematical Statistics, vol.41, issue.4, pp.1344-1346, 1970.
DOI : 10.1214/aoms/1177696909

A. Gretton, K. Fukumizu, Z. Harchaoui, and B. K. Sriperumbudur, A fast, consistent kernel two-sample test, Advances in neural information processing systems, pp.673-681, 2009.

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A distribution-free theory of nonparametric regression, 2002.
DOI : 10.1007/b97848

L. Gordon and R. A. Olshen, Consistent nonparametric regression from recursive partitioning schemes, Journal of Multivariate Analysis, vol.10, issue.4, pp.611-627, 1980.
DOI : 10.1016/0047-259X(80)90074-3

L. Gordon and R. A. Olshen, Almost surely consistent nonparametric regression from recursive partitioning schemes, Journal of Multivariate Analysis, vol.15, issue.2, pp.147-163, 1984.
DOI : 10.1016/0047-259X(84)90022-8

P. I. Good, Permutation, parametric and bootstrap tests of hypotheses, 2005.

P. Grünwald, The minimum description length principle, 2007.

A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil et al., Optimal kernel choice for large-scale two-sample tests, Advances in Neural Information Processing Systems, pp.1205-1213, 2012.

L. Györfi, The rate of convergence of $k_n$-NN regression estimates and classification rules (Corresp.), IEEE Transactions on Information Theory, vol.27, issue.3, pp.362-364, 1981.
DOI : 10.1109/TIT.1981.1056344

N. Henze, A multivariate two-sample test based on the number of nearest neighbor type coincidences. The Annals of Statistics, pp.772-783, 1988.

J. L. Hodges Jr. and E. L. Lehmann, Estimates of location based on rank tests. The Annals of Mathematical Statistics, pp.598-611, 1963.

N. Henze and M. D. Penrose, On the multivariate runs test. The Annals of Statistics, pp.290-298, 1999.

P. Hall and N. Tajvidi, Permutation tests for equality of distributions in high-dimensional settings, Biometrika, vol.89, issue.2, p.359, 2002.
DOI : 10.1093/biomet/89.2.359

H. Jeffreys, An Invariant Form for the Prior Probability in Estimation Problems, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol.186, issue.1007, pp.453-461, 1946.
DOI : 10.1098/rspa.1946.0056

K.-H. Jöckel, Finite sample properties and asymptotic efficiency of Monte Carlo tests. The Annals of Statistics, pp.336-347, 1986.

S. Kpotufe and S. Dasgupta, A tree-based regressor that adapts to intrinsic dimension, Journal of Computer and System Sciences, vol.78, issue.5, pp.1496-1515, 2012.
DOI : 10.1016/j.jcss.2012.01.002

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=

S. Kullback, M. Kupperman, and H. H. Ku, Tests for Contingency Tables and Markov Chains, Technometrics, vol.4, issue.4, pp.573-608, 1962.
DOI : 10.2307/1266291

A. Kolmogorov, Sulla determinazione empirica delle leggi di probabilita, Giorn. Ist. Ital. Attuari, vol.4, pp.1-11, 1933.

S. Kpotufe, k-nn regression adapts to local intrinsic dimension, Advances in Neural Information Processing Systems, pp.729-737, 2011.

S. S. Kozat, A. C. Singer, and A. J. Bean, Universal portfolios via context trees, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp.2093-2096, 2008.
DOI : 10.1109/ICASSP.2008.4518054

A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis, kernlab - an S4 package for kernel methods in R, Journal of Statistical Software, vol.11, issue.9, pp.1-20, 2004.

J. Kivinen, A. J. Smola, and R. C. Williamson, Online Learning with Kernels, IEEE Transactions on Signal Processing, vol.52, issue.8, pp.2165-2176, 2004.
DOI : 10.1109/TSP.2004.830991

R. Krichevsky and V. Trofimov, The performance of universal encoding. Information Theory, IEEE Transactions on, vol.27, issue.2, pp.199-207, 1981.

S. Kullback, Information Theory and Statistics, 1959.

Y. LeCun and C. Cortes, The MNIST database of handwritten digits.

E. Lehmann, Consistency and unbiasedness of certain nonparametric tests. The Annals of Mathematical Statistics, vol.22, issue.2, pp.165-179, 1951.

J. Lin, Divergence measures based on the Shannon entropy. Information Theory, IEEE Transactions on, vol.37, issue.1, pp.145-151, 1991.

J. Li and R. Y. Liu, New Nonparametric Tests of Multivariate Locations and Scales Using Data Depth, Statistical Science, vol.19, issue.4, pp.686-696, 2004.
DOI : 10.1214/088342304000000594

R. Y. Liu, J. M. Parelius, and K. Singh, Multivariate analysis by data depth: Descriptive statistics, graphics and inference. The Annals of Statistics, pp.783-840, 1999.

E. L. Lehmann and J. P. Romano, Testing statistical hypotheses, 2005.

R. Lopes, I. Reid, and P. R. Hobson, The two-dimensional Kolmogorov-Smirnov test. In XI International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Nikhef, Amsterdam, the Netherlands.

F. Liese and I. Vajda, On divergences and informations in statistics and information theory. Information Theory, IEEE Transactions on, vol.52, pp.4394-4412, 2006.

L. Li and B. Yu, Iterated logarithmic expansions of the pathwise code lengths for exponential families, IEEE Transactions on Information Theory, vol.46, issue.7, pp.2683-2689, 2000.

J. Langford and B. Zadrozny, Estimating class membership probabilities using classifier learners, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pp.198-205, 2005.

G. Marsaglia, Choosing a Point from the Surface of a Sphere, The Annals of Mathematical Statistics, vol.43, issue.2, pp.645-646, 1972.
DOI : 10.1214/aoms/1177692644

G. Malandain and J. Boissonnat, Computing the diameter of a point set, International Journal of Computational Geometry & Applications, vol.12, issue.06, pp.489-510, 2002.
DOI : 10.1142/S0218195902001006

URL : https://hal.archives-ouvertes.fr/inria-00615026

V. Melnykov, W. Chen, and R. Maitra, MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms, Journal of Statistical Software, vol.51, issue.12, pp.511-536, 2012.
DOI : 10.18637/jss.v051.i12

N. Merhav and M. Feder, Universal prediction. Information Theory, IEEE Transactions on, vol.44, issue.6, pp.2124-2147, 1998.

J. W. Miller, R. Goodman, and P. Smyth, On loss functions which minimize to conditional expected values and posterior probabilities. Information Theory, IEEE Transactions on, vol.39, issue.4, pp.1404-1408, 1993.

J. W. Milnor, Morse Theory, 1963.

M. Muja and D. G. Lowe, FLANN, fast library for approximate nearest neighbors, International Conference on Computer Vision Theory and Applications, 2009.

J. Möttönen and H. Oja, Multivariate spatial sign and rank methods, Journal of Nonparametric Statistics, vol.50, issue.2, pp.201-213, 1995.
DOI : 10.2307/1403809

T. Morimoto, Markov Processes and the H-Theorem, Journal of the Physical Society of Japan, vol.18, issue.3, pp.328-331, 1963.
DOI : 10.1143/JPSJ.18.328

URL : https://hal.archives-ouvertes.fr/hal-00658784

J. Maa, D. K. Pearl, and R. Bartoszynski, Reducing multidimensional two-sample data to one-dimensional interpoint comparisons. The Annals of Statistics, pp.1069-1074, 1996.

M. E. Muller, A note on a method for generating points uniformly on n-dimensional spheres, Communications of the ACM, vol.2, issue.4, pp.19-20, 1959.

H. B. Mann and D. R. Whitney, On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, pp.50-60, 1947.

A. Nobel, Histogram regression estimation using data-dependent partitions. The Annals of Statistics, pp.1084-1105, 1996.

J. Neyman and E. S. Pearson, On the Problem of the Most Efficient Tests of Statistical Hypotheses, Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, pp.289-337, 1933.
DOI : 10.1098/rsta.1933.0009

H. Oja, Multivariate nonparametric methods with R: an approach based on spatial signs and ranks, 2010.
DOI : 10.1007/978-1-4419-0468-3

H. Oja and R. H. Randles, Multivariate Nonparametric Tests, Statistical Science, vol.19, issue.4, pp.598-605, 2004.
DOI : 10.1214/088342304000000558

URL : http://projecteuclid.org/download/pdfview_1/euclid.ss/1113832724

J. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, Advances in large margin classifiers, pp.61-74, 1999.

F. Pérez-Cruz, Kullback-Leibler divergence estimation of continuous distributions, IEEE International Symposium on Information Theory, 2008.

J. Peacock, Two-dimensional goodness-of-fit testing in astronomy, Monthly Notices of the Royal Astronomical Society, vol.202, issue.3, pp.615-627, 1983.
DOI : 10.1093/mnras/202.3.615

G. Pau, A. Oles, M. Smith, O. Sklyar et al., EBImage: Image processing toolbox for R.

B. Phipson and G. K. Smyth, Permutation P-values Should Never Be Zero: Calculating Exact P-values When Permutations Are Randomly Drawn, Statistical Applications in Genetics and Molecular Biology, vol.9, issue.1, 2010.
DOI : 10.2202/1544-6115.1585

L. Rabiner, A tutorial on hidden markov models and selected applications in speech recognition, Proceedings of the IEEE, pp.257-286, 1989.

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

J. Rissanen, Stochastic complexity, Journal of the Royal Statistical Society. Series B (Methodological), pp.223-239, 1987.

J. J. Rissanen, Fisher information and stochastic complexity. Information Theory, IEEE Transactions on, vol.42, issue.1, pp.40-47, 1996.

J. Roerdink and A. Meijster, The watershed transform: Definitions, algorithms and parallelization strategies, p.187, 2000.

P. Rosenbaum, An exact distribution-free test comparing two multivariate distributions based on adjacency, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, issue.4, pp.515-530, 2005.

M. D. Reid and R. C. Williamson, Information, divergence and risk for binary experiments, The Journal of Machine Learning Research, vol.12, pp.731-817, 2011.

M. F. Schilling, Multivariate Two-Sample Tests Based on Nearest Neighbors, Journal of the American Statistical Association, vol.11, issue.395, pp.799-806, 1986.
DOI : 10.1214/aos/1176346051

R. Serfling, Depth functions in nonparametric multivariate inference, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol.72, issue.1, 2006.

G. M. Sullivan and R. Feinn, Using effect size - or why the P value is not enough, Journal of Graduate Medical Education, vol.4, issue.3, pp.279-282, 2012.

Y. M. Shtar'kov, Universal sequential coding of single messages, Problemy Peredachi Informatsii, vol.23, issue.3, pp.3-17, 1987.

B. W. Silverman, Density estimation for statistics and data analysis, 1986.

R. Santiago-Mozos, R. Fernández-Lorenzana, F. Pérez-Cruz, and A. Artés-Rodríguez, On the uncertainty in sequential hypothesis testing, 2008 5th IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp.1223-1226, 2008.
DOI : 10.1109/ISBI.2008.4541223

N. Smirnov, Table for estimating the goodness of fit of empirical distributions. The annals of mathematical statistics, pp.279-281, 1948.

J. Silva and S. Narayanan, Universal Consistency of Data-Driven Partitions for Divergence Estimation, 2007 IEEE International Symposium on Information Theory, pp.2021-2025, 2007.
DOI : 10.1109/ISIT.2007.4557518

J. Silva and S. S. Narayanan, Information divergence estimation based on data-dependent partitions, Journal of Statistical Planning and Inference, vol.140, issue.11, pp.3180-3198, 2010.
DOI : 10.1016/j.jspi.2010.04.011

S. Sonnenburg, G. Rätsch, S. Henschel, C. Widmer, J. Behr, A. Zien, F. de Bona, A. Binder, C. Gehl, and V. Franc, The SHOGUN machine learning toolbox, The Journal of Machine Learning Research, vol.11, pp.1799-1802, 2010.

B. Schölkopf and A. J. Smola, Learning with kernels, 2002.

G. Shafer, A. Shen, N. Vereshchagin, and V. Vovk, Test Martingales, Bayes Factors and p-Values, Statistical Science, vol.26, issue.1, pp.84-101, 2011.
DOI : 10.1214/10-STS347

C. J. Stone, Consistent nonparametric regression. The Annals of Statistics, pp.595-620, 1977.

C. J. Stone, Optimal rates of convergence for nonparametric estimators. The Annals of Statistics, pp.1348-1360, 1980.

C. J. Stone, Optimal global rates of convergence for nonparametric regression. The Annals of Statistics, pp.1040-1053, 1982.

L. Song, C. H. Teo, and A. J. Smola, Relative novelty detection, International Conference on Artificial Intelligence and Statistics, pp.536-543, 2009.

R. Tarjan, Data Structures and Network Algorithms, CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics, vol.44, 1983.

V. Vapnik, The nature of statistical learning theory, 1999.

S. van der Pas and P. Grünwald, Almost the best of three worlds: Risk, consistency and optional stopping for the switch criterion in single parameter model selection. arXiv preprint, 2014.

T. van Erven, P. Grünwald, and S. de Rooij, Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the AIC-BIC dilemma, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.23, issue.3, pp.361-417, 2012.
DOI : 10.1111/j.1467-9868.2011.01025.x

N. Verma, S. Kpotufe, and S. Dasgupta, Which spatial partition trees are adaptive to intrinsic dimension?, Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence, pp.565-574, 2009.

J. Veness, K. S. Ng, M. Hutter, and M. Bowling, Context tree switching, Data Compression Conference (DCC), pp.327-336.

E.-J. Wagenmakers, A practical solution to the pervasive problems of p values, 2007.

A. Wald, Sequential Tests of Statistical Hypotheses, The Annals of Mathematical Statistics, vol.16, issue.2, pp.117-186, 1945.
DOI : 10.1214/aoms/1177731118

D. Williams, Probability with martingales, 1991.

Q. Wang, S. R. Kulkarni, and S. Verdú, Divergence estimation of continuous distributions based on data-dependent partitions. Information Theory, IEEE Transactions on, vol.51, issue.9, pp.3064-3074, 2005.

Q. Wang, S. R. Kulkarni, and S. Verdú, Divergence estimation for multidimensional densities via k-nearest-neighbor distances. Information Theory, IEEE Transactions on, vol.55, issue.5, pp.2392-2405, 2009.

F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, The context-tree weighting method: Basic properties. Information Theory, IEEE Transactions on, vol.41, issue.3, pp.653-664, 1995.

F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens, Context weighting for general finite-context sources. Information Theory, IEEE Transactions on, vol.42, issue.5, pp.1514-1520, 1996.

W. Zaremba, A. Gretton, and M. Blaschko, B-test: A non-parametric, low variance kernel two-sample test, Advances in Neural Information Processing Systems, pp.755-763, 2013.

Y. Zuo and R. Serfling, General notions of statistical depth function. The Annals of Statistics, pp.461-482, 2000.