A. R. Barron, L. Györfi, and E. C. van der Meulen, Distribution estimation consistent in total variation and in two types of information divergence, IEEE Transactions on Information Theory, vol.38, pp.1437-1454, 1992.

R. E. Bellman, Adaptive control processes: a guided tour, Princeton Legacy Library, vol.2045, 2015.

B. Bigi, Using Kullback-Leibler distance for text categorization, European Conference on Information Retrieval, pp.305-319, 2003.
URL : https://hal.archives-ouvertes.fr/hal-01392500

P. Binev, A. Cohen, W. Dahmen, and R. Devore, Classification algorithms using adaptive partitioning, The Annals of Statistics, vol.42, pp.2141-2163, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01217382

C. M. Bishop, Pattern recognition and machine learning, 2006.

A. Cohen, I. Daubechies, R. Devore, G. Kerkyacharian, and D. Picard, Capturing ridge functions in high dimensions from point queries, Constructive Approximation, vol.35, pp.225-243, 2012.

I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten [An information-theoretic inequality and its application to the proof of ergodicity of Markov chains], Magyar Tud. Akad. Mat. Kutató Int. Közl., vol.8, pp.85-108, 1964.

M. Dash and H. Liu, Feature selection for classification, Intelligent Data Analysis, vol.1, pp.131-156, 1997.

M. Dash and H. Liu, Feature selection for clustering, Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp.110-121, 2000.

C. Daskalakis and Q. Pan, Square Hellinger subadditivity for Bayesian networks and its applications to identity testing, 2016.

R. De Maesschalck, D. Jouan-Rimbaud, and D. L. Massart, The Mahalanobis distance, Chemometrics and Intelligent Laboratory Systems, vol.50, pp.1-18, 2000.

L. Dümbgen and P. Conte-Zerial, On low-dimensional projections of high-dimensional distributions, From Probability to Statistics and Back: High-Dimensional Models and Processes, A Festschrift in Honor of Jon A, pp.91-104, 2013.

A. A. Fedotov, P. Harremoës, and F. Topsøe, Refinements of Pinsker's inequality, IEEE Transactions on Information Theory, vol.49, pp.1491-1498, 2003.

I. K. Fodor, A survey of dimension reduction techniques, 2002.

M. Fornasier, K. Schnass, and J. Vybíral, Learning functions of few arbitrary linear parameters in high dimensions, Foundations of Computational Mathematics, vol.12, pp.229-262, 2012.

A. L. Gibbs and F. E. Su, On choosing and bounding probability metrics, International Statistical Review, vol.70, pp.419-435, 2002.

N. Gunduz and E. Fokoué, Robust classification of high dimension, low sample size data, 2015.

I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, vol.3, pp.1157-1182, 2003.

P. Hall, J. S. Marron, and A. Neeman, Geometric representation of high dimension, low sample size data, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.67, pp.427-444, 2005.

P. Howland, M. Jeon, and H. Park, Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition, SIAM Journal on Matrix Analysis and Applications, vol.25, pp.165-179, 2003.

G. Hughes, On the mean accuracy of statistical pattern recognizers, IEEE Transactions on Information Theory, vol.14, pp.55-63, 1968.

M. W. Iruthayarajan and S. Baskar, Evolutionary algorithms based design of multivariable PID controller, Expert Systems with Applications, vol.36, pp.9159-9167, 2009.

S. Kern, S. D. Müller, N. Hansen, D. Büche, J. Ocenasek et al., Learning probability distributions in continuous evolutionary algorithms: a comparative review, Natural Computing, vol.3, pp.77-112, 2004.

H. Kim, H. Park, and H. Zha, Distance preserving dimension reduction for manifold learning, Proceedings of the 2007 SIAM International Conference on Data Mining, SIAM, pp.527-532, 2007.

S. Kullback and R. A. Leibler, On information and sufficiency, The Annals of Mathematical Statistics, vol.22, pp.79-86, 1951.

N. Langrené and X. Warin, Fast and stable multivariate kernel density estimation by fast sum updating, Journal of Computational and Graphical Statistics, pp.1-27, 2019.

B. Liu, Y. Wei, Y. Zhang, and Q. Yang, Deep neural networks for high dimension, low sample size data, IJCAI, pp.2287-2293, 2017.

C. Liu and H. Shum, Kullback-Leibler boosting, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol.1, pp.I-I, 2003.

Y. Liu, D. N. Hayes, A. Nobel, and J. S. Marron, Statistical significance of clustering for high-dimension, low-sample size data, Journal of the American Statistical Association, vol.103, pp.1281-1293, 2008.

M. Lopes, M. Fauvel, S. Girard, and D. Sheeren, High dimensional Kullback-Leibler divergence for grassland management practices classification from high resolution satellite image time series, 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp.3342-3345, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01366208

S. Mayer, T. Ullrich, and J. Vybíral, Entropy and sampling numbers of classes of ridge functions, vol.42, 2015.

E. Meckes, Approximation of projections of random vectors, Journal of Theoretical Probability, vol.25, pp.333-352, 2012.

J. Navarro, A simple proof for the multivariate Chebyshev inequality, 2013.

I. Nourdin and G. Poly, Convergence in law implies convergence in total variation for polynomials in independent Gaussian, gamma or beta random variables, High Dimensional Probability VII, pp.381-394, 2016.
URL : https://hal.archives-ouvertes.fr/hal-00821911

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion et al., Scikit-learn: Machine learning in Python, Journal of Machine Learning Research, vol.12, pp.2825-2830, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00650905

L. Prechelt, Automatic early stopping using cross validation: quantifying the criteria, Neural Networks, vol.11, pp.761-767, 1998.

J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, and A. J. Rubio, A new Kullback-Leibler VAD for speech recognition in noise, IEEE Signal Processing Letters, vol.11, pp.266-269, 2004.

A. Shemyakin, Hellinger distance and non-informative priors, Bayesian Analysis, vol.9, pp.923-938, 2014.

S. Smale and D. Zhou, Estimating the approximation error in learning theory, Analysis and Applications, vol.1, pp.17-41, 2003.

G. V. Trunk, A problem of dimensionality: A simple example, IEEE Transactions on Pattern Analysis & Machine Intelligence, pp.306-307, 1979.

A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig, Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol.22, pp.181-190, 2013.

S. Wold, K. Esbensen, and P. Geladi, Principal component analysis, Chemometrics and Intelligent Laboratory Systems, vol.2, pp.37-52, 1987.

S. Xiang, F. Nie, and C. Zhang, Learning a Mahalanobis distance metric for data clustering and classification, Pattern Recognition, pp.3600-3612, 2008.