https://hal.inria.fr/hal-02404295

Kim, Jisu: DATASHAPE - Understanding the Shape of Data - CRISAM - Inria Sophia Antipolis - Méditerranée - Inria - Institut National de Recherche en Informatique et en Automatique - Inria Saclay - Ile de France - Inria - Institut National de Recherche en Informatique et en Automatique
Shin, Jaehyeok: Statistics Department, Carnegie Mellon University - CMU - Carnegie Mellon University [Pittsburgh]
Rinaldo, Alessandro: Statistics Department, Carnegie Mellon University - CMU - Carnegie Mellon University [Pittsburgh]
Wasserman, Larry: Statistics Department, Carnegie Mellon University - CMU - Carnegie Mellon University [Pittsburgh]; Machine Learning Department [Carnegie Mellon Univ.] - CMU - Carnegie Mellon University [Pittsburgh]

Uniform Convergence of the Kernel Density Estimator Adaptive to Intrinsic Volume Dimension

HAL CCSD
2019
[MATH.MATH-ST] Mathematics [math]/Statistics [math.ST]
KIM, Jisu
2019-12-31 15:01:48; 2022-02-04 03:09:34; 2020-01-07 10:48:51
en
Conference papers
https://hal.inria.fr/hal-02404295/document
application/pdf

We derive concentration inequalities for the supremum norm of the difference between a kernel density estimator (KDE) and its point-wise expectation that hold uniformly over the selection of the bandwidth and under weaker conditions on the kernel and the data generating distribution than previously used in the literature. We first propose a novel concept, called the volume dimension, to measure the intrinsic dimension of the support of a probability distribution based on the rates of decay of the probability of vanishing Euclidean balls. Our bounds depend on the volume dimension and generalize the existing bounds derived in the literature.
In particular, when the data-generating distribution has a bounded Lebesgue density or is supported on a sufficiently well-behaved lower-dimensional manifold, our bound recovers the convergence rates known in the literature, which depend on the intrinsic dimension of the support. At the same time, our results apply to more general cases, such as distributions with unbounded densities or distributions supported on a mixture of manifolds with different dimensions. Analogous bounds are derived for derivatives of the KDE of any order. Our results are generally applicable but are especially useful for problems in geometric inference and topological data analysis, including level set estimation, density-based clustering, modal clustering and mode hunting, ridge estimation, and persistent homology.
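
The quantity bounded in the abstract, the supremum norm of the difference between a KDE and its point-wise expectation, can be illustrated numerically in a toy setting. The sketch below is not taken from the paper: it assumes one-dimensional standard normal data and a Gaussian kernel, a case where the expectation of the KDE is known in closed form (the N(0, 1 + h^2) density), and it approximates the supremum by a maximum over a finite grid. The function names, the sample size, the seed, and the bandwidth values are all hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) on the real line."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Kernel density estimate at x: (1 / (n h)) * sum_i K((x - X_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def expected_kde(x, h):
    """Point-wise expectation of the KDE, assuming X_i ~ N(0, 1).

    Convolving N(0, 1) with a Gaussian kernel of bandwidth h gives the
    N(0, 1 + h^2) density. This closed form is specific to this toy setup.
    """
    var = 1.0 + h * h
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def sup_deviation(sample, h, grid):
    """Maximum over the grid of |KDE - E[KDE]|, a grid proxy for the sup norm."""
    return max(abs(kde(x, sample, h) - expected_kde(x, h)) for x in grid)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # assumed toy data
grid = [-4.0 + 0.1 * k for k in range(81)]  # evaluation points on [-4, 4]

# Report the grid-based deviation for a few illustrative bandwidths.
for h in (0.2, 0.4, 0.8):
    print(h, sup_deviation(sample, h, grid))
```

Running the loop over several bandwidths on the same sample mirrors, in a very limited way, the idea of controlling the deviation across a range of bandwidths at once. A bounded density on the real line is the simplest case the abstract mentions, so this toy setup says nothing about unbounded densities or manifold-supported data.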