On the sampling distribution of an $\ell^2$ norm of the Empirical Distribution Function, with applications to two-sample nonparametric testing
Abstract
We consider the situation where two samples $\{X_1, \ldots, X_{n^{(1)}}\}$ and $\{Y_1, \ldots, Y_{n^{(2)}}\}$ of independent real-valued observations are obtained from unknown distributions $\{F_X, F_Y\}$ under different conditions, with empirical distribution functions $\{\hat{F}_X, \hat{F}_Y\}$. It is well known that, under the null hypothesis $F_X \equiv F_Y \equiv F$, the sampling distribution of the $\ell^{\infty}$ maximum distance, $|| \hat{F}_X - \hat{F}_Y ||_{\infty}$, has an asymptotic density of standard form that does not depend on $F$. This result underpins the popular two-sample Kolmogorov-Smirnov test. In this article we show that other norms exist for which the asymptotic sampling distribution is also available in standard form. In particular, we describe a weighted $\ell^2$ norm, $|| \hat{F}_X - \hat{F}_Y ||_{2}^w$, derived from a binary recursion of $\mathbb{R}$, whose asymptotic distribution is shown to be that of a sum of $\chi^2$ random variables. This motivates a nonparametric test based on the average divergence, $|| \hat{F}_X - \hat{F}_Y||_{2}^w$, rather than the maximum, $|| \hat{F}_X - \hat{F}_Y ||_{\infty}$. We demonstrate that this test exhibits greater sensitivity to changes in scale and tail characteristics when $F_X \ne F_Y$, while maintaining power for changes in location.
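To make the contrast between a maximum and an average divergence concrete, the following minimal Python sketch computes both kinds of distance between two empirical distribution functions. It is an illustration only: the function names `ecdf`, `sup_distance` and `mean_squared_distance` are hypothetical, and the second statistic is a plain unweighted mean of squared differences over the pooled sample. It is not the weighted norm $|| \cdot ||_{2}^w$ obtained from the binary recursion, whose weights are not specified in this abstract.

```python
import numpy as np


def ecdf(sample, grid):
    """Evaluate the empirical distribution function of `sample` at each point of `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    # side="right" counts the observations that are <= each grid point
    return np.searchsorted(sample, grid, side="right") / sample.size


def sup_distance(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute EDF gap."""
    # The EDF difference is a step function that changes only at pooled
    # sample points, so evaluating it there is enough to find the maximum.
    grid = np.concatenate([x, y])
    return float(np.max(np.abs(ecdf(x, grid) - ecdf(y, grid))))


def mean_squared_distance(x, y):
    """Unweighted mean squared EDF gap over the pooled sample.

    Assumption: a simple stand-in for an average divergence, not the
    weighted l2 norm described in the abstract.
    """
    grid = np.concatenate([x, y])
    return float(np.mean((ecdf(x, grid) - ecdf(y, grid)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.0, scale=1.0, size=200)
    y = rng.normal(loc=0.0, scale=2.0, size=200)  # same location, larger scale
    print("sup distance:", sup_distance(x, y))
    print("mean squared distance:", mean_squared_distance(x, y))
```

Under a pure scale change the two empirical distribution functions cross near the common centre, so the largest gap can stay modest while the gaps accumulated across both tails add up; an averaged statistic is one way to pick up that kind of difference.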
Origin: Files produced by the author(s)