Abstract: Given samples from two distributions, a nonparametric two-sample test
aims to determine whether the two distributions are equal,
based on a test statistic. This statistic may be computed on the whole
dataset, or may be computed on a subset of the dataset by a function
trained on its complement. We propose a third tier, consisting of
functions exploiting a sequential framework to learn the differences
while incrementally processing the data. Sequential processing
naturally allows optional stopping, which makes our test the first
truly sequential nonparametric two-sample test.
We show that any sequential predictor can be turned into a sequential
two-sample test for which a valid $p$-value can be computed, yielding
controlled type I error. We further show that pointwise universal
predictors yield consistent tests, which can be built, in particular,
from a nonparametric regressor based on $k$-nearest neighbors.
Finally, we show that mixtures and switch distributions can be used to
increase power while preserving consistency.
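
To illustrate how a sequential predictor can be turned into a sequential test, the sketch below uses one standard construction, which is not necessarily the one used in the paper. The two samples are pooled into a stream of (point, label) pairs, the predictor guesses each label from the past data and the current point, and the running likelihood ratio against a fixed label proportion is tracked. Under the null hypothesis this ratio is a nonnegative martingale, so by Ville's inequality the reciprocal of its running maximum is a $p$-value that remains valid under optional stopping. The names `KnnPredictor`, `sequential_p_values` and `theta0` are hypothetical, the null label proportion `theta0` is assumed known, and the fixed $k$ is a simplification (a consistent test would presumably need $k$ to grow with the sample size).

```python
import math
import random


class KnnPredictor:
    """Hypothetical sequential predictor for 1-D points: smoothed k-NN label frequency."""

    def __init__(self, k=5):
        self.k = k
        self.past = []  # list of (x, label) pairs seen so far

    def predict(self, x):
        """Estimated probability that the label of x is 1, using past data only."""
        nearest = sorted(self.past, key=lambda p: abs(p[0] - x))[: self.k]
        ones = sum(label for _, label in nearest)
        # Laplace smoothing keeps the estimate strictly inside (0, 1).
        return (ones + 1) / (len(nearest) + 2)

    def update(self, x, label):
        self.past.append((x, label))


def sequential_p_values(stream, predictor, theta0=0.5):
    """Yield a running p-value after each (x, label) pair of the pooled stream.

    Assumption: under the null, labels are i.i.d. Bernoulli(theta0) and
    independent of the points, so the likelihood ratio is a martingale.
    """
    log_lr = 0.0      # log likelihood ratio: predictor versus null
    max_log_lr = 0.0  # running maximum, starting at log(1)
    for x, label in stream:
        q1 = predictor.predict(x)  # prediction made before seeing the label
        q = q1 if label == 1 else 1.0 - q1
        p0 = theta0 if label == 1 else 1.0 - theta0
        log_lr += math.log(q) - math.log(p0)
        max_log_lr = max(max_log_lr, log_lr)
        predictor.update(x, label)
        # Ville's inequality: P(sup_t LR_t >= 1/alpha) <= alpha under the null.
        yield math.exp(-max_log_lr)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative data: label 1 drawn from N(1, 1), label 0 from N(0, 1).
    stream = []
    for _ in range(300):
        label = rng.randint(0, 1)
        stream.append((rng.gauss(float(label), 1.0), label))
    alpha = 0.05
    for t, p in enumerate(sequential_p_values(stream, KnnPredictor(k=7)), 1):
        if p <= alpha:
            print(f"rejected equality after {t} points (p = {p:.4f})")
            break
    else:
        print("no rejection at level", alpha)
```

Because the $p$-value is non-increasing and valid at every step, the loop may stop as soon as it falls below the chosen level, which is the optional stopping property described above.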
https://hal.inria.fr/hal-01135608
Contributor: Frederic Cazals
Submitted on: Tuesday, June 2, 2015 - 7:23:03 PM
Last modification on: Wednesday, February 2, 2022 - 3:58:44 PM
Long-term archiving on: Tuesday, April 25, 2017 - 12:41:17 AM