hal-00194145, version 2

Some non-asymptotic results on resampling in high dimension, I: Confidence regions, II: Multiple tests

Sylvain Arlot (1, 2), Gilles Blanchard (3, 4), Etienne Roquain (5)

The Annals of Statistics 38, 1 (2010) 51-99

Abstract: We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is assumed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can be much larger than the number of observations, and we focus on a non-asymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a direct resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of independent interest. We also discuss the question of accuracy when using Monte-Carlo approximations of the resampled quantities. We present an application of these results to the one-sided and two-sided multiple testing problem, in which we derive several resampling-based step-down procedures providing a non-asymptotic FWER control. We compare our different procedures in a simulation study, and we show that they can outperform Bonferroni's or Holm's procedures as soon as the observed vector has sufficiently correlated coordinates.

  • 1:  Laboratoire d'informatique de l'école normale supérieure (LIENS)
  • CNRS : UMR8548 – Ecole normale supérieure de Paris - ENS Paris
  • 2:  WILLOW (INRIA Rocquencourt)
  • INRIA – Ecole normale supérieure de Paris - ENS Paris – Ecole des Ponts ParisTech – CNRS : UMR8548
  • 3:  Weierstrass Institute for Applied Analysis and Stochastics (WIAS)
  • Forschungsverbund Berlin e.V. (FVB)
  • 4:  Fraunhofer FIRST.IDA (FHG FIRST.IDA)
  • Fraunhofer Institute
  • 5:  Laboratoire de Probabilités et Modèles Aléatoires (LPMA)
  • CNRS : UMR7599 – Université Pierre et Marie Curie [UPMC] - Paris VI – Université Paris VII - Paris Diderot
  • Domain : Mathematics/Statistics
    Statistics/Statistics Theory
  • Keywords : confidence regions – family-wise error – multiple testing – high dimensional data – non-asymptotic error control – resampling – cross-validation – concentration inequalities – resampled quantile
  • Comment : long version of arXiv:math/0701605
  • Available versions : v1 (2007-12-05), v2 (2009-07-06)
 
  • oai:hal.archives-ouvertes.fr:hal-00194145
  • Submitted on: Monday, 6 July 2009 11:25:18
  • Updated on: Thursday, 1 July 2010 14:35:30