hal-00688609, version 2
A sliced inverse regression approach for data stream
Marie Chavent (1, 2), Stéphane Girard (a, 3), Vanessa Kuentz (4), Benoît Liquet (5), Thi Mong Ngoc Nguyen (6), Jérôme Saracco (1, 2)
Abstract: In this article, we focus on data arriving sequentially by block in a stream. A semiparametric regression model involving a common EDR (Effective Dimension Reduction) direction is assumed in each block. Our goal is to estimate this direction each time a new block arrives. A simple direct approach consists of pooling all the observed blocks and estimating the EDR direction by the SIR (Sliced Inverse Regression) method. In practice, however, this approach has drawbacks, such as the need to store all the blocks and the long running time for high-dimensional data. To overcome these drawbacks, we propose an adaptive SIR estimator based on the optimization of a quality measure. The proposed approach is faster, both in terms of computational complexity and running time, and reduces data storage requirements. We show the consistency of our estimator at the root-n rate and give its asymptotic distribution. We propose an extension to multiple-index models. We also provide a graphical tool to detect changes in the underlying model, i.e. drift in the EDR direction or aberrant blocks in the data stream. In a simulation study, we illustrate the numerical behavior of our estimator. Finally, we apply it to real data concerning the estimation of physical properties of the Mars surface.
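For orientation, the "simple direct approach" mentioned in the abstract (pool every block, then run SIR) might look roughly like the sketch below. It implements classical single-index SIR only, not the adaptive estimator proposed in the article, whose details are not given here. The function names `sir_direction` and `pooled_sir`, the equal-size slicing on the sorted response, and the default of 10 slices are all assumptions made for illustration.

```python
import numpy as np


def sir_direction(X, y, n_slices=10):
    """Classical single-index SIR estimate of the EDR direction (unit norm)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Centre the covariates and estimate their covariance matrix.
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n

    # Slice the sample on the sorted response (roughly equal slice sizes).
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the centred slice means.
    M = np.zeros((p, p))
    for idx in slices:
        if idx.size == 0:
            continue
        m_h = Xc[idx].mean(axis=0)
        M += (idx.size / n) * np.outer(m_h, m_h)

    # Principal eigenvector of sigma^{-1} M, computed via Cholesky whitening
    # so that a symmetric eigenproblem is solved.
    L = np.linalg.cholesky(sigma)
    A = np.linalg.solve(L, np.linalg.solve(L, M).T).T
    _, V = np.linalg.eigh(A)  # eigenvalues in ascending order
    b = np.linalg.solve(L.T, V[:, -1])
    return b / np.linalg.norm(b)


def pooled_sir(blocks, n_slices=10):
    """Pool a list of (X_t, y_t) blocks and run SIR on the union."""
    X = np.vstack([Xt for Xt, _ in blocks])
    y = np.concatenate([yt for _, yt in blocks])
    return sir_direction(X, y, n_slices)
```

Calling `pooled_sir` again at each new arrival requires keeping every past block in memory and recomputing from scratch, which is the storage and running-time cost the abstract refers to. The estimated direction is identified only up to sign.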
- a – INRIA
- 1 : Institut de Mathématiques de Bordeaux (IMB)
- CNRS : UMR5251 – Université Sciences et Technologies - Bordeaux I – Université Victor Segalen - Bordeaux II
- 2 : CQFD (INRIA Bordeaux - Sud-Ouest)
- INRIA – Université Sciences et Technologies - Bordeaux I – Université Victor Segalen - Bordeaux II – CNRS : UMR5251
- 3 : MISTIS (INRIA Grenoble Rhône-Alpes / LJK Laboratoire Jean Kuntzmann)
- INRIA – Laboratoire Jean Kuntzmann
- 4 : Aménités et dynamiques des espaces ruraux (UR ADBX)
- Irstea
- 5 : Epidémiologie et Biostatistique
- INSERM : U897 – Université Victor Segalen - Bordeaux II – Institut de Santé Publique, d'Épidémiologie et de Développement (ISPED)
- 6 : Institut de Recherche Mathématique Avancée (IRMA)
- CNRS : UMR7501 – Université de Strasbourg
- Domain: Statistics/Methodology
- Available versions: v1 (18-04-2012), v2 (02-10-2012)
- http://hal.inria.fr/hal-00688609
- oai:hal.inria.fr:hal-00688609
- Contributor: Stephane Girard
- Submitted on: Tuesday 2 October 2012, 08:47:31
- Last modified on: Tuesday 2 October 2012, 09:44:54





