Nonparametric methods for learning and detecting multivariate statistical dissimilarity

Alix Lhéritier 1
1 ABS - Algorithms, Biology, Structure
CRISAM - Inria Sophia Antipolis - Méditerranée
Abstract : In this thesis, we study problems related to learning and detecting multivariate statistical dissimilarity, which are of paramount importance for many statistical learning methods now used in an increasing number of fields. This thesis makes three contributions related to these problems. The first contribution introduces a notion of multivariate nonparametric effect size that sheds light on the nature of the dissimilarity detected between two datasets. Our two-step method first decomposes a dissimilarity measure (the Jensen-Shannon divergence) in order to localize the dissimilarity in the data embedding space, and then aggregates points of high discrepancy that are in spatial proximity into clusters. The second contribution presents the first sequential nonparametric two-sample test. That is, instead of being given two sets of observations of fixed size, observations can be treated one at a time and, when strong enough evidence has been found, the test can be stopped, yielding a more flexible procedure while keeping guaranteed type I error control. Additionally, under certain conditions, when the number of observations tends to infinity, the test has a vanishing probability of type II error. The third contribution consists of a sequential change detection test based on two sliding windows on which a two-sample test is performed, with type I error guarantees. Our test has a controlled memory footprint and, as opposed to state-of-the-art methods that also provide type I error control, has constant time complexity per observation, which makes it suitable for streaming data.
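As a rough illustration of what decomposing the Jensen-Shannon divergence to localize dissimilarity can mean, the sketch below splits the divergence between two discrete distributions into per-outcome contributions that sum to the total. This is not the thesis's method: the thesis addresses multivariate data in an embedding space, whereas this example assumes finite discrete distributions given as probability lists, and the function names `js_pointwise` and `js_divergence` are hypothetical.

```python
import math


def js_pointwise(p, q):
    """Per-outcome contributions to the Jensen-Shannon divergence, in bits.

    Assumption for illustration: p and q are probability lists over the
    same finite set of outcomes (a discrete stand-in for the continuous,
    multivariate setting studied in the thesis).
    """
    contributions = []
    for pi, qi in zip(p, q):
        mi = 0.5 * (pi + qi)  # mixture probability of this outcome
        c = 0.0
        if pi > 0:
            c += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            c += 0.5 * qi * math.log2(qi / mi)
        contributions.append(c)
    return contributions


def js_divergence(p, q):
    """Jensen-Shannon divergence as the sum of its per-outcome contributions."""
    return sum(js_pointwise(p, q))


p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]
for outcome, c in enumerate(js_pointwise(p, q)):
    print(f"outcome {outcome}: contribution {c:.4f} bits")
print(f"total: {js_divergence(p, q):.4f} bits")
```

Outcomes where the two distributions agree contribute nothing, while outcomes where they differ carry the whole divergence, which is the sense in which such a decomposition points to where the dissimilarity lies.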
Document type : Theses

Cited literature : 135 references

https://hal.inria.fr/tel-01245946
Contributor : Abes Star
Submitted on : Friday, March 25, 2016 - 12:28:09 PM
Last modification on : Thursday, January 11, 2018 - 4:47:56 PM

File

2015NICE4072.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-01245946, version 2

Citation

Alix Lhéritier. Nonparametric methods for learning and detecting multivariate statistical dissimilarity. Other [cs.OH]. Université Nice Sophia Antipolis, 2015. English. ⟨NNT : 2015NICE4072⟩. ⟨tel-01245946v2⟩

Metrics

Record views : 335
File downloads : 560