New efficient algorithms for multiple change-point detection with kernels

Alain Celisse 1, 2, 3 Guillemette Marot 3, 4 Morgane Pierre-Jean 5 Guillem Rigaill 6
3 MODAL - MOdel for Data Analysis and Learning
LPP - Laboratoire Paul Painlevé - UMR 8524, Université de Lille, Sciences et Technologies, Inria Lille - Nord Europe, CERIM - Santé publique : épidémiologie et qualité des soins-EA 2694, Polytech Lille - École polytechnique universitaire de Lille
Abstract : The present paper focuses on the problem of detecting abrupt changes arising in the full distribution of the observations (not only in the mean or variance). Several statistical approaches based on kernels have been proposed to tackle this problem. Although they enjoy good statistical properties (oracle inequality,.. .), they suffer a high computational cost (in terms of time and memory) which makes them computationally inefficient even with small to medium sample sizes (< 10 4). Our work addresses this computational issue by first describing a new efficient and exact algorithm for kernel multiple change-point detection with an improved worst-case complexity that is quadratic in time and linear in space. It allows to deal with medium size signals (≈ 10 5). Secondly, we also design a faster (but approximate) algorithm based on a low-rank approximation to the Gram matrix, which is linear in time and space. This algorithm can be applied to large-scale signals (n = 10 6). These exact and approximate algorithms have been implemented in R and C for various kernels. The computational and statistical performances of these new algorithms have been assessed through empirical experiments. The runtime of our algorithms is observed to be faster than that of other considered procedures. Finally the simulation results empirically confirm the higher statistical accuracy of kernel-based approaches (and their flexibility) to analyze biological profiles made of DNA copy numbers and allele B frequencies.
Complete list of metadatas

Cited literature [36 references]  Display  Hide  Download

https://hal.inria.fr/hal-01413230
Contributor : Alain Celisse <>
Submitted on : Friday, December 9, 2016 - 3:08:41 PM
Last modification on : Thursday, February 21, 2019 - 10:34:08 AM
Long-term archiving on : Thursday, March 23, 2017 - 10:25:38 AM

File

article.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01413230, version 1

Citation

Alain Celisse, Guillemette Marot, Morgane Pierre-Jean, Guillem Rigaill. New efficient algorithms for multiple change-point detection with kernels. 2016. ⟨hal-01413230v1⟩

Share

Metrics

Record views

389

Files downloads

189