Fast, Stable and Efficient Approximation of Multi-parameter Persistence Modules with MMA
Résumé
Topological data analysis (TDA) is a rapidly growing area of data science which uses the geometry and topology of data sets to produce qualitative multi-scale shape descriptors for subsequent statistical and machine learning tasks. The most common descriptor in TDA is persistent homology, which tracks the topological changes in growing families of subsets of the data set itself, called filtrations, and encodes them in an algebraic object, called a persistence module. The algorithmic and theoretical properties of persistence modules are now well understood in the single-parameter case, that is, when there is only one filtration (e.g., feature scale) to study. In contrast, much less is known in the multi-parameter case, where several filtrations (e.g., scale and density) are used simultaneously. Indeed, the resulting multi-parameter persistence modules are much more complicated and intricate, which dramatically impedes the study of their theoretical properties. However, they usually encode information that is invisible to their singleparameter
counterparts, and are thus much more useful descriptors for applications in data science. As a consequence, a lot of attention has been devoted to the construction of tractable and stable proxies for multi-parameter persistence modules. However, most of the proposed approaches in the literature are still
prohibitively expensive to compute on large-scale data, many are limited to at most two filtrations, and the most tractable sacrifice much of the richness of the multi-parameter setting.
In this article, we introduce a new parameterized family of topological invariants, taking the form
of candidate decompositions, for multi-parameter persistence modules. We prove that our candidate
decompositions are controllable approximations: when restricting to modules that can be decomposed into
interval summands, we establish theoretical results about the approximation error between our candidate
decompositions and the true underlying module in terms of the standard interleaving and bottleneck
distances. Moreover, even when the underlying module does not admit such a decomposition, our
candidate decompositions are nonetheless stable invariants; small perturbations in the underlying module
lead to small perturbations in the candidate decomposition. Then, we introduce MMA (Multipersistence
Module Approximation): an algorithm for computing stable instances of such invariants, which is based
on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter
persistence. By design, MMA can handle an arbitrary number of filtrations, and has bounded complexity
and running time. Finally, we present empirical evidence validating the generalization capabilities and
running time speed-ups of MMA on several data sets.
Origine : Fichiers produits par l'(les) auteur(s)