Low-rank model with covariates for count data analysis - Centre de mathématiques appliquées (CMAP)
Journal article. Journal of Multivariate Analysis. Year: 2019

Low-rank model with covariates for count data analysis

Abstract

Count data are collected in many scientific and engineering tasks, including image processing, single-cell RNA sequencing and ecological studies. Such data sets often contain missing values, for example because some ecological sites cannot be reached in a certain year. In addition, side information is often available, for example covariates about ecological sites or species. Low-rank methods are popular for denoising and imputing count data, and benefit from a substantial theoretical background. Extensions accounting for covariates have been proposed, but to the best of our knowledge their theoretical and empirical properties have not been thoroughly studied, and little software is available for practitioners. We propose a complete methodology called LORI (Low-Rank Interaction), including a Poisson model, an algorithm, and automatic selection of the regularization parameter, to analyze count tables with covariates. We also derive an upper bound on the estimation error. We provide a simulation study with synthetic data, revealing empirically that LORI improves on state-of-the-art methods in terms of estimation and imputation of the missing values. We illustrate how the method can be interpreted through visual displays with the analysis of a well-known plant abundance data set, and show that the LORI outputs are consistent with known results. Finally, we demonstrate the relevance of the methodology by analyzing a waterbirds abundance table from the French national agency for wildlife and hunting management (ONCFS). The method is available in the R package lori on the Comprehensive R Archive Network (CRAN).
Main file
main_arxiv.pdf (985.43 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01482773, version 1 (03-03-2017)
hal-01482773, version 2 (19-09-2017)
hal-01482773, version 3 (20-03-2018)

Cite

Geneviève Robin, Julie Josse, Éric Moulines, Sylvain Sardy. Low-rank model with covariates for count data analysis. Journal of Multivariate Analysis, 2019, 173. ⟨hal-01482773v3⟩