Sketching for Large-Scale Learning of Mixture Models

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we first sketch the data by computing random generalized moments of the underlying probability distribution, then estimate mixture model parameters from the sketch using an iterative algorithm analogous to greedy sparse signal recovery. We exemplify our framework with the sketched estimation of Gaussian Mixture Models (GMMs). We experimentally show that our approach yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and fewer computations when the number of database elements is large. We report large-scale experiments in speaker verification, where our approach makes it possible to fully exploit a corpus of 1000 hours of speech signal to learn a universal background model at scales computationally inaccessible to EM.
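The sketching step described in the abstract can be illustrated with a minimal example. The abstract does not say which generalized moments are used, so this sketch assumes they are samples of the empirical characteristic function at random Gaussian frequencies. The function names (`draw_frequencies`, `sketch`, `gmm_sketch`), the diagonal-covariance restriction, and all numeric values are hypothetical choices made for illustration. The greedy recovery of parameters from the sketch is not shown.

```python
import numpy as np

def draw_frequencies(m, d, scale=1.0, seed=0):
    """Draw m random frequency vectors in R^d (assumed Gaussian here)."""
    rng = np.random.default_rng(seed)
    return rng.normal(scale=scale, size=(m, d))

def sketch(X, W):
    """Empirical sketch: average of exp(i * w_j^T x_n) over the N data points.

    X has shape (N, d), W has shape (m, d); the result has shape (m,).
    """
    return np.exp(1j * (X @ W.T)).mean(axis=0)

def gmm_sketch(weights, means, variances, W):
    """Closed-form sketch of a GMM with diagonal covariances.

    The characteristic function of N(mu, diag(s)) at frequency w is
    exp(i * w^T mu - 0.5 * sum_k w_k^2 * s_k).
    """
    phase = W @ means.T                 # (m, K)
    damping = (W ** 2) @ variances.T    # (m, K)
    return np.exp(1j * phase - 0.5 * damping) @ weights

# Toy data: a 2-component GMM in 2 dimensions (illustrative values only).
rng = np.random.default_rng(1)
weights = np.array([0.4, 0.6])
means = np.array([[-2.0, 0.0], [2.0, 1.0]])
variances = np.array([[0.5, 0.5], [1.0, 0.3]])
N = 20_000
labels = rng.choice(2, size=N, p=weights)
X = means[labels] + np.sqrt(variances[labels]) * rng.normal(size=(N, 2))

W = draw_frequencies(m=50, d=2)
z_data = sketch(X, W)                                # computed from the data
z_model = gmm_sketch(weights, means, variances, W)   # computed from parameters
err = np.max(np.abs(z_data - z_model))
print(f"max |z_data - z_model| = {err:.4f}")

# The sketch is a plain average, so sketches of equal-sized chunks can be
# computed separately and averaged afterwards.
half = N // 2
z_merged = 0.5 * (sketch(X[:half], W) + sketch(X[half:], W))
```

Under these assumptions the sketch has a fixed size `m` regardless of `N`. Estimating the mixture then amounts to finding parameters whose closed-form sketch is close to `z_data`.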

Keywords

database sketch; Compressed Sensing; compressed learning; Gaussian mixture

Domains

Machine Learning [stat.ML]; Signal and Image Processing [eess.SP]; Probability [math.PR]

Main file

paper.pdf (611.53 KB)

Origin: Files produced by the author(s)

Contributor: Nicolas Keriven

https://inria.hal.science/hal-01208027

Submitted on: Thursday, October 1, 2015, 17:01:24

Last modified on: Friday, March 24, 2023, 14:53:01

Long-term archiving on: Saturday, January 2, 2016, 11:24:33

Dates and versions

hal-01208027, version 1 (2015-10-01)

hal-01208027, version 2 (2015-10-23)

hal-01208027, version 3 (2016-03-01)

Identifiers

HAL Id: hal-01208027, version 1

Cite

Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Pérez. Sketching for Large-Scale Learning of Mixture Models. 2015. ⟨hal-01208027v1⟩

