Uncertainty-based learning of acoustic models from noisy data

Alexey Ozerov (1), Mathieu Lagrange (2), Emmanuel Vincent (3, 4)
(3) METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
(4) PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We consider the problem of acoustic modeling of noisy speech data, where the uncertainty over the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its use at the training stage remains limited to static model adaptation. We introduce a new Expectation-Maximisation (EM) based technique, which we call uncertainty training, that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique for a GMM-based speaker recognition task on speech data corrupted by real-world domestic background noise, using a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed training algorithm yields a 1% to 2% absolute improvement in speaker recognition accuracy when training from matched, unmatched or multi-condition noisy data. With minor modifications, the algorithm is also applicable to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data, and to data other than audio.
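
To make the idea of training a GMM directly from noisy data with Gaussian uncertainty more concrete, here is a minimal sketch of one EM iteration. It is not the paper's reference implementation: the function name `uncertainty_em_step`, the restriction to diagonal covariances, and the array layout are assumptions made for illustration. Each observation is treated as a clean feature plus zero-mean Gaussian noise with a known per-frame variance. The E-step then scores each frame against a component whose variance is inflated by that uncertainty, and the M-step uses the posterior mean and variance of the clean feature in place of the raw observation.

```python
import numpy as np

def uncertainty_em_step(x, var_x, weights, means, variances):
    """One EM iteration for a diagonal-covariance GMM fitted to noisy features.

    x         : (N, D) noisy feature vectors
    var_x     : (N, D) per-frame diagonal uncertainty variances (assumed known)
    weights   : (K,)   current mixture weights
    means     : (K, D) current component means
    variances : (K, D) current component diagonal variances
    Returns updated (weights, means, variances) and the log-likelihood
    of the noisy data under the *input* parameters.
    """
    # E-step: each component sees the data with its variance inflated
    # by the per-frame uncertainty.
    total_var = variances[None, :, :] + var_x[:, None, :]   # (N, K, D)
    diff = x[:, None, :] - means[None, :, :]                # (N, K, D)
    log_lik = -0.5 * np.sum(
        np.log(2.0 * np.pi * total_var) + diff**2 / total_var, axis=2
    )                                                       # (N, K)
    log_post = np.log(weights)[None, :] + log_lik
    log_norm = np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    gamma = np.exp(log_post - log_norm)                     # responsibilities

    # Posterior moments of the clean feature given the component
    # (a per-dimension Wiener-like gain).
    gain = variances[None, :, :] / total_var
    x_hat = means[None, :, :] + gain * diff                 # posterior mean
    var_hat = (1.0 - gain) * variances[None, :, :]          # posterior variance

    # M-step: standard GMM updates, with the clean-feature moments
    # substituted for the noisy observations.
    nk = gamma.sum(axis=0)                                  # (K,)
    new_weights = nk / x.shape[0]
    new_means = np.einsum("nk,nkd->kd", gamma, x_hat) / nk[:, None]
    resid = x_hat - new_means[None, :, :]
    new_vars = np.einsum("nk,nkd->kd", gamma, resid**2 + var_hat) / nk[:, None]
    return new_weights, new_means, new_vars, float(log_norm.sum())
```

When `var_x` is all zeros, the gain equals one, the posterior mean equals the observation, and the step reduces to conventional EM for a diagonal GMM. In practice a variance floor would likely be needed to keep components from collapsing; that is omitted here for brevity.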

https://hal.inria.fr/hal-00717992
Contributor: Emmanuel Vincent
Submitted on: Wednesday, April 17, 2013 - 10:47:50 PM
Last modification on: Friday, May 17, 2019 - 9:22:06 AM
Long-term archiving on: Thursday, July 18, 2013 - 4:10:14 AM

File

ozerov_CSL12.pdf
Files produced by the author(s)

Citation

Alexey Ozerov, Mathieu Lagrange, Emmanuel Vincent. Uncertainty-based learning of acoustic models from noisy data. Computer Speech and Language, Elsevier, 2013, 27 (3), pp.874-894. ⟨10.1016/j.csl.2012.07.002⟩. ⟨hal-00717992v2⟩
