Uncertainty-based learning of acoustic models from noisy data

Alexey Ozerov 1 Mathieu Lagrange 2 Emmanuel Vincent 3, 4
3 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
4 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We consider the problem of acoustic modeling of noisy speech data, where the uncertainty over the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its use at the training stage remains limited to static model adaptation. We introduce a new expectation-maximisation (EM) based technique, which we call uncertainty training, that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique for a GMM-based speaker recognition task on speech data corrupted by real-world domestic background noise, using a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed training algorithm yields a 1% to 2% absolute improvement in speaker recognition accuracy when training from matched, unmatched or multi-condition noisy data. With minor modifications, this algorithm is also applicable to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data, and to data other than audio.
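Illustrative sketch (not the authors' implementation): the idea of EM training from data with Gaussian uncertainty can be shown on a one-dimensional GMM. The sketch assumes each noisy observation equals a clean value plus zero-mean Gaussian error with a known per-sample variance. The function name uncertainty_em_step, its arguments, the variance floor and the toy data are all hypothetical choices made for this example; the paper itself should be consulted for the actual update equations, the multivariate case and the HMM case.

```python
import math
import random


def gauss_pdf(y, mean, var):
    """Density of a scalar Gaussian N(mean, var) evaluated at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def uncertainty_em_step(ys, unc_vars, weights, means, variances):
    """One EM iteration for a 1-D GMM fitted to noisy observations.

    ys[n]       noisy observation (assumed: clean value + Gaussian error)
    unc_vars[n] variance of that error, i.e. the per-sample uncertainty
    Returns (weights, means, variances, log_lik), where log_lik is the
    log-likelihood of the data under the *input* parameters.
    """
    K = len(weights)
    resp_sum = [0.0] * K  # sum of responsibilities per component
    first = [0.0] * K     # sum of responsibility * E[x | y, k]
    second = [0.0] * K    # sum of responsibility * E[x^2 | y, k]
    log_lik = 0.0

    for y, s in zip(ys, unc_vars):
        # E-step: the observation likelihood uses the clean variance
        # inflated by the sample's own uncertainty variance.
        joint = [weights[k] * gauss_pdf(y, means[k], variances[k] + s)
                 for k in range(K)]
        total = sum(joint)
        log_lik += math.log(total)
        for k in range(K):
            gamma = joint[k] / total
            gain = variances[k] / (variances[k] + s)  # Wiener-like gain
            x_hat = means[k] + gain * (y - means[k])  # posterior mean of clean value
            x_var = (1.0 - gain) * variances[k]       # posterior variance of clean value
            resp_sum[k] += gamma
            first[k] += gamma * x_hat
            second[k] += gamma * (x_hat ** 2 + x_var)

    # M-step: re-estimate parameters from the expected clean-data statistics.
    n = len(ys)
    new_weights = [resp_sum[k] / n for k in range(K)]
    new_means = [first[k] / resp_sum[k] for k in range(K)]
    # The 1e-6 floor is an arbitrary safeguard chosen for this sketch.
    new_vars = [max(second[k] / resp_sum[k] - new_means[k] ** 2, 1e-6)
                for k in range(K)]
    return new_weights, new_means, new_vars, log_lik


# Toy usage: two clean clusters observed through sample-dependent noise.
random.seed(0)
clean = ([random.gauss(-2.0, 0.5) for _ in range(300)]
         + [random.gauss(3.0, 0.5) for _ in range(300)])
unc = [random.uniform(0.1, 1.0) for _ in clean]
noisy = [x + random.gauss(0.0, math.sqrt(v)) for x, v in zip(clean, unc)]

w, m, v = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(50):
    w, m, v, ll = uncertainty_em_step(noisy, unc, w, m, v)
print("weights:", w)
print("means:", m)
print("variances:", v)
```

Setting every entry of unc_vars to zero reduces this sketch to conventional EM for a GMM, which is the baseline the abstract compares against.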
Document type: Journal article
Computer Speech and Language, Elsevier, 2013, 27 (3), pp.874-894. 〈10.1016/j.csl.2012.07.002〉

Cited literature [40 references]

https://hal.inria.fr/hal-00717992
Contributor: Emmanuel Vincent
Submitted on: Wednesday, April 17, 2013 - 22:47:50
Last modified on: Thursday, January 11, 2018 - 06:25:24
Document(s) archived on: Thursday, July 18, 2013 - 04:10:14

File

ozerov_CSL12.pdf
Files produced by the author(s)

Citation

Alexey Ozerov, Mathieu Lagrange, Emmanuel Vincent. Uncertainty-based learning of acoustic models from noisy data. Computer Speech and Language, Elsevier, 2013, 27 (3), pp.874-894. 〈10.1016/j.csl.2012.07.002〉. 〈hal-00717992v2〉
