Nonparametric uncertainty estimation and propagation for noise robust ASR

Dung T. Tran (1), Emmanuel Vincent (2), Denis Jouvet (2)
(1) PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
(2) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We consider the framework of uncertainty propagation for automatic speech recognition (ASR) in highly non-stationary noise environments. Uncertainty is considered as the variance of speech distortion. Yet, its accurate estimation in the spectral domain and its propagation to the feature domain remain difficult. Existing methods typically rely on a single uncertainty estimator and propagator fixed by mathematical approximation. In this paper, we propose a new paradigm where we seek to learn more powerful mappings to predict uncertainty from data. We investigate two such mappings: linear fusion of multiple uncertainty estimators/propagators and nonparametric uncertainty estimation/propagation. In addition, we propose a procedure to propagate the estimated spectral-domain uncertainty to the static Mel frequency cepstral coefficients (MFCCs), to the log-energy, and to their first- and second-order time derivatives. This results in a full uncertainty covariance matrix over both static and dynamic MFCCs. Experimental evaluation on Tracks 1 and 2 of the 2nd CHiME Challenge resulted in up to 29% and 28% relative keyword error rate reduction, respectively, with respect to speech enhancement alone.
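To illustrate why propagating uncertainty to time derivatives yields a full (non-diagonal) covariance matrix, here is a minimal sketch. It is not the authors' procedure. It assumes a single cepstral coefficient whose per-frame uncertainty variances are independent across frames, and it assumes dynamic features are computed with a standard regression window, d_t = sum_k k (c_{t+k} - c_{t-k}) / (2 sum_k k^2), with edge frames replicated. Because this operation is linear, a covariance Sigma maps to A Sigma A^T. The function names and the window half-width of 2 are hypothetical choices made for illustration only.

```python
def delta_matrix(num_frames, half_window=2):
    """Return the T x T matrix D such that D applied to c gives regression deltas.

    Assumption: edge frames are replicated (indices are clamped).
    """
    norm = 2.0 * sum(k * k for k in range(1, half_window + 1))
    D = [[0.0] * num_frames for _ in range(num_frames)]
    for t in range(num_frames):
        for k in range(1, half_window + 1):
            right = min(t + k, num_frames - 1)
            left = max(t - k, 0)
            D[t][right] += k / norm
            D[t][left] -= k / norm
    return D


def propagate_covariance(A, variances):
    """Compute A diag(variances) A^T for a linear map y = A x."""
    rows = len(A)
    cols = len(variances)
    return [
        [sum(A[i][t] * variances[t] * A[j][t] for t in range(cols))
         for j in range(rows)]
        for i in range(rows)
    ]


def static_and_delta_covariance(variances, half_window=2):
    """Covariance of the stacked vector [static; delta] for one coefficient.

    Assumption: the static uncertainties are independent across frames,
    so their covariance is diag(variances).
    """
    T = len(variances)
    identity = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    A = identity + delta_matrix(T, half_window)  # stack rows: [I; D]
    return propagate_covariance(A, variances)


if __name__ == "__main__":
    cov = static_and_delta_covariance([1.0] * 5)
    # Rows 0..4 are static frames, rows 5..9 are delta frames.
    for row in cov:
        print(" ".join(f"{value:6.2f}" for value in row))
```

Even with a diagonal input, the off-diagonal entries linking a delta frame to the neighbouring static frames are non-zero, which is the reason a diagonal-only treatment of dynamic features discards information.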
Document type:
Journal article
IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2015, 23 (11), pp.1835-1846. 〈10.1109/TASLP.2015.2450497〉

Cited literature: 39 references

https://hal.inria.fr/hal-01114329
Contributor: Dung Tran
Submitted on: Friday, July 17, 2015 - 13:31:09
Last modified on: Thursday, January 11, 2018 - 06:27:31
Document(s) archived on: Wednesday, April 26, 2017 - 07:26:59

File

FinalVersion.pdf
Files produced by the author(s)

Citation

Dung T. Tran, Emmanuel Vincent, Denis Jouvet. Nonparametric uncertainty estimation and propagation for noise robust ASR. IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2015, 23 (11), pp.1835-1846. 〈10.1109/TASLP.2015.2450497〉. 〈hal-01114329v2〉
