Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

Abstract: We evaluate some recent developments in recurrent neural network (RNN)-based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs, which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used 'naïvely' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple extensions to the framework based on feature-level fusion are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score to date.
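The abstract names the training objective and the fusion step without defining them. As an illustration only, one common way to write a speech reconstruction objective for mask-based enhancement is the signal approximation error: the network predicts a time-frequency mask, the mask is applied to the noisy magnitude spectrum, and the result is compared with the clean magnitude spectrum. The sketch below assumes that formulation, plus a plain frame-wise concatenation of noisy and enhanced features as one possible form of feature-level fusion. Neither is taken from the paper, and the function names (`signal_approximation_loss`, `fuse_features`) and the list-of-frames representation are hypothetical.

```python
from typing import List, Sequence

# Hypothetical representation: a spectrogram is a list of frames, and each
# frame is a list of per-frequency-bin magnitudes.
Spectrogram = Sequence[Sequence[float]]


def signal_approximation_loss(
    mask: Spectrogram, noisy_mag: Spectrogram, clean_mag: Spectrogram
) -> float:
    """Mean squared error between the masked noisy magnitude and the clean magnitude.

    Assumes mask values lie in [0, 1], e.g. produced by a sigmoid output layer
    of the enhancement network (an assumption, not taken from the paper).
    """
    if not (len(mask) == len(noisy_mag) == len(clean_mag)):
        raise ValueError("all inputs must have the same number of frames")
    total = 0.0
    count = 0
    for m_t, y_t, s_t in zip(mask, noisy_mag, clean_mag):
        if not (len(m_t) == len(y_t) == len(s_t)):
            raise ValueError("all frames must have the same number of bins")
        for m, y, s in zip(m_t, y_t, s_t):
            # Enhanced estimate m * y is compared against the clean target s.
            total += (m * y - s) ** 2
            count += 1
    if count == 0:
        raise ValueError("inputs must not be empty")
    return total / count


def fuse_features(
    noisy_feats: Spectrogram, enhanced_feats: Spectrogram
) -> List[List[float]]:
    """Frame-wise concatenation of noisy and enhanced feature vectors."""
    if len(noisy_feats) != len(enhanced_feats):
        raise ValueError("feature streams must have the same number of frames")
    return [list(n) + list(e) for n, e in zip(noisy_feats, enhanced_feats)]


if __name__ == "__main__":
    noisy = [[2.0, 1.0]]
    clean = [[1.0, 1.0]]
    print(signal_approximation_loss([[0.5, 1.0]], noisy, clean))  # 0.0
    print(signal_approximation_loss([[1.0, 1.0]], noisy, clean))  # 0.5
    print(fuse_features([[1.0, 2.0]], [[3.0]]))  # [[1.0, 2.0, 3.0]]
```

Because the error is measured on the reconstructed magnitude rather than on the mask itself, each bin's optimum is the clean-to-noisy magnitude ratio, clipped to the mask's [0, 1] range.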
Document type:
Conference paper
12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Aug 2015, Liberec, Czech Republic. 2015

Cited literature: 17 references

https://hal.inria.fr/hal-01163493
Contributor: Emmanuel Vincent
Submitted on: Saturday, June 13, 2015, 10:46:45
Last modified on: Thursday, January 11, 2018, 06:27:31
Document(s) archived on: Monday, September 14, 2015, 10:05:47

File

weninger_LVA15.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01163493, version 1

Citation

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, et al. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Aug 2015, Liberec, Czech Republic. 2015. 〈hal-01163493〉

Metrics

Record views: 669

File downloads: 3667