Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

Abstract: We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used 'naïvely' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score to date.
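The front-end described in the abstract maps noisy speech features to an estimate of the clean speech. As a rough illustration of the general idea (not the authors' implementation), the following sketch runs a single hand-rolled LSTM cell over frames of a noisy magnitude spectrogram and applies a per-frame sigmoid mask; all weights here are untrained random values, and the network size and mask-based output layer are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell processing one feature frame per step."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate: input, forget, candidate, output.
        self.W = rng.standard_normal((4, n_hidden, n_in)) * 0.1
        self.U = rng.standard_normal((4, n_hidden, n_hidden)) * 0.1
        self.b = np.zeros((4, n_hidden))

    def step(self, x, h, c):
        i = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])  # input gate
        f = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])  # forget gate
        g = np.tanh(self.W[2] @ x + self.U[2] @ h + self.b[2])  # candidate
        o = sigmoid(self.W[3] @ x + self.U[3] @ h + self.b[3])  # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        return h, c

def enhance(noisy_mag, cell, W_out, b_out):
    """Run the LSTM over time and mask each noisy frame (mask in (0, 1))."""
    n_hidden = W_out.shape[1]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    enhanced = np.empty_like(noisy_mag)
    for t, frame in enumerate(noisy_mag):
        h, c = cell.step(frame, h, c)
        mask = sigmoid(W_out @ h + b_out)  # time-frequency mask
        enhanced[t] = mask * frame         # masked magnitude spectrum
    return enhanced

# Toy example: 10 frames of a 64-bin magnitude spectrogram.
rng = np.random.default_rng(1)
noisy = np.abs(rng.standard_normal((10, 64)))
cell = LSTMCell(n_in=64, n_hidden=32)
W_out = rng.standard_normal((64, 32)) * 0.1
b_out = np.zeros(64)
clean_est = enhance(noisy, cell, W_out, b_out)
```

In a real system the weights would be trained discriminatively against a clean-speech reconstruction objective, and the enhanced features would then be passed to the ASR back-end.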
Document type: Conference papers

Cited literature: 17 references

https://hal.inria.fr/hal-01163493
Contributor: Emmanuel Vincent
Submitted on: Saturday, June 13, 2015 - 10:46:45 AM
Last modification on: Saturday, March 30, 2019 - 1:26:26 AM
Archived on: Monday, September 14, 2015 - 10:05:47 AM

File: weninger_LVA15.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-01163493, version 1

Citation

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, et al.. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Aug 2015, Liberec, Czech Republic. ⟨hal-01163493⟩
