Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR
Conference paper, 2015

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, et al.

Abstract

We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used 'naïvely' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score reported to date on this task.
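The enhancement scheme the abstract describes can be illustrated with a minimal sketch: an LSTM reads noisy spectral frames and predicts a time-frequency mask, and training minimizes a signal-approximation (speech reconstruction) loss between the masked noisy magnitudes and the clean ones. This is not the authors' code; the layer sizes, initialization, and helper names below are illustrative assumptions, and the network is untrained.

```python
# Hypothetical sketch of LSTM mask-based enhancement with a
# signal-approximation objective. Sizes and names are assumptions,
# not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Single LSTM layer followed by a sigmoid mask output layer."""
    def __init__(self, n_in, n_hidden):
        s = 0.1
        # the four gates (input, forget, cell, output) stacked row-wise
        self.W = s * rng.standard_normal((4 * n_hidden, n_in))
        self.U = s * rng.standard_normal((4 * n_hidden, n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.Wm = s * rng.standard_normal((n_in, n_hidden))  # mask projection
        self.bm = np.zeros(n_in)
        self.nh = n_hidden

    def forward(self, X):
        """X: (T, n_in) noisy magnitude frames -> (T, n_in) mask in (0, 1)."""
        T, _ = X.shape
        h = np.zeros(self.nh)
        c = np.zeros(self.nh)
        masks = np.empty_like(X)
        for t in range(T):
            z = self.W @ X[t] + self.U @ h + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            c = f * c + i * np.tanh(g)       # cell state update
            h = o * np.tanh(c)               # hidden state
            masks[t] = sigmoid(self.Wm @ h + self.bm)
        return masks

def signal_approximation_loss(mask, noisy_mag, clean_mag):
    """MSE between the masked noisy magnitudes and the clean magnitudes."""
    return np.mean((mask * noisy_mag - clean_mag) ** 2)

# toy data: 20 frames, 32 frequency bins
T, F = 20, 32
clean = np.abs(rng.standard_normal((T, F)))
noisy = clean + 0.5 * np.abs(rng.standard_normal((T, F)))

net = TinyLSTM(n_in=F, n_hidden=16)
mask = net.forward(noisy)
enhanced = mask * noisy   # enhanced features passed on to the ASR back-end
loss = signal_approximation_loss(mask, noisy, clean)
```

Gradient descent on this loss (omitted here) is what makes the training "discriminative with respect to an optimal speech reconstruction objective"; the enhanced magnitudes would then feed the ASR front-end, optionally fused at the feature level with the original noisy features.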

Dates and versions

hal-01163493, version 1 (13-06-2015)

Identifiers

  • HAL Id: hal-01163493, version 1

Cite

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, et al. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Aug 2015, Liberec, Czech Republic. ⟨hal-01163493⟩