Conference papers

Online Monaural Speech Enhancement Using Delayed Subband LSTM

Xiaofei Li 1, 2 Radu Horaud 2
2 PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, Grenoble INP - Institut polytechnique de Grenoble - Grenoble Institute of Technology
Abstract : This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The proposed method is developed in the short-time Fourier transform (STFT) domain. Online processing requires that the signal be received and processed frame by frame. A key feature of the proposed method is that the same LSTM is used across all frequencies, which drastically reduces the number of network parameters, the amount of training data required, and the computational burden. Training is performed in a subband manner: the input consists of one frequency together with a few context frequencies. The network learns a speech-to-noise discriminative function that relies on signal stationarity and on the local spectral pattern, and uses it to predict a clean-speech mask at each frequency. To exploit future information, i.e., look-ahead, we propose an output-delayed subband architecture, which allows the unidirectional forward network to process a few future frames in addition to the current frame. We used the proposed method to participate in the DNS real-time speech enhancement challenge. Experiments with the DNS dataset show that the proposed method achieves better scores on the performance measures than the DNS baseline method, which learns the full-band spectra using a gated recurrent unit network.
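The subband input and the output delay described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `subband_inputs` and `align_delayed_outputs`, the parameters `n_context` and `delay`, and the reflect padding at the edges of the frequency axis are assumptions made for illustration. The shared LSTM itself is omitted; the sketch only shows how each frequency becomes its own input sequence and how delayed outputs map back to frames.

```python
import numpy as np


def subband_inputs(mag, n_context):
    """Build one input sequence per frequency bin.

    mag: array of shape (F, T) holding STFT magnitudes.
    Returns an array of shape (F, T, 2 * n_context + 1): for every bin,
    the bin itself plus n_context neighbouring bins on each side.
    """
    n_freq = mag.shape[0]
    # Assumption: reflect-pad the frequency axis so edge bins get full context.
    padded = np.pad(mag, ((n_context, n_context), (0, 0)), mode="reflect")
    width = 2 * n_context + 1
    # Sequence f holds padded bins f .. f + width - 1, i.e. the original
    # bins f - n_context .. f + n_context, with time as the leading axis.
    return np.stack([padded[f:f + width].T for f in range(n_freq)])


def align_delayed_outputs(outputs, delay):
    """Map the outputs of a causal network to the frames they describe.

    outputs: array of shape (F, T). The value emitted at frame t is read as
    the estimate for frame t - delay, so the network has already seen
    `delay` future frames relative to the frame it describes.
    Returns an array of shape (F, T - delay) whose column j is the estimate
    for frame j.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    # The first `delay` outputs describe frames before the signal starts.
    return outputs[:, delay:]
```

Because every frequency yields a sequence of the same shape, the `F` sequences returned by `subband_inputs` could be treated as a batch and passed through a single recurrent network, which is how one set of parameters can serve all frequencies.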

https://hal.inria.fr/hal-02907455
Contributor : Team Perception
Submitted on : Monday, July 27, 2020 - 4:06:34 PM
Last modification on : Friday, March 26, 2021 - 10:24:42 AM
Long-term archiving on : Tuesday, December 1, 2020 - 7:54:43 AM

File

main.pdf
Files produced by the author(s)

Citation

Xiaofei Li, Radu Horaud. Online Monaural Speech Enhancement Using Delayed Subband LSTM. Interspeech 2020, International Speech Communication Association, Oct 2020, Shanghai, China. pp.2462-2466, ⟨10.21437/Interspeech.2020-2091⟩. ⟨hal-02907455⟩
