Online Monaural Speech Enhancement Using Delayed Subband LSTM

Xiaofei Li; Radu Horaud

doi:10.21437/Interspeech.2020-2091

Conference paper. Year: 2020



Xiaofei Li

Role: Author

Westlake University

Interpretation and Modelling of Images and Videos

Radu Horaud

Role: Author
PersonId: 16183
IdHAL: radu-horaud
ORCID: 0000-0001-5232-024X
IdRef: 032302495

Interpretation and Modelling of Images and Videos

Abstract

This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The proposed method is developed in the short-time Fourier transform (STFT) domain. Online processing requires the signal to be received and processed frame by frame. A paramount feature of the proposed method is that the same LSTM is used across frequencies, which drastically reduces the number of network parameters, the amount of training data, and the computational burden. Training is performed in a subband manner: the input consists of one frequency together with a few context frequencies. The network learns a speech-to-noise discriminative function that relies on the signal stationarity and on the local spectral pattern, and uses it to predict a clean-speech mask at each frequency. To exploit future information, i.e., look-ahead, we propose an output-delayed subband architecture, which allows the unidirectional forward network to process a few future frames in addition to the current frame. We used the proposed method to take part in the DNS (Deep Noise Suppression) real-time speech enhancement challenge. Experiments on the DNS dataset show that the proposed method achieves better scores on the evaluation metrics than the DNS baseline method, which learns the full-band spectra using a gated recurrent unit network.
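The three ideas in the abstract (a subband input of one frequency plus its context frequencies, one network shared across all frequencies, and an output delay that buys look-ahead) can be illustrated with a minimal sketch. This is not the authors' implementation: `ToySharedCell` is a hypothetical stand-in for the shared LSTM (a single-parameter-set recurrent update, not an LSTM), zero-padding of out-of-range bins is an assumption, and applying the mask by multiplying the noisy magnitude is likewise assumed. The sketch reads the output delay as "the output computed at frame t is the mask for frame t - D", so each enhanced frame has seen D future frames at the cost of D frames of latency.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def subband_input(frame: Sequence[float], f: int, n_context: int) -> List[float]:
    """Input vector for frequency f: bins f-n_context .. f+n_context.

    Assumption: bins outside the spectrum are zero-padded.
    """
    n_freq = len(frame)
    return [frame[k] if 0 <= k < n_freq else 0.0
            for k in range(f - n_context, f + n_context + 1)]


def delayed_targets(targets: Sequence[float], delay: int, pad: float = 0.0) -> List[float]:
    """Training-side view of the output delay: the output at step t is
    trained on the target of frame t - delay (the first steps get `pad`)."""
    n = len(targets)
    d = min(delay, n)
    return [pad] * d + list(targets[: n - d])


class ToySharedCell:
    """Hypothetical stand-in for the shared LSTM (NOT an LSTM).

    One parameter set (w, u, b) is shared by all frequencies; only the
    recurrent state h is kept separately per frequency.
    """

    def __init__(self, n_freq: int, w: float = 0.5, u: float = 0.5, b: float = 0.0):
        self.w, self.u, self.b = w, u, b
        self.h = [0.0] * n_freq  # one recurrent state per frequency

    def step(self, f: int, x: Sequence[float]) -> float:
        s = self.w * sum(x) / len(x) + self.u * self.h[f] + self.b
        self.h[f] = math.tanh(s)
        return 1.0 / (1.0 + math.exp(-self.h[f]))  # mask value in (0, 1)


def enhance_online(frames: Sequence[Sequence[float]], n_context: int,
                   delay: int) -> List[List[float]]:
    """Frame-by-frame processing with an output delay of `delay` frames.

    The masks computed at step t are applied to the buffered frame
    t - delay. The last `delay` frames are not flushed in this sketch.
    """
    n_freq = len(frames[0])
    cell = ToySharedCell(n_freq)
    buffer: Deque[Sequence[float]] = deque()
    enhanced: List[List[float]] = []
    for t, frame in enumerate(frames):
        buffer.append(frame)
        # The same cell (same parameters) is applied at every frequency.
        masks = [cell.step(f, subband_input(frame, f, n_context))
                 for f in range(n_freq)]
        if t >= delay:
            past = buffer.popleft()  # frame t - delay
            enhanced.append([m * x for m, x in zip(masks, past)])
    return enhanced
```

For example, `enhance_online([[1.0, 2.0, 3.0, 4.0]] * 5, n_context=1, delay=2)` yields three enhanced frames of four bins each. Because the parameters are shared, the parameter count of the cell does not grow with the number of frequencies, which is the source of the parameter and data savings the abstract describes.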

Keywords

Online monaural speech enhancement; denoising; subband LSTM; output-delayed network

Domains

Computer Vision and Pattern Recognition [cs.CV]; Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

main.pdf (195.42 KB)

Origin: Files produced by the author(s)

Contributor: Perception team

https://inria.hal.science/hal-02907455

Submitted on: Monday, 27 July 2020, 16:06:34

Last modified on: Thursday, 4 April 2024, 20:59:23

Long-term archiving on: Tuesday, 1 December 2020, 07:54:43

Dates and versions

hal-02907455, version 1 (27-07-2020)

Identifiers

HAL Id: hal-02907455, version 1
arXiv: 2005.05037
DOI: 10.21437/Interspeech.2020-2091

Cite

Xiaofei Li, Radu Horaud. Online Monaural Speech Enhancement Using Delayed Subband LSTM. Interspeech 2020, International Speech Communication Association, Oct 2020, Shanghai, China. pp. 2462-2466, ⟨10.21437/Interspeech.2020-2091⟩. ⟨hal-02907455⟩
