Narrow-band Deep Filtering for Multichannel Speech Enhancement

Xiaofei Li; Radu Horaud

Preprint, Working Paper. Year: 2019


Xiaofei Li

Role: Author

Interpretation and Modelling of Images and Videos

Radu Horaud

Role: Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Interpretation and Modelling of Images and Videos

Abstract

In this paper we address the problem of multichannel speech enhancement in the short-time Fourier transform (STFT) domain and in the framework of sequence-to-sequence deep learning. A long short-term memory (LSTM) network takes as input a sequence of STFT coefficients associated with one frequency bin of multichannel noisy-speech signals. The network's output is a sequence of single-channel cleaned speech at the same frequency bin. We propose several clean-speech network targets, namely the magnitude ratio mask, the complex ideal ratio mask, the STFT coefficients, and spatial filtering. A prominent feature of the proposed model is that the same LSTM architecture, with identical parameters, is trained across frequency bins. The proposed method is referred to as narrow-band deep filtering. This choice stands in contrast to traditional wide-band speech enhancement methods. The proposed deep filter is able to discriminate between speech and noise by exploiting their different temporal and spatial characteristics: speech is non-stationary and spatially coherent, while noise is relatively stationary and weakly correlated across channels. This is similar in spirit to unsupervised techniques such as spectral subtraction and beamforming. We describe extensive experiments with both mixed signals (noise is added to clean speech) and real signals (live recordings). We empirically evaluate the proposed architecture variants using speech enhancement and speech recognition metrics, and we compare our results with those obtained with several state-of-the-art methods. In light of these experiments, we conclude that narrow-band deep filtering performs very well and generalizes well across speakers and noise types.
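The narrow-band data layout described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the `(channels, freqs, frames)` array layout, the choice of channel 0 as reference, and the clipping of the mask to [0, 1] are all assumptions made for illustration, and `model` is a hypothetical callable standing in for the trained LSTM.

```python
import numpy as np


def narrow_band_sequences(stft):
    """Turn a multichannel STFT into one input sequence per frequency bin.

    stft: complex array of shape (channels, freqs, frames) -- assumed layout.
    Returns a real array of shape (freqs, frames, 2 * channels).
    """
    # Put frequency first so that each bin is an independent sequence.
    per_freq = np.transpose(stft, (1, 2, 0))  # (freqs, frames, channels)
    # Real and imaginary parts of all channels form the per-frame features.
    return np.concatenate([per_freq.real, per_freq.imag], axis=-1)


def magnitude_ratio_mask(clean_ref, noisy_ref, eps=1e-8):
    """Magnitude ratio mask target |S| / |X| at the reference channel.

    Clipping to [0, 1] is an assumption of this sketch.
    """
    ratio = np.abs(clean_ref) / (np.abs(noisy_ref) + eps)
    return np.clip(ratio, 0.0, 1.0)


def enhance(stft, model, ref=0):
    """Apply one shared model to every frequency bin, then mask the reference channel.

    model: hypothetical callable mapping a (frames, 2 * channels) sequence
    to a (frames,) mask; in the paper this role is played by an LSTM.
    """
    seqs = narrow_band_sequences(stft)
    # The same callable (identical parameters) processes each bin separately.
    masks = np.stack([model(seq) for seq in seqs])  # (freqs, frames)
    return masks * stft[ref]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 5, 7)  # 4 channels, 5 frequency bins, 7 frames
    noisy = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    # Stand-in "model" that passes everything through (mask of ones).
    enhanced = enhance(noisy, lambda seq: np.ones(seq.shape[0]))
    print(narrow_band_sequences(noisy).shape, enhanced.shape)
```

Because every frequency bin is treated as a separate sequence, the bins can be stacked along the batch dimension during training, which is what allows a single set of network parameters to be shared across frequencies.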

Keywords

Speech enhancement; Speech denoising; Deep filtering; Recurrent neural networks; LSTM

Domains

Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

multichannel_lstm.pdf (1.07 MB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-02378413

Submitted on: Monday, 25 November 2019, 10:59:14

Last modified on: Wednesday, 3 April 2024, 12:50:03

Long-term archiving on: Wednesday, 26 February 2020, 14:39:35

Dates and versions

hal-02378413 , version 1 (25-11-2019)

hal-02378413 , version 2 (23-09-2020)

Identifiers

HAL Id : hal-02378413 , version 1
ARXIV : 1911.10791

Cite

Xiaofei Li, Radu Horaud. Narrow-band Deep Filtering for Multichannel Speech Enhancement. 2019. ⟨hal-02378413v1⟩


956 views

511 downloads
