Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

Xiaofei Li; Laurent Girin; Sharon Gannot; Radu Horaud

Preprint, Working Paper. Year: 2018

Xiaofei Li

Role: Author

Interpretation and Modelling of Images and Videos

Laurent Girin

Role: Author
PersonId: 3682
IdHAL: laurent-girin
ORCID: 0000-0002-9214-8760
IdRef: 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Interpretation and Modelling of Images and Videos

Sharon Gannot

Role: Author

Bar-Ilan University [Israel]

Radu Horaud

Role: Author
PersonId: 16183
IdHAL: radu-horaud
ORCID: 0000-0001-5232-024X
IdRef: 032302495

Interpretation and Modelling of Images and Videos

Abstract

This paper addresses the problem of audio source recovery from a multichannel noisy convolutive mixture, for source separation and speech enhancement, assuming known mixing filters. We propose to conduct the source recovery in the short-time Fourier transform domain, based on the convolutive transfer function (CTF) approximation. Compared with time-domain filters, the CTF has far fewer taps, and thus fewer near-common zeros among channels and lower computational complexity. This work proposes three source recovery methods: (i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), exploited in the CTF domain and for the multisource case; (ii) a beamforming-like multichannel inverse filtering method that applies single-source MINT together with power minimization, which is suitable when the CTFs of only some of the sources are known; and (iii) a constrained Lasso method, in which the sources are recovered by minimizing their $\ell_1$-norm to impose spectral sparsity, subject to the constraint that the $\ell_2$-norm fitting cost between the microphone signals and the mixture model involving the unknown source signals is less than a tolerance. The noise can be reduced by setting the tolerance to the noise power. Experiments under various acoustic conditions are conducted to evaluate the three proposed methods. Comparisons among them and with baseline methods are presented.
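As a rough illustration of the inverse filtering idea in method (i), the following minimal sketch solves a least-squares MINT problem for a single source at one frequency bin. It is not the authors' implementation: the function names (`convolution_matrix`, `mint_inverse_filters`), the random complex CTF taps, and the sizes `M`, `L`, `K` are all assumptions chosen for the example, and the setting is noise-free.

```python
import numpy as np

def convolution_matrix(h, k):
    """Return the (len(h) + k - 1, k) matrix C with C @ g == np.convolve(h, g)."""
    C = np.zeros((len(h) + k - 1, k), dtype=complex)
    for j in range(k):
        C[j:j + len(h), j] = h
    return C

def mint_inverse_filters(ctfs, k, delay=0):
    """Least-squares MINT (hypothetical helper): find filters g_m of length k
    such that sum_m conv(h_m, g_m) approximates an impulse at index `delay`."""
    H = np.hstack([convolution_matrix(h, k) for h in ctfs])
    d = np.zeros(H.shape[0], dtype=complex)
    d[delay] = 1.0
    g, *_ = np.linalg.lstsq(H, d, rcond=None)
    return g.reshape(len(ctfs), k)

rng = np.random.default_rng(0)

# Assumed toy sizes: M channels, CTF length L, inverse filter length K.
# An exact inverse generally needs M * K >= L + K - 1 and no common zeros.
M, L, K = 3, 8, 4
ctfs = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))

g = mint_inverse_filters(ctfs, K)

# Check that the filtered channels sum to (nearly) a unit impulse.
equalized = sum(np.convolve(h, gm) for h, gm in zip(ctfs, g))
target = np.zeros(L + K - 1, dtype=complex)
target[0] = 1.0
residual = np.linalg.norm(equalized - target)

# Apply the inverse filters to a synthetic STFT-frame sequence of one source.
N = 50
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
mics = [np.convolve(s, h) for h in ctfs]           # noise-free CTF mixing
recovered = sum(np.convolve(x, gm) for x, gm in zip(mics, g))[:N]

print(f"impulse residual: {residual:.2e}")
print(f"max recovery error: {np.max(np.abs(recovered - s)):.2e}")
```

In this toy setting the system has more unknowns than equations, so `np.linalg.lstsq` returns the minimum-norm exact solution; with noise or near-common zeros among channels, a regularized or delayed target would likely be preferable.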

Keywords

Source separation; Short-time Fourier transform; Speech enhancement; Lasso; Convolutive transfer function; Lasso optimization; CTF; Audio source separation; MINT

Domains

Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

bss_mint.pdf (547.6 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01645749

Submitted on: Monday, 26 February 2018, 11:15:19

Last modified on: Wednesday, 3 April 2024, 12:50:03

Dates and versions

hal-01645749, version 1 (23-11-2017)

hal-01645749, version 2 (26-02-2018)

hal-01645749, version 3 (14-05-2018)

Identifiers

HAL Id: hal-01645749, version 2
arXiv: 1711.07911

Cite

Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud. Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function. 2018. ⟨hal-01645749v2⟩
