A robust method to count, locate and separate audio sources in a multichannel underdetermined mixture

Simon Arberet; Rémi Gribonval; Frédéric Bimbot

Report (Research Report) Year: 2008


Simon Arberet

Role: Author
PersonId : 852482

Speech and sound data modeling and processing

Rémi Gribonval

Role: Author
PersonId : 1255
IdHAL : remi-gribonval
ORCID : 0000-0002-9450-8125
IdRef : 113181590

Speech and sound data modeling and processing

Frédéric Bimbot

Role: Author
PersonId : 830967

Speech and sound data modeling and processing

Abstract

We propose a method to count the sources in an underdetermined multichannel mixture and to estimate both their mixing directions and the sources themselves. Like DUET-type methods, the approach is based on the hypothesis that the sources have time-frequency representations with limited overlap. However, instead of assuming essentially disjoint representations, we only assume that, in the neighbourhood of some time-frequency points, a single source contributes to the mixture: such time-frequency points can provide robust local estimates of the corresponding source direction. At the core of our contribution is a local confidence measure, inspired by the work of Deville on TIFROM, which detects the time-frequency regions where such robust information is available. A clustering algorithm called DEMIX is proposed to merge the information from all time-frequency regions according to their confidence level. Two variants are proposed, one for instantaneous mixtures and one for anechoic mixtures. In the anechoic case, to overcome the intrinsic ambiguities of phase unwrapping encountered with DUET, we propose a technique similar to GCC-PHAT to estimate time-delay parameters from phase differences between the time-frequency representations of different channels. The resulting method is shown to be robust in conditions where all comparable DUET-like methods fail: (a) when time delays greatly exceed one sample; (b) when the source directions are very close. As an example, experiments show that, in more than 65% of the tested stereophonic mixtures of six speech sources, DEMIX-Anechoic correctly estimates the number of sources and outperforms DUET in accuracy, with a distance error 10 times lower.
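To make the idea of a local confidence measure concrete, here is a minimal sketch for the stereo instantaneous case. The abstract does not give the formula, so the particular choice below (the ratio, in dB, of the two eigenvalues of a local 2x2 scatter matrix) is an assumption made for illustration, and the function name `local_direction_and_confidence` is hypothetical. The intuition is that when a single source dominates a time-frequency neighbourhood, all stereo coefficients line up along one direction, so one eigenvalue dominates the other.

```python
import math

def local_direction_and_confidence(x1, x2, eps=1e-12):
    """Estimate a source direction and a confidence value for one
    time-frequency neighbourhood of a stereo instantaneous mixture.

    x1, x2: equal-length lists of complex time-frequency coefficients
    (left and right channel) taken from the same neighbourhood.
    Returns (theta, confidence_db), where theta is the angle of the
    principal axis in radians, in (-pi/2, pi/2].
    Illustrative assumption: confidence is the eigenvalue ratio in dB.
    """
    # Real part of the 2x2 scatter matrix sum_k x_k x_k^H
    r11 = sum(abs(a) ** 2 for a in x1)
    r22 = sum(abs(b) ** 2 for b in x2)
    r12 = sum((a * b.conjugate()).real for a, b in zip(x1, x2))

    # Closed-form eigenvalues of the symmetric matrix [[r11, r12], [r12, r22]]
    half_trace = 0.5 * (r11 + r22)
    radius = math.hypot(0.5 * (r11 - r22), r12)
    lam_max = half_trace + radius
    lam_min = max(half_trace - radius, 0.0)

    # Angle of the principal eigenvector
    theta = 0.5 * math.atan2(2.0 * r12, r11 - r22)

    # Large when the points are nearly collinear (one dominant source)
    confidence_db = 10.0 * math.log10((lam_max + eps) / (lam_min + eps))
    return theta, confidence_db


# One source alone, mixed with direction (cos 0.3, sin 0.3)
s_a = [1 + 2j, -0.5 + 1j, 2 - 1j, 0.3 + 0.7j]
single = local_direction_and_confidence(
    [math.cos(0.3) * s for s in s_a],
    [math.sin(0.3) * s for s in s_a],
)

# Two overlapping sources, with directions 0.3 and 1.2 radians
s_b = [0.5 - 1j, 1 + 1j, -1 + 0.5j, 2 + 0j]
mixed = local_direction_and_confidence(
    [math.cos(0.3) * a + math.cos(1.2) * b for a, b in zip(s_a, s_b)],
    [math.sin(0.3) * a + math.sin(1.2) * b for a, b in zip(s_a, s_b)],
)
```

In this toy example the single-source neighbourhood recovers the angle 0.3 with a very high confidence, while the neighbourhood with two overlapping sources gets a much lower confidence. A clustering stage could then weight each local direction estimate by its confidence, in the spirit of the merging step the abstract attributes to DEMIX.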

Keywords

blind source separation; multichannel audio; direction of arrival; delay estimation; sparse component analysis

Domains

Sound [cs.SD]

Main file

RR-6593.pdf (725.71 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/inria-00305435

Submitted on: Monday, 4 August 2008, 15:27:22

Last modified on: Friday, 24 March 2023, 14:52:50

Long-term archiving on: Saturday, 26 November 2016, 01:35:14

Dates and versions

inria-00305435 , version 1 (24-07-2008)

inria-00305435 , version 2 (04-08-2008)

Identifiers

HAL Id : inria-00305435 , version 2

Cite

Simon Arberet, Rémi Gribonval, Frédéric Bimbot. A robust method to count, locate and separate audio sources in a multichannel underdetermined mixture. [Research Report] RR-6593, INRIA. 2008, pp.29. ⟨inria-00305435v2⟩


Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA INRIA-RRRT IRISA-D5 INRIA2 LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM

414 views

190 downloads
