Acoustic Space Learning for Sound-Source Separation and Localization on Binaural Manifolds

In this paper we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence to the 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and we propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods.
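To make the idea of Bayes inversion of a piecewise affine mapping more concrete, here is a minimal sketch in Python. It is not the authors' PPAM implementation: the generative model, the parameter names (`pi`, `c`, `Gamma`, `A`, `b`, `Sigma`) and the toy dimensions are assumptions chosen for illustration, and the parameters are drawn at random instead of being learned by EM. The sketch only shows how, for a mixture of affine Gaussian maps from a low-dimensional variable to a high-dimensional observation, the posterior of the low-dimensional variable is itself a Gaussian mixture that can be computed in closed form.

```python
import numpy as np

def gaussian_logpdf(y, mean, cov):
    """Log-density of a multivariate normal N(mean, cov) evaluated at y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + maha)

def invert_affine_mixture(y, pi, c, Gamma, A, b, Sigma):
    """Posterior p(x | y) for a mixture of K affine Gaussian maps.

    Assumed generative model (illustrative, not taken from the paper):
        k ~ Categorical(pi)
        x | k    ~ N(c[k], Gamma[k])                # low-dimensional, size L
        y | x, k ~ N(A[k] @ x + b[k], Sigma[k])     # high-dimensional, size D
    Returns the weights, means and covariances of the Gaussian-mixture posterior.
    """
    K = len(pi)
    log_w = np.empty(K)
    means, covs = [], []
    for k in range(K):
        # Marginal of y under component k: N(A c + b, Sigma + A Gamma A^T).
        marg_mean = A[k] @ c[k] + b[k]
        marg_cov = Sigma[k] + A[k] @ Gamma[k] @ A[k].T
        log_w[k] = np.log(pi[k]) + gaussian_logpdf(y, marg_mean, marg_cov)
        # Conditional of x given y and k (standard linear-Gaussian result).
        Gamma_inv = np.linalg.inv(Gamma[k])
        At_Sinv = A[k].T @ np.linalg.inv(Sigma[k])
        S = np.linalg.inv(Gamma_inv + At_Sinv @ A[k])
        m = S @ (At_Sinv @ (y - b[k]) + Gamma_inv @ c[k])
        means.append(m)
        covs.append(S)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    return w, np.array(means), np.array(covs)

# Toy example with made-up parameters: L = 2 latent dimensions
# (for instance two direction angles), D = 20 observed dimensions.
rng = np.random.default_rng(0)
L, D, K = 2, 20, 2
pi = np.array([0.5, 0.5])
c = np.array([[-2.0, 0.0], [2.0, 0.0]])
Gamma = np.stack([np.eye(L)] * K)
A = rng.normal(size=(K, D, L))
b = rng.normal(size=(K, D))
Sigma = np.stack([0.01 * np.eye(D)] * K)

# Simulate one observation from the first affine piece.
x_true = np.array([-1.5, 0.3])
y = A[0] @ x_true + b[0] + 0.1 * rng.normal(size=D)

w, means, covs = invert_affine_mixture(y, pi, c, Gamma, A, b, Sigma)
x_hat = w @ means  # posterior mean of the latent variable
print("component weights:", w)
print("posterior mean:", x_hat)
```

Because the full posterior is returned as a mixture, one could report its mean, as done here, or inspect the individual components when the posterior is multimodal. Handling missing data or multiple sources, as described in the abstract, is beyond what this sketch covers.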

Domains

Computer Vision and Pattern Recognition [cs.CV] Machine Learning [stat.ML] Acoustics [physics.class-ph] Signal and Image Processing [eess.SP]

Main file

submission-ijns13.pdf (5.09 MB)

emb_high-ILD.png (834.66 KB)

Origin: Files produced by the author(s)

Format: Figure, Image

Contributor: Perception team

https://inria.hal.science/hal-00960796

Submitted on: Wednesday, March 19, 2014, 09:11:19

Last modified on: Thursday, April 4, 2024, 21:28:55

Long-term archiving on: Monday, April 10, 2017, 01:08:08

Dates and versions

hal-00960796, version 1 (19-03-2014)

Identifiers

HAL Id: hal-00960796, version 1
DOI: 10.1142/S0129065714400036

Cite

Antoine Deleforge, Florence Forbes, Radu Horaud. Acoustic Space Learning for Sound-Source Separation and Localization on Binaural Manifolds. International Journal of Neural Systems, 2015, 25 (1), 21p. ⟨10.1142/S0129065714400036⟩. ⟨hal-00960796⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA LJK LJK_GI LJK_PS LJK_GI_PERCEPTION LJK_PS_MISTIS INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

876 views

456 downloads