Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization

Xiaofei Li (1), Laurent Girin (1, 2), Radu Horaud (1), Sharon Gannot (3)
(1) PERCEPTION - Interpretation and Modelling of Images and Videos
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann, INPG - Institut National Polytechnique de Grenoble
(2) GIPSA-CRISSP - CRISSP
GIPSA-DPC - Département Parole et Cognition
Abstract: This paper addresses the problem of binaural localization of a single speech source in noisy and reverberant environments. For a given binaural microphone setup, the binaural response corresponding to the direct-path propagation of a single source is a function of the source direction. In practice, this response is contaminated by noise and reverberation. The direct-path relative transfer function (DP-RTF) is defined as the ratio between the direct-path acoustic transfer functions of the two channels. We propose a method to estimate the DP-RTF from the noisy and reverberant microphone signals in the short-time Fourier transform (STFT) domain. First, the convolutive transfer function approximation is adopted to accurately represent the impulse response of the sensors in the STFT domain. Second, the DP-RTF is estimated using the auto- and cross-power spectral densities at each frequency and over multiple frames. In the presence of stationary noise, an inter-frame spectral subtraction algorithm is proposed, which yields estimates of the noise-free auto- and cross-power spectral densities. Finally, the estimated DP-RTFs are concatenated across frequencies and used as a feature vector for localizing the speech source. Experiments with both simulated and real data show that the proposed localization method performs well, even under severely adverse acoustic conditions, and outperforms state-of-the-art localization methods under most of the acoustic conditions.
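
The role of auto- and cross-power spectral densities in relative transfer function estimation can be illustrated with a minimal sketch. This is not the estimator proposed in the paper: it assumes a noise-free, anechoic two-channel signal in which the second channel is simply an attenuated, delayed copy of the first, and it uses a single multiplicative coefficient per frequency rather than the convolutive transfer function model. The function names (`stft`, `estimate_rtf`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def estimate_rtf(x_ref, x_other, n_fft=256, hop=128):
    """Per-frequency ratio of cross-PSD to auto-PSD, averaged over frames."""
    X = stft(x_ref, n_fft, hop)
    Y = stft(x_other, n_fft, hop)
    cross_psd = np.mean(Y * np.conj(X), axis=0)   # cross-PSD between channels
    auto_psd = np.mean(np.abs(X) ** 2, axis=0)    # auto-PSD of the reference
    return cross_psd / auto_psd

# Assumed toy setup: white-noise source, direct path only (no reverberation, no noise).
rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
delay, gain = 3, 0.8                      # hypothetical inter-channel delay (samples) and gain
left = source
right = gain * np.roll(source, delay)     # circular shift stands in for a pure delay

rtf = estimate_rtf(left, right)

# Under these assumptions the transfer ratio is gain * exp(-j * 2 * pi * f * delay).
freqs = np.arange(rtf.size) / 256         # normalized frequency in cycles per sample
expected = gain * np.exp(-2j * np.pi * freqs * delay)
max_err = np.max(np.abs(rtf - expected))
```

In this idealized case the estimate should stay close to the delay-and-gain model at every frequency. With reverberation and noise the simple ratio becomes biased, which is the situation the convolutive model and the inter-frame spectral subtraction described in the abstract are meant to handle.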
Document type:
Journal article
IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2016, 24 (11), pp. 2171-2186. doi:10.1109/TASLP.2016.2598319

Cited literature: 40 references


https://hal.inria.fr/hal-01349691
Contributor: Team Perception
Submitted on: Thursday, July 28, 2016 - 12:49:44
Last modified on: Wednesday, April 11, 2018 - 01:59:41
Document(s) archived on: Saturday, October 29, 2016 - 10:55:08

Files

binaural_localization.pdf
Files produced by the author(s)


Citation

Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot. Estimation of the Direct-Path Relative Transfer Function for Supervised Sound-Source Localization. IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2016, 24 (11), pp. 2171-2186. doi:10.1109/TASLP.2016.2598319. HAL: hal-01349691

