Conference papers

Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition

Sunit Sivasankaran¹, Emmanuel Vincent¹, Dominique Fohr¹
¹ MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : We investigate the effect of speaker localization on the performance of speech recognition systems in a multispeaker, multichannel environment. Given the speaker location information, speech separation is performed in three stages. In the first stage, a simple delay-and-sum (DS) beamformer enhances the signal impinging from the speaker location. In the second stage, a neural network uses this enhanced signal to estimate a time-frequency mask corresponding to the localized speaker. In the third stage, this mask is used to compute second-order statistics and derive an adaptive beamformer. We generated a multichannel, multispeaker, reverberated, noisy dataset inspired by the well-studied WSJ0-2mix dataset and studied the performance of the proposed pipeline in terms of word error rate (WER). An average WER of 29.4% was achieved using the ground-truth localization information and 42.4% using the localization information estimated via GCC-PHAT. Although a higher signal-to-interference ratio (SIR) between the speakers was found to improve speech separation performance, equivalent performance was obtained for mixtures with lower SIR values when the speakers were well separated in space.
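The abstract compares ground-truth speaker locations with locations estimated via GCC-PHAT (generalized cross-correlation with phase transform). As a rough illustration of what GCC-PHAT computes, the following is a generic textbook sketch that estimates the time difference of arrival (TDOA) between two microphone signals; it is not the authors' implementation, and the function name `gcc_phat_tdoa` and its parameters are hypothetical.

```python
import numpy as np


def gcc_phat_tdoa(x1, x2, fs, max_tau=None, interp=1):
    """Estimate the delay (in seconds) of x2 relative to x1 with GCC-PHAT.

    A positive result means x2 lags x1. Hypothetical helper for illustration.
    """
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: discard magnitude, keep phase
    cc = np.fft.irfft(cross, n=interp * n)  # generalized cross-correlation
    max_shift = interp * n // 2
    if max_tau is not None:  # optionally restrict to physically possible lags
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Re-order so that index max_shift corresponds to lag 0.
    cc = np.concatenate((cc[len(cc) - max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(interp * fs)
```

For a far-field source and a microphone pair with spacing d, the estimated TDOA τ maps to a direction of arrival measured from broadside as θ = arcsin(c·τ/d), where c is the speed of sound; how the paper converts TDOAs into speaker locations is not stated in the abstract.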
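The first stage of the pipeline is a delay-and-sum (DS) beamformer steered toward the speaker location. A minimal frequency-domain sketch follows, assuming the multichannel short-time Fourier transform (STFT) has already been computed and the target's arrival delay at each microphone relative to a reference microphone is known (for instance from the speaker location or from TDOA estimates). The function `delay_and_sum` and its argument layout are illustrative assumptions, not the paper's code.

```python
import numpy as np


def delay_and_sum(X, freqs, tau):
    """Frequency-domain delay-and-sum beamformer (illustrative sketch).

    X:     (M, F, T) complex STFT of the M microphone signals
    freqs: (F,) center frequency of each STFT bin, in Hz
    tau:   (M,) arrival delay of the target at each mic relative to the
           reference mic, in seconds
    Returns the (F, T) beamformed STFT.
    """
    # Steering vector d_m(f) = exp(-j 2 pi f tau_m) models the target's delays.
    steer = np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])  # (M, F)
    # Compensate each channel's delay (multiply by conj(d)) and average.
    return np.einsum("mf,mft->ft", steer.conj(), X) / X.shape[0]
```

Signals arriving from the steered direction add coherently and pass unchanged, whereas signals with different inter-microphone delays add with mismatched phases and are attenuated, which is why the DS output emphasizes the localized speaker.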
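In the third stage, the time-frequency mask is used to compute second-order statistics from which an adaptive beamformer is derived. The abstract does not name the beamformer, so the sketch below uses a common mask-based minimum variance distortionless response (MVDR) formulation (the Souden et al. variant) purely as an example: mask-weighted spatial covariance matrices are estimated for the target and for everything else, and the filter is w(f) = Φₙ⁻¹Φₛ u / tr(Φₙ⁻¹Φₛ), where u selects a reference microphone. The mask is taken as given here (in the paper it comes from the neural network of the second stage); the function name `mask_mvdr` and the `eps` regularization are assumptions.

```python
import numpy as np


def mask_mvdr(X, mask, ref=0, eps=1e-6):
    """Mask-based MVDR beamformer (Souden formulation), illustrative sketch.

    X:    (M, F, T) complex multichannel STFT
    mask: (F, T) target mask in [0, 1]
    ref:  index of the reference microphone
    Returns the (F, T) beamformed STFT.
    """
    M, F, T = X.shape
    Y = np.empty((F, T), dtype=complex)
    for f in range(F):
        x = X[:, f, :]  # (M, T) observations in this frequency bin
        m = mask[f]     # (T,) target mask for this bin
        # Mask-weighted spatial covariance (second-order statistics).
        phi_s = (m * x) @ x.conj().T / max(m.sum(), eps)
        phi_n = ((1 - m) * x) @ x.conj().T / max((1 - m).sum(), eps)
        phi_n += eps * np.eye(M)  # diagonal loading for numerical stability
        num = np.linalg.solve(phi_n, phi_s)  # Phi_n^{-1} Phi_s
        w = num[:, ref] / (np.trace(num) + eps)
        Y[f] = w.conj() @ x  # apply w^H to every frame
    return Y
```

When the target covariance is rank one, this filter passes the target as observed at the reference microphone without distortion while minimizing the residual interference and noise power, so the quality of the result depends directly on how well the mask, and hence the localization feeding the first stage, isolates the target speaker.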
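Results are reported as word error rate (WER), the standard ASR metric: the minimum number of word substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, divided by the number of reference words. A minimal sketch using a word-level Levenshtein distance follows; `word_error_rate` is a hypothetical helper, and real scoring tools usually also apply text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1] / max(len(ref), 1)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Averaging this quantity over a test set (typically by summing errors and reference words across utterances) yields figures such as the 29.4% and 42.4% quoted in the abstract.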

Cited literature : 25 references

https://hal.inria.fr/hal-02355669
Contributor : Sunit Sivasankaran
Submitted on : Tuesday, June 23, 2020 - 6:23:41 PM
Last modification on : Friday, June 26, 2020 - 3:33:32 AM

File

sunits_eusipco_2020.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02355669, version 3

Citation

Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition. 28th European Signal Processing Conference, Jan 2021, Amsterdam, Netherlands. ⟨hal-02355669v3⟩
