Conference papers

SLOGD: Speaker Location Guided Deflation Approach to Speech Separation

Sunit Sivasankaran 1 Emmanuel Vincent 1 Dominique Fohr 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Speech separation is the process of separating multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein we estimate the sources iteratively. In each iteration, we first estimate the location of a speaker and use it to estimate a mask corresponding to the localized speaker. The estimated source is then removed from the mixture before the location and mask of the next source are estimated. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset, using word error rate (WER) as the metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
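The iterative loop described in the abstract (localize, estimate a mask, remove, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `deflate`, `toy_location` and `toy_mask` are hypothetical, the two toy functions stand in for the trained localization and mask estimation models, a single magnitude spectrogram replaces the multichannel input, and removing a source by applying the complement of its mask is an assumption made for the illustration.

```python
import numpy as np


def deflate(mixture_spec, estimate_location, estimate_mask, max_sources=2):
    """Peel sources off a magnitude spectrogram one at a time.

    estimate_location and estimate_mask are hypothetical callables standing
    in for trained models; they are assumptions, not the paper's code.
    """
    residual = mixture_spec.astype(float).copy()
    sources = []
    for _ in range(max_sources):
        # Step 1: estimate the location of one speaker in the residual.
        location = estimate_location(residual)
        # Step 2: estimate a mask for the localized speaker, kept in [0, 1].
        mask = np.clip(estimate_mask(residual, location), 0.0, 1.0)
        sources.append((location, mask * residual))
        # Step 3 (assumed removal rule): keep only what the mask rejected.
        residual = (1.0 - mask) * residual
    return sources, residual


def toy_location(residual):
    """Toy 'location': 0 if the upper half of the rows holds more energy, else 1."""
    half = residual.shape[0] // 2
    return 0 if residual[:half].sum() >= residual[half:].sum() else 1


def toy_mask(residual, location):
    """Toy binary mask selecting the half of the rows indicated by location."""
    half = residual.shape[0] // 2
    mask = np.zeros_like(residual)
    if location == 0:
        mask[:half] = 1.0
    else:
        mask[half:] = 1.0
    return mask


# Toy mixture: a loud "speaker" in rows 0-1 and a quieter one in rows 2-3.
mixture = np.array([[4.0, 4.0, 4.0],
                    [4.0, 4.0, 4.0],
                    [1.0, 1.0, 1.0],
                    [1.0, 1.0, 1.0]])

sources, residual = deflate(mixture, toy_location, toy_mask, max_sources=2)
```

In this toy setting the louder source is extracted first, and the quieter one is found only once the first has been removed from the residual, which is the point of processing the sources one after another.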

Contributor : Sunit Sivasankaran
Submitted on : Friday, February 14, 2020 - 9:49:30 AM
Last modification on : Saturday, October 16, 2021 - 11:26:10 AM
Long-term archiving on : Friday, May 15, 2020 - 12:35:49 PM


Files produced by the author(s)


  • HAL Id : hal-02355613, version 2


Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. SLOGD: Speaker Location Guided Deflation Approach to Speech Separation. ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02355613v2⟩


