SLOGD: Speaker Location Guided Deflation Approach to Speech Separation

Sunit Sivasankaran¹, Emmanuel Vincent¹, Dominique Fohr¹
¹ MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Speech separation is the process of separating the speech of multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein the sources are estimated iteratively. In each iteration, we first estimate the location of a speaker and use it to estimate a mask corresponding to the localized speaker. The estimated source is then removed from the mixture before the location and mask of the next source are estimated. Experiments are conducted on a reverberant, noisy multichannel version of the well-studied WSJ-2MIX dataset, using word error rate (WER) as the metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
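The deflation loop described in the abstract (localize a speaker, estimate its mask, remove it, repeat) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: in SLOGD the localization and mask estimation steps are learned models, whereas `toy_localize` and `toy_mask` here are hypothetical stand-ins that use inter-channel phase differences (IPDs) on a two-channel STFT. The removal rule (subtracting the masked contribution from every channel) and all names, such as `deflation_separate`, `CANDIDATES`, and `TOL`, are likewise assumptions made for illustration.

```python
import numpy as np
from typing import Callable, List, Tuple

# Hypothetical interfaces: in SLOGD these steps would be trained models.
Localizer = Callable[[np.ndarray], float]
MaskEstimator = Callable[[np.ndarray, float], np.ndarray]


def deflation_separate(mixture: np.ndarray, n_sources: int,
                       localize: Localizer, estimate_mask: MaskEstimator,
                       ref_channel: int = 0) -> Tuple[List[float], List[np.ndarray]]:
    """Iteratively localize, mask, and remove one source at a time.

    mixture: complex multichannel STFT, shape (channels, freqs, frames).
    Returns the estimated locations and the masked reference-channel STFTs.
    """
    residual = mixture.copy()
    locations, sources = [], []
    for _ in range(n_sources):
        loc = localize(residual)                 # 1) locate the next speaker
        mask = estimate_mask(residual, loc)      # 2) its mask, shape (freqs, frames)
        sources.append(mask * residual[ref_channel])
        locations.append(loc)
        # 3) Assumed removal rule: subtract the masked source from every channel.
        residual = residual - mask[np.newaxis] * residual
    return locations, sources


# Toy stand-ins (hypothetical): a "location" is an inter-channel phase difference.
CANDIDATES = np.array([-np.pi / 2, 0.0, np.pi / 2])  # hypothetical location grid
TOL = 0.3  # radians


def _ipd_distance(residual: np.ndarray, phase: float) -> np.ndarray:
    """Wrapped distance in [0, pi] between each bin's IPD and a candidate phase."""
    ipd = np.angle(residual[1] * np.conj(residual[0]))
    return np.abs(np.angle(np.exp(1j * (ipd - phase))))


def toy_localize(residual: np.ndarray) -> float:
    """Pick the candidate whose matching bins carry the most energy."""
    energy = np.abs(residual[0]) ** 2
    scores = [energy[_ipd_distance(residual, p) < TOL].sum() for p in CANDIDATES]
    return float(CANDIDATES[int(np.argmax(scores))])


def toy_mask(residual: np.ndarray, loc: float) -> np.ndarray:
    """Binary mask of the bins whose IPD matches the chosen location."""
    return (_ipd_distance(residual, loc) < TOL).astype(float)


# Two toy sources on disjoint frequency bands with different IPDs.
rng = np.random.default_rng(0)
F, T = 4, 5
s_a = np.zeros((F, T), complex)
s_a[:2] = 2.0 * np.exp(1j * rng.uniform(-np.pi, np.pi, (2, T)))
s_b = np.zeros((F, T), complex)
s_b[2:] = 1.0 * np.exp(1j * rng.uniform(-np.pi, np.pi, (2, T)))
mix = np.stack([s_a + s_b,
                s_a * np.exp(1j * np.pi / 2) + s_b * np.exp(-1j * np.pi / 2)])

locs, est = deflation_separate(mix, 2, toy_localize, toy_mask)
```

In this toy run, the louder source is found and removed first, so the second pass only sees the remaining speaker. Because each pass only needs to handle the single most prominent remaining speaker, the same localizer and mask estimator are reused for every source; the number of passes is fixed here by the hypothetical `n_sources` parameter.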

Contributor: Sunit Sivasankaran
Submitted on: Friday, November 8, 2019 - 12:58:04 PM
Last modification on: Friday, November 15, 2019 - 10:20:39 AM




HAL Id: hal-02355613, version 1


Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. SLOGD: Speaker Location Guided Deflation Approach to Speech Separation. 2019. ⟨hal-02355613⟩


