SLOGD: Speaker Location Guided Deflation Approach to Speech Separation

Sunit Sivasankaran¹, Emmanuel Vincent¹, Dominique Fohr¹
¹ MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication, Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Speech separation is the process of separating multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein the sources are estimated iteratively. In each iteration, we first estimate the location of a speaker and use it to estimate a mask corresponding to that localized speaker. The estimated source is then removed from the mixture before the location and mask of the next source are estimated. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset, using word error rate (WER) as the metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
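The iterative loop described in the abstract (localize a speaker, estimate its mask, remove it, repeat) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the localizer and mask estimator are passed in as callables with hypothetical signatures, the toy `toy_localize` and `toy_mask` functions are oracle stand-ins for whatever estimators SLOGD actually uses, and the removal step (multiplying the residual on every channel by `1 - mask`) is an assumption about how "removing the estimated source from the mixture" might be done.

```python
import numpy as np
from typing import Callable, List, Tuple

# Hypothetical signatures (assumptions, not taken from the paper).
Localizer = Callable[[np.ndarray], float]                 # residual -> location (degrees)
MaskEstimator = Callable[[np.ndarray, float], np.ndarray]  # (residual, location) -> mask


def slogd_style_deflation(
    mixture_stft: np.ndarray,
    num_sources: int,
    localize: Localizer,
    estimate_mask: MaskEstimator,
    ref_channel: int = 0,
) -> Tuple[List[np.ndarray], List[float]]:
    """Estimate sources one at a time, removing each one from the residual.

    mixture_stft: complex STFT of shape (channels, freq, frames).
    Returns the reference-channel source estimates and their estimated
    locations, in extraction order.
    """
    residual = mixture_stft.copy()
    sources, locations = [], []
    for _ in range(num_sources):
        # 1) Localize one remaining speaker.
        doa = localize(residual)
        # 2) Estimate a time-frequency mask for that speaker, guided by its location.
        mask = estimate_mask(residual, doa)  # shape (freq, frames), values in [0, 1]
        # 3) Extract the source at the reference channel.
        sources.append(mask * residual[ref_channel])
        locations.append(doa)
        # 4) Deflation (assumed): remove the masked source from every channel.
        residual = (1.0 - mask)[None, :, :] * residual
    return sources, locations


# Toy oracle stand-ins: each hypothetical direction owns a block of frequency rows.
REGIONS = {30.0: slice(0, 2), 120.0: slice(2, 4)}


def toy_localize(residual: np.ndarray) -> float:
    # Pick the direction whose region holds the most remaining energy.
    energy = {doa: np.sum(np.abs(residual[:, rows]) ** 2) for doa, rows in REGIONS.items()}
    return max(energy, key=energy.get)


def toy_mask(residual: np.ndarray, doa: float) -> np.ndarray:
    # Binary mask selecting the frequency rows assigned to that direction.
    mask = np.zeros(residual.shape[1:])
    mask[REGIONS[doa]] = 1.0
    return mask


# Deterministic two-channel, two-speaker toy mixture (4 freq bins x 3 frames).
mix = np.zeros((2, 4, 3), dtype=complex)
mix[:, 0:2] = 3.0 + 1.0j  # louder speaker, hypothetically at 30 degrees
mix[:, 2:4] = 1.0 - 0.5j  # quieter speaker, hypothetically at 120 degrees
sources, locations = slogd_style_deflation(mix, 2, toy_localize, toy_mask)
print(locations)  # [30.0, 120.0]
```

Because the residual shrinks after each extraction, a localizer that always targets the dominant remaining speaker naturally moves on to the next one. In this toy the speakers occupy disjoint bins, so the estimates add back up exactly to the reference-channel mixture; real reverberant, noisy mixtures would not separate this cleanly.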

https://hal.inria.fr/hal-02355613
Contributor : Sunit Sivasankaran
Submitted on : Friday, November 8, 2019 - 12:58:04 PM
Last modification on : Friday, November 15, 2019 - 10:20:39 AM

Identifiers

  • HAL Id : hal-02355613, version 1

Citation

Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. SLOGD: Speaker Location Guided Deflation Approach to Speech Separation. 2019. ⟨hal-02355613⟩
