Conference papers

SLOGD: Speaker Location Guided Deflation Approach to Speech Separation

Sunit Sivasankaran 1 Emmanuel Vincent 1 Dominique Fohr 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Speech separation is the process of separating multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, in which the sources are estimated iteratively. In each iteration, we first estimate the location of a speaker and use it to estimate a mask corresponding to the localized speaker. The estimated source is then removed from the mixture before the location and mask of the next source are estimated. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset using word error rate (WER) as the metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
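The iterative procedure described in the abstract (locate a speaker, estimate a mask for that speaker, remove the estimated source, repeat) can be sketched roughly as follows. This is only a toy illustration under stated assumptions, not the authors' implementation: the names `deflate`, `locate`, and `estimate_mask` are hypothetical, the signal is a flat list of magnitudes rather than a multichannel recording, and removing a source by subtraction is an assumption, since the abstract does not say how the removal is done.

```python
from typing import Callable, List, Tuple

Signal = List[float]  # toy stand-in for a time-frequency representation


def deflate(
    mixture: Signal,
    locate: Callable[[Signal], int],
    estimate_mask: Callable[[Signal, int], Signal],
    num_sources: int,
) -> Tuple[List[Tuple[int, Signal]], Signal]:
    """Estimate sources one at a time, removing each from the residual.

    `locate` and `estimate_mask` are hypothetical callables supplied by
    the caller; the abstract does not specify how either is realized.
    """
    residual = list(mixture)
    estimates = []
    for _ in range(num_sources):
        # Step 1: estimate the location of one speaker in the residual.
        location = locate(residual)
        # Step 2: use that location to estimate a mask for the speaker.
        mask = estimate_mask(residual, location)
        source = [m * x for m, x in zip(mask, residual)]
        # Step 3 (assumption): remove the source by subtraction.
        residual = [x - s for x, s in zip(residual, source)]
        estimates.append((location, source))
    return estimates, residual


# Toy stand-ins: the "location" is the index of the largest bin, and the
# mask is binary, selecting only that bin.
def toy_locate(signal: Signal) -> int:
    return max(range(len(signal)), key=lambda i: signal[i])


def toy_mask(signal: Signal, location: int) -> Signal:
    return [1.0 if i == location else 0.0 for i in range(len(signal))]


estimates, residual = deflate([3.0, 0.0, 1.0, 0.0], toy_locate, toy_mask, 2)
```

In this toy run the first iteration picks the dominant bin, and the second iteration works only on what remains after the first source has been removed, which is the deflation idea in miniature.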

https://hal.inria.fr/hal-02355613
Contributor : Sunit Sivasankaran
Submitted on : Friday, November 8, 2019 - 12:58:04 PM
Last modification on : Saturday, February 15, 2020 - 1:36:36 AM
Document(s) archived on : Sunday, February 9, 2020 - 6:15:17 PM

File

loop_est.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02355613, version 1

Citation

Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. SLOGD: Speaker Location Guided Deflation Approach to Speech Separation. 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02355613v1⟩
