Conference Papers Year : 2020

SLOGD: Speaker Location Guided Deflation Approach to Speech Separation

Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr

Abstract

Speech separation is the process of separating multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein we estimate the sources iteratively. In each iteration, we first estimate the location of the speaker and use it to estimate a mask corresponding to the localized speaker. The estimated source is then removed from the mixture before estimating the location and mask of the next source. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset using word error rate (WER) as the metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
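The iterative loop described in the abstract (locate a speaker, estimate a mask for that speaker, remove the estimated source, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `deflate`, the callables `estimate_location` and `estimate_mask`, the single-channel magnitude spectrogram, and the removal by subtracting the masked spectrogram are all assumptions made for illustration. In the paper, the location and mask are estimated from multichannel data; here they are replaced by toy stand-ins.

```python
import numpy as np


def deflate(mixture, estimate_location, estimate_mask, num_sources):
    """Peel sources off a mixture one at a time (hypothetical sketch).

    mixture: non-negative magnitude spectrogram, shape (freq, time).
    estimate_location(residual) -> a location descriptor (assumed interface).
    estimate_mask(residual, location) -> mask in [0, 1], same shape as residual.
    """
    residual = np.asarray(mixture, dtype=float).copy()
    sources, locations = [], []
    for _ in range(num_sources):
        # Step 1: localize one speaker in what is left of the mixture.
        location = estimate_location(residual)
        # Step 2: estimate a mask for the localized speaker.
        mask = np.clip(estimate_mask(residual, location), 0.0, 1.0)
        source = mask * residual
        sources.append(source)
        locations.append(location)
        # Step 3: remove the estimated source before the next iteration
        # (subtraction is an assumption of this sketch).
        residual = residual - source
    return sources, locations, residual


# Toy stand-ins: two "speakers" occupy disjoint pairs of frequency rows,
# and the "location" is simply the index of the most energetic row.
def toy_location(residual):
    return int(np.argmax(residual.sum(axis=1)))


def toy_mask(residual, location):
    mask = np.zeros_like(residual)
    start = (location // 2) * 2  # first row of the pair containing `location`
    mask[start:start + 2, :] = 1.0
    return mask


speaker_a = np.zeros((4, 3))
speaker_a[0:2, :] = 2.0
speaker_b = np.zeros((4, 3))
speaker_b[2:4, :] = 1.0

sources, locations, residual = deflate(
    speaker_a + speaker_b, toy_location, toy_mask, num_sources=2
)
```

In this toy case the louder speaker is extracted first, and the second iteration operates only on the residual, which is the defining property of a deflation scheme. A real system would replace the two toy callables with learned estimators.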
Main file
sivasankaran.pdf (210.41 KB)
Origin : Files produced by the author(s)

Dates and versions

hal-02355613, version 1 (08-11-2019)
hal-02355613, version 2 (14-02-2020)

Identifiers

  • HAL Id : hal-02355613, version 2

Cite

Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. SLOGD: Speaker Location Guided Deflation Approach to Speech Separation. ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02355613v2⟩