Conference paper. Year: 2020

SLOGD: Speaker Location Guided Deflation Approach to Speech Separation

Abstract

Speech separation is the process of separating multiple speakers from an audio recording. In this work, we propose to separate the sources using a Speaker LOcalization Guided Deflation (SLOGD) approach, wherein we estimate the sources iteratively. In each iteration, we first estimate the location of a speaker and use it to estimate a mask corresponding to the localized speaker. The estimated source is then removed from the mixture before estimating the location and mask of the next source. Experiments are conducted on a reverberated, noisy multichannel version of the well-studied WSJ-2MIX dataset using word error rate (WER) as a metric. The proposed method achieves a WER of 44.2%, a 34% relative improvement over the system without separation and a 17% relative improvement over Conv-TasNet.
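
The iterative structure described in the abstract (locate a speaker, estimate a mask for that speaker, remove the estimated source, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `deflate`, `toy_location`, and `toy_mask` are hypothetical, the "location" is a toy frequency-band index rather than a real spatial estimate, and removing a source as `residual - mask * residual` is an assumption made for illustration. In the paper, location and mask estimation are the learned parts of the system; here they are simple stand-ins passed in as callables.

```python
import numpy as np


def deflate(mixture, n_sources, estimate_location, estimate_mask):
    """Peel sources off a mixture one at a time (deflation).

    estimate_location and estimate_mask are hypothetical callables standing
    in for the location and mask estimators mentioned in the abstract.
    """
    residual = np.array(mixture, dtype=float)  # work on a copy
    sources, locations = [], []
    for _ in range(n_sources):
        # 1. Estimate where the next speaker is, from what is left.
        location = estimate_location(residual)
        # 2. Estimate a mask for the speaker at that location.
        mask = estimate_mask(residual, location)
        source = mask * residual
        # 3. Remove the estimated source before the next iteration
        #    (assumed here to be a plain subtraction).
        residual = residual - source
        sources.append(source)
        locations.append(location)
    return sources, locations, residual


# Toy stand-ins: two "speakers" occupy disjoint frequency bands, and the
# band index plays the role of a location. Purely illustrative.
BANDS = [slice(0, 2), slice(2, 4)]


def toy_location(residual):
    # Pick the band that currently holds the most energy.
    energies = [float(np.sum(residual[band] ** 2)) for band in BANDS]
    return int(np.argmax(energies))


def toy_mask(residual, location):
    # Binary mask selecting the band associated with the given location.
    mask = np.zeros_like(residual)
    mask[BANDS[location]] = 1.0
    return mask


src_a = np.zeros((4, 3))
src_a[0:2] = 2.0  # louder source in the lower band
src_b = np.zeros((4, 3))
src_b[2:4] = 1.0  # quieter source in the upper band

sources, locations, residual = deflate(src_a + src_b, 2, toy_location, toy_mask)
print(locations)
```

In this toy setting the louder source is extracted first, and the second iteration then operates only on what remains, which is the point of removing each estimate before continuing.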
Main file
sivasankaran.pdf (210.41 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02355613, version 1 (08-11-2019)
hal-02355613, version 2 (14-02-2020)

Identifiers

  • HAL Id: hal-02355613, version 2

Cite

Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr. SLOGD: Speaker Location Guided Deflation Approach to Speech Separation. ICASSP 2020 - 45th International Conference on Acoustics, Speech, and Signal Processing, May 2020, Barcelona, Spain. ⟨hal-02355613v2⟩
