Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings

Lauréline Perotin 1,2, Romain Serizel 1, Emmanuel Vincent 1, Alexandre Guérin 2
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We present a source separation system for high-order ambisonics (HOA) content. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. We combine one channel of the mixture with the outputs of basic HOA beamformers as inputs to the LSTM, assuming that the directions of arrival of the directional sources are known. In our experiments, the speech of interest can be corrupted either by diffuse noise or by an equally loud competing speaker. We show that adding the output of the beamformer steered toward the competing speech as an input, in addition to that of the beamformer steered toward the target speech, brings significant improvements in terms of word error rate.
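To make the input configuration described in the abstract concrete, here is a minimal sketch (our own illustration, not the authors' code) of a basic first-order ambisonic beamformer steered toward a known direction of arrival, and of how one mixture channel and two beamformer outputs could be stacked as LSTM input features. The ACN channel order [W, Y, Z, X], SN3D normalization, and the cardioid pattern are all assumptions on our part; the paper's beamformers operate on higher-order channels as well.

```python
import numpy as np

def fo_beamformer(hoa_stft, azimuth, elevation):
    """Steer a basic first-order cardioid beamformer toward (azimuth, elevation).

    hoa_stft: complex STFT of shape (4, frames, bins), assumed ACN channel
    order [W, Y, Z, X] with SN3D normalization.
    Returns a single-channel STFT enhanced toward the given direction.
    """
    w, y, z, x = hoa_stft
    # Unit vector pointing at the steering direction.
    ux = np.cos(azimuth) * np.cos(elevation)
    uy = np.sin(azimuth) * np.cos(elevation)
    uz = np.sin(elevation)
    # Cardioid pattern: unit gain at the look direction, null at the back.
    return 0.5 * (w + ux * x + uy * y + uz * z)

def lstm_input_features(mixture_w, target_beam, interferer_beam):
    """Stack log-magnitude features of one mixture channel and the two
    beamformer outputs (target and interferer directions), mirroring the
    input combination the abstract describes."""
    eps = 1e-8  # avoid log(0) on silent bins
    return np.stack([np.log(np.abs(s) + eps)
                     for s in (mixture_w, target_beam, interferer_beam)])
```

In the paper's pipeline, features like these feed the LSTM, whose estimated time-frequency mask then parametrizes a multichannel spatial filter applied to the HOA mixture.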
Document type: Conference paper

Cited literature: 25 references

https://hal.inria.fr/hal-01699759
Contributor: Lauréline Perotin
Submitted on: Monday, April 30, 2018 - 9:56:15 AM
Last modification on: Wednesday, April 3, 2019 - 1:23:11 AM
Document(s) archived on: Tuesday, September 25, 2018 - 4:57:03 PM

File

2018-Perotin-Multichannel_spee...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01699759, version 2

Citation

Lauréline Perotin, Romain Serizel, Emmanuel Vincent, Alexandre Guérin. Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings. 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Apr 2018, Calgary, Canada. ⟨hal-01699759v2⟩

Metrics: 278 record views, 1,184 file downloads