Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings

Lauréline Perotin (1, 2), Romain Serizel (1), Emmanuel Vincent (1), Alexandre Guérin (2)
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : We present a source separation system for high-order ambisonics (HOA) content. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. As inputs to the LSTM, we combine one channel of the mixture with the outputs of basic HOA beamformers, assuming that the directions of arrival of the directional sources are known. In our experiments, the speech of interest can be corrupted either by diffuse noise or by an equally loud competing speaker. We show that adding the output of the beamformer steered toward the competing speech as an input, in addition to that of the beamformer steered toward the target speech, brings significant improvements in terms of word error rate.
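The pipeline the abstract describes (beamform toward known directions of arrival, estimate a time-frequency mask, apply it to recover the target) can be sketched minimally as follows. This is a hypothetical illustration, not the authors' system: it assumes first-order B-format channels (W, X, Y, Z) rather than higher orders, and substitutes a simple magnitude-ratio mask between the two beamformer outputs for the trained LSTM; all function names and parameters are invented for the example.

```python
import numpy as np
from scipy.signal import stft, istft

def foa_beamformer(foa, az, el):
    """Basic first-order (cardioid) beamformer steered toward (az, el).
    foa: array of shape (4, n) with channels ordered W, X, Y, Z."""
    w, x, y, z = foa
    return 0.5 * (w + np.cos(az) * np.cos(el) * x
                    + np.sin(az) * np.cos(el) * y
                    + np.sin(el) * z)

def separate(foa, az_target, el_target, az_interf, el_interf, fs=16000):
    """Mask-based separation sketch. A magnitude-ratio mask between the two
    beamformer outputs stands in for the LSTM mask estimator."""
    beam_t = foa_beamformer(foa, az_target, el_target)
    beam_i = foa_beamformer(foa, az_interf, el_interf)
    _, _, T = stft(beam_t, fs=fs, nperseg=512)
    _, _, I = stft(beam_i, fs=fs, nperseg=512)
    mask = np.abs(T) / (np.abs(T) + np.abs(I) + 1e-8)  # values in [0, 1]
    _, est = istft(mask * T, fs=fs, nperseg=512)       # masked target beam
    return est, mask

# Toy usage: two sinusoids arriving from different azimuths, encoded in FOA.
fs, n = 16000, 16000
t = np.arange(n) / fs
s1 = np.sin(2 * np.pi * 440 * t)    # target speaker stand-in, azimuth 0
s2 = np.sin(2 * np.pi * 1000 * t)   # competing speaker stand-in, azimuth pi/2

def encode(s, az, el=0.0):
    """Encode a plane wave from (az, el) into B-format W, X, Y, Z."""
    return np.stack([s,
                     s * np.cos(az) * np.cos(el),
                     s * np.sin(az) * np.cos(el),
                     s * np.sin(el)])

foa = encode(s1, 0.0) + encode(s2, np.pi / 2)
est, mask = separate(foa, 0.0, 0.0, np.pi / 2, 0.0)
```

In this toy case the ratio mask attenuates the interferer's frequency bins in the target beam; in the paper the mask would instead come from an LSTM fed with one mixture channel and the beamformer outputs.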
Document type :
Conference papers

Cited literature: 25 references
Contributor : Lauréline Perotin
Submitted on : Monday, April 30, 2018 - 9:56:15 AM
Last modification on : Saturday, October 16, 2021 - 11:26:10 AM
Long-term archiving on : Tuesday, September 25, 2018 - 4:57:03 PM


Files produced by the author(s)


  • HAL Id : hal-01699759, version 2



Lauréline Perotin, Romain Serizel, Emmanuel Vincent, Alexandre Guérin. Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings. 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Apr 2018, Calgary, Canada. ⟨hal-01699759v2⟩


