Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings

Lauréline Perotin 1,2, Romain Serizel 1, Emmanuel Vincent 1, Alexandre Guérin 2
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: We present a source separation system for high-order ambisonics (HOA) content. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. We combine one channel of the mixture with the outputs of basic HOA beamformers as inputs to the LSTM, assuming that the directions of arrival of the directional sources are known. In our experiments, the speech of interest can be corrupted either by diffuse noise or by an equally loud competing speaker. We show that adding the output of the beamformer steered toward the competing speech as an input, in addition to that of the beamformer steered toward the target speech, brings significant improvements in terms of word error rate.
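To make the input configuration described in the abstract concrete, here is a minimal sketch (our own illustration, not the authors' code) of a basic first-order ambisonic beamformer steered toward a known direction of arrival, and of how one mixture channel and two beamformer outputs could be stacked as LSTM input features. The ACN channel order [W, Y, Z, X], SN3D normalization, and the cardioid pattern are all assumptions on our part; the paper's beamformers operate on higher-order channels as well.

```python
import numpy as np

def fo_beamformer(hoa_stft, azimuth, elevation):
    """Steer a basic first-order cardioid beamformer toward (azimuth, elevation).

    hoa_stft: complex STFT of shape (4, frames, bins), assumed ACN channel
    order [W, Y, Z, X] with SN3D normalization.
    Returns a single-channel STFT enhanced toward the given direction.
    """
    w, y, z, x = hoa_stft
    # Unit vector pointing at the steering direction.
    ux = np.cos(azimuth) * np.cos(elevation)
    uy = np.sin(azimuth) * np.cos(elevation)
    uz = np.sin(elevation)
    # Cardioid pattern: unit gain at the look direction, null at the back.
    return 0.5 * (w + ux * x + uy * y + uz * z)

def lstm_input_features(mixture_w, target_beam, interferer_beam):
    """Stack log-magnitude features of one mixture channel and the two
    beamformer outputs (target and interferer directions), mirroring the
    input combination the abstract describes."""
    eps = 1e-8  # avoid log(0) on silent bins
    return np.stack([np.log(np.abs(s) + eps)
                     for s in (mixture_w, target_beam, interferer_beam)])
```

In the paper's pipeline, features like these feed the LSTM, whose estimated time-frequency mask then parametrizes a multichannel spatial filter applied to the HOA mixture.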
Document type: Conference paper

Cited literature: 25 references

https://hal.inria.fr/hal-01699759
Contributor: Lauréline Perotin
Submitted on: Monday, April 30, 2018 - 9:56:15 AM
Last modification on: Wednesday, April 3, 2019 - 1:23:11 AM
Document(s) archived on: Tuesday, September 25, 2018 - 4:57:03 PM

File

2018-Perotin-Multichannel_spee...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01699759, version 2

Citation

Lauréline Perotin, Romain Serizel, Emmanuel Vincent, Alexandre Guérin. Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings. 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Apr 2018, Calgary, Canada. ⟨hal-01699759v2⟩

Metrics: 278 record views, 1,184 file downloads