Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings - Archive ouverte HAL
Conference paper. Year: 2018

Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings

Lauréline Perotin (1, 2), Romain Serizel (1), Emmanuel Vincent (1), Alexandre Guérin (2)

Abstract

We present a source separation system for high-order ambisonics (HOA) content. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. As inputs to the LSTM, we combine one channel of the mixture with the outputs of basic HOA beamformers, assuming that the directions of arrival of the directional sources are known. In our experiments, the speech of interest is corrupted either by diffuse noise or by an equally loud competing speaker. We show that feeding the LSTM the output of the beamformer steered toward the competing speaker, in addition to the output of the beamformer steered toward the target speaker, yields a significant improvement in word error rate.
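The abstract names two signal-processing building blocks: beamformers steered toward known directions of arrival, and a multichannel spatial filter driven by a time-frequency mask. The following is a minimal sketch of both. It rests on simplifying assumptions that are not taken from the paper. It uses first-order ambisonics (4 channels, ACN order with SN3D normalization) instead of higher orders. It uses a plain steering-vector beamformer with unit gain in the look direction and log-magnitude input features. Finally, it uses a mask-weighted multichannel Wiener filter, which is one common way to turn a mask into a spatial filter and not necessarily the filter the authors used. The LSTM itself is not implemented; the mask is simply passed in as an argument. All function names (`foa_steering_vector`, `steer_beam`, `lstm_input_features`, `mask_based_mwf`) are hypothetical.

```python
import numpy as np


def foa_steering_vector(azimuth, elevation):
    """Encode a plane wave from (azimuth, elevation), in radians.

    Hypothetical helper. Assumes first-order ambisonics with ACN channel
    order (W, Y, Z, X) and SN3D normalization; the paper uses higher orders.
    """
    return np.array([
        1.0,                                    # W: order 0
        np.sin(azimuth) * np.cos(elevation),    # Y: order 1, degree -1
        np.sin(elevation),                      # Z: order 1, degree 0
        np.cos(azimuth) * np.cos(elevation),    # X: order 1, degree +1
    ])


def steer_beam(X, azimuth, elevation):
    """Steering-vector beamformer with unit gain toward the given direction.

    X is a complex STFT of shape (channels, freqs, frames). The weights
    d / (d . d) pass a plane wave from the look direction unchanged.
    """
    d = foa_steering_vector(azimuth, elevation)
    w = d / np.dot(d, d)
    return np.tensordot(w, X, axes=(0, 0))      # shape (freqs, frames)


def lstm_input_features(X, target_doa, interferer_doa, eps=1e-8):
    """Stack log-magnitudes of channel W and of two steered beams.

    target_doa and interferer_doa are (azimuth, elevation) tuples.
    Returns shape (frames, 3 * freqs): one feature vector per time step.
    The log-magnitude representation is an assumption of this sketch.
    """
    streams = [
        X[0],                                   # one mixture channel (W)
        steer_beam(X, *target_doa),             # beam toward the target
        steer_beam(X, *interferer_doa),         # beam toward the interferer
    ]
    feats = [np.log(np.abs(s) + eps) for s in streams]
    return np.concatenate(feats, axis=0).T


def mask_based_mwf(X, mask, ref=0, loading=1e-6):
    """Multichannel Wiener filter driven by a time-frequency mask.

    X: complex STFT, shape (channels, freqs, frames).
    mask: target mask in [0, 1], shape (freqs, frames), e.g. an LSTM output.
    Returns the target estimate in channel `ref`, shape (freqs, frames).
    """
    C, F, T = X.shape
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        Xf = X[:, f, :]                                  # (channels, frames)
        # Target covariance: mixture outer products weighted by the mask.
        phi_s = (mask[f] * Xf) @ Xf.conj().T / T
        # Mixture covariance equals target plus noise (weights m and 1 - m).
        phi_x = Xf @ Xf.conj().T / T
        # Relative diagonal loading keeps the inversion well conditioned.
        phi_x = phi_x + loading * (np.trace(phi_x).real / C + 1e-12) * np.eye(C)
        w = np.linalg.solve(phi_x, phi_s[:, ref])        # Phi_x^-1 Phi_s e_ref
        out[f] = w.conj() @ Xf                           # w^H x for every frame
    return out
```

For a quick check on synthetic data, an oracle ratio mask can stand in for the LSTM output. The target weights m and the noise weights 1 - m sum to one, so the target and noise covariances add up to the mixture covariance. As a result, only the mixture covariance has to be inverted.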
Main file
2018-Perotin-Multichannel_speech_separation_hoa.pdf (347.49 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01699759, version 1 (5 February 2018)
hal-01699759, version 2 (30 April 2018)

Identifiers

  • HAL Id: hal-01699759, version 2

Cite

Lauréline Perotin, Romain Serizel, Emmanuel Vincent, Alexandre Guérin. Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings. 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Apr 2018, Calgary, Canada. ⟨hal-01699759v2⟩
