Saladnet: Self-Attentive Multisource Localization in the Ambisonics Domain - Inria - National Institute for Research in Digital Science and Technology
Conference paper, Year: 2021

Saladnet: Self-Attentive Multisource Localization in the Ambisonics Domain

Abstract

In this work, we propose a novel self-attention-based neural network for robust multi-speaker localization from Ambisonics recordings. Starting from a state-of-the-art convolutional recurrent neural network (CRNN), we investigate the benefit of replacing the recurrent layers with self-attention encoders inherited from the Transformer architecture. We evaluate these models on synthetic and real-world data, with up to 3 simultaneous speakers. The results indicate that the majority of the proposed architectures either perform on par with or outperform the CRNN baseline, especially in the multisource scenario. Moreover, by avoiding recurrent layers, the proposed models lend themselves to parallel computing, which is shown to produce considerable savings in execution time.
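As a rough illustration of why self-attention lends itself to parallel computing, the minimal sketch below computes single-head scaled dot-product self-attention over a short sequence of frames in pure Python. It is not the architecture from the paper: the helper names (`softmax`, `matmul`, `transpose`, `self_attention`), the projection matrices, and the toy input frames are all hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative only).

    frames: T x D list of per-frame feature vectors (hypothetical input).
    w_q, w_k, w_v: D x d_k projection matrices (hypothetical weights).
    Returns the T x d_k outputs and the T x T attention weights.
    """
    q = matmul(frames, w_q)
    k = matmul(frames, w_k)
    v = matmul(frames, w_v)
    d_k = len(k[0])
    # Every frame is compared with every other frame in one matrix product,
    # so no frame has to wait for a hidden state from the previous frame.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Toy example: 3 frames with 2 features each, identity projections (assumed).
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
toy_out, toy_weights = self_attention(toy_frames, identity, identity, identity)
print(len(toy_out), len(toy_out[0]))
```

Each row of the attention weights sums to 1, and the rows are computed independently of one another. A recurrent layer, by contrast, must process frames one after another because each step depends on the previous hidden state.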
Main file
waspaa2021.pdf (300.91 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03537340, version 1 (20-01-2022)

Identifiers

HAL Id: hal-03537340
DOI: 10.1109/WASPAA52581.2021.9632737

Cite

Pierre-Amaury Grumiaux, Srdan Kitić, Prerak Srivastava, Laurent Girin, Alexandre Guérin. Saladnet: Self-Attentive Multisource Localization in the Ambisonics Domain. WASPAA 2021 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct 2021, New Paltz / Virtual, United States. pp.336-340, ⟨10.1109/WASPAA52581.2021.9632737⟩. ⟨hal-03537340⟩