Multichannel audio source separation with deep neural networks

Aditya Arie Nugraha 1 Antoine Liutkus 1 Emmanuel Vincent 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: This article addresses the problem of multichannel audio source separation. We propose a framework where deep neural networks (DNNs) are used to model the source spectra and are combined with the classical multichannel Gaussian model to exploit the spatial information. The parameters are estimated in an iterative expectation-maximization (EM) fashion and used to derive a multichannel Wiener filter. We present an extensive experimental study to show the impact of different design choices on the performance of the proposed technique. We consider different cost functions for the training of DNNs, namely the probabilistically motivated Itakura-Saito divergence as well as the Kullback-Leibler, Cauchy, mean squared error, and phase-sensitive cost functions. We also study the number of EM iterations and the use of multiple DNNs, where each DNN aims to improve the spectra estimated by the preceding EM iteration. Finally, we present the application of the framework to a speech enhancement problem. The experimental results show the benefit of the proposed multichannel approach over a single-channel DNN-based approach and over the conventional iterative EM algorithm based on multichannel nonnegative matrix factorization.
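To make the cost functions named in the abstract more concrete, here is a minimal sketch of three of them (Itakura-Saito, Kullback-Leibler, and mean squared error) in their textbook element-wise forms. The abstract does not give the exact definitions used for DNN training (which spectra they are applied to, or any regularization), so these forms are assumptions; the function names and the toy values are hypothetical, and the Cauchy and phase-sensitive costs are left out.

```python
import math

# Textbook element-wise forms, assumed here for illustration only.
# x is a target spectral value, y is the estimate; both must be positive
# for the two divergences.

def itakura_saito(x, y):
    """Itakura-Saito divergence: depends only on the ratio x / y."""
    ratio = x / y
    return ratio - math.log(ratio) - 1.0

def generalized_kl(x, y):
    """Generalized Kullback-Leibler divergence for nonnegative data."""
    return x * math.log(x / y) - x + y

def squared_error(x, y):
    """Squared error; averaging it over bins gives the mean squared error."""
    return (x - y) ** 2

def mean_cost(cost, target, estimate):
    """Average an element-wise cost over paired target/estimate values."""
    return sum(cost(x, y) for x, y in zip(target, estimate)) / len(target)

# Hypothetical toy spectra (not data from the article).
target = [1.0, 0.5, 2.0]
estimate = [0.8, 0.5, 2.5]
for name, cost in [("IS", itakura_saito),
                   ("KL", generalized_kl),
                   ("MSE", squared_error)]:
    print(name, mean_cost(cost, target, estimate))
```

One property visible in this sketch is that the Itakura-Saito divergence is scale-invariant: multiplying both arguments by the same factor leaves it unchanged, whereas the other two costs grow with the scale of the data.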
Document type:
Journal article
IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE, 2016, 24 (10), pp.1652-1664. 〈10.1109/TASLP.2016.2580946〉

Cited literature [58 references]

https://hal.inria.fr/hal-01163369
Contributor: Aditya Arie Nugraha
Submitted on: Tuesday, 21 June 2016 - 09:44:19
Last modified on: Friday, 27 April 2018 - 14:00:06

File

main.pdf
Files produced by the author(s)

Citation

Aditya Arie Nugraha, Antoine Liutkus, Emmanuel Vincent. Multichannel audio source separation with deep neural networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, IEEE, 2016, 24 (10), pp.1652-1664. 〈10.1109/TASLP.2016.2580946〉. 〈hal-01163369v5〉
