Text-informed audio source separation. Example-based approach using non-negative matrix partial co-factorization

Luc Le Magoarou; Alexey Ozerov; Ngoc Duong

Journal article. Journal of Signal Processing Systems. Year: 2014


Luc Le Magoarou

Role: Author
PersonId : 4463
IdHAL : luc-le-magoarou
ORCID : 0000-0002-2871-4406
IdRef : 197010563

Parcimonie et Nouveaux Algorithmes pour le Signal et la Modélisation Audio

Alexey Ozerov

Role: Author
PersonId : 930358

Technicolor R & I [Cesson Sévigné]

Ngoc Duong

Role: Author
PersonId : 946470

Technicolor R & I [Cesson Sévigné]

Abstract

Informed audio source separation, in which the separation process is guided by auxiliary information, has recently attracted considerable research interest, since classical blind (non-informed) approaches often fail to deliver satisfactory performance in practical applications. In this paper we present a novel text-informed framework in which a target speech source is separated from the background of a mixture using the corresponding text. First, given the text, we produce a speech example using either a speech synthesizer or a human speaker. We then use this example to guide source separation; for that purpose, we introduce a new variant of the non-negative matrix partial co-factorization (NMPCF) model based on an excitation-filter-channel speech model. This modeling allows the linguistic information to be shared between the speech example and the speech in the mixture. The corresponding multiplicative update (MU) rules are then derived for parameter estimation, and several extensions of the model are proposed and investigated. We perform extensive experiments to assess the effectiveness of the proposed approach in terms of source separation and alignment performance.
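To make the idea of partial co-factorization with multiplicative updates concrete, here is a minimal Python/NumPy sketch. It is not the model of the paper: the abstract does not state the cost function or the exact factor structure, so the sketch assumes a Euclidean cost and a much simpler coupling, in which the mixture and the speech example share a single spectral dictionary (instead of the excitation-filter-channel decomposition). The function name `nmpcf_shared_dictionary` and all parameter names are hypothetical.

```python
import numpy as np


def nmpcf_shared_dictionary(V_mix, V_ex, n_speech=4, n_background=4,
                            n_iter=200, eps=1e-9, seed=0):
    """Simplified partial co-factorization (illustrative assumption only).

    Model:  V_mix ~ W_s @ H_s + W_b @ H_b   (speech + background)
            V_ex  ~ W_s @ H_e               (speech example)
    The dictionary W_s is shared, so the example guides the speech part.
    Cost:   squared Euclidean error summed over both factorizations.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_mix_frames = V_mix.shape
    n_ex_frames = V_ex.shape[1]

    # Strictly positive random initialization keeps the updates well defined.
    W_s = rng.random((n_freq, n_speech)) + eps
    W_b = rng.random((n_freq, n_background)) + eps
    H_s = rng.random((n_speech, n_mix_frames)) + eps
    H_b = rng.random((n_background, n_mix_frames)) + eps
    H_e = rng.random((n_speech, n_ex_frames)) + eps

    def mix_model():
        return W_s @ H_s + W_b @ H_b

    costs = []
    for _ in range(n_iter):
        # Shared dictionary: gradient terms from both factorizations.
        V_hat = mix_model()
        W_s *= (V_mix @ H_s.T + V_ex @ H_e.T) / (
            V_hat @ H_s.T + (W_s @ H_e) @ H_e.T + eps)

        # Background dictionary: only the mixture contributes.
        V_hat = mix_model()
        W_b *= (V_mix @ H_b.T) / (V_hat @ H_b.T + eps)

        # Activations, each updated against the current model.
        V_hat = mix_model()
        H_s *= (W_s.T @ V_mix) / (W_s.T @ V_hat + eps)
        V_hat = mix_model()
        H_b *= (W_b.T @ V_mix) / (W_b.T @ V_hat + eps)
        H_e *= (W_s.T @ V_ex) / (W_s.T @ (W_s @ H_e) + eps)

        V_hat = mix_model()
        costs.append(float(np.sum((V_mix - V_hat) ** 2)
                           + np.sum((V_ex - W_s @ H_e) ** 2)))

    # Soft masking splits the mixture between the two estimated parts.
    speech = V_mix * (W_s @ H_s) / (V_hat + eps)
    background = V_mix * (W_b @ H_b) / (V_hat + eps)
    return speech, background, costs


# Synthetic non-negative "spectrograms" standing in for real data.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3))
V_ex = W_true @ rng.random((3, 15))
V_mix = W_true @ rng.random((3, 25)) + rng.random((20, 2)) @ rng.random((2, 25))
speech, background, costs = nmpcf_shared_dictionary(V_mix, V_ex, n_iter=100)
```

In this sketch, `V_mix` and `V_ex` would be magnitude or power spectrograms with the same number of frequency bins. Only the shared factor ties the two factorizations together, which is the "partial" aspect of the co-factorization; the background factors are left free.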

Domains

Machine Learning [cs.LG]; Signal and Image Processing [eess.SP]

Main file

template.pdf (658.17 KB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01010602

Submitted on: Friday, June 20, 2014, 09:58:33

Last modified on: Friday, March 24, 2023, 14:52:59

Long-term archiving on: Saturday, September 20, 2014, 10:45:37

Dates and versions

hal-01010602, version 1 (20-06-2014)

Identifiers

HAL Id: hal-01010602, version 1

Cite

Luc Le Magoarou, Alexey Ozerov, Ngoc Duong. Text-informed audio source separation. Example-based approach using non-negative matrix partial co-factorization. Journal of Signal Processing Systems, 2014, pp.13. ⟨hal-01010602⟩


Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

399 views

455 downloads
