Experiments in audio source separation with one sensor for robust speech recognition

Laurent Benaroya 1 Frédéric Bimbot 1 Guillaume Gravier 1 Rémi Gribonval 1
1 METISS - Speech and sound data modeling and processing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: This paper focuses on the problem of noise compensation in speech signals for robust speech recognition. We investigate a novel paradigm based on source separation techniques to remove music from speech, a common situation in broadcast news transcription tasks. The two proposed methods, namely adaptive Wiener filtering and adaptive shrinkage, rely on a dictionary of spectral shapes to deal with the non-stationarity of the signals. Unlike most classical noise suppression methods, we assume prior knowledge of the sources that are mixed. The proposed algorithms are compared to simple standard approaches on the source separation task and assessed in terms of average distortion. Their effect on the entire transcription system is then compared in terms of word error rate. Results indicate that source separation techniques show some effectiveness for robust transcription at signal-to-noise ratios lower than 15 dB. We also observe that the improvement in word error rate is correlated with the spectral distortion rather than with specific source separation performance measures such as the signal-to-interference ratio.
Keywords: source separation
Document type: Journal article
Speech Communication, Elsevier : North-Holland, 2006, Non Linear Speech Processing (NOLISP), 48 (7), pp.848-854. 〈10.1016/j.specom.2005.11.002〉

https://hal.inria.fr/inria-00544988
Contributor: Rémi Gribonval
Submitted on: Tuesday, 20 August 2013 - 15:51:32
Last modified on: Wednesday, 11 April 2018 - 01:53:12

Citation

Laurent Benaroya, Frédéric Bimbot, Guillaume Gravier, Rémi Gribonval. Experiments in audio source separation with one sensor for robust speech recognition. Speech Communication, Elsevier : North-Holland, 2006, Non Linear Speech Processing (NOLISP), 48 (7), pp.848-854. 〈10.1016/j.specom.2005.11.002〉. 〈inria-00544988〉
