Speech/Music Segmentation for Automatic Transcription

Joseph Razik¹, Dominique Fohr¹, Odile Mella¹, Nathalie Parlangeau-Vallès
¹ PAROLE - Analysis, perception and recognition of speech, INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Speech/music segmentation is an essential first step for tasks such as speech recognition and the automatic transcription of radio broadcasts. In this article, we study the use of MFCC (Mel-frequency cepstral coefficients) for speech/music segmentation. We investigate the influence of the number of cepstral coefficients, the effect of different kinds of dynamic parameters (in particular the variance of the MFCC computed over one second), and the robustness of some of these parameters when training and test conditions are mismatched. Experiments were carried out mainly on a difficult real-world broadcast corpus with varied backgrounds and superimposed segments (speech over music), and also on the Scheirer corpus. Standard MFCC with their first and second derivatives already achieve good results, but better performance was obtained with dynamic parameters, chiefly the variance of the static coefficients computed over a long-term window (1 s).
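The two feature families highlighted in the abstract, first/second derivatives of the MFCC and the per-coefficient variance of the static coefficients over a 1 s window, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the MFCC frames are already computed (one list of coefficients per frame), a frame rate of 100 frames/s (10 ms hop), a regression-based delta over ±2 frames, and a centred variance window truncated at the sequence edges. The function names `delta` and `long_term_variance` are hypothetical.

```python
from statistics import pvariance


def delta(frames, n=2):
    """Regression-based first derivative of each coefficient.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)
    Frame indices outside the sequence are clamped to the first/last frame.
    """
    if not frames:
        return []
    num_frames, dim = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, n + 1):
                later = frames[min(t + k, num_frames - 1)][d]
                earlier = frames[max(t - k, 0)][d]
                acc += k * (later - earlier)
            row.append(acc / denom)
        out.append(row)
    return out


def long_term_variance(frames, frame_rate=100, window_s=1.0):
    """Per-coefficient (population) variance over a centred window.

    frame_rate is in frames per second (100 assumes a 10 ms hop). The window
    spans 2*half + 1 frames, i.e. roughly window_s seconds, and is truncated
    at the start and end of the sequence.
    """
    if not frames:
        return []
    num_frames, dim = len(frames), len(frames[0])
    half = int(round(window_s * frame_rate)) // 2
    out = []
    for t in range(num_frames):
        lo = max(0, t - half)
        hi = min(num_frames, t + half + 1)
        out.append([pvariance([frames[i][d] for i in range(lo, hi)])
                    for d in range(dim)])
    return out


if __name__ == "__main__":
    # Toy input: coefficient 0 rises linearly, coefficient 1 is constant.
    mfcc = [[float(t), 3.0] for t in range(10)]
    deltas = delta(mfcc)            # first derivatives
    delta_deltas = delta(deltas)    # second derivatives
    variances = long_term_variance(mfcc, frame_rate=100)
    print(deltas[5], variances[5])
```

Second derivatives are obtained here by applying `delta` to its own output. In a segmentation front end, the static, delta and delta-delta vectors (or the long-term variance vectors) would typically be concatenated frame by frame before classification; the classifier itself is outside the scope of this sketch.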
Document type :
Conference papers

https://hal.inria.fr/inria-00107763
Contributor : Publications Loria
Submitted on : Thursday, October 19, 2006 - 9:08:15 AM
Last modification on : Friday, February 18, 2022 - 6:38:04 AM
Long-term archiving on : Friday, November 25, 2016 - 1:03:08 PM

Identifiers

  • HAL Id : inria-00107763, version 1

Citation

Joseph Razik, Dominique Fohr, Odile Mella, Nathalie Parlangeau-Vallès. Segmentation Parole/Musique pour la transcription automatique [Speech/Music Segmentation for Automatic Transcription]. Proceedings of the XXVes Journées d'Etude sur la Parole (JEP'2004), 2004, Fès, Morocco. 4 p. ⟨inria-00107763⟩
