inria-00526776, version 1
HMM-based Automatic Visual Speech Segmentation Using Facial Data
Interspeech 2010 (2010) 1401-1404
Abstract: We describe automatic visual speech segmentation using facial data captured by a stereo-vision technique. The segmentation is performed using an HMM-based forced alignment mechanism widely used in automatic speech recognition. The idea rests on the assumption that training on visual speech data alone might capture the uniqueness of the facial component of speech articulation, the asynchrony (time lags) between visual and acoustic speech segments, and significant coarticulation effects. This should provide valuable information showing the extent to which a phoneme may visually affect its surrounding phonemes, and should help in labeling the visual speech segments based on dominant coarticulatory contexts.
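As a rough illustration of the forced alignment mentioned in the abstract, the following minimal sketch aligns a sequence of frames to a known left-to-right sequence of units with a Viterbi search. It is not the authors' implementation: it assumes one state per unit, ignores transition probabilities, and takes a hypothetical table `log_lik` of per-frame log-likelihoods that trained HMMs would supply in practice. The function name `forced_align` is likewise an assumption made for this example.

```python
import math


def forced_align(log_lik, n_units):
    """Align frames to a fixed left-to-right sequence of units.

    log_lik[t][u] is the log-likelihood of frame t under unit u
    (hypothetical scores; in practice they would come from trained HMMs).
    Returns a list of (unit, first_frame, last_frame) segments.
    """
    n_frames = len(log_lik)
    if n_frames < n_units:
        raise ValueError("need at least one frame per unit")
    neg_inf = -math.inf
    # best[t][u]: best score of a path that ends in unit u at frame t
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    # back[t][u]: unit occupied at frame t-1 on that best path
    back = [[0] * n_units for _ in range(n_frames)]
    best[0][0] = log_lik[0][0]  # the path must start in the first unit
    for t in range(1, n_frames):
        for u in range(n_units):
            stay = best[t - 1][u]
            advance = best[t - 1][u - 1] if u > 0 else neg_inf
            if advance > stay:
                best[t][u] = advance + log_lik[t][u]
                back[t][u] = u - 1
            else:
                best[t][u] = stay + log_lik[t][u]
                back[t][u] = u
    # Backtrack from the last unit at the last frame.
    path = [n_units - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    # Collapse the frame-level path into contiguous segments.
    segments = []
    start = 0
    for t in range(1, n_frames + 1):
        if t == n_frames or path[t] != path[start]:
            segments.append((path[start], start, t - 1))
            start = t
    return segments
```

Because the unit sequence is fixed in advance, the search only decides where each boundary falls, which is what distinguishes forced alignment from recognition.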
- a – INRIA
- b – Université Nancy II
- c – Université Henri Poincaré - Nancy I
- 1 :
- INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- 2 :
- CNRS : UMR7503 – INRIA – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
- Domain: Life Sciences/Other
Computer Science/Multimedia
Computer Science/Image Synthesis and Virtual Reality
Computer Science/Signal and Image Processing
Engineering Sciences/Signal and Image Processing
- Keywords: facial speech – speech segmentation – forced alignment – coarticulation
- http://hal.inria.fr/inria-00526776
- oai:hal.inria.fr:inria-00526776
- Contributor:
- Submitted on: Friday 15 October 2010, 16:55:46
- Last modified on: Tuesday 19 October 2010, 19:37:27

