
Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure

Utpala Musti (1), Vincent Colotte (1), Slim Ouni (1), Caroline Lavecchia (1), Brigitte Wrobel-Dautcourt (2), Marie-Odile Berger (2)
(1) PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
(2) MAGRIT - Visual Augmentation of Complex Environments
LORIA - ALGO - Department of Algorithms, Computation, Image and Geometry, Inria Nancy - Grand Est
Abstract : We present an iterative algorithm for automatic feature selection and weight tuning of the target cost in unit-selection-based audio-visual speech synthesis. For a given unit-selection corpus, feature selection and weight tuning are performed so that the ranking given by the target cost function is consistent with the ordering given by an objective dissimilarity measure. Features are explicitly eliminated, based on an objective metric, to reduce redundancy and noise in the target cost calculation. An objective metric that is highly correlated with perception should improve the quality of this tuning, and the second part of the paper is a first attempt at finding one. We first present a human-centered evaluation of the synthesized audio-visual speech, and then a preliminary analysis of its results in relation to the objective evaluation metrics. This analysis of the correlation between objective and subjective evaluation results shows interesting patterns that might help in designing better tuning metrics and objective evaluation techniques. The key point is to find a link between objective and perceptual measures.
Document type : Conference papers
Contributor : Vincent Colotte
Submitted on : Tuesday, January 7, 2014 - 3:38:02 PM
Last modification on : Saturday, October 16, 2021 - 11:26:08 AM


  • HAL Id : hal-00925115, version 1


Utpala Musti, Vincent Colotte, Slim Ouni, Caroline Lavecchia, Brigitte Wrobel-Dautcourt, et al. Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure. AVSP - Audio Visual Speech Processing, Sep 2013, Annecy, France. ⟨hal-00925115⟩


