Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure

Utpala Musti 1, Vincent Colotte 1, Slim Ouni 1, Caroline Lavecchia 1, Brigitte Wrobel-Dautcourt 2, Marie-Odile Berger 2
1 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
2 MAGRIT - Visual Augmentation of Complex Environments
Inria Nancy - Grand Est, LORIA - ALGO - Department of Algorithms, Computation, Image and Geometry
Abstract: We present an iterative algorithm for automatic feature selection and weight tuning of the target cost in unit-selection-based audio-visual speech synthesis. For a given unit-selection corpus, feature selection and weight tuning are performed so that the ranking given by the target cost function is consistent with the ordering given by an objective dissimilarity measure. Features are explicitly eliminated, based on an objective metric, to reduce redundancy and noise in the target cost calculation. Finding an objective metric that is highly correlated with perception should improve the quality of tuning. This is the aim of the second part of the paper, in which we make a first attempt toward that goal. We first present the human-centered evaluation of the synthesized audio-visual speech, and then a preliminary analysis of its results in relation to the objective evaluation metrics. This analysis of the correlation between objective and subjective evaluation results shows interesting patterns that might help in designing better tuning metrics and objective evaluation techniques. The key point is to find a link between objective and perceptual measures.
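The idea of making the target-cost ranking agree with an objective ordering can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the toy data, the use of Kendall's tau as the agreement score, and the greedy backward elimination that sets a feature's weight to zero. The abstract does not specify the actual dissimilarity measure, the tuning procedure, or the stopping rule, and the weight-tuning step is left out entirely.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall rank correlation (tau-a) between two equal-length sequences."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def target_cost(mismatch, weights):
    """Weighted sum of per-feature mismatches for one candidate unit."""
    return sum(w * m for w, m in zip(weights, mismatch))


def ranking_agreement(mismatches, dissimilarity, weights):
    """Agreement between the target-cost ranking and the objective ordering."""
    costs = [target_cost(m, weights) for m in mismatches]
    return kendall_tau(costs, dissimilarity)


def eliminate_features(mismatches, dissimilarity, weights):
    """Greedy backward elimination (illustrative assumption): zero out one
    feature weight whenever that improves agreement; stop when none helps."""
    weights = list(weights)
    best = ranking_agreement(mismatches, dissimilarity, weights)
    improved = True
    while improved:
        improved = False
        for k, w in enumerate(weights):
            if w == 0:
                continue  # feature already eliminated
            trial = weights[:k] + [0.0] + weights[k + 1:]
            score = ranking_agreement(mismatches, dissimilarity, trial)
            if score > best:
                best, weights, improved = score, trial, True
                break
    return weights, best


# Hypothetical toy data: 4 candidate units, 3 binary feature mismatches.
# The third feature is noise; dissimilarity depends on the first two only.
mismatches = [[0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
dissimilarity = [0.0, 1.0, 2.0, 3.0]
kept_weights, agreement = eliminate_features(
    mismatches, dissimilarity, [1.0, 2.0, 3.0]
)
```

In this toy setting the noisy third feature is the only one whose removal raises the agreement score, so it is the only one eliminated. A real system would also re-tune the remaining weights after each elimination.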
Document type:
Conference paper
AVSP - Audio Visual Speech Processing, Sep 2013, Annecy, France. 2013
https://hal.inria.fr/hal-00925115
Contributor: Vincent Colotte
Submitted on: Tuesday, January 7, 2014 - 15:38:02
Last modified on: Thursday, January 11, 2018 - 06:25:24

Identifiers

  • HAL Id : hal-00925115, version 1

Citation

Utpala Musti, Vincent Colotte, Slim Ouni, Caroline Lavecchia, Brigitte Wrobel-Dautcourt, et al. Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure. AVSP - Audio Visual Speech Processing, Sep 2013, Annecy, France. 2013. 〈hal-00925115〉
