Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure

Utpala Musti; Vincent Colotte; Slim Ouni; Caroline Lavecchia; Brigitte Wrobel-Dautcourt; Marie-Odile Berger

Conference paper. Year: 2013


Utpala Musti

Role: Author
PersonId : 880717

Analysis, perception and recognition of speech

Vincent Colotte

Role: Author
PersonId : 16268
IdHAL : vincent-colotte
IdRef : 070401683

Analysis, perception and recognition of speech

Slim Ouni

Role: Author
PersonId : 1158
IdHAL : slim-ouni
ORCID : 0000-0001-5286-7368

Analysis, perception and recognition of speech

Caroline Lavecchia

Role: Author
PersonId : 835619

Analysis, perception and recognition of speech

Brigitte Wrobel-Dautcourt

Role: Author

Visual Augmentation of Complex Environments

Marie-Odile Berger

Role: Author
PersonId : 830601

Visual Augmentation of Complex Environments

Abstract

We present an iterative algorithm for automatic feature selection and weight tuning of the target cost in unit-selection-based audio-visual speech synthesis. For a given unit-selection corpus, we perform feature selection and weight tuning so that the ranking given by the target cost function is consistent with the ordering given by an objective dissimilarity measure. We explicitly eliminate features, based on an objective metric, to reduce redundancy and noise in the target cost calculation. Finding an objective metric that is highly correlated with perception should improve the quality of the tuning, and the second part of the paper is an attempt toward that goal. We first present the human-centered evaluation of the synthesized audio-visual speech, and then a preliminary analysis of its results in relation to the objective evaluation metrics. This analysis of the correlation between objective and subjective evaluation results shows interesting patterns that might help in designing better tuning metrics and objective evaluation techniques. The key point is to find a link between objective and perceptual measures.
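The idea of tuning a target cost so that its ranking agrees with an objective dissimilarity measure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`average_ranks`, `spearman`, `tune_weights`, `select_features`), the use of Spearman rank correlation as the consistency criterion, the least-squares weight fit, and the greedy backward elimination rule. Here `X` is a hypothetical matrix of per-feature mismatch values for candidate units and `d` is a hypothetical objective dissimilarity for each candidate.

```python
import numpy as np


def average_ranks(v):
    """Ranks starting at 1; tied values share the average of their ranks."""
    v = np.asarray(v, dtype=float)
    _, inv, counts = np.unique(v, return_inverse=True, return_counts=True)
    cum = np.cumsum(counts)
    return cum[inv] - (counts[inv] - 1) / 2.0


def spearman(a, b):
    """Spearman rank correlation; returns 0.0 if either input is constant."""
    ra, rb = average_ranks(a), average_ranks(b)
    if ra.std() == 0 or rb.std() == 0:
        return 0.0
    return float(np.corrcoef(ra, rb)[0, 1])


def tune_weights(X, d):
    """Illustrative weight tuning: least-squares fit, clipped to be non-negative."""
    w, *_ = np.linalg.lstsq(X, d, rcond=None)
    return np.clip(w, 0.0, None)


def select_features(X, d, tol=1e-12):
    """Greedy backward elimination (an assumed stand-in for the paper's method).

    At each iteration, drop the feature whose removal gives the best rank
    agreement between the weighted target cost and d; stop as soon as every
    possible removal would lower that agreement.
    """
    active = list(range(X.shape[1]))
    w = tune_weights(X[:, active], d)
    best = spearman(X[:, active] @ w, d)
    while len(active) > 1:
        trials = []
        for f in active:
            keep = [g for g in active if g != f]
            w_try = tune_weights(X[:, keep], d)
            trials.append((spearman(X[:, keep] @ w_try, d), keep, w_try))
        score, keep, w_try = max(trials, key=lambda t: t[0])
        if score < best - tol:
            break  # every removal hurts the ranking agreement
        active, w, best = keep, w_try, score
    return active, w, best


if __name__ == "__main__":
    # Synthetic data: only features 0 and 1 drive the dissimilarity.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(200, 5)).astype(float)
    d = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.standard_normal(200)
    active, weights, score = select_features(X, d)
    print("kept features:", active)
    print("weights:", weights)
    print("rank correlation:", score)
```

A rank-based criterion is used in this sketch because the abstract frames the goal as consistency of orderings rather than agreement of cost values; any other ranking-agreement measure could be substituted.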

Keywords

Unit selection; audio-visual speech synthesis; target cost; target feature selection; weight tuning

Domains

Human-Computer Interaction [cs.HC]

Contributor: Vincent Colotte

https://inria.hal.science/hal-00925115

Submitted on: Tuesday, 7 January 2014, 15:38:02

Last modified on: Thursday, 1 February 2024, 10:04:26

Dates and versions

hal-00925115, version 1 (07-01-2014)

Identifiers

HAL Id: hal-00925115, version 1

Cite

Utpala Musti, Vincent Colotte, Slim Ouni, Caroline Lavecchia, Brigitte Wrobel-Dautcourt, et al. Automatic Feature Selection for Acoustic-Visual Concatenative Speech Synthesis: Towards a Perceptual Objective Measure. AVSP - Audio Visual Speech Processing, Sep 2013, Annecy, France. ⟨hal-00925115⟩


Collections

UNIV-RENNES1 CNRS INRIA IRISA UNIV-LORRAINE INRIA2 LORIA LORIA-ALGO LORIA-NLPKD UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

