The IRISA Text-To-Speech System for the Blizzard Challenge 2017

This paper describes the implementation of the IRISA unit selection-based TTS system for our participation in the Blizzard Challenge 2017. We describe the process followed to build the voice from the given data and the architecture of our system. The system uses a selection cost that notably integrates a DNN-based prosodic prediction and a specific score to handle narrative and direct speech parts. Unit selection relies on a Viterbi-based algorithm with preselection filters that reduce the search space. A penalty is introduced in the concatenation cost to block some concatenations based on their phonological class. Moreover, a fuzzy function relaxes this penalty according to the concatenation quality with respect to the cost distribution. Integrating many constraints, this system achieves average results compared to the other systems.
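The selection scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the IRISA implementation: the names `fuzzy_relaxation`, `concat_cost` and `viterbi_select`, the piecewise-linear shape of the fuzzy function, and the `low`/`high` thresholds (which one might take from quantiles of the concatenation cost distribution) are all assumptions made for illustration. The abstract does not specify how phonological classes map to a penalty, so `penalty` is simply a parameter here, and preselection is assumed to have already produced the candidate lists.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")  # a candidate unit; its structure is left open in this sketch


def fuzzy_relaxation(cost: float, low: float, high: float) -> float:
    """Hypothetical membership function: 0 at or below `low`, 1 at or above
    `high`, linear in between. The thresholds are assumed to be derived from
    the concatenation cost distribution."""
    if high <= low:
        return 0.0 if cost <= low else 1.0
    return min(1.0, max(0.0, (cost - low) / (high - low)))


def concat_cost(base_cost: float, penalty: float, low: float, high: float) -> float:
    """Base concatenation cost plus a penalty that is relaxed (scaled towards
    zero) when the join is good relative to the cost distribution."""
    return base_cost + penalty * fuzzy_relaxation(base_cost, low, high)


def viterbi_select(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[float, List[U]]:
    """Return the cheapest unit sequence, one unit per target position.

    `candidates[t]` holds the units left for position t after preselection.
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: cost of the cheapest path ending in candidates[t][j]
    best = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index at t-1

    for t in range(1, len(candidates)):
        new_best: List[float] = []
        pointers: List[int] = []
        for u in candidates[t]:
            costs = [best[i] + join_cost(p, u) for i, p in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_cost(t, u))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Backtrack from the cheapest final state.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[t][k] for t, k in enumerate(path)]
```

In this sketch, a join whose base cost falls at or below `low` carries no penalty at all, while a join at or above `high` carries the full penalty, so a sufficiently large `penalty` effectively blocks poor concatenations without forbidding good ones.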

Domains

Artificial Intelligence [cs.AI] Signal and Image Processing [eess.SP] Sound [cs.SD] Human-Computer Interaction [cs.HC]

Main file

IRISA_Blizzard2017.pdf (177.4 KB)

Origin: Files produced by the author(s)

Contributor: Damien Lolive

https://inria.hal.science/hal-01662361

Submitted on: Wednesday, December 13, 2017, 09:53:42

Last modified on: Tuesday, January 23, 2024, 16:10:52

Dates and versions

hal-01662361, version 1 (13-12-2017)

Identifiers

HAL Id: hal-01662361, version 1

Cite

Damien Lolive, Pierre Alain, Nelly Barbot, Jonathan Chevelu, Gwénolé Lecorvé, et al. The IRISA Text-To-Speech System for the Blizzard Challenge 2017. Blizzard Challenge, Aug 2017, Stockholm, Sweden. ⟨hal-01662361⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES ENSSAT IRISA IRISA-D6 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

479 views

122 downloads