Acoustical Frame Rate and Pronunciation Variant Statistics

Denis Jouvet (1), Katarina Bartkova (2)
(1) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Speech technology enables computing statistics on word pronunciation variants as well as investigating various phonetic phenomena. This is achieved through a forced alignment of large amounts of speech signals with their possible pronunciation variants. Such alignments are usually performed using an acoustical analysis with a 10 ms frame shift. Therefore, the three-emitting-state structure of conventional acoustic hidden Markov models introduces a minimum duration constraint of 30 ms for each phone segment. This constraint is not critical at low speaking rates, but may introduce artefacts at high speaking rates. Thus, this paper investigates the impact of the acoustical frame rate on corpus-based phonetic statistics. Statistics on pronunciation variants obtained with a shorter frame shift (5 ms) are compared to the statistics resulting from the standard 10 ms frame shift. Statistics are computed on a large speech corpus of more than 3 million running words, and are analyzed with respect to the estimated local speaking rate. Results exhibit some discrepancies between the two sets of statistics, in particular for high speaking rates, where the usual acoustic analysis frame shift of 10 ms leads to an underestimation of the frequency of the longest pronunciation variants.
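The minimum duration constraint mentioned in the abstract follows from simple arithmetic, sketched below in Python. This is an illustration only, not code from the paper: the function names are hypothetical, and the sketch assumes a left-to-right model without skip transitions, so that each emitting state consumes at least one frame. The 15 ms value for the 5 ms frame shift is derived from the same arithmetic and is not stated in the abstract.

```python
def min_phone_duration_ms(frame_shift_ms: float, emitting_states: int = 3) -> float:
    """Shortest phone segment an alignment can produce, in milliseconds.

    Assumes each emitting state must absorb at least one frame
    (no skip transitions), so the minimum is states * frame shift.
    """
    return emitting_states * frame_shift_ms


def max_phones_per_second(frame_shift_ms: float, emitting_states: int = 3) -> float:
    """Upper bound on the phone rate that the alignment can represent."""
    return 1000.0 / min_phone_duration_ms(frame_shift_ms, emitting_states)


if __name__ == "__main__":
    # Compare the standard 10 ms frame shift with the shorter 5 ms one.
    for shift in (10.0, 5.0):
        print(
            f"frame shift {shift:g} ms: "
            f"min phone duration {min_phone_duration_ms(shift):g} ms, "
            f"max rate {max_phones_per_second(shift):.1f} phones/s"
        )
```

Under these assumptions, halving the frame shift halves the shortest segment the aligner can assign to a phone, which is why the choice of frame shift matters most at high speaking rates.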
Document type:
Conference paper
International Conference on Statistical Language and Speech Processing, Nov 2015, Budapest, Hungary. Proceedings SLSP'2015, 3rd International Conference on Statistical Language and Speech Processing

https://hal.inria.fr/hal-01184195
Contributor: Denis Jouvet
Submitted on: Thursday, 13 August 2015 - 11:26:59
Last modified on: Thursday, 11 January 2018 - 06:27:31
Document(s) archived on: Saturday, 14 November 2015 - 10:15:45

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01184195, version 1

Citation

Denis Jouvet, Katarina Bartkova. Acoustical Frame Rate and Pronunciation Variant Statistics. International Conference on Statistical Language and Speech Processing, Nov 2015, Budapest, Hungary. Proceedings SLSP'2015, 3rd International Conference on Statistical Language and Speech Processing. 〈hal-01184195〉
