Acoustical Frame Rate and Pronunciation Variant Statistics

Denis Jouvet 1, Katarina Bartkova 2
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Speech technology enables computing statistics on word pronunciation variants as well as investigating various phonetic phenomena. This is achieved through a forced alignment of large amounts of speech signals with their possible pronunciation variants. Such alignments are usually performed using an acoustical analysis with a 10 ms frame shift. As a result, the three-emitting-state structure of conventional acoustic hidden Markov models imposes a minimum duration of 30 ms on each phone segment. This constraint is not critical at low speaking rates, but may introduce artefacts at high speaking rates. This paper therefore investigates the impact of the acoustical frame rate on corpus-based phonetic statistics. Statistics on pronunciation variants obtained with a shorter frame shift (5 ms) are compared to those obtained with the standard 10 ms frame shift. The statistics are computed on a large speech corpus of more than 3 million running words, and are analyzed with respect to the estimated local speaking rate. The results exhibit some discrepancies between the two sets of statistics, in particular at high speaking rates, where the usual 10 ms frame shift leads to an underestimation of the frequency of the longest pronunciation variants.
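The minimum duration constraint mentioned in the abstract follows from simple arithmetic, sketched below in Python. The sketch assumes a left-to-right phone model without skip transitions, so that each emitting state consumes at least one frame, and it ignores analysis-window edge effects. The function names and the 100 ms word segment are hypothetical illustrations, not taken from the paper.

```python
def min_phone_duration_ms(frame_shift_ms: float, emitting_states: int = 3) -> float:
    """Shortest phone segment an aligner can produce, assuming each
    emitting state must consume at least one frame (no skip transitions)."""
    return emitting_states * frame_shift_ms


def variant_fits(segment_ms: float, n_phones: int, frame_shift_ms: float,
                 emitting_states: int = 3) -> bool:
    """True if a pronunciation variant with n_phones phones can be aligned
    to a speech segment of segment_ms under the assumption above."""
    n_frames = int(segment_ms // frame_shift_ms)
    return n_frames >= n_phones * emitting_states


# Minimum phone duration for the two frame shifts compared in the paper.
for shift in (10.0, 5.0):
    print(f"{shift:g} ms shift: minimum phone duration "
          f"{min_phone_duration_ms(shift):g} ms")

# Hypothetical fast-speech example: a word uttered in 100 ms with a
# 3-phone (reduced) and a 4-phone (full) pronunciation variant.
for shift in (10.0, 5.0):
    for n_phones in (3, 4):
        print(f"{shift:g} ms shift, {n_phones} phones: "
              f"alignable = {variant_fits(100.0, n_phones, shift)}")
```

Under these assumptions, the 4-phone variant cannot be aligned to the 100 ms segment with a 10 ms frame shift, whereas it can with a 5 ms shift. This is one way the longest variants could end up under-counted at high speaking rates.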
Document type: Conference papers

Cited literature: 27 references

https://hal.inria.fr/hal-01184195
Contributor : Denis Jouvet
Submitted on : Thursday, August 13, 2015 - 11:26:59 AM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on: Saturday, November 14, 2015 - 10:15:45 AM

File

main.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01184195, version 1

Citation

Denis Jouvet, Katarina Bartkova. Acoustical Frame Rate and Pronunciation Variant Statistics. International Conference on Statistical Language and Speech Processing, Nov 2015, Budapest, Hungary. ⟨hal-01184195⟩
