
Acoustical Frame Rate and Pronunciation Variant Statistics

Denis Jouvet (1), Katarina Bartkova (2)
(1) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Speech technology makes it possible to compute statistics on word pronunciation variants and to investigate various phonetic phenomena. This is achieved through a forced alignment of large amounts of speech signals with their possible pronunciation variants. Such alignments are usually performed with an acoustical analysis using a 10 ms frame shift. Consequently, the three-emitting-state structure of conventional acoustic hidden Markov models imposes a minimum duration of 30 ms on each phone segment, since each emitting state must absorb at least one frame (3 × 10 ms). This constraint is not critical at low speaking rates, but may introduce artefacts at high speaking rates. This paper therefore investigates the impact of the acoustical frame rate on corpus-based phonetic statistics. Statistics on pronunciation variants obtained with a shorter frame shift (5 ms) are compared with those resulting from the standard 10 ms frame shift. Statistics are computed on a large speech corpus of more than 3 million running words, and are analyzed with respect to the estimated local speaking rate. Results show some discrepancies between the two sets of statistics, in particular at high speaking rates, where the usual 10 ms frame shift leads to an underestimation of the frequency of the longest pronunciation variants.
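The minimum-duration argument in the abstract can be made concrete with a small sketch. It assumes a strictly left-to-right phone HMM with no skip transitions (so every emitting state consumes at least one frame), and it only checks whether a pronunciation variant can fit in a word segment at all; it does not perform a real likelihood-based forced alignment. The two variants (a schwa-full and a schwa-elided form), the phone labels, the 140 ms segment duration, and the function names are hypothetical and for illustration only.

```python
# Hypothetical illustration: two pronunciation variants of one word,
# written as SAMPA-like phone lists (assumed, not taken from the paper).
VARIANTS = {
    "long": ["s", "@", "m", "E", "n"],   # variant with a schwa
    "short": ["s", "m", "E", "n"],       # variant with the schwa elided
}


def min_phone_duration_ms(n_states: int, frame_shift_ms: float) -> float:
    """Shortest phone a strictly left-to-right HMM (no skips) can align:
    every emitting state must consume at least one frame."""
    return n_states * frame_shift_ms


def feasible_variants(segment_ms: float, variants: dict[str, list[str]],
                      n_states: int = 3, frame_shift_ms: float = 10.0) -> list[str]:
    """Names of the variants that fit in a segment of the given duration,
    i.e. whose phones need no more frames than the segment provides."""
    n_frames = int(segment_ms // frame_shift_ms)
    return [name for name, phones in variants.items()
            if len(phones) * n_states <= n_frames]


if __name__ == "__main__":
    print(min_phone_duration_ms(3, 10.0))  # 30.0
    print(min_phone_duration_ms(3, 5.0))   # 15.0
    # A word pronounced quickly, in a hypothetical 140 ms segment:
    print(feasible_variants(140, VARIANTS, frame_shift_ms=10.0))  # ['short']
    print(feasible_variants(140, VARIANTS, frame_shift_ms=5.0))   # ['long', 'short']
```

With a 10 ms shift the 140 ms segment offers only 14 frames, fewer than the 15 frames the five-phone variant requires, so the aligner cannot select the longer variant whatever the acoustic evidence. With a 5 ms shift both variants fit and the choice is left to the acoustic scores. This is one plausible mechanism behind the underestimation of the longest variants at high speaking rates.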
Document type: Conference papers

Cited literature: 27 references

Contributor: Denis Jouvet
Submitted on: Thursday, August 13, 2015 - 11:26:59 AM
Last modification on: Thursday, January 20, 2022 - 5:27:08 PM
Long-term archiving on: Saturday, November 14, 2015 - 10:15:45 AM


Files produced by the author(s)


HAL Id: hal-01184195, version 1


Denis Jouvet, Katarina Bartkova. Acoustical Frame Rate and Pronunciation Variant Statistics. International Conference on Statistical Language and Speech Processing, Nov 2015, Budapest, Hungary. ⟨hal-01184195⟩