Abstract: Proper names are often key to understanding the information contained in a document. Our work focuses on increasing the vocabulary size of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We propose methods that dynamically augment the vocabulary of the automatic speech recognition (ASR) system using lexical and temporal features. We assume that the same proper names appear frequently in documents relating to the same time period. We study a method based on mutual information and propose a new method based on cosine similarity to retrieve new proper names. In this new method, the context of a proper name is represented in a vector space model (bag of words). We also study different metrics for proper name selection in order to limit the vocabulary augmentation and therefore its impact on ASR performance. Recognition results show a significant reduction of the word error rate when the vocabulary is augmented with the retrieved proper names.
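The cosine-similarity idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`bow`, `cosine`, `rank_proper_names`), the toy tokens, the placeholder proper names, and the `top_n` cutoff are all assumptions made for illustration, and the abstract does not specify the term weighting or the selection metric actually used.

```python
import math
from collections import Counter


def bow(tokens):
    """Build a bag-of-words vector (term -> raw count) from a token list."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_proper_names(doc_tokens, name_contexts, top_n=2):
    """Rank candidate proper names by the similarity between the document
    to transcribe and each name's context words (hypothetical helper)."""
    doc_vec = bow(doc_tokens)
    scored = [(name, cosine(doc_vec, bow(ctx)))
              for name, ctx in name_contexts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data (assumed): context words collected around each candidate name
# in text documents from the same time period as the audio document.
name_contexts = {
    "NameA": ["president", "election", "campaign"],
    "NameB": ["tennis", "match", "final"],
}
doc_tokens = ["election", "vote", "president"]
ranking = rank_proper_names(doc_tokens, name_contexts)
```

In this toy setting the top-ranked names would be the candidates added to the recognition vocabulary; keeping `top_n` small reflects the stated goal of limiting the vocabulary augmentation.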
https://hal.inria.fr/hal-01092214
Contributor: Dominique Fohr
Submitted on: Monday, December 8, 2014 - 2:19:42 PM
Last modification on: Tuesday, January 14, 2020 - 10:38:05 AM
Long-term archiving on: Monday, March 9, 2015 - 11:45:12 AM
Irina Illina, Dominique Fohr, Georges Linarès. Extension du vocabulaire d’un système de transcription avec de nouveaux noms propres en utilisant un corpus diachronique. Journées d'Etude sur la parole, Jun 2014, Le Mans, France. ⟨hal-01092214⟩