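Extension du vocabulaire d’un système de transcription avec de nouveaux noms propres en utilisant un corpus diachronique [Extending the vocabulary of a transcription system with new proper names using a diachronic corpus]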

Irina Illina 1 Dominique Fohr 1 Georges Linarès 2
1 PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary size of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We propose methods that dynamically augment the vocabulary of the automatic speech recognition (ASR) system using lexical and temporal features. We assume that the same proper names frequently appear in documents relating to the same time period. We study a method based on mutual information and propose a new method based on cosine similarity to retrieve new proper names. In this new method, the context of a proper name is represented by a vector space model (bag of words). We also study different metrics for proper name selection in order to limit the vocabulary augmentation and therefore its impact on ASR performance. Recognition results show a significant reduction of the word error rate when the vocabulary is augmented with the retrieved proper names.
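As a rough illustration of the cosine-similarity idea described in the abstract, the sketch below ranks candidate proper names by comparing a bag-of-words vector of their context with a bag-of-words vector of the document to be transcribed. It is not the authors' implementation: the function names (`bag_of_words`, `cosine`, `rank_proper_names`), the `top_n` cut-off, and the toy data are all assumptions made for this example, and the temporal selection of the diachronic documents is left out.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Build a term-frequency vector (bag of words) from a list of tokens."""
    return Counter(t.lower() for t in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_proper_names(doc_tokens, name_contexts, top_n=2):
    """Rank candidate proper names by context similarity to the document.

    name_contexts maps each candidate name to the tokens that surround it
    in the diachronic corpus (hypothetical structure for this sketch).
    """
    doc_vec = bag_of_words(doc_tokens)
    scored = [
        (name, cosine(doc_vec, bag_of_words(ctx)))
        for name, ctx in name_contexts.items()
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy data, invented for illustration only.
doc = "the president met ministers in paris to discuss the budget".split()
contexts = {
    "Hollande": "president paris budget ministers government".split(),
    "Federer": "tennis match final tournament".split(),
    "Merkel": "chancellor berlin budget europe".split(),
}
ranking = rank_proper_names(doc, contexts, top_n=2)
```

Keeping only the `top_n` best-scoring names mirrors the stated goal of limiting the vocabulary augmentation; the actual selection metrics studied in the paper may differ.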
Document type :
Conference papers


https://hal.inria.fr/hal-01092214
Contributor : Dominique Fohr
Submitted on : Monday, December 8, 2014 - 2:19:42 PM
Last modification on : Wednesday, April 3, 2019 - 1:23:03 AM
Document(s) archived on : Monday, March 9, 2015 - 11:45:12 AM

File

jep2014Fr.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01092214, version 1

Citation

Irina Illina, Dominique Fohr, Georges Linarès. Extension du vocabulaire d’un système de transcription avec de nouveaux noms propres en utilisant un corpus diachronique. Journées d'Etude sur la parole, Jun 2014, Le Mans, France. ⟨hal-01092214⟩
