Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval

Irina Illina 1, Dominique Fohr 1, Georges Linares 2, Imane Nkairi 1
1 MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Proper name recognition is a challenging task in information retrieval from large audio/video databases. Proper names are semantically rich and are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We proposed methods that dynamically augment the vocabulary of the automatic speech recognition (ASR) system using lexical and temporal features of diachronic documents. We also studied different metrics for proper name selection in order to limit the vocabulary augmentation and therefore its impact on ASR performance. Recognition results show a significant reduction of the proper name error rate when an augmented vocabulary is used.
Document type : Book sections

https://hal.inria.fr/hal-01475080
Contributor : Irina Illina
Submitted on : Monday, February 27, 2017 - 4:02:37 PM
Last modification on : Tuesday, January 14, 2020 - 10:38:06 AM
Long-term archiving on: Sunday, May 28, 2017 - 12:17:52 PM

File

LNAI_2015_VocabularyIncreasing...
Files produced by the author(s)

Citation

Irina Illina, Dominique Fohr, Georges Linares, Imane Nkairi. Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval. Zygmunt Vetulani; Hans Uszkoreit; Marek Kubis (eds.). Human Language Technology. Challenges for Computer Science and Linguistics, 9561, Springer, pp.41-54, 2016, Lecture Notes in Computer Science, 978-3-319-43808-5. ⟨10.1007/978-3-319-43808-5_4⟩. ⟨hal-01475080⟩
