Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval - Inria - Institut national de recherche en sciences et technologies du numérique Accéder directement au contenu
Chapitre D'ouvrage Année : 2016

Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval

Résumé

Proper name recognition is a challenging task in information retrieval from large audio/video databases. Proper names are semantically rich and are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We proposed methods that dynamically augment the automatic speech recognition system vocabulary using lexical and temporal features in diachronic documents. We also studied different metrics for proper name selection in order to limit the vocabulary augmentation and therefore the impact on the ASR performances. Recognition results show a significant reduction of the proper name error rate using an augmented vocabulary.
Fichier principal
Vignette du fichier
LNAI_2015_VocabularyIncreasing_v1808.pdf (131.88 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)

Dates et versions

hal-01475080 , version 1 (27-02-2017)

Identifiants

Citer

Irina Illina, Dominique Fohr, Georges Linares, Imane Nkairi. Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval. Zygmunt Vetulani; Hans Uszkoreit; Marek Kubis Human Language Technology. Challenges for Computer Science and Linguistics, 9561, Springer, pp.41-54, 2016, Lecture Notes in Computer Science, 978-3-319-43808-5. ⟨10.1007/978-3-319-43808-5_4⟩. ⟨hal-01475080⟩
433 Consultations
208 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More