Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval

Irina Illina (1), Dominique Fohr (1), Georges Linares (2), Imane Nkairi (1)
(1) MULTISPEECH - Speech Modeling for Facilitating Oral-Based Communication
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Proper name recognition is a challenging task in information retrieval from large audio/video databases. Proper names are semantically rich and are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving proper names from contemporary diachronic text documents. We propose methods that dynamically augment the vocabulary of an automatic speech recognition (ASR) system using lexical and temporal features of diachronic documents. We also study different metrics for proper name selection in order to limit the growth of the vocabulary and, therefore, its impact on ASR performance. Recognition results show a significant reduction of the proper name error rate when the augmented vocabulary is used.
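As an illustration only, the vocabulary-augmentation step summarized in the abstract can be sketched as below. This is not the authors' method: the sketch assumes that the temporal feature reduces to keeping documents published within a fixed window around the date of the audio, that proper-name candidates are approximated by capitalised tokens (a crude, ASCII-only stand-in for a named-entity tagger), and that raw frequency serves as the selection metric. The names `Document`, `select_oov_proper_names`, `window_days` and `top_k` are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Document:
    """A diachronic text document with its publication date (hypothetical type)."""
    published: date
    text: str


# ASSUMPTION: capitalised tokens stand in for a real proper-name tagger (ASCII only).
_CAPITALISED = re.compile(r"\b[A-Z][a-z]+(?:-[A-Z][a-z]+)*\b")


def select_oov_proper_names(
    docs: list[Document],
    vocabulary: set[str],
    audio_date: date,
    window_days: int = 7,
    top_k: int = 100,
) -> list[str]:
    """Rank out-of-vocabulary proper-name candidates from documents near audio_date.

    ASSUMPTIONS: the temporal feature is a +/- window_days publication window,
    the selection metric is raw frequency, and vocabulary holds lowercase words.
    """
    start = audio_date - timedelta(days=window_days)
    end = audio_date + timedelta(days=window_days)
    counts: Counter = Counter()
    for doc in docs:
        if not start <= doc.published <= end:
            continue  # temporal filter: skip documents outside the window
        for token in _CAPITALISED.findall(doc.text):
            # Sentence-initial common words ("The") are dropped here because
            # their lowercase form is already in the ASR vocabulary.
            if token.lower() not in vocabulary:
                counts[token] += 1
    # Keep only the top_k most frequent candidates to limit vocabulary growth.
    return [name for name, _ in counts.most_common(top_k)]


if __name__ == "__main__":
    docs = [
        Document(date(2012, 5, 10), "Hollande met Merkel in Berlin."),
        Document(date(2012, 5, 12), "The meeting with Hollande ended late."),
        Document(date(2011, 1, 3), "Sarkozy gave a speech."),
    ]
    vocab = {"met", "in", "berlin", "the", "meeting", "with", "ended", "late",
             "gave", "a", "speech"}
    print(select_oov_proper_names(docs, vocab, audio_date=date(2012, 5, 11)))
```

Capping the list at `top_k` mirrors the abstract's concern with limiting vocabulary growth; a lexical-context metric could replace raw frequency in the ranking step without changing the temporal filter or the OOV check.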
Document type: Book chapter
In: Zygmunt Vetulani, Hans Uszkoreit, Marek Kubis (eds.), Human Language Technology. Challenges for Computer Science and Linguistics, Lecture Notes in Computer Science, vol. 9561, Springer, 2016, pp. 41-54. ISBN 978-3-319-43808-5. DOI: 10.1007/978-3-319-43808-5_4

HAL record: https://hal.inria.fr/hal-01475080
Contributor: Irina Illina
Submitted on: Monday, 27 February 2017, 16:02:37
Last modified on: Friday, 26 January 2018, 10:47:11
Archived on: Sunday, 28 May 2017, 12:17:52

File: LNAI_2015_VocabularyIncreasing... (produced by the author(s))

Citation

Irina Illina, Dominique Fohr, Georges Linares, Imane Nkairi. Temporal and Lexical Context of Diachronic Text Documents for Automatic Out-Of-Vocabulary Proper Name Retrieval. In: Zygmunt Vetulani, Hans Uszkoreit, Marek Kubis (eds.), Human Language Technology. Challenges for Computer Science and Linguistics, Lecture Notes in Computer Science, vol. 9561, Springer, 2016, pp. 41-54. ISBN 978-3-319-43808-5. DOI: 10.1007/978-3-319-43808-5_4. hal-01475080