Proper Name Retrieval from Diachronic Documents for Automatic Speech Transcription using Lexical and Temporal Context

Irina Illina 1, Dominique Fohr 1, Georges Linarès 2
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as anchors to collect new, linked proper names from the diachronic corpus. Our assumption is that time is an important feature for capturing name-to-context dependencies, which was confirmed by temporal mismatch experiments. We studied a method based on mutual information and proposed a new method based on a cosine-similarity measure to dynamically augment the vocabulary of the automatic speech recognition system. Recognition results show a significant reduction of the word error rate when the augmented vocabulary is used for broadcast news transcription.
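As an illustration of the cosine-similarity idea mentioned in the abstract, the following is a minimal sketch, not the authors' formulation. It assumes that the diachronic documents have already been restricted to the time period of the audio, that each document comes with a list of the proper names it contains, and that a candidate name is scored by comparing the bag of in-vocabulary words around it with the bag of in-vocabulary words in the transcript. The names `cosine`, `rank_candidates`, and all parameters are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(transcript_words, diachronic_docs, vocabulary, top_n=5):
    """Rank out-of-vocabulary proper names by lexical-context similarity.

    transcript_words: words of the (first-pass) transcript.
    diachronic_docs: list of (words, proper_names) pairs, assumed to come
        from the same time period as the audio (illustrative assumption).
    vocabulary: set of words already known to the recognizer.
    """
    # Query vector: in-vocabulary words observed in the transcript.
    query = Counter(w for w in transcript_words if w in vocabulary)

    # Context vector per candidate name: in-vocabulary words that co-occur
    # with the name in the diachronic documents.
    contexts = {}
    for words, proper_names in diachronic_docs:
        in_vocab_words = [w for w in words if w in vocabulary]
        for name in proper_names:
            if name in vocabulary:
                continue  # only new names are candidates
            contexts.setdefault(name, Counter()).update(in_vocab_words)

    scored = [(name, cosine(query, ctx)) for name, ctx in contexts.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]
```

In this sketch, the top-ranked names would be the ones added to the recognizer's vocabulary before a second decoding pass. The choice of raw counts rather than weighted counts is a simplification made here for brevity.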
Document type:
Conference paper
Workshop on Speech, Language and Audio in Multimedia, Sep 2014, Penang, Malaysia
Cited literature: 16 references

https://hal.inria.fr/hal-01092224
Contributor: Dominique Fohr
Submitted on: Monday, December 8, 2014 - 14:25:09
Last modified on: Friday, January 26, 2018 - 10:47:06
Document(s) archived on: Monday, March 9, 2015 - 11:45:51

File

slam2014_final.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01092224, version 1

Citation

Irina Illina, Dominique Fohr, Georges Linarès. Proper Name Retrieval from Diachronic Documents for Automatic Speech Transcription using Lexical and Temporal Context. Workshop on Speech, Language and Audio in Multimedia, Sep 2014, Penang, Malaysia. 〈hal-01092224〉
