Conference paper, Year: 2014

Proper Name Retrieval from Diachronic Documents for Automatic Speech Transcription using Lexical and Temporal Context

Irina Illina
Dominique Fohr
Georges Linarès

Abstract

Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as anchors to collect new, linked proper names from the diachronic corpus. Our assumption is that time is an important feature for capturing name-to-context dependencies, which was confirmed by temporal mismatch experiments. We studied a method based on Mutual Information and proposed a new method based on a cosine-similarity measure that dynamically augments the vocabulary of the automatic speech recognition system. Recognition results show a significant reduction of the word error rate when the augmented vocabulary is used for broadcast news transcription.
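As an illustration of the cosine-similarity idea mentioned in the abstract, here is a minimal sketch, not the authors' implementation. It assumes that the document to transcribe and each candidate proper name are represented as bag-of-words count vectors, and that the candidate contexts were collected from diachronic text documents of the same time period. The function names, the toy words, and the candidate names are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # An empty vector has no direction, so its similarity is defined as 0 here.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(doc_words, candidate_contexts, top_n=2):
    """Rank candidate proper names by similarity of their context to the document.

    doc_words: in-vocabulary words observed in the document (assumed input).
    candidate_contexts: maps each candidate name to the words that co-occur
        with it in the diachronic corpus (assumed input).
    """
    doc_vec = Counter(doc_words)
    scored = [
        (name, cosine(doc_vec, Counter(context)))
        for name, context in candidate_contexts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data, invented for illustration only.
doc_words = ["election", "president", "vote", "campaign"]
candidate_contexts = {
    "Alice Martin": ["election", "vote", "campaign", "poll"],
    "Bob Durand": ["football", "match", "goal"],
    "Carol Petit": ["president", "summit", "treaty"],
}

ranked = rank_candidates(doc_words, candidate_contexts)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this sketch the top-ranked names would be the ones added to the recognition vocabulary. How the paper builds its vectors, weights terms, or selects the time window is not stated in the abstract, so those choices are left out.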
Main file
slam2014_final.pdf (94.28 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01092224, version 1 (08-12-2014)

Identifiers

  • HAL Id: hal-01092224, version 1

Cite

Irina Illina, Dominique Fohr, Georges Linarès. Proper Name Retrieval from Diachronic Documents for Automatic Speech Transcription using Lexical and Temporal Context. Workshop on Speech, Language and Audio in Multimedia, Sep 2014, Penang, Malaysia. ⟨hal-01092224⟩