Tree-Structured Named Entities Extraction from Competing Speech Transcriptions

Davy Weissenbacher 1, * Christian Raymond 1
* Corresponding author
1 LinkMedia - Creating and exploiting explicit links between multimedia fragments
IRISA-D6 - MEDIA ET INTERACTIONS, Inria Rennes – Bretagne Atlantique
Abstract : When real applications work with automatic speech transcription, the first source of error is not incoherence in the application's own analysis but noise in the automatic transcriptions. This study presents a simple but effective method to generate a new transcription of better quality by combining utterances from competing transcriptions. We have extended a structured Named Entity (NE) recognizer submitted during the ETAPE Challenge. Working on French TV and radio programs, our system revises the transcriptions provided by making use of the NEs it has detected. Our results suggest that combining the transcribed utterances which optimize the F-measures, rather than minimizing the WER scores, allows the generation of a better transcription for NE extraction. The results show a small but significant improvement of 0.9% SER against the baseline system on the ROVER transcription. These are the best performances reported to date on this corpus.
Index Terms: speech transcription, structured named entities, multi-pass decoding.
When real applications work with automatic speech transcription, the first source of error is not incoherence in the application's own analysis but noise in the automatic transcription outputs. With an error rate often close to one word in three, the quality of this preprocessing is low and, as a result, the output of the application is often unusable. An explanation for this low performance of speech recognizers can be found in [8]. Little lexical and syntactic information is effectively used when decoding the acoustic output. More complex information is reintegrated in a second decoding pass, where only the best sequences of words produced during the first pass are considered.
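To make the "one word in three" figure concrete, here is a minimal sketch of how a word error rate (WER) is commonly computed: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper or from the ETAPE evaluation tools.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Made-up example: one substitution and one deletion over six reference words.
reference = "the president visited paris on monday".split()
hypothesis = "the resident visited paris monday".split()
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2f}")
```

In this made-up example, two errors over six reference words give a WER of one in three, the order of magnitude the text describes.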
The main contribution of this study is to present a simple but effective method to generate a new transcription of better quality by combining several competing transcriptions. Current Automatic Speech Recognition (ASR) systems rely on various strategies and/or resources to recover the utterances originally pronounced. As a consequence, the errors made by competing ASR systems differ, which makes their transcriptions complementary. The ROVER method exploits this complementarity to recombine several transcriptions and output a
Document type :
Conference papers

Cited literature [20 references]

https://hal.inria.fr/hal-01196808
Contributor : Christian Raymond
Submitted on : Thursday, September 10, 2015 - 2:27:29 PM
Last modification on : Thursday, November 15, 2018 - 11:58:51 AM
Document(s) archived on : Monday, December 28, 2015 - 11:55:51 PM

File

NLDB2015.pdf
Files produced by the author(s)

Citation

Davy Weissenbacher, Christian Raymond. Tree-Structured Named Entities Extraction from Competing Speech Transcriptions. International Conference on Application of Natural Language to Information Systems, Jun 2015, Passau, Germany. ⟨10.1007/978-3-319-19581-0_22⟩. ⟨hal-01196808⟩

Metrics

Record views

449

Files downloads

88