Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both ?

Abstract : Person identification in TV broadcast video is a valuable tool for indexing such video. However, biometric models are not a sustainable option without a priori knowledge of the people present in the videos. Pronounced names (PN) or names written on screen (WN) can provide hypothesis names for speakers. We propose an experimental comparison of the potential of these two modalities (pronounced or written names) to extract the true names of the speakers. Pronounced names offer many instances of citation, but transcription and named-entity detection errors halve the potential of this modality. In contrast, written-name detection benefits from improved video quality and is nowadays rather robust and efficient for naming speakers. Oracle experiments on the mapping between written names and speakers also show the complementarity of the PN and WN modalities.
Document type :
Conference papers

https://hal.inria.fr/hal-00953088
Contributor : Marie-Christine Fauvet
Submitted on : Monday, March 3, 2014 - 4:34:16 PM
Last modification on : Tuesday, September 17, 2019 - 1:13:51 AM
Long-term archiving on : Saturday, May 31, 2014 - 10:45:40 AM

File

POIGNANT--INTERSPEECH--2013.pd...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00953088, version 1

Citation

Johann Poignant, Laurent Besacier, Viet Bac Le, Sophie Rosset, Georges Quénot. Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both ?. the 14th Annual Conference of the International Speech Communication Association, INTERSPEECH, 2013, Lyon, France. ⟨hal-00953088⟩
