Conference papers

Speaker Identity Indexing In Audio-Visual Documents

Abstract : The identity of persons in audiovisual documents is very important semantic information for content-based indexing and retrieval. Speaker identity detection can exploit data from different modalities (text, image and audio). In this article, we propose an approach for speaker identity indexing in broadcast news using audio content. After a speaker segmentation phase, an identity is assigned to speech segments by applying linguistic patterns to their transcription from speech recognition. Three types of patterns are used to predict the speaker in the previous, current and next speech segments. Predictions are then propagated to other segments by similarity at the acoustic level. Evaluations have been conducted on part of the TREC 2003 corpus: a speaker identity could be assigned to 53% of the annotated corpus with 82% precision.
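The pipeline summarized in the abstract (naming the previous, current or next segment from linguistic patterns in the transcript, then propagating names by acoustic similarity) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The regular expressions in `PATTERNS`, the `Segment` class and the functions `apply_patterns`, `propagate` and `coverage_precision` are hypothetical. Acoustic similarity is approximated here by precomputed speaker-cluster labels with a majority vote per cluster. Coverage is counted per segment, whereas the paper's 53% / 82% figures may be measured differently (for example by speech duration).

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical name pattern: two or more capitalized words, e.g. "Peter Jennings".
NAME = r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)+)"

# Hypothetical English patterns; the offset says which segment the name designates:
# 0 = current segment, -1 = previous segment, +1 = next segment.
PATTERNS = [
    (re.compile(r"\b(?:I'm|I am|[Tt]his is) " + NAME), 0),
    (re.compile(r"\b(?:[Tt]hank you|[Tt]hanks),? " + NAME), -1),
    (re.compile(NAME + r" reports\b"), +1),
]


@dataclass
class Segment:
    transcript: str  # speech recognition output for this segment
    cluster: str     # acoustic speaker-cluster label (assumed given by segmentation)
    speaker: Optional[str] = None


def apply_patterns(segments: List[Segment]) -> None:
    """Name segments from linguistic patterns, taking a majority vote per segment."""
    votes = defaultdict(Counter)  # segment index -> Counter of candidate names
    for i, seg in enumerate(segments):
        for pattern, offset in PATTERNS:
            for match in pattern.finditer(seg.transcript):
                j = i + offset
                if 0 <= j < len(segments):  # ignore predictions outside the document
                    votes[j][match.group(1)] += 1
    for j, counter in votes.items():
        segments[j].speaker = counter.most_common(1)[0][0]


def propagate(segments: List[Segment]) -> None:
    """Give unnamed segments the majority name of their acoustic cluster, if any."""
    cluster_votes = defaultdict(Counter)
    for seg in segments:
        if seg.speaker is not None:
            cluster_votes[seg.cluster][seg.speaker] += 1
    for seg in segments:
        names = cluster_votes.get(seg.cluster)
        if seg.speaker is None and names:
            seg.speaker = names.most_common(1)[0][0]


def coverage_precision(predicted, reference):
    """Fraction of segments named, and fraction of named segments that are correct."""
    assigned = [(p, r) for p, r in zip(predicted, reference) if p is not None]
    coverage = len(assigned) / len(reference)
    precision = sum(p == r for p, r in assigned) / len(assigned) if assigned else 0.0
    return coverage, precision


if __name__ == "__main__":
    segments = [
        Segment("Good evening. I'm Peter Jennings. Tonight, ABC's John Smith reports.", "S1"),
        Segment("The talks resumed today in Geneva.", "S2"),
        Segment("Thank you, John Smith. In other news, markets fell.", "S1"),
        Segment("Markets fell sharply.", "S1"),
        Segment("Unknown voice here.", "S3"),
    ]
    apply_patterns(segments)
    propagate(segments)
    for seg in segments:
        print(seg.cluster, seg.speaker)
```

In this toy run, the anchor's self-introduction names the first segment, "reports" names the next one, and "Thank you" confirms the previous one. Cluster propagation then names the remaining anchor segments. The last segment stays unnamed because no pattern fires anywhere in its acoustic cluster. This illustrates how such an approach can leave part of a corpus without an identity while keeping precision high.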
Document type : Conference papers

Cited literature : 14 references
Contributor : Marie-Christine Fauvet
Submitted on : Monday, March 3, 2014 - 2:38:08 PM
Last modification on : Wednesday, July 6, 2022 - 4:20:56 AM
Long-term archiving on : Sunday, April 9, 2017 - 7:43:08 PM


Files produced by the author(s)


  • HAL Id : hal-00953917, version 1



Mbarek Charhad, Daniel Moraru, Stéphane Ayache, Georges Quénot. Speaker Identity Indexing In Audio-Visual Documents. Content-Based Multimedia Indexing (CBMI2005), 2005, Riga, Latvia. ⟨hal-00953917⟩