Speaker Identity Indexing In Audio-Visual Documents

Abstract : The identity of the people appearing in audiovisual documents is important semantic information for content-based indexing and retrieval. Speaker identity detection can exploit data from different modalities (text, image and audio). In this article, we propose an approach to speaker identity indexing in broadcast news that uses the audio content. After a speaker segmentation phase, identities are assigned to speech segments by applying linguistic patterns to their speech recognition transcripts. Three types of patterns are used to predict the speaker of the previous, current and next speech segments. Predictions are then propagated to other segments on the basis of acoustic similarity. Evaluations were conducted on part of the TREC 2003 corpus: a speaker identity could be assigned to 53% of the annotated corpus with 82% precision.
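
The pipeline outlined in the abstract (pattern-based naming of the previous, current and next segment, followed by propagation to acoustically similar segments) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the three regular expressions are invented stand-ins for the paper's linguistic patterns, the `clusters` list stands in for the result of an acoustic similarity step that is not implemented, and the function names `label_segments` and `coverage_and_precision` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical name matcher: two or more capitalized words.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)+)"

# Hypothetical patterns (not taken from the paper). Each one is paired with
# the offset of the segment it names: -1 previous, 0 current, +1 next.
PATTERNS = [
    (re.compile(r"[Tt]hank you,? " + NAME), -1),
    (re.compile(r"\bI am " + NAME), 0),
    (re.compile(NAME + r" reports"), +1),
]


def label_segments(transcripts: List[str],
                   clusters: List[int]) -> List[Optional[str]]:
    """Name segments from transcript patterns, then propagate by cluster.

    `clusters[i]` is an assumed, precomputed acoustic cluster id for
    segment i; segments sharing an id are treated as the same voice.
    """
    n = len(transcripts)
    names: List[Optional[str]] = [None] * n

    # Step 1: apply each pattern and name the segment it points at.
    for i, text in enumerate(transcripts):
        for regex, offset in PATTERNS:
            match = regex.search(text)
            j = i + offset
            if match and 0 <= j < n and names[j] is None:
                names[j] = match.group(1)

    # Step 2: propagate the first name found in each cluster to the
    # unnamed segments of that cluster.
    cluster_name: Dict[int, str] = {}
    for cluster, name in zip(clusters, names):
        if name is not None:
            cluster_name.setdefault(cluster, name)
    return [name if name is not None else cluster_name.get(cluster)
            for cluster, name in zip(clusters, names)]


def coverage_and_precision(predicted: List[Optional[str]],
                           reference: List[str]) -> Tuple[float, float]:
    """Fraction of segments given a name, and fraction of those correct."""
    assigned = [(p, r) for p, r in zip(predicted, reference) if p is not None]
    coverage = len(assigned) / len(reference)
    precision = (sum(p == r for p, r in assigned) / len(assigned)
                 if assigned else 0.0)
    return coverage, precision


# Toy data, invented for this sketch.
transcripts = [
    "Good evening, I am Anna Lee.",
    "Our correspondent Jane Doe reports from Riga.",
    "The city is calm tonight.",
    "Thank you, Jane Doe.",
    "More on that story later.",
    "Weather next.",
]
clusters = [0, 0, 1, 0, 1, 2]
predicted = label_segments(transcripts, clusters)
```

In this toy run the last segment stays unnamed because no pattern points at it and its cluster contains no named segment, which is the kind of gap that keeps coverage below 100%. The two measures computed by `coverage_and_precision` are one plausible reading of the figures quoted in the abstract (share of segments given an identity, and share of those identities that are correct), not a statement of the paper's exact evaluation protocol.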
Document type :
Conference papers

https://hal.inria.fr/hal-00953917
Contributor : Marie-Christine Fauvet
Submitted on : Monday, March 3, 2014 - 2:38:08 PM
Last modification on : Tuesday, December 8, 2020 - 10:42:35 AM
Long-term archiving on : Sunday, April 9, 2017 - 7:43:08 PM

File

CBMI05.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00953917, version 1

Citation

Mbarek Charhad, Daniel Moraru, Stéphane Ayache, Georges Quénot. Speaker Identity Indexing In Audio-Visual Documents. Content-Based Multimedia Indexing (CBMI2005), 2005, Riga, Latvia. ⟨hal-00953917⟩
