Speaker Identity Indexing In Audio-Visual Documents

Abstract: The identity of the persons appearing in audiovisual documents is important semantic information for content-based indexing and retrieval. Speaker identity detection can exploit data from several modalities (text, image and audio). In this article, we propose an approach for speaker identity indexing in broadcast news using the audio content. After a speaker segmentation phase, an identity is assigned to speech segments by applying linguistic patterns to their transcriptions produced by speech recognition. Three types of patterns are used, predicting the speaker of the previous, current and next speech segment respectively. Predictions are then propagated to other segments by similarity at the acoustic level. Evaluations were conducted on part of the TREC 2003 corpus: a speaker identity could be assigned to 53% of the annotated corpus with 82% precision.
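The two stages described in the abstract (pattern-based prediction, then acoustic propagation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three regular expressions, the `Segment` class, the `label_segments` function and the speaker names are hypothetical, and acoustic similarity is reduced to a precomputed cluster id that the speaker segmentation phase is assumed to provide.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical name matcher: two capitalized words (illustration only).
NAME = r"([A-Z][a-z]+ [A-Z][a-z]+)"

# Hypothetical linguistic patterns, one per pattern type. The offset says
# which segment the detected name applies to, relative to the segment in
# which the pattern fires: -1 previous, 0 current, +1 next.
PATTERNS = [
    (re.compile(r"\b(?:I am|I'm|[Tt]his is) " + NAME), 0),
    (re.compile(r"\b[Tt]hank you,? " + NAME), -1),
    (re.compile(NAME + r" reports\b"), +1),
]


@dataclass
class Segment:
    text: str                      # transcription from speech recognition
    cluster: int                   # assumed acoustic cluster id
    speaker: Optional[str] = None  # identity, once assigned


def label_segments(segments: List[Segment]) -> List[Segment]:
    # Stage 1: apply the patterns to each transcription and assign the
    # detected name to the previous, current or next segment.
    for i, seg in enumerate(segments):
        for regex, offset in PATTERNS:
            match = regex.search(seg.text)
            target = i + offset
            if match and 0 <= target < len(segments):
                if segments[target].speaker is None:
                    segments[target].speaker = match.group(1)

    # Stage 2: propagate identities to unlabeled segments that fall in the
    # same acoustic cluster. The first label seen for a cluster wins, which
    # is a simplification of similarity-based propagation.
    cluster_name = {}
    for seg in segments:
        if seg.speaker is not None:
            cluster_name.setdefault(seg.cluster, seg.speaker)
    for seg in segments:
        if seg.speaker is None:
            seg.speaker = cluster_name.get(seg.cluster)
    return segments


news = [
    Segment("Good evening, I'm Jane Doe. John Smith reports.", 0),
    Segment("The senate voted today on the bill.", 1),
    Segment("Thank you, John Smith. In other news tonight.", 0),
    Segment("More details are expected tomorrow.", 1),
    Segment("Unrelated speech.", 2),
]
speakers = [s.speaker for s in label_segments(news)]
```

In this toy input, the first two segments are labeled directly by the patterns, the next two inherit a name from their acoustic cluster, and the last stays unlabeled because no pattern or cluster reaches it, which mirrors why only part of a corpus receives an identity.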
Document type:
Conference paper
Content-Based Multimedia Indexing (CBMI2005), 2005, Riga, Latvia. 2005

Cited literature: 14 references

Contributor: Marie-Christine Fauvet
Submitted on: Monday, March 3, 2014 - 14:38:08
Last modified on: Thursday, October 11, 2018 - 08:48:03
Document(s) archived on: Sunday, April 9, 2017 - 19:43:08




  • HAL Id : hal-00953917, version 1


Mbarek Charhad, Daniel Moraru, Stéphane Ayache, Georges Quénot. Speaker Identity Indexing In Audio-Visual Documents. Content-Based Multimedia Indexing (CBMI2005), 2005, Riga, Latvia. 2005. 〈hal-00953917〉


