Speaker Identity Indexing In Audio-Visual Documents

Mbarek Charhad; Daniel Moraru; Stéphane Ayache; Georges Quénot

Conference paper, Year: 2005


Mbarek Charhad

Role: Author
PersonId : 861488

Communication Langagière et Interaction Personne-Système

Daniel Moraru

Role: Author

Equipe GEOD, Groupe d'étude sur l'oral et le dialogue

Stéphane Ayache

Role: Author
PersonId : 16733
IdHAL : stephane-ayache
ORCID : 0000-0003-2982-7127
IdRef : 129313254

Modélisation et Recherche d’Information Multimédia [Grenoble]

Georges Quénot

Role: Author
PersonId : 3114
IdHAL : georges-quenot
ORCID : 0000-0003-2117-247X
IdRef : 034104518

Modélisation et Recherche d’Information Multimédia [Grenoble]

Abstract

The identity of the persons appearing in audiovisual documents is important semantic information for content-based indexing and retrieval. Speaker identity detection can exploit data from several modalities (text, image and audio). In this article, we propose an approach for speaker identity indexing in broadcast news using the audio content. After a speaker segmentation phase, an identity is assigned to speech segments by applying linguistic patterns to their speech recognition transcripts. Three types of patterns are used to predict the speaker of the previous, current and next speech segments. Predictions are then propagated to other segments by acoustic similarity. Evaluations were conducted on part of the TREC 2003 corpus: a speaker identity could be assigned to 53% of the annotated corpus with 82% precision.
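The pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the paper: the `Segment` type, the three regular expressions, and the use of a precomputed integer cluster label as a stand-in for acoustic similarity. The paper's actual linguistic patterns and propagation method are not given on this page.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    transcript: str                 # speech recognition output for the segment
    cluster: int                    # hypothetical acoustic cluster label
    speaker: Optional[str] = None   # identity, once assigned


# Naive "First Last" name matcher (illustrative only).
NAME = r"([A-Z][a-z]+ [A-Z][a-z]+)"

# Each pattern predicts the speaker of the previous (-1), current (0)
# or next (+1) segment. These three patterns are invented examples.
PATTERNS = [
    (re.compile(r"[Tt]hank you,? " + NAME), -1),
    (re.compile(r"I'm " + NAME), 0),
    (re.compile(NAME + r" reports"), +1),
]


def label_by_patterns(segments: List[Segment]) -> None:
    """Assign identities to segments from names found in the transcripts."""
    for i, seg in enumerate(segments):
        for regex, offset in PATTERNS:
            match = regex.search(seg.transcript)
            target = i + offset
            if match and 0 <= target < len(segments):
                # Keep the first prediction made for a segment.
                if segments[target].speaker is None:
                    segments[target].speaker = match.group(1)


def propagate_by_cluster(segments: List[Segment]) -> None:
    """Copy an identity to unlabeled segments of the same acoustic cluster."""
    names = {}
    for seg in segments:
        if seg.speaker is not None:
            names.setdefault(seg.cluster, seg.speaker)
    for seg in segments:
        if seg.speaker is None:
            seg.speaker = names.get(seg.cluster)


if __name__ == "__main__":
    news = [
        Segment("Good evening, I'm Peter Jennings. John Smith reports from Baghdad.", 0),
        Segment("The situation here remains tense tonight.", 1),
        Segment("Thank you, John Smith. In other news today", 0),
        Segment("Back in Baghdad the streets are quiet.", 1),
        Segment("No names are mentioned here.", 2),
    ]
    label_by_patterns(news)
    propagate_by_cluster(news)
    for seg in news:
        print(seg.cluster, seg.speaker)
```

In this toy run the patterns label only the first two segments directly. Propagation then labels the third and fourth through their shared clusters, and the fifth stays unlabeled because no name was ever attached to its cluster. That mirrors why the reported coverage (53%) is partial: a segment receives an identity only if a pattern fires for it or for an acoustically similar segment.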

Domains

Information retrieval [cs.IR]

Main file

CBMI05.pdf (153.7 KB)

Origin: Files produced by the author(s)

Contributor: Marie-Christine Fauvet

https://inria.hal.science/hal-00953917

Submitted on: Monday, March 3, 2014, 14:38:08

Last modified on: Thursday, April 4, 2024, 21:40:59

Long-term archiving on: Sunday, April 9, 2017, 19:43:08

Dates and versions

hal-00953917 , version 1 (03-03-2014)

Identifiers

HAL Id : hal-00953917 , version 1

Cite

Mbarek Charhad, Daniel Moraru, Stéphane Ayache, Georges Quénot. Speaker Identity Indexing In Audio-Visual Documents. Content-Based Multimedia Indexing (CBMI2005), 2005, Riga, Latvia. ⟨hal-00953917⟩


Collections

UGA IMAG CNRS LIG LIG_TDCGE_MRIM LIG_SIDCH

241 views

144 downloads
