Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation

Camille Guinaudeau (1), Guillaume Gravier (1), Pascale Sébillot (1)
(1) TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: Transcript-based topic segmentation of TV programs faces several difficulties: transcription errors, potentially short segments, and too few word repetitions to enforce lexical cohesion, i.e., the lexical relations within a text that give it a certain unity. To overcome these problems, we extend a probabilistic measure of lexical cohesion based on generalized probabilities with a unigram language model. On the one hand, confidence measures and semantic relations are considered as additional sources of information. On the other hand, language model interpolation techniques are investigated for better language model estimation. Experimental topic segmentation results are presented on two corpora with distinct characteristics, composed respectively of broadcast news and reports on current affairs. Significant improvements are obtained on both corpora, demonstrating the effectiveness of the extended lexical cohesion measure for spoken TV content as well as its genericity across different programs.
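The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the cohesion of a segment is scored as the log-probability of its words under a Laplace-smoothed unigram model estimated on that segment, that confidence measures simply replace unit word counts, and that interpolation is linear with a hypothetical weight `lam`. All function names and the toy transcript are illustrative.

```python
import math
from collections import defaultdict


def unigram_model(words, vocab, confidences=None):
    """Laplace-smoothed unigram model over `vocab`.

    Assumption: each word occurrence contributes its confidence score
    (1.0 when no scores are given) instead of a unit count.
    """
    if confidences is None:
        confidences = [1.0] * len(words)
    counts = defaultdict(float)
    for word, conf in zip(words, confidences):
        counts[word] += conf
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1.0) / total for w in vocab}


def interpolate(segment_lm, background_lm, lam):
    """Linear interpolation of two unigram models sharing one vocabulary."""
    return {w: lam * segment_lm[w] + (1.0 - lam) * background_lm[w]
            for w in segment_lm}


def lexical_cohesion(words, lm):
    """Log-probability of the segment's words under the unigram model `lm`."""
    return sum(math.log(lm[w]) for w in words)


# Toy transcript (hypothetical data) and one candidate segment.
transcript = "the senate vote the senate bill rain wind rain storm".split()
vocab = set(transcript)
segment = transcript[:6]
segment_conf = [0.9, 0.8, 0.6, 0.9, 0.7, 0.5]  # hypothetical ASR confidences

background_lm = unigram_model(transcript, vocab)
segment_lm = unigram_model(segment, vocab, confidences=segment_conf)
smoothed_lm = interpolate(segment_lm, background_lm, lam=0.7)
print(lexical_cohesion(segment, smoothed_lm))
```

In this sketch, a segment that repeats its own vocabulary receives a higher (less negative) score than a segment of the same length with no repetitions, which is the property a segmentation algorithm would exploit when comparing candidate boundaries.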
Document type: Journal article
Computer Speech and Language, Elsevier, 2012, 26 (2), pp.90-104

Cited literature [32 references]

https://hal.archives-ouvertes.fr/hal-00645705
Contributor: Guillaume Gravier
Submitted on: Wednesday, 30 November 2011 - 13:58:27
Last modified on: Thursday, 12 July 2018 - 12:32:08
Document(s) archived on: Thursday, 1 March 2012 - 02:20:36

File

guinaudeau.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00645705, version 1

Citation

Camille Guinaudeau, Guillaume Gravier, Pascale Sébillot. Enhancing lexical cohesion measure with confidence measures, semantic relations and language model interpolation for multimedia spoken content topic segmentation. Computer Speech and Language, Elsevier, 2012, 26 (2), pp.90-104. 〈hal-00645705〉

Metrics

Record views: 1646

File downloads: 320