Topic segmentation of TV-streams by watershed transform and vectorization - Inria - Institut national de recherche en sciences et technologies du numérique
Journal article, Computer Speech and Language, Year: 2015

Topic segmentation of TV-streams by watershed transform and vectorization

Abstract

Fine-grained segmentation of radio or TV broadcasts is an essential step for most multimedia processing tasks. Applying segmentation algorithms to the speech transcripts seems straightforward, yet most of these algorithms are poorly suited to short segments or noisy data. In this paper, we present a new segmentation technique inspired by the image analysis field and relying on a new way to compute similarities between candidate segments, called Vectorization. Vectorization makes it possible to match text segments that do not share common words; this property is shown to be particularly useful for transcripts in which transcription errors and short segments make segmentation difficult. This new topic segmentation technique is evaluated on two corpora of transcripts from French TV broadcasts, on which it outperforms existing state-of-the-art approaches by a wide margin.
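The abstract does not spell out how Vectorization is computed. As a rough illustration of the general idea of indirect comparison, the sketch below assumes that each segment is described by its similarities to a small set of pivot texts, and that two segments are then compared through those descriptions rather than through their own words. The pivot texts, the bag-of-words cosine, and every function name here are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words counts for a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())


def cosine_sparse(u, v):
    """Cosine similarity between two Counter-based word vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cosine_dense(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vectorize(text, pivots):
    """Describe a text by its similarity to each pivot text (assumed scheme)."""
    words = bow(text)
    return [cosine_sparse(words, bow(p)) for p in pivots]


# Hypothetical pivot texts and segments, invented for this example.
pivots = [
    "the striker scored a goal in the football match",
    "the parliament voted on the new budget law",
]
seg_a = "striker goal"
seg_b = "football match"
seg_c = "parliament budget"

# Direct comparison: seg_a and seg_b share no word.
direct_ab = cosine_sparse(bow(seg_a), bow(seg_b))

# Indirect comparison: both segments resemble the same pivot.
indirect_ab = cosine_dense(vectorize(seg_a, pivots), vectorize(seg_b, pivots))
indirect_ac = cosine_dense(vectorize(seg_a, pivots), vectorize(seg_c, pivots))

print(direct_ab, indirect_ab, indirect_ac)
```

In this toy setting the two sports segments have no word in common, so a direct word-overlap measure cannot relate them, whereas their pivot-based descriptions point the same way. This is the kind of property the abstract describes as useful for short or error-prone transcript segments.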
Main file
csl2015.pdf (988.59 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00998259, version 1 (13-11-2019)

Identifiers

Cite

Vincent Claveau, Sébastien Lefèvre. Topic segmentation of TV-streams by watershed transform and vectorization. Computer Speech and Language, 2015, 29 (1), pp.63-80. ⟨10.1016/j.csl.2014.04.006⟩. ⟨hal-00998259⟩