Regroupement des occurrences des mots hors-vocabulaire répétés en vue de leur modélisation pour la transcription d'émissions radio

Frederik Stouten; Irina Illina; Dominique Fohr

Conference paper, Year: 2010

Frederik Stouten

Role: Author

Analysis, perception and recognition of speech

Irina Illina

Role: Author
PersonId : 15663
IdHAL : irina-illina
IdRef : 120731746

Analysis, perception and recognition of speech

Dominique Fohr

Role: Author
PersonId : 15652
IdHAL : dominique-fohr
IdRef : 031092942

Analysis, perception and recognition of speech

Abstract

This paper describes a novel technique for clustering out-of-vocabulary (OOV) word tokens in a large-vocabulary continuous speech recognition (LVCSR) system used to transcribe broadcast news speech. The system consists of two blocks: (1) an OOV word detector and (2) a clustering module that operates on the detected OOV word segments. This combination detects repeated OOV words more reliably than the OOV detector alone. The paper focuses on the second block, the clustering algorithm, which is based on entropy estimation. The proposed algorithm performs better than a classical incremental clustering algorithm based on a distance threshold.
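For orientation, the baseline named in the abstract, incremental clustering with a distance threshold, can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not say how OOV segments are represented or compared, so the fixed-length feature vectors, the Euclidean distance, the running-mean centroid update, and the names `euclidean` and `incremental_cluster` are all assumptions made for the example. The entropy-based algorithm proposed in the paper is not sketched here because the abstract gives no details of it.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def incremental_cluster(segments: Sequence[Sequence[float]],
                        threshold: float) -> List[List[int]]:
    """Assign each segment, in order, to the nearest existing cluster if its
    centroid lies within `threshold`; otherwise open a new cluster.

    Returns the clusters as lists of segment indices.
    """
    centroids: List[List[float]] = []
    members: List[List[int]] = []

    for idx, seg in enumerate(segments):
        # Find the closest existing centroid, if any.
        best, best_dist = None, float("inf")
        for c, centroid in enumerate(centroids):
            d = euclidean(seg, centroid)
            if d < best_dist:
                best, best_dist = c, d

        if best is not None and best_dist <= threshold:
            # Join the cluster and update its centroid as a running mean.
            members[best].append(idx)
            n = len(members[best])
            centroids[best] = [m + (x - m) / n
                               for m, x in zip(centroids[best], seg)]
        else:
            # Too far from every cluster: start a new one.
            centroids.append(list(seg))
            members.append([idx])

    return members


# Hypothetical 2-D "segment" vectors, for illustration only.
toy_segments = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
clusters = incremental_cluster(toy_segments, threshold=1.0)
print(clusters)
```

The outcome of this kind of scheme depends on the chosen threshold and on the order in which segments arrive.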

Keywords

LVCSR, OOV, clustering

Domains

Human-Computer Interaction [cs.HC]

https://inria.hal.science/inria-00544140

Submitted on: Tuesday, December 7, 2010, 13:24:49

Last modified on: Friday, March 24, 2023, 14:52:53

Dates and versions

inria-00544140 , version 1 (07-12-2010)

Identifiers

HAL Id : inria-00544140 , version 1

Cite

Frederik Stouten, Irina Illina, Dominique Fohr. Regroupement des occurrences des mots hors-vocabulaire répétés en vue de leur modélisation pour la transcription d'émissions radio. 28ème Journées d'étude sur la parole - JEP'10, Université de Mons, May 2010, Mons, Belgium. ⟨inria-00544140⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA
