A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription

Denis Jouvet; David Langlois

Conference paper, Year: 2013

Denis Jouvet

Role: Author
PersonId : 15904
IdHAL : denis-jouvet
IdRef : 029418666

Analysis, perception and recognition of speech

David Langlois

Role: Author
PersonId : 298
IdHAL : david-langlois
IdRef : 070239509

Analysis, perception and recognition of speech

Abstract

This paper introduces a new neural-network-based approach for selecting the vocabulary of a speech transcription system. Large sets of text data can now be collected from web sources and used, in addition to more traditional text sources, to build language models for speech transcription systems. However, web sources yield large amounts of heterogeneous data, and as a consequence, standard vocabulary selection procedures based on unigram approaches tend to select undesirable items as new words. As an alternative to unigram-based and empirical manual selection approaches, this paper proposes a selection procedure that relies on a machine learning technique, namely neural networks. The paper presents and discusses the results obtained with the various selection procedures. The results of the neural-network-based selection are promising, and the approach can automatically take various kinds of detailed information into account in the selection process.
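
For readers unfamiliar with the unigram-based baseline the abstract refers to, the following is a minimal sketch of a generic frequency-ranked vocabulary selection. It is not the authors' procedure, which the abstract does not detail; the function name `select_vocabulary`, its parameters, and the toy corpora are hypothetical and chosen only for illustration.

```python
from collections import Counter

def select_vocabulary(corpora, vocab_size, min_count=1):
    """Generic unigram baseline (an illustrative assumption, not the paper's method).

    corpora: iterable of token lists, one list per text source.
    Returns up to vocab_size words, ranked by pooled frequency.
    """
    counts = Counter()
    for tokens in corpora:
        counts.update(tokens)
    # Rank by descending count; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, count in ranked if count >= min_count][:vocab_size]

# Toy data: the second source contains a typo ("teh"), standing in for noisy web text.
corpus_a = "the cat sat on the mat".split()
corpus_b = "the dog sat on teh mat".split()

small_vocab = select_vocabulary([corpus_a, corpus_b], vocab_size=4)
large_vocab = select_vocabulary([corpus_a, corpus_b], vocab_size=10)
```

In this toy setting, a large enough vocabulary budget admits the typo "teh" simply because it occurs in the data, which is the kind of undesirable item the abstract attributes to unigram-based selection on heterogeneous web sources.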

Keywords

vocabulary selection; neural network; language modeling; speech transcription; speech recognition

Domains

Signal and Image Processing [eess.SP]

https://inria.hal.science/hal-00834302

Submitted on: Friday, June 14, 2013, 16:22:08

Last modified on: Monday, September 11, 2023, 17:41:19

Dates and versions

hal-00834302 , version 1 (14-06-2013)

Identifiers

HAL Id : hal-00834302 , version 1

Cite

Denis Jouvet, David Langlois. A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription. TSD - 16th International Conference on Text, Speech and Dialogue - 2013, Sep 2013, Pilsen, Czech Republic. pp.60-67. ⟨hal-00834302⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD

143 views

0 downloads
