Speaker normalization for template based speech recognition

Sébastien Demange (1), Dirk Van Compernolle
(1) PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract: Vocal Tract Length Normalization (VTLN) has been shown to be an efficient speaker normalization tool for HMM-based systems. In this paper we show that it is equally efficient for a template-based recognition system. Template-based systems, while promising, have a potential drawback: in addition to the essential phonemic properties, templates retain all non-phonetic detail, i.e. information about the speaker and the acoustic recording conditions. This may lead to very inefficient use of the database. We show that after VTLN significantly more speakers, including speakers of the opposite gender, contribute templates to the matching sequence than in the non-normalized case. In experiments on the Wall Street Journal database this leads to a relative word error rate reduction of 10%.
Document type:
Conference paper
10th Annual Conference of the International Speech Communication Association - Interspeech 2009, Sep 2009, Brighton, United Kingdom. pp. 560-563, 2009

https://hal.inria.fr/inria-00583853
Contributor: Sébastien Demange
Submitted on: Wednesday, April 6, 2011 - 17:57:27
Last modified on: Thursday, January 11, 2018 - 06:19:56

Identifiers

  • HAL Id: inria-00583853, version 1

Citation

Sébastien Demange, Dirk Van Compernolle. Speaker normalization for template based speech recognition. 10th Annual Conference of the International Speech Communication Association - Interspeech 2009, Sep 2009, Brighton, United Kingdom. pp. 560-563, 2009. 〈inria-00583853〉