
inria-00583853, version 1

Speaker normalization for template based speech recognition

Sébastien Demange 1, Dirk Van Compernolle a

10th Annual Conference of the International Speech Communication Association - Interspeech 2009 (2009) 560–563

Abstract: Vocal Tract Length Normalization (VTLN) has been shown to be an efficient speaker normalization tool for HMM-based systems. In this paper we show that it is equally efficient for a template-based recognition system. Template-based systems, while promising, have the potential drawback that templates retain all non-phonetic details in addition to the essential phonemic properties; i.e., they retain information on the speaker and the acoustic recording circumstances. This may lead to a very inefficient usage of the database. We show that after VTLN significantly more speakers, including speakers of the opposite gender, contribute templates to the matching sequence than in the non-normalized case. In experiments on the Wall Street Journal database this leads to a relative word error rate reduction of 10%.

  • a –  Katholieke Universiteit Leuven
  • 1 :  PAROLE (INRIA Lorraine - LORIA)
  • INRIA – CNRS : UMR7503 – Université Henri Poincaré - Nancy I – Université Nancy II – Institut National Polytechnique de Lorraine (INPL)
  • Domain: Computer Science/Computation and Language
 
  • oai:hal.inria.fr:inria-00583853
  • Contributor: 
  • Submitted on: Wednesday, 6 April 2011, 17:57:27
  • Last modified on: Thursday, 7 April 2011, 11:08:37