Combining Forward-based and Backward-based Decoders for Improved Speech Recognition Performance

Denis Jouvet (1), Dominique Fohr (1)
(1) PAROLE - Analysis, perception and recognition of speech
Inria Nancy - Grand Est, LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Combining the outputs of several speech recognizers is a known way of improving speech recognition performance, and the ROVER approach handles such combinations efficiently. In this paper we show that the best performance is not achieved by combining the outputs of the best set of recognizers, but by combining the outputs of recognizers that rely on different processing components, and in particular on a different order (backward vs. forward) for processing the speech frames. Indeed, much better speech recognition results were obtained by combining the outputs of Sphinx-based recognizers with the outputs of Julius-based recognizers than by combining the same number of outputs from Sphinx-based recognizers only, even though the individual Sphinx-based systems performed better than the individual Julius-based recognizers. Further experiments were conducted using Sphinx-based tools that process the speech frames in reverse order (i.e., backward in time). The results clearly show that combining forward-based and backward-based decoders provides a significant improvement over combining forward-only or backward-only decoders. Experiments were conducted on the ESTER2 and ETAPE speech corpora. Overall, combining Sphinx-based and Julius-based systems led to an 18.6% word error rate on the ESTER2 test data and a 24.5% word error rate on the ETAPE test data.
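ROVER (Recognizer Output Voting Error Reduction) first aligns the word sequences produced by several recognizers into a word transition network and then votes slot by slot. The sketch below illustrates only the voting stage, assuming the hypotheses are already aligned into slots of equal count (the hypothetical marker `@` stands for an empty slot) and using plain frequency voting; the actual ROVER tool can also weight votes by word confidence scores, and its alignment step is not shown. The helper names `rover_vote` and `word_error_rate` and the toy data are invented for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import List

NULL = "@"  # hypothetical marker for an empty slot (no word from this system)


def rover_vote(aligned: List[List[str]]) -> List[str]:
    """Frequency-based vote over hypotheses that are ALREADY aligned.

    aligned[i][j] is the word that system i proposes for slot j, or NULL.
    This mirrors only the voting stage of ROVER; building the alignment
    (the word transition network) is assumed to have been done elsewhere.
    """
    if not aligned:
        return []
    n_slots = len(aligned[0])
    if any(len(hyp) != n_slots for hyp in aligned):
        raise ValueError("all hypotheses must have the same number of slots")

    output: List[str] = []
    for j in range(n_slots):
        votes = Counter(hyp[j] for hyp in aligned)
        # most_common orders ties by first occurrence, so on a tie the
        # word from the earliest-listed system wins (an arbitrary choice).
        word, _ = votes.most_common(1)[0]
        if word != NULL:  # a winning empty slot means "output nothing here"
            output.append(word)
    return output


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """(substitutions + deletions + insertions) / len(ref), via Levenshtein."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Invented toy data: three systems, each making a different error.
    reference = ["the", "cat", "sat", "down"]
    systems = [
        ["the", "cat", "sat", NULL],    # deletes "down"
        ["the", "bat", "sat", "down"],  # substitutes "bat"
        ["a", "cat", "sat", "down"],    # substitutes "a"
    ]
    for hyp in systems:
        words = [w for w in hyp if w != NULL]
        print(f"single system WER: {word_error_rate(reference, words):.2f}")
    combined = rover_vote(systems)
    print("combined:", combined, f"WER: {word_error_rate(reference, combined):.2f}")
```

Running the script prints a WER of 0.25 for each single system and 0.00 for the voted output. The toy data are deliberately chosen so that the systems' errors do not overlap; when systems make the same mistakes, voting cannot correct them. This is one intuition for the abstract's observation that combining recognizers built from different components, including different frame-processing orders, helps more than combining similar ones.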
Document type: Conference paper
InterSpeech - 14th Annual Conference of the International Speech Communication Association - 2013, Aug 2013, Lyon, France. 2013

https://hal.inria.fr/hal-00834282
Contributor: Denis Jouvet
Submitted on: Friday, June 14, 2013 - 15:54:54
Last modified on: Thursday, January 11, 2018 - 06:25:24

Identifiers

  • HAL Id: hal-00834282, version 1

Citation

Denis Jouvet, Dominique Fohr. Combining Forward-based and Backward-based Decoders for Improved Speech Recognition Performance. InterSpeech - 14th Annual Conference of the International Speech Communication Association - 2013, Aug 2013, Lyon, France. 2013. 〈hal-00834282〉
