Deep-neural network approaches for speech recognition with heterogeneous groups of speakers including children

Abstract: This paper introduces deep neural network (DNN)-hidden Markov model (HMM) based methods to tackle speech recognition in heterogeneous groups of speakers including children. We target three speaker groups: children, adult males and adult females. Two kinds of approaches are introduced: approaches based on DNN adaptation and approaches relying on vocal-tract length normalisation (VTLN). First, the recent approach that consists of adapting a general DNN to domain/language specific data is extended to target age/gender groups in the context of DNN-HMM. Then, VTLN is investigated by training a DNN-HMM system using either mel-frequency cepstral coefficients (MFCC) normalised with standard VTLN or MFCC-derived acoustic features combined with the posterior probabilities of the VTLN warping factors. In this latter, novel approach, the posterior probabilities of the warping factors are obtained with a separate DNN and decoding can be performed in a single pass, whereas the standard VTLN approach requires two decoding passes. Finally, the different approaches presented here are combined to take advantage of their complementarity. System combination approaches are shown to improve the baseline phone error rate by 30% to 35% relative and the baseline word error rate by about 10% relative.
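
As an illustration of the second VTLN variant, the following minimal Python sketch appends warping-factor posteriors to MFCC frames so that a single feature vector carries both. It is not the authors' implementation: the function names, the toy feature values, and the choice of one score vector per utterance turned into posteriors by a softmax are assumptions made for this example only. In the paper the posteriors come from a separate DNN, which is replaced here by hand-written scores.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def augment_frames(mfcc_frames, warp_scores):
    """Append warping-factor posteriors to every MFCC frame.

    mfcc_frames: list of frames, each a list of MFCC values.
    warp_scores: hypothetical raw scores, one per candidate warping factor
                 (assumed here to be shared by all frames of an utterance).
    """
    posteriors = softmax(warp_scores)
    return [list(frame) + posteriors for frame in mfcc_frames]


# Toy data (assumed values, three MFCCs per frame, three warping factors).
mfcc_frames = [[1.0, 2.0, 3.0], [0.5, 0.1, -0.2]]
warp_scores = [0.2, 1.5, -0.3]
augmented = augment_frames(mfcc_frames, warp_scores)
```

Each augmented frame keeps the original MFCC values and gains one extra dimension per candidate warping factor, which is what allows a single decoding pass over the combined features.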
Document type:
Journal article
Natural Language Engineering, Cambridge University Press (CUP), 2016, 1, pp.0 - 0

Cited literature [54 references]

https://hal.inria.fr/hal-01390905
Contributor: Romain Serizel
Submitted on: Tuesday, 8 November 2016 - 13:28:55
Last modified on: Thursday, 11 January 2018 - 06:23:39
Document(s) archived on: Tuesday, 14 March 2017 - 22:24:25

File

15-1.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: hal-01390905, version 1

Citation

Romain Serizel, Diego Giuliani. Deep-neural network approaches for speech recognition with heterogeneous groups of speakers including children. Natural Language Engineering, Cambridge University Press (CUP), 2016, 1, pp.0 - 0. 〈hal-01390905〉
