Non-linear vector interpolation by neural network for phoneme identification in continuous speech

Abstract: In acoustic-phonetic decoding of speech, the correlation between vectors in a sequence of analysis frames is assumed to be specific to phonetic units. We propose non-linear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on decomposing a frame sequence into two parts and constructing a function that interpolates one part using information from the other. Depending on the quantities to be interpolated, three families of interpolator models are developed. In a recognition system, each phonetic symbol is associated with a non-linear vector interpolator that is trained to give minimum interpolation error for that specific phoneme. Multi-layer feedforward neural networks are used to implement the non-linear vector interpolators. In a continuous-speech phoneme spotting test using 16 LPCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave comparable results. Vector-pair interpolator models yielded the best recognition rate. Compared to a VQ-coded reference technique, this model gives a similar overall recognition rate and performs significantly better on plosive sounds.
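The decision rule described in the abstract (one interpolator per phoneme, with the phoneme chosen by minimum interpolation error) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the report: the window is decomposed by predicting the centre frame from its neighbours, the network has a single fixed random tanh hidden layer whose output layer is fitted by least squares (a stand-in for the report's training procedure, which the abstract does not detail), and the data are synthetic. The names `split_window`, `Interpolator`, `make_windows`, `train` and `classify` are hypothetical.

```python
import numpy as np


def split_window(frames):
    """Split a (T, D) window into (context, target).

    Assumed decomposition: the centre frame is the part to interpolate,
    the remaining frames (flattened) are the information used to do so.
    """
    mid = frames.shape[0] // 2
    context = np.delete(frames, mid, axis=0).ravel()
    return context, frames[mid]


class Interpolator:
    """One-hidden-layer feedforward interpolator for a single phoneme.

    Hidden weights are random and fixed; only the output layer is fitted,
    by least squares. This is a simplification, not the report's method.
    """

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.V = None

    def _features(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.hstack([H, np.ones((len(H), 1))])  # append a bias column

    def fit(self, X, Y):
        self.V, *_ = np.linalg.lstsq(self._features(X), Y, rcond=None)
        return self

    def predict(self, X):
        return self._features(X) @ self.V

    def error(self, window):
        """Squared interpolation error of this model on one window."""
        context, target = split_window(window)
        estimate = self.predict(context[None, :])[0]
        return float(np.sum((estimate - target) ** 2))


def make_windows(offset, n, rng):
    """Synthetic stand-in for speech frames: 3 frames of 2 coefficients,
    centre frame = mean of its neighbours plus a class-specific offset."""
    w = rng.uniform(-1.0, 1.0, size=(n, 3, 2))
    w[:, 1, :] = 0.5 * (w[:, 0, :] + w[:, 2, :]) + offset
    return w


def train(windows, n_hidden, rng):
    """Fit one interpolator on windows that all belong to one phoneme."""
    pairs = [split_window(w) for w in windows]
    X = np.array([c for c, _ in pairs])
    Y = np.array([t for _, t in pairs])
    return Interpolator(X.shape[1], n_hidden, rng).fit(X, Y)


def classify(window, models):
    """Return the label whose interpolator gives the smallest error."""
    return min(models, key=lambda label: models[label].error(window))


rng = np.random.default_rng(0)
models = {
    "A": train(make_windows(+1.0, 400, rng), 32, rng),
    "B": train(make_windows(-1.0, 400, rng), 32, rng),
}
print(classify(make_windows(+1.0, 1, rng)[0], models))
```

In this toy setting each model reproduces only the frame dynamics of its own class, so a window from the other class produces a large interpolation error. That is the property the minimum-error rule relies on.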
Document type:
Report
[Research Report] RR-1457, INRIA. 1991

https://hal.inria.fr/inria-00075104
Contributor: Rapport de Recherche Inria
Submitted on: Wednesday, 24 May 2006 - 17:25:41
Last modified on: Saturday, 17 September 2016 - 01:06:49
Document(s) archived on: Tuesday, 12 April 2011 - 21:09:14

Identifiers

  • HAL Id : inria-00075104, version 1

Citation

Yifan Gong, Jean-Paul Haton. Non-linear vector interpolation by neural network for phoneme identification in continuous speech. [Research Report] RR-1457, INRIA. 1991. 〈inria-00075104〉
