Speaker-Dependent Emotion Recognition For Audio Document Indexing

Xuan Hung Le 1, Georges Quénot 1, Eric Castelli 2
1 MRIM - Modélisation et Recherche d’Information Multimédia [Grenoble]
LIG - Laboratoire d'Informatique de Grenoble, Inria - Institut National de Recherche en Informatique et en Automatique
Abstract: Emotion research is currently of great interest in speech processing and in human-machine interaction. In recent years, a growing body of work on emotion synthesis and emotion recognition has been developed for various purposes. Each approach uses its own methods and its own parameters measured on the speech signal. In this paper, we propose using a short-time parameter, MFCC (Mel-Frequency Cepstral Coefficients), together with a simple but efficient classification method, Vector Quantization (VQ), for speaker-dependent emotion recognition. Many other features (energy, pitch, zero crossing, phonetic rate, LPC, etc.) and their derivatives are also tested and combined with the MFCC coefficients to find the best combination. Other models, GMM and HMM (discrete and continuous Hidden Markov Models), are studied as well, in the hope that continuous distributions and the temporal behaviour of this feature set will improve the quality of emotion recognition. The maximum accuracy in recognizing five different emotions exceeds 88% using only MFCC coefficients with the VQ model. This simple but efficient approach gives results much better than those obtained on the same database in a human evaluation, in which listeners judged each sentence without being allowed to go back or to compare sentences [8], and it compares favourably with other approaches.
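As an illustration of the VQ classification described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that MFCC frames have already been extracted into arrays of shape (frames, coefficients), that one codebook is trained per emotion with plain k-means, and that an utterance is assigned to the emotion whose codebook yields the lowest average distortion. The function names, the codebook size, the emotion labels and the synthetic data are all hypothetical.

```python
import numpy as np


def train_codebook(frames, size=4, iters=20, seed=0):
    """Fit a VQ codebook to feature frames with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise the codewords from randomly chosen distinct frames.
    codebook = frames[rng.choice(len(frames), size=size, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        # Move each codeword to the mean of the frames assigned to it.
        for k in range(size):
            members = frames[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook


def distortion(frames, codebook):
    """Average distance from each frame to its nearest codeword."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.min(axis=1).mean()


def classify(frames, codebooks):
    """Return the label whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda label: distortion(frames, codebooks[label]))


# Synthetic stand-ins for MFCC frames (assumption: 12 coefficients per frame).
rng = np.random.default_rng(1)
training = {
    "anger": rng.normal(0.0, 1.0, (200, 12)),    # hypothetical emotion label
    "sadness": rng.normal(5.0, 1.0, (200, 12)),  # hypothetical emotion label
}
codebooks = {label: train_codebook(frames) for label, frames in training.items()}
predicted = classify(rng.normal(5.0, 1.0, (50, 12)), codebooks)
```

In a speaker-dependent setting, the codebooks would be trained on utterances from the same speaker whose emotions are later recognized.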
Keywords: Emotion Recognition
Document type:
Conference paper
International Conference on Electronics, Information, 2004, Unknown, 2004

Contributor: Marie-Christine Fauvet
Submitted on: Friday, February 28, 2014 - 16:06:42
Last modified on: Thursday, January 11, 2018 - 06:23:16


  • HAL Id : hal-00953924, version 1



Xuan Hung Le, Georges Quénot, Eric Castelli. Speaker-Dependent Emotion Recognition For Audio Document Indexing. International Conference on Electronics, Information, 2004, Unknown, 2004. 〈hal-00953924〉


