Extraction of formants of oral vowels and critical analysis for speaker characterization - Inria - Institut national de recherche en sciences et technologies du numérique
Conference paper, Year: 1994

Extraction of formants of oral vowels and critical analysis for speaker characterization

Odile Mella

Abstract

Methods for automatic speaker recognition fall into two categories: pattern-recognition approaches, which implicitly use the interspeaker and intraspeaker variability of speech, and approaches that explicitly take into account the sources of interspeaker and intraspeaker differences. The latter examine linguistic units in order to extract features that are relevant for speaker characterization. The aim of the present paper is to study the relative effectiveness of the first three formants of different French vowels for speaker characterization. As part of a larger set of preselected acoustic and phonetic parameters, the seven French vowels /i/, /e/, /E/, /0/, /a/, /O/, /u/ have been studied, with a neutral bilabial preceding context (/p/ or /b/) and a lengthening following context (/R/). For that purpose, we recorded and digitized a set of seventeen sentences, each uttered four times by ten male speakers from the same region. In order to isolate the trigrams /p-vowel-R/ and /b-vowel-R/, we hand-labeled the sentences according to strict rules. We then established an automatic method to determine very reliable values of the frequencies of the first three formants of the selected vowels. The retained frequencies were then used to conduct a speaker identification experiment. Its aim was to identify an unknown speaker from a group of ten known speakers by using his utterance of a given vowel. To this end, a speaker was represented by a vector of one, two or three formant frequencies, or by a vector of one, two or three differences between two formant frequencies. For each vowel and for each type of vector, i.e. each combination of formant frequencies, three "relevance indicators" have been computed: the global speaker identification rate, the sum of the recognition ranks of every speaker, and the ratio of intraspeaker to interspeaker inertia. These indicators have been established for five kinds of weighted distance, including a perceptual one. In the first part of this paper, we present our methodology for evaluating the formant frequencies of the vowels and we discuss the reliability of the results. In the second part, we examine the relative effectiveness of every vowel for each combination of formant frequencies, focusing on an interpretation of the results with respect to the speech production process. We also compare our results with those obtained in normalization studies, in particular with the non-uniform female/male formant frequency ratios (ki).
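The three relevance indicators named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the classifier, the inertia formulas, or the five weighted distances, so the code assumes a leave-one-out nearest-speaker-mean classifier, a diagonally weighted Euclidean distance (the hypothetical `weights` argument), and the usual within-group and between-group sums of squares for the inertias. The function name `relevance_indicators` and the demo values are invented for illustration and are not data from the study.

```python
import numpy as np

def relevance_indicators(X, y, weights=None):
    """Return (identification rate, sum of recognition ranks, intra/inter inertia ratio).

    X: (n_utterances, n_features) formant frequencies or formant differences, in Hz.
    y: speaker label of each utterance (each speaker needs at least two utterances).
    weights: per-feature weights of the distance (assumption: diagonal weighting).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    speakers = np.unique(y)
    w = np.ones(X.shape[1]) if weights is None else np.asarray(weights, dtype=float)

    correct, rank_sum = 0, 0
    for i in range(len(X)):
        # Leave-one-out: speaker means are computed without utterance i.
        keep = np.ones(len(X), dtype=bool)
        keep[i] = False
        means = np.array([X[keep & (y == s)].mean(axis=0) for s in speakers])
        dist = np.sqrt((w * (means - X[i]) ** 2).sum(axis=1))
        # Rank of the true speaker among all candidates (1 = closest).
        order = np.argsort(dist)
        rank = int(np.where(speakers[order] == y[i])[0][0]) + 1
        rank_sum += rank
        correct += rank == 1

    # Intraspeaker inertia: spread of utterances around their own speaker mean.
    # Interspeaker inertia: spread of speaker means around the grand mean.
    grand_mean = X.mean(axis=0)
    intra, inter = 0.0, 0.0
    for s in speakers:
        Xs = X[y == s]
        m = Xs.mean(axis=0)
        intra += (w * (Xs - m) ** 2).sum()
        inter += len(Xs) * (w * (m - grand_mean) ** 2).sum()

    return correct / len(X), rank_sum, intra / inter

# Made-up (F1, F2) values for 3 hypothetical speakers, 4 utterances each.
centres = np.array([[300.0, 2200.0], [500.0, 1500.0], [700.0, 1100.0]])
offsets = np.array([-10.0, -5.0, 5.0, 10.0])
X_demo = np.concatenate([c + offsets[:, None] for c in centres])
y_demo = np.repeat(np.arange(3), 4)

rate, rank_sum, ratio = relevance_indicators(X_demo, y_demo)
print(rate, rank_sum, ratio)
```

Under these assumptions, a feature combination is more useful for speaker characterization when the identification rate is high, the sum of ranks is close to the number of test utterances, and the intraspeaker-to-interspeaker inertia ratio is small.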
Main file
complet2.pdf (165.94 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01739608, version 1 (21-03-2018)

Identifiers

  • HAL Id: hal-01739608, version 1

Cite

Odile Mella. Extraction of formants of oral vowels and critical analysis for speaker characterization. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, Apr 1994, Martigny, Switzerland. pp.193-196. ⟨hal-01739608⟩