Extraction of formants of oral vowels and critical analysis for speaker characterization

Odile Mella 1
1 PAROLE - Analysis, perception and recognition of speech
INRIA Lorraine, LORIA - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Abstract : Methods for automatic speaker recognition fall into two categories: pattern-recognition-based approaches, which implicitly use the interspeaker and intraspeaker variability of speech, and approaches that explicitly take into account the sources of interspeaker and intraspeaker differences. The latter examine linguistic units in order to extract features that are relevant for speaker characterization. The aim of the present paper is to study the relative effectiveness of the first three formants of different French vowels for speaker characterization. As part of a larger set of preselected acoustic and phonetic parameters, the seven French vowels / i /, / e /, / E /, / 0 /, / a /, / O /, / u / have been studied with a neutral bilabial preceding context, / p / or / b /, and a lengthening following context, / R /. For that purpose, we recorded and digitized a set of seventeen sentences, each uttered four times by ten male speakers from the same region. In order to isolate the trigrams / p-vowel-R / and / b-vowel-R /, we hand-labeled the sentences according to strict rules. We then established an automatic method to determine highly reliable values of the frequencies of the first three formants of the selected vowels. The retained frequencies were then used to conduct a speaker identification experiment, whose aim was to identify an unknown speaker from a group of ten known speakers by using his utterance of a given vowel. To this end, a speaker was represented by a vector of one, two or three formant frequencies, or by a vector of one, two or three differences between two formant frequencies. For each vowel and for each type of vector, i.e. each combination of formant frequencies, three "relevance indicators" were computed: the global speaker identification rate, the sum of the recognition ranks of every speaker, and the ratio of intraspeaker to interspeaker inertia. These indicators were established for five kinds of weighted distance, including a perceptual one. In the first part of this paper, we present our methodology for evaluating the formant frequencies of the vowels and discuss the reliability of the results. In the second part, we examine the relative effectiveness of every vowel for each combination of formant frequencies, focusing on an interpretation of the results with respect to the speech production process. We also compare our results with those obtained in normalization studies, in particular with the non-uniform female/male formant frequency ratios (ki).
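The three relevance indicators named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the evaluation protocol or the five distance weightings, so everything below is an assumption made for illustration: a leave-one-out nearest-centroid classifier, a weighted squared Euclidean distance, the rank of the true speaker as the "recognition rank", and synthetic formant values. The function names (`centroid`, `weighted_sq_dist`, `relevance_indicators`) are hypothetical and do not come from the paper.

```python
import random
from statistics import mean


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def weighted_sq_dist(x, y, weights):
    """Weighted squared Euclidean distance (an assumed form of weighting)."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))


def relevance_indicators(data, weights):
    """Compute three indicators for one vowel and one type of vector.

    data maps a speaker label to a list of formant vectors, one per
    repetition (at least two repetitions per speaker are required).
    Returns (identification rate, sum of ranks, intra/inter inertia ratio).
    """
    speakers = sorted(data)
    correct = rank_sum = n_tests = 0
    for target in speakers:
        for i, test in enumerate(data[target]):
            # Leave-one-out: the test token is excluded from the
            # reference centroid of its own speaker.
            refs = {}
            for s in speakers:
                tokens = data[s]
                if s == target:
                    tokens = tokens[:i] + tokens[i + 1:]
                refs[s] = centroid(tokens)
            ordered = sorted(
                speakers,
                key=lambda s: weighted_sq_dist(test, refs[s], weights),
            )
            rank = ordered.index(target) + 1  # 1 means correctly identified
            correct += rank == 1
            rank_sum += rank
            n_tests += 1

    # Inertia: spread of tokens around their speaker centroid (intra)
    # versus spread of speaker centroids around the grand centroid (inter).
    centroids = {s: centroid(data[s]) for s in speakers}
    grand = centroid([v for s in speakers for v in data[s]])
    intra = sum(
        weighted_sq_dist(v, centroids[s], weights)
        for s in speakers for v in data[s]
    )
    inter = sum(
        len(data[s]) * weighted_sq_dist(centroids[s], grand, weights)
        for s in speakers
    )
    return correct / n_tests, rank_sum, intra / inter


# Synthetic example: 10 speakers, 4 repetitions, vectors (F1, F2, F3) in Hz.
# The numbers are invented and carry no phonetic meaning.
rng = random.Random(0)
toy = {}
for k in range(10):
    base = [rng.uniform(250, 350), rng.uniform(2000, 2400), rng.uniform(2800, 3300)]
    toy[f"speaker{k}"] = [[f + rng.gauss(0, 30) for f in base] for _ in range(4)]

rate, ranks, ratio = relevance_indicators(toy, weights=[1.0, 1.0, 1.0])
print(f"identification rate={rate:.2f}, rank sum={ranks}, inertia ratio={ratio:.3f}")
```

Under these assumptions, a more discriminating combination of formants yields a higher identification rate, a rank sum closer to the number of test tokens, and a smaller inertia ratio. Vectors of formant differences can be evaluated the same way by passing, for example, `[F2 - F1]` in place of `[F1, F2, F3]`.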
Document type :
Conference papers

https://hal.inria.fr/hal-01739608
Contributor : Odile Mella
Submitted on : Wednesday, March 21, 2018 - 11:16:19 AM
Last modification on : Thursday, March 22, 2018 - 12:11:03 PM
Document(s) archived on : Thursday, September 13, 2018 - 8:39:51 AM

File

complet2.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01739608, version 1

Citation

Odile Mella. Extraction of formants of oral vowels and critical analysis for speaker characterization. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, Apr 1994, Martigny, Switzerland. pp.193-196. ⟨hal-01739608⟩
