Conference paper, Year: 2021

Learnable MFCCs for Speaker Verification

Abstract

We propose a learnable mel-frequency cepstral coefficients (MFCCs) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank, and discrete cosine transform (DCT). The reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort.
Index Terms: speaker verification, feature extraction, mel-frequency cepstral coefficients (MFCCs).
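To make the four linear transforms named in the abstract concrete, here is a minimal NumPy sketch of a static (non-learned) MFCC extractor written as a chain of matrix operations. It is an illustration only, not the authors' implementation: the function names, the Hamming window, and the sizes (512-sample frames, 40 mel filters, 20 coefficients, 16 kHz) are assumptions chosen for the example. A learnable variant would presumably treat `window`, `dft`, `mel_fb` and `dct` as trainable parameters initialized from these static values; how the paper parameterizes or constrains them is not stated in the abstract.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def dft_matrix(n_fft):
    """Complex DFT matrix for non-negative frequencies, shape (n_fft//2+1, n_fft)."""
    k = np.arange(n_fft // 2 + 1)[:, None]
    n = np.arange(n_fft)[None, :]
    return np.exp(-2j * np.pi * k * n / n_fft)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filterbank, shape (n_mels, n_fft//2+1)."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def dct_matrix(n_ceps, n_mels):
    """First n_ceps rows of the orthonormal DCT-II basis, shape (n_ceps, n_mels)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    basis = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    basis[0] /= np.sqrt(2.0)
    return basis

def mfcc(frames, window, dft, mel_fb, dct, eps=1e-10):
    """frames: (n_frames, n_fft) -> MFCCs: (n_frames, n_ceps)."""
    windowed = frames * window          # 1) windowing (a diagonal linear map)
    spectrum = windowed @ dft.T         # 2) DFT
    power = np.abs(spectrum) ** 2       # nonlinearity between linear stages
    mel_energy = power @ mel_fb.T       # 3) mel filterbank
    log_mel = np.log(mel_energy + eps)  # nonlinearity between linear stages
    return log_mel @ dct.T              # 4) DCT

# Hypothetical sizes, chosen only for this example.
n_fft, n_mels, n_ceps, sr = 512, 40, 20, 16000
rng = np.random.default_rng(0)
frames = rng.standard_normal((5, n_fft))  # stand-in for framed speech
window = np.hamming(n_fft)
feats = mfcc(frames, window, dft_matrix(n_fft),
             mel_filterbank(n_mels, n_fft, sr), dct_matrix(n_ceps, n_mels))
print(feats.shape)  # (5, 20)
```

Writing each stage as an explicit array makes it clear which quantities a data-driven front end could update by gradient descent, while the power and logarithm steps stay fixed.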
Main file
ISCAS_2021_Xuechen.pdf (311.35 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03139532, version 1 (12 February 2021)

Cite

Xuechen Liu, Md Sahidullah, Tomi Kinnunen. Learnable MFCCs for Speaker Verification. ISCAS 2021 - IEEE International Symposium on Circuits and Systems, May 2021, Daegu, South Korea. ⟨10.1109/ISCAS51556.2021.9401593⟩. ⟨hal-03139532⟩