Learnable Nonlinear Compression for Robust Speaker Verification

Xuechen Liu; Md Sahidullah; Tomi Kinnunen

doi:10.1109/ICASSP43922.2022.9747185

Conference paper. Year: 2022

Xuechen Liu

Role: Author

Speech Modeling for Facilitating Oral-Based Communication

University of Eastern Finland

Md Sahidullah

Role: Author
PersonId : 737397
IdHAL : sahid

Speech Modeling for Facilitating Oral-Based Communication

Tomi Kinnunen

Role: Author

University of Eastern Finland

Abstract

In this study, we focus on nonlinear compression methods for spectral features in speaker verification based on deep neural networks. We consider several channel-dependent (CD) nonlinear compression methods optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design for the nonlinearities, aimed at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for those based on the power function. While the CD generalization improves performance on VoxCeleb1, MR provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
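To make the idea of channel-dependent power compression concrete, the following is a minimal sketch, not the authors' implementation. It assumes a spectrogram stored as a list of frames, each holding non-negative energies per frequency channel, and one exponent per channel. The function names (`power_compress`, `log_compress`, `grad_alpha`) and the toy values are hypothetical, chosen only for illustration. The abstract does not give the exact parameterization, the DRC variant, or the multi-regime design, so none of those are reproduced here.

```python
import math

def power_compress(spec, alphas):
    """Channel-dependent power compression: y[t][c] = spec[t][c] ** alphas[c].

    spec   -- list of frames, each a list of non-negative channel energies
    alphas -- one exponent per channel (assumed parameterization)
    """
    return [[e ** a for e, a in zip(frame, alphas)] for frame in spec]

def log_compress(spec, eps=1e-6):
    """Static logarithmic compression, the common baseline; eps avoids log(0)."""
    return [[math.log(e + eps) for e in frame] for frame in spec]

def grad_alpha(spec, alphas, upstream):
    """Gradient of a loss w.r.t. each exponent, given the upstream gradient
    dL/dy. Uses d(e ** a)/da = (e ** a) * ln(e); zero energies contribute 0.
    """
    grads = [0.0] * len(alphas)
    for frame, up in zip(spec, upstream):
        for c, (e, a) in enumerate(zip(frame, alphas)):
            if e > 0.0:
                grads[c] += up[c] * (e ** a) * math.log(e)
    return grads

# Toy example: 2 frames x 3 channels (values are illustrative only).
spec = [[1.0, 4.0, 16.0], [1.0, 9.0, 0.25]]

# Static counterpart: the same exponent shared by all channels.
static = power_compress(spec, [0.5, 0.5, 0.5])

# Channel-dependent variant: each channel has its own exponent.
alphas = [0.5, 0.3, 0.1]
cd = power_compress(spec, alphas)

# One plain gradient-descent step on the exponents, with a dummy upstream
# gradient of ones standing in for the gradient from a speaker-verification loss.
upstream = [[1.0] * 3 for _ in spec]
grads = grad_alpha(spec, alphas, upstream)
alphas_updated = [a - 0.01 * g for a, g in zip(alphas, grads)]
```

In a real system the exponents would be trained jointly with the network by an automatic-differentiation framework; the hand-written gradient here only shows that the exponents are differentiable and can therefore be optimized in a data-driven manner.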

Keywords

Speaker Verification; Nonlinear Compression; Multi-Regime Compression

Domains

Computer Vision and Pattern Recognition [cs.CV]; Machine Learning [cs.LG]; Sound [cs.SD]; Signal and Image Processing [eess.SP]

Main file

LearnableNonlinear_ICASSP2022.pdf (2 MB)

Origin: Files produced by the author(s)

Contributor: Md Sahidullah

https://inria.hal.science/hal-03616852

Submitted on: Wednesday, March 23, 2022, 04:20:09

Last modified on: Monday, September 11, 2023, 17:41:19

Long-term archiving on: Friday, June 24, 2022, 18:15:16

Dates and versions

hal-03616852 , version 1 (23-03-2022)

Identifiers

HAL Id : hal-03616852 , version 1
ARXIV : 2202.05236
DOI : 10.1109/ICASSP43922.2022.9747185

Cite

Xuechen Liu, Md Sahidullah, Tomi Kinnunen. Learnable Nonlinear Compression for Robust Speaker Verification. ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022, Singapore, Singapore. ⟨10.1109/ICASSP43922.2022.9747185⟩. ⟨hal-03616852⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD
