A Machine of Few Words Interactive Speaker Recognition with Reinforcement Learning

Mathieu Seurin; Florian Strub; Philippe Preux; Olivier Pietquin

doi:10.21437/Interspeech.2020-2892

Conference paper, Year: 2020

Mathieu Seurin

Role: Author
PersonId: 1039295

Scool

Sequential Learning

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Université de Lille

Florian Strub

Role: Author
PersonId: 18649
IdHAL: florian-strub
ORCID: 0000-0001-7271-5345

DeepMind [Paris]

Philippe Preux

Role: Author
PersonId: 5488
IdHAL: preux-philippe
IdRef: 059896353

Scool

Sequential Learning

Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189

Université de Lille

Olivier Pietquin

Role: Author
PersonId: 4024
IdHAL: olivier-pietquin
ORCID: 0000-0002-5386-465X
IdRef: 142821861

Google Research [Paris]

Abstract

Speaker recognition is a well-known and well-studied task in the speech processing domain. It has many applications, either for security or for speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system incrementally builds a representation of the speakers by requesting personalized utterances to be spoken, in contrast to the standard text-dependent or text-independent schemes. To do so, we cast the speaker recognition task as a sequential decision-making problem that we solve with Reinforcement Learning. Using a standard dataset, we show that our method achieves excellent performance while using only small amounts of speech signal. This method could also be applied as an utterance selection mechanism for building speech synthesis systems.
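To make the sequential decision-making framing concrete, here is a minimal toy sketch of one way such an interaction loop could look. It is not the model described in the paper: the environment `ToyISREnv`, the scalar "voice prints", the nearest-print guesser, the noise level, and the epsilon-greedy Monte Carlo estimate of word usefulness are all hypothetical simplifications chosen for illustration. At each turn the system requests one word, hears a noisy observation from the target speaker, and after a fixed number of turns guesses the speaker; the only reward is whether that final guess is correct.

```python
import random


class ToyISREnv:
    """Toy interactive speaker recognition episode; every quantity is simulated."""

    def __init__(self, n_speakers=5, n_words=6, n_turns=2, seed=0):
        self.rng = random.Random(seed)
        self.n_speakers, self.n_words, self.n_turns = n_speakers, n_words, n_turns
        # Assumption: only the last two words separate the speakers.
        self.informative = {n_words - 2, n_words - 1}
        # prints[s][w]: hypothetical scalar "voice print" of speaker s saying word w.
        self.prints = [
            [float(s) if w in self.informative else 0.0 for w in range(n_words)]
            for s in range(n_speakers)
        ]

    def reset(self):
        self.target = self.rng.randrange(self.n_speakers)
        self.heard = []  # (word, noisy observation) pairs gathered so far

    def guess(self):
        # Nearest-print guesser; ties resolve to the lowest speaker index.
        def distance(s):
            return sum((self.prints[s][w] - v) ** 2 for w, v in self.heard)

        return min(range(self.n_speakers), key=distance)

    def step(self, word):
        # The target speaker "utters" the requested word with a little noise.
        value = self.prints[self.target][word] + self.rng.gauss(0.0, 0.05)
        self.heard.append((word, value))
        done = len(self.heard) == self.n_turns
        # Reward is given only at the end: 1.0 if the final guess is correct.
        reward = float(done and self.guess() == self.target)
        return reward, done


def train_word_values(env, episodes=3000, epsilon=0.2, seed=1):
    """Epsilon-greedy Monte Carlo estimate of each word's average final reward."""
    rng = random.Random(seed)
    q = [0.0] * env.n_words
    counts = [0] * env.n_words
    for _ in range(episodes):
        env.reset()
        asked, reward, done = [], 0.0, False
        while not done:
            if rng.random() < epsilon:
                word = rng.randrange(env.n_words)  # explore a random word
            else:
                word = max(range(env.n_words), key=q.__getitem__)  # exploit
            asked.append(word)
            reward, done = env.step(word)
        for word in asked:  # credit every requested word with the episode outcome
            counts[word] += 1
            q[word] += (reward - q[word]) / counts[word]
    return q


env = ToyISREnv()
q = train_word_values(env)
best_word = max(range(env.n_words), key=q.__getitem__)
print("estimated word values:", [round(v, 2) for v in q])
print("most useful word:", best_word, "| informative:", best_word in env.informative)
```

Because the reward arrives only after the last turn, every requested word shares the credit for the outcome, so uninformative words that happen to be paired with an informative one also receive fairly high estimates. A state-dependent policy, which conditions the next request on what has already been heard, would be the natural next step beyond this sketch.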

Keywords

active speaker recognition; reinforcement learning; deep learning; iterative representation learning

Domains

Computer Science [cs]; Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]; Signal and Image Processing [eess.SP]

Main file

Interspeech_2020.pdf (641.83 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03123999

Submitted on: Thursday, January 28, 2021, 12:18:41

Last modified on: Wednesday, January 24, 2024, 09:54:22

Long-term archiving on: Thursday, April 29, 2021, 18:44:11

Dates and versions

hal-03123999, version 1 (28-01-2021)

Identifiers

HAL Id: hal-03123999, version 1
DOI: 10.21437/Interspeech.2020-2892

Cite

Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin. A Machine of Few Words Interactive Speaker Recognition with Reinforcement Learning. Conference of the International Speech Communication Association (INTERSPEECH), Oct 2020, Shanghai, China. ⟨10.21437/Interspeech.2020-2892⟩. ⟨hal-03123999⟩

Collections

CNRS INRIA CRISTAL INRIA2 CRISTAL-SEQUEL UNIV-LILLE CRISTAL-SCOOL
