Preprints, Working Papers, ... Year: 2021

What does the Canary Say? Low-Dimensional GAN Applied to Birdsong

Silvia Pagliarini (1), Nathan Trouvain (1), Arthur Leblois (2), Xavier Hinaut (1)

Abstract

The generation of speech, and more generally complex animal vocalizations, by artificial systems is a difficult problem. Generative Adversarial Networks (GANs) have shown very good abilities for generating images, and more recently sounds. While current GANs have high-dimensional latent spaces, complex vocalizations could in principle be generated through a low-dimensional latent space, easing the visualization and evaluation of latent representations. In this study, we test the ability of a previously developed GAN, called WaveGAN, to reproduce canary syllables while drastically reducing the latent space dimension. We trained WaveGAN on a large dataset of canary syllables (16000 renditions of 16 different syllable types) and varied the latent space dimension from 1 to 6. The sounds produced by the generator are evaluated using an RNN-based classifier. This quantitative evaluation is paired with a qualitative evaluation of the GAN productions across training epochs and latent dimensions. Altogether, our results show that a 3-dimensional latent space is enough to produce all syllable types in the repertoire with a quality often indistinguishable from real canary vocalizations. Importantly, we show that the 3-dimensional GAN generalizes by interpolating between the various syllable types. We rely on UMAP [1] to qualitatively show the similarities between training and generated data, and between the generated syllables and the interpolations produced. We discuss how our study may provide tools to train simple models of vocal production and/or learning. Indeed, while the RNN-based classifier provides a biologically realistic representation of the auditory network processing vocalizations, the low-dimensional GAN may be used for the production of complex vocal repertoires.
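
The interpolation between syllable types mentioned in the abstract can be pictured as walking along a path between two points of the latent space. The following is a minimal sketch under stated assumptions, not the authors' procedure: it assumes latent vectors are drawn uniformly from [-1, 1] and that the path is a straight line. The names `sample_latent` and `interpolate` are hypothetical, and the trained generator that would turn each latent point into a waveform is not shown.

```python
import random

LATENT_DIM = 3  # the abstract reports that 3 dimensions are enough


def sample_latent(dim=LATENT_DIM, rng=None):
    """Draw one latent vector; the uniform [-1, 1] range is an assumption."""
    rng = rng or random.Random()
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def interpolate(z_start, z_end, steps):
    """Return `steps` latent points on the straight line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # goes from 0.0 to 1.0 inclusive
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


rng = random.Random(0)  # fixed seed so the sketch is reproducible
z_a = sample_latent(rng=rng)
z_b = sample_latent(rng=rng)
latent_path = interpolate(z_a, z_b, steps=5)

# Each point in latent_path would be passed to a trained generator
# (hypothetical here) to obtain one synthetic syllable per step.
for point in latent_path:
    print([round(x, 3) for x in point])
```

With only three coordinates per point, such a path can be plotted directly, which is the kind of easy visualization a low-dimensional latent space allows.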
Main file
Pagliarini2021_canary_GAN__HAL-v2.pdf (22.56 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03244723, version 1 (01-06-2021)
hal-03244723, version 2 (26-11-2021)

Identifiers

  • HAL Id: hal-03244723, version 2

Cite

Silvia Pagliarini, Nathan Trouvain, Arthur Leblois, Xavier Hinaut. What does the Canary Say? Low-Dimensional GAN Applied to Birdsong. 2021. ⟨hal-03244723v2⟩

Collections

CNRS INRIA INRIA2