Preprints, Working Papers, ...

Canary Song Decoder: Transduction and Implicit Segmentation with ESNs and LTSMs

Nathan Trouvain 1 Xavier Hinaut 1
1 Mnemosyne - Mnemonic Synergy
LaBRI - Laboratoire Bordelais de Recherche en Informatique, Inria Bordeaux - Sud-Ouest, IMN - Institut des Maladies Neurodégénératives [Bordeaux]
Abstract : Domestic canaries produce complex vocal patterns embedded in various levels of abstraction. Studying such temporal organization is of particular relevance to understanding how animal brains represent and process vocal inputs such as language. However, such studies require a large amount of annotated data. We propose a fast and easy-to-train transducer model based on RNN architectures to automate parts of the annotation process, a task similar to speech recognition. We demonstrate that RNN architectures can be efficiently applied to spectral features (MFCC) to annotate songs at the time-frame level and at the phrase level. We achieved around 95% accuracy at the frame level on particularly complex canary songs, and ESNs achieved a word error rate (WER) of around 5% at the phrase level. Moreover, we are able to build this model using only around 13 to 20 minutes of annotated songs. Training the ESN takes only 35 seconds on 2 hours and 40 minutes of data, which makes it possible to run experiments quickly without powerful hardware.
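The two metrics named in the abstract, frame-level accuracy and phrase-level word error rate, can be illustrated with a minimal sketch. It assumes that phrase sequences are obtained by merging runs of identical frame labels and that WER is the Levenshtein distance divided by the reference length; the abstract does not specify the exact procedure, so both choices are assumptions. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def collapse_frames(frame_labels):
    """Merge each run of identical frame labels into a single phrase label.

    Assumption: this is one simple way to get phrases from frame-level output.
    """
    return [label for label, _ in groupby(frame_labels)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # label missing from hyp
                curr[j - 1] + 1,         # extra label in hyp
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def word_error_rate(ref_phrases, hyp_phrases):
    """Edit distance normalized by the number of reference phrases."""
    return edit_distance(ref_phrases, hyp_phrases) / len(ref_phrases)


def frame_accuracy(ref_frames, hyp_frames):
    """Fraction of time frames whose predicted label matches the reference."""
    matches = sum(r == h for r, h in zip(ref_frames, hyp_frames))
    return matches / len(ref_frames)


# Toy data (hypothetical labels, not results from the paper).
ref_frames = list("AAABBBCCC")
hyp_frames = list("AABBBBCDC")

print(frame_accuracy(ref_frames, hyp_frames))
print(word_error_rate(collapse_frames(ref_frames), collapse_frames(hyp_frames)))
```

In this toy case the misplaced boundary between A and B leaves the phrase sequence unchanged, whereas the isolated D frame splits one phrase into three. The two metrics therefore capture different kinds of error, which is why the abstract reports them separately.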
Contributor : Xavier Hinaut
Submitted on : Tuesday, April 20, 2021 - 5:46:33 PM
Last modification on : Thursday, April 22, 2021 - 3:29:09 AM


Files produced by the author(s)


  • HAL Id : hal-03203374, version 1



Nathan Trouvain, Xavier Hinaut. Canary Song Decoder: Transduction and Implicit Segmentation with ESNs and LTSMs. 2021. ⟨hal-03203374⟩


