Conference papers

From Phonemes to Robot Commands with a Neural Parser

Xavier Hinaut (1)
(1) Mnemosyne - Mnemonic Synergy: LaBRI - Laboratoire Bordelais de Recherche en Informatique; Inria Bordeaux - Sud-Ouest; IMN - Institut des Maladies Neurodégénératives [Bordeaux]
Abstract: Computational models could improve our understanding of how children acquire language [1][2], from phonemes to syntax, in particular when these models are integrated in robots [3], e.g. by interacting with users [4] or by grounding language cues [5]. Speech recognition systems have recently improved greatly thanks to deep learning. However, in domain-specific applications such as Human-Robot Interaction, generic recognition tools such as the Google API often return words that are unknown to the robotic system, if not simply irrelevant [6]. Moreover, such recognition systems give little indication of how our brains acquire or process phonemes, words or grammatical constructions (i.e. sentence templates). To our knowledge, they also do not provide useful tools for learning from small corpora, from which a child may bootstrap. Here, we propose a neuro-inspired approach that processes sentences word by word, or phoneme by phoneme, with no prior knowledge of the semantics of the words. We previously demonstrated that this RNN-based model was able to generalize over grammatical constructions [7], even with unknown words (i.e. words outside the vocabulary of the training data) [8]. In this preliminary study, in order to try to overcome word misrecognition, we tested whether the same architecture can solve the same task by processing phonemes directly instead of grammatical constructions [9]. On a small corpus, the model performs similarly (if slightly worse) when using phonemes as inputs instead of grammatical constructions. We speculate that this phoneme version could outperform the previous model on real, noisy phoneme inputs, thus improving its performance in real-time human-robot interaction.
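The abstract only describes the model as an RNN that reads its input one token at a time. The Python sketch below assumes a reservoir-computing (echo state network) style RNN, with a fixed random recurrent layer of leaky tanh units; the function-word list, the toy lexicon, the `Reservoir` class and every parameter value are hypothetical and chosen for illustration only. It contrasts the two input encodings compared in the study: a grammatical construction, here assumed to be obtained by replacing content words with an `SW` placeholder, and a phoneme sequence. Because each token is a one-hot vector, the network receives no prior information about word meaning.

```python
import math
import random

# Hypothetical function-word list: in this assumed scheme, every other word is a
# content ("semantic") word and is replaced by the placeholder "SW".
FUNCTION_WORDS = {"the", "a", "on", "to", "and", "then"}

# Hypothetical toy pronunciation lexicon (ARPAbet-like symbols), for illustration only.
LEXICON = {
    "put": ["P", "UH", "T"],
    "the": ["DH", "AH"],
    "cup": ["K", "AH", "P"],
    "on": ["AA", "N"],
    "left": ["L", "EH", "F", "T"],
}


def to_construction(sentence):
    """Grammatical construction: keep function words, replace content words by 'SW'."""
    return [w if w in FUNCTION_WORDS else "SW" for w in sentence.split()]


def to_phonemes(sentence):
    """Flatten the sentence into one phoneme sequence using the toy lexicon."""
    return [p for w in sentence.split() for p in LEXICON[w]]


def one_hot(tokens, vocab):
    """Encode each token as a one-hot vector: no semantic information is given."""
    index = {t: i for i, t in enumerate(vocab)}
    return [[1.0 if index[t] == i else 0.0 for i in range(len(vocab))] for t in tokens]


class Reservoir:
    """Leaky-integrator echo state network with fixed random weights (not trained)."""

    def __init__(self, n_inputs, n_units=50, leak=0.3, radius=0.9, seed=0):
        rng = random.Random(seed)
        self.leak = leak
        self.n_units = n_units
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
                     for _ in range(n_units)]
        # Gaussian entries with variance radius**2 / n_units give a spectral radius of
        # roughly `radius` for large n_units, which typically keeps the dynamics stable.
        scale = radius / math.sqrt(n_units)
        self.w = [[rng.gauss(0.0, 1.0) * scale for _ in range(n_units)]
                  for _ in range(n_units)]

    def run(self, inputs):
        """Feed the input vectors one at a time and return the state after each step."""
        x = [0.0] * self.n_units
        states = []
        for u in inputs:
            pre = [sum(a * b for a, b in zip(row_in, u)) +
                   sum(a * b for a, b in zip(row_rec, x))
                   for row_in, row_rec in zip(self.w_in, self.w)]
            # Leaky integration: a convex mix of the old state and tanh(input + recurrence).
            x = [(1.0 - self.leak) * xi + self.leak * math.tanh(p)
                 for xi, p in zip(x, pre)]
            states.append(x)
        return states


sentence = "put the cup on the left"
construction = to_construction(sentence)  # ['SW', 'the', 'SW', 'on', 'the', 'SW']
phonemes = to_phonemes(sentence)          # 16 phonemes for 6 words

construction_vocab = sorted(set(construction))
phoneme_vocab = sorted(set(phonemes))

construction_states = Reservoir(len(construction_vocab)).run(
    one_hot(construction, construction_vocab))
phoneme_states = Reservoir(len(phoneme_vocab)).run(one_hot(phonemes, phoneme_vocab))

# A trained linear readout (e.g. ridge regression) would map these states to the
# robot command; that training step is omitted from this sketch.
print(len(construction_states), len(phoneme_states))  # prints: 6 16
```

In a setup like this, only the linear readout from the recurrent states to the output would be learned, which suits small corpora. Switching from constructions to phonemes changes only the input vocabulary and lengthens the sequence the network must integrate, one plausible reading of why the same architecture can be reused with somewhat weaker performance.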

Contributor: Xavier Hinaut
Submitted on: Sunday, December 17, 2017 - 1:15:46 AM
Last modification on: Thursday, February 7, 2019 - 4:20:00 PM

HAL Id: hal-01665823, version 1



Xavier Hinaut. From Phonemes to Robot Commands with a Neural Parser. IEEE ICDL-EPIROB Workshop on Language Learning, Sep 2017, Lisbon, Portugal. pp.1-2. ⟨hal-01665823⟩


