Online Language Learning to Perform and Describe Actions for Human-Robot Interaction

Xavier Hinaut; Maxime Petit; Peter Dominey

Conference paper. Year: 2012

Xavier Hinaut

Role: Author
PersonId: 8171
IdHAL: xavier-hinaut
ORCID: 0000-0002-1924-1184
IdRef: 22823218X

Institut cellule souche et cerveau

Maxime Petit

Role: Author
PersonId: 21643
IdHAL: maxime-petit
ORCID: 0000-0001-5785-2915

Institut cellule souche et cerveau

Peter Dominey

Role: Author
PersonId: 742047
IdHAL: pfdominey
ORCID: 0000-0002-9318-179X
IdRef: 067732887

Institut cellule souche et cerveau

Abstract

The goal of this research is to provide a real-time, adaptive spoken language interface between humans and a humanoid robot. The system should be able to learn new grammatical constructions in real time and then use them either immediately or in a later interactive session.

To achieve this, we use a recurrent neural network of 500 neurons: an echo state network with leaky neurons [1]. The model processes sentences as grammatical constructions, in which the semantic words (nouns and verbs) are extracted and stored in working memory, and the grammatical words (prepositions, auxiliary verbs, etc.) are inputs to the network. The outputs of the trained network code the role (predicate, agent, object/location) that each semantic word takes. In the final output, the stored semantic words are then mapped onto their respective roles. The model thus learns the mappings between the grammatical structure of sentences and their meanings.

The humanoid robot is an iCub [2] that interacts around an instrumented tactile table (ReacTable™) on which objects can be manipulated by both human and robot. A sensory system has been developed to extract spatial relations. An off-the-shelf speech recognition and text-to-speech tool allows spoken communication. In parallel, the robot has a small set of actions (put(object, location), grasp(object), point(object)). These spatial relations and action definitions form the meanings that are to be linked to sentences in the learned grammatical constructions.

The target behavior of the system is to learn two conditions. In action performing (AP), the system should learn to generate the proper robot command, given a spoken input sentence. In scene description (SD), the system should learn to describe scenes given the extracted spatial relations. The training corpus for the neural model can be generated through interaction with a user who teaches the robot by describing spatial relations or actions, creating sentence-meaning pairs.
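As a rough illustration of what "echo state network with leaky neurons" means, here is a minimal sketch of a leaky-integrator reservoir update in plain Python. It uses the commonly cited form x(t+1) = (1 - a) x(t) + a tanh(W_in u(t+1) + W x(t)); the abstract does not state the exact equations or parameters, so the leak rate, weight ranges, and all function names below are assumptions made for illustration. Training of the linear readout that would produce the role outputs is omitted.

```python
import math
import random


def make_weights(n_inputs, n_units, seed=0, recurrent_scale=0.05):
    """Draw fixed random input and recurrent weights.

    In an echo state network these weights are not trained; only a
    linear readout (omitted here) is fitted. Ranges are assumptions.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_units)]
    w_rec = [[rng.uniform(-recurrent_scale, recurrent_scale)
              for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_rec


def leaky_step(state, u, w_in, w_rec, leak=0.3):
    """One leaky-integrator update of the reservoir state.

    new_x[i] = (1 - leak) * x[i] + leak * tanh(W_in[i] . u + W_rec[i] . x)
    """
    new_state = []
    for i in range(len(state)):
        drive = sum(w * x for w, x in zip(w_in[i], u))
        drive += sum(w * x for w, x in zip(w_rec[i], state))
        new_state.append((1.0 - leak) * state[i] + leak * math.tanh(drive))
    return new_state


def run_reservoir(inputs, w_in, w_rec, leak=0.3):
    """Feed a sequence of input vectors; return the state after each step."""
    state = [0.0] * len(w_in)
    states = []
    for u in inputs:
        state = leaky_step(state, u, w_in, w_rec, leak)
        states.append(state)
    return states
```

In a setting like the one described, each input vector could be a one-hot code for the current grammatical word or for the semantic-word marker, and `make_weights(n_inputs, 500)` would give a reservoir of the size mentioned in the text. A small leak rate makes each unit change slowly, which lets the state retain information about earlier words in the sentence.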
The corpus can also be edited by hand to avoid speech recognition errors. The interactions between the different components of the system are shown in Figure 1.

The neural model processes grammatical constructions in which semantic words (e.g. put, grasp, toy, left, right) are replaced by a common marker. Only a predefined set of grammatical words is used (after, and, before, it, on, the, then, to, you). The model is therefore able to deal with sentences that have the same constructions as previously seen sentences.

In the AP condition, we demonstrate that the model can learn and generalize to complex sentences such as "Before you put the toy on the left point the drums."; the robot will first point to the drums and then put the toy on the left, which shows that the network is able to establish the proper chronological order of actions. Likewise, in the SD condition, the system can be exposed to a new scene and produce a description such as "To the left of the drums and to the right of the toy is the trumpet."

In future research we can exploit this learning system in the context of human language development. In addition, the neural model could enable recovery from speech-to-text recognition errors.

Index Terms: human-robot interaction, echo state network, online learning, iCub, language learning.

References

[1] H. Jaeger, "The "echo state" approach to analysing and training recurrent neural networks", Tech. Rep. GMD

The model has been developed with the Oger toolbox: http://reservoir-computing.org/organic/engine.

Figure 1: Communication between the speech recognition tool (that also controls the robotic platform) and the neural model.
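The marker substitution and role mapping described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the marker name `SW`, the function names, the role labels, and the hand-written `role_codes` (standing in for what a trained network might output) are all assumptions. Only the list of grammatical words and the example sentence come from the abstract.

```python
import re

# Grammatical (closed-class) words listed in the abstract.
GRAMMATICAL_WORDS = {"after", "and", "before", "it", "on", "the",
                     "then", "to", "you"}
MARKER = "SW"  # hypothetical name for the common semantic-word marker


def to_construction(sentence):
    """Split a sentence into a grammatical construction and its semantic words.

    Every word outside GRAMMATICAL_WORDS is replaced by MARKER and stored,
    in order, in a list that plays the part of the working memory.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    construction, semantic_words = [], []
    for token in tokens:
        if token in GRAMMATICAL_WORDS:
            construction.append(token)
        else:
            construction.append(MARKER)
            semantic_words.append(token)
    return construction, semantic_words


def fill_roles(semantic_words, role_codes):
    """Map stored semantic words onto roles, grouped by action order.

    role_codes holds one (action_index, role) pair per semantic word.
    Returns the actions sorted by action_index, i.e. in execution order.
    """
    actions = {}
    for word, (action_index, role) in zip(semantic_words, role_codes):
        actions.setdefault(action_index, {})[role] = word
    return [actions[i] for i in sorted(actions)]


construction, words = to_construction(
    "Before you put the toy on the left point the drums.")

# Hand-written stand-in for the network output: "put ..." is the second
# action to execute, "point ..." the first.
role_codes = [(2, "predicate"), (2, "object"), (2, "location"),
              (1, "predicate"), (1, "object")]
ordered_actions = fill_roles(words, role_codes)
```

Because the network sees only the construction (grammatical words and markers), any new sentence with the same construction and different semantic words would map to the same role assignment. Note that under this sketch, words outside the listed set, such as "of" or "is" in the scene description example, would be treated as semantic words.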

Domains

Neural networks [cs.NE]; Machine learning [cs.LG]; Robotics [cs.RO]; Neuroscience [q-bio.NC]; Linguistics

Main file

Hinaut2012_robotdoc_workshop.pdf (373.24 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02561346

Submitted on: Sunday, May 3, 2020, 18:54:19

Last modified on: Thursday, February 1, 2024, 10:03:37

Dates and versions

hal-02561346, version 1 (03-05-2020)

Identifiers

HAL Id: hal-02561346, version 1

Cite

Xavier Hinaut, Maxime Petit, Peter Dominey. Online Language Learning to Perform and Describe Actions for Human-Robot Interaction. Post-Graduate Conference on Robotics and Development of Cognition, Sep 2012, Lausanne, Switzerland. ⟨hal-02561346⟩

Collections

INSERM UNIV-RENNES1 UNIV-LYON1 INRA IRISA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UDL INRAE UR1-MATH-NUM

53 views

70 downloads