End-to-end optimization of goal-driven and visually grounded dialogue systems

Florian Strub 1, 2 Harm de Vries 3 Jeremie Mary 1, 2 Bilal Piot 4 Aaron Courville 3 Olivier Pietquin 4
1 SEQUEL - Sequential Learning
Inria Lille - Nord Europe, CRIStAL - Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Abstract : End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet most current approaches cast human-machine dialogue management as a supervised learning problem, aiming to predict the next utterance of a participant given the full history of the dialogue. This view is too simplistic to capture the planning problem inherent to dialogue as well as its grounded nature, which makes the context of a dialogue larger than its history alone. This is why only chitchat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.

https://hal.inria.fr/hal-01549642
Contributor : Florian Strub
Submitted on : Wednesday, June 28, 2017 - 11:44:06 PM
Last modification on : Friday, March 22, 2019 - 1:34:16 AM
Long-term archiving on : Thursday, January 18, 2018 - 2:40:26 AM

File

1703.05423.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-01549642, version 1
  • ARXIV : 1703.05423

Citation

Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, et al. End-to-end optimization of goal-driven and visually grounded dialogue systems. International Joint Conference on Artificial Intelligence, Aug 2017, Melbourne, Australia. ⟨hal-01549642⟩
