Task Completion Transfer Learning for Reward Inference
Abstract
Reinforcement learning-based spoken dialogue systems aim
to compute an optimal strategy for dialogue management
from interactions with users. They compare their different
management strategies on the basis of a numerical reward
function. Reward inference consists of learning a reward
function from dialogues scored by users. A major issue for reward
inference algorithms is that some important parameters influencing
user evaluations cannot be computed online. Task completion is
one such parameter. This paper introduces Task Completion
Transfer Learning (TCTL), a method that exploits
exact knowledge of task completion in a corpus of dialogues
scored by users in order to optimise online learning. Compared
to previously proposed reward inference techniques,
TCTL returns a reward function that can also handle the fact
that task completion is not observable online.
A reward function is learnt with TCTL on dialogues with a
restaurant-seeking system. It is shown that the reward function
returned by TCTL is a better estimator of dialogue performance
than the one returned by reward inference.