MEDIA: a semantically annotated corpus of task oriented dialogs in French

Abstract: The aim of the French Media project was to define a protocol for the evaluation of speech understanding modules for dialog systems. Accordingly, a corpus of 1,257 real spoken dialogs related to hotel reservation and tourist information was recorded, transcribed and semantically annotated, and a semantic attribute-value representation was defined in which each conceptual relationship is represented by the names of the attributes. Two semantic annotation levels are distinguished in this approach. At the first level, each utterance is considered separately, and the annotation represents the meaning of the statement without taking the dialog context into account. The second level of annotation corresponds to the interpretation of the statement's meaning given the dialog context; in this way a semantic representation of the dialog context is defined. This paper discusses the data collection, the detailed definition of both annotation levels, and the annotation scheme. It then comments on the two evaluation campaigns carried out during the project and discusses some results.
Hélène Bonneau-Maynard, Matthieu Quignard, Alexandre Denis. MEDIA: a semantically annotated corpus of task oriented dialogs in French. Language Resources and Evaluation, Springer Verlag, 2009, 43 (4), pp. 329-354. ⟨10.1007/s10579-009-9103-2⟩. ⟨inria-00424619⟩



