Creating Training Corpora for NLG Micro-Planning

Abstract: In this paper, we focus on how to create data-to-text corpora that can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation. We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases. We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while the dataset of Wen et al. (2016) is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.
Document type:
Conference paper
55th annual meeting of the Association for Computational Linguistics (ACL), Jul 2017, Vancouver, Canada

https://hal.inria.fr/hal-01623744
Contributor: Claire Gardent
Submitted on: Wednesday, October 25, 2017 - 16:05:04
Last modified on: Tuesday, April 24, 2018 - 13:30:46
Document(s) archived on: Friday, January 26, 2018 - 14:30:47

File

2017-ACL-webnlgchallenge.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01623744, version 1

Citation

Claire Gardent, Anastasia Shimorina, Shashi Narayan, Laura Perez-Beltrachini. Creating Training Corpora for NLG Micro-Planning. 55th annual meeting of the Association for Computational Linguistics (ACL), Jul 2017, Vancouver, Canada. 〈hal-01623744〉
