Creating Training Corpora for NLG Micro-Planning

Claire Gardent; Anastasia Shimorina; Shashi Narayan; Laura Perez-Beltrachini

Conference paper, Year: 2017

Claire Gardent

Role: Author
PersonId: 3949
IdHAL: claire-gardent
ORCID: 0000-0002-3805-6662
IdRef: 034104593

Natural Language Processing: representations, inference and semantics

Anastasia Shimorina

Role: Author
PersonId: 1015010

Natural Language Processing: representations, inference and semantics

Shashi Narayan

Role: Author
PersonId: 767539
IdRef: 182508757

School of Informatics [Edinburgh]

Laura Perez-Beltrachini

Role: Author
PersonId: 882796

School of Informatics [Edinburgh]

Abstract

In this paper, we focus on how to create data-to-text corpora that can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation (NLG). We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases (KBs). We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while the dataset of Wen et al. (2016) is more than twice as large as ours, it is less diverse in terms of both input and text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.

Domains

Computer Science [cs]

Main file

2017-ACL-webnlgchallenge.pdf (269.8 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01623744

Submitted on: Wednesday, 25 October 2017, 16:05:04

Last modified on: Thursday, 7 March 2024, 12:32:05

Long-term archiving on: Friday, 26 January 2018, 14:30:47

Dates and versions

hal-01623744, version 1 (25-10-2017)

Identifiers

HAL Id: hal-01623744, version 1

Cite

Claire Gardent, Anastasia Shimorina, Shashi Narayan, Laura Perez-Beltrachini. Creating Training Corpora for NLG Micro-Planning. 55th annual meeting of the Association for Computational Linguistics (ACL), Jul 2017, Vancouver, Canada. ⟨hal-01623744⟩

Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD

1071 views

677 downloads
