Creating Training Corpora for NLG Micro-Planning

Abstract: In this paper, we focus on how to create data-to-text corpora which can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation. We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases. We apply our framework to DBpedia data and compare the resulting dataset with (Wen et al., 2016)'s dataset. We show that while (Wen et al., 2016)'s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.
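
The framework described above pairs sets of DBpedia triples with texts. Purely as an illustration of the kind of KB input involved (this is not the authors' extraction pipeline), the sketch below retrieves a few ontology triples for a single DBpedia entity via the public SPARQL endpoint; it assumes the third-party SPARQLWrapper package, and the example entity and the ontology-namespace filter are arbitrary choices.

```python
# Minimal sketch: fetch a handful of DBpedia triples about one entity,
# i.e. the kind of KB input a data-to-text corpus pairs with text.
# Assumes the SPARQLWrapper package and the public DBpedia endpoint;
# this is NOT the authors' corpus-creation pipeline.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"

def entity_triples(entity_uri, limit=5):
    """Return (subject, predicate, object) triples for one DBpedia entity."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(f"""
        SELECT ?p ?o WHERE {{
            <{entity_uri}> ?p ?o .
            FILTER(STRSTARTS(STR(?p), "http://dbpedia.org/ontology/"))
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [(entity_uri, b["p"]["value"], b["o"]["value"])
            for b in results["results"]["bindings"]]

if __name__ == "__main__":
    # Example entity chosen arbitrarily for illustration.
    for triple in entity_triples("http://dbpedia.org/resource/Alan_Turing"):
        print(triple)
```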
Document type: Conference papers

Cited literature: 16 references

https://hal.inria.fr/hal-01623744
Contributor: Claire Gardent
Submitted on: Wednesday, October 25, 2017 - 4:05:04 PM
Last modification on: Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on: Friday, January 26, 2018 - 2:30:47 PM

File: 2017-ACL-webnlgchallenge.pdf (produced by the author(s))

Identifiers

  • HAL Id: hal-01623744, version 1

Citation

Claire Gardent, Anastasia Shimorina, Shashi Narayan, Laura Perez-Beltrachini. Creating Training Corpora for NLG Micro-Planning. 55th annual meeting of the Association for Computational Linguistics (ACL), Jul 2017, Vancouver, Canada. ⟨hal-01623744⟩
