
Creating Training Corpora for NLG Micro-Planning

Abstract : In this paper, we focus on how to create data-to-text corpora that can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation (NLG). We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases (KBs). We apply our framework to DBpedia data and compare the resulting dataset with that of Wen et al. (2016). We show that while the dataset of Wen et al. (2016) is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models capable of generating text from KB data can be learned.
Document type : Conference papers

Contributor : Claire Gardent
Submitted on : Wednesday, October 25, 2017 - 4:05:04 PM
Last modification on : Wednesday, November 24, 2021 - 9:54:10 AM
Long-term archiving on : Friday, January 26, 2018 - 2:30:47 PM


Files produced by the author(s)


  • HAL Id : hal-01623744, version 1



Claire Gardent, Anastasia Shimorina, Shashi Narayan, Laura Perez-Beltrachini. Creating Training Corpora for NLG Micro-Planning. 55th annual meeting of the Association for Computational Linguistics (ACL), Jul 2017, Vancouver, Canada. ⟨hal-01623744⟩


