Surface Realisation from Knowledge Bases

Bikash Gyawali 1
1 SYNALP - Natural Language Processing : representations, inference and semantics
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : Natural Language Generation (NLG) is the task of automatically producing natural language text to describe information present in non-linguistic data. It involves three main subtasks: (i) selecting the relevant portion of input data; (ii) determining the words that will be used to verbalise the selected data; and (iii) mapping these words into natural language text. The last of these tasks is known as Surface Realisation (SR). In my thesis, I study the SR task in the context of input data coming from Knowledge Bases (KB). I present two novel approaches to surface realisation from knowledge bases: a supervised approach and a weakly supervised approach. In the supervised approach, I present a corpus-based method for inducing a Feature-Based Lexicalized Tree Adjoining Grammar (FB-LTAG) from a parallel corpus of text and data. The resulting grammar includes a unification-based semantics and can be used by an existing surface realiser to generate sentences from test data. I show that the induced grammar is compact and generalises well over the test data, yielding results that are close to those produced by a handcrafted symbolic approach and that outperform an alternative statistical approach. In the weakly supervised approach, I explore a method for surface realisation from KB data which uses a supplied lexicon but does not require a parallel corpus. Instead, I build a corpus from heterogeneous sources of domain-related text and use it to identify possible lexicalisations of KB symbols (classes and relations) and their verbalisation patterns (frames). Based on these observations, I build different probabilistic models which are used to select appropriate frames and to link syntax and semantics when verbalising KB inputs. I evaluate the output sentences and analyse the issues relevant to learning from non-parallel corpora. In both approaches, I use data derived from an existing biomedical ontology as a reference input. The proposed methods are generic and can be easily adapted to input from other ontologies for which parallel or non-parallel corpora exist.

Cited literature [89 references]

https://hal.inria.fr/tel-01754499
Contributor : Gyawali Bikash
Submitted on : Thursday, February 18, 2016 - 4:05:09 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on : Thursday, May 19, 2016 - 11:00:45 AM

Identifiers

  • HAL Id : tel-01754499, version 2

Citation

Bikash Gyawali. Surface Realisation from Knowledge Bases. Computation and Language [cs.CL]. Université de Lorraine, 2016. English. ⟨NNT : 2016LORR0004⟩. ⟨tel-01754499v2⟩
