From Logic to Language: Natural Language Generation from Logical Forms

Abstract : The central theme of this thesis is the generation of natural language from a formal representation of meaning. In a nutshell, the problem we want to solve is to go from a logical formula such as forall(x) man(x) implies exists(y) woman(y) and love(x,y) to the sentence “every man loves a woman”. This is achieved by employing several computer algorithms and statistical techniques. Moreover, not all representation formalisms are equal, some being better than others for the purpose of generation.

The first chapter presents the problem of Natural Language Generation (NLG) in general terms. It introduces the formal representation of the meaning of natural language, in particular with formalisms based on logic, followed by the relationship between NLG and Machine Translation, and the motivation behind the approach to NLG undertaken in this thesis, that is, robustness and theoretical soundness. Finally, the research questions that drive the work presented in the rest of the thesis are formulated.

Chapter 2 contains a review of the literature in the field of Natural Language Generation, with particular focus on methods and problems relevant to several aspects of the work presented in the rest of this thesis. The chapter begins by presenting the traditional architectural organization of NLG tasks, then reviews previous work on statistical generation, generation from knowledge bases, and prediction of surface order, that is, the order of the words in the final output. The second part of the chapter shows how the representations of meaning found in the literature do not adequately support approaches to generation like the one proposed in this thesis. The chapter ends with a review of software packages for NLG.

Chapter 3 introduces the plan for the architecture of a novel system that generates natural language expressions given formal representations of meaning as input. Two of the modules, namely the surface order prediction module and the lexicalization module, are covered in greater detail in the central chapters of the thesis. The surface realization module, positioned at the end of the pipeline, is also described in this chapter. It takes the output of the previous modules and produces the complete surface form that expresses the meaning encoded in the original meaning representation. This chapter also introduces a novel formalism for the representation of meaning, based on formal logic. The crucial feature of this formalism, called Discourse Representation Graphs, is that it favors the alignment between the abstract meaning representation and the text at the level of words. Thanks to this alignment schema, several tasks of the NLG pipeline can be treated by supervised machine learning approaches.

Chapter 4 presents a semantically annotated resource called the Groningen Meaning Bank (GMB). This resource is used to train the models introduced in the previous chapter, as well as to extract data to test the proposed approaches through experimental trials. A number of design choices have been made for the creation and the annotation of the resource, and several software tools were employed to automatically analyze large quantities of text. Moreover, crowdsourcing methods were applied to gather linguistic annotations from the public, through a Web interface for experts and a Game With A Purpose called Wordrobe.

Chapter 5 covers the module of the system responsible for the prediction of the order of words and phrases composing the surface form. This problem is solved by leveraging a dataset of text-aligned meaning representations and building learning-to-rank statistical models that predict the order of small sets of items local to each concept.

Chapter 6 presents the other central module, that is, the module responsible for the production of content words for the concepts contained in the original abstract meaning representation. This module actually solves two problems: the choice of the correct lemma from a closed set of options, based on the semantic content to convey, and the production of the correct morphological inflection. For the first task, two alternative methods are proposed: an unsupervised one and a supervised one, trained on GMB data. For the second task, a pilot study is presented in which the problem is solved by a supervised model of inflectional morphology of English.

The final chapter contains a series of reflections to conclude the thesis. A retrospective look highlights the decisions that have been made for the design of the NLG system, inviting speculation about alternative directions. Several problems are still open and it is important to consider how they affect the performance of the system. While these issues need to be addressed in order to obtain better results, the approach to NLG presented in this thesis is a step forward in improving existing approaches to generation from logical forms.
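To make the input and output of the task concrete, the example formula from the abstract can be written as a small data structure and verbalized with a hand-written rule. This is only a toy sketch and not the system described in the thesis, which relies on statistical models and Discourse Representation Graphs. The class names (Pred, And, Implies, Forall, Exists), the function realize, and the naive "add -s" inflection are all assumptions made for illustration.

```python
from dataclasses import dataclass


# Hypothetical node types for a tiny first-order logic fragment.
@dataclass(frozen=True)
class Pred:
    name: str
    args: tuple


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Implies:
    left: object
    right: object


@dataclass(frozen=True)
class Forall:
    var: str
    body: object


@dataclass(frozen=True)
class Exists:
    var: str
    body: object


def realize(formula):
    """Verbalize formulas shaped like forall x (N(x) -> exists y (M(y) and V(x, y))).

    Any other shape is rejected: this is a single hard-coded pattern,
    not a general generator.
    """
    if not (isinstance(formula, Forall) and isinstance(formula.body, Implies)):
        raise ValueError("unsupported formula shape")
    restrictor, scope = formula.body.left, formula.body.right
    if not (isinstance(scope, Exists) and isinstance(scope.body, And)):
        raise ValueError("unsupported formula shape")
    noun, relation = scope.body.left, scope.body.right
    if not all(isinstance(p, Pred) for p in (restrictor, noun, relation)):
        raise ValueError("unsupported formula shape")
    x, y = formula.var, scope.var
    # The variables must link the predicates as in the abstract's example.
    if restrictor.args != (x,) or noun.args != (y,) or relation.args != (x, y):
        raise ValueError("unsupported variable binding")
    verb = relation.name + "s"  # naive third-person singular, an assumption
    return f"every {restrictor.name} {verb} a {noun.name}"


# forall(x) man(x) implies exists(y) woman(y) and love(x,y)
formula = Forall(
    "x",
    Implies(
        Pred("man", ("x",)),
        Exists("y", And(Pred("woman", ("y",)), Pred("love", ("x", "y")))),
    ),
)
print(realize(formula))  # prints "every man loves a woman"
```

The rule covers exactly one formula shape and one inflection pattern. The surface order and lexicalization problems that the thesis addresses with trained models are precisely what such a fixed template cannot handle in general.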
Cited literature : 243 references

https://hal.inria.fr/tel-01342434
Contributor : Valerio Basile
Submitted on : Wednesday, July 6, 2016 - 9:14:28 AM
Last modification on : Monday, October 9, 2017 - 1:18:03 PM

Identifiers

  • HAL Id : tel-01342434, version 1

Citation

Valerio Basile. From Logic to Language: Natural Language Generation from Logical Forms. Linguistics. University of Groningen, 2015. English. ⟨tel-01342434⟩
