Skip to Main content Skip to Navigation

Natural Language Generation for Language Learning

Laura Haide Perez 1
1 SYNALP - Natural Language Processing : representations, inference and semantics
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : In this work, we explore how Natural Language Generation (NLG) techniques can be used to address the task of (semi-)automatically generating language learning material and activities in Camputer-Assisted Language Learning (CALL). In particular, we show how a grammar-based Surface Realiser (SR) can be usefully exploited for the automatic creation of grammar exercises. Our surface realiser uses a wide-coverage reversible grammar namely SemTAG, which is a Feature-Based Tree Adjoining Grammar (FB-TAG) equipped with a unification-based compositional semantics. More precisely, the FB-TAG grammar integrates a flat and underspecified representation of First Order Logic (FOL) formulae. In the first part of the thesis, we study the task of surface realisation from flat semantic formulae and we propose an optimised FB-TAG-based realisation algorithm that supports the generation of longer sentences given a large scale grammar and lexicon. The approach followed to optimise TAG-based surface realisation from flat semantics draws on the fact that an FB-TAG can be translated into a Feature-Based Regular Tree Grammar (FB-RTG) describing its derivation trees. The derivation tree language of TAG constitutes a simpler language than the derived tree language, and thus, generation approaches based on derivation trees have been already proposed. Our approach departs from previous ones in that our FB-RTG encoding accounts for feature structures present in the original FB-TAG having thus important consequences regarding over-generation and preservation of the syntax-semantics interface. The concrete derivation tree generation algorithm that we propose is an Earley-style algorithm integrating a set of well-known optimisation techniques: tabulation, sharing-packing, and semantic-based indexing. In the second part of the thesis, we explore how our \semtag -based surface realiser can be put to work for the (semi-) automatic generation of grammar exercises. Usually, teachers manually edit exercises and their solutions, and classify them according to the degree of dificulty or expected learner level. A strand of research in (Natural Language Processing (NLP) for CALL addresses the (semi-)automatic generation of exercises. Mostly, this work draws on texts extracted from the Web, use machine learning and text analysis techniques (e.g. parsing, POS tagging, etc.). These approaches expose the learner to sentences that have a potentially complex syntax and diverse vocabulary. In contrast, the approach we propose in this thesis addresses the (semi-) automatic generation of grammar exercises of the type found in grammar textbooks. In other words, it deals with the generation of exercises whose syntax and vocabulary are tailored to specific pedagogical goals and topics. Because the grammar-based generation approach associates natural language sentences with a rich linguistic description, it permits defining a syntactic and morpho-syntactic constraints specification language for the selection of stem sentences in compliance with a given pedagogical goal. Further, it allows for the post processing of the generated stem sentences to build grammar exercise items. We show how Fill-in-the-blank, Shuffle and Reformulation grammar exercises can be automatically produced. The approach has been integrated in the Interactive French Learning Game (I-FLEG) serious game for learning French and has been evaluated both based in the interactions with online players and in collaboration with a language teacher.
Complete list of metadata
Contributor : Laura Perez-Beltrachini Connect in order to contact the contributor
Submitted on : Monday, January 18, 2016 - 3:33:39 PM
Last modification on : Saturday, October 16, 2021 - 11:26:06 AM
Long-term archiving on: : Tuesday, April 19, 2016 - 10:13:21 AM


  • HAL Id : tel-01749799, version 2


Laura Haide Perez. Natural Language Generation for Language Learning. Artificial Intelligence [cs.AI]. Université de Lorraine, 2013. English. ⟨NNT : 2013LORR0062⟩. ⟨tel-01749799v2⟩



Record views


Files downloads