Conference papers

Expanding lexicons by inducing paradigms and validating attested forms

Abstract : One of the bottlenecks in Natural Language Processing for a given language is creating a lexicon that covers the language. The morphological lexicon provides two important pieces of information for NLP applications: 1) the normalization of a word (its lemmatization), which allows the application to recognize two variants of the same word; and 2) the part-of-speech roles that the word can play, which allows the application to parse the text, creating relations between the words in it. Many NLP applications, such as Information Retrieval, Classification, and Terminology Extraction, depend upon the normalization and parsing information found in lexicons. When words are not present in these lexicons, it is difficult to predict their proper lemmatizations and parts of speech. In this paper we present a technique for updating a lexicon given an unknown word, via induction of paradigms from an existing but incomplete lexicon and validation of the candidate paradigm using corpus evidence.
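
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix-set representation of a paradigm, the function names `induce_paradigms` and `propose_analyses`, and the toy data are all assumptions made for illustration. Paradigms are induced as sets of suffixes sharing a stem in the known lexicon; an unknown word is then matched against each paradigm, and the candidate whose generated forms are most often attested in a corpus vocabulary is ranked first.

```python
from collections import defaultdict


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def induce_paradigms(lexicon):
    """Induce suffix paradigms from (form, lemma, pos) entries.

    Returns a set of (pos, lemma_suffix, suffixes) triples, where
    suffixes is a frozenset of endings attached to a shared stem.
    (Hypothetical representation, assumed for this sketch.)
    """
    forms_by_entry = defaultdict(set)
    for form, lemma, pos in lexicon:
        forms_by_entry[(lemma, pos)].add(form)

    paradigms = set()
    for (lemma, pos), forms in forms_by_entry.items():
        forms = forms | {lemma}
        # The stem is the prefix shared by the lemma and all its forms.
        stem_len = min(common_prefix_len(lemma, f) for f in forms)
        suffixes = frozenset(f[stem_len:] for f in forms)
        paradigms.add((pos, lemma[stem_len:], suffixes))
    return paradigms


def propose_analyses(word, paradigms, corpus_vocab):
    """Rank (attested_count, lemma, pos) candidates for an unknown word.

    For each paradigm suffix that matches the end of the word, generate
    the full set of forms and count how many occur in corpus_vocab.
    """
    candidates = []
    for pos, lemma_suffix, suffixes in paradigms:
        for suffix in suffixes:
            if not word.endswith(suffix) or len(word) <= len(suffix):
                continue
            stem = word[:len(word) - len(suffix)]
            generated = {stem + s for s in suffixes}
            attested = generated & corpus_vocab
            candidates.append((len(attested), stem + lemma_suffix, pos))
    # Most corpus support first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates


# Toy data, invented for illustration only.
known_lexicon = [
    ("walk", "walk", "V"), ("walks", "walk", "V"),
    ("walked", "walk", "V"), ("walking", "walk", "V"),
    ("cat", "cat", "N"), ("cats", "cat", "N"),
]
corpus_vocab = {"email", "emails", "emailed", "emailing"}

paradigms = induce_paradigms(known_lexicon)
ranked = propose_analyses("emailed", paradigms, corpus_vocab)
best = ranked[0]
```

With this toy data the verb paradigm explains the unknown word best, because all four forms it generates from the stem are attested in the corpus vocabulary. A realistic system would also need to handle stem alternations, such as consonant doubling, which this sketch ignores.
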
Contributor : Gregory Grefenstette
Submitted on : Thursday, November 6, 2014 - 4:58:20 PM
Last modification on : Friday, February 4, 2022 - 3:17:43 AM
Long-term archiving on : Saturday, February 7, 2015 - 11:16:26 AM


grefquevans-libre (1).pdf
Files produced by the author(s)


  • HAL Id : hal-01081023, version 1



Gregory Grefenstette, Yan Qu, David Evans. Expanding lexicons by inducing paradigms and validating attested forms. LREC 2002, May 2002, Las Palmas, Spain. ⟨hal-01081023⟩


