Expanding lexicons by inducing paradigms and validating attested forms

Abstract: One of the bottlenecks in Natural Language Processing for a given language is creating a lexicon that covers the language. A morphological lexicon provides two important pieces of information for NLP applications: 1) the normalization of a word (its lemmatization), which allows the application to recognize two variants of the same word; and 2) the parts of speech that the word can take, which allow the application to parse the text and establish relations between its words. Many NLP applications, such as Information Retrieval, Classification, and Terminology Extraction, depend upon the normalization and parsing information found in lexicons. When words are missing from these lexicons, it is difficult to predict their proper lemmatizations and parts of speech. In this paper we present a technique for updating a lexicon, given an unknown word, by inducing paradigms from an existing but incomplete lexicon and validating the proposed paradigm using corpus evidence.
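
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical lexicon format (`{(lemma, pos): [forms]}`), treats a paradigm as a set of suffixes attached to a shared stem, and scores each hypothesis for an unknown word by the fraction of its predicted forms that are attested in a corpus vocabulary. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter


def common_prefix(words):
    """Return the longest prefix shared by every word in the list."""
    prefix = words[0]
    for w in words[1:]:
        while not w.startswith(prefix):
            prefix = prefix[:-1]
    return prefix


def induce_paradigms(lexicon):
    """Reduce each lexicon entry to a suffix set and count identical sets.

    `lexicon` maps (lemma, pos) to a list of inflected forms
    (an assumed format, not one taken from the paper).
    """
    paradigms = Counter()
    for (lemma, pos), forms in lexicon.items():
        stem = common_prefix([lemma] + list(forms))
        suffixes = frozenset(f[len(stem):] for f in forms)
        lemma_suffix = lemma[len(stem):]
        paradigms[(pos, lemma_suffix, suffixes)] += 1
    return paradigms


def propose(word, paradigms, corpus_vocab):
    """Rank (score, lemma, pos, attested forms) hypotheses for `word`.

    The score is the share of forms predicted by a paradigm that
    actually occur in `corpus_vocab`.
    """
    hypotheses = []
    for pos, lemma_suffix, suffixes in paradigms:
        for suffix in suffixes:
            if not word.endswith(suffix):
                continue
            stem = word[:len(word) - len(suffix)]
            if not stem:
                continue
            expected = {stem + s for s in suffixes}
            attested = expected & corpus_vocab
            score = len(attested) / len(expected)
            hypotheses.append((score, stem + lemma_suffix, pos, sorted(attested)))
    return sorted(hypotheses, key=lambda h: h[0], reverse=True)


# Toy data, invented for illustration only.
lexicon = {
    ("walk", "VERB"): ["walk", "walks", "walked", "walking"],
    ("talk", "VERB"): ["talk", "talks", "talked", "talking"],
    ("cat", "NOUN"): ["cat", "cats"],
    ("dog", "NOUN"): ["dog", "dogs"],
}
corpus_vocab = {"tweet", "tweets", "tweeted", "tweeting"}

paradigms = induce_paradigms(lexicon)
best = propose("tweeted", paradigms, corpus_vocab)[0]
print(best)
```

In this toy setting the verb hypothesis with stem `tweet` wins for `tweeted` because all four of its predicted forms are attested. A word such as `tweets` would tie between the noun and verb paradigms, which illustrates why corpus validation alone may leave some part-of-speech ambiguity unresolved.
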
Document type:
Conference paper
LREC 2002, May 2002, Las Palmas, Spain. 2002

https://hal.inria.fr/hal-01081023
Contributor: Gregory Grefenstette
Submitted on: Thursday, 6 November 2014 - 16:58:20
Last modified on: Thursday, 9 February 2017 - 15:47:19
Document(s) archived on: Saturday, 7 February 2015 - 11:16:26

File

grefquevans-libre (1).pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01081023, version 1

Citation

Gregory Grefenstette, Yan Qu, David Evans. Expanding lexicons by inducing paradigms and validating attested forms. LREC 2002, May 2002, Las Palmas, Spain. 2002. 〈hal-01081023〉
